Type | Visibility | Attributes | Name | Initial | Description
---|---|---|---|---|---
integer(kind=int32) | public | parameter | N_TILES_CANDIDATES | 5 | Maximum number of tile candidates to generate
integer(kind=int32) | public | parameter | N_BLOCKS_CANDIDATES | 5 | Maximum number of block candidates to generate
integer(kind=int32) | public | parameter | N_CANDIDATES | N_TILES_CANDIDATES*N_BLOCKS_CANDIDATES | Maximum number of candidates to generate
integer(kind=int32) | private | parameter | NUM_BANKS | 32 | Number of banks in shared memory
integer(kind=int32) | private | parameter | WARP_SIZE | 32 | Warp size in threads
integer(kind=int32) | private | parameter | BANK_WIDTH_BYTES | 4 | Bank width in bytes

Counts bank conflicts for a given tile size, padding, element size, and block rows.

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(in) | | | tile_size | Size of the tile
integer(kind=int32) | intent(in) | | | block_rows | Number of rows in the block
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element
integer(kind=int32) | intent(in) | | | padding | Padding added to the tile

Return value: Total number of bank conflicts

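
The counting idea can be illustrated with a minimal sketch. It assumes a simplified access model that is not taken from the source: each lane of one warp reads a column of a row-major shared-memory tile whose row length is `tile_size + padding` elements, and only the first 4-byte word of each element is considered. The module and function names are hypothetical; the three constants mirror the private parameters listed above.

```fortran
module bank_conflict_sketch
  use iso_fortran_env, only: int32, int64
  implicit none
  private
  public :: count_bank_conflicts_sketch

  ! Values mirror the private module parameters documented above
  integer(int32), parameter :: NUM_BANKS = 32
  integer(int32), parameter :: WARP_SIZE = 32
  integer(int32), parameter :: BANK_WIDTH_BYTES = 4

contains

  ! Hypothetical model: lane (tx, ty) reads element tx*(tile_size+padding)+ty
  pure function count_bank_conflicts_sketch(tile_size, block_rows, base_storage, padding) result(conflicts)
    integer(int32), intent(in) :: tile_size, block_rows, padding
    integer(int64), intent(in) :: base_storage
    integer(int32) :: conflicts
    integer(int32) :: hits(0:NUM_BANKS - 1)
    integer(int32) :: lane, n_lanes, tx, ty, bank
    integer(int64) :: byte_offset

    hits = 0
    ! A warp may be only partially filled when the block is small
    n_lanes = min(WARP_SIZE, tile_size * block_rows)
    do lane = 0, n_lanes - 1
      tx = mod(lane, tile_size)
      ty = lane / tile_size
      byte_offset = int(tx * (tile_size + padding) + ty, int64) * base_storage
      bank = int(mod(byte_offset / BANK_WIDTH_BYTES, int(NUM_BANKS, int64)), int32)
      hits(bank) = hits(bank) + 1
    end do
    ! Every access beyond the first one to a bank counts as one conflict
    conflicts = sum(max(hits - 1_int32, 0_int32))
  end function count_bank_conflicts_sketch

end module bank_conflict_sketch
```

Under this model, a 32-wide tile of 4-byte elements with no padding maps all 32 lanes to bank 0 (31 conflicts), while one element of padding spreads the lanes over 32 distinct banks (0 conflicts).
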
Evaluates the performance of a kernel configuration from the problem dimensions, the transpose type, the GPU architecture properties, and the element size.

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(in) | | | dims(:) | Problem dimensions
type(dtfft_transpose_t) | intent(in) | | | transpose_type | Type of transposition to perform
type(kernel_config) | intent(in) | | | config | Kernel configuration
type(device_props) | intent(in) | | | props | GPU architecture properties
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element

Return value: Performance score

Estimates the optimal padding for a given tile size and element size

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(in) | | | tile_size | Size of the tile
integer(kind=int32) | intent(in) | | | block_rows | Number of rows in the block
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element

Return value: Optimal padding to reduce bank conflicts

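
One way to reason about padding is through the row stride measured in 4-byte words: column accesses hit `NUM_BANKS / gcd(stride, NUM_BANKS)` distinct banks, so a smaller gcd means fewer conflicts. The sketch below picks the smallest padding that minimizes this gcd. It is an illustration under stated assumptions, not the module's actual method: the names, the `MAX_PADDING` search limit, and the requirement that `base_storage` be a multiple of 4 bytes are all hypothetical, and `block_rows` is ignored for simplicity.

```fortran
module padding_sketch
  use iso_fortran_env, only: int32, int64
  implicit none
  private
  public :: estimate_padding_sketch

  integer(int64), parameter :: NUM_BANKS = 32
  integer(int64), parameter :: BANK_WIDTH_BYTES = 4
  integer(int32), parameter :: MAX_PADDING = 4   ! hypothetical search limit

contains

  ! Greatest common divisor by the Euclidean algorithm
  pure function gcd64(a, b) result(g)
    integer(int64), intent(in) :: a, b
    integer(int64) :: g, x, y, t
    x = a
    y = b
    do while (y /= 0)
      t = mod(x, y)
      x = y
      y = t
    end do
    g = x
  end function gcd64

  ! Smallest padding whose row stride (in words) shares the fewest factors with NUM_BANKS
  pure function estimate_padding_sketch(tile_size, base_storage) result(padding)
    integer(int32), intent(in) :: tile_size
    integer(int64), intent(in) :: base_storage   ! assumed to be a multiple of 4
    integer(int32) :: padding, p
    integer(int64) :: stride_words, g, best

    padding = 0
    best = huge(best)
    do p = 0, MAX_PADDING
      stride_words = int(tile_size + p, int64) * base_storage / BANK_WIDTH_BYTES
      g = gcd64(stride_words, NUM_BANKS)
      if (g < best) then   ! strict comparison keeps the smallest padding on ties
        best = g
        padding = p
      end if
    end do
  end function estimate_padding_sketch

end module padding_sketch
```

For a 32-wide tile of 4-byte elements the sketch selects a padding of 1, the familiar `TILE + 1` layout; a 33-wide tile already has an odd stride and needs no padding.
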
Estimates the bank conflict ratio for a given kernel configuration

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(kernel_config) | intent(in) | | | config | Kernel configuration
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element

Return value: Bank conflict estimation

Calculates theoretical occupancy for a given kernel configuration

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(kernel_config) | intent(in) | | | config | Kernel configuration
type(device_props) | intent(in) | | | props | GPU architecture properties
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element

Return value: Estimated occupancy

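
Theoretical occupancy is commonly computed as the number of resident threads per multiprocessor divided by the hardware maximum, where the number of resident blocks is limited by the block, thread, and shared-memory budgets. The sketch below follows that textbook formula. The derived types and their components are hypothetical stand-ins: the real `kernel_config` and `device_props` fields are not shown in this documentation, and register pressure is ignored.

```fortran
module occupancy_sketch
  use iso_fortran_env, only: int32, int64, real32
  implicit none
  private
  public :: kernel_config_sketch, device_props_sketch, estimate_occupancy_sketch

  ! Hypothetical stand-in for kernel_config
  type :: kernel_config_sketch
    integer(int32) :: tile_size, block_rows, padding
  end type kernel_config_sketch

  ! Hypothetical stand-in for device_props
  type :: device_props_sketch
    integer(int32) :: max_threads_per_sm
    integer(int32) :: max_blocks_per_sm
    integer(int64) :: shared_mem_per_sm   ! bytes
  end type device_props_sketch

contains

  pure function estimate_occupancy_sketch(config, props, base_storage) result(occupancy)
    type(kernel_config_sketch), intent(in) :: config
    type(device_props_sketch), intent(in) :: props
    integer(int64), intent(in) :: base_storage
    real(real32) :: occupancy
    integer(int64) :: threads_per_block, smem_per_block, blocks

    threads_per_block = int(config%tile_size, int64) * config%block_rows
    ! One padded tile of shared memory per block
    smem_per_block = int(config%tile_size, int64) * (config%tile_size + config%padding) * base_storage

    ! Resident blocks are bounded by the tightest of the three limits
    blocks = min(int(props%max_blocks_per_sm, int64), &
                 props%max_threads_per_sm / threads_per_block, &
                 props%shared_mem_per_sm / smem_per_block)

    occupancy = real(blocks * threads_per_block, real32) / real(props%max_threads_per_sm, real32)
  end function estimate_occupancy_sketch

end module occupancy_sketch
```

With a 32 x 8 block, one element of padding, and 4-byte elements, a device allowing 2048 threads and 64 KiB of shared memory per multiprocessor is limited by the thread budget, while shrinking shared memory to 16 KiB makes the tile storage the limiting factor.
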
Analytically estimates memory pressure based on the GPU architecture

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(in) | | | dims(:) | Size of the problem
integer(kind=int32) | intent(in) | | | tile_dim | Tile dimension
integer(kind=int32) | intent(in) | | | other_dim | Other dimension (not tiled)
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element
type(device_props) | intent(in) | | | props | GPU architecture properties

Return value: Pressure metric

Estimates memory coalescing efficiency for a given kernel configuration and transpose type

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(in) | | | dims(:) | Local dimensions of the input data
type(dtfft_transpose_t) | intent(in) | | | transpose_type | Type of transpose operation
type(kernel_config) | intent(in) | | | config | Kernel configuration
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element

Return value: Coalescing score

Generates kernel configuration candidates for a given problem

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(in) | | | dims(:) | Local dimensions of the input data, always 3D
integer(kind=int32) | intent(in) | | | tile_dim | Tile dimension
integer(kind=int32) | intent(in) | | | other_dim | Other dimension (not tiled)
integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store a single element
type(device_props) | intent(in) | | | props | GPU architecture properties
type(kernel_config) | intent(out) | | | candidates(:) | Generated kernel configurations
integer(kind=int32) | intent(out) | | | num_candidates | Number of generated candidates

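
The relation between `N_TILES_CANDIDATES`, `N_BLOCKS_CANDIDATES`, and `N_CANDIDATES` suggests a nested enumeration of tile sizes and block-row counts. The following sketch shows that pattern in a heavily simplified form. The candidate values, the filtering rules, the reduced argument list (a single `max_threads_per_block` limit in place of `dims`, `tile_dim`, `other_dim`, `base_storage`, and `props`), and all names ending in `_sketch` are assumptions made for illustration.

```fortran
module candidates_sketch
  use iso_fortran_env, only: int32
  implicit none
  private
  public :: kernel_config_sketch, generate_candidates_sketch, N_CANDIDATES

  ! Values mirror the public module parameters documented above
  integer(int32), parameter :: N_TILES_CANDIDATES = 5
  integer(int32), parameter :: N_BLOCKS_CANDIDATES = 5
  integer(int32), parameter :: N_CANDIDATES = N_TILES_CANDIDATES * N_BLOCKS_CANDIDATES

  ! Hypothetical stand-in for kernel_config
  type :: kernel_config_sketch
    integer(int32) :: tile_size, block_rows
  end type kernel_config_sketch

contains

  subroutine generate_candidates_sketch(max_threads_per_block, candidates, num_candidates)
    integer(int32), intent(in) :: max_threads_per_block
    type(kernel_config_sketch), intent(out) :: candidates(:)
    integer(int32), intent(out) :: num_candidates
    ! Hypothetical candidate values, chosen only for illustration
    integer(int32), parameter :: tiles(N_TILES_CANDIDATES) = [8, 16, 32, 64, 128]
    integer(int32), parameter :: rows(N_BLOCKS_CANDIDATES) = [1, 2, 4, 8, 16]
    integer(int32) :: i, j

    num_candidates = 0
    do i = 1, N_TILES_CANDIDATES
      do j = 1, N_BLOCKS_CANDIDATES
        ! Skip blocks taller than the tile or larger than the thread limit
        if (rows(j) > tiles(i)) cycle
        if (tiles(i) * rows(j) > max_threads_per_block) cycle
        ! Never write past the caller-provided array
        if (num_candidates >= size(candidates)) return
        num_candidates = num_candidates + 1
        candidates(num_candidates) = kernel_config_sketch(tiles(i), rows(j))
      end do
    end do
  end subroutine generate_candidates_sketch

end module candidates_sketch
```

Sizing the output array with `N_CANDIDATES` guarantees room for every combination; with a 1024-thread limit this sketch keeps 23 of the 25 possible pairs.
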
Sorts candidates by their performance scores

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
real(kind=real32) | intent(in) | | | scores(:) | Performance scores of the generated candidates
integer(kind=int32) | intent(in) | | | num_candidates | Number of candidates
integer(kind=int32) | intent(out) | | | sorted_indices(:) | Sorted indices of candidates

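
An index sort of this kind leaves `scores` untouched and returns a permutation instead. The sketch below uses a stable insertion sort, which is adequate for at most `N_CANDIDATES` entries. It assumes that a higher score is better and therefore sorts in descending order; the source does not state the ordering, and the module and subroutine names are hypothetical.

```fortran
module sort_sketch
  use iso_fortran_env, only: int32, real32
  implicit none
  private
  public :: sort_candidates_sketch

contains

  pure subroutine sort_candidates_sketch(scores, num_candidates, sorted_indices)
    real(real32), intent(in) :: scores(:)
    integer(int32), intent(in) :: num_candidates
    integer(int32), intent(out) :: sorted_indices(:)
    integer(int32) :: i, j, key

    ! Entries beyond num_candidates are set to 0 so the output is fully defined
    sorted_indices = 0
    do i = 1, num_candidates
      sorted_indices(i) = i
    end do

    ! Insertion sort on the indices, descending by score (assumed: higher is better)
    do i = 2, num_candidates
      key = sorted_indices(i)
      j = i - 1
      do while (j >= 1)
        ! Separate test avoids relying on short-circuit evaluation of .and.
        if (scores(sorted_indices(j)) >= scores(key)) exit
        sorted_indices(j + 1) = sorted_indices(j)
        j = j - 1
      end do
      sorted_indices(j + 1) = key
    end do
  end subroutine sort_candidates_sketch

end module sort_sketch
```

For scores `[0.2, 0.9, 0.5]` the resulting index order is `[2, 3, 1]`, so `sorted_indices(1)` points at the best-scoring candidate.
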
Adjusts the tile size and number of rows for narrow matrices so that they are compatible with the warp size.

Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=int32) | intent(inout) | | | base_tile | Tile size
integer(kind=int32) | intent(inout) | | | base_rows | Number of rows