Module for managing nvRTC-compiled CUDA kernels. Each module holds a single templated kernel that can be instantiated with different template parameters.
| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| character(len=*) | private | parameter | DEFAULT_KERNEL_NAME | "dtfft_kernel" | Basic kernel name |
nvrtc_module: class for managing nvRTC-compiled CUDA kernels

| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| logical | private | | is_created | .false. | Whether the module has been created |
| character(len=:) | private | allocatable | basic_name | | Basic kernel name |
| integer(kind=int32) | private | | ndims | | Number of dimensions, used only for forward permutation |
| type(CUmodule) | private | | cumod | | CUDA module |
| type(nvrtcProgram) | private | | prog | | nvRTC program |
| type(kernel_type_t) | private | | kernel_type | | Type of kernel |
| integer(kind=int64) | private | | base_storage | | Number of bytes needed to store a single element |
| type(kernel_config) | private | allocatable | configs(:) | | Kernel configurations that this module was compiled for |
| Type-bound procedure | Description |
|---|---|
| procedure, public, pass(self) :: create | Creates the module with the given parameters |
| procedure, public, pass(self) :: destroy | Destroys the module and frees its resources |
| procedure, public, pass(self) :: get | Returns a kernel ready to be executed |
| generic, public :: check => check_instance, check_module | Checks whether a kernel with the given parameters is available in this module |
| procedure, private, pass(self) :: check_instance | Checks whether a kernel with the given parameters is available in this module |
| procedure, private, pass(self) :: check_module | Basic check that this module provides kernels of the given type |
codegen_t: class for generating CUDA code

| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| character(len=:) | public | allocatable | raw | | String holding the CUDA code |

| Type-bound procedure | Description |
|---|---|
| procedure, public, pass(self) :: destroy => destroy_string | |
| procedure, public, pass(self) :: add => add_line | Adds a new line to the CUDA code |
get: returns a kernel ready to be executed

| Type | Intent | Name | Description |
|---|---|---|---|
| class(nvrtc_module) | intent(in) | self | This module |
| integer(kind=int32) | intent(in) | ndims | Number of dimensions, used only for forward permutation |
| type(kernel_type_t) | intent(in) | kernel_type | Type of kernel to build |
| integer(kind=int64) | intent(in) | base_storage | Number of bytes needed to store a single element |
| integer(kind=int32) | intent(in) | tile_size | Size of shared memory tile, template parameter |
| integer(kind=int32) | intent(in) | block_rows | Number of rows processed by a single thread, template parameter |

Return value: the resulting kernel.
check_instance: checks whether a kernel with the given parameters is available in this module

| Type | Intent | Name | Description |
|---|---|---|---|
| class(nvrtc_module) | intent(in) | self | This module |
| integer(kind=int32) | intent(in) | ndims | Number of dimensions |
| type(kernel_type_t) | intent(in) | kernel_type | Type of kernel to build |
| integer(kind=int64) | intent(in) | base_storage | Number of bytes needed to store a single element |
| integer(kind=int32) | intent(in) | tile_size | Size of shared memory tile, template parameter |
| integer(kind=int32) | intent(in) | block_rows | Number of rows processed by a single thread, template parameter |
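The lookup performed by check_instance can be pictured as a linear search over the configs(:) component. The sketch below is a minimal illustration under stated assumptions, not the dtFFT implementation: kernel_config_sketch and its tile_size and block_rows fields are hypothetical stand-ins for kernel_config, whose components are not documented here.

```fortran
module config_lookup_sketch
  use iso_fortran_env, only: int32
  implicit none
  private
  public :: kernel_config_sketch, find_config

  ! Hypothetical stand-in for kernel_config: real component names are not documented here
  type :: kernel_config_sketch
    integer(int32) :: tile_size
    integer(int32) :: block_rows
  end type kernel_config_sketch

contains

  ! Returns the index of the first matching configuration,
  ! or 0 if the module was not compiled for these template parameters
  pure function find_config(configs, tile_size, block_rows) result(idx)
    type(kernel_config_sketch), intent(in) :: configs(:)
    integer(int32), intent(in) :: tile_size
    integer(int32), intent(in) :: block_rows
    integer(int32) :: idx
    integer(int32) :: i

    idx = 0
    do i = 1, size(configs, kind=int32)
      if (configs(i)%tile_size == tile_size .and. configs(i)%block_rows == block_rows) then
        idx = i
        return
      end if
    end do
  end function find_config

end module config_lookup_sketch
```

Returning an index rather than a logical lets a caller such as get reuse the search to locate the matching configuration.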
check_module: basic check that this module provides kernels of the given type

| Type | Intent | Name | Description |
|---|---|---|---|
| class(nvrtc_module) | intent(in) | self | This module |
| integer(kind=int32) | intent(in) | ndims | Number of dimensions |
| type(kernel_type_t) | intent(in) | kernel_type | Type of kernel to build |
| integer(kind=int64) | intent(in) | base_storage | Number of bytes needed to store a single element |
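A minimal sketch of the kind of comparison check_module could perform, assuming kernel_type_t can be reduced to an integer code (the KERNEL_PERMUTE_* constants below are hypothetical). Following the ndims description ("used only for forward permutation"), the sketch compares ndims only when the requested kernel is the forward permutation.

```fortran
module module_check_sketch
  use iso_fortran_env, only: int32, int64
  implicit none
  private
  public :: KERNEL_PERMUTE_FORWARD, KERNEL_PERMUTE_BACKWARD, module_matches

  ! Hypothetical integer codes standing in for kernel_type_t values
  integer(int32), parameter :: KERNEL_PERMUTE_FORWARD = 1
  integer(int32), parameter :: KERNEL_PERMUTE_BACKWARD = 2

contains

  ! mod_* arguments describe what the module was created with;
  ! the remaining arguments describe the requested kernel
  pure function module_matches(mod_ndims, mod_type, mod_storage, &
                               ndims, kernel_type, base_storage) result(ok)
    integer(int32), intent(in) :: mod_ndims, mod_type
    integer(int64), intent(in) :: mod_storage
    integer(int32), intent(in) :: ndims, kernel_type
    integer(int64), intent(in) :: base_storage
    logical :: ok

    ok = (mod_type == kernel_type) .and. (mod_storage == base_storage)
    ! ndims only distinguishes modules built for the forward permutation
    if (ok .and. kernel_type == KERNEL_PERMUTE_FORWARD) ok = (mod_ndims == ndims)
  end function module_matches

end module module_check_sketch
```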
Compiles an nvRTC program for the given configurations

| Type | Intent | Name | Description |
|---|---|---|---|
| type(codegen_t) | intent(in) | code | CUDA code to compile |
| character(len=*) | intent(in) | prog_name | Basic kernel name |
| type(kernel_config) | intent(in) | configs(:) | Kernel configurations that this module should be compiled for |
| type(device_props) | intent(in) | props | GPU architecture properties |

Return value: the resulting nvRTC program.
Generates the name expression for the given template parameters

| Type | Intent | Name | Description |
|---|---|---|---|
| character(len=*) | intent(in) | basic_name | Basic kernel name |
| integer(kind=int32) | intent(in) | tile_dim | Size of shared memory tile, template parameter |
| integer(kind=int32) | intent(in) | block_rows | Number of rows processed by a single thread, template parameter |
| integer(kind=int32) | intent(in) | padding | Padding to avoid shared memory bank conflicts, template parameter |

Return value: the resulting name expression.
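nvRTC resolves template instantiations through name expressions: each expression is registered with nvrtcAddNameExpression before nvrtcCompileProgram, and its mangled name is retrieved afterwards with nvrtcGetLoweredName. The sketch below builds such an expression from the three template parameters; the `name<tile_dim, block_rows, padding>` format and the parameter order are assumptions, not taken from the dtFFT source.

```fortran
module name_expression_sketch
  use iso_fortran_env, only: int32
  implicit none
  private
  public :: make_name_expression

contains

  ! Builds e.g. "dtfft_kernel<32, 8, 1>"; format and parameter order are assumptions
  function make_name_expression(basic_name, tile_dim, block_rows, padding) result(expr)
    character(len=*), intent(in) :: basic_name
    integer(int32), intent(in) :: tile_dim, block_rows, padding
    character(len=:), allocatable :: expr

    expr = basic_name // "<" // to_str(tile_dim) // ", " // to_str(block_rows) &
           // ", " // to_str(padding) // ">"
  end function make_name_expression

  ! Converts an integer to a string without leading or trailing blanks
  function to_str(n) result(s)
    integer(int32), intent(in) :: n
    character(len=:), allocatable :: s
    character(len=16) :: buf

    write(buf, "(I0)") n
    s = trim(buf)
  end function to_str

end module name_expression_sketch
```

The same string must be used both when registering the expression and when querying its lowered name, so generating it in one place avoids mismatches.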
Gets the mangled name for the given template parameters from the nvRTC program

| Type | Intent | Name | Description |
|---|---|---|---|
| character(len=*) | intent(in) | basic_name | Basic kernel name |
| type(nvrtcProgram) | intent(in) | prog | nvRTC program |
| integer(kind=int32) | intent(in) | tile_dim | Size of shared memory tile, template parameter |
| integer(kind=int32) | intent(in) | block_rows | Number of rows processed by a single thread, template parameter |
| integer(kind=int32) | intent(in) | padding | Padding to avoid shared memory bank conflicts, template parameter |

Return value: the mangled kernel name.
Generates code that locally transposes data and prepares it to be sent to other processes

| Type | Intent | Name | Description |
|---|---|---|---|
| character(len=*) | intent(in) | kernel_name | Name of CUDA kernel |
| integer(kind=int32) | intent(in) | ndims | Number of dimensions |
| integer(kind=int64) | intent(in) | base_storage | Number of bytes needed to store a single element |
| type(kernel_type_t) | intent(in) | kernel_type | Type of kernel to generate code for |

Return value: the resulting code.
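One step such a generator might need is choosing a CUDA element type whose size equals base_storage. Because a transpose only moves whole elements, any type of the right size would do. The mapping below (4 bytes to float, 8 to double, 16 to double2) is an assumption for illustration; the generated dtFFT code may use different types.

```fortran
module element_type_sketch
  use iso_fortran_env, only: int64
  implicit none
  private
  public :: cuda_element_type

contains

  ! Hypothetical mapping from element size in bytes to a CUDA type of the same size.
  ! Returns an empty string for unsupported sizes.
  function cuda_element_type(base_storage) result(type_name)
    integer(int64), intent(in) :: base_storage
    character(len=:), allocatable :: type_name

    select case (base_storage)
    case (4_int64)
      type_name = "float"     ! e.g. real(real32)
    case (8_int64)
      type_name = "double"    ! e.g. real(real64) or complex(real32)
    case (16_int64)
      type_name = "double2"   ! e.g. complex(real64)
    case default
      type_name = ""
    end select
  end function cuda_element_type

end module element_type_sketch
```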
add_line: adds a new line to the CUDA code

| Type | Intent | Name | Description |
|---|---|---|---|
| class(codegen_t) | intent(inout) | self | Kernel code |
| character(len=*) | intent(in) | line | Line to add |
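A minimal sketch of codegen_t and add_line, assuming each added line is terminated with a newline so the accumulated raw string can be handed to nvRTC directly as program source. codegen_sketch_t is a hypothetical simplified stand-in, not the dtFFT type.

```fortran
module codegen_sketch
  implicit none
  private
  public :: codegen_sketch_t

  ! Simplified stand-in for codegen_t: accumulates CUDA source in raw
  type :: codegen_sketch_t
    character(len=:), allocatable :: raw
  contains
    procedure, pass(self) :: add => add_line
    procedure, pass(self) :: destroy => destroy_string
  end type codegen_sketch_t

contains

  subroutine add_line(self, line)
    class(codegen_sketch_t), intent(inout) :: self
    character(len=*), intent(in) :: line

    if (.not. allocated(self%raw)) self%raw = ""
    ! Append the line plus a newline so the compiler sees separate source lines
    self%raw = self%raw // line // new_line("a")
  end subroutine add_line

  subroutine destroy_string(self)
    class(codegen_sketch_t), intent(inout) :: self

    if (allocated(self%raw)) deallocate(self%raw)
  end subroutine destroy_string

end module codegen_sketch
```

Relying on reallocation on assignment keeps the sketch short; repeated concatenation copies the string each time, which is acceptable for kernels of a few dozen lines.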
create: creates the module with the given parameters, compiles the nvRTC program and loads it as a CUDA module

| Type | Intent | Name | Description |
|---|---|---|---|
| class(nvrtc_module) | intent(inout) | self | This module |
| integer(kind=int32) | intent(in) | ndims | Number of dimensions, used only for forward permutation |
| type(kernel_type_t) | intent(in) | kernel_type | Type of kernel to build |
| integer(kind=int64) | intent(in) | base_storage | Number of bytes needed to store a single element |
| type(kernel_config) | intent(in) | configs(:) | Kernel configurations that this module should be compiled for |
| type(device_props) | intent(in) | props | GPU architecture properties |
destroy: destroys the module and frees its resources

| Type | Intent | Name | Description |
|---|---|---|---|
| class(nvrtc_module) | intent(inout) | self | This module |
Sets the name expression for the given template parameters in the nvRTC program

| Type | Intent | Name | Description |
|---|---|---|---|
| type(nvrtcProgram) | intent(in) | prog | nvRTC program |
| character(len=*) | intent(in) | basic_name | Basic kernel name |
| integer(kind=int32) | intent(in) | tile_dim | Size of shared memory tile, template parameter |
| integer(kind=int32) | intent(in) | block_rows | Number of rows processed by a single thread, template parameter |
| integer(kind=int32) | intent(in) | padding | Padding to avoid shared memory bank conflicts, template parameter |