This module defines the kernel_device type and its type-bound procedures. kernel_device extends the abstract_kernel type and implements its type-bound procedures.
| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| integer(kind=int32) | public | parameter | DEF_TILE_SIZE | 32 | Default tile size |
Device kernel class
| Type | Visibility | Attributes | Name | Initial | Description |
|---|---|---|---|---|---|
| logical | public | | is_created | .false. | Kernel is created flag. |
| logical | public | | is_dummy | .false. | If kernel should do anything or not. |
| type(kernel_type_t) | public | | kernel_type | | Type of the kernel |
| character(len=:) | public | allocatable | kernel_string | | |
| integer(kind=int32) | public | allocatable | neighbor_data(:,:) | | Neighbor data for pipelined unpacking |
| integer(kind=int32) | public | allocatable | dims(:) | | Local dimensions to process |
| type(kernel_type_t) | private | | internal_kernel_type | | Actual kernel type used for execution, can be different from |
| type(CUfunction) | private | | cuda_kernel | | Pointer to CUDA kernel. |
| integer(kind=int32) | private | | tile_size | | Tile size used for this kernel |
| integer(kind=int32) | private | | block_rows | | Number of rows in each block processed by each thread |
| integer(kind=int64) | private | | copy_bytes | | Number of bytes to copy for |
| Procedure | Description |
|---|---|
| procedure, public, pass(self) :: create | Creates kernel |
| procedure, public, pass(self) :: execute | Executes kernel |
| procedure, public, pass(self) :: destroy | Destroys kernel |
| procedure, public :: create_private => create | Creates kernel |
| procedure, public :: execute_private => execute | Executes kernel |
| procedure, public :: destroy_private => destroy | Destroys kernel |
Creates kernel
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(kernel_device) | intent(inout) | | | self | Device kernel class |
| type(dtfft_effort_t) | intent(in) | | | effort | Effort level for generating transpose kernels |
| integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store single element |
| logical | intent(in) | optional | | force_effort | Should effort be forced or not |
Executes kernel on stream
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(kernel_device) | intent(inout) | | | self | Device kernel class |
| real(kind=real32) | intent(in) | | target | in(:) | Device pointer |
| real(kind=real32) | intent(inout) | | target | out(:) | Device pointer |
| type(dtfft_stream_t) | intent(in) | | | stream | Stream to execute on |
| integer(kind=int32) | intent(in) | optional | | neighbor | Source rank for pipelined unpacking |
Destroys kernel
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| class(kernel_device) | intent(inout) | | | self | Device kernel class |
Populates kernel arguments based on kernel type
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| type(kernel_type_t) | intent(in) | | | kernel_type | Type of kernel |
| integer(kind=int32) | intent(in) | | | dims(:) | Local dimensions to process |
| integer(kind=int32) | intent(out) | | | nargs | Number of arguments set by this subroutine |
| integer(kind=int32) | intent(out) | | | args(MAX_KERNEL_ARGS) | Kernel arguments |
| integer(kind=int32) | intent(in) | optional | | neighbor_data(:) | Neighbor data for pipelined kernels |
Computes kernel launch parameters based on kernel type and dimensions
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| type(kernel_type_t) | intent(in) | | | kernel_type | Type of kernel |
| integer(kind=int32) | intent(in) | | | dims(:) | Local dimensions to process |
| integer(kind=int32) | intent(in) | | | tile_size | Size of the tile in shared memory |
| integer(kind=int32) | intent(in) | | | block_rows | Number of rows in each block |
| type(dim3) | intent(out) | | | blocks | Number of blocks to launch |
| type(dim3) | intent(out) | | | threads | Number of threads per block |
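
To illustrate how tile_size and block_rows might translate into a launch configuration, here is a minimal sketch. It assumes the common tiled-transpose layout, in which each thread block covers one tile_size x tile_size tile using tile_size x block_rows threads, and the third dimension (if present) maps to the z axis of the grid. This layout is an assumption, not taken from this module: the real procedure also depends on kernel_type, which the sketch ignores. The names launch_params_sketch, dim3_sketch and sketch_launch_params are hypothetical.

```fortran
module launch_params_sketch
  use iso_fortran_env, only: int32
  implicit none

  !> Hypothetical stand-in for the dim3 type used by the real procedure
  type :: dim3_sketch
    integer(int32) :: x = 1_int32
    integer(int32) :: y = 1_int32
    integer(int32) :: z = 1_int32
  end type dim3_sketch

contains

  !> Assumed tiled-transpose launch layout; not the library's implementation
  pure subroutine sketch_launch_params(dims, tile_size, block_rows, blocks, threads)
    integer(int32),    intent(in)  :: dims(:)     !< Local dimensions (2 or 3 entries)
    integer(int32),    intent(in)  :: tile_size   !< Tile edge length
    integer(int32),    intent(in)  :: block_rows  !< Thread rows per block
    type(dim3_sketch), intent(out) :: blocks      !< Grid dimensions
    type(dim3_sketch), intent(out) :: threads     !< Block dimensions

    ! One thread block holds tile_size x block_rows threads
    threads%x = tile_size
    threads%y = block_rows
    threads%z = 1_int32

    ! Ceiling division so that partially filled edge tiles are still covered
    blocks%x = (dims(1) + tile_size - 1_int32) / tile_size
    blocks%y = (dims(2) + tile_size - 1_int32) / tile_size
    blocks%z = 1_int32
    if (size(dims) == 3) blocks%z = dims(3)
  end subroutine sketch_launch_params

end module launch_params_sketch
```

Under these assumptions, dims = [100, 64, 8] with tile_size = 32 and block_rows = 8 gives a 4 x 2 x 8 grid of 32 x 8 x 1 thread blocks. Ceiling division is used so that a dimension that is not a multiple of tile_size still gets a block for its trailing partial tile.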
Compiles kernel and caches it. Returns compiled kernel.
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| integer(kind=int32) | intent(in) | | | dims(:) | Local dimensions to process |
| type(kernel_type_t) | intent(in) | | | kernel_type | Type of kernel to build |
| type(dtfft_effort_t) | intent(in) | | | effort | How thoroughly |
| integer(kind=int64) | intent(in) | | | base_storage | Number of bytes needed to store single element |
| type(device_props) | intent(in) | | | props | GPU architecture properties |
| integer(kind=int32) | intent(out) | | | tile_size | Size of the tile in shared memory |
| integer(kind=int32) | intent(out) | | | block_rows | Number of rows in each block processed by each thread |
| type(CUfunction) | intent(out) | | | kernel | Compiled kernel to return |
| logical | intent(in) | optional | | force_effort | Should effort be forced or not |
| integer(kind=int32) | intent(in) | optional | | neighbor_data(:) | Neighbor data for pipelined kernels |