Type that can be used to set additional configuration parameters to dtFFT. A short usage sketch follows the components table below.
| Type | Visibility | Name | Description |
|---|---|---|---|
| `logical(kind=c_bool)` | public | `enable_log` | Should dtFFT print additional information during plan creation or not. Default is false. |
| `logical(kind=c_bool)` | public | `enable_z_slab` | Should dtFFT use Z-slab optimization or not. Default is true. One should consider disabling Z-slab optimization in order to resolve a `DTFFT_ERROR_VKFFT_R2R_2D_PLAN` error or when the underlying 2D FFT implementation is too slow. |
| `integer(kind=c_int32_t)` | public | `n_measure_warmup_iters` | Number of warmup iterations to execute when the effort level is `DTFFT_MEASURE` or higher. Default is 2. |
| `integer(kind=c_int32_t)` | public | `n_measure_iters` | Number of iterations to execute when the effort level is `DTFFT_MEASURE` or higher. Default is 5. When the effort level is `DTFFT_PATIENT`, this iteration count is also used while autotuning GPU backends. |
| `type(dtfft_platform_t)` | public | `platform` | Selects the platform the plan executes on. Default is `DTFFT_PLATFORM_HOST`. This option is only defined in builds with device support. Even when dtFFT is built with device support, that does not necessarily mean all plans must be device-related; this enables a single library installation to provide host, CUDA, and HIP plans. |
| `type(dtfft_stream_t)` | public | `stream` | Main CUDA stream that will be used in dtFFT. This parameter is a placeholder for the user to set a custom stream. The stream actually used by a dtFFT plan is returned by the plan's `get_stream` method. A user who sets a stream is responsible for destroying it, but must not destroy it before the plan itself is destroyed. |
| `type(dtfft_backend_t)` | public | `backend` | Backend that will be used by dtFFT when the effort level is `DTFFT_ESTIMATE` or `DTFFT_MEASURE`. Default is `DTFFT_BACKEND_NCCL`. |
| `logical(kind=c_bool)` | public | `enable_mpi_backends` | Should MPI GPU backends be enabled when the effort level is `DTFFT_PATIENT`. Default is false. MPI backends are disabled by default during the autotuning process due to an OpenMPI bug (https://github.com/open-mpi/ompi/issues/12849): it was noticed that during plan autotuning GPU memory is not freed completely. For example, a 1024x1024x512 double-precision C2C plan on a single GPU with Z-slab optimization and MPI backends enabled leaks 8 GB of GPU memory during autotuning; without Z-slab optimization, running on 4 GPUs, it leaks 24 GB on each GPU. One workaround is to disable MPI backends by default, which is done here; another is to pass `--mca btl_smcuda_use_cuda_ipc 0` to `mpiexec`. |
| `logical(kind=c_bool)` | public | `enable_pipelined_backends` | Should pipelined GPU backends be enabled when the effort level is `DTFFT_PATIENT`. Default is true. Pipelined backends require an additional buffer that the user has no control over. |
| `logical(kind=c_bool)` | public | `enable_nccl_backends` | Should NCCL backends be enabled when the effort level is `DTFFT_PATIENT`. Default is true. |
| `logical(kind=c_bool)` | public | `enable_nvshmem_backends` | Should NVSHMEM backends be enabled when the effort level is `DTFFT_PATIENT`. Default is true. |
| `logical(kind=c_bool)` | public | `enable_kernel_optimization` | Should dtFFT try to optimize the NVRTC kernel block size when the effort level is `DTFFT_PATIENT`. Default is true. This option is only defined when dtFFT is built with CUDA support. Enabling it makes the autotuning process longer but may result in better performance for some problem sizes. It is recommended to keep this option enabled. |
| `integer(kind=c_int32_t)` | public | `n_configs_to_test` | Number of top theoretically best-performing thread-block configurations to test for transposition kernels when the effort level is `DTFFT_PATIENT`. Default is 5. This option is only defined when dtFFT is built with CUDA support. It is recommended to keep this value between 3 and 10; the maximum possible value is 25. Setting it to zero or one disables kernel optimization. |
| `logical(kind=c_bool)` | public | `force_kernel_optimization` | Whether to force kernel optimization even when the effort level is below `DTFFT_PATIENT`. Default is false. This option is only defined when dtFFT is built with CUDA support. Enabling it makes plan creation longer but may result in better performance over a long run. Since kernel optimization is performed without data transfers, the increase in overall autotuning time should not be significant. |
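As a quick illustration of how these components fit together, here is a minimal sketch that overrides a few defaults and applies them before any plan is created. The module name `dtfft`, the type name `dtfft_config_t`, and the zero-argument constructor call are assumptions about the dtFFT Fortran API; the component names and `dtfft_set_config` are taken from this page.

```fortran
program tune_dtfft_config
  !! Hedged sketch: module name `dtfft` and type name `dtfft_config_t`
  !! are assumptions; component names and `dtfft_set_config` come from
  !! this page. The config is applied before creating any plans.
  use dtfft
  implicit none

  type(dtfft_config_t) :: conf

  ! Zero-argument call to the constructor documented below: every
  ! argument is optional, so this yields the default configuration.
  conf = dtfft_config()

  conf%enable_log = .true.          ! print extra info during plan creation
  conf%enable_z_slab = .false.      ! e.g. when the 2D FFT path misbehaves
  conf%n_measure_warmup_iters = 4   ! used at effort DTFFT_MEASURE or higher
  conf%n_measure_iters = 10

  call dtfft_set_config(conf)       ! takes effect for plans created afterwards
end program tune_dtfft_config
```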
Interface to create a new configuration
Creates a new configuration
| Type | Intent | Optional | Name | Description |
|---|---|---|---|---|
| `logical` | intent(in) | optional | `enable_log` | Should dtFFT print additional information during plan creation or not. |
| `logical` | intent(in) | optional | `enable_z_slab` | Should dtFFT use Z-slab optimization or not. |
| `integer(kind=int32)` | intent(in) | optional | `n_measure_warmup_iters` | Number of warmup iterations for measurements. |
| `integer(kind=int32)` | intent(in) | optional | `n_measure_iters` | Number of measurement iterations. |
| `type(dtfft_platform_t)` | intent(in) | optional | `platform` | Selects the platform the plan executes on. |
| `type(dtfft_stream_t)` | intent(in) | optional | `stream` | Main CUDA stream that will be used in dtFFT. |
| `type(dtfft_backend_t)` | intent(in) | optional | `backend` | Backend that will be used by dtFFT when the effort level is `DTFFT_ESTIMATE` or `DTFFT_MEASURE`. |
| `logical` | intent(in) | optional | `enable_mpi_backends` | Should MPI GPU backends be enabled when the effort level is `DTFFT_PATIENT`. |
| `logical` | intent(in) | optional | `enable_pipelined_backends` | Should pipelined GPU backends be enabled when the effort level is `DTFFT_PATIENT`. |
| `logical` | intent(in) | optional | `enable_nccl_backends` | Should NCCL backends be enabled when the effort level is `DTFFT_PATIENT`. |
| `logical` | intent(in) | optional | `enable_nvshmem_backends` | Should NVSHMEM backends be enabled when the effort level is `DTFFT_PATIENT`. |
| `logical` | intent(in) | optional | `enable_kernel_optimization` | Should dtFFT try to optimize the NVRTC kernel block size during autotuning or not. |
| `integer(kind=int32)` | intent(in) | optional | `n_configs_to_test` | Number of top theoretically best-performing thread-block configurations to test for transposition kernels when the effort level is `DTFFT_PATIENT`. |
| `logical` | intent(in) | optional | `force_kernel_optimization` | Whether to force kernel optimization even when the effort level is below `DTFFT_PATIENT`. |

Return value: the constructed dtFFT configuration, ready to be applied with a call to `dtfft_set_config`.
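Because every argument is optional, the constructor is most naturally called with keyword arguments, setting only what differs from the defaults. A minimal sketch, again assuming the interface is named `dtfft_config` (this page does not spell the name out):

```fortran
use, intrinsic :: iso_fortran_env, only: int32

type(dtfft_config_t) :: conf

! Construct and apply in one step; only the listed defaults change.
! The interface name `dtfft_config` is an assumption; the argument
! names are taken from the table above.
conf = dtfft_config(enable_log      = .true.,  &
                    enable_z_slab   = .false., &
                    n_measure_iters = 10_int32)
call dtfft_set_config(conf)
```

Constructing and applying in one step like this is equivalent to starting from the default configuration and mutating its components one by one, as sketched earlier.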