dtfft_config_t Derived Type

type, public, bind(C) :: dtfft_config_t

Type that can be used to set additional configuration parameters to dtFFT


Inherits

  • dtfft_backend_t (component backend)
  • dtfft_platform_t (component platform)
  • dtfft_stream_t (component stream, which wraps a c_ptr)

Components

Type Visibility Attributes Name Initial
logical(kind=c_bool), public :: enable_log

Should dtFFT print additional information during plan creation or not.

Default is false.

logical(kind=c_bool), public :: enable_z_slab

Should dtFFT use Z-slab optimization or not.

Default is true.

Consider disabling the Z-slab optimization to resolve a DTFFT_ERROR_VKFFT_R2R_2D_PLAN error, or when the underlying FFT implementation of the 2D plan is too slow. In all other cases Z-slab is considered faster, since it reduces the number of data transpositions.

integer(kind=c_int32_t), public :: n_measure_warmup_iters

Number of warmup iterations to execute when the effort level is greater than or equal to DTFFT_MEASURE.

Default is 2.

integer(kind=c_int32_t), public :: n_measure_iters

Number of iterations to execute when the effort level is greater than or equal to DTFFT_MEASURE.

Default is 5. When dtFFT is built with CUDA support, this value is also used to determine the number of iterations when selecting the thread block size for the NVRTC transpose kernel.

type(dtfft_platform_t), public :: platform

Selects platform to execute plan.

Default is DTFFT_PLATFORM_HOST

This option is only defined in builds with device support. Even when dtFFT is built with device support, that does not necessarily mean that all plans must run on a device. This enables a single library installation to support host, CUDA, and HIP plans.

type(dtfft_stream_t), public :: stream

Main CUDA stream that will be used in dtFFT.

This parameter is a placeholder for users to set a custom stream.

The stream actually used by a dtFFT plan is returned by the plan%get_stream function.

A user who sets a custom stream is responsible for destroying it.

The stream must not be destroyed before the call to plan%destroy.
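The stream lifetime rules above can be sketched as follows (CUDA Fortran; the construction of dtfft_stream_t from a cudafor stream handle and the dtfft_set_config call are assumptions about the dtfft module interface, so check the actual API):

```fortran
program custom_stream_example
  use dtfft
  use cudafor
  implicit none
  type(dtfft_config_t)      :: conf
  integer(cuda_stream_kind) :: my_stream
  integer :: ierr

  ! Create a user-owned stream that dtFFT should use for plan execution
  ierr = cudaStreamCreate(my_stream)

  conf = dtfft_config_t(stream=dtfft_stream_t(my_stream))
  call dtfft_set_config(conf)

  ! ... create plans, execute them, call plan%destroy ...

  ! Only after plan%destroy may the user-owned stream be destroyed
  ierr = cudaStreamDestroy(my_stream)
end program custom_stream_example
```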

type(dtfft_backend_t), public :: backend

Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

Default is DTFFT_GPU_BACKEND_NCCL if NCCL is enabled, otherwise DTFFT_BACKEND_MPI_P2P.

logical(kind=c_bool), public :: enable_mpi_backends

Should MPI GPU Backends be enabled when effort is DTFFT_PATIENT or not.

Default is false.

MPI backends are disabled by default during the autotuning process due to an Open MPI bug (https://github.com/open-mpi/ompi/issues/12849): it was noticed that GPU memory is not freed completely during plan autotuning. For example, for a 1024x1024x512 C2C double-precision plan on a single GPU with the Z-slab optimization and MPI backends enabled, plan autotuning will leak 8 GB of GPU memory. Without the Z-slab optimization, running on 4 GPUs, it will leak 24 GB on each GPU.

One workaround is to disable MPI backends by default, which is done here.

Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but it was noticed that disabling CUDA IPC seriously degrades the overall performance of MPI algorithms.

logical(kind=c_bool), public :: enable_pipelined_backends

Should pipelined GPU backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

Pipelined backends require an additional buffer that the user has no control over.

logical(kind=c_bool), public :: enable_nccl_backends

Should NCCL Backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

logical(kind=c_bool), public :: enable_nvshmem_backends

Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

logical(kind=c_bool), public :: enable_kernel_optimization

Should dtFFT try to optimize NVRTC kernel block size when effort is DTFFT_PATIENT or not.

Default is true.

This option is only defined when dtFFT is built with CUDA support.

Enabling this option will make the autotuning process longer, but may result in better performance for some problem sizes. It is recommended to keep this option enabled.

integer(kind=c_int32_t), public :: n_configs_to_test

Number of theoretically best-performing thread blocks to test for transposition kernels when effort is DTFFT_PATIENT or force_kernel_optimization is set to true.

Default is 5.

This option is only defined when dtFFT is built with CUDA support.

It is recommended to keep this value between 3 and 10. The maximum possible value is 25. Setting this value to zero or one disables kernel optimization.

logical(kind=c_bool), public :: force_kernel_optimization

Whether to force kernel optimization when effort is not DTFFT_PATIENT.

Default is false.

This option is only defined when dtFFT is built with CUDA support.

Enabling this option will make plan creation longer, but may result in better performance in the long run. Since kernel optimization is performed without data transfers, the increase in overall autotuning time should not be significant.


Constructor

public interface dtfft_config_t

Interface to create a new configuration

  • private pure function config_constructor(enable_log, enable_z_slab, n_measure_warmup_iters, n_measure_iters, platform, stream, backend, enable_mpi_backends, enable_pipelined_backends, enable_nccl_backends, enable_nvshmem_backends, enable_kernel_optimization, n_configs_to_test, force_kernel_optimization) result(config)

    Creates a new configuration

    Arguments

    Type Intent Optional Attributes Name
    logical, intent(in), optional :: enable_log

    Should dtFFT print additional information during plan creation or not.

    logical, intent(in), optional :: enable_z_slab

    Should dtFFT use Z-slab optimization or not.

    integer(kind=int32), intent(in), optional :: n_measure_warmup_iters

    Number of warmup iterations for measurements

    integer(kind=int32), intent(in), optional :: n_measure_iters

    Number of measurement iterations

    type(dtfft_platform_t), intent(in), optional :: platform

    Selects platform to execute plan.

    type(dtfft_stream_t), intent(in), optional :: stream

    Main CUDA stream that will be used in dtFFT.

    type(dtfft_backend_t), intent(in), optional :: backend

    Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

    logical, intent(in), optional :: enable_mpi_backends

    Should MPI GPU Backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_pipelined_backends

    Should pipelined GPU backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_nccl_backends

    Should NCCL Backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_nvshmem_backends

    Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_kernel_optimization

    Should dtFFT try to optimize NVRTC kernel block size during autotune or not.

    integer(kind=int32), intent(in), optional :: n_configs_to_test

    Number of top theoretical best performing blocks of threads to test for transposition kernels when effort is DTFFT_PATIENT.

    logical, intent(in), optional :: force_kernel_optimization

    Whether to force kernel optimization when effort is not DTFFT_PATIENT.

    Return Value type(dtfft_config_t)

    Constructed dtFFT config, ready to be applied by a call to dtfft_set_config.
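Since every constructor argument is an optional keyword, a configuration overriding a few defaults can be created and applied as below (a minimal sketch; it assumes dtfft_set_config is available from the dtfft module and that the config must be set before plan creation, as described above):

```fortran
program configure_dtfft
  use dtfft
  implicit none
  type(dtfft_config_t) :: conf

  ! All constructor arguments are optional keywords;
  ! omitted components keep their documented defaults.
  conf = dtfft_config_t(enable_log             = .true., &
                        n_measure_warmup_iters = 3,      &
                        n_measure_iters        = 10)

  ! Apply the configuration before creating the plans it should affect
  call dtfft_set_config(conf)
end program configure_dtfft
```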