dtfft_config_t Derived Type

type, public, bind(C) :: dtfft_config_t

Type that can be used to set additional configuration parameters to dtFFT


Inherits

  • dtfft_backend_t (component backend)
  • dtfft_platform_t (component platform)
  • dtfft_stream_t (component stream, which wraps a c_ptr)

Components

Type Visibility Attributes Name Initial
logical(kind=c_bool), public :: enable_log

Should dtFFT print additional information during plan creation or not.

Default is false.

logical(kind=c_bool), public :: enable_z_slab

Should dtFFT use Z-slab optimization or not.

Default is true.

Consider disabling the Z-slab optimization to resolve a DTFFT_ERROR_VKFFT_R2R_2D_PLAN error, or when the underlying FFT implementation of the 2D plan is too slow. In all other cases Z-slab is considered faster, since it reduces the number of data transpositions.

integer(kind=c_int32_t), public :: n_measure_warmup_iters

Number of warmup iterations to execute when the effort level is greater than or equal to DTFFT_MEASURE.

Default is 2.

integer(kind=c_int32_t), public :: n_measure_iters

Number of iterations to execute when the effort level is greater than or equal to DTFFT_MEASURE.

Default is 5. When dtFFT is built with CUDA support, this value is also used to determine the number of iterations when selecting the thread block size for the NVRTC transpose kernel.

type(dtfft_platform_t), public :: platform

Selects platform to execute plan.

Default is DTFFT_PLATFORM_HOST

This option is only defined in builds with device support. Even when dtFFT is built with device support, that does not necessarily mean that all plans must run on a device. This enables a single library installation to support host, CUDA, and HIP plans.

type(dtfft_stream_t), public :: stream

Main CUDA stream that will be used in dtFFT.

This parameter is a placeholder for users to set a custom stream.

The stream actually used by a dtFFT plan is returned by the plan%get_stream function.

A user who sets a custom stream is responsible for destroying it.

The stream must not be destroyed before the call to plan%destroy.
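The stream lifetime rules above can be sketched as follows (CUDA Fortran; the construction of dtfft_stream_t from a cudafor stream handle and the dtfft_set_config call are assumptions about the dtfft module interface, so check the actual API):

```fortran
program custom_stream_example
  use dtfft
  use cudafor
  implicit none
  type(dtfft_config_t)      :: conf
  integer(cuda_stream_kind) :: my_stream
  integer :: ierr

  ! Create a user-owned stream that dtFFT should use for plan execution
  ierr = cudaStreamCreate(my_stream)

  conf = dtfft_config_t(stream=dtfft_stream_t(my_stream))
  call dtfft_set_config(conf)

  ! ... create plans, execute them, call plan%destroy ...

  ! Only after plan%destroy may the user-owned stream be destroyed
  ierr = cudaStreamDestroy(my_stream)
end program custom_stream_example
```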

type(dtfft_backend_t), public :: backend

Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

Default is DTFFT_GPU_BACKEND_NCCL if NCCL is enabled, otherwise DTFFT_BACKEND_MPI_P2P.

logical(kind=c_bool), public :: enable_mpi_backends

Should MPI GPU Backends be enabled when effort is DTFFT_PATIENT or not.

Default is false.

MPI backends are disabled by default during the autotuning process due to an Open MPI bug (https://github.com/open-mpi/ompi/issues/12849): it was noticed that GPU memory is not freed completely during plan autotuning. For example, for a 1024x1024x512 C2C double-precision plan on a single GPU with the Z-slab optimization and MPI backends enabled, plan autotuning will leak 8 GB of GPU memory. Without the Z-slab optimization, running on 4 GPUs, it will leak 24 GB on each GPU.

One workaround is to disable MPI backends by default, which is done here.

Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but it was noticed that disabling CUDA IPC seriously degrades the overall performance of MPI algorithms.

logical(kind=c_bool), public :: enable_pipelined_backends

Should pipelined GPU backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

Pipelined backends require an additional buffer that the user has no control over.

logical(kind=c_bool), public :: enable_nccl_backends

Should NCCL Backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

logical(kind=c_bool), public :: enable_nvshmem_backends

Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

logical(kind=c_bool), public :: enable_kernel_optimization

Should dtFFT try to optimize NVRTC kernel block size when effort is DTFFT_PATIENT or not.

Default is true.

This option is only defined when dtFFT is built with CUDA support.

Enabling this option will make the autotuning process longer, but may result in better performance for some problem sizes. It is recommended to keep this option enabled.

integer(kind=c_int32_t), public :: n_configs_to_test

Number of theoretically best-performing thread blocks to test for transposition kernels when effort is DTFFT_PATIENT or force_kernel_optimization is set to true.

Default is 5.

This option is only defined when dtFFT is built with CUDA support.

It is recommended to keep this value between 3 and 10. The maximum possible value is 25. Setting this value to zero or one disables kernel optimization.

logical(kind=c_bool), public :: force_kernel_optimization

Whether to force kernel optimization when effort is not DTFFT_PATIENT.

Default is false.

This option is only defined when dtFFT is built with CUDA support.

Enabling this option will make plan creation longer, but may result in better performance in the long run. Since kernel optimization is performed without data transfers, the increase in overall autotuning time should not be significant.


Constructor

public interface dtfft_config_t

Interface to create a new configuration

  • private pure function config_constructor(enable_log, enable_z_slab, n_measure_warmup_iters, n_measure_iters, platform, stream, backend, enable_mpi_backends, enable_pipelined_backends, enable_nccl_backends, enable_nvshmem_backends, enable_kernel_optimization, n_configs_to_test, force_kernel_optimization) result(config)

    Creates a new configuration

    Arguments

    Type Intent Optional Attributes Name
    logical, intent(in), optional :: enable_log

    Should dtFFT print additional information during plan creation or not.

    logical, intent(in), optional :: enable_z_slab

    Should dtFFT use Z-slab optimization or not.

    integer(kind=int32), intent(in), optional :: n_measure_warmup_iters

    Number of warmup iterations for measurements

    integer(kind=int32), intent(in), optional :: n_measure_iters

    Number of measurement iterations

    type(dtfft_platform_t), intent(in), optional :: platform

    Selects platform to execute plan.

    type(dtfft_stream_t), intent(in), optional :: stream

    Main CUDA stream that will be used in dtFFT.

    type(dtfft_backend_t), intent(in), optional :: backend

    Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

    logical, intent(in), optional :: enable_mpi_backends

    Should MPI GPU Backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_pipelined_backends

    Should pipelined GPU backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_nccl_backends

    Should NCCL Backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_nvshmem_backends

    Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or not.

    logical, intent(in), optional :: enable_kernel_optimization

    Should dtFFT try to optimize NVRTC kernel block size during autotune or not.

    integer(kind=int32), intent(in), optional :: n_configs_to_test

    Number of top theoretical best performing blocks of threads to test for transposition kernels when effort is DTFFT_PATIENT.

    logical, intent(in), optional :: force_kernel_optimization

    Whether to force kernel optimization when effort is not DTFFT_PATIENT.

    Return Value type(dtfft_config_t)

    Constructed dtFFT config, ready to be applied by a call to dtfft_set_config.
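Since every constructor argument is an optional keyword, a configuration overriding a few defaults can be created and applied as below (a minimal sketch; it assumes dtfft_set_config is available from the dtfft module and that the config must be set before plan creation, as described above):

```fortran
program configure_dtfft
  use dtfft
  implicit none
  type(dtfft_config_t) :: conf

  ! All constructor arguments are optional keywords;
  ! omitted components keep their documented defaults.
  conf = dtfft_config_t(enable_log             = .true., &
                        n_measure_warmup_iters = 3,      &
                        n_measure_iters        = 10)

  ! Apply the configuration before creating the plans it should affect
  call dtfft_set_config(conf)
end program configure_dtfft
```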