| Procedure | Location | Procedure Type | Description |
|---|---|---|---|
| add | dtfft_nvrtc_module_cache | Subroutine | Adds new entry to cache |
| add_line | dtfft_nvrtc_module | Subroutine | Adds new line to CUDA code |
| aligned_alloc | dtfft_utils | Interface | |
| alloc_and_set_aux | dtfft_transpose_plan | Subroutine | Allocates auxiliary memory according to the backend and sets it to the plans |
| alloc_fft_plans | dtfft_plan | Subroutine | Allocates abstract_executor with required FFT class and populates fft_mapping with similar FFT ids |
| alloc_mem | dtfft_transpose_plan | Subroutine | Allocates memory based on |
| allocate_plans | dtfft_transpose_plan | Subroutine | Allocates array of plans |
| astring_f2c | dtfft_utils | Subroutine | Convert Fortran string to C allocatable string |
| autotune_grid | dtfft_transpose_plan | Subroutine | Creates cartesian grid and runs various backends on it. Returns best backend and execution time |
| autotune_grid_decomposition | dtfft_transpose_plan | Subroutine | Runs through all possible grid decompositions and selects the best one based on the lowest average execution time |
| autotune_transpose_id | dtfft_transpose_plan | Function | Creates forward and backward transpose plans for backend |
| backend_eq | dtfft_parameters | Function | |
| backend_ne | dtfft_parameters | Function | |
| check_aux | dtfft_plan | Subroutine | Checks whether an aux buffer was passed by the user and, if not, allocates one internally |
| check_continuity | dtfft_pencil | Function | Check if the local pencils cover the global space without gaps |
| check_create_args | dtfft_plan | Function | Checks arguments provided by user and sets private variables |
| check_device_pointers | dtfft_plan | Function | Checks if device pointers are provided by user |
| check_if_even | dtfft_pencil | Function | Checks if data is evenly distributed across processes |
| check_if_overflow | dtfft_transpose_handle_generic | Subroutine | Checks if product of sizes fits into integer(int32) |
| check_instance | dtfft_nvrtc_module | Function | Checks if kernel with given parameters is available in this module |
| check_module | dtfft_nvrtc_module | Function | Basic check that this module provides kernels of given type |
| check_overlap | dtfft_pencil | Function | Check if two pencils overlap in ndims-dimensional space |
| Comm_f2c | dtfft_utils | Interface | |
| compare | test_host_kernels | Subroutine | |
| compile_program | dtfft_nvrtc_module | Function | Compiles nvRTC program with given configurations |
| compute_alltoall_schedule | dtfft_backend_mpi | Subroutine | Generate optimal round-robin communication schedule for all-to-all pattern |
| config_constructor | dtfft_config | Function | Creates a new configuration |
| count_bank_conflicts | dtfft_nvrtc_block_optimizer | Function | Counts bank conflicts for a given tile size, padding, element size, and block rows. |
| count_unique | dtfft_utils | Function | Count the number of unique elements in the array |
| create | dtfft_transpose_plan | Function | Creates transposition plan |
| create | dtfft_abstract_kernel | Subroutine | Creates kernel |
| create | dtfft_transpose_handle_generic | Subroutine | Creates Generic Transpose Handle |
| create | dtfft_backend_cufftmp_m | Subroutine | Creates cuFFTMp GPU Backend |
| create | dtfft_pencil | Subroutine | Creates pencil |
| create | dtfft_abstract_executor | Function | Creates FFT plan |
| create | dtfft_executor_fftw_m | Subroutine | Creates FFT plan via FFTW3 Interface |
| create | dtfft_kernel_device | Subroutine | Creates kernel |
| create | dtfft_nvrtc_module_cache | Subroutine | Creates cache |
| create | dtfft_abstract_transpose_handle | Subroutine | Creates transpose handle |
| create | dtfft_executor_mkl_m | Subroutine | Creates FFT plan via MKL DFTI Interface |
| create | dtfft_executor_cufft_m | Subroutine | Creates FFT plan via cuFFT Interface |
| create | dtfft_nvrtc_module | Subroutine | Creates module with given parameters, compiles nvRTC program and loads it as CUDA module |
| create | dtfft_executor_vkfft_m | Subroutine | Creates FFT plan via vkFFT Interface |
| create | dtfft_abstract_backend | Subroutine | Creates Abstract Backend |
| create | dtfft_transpose_handle_datatype | Subroutine | Creates |
| create_1d_comm | dtfft_pencil | Subroutine | Creates a new 1D communicator based on the fixed dimensions of the current pencil |
| create_back_permutation | dtfft_transpose_handle_datatype | Subroutine | Creates three-dimensional Y -> X and Z -> Y transposition datatypes |
| create_c2c | dtfft_plan | Subroutine | C2C Plan Constructor |
| create_c2c_core | dtfft_plan | Function | Creates plan for both C2C and R2C |
| create_c2c_internal | dtfft_plan | Function | Private method that combines common logic for C2C plan creation |
| create_c2c_pencil | dtfft_plan | Subroutine | C2C Plan Constructor |
| create_cart_comm | dtfft_transpose_plan | Subroutine | Creates cartesian communicator |
| create_data_handle | dtfft_transpose_handle_generic | Subroutine | Creates handle |
| create_forw_permutation | dtfft_transpose_handle_datatype | Subroutine | Creates three-dimensional X -> Y and Y -> Z transposition datatypes |
| create_handle | dtfft_transpose_handle_datatype | Subroutine | Creates transposition handle |
| create_helper | dtfft_backend_mpi | Subroutine | Creates MPI helper |
| create_helper | dtfft_abstract_backend | Subroutine | Creates helper |
| create_host | dtfft_kernel_host | Subroutine | Creates host kernel |
| create_mpi | dtfft_backend_mpi | Subroutine | Creates MPI backend |
| create_nccl | dtfft_backend_nccl_m | Subroutine | Creates NCCL backend |
| create_nvrtc_module | dtfft_nvrtc_module_cache | Subroutine | Creates and adds a new nvrtc module to the cache if it does not already exist |
| create_nvtx_domain | dtfft_interface_nvtx | Subroutine | Creates a new NVTX domain |
| create_pencil_init | dtfft_pencil | Function | Creates and validates pencil passed by user to plan constructors |
| create_pencil_t | dtfft_pencil | Function | Creates pencil object, that can be used to create dtFFT plans |
| create_pencils_and_comm | dtfft_transpose_plan | Subroutine | Creates cartesian communicator |
| create_private | dtfft_plan | Function | Creates core |
| create_r2c | dtfft_plan | Subroutine | R2C Generic Plan Constructor |
| create_r2c_internal | dtfft_plan | Function | Private method that combines common logic for R2C plan creation |
| create_r2c_pencil | dtfft_plan | Subroutine | R2C Plan Constructor with pencil |
| create_r2r | dtfft_plan | Subroutine | R2R Plan Constructor |
| create_r2r_internal | dtfft_plan | Function | Creates plan for R2R plans |
| create_r2r_pencil | dtfft_plan | Subroutine | R2R Plan Constructor |
| create_subcomm | dtfft_utils | Subroutine | Creates communicator with selected processes from |
| create_subcomm_include_all | dtfft_utils | Subroutine | Creates communicator including all processes from |
| create_transpose_2d | dtfft_transpose_handle_datatype | Subroutine | Creates two-dimensional transposition datatypes |
| create_transpose_XZ | dtfft_transpose_handle_datatype | Subroutine | Creates three-dimensional X -> Z transposition datatypes. Can only be used with 3D slab decomposition when slabs are distributed in Z direction |
| create_transpose_ZX | dtfft_transpose_handle_datatype | Subroutine | Creates three-dimensional Z -> X transposition datatypes. Can only be used with 3D slab decomposition when slabs are distributed in Z direction |
| cudaDeviceSynchronize | dtfft_interface_cuda_runtime | Interface | |
| cudaEventCreate | dtfft_interface_cuda_runtime | Interface | |
| cudaEventCreateWithFlags | dtfft_interface_cuda_runtime | Interface | |
| cudaEventDestroy | dtfft_interface_cuda_runtime | Interface | |
| cudaEventElapsedTime | dtfft_interface_cuda_runtime | Interface | |
| cudaEventRecord | dtfft_interface_cuda_runtime | Interface | |
| cudaEventSynchronize | dtfft_interface_cuda_runtime | Interface | |
| cudaFree | dtfft_interface_cuda_runtime | Interface | |
| cudaGetDevice | dtfft_interface_cuda_runtime | Interface | |
| cudaGetDeviceCount | dtfft_interface_cuda_runtime | Interface | |
| cudaGetErrorString | dtfft_interface_cuda_runtime | Function | Helper function that returns a string describing the given nvrtcResult code. If the error code is not recognized, “unrecognized error code” is returned. |
| cudaGetErrorString_c | dtfft_interface_cuda_runtime | Interface | |
| cudaGetLastError | dtfft_interface_cuda_runtime | Interface | |
| cudaMalloc | dtfft_interface_cuda_runtime | Interface | |
| cudaMemcpy | dtfft_interface_cuda_runtime | Interface | Copies data synchronously between host and device. |
| cudaMemcpyAsync | dtfft_interface_cuda_runtime | Interface | Copies data asynchronously between host and device. |
| cudaMemGetInfo | dtfft_interface_cuda_runtime | Interface | |
| cudaMemset | dtfft_interface_cuda_runtime | Interface | |
| cudaSetDevice | dtfft_interface_cuda_runtime | Interface | |
| cudaStreamCreate | dtfft_interface_cuda_runtime | Interface | |
| cudaStreamDestroy | dtfft_interface_cuda_runtime | Interface | |
| cudaStreamQuery | dtfft_interface_cuda_runtime | Interface | |
| cudaStreamSynchronize | dtfft_interface_cuda_runtime | Interface | |
| cudaStreamWaitEvent | dtfft_interface_cuda_runtime | Interface | |
| cufftDestroy | dtfft_interface_cufft | Interface | Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure. |
| cufftGetErrorString | dtfft_interface_cufft | Function | Returns a string representation of the cuFFT error code. |
| cufftMpAttachReshapeComm | dtfft_interface_cufft | Interface | Attaches a communication handle to a reshape. This function is not collective. |
| cufftMpCreateReshape | dtfft_interface_cufft | Interface | Initializes a reshape handle for future use. This function is not collective. |
| cufftMpDestroyReshape | dtfft_interface_cufft | Interface | Destroys a reshape and all its associated data. |
| cufftMpExecReshapeAsync | dtfft_interface_cufft | Interface | Executes the reshape, redistributing data_in into data_out using the workspace in workspace. |
| cufftMpGetReshapeSize | dtfft_interface_cufft | Interface | Returns the amount (in bytes) of workspace required to execute the handle. |
| cufftMpMakeReshape | dtfft_interface_cufft | Interface | Creates a reshape intended to re-distribute a global array of 3D data. |
| cufftPlanMany | dtfft_interface_cufft | Interface | Creates an FFT plan configuration of dimension rank, with sizes specified in the array n. |
| cufftSetStream | dtfft_interface_cufft | Interface | Associates a CUDA stream with a cuFFT plan. |
| cufftXtExec | dtfft_interface_cufft | Interface | Executes any cuFFT transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction parameter is ignored. |
| cuLaunchKernel | dtfft_interface_cuda | Function | Launches a CUDA kernel |
| destroy | dtfft_transpose_plan | Subroutine | Destroys transposition plans |
| destroy | dtfft_abstract_kernel | Subroutine | Destroys kernel |
| destroy | dtfft_transpose_handle_generic | Subroutine | Destroys Generic Transpose Handle |
| destroy | dtfft_backend_cufftmp_m | Subroutine | Destroys cuFFTMp GPU Backend |
| destroy | dtfft_pencil | Subroutine | Destroys pencil |
| destroy | dtfft_abstract_executor | Subroutine | Destroys plan |
| destroy | dtfft_executor_fftw_m | Subroutine | Destroys FFTW3 plan |
| destroy | dtfft_kernel_device | Subroutine | Destroys kernel |
| destroy | dtfft_executor_mkl_m | Subroutine | Destroys MKL plan |
| destroy | dtfft_executor_cufft_m | Subroutine | Destroys cuFFT plan |
| destroy | dtfft_nvrtc_module | Subroutine | Destroys module and frees resources |
| destroy | dtfft_executor_vkfft_m | Subroutine | Destroys vkFFT plan |
| destroy | dtfft_plan | Subroutine | Destroys plan, frees all memory |
| destroy | dtfft_abstract_backend | Subroutine | Destroys Abstract Backend |
| destroy | dtfft_transpose_handle_datatype | Subroutine | Destroys |
| destroy_data_handle | dtfft_transpose_handle_generic | Subroutine | Destroys handle |
| destroy_handle | dtfft_transpose_handle_datatype | Subroutine | Destroys transposition handle |
| destroy_helper | dtfft_backend_mpi | Subroutine | Destroys MPI helper |
| destroy_helper | dtfft_abstract_backend | Subroutine | Destroys helper |
| destroy_host | dtfft_kernel_host | Subroutine | Destroys host kernel |
| destroy_mpi | dtfft_backend_mpi | Subroutine | Destroys MPI backend |
| destroy_nccl | dtfft_backend_nccl_m | Subroutine | Destroys NCCL backend |
| destroy_pencil_init | dtfft_pencil | Subroutine | Destroys pencil_init |
| destroy_pencil_t | dtfft_pencil | Subroutine | Destroys pencil |
| destroy_pencil_t_private | dtfft_pencil | Subroutine | Destroys pencil |
| destroy_plans | dtfft_transpose_plan | Subroutine | Destroys array of plans |
| destroy_stream | dtfft_config | Subroutine | Destroy the default stream if it was created |
| destroy_string | dtfft_utils | Subroutine | |
| destroy_strings | dtfft_utils | Subroutine | Destroys array of string objects |
| DftiErrorMessage | dtfft_interface_mkl_m | Function | Generates an error message. |
| DftiErrorMessage_c | dtfft_interface_mkl_m | Interface | |
| dl_error | dtfft_utils | Subroutine | Writes error message to the error unit |
| dlclose | dtfft_utils | Interface | |
| dlerror | dtfft_utils | Interface | |
| dlopen | dtfft_utils | Interface | |
| dlsym | dtfft_utils | Interface | |
| double_to_string | dtfft_utils | Function | Convert double to string |
| dtfft_config_t | dtfft_config | Interface | Interface to create a new configuration |
| dtfft_create_config | dtfft_config | Subroutine | Creates a new configuration and sets default values. |
| dtfft_create_plan_c2c_c | dtfft_api | Function | Creates C2C dtFFT Plan, allocates all structures and prepares FFT, C interface |
| dtfft_create_plan_c2c_pencil_c | dtfft_api | Function | Creates C2C dtFFT plan from Pencil, allocates all structures and prepares FFT, C interface |
| dtfft_create_plan_r2r_c | dtfft_api | Function | Creates R2R dtFFT Plan, allocates all structures and prepares FFT, C interface |
| dtfft_create_plan_r2r_pencil_c | dtfft_api | Function | Creates R2R dtFFT Plan from Pencil, allocates all structures and prepares FFT, C interface |
| dtfft_destroy_c | dtfft_api | Function | Destroys dtFFT Plan, C interface |
| dtfft_execute_c | dtfft_api | Function | Executes dtFFT Plan, C interface. |
| dtfft_get_alloc_bytes_c | dtfft_api | Function | Returns minimum number of bytes required to execute plan, C interface |
| dtfft_get_alloc_size_c | dtfft_api | Function | Returns minimum number of bytes to be allocated for |
| dtfft_get_backend_c | dtfft_api | Function | Returns selected dtfft_backend_t during autotuning |
| dtfft_get_backend_string | dtfft_parameters | Function | Gets the string description of a backend |
| dtfft_get_backend_string_c | dtfft_api | Subroutine | Returns string representation of |
| dtfft_get_cuda_stream | dtfft_parameters | Function | Returns the CUDA stream from dtfft_stream_t |
| dtfft_get_dims_c | dtfft_api | Function | Returns dimensions of plan, C interface |
| dtfft_get_element_size_c | dtfft_api | Function | Returns size of element in bytes, C interface |
| dtfft_get_error_string | dtfft_errors | Function | Gets the string description of an error code |
| dtfft_get_error_string_c | dtfft_api | Subroutine | Returns an explanation of |
| dtfft_get_executor_c | dtfft_api | Function | Returns executor type used in plan, C interface |
| dtfft_get_executor_string | dtfft_parameters | Function | Gets the string description of an executor |
| dtfft_get_executor_string_c | dtfft_api | Subroutine | Returns string representation of |
| dtfft_get_grid_dims_c | dtfft_api | Function | Returns grid decomposition dimensions of plan, C interface |
| dtfft_get_local_sizes_c | dtfft_api | Function | Returns local sizes, counts in real and Fourier spaces and number of elements to be allocated for |
| dtfft_get_pencil_c | dtfft_api | Function | Returns pencil decomposition info, C interface |
| dtfft_get_platform_c | dtfft_api | Function | Returns selected dtfft_platform_t during autotuning |
| dtfft_get_precision_c | dtfft_api | Function | Returns precision used in plan, C interface |
| dtfft_get_precision_string | dtfft_parameters | Function | Gets the string description of a precision |
| dtfft_get_precision_string_c | dtfft_api | Subroutine | Returns string representation of |
| dtfft_get_stream_c | dtfft_api | Function | Returns Stream associated with plan |
| dtfft_get_version | dtfft_parameters | Interface | Get dtFFT version |
| dtfft_get_version_current | dtfft_parameters | Function | Returns the current version code |
| dtfft_get_version_required | dtfft_parameters | Function | Returns the version code required by the user |
| dtfft_get_y_slab_enabled_c | dtfft_api | Function | Checks if dtFFT Plan is using Y-slab optimization |
| dtfft_get_z_slab_enabled_c | dtfft_api | Function | Checks if dtFFT Plan is using Z-slab optimization |
| dtfft_mem_alloc_c | dtfft_api | Function | Allocates memory for dtFFT Plan, C interface |
| dtfft_mem_free_c | dtfft_api | Function | Frees memory for dtFFT Plan, C interface |
| dtfft_pencil_t | dtfft_pencil | Interface | Type-bound constructor for dtfft_pencil_t |
| dtfft_report_c | dtfft_api | Function | Reports dtFFT Plan, C interface |
| dtfft_set_config | dtfft_config | Subroutine | Sets configuration parameters |
| dtfft_set_config_c | dtfft_api | Function | Sets dtFFT configuration, C interface |
| dtfft_stream_t | dtfft_parameters | Interface | Creates dtfft_stream_t from integer(cuda_stream_kind) |
| dtfft_transpose_c | dtfft_api | Function | Executes single transposition, C interface. |
| dtfft_transpose_end_c | dtfft_api | Function | Finishes asynchronous transposition, C interface. |
| dtfft_transpose_start_c | dtfft_api | Function | Starts asynchronous transposition, returns transpose handle, C interface. |
| dynamic_load | dtfft_utils | Function | Dynamically loads library and its symbols |
| effort_eq | dtfft_parameters | Function | |
| effort_ne | dtfft_parameters | Function | |
| estimate_bank_conflict_ratio | dtfft_nvrtc_block_optimizer | Function | Estimates the bank conflict ratio for a given kernel configuration |
| estimate_coalescing | dtfft_nvrtc_block_optimizer | Function | Estimate memory coalescing efficiency for a given kernel configuration and transpose type |
| estimate_memory_pressure | dtfft_nvrtc_block_optimizer | Function | Analytical estimation of memory pressure based on GPU architecture |
| estimate_occupancy | dtfft_nvrtc_block_optimizer | Function | Calculates theoretical occupancy for a given kernel configuration |
| estimate_optimal_padding | dtfft_nvrtc_block_optimizer | Function | Estimates the optimal padding for a given tile size and element size |
| evaluate_analytical_performance | dtfft_nvrtc_block_optimizer | Function | This function evaluates the performance of a kernel configuration based on various architectural and problem-specific parameters. |
| exec_eq | dtfft_parameters | Function | |
| execute | dtfft_transpose_plan | Subroutine | Executes transposition |
| execute | dtfft_abstract_kernel | Subroutine | Executes kernel |
| execute | dtfft_transpose_handle_generic | Subroutine | Executes transpose - exchange - unpack |
| execute | dtfft_backend_cufftmp_m | Subroutine | Executes cuFFTMp GPU Backend |
| execute | dtfft_abstract_executor | Subroutine | Executes plan |
| execute | dtfft_executor_fftw_m | Subroutine | Executes FFTW3 plan |
| execute | dtfft_kernel_device | Subroutine | Executes kernel on stream |
| execute | dtfft_executor_mkl_m | Subroutine | Executes MKL plan |
| execute | dtfft_executor_cufft_m | Subroutine | Executes cuFFT plan |
| execute | dtfft_executor_vkfft_m | Subroutine | Executes vkFFT plan |
| execute | dtfft_plan | Subroutine | Executes plan |
| execute | dtfft_abstract_backend | Subroutine | Executes Backend |
| execute | dtfft_transpose_handle_datatype | Subroutine | Executes transposition |
| execute_2d | dtfft_plan | Subroutine | Executes plan with specified auxiliary buffer |
| execute_a2a | dtfft_backend_mpi | Subroutine | |
| execute_benchmark | dtfft_kernel_host | Subroutine | Executes benchmark for the given kernel |
| execute_end | dtfft_transpose_plan | Subroutine | Finishes asynchronous transposition |
| execute_end | dtfft_transpose_handle_generic | Subroutine | Ends execution of transposition |
| execute_end | dtfft_abstract_backend | Subroutine | Ends execution of Backend |
| execute_end | dtfft_transpose_handle_datatype | Subroutine | Ends execution of transposition |
| execute_end_mpi | dtfft_backend_mpi | Subroutine | |
| execute_f128 | dtfft_kernel_host | Subroutine | Executes kernel based on its type and access mode, complex(real64) version |
| execute_f128_block_16 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f128_block_32 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f128_block_64 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f32 | dtfft_kernel_host | Subroutine | Executes kernel based on its type and access mode, real(real32) version |
| execute_f32_block_16 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f32_block_32 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f32_block_64 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f64 | dtfft_kernel_host | Subroutine | Executes kernel based on its type and access mode, real(real64) version |
| execute_f64_block_16 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f64_block_32 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_f64_block_64 | dtfft_kernel_host | Subroutine | Executes the given kernel on host |
| execute_generic | dtfft_plan | Subroutine | Executes plan with specified auxiliary buffer |
| execute_host | dtfft_kernel_host | Subroutine | Executes host kernel |
| execute_mpi | dtfft_backend_mpi | Subroutine | Executes MPI backend |
| execute_nccl | dtfft_backend_nccl_m | Subroutine | Executes NCCL backend |
| execute_p2p | dtfft_backend_mpi | Subroutine | |
| execute_p2p_scheduled | dtfft_backend_mpi | Subroutine | |
| execute_private | dtfft_plan | Subroutine | Executes plan with specified auxiliary buffer |
| execute_ptr | dtfft_plan | Subroutine | Executes plan using type(c_ptr) pointers instead of buffers |
| execute_self_copy | dtfft_abstract_backend | Subroutine | |
| execute_type_eq | dtfft_parameters | Function | |
| execute_type_ne | dtfft_parameters | Function | |
| execute_z_slab | dtfft_plan | Subroutine | Executes plan with specified auxiliary buffer |
| executor_eq | dtfft_parameters | Function | |
| executor_ne | dtfft_parameters | Function | |
| fftw_execute_dft | dtfft_interface_fftw_m | Interface | |
| fftw_execute_dft_c2r | dtfft_interface_fftw_m | Interface | |
| fftw_execute_dft_r2c | dtfft_interface_fftw_m | Interface | |
| fftw_execute_r2r | dtfft_interface_fftw_m | Interface | |
| fftw_plan_many_dft | dtfft_interface_fftw_m | Interface | |
| fftw_plan_many_dft_c2r | dtfft_interface_fftw_m | Interface | |
| fftw_plan_many_dft_r2c | dtfft_interface_fftw_m | Interface | |
| fftw_plan_many_r2r | dtfft_interface_fftw_m | Interface | |
| fftwf_execute_dft | dtfft_interface_fftw_m | Interface | |
| fftwf_execute_dft_c2r | dtfft_interface_fftw_m | Interface | |
| fftwf_execute_dft_r2c | dtfft_interface_fftw_m | Interface | |
| fftwf_execute_r2r | dtfft_interface_fftw_m | Interface | |
| fftwf_plan_many_dft | dtfft_interface_fftw_m | Interface | |
| fftwf_plan_many_dft_c2r | dtfft_interface_fftw_m | Interface | |
| fftwf_plan_many_dft_r2c | dtfft_interface_fftw_m | Interface | |
| fftwf_plan_many_r2r | dtfft_interface_fftw_m | Interface | |
| find_valid_combination | dtfft_nvrtc_block_optimizer | Subroutine | This subroutine optimizes the tile size and number of rows for narrow matrices by adjusting them to be compatible with the warp size. |
| float_to_string | dtfft_utils | Function | Convert float to string |
| free_datatypes | dtfft_transpose_handle_datatype | Subroutine | Frees temporary datatypes |
| free_mem | dtfft_transpose_plan | Subroutine | Frees memory based on |
| generate_candidates | dtfft_nvrtc_block_optimizer | Subroutine | Generate kernel configuration candidates for given problem |
| get | dtfft_nvrtc_module | Function | Returns kernel ready to be executed |
| get_alloc_bytes | dtfft_plan | Function | Returns minimum number of bytes required to execute plan |
| get_alloc_size | dtfft_plan | Function | Wrapper around |
| get_ampere_architecture | dtfft_nvrtc_block_optimizer | Function | Ampere architecture (Compute Capability 8.0) |
| get_async_active | dtfft_transpose_plan | Function | Returns .true. if any of the plans is running asynchronously |
| get_async_active | dtfft_transpose_handle_generic | Function | |
| get_async_active | dtfft_backend_mpi | Function | Returns whether async transpose is active |
| get_async_active | dtfft_abstract_backend | Function | Returns whether async execution is active |
| get_async_active | dtfft_transpose_handle_datatype | Function | Returns whether async transpose is active |
| get_aux_size | dtfft_transpose_plan | Function | Returns maximum auxiliary memory size needed by transpose plan |
| get_aux_size | dtfft_transpose_handle_generic | Function | Returns number of bytes required by aux buffer |
| get_aux_size | dtfft_abstract_transpose_handle | Function | Returns number of bytes required by aux buffer |
| get_aux_size | dtfft_abstract_backend | Function | Returns number of bytes required by aux buffer |
| get_aux_size_generic | dtfft_transpose_plan | Function | Returns maximum auxiliary memory size needed by plans |
| get_backend | dtfft_transpose_plan | Function | Returns plan GPU backend |
| get_backend | dtfft_plan | Function | Returns selected GPU backend during autotuning |
| get_code | dtfft_nvrtc_module | Function | Generates code that will be used to locally transpose data and prepare it for sending to other processes |
| get_comm | dtfft_api | Function | Converts C communicator to Fortran communicator |
| get_conf_backend | dtfft_config | Function | Returns backend set by the user or default one |
| get_conf_configs_to_test | dtfft_config | Function | Returns the number of configurations to test |
| get_conf_datatype_enabled | dtfft_config | Function | Whether MPI Datatype backend is enabled or not |
| get_conf_forced_kernel_optimization | dtfft_config | Function | Whether forced kernel optimization is enabled or not |
| get_conf_internal | dtfft_config | Interface | Returns value from configuration unless environment variable is set |
| get_conf_internal_int32 | dtfft_config | Function | Returns value from configuration unless environment variable is set |
| get_conf_internal_logical | dtfft_config | Function | Returns value from configuration unless environment variable is set |
| get_conf_kernel_optimization_enabled | dtfft_config | Function | Whether kernel optimization is enabled or not |
| get_conf_log_enabled | dtfft_config | Function | Whether logging is enabled or not |
| get_conf_measure_iters | dtfft_config | Function | Returns the number of measurement iterations |
| get_conf_measure_warmup_iters | dtfft_config | Function | Returns the number of warmup iterations |
| get_conf_mpi_enabled | dtfft_config | Function | Whether MPI backends are enabled or not |
| get_conf_nccl_enabled | dtfft_config | Function | Whether NCCL backends are enabled or not |
| get_conf_nvshmem_enabled | dtfft_config | Function | Whether nvshmem backends are enabled or not |
| get_conf_pipelined_enabled | dtfft_config | Function | Whether pipelined backends are enabled or not |
| get_conf_platform | dtfft_config | Function | Returns platform set by the user or default one |
| get_conf_stream | dtfft_config | Function | Returns the custom stream provided by the user, or creates a new one |
| get_conf_y_slab_enabled | dtfft_config | Function | Whether Y-slab optimization is enabled or not |
| get_conf_z_slab_enabled | dtfft_config | Function | Whether Z-slab optimization is enabled or not |
| get_correct_backend | dtfft_config | Function | |
| get_datatype_from_env | dtfft_config | Function | Obtains datatype id from environment variable |
| get_device_props | dtfft_interface_cuda_runtime | Interface | |
| get_dims | dtfft_plan | Subroutine | Returns global dimensions |
| get_element_size | dtfft_plan | Function | Returns number of bytes required to store single element. |
| get_env | dtfft_config | Interface | Obtains environment variable |
| get_env_base | dtfft_config | Function | Base function of obtaining dtFFT environment variable |
| get_env_int32 | dtfft_config | Function | Base Integer function of obtaining dtFFT environment variable |
| get_env_int8 | dtfft_config | Function | Obtains int8 environment variable |
| get_env_logical | dtfft_config | Function | Obtains logical environment variable |
| get_env_string | dtfft_config | Function | Obtains string environment variable |
| get_executor | dtfft_plan | Function | Returns FFT Executor associated with plan |
| get_grid_dims | dtfft_plan | Subroutine | Returns grid decomposition dimensions |
| get_host_kernel_string | dtfft_kernel_host | Function | Returns string representation of the given host kernel type |
| get_inverse_kind | dtfft_utils | Function | Get the inverse R2R kind of transform for the given R2R kind |
| get_kernel | dtfft_kernel_device | Subroutine | Compiles kernel and caches it. Returns compiled kernel. |
| get_kernel_args | dtfft_kernel_device | Subroutine | Populates kernel arguments based on kernel type |
| get_kernel_instance | dtfft_nvrtc_module_cache | Function | Retrieves a kernel instance from the cache. If the instance is not found, an error is raised |
| get_kernel_launch_params | dtfft_kernel_device | Subroutine | Computes kernel launch parameters based on kernel type and dimensions |
| get_kernel_string | dtfft_abstract_kernel | Function | Gets the string description of a kernel |
| get_local_size | dtfft_pencil | Subroutine | Computes local portions of data based on global count and position inside grid communicator |
| get_local_sizes | dtfft_pencil | Subroutine | Obtain local starts and counts in |
| get_local_sizes | dtfft_plan | Subroutine | Obtain local starts and counts in |
| get_mangled_name | dtfft_nvrtc_module | Function | Gets mangled name for given template parameters from nvRTC program |
| get_name_expression | dtfft_nvrtc_module | Function | Generates name expression for given template parameters |
| get_pencil | dtfft_plan | Function | Returns pencil decomposition |
| get_plan_execution_time | dtfft_transpose_plan | Function | Creates transpose plan for backend |
| get_platform | dtfft_plan | Function | Returns execution platform of the plan (HOST or CUDA) |
| get_precision | dtfft_plan | Function | Returns precision of the plan |
| get_stream_int64 | dtfft_plan | Subroutine | Returns CUDA stream associated with plan |
| get_stream_ptr | dtfft_plan | Subroutine | Returns CUDA stream associated with plan |
| get_transpose_type | dtfft_pencil | Function | Determines transpose ID based on pencils |
| get_varying_dim | dtfft_pencil | Function | |
| get_volta_architecture | dtfft_nvrtc_block_optimizer | Function | Volta architecture (Compute Capability 7.0) |
| get_y_slab_enabled | dtfft_plan | Function | Returns a logical value indicating whether Y-slab optimization is enabled internally |
| get_z_slab | dtfft_transpose_plan | Function | Returns .true. if Z-slab optimization is enabled |
| get_z_slab_enabled | dtfft_plan | Function | Returns a logical value indicating whether Z-slab optimization is enabled internally |
| host_kernel_eq | dtfft_kernel_host | Function | |
| init_environment | dtfft_config | Subroutine | |
| init_internal | dtfft_config | Function | Checks if MPI is initialized and loads environment variables |
| init_nvshmem | dtfft_interface_nvshmem | Interface | |
| int32_to_string | dtfft_utils | Function | Convert 32-bit integer to string |
| int64_to_string | dtfft_utils | Function | Convert 64-bit integer to string |
| int8_to_string | dtfft_utils | Function | Convert 8-bit integer to string |
| is_backend_cufftmp | dtfft_parameters | Function | |
| is_backend_mpi | dtfft_parameters | Function | |
| is_backend_nccl | dtfft_parameters | Function | |
| is_backend_nvshmem | dtfft_parameters | Function | |
| is_backend_pipelined | dtfft_parameters | Function | |
| is_cuda_executor | dtfft_parameters | Function | |
| is_device_ptr | dtfft_utils | Interface | |
| is_host_executor | dtfft_parameters | Function | |
| is_null_funptr | dtfft_utils | Function | Checks if pointer is NULL |
| is_null_ptr | dtfft_utils | Function | Checks if pointer is NULL |
| is_null_ptr | dtfft_utils | Interface | Checks if pointer is NULL |
| is_nvshmem_ptr | dtfft_interface_nvshmem | Function | Checks if pointer is a symmetric nvshmem allocated pointer |
| is_same_ptr | dtfft_utils | Function | Checks if two pointers are the same |
| is_transpose_kernel | dtfft_abstract_kernel | Function | |
| is_unpack_kernel | dtfft_abstract_kernel | Function | |
| is_valid_backend | dtfft_parameters | Function | |
| is_valid_comm_type | dtfft_parameters | Function | |
| is_valid_dimension | dtfft_parameters | Function | |
| is_valid_effort | dtfft_parameters | Function | |
| is_valid_execute_type | dtfft_parameters | Function | |
| is_valid_executor | dtfft_parameters | Function | |
| is_valid_platform | dtfft_parameters | Function | |
| is_valid_precision | dtfft_parameters | Function | |
| is_valid_r2r_kind | dtfft_parameters | Function | |
| is_valid_transpose_type | dtfft_parameters | Function | |
| kernel_type_eq | dtfft_abstract_kernel | Function | |
| kernel_type_ne | dtfft_abstract_kernel | Function | |
| load | dtfft_interface_vkfft_m | Function | Loads VkFFT library |
| load_cuda | dtfft_interface_cuda | Function | Loads the CUDA Driver library and needed symbols |
| load_library | dtfft_utils | Function | Dynamically loads library |
| load_nvrtc | dtfft_interface_nvrtc | Function | Dynamically loads nvRTC library and its functions |
| load_symbol | dtfft_utils | Function | Dynamically loads symbol from library |
| load_vkfft | dtfft_interface_vkfft_m | Function | Loads VkFFT library based on the platform |
| make_plan | dtfft_executor_mkl_m | Subroutine | Creates general MKL plan |
| make_public | dtfft_pencil | Function | Creates a public object that users can use to create their own FFT backends |
| mem_alloc | dtfft_transpose_plan | Subroutine | Allocates memory based on selected backend |
| mem_alloc | dtfft_executor_fftw_m | Subroutine | Allocates FFTW3 memory |
| mem_alloc | dtfft_executor_mkl_m | Subroutine | Allocates MKL memory |
| mem_alloc | dtfft_executor_cufft_m | Subroutine | Dummy method. Raises |
| mem_alloc | dtfft_executor_vkfft_m | Subroutine | Dummy method. Raises |
| mem_alloc_c32_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
| mem_alloc_c32_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
| mem_alloc_c32_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
| mem_alloc_c64_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
| mem_alloc_c64_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
| mem_alloc_c64_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
| mem_alloc_host | dtfft_utils | Function | Allocates memory using the C11 standard aligned_alloc with 16-byte alignment |
| mem_alloc_ptr | dtfft_plan | Function | Allocates memory specific for this plan |
| mem_alloc_r32_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
| mem_alloc_r32_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
| mem_alloc_r32_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
| mem_alloc_r64_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
| mem_alloc_r64_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
| mem_alloc_r64_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
| mem_free | dtfft_transpose_plan | Subroutine | Frees memory allocated with mem_alloc |
| mem_free | dtfft_executor_fftw_m | Subroutine | Frees FFTW3 aligned memory |
| mem_free | dtfft_executor_mkl_m | Subroutine | Frees MKL aligned memory |
| mem_free | dtfft_executor_cufft_m | Subroutine | Dummy method. Raises |
| mem_free | dtfft_executor_vkfft_m | Subroutine | Dummy method. Raises |
| mem_free_c32_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_c32_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_c32_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_c64_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_c64_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_c64_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_host | dtfft_utils | Interface | |
| mem_free_ptr | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_r32_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_r32_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_r32_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_r64_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_r64_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mem_free_r64_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
| mkl_dfti_commit_desc | dtfft_interface_mkl_m | Interface | |
| mkl_dfti_create_desc | dtfft_interface_mkl_m | Interface | |
| mkl_dfti_execute | dtfft_interface_mkl_m | Interface | |
| mkl_dfti_free_desc | dtfft_interface_mkl_m | Interface | |
| mkl_dfti_mem_alloc | dtfft_interface_mkl_m | Interface | |
| mkl_dfti_mem_free | dtfft_interface_mkl_m | Interface | |
| mkl_dfti_set_value | dtfft_interface_mkl_m | Interface | Sets one particular configuration parameter with the specified configuration value. |
| ncclCommDeregister | dtfft_interface_nccl | Interface | Deregister a buffer for collective communication. |
| ncclCommDestroy | dtfft_interface_nccl | Interface | Destroy a communicator object comm. |
| ncclCommInitRank | dtfft_interface_nccl | Interface | Creates a new communicator (multi thread/process version). |
| ncclCommRegister | dtfft_interface_nccl | Interface | Register a buffer for collective communication. |
| ncclGetErrorString | dtfft_interface_nccl | Function | Generates an error message. |
| ncclGetErrorString_c | dtfft_interface_nccl | Interface | Returns a human-readable string corresponding to the passed error code. |
| ncclGetUniqueId | dtfft_interface_nccl | Interface | Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator and the Id should be distributed to all ranks in the communicator before calling ncclCommInitRank. uniqueId should point to a ncclUniqueId object allocated by the user. |
| ncclGroupEnd | dtfft_interface_nccl | Interface | End a group call. |
| ncclGroupStart | dtfft_interface_nccl | Interface | Start a group call. |
| ncclMemAlloc | dtfft_interface_nccl | Interface | Allocate a GPU buffer with size. Allocated buffer head address will be returned by ptr, and the actual allocated size can be larger than requested because of the buffer granularity requirements from all types of NCCL optimizations. |
| ncclMemFree | dtfft_interface_nccl | Interface | Free memory allocated by ncclMemAlloc(). |
| ncclRecv | dtfft_interface_nccl | Interface | Receive data from rank peer into recvbuff. |
| ncclSend | dtfft_interface_nccl | Interface | Send data from sendbuff to rank peer. |
| nvrtcGetErrorString | dtfft_interface_nvrtc | Function | Helper function that returns a string describing the given nvrtcResult code. For unrecognized enumeration values, it returns “NVRTC_ERROR unknown” |
| nvshmem_finalize_ | dtfft_interface_nvshmem | Interface | |
| nvshmem_free | dtfft_interface_nvshmem | Interface | |
| nvshmem_malloc | dtfft_interface_nvshmem | Interface | |
| nvshmem_my_pe | dtfft_interface_nvshmem | Interface | |
| nvshmem_ptr | dtfft_interface_nvshmem | Interface | |
| nvshmemx_float_alltoall_on_stream | dtfft_interface_nvshmem | Interface | |
| nvshmemx_init_status | dtfft_interface_nvshmem | Interface | |
| nvshmemx_sync_all_on_stream | dtfft_interface_nvshmem | Interface | |
| nvtxDomainCreate_c | dtfft_interface_nvtx | Interface | |
| nvtxDomainRangePop_c | dtfft_interface_nvtx | Interface | |
| nvtxDomainRangePushEx_c | dtfft_interface_nvtx | Interface | |
| operator(/=) | dtfft_parameters | Interface | |
| operator(/=) | dtfft_abstract_kernel | Interface | |
| operator(==) | dtfft_parameters | Interface | |
| operator(==) | dtfft_abstract_kernel | Interface | |
| operator(==) | dtfft_kernel_host | Interface | |
| pencil_c2f | dtfft_pencil | Subroutine | Converts C pencil to Fortran pencil |
| pencil_f2c | dtfft_pencil | Subroutine | Converts Fortran pencil to C pencil |
| permute_backward_end_pipelined_read_f128 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor, contiguous reading, complex(real64) version |
| permute_backward_end_pipelined_read_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor, contiguous reading, real(real32) version |
| permute_backward_end_pipelined_read_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor, contiguous reading, real(real64) version |
| permute_backward_end_pipelined_read_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_read_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f128 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor, contiguous writing, complex(real64) version |
| permute_backward_end_pipelined_write_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor, contiguous writing, real(real32) version |
| permute_backward_end_pipelined_write_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor, contiguous writing, real(real64) version |
| permute_backward_end_pipelined_write_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_pipelined_write_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for a single neighbor |
| permute_backward_end_read_f128 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors, contiguous reading, complex(real64) version |
| permute_backward_end_read_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors, contiguous reading, real(real32) version |
| permute_backward_end_read_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors, contiguous reading, real(real64) version |
| permute_backward_end_read_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_read_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f128 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors, contiguous writing, complex(real64) version |
| permute_backward_end_write_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors, contiguous writing, real(real32) version |
| permute_backward_end_write_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors, contiguous writing, real(real64) version |
| permute_backward_end_write_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_end_write_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation end of a 3D array for all neighbors |
| permute_backward_read_f128 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays, contiguous reading, complex(real64) version |
| permute_backward_read_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays, contiguous reading, real(real32) version |
| permute_backward_read_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays, contiguous reading, real(real64) version |
| permute_backward_read_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_read_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_start_read_f128 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array, contiguous reading, complex(real64) version |
| permute_backward_start_read_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array, contiguous reading, real(real32) version |
| permute_backward_start_read_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array, contiguous reading, real(real64) version |
| permute_backward_start_read_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_read_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f128 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array, contiguous writing, complex(real64) version |
| permute_backward_start_write_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array, contiguous writing, real(real32) version |
| permute_backward_start_write_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array, contiguous writing, real(real64) version |
| permute_backward_start_write_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_start_write_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation start of a 3D array |
| permute_backward_write_f128 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays, contiguous writing, complex(real64) version |
| permute_backward_write_f128_block_16 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f128_block_32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f128_block_64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays, contiguous writing, real(real32) version |
| permute_backward_write_f32_block_16 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f32_block_32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f32_block_64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays, contiguous writing, real(real64) version |
| permute_backward_write_f64_block_16 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f64_block_32 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_backward_write_f64_block_64 | dtfft_kernel_host | Subroutine | Backward permutation of 2D and 3D arrays |
| permute_forward_read_f128 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays, contiguous reading, complex(real64) version |
| permute_forward_read_f128_block_16 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f128_block_32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f128_block_64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays, contiguous reading, real(real32) version |
| permute_forward_read_f32_block_16 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f32_block_32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f32_block_64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays, contiguous reading, real(real64) version |
| permute_forward_read_f64_block_16 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f64_block_32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_read_f64_block_64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f128 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays, contiguous writing, complex(real64) version |
| permute_forward_write_f128_block_16 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f128_block_32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f128_block_64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays, contiguous writing, real(real32) version |
| permute_forward_write_f32_block_16 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f32_block_32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f32_block_64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays, contiguous writing, real(real64) version |
| permute_forward_write_f64_block_16 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f64_block_32 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| permute_forward_write_f64_block_64 | dtfft_kernel_host | Subroutine | Forward permutation of 2D and 3D arrays |
| platform_eq | dtfft_parameters | Function | |
| platform_ne | dtfft_parameters | Function | |
| pop_nvtx_domain_range | dtfft_interface_nvtx | Subroutine | Pops a range from the NVTX domain |
| precision_eq | dtfft_parameters | Function | |
| precision_ne | dtfft_parameters | Function | |
| push_nvtx_domain_range | dtfft_interface_nvtx | Subroutine | Pushes a range to the NVTX domain |
| r2r_kind_eq | dtfft_parameters | Function | |
| r2r_kind_ne | dtfft_parameters | Function | |
| report | dtfft_plan | Subroutine | Prints plan-related information to stdout |
| report_timings | dtfft_transpose_plan | Function | |
| run_autotune_backend | dtfft_transpose_plan | Subroutine | Runs autotune for all backends. Symmetric heap can be allocated after nvshmem_init, which is done during plan creation |
| run_autotune_datatypes | dtfft_transpose_plan | Subroutine | |
| run_permute_backward | test_host_kernels | Subroutine | |
| run_permute_backward_end | test_host_kernels | Subroutine | |
| run_permute_backward_start | test_host_kernels | Subroutine | |
| run_permute_forward | test_host_kernels | Subroutine | |
| run_unpack | test_host_kernels | Subroutine | |
| select_access_mode_f128 | dtfft_kernel_host | Subroutine | Selects the best access mode for host kernels, complex(real64) version |
| select_access_mode_f32 | dtfft_kernel_host | Subroutine | Selects the best access mode for host kernels, real(real32) version |
| select_access_mode_f64 | dtfft_kernel_host | Subroutine | Selects the best access mode for host kernels, real(real64) version |
| select_kernel | dtfft_kernel_host | Function | Selects the kernel implementation based on the given id and base storage size |
| set_name_expression | dtfft_nvrtc_module | Subroutine | Sets name expression for given template parameters to nvRTC program |
| set_unpack_kernel | dtfft_abstract_backend | Subroutine | Sets unpack kernel for pipelined backend |
| sort_by_varying_dim | dtfft_pencil | Subroutine | |
| sort_candidates_by_score | dtfft_nvrtc_block_optimizer | Subroutine | Sorting candidates by their performance scores |
| stream_from_int64 | dtfft_parameters | Function | Creates dtfft_stream_t from integer(cuda_stream_kind) |
| string | dtfft_utils | Interface | Creates string object |
| string_c2f | dtfft_utils | Subroutine | Convert C string to Fortran string |
| string_constructor | dtfft_utils | Function | Creates string object |
| string_f2c | dtfft_utils | Subroutine | Convert Fortran string to C string |
| to_str | dtfft_utils | Interface | Convert various types to string |
| transpose | dtfft_plan | Subroutine | Performs single transposition |
| transpose_end | dtfft_plan | Subroutine | Ends previously started transposition |
| transpose_private | dtfft_plan | Subroutine | Performs single transposition using type(c_ptr) pointers instead of buffers |
| transpose_ptr | dtfft_plan | Subroutine | Performs single transposition using type(c_ptr) pointers instead of buffers |
| transpose_start | dtfft_plan | Function | Starts an asynchronous transpose operation |
| transpose_start_ptr | dtfft_plan | Function | Starts an asynchronous transpose operation using type(c_ptr) pointers instead of buffers |
| transpose_type_eq | dtfft_parameters | Function | |
| transpose_type_ne | dtfft_parameters | Function | |
| unload_library | dtfft_utils | Subroutine | Unloads library |
| unpack_f128 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks, complex(real64) version. |
| unpack_f128_block_16 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f128_block_32 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f128_block_64 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f32 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks, real(real32) version. |
| unpack_f32_block_16 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f32_block_32 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f32_block_64 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f64 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks, real(real64) version. |
| unpack_f64_block_16 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f64_block_32 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_f64_block_64 | dtfft_kernel_host | Subroutine | Unpacks pack of contiguous buffer received from all ranks. |
| unpack_pipelined_f128 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank, complex(real64) version. |
| unpack_pipelined_f128_block_16 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f128_block_32 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f128_block_64 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f32 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank, real(real32) version. |
| unpack_pipelined_f32_block_16 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f32_block_32 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f32_block_64 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f64 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank, real(real64) version. |
| unpack_pipelined_f64_block_16 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f64_block_32 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| unpack_pipelined_f64_block_64 | dtfft_kernel_host | Subroutine | Unpacks part of contiguous buffer received from a single rank. |
| write_message | dtfft_utils | Subroutine | Write message to the specified unit |