Procedure | Location | Procedure Type | Description |
---|---|---|---|
add | dtfft_nvrtc_kernel_cache | Subroutine | Adds new entry to cache |
add_line | dtfft_nvrtc_kernel_generator | Subroutine | Adds new line to CUDA code |
aligned_alloc | dtfft_utils | Interface | |
alloc_and_set_aux | dtfft_transpose_plan_cuda | Function | Allocates auxiliary memory according to the backend and sets it to the plans |
alloc_fft_plans | dtfft_plan | Subroutine | Allocates abstract_executor with required FFT class and populates fft_mapping with similar FFT ids |
alloc_mem | dtfft_abstract_transpose_plan | Subroutine | Allocates memory based on |
astring_f2c | dtfft_utils | Subroutine | Convert Fortran string to C allocatable string |
autotune_grid | dtfft_transpose_plan_host | Subroutine | Creates cartesian communicator and executes various datatypes on it |
autotune_grid | dtfft_transpose_plan_cuda | Subroutine | Creates cartesian grid and runs various backends on it. Can return best backend and execution time |
autotune_grid_decomposition | dtfft_transpose_plan_host | Subroutine | Runs through all possible grid decompositions and selects the best one based on the lowest average execution time |
autotune_grid_decomposition | dtfft_transpose_plan_cuda | Subroutine | Runs through all possible grid decompositions and selects the best one based on the lowest average execution time |
autotune_mpi_datatypes | dtfft_transpose_plan_host | Subroutine | |
autotune_transpose_id | dtfft_transpose_plan_host | Function | Creates forward and backward transpose plans based on source and target data distributions, executes them |
check_aux | dtfft_plan | Subroutine | Checks if aux buffer was passed by user and if not will allocate one internally |
check_continuity | dtfft_pencil | Function | Check if the local pencils cover the global space without gaps |
check_create_args | dtfft_plan | Function | Checks arguments provided by user and sets private variables |
check_device_pointers | dtfft_plan | Function | Checks if device pointers are provided by user |
check_if_even | dtfft_pencil | Function | Checks if data is evenly distributed across processes |
check_overlap | dtfft_pencil | Function | Check if two pencils overlap in ndims-dimensional space |
cleanup | dtfft_nvrtc_kernel_cache | Subroutine | Removes unused modules from cuda context |
Comm_f2c | dtfft_utils | Interface | |
compile_and_cache | dtfft_nvrtc_kernel | Function | Compiles kernel stored in |
config_constructor | dtfft_config | Function | Creates a new configuration |
count_bank_conflicts | dtfft_nvrtc_block_optimizer | Function | Counts bank conflicts for a given tile size, padding, element size, and block rows. |
count_unique | dtfft_utils | Function | Count the number of unique elements in the array |
create | dtfft_pencil | Subroutine | Creates pencil |
create | dtfft_abstract_executor | Function | Creates FFT plan |
create | dtfft_executor_fftw_m | Subroutine | Creates FFT plan via FFTW3 Interface |
create | dtfft_nvrtc_kernel | Subroutine | Creates kernel |
create | dtfft_executor_vkfft_m | Subroutine | Creates FFT plan via vkFFT Interface |
create | dtfft_abstract_transpose_plan | Function | Creates transposition plans |
create | dtfft_transpose_handle_host | Subroutine | Creates |
create | dtfft_executor_mkl_m | Subroutine | Creates FFT plan via MKL DFTI Interface |
create | dtfft_transpose_handle_cuda | Subroutine | Creates CUDA Transpose Handle |
create | dtfft_abstract_backend | Subroutine | Creates Abstract GPU Backend |
create | dtfft_executor_cufft_m | Subroutine | Creates FFT plan via cuFFT Interface |
create | dtfft_nvrtc_kernel_cache | Subroutine | Creates cache |
create | dtfft_backend_cufftmp_m | Subroutine | Creates cuFFTMp GPU Backend |
create_1d_comm | dtfft_pencil | Subroutine | Creates a new 1D communicator based on the fixed dimensions of the current pencil |
create_c2c | dtfft_plan | Subroutine | C2C Plan Constructor |
create_c2c_core | dtfft_plan | Function | Creates plan for both C2C and R2C |
create_c2c_internal | dtfft_plan | Function | Private method that combines common logic for C2C plan creation |
create_c2c_pencil | dtfft_plan | Subroutine | C2C Plan Constructor |
create_cart_comm | dtfft_abstract_transpose_plan | Subroutine | Creates cartesian communicator |
create_cuda | dtfft_transpose_plan_cuda | Function | Creates CUDA transpose plan |
create_data_handle | dtfft_transpose_handle_cuda | Subroutine | Creates handle |
create_device_pointer | dtfft_nvrtc_kernel | Subroutine | Allocates memory on a device and copies |
create_handle | dtfft_transpose_handle_host | Subroutine | Creates transposition handle |
create_helper | dtfft_abstract_backend | Subroutine | Creates helper |
create_helper | dtfft_backend_mpi | Subroutine | Creates MPI helper |
create_mpi | dtfft_backend_mpi | Subroutine | Creates MPI backend |
create_nccl | dtfft_backend_nccl_m | Subroutine | Creates NCCL backend |
create_nvtx_domain | dtfft_interface_nvtx | Subroutine | Creates a new NVTX domain |
create_pencil_init | dtfft_pencil | Function | Creates and validates pencil passed by user to plan constructors |
create_pencil_t | dtfft_pencil | Function | Creates pencil object, that can be used to create dtFFT plans |
create_pencils_and_comm | dtfft_abstract_transpose_plan | Subroutine | Creates cartesian communicator |
create_private | dtfft_transpose_plan_host | Function | Creates transposition plans |
create_private | dtfft_plan | Function | Creates core |
create_r2c | dtfft_plan | Subroutine | R2C Generic Plan Constructor |
create_r2c_internal | dtfft_plan | Function | Private method that combines common logic for R2C plan creation |
create_r2c_pencil | dtfft_plan | Subroutine | R2C Plan Constructor with pencil |
create_r2r | dtfft_plan | Subroutine | R2R Plan Constructor |
create_r2r_internal | dtfft_plan | Function | Creates plan for R2R plans |
create_r2r_pencil | dtfft_plan | Subroutine | R2R Plan Constructor |
create_transpose_2d | dtfft_transpose_handle_host | Subroutine | Creates two-dimensional transposition datatypes |
create_transpose_XY | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional X –> Y, Y –> X transposition datatypes |
create_transpose_XZ | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional X –> Z transposition datatypes. Can only be used with 3D slab decomposition when slabs are distributed in Z direction |
create_transpose_YZ | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional Y –> Z, Z –> Y transposition datatypes |
create_transpose_ZX | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional Z –> X transposition datatypes. Can only be used with 3D slab decomposition when slabs are distributed in Z direction |
cudaDeviceSynchronize | dtfft_interface_cuda_runtime | Interface | |
cudaEventCreate | dtfft_interface_cuda_runtime | Interface | |
cudaEventCreateWithFlags | dtfft_interface_cuda_runtime | Interface | |
cudaEventDestroy | dtfft_interface_cuda_runtime | Interface | |
cudaEventElapsedTime | dtfft_interface_cuda_runtime | Interface | |
cudaEventRecord | dtfft_interface_cuda_runtime | Interface | |
cudaEventSynchronize | dtfft_interface_cuda_runtime | Interface | |
cudaFree | dtfft_interface_cuda_runtime | Interface | |
cudaGetDevice | dtfft_interface_cuda_runtime | Interface | |
cudaGetDeviceCount | dtfft_interface_cuda_runtime | Interface | |
cudaGetErrorString | dtfft_interface_cuda_runtime | Function | Helper function that returns a string describing the given CUDA runtime error code. If the error code is not recognized, “unrecognized error code” is returned. |
cudaGetErrorString_c | dtfft_interface_cuda_runtime | Interface | |
cudaGetLastError | dtfft_interface_cuda_runtime | Interface | |
cudaMalloc | dtfft_interface_cuda_runtime | Interface | |
cudaMemcpy | dtfft_interface_cuda_runtime | Interface | Copies data synchronously between host and device. |
cudaMemcpyAsync | dtfft_interface_cuda_runtime | Interface | Copies data asynchronously between host and device. |
cudaMemGetInfo | dtfft_interface_cuda_runtime | Interface | |
cudaMemset | dtfft_interface_cuda_runtime | Interface | |
cudaSetDevice | dtfft_interface_cuda_runtime | Interface | |
cudaStreamCreate | dtfft_interface_cuda_runtime | Interface | |
cudaStreamDestroy | dtfft_interface_cuda_runtime | Interface | |
cudaStreamQuery | dtfft_interface_cuda_runtime | Interface | |
cudaStreamSynchronize | dtfft_interface_cuda_runtime | Interface | |
cudaStreamWaitEvent | dtfft_interface_cuda_runtime | Interface | |
cufftDestroy | dtfft_interface_cufft | Interface | Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure. |
cufftGetErrorString | dtfft_interface_cufft | Function | Returns a string representation of the cuFFT error code. |
cufftMpAttachReshapeComm | dtfft_interface_cufft | Interface | Attaches a communication handle to a reshape. This function is not collective. |
cufftMpCreateReshape | dtfft_interface_cufft | Interface | Initializes a reshape handle for future use. This function is not collective. |
cufftMpDestroyReshape | dtfft_interface_cufft | Interface | Destroys a reshape and all its associated data. |
cufftMpExecReshapeAsync | dtfft_interface_cufft | Interface | Executes the reshape, redistributing data_in into data_out using the workspace in workspace. |
cufftMpGetReshapeSize | dtfft_interface_cufft | Interface | Returns the amount (in bytes) of workspace required to execute the handle. |
cufftMpMakeReshape | dtfft_interface_cufft | Interface | Creates a reshape intended to re-distribute a global array of 3D data. |
cufftPlanMany | dtfft_interface_cufft | Interface | Creates a FFT plan configuration of dimension rank, with sizes specified in the array n. |
cufftSetStream | dtfft_interface_cufft | Interface | Associates a CUDA stream with a cuFFT plan. |
cufftXtExec | dtfft_interface_cufft | Interface | Executes any cuFFT transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction parameter is ignored. |
cuLaunchKernel | dtfft_interface_cuda | Function | Launches a CUDA kernel |
destoy_helper | dtfft_backend_mpi | Subroutine | Destroys MPI helper |
destroy | dtfft_pencil | Subroutine | Destroys pencil |
destroy | dtfft_abstract_executor | Subroutine | Destroys plan |
destroy | dtfft_executor_fftw_m | Subroutine | Destroys FFTW3 plan |
destroy | dtfft_nvrtc_kernel | Subroutine | Destroys kernel |
destroy | dtfft_executor_vkfft_m | Subroutine | Destroys vkFFT plan |
destroy | dtfft_transpose_handle_host | Subroutine | Destroys |
destroy | dtfft_transpose_plan_host | Subroutine | Destroys transposition plans |
destroy | dtfft_executor_mkl_m | Subroutine | Destroys MKL plan |
destroy | dtfft_transpose_handle_cuda | Subroutine | Destroys CUDA Transpose Handle |
destroy | dtfft_abstract_backend | Subroutine | Destroys Abstract GPU Backend |
destroy | dtfft_executor_cufft_m | Subroutine | Destroys cuFFT plan |
destroy | dtfft_plan | Subroutine | Destroys plan, frees all memory |
destroy | dtfft_backend_cufftmp_m | Subroutine | Destroys cuFFTMp GPU Backend |
destroy_code | dtfft_nvrtc_kernel_generator | Subroutine | Frees all memory |
destroy_cuda | dtfft_transpose_plan_cuda | Subroutine | Destroys transposition plans |
destroy_data_handle | dtfft_transpose_handle_cuda | Subroutine | Destroys handle |
destroy_handle | dtfft_transpose_handle_host | Subroutine | Destroys transposition handle |
destroy_helper | dtfft_abstract_backend | Subroutine | Destroys helper |
destroy_mpi | dtfft_backend_mpi | Subroutine | Destroys MPI backend |
destroy_nccl | dtfft_backend_nccl_m | Subroutine | Destroys NCCL backend |
destroy_pencil_init | dtfft_pencil | Subroutine | Destroys pencil_init |
destroy_pencil_t | dtfft_pencil | Subroutine | Destroys pencil |
destroy_pencil_t_private | dtfft_pencil | Subroutine | Destroys pencil |
destroy_stream | dtfft_config | Subroutine | Destroy the default stream if it was created |
destroy_string | dtfft_utils | Subroutine | |
destroy_strings | dtfft_utils | Subroutine | Destroys array of string objects |
DftiErrorMessage | dtfft_interface_mkl_m | Function | Generates an error message. |
DftiErrorMessage_c | dtfft_interface_mkl_m | Interface | |
dl_error | dtfft_utils | Subroutine | Writes error message to the error unit |
dlclose | dtfft_utils | Interface | |
dlerror | dtfft_utils | Interface | |
dlopen | dtfft_utils | Interface | |
dlsym | dtfft_utils | Interface | |
double_to_string | dtfft_utils | Function | Convert double to string |
dtfft_config_t | dtfft_config | Interface | Interface to create a new configuration |
dtfft_create_config | dtfft_config | Subroutine | Creates a new configuration and sets default values. |
dtfft_create_plan_c2c_c | dtfft_api | Function | Creates C2C dtFFT Plan, allocates all structures and prepares FFT, C/C++ interface |
dtfft_create_plan_c2c_pencil_c | dtfft_api | Function | Creates C2C dtFFT plan from Pencil, allocates all structures and prepares FFT, C/C++/Python interface |
dtfft_create_plan_r2r_c | dtfft_api | Function | Creates R2R dtFFT Plan, allocates all structures and prepares FFT, C/C++/Python interface |
dtfft_create_plan_r2r_pencil_c | dtfft_api | Function | Creates R2R dtFFT Plan from Pencil, allocates all structures and prepares FFT, C/C++/Python interface |
dtfft_destroy_c | dtfft_api | Function | Destroys dtFFT Plan, C/C++ interface |
dtfft_execute_c | dtfft_api | Function | Executes dtFFT Plan, C/C++ interface. |
dtfft_get_alloc_bytes_c | dtfft_api | Function | Returns minimum number of bytes required to execute plan, C/C++ interface |
dtfft_get_alloc_size_c | dtfft_api | Function | Returns minimum number of bytes to be allocated for |
dtfft_get_backend_c | dtfft_api | Function | Returns selected dtfft_backend_t during autotuning |
dtfft_get_backend_string | dtfft_parameters | Function | Gets the string description of a GPU backend |
dtfft_get_backend_string_c | dtfft_api | Subroutine | Returns string representation of |
dtfft_get_cuda_stream | dtfft_parameters | Function | Returns the CUDA stream from dtfft_stream_t |
dtfft_get_dims_c | dtfft_api | Function | Returns dimensions of plan, C/C++ interface |
dtfft_get_element_size_c | dtfft_api | Function | Returns size of element in bytes, C/C++ interface |
dtfft_get_error_string | dtfft_errors | Function | Gets the string description of an error code |
dtfft_get_error_string_c | dtfft_api | Subroutine | Returns an explanation of |
dtfft_get_executor_c | dtfft_api | Function | Returns executor type used in plan, C/C++ interface |
dtfft_get_executor_string | dtfft_parameters | Function | Gets the string description of an executor |
dtfft_get_executor_string_c | dtfft_api | Subroutine | |
dtfft_get_local_sizes_c | dtfft_api | Function | Returns local sizes, counts in real and Fourier spaces and number of elements to be allocated for |
dtfft_get_pencil_c | dtfft_api | Function | Returns pencil decomposition info, C/C++ interface |
dtfft_get_platform_c | dtfft_api | Function | Returns selected dtfft_platform_t during autotuning |
dtfft_get_precision_c | dtfft_api | Function | Returns precision used in plan, C/C++ interface |
dtfft_get_precision_string | dtfft_parameters | Function | Gets the string description of a precision |
dtfft_get_precision_string_c | dtfft_api | Subroutine | |
dtfft_get_stream_c | dtfft_api | Function | Returns Stream associated with plan |
dtfft_get_version | dtfft_parameters | Interface | Get dtFFT version |
dtfft_get_version_current | dtfft_parameters | Function | Returns the current version code |
dtfft_get_version_required | dtfft_parameters | Function | Returns the version code required by the user |
dtfft_get_z_slab_enabled_c | dtfft_api | Function | Checks if dtFFT Plan is using Z-slab optimization |
dtfft_mem_alloc_c | dtfft_api | Function | Allocates memory for dtFFT Plan, C/C++ interface |
dtfft_mem_free_c | dtfft_api | Function | Frees memory for dtFFT Plan, C/C++ interface |
dtfft_pencil_t | dtfft_pencil | Interface | Type-bound constructor for dtfft_pencil_t |
dtfft_report_c | dtfft_api | Function | Reports dtFFT Plan, C/C++ interface |
dtfft_set_config | dtfft_config | Subroutine | Sets configuration parameters |
dtfft_set_config_c | dtfft_api | Function | Sets dtFFT configuration, C/C++ interface |
dtfft_stream_t | dtfft_parameters | Interface | Creates dtfft_stream_t from integer(cuda_stream_kind) |
dtfft_transpose_c | dtfft_api | Function | Executes single transposition, C/C++ interface. |
dynamic_load | dtfft_utils | Function | Dynamically loads library and its symbols |
effort_eq | dtfft_parameters | Function | |
effort_ne | dtfft_parameters | Function | |
estimate_bank_conflict_ratio | dtfft_nvrtc_block_optimizer | Function | Estimates the bank conflict ratio for a given kernel configuration |
estimate_coalescing | dtfft_nvrtc_block_optimizer | Function | Estimate memory coalescing efficiency for a given kernel configuration and transpose type |
estimate_memory_pressure | dtfft_nvrtc_block_optimizer | Function | Analytical estimation of memory pressure based on GPU architecture |
estimate_occupancy | dtfft_nvrtc_block_optimizer | Function | Calculates theoretical occupancy for a given kernel configuration |
estimate_optimal_padding | dtfft_nvrtc_block_optimizer | Function | Estimates the optimal padding for a given tile size and element size |
evaluate_analytical_performance | dtfft_nvrtc_block_optimizer | Function | This function evaluates the performance of a kernel configuration based on various architectural and problem-specific parameters. |
execute | dtfft_abstract_executor | Subroutine | Executes plan |
execute | dtfft_executor_fftw_m | Subroutine | Executes FFTW3 plan |
execute | dtfft_nvrtc_kernel | Subroutine | Executes kernel on stream |
execute | dtfft_executor_vkfft_m | Subroutine | Executes vkFFT plan |
execute | dtfft_abstract_transpose_plan | Subroutine | Executes single transposition |
execute | dtfft_transpose_handle_host | Subroutine | Executes transposition |
execute | dtfft_executor_mkl_m | Subroutine | Executes MKL plan |
execute | dtfft_transpose_handle_cuda | Subroutine | Executes transpose - exchange - unpack |
execute | dtfft_abstract_backend | Subroutine | Executes GPU Backend |
execute | dtfft_executor_cufft_m | Subroutine | Executes cuFFT plan |
execute | dtfft_plan | Subroutine | Executes plan |
execute | dtfft_backend_cufftmp_m | Subroutine | Executes cuFFTMp GPU Backend |
execute_cuda | dtfft_transpose_plan_cuda | Subroutine | Executes single transposition |
execute_mpi | dtfft_backend_mpi | Subroutine | Executes MPI backend |
execute_nccl | dtfft_backend_nccl_m | Subroutine | Executes NCCL backend |
execute_private | dtfft_transpose_plan_host | Subroutine | Executes single transposition |
execute_private | dtfft_plan | Subroutine | Executes plan with specified auxiliary buffer |
execute_ptr | dtfft_plan | Subroutine | Executes plan using type(c_ptr) pointers instead of buffers |
execute_type_eq | dtfft_parameters | Function | |
execute_type_ne | dtfft_parameters | Function | |
executor_eq | dtfft_parameters | Function | |
executor_ne | dtfft_parameters | Function | |
fftw_execute_dft | dtfft_interface_fftw_m | Interface | |
fftw_execute_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftw_execute_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftw_execute_r2r | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_dft | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_r2r | dtfft_interface_fftw_m | Interface | |
fftwf_execute_dft | dtfft_interface_fftw_m | Interface | |
fftwf_execute_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftwf_execute_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftwf_execute_r2r | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_dft | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_r2r | dtfft_interface_fftw_m | Interface | |
find_valid_combination | dtfft_nvrtc_block_optimizer | Subroutine | This subroutine optimizes the tile size and number of rows for narrow matrices by adjusting them to be compatible with the warp size. |
float_to_string | dtfft_utils | Function | Convert float to string |
free_datatypes | dtfft_transpose_handle_host | Subroutine | Frees temporary datatypes |
free_mem | dtfft_abstract_transpose_plan | Subroutine | Frees memory based on |
generate_candidates | dtfft_nvrtc_block_optimizer | Subroutine | Generate kernel configuration candidates for given problem |
get | dtfft_nvrtc_kernel_cache | Function | Returns cached kernel if it exists. If not, returns a null pointer. |
get_alloc_bytes | dtfft_plan | Function | Returns minimum number of bytes required to execute plan |
get_alloc_size | dtfft_plan | Function | Wrapper around |
get_ampere_architecture | dtfft_nvrtc_block_optimizer | Function | Ampere architecture (Compute Capability 8.0) |
get_aux_size | dtfft_abstract_transpose_plan | Function | |
get_aux_size | dtfft_transpose_plan_cuda | Function | |
get_aux_size | dtfft_transpose_handle_cuda | Function | Returns number of bytes required by aux buffer |
get_aux_size | dtfft_abstract_backend | Function | Returns number of bytes required by aux buffer |
get_backend | dtfft_abstract_transpose_plan | Function | Returns plan GPU backend |
get_backend | dtfft_plan | Function | Returns selected GPU backend during autotuning |
get_code_init | dtfft_nvrtc_kernel_generator | Subroutine | Generates basic code that is used in all other kernels |
get_comm | dtfft_api | Function | |
get_conf_backend | dtfft_config | Function | Returns GPU backend set by the user or default one |
get_conf_configs_to_test | dtfft_config | Function | Returns the number of configurations to test |
get_conf_forced_kernel_optimization | dtfft_config | Function | Whether forced kernel optimization is enabled or not |
get_conf_internal | dtfft_config | Interface | Returns value from configuration unless environment variable is set |
get_conf_internal_int32 | dtfft_config | Function | Returns value from configuration unless environment variable is set |
get_conf_internal_logical | dtfft_config | Function | Returns value from configuration unless environment variable is set |
get_conf_kernel_optimization_enabled | dtfft_config | Function | Whether kernel optimization is enabled or not |
get_conf_log_enabled | dtfft_config | Function | Whether logging is enabled or not |
get_conf_measure_iters | dtfft_config | Function | Returns the number of measurement iterations |
get_conf_measure_warmup_iters | dtfft_config | Function | Returns the number of warmup iterations |
get_conf_mpi_enabled | dtfft_config | Function | Whether MPI backends are enabled or not |
get_conf_nccl_enabled | dtfft_config | Function | Whether NCCL backends are enabled or not |
get_conf_nvshmem_enabled | dtfft_config | Function | Whether nvshmem backends are enabled or not |
get_conf_pipelined_enabled | dtfft_config | Function | Whether pipelined backends are enabled or not |
get_conf_platform | dtfft_config | Function | Returns platform set by the user or default one |
get_conf_stream | dtfft_config | Function | Returns either the custom stream provided by the user or creates a new one |
get_conf_z_slab_enabled | dtfft_config | Function | Whether Z-slab optimization is enabled or not |
get_contiguous_execution_blocks | dtfft_nvrtc_kernel | Subroutine | Gets the number of blocks and threads for a contiguous execution |
get_datatype_from_env | dtfft_config | Function | Obtains datatype id from environment variable |
get_device_props | dtfft_interface_cuda_runtime | Interface | |
get_dims | dtfft_plan | Subroutine | Returns global dimensions |
get_element_size | dtfft_plan | Function | Returns number of bytes required to store single element. |
get_env | dtfft_config | Interface | Obtains environment variable |
get_env_base | dtfft_config | Function | Base function of obtaining dtFFT environment variable |
get_env_int32 | dtfft_config | Function | Base Integer function of obtaining dtFFT environment variable |
get_env_int8 | dtfft_config | Function | Obtains int8 environment variable |
get_env_logical | dtfft_config | Function | Obtains logical environment variable |
get_env_string | dtfft_config | Function | Obtains string environment variable |
get_executor | dtfft_plan | Function | Returns FFT Executor associated with plan |
get_inverse_kind | dtfft_utils | Function | Get the inverse R2R kind of transform for the given R2R kind |
get_kernel | dtfft_nvrtc_kernel | Subroutine | Compiles kernel and caches it. Returns compiled kernel. |
get_kernel_args | dtfft_nvrtc_kernel | Subroutine | Populates kernel arguments based on kernel type |
get_local_size | dtfft_pencil | Subroutine | Computes local portions of data based on global count and position inside grid communicator |
get_local_sizes | dtfft_pencil | Subroutine | Obtain local starts and counts in |
get_local_sizes | dtfft_plan | Subroutine | Obtain local starts and counts in |
get_neighbor_function_code | dtfft_nvrtc_kernel_generator | Subroutine | Generates a device function that determines the id of the process to which data is being sent, or from which data has been received, based on the local element coordinate |
get_pencil | dtfft_plan | Function | Returns pencil decomposition |
get_plan_execution_time | dtfft_transpose_plan_host | Function | Creates transpose plan and executes it |
get_platform | dtfft_plan | Function | Returns execution platform of the plan (HOST or CUDA) |
get_precision | dtfft_plan | Function | Returns precision of the plan |
get_stream_int64 | dtfft_plan | Subroutine | Returns CUDA stream associated with plan |
get_stream_ptr | dtfft_plan | Subroutine | Returns CUDA stream associated with plan |
get_transpose_kernel | dtfft_nvrtc_kernel | Subroutine | |
get_transpose_kernel_code | dtfft_nvrtc_kernel_generator | Function | Generates code that will be used to locally transpose data and prepare it to be sent to other processes ndims == 2 |
get_transpose_type | dtfft_pencil | Function | Determines transpose ID based on pencils |
get_true_transpose_type | dtfft_nvrtc_kernel_cache | Function | Returns generic transpose id. Since X-Y and Y-Z transpositions are symmetric, it returns only one of them. X-Z and Z-X are not symmetric |
get_unpack_kernel_code | dtfft_nvrtc_kernel_generator | Function | Generates code that will be used to unpack data when it is received |
get_unpack_pipelined_kernel_code | dtfft_nvrtc_kernel_generator | Function | Generates code that will be used to partially unpack data when it is received from another process |
get_varying_dim | dtfft_pencil | Function | |
get_volta_architecture | dtfft_nvrtc_block_optimizer | Function | Volta architecture (Compute Capability 7.0) |
get_z_slab_enabled | dtfft_plan | Function | Returns a logical value indicating whether Z-slab optimization is enabled internally |
gpu_backend_eq | dtfft_parameters | Function | |
gpu_backend_ne | dtfft_parameters | Function | |
init_environment | dtfft_config | Subroutine | |
init_internal | dtfft_config | Function | Checks if MPI is initialized and loads environment variables |
int32_to_string | dtfft_utils | Function | Convert 32-bit integer to string |
int64_to_string | dtfft_utils | Function | Convert 64-bit integer to string |
int8_to_string | dtfft_utils | Function | Convert 8-bit integer to string |
is_backend_cufftmp | dtfft_parameters | Function | |
is_backend_mpi | dtfft_parameters | Function | |
is_backend_nccl | dtfft_parameters | Function | |
is_backend_nvshmem | dtfft_parameters | Function | |
is_backend_pipelined | dtfft_parameters | Function | |
is_cuda_executor | dtfft_parameters | Function | |
is_device_ptr | dtfft_utils | Interface | |
is_host_executor | dtfft_parameters | Function | |
is_null_funptr | dtfft_utils | Function | Checks if pointer is NULL |
is_null_ptr | dtfft_utils | Function | Checks if pointer is NULL |
is_null_ptr | dtfft_utils | Interface | Checks if pointer is NULL |
is_nvshmem_ptr | dtfft_interface_nvshmem | Function | Checks if pointer is a symmetric nvshmem allocated pointer |
is_same_ptr | dtfft_utils | Function | Checks if two pointers are the same |
is_transpose_kernel | dtfft_parameters | Function | |
is_unpack_kernel | dtfft_parameters | Function | |
is_valid_comm_type | dtfft_parameters | Function | |
is_valid_dimension | dtfft_parameters | Function | |
is_valid_effort | dtfft_parameters | Function | |
is_valid_execute_type | dtfft_parameters | Function | |
is_valid_executor | dtfft_parameters | Function | |
is_valid_gpu_backend | dtfft_parameters | Function | |
is_valid_platform | dtfft_parameters | Function | |
is_valid_precision | dtfft_parameters | Function | |
is_valid_r2r_kind | dtfft_parameters | Function | |
is_valid_transpose_type | dtfft_parameters | Function | |
kernel_type_eq | dtfft_parameters | Function | |
kernel_type_ne | dtfft_parameters | Function | |
load | dtfft_interface_vkfft_m | Function | Loads VkFFT library |
load_cuda | dtfft_interface_cuda | Function | Loads the CUDA Driver library and needed symbols |
load_library | dtfft_utils | Function | Dynamically loads library |
load_nvrtc | dtfft_interface_nvrtc | Function | Dynamically loads nvRTC library and its functions |
load_symbol | dtfft_utils | Function | Dynamically loads symbol from library |
load_vkfft | dtfft_interface_vkfft_m | Function | Loads VkFFT library based on the platform |
make_plan | dtfft_executor_mkl_m | Subroutine | Creates general MKL plan |
make_public | dtfft_pencil | Function | Creates public object that users can use to create own FFT backends |
mem_alloc | dtfft_executor_fftw_m | Subroutine | Allocates FFTW3 memory |
mem_alloc | dtfft_executor_vkfft_m | Subroutine | Dummy method. Raises |
mem_alloc | dtfft_abstract_transpose_plan | Subroutine | Allocates memory based on selected backend |
mem_alloc | dtfft_executor_mkl_m | Subroutine | Allocates MKL memory |
mem_alloc | dtfft_executor_cufft_m | Subroutine | Dummy method. Raises |
mem_alloc_c32_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_c32_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_c32_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_alloc_c64_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_c64_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_c64_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_alloc_host | dtfft_utils | Function | Allocates memory using the C11 standard aligned_alloc with 16-byte alignment |
mem_alloc_ptr | dtfft_plan | Subroutine | Allocates memory specific for this plan |
mem_alloc_r32_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_r32_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_r32_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_alloc_r64_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_r64_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_r64_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_free | dtfft_executor_fftw_m | Subroutine | Frees FFTW3 aligned memory |
mem_free | dtfft_executor_vkfft_m | Subroutine | Dummy method. Raises |
mem_free | dtfft_abstract_transpose_plan | Subroutine | Frees memory allocated with mem_alloc |
mem_free | dtfft_executor_mkl_m | Subroutine | Frees MKL aligned memory |
mem_free | dtfft_executor_cufft_m | Subroutine | Dummy method. Raises |
mem_free_c32_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c32_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c32_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c64_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c64_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c64_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_host | dtfft_utils | Interface | |
mem_free_ptr | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r32_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r32_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r32_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r64_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r64_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r64_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mkl_dfti_commit_desc | dtfft_interface_mkl_m | Interface | |
mkl_dfti_create_desc | dtfft_interface_mkl_m | Interface | |
mkl_dfti_execute | dtfft_interface_mkl_m | Interface | |
mkl_dfti_free_desc | dtfft_interface_mkl_m | Interface | |
mkl_dfti_mem_alloc | dtfft_interface_mkl_m | Interface | |
mkl_dfti_mem_free | dtfft_interface_mkl_m | Interface | |
mkl_dfti_set_value | dtfft_interface_mkl_m | Interface | Sets one particular configuration parameter with the specified configuration value. |
ncclCommDeregister | dtfft_interface_nccl | Interface | Deregister a buffer for collective communication. |
ncclCommDestroy | dtfft_interface_nccl | Interface | Destroy a communicator object comm. |
ncclCommInitRank | dtfft_interface_nccl | Interface | Creates a new communicator (multi thread/process version). |
ncclCommRegister | dtfft_interface_nccl | Interface | Register a buffer for collective communication. |
ncclGetErrorString | dtfft_interface_nccl | Function | Generates an error message. |
ncclGetErrorString_c | dtfft_interface_nccl | Interface | Returns a human-readable string corresponding to the passed error code. |
ncclGetUniqueId | dtfft_interface_nccl | Interface | Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator and the Id should be distributed to all ranks in the communicator before calling ncclCommInitRank. uniqueId should point to a ncclUniqueId object allocated by the user. |
ncclGroupEnd | dtfft_interface_nccl | Interface | End a group call. |
ncclGroupStart | dtfft_interface_nccl | Interface | Start a group call. |
ncclMemAlloc | dtfft_interface_nccl | Interface | Allocate a GPU buffer with size. Allocated buffer head address will be returned by ptr, and the actual allocated size can be larger than requested because of the buffer granularity requirements from all types of NCCL optimizations. |
ncclMemFree | dtfft_interface_nccl | Interface | Free memory allocated by ncclMemAlloc(). |
ncclRecv | dtfft_interface_nccl | Interface | Receive data from rank peer into recvbuff. |
ncclSend | dtfft_interface_nccl | Interface | Send data from sendbuff to rank peer. |
nvrtcGetErrorString | dtfft_interface_nvrtc | Function | Helper function that returns a string describing the given nvrtcResult code. For unrecognized enumeration values, it returns “NVRTC_ERROR unknown”. |
nvshmem_free | dtfft_interface_nvshmem | Interface | |
nvshmem_malloc | dtfft_interface_nvshmem | Interface | |
nvshmem_my_pe | dtfft_interface_nvshmem | Interface | |
nvshmem_ptr | dtfft_interface_nvshmem | Interface | |
nvshmemx_float_alltoall_on_stream | dtfft_interface_nvshmem | Interface | |
nvshmemx_init_status | dtfft_interface_nvshmem | Interface | |
nvshmemx_sync_all_on_stream | dtfft_interface_nvshmem | Interface | |
nvtxDomainCreate_c | dtfft_interface_nvtx | Interface | |
nvtxDomainRangePop_c | dtfft_interface_nvtx | Interface | |
nvtxDomainRangePushEx_c | dtfft_interface_nvtx | Interface | |
operator(/=) | dtfft_parameters | Interface | |
operator(==) | dtfft_parameters | Interface | |
pencil_c2f | dtfft_pencil | Subroutine | Converts C pencil to Fortran pencil |
pencil_f2c | dtfft_pencil | Subroutine | Converts Fortran pencil to C pencil |
platform_eq | dtfft_parameters | Function | |
platform_ne | dtfft_parameters | Function | |
pop_nvtx_domain_range | dtfft_interface_nvtx | Subroutine | Pops a range from the NVTX domain |
precision_eq | dtfft_parameters | Function | |
precision_ne | dtfft_parameters | Function | |
push_nvtx_domain_range | dtfft_interface_nvtx | Subroutine | Pushes a range to the NVTX domain |
r2r_kind_eq | dtfft_parameters | Function | |
r2r_kind_ne | dtfft_parameters | Function | |
remove | dtfft_nvrtc_kernel_cache | Subroutine | Takes a CUDA kernel as an argument and searches for it in the cache. If the kernel is found, then reduces |
report | dtfft_plan | Subroutine | Prints plan-related information to stdout |
run_autotune_backend | dtfft_transpose_plan_cuda | Subroutine | Runs autotune for all backends |
run_mpi_a2a | dtfft_backend_mpi | Subroutine | Executes MPI all-to-all communication |
run_mpi_p2p | dtfft_backend_mpi | Subroutine | Executes MPI point-to-point communication |
set_unpack_kernel | dtfft_abstract_backend | Subroutine | Sets unpack kernel for pipelined backend |
sort_by_varying_dim | dtfft_pencil | Subroutine | |
sort_candidates_by_score | dtfft_nvrtc_block_optimizer | Subroutine | Sorting candidates by their performance scores |
stream_from_int64 | dtfft_parameters | Function | Creates dtfft_stream_t from integer(cuda_stream_kind) |
string | dtfft_utils | Interface | Creates string object |
string_c2f | dtfft_utils | Subroutine | Convert C string to Fortran string |
string_constructor | dtfft_utils | Function | Creates string object |
string_f2c | dtfft_utils | Subroutine | Convert Fortran string to C string |
to_cstr | dtfft_nvrtc_kernel_generator | Subroutine | Converts Fortran CUDA code to C pointer |
to_str | dtfft_utils | Interface | Convert various types to string |
transpose | dtfft_plan | Subroutine | Performs single transposition |
transpose_ptr | dtfft_plan | Subroutine | Performs single transposition using type(c_ptr) pointers instead of buffers |
transpose_type_eq | dtfft_parameters | Function | |
transpose_type_ne | dtfft_parameters | Function | |
unload_library | dtfft_utils | Subroutine | Unloads library |
write_message | dtfft_utils | Subroutine | Write message to the specified unit |