Procedure | Location | Procedure Type | Description |
---|---|---|---|
add_line | dtfft_nvrtc_kernel | Subroutine | Adds new line to CUDA code |
alloc_and_set_aux | dtfft_transpose_plan_cuda | Function | Allocates auxiliary memory according to the backend and sets it to the plans |
alloc_fft_plans | dtfft_plan | Subroutine | Allocates abstract_executor with required FFT class and populates fft_mapping with similar FFT ids |
alloc_mem | dtfft_abstract_transpose_plan | Subroutine | Allocates memory based on |
astring_f2c | dtfft_utils | Subroutine | Convert Fortran string to C allocatable string |
autotune_grid | dtfft_transpose_plan_cuda | Subroutine | Creates cartesian grid and runs various backends on it. Can return best backend and execution time |
autotune_grid | dtfft_transpose_plan_host | Subroutine | Creates cartesian communicator and executes various datatypes on it |
autotune_grid_decomposition | dtfft_transpose_plan_cuda | Subroutine | Runs through all possible grid decompositions and selects the best one based on the lowest average execution time |
autotune_grid_decomposition | dtfft_transpose_plan_host | Subroutine | Runs through all possible grid decompositions and selects the best one based on the lowest average execution time |
autotune_mpi_datatypes | dtfft_transpose_plan_host | Subroutine | |
autotune_transpose_id | dtfft_transpose_plan_host | Function | Creates forward and backward transpose plans based on source and target data distributions, executes them |
check_aux | dtfft_plan | Subroutine | Checks whether an aux buffer was passed by the user and, if not, allocates one internally |
check_create_args | dtfft_plan | Function | Checks arguments provided by user and sets private variables |
check_device_pointers | dtfft_plan | Function | Checks if device pointers are provided by user |
clean_unused_cache | dtfft_nvrtc_kernel | Subroutine | Removes unused modules from cuda context |
Comm_f2c | dtfft_utils | Interface | Converts Fortran communicator to C |
compile_and_cache | dtfft_nvrtc_kernel | Function | Compiles kernel and caches it. Returns compiled kernel. |
config_constructor | dtfft_config | Function | Creates a new configuration |
count_unique | dtfft_utils | Function | Count the number of unique elements in the array |
create | dtfft_executor_cufft_m | Subroutine | Creates FFT plan via cuFFT Interface |
create | dtfft_nvrtc_kernel | Subroutine | Creates kernel |
create | dtfft_executor_mkl_m | Subroutine | Creates FFT plan via MKL DFTI Interface |
create | dtfft_abstract_transpose_plan | Function | Creates transposition plans |
create | dtfft_abstract_executor | Function | Creates FFT plan |
create | dtfft_abstract_backend | Subroutine | Creates Abstract GPU Backend |
create | dtfft_transpose_handle_cuda | Subroutine | Creates CUDA Transpose Handle |
create | dtfft_pencil | Subroutine | Creates pencil |
create | dtfft_executor_fftw_m | Subroutine | Creates FFT plan via FFTW3 Interface |
create | dtfft_backend_cufftmp_m | Subroutine | Creates cuFFTMp GPU Backend |
create | dtfft_executor_vkfft_m | Subroutine | Creates FFT plan via vkFFT Interface |
create | dtfft_transpose_handle_host | Subroutine | Creates |
create_c2c | dtfft_plan | Subroutine | C2C Plan Constructor |
create_c2c_internal | dtfft_plan | Function | Creates plan for both C2C and R2C |
create_cart_comm | dtfft_abstract_transpose_plan | Subroutine | Creates cartesian communicator |
create_cuda | dtfft_transpose_plan_cuda | Function | Creates CUDA transpose plan |
create_data_handle | dtfft_transpose_handle_cuda | Subroutine | Creates handle |
create_device_pointer | dtfft_nvrtc_kernel | Subroutine | Allocates memory on a device and copies |
create_handle | dtfft_transpose_handle_host | Subroutine | Creates transposition handle |
create_helper | dtfft_abstract_backend | Subroutine | Creates helper |
create_helper | dtfft_backend_mpi | Subroutine | Creates MPI helper |
create_mpi | dtfft_backend_mpi | Subroutine | Creates MPI backend |
create_nccl | dtfft_backend_nccl_m | Subroutine | Creates NCCL backend |
create_nvtx_domain | dtfft_interface_nvtx | Subroutine | Creates a new NVTX domain |
create_private | dtfft_plan | Function | Creates core |
create_private | dtfft_transpose_plan_host | Function | Creates transposition plans |
create_r2c | dtfft_plan | Subroutine | R2C Generic Plan Constructor |
create_r2r | dtfft_plan | Subroutine | R2R Plan Constructor |
create_transpose_2d | dtfft_transpose_handle_host | Subroutine | Creates two-dimensional transposition datatypes |
create_transpose_XY | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional X -> Y, Y -> X transposition datatypes |
create_transpose_XZ | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional X -> Z transposition datatypes. Can only be used with 3D slab decomposition when slabs are distributed in Z direction |
create_transpose_YZ | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional Y -> Z, Z -> Y transposition datatypes |
create_transpose_ZX | dtfft_transpose_handle_host | Subroutine | Creates three-dimensional Z -> X transposition datatypes. Can only be used with 3D slab decomposition when slabs are distributed in Z direction |
cudaDeviceSynchronize | dtfft_interface_cuda_runtime | Interface | Synchronizes the device, blocking until all preceding tasks in all streams have completed. |
cudaEventCreate | dtfft_interface_cuda_runtime | Interface | Creates an event. |
cudaEventCreateWithFlags | dtfft_interface_cuda_runtime | Interface | Creates an event with the specified flags. |
cudaEventDestroy | dtfft_interface_cuda_runtime | Interface | Destroys an event. |
cudaEventElapsedTime | dtfft_interface_cuda_runtime | Interface | Computes the elapsed time between two events. |
cudaEventRecord | dtfft_interface_cuda_runtime | Interface | Records an event in a stream. |
cudaEventSynchronize | dtfft_interface_cuda_runtime | Interface | Waits for an event to complete. |
cudaFree | dtfft_interface_cuda_runtime | Interface | Frees memory on the device. |
cudaGetDevice | dtfft_interface_cuda_runtime | Interface | Returns the current device. |
cudaGetDeviceCount | dtfft_interface_cuda_runtime | Interface | Returns the number of available devices. |
cudaGetErrorString | dtfft_interface_cuda_runtime | Function | Helper function that returns a string describing the given CUDA error code. If the error code is not recognized, “unrecognized error code” is returned. |
cudaGetErrorString_c | dtfft_interface_cuda_runtime | Interface | Returns the string representation of an error code. |
cudaMalloc | dtfft_interface_cuda_runtime | Interface | Allocates memory on the device. |
cudaMemcpy | dtfft_interface_cuda_runtime | Interface | Copies data synchronously between host and device. |
cudaMemcpyAsync | dtfft_interface_cuda_runtime | Interface | Copies data asynchronously between host and device. |
cudaMemGetInfo | dtfft_interface_cuda_runtime | Interface | Returns the amount of free and total memory on the device. |
cudaMemset | dtfft_interface_cuda_runtime | Interface | Initializes or sets device memory to a value. |
cudaSetDevice | dtfft_interface_cuda_runtime | Interface | Sets the current device. |
cudaStreamCreate | dtfft_interface_cuda_runtime | Interface | Creates an asynchronous stream. |
cudaStreamDestroy | dtfft_interface_cuda_runtime | Interface | Destroys an asynchronous stream. |
cudaStreamQuery | dtfft_interface_cuda_runtime | Interface | Queries an asynchronous stream for completion status. |
cudaStreamSynchronize | dtfft_interface_cuda_runtime | Interface | Waits for stream tasks to complete. |
cudaStreamWaitEvent | dtfft_interface_cuda_runtime | Interface | Makes a stream wait on an event. |
cufftDestroy | dtfft_interface_cufft | Interface | Frees all GPU resources associated with a cuFFT plan and destroys the internal plan data structure. |
cufftGetErrorString | dtfft_interface_cufft | Function | Returns a string representation of the cuFFT error code. |
cufftMpAttachReshapeComm | dtfft_interface_cufft | Interface | Attaches a communication handle to a reshape. This function is not collective. |
cufftMpCreateReshape | dtfft_interface_cufft | Interface | Initializes a reshape handle for future use. This function is not collective. |
cufftMpDestroyReshape | dtfft_interface_cufft | Interface | Destroys a reshape and all its associated data. |
cufftMpExecReshapeAsync | dtfft_interface_cufft | Interface | Executes the reshape, redistributing data_in into data_out using the workspace in workspace. |
cufftMpGetReshapeSize | dtfft_interface_cufft | Interface | Returns the amount (in bytes) of workspace required to execute the handle. |
cufftMpMakeReshape | dtfft_interface_cufft | Interface | Creates a reshape intended to re-distribute a global array of 3D data. |
cufftPlanMany | dtfft_interface_cufft | Interface | Creates an FFT plan configuration of dimension rank, with sizes specified in the array n. |
cufftSetStream | dtfft_interface_cufft | Interface | Associates a CUDA stream with a cuFFT plan. |
cufftXtExec | dtfft_interface_cufft | Interface | Executes any cuFFT transform regardless of precision and type. In case of complex-to-real and real-to-complex transforms, the direction parameter is ignored. |
cuLaunchKernel | dtfft_interface_cuda | Function | Launches a CUDA function CUfunction or a CUDA kernel CUkernel. |
destoy_helper | dtfft_backend_mpi | Subroutine | Destroys MPI helper |
destroy | dtfft_plan | Subroutine | Destroys plan, frees all memory |
destroy | dtfft_executor_cufft_m | Subroutine | Destroys cuFFT plan |
destroy | dtfft_nvrtc_kernel | Subroutine | Destroys kernel |
destroy | dtfft_executor_mkl_m | Subroutine | Destroys MKL plan |
destroy | dtfft_abstract_executor | Subroutine | Destroys plan |
destroy | dtfft_abstract_backend | Subroutine | Destroys Abstract GPU Backend |
destroy | dtfft_transpose_handle_cuda | Subroutine | Destroys CUDA Transpose Handle |
destroy | dtfft_pencil | Subroutine | Destroys pencil |
destroy | dtfft_executor_fftw_m | Subroutine | Destroys FFTW3 plan |
destroy | dtfft_backend_cufftmp_m | Subroutine | Destroys cuFFTMp GPU Backend |
destroy | dtfft_executor_vkfft_m | Subroutine | Destroys vkFFT plan |
destroy | dtfft_transpose_plan_host | Subroutine | Destroys transposition plans |
destroy | dtfft_transpose_handle_host | Subroutine | Destroys |
destroy_code | dtfft_nvrtc_kernel | Subroutine | Frees all memory |
destroy_cuda | dtfft_transpose_plan_cuda | Subroutine | Destroys transposition plans |
destroy_data_handle | dtfft_transpose_handle_cuda | Subroutine | Destroys handle |
destroy_handle | dtfft_transpose_handle_host | Subroutine | Destroys transposition handle |
destroy_helper | dtfft_abstract_backend | Subroutine | Destroys helper |
destroy_mpi | dtfft_backend_mpi | Subroutine | Destroys MPI backend |
destroy_nccl | dtfft_backend_nccl_m | Subroutine | Destroys NCCL backend |
destroy_pencil_t | dtfft_pencil | Subroutine | Destroys pencil |
destroy_stream | dtfft_config | Subroutine | Destroy the default stream if it was created |
destroy_strings | dtfft_utils | Subroutine | Destroys array of string objects |
DftiErrorMessage | dtfft_interface_mkl_m | Function | Generates an error message. |
DftiErrorMessage_c | dtfft_interface_mkl_m | Interface | Generates an error message. |
dl_error | dtfft_utils | Subroutine | Writes error message to the error unit |
dlclose | dtfft_utils | Interface | Close a dynamic library or bundle |
dlerror | dtfft_utils | Interface | Get diagnostic information |
dlopen | dtfft_utils | Interface | Load and link a dynamic library or bundle |
dlsym | dtfft_utils | Interface | Get address of a symbol |
double_to_str | dtfft_utils | Function | Convert double to string |
dtfft_config_t | dtfft_config | Interface | Interface to create a new configuration |
dtfft_create_config | dtfft_config | Subroutine | Creates a new configuration with default values. |
dtfft_create_plan_c2c_c | dtfft_api | Function | Creates C2C dtFFT Plan, allocates all structures and prepares FFT, C/C++ interface |
dtfft_create_plan_r2r_c | dtfft_api | Function | Creates R2R dtFFT Plan, allocates all structures and prepares FFT, C/C++/Python interface |
dtfft_destroy_c | dtfft_api | Function | Destroys dtFFT Plan, C/C++ interface |
dtfft_execute_c | dtfft_api | Function | Executes dtFFT Plan, C/C++ interface. |
dtfft_get_alloc_bytes_c | dtfft_api | Function | Returns minimum number of bytes required to execute plan, C/C++ interface |
dtfft_get_alloc_size_c | dtfft_api | Function | Returns minimum number of bytes to be allocated for |
dtfft_get_backend_c | dtfft_api | Function | Returns the dtfft_backend_t selected during autotuning |
dtfft_get_backend_string | dtfft_parameters | Function | Gets the string description of a GPU backend |
dtfft_get_backend_string_c | dtfft_api | Subroutine | Returns string representation of |
dtfft_get_cuda_stream | dtfft_parameters | Function | Returns the CUDA stream from dtfft_stream_t |
dtfft_get_element_size_c | dtfft_api | Function | Returns size of element in bytes, C/C++ interface |
dtfft_get_error_string | dtfft_parameters | Function | Gets the string description of an error code |
dtfft_get_error_string_c | dtfft_api | Subroutine | Returns an explanation of |
dtfft_get_local_sizes_c | dtfft_api | Function | Returns local sizes, counts in real and Fourier spaces and number of elements to be allocated for |
dtfft_get_pencil_c | dtfft_api | Function | Returns pencil decomposition info, C/C++ interface |
dtfft_get_platform_c | dtfft_api | Function | Returns the dtfft_platform_t selected during autotuning |
dtfft_get_stream_c | dtfft_api | Function | Returns Stream associated with plan |
dtfft_get_version | dtfft_parameters | Interface | Get dtFFT version |
dtfft_get_version_current | dtfft_parameters | Function | Returns the current version code |
dtfft_get_version_required | dtfft_parameters | Function | Returns the version code required by the user |
dtfft_get_z_slab_enabled_c | dtfft_api | Function | Checks if dtFFT Plan is using Z-slab optimization |
dtfft_mem_alloc_c | dtfft_api | Function | Allocates memory for dtFFT Plan, C/C++ interface |
dtfft_mem_free_c | dtfft_api | Function | Frees memory for dtFFT Plan, C/C++ interface |
dtfft_report_c | dtfft_api | Function | Reports dtFFT Plan, C/C++ interface |
dtfft_set_config | dtfft_config | Subroutine | Sets configuration parameters |
dtfft_set_config_c | dtfft_api | Function | Sets dtFFT configuration, C/C++ interface |
dtfft_stream_t | dtfft_parameters | Interface | Creates dtfft_stream_t from integer(cuda_stream_kind) |
dtfft_transpose_c | dtfft_api | Function | Executes single transposition, C/C++ interface. |
dynamic_load | dtfft_utils | Function | Dynamically loads library and its symbols |
effort_eq | dtfft_parameters | Function | |
effort_ne | dtfft_parameters | Function | |
execute | dtfft_plan | Subroutine | Executes plan |
execute | dtfft_executor_cufft_m | Subroutine | Executes cuFFT plan |
execute | dtfft_nvrtc_kernel | Subroutine | Executes kernel on stream |
execute | dtfft_executor_mkl_m | Subroutine | Executes MKL plan |
execute | dtfft_abstract_transpose_plan | Subroutine | Executes single transposition |
execute | dtfft_abstract_executor | Subroutine | Executes plan |
execute | dtfft_abstract_backend | Subroutine | Executes GPU Backend |
execute | dtfft_transpose_handle_cuda | Subroutine | Executes transpose - exchange - unpack |
execute | dtfft_executor_fftw_m | Subroutine | Executes FFTW3 plan |
execute | dtfft_backend_cufftmp_m | Subroutine | Executes cuFFTMp GPU Backend |
execute | dtfft_executor_vkfft_m | Subroutine | Executes vkFFT plan |
execute | dtfft_transpose_handle_host | Subroutine | Executes transposition |
execute_cuda | dtfft_transpose_plan_cuda | Subroutine | Executes single transposition |
execute_mpi | dtfft_backend_mpi | Subroutine | Executes MPI backend |
execute_nccl | dtfft_backend_nccl_m | Subroutine | Executes NCCL backend |
execute_private | dtfft_plan | Subroutine | Executes plan with specified auxiliary buffer |
execute_private | dtfft_transpose_plan_host | Subroutine | Executes single transposition |
execute_ptr | dtfft_plan | Subroutine | Executes plan using type(c_ptr) pointers instead of buffers |
execute_type_eq | dtfft_parameters | Function | |
execute_type_ne | dtfft_parameters | Function | |
executor_eq | dtfft_parameters | Function | |
executor_ne | dtfft_parameters | Function | |
fftw_execute_dft | dtfft_interface_fftw_m | Interface | |
fftw_execute_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftw_execute_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftw_execute_r2r | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_dft | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftw_plan_many_r2r | dtfft_interface_fftw_m | Interface | |
fftwf_execute_dft | dtfft_interface_fftw_m | Interface | |
fftwf_execute_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftwf_execute_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftwf_execute_r2r | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_dft | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_dft_c2r | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_dft_r2c | dtfft_interface_fftw_m | Interface | |
fftwf_plan_many_r2r | dtfft_interface_fftw_m | Interface | |
free_datatypes | dtfft_transpose_handle_host | Subroutine | Frees temporary datatypes |
free_mem | dtfft_abstract_transpose_plan | Subroutine | Frees memory based on |
get_alloc_bytes | dtfft_plan | Function | Returns minimum number of bytes required to execute plan |
get_alloc_size | dtfft_plan | Function | Wrapper around |
get_aux_size | dtfft_abstract_backend | Function | Returns number of bytes required by aux buffer |
get_aux_size | dtfft_transpose_handle_cuda | Function | Returns number of bytes required by aux buffer |
get_backend | dtfft_plan | Function | Returns the GPU backend selected during autotuning |
get_backend | dtfft_abstract_transpose_plan | Function | Returns plan GPU backend |
get_backend_from_env | dtfft_utils | Function | Returns GPU backend to use set by environment variable |
get_cached_kernel | dtfft_nvrtc_kernel | Function | Returns cached kernel if it exists. If not, returns null pointer. |
get_code_init | dtfft_nvrtc_kernel | Subroutine | Generates basic code that is used in all other kernels |
get_comm | dtfft_api | Function | |
get_contiguous_execution_blocks | dtfft_nvrtc_kernel | Subroutine | |
get_cuda_architecture | dtfft_interface_cuda_runtime | Interface | Returns the CUDA architecture for a given device. |
get_datatype_from_env | dtfft_utils | Function | Obtains datatype id from environment variable |
get_element_size | dtfft_plan | Function | Returns number of bytes required to store single element. |
get_env | dtfft_utils | Interface | Obtains environment variable |
get_env_base | dtfft_utils | Function | Base function of obtaining dtFFT environment variable |
get_env_int32 | dtfft_utils | Function | Base Integer function of obtaining dtFFT environment variable |
get_env_int8 | dtfft_utils | Function | Obtains int8 environment variable |
get_env_logical | dtfft_utils | Function | Obtains logical environment variable |
get_env_string | dtfft_utils | Function | Obtains string environment variable |
get_inverse_kind | dtfft_utils | Function | Get the inverse R2R kind of transform for the given R2R kind |
get_iters_from_env | dtfft_utils | Function | Obtains number of iterations from environment variable |
get_local_size | dtfft_pencil | Subroutine | Computes local portions of data based on global count and position inside grid communicator |
get_local_sizes | dtfft_plan | Subroutine | Obtain local starts and counts in |
get_local_sizes | dtfft_pencil | Subroutine | Obtain local starts and counts in |
get_log_enabled | dtfft_utils | Function | Returns the value of the log_enabled variable |
get_mpi_enabled | dtfft_config | Function | Whether MPI backends are enabled or not |
get_mpi_enabled_from_env | dtfft_utils | Function | Returns whether MPI backends are used during autotuning, as set by environment variable |
get_nccl_enabled | dtfft_config | Function | Whether NCCL backends are enabled or not |
get_nccl_enabled_from_env | dtfft_utils | Function | Returns whether NCCL backends are used during autotuning, as set by environment variable |
get_neighbor_function_code | dtfft_nvrtc_kernel | Subroutine | Generates device function that determines the id of the process to which data is being sent, or from which data has been received, based on local element coordinate |
get_nvshmem_enabled | dtfft_config | Function | Whether nvshmem backends are enabled or not |
get_nvshmem_enabled_from_env | dtfft_utils | Function | Returns whether NVSHMEM backends are used during autotuning, as set by environment variable |
get_pencil | dtfft_plan | Function | Returns pencil decomposition |
get_pipe_enabled_from_env | dtfft_utils | Function | Returns whether pipelined backends are used during autotuning, as set by environment variable |
get_pipelined_enabled | dtfft_config | Function | Whether pipelined backends are enabled or not |
get_plan_execution_time | dtfft_transpose_plan_host | Function | Creates transpose plan and executes it |
get_platform | dtfft_plan | Function | Returns execution platform of the plan (HOST or CUDA) |
get_platform_from_env | dtfft_utils | Function | Returns execution platform set by environment variable |
get_stream_int64 | dtfft_plan | Subroutine | Returns CUDA stream associated with plan |
get_stream_ptr | dtfft_plan | Subroutine | Returns CUDA stream associated with plan |
get_tile_size | dtfft_nvrtc_kernel | Function | Returns tile size to use in a transpose kernel |
get_tranpose_type | dtfft_transpose_handle_cuda | Function | Returns transpose_type, associated with handle |
get_transpose_kernel_code | dtfft_nvrtc_kernel | Function | Generates code that will be used to locally transpose data and prepare it to be sent to other processes. ndims == 2 |
get_transpose_type | dtfft_pencil | Function | Determines transpose ID based on pencils |
get_true_transpose_type | dtfft_nvrtc_kernel | Function | Returns generic transpose id. Since X-Y and Y-Z transpositions are symmetric, it returns only one of them. X-Z and Z-X are not symmetric |
get_unpack_kernel_code | dtfft_nvrtc_kernel | Function | Generates code that will be used to unpack data when it is received |
get_unpack_pipelined_kernel_code | dtfft_nvrtc_kernel | Function | Generates code that will be used to partially unpack data when it is received from another process |
get_user_gpu_backend | dtfft_config | Function | Returns GPU backend set by the user or default one |
get_user_platform | dtfft_config | Function | Returns platform set by the user or default one |
get_user_stream | dtfft_config | Function | Returns either the custom stream provided by the user or creates a new one |
get_z_slab | dtfft_config | Function | Whether Z-slab optimization is enabled or not |
get_z_slab_enabled | dtfft_plan | Function | Returns a logical value indicating whether Z-slab optimization is enabled internally |
get_z_slab_from_env | dtfft_utils | Function | Returns whether Z-slab is to be used, as set by environment variable |
gpu_backend_eq | dtfft_parameters | Function | |
gpu_backend_ne | dtfft_parameters | Function | |
init_internal | dtfft_utils | Function | Checks if MPI is initialized and loads environment variables |
int_to_str | dtfft_utils | Interface | Converts integer to string |
int_to_str_int32 | dtfft_utils | Function | Convert 32-bit integer to string |
int_to_str_int64 | dtfft_utils | Function | Convert 64-bit integer to string |
int_to_str_int8 | dtfft_utils | Function | Convert 8-bit integer to string |
is_backend_mpi | dtfft_parameters | Function | |
is_backend_nccl | dtfft_parameters | Function | |
is_backend_nvshmem | dtfft_parameters | Function | |
is_backend_pipelined | dtfft_parameters | Function | |
is_cuda_executor | dtfft_parameters | Function | |
is_device_ptr | dtfft_utils | Interface | Checks if pointer can be accessed from device |
is_host_executor | dtfft_parameters | Function | |
is_null_funptr | dtfft_utils | Function | Checks if pointer is NULL |
is_null_ptr | dtfft_utils | Function | Checks if pointer is NULL |
is_null_ptr | dtfft_utils | Interface | Checks if pointer is NULL |
is_nvshmem_ptr | dtfft_interface_nvshmem | Function | Checks if pointer is a symmetric nvshmem allocated pointer |
is_same_ptr | dtfft_utils | Function | Checks if two pointers are the same |
is_valid_comm_type | dtfft_parameters | Function | |
is_valid_dimension | dtfft_parameters | Function | |
is_valid_effort | dtfft_parameters | Function | |
is_valid_execute_type | dtfft_parameters | Function | |
is_valid_executor | dtfft_parameters | Function | |
is_valid_gpu_backend | dtfft_parameters | Function | |
is_valid_platform | dtfft_parameters | Function | |
is_valid_precision | dtfft_parameters | Function | |
is_valid_r2r_kind | dtfft_parameters | Function | |
is_valid_transpose_type | dtfft_parameters | Function | |
load | dtfft_interface_vkfft_m | Function | Loads VkFFT library |
load_cuda | dtfft_interface_cuda | Function | Loads the CUDA Driver library and needed symbols |
load_library | dtfft_utils | Function | Dynamically loads library |
load_nvrtc | dtfft_interface_nvrtc | Function | Dynamically loads nvRTC library and its functions |
load_symbol | dtfft_utils | Function | Dynamically loads symbol from library |
load_vkfft | dtfft_interface_vkfft_m | Function | Loads VkFFT library based on the platform |
make_plan | dtfft_executor_mkl_m | Subroutine | Creates general MKL plan |
make_public | dtfft_pencil | Function | Creates public object that users can use to create their own FFT backends |
mark_unused | dtfft_nvrtc_kernel | Subroutine | Takes CUDA kernel as an argument and searches for it in cache. If kernel is found then reduces |
mem_alloc | dtfft_executor_cufft_m | Subroutine | Dummy method. Raises |
mem_alloc | dtfft_executor_mkl_m | Subroutine | Allocates MKL memory |
mem_alloc | dtfft_abstract_transpose_plan | Subroutine | Allocates memory based on selected backend |
mem_alloc | dtfft_executor_fftw_m | Subroutine | Allocates FFTW3 memory |
mem_alloc | dtfft_executor_vkfft_m | Subroutine | Dummy method. Raises |
mem_alloc_c32_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_c32_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_c32_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_alloc_c64_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_c64_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_c64_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_alloc_host | dtfft_utils | Interface | Allocates memory using C11 Standard aligned_alloc with 16-byte alignment |
mem_alloc_ptr | dtfft_plan | Subroutine | Allocates memory specific for this plan |
mem_alloc_r32_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_r32_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_r32_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_alloc_r64_1d | dtfft_plan | Subroutine | Allocates pointer of rank 1 |
mem_alloc_r64_2d | dtfft_plan | Subroutine | Allocates pointer of rank 2 |
mem_alloc_r64_3d | dtfft_plan | Subroutine | Allocates pointer of rank 3 |
mem_free | dtfft_executor_cufft_m | Subroutine | Dummy method. Raises |
mem_free | dtfft_executor_mkl_m | Subroutine | Frees MKL aligned memory |
mem_free | dtfft_abstract_transpose_plan | Subroutine | Frees memory allocated with mem_alloc |
mem_free | dtfft_executor_fftw_m | Subroutine | Frees FFTW3 aligned memory |
mem_free | dtfft_executor_vkfft_m | Subroutine | Dummy method. Raises |
mem_free_c32_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c32_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c32_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c64_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c64_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_c64_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_host | dtfft_utils | Interface | Frees memory allocated with mem_alloc_host |
mem_free_ptr | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r32_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r32_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r32_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r64_1d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r64_2d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mem_free_r64_3d | dtfft_plan | Subroutine | Frees previously allocated memory specific for this plan |
mkl_dfti_commit_desc | dtfft_interface_mkl_m | Interface | Performs all initialization for the actual FFT computation. |
mkl_dfti_create_desc | dtfft_interface_mkl_m | Interface | Allocates the descriptor data structure and initializes it with default configuration values. |
mkl_dfti_execute | dtfft_interface_mkl_m | Interface | Computes FFT. |
mkl_dfti_free_desc | dtfft_interface_mkl_m | Interface | Frees the memory allocated for a descriptor. |
mkl_dfti_mem_alloc | dtfft_interface_mkl_m | Interface | Allocates pointer via |
mkl_dfti_mem_free | dtfft_interface_mkl_m | Interface | Frees pointer via |
mkl_dfti_set_value | dtfft_interface_mkl_m | Interface | Sets one particular configuration parameter with the specified configuration value. |
ncclCommDeregister | dtfft_interface_nccl | Interface | Deregister a buffer for collective communication. |
ncclCommDestroy | dtfft_interface_nccl | Interface | Destroy a communicator object comm. |
ncclCommInitRank | dtfft_interface_nccl | Interface | Creates a new communicator (multi thread/process version). |
ncclCommRegister | dtfft_interface_nccl | Interface | Register a buffer for collective communication. |
ncclGetErrorString | dtfft_interface_nccl | Function | Generates an error message. |
ncclGetErrorString_c | dtfft_interface_nccl | Interface | Returns a human-readable string corresponding to the passed error code. |
ncclGetUniqueId | dtfft_interface_nccl | Interface | Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator and the Id should be distributed to all ranks in the communicator before calling ncclCommInitRank. uniqueId should point to a ncclUniqueId object allocated by the user. |
ncclGroupEnd | dtfft_interface_nccl | Interface | End a group call. |
ncclGroupStart | dtfft_interface_nccl | Interface | Start a group call. |
ncclMemAlloc | dtfft_interface_nccl | Interface | Allocate a GPU buffer with size. Allocated buffer head address will be returned by ptr, and the actual allocated size can be larger than requested because of the buffer granularity requirements from all types of NCCL optimizations. |
ncclMemFree | dtfft_interface_nccl | Interface | Free memory allocated by ncclMemAlloc(). |
ncclRecv | dtfft_interface_nccl | Interface | Receive data from rank peer into recvbuff. |
ncclSend | dtfft_interface_nccl | Interface | Send data from sendbuff to rank peer. |
nvrtcGetErrorString | dtfft_interface_nvrtc | Function | Helper function that returns a string describing the given nvrtcResult code. For unrecognized enumeration values, it returns “NVRTC_ERROR unknown” |
nvshmem_free | dtfft_interface_nvshmem | Interface | |
nvshmem_malloc | dtfft_interface_nvshmem | Interface | |
nvshmem_my_pe | dtfft_interface_nvshmem | Interface | |
nvshmem_ptr | dtfft_interface_nvshmem | Interface | |
nvshmemx_float_alltoall_on_stream | dtfft_interface_nvshmem | Interface | |
nvshmemx_init_status | dtfft_interface_nvshmem | Interface | |
nvshmemx_sync_all_on_stream | dtfft_interface_nvshmem | Interface | |
nvtxDomainCreate_c | dtfft_interface_nvtx | Interface | Creates an NVTX domain with the specified name. |
nvtxDomainRangePop_c | dtfft_interface_nvtx | Interface | Pops a range from the specified NVTX domain. |
nvtxDomainRangePushEx_c | dtfft_interface_nvtx | Interface | Pushes a range with a custom message and color onto the specified NVTX domain. |
operator(/=) | dtfft_parameters | Interface | |
operator(==) | dtfft_parameters | Interface | |
platform_eq | dtfft_parameters | Function | |
platform_ne | dtfft_parameters | Function | |
pop_nvtx_domain_range | dtfft_interface_nvtx | Subroutine | Pops a range from the NVTX domain |
precision_eq | dtfft_parameters | Function | |
precision_ne | dtfft_parameters | Function | |
push_nvtx_domain_range | dtfft_interface_nvtx | Subroutine | Pushes a range to the NVTX domain |
r2r_kind_eq | dtfft_parameters | Function | |
r2r_kind_ne | dtfft_parameters | Function | |
report | dtfft_plan | Subroutine | Prints plan-related information to stdout |
run_autotune_backend | dtfft_transpose_plan_cuda | Subroutine | Runs autotune for all backends |
run_cuda_kernel | dtfft_interface_cuda | Interface | Launches a CUDA function CUfunction or a CUDA kernel CUkernel. |
run_mpi_a2a | dtfft_backend_mpi | Subroutine | Executes MPI all-to-all communication |
run_mpi_p2p | dtfft_backend_mpi | Subroutine | Executes MPI point-to-point communication |
set_unpack_kernel | dtfft_abstract_backend | Subroutine | Sets unpack kernel for pipelined backend |
stream_from_int64 | dtfft_parameters | Function | Creates dtfft_stream_t from integer(cuda_stream_kind) |
string | dtfft_utils | Interface | Creates string object |
string_c2f | dtfft_utils | Subroutine | Convert C string to Fortran string |
string_constructor | dtfft_utils | Function | Creates string object |
string_f2c | dtfft_utils | Subroutine | Convert Fortran string to C string |
to_cstr | dtfft_nvrtc_kernel | Subroutine | Converts Fortran CUDA code to C pointer |
transpose | dtfft_plan | Subroutine | Performs single transposition |
transpose_ptr | dtfft_plan | Subroutine | Performs single transposition using type(c_ptr) pointers instead of buffers |
transpose_type_eq | dtfft_parameters | Function | |
transpose_type_ne | dtfft_parameters | Function | |
unload_library | dtfft_utils | Subroutine | Unloads library |
write_message | dtfft_utils | Subroutine | Write message to the specified unit |
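
The `get_local_size` entry above describes computing the local portion of data from a global count and a position inside a grid communicator. The table does not state which distribution rule dtFFT uses, so the following is only a minimal sketch of one common block scheme: every rank receives `n_global // comm_size` elements and the first `n_global % comm_size` ranks receive one extra. The names `local_portion` and `all_portions` are hypothetical and are not part of the library.

```python
def local_portion(n_global, comm_size, comm_rank):
    """Return (start, count) of the block owned by comm_rank.

    Assumed scheme (not taken from dtFFT): each rank gets
    n_global // comm_size elements, and the first
    n_global % comm_size ranks get one extra element.
    Starts are zero-based.
    """
    if comm_size <= 0 or not 0 <= comm_rank < comm_size:
        raise ValueError("need comm_size > 0 and 0 <= comm_rank < comm_size")
    base, extra = divmod(n_global, comm_size)
    # Ranks below `extra` own one additional element.
    count = base + (1 if comm_rank < extra else 0)
    # Every preceding rank contributes `base`, plus one for each
    # preceding rank that received an extra element.
    start = comm_rank * base + min(comm_rank, extra)
    return start, count


def all_portions(n_global, comm_size):
    """Return the (start, count) pair of every rank along one grid dimension."""
    return [local_portion(n_global, comm_size, r) for r in range(comm_size)]
```

Under this assumed rule the blocks are contiguous, never overlap, and their counts sum to the global count, which is the property a pencil decomposition needs along each grid dimension.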