dtfft_interface_nccl Module

NCCL Interfaces


Uses

  • iso_c_binding
  • dtfft_parameters
  • dtfft_utils

Used by

  • dtfft_abstract_backend
  • dtfft_abstract_transpose_plan
  • dtfft_backend_nccl_m

Variables

Type Visibility Attributes Name Initial
type(ncclDataType), public, parameter :: ncclFloat = ncclDataType(7)

Interfaces

interface

Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator, and the Id should be distributed to all ranks in the communicator before calling ncclCommInitRank. uniqueId should point to an ncclUniqueId object allocated by the user.

  • public function ncclGetUniqueId(uniqueId) result(ncclResult_t) bind(C, name="ncclGetUniqueId")

    Arguments

    Type Intent Optional Attributes Name
    type(ncclUniqueId), intent(out) :: uniqueId

    Unique ID

    Return Value integer(kind=c_int32_t)

    Completion status

interface

Allocate a GPU buffer of alloc_bytes bytes. The head address of the allocated buffer is returned in ptr; the actual allocated size can be larger than requested because of the buffer granularity requirements of NCCL's optimizations.

  • public function ncclMemAlloc(ptr, alloc_bytes) result(ncclResult_t) bind(C, name="ncclMemAlloc")

    Arguments

    Type Intent Optional Attributes Name
    type(c_ptr), intent(out) :: ptr

    Buffer address

    integer(kind=c_size_t), intent(in), value :: alloc_bytes

    Number of bytes to allocate

    Return Value integer(kind=c_int32_t)

    Completion status

interface

Free memory allocated by ncclMemAlloc().

  • public function ncclMemFree(ptr) result(ncclResult_t) bind(C, name="ncclMemFree")

    Arguments

    Type Intent Optional Attributes Name
    type(c_ptr), intent(in), value :: ptr

    Buffer address

    Return Value integer(kind=c_int32_t)

    Completion status
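
A minimal sketch of allocating and later freeing a GPU buffer through these two bindings, assuming iso_c_binding and this module are in scope; the element count and the lack of error handling are illustrative.

  type(c_ptr)        :: buf
  integer(c_size_t)  :: nbytes
  integer(c_int32_t) :: ierr
  integer(c_size_t), parameter :: n = 1024_c_size_t

  nbytes = n * c_sizeof(0.0_c_float)   ! space for n single-precision values
  ierr = ncclMemAlloc(buf, nbytes)     ! actual allocation may be larger than nbytes
  ! ... use `buf` in NCCL calls or kernels ...
  ierr = ncclMemFree(buf)              ! release the buffer when it is no longer needed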

interface

Creates a new communicator (multi thread/process version).

rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated with a CUDA device, which has to be set before calling ncclCommInitRank.

ncclCommInitRank implicitly synchronizes with other ranks, hence it must be called by different threads/processes or used within ncclGroupStart/ncclGroupEnd.

  • public function ncclCommInitRank(comm, nranks, uniqueId, rank) result(ncclResult_t) bind(C, name="ncclCommInitRank")

    Arguments

    Type Intent Optional Attributes Name
    type(ncclComm) :: comm

    Communicator

    integer(kind=c_int), value :: nranks

    Number of ranks in communicator

    type(ncclUniqueId), value :: uniqueId

    Unique ID

    integer(kind=c_int), value :: rank

    Calling rank

    Return Value integer(kind=c_int32_t)

    Completion status
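
A minimal sketch of the initialization sequence described above: rank 0 generates the unique Id, distributes it to every rank, and all ranks then join the NCCL communicator. The subroutine name and the MPI communicator argument are illustrative and not part of this module.

  subroutine init_nccl_comm(comm_mpi, nccl_comm)
    use iso_c_binding
    use mpi_f08
    use dtfft_interface_nccl
    type(MPI_Comm), intent(in)  :: comm_mpi   ! existing MPI communicator (placeholder)
    type(ncclComm), intent(out) :: nccl_comm  ! resulting NCCL communicator
    type(ncclUniqueId) :: id
    integer(c_int32_t) :: nccl_ierr
    integer            :: rank, nranks, mpi_ierr

    call MPI_Comm_rank(comm_mpi, rank, mpi_ierr)
    call MPI_Comm_size(comm_mpi, nranks, mpi_ierr)

    ! ncclGetUniqueId is called once; the Id is then distributed to all ranks.
    if ( rank == 0 ) nccl_ierr = ncclGetUniqueId(id)
    call MPI_Bcast(id%internal, size(id%internal), MPI_CHARACTER, 0, comm_mpi, mpi_ierr)

    ! Each rank must have set its CUDA device before this call.
    nccl_ierr = ncclCommInitRank(nccl_comm, nranks, id, rank)
  end subroutine init_nccl_comm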

interface

Send data from sendbuff to rank peer.

Rank peer needs to call ncclRecv with the same datatype and the same count as this rank.

This operation is blocking for the GPU. If multiple ncclSend() and ncclRecv() operations need to progress concurrently to complete, they must be fused within a ncclGroupStart()/ncclGroupEnd() section.

  • public function ncclSend(sendbuff, count, datatype, peer, comm, stream) result(ncclResult_t) bind(c, name='ncclSend')

    Arguments

    Type Intent Optional Attributes Name
    real(kind=c_float) :: sendbuff

    Buffer to send data from

    integer(kind=c_size_t), value :: count

    Number of elements to send

    type(ncclDataType), value :: datatype

    Datatype to send

    integer(kind=c_int), value :: peer

    Target GPU

    type(ncclComm), value :: comm

    Communicator

    type(dtfft_stream_t), value :: stream

    CUDA Stream

    Return Value integer(kind=c_int32_t)

    Completion status

interface

Receive data from rank peer into recvbuff.

Rank peer needs to call ncclSend with the same datatype and the same count as this rank.

This operation is blocking for the GPU. If multiple ncclSend() and ncclRecv() operations need to progress concurrently to complete, they must be fused within a ncclGroupStart()/ncclGroupEnd() section.

  • public function ncclRecv(recvbuff, count, datatype, peer, comm, stream) result(ncclResult_t) bind(c, name='ncclRecv')

    Arguments

    Type Intent Optional Attributes Name
    real(kind=c_float) :: recvbuff

    Buffer to recv data into

    integer(kind=c_size_t), value :: count

    Number of elements to recv

    type(ncclDataType), value :: datatype

    Datatype to recv

    integer(kind=c_int), value :: peer

    Source GPU

    type(ncclComm), value :: comm

    Communicator

    type(dtfft_stream_t), value :: stream

    CUDA Stream

    Return Value integer(kind=c_int32_t)

    Completion status

interface

Start a group call.

All subsequent calls to NCCL until ncclGroupEnd will not block due to inter-CPU synchronization.

  • public function ncclGroupStart() result(ncclResult_t) bind(C, name="ncclGroupStart")

    Arguments

    None

    Return Value integer(kind=c_int32_t)

    Completion status

interface

End a group call.

Returns when all operations since ncclGroupStart have been processed. This means the communication primitives have been enqueued to the provided streams, but are not necessarily complete.

  • public function ncclGroupEnd() result(ncclResult_t) bind(C, name="ncclGroupEnd")

    Arguments

    None

    Return Value integer(kind=c_int32_t)

    Completion status
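
An illustrative sketch of the fused point-to-point exchange described for ncclSend/ncclRecv above. The single-precision device buffers sendbuf/recvbuf, the element count count (kind c_size_t), the peer rank peer, the communicator nccl_comm and the stream stream are assumed to already exist; all of these names are placeholders.

  integer(c_int32_t) :: ierr

  ierr = ncclGroupStart()
  ! Both operations are fused so they can progress concurrently.
  ierr = ncclSend(sendbuf(1), count, ncclFloat, peer, nccl_comm, stream)
  ierr = ncclRecv(recvbuf(1), count, ncclFloat, peer, nccl_comm, stream)
  ierr = ncclGroupEnd()
  ! ncclGroupEnd returns once both operations are enqueued on `stream`;
  ! synchronize the stream before reusing the buffers.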

interface

Destroy a communicator object comm.

  • public function ncclCommDestroy(comm) result(ncclResult_t) bind(C, name="ncclCommDestroy")

    Arguments

    Type Intent Optional Attributes Name
    type(ncclComm), value :: comm

    Communicator

    Return Value integer(kind=c_int32_t)

    Completion status
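
Illustrative teardown for a communicator created with ncclCommInitRank; nccl_comm is a placeholder name from the initialization sketch above.

  integer(c_int32_t) :: ierr

  ierr = ncclCommDestroy(nccl_comm)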

interface

Register a buffer for collective communication.

  • public function ncclCommRegister(comm, buff, size, handle) result(ncclResult_t) bind(C, name="ncclCommRegister")

    Arguments

    Type Intent Optional Attributes Name
    type(ncclComm), value :: comm

    Communicator

    type(c_ptr), value :: buff

    Buffer to register

    integer(kind=c_size_t), value :: size

    Size of the buffer in bytes

    type(c_ptr) :: handle

    Handle to the registered buffer

    Return Value integer(kind=c_int32_t)

    Completion status

interface

Deregister a buffer for collective communication.

  • public function ncclCommDeregister(comm, handle) result(ncclResult_t) bind(C, name="ncclCommDeregister")

    Arguments

    Type Intent Optional Attributes Name
    type(ncclComm), value :: comm

    Communicator

    type(c_ptr), value :: handle

    Handle to the registered buffer

    Return Value integer(kind=c_int32_t)

    Completion status
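
An illustrative sketch of registering a buffer previously obtained from ncclMemAlloc with a communicator and deregistering it before it is freed; nccl_comm, buf and nbytes are placeholder names carried over from the earlier sketches.

  type(c_ptr)        :: handle
  integer(c_int32_t) :: ierr

  ierr = ncclCommRegister(nccl_comm, buf, nbytes, handle)
  ! ... communication using `buf` on this communicator ...
  ierr = ncclCommDeregister(nccl_comm, handle)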

interface

Returns a human-readable string corresponding to the passed error code.

  • private function ncclGetErrorString_c(ncclResult_t) result(message) bind(C, name="ncclGetErrorString")

    Arguments

    Type Intent Optional Attributes Name
    integer(kind=c_int32_t), intent(in), value :: ncclResult_t

    Completion status of a NCCL function.

    Return Value type(c_ptr)

    Pointer to message


Derived Types

type, public, bind(c) ::  ncclUniqueId

Components

Type Visibility Attributes Name Initial
character(len=c_char), public :: internal(128)

type, public, bind(c) ::  ncclComm

Components

Type Visibility Attributes Name Initial
type(c_ptr), public :: member

type, public, bind(c) ::  ncclDataType

Components

Type Visibility Attributes Name Initial
integer(kind=c_int), public :: member

Functions

public function ncclGetErrorString(ncclResult_t) result(string)

Returns a human-readable error message for a NCCL completion status.

Arguments

Type Intent Optional Attributes Name
integer(kind=c_int32_t), intent(in) :: ncclResult_t

Completion status of a NCCL function.

Return Value character(len=:), allocatable

Error message
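
A minimal sketch of checking a completion status with this wrapper. A non-zero status indicates an error (ncclSuccess is 0); the handling shown is illustrative.

  integer(c_int32_t) :: ierr

  ierr = ncclGroupEnd()
  if ( ierr /= 0 ) then
    print *, 'NCCL error: ', ncclGetErrorString(ierr)
    error stop
  end if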