NCCL Interfaces
Type | Visibility | Attributes | Name | Initial
---|---|---|---|---
type(ncclDataType) | public | parameter | ncclFloat | ncclDataType(7)
Generates an Id to be used in ncclCommInitRank. ncclGetUniqueId should be called once when creating a communicator, and the Id should be distributed to all ranks in the communicator before calling ncclCommInitRank. uniqueId should point to an ncclUniqueId object allocated by the user.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(ncclUniqueId) | intent(out) | | | uniqueId | Unique ID

Return value: Completion status.
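A minimal sketch of the usual bootstrap pattern: one rank generates the Id and distributes it (here over MPI) before any rank calls ncclCommInitRank. The module name `nccl_interfaces`, the `integer(c_int32_t)` result kind, and the use of MPI_Bcast are assumptions for illustration; only the ncclGetUniqueId interface and the 128-byte `internal` component are taken from this documentation.

```fortran
program nccl_id_bootstrap
  ! Assumption: the bindings documented here live in a module called
  ! `nccl_interfaces` and return integer(c_int32_t) completion statuses.
  use iso_c_binding, only: c_int32_t
  use mpi
  use nccl_interfaces
  implicit none

  type(ncclUniqueId) :: id
  integer(c_int32_t) :: nccl_status
  integer            :: mpi_rank, ierr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, mpi_rank, ierr)

  ! ncclGetUniqueId is called exactly once, on a single rank...
  if ( mpi_rank == 0 ) nccl_status = ncclGetUniqueId(id)

  ! ...and the Id is distributed to every rank before ncclCommInitRank.
  ! ncclUniqueId stores its payload in the 128-byte `internal` component.
  call MPI_Bcast(id%internal, 128, MPI_BYTE, 0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program nccl_id_bootstrap
```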
Allocate a GPU buffer of the requested size. The head address of the allocated buffer is returned in ptr; the actual allocated size can be larger than requested because of the buffer granularity requirements of NCCL's optimizations.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(c_ptr) | intent(out) | | | ptr | Buffer address
integer(kind=c_size_t) | intent(in) | | value | alloc_bytes | Number of bytes to allocate

Return value: Completion status.
Free memory allocated by ncclMemAlloc().
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(c_ptr) | intent(in) | | value | ptr | Buffer address

Return value: Completion status.
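A minimal sketch of the allocate/free pair. The module name `nccl_interfaces` and the `integer(c_int32_t)` result kind are assumptions; the argument lists follow the tables above.

```fortran
program nccl_mem_demo
  ! Assumed module name; a CUDA device is assumed to be selected already.
  use iso_c_binding, only: c_ptr, c_size_t, c_int32_t, c_float, c_sizeof
  use nccl_interfaces
  implicit none

  type(c_ptr)        :: buf
  integer(c_size_t)  :: nbytes
  integer(c_int32_t) :: nccl_status

  ! Room for 1024 single-precision elements; NCCL may round the allocation
  ! up to satisfy its granularity requirements.
  nbytes = 1024_c_size_t * c_sizeof(0.0_c_float)
  nccl_status = ncclMemAlloc(buf, nbytes)

  ! ... use buf in NCCL communication (optionally after ncclCommRegister) ...

  ! Buffers obtained from ncclMemAlloc() are released with ncclMemFree().
  nccl_status = ncclMemFree(buf)
end program nccl_mem_demo
```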
Creates a new communicator (multi thread/process version).
rank must be between 0 and nranks-1 and unique within a communicator clique. Each rank is associated with a CUDA device, which has to be set before calling ncclCommInitRank.
ncclCommInitRank implicitly synchronizes with other ranks, hence it must be called by different threads/processes or used within ncclGroupStart/ncclGroupEnd.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(ncclComm) | | | | comm | Communicator
integer(kind=c_int) | | | value | nranks | Number of ranks in communicator
type(ncclUniqueId) | | | value | uniqueId | Unique ID
integer(kind=c_int) | | | value | rank | Calling rank

Return value: Completion status.
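A minimal sketch of communicator creation, picking up where the Id-distribution sketch above leaves off. The subroutine name, the `nccl_interfaces` module name, and the cudaSetDevice remark are illustrative assumptions; the call itself follows the argument table above.

```fortran
subroutine create_nccl_comm(id, mpi_rank, mpi_size, comm)
  ! `nccl_interfaces` is an assumed module name for the bindings above.
  use iso_c_binding, only: c_int, c_int32_t
  use nccl_interfaces
  implicit none
  type(ncclUniqueId), intent(in)  :: id       ! same Id on every rank
  integer,            intent(in)  :: mpi_rank, mpi_size
  type(ncclComm),     intent(out) :: comm
  integer(c_int32_t) :: nccl_status

  ! The CUDA device must already be selected at this point (for example via
  ! a cudaSetDevice binding, which is not part of the interfaces shown here).

  ! Every rank joins the clique with the shared Id and its own rank index;
  ! ncclCommInitRank synchronizes implicitly with the other ranks.
  nccl_status = ncclCommInitRank(comm, int(mpi_size, c_int), id, int(mpi_rank, c_int))
end subroutine create_nccl_comm
```

When the communicator is no longer needed, it is released with ncclCommDestroy (documented below).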
Send data from sendbuff to rank peer.
Rank peer needs to call ncclRecv with the same datatype and the same count as this rank.
This operation is blocking for the GPU. If multiple ncclSend() and ncclRecv() operations need to progress concurrently to complete, they must be fused within an ncclGroupStart()/ncclGroupEnd() section.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
real(kind=c_float) | | | | sendbuff | Buffer to send data from
integer(kind=c_size_t) | | | value | count | Number of elements to send
type(ncclDataType) | | | value | datatype | Datatype to send
integer(kind=c_int) | | | value | peer | Target GPU
type(ncclComm) | | | value | comm | Communicator
type(dtfft_stream_t) | | | value | stream | CUDA stream

Return value: Completion status.
Receive data from rank peer into recvbuff.
Rank peer needs to call ncclSend with the same datatype and the same count as this rank.
This operation is blocking for the GPU. If multiple ncclSend() and ncclRecv() operations need to progress concurrently to complete, they must be fused within an ncclGroupStart()/ncclGroupEnd() section.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
real(kind=c_float) | | | | recvbuff | Buffer to receive data into
integer(kind=c_size_t) | | | value | count | Number of elements to receive
type(ncclDataType) | | | value | datatype | Datatype to receive
integer(kind=c_int) | | | value | peer | Source GPU
type(ncclComm) | | | value | comm | Communicator
type(dtfft_stream_t) | | | value | stream | CUDA stream

Return value: Completion status.
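A minimal sketch of a fused exchange: each rank sends to its successor and receives from its predecessor in a ring, with both calls enclosed in the group section described below so they can progress concurrently. The subroutine and module names are illustrative assumptions; sendbuff and recvbuff stand for the first elements of GPU-resident buffers, and how the dtfft_stream_t stream is obtained is not shown.

```fortran
subroutine ring_exchange(sendbuff, recvbuff, count, rank, nranks, comm, stream)
  ! `nccl_interfaces` is an assumed module name; dtfft_stream_t, ncclComm and
  ! ncclFloat are assumed to be reachable through this use association.
  use iso_c_binding, only: c_float, c_size_t, c_int, c_int32_t
  use nccl_interfaces
  implicit none
  real(c_float)                    :: sendbuff, recvbuff  ! first elements of device buffers
  integer(c_size_t),    intent(in) :: count
  integer(c_int),       intent(in) :: rank, nranks
  type(ncclComm),       intent(in) :: comm
  type(dtfft_stream_t), intent(in) :: stream
  integer(c_int32_t) :: nccl_status
  integer(c_int)     :: next, prev

  next = mod(rank + 1_c_int, nranks)
  prev = mod(rank - 1_c_int + nranks, nranks)

  ! The send and the matching receive must progress concurrently, so both
  ! calls are fused inside a single group section.
  nccl_status = ncclGroupStart()
  nccl_status = ncclSend(sendbuff, count, ncclFloat, next, comm, stream)
  nccl_status = ncclRecv(recvbuff, count, ncclFloat, prev, comm, stream)
  nccl_status = ncclGroupEnd()

  ! ncclGroupEnd() returns once the operations are enqueued on `stream`;
  ! completion must still be awaited on that stream.
end subroutine ring_exchange
```

Posting both operations inside one group is exactly what the note above requires when a send and its matching receive must make progress at the same time.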
Start a group call.
All subsequent calls to NCCL until ncclGroupEnd will not block due to inter-CPU synchronization.
Completion status
End a group call.
Returns when all operations since ncclGroupStart have been processed. This means the communication primitives have been enqueued to the provided streams, but are not necessarily complete.
Completion status
Destroy a communicator object comm.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(ncclComm) | | | value | comm | Communicator

Return value: Completion status.
Register a buffer for collective communication.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(ncclComm) | | | value | comm | Communicator
type(c_ptr) | | | value | buff | Buffer to register
integer(kind=c_size_t) | | | value | size | Size of the buffer in bytes
type(c_ptr) | | | | handle | Handle to the registered buffer

Return value: Completion status.
Deregister a buffer for collective communication.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
type(ncclComm) | | | value | comm | Communicator
type(c_ptr) | | | value | handle | Handle to the registered buffer

Return value: Completion status.
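A minimal sketch of registering and later deregistering a communication buffer. comm and the buffer are assumed to exist already (for example from ncclCommInitRank and ncclMemAlloc); the subroutine and module names are illustrative.

```fortran
subroutine register_demo(comm, buf, nbytes)
  ! `nccl_interfaces` is an assumed module name for the bindings above.
  use iso_c_binding, only: c_ptr, c_size_t, c_int32_t
  use nccl_interfaces
  implicit none
  type(ncclComm),    intent(in) :: comm
  type(c_ptr),       intent(in) :: buf      ! e.g. obtained from ncclMemAlloc
  integer(c_size_t), intent(in) :: nbytes
  type(c_ptr)        :: handle
  integer(c_int32_t) :: nccl_status

  ! Register the buffer once; the returned handle identifies the registration.
  nccl_status = ncclCommRegister(comm, buf, nbytes, handle)

  ! ... collective communication using the registered buffer ...

  ! Deregister with the handle before the buffer is freed or the
  ! communicator is destroyed.
  nccl_status = ncclCommDeregister(comm, handle)
end subroutine register_demo
```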
Returns a human-readable string corresponding to the passed error code.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=c_int32_t) | intent(in) | | value | ncclResult_t | Completion status of a NCCL function

Return value: Pointer to message.
Components of ncclUniqueId:

Type | Visibility | Attributes | Name | Initial
---|---|---|---|---
character(len=c_char) | public | | internal(128) |
Components of ncclComm:

Type | Visibility | Attributes | Name | Initial
---|---|---|---|---
type(c_ptr) | public | | member |
Components of ncclDataType:

Type | Visibility | Attributes | Name | Initial
---|---|---|---|---
integer(kind=c_int) | public | | member |
Generates an error message.
Type | Intent | Optional | Attributes | Name | Description
---|---|---|---|---|---
integer(kind=c_int32_t) | intent(in) | | | ncclResult_t | Completion status of a function

Return value: Error message.
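A minimal sketch of turning a completion status into readable text through the ncclGetErrorString binding. The helper below, its 256-character message bound, and the assumption that 0 corresponds to ncclSuccess are illustrative; the wrapper documented just above presumably performs a similar conversion internally, but its name is not shown here.

```fortran
subroutine check_nccl(nccl_status)
  ! `nccl_interfaces` is an assumed module name for the bindings above.
  use iso_c_binding, only: c_int32_t, c_char, c_null_char, c_f_pointer
  use nccl_interfaces
  implicit none
  integer(c_int32_t), intent(in) :: nccl_status
  character(kind=c_char), pointer :: chars(:)
  character(len=256) :: msg
  integer :: i

  ! 0 is assumed to correspond to ncclSuccess.
  if ( nccl_status == 0_c_int32_t ) return

  ! Map the NUL-terminated C string returned by ncclGetErrorString onto a
  ! Fortran character array; 256 is an assumed upper bound on its length.
  call c_f_pointer(ncclGetErrorString(nccl_status), chars, [256])
  msg = ''
  do i = 1, len(msg)
    if ( chars(i) == c_null_char ) exit
    msg(i:i) = chars(i)
  end do

  write(*, '(a)') 'NCCL error: '//trim(msg)
  error stop
end subroutine check_nccl
```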