C API Reference

This page describes all types, functions and macros available in the dtFFT C API. To use them, add #include <dtfft.h> to your source file.

Note

Not all of the API listed below is accessible at runtime. For example, dtfft_create_plan_r2c() can only be used if dtFFT was compiled with an external FFT.

Predefined Macros

DTFFT_VERSION_MAJOR

dtFFT Major Version

DTFFT_VERSION_MINOR

dtFFT Minor Version

DTFFT_VERSION_PATCH

dtFFT Patch Version

DTFFT_VERSION_CODE

dtFFT Version Code.

Can be used for version comparison.

DTFFT_VERSION(X, Y, Z)

Generates Version Code based on Major, Minor, Patch.
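
This page does not specify how DTFFT_VERSION packs its arguments. The sketch below uses a hypothetical EXAMPLE_VERSION macro with an assumed decimal packing, only to illustrate why a single integer version code makes comparisons simple. With the real header one would compare DTFFT_VERSION_CODE against DTFFT_VERSION(X, Y, Z) instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing, NOT taken from dtfft.h: minor and patch are assumed
 * to stay below 1000 so that codes order the same way as versions do. */
#define EXAMPLE_VERSION(X, Y, Z) ((int32_t)((X) * 1000000 + (Y) * 1000 + (Z)))

/* Returns 1 if the version code `have` is at least major.minor.patch. */
static int example_version_at_least(int32_t have, int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    assert(minor < 1000 && patch < 1000);
    return have >= EXAMPLE_VERSION(major, minor, patch);
}
```

Because the code is a plain integer, the same comparison also works in preprocessor conditionals such as #if.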

DTFFT_CALL(call)

Safe call macro.

Should be used to check error codes returned by dtFFT.

Writes an error message to stderr and calls MPI_Abort if an error occurs.

Example

DTFFT_CALL( dtfft_transpose(plan, a, b) )

Enumerators

enum dtfft_error_t

This enum lists the different error codes that dtFFT can return.

Values:

enumerator DTFFT_SUCCESS

Successful execution.

enumerator DTFFT_ERROR_MPI_FINALIZED

MPI_Init is not called or MPI_Finalize has already been called.

enumerator DTFFT_ERROR_PLAN_NOT_CREATED

Plan not created.

enumerator DTFFT_ERROR_INVALID_TRANSPOSE_TYPE

Invalid transpose_type provided.

enumerator DTFFT_ERROR_INVALID_N_DIMENSIONS

Invalid Number of dimensions provided.

Valid options are 2 and 3

enumerator DTFFT_ERROR_INVALID_DIMENSION_SIZE

One or more provided dimension sizes <= 0.

enumerator DTFFT_ERROR_INVALID_COMM_TYPE

Invalid communicator type provided.

enumerator DTFFT_ERROR_INVALID_PRECISION

Invalid precision parameter provided.

enumerator DTFFT_ERROR_INVALID_EFFORT

Invalid effort parameter provided.

enumerator DTFFT_ERROR_INVALID_EXECUTOR

Invalid executor parameter provided.

enumerator DTFFT_ERROR_INVALID_COMM_DIMS

The number of dimensions in the provided Cartesian communicator is greater than the number of dimensions passed to the create function.

enumerator DTFFT_ERROR_INVALID_COMM_FAST_DIM

Passed Cartesian communicator with number of processes in 1st (fastest varying) dimension > 1.

enumerator DTFFT_ERROR_MISSING_R2R_KINDS

For R2R plan, kinds parameter must be passed if executor != DTFFT_EXECUTOR_NONE

enumerator DTFFT_ERROR_INVALID_R2R_KINDS

Invalid values detected in kinds parameter.

enumerator DTFFT_ERROR_R2C_TRANSPOSE_PLAN

Transpose plan is not supported in R2C, use R2R or C2C plan instead.

enumerator DTFFT_ERROR_INPLACE_TRANSPOSE

Inplace transpose is not supported.

enumerator DTFFT_ERROR_INVALID_AUX

Invalid aux buffer provided.

enumerator DTFFT_ERROR_INVALID_DIM

Invalid dim passed to dtfft_get_pencil

enumerator DTFFT_ERROR_INVALID_USAGE

Invalid API Usage.

enumerator DTFFT_ERROR_PLAN_IS_CREATED

Attempt to create a plan that has already been created.

enumerator DTFFT_ERROR_R2R_FFT_NOT_SUPPORTED

Selected executor does not support R2R FFTs.

enumerator DTFFT_ERROR_ALLOC_FAILED

Internal call of dtfft_mem_alloc failed.

enumerator DTFFT_ERROR_FREE_FAILED

Internal call of dtfft_mem_free failed.

enumerator DTFFT_ERROR_INVALID_ALLOC_BYTES

Invalid alloc_bytes provided.

enumerator DTFFT_ERROR_DLOPEN_FAILED

Failed to dynamically load library.

enumerator DTFFT_ERROR_DLSYM_FAILED

Failed to dynamically load symbol.

enumerator DTFFT_ERROR_R2C_TRANSPOSE_CALLED

Calling dtfft_transpose for an R2C plan is not allowed.

enumerator DTFFT_ERROR_GPU_INVALID_STREAM

Invalid stream provided.

enumerator DTFFT_ERROR_GPU_INVALID_BACKEND

Invalid GPU backend provided.

enumerator DTFFT_ERROR_GPU_NOT_SET

Multiple MPI processes located on the same host share the same GPU, which is not supported.

enumerator DTFFT_ERROR_VKFFT_R2R_2D_PLAN

When using R2R FFT with the VkFFT executor and a plan that uses Z-slab optimization, the R2R transform kinds must be the same in the X and Y directions.

enumerator DTFFT_ERROR_GPU_BACKENDS_DISABLED

Passed effort == DTFFT_PATIENT, but all GPU backends have been disabled by dtfft_config_t.

enumerator DTFFT_ERROR_NOT_DEVICE_PTR

One of the pointers passed to dtfft_execute or dtfft_transpose cannot be accessed from the device.

enumerator DTFFT_ERROR_NOT_NVSHMEM_PTR

One of the pointers passed to dtfft_execute or dtfft_transpose is not an NVSHMEM pointer.

enumerator DTFFT_ERROR_INVALID_PLATFORM

Invalid platform provided.

enumerator DTFFT_ERROR_INVALID_PLATFORM_EXECUTOR_TYPE

Invalid executor provided for selected platform.


enum dtfft_execute_t

This enum lists valid execute_type parameters that can be passed to dtfft_execute.

Values:

enumerator DTFFT_EXECUTE_FORWARD

Perform XYZ -> YXZ -> ZXY plan execution (Forward)

enumerator DTFFT_EXECUTE_BACKWARD

Perform ZXY -> YXZ -> XYZ plan execution (Backward)


enum dtfft_transpose_t

This enum lists valid transpose_type parameters that can be passed to dtfft_transpose.

Values:

enumerator DTFFT_TRANSPOSE_X_TO_Y

Transpose from Fortran X aligned to Fortran Y aligned.

enumerator DTFFT_TRANSPOSE_Y_TO_X

Transpose from Fortran Y aligned to Fortran X aligned.

enumerator DTFFT_TRANSPOSE_Y_TO_Z

Transpose from Fortran Y aligned to Fortran Z aligned.

enumerator DTFFT_TRANSPOSE_Z_TO_Y

Transpose from Fortran Z aligned to Fortran Y aligned.

enumerator DTFFT_TRANSPOSE_X_TO_Z

Transpose from Fortran X aligned to Fortran Z aligned.

Note

This value is valid only for a 3D plan, and the value returned by dtfft_get_z_slab_enabled must be true.

enumerator DTFFT_TRANSPOSE_Z_TO_X

Transpose from Fortran Z aligned to Fortran X aligned.

Note

This value is valid only for a 3D plan, and the value returned by dtfft_get_z_slab_enabled must be true.


enum dtfft_precision_t

This enum lists valid precision values that can be passed while creating plan.

Values:

enumerator DTFFT_SINGLE

Use Single precision.

enumerator DTFFT_DOUBLE

Use Double precision.


enum dtfft_effort_t

This enum lists valid effort values that can be passed while creating plan.

Values:

enumerator DTFFT_ESTIMATE

Create plan as fast as possible.

enumerator DTFFT_MEASURE

Will attempt to find best MPI Grid decomposition.

Passing this flag together with an MPI communicator that has a Cartesian topology to dtfft_create_plan_* is the same as passing DTFFT_ESTIMATE.

enumerator DTFFT_PATIENT

Same as DTFFT_MEASURE plus cycle through various send and receive MPI_Datatypes.

For a GPU build, this flag also runs an autotune procedure to find the best backend.


enum dtfft_executor_t

This enum lists available FFT executors.

Values:

enumerator DTFFT_EXECUTOR_NONE

Do not create any FFT plans.

Creates transpose only plan.

enumerator DTFFT_EXECUTOR_FFTW3

FFTW3 Executor (Host only)

enumerator DTFFT_EXECUTOR_MKL

MKL DFTI Executor (Host only)

enumerator DTFFT_EXECUTOR_CUFFT

CUFFT Executor (GPU Only)

enumerator DTFFT_EXECUTOR_VKFFT

VkFFT Executor (GPU Only)


enum dtfft_r2r_kind_t

This enum lists the different R2R FFT kinds.

Values:

enumerator DTFFT_DCT_1

DCT-I (Logical N=2*(n-1), inverse is DTFFT_DCT_1)

enumerator DTFFT_DCT_2

DCT-II (Logical N=2*n, inverse is DTFFT_DCT_3)

enumerator DTFFT_DCT_3

DCT-III (Logical N=2*n, inverse is DTFFT_DCT_2)

enumerator DTFFT_DCT_4

DCT-IV (Logical N=2*n, inverse is DTFFT_DCT_4)

enumerator DTFFT_DST_1

DST-I (Logical N=2*(n+1), inverse is DTFFT_DST_1)

enumerator DTFFT_DST_2

DST-II (Logical N=2*n, inverse is DTFFT_DST_3)

enumerator DTFFT_DST_3

DST-III (Logical N=2*n, inverse is DTFFT_DST_2)

enumerator DTFFT_DST_4

DST-IV (Logical N=2*n, inverse is DTFFT_DST_4)
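
The logical sizes and inverse pairs listed above can be restated as a small lookup. The following is a minimal sketch that uses a local enum with hypothetical names (KIND_*) in place of dtfft_r2r_kind_t, so that it compiles without dtfft.h. In FFTW's convention a transform followed by its inverse scales the data by the logical N; whether dtFFT normalizes results is not stated on this page.

```c
#include <assert.h>

/* Local stand-in for the R2R kinds listed above; names are hypothetical. */
typedef enum {
    KIND_DCT_1, KIND_DCT_2, KIND_DCT_3, KIND_DCT_4,
    KIND_DST_1, KIND_DST_2, KIND_DST_3, KIND_DST_4
} example_r2r_kind;

/* Logical size N of the equivalent DFT for a transform of n points. */
static long example_logical_size(example_r2r_kind kind, long n)
{
    assert(n > 0);
    switch (kind) {
    case KIND_DCT_1: assert(n > 1); return 2 * (n - 1);
    case KIND_DST_1: return 2 * (n + 1);
    default:         return 2 * n;   /* types II, III and IV */
    }
}

/* Kind that inverts `kind`: II and III are mutual inverses,
 * I and IV are their own inverses. */
static example_r2r_kind example_inverse_kind(example_r2r_kind kind)
{
    switch (kind) {
    case KIND_DCT_2: return KIND_DCT_3;
    case KIND_DCT_3: return KIND_DCT_2;
    case KIND_DST_2: return KIND_DST_3;
    case KIND_DST_3: return KIND_DST_2;
    default:         return kind;
    }
}
```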


enum dtfft_backend_t

This enum lists the different available GPU backend options.

Values:

enumerator DTFFT_BACKEND_MPI_DATATYPE

Backend that uses MPI datatypes.

Not recommended, since it is far slower than the other backends. It is included only to show how slow MPI datatypes are for GPU usage.

enumerator DTFFT_BACKEND_MPI_P2P

MPI peer-to-peer algorithm.

enumerator DTFFT_BACKEND_MPI_P2P_PIPELINED

MPI peer-to-peer algorithm with overlapping data copying and unpacking.

enumerator DTFFT_BACKEND_MPI_A2A

MPI backend using MPI_Alltoallv.

enumerator DTFFT_BACKEND_NCCL

NCCL backend.

enumerator DTFFT_BACKEND_NCCL_PIPELINED

NCCL backend with overlapping data copying and unpacking.

enumerator DTFFT_BACKEND_CUFFTMP

cuFFTMp backend


enum dtfft_platform_t

Enum that specifies the execution platform, such as Host, CUDA, or HIP.

Values:

enumerator DTFFT_PLATFORM_HOST

Host.

enumerator DTFFT_PLATFORM_CUDA

CUDA.

Types

typedef struct dtfft_plan_private_t *dtfft_plan_t

Structure to hold plan data.


struct dtfft_pencil_t

Structure to hold pencil decomposition info.

See also

dtfft_get_pencil

Public Members

uint8_t dim

Aligned dimension id starting from 1.

uint8_t ndims

Number of dimensions in a pencil.

int32_t starts[3]

Local starts in natural Fortran order.

int32_t counts[3]

Local counts in natural Fortran order.

size_t size

Total number of elements in a pencil.
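
To make the relationship between the members concrete, here is a minimal sketch built on a hypothetical local mirror of dtfft_pencil_t (example_pencil), so that it compiles without dtfft.h. It assumes that size equals the product of the first ndims entries of counts, and that starts and the index being tested use the same base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of dtfft_pencil_t. */
typedef struct {
    uint8_t dim;
    uint8_t ndims;
    int32_t starts[3];
    int32_t counts[3];
    size_t  size;
} example_pencil;

/* Product of the local counts; assumed here to match the `size` member. */
static size_t example_pencil_elements(const example_pencil *p)
{
    assert(p->ndims == 2 || p->ndims == 3);
    size_t n = 1;
    for (uint8_t d = 0; d < p->ndims; d++) {
        assert(p->counts[d] >= 0);
        n *= (size_t)p->counts[d];
    }
    return n;
}

/* Returns 1 if global index g along dimension d lies inside this pencil. */
static int example_pencil_owns(const example_pencil *p, uint8_t d, int32_t g)
{
    assert(d < p->ndims);
    return g >= p->starts[d] && g < p->starts[d] + p->counts[d];
}
```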


struct dtfft_config_t

Struct that can be used to set additional configuration parameters to dtFFT.

Public Members

bool enable_z_slab

Should dtFFT use Z-slab optimization or not.

Default is true

Consider disabling Z-slab optimization to resolve the DTFFT_ERROR_VKFFT_R2R_2D_PLAN error, or when the underlying FFT implementation of the 2D plan is too slow.

In all other cases Z-slab is expected to be faster, since it reduces the number of data transpositions.

dtfft_platform_t platform

Selects platform to execute plan.

Default is DTFFT_PLATFORM_HOST

This option is only defined in a build with device support. Even when dtFFT is built with device support, it does not necessarily mean that all plans must be device-related.

dtfft_stream_t stream

Main CUDA stream that will be used in dtFFT.

This parameter is a placeholder that lets the user set a custom stream. The stream actually used by a dtFFT plan is returned by the dtfft_get_stream function. A user who sets a stream is responsible for destroying it.

Stream must not be destroyed before call to dtfft_destroy.

dtfft_backend_t backend

Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

Default is DTFFT_BACKEND_NCCL

bool enable_mpi_backends

Should MPI GPU Backends be enabled when effort is DTFFT_PATIENT or not.

Default is false

MPI backends are disabled by default during the autotuning process due to an OpenMPI bug: https://github.com/open-mpi/ompi/issues/12849. It was noticed that GPU memory is not freed completely during plan autotuning.

For example: for a 1024x1024x512 C2C plan in double precision on a single GPU, using Z-slab optimization and with MPI backends enabled, plan autotuning leaks 8 GB of GPU memory. Without Z-slab optimization, running on 4 GPUs, it leaks 24 GB on each of the GPUs.

One workaround is to disable MPI backends by default, which is done here.

Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but it was noticed that disabling CUDA IPC seriously affects the overall performance of MPI algorithms.

bool enable_pipelined_backends

Should pipelined GPU backends be enabled when effort is DTFFT_PATIENT or not.

Default is true

Note

Pipelined backends require additional buffer that user has no control over.

bool enable_nccl_backends

Should NCCL Backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.

bool enable_nvshmem_backends

Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or not.

Default is true.


typedef void *dtfft_stream_t

dtFFT stream representation.

For the CUDA platform this should be cast from cudaStream_t.

Example

cudaStream_t stream;
cudaStreamCreate(&stream);
dtfft_stream_t dtfftStream = (dtfft_stream_t)stream;

Functions

int32_t dtfft_get_version()

Returns:

Version Code defined during compilation


const char *dtfft_get_error_string(const dtfft_error_t error_code)

Gets the string description of an error code.

Parameters:

error_code[in] Error code to convert to string

Returns:

Error string explaining error.


const char *dtfft_get_backend_string(const dtfft_backend_t backend)

Returns a null-terminated string with the name of the backend provided as argument.

Parameters:

backend[in] Backend to represent

Returns:

Character representation of backend.


dtfft_error_t dtfft_create_config(dtfft_config_t *config)

Sets default values to config.

Parameters:

config[out] Config to set default values into

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_set_config(dtfft_config_t config)

Set configuration values to dtFFT.

To take effect, this function must be called before plan creation.

Parameters:

config[in] Config to set

Returns:

DTFFT_SUCCESS on success or error code on failure.

Plan constructors

All plan constructors must be called after MPI_Init. A plan must be destroyed before the call to MPI_Finalize.

dtfft_error_t dtfft_create_plan_r2r(const int8_t ndims, const int32_t *dims, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)

Real-to-Real Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Array of size ndims containing global dimensions in reverse order; dims[0] must be the fastest varying

  • kinds[in] Array of size ndims containing Real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor.

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise


dtfft_error_t dtfft_create_plan_c2c(const int8_t ndims, const int32_t *dims, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)

Complex-to-Complex Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Array of size ndims containing global dimensions in reverse order

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor.

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise


dtfft_error_t dtfft_create_plan_r2c(const int8_t ndims, const int32_t *dims, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)

Real-to-Complex Plan constructor.

Note

Parameter executor cannot be DTFFT_EXECUTOR_NONE. Use C2C plan instead.

Note

This function is only present in the API when dtFFT was compiled with any external FFT.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Array of size ndims containing global dimensions in reverse order

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise

Plan destructor

dtfft_error_t dtfft_destroy(dtfft_plan_t *plan)

Plan Destructor.

Parameters:

plan[inout] Plan handle

Returns:

DTFFT_SUCCESS on success or error code on failure.

Memory allocation

dtfft_error_t dtfft_mem_alloc(dtfft_plan_t plan, size_t alloc_bytes, void **ptr)

Allocates memory specific for this plan.

Parameters:
  • plan[in] Plan handle

  • alloc_bytes[in] Number of bytes to allocate

  • ptr[out] Allocated pointer

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_mem_free(dtfft_plan_t plan, void *ptr)

Frees memory specific for this plan.

Parameters:
  • plan[in] Plan handle

  • ptr[inout] Allocated pointer

Returns:

DTFFT_SUCCESS on success or error code on failure.

Plan execution

dtfft_error_t dtfft_execute(dtfft_plan_t plan, void *in, void *out, const dtfft_execute_t execute_type, void *aux)

Plan execution.

Neither in nor out are allowed to be NULL. The same pointer can safely be passed to both in and out.

Parameters:
  • plan[in] Plan handle

  • in[inout] Incoming buffer

  • out[out] Result buffer

  • execute_type[in] Type of transform.

  • aux[inout] Optional auxiliary buffer. Can be NULL. If it is NULL during the first call to this function, the auxiliary buffer is allocated internally and freed by the call to dtfft_destroy

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_transpose(dtfft_plan_t plan, void *in, void *out, const dtfft_transpose_t transpose_type)

Transposes data in a single dimension, e.g. X align -> Y align.

Attention

in and out must not be the same pointer.

Parameters:
  • plan[in] Plan handle

  • in[inout] Incoming buffer

  • out[out] Transposed buffer

  • transpose_type[in] Type of transpose.

Returns:

DTFFT_SUCCESS on success or error code on failure.

Plan information

dtfft_error_t dtfft_report(dtfft_plan_t plan)

Prints plan-related information to stdout.

Parameters:

plan[in] Plan handle

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_local_sizes(dtfft_plan_t plan, int32_t *in_starts, int32_t *in_counts, int32_t *out_starts, int32_t *out_counts, size_t *alloc_size)

Get grid decomposition information.

Results may differ on different MPI processes

Parameters:
  • plan[in] Plan handle

  • in_starts[out] Starts of local portion of data in real space in reversed order

  • in_counts[out] Number of elements of local portion of data in real space in reversed order

  • out_starts[out] Starts of local portion of data in fourier space in reversed order

  • out_counts[out] Number of elements of local portion of data in fourier space in reversed order

  • alloc_size[out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling dtfft_get_element_size.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_alloc_size(dtfft_plan_t plan, size_t *alloc_size)

Wrapper around dtfft_get_local_sizes to obtain number of elements only.

Parameters:
  • plan[in] Plan handle

  • alloc_size[out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling dtfft_get_element_size.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_element_size(dtfft_plan_t plan, size_t *element_size)

Obtains number of bytes required to store single element by this plan.

Parameters:
  • plan[in] Plan handle

  • element_size[out] Size of element in bytes

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_alloc_bytes(dtfft_plan_t plan, size_t *alloc_bytes)

Returns minimum number of bytes required to execute plan.

This function is a combination of two calls: dtfft_get_alloc_size and dtfft_get_element_size

Parameters:
  • plan[in] Plan handle

  • alloc_bytes[out] Number of bytes required

Returns:

DTFFT_SUCCESS on success or error code on failure.
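
The combination described for dtfft_get_alloc_bytes can be pictured as a product of the element count and the element size. The sketch below assumes exactly that and adds an overflow guard; example_alloc_bytes is a hypothetical helper, not part of dtFFT, and it takes the two values as plain arguments rather than querying a plan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplies an element count by an element size.
 * Returns 0 on success, or -1 (leaving *alloc_bytes untouched)
 * if the product would overflow size_t. */
static int example_alloc_bytes(size_t alloc_size, size_t element_size,
                               size_t *alloc_bytes)
{
    assert(alloc_bytes != NULL);
    if (element_size != 0 && alloc_size > SIZE_MAX / element_size)
        return -1;
    *alloc_bytes = alloc_size * element_size;
    return 0;
}
```

For instance, 1024 elements of 16 bytes each (the size of a double-precision complex value) need 16384 bytes.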


dtfft_error_t dtfft_get_pencil(dtfft_plan_t plan, int32_t dim, dtfft_pencil_t *pencil)

Obtains pencil information from plan.

This can be useful when the user wants to use their own FFT implementation that is unavailable in dtFFT.

Parameters:
  • plan[in] Plan handle

  • dim[in] Required dimension:

    • 0 for XYZ layout (real space, R2C only)

    • 1 for XYZ layout

    • 2 for YXZ layout

    • 3 for ZXY layout

  • pencil[out] Pencil data

Returns:

DTFFT_SUCCESS on success or error code on failure.
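
The layout names for dim 1, 2 and 3 can be read as axis permutations. The sketch below assumes that the first letter of a layout name is the aligned, fastest varying axis; that reading is an assumption about the naming, and example_layout_sizes is a hypothetical helper rather than a dtFFT function. It only reorders global sizes and says nothing about how a plan splits data across processes.

```c
#include <assert.h>
#include <stdint.h>

/* Writes the sizes {nx, ny, nz} from `natural` in the axis order of a
 * 3D layout: dim 1 -> XYZ, dim 2 -> YXZ, dim 3 -> ZXY. */
static void example_layout_sizes(const int32_t natural[3], int dim,
                                 int32_t out[3])
{
    /* axis[dim - 1][i] is the natural axis stored at position i. */
    static const int axis[3][3] = { {0, 1, 2}, {1, 0, 2}, {2, 0, 1} };
    assert(dim >= 1 && dim <= 3);
    for (int i = 0; i < 3; i++)
        out[i] = natural[axis[dim - 1][i]];
}
```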


dtfft_error_t dtfft_get_z_slab_enabled(dtfft_plan_t plan, bool *is_z_slab_enabled)

Checks if plan is using Z-slab optimization.

If true then flags DTFFT_TRANSPOSE_X_TO_Z and DTFFT_TRANSPOSE_Z_TO_X will be valid to pass to dtfft_transpose.

Parameters:
  • plan[in] Plan handle

  • is_z_slab_enabled[out] Boolean value if Z-slab is used.

Returns:

DTFFT_SUCCESS on success or error code on failure.
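
The rule tying the X/Z transposes to Z-slab can be written as a small predicate. This is a minimal sketch with a local enum whose names (EX_*) are hypothetical stand-ins for dtfft_transpose_t. It encodes the rule stated on this page for X <-> Z, plus the assumption that Y <-> Z transposes need a 3D plan because a 2D plan has no Z axis.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the transpose types; names are hypothetical. */
typedef enum {
    EX_X_TO_Y, EX_Y_TO_X, EX_Y_TO_Z, EX_Z_TO_Y, EX_X_TO_Z, EX_Z_TO_X
} example_transpose;

/* Returns true if transpose `t` may be requested for a plan with the
 * given number of dimensions and Z-slab state. */
static bool example_transpose_allowed(example_transpose t, int ndims,
                                      bool z_slab_enabled)
{
    assert(ndims == 2 || ndims == 3);
    switch (t) {
    case EX_X_TO_Z:
    case EX_Z_TO_X:
        /* Stated on this page: 3D plan and Z-slab enabled. */
        return ndims == 3 && z_slab_enabled;
    case EX_Y_TO_Z:
    case EX_Z_TO_Y:
        /* Assumption: a Z axis exists only in 3D plans. */
        return ndims == 3;
    default:
        return true;
    }
}
```

In real code the z_slab_enabled argument would come from dtfft_get_z_slab_enabled.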


dtfft_error_t dtfft_get_stream(dtfft_plan_t plan, dtfft_stream_t *stream)

Returns stream associated with dtFFT plan.

This can either be stream passed by user to dtfft_set_config or stream created internally. Returns NULL pointer if plan’s platform is DTFFT_PLATFORM_HOST.

Parameters:
  • plan[in] Plan handle

  • stream[out] CUDA stream associated with plan

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_backend(dtfft_plan_t plan, dtfft_backend_t *backend)

Returns the GPU backend selected during autotuning if effort is DTFFT_PATIENT.

If effort passed to any create function is DTFFT_ESTIMATE or DTFFT_MEASURE returns value set by dtfft_set_config or default value, which is DTFFT_BACKEND_NCCL.

Parameters:
  • plan[in] Plan handle

  • backend[out] Selected backend

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_platform(dtfft_plan_t plan, dtfft_platform_t *platform)

Returns plan execution platform.

Parameters:
  • plan[in] Plan handle

  • platform[out] Plan platform

Returns:

DTFFT_SUCCESS on success or error code on failure.