C API Reference

This page describes all types, functions and macros available in the dtFFT C API. To use them, #include <dtfft.h>.

Note

Not all of the API listed below is available at runtime. For example, dtfft_platform_t can only be used if dtFFT is compiled with CUDA support.

Predefined Macros

DTFFT_VERSION_MAJOR

dtFFT Major Version

DTFFT_VERSION_MINOR

dtFFT Minor Version

DTFFT_VERSION_PATCH

dtFFT Patch Version

DTFFT_VERSION_CODE

dtFFT Version Code.

Can be used for version comparison

DTFFT_VERSION(X, Y, Z)

Generates Version Code based on Major, Minor, Patch.

DTFFT_CALL(call)

Safe call macro.

Should be used to check error codes returned by dtFFT.

Writes an error message to stderr and calls MPI_Abort if an error occurs.

Example

DTFFT_CALL( dtfft_transpose(plan, a, b, DTFFT_TRANSPOSE_X_TO_Y, NULL) );

Enumerators

enum dtfft_error_t

This enum lists the different error codes that dtFFT can return.

Values:

enumerator DTFFT_SUCCESS

Successful execution.

enumerator DTFFT_ERROR_MPI_FINALIZED

MPI_Init has not been called, or MPI_Finalize has already been called.

enumerator DTFFT_ERROR_PLAN_NOT_CREATED

Plan not created.

enumerator DTFFT_ERROR_INVALID_TRANSPOSE_TYPE

Invalid transpose_type provided.

enumerator DTFFT_ERROR_INVALID_N_DIMENSIONS

Invalid number of dimensions provided. Valid options are 2 and 3.

enumerator DTFFT_ERROR_INVALID_DIMENSION_SIZE

One or more provided dimension sizes <= 0.

enumerator DTFFT_ERROR_INVALID_COMM_TYPE

Invalid communicator type provided.

enumerator DTFFT_ERROR_INVALID_PRECISION

Invalid precision parameter provided.

enumerator DTFFT_ERROR_INVALID_EFFORT

Invalid effort parameter provided.

enumerator DTFFT_ERROR_INVALID_EXECUTOR

Invalid executor parameter provided.

enumerator DTFFT_ERROR_INVALID_COMM_DIMS

The number of dimensions in the provided Cartesian communicator exceeds the number of dimensions passed to the plan constructor.

enumerator DTFFT_ERROR_INVALID_COMM_FAST_DIM

The provided Cartesian communicator has more than one process in the first (fastest varying) dimension.

enumerator DTFFT_ERROR_MISSING_R2R_KINDS

For R2R plans, the kinds parameter must be passed if executor != DTFFT_EXECUTOR_NONE.

enumerator DTFFT_ERROR_INVALID_R2R_KINDS

Invalid values detected in kinds parameter.

enumerator DTFFT_ERROR_R2C_TRANSPOSE_PLAN

Transpose plan is not supported in R2C, use R2R or C2C plan instead.

enumerator DTFFT_ERROR_INPLACE_TRANSPOSE

In-place transpose is not supported.

enumerator DTFFT_ERROR_INVALID_AUX

Invalid aux buffer provided.

enumerator DTFFT_ERROR_INVALID_LAYOUT

Invalid layout passed to dtfft_get_pencil

enumerator DTFFT_ERROR_INVALID_USAGE

Invalid API Usage.

enumerator DTFFT_ERROR_PLAN_IS_CREATED

Trying to create already created plan.

enumerator DTFFT_ERROR_R2R_FFT_NOT_SUPPORTED

The selected executor does not support R2R FFTs.

enumerator DTFFT_ERROR_ALLOC_FAILED

Internal call of dtfft_mem_alloc failed.

enumerator DTFFT_ERROR_FREE_FAILED

Internal call of dtfft_mem_free failed.

enumerator DTFFT_ERROR_INVALID_ALLOC_BYTES

Invalid alloc_bytes provided.

enumerator DTFFT_ERROR_DLOPEN_FAILED

Failed to dynamically load library.

enumerator DTFFT_ERROR_DLSYM_FAILED

Failed to dynamically load symbol.

enumerator DTFFT_ERROR_PENCIL_ARRAYS_SIZE_MISMATCH


Sizes of starts and counts arrays passed to dtfft_pencil_t constructor do not match

enumerator DTFFT_ERROR_PENCIL_ARRAYS_INVALID_SIZES

Sizes of starts and counts < 2 or > 3 provided to dtfft_pencil_t constructor.

enumerator DTFFT_ERROR_PENCIL_INVALID_COUNTS

Invalid counts provided to dtfft_pencil_t constructor.

enumerator DTFFT_ERROR_PENCIL_INVALID_STARTS

Invalid starts provided to dtfft_pencil_t constructor.

enumerator DTFFT_ERROR_PENCIL_SHAPE_MISMATCH

Processes have the same lower bounds but different sizes in some dimensions.

enumerator DTFFT_ERROR_PENCIL_OVERLAP

Pencil overlap detected, i.e. two processes share the same part of the global space.

enumerator DTFFT_ERROR_PENCIL_NOT_CONTINUOUS

Local pencils do not cover the global space without gaps.

enumerator DTFFT_ERROR_PENCIL_NOT_INITIALIZED

Pencil is not initialized, i.e. the constructor subroutine was not called.

enumerator DTFFT_ERROR_INVALID_MEASURE_WARMUP_ITERS

Invalid n_measure_warmup_iters provided.

enumerator DTFFT_ERROR_INVALID_MEASURE_ITERS

Invalid n_measure_iters provided.

enumerator DTFFT_ERROR_INVALID_REQUEST

Invalid dtfft_request_t provided.

enumerator DTFFT_ERROR_TRANSPOSE_ACTIVE

Attempting to execute an already active transposition.

enumerator DTFFT_ERROR_TRANSPOSE_NOT_ACTIVE

Attempting to finalize a non-active transposition.

enumerator DTFFT_ERROR_INVALID_RESHAPE_TYPE

Invalid reshape_type provided.

enumerator DTFFT_ERROR_RESHAPE_ACTIVE

Attempting to execute an already active reshape.

enumerator DTFFT_ERROR_RESHAPE_NOT_ACTIVE

Attempting to finalize a non-active reshape.

enumerator DTFFT_ERROR_INPLACE_RESHAPE

In-place reshape is not supported.

enumerator DTFFT_ERROR_INVALID_EXECUTE_TYPE


Invalid execute_type provided

enumerator DTFFT_ERROR_RESHAPE_NOT_SUPPORTED

Reshape is not supported for this plan.

enumerator DTFFT_ERROR_R2C_EXECUTE_CALLED

Execute called for transpose-only R2C Plan.

enumerator DTFFT_ERROR_INVALID_CART_COMM

Invalid Cartesian communicator provided.

enumerator DTFFT_ERROR_INVALID_TRANSPOSE_MODE

Invalid transpose mode provided.

enumerator DTFFT_ERROR_GPU_INVALID_STREAM

Invalid stream provided.

enumerator DTFFT_ERROR_INVALID_BACKEND

Invalid backend provided.

enumerator DTFFT_ERROR_GPU_NOT_SET

Multiple MPI Processes located on same host share same GPU which is not supported.

enumerator DTFFT_ERROR_VKFFT_R2R_2D_PLAN

When using R2R FFTs with the vkFFT executor in a plan that uses Z-slab optimization, the R2R kinds must be the same in the X and Y directions.

enumerator DTFFT_ERROR_BACKENDS_DISABLED

Passed effort == DTFFT_PATIENT, but all backends have been disabled by dtfft_config_t.

enumerator DTFFT_ERROR_NOT_DEVICE_PTR

One of pointers passed to dtfft_execute or dtfft_transpose cannot be accessed from device.

enumerator DTFFT_ERROR_NOT_NVSHMEM_PTR

One of pointers passed to dtfft_execute or dtfft_transpose is not an NVSHMEM pointer.

enumerator DTFFT_ERROR_INVALID_PLATFORM

Invalid platform provided.

enumerator DTFFT_ERROR_INVALID_PLATFORM_EXECUTOR

Invalid executor provided for selected platform.

enumerator DTFFT_ERROR_INVALID_PLATFORM_BACKEND

Invalid backend provided for selected platform.

enumerator DTFFT_ERROR_INVALID_ACCESS_MODE

Invalid access mode provided.

enumerator DTFFT_ERROR_COMPRESSION_CUDA_NOT_SUPPORTED

CUDA support is not available for compression.

enumerator DTFFT_ERROR_COMPRESSION_INVALID_RATE

Invalid compression rate.

enumerator DTFFT_ERROR_COMPRESSION_INVALID_PRECISION

Invalid compression precision.

enumerator DTFFT_ERROR_COMPRESSION_INVALID_TOLERANCE

Invalid compression tolerance.

enumerator DTFFT_ERROR_COMPRESSION_INVALID_MODE

Invalid compression mode.

enumerator DTFFT_ERROR_COMPRESSION_INVALID_LIBRARY

Invalid compression library.

enumerator DTFFT_ERROR_COMPRESSION_NOT_USED

Compressed backends are not used for this plan.


enum dtfft_execute_t

This enum lists valid execute_type parameters that can be passed to dtfft_execute.

Values:

enumerator DTFFT_EXECUTE_FORWARD

Perform XYZ -> YZX -> ZXY plan execution (Forward)

enumerator DTFFT_EXECUTE_BACKWARD

Perform ZXY -> YZX -> XYZ plan execution (Backward)
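Reading each triplet above as the storage order of the axes, fastest axis first, a forward execution rotates that order one position to the left per transposition, and a backward execution rotates it one position to the right. The sketch below only illustrates this notation; rotate_left and rotate_right are hypothetical helpers, not part of dtFFT.

```c
#include <string.h>

/* Hypothetical helper: rotate a 3-letter axis order (fastest axis first)
 * one position to the left, e.g. "XYZ" -> "YZX". Models one forward step. */
static void rotate_left(char order[4])
{
    char first = order[0];
    order[0] = order[1];
    order[1] = order[2];
    order[2] = first;
    order[3] = '\0';
}

/* Hypothetical helper: rotate one position to the right, e.g. "ZXY" -> "YZX".
 * Models one backward step. */
static void rotate_right(char order[4])
{
    char last = order[2];
    order[2] = order[1];
    order[1] = order[0];
    order[0] = last;
    order[3] = '\0';
}
```

Starting from "XYZ", two calls to rotate_left give "ZXY" (the forward result), and two calls to rotate_right bring it back to "XYZ".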


enum dtfft_transpose_t

This enum lists valid transpose_type parameters that can be passed to dtfft_transpose.

Values:

enumerator DTFFT_TRANSPOSE_X_TO_Y

Transpose from Fortran X aligned to Fortran Y aligned.

enumerator DTFFT_TRANSPOSE_Y_TO_X

Transpose from Fortran Y aligned to Fortran X aligned.

enumerator DTFFT_TRANSPOSE_Y_TO_Z

Transpose from Fortran Y aligned to Fortran Z aligned.

enumerator DTFFT_TRANSPOSE_Z_TO_Y

Transpose from Fortran Z aligned to Fortran Y aligned.

enumerator DTFFT_TRANSPOSE_X_TO_Z

Transpose from Fortran X aligned to Fortran Z aligned.

Note

This value is valid only for 3D plans, and only when dtfft_get_z_slab_enabled returns true.

enumerator DTFFT_TRANSPOSE_Z_TO_X

Transpose from Fortran Z aligned to Fortran X aligned.

Note

This value is valid only for 3D plans, and only when dtfft_get_z_slab_enabled returns true.


enum dtfft_precision_t

This enum lists valid precision values that can be passed when creating a plan.

Values:

enumerator DTFFT_SINGLE

Use Single precision.

enumerator DTFFT_DOUBLE

Use Double precision.


enum dtfft_effort_t

This enum lists valid effort values that can be passed when creating a plan.

Values:

enumerator DTFFT_ESTIMATE

Create plan as fast as possible.

enumerator DTFFT_MEASURE

Attempts to find the best MPI grid decomposition.

Passing this flag together with an MPI communicator that has a Cartesian topology to dtfft_create_plan_* is equivalent to DTFFT_ESTIMATE.

enumerator DTFFT_PATIENT

Same as DTFFT_MEASURE; in addition, autotuning tries to find the best backend.

enumerator DTFFT_EXHAUSTIVE

Same as DTFFT_PATIENT; in addition, autotuning covers all possible kernels and reshape backends to find the best configuration.
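To give a feel for the search space behind DTFFT_MEASURE, the sketch below enumerates candidate process grids for a 3D plan. It assumes that the fastest varying dimension stays undistributed (a Cartesian communicator violating this is rejected with DTFFT_ERROR_INVALID_COMM_FAST_DIM), so candidates have the shape 1 x p1 x p2 with p1 * p2 == P. This is an illustration only, not dtFFT's actual selection algorithm, and enumerate_grids is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: enumerate candidate 1 x p1 x p2 process grids for a
 * 3D plan on nprocs processes. Stores at most max_grids (p1, p2) pairs in
 * p1s/p2s and returns the total number of candidates found. */
static size_t enumerate_grids(int nprocs, int *p1s, int *p2s, size_t max_grids)
{
    size_t count = 0;
    for (int p1 = 1; p1 <= nprocs; ++p1) {
        if (nprocs % p1 != 0)
            continue;               /* p1 must divide nprocs exactly */
        if (count < max_grids) {
            p1s[count] = p1;
            p2s[count] = nprocs / p1;
        }
        ++count;
    }
    return count;
}
```

For 12 processes this yields six candidates, from 1 x 1 x 12 to 1 x 12 x 1; a measuring search would then time each one and keep the fastest.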


enum dtfft_executor_t

This enum lists available FFT executors.

Values:

enumerator DTFFT_EXECUTOR_NONE

Do not create any FFT plans; creates a transpose-only plan.

enumerator DTFFT_EXECUTOR_FFTW3

FFTW3 Executor (Host only)

enumerator DTFFT_EXECUTOR_MKL

MKL DFTI Executor (Host only)

enumerator DTFFT_EXECUTOR_CUFFT

cuFFT Executor (GPU only)

enumerator DTFFT_EXECUTOR_VKFFT

VkFFT Executor (GPU only)


enum dtfft_r2r_kind_t

This enum lists the different R2R FFT kinds.

Values:

enumerator DTFFT_DCT_1

DCT-I (Logical N=2*(n-1), inverse is DTFFT_DCT_1)

enumerator DTFFT_DCT_2

DCT-II (Logical N=2*n, inverse is DTFFT_DCT_3)

enumerator DTFFT_DCT_3

DCT-III (Logical N=2*n, inverse is DTFFT_DCT_2)

enumerator DTFFT_DCT_4

DCT-IV (Logical N=2*n, inverse is DTFFT_DCT_4)

enumerator DTFFT_DST_1

DST-I (Logical N=2*(n+1), inverse is DTFFT_DST_1)

enumerator DTFFT_DST_2

DST-II (Logical N=2*n, inverse is DTFFT_DST_3)

enumerator DTFFT_DST_3

DST-III (Logical N=2*n, inverse is DTFFT_DST_2)

enumerator DTFFT_DST_4

DST-IV (Logical N=2*n, inverse is DTFFT_DST_4)
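The logical sizes and inverse pairs listed above can be captured in two small helpers. The enum below is a local stand-in that mirrors the order of dtfft_r2r_kind_t in this reference; it is not the real header, and its numeric values are an assumption. If dtFFT follows the FFTW convention that these transforms are unnormalized (an assumption, not stated here), a forward transform followed by its inverse scales the data by the product of the logical sizes over all transformed dimensions.

```c
/* Local stand-in mirroring dtfft_r2r_kind_t; NOT the real header.
 * The numeric values are assumptions. */
typedef enum {
    KIND_DCT_1, KIND_DCT_2, KIND_DCT_3, KIND_DCT_4,
    KIND_DST_1, KIND_DST_2, KIND_DST_3, KIND_DST_4
} r2r_kind;

/* Logical transform size N for a physical size n, as listed above. */
static long logical_size(r2r_kind kind, long n)
{
    switch (kind) {
    case KIND_DCT_1: return 2 * (n - 1);
    case KIND_DST_1: return 2 * (n + 1);
    default:         return 2 * n;   /* DCT/DST types II, III and IV */
    }
}

/* Kind that inverts `kind` (up to scaling), as listed above. */
static r2r_kind inverse_kind(r2r_kind kind)
{
    switch (kind) {
    case KIND_DCT_2: return KIND_DCT_3;
    case KIND_DCT_3: return KIND_DCT_2;
    case KIND_DST_2: return KIND_DST_3;
    case KIND_DST_3: return KIND_DST_2;
    default:         return kind;    /* types I and IV are self-inverse */
    }
}
```

For n = 8, DCT-I has N = 14, DST-I has N = 18 and all other kinds have N = 16.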


enum dtfft_backend_t

This enum lists the different available backend options.

Values:

enumerator DTFFT_BACKEND_MPI_DATATYPE

Backend that uses MPI datatypes.

This is default backend for Host platform.

Not recommended for GPU execution, since it is dramatically slower than the other backends. Not available for autotuning when effort is DTFFT_PATIENT on the CUDA platform.

enumerator DTFFT_BACKEND_MPI_P2P

MPI peer-to-peer algorithm.

enumerator DTFFT_BACKEND_MPI_P2P_PIPELINED

MPI peer-to-peer algorithm with overlapping data copying and unpacking.

enumerator DTFFT_BACKEND_MPI_A2A

MPI backend using MPI_Alltoallv.

enumerator DTFFT_BACKEND_MPI_RMA

MPI backend using one-sided communications.

enumerator DTFFT_BACKEND_MPI_RMA_PIPELINED

MPI backend using pipelined one-sided communications.

enumerator DTFFT_BACKEND_MPI_P2P_SCHEDULED

MPI peer-to-peer algorithm with scheduled communication.

enumerator DTFFT_BACKEND_MPI_P2P_FUSED

MPI peer-to-peer pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.

enumerator DTFFT_BACKEND_MPI_RMA_FUSED

MPI RMA pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.

enumerator DTFFT_BACKEND_MPI_P2P_COMPRESSED

Extension of DTFFT_BACKEND_MPI_P2P_FUSED: data is compressed before sending and decompressed after receiving.

enumerator DTFFT_BACKEND_MPI_RMA_COMPRESSED

Extension of DTFFT_BACKEND_MPI_RMA_FUSED: data is compressed before sending and decompressed after receiving.

enumerator DTFFT_BACKEND_NCCL

NCCL backend.

enumerator DTFFT_BACKEND_NCCL_PIPELINED

NCCL backend with overlapping data copying and unpacking.

enumerator DTFFT_BACKEND_NCCL_COMPRESSED

NCCL backend that performs compression before data exchange and decompression after.

enumerator DTFFT_BACKEND_CUFFTMP

cuFFTMp backend.

enumerator DTFFT_BACKEND_CUFFTMP_PIPELINED

cuFFTMp backend that uses an additional buffer to avoid an extra copy and improve performance.

enumerator DTFFT_BACKEND_ADAPTIVE

Adaptive backend selection: during plan creation dtFFT benchmarks multiple backends and selects the fastest backend independently for each transpose/reshape operation.

The selection is fixed for the lifetime of the plan.

Note

Can only be used when effort >= DTFFT_PATIENT.

Note

Currently only available for the host execution platform (DTFFT_PLATFORM_HOST).

enumerator DTFFT_BACKEND_NONE

Backend is not defined.

This value is used when no backend is selected, for example when executing on a single process.

Note

This value should never be set by user directly. It can only be returned by the library.


enum dtfft_transpose_mode_t

This enum specifies at which stage the local transposition is performed during global exchange.

It affects only generic backends that perform explicit packing/unpacking.

Values:

enumerator DTFFT_TRANSPOSE_MODE_PACK

Perform transposition during the packing stage (Sender side).

enumerator DTFFT_TRANSPOSE_MODE_UNPACK

Perform transposition during the unpacking stage (Receiver side).


enum dtfft_access_mode_t

This enum lists valid access_mode parameters that can be passed to dtfft_config_t.

Values:

enumerator DTFFT_ACCESS_MODE_WRITE

Optimize for write access (Aligned writing).

This is the default mode.

enumerator DTFFT_ACCESS_MODE_READ

Optimize for read access (Aligned reading)


enum dtfft_platform_t

Enum that specifies the execution platform, such as Host, CUDA, or HIP.

Values:

enumerator DTFFT_PLATFORM_HOST

Host.

enumerator DTFFT_PLATFORM_CUDA

CUDA.


enum dtfft_reshape_t

This enum lists valid reshape_type parameters that can be passed to dtfft_reshape.

Values:

enumerator DTFFT_RESHAPE_X_BRICKS_TO_PENCILS

Reshape from X-bricks to X-pencils.

enumerator DTFFT_RESHAPE_X_PENCILS_TO_BRICKS

Reshape from X-pencils to X-bricks.

enumerator DTFFT_RESHAPE_Z_BRICKS_TO_PENCILS

Reshape from Z-bricks to Z-pencils.

enumerator DTFFT_RESHAPE_Z_PENCILS_TO_BRICKS

Reshape from Z-pencils to Z-bricks.

enumerator DTFFT_RESHAPE_Y_BRICKS_TO_PENCILS

Reshape from Y-bricks to Y-pencils. Intended for 2D plans.

enumerator DTFFT_RESHAPE_Y_PENCILS_TO_BRICKS

Reshape from Y-pencils to Y-bricks. Intended for 2D plans.


enum dtfft_layout_t

This enum represents different data layouts used in dtFFT and it should be used to retrieve layout information from plans.

Values:

enumerator DTFFT_LAYOUT_X_BRICKS

X-brick layout: data is distributed along all dimensions.

enumerator DTFFT_LAYOUT_X_PENCILS

X-pencil layout: data is distributed along Y and Z dimensions.

enumerator DTFFT_LAYOUT_X_PENCILS_FOURIER

X-pencil layout obtained after executing FFT for R2C plan: data is distributed along Y and Z dimensions.

enumerator DTFFT_LAYOUT_Y_PENCILS

Y-pencil layout: data is distributed along X and Z dimensions.

enumerator DTFFT_LAYOUT_Z_PENCILS

Z-pencil layout: data is distributed along X and Y dimensions.

enumerator DTFFT_LAYOUT_Z_BRICKS

Z-brick layout: data is distributed along all dimensions.


enum dtfft_compression_lib_t

This enum lists valid compression library parameters.

Values:

enumerator DTFFT_COMPRESSION_LIB_ZFP

ZFP compression library.


enum dtfft_compression_mode_t

This enum lists valid compression mode parameters.

Values:

enumerator DTFFT_COMPRESSION_MODE_LOSSLESS

Lossless compression mode.

enumerator DTFFT_COMPRESSION_MODE_FIXED_RATE

Fixed rate compression mode.

enumerator DTFFT_COMPRESSION_MODE_FIXED_PRECISION

Fixed precision compression mode.

enumerator DTFFT_COMPRESSION_MODE_FIXED_ACCURACY

Fixed accuracy compression mode.

Types

typedef void *dtfft_plan_t

Structure to hold plan data.


struct dtfft_pencil_t

Structure to hold pencil decomposition info.

There are two ways users might find pencils useful inside dtFFT:

  1. To create a plan from the user's own grid decomposition, pass a pencil to the plan constructors.

  2. To obtain pencils from a plan in all possible layouts, for example to run FFTs not available in dtFFT.

To create a plan from a dtfft_pencil_t, the user needs to provide ndims and the starts and counts arrays; other members are ignored.

When pencil is returned from dtfft_get_pencil, all pencil properties are defined.

Public Members

uint8_t dim

Aligned dimension ID starting from 1.

uint8_t ndims

Number of dimensions in a pencil.

int32_t starts[3]

Local starts in natural Fortran order.

If ndims == 2, then only first two elements are defined

int32_t counts[3]

Local counts in natural Fortran order.

If ndims == 2, then only first two elements are defined

size_t size

Total number of elements in a pencil.
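When building a dtfft_pencil_t for the *_pencil plan constructors, each process fills ndims, starts and counts. The sketch below computes a balanced block split of one global dimension and the element count of a local box. It assumes 0-based starts in the C API, which this reference does not state explicitly; block_split and box_size are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a global dimension of n points across nparts
 * processes as evenly as possible; the first (n % nparts) parts receive one
 * extra point. Starts are 0-based (an assumption, see above). */
static void block_split(int32_t n, int32_t nparts, int32_t part,
                        int32_t *start, int32_t *count)
{
    int32_t base = n / nparts;
    int32_t rem  = n % nparts;
    *count = base + (part < rem ? 1 : 0);
    *start = part * base + (part < rem ? part : rem);
}

/* Element count of a local box: the product of the first ndims counts.
 * Only the first ndims entries are read, since counts[2] is undefined
 * when ndims == 2. */
static size_t box_size(uint8_t ndims, const int32_t counts[3])
{
    size_t size = 1;
    for (uint8_t d = 0; d < ndims; ++d)
        size *= (size_t)counts[d];
    return size;
}
```

Splitting 10 points over 3 processes gives counts 4, 3, 3 with starts 0, 4, 7, so the boxes tile the dimension without gaps or overlaps, which the pencil checks above require.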


struct dtfft_config_t

Struct that can be used to set additional configuration parameters to dtFFT.

Public Members

bool enable_log

Should dtFFT print additional information or not.

Default is false.

bool enable_z_slab

Enables Z-slab optimization.

Default is true

Consider disabling Z-slab optimization to resolve the DTFFT_ERROR_VKFFT_R2R_2D_PLAN error, or when the underlying 2D FFT implementation is too slow.

In all other cases, Z-slab is considered to be always faster.

bool enable_y_slab

Enables Y-slab optimization.

Default is false.

If true, then dtFFT will skip the transpose step between Y and Z aligned layouts during call to dtfft_execute.

One should consider disabling Y-slab optimization when the underlying FFT implementation of the 2D plan is too slow.

In all other cases, Y-slab is considered to be always faster.

int32_t n_measure_warmup_iters

Number of warmup iterations to execute during backend and kernel autotuning when effort level is DTFFT_MEASURE or higher.

Default is 2.

int32_t n_measure_iters

Number of iterations to execute during backend and kernel autotuning when effort level is DTFFT_MEASURE or higher.

Default is 5.
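The split between warmup and measured iterations follows a common benchmarking pattern: warmup runs are executed but not timed, and only the following runs contribute to the averaged time. The sketch below shows that generic pattern; dtFFT's internal timing code is not shown in this reference, and time_candidate and fastest_index are hypothetical names.

```c
#include <time.h>

/* Generic pattern, not dtFFT internals: run `warmup` untimed iterations,
 * then time `iters` iterations and return the mean seconds per iteration.
 * clock() measures processor time; an MPI code would typically use
 * MPI_Wtime() and reduce the result across ranks instead. */
static double time_candidate(void (*run)(void *), void *ctx,
                             int warmup, int iters)
{
    if (iters <= 0)
        return 0.0;
    for (int i = 0; i < warmup; ++i)
        run(ctx);                       /* results are discarded */

    clock_t t0 = clock();
    for (int i = 0; i < iters; ++i)
        run(ctx);
    clock_t t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC / iters;
}

/* Index of the candidate with the smallest mean time. */
static int fastest_index(const double *times, int n)
{
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (times[i] < times[best])
            best = i;
    return best;
}
```

Raising n_measure_iters makes the averaged time less noisy at the cost of longer plan creation.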

dtfft_platform_t platform

Selects platform to execute plan.

Default is DTFFT_PLATFORM_HOST.

This option is only available when dtFFT is built with device support. Even when dtFFT is built with device support, it does not necessarily mean that all plans must be device-related. This enables a single library installation to support both host and CUDA plans.

Note

This option is only defined when dtFFT is built with CUDA support.

dtfft_stream_t stream

Main CUDA stream that will be used in dtFFT.

Lets the user set a custom stream. The stream actually used by a dtFFT plan is returned by dtfft_get_stream. A user who sets a stream is responsible for destroying it.

The stream must not be destroyed before the call to dtfft_destroy.

Note

This option is only defined when dtFFT is built with CUDA support.

dtfft_backend_t backend

Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

Default for HOST platform is DTFFT_BACKEND_MPI_DATATYPE.

Default for CUDA platform is DTFFT_BACKEND_NCCL if NCCL is enabled, otherwise DTFFT_BACKEND_MPI_P2P.

dtfft_backend_t reshape_backend

Backend that will be used by dtFFT for data reshaping from bricks to pencils and vice versa when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.

Default for HOST platform is DTFFT_BACKEND_MPI_DATATYPE.

Default for CUDA platform is DTFFT_BACKEND_NCCL if NCCL is enabled, otherwise DTFFT_BACKEND_MPI_P2P.

bool enable_datatype_backend

Should DTFFT_BACKEND_MPI_DATATYPE be considered for autotuning when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is true

This option works only when executing on a host.

bool enable_mpi_backends

Should MPI Backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is false.

This option applies to all DTFFT_BACKEND_MPI_* backends, except DTFFT_BACKEND_MPI_DATATYPE.

The following applies only to CUDA builds. MPI backends are disabled by default during autotuning because of an Open MPI bug (https://github.com/open-mpi/ompi/issues/12849): GPU memory was observed not to be freed completely during plan autotuning.

For example, for a 1024x1024x512 double-precision C2C plan on a single GPU with Z-slab optimization and MPI backends enabled, plan autotuning leaks 8Gb of GPU memory. Without Z-slab optimization, running on 4 GPUs leaks 24Gb on each GPU.

One of the workarounds is to disable MPI Backends by default, which is done here.

Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but disabling CUDA IPC was observed to seriously degrade the overall performance of MPI algorithms.

bool enable_pipelined_backends

Should pipelined backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is true.

bool enable_rma_backends

Should RMA backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is true.

bool enable_fused_backends

Should fused backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is true.

bool enable_nccl_backends

Should NCCL Backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is true.

Note

This option is only defined when dtFFT is built with CUDA support.

bool enable_nvshmem_backends

Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is true.

Note

This option is only defined when dtFFT is built with CUDA support.

bool enable_kernel_autotune

Should dtFFT try to optimize kernel launch parameters during plan creation when effort is below DTFFT_EXHAUSTIVE.

Default is false.

Kernel optimization is always enabled for DTFFT_EXHAUSTIVE effort level. Setting this option to true enables kernel optimization for lower effort levels (DTFFT_ESTIMATE, DTFFT_MEASURE, DTFFT_PATIENT). This may increase plan creation time but can improve runtime performance. Since kernel optimization is performed without data transfers, the time increase is usually minimal.

bool enable_fourier_reshape

Should dtFFT execute reshapes from pencils to bricks and vice versa in Fourier space during calls to execute.

Default is false.

When enabled, data will be in brick layout in Fourier space, which may be useful for certain operations between forward and backward transforms. However, this requires additional data transpositions and will reduce overall FFT performance.

dtfft_transpose_mode_t transpose_mode

Specifies at which stage the local transposition is performed during global exchange when effort level is below DTFFT_EXHAUSTIVE.

Default is DTFFT_TRANSPOSE_MODE_PACK.

For DTFFT_EXHAUSTIVE effort level, dtFFT will always choose the best transpose mode based on internal autotuning.

Note

This option only takes effect when platform is DTFFT_PLATFORM_HOST

dtfft_access_mode_t access_mode

Specifies the memory access pattern (optimization target) for local transposition.

Default is DTFFT_ACCESS_MODE_WRITE.

This option allows user to force specific access mode (DTFFT_ACCESS_MODE_WRITE or DTFFT_ACCESS_MODE_READ) when autotuning is disabled. When autotuning is enabled (e.g. effort is DTFFT_EXHAUSTIVE), this option is ignored and best access mode is selected automatically.

bool enable_compressed_backends

Should compressed backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.

Default is false.

Only fixed-rate compression can be used during autotuning, since it provides predictable performance characteristics and does not require data-dependent decisions at runtime. To enable compressed backends during autotuning, set this option to true, set compression_mode to DTFFT_COMPRESSION_MODE_FIXED_RATE and provide the desired rate in the compression configuration.

dtfft_compression_config_t compression_config_transpose

Options for compression approach during transpositions.

dtfft_compression_config_t compression_config_reshape

Options for compression approach during reshape operations.


typedef void *dtfft_stream_t

dtFFT stream representation.

For the CUDA platform, this should be cast from cudaStream_t.

Example

#include <cuda_runtime.h>
cudaStream_t stream;
cudaStreamCreate(&stream);
dtfft_stream_t dtfftStream = (dtfft_stream_t)stream;  /* destroy stream only after dtfft_destroy */


typedef void *dtfft_request_t

Helper type to manage asynchronous operations.


struct dtfft_compression_config_t

Struct that specifies compression configuration.

Public Members

dtfft_compression_lib_t compression_lib

Compression library to use.

dtfft_compression_mode_t compression_mode

Compression mode to use.

double rate

Rate for DTFFT_COMPRESSION_MODE_FIXED_RATE

int32_t precision

Precision for DTFFT_COMPRESSION_MODE_FIXED_PRECISION

double tolerance

Tolerance for DTFFT_COMPRESSION_MODE_FIXED_ACCURACY
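In ZFP's fixed-rate mode, the rate is the number of compressed bits stored per scalar value. Assuming the rate member is passed to ZFP with that meaning (this reference does not say so explicitly), the size of a compressed message and the compression ratio can be estimated as below. Complex values count as two scalars, and ZFP's block padding is ignored; the helper names are hypothetical.

```c
#include <stddef.h>

/* Estimated compressed payload in bytes for `nvalues` scalars at `rate`
 * bits per value (ZFP fixed-rate semantics assumed; padding ignored). */
static double fixed_rate_bytes(size_t nvalues, double rate)
{
    return (double)nvalues * rate / 8.0;
}

/* Compression ratio relative to uncompressed scalars of `bits_per_scalar`
 * bits (64 for double precision, 32 for single precision). */
static double fixed_rate_ratio(double rate, int bits_per_scalar)
{
    return (double)bits_per_scalar / rate;
}
```

Under these assumptions, a rate of 16 on double-precision data shrinks each message by a factor of 4, for example 1024 scalars (8192 bytes uncompressed) to about 2048 bytes.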

Functions

int32_t dtfft_get_version()
Returns:

DTFFT_VERSION_CODE defined during library compilation


const char *dtfft_get_error_string(dtfft_error_t error_code)

Gets the string description of an error code.

Parameters:

error_code[in] Error code to convert to string

Returns:

Error string explaining error.


const char *dtfft_get_backend_string(dtfft_backend_t backend)

Returns a null-terminated string with the name of the given backend.

Parameters:

backend[in] Backend to represent

Returns:

String representation of backend.


const char *dtfft_get_precision_string(dtfft_precision_t precision)

Gets the string description of a precision level.

Parameters:

precision[in] Precision level to convert to string

Returns:

String representation of dtfft_precision_t.


const char *dtfft_get_executor_string(dtfft_executor_t executor)

Gets the string description of an executor type.

Parameters:

executor[in] Executor type to convert to string

Returns:

String representation of dtfft_executor_t.


dtfft_error_t dtfft_create_config(dtfft_config_t *config)

Fills config with default values.

Parameters:

config[out] Config to set default values into

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_set_config(const dtfft_config_t *config)

Sets dtFFT configuration values.

Must be called before plan creation for the values to take effect.

Parameters:

config[in] Config to set

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_backend_pipelined(const dtfft_backend_t backend, bool *is_pipe)

Checks whether the given backend is pipelined.

Parameters:
  • backend[in] Backend to check

  • is_pipe[out] Set to true if backend is pipelined, false otherwise

Returns:

DTFFT_SUCCESS

Plan constructors

All plan constructors must be called after MPI_Init. A plan must be destroyed before MPI_Finalize is called.

dtfft_error_t dtfft_create_plan_r2r(int8_t ndims, const int32_t *dims, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)

Real-to-Real Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Array of size ndims containing global dimensions in reverse order. dims[0] must be the fastest varying.

  • kinds[in] Array of size ndims containing Real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor.

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise
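Because dims (and kinds) are given in reverse order relative to a C array declaration, a row-major array declared as a[nz][ny][nx] corresponds to dims = {nx, ny, nz}. The sketch below builds that array from a C-order shape and shows the matching linear offset; reverse_dims and linear_index are hypothetical helpers, and the resulting dims would then be passed to dtfft_create_plan_r2r (or the c2c/r2c constructors).

```c
#include <stdint.h>

/* Hypothetical helper: convert a C (row-major, slowest-first) shape into the
 * fastest-first order expected by dims, e.g. {nz, ny, nx} -> {nx, ny, nz}.
 * The same reversal applies to the kinds array. */
static void reverse_dims(int8_t ndims, const int32_t *c_shape, int32_t *dims)
{
    for (int8_t i = 0; i < ndims; ++i)
        dims[i] = c_shape[ndims - 1 - i];
}

/* Linear offset of element a[k][j][i] in an array declared a[nz][ny][nx],
 * with dims = {nx, ny, nz}; i runs along dims[0], the fastest dimension. */
static long linear_index(const int32_t dims[3], long k, long j, long i)
{
    return (k * dims[1] + j) * dims[0] + i;
}
```

For a C shape {4, 5, 6}, the dims passed to the constructor are {6, 5, 4}, and element a[1][2][3] sits at offset (1 * 5 + 2) * 6 + 3 = 45.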


dtfft_error_t dtfft_create_plan_r2r_pencil(const dtfft_pencil_t *pencil, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)

Creates a Real-to-Real Plan using a pencil structure.

Parameters:
  • pencil[in] Pencil structure containing local dimensions and starts

  • kinds[in] Array of size ndims containing Real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of the transform

  • effort[in] Effort level for the plan creation

  • executor[in] Executor to be used for the plan

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise


dtfft_error_t dtfft_create_plan_c2c(int8_t ndims, const int32_t *dims, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)

Complex-to-Complex Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Array of size ndims containing global dimensions in reverse order

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor.

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise


dtfft_error_t dtfft_create_plan_c2c_pencil(const dtfft_pencil_t *pencil, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)

Complex-to-Complex Plan constructor using a pencil structure.

Parameters:
  • pencil[in] Pencil structure containing local dimensions and starts

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of the transform

  • effort[in] Effort level for the plan creation

  • executor[in] Executor to be used for the plan

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise


dtfft_error_t dtfft_create_plan_r2c(int8_t ndims, const int32_t *dims, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)

Real-to-Complex Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Array of size ndims containing global dimensions in reverse order

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise


dtfft_error_t dtfft_create_plan_r2c_pencil(const dtfft_pencil_t *pencil, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)

Creates a Real-to-Complex Plan using a pencil structure.

Parameters:
  • pencil[in] Pencil structure containing local dimensions and starts

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of the transform

  • effort[in] Effort level for the plan creation

  • executor[in] Executor to be used for the plan

  • plan[out] Plan handle ready to be executed

Returns:

DTFFT_SUCCESS if plan was created, error code otherwise

Plan destructor

dtfft_error_t dtfft_destroy(dtfft_plan_t *plan)

Plan Destructor.

Parameters:

plan[inout] Plan handle

Returns:

DTFFT_SUCCESS on success or error code on failure.

Memory allocation

dtfft_error_t dtfft_mem_alloc(dtfft_plan_t plan, size_t alloc_bytes, void **ptr)

Allocates memory specific for this plan.

Parameters:
  • plan[in] Plan handle

  • alloc_bytes[in] Number of bytes to allocate

  • ptr[out] Allocated pointer

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_mem_free(dtfft_plan_t plan, void *ptr)

Frees memory specific for this plan.

Parameters:
  • plan[in] Plan handle

  • ptr[inout] Allocated pointer

Returns:

DTFFT_SUCCESS on success or error code on failure.

Plan execution

dtfft_error_t dtfft_execute(dtfft_plan_t plan, void *in, void *out, dtfft_execute_t execute_type, void *aux)

Plan execution.

Neither in nor out is allowed to be NULL. The same pointer can safely be passed to both in and out.

Note

This function is not supported for transpose-only R2C plans.

Parameters:
  • plan[in] Plan handle

  • in[inout] Incoming buffer

  • out[out] Result buffer

  • execute_type[in] Type of transform.

  • aux[inout] Optional auxiliary buffer. Can be NULL. If NULL on the first call to this function, an auxiliary buffer is allocated internally and freed by dtfft_destroy. If provided, it must be at least dtfft_get_aux_bytes bytes.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_transpose(dtfft_plan_t plan, void *in, void *out, dtfft_transpose_t transpose_type, void *aux)

Transposes data in a single dimension, e.g. from X-aligned to Y-aligned layout.

Attention

in and out cannot be the same pointer.

Parameters:
  • plan[in] Plan handle

  • in[inout] Incoming buffer

  • out[out] Transposed buffer

  • transpose_type[in] Type of transpose.

  • aux[inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_transpose_start(dtfft_plan_t plan, void *in, void *out, dtfft_transpose_t transpose_type, void *aux, dtfft_request_t *request)

Starts an asynchronous transpose operation.

Note

Neither the in nor the out buffer may be modified or freed until dtfft_transpose_end is called.

Parameters:
  • plan[in] Plan handle

  • in[inout] Incoming buffer

  • out[out] Transposed buffer

  • transpose_type[in] Type of transpose.

  • aux[inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.

  • request[out] Handle to manage the asynchronous operation.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_transpose_end(dtfft_plan_t plan, dtfft_request_t request)

Finalizes an asynchronous transpose operation.

Parameters:
  • plan[in] Plan handle

  • request[inout] Handle to manage the asynchronous operation.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_reshape(dtfft_plan_t plan, void *in, void *out, dtfft_reshape_t reshape_type, void *aux)

Executes data reshape between brick and pencil decompositions.

Parameters:
  • plan[in] Plan handle

  • in[inout] Input pointer

  • out[out] Output pointer

  • reshape_type[in] Type of reshape.

  • aux[inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_reshape_start(dtfft_plan_t plan, void *in, void *out, dtfft_reshape_t reshape_type, void *aux, dtfft_request_t *request)

Starts an asynchronous reshape operation.

Note

Neither the in nor the out buffer may be modified or freed until dtfft_reshape_end is called.

Parameters:
  • plan[in] Plan handle

  • in[inout] Input pointer

  • out[out] Output pointer

  • reshape_type[in] Type of reshape.

  • aux[inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.

  • request[out] Handle to manage the asynchronous operation.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_reshape_end(dtfft_plan_t plan, dtfft_request_t request)

Finalizes an asynchronous reshape operation.

Parameters:
  • plan[in] Plan handle

  • request[inout] Handle to manage the asynchronous operation.

Returns:

DTFFT_SUCCESS on success or error code on failure.

Plan information

dtfft_error_t dtfft_report(dtfft_plan_t plan)

Prints plan-related information to stdout.

Parameters:

plan[in] Plan handle

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_report_compression(dtfft_plan_t plan)

Prints compression-related information to stdout.

Parameters:

plan[in] Plan handle

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_local_sizes(dtfft_plan_t plan, int32_t *in_starts, int32_t *in_counts, int32_t *out_starts, int32_t *out_counts, size_t *alloc_size)

Gets grid decomposition information.

Results may differ between MPI processes.

Parameters:
  • plan[in] Plan handle

  • in_starts[out] Starts of the local portion of data in real space, in reverse order

  • in_counts[out] Number of elements in the local portion of data in real space, in reverse order

  • out_starts[out] Starts of the local portion of data in Fourier space, in reverse order

  • out_counts[out] Number of elements in the local portion of data in Fourier space, in reverse order

  • alloc_size[out] Minimum number of elements to be allocated for the in and out buffers required by dtfft_execute, dtfft_transpose, or dtfft_reshape. The size of each element in bytes can be obtained by calling dtfft_get_element_size.

Returns:

DTFFT_SUCCESS on success or error code on failure.
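Because all four arrays are returned in reverse order, index 0 refers to the fastest-varying dimension. The following sketch shows how a 3D local block might be addressed. It assumes the local buffer is stored contiguously with in_counts[0] varying fastest, and the helpers local_element_count, local_offset_3d and global_index are hypothetical, not part of dtFFT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of local elements described by a counts
 * array returned in reverse order (counts[0] is the fastest varying). */
static size_t local_element_count(int ndims, const int32_t *counts)
{
    assert(ndims == 2 || ndims == 3);
    size_t total = 1;
    for (int d = 0; d < ndims; ++d) {
        assert(counts[d] >= 0);
        total *= (size_t)counts[d];
    }
    return total;
}

/* Hypothetical helper: linear offset of local element (i0, i1, i2),
 * assuming contiguous storage with i0 varying fastest. */
static size_t local_offset_3d(const int32_t counts[3],
                              int32_t i0, int32_t i1, int32_t i2)
{
    assert(i0 >= 0 && i0 < counts[0]);
    assert(i1 >= 0 && i1 < counts[1]);
    assert(i2 >= 0 && i2 < counts[2]);
    return (size_t)i0
         + (size_t)counts[0] * ((size_t)i1 + (size_t)counts[1] * (size_t)i2);
}

/* Hypothetical helper: global index along dimension d of a local index,
 * obtained by offsetting it with the corresponding start. */
static int32_t global_index(const int32_t *starts, int d, int32_t local)
{
    return starts[d] + local;
}
```

For example, with in_counts = {4, 3, 2}, local element (1, 2, 1) sits at offset 1 + 4 * (2 + 3 * 1) = 21, and its global coordinate along dimension d is in_starts[d] plus the local index.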


dtfft_error_t dtfft_get_alloc_size(dtfft_plan_t plan, size_t *alloc_size)

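Wrapper around dtfft_get_local_sizes that obtains only the number of elements.

Parameters:
  • plan[in] Plan handle

  • alloc_size[out] Minimum number of elements to be allocated for the in and out buffers required by dtfft_execute, dtfft_transpose, or dtfft_reshape

Returns: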

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_alloc_bytes(dtfft_plan_t plan, size_t *alloc_bytes)

Returns minimum number of bytes required for in and out buffers.

This function combines two calls, dtfft_get_alloc_size and dtfft_get_element_size, and returns the minimum number of bytes to be allocated for the in and out buffers required by dtfft_execute, dtfft_transpose, or dtfft_reshape.

Parameters:
  • plan[in] Plan handle

  • alloc_bytes[out] Number of bytes required

Returns:

DTFFT_SUCCESS on success or error code on failure.
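Since dtfft_get_alloc_bytes is described as the combination of dtfft_get_alloc_size and dtfft_get_element_size, its result is presumably their product. A minimal sketch of that arithmetic with an overflow guard follows; bytes_from_elements is a hypothetical helper, not a dtFFT function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply an element count (e.g. alloc_size from
 * dtfft_get_alloc_size) by an element size (e.g. from
 * dtfft_get_element_size). Returns false instead of wrapping around
 * when the product does not fit in size_t. */
static bool bytes_from_elements(size_t count, size_t element_size,
                                size_t *bytes)
{
    assert(bytes != NULL);
    if (element_size != 0 && count > SIZE_MAX / element_size)
        return false;
    *bytes = count * element_size;
    return true;
}
```

With alloc_size = 1000 and a 16-byte element (for instance, an assumed double-precision complex value), each of the in and out buffers needs at least 16000 bytes.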


dtfft_error_t dtfft_get_aux_size(dtfft_plan_t plan, size_t *aux_size)

Gets the number of elements required for the auxiliary buffer used by dtfft_execute.

Parameters:
  • plan[in] Plan handle

  • aux_size[out] Size of auxiliary buffer in elements.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_aux_bytes(dtfft_plan_t plan, size_t *aux_bytes)

Gets the number of bytes required for the auxiliary buffer used by dtfft_execute.

Parameters:
  • plan[in] Plan handle

  • aux_bytes[out] Number of bytes required for auxiliary buffer.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_aux_size_reshape(dtfft_plan_t plan, size_t *aux_size)

Gets the number of elements required for the auxiliary buffer used by dtfft_reshape.

Parameters:
  • plan[in] Plan handle

  • aux_size[out] Size of auxiliary buffer in elements.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_aux_bytes_reshape(dtfft_plan_t plan, size_t *aux_bytes)

Gets the number of bytes required for the auxiliary buffer used by dtfft_reshape.

Parameters:
  • plan[in] Plan handle

  • aux_bytes[out] Number of bytes required for auxiliary buffer.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_aux_size_transpose(dtfft_plan_t plan, size_t *aux_size)

Gets the number of elements required for the auxiliary buffer used by dtfft_transpose.

Parameters:
  • plan[in] Plan handle

  • aux_size[out] Size of auxiliary buffer in elements.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_aux_bytes_transpose(dtfft_plan_t plan, size_t *aux_bytes)

Gets the number of bytes required for the auxiliary buffer used by dtfft_transpose.

Parameters:
  • plan[in] Plan handle

  • aux_bytes[out] Number of bytes required for auxiliary buffer.

Returns:

DTFFT_SUCCESS on success or error code on failure.
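Each of dtfft_execute, dtfft_transpose and dtfft_reshape only requires a user-supplied aux buffer to be at least its own minimum size. Assuming the operations never use the buffer concurrently (for example, no dtfft_transpose_start or dtfft_reshape_start is still pending) and that it resides in memory suitable for the plan's platform, a single buffer sized to the largest requirement could be shared between them. The helper max_aux_bytes below is hypothetical; its inputs would be the values reported by dtfft_get_aux_bytes, dtfft_get_aux_bytes_transpose and dtfft_get_aux_bytes_reshape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest aux buffer size (in bytes) that satisfies
 * every per-operation minimum listed in requirements[0..count-1]. */
static size_t max_aux_bytes(const size_t *requirements, size_t count)
{
    assert(requirements != NULL && count > 0);
    size_t max = requirements[0];
    for (size_t i = 1; i < count; ++i)
        if (requirements[i] > max)
            max = requirements[i];
    return max;
}
```

The design trades some memory for fewer allocations: one buffer of the maximum size replaces three separately sized ones.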


dtfft_error_t dtfft_get_element_size(dtfft_plan_t plan, size_t *element_size)

Obtains the number of bytes required to store a single element of this plan.

Parameters:
  • plan[in] Plan handle

  • element_size[out] Size of element in bytes

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_pencil(dtfft_plan_t plan, dtfft_layout_t layout, dtfft_pencil_t *pencil)

Obtains pencil information from plan.

This can be useful when the user wants to use their own FFT implementation that is unavailable in dtFFT.

Parameters:
  • plan[in] Plan handle

  • layout[in] Required layout of the pencil

  • pencil[out] Pencil data

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_z_slab_enabled(dtfft_plan_t plan, bool *is_z_slab_enabled)

Checks if plan is using Z-slab optimization.

If true, DTFFT_TRANSPOSE_X_TO_Z and DTFFT_TRANSPOSE_Z_TO_X are valid values to pass to dtfft_transpose.

Parameters:
  • plan[in] Plan handle

  • is_z_slab_enabled[out] true if Z-slab is used, false otherwise.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_y_slab_enabled(dtfft_plan_t plan, bool *is_y_slab_enabled)

Checks if plan is using Y-slab optimization.

If true, dtFFT skips the transpose step between Y- and Z-aligned layouts during calls to dtfft_execute.

Parameters:
  • plan[in] Plan handle

  • is_y_slab_enabled[out] true if Y-slab is used, false otherwise.

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_stream(dtfft_plan_t plan, dtfft_stream_t *stream)

Returns the stream associated with the dtFFT plan.

This is either the stream passed by the user to dtfft_set_config or a stream created internally. Returns a NULL pointer if the plan’s platform is DTFFT_PLATFORM_HOST.

Parameters:
  • plan[in] Plan handle

  • stream[out] CUDA stream associated with plan

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_backend(dtfft_plan_t plan, dtfft_backend_t *backend)

Returns the backend selected during autotuning if effort is DTFFT_PATIENT.

If the effort passed to any create function is DTFFT_ESTIMATE or DTFFT_MEASURE, returns the value set by dtfft_set_config or the default value, which is DTFFT_BACKEND_NCCL for CUDA builds and DTFFT_BACKEND_MPI_DATATYPE for host builds.

Parameters:
  • plan[in] Plan handle

  • backend[out] Selected backend

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_reshape_backend(dtfft_plan_t plan, dtfft_backend_t *backend)

Returns selected backend for reshape operations.

Parameters:
  • plan[in] Plan handle

  • backend[out] Selected backend for reshape operations

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_platform(dtfft_plan_t plan, dtfft_platform_t *platform)

Returns the plan execution platform.

Parameters:
  • plan[in] Plan handle

  • platform[out] Plan platform

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_executor(dtfft_plan_t plan, dtfft_executor_t *executor)

Returns the FFT executor used by the plan.

Parameters:
  • plan[in] Plan handle

  • executor[out] FFT Executor

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_precision(dtfft_plan_t plan, dtfft_precision_t *precision)

Returns precision of the plan.

Parameters:
  • plan[in] Plan handle

  • precision[out] Precision of the plan

Returns:

DTFFT_SUCCESS on success or error code on failure.


dtfft_error_t dtfft_get_dims(dtfft_plan_t plan, int8_t *ndims, const int32_t *dims[])

Returns global dimensions of the plan.

Note

Do not free the dims array; it is freed when the dtfft_plan_t is destroyed.

Parameters:
  • plan[in] Plan handle

  • ndims[out] Number of dimensions in plan. User can pass NULL if this value is not needed.

  • dims[out] Pointer to an array of size ndims containing global dimensions in reverse order: dims[0] is the fastest varying. User can pass NULL if this value is not needed.

Returns:

DTFFT_SUCCESS on success or error code on failure.
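dtfft_get_dims hands back a read-only array owned by the plan, with the fastest-varying dimension first. Code that prefers C row-major (slowest-first) order can copy it out reversed, as in this sketch. dims_to_c_order is a hypothetical helper, and ndims is assumed to be 2 or 3, matching the dimension counts dtFFT accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copy dims as returned by dtfft_get_dims (fastest
 * varying first) into C row-major order (slowest varying first).
 * The source array is only read, never freed. */
static void dims_to_c_order(int8_t ndims, const int32_t *dims,
                            int32_t *c_dims)
{
    assert(ndims == 2 || ndims == 3);
    for (int8_t d = 0; d < ndims; ++d)
        c_dims[d] = dims[ndims - 1 - d];
}
```

In real code the input would come from dtfft_get_dims(plan, &ndims, &dims) with int8_t ndims and const int32_t *dims; copying into a caller-owned array leaves the plan-owned one untouched.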


dtfft_error_t dtfft_get_grid_dims(dtfft_plan_t plan, int8_t *ndims, const int32_t *grid_dims[])

Returns grid decomposition dimensions of the plan.

Note

Do not free the grid_dims array; it is freed when the dtfft_plan_t is destroyed.

Parameters:
  • plan[in] Plan handle

  • ndims[out] Number of dimensions in plan. User can pass NULL if this value is not needed.

  • grid_dims[out] Pointer to an array of size ndims containing grid decomposition dimensions in reverse order: grid_dims[0] is the fastest varying and is always equal to 1. User can pass NULL if this value is not needed.

Returns:

DTFFT_SUCCESS on success or error code on failure.
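Since grid_dims[0] is always 1, only the remaining entries describe how processes are distributed. Assuming the decomposition covers every process of the communicator the plan was created with, the product of grid_dims should match the value MPI_Comm_size reports for that communicator. grid_process_count is a hypothetical helper sketching that consistency check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of processes implied by grid_dims as
 * returned (in reverse order) by dtfft_get_grid_dims.
 * grid_dims[0] is documented to always be 1. */
static int64_t grid_process_count(int8_t ndims, const int32_t *grid_dims)
{
    assert(ndims == 2 || ndims == 3);
    assert(grid_dims[0] == 1);
    int64_t total = 1;
    for (int8_t d = 0; d < ndims; ++d)
        total *= grid_dims[d];
    return total;
}
```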