C++ API Reference

This page describes all classes, enumerators and functions available in the dtFFT C++ API. To use them, the user has to #include <dtfft.hpp>. The entire API is contained within the dtfft namespace.

Predefined Macros

DTFFT_CXX_CALL(call)

Safe call macro.

Should be used to check error codes returned by dtFFT.

Throws an exception with a message explaining the error if one occurs.

Example

DTFFT_CXX_CALL( plan.execute(a, b, dtfft::Execute::FORWARD) )
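The safe-call pattern described above can be sketched without the library. In this minimal example DemoError, demo_error_string, demo_operation and DEMO_CXX_CALL are hypothetical stand-ins invented for illustration; they are not the dtFFT definitions, and the real macro may be implemented differently.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an error enum; NOT dtfft::Error.
enum class DemoError { SUCCESS, INVALID_USAGE };

// Hypothetical helper returning a human-readable message.
inline std::string demo_error_string(DemoError e) {
    return e == DemoError::SUCCESS ? "success" : "invalid API usage";
}

// Evaluate `call` exactly once and throw if it did not return SUCCESS.
// The do/while(0) wrapper makes the macro behave like a single statement.
#define DEMO_CXX_CALL(call)                                              \
    do {                                                                 \
        const DemoError demo_err_ = (call);                              \
        if (demo_err_ != DemoError::SUCCESS) {                           \
            throw std::runtime_error(std::string(__FILE__) + ":" +       \
                                     std::to_string(__LINE__) + ": " +   \
                                     demo_error_string(demo_err_));      \
        }                                                                \
    } while (0)

// Hypothetical operation that reports failure for negative input.
inline DemoError demo_operation(int value) {
    return value >= 0 ? DemoError::SUCCESS : DemoError::INVALID_USAGE;
}

// Returns true if the wrapped call threw an exception.
inline bool demo_call_throws(int value) {
    try {
        DEMO_CXX_CALL(demo_operation(value));
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

Wrapping every call this way turns a returned error code into an exception that carries the file, line and message, so error handling can live in one catch block.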

Enumerators

enum class dtfft::Error

This enum lists the different error codes that dtFFT can return.

Values:

enumerator SUCCESS

Successful execution.

enumerator MPI_FINALIZED

MPI_Init has not been called or MPI_Finalize has already been called.

enumerator PLAN_NOT_CREATED

Plan not created.

enumerator INVALID_TRANSPOSE_TYPE

Invalid transpose_type provided.

enumerator INVALID_N_DIMENSIONS

Invalid number of dimensions provided. Valid options are 2 and 3.

enumerator INVALID_DIMENSION_SIZE

One or more provided dimension sizes <= 0.

enumerator INVALID_COMM_TYPE

Invalid communicator type provided.

enumerator INVALID_PRECISION

Invalid precision parameter provided.

enumerator INVALID_EFFORT

Invalid effort parameter provided.

enumerator INVALID_EXECUTOR

Invalid executor parameter provided.

enumerator INVALID_COMM_DIMS

The number of dimensions in the provided Cartesian communicator is greater than the number of dimensions passed to the plan constructor.

enumerator INVALID_COMM_FAST_DIM

The provided Cartesian communicator has more than one process in the 1st (fastest varying) dimension.

enumerator MISSING_R2R_KINDS

For R2R plan, kinds parameter must be passed if executor != Executor::NONE

enumerator INVALID_R2R_KINDS

Invalid values detected in kinds parameter.

enumerator R2C_TRANSPOSE_PLAN

Transpose plan is not supported in R2C, use R2R or C2C plan instead.

enumerator INPLACE_TRANSPOSE

In-place transpose is not supported.

enumerator INVALID_AUX

Invalid aux buffer provided.

enumerator INVALID_DIM

Invalid dim passed to Plan.get_pencil

enumerator INVALID_USAGE

Invalid API Usage.

enumerator PLAN_IS_CREATED

Attempting to create a plan that has already been created.

enumerator R2R_FFT_NOT_SUPPORTED

Selected executor does not support R2R FFTs.

enumerator ALLOC_FAILED

Internal call of Plan.mem_alloc failed.

enumerator FREE_FAILED

Internal call of Plan.mem_free failed.

enumerator INVALID_ALLOC_BYTES

Invalid alloc_bytes provided.

enumerator DLOPEN_FAILED

Failed to dynamically load library.

enumerator DLSYM_FAILED

Failed to dynamically load symbol.

enumerator R2C_TRANSPOSE_CALLED

Calling Plan.transpose on an R2C plan is not allowed.

enumerator PENCIL_ARRAYS_SIZE_MISMATCH

Sizes of starts and counts arrays passed to Pencil constructor do not match.

enumerator PENCIL_ARRAYS_INVALID_SIZES

The starts and counts arrays provided to the Pencil constructor have fewer than 2 or more than 3 elements.

enumerator PENCIL_INVALID_COUNTS

Invalid counts provided to Pencil constructor.

enumerator PENCIL_INVALID_STARTS

Invalid starts provided to Pencil constructor.

enumerator PENCIL_SHAPE_MISMATCH

Processes have same lower bounds (starts) but different sizes in some dimensions.

enumerator PENCIL_OVERLAP

Pencil overlap detected, i.e. two processes share the same part of the global space.

enumerator PENCIL_NOT_CONTINUOUS

Local pencils do not cover the global space without gaps.

enumerator PENCIL_NOT_INITIALIZED

Pencil is not initialized, i.e. the constructor was not called.

enumerator INVALID_MEASURE_WARMUP_ITERS

Invalid n_measure_warmup_iters provided.

enumerator INVALID_MEASURE_ITERS

Invalid n_measure_iters provided.

enumerator INVALID_REQUEST

Invalid dtfft_request_t provided.

enumerator TRANSPOSE_ACTIVE

Attempting to execute already active transposition.

enumerator TRANSPOSE_NOT_ACTIVE

Attempting to finalize non-active transposition.

enumerator GPU_INVALID_STREAM

Invalid stream provided.

enumerator INVALID_BACKEND

Invalid backend provided.

enumerator GPU_NOT_SET

Multiple MPI processes located on the same host share the same GPU, which is not supported.

enumerator VKFFT_R2R_2D_PLAN

When using R2R FFT with the VkFFT executor and a plan that uses Z-slab optimization, the R2R transform kinds must be the same in the X and Y directions.

enumerator BACKENDS_DISABLED

Passed effort == Effort::PATIENT but all GPU backends have been disabled by Config

enumerator NOT_DEVICE_PTR

One of pointers passed to Plan.execute or Plan.transpose cannot be accessed from device.

enumerator NOT_NVSHMEM_PTR

One of pointers passed to Plan.execute or Plan.transpose is not an NVSHMEM pointer.

enumerator INVALID_PLATFORM

Invalid platform provided.

enumerator INVALID_PLATFORM_EXECUTOR

Invalid executor provided for selected platform.

enumerator INVALID_PLATFORM_BACKEND

Invalid backend provided for selected platform.

enum class dtfft::Execute

This enum lists valid execute_type parameters that can be passed to Plan.execute.

Values:

enumerator FORWARD

Perform XYZ -> YZX -> ZXY plan execution (Forward)

enumerator BACKWARD

Perform ZXY -> YZX -> XYZ plan execution (Backward)
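The layout sequences of the two directions are cyclic shifts of the axis labels. A minimal sketch, assuming layouts are written as three-letter strings as in the text above; next_layout and previous_layout are hypothetical helpers introduced only for illustration and are not part of dtFFT.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: one forward step, e.g. "XYZ" -> "YZX".
// Implemented as a left rotation of the axis labels by one position.
inline std::string next_layout(std::string layout) {
    if (layout.size() > 1) {
        std::rotate(layout.begin(), layout.begin() + 1, layout.end());
    }
    return layout;
}

// Hypothetical helper: one backward step, e.g. "YZX" -> "XYZ".
// Implemented as a right rotation of the axis labels by one position.
inline std::string previous_layout(std::string layout) {
    if (layout.size() > 1) {
        std::rotate(layout.begin(), layout.end() - 1, layout.end());
    }
    return layout;
}
```

Applying next_layout twice to "XYZ" reproduces the forward sequence, and applying previous_layout twice to "ZXY" reproduces the backward one.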

enum class dtfft::Transpose

This enum lists valid transpose_type parameters that can be passed to Plan.transpose.

Values:

enumerator X_TO_Y

Transpose from Fortran X aligned to Fortran Y aligned.

enumerator Y_TO_X

Transpose from Fortran Y aligned to Fortran X aligned.

enumerator Y_TO_Z

Transpose from Fortran Y aligned to Fortran Z aligned.

enumerator Z_TO_Y

Transpose from Fortran Z aligned to Fortran Y aligned.

enumerator X_TO_Z

Transpose from Fortran X aligned to Fortran Z aligned.

Note

This value is valid only for 3D plans, and Plan.get_z_slab_enabled() must return true

enumerator Z_TO_X

Transpose from Fortran Z aligned to Fortran X aligned.

Note

This value is valid only for 3D plans, and Plan.get_z_slab_enabled() must return true

enum class dtfft::Precision

This enum lists valid precision parameters that can be passed to Plan constructors.

Values:

enumerator SINGLE

Use Single precision.

enumerator DOUBLE

Use Double precision.

enum class dtfft::Effort

This enum lists valid effort parameters that can be passed to Plan constructors.

Values:

enumerator ESTIMATE

Create plan as fast as possible.

enumerator MEASURE

Will attempt to find best MPI Grid decomposition.

Passing this flag together with an MPI communicator that has a Cartesian topology to any Plan constructor is the same as Effort::ESTIMATE.

enumerator PATIENT

Same as Effort::MEASURE plus autotune will try to find best backend.

enum class dtfft::Executor

This enum lists available FFT executors.

Values:

enumerator NONE

Do not create any FFT plans. Creates a transpose-only plan.

enumerator FFTW3

FFTW3 Executor (Host only)

enumerator MKL

MKL DFTI Executor (Host only)

enumerator CUFFT

cuFFT Executor (GPU only)

enumerator VKFFT

VkFFT Executor (GPU only)

enum class dtfft::R2RKind

Real-to-Real FFT kinds available in dtFFT.

Values:

enumerator DCT_1

DCT-I (Logical N=2*(n-1), inverse is R2RKind::DCT_1)

enumerator DCT_2

DCT-II (Logical N=2*n, inverse is R2RKind::DCT_3)

enumerator DCT_3

DCT-III (Logical N=2*n, inverse is R2RKind::DCT_2)

enumerator DCT_4

DCT-IV (Logical N=2*n, inverse is R2RKind::DCT_4)

enumerator DST_1

DST-I (Logical N=2*(n+1), inverse is R2RKind::DST_1)

enumerator DST_2

DST-II (Logical N=2*n, inverse is R2RKind::DST_3)

enumerator DST_3

DST-III (Logical N=2*n, inverse is R2RKind::DST_2)

enumerator DST_4

DST-IV (Logical N=2*n, inverse is R2RKind::DST_4)
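The logical sizes and inverse kinds listed above can be captured in two small functions. This is a sketch only: Kind is a local mirror of the enumerators declared for this example, not the dtfft::R2RKind type, and logical_size and inverse_kind are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the kinds listed above; NOT the dtfft::R2RKind type.
enum class Kind { DCT_1, DCT_2, DCT_3, DCT_4, DST_1, DST_2, DST_3, DST_4 };

// Logical transform size N for a physical size n, following the list above:
// DCT-I uses 2*(n-1), DST-I uses 2*(n+1), every other kind uses 2*n.
constexpr std::int64_t logical_size(Kind kind, std::int64_t n) {
    return kind == Kind::DCT_1 ? 2 * (n - 1)
         : kind == Kind::DST_1 ? 2 * (n + 1)
                               : 2 * n;
}

// Kind of the inverse transform, following the list above.
// Kinds 2 and 3 are each other's inverse; kinds 1 and 4 are listed as their own inverse.
constexpr Kind inverse_kind(Kind kind) {
    return kind == Kind::DCT_2 ? Kind::DCT_3
         : kind == Kind::DCT_3 ? Kind::DCT_2
         : kind == Kind::DST_2 ? Kind::DST_3
         : kind == Kind::DST_3 ? Kind::DST_2
                               : kind;
}

// Compile-time check of one entry: DCT-I with n = 9 has N = 16.
static_assert(logical_size(Kind::DCT_1, 9) == 16, "DCT-I: N = 2*(n-1)");
```

The logical size is typically what matters when normalizing the result of a forward transform followed by its inverse.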

enum class dtfft::Backend

Various Backends available in dtFFT.

Values:

enumerator MPI_DATATYPE

Backend that uses MPI datatypes.

This is default backend for Host build.

Not recommended for GPU usage, since it is much slower than the other backends. Not available for autotuning when effort is Effort::PATIENT in a GPU build.

enumerator MPI_P2P

MPI peer-to-peer algorithm.

enumerator MPI_P2P_PIPELINED

MPI peer-to-peer algorithm with overlapping data copying and unpacking.

enumerator MPI_A2A

MPI backend using MPI_Alltoallv.

enumerator MPI_RMA

MPI backend using one-sided communications.

enumerator MPI_RMA_PIPELINED

MPI backend using pipelined one-sided communications.

enumerator NCCL

NCCL backend.

enumerator NCCL_PIPELINED

NCCL backend with overlapping data copying and unpacking.

enumerator CUFFTMP

cuFFTMp backend

enumerator CUFFTMP_PIPELINED

cuFFTMp backend that uses additional buffer to avoid extra copy and gain performance

enum class dtfft::Platform

Enum that specifies the runtime platform, e.g. Host, CUDA, HIP.

Values:

enumerator HOST

Host.

enumerator CUDA

CUDA.

Functions

std::string dtfft::get_backend_string(Backend backend)

Returns string with name of backend provided as argument.

Parameters:

backend[in] Backend to represent

Returns:

String representation of backend.

std::string dtfft::get_error_string(Error error_code) noexcept

Returns the string description of an error code.

Parameters:

error_code[in] Error code to convert to string

Returns:

String representation of error_code

std::string dtfft::get_precision_string(Precision precision) noexcept

Returns the string representation of a Precision value.

Parameters:

precision[in] Precision level to convert to string

Returns:

String representation of Precision.

std::string dtfft::get_executor_string(Executor executor) noexcept

Returns the string representation of an Executor value.

Parameters:

executor[in] Executor type to convert to string

Returns:

String representation of Executor.

Error dtfft::set_config(const Config &config) noexcept

Sets configuration values to dtFFT.

Must be called before plan creation to take effect.

See also

Config

Returns:

Error::SUCCESS if the call was successful, error code otherwise

Structs

struct Version

dtFFT version information

Public Static Functions

static inline int32_t get() noexcept
Returns:

Version Code defined during compilation

static inline constexpr int32_t get(int32_t major, int32_t minor, int32_t patch) noexcept
Returns:

Version Code based on input parameters

Public Static Attributes

static constexpr int32_t MAJOR = DTFFT_VERSION_MAJOR

dtFFT Major Version

static constexpr int32_t MINOR = DTFFT_VERSION_MINOR

dtFFT Minor Version

static constexpr int32_t PATCH = DTFFT_VERSION_PATCH

dtFFT Patch Version

static constexpr int32_t CODE = DTFFT_VERSION_CODE

dtFFT Version Code.

Can be used for version comparison

struct Pencil

Class to handle Pencils.

This is a wrapper around the dtfft_pencil_t C structure.

There are two ways users might find pencils useful inside dtFFT:

  1. To create a Plan using the user's own grid decomposition, pass a Pencil to the Plan constructor.

  2. To obtain a Pencil from a Plan in all possible layouts, in order to run an FFT not available in dtFFT.

Public Functions

Pencil()

Default constructor, does not actually initialize anything.

explicit Pencil(int32_t n_dims, const int32_t *starts, const int32_t *counts)

Pencil constructor.

After calling this constructor, this pencil can be used to create Plan

Parameters:
  • n_dims[in] Number of dimensions in pencil, must be 2 or 3

  • starts[in] Local starts in natural Fortran order

  • counts[in] Local counts in natural Fortran order

explicit Pencil(const std::vector<int32_t> &starts, const std::vector<int32_t> &counts)

Pencil constructor.

After calling this constructor, this pencil can be used to create Plan

Parameters:
  • starts[in] Local starts in natural Fortran order

  • counts[in] Local counts in natural Fortran order

uint8_t get_ndims() const
Returns:

Number of dimensions in a pencil

uint8_t get_dim() const
Returns:

Aligned dimension ID starting from 1

std::vector<int32_t> get_starts() const
Returns:

Local starts in natural Fortran order

std::vector<int32_t> get_counts() const
Returns:

Local counts in natural Fortran order

size_t get_size() const
Returns:

Total number of elements in a pencil

const dtfft_pencil_t &c_struct() const
Returns:

Underlying C structure
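To illustrate what the starts and counts of a pencil describe, the sketch below computes the element count of a local box and tests two boxes for overlap, the condition reported as Error::PENCIL_OVERLAP. DemoPencil, demo_size and demo_overlap are hypothetical names used only here; the checks dtFFT performs internally may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a pencil: local starts and counts per dimension.
// Both vectors are assumed to have the same length.
struct DemoPencil {
    std::vector<std::int32_t> starts;
    std::vector<std::int32_t> counts;
};

// Total number of elements: the product of all counts.
inline std::size_t demo_size(const DemoPencil &p) {
    std::size_t total = 1;
    for (std::int32_t c : p.counts) {
        total *= static_cast<std::size_t>(c);
    }
    return total;
}

// Two boxes overlap only if their index ranges intersect in every dimension.
inline bool demo_overlap(const DemoPencil &a, const DemoPencil &b) {
    if (a.starts.size() != b.starts.size()) {
        return false; // different dimensionality, not comparable
    }
    for (std::size_t d = 0; d < a.starts.size(); ++d) {
        const std::int32_t a_end = a.starts[d] + a.counts[d]; // exclusive upper bound
        const std::int32_t b_end = b.starts[d] + b.counts[d]; // exclusive upper bound
        if (a.starts[d] >= b_end || b.starts[d] >= a_end) {
            return false; // disjoint along dimension d
        }
    }
    return true;
}
```

Two pencils that split the last dimension at index 2, for example, do not overlap, while shifting the second one back by a single index makes them share a plane.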

struct Config

Class to set additional configuration parameters to dtFFT.

See also

set_config()

Public Functions

inline explicit Config()

Creates and sets default configuration values.

inline Config &set_enable_log(const bool enable_log) noexcept

Sets whether dtFFT should print additional information or not.

Default is false

inline Config &set_enable_z_slab(bool enable_z_slab) noexcept

Sets whether dtFFT should use Z-slab optimization or not.

Default is true

One should consider disabling Z-slab optimization in order to resolve Error::VKFFT_R2R_2D_PLAN error or when underlying FFT implementation of 2D plan is too slow.

In all other cases, Z-slab is considered to be always faster.

inline Config &set_enable_y_slab(bool enable_y_slab) noexcept

Sets whether dtFFT should use Y-slab optimization or not.

Default is false

If true then dtFFT will skip the transpose step between Y and Z aligned layouts during call to Plan.execute(). One should consider disabling Y-slab optimization in order to resolve Error::VKFFT_R2R_2D_PLAN error or when underlying FFT implementation of 2D plan is too slow.

In all other cases, Y-slab is considered to be always faster.

inline Config &set_measure_warmup_iters(int32_t n_measure_warmup_iters) noexcept

Sets number of warmup iterations to underlying C structure.

Parameters:

n_measure_warmup_iters[in] Number of warmup iterations for transposition and data exchange to perform when effort exceeds Effort::ESTIMATE.

inline Config &set_measure_iters(int32_t n_measure_iters) noexcept

Sets number of actual iterations to underlying C structure.

Parameters:

n_measure_iters[in] Number of actual iterations for transposition and data exchange to perform when effort exceeds Effort::ESTIMATE.

inline Config &set_platform(Platform platform) noexcept

Sets platform to execute plan.

Default is Platform::HOST

This option is only available in builds with device support. Even when dtFFT is built with device support, plans are not required to run on the device.

inline Config &set_stream(dtfft_stream_t stream) noexcept

Sets Main CUDA stream that will be used in dtFFT.

This parameter is a placeholder for the user to set a custom stream. The stream actually used by a dtFFT plan is returned by the Plan.get_stream function. A user who sets a stream is responsible for destroying it.

The stream must not be destroyed before the call to destroy.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_backend(Backend backend) noexcept

Sets Backend that will be used by dtFFT when effort is Effort::ESTIMATE or Effort::MEASURE.

Default is Backend::NCCL

inline Config &set_enable_datatype_backend(bool enable_datatype_backend) noexcept

Sets whether Backend::MPI_DATATYPE should be enabled when effort is Effort::PATIENT.

Default is true

This option works only when executing on a host.

inline Config &set_enable_mpi_backends(bool enable_mpi_backends) noexcept

Sets whether MPI backends should be enabled when effort is Effort::PATIENT.

Default is false

The following applies only to CUDA builds. MPI backends are disabled by default during the autotuning process because of an OpenMPI bug (https://github.com/open-mpi/ompi/issues/12849). During plan autotuning, GPU memory was observed not to be freed completely.

For example, autotuning a 1024x1024x512 C2C double-precision plan on a single GPU, with Z-slab optimization and MPI backends enabled, leaks 8 GB of GPU memory. Without Z-slab optimization, running on 4 GPUs, it leaks 24 GB on each of the GPUs.

One of the workarounds is to disable MPI Backends by default, which is done here.

Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but disabling CUDA IPC was found to seriously degrade the overall performance of MPI algorithms.

inline Config &set_enable_pipelined_backends(bool enable_pipelined_backends) noexcept

Sets whether pipelined backends should be enabled when effort is Effort::PATIENT.

Default is true

Note

Pipelined backends require additional buffer that user has no control over.

inline Config &set_enable_nccl_backends(bool enable_nccl_backends) noexcept

Sets whether NCCL backends should be enabled when effort is Effort::PATIENT.

Default is true.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_enable_nvshmem_backends(bool enable_nvshmem_backends) noexcept

Sets whether NVSHMEM backends should be enabled when effort is Effort::PATIENT.

Default is true.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_enable_kernel_optimization(bool enable_kernel_optimization) noexcept

Sets whether dtFFT should try to optimize the NVRTC kernel block size when effort is Effort::PATIENT.

Default is true

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_n_configs_to_test(int32_t n_configs_to_test) noexcept

Sets the number of NVRTC kernels to try when effort is Effort::PATIENT.

Default is 5.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_force_kernel_optimization(bool force_kernel_optimization) noexcept

Sets whether kernel optimization should be enabled when effort is not Effort::PATIENT.

Default is false.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline dtfft_config_t c_struct() const
Returns:

Underlying C structure
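Because every setter above returns Config &, calls can be chained. The sketch below shows the pattern on a hypothetical stand-in; DemoConfig, its two fields and make_demo_config are assumptions made for this example and do not reproduce the real dtfft::Config.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the setter style above; NOT dtfft::Config.
struct DemoConfig {
    bool enable_z_slab = true;        // default taken from the text above
    std::int32_t n_measure_iters = 0; // placeholder default, assumed for this demo

    // Each setter stores the value and returns *this so calls can be chained.
    DemoConfig &set_enable_z_slab(bool value) noexcept {
        enable_z_slab = value;
        return *this;
    }
    DemoConfig &set_measure_iters(std::int32_t value) noexcept {
        n_measure_iters = value;
        return *this;
    }
};

// Builds a configuration in one chained expression.
inline DemoConfig make_demo_config() {
    DemoConfig conf;
    conf.set_enable_z_slab(false).set_measure_iters(5);
    return conf;
}
```

With the real class, the finished object would then be passed to set_config() before any plan is created, as described earlier on this page.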

Classes

class Exception : public std::exception

Basic exception class.

Public Functions

Exception(Error error_code, std::string msg, const char *file, int line)

Basic exception constructor.

Parameters:
  • error_code[in] Error code

  • msg[in] Message describing the error that occurred

  • file[in] Filename where the exception was thrown

  • line[in] Line number where the exception was thrown

const char *what() const noexcept override

Exception explanation.

Error get_error_code() const noexcept

Returns error code of exception.

const std::string &get_message() const noexcept

Returns error message of exception.

const std::string &get_file() const noexcept

Returns file name where exception occurred.

int get_line() const noexcept

Returns line number where exception occurred.

class Plan

Abstract plan for all dtFFT plans.

This class does not have any constructors. To create a plan, use one of the derived classes.

Subclassed by dtfft::PlanC2C, dtfft::PlanR2C, dtfft::PlanR2R

Public Functions

Error get_z_slab_enabled(bool *is_z_slab_enabled) const noexcept

Checks if plan is using Z-slab optimization.

If true then flags Transpose::X_TO_Z and Transpose::Z_TO_X will be valid to pass to Plan.transpose method.

Parameters:

is_z_slab_enabled[out] Boolean value if Z-slab is used.

Returns:

Error::SUCCESS if call was without error, error code otherwise

bool get_z_slab_enabled() const

Checks if plan is using Z-slab optimization.

Throws:

Exception – if underlying call fails

Returns:

true if Z-slab is enabled, false otherwise

Error get_y_slab_enabled(bool *is_y_slab_enabled) const noexcept

Checks if plan is using Y-slab optimization.

If true then during call to Plan.execute the transpose between Y and Z aligned layouts will be skipped.

Parameters:

is_y_slab_enabled[out] Boolean value if Y-slab is used.

Returns:

Error::SUCCESS if call was without error, error code otherwise

bool get_y_slab_enabled() const

Checks if plan is using Y-slab optimization.

Throws:

Exception – if underlying call fails

Returns:

true if Y-slab is enabled, false otherwise

Error report() const noexcept

Prints plan-related information to stdout.

Returns:

Error::SUCCESS if call was without error, error code otherwise

Error get_pencil(int32_t dim, Pencil &pencil) const noexcept

Obtains pencil information from plan.

This can be useful when the user wants to use their own FFT implementation that is unavailable in dtFFT.

Parameters:
  • dim[in] Required dimension:

    • 0 for XYZ layout (real space, valid for PlanR2C only)

    • 1 for XYZ layout (real space for C2C and R2R plans and Fourier space for R2C plans)

    • 2 for YZX layout

    • 3 for ZXY layout

  • pencil[out] Created Pencil object

Returns:

Error::SUCCESS on success or error code on failure.

Pencil get_pencil(int32_t dim) const

Get the pencil object.

Parameters:

dim[in] Required dimension:

  • 0 for XYZ layout (real space, valid for PlanR2C only)

  • 1 for XYZ layout (real space for C2C and R2R plans and Fourier space for R2C plans)

  • 2 for YZX layout

  • 3 for ZXY layout

Throws:

Exception – if underlying call fails

Returns:

Created Pencil object

Error execute(void *in, void *out, Execute execute_type, void *aux = nullptr) const noexcept

Plan execution.

Parameters:
  • in[inout] Input pointer

  • out[out] Result pointer

  • execute_type[in] Direction of execution

  • aux[inout] Optional Auxiliary pointer

Returns:

Error::SUCCESS on success or error code on failure.

template<typename Tr>
inline Tr *execute(void *inout, const Execute execute_type, void *aux = nullptr) const

In-place plan execution.

This template allows user to cast result pointer to desired type.

float *data = ...; // Pointer to data

PlanR2C plan = ...; // Create plan

auto fourier_data = plan.execute<std::complex<float>>(data, Execute::FORWARD);
// `fourier_data` is still pointing to `data`, but is of type std::complex<float>*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:

Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • execute_type[in] Direction of execution

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to Tr*

template<typename T, typename Tr = T>
inline Tr *execute(T *inout, const Execute execute_type, void *aux = nullptr) const

In-place plan execution.

This template allows user to keep result pointer of the same type as input pointer.

float *data = ...; // Pointer to data

PlanR2R plan = ...; // Create plan

auto fourier_data = plan.execute(data, Execute::FORWARD);
// `fourier_data` is still pointing to `data` and is still of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:
  • T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

  • Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • execute_type[in] Direction of execution

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to Tr*

Error forward(void *in, void *out, void *aux) const noexcept

Forward plan execution.

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Parameters:
  • in[inout] Input pointer

  • out[out] Result pointer

  • aux[inout] Auxiliary pointer. Can be nullptr

Returns:

Error::SUCCESS on success or error code on failure.

template<typename Tr>
inline Tr *forward(void *inout, void *aux = nullptr) const

In-place forward plan execution.

This template allows user to cast result pointer to desired type.

float *data = ...; // Pointer to data

PlanR2C plan = ...; // Create plan

auto fourier_data = plan.forward<std::complex<float>>(data);
// `fourier_data` is still pointing to `data`, but is of type std::complex<float>*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:

Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to Tr*

template<typename T, typename Tr = T>
inline Tr *forward(T *inout, void *aux = nullptr) const

In-place forward plan execution.

This template allows user to keep result pointer of the same type as input pointer.

float *data = ...; // Pointer to data

PlanR2R plan = ...; // Create plan

auto fourier_data = plan.forward(data);
// `fourier_data` is still pointing to `data` and is still of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:
  • T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

  • Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to Tr*

Error backward(void *in, void *out, void *aux) const noexcept

Backward plan execution.

Parameters:
  • in[inout] Input pointer

  • out[out] Result pointer

  • aux[inout] Auxiliary pointer. Can be nullptr

Returns:

Error::SUCCESS on success or error code on failure.

template<typename Tr>
inline Tr *backward(void *inout, void *aux = nullptr) const

In-place backward plan execution.

This template allows user to cast result pointer to desired type.

std::complex<float> *fourier_data = ...; // Pointer to data

PlanR2C plan = ...; // Create plan

auto real_data = plan.backward<float>(fourier_data);
// `real_data` is still pointing to `fourier_data`, but is of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:

Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to Tr*

template<typename T, typename Tr = T>
inline Tr *backward(T *inout, void *aux = nullptr) const

In-place backward plan execution.

This template allows user to keep result pointer of the same type as input pointer.

float *fourier_data = ...; // Pointer to data

PlanR2R plan = ...; // Create plan

auto real_data = plan.backward(fourier_data);
// `real_data` is still pointing to `fourier_data` and is still of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:
  • T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

  • Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to Tr*

Error transpose(void *in, void *out, Transpose transpose_type) const noexcept

Transpose data in a single dimension, e.g. X align -> Y align

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Pointer of transposed data

  • transpose_type[in] Type of transpose to perform.

Returns:

Error::SUCCESS on success or error code on failure.

Error transpose_start(void *in, void *out, Transpose transpose_type, dtfft_request_t *request) const noexcept

Starts an asynchronous transpose operation in a single dimension, e.g. X align -> Y align

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • transpose_type[in] Type of transpose to perform

  • request[out] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

Error transpose_end(dtfft_request_t request) const noexcept

Ends an asynchronous transpose operation.

Parameters:

request[inout] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

Error get_alloc_size(size_t *alloc_size) const noexcept

Wrapper around Plan.get_local_sizes to obtain alloc_size only.

Parameters:

alloc_size[out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling Plan.get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.

size_t get_alloc_size() const

Wrapper around Plan.get_local_sizes to obtain alloc_size only.

Throws:

Exception – if underlying call fails

Returns:

Minimum number of elements to be allocated for in, out or aux buffers.

Error get_local_sizes(std::vector<int32_t> &in_starts, std::vector<int32_t> &in_counts, std::vector<int32_t> &out_starts, std::vector<int32_t> &out_counts, size_t *alloc_size) const noexcept

Get grid decomposition information.

Results may differ on different MPI processes

Note

Before calling this function, user must ensure that in_starts, in_counts, out_starts and out_counts vectors are large enough to hold the data.

Parameters:
  • in_starts[out] Starts of local portion of data in ‘real’ space in reversed order

  • in_counts[out] Sizes of local portion of data in ‘real’ space in reversed order

  • out_starts[out] Starts of local portion of data in ‘fourier’ space in reversed order

  • out_counts[out] Sizes of local portion of data in ‘fourier’ space in reversed order

  • alloc_size[out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling Plan.get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.

Error get_local_sizes(int32_t *in_starts = nullptr, int32_t *in_counts = nullptr, int32_t *out_starts = nullptr, int32_t *out_counts = nullptr, size_t *alloc_size = nullptr) const noexcept

Get grid decomposition information.

Results may differ on different MPI processes

Parameters:
  • in_starts[out] Starts of local portion of data in ‘real’ space in reversed order

  • in_counts[out] Sizes of local portion of data in ‘real’ space in reversed order

  • out_starts[out] Starts of local portion of data in ‘fourier’ space in reversed order

  • out_counts[out] Sizes of local portion of data in ‘fourier’ space in reversed order

  • alloc_size[out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling Plan.get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.
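The parameters above are reported in reversed order, while other parts of this page use natural Fortran order. A minimal sketch of converting between the two, assuming that "reversed" means the exact reverse of the natural Fortran order; reversed_order is a hypothetical helper, not a dtFFT function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: flips a vector of starts or counts between the two
// index orders. Applying it twice gives back the original vector.
inline std::vector<std::int32_t> reversed_order(std::vector<std::int32_t> values) {
    std::reverse(values.begin(), values.end());
    return values;
}
```

The same helper works for both directions, since reversing is its own inverse.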

Error get_element_size(size_t *element_size) const noexcept

Obtains number of bytes required to store single element by this plan.

Parameters:

element_size[out] Size of element in bytes

Returns:

Error::SUCCESS on success or error code on failure.

size_t get_element_size() const

Obtains number of bytes required to store single element by this plan.

Throws:

Exception – if underlying call fails

Returns:

Size of element in bytes

Error get_alloc_bytes(size_t *alloc_bytes) const noexcept

Returns minimum number of bytes required to execute plan.

This function is a combination of two calls: Plan.get_alloc_size and Plan.get_element_size

Parameters:

alloc_bytes[out] Number of bytes required

Returns:

Error::SUCCESS on success or error code on failure.

size_t get_alloc_bytes() const

Returns minimum number of bytes required to execute plan.

This function is a combination of two calls: Plan.get_alloc_size and Plan.get_element_size

Throws:

Exception – if underlying call fails

Returns:

Number of bytes of each buffer required to execute plan
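The relation between the three size queries can be sketched as follows, assuming the byte count is simply the element count multiplied by the element size. demo_alloc_bytes and demo_allocate are hypothetical helpers that take plain numbers instead of a plan, so the example runs without the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: bytes per buffer from element count and element size.
// Assumes alloc_bytes == alloc_size * element_size.
inline std::size_t demo_alloc_bytes(std::size_t alloc_size, std::size_t element_size) {
    return alloc_size * element_size;
}

// Allocates one zero-initialized byte buffer large enough for `alloc_size` elements.
inline std::vector<unsigned char> demo_allocate(std::size_t alloc_size,
                                                std::size_t element_size) {
    return std::vector<unsigned char>(demo_alloc_bytes(alloc_size, element_size));
}
```

With a real plan, the two inputs would come from Plan.get_alloc_size and Plan.get_element_size, and each of the in, out and aux buffers would need at least that many bytes.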

Error get_executor(Executor *executor) const noexcept

Returns executor used by this plan.

Parameters:

executor[out] Executor used by this plan.

Returns:

Error::SUCCESS on success or error code on failure.

Executor get_executor() const

Returns executor used by this plan.

Throws:

Exception – if underlying call fails

Returns:

Executor used by this plan.

Error get_precision(Precision *precision) const noexcept

Returns precision of the plan.

Parameters:

precision[out] Precision of the plan.

Returns:

Error::SUCCESS on success or error code on failure.

Precision get_precision() const

Returns precision of the plan.

Throws:

Exception – if underlying call fails

Returns:

Precision of the plan.

Error get_dims(int8_t *ndims, const int32_t *dims[]) const noexcept

Returns global dimensions of the plan.

Note

Do not free the array; it is freed when the Plan is destroyed.

Parameters:
  • ndims[out] Number of dimensions in the plan. User can pass nullptr if this value is not needed.

  • dims[out] Array of dimensions in natural Fortran order. User can pass nullptr if this value is not needed.

Returns:

Error::SUCCESS on success or error code on failure.

std::vector<int32_t> get_dims() const

Returns global dimensions of the plan.

Throws:

Exception – if underlying call fails

Returns:

Vector of dimensions in natural Fortran order. Size of vector is equal to number of dimensions in the plan.
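
get_dims reports dimensions in natural Fortran order, while the plan constructors take dims in reversed order. Assuming "reversed order" is simply the element-wise reverse of the Fortran-order vector, converting between the two could look like the sketch below. The name reversed_copy is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns a copy of dims with the element order reversed.
inline std::vector<std::int32_t> reversed_copy(const std::vector<std::int32_t> &dims)
{
    return std::vector<std::int32_t>(dims.rbegin(), dims.rend());
}
```

Under that assumption, a Fortran-order vector {64, 32, 16} becomes {16, 32, 64}.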

Error get_grid_dims(int8_t *ndims, const int32_t *grid_dims[]) const noexcept

Returns grid decomposition dimensions of the plan.

Note

Do not free the grid_dims array; it is freed when the Plan is destroyed.

Parameters:
  • ndims[out] Number of dimensions in the plan. User can pass nullptr if this value is not needed.

  • grid_dims[out] Pointer to an array of size ndims containing grid decomposition dimensions in reverse order: grid_dims[0] is the fastest varying and is always equal to 1. User can pass nullptr if this value is not needed.

Returns:

Error::SUCCESS on success or error code on failure.

std::vector<int32_t> get_grid_dims() const

Returns grid decomposition dimensions of the plan.

Throws:

Exception – if underlying call fails

Returns:

Vector of grid decomposition dimensions in natural Fortran order. Size of vector is equal to number of dimensions in the plan. First value is always equal to 1.

Error mem_alloc(size_t alloc_bytes, void **ptr) const noexcept

Allocates memory specific for this plan.

Parameters:
  • alloc_bytes[in] Number of bytes to allocate

  • ptr[out] Allocated pointer

Returns:

Error::SUCCESS on success or error code on failure.

void *mem_alloc(size_t alloc_bytes) const

Allocates memory specific for this plan.

Parameters:

alloc_bytes – Number of bytes to allocate

Throws:

Exception – if underlying call fails

Returns:

Pointer to allocated memory

template<typename T>
inline T *mem_alloc(const size_t alloc_size) const

Allocates memory for an array of elements of type T.

Template Parameters:

T – Type of elements

Parameters:

alloc_size[in] Number of elements to allocate

Throws:

Exception – if underlying call fails

Returns:

Pointer to allocated memory

Error mem_free(void *ptr) const noexcept

Frees memory specific for this plan.

Parameters:

ptr[inout] Allocated pointer

Returns:

Error::SUCCESS on success or error code on failure.
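
Because memory obtained from mem_alloc has to be returned through mem_free on the same plan, pairing the two in an RAII wrapper can help avoid leaks. The following is a minimal sketch built on std::unique_ptr with a custom deleter. PlanDeleter, PlanBuffer and make_plan_buffer are hypothetical names, and MockAllocPlan is a stand-in that only mimics the two documented calls so the snippet compiles without dtFFT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in exposing the two calls used below; it counts live buffers.
struct MockAllocPlan {
    mutable int live = 0;
    void *mem_alloc(std::size_t alloc_bytes) const { ++live; return std::malloc(alloc_bytes); }
    void mem_free(void *ptr) const noexcept { --live; std::free(ptr); }
};

// Deleter that hands the pointer back to the plan that allocated it.
template <typename PlanT>
struct PlanDeleter {
    const PlanT *plan;
    void operator()(void *ptr) const noexcept
    {
        if (ptr != nullptr) plan->mem_free(ptr);
    }
};

template <typename PlanT>
using PlanBuffer = std::unique_ptr<void, PlanDeleter<PlanT>>;

// Allocates alloc_bytes through the plan and ties the release to scope exit.
template <typename PlanT>
PlanBuffer<PlanT> make_plan_buffer(const PlanT &plan, std::size_t alloc_bytes)
{
    return PlanBuffer<PlanT>(plan.mem_alloc(alloc_bytes), PlanDeleter<PlanT>{&plan});
}
```

The buffer stores a pointer to the plan, so it must not outlive the plan. With a real Plan, the Error returned by mem_free would be discarded inside the deleter.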

Error destroy() noexcept

Plan Destructor.

To fully release all internal memory, this should be called before MPI_Finalize.

Returns:

Error::SUCCESS on success or error code on failure.

Error get_backend(Backend &backend) const noexcept

Returns the backend selected during autotuning if effort is Effort::PATIENT.

If the effort passed to the create function is Effort::ESTIMATE or Effort::MEASURE, returns the value set by Config.set_backend followed by set_config(), or the default value, Backend::NCCL.

Returns:

Error::SUCCESS on success or error code on failure.

Backend get_backend() const

Returns the backend selected during autotuning if effort is Effort::PATIENT.

If the effort passed to the create function is Effort::ESTIMATE or Effort::MEASURE, returns the value set by Config.set_backend followed by set_config(), or the default value, Backend::NCCL.

Throws:

Exception – if underlying call fails

Returns:

Backend used by this plan.

Error get_stream(dtfft_stream_t *stream) const noexcept

Returns stream associated with current Plan.

This can be either the stream passed via Config.set_stream followed by set_config() or a stream created internally. Returns a null pointer if the plan's platform is Platform::HOST.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

Parameters:

stream[out] CUDA stream associated with plan

Returns:

Error::SUCCESS on success or error code on failure.

dtfft_stream_t get_stream() const

Returns stream associated with current Plan.

This can be either the stream passed via Config.set_stream followed by set_config() or a stream created internally. Returns a null pointer if the plan's platform is Platform::HOST.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

Throws:

Exception – if underlying call fails

Returns:

dtFFT stream associated with plan

Error get_platform(Platform &platform) const noexcept

Returns plan execution platform.

Returns:

Error::SUCCESS on success or error code on failure.

Platform get_platform() const

Returns plan execution platform.

Throws:

Exception – if underlying call fails

Returns:

Platform::HOST if plan is executed on host, Platform::CUDA if plan is executed on CUDA device.

inline dtfft_plan_t c_struct() const

Returns:

Underlying C structure

inline virtual ~Plan() noexcept = 0

Plan Destructor.

To fully release all internal memory, the plan should be destroyed before MPI_Finalize is called.

class PlanC2C : public dtfft::Plan

Complex-to-Complex Plan.

Public Functions

explicit PlanC2C(const std::vector<int32_t> &dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Complex-to-Complex Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor

Throws:

Exception – In case error occurs during plan creation
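
The constraints on dims can be checked before a plan is constructed. The sketch below mirrors two documented conditions: dims.size() must be 2 or 3, and every dimension must be positive (see Error::INVALID_N_DIMENSIONS and Error::INVALID_DIMENSION_SIZE). DimsCheck and check_dims are hypothetical names used only for illustration. The order in which the library performs these checks is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type; the real library reports these cases via dtfft::Error.
enum class DimsCheck { OK, INVALID_N_DIMENSIONS, INVALID_DIMENSION_SIZE };

// Hypothetical pre-check of the dims argument passed to a plan constructor.
inline DimsCheck check_dims(const std::vector<std::int32_t> &dims)
{
    if (dims.size() != 2 && dims.size() != 3) {
        return DimsCheck::INVALID_N_DIMENSIONS;
    }
    for (std::int32_t n : dims) {
        if (n <= 0) return DimsCheck::INVALID_DIMENSION_SIZE;
    }
    return DimsCheck::OK;
}
```

A pre-check like this lets an application report bad input early instead of catching the exception thrown by the constructor.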

explicit PlanC2C(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)

Complex-to-Complex Transpose-only Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

explicit PlanC2C(int8_t ndims, const int32_t *dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Complex-to-Complex Generic Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Buffer of size ndims with global dimensions in reversed order.

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor

Throws:

Exception – In case error occurs during plan creation

explicit PlanC2C(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)

Complex-to-Complex Transpose-only Plan constructor using pencil decomposition information.

Parameters:
  • pencil[in] Initialized Pencil object.

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

explicit PlanC2C(const Pencil &pencil, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Complex-to-Complex Plan constructor using pencil decomposition information.

Parameters:
  • pencil[in] Initialized Pencil object.

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor

Throws:

Exception – In case error occurs during plan creation

class PlanR2C : public dtfft::Plan

Real-to-Complex Plan.

Note

This class is only present in the API when dtFFT was compiled with any external FFT.

Public Functions

explicit PlanR2C(const std::vector<int32_t> &dims, Executor executor, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE)

Real-to-Complex Plan constructor.

Note

Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • executor[in] Type of external FFT executor

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

explicit PlanR2C(int8_t ndims, const int32_t *dims, Executor executor, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE)

Real-to-Complex Generic Plan constructor.

Note

Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Buffer of size ndims with global dimensions in reversed order.

  • executor[in] Type of external FFT executor

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

explicit PlanR2C(const Pencil &pencil, Executor executor, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE)

Real-to-Complex Plan constructor using pencil decomposition information.

Note

Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.

Parameters:
  • pencil[in] Initialized Pencil object.

  • executor[in] Type of external FFT executor

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

class PlanR2R : public dtfft::Plan

Real-to-Real Plan.

Public Functions

explicit PlanR2R(const std::vector<int32_t> &dims, const std::vector<R2RKind> &kinds = std::vector<R2RKind>(), MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • kinds[in] Real FFT kinds in reversed order. Can be empty vector if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor.

Throws:

Exception – In case error occurs during plan creation
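
The rule for the kinds argument can be written as a small predicate: kinds may be empty only when executor is Executor::NONE (see Error::MISSING_R2R_KINDS). The sketch below additionally assumes that, when kinds are required, one kind per dimension is expected, in line with the "Buffer of size ndims" wording of the pointer-based overload. The function kinds_argument_ok and its bool parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical predicate: is the number of supplied R2R kinds acceptable?
// executor_is_none stands in for the comparison executor == Executor::NONE.
inline bool kinds_argument_ok(std::size_t n_kinds, std::size_t n_dims, bool executor_is_none)
{
    if (executor_is_none) {
        return true; // transpose-only plan: kinds are not needed
    }
    return n_kinds == n_dims; // assumption: one kind per dimension
}
```

For a 3D plan with an FFT executor, three kinds pass the check and an empty vector does not.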

explicit PlanR2R(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)

Real-to-Real Transpose-only Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

explicit PlanR2R(int8_t ndims, const int32_t *dims, const R2RKind *kinds = nullptr, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Generic Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Buffer of size ndims with global dimensions in reversed order.

  • kinds[in] Buffer of size ndims with Real FFT kinds in reversed order. Can be nullptr if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor.

Throws:

Exception – In case error occurs during plan creation

explicit PlanR2R(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)

Real-to-Real Transpose-only Plan constructor.

Parameters:
  • pencil[in] Initialized Pencil object.

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

Throws:

Exception – In case error occurs during plan creation

explicit PlanR2R(const Pencil &pencil, const std::vector<R2RKind> &kinds, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Plan constructor.

Parameters:
  • pencil[in] Initialized Pencil object.

  • kinds[in] Real FFT kinds in reversed order. Can be empty vector if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor.

Throws:

Exception – In case error occurs during plan creation

explicit PlanR2R(const Pencil &pencil, const R2RKind *kinds = nullptr, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Generic Plan constructor.

Parameters:
  • pencil[in] Initialized Pencil object.

  • kinds[in] Buffer of size ndims with Real FFT kinds in reversed order. Can be nullptr if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] How thoroughly dtFFT searches for the optimal plan

  • executor[in] Type of external FFT executor.

Throws:

Exception – In case error occurs during plan creation