C++ API Reference

This page describes all classes, enumerators and functions available in the dtFFT C++ API. To use them, add #include <dtfft.hpp> to your code. The entire API is contained within the dtfft namespace.

Predefined Macros

DTFFT_CXX_CALL(call)

Safe call macro.

Should be used to check error codes returned by dtFFT.

Throws an exception with a message explaining the error if one occurs.

Example

DTFFT_CXX_CALL( plan.execute(a, b, dtfft::Execute::FORWARD) )
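For readers curious how a safe-call macro of this kind usually works, here is a minimal sketch. It is not dtFFT's actual definition: every name in it (`sketch::Error`, `sketch::Exception`, `sketch::describe`, `SKETCH_CALL`, `always_ok`, `always_fails`) is a hypothetical stand-in chosen so the block compiles without dtFFT or MPI.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical stand-in for an error-code enum such as dtfft::Error.
enum class Error { SUCCESS, INVALID_USAGE };

// Hypothetical stand-in for a function that describes an error code.
inline std::string describe(Error e) {
    return e == Error::SUCCESS ? "Successful execution" : "Invalid API usage";
}

// Exception that remembers the error code and where the failing call was made.
class Exception : public std::runtime_error {
public:
    Exception(Error code, const std::string& msg, const char* file, int line)
        : std::runtime_error(msg + " (" + file + ":" + std::to_string(line) + ")"),
          code_(code) {}
    Error code() const noexcept { return code_; }

private:
    Error code_;
};

// Two toy "library calls" that report their status through the return value.
inline Error always_ok() noexcept { return Error::SUCCESS; }
inline Error always_fails() noexcept { return Error::INVALID_USAGE; }

}  // namespace sketch

// Evaluate `call` exactly once and throw if it did not return SUCCESS.
#define SKETCH_CALL(call)                                                     \
    do {                                                                      \
        const sketch::Error sketch_ec_ = (call);                              \
        if (sketch_ec_ != sketch::Error::SUCCESS) {                           \
            throw sketch::Exception(sketch_ec_, sketch::describe(sketch_ec_), \
                                    __FILE__, __LINE__);                      \
        }                                                                     \
    } while (false)
```

Wrapping the body in do { ... } while (false) lets the macro be used like an ordinary statement, including inside an unbraced if/else.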

Enumerators

enum class dtfft::Error

This enum lists the different error codes that dtFFT can return.

Values:

enumerator SUCCESS

Successful execution.

enumerator MPI_FINALIZED

MPI_Init is not called or MPI_Finalize has already been called.

enumerator PLAN_NOT_CREATED

Plan not created.

enumerator INVALID_TRANSPOSE_TYPE

Invalid transpose_type provided.

enumerator INVALID_N_DIMENSIONS

Invalid Number of dimensions provided.

Valid options are 2 and 3

enumerator INVALID_DIMENSION_SIZE

One or more provided dimension sizes <= 0.

enumerator INVALID_COMM_TYPE

Invalid communicator type provided.

enumerator INVALID_PRECISION

Invalid precision parameter provided.

enumerator INVALID_EFFORT

Invalid effort parameter provided.

enumerator INVALID_EXECUTOR

Invalid executor parameter provided.

enumerator INVALID_COMM_DIMS

Number of dimensions in the provided Cartesian communicator > number of dimensions passed to the create subroutine.

enumerator INVALID_COMM_FAST_DIM

Passed Cartesian communicator with number of processes in 1st (fastest varying) dimension > 1.

enumerator MISSING_R2R_KINDS

For R2R plan, kinds parameter must be passed if executor != Executor::NONE

enumerator INVALID_R2R_KINDS

Invalid values detected in kinds parameter.

enumerator R2C_TRANSPOSE_PLAN

Transpose plan is not supported in R2C, use R2R or C2C plan instead.

enumerator INPLACE_TRANSPOSE

Inplace transpose is not supported.

enumerator INVALID_AUX

Invalid aux buffer provided.

enumerator INVALID_LAYOUT

Invalid layout passed to Plan.get_pencil

enumerator INVALID_USAGE

Invalid API Usage.

enumerator PLAN_IS_CREATED

Trying to create already created plan.

enumerator R2R_FFT_NOT_SUPPORTED

Selected executor does not support R2R FFTs.

enumerator ALLOC_FAILED

Internal call of Plan.mem_alloc failed.

enumerator FREE_FAILED

Internal call of Plan.mem_free failed.

enumerator INVALID_ALLOC_BYTES

Invalid alloc_bytes provided.

enumerator DLOPEN_FAILED

Failed to dynamically load library.

enumerator DLSYM_FAILED

Failed to dynamically load symbol.

enumerator PENCIL_ARRAYS_SIZE_MISMATCH

Sizes of starts and counts arrays passed to Pencil constructor do not match.

enumerator PENCIL_ARRAYS_INVALID_SIZES

Sizes of starts and counts < 2 or > 3 provided to Pencil constructor.

enumerator PENCIL_INVALID_COUNTS

Invalid counts provided to Pencil constructor.

enumerator PENCIL_INVALID_STARTS

Invalid starts provided to Pencil constructor.

enumerator PENCIL_SHAPE_MISMATCH

Processes have same lower bounds (starts) but different sizes in some dimensions.

enumerator PENCIL_OVERLAP

Pencil overlap detected, i.e. two processes share the same part of the global space.

enumerator PENCIL_NOT_CONTINUOUS

Local pencils do not cover the global space without gaps.

enumerator PENCIL_NOT_INITIALIZED

Pencil is not initialized, i.e. the constructor subroutine was not called.

enumerator INVALID_MEASURE_WARMUP_ITERS

Invalid n_measure_warmup_iters provided.

enumerator INVALID_MEASURE_ITERS

Invalid n_measure_iters provided.

enumerator INVALID_REQUEST

Invalid dtfft_request_t provided.

enumerator TRANSPOSE_ACTIVE

Attempting to execute already active transposition.

enumerator TRANSPOSE_NOT_ACTIVE

Attempting to finalize non-active transposition.

enumerator INVALID_RESHAPE_TYPE

Invalid reshape_type provided.

enumerator RESHAPE_ACTIVE

Attempting to execute already active reshape.

enumerator RESHAPE_NOT_ACTIVE

Attempting to finalize non-active reshape.

enumerator INPLACE_RESHAPE

Inplace reshape is not supported.

enumerator INVALID_EXECUTE_TYPE

Invalid execute_type provided.

enumerator RESHAPE_NOT_SUPPORTED

Reshape is not supported for this plan.

enumerator R2C_EXECUTE_CALLED

Execute called for transpose-only R2C Plan.

enumerator INVALID_CART_COMM

Invalid cartesian communicator provided.

enumerator INVALID_TRANSPOSE_MODE

Invalid transpose mode provided.

enumerator GPU_INVALID_STREAM

Invalid stream provided.

enumerator INVALID_BACKEND

Invalid backend provided.

enumerator GPU_NOT_SET

Multiple MPI processes located on the same host share the same GPU, which is not supported.

enumerator VKFFT_R2R_2D_PLAN

When using R2R FFT and executor type is vkFFT and plan uses Z-slab optimization, it is required that types of R2R transform are same in X and Y directions.

enumerator BACKENDS_DISABLED

Passed effort == Effort::PATIENT but all GPU backends have been disabled by Config

enumerator NOT_DEVICE_PTR

One of pointers passed to Plan::execute or Plan::transpose cannot be accessed from device.

enumerator NOT_NVSHMEM_PTR

One of pointers passed to Plan::execute or Plan::transpose is not an NVSHMEM pointer.

enumerator INVALID_PLATFORM

Invalid platform provided.

enumerator INVALID_PLATFORM_EXECUTOR

Invalid executor provided for selected platform.

enumerator INVALID_PLATFORM_BACKEND

Invalid backend provided for selected platform.

enumerator COMPRESSION_CUDA_NOT_SUPPORTED

CUDA support is not available for compression.

enumerator COMPRESSION_INVALID_RATE

Invalid compression rate.

enumerator COMPRESSION_INVALID_PRECISION

Invalid compression precision.

enumerator COMPRESSION_INVALID_TOLERANCE

Invalid compression tolerance.

enumerator COMPRESSION_INVALID_MODE

Invalid compression mode.

enumerator COMPRESSION_INVALID_LIBRARY

Invalid compression library.

enumerator COMPRESSION_NOT_USED

Compressed backends are not used for this plan.

enum class dtfft::Execute

This enum lists valid execute_type parameters that can be passed to Plan::execute.

Values:

enumerator FORWARD

Perform XYZ --> YZX --> ZXY plan execution (Forward)

enumerator BACKWARD

Perform ZXY --> YZX --> XYZ plan execution (Backward)
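Reading each triple above as the axis order of the local array at that stage, every step of a forward execution is a cyclic shift of the axes by one position. The sketch below only illustrates that bookkeeping. `rotate_axes` and `forward_order` are hypothetical helpers, not part of dtFFT.

```cpp
#include <array>

// Hypothetical helper: cyclically shift the axis order by one position,
// so that {X, Y, Z} becomes {Y, Z, X}.
inline std::array<char, 3> rotate_axes(const std::array<char, 3>& order) {
    return std::array<char, 3>{{order[1], order[2], order[0]}};
}

// Axis order after `steps` transpositions of a forward execution,
// starting from the X-aligned order {X, Y, Z}.
inline std::array<char, 3> forward_order(int steps) {
    std::array<char, 3> order{{'X', 'Y', 'Z'}};
    for (int i = 0; i < steps; ++i) {
        order = rotate_axes(order);
    }
    return order;
}
```

A backward execution walks the same three orders in reverse.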

enum class dtfft::Transpose

This enum lists valid transpose_type parameters that can be passed to Plan::transpose.

Values:

enumerator X_TO_Y

Transpose from Fortran X aligned to Fortran Y aligned.

enumerator Y_TO_X

Transpose from Fortran Y aligned to Fortran X aligned.

enumerator Y_TO_Z

Transpose from Fortran Y aligned to Fortran Z aligned.

enumerator Z_TO_Y

Transpose from Fortran Z aligned to Fortran Y aligned.

enumerator X_TO_Z

Transpose from Fortran X aligned to Fortran Z aligned.

Note

This value is valid only for 3D plans, and Plan::get_z_slab_enabled() must return true

enumerator Z_TO_X

Transpose from Fortran Z aligned to Fortran X aligned.

Note

This value is valid only for 3D plans, and Plan::get_z_slab_enabled() must return true

enum class dtfft::Precision

This enum lists valid precision parameters that can be passed to Plan constructors.

Values:

enumerator SINGLE

Use Single precision.

enumerator DOUBLE

Use Double precision.

enum class dtfft::Effort

This enum lists valid effort parameters that can be passed to Plan constructors.

Values:

enumerator ESTIMATE

Create plan as fast as possible.

enumerator MEASURE

Will attempt to find best MPI Grid decomposition.

Passing this flag and MPI Communicator with cartesian topology to any Plan Constructor is same as Effort::ESTIMATE.

enumerator PATIENT

Same as Effort::MEASURE plus autotune will try to find best backend.

enumerator EXHAUSTIVE

Same as Effort::PATIENT plus will autotune all possible kernels and reshape backends to find best configuration.
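The effort levels are cumulative, which the following sketch summarizes. It uses a local mirror of the enum and a hypothetical `TuningScope` type, neither of which is part of dtFFT. The page only states that a Cartesian communicator makes Effort::MEASURE behave like Effort::ESTIMATE. The sketch assumes that the grid search is likewise skipped at higher levels.

```cpp
namespace sketch {

// Local mirror of the four effort levels, in increasing order.
enum class Effort { ESTIMATE, MEASURE, PATIENT, EXHAUSTIVE };

// Hypothetical summary of what gets autotuned during plan creation.
struct TuningScope {
    bool grid_decomposition;
    bool backend;
    bool kernels_and_reshape_backends;
};

constexpr bool at_least(Effort effort, Effort level) {
    return static_cast<int>(effort) >= static_cast<int>(level);
}

// `has_cartesian_comm`: the communicator already carries a Cartesian
// topology, so (by assumption) there is no grid left to search.
constexpr TuningScope tuning_scope(Effort effort, bool has_cartesian_comm) {
    return TuningScope{
        at_least(effort, Effort::MEASURE) && !has_cartesian_comm,
        at_least(effort, Effort::PATIENT),
        at_least(effort, Effort::EXHAUSTIVE)};
}

}  // namespace sketch
```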

enum class dtfft::Executor

This enum lists available FFT executors.

Values:

enumerator NONE

Do not create any FFT plans.

Creates transpose only plan.

enumerator FFTW3

FFTW3 Executor (Host only)

enumerator MKL

MKL DFTI Executor (Host only)

enumerator CUFFT

CUFFT Executor (GPU Only)

enumerator VKFFT

VkFFT Executor (GPU Only)

enum class dtfft::R2RKind

Real-to-Real FFT kinds available in dtFFT.

Values:

enumerator DCT_1

DCT-I (Logical N=2*(n-1), inverse is R2RKind::DCT_1)

enumerator DCT_2

DCT-II (Logical N=2*n, inverse is R2RKind::DCT_3)

enumerator DCT_3

DCT-III (Logical N=2*n, inverse is R2RKind::DCT_2)

enumerator DCT_4

DCT-IV (Logical N=2*n, inverse is R2RKind::DCT_4)

enumerator DST_1

DST-I (Logical N=2*(n+1), inverse is R2RKind::DST_1)

enumerator DST_2

DST-II (Logical N=2*n, inverse is R2RKind::DST_3)

enumerator DST_3

DST-III (Logical N=2*n, inverse is R2RKind::DST_2)

enumerator DST_4

DST-IV (Logical N=2*n, inverse is R2RKind::DST_4)
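The logical sizes and inverse pairs listed above can be written down as two small functions. This is a sketch using a local mirror of the enum. `logical_size` and `inverse_kind` are hypothetical helpers, not dtFFT functions, and the enumerator values are arbitrary. In FFTW's convention the logical size N is the factor by which an unnormalized forward/inverse round trip scales the data. This page does not say whether dtFFT normalizes.

```cpp
namespace sketch {

// Local mirror of the kinds listed above; numeric values are arbitrary.
enum class R2RKind { DCT_1, DCT_2, DCT_3, DCT_4, DST_1, DST_2, DST_3, DST_4 };

// Logical size N of the equivalent DFT for a real array of n elements.
constexpr long logical_size(R2RKind kind, long n) {
    return kind == R2RKind::DCT_1 ? 2 * (n - 1)
         : kind == R2RKind::DST_1 ? 2 * (n + 1)
                                  : 2 * n;
}

// Kind that undoes `kind`, following the "inverse is ..." notes above.
constexpr R2RKind inverse_kind(R2RKind kind) {
    return kind == R2RKind::DCT_2 ? R2RKind::DCT_3
         : kind == R2RKind::DCT_3 ? R2RKind::DCT_2
         : kind == R2RKind::DST_2 ? R2RKind::DST_3
         : kind == R2RKind::DST_3 ? R2RKind::DST_2
                                  : kind;  // types I and IV are self-inverse
}

}  // namespace sketch
```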

enum class dtfft::Backend

Various Backends available in dtFFT.

Values:

enumerator MPI_DATATYPE

Backend that uses MPI datatypes.

This is default backend for Host build.

Not recommended for GPU use, since it is far slower than the other backends. Not available for autotuning when effort is Effort::PATIENT in a GPU build.

enumerator MPI_P2P

MPI peer-to-peer algorithm.

enumerator MPI_P2P_PIPELINED

MPI peer-to-peer algorithm with overlapping data copying and unpacking.

enumerator MPI_A2A

MPI backend using MPI_Alltoallv.

enumerator MPI_RMA

MPI backend using one-sided communications.

enumerator MPI_RMA_PIPELINED

MPI backend using pipelined one-sided communications.

enumerator MPI_P2P_SCHEDULED

MPI peer-to-peer algorithm with scheduled communication.

enumerator MPI_P2P_FUSED

MPI peer-to-peer pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.

enumerator MPI_RMA_FUSED

MPI RMA pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.

enumerator MPI_P2P_COMPRESSED

Extension of Backend::MPI_P2P_FUSED. Data is compressed before sending and decompressed after receiving.

enumerator MPI_RMA_COMPRESSED

Extension of Backend::MPI_RMA_FUSED. Data is compressed before sending and decompressed after receiving.

enumerator NCCL

NCCL backend.

enumerator NCCL_PIPELINED

NCCL backend with overlapping data copying and unpacking.

enumerator NCCL_COMPRESSED

NCCL backend that performs compression before data exchange and decompression after.

enumerator CUFFTMP

cuFFTMp backend

enumerator CUFFTMP_PIPELINED

cuFFTMp backend that uses additional buffer to avoid extra copy and gain performance

enumerator ADAPTIVE

Adaptive backend selection: during plan creation dtFFT benchmarks multiple backends and selects the fastest backend independently for each transpose/reshape operation.

The selection is fixed for the lifetime of the plan.

Note

Can only be used when effort >= Effort::PATIENT.

Note

Currently only available for HOST execution platform

enumerator NONE

Backend is not defined.

This value is used when no backend is selected, for example when executing on a single process.

Note

This value should never be set by user directly. It can only be returned by the library.

enum class dtfft::Platform

Enum that specifies runtime platform, e.g. Host, CUDA, HIP.

Values:

enumerator HOST

Host.

enumerator CUDA

CUDA.

enum class dtfft::Layout

This enum represents different data layouts used in dtFFT and it should be used to retrieve layout information from plans.

Values:

enumerator X_BRICKS

X-brick layout: data is distributed along all dimensions.

enumerator X_PENCILS

X-pencil layout: data is distributed along Y and Z dimensions.

enumerator X_PENCILS_FOURIER

X-pencil layout obtained after executing FFT for R2C plan: data is distributed along Y and Z dimensions.

enumerator Y_PENCILS

Y-pencil layout: data is distributed along X and Z dimensions.

enumerator Z_PENCILS

Z-pencil layout: data is distributed along X and Y dimensions.

enumerator Z_BRICKS

Z-brick layout: data is distributed along all dimensions.
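The layout descriptions above amount to a small table of which dimensions are split across processes. The sketch below encodes that table for the 3D case with a local mirror of the enum. `distributed_dims` and `is_local` are hypothetical helpers, not dtFFT functions.

```cpp
#include <array>

namespace sketch {

// Local mirror of the layouts listed above.
enum class Layout {
    X_BRICKS, X_PENCILS, X_PENCILS_FOURIER, Y_PENCILS, Z_PENCILS, Z_BRICKS
};

// For each of the X, Y, Z dimensions: is it split across processes?
inline std::array<bool, 3> distributed_dims(Layout layout) {
    switch (layout) {
        case Layout::X_PENCILS:
        case Layout::X_PENCILS_FOURIER: return {{false, true, true}};
        case Layout::Y_PENCILS:         return {{true, false, true}};
        case Layout::Z_PENCILS:         return {{true, true, false}};
        case Layout::X_BRICKS:
        case Layout::Z_BRICKS:          return {{true, true, true}};
    }
    return {{true, true, true}};  // not reached for valid enumerators
}

// A 1D FFT along `dim` (0 = X, 1 = Y, 2 = Z) needs that dimension to be
// held entirely by the local process.
inline bool is_local(Layout layout, int dim) {
    return !distributed_dims(layout)[dim];
}

}  // namespace sketch
```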

enum class dtfft::Reshape

This enum lists valid reshape_type parameters that can be passed to Plan::reshape.

Values:

enumerator X_BRICKS_TO_PENCILS

Reshape from X bricks to X pencils.

enumerator X_PENCILS_TO_BRICKS

Reshape from X pencils to X bricks.

enumerator Z_BRICKS_TO_PENCILS

Reshape from Z bricks to Z pencils.

enumerator Z_PENCILS_TO_BRICKS

Reshape from Z pencils to Z bricks.

enumerator Y_BRICKS_TO_PENCILS

Reshape from Y-bricks to Y-pencils. This is to be used in 2D Plans.

enumerator Y_PENCILS_TO_BRICKS

Reshape from Y-pencils to Y-bricks. This is to be used in 2D Plans.

enum class dtfft::TransposeMode

This enum specifies at which stage the local transposition is performed during global exchange.

It affects only Generic backends that perform explicit packing/unpacking.

Values:

enumerator PACK

Perform transposition during the packing stage (Sender side).

enumerator UNPACK

Perform transposition during the unpacking stage (Receiver side).

enum class dtfft::AccessMode

This enum specifies whether to prioritize write-aligned (contiguous in memory) or read-aligned (scattered access) operations during local transposition in Generic backends.

Values:

enumerator WRITE

Write-aligned access (scattered read, contiguous write).

Usually faster on CPUs.

enumerator READ

Read-aligned access (contiguous read, scattered write).
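The difference between the two access modes is easiest to see on a plain 2D transpose. The sketch below is a generic illustration of the two loop orders and is not dtFFT's kernel. The function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// `in` holds rows x cols values in row-major order; the result holds the
// cols x rows transpose, also in row-major order.

// Write-aligned: walk the output in memory order and gather from the input.
inline std::vector<double> transpose_write_aligned(const std::vector<double>& in,
                                                   std::size_t rows,
                                                   std::size_t cols) {
    std::vector<double> out(in.size());
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            out[c * rows + r] = in[r * cols + c];  // contiguous write, strided read
        }
    }
    return out;
}

// Read-aligned: walk the input in memory order and scatter into the output.
inline std::vector<double> transpose_read_aligned(const std::vector<double>& in,
                                                  std::size_t rows,
                                                  std::size_t cols) {
    std::vector<double> out(in.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            out[c * rows + r] = in[r * cols + c];  // contiguous read, strided write
        }
    }
    return out;
}

}  // namespace sketch
```

Both functions produce the same result. Only the order in which memory is touched differs, and that order is what the access mode selects.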

enum class dtfft::CompressionLib

Enum that specifies compression library.

Values:

enumerator ZFP

ZFP compression library.

enum class dtfft::CompressionMode

Enum that specifies compression mode.

Values:

enumerator LOSSLESS

Lossless compression mode.

enumerator FIXED_RATE

Fixed rate compression mode.

enumerator FIXED_PRECISION

Fixed precision compression mode.

enumerator FIXED_ACCURACY

Fixed accuracy compression mode.

Functions

std::string dtfft::get_backend_string(Backend backend)

Returns string with name of backend provided as argument.

Parameters:

backend[in] Backend to represent

Returns:

String representation of backend.

std::string dtfft::get_error_string(Error error_code) noexcept

Returns the string description of an error code.

Parameters:

error_code[in] Error code to convert to string

Returns:

String representation of error_code

std::string dtfft::get_precision_string(Precision precision) noexcept

Returns the string representation of a Precision value.

Parameters:

precision[in] Precision level to convert to string

Returns:

String representation of Precision.

std::string dtfft::get_executor_string(Executor executor) noexcept

Returns the string representation of an Executor value.

Parameters:

executor[in] Executor type to convert to string

Returns:

String representation of Executor.

Error dtfft::set_config(const Config &config) noexcept

Sets configuration values to dtFFT.

Must be called before plan creation to take effect.

See also

Config

Returns:

Error::SUCCESS if the call was successful, error code otherwise

bool dtfft::get_backend_pipelined(Backend backend)

Returns true if passed backend is pipelined and false otherwise.

Parameters:

backend[in] Backend to check

Returns:

Logical flag

Structs

struct Version

dtFFT version information

Public Static Functions

static inline int32_t get() noexcept
Returns:

Version Code defined during compilation

static inline constexpr int32_t get(int32_t major, int32_t minor, int32_t patch) noexcept
Returns:

Version Code based on input parameters

Public Static Attributes

static constexpr int32_t MAJOR = DTFFT_VERSION_MAJOR

dtFFT Major Version

static constexpr int32_t MINOR = DTFFT_VERSION_MINOR

dtFFT Minor Version

static constexpr int32_t PATCH = DTFFT_VERSION_PATCH

dtFFT Patch Version

static constexpr int32_t CODE = DTFFT_VERSION_CODE

dtFFT Version Code.

Can be used for version comparison

struct Pencil

Class to handle Pencils.

This is a wrapper around the dtfft_pencil_t C structure.

There are two ways users might find pencils useful inside dtFFT:

  1. To create a Plan using your own grid decomposition, pass a Pencil to the Plan constructor.

  2. To obtain a Pencil from a Plan in any of the available layouts, in order to run an FFT that is not available in dtFFT.

Public Functions

Pencil()

Default constructor, does not actually initialize anything.

explicit Pencil(int32_t n_dims, const int32_t *starts, const int32_t *counts)

Pencil constructor.

After calling this constructor, this pencil can be used to create Plan

Parameters:
  • n_dims[in] Number of dimensions in pencil, must be 2 or 3

  • starts[in] Local starts in natural Fortran order

  • counts[in] Local counts in natural Fortran order

explicit Pencil(const std::vector<int32_t> &starts, const std::vector<int32_t> &counts)

Pencil constructor.

After calling this constructor, this pencil can be used to create Plan

Parameters:
  • starts[in] Local starts in natural Fortran order

  • counts[in] Local counts in natural Fortran order
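One common way to produce starts and counts for a custom decomposition is to split each distributed dimension into nearly equal consecutive blocks. The sketch below does that for an X-aligned box. `Range`, `LocalBox`, `block_range` and `x_aligned_box` are hypothetical helpers, and the sketch assumes 0-based starts, which this page does not state.

```cpp
#include <cstdint>
#include <vector>

namespace sketch {

// Half-open index range [start, start + count) owned by one process.
struct Range {
    std::int32_t start;
    std::int32_t count;
};

// Split `global_n` points into `n_parts` consecutive blocks and return the
// block with 0-based index `part`. The first `global_n % n_parts` blocks
// receive one extra point.
inline Range block_range(std::int32_t global_n, std::int32_t n_parts,
                         std::int32_t part) {
    const std::int32_t base = global_n / n_parts;
    const std::int32_t extra = global_n % n_parts;
    const std::int32_t start = part * base + (part < extra ? part : extra);
    const std::int32_t count = base + (part < extra ? 1 : 0);
    return Range{start, count};
}

// Starts and counts in X, Y, Z order for one process.
struct LocalBox {
    std::vector<std::int32_t> starts;
    std::vector<std::int32_t> counts;
};

// X is kept whole; Y and Z are split over a py x pz process grid, and
// (iy, iz) is the position of this process in that grid.
inline LocalBox x_aligned_box(std::int32_t nx, std::int32_t ny, std::int32_t nz,
                              std::int32_t py, std::int32_t pz,
                              std::int32_t iy, std::int32_t iz) {
    const Range y = block_range(ny, py, iy);
    const Range z = block_range(nz, pz, iz);
    return LocalBox{{0, y.start, z.start}, {nx, y.count, z.count}};
}

}  // namespace sketch
```

The two vectors of a `LocalBox` have the shape expected by the vector-based Pencil constructor described above. Because consecutive blocks neither overlap nor leave gaps, a decomposition built this way should not trigger the PENCIL_OVERLAP or PENCIL_NOT_CONTINUOUS errors.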

uint8_t get_ndims() const
Returns:

Number of dimensions in a pencil

uint8_t get_dim() const
Returns:

Aligned dimension ID starting from 1

std::vector<int32_t> get_starts() const
Returns:

Local starts in natural Fortran order

std::vector<int32_t> get_counts() const
Returns:

Local counts in natural Fortran order

size_t get_size() const
Returns:

Total number of elements in a pencil

const dtfft_pencil_t &c_struct() const
Returns:

Underlying C structure

struct Config

Class to set additional configuration parameters to dtFFT.

See also

set_config()

Public Functions

inline explicit Config()

Creates and sets default configuration values.

inline Config &set_enable_log(const bool enable_log) noexcept

Sets whether dtFFT should print additional information or not.

Default is false

inline Config &set_enable_z_slab(bool enable_z_slab) noexcept

Sets whether dtFFT should use Z-slab optimization or not.

Default is true

One should consider disabling Z-slab optimization in order to resolve Error::VKFFT_R2R_2D_PLAN error or when underlying FFT implementation of 2D plan is too slow.

In all other cases, Z-slab is considered to be always faster.

inline Config &set_enable_y_slab(bool enable_y_slab) noexcept

Sets whether dtFFT should use Y-slab optimization or not.

Default is false

If true then dtFFT will skip the transpose step between Y and Z aligned layouts during call to Plan::execute(). One should consider disabling Y-slab optimization in order to resolve Error::VKFFT_R2R_2D_PLAN error or when underlying FFT implementation of 2D plan is too slow.

In all other cases, Y-slab is considered to be always faster.

inline Config &set_measure_warmup_iters(int32_t n_measure_warmup_iters) noexcept

Sets number of warmup iterations to underlying C structure.

Parameters:

n_measure_warmup_iters[in] Number of warmup iterations to execute during backend and kernel autotuning when effort level is Effort::MEASURE or higher.

inline Config &set_measure_iters(int32_t n_measure_iters) noexcept

Sets number of actual iterations to underlying C structure.

Parameters:

n_measure_iters[in] Number of iterations to execute during backend and kernel autotuning when effort level is Effort::MEASURE or higher.

inline Config &set_platform(Platform platform) noexcept

Sets platform to execute plan.

Default is Platform::HOST.

This option is only available when dtFFT is built with device support. Even when dtFFT is built with device support, it does not necessarily mean that all plans must be device-related. This enables a single library installation to support both host and CUDA plans.

inline Config &set_stream(dtfft_stream_t stream) noexcept

Sets Main CUDA stream that will be used in dtFFT.

This parameter is a placeholder that lets the user set a custom stream. The stream actually used by a dtFFT plan is returned by the Plan::get_stream function. A user who sets a stream is responsible for destroying it.

The stream must not be destroyed before the call to destroy.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_backend(Backend backend) noexcept

Sets Backend that will be used by dtFFT when effort is Effort::ESTIMATE or Effort::MEASURE.

Default for HOST platform is Backend::MPI_DATATYPE.

Default for CUDA platform is Backend::NCCL if NCCL is enabled, otherwise Backend::MPI_P2P.

inline Config &set_reshape_backend(Backend backend) noexcept

Sets Backend that will be used by dtFFT for data reshaping from bricks to pencils and vice versa when effort is Effort::ESTIMATE or Effort::MEASURE.

Default for HOST platform is Backend::MPI_DATATYPE.

Default for CUDA platform is Backend::NCCL if NCCL is enabled, otherwise Backend::MPI_P2P.

inline Config &set_enable_datatype_backend(bool enable_datatype_backend) noexcept

Sets whether Backend::MPI_DATATYPE should be considered for autotuning when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is true.

This option only works when platform is Platform::HOST. When platform is Platform::CUDA, Backend::MPI_DATATYPE is always disabled during autotuning.

inline Config &set_enable_mpi_backends(bool enable_mpi_backends) noexcept

Sets whether MPI backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is false.

This option applies to all Backend::MPI_* backends, except Backend::MPI_DATATYPE.

The following applies only to CUDA builds. MPI backends are disabled by default during autotuning because of an OpenMPI bug (https://github.com/open-mpi/ompi/issues/12849): GPU memory was observed not to be freed completely during plan autotuning.

For example, for a 1024x1024x512 C2C plan in double precision on a single GPU, using Z-slab optimization and with MPI backends enabled, plan autotuning leaks 8 GB of GPU memory. Without Z-slab optimization, running on 4 GPUs, it leaks 24 GB on each of the GPUs.

One workaround is to disable MPI backends by default, which is done here.

Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but disabling CUDA IPC was observed to seriously degrade the overall performance of MPI algorithms.

inline Config &set_enable_pipelined_backends(bool enable_pipelined_backends) noexcept

Sets whether pipelined backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is true.

inline Config &set_enable_rma_backends(bool enable_rma_backends) noexcept

Sets whether RMA backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is true.

inline Config &set_enable_fused_backends(bool enable_fused_backends) noexcept

Sets whether fused backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is true.

inline Config &set_enable_nccl_backends(bool enable_nccl_backends) noexcept

Sets whether NCCL backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is true.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_enable_nvshmem_backends(bool enable_nvshmem_backends) noexcept

Sets whether NVSHMEM backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is true.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

inline Config &set_enable_kernel_autotune(bool enable_kernel_autotune) noexcept

Sets whether dtFFT should try to optimize kernel launch parameters during plan creation when effort is below Effort::EXHAUSTIVE.

Default is false.

Kernel optimization is always enabled for Effort::EXHAUSTIVE effort level. Setting this option to true enables kernel optimization for lower effort levels (Effort::ESTIMATE, Effort::MEASURE, Effort::PATIENT). This may increase plan creation time but can improve runtime performance. Since kernel optimization is performed without data transfers, the time increase is usually minimal.

inline Config &set_enable_fourier_reshape(bool enable_fourier_reshape) noexcept

Sets whether dtFFT should execute reshapes from pencils to bricks and vice versa in Fourier space during calls to execute.

Default is false.

When enabled, data will be in brick layout in Fourier space, which may be useful for certain operations between forward and backward transforms. However, this requires additional all-to-all exchange and will reduce overall performance.

inline Config &set_transpose_mode(TransposeMode transpose_mode) noexcept

Sets at which stage the local transposition is performed during global exchange when effort level is below Effort::EXHAUSTIVE.

Default is TransposeMode::PACK.

For Effort::EXHAUSTIVE effort level, dtFFT will always choose the best transpose mode based on internal autotuning.

Note

This option only takes effect if platform == Platform::HOST

inline Config &set_access_mode(AccessMode access_mode) noexcept

Sets the memory access mode for local transposition in Generic backends.

This setting allows choosing between write-aligned (contiguous write) or read-aligned (contiguous read) memory access patterns during the local transposition phase.

Default is AccessMode::WRITE.

Write-aligned access is generally faster on CPU architectures due to cache line utilization optimization. However, specific hardware or memory subsystem characteristics might favor read-aligned access.

inline Config &set_enable_compressed_backends(bool enable_compressed_backends) noexcept

Sets whether compressed backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.

Default is false.

Only fixed-rate compression can be used during autotuning, since it provides predictable performance characteristics and does not require data-dependent decisions at runtime. To enable compressed backends during autotuning, set this option to true, set compression type to CompressionMode::FIXED_RATE and provide desired compression rate.

inline Config &set_compression_config_transpose(const CompressionConfig &compression_config) noexcept

Sets compression configuration for transpositions.

inline Config &set_compression_config_reshape(const CompressionConfig &compression_config) noexcept

Sets compression configuration for reshape operations.

inline dtfft_config_t c_struct() const
Returns:

Underlying C structure
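Because every setter above returns Config &, calls can be chained. The sketch below shows the pattern on a hypothetical stand-in class that copies three of the documented setters and their documented defaults. The getters and `make_verbose_config` exist only so the sketch can be checked and are not part of dtFFT.

```cpp
namespace sketch {

// Hypothetical stand-in for a configuration class with chainable setters.
class Config {
public:
    Config& set_enable_log(bool enable_log) noexcept {
        enable_log_ = enable_log;
        return *this;  // returning *this is what makes chaining possible
    }
    Config& set_enable_z_slab(bool enable_z_slab) noexcept {
        enable_z_slab_ = enable_z_slab;
        return *this;
    }
    Config& set_enable_y_slab(bool enable_y_slab) noexcept {
        enable_y_slab_ = enable_y_slab;
        return *this;
    }

    // Getters added for illustration only.
    bool enable_log() const noexcept { return enable_log_; }
    bool enable_z_slab() const noexcept { return enable_z_slab_; }
    bool enable_y_slab() const noexcept { return enable_y_slab_; }

private:
    bool enable_log_ = false;    // documented default
    bool enable_z_slab_ = true;  // documented default
    bool enable_y_slab_ = false; // documented default
};

// Change two settings in a single chained expression.
inline Config make_verbose_config() {
    Config config;
    config.set_enable_log(true).set_enable_z_slab(false);
    return config;
}

}  // namespace sketch
```

With the real library, the same kind of chain would be applied to a dtfft::Config, which is then passed to dtfft::set_config before any plan is created.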

struct CompressionConfig

Struct that specifies compression configuration.

Public Functions

CompressionConfig() = default

Default constructor.

inline CompressionConfig(CompressionLib lib, CompressionMode mode, double rate = -1.0, int32_t precision = -1, double tolerance = -1.0)

Constructor with parameters.

inline CompressionConfig(CompressionMode mode, double value)

Fixed rate or fixed accuracy compression constructor.

inline CompressionConfig(int32_t precision)

Fixed precision compression constructor.

Public Members

CompressionLib compression_lib = CompressionLib::ZFP

Compression library to use.

CompressionMode compression_mode = CompressionMode::LOSSLESS

Compression mode to use.

double rate = -1.0

Rate for CompressionMode::FIXED_RATE.

int32_t precision = -1

Precision for CompressionMode::FIXED_PRECISION.

double tolerance = -1.0

Tolerance for CompressionMode::FIXED_ACCURACY.
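Each compression mode reads exactly one of the three numeric members, and the convenience constructors fill in the matching one. The sketch below shows one plausible way such constructors can map onto the members. It is a simplified local mirror, not dtFFT's implementation. What the real two-argument constructor does for other modes is not stated on this page, so the sketch simply leaves the defaults in place.

```cpp
#include <cstdint>

namespace sketch {

// Local mirror of the compression modes listed earlier on this page.
enum class CompressionMode { LOSSLESS, FIXED_RATE, FIXED_PRECISION, FIXED_ACCURACY };

// Simplified stand-in with the same members and defaults as documented above.
struct CompressionConfig {
    CompressionMode mode = CompressionMode::LOSSLESS;
    double rate = -1.0;
    std::int32_t precision = -1;
    double tolerance = -1.0;

    CompressionConfig() = default;

    // `value` is stored as the rate or as the tolerance, depending on `mode`.
    CompressionConfig(CompressionMode m, double value) : mode(m) {
        if (m == CompressionMode::FIXED_RATE) {
            rate = value;
        } else if (m == CompressionMode::FIXED_ACCURACY) {
            tolerance = value;
        }
    }

    // Fixed-precision shortcut: the mode is implied by the argument type.
    explicit CompressionConfig(std::int32_t p)
        : mode(CompressionMode::FIXED_PRECISION), precision(p) {}
};

}  // namespace sketch
```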

Classes

class Exception : public std::exception

Basic exception class.

Public Functions

Exception(Error error_code, std::string msg, const char *file, int line)

Basic exception constructor.

Parameters:
  • error_code[in] Error code

  • msg[in] Message describing the error that occurred

  • file[in] Filename where the exception was thrown

  • line[in] Line number where the exception was thrown

const char *what() const noexcept override

Exception explanation.

Error get_error_code() const noexcept

Returns error code of exception.

const std::string &get_message() const noexcept

Returns error message of exception.

const std::string &get_file() const noexcept

Returns file name where exception occurred.

int get_line() const noexcept

Returns line number where exception occurred.

class Plan

Abstract base class for all dtFFT plans.

This class does not have any constructors. To create a plan, use one of the derived classes.

Subclassed by dtfft::PlanC2C, dtfft::PlanR2C, dtfft::PlanR2R

Public Functions

Error get_z_slab_enabled(bool *is_z_slab_enabled) const noexcept

Checks if plan is using Z-slab optimization.

If true then flags Transpose::X_TO_Z and Transpose::Z_TO_X will be valid to pass to Plan::transpose method.

Parameters:

is_z_slab_enabled[out] Boolean value if Z-slab is used.

Returns:

Error::SUCCESS if call was without error, error code otherwise

bool get_z_slab_enabled() const

Checks if plan is using Z-slab optimization.

Throws:

Exception – if underlying call fails

Returns:

true if Z-slab is enabled, false otherwise
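Many Plan methods come in two flavors, as the pair above shows: a noexcept overload that returns an Error and writes its result through a pointer, and an overload that returns the value directly and throws on failure. The sketch below reproduces that pattern on a hypothetical stand-in class. Its constructor and internal state are invented for illustration and do not reflect how dtFFT plans are created.

```cpp
#include <stdexcept>

namespace sketch {

enum class Error { SUCCESS, PLAN_NOT_CREATED };

// Hypothetical stand-in for a plan with one queryable flag.
class Plan {
public:
    Plan(bool created, bool z_slab) : created_(created), z_slab_(z_slab) {}

    // Flavor 1: never throws and reports status through the return value.
    Error get_z_slab_enabled(bool* is_z_slab_enabled) const noexcept {
        if (!created_) {
            return Error::PLAN_NOT_CREATED;
        }
        *is_z_slab_enabled = z_slab_;
        return Error::SUCCESS;
    }

    // Flavor 2: returns the value directly and throws if flavor 1 fails.
    bool get_z_slab_enabled() const {
        bool is_z_slab_enabled = false;
        if (get_z_slab_enabled(&is_z_slab_enabled) != Error::SUCCESS) {
            throw std::runtime_error("plan not created");
        }
        return is_z_slab_enabled;
    }

private:
    bool created_;
    bool z_slab_;
};

}  // namespace sketch
```

The error-code flavor suits code that must not throw. The throwing flavor keeps call sites short when exceptions are acceptable.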

Error get_y_slab_enabled(bool *is_y_slab_enabled) const noexcept

Checks if plan is using Y-slab optimization.

If true then during call to Plan::execute the transpose between Y and Z aligned layouts will be skipped.

Parameters:

is_y_slab_enabled[out] Boolean value if Y-slab is used.

Returns:

Error::SUCCESS if call was without error, error code otherwise

bool get_y_slab_enabled() const

Checks if plan is using Y-slab optimization.

Throws:

Exception – if underlying call fails

Returns:

true if Y-slab is enabled, false otherwise

Error report() const noexcept

Prints plan-related information to stdout.

Returns:

Error::SUCCESS if call was without error, error code otherwise

Error report_compression() const noexcept

Prints compression-related information to stdout.

Returns:

Error::SUCCESS if call was without error, error code otherwise

Error get_pencil(Layout layout, Pencil &pencil) const noexcept

Obtains pencil information from plan.

This is useful when you want to use your own FFT implementation that is not available in dtFFT.

Parameters:
  • layout[in] Required layout of the pencil

  • pencil[out] Created Pencil object

Returns:

Error::SUCCESS on success or error code on failure.

Pencil get_pencil(Layout layout) const

Get the pencil object.

Parameters:

layout[in] Required layout of the pencil

Throws:

Exception – if underlying call fails

Returns:

Created Pencil object

Error execute(void *in, void *out, Execute execute_type, void *aux = nullptr) const noexcept

Plan execution.

Parameters:
  • in[inout] Input pointer

  • out[out] Result pointer

  • execute_type[in] Direction of execution

  • aux[inout] Optional Auxiliary pointer. If provided, must be at least get_aux_bytes() bytes.

Returns:

Error::SUCCESS on success or error code on failure.

template<typename Tr>
inline Tr *execute(void *inout, const Execute execute_type, void *aux = nullptr) const

In-place plan execution.

This template allows user to cast result pointer to desired type.

float *data = ...; // Pointer to data

PlanR2C plan = ...; // Create plan

auto fourier_data = plan.execute<std::complex<float>>(data, Execute::FORWARD);
// `fourier_data` is still pointing to `data`, but is of type std::complex<float>*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:

Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • execute_type[in] Direction of execution

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data cast to type Tr
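The returned pointer is only a differently typed view of the same buffer, so no data is copied. The sketch below shows just that pointer handling. `in_place` is a hypothetical stand-in that performs no transform at all.

```cpp
#include <complex>
#include <vector>

namespace sketch {

// Hypothetical stand-in for an in-place transform: a real implementation
// would overwrite the buffer first; here only the cast is shown.
template <typename Tr>
Tr* in_place(void* inout) {
    // ... the contents of `inout` would be transformed here ...
    return static_cast<Tr*>(inout);
}

// Demonstrates that the typed result aliases the input buffer.
inline bool result_aliases_input() {
    std::vector<std::complex<float>> storage(4);
    // Real-valued view of the same storage, as a caller would hold it.
    float* data = reinterpret_cast<float*>(storage.data());
    std::complex<float>* fourier = in_place<std::complex<float>>(data);
    return fourier == storage.data();
}

}  // namespace sketch
```

Allocating the storage as complex values and viewing it as float keeps the sketch within what std::complex guarantees about its memory layout.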

template<typename T, typename Tr = T>
inline Tr *execute(T *inout, const Execute execute_type, void *aux = nullptr) const

In-place plan execution.

This template allows user to keep result pointer of the same type as input pointer.

float *data = ...; // Pointer to data

PlanR2R plan = ...; // Create plan

auto fourier_data = plan.execute(data, Execute::FORWARD);
// `fourier_data` is still pointing to `data` and is still of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:
  • T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

  • Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those

Parameters:
  • inout[inout] Input/output pointer

  • execute_type[in] Direction of execution

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to type Tr *

Error forward(void *in, void *out, void *aux) const noexcept

Forward plan execution.

Parameters:
  • in[inout] Input pointer

  • out[out] Result pointer

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_bytes() bytes.

Returns:

Error::SUCCESS on success or error code on failure.

template<typename Tr>
inline Tr *forward(void *inout, void *aux = nullptr) const

In-place forward plan execution.

This template allows the user to cast the result pointer to the desired type.

float *data = ...; // Pointer to data

PlanR2C plan = ...; // Create plan

auto fourier_data = plan.forward<std::complex<float>>(data);
// `fourier_data` is still pointing to `data`, but is of type std::complex<float>*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:

Tr – Type of returned data. This should be a basic element type (the function returns Tr *), e.g. float, double or std::complex of either

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to type Tr *

template<typename T, typename Tr = T>
inline Tr *forward(T *inout, void *aux = nullptr) const

In-place forward plan execution.

This template allows the user to keep the result pointer of the same type as the input pointer.

float *data = ...; // Pointer to data

PlanR2R plan = ...; // Create plan

auto fourier_data = plan.forward(data);
// `fourier_data` is still pointing to `data` and is still of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:
  • T – Type of input/output data. This should be a basic element type (the function takes T *), e.g. float, double or std::complex of either

  • Tr – Type of returned data. This should be a basic element type (the function returns Tr *), e.g. float, double or std::complex of either

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to type Tr *

Error backward(void *in, void *out, void *aux) const noexcept

Backward plan execution.

Parameters:
  • in[inout] Input pointer

  • out[out] Result pointer

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_bytes() bytes.

Returns:

Error::SUCCESS on success or error code on failure.

template<typename Tr>
inline Tr *backward(void *inout, void *aux = nullptr) const

In-place backward plan execution.

This template allows the user to cast the result pointer to the desired type.

std::complex<float> *fourier_data = ...; // Pointer to data

PlanR2C plan = ...; // Create plan

auto real_data = plan.backward<float>(fourier_data);
// `real_data` is still pointing to `fourier_data`, but is of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:

Tr – Type of returned data. This should be a basic element type (the function returns Tr *), e.g. float, double or std::complex of either

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to type Tr *

template<typename T, typename Tr = T>
inline Tr *backward(T *inout, void *aux = nullptr) const

In-place backward plan execution.

This template allows the user to keep the result pointer of the same type as the input pointer.

float *fourier_data = ...; // Pointer to data

PlanR2R plan = ...; // Create plan

auto real_data = plan.backward(fourier_data);
// `real_data` is still pointing to `fourier_data` and is still of type float*

Note

Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.

Template Parameters:
  • T – Type of input/output data. This should be a basic element type (the function takes T *), e.g. float, double or std::complex of either

  • Tr – Type of returned data. This should be a basic element type (the function returns Tr *), e.g. float, double or std::complex of either

Parameters:
  • inout[inout] Input/output pointer

  • aux[inout] Optional Auxiliary pointer

Throws:

Exception – if underlying call fails

Returns:

Pointer to the processed data, cast to type Tr *

Error transpose(void *in, void *out, Transpose transpose_type, void *aux = nullptr) const noexcept

Transpose data in single dimension, e.g.

X align -> Y align

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Pointer of transposed data

  • transpose_type[in] Type of transpose to perform.

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_transpose() elements.

Returns:

Error::SUCCESS on success or error code on failure.

Error transpose_start(void *in, void *out, Transpose transpose_type, void *aux, dtfft_request_t *request) const noexcept

Starts an asynchronous transpose operation in single dimension, e.g.

X align -> Y align

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • transpose_type[in] Type of transpose to perform

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_transpose() elements.

  • request[out] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

Error transpose_start(void *in, void *out, Transpose transpose_type, dtfft_request_t *request) const noexcept

Starts an asynchronous transpose operation in single dimension, e.g.

X align -> Y align

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • transpose_type[in] Type of transpose to perform

  • request[out] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

dtfft_request_t transpose_start(void *in, void *out, Transpose transpose_type, void *aux = nullptr) const

Starts an asynchronous transpose operation in single dimension, e.g.

X align -> Y align

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • transpose_type[in] Type of transpose to perform

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_transpose() elements.

Returns:

Handle to manage the asynchronous operation

Error transpose_end(dtfft_request_t request) const noexcept

Ends an asynchronous transpose operation.

Parameters:

request[inout] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.
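
Every transpose_start call needs a matching transpose_end. One way to make that pairing hard to forget is a scope guard. The following is a minimal sketch, assuming a hypothetical helper named RequestGuard that is not part of dtFFT; the dtFFT calls are replaced by a counter so the sketch runs without MPI.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper, not part of dtFFT: runs `finish` exactly once when
// the guard leaves scope, so a started operation is always completed.
class RequestGuard {
public:
    explicit RequestGuard(std::function<void()> finish)
        : finish_(std::move(finish)) {}
    RequestGuard(const RequestGuard &) = delete;
    RequestGuard &operator=(const RequestGuard &) = delete;
    ~RequestGuard() {
        if (finish_) finish_();
    }

private:
    std::function<void()> finish_;
};

// Counts how many times the "end" step ran for one guarded scope.
int guarded_scope_end_calls() {
    int end_calls = 0;
    {
        // With dtFFT, this lambda would call plan.transpose_end(request).
        RequestGuard guard([&end_calls] { ++end_calls; });
        // ... independent work could be placed here ...
    }
    return end_calls;
}
```

The guard is deliberately non-copyable so that the end step cannot run twice for the same request.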

Error reshape(void *in, void *out, Reshape reshape_type, void *aux = nullptr) const noexcept

Reshape data from bricks to pencils and vice versa.

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Pointer of reshaped data

  • reshape_type[in] Type of reshape to perform.

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_reshape() elements.

Returns:

Error::SUCCESS on success or error code on failure.

Error reshape_start(void *in, void *out, Reshape reshape_type, void *aux, dtfft_request_t *request) const noexcept

Starts an asynchronous reshape operation from bricks to pencils and vice versa.

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • reshape_type[in] Type of reshape to perform

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_reshape() elements.

  • request[out] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

Error reshape_start(void *in, void *out, Reshape reshape_type, dtfft_request_t *request) const noexcept

Starts an asynchronous reshape operation from bricks to pencils and vice versa.

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • reshape_type[in] Type of reshape to perform

  • request[out] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

dtfft_request_t reshape_start(void *in, void *out, Reshape reshape_type, void *aux = nullptr) const

Starts an asynchronous reshape operation from bricks to pencils and vice versa.

Attention

in and out must not be the same pointer

Parameters:
  • in[inout] Input pointer

  • out[out] Output pointer

  • reshape_type[in] Type of reshape to perform

  • aux[inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_reshape() elements.

Returns:

Handle to manage the asynchronous operation

Error reshape_end(dtfft_request_t request) const noexcept

Ends an asynchronous reshape operation.

Parameters:

request[inout] Handle to manage the asynchronous operation

Returns:

Error::SUCCESS on success or error code on failure.

Error get_alloc_size(size_t *alloc_size) const noexcept

Wrapper around Plan::get_local_sizes to obtain alloc_size only.

Parameters:

alloc_size[out] Minimum number of elements to be allocated for in and out buffers required by Plan::execute, Plan::transpose, and Plan::reshape. Size of each element in bytes can be obtained by calling Plan::get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_alloc_size() const

Wrapper around Plan::get_local_sizes to obtain alloc_size only.

Throws:

Exception – if underlying call fails

Returns:

Minimum number of elements to be allocated for in and out buffers required by Plan::execute, Plan::transpose, and Plan::reshape.

Error get_aux_size(std::size_t *aux_size) const noexcept

Get auxiliary buffer size required to execute the plan.

Parameters:

aux_size[out] Number of elements required for auxiliary buffer. Size of each element in bytes can be obtained by calling Plan::get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_aux_size() const

Get auxiliary buffer size required to execute the plan.

Throws:

Exception – if underlying call fails

Returns:

Number of elements required for auxiliary buffer.

Error get_aux_bytes(std::size_t *aux_bytes) const noexcept

Get auxiliary buffer size in bytes required to execute the plan.

Parameters:

aux_bytes[out] Number of bytes required for auxiliary buffer.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_aux_bytes() const

Get auxiliary buffer size in bytes required to execute the plan.

Throws:

Exception – if underlying call fails

Returns:

Number of bytes required for auxiliary buffer.
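
Most getters on this page come in two flavours: a noexcept overload that reports failure through the returned Error, and an overload that returns the value and throws. The following is a minimal sketch of that pattern, assuming hypothetical stand-ins DemoError and DemoPlan (not dtFFT types) and an arbitrary size of 4096 bytes; it does not claim to mirror how dtFFT implements the overloads internally.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins, NOT dtFFT types.
enum class DemoError { SUCCESS, PLAN_NOT_CREATED };

class DemoPlan {
public:
    explicit DemoPlan(bool created) : created_(created) {}

    // Error-code flavour: never throws, reports through the return value.
    DemoError get_aux_bytes(std::size_t *aux_bytes) const noexcept {
        if (!created_) return DemoError::PLAN_NOT_CREATED;
        *aux_bytes = 4096;  // arbitrary value for illustration
        return DemoError::SUCCESS;
    }

    // Throwing flavour: forwards to the error-code overload.
    std::size_t get_aux_bytes() const {
        std::size_t aux_bytes = 0;
        if (get_aux_bytes(&aux_bytes) != DemoError::SUCCESS) {
            throw std::runtime_error("get_aux_bytes failed");
        }
        return aux_bytes;
    }

private:
    bool created_;
};

// Returns true when the throwing overload reports a failed call.
bool throws_when_not_created() {
    try {
        DemoPlan(false).get_aux_bytes();
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

The error-code flavour suits code that must not throw; the throwing flavour keeps call sites short when exceptions are acceptable.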

Error get_aux_size_reshape(std::size_t *aux_size) const noexcept

Get number of elements required by Plan::reshape.

Parameters:

aux_size[out] Number of elements required for auxiliary buffer during reshape operation. Size of each element in bytes can be obtained by calling Plan::get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_aux_size_reshape() const

Get number of elements required by Plan::reshape.

Throws:

Exception – if underlying call fails

Returns:

Number of elements required for auxiliary buffer during reshape operation.

Error get_aux_bytes_reshape(std::size_t *aux_bytes) const noexcept

Get number of bytes required by Plan::reshape.

Parameters:

aux_bytes[out] Number of bytes required for auxiliary buffer during reshape operation.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_aux_bytes_reshape() const

Get number of bytes required by Plan::reshape.

Throws:

Exception – if underlying call fails

Returns:

Number of bytes required for auxiliary buffer during reshape operation.

Error get_aux_size_transpose(std::size_t *aux_size) const noexcept

Get number of elements required by Plan::transpose.

Parameters:

aux_size[out] Number of elements required for auxiliary buffer during transpose operations.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_aux_size_transpose() const

Get number of elements required by Plan::transpose.

Throws:

Exception – if underlying call fails

Returns:

Number of elements required for auxiliary buffer during transpose operations.

Error get_aux_bytes_transpose(std::size_t *aux_bytes) const noexcept

Get number of bytes required by Plan::transpose.

Parameters:

aux_bytes[out] Number of bytes required for auxiliary buffer during transpose operations.

Returns:

Error::SUCCESS on success or error code on failure.

std::size_t get_aux_bytes_transpose() const

Get number of bytes required by Plan::transpose.

Throws:

Exception – if underlying call fails

Returns:

Number of bytes required for auxiliary buffer during transpose operations.
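
Execution, transpose and reshape each report their own auxiliary size. The following is a minimal sketch, assuming (this is not stated on this page) that a single auxiliary buffer may be reused for all three operations as long as they never run at the same time; in that case the largest of the three byte counts satisfies every "at least" requirement. The function name shared_aux_bytes is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: size of one auxiliary buffer that satisfies all
// three minimum sizes. The arguments would come from get_aux_bytes(),
// get_aux_bytes_transpose() and get_aux_bytes_reshape().
std::size_t shared_aux_bytes(std::size_t execute_bytes,
                             std::size_t transpose_bytes,
                             std::size_t reshape_bytes) {
    return std::max({execute_bytes, transpose_bytes, reshape_bytes});
}
```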

Error get_local_sizes(std::vector<int32_t> &in_starts, std::vector<int32_t> &in_counts, std::vector<int32_t> &out_starts, std::vector<int32_t> &out_counts, std::size_t *alloc_size) const noexcept

Get grid decomposition information.

Results may differ on different MPI processes

Note

Before calling this function, user must ensure that in_starts, in_counts, out_starts and out_counts vectors are large enough to hold the data.

Parameters:
  • in_starts[out] Starts of local portion of data in ‘real’ space in reversed order

  • in_counts[out] Sizes of local portion of data in ‘real’ space in reversed order

  • out_starts[out] Starts of local portion of data in ‘fourier’ space in reversed order

  • out_counts[out] Sizes of local portion of data in ‘fourier’ space in reversed order

  • alloc_size[out] Minimum number of elements to be allocated for in, out buffers. Size of each element in bytes can be obtained by calling Plan::get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.
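
The note above requires the four vectors to be sized before the call. The following is a minimal sketch, assuming a hypothetical bundle named LocalSizes and a hypothetical helper local_element_count (neither is part of dtFFT). It pre-sizes the vectors to the number of dimensions and computes how many elements a set of counts describes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bundle of the four output vectors, pre-sized to ndims so
// they are large enough to receive the decomposition.
struct LocalSizes {
    std::vector<std::int32_t> in_starts, in_counts, out_starts, out_counts;

    explicit LocalSizes(std::size_t ndims)
        : in_starts(ndims), in_counts(ndims),
          out_starts(ndims), out_counts(ndims) {}
};

// Number of elements described by a counts vector: the product of the
// extents, or 0 for an empty vector or a non-positive extent.
std::size_t local_element_count(const std::vector<std::int32_t> &counts) {
    if (counts.empty()) return 0;
    std::size_t total = 1;
    for (std::int32_t extent : counts) {
        if (extent <= 0) return 0;
        total *= static_cast<std::size_t>(extent);
    }
    return total;
}
```

The product of the counts is not a substitute for alloc_size, which is documented as the minimum allocation; it is only useful for loops over the local portion of the data.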

Error get_local_sizes(int32_t *in_starts = nullptr, int32_t *in_counts = nullptr, int32_t *out_starts = nullptr, int32_t *out_counts = nullptr, size_t *alloc_size = nullptr) const noexcept

Get grid decomposition information.

Results may differ on different MPI processes

Parameters:
  • in_starts[out] Starts of local portion of data in ‘real’ space in reversed order

  • in_counts[out] Sizes of local portion of data in ‘real’ space in reversed order

  • out_starts[out] Starts of local portion of data in ‘fourier’ space in reversed order

  • out_counts[out] Sizes of local portion of data in ‘fourier’ space in reversed order

  • alloc_size[out] Minimum number of elements to be allocated for in, out buffers. Size of each element in bytes can be obtained by calling Plan::get_element_size.

Returns:

Error::SUCCESS on success or error code on failure.

Error get_element_size(size_t *element_size) const noexcept

Obtains the number of bytes required to store a single element of this plan.

Parameters:

element_size[out] Size of element in bytes

Returns:

Error::SUCCESS on success or error code on failure.

size_t get_element_size() const

Obtains the number of bytes required to store a single element of this plan.

Throws:

Exception – if underlying call fails

Returns:

Size of element in bytes

Error get_alloc_bytes(size_t *alloc_bytes) const noexcept

Returns minimum number of bytes required for in and out buffers.

This function is a combination of two calls: Plan::get_alloc_size and Plan::get_element_size. Returns minimum number of bytes to be allocated for in and out buffers required by Plan::execute. Minimum number of aux bytes required by Plan::execute can be obtained by calling Plan::get_aux_bytes.

Parameters:

alloc_bytes[out] Number of bytes required

Returns:

Error::SUCCESS on success or error code on failure.

size_t get_alloc_bytes() const

Returns minimum number of bytes required for in and out buffers.

This function is a combination of two calls: Plan::get_alloc_size and Plan::get_element_size. Returns minimum number of bytes to be allocated for in and out buffers required by Plan::execute. Minimum number of aux bytes required by Plan::execute can be obtained by calling Plan::get_aux_bytes.

Throws:

Exception – if underlying call fails

Returns:

Number of bytes of each buffer required to execute plan
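
The description above says the byte count combines Plan::get_alloc_size and Plan::get_element_size. The following is a minimal sketch, assuming the combination is a plain product of the two values (an assumption, not confirmed here); the helper name bytes_from_elements is hypothetical and adds an overflow check.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: converts an element count into a byte count.
// Assumes bytes = elements * element size.
std::size_t bytes_from_elements(std::size_t alloc_size,
                                std::size_t element_size) {
    const std::size_t max = std::numeric_limits<std::size_t>::max();
    if (element_size != 0 && alloc_size > max / element_size) {
        throw std::overflow_error("buffer size overflows std::size_t");
    }
    return alloc_size * element_size;
}
```

For example, 1024 elements of 8 bytes each (such as double, or a single-precision complex value) need 8192 bytes.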

Error get_executor(Executor *executor) const noexcept

Returns executor used by this plan.

Parameters:

executor[out] Executor used by this plan.

Returns:

Error::SUCCESS on success or error code on failure.

Executor get_executor() const

Returns executor used by this plan.

Throws:

Exception – if underlying call fails

Returns:

Executor used by this plan.

Error get_precision(Precision *precision) const noexcept

Returns precision of the plan.

Parameters:

precision[out] Precision of the plan.

Returns:

Error::SUCCESS on success or error code on failure.

Precision get_precision() const

Returns precision of the plan.

Throws:

Exception – if underlying call fails

Returns:

Precision of the plan.

Error get_dims(int8_t *ndims, const int32_t *dims[]) const noexcept

Returns global dimensions of the plan.

Note

Do not free the array; it is freed when the Plan is destroyed.

Parameters:
  • ndims[out] Number of dimensions in the plan. User can pass nullptr if this value is not needed.

  • dims[out] Array of dimensions in natural Fortran order. User can pass nullptr if this value is not needed.

Returns:

Error::SUCCESS on success or error code on failure.

std::vector<int32_t> get_dims() const

Returns global dimensions of the plan.

Throws:

Exception – if underlying call fails

Returns:

Vector of dimensions in natural Fortran order. Size of vector is equal to number of dimensions in the plan.

Error get_grid_dims(int8_t *ndims, const int32_t *grid_dims[]) const noexcept

Returns grid decomposition dimensions of the plan.

Note

Do not free the grid_dims array; it is freed when the Plan is destroyed.

Parameters:
  • ndims[out] Number of dimensions in plan. User can pass nullptr if this value is not needed.

  • grid_dims[out] Pointer of size ndims containing grid decomposition dimensions in reverse order: grid_dims[0] is the fastest varying and is always equal to 1. User can pass nullptr if this value is not needed.

Returns:

Error::SUCCESS on success or error code on failure.

std::vector<int32_t> get_grid_dims() const

Returns grid decomposition dimensions of the plan.

Throws:

Exception – if underlying call fails

Returns:

Vector of grid decomposition dimensions in natural Fortran order. Size of vector is equal to number of dimensions in the plan. First value is always equal to 1.

Error mem_alloc(size_t alloc_bytes, void **ptr) const noexcept

Allocates memory specific for this plan.

Parameters:
  • alloc_bytes[in] Number of bytes to allocate

  • ptr[out] Allocated pointer

Returns:

Error::SUCCESS on success or error code on failure.

void *mem_alloc(size_t alloc_bytes) const

Allocates memory specific for this plan.

Parameters:

alloc_bytes – Number of bytes to allocate

Throws:

Exception – if underlying call fails

Returns:

Pointer to allocated memory

template<typename T>
inline T *mem_alloc(const size_t alloc_size) const

Allocates memory for an array of elements of type T.

Template Parameters:

T – Type of elements

Parameters:

alloc_size[in] Number of elements to allocate

Throws:

Exception – if underlying call fails

Returns:

Pointer to allocated memory

Error mem_free(void *ptr) const noexcept

Frees memory specific for this plan.

Parameters:

ptr[inout] Allocated pointer

Returns:

Error::SUCCESS on success or error code on failure.
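
Memory obtained from mem_alloc has to be returned through mem_free of the same plan. The following is a minimal sketch of wrapping that pair in std::unique_ptr with a custom deleter. StubAllocator is a hypothetical stand-in (not a dtFFT class) backed by std::malloc so that the sketch runs without the library; PlanDeleter, PlanBuffer and make_plan_buffer are likewise hypothetical names. It assumes the buffer is released before the plan is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for a plan (NOT dtFFT): counts live allocations
// so the sketch can be checked without MPI or the real library.
struct StubAllocator {
    mutable int live = 0;
    void *mem_alloc(std::size_t bytes) const { ++live; return std::malloc(bytes); }
    void mem_free(void *ptr) const { --live; std::free(ptr); }
};

// Deleter that returns the buffer to the plan that allocated it.
template <typename PlanT>
struct PlanDeleter {
    const PlanT *plan;
    void operator()(void *ptr) const {
        if (ptr != nullptr) plan->mem_free(ptr);
    }
};

template <typename PlanT>
using PlanBuffer = std::unique_ptr<void, PlanDeleter<PlanT>>;

// Allocates `bytes` through the plan and ties the release to scope exit.
template <typename PlanT>
PlanBuffer<PlanT> make_plan_buffer(const PlanT &plan, std::size_t bytes) {
    return PlanBuffer<PlanT>(plan.mem_alloc(bytes), PlanDeleter<PlanT>{&plan});
}

// Returns the number of live allocations once the buffer left scope.
int live_allocations_after_scope() {
    StubAllocator plan;
    {
        PlanBuffer<StubAllocator> buffer = make_plan_buffer(plan, 256);
        if (plan.live != 1) return -1;
    }
    return plan.live;
}
```

Because the helpers are templates over the plan type, the same deleter would compile against any class that offers mem_alloc(size_t) and mem_free(void *) with compatible signatures.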

Error destroy() noexcept

Plan Destructor.

To fully clean all internal memory, this should be called before MPI_Finalize

Returns:

Error::SUCCESS on success or error code on failure.

Error get_backend(Backend &backend) const noexcept

Returns the backend selected during autotuning if effort is Effort::PATIENT.

If the effort passed to any create function is Effort::ESTIMATE or Effort::MEASURE, returns the value set by Config::set_backend followed by set_config(), or the default value, which is Backend::NCCL.

Returns:

Error::SUCCESS on success or error code on failure.

Backend get_backend() const

Returns the backend selected during autotuning if effort is Effort::PATIENT.

If the effort passed to any create function is Effort::ESTIMATE or Effort::MEASURE, returns the value set by Config::set_backend followed by set_config(), or the default value, which is Backend::NCCL.

Throws:

Exception – if underlying call fails

Returns:

Backend used by this plan.

Error get_reshape_backend(Backend &backend) const noexcept

Returns backend used for reshape operations.

Parameters:

backend[out] Backend used for reshape operations

Returns:

Error::SUCCESS on success or error code on failure.

Backend get_reshape_backend() const

Returns backend used for reshape operations.

Throws:

Exception – if underlying call fails

Returns:

Backend used for reshape operations

Error get_stream(dtfft_stream_t *stream) const noexcept

Returns stream associated with current Plan.

This is either the stream passed by Config::set_stream followed by set_config() or a stream created internally. Returns a null pointer if the plan’s platform is Platform::HOST.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

Parameters:

stream[out] CUDA stream associated with plan

Returns:

Error::SUCCESS on success or error code on failure.

dtfft_stream_t get_stream() const

Returns stream associated with current Plan.

This is either the stream passed by Config::set_stream followed by set_config() or a stream created internally. Returns a null pointer if the plan’s platform is Platform::HOST.

Note

This method is only present in the API when dtFFT was compiled with CUDA Support.

Throws:

Exception – if underlying call fails

Returns:

dtFFT stream associated with plan

Error get_platform(Platform &platform) const noexcept

Returns plan execution platform.

Returns:

Error::SUCCESS on success or error code on failure.

Platform get_platform() const

Returns plan execution platform.

Throws:

Exception – if underlying call fails

Returns:

Platform::HOST if plan is executed on host, Platform::CUDA if plan is executed on CUDA device.

inline dtfft_plan_t c_struct() const

Returns:

Underlying C structure

inline virtual ~Plan() noexcept = 0

Plan Destructor.

To fully clean all internal memory, this should be called before MPI_Finalize

class PlanC2C : public dtfft::Plan

Complex-to-Complex Plan.

Public Functions

explicit PlanC2C(const std::vector<int32_t> &dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Complex-to-Complex Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

Throws:

Exception – if an error occurs during plan creation
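
The constructors take dims in reversed order, whereas get_dims returns them in natural Fortran order. The following is a minimal sketch of converting between the two, assuming that "reversed" means the slowest-varying extent comes first, so a grid with extents nx, ny, nz (x fastest) is passed as {nz, ny, nx}. The helper name to_reversed_order is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: flips a dimension vector between natural Fortran
// order {nx, ny, nz} and reversed order {nz, ny, nx}. Applying it twice
// returns the original vector.
std::vector<std::int32_t> to_reversed_order(std::vector<std::int32_t> dims) {
    std::reverse(dims.begin(), dims.end());
    return dims;
}
```

The same helper covers both directions, for example when dimensions read back from one plan are reused to construct another.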

explicit PlanC2C(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)

Complex-to-Complex Transpose-only Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

Throws:

Exception – if an error occurs during plan creation

explicit PlanC2C(int8_t ndims, const int32_t *dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Complex-to-Complex Generic Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Buffer of size ndims with global dimensions in reversed order.

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

Throws:

Exception – if an error occurs during plan creation

explicit PlanC2C(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)

Complex-to-Complex Plan constructor using pencil decomposition information.

Parameters:
  • pencil[in] Initialized Pencil object.

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

Throws:

Exception – if an error occurs during plan creation

explicit PlanC2C(const Pencil &pencil, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Complex-to-Complex Plan constructor using pencil decomposition information.

Parameters:
  • pencil[in] Initialized Pencil object.

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

Throws:

Exception – if an error occurs during plan creation

class PlanR2C : public dtfft::Plan

Real-to-Complex Plan.

Public Functions

explicit PlanR2C(const std::vector<int32_t> &dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Complex Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2C(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)

Real-to-Complex Transpose-only Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2C(int8_t ndims, const int32_t *dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Complex Generic Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Buffer of size ndims with global dimensions in reversed order.

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2C(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)

Real-to-Complex Plan constructor using pencil decomposition information.

Note

Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.

Parameters:
  • pencil[in] Initialized Pencil object.

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2C(const Pencil &pencil, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Complex Plan constructor using pencil decomposition information.

Note

Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.

Parameters:
  • pencil[in] Initialized Pencil object.

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor

Throws:

Exception – if an error occurs during plan creation

class PlanR2R : public dtfft::Plan

Real-to-Real Plan.

Public Functions

explicit PlanR2R(const std::vector<int32_t> &dims, const std::vector<R2RKind> &kinds = std::vector<R2RKind>(), MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • kinds[in] Real FFT kinds in reversed order. Can be empty vector if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor.

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2R(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)

Real-to-Real Transpose-only Plan constructor.

Parameters:
  • dims[in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2R(int8_t ndims, const int32_t *dims, const R2RKind *kinds = nullptr, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Generic Plan constructor.

Parameters:
  • ndims[in] Number of dimensions: 2 or 3

  • dims[in] Buffer of size ndims with global dimensions in reversed order.

  • kinds[in] Buffer of size ndims with Real FFT kinds in reversed order. Can be nullptr if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor.

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2R(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)

Real-to-Real Transpose-only Plan constructor.

Parameters:
  • pencil[in] Initialized Pencil object.

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2R(const Pencil &pencil, const std::vector<R2RKind> &kinds, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Plan constructor.

Parameters:
  • pencil[in] Initialized Pencil object.

  • kinds[in] Real FFT kinds in reversed order. Can be empty vector if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor.

Throws:

Exception – if an error occurs during plan creation

explicit PlanR2R(const Pencil &pencil, const R2RKind *kinds = nullptr, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)

Real-to-Real Generic Plan constructor.

Parameters:
  • pencil[in] Initialized Pencil object.

  • kinds[in] Buffer of size ndims with Real FFT kinds in reversed order. Can be nullptr if executor == Executor::NONE

  • comm[in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator

  • precision[in] Precision of transform.

  • effort[in] Effort level for the plan creation

  • executor[in] Type of external FFT executor.

Throws:

Exception – if an error occurs during plan creation