C++ API Reference
This page describes all classes, enumerators and functions available in the dtFFT C++ API.
To use them, #include <dtfft.hpp>. The entire API is contained within the dtfft namespace.
Predefined Macros
-
DTFFT_CXX_CALL(call)
Safe call macro.
Should be used to check error codes returned by dtFFT. Throws an exception with a message explaining the error if one occurs.
Example
DTFFT_CXX_CALL( plan.execute(a, b, dtfft::Execute::FORWARD) )
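To make the behaviour concrete, here is a minimal sketch of how a checking macro of this kind can work. It is not the actual definition of DTFFT_CXX_CALL: `sketch::Error`, `sketch::describe`, `sketch::Exception`, `always_ok`, `always_fails` and `SKETCH_CXX_CALL` are hypothetical stand-ins so that the snippet compiles without dtFFT.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical stand-in for dtfft::Error, reduced to two values.
enum class Error { SUCCESS, INVALID_USAGE };

// Hypothetical stand-in for dtfft::get_error_string.
inline std::string describe(Error code) {
    return code == Error::SUCCESS ? "Success" : "Invalid API usage";
}

// Hypothetical stand-in for dtfft::Exception: keeps code, message, file and line.
class Exception : public std::runtime_error {
public:
    Exception(Error code, const std::string &msg, const char *file, int line)
        : std::runtime_error(msg), code_(code), file_(file), line_(line) {}
    Error code() const noexcept { return code_; }
    const std::string &get_file() const noexcept { return file_; }
    int get_line() const noexcept { return line_; }
private:
    Error code_;
    std::string file_;
    int line_;
};

// Functions standing in for library calls that return an error code.
inline Error always_ok() noexcept { return Error::SUCCESS; }
inline Error always_fails() noexcept { return Error::INVALID_USAGE; }

} // namespace sketch

// Evaluate `call` exactly once; throw if the returned code is not SUCCESS.
#define SKETCH_CXX_CALL(call)                                       \
    do {                                                            \
        const sketch::Error ec_ = (call);                           \
        if (ec_ != sketch::Error::SUCCESS) {                        \
            throw sketch::Exception(ec_, sketch::describe(ec_),     \
                                    __FILE__, __LINE__);            \
        }                                                           \
    } while (false)

// Usage: the first call passes silently, the second one throws.
inline void demo() {
    SKETCH_CXX_CALL(sketch::always_ok());
    bool thrown = false;
    try {
        SKETCH_CXX_CALL(sketch::always_fails());
    } catch (const sketch::Exception &e) {
        thrown = (e.code() == sketch::Error::INVALID_USAGE);
    }
    assert(thrown);
}
```

Wrapping the body in `do { ... } while (false)` lets the macro be used like a single statement, and capturing `__FILE__` and `__LINE__` at the call site is what allows the exception to report where the failing call was made.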
Enumerators
-
enum class dtfft::Error
This enum lists the different error codes that dtFFT can return.
Values:
-
enumerator SUCCESS¶
Successful execution.
-
enumerator MPI_FINALIZED¶
MPI_Init has not been called or MPI_Finalize has already been called.
-
enumerator INVALID_TRANSPOSE_TYPE¶
Invalid transpose_type provided.
-
enumerator INVALID_N_DIMENSIONS¶
Invalid number of dimensions provided.
Valid options are 2 and 3.
-
enumerator INVALID_DIMENSION_SIZE¶
One or more provided dimension sizes <= 0.
-
enumerator INVALID_COMM_TYPE¶
Invalid communicator type provided.
-
enumerator INVALID_PRECISION¶
Invalid precision parameter provided.
-
enumerator INVALID_EFFORT
Invalid effort parameter provided.
-
enumerator INVALID_EXECUTOR
Invalid executor parameter provided.
-
enumerator INVALID_COMM_DIMS
Number of dimensions in the provided Cartesian communicator > number of dimensions passed to the create subroutine.
-
enumerator INVALID_COMM_FAST_DIM¶
Passed Cartesian communicator with number of processes in 1st (fastest varying) dimension > 1.
-
enumerator MISSING_R2R_KINDS¶
For R2R plan, kinds parameter must be passed if executor != Executor::NONE.
-
enumerator INVALID_R2R_KINDS
Invalid values detected in kinds parameter.
-
enumerator R2C_TRANSPOSE_PLAN¶
Transpose plan is not supported in R2C, use R2R or C2C plan instead.
-
enumerator INPLACE_TRANSPOSE¶
Inplace transpose is not supported.
-
enumerator INVALID_AUX¶
Invalid aux buffer provided.
-
enumerator INVALID_LAYOUT
Invalid layout passed to Plan::get_pencil.
-
enumerator INVALID_USAGE¶
Invalid API Usage.
-
enumerator PLAN_IS_CREATED¶
Trying to create already created plan.
-
enumerator R2R_FFT_NOT_SUPPORTED¶
Selected executor does not support R2R FFTs.
-
enumerator ALLOC_FAILED¶
Internal call of Plan.mem_alloc failed.
-
enumerator FREE_FAILED¶
Internal call of Plan.mem_free failed.
-
enumerator INVALID_ALLOC_BYTES¶
Invalid alloc_bytes provided.
-
enumerator DLOPEN_FAILED¶
Failed to dynamically load library.
-
enumerator DLSYM_FAILED¶
Failed to dynamically load symbol.
-
enumerator PENCIL_ARRAYS_SIZE_MISMATCH¶
Deprecated/unused: R2C transpose call restriction (kept for backward compatibility of error code numbering)
Sizes of starts and counts arrays passed to Pencil constructor do not match.
-
enumerator PENCIL_ARRAYS_INVALID_SIZES¶
Sizes of starts and counts < 2 or > 3 provided to Pencil constructor.
-
enumerator PENCIL_SHAPE_MISMATCH¶
Processes have same lower bounds (starts) but different sizes in some dimensions.
-
enumerator PENCIL_OVERLAP¶
Pencil overlap detected, i.e. two processes share the same part of global space.
-
enumerator PENCIL_NOT_CONTINUOUS¶
Local pencils do not cover the global space without gaps.
-
enumerator PENCIL_NOT_INITIALIZED¶
Pencil is not initialized, i.e. constructor subroutine was not called.
-
enumerator INVALID_MEASURE_WARMUP_ITERS¶
Invalid n_measure_warmup_iters provided.
-
enumerator INVALID_MEASURE_ITERS
Invalid n_measure_iters provided.
-
enumerator INVALID_REQUEST
Invalid dtfft_request_t provided.
-
enumerator TRANSPOSE_ACTIVE¶
Attempting to execute already active transposition.
-
enumerator TRANSPOSE_NOT_ACTIVE¶
Attempting to finalize non-active transposition.
-
enumerator INVALID_RESHAPE_TYPE¶
Invalid reshape_type provided.
-
enumerator RESHAPE_ACTIVE¶
Attempting to execute already active reshape.
-
enumerator RESHAPE_NOT_ACTIVE¶
Attempting to finalize non-active reshape.
-
enumerator INPLACE_RESHAPE¶
Inplace reshape is not supported.
-
enumerator INVALID_EXECUTE_TYPE¶
R2C reshape was called.
Invalid execute_type provided.
-
enumerator RESHAPE_NOT_SUPPORTED¶
Reshape is not supported for this plan.
-
enumerator INVALID_CART_COMM¶
Invalid cartesian communicator provided.
-
enumerator INVALID_TRANSPOSE_MODE¶
Invalid transpose mode provided.
-
enumerator GPU_INVALID_STREAM¶
Invalid stream provided.
-
enumerator INVALID_BACKEND¶
Invalid backend provided.
-
enumerator GPU_NOT_SET¶
Multiple MPI Processes located on same host share same GPU which is not supported.
-
enumerator VKFFT_R2R_2D_PLAN¶
When using R2R FFT and executor type is vkFFT and plan uses Z-slab optimization, it is required that types of R2R transform are same in X and Y directions.
-
enumerator BACKENDS_DISABLED¶
Passed effort == Effort::PATIENT but all GPU backends have been disabled by Config.
-
enumerator NOT_DEVICE_PTR
One of the pointers passed to Plan::execute or Plan::transpose cannot be accessed from device.
-
enumerator NOT_NVSHMEM_PTR
One of the pointers passed to Plan::execute or Plan::transpose is not an NVSHMEM pointer.
-
enumerator INVALID_PLATFORM¶
Invalid platform provided.
-
enumerator INVALID_PLATFORM_EXECUTOR¶
Invalid executor provided for selected platform.
-
enumerator INVALID_PLATFORM_BACKEND¶
Invalid backend provided for selected platform.
-
enumerator COMPRESSION_CUDA_NOT_SUPPORTED¶
CUDA support is not available for compression.
-
enumerator COMPRESSION_INVALID_RATE¶
Invalid compression rate.
-
enumerator COMPRESSION_INVALID_PRECISION¶
Invalid compression precision.
-
enumerator COMPRESSION_INVALID_TOLERANCE¶
Invalid compression tolerance.
-
enumerator COMPRESSION_INVALID_MODE¶
Invalid compression mode.
-
enumerator COMPRESSION_INVALID_LIBRARY¶
Invalid compression library.
-
enumerator COMPRESSION_NOT_USED¶
Compressed backends are not used for this plan.
-
enum class dtfft::Execute¶
This enum lists valid execute_type parameters that can be passed to Plan::execute.
Values:
-
enumerator FORWARD¶
Perform XYZ –> YZX –> ZXY plan execution (Forward)
-
enumerator BACKWARD¶
Perform ZXY –> YZX –> XYZ plan execution (Backward)
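The two descriptions above use the same chain of layouts, traversed in opposite directions. As a rough illustration, each forward arrow can be read as a cyclic left rotation of the axis labels. The helper names `layout_after` and `shape_after` are hypothetical, and reading the labels as an axis order is an assumption of this sketch rather than something the reference states.

```cpp
#include <array>
#include <cassert>
#include <string>

// Axis labels reached after `steps` forward arrows in XYZ -> YZX -> ZXY.
// Each arrow is a cyclic left rotation of the labels.
inline std::string layout_after(int steps) {
    assert(steps >= 0 && steps <= 2);
    std::string axes = "XYZ";
    for (int i = 0; i < steps; ++i) {
        axes = axes.substr(1) + axes[0];
    }
    return axes;
}

// The same rotation applied to a shape given in XYZ order.
inline std::array<int, 3> shape_after(std::array<int, 3> dims, int steps) {
    assert(steps >= 0 && steps <= 2);
    for (int i = 0; i < steps; ++i) {
        dims = {dims[1], dims[2], dims[0]};
    }
    return dims;
}
```

Under this reading, Execute::BACKWARD walks the same three layouts from the last one back to the first.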
-
enum class dtfft::Transpose¶
This enum lists valid transpose_type parameters that can be passed to Plan::transpose.
Values:
-
enumerator X_TO_Y¶
Transpose from Fortran X aligned to Fortran Y aligned.
-
enumerator Y_TO_X¶
Transpose from Fortran Y aligned to Fortran X aligned.
-
enumerator Y_TO_Z¶
Transpose from Fortran Y aligned to Fortran Z aligned.
-
enumerator Z_TO_Y¶
Transpose from Fortran Z aligned to Fortran Y aligned.
-
enumerator X_TO_Z¶
Transpose from Fortran X aligned to Fortran Z aligned.
Note
This value is valid only for 3D plans, and Plan::get_z_slab_enabled() must return true.
-
enumerator Z_TO_X
Transpose from Fortran Z aligned to Fortran X aligned.
Note
This value is valid only for 3D plans, and Plan::get_z_slab_enabled() must return true.
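The restriction stated in the notes on X_TO_Z and Z_TO_X can be written as a small predicate. This is a sketch with a hypothetical stand-in enum and a hypothetical helper name. It encodes only what the notes say and makes no claim about other checks the library may perform.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in mirroring the enumerators of dtfft::Transpose.
enum class Transpose { X_TO_Y, Y_TO_X, Y_TO_Z, Z_TO_Y, X_TO_Z, Z_TO_X };

// Encodes only the restriction from the notes: X_TO_Z and Z_TO_X need
// a 3D plan that uses the Z-slab optimization.
inline bool satisfies_z_slab_note(Transpose t, int n_dims, bool z_slab_enabled) {
    assert(n_dims == 2 || n_dims == 3);
    if (t == Transpose::X_TO_Z || t == Transpose::Z_TO_X) {
        return n_dims == 3 && z_slab_enabled;
    }
    return true; // the notes place no restriction on the other values
}

} // namespace sketch
```

With the real library, the `z_slab_enabled` argument would come from Plan::get_z_slab_enabled().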
-
enum class dtfft::Precision¶
This enum lists valid precision parameters that can be passed to Plan constructors.
Values:
-
enumerator SINGLE¶
Use Single precision.
-
enumerator DOUBLE¶
Use Double precision.
-
enum class dtfft::Effort¶
This enum lists valid effort parameters that can be passed to Plan constructors.
Values:
-
enumerator ESTIMATE¶
Create plan as fast as possible.
-
enumerator MEASURE¶
Attempts to find the best MPI grid decomposition.
Passing this flag together with an MPI communicator with Cartesian topology to any Plan constructor is the same as Effort::ESTIMATE.
-
enumerator PATIENT
Same as Effort::MEASURE, plus autotuning will try to find the best backend.
-
enumerator EXHAUSTIVE
Same as Effort::PATIENT, plus autotunes all possible kernels and reshape backends to find the best configuration.
-
enum class dtfft::Executor¶
This enum lists available FFT executors.
Values:
-
enumerator NONE¶
Do not create any FFT plans.
Creates transpose only plan.
-
enumerator FFTW3¶
FFTW3 Executor (Host only)
-
enumerator MKL¶
MKL DFTI Executor (Host only)
-
enumerator CUFFT¶
CUFFT Executor (GPU Only)
-
enumerator VKFFT¶
VkFFT Executor (GPU Only)
-
enum class dtfft::R2RKind¶
Real-to-Real FFT kinds available in dtFFT.
Values:
-
enumerator DCT_1¶
DCT-I (Logical N=2*(n-1), inverse is R2RKind::DCT_1)
-
enumerator DCT_2¶
DCT-II (Logical N=2*n, inverse is R2RKind::DCT_3)
-
enumerator DCT_3¶
DCT-III (Logical N=2*n, inverse is R2RKind::DCT_2)
-
enumerator DCT_4¶
DCT-IV (Logical N=2*n, inverse is R2RKind::DCT_4)
-
enumerator DST_1¶
DST-I (Logical N=2*(n+1), inverse is R2RKind::DST_1)
-
enumerator DST_2¶
DST-II (Logical N=2*n, inverse is R2RKind::DST_3)
-
enumerator DST_3¶
DST-III (Logical N=2*n, inverse is R2RKind::DST_2)
-
enumerator DST_4¶
DST-IV (Logical N=2*n, inverse is R2RKind::DST_4)
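The inverse kinds and logical sizes listed above can be collected into two lookup helpers. The enum below is a hypothetical stand-in for dtfft::R2RKind, and the helper names `inverse_kind` and `logical_size` are illustrative only.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-in mirroring the enumerators of dtfft::R2RKind.
enum class R2RKind { DCT_1, DCT_2, DCT_3, DCT_4, DST_1, DST_2, DST_3, DST_4 };

// Kind that inverts `kind`, as listed in the descriptions above.
inline R2RKind inverse_kind(R2RKind kind) {
    switch (kind) {
        case R2RKind::DCT_2: return R2RKind::DCT_3;
        case R2RKind::DCT_3: return R2RKind::DCT_2;
        case R2RKind::DST_2: return R2RKind::DST_3;
        case R2RKind::DST_3: return R2RKind::DST_2;
        default:             return kind; // DCT_1, DCT_4, DST_1, DST_4 list themselves
    }
}

// Logical size N for a transform of physical length n.
inline long logical_size(R2RKind kind, long n) {
    assert(n > 0);
    switch (kind) {
        case R2RKind::DCT_1: return 2 * (n - 1);
        case R2RKind::DST_1: return 2 * (n + 1);
        default:             return 2 * n;
    }
}

} // namespace sketch
```

For a physical length of n = 5, for example, DCT_1 has logical size 8, DST_1 has 12, and every other kind has 10.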
-
enum class dtfft::Backend¶
Various Backends available in dtFFT.
Values:
-
enumerator MPI_DATATYPE¶
Backend that uses MPI datatypes.
This is the default backend for Host builds.
Not recommended for GPU usage, since it is far slower than the other backends. Not available for autotuning when effort is Effort::PATIENT in GPU builds.
-
enumerator MPI_P2P¶
MPI peer-to-peer algorithm.
-
enumerator MPI_P2P_PIPELINED¶
MPI peer-to-peer algorithm with overlapping data copying and unpacking.
-
enumerator MPI_A2A¶
MPI backend using MPI_Alltoallv.
-
enumerator MPI_RMA¶
MPI backend using one-sided communications.
-
enumerator MPI_RMA_PIPELINED¶
MPI backend using pipelined one-sided communications.
-
enumerator MPI_P2P_SCHEDULED¶
MPI peer-to-peer algorithm with scheduled communication.
-
enumerator MPI_P2P_FUSED¶
MPI peer-to-peer pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.
-
enumerator MPI_RMA_FUSED¶
MPI RMA pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.
-
enumerator MPI_P2P_COMPRESSED¶
Extension of Backend::MPI_P2P_FUSED. Data is compressed before sending and decompressed after receiving.
-
enumerator MPI_RMA_COMPRESSED
Extension of Backend::MPI_RMA_FUSED. Data is compressed before sending and decompressed after receiving.
-
enumerator NCCL¶
NCCL backend.
-
enumerator NCCL_PIPELINED¶
NCCL backend with overlapping data copying and unpacking.
-
enumerator NCCL_COMPRESSED¶
NCCL backend that performs compression before data exchange and decompression after.
-
enumerator CUFFTMP¶
cuFFTMp backend
-
enumerator CUFFTMP_PIPELINED¶
cuFFTMp backend that uses additional buffer to avoid extra copy and gain performance
-
enumerator ADAPTIVE¶
Adaptive backend selection: during plan creation dtFFT benchmarks multiple backends and selects the fastest backend independently for each transpose/reshape operation.
The selection is fixed for the lifetime of the plan.
Note
Can only be used when effort >= Effort::PATIENT.
Note
Currently only available for the HOST execution platform.
-
enumerator NONE¶
Backend is not defined.
This value is used when no backend is selected, for example when executing on a single process.
Note
This value should never be set by user directly. It can only be returned by the library.
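The two notes on the ADAPTIVE backend amount to a simple precondition. The sketch below uses hypothetical stand-in enums. It assumes that the effort levels compare in the order in which they are documented, which is what the ">=" in the note suggests but which this page does not spell out.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical stand-ins; declaration order follows the documentation, so
// later enumerators compare greater than earlier ones.
enum class Effort { ESTIMATE, MEASURE, PATIENT, EXHAUSTIVE };
enum class Platform { HOST, CUDA };

// True when both notes on the ADAPTIVE backend are satisfied.
inline bool adaptive_backend_allowed(Effort effort, Platform platform) {
    return effort >= Effort::PATIENT && platform == Platform::HOST;
}

// Quick self-check of the rule.
inline void self_check() {
    assert(adaptive_backend_allowed(Effort::PATIENT, Platform::HOST));
    assert(!adaptive_backend_allowed(Effort::MEASURE, Platform::HOST));
    assert(!adaptive_backend_allowed(Effort::EXHAUSTIVE, Platform::CUDA));
}

} // namespace sketch
```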
-
enum class dtfft::Platform¶
Enum that specifies the runtime platform, e.g. Host, CUDA, HIP.
Values:
-
enumerator HOST¶
Host.
-
enumerator CUDA¶
CUDA.
-
enum class dtfft::Layout¶
This enum represents different data layouts used in dtFFT and it should be used to retrieve layout information from plans.
Values:
-
enumerator X_BRICKS¶
X-brick layout: data is distributed along all dimensions.
-
enumerator X_PENCILS¶
X-pencil layout: data is distributed along Y and Z dimensions.
-
enumerator X_PENCILS_FOURIER¶
X-pencil layout obtained after executing FFT for R2C plan: data is distributed along Y and Z dimensions.
-
enumerator Y_PENCILS¶
Y-pencil layout: data is distributed along X and Z dimensions.
-
enumerator Z_PENCILS¶
Z-pencil layout: data is distributed along X and Y dimensions.
-
enumerator Z_BRICKS¶
Z-brick layout: data is distributed along all dimensions.
-
enum class dtfft::Reshape¶
This enum lists valid reshape_type parameters that can be passed to Plan::reshape.
Values:
-
enumerator X_BRICKS_TO_PENCILS¶
Reshape from X bricks to X pencils.
-
enumerator X_PENCILS_TO_BRICKS¶
Reshape from X pencils to X bricks.
-
enumerator Z_BRICKS_TO_PENCILS¶
Reshape from Z bricks to Z pencils.
-
enumerator Z_PENCILS_TO_BRICKS¶
Reshape from Z pencils to Z bricks.
-
enumerator Y_BRICKS_TO_PENCILS¶
Reshape from Y-bricks to Y-pencils. This is to be used in 2D plans.
-
enumerator Y_PENCILS_TO_BRICKS
Reshape from Y-pencils to Y-bricks. This is to be used in 2D plans.
-
enum class dtfft::TransposeMode¶
This enum specifies at which stage the local transposition is performed during global exchange.
It affects only Generic backends that perform explicit packing/unpacking.
Values:
-
enumerator PACK¶
Perform transposition during the packing stage (Sender side).
-
enumerator UNPACK¶
Perform transposition during the unpacking stage (Receiver side).
-
enum class dtfft::AccessMode¶
This enum specifies whether to prioritize write-aligned (contiguous in memory) or read-aligned (scattered access) operations during local transposition in Generic backends.
Values:
-
enumerator WRITE¶
Write-aligned access (scattered read, contiguous write).
Usually faster on CPUs.
-
enumerator READ¶
Read-aligned access (contiguous read, scattered write).
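The difference between the two access modes comes down to loop order. The sketch below transposes a small row-major matrix both ways. It illustrates the access patterns only and is not dtFFT's kernel; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Both functions turn a rows x cols row-major matrix into its
// cols x rows row-major transpose.

// Write-aligned: the destination is filled contiguously, the source is
// read with a stride of `cols` elements (scattered read, contiguous write).
inline std::vector<double> transpose_write_aligned(const std::vector<double> &src,
                                                   std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<double> dst(src.size());
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}

// Read-aligned: the source is read contiguously, the destination is
// written with a stride of `rows` elements (contiguous read, scattered write).
inline std::vector<double> transpose_read_aligned(const std::vector<double> &src,
                                                  std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<double> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}
```

Both functions produce the same result. Only the order in which memory is touched differs, which is why the faster choice depends on the hardware.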
-
enum class dtfft::CompressionLib¶
Enum that specifies compression library.
Values:
-
enumerator ZFP¶
ZFP compression library.
-
enum class dtfft::CompressionMode¶
Enum that specifies compression mode.
Values:
-
enumerator LOSSLESS¶
Lossless compression mode.
-
enumerator FIXED_RATE¶
Fixed rate compression mode.
-
enumerator FIXED_PRECISION¶
Fixed precision compression mode.
-
enumerator FIXED_ACCURACY¶
Fixed accuracy compression mode.
Functions
-
std::string dtfft::get_backend_string(Backend backend)¶
Returns string with name of backend provided as argument.
- Parameters:
backend – [in] Backend to represent
- Returns:
String representation of backend.
-
std::string dtfft::get_error_string(Error error_code) noexcept¶
Returns the string description of an error code.
- Parameters:
error_code – [in] Error code to convert to string
- Returns:
String representation of error_code.
-
std::string dtfft::get_precision_string(Precision precision) noexcept¶
Returns the string representation of a Precision value.
- Parameters:
precision – [in] Precision level to convert to string
- Returns:
String representation of Precision.
-
std::string dtfft::get_executor_string(Executor executor) noexcept¶
Returns the string representation of an Executor value.
- Parameters:
executor – [in] Executor type to convert to string
- Returns:
String representation of Executor.
-
Error dtfft::set_config(const Config &config) noexcept¶
Sets configuration values to dtFFT.
Must be called before plan creation to take effect.
- Returns:
Error::SUCCESS if the call was successful, error code otherwise
Structs
-
struct Version¶
dtFFT version information
Public Static Functions
Public Static Attributes
-
struct Pencil¶
Class to handle Pencils.
This is a wrapper around the dtfft_pencil_t C structure.
There are two ways users might find pencils useful inside dtFFT:
To create a Plan using the user's own grid decomposition, pass a Pencil to the Plan constructor.
To obtain a Pencil from a Plan in all possible layouts, in order to run an FFT not available in dtFFT.
Public Functions
-
Pencil()¶
Default constructor, does not actually initialize anything.
-
explicit Pencil(int32_t n_dims, const int32_t *starts, const int32_t *counts)¶
Pencil constructor.
After calling this constructor, this pencil can be used to create Plan
- Parameters:
n_dims – [in] Number of dimensions in pencil, must be 2 or 3
starts – [in] Local starts in natural Fortran order
counts – [in] Local counts in natural Fortran order
-
explicit Pencil(const std::vector<int32_t> &starts, const std::vector<int32_t> &counts)¶
Pencil constructor.
After calling this constructor, this pencil can be used to create Plan
- Parameters:
starts – [in] Local starts in natural Fortran order
counts – [in] Local counts in natural Fortran order
-
uint8_t get_ndims() const¶
- Returns:
Number of dimensions in a pencil
-
uint8_t get_dim() const¶
- Returns:
Aligned dimension ID starting from 1
-
std::vector<int32_t> get_starts() const¶
- Returns:
Local starts in natural Fortran order
-
std::vector<int32_t> get_counts() const¶
- Returns:
Local counts in natural Fortran order
-
size_t get_size() const¶
- Returns:
Total number of elements in a pencil
-
const dtfft_pencil_t &c_struct() const¶
- Returns:
Underlying C structure
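As an illustration of a user-defined decomposition, the sketch below computes the starts and counts arrays for a simple slab split of the last (slowest varying) dimension. The `LocalBox` type and both helpers are hypothetical, and zero-based starts are an assumption of this sketch; the resulting arrays are the kind of input the Pencil constructors above take.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Local extents of one process, in natural Fortran order.
struct LocalBox {
    std::vector<std::int32_t> starts;
    std::vector<std::int32_t> counts;
};

// Split the last dimension of `dims` as evenly as possible between
// `n_procs` processes; all other dimensions stay whole.
inline LocalBox slab_box(const std::vector<std::int32_t> &dims, int rank, int n_procs) {
    assert(dims.size() == 2 || dims.size() == 3);
    assert(n_procs > 0 && rank >= 0 && rank < n_procs);
    const std::int32_t n = dims.back();
    assert(n >= n_procs); // every rank owns at least one plane
    LocalBox box{std::vector<std::int32_t>(dims.size(), 0), dims};
    const std::int32_t base = n / n_procs;
    const std::int32_t rest = n % n_procs;
    // The first `rest` ranks get one extra plane each.
    box.counts.back() = base + (rank < rest ? 1 : 0);
    box.starts.back() = rank * base + (rank < rest ? rank : rest);
    return box;
}

// Number of elements in the box, i.e. the product of its counts.
inline std::size_t box_size(const LocalBox &box) {
    std::size_t size = 1;
    for (std::int32_t c : box.counts) {
        size *= static_cast<std::size_t>(c);
    }
    return size;
}

} // namespace sketch
```

Because consecutive ranks receive adjacent, non-overlapping ranges, a decomposition built this way avoids the conditions described by the PENCIL_OVERLAP and PENCIL_NOT_CONTINUOUS error codes.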
-
struct Config¶
Class to set additional configuration parameters to dtFFT.
Public Functions
-
inline explicit Config()¶
Creates and sets default configuration values.
-
inline Config &set_enable_log(const bool enable_log) noexcept¶
Sets whether dtFFT should print additional information or not.
Default is false.
-
inline Config &set_enable_z_slab(bool enable_z_slab) noexcept
Sets whether dtFFT should use Z-slab optimization or not.
Default is true.
One should consider disabling Z-slab optimization in order to resolve the Error::VKFFT_R2R_2D_PLAN error or when the underlying FFT implementation of the 2D plan is too slow. In all other cases, Z-slab is considered to be always faster.
-
inline Config &set_enable_y_slab(bool enable_y_slab) noexcept
Sets whether dtFFT should use Y-slab optimization or not.
Default is false.
If true, dtFFT will skip the transpose step between Y and Z aligned layouts during a call to Plan::execute(). One should consider disabling Y-slab optimization in order to resolve the Error::VKFFT_R2R_2D_PLAN error or when the underlying FFT implementation of the 2D plan is too slow. In all other cases, Y-slab is considered to be always faster.
-
inline Config &set_measure_warmup_iters(int32_t n_measure_warmup_iters) noexcept¶
Sets number of warmup iterations to underlying C structure.
- Parameters:
n_measure_warmup_iters – [in] Number of warmup iterations to execute during backend and kernel autotuning when effort level is Effort::MEASURE or higher.
-
inline Config &set_measure_iters(int32_t n_measure_iters) noexcept¶
Sets number of actual iterations to underlying C structure.
- Parameters:
n_measure_iters – [in] Number of iterations to execute during backend and kernel autotuning when effort level is Effort::MEASURE or higher.
-
inline Config &set_platform(Platform platform) noexcept¶
Sets platform to execute plan.
Default is Platform::HOST.
This option is only available when dtFFT is built with device support. Even when dtFFT is built with device support, it does not necessarily mean that all plans must be device-related. This enables a single library installation to support both host and CUDA plans.
-
inline Config &set_stream(dtfft_stream_t stream) noexcept¶
Sets the main CUDA stream that will be used in dtFFT.
This parameter is a placeholder for the user to set a custom stream. The stream that is actually used by a dtFFT plan is returned by the Plan::get_stream function. A user who sets a stream is responsible for destroying it.
The stream must not be destroyed before the call to destroy.
Note
This method is only present in the API when dtFFT was compiled with CUDA support.
-
inline Config &set_backend(Backend backend) noexcept¶
Sets the Backend that will be used by dtFFT when effort is Effort::ESTIMATE or Effort::MEASURE.
Default for HOST platform is Backend::MPI_DATATYPE.
Default for CUDA platform is Backend::NCCL if NCCL is enabled, otherwise Backend::MPI_P2P.
-
inline Config &set_reshape_backend(Backend backend) noexcept
Sets the Backend that will be used by dtFFT for data reshaping from bricks to pencils and vice versa when effort is Effort::ESTIMATE or Effort::MEASURE.
Default for HOST platform is Backend::MPI_DATATYPE.
Default for CUDA platform is Backend::NCCL if NCCL is enabled, otherwise Backend::MPI_P2P.
-
inline Config &set_enable_datatype_backend(bool enable_datatype_backend) noexcept¶
Sets whether Backend::MPI_DATATYPE should be considered for autotuning when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is true.
This option only works when platform is Platform::HOST. When platform is Platform::CUDA, Backend::MPI_DATATYPE is always disabled during autotuning.
-
inline Config &set_enable_mpi_backends(bool enable_mpi_backends) noexcept
Sets whether MPI backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is false.
This option applies to all Backend::MPI_* backends, except Backend::MPI_DATATYPE.
The following applies only to CUDA builds. MPI backends are disabled by default during the autotuning process due to an OpenMPI bug: https://github.com/open-mpi/ompi/issues/12849. It was noticed that GPU memory is not freed completely during plan autotuning.
For example: for a 1024x1024x512 C2C, double precision plan on a single GPU using Z-slab optimization, with MPI backends enabled, plan autotuning will leak 8 GB of GPU memory. Without Z-slab optimization, running on 4 GPUs, it will leak 24 GB on each of the GPUs.
One of the workarounds is to disable MPI backends by default, which is done here.
Another is to pass “--mca btl_smcuda_use_cuda_ipc 0” to mpiexec, but it was noticed that disabling CUDA IPC seriously affects the overall performance of MPI algorithms.
-
inline Config &set_enable_pipelined_backends(bool enable_pipelined_backends) noexcept¶
Sets whether pipelined backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is true.
-
inline Config &set_enable_rma_backends(bool enable_rma_backends) noexcept
Sets whether RMA backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is true.
-
inline Config &set_enable_fused_backends(bool enable_fused_backends) noexcept
Sets whether fused backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is true.
-
inline Config &set_enable_nccl_backends(bool enable_nccl_backends) noexcept
Sets whether NCCL backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is true.
Note
This method is only present in the API when dtFFT was compiled with CUDA support.
-
inline Config &set_enable_nvshmem_backends(bool enable_nvshmem_backends) noexcept
Sets whether NVSHMEM backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is true.
Note
This method is only present in the API when dtFFT was compiled with CUDA support.
-
inline Config &set_enable_kernel_autotune(bool enable_kernel_autotune) noexcept¶
Sets whether dtFFT should try to optimize kernel launch parameters during plan creation when effort is below Effort::EXHAUSTIVE.
Default is false.
Kernel optimization is always enabled for the Effort::EXHAUSTIVE effort level. Setting this option to true enables kernel optimization for lower effort levels (Effort::ESTIMATE, Effort::MEASURE, Effort::PATIENT). This may increase plan creation time but can improve runtime performance. Since kernel optimization is performed without data transfers, the time increase is usually minimal.
-
inline Config &set_enable_fourier_reshape(bool enable_fourier_reshape) noexcept
Sets whether dtFFT should execute reshapes from pencils to bricks and vice versa in Fourier space during calls to execute.
Default is false.
When enabled, data will be in brick layout in Fourier space, which may be useful for certain operations between forward and backward transforms. However, this requires an additional all-to-all exchange and will reduce overall performance.
-
inline Config &set_transpose_mode(TransposeMode transpose_mode) noexcept¶
Sets at which stage the local transposition is performed during global exchange when effort level is below Effort::EXHAUSTIVE.
Default is TransposeMode::PACK.
For Effort::EXHAUSTIVE effort level, dtFFT will always choose the best transpose mode based on internal autotuning.
Note
This option only takes effect if platform == Platform::HOST
-
inline Config &set_access_mode(AccessMode access_mode) noexcept¶
Sets the memory access mode for local transposition in Generic backends.
This setting allows choosing between write-aligned (contiguous write) or read-aligned (contiguous read) memory access patterns during the local transposition phase.
Default is AccessMode::WRITE.
Write-aligned access is generally faster on CPU architectures due to cache line utilization optimization. However, specific hardware or memory subsystem characteristics might favor read-aligned access.
-
inline Config &set_enable_compressed_backends(bool enable_compressed_backends) noexcept¶
Sets whether compressed backends should be enabled when effort is Effort::PATIENT or Effort::EXHAUSTIVE.
Default is false.
Only fixed-rate compression can be used during autotuning, since it provides predictable performance characteristics and does not require data-dependent decisions at runtime. To enable compressed backends during autotuning, set this option to true, set the compression mode to CompressionMode::FIXED_RATE and provide the desired compression rate.
-
inline Config &set_compression_config_transpose(const CompressionConfig &compression_config) noexcept¶
Sets compression configuration for transpositions.
-
inline Config &set_compression_config_reshape(const CompressionConfig &compression_config) noexcept¶
Sets compression configuration for reshape operations.
-
inline dtfft_config_t c_struct() const¶
- Returns:
Underlying C structure
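Every setter listed for Config returns `Config &`, so calls can be chained. The sketch below reproduces that pattern with a hypothetical stand-in class so that it compiles without dtFFT. Only the setter signatures and the stated defaults are taken from the reference; the default for `n_measure_iters` is an arbitrary placeholder.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in for a few values of dtfft::Backend.
enum class Backend { MPI_DATATYPE, MPI_P2P, MPI_A2A };

// Hypothetical stand-in for a small part of dtfft::Config.
class Config {
public:
    Config &set_enable_log(bool v) noexcept { enable_log = v; return *this; }
    Config &set_enable_z_slab(bool v) noexcept { enable_z_slab = v; return *this; }
    Config &set_backend(Backend b) noexcept { backend = b; return *this; }
    Config &set_measure_iters(std::int32_t n) noexcept {
        assert(n > 0); // assumption of this sketch: counts must be positive
        n_measure_iters = n;
        return *this;
    }

    bool enable_log = false;                  // documented default
    bool enable_z_slab = true;                // documented default
    Backend backend = Backend::MPI_DATATYPE;  // documented HOST default
    std::int32_t n_measure_iters = 1;         // placeholder, not documented here
};

// Each setter returns a reference to the same object, so several
// options can be set in one expression.
inline Config make_config() {
    Config conf;
    conf.set_enable_log(true)
        .set_enable_z_slab(false)
        .set_backend(Backend::MPI_P2P)
        .set_measure_iters(5);
    return conf;
}

} // namespace sketch
```

With the real library, the configured object would then be passed to dtfft::set_config before any plan is created, since the settings only take effect at plan creation.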
-
struct CompressionConfig¶
Struct that specifies compression configuration.
Public Functions
-
CompressionConfig() = default¶
Default constructor.
-
inline CompressionConfig(CompressionLib lib, CompressionMode mode, double rate = -1.0, int32_t precision = -1, double tolerance = -1.0)¶
Constructor with parameters.
-
inline CompressionConfig(CompressionMode mode, double value)¶
Fixed rate or fixed accuracy compression constructor.
-
inline CompressionConfig(int32_t precision)¶
Fixed precision compression constructor.
Public Members
-
CompressionLib compression_lib = CompressionLib::ZFP¶
Compression library to use.
-
CompressionMode compression_mode = CompressionMode::LOSSLESS¶
Compression mode to use.
-
double rate = -1.0¶
Rate for CompressionMode::FIXED_RATE.
-
int32_t precision = -1¶
Precision for CompressionMode::FIXED_PRECISION.
-
double tolerance = -1.0¶
Tolerance for CompressionMode::FIXED_ACCURACY.
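Each member above belongs to one compression mode. The sketch below mirrors the members and their defaults in a hypothetical stand-in struct and shows one plausible way a (mode, value) constructor could route its argument. Both the routing and the "greater than zero means set" rule are assumptions of this sketch, not documented behaviour.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Hypothetical stand-in mirroring dtfft::CompressionMode.
enum class CompressionMode { LOSSLESS, FIXED_RATE, FIXED_PRECISION, FIXED_ACCURACY };

// Hypothetical stand-in with the members and defaults listed above.
struct CompressionConfig {
    CompressionMode compression_mode = CompressionMode::LOSSLESS;
    double rate = -1.0;
    std::int32_t precision = -1;
    double tolerance = -1.0;
};

// Assumed behaviour of a (mode, value) constructor: `value` is stored in
// the member that belongs to `mode`.
inline CompressionConfig make_rate_or_accuracy(CompressionMode mode, double value) {
    assert(mode == CompressionMode::FIXED_RATE ||
           mode == CompressionMode::FIXED_ACCURACY);
    CompressionConfig cfg;
    cfg.compression_mode = mode;
    if (mode == CompressionMode::FIXED_RATE) {
        cfg.rate = value;
    } else {
        cfg.tolerance = value;
    }
    return cfg;
}

// True when the member read by the selected mode has been set, treating
// the default of -1 as "not set" (an assumption of this sketch).
inline bool has_required_value(const CompressionConfig &cfg) {
    switch (cfg.compression_mode) {
        case CompressionMode::FIXED_RATE:      return cfg.rate > 0.0;
        case CompressionMode::FIXED_PRECISION: return cfg.precision > 0;
        case CompressionMode::FIXED_ACCURACY:  return cfg.tolerance > 0.0;
        default:                               return true; // LOSSLESS needs no value
    }
}

} // namespace sketch
```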
Classes
-
class Exception : public std::exception¶
Basic exception class.
Public Functions
-
Exception(Error error_code, std::string msg, const char *file, int line)¶
Basic exception constructor.
- Parameters:
error_code – [in] Error code
msg – [in] Message describing the error that occurred
file – [in] Filename where the exception was thrown
line – [in] Line number where the exception was thrown
-
const std::string &get_message() const noexcept¶
Returns error message of exception.
-
const std::string &get_file() const noexcept¶
Returns file name where exception occurred.
-
int get_line() const noexcept¶
Returns line number where exception occurred.
-
class Plan¶
Abstract plan for all dtFFT plans.
This class does not have any constructors. To create a plan user should use one of the inherited classes.
Subclassed by dtfft::PlanC2C, dtfft::PlanR2C, dtfft::PlanR2R
Public Functions
-
Error get_z_slab_enabled(bool *is_z_slab_enabled) const noexcept¶
Checks if plan is using Z-slab optimization.
If true, then flags Transpose::X_TO_Z and Transpose::Z_TO_X will be valid to pass to the Plan::transpose method.
- Parameters:
is_z_slab_enabled – [out] Boolean value if Z-slab is used.
- Returns:
Error::SUCCESS if call was without error, error code otherwise
-
bool get_z_slab_enabled() const¶
Checks if plan is using Z-slab optimization.
- Throws:
Exception – if underlying call fails
- Returns:
true if Z-slab is enabled, false otherwise
-
Error get_y_slab_enabled(bool *is_y_slab_enabled) const noexcept¶
Checks if plan is using Y-slab optimization.
If true, then during a call to Plan::execute the transpose between Y and Z aligned layouts will be skipped.
- Parameters:
is_y_slab_enabled – [out] Boolean value if Y-slab is used.
- Returns:
Error::SUCCESS if call was without error, error code otherwise
-
bool get_y_slab_enabled() const¶
Checks if plan is using Y-slab optimization.
- Throws:
Exception – if underlying call fails
- Returns:
true if Y-slab is enabled, false otherwise
-
Error report() const noexcept¶
Prints plan-related information to stdout.
- Returns:
Error::SUCCESS if call was without error, error code otherwise
-
Error report_compression() const noexcept¶
Prints compression-related information to stdout.
- Returns:
Error::SUCCESS if call was without error, error code otherwise
-
Error get_pencil(Layout layout, Pencil &pencil) const noexcept¶
Obtains pencil information from plan.
This can be useful when the user wants to use their own FFT implementation that is unavailable in dtFFT.
- Parameters:
layout – [in] Required layout of the pencil
pencil – [out] Created Pencil object
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error execute(void *in, void *out, Execute execute_type, void *aux = nullptr) const noexcept¶
Plan execution.
- Parameters:
in – [inout] Input pointer
out – [out] Result pointer
execute_type – [in] Direction of execution
aux – [inout] Optional Auxiliary pointer. If provided, must be at least get_aux_bytes() bytes.
- Returns:
Error::SUCCESS on success or error code on failure.
-
template<typename Tr>
inline Tr *execute(void *inout, const Execute execute_type, void *aux = nullptr) const
In-place plan execution.
This template allows the user to cast the result pointer to the desired type.
float *data = ...; // Pointer to data
PlanR2C plan = ...; // Create plan
auto fourier_data = plan.execute<std::complex<float>>(data, Execute::FORWARD);
// `fourier_data` is still pointing to `data`, but is of type std::complex<float>*
Note
Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.
- Template Parameters:
Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
- Parameters:
inout – [inout] Input/output pointer
execute_type – [in] Direction of execution
aux – [inout] Optional Auxiliary pointer
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to the processed data cast to type Tr
-
template<typename T, typename Tr = T>
inline Tr *execute(T *inout, const Execute execute_type, void *aux = nullptr) const
In-place plan execution.
This template allows the user to keep the result pointer of the same type as the input pointer.
float *data = ...; // Pointer to data
PlanR2R plan = ...; // Create plan
auto fourier_data = plan.execute(data, Execute::FORWARD);
// `fourier_data` is still pointing to `data` and is still of type float*
Note
Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.
- Template Parameters:
T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
- Parameters:
inout – [inout] Input/output pointer
execute_type – [in] Direction of execution
aux – [inout] Optional Auxiliary pointer
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to the processed data cast to type Tr
-
Error forward(void *in, void *out, void *aux) const noexcept¶
Forward plan execution.
- Parameters:
in – [inout] Input pointer
out – [out] Result pointer
aux – [inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_bytes() bytes.
- Returns:
Error::SUCCESS on success or error code on failure.
-
template<typename Tr>
inline Tr *forward(void *inout, void *aux = nullptr) const
In-place forward plan execution.
This template allows the user to cast the result pointer to the desired type.
float *data = ...; // Pointer to data
PlanR2C plan = ...; // Create plan
auto fourier_data = plan.forward<std::complex<float>>(data);
// `fourier_data` is still pointing to `data`, but is of type std::complex<float>*
Note
Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.
- Template Parameters:
Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
- Parameters:
inout – [inout] Input/output pointer
aux – [inout] Optional Auxiliary pointer
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to the processed data, cast to type Tr
-
template<typename T, typename Tr = T>
inline Tr *forward(T *inout, void *aux = nullptr) const¶
In-place forward plan execution.
This template allows the user to keep the result pointer the same type as the input pointer.
float *data = ...; // Pointer to data
PlanR2R plan = ...; // Create plan
auto fourier_data = plan.forward(data); // `fourier_data` still points to `data` and is still of type float*
Note
Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.
- Template Parameters:
T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
- Parameters:
inout – [inout] Input/output pointer
aux – [inout] Optional Auxiliary pointer
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to the processed data, cast to type Tr
-
Error backward(void *in, void *out, void *aux) const noexcept¶
Backward plan execution.
- Parameters:
in – [inout] Input pointer
out – [out] Result pointer
aux – [inout] Auxiliary pointer. Can be
nullptr
- Returns:
Error::SUCCESS on success or error code on failure.
-
template<typename Tr>
inline Tr *backward(void *inout, void *aux = nullptr) const¶
In-place backward plan execution.
This template allows the user to cast the result pointer to the desired type.
std::complex<float> *fourier_data = ...; // Pointer to data
PlanR2C plan = ...; // Create plan
auto real_data = plan.backward<float>(fourier_data); // `real_data` still points to `fourier_data`, but is of type float*
Note
Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.
- Template Parameters:
Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
- Parameters:
inout – [inout] Input/output pointer
aux – [inout] Optional Auxiliary pointer
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to the processed data, cast to type Tr
-
template<typename T, typename Tr = T>
inline Tr *backward(T *inout, void *aux = nullptr) const¶
In-place backward plan execution.
This template allows the user to keep the result pointer the same type as the input pointer.
float *fourier_data = ...; // Pointer to data
PlanR2R plan = ...; // Create plan
auto real_data = plan.backward(fourier_data); // `real_data` still points to `fourier_data` and is still of type float*
Note
Not all plans support in-place execution. Refer to the manual for the list of unsupported cases.
- Template Parameters:
T – Type of input/output data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
Tr – Type of returned data. This should be a basic pointer type, e.g. float, double or std::complex of any of those
- Parameters:
inout – [inout] Input/output pointer
aux – [inout] Optional Auxiliary pointer
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to the processed data, cast to type Tr
-
Error transpose(void *in, void *out, Transpose transpose_type, void *aux = nullptr) const noexcept¶
Transpose data in a single dimension, e.g. X align -> Y align
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Pointer of transposed data
transpose_type – [in] Type of transpose to perform.
aux – [inout] Auxiliary pointer. Can be
nullptr. If provided, must be at least get_aux_size_transpose() elements.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error transpose_start(void *in, void *out, Transpose transpose_type, void *aux, dtfft_request_t *request) const noexcept¶
Starts an asynchronous transpose operation in a single dimension, e.g. X align -> Y align
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Output pointer
transpose_type – [in] Type of transpose to perform
aux – [inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_transpose() elements.
request – [out] Handle to manage the asynchronous operation
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error transpose_start(void *in, void *out, Transpose transpose_type, dtfft_request_t *request) const noexcept¶
Starts an asynchronous transpose operation in a single dimension, e.g. X align -> Y align
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Output pointer
transpose_type – [in] Type of transpose to perform
request – [out] Handle to manage the asynchronous operation
- Returns:
Error::SUCCESS on success or error code on failure.
-
dtfft_request_t transpose_start(void *in, void *out, Transpose transpose_type, void *aux = nullptr) const¶
Starts an asynchronous transpose operation in a single dimension, e.g. X align -> Y align
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Output pointer
transpose_type – [in] Type of transpose to perform
aux – [inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_transpose() elements.
- Returns:
Handle to manage the asynchronous operation
-
Error transpose_end(dtfft_request_t request) const noexcept¶
Ends an asynchronous transpose operation.
- Parameters:
request – [inout] Handle to manage the asynchronous operation
- Returns:
Error::SUCCESS on success or error code on failure.
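The start/end pair above is typically used to overlap the transpose with independent work. The following is a minimal sketch of that call order only. `transpose_overlapped` is a hypothetical helper, and `FakeAsyncPlan` with its `int` request handle is a stand-in invented here so the snippet compiles without dtFFT; neither is part of the library. The sketch also assumes that `in` and `out` must stay untouched until `transpose_end` returns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: runs `work` between the start and the end of an
// asynchronous transpose. PlanLike is any type that exposes
// transpose_start(in, out, type) returning a request handle and
// transpose_end(request) returning a status.
template <typename PlanLike, typename TransposeT, typename Work>
auto transpose_overlapped(const PlanLike &plan, void *in, void *out,
                          TransposeT type, Work &&work)
{
    auto request = plan.transpose_start(in, out, type); // operation begins
    work();                             // assumed not to touch `in` or `out`
    return plan.transpose_end(request); // completes the operation
}

// Stand-in used only to make the sketch testable without dtFFT.
// It records the order of calls instead of moving any data.
struct FakeAsyncPlan {
    mutable std::vector<std::string> log;

    int transpose_start(void *, void *, int) const
    {
        log.push_back("start");
        return 7; // dummy request handle
    }

    bool transpose_end(int request) const
    {
        log.push_back("end");
        return request == 7;
    }
};
```

With a real plan, the value returned by `transpose_end` would be an `Error`, which could be wrapped in `DTFFT_CXX_CALL` to turn a failure into an exception.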
-
Error reshape(void *in, void *out, Reshape reshape_type, void *aux = nullptr) const noexcept¶
Reshape data from bricks to pencils and vice versa.
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Pointer of reshaped data
reshape_type – [in] Type of reshape to perform.
aux – [inout] Auxiliary pointer. Can be
nullptr. If provided, must be at least get_aux_size_reshape() elements.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error reshape_start(void *in, void *out, Reshape reshape_type, void *aux, dtfft_request_t *request) const noexcept¶
Starts an asynchronous reshape operation from bricks to pencils and vice versa.
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Output pointer
reshape_type – [in] Type of reshape to perform
aux – [inout] Auxiliary pointer. Can be nullptr. If provided, must be at least get_aux_size_reshape() elements.
request – [out] Handle to manage the asynchronous operation
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error reshape_start(void *in, void *out, Reshape reshape_type, dtfft_request_t *request) const noexcept¶
Starts an asynchronous reshape operation from bricks to pencils and vice versa.
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Output pointer
reshape_type – [in] Type of reshape to perform
request – [out] Handle to manage the asynchronous operation
- Returns:
Error::SUCCESS on success or error code on failure.
-
dtfft_request_t reshape_start(void *in, void *out, Reshape reshape_type, void *aux = nullptr) const¶
Starts an asynchronous reshape operation from bricks to pencils and vice versa.
- Attention
in and out cannot be the same pointer
- Parameters:
in – [inout] Input pointer
out – [out] Output pointer
reshape_type – [in] Type of reshape to perform
aux – [inout] Auxiliary pointer. Can be
nullptr. If provided, must be at least get_aux_size_reshape() elements.
- Returns:
Handle to manage the asynchronous operation
-
Error reshape_end(dtfft_request_t request) const noexcept¶
Ends an asynchronous reshape operation.
- Parameters:
request – [inout] Handle to manage the asynchronous operation
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error get_alloc_size(size_t *alloc_size) const noexcept¶
Wrapper around Plan.get_local_sizes to obtain alloc_size only.
- Parameters:
alloc_size – [out] Minimum number of elements to be allocated for the in and out buffers required by Plan::execute, Plan::transpose, and Plan::reshape. Size of each element in bytes can be obtained by calling Plan::get_element_size.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_alloc_size() const¶
Wrapper around Plan.get_local_sizes to obtain alloc_size only.
- Throws:
Exception – if underlying call fails
- Returns:
Minimum number of elements to be allocated for the in and out buffers required by Plan::execute, Plan::transpose, and Plan::reshape.
-
Error get_aux_size(std::size_t *aux_size) const noexcept¶
Get auxiliary buffer size required to execute the plan.
- Parameters:
aux_size – [out] Number of elements required for auxiliary buffer. Size of each element in bytes can be obtained by calling Plan::get_element_size.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_aux_size() const¶
Get auxiliary buffer size required to execute the plan.
- Throws:
Exception – if underlying call fails
- Returns:
Number of elements required for auxiliary buffer.
-
Error get_aux_bytes(std::size_t *aux_bytes) const noexcept¶
Get auxiliary buffer size in bytes required to execute the plan.
- Parameters:
aux_bytes – [out] Number of bytes required for auxiliary buffer.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_aux_bytes() const¶
Get auxiliary buffer size in bytes required to execute the plan.
- Throws:
Exception – if underlying call fails
- Returns:
Number of bytes required for auxiliary buffer.
-
Error get_aux_size_reshape(std::size_t *aux_size) const noexcept¶
Get number of elements required by Plan::reshape.
- Parameters:
aux_size – [out] Number of elements required for auxiliary buffer during reshape operation. Size of each element in bytes can be obtained by calling Plan::get_element_size.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_aux_size_reshape() const¶
Get number of elements required by Plan::reshape.
- Throws:
Exception – if underlying call fails
- Returns:
Number of elements required for auxiliary buffer during reshape operation.
-
Error get_aux_bytes_reshape(std::size_t *aux_bytes) const noexcept¶
Get number of bytes required by Plan.reshape.
- Parameters:
aux_bytes – [out] Number of bytes required for auxiliary buffer during reshape operation.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_aux_bytes_reshape() const¶
Get number of bytes required by Plan::reshape.
- Throws:
Exception – if underlying call fails
- Returns:
Number of bytes required for auxiliary buffer during reshape operation.
-
Error get_aux_size_transpose(std::size_t *aux_size) const noexcept¶
Get number of elements required by Plan::transpose.
- Parameters:
aux_size – [out] Number of elements required for auxiliary buffer during transpose operations.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_aux_size_transpose() const¶
Get number of elements required by Plan::transpose.
- Throws:
Exception – if underlying call fails
- Returns:
Number of elements required for auxiliary buffer during transpose operations.
-
Error get_aux_bytes_transpose(std::size_t *aux_bytes) const noexcept¶
Get number of bytes required by Plan::transpose.
- Parameters:
aux_bytes – [out] Number of bytes required for auxiliary buffer during transpose operations.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::size_t get_aux_bytes_transpose() const¶
Get number of bytes required by Plan::transpose.
- Throws:
Exception – if underlying call fails
- Returns:
Number of bytes required for auxiliary buffer during transpose operations.
-
Error get_local_sizes(std::vector<int32_t> &in_starts, std::vector<int32_t> &in_counts, std::vector<int32_t> &out_starts, std::vector<int32_t> &out_counts, std::size_t *alloc_size) const noexcept¶
Get grid decomposition information.
Results may differ on different MPI processes
Note
Before calling this function, the user must ensure that the in_starts, in_counts, out_starts and out_counts vectors are large enough to hold the data.
- Parameters:
in_starts – [out] Starts of local portion of data in ‘real’ space in reversed order
in_counts – [out] Sizes of local portion of data in ‘real’ space in reversed order
out_starts – [out] Starts of local portion of data in ‘fourier’ space in reversed order
out_counts – [out] Sizes of local portion of data in ‘fourier’ space in reversed order
alloc_size – [out] Minimum number of elements to be allocated for the in and out buffers. Size of each element in bytes can be obtained by calling Plan::get_element_size.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error get_local_sizes(int32_t *in_starts = nullptr, int32_t *in_counts = nullptr, int32_t *out_starts = nullptr, int32_t *out_counts = nullptr, size_t *alloc_size = nullptr) const noexcept¶
Get grid decomposition information.
Results may differ on different MPI processes
- Parameters:
in_starts – [out] Starts of local portion of data in ‘real’ space in reversed order
in_counts – [out] Sizes of local portion of data in ‘real’ space in reversed order
out_starts – [out] Starts of local portion of data in ‘fourier’ space in reversed order
out_counts – [out] Sizes of local portion of data in ‘fourier’ space in reversed order
alloc_size – [out] Minimum number of elements to be allocated for the in and out buffers. Size of each element in bytes can be obtained by calling Plan.get_element_size.
- Returns:
Error::SUCCESS on success or error code on failure.
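The vector overload above expects its output vectors to be sized before the call. The sketch below shows one way to prepare them and how to turn a returned `counts` vector into an element count. `LocalSizes` and `local_element_count` are hypothetical helpers, not part of dtFFT, and the sketch assumes one entry per plan dimension is sufficient. It does not assume that either product equals `alloc_size`, which is reported separately and is the value to use for allocation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical holder for the four output vectors, pre-sized so that a call
// such as plan.get_local_sizes(s.in_starts, s.in_counts, s.out_starts,
// s.out_counts, &alloc_size) has room to write ndims entries into each.
struct LocalSizes {
    std::vector<std::int32_t> in_starts, in_counts, out_starts, out_counts;

    explicit LocalSizes(std::size_t ndims)
        : in_starts(ndims), in_counts(ndims),
          out_starts(ndims), out_counts(ndims) {}
};

// Number of elements in the local block described by per-dimension counts.
inline std::size_t local_element_count(const std::vector<std::int32_t> &counts)
{
    std::size_t n = 1;
    for (std::int32_t c : counts) {
        n *= static_cast<std::size_t>(c);
    }
    return n;
}
```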
-
Error get_element_size(size_t *element_size) const noexcept¶
Obtains number of bytes required to store single element by this plan.
- Parameters:
element_size – [out] Size of element in bytes
- Returns:
Error::SUCCESS on success or error code on failure.
-
size_t get_element_size() const¶
Obtains number of bytes required to store single element by this plan.
- Throws:
Exception – if underlying call fails
- Returns:
Size of element in bytes
-
Error get_alloc_bytes(size_t *alloc_bytes) const noexcept¶
Returns minimum number of bytes required for in and out buffers.
This function is a combination of two calls: Plan::get_alloc_size and Plan::get_element_size. Returns the minimum number of bytes to be allocated for the in and out buffers required by Plan::execute. The minimum number of aux bytes required by Plan::execute can be obtained by calling Plan::get_aux_bytes.
- Parameters:
alloc_bytes – [out] Number of bytes required
- Returns:
Error::SUCCESS on success or error code on failure.
-
size_t get_alloc_bytes() const¶
Returns minimum number of bytes required for in and out buffers.
This function is a combination of two calls: Plan::get_alloc_size and Plan::get_element_size. Returns the minimum number of bytes to be allocated for the in and out buffers required by Plan::execute. The minimum number of aux bytes required by Plan::execute can be obtained by calling Plan::get_aux_bytes.
- Throws:
Exception – if underlying call fails
- Returns:
Number of bytes of each buffer required to execute plan
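The relation between the element-based and byte-based queries described above is a plain multiplication. The sketch below spells it out with an overflow guard. `buffer_bytes` is a hypothetical helper, not a dtFFT function; its two arguments are assumed to come from `get_alloc_size()` and `get_element_size()`.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Bytes needed for one buffer: element count times element size.
// Throws std::overflow_error if the product does not fit in std::size_t.
inline std::size_t buffer_bytes(std::size_t alloc_size, std::size_t element_size)
{
    if (element_size != 0 &&
        alloc_size > std::numeric_limits<std::size_t>::max() / element_size) {
        throw std::overflow_error("buffer size does not fit in std::size_t");
    }
    return alloc_size * element_size;
}
```

The same arithmetic applies to the auxiliary buffer queries, where an element count from a `get_aux_size*` call is paired with the element size.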
-
Error get_executor(Executor *executor) const noexcept¶
Returns executor used by this plan.
- Parameters:
executor – [out] Executor used by this plan.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Executor get_executor() const¶
Returns executor used by this plan.
- Throws:
Exception – if underlying call fails
- Returns:
Executor used by this plan.
-
Error get_precision(Precision *precision) const noexcept¶
Returns precision of the plan.
- Parameters:
precision – [out] Precision of the plan.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Precision get_precision() const¶
Returns precision of the plan.
- Throws:
Exception – if underlying call fails
- Returns:
Precision of the plan.
-
Error get_dims(int8_t *ndims, const int32_t *dims[]) const noexcept¶
Returns global dimensions of the plan.
Note
Do not free the array, it is freed when the Plan is destroyed.
- Parameters:
ndims – [out] Number of dimensions in the plan. User can pass nullptr if this value is not needed.
dims – [out] Array of dimensions in natural Fortran order. User can pass nullptr if this value is not needed.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::vector<int32_t> get_dims() const¶
Returns global dimensions of the plan.
- Throws:
Exception – if underlying call fails
- Returns:
Vector of dimensions in natural Fortran order. Size of vector is equal to number of dimensions in the plan.
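The plan constructors take `dims` "in reversed order", while `get_dims` reports them in natural Fortran order. Assuming the two conventions are exact reverses of each other (an assumption of this sketch, not something the text above states outright), converting between them is a single reversal. `reversed_dims` is a hypothetical helper, not part of dtFFT.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns a copy of `dims` with the order of dimensions reversed.
// Applying it twice gives back the original order.
inline std::vector<std::int32_t> reversed_dims(std::vector<std::int32_t> dims)
{
    std::reverse(dims.begin(), dims.end());
    return dims;
}
```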
-
Error get_grid_dims(int8_t *ndims, const int32_t *grid_dims[]) const noexcept¶
Returns grid decomposition dimensions of the plan.
Note
Do not free the grid_dims array, it is freed when the Plan is destroyed.
- Parameters:
ndims – [out] Number of dimensions in plan. User can pass nullptr if this value is not needed.
grid_dims – [out] Pointer of size ndims containing grid decomposition dimensions in reverse order: grid_dims[0] is the fastest varying and is always equal to 1. User can pass nullptr if this value is not needed.
- Returns:
Error::SUCCESS on success or error code on failure.
-
std::vector<int32_t> get_grid_dims() const¶
Returns grid decomposition dimensions of the plan.
- Throws:
Exception – if underlying call fails
- Returns:
Vector of grid decomposition dimensions in natural Fortran order. Size of vector is equal to number of dimensions in the plan. First value is always equal to 1.
-
Error mem_alloc(size_t alloc_bytes, void **ptr) const noexcept¶
Allocates memory specific for this plan.
- Parameters:
alloc_bytes – [in] Number of bytes to allocate
ptr – [out] Allocated pointer
- Returns:
Error::SUCCESS on success or error code on failure.
-
void *mem_alloc(size_t alloc_bytes) const¶
Allocates memory specific for this plan.
- Parameters:
alloc_bytes – Number of bytes to allocate
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to allocated memory
-
template<typename T>
inline T *mem_alloc(const size_t alloc_size) const¶ Allocates memory for an array of elements of type T.
- Template Parameters:
T – Type of elements
- Parameters:
alloc_size – [in] Number of elements to allocate
- Throws:
Exception – if underlying call fails
- Returns:
Pointer to allocated memory
-
Error mem_free(void *ptr) const noexcept¶
Frees memory specific for this plan.
- Parameters:
ptr – [inout] Allocated pointer
- Returns:
Error::SUCCESS on success or error code on failure.
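Because memory obtained from `mem_alloc` has to be returned through `mem_free` on the same plan, a small RAII guard can keep the two calls paired. The sketch below is one possible shape. `PlanBuffer` is a hypothetical wrapper, and `FakeAllocPlan` is a stand-in invented here so the snippet compiles without dtFFT. It assumes the buffer is destroyed before the plan itself is destroyed.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical RAII guard: allocates through the plan on construction and
// frees through the same plan on destruction. PlanLike is any type with
// mem_alloc(bytes) returning void* and mem_free(ptr).
template <typename PlanLike>
class PlanBuffer {
public:
    PlanBuffer(const PlanLike &plan, std::size_t alloc_bytes)
        : plan_(plan), ptr_(plan.mem_alloc(alloc_bytes)) {}

    ~PlanBuffer()
    {
        // The status returned by mem_free is ignored: destructors must not throw.
        if (ptr_ != nullptr) {
            plan_.mem_free(ptr_);
        }
    }

    PlanBuffer(const PlanBuffer &) = delete;            // single owner
    PlanBuffer &operator=(const PlanBuffer &) = delete;

    void *get() const noexcept { return ptr_; }

private:
    const PlanLike &plan_; // must outlive this buffer
    void *ptr_;
};

// Stand-in used only to make the sketch testable without dtFFT.
struct FakeAllocPlan {
    mutable int live = 0; // number of outstanding allocations

    void *mem_alloc(std::size_t bytes) const
    {
        ++live;
        return std::malloc(bytes);
    }

    int mem_free(void *ptr) const
    {
        std::free(ptr);
        --live;
        return 0;
    }
};
```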
-
Error destroy() noexcept¶
Plan Destructor.
To fully clean all internal memory, this should be called before MPI_Finalize
- Returns:
Error::SUCCESS on success or error code on failure.
-
Error get_backend(Backend &backend) const noexcept¶
Returns the backend selected during autotuning if effort is Effort::PATIENT.
If the effort passed to any create function is Effort::ESTIMATE or Effort::MEASURE, returns the value set by Config.set_backend followed by set_config(), or the default value, which is Backend::NCCL.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Backend get_backend() const¶
Returns the backend selected during autotuning if effort is Effort::PATIENT.
If the effort passed to any create function is Effort::ESTIMATE or Effort::MEASURE, returns the value set by Config.set_backend followed by set_config(), or the default value, which is Backend::NCCL.
- Throws:
Exception – if underlying call fails
- Returns:
Backend used by this plan.
-
Error get_reshape_backend(Backend &backend) const noexcept¶
Returns backend used for reshape operations.
- Parameters:
backend – [out] Backend used for reshape operations
- Returns:
Error::SUCCESS on success or error code on failure.
-
Backend get_reshape_backend() const¶
Returns backend used for reshape operations.
- Throws:
Exception – if underlying call fails
- Returns:
Backend used for reshape operations
-
Error get_stream(dtfft_stream_t *stream) const noexcept¶
Returns stream associated with current Plan.
This can either be stream passed by Config.set_stream followed by set_config() or stream created internally. Returns NULL pointer if plan’s platform is Platform::HOST.
Note
This method is only present in the API when dtFFT was compiled with CUDA Support.
- Parameters:
stream – [out] CUDA stream associated with plan
- Returns:
Error::SUCCESS on success or error code on failure.
-
dtfft_stream_t get_stream() const¶
Returns stream associated with current Plan.
This can either be stream passed by Config.set_stream followed by set_config() or stream created internally. Returns NULL pointer if plan’s platform is Platform::HOST.
Note
This method is only present in the API when dtFFT was compiled with CUDA Support.
- Throws:
Exception – if underlying call fails
- Returns:
dtFFT stream associated with plan
-
Error get_platform(Platform &platform) const noexcept¶
Returns plan execution platform.
- Returns:
Error::SUCCESS on success or error code on failure.
-
Platform get_platform() const¶
Returns plan execution platform.
- Throws:
Exception – if underlying call fails
- Returns:
Platform::HOST if plan is executed on host, Platform::CUDA if plan is executed on CUDA device.
-
inline dtfft_plan_t c_struct() const¶
- Returns:
Underlying C structure
-
Error get_z_slab_enabled(bool *is_z_slab_enabled) const noexcept¶
-
class PlanC2C : public dtfft::Plan¶
Complex-to-Complex Plan.
Public Functions
-
explicit PlanC2C(const std::vector<int32_t> &dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Complex-to-Complex Plan constructor.
- Parameters:
dims – [in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanC2C(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)¶
Complex-to-Complex Transpose-only Plan constructor.
- Parameters:
dims – [in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanC2C(int8_t ndims, const int32_t *dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Complex-to-Complex Generic Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Buffer of size ndims with global dimensions in reversed order.
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanC2C(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)¶
Complex-to-Complex Plan constructor using pencil decomposition information.
-
explicit PlanC2C(const Pencil &pencil, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Complex-to-Complex Plan constructor using pencil decomposition information.
- Parameters:
pencil – [in] Initialized Pencil object.
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
- Throws:
Exception – In case error occurs during plan creation
-
class PlanR2C : public dtfft::Plan¶
Real-to-Complex Plan.
Public Functions
-
explicit PlanR2C(const std::vector<int32_t> &dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Complex Plan constructor.
- Parameters:
dims – [in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2C(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)¶
Real-to-Complex Transpose-only Plan constructor.
- Parameters:
dims – [in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2C(int8_t ndims, const int32_t *dims, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Complex Generic Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Buffer of size ndims with global dimensions in reversed order.
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2C(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)¶
Real-to-Complex Plan constructor using pencil decomposition information.
Note
Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.
-
explicit PlanR2C(const Pencil &pencil, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Complex Plan constructor using pencil decomposition information.
Note
Parameter executor cannot be Executor::NONE. PlanC2C should be used instead.
- Parameters:
pencil – [in] Initialized Pencil object.
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
- Throws:
Exception – In case error occurs during plan creation
-
class PlanR2R : public dtfft::Plan¶
Real-to-Real Plan.
Public Functions
-
explicit PlanR2R(const std::vector<int32_t> &dims, const std::vector<R2RKind> &kinds = std::vector<R2RKind>(), MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Real Plan constructor.
- Parameters:
dims – [in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3
kinds – [in] Real FFT kinds in reversed order. Can be an empty vector if executor == Executor::NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor.
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2R(const std::vector<int32_t> &dims, Precision precision, Effort effort = Effort::ESTIMATE)¶
Real-to-Real Transpose-only Plan constructor.
- Parameters:
dims – [in] Vector with global dimensions in reversed order. dims.size() must be 2 or 3
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2R(int8_t ndims, const int32_t *dims, const R2RKind *kinds = nullptr, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Real Generic Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Buffer of size ndims with global dimensions in reversed order.
kinds – [in] Buffer of size ndims with Real FFT kinds in reversed order. Can be nullptr if executor == Executor::NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor.
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2R(const Pencil &pencil, Precision precision, Effort effort = Effort::ESTIMATE)¶
Real-to-Real Transpose-only Plan constructor.
-
explicit PlanR2R(const Pencil &pencil, const std::vector<R2RKind> &kinds, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Real Plan constructor.
- Parameters:
pencil – [in] Initialized Pencil object.
kinds – [in] Real FFT kinds in reversed order. Can be an empty vector if executor == Executor::NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor.
- Throws:
Exception – In case error occurs during plan creation
-
explicit PlanR2R(const Pencil &pencil, const R2RKind *kinds = nullptr, MPI_Comm comm = MPI_COMM_WORLD, Precision precision = Precision::DOUBLE, Effort effort = Effort::ESTIMATE, Executor executor = Executor::NONE)¶
Real-to-Real Generic Plan constructor.
- Parameters:
pencil – [in] Initialized Pencil object.
kinds – [in] Buffer of size ndims with Real FFT kinds in reversed order. Can be nullptr if executor == Executor::NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor.
- Throws:
Exception – In case error occurs during plan creation