C API Reference¶
This page describes all types, functions and macros available in the dtFFT C API.
To use them, #include <dtfft.h>.
Note
Not all of the API listed below is necessarily available at runtime; availability depends on how dtFFT was built.
For example, dtfft_platform_t can only be used if dtFFT is compiled with CUDA support.
Predefined Macros¶
-
DTFFT_VERSION_MAJOR¶
dtFFT Major Version
-
DTFFT_VERSION_MINOR¶
dtFFT Minor Version
-
DTFFT_VERSION_PATCH¶
dtFFT Patch Version
-
DTFFT_VERSION_CODE¶
dtFFT Version Code.
Can be used for version comparison
-
DTFFT_VERSION(X, Y, Z)¶
Generates Version Code based on Major, Minor, Patch.
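The version macros above can gate code at compile time or at run time. The following is a minimal sketch, not taken from the dtFFT sources: the fallback definitions of DTFFT_VERSION and DTFFT_VERSION_CODE are hypothetical stand-ins so that the snippet compiles without the library (the real encoding may differ), and HAVE_NEW_FEATURE and header_at_least are illustrative names.

```c
#include <assert.h>

/* Normally provided by <dtfft.h>. These fallbacks are hypothetical
 * stand-ins so the sketch compiles on its own; the real encoding used
 * by DTFFT_VERSION may differ. */
#ifndef DTFFT_VERSION
#define DTFFT_VERSION(X, Y, Z) ((X) * 100000 + (Y) * 1000 + (Z))
#endif
#ifndef DTFFT_VERSION_CODE
#define DTFFT_VERSION_CODE DTFFT_VERSION(3, 0, 0) /* made-up version */
#endif

/* Compile-time gate: enable a code path only for new enough headers. */
#if DTFFT_VERSION_CODE >= DTFFT_VERSION(2, 0, 0)
#define HAVE_NEW_FEATURE 1 /* hypothetical feature flag */
#else
#define HAVE_NEW_FEATURE 0
#endif

/* Run-time form of the same comparison.
 * Returns 1 if the headers are at least major.minor.patch. */
static int header_at_least(int major, int minor, int patch)
{
    assert(major >= 0 && minor >= 0 && patch >= 0);
    return DTFFT_VERSION_CODE >= DTFFT_VERSION(major, minor, patch);
}
```

The same comparison can be applied to the value returned by dtfft_get_version() to check the version the library itself was compiled with.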
-
DTFFT_CALL(call)¶
Safe call macro.
Should be used to check error codes returned by dtFFT.
Writes an error message to stderr and calls MPI_Abort if an error occurs.
Example
DTFFT_CALL( dtfft_transpose(plan, a, b, DTFFT_TRANSPOSE_X_TO_Y, NULL) )
Enumerators¶
-
enum dtfft_error_t¶
This enum lists the different error codes that dtFFT can return.
Values:
-
enumerator DTFFT_SUCCESS¶
Successful execution.
-
enumerator DTFFT_ERROR_MPI_FINALIZED¶
MPI_Init is not called or MPI_Finalize has already been called.
-
enumerator DTFFT_ERROR_PLAN_NOT_CREATED¶
Plan not created.
-
enumerator DTFFT_ERROR_INVALID_TRANSPOSE_TYPE¶
Invalid transpose_type provided.
-
enumerator DTFFT_ERROR_INVALID_N_DIMENSIONS¶
Invalid Number of dimensions provided.
Valid options are 2 and 3
-
enumerator DTFFT_ERROR_INVALID_DIMENSION_SIZE¶
One or more provided dimension sizes <= 0.
-
enumerator DTFFT_ERROR_INVALID_COMM_TYPE¶
Invalid communicator type provided.
-
enumerator DTFFT_ERROR_INVALID_PRECISION¶
Invalid precision parameter provided.
-
enumerator DTFFT_ERROR_INVALID_EFFORT¶
Invalid effort parameter provided.
-
enumerator DTFFT_ERROR_INVALID_EXECUTOR¶
Invalid executor parameter provided.
-
enumerator DTFFT_ERROR_INVALID_COMM_DIMS¶
Number of dimensions in the provided Cartesian communicator > number of dimensions passed to the create subroutine.
-
enumerator DTFFT_ERROR_INVALID_COMM_FAST_DIM¶
Passed Cartesian communicator with number of processes in 1st (fastest varying) dimension > 1.
-
enumerator DTFFT_ERROR_MISSING_R2R_KINDS¶
For an R2R plan, the kinds parameter must be passed if executor != DTFFT_EXECUTOR_NONE.
-
enumerator DTFFT_ERROR_INVALID_R2R_KINDS¶
Invalid values detected in kinds parameter.
-
enumerator DTFFT_ERROR_R2C_TRANSPOSE_PLAN¶
Transpose plan is not supported in R2C, use R2R or C2C plan instead.
-
enumerator DTFFT_ERROR_INPLACE_TRANSPOSE¶
Inplace transpose is not supported.
-
enumerator DTFFT_ERROR_INVALID_AUX¶
Invalid aux buffer provided.
-
enumerator DTFFT_ERROR_INVALID_LAYOUT¶
Invalid layout passed to dtfft_get_pencil.
-
enumerator DTFFT_ERROR_INVALID_USAGE¶
Invalid API Usage.
-
enumerator DTFFT_ERROR_PLAN_IS_CREATED¶
Trying to create already created plan.
-
enumerator DTFFT_ERROR_R2R_FFT_NOT_SUPPORTED¶
Selected executor does not support R2R FFTs.
-
enumerator DTFFT_ERROR_ALLOC_FAILED¶
Internal call of dtfft_mem_alloc failed.
-
enumerator DTFFT_ERROR_FREE_FAILED¶
Internal call of dtfft_mem_free failed.
-
enumerator DTFFT_ERROR_INVALID_ALLOC_BYTES¶
Invalid alloc_bytes provided.
-
enumerator DTFFT_ERROR_DLOPEN_FAILED¶
Failed to dynamically load library.
-
enumerator DTFFT_ERROR_DLSYM_FAILED¶
Failed to dynamically load symbol.
-
enumerator DTFFT_ERROR_PENCIL_ARRAYS_SIZE_MISMATCH¶
Deprecated/unused: R2C transpose call restriction (kept for backward compatibility of error code numbering)
Sizes of starts and counts arrays passed to dtfft_pencil_t constructor do not match.
-
enumerator DTFFT_ERROR_PENCIL_ARRAYS_INVALID_SIZES¶
Sizes of starts and counts < 2 or > 3 provided to dtfft_pencil_t constructor.
-
enumerator DTFFT_ERROR_PENCIL_INVALID_COUNTS¶
Invalid counts provided to dtfft_pencil_t constructor.
-
enumerator DTFFT_ERROR_PENCIL_INVALID_STARTS¶
Invalid starts provided to dtfft_pencil_t constructor.
-
enumerator DTFFT_ERROR_PENCIL_SHAPE_MISMATCH¶
Processes have same lower bounds but different sizes in some dimensions.
-
enumerator DTFFT_ERROR_PENCIL_OVERLAP¶
Pencil overlap detected, i.e. two processes share the same part of the global space.
-
enumerator DTFFT_ERROR_PENCIL_NOT_CONTINUOUS¶
Local pencils do not cover the global space without gaps.
-
enumerator DTFFT_ERROR_PENCIL_NOT_INITIALIZED¶
Pencil is not initialized, i.e. the constructor subroutine was not called.
-
enumerator DTFFT_ERROR_INVALID_MEASURE_WARMUP_ITERS¶
Invalid n_measure_warmup_iters provided.
-
enumerator DTFFT_ERROR_INVALID_MEASURE_ITERS¶
Invalid n_measure_iters provided.
-
enumerator DTFFT_ERROR_INVALID_REQUEST¶
Invalid dtfft_request_t provided.
-
enumerator DTFFT_ERROR_TRANSPOSE_ACTIVE¶
Attempting to execute already active transposition.
-
enumerator DTFFT_ERROR_TRANSPOSE_NOT_ACTIVE¶
Attempting to finalize non-active transposition.
-
enumerator DTFFT_ERROR_INVALID_RESHAPE_TYPE¶
Invalid reshape_type provided.
-
enumerator DTFFT_ERROR_RESHAPE_ACTIVE¶
Attempting to execute already active reshape.
-
enumerator DTFFT_ERROR_RESHAPE_NOT_ACTIVE¶
Attempting to finalize non-active reshape.
-
enumerator DTFFT_ERROR_INPLACE_RESHAPE¶
Inplace reshape is not supported.
-
enumerator DTFFT_ERROR_INVALID_EXECUTE_TYPE¶
R2C reshape was called.
Invalid execute_type provided.
-
enumerator DTFFT_ERROR_RESHAPE_NOT_SUPPORTED¶
Reshape is not supported for this plan.
-
enumerator DTFFT_ERROR_R2C_EXECUTE_CALLED¶
Execute called for transpose-only R2C Plan.
-
enumerator DTFFT_ERROR_INVALID_CART_COMM¶
Invalid cartesian communicator provided.
-
enumerator DTFFT_ERROR_INVALID_TRANSPOSE_MODE¶
Invalid transpose mode provided.
-
enumerator DTFFT_ERROR_GPU_INVALID_STREAM¶
Invalid stream provided.
-
enumerator DTFFT_ERROR_INVALID_BACKEND¶
Invalid backend provided.
-
enumerator DTFFT_ERROR_GPU_NOT_SET¶
Multiple MPI Processes located on same host share same GPU which is not supported.
-
enumerator DTFFT_ERROR_VKFFT_R2R_2D_PLAN¶
When using R2R FFT and executor type is vkFFT and plan uses Z-slab optimization, it is required that types of R2R transform are same in X and Y directions.
-
enumerator DTFFT_ERROR_BACKENDS_DISABLED¶
Passed effort == DTFFT_PATIENT but all backends have been disabled by dtfft_config_t.
-
enumerator DTFFT_ERROR_NOT_DEVICE_PTR¶
One of the pointers passed to dtfft_execute or dtfft_transpose cannot be accessed from device.
-
enumerator DTFFT_ERROR_NOT_NVSHMEM_PTR¶
One of the pointers passed to dtfft_execute or dtfft_transpose is not an NVSHMEM pointer.
-
enumerator DTFFT_ERROR_INVALID_PLATFORM¶
Invalid platform provided.
-
enumerator DTFFT_ERROR_INVALID_PLATFORM_EXECUTOR¶
Invalid executor provided for selected platform.
-
enumerator DTFFT_ERROR_INVALID_PLATFORM_BACKEND¶
Invalid backend provided for selected platform.
-
enumerator DTFFT_ERROR_INVALID_ACCESS_MODE¶
Invalid access mode provided.
-
enumerator DTFFT_ERROR_COMPRESSION_CUDA_NOT_SUPPORTED¶
CUDA support is not available for compression.
-
enumerator DTFFT_ERROR_COMPRESSION_INVALID_RATE¶
Invalid compression rate.
-
enumerator DTFFT_ERROR_COMPRESSION_INVALID_PRECISION¶
Invalid compression precision.
-
enumerator DTFFT_ERROR_COMPRESSION_INVALID_TOLERANCE¶
Invalid compression tolerance.
-
enumerator DTFFT_ERROR_COMPRESSION_INVALID_MODE¶
Invalid compression mode.
-
enumerator DTFFT_ERROR_COMPRESSION_INVALID_LIBRARY¶
Invalid compression library.
-
enumerator DTFFT_ERROR_COMPRESSION_NOT_USED¶
Compressed backends are not used for this plan.
-
enum dtfft_execute_t¶
This enum lists valid execute_type parameters that can be passed to dtfft_execute.
Values:
-
enumerator DTFFT_EXECUTE_FORWARD¶
Perform XYZ -> YZX -> ZXY plan execution (Forward)
-
enumerator DTFFT_EXECUTE_BACKWARD¶
Perform ZXY -> YZX -> XYZ plan execution (Backward)
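The layout names above can be read as a cyclic rotation of the dimension order. The sketch below assumes that each name lists dimensions fastest-varying first (Fortran order) and that the global sizes are unchanged by the transform, as for C2C and R2R plans; rotate_left and forward_shapes are illustrative helpers, not part of dtFFT.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 3-element shape left by one: (a, b, c) -> (b, c, a). */
static void rotate_left(const int32_t in[3], int32_t out[3])
{
    out[0] = in[1];
    out[1] = in[2];
    out[2] = in[0];
}

/* Fill the global shapes seen at each stage of a forward execution,
 * fastest-varying dimension first: XYZ, then YZX, then ZXY. */
static void forward_shapes(int32_t nx, int32_t ny, int32_t nz,
                           int32_t xyz[3], int32_t yzx[3], int32_t zxy[3])
{
    assert(nx > 0 && ny > 0 && nz > 0);
    xyz[0] = nx;
    xyz[1] = ny;
    xyz[2] = nz;
    rotate_left(xyz, yzx); /* X-aligned -> Y-aligned */
    rotate_left(yzx, zxy); /* Y-aligned -> Z-aligned */
}
```

A backward execution walks the same three shapes in the opposite order.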
-
enum dtfft_transpose_t¶
This enum lists valid transpose_type parameters that can be passed to dtfft_transpose.
Values:
-
enumerator DTFFT_TRANSPOSE_X_TO_Y¶
Transpose from Fortran X aligned to Fortran Y aligned.
-
enumerator DTFFT_TRANSPOSE_Y_TO_X¶
Transpose from Fortran Y aligned to Fortran X aligned.
-
enumerator DTFFT_TRANSPOSE_Y_TO_Z¶
Transpose from Fortran Y aligned to Fortran Z aligned.
-
enumerator DTFFT_TRANSPOSE_Z_TO_Y¶
Transpose from Fortran Z aligned to Fortran Y aligned.
-
enumerator DTFFT_TRANSPOSE_X_TO_Z¶
Transpose from Fortran X aligned to Fortran Z aligned.
Note
This value is valid only for 3D plans, and the value returned by dtfft_get_z_slab_enabled must be true.
-
enumerator DTFFT_TRANSPOSE_Z_TO_X¶
Transpose from Fortran Z aligned to Fortran X aligned.
Note
This value is valid only for 3D plans, and the value returned by dtfft_get_z_slab_enabled must be true.
-
enum dtfft_precision_t¶
This enum lists valid precision values that can be passed while creating a plan.
Values:
-
enumerator DTFFT_SINGLE¶
Use Single precision.
-
enumerator DTFFT_DOUBLE¶
Use Double precision.
-
enum dtfft_effort_t¶
This enum lists valid effort values that can be passed while creating a plan.
Values:
-
enumerator DTFFT_ESTIMATE¶
Create plan as fast as possible.
-
enumerator DTFFT_MEASURE¶
Attempts to find the best MPI grid decomposition.
Passing this flag together with an MPI communicator that has a Cartesian topology to dtfft_create_plan_* is the same as DTFFT_ESTIMATE.
-
enumerator DTFFT_PATIENT¶
Same as DTFFT_MEASURE, plus autotuning will try to find the best backend.
-
enumerator DTFFT_EXHAUSTIVE¶
Same as DTFFT_PATIENT, plus autotuning of all possible kernels and reshape backends to find the best configuration.
-
enum dtfft_executor_t¶
This enum lists available FFT executors.
Values:
-
enumerator DTFFT_EXECUTOR_NONE¶
Do not create any FFT plans.
Creates transpose only plan.
-
enumerator DTFFT_EXECUTOR_FFTW3¶
FFTW3 Executor (Host only)
-
enumerator DTFFT_EXECUTOR_MKL¶
MKL DFTI Executor (Host only)
-
enumerator DTFFT_EXECUTOR_CUFFT¶
CUFFT Executor (GPU Only)
-
enumerator DTFFT_EXECUTOR_VKFFT¶
VkFFT Executor (GPU Only)
-
enum dtfft_r2r_kind_t¶
This enum lists the different R2R FFT kinds.
Values:
-
enumerator DTFFT_DCT_1¶
DCT-I (Logical N=2*(n-1), inverse is
DTFFT_DCT_1)
-
enumerator DTFFT_DCT_2¶
DCT-II (Logical N=2*n, inverse is
DTFFT_DCT_3)
-
enumerator DTFFT_DCT_3¶
DCT-III (Logical N=2*n, inverse is
DTFFT_DCT_2)
-
enumerator DTFFT_DCT_4¶
DCT-IV (Logical N=2*n, inverse is
DTFFT_DCT_4)
-
enumerator DTFFT_DST_1¶
DST-I (Logical N=2*(n+1), inverse is
DTFFT_DST_1)
-
enumerator DTFFT_DST_2¶
DST-II (Logical N=2*n, inverse is
DTFFT_DST_3)
-
enumerator DTFFT_DST_3¶
DST-III (Logical N=2*n, inverse is
DTFFT_DST_2)
-
enumerator DTFFT_DST_4¶
DST-IV (Logical N=2*n, inverse is
DTFFT_DST_4)
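The logical sizes and inverse pairs listed above can be tabulated in a few lines. This is a sketch with stand-in names: r2r_kind and its KIND_* values mirror dtfft_r2r_kind_t for illustration only, and logical_size and inverse_kind are not dtFFT functions. If the chosen executor leaves transforms unnormalized (as FFTW does), a transform followed by its inverse scales the data by the logical size N.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the kinds listed above (stand-in names). */
typedef enum {
    KIND_DCT_1, KIND_DCT_2, KIND_DCT_3, KIND_DCT_4,
    KIND_DST_1, KIND_DST_2, KIND_DST_3, KIND_DST_4
} r2r_kind;

/* Logical DFT size N for a real array of physical length n. */
static size_t logical_size(r2r_kind kind, size_t n)
{
    switch (kind) {
    case KIND_DCT_1:
        assert(n >= 2); /* 2*(n-1) must be positive */
        return 2 * (n - 1);
    case KIND_DST_1:
        return 2 * (n + 1);
    default:
        return 2 * n; /* types II, III and IV */
    }
}

/* Kind that undoes `kind`, up to scaling. */
static r2r_kind inverse_kind(r2r_kind kind)
{
    switch (kind) {
    case KIND_DCT_2: return KIND_DCT_3;
    case KIND_DCT_3: return KIND_DCT_2;
    case KIND_DST_2: return KIND_DST_3;
    case KIND_DST_3: return KIND_DST_2;
    default:         return kind; /* types I and IV are self-inverse */
    }
}
```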
-
enum dtfft_backend_t¶
This enum lists the different available backend options.
Values:
-
enumerator DTFFT_BACKEND_MPI_DATATYPE¶
Backend that uses MPI datatypes.
This is default backend for Host platform.
Not recommended for GPU use, since it is far slower than the other backends. Not available for autotuning when effort is DTFFT_PATIENT on the CUDA platform.
-
enumerator DTFFT_BACKEND_MPI_P2P¶
MPI peer-to-peer algorithm.
-
enumerator DTFFT_BACKEND_MPI_P2P_PIPELINED¶
MPI peer-to-peer algorithm with overlapping data copying and unpacking.
-
enumerator DTFFT_BACKEND_MPI_A2A¶
MPI backend using MPI_Alltoallv.
-
enumerator DTFFT_BACKEND_MPI_RMA¶
MPI backend using one-sided communications.
-
enumerator DTFFT_BACKEND_MPI_RMA_PIPELINED¶
MPI backend using pipelined one-sided communications.
-
enumerator DTFFT_BACKEND_MPI_P2P_SCHEDULED¶
MPI peer-to-peer algorithm with scheduled communication.
-
enumerator DTFFT_BACKEND_MPI_P2P_FUSED¶
MPI peer-to-peer pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.
-
enumerator DTFFT_BACKEND_MPI_RMA_FUSED¶
MPI RMA pipelined algorithm with overlapping packing, exchange and unpacking with scheduled communication.
-
enumerator DTFFT_BACKEND_MPI_P2P_COMPRESSED¶
Extension of DTFFT_BACKEND_MPI_P2P_FUSED: data is compressed before sending and decompressed after receiving.
-
enumerator DTFFT_BACKEND_MPI_RMA_COMPRESSED¶
Extension of DTFFT_BACKEND_MPI_RMA_FUSED: data is compressed before sending and decompressed after receiving.
-
enumerator DTFFT_BACKEND_NCCL¶
NCCL backend.
-
enumerator DTFFT_BACKEND_NCCL_PIPELINED¶
NCCL backend with overlapping data copying and unpacking.
-
enumerator DTFFT_BACKEND_NCCL_COMPRESSED¶
NCCL backend that performs compression before data exchange and decompression after.
-
enumerator DTFFT_BACKEND_CUFFTMP¶
cuFFTMp backend
-
enumerator DTFFT_BACKEND_CUFFTMP_PIPELINED¶
cuFFTMp backend that uses additional buffer to avoid extra copy and gain performance
-
enumerator DTFFT_BACKEND_ADAPTIVE¶
Adaptive backend selection: during plan creation dtFFT benchmarks multiple backends and selects the fastest backend independently for each transpose/reshape operation.
The selection is fixed for the lifetime of the plan.
Note
Can only be used when effort >= DTFFT_PATIENT.
Note
Currently only available for the HOST execution platform.
-
enumerator DTFFT_BACKEND_NONE¶
Backend is not defined.
This value is used when no backend is selected, for example when executing on a single process.
Note
This value should never be set by user directly. It can only be returned by the library.
-
enum dtfft_transpose_mode_t¶
This enum specifies at which stage the local transposition is performed during global exchange.
It affects only Generic backends that perform explicit packing/unpacking.
Values:
-
enumerator DTFFT_TRANSPOSE_MODE_PACK¶
Perform transposition during the packing stage (Sender side).
-
enumerator DTFFT_TRANSPOSE_MODE_UNPACK¶
Perform transposition during the unpacking stage (Receiver side).
-
enum dtfft_access_mode_t¶
This enum lists valid access_mode parameters that can be passed to dtfft_config_t.
Values:
-
enumerator DTFFT_ACCESS_MODE_WRITE¶
Optimize for write access (Aligned writing).
This is the default mode.
-
enumerator DTFFT_ACCESS_MODE_READ¶
Optimize for read access (Aligned reading)
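One plausible reading of "aligned writing" versus "aligned reading" is which side of a local transposition is traversed contiguously. The sketch below illustrates that idea on a plain row-major 2D transpose; it is an interpretation under that assumption, not dtFFT's kernel, and both function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose a rows x cols row-major matrix.
 * Inner loop walks `in` contiguously; writes to `out` are strided. */
static void transpose_read_aligned(const double *in, double *out,
                                   size_t rows, size_t cols)
{
    assert(in && out && in != out);
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            out[j * rows + i] = in[i * cols + j];
}

/* Same result, but the inner loop walks `out` contiguously;
 * reads from `in` are strided. */
static void transpose_write_aligned(const double *in, double *out,
                                    size_t rows, size_t cols)
{
    assert(in && out && in != out);
    for (size_t j = 0; j < cols; ++j)
        for (size_t i = 0; i < rows; ++i)
            out[j * rows + i] = in[i * cols + j];
}
```

Both variants produce identical output; only the memory access pattern differs, which is why the faster choice depends on the hardware and is left to autotuning at the DTFFT_EXHAUSTIVE effort level.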
-
enum dtfft_platform_t¶
Enum that specifies the execution platform, such as Host, CUDA, or HIP.
Values:
-
enumerator DTFFT_PLATFORM_HOST¶
Host.
-
enumerator DTFFT_PLATFORM_CUDA¶
CUDA.
-
enum dtfft_reshape_t¶
This enum lists valid reshape_type parameters that can be passed to dtfft_reshape.
Values:
-
enumerator DTFFT_RESHAPE_X_BRICKS_TO_PENCILS¶
Reshape from X-bricks to X-pencils.
-
enumerator DTFFT_RESHAPE_X_PENCILS_TO_BRICKS¶
Reshape from X-pencils to X-bricks.
-
enumerator DTFFT_RESHAPE_Z_BRICKS_TO_PENCILS¶
Reshape from Z-bricks to Z-pencils.
-
enumerator DTFFT_RESHAPE_Z_PENCILS_TO_BRICKS¶
Reshape from Z-pencils to Z-bricks.
-
enumerator DTFFT_RESHAPE_Y_BRICKS_TO_PENCILS¶
Reshape from Y-bricks to Y-pencils. This is to be used in 2D Plans.
-
enumerator DTFFT_RESHAPE_Y_PENCILS_TO_BRICKS¶
Reshape from Y-pencils to Y-bricks. This is to be used in 2D Plans.
-
enum dtfft_layout_t¶
This enum represents different data layouts used in dtFFT and it should be used to retrieve layout information from plans.
Values:
-
enumerator DTFFT_LAYOUT_X_BRICKS¶
X-brick layout: data is distributed along all dimensions.
-
enumerator DTFFT_LAYOUT_X_PENCILS¶
X-pencil layout: data is distributed along Y and Z dimensions.
-
enumerator DTFFT_LAYOUT_X_PENCILS_FOURIER¶
X-pencil layout obtained after executing FFT for R2C plan: data is distributed along Y and Z dimensions.
-
enumerator DTFFT_LAYOUT_Y_PENCILS¶
Y-pencil layout: data is distributed along X and Z dimensions.
-
enumerator DTFFT_LAYOUT_Z_PENCILS¶
Z-pencil layout: data is distributed along X and Y dimensions.
-
enumerator DTFFT_LAYOUT_Z_BRICKS¶
Z-brick layout: data is distributed along all dimensions.
-
enum dtfft_compression_lib_t¶
This enum lists valid compression library parameters.
Values:
-
enumerator DTFFT_COMPRESSION_LIB_ZFP¶
ZFP compression library.
-
enum dtfft_compression_mode_t¶
This enum lists valid compression mode parameters.
Values:
-
enumerator DTFFT_COMPRESSION_MODE_LOSSLESS¶
Lossless compression mode.
-
enumerator DTFFT_COMPRESSION_MODE_FIXED_RATE¶
Fixed rate compression mode.
-
enumerator DTFFT_COMPRESSION_MODE_FIXED_PRECISION¶
Fixed precision compression mode.
-
enumerator DTFFT_COMPRESSION_MODE_FIXED_ACCURACY¶
Fixed accuracy compression mode.
Types¶
-
typedef void *dtfft_plan_t¶
Structure to hold plan data.
-
struct dtfft_pencil_t¶
Structure to hold pencil decomposition info.
There are two ways users might find pencils useful inside dtFFT:
To create a Plan using the user's own grid decomposition, by passing a Pencil to the Plan constructors.
To obtain a Pencil from a Plan in all possible layouts, in order to run an FFT not available in dtFFT.
To create a plan using dtfft_pencil_t, the user needs to provide ndims and the starts and counts arrays; other values will be ignored.
When a pencil is returned from dtfft_get_pencil, all pencil properties are defined.
See also
dtfft_get_pencil, dtfft_create_plan_r2r_pencil, dtfft_create_plan_c2c_pencil, dtfft_create_plan_r2c_pencil
Public Members
-
uint8_t dim¶
Aligned dimension ID starting from 1.
-
uint8_t ndims¶
Number of dimensions in a pencil.
-
int32_t starts[3]¶
Local starts in natural Fortran order.
If ndims == 2, then only the first two elements are defined.
-
int32_t counts[3]¶
Local counts in natural Fortran order.
If ndims == 2, then only the first two elements are defined.
-
size_t size¶
Total number of elements in a pencil.
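When building a pencil from a custom decomposition, each rank needs a start and a count per dimension, and the ranges of all ranks should tile the global space without gaps or overlaps. The sketch below shows one common way to compute such ranges along a single dimension, plus the element count as the product of counts. It assumes zero-based starts; block_range and pencil_size are illustrative helpers, not dtFFT functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split n points over nparts ranks as evenly as possible.
 * Rank `part` owns [*start, *start + *count); the first n % nparts
 * ranks receive one extra point. Starts are zero-based (assumption). */
static void block_range(int32_t n, int32_t nparts, int32_t part,
                        int32_t *start, int32_t *count)
{
    assert(n > 0 && nparts > 0 && part >= 0 && part < nparts);
    int32_t base = n / nparts;
    int32_t rem = n % nparts;
    *count = base + (part < rem ? 1 : 0);
    *start = part * base + (part < rem ? part : rem);
}

/* Number of local elements: product of the first ndims counts. */
static size_t pencil_size(uint8_t ndims, const int32_t counts[3])
{
    assert(ndims == 2 || ndims == 3);
    size_t size = 1;
    for (uint8_t d = 0; d < ndims; ++d)
        size *= (size_t)counts[d];
    return size;
}
```

Applying block_range independently to each distributed dimension yields the starts and counts values for a rank, with consecutive ranks owning adjacent, non-overlapping ranges.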
-
struct dtfft_config_t¶
Struct that can be used to set additional configuration parameters to dtFFT.
Public Members
-
bool enable_log¶
Should dtFFT print additional information or not.
Default is
false.
-
bool enable_z_slab¶
Enables Z-slab optimization.
Default is true.
One should consider disabling Z-slab optimization in order to resolve the DTFFT_ERROR_VKFFT_R2R_2D_PLAN error or when the underlying FFT implementation of the 2D plan is too slow.
In all other cases, Z-slab is considered to be always faster.
-
bool enable_y_slab¶
Enables Y-slab optimization.
Default is false.
If true, then dtFFT will skip the transpose step between Y and Z aligned layouts during the call to dtfft_execute.
One should consider disabling Y-slab optimization when the underlying FFT implementation of the 2D plan is too slow.
In all other cases, Y-slab is considered to be always faster.
-
int32_t n_measure_warmup_iters¶
Number of warmup iterations to execute during backend and kernel autotuning when effort level is DTFFT_MEASURE or higher.
Default is 2.
-
int32_t n_measure_iters¶
Number of iterations to execute during backend and kernel autotuning when effort level is DTFFT_MEASURE or higher.
Default is 5.
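The roles of the two iteration counts can be illustrated with a generic timing loop: warmup runs are executed but not timed, and the timed runs are averaged. This is a sketch of the general benchmarking pattern, not dtFFT's internal autotuning code; candidate_fn, measure and count_calls are illustrative names.

```c
#include <assert.h>
#include <time.h>

/* A candidate operation to benchmark (illustrative type). */
typedef void (*candidate_fn)(void *ctx);

/* Run n_warmup untimed calls, then return the average CPU seconds
 * per call over n_iters timed calls. */
static double measure(candidate_fn fn, void *ctx, int n_warmup, int n_iters)
{
    assert(fn && n_warmup >= 0 && n_iters > 0);
    for (int i = 0; i < n_warmup; ++i)
        fn(ctx); /* warm caches and lazy initialization; not timed */
    clock_t t0 = clock();
    for (int i = 0; i < n_iters; ++i)
        fn(ctx);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC / n_iters;
}

/* Toy candidate that only counts how often it was invoked. */
static void count_calls(void *ctx)
{
    ++*(int *)ctx;
}
```

With the defaults above (2 warmup and 5 measured iterations), each candidate would be invoked 7 times in such a loop, so raising either value increases plan creation time proportionally.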
-
dtfft_platform_t platform¶
Selects platform to execute plan.
Default is DTFFT_PLATFORM_HOST.
This option is only available when dtFFT is built with device support. Even when dtFFT is built with device support, it does not necessarily mean that all plans must be device-related. This enables a single library installation to support both host and CUDA plans.
Note
This option is only defined when dtFFT is built with CUDA support.
-
dtfft_stream_t stream¶
Main CUDA stream that will be used in dtFFT.
This parameter is a placeholder for the user to set a custom stream. The stream that is actually used by a dtFFT plan is returned by the dtfft_get_stream function. When the user sets a stream, the user is responsible for destroying it.
The stream must not be destroyed before the call to dtfft_destroy.
Note
This option is only defined when dtFFT is built with CUDA support.
-
dtfft_backend_t backend¶
Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.
Default for HOST platform is DTFFT_BACKEND_MPI_DATATYPE.
Default for CUDA platform is DTFFT_BACKEND_NCCL if NCCL is enabled, otherwise DTFFT_BACKEND_MPI_P2P.
-
dtfft_backend_t reshape_backend¶
Backend that will be used by dtFFT for data reshaping from bricks to pencils and vice versa when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.
Default for HOST platform is DTFFT_BACKEND_MPI_DATATYPE.
Default for CUDA platform is DTFFT_BACKEND_NCCL if NCCL is enabled, otherwise DTFFT_BACKEND_MPI_P2P.
-
bool enable_datatype_backend¶
Should DTFFT_BACKEND_MPI_DATATYPE be considered for autotuning when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is true.
This option works only when executing on a host.
-
bool enable_mpi_backends¶
Should MPI Backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is false.
This option applies to all DTFFT_BACKEND_MPI_* backends, except DTFFT_BACKEND_MPI_DATATYPE.
The following applies only to CUDA builds. MPI Backends are disabled by default during the autotuning process due to OpenMPI bug https://github.com/open-mpi/ompi/issues/12849. It was noticed that GPU memory is not freed completely during plan autotuning.
For example: 1024x1024x512 C2C, double precision, single GPU, using Z-slab optimization, with MPI backends enabled, plan autotuning will leak 8Gb of GPU memory. Without Z-slab optimization, running on 4 GPUs, it will leak 24Gb on each of the GPUs.
One of the workarounds is to disable MPI Backends by default, which is done here.
Another is to pass "--mca btl_smcuda_use_cuda_ipc 0" to mpiexec, but it was noticed that disabling CUDA IPC seriously affects the overall performance of MPI algorithms.
-
bool enable_pipelined_backends¶
Should pipelined backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is true.
-
bool enable_rma_backends¶
Should RMA backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is true.
-
bool enable_fused_backends¶
Should fused backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is true.
-
bool enable_nccl_backends¶
Should NCCL Backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is true.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_nvshmem_backends¶
Should NVSHMEM Backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is true.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_kernel_autotune¶
Should dtFFT try to optimize kernel launch parameters during plan creation when effort is below DTFFT_EXHAUSTIVE.
Default is false.
Kernel optimization is always enabled for the DTFFT_EXHAUSTIVE effort level. Setting this option to true enables kernel optimization for lower effort levels (DTFFT_ESTIMATE, DTFFT_MEASURE, DTFFT_PATIENT). This may increase plan creation time but can improve runtime performance. Since kernel optimization is performed without data transfers, the time increase is usually minimal.
-
bool enable_fourier_reshape¶
Should dtFFT execute reshapes from pencils to bricks and vice versa in Fourier space during calls to execute.
Default is false.
When enabled, data will be in brick layout in Fourier space, which may be useful for certain operations between forward and backward transforms. However, this requires additional data transpositions and will reduce overall FFT performance.
-
dtfft_transpose_mode_t transpose_mode¶
Specifies at which stage the local transposition is performed during global exchange when effort level is below DTFFT_EXHAUSTIVE.
Default is DTFFT_TRANSPOSE_MODE_PACK.
For the DTFFT_EXHAUSTIVE effort level, dtFFT will always choose the best transpose mode based on internal autotuning.
Note
This option only takes effect when platform is DTFFT_PLATFORM_HOST.
-
dtfft_access_mode_t access_mode¶
Specifies the memory access pattern (optimization target) for local transposition.
Default is DTFFT_ACCESS_MODE_WRITE.
This option allows the user to force a specific access mode (DTFFT_ACCESS_MODE_WRITE or DTFFT_ACCESS_MODE_READ) when autotuning is disabled. When autotuning is enabled (e.g. effort is DTFFT_EXHAUSTIVE), this option is ignored and the best access mode is selected automatically.
-
bool enable_compressed_backends¶
Should compressed backends be enabled when effort is DTFFT_PATIENT or DTFFT_EXHAUSTIVE.
Default is false.
Only fixed-rate compression can be used during autotuning, since it provides predictable performance characteristics and does not require data-dependent decisions at runtime. To enable compressed backends during autotuning, set this option to true, set the compression mode to DTFFT_COMPRESSION_MODE_FIXED_RATE and provide the desired compression rate.
-
dtfft_compression_config_t compression_config_transpose¶
Options for compression approach during transpositions.
-
dtfft_compression_config_t compression_config_reshape¶
Options for compression approach during reshape operations.
-
typedef void *dtfft_stream_t¶
dtFFT stream representation.
For the CUDA platform this should be cast from cudaStream_t.
Example
cudaStream_t stream;
cudaStreamCreate(&stream);
dtfft_stream_t dtfftStream = (dtfft_stream_t)stream;
-
typedef void *dtfft_request_t¶
Helper type to manage asynchronous operations.
-
struct dtfft_compression_config_t¶
Struct that specifies compression configuration.
Public Members
-
dtfft_compression_lib_t compression_lib¶
Compression library to use.
-
dtfft_compression_mode_t compression_mode¶
Compression mode to use.
-
double rate¶
Rate for
DTFFT_COMPRESSION_MODE_FIXED_RATE
-
int32_t precision¶
Precision for
DTFFT_COMPRESSION_MODE_FIXED_PRECISION
-
double tolerance¶
Tolerance for
DTFFT_COMPRESSION_MODE_FIXED_ACCURACY
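For fixed-rate compression the payload size is predictable in advance. The sketch below assumes that rate is expressed in bits per value, as in ZFP's own fixed-rate mode; whether dtFFT passes the value through unchanged is an assumption, and fixed_rate_bytes and compression_ratio are illustrative helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate compressed payload, in bytes, for n_values values stored
 * at rate_bits bits per value (rounded to the nearest byte). */
static size_t fixed_rate_bytes(size_t n_values, double rate_bits)
{
    assert(rate_bits > 0.0);
    return (size_t)((double)n_values * rate_bits / 8.0 + 0.5);
}

/* Ratio of uncompressed to compressed size for values of value_bytes
 * bytes each, e.g. sizeof(double). */
static double compression_ratio(size_t value_bytes, double rate_bits)
{
    assert(rate_bits > 0.0);
    return (double)value_bytes * 8.0 / rate_bits;
}
```

Under this assumption, a rate of 16 on double precision data would send a quarter of the original bytes, at the cost of lossy reconstruction.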
Functions¶
-
int32_t dtfft_get_version()¶
- Returns:
DTFFT_VERSION_CODE defined during library compilation
-
const char *dtfft_get_error_string(dtfft_error_t error_code)¶
Gets the string description of an error code.
- Parameters:
error_code – [in] Error code to convert to string
- Returns:
Error string explaining error.
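Code that must recover from failures, rather than abort as DTFFT_CALL does, can propagate the error code to its caller and log the message on the way. The sketch below shows that pattern with stand-in names so it compiles without the library: demo_error_t, DEMO_SUCCESS and demo_error_string play the roles of dtfft_error_t, DTFFT_SUCCESS and dtfft_get_error_string, while DEMO_TRY, demo_step and run_pipeline are purely hypothetical.

```c
#include <stdio.h>

/* Stand-ins so the sketch compiles without dtFFT. */
typedef enum { DEMO_SUCCESS = 0, DEMO_ERROR_INVALID_USAGE = 1 } demo_error_t;

static const char *demo_error_string(demo_error_t code)
{
    return code == DEMO_SUCCESS ? "success" : "invalid usage";
}

/* Evaluate `call` once; on failure, log where it happened and return
 * the error code from the enclosing function. */
#define DEMO_TRY(call)                                             \
    do {                                                           \
        demo_error_t err_ = (call);                                \
        if (err_ != DEMO_SUCCESS) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    demo_error_string(err_));                      \
            return err_;                                           \
        }                                                          \
    } while (0)

/* Hypothetical operation that fails for negative input. */
static demo_error_t demo_step(int value)
{
    return value < 0 ? DEMO_ERROR_INVALID_USAGE : DEMO_SUCCESS;
}

/* Stops at the first failing step and reports its code to the caller. */
static demo_error_t run_pipeline(int a, int b)
{
    DEMO_TRY(demo_step(a));
    DEMO_TRY(demo_step(b));
    return DEMO_SUCCESS;
}
```

In an MPI program, every rank has to agree on how to proceed after such a failure, which is one reason aborting is the simpler default.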
-
const char *dtfft_get_backend_string(dtfft_backend_t backend)¶
Returns null terminated string with name of backend provided as argument.
- Parameters:
backend – [in] Backend to represent
- Returns:
Character representation of backend.
-
const char *dtfft_get_precision_string(dtfft_precision_t precision)¶
Gets the string description of a precision level.
- Parameters:
precision – [in] Precision level to convert to string
- Returns:
String representation of dtfft_precision_t.
-
const char *dtfft_get_executor_string(dtfft_executor_t executor)¶
Gets the string description of an executor type.
- Parameters:
executor – [in] Executor type to convert to string
- Returns:
String representation of dtfft_executor_t.
-
dtfft_error_t dtfft_create_config(dtfft_config_t *config)¶
Sets default values to config.
- Parameters:
config – [out] Config to set default values into
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_set_config(const dtfft_config_t *config)¶
Sets configuration values for dtFFT.
To take effect, this function must be called before plan creation.
- Parameters:
config – [in] Config to set
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_backend_pipelined(const dtfft_backend_t backend, bool *is_pipe)¶
Returns true if passed backend is pipelined and false otherwise.
- Parameters:
backend – [in] Backend to check
is_pipe – [out] Flag
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Plan constructors¶
All plan constructors must be called after MPI_Init. Plan must be destroyed before call to MPI_Finalize.
-
dtfft_error_t dtfft_create_plan_r2r(int8_t ndims, const int32_t *dims, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)¶
Real-to-Real Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Array of size ndims containing global dimensions in reverse order. dims[0] must be the fastest varying
kinds – [in] Array of size ndims containing Real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor.
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_r2r_pencil(const dtfft_pencil_t *pencil, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)¶
Creates a Real-to-Real Plan using a pencil handle.
- Parameters:
pencil – [in] Pencil structure containing local dimensions and starts
kinds – [in] Array of size ndims containing Real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of the transform
effort – [in] Effort level for the plan creation
executor – [in] Executor to be used for the plan
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_c2c(int8_t ndims, const int32_t *dims, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)¶
Complex-to-Complex Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Array of size ndims containing global dimensions in reverse order
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor.
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_c2c_pencil(const dtfft_pencil_t *pencil, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)¶
Complex-to-Complex Plan constructor using a pencil structure.
- Parameters:
pencil – [in] Pencil handle
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of the transform
effort – [in] Effort level for the plan creation
executor – [in] Executor to be used for the plan
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_r2c(int8_t ndims, const int32_t *dims, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)¶
Real-to-Complex Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Array of size ndims containing global dimensions in reverse order
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of transform.
effort – [in] Effort level for the plan creation
executor – [in] Type of external FFT executor
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_r2c_pencil(const dtfft_pencil_t *pencil, MPI_Comm comm, dtfft_precision_t precision, dtfft_effort_t effort, dtfft_executor_t executor, dtfft_plan_t *plan)¶
Creates a Real-to-Complex Plan using a pencil structure.
- Parameters:
pencil – [in] Pencil structure containing local dimensions and starts
comm – [in] MPI communicator: MPI_COMM_WORLD or Cartesian communicator
precision – [in] Precision of the transform
effort – [in] Effort level for the plan creation
executor – [in] Executor to be used for the plan
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if plan was created, error code otherwise
Plan destructor¶
-
dtfft_error_t dtfft_destroy(dtfft_plan_t *plan)¶
Plan Destructor.
- Parameters:
plan – [inout] Plan handle
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Memory allocation¶
-
dtfft_error_t dtfft_mem_alloc(dtfft_plan_t plan, size_t alloc_bytes, void **ptr)¶
Allocates memory specific for this plan.
- Parameters:
plan – [in] Plan handle
alloc_bytes – [in] Number of bytes to allocate
ptr – [out] Allocated pointer
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_mem_free(dtfft_plan_t plan, void *ptr)¶
Frees memory specific for this plan.
- Parameters:
plan – [in] Plan handle
ptr – [inout] Allocated pointer
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Plan execution¶
-
dtfft_error_t dtfft_execute(dtfft_plan_t plan, void *in, void *out, dtfft_execute_t execute_type, void *aux)¶
Plan execution.
Neither in nor out is allowed to be NULL. The same pointer can safely be passed to both in and out.
Note
This function is not supported for transpose-only R2C plans.
- Parameters:
plan – [in] Plan handle
in – [inout] Incoming buffer
out – [out] Result buffer
execute_type – [in] Type of transform.
aux – [inout] Optional auxiliary buffer. Can be NULL. If NULL during the first call to this function, the auxiliary buffer is allocated internally and freed when dtfft_destroy is called. If provided, it must be at least dtfft_get_aux_bytes bytes.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_transpose(dtfft_plan_t plan, void *in, void *out, dtfft_transpose_t transpose_type, void *aux)¶
Transposes data in a single dimension, e.g. X align -> Y align
- Attention
in and out cannot be the same pointer.
- Parameters:
plan – [in] Plan handle
in – [inout] Incoming buffer
out – [out] Transposed buffer
transpose_type – [in] Type of transpose.
aux – [inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_transpose_start(dtfft_plan_t plan, void *in, void *out, dtfft_transpose_t transpose_type, void *aux, dtfft_request_t *request)¶
Starts an asynchronous transpose operation.
Note
Both in and out buffers must not be changed or freed until the call to dtfft_transpose_end.
- Parameters:
plan – [in] Plan handle
in – [inout] Incoming buffer
out – [out] Transposed buffer
transpose_type – [in] Type of transpose.
aux – [inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.
request – [out] Handle to manage the asynchronous operation.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_transpose_end(dtfft_plan_t plan, dtfft_request_t request)¶
Finalizes an asynchronous transpose operation.
- Parameters:
plan – [in] Plan handle
request – [inout] Handle to manage the asynchronous operation.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_reshape(dtfft_plan_t plan, void *in, void *out, dtfft_reshape_t reshape_type, void *aux)¶
Executes data reshape between brick and pencil decompositions.
- Parameters:
plan – [in] Plan handle
in – [inout] Input pointer
out – [out] Output pointer
reshape_type – [in] Type of reshape.
aux – [inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_reshape_start(dtfft_plan_t plan, void *in, void *out, dtfft_reshape_t reshape_type, void *aux, dtfft_request_t *request)¶
Starts an asynchronous reshape operation.
Note
Both in and out buffers must not be changed or freed until the call to dtfft_reshape_end.
- Parameters:
plan – [in] Plan handle
in – [inout] Input pointer
out – [out] Output pointer
reshape_type – [in] Type of reshape.
aux – [inout] Optional auxiliary buffer. Can be NULL. If provided, must be at least dtfft_get_alloc_size elements.
request – [out] Handle to manage the asynchronous operation.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_reshape_end(dtfft_plan_t plan, dtfft_request_t request)¶
Finalizes an asynchronous reshape operation.
- Parameters:
plan – [in] Plan handle
request – [inout] Handle to manage the asynchronous operation.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Plan information¶
-
dtfft_error_t dtfft_report(dtfft_plan_t plan)¶
Prints plan-related information to stdout.
- Parameters:
plan – [in] Plan handle
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_report_compression(dtfft_plan_t plan)¶
Prints compression-related information to stdout.
- Parameters:
plan – [in] Plan handle
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_local_sizes(dtfft_plan_t plan, int32_t *in_starts, int32_t *in_counts, int32_t *out_starts, int32_t *out_counts, size_t *alloc_size)¶
Get grid decomposition information.
Results may differ on different MPI processes
- Parameters:
plan – [in] Plan handle
in_starts – [out] Starts of local portion of data in real space in reversed order
in_counts – [out] Number of elements of local portion of data in real space in reversed order
out_starts – [out] Starts of local portion of data in Fourier space in reversed order
out_counts – [out] Number of elements of local portion of data in Fourier space in reversed order
alloc_size – [out] Minimum number of elements to be allocated for in and out buffers required by dtfft_execute, dtfft_transpose, or dtfft_reshape. Size of each element in bytes can be obtained by calling dtfft_get_element_size.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
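As a rough illustration of how the counts arrays can be used, the sketch below multiplies the entries of in_counts (or out_counts) to obtain the number of local grid points on the calling process. The helper local_point_count is hypothetical and not part of dtFFT; it assumes the counts array was already filled in by dtfft_get_local_sizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of dtFFT): multiplies the per-dimension
 * counts returned in in_counts or out_counts to get the number of local
 * grid points owned by this MPI process. */
static size_t local_point_count(const int32_t *counts, int ndims)
{
    assert(counts != NULL && ndims > 0);
    size_t total = 1;
    for (int d = 0; d < ndims; ++d) {
        assert(counts[d] >= 0);          /* counts are never negative */
        total *= (size_t)counts[d];
    }
    return total;
}
```

The result counts grid points only. It should not be used in place of alloc_size, which is the value this page designates for sizing the in and out buffers.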
-
dtfft_error_t dtfft_get_alloc_size(dtfft_plan_t plan, size_t *alloc_size)¶
Wrapper around dtfft_get_local_sizes to obtain number of elements only.
- Parameters:
plan – [in] Plan handle
alloc_size – [out] Minimum number of elements to be allocated for in and out buffers required by dtfft_execute, dtfft_transpose, or dtfft_reshape. Size of each element in bytes can be obtained by calling dtfft_get_element_size.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_alloc_bytes(dtfft_plan_t plan, size_t *alloc_bytes)¶
Returns minimum number of bytes required for in and out buffers.
This function combines two calls: dtfft_get_alloc_size and dtfft_get_element_size. The result is the minimum number of bytes to be allocated for in and out buffers required by dtfft_execute, dtfft_transpose, or dtfft_reshape.
- Parameters:
plan – [in] Plan handle
alloc_bytes – [out] Number of bytes required
- Returns:
DTFFT_SUCCESS on success or error code on failure.
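The relationship between element counts and byte counts can be sketched in plain C. The helper elements_to_bytes below is hypothetical and not part of dtFFT; it assumes the element count and element size were obtained from dtfft_get_alloc_size and dtfft_get_element_size, and it adds an overflow check that dtFFT itself may or may not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of dtFFT): converts an element count into
 * a byte count without overflowing size_t.
 * Returns 0 on success, -1 if the product would not fit in size_t. */
static int elements_to_bytes(size_t n_elements, size_t element_size,
                             size_t *bytes)
{
    assert(bytes != NULL);
    if (element_size != 0 && n_elements > SIZE_MAX / element_size) {
        return -1;                       /* product would overflow */
    }
    *bytes = n_elements * element_size;
    return 0;
}
```

In practice, calling dtfft_get_alloc_bytes directly is simpler; a helper like this is mainly useful when the element count is needed separately, for example for indexing.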
-
dtfft_error_t dtfft_get_aux_size(dtfft_plan_t plan, size_t *aux_size)¶
Gets the number of elements required for auxiliary buffer by dtfft_execute.
- Parameters:
plan – [in] Plan handle
aux_size – [out] Size of auxiliary buffer in elements.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_aux_bytes(dtfft_plan_t plan, size_t *aux_bytes)¶
Gets the number of bytes required for auxiliary buffer by dtfft_execute.
- Parameters:
plan – [in] Plan handle
aux_bytes – [out] Number of bytes required for auxiliary buffer.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_aux_size_reshape(dtfft_plan_t plan, size_t *aux_size)¶
Gets the number of elements required for auxiliary buffer by dtfft_reshape.
- Parameters:
plan – [in] Plan handle
aux_size – [out] Size of auxiliary buffer in elements.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_aux_bytes_reshape(dtfft_plan_t plan, size_t *aux_bytes)¶
Gets the number of bytes required for auxiliary buffer by dtfft_reshape.
- Parameters:
plan – [in] Plan handle
aux_bytes – [out] Number of bytes required for auxiliary buffer.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_aux_size_transpose(dtfft_plan_t plan, size_t *aux_size)¶
Gets the number of elements required for auxiliary buffer by dtfft_transpose.
- Parameters:
plan – [in] Plan handle
aux_size – [out] Size of auxiliary buffer in elements.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_aux_bytes_transpose(dtfft_plan_t plan, size_t *aux_bytes)¶
Gets the number of bytes required for auxiliary buffer by dtfft_transpose.
- Parameters:
plan – [in] Plan handle
aux_bytes – [out] Number of bytes required for auxiliary buffer.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_element_size(dtfft_plan_t plan, size_t *element_size)¶
Obtains number of bytes required to store single element by this plan.
- Parameters:
plan – [in] Plan handle
element_size – [out] Size of element in bytes
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_pencil(dtfft_plan_t plan, dtfft_layout_t layout, dtfft_pencil_t *pencil)¶
Obtains pencil information from plan.
This can be useful when the user wants to use their own FFT implementation that is unavailable in dtFFT.
- Parameters:
plan – [in] Plan handle
layout – [in] Required layout of the pencil
pencil – [out] Pencil data
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_z_slab_enabled(dtfft_plan_t plan, bool *is_z_slab_enabled)¶
Checks if plan is using Z-slab optimization.
If true, then the flags DTFFT_TRANSPOSE_X_TO_Z and DTFFT_TRANSPOSE_Z_TO_X are valid to pass to dtfft_transpose.
- Parameters:
plan – [in] Plan handle
is_z_slab_enabled – [out] Boolean value indicating whether Z-slab is used.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_y_slab_enabled(dtfft_plan_t plan, bool *is_y_slab_enabled)¶
Checks if plan is using Y-slab optimization.
If true, then dtFFT will skip the transpose step between Y and Z aligned layouts during the call to dtfft_execute.
- Parameters:
plan – [in] Plan handle
is_y_slab_enabled – [out] Boolean value indicating whether Y-slab is used.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_stream(dtfft_plan_t plan, dtfft_stream_t *stream)¶
Returns the stream associated with the dtFFT plan.
This can be either the stream passed by the user to dtfft_set_config or a stream created internally. Returns a NULL pointer if the plan’s platform is DTFFT_PLATFORM_HOST.
- Parameters:
plan – [in] Plan handle
stream – [out] CUDA stream associated with plan
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_backend(dtfft_plan_t plan, dtfft_backend_t *backend)¶
Returns the backend selected during autotune if effort is DTFFT_PATIENT.
If the effort passed to any create function is DTFFT_ESTIMATE or DTFFT_MEASURE, returns the value set by dtfft_set_config or the default value, which is DTFFT_BACKEND_NCCL for a CUDA build and DTFFT_BACKEND_MPI_DATATYPE for a host build.
- Parameters:
plan – [in] Plan handle
backend – [out] Selected backend
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_reshape_backend(dtfft_plan_t plan, dtfft_backend_t *backend)¶
Returns selected backend for reshape operations.
- Parameters:
plan – [in] Plan handle
backend – [out] Selected backend for reshape operations
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_platform(dtfft_plan_t plan, dtfft_platform_t *platform)¶
Returns plan execution platform.
- Parameters:
plan – [in] Plan handle
platform – [out] Plan platform
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_executor(dtfft_plan_t plan, dtfft_executor_t *executor)¶
Returns FFT executor used in plan.
- Parameters:
plan – [in] Plan handle
executor – [out] FFT Executor
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_precision(dtfft_plan_t plan, dtfft_precision_t *precision)¶
Returns precision of the plan.
- Parameters:
plan – [in] Plan handle
precision – [out] Precision of the plan
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_dims(dtfft_plan_t plan, int8_t *ndims, const int32_t *dims[])¶
Returns global dimensions of the plan.
Note
Do not free the dims array; it is freed when the dtfft_plan_t is destroyed.
- Parameters:
plan – [in] Plan handle
ndims – [out] Number of dimensions in plan. User can pass NULL if this value is not needed.
dims – [out] Pointer to an array of size ndims containing global dimensions in reverse order; dims[0] is the fastest varying. User can pass NULL if this value is not needed.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_grid_dims(dtfft_plan_t plan, int8_t *ndims, const int32_t *grid_dims[])¶
Returns grid decomposition dimensions of the plan.
Note
Do not free the grid_dims array; it is freed when the dtfft_plan_t is destroyed.
- Parameters:
plan – [in] Plan handle
ndims – [out] Number of dimensions in plan. User can pass NULL if this value is not needed.
grid_dims – [out] Pointer to an array of size ndims containing grid decomposition dimensions in reverse order; grid_dims[0] is the fastest varying and is always equal to 1. User can pass NULL if this value is not needed.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
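A small sanity check on the returned decomposition might look like the sketch below. The helper grid_dims_consistent is hypothetical and not part of dtFFT. It relies on the documented property that grid_dims[0] equals 1, and on the additional assumption (not stated on this page) that the product of all grid dimensions equals the number of processes in the plan's communicator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of dtFFT): returns 1 if grid_dims looks
 * consistent, 0 otherwise.
 * Documented property checked:  grid_dims[0] == 1.
 * ASSUMPTION checked:           product of grid_dims == comm_size. */
static int grid_dims_consistent(const int32_t *grid_dims, int ndims,
                                int comm_size)
{
    if (grid_dims == NULL || ndims < 1 || comm_size < 1) return 0;
    if (grid_dims[0] != 1) return 0;

    long long product = 1;
    for (int d = 0; d < ndims; ++d) {
        if (grid_dims[d] < 1) return 0;
        product *= grid_dims[d];
        if (product > comm_size) return 0;   /* also keeps product bounded */
    }
    return product == (long long)comm_size;
}
```

Here comm_size would typically come from MPI_Comm_size on the communicator passed at plan creation.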