C API Reference¶
This page describes all types, functions and macros available in the dtFFT C API.
To use them, #include <dtfft.h>.
Note
Not all of the API listed below is available in every build.
For example, dtfft_create_plan_r2c() can only be used if dtFFT was compiled with an external FFT library.
Predefined Macros¶
-
DTFFT_VERSION_MAJOR¶
dtFFT Major Version
-
DTFFT_VERSION_MINOR¶
dtFFT Minor Version
-
DTFFT_VERSION_PATCH¶
dtFFT Patch Version
-
DTFFT_VERSION_CODE¶
dtFFT version code.
Can be used in version comparisons.
-
DTFFT_VERSION(X, Y, Z)¶
Generates a version code from the major (X), minor (Y) and patch (Z) numbers.
-
DTFFT_CALL(call)¶
Safe call macro.
Should be used to check error codes returned by dtFFT.
Writes an error message to stderr and calls MPI_Abort if an error occurs.
Example
DTFFT_CALL( dtfft_transpose(plan, a, b, DTFFT_TRANSPOSE_X_TO_Y) )
Enumerators¶
-
enum dtfft_error_t¶
This enum lists the error codes that dtFFT can return.
Values:
-
enumerator DTFFT_SUCCESS¶
Successful execution.
-
enumerator DTFFT_ERROR_MPI_FINALIZED¶
MPI_Init has not been called, or MPI_Finalize has already been called.
-
enumerator DTFFT_ERROR_PLAN_NOT_CREATED¶
Plan not created.
-
enumerator DTFFT_ERROR_INVALID_TRANSPOSE_TYPE¶
Invalid transpose_type provided.
-
enumerator DTFFT_ERROR_INVALID_N_DIMENSIONS¶
Invalid number of dimensions provided.
Valid options are 2 and 3.
-
enumerator DTFFT_ERROR_INVALID_DIMENSION_SIZE¶
One or more provided dimension sizes <= 0.
-
enumerator DTFFT_ERROR_INVALID_COMM_TYPE¶
Invalid communicator type provided.
-
enumerator DTFFT_ERROR_INVALID_PRECISION¶
Invalid precision parameter provided.
-
enumerator DTFFT_ERROR_INVALID_EFFORT¶
Invalid effort parameter provided.
-
enumerator DTFFT_ERROR_INVALID_EXECUTOR¶
Invalid executor parameter provided.
-
enumerator DTFFT_ERROR_INVALID_COMM_DIMS¶
The number of dimensions in the provided Cartesian communicator is greater than the number of dimensions passed to the plan constructor.
-
enumerator DTFFT_ERROR_INVALID_COMM_FAST_DIM¶
The provided Cartesian communicator has more than one process in the first (fastest varying) dimension.
-
enumerator DTFFT_ERROR_MISSING_R2R_KINDS¶
For R2R plans, the kinds parameter must be passed if executor != DTFFT_EXECUTOR_NONE.
-
enumerator DTFFT_ERROR_INVALID_R2R_KINDS¶
Invalid values detected in the kinds parameter.
-
enumerator DTFFT_ERROR_R2C_TRANSPOSE_PLAN¶
Transpose-only plans (executor DTFFT_EXECUTOR_NONE) are not supported for R2C; use an R2R or C2C plan instead.
-
enumerator DTFFT_ERROR_INPLACE_TRANSPOSE¶
In-place transpose is not supported.
-
enumerator DTFFT_ERROR_INVALID_AUX¶
Invalid aux buffer provided.
-
enumerator DTFFT_ERROR_INVALID_DIM¶
Invalid dim passed to dtfft_get_pencil.
-
enumerator DTFFT_ERROR_INVALID_USAGE¶
Invalid API Usage.
-
enumerator DTFFT_ERROR_PLAN_IS_CREATED¶
Attempt to create a plan that has already been created.
-
enumerator DTFFT_ERROR_R2R_FFT_NOT_SUPPORTED¶
Selected executor does not support R2R FFTs.
-
enumerator DTFFT_ERROR_ALLOC_FAILED¶
Internal call of dtfft_mem_alloc failed.
-
enumerator DTFFT_ERROR_FREE_FAILED¶
Internal call of dtfft_mem_free failed.
-
enumerator DTFFT_ERROR_INVALID_ALLOC_BYTES¶
Invalid alloc_bytes provided.
-
enumerator DTFFT_ERROR_DLOPEN_FAILED¶
Failed to dynamically load library.
-
enumerator DTFFT_ERROR_DLSYM_FAILED¶
Failed to dynamically load symbol.
-
enumerator DTFFT_ERROR_R2C_TRANSPOSE_CALLED¶
Calling dtfft_transpose for an R2C plan is not allowed.
-
enumerator DTFFT_ERROR_PENCIL_ARRAYS_SIZE_MISMATCH¶
Sizes of the starts and counts arrays passed to the dtfft_pencil_t constructor do not match.
-
enumerator DTFFT_ERROR_PENCIL_ARRAYS_INVALID_SIZES¶
Sizes of the starts and counts arrays provided to the dtfft_pencil_t constructor are < 2 or > 3.
-
enumerator DTFFT_ERROR_PENCIL_INVALID_COUNTS¶
Invalid counts provided to the dtfft_pencil_t constructor.
-
enumerator DTFFT_ERROR_PENCIL_INVALID_STARTS¶
Invalid starts provided to the dtfft_pencil_t constructor.
-
enumerator DTFFT_ERROR_PENCIL_SHAPE_MISMATCH¶
Processes have the same lower bounds but different sizes in some dimensions.
-
enumerator DTFFT_ERROR_PENCIL_OVERLAP¶
Pencil overlap detected, i.e. two processes share the same part of the global space.
-
enumerator DTFFT_ERROR_PENCIL_NOT_CONTINUOUS¶
Local pencils do not cover the global space without gaps.
-
enumerator DTFFT_ERROR_PENCIL_NOT_INITIALIZED¶
Pencil is not initialized, i.e. the constructor subroutine was not called.
-
enumerator DTFFT_ERROR_INVALID_MEASURE_WARMUP_ITERS¶
Invalid n_measure_warmup_iters provided.
-
enumerator DTFFT_ERROR_INVALID_MEASURE_ITERS¶
Invalid n_measure_iters provided.
-
enumerator DTFFT_ERROR_GPU_INVALID_STREAM¶
Invalid stream provided.
-
enumerator DTFFT_ERROR_GPU_INVALID_BACKEND¶
Invalid GPU backend provided.
-
enumerator DTFFT_ERROR_GPU_NOT_SET¶
Multiple MPI processes located on the same host share the same GPU, which is not supported.
-
enumerator DTFFT_ERROR_VKFFT_R2R_2D_PLAN¶
When an R2R plan uses the vkFFT executor and Z-slab optimization, the R2R transform kinds must be the same in the X and Y directions.
-
enumerator DTFFT_ERROR_GPU_BACKENDS_DISABLED¶
Passed effort == DTFFT_PATIENT, but all GPU backends have been disabled by dtfft_config_t.
-
enumerator DTFFT_ERROR_NOT_DEVICE_PTR¶
One of the pointers passed to dtfft_execute or dtfft_transpose cannot be accessed from the device.
-
enumerator DTFFT_ERROR_NOT_NVSHMEM_PTR¶
One of the pointers passed to dtfft_execute or dtfft_transpose is not an NVSHMEM pointer.
-
enumerator DTFFT_ERROR_INVALID_PLATFORM¶
Invalid platform provided.
-
enumerator DTFFT_ERROR_INVALID_PLATFORM_EXECUTOR_TYPE¶
Invalid executor provided for selected platform.
-
enum dtfft_execute_t¶
This enum lists valid execute_type parameters that can be passed to dtfft_execute.
Values:
-
enumerator DTFFT_EXECUTE_FORWARD¶
Perform XYZ –> YXZ –> ZXY plan execution (Forward)
-
enumerator DTFFT_EXECUTE_BACKWARD¶
Perform ZXY –> YXZ –> XYZ plan execution (Backward)
-
enum dtfft_transpose_t¶
This enum lists valid transpose_type parameters that can be passed to dtfft_transpose.
Values:
-
enumerator DTFFT_TRANSPOSE_X_TO_Y¶
Transpose from Fortran X aligned to Fortran Y aligned.
-
enumerator DTFFT_TRANSPOSE_Y_TO_X¶
Transpose from Fortran Y aligned to Fortran X aligned.
-
enumerator DTFFT_TRANSPOSE_Y_TO_Z¶
Transpose from Fortran Y aligned to Fortran Z aligned.
-
enumerator DTFFT_TRANSPOSE_Z_TO_Y¶
Transpose from Fortran Z aligned to Fortran Y aligned.
-
enumerator DTFFT_TRANSPOSE_X_TO_Z¶
Transpose from Fortran X aligned to Fortran Z aligned.
Note
This value is valid only for 3D plans, and only when dtfft_get_z_slab_enabled returns true.
-
enumerator DTFFT_TRANSPOSE_Z_TO_X¶
Transpose from Fortran Z aligned to Fortran X aligned.
Note
This value is valid only for 3D plans, and only when dtfft_get_z_slab_enabled returns true.
-
enum dtfft_precision_t¶
This enum lists valid precision values that can be passed when creating a plan.
Values:
-
enumerator DTFFT_SINGLE¶
Use Single precision.
-
enumerator DTFFT_DOUBLE¶
Use Double precision.
-
enum dtfft_effort_t¶
This enum lists valid effort values that can be passed when creating a plan.
Values:
-
enumerator DTFFT_ESTIMATE¶
Create plan as fast as possible.
-
enumerator DTFFT_MEASURE¶
Attempts to find the best MPI grid decomposition.
Passing this flag together with an MPI communicator that has a Cartesian topology to dtfft_create_plan_* is the same as passing DTFFT_ESTIMATE.
-
enumerator DTFFT_PATIENT¶
Same as DTFFT_MEASURE, plus cycling through various send and receive MPI datatypes.
For GPU builds this flag runs an autotuning procedure to find the best backend.
-
enum dtfft_executor_t¶
This enum lists available FFT executors.
Values:
-
enumerator DTFFT_EXECUTOR_NONE¶
Do not create any FFT plans.
Creates a transpose-only plan.
-
enumerator DTFFT_EXECUTOR_FFTW3¶
FFTW3 Executor (Host only)
-
enumerator DTFFT_EXECUTOR_MKL¶
MKL DFTI Executor (Host only)
-
enumerator DTFFT_EXECUTOR_CUFFT¶
CUFFT Executor (GPU Only)
-
enumerator DTFFT_EXECUTOR_VKFFT¶
VkFFT Executor (GPU Only)
-
enum dtfft_r2r_kind_t¶
This enum lists the different R2R FFT kinds.
Values:
-
enumerator DTFFT_DCT_1¶
DCT-I (Logical N=2*(n-1), inverse is
DTFFT_DCT_1)
-
enumerator DTFFT_DCT_2¶
DCT-II (Logical N=2*n, inverse is
DTFFT_DCT_3)
-
enumerator DTFFT_DCT_3¶
DCT-III (Logical N=2*n, inverse is
DTFFT_DCT_2)
-
enumerator DTFFT_DCT_4¶
DCT-IV (Logical N=2*n, inverse is
DTFFT_DCT_4)
-
enumerator DTFFT_DST_1¶
DST-I (Logical N=2*(n+1), inverse is
DTFFT_DST_1)
-
enumerator DTFFT_DST_2¶
DST-II (Logical N=2*n, inverse is
DTFFT_DST_3)
-
enumerator DTFFT_DST_3¶
DST-III (Logical N=2*n, inverse is
DTFFT_DST_2)
-
enumerator DTFFT_DST_4¶
DST-IV (Logical N=2*n, inverse is
DTFFT_DST_4)
-
enum dtfft_backend_t¶
This enum lists the available GPU backend options.
Values:
-
enumerator DTFFT_BACKEND_MPI_DATATYPE¶
Backend that uses MPI datatypes.
Not recommended, since it is much slower than the other backends. It is present mainly to show how slow MPI datatypes are for GPU usage.
-
enumerator DTFFT_BACKEND_MPI_P2P¶
MPI peer-to-peer algorithm.
-
enumerator DTFFT_BACKEND_MPI_P2P_PIPELINED¶
MPI peer-to-peer algorithm with overlapping data copying and unpacking.
-
enumerator DTFFT_BACKEND_MPI_A2A¶
MPI backend using MPI_Alltoallv.
-
enumerator DTFFT_BACKEND_NCCL¶
NCCL backend.
-
enumerator DTFFT_BACKEND_NCCL_PIPELINED¶
NCCL backend with overlapping data copying and unpacking.
-
enumerator DTFFT_BACKEND_CUFFTMP¶
cuFFTMp backend
-
enumerator DTFFT_BACKEND_CUFFTMP_PIPELINED¶
cuFFTMp backend that uses an additional buffer to avoid an extra copy and improve performance.
Types¶
-
typedef void *dtfft_plan_t¶
Structure to hold plan data.
-
struct dtfft_pencil_t¶
Structure to hold pencil decomposition info.
There are two ways users might find pencils useful in dtFFT:
To create a plan with the user's own grid decomposition, by passing a pencil to the plan constructors.
To obtain pencils from a plan in all possible layouts, in order to run an FFT that is not available in dtFFT.
To create a plan using dtfft_pencil_t, the user needs to provide the ndims, starts and counts members; other members are ignored.
When a pencil is returned from dtfft_get_pencil, all pencil members are defined.
See also
dtfft_get_pencil, dtfft_create_plan_r2r_pencil, dtfft_create_plan_c2c_pencil, dtfft_create_plan_r2c_pencil
Public Members
-
uint8_t dim¶
Aligned dimension id starting from 1.
-
uint8_t ndims¶
Number of dimensions in a pencil.
-
int32_t starts[3]¶
Local starts in natural Fortran order.
If ndims == 2, only the first two elements are defined.
-
int32_t counts[3]¶
Local counts in natural Fortran order.
If ndims == 2, only the first two elements are defined.
-
size_t size¶
Total number of elements in a pencil.
-
struct dtfft_config_t¶
Struct that can be used to pass additional configuration parameters to dtFFT.
Public Members
-
bool enable_log¶
Whether dtFFT should print additional information.
Default is false.
-
bool enable_z_slab¶
Whether dtFFT should use Z-slab optimization.
Default is true.
Consider disabling Z-slab optimization to resolve the DTFFT_ERROR_VKFFT_R2R_2D_PLAN error, or when the underlying 2D FFT implementation is too slow. In all other cases Z-slab is expected to be faster, since it reduces the number of data transpositions.
-
int32_t n_measure_warmup_iters¶
Defines the number of warmup iterations for transposition and data exchange to perform when effort exceeds DTFFT_ESTIMATE.
Default is 2. Setting this value higher may improve the accuracy of performance measurements, but also increases the time spent in warmup.
-
int32_t n_measure_iters¶
Defines the number of actual iterations for transposition and data exchange to perform when effort exceeds DTFFT_ESTIMATE.
Default is 5. Setting this value higher may improve the accuracy of performance measurements, but also increases the time spent in measurement.
-
dtfft_platform_t platform¶
Selects the platform on which the plan is executed.
Default is DTFFT_PLATFORM_HOST. Building dtFFT with device support does not mean that all plans must be device-related.
Note
This option is only defined when dtFFT is built with CUDA support.
-
dtfft_stream_t stream¶
Main CUDA stream used by dtFFT.
This parameter lets the user supply a custom stream; the stream actually used by a plan is returned by dtfft_get_stream. A user who sets a stream is responsible for destroying it, and must not destroy it before calling dtfft_destroy.
Note
This option is only defined when dtFFT is built with CUDA support.
-
dtfft_backend_t backend¶
Backend that will be used by dtFFT when effort is DTFFT_ESTIMATE or DTFFT_MEASURE.
Default is DTFFT_BACKEND_NCCL.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_mpi_backends¶
Whether MPI GPU backends should be enabled when effort is DTFFT_PATIENT.
Default is false.
MPI backends are disabled by default during autotuning because of an OpenMPI bug, https://github.com/open-mpi/ompi/issues/12849: it was observed that GPU memory is not completely freed during plan autotuning.
For example, for a 1024x1024x512 C2C plan in double precision on a single GPU with Z-slab optimization and MPI backends enabled, autotuning leaks 8 GB of GPU memory. Without Z-slab optimization, running on 4 GPUs, it leaks 24 GB on each GPU.
One workaround is to disable MPI backends by default, which is what dtFFT does.
Another is to pass --mca btl_smcuda_use_cuda_ipc 0 to mpiexec, but disabling CUDA IPC was observed to seriously degrade the overall performance of MPI algorithms.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_pipelined_backends¶
Whether pipelined GPU backends should be enabled when effort is DTFFT_PATIENT.
Default is true.
Note
Pipelined backends require an additional buffer that the user has no control over.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_nccl_backends¶
Whether NCCL backends should be enabled when effort is DTFFT_PATIENT.
Default is true.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_nvshmem_backends¶
Whether NVSHMEM backends should be enabled when effort is DTFFT_PATIENT.
Default is true.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool enable_kernel_optimization¶
Whether dtFFT should try to optimize the NVRTC kernel block size when effort is DTFFT_PATIENT.
Default is true. Enabling this option makes autotuning take longer, but may result in better performance for some problem sizes. It is recommended to keep this option enabled.
Note
This option is only defined when dtFFT is built with CUDA support.
-
int32_t n_configs_to_test¶
Number of thread block configurations with the best theoretical performance to test for transposition kernels when effort is DTFFT_PATIENT.
Default is 5. It is recommended to keep this value between 3 and 10. The maximum possible value is 25. Setting this value to zero or one disables kernel optimization.
Note
This option is only defined when dtFFT is built with CUDA support.
-
bool force_kernel_optimization¶
Whether to force kernel optimization when effort is not DTFFT_PATIENT.
Default is false. Since kernel optimization is performed without data transfers, the increase in overall autotuning time should not be significant.
Note
This option is only defined when dtFFT is built with CUDA support.
-
typedef void *dtfft_stream_t¶
dtFFT stream representation.
For the CUDA platform this should be cast from cudaStream_t.
Example
cudaStream_t stream;
cudaStreamCreate(&stream);
dtfft_stream_t dtfftStream = (dtfft_stream_t)stream;
Functions¶
-
int32_t dtfft_get_version()¶
- Returns:
DTFFT_VERSION_CODE defined during library compilation.
-
const char *dtfft_get_error_string(const dtfft_error_t error_code)¶
Gets the string description of an error code.
- Parameters:
error_code – [in] Error code to convert to string
- Returns:
Error string explaining error.
-
const char *dtfft_get_backend_string(const dtfft_backend_t backend)¶
Returns a null-terminated string with the name of the backend provided as argument.
- Parameters:
backend – [in] Backend to represent
- Returns:
Character representation of backend.
-
const char *dtfft_get_precision_string(const dtfft_precision_t precision)¶
Gets the string description of a precision level.
- Parameters:
precision – [in] Precision level to convert to string
- Returns:
String representation of
dtfft_precision_t.
-
const char *dtfft_get_executor_string(const dtfft_executor_t executor)¶
Gets the string description of an executor type.
- Parameters:
executor – [in] Executor type to convert to string
- Returns:
String representation of
dtfft_executor_t.
-
dtfft_error_t dtfft_create_config(dtfft_config_t *config)¶
Fills config with default values.
- Parameters:
config – [out] Config to set default values into
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_set_config(const dtfft_config_t *config)¶
Sets configuration values in dtFFT.
To take effect, it must be called before plan creation.
- Parameters:
config – [in] Config to set
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Plan constructors¶
All plan constructors must be called after MPI_Init. A plan must be destroyed before MPI_Finalize is called.
-
dtfft_error_t dtfft_create_plan_r2r(const int8_t ndims, const int32_t *dims, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)¶
Real-to-Real Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Array of size ndims containing global dimensions in reverse order; dims[0] must be the fastest varying
kinds – [in] Array of size ndims containing real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or a Cartesian communicator
precision – [in] Precision of the transform
effort – [in] How thoroughly dtFFT searches for the optimal plan
executor – [in] Type of external FFT executor
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if the plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_r2r_pencil(const dtfft_pencil_t *pencil, const dtfft_r2r_kind_t *kinds, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)¶
Creates a Real-to-Real Plan using a pencil handle.
- Parameters:
pencil – [in] Pencil structure containing local dimensions and starts
kinds – [in] Array of size ndims containing real FFT kinds in reverse order. Can be NULL if executor == DTFFT_EXECUTOR_NONE
comm – [in] MPI communicator: MPI_COMM_WORLD or a Cartesian communicator
precision – [in] Precision of the transform
effort – [in] Effort level for the plan creation
executor – [in] Executor to be used for the plan
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if the plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_c2c(const int8_t ndims, const int32_t *dims, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)¶
Complex-to-Complex Plan constructor.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Array of size ndims containing global dimensions in reverse order
comm – [in] MPI communicator: MPI_COMM_WORLD or a Cartesian communicator
precision – [in] Precision of the transform
effort – [in] How thoroughly dtFFT searches for the optimal plan
executor – [in] Type of external FFT executor
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if the plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_c2c_pencil(const dtfft_pencil_t *pencil, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)¶
Complex-to-Complex Plan constructor using a pencil structure.
- Parameters:
pencil – [in] Pencil handle
comm – [in] MPI communicator: MPI_COMM_WORLD or a Cartesian communicator
precision – [in] Precision of the transform
effort – [in] Effort level for the plan creation
executor – [in] Executor to be used for the plan
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if the plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_r2c(const int8_t ndims, const int32_t *dims, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)¶
Real-to-Complex Plan constructor.
Note
Parameter executor cannot be DTFFT_EXECUTOR_NONE. Use a C2C plan instead.
Note
This function is only present in the API when dtFFT was compiled with an external FFT library.
- Parameters:
ndims – [in] Number of dimensions: 2 or 3
dims – [in] Array of size ndims containing global dimensions in reverse order
comm – [in] MPI communicator: MPI_COMM_WORLD or a Cartesian communicator
precision – [in] Precision of the transform
effort – [in] How thoroughly dtFFT searches for the optimal plan
executor – [in] Type of external FFT executor
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if the plan was created, error code otherwise
-
dtfft_error_t dtfft_create_plan_r2c_pencil(const dtfft_pencil_t *pencil, MPI_Comm comm, const dtfft_precision_t precision, const dtfft_effort_t effort, const dtfft_executor_t executor, dtfft_plan_t *plan)¶
Creates a Real-to-Complex Plan using a pencil structure.
Note
Parameter executor cannot be DTFFT_EXECUTOR_NONE. Use a C2C plan instead.
Note
This function is only present in the API when dtFFT was compiled with an external FFT library.
- Parameters:
pencil – [in] Pencil structure containing local dimensions and starts
comm – [in] MPI communicator: MPI_COMM_WORLD or a Cartesian communicator
precision – [in] Precision of the transform
effort – [in] Effort level for the plan creation
executor – [in] Executor to be used for the plan
plan – [out] Plan handle ready to be executed
- Returns:
DTFFT_SUCCESS if the plan was created, error code otherwise
Plan destructor¶
-
dtfft_error_t dtfft_destroy(dtfft_plan_t *plan)¶
Plan Destructor.
- Parameters:
plan – [inout] Plan handle
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Memory allocation¶
-
dtfft_error_t dtfft_mem_alloc(dtfft_plan_t plan, size_t alloc_bytes, void **ptr)¶
Allocates memory specific to this plan.
- Parameters:
plan – [in] Plan handle
alloc_bytes – [in] Number of bytes to allocate
ptr – [out] Allocated pointer
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_mem_free(dtfft_plan_t plan, void *ptr)¶
Frees memory specific to this plan.
- Parameters:
plan – [in] Plan handle
ptr – [inout] Allocated pointer
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Plan execution¶
-
dtfft_error_t dtfft_execute(dtfft_plan_t plan, void *in, void *out, const dtfft_execute_t execute_type, void *aux)¶
Plan execution.
Neither in nor out is allowed to be NULL. The same pointer can safely be passed to both in and out.
- Parameters:
plan – [in] Plan handle
in – [inout] Incoming buffer
out – [out] Result buffer
execute_type – [in] Type of transform.
aux – [inout] Optional auxiliary buffer. Can be NULL. If it is NULL during the first call to this function, an auxiliary buffer is allocated internally and freed when dtfft_destroy is called.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_transpose(dtfft_plan_t plan, void *in, void *out, const dtfft_transpose_t transpose_type)¶
Transposes data in a single dimension, e.g. X aligned -> Y aligned.
- Attention
in and out cannot be the same pointer.
Note
This function is not supported for R2C plans. Use an R2R or C2C plan instead.
- Parameters:
plan – [in] Plan handle
in – [inout] Incoming buffer
out – [out] Transposed buffer
transpose_type – [in] Type of transpose.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
Plan information¶
-
dtfft_error_t dtfft_report(dtfft_plan_t plan)¶
Prints plan-related information to stdout.
- Parameters:
plan – [in] Plan handle
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_local_sizes(dtfft_plan_t plan, int32_t *in_starts, int32_t *in_counts, int32_t *out_starts, int32_t *out_counts, size_t *alloc_size)¶
Get grid decomposition information.
Results may differ between MPI processes.
- Parameters:
plan – [in] Plan handle
in_starts – [out] Starts of the local portion of data in real space, in reversed order
in_counts – [out] Number of elements of the local portion of data in real space, in reversed order
out_starts – [out] Starts of the local portion of data in fourier space, in reversed order
out_counts – [out] Number of elements of the local portion of data in fourier space, in reversed order
alloc_size – [out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling dtfft_get_element_size.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
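A sketch of using the reversed-order starts and counts returned by dtfft_get_local_sizes: it maps a local linear index of a 3D block to global coordinates, with index 0 as the fastest varying dimension. local_to_global and local_element_count are hypothetical helpers, and starts are assumed to be 0-based. The element count of the local block need not equal alloc_size, which is the minimum buffer size for in, out or aux.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: global coordinates of the element stored at local
   linear index idx. starts/counts are in reversed order (index 0 fastest). */
void local_to_global(const int32_t starts[3], const int32_t counts[3],
                     size_t idx, int32_t global[3])
{
  for (int d = 0; d < 3; d++) {
    global[d] = starts[d] + (int32_t)(idx % (size_t)counts[d]);
    idx /= (size_t)counts[d];
  }
}

/* Hypothetical helper: number of elements in the local block. */
size_t local_element_count(const int32_t counts[3])
{
  return (size_t)counts[0] * (size_t)counts[1] * (size_t)counts[2];
}
```

The arrays would typically be filled by dtfft_get_local_sizes(plan, in_starts, in_counts, out_starts, out_counts, &alloc_size), with each starts/counts array declared as int32_t[3].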
-
dtfft_error_t dtfft_get_alloc_size(dtfft_plan_t plan, size_t *alloc_size)¶
Wrapper around dtfft_get_local_sizes to obtain the number of elements only.
- Parameters:
plan – [in] Plan handle
alloc_size – [out] Minimum number of elements to be allocated for in, out or aux buffers. Size of each element in bytes can be obtained by calling dtfft_get_element_size.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_element_size(dtfft_plan_t plan, size_t *element_size)¶
Obtains the number of bytes required to store a single element for this plan.
- Parameters:
plan – [in] Plan handle
element_size – [out] Size of element in bytes
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_alloc_bytes(dtfft_plan_t plan, size_t *alloc_bytes)¶
Returns the minimum number of bytes required to execute the plan.
This function combines two calls: dtfft_get_alloc_size and dtfft_get_element_size.
- Parameters:
plan – [in] Plan handle
alloc_bytes – [out] Number of bytes required
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_pencil(dtfft_plan_t plan, int32_t dim, dtfft_pencil_t *pencil)¶
Obtains pencil information from plan.
This can be useful when the user wants to apply their own FFT implementation, for transforms that are unavailable in dtFFT.
- Parameters:
plan – [in] Plan handle
dim – [in] Required dimension:
0 for XYZ layout (real space, R2C only)
1 for XYZ layout (real space for C2C and R2R plans and fourier space for R2C plans)
2 for YXZ layout
3 for ZXY layout
pencil – [out] Pencil data
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_z_slab_enabled(dtfft_plan_t plan, bool *is_z_slab_enabled)¶
Checks if the plan is using Z-slab optimization.
If true, then DTFFT_TRANSPOSE_X_TO_Z and DTFFT_TRANSPOSE_Z_TO_X are valid to pass to dtfft_transpose.
- Parameters:
plan – [in] Plan handle
is_z_slab_enabled – [out] Set to true if Z-slab is used.
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_stream(dtfft_plan_t plan, dtfft_stream_t *stream)¶
Returns the stream associated with the dtFFT plan.
This can either be the stream passed by the user to dtfft_set_config or a stream created internally. Returns a NULL pointer if the plan's platform is DTFFT_PLATFORM_HOST.
- Parameters:
plan – [in] Plan handle
stream – [out] CUDA stream associated with plan
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_backend(dtfft_plan_t plan, dtfft_backend_t *backend)¶
Returns the GPU backend selected during autotuning if effort is DTFFT_PATIENT.
If effort passed to any create function is DTFFT_ESTIMATE or DTFFT_MEASURE, returns the value set by dtfft_set_config or the default value, DTFFT_BACKEND_NCCL.
- Parameters:
plan – [in] Plan handle
backend – [out] Selected backend
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_platform(dtfft_plan_t plan, dtfft_platform_t *platform)¶
Returns the plan execution platform.
- Parameters:
plan – [in] Plan handle
platform – [out] Plan platform
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_executor(dtfft_plan_t plan, dtfft_executor_t *executor)¶
Returns FFT executor used in plan.
- Parameters:
plan – [in] Plan handle
executor – [out] FFT Executor
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_precision(dtfft_plan_t plan, dtfft_precision_t *precision)¶
Returns precision of the plan.
- Parameters:
plan – [in] Plan handle
precision – [out] Precision of the plan
- Returns:
DTFFT_SUCCESS on success or error code on failure.
-
dtfft_error_t dtfft_get_dims(dtfft_plan_t plan, int8_t *ndims, const int32_t *dims[])¶
Returns global dimensions of the plan.
Note
Do not free the dims array; it is freed when the dtfft_plan_t is destroyed.
- Parameters:
plan – [in] Plan handle
ndims – [out] Number of dimensions in plan. User can pass NULL if this value is not needed.
dims – [out] Pointer to an array of size ndims containing global dimensions in reverse order; dims[0] is the fastest varying. User can pass NULL if this value is not needed.
- Returns:
DTFFT_SUCCESS on success or error code on failure.