Welcome to dtFFT Documentation
Usage Guide
dtFFT (DataTyped Fast Fourier Transform) is a high-performance library designed for parallel data transpositions and optional Fast Fourier Transforms (FFTs) in multidimensional computing environments.
dtFFT aims to optimize the following cycles of transformations (forward and backward):
for the 2D case, and
for the 3D case, where \(X, Y, Z\) are the spatial dimensions of the data, with \(X\) being the fastest-varying index, and \(P_0, P_1, P_2, Q_1, Q_2, Q_2'\) are the numbers of processes in the corresponding directions.
Initially developed to perform zero-copy transpositions using custom MPI datatypes on CPU clusters, dtFFT leverages these efficient data structures to minimize memory overhead in distributed systems. However, as demand for GPU-accelerated computing grew, it became evident that MPI datatypes were suboptimal for GPU workflows. To address this limitation, an alternative approach was developed for GPU execution: instead of relying on custom datatypes, dtFFT compiles CUDA kernels at runtime with NVRTC, tailoring them to the specific plan and data layout.
The library supports MPI for distributed systems and GPU acceleration via CUDA, integrating seamlessly with external FFT libraries such as FFTW3, MKL DFTI, cuFFT, and VkFFT, or operating in transpose-only mode.
Whether you are working on CPU clusters or GPU-enabled nodes, dtFFT provides a flexible and efficient framework for scientific computing tasks requiring large-scale data transformations.
This documentation covers the essentials of building and using dtFFT. Please explore the sections below to get started.
Getting Started
To begin using dtFFT:
- Build the Library: Follow the instructions in Building the Library to compile dtFFT with your desired features (e.g., CUDA or FFTW3 support).
- Use the Library: Refer to the Usage Guide for step-by-step examples of creating plans, allocating memory, and executing transformations.
- Configure Runtime: Set environment variables as needed (see Environment Variables) to adjust logging or datatype selection.
Detailed API specifications are available in the Fortran, C, C++, and Python sections.
Contributing
Feedback, bug reports, and contributions are welcome. Please submit issues or pull requests via the project’s repository. For API-specific details, consult the respective language sections.
Table of Contents
- Building the Library
- Usage Guide
- Fortran API Reference
- C API Reference
- C++ API Reference
- Python API Reference
- Environment Variables
  - DTFFT_ENABLE_LOG
  - DTFFT_MEASURE_WARMUP_ITERS
  - DTFFT_MEASURE_ITERS
  - DTFFT_PLATFORM
  - DTFFT_BACKEND
  - DTFFT_RESHAPE_BACKEND
  - DTFFT_NCCL_BUFFER_REGISTER
  - DTFFT_ENABLE_Z_SLAB
  - DTFFT_ENABLE_Y_SLAB
  - DTFFT_ENABLE_MPI_DT
  - DTFFT_ENABLE_MPI
  - DTFFT_ENABLE_NCCL
  - DTFFT_ENABLE_NVSHMEM
  - DTFFT_ENABLE_PIPE
  - DTFFT_ENABLE_RMA
  - DTFFT_ENABLE_FUSED
  - DTFFT_ENABLE_COMPRESSED
  - DTFFT_ENABLE_KERNEL_AUTOTUNE
  - DTFFT_ENABLE_FOURIER_RESHAPE
  - DTFFT_TRANSPOSE_MODE
  - DTFFT_ACCESS_MODE
- Benchmark Overview