Releases: szcompressor/cuSZp
V2.0.1
V2.0.0
This release contains the major updates from cuSZp1 (SC'23 paper) to cuSZp2 (SC'24 paper).
cuSZp V2.x has the following designs:
- One kernel function for compression/decompression.
- Outlier- and plain-fixed-length encoding mode.
- Using optimized memory access patterns in compression and decompression.
- Using latency control in global synchronization.
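To make the difference between the two encoding modes concrete, here is a minimal host-side C++ sketch. It rests on an assumption that these release notes do not confirm: that "outlier" mode stores the first value of a block separately (in whole bytes) so that the remaining values can share a smaller fixed bit width, while "plain" mode uses one bit width for the whole block. The names `bitsFor`, `magnitude`, `ModeChoice`, and `chooseMode` are hypothetical and are not part of the cuSZp API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bits needed to represent an unsigned magnitude (0 needs 0 bits).
inline int bitsFor(uint32_t v) {
    int bits = 0;
    while (v != 0) { ++bits; v >>= 1; }
    return bits;
}

// Absolute value computed in 64 bits so INT32_MIN does not overflow.
inline uint32_t magnitude(int32_t v) {
    const int64_t w = v;
    return static_cast<uint32_t>(w < 0 ? -w : w);
}

// Hypothetical result type: which mode is cheaper and its payload size.
// Sign bits cost the same in both modes, so they are left out.
struct ModeChoice {
    bool useOutlier;
    std::size_t payloadBits;
};

// Compare the payload cost of the two (assumed) modes for one block.
inline ModeChoice chooseMode(const int32_t* d, std::size_t n) {
    assert(d != nullptr || n == 0);
    if (n == 0) return ModeChoice{false, 0};

    uint32_t maxAll = 0, maxRest = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const uint32_t m = magnitude(d[i]);
        if (m > maxAll) maxAll = m;
        if (i > 0 && m > maxRest) maxRest = m;
    }

    // Plain mode: every element uses the width of the largest magnitude.
    const std::size_t plainBits = n * static_cast<std::size_t>(bitsFor(maxAll));

    // Outlier mode (assumed): first element in whole bytes, rest share a width.
    const std::size_t outlierBytes =
        (static_cast<std::size_t>(bitsFor(magnitude(d[0]))) + 7) / 8;
    const std::size_t outlierBits =
        8 * outlierBytes + (n - 1) * static_cast<std::size_t>(bitsFor(maxRest));

    if (outlierBits < plainBits) return ModeChoice{true, outlierBits};
    return ModeChoice{false, plainBits};
}
```

Under these assumptions, a block such as `{1000, 1, -2, 0, 1, 3, -1, 2}` costs 80 payload bits in plain mode but 30 in outlier mode, whereas a block of uniformly small values stays in plain mode.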
cuSZp V2.x has the following features:
- Ultra-fast end-to-end throughput (2x to 3x that of cuSZp V1.x).
- High compression ratio in various data patterns.
- F64 and F32 data types are supported.
- Executable binary, C/C++ API, and Python API are supported.
V1.1.0
This patch release moves padding into cuSZp kernel functions along with various kernel updates. Users can directly use the following APIs to perform compression and decompression on device pointers. In other words, extra cudaMalloc() and cudaMemcpy() for padding are no longer required, making cuSZp easier to deploy in inline compression tasks.
SZp_compress_deviceptr_f32();
SZp_decompress_deviceptr_f32();
This release can be regarded and evaluated as the final implementation of cuSZp V1.x (i.e., the [SC'23] paper).
V1.0.0
This release includes the design described in the [SC'23] paper.
cuSZp V1.x has the following designs:
- One kernel function for compression/decompression.
- Using fixed-length encoding as the core compression algorithm.
- Global synchronization is performed via a serial chain scan.
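The design points above can be illustrated with a small host-side C++ sketch. It is not the cuSZp kernel code: the block size, the function names (`quantize`, `fixedLength`, `blockBytes`, `blockOffsets`), and the byte layout (one sign bit plus a fixed number of magnitude bits per element) are assumptions made for illustration. The sketch quantizes values against an error bound, derives a per-block fixed bit width, and runs a serial exclusive scan over the block sizes to get each block's write offset, which is the role a chain scan plays when blocks are compressed in parallel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical block size; the real kernels pick their own.
constexpr std::size_t kBlockSize = 32;

// Map a value to an integer bin of width 2*eb, so that q * 2 * eb
// reconstructs the value to within roughly eb.
inline int32_t quantize(float v, float eb) {
    assert(eb > 0.0f);
    return static_cast<int32_t>(std::lround(v / (2.0f * eb)));
}

// Fixed length of a block: bits needed for its largest magnitude.
inline int fixedLength(const int32_t* q, std::size_t n) {
    uint32_t maxAbs = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const int64_t w = q[i];
        const uint32_t m = static_cast<uint32_t>(w < 0 ? -w : w);
        if (m > maxAbs) maxAbs = m;
    }
    int bits = 0;
    while (maxAbs != 0) { ++bits; maxAbs >>= 1; }
    return bits;
}

// Assumed payload layout: one sign bit plus `bits` magnitude bits per
// element, rounded up to whole bytes. An all-zero block stores nothing.
inline std::size_t blockBytes(int bits, std::size_t n) {
    if (bits == 0) return 0;
    return (n * static_cast<std::size_t>(bits + 1) + 7) / 8;
}

// Serial exclusive scan: block i starts where blocks 0..i-1 end.
inline std::vector<std::size_t> blockOffsets(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets(sizes.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        offsets[i] = running;
        running += sizes[i];
    }
    return offsets;
}
```

For example, blocks whose payloads are 16, 0, and 8 bytes start at offsets 0, 16, and 16. On a GPU each block's offset depends on all earlier blocks, which is why this step acts as a global synchronization point.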
cuSZp V1.x has the following features:
- Fast end-to-end throughput.
- High compression ratio for sparse and non-smooth datasets.
- F64 and F32 data types are supported.