Releases: szcompressor/cuSZp

V2.0.1

17 Nov 06:52
240240b

This patch release includes the implementation of the SC'24 compression tutorial. Specifically:

  • Updates cuSZp.h and cuSZp.cpp to increase compatibility.
  • Updates the float32 plain-mode compression kernel with a partial re-execution design, which reduces register usage.

V2.0.0

27 Oct 01:50

This release contains the major updates from cuSZp1 (SC'23 paper) to cuSZp2 (SC'24 paper).

cuSZp V2.x has the following design elements:

  • One kernel function for compression/decompression.
  • Outlier- and plain-fixed-length encoding mode.
  • Using optimized memory access patterns in compression and decompression.
  • Using latency control in global synchronization.
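
To make the difference between the two encoding modes concrete, here is a minimal host-side sketch that compares their bit cost for one block. It is not cuSZp's kernel code or storage format. It rests on stated assumptions: the first value of the block is treated as the outlier and stored in whole bytes, the remaining values share one bit width, and sign bits and headers are ignored. The names `bits_needed`, `ModeCost`, and `compare_modes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of bits needed to represent magnitude v (0 needs 0 bits).
unsigned bits_needed(uint32_t v) {
    unsigned bits = 0;
    while (v != 0) { ++bits; v >>= 1; }
    return bits;
}

// Bit cost of one block under each mode (hypothetical cost model).
struct ModeCost {
    std::size_t plain_bits;    // every value uses the widest value's bit width
    std::size_t outlier_bits;  // first value in whole bytes, rest share a width
    bool use_outlier;          // true if the outlier mode is strictly cheaper
};

ModeCost compare_modes(const std::vector<uint32_t>& mags) {
    assert(!mags.empty());
    unsigned all_bits = 0;   // width of the largest magnitude in the block
    unsigned rest_bits = 0;  // width of the largest magnitude excluding mags[0]
    for (std::size_t i = 0; i < mags.size(); ++i) {
        unsigned w = bits_needed(mags[i]);
        all_bits = std::max(all_bits, w);
        if (i > 0) rest_bits = std::max(rest_bits, w);
    }
    std::size_t plain = mags.size() * all_bits;
    // Assumption: the outlier (first value) is stored with a whole number of bytes.
    std::size_t outlier_bytes = (bits_needed(mags[0]) + 7) / 8;
    std::size_t outlier = 8 * outlier_bytes + (mags.size() - 1) * rest_bits;
    return {plain, outlier, outlier < plain};
}
```

Under this cost model, a block with one large leading value followed by small values is cheaper in the outlier mode, while a block of uniformly small values is cheaper in the plain mode.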

cuSZp V2.x has the following features:

  • Ultra-fast end-to-end throughput (2x to 3x that of cuSZp V1.x).
  • High compression ratio in various data patterns.
  • F64 and F32 data types are supported.
  • Executable binary, C/C++ API, and Python API are supported.

V1.1.0

25 Oct 02:31
f4df2f1

This release moves padding into the cuSZp kernel functions and includes various other kernel updates. Users can now call the following APIs directly on device pointers to perform compression and decompression. As a result, extra cudaMalloc() and cudaMemcpy() calls for padding are no longer required, which makes cuSZp easier to deploy in inline compression tasks.

  • SZp_compress_deviceptr_f32();
  • SZp_decompress_deviceptr_f32();

This release can be regarded and evaluated as the final implementation of cuSZp V1.x (i.e., the [SC'23] paper).

V1.0.0

03 Apr 07:32

This release includes the design described in the [SC'23] paper.

cuSZp V1.x has the following design elements:

  • One kernel function for compression/decompression.
  • Using fixed-length encoding as the core compression algorithm.
  • Global synchronization is performed via a serial chain scan.
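
The last two design points can be illustrated with a simplified host-side sketch. It is not the cuSZp kernel: it runs on the CPU and skips any prediction or differencing step. The byte layout is an assumption made for illustration (one header byte holding the bit width, then one sign bit plus the magnitude bits per value), as are the error-bound parameter `eb` and the helper names `quantize`, `fixed_length_bits`, `block_bytes`, and `chain_scan`. The sketch shows how a block's compressed size follows from its widest value, and why each block's write offset depends on the sizes of all blocks before it.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Quantize each value to an integer code so that |x - 2*eb*code| <= eb.
// `eb` is a hypothetical absolute error bound for this sketch.
std::vector<int32_t> quantize(const std::vector<float>& data, float eb) {
    assert(eb > 0.0f);
    std::vector<int32_t> codes;
    codes.reserve(data.size());
    for (float x : data) {
        codes.push_back(static_cast<int32_t>(std::lround(x / (2.0f * eb))));
    }
    return codes;
}

// Fixed-length encoding: all magnitudes in a block use the same bit width,
// chosen as the width of the largest magnitude in that block.
unsigned fixed_length_bits(const std::vector<int32_t>& block) {
    uint32_t max_mag = 0;
    for (int32_t c : block) {
        uint32_t mag = c < 0 ? 0u - static_cast<uint32_t>(c)
                             : static_cast<uint32_t>(c);
        if (mag > max_mag) max_mag = mag;
    }
    unsigned bits = 0;
    while (max_mag != 0) { ++bits; max_mag >>= 1; }
    return bits;
}

// Compressed size of one block under the assumed layout:
// 1 header byte, then (1 sign bit + `bits` magnitude bits) per value.
std::size_t block_bytes(const std::vector<int32_t>& block) {
    unsigned bits = fixed_length_bits(block);
    if (bits == 0) return 1;  // all-zero block: header only
    std::size_t payload_bits = block.size() * (bits + 1u);
    return 1 + (payload_bits + 7) / 8;
}

// Serial exclusive scan: block i's write offset is the sum of the sizes of
// blocks 0..i-1, so each offset depends on the one computed before it.
std::vector<std::size_t> chain_scan(const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> offsets(sizes.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        offsets[i] = running;
        running += sizes[i];
    }
    return offsets;
}
```

On a GPU the blocks are compressed concurrently, so the scan is the step where they must coordinate: no block can write its output until it knows where the preceding blocks end.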

cuSZp V1.x has the following features:

  • Fast end-to-end throughput.
  • High compression ratio for sparse and non-smooth datasets.
  • F64 and F32 data types are supported.