Extremely fast checksum runner designed for big files and lots of CPU cores

nrminor/checkle

checkle: Extremely fast checksum runner for arbitrarily large batches of large files

A checksum utility for the multicore age. It's (going to be) so fast it will make you chuckle.

Overview

I work in genomics, which means I often transfer small handfuls of files from sequencing cores, where each file can be as large as half a terabyte. Checking the integrity of these files after transfer can be an arduous, time-consuming task. In my experience, bioinformaticians tackle this problem with shell or Python for loops that run a checksum command or some other single-threaded utility, then wait however long it takes for the integrity checks to finish before they get going with their analyses.

checkle aims to make this approach obsolete. It will perform checksums on batches of files transferred over the interwebs, using Merkle Trees to accelerate hashing on multicore machines.
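As a rough illustration of the idea, and not checkle's actual implementation, the sketch below splits a byte buffer into fixed-size chunks, hashes the chunks on several threads, and folds the resulting leaf digests into a single Merkle root. Everything named here is hypothetical (`hash_bytes`, `hash_pair`, `leaf_hashes`, `merkle_root`), and the standard library's `DefaultHasher` is only a stand-in so the example needs no crates: it is not a cryptographic hash and a real tool would swap in something stronger.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::thread;

/// Stand-in digest. NOT cryptographic; used only to keep the sketch crate-free.
fn hash_bytes(data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(data);
    h.finish()
}

/// Combine two child digests into their parent digest.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Hash every chunk of `data`, giving each worker thread a contiguous
/// group of chunks. `chunk_size` must be greater than zero.
fn leaf_hashes(data: &[u8], chunk_size: usize, workers: usize) -> Vec<u64> {
    let chunks: Vec<&[u8]> = data.chunks(chunk_size).collect();
    if chunks.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    let per_worker = (chunks.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Spawn all workers first, then join them in order so the
        // leaves come back in the same order as the chunks.
        let handles: Vec<_> = chunks
            .chunks(per_worker)
            .map(|group| {
                s.spawn(move || group.iter().map(|c| hash_bytes(c)).collect::<Vec<u64>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

/// Fold leaf digests pairwise until one root remains.
/// An unpaired node at the end of a level is promoted unchanged.
fn merkle_root(mut level: Vec<u64>) -> Option<u64> {
    if level.is_empty() {
        return None;
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| match pair {
                [l, r] => hash_pair(*l, *r),
                [only] => *only,
                _ => unreachable!(),
            })
            .collect();
    }
    Some(level[0])
}
```

Calling `merkle_root(leaf_hashes(&bytes, 1024, 4))` yields the same root regardless of the worker count, because only the leaf hashing is parallel and the leaves are reassembled in chunk order. Note that the root depends on the chunk size and on how odd nodes are handled, so both ends of a transfer would have to agree on those choices, and the result is not interchangeable with a plain whole-file digest.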

Development Goals

I have the following goals for checkle:

  • Find all recently transferred files based on a set of file attribute filters.
  • Spread hashing across as many (virtual) cores as possible using Merkle Trees (for the heads: checkle is a portmanteau of checksum and Merkle).
  • If a manifest of hashes from the source server is provided, spread post-transfer checksums across cores as well.
  • Support md5 for backward compatibility, along with at least one cryptographically secure hash function.
  • Be capable of reaching into tar and zip archives to checksum files without decompressing the whole archive.
  • Have an easy-to-use command line interface powered by clap.
  • Be easy to install, either through crates.io or with binaries for your platform of choice distributed in this repo.
  • Print a report to stdout on which files should be re-transferred.
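The manifest and report goals above could look something like the following sketch. It assumes, purely for illustration, that the manifest uses the `md5sum`-style layout of one `<digest>  <path>` pair per line; checkle's real manifest format is not specified here, and the function names `parse_manifest` and `files_to_retransfer` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parse manifest text into a path -> digest map.
/// Assumed layout: "<digest><whitespace><path>" per line; blank lines are skipped.
fn parse_manifest(text: &str) -> BTreeMap<String, String> {
    text.lines()
        .filter_map(|line| {
            let (digest, path) = line.trim().split_once(char::is_whitespace)?;
            Some((path.trim_start().to_string(), digest.to_ascii_lowercase()))
        })
        .collect()
}

/// Return every path in the manifest whose local digest is missing or different.
/// `actual` is expected to hold lowercase hex digests keyed by path.
fn files_to_retransfer(
    expected: &BTreeMap<String, String>,
    actual: &BTreeMap<String, String>,
) -> Vec<String> {
    expected
        .iter()
        .filter(|(path, digest)| actual.get(*path) != Some(*digest))
        .map(|(path, _)| path.clone())
        .collect()
}
```

A file that is listed in the manifest but absent locally is treated the same as a mismatch, since either way it needs to be transferred again. Using a `BTreeMap` keeps the report in sorted path order, which makes the output stable from run to run.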

checkle will be made available on crates.io when it reaches a reasonable level of stability.
