core-clp: Add CLI command to extract a compressed file as IR. #420
Conflicts: components/core/src/clp/GlobalMySQLMetadataDB.cpp
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
…r; Refactor new CLI args code.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
For the PR title, how about:
core-clp: Add CLI command to extract a file from an archive as IR.
sgtm
References
based on #417
Description
This change adds an IR decompression execution path to the clp executable.
The PR contains two notable changes:
clp i. The command allows the user to decompress a file split into one or more IR files by providing the orig_file_id and a message index. It also lets the user pick a custom threshold for the uncompressed IR size and a directory to temporarily write IRs to.
IrDecompression.cpp. Compared to decompression.cpp,
Validation performed
To validate the functionality, we compressed a 64MB file into archive(s). We then decompressed it into multiple IRs, decoded and concatenated them, and did a binary comparison with the original file.
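The concatenate-and-compare step can be sketched roughly as below. This is a minimal illustration, not the script used for this PR: it assumes the IRs have already been decoded to plain files, and the function names and file names are hypothetical.

```python
import tempfile
from pathlib import Path

CHUNK = 1 << 20  # read 1 MiB at a time so large logs need not fit in memory


def iter_bytes(paths):
    """Yield the contents of each file in order, in fixed-size pieces."""
    for path in paths:
        with open(path, "rb") as f:
            while piece := f.read(CHUNK):
                yield piece


def parts_match_original(original, parts):
    """Return True if the concatenation of parts is byte-identical to original."""
    with open(original, "rb") as orig:
        for piece in iter_bytes(parts):
            if orig.read(len(piece)) != piece:
                return False
        # Any leftover bytes mean the parts were shorter than the original.
        return orig.read(1) == b""


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        original = tmp / "original.log"
        original.write_bytes(b"line 1\nline 2\nline 3\n")
        # Hypothetical stand-ins for the decoded IR outputs, in message order.
        parts = [tmp / "part0.log", tmp / "part1.log"]
        parts[0].write_bytes(b"line 1\nline 2\n")
        parts[1].write_bytes(b"line 3\n")
        print(parts_match_original(original, parts))
```

The parts must be listed in message order; a mismatch in content, order, or total length makes the check fail.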
We used two configurations to cover all the possible cases:
Compressed a 64MB Hadoop log using a smaller encoded file size and archive size, so that the original file was split into 3 file splits across 2 archives. We then decompressed all 3 IRs by running clp 3 times, each time with a different message index.
Compressed the 64MB Hadoop log using default settings, so only one file and one archive were generated. We then decompressed the IR using a 32MB threshold, generating 3 IRs on disk.
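As a rough mental model of how a size threshold and a message index interact, the toy sketch below groups messages into chunks and looks a chunk up by message index. It is not clp's implementation. Two things are assumptions made only for illustration: that a chunk is closed once its size reaches the threshold, and that a message index selects the chunk containing that message. All names are hypothetical.

```python
def split_by_size(messages, threshold):
    """Group messages into chunks, closing a chunk once it reaches threshold bytes.

    Returns a list of (first_message_index, chunk_messages) pairs.
    """
    chunks = []
    current, current_size, first_idx = [], 0, 0
    for idx, msg in enumerate(messages):
        if not current:
            first_idx = idx  # remember where this chunk starts
        current.append(msg)
        current_size += len(msg)
        if current_size >= threshold:
            chunks.append((first_idx, current))
            current, current_size = [], 0
    if current:  # flush the final, possibly undersized chunk
        chunks.append((first_idx, current))
    return chunks


def find_chunk(chunks, msg_idx):
    """Return the (first_index, messages) chunk containing msg_idx, or None."""
    for first_idx, msgs in chunks:
        if first_idx <= msg_idx < first_idx + len(msgs):
            return first_idx, msgs
    return None


if __name__ == "__main__":
    messages = [b"12345678"] * 9  # nine 8-byte messages, 72 bytes in total
    chunks = split_by_size(messages, threshold=32)
    print([first for first, _ in chunks])
    print(find_chunk(chunks, 5)[0])
```

Concatenating the chunks in order of their first message index reproduces the original message sequence, which is the property the binary comparison in the validation relies on.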