core-clp: Add CLI command to extract a compressed file as IR. #420

haiqi96 · 2024-05-30T15:01:12Z

References

Based on #417

Description

This change adds an IR decompression execution path to the clp executable.

The PR contains two notable changes:

The PR introduces a new command, clp i. The command lets the user decompress a file split into one or more IR files by providing the orig_file_id and a message index. It also lets the user pick a custom threshold for the uncompressed IR size and a directory to temporarily write IRs to.
Since the message_index and the orig_file_id uniquely identify a file split, we implemented simplified decompression logic in IrDecompression.cpp. Compared to the decompression.cpp,
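
As a rough illustration of why an orig_file_id plus a message index is enough to pick out one file split, here is a minimal Python sketch. It assumes each split records the index of its first message and its message count. The FileSplit type, its field names, and find_file_split are hypothetical and are not taken from IrDecompression.cpp or the metadata DB schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class FileSplit:
    # All field names are hypothetical, chosen for illustration only.
    orig_file_id: str
    archive_id: str
    begin_message_ix: int  # index of the split's first message in the original file
    num_messages: int


def find_file_split(
    splits: Sequence[FileSplit], orig_file_id: str, message_ix: int
) -> Optional[FileSplit]:
    """Return the split of `orig_file_id` that contains `message_ix`, or None."""
    for split in splits:
        if split.orig_file_id != orig_file_id:
            continue
        end_message_ix = split.begin_message_ix + split.num_messages
        # Splits of one file cover disjoint, half-open message ranges,
        # so at most one split can match.
        if split.begin_message_ix <= message_ix < end_message_ix:
            return split
    return None


if __name__ == "__main__":
    # One original file stored as three splits across two archives (made-up numbers).
    splits = [
        FileSplit("file-a", "archive-0", 0, 1000),
        FileSplit("file-a", "archive-0", 1000, 1000),
        FileSplit("file-a", "archive-1", 2000, 500),
    ]
    print(find_file_split(splits, "file-a", 1500))
```

Under these assumptions, any message index inside a split's range selects that split, so the caller never has to name an archive explicitly.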

Validation performed

To validate the functionality, we compressed a 64MB file into archive(s). We then decompressed it into multiple IRs, decoded and concatenated them, and did a binary comparison with the original file.

We used two configurations to cover all the possible cases:

Compressed a 64MB Hadoop log using a smaller encoded file size and archive size, such that the original file was split into 3 splits across 2 archives. We then decompressed all 3 IRs by running clp 3 times, each time with a different message index.
Compressed the 64MB Hadoop log using default settings, so only one file and one archive were generated. We then decompressed the IR using a 32MB threshold, generating 3 IRs on disk.
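
The final comparison step can be sketched as follows. This is a minimal Python sketch, assuming the IRs have already been decoded back to text files and that their paths are passed in message order. The function names and the 1 MiB read size are illustrative choices, not part of the PR.

```python
from pathlib import Path
from typing import Iterable, Iterator

READ_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def iter_concatenated_bytes(paths: Iterable[Path]) -> Iterator[bytes]:
    """Yield the bytes of each file in order, as if the files were concatenated."""
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(READ_SIZE):
                yield chunk


def matches_original(decoded_paths: Iterable[Path], original_path: Path) -> bool:
    """Return True if the concatenated decoded files are byte-identical to the original."""
    with open(original_path, "rb") as original:
        for chunk in iter_concatenated_bytes(decoded_paths):
            # A short read (original ended early) also fails this comparison.
            if original.read(len(chunk)) != chunk:
                return False
        # Leftover bytes mean the decoded output is shorter than the original.
        return original.read(1) == b""
```

Streaming the comparison avoids holding the whole 64MB log in memory; the decoded files must be listed in the same order as the messages they contain.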


kirkrodrigues

For the PR title, how about:

core-clp: Add CLI command to extract a file from an archive as IR.

haiqi96 · 2024-06-12T04:43:53Z

> For the PR title, how about:
>
> core-clp: Add CLI command to extract a file from an archive as IR.

How about core-clp: Add CLI command to extract a compressed file as IR.

"An archive" gives me the impression that the user needs to specify an archive.

kirkrodrigues · 2024-06-12T04:49:11Z

> > For the PR title, how about:
> > core-clp: Add CLI command to extract a file from an archive as IR.
>
> how about core-clp: Add CLI command to extract a compressed file as IR.

sgtm


haiqi96 added 30 commits May 15, 2024 19:16

Add combined file msg offset to the global metadata databases

ad69313

Add combined file message offset to reader side file and File metadata

d53afcd

Linter fix

12d39a9

Linter fix again

9f3c1cb

Fix comment

87456f5

address code review comments

73c4216

linter

437ee87

small fix

dc56550

Update msg_idx to message_ix for consistency

0a5cde1

Update msg_idx to message_ix for consistency

ad9bbd8

Further clean up

83eaaf1

Linter

7cd1ff6

Address code review comments

7700b8f

Add partial support for file ID & msg_ix querying

06e3975

Add support for global MySQL database

dd355ad

Conflicts: components/core/src/clp/GlobalMySQLMetadataDB.cpp

Replace file iterator with simpler function

8b09311

Get previous code's review changes.

abd1940

Linter

75fc71b

rebase and linter

f17e955

Update comments and function interfaces

58a28ce

Remove extra empty line

201b336

Resolve code review concerns

66fe18e

Revert string view

3e0da95

Merge branch 'main' into FileIDFilter

3246c61

Replace magic number with constexpr to avoid confusion

2781f8a

Update docstring

356736d

Add Ir decompression api

8b2d2b1

Linter

aa9377f

Retouch the core Archive to IR functions

85e928d

Merge remote-tracking branch 'origin/main' into ArchiveToIR

323e80e

haiqi96 added 2 commits June 7, 2024 22:06

Some refactor

0d8ef4f

update caller

2156987

haiqi96 force-pushed the ArchiveToIRCmd branch from c578324 to 2156987 Compare June 8, 2024 17:39

Merge branch 'main' into ArchiveToIRCmd

1e3fead

haiqi96 force-pushed the ArchiveToIRCmd branch from a457c44 to c1ca7aa Compare June 9, 2024 02:28

kirkrodrigues marked this pull request as ready for review June 10, 2024 14:27

kirkrodrigues self-requested a review June 10, 2024 21:52

Fix

c4a2c84

haiqi96 force-pushed the ArchiveToIRCmd branch from c1ca7aa to c4a2c84 Compare June 11, 2024 04:12

kirkrodrigues requested changes Jun 11, 2024


haiqi96 and others added 5 commits June 11, 2024 11:25

Apply suggestions from code review

44b1b44

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

Address code review concerns

7209ce4

Address more code review concerns

8b94242

Add docstrings

f430fd9

linter

9b9892d

haiqi96 requested a review from kirkrodrigues June 11, 2024 21:59

kirkrodrigues added 2 commits June 11, 2024 19:58

Fix inconsistent indent.

1960858

Fix as many clang-tidy warnings as possible; Refactor decompress_to_ir; Refactor new CLI args code.

0feef34

kirkrodrigues requested changes Jun 12, 2024


haiqi96 and others added 4 commits June 11, 2024 21:09

Apply suggestions from code review

d8d053e

Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>

reorder argument check

21a268c

update function docstring and signature

aa6a0ee

fix

dd91cc7

kirkrodrigues approved these changes Jun 12, 2024


haiqi96 changed the title ~~core-clp: add Archive to IR decompression as a command line option for clp~~ core-clp: Add CLI command to extract a compressed file as IR. Jun 12, 2024

haiqi96 merged commit d5fcd6b into y-scope:main Jun 12, 2024
11 checks passed

haiqi96 deleted the ArchiveToIRCmd branch June 28, 2024 14:43

jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024

core-clp: Add CLI command to extract a compressed file as IR. (y-scope#420)

619fdb0
