Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add different dict modes to compression ratio regression test, update results.csv #2559

Merged
merged 3 commits into from
Mar 25, 2021

Conversation

senhuang42
Copy link
Contributor

@senhuang42 senhuang42 commented Mar 25, 2021

On the row-hash PR, DMS was silently broken until recently fixed/discovered - our coverage of tests on different dict modes (DMS, DDS, Copy, Load) is not particularly high. This should be a good first step in ensuring we detect such things. (The row-hash PR adds a DMS unit test in fuzzer.c as well, to check for regressions)

This PR:

  • Adds ratio regression tests to clevels >= 1 for each dictMode when testing using advanced API that can honor requested parameters.
  • Updates results.csv
    • New rows with new dictmode tests
    • Adjustments for improvements introduced with block splitter and levels 16, 19
    • Minor spelling fix
  • Fixes compiler warnings when building this regression test

Testing: I tested this on row-hash with a broken DMS, and indeed the compression results were bad.

.cli_args = "-" #x, \
.param_values = PARAM_VALUES(level_##x##_param_values_dictload), \
.use_dictionary = 1, \
.advanced_api_only = 1, \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I added the new configs to LEVEL() so we automatically get testing for all the levels we try.

An alternative approach is to have each of these just be their own config, with a fixed compression level (and if we wanted to test more clevels/strategies, we could just add more configs). But that does have the downside of not automatically hitting all the tested clevels.

I do prefer having these run on all the compression levels, in case in the future, strategies get changed in a way that might affect the dictionary strategies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

@senhuang42 senhuang42 force-pushed the add_dict_regression_tests_backup branch from d980071 to bbbd578 Compare March 25, 2021 18:11
@terrelln
Copy link
Contributor

Thanks for adding this Sen! Really glad to see increased coverage of dictionary compression.

@senhuang42 senhuang42 merged commit ab216bc into facebook:dev Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants