Improve compression ratio of the --patch-from mode #4288

Merged: 10 commits, merged on Feb 10, 2025


@Cyan4973 (Contributor) commented on Feb 8, 2025:

The --patch-from mode essentially combines --long mode with dictionary compression, in order to create a very small patch when compressing a new document based on an older revision of the same document.
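Part of why --long matters here is that the older revision must sit inside the match window for its content to be reachable. The sketch below shows the corresponding arithmetic: the smallest window log whose window covers a file of a given size. It is a minimal illustration under two stated assumptions: the helper name `window_log_to_cover` is hypothetical, and the clamping bounds (10 and 31) are chosen to mirror zstd's `ZSTD_WINDOWLOG_MIN` and `ZSTD_WINDOWLOG_MAX_64`. It is not the CLI's actual parameter-selection code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds, intended to mirror zstd's ZSTD_WINDOWLOG_MIN
 * and ZSTD_WINDOWLOG_MAX_64 limits. */
#define SKETCH_WINDOWLOG_MIN 10u
#define SKETCH_WINDOWLOG_MAX 31u

/* Hypothetical helper: smallest windowLog such that (1 << windowLog) >= size,
 * clamped to [SKETCH_WINDOWLOG_MIN, SKETCH_WINDOWLOG_MAX]. */
static unsigned window_log_to_cover(uint64_t size)
{
    unsigned log = 0;
    /* Grow the window until it spans the whole file (guard against overflow). */
    while (log < 63 && ((uint64_t)1 << log) < size)
        log++;
    if (log < SKETCH_WINDOWLOG_MIN) return SKETCH_WINDOWLOG_MIN;
    if (log > SKETCH_WINDOWLOG_MAX) return SKETCH_WINDOWLOG_MAX;
    return log;
}
```

For example, a 1 MiB reference file needs a window log of 20, and one byte more pushes it to 21. Windows that large are the territory where long-distance matching, rather than the regular match finders, does most of the work.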

The --long mode has advanced parameters that users can tune, but in practice rarely do. As a consequence, the default values are almost always used, resulting in suboptimal compression.

This patch dynamically adjusts the advanced parameters of the --long mode, spending more CPU time in exchange for a better compression ratio at higher levels.
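One way to see where the extra CPU time goes: assume the "rate" mentioned in the reviewed code comment further down is the long-distance-match hash rate log (exposed in libzstd as `ZSTD_c_ldmHashRateLog`). Under a simplified model where roughly one position in 2^rate is inserted into (and looked up in) the LDM hash table, lowering the rate multiplies that work. The functions below are hypothetical and model only this arithmetic, not zstd's gear-hash sampling.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model (assumption): about one position in 2^hashRateLog
 * is sampled into the long-distance-match hash table. */
static uint64_t ldm_insertions(uint64_t srcSize, unsigned hashRateLog)
{
    return srcSize >> hashRateLog;
}

/* How many times more hash-table work a smaller rate log implies. */
static uint64_t insertion_ratio(unsigned fromRateLog, unsigned toRateLog)
{
    assert(toRateLog <= fromRateLog);
    return (uint64_t)1 << (fromRateLog - toRateLog);
}
```

Under this model, going from rate 7 to rate 4 samples 8 times as many positions, which gives the match finder more candidates at the cost of more CPU time.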

For illustration, here is an extract of the updated performance for the following scenario, where `#` is the compression level:

`zstd -# -T6 ~/dev/bench/linux-v6.13.tar --patch-from ~/dev/bench/linux-v6.12.tar`

running on an Ubuntu desktop with an i7-9700K CPU:

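| Level | Strategy | dev time | PR time | dev cSize (bytes) | PR cSize (bytes) | Ratio improvement |
|------:|----------|---------:|--------:|------------------:|-----------------:|------------------:|
| 1 | fast | 2.81s | 2.99s | 7,806,350 | 7,619,530 | 2.45% |
| 3 | dfast | 2.83s | 3.01s | 7,526,673 | 7,347,628 | 2.44% |
| 5 | greedy | 2.85s | 3.51s | 7,992,373 | 7,429,686 | 7.57% |
| 6 | lazy | 2.86s | 3.52s | 7,719,580 | 7,233,837 | 6.71% |
| 8 | lazy2 | 2.89s | 3.69s | 7,450,059 | 6,960,293 | 7.04% |
| 13 | btlazy2 | 9.2s | 10s | 6,829,065 | 6,048,655 | 12.90% |
| 16 | btopt | 30s | 31s | 6,076,109 | 5,421,444 | 12.08% |
| 18 | btultra | 41s | 42s | 5,780,344 | 5,075,507 | 13.89% |
| 19 | btultra2 | 59s | 62s | 5,522,028 | 4,724,116 | 16.89% |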
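Because compression ratio is srcSize / cSize and both branches compress the same input, the "Ratio improvement" column reduces to dev cSize / PR cSize − 1. The sketch below recomputes it in integer basis points (1 bp = 0.01%) to avoid floating-point rounding questions. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Relative ratio gain of the PR over dev, in basis points, rounded half
 * away from zero: (devCSize / prCSize - 1) * 10000. */
static int64_t improvement_bp(int64_t devCSize, int64_t prCSize)
{
    assert(prCSize > 0);
    int64_t num = (devCSize - prCSize) * 10000; /* fits easily in 64 bits */
    if (num >= 0)
        return (num + prCSize / 2) / prCSize;
    return -((-num + prCSize / 2) / prCSize);
}
```

For level 1, `improvement_bp(7806350, 7619530)` yields 245, that is 2.45%, matching the table. For level 19 it yields 1689 (16.89%).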

Review thread on this excerpt of the diff:

```c
}
} else {
assert(1 <= (int)cParams->strategy && (int)cParams->strategy <= 9);
/* mapping: strat1 -> rate8 ... strat9 -> rate4*/
```
Contributor:

nit: Is this comment out of date?

Contributor (author):

Ah yes, indeed, it should be from 7 to 4 now.

To be updated.
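Reading the reply as "strategy 1 maps to rate 7 and strategy 9 maps to rate 4", one way to express such a mapping is sketched below. The formula `7 - strategy / 3` and the function name are illustrative assumptions, not the merged code. The strategy numbering follows zstd's `ZSTD_strategy` enum (1 = fast, 2 = dfast, 3 = greedy, ..., 9 = btultra2).

```c
#include <assert.h>

/* Hypothetical mapping from compression strategy (1..9) to a hash rate log:
 * fast -> 7 ... btultra2 -> 4. Lower rates mean denser sampling, i.e. more
 * CPU time for a better compression ratio. */
static unsigned sketch_rate_for_strategy(int strategy)
{
    assert(1 <= strategy && strategy <= 9);
    /* Integer division yields 7,7,6,6,6,5,5,5,4 for strategies 1..9. */
    return 7u - (unsigned)strategy / 3u;
}
```

A stepwise integer formula like this keeps the rate non-increasing as the strategy gets stronger, which is the property the comment in the excerpt describes.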

@Cyan4973 merged commit d84d70b into dev on Feb 10, 2025
100 checks passed
@Cyan4973 mentioned this pull request on Feb 18, 2025
3 participants