Binary build script: Switch from timestamp to hash based .pyc files #1322
Merged
When we build Python binaries, the `make install` step automatically generates `.pyc` files for the Python stdlib, however:

- It generates these using the default `timestamp` invalidation mode, which does not work well with the CNB file timestamp normalisation behaviour.
- It generates `.pyc`s for all three optimisation levels (standard, `-O` and `-OO`), when the vast majority of apps only use the standard mode.

As such, this changes our builds to:

- Use one of the hash-based pyc invalidation modes to prevent the `.pyc`s from always being treated as outdated and so being regenerated at application boot.
- Ship only the standard optimisation level pycs (and not the `.opt-{1,2}.pyc` files), reducing build output by 18MB.

We use the `unchecked-hash` mode rather than `checked-hash` since it improves app startup times by ~5%, and is only an issue if manual edits are made to the stdlib, which is not something we support.

See:
https://docs.python.org/3/reference/import.html#cached-bytecode-invalidation
https://docs.python.org/3/library/compileall.html
https://peps.python.org/pep-0488/
https://peps.python.org/pep-0552/
https://github.com/python/cpython/blob/v3.10.4/Makefile.pre.in#L1603-L1629

GUS-W-10988998.
GUS-W-10989125.
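For readers unfamiliar with the invalidation modes, here is a minimal sketch of compiling a directory to `unchecked-hash` `.pyc` files at the standard optimisation level only, using the stdlib `compileall` API. It is not the build script from this PR: the helper names (`compile_unchecked_hash`, `pyc_flags`, `demo`) and the temporary `example.py` module are hypothetical and exist only for illustration. The flags check relies on the `.pyc` header layout described in PEP 552.

```python
import compileall
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def compile_unchecked_hash(source_dir: Path) -> bool:
    """Compile every .py under source_dir to an unchecked-hash .pyc.

    optimize=0 writes only the standard optimisation level, so no
    .opt-1.pyc or .opt-2.pyc files are produced.
    """
    return bool(
        compileall.compile_dir(
            str(source_dir),
            optimize=0,
            invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
            quiet=1,
        )
    )


def pyc_flags(pyc_path: Path) -> int:
    """Return the PEP 552 flags field (bytes 4-8 of the .pyc header).

    0 = timestamp-based, 1 = unchecked hash, 3 = checked hash.
    """
    header = pyc_path.read_bytes()[:16]
    return int.from_bytes(header[4:8], "little")


def demo() -> tuple[int, list[str]]:
    """Compile a throwaway module and report its flags and cache file names."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.py"  # hypothetical module, for illustration only
        src.write_text("VALUE = 1\n")
        assert compile_unchecked_hash(Path(tmp))
        # optimization="" selects the standard (non -O) cache file name.
        pyc = Path(importlib.util.cache_from_source(str(src), optimization=""))
        names = sorted(p.name for p in pyc.parent.iterdir())
        return pyc_flags(pyc), names
```

Calling `demo()` returns the flags value of the generated file together with the names found in the cache directory, which makes it easy to confirm that only one `.pyc` per module was written.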
The combined build output size reductions from this PR plus #1319, #1320 and #1321, are:
These size reductions reduce:
...plus they also reduce the chance of an app running into slug size limits when using heavier dependencies. In the future I plan to explore switching the archives to use zstd instead of gzip, for further size/performance wins.
doriskwan approved these changes May 9, 2022
This was referenced May 17, 2022
edmorley added a commit that referenced this pull request Apr 18, 2024
As part of the CNB multi-architecture support work, we need to change the Python runtime archive S3 URLs to include the architecture name.

In addition, for the CNB transition from "stacks" to "targets", it would be helpful to switch from stack ID references (such as `heroku-22`) in the URL scheme, to the distro name+version (eg `ubuntu` and `22.04`) available to CNBs via the CNB targets feature.

See:
https://github.com/buildpacks/spec/blob/buildpack/0.10/buildpack.md#targets-1

Rather than duplicate the Python archives on S3 under different filenames/locations, it makes sense to migrate this buildpack to the new archive names too, so the same S3 archives can be used by both this buildpack and the CNB.

Moving to new archive names/URLs also means we can safely regenerate all existing Python versions to pick up the changes in #1566 (and changes made in the past, such as #1319, #1320, #1321 and #1322), since we won't have to worry about overwriting the old archives (which is something we've typically avoided, since it isn't compatible with the model of being able to roll back to an older buildpack version to return to prior behaviour).

Since we're changing the S3 URLs anyway, now is also a good time to make another change that would otherwise cause churn in the S3 URLs again (which affects people that pin buildpack version): Switching archive compression format from gzip to Zstandard (something that we've been wanting to do for a while).

Zstandard (aka zstd) is a far superior compression format to gzip (smaller archives and much faster decompression), and is seeing widespread adoption across multiple ecosystems (eg APT packages, Docker images, web browsers etc).

See:
https://github.com/facebook/zstd
https://github.com/facebook/zstd/blob/dev/programs/README.md#usage-of-command-line-interface

Our base images already have `zstd` installed (and for Rust for the CNB, there is the [zstd](https://crates.io/crates/zstd) crate available), so it's an easy switch.

Various compression levels were tested using zstd's benchmarking feature and in the end the highest level of compression was picked, since:

1. Unlike some other compression algorithms, zstd's decompression speed is generally not affected by the compression level.
2. We only have to perform the compression once (when compiling Python).
3. Even at the highest compression ratio, it only takes 20 seconds to compress the Python archives compared to the 10 minutes it takes to compile Python itself (when using PGO+LTO).

For the Ubuntu 22.04 Python 3.12.3 archive, switching from gzip to zstd (level 22, with long window mode enabled) results in a 26% reduction in compressed archive size.

GUS-W-15158299.
GUS-W-15505556.
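As a rough illustration of "level 22, with long window mode enabled", the sketch below builds a `zstd` CLI invocation from Python. The commit message does not show the exact command used by the build script, so the helper names (`zstd_command`, `compress`) and the specific flag combination are assumptions. What the zstd CLI does document is that levels above 19 need `--ultra`, and that `--long` enables long distance matching.

```python
import shutil
import subprocess
from pathlib import Path


def zstd_command(archive: Path, level: int = 22, long_window: bool = True) -> list[str]:
    """Build a zstd CLI command that compresses `archive` to `<archive>.zst`.

    The flag selection is an assumption for illustration, not the
    buildpack's actual invocation.
    """
    if not 1 <= level <= 22:
        raise ValueError("zstd compression levels range from 1 to 22")
    cmd = ["zstd"]
    if level > 19:
        # Levels 20-22 are only accepted when --ultra is given.
        cmd.append("--ultra")
    cmd.append(f"-{level}")
    if long_window:
        # Long distance matching with the default window size.
        cmd.append("--long")
    # Overwrite any existing output and suppress progress output.
    cmd += ["--force", "--quiet", str(archive)]
    return cmd


def compress(archive: Path) -> Path:
    """Run zstd on `archive` if the CLI is installed; return the output path."""
    if shutil.which("zstd") is None:
        raise RuntimeError("the zstd CLI was not found on PATH")
    subprocess.run(zstd_command(archive), check=True)
    return archive.with_name(archive.name + ".zst")
```

For example, `zstd_command(Path("python.tar"))` yields the argument list for compressing a hypothetical `python.tar` at the highest level, while `compress` only runs it when the `zstd` binary is present.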