-
-
Notifications
You must be signed in to change notification settings - Fork 647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rise forth the PBS Provider from the ashes of bitrot #21422
Conversation
…mes may only contain lowercase alphanumeric characters or underscores: got "python-build-standalone
fingerprint=None, | ||
# One would normally set append_only_caches=PBS_APPEND_ONLY_CACHES | ||
# here, but it is already going to be injected into the pex | ||
# environment by PythonBuildStandaloneBinary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to avoid this FrozenDict exploding way down the line. I'm pretty sure this comment is truthful, but there was much head scratching along the way.
append_only_caches: FrozenDict[str, str] = FrozenDict( |
src/python/pants/backend/python/providers/python_build_standalone/scripts/generate_urls.py
Show resolved
Hide resolved
src/python/pants/backend/python/providers/python_build_standalone/rules.py
Outdated
Show resolved
Hide resolved
|
||
PBS_SANDBOX_NAME = ".python_build_standalone" | ||
PBS_NAMED_CACHE_NAME = "python_build_standalone" | ||
PBS_APPEND_ONLY_CACHES = FrozenDict({PBS_NAMED_CACHE_NAME: PBS_SANDBOX_NAME}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is potentially DRYing out both with the PythonBuildStandaloneBinary
adhoc rules and the pyenv provider. I would prefer to not block landing on either.
There is certainly more polish that could be done, but I think this is a useful feature that is going to be better served by being Added the reviewers from the original PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for picking this up!
src/python/pants/backend/python/providers/python_build_standalone/rules.py
Outdated
Show resolved
Hide resolved
) | ||
|
||
def get_all_pbs_pythons(self) -> dict[str, dict[str, PBSPythonInfo]]: | ||
all_pythons = load_pbs_pythons().copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copy
here is good, to avoid poisoning the cache if we can... What if we had the function return FrozenDict
or similar, or had a wrapper that did the copy
internally, so we don't have to remember to do it everywhere that mutates?
(This doesn't need to block merging, can be follow-up.)
level=LogLevel.DEBUG, | ||
input_digest=downloaded_python.digest, | ||
description=f"Install Python {python_version}", | ||
append_only_caches=PBS_APPEND_ONLY_CACHES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I feel the use of append-only/named-caches here isn't great, as I think it makes it incompatible with remote execution. In particular, even within a single session, we might run this on machine A to seed its named cache, and then execute downstream processes on machine B that don't have a seeded cache.
An alternative approach would be capturing digests and including them in sandboxes etc... but maybe this requires rearchitecting how the providers work.
So, definitely happy for it to be follow-up/filing a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I lean heavily on remote caching by way of REAPI at $dayjob, but have not used remote execution. )
Some cross-linking notes:
- This pattern is I think pre-existing at
append_only_caches=PythonBuildStandaloneBinary.APPEND_ONLY_CACHES, - From simple search, Add support for Pants hosted Go toolchain #20485 appears to be providing a (golang) toolchain without
append_only_caches
- Use a named_cache for downloading go modules #20552 foundered on the non-optional use of named_caches
- From the motivation for Use a named_cache for downloading go modules #20552 I'm worried that naively treating the O(thousands) of little files in a Python install as digests.
- Are named_caches used by REAPI at all today? reenable append-only caches during remote execution #10078
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some context, there is an REAPI-adajcent protocol called "Remote Asset API" which is designed to tell a remote execution server to fetch and download artifacts into a remote store. Pants does not implement it though.
Also, there is not append-only / named caches in remote execution unless a server operator takes actions to provide a persisted writeable directory to store that cache. If a sever operator does provide such a location, the Pants option --remote-execution-append-only-caches-base-path
can be set to let Pants make use of it. Most of the servers implementing REAPI are not going to provide that out of the box.
src/python/pants/backend/python/providers/python_build_standalone/scripts/generate_urls.py
Show resolved
Hide resolved
Some notes:
|
Thanks! In the spirit in which this PR was originally opened, and the accompanying review + emojis, I'm going to go ahead and land this. I've opened several cross linked issues for followup. |
Sweet, agreed. Thanks! |
( I still watch from the sidelines) @cburroughs thank you so much for picking this up ❤️❤️❤️ |
## Before this PR The initial implementation of the python-build-standalone provider for hermetic python in #21422 failed on MacOS since the `cp` flags specified were incompatible ## After this PR Update the logic for copying the python-build-standalone binary into the sandbox originally set up in #21422 to do a `cp` and `mv` dance to be atomic, and support macOS and Linux. ## Test Plan Existing test coverage (now that we're properly running them across multiple platforms after annotating existing tests w/ `@pytest.mark.platform_specific_behavior`) --------- Co-authored-by: Rahul Mehta <rmehta2@anduril.com> Co-authored-by: Huon Wilson <huon@exoflare.io>
Add Python Build Standalone (aka PBS) as another "Python Provider". Additionally, a script is provided that scrapes the GitHub API for release information.
Original work (most of it!) in #19462 by @thejcannon