Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rise forth the PBS Provider from the ashes of bitrot #21422

Merged
merged 16 commits into from
Sep 20, 2024

Conversation

cburroughs
Copy link
Contributor

Add Python Build Standalone (aka PBS) as another "Python Provider". Additionally, a script is provided that scrapes the GitHub API for release information.

Original work (most of it!) in #19462 by @thejcannon

fingerprint=None,
# One would normally set append_only_caches=PBS_APPEND_ONLY_CACHES
# here, but it is already going to be injected into the pex
# environment by PythonBuildStandaloneBinary
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to avoid this FrozenDict exploding way down the line. I'm pretty sure this comment is truthful, but there was much head scratching along the way.

append_only_caches: FrozenDict[str, str] = FrozenDict(


PBS_SANDBOX_NAME = ".python_build_standalone"
PBS_NAMED_CACHE_NAME = "python_build_standalone"
PBS_APPEND_ONLY_CACHES = FrozenDict({PBS_NAMED_CACHE_NAME: PBS_SANDBOX_NAME})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is potentially DRYing out both with the PythonBuildStandaloneBinary adhoc rules and the pyenv provider. I would prefer to not block landing on either.

@cburroughs
Copy link
Contributor Author

There is certainly more polish that could be done, but I think this is a useful feature that is going to be better served by being experimental than sitting in a PR.

Added the reviewers from the original PR.

@cburroughs cburroughs marked this pull request as ready for review September 16, 2024 20:54
@cburroughs cburroughs self-assigned this Sep 16, 2024
Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking this up!

)

def get_all_pbs_pythons(self) -> dict[str, dict[str, PBSPythonInfo]]:
all_pythons = load_pbs_pythons().copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copy here is good, to avoid poisoning the cache if we can... What if we had the function return FrozenDict or similar, or had a wrapper that did the copy internally, so we don't have to remember to do it everywhere that mutates?

(This doesn't need to block merging, can be follow-up.)

level=LogLevel.DEBUG,
input_digest=downloaded_python.digest,
description=f"Install Python {python_version}",
append_only_caches=PBS_APPEND_ONLY_CACHES,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I feel the use of append-only/named-caches here isn't great, as I think it makes it incompatible with remote execution. In particular, even within a single session, we might run this on machine A to seed its named cache, and then execute downstream processes on machine B that don't have a seeded cache.

An alternative approach would be capturing digests and including them in sandboxes etc... but maybe this requires rearchitecting how the providers work.

So, definitely happy for it to be follow-up/filing a separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I lean heavily on remote caching by way of REAPI at $dayjob, but have not used remote execution. )

Some cross-linking notes:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some context, there is an REAPI-adajcent protocol called "Remote Asset API" which is designed to tell a remote execution server to fetch and download artifacts into a remote store. Pants does not implement it though.

Also, there is not append-only / named caches in remote execution unless a server operator takes actions to provide a persisted writeable directory to store that cache. If a sever operator does provide such a location, the Pants option --remote-execution-append-only-caches-base-path can be set to let Pants make use of it. Most of the servers implementing REAPI are not going to provide that out of the box.

@huonw
Copy link
Contributor

huonw commented Sep 16, 2024

Some notes:

  • good to get this in given the code basically all exists, but I note that uv-python-provider for managing Python installations #21362 is an idea that would allow pants to not have to enumerate all the possible versions by shifting that responsibility to an external tool (uv)
  • total brainstorm: is there a way to have the script automatically run on a schedule that vaguely matches PBS's so that we don't have to have humans remember to do it? Maybe something with GitHub actions and worker pants?

@cburroughs
Copy link
Contributor Author

Thanks! In the spirit in which this PR was originally opened, and the accompanying review + emojis, I'm going to go ahead and land this. I've opened several cross linked issues for followup.

@cburroughs cburroughs merged commit 8dd532e into pantsbuild:main Sep 20, 2024
25 checks passed
@huonw
Copy link
Contributor

huonw commented Sep 20, 2024

Sweet, agreed. Thanks!

@thejcannon
Copy link
Member

( I still watch from the sidelines)

@cburroughs thank you so much for picking this up ❤️❤️❤️

huonw added a commit that referenced this pull request Nov 4, 2024
## Before this PR
The initial implementation of the python-build-standalone provider for
hermetic python in #21422 failed
on MacOS since the `cp` flags specified were incompatible

## After this PR

Update the logic for copying the python-build-standalone binary into the
sandbox originally set up in
#21422 to do a `cp` and `mv`
dance to be atomic, and support macOS and Linux.

## Test Plan
Existing test coverage (now that we're properly running them across
multiple platforms after annotating existing tests w/
`@pytest.mark.platform_specific_behavior`)

---------

Co-authored-by: Rahul Mehta <rmehta2@anduril.com>
Co-authored-by: Huon Wilson <huon@exoflare.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants