Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run python sources with a VenvPex #17700

Merged
merged 3 commits into from
Dec 5, 2022
Merged

Conversation

thejcannon
Copy link
Member

@thejcannon thejcannon commented Dec 1, 2022

This change makes it so we use a VenvPex to run Python sources, which is a speed boost (I measure a gain of about ~500ms, which is also quoted in pex.py).

In order to make this work (specifically ensuring we don't revert the fix for #12055) we now have to weave the complete pex environment through to VenvPexRequest.

@thejcannon
Copy link
Member Author

thejcannon commented Dec 1, 2022

I marked this as cherry-pick back to 2.14.x, since I'd love to get the perf boost when upgrading to a minor release. (Hopefully the merge won't be painful)

@benjyw
Copy link
Contributor

benjyw commented Dec 2, 2022

Cool change! I will review properly asap, but I'd be against the cherry-pick to 2.14 (or even 2.15). We really only want to backport bugfixes. This change is not a bugfix, and may be destabilizing.

@benjyw
Copy link
Contributor

benjyw commented Dec 2, 2022

I highly recommend deferring the local_dists part of this change to a separate PR. I think I see a problem there, and it will be much easier to reason about if we do it separately.

These are both potentially good changes, but they are also potentially disruptive, and if not reasoned about carefully could lead to subtle issues, so let's do one thing at a time?

@thejcannon thejcannon removed this from the 2.14.x milestone Dec 2, 2022
@thejcannon
Copy link
Member Author

IIRC the local dists did come out of a speed bump I hit, although might not be there anymore.

I'll split it out. While I'm doing that, come up with a unit test for what you're thinking, since everything passes currently.

@thejcannon
Copy link
Member Author

Ah I found the speed bump. PEX_PATH doesn't work with VenvPex shenanigans, and so the local_dists pex needs to be a part of the VenvPex. It's doable, so I'll do that.

@jsirois
Copy link
Contributor

jsirois commented Dec 2, 2022

PEX_PATH doesn't work with VenvPex shenanigans

@thejcannon I assume you ran into the fact that adding a PEX_PATH to a venv causes the venv to be re-built to include the items on the PEX_PATH in the new venv and so any older seed location is no longer the right one?

@thejcannon
Copy link
Member Author

thejcannon commented Dec 2, 2022

PEX_PATH doesn't work with VenvPex shenanigans

@thejcannon I assume you ran into the fact that adding a PEX_PATH to a venv causes the venv to be re-built to include the items on the PEX_PATH in the new venv and so any older seed location is no longer the right one?

Very specifically, I ran into a test failure where the code being run couldn't import modules from the local dist. What you describe would certainly explain the test failure, but I didn't look into it past "it failed, let's make it not fail". I'll keep this in mind for my own edification.

As a related aside, our fondness of pex env vars in our process executions (for things that could be folded into the PEX) is only slightly alarming. But perhaps thats to maximize a particular PEX's re-usability?

@jsirois
Copy link
Contributor

jsirois commented Dec 2, 2022

But perhaps thats to maximize a particular PEX's re-usability?

I think so. The number of perf hacks is alarming. Some day we'll get time to start to address the real costs of sandboxing and tune it specifically per-OS. Basically sandboxing just doesn't work, so we've added a ton of complexity to paper over that. Its sort of working, but pretty creaky.

@@ -99,12 +99,17 @@ async def generate_python_from_protobuf(
protoc_gen_mypy_script = "protoc-gen-mypy"
protoc_gen_mypy_grpc_script = "protoc-gen-mypy_grpc"
mypy_pex = None
complete_pex_env = pex_environment.in_sandbox(working_directory=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A drive-by fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, below (as you may have seen) we now need to pass complete_pex_env into VenvPexRequest. We could default it, but I like the explicitness of forcing callers to choose.

) -> VenvScriptWriter:
# N.B.: We don't know the working directory that will be used in any given
# invocation of the venv scripts; so we deal with working_directory once in an
# `adjust_relative_paths` function inside the script to save rule authors from having to do
# CWD offset math in every rule for all the relative paths their process depends on.
complete_pex_env = pex_environment.in_sandbox(working_directory=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you stripped the comment of its claim. The now CompletePexEnvironment may have a known working directory, which seems to suggest an assert complete_pex_env._working_directory is None, but that's obviously begging why let people pass this in at all?

Copy link
Member Author

@thejcannon thejcannon Dec 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm only partially grokking the comment, but it still feels true, albeit missing a qualifier.

Should it not start with "We may not know the ..."?

EDIT: No, if we may not know, then we don't know... so I think it's still OK

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and "why let people pass this in at all?": Because in the case of run, working_directory very much is set to something: the build root!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, missed that obvious case given the PR subject. As to the comment, it seems like "in any given invocation" -> "in any given sandbox invocation" then makes it True again with the creation of the complete_pex_env now delegated upstream. Not worth another CI burn though.

@benjyw
Copy link
Contributor

benjyw commented Dec 3, 2022

PS if splitting off the local dists part of the change is disproportionally hard to do, then we don't absolutely have to.

Copy link
Contributor

@benjyw benjyw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC I notice quite a few diff lines that were unrelated to the substantial purpose of the PR, but seem more related to a personal preference for inlining? Those can muddy the waters in a review, so for future changes I'd recommend considering whether that sort of subjective aesthetic choice is worth the diff lines... :)

@thejcannon
Copy link
Member Author

IIUC I notice quite a few diff lines that were unrelated to the substantial purpose of the PR, but seem more related to a personal preference for inlining? Those can muddy the waters in a review, so for future changes I'd recommend considering whether that sort of subjective aesthetic choice is worth the diff lines... :)

Can you give an example? I suspect everything you're noticing is a result of black formatting a line that has exceeded our configured line length.

# variables are not stripped.
"--no-strip-pex-env",

pex_request, sources = await MultiGet(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the example I was thinking of. AFAICT this diff is entirely due to inlining the two Gets into the MultiGet?

Copy link
Member Author

@thejcannon thejcannon Dec 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, sorry that'd a byproduct of several refactors where it was part of the earlier MultiGet (on line 51) which has them inlined.

I wish we had better lint tools and we could choose a convention 😮‍💨

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal at all, just something to think about in the future. I often do little cleanups like this in PRs and then realize it's contributing to the diff in ways that aren't directly pertinent to the change at hand...

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. That said, IIUC, this is a perf improvement that was actually along the lines of a footgun that let this slip out of the gate with poorer than needed perf. This PR just whacs the footgun mole and brings the perf up to par; so the emphasis in my mind here is that the existence of Pex and VenvPex is confusing and prone to perf errors.

@thejcannon
Copy link
Member Author

Couldn't agree more (solely in the context of Pants)

@thejcannon
Copy link
Member Author

And the naming doesn't help. I naively originally thought a --venv PEX was the speedup, instead of this particular hack/technique

@thejcannon thejcannon merged commit 780ff73 into pantsbuild:main Dec 5, 2022
@thejcannon thejcannon deleted the runvenv branch December 5, 2022 20:47
thejcannon added a commit to thejcannon/pants that referenced this pull request Dec 9, 2022
This change makes it so we use a `VenvPex` to run Python sources, which is a speed boost (I measure a gain of about ~500ms, which is also quoted in `pex.py`).

In order to make this work (specifically ensuring we don't revert the fix for pantsbuild#12055) we now have to weave the complete pex environment through to `VenvPexRequest`.
thejcannon added a commit that referenced this pull request Dec 9, 2022
This change makes it so we use a `VenvPex` to run Python sources, which is a speed boost (I measure a gain of about ~500ms, which is also quoted in `pex.py`).

In order to make this work (specifically ensuring we don't revert the fix for #12055) we now have to weave the complete pex environment through to `VenvPexRequest`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants