-
Notifications
You must be signed in to change notification settings - Fork 555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invoking bazel with a different PATH re-fetches pip_repository #2551
Comments
Here is the function that causes the re-fetch: https://github.com/bazelbuild/rules_python/blob/main/python/private/repo_utils.bzl#L256-L282 def _which_unchecked(mrctx, binary_name):
"""Tests to see if a binary exists.
This is also watch the `PATH` environment variable.
Args:
binary_name: name of the binary to find.
mrctx: repository context.
Returns:
`struct` with attributes:
* `binary`: `repository_ctx.Path`
* `describe_failure`: `Callable | None`; takes no args. If the
binary couldn't be found, provides a detailed error description.
"""
path = _getenv(mrctx, "PATH", "")
binary = mrctx.which(binary_name)
if binary:
_watch(mrctx, binary)
describe_failure = None
else:
describe_failure = lambda: _which_describe_failure(binary_name, path)
return struct(
binary = binary,
describe_failure = describe_failure,
) The fix is simple. The result of the It's possible I'm missing the larger purpose of this code. Is it desirable to watch |
It's reasonable to ask what the expected behavior of Bazel (rulesets, etc.) should be when
I have to set this flag in my Years ago, the Bazel team tried to enable this flag by default, but it was reverted and they haven't tried since: bazelbuild/bazel#7026. So the expected behavior is unclear. As anecdotal evidence, I use lots of other rulesets (java, kotlin, js, go), and I've never observed repository rule re-fetching when |
Sent #2552 as a potential fix. Maintainers, happy to discuss on this issue or the PR. I'm not sure how/whether to write a test, or whether the change should be considered breaking. |
Thanks for the well researched and detailed report!
Huh. TIL. I wonder if this explains another report of frequent invalidation that has been hard to figure out.
I can't remember if this is just sloppy coding on my part, or if I intentionally did it. I rewrote that code quite a few times over a long time and lost track of the what/why of many changes. I'm more inclined to chalk this up to sloppy coding. I'm generally pretty good about adding comments when for a non-obvious side-effect type of thing like this. Assigning the variable in an outer block when it's only used in an inner block is also suspicious (I avoid doing that). If I did do it intentionally, then it'd be because...uhhhh...probably under the assumption that which() is based upon PATH, so changes to PATH should invalidate which() results. This makes sense for setting up a local toolchain, where a main use case is to use the environment to find So yes, I think it generally makes sense to not watch PATH for every which() call. I think #2552 makes sense.
Yeah, this is a tricky question in light of git's behavior to modify PATH. For a local toolchain, it makes sense to watch PATH, however, this would cause the same unnecessary invalidation if one were to run a python tool in a git hook (a pretty reasonable thing). Hrmph. I think in the abstract, rules don't care if PATH itself changes, they just care if the program they're looking for changed. Which, translated, is more like asking, "watch for if $program shows up in an earlier PATH directory". I'm not sure how, or if, we could approximate that on the Starlark side without watching PATH. It is probably an optimization that Bazel itself would need to implement. |
This is hard to google for. Here's my best reference: https://github.com/git/git/blob/master/exec-cmd.c#L333-L350: void setup_path(void)
{
const char *exec_path = git_exec_path();
const char *old_path = getenv("PATH");
struct strbuf new_path = STRBUF_INIT;
git_set_exec_path(exec_path);
add_path(&new_path, exec_path);
if (old_path)
strbuf_addstr(&new_path, old_path);
else
strbuf_addstr(&new_path, _PATH_DEFPATH);
setenv("PATH", new_path.buf, 1);
strbuf_release(&new_path);
} If I'm reading this correctly, git prepends |
Thank you @Ubehebe for taking time to document this. This is an amazing bug report. |
Fixes bazelbuild#2551. Currently, the _which_unchecked helper unconditionally watches the `PATH` env var via repository_ctx.getenv. getenv is documented https://bazel.build/rules/lib/builtins/repository_ctx#getenv: > any change to the value of the variable named by name will cause > this repository to be re-fetched. Thus, any change to `PATH` will cause any repository rule that transitively calls _which_unchecked to be re-fetched. This includes python_repository and whl_library. There are reasonable development workflows that modify `PATH`. In particular, when git runs a hook, it adds the value of `GIT_EXEC_PATH` to `PATH` before invoking the hook. If the hook invokes bazel (for example, a pre-commit hook running `bazel build ...`), it will cause the Python repository rules to be re-fetched. This commit lowers the repository_ctx.getenv("PATH") call to its only use site in _which_unchecked, which happens to be a failure case (when the binary is not found). This allows the success case to not watch `PATH`, and therefore not to re-fetch the repository rule when it changes.
Currently, the _which_unchecked helper unconditionally watches the `PATH` env var via repository_ctx.getenv. getenv is documented https://bazel.build/rules/lib/builtins/repository_ctx#getenv: > any change to the value of the variable named by name will cause this repository to be re-fetched. Thus, any change to `PATH` will cause any repository rule that transitively calls _which_unchecked to be re-fetched. This includes python_repository and whl_library. There are reasonable development workflows that modify `PATH`. In particular, when git runs a hook, it adds the value of `GIT_EXEC_PATH` to `PATH` before invoking the hook. If the hook invokes bazel (for example, a pre-commit hook running `bazel build ...`), it will cause the Python repository rules to be re-fetched. This commit lowers the repository_ctx.getenv("PATH") call to its only use site in _which_unchecked, which happens to be a failure case (when the binary is not found). This allows the success case to not watch `PATH`, and therefore not to re-fetch the repository rule when it changes. Fixes #2551.
🐞 bug report
Forked from #2283 (comment)
Affected Rule
The issue is caused by the rule: pip_repositoryIs this a regression?
Yes, the previous version in which this bug was not present was: ~0.32.2Description
I run
bazel build ...
from a git pre-commit hook as part of my development workflow. Although it is not well-documented, git modifies thePATH
env var when running hooks. (You can see this for yourself by puttingecho "$PATH"
in a pre-commit hook, then comparing the output ofgit hook run pre-commit
versus running the hook script directly.)Meanwhile, since #2000, the pip_repository rule has internal code that calls repository_ctx.getenv("PATH"): https://github.com/bazelbuild/rules_python/blob/main/python/private/repo_utils.bzl#L271. repository_ctx.getenv is documented:
Putting the git and rules_python behaviors together causes the bug. Doing a
bazel build ...
directly and then doing it indirectly viagit hook run pre-commit
re-fetches the pip_repository. (The reverse order also works.)🔬 Minimal Reproduction
https://github.com/Ubehebe/rules_python_2283_mvce
🔥 Exception or Error
Console messages printed when pip_repository is unexpectedly re-fetched. Example:
🌍 Your Environment
Operating System:
Output of
bazel version
:Rules_python version:
1.0.0
The text was updated successfully, but these errors were encountered: