Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FR: allow ctx.which to watch for PATH/ program changes #24864

Open
rickeylev opened this issue Jan 8, 2025 · 5 comments
Open

FR: allow ctx.which to watch for PATH/ program changes #24864

rickeylev opened this issue Jan 8, 2025 · 5 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@rickeylev
Copy link
Contributor

Description of the feature request:

In short: Add a watch=True arg to ctx.which(). This makes it parse PATH and watch for $program to exist/change in a directory before where it first found $program. Unless that condition is met, the which() result is not invalidated. The pesudo-code for this would be something like:

Background:

The ctx.which() function is used to find a program by looking for it in the locations specified by the PATH environment variable. Thus, changing PATH could change the result of which(), depending on how path changed and where a program was found.

Today, changes to the PATH environment variable won't invalidate any lookups using which(). This makes some sense because it's pretty context specific as to whether it matters or not. However, this is problematic for cases which do care if the result would have changed. My go-to example is when configuring a language toolchain using the local system (non-ideal, but IMHO, quite reasonable when users intentionally want to use a custom toolchain for whatever reason).

While a repository rule can call getenv("PATH") to watch the path environment variable, this ends up being too noisy -- tools like git may modify PATH when running hooks (bazelbuild/rules_python#2551 (comment)) in such a way that wouldn't invalidate most which() calls.

Which category does this issue belong to?

External Dependency

What underlying problem are you trying to solve with this feature?

Making a repository rule correctly watch for changes to its inputs while not invalidating when unnecessary.

Which operating system are you running Bazel on?

Linux

What is the output of bazel info release?

8.0.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

Searched all, didn't find a mention of this.

Any other information, logs, or outputs that you want to share?

This feature request originates from discussion in bazelbuild/rules_python#2551

@github-actions github-actions bot added the team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. label Jan 8, 2025
@fmeum
Copy link
Collaborator

fmeum commented Jan 8, 2025

What do you think of the alternative of watching the binary that is returned? That's something you can already do today and it sounds more reliable to me (lower false negative and positive rate).

@rickeylev
Copy link
Contributor Author

rickeylev commented Jan 8, 2025

i.e, given which(watch=True, ...) it would automatically watch the found binary?

A repo rule can do that today, so it's a small, but welcome, quality-of-life improvement IMHO.

However, it doesn't address the case that user code can't implement today: if PATH changes in such a way that which()'s value would be different. i.e. these cases:

  • A new PATH entry is added before the matched-entry and $newEntry/$program exists
  • The original PATH entry that matched is removed from PATH.
  • $program starts existing at a PATH entry before where it was originally found. It'd be easy to make which() also handle this (just watch all earlier entries) if started watching the found binary.

@fmeum
Copy link
Collaborator

fmeum commented Jan 15, 2025

Thanks for the additional explanation.

I agree that having this on ctx.watch would be convenient, but I think that you can implement it today. By separating out the which logic into a separate repo rule, you only have to overapproximate the conditions under which it should be rerun and can then benefit from "change pruning" that avoids rerunning your actual repo rule if the resulting binary path doesn't change. This could look something like this:

def _watch_which_impl(ctx):
    # TODO: Support Windows.
    path = ctx.getenv("PATH").split(":")
    tool_path = None
    for p in path:
        candidate = ctx.path(p).get_child(ctx.attr.tool)
        # Rerun if the file is created or deleted.
        # This also reruns if the existing file changes, but
        # this repo rule is cheap enough to evaluate.
        ctx.watch(candidate)
        # TODO: Check for executable bit on Unix.
        if candidate.exists:
            tool_path = p
            break
    ctx.file("BUILD")
    ctx.file("path.bzl", "TOOL_PATH = {}".format(repr(tool_path)))

watch_which = repository_rule(
    implementation = _watch_which_impl,
    attrs = {
        "tool": attr.string(mandatory = True),
    },
)

Would that be a suitable polyfill for you?

Edit: It looks like this doesn't watch the executable bit of the existing file, but that could be fixed for ctx.watch.

@meteorcloudy meteorcloudy added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Jan 20, 2025
@rickeylev
Copy link
Contributor Author

Ohh, that's clever :)

Yeah, that will probably work well enough. I'm trying to think of differences if an input is executed in one repo vs another, but can't think of anything. If the env was different in one repo vs another? Not sure how that would happen with PATH, though. I don't think there are flags/settings/things that would affect the env one repo sees but not other repos?

For posterity, the Bazel implementation of which:

public StarlarkPath which(String program, StarlarkThread thread) throws EvalException {

Looking at the Java code, I don't think an identical implementation is possible in Starlark:

  • Java checks if the file is executable; I'm not aware of a Starlark equiv. It'd have to call more subprocs to get equivalent.
  • Java checks if it's a file while respecting symlinks. Presumably x.realpath().is_dir() would be the same?

Other differences look easy to handle: colon/semicolon path separator, adding .exe extension

Maybe it'd make more sense to just call ctx.which() for the actual computed value, but still do the path parsing to capture the edge case of an earlier non-existent PATH entry coming into existence.

@rickeylev
Copy link
Contributor Author

Just for posterity: I was thinking through how to use the repo-per-tool for my case, and, yeah, I'm pretty sure its doable, but can become tedious to do. The tediousness comes from because, in my case, the tool is user-specified. e.g., in MODULE, a user might put:

python.local_toolchain(interpreter = "python3.12", ...)
python.local_toolchain(interpreter = "python3.11", ...)

Which means, ultimately, I need e.g. @pylocal_python3_12 and pylocal_python_3_11 repos. I also don't think the result can be put into a .bzl file and load()'d by the repo rule that needs it -- what the repo's bzl file should load is a function of a particular usage of the repo rule. We can work around that fairly easily, though: pass a label to the repo rule that needs it, then read the path from a file. Easy enough. Or, maybe a hub-spoke pattern (hub of all the lookups in a map).

More generally, as a pattern, this can probably work for any case. The main limitation is how tedious it would be to implement. Fundamentally, what it does is split a repo rule in two: one half computes a value, the other half uses the value. So, as long as a repo can serialize its state to a file for the second repo to pick up from, it can be made to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
None yet
Development

No branches or pull requests

6 participants