Replace #4505 with a different set of workarounds #4527

Merged
merged 10 commits into from
Dec 20, 2024

Contributor

@chandlerc chandlerc commented Nov 14, 2024

This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This lets it find the tree structure
when directly invoking prefix_root/bin/carbon or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.
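The search order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `FindRelativeBusybox` and `FindBusybox`, the hop limit, and the assumption that the entry point is shaped like `<prefix>/bin/<name>` are all hypothetical. Only the `lib/carbon/carbon-busybox` location comes from the discussion.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: for a path shaped like `<prefix>/bin/<name>`, returns
// `<prefix>/lib/carbon/carbon-busybox` when something exists at that path.
inline auto FindRelativeBusybox(const fs::path& exe)
    -> std::optional<fs::path> {
  fs::path candidate =
      exe.parent_path() / ".." / "lib" / "carbon" / "carbon-busybox";
  std::error_code ec;
  if (fs::exists(candidate, ec)) {
    // Lexical cleanup only; assumes `bin` itself is not a symlinked directory.
    return candidate.lexically_normal();
  }
  return std::nullopt;
}

// Checks for an install-relative busybox at each step *before* following the
// next symlink, so a link that resolves into a CAS never has to be followed
// once the install tree has been recognized.
inline auto FindBusybox(fs::path current, int max_hops = 32)
    -> std::optional<fs::path> {
  assert(max_hops > 0 && "need at least one lookup");
  for (int hop = 0; hop < max_hops; ++hop) {
    if (auto busybox = FindRelativeBusybox(current)) {
      return busybox;
    }
    std::error_code ec;
    fs::path target = fs::read_symlink(current, ec);
    if (ec) {
      // Not a symlink (or unreadable): nothing further to follow.
      return std::nullopt;
    }
    // Relative targets are interpreted against the link's own directory.
    current = target.is_absolute() ? target : current.parent_path() / target;
  }
  return std::nullopt;
}
```

With this shape, calling `FindBusybox("prefix_root/bin/carbon")` would return the busybox path from the first check, regardless of where the `carbon` symlink itself points.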

In order to make a convenient Bazel target that can be used with bazel run //toolchain, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.
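As a rough sketch of what an environment-variable override for `argv[0]` could look like: the variable name `EXAMPLE_BUSYBOX_ARGV0_OVERRIDE` and the function `EffectiveArgv0` are invented for illustration, since the description does not name the real variable. The sketch also covers only the `argv[0]` half of the override, not the busybox binary path.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// HYPOTHETICAL name, chosen for this sketch only; the PR text does not say
// what the real variable is called.
constexpr const char* kArgv0OverrideEnvVar = "EXAMPLE_BUSYBOX_ARGV0_OVERRIDE";

// Returns the name that should drive busybox mode selection: the override
// from the environment when it is set and non-empty, otherwise the real
// argv[0].
inline auto EffectiveArgv0(const char* argv0) -> std::string {
  assert(argv0 != nullptr);
  const char* override_value = std::getenv(kArgv0OverrideEnvVar);
  if (override_value != nullptr && override_value[0] != '\0') {
    return override_value;
  }
  return argv0;
}
```

A Bazel rule could then set the variable in the environment of the launched process, which is the "least amount of plumbing" property mentioned above: no extra command-line flag has to be threaded through the driver.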

This should allow a few things to work a bit more nicely:

  • It should handle sibling symlinks like clang++ to clang or
    ld.lld to lld, where that symlink in turn points at the busybox.
    We want to use the initial argv[0] value to select the mode there.
  • It avoids bouncing through Python (or other subprocesses) when
    invoking the carbon binary in Bazel rules, which will be nice for
    building the example code and benchmarking.
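To make the first bullet concrete, here is a small sketch of selecting a mode from the initial `argv[0]` filename. The function name `ModeFromInitialArgv0` and the rule that `carbon` and `carbon-busybox` mean "no mode" are assumptions for illustration, not taken from the PR.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>

// Assumed convention for this sketch: the plain `carbon` and `carbon-busybox`
// names select the default driver (no mode); any other filename is treated as
// the busybox mode, e.g. `clang++` or `ld.lld`.
inline auto ModeFromInitialArgv0(const std::filesystem::path& argv0)
    -> std::optional<std::string> {
  assert(!argv0.empty());
  std::string name = argv0.filename().string();
  if (name == "carbon" || name == "carbon-busybox") {
    return std::nullopt;
  }
  return name;
}
```

Because the mode is taken from the first name only, a chain such as `clang++` to `clang` to the busybox would keep `clang++` as the mode. Under the same rule, an arbitrary alias name would be read as a mode rather than as `carbon`.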

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like my_carbon_symlink -- we expect the
first argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative carbon-busybox
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.

@chandlerc chandlerc requested a review from jonmeow November 14, 2024 09:56
@github-actions github-actions bot added the documentation and toolchain labels Nov 14, 2024
@jonmeow
Contributor

jonmeow commented Nov 14, 2024

Maybe I'm missing something, but the use of copy_file to create a symlink looks equivalent to a rollback of #4505. I think bazelbuild/bazel#23620 is correct about workaround options, but let me give my understanding of what's happening.

This change tries to create a symlink chain such as:

./bazel-bin/toolchain/install/run/carbon -> /.../toolchain/install/prefix_root/bin/carbon
/.../toolchain/install/prefix_root/bin/carbon -> /.../toolchain/install/prefix_root/lib/carbon/carbon-busybox
/.../toolchain/install/prefix_root/lib/carbon/carbon-busybox (regular file)

While that's typical output, Bazel could use its CAS (content-addressable storage) cache. In this model, Bazel's promise is to provide something which matches the checksum of the regular file, not the file/symlink structure (this is bazelbuild/bazel#23620). This is also true of carbon-busybox, which is not required to be a regular file: it just needs equivalent content. For example, this is also valid:

./bazel-bin/toolchain/install/run/carbon -> /cas/somechecksum
/.../toolchain/install/prefix_root/bin/carbon -> /cas/somechecksum
/.../toolchain/install/prefix_root/lib/carbon/carbon-busybox -> /cas/somechecksum

Note that things like remote caching can be a lot firmer about removing indirections, even more than the local cache. That's why we need bazelbuild/bazel#23620 to be fixed if we want to use symlinks with Bazel, since it's about retaining symlink structure.

As far as alternative workarounds:

So that's why I went with something like the 2nd workaround: scripts rather than symlinks.

I believe Google's Bazel support makes much more eager decisions about when to reuse cache content, and how to structure symlinks. If you want to go this route, perhaps it would make sense to verify that it works robustly using your access to Google's infrastructure?

@jonmeow
Contributor

jonmeow commented Nov 14, 2024

Oh, and to be sure, I believe that the main difference prior to #4505 is in changing the name "run_carbon" to "run/carbon". But since it's going back to symlinks, I do think that this is brittle. Maybe I could have been clearer in-person, but I think any solution using symlinks is going to require a split implementation: one for this repository on GitHub, one for Google.

@chandlerc
Contributor Author

> Oh, and to be sure, I believe that the main difference prior to #4505 is in changing the name "run_carbon" to "run/carbon".

Yes, it was trying to couple this change with the change to prevent a symlink for the busybox itself.

But thanks for the context on worrying about remote caches forcing even the original busybox to be a symlink into some CAS or other thing that loses all the interesting context. That was what I was missing that #4505 was trying to work around. Sorry if it wasn't clear -- this PR was mostly a question, I wasn't at all confident.

I have another idea that I'm going to play with, but I'm not sure it will work. At least now I know how to test it.

@chandlerc chandlerc force-pushed the busybox-take3 branch 2 times, most recently from 42849ee to d751db4 Compare November 21, 2024 03:04
@chandlerc chandlerc changed the title from "Replace #4505 with a hopefully simpler workaround" to "Replace #4505 with a different set of workarounds" Nov 21, 2024
@chandlerc
Contributor Author

Ok, after this helpful feedback and some offline discussions, I think I have a new attempt.

The core of this is to look for a relative carbon-busybox binary prior to walking through symlinks, and an override to use from the Bazel run_tool rule.

This should get us symlink-level overhead while being resilient to all the perplexities of symlinks (I hope!). PTAL!

@chandlerc chandlerc requested a review from jonmeow November 21, 2024 03:26
@chandlerc
Contributor Author

(Also, just flagging that I updated the PR description with new context.)

Contributor

@jonmeow jonmeow left a comment

I understand that you intend things to change, but I would still appreciate tests demonstrating the behavior that's not intended to work anymore. It would make it clearer that it's an intended break, not an accidental one. (Suggestions for specific things below.)

Contributor Author

@chandlerc chandlerc left a comment

Thanks for review, PTAL.

@chandlerc chandlerc requested a review from jonmeow November 22, 2024 22:43
Contributor Author

@chandlerc chandlerc left a comment

I think there was only one real outstanding issue here, but I know Jon is out for the next week so if someone else is up to picking up the last iteration on the review, I'd appreciate that. I think I've addressed Jon's comment here.

@chandlerc chandlerc removed the request for review from jonmeow November 23, 2024 05:15
@geoffromer geoffromer self-requested a review November 26, 2024 18:21
Comment on lines 27 to 28
// Returns the busybox information, given argv[0]. This primarily handles
// resolving symlinks that point at the busybox.
Contributor

I'm having trouble evaluating the test changes, because this comment doesn't really give me enough information to understand this function's desired behavior under the test conditions. Also, "This primarily handles resolving symlinks" doesn't seem accurate anymore.

After a fair amount of investigation, I think the key information that's missing here is that this function tries to find and return a bin_path referring to a file named "carbon-busybox" that is part of the same Carbon install (as indicated by their paths) as argv0 or some file in the chain of symlinks from argv0 (which means we can presumably access the rest of the install by rewriting bin_path). If that's right, I think it would help to document that as part of the contract.

Contributor Author

Sure, updated comment a bit.

Contributor

For what it's worth, I don't think the new comment is specific enough to have helped me understand the implementation or the tests (e.g. I don't think I'd know what "most consistent" means, and I particularly wouldn't expect it to mean that shallower symlink traversals are preferred). However, that level of specificity may not be what other readers need, especially when the function is doing something heuristic like this.

Contributor Author

I've tried a new version of the comment. Is it any better?

Contributor

Just to be more explicit: I don't think this is blocking, because I'm not sure how much my experience as a reader generalizes, and I'm not sure how much effort it's worth putting into wordsmithing this API comment.

The previous revision worked better for me than this one, because it focused on the contract (albeit in imprecise terms), whereas this focuses on the implementation, and the implementation is complex enough that it's not clear what it implies about the contract.

If you want to keep iterating on this comment, maybe something like this?

"If argv0 is the path of a busyboxed executable that is part of a valid Carbon install, or indirectly refers to it via symlinks, then bin_path will be the path of the lib/carbon/carbon-busybox file in the same Carbon install."

I'm not sure that fully captures the intent, but that's the kind of contract-focused specificity that would have helped me.

Contributor Author

Sure, tried wording much closer to that.

And it wasn't clear any of the comments were non-blocking, mostly trying to make progress here as this PR has been in review for a long time now.

Contributor

> Sure, tried wording much closer to that.

Looks good, thanks.

> And it wasn't clear any of the comments were non-blocking, mostly trying to make progress here as this PR has been in review for a long time now.

Sorry, I'll try to be more explicit about that.

Comment on lines +245 to +246
// Starting from the second install uses the relative busybox rather than
// traversing the symlink further.
Contributor

Why is this important? If they're all symlinks to the same file, won't the user get the same behavior either way?

Contributor Author

The two installs might be different versions or unrelated.

Contributor

I don't follow how that could be the case, given that the test seems to say that one of them is a collection of symlinks to the other.

Contributor Author

We have two installs, let's say version v0.1 in /usr and v0.2 in /usr/local.

For some strange reason, there is also a symlink /usr/local/carbon -> /usr/local/bin/carbon.

There is no symlink from /usr/local/... into /usr (or vice versa).

The problem is that the symlink /usr/local/carbon (which points to /usr/local/bin/carbon and eventually to /usr/local/lib/carbon/carbon-busybox) could use the same path-relative traversal as /usr/bin/carbon and accidentally find /usr/lib/carbon/carbon-busybox. The test is checking that it does not do this.

While we don't really intend symlinks to exist in this way, the goal is to avoid confusing behavior if it does occur.

Contributor

It sounds like you're describing the behavior of RejectSymlinkInUnrelatedInstall, not StopSearchAtFirstSymlinkWithRelativeBusybox?

Contributor Author

Sorry, for some reason I thought that's where this comment thread was attached.

Because the new logic stops as soon as we see an install-relative busybox, we don't continue to follow symlinks when they, for example, link into a CAS directory tree or some other part of the Bazel output tree. This is what lets us use actual symlinks inside the prefix_root/... file group even though Bazel fully resolves them rather than leaving the relative symlink structure intact: once we see the correctly shaped install tree, we stop traversing symlinks.
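The difference between resolving symlinks fully and stopping at the first install-shaped path can be illustrated with a toy in-memory model. Everything here is hypothetical (the `ToyFs` type, the helper names, and the simplifying rule that an entry point must sit directly in a `bin` directory). It models only the CAS scenario discussed in this thread, not the real implementation.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>

// A toy filesystem: `links` maps a symlink path to its absolute target, and
// `files` lists regular files.
struct ToyFs {
  std::map<std::string, std::string> links;
  std::set<std::string> files;

  auto Exists(const std::string& path) const -> bool {
    return files.count(path) > 0 || links.count(path) > 0;
  }
};

// For "<prefix>/bin/<name>", returns "<prefix>/lib/carbon/carbon-busybox";
// returns nullopt for any path that is not shaped like that.
inline auto RelativeBusybox(const std::string& path)
    -> std::optional<std::string> {
  assert(!path.empty());
  auto name_pos = path.rfind('/');
  if (name_pos == std::string::npos) {
    return std::nullopt;
  }
  std::string dir = path.substr(0, name_pos);
  const std::string suffix = "/bin";
  if (dir.size() < suffix.size() ||
      dir.compare(dir.size() - suffix.size(), suffix.size(), suffix) != 0) {
    return std::nullopt;
  }
  return dir.substr(0, dir.size() - suffix.size()) +
         "/lib/carbon/carbon-busybox";
}

// Strategy 1: follow every symlink first. In the CAS case this ends at a path
// that carries no information about the install tree.
inline auto ResolveFully(const ToyFs& toy, std::string path) -> std::string {
  for (int hops = 0; hops < 32; ++hops) {
    auto it = toy.links.find(path);
    if (it == toy.links.end()) {
      break;
    }
    path = it->second;
  }
  return path;
}

// Strategy 2: check for an install-relative busybox at each hop and stop at
// the first hit, before following the link any further.
inline auto FindBusyboxStoppingEarly(const ToyFs& toy, std::string path)
    -> std::optional<std::string> {
  for (int hops = 0; hops < 32; ++hops) {
    if (auto candidate = RelativeBusybox(path);
        candidate && toy.Exists(*candidate)) {
      return candidate;
    }
    auto it = toy.links.find(path);
    if (it == toy.links.end()) {
      return std::nullopt;
    }
    path = it->second;
  }
  return std::nullopt;
}
```

If both `/out/prefix_root/bin/carbon` and `/out/prefix_root/lib/carbon/carbon-busybox` are links to `/cas/somechecksum`, the first strategy ends at the CAS path, from which no install tree can be derived, while the second returns the busybox path inside `prefix_root`.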

There is some explanation about this in the PR description, but let me know if I should add comments here as well.

Contributor

I see, so the point is that even though in this case further traversal will still yield a correctly-shaped install tree, the non-traversal is important because of other cases where that's not true? If so, I think it'd be helpful to explain that in a comment here.

(It would be even better for the symlinks to point to something that's not a correctly-shaped install, so the test code directly illustrates the kind of case we're concerned about, but I don't think that's blocking.)

@chandlerc chandlerc requested a review from geoffromer November 26, 2024 20:41
@chandlerc
Contributor Author

Thanks, PTAL!

chandlerc and others added 5 commits December 20, 2024 00:27
This restores the symlinks for the installation, but teaches the busybox
info search to look for a relative path to the busybox binary itself
before walking through symlinks. This lets it find the tree structure
when directly invoking `prefix_root/bin/carbon` or similar, either
inside of a Bazel rule or from the command line, and mirrors how we
expect the installed tree to look. This works even when Bazel resolves
the symlink target fully, and potentially to something nonsensical like
a CAS file.

In order to make a convenient Bazel target that can be used with `bazel
run //toolchain`, this adds an override to explicitly set the desired
argv[0] to use when selecting a mode for the busybox and a busybox
binary. Currently, the workaround uses an environment variable because
that required the least amount of plumbing, and seems a useful override
mechanism generally, but I'm open to other approaches.

This should allow a few things to work a bit more nicely:
- It should handle sibling symlinks like `clang++` to `clang` or
  `ld.lld` to `lld`, where that symlink in turn points at the busybox.
  We want to use the *initial* `argv[0]` value to select the mode there.
- It avoids bouncing through Python (or other subprocesses) when
  invoking the `carbon` binary in Bazel rules, which will be nice for
  building the example code and benchmarking.

It does come at a cost of removing one feature: the initial symlink
can't be some unrelated alias like `my_carbon_symlink` -- we expect the
*first* argv[0] name to have the meaningful filename for selecting
a busybox mode.

It also trades the complexity of the Python script for some complexity
in the busybox search in order to look for a relative `carbon-busybox`
binary. On the whole, I think that tradeoff is worthwhile, but it isn't
free.
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
Contributor

@jonmeow jonmeow left a comment

LG!

@chandlerc chandlerc enabled auto-merge December 20, 2024 00:32
@chandlerc chandlerc added this pull request to the merge queue Dec 20, 2024
jonmeow added a commit to jonmeow/carbon-lang that referenced this pull request Dec 20, 2024
@@ -43,7 +43,7 @@
 "carbon.carbonPath": {
 "type": "string",
 "description": "The path to the 'carbon' binary.",
-"default": "./bazel-bin/toolchain/install/run_carbon"
+"default": "./bazel-bin/toolchain/carbon"
Contributor

Sent #4722 to support a release with this change.

Merged via the queue into carbon-language:trunk with commit 13502b7 Dec 20, 2024
8 checks passed
@chandlerc chandlerc deleted the busybox-take3 branch December 20, 2024 00:40
jonmeow added a commit to jonmeow/carbon-lang that referenced this pull request Dec 20, 2024
Labels
documentation, toolchain

3 participants