Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add path to cache key for Rust dep inference, for relative imports #19630

Merged
merged 4 commits into from
Aug 22, 2023

Conversation

huonw
Copy link
Contributor

@huonw huonw commented Aug 21, 2023

This adds the file path for the file being dependency-inferred to the cache key for the Rust dep inference. Without this, the same file contents appearing multiple times in different places will give the wrong results for relative imports, because the dep inference process mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a file is moved, with the inference results from before the move reused for the file after the move too.

I've tested this manually on the reproducer in #19618 (comment), in addition to comparing the before (fails) and after (passes) behaviour of the new test.

To try to make this code more resilient to this sort of mistake, I've removed the top level PreparedInferenceRequest.path key, in favour of using the new input_file_path on the request protobuf struct, at the cost of an additional String -> PathBuf conversion when doing the inference.

Fixes #19618

@huonw huonw added needs-cherrypick category:internal CI, fixes for not-yet-released features, etc. labels Aug 21, 2023
@huonw huonw added this to the 2.17.x milestone Aug 21, 2023
@huonw huonw force-pushed the bugfix/19618-rust-infer-path branch from bfe02bd to 5498b2d Compare August 21, 2023 08:06
@huonw huonw requested review from stuhood, thejcannon and tobni August 21, 2023 08:06
)),
|_workunit| async move {
let result: ParsedPythonDependencies = get_or_create_inferred_dependencies(
core,
&store,
prepared_inference_request,
|content, request| python::get_dependencies(content, request.path),
|content, request| {
python::get_dependencies(content, request.inner.input_file_path.into())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the PR description, this is the additional String -> PathBuf conversion. We can avoid this in a few ways:

  1. use and manipulate String in the get_dependencies functions directly, instead of PathBuf, which seems reasonable given the functions do a &Path -> &str conversion internally anyway (although this would lose some convenience helpers for manipulating the paths)
  2. have a protobuf type that maps to PathBuf directly

Or... just not worry about it, because this conversion is fast.

Copy link
Member

@thejcannon thejcannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏 You rock! Thanks!

Comment on lines 769 to 772
/// The request that's guaranteed to have been constructed via ::prepare().
///
/// NB. this `inner` value is used as the cache key, so anything that can influence the dep
/// inference should (also) be inside it, not just a key on this struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@huonw huonw merged commit ec77fa9 into pantsbuild:main Aug 22, 2023
@huonw huonw deleted the bugfix/19618-rust-infer-path branch August 22, 2023 00:43
@WorkerPants
Copy link
Member

I tried to automatically cherry-pick this change back to each relevant milestone, so that it is available in those older releases of Pants.

❌ 2.17.x

I was unable to cherry-pick this PR to 2.17.x, likely due to merge-conflicts.

Steps to Cherry-Pick locally

To resolve:

  1. (Ensure your git working directory is clean)
  2. Run the following script to reproduce the merge-conflicts:
    git fetch https://github.com/pantsbuild/pants main \
      && git fetch https://github.com/pantsbuild/pants 2.17.x \
      && git checkout -b cherry-pick-19630-to-2.17.x FETCH_HEAD \
      && git cherry-pick ec77fa9346dd612006dbdba24e5d823b7733da46
  3. Fix the merge conflicts and commit the changes
  4. Run build-support/cherry_pick/make_pr.sh "19630" "2.17.x"

Please note that I cannot re-run CI if a job fails. Please work with your PR approver(s) to re-run CI if necessary.


When you're done manually cherry-picking, please remove the needs-cherrypick label on this PR.

Thanks again for your contributions!

🤖 Beep Boop here's my run link

@WorkerPants WorkerPants added the auto-cherry-picking-failed Auto Cherry-Picking Failed label Aug 22, 2023
thejcannon pushed a commit that referenced this pull request Aug 22, 2023
…herry-pick of #19630) (#19640)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618 

This is a cherry pick of #19630, but is essentially a slightly weird
rewrite that partially cherry-picks #19001 too, by copying over the
whole `DependencyInferenceRequest` protobuf type as is (even with some
fields that are unused) because that's the easiest way to generate
appropriate bytes for turning into the digest for use in the cache-key.
I think it's okay to have this "weird" behaviour in the 2.17 branch,
with the real/normal code in main/2.18?
@huonw huonw modified the milestones: 2.17.x, 2.18.x Aug 29, 2023
github-actions bot pushed a commit that referenced this pull request Aug 29, 2023
…19630)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618
@WorkerPants
Copy link
Member

I tried to automatically cherry-pick this change back to each relevant milestone, so that it is available in those older releases of Pants.

✔️ 2.18.x

Successfully opened #19690.


Thanks again for your contributions!

🤖 Beep Boop here's my run link

huonw added a commit that referenced this pull request Aug 29, 2023
…herry-pick of #19630) (#19690)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618

Co-authored-by: Huon Wilson <huon@exoflare.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-cherry-picking-failed Auto Cherry-Picking Failed category:internal CI, fixes for not-yet-released features, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relative imports not mapping to correct sources when multiple files have identical contents
3 participants