Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relative imports not mapping to correct sources when multiple files have identical contents #19618

Closed
njgrisafi opened this issue Aug 17, 2023 · 3 comments · Fixed by #19630
Closed
Assignees
Labels
backend: Python Python backend-related issues bug
Milestone

Comments

@njgrisafi
Copy link

njgrisafi commented Aug 17, 2023

Describe the bug
Some background can be found on slack https://pantsbuild.slack.com/archives/C046T6T9U/p1692230510929189

When a renaming/moving files with relative imports, pants dependency inference fails on the new file with UnownedDependencyError. The error only occurs on relative imports and what's interesting is pants says the source for the relative import is the old file that was moved.

Pants version
2.17.0rc1

OS
Both MacOS and Linux

Additional info

This issue is a bit difficult to reproduce. There's some specific order of operations that causes pants to get into this state.

I pushed this a branch to my example repo https://github.com/njgrisafi/pants-example/tree/example-dep-inf-issue to hopefully help with narrowing down reproducing this.

Simple reproduction

  1. Clone repo git clone git@github.com:njgrisafi/pants-example.git
  2. Checkout the first branch git checkout example-dep-inf-issue
  3. Run dependency inference ./pants peek :: >> /dev/null
  4. Checkout the second branch git checkout example-dep-inf-issue-2
  5. Run dependency inference ./pants peek :: >> /dev/null

You should see the following error

UnownedDependencyError: Pants cannot infer owners for the following imports in the target app/module_2/break_pants/__init__.py:

  * module_1.break_pants._example.x (line: 1)

To fix the issue you can nuke all the pants artifacts

rm -rf ~/.cache/pants
rm -rf .pids
rm -rf .pants.d

Rerunning dependency inference will now succeed as expected.

Detailed reproduction

Detailed steps to repro this behavior (first checkout repo / branch example-dep-inf-issue):

  1. Run dependency inference ./pants peek :: >> /dev/null
  2. Add new module app/module_2/break_pants
    • Create file app/module_2/break_pants/__init__.py (with content from ._example import x)
    • Move file app/module_1/break_pants/_example.py -> app/module_2/break_pants/_example.py
  3. Update the old code
    • app/module_1/break_pants/__init__.py (with content from module_2.break_pants import *)
  4. Run dependency inference ./pants peek :: >> /dev/null

You should see an error like

UnownedDependencyError: Pants cannot infer owners for the following imports in the target app/module_2/break_pants/__init__.py:

  * module_1.break_pants._example.x (line: 1)

The expected behavior is pants would map the relative import to module_2.break_pants._example.x.

@njgrisafi njgrisafi added the bug label Aug 17, 2023
@huonw huonw added the backend: Python Python backend-related issues label Aug 17, 2023
@huonw huonw modified the milestone: 2.17.x Aug 20, 2023
@huonw
Copy link
Contributor

huonw commented Aug 21, 2023

I can reproduce this on my machine; thanks for providing a self-contained reproducer (and of course, sorry for the trouble)!

Unfortunately, I haven't had time to dig into the root cause fully yet... I'm hoping it's not the new Rust parser in 2.17, but I'm suspicious 🤔

I may or may not get a chance to look in more detail over the next day or two.

@huonw
Copy link
Contributor

huonw commented Aug 21, 2023

Okay, yes, this is does indeed caused by the Rust parser. This means that there's a work around: continuing to use the old parser [python-infer] use_rust_parser = false in pants.toml.

Here's the reproduction script I was using:

# easier management/clearing of the cache
export PANTS_LOCAL_STORE_DIR=$(mktemp -d)

echo "*** Before"
git checkout example-dep-inf-issue
pants peek :: > /dev/null

echo "*** After"
git checkout example-dep-inf-issue-2
pants peek :: > /dev/null

echo "*** New cleared cache"
export PANTS_LOCAL_STORE_DIR=$(mktemp -d)

pants peek :: > /dev/null

Running bash ./run.sh gives output like in the issue 💥

*** Before
...
Exit code: 0
*** After
...
UnownedDependencyError: Pants cannot infer owners for the following imports in the target app/module_2/break_pants/__init__.py:

  * module_1.break_pants._example.x (line: 1)

...
Exit code: 1
*** New cleared cache
...
Exit code: 0

Running it with PANTS_PYTHON_INFER_USE_RUST_PARSER=false bash ./run.sh gives Exit code: 0 for all runs. ✅

This is the diff between the two branches (git diff example-dep-inf-issue example-dep-inf-issue-2):

diff --git a/app/module_1/break_pants/__init__.py b/app/module_1/break_pants/__init__.py
index 3c2673c..8b38b6b 100644
--- a/app/module_1/break_pants/__init__.py
+++ b/app/module_1/break_pants/__init__.py
@@ -1 +1 @@
-from ._example import x
\ No newline at end of file
+from module_2.break_pants import *
\ No newline at end of file
diff --git a/app/module_2/break_pants/BUILD b/app/module_2/break_pants/BUILD
new file mode 100644
index 0000000..db46e8d
--- /dev/null
+++ b/app/module_2/break_pants/BUILD
@@ -0,0 +1 @@
+python_sources()
diff --git a/app/module_2/break_pants/__init__.py b/app/module_2/break_pants/__init__.py
new file mode 100644
index 0000000..3c2673c
--- /dev/null
+++ b/app/module_2/break_pants/__init__.py
@@ -0,0 +1 @@
+from ._example import x
\ No newline at end of file
diff --git a/app/module_1/break_pants/_example.py b/app/module_2/break_pants/_example.py
similarity index 100%
rename from app/module_1/break_pants/_example.py
rename to app/module_2/break_pants/_example.py

As implied by having to nuke all pants artifacts, this appears to be a on-disk caching issue, not just an in-memory pantsd one: running PANTS_PANTSD=false bash ./run.sh to have no in-memory caching reproduces the same error.

@huonw huonw added this to the 2.17.x milestone Aug 21, 2023
@huonw
Copy link
Contributor

huonw commented Aug 21, 2023

Okay, mechanism: this is inappropriate caching when there's two files in different locations that have identical contents, the relative import is resolved and cached in the old location, and isn't recomputed for the new location. Moving a file is a good way to get that in practice, as in the old reproducer.

Reduced reproducer (thanks again @njgrisafi, your initial repo was crucial):

set -x

cd $(mktemp -d)

# avoid corrupting the normal cache
export PANTS_LOCAL_STORE_DIR=$(mktemp -d)

# Setup:
cat > pants.toml <<EOF
[GLOBAL]
pants_version = "2.17.0rc3"
backend_packages = ["pants.backend.python"]

[source]
root_patterns = ["/"]

[python]
interpreter_constraints = ["==3.9.*"]

[python-infer]
use_rust_parser = true
EOF

mkdir a b


echo 'python_sources()' > a/BUILD
echo 'python_sources()' > b/BUILD

# two identical files with relative imports:
echo 'from .file import x' > a/__init__.py
echo 'from .file import x' > b/__init__.py

# only one of those imports is valid
touch a/file.py

pants dependencies a/__init__.py
# OK: a/file.py
pants dependencies b/__init__.py
# BUG: a/file.py
# should be an error because there's no b/file.py (and also, it's not what Python would load, if b/file.py did exist)

@huonw huonw changed the title Relative imports not mapping to correct sources Relative imports not mapping to correct sources when multiple files have identical contents Aug 21, 2023
@huonw huonw self-assigned this Aug 21, 2023
huonw added a commit that referenced this issue Aug 22, 2023
…19630)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618
thejcannon pushed a commit that referenced this issue Aug 22, 2023
…herry-pick of #19630) (#19640)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618 

This is a cherry pick of #19630, but is essentially a slightly weird
rewrite that partially cherry-picks #19001 too, by copying over the
whole `DependencyInferenceRequest` protobuf type as is (even with some
fields that are unused) because that's the easiest way to generate
appropriate bytes for turning into the digest for use in the cache-key.
I think it's okay to have this "weird" behaviour in the 2.17 branch,
with the real/normal code in main/2.18?
github-actions bot pushed a commit that referenced this issue Aug 29, 2023
…19630)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618
huonw added a commit that referenced this issue Aug 29, 2023
…herry-pick of #19630) (#19690)

This adds the file path for the file being dependency-inferred to the
cache key for the Rust dep inference. Without this, the same file
contents appearing multiple times in different places will give the
wrong results for relative imports, because the dep inference process
mashes together the file path and the relative import.

The circumstances here seem most likely to occur in the real world if a
file is moved, with the inference results from before the move reused
for the file _after_ the move too.

I've tested this manually on the reproducer in
#19618 (comment),
in addition to comparing the before (fails) and after (passes) behaviour
of the new test.

To try to make this code more resilient to this sort of mistake, I've
removed the top level `PreparedInferenceRequest.path` key, in favour of
using the new `input_file_path` on the request protobuf struct, at the
cost of an additional `String` -> `PathBuf` conversion when doing the
inference.

Fixes #19618

Co-authored-by: Huon Wilson <huon@exoflare.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend: Python Python backend-related issues bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants