No such file or directory: 'local_dists.pex/PEX-INFO' when running a python_source #17987

Closed
danxmoran opened this issue Jan 12, 2023 · 11 comments · Fixed by #18035
Labels: backend: Python (Python backend-related issues), bug

@danxmoran
Contributor

Describe the bug

After updating one of our CI checks to ./pants run a python_source directly (vs. the previous code which ran the pex_binary), some of our jobs started failing with:

Traceback (most recent call last):
  File "/opt/python/3.8.14/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/python/3.8.14/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/runner/.pants/execution/pants-sandbox-g6NiAf/./src.pex/__main__.py", line 89, in <module>
    __venv_dir__ = __maybe_run_venv__(
  File "/home/runner/.pants/execution/pants-sandbox-g6NiAf/./src.pex/__main__.py", line 37, in __maybe_run_venv__
    venv_dir = venv_dir(
  File "/home/runner/.pants/execution/pants-sandbox-g6NiAf/src.pex/.bootstrap/pex/variables.py", line 738, in venv_dir
  File "/home/runner/.pants/execution/pants-sandbox-g6NiAf/src.pex/.bootstrap/pex/variables.py", line 736, in add_pex_path_items
  File "/home/runner/.pants/execution/pants-sandbox-g6NiAf/src.pex/.bootstrap/pex/pex_info.py", line 82, in from_pex
FileNotFoundError: [Errno 2] No such file or directory: 'local_dists.pex/PEX-INFO'

The python_source has run_goal_use_sandbox=False.

Pants version

v2.15.0rc1

OS

Linux

@danxmoran danxmoran added the bug label Jan 12, 2023
@jsirois
Contributor

jsirois commented Jan 12, 2023

@danxmoran I should know this, but for visibility / clarity here locally, is remote caching in play here?

@danxmoran
Contributor Author

@jsirois yes, here's a run where it happened: https://app.toolchain.com/organizations/color/repos/color/builds/pants_run_2023_01_12_20_00_55_964_2c6a8359d6dd451d822e35fc67f222ec/

This is the same CI system where we sometimes see the Dockerfile parser's file mysteriously go missing. I'm not sure whether it could be the same underlying issue.

@jsirois
Contributor

jsirois commented Jan 12, 2023

Thanks @danxmoran. This could be the same issue, but that's still unclear. The BSD lock fix for the other issue has worked for every other Pants / Pex user (4 cases IIRC) except you, and you're the only one among those using remote caching. Toolchain has also seen this exact error on rc1, and it appears to toggle with remote caching. There's still some shakiness in all this, but your data point is useful to have in the mix.

@stuhood stuhood added this to the 2.15.x milestone Jan 13, 2023
@stuhood stuhood self-assigned this Jan 13, 2023
@danxmoran
Contributor Author

We've stopped seeing the problem after downgrading to 2.15.0rc0.

@jsirois
Contributor

jsirois commented Jan 13, 2023

Thanks, that matches Toolchain's experience.

@stuhood
Member

stuhood commented Jan 14, 2023

I triaged this a little bit this afternoon.

In the repro case that we have:

I'm adding some additional workunit metadata to get the precise digest and process definition for the InteractiveProcess, in order to see whether the input file was missing in the definition.

@stuhood
Member

stuhood commented Jan 18, 2023

Ok, via the debug information added above, I was able to confirm that local_dists.pex/PEX-INFO does exist inside the sandbox, so I'm fairly sure that #17761 has triggered a non-portable venv scripts issue, whereby a cache entry fetched from the remote cache is expecting the venv to already exist, and fails when it doesn't (?).

The venv-pex script has accommodations in place to attempt to survive that case:

# If the seeded venv has been removed from the PEX_ROOT, we re-seed from the original
# `--venv` mode PEX file.
if [ ! -e "${{venv_dir}}" ]; then
    PEX_INTERPRETER=1 ${{execute_pex_args}} -c ''
fi

However, it relies on everything being properly relativized to the sandbox.
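For readers less familiar with the shell snippet, here is a minimal Python sketch of the same "re-seed if the venv directory is missing" check. `ensure_venv` and the `reseed` callback are hypothetical names used only for illustration; they are not part of Pants or Pex, and the real logic lives in the generated shell script quoted above.

```python
import os
from typing import Callable


def ensure_venv(venv_dir: str, reseed: Callable[[], None]) -> bool:
    """Call `reseed` if `venv_dir` does not exist.

    Returns True if a re-seed was triggered, False if the venv was already there.
    `reseed` stands in for re-running the original `--venv` mode PEX file.
    """
    if os.path.exists(venv_dir):
        return False
    reseed()
    return True
```

Note that the check only helps if `venv_dir` resolves to the right location. If the path was recorded for a different machine or cache layout, the re-seed may not recreate what the rest of the script expects.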

Inspecting the ci_checks.pex_pex_shim.sh script from the example I was looking at shows some suspicious absolute paths in the run script (i.e. /home/$USER/.cache/pants/named_caches/pex_root/venvs/3e124598879745852f17d3617f4c6e9dfb452d35/bce0f88380a53c54c81779a31f158530a0fef322/pex), which I would have expected to be relative.
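One way to spot such paths is to scan the generated script text for absolute path tokens. The sketch below is a rough heuristic, not a Pants feature: `find_absolute_paths` and its regular expression are assumptions made for illustration, and the pattern will also flag legitimate absolute paths such as a shebang interpreter.

```python
import re
from typing import List

# A "/" that does not continue an existing path token, followed by at least
# two path components. Relative paths such as ./src.pex/__main__.py are skipped.
_ABSOLUTE_PATH = re.compile(r"(?<![\w.$}/-])/[\w.$-]+(?:/[\w.$-]+)+")


def find_absolute_paths(script_text: str) -> List[str]:
    """Return every absolute-looking path found in a shell script's text."""
    return _ABSOLUTE_PATH.findall(script_text)
```

Filtering the result for entries containing `named_caches` would narrow it to the kind of path called out above.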

It should be possible to repro this case by running a python_source, and then clearing the ~/.cache/pants/named_caches but not ~/.cache/pants/lmdb_store: the cache hit from the lmdb_store should be able to trigger the issue.
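The cache-clearing half of that repro can be sketched as a small helper. `clear_named_caches` is a hypothetical function written for this illustration; the directory names come from the comment above, and the cache root is passed in explicitly so nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def clear_named_caches(cache_root: Path) -> None:
    """Delete <cache_root>/named_caches while leaving <cache_root>/lmdb_store alone."""
    shutil.rmtree(cache_root / "named_caches", ignore_errors=True)


# Example (destructive, so it is left commented out):
# clear_named_caches(Path.home() / ".cache" / "pants")
```

After running a `python_source` once, clearing the named caches this way, and running it again, the second run should be served from `lmdb_store` and exercise the missing-venv path.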


@thejcannon : How would you feel about reverting #17761 out of 2.15.x until this can be resolved on main?

@jsirois
Contributor

jsirois commented Jan 18, 2023

@stuhood how does this relate to the backtrace in the OP?:

File "/home/runner/.pants/execution/pants-sandbox-g6NiAf/src.pex/.bootstrap/pex/pex_info.py", line 82, in from_pex
FileNotFoundError: [Errno 2] No such file or directory: 'local_dists.pex/PEX-INFO'

That has 0 trace of the elements you're talking about FWICT. The only piece of info missing from that backtrace is the CWD against which the relative PEX_PATH item local_dists.pex/PEX-INFO is being resolved.
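To illustrate why the CWD matters here: a relative path like `local_dists.pex/PEX-INFO` is resolved against the process's current working directory, so the same file can be present in the sandbox and still fail to open. The sketch below is a standalone demonstration; `can_open_pex_info` is a hypothetical helper and not Pex's actual lookup code.

```python
import os


def can_open_pex_info(cwd: str, rel_path: str = "local_dists.pex/PEX-INFO") -> bool:
    """Return True if `rel_path` can be opened while the process CWD is `cwd`."""
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        with open(rel_path):
            return True
    except FileNotFoundError:
        # Same error class as in the traceback: the file may exist elsewhere,
        # just not relative to this working directory.
        return False
    finally:
        os.chdir(previous)
```

With the sandbox root as `cwd` the open succeeds. With any other directory it raises the `FileNotFoundError` seen in the original report.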

@stuhood
Member

stuhood commented Jan 18, 2023

I was able to repro this on 2.15.x by removing named_caches without removing lmdb_store. It did not repro on latest main, but did repro at #17700. Bisecting showed that #17742 fixed this on main, so I'll get a cherry-pick out for that one. Additionally, #17750 is apparently the fix on main for the absolute paths in the venv script, so I'll pick that one as well.

@stuhood
Member

stuhood commented Jan 18, 2023

Argh. There are other commits relevant to those two changes that cause tests to fail, and there are too many picks to the branch to back out. I'll try to figure out what is missing.

In future, I think that we should avoid cherry-picks of this size.

@stuhood
Member

stuhood commented Jan 18, 2023

Fixed in #18035.
