
i/o error while capturing larger artifact #13401

Closed
stuhood opened this issue Oct 28, 2021 · 10 comments

@stuhood
Member

stuhood commented Oct 28, 2021

Describe the bug
During a constraints.txt resolve (pre-#13400), a large output PEX captured in the sandbox resulted in an input/output error from the LMDB store:

"Error storing Digest { hash: Fingerprint<012ed7213b69805f0e19542c9ec2fc98640d356b25739ec6c687bc20abcf9ab6>, size_bytes: 2392030244 }: Input/output error"

Adjusting the [GLOBAL].local_store_shard_count did not help, so if this is related to artifact size, it would have to be an internal LMDB limitation that we are not aware of (rather than the LMDB max size, which the lower shard count would increase significantly).
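For reference, the option mentioned above is set in `pants.toml`. A minimal sketch (the value `4` is an arbitrary example, not a recommendation from this thread):

```toml
# pants.toml
[GLOBAL]
# Fewer shards means each shard's LMDB environment gets a larger
# slice of the configured max size -- which is why this was tried
# as a workaround, even though it did not help here.
local_store_shard_count = 4
```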

Pants version
2.7.1

OS
Linux.

It will be important to reproduce the issue on other machines and platforms (the reporter was on Linux) to confirm that it is not machine- or platform-specific.

@stuhood stuhood added the bug label Oct 28, 2021
@joshua-cannon-techlabs

This should get you very far:

mxnet-cu110
mxnet-cu102
mxnet-cu101
mxnet-cu100
pytorch

Assuming you are able to install them all at once ;)

@stuhood
Member Author

stuhood commented Nov 1, 2021

Thanks. It should hopefully be possible to reproduce synthetically. 🤞

@gautiervarjo

Hello! I appear to be running into the same issue. I'm also building a large PEX (just like Joshua mentioned above, it's large mainly because of pytorch), and I get:

  Exception: Failed to execute: Process {
    argv: [
        "/usr/bin/python3",
        "./pex",
        ...
    ],
    ...
    description: "Extracting 19 requirements to build xxx.pex from python-default_lockfile.pex: ..."

Failed to digest inputs: "Error storing Digest { hash: Fingerprint<...>, size_bytes: 2280626439 }: Input/output error"
  • Pants version: 2.11.0
  • OS: Linux

Unfortunately I can't share the repository itself, so let me see if I can't reproduce with a dummy codebase.

@gautiervarjo

An update: I was able to make a small pants repo for reproducing the issue, but while putting it together (adding and removing third-party deps) I noticed that:

  • I was able to produce a PEX 2086621883 bytes large (2**30.95).
  • I wasn't able to produce a PEX 2163448531 bytes large (2**31.01).

Seeing how the threshold is suspiciously close to 2**31 (an i32 size somewhere?) I attempted to reproduce the problem more directly, and succeeded: gautiervarjo@4aba38d

This commit adds a test to the sharded_lmdb crate that successfully writes a 1GiB value, but fails to write a 2GiB value. That's all I've got for now, but baby steps!
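Putting the numbers from this thread side by side makes the 2**31 suspicion concrete. A quick sanity check (plain Python, using only the sizes reported above):

```python
# Observed sizes from this thread, straddling the signed 32-bit
# boundary (2**31 - 1 = 2_147_483_647).
ok_size = 2_086_621_883      # PEX that built successfully (~2**30.95)
fail_size = 2_163_448_531    # PEX that hit the i/o error (~2**31.01)
digest_size = 2_392_030_244  # size_bytes from the original report

I32_MAX = 2**31 - 1

print(ok_size <= I32_MAX)      # True
print(fail_size <= I32_MAX)    # False
print(digest_size <= I32_MAX)  # False
```

Both failing values exceed `i32::MAX`, while the one success fits under it, consistent with a 32-bit size field somewhere in the write path.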

@stuhood
Member Author

stuhood commented May 18, 2022

Thanks a lot @gautiervarjo! Much appreciated.

To work around this issue, you should be able to use pex_binary(.., layout='packed'): see https://www.pantsbuild.org/docs/reference-pex_binary#codelayoutcode
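For anyone landing here, that workaround looks roughly like this in a BUILD file (target name and entry point are placeholders, not from this thread):

```python
# BUILD (sketch; name and entry_point are hypothetical)
pex_binary(
    name="my_app",
    entry_point="my_app.main",
    # Per the docs linked above, "packed" splits dependencies into
    # separate internal zips, so no single stored blob should
    # approach the 2 GiB limit hit by the default zipapp layout.
    layout="packed",
)
```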

@gautiervarjo

Ah! Thank you so much for the quick response, switching layouts works!

I won't investigate further since the workaround suits me fine (I wanted to unpack the pex zipapp into a docker image anyway). I'm not sure how high/low priority this bug was for you, but don't bump it up on my account 🙂

@stuhood
Member Author

stuhood commented May 19, 2022

I'm not sure how high/low priority this bug was for you, but don't bump it up on my account 🙂

It's definitely worrisome (and there are a few other papercuts due to LMDB, including the sharding and configured max sizes), but it's unclear which database we would want to switch to for this use case, so we've been punting.

Thanks again!

@PGrothaus

Just wanted to comment that I ran into the same issue for a large .pex file (with pytorch as well). Setting the layout to "packed" helped me. Thanks a lot!

@tgolsson
Contributor

tgolsson commented May 5, 2023

Hit this as well. Wondering whether it'd be worth having a specific catch for this with a diagnostic/help message.

Also, is this related to #16697?

@stuhood
Member Author

stuhood commented May 5, 2023

This is fixed in the 2.17.0.dev releases (by #18153), but it is definitely too large to backport, unfortunately.

As mentioned above though: you almost certainly want to be using the packed layout in this case anyway, as it has much better content-addressability, and so will use less space on disk.
