Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grafted libraries can end up with the same hash but different contents #389

Open
lkollar opened this issue Jul 26, 2022 · 1 comment
Open
Labels

Comments

@lkollar
Copy link
Contributor

lkollar commented Jul 26, 2022

The numpy==1.18.4 and scipy==1.4.1 libraries both contain the libopenblasp-r0-34a18dc3.3.7.so file, as they most likely used the same version when invoking auditwheel. The hashes match, but the files differ in contents:

$ cmp numpy.libs/libopenblasp-r0-34a18dc3.3.7.so scipy/.libs/libopenblasp-r0-34a18dc3.3.7.so
numpy.libs/libopenblasp-r0-34a18dc3.3.7.so scipy/.libs/libopenblasp-r0-34a18dc3.3.7.so differ: byte 57, line 1

Most likely this is happening because we compute the hash before performing the patching on the files and using a different version of patchelf can produce slightly differenet .so files:

with open(src_path, "rb") as f:
shorthash = hashfile(f)[:8]
src_name = os.path.basename(src_path)
base, ext = src_name.split(".", 1)
if not base.endswith("-%s" % shorthash):
new_soname = f"{base}-{shorthash}.{ext}"
else:
new_soname = src_name
dest_path = os.path.join(dest_dir, new_soname)
if os.path.exists(dest_path):
return new_soname, dest_path
logger.debug("Grafting: %s -> %s", src_path, dest_path)

Given that the resulting files are practically identical, they should not cause any issues, even if for some reason the loader decides to reuse the cached version, but I think we should generate the hash after modifying the file.

Another side effect of generating the hash before grafting is that it is impossible to validate the hash, if one wanted to make sure no modifications have been done to the grafted library.

@lkollar lkollar added the bug label Jul 26, 2022
@lkollar
Copy link
Contributor Author

lkollar commented Jul 28, 2022

Another side effect of generating the hash before grafting is that it is impossible to validate the hash, if one wanted to make sure no modifications have been done to the grafted library.

This is actually causes an issue in the code snippet I pasted above: the code tries to avoid patching an already patched library, but it does so based on the generated hash. The else on line 143 will never be reached because the hash is computed before we make the modification. The next time auditwheel runs, the hash will be different, since we've modified the binary.

As we now use the hash as the SONAME, which is embedded in the file, there is no way to perform the hashing after the modification. If we want to avoid modifying a library we've already patched before, we could add some information in the NOTE section maybe. But given that it doesn't seem that this feature works properly, maybe it's not worth fixing it and we could just switch to different way to make the SONAME unique.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant