Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] conan.tools.files.unzip() tries to keep files original owner/group and silently fail #16915

Closed
pinam45 opened this issue Aug 31, 2024 · 5 comments · Fixed by #16918
Closed
Assignees

Comments

@pinam45
Copy link

pinam45 commented Aug 31, 2024

Describe the bug

Tested under real Linux, but I only have a WSL while writing this:

$ uname -a
Linux ***** 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 GNU/Linux

$ conan --version
Conan version 2.7.0

$ conan profile detect
detect_api: Found cc=gcc- 14.2.0
detect_api: gcc>=5, using the major as version
detect_api: gcc C++ standard library: libstdc++11

Detected profile:
[settings]
arch=x86_64
build_type=Release
compiler=gcc
compiler.cppstd=gnu17
compiler.libcxx=libstdc++11
compiler.version=14
os=Linux

.tar files contain user/group information for files, by default they are not kept when you extract the files (with for example tar -xvf libpng.tar.gz), but you can add --same-owner to keep them. Most of the time the user/group does not exist on your machine and it fails to change the owner:

$ wget https://sourceforge.net/projects/libpng/files/libpng16/1.6.34/libpng-1.6.34.tar.gz/download -O libpng.tar.gz
...
$ tar --same-owner -xvf libpng.tar.gz -C tmp
libpng-1.6.34/
libpng-1.6.34/pngwio.c
tar: libpng-1.6.34/pngwio.c: Cannot change ownership to uid 1004, gid 5101: Operation not permitted
libpng-1.6.34/libpng.3
tar: libpng-1.6.34/libpng.3: Cannot change ownership to uid 1004, gid 5101: Operation not permitted
...

I expected conan.tools.files.unzip() to have the same default behavior, not keeping the user/group stored in the tar. But it seems conan is trying to apply the original user/group, but with silent failing.

as maxime:

$ conan source .
conanfile.py (name/0.0.1): Calling source() in /mnt/d/tmp/conan_test/external
conanfile.py (name/0.0.1): Unzipping libpng.tar.gz to .

$ ls -lah external/README
-rw-r--r-- 1 maxime maxime 12K Sep 29  2017 external/README

No error messages, files in maxime/maxime, as expected.

as root:

# conan source .
conanfile.py (name/0.0.1): Calling source() in /mnt/d/tmp/conan_test/external
conanfile.py (name/0.0.1): Unzipping libpng.tar.gz to .

# ls -lah external/README
-rw-r--r-- 1 1004 5101 12K Sep 29  2017 external/README

Now I have files in a user that doesn't exist on my machine, quite unexpected.

You don't need superuser access to encounter this issue, in a real more complex setup, when running my build outside a container everything is fine (the "chown" silently fail).
When I run my builds in a docker rootless container (the uid/gid of root in the container match the uid/gid of my user outside of the container), the build run in root inside the container to be in my user outside, and to keep my conan cache between builds I mount my ~/.conan2/p. Here the original owner of the files is kept (the "chown" is successful in "fake" root), files are in my conan cache in 1004/5101 and I end up with conan remove failing as it does not have permission to remove those files.

Seems to me like an unwanted behavior, but if it is the indented behavior it would be nice to describe it in the doc or FAQ.

If you are in a similar situation, I solved my problem by changing the owner and group of the files after the unzip call:

        uid=os.stat(self.source_folder).st_uid
        gid=os.stat(self.source_folder).st_gid
        for root, dirs, files in os.walk(self.source_folder):
            for dir in dirs:
                os.chown(os.path.join(root, dir), uid, gid)
            for file in files:
                os.chown(os.path.join(root, file), uid, gid)

How to reproduce it

minimal conanfile.py used to reproduce:

from conan import ConanFile
from conan.tools.files import download, unzip, rm

class ConanRecipe(ConanFile):
    # info
    name = "name"
    version = "0.0.1"
    package_type = "library"

    def source(self):
        # remove source folder
        rm(self, "*", self.source_folder, recursive=True)

        # download archive
        archive_url="https://sourceforge.net/projects/libpng/files/libpng16/1.6.34/libpng-1.6.34.tar.gz/download"
        archive_sha1="5c4f8984ff4dd8a150effa32dc9166b32abe0622"
        archive_filename="libpng.tar.gz"
        download(self, url=archive_url, filename=archive_filename, sha1=archive_sha1)

        # uncompress archive
        unzip(self, filename=archive_filename, destination=".", strip_root=True)

        # remove archive
        rm(self, archive_filename, self.export_sources_folder)

    def layout(self):
        self.folders.source="external"

as maxime:

$ conan source .
conanfile.py (name/0.0.1): Calling source() in /mnt/d/tmp/conan_test/external
conanfile.py (name/0.0.1): Unzipping libpng.tar.gz to .

$ ls -lah external/README
-rw-r--r-- 1 maxime maxime 12K Sep 29  2017 external/README

as root:

# conan source .
conanfile.py (name/0.0.1): Calling source() in /mnt/d/tmp/conan_test/external
conanfile.py (name/0.0.1): Unzipping libpng.tar.gz to .

# ls -lah external/README
-rw-r--r-- 1 1004 5101 12K Sep 29  2017 external/README
@memsharded memsharded self-assigned this Aug 31, 2024
@memsharded
Copy link
Member

Hi @pinam45

Thanks for your feedback.

We are aware of this behavior, and at the moment we don't consider it a bug.
In this regard, Conan is not explicitly defining this behavior either, it is just using the Python tar extractall default behavior, basically the implementation is just:

    with tarfile.TarFile.open(filename, 'r:*') as tarredgzippedFile:
        if not pattern and not strip_root:
            tarredgzippedFile.extractall(destination)

This is not generally an issue because:

  • Conan should probably never be ran as root. Conan is designed to not need root permissions for anything (except the system package manager like apt and for those case, a explicit conf to enable sudo is required, to make this a explicit opt-in by users)
  • This hasn't been an issue in the download & unzip of source for +1500 packages in ConanCenter, as in most cases it doesn't make things fail.
  • The Conan cache is not designed to be reused across different machines, it is designed to work in a given machine, and at the moment not concurrently.

Also we have considered this, and the problem with this kind of behavior is that there is no approach that makes everyone happy. I am very sure that if we changed this to remove the owner/group automatically, there would be some users that would complained because they rely on this behavior for some reasons.

So in any case we could consider adding an opt-in that would implement such behavior. But not sure if it is worth the effort, if this is important, it would be just a matter of calling self.run(f"tar -xvf {filename}") instead of unzip(self, ... so not a big deal.
Would you be interested in contributing such a feature?

@pinam45
Copy link
Author

pinam45 commented Sep 1, 2024

Thanks for the answer, I am not much of a Python dev, but why not try to add a param to conan.tools.files.unzip() similar to keep_permissions but for ownership.

I searched around to better understand the behavior of TarFile.extractall, especially the filter parm:

TarFile.extractall(path='.', members=None, *, numeric_owner=False, filter=None)

the doc explain:

The filter argument specifies how members are modified or rejected before extraction. See Extraction filters for details. It is recommended to set this explicitly depending on which tar features you need to support.

Seems like the data filter is what I want (the 2 last points) and add some safety:

Implements the 'data' filter. In addition to what tar_filter does:

  • Refuse to extract links (hard or soft) that link to absolute paths, or ones that link outside the destination.
  • Refuse to extract device files (including pipes). This raises SpecialFileError.
  • For regular files, including hard links:
  • For other files (directories), set mode to None, so that extraction methods skip applying permission bits.
  • Set user and group info (uid, gid, uname, gname) to None, so that extraction methods skip setting it.

But what is the filter currently used by conan? None, well back to Extraction filters:

None (default): Use TarFile.extraction_filter.

and from TarFile.extraction_filter:

If extraction_filter is None (the default), calling an extraction method without a filter argument will raise a DeprecationWarning, and fall back to the fully_trusted filter, whose dangerous behavior matches previous versions of Python.

The fully_trusted_filter does basically nothing, which explain the "dangerous behavior" part:

The tar format is designed to capture all details of a UNIX-like filesystem, which makes it very powerful. Unfortunately, the features make it easy to create tar files that have unintended – and possibly malicious – effects when extracted. For example, extracting a tar file can overwrite arbitrary files in various ways (e.g. by using absolute paths, .. path components, or symlinks that affect later members).

In most cases, the full functionality is not needed. Therefore, tarfile supports extraction filters: a mechanism to limit functionality, and thus mitigate some of the security issues.

Well, quite the old school choice from Python, but back to Extraction filters again:

In Python 3.14, the 'data' filter will become the default instead. It’s possible to switch earlier; see TarFile.extraction_filter.

And of course:

$ python3 --version
Python 3.12.5

So the data filter with the behavior I expected will be used without any modification on the conan side once Python 3.14 is out.

Knowing this, I think it is not worth making changes to the current conan code, what is your opinion?

@memsharded
Copy link
Member

Knowing this, I think it is not worth making changes to the current conan code, what is your opinion?

We are aware that Python 3.14 is changing some behavior, and we need to prepare for that, we have already started this conversation. We are discussing at the moment between reducing the risk of breaking but keeping the current a bit less safe behavior, or change it and increase the possibility of breaking to some users.

The truth is that I thought that there was something in Python lib that could be used, but you are right that it seems there is nothing built-in that is easily used. So it seems you are right, and the best would be at least to wait until this filter change is done, and re-think from there if that is the case.

@memsharded
Copy link
Member

This would be the PR we will handle this: #16918

@memsharded
Copy link
Member

This has been closed by #16918

Conan 2.8 will make available the tools.files.unzip:filter=data|tar|fully_trusted conf, so the default Python 3.14 behavior will be anticipated. The recommendation is to define tools.files.unzip:filter=data as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants