[3.10] gh-102950: Implement PEP 706 – Filter for tarfile.extractall (GH-102953) #104128

Merged
merged 7 commits into from
May 10, 2023
Changes from 1 commit
7 changes: 4 additions & 3 deletions Doc/library/shutil.rst
@@ -634,8 +634,9 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
registered for that extension. In case none is found,
a :exc:`ValueError` is raised.

The keyword-only *filter* argument is passed to the underlying unpacking
function. For zip files, *filter* is not accepted.
The keyword-only *filter* argument, which was added in Python 3.11.4,
is passed to the underlying unpacking function.
For zip files, *filter* is not accepted.
For tar files, it is recommended to set it to ``'data'``,
unless using features specific to tar and UNIX-like filesystems.
(See :ref:`tarfile-extraction-filter` for details.)
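
A minimal sketch of the recommendation above for tar archives. The helper name ``unpack_tar_safely`` is hypothetical and not part of :mod:`shutil`; the sketch assumes that whenever :mod:`tarfile` provides ``data_filter``, :func:`shutil.unpack_archive` also accepts the keyword-only *filter* argument. It must not be used for zip files, where *filter* is not accepted.

```python
import os
import shutil
import tarfile
import tempfile


def unpack_tar_safely(archive_path, dest_dir):
    """Unpack a tar archive, using the 'data' filter when it is available.

    Hypothetical helper for illustration; not part of shutil.
    """
    if hasattr(tarfile, "data_filter"):
        # Assumption: shutil.unpack_archive gained *filter* in the same
        # release in which tarfile gained extraction filters.
        shutil.unpack_archive(archive_path, dest_dir, filter="data")
    else:
        # Older Python without filter support: the archive is fully trusted.
        shutil.unpack_archive(archive_path, dest_dir)


# Demo: build a small archive in a temporary directory and unpack it.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.txt")
    with open(src, "w") as f:
        f.write("hello\n")
    archive = os.path.join(tmp, "sample.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="hello.txt")
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    unpack_tar_safely(archive, out_dir)
    unpacked = sorted(os.listdir(out_dir))
```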
@@ -654,7 +655,7 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
.. versionchanged:: 3.7
Accepts a :term:`path-like object` for *filename* and *extract_dir*.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4
Added the *filter* argument.

.. function:: register_unpack_format(name, extensions, function[, extra_args[, description]])
61 changes: 32 additions & 29 deletions Doc/library/tarfile.rst
@@ -36,13 +36,6 @@ Some facts and figures:
.. versionchanged:: 3.3
Added support for :mod:`lzma` compression.

.. versionchanged:: 3.12
Archives are extracted using a :ref:`filter <tarfile-extraction-filter>`,
which makes it possible to either limit surprising/dangerous features,
or to acknowledge that they are expected and the archive is fully trusted.
By default, archives are fully trusted, but this default is deprecated
and slated to change in Python 3.14.


.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

@@ -437,8 +430,8 @@ be finalized; only the internally used file object will be closed. See the
are used to set the owner/group for the extracted files. Otherwise, the named
values from the tarfile are used.

The *filter* argument specifies how ``members`` are modified or rejected
before extraction.
The *filter* argument, which was added in Python 3.11.4, specifies how
``members`` are modified or rejected before extraction.
See :ref:`tarfile-extraction-filter` for details.
It is recommended to set this explicitly depending on which *tar* features
you need to support.
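
As a rough illustration of setting *filter* explicitly, the sketch below builds a small archive containing one ordinary member and one member whose name points outside the destination, then extracts it with ``filter='data'``. The member names and the helper ``build_archive`` are made up for this example; the extraction is skipped on Python versions that lack extraction filters.

```python
import io
import os
import tarfile
import tempfile


def build_archive(path):
    """Write a tar archive with one normal member and one that escapes upward.

    Hypothetical helper; the member names are chosen for illustration only.
    """
    with tarfile.open(path, "w") as tar:
        for name in ("ok.txt", "../escape.txt"):
            payload = b"data\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "sample.tar")
    build_archive(archive)
    dest = os.path.join(tmp, "dest")
    os.makedirs(dest)
    rejected = False
    if hasattr(tarfile, "data_filter"):
        with tarfile.open(archive) as tar:
            try:
                tar.extractall(dest, filter="data")
            except tarfile.FilterError:
                # Raised for the member that would land outside *dest*.
                rejected = True
    # The escaping member must never have been written next to *dest*.
    escaped = os.path.exists(os.path.join(tmp, "escape.txt"))
```

With the default ``errorlevel``, a member refused by the filter raises an exception rather than being skipped silently, so callers that want to continue past bad members need to handle that case themselves.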
@@ -459,7 +452,7 @@ be finalized; only the internally used file object will be closed. See the
.. versionchanged:: 3.6
The *path* parameter accepts a :term:`path-like object`.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4
Added the *filter* parameter.


@@ -495,7 +488,7 @@ be finalized; only the internally used file object will be closed. See the
.. versionchanged:: 3.6
The *path* parameter accepts a :term:`path-like object`.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4
Added the *filter* parameter.


@@ -533,7 +526,7 @@ be finalized; only the internally used file object will be closed. See the

.. attribute:: TarFile.extraction_filter

.. versionadded:: 3.12
.. versionadded:: 3.11.4

The :ref:`extraction filter <tarfile-extraction-filter>` used
as a default for the *filter* argument of :meth:`~TarFile.extract`
@@ -544,10 +537,12 @@ be finalized; only the internally used file object will be closed. See the
argument to :meth:`~TarFile.extract`.

If ``extraction_filter`` is ``None`` (the default),
calling an extraction method without a *filter* argument will raise a
``DeprecationWarning``,
and fall back to the :func:`fully_trusted <fully_trusted_filter>` filter,
whose dangerous behavior matches previous versions of Python.
calling an extraction method without a *filter* argument will
use the :func:`fully_trusted <fully_trusted_filter>` filter for
compatibility with previous Python versions.

In Python 3.12+, leaving ``extraction_filter=None`` will emit a
``DeprecationWarning``.

In Python 3.14+, leaving ``extraction_filter=None`` will cause
extraction methods to use the :func:`data <data_filter>` filter by default.
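
One possible way to opt in to the safer default early is a subclass that overrides ``extraction_filter``. This is only a sketch: the class name ``DataFilteredTarFile`` is hypothetical, and the attribute is set only when the running Python provides ``tarfile.data_filter``.

```python
import tarfile


class DataFilteredTarFile(tarfile.TarFile):
    """Hypothetical subclass whose extraction methods default to 'data'."""

    if hasattr(tarfile, "data_filter"):
        # staticmethod() keeps the plain function from being bound as a
        # method when it is looked up on an instance.
        extraction_filter = staticmethod(tarfile.data_filter)
```

Archives opened through ``DataFilteredTarFile.open(...)`` would then use the ``data`` filter whenever an extraction method is called without an explicit *filter* argument.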
@@ -652,6 +647,11 @@ Different :class:`TarInfo` methods handle ``None`` differently:
- :meth:`~TarFile.addfile` will fail.
- :meth:`~TarFile.list` will print a placeholder string.


.. versionchanged:: 3.11.4
Added :meth:`~TarInfo.replace` and handling of ``None``.
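
A short sketch of how ``replace()`` and ``None`` values fit together. The member name and attribute values are invented for illustration, and the call is guarded so that the snippet also runs on versions without :meth:`~TarInfo.replace`.

```python
import tarfile

original = tarfile.TarInfo("example.txt")
original.uname = "alice"
original.mtime = 1_700_000_000

if hasattr(tarfile.TarInfo, "replace"):
    # replace() returns a new object and leaves the original untouched.
    # None means "skip applying this attribute" during extraction.
    anonymous = original.replace(uname=None, gname=None, mtime=None)
else:
    # Older Python without TarInfo.replace().
    anonymous = None
```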


.. class:: TarInfo(name="")

Create a :class:`TarInfo` object.
@@ -700,7 +700,7 @@ A ``TarInfo`` object has the following public data attributes:
Time of last modification in seconds since the :ref:`epoch <epoch>`,
as in :attr:`os.stat_result.st_mtime`.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4

Can be set to ``None`` for :meth:`~TarFile.extract` and
:meth:`~TarFile.extractall`, causing extraction to skip applying this
@@ -711,7 +711,7 @@ A ``TarInfo`` object has the following public data attributes:

Permission bits, as for :func:`os.chmod`.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4

Can be set to ``None`` for :meth:`~TarFile.extract` and
:meth:`~TarFile.extractall`, causing extraction to skip applying this
@@ -738,7 +738,7 @@ A ``TarInfo`` object has the following public data attributes:

User ID of the user who originally stored this member.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4

Can be set to ``None`` for :meth:`~TarFile.extract` and
:meth:`~TarFile.extractall`, causing extraction to skip applying this
@@ -749,7 +749,7 @@ A ``TarInfo`` object has the following public data attributes:

Group ID of the user who originally stored this member.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4

Can be set to ``None`` for :meth:`~TarFile.extract` and
:meth:`~TarFile.extractall`, causing extraction to skip applying this
@@ -760,7 +760,7 @@ A ``TarInfo`` object has the following public data attributes:

User name.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4

Can be set to ``None`` for :meth:`~TarFile.extract` and
:meth:`~TarFile.extractall`, causing extraction to skip applying this
@@ -771,7 +771,7 @@ A ``TarInfo`` object has the following public data attributes:

Group name.

.. versionchanged:: 3.12
.. versionchanged:: 3.11.4

Can be set to ``None`` for :meth:`~TarFile.extract` and
:meth:`~TarFile.extractall`, causing extraction to skip applying this
@@ -786,7 +786,7 @@ A ``TarInfo`` object has the following public data attributes:
uid=..., gid=..., uname=..., gname=...,
deep=True)

.. versionadded:: 3.12
.. versionadded:: 3.11.4

Return a *new* copy of the :class:`!TarInfo` object with the given attributes
changed. For example, to return a ``TarInfo`` with the group name set to
@@ -851,7 +851,7 @@ A :class:`TarInfo` object also provides some convenient query methods:
Extraction filters
------------------

.. versionadded:: 3.12
.. versionadded:: 3.11.4

The *tar* format is designed to capture all details of a UNIX-like filesystem,
which makes it very powerful.
@@ -888,9 +888,10 @@ can be:

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

If that is also ``None`` (the default), raise a ``DeprecationWarning``,
and fall back to the ``'fully_trusted'`` filter, whose dangerous behavior
matches previous versions of Python.
If that is also ``None`` (the default), the ``'fully_trusted'``
filter will be used (for compatibility with earlier versions of Python).

In Python 3.12, the default will emit a ``DeprecationWarning``.

In Python 3.14, the ``'data'`` filter will become the default instead.
It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
@@ -1027,7 +1028,7 @@ Also note that:
Supporting older Python versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Extraction filters were added to Python 3.12, but may be backported to older
Extraction filters were added to Python 3.12, and are backported to older
versions as security updates.
To check whether the feature is available, use e.g.
``hasattr(tarfile, 'data_filter')`` rather than checking the Python version.
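
The feature check described above might be wrapped as follows. ``extraction_kwargs`` is a hypothetical helper name, and falling back to unfiltered extraction on older versions is just one possible policy; an application could equally refuse to extract.

```python
import io
import os
import tarfile
import tempfile


def extraction_kwargs():
    """Return keyword arguments for extractall(); hypothetical helper."""
    if hasattr(tarfile, "data_filter"):
        return {"filter": "data"}
    # No filter support: callers fall back to the old, fully trusting
    # behaviour (an assumption of this sketch, not a recommendation).
    return {}


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "sample.tar")
    payload = b"hello\n"
    with tarfile.open(archive, "w") as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    dest = os.path.join(tmp, "dest")
    os.makedirs(dest)
    with tarfile.open(archive) as tar:
        tar.extractall(dest, **extraction_kwargs())
    with open(os.path.join(dest, "hello.txt"), "rb") as f:
        extracted = f.read()
```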
@@ -1174,6 +1175,8 @@ Command-line options
Only string names are accepted (that is, ``fully_trusted``, ``tar``,
and ``data``).

.. versionadded:: 3.11.4

.. _tar-examples:

Examples
@@ -1183,7 +1186,7 @@ How to extract an entire tar archive to the current working directory::

import tarfile
tar = tarfile.open("sample.tar.gz")
tar.extractall(filter='data')
tar.extractall()
tar.close()

How to extract a subset of a tar archive with :meth:`TarFile.extractall` using
16 changes: 16 additions & 0 deletions Doc/whatsnew/3.10.rst
@@ -2332,3 +2332,19 @@ The deprecated :mod:`mailcap` module now refuses to inject unsafe text
text, it will warn and act as if a match was not found (or for test commands,
as if the test failed).
(Contributed by Petr Viktorin in :gh:`98966`.)

Notable Changes in 3.10.12
==========================

tarfile
-------

* The extraction methods in :mod:`tarfile`, and :func:`shutil.unpack_archive`,
have a new *filter* argument that allows limiting tar features that may be
surprising or dangerous, such as creating files outside the destination
directory.
See :ref:`tarfile-extraction-filter` for details.
In Python 3.12, use without the *filter* argument will show a
:exc:`DeprecationWarning`.
In Python 3.14, the default will switch to ``'data'``.
(Contributed by Petr Viktorin in :pep:`706`.)