pythongh-99726: Add 'fast' argument to os.[l]stat for faster calculation
When 'fast' is passed as True, only st_mode's type bits, st_size and st_mtime[_ns] are guaranteed to be set. Other fields may also be set for a given Python version on a given platform version, but this may change without warning if the operating system changes; Python will try to keep them stable.
This first implementation uses a new Windows API (GetFileInformationByName) that is significantly faster, provided the volume identifier is not required. Other optimizations may be added later.
zooba committed Nov 23, 2022
1 parent 55bad19 commit 9871983
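The commit message limits the st_mode guarantee to the file-type bits, so code consuming a fast result should classify entries with the stat.S_IS*() helpers and not read permission bits. The sketch below fakes such a result by building os.stat_result directly from a 10-item sequence with zeroed permission bits, so it runs on any interpreter; `describe_quick` is a hypothetical helper, not part of the os module.

```python
import os
import stat


def describe_quick(st):
    """Classify a stat result using only what fast=True guarantees.

    Only the file-type bits of st_mode are promised, so the S_IS*() helpers
    (which inspect stat.S_IFMT(st_mode)) are safe; permission bits are not
    consulted. st_size is also guaranteed.
    """
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        kind = "directory"
    elif stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "file"
    else:
        kind = "other"
    return kind, st.st_size


# Simulate a fast result: type bits set, permission bits left at zero.
# os.stat_result accepts a 10-item sequence:
# (mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime).
fake_fast = os.stat_result((stat.S_IFREG, 0, 0, 0, 0, 0, 123, 0, 0, 0))
print(describe_quick(fake_fast))        # ('file', 123)
print(stat.S_IMODE(fake_fast.st_mode))  # 0
```

The zero from stat.S_IMODE() is exactly the case the documentation warns about: under *fast* it carries no permission information.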
Showing 11 changed files with 375 additions and 55 deletions.
53 changes: 50 additions & 3 deletions Doc/library/os.rst
@@ -2175,7 +2175,7 @@ features:
Accepts a :term:`path-like object`.


-.. function:: lstat(path, *, dir_fd=None)
+.. function:: lstat(path, *, dir_fd=None, fast=False)

Perform the equivalent of an :c:func:`lstat` system call on the given path.
Similar to :func:`~os.stat`, but does not follow symbolic links. Return a
@@ -2184,8 +2184,15 @@ features:
On platforms that do not support symbolic links, this is an alias for
:func:`~os.stat`.

+Passing *fast* as ``True`` may omit some information on some platforms
+for the sake of performance. These omissions are not guaranteed (that is,
+the information may be returned anyway), and may change between Python
+releases without a deprecation period or due to operating system updates
+without warning. See :class:`stat_result` documentation for the fields
+that are guaranteed to be present under this option.
+
As of Python 3.3, this is equivalent to ``os.stat(path, dir_fd=dir_fd,
-follow_symlinks=False)``.
+follow_symlinks=False, fast=fast)``.

This function can also support :ref:`paths relative to directory descriptors
<dir_fd>`.
@@ -2209,6 +2216,9 @@ features:
Other kinds of reparse points are resolved by the operating system as
for :func:`~os.stat`.

+.. versionchanged:: 3.12
+Added the *fast* parameter.
+

.. function:: mkdir(path, mode=0o777, *, dir_fd=None)

@@ -2781,7 +2791,7 @@ features:
for :class:`bytes` paths on Windows.


-.. function:: stat(path, *, dir_fd=None, follow_symlinks=True)
+.. function:: stat(path, *, dir_fd=None, follow_symlinks=True, fast=False)

Get the status of a file or a file descriptor. Perform the equivalent of a
:c:func:`stat` system call on the given path. *path* may be specified as
@@ -2806,6 +2816,13 @@ features:
possible and call :func:`lstat` on the result. This does not apply to
dangling symlinks or junction points, which will raise the usual exceptions.

+Passing *fast* as ``True`` may omit some information on some platforms
+for the sake of performance. These omissions are not guaranteed (that is,
+the information may be returned anyway), and may change between Python
+releases without a deprecation period or due to operating system updates
+without warning. See :class:`stat_result` documentation for the fields
+that are guaranteed to be present under this option.
+
.. index:: module: stat

Example::
@@ -2838,19 +2855,32 @@ features:
returns the information for the original path as if
``follow_symlinks=False`` had been specified instead of raising an error.

+.. versionchanged:: 3.12
+Added the *fast* parameter.
+

.. class:: stat_result

Object whose attributes correspond roughly to the members of the
:c:type:`stat` structure. It is used for the result of :func:`os.stat`,
:func:`os.fstat` and :func:`os.lstat`.

+When the *fast* argument to these functions is passed ``True``, some
+information may be reduced or omitted. Those attributes that are
+guaranteed to be valid, and those currently known to be omitted, are
+marked in the documentation below. If not specified and you depend on
+that field, explicitly pass *fast* as ``False`` to ensure it is
+calculated.
+
Attributes:

.. attribute:: st_mode

File mode: file type and file mode bits (permissions).

+When *fast* is ``True``, only the file type bits are guaranteed
+to be valid (the mode bits may be zero).
+
.. attribute:: st_ino

Platform dependent, but if non-zero, uniquely identifies the
@@ -2865,6 +2895,8 @@ features:

Identifier of the device on which this file resides.

+On Windows, when *fast* is ``True``, this may be zero.
+
.. attribute:: st_nlink

Number of hard links.
@@ -2883,6 +2915,8 @@ features:
The size of a symbolic link is the length of the pathname it contains,
without a terminating null byte.

+This field is guaranteed to be filled when specifying *fast*.
+
Timestamps:

.. attribute:: st_atime
@@ -2893,6 +2927,8 @@ features:

Time of most recent content modification expressed in seconds.

+This field is guaranteed to be filled when specifying *fast*.
+
.. attribute:: st_ctime

Platform dependent:
@@ -2909,6 +2945,9 @@ features:
Time of most recent content modification expressed in nanoseconds as an
integer.

+This field is guaranteed to be filled when specifying *fast*, subject
+to the note below.
+
.. attribute:: st_ctime_ns

Platform dependent:
@@ -2998,12 +3037,16 @@ features:
:c:func:`GetFileInformationByHandle`. See the ``FILE_ATTRIBUTE_*``
constants in the :mod:`stat` module.

+This field is guaranteed to be filled when specifying *fast*.
+
.. attribute:: st_reparse_tag

When :attr:`st_file_attributes` has the ``FILE_ATTRIBUTE_REPARSE_POINT``
set, this field contains the tag identifying the type of reparse point.
See the ``IO_REPARSE_TAG_*`` constants in the :mod:`stat` module.

+This field is guaranteed to be filled when specifying *fast*.
+
The standard module :mod:`stat` defines functions and constants that are
useful for extracting information from a :c:type:`stat` structure. (On
Windows, some items are filled with dummy values.)
@@ -3039,6 +3082,10 @@ features:
files as :const:`S_IFCHR`, :const:`S_IFIFO` or :const:`S_IFBLK`
as appropriate.

+.. versionchanged:: 3.12
+Added the *fast* argument and defined the minimum set of returned
+fields.
+
.. function:: statvfs(path)

Perform a :c:func:`statvfs` system call on the given path. The return value is
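The :class:`stat_result` notes above advise relying only on the guaranteed fields under *fast* and requesting complete information otherwise. The sketch below follows that split: change detection uses st_size and st_mtime_ns, while file identity, which needs st_ino and st_dev, uses the default (complete) stat. `ChangeTracker`, `same_file` and `_stat_fast` are hypothetical helpers, and the TypeError fallback assumes that interpreters without the *fast* keyword reject it that way.

```python
import os


def _stat_fast(path):
    # Assumption: only interpreters that include this change accept the
    # *fast* keyword; others raise TypeError, so fall back to a full stat.
    try:
        return os.stat(path, fast=True)
    except TypeError:
        return os.stat(path)


class ChangeTracker:
    """Remember (st_size, st_mtime_ns) per path: the fields that stay
    guaranteed when *fast* is requested."""

    def __init__(self):
        self._seen = {}

    def changed(self, path):
        """Return True if *path* looks different from the last check
        (always True the first time a path is seen)."""
        st = _stat_fast(path)
        key = (st.st_size, st.st_mtime_ns)
        previous = self._seen.get(path)
        self._seen[path] = key
        return previous != key


def same_file(a, b):
    # st_ino and st_dev are not guaranteed under fast=True (st_dev may be
    # zero on Windows), so identity checks use the default, complete stat.
    return os.path.samestat(os.stat(a), os.stat(b))
```

Like any size-and-mtime heuristic, the tracker can miss a rewrite that keeps the same size and lands within the filesystem's timestamp resolution.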
77 changes: 77 additions & 0 deletions Include/internal/pycore_fileutils_windows.h
@@ -0,0 +1,77 @@
#ifndef Py_INTERNAL_FILEUTILS_WINDOWS_H
#define Py_INTERNAL_FILEUTILS_WINDOWS_H
#ifdef __cplusplus
extern "C" {
#endif

#ifndef Py_BUILD_CORE
# error "Py_BUILD_CORE must be defined to include this header"
#endif

#ifdef MS_WINDOWS

#if !defined(NTDDI_WIN10_NI) || !(NTDDI_VERSION >= NTDDI_WIN10_NI)
typedef struct _FILE_STAT_BASIC_INFORMATION {
    LARGE_INTEGER FileId;
    LARGE_INTEGER CreationTime;
    LARGE_INTEGER LastAccessTime;
    LARGE_INTEGER LastWriteTime;
    LARGE_INTEGER ChangeTime;
    LARGE_INTEGER AllocationSize;
    LARGE_INTEGER EndOfFile;
    ULONG FileAttributes;
    ULONG ReparseTag;
    ULONG NumberOfLinks;
    ULONG DeviceType;
    ULONG DeviceCharacteristics;
} FILE_STAT_BASIC_INFORMATION;

typedef enum _FILE_INFO_BY_NAME_CLASS {
    FileStatByNameInfo,
    FileStatLxByNameInfo,
    FileCaseSensitiveByNameInfo,
    FileStatBasicByNameInfo,
    MaximumFileInfoByNameClass
} FILE_INFO_BY_NAME_CLASS;
#endif

typedef BOOL (WINAPI *PGetFileInformationByName)(
    PCWSTR FileName,
    FILE_INFO_BY_NAME_CLASS FileInformationClass,
    PVOID FileInfoBuffer,
    ULONG FileInfoBufferSize
);

static inline BOOL GetFileInformationByName(
    PCWSTR FileName,
    FILE_INFO_BY_NAME_CLASS FileInformationClass,
    PVOID FileInfoBuffer,
    ULONG FileInfoBufferSize
) {
    static PGetFileInformationByName GetFileInformationByName = NULL;
    static int GetFileInformationByName_init = -1;

    if (GetFileInformationByName_init < 0) {
        HMODULE hMod = LoadLibraryW(L"api-ms-win-core-file-l2-1-4");
        GetFileInformationByName_init = 0;
        if (hMod) {
            GetFileInformationByName = (PGetFileInformationByName)GetProcAddress(
                hMod, "GetFileInformationByName");
            if (GetFileInformationByName) {
                GetFileInformationByName_init = 1;
            } else {
                FreeLibrary(hMod);
            }
        }
    }

    if (GetFileInformationByName_init <= 0) {
        SetLastError(ERROR_NOT_SUPPORTED);
        return FALSE;
    }
    return GetFileInformationByName(FileName, FileInformationClass, FileInfoBuffer, FileInfoBufferSize);
}

#endif

#endif
1 change: 1 addition & 0 deletions Include/internal/pycore_global_objects_fini_generated.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Include/internal/pycore_global_strings.h
@@ -389,6 +389,7 @@ struct _Py_global_strings {
STRUCT_FOR_ID(false)
STRUCT_FOR_ID(family)
STRUCT_FOR_ID(fanout)
+STRUCT_FOR_ID(fast)
STRUCT_FOR_ID(fd)
STRUCT_FOR_ID(fd2)
STRUCT_FOR_ID(fdel)
1 change: 1 addition & 0 deletions Include/internal/pycore_runtime_init_generated.h

2 changes: 2 additions & 0 deletions Include/internal/pycore_unicodeobject_generated.h

12 changes: 12 additions & 0 deletions Lib/test/test_os.py
@@ -613,6 +613,18 @@ def test_stat_result_pickle(self):
        unpickled = pickle.loads(p)
        self.assertEqual(result, unpickled)

+    def test_stat_result_fast(self):
+        # Minimum guaranteed fields when requesting incomplete info
+        result_1 = os.stat(self.fname, fast=True)
+        result_2 = os.stat(self.fname, fast=False)
+        result_3 = os.stat(self.fname)
+        self.assertEqual(stat.S_IFMT(result_1.st_mode),
+                         stat.S_IFMT(result_2.st_mode))
+        self.assertEqual(result_1.st_size, result_2.st_size)
+        self.assertEqual(result_1.st_mtime, result_2.st_mtime)
+        # Ensure the default matches fast=False
+        self.assertEqual(result_2, result_3)
+
    @unittest.skipUnless(hasattr(os, 'statvfs'), 'test needs os.statvfs()')
    def test_statvfs_attributes(self):
        result = os.statvfs(self.fname)
@@ -0,0 +1,2 @@
Adds `fast` argument to :func:`os.stat` and :func:`os.lstat` to enable
performance optimizations by skipping some fields in the result.