Improve performance of os.walk()
#119169
Labels: performance (Performance or resource usage)
Comments
Upon closer inspection, I think I was wrong. Symlinks to directories where |
barneygale
added a commit
to barneygale/cpython
that referenced
this issue
May 19, 2024
Handle errors from `os.scandir()` and `ScandirIterator` similarly, which lets us loop over directory entries with `for`. In top-down mode, call `os.path.join()` at most once per iteration.
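A minimal sketch of the idea in this commit message, not the actual CPython change. The helper name `scan_directory` is hypothetical, and building child paths from an `os.path.join(top, "")` prefix is only one assumed way to join once per directory.

```python
import os


def scan_directory(top, onerror=None):
    """Hypothetical helper: list one directory the way a walk step might.

    Returns (dirpaths, filepaths), or None if the directory could not be read.
    """
    dirpaths, filepaths = [], []
    # Join once: os.path.join(top, "") appends a trailing separator if needed,
    # so child paths can be built by plain string concatenation.
    prefix = os.path.join(top, "")
    try:
        with os.scandir(top) as entries:
            # A plain for loop: an OSError raised while advancing the iterator
            # reaches the same handler as an OSError raised by os.scandir().
            for entry in entries:
                try:
                    is_dir = entry.is_dir()
                except OSError:
                    # Treat an entry we cannot examine as a non-directory.
                    is_dir = False
                (dirpaths if is_dir else filepaths).append(prefix + entry.name)
    except OSError as error:
        if onerror is not None:
            onerror(error)
        return None
    return dirpaths, filepaths
```

Because both failure modes share one `except` clause, the caller sees a single outcome for an unreadable directory: the optional `onerror` callback fires and the directory is skipped.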
barneygale
added a commit
to barneygale/cpython
that referenced
this issue
May 23, 2024
…wn=False)` In `os.walk(topdown=False)`, don't bother reversing `walk_dirs`. This means that sibling directories are visited in a different order, but 1) that order is arbitrary and comes from `os.scandir()`, and 2) unlike in top-down mode, users can't influence which directories are visited or in what order. This change caused `test_walk_bottom_up` to fail. I think this test made assertions that were too specific and relied on `os.scandir()` returning things in a specific order, and the test code is pretty hard to understand once you get into the details. I've replaced it with a version of the same test from `test_pathlib_abc.py`.
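One way to test bottom-up traversal without depending on sibling order is to assert only the invariant that matters: each walked subdirectory is yielded before its parent. The sketch below is illustrative and is not the replacement test from `test_pathlib_abc.py`; `children_before_parents` is a hypothetical name, and the check assumes every directory under `root` is readable.

```python
import os


def children_before_parents(root):
    """Return True if every walked subdirectory is yielded before its parent."""
    seen = set()
    for dirpath, dirnames, _filenames in os.walk(root, topdown=False):
        for name in dirnames:
            child = os.path.join(dirpath, name)
            # Symlinked directories are listed but not walked by default,
            # so they are exempt from the ordering check.
            if not os.path.islink(child) and child not in seen:
                return False
        seen.add(dirpath)
    return True
```

A check like this stays valid whatever order `os.scandir()` happens to return siblings in.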
barneygale
added a commit
to barneygale/cpython
that referenced
this issue
May 26, 2024
For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call through to `os.walk()`. Symlink handling is a little different between the two `walk()` implementations when `followlinks=False`. In `pathlib` it means never following symlinks, not even for distinguishing between files and directories. In `os` it means never *walking* into symlinks, including any symlinks created by the user between iterations. We smooth over these differences with a private sentinel, `os._walk_symlinks_as_files`, that enables the pathlib behaviour.
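The difference in classification can be illustrated with `os.DirEntry.is_dir()`, whose `follow_symlinks` argument decides whether a symlink to a directory counts as a directory. This is only a sketch of the two behaviours described above: `classify` is a hypothetical helper, and it does not use the private sentinel.

```python
import os


def classify(top, follow_symlinks):
    """Hypothetical helper: split the entries of `top` into (dirs, nondirs)."""
    dirs, nondirs = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # follow_symlinks=True: a symlink to a directory counts as a
            # directory (how os.walk() lists it, even if it won't descend).
            # follow_symlinks=False: the same symlink is grouped with files
            # (the pathlib behaviour described above).
            if entry.is_dir(follow_symlinks=follow_symlinks):
                dirs.append(entry.name)
            else:
                nondirs.append(entry.name)
    return sorted(dirs), sorted(nondirs)
```

For a directory containing a real directory `real`, a symlink `link` pointing at it, and a file `f.txt`, the first mode reports `link` among the directories and the second reports it among the non-directories.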
barneygale
added a commit
that referenced
this issue
May 29, 2024
For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call `os.walk()`.
miss-islington
pushed a commit
to miss-islington/cpython
that referenced
this issue
May 29, 2024
…ythonGH-119573) For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call `os.walk()`. (cherry picked from commit 7ff61f5) Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale
added a commit
that referenced
this issue
May 29, 2024
…H-119573) (#119750) GH-119169: Implement `pathlib.Path.walk()` using `os.walk()` (GH-119573) For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call `os.walk()`. (cherry picked from commit 7ff61f5) Co-authored-by: Barney Gale <barney.gale@gmail.com>
barneygale
added a commit
to barneygale/cpython
that referenced
this issue
Jul 6, 2024
Handle "disappearing" files as in `walk()`: add them to the `nondirs` list rather than omitting them entirely.
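A small sketch of that policy, assuming entries that behave like `os.DirEntry` objects (a `name` attribute and an `is_dir()` method that may raise `OSError`). The helper name `split_entries` is hypothetical and this is not the code from the commit.

```python
def split_entries(entries):
    """Hypothetical helper: split DirEntry-like objects into (dirs, nondirs)."""
    dirs, nondirs = [], []
    for entry in entries:
        try:
            is_dir = entry.is_dir()
        except OSError:
            # The entry vanished or cannot be examined: keep its name,
            # but report it as a non-directory instead of dropping it.
            is_dir = False
        (dirs if is_dir else nondirs).append(entry.name)
    return dirs, nondirs
```

Keeping the name in `nondirs` means the caller still sees every name the directory listing returned, even if the file was removed a moment later.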
barneygale
added a commit
to barneygale/cpython
that referenced
this issue
Jul 6, 2024
Add entries to the stack while iterating over `os.scandir()` results, rather than afterwards. This removes the need for an `entries` list and some zipping.
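The approach can be sketched with an explicit stack that holds both paths still to be scanned and finished results waiting to be yielded. This is an assumption-laden illustration rather than the commit's code: `walk_bottom_up` is a hypothetical name, errors raised while iterating are left to propagate for brevity, and symlinked directories are listed but never entered.

```python
import os


def walk_bottom_up(top):
    """Hypothetical sketch: yield (dirpath, dirnames, filenames), children first."""
    # Stack items are either a path still to scan (str) or a finished
    # (dirpath, dirnames, filenames) tuple waiting to be yielded.
    stack = [top]
    while stack:
        item = stack.pop()
        if isinstance(item, tuple):
            yield item
            continue
        try:
            entries = os.scandir(item)
        except OSError:
            continue  # Unreadable directory: skip it and move on.
        dirnames, filenames = [], []
        # Push the parent's result first so it is popped after its children.
        stack.append((item, dirnames, filenames))
        with entries:
            for entry in entries:
                try:
                    # Both checks share one exception handler.
                    is_dir = entry.is_dir()
                    descend = is_dir and not entry.is_symlink()
                except OSError:
                    is_dir = descend = False
                if is_dir:
                    dirnames.append(entry.name)
                else:
                    filenames.append(entry.name)
                if descend:
                    # Pushed during iteration, not in a second pass.
                    stack.append(entry.path)
```

Since subdirectories are pushed as they are discovered, no intermediate `entries` list is needed, and the parent's tuple sits below its children on the stack until they have all been yielded.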
This was referenced Jul 6, 2024
There are too many open PRs. Are they independent? Alternative? What should I review? |
barneygale
added a commit
that referenced
this issue
Jul 8, 2024
Handle errors from `os.scandir()` and `ScandirIterator` similarly, which lets us loop over directory entries with `for`.
noahbkim
pushed a commit
to hudson-trading/cpython
that referenced
this issue
Jul 11, 2024
…ython#119573) For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call `os.walk()`.
noahbkim
pushed a commit
to hudson-trading/cpython
that referenced
this issue
Jul 11, 2024
Handle errors from `os.scandir()` and `ScandirIterator` similarly, which lets us loop over directory entries with `for`.
estyxx
pushed a commit
to estyxx/cpython
that referenced
this issue
Jul 17, 2024
…ython#119573) For silly reasons, pathlib's generic implementation of `walk()` currently resides in `glob._Globber`. This commit moves it into `pathlib._abc.PathBase.walk()` where it really belongs, and makes `pathlib.Path.walk()` call `os.walk()`.
estyxx
pushed a commit
to estyxx/cpython
that referenced
this issue
Jul 17, 2024
Handle errors from `os.scandir()` and `ScandirIterator` similarly, which lets us loop over directory entries with `for`.
There are a couple of minor performance improvements possible in `os.walk()`:
- We don't need to call `next()` explicitly on the `os.scandir` iterator, given we handle exceptions from `next()` like exceptions from `scandir()` itself, i.e. by ignoring the problematic directory and moving on. We can use a `for` loop like filthy casuals.
- In bottom-up mode, we can handle exceptions from `entry.is_symlink()` in the same block as those from `entry.is_dir()`, which avoids a few temporary variables.
- In top-down mode, we can call `os.path.join()` once on a parent directory rather than for each child path.
Linked PRs
- `os.walk(topdown=False)` #119186
- `os.[f]walk(topdown=False)` #119473
- `pathlib.Path.walk()` using `os.walk()` #119573
- `pathlib.Path.walk()` using `os.walk()` (GH-119573) #119750
- `os.walk(topdown=True)` #121431
- `os.fwalk()` exception handling #121432
- `os.fwalk(topdown=False)` #121433
- `os.walk()` exception handling #121435