GH-78079: Fix UNC device path root normalization in pathlib #102003
We no longer add a root to device paths such as `//./PhysicalDrive0`, `//?/BootPartition` and `//./c:` while normalizing. We also avoid adding a root to incomplete UNC share paths, like `//`, `//a` and `//a/`.
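As a rough illustration of the rule described above, here is a minimal sketch of deciding whether a drive string should get an implied root. The function name `implied_root` is hypothetical and this is not the pathlib implementation; it assumes the drive has already been split from the rest of the path and that separators are normalized to `/`.

```python
def implied_root(drv, sep="/"):
    """Return `sep` if a root should follow the drive `drv`, else ''.

    Hypothetical helper for illustration only; assumes `drv` is the
    drive portion of an already-split path.
    """
    if drv.startswith(sep) and not drv.endswith(sep):
        parts = drv.split(sep)
        if len(parts) == 4 and parts[2] not in ("?", "."):
            # Complete UNC share, e.g. //server/share
            return sep
        if len(parts) == 6:
            # Extended UNC share, e.g. //?/unc/server/share
            return sep
    # Device paths and incomplete UNC shares get no root.
    return ""


for drive in ("//server/share", "//?/unc/server/share",
              "//./PhysicalDrive0", "//?/BootPartition", "//./c:",
              "//", "//a", "//a/"):
    print(f"{drive!r:28} root={implied_root(drive)!r}")
```

Only the two complete share forms receive a root; the device paths and the incomplete shares listed in the description are left without one.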
Hey @eryksun, does this PR look OK to you? As I mentioned on the issue, it does not fix |
Handling "Global" is a low priority. It turns out that handling "GLOBALROOT" is a more pressing concern, but it's a small enough niche and such a complicated problem that I think leaving the drive as "\\?\GLOBALROOT" is good enough.
Co-authored-by: Eryk Sun <eryksun@gmail.com>
This is presently blocked on #102789.
@@ -330,8 +330,7 @@ def _parse_path(cls, path):
             elif len(drv_parts) == 6:
                 # e.g. //?/unc/server/share
                 root = sep
-        unfiltered_parsed = [drv + root] + rel.split(sep)
-        parsed = [sys.intern(x) for x in unfiltered_parsed if x and x != '.']
+        parsed = [sys.intern(str(x)) for x in rel.split(sep) if x and x != '.']
Whoa, we `sys.intern` everything? That seems like a painful memory leak. Would we be better off with a class-level `lru_cache`'d function? [1]

Don't we already know that `rel` will be `str` here? If it's bytes, `str(x)` is the wrong conversion anyway. Maybe the intern function should be `cls._flavour.str_and_intern`?
Footnotes
1. `@lru_cache(maxsize=REASONABLE_VALUE)` applied to `def intern(x): return x`
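The footnote's idea can be sketched as follows. This is only an illustration of a bounded, cache-based alternative to `sys.intern()`, not code from the PR; the name `intern_part` and the bound of 1024 are assumptions standing in for the reviewer's `intern` and `REASONABLE_VALUE`.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # 1024 is an arbitrary stand-in for REASONABLE_VALUE
def intern_part(x):
    """Return the first object seen for each distinct value of x."""
    return x


# Build two equal strings at runtime so they start as distinct objects.
first = "".join(["spam", " and ", "eggs"])
second = "".join(["spam", " and ", "eggs"])

shared_first = intern_part(first)
shared_second = intern_part(second)

print(first is second)
print(shared_first is shared_second)
```

Unlike `sys.intern()`, the cache keeps a strong reference to each entry until it is evicted, so memory use is bounded by `maxsize` rather than by reference counts.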
The interning of the path parts is longstanding pathlib behaviour. Honestly I don't know enough about the benefits/drawbacks of interning to say whether it's reasonable.
> Don't we already know that `rel` will be `str` here?

We know that it's an instance of `str` (and not `bytes`), but we don't know whether it's a true `str` object (required by `sys.intern()`).
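A small sketch of why the `str(x)` call matters: `sys.intern()` rejects instances of `str` subclasses, while `str()` converts them to exact `str` objects first. The subclass `TaggedStr` is a hypothetical example, not something pathlib defines.

```python
import sys


class TaggedStr(str):
    """Hypothetical str subclass used only for this demonstration."""


part = TaggedStr("share")

try:
    sys.intern(part)
    raised = False
except TypeError:
    # sys.intern() only accepts exact str objects, not subclasses.
    raised = True

# Converting with str() yields an exact str, which can be interned.
interned = sys.intern(str(part))

print(raised, type(interned).__name__, interned)
```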
> Whoa, we `sys.intern` everything? That seems like a painful memory leak.

Is it really all that bad? `sys.intern()` calls `PyUnicode_InternInPlace()`, not `PyUnicode_InternImmortal()`. An interned string gets deallocated based on its reference count, like any other object. For example:
>>> s = sys.intern('spam and eggs')
>>> t = sys.intern('spam and eggs')
>>> s is t
True
>>> id(s)
139835455199152
>>> del s, t
>>> s = sys.intern('spam and eggs')
>>> id(s)
139835455198912
Hey @eryksun, are you happy for this to land? Thanks
LGTM. The implementation of |
Thank you both for the reviews + advice!
This isn't easy to backport, as the surrounding code has changed a lot since 3.11.