-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pathlib.Path.parents negative indexing is wrong for absolute paths #89577
Comments
At least on PosixPath (not currently able to try on Windows to check WindowsPath, but from a quick code check I think it'll behave the same way), the negative indexing added in bpo-21041 is implemented incorrectly for absolute paths. Passing either -1 or -2 will return a path representing the root, '/' for PosixPath (which should only be returned for -1), and passing an index of -3 or beyond returns the value expected for that index + 1, e.g. -3 gets the result expected for -2, -4 gets the result for -3, etc. And for the negative index that should be equivalent to index 0, you end up with an IndexError. The underlying problem appears to be that absolute paths (at least, those created from a string) are represented in self._parts with the root '/' included (redundantly, since self._root has it too), so all the actual components of the path are offset by one. This does not affect slicing (slicing is implemented using range and slice.indices to perform normalization from negative to positive indices, so it never indexes with a negative index). Example: >>> from pathlib import Path
>>> p = Path('/1/2/3')
>>> p._parts
['/', '1', '2', '3']
>>> p.parents[:]
(PosixPath('/1/2'), PosixPath('/1'), PosixPath('/'))
>>> p.parents[-1]
PosixPath('/')
>>> p.parents[-1]._parts # Still behaves normally as self._root is still '/'
[]
>>> p.parents[-2]
PosixPath('/')
>>> p.parents[-2]._parts
['/']
>>> p.parents[-3]
PosixPath('/1')
>>> p.parents[-4]
Traceback (most recent call last):
...
IndexError: -4 It looks like the underlying problem is that the negative indexing code doesn't account for the possibility of '/' being in _parts and behaving as a component separate from the directory/files in the path. Frankly, it's a little odd that _parts includes '/' at all (Path has a ._root/.root attribute that stores it too, and even when '/' isn't in the ._parts/.parts, the generated complete path includes it because of ._root), but it looks like the docs guaranteed that behavior in their examples. It looks like one of two options must be chosen:
I think #1 is probably the way to go. I believe all that would require is to add: if idx < 0:
return self.__getitem__(len(self) + idx) just before: return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - 1]) so it never tries to use a negative idx directly (it has to occur after the check for valid index in [-len(self), len(self) so very negative indices don't recurse until they become positive). This takes advantage of _PathParents's already adjusting the reported length for the presence of drive/root, keeping the code simple; the alternative I came up with that doesn't recurse changes the original return line: return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - 1]) to: adjust = idx >= 0 or not (self._drv or self._root)
return self._pathcls._from_parsed_parts(self._drv, self._root, self._parts[:-idx - adjust]) which is frankly terrible, even if it's a little faster. |
This is a great bug report, but for anyone else who gets a bit lost in the details, here's the core of the issue:
>>> p.parents[-1] # This is correct
PosixPath('/')
>>> q.parents[-1]
PosixPath('.')
>>> p.parents[-2] # Should be PosixPath('/1')
PosixPath('/')
>>> q.parents[-2]
PosixPath('1')
>>> p.parents[-3] # Should be PosixPath('/1/2')
PosixPath('/1')
>>> q.parents[-3]
PosixPath('1/2') I think a refactoring where '/' doesn't appear in ._parts would be a good idea if we can get past Chesterton's Fence and determine that this was indeed not a deliberate design decision (or at least one whose concerns no longer apply), but at least in the short term, I agree that transforming negative indexes into positive indices is the right, expedient thing to do. We'll definitely want to make sure that we're careful about bad indices (and add relevant tests), though, since it would be easy to get weird behavior where too-large negative indexes start "wrapping around" (e.g. p.parents[-4] with len(p._parents) == 3 → p.parents[-1]). |
"We'll definitely want to make sure that we're careful about bad indices ... since it would be easy to get weird behavior where too-large negative indexes start 'wrapping around'" When I noticed the problem, I originally thought "Hey, the test for a negative index can come *before* the range check and save some work for negative indices". Then I realized, while composing this bug report, that that would make p.parents[-4] with len(p.parents) == 3 → p.parents[-1] as you said, and die with a RecursionError for p.parents[-3000] or so. I'm going to ignore the possibility I'm sleep-deprived and/or sloppy, and assume a lot of good programmers would think to make that "optimization" and accidentally introduce new bugs. :-) So yeah, all the tests. |
On the subject of sleep-deprived and/or sloppy, just realized: return self.__getitem__(len(self) + idx) should really just be: idx += len(self) no need to recurse. |
Closing as a duplicate of #93156 . |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: