[Argument Clinic] Change the errors raised by the path_t converter
#122228
Comments
All functions that check for embedded null characters or bytes raise ValueError. This is built in This is definitely not an OS error. These OS functions take null-terminated strings, therefore they cannot check for embedded null characters. ValueError is raised on the boundary between Python and the OS, because there is no way to pass a string with embedded null characters to the OS. Embedded null characters are not the only problem you should guard against. You can also get UnicodeEncodeError or UnicodeDecodeError (which are ValueError subclasses, so existing guards against ValueError work for them). You should guard against ValueError only if you can get a right answer at the low level (for example |
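A minimal sketch of the two points made in this comment, assuming a current CPython on any platform. The helper name stat_error_type and the file name are hypothetical and only serve the illustration.

```python
import os

# UnicodeEncodeError and UnicodeDecodeError are ValueError subclasses,
# so an existing `except ValueError` guard also catches them.
print(issubclass(UnicodeEncodeError, ValueError))
print(issubclass(UnicodeDecodeError, ValueError))


def stat_error_type(path):
    """Return the name of the exception class os.stat() raises, or None."""
    try:
        os.stat(path)
    except (OSError, ValueError) as exc:
        return type(exc).__name__
    return None


# The embedded null character is rejected at the Python/OS boundary,
# before any system call is attempted, hence ValueError and not OSError.
print(stat_error_type("spam\0eggs"))
```

The same call with a bytes path containing a null byte is expected to behave the same way.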
Ok, that's what I thought about the embedded null bytes (but then I don't know the import system). However, the question remains on the too long names. Should we raise a ValueError or an OSError (or should we do something else?) Because |
I updated my post and restricted my proposal to:
If the path does not have embedded NUL bytes, but is too long, are there any system calls that could fail with a non-OS-like error? |
Isn't it for Windows only? I do not know what happens on Windows when a path that is too long is passed. If it crashes or truncates the path, we should raise ValueError to prevent this. If it requires the length to be representable as a signed 16-bit integer, we should raise ValueError, because there is no way to represent the length. Otherwise we can remove the guard, but we should check that all functions work with long paths on all supported Windows versions. This is also a breaking change, so we should investigate what effect it has on third-party code. |
I think so (sorry I forgot about the Windows label). By the way, the latest modification on this code path was #118355 which introduced the possibility to suppress ValueError (AFAICT) but this would also suppress NUL bytes related errors (which would not be good). I think the ValueError in
That.. is something I don't know. Eryk seemed to have much deeper knowledge in that area. Alternatively, I can try to do what I did for linecache, namely guarding against |
The check was added by Larry Hastings in #58831 in Python 3.3, back in 2012 when Windows XP was still supported. Larry didn't discuss a rationale for adding it. Pathname arguments in the Windows file API are null-terminated strings. Internally, the base NT system uses a counted It could be that there was an issue in NT 5.1 (Windows XP) with long paths prefixed by "\\?\" getting truncated. I know that NT 4.0 and 5.0 (Windows 2000) used
In #122170 (comment), I discussed the behavior when the pathname or filename exceeds the maximum allowed length in WinAPI |
There are a number of issues on this topic going on right now, but for this one, we should not change the error type. A path longer than 32KB is always an invalid value on Windows, and so ValueError is correct. Places where it ought to be handled silently rather than letting the user know they've provided an invalid value can be discussed individually. There's no general rule to apply here. Let's approach each one in terms of the real problems that are being observed. |
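As a sketch of what handling the error silently at one particular call site could look like, assuming the caller only wants a yes/no answer. The helper exists_quietly is hypothetical and not part of CPython.

```python
import os


def exists_quietly(path):
    """Hypothetical helper: report any unusable path as 'not found'."""
    try:
        os.stat(path)
    except (OSError, ValueError):
        # OSError: the OS rejected the path or could not find it.
        # ValueError: the path could not be handed to the OS at all
        # (for example, it contains an embedded null character).
        return False
    return True
```

Catching both exception types is a per-call-site decision. Code that wants to tell the user about an invalid value should let ValueError propagate instead.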
Thanks for confirming that it's indeed the correct exception type. I'm closing this specific issue as |
@zooba, I see no reason for Python to be required to raise |
It's that inconsistency in error that I don't like. Long names are pretty thoroughly documented as only being 32KB, so we can reliably raise a suitable error. We wouldn't do it for the smaller limit because we couldn't reliably detect the ways of bypassing it, but those ways don't exist at this limit. One of the biggest problems with the 260 limit is that the error doesn't say that it's too long. If all the error codes were consistent (and ideally if they said it was too long), then I'd let them bubble out. But because it's a bit of a mess and it's easy for us to verify this particular case, we should do it. I'd argue that |
Our existing check has an off-by-one error. The upper limit is 32766, not 32767, because Beyond that, the check doesn't take into account the path length after the system API resolves the working directory and "\??\" NT device prefix (e.g. "spam" -> "\??\C:\long\working\path\spam"). This can lead to internal library calls failing, as either boolean false or with status codes such as FYI, I spoke too soon about the API having been fixed back in Windows XP. I just discovered that for some reason they never updated |
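A rough Python sketch of a length guard using the corrected limit from this comment. It is not the converter's actual C code: the helper name check_path_length and the constant name are hypothetical, and it assumes the limit is counted in UTF-16 code units, as Windows wide strings are.

```python
# Limit as stated in the comment above (hypothetical constant name).
MAX_PATH_UNITS = 32766


def check_path_length(path: str) -> None:
    """Raise ValueError if path needs more than MAX_PATH_UNITS UTF-16 units."""
    # Assumption: the limit applies to UTF-16 code units, so characters
    # outside the Basic Multilingual Plane count twice.
    units = len(path.encode("utf-16-le", "surrogatepass")) // 2
    if units > MAX_PATH_UNITS:
        raise ValueError("path too long for Windows")
```

This only covers the first point. It does not account for the extra length added when the system resolves the working directory and the NT device prefix, so a path that passes this check can still fail inside the OS.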
That's a bug in the check, not a problem with the error type 😉 We should fix it - does it have its own issue yet? I'm sure it's an "easy" tag one.
These are impacted by the OS state, and so if the OS state causes the value to become unusable, then it's an OSError. Changing the OS state could (theoretically) resolve it. So this is consistent with my reasoning from above. |
Feature or enhancement
Proposal:
Currently, a ValueError is raised by the path_t converter if the input path is too long or if there are embedded null bytes. For any C code that is using this converter, such as os.stat, this actually makes os.stat raise a ValueError instead of an OSError in this case.
EDIT: Remove paragraph on NUL bytes. For NUL bytes, the converter must anyway raise ValueError. I fixed linecache in #122176 for that.
See #122170 (comment) for another issue (only on Windows though I don't know why since Linux may also have too long paths in general).
So, what I suggest is to raise OSError, or have a way to only disable ValueError that would not make stat(2) fail (i.e., keep ValueError for NUL bytes but not for long paths). I don't know which is the best but I'd prefer just ignoring long paths in path_t converter itself.
Links to previous discussion of this feature:
#122170 (comment)