-
-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-105704: Disallow IPv6 URLs with invalid prefix/suffix #111261
Conversation
bcail
commented
Oct 24, 2023
•
edited by bedevere-app
bot
Loading
edited by bedevere-app
bot
- Issue: urlparse does not flag hostname *containing* [ or ] as incorrect #105704
The following commit authors need to sign the Contributor License Agreement: |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
@bcail Would it be possible to also add a test case for having a bracket in the username / password? Because this is case where the current implementation fails for me, but it should work according to the referenced spec. |
@pschoen-itsc Sure, if there's consensus. What about this comment? Doesn't that comment argue that the spec says that brackets in the username/password should be percent-encoded? |
@bcail There was a breaking change which wasn't covered by the tests. So at least we should habe tests which cover the spec (not the "faulty" implementation before 3.11.4). (Edited because to stupid to read...) |
Here's the description of what characters should be encoded in user info. Looks like we don't currently check for characters (like @orsenthil Do we want to add some checking for characters that weren't properly encoded, and should we decode the encoded characters in the username/password? |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
I also just pushed a commit that adds the checking for bytes URLs as well. |
dd954f2
to
5944939
Compare
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
5944939
to
33b0781
Compare
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
FYI - can you please sign the CLA (see above)? If not, we'll need to close this PR per our contributor policy. |
@gpshead I actually have signed the CLA, on 2023-10-23, using the contrib form here: https://www.python.org/psf/contrib/contrib-form/. But it doesn't seem to have been processed? |
@ambv can you straighten the CLA bit out here? |
@bcail the contrib form is for organizations. Can you please click through the workflow that the button in this comment suggests? |
@ambv OK, I clicked through the individual CLA process. Thanks. |
@gpshead It says I've signed the CLA now. |
33b0781
to
f75c6a5
Compare
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Lib/test/test_urlparse.py
Outdated
for case in cases: | ||
with self.subTest(case=case): | ||
with self.assertRaises(ValueError): | ||
urllib.parse.urlsplit(case).hostname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously urlsplit raised the ValueError. This PR as is changes that and only raises it upon hostname or port attribute access.
We need to continue raising the ValueError at urlsplit time. Existing code must be assumed to expect that split time validation behavior rather than at attribute access time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @gpshead. I updated the code to do the check at urlsplit time.
@@ -504,9 +514,6 @@ def _urlsplit(url, scheme=None, allow_fragments=True): | |||
if (('[' in netloc and ']' not in netloc) or | |||
(']' in netloc and '[' not in netloc)): | |||
raise ValueError("Invalid IPv6 URL") | |||
if '[' in netloc and ']' in netloc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be kept and updated to error appropriately as callers of urlsplit and urlparse expect the exception at parsing time. Not later on when accessing attributes.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
I have made the requested changes; please review again. |
Thanks for making the requested changes! @gpshead: please review the changes made to this pull request. |
@@ -1402,16 +1402,34 @@ def test_issue14072(self): | |||
self.assertEqual(p2.path, '+31641044153') | |||
|
|||
def test_invalid_bracketed_hosts(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This refactoring, personally makes it hard to review. Could you please share the description, which existing test cases behavior are changing now (if any)? If you don't get to it, I will try to find out too. (But this can be considered for future refactors of test code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure - here are the differences in the tests:
- urlsplit and urlparse are now tested (not just urlsplit)
- strings and bytes are now tested (not just strings)
- four new test cases are added (the first ten are the same as they were before)
If I need to split the refactoring into a separate PR, I can do that.
If this was replaced by #129418, shouldn’t this be closed? |
Closed in favor of #129418 |