Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-105704: Disallow IPv6 URLs with invalid prefix/suffix #111261

Closed
wants to merge 5 commits into from

Conversation

bcail
Copy link

@bcail bcail commented Oct 24, 2023

@cpython-cla-bot
Copy link

cpython-cla-bot bot commented Oct 24, 2023

The following commit authors need to sign the Contributor License Agreement:

Click the button to sign:
CLA not signed

@bedevere-app
Copy link

bedevere-app bot commented Oct 24, 2023

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@pschoen-itsc
Copy link

@bcail Would it be possible to also add a test case for having a bracket in the username / password? Because this is case where the current implementation fails for me, but it should work according to the referenced spec.

@bcail
Copy link
Author

bcail commented Oct 24, 2023

@pschoen-itsc Sure, if there's consensus. What about this comment? Doesn't that comment argue that the spec says that brackets in the username/password should be percent-encoded?

@pschoen-itsc
Copy link

pschoen-itsc commented Oct 24, 2023

@bcail There was a breaking change which wasn't covered by the tests. So at least we should habe tests which cover the spec (not the "faulty" implementation before 3.11.4).
Just my opinion.

(Edited because to stupid to read...)

@bcail
Copy link
Author

bcail commented Oct 24, 2023

Here's the description of what characters should be encoded in user info.

Looks like we don't currently check for characters (like [) that must be percent-encoded, and we don't decode them - the username and password are just left however they come in.

@orsenthil Do we want to add some checking for characters that weren't properly encoded, and should we decode the encoded characters in the username/password?

@bedevere-app
Copy link

bedevere-app bot commented Oct 24, 2023

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bcail
Copy link
Author

bcail commented Oct 24, 2023

I also just pushed a commit that adds the checking for bytes URLs as well.

@bcail bcail force-pushed the issue-105704-urlsplit branch from dd954f2 to 5944939 Compare January 25, 2024 16:08
@bedevere-app
Copy link

bedevere-app bot commented Jan 25, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@gpshead gpshead self-assigned this Jan 29, 2024
@bcail bcail force-pushed the issue-105704-urlsplit branch from 5944939 to 33b0781 Compare September 17, 2024 20:30
@bedevere-app
Copy link

bedevere-app bot commented Sep 17, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@gpshead
Copy link
Member

gpshead commented Nov 19, 2024

FYI - can you please sign the CLA (see above)? If not, we'll need to close this PR per our contributor policy.

@gpshead gpshead removed their assignment Nov 19, 2024
@bcail
Copy link
Author

bcail commented Nov 19, 2024

@gpshead I actually have signed the CLA, on 2023-10-23, using the contrib form here: https://www.python.org/psf/contrib/contrib-form/. But it doesn't seem to have been processed?

@gpshead
Copy link
Member

gpshead commented Nov 19, 2024

@ambv can you straighten the CLA bit out here?

@ambv
Copy link
Contributor

ambv commented Nov 22, 2024

@bcail the contrib form is for organizations. Can you please click through the workflow that the button in this comment suggests?

@bcail
Copy link
Author

bcail commented Nov 22, 2024

@ambv OK, I clicked through the individual CLA process. Thanks.

@bcail
Copy link
Author

bcail commented Nov 22, 2024

@gpshead It says I've signed the CLA now.

@bcail bcail force-pushed the issue-105704-urlsplit branch from 33b0781 to f75c6a5 Compare November 22, 2024 16:39
@bedevere-app
Copy link

bedevere-app bot commented Nov 22, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@gpshead gpshead self-assigned this Nov 24, 2024
for case in cases:
with self.subTest(case=case):
with self.assertRaises(ValueError):
urllib.parse.urlsplit(case).hostname
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously urlsplit raised the ValueError. This PR as is changes that and only raises it upon hostname or port attribute access.

We need to continue raising the ValueError at urlsplit time. Existing code must be assumed to expect that split time validation behavior rather than at attribute access time.

Copy link
Author

@bcail bcail Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @gpshead. I updated the code to do the check at urlsplit time.

@@ -504,9 +514,6 @@ def _urlsplit(url, scheme=None, allow_fragments=True):
if (('[' in netloc and ']' not in netloc) or
(']' in netloc and '[' not in netloc)):
raise ValueError("Invalid IPv6 URL")
if '[' in netloc and ']' in netloc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be kept and updated to error appropriately as callers of urlsplit and urlparse expect the exception at parsing time. Not later on when accessing attributes.

@bedevere-app
Copy link

bedevere-app bot commented Nov 24, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@bedevere-app
Copy link

bedevere-app bot commented Nov 26, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bcail
Copy link
Author

bcail commented Nov 27, 2024

I have made the requested changes; please review again.

@bedevere-app
Copy link

bedevere-app bot commented Nov 27, 2024

Thanks for making the requested changes!

@gpshead: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from gpshead November 27, 2024 22:04
@@ -1402,16 +1402,34 @@ def test_issue14072(self):
self.assertEqual(p2.path, '+31641044153')

def test_invalid_bracketed_hosts(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactoring, personally makes it hard to review. Could you please share the description, which existing test cases behavior are changing now (if any)? If you don't get to it, I will try to find out too. (But this can be considered for future refactors of test code).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - here are the differences in the tests:

  • urlsplit and urlparse are now tested (not just urlsplit)
  • strings and bytes are now tested (not just strings)
  • four new test cases are added (the first ten are the same as they were before)

If I need to split the refactoring into a separate PR, I can do that.

@mcepl
Copy link
Contributor

mcepl commented Jan 31, 2025

If this was replaced by #129418, shouldn’t this be closed?

@sethmlarson
Copy link
Contributor

Closed in favor of #129418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants