gh-98433: Fix quadratic time idna decoding. #99092
There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. An early length check would still be a good idea given that DNS IDNA label names cannot be more than 63 ASCII characters.
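To illustrate the kind of loop being described, here is a minimal sketch, not the actual patch. It assumes the hot spot has the shape of a bidi check built on the standard library's `stringprep.in_table_d1` and `stringprep.in_table_d2`; the function names `check_bidi_quadratic` and `check_bidi_linear` are hypothetical.

```python
import stringprep


def check_bidi_quadratic(label: str) -> None:
    """Quadratic shape: the full-label scan is repeated per RandALCat char."""
    rand_al = [stringprep.in_table_d1(ch) for ch in label]
    for is_rand_al in rand_al:
        if is_rand_al:
            # These checks do not depend on the loop variable, yet they run
            # once for every RandALCat character in the label.
            if any(stringprep.in_table_d2(ch) for ch in label):
                raise UnicodeError("Violation of BIDI requirement 2")
            if not rand_al[0] or not rand_al[-1]:
                raise UnicodeError("Violation of BIDI requirement 3")


def check_bidi_linear(label: str) -> None:
    """Linear shape: the same checks, run at most once per label."""
    rand_al = [stringprep.in_table_d1(ch) for ch in label]
    if any(rand_al):
        if any(stringprep.in_table_d2(ch) for ch in label):
            raise UnicodeError("Violation of BIDI requirement 2")
        if not rand_al[0] or not rand_al[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")
```

Both functions accept and reject the same labels; only the amount of repeated work differs. A label made entirely of right-to-left characters is the worst case for the first version, since it passes every check and so never exits the loop early.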
The code looks weird (already did before the change):
Doesn't make sense to run the same two inner checks multiple times (note they don't use
Is there a good reason it's not as follows? (Replaced the
I think it's only because the code was written in 2003 and
Ahha, that explains so much. Thanks @pochmann! I was aiming for minimal diff once I profiled to find the hot spot, but this code is pretty gross. I'll go with your suggested version. Much cleaner.
random observation: If someone wanted a truly minimal patch, putting a
such as :mod:`urllib` http ``3xx`` redirects potentially allow for an attacker
to supply such a name.

Individual labels within an IDNA encoded DNS name will now raise an error early
I'm okay with leaving this length limit out, either entirely or just out of the backports, if release managers are uncomfortable with this in a bugfix.
I view it as protection against other, as yet unknown, idna or punycode decoder logic bugs, so that they cannot waste CPU time. The only reference numbers related to my arbitrary input length limit choice are: a DNS hostname is 255 bytes at most, and a label within a DNS name is 63 bytes at most.
The "Nothing" characters in question that are removed in nameprep() in https://github.com/python/cpython/blob/main/Lib/encodings/idna.py#L18 are non-ASCII Unicode oddities.
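As a rough sketch of what such an up-front guard could look like (not the code in this PR): the helper name `decode_label_guarded` and the constant `PRE_DECODE_LIMIT`, including its value, are assumptions chosen for illustration. The cap is deliberately far above 63 so that inputs which only shrink to a valid size after nameprep() strips characters are not rejected prematurely.

```python
# Hypothetical cap, well above the 63-byte DNS label limit mentioned above.
PRE_DECODE_LIMIT = 1024


def decode_label_guarded(label: bytes) -> str:
    """Reject absurdly long input before doing any decoding work."""
    if len(label) > PRE_DECODE_LIMIT:
        raise UnicodeError("label way too long")
    # Delegate the real work to the standard library's idna codec.
    return label.decode("idna")
```

For example, `decode_label_guarded(b"xn--bcher-kva")` decodes normally, while a multi-kilobyte input fails immediately without reaching the decoder.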
Other IDNA decoding libraries in the wild like the well known ICU library do not limit the length at this level of API, instead leaving that to higher levels in the application:
cc'ing release managers @pablogsal and @ambv for feedback on the shape of this to backport. See the code review comment I left on the news entry. Q: backport (a) just the unnecessarily quadratic algorithm removal OR (b) also the added up-front length check. I think either would be fine. The conservative backport choice would be (a). I'm happy to do either.
LGTM. Thanks for the comment explaining the arbitrary limit.
thanks victor!
LGTM.
Merging. I'll trigger the other backports from the 3.11 backport PR and let RMs chime in there.
Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. This also adds an early length check in IDNA decoding to outright reject huge inputs early on given the ultimate result is defined to be 63 or fewer characters. (cherry picked from commit d315722) Co-authored-by: Gregory P. Smith <greg@krypto.org>
GH-99222 is a backport of this pull request to the 3.11 branch.
) (pythonGH-99222) There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. (cherry picked from commit d315722) (cherry picked from commit a6f6c3a) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. (cherry picked from commit a6f6c3a) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. (cherry picked from commit d315722) (cherry picked from commit a6f6c3a) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
The buildbot failed with:
Network issues should not mark a whole build as a failure. I made a buildbot change to mark subsequent builds as "warning" rather than "failed" in this case: python/buildmaster-config#348
… (GH-99231) There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. (cherry picked from commit d315722) (cherry picked from commit a6f6c3a) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
… (#99230) There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear. (cherry picked from commit d315722) (cherry picked from commit a6f6c3a) Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear and adds an upfront length check to avoid even bothering the decoding attempt when clearly unreasonable.