Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-98433: Fix quadratic time idna decoding. #99092

Merged
merged 8 commits into from
Nov 8, 2022

Conversation

gpshead
Copy link
Member

@gpshead gpshead commented Nov 4, 2022

There was an unnecessary quadratic loop in idna decoding. This restores the behavior to linear and adds an upfront length check to avoid even bothering the decoding attempt when clearly unreasonable.

  • test-with-buildbots
  • (to be done on the backport PR chain) get release manager decision on the shape of the backports.

There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

An early length check would still be a good idea given that DNS IDNA
label names cannot be more than 63 ASCII characters.
Lib/encodings/idna.py Outdated Show resolved Hide resolved
@gpshead gpshead marked this pull request as ready for review November 4, 2022 10:33
@pochmann
Copy link
Contributor

pochmann commented Nov 4, 2022

The code looks weird (already did before the change):

    RandAL = [stringprep.in_table_d1(x) for x in label]
    any_in_table_d2 = any(stringprep.in_table_d2(x) for x in label)
    for c in RandAL:
        if c:
            if any_in_table_d2:
                raise UnicodeError("Violation of BIDI requirement 2")
            if not RandAL[0] or not RandAL[-1]:
                raise UnicodeError("Violation of BIDI requirement 3")

Doesn't make sense to run the same two inner checks multiple times (note they don't use c). Either they raise the first time or they never do.

Is there a good reason it's not as follows? (Replaced the for+if with if any, unindented, and moved the D2 check back down where it was before the change).

    RandAL = [stringprep.in_table_d1(x) for x in label]
    if any(RandAL):
        if any(stringprep.in_table_d2(x) for x in label):
            raise UnicodeError("Violation of BIDI requirement 2")
        if not RandAL[0] or not RandAL[-1]:
            raise UnicodeError("Violation of BIDI requirement 3")

I think it's only because the code was written in 2003 and any didn't exist yet (introduced in 2006).

@gpshead
Copy link
Member Author

gpshead commented Nov 4, 2022

The code looks weird (already did before the change)

+10

I think it's only because the code was written in 2003 and any didn't exist yet (introduced in 2006).

Ahha, that explains so much. Thanks @pochmann! I was aiming for minimal diff once I profiled to find the hot spot, but this code is pretty gross. I'll go with your suggested version. Much cleaner.

@gpshead
Copy link
Member Author

gpshead commented Nov 4, 2022

random observation: If someone wanted a truly minimal patch, putting a break at the end of the for c in RandAL: ... if c: ... block would also work. I went with the modern easier to understand suggestion from Stefan.

such as :mod:`urllib` http ``3xx`` redirects potentially allow for an attacker
to supply such a name.

Individual labels within an IDNA encoded DNS name will now raise an error early
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with leaving this length limit out either entirely - or just out of the backports if release managers are uncomfortable with this in a bugfix.

I view it as protection from other as yet unknown future idna or punycode decoder logic bugs to avoid wasting cpu time. The only reference numbers related to my arbitrary input length limit choice are: A DNS hostname is 255 bytes at most. A label within a DNS name is 63 bytes at most.

The "Nothing" characters in question that are removed in nameprep() in https://github.com/python/cpython/blob/main/Lib/encodings/idna.py#L18 are non-ascii unicode oddities.

Other IDNA decoding libraries in the wild like the well known ICU library do not limit the length at this level of API, instead leaving that to higher levels in the application:

@gpshead gpshead requested review from pablogsal and ambv November 4, 2022 20:29
@gpshead
Copy link
Member Author

gpshead commented Nov 4, 2022

cc'ing release managers @pablogsal and @ambv for feedback on the shape of this to backport. See the code review comment i left on the news entry.

Q: backport (a) just the unnecessarily quadratic algorithm removal OR (b) also the added up front length check.

I think either would be fine. The conservative backport choice would be (a). I'm happy to do either.

@gpshead gpshead added release-blocker 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Nov 4, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gpshead for commit bd51456 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 4, 2022
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the comment explaining the arbitrary limit.

Lib/test/test_codecs.py Outdated Show resolved Hide resolved
thanks victor!
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@gpshead
Copy link
Member Author

gpshead commented Nov 8, 2022

Merging. I'll trigger the other backports from the 3.11 backport PR and let RMs chime in there.

@gpshead gpshead merged commit d315722 into python:main Nov 8, 2022
@miss-islington
Copy link
Contributor

Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Nov 8, 2022
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

This also adds an early length check in IDNA decoding to outright reject
huge inputs early on given the ultimate result is defined to be 63 or fewer
characters.
(cherry picked from commit d315722)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
@bedevere-bot
Copy link

GH-99222 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Nov 8, 2022
@gpshead gpshead deleted the security/idna_dos_mitigation branch November 8, 2022 01:21
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot s390x Fedora Clang Installed 3.x has failed when building commit d315722.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/531/builds/2842) and take a look at the build logs.
  4. Check if the failure is related to this commit (d315722) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/531/builds/2842

Summary of the results of the build (if available):

Click to see traceback logs
fatal: unable to access 'https://github.com/python/cpython.git/': The requested URL returned error: 503

chmod: cannot access 'target/': No such file or directory

make: *** No rule to make target 'distclean'.  Stop.

gpshead added a commit that referenced this pull request Nov 8, 2022
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit to miss-islington/cpython that referenced this pull request Nov 8, 2022
) (pythonGH-99222)

There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)

(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit to miss-islington/cpython that referenced this pull request Nov 8, 2022
) (pythonGH-99222)

There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)

(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit to miss-islington/cpython that referenced this pull request Nov 8, 2022
) (pythonGH-99222)

There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)

(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
ned-deily pushed a commit that referenced this pull request Nov 8, 2022
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
miss-islington added a commit that referenced this pull request Nov 8, 2022
There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)

(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
@encukou
Copy link
Member

encukou commented Nov 8, 2022

The buildbot failed with: fatal: unable to access 'https://github.com/python/cpython.git/': The requested URL returned error: 503
https://www.githubstatus.com/ confirms “intermittent availability issues on some GitHub services”. Later builds are green.

@vstinner
Copy link
Member

vstinner commented Nov 8, 2022

The buildbot failed with: fatal: unable to access 'https://github.com/python/cpython.git/': The requested URL returned error: 503

Network issues should not mark a whole build as failure. I made buildbot change to mark next builds as "warning" rather than "failed" in this case: python/buildmaster-config#348

ambv pushed a commit that referenced this pull request Nov 10, 2022
… (GH-99231)

There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)
(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
ambv pushed a commit that referenced this pull request Nov 10, 2022
… (#99230)

There was an unnecessary quadratic loop in idna decoding. This restores
the behavior to linear.

(cherry picked from commit d315722)
(cherry picked from commit a6f6c3a)

Co-authored-by: Miss Islington (bot) <31488909+miss-islington@users.noreply.github.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants