Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regular expression catastrophic backtracking in bottle.Router.rule_syntax #1194

Closed
mschwager opened this issue Jan 2, 2020 · 8 comments
Closed

Comments

@mschwager
Copy link

mschwager commented Jan 2, 2020

Hi there!

I'm working on a static analysis tool called Dlint. A rule I've recently written looks for catastrophic backtracking in the Python re module. Looks like this has come up before in #1155.

After running this rule against your code base I found one issue:

$ python -m flake8 --select=DUO138 bottle.py
bottle.py:302:19: DUO138 catastrophic "re" usage - denial-of-service possible

If we look at the rule_syntax expression we can indeed confirm it has catastrophic cases:

    rule_syntax = re.compile('(\\\\*)'
        '(?:(?::([a-zA-Z_][a-zA-Z_0-9]*)?()(?:#(.*?)#)?)'
          '|(?:<([a-zA-Z_][a-zA-Z_0-9]*)?(?::([a-zA-Z_]*)'
            '(?::((?:\\\\.|[^\\\\>]+)+)?)?)?>))')

Under normal cases the expression runs quickly:

$ python -c "import bottle; list(bottle.Router.rule_syntax.finditer('<abc:def:' + '.' * 64 + '>'))"

But if we craft a special subject string we can cause it to catastrophically backtrack:

$ python -c "import bottle; list(bottle.Router.rule_syntax.finditer('<abc:def:' + '.' * 64 + '<'))"

The culprit here is this portion of the expression:

((?:\\\\.|[^\\\\>]+)+)?

Due to mutually inclusive alternation, a long string of dots (.) will backtrack this expression. Changing the expression to the following should fix the issue:

((?:\\\\.|[^\\\\.\\\\>]+)+)?

If this breaks intended functionality there may be other ways to fix the expression.

Let me know if you have any questions, and I hope you'll check out Dlint! Note that the DUO138 catastrophic backtracking check hasn't been released to PyPI yet as I'm still ironing out some kinks. Installing from the Github repository itself will provide the new rule, though.

@defnull
Copy link
Member

defnull commented Jan 3, 2020

The pattern should match "everything up until the next non-excaped >". It is not about literal dots and the proposed fix won't work, unfortunately.

I'm not sure if this is really an issue, because route definitions are usually short, controlled by the developer, and evaluated only once. #1155 is way more serious and I have no idea how to fix that yet. Any help would be welcomed.

For this one, would removing the inner + help? It would probably slow down the good case, but prevent the catastrophic runtime in the bad one?

((?:\\\\.|[^\\\\>])+)?

@defnull
Copy link
Member

defnull commented Jan 3, 2020

Forgot to mention: Tools being able to detect these issues are a HUGE win and I hope python adds ways around these errors in the future. Most mitigations against catastrophic backtracking are not possible in python because the regular expression engine lacks support for certain features. No idea how to fix #1155 because of that :/

@mschwager
Copy link
Author

mschwager commented Jan 3, 2020

For this one, would removing the inner + help? It would probably slow down the good case, but prevent the catastrophic runtime in the bad one?

Unfortunately removing the inner + would not fix this issue:

>>> re.match(r'((?:\.|[^\>])+)?>', '.' * 64 + '<')
... Spins ...

The issue is that the alternation or branch (|) has overlapping characters: \. and [^\>]. Since [^\>] is anything BUT > it can be . which matches the other side of the branch and can result in exponential backtracking.

The pattern should match "everything up until the next non-escaped >". It is not about literal dots and the proposed fix won't work, unfortunately.

To match "everything up until the next non-escaped >" could we simply do ([^\>]+)?> or even [^\>]+> which will match everything except > until the next >. This is a common pattern.

I'm confused about significance of the literal dot. If they aren't important we can just let [^\>] match literal dots too?

Thoughts?

@defnull
Copy link
Member

defnull commented Jan 3, 2020

I think you misinterpret what the regular expression actually does. Count the backslashes. The python String (\\\\.|[^\\\\>]) resolves to the regular expression (\\.|[^\\>]). The backslashes are escaped and resolve to literal backslashes. The dot is not escaped, it matches any character but the newline. In words: "Backslash followed by any single character, OR any single character other than backslash or >". There is no literal dot in the expression and no overlap between the left and right side of the |.

Removing the inner repetition should fix the backtracking issue.

>>> re.compile('(\\\\*)(?:(?::([a-zA-Z_][a-zA-Z_0-9]*)?()(?:#(.*?)#)?)|(?:<([a-zA-Z_][a-zA-Z_0-9]*)?(?::([a-zA-Z_]*)(?::((?:\\\\.|[^\\\\>])+)?)?)?>))').match('<abc:def:' + 'x'*64 + '>').groups()
('', None, None, None, 'abc', 'def', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
>>> re.compile('(\\\\*)(?:(?::([a-zA-Z_][a-zA-Z_0-9]*)?()(?:#(.*?)#)?)|(?:<([a-zA-Z_][a-zA-Z_0-9]*)?(?::([a-zA-Z_]*)(?::((?:\\\\.|[^\\\\>])+)?)?)?>))').match('<abc:def:' + 'x'*64 + 'NOT-GT').groups()

That seems to work as expected, and no longer hang for large invalid input.

@defnull defnull closed this as completed in aaee93a Jan 3, 2020
@mschwager
Copy link
Author

Ah, I see. Normally I use raw strings (e.g. r'...') for regular expressions to avoid all the backslash escaping. Yes, I believe (\\\\.|[^\\\\>]) will work then - the literal \ at the beginning of one branch and disallowed from the other branch should prevent catastrophic backtracking. And removing the additional + avoids nested quantifiers 👍

@defnull
Copy link
Member

defnull commented Jan 3, 2020

I finally understood this issue a bit better today and was able to (hopefully) fix three backtracking issues in bottle with your help. Thanks a lot for that!

@mschwager
Copy link
Author

Glad we got everything figured out! I know, catastrophic backtracking regular expressions caused some catastrophic backtracking in my brain for a long time before I fully understood them :)

kujenga added a commit to kujenga/bottle that referenced this issue Nov 1, 2022
Related to bottlepy#1194

This backports the patch from 332215b to the 0.12 release branch.
kujenga added a commit to kujenga/bottle that referenced this issue Nov 1, 2022
…ttle.Router.rule_syntax

This backports the patch from aaee93a to the 0.12 release branch.
kujenga added a commit to kujenga/bottle that referenced this issue Nov 1, 2022
Related to bottlepy#1194

This backports the patch from 332215b to the 0.12 release branch.

This fix can be validated using the following repl commands:

    >>> import bottle
    >>> bottle.template("""<img src="{{usar_webp(''/static/images/branco400.jpg')}}" alt="Sem imagem"/>""")
kujenga added a commit to kujenga/bottle that referenced this issue Nov 1, 2022
…ttle.Router.rule_syntax

This backports the patch from aaee93a to the 0.12 release branch.

This fix can be validated with the following command from the issue:

    python -c "import bottle; list(bottle.Router.rule_syntax.finditer('<abc:def:' + '.' * 64 + '<'))"
kujenga added a commit to kujenga/bottle that referenced this issue Nov 1, 2022
Related to bottlepy#1194

This backports the patch from 332215b to the 0.12 release branch.

This fix can be validated using the following repl commands:

    >>> import bottle
    >>> bottle.template("""<img src="{{usar_webp(''/static/images/branco400.jpg')}}" alt="Sem imagem"/>""")
kujenga added a commit to kujenga/bottle that referenced this issue Nov 1, 2022
…ttle.Router.rule_syntax

This backports the patch from aaee93a to the 0.12 release branch.

This fix can be validated with the following command from the issue:

    python -c "import bottle; list(bottle.Router.rule_syntax.finditer('<abc:def:' + '.' * 64 + '<'))"
@kujenga
Copy link

kujenga commented Nov 1, 2022

Hi @defnull , I've submitted this PR to back-port the fix here to the 0.12 branch: #1402

defnull pushed a commit that referenced this issue Mar 4, 2023
Related to #1194

This backports the patch from 332215b to the 0.12 release branch.

This fix can be validated using the following repl commands:

    >>> import bottle
    >>> bottle.template("""<img src="{{usar_webp(''/static/images/branco400.jpg')}}" alt="Sem imagem"/>""")
defnull pushed a commit that referenced this issue Mar 4, 2023
…ter.rule_syntax

This backports the patch from aaee93a to the 0.12 release branch.

This fix can be validated with the following command from the issue:

    python -c "import bottle; list(bottle.Router.rule_syntax.finditer('<abc:def:' + '.' * 64 + '<'))"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants