Commit
Merge pull request #629 from ossf/python-ruby-permissive
Fix "Correctly Using Regex" table for Python and Ruby
SecurityCRob authored Sep 24, 2024
2 parents 3ba1400 + ff74d1b commit f7ba781
Showing 4 changed files with 21 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/Correctly-Using-Regular-Expressions-Rationale.md
@@ -175,7 +175,7 @@ Setting both PCRE2_ANCHORED and PCRE2_ENDANCHORED forces a full-string match, bu
The [Python3 language documentation on re](https://docs.python.org/3/library/re.html) notes that its operations are “similar to those found in Perl” - but note that they are _similar_ not _identical_. In this library:

* ^ (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
-* $ Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
+* $ Matches the end of the string or just before the newline at the end of the string (it is _permissive_), and in MULTILINE mode it also matches before a newline.
* \A Matches only at the start of the string.
* \Z Matches only at the end of the string. Note that this is spelled \Z not \z, and there is no \z.
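
The difference between the permissive “$” and the strict “\Z” described in the list above can be checked with a short script. This is a minimal sketch and is not part of the commit; the names `dollar` and `strict` are chosen here only for illustration.

```python
import re

# "$" is permissive: it also matches just before a newline at the end of the string.
dollar = re.compile(r'^hello$')
# "\A" and "\Z" match only at the very start and very end of the string.
strict = re.compile(r'\Ahello\Z')

for text in ("hello", "hello\n", "hello\nthere"):
    # Print the input, then whether each pattern matched it.
    print(repr(text), bool(dollar.search(text)), bool(strict.search(text)))
```

Only the second input, `"hello\n"`, separates the two patterns: the “$” form accepts it and the “\Z” form rejects it.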

8 changes: 4 additions & 4 deletions docs/Correctly-Using-Regular-Expressions.md
@@ -102,7 +102,7 @@ Platform
</td>
<td>“\Z” (not “$” nor “\z”)
</td>
-<td>No
+<td>Yes
</td>
</tr>
<tr>
@@ -112,18 +112,18 @@ Platform
</td>
<td>“\z” (not “$”)
</td>
-<td>No
+<td>Yes
</td>
</tr>
</table>

-For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “<tt>^(ab&#x7c;de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab&#x7c;de)\Z</tt>” or “<tt>\A(ab&#x7c;de)\Z</tt>”. Note that the “$” anchor has different meanings among platforms and is often misunderstood; on many platforms it’s permissive and doesn’t match only the end of the input. Instead of using “$” on a platform if $ is permissive, consider using an explicit form instead (e.g., “`\n?\z`”). Consider preferring “\A” and “\z” where it’s supported (this is necessary when using Ruby).
+For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “<tt>^(ab&#x7c;de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab&#x7c;de)\Z</tt>” or “<tt>\A(ab&#x7c;de)\Z</tt>”. Note that the “$” anchor has different meanings among platforms and is often misunderstood; on many platforms it’s permissive by default and doesn’t match only the end of the input. Instead of using “$” on a platform if $ is permissive, consider using an explicit form instead (e.g., “`\n?\z`”). Consider preferring “\A” and “\z” where it’s supported (this is necessary when using Ruby).
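
The Python form of this validation might be wrapped as follows. This is a minimal sketch and is not part of the commit; `ALLOWED` and `is_allowed` are hypothetical names introduced only for illustration.

```python
import re

# "\A" and "\Z" anchor the pattern to the whole input, so a trailing
# newline is rejected rather than silently accepted as it would be with "$".
ALLOWED = re.compile(r'\A(ab|de)\Z')


def is_allowed(value: str) -> bool:
    """Return True only if value is exactly "ab" or exactly "de"."""
    return ALLOWED.search(value) is not None


print(is_allowed("ab"))    # exact match
print(is_allowed("ab\n"))  # trailing newline
print(is_allowed("abc"))   # extra character
```

Of the three calls, only the first returns `True`. Python's `re.fullmatch` is another way to require that the entire string match.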

In addition, ensure your regex is not vulnerable to a Regular Expression Denial of Service (ReDoS) attack. A ReDoS “[is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size)](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)”. Many regex implementations are “backtracking” implementations, that is, they try all possible matches. In these implementations, a poorly-written regular expression can be exploited by an attacker to take a vast amount of time.

1. One solution is to use a regex implementation that does not have this vulnerability because it never backtracks. E.g., use Go’s default regex system, RE2, or on .NET enable the RegexOptions.NonBacktracking option. Non-backtracking implementations can sometimes be orders of magnitude faster, but they also omit some features (e.g., backreferences).
2. Alternatively, create regexes that require no or little backtracking. Where a branch (“&#x7c;”) occurs, the next character should select one branch. Where there is optional repetition (e.g., “&#x2a;”), the next character should determine if there is a repetition or not. One common cause of unnecessary backtracking are poorly-written regexes with repetitions in repetitions, e.g., “(a+)&#x2a;”. Some tools can help find these defects.
-3. A partial countermeasure is to greatly limit the length of the untrusted input. This can limit the impact of a vulnerability.
+3. A partial countermeasure is to greatly limit the length of the untrusted input and/or the number of repetitions. This can limit the impact of a vulnerability. For example, in a regex, use “{0,4}” (0 through 4 repetitions inclusive) instead of “*” (0 or more repetitions, with no maximum).
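
The second and third countermeasures can be combined, as in the sketch below. It is not part of the commit: the 64-character limit, the hyphen-separated word format, and the names `MAX_LEN`, `PATTERN`, and `is_valid` are all assumptions made for illustration.

```python
import re

MAX_LEN = 64  # hypothetical cap on the length of untrusted input

# One lowercase word followed by at most four "-word" groups.
# "{0,4}" bounds the repetition, and the "-" character alone decides
# whether another group starts, so little backtracking is needed.
PATTERN = re.compile(r'\A[a-z]+(?:-[a-z]+){0,4}\Z')


def is_valid(value: str) -> bool:
    """Reject over-long input before the regex ever runs."""
    if len(value) > MAX_LEN:
        return False
    return PATTERN.search(value) is not None


print(is_valid("a-b-c-d-e"))    # four "-word" groups: within the bound
print(is_valid("a-b-c-d-e-f"))  # five groups: over the bound
print(is_valid("a" * 65))       # longer than MAX_LEN
```

Only the first call returns `True`. Checking the length first means the regex engine never sees more than `MAX_LEN` characters.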

## Detailed Rationale

9 changes: 9 additions & 0 deletions docs/src/regex.py
@@ -0,0 +1,9 @@
#!/usr/bin/env python3

import re

print('Test Python regex')
print("Must be false: ", bool(re.search(r'^wrong$', "hello")))
print("Must be true: ", bool(re.search(r'^hello$', "hello")))
print("True if permissive: ", bool(re.search(r'^hello$', "hello\n")))
print("Should be false: ", bool(re.search(r'^hello$', "hello\nthere")))
7 changes: 7 additions & 0 deletions docs/src/regex.rb
@@ -0,0 +1,7 @@
#!/usr/bin/env ruby

puts('Test Ruby regex')
puts("Must be false: ", !! /^wrong$/.match("hello"))
puts("Must be true: ", !! /^hello$/.match("hello"))
puts("True if permissive: ", !! /^hello$/.match("hello\n"))
puts("Should be true ($ always multi): ", !! /^hello$/.match("hello\nthere"))
