Fix compile loop in 32-bit mode for characters above the Unicode limit when caseless and ucp are set.
PhilipHazel committed Dec 1, 2023
1 parent 0820852 commit afce00e
Showing 5 changed files with 27 additions and 5 deletions.
12 changes: 8 additions & 4 deletions ChangeLog
@@ -174,10 +174,14 @@ undefined behaviour.
 that its end is handled similarly to other recursions. This has altered the
 behaviour of /|(?0)./endanchored which was previously not right.

-48. Improved the test for looping recursion by checking the last referenced
-character as well as the current character. This allows some patterns that
-previously triggered the check to run to completion instead of giving the loop
-error.
+48. Improved the test for looping recursion by checking the last referenced
+character as well as the current character. This allows some patterns that
+previously triggered the check to run to completion instead of giving the loop
+error.
+
+49. In 32-bit mode, the compiler looped for the pattern /[\x{ffffffff}]/ when
+PCRE2_CASELESS and PCRE2_UCP (but not PCRE2_UTF) were set. Fixed by not trying
+to look for other cases for characters above the Unicode range.


 Version 10.42 11-December-2022
6 changes: 5 additions & 1 deletion src/pcre2_compile.c
@@ -5155,10 +5155,14 @@ unsigned int co;

 /* Find the first character that has an other case. If it has multiple other
 cases, return its case offset value. When CASELESS_RESTRICT is set, ignore the
-multi-case entries that begin with ASCII values. */
+multi-case entries that begin with ASCII values. In 32-bit mode, a value
+greater than the Unicode maximum ends the range. */

 for (c = *cptr; c <= d; c++)
 {
+#if PCRE2_CODE_UNIT_WIDTH == 32
+if (c > MAX_UTF_CODE_POINT) return -1;
+#endif
 if ((co = UCD_CASESET(c)) != 0 &&
 (!restricted || PRIV(ucd_caseless_sets)[co] > 127))
 {
4 changes: 4 additions & 0 deletions testdata/testinput12
@@ -573,4 +573,8 @@
 /\X++/
 a\x{110000}\x{ffffffff}

+# This used to loop in 32-bit mode; it will fail in 16-bit mode.
+/[\x{ffffffff}]/caseless,ucp
+\x{ffffffff}xyz
+
 # End of testinput12
5 changes: 5 additions & 0 deletions testdata/testoutput12-16
@@ -1823,4 +1823,9 @@ Failed: error 134 at offset 11: character code point value in \x{} or \o{} is to
 ** Truncation will probably give the wrong result.
 0: a\x00\x{ffff}

+# This used to loop in 32-bit mode; it will fail in 16-bit mode.
+/[\x{ffffffff}]/caseless,ucp
+Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
+\x{ffffffff}xyz
+
 # End of testinput12
5 changes: 5 additions & 0 deletions testdata/testoutput12-32
@@ -1817,4 +1817,9 @@ No match
 a\x{110000}\x{ffffffff}
 0: a\x{110000}\x{ffffffff}

+# This used to loop in 32-bit mode; it will fail in 16-bit mode.
+/[\x{ffffffff}]/caseless,ucp
+\x{ffffffff}xyz
+0: \x{ffffffff}
+
 # End of testinput12
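
The guard added in src/pcre2_compile.c can be illustrated in isolation. The following is only a rough sketch, not the PCRE2 code: `stub_caseset` and `scan_for_caseset` are hypothetical names invented here, the stub stands in for the real `UCD_CASESET` lookup, and the counter is assumed to be a 32-bit unsigned value. Under that assumption, when the upper bound `d` is 0xffffffff the condition `c <= d` can never become false, so the early return is what guarantees that the scan ends.

```c
#include <stdint.h>

/* Top of the Unicode range; PCRE2 uses a macro of the same name. */
#define MAX_UTF_CODE_POINT 0x10ffffu

/* Hypothetical stand-in for UCD_CASESET(): only 'k' has a caseless set. */
static unsigned int stub_caseset(uint32_t c)
{
return (c == 0x6bu)? 1u : 0u;
}

/* Scan the range [start, d] for the first character that has a caseless set.
Returns the stub set number, or -1 if there is none. Without the guard, a
range ending at 0xffffffff would never terminate through "c <= d", because a
32-bit unsigned counter wraps to zero instead of exceeding d. */
static int scan_for_caseset(uint32_t start, uint32_t d)
{
uint32_t c;
unsigned int co;

for (c = start; c <= d; c++)
  {
  if (c > MAX_UTF_CODE_POINT) return -1;   /* above Unicode: stop looking */
  if ((co = stub_caseset(c)) != 0) return (int)co;
  }
return -1;
}
```

With this sketch, `scan_for_caseset(0xffffffffu, 0xffffffffu)` returns -1 on the first iteration, while a range such as 'a' to 'z' still finds the stub entry for 'k'.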
