Fix inadvertently case sensitive Boyer-Moore #39420

danmoseley · 2020-07-16T07:12:07Z

In this case the pattern "H#" would fail to match "#H#" when, and only when, RegexOptions.IgnoreCase | RegexOptions.Compiled was specified.

Because the pattern contains a literal prefix (indeed, the prefix is the entire pattern), we use Boyer-Moore to find its first occurrence. (One could imagine a more efficient way to search for a 2-character prefix.) Because IgnoreCase was passed, we lowercase the pattern up front to "h#", and whenever we compare against a character in the text, we must lowercase that character first.
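To make the search step concrete, here is a minimal Python sketch of a case-insensitive literal search. It uses the simpler Boyer-Moore-Horspool variant rather than the full Boyer-Moore used by Regex, the function name is hypothetical, and it assumes lowercasing never changes the length of a string (true for ASCII input such as the repro).

```python
def horspool_find_ignore_case(text, pattern):
    """Return the index of the first case-insensitive match, or -1."""
    needle = pattern.lower()  # lowercase the pattern once, up front
    m = len(needle)
    if m == 0:
        return 0
    # Shift table: distance from the last occurrence of each character
    # (ignoring the final position) to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= len(text):
        i = m - 1
        # Compare right to left, lowercasing each text character.
        while i >= 0 and text[pos + i].lower() == needle[i]:
            i -= 1
        if i < 0:
            return pos
        # Skip ahead based on the text character under the needle's last slot.
        pos += shift.get(text[pos + m - 1].lower(), m)
    return -1
```

In this sketch every text character is lowercased before comparison, which is the unoptimized behavior the description starts from.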

As a performance optimization, in the Compiled path, we avoid calling ToLower on the text candidate if we can cheaply verify that the character we are searching for is not affected by case conversion. In this case, for example, we need not bother to lowercase the text candidate character when we are searching for "#", because it is in a UnicodeCategory ("OtherPunctuation") that we know is not affected by case conversion. This optimization, like many others, does not exist in the non-Compiled path.
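The cheap check described above might look roughly like the following Python model. It is a sketch, not the RegexCompiler code: the function name and the exact category set are assumptions chosen for illustration, and Python's `unicodedata` categories (`Po` corresponds to OtherPunctuation) stand in for .NET's UnicodeCategory.

```python
import unicodedata

# Hypothetical set of categories treated as unaffected by case conversion:
# decimal digits, all punctuation categories, and space separators.
CASE_INVARIANT_CATEGORIES = frozenset(
    {"Nd", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Zs"}
)

def may_skip_lowering(pattern_char):
    """True if a text candidate need not be lowercased before being
    compared with pattern_char."""
    return unicodedata.category(pattern_char) in CASE_INVARIANT_CATEGORIES
```

The decision is made about the pattern character being searched for, not about the text character: if the pattern character has no case variants, a raw comparison gives the same answer as a lowercased one.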

The bug was that when deciding whether to lowercase the text candidate, we were examining the last character of the prefix rather than the character we were actually searching for. In this repro case the last character is "#", so when searching for "H" we would not lowercase the text candidate.
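The effect of the bug can be reproduced in a small Python model. This is an illustration under stated assumptions, not the emitted IL: it uses a naive scan instead of Boyer-Moore, the names are hypothetical, and the `buggy` flag switches between deciding on the last prefix character and deciding on the character actually being compared.

```python
import unicodedata

# Hypothetical set of categories assumed to be unaffected by case conversion.
CASE_INVARIANT_CATEGORIES = frozenset(
    {"Nd", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Zs"}
)

def may_skip_lowering(ch):
    return unicodedata.category(ch) in CASE_INVARIANT_CATEGORIES

def find_prefix(text, pattern, buggy):
    """Naive case-insensitive prefix scan; returns match index or -1."""
    lowered = pattern.lower()
    last = lowered[-1]
    for start in range(len(text) - len(lowered) + 1):
        for i, want in enumerate(lowered):
            # Buggy: decide based on the last prefix character.
            # Fixed: decide based on the character being searched for.
            decide_on = last if buggy else want
            got = text[start + i]
            if not may_skip_lowering(decide_on):
                got = got.lower()
            if got != want:
                break
        else:
            return start
    return -1
```

With the pattern "H#" and the text "#H#", the buggy variant never lowercases "H" because the last prefix character "#" is case-invariant, so the model reports no match, while the fixed variant finds the match at index 1.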

I added a test that fails without this fix.

Dotnet-GitSync-Bot · 2020-07-16T07:12:11Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

danmoseley · 2020-07-16T07:12:45Z

We need to get this into Preview 8.

danmoseley · 2020-07-16T07:28:19Z

Separate from this PR, it would probably be good to add a test that searches for a random string in a text that may or may not contain it somewhere, and compares the compiled result with the non-compiled one. Such a test could have found this bug quickly and might protect us against others. The comparison with non-compiled is interesting because the implementation is so different.
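The shape of such a differential test can be sketched in Python. Everything here is hypothetical scaffolding: the alphabet, the helper names, and the two stand-in implementations are assumptions, and in the real test the two sides would be the Compiled and non-Compiled Regex engines.

```python
import random

ALPHABET = "hH#aA"  # small alphabet so that matches occur often

def random_cases(rng, count, max_pattern=3, max_text=8):
    """Yield (pattern, text) pairs; the text may or may not contain the pattern."""
    for _ in range(count):
        pattern = "".join(
            rng.choice(ALPHABET) for _ in range(rng.randint(1, max_pattern))
        )
        text = "".join(
            rng.choice(ALPHABET) for _ in range(rng.randint(0, max_text))
        )
        yield pattern, text

def find_disagreements(impl_a, impl_b, cases):
    """Return the cases on which the two implementations differ."""
    return [(p, t) for p, t in cases if impl_a(p, t) != impl_b(p, t)]

def reference_find(pattern, text):
    # Stand-in for the trusted implementation.
    return text.lower().find(pattern.lower())

def case_sensitive_find(pattern, text):
    # Deliberately wrong stand-in: lowercases the pattern but not the text.
    return text.find(pattern.lower())
```

Passing a seeded `random.Random` keeps any failure reproducible, and calling `find_disagreements(reference_find, case_sensitive_find, [("H#", "#H#")])` flags the repro from this PR.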

...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs

ghost · 2020-07-16T14:37:31Z

Tagging subscribers to this area: @eerhardt, @pgovind
Notify danmosemsft if you want to be subscribed.

pgovind

Nice! LGTM!

danmoseley added 2 commits July 16, 2020 00:05

Unit test

b7ba4e6

Fix

c5b1ac8

danmoseley requested review from eerhardt and pgovind July 16, 2020 07:12

This was referenced Jul 16, 2020

Port Regex Boyer-Moore fix to Preview 8 #39421

Closed

Port Regex Boyer-Moore fix to Preview 8 #39422

Merged

Typo

9855bfb

stephentoub approved these changes Jul 16, 2020

...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs

CR feedback

2943542

danmoseley added the area-System.Text.RegularExpressions label Jul 16, 2020

pgovind approved these changes Jul 16, 2020

danmoseley mentioned this pull request Jul 16, 2020

Tests failed on WASM with NRE in CustomAttributeTypedArgument.CanonicalizeValue #39473

Closed

danmoseley merged commit c9627a1 into dotnet:master Jul 16, 2020

danmoseley deleted the regexbug branch July 16, 2020 20:58

karelz added this to the 5.0.0 milestone Aug 18, 2020

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

danmoseley restored the regexbug branch December 22, 2020 05:07

danmoseley deleted the regexbug branch September 6, 2021 23:33
