Expand Regex perf tests #1117

stephentoub · 2020-01-07T04:18:18Z

adamsitnik · 2020-01-07T10:14:56Z

src/benchmarks/micro/corefx/System.Text.RegularExpressions/Perf.Regex.Common.cs

-        [Benchmark] public void IP() => _ip.IsMatch("012.345.678.910");
-        [Benchmark] public void Uri() => _uri.IsMatch("http://example.org");
+        [Benchmark] public void Email_IsMatch() => _email.IsMatch("yay.performance@dot.net");
+        [Benchmark] public void Email_IsNotMatch() => _email.IsMatch("yay.performance@dot.net#");


How big is the difference in the reported time for IsMatch and IsNotMatch? I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Code-Paths)

Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#benchmarks-are-immutable)

I wonder if instead of this we should have a single benchmark where the mismatched character is the first char. Just to measure the case where the method should return quickly.

[Benchmark] public void IsNotMatch_EmailFirstChar() => _email.IsMatch("#yay.performance@dot.net");

Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks

I don't think that's important. We should ensure we have the benchmarks we want to have. If we're not measuring the right things, we should stop doing so and measure the right things. I really don't care about the history of benchmarks that aren't meaningful, and we can run the benchmarks at any given point in time against all previous builds/releases we care about. The only problem with changing the IDs is that we might have more difficulty tracking down when a regression occurred, and that is not an issue here. Further, these benchmarks are only a month or two old, anyway.

How big is the difference in the reported time for IsMatch and IsNotMatch?

Big. See dotnet/runtime#1348 for examples.

I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks

It is. The match case is the happy path: every check as part of the regex succeeds. The not-match case highlights the quality of the regex engine, since that's where backtracking potentially kicks in.

I wonder if instead of this we should have a single benchmark where the mismatched character is the first char.

That'd be a fine additional case, but it's not the most important case.

adamsitnik

LGTM, thank you!

stephentoub · 2020-01-07T15:42:39Z

Thanks.

Expand Regex perf tests

027c591

stephentoub mentioned this pull request Jan 7, 2020

More Regex perf improvements dotnet/runtime#1348

Merged

adamsitnik reviewed Jan 7, 2020

View reviewed changes

adamsitnik approved these changes Jan 7, 2020

View reviewed changes

adamsitnik merged commit 93a1782 into dotnet:master Jan 7, 2020

stephentoub deleted the regexperf branch January 7, 2020 15:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand Regex perf tests #1117

Expand Regex perf tests #1117

stephentoub commented Jan 7, 2020

adamsitnik Jan 7, 2020 •

edited

Loading

stephentoub Jan 7, 2020 •

edited

Loading

adamsitnik left a comment

stephentoub commented Jan 7, 2020

Expand Regex perf tests #1117

Expand Regex perf tests #1117

Conversation

stephentoub commented Jan 7, 2020

adamsitnik Jan 7, 2020 • edited Loading

Choose a reason for hiding this comment

stephentoub Jan 7, 2020 • edited Loading

Choose a reason for hiding this comment

adamsitnik left a comment

Choose a reason for hiding this comment

stephentoub commented Jan 7, 2020

adamsitnik Jan 7, 2020 •

edited

Loading

stephentoub Jan 7, 2020 •

edited

Loading