Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand Regex perf tests #1117

Merged
merged 1 commit into from
Jan 7, 2020
Merged

Expand Regex perf tests #1117

merged 1 commit into from
Jan 7, 2020

Conversation

stephentoub
Copy link
Member

[Benchmark] public void IP() => _ip.IsMatch("012.345.678.910");
[Benchmark] public void Uri() => _uri.IsMatch("http://example.org");
[Benchmark] public void Email_IsMatch() => _email.IsMatch("yay.performance@dot.net");
[Benchmark] public void Email_IsNotMatch() => _email.IsMatch("yay.performance@dot.net#");
Copy link
Member

@adamsitnik adamsitnik Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big is the difference in the reported time for IsMatch and IsNotMatch? I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Code-Paths)

Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#benchmarks-are-immutable)

I wonder if instead of this we should have a single benchmark where the mismatched character is the first char. Just to measure the case where the method should return quickly.

[Benchmark] public void IsNotMatch_EmailFirstChar() => _email.IsMatch("#yay.performance@dot.net");

Copy link
Member Author

@stephentoub stephentoub Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks

I don't think that's important. We should ensure we have the benchmarks we want to have. If we're not measuring the right things, we should stop doing so and measure the right things. I really don't care about the history of benchmarks that aren't meaningful, and we can run the benchmarks at any given point in time against all previous builds/releases we care about. The only problem with changing the IDs is that we might have more difficulty tracking down when a regression occurred, and that is not an issue here. Further, these benchmarks are only a month or two old, anyway.

How big is the difference in the reported time for IsMatch and IsNotMatch?

Big. See dotnet/runtime#1348 for examples.

I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks

It is. The match case is the happy path: every check as part of the regex succeeds. The not-match case highlights the quality of the regex engine, since that's where backtracking potentially kicks in.

I wonder if instead of this we should have a single benchmark where the mismatched character is the first char.

That'd be a fine additional case, but it's not the most important case.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@adamsitnik adamsitnik merged commit 93a1782 into dotnet:master Jan 7, 2020
@stephentoub stephentoub deleted the regexperf branch January 7, 2020 15:42
@stephentoub
Copy link
Member Author

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants