-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand Regex perf tests #1117
Expand Regex perf tests #1117
Conversation
[Benchmark] public void IP() => _ip.IsMatch("012.345.678.910"); | ||
[Benchmark] public void Uri() => _uri.IsMatch("http://example.org"); | ||
[Benchmark] public void Email_IsMatch() => _email.IsMatch("yay.performance@dot.net"); | ||
[Benchmark] public void Email_IsNotMatch() => _email.IsMatch("yay.performance@dot.net#"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How big is the difference in the reported time for IsMatch
and IsNotMatch
? I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#Code-Paths)
Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks (https://github.com/dotnet/performance/blob/master/docs/microbenchmark-design-guidelines.md#benchmarks-are-immutable)
I wonder if instead of this we should have a single benchmark where the mismatched character is the first char. Just to measure the case where the method should return quickly.
[Benchmark] public void IsNotMatch_EmailFirstChar() => _email.IsMatch("#yay.performance@dot.net");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another disadvantage of adding the "IsNotMatch" test case is changing the IDs of the existing benchmarks
I don't think that's important. We should ensure we have the benchmarks we want to have. If we're not measuring the right things, we should stop doing so and measure the right things. I really don't care about the history of benchmarks that aren't meaningful, and we can run the benchmarks at any given point in time against all previous builds/releases we care about. The only problem with changing the IDs is that we might have more difficulty tracking down when a regression occurred, and that is not an issue here. Further, these benchmarks are only a month or two old, anyway.
How big is the difference in the reported time for IsMatch and IsNotMatch?
Big. See dotnet/runtime#1348 for examples.
I wonder if the execution path is different enough to make it worth to "duplicate" the benchmarks
It is. The match case is the happy path: every check as part of the regex succeeds. The not-match case highlights the quality of the regex engine, since that's where backtracking potentially kicks in.
I wonder if instead of this we should have a single benchmark where the mismatched character is the first char.
That'd be a fine additional case, but it's not the most important case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Thanks. |
cc: @adamsitnik, @billwert