Special-case two set loops in RegexCompiler #32003

Merged: 3 commits, Feb 11, 2020

stephentoub (Member)

  • If an expression contains `.*` and `RegexOptions.Singleline` was specified, it ends up meaning `[everything]*`, in which case we don't have to match anything and can just jump to the last position.
  • If an expression contains `[^xyz]*` with 2 or 3 negated chars in a set loop, we can use `IndexOfAny` for those chars to vectorize the search.
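
To illustrate the second bullet, here is a minimal sketch of why a negated set loop can be answered with `IndexOfAny`. It is not the IL that RegexCompiler emits; the class and method names (`NegatedSetLoopSketch`, `CountScalar`, `CountWithIndexOfAny`) are hypothetical and exist only for this example. The number of characters a greedy `[^,;:]*` consumes is the index of the first excluded character, or the remaining length if none is present.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedSetLoopSketch
{
    // Scalar version: count leading chars that are none of a, b, c,
    // comparing one character at a time.
    public static int CountScalar(ReadOnlySpan<char> text, char a, char b, char c)
    {
        int i = 0;
        while (i < text.Length && text[i] != a && text[i] != b && text[i] != c)
        {
            i++;
        }
        return i;
    }

    // Same answer via the span IndexOfAny overload, which the runtime
    // can vectorize. -1 means no excluded char, so the loop consumes everything.
    public static int CountWithIndexOfAny(ReadOnlySpan<char> text, char a, char b, char c)
    {
        int i = text.IndexOfAny(a, b, c);
        return i < 0 ? text.Length : i;
    }

    public static void Main()
    {
        string input = "hello, world; done";

        int scalar = CountScalar(input, ',', ';', ':');
        int vectorized = CountWithIndexOfAny(input, ',', ';', ':');
        int viaRegex = Regex.Match(input, "^[^,;:]*").Length;

        // All three values should agree.
        Console.WriteLine($"{scalar} {vectorized} {viaRegex}");
    }
}
```

With two negated chars the same idea applies using the two-value `IndexOfAny` overload.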

Contributes to #1349
cc: @danmosemsft, @eerhardt, @ViktorHofer

stephentoub and others added 3 commits February 10, 2020 23:48
When RegexOptions.Singleline is specified, `.*` goes from meaning `[^\n]*` to meaning `[everything]*`.  As such, when we encounter it, we can just jump to the end without actually comparing anything.  This is worth optimizing in the compiler because, well, why not, but we don't in the interpreter because it's not a common-enough pattern to spend cycles checking for.
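
The difference described in this commit message can be sketched as follows. This is an illustration under assumptions, not the compiler's generated code; `SinglelineDotStarSketch`, `DotStarEnd`, and `DotStarEndSingleline` are hypothetical names. Each helper returns the position where a greedy `.*` starting at `start` first stops; any later backtracking by the rest of the pattern is outside the sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDotStarSketch
{
    // Without Singleline, '.' means [^\n], so the loop stops at the
    // first '\n' at or after start, or at the end of the input.
    public static int DotStarEnd(ReadOnlySpan<char> text, int start)
    {
        int i = text.Slice(start).IndexOf('\n');
        return i < 0 ? text.Length : start + i;
    }

    // With Singleline, '.' matches every char, so nothing needs to be
    // compared: the loop always ends at the last position.
    public static int DotStarEndSingleline(ReadOnlySpan<char> text, int start)
    {
        return text.Length;
    }

    public static void Main()
    {
        string input = "first line\nsecond line";

        Match plain = Regex.Match(input, ".*");
        Match single = Regex.Match(input, ".*", RegexOptions.Singleline);

        // Both comparisons should print True.
        Console.WriteLine(plain.Length == DotStarEnd(input, 0));
        Console.WriteLine(single.Length == DotStarEndSingleline(input, 0));
    }
}
```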
Co-Authored-By: Dan Moseley <danmose@microsoft.com>
eerhardt (Member) left a comment


Looks good to me.

stephentoub merged commit 839dd69 into dotnet:master on Feb 11, 2020
stephentoub deleted the regexopts branch on February 11, 2020 18:35
ghost locked as resolved and limited conversation to collaborators on Dec 10, 2020