Optimize symbolic regex Antimirov mode to avoid so much allocation #60918
Comments
Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions

Issue Details

When in Antimirov mode, we logically maintain a list of states we're currently in and, for each step, generate a new list of states based on looking at each current state and determining all the states it could lead us to. However, rather than something like:

    HashSet<Node> nextStates = _nextStates;
    HashSet<Node> currentStates = _currentStates;
    foreach (Node current in currentStates)
    {
        foreach (Node next in Transition(current))
        {
            nextStates.Add(next);
        }
    }
    currentStates.Clear();
    _nextStates = currentStates;
    _currentStates = nextStates;

it's implemented using an immutable SymbolicRegexNode, where every additional state found involves unioning in the new state into the next states node, which means generating one or more new nodes for each additional state we union in.
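To make the proposed shape concrete, here is a minimal, self-contained sketch of the double-buffered state-set step described above. It is not the actual System.Text.RegularExpressions code: `ToyNfaStepper`, the `int` state ids, and the dictionary-based transition table are all assumptions made for illustration, standing in for `Node` and `Transition`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy NFA stepper: states are plain ints and the transition
// function is a lookup table keyed by (state, input char). None of these
// names come from the real symbolic regex implementation.
public sealed class ToyNfaStepper
{
    private readonly Dictionary<(int State, char Input), int[]> _delta;

    // Both sets are allocated once and reused for every step.
    private HashSet<int> _currentStates = new HashSet<int>();
    private HashSet<int> _nextStates = new HashSet<int>();

    public ToyNfaStepper(Dictionary<(int State, char Input), int[]> delta, int initialState)
    {
        _delta = delta;
        _currentStates.Add(initialState);
    }

    public IReadOnlyCollection<int> CurrentStates => _currentStates;

    // Advance every current state over one input character.
    public void Step(char input)
    {
        HashSet<int> nextStates = _nextStates;       // empty on entry
        HashSet<int> currentStates = _currentStates;

        foreach (int current in currentStates)
        {
            if (_delta.TryGetValue((current, input), out int[] targets))
            {
                foreach (int next in targets)
                {
                    // Duplicates collapse in the set; no new node is built.
                    nextStates.Add(next);
                }
            }
        }

        // Recycle the old current set as the next step's scratch set.
        currentStates.Clear();
        _nextStates = currentStates;
        _currentStates = nextStates;
    }
}
```

With a table where state 0 goes to {1, 2} on 'a', state 1 goes to {3} on 'b', and state 2 goes to {3, 0} on 'b', stepping over "ab" leaves the current set as {0, 3}. The point of the design is that the per-step cost is set insertions into reused storage, rather than one or more freshly allocated nodes per union.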
@veanes, I believe you're working on this or a very related optimization, right? |
Yes, started but got sidetracked a bit, will try to get this done soon (within a few days), essentially improving the overhead in the Antimirov mode, regarding state-set representation (using a more lightweight def, initially just |
PR #65637 should fix this issue once it has been validated. |