Compile Regexes and Cache them for performance #205
Labels: enhancement (New feature or request)
Comments
@gfs will do. Thanks for catching.
guyacosta pushed a commit that referenced this issue on May 13, 2020:
…Regexes (#204)
* Yield return enumerables instead of full list.
* Finish IEnumerablifying output
* Paralellize analyze command
* Add single threaded option
* Pass through singlethreaded option
* Atomic updates to fix stats when running in parallel.
* Switch to concurrent dictionary to avoid collisions
* Stream off of disk instead of loading full file into memory.
* Catch more exceptions earlier.
* No need to check .Any
* Catch errors when reading xz files.
* Parallelize extraction where possible. Undo yield return design pattern.
* Catch exceptions in Rar extractor.
* Update Extractor.cs
* Catch remaining exceptions in multi extractor.
* Add more rules for Json + Serialization
* Fix serilog rule
* Check both single and multithreaded algorithms
* Use string not String
* String to string
* Hygeine
* Add message reminding users to check the log if one exists.
* Update Program.cs
* Restore metadata setters
* Fix missing dispose
* Fix #205 Compile and Cache Regexes in SearchPattern and TagSearchPattern. Annotate remaining dynamically generated regexes with TODO.
* Update SearchPattern.cs
* Compile frequently reused regexes in metadatahelper
* Reduce number of regexes created by factor of 3 in GetTagInfoListByConfidence
* Compile frequently used regex
* Reduce Regex constructor calls by 5x in GetTagListInfoBySeverity
* Remove unused regexoptions declaration.
* Dont serialize the regex variable itself
* Whitespace
* Rewrite one test to check.
* Call Program.Main instead of Process calling AI
* Build fix.
* Adds test cleanup for improved test stability on CLI set
* Unit test corrections for init/end

Co-authored-by: Guy Acosta <guacosta@microsoft.com>
Each call to the Regex constructor is expensive, because the constructor has to parse the pattern string into its logical regex representation. If you want maximum performance out of each call to the regex, you also want to use RegexOptions.Compiled, which further increases that construction cost.

Creating a new Regex object for each call that matches a pattern, as in TextContainer.cs, is therefore very expensive:

ApplicationInspector/RulesEngine/TextContainer.cs, line 161 in 71df078

Instead, the SearchPattern should contain a compiled regex that is built once, when the SearchPattern is created.

.NET 5 has extra improvements for compiled regexes (and normal regexes) as well: https://devblogs.microsoft.com/dotnet/regex-performance-improvements-in-net-5/