
Compile Regexes and Cache them for performance #205

Closed
gfs opened this issue May 10, 2020 · 1 comment · Fixed by #204

Comments

@gfs
Contributor

gfs commented May 10, 2020

Each call to the Regex constructor is expensive, because the constructor has to parse the pattern string into its internal regex representation. If you want maximum performance out of each match you also want to use RegexOptions.Compiled, which further increases that construction cost.

Creating a new Regex object on every call that matches a pattern, as TextContainer.cs currently does, is very expensive:

Regex patRegx = new Regex(pattern.Pattern, reopt);

Instead, each SearchPattern should build its compiled Regex once, when it is created, and reuse it for every match.
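A minimal sketch of the idea, not the actual change in #204. `CachedSearchPattern` and `PatternMatcher` are hypothetical names used only for illustration; the real SearchPattern has more members and may expose the regex differently. Only the standard `System.Text.RegularExpressions` API is assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for SearchPattern: the regex is parsed and
// compiled once in the constructor, then reused by every caller.
public sealed class CachedSearchPattern
{
    public string Pattern { get; }

    // Built once; Regex instances are immutable and safe to share across threads.
    public Regex Expression { get; }

    public CachedSearchPattern(string pattern, RegexOptions options = RegexOptions.None)
    {
        Pattern = pattern;
        // Pay the parse + compile cost here instead of on every match call.
        Expression = new Regex(pattern, options | RegexOptions.Compiled);
    }
}

public static class PatternMatcher
{
    // Hypothetical helper: matches a line using the cached regex
    // rather than calling new Regex(...) on each invocation.
    public static bool MatchLine(CachedSearchPattern pattern, string line)
    {
        return pattern.Expression.IsMatch(line);
    }
}
```

With this shape, the per-call `new Regex(pattern.Pattern, reopt)` in the matching code becomes a lookup of `pattern.Expression`, so the construction cost is paid once per rule instead of once per match.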

.NET 5 also brings further performance improvements for compiled regexes (and for non-compiled ones). See https://devblogs.microsoft.com/dotnet/regex-performance-improvements-in-net-5/

@gfs gfs added the enhancement New feature or request label May 10, 2020
gfs added a commit that referenced this issue May 10, 2020
Compile and Cache Regexes in SearchPattern and TagSearchPattern.
Annotate remaining dynamically generated regexes with TODO.
@guyacosta
Contributor

@gfs will do. Thanks for catching.

@guyacosta guyacosta self-assigned this May 11, 2020
guyacosta pushed a commit that referenced this issue May 13, 2020
…Regexes (#204)

* Yield return enumerables instead of full list.

* Finish IEnumerablifying output

* Parallelize analyze command

* Add single threaded option

* Pass through singlethreaded option

* Atomic updates to fix stats when running in parallel.

* Switch to concurrent dictionary to avoid collisions

* Stream off of disk instead of loading full file into memory.

* Catch more exceptions earlier.

* No need to check .Any

* Catch errors when reading xz files.

* Parallelize extraction where possible.

Undo yield return design pattern.

* Catch exceptions in Rar extractor.

* Update Extractor.cs

* Catch remaining exceptions in multi extractor.

* Add more rules for Json + Serialization

* Fix serilog rule

* Check both single and multithreaded algorithms

* Use string not String

* String to string

* Hygiene

* Add message reminding users to check the log if one exists.

* Update Program.cs

* Restore metadata setters

* Fix missing dispose

* Fix #205

Compile and Cache Regexes in SearchPattern and TagSearchPattern.
Annotate remaining dynamically generated regexes with TODO.

* Update SearchPattern.cs

* Compile frequently reused regexes in metadatahelper

* Reduce number of regexes created by factor of 3 in GetTagInfoListByConfidence

* Compile frequently used regex

* Reduce Regex constructor calls by 5x in GetTagListInfoBySeverity

* Remove unused regexoptions declaration.

* Don't serialize the regex variable itself

* Whitespace

* Rewrite one test to check.

* Call Program.Main instead of Process calling AI

* Build fix.

* Adds test cleanup for improved test stability on CLI set

* Unit test corrections for init/end

Co-authored-by: Guy Acosta <guacosta@microsoft.com>