Change regex usage to manual parsing #1094

lahma · 2023-06-08T17:29:27Z

I started profiling the "worst case" of evaluating formulas which seemed to take a lot of resources. The culprit turned out to be usage of regex parsing to get column/row/cell information. I created a custom parser that should fulfill the current needs and is a lot faster as it's being called in hot code paths. I also updated APIs to allow also ReadOnlySpan<char> for operations that needed substring to work with.

Based on running the benchmarks in the other PR, here are the improvement numbers. Still room to improve but this was the low-hanging fruit as first step.

NPOI.Benchmarks.LargeExcelFileBenchmark

Diff	Method	Mean	Error	Allocated
Old	Load	11.741 s	1.1257 s	3.65 GB
New		11.497 s (-2%)	2.4215 s	2.79 GB (-24%)
Old	Write	3.483 s	0.1738 s	1.19 GB
New		3.492 s (0%)	0.2931 s	1.19 GB (0%)
Old	Evaluate	62.677 s	4.1891 s	75.32 GB
New		18.854 s (-70%)	2.7709 s	12.93 GB (-83%)

NPOI.Benchmarks.RangeValuesBenchmark

Diff	Method	Mean	Error	Allocated
Old	Double	85.46 ms	0.367 ms	31.45 MB
New		86.43 ms (+1%)	0.181 ms	31.45 MB (0%)
Old	String	99.93 ms	0.449 ms	59.32 MB
New		98.76 ms (-1%)	0.521 ms	59.32 MB (0%)
Old	Date	114.71 ms	1.603 ms	35.73 MB
New		115.39 ms (+1%)	0.832 ms	35.73 MB (0%)
Old	Formulas	1,064.12 ms	10.889 ms	1480.15 MB
New		402.57 ms (-62%)	1.058 ms	284.07 MB (-81%)

tonyqus · 2023-06-08T18:52:38Z

Wow, this is a incredible result. I'm looking forward to this PR.

lahma · 2023-06-09T18:12:37Z

Hey @tonyqus I believe this is ready for review. I've updated the benchmarks table to reflect latest version. I'm quite confident that this PR shouldn't break much and public API only has additions of ROS version side-by-side with the string-taking version. I was a good thing to have the tests as part of PR as they discovered one case I hadn't covered!

To improve read/load speed I think it would require changing from XmlDocument to XmlReader which is a lower-level API. I think there's some easy wins in Write part though, there's some unnecessary string allocations there.

tonyqus · 2023-06-21T18:54:30Z

I did think of changing XmlDocument to XmlReader. But it's kind of tradeoff between convenience and performance. XmlDocument is much easier to use than XmlReader for most cases.

tonyqus · 2023-07-31T19:21:13Z

Can you help put CellReferenceParser.cs into SS/Util folder instead of main folder?

lahma · 2023-07-31T19:51:08Z

Rebased and tweaks in last commit, Linux build still fails due to the fix PR still pending.

* use spans when possible

tonyqus · 2023-08-29T13:32:57Z

LGTM

lahma force-pushed the manual-cell-address-parsing branch from 01009b5 to 2b3a949 Compare June 8, 2023 18:06

tonyqus added this to the NPOI 2.7.0 milestone Jun 8, 2023

lahma force-pushed the manual-cell-address-parsing branch 6 times, most recently from da33d7d to dea2293 Compare June 9, 2023 17:58

lahma marked this pull request as ready for review June 9, 2023 18:11

tonyqus modified the milestones: NPOI 2.7.1, NPOI 2.7.0 Jun 21, 2023

tonyqus added performance xls xlsx labels Jun 22, 2023

lahma force-pushed the manual-cell-address-parsing branch 2 times, most recently from 4c427cb to 11e5c05 Compare July 31, 2023 19:41

lahma added 2 commits August 26, 2023 08:27

Change regex usage to manual parsing

1d658be

* use spans when possible

Move CellReferenceParser to SS/Util

0bd93e9

lahma force-pushed the manual-cell-address-parsing branch from 11e5c05 to 0bd93e9 Compare August 26, 2023 05:27

tonyqus merged commit d87aae7 into nissl-lab:master Aug 29, 2023

lahma deleted the manual-cell-address-parsing branch August 29, 2023 14:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change regex usage to manual parsing #1094

Change regex usage to manual parsing #1094

lahma commented Jun 8, 2023 •

edited

Loading

tonyqus commented Jun 8, 2023

lahma commented Jun 9, 2023

tonyqus commented Jun 21, 2023 •

edited

Loading

tonyqus commented Jul 31, 2023

lahma commented Jul 31, 2023

tonyqus commented Aug 29, 2023

Change regex usage to manual parsing #1094

Change regex usage to manual parsing #1094

Conversation

lahma commented Jun 8, 2023 • edited Loading

NPOI.Benchmarks.LargeExcelFileBenchmark

NPOI.Benchmarks.RangeValuesBenchmark

tonyqus commented Jun 8, 2023

lahma commented Jun 9, 2023

tonyqus commented Jun 21, 2023 • edited Loading

tonyqus commented Jul 31, 2023

lahma commented Jul 31, 2023

tonyqus commented Aug 29, 2023

lahma commented Jun 8, 2023 •

edited

Loading

tonyqus commented Jun 21, 2023 •

edited

Loading