Understand performance of string comparisons #196

nsmith- · 2023-07-20T13:33:57Z

For corrections that use string input fields extensively, string comparisons could have a significant performance impact. In many cases the strings are relatively static (e.g. constant for an entire dataset or a subset of it), so one would expect branch prediction to amortize much of the expense. Nevertheless, some profiling to understand the extent of the issue would be useful.
There are a few improvements we could make to reduce the cost of string comparisons:
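As a rough starting point for that profiling, the Python sketch below times a string-keyed lookup as a function of where the matching key sits. It assumes a hypothetical category node implemented as a linear scan over string keys; `string_category_lookup`, the key names, and the sizes are illustrative and are not correctionlib's evaluator. Python's interpreter overhead dominates the absolute numbers, so only the trend with key position is meaningful; profiling the real C++ evaluator is still what the issue asks for.

```python
import timeit


def string_category_lookup(keys, values, key):
    """Hypothetical category-node lookup: compare the input string against
    each key in turn (a linear scan, not correctionlib's actual code)."""
    for k, v in zip(keys, values):
        if k == key:
            return v
    raise KeyError(key)


# Illustrative keys sharing a long common prefix, so every comparison has to
# get past the shared prefix before it can reject a non-matching key.
keys = [f"jes_uncertainty_source_{i:02d}" for i in range(40)]
values = [float(i) for i in range(len(keys))]


def time_lookup(index, number=5_000):
    # Rebuild the probe string so == compares contents rather than
    # short-circuiting on object identity with the stored key.
    probe = f"jes_uncertainty_source_{index:02d}"
    return timeit.timeit(
        lambda: string_category_lookup(keys, values, probe), number=number
    )


if __name__ == "__main__":
    # Keys early in the list match after few comparisons, late ones after many.
    for index in (0, len(keys) // 2, len(keys) - 1):
        print(f"key #{index}: {time_lookup(index):.4f} s")
```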

Add an API that allows one to pre-create an integer token representing the string and pass it as an argument to the Correction::evaluate call, which would then internally do a faster lookup
Project out the string dimension and return a reduced correction, as discussed in Partially evaluated correction object #38
Provide a context manager in which certain nodes of the correction's evaluation tree are frozen to predefined values (cf. @arizzi)
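The three ideas above can be sketched against a toy model in Python. Everything here is hypothetical: `ToyCorrection`, `token`, `evaluate_token`, `project`, and `freeze` are illustrative names, not existing correctionlib API, and the correction is reduced to a single string category node whose children are functions of one float input. The actual Correction::evaluate lives in C++; this only shows where each approach removes the per-call string comparison.

```python
from contextlib import contextmanager


class ToyCorrection:
    """Hypothetical correction with one string category node whose children
    are functions of a float input. Not correctionlib's real classes."""

    def __init__(self, content):
        self._keys = list(content)                 # string keys, in order
        self._children = list(content.values())    # one callable per key
        self._index = {k: i for i, k in enumerate(self._keys)}
        self._frozen = None                        # child index set by freeze()

    def evaluate(self, syst, x):
        # While frozen, the string argument is ignored and no comparison runs.
        if self._frozen is not None:
            return self._children[self._frozen](x)
        # Baseline: one string comparison per key until a match is found.
        for key, child in zip(self._keys, self._children):
            if key == syst:
                return child(x)
        raise KeyError(syst)

    # Idea 1: resolve the string to an integer token once, outside the hot loop.
    def token(self, syst):
        return self._index[syst]

    def evaluate_token(self, token, x):
        return self._children[token](x)

    # Idea 2: project out the string dimension, returning a reduced callable.
    def project(self, syst):
        return self._children[self._index[syst]]

    # Idea 3: freeze the string node to a fixed value inside a with-block.
    @contextmanager
    def freeze(self, syst):
        self._frozen = self._index[syst]
        try:
            yield self
        finally:
            self._frozen = None


if __name__ == "__main__":
    corr = ToyCorrection({
        "nominal": lambda x: 1.0 + 0.01 * x,
        "up": lambda x: 1.1 + 0.01 * x,
        "down": lambda x: 0.9 + 0.01 * x,
    })
    up = corr.token("up")
    reduced = corr.project("up")
    with corr.freeze("up"):
        frozen = corr.evaluate("up", 20.0)
    print(corr.evaluate("up", 20.0), corr.evaluate_token(up, 20.0), reduced(20.0), frozen)
```

In this sketch the token and projection approaches pay for the string lookup once, before any per-event evaluation, while `freeze` keeps the original `evaluate` signature at the cost of mutating shared state, so as written it is neither thread-safe nor nestable.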


nsmith- added the help wanted and evaluator labels on Jul 20, 2023
