Figure out next areas to vectorize #4580
Comments
Been trying to figure out ways to leverage colm to do a smarter search, but I'm not there yet. In an effort to at least get the ball rolling, I dumped out a report for "queries that contain A haystack of this size is not likely to be easy to make sense of with eyeballs alone, but I'm going to try and use grep to gain some weak insights. Maybe I can carve this up so as to divide and conquer.
Accounting for some warts in this methodology. Some number of the callsites grepped out of the full output will be formatted such that there's nothing visible inside the call on the line where the call appears.
This means I'm losing 545,729 calls worth of detail right out of the gate 😵‍💫. Not yet sure what to do about this, but for now it is what it is. I'm going to remove these lines from the callsites file and focus on the one-liners that are easier to inspect. With this, the haystack is reduced to 2,447,174 lines, or around 215 MB.
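A minimal sketch of the one-liner filter described above, assuming a callsite counts as a one-liner when the parenthesis opened by `map(` is also closed on the same line. The helper name `is_one_liner` and the sample lines are hypothetical, and the check ignores parentheses inside string literals, so it is only a rough approximation of what the grep pass did.

```python
def is_one_liner(line: str) -> bool:
    """Return True if a `map(` call opened on this line is also closed on it.

    Assumption: balanced parentheses are a good enough proxy for
    "the whole call is visible on this line".
    """
    start = line.find("map(")
    if start == -1:
        return False
    depth = 0
    # Start scanning at the opening parenthesis of the call.
    for ch in line[start + len("map"):]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return True
    return False


# Hypothetical callsite lines, for illustration only.
callsites = [
    "|> map(fn: (r) => ({r with _value: r._value * 2.0}))",
    "|> map(fn: (r) => ({",
    "|> map(",
]
one_liners = [line for line in callsites if is_one_liner(line)]
dropped = len(callsites) - len(one_liners)
```

Here only the first sample line is kept; the other two are the kind of callsite whose body lives on later lines and is therefore invisible to a line-based grep.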
Some findings, just grepping against the one liner data. Greps included below in case I goofed and accidentally inflated these somehow.
pie
title Occurrences in one line map calls
"dict.get": 63712
"exists": 7550
"if/then/else": 127544
"LT": 14696
"LTE": 1926
"GT": 16043
"GTE": 14600
"Neq": 2673
"Eq": 66388
"Mult": 97604
"Div": 236285
"Sub": 158218
"Add": 46644
"string": 9352
"int": 71812
"float": 169152
"duration": 15815
"uint": 16008
"regexp": 2218
"or": 7207
"and": 16775
EDIT: The 3rd column was added to see the counts for boolean/comparison ops that don't first require us to have if/then/else vectorized to really benefit.
Briefly discussed the above numbers with the team and it sounds like arithmetic operators are the next step. Comparisons (eq, neq, lt, lte, gt, gte) are only high-impact once if/then/else can be optimized, but arithmetic is applicable for many cases without any prerequisites.
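To illustrate the dependency described above, here is a rough sketch using plain Python lists as stand-in columns. It is not how the engine implements vectorization; the helper names `vec_mul`, `vec_gt`, and `vec_select` are hypothetical. The point is that an arithmetic kernel produces a usable value column by itself, while a comparison kernel only produces a boolean mask that needs a vectorized if/then/else (a "select") to turn it back into values.

```python
from typing import List


def vec_mul(a: List[float], b: List[float]) -> List[float]:
    """Column-wise multiply: useful as soon as it exists."""
    return [x * y for x, y in zip(a, b)]


def vec_gt(a: List[float], b: List[float]) -> List[bool]:
    """Column-wise greater-than: yields a boolean mask, not values."""
    return [x > y for x, y in zip(a, b)]


def vec_select(mask: List[bool], then: List[float], otherwise: List[float]) -> List[float]:
    """Column-wise if/then/else: picks a value per row based on the mask."""
    return [t if m else o for m, t, o in zip(mask, then, otherwise)]


values = [1.0, 5.0, 10.0]
limits = [4.0, 4.0, 4.0]

# Arithmetic alone already gives a finished output column.
doubled = vec_mul(values, [2.0, 2.0, 2.0])

# A comparison needs the conditional before it contributes to the output.
mask = vec_gt(values, limits)
capped = vec_select(mask, limits, values)
```

In this sketch `capped` corresponds to a row function of the shape "if value > limit then limit else value", which is why the comparison kernels pay off mainly after the conditional is vectorized.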
Looking at this data, if we can do the first four (div, float, sub, if/then/else), we can cover > 50% of uses for map. That's a great position to be in as we would be able to see early impact of this work. My recommendation would be to work on binary operators generally and then do
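As a quick check of the "> 50%" figure, the sketch below sums the occurrence counts from the pie chart and computes the share held by the four largest entries. One assumption to keep in mind: these are counts of occurrences across one-line map calls, and a single call can contain several constructs, so the result is a share of counted occurrences and only a proxy for the share of calls covered.

```python
# Occurrence counts copied from the pie chart data in this thread.
counts = {
    "dict.get": 63712, "exists": 7550, "if/then/else": 127544,
    "LT": 14696, "LTE": 1926, "GT": 16043, "GTE": 14600,
    "Neq": 2673, "Eq": 66388,
    "Mult": 97604, "Div": 236285, "Sub": 158218, "Add": 46644,
    "string": 9352, "int": 71812, "float": 169152,
    "duration": 15815, "uint": 16008, "regexp": 2218,
    "or": 7207, "and": 16775,
}

total = sum(counts.values())

# Rank entries by count and take the four largest.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
top_four = ranked[:4]
top_four_names = [name for name, _ in top_four]
top_four_share = sum(n for _, n in top_four) / total

print(top_four_names)
print(f"{top_four_share:.1%}")
```

The four largest entries come out as Div, float, Sub, and if/then/else, together accounting for roughly 59% of the counted occurrences.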
I think when I brought these numbers to Slack, there was some concern about
The epic has been updated to show a "quest tracking" section at the bottom of the description, and I've linked the immediate next hops on the roadmap there. Issues were filed specifically to get through the arithmetic ops, getting us 2 of the top 4. @nathanielc I was going to close this issue out now that we have numbers to guide us, and next (immediate) steps filed as issues. If you prefer, I can try to file issues for the extended roadmap you just laid out. Let me know.
Yeah, I'd like to create a set of issues for
Okay, issues for logical ops, conditional expressions, and float are all in the backlog and linked from the epic.
This is my current top priority, but it wasn't anywhere on the sprint board. Filing this issue for visibility.
Using the query archiver to help make a data-driven decision about the next item to vectorize, look at the structure of historical queries to see what the functions received by map actually do, then recommend as far-reaching an item as possible.