Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Create a new Expression execution framework #7260

Closed
revans2 opened this issue Dec 5, 2022 · 1 comment
Closed

[FEA] Create a new Expression execution framework #7260

revans2 opened this issue Dec 5, 2022 · 1 comment
Assignees
Labels
feature request New feature or request reliability Features to improve reliability or bugs that severly impact the reliability of the plugin

Comments

@revans2
Copy link
Collaborator

revans2 commented Dec 5, 2022

Is your feature request related to a problem? Please describe.
After #7258 is at least partly done, and we have a few initial expressions from #7259 we should start to work on a framework to execute these expressions. This will likely involve a lot of back and forth with #7259 to refine things. But it is there to at least get a starting point.

The ideal for a framework like this would be to reduce the total memory usage, but that is kind of a hard problem to solve, so for the time being it will just be to keep track of the memory usage and avoid running out of memory. We may find ways to optimize memory usage in the future. The execution API from #7258 should be used. We should have a config to turn this off if we run into issues. It should dedupe expressions like tiered project does. I hope that the tiered project is stable enough that we can just have the config decide if we want to do a regular tiered project or use the memory aware version. In the memory aware version instead of executing an expression tree all at once, it should split it up. Anything under a Stateful modifying expression, including those expressions, will need to be executed as an entire group for now. For expressions that have not been updated yet, they will need to run as a part of a tree, but we can probably rewrite them with new children that are bound to specific inputs to be able to isolate them. For all other expressions we will want to try and keep everything under budget.

This would mean that we subtract the input batch size from the current budget. Starting at the leaf expressions and working towards the root we should ask each expression if they can predict up front what the size will be, and if so ask them for that size (based on the number of input rows). I think in the short term it would be best to find a subtree that can execute within the remaining budget and run it, then repeat. But if you just want to make things work we can treat them just like predictable expressions. For expressions that can only predict what they need, we need to execute these in isolation. We would need to pass their children to them and ask them to either execute it within the budget or return a response that indicates that they could not. If we see that they cannot run within budget, then we will split all pending results in half (based on the number of rows), making one half of it spillable, and the other half we will keep for execution. If we are down to one row, then we need to fail because we cannot spit it. If we think this is going to be really rare, we could also throw away all of the intermediate results and just split the input batch and redo all of the computation. I don’t like this as a long term solution, but it could be a good way to get started quickly.

If we don't do the long term solution, then we should file follow on issues, or have a good reason why we are doing it differently.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Dec 5, 2022
@sameerz sameerz added the reliability Features to improve reliability or bugs that severly impact the reliability of the plugin label Dec 6, 2022
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Dec 8, 2022
@revans2
Copy link
Collaborator Author

revans2 commented Mar 31, 2023

This is probably done enough that we can close it. Not 100% of what we originally envisioned, but it is good enough to cover 99% of the use cases for expressions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request reliability Features to improve reliability or bugs that severly impact the reliability of the plugin
Projects
None yet
Development

No branches or pull requests

3 participants