
[FEA] Update GpuWindowExec to use OOM retry framework #7254

Closed
revans2 opened this issue Dec 5, 2022 · 2 comments · Fixed by #7824
Assignees
abellina
Labels
reliability Features to improve reliability or bugs that severely impact the reliability of the plugin

Comments

@revans2
Collaborator

revans2 commented Dec 5, 2022

Is your feature request related to a problem? Please describe.
Once #7253 is complete we should update GpuWindowExec to use it.

Describe the solution you'd like
There are a few parts to this that we need to get right, and they may need to be done in multiple separate steps or PRs.

We need to combine the preceding GpuCoalesceBatches with the GpuWindowExec. This is needed because the current design of the GpuMemoryLeaseManager does not allow batches larger than the default lease size to be passed between nodes in a SparkPlan. That restriction comes partly from limitations in Spark itself, but it should also help with debugging and with avoiding leaked leases.
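As a purely illustrative sketch of that fusion, the plan rule below drops a GpuCoalesceBatches that feeds a GpuWindowExec so the window node can own batch assembly itself. The rule name and the exact mechanism are assumptions; the real change in spark-rapids would likely live in the existing override/transition rules rather than in a standalone rule like this.

```scala
// Hypothetical sketch only: fuse the coalesce step into the window exec.
// FuseCoalesceIntoWindow is a made-up name; GpuWindowExec and GpuCoalesceBatches
// are the existing plugin nodes, but the real rewrite may look quite different.
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

case object FuseCoalesceIntoWindow extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan.transformUp {
    case window: GpuWindowExec if window.child.isInstanceOf[GpuCoalesceBatches] =>
      // The window exec now reads the un-coalesced child directly and gathers
      // whole partitions itself before running the window computation.
      window.withNewChildren(Seq(window.child.asInstanceOf[GpuCoalesceBatches].child))
  }
}
```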

Once the two are combined, the task will know how much memory it needs to hold the input and how many input rows there are, and it should then make a high-water-mark estimate of how much GPU memory it will need to complete the operation. This is likely to be tricky to get right, and we should use some of the memory-usage tools being worked on to understand what the actual usage is. We should also look at adding some scale tests using the same technology. If the technology is not available when we first start to work on this, those tests should be filed as follow-on issues so we don't lose track of them.
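For illustration only, a pessimistic estimate might look something like the sketch below. The function name, parameters, and multipliers are placeholders, not measured values from the plugin; the real estimate would come from the memory-usage tooling mentioned above.

```scala
// Illustrative only: a worst-case high-water-mark estimate for a window task,
// assuming we know the coalesced input size, the row count, and the number of
// window operations to run. None of these coefficients are measured values.
def estimateHighWaterMark(
    inputBytes: Long,
    inputRows: Long,
    bytesPerOutputRow: Long,
    numWindowOps: Int): Long = {
  // Worst case: the input, one working buffer per window operation, and the
  // full output are all resident at the same time.
  val workingBuffers = inputBytes * numWindowOps
  val estimatedOutput = inputRows * bytesPerOutputRow * numWindowOps
  inputBytes + workingBuffers + estimatedOutput
}
```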

If the node needs a larger lease to complete the processing, it should ask for the lease at this point before going on, since all of the data is already spillable.
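A minimal sketch of that step is below. GpuMemoryLeaseManager, MemoryLease, and requestLease here are stand-ins for whatever API #7253 ends up providing, not existing spark-rapids classes.

```scala
// Hypothetical: placeholder types for the lease API proposed in #7253.
trait MemoryLease extends AutoCloseable
object GpuMemoryLeaseManager {
  // Placeholder: would block until the requested amount can be leased.
  def requestLease(neededBytes: Long): MemoryLease = ???
}

def acquireLeaseIfNeeded(neededBytes: Long, defaultLeaseBytes: Long): Option[MemoryLease] = {
  if (neededBytes > defaultLeaseBytes) {
    // Safe to block here: every input batch has already been made spillable, so
    // waiting for a larger lease cannot pin unspillable GPU memory.
    Some(GpuMemoryLeaseManager.requestLease(neededBytes))
  } else {
    None // the default lease already covers the estimate
  }
}
```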

Once it has the lease it should perform the operations and, ideally, split the output batch into smaller chunks that are roughly the target batch size.
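One way to do that split, sketched under the assumption that we know the result's total size in bytes, is to cut the result table at row offsets that keep each piece near the target size. The sizing math here is illustrative, not the plugin's actual code; only cudf's Table.contiguousSplit is an existing API.

```scala
import ai.rapids.cudf.{ContiguousTable, Table}

// Carve one large window result into roughly target-sized output batches.
def splitToTargetSize(result: Table, totalBytes: Long, targetBytes: Long): Array[ContiguousTable] = {
  val rows = result.getRowCount
  val avgRowBytes = math.max(1L, totalBytes / math.max(1L, rows))
  val rowsPerChunk = math.max(1L, targetBytes / avgRowBytes)
  // Row offsets strictly between 0 and rows where the table should be cut.
  val splitPoints = (rowsPerChunk until rows by rowsPerChunk).map(_.toInt).toArray
  result.contiguousSplit(splitPoints: _*)
}
```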

For window operations this can get rather complicated because there are multiple backends that could be used to perform a given window aggregation/function. We should strive to be as accurate as possible about the size of the data used, but in cases where we just don't know, we should err on the side of the worst case. Only if we see serious performance degradation should we rethink this plan.

This is intended to be the first operator that we tackle, because the input to a window operation can be unbounded and there are a number of cases where we do not currently support out-of-core algorithms that would let us process arbitrary amounts of data.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Dec 5, 2022
@revans2 revans2 changed the title [FEA] Update GpuWindowExec to use GPuMemoryLeaseManager [FEA] Update GpuWindowExec to use GpuMemoryLeaseManager Dec 5, 2022
@mattahrens mattahrens added reliability Features to improve reliability or bugs that severely impact the reliability of the plugin and removed ? - Needs Triage Need team to review and classify labels Dec 6, 2022
@mattahrens mattahrens changed the title [FEA] Update GpuWindowExec to use GpuMemoryLeaseManager [FEA] Update GpuWindowExec to use OOO retry framework Jan 27, 2023
@abellina
Collaborator

abellina commented Feb 15, 2023

A first crack at this adds the retry logic within BasicWindowCalc.computeBasicWindow. This should allow us to take care of RetryOOM.

A follow-on issue needs to be defined for split-and-retry, and running window seems like the first candidate. Running window already has logic to handle fixup around batch boundaries, carrying the window aggregation through, and the split/merge logic would plug into that. But again, that work should likely be split off from this issue when we start looking at it.
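A minimal sketch of the retry wrapping described above, not the code from #7824: it assumes the framework from #7253 exposes something like withRetryNoSplit over a spillable input, and that the exact names and signatures may differ from what is shown.

```scala
import com.nvidia.spark.rapids.RmmRapidsRetryIterator.withRetryNoSplit
import com.nvidia.spark.rapids.{SpillableColumnarBatch, SpillPriorities}
import org.apache.spark.sql.vectorized.ColumnarBatch

def computeWindowWithRetry(
    input: ColumnarBatch,
    computeBasicWindow: ColumnarBatch => ColumnarBatch): ColumnarBatch = {
  // Make the input spillable first, so that if a RetryOOM fires the framework
  // can free memory and simply re-run the closure from the spilled copy.
  val spillable = SpillableColumnarBatch(input, SpillPriorities.ACTIVE_ON_DECK_PRIORITY)
  withRetryNoSplit(spillable) { scb =>
    val cb = scb.getColumnarBatch()
    try {
      computeBasicWindow(cb) // re-executed on each retry attempt
    } finally {
      cb.close()
    }
  }
}
```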

@sameerz sameerz changed the title [FEA] Update GpuWindowExec to use OOO retry framework [FEA] Update GpuWindowExec to use OOM retry framework Feb 18, 2023
@abellina abellina self-assigned this Feb 22, 2023
@abellina
Collaborator

I have something working here, with some tests that touch row, range, and row-optimized windows (just going off what I see in doAggs). These tests are very basic and nowhere near exhaustive enough to cover all of the window functionality. I believe what we actually want is for CI to trigger retries at some frequency during the full-fledged tests, to check correctness in many more scenarios than I could possibly test myself.
