Have host spill use the new HostAlloc API #9257

revans2 · 2023-09-19T13:54:15Z

This also includes a small memory leak fix for the RapidsDiskStore that was exposed when I started to test with spill this way.

This does not test the code path when we enable host memory limits instead of the host spill store limits. That will be covered by #8883, which should be the next thing I work on.

This fixes #8881

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 · 2023-09-19T13:54:24Z

build

abellina · 2023-09-19T16:09:37Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsDiskStore.scala

@@ -192,7 +192,7 @@ class RapidsDiskStore(diskBlockManager: RapidsDiskBlockManager)
      val path = id.getDiskPath(diskBlockManager)
      withResource(new FileInputStream(path)) { fis =>
        val (header, hostBuffer) = SerializedHostTableUtils.readTableHeaderAndBuffer(fis)
-        val hostCols = closeOnExcept(hostBuffer) { _ =>
+        val hostCols = withResource(hostBuffer) { _ =>
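
The one-line change above swaps which helper owns `hostBuffer`. A minimal sketch of the difference, using simplified stand-ins for the two helpers (the definitions below are assumptions written for illustration, not the plugin's actual implementation, and `TrackedBuffer` is a hypothetical type):

```scala
object ResourceDemo {
  // Hypothetical stand-in for a host buffer; records whether close() was called.
  final class TrackedBuffer extends AutoCloseable {
    var closed: Boolean = false
    override def close(): Unit = { closed = true }
  }

  // Assumed semantics: always closes the resource, whether or not the block throws.
  def withResource[T <: AutoCloseable, V](r: T)(block: T => V): V =
    try block(r) finally r.close()

  // Assumed semantics: closes the resource only if the block throws.
  // On success the caller still owns the resource and must close it later.
  def closeOnExcept[T <: AutoCloseable, V](r: T)(block: T => V): V =
    try block(r) catch {
      case t: Throwable =>
        r.close()
        throw t
    }

  def main(args: Array[String]): Unit = {
    val leaked = new TrackedBuffer
    closeOnExcept(leaked) { _ => "columns" }
    println(s"closeOnExcept closed the buffer: ${leaked.closed}")

    val released = new TrackedBuffer
    withResource(released) { _ => "columns" }
    println(s"withResource closed the buffer: ${released.closed}")
  }
}
```

If nothing else takes ownership of the buffer after the block succeeds, the `closeOnExcept` form leaves it open, which is consistent with the leak described in the PR summary.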


abellina · 2023-09-19T16:13:36Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsHostMemoryStore.scala

-    // TODO: this is disabled for now since subsequent work will tie this into
-    //   our host allocator apis.
-    if (false && !wouldFit) {
+    if (!wouldFit) {


@revans2 this will change the behavior in 23.10. Before this, we would still spill to host. Now we are going to skip to disk if the buffer can't fit within the limit. Just making sure that was intended.

I personally think that is what we want, but if anyone has a different opinion I am all ears. We want to get to the point where we have a hard limit on host memory. The reason we don't want to make the changes piecemeal is to reduce the pain a customer might see when running a job. This change is only going to show up when a customer wants to spill a buffer from GPU memory and that buffer is larger than the spill store size. I think that is fairly rare, so I am willing to take the hit.

By default the host store is configured to 1GB + pinnedPoolSize, and pinnedPoolSize defaults to 0. If we raised it to 2GB I'd agree; otherwise, it seems fairly common to bump batchSizeBytes up to 2GB, and those batches would spill to disk.

That said, with such a small spill store we are bound to spill to disk very often anyway, so I don't know that we are saving much.
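
The sizing concern in this thread can be worked through with a small sketch. It assumes the limit is 1 GiB plus the pinned pool size, as stated above, and that a buffer that would not fit goes straight to disk. The names `hostStoreLimit`, `chooseTarget`, and `SpillTarget` are hypothetical and are not taken from the plugin.

```scala
object SpillTargetSketch {
  sealed trait SpillTarget
  case object HostStore extends SpillTarget
  case object DiskStore extends SpillTarget

  val OneGiB: Long = 1L << 30

  // Assumption from the thread: default host store size is 1GB + pinnedPoolSize.
  def hostStoreLimit(pinnedPoolSize: Long): Long = OneGiB + pinnedPoolSize

  // Hypothetical decision: a buffer that cannot fit in the host store skips to disk.
  def chooseTarget(bufferSize: Long, limit: Long): SpillTarget = {
    val wouldFit = bufferSize <= limit
    if (!wouldFit) DiskStore else HostStore
  }

  def main(args: Array[String]): Unit = {
    val defaultLimit = hostStoreLimit(pinnedPoolSize = 0L)
    // A 2 GiB batch exceeds the 1 GiB default limit, so it would go to disk.
    println(chooseTarget(2 * OneGiB, defaultLimit))
    // A 512 MiB batch fits, so it would stay on the host.
    println(chooseTarget(OneGiB / 2, defaultLimit))
  }
}
```

Under these assumptions, raising the limit to 2 GiB would let a 2 GiB batch stay in the host store, which is the trade-off being discussed.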

revans2 · 2023-09-20T21:35:29Z

@jlowe can you please take a look?

revans2 · 2023-09-21T13:28:01Z

build

revans2 · 2023-09-21T14:14:30Z

build

Have host spill use the new HostAlloc API

a39515b

Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>

revans2 self-assigned this Sep 19, 2023

revans2 mentioned this pull request Sep 19, 2023

[BUG] Leaks and Double Frees in Unit Tests #9261

Closed

abellina reviewed Sep 19, 2023

abellina previously approved these changes Sep 19, 2023

abellina reviewed Sep 19, 2023

revans2 mentioned this pull request Sep 19, 2023

Fix leak in test and double free in corner case #9264

Merged

sameerz added the reliability Features to improve reliability or bugs that severly impact the reliability of the plugin label Sep 19, 2023

Merge branch 'branch-23.10' into memory_limit_spill_uses_new_API

2b83cdc

revans2 dismissed abellina’s stale review via 2b83cdc September 21, 2023 13:27

jlowe previously approved these changes Sep 21, 2023

Fix nit

f5bfed2

revans2 dismissed jlowe’s stale review via f5bfed2 September 21, 2023 14:14

jlowe approved these changes Sep 21, 2023

revans2 merged commit 2a8518e into NVIDIA:branch-23.10 Sep 21, 2023

revans2 deleted the memory_limit_spill_uses_new_API branch September 21, 2023 16:49
