make kudo shuffle read retryable and spillable #12236

binmahone · 2025-02-26T10:15:45Z

This PR fixes #12215 in the way suggested by @abellina in #12184 (comment). It requires NVIDIA/spark-rapids-jni#2991 to be merged first.

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

abellina · 2025-02-26T15:19:54Z

sql-plugin/src/main/java/com/nvidia/spark/rapids/SpillableKudoTable.java

+import com.nvidia.spark.rapids.jni.kudo.KudoTable;
+import com.nvidia.spark.rapids.jni.kudo.KudoTableHeader;
+
+public class SpillableKudoTable extends KudoTable {


If this were in Scala, it might make more sense, as it's only used from Scala.

For me I would prefer to have SpillableKudoTable not extend KudoTable.

Conceptually I want to say that I have a list of SpillableKudoTables, and then when I want to concat and serialize them into a Table I convert all of them into regular KudoTables which guarantees that they are all resident in memory, do the concat operation, and then close them when I am done.

Extending KudoTable implies that a SpillableKudoTable is a KudoTable. We can make that work, but as @abellina pointed out, it makes the code much more complicated. If SpillableKudoTable instead produces a KudoTable, then we don't need guaranteeSpillable. The act of getting the KudoTable out implicitly increments the reference count and makes it not spillable until it goes out of scope.
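To make the "produce a KudoTable instead of being one" idea concrete, here is a minimal, self-contained Java sketch. None of these classes are the real spark-rapids or spark-rapids-jni types: `SketchSpillableBuffer`, `SketchKudoTable`, and `SketchSpillableKudoTable` are hypothetical stand-ins, and the pin count is only an assumption about how the reference counting could behave.

```java
// Hypothetical stand-in for a host buffer managed by a spill framework.
final class SketchSpillableBuffer {
    private byte[] data;         // in-memory copy; null while spilled
    private byte[] spilledCopy;  // pretend "disk" copy; null while resident
    private int pins = 0;        // number of live materialized tables

    SketchSpillableBuffer(byte[] data) { this.data = data; }

    // Spilling is only allowed when nobody holds a materialized view.
    synchronized boolean trySpill() {
        if (pins > 0 || data == null) return false;
        spilledCopy = data;
        data = null;
        return true;
    }

    // Bring the data back if needed and block spilling until unpin().
    synchronized byte[] pin() {
        if (data == null) {
            data = spilledCopy;
            spilledCopy = null;
        }
        pins++;
        return data;
    }

    synchronized void unpin() { pins--; }
}

// Hypothetical resident table: while it is open, its bytes cannot spill.
final class SketchKudoTable implements AutoCloseable {
    private final SketchSpillableBuffer owner;
    private final byte[] bytes;
    private boolean closed = false;

    SketchKudoTable(SketchSpillableBuffer owner, byte[] bytes) {
        this.owner = owner;
        this.bytes = bytes;
    }

    byte[] bytes() { return bytes; }

    @Override
    public void close() {
        if (!closed) {      // idempotent: release the pin exactly once
            closed = true;
            owner.unpin();
        }
    }
}

// Hypothetical spillable handle: it is not a table, it only produces one.
final class SketchSpillableKudoTable {
    private final SketchSpillableBuffer buffer;

    SketchSpillableKudoTable(byte[] bytes) {
        this.buffer = new SketchSpillableBuffer(bytes);
    }

    // Getting the table out implicitly pins the underlying buffer.
    SketchKudoTable makeKudoTable() {
        return new SketchKudoTable(buffer, buffer.pin());
    }

    boolean trySpill() { return buffer.trySpill(); }
}
```

With this shape there is nothing like guaranteeSpillable to call: a caller holds a `SketchKudoTable` inside a try-with-resources block, and the handle becomes spillable again as soon as that block exits.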

abellina · 2025-02-26T15:20:47Z

sql-plugin/src/main/java/com/nvidia/spark/rapids/SpillableKudoTable.java

+import com.nvidia.spark.rapids.jni.kudo.KudoTableHeader;
+
+public class SpillableKudoTable extends KudoTable {
+  private SpillableHostBuffer shb;


Really, this should just be a SpillableHostBufferHandle, but for now it probably fits better with the code.

abellina · 2025-02-26T15:35:28Z

sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuShuffleCoalesceExec.scala

-
-      KudoHostMergeResultWrapper(result)
+      try {
+        for (skt <- columns.map(_.spillableKudoTable)) {


Could a SpillableKudoTable make a KudoTable that isn't spillable? I don't know if SpillableKudoTable has to subclass KudoTable, but I feel this would clean up this interface a bit.

So you say skts.safeMap(_.makeKudoTable). KudoTable is already closeable, so when we close the sequence of KudoTable all HostMemoryBuffer references are closed and we are now spillable again.

We then later call skts.safeClose() to close the spillable handles.
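The exception-safety idea behind a `safeMap` followed by `safeClose` can be sketched in plain Java as below. This is not the plugin's Scala implementation: `SafeOps` and `CountedResource` are hypothetical names, and the behavior shown (close whatever was already produced if a later element fails) is an assumption about what the pattern is meant to guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helpers illustrating the map-then-close pattern.
final class SafeOps {
    // Map each input to a closeable result. If producing any element throws,
    // close the results created so far before rethrowing, so nothing leaks.
    static <A, B extends AutoCloseable> List<B> safeMap(
            List<A> in, Function<? super A, ? extends B> fn) {
        List<B> out = new ArrayList<>(in.size());
        try {
            for (A a : in) {
                out.add(fn.apply(a));
            }
            return out;
        } catch (RuntimeException e) {
            for (B b : out) {
                try {
                    b.close();
                } catch (Exception suppressed) {
                    e.addSuppressed(suppressed);
                }
            }
            throw e;
        }
    }

    // Close every item even if some closes fail; report the first failure
    // and attach any later ones as suppressed exceptions.
    static void safeClose(List<? extends AutoCloseable> items) {
        RuntimeException first = null;
        for (AutoCloseable c : items) {
            try {
                c.close();
            } catch (Exception e) {
                if (first == null) {
                    first = new RuntimeException("close failed", e);
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        if (first != null) throw first;
    }
}

// Hypothetical resource that tracks how many instances are still open.
final class CountedResource implements AutoCloseable {
    private final AtomicInteger open;
    private boolean closed = false;

    CountedResource(AtomicInteger open) {
        this.open = open;
        open.incrementAndGet();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            open.decrementAndGet();
        }
    }
}
```

In the shuffle-read case, the inputs would be the spillable handles and the mapped results the resident tables. Closing the mapped list releases the buffer references, and a separate `safeClose` on the handles releases the handles themselves.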

integrate kudo table into spillable framework

71a110c

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>

binmahone requested a review from abellina February 26, 2025 10:15

binmahone changed the title ~~integrate kudo table into spillable framework~~ Kudo shuffle read should be retryable and spillable on Host Memory Feb 26, 2025

binmahone mentioned this pull request Feb 26, 2025

[FEA] Support unspill for SpillableHostBuffer #12184

Open

binmahone requested a review from revans2 February 26, 2025 10:22

binmahone changed the title ~~Kudo shuffle read should be retryable and spillable on Host Memory~~ make kudo shuffle read retryable and spillable Feb 26, 2025

abellina reviewed Feb 26, 2025

revans2 mentioned this pull request Feb 26, 2025

support unspill for SpillableHostBuffer #12186

Open
