Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make kudo shuffle read retryable and spillable #12236

Open
wants to merge 1 commit into
base: branch-25.04
Choose a base branch
from

Conversation

binmahone
Copy link
Collaborator

This PR fixes #12215 in the way suggested by @abellina in #12184 (comment). It requires NVIDIA/spark-rapids-jni#2991 being merged first

Signed-off-by: Hongbin Ma (Mahone) <mahongbin@apache.org>
@binmahone binmahone requested a review from abellina February 26, 2025 10:15
@binmahone binmahone changed the title integrate kudo table into spillable framework Kudo shuffle read should be retryable and spillable on Host Memory Feb 26, 2025
@binmahone binmahone requested a review from revans2 February 26, 2025 10:22
@binmahone binmahone changed the title Kudo shuffle read should be retryable and spillable on Host Memory make kudo shuffle read retryable and spillable Feb 26, 2025
import com.nvidia.spark.rapids.jni.kudo.KudoTable;
import com.nvidia.spark.rapids.jni.kudo.KudoTableHeader;

public class SpillableKudoTable extends KudoTable {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this was in scala, it might make more sense, as it's only used from scala.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me I would prefer to have SpillableKudoTable not extend KudoTable.

Conceptually I want to say that I have a list of SpillableKudoTables, and then when I want to concat and serialize them into a Table I convert all of them into regular KudoTables which guarantees that they are all resident in memory, do the concat operation, and then close them when I am done.

This implies that a SpillableKudoTable is a KudoTable, which we can make it work, but as @abellina pointed out it makes the code much more complicated. If you have SpillableKudoTable produce a KudoTable, then we don't need guaranteeSpillable. The act of getting the KudoTable out implicitly increments the reference count and make it not spillable until it goes out of scope.

import com.nvidia.spark.rapids.jni.kudo.KudoTableHeader;

public class SpillableKudoTable extends KudoTable {
private SpillableHostBuffer shb;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

really this should just be a SpillableHostBufferHandle, but for now it probably fits better with the code.


KudoHostMergeResultWrapper(result)
try {
for (skt <- columns.map(_.spillableKudoTable)) {
Copy link
Collaborator

@abellina abellina Feb 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could a SpillableKudoTable make a KudoTable that isn't spillable? I don't know if SpillableKudoTable has to subclass KudoTable, but I feel this would clean this interface a bit.

So you say skts.safeMap(_.makeKudoTable). KudoTable is already closeable, so when we close the sequence of KudoTable all HostMemoryBuffer references are closed and we are now spillable again.

We then later skts.safeClose() to close the spillable handles.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Kudo shuffle read should be retryable and spillable on Host Memory
3 participants