Enable kudo serializer by default #12222

Status: Open. Wants to merge 3 commits into base: branch-25.04

Conversation

liurenjie1024
Collaborator

Closes #12202.

Enable the kudo serializer by default. This also contains several test fixes required by the resulting change in shuffle size.

Signed-off-by: liurenjie1024 <liurenjie2008@gmail.com>
@liurenjie1024
Collaborator Author

build

Collaborator

@abellina left a comment

@liurenjie1024 can you provide performance numbers for NDS with and without MT shuffle?

@sameerz added the "performance" (A performance related task/issue) label on Feb 25, 2025
@liurenjie1024
Collaborator Author

build

@@ -500,7 +500,7 @@ class AdaptiveQueryExecSuite
     val conf = new SparkConf()
       .set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true")
       .set(SQLConf.LOCAL_SHUFFLE_READER_ENABLED.key, "true")
-      .set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, "400")
+      .set(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, "50")
Collaborator Author

This change is required because the shuffle size changed.

@@ -99,7 +99,7 @@ class GpuLoreSuite extends SparkQueryCompareTestSuite with FunSuiteWithTempDir w
   }

   test("AQE broadcast") {
-    doTestReplay("90[*]") { spark =>
+    doTestReplay("93[*]") { spark =>
Collaborator Author

Same as above: the shuffle size change leads to a plan change.

@@ -2052,7 +2052,7 @@ val SHUFFLE_COMPRESSION_LZ4_CHUNK_SIZE = conf("spark.rapids.shuffle.compression.
     .internal()
     .startupOnly()
     .booleanConf
-    .createWithDefault(false)
+    .createWithDefault(true)
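Side note on the flipped default: since the setting is startup-only, users who hit a regression would need to opt out explicitly before the session starts. A minimal sketch, assuming the toggled setting is exposed as spark.rapids.shuffle.kudo.serializer.enabled (the exact key is an assumption here; check the generated configuration docs for the release):

```shell
# Hypothetical opt-out sketch: the config key below is an assumption
# based on this PR's context, not confirmed from the diff.
# The setting is startup-only, so it must be passed at launch time.
spark-shell \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.shuffle.kudo.serializer.enabled=false
```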
Collaborator

For spark.rapids.shuffle.kudo.serializer.measure.buffer.copy.enabled, should we enable that by default?

Collaborator

@revans2 left a comment

The changes themselves look fine. I mostly want to see the performance numbers showing that this is at least as good as the old code. I know we have done some of that in the past, and there have been a lot of optimizations recently, so it should be good. But this is a big change, so I want to see it.
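The A/B comparison the reviewers are asking for could be scripted roughly as below. Everything in this sketch is an assumption: the config key, the benchmark driver script, and the output paths are placeholders, not the actual NDS harness invocation.

```shell
# Hypothetical A/B benchmark loop: run the same workload with the
# kudo serializer enabled and disabled, writing results per variant.
# run_nds_queries.py and the config key are placeholders/assumptions.
for kudo in true false; do
  spark-submit \
    --conf spark.plugins=com.nvidia.spark.SQLPlugin \
    --conf spark.rapids.shuffle.kudo.serializer.enabled="${kudo}" \
    run_nds_queries.py --output "results_kudo_${kudo}.json"
done
```

Comparing the two result files would then show whether the new default is at least as fast, with and without the multi-threaded shuffle.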

Labels: performance (A performance related task/issue)
Successfully merging this pull request may close these issues: Enable kudo serializer by default.
5 participants