Add GPU support for spark.sql.catalyst.expressions.Slice (#12183)
Closes #11607

Depends on
* NVIDIA/spark-rapids-jni#2921

This PR adds support for
`org.apache.spark.sql.catalyst.expressions.Slice`.
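
For reference, `slice(x, start, length)` uses 1-based indices, counts from the end of the array when `start` is negative, and returns at most `length` elements. The sketch below is a plain-Scala model of those semantics as they are understood here from Spark's CPU behaviour. `SliceModel` and `sliceModel` are hypothetical names used only for illustration; they are not part of this PR, spark-rapids, or Spark.

```scala
// Hypothetical reference model of slice(x, start, length); not code from this PR.
object SliceModel {
  def sliceModel[T](x: Seq[T], start: Int, length: Int): Seq[T] = {
    // Assumption: start == 0 and negative lengths are rejected, as on the CPU.
    require(start != 0, "SQL array indices start at 1")
    require(length >= 0, "length must be greater than or equal to 0")
    // Positive start is 1-based; negative start counts back from the end.
    val begin = if (start > 0) start - 1 else x.length + start
    // A start position outside the array yields an empty result.
    if (begin < 0 || begin >= x.length) Seq.empty[T]
    else x.drop(begin).take(length)
  }
}
```

With this model, `SliceModel.sliceModel(Seq(1, 2, 3, 4), 2, 2)` gives `Seq(2, 3)` and `SliceModel.sliceModel(Seq(1, 2, 3, 4), -2, 2)` gives `Seq(3, 4)`.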

Ran a simple performance test five times:
```scala
// Generate 10,000,000 rows of array<int> test data with the big data generator
import org.apache.spark.sql.tests.datagen._
val dataTable = DBGen().addTable("data", "x array<int>", 10000000)
dataTable.toDF(spark).write.mode("OVERWRITE").parquet("PERF")

// spark-rapids
val df = spark.read.parquet("PERF")
spark.time(df.selectExpr("sum(size(slice(x, 2, 3)))").show())
```
Results:
| Device | Run 1   | Run 2   | Run 3   | Run 4   | Run 5   |
|--------|---------|---------|---------|---------|---------|
| GPU    | 325 ms  | 329 ms  | 334 ms  | 328 ms  | 341 ms  |
| CPU    | 1307 ms | 1224 ms | 1353 ms | 1185 ms | 1175 ms |

Averaged over the five runs, `GpuSlice` (TITAN RTX) took about 331 ms versus
about 1249 ms for `CpuSlice` (8 CPU cores), a speedup of roughly 3.8x.
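
The speedup can be checked directly from the timings in the table. The snippet below is a small sketch that averages the five runs for each device and takes the ratio; `SliceTimings` is a hypothetical name used only for this illustration.

```scala
// Hypothetical helper for checking the reported speedup; timings copied from the table.
object SliceTimings {
  val gpuMs: Seq[Int] = Seq(325, 329, 334, 328, 341)
  val cpuMs: Seq[Int] = Seq(1307, 1224, 1353, 1185, 1175)

  // Arithmetic mean of a non-empty sequence of millisecond timings.
  def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

  // Ratio of mean CPU time to mean GPU time.
  val speedup: Double = mean(cpuMs) / mean(gpuMs)

  def main(args: Array[String]): Unit =
    println(f"GPU mean: ${mean(gpuMs)}%.1f ms, CPU mean: ${mean(cpuMs)}%.1f ms, speedup: $speedup%.2fx")
}
```

The means are 331.4 ms (GPU) and 1248.8 ms (CPU), so the ratio is about 3.77.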

---------

Signed-off-by: Yan Feng <fengyan_@mail.ustc.edu.cn>
ustcfy authored Feb 25, 2025
1 parent 6fa0abf commit 8dc022a
Showing 53 changed files with 610 additions and 226 deletions.
1 change: 1 addition & 0 deletions docs/additional-functionality/advanced_configs.md
@@ -345,6 +345,7 @@ Name | SQL Function(s) | Description | Default Value | Notes
<a name="sql.expression.Sin"></a>spark.rapids.sql.expression.Sin|`sin`|Sine|true|None|
<a name="sql.expression.Sinh"></a>spark.rapids.sql.expression.Sinh|`sinh`|Hyperbolic sine|true|None|
<a name="sql.expression.Size"></a>spark.rapids.sql.expression.Size|`cardinality`, `size`|The size of an array or a map|true|None|
<a name="sql.expression.Slice"></a>spark.rapids.sql.expression.Slice|`slice`|Subsets array x starting from index start (array indices start at 1, or starting from the end if start is negative) with the specified length.|true|None|
<a name="sql.expression.SortArray"></a>spark.rapids.sql.expression.SortArray|`sort_array`|Returns a sorted array with the input array and the ascending / descending order|true|None|
<a name="sql.expression.SortOrder"></a>spark.rapids.sql.expression.SortOrder| |Sort order|true|None|
<a name="sql.expression.SparkPartitionID"></a>spark.rapids.sql.expression.SparkPartitionID|`spark_partition_id`|Returns the current partition id|true|None|