[FEA] Support RaiseError for DB 14.3 and Spark 4.0.0 #10969

Closed
razajafri opened this issue Jun 4, 2024 · 1 comment · Fixed by #11969
Labels
Spark 4.0+ Spark 4.0+ issues

Comments

@razajafri
Collaborator

razajafri commented Jun 4, 2024

PR #5487 added the ability to convert a UDF that can throw SparkException into a Catalyst expression that uses RaiseError.
Support for RaiseError was added in #5540, but Apache Spark 4.0 fundamentally changed how it throws exceptions, so we have to match that change.

We'd like to have GpuRaiseError so that we can prevent a columnar-to-row transition when an error should be raised.
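
To illustrate the motivation: when one expression in a projection has no GPU implementation, that operator is handed back to the CPU, which implies a columnar-to-row transition. The toy model below is only a sketch of that idea. `GPU_SUPPORTED`, `runs_on_gpu`, and the expression names in the set are hypothetical and do not reflect the plugin's actual planning code.

```python
# Toy model of GPU/CPU placement; NOT the spark-rapids planner.
# Every name here is hypothetical and exists only for illustration.

GPU_SUPPORTED = {"Add", "Literal", "If", "GreaterThan"}


def runs_on_gpu(expressions, supported=GPU_SUPPORTED):
    """A projection stays on the GPU only if every expression is supported."""
    return all(expr in supported for expr in expressions)


# A projection such as: IF(x > 0, x, raise_error('negative value'))
projection = ["If", "GreaterThan", "Literal", "RaiseError"]

# Without a GPU version of RaiseError, the whole projection falls back.
before = runs_on_gpu(projection)

# With RaiseError supported, the projection can stay columnar.
after = runs_on_gpu(projection, GPU_SUPPORTED | {"RaiseError"})
```

In this model a single unsupported expression is enough to move the projection off the GPU, which is why a GPU version of RaiseError matters even though the expression only fires on the error path.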

@mythrocks
Collaborator

mythrocks commented Oct 28, 2024

@razajafri, could you please confirm if this issue is necessary, if we already have #10107? Can we close this as a dupe? (Or vice versa.)

mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Oct 28, 2024
Fixes NVIDIA#11537.

This commit addresses the failure of the `test_raise_error` test
in `misc_expr_test.py` for Databricks 14.3.

This is an extension of NVIDIA#11129, where this test was skipped for
Apache Spark 4.0.  The failure on Databricks 14.3 shares the same
cause as in Spark 4.0, i.e. a backward-incompatible Spark change
in the signature of RaiseError, as introduced in
https://issues.apache.org/jira/browse/SPARK-44838.

The work to support this change in a Spark-RAPIDS shim will be
tracked in NVIDIA#10969.  This test will be skipped until that work
is completed.

Signed-off-by: MithunR <mithunr@nvidia.com>
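
The version gate described in the commit message above could look roughly like the following. This is a minimal sketch, not the actual test code: `parse_version`, `has_binary_raise_error`, and `should_skip_raise_error_test` are hypothetical helpers, and the version thresholds are taken only from the platforms named in the message (Databricks 14.3 and Spark 4.0).

```python
# Sketch of a version gate for test_raise_error.
# All helper names are hypothetical; they are not the integration-test API.

def parse_version(version):
    """Turn '3.5.1' or '4.0.0-SNAPSHOT' into a comparable tuple of ints."""
    core = version.split("-")[0]
    return tuple(int(part) for part in core.split("."))


def has_binary_raise_error(spark_version, databricks_runtime=None):
    """True on platforms assumed to carry the SPARK-44838 signature change."""
    if databricks_runtime is not None:
        # Databricks runtimes are gated on the runtime version,
        # not on the Spark version they report.
        return parse_version(databricks_runtime) >= (14, 3)
    return parse_version(spark_version) >= (4, 0, 0)


def should_skip_raise_error_test(spark_version, databricks_runtime=None):
    # Skipped until the shim work tracked in this issue is completed.
    return has_binary_raise_error(spark_version, databricks_runtime)
```

Under these assumptions, `should_skip_raise_error_test("3.5.0", databricks_runtime="14.3")` is true even though the reported Spark version is below 4.0, which matches the situation the commit describes.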
mythrocks added a commit that referenced this issue Nov 4, 2024
@mythrocks mythrocks self-assigned this Jan 7, 2025
@mythrocks mythrocks changed the title [FEA] Support RaiseError for Spark 4.0.0 [FEA] Support RaiseError for DB 14.3 and Spark 4.0.0 Jan 14, 2025
mythrocks added a commit to mythrocks/spark-rapids that referenced this issue Jan 16, 2025
Fixes NVIDIA#10969.

This commit adds support for `raise_error()` on Databricks 14.3 and
Spark 4.0.

On these new Spark versions, the `RaiseError` expression (that powers
the `raise_error()` API function) was changed from a Unary expression to
a Binary one.  This was done without modifying the arity of
`raise_error()`.  The ostensible reason seems to have been to eventually allow
user-code to raise custom errors via `raise_error()`.

This commit allows `raise_error()` to work on the GPU as it currently
does on the CPU:  as a unary function powered by a binary expression in
the background.

The tests have been modified to verify both the new behaviour and the
legacy one on new platforms, while continuing to run as before on legacy
platforms.

Signed-off-by: MithunR <mithunr@nvidia.com>
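
The "unary function powered by a binary expression" idea from the commit message above can be sketched as a small model. This is not Spark or spark-rapids code: the classes, the `raise_error` helper, the error-class string, and the `errorMessage` key are all assumptions chosen for illustration.

```python
# Toy model: one user-facing argument, two expression shapes.
# Class names, the error-class string, and the map key are assumptions.
from dataclasses import dataclass


@dataclass
class LegacyRaiseError:
    """Unary form: the message is the only child."""
    message: str

    def children(self):
        return (self.message,)


@dataclass
class BinaryRaiseError:
    """Binary form: an error class plus a map of error parameters."""
    error_class: str
    error_params: dict

    def children(self):
        return (self.error_class, self.error_params)


ASSUMED_ERROR_CLASS = "USER_RAISED_EXCEPTION"  # illustrative assumption


def raise_error(message, binary_platform):
    """The function keeps one argument; only the expression shape changes."""
    if binary_platform:
        return BinaryRaiseError(ASSUMED_ERROR_CLASS, {"errorMessage": message})
    return LegacyRaiseError(message)


legacy = raise_error("bad row", binary_platform=False)
modern = raise_error("bad row", binary_platform=True)
```

Callers pass a single message in both cases. On the newer platforms the message is wrapped into the second child, so the function's arity stays the same while the expression's arity changes.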
@sameerz sameerz removed the feature request New feature or request label Feb 14, 2025