Description
With ANSI mode enabled, a failed cast operation surfaces a raw CUDF exception instead of the appropriate Spark exception.
This problem is not exclusive to Spark 4; the same behaviour occurs on Spark 3.x, but only with ANSI enabled.
Repro
Consider the following String cast example:
Seq( "", "", "" ).toDF("a").write.mode("overwrite").parquet("/tmp/myth/test_input")
spark.read.parquet("/tmp/myth/test_input").selectExpr(" CAST(a AS INTEGER)").show
With ANSI enabled, empty strings should cause exceptions rather than yield NULLs.
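For completeness, ANSI mode must be switched on before running the repro above; the standard Spark configuration key for this is:

```scala
spark.conf.set("spark.sql.ansi.enabled", "true")
```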
On Apache Spark 4, the exception looks like:
org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22018
== SQL (line 1, position 1) ==
cast(a as integer)
^^^^^^^^^^^^^^^^^^
at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:801)
...
When ANSI is enabled with the spark-rapids plugin, one sees:
com.nvidia.spark.rapids.jni.CastException: Error casting data on row 0:
at com.nvidia.spark.rapids.jni.CastStrings.toInteger(Native Method)
at com.nvidia.spark.rapids.jni.CastStrings.toInteger(CastStrings.java:50)
at com.nvidia.spark.rapids.jni.CastStrings.toInteger(CastStrings.java:37)
at com.nvidia.spark.rapids.GpuCast$.doCast(GpuCast.scala:551)
at com.nvidia.spark.rapids.GpuCast.doColumnar(GpuCast.scala:1816)
at com.nvidia.spark.rapids.GpuUnaryExpression.doItColumnar(GpuExpressions.scala:276)
Expected behavior
One would expect that the CUDF exception would be caught and handled (or wrapped into a Spark-specific exception).
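The expected wrapping could look roughly like the following sketch. All class bodies here are simplified stand-ins written for illustration only: the real plugin would catch `com.nvidia.spark.rapids.jni.CastException` and rethrow Spark's `org.apache.spark.SparkNumberFormatException` (likely via a `RapidsErrorUtils` shim), neither of which is reproduced here.

```java
// Sketch of catch-and-wrap for ANSI cast failures.
// CastException and SparkNumberFormatException below are hypothetical
// stand-ins for the real plugin/Spark classes of the same names.
public class CastWrapSketch {

    // Stand-in for com.nvidia.spark.rapids.jni.CastException
    static class CastException extends RuntimeException {
        final long rowIndex;
        CastException(String msg, long rowIndex) {
            super(msg);
            this.rowIndex = rowIndex;
        }
    }

    // Stand-in for org.apache.spark.SparkNumberFormatException
    static class SparkNumberFormatException extends RuntimeException {
        SparkNumberFormatException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Stand-in for the native CastStrings.toInteger call; a real
    // implementation operates on columns, not single values.
    static int nativeToInteger(String value) {
        if (value == null || !value.matches("-?\\d+")) {
            throw new CastException("Error casting data on row 0", 0);
        }
        return Integer.parseInt(value);
    }

    // The wrapping the issue asks for: translate the CUDF-level
    // exception into a Spark-style ANSI cast error.
    static int castToIntAnsi(String value) {
        try {
            return nativeToInteger(value);
        } catch (CastException e) {
            throw new SparkNumberFormatException(
                "[CAST_INVALID_INPUT] The value '" + value + "' of the type "
                + "\"STRING\" cannot be cast to \"INT\" because it is malformed.",
                e);
        }
    }

    public static void main(String[] args) {
        System.out.println(castToIntAnsi("42"));
        try {
            castToIntAnsi("");
        } catch (SparkNumberFormatException e) {
            System.out.println("wrapped: " + e.getMessage());
        }
    }
}
```

The key design point is that the original `CastException` is preserved as the cause, so row-level diagnostic information from the GPU kernel is not lost while users still see the documented Spark error class.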
Environment details
ANSI enabled
Spark 4, 3.x
Additional context
This is an ANSI mode test. This won't be addressed as part of #11009. It's likely to need RapidsErrorUtils shim work.