
[BUG] [Spark 4] Decimal casting errors raised from the plugin do not match those from Spark 4.0 in ANSI mode #11550

mythrocks opened this issue Oct 1, 2024
Labels: bug (Something isn't working), Spark 4.0+ (Spark 4.0+ issues)

mythrocks commented Oct 1, 2024

Description
On Spark 4.0, when ANSI mode is enabled and a DECIMAL(3,0) column is cast to a narrower decimal type (e.g. DECIMAL(1,0)), the plugin's error message does not match the one from Apache Spark.

On Spark:

org.apache.spark.SparkArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION]  48 cannot be represented as Decimal(1, 0). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead. SQLSTATE: 22003
== SQL (line 1, position 2) ==
 cast(a as decimal(1,0))
 ^^^^^^^^^^^^^^^^^^^^^^^

On the Spark RAPIDS plugin:

org.apache.spark.SparkArithmeticException: [ARITHMETIC_OVERFLOW] overflow occurred. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22003
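
The Spark-side message can be reproduced directly in a spark-shell session. A minimal sketch, assuming a Spark 4.0 session (the literal 48 is taken from the message above):

// Cast a DECIMAL(3,0) value down to DECIMAL(1,0) with ANSI mode on.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST(48 AS DECIMAL(3,0)) AS a")
  .selectExpr("CAST(a AS DECIMAL(1,0))")
  .collect()
// Throws org.apache.spark.SparkArithmeticException:
// [NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION] 48 cannot be represented as Decimal(1, 0). ...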

Repro

Here is a minimal pytest repro:

# Assumes the spark-rapids integration_tests harness (asserts.py, data_gen.py):
import pytest
import pyspark.sql.functions as f
from pyspark.sql.types import DecimalType
from asserts import assert_gpu_and_cpu_error
from data_gen import DecimalGen, meta_idfn, unary_op_df

# As defined in the harness's test modules:
ansi_enabled_conf = {'spark.sql.ansi.enabled': 'true'}

@pytest.mark.parametrize('data_gen', [
    DecimalGen(3, 0)], ids=meta_idfn('from:'))
@pytest.mark.parametrize('to_type', [
    DecimalType(1, -1)], ids=meta_idfn('to:'))
def test_ansi_cast_failures_decimal_to_decimal(data_gen, to_type):
    assert_gpu_and_cpu_error(
        lambda spark: unary_op_df(spark, data_gen).select(f.col('a').cast(to_type), f.col('a')).collect(),
        conf=ansi_enabled_conf,
        error_message="cannot be represented as Decimal")

Expected behavior
The overflow exception raised by the plugin should match what Spark 4.0 produces.

Misc
Depends on #11414.

mythrocks (author) commented

The "correct" solution here would be to shim the code that generates the exception, ideally in RapidsErrorUtils.

The problem is that RapidsErrorUtils underwent refactor, as part of #11414. That change has yet to be merged. Attempting to fix this simultaneously will lead to rework from conflicts.

I'm not inclined to fix this as part of addressing #11009. I will include this repro as part of #11009, with an xfail.
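
For illustration, one possible shape for such a shim is sketched below. This is a rough sketch only: the object layout and method name are hypothetical, and it assumes Spark's QueryExecutionErrors.cannotChangeDecimalPrecisionError factory is still available with roughly this signature on Spark 4.0.

// Hypothetical sketch -- not the plugin's actual shim layout.
// Idea: a Spark-4.0-specific RapidsErrorUtils delegates to Spark's own error
// factory, so the error class (NUMERIC_VALUE_OUT_OF_RANGE.WITH_SUGGESTION on
// Spark 4.0) and message track each Spark version exactly.
import org.apache.spark.sql.errors.QueryExecutionErrors
import org.apache.spark.sql.types.{Decimal, DecimalType}

object RapidsErrorUtils {
  // Assumption: this Spark error-factory method keeps this shape in Spark 4.0.
  def cannotCastDecimalError(value: Decimal, toType: DecimalType): ArithmeticException =
    QueryExecutionErrors.cannotChangeDecimalPrecisionError(
      value, toType.precision, toType.scale)
}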
