[SPARK-47781][SQL] Handle negative scale decimals for JDBC data sources
### What changes were proposed in this pull request?

SPARK-30252 disallowed negative scale for decimals. This caused a regression that also disabled reading negative scale decimals from JDBC data sources. **Although there is a legacy config to restore the old behavior**, it is neither designed for such a case nor convenient in a data pipeline that extracts negative scale decimals from a database such as Oracle into Parquet files, which do not support negative scale decimals.
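For context, the legacy escape hatch referred to above is, to the best of my knowledge, the SQL conf shown below; note that it flips the behavior globally rather than per source:

```scala
// Assumption: this is the flag behind
// SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED, exercised in the test below.
spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", "true")
```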

In addition, Postgres has supported negative scale decimals since v15, even though its earlier lack of support was one of the arguments for disabling them on our side.

In this PR, we change the schema of such a column from `decimal(p, s)` to `decimal(p - s, 0)` when `s < 0`; the sketch below illustrates the rule.
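A minimal sketch of the mapping arithmetic, assuming a hypothetical helper name (`adjustNegativeScale` is not part of the patch) and the 38-digit cap that `DecimalType.bounded` applies:

```scala
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper mirroring the rule: a JDBC decimal(p, s) with s < 0
// is widened to decimal(p - s, 0); other shapes keep their precision/scale.
def adjustNegativeScale(precision: Int, scale: Int): DecimalType =
  if (scale < 0) DecimalType(math.min(precision - scale, 38), 0)
  else DecimalType(math.min(precision, 38), math.min(scale, 38))

// Oracle NUMBER(7,-2) now surfaces as decimal(9,0).
assert(adjustNegativeScale(7, -2) == DecimalType(9, 0))
```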

### Why are the changes needed?

Negative scale decimals, which round the integral part of a value, are widely supported by databases such as Oracle and Postgres:
- Oracle
> Negative scale is the number of significant digits to the left of the decimal point, to but not including the least significant digit. For negative scale the least significant digit is on the left side of the decimal point, because the actual data is rounded to the specified number of places to the left of the decimal point. For example, a specification of (10,-2) means to round to hundreds.

- Postgres
> Beginning in PostgreSQL 15, it is allowed to declare a numeric column with a negative scale. Then values will be rounded to the left of the decimal point. The precision still represents the maximum number of non-rounded digits. Thus, a column declared as `NUMERIC(2, -3)` will round values to the nearest thousand and can store values between -99000 and 99000, inclusive. It is also allowed to declare a scale larger than the declared precision. Such a column can only hold fractional values, and it requires the number of zero digits just to the right of the decimal point to be at least the declared scale minus the declared precision. For example, a column declared as `NUMERIC(3, 5)` will round values to five decimal places and can store values between -0.00999 and 0.00999, inclusive.
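As a quick illustration of that rounding (not part of the patch), `java.math.BigDecimal` reproduces what a negative-scale column such as Oracle's `NUMBER(7,-2)` stores:

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Scale -2 rounds to the nearest hundred, matching the test value below.
val stored = new JBigDecimal("7456123.89").setScale(-2, RoundingMode.HALF_UP)
println(stored.toPlainString) // 7456100
```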

### Does this PR introduce _any_ user-facing change?

Yes. Decimal columns with a negative scale from JDBC data sources are now read as `decimal(p-s, 0)` instead of failing under the default configuration.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45956 from yaooqinn/SPARK-47781.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
yaooqinn committed Apr 10, 2024
1 parent 520f3b1 commit b53ec00
Showing 2 changed files with 16 additions and 20 deletions.

```diff
@@ -121,10 +121,11 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
       """.stripMargin.replaceAll("\n", " "))
 
 
-    conn.prepareStatement("CREATE TABLE numerics (b DECIMAL(1), f DECIMAL(3, 2), i DECIMAL(10))")
+    conn.prepareStatement("CREATE TABLE numerics (b DECIMAL(1), f DECIMAL(3, 2), i DECIMAL(10)," +
+      "n NUMBER(7,-2))")
       .executeUpdate()
     conn.prepareStatement(
-      "INSERT INTO numerics VALUES (4, 1.23, 9999999999)").executeUpdate()
+      "INSERT INTO numerics VALUES (4, 1.23, 9999999999, 7456123.89)").executeUpdate()
     conn.commit()
 
     conn.prepareStatement("CREATE TABLE oracle_types (d BINARY_DOUBLE, f BINARY_FLOAT)")
```
```diff
@@ -159,19 +160,14 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
     conn.commit()
   }
 
-  test("SPARK-16625 : Importing Oracle numeric types") {
-    val df = sqlContext.read.jdbc(jdbcUrl, "numerics", new Properties)
-    val rows = df.collect()
-    assert(rows.length == 1)
-    val row = rows(0)
-    // The main point of the below assertions is not to make sure that these Oracle types are
-    // mapped to decimal types, but to make sure that the returned values are correct.
-    // A value > 1 from DECIMAL(1) is correct:
-    assert(row.getDecimal(0).compareTo(BigDecimal.valueOf(4)) == 0)
-    // A value with fractions from DECIMAL(3, 2) is correct:
-    assert(row.getDecimal(1).compareTo(BigDecimal.valueOf(1.23)) == 0)
-    // A value > Int.MaxValue from DECIMAL(10) is correct:
-    assert(row.getDecimal(2).compareTo(BigDecimal.valueOf(9999999999L)) == 0)
+  test("SPARK-16625: Importing Oracle numeric types") {
+    Seq("true", "false").foreach { flag =>
+      withSQLConf((SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key, flag)) {
+        val df = sqlContext.read.jdbc(jdbcUrl, "numerics", new Properties)
+        checkAnswer(df, Seq(Row(BigDecimal.valueOf(4), BigDecimal.valueOf(1.23),
+          BigDecimal.valueOf(9999999999L), BigDecimal.valueOf(7456100))))
+      }
+    }
   }
```
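A hypothetical spot check of the new behavior (the names `spark` and `jdbcUrl` are assumed from the suite's context, not from the patch):

```scala
// With the new mapping, the n NUMBER(7,-2) column should surface as
// decimal(9,0), and the stored 7456123.89 should read back as 7456100.
val df = spark.read.jdbc(jdbcUrl, "numerics", new java.util.Properties)
df.printSchema()
df.show()
```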
```diff
@@ -196,9 +196,12 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case java.sql.Types.CHAR => CharType(precision)
       case java.sql.Types.CLOB => StringType
       case java.sql.Types.DATE => DateType
-      case java.sql.Types.DECIMAL if precision != 0 || scale != 0 =>
+      case java.sql.Types.DECIMAL | java.sql.Types.NUMERIC if precision == 0 && scale == 0 =>
+        DecimalType.SYSTEM_DEFAULT
+      case java.sql.Types.DECIMAL | java.sql.Types.NUMERIC if scale < 0 =>
+        DecimalType.bounded(precision - scale, 0)
+      case java.sql.Types.DECIMAL | java.sql.Types.NUMERIC =>
         DecimalType.bounded(precision, scale)
-      case java.sql.Types.DECIMAL => DecimalType.SYSTEM_DEFAULT
       case java.sql.Types.DOUBLE => DoubleType
       case java.sql.Types.FLOAT => FloatType
       case java.sql.Types.INTEGER => if (signed) IntegerType else LongType
```
```diff
@@ -207,9 +210,6 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case java.sql.Types.LONGVARCHAR => StringType
       case java.sql.Types.NCHAR => StringType
       case java.sql.Types.NCLOB => StringType
-      case java.sql.Types.NUMERIC if precision != 0 || scale != 0 =>
-        DecimalType.bounded(precision, scale)
-      case java.sql.Types.NUMERIC => DecimalType.SYSTEM_DEFAULT
       case java.sql.Types.NVARCHAR => StringType
       case java.sql.Types.REAL => DoubleType
       case java.sql.Types.REF => StringType
```
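For reference, a minimal check of the types the unified arms now produce; `DecimalType.SYSTEM_DEFAULT` is `decimal(38, 18)`:

```scala
import org.apache.spark.sql.types.DecimalType

// (p=0, s=0)  -> system default, for drivers that report no precision/scale.
// (p=7, s=-2) -> decimal(9,0); (p=10, s=2) -> decimal(10,2) as before.
assert(DecimalType.SYSTEM_DEFAULT == DecimalType(38, 18))
```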
