[SPARK-47781][SQL] Handle negative scale decimals for JDBC data sources
### What changes were proposed in this pull request?

SPARK-30252 disallowed negative scale for decimals. This caused a regression that also disabled reading negative scale decimals from JDBC data sources. **Although there is a legacy config to restore the old behavior**, it is neither designed for such a case nor convenient in a data pipeline that extracts negative scale decimals from a database such as Oracle into Parquet files, which do not support negative scale decimals.
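For context, the legacy escape hatch referred to above is, to the best of my knowledge, the SQL conf shown below; note that it flips the behavior globally rather than per source:

```scala
// Assumption: this is the flag behind
// SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED, exercised in the test below.
spark.conf.set("spark.sql.legacy.allowNegativeScaleOfDecimal", "true")
```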

In addition, Postgres has supported negative scale decimals since v15, even though its earlier lack of support was one of the arguments for disabling them on our side.

In this PR, we change the schema of such a column from `decimal(p, s)` to `decimal(p - s, 0)` when `s < 0`; the sketch below illustrates the rule.
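A minimal sketch of the mapping arithmetic, assuming a hypothetical helper name (`adjustNegativeScale` is not part of the patch) and the 38-digit cap that `DecimalType.bounded` applies:

```scala
import org.apache.spark.sql.types.DecimalType

// Hypothetical helper mirroring the rule: a JDBC decimal(p, s) with s < 0
// is widened to decimal(p - s, 0); other shapes keep their precision/scale.
def adjustNegativeScale(precision: Int, scale: Int): DecimalType =
  if (scale < 0) DecimalType(math.min(precision - scale, 38), 0)
  else DecimalType(math.min(precision, 38), math.min(scale, 38))

// Oracle NUMBER(7,-2) now surfaces as decimal(9,0).
assert(adjustNegativeScale(7, -2) == DecimalType(9, 0))
```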

### Why are the changes needed?

Negative scale decimals, which round the integral part of a value, are widely supported by databases such as Oracle and Postgres:
- Oracle
> Negative scale is the number of significant digits to the left of the decimal point, to but not including the least significant digit. For negative scale the least significant digit is on the left side of the decimal point, because the actual data is rounded to the specified number of places to the left of the decimal point. For example, a specification of (10,-2) means to round to hundreds.

- Postgres
> Beginning in PostgreSQL 15, it is allowed to declare a numeric column with a negative scale. Then values will be rounded to the left of the decimal point. The precision still represents the maximum number of non-rounded digits. Thus, a column declared as `NUMERIC(2, -3)` will round values to the nearest thousand and can store values between -99000 and 99000, inclusive. It is also allowed to declare a scale larger than the declared precision. Such a column can only hold fractional values, and it requires the number of zero digits just to the right of the decimal point to be at least the declared scale minus the declared precision. For example, a column declared as `NUMERIC(3, 5)` will round values to five decimal places and can store values between -0.00999 and 0.00999, inclusive.
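As a quick illustration of that rounding (not part of the patch), `java.math.BigDecimal` reproduces what a negative-scale column such as Oracle's `NUMBER(7,-2)` stores:

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Scale -2 rounds to the nearest hundred, matching the test value below.
val stored = new JBigDecimal("7456123.89").setScale(-2, RoundingMode.HALF_UP)
println(stored.toPlainString) // 7456100
```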

### Does this PR introduce _any_ user-facing change?

Yes. Decimal columns with a negative scale from JDBC data sources are now read as `decimal(p-s, 0)` instead of failing under the default configuration.

### How was this patch tested?

New tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #45956 from yaooqinn/SPARK-47781.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
yaooqinn committed Apr 10, 2024
1 parent 520f3b1 commit b53ec00
Showing 2 changed files with 16 additions and 20 deletions.

```diff
@@ -121,10 +121,11 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
       """.stripMargin.replaceAll("\n", " "))
 
 
-    conn.prepareStatement("CREATE TABLE numerics (b DECIMAL(1), f DECIMAL(3, 2), i DECIMAL(10))")
+    conn.prepareStatement("CREATE TABLE numerics (b DECIMAL(1), f DECIMAL(3, 2), i DECIMAL(10)," +
+      "n NUMBER(7,-2))")
       .executeUpdate()
     conn.prepareStatement(
-      "INSERT INTO numerics VALUES (4, 1.23, 9999999999)").executeUpdate()
+      "INSERT INTO numerics VALUES (4, 1.23, 9999999999, 7456123.89)").executeUpdate()
     conn.commit()
 
     conn.prepareStatement("CREATE TABLE oracle_types (d BINARY_DOUBLE, f BINARY_FLOAT)")
```
```diff
@@ -159,19 +160,14 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSparkSession {
     conn.commit()
   }
 
-  test("SPARK-16625 : Importing Oracle numeric types") {
-    val df = sqlContext.read.jdbc(jdbcUrl, "numerics", new Properties)
-    val rows = df.collect()
-    assert(rows.length == 1)
-    val row = rows(0)
-    // The main point of the below assertions is not to make sure that these Oracle types are
-    // mapped to decimal types, but to make sure that the returned values are correct.
-    // A value > 1 from DECIMAL(1) is correct:
-    assert(row.getDecimal(0).compareTo(BigDecimal.valueOf(4)) == 0)
-    // A value with fractions from DECIMAL(3, 2) is correct:
-    assert(row.getDecimal(1).compareTo(BigDecimal.valueOf(1.23)) == 0)
-    // A value > Int.MaxValue from DECIMAL(10) is correct:
-    assert(row.getDecimal(2).compareTo(BigDecimal.valueOf(9999999999L)) == 0)
+  test("SPARK-16625: Importing Oracle numeric types") {
+    Seq("true", "false").foreach { flag =>
+      withSQLConf((SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key, flag)) {
+        val df = sqlContext.read.jdbc(jdbcUrl, "numerics", new Properties)
+        checkAnswer(df, Seq(Row(BigDecimal.valueOf(4), BigDecimal.valueOf(1.23),
+          BigDecimal.valueOf(9999999999L), BigDecimal.valueOf(7456100))))
+      }
+    }
   }
```
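A hypothetical spot check of the new behavior (the names `spark` and `jdbcUrl` are assumed from the suite's context, not from the patch):

```scala
// With the new mapping, the n NUMBER(7,-2) column should surface as
// decimal(9,0), and the stored 7456123.89 should read back as 7456100.
val df = spark.read.jdbc(jdbcUrl, "numerics", new java.util.Properties)
df.printSchema()
df.show()
```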
```diff
@@ -196,9 +196,12 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case java.sql.Types.CHAR => CharType(precision)
       case java.sql.Types.CLOB => StringType
       case java.sql.Types.DATE => DateType
-      case java.sql.Types.DECIMAL if precision != 0 || scale != 0 =>
+      case java.sql.Types.DECIMAL | java.sql.Types.NUMERIC if precision == 0 && scale == 0 =>
+        DecimalType.SYSTEM_DEFAULT
+      case java.sql.Types.DECIMAL | java.sql.Types.NUMERIC if scale < 0 =>
+        DecimalType.bounded(precision - scale, 0)
+      case java.sql.Types.DECIMAL | java.sql.Types.NUMERIC =>
         DecimalType.bounded(precision, scale)
-      case java.sql.Types.DECIMAL => DecimalType.SYSTEM_DEFAULT
       case java.sql.Types.DOUBLE => DoubleType
       case java.sql.Types.FLOAT => FloatType
       case java.sql.Types.INTEGER => if (signed) IntegerType else LongType
```
```diff
@@ -207,9 +210,6 @@ object JdbcUtils extends Logging with SQLConfHelper {
       case java.sql.Types.LONGVARCHAR => StringType
       case java.sql.Types.NCHAR => StringType
       case java.sql.Types.NCLOB => StringType
-      case java.sql.Types.NUMERIC if precision != 0 || scale != 0 =>
-        DecimalType.bounded(precision, scale)
-      case java.sql.Types.NUMERIC => DecimalType.SYSTEM_DEFAULT
       case java.sql.Types.NVARCHAR => StringType
       case java.sql.Types.REAL => DoubleType
       case java.sql.Types.REF => StringType
```
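For reference, a minimal check of the types the unified arms now produce; `DecimalType.SYSTEM_DEFAULT` is `decimal(38, 18)`:

```scala
import org.apache.spark.sql.types.DecimalType

// (p=0, s=0)  -> system default, for drivers that report no precision/scale.
// (p=7, s=-2) -> decimal(9,0); (p=10, s=2) -> decimal(10,2) as before.
assert(DecimalType.SYSTEM_DEFAULT == DecimalType(38, 18))
```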
