[SPARK-31405][SQL][3.0] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files #28526
Conversation
…me values from/to Parquet/Avro files
When reading/writing datetime values that are before the rebase switch day from/to Avro/Parquet files, fail by default and ask users to set a config to explicitly choose whether to rebase or not. Rebasing or not has different behaviors, and we should let users decide explicitly. In most cases, users won't hit this exception as it only affects ancient datetime values. Yes, now users will see an error when reading/writing dates before 1582-10-15 or timestamps before 1900-01-01 from/to Parquet/Avro files, with an error message asking them to set a config. Updated tests.
Closes apache#28477 from cloud-fan/rebase.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
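For illustration only (not part of this PR; the session setup and path are made up, but the config name comes from the diff below), this is roughly how a user would hit the new default and then opt in explicitly on the read side:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rebase-read-demo").getOrCreate()

// Under the new default (EXCEPTION), reading a Parquet file that contains
// dates before 1582-10-15 and carries no Spark writer metadata fails with an
// error that points at the config below.
// spark.read.parquet("/path/to/legacy-parquet").show()

// Decide explicitly: LEGACY rebases from the hybrid Julian + Gregorian
// calendar, CORRECTED reads the stored values as-is.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.read.parquet("/path/to/legacy-parquet").show()
```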
Test build #122607 has finished for PR 28526 at commit
In which places does the original PR conflict with 3.0?
Many ..
diff --cc external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index 27206edb287,1d18594fd34..00000000000
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@@ -110,22 -116,13 +116,29 @@@ class AvroDeserializer
case (LONG, TimestampType) => avroType.getLogicalType match {
// For backward compatibility, if the Avro type is Long and it is not logical type
// (the `null` case), the value is processed as timestamp type with millisecond precision.
++<<<<<<< HEAD
+ case null | _: TimestampMillis if rebaseDateTime => (updater, ordinal, value) =>
+ val millis = value.asInstanceOf[Long]
+ val micros = DateTimeUtils.fromMillis(millis)
+ val rebasedMicros = rebaseJulianToGregorianMicros(micros)
+ updater.setLong(ordinal, rebasedMicros)
+ case null | _: TimestampMillis => (updater, ordinal, value) =>
+ val millis = value.asInstanceOf[Long]
+ val micros = DateTimeUtils.fromMillis(millis)
+ updater.setLong(ordinal, micros)
+ case _: TimestampMicros if rebaseDateTime => (updater, ordinal, value) =>
+ val micros = value.asInstanceOf[Long]
+ val rebasedMicros = rebaseJulianToGregorianMicros(micros)
+ updater.setLong(ordinal, rebasedMicros)
++=======
+ case null | _: TimestampMillis => (updater, ordinal, value) =>
+ val millis = value.asInstanceOf[Long]
+ val micros = DateTimeUtils.millisToMicros(millis)
+ updater.setLong(ordinal, timestampRebaseFunc(micros))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
case _: TimestampMicros => (updater, ordinal, value) =>
val micros = value.asInstanceOf[Long]
- updater.setLong(ordinal, micros)
+ updater.setLong(ordinal, timestampRebaseFunc(micros))
case other => throw new IncompatibleSchemaException(
s"Cannot convert Avro logical type ${other} to Catalyst Timestamp type.")
}
diff --cc external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
index dc232168fd2,21c5dec6239..00000000000
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
@@@ -155,15 -160,10 +160,22 @@@ class AvroSerializer
case (TimestampType, LONG) => avroType.getLogicalType match {
// For backward compatibility, if the Avro type is Long and it is not logical type
// (the `null` case), output the timestamp value as with millisecond precision.
++<<<<<<< HEAD
+ case null | _: TimestampMillis if rebaseDateTime => (getter, ordinal) =>
+ val micros = getter.getLong(ordinal)
+ val rebasedMicros = rebaseGregorianToJulianMicros(micros)
+ DateTimeUtils.toMillis(rebasedMicros)
+ case null | _: TimestampMillis => (getter, ordinal) =>
+ DateTimeUtils.toMillis(getter.getLong(ordinal))
+ case _: TimestampMicros if rebaseDateTime => (getter, ordinal) =>
+ rebaseGregorianToJulianMicros(getter.getLong(ordinal))
+ case _: TimestampMicros => (getter, ordinal) => getter.getLong(ordinal)
++=======
+ case null | _: TimestampMillis => (getter, ordinal) =>
+ DateTimeUtils.microsToMillis(timestampRebaseFunc(getter.getLong(ordinal)))
+ case _: TimestampMicros => (getter, ordinal) =>
+ timestampRebaseFunc(getter.getLong(ordinal))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
case other => throw new IncompatibleSchemaException(
s"Cannot convert Catalyst Timestamp type to Avro logical type ${other}")
}
diff --cc sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 6c18280ce4d,aeaf884c7d1..00000000000
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@@ -2509,57 -2520,71 +2509,83 @@@ object SQLConf
.booleanConf
.createWithDefault(false)
++<<<<<<< HEAD
+ val LEGACY_PARQUET_REBASE_DATETIME_IN_WRITE =
+ buildConf("spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled")
+ .internal()
+ .doc("When true, rebase dates/timestamps from Proleptic Gregorian calendar " +
+ "to the hybrid calendar (Julian + Gregorian) in write. " +
+ "The rebasing is performed by converting micros/millis/days to " +
+ "a local date/timestamp in the source calendar, interpreting the resulted date/" +
+ "timestamp in the target calendar, and getting the number of micros/millis/days " +
+ "since the epoch 1970-01-01 00:00:00Z.")
- .version("3.0.0")
- .booleanConf
- .createWithDefault(false)
-
- val LEGACY_PARQUET_REBASE_DATETIME_IN_READ =
- buildConf("spark.sql.legacy.parquet.rebaseDateTimeInRead.enabled")
++=======
+ val LEGACY_INTEGER_GROUPING_ID =
+ buildConf("spark.sql.legacy.integerGroupingId")
.internal()
- .doc("When true, rebase dates/timestamps " +
- "from the hybrid calendar to Proleptic Gregorian calendar in read. " +
- "The rebasing is performed by converting micros/millis/days to " +
- "a local date/timestamp in the source calendar, interpreting the resulted date/" +
- "timestamp in the target calendar, and getting the number of micros/millis/days " +
- "since the epoch 1970-01-01 00:00:00Z.")
- .version("3.0.0")
+ .doc("When true, grouping_id() returns int values instead of long values.")
+ .version("3.1.0")
.booleanConf
.createWithDefault(false)
- val LEGACY_AVRO_REBASE_DATETIME_IN_WRITE =
- buildConf("spark.sql.legacy.avro.rebaseDateTimeInWrite.enabled")
+ val LEGACY_PARQUET_REBASE_MODE_IN_WRITE =
+ buildConf("spark.sql.legacy.parquet.datetimeRebaseModeInWrite")
.internal()
- .doc("When true, rebase dates/timestamps from Proleptic Gregorian calendar " +
- "to the hybrid calendar (Julian + Gregorian) in write. " +
- "The rebasing is performed by converting micros/millis/days to " +
- "a local date/timestamp in the source calendar, interpreting the resulted date/" +
- "timestamp in the target calendar, and getting the number of micros/millis/days " +
- "since the epoch 1970-01-01 00:00:00Z.")
+ .doc("When LEGACY, Spark will rebase dates/timestamps from Proleptic Gregorian calendar " +
+ "to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files. " +
+ "When CORRECTED, Spark will not do rebase and write the dates/timestamps as it is. " +
+ "When EXCEPTION, which is the default, Spark will fail the writing if it sees " +
+ "ancient dates/timestamps that are ambiguous between the two calendars.")
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
.version("3.0.0")
- .booleanConf
- .createWithDefault(false)
+ .stringConf
+ .transform(_.toUpperCase(Locale.ROOT))
+ .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+ .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)
+
+ val LEGACY_PARQUET_REBASE_MODE_IN_READ =
+ buildConf("spark.sql.legacy.parquet.datetimeRebaseModeInRead")
+ .internal()
+ .doc("When LEGACY, Spark will rebase dates/timestamps from the legacy hybrid (Julian + " +
+ "Gregorian) calendar to Proleptic Gregorian calendar when reading Parquet files. " +
+ "When CORRECTED, Spark will not do rebase and read the dates/timestamps as it is. " +
+ "When EXCEPTION, which is the default, Spark will fail the reading if it sees " +
+ "ancient dates/timestamps that are ambiguous between the two calendars. This config is " +
+ "only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.")
+ .version("3.0.0")
+ .stringConf
+ .transform(_.toUpperCase(Locale.ROOT))
+ .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+ .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)
- val LEGACY_AVRO_REBASE_DATETIME_IN_READ =
- buildConf("spark.sql.legacy.avro.rebaseDateTimeInRead.enabled")
+ val LEGACY_AVRO_REBASE_MODE_IN_WRITE =
+ buildConf("spark.sql.legacy.avro.datetimeRebaseModeInWrite")
.internal()
- .doc("When true, rebase dates/timestamps " +
- "from the hybrid calendar to Proleptic Gregorian calendar in read. " +
- "The rebasing is performed by converting micros/millis/days to " +
- "a local date/timestamp in the source calendar, interpreting the resulted date/" +
- "timestamp in the target calendar, and getting the number of micros/millis/days " +
- "since the epoch 1970-01-01 00:00:00Z.")
+ .doc("When LEGACY, Spark will rebase dates/timestamps from Proleptic Gregorian calendar " +
+ "to the legacy hybrid (Julian + Gregorian) calendar when writing Avro files. " +
+ "When CORRECTED, Spark will not do rebase and write the dates/timestamps as it is. " +
+ "When EXCEPTION, which is the default, Spark will fail the writing if it sees " +
+ "ancient dates/timestamps that are ambiguous between the two calendars.")
.version("3.0.0")
- .booleanConf
- .createWithDefault(false)
+ .stringConf
+ .transform(_.toUpperCase(Locale.ROOT))
+ .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+ .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)
+
+ val LEGACY_AVRO_REBASE_MODE_IN_READ =
+ buildConf("spark.sql.legacy.avro.datetimeRebaseModeInRead")
+ .internal()
+ .doc("When LEGACY, Spark will rebase dates/timestamps from the legacy hybrid (Julian + " +
+ "Gregorian) calendar to Proleptic Gregorian calendar when reading Avro files. " +
+ "When CORRECTED, Spark will not do rebase and read the dates/timestamps as it is. " +
+ "When EXCEPTION, which is the default, Spark will fail the reading if it sees " +
+ "ancient dates/timestamps that are ambiguous between the two calendars. This config is " +
+ "only effective if the writer info (like Spark, Hive) of the Avro files is unknown.")
+ .version("3.0.0")
+ .stringConf
+ .transform(_.toUpperCase(Locale.ROOT))
+ .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+ .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)
/**
* Holds information about keys that have been deprecated.
@@@ -3139,9 -3166,7 +3165,13 @@@ class SQLConf extends Serializable wit
def csvFilterPushDown: Boolean = getConf(CSV_FILTER_PUSHDOWN_ENABLED)
++<<<<<<< HEAD
+ def parquetRebaseDateTimeInReadEnabled: Boolean = {
+ getConf(SQLConf.LEGACY_PARQUET_REBASE_DATETIME_IN_READ)
+ }
++=======
+ def integerGroupingIdEnabled: Boolean = getConf(SQLConf.LEGACY_INTEGER_GROUPING_ID)
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
/** ********************** SQLConf functionality methods ************ */
diff --cc sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 11ce11dd721,3e409ab9a50..00000000000
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@@ -324,20 -352,20 +352,32 @@@ public class VectorizedColumnReader
}
}
} else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
- if (rebaseDateTime) {
+ if ("CORRECTED".equals(datetimeRebaseMode)) {
for (int i = rowId; i < rowId + num; ++i) {
if (!column.isNullAt(i)) {
++<<<<<<< HEAD
+ long julianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+ long julianMicros = DateTimeUtils.fromMillis(julianMillis);
+ long gregorianMicros = RebaseDateTime.rebaseJulianToGregorianMicros(julianMicros);
+ column.putLong(i, gregorianMicros);
++=======
+ long gregorianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+ column.putLong(i, DateTimeUtils.millisToMicros(gregorianMillis));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
}
}
} else {
+ final boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
for (int i = rowId; i < rowId + num; ++i) {
if (!column.isNullAt(i)) {
++<<<<<<< HEAD
+ long gregorianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+ column.putLong(i, DateTimeUtils.fromMillis(gregorianMillis));
++=======
+ long julianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+ long julianMicros = DateTimeUtils.millisToMicros(julianMillis);
+ column.putLong(i, rebaseMicros(julianMicros, failIfRebase));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
}
}
}
@@@ -485,27 -514,29 +526,38 @@@
defColumn.readLongs(
num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
} else if (originalType == OriginalType.TIMESTAMP_MICROS) {
- if (rebaseDateTime) {
- defColumn.readLongsWithRebase(
- num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
- } else {
+ if ("CORRECTED".equals(datetimeRebaseMode)) {
defColumn.readLongs(
num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
+ } else {
+ boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
+ defColumn.readLongsWithRebase(
+ num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn, failIfRebase);
}
} else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
- if (rebaseDateTime) {
+ if ("CORRECTED".equals(datetimeRebaseMode)) {
for (int i = 0; i < num; i++) {
if (defColumn.readInteger() == maxDefLevel) {
++<<<<<<< HEAD
+ long micros = DateTimeUtils.fromMillis(dataColumn.readLong());
+ column.putLong(rowId + i, RebaseDateTime.rebaseJulianToGregorianMicros(micros));
++=======
+ column.putLong(rowId + i, DateTimeUtils.millisToMicros(dataColumn.readLong()));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
} else {
column.putNull(rowId + i);
}
}
} else {
+ final boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
for (int i = 0; i < num; i++) {
if (defColumn.readInteger() == maxDefLevel) {
++<<<<<<< HEAD
+ column.putLong(rowId + i, DateTimeUtils.fromMillis(dataColumn.readLong()));
++=======
+ long julianMicros = DateTimeUtils.millisToMicros(dataColumn.readLong());
+ column.putLong(rowId + i, rebaseMicros(julianMicros, failIfRebase));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
} else {
column.putNull(rowId + i);
}
diff --cc sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
index 08fbca2995c,201ee16faeb..00000000000
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
@@@ -291,19 -289,10 +289,26 @@@ private[parquet] class ParquetRowConver
}
case TimestampType if parquetType.getOriginalType == OriginalType.TIMESTAMP_MILLIS =>
++<<<<<<< HEAD
+ if (rebaseDateTime) {
+ new ParquetPrimitiveConverter(updater) {
+ override def addLong(value: Long): Unit = {
+ val micros = DateTimeUtils.fromMillis(value)
+ val rebased = rebaseJulianToGregorianMicros(micros)
+ updater.setLong(rebased)
+ }
+ }
+ } else {
+ new ParquetPrimitiveConverter(updater) {
+ override def addLong(value: Long): Unit = {
+ updater.setLong(DateTimeUtils.fromMillis(value))
+ }
++=======
+ new ParquetPrimitiveConverter(updater) {
+ override def addLong(value: Long): Unit = {
+ val micros = DateTimeUtils.millisToMicros(value)
+ updater.setLong(timestampRebaseFunc(micros))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
}
}
diff --cc sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
index e367b9cc774,6c333671d59..00000000000
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
@@@ -187,24 -198,15 +198,29 @@@ class ParquetWriteSupport extends Write
buf.order(ByteOrder.LITTLE_ENDIAN).putLong(timeOfDayNanos).putInt(julianDay)
recordConsumer.addBinary(Binary.fromReusedByteArray(timestampBuffer))
- case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MICROS if rebaseDateTime =>
- (row: SpecializedGetters, ordinal: Int) =>
- val rebasedMicros = rebaseGregorianToJulianMicros(row.getLong(ordinal))
- recordConsumer.addLong(rebasedMicros)
-
case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MICROS =>
(row: SpecializedGetters, ordinal: Int) =>
++<<<<<<< HEAD
+ recordConsumer.addLong(row.getLong(ordinal))
+
+ case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS if rebaseDateTime =>
+ (row: SpecializedGetters, ordinal: Int) =>
+ val rebasedMicros = rebaseGregorianToJulianMicros(row.getLong(ordinal))
+ val millis = DateTimeUtils.toMillis(rebasedMicros)
+ recordConsumer.addLong(millis)
+
+ case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS =>
+ (row: SpecializedGetters, ordinal: Int) =>
+ val millis = DateTimeUtils.toMillis(row.getLong(ordinal))
++=======
+ val micros = row.getLong(ordinal)
+ recordConsumer.addLong(timestampRebaseFunc(micros))
+
+ case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS =>
+ (row: SpecializedGetters, ordinal: Int) =>
+ val micros = row.getLong(ordinal)
+ val millis = DateTimeUtils.microsToMillis(timestampRebaseFunc(micros))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
recordConsumer.addLong(millis)
} |
LGTM. Seems like it's mostly just because of DateTimeUtils.toMillis and DateTimeUtils.fromMillis.
Yea, all the conflicts are caused by the different names of these two methods in branch-3.0.
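For context, a rough sketch of what the renamed helpers do (the bodies below are assumed, not copied from DateTimeUtils); the rename appears to be purely mechanical, which is why the conflicts above are trivial:

```scala
object MillisMicrosCompat {
  private val MicrosPerMillis = 1000L

  // master: DateTimeUtils.millisToMicros / branch-3.0: DateTimeUtils.fromMillis
  def millisToMicros(millis: Long): Long = Math.multiplyExact(millis, MicrosPerMillis)

  // master: DateTimeUtils.microsToMillis / branch-3.0: DateTimeUtils.toMillis
  // Floor division keeps pre-epoch values rounding toward negative infinity.
  def microsToMillis(micros: Long): Long = Math.floorDiv(micros, MicrosPerMillis)
}
```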
retest this please
Test build #122616 has finished for PR 28526 at commit
FYI, the dependency failure will be fixed by the following.
retest this please
FYI, the issue seems not completely fixed yet due to the cached
Test build #122646 has finished for PR 28526 at commit
Test build #5006 has finished for PR 28526 at commit
Ya. The failure jobs are on the same broken Jenkins machine.
Test build #5008 has finished for PR 28526 at commit
Test build #5007 has finished for PR 28526 at commit
@cloud-fan . You can see the failures. We need #28536.
LGTM
Test build #5013 has finished for PR 28526 at commit
Test build #5016 has finished for PR 28526 at commit
Test build #5014 has finished for PR 28526 at commit
Test build #5015 has finished for PR 28526 at commit
Test build #5017 has finished for PR 28526 at commit
Test build #5018 has finished for PR 28526 at commit
Test build #5019 has finished for PR 28526 at commit
Test build #5020 has finished for PR 28526 at commit
Test build #5024 has finished for PR 28526 at commit
retest this please
Test build #122666 has finished for PR 28526 at commit
Test build #5025 has finished for PR 28526 at commit
Test build #5022 has finished for PR 28526 at commit
Test build #5023 has finished for PR 28526 at commit
Test build #5021 has finished for PR 28526 at commit
Test build #5031 has started for PR 28526 at commit
Test build #5033 has started for PR 28526 at commit
Test build #5035 has started for PR 28526 at commit
Test build #5037 has started for PR 28526 at commit
Test build #5032 has finished for PR 28526 at commit
Test build #5034 has finished for PR 28526 at commit
Test build #5036 has finished for PR 28526 at commit
retest this please
Test build #122705 has finished for PR 28526 at commit
retest this please
Test build #122724 has finished for PR 28526 at commit
What changes were proposed in this pull request?
It's quite annoying to be blocked by flaky tests in several PRs. This PR disables them. The tests come from 3 PRs I'm recently watching: #28526 #28463 #28517
Why are the changes needed?
To make PR builder more stable
Does this PR introduce any user-facing change?
no
How was this patch tested?
N/A
Closes #28547 from cloud-fan/test.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

What changes were proposed in this pull request?
It's quite annoying to be blocked by flaky tests in several PRs. This PR disables them. The tests come from 3 PRs I'm recently watching: #28526 #28463 #28517
Why are the changes needed?
To make PR builder more stable
Does this PR introduce any user-facing change?
no
How was this patch tested?
N/A
Closes #28547 from cloud-fan/test.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 2012d58)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
…atetime values from/to Parquet/Avro files
What changes were proposed in this pull request?
When reading/writing datetime values that are before the rebase switch day from/to Avro/Parquet files, fail by default and ask users to set a config to explicitly choose whether to rebase or not.
Why are the changes needed?
Rebasing or not has different behaviors, and we should let users decide explicitly. In most cases, users won't hit this exception as it only affects ancient datetime values.
Does this PR introduce any user-facing change?
Yes, now users will see an error when reading/writing dates before 1582-10-15 or timestamps before 1900-01-01 from/to Parquet/Avro files, with an error message asking them to set a config.
How was this patch tested?
updated tests
Closes #28526 from cloud-fan/backport.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
finally, jenkins pass... merging to 3.0!
What changes were proposed in this pull request?
When reading/writing datetime values that are before the rebase switch day from/to Avro/Parquet files, fail by default and ask users to set a config to explicitly choose whether to rebase or not.
Why are the changes needed?
Rebasing or not rebasing has different behaviors, and we should let users decide explicitly. In most cases, users won't hit this exception, as it only affects ancient datetime values.
Does this PR introduce any user-facing change?
Yes, now users will see an error when reading/writing dates before 1582-10-15 or timestamps before 1900-01-01 from/to Parquet/Avro files, with an error message asking them to set a config.
How was this patch tested?
updated tests
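A minimal sketch of the write-side behavior described above (the data, path, and session setup are made up; the config name appears in the diff earlier in this thread):

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("rebase-write-demo").getOrCreate()
import spark.implicits._

// A timestamp before 1900-01-01 is ambiguous between the two calendars.
val ancient = Seq(Timestamp.valueOf("1800-01-01 00:00:00")).toDF("ts")

// With the default EXCEPTION mode, this write fails and the error message
// asks the user to pick a rebase mode.
// ancient.write.parquet("/tmp/ancient-ts")

// CORRECTED writes the Proleptic Gregorian values as-is; LEGACY rebases them
// to the hybrid Julian + Gregorian calendar for old readers.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
ancient.write.mode("overwrite").parquet("/tmp/ancient-ts")
```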