
[SPARK-31405][SQL][3.0] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files #28526

Closed
wants to merge 1 commit

Conversation

cloud-fan
Contributor

What changes were proposed in this pull request?

When reading/writing datetime values that are before the rebase switch day from/to Avro/Parquet files, fail by default and ask users to set a config to explicitly choose whether or not to rebase.

Why are the changes needed?

Rebasing and not rebasing produce different results, and users should make that choice explicitly. In most cases users won't hit this exception, as it only affects ancient datetime values.
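
For context, a minimal, self-contained sketch of that behavior difference, pasteable into a Scala REPL. It uses plain `java.util.GregorianCalendar` instead of Spark's `RebaseDateTime` (an assumption, for illustration only): the same local date denotes instants several days apart in the two calendars, which is exactly the ambiguity that rebasing, or the new exception, is about.

```scala
import java.util.{Calendar, GregorianCalendar, TimeZone, Date => JDate}

val utc = TimeZone.getTimeZone("UTC")

// Hybrid Julian + Gregorian calendar (the legacy behavior): Julian rules before 1582-10-15.
val hybrid = new GregorianCalendar(utc)

// Proleptic Gregorian calendar (the Spark 3.0 behavior): pushing the changeover
// to the distant past applies Gregorian rules to all dates.
val proleptic = new GregorianCalendar(utc)
proleptic.setGregorianChange(new JDate(Long.MinValue))

hybrid.clear();    hybrid.set(1000, Calendar.JANUARY, 1)
proleptic.clear(); proleptic.set(1000, Calendar.JANUARY, 1)

// "1000-01-01" maps to instants about 5 days apart in the two calendars.
val diffDays = (hybrid.getTimeInMillis - proleptic.getTimeInMillis) / 86400000L
println(s"1000-01-01: hybrid minus proleptic = $diffDays days")
```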

Does this PR introduce any user-facing change?

Yes, now users will see an error when reading/writing dates before 1582-10-15 or timestamps before 1900-01-01 from/to Parquet/Avro files, with an error message asking them to set a config.
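
As an illustration, a hypothetical spark-shell session; the config names and the LEGACY/CORRECTED/EXCEPTION values come from the SQLConf changes in this PR, while the path and data are made up:

```scala
import java.sql.Date
import spark.implicits._

val df = Seq(Date.valueOf("1000-01-01")).toDF("d")

// Under the default EXCEPTION mode, writing an ancient date fails with an
// error that names the config below.
// df.write.parquet("/tmp/ancient")

// Opt in explicitly to one of the two behaviors:
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "LEGACY")    // rebase to the hybrid calendar
// spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED") // write as-is
df.write.parquet("/tmp/ancient")

// Reads are controlled analogously by spark.sql.legacy.parquet.datetimeRebaseModeInRead,
// with spark.sql.legacy.avro.* twins for Avro.
```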

How was this patch tested?

updated tests

…me values from/to Parquet/Avro files


Closes apache#28477 from cloud-fan/rebase.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
@cloud-fan
Contributor Author

cc @HyukjinKwon @MaxGekk

@SparkQA

SparkQA commented May 14, 2020

Test build #122607 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

Member

@MaxGekk MaxGekk left a comment


In which places does the original PR conflict with 3.0?

@HyukjinKwon
Member

Many ..

diff --cc external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
index 27206edb287,1d18594fd34..00000000000
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
@@@ -110,22 -116,13 +116,29 @@@ class AvroDeserializer
        case (LONG, TimestampType) => avroType.getLogicalType match {
          // For backward compatibility, if the Avro type is Long and it is not logical type
          // (the `null` case), the value is processed as timestamp type with millisecond precision.
++<<<<<<< HEAD
 +        case null | _: TimestampMillis if rebaseDateTime => (updater, ordinal, value) =>
 +          val millis = value.asInstanceOf[Long]
 +          val micros = DateTimeUtils.fromMillis(millis)
 +          val rebasedMicros = rebaseJulianToGregorianMicros(micros)
 +          updater.setLong(ordinal, rebasedMicros)
 +        case null | _: TimestampMillis => (updater, ordinal, value) =>
 +          val millis = value.asInstanceOf[Long]
 +          val micros = DateTimeUtils.fromMillis(millis)
 +          updater.setLong(ordinal, micros)
 +        case _: TimestampMicros if rebaseDateTime => (updater, ordinal, value) =>
 +          val micros = value.asInstanceOf[Long]
 +          val rebasedMicros = rebaseJulianToGregorianMicros(micros)
 +          updater.setLong(ordinal, rebasedMicros)
++=======
+         case null | _: TimestampMillis => (updater, ordinal, value) =>
+           val millis = value.asInstanceOf[Long]
+           val micros = DateTimeUtils.millisToMicros(millis)
+           updater.setLong(ordinal, timestampRebaseFunc(micros))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
          case _: TimestampMicros => (updater, ordinal, value) =>
            val micros = value.asInstanceOf[Long]
-           updater.setLong(ordinal, micros)
+           updater.setLong(ordinal, timestampRebaseFunc(micros))
          case other => throw new IncompatibleSchemaException(
            s"Cannot convert Avro logical type ${other} to Catalyst Timestamp type.")
        }
diff --cc external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
index dc232168fd2,21c5dec6239..00000000000
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala
@@@ -155,15 -160,10 +160,22 @@@ class AvroSerializer
        case (TimestampType, LONG) => avroType.getLogicalType match {
            // For backward compatibility, if the Avro type is Long and it is not logical type
            // (the `null` case), output the timestamp value as with millisecond precision.
++<<<<<<< HEAD
 +          case null | _: TimestampMillis if rebaseDateTime => (getter, ordinal) =>
 +            val micros = getter.getLong(ordinal)
 +            val rebasedMicros = rebaseGregorianToJulianMicros(micros)
 +            DateTimeUtils.toMillis(rebasedMicros)
 +          case null | _: TimestampMillis => (getter, ordinal) =>
 +            DateTimeUtils.toMillis(getter.getLong(ordinal))
 +          case _: TimestampMicros if rebaseDateTime => (getter, ordinal) =>
 +            rebaseGregorianToJulianMicros(getter.getLong(ordinal))
 +          case _: TimestampMicros => (getter, ordinal) => getter.getLong(ordinal)
++=======
+           case null | _: TimestampMillis => (getter, ordinal) =>
+             DateTimeUtils.microsToMillis(timestampRebaseFunc(getter.getLong(ordinal)))
+           case _: TimestampMicros => (getter, ordinal) =>
+             timestampRebaseFunc(getter.getLong(ordinal))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
            case other => throw new IncompatibleSchemaException(
              s"Cannot convert Catalyst Timestamp type to Avro logical type ${other}")
          }
diff --cc sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 6c18280ce4d,aeaf884c7d1..00000000000
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@@ -2509,57 -2520,71 +2509,83 @@@ object SQLConf
      .booleanConf
      .createWithDefault(false)

++<<<<<<< HEAD
 +  val LEGACY_PARQUET_REBASE_DATETIME_IN_WRITE =
 +    buildConf("spark.sql.legacy.parquet.rebaseDateTimeInWrite.enabled")
 +      .internal()
 +      .doc("When true, rebase dates/timestamps from Proleptic Gregorian calendar " +
 +        "to the hybrid calendar (Julian + Gregorian) in write. " +
 +        "The rebasing is performed by converting micros/millis/days to " +
 +        "a local date/timestamp in the source calendar, interpreting the resulted date/" +
 +        "timestamp in the target calendar, and getting the number of micros/millis/days " +
 +        "since the epoch 1970-01-01 00:00:00Z.")
-       .version("3.0.0")
-       .booleanConf
-       .createWithDefault(false)
-
-   val LEGACY_PARQUET_REBASE_DATETIME_IN_READ =
-     buildConf("spark.sql.legacy.parquet.rebaseDateTimeInRead.enabled")
++=======
+    val LEGACY_INTEGER_GROUPING_ID =
+     buildConf("spark.sql.legacy.integerGroupingId")
        .internal()
-       .doc("When true, rebase dates/timestamps " +
-         "from the hybrid calendar to Proleptic Gregorian calendar in read. " +
-         "The rebasing is performed by converting micros/millis/days to " +
-         "a local date/timestamp in the source calendar, interpreting the resulted date/" +
-         "timestamp in the target calendar, and getting the number of micros/millis/days " +
-         "since the epoch 1970-01-01 00:00:00Z.")
-       .version("3.0.0")
+       .doc("When true, grouping_id() returns int values instead of long values.")
+       .version("3.1.0")
        .booleanConf
        .createWithDefault(false)

-   val LEGACY_AVRO_REBASE_DATETIME_IN_WRITE =
-     buildConf("spark.sql.legacy.avro.rebaseDateTimeInWrite.enabled")
+   val LEGACY_PARQUET_REBASE_MODE_IN_WRITE =
+     buildConf("spark.sql.legacy.parquet.datetimeRebaseModeInWrite")
        .internal()
-       .doc("When true, rebase dates/timestamps from Proleptic Gregorian calendar " +
-         "to the hybrid calendar (Julian + Gregorian) in write. " +
-         "The rebasing is performed by converting micros/millis/days to " +
-         "a local date/timestamp in the source calendar, interpreting the resulted date/" +
-         "timestamp in the target calendar, and getting the number of micros/millis/days " +
-         "since the epoch 1970-01-01 00:00:00Z.")
+       .doc("When LEGACY, Spark will rebase dates/timestamps from Proleptic Gregorian calendar " +
+         "to the legacy hybrid (Julian + Gregorian) calendar when writing Parquet files. " +
+         "When CORRECTED, Spark will not do rebase and write the dates/timestamps as it is. " +
+         "When EXCEPTION, which is the default, Spark will fail the writing if it sees " +
+         "ancient dates/timestamps that are ambiguous between the two calendars.")
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
        .version("3.0.0")
-       .booleanConf
-       .createWithDefault(false)
+       .stringConf
+       .transform(_.toUpperCase(Locale.ROOT))
+       .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+       .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)
+
+   val LEGACY_PARQUET_REBASE_MODE_IN_READ =
+     buildConf("spark.sql.legacy.parquet.datetimeRebaseModeInRead")
+       .internal()
+       .doc("When LEGACY, Spark will rebase dates/timestamps from the legacy hybrid (Julian + " +
+         "Gregorian) calendar to Proleptic Gregorian calendar when reading Parquet files. " +
+         "When CORRECTED, Spark will not do rebase and read the dates/timestamps as it is. " +
+         "When EXCEPTION, which is the default, Spark will fail the reading if it sees " +
+         "ancient dates/timestamps that are ambiguous between the two calendars. This config is " +
+         "only effective if the writer info (like Spark, Hive) of the Parquet files is unknown.")
+       .version("3.0.0")
+       .stringConf
+       .transform(_.toUpperCase(Locale.ROOT))
+       .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+       .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)

-   val LEGACY_AVRO_REBASE_DATETIME_IN_READ =
-     buildConf("spark.sql.legacy.avro.rebaseDateTimeInRead.enabled")
+   val LEGACY_AVRO_REBASE_MODE_IN_WRITE =
+     buildConf("spark.sql.legacy.avro.datetimeRebaseModeInWrite")
        .internal()
-       .doc("When true, rebase dates/timestamps " +
-         "from the hybrid calendar to Proleptic Gregorian calendar in read. " +
-         "The rebasing is performed by converting micros/millis/days to " +
-         "a local date/timestamp in the source calendar, interpreting the resulted date/" +
-         "timestamp in the target calendar, and getting the number of micros/millis/days " +
-         "since the epoch 1970-01-01 00:00:00Z.")
+       .doc("When LEGACY, Spark will rebase dates/timestamps from Proleptic Gregorian calendar " +
+         "to the legacy hybrid (Julian + Gregorian) calendar when writing Avro files. " +
+         "When CORRECTED, Spark will not do rebase and write the dates/timestamps as it is. " +
+         "When EXCEPTION, which is the default, Spark will fail the writing if it sees " +
+         "ancient dates/timestamps that are ambiguous between the two calendars.")
        .version("3.0.0")
-       .booleanConf
-       .createWithDefault(false)
+       .stringConf
+       .transform(_.toUpperCase(Locale.ROOT))
+       .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+       .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)
+
+   val LEGACY_AVRO_REBASE_MODE_IN_READ =
+     buildConf("spark.sql.legacy.avro.datetimeRebaseModeInRead")
+       .internal()
+       .doc("When LEGACY, Spark will rebase dates/timestamps from the legacy hybrid (Julian + " +
+         "Gregorian) calendar to Proleptic Gregorian calendar when reading Avro files. " +
+         "When CORRECTED, Spark will not do rebase and read the dates/timestamps as it is. " +
+         "When EXCEPTION, which is the default, Spark will fail the reading if it sees " +
+         "ancient dates/timestamps that are ambiguous between the two calendars. This config is " +
+         "only effective if the writer info (like Spark, Hive) of the Avro files is unknown.")
+       .version("3.0.0")
+       .stringConf
+       .transform(_.toUpperCase(Locale.ROOT))
+       .checkValues(LegacyBehaviorPolicy.values.map(_.toString))
+       .createWithDefault(LegacyBehaviorPolicy.EXCEPTION.toString)

    /**
     * Holds information about keys that have been deprecated.
@@@ -3139,9 -3166,7 +3165,13 @@@ class SQLConf extends Serializable wit

    def csvFilterPushDown: Boolean = getConf(CSV_FILTER_PUSHDOWN_ENABLED)

++<<<<<<< HEAD
 +  def parquetRebaseDateTimeInReadEnabled: Boolean = {
 +    getConf(SQLConf.LEGACY_PARQUET_REBASE_DATETIME_IN_READ)
 +  }
++=======
+   def integerGroupingIdEnabled: Boolean = getConf(SQLConf.LEGACY_INTEGER_GROUPING_ID)
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files

    /** ********************** SQLConf functionality methods ************ */

diff --cc sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
index 11ce11dd721,3e409ab9a50..00000000000
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java
@@@ -324,20 -352,20 +352,32 @@@ public class VectorizedColumnReader
              }
            }
          } else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
-           if (rebaseDateTime) {
+           if ("CORRECTED".equals(datetimeRebaseMode)) {
              for (int i = rowId; i < rowId + num; ++i) {
                if (!column.isNullAt(i)) {
++<<<<<<< HEAD
 +                long julianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
 +                long julianMicros = DateTimeUtils.fromMillis(julianMillis);
 +                long gregorianMicros = RebaseDateTime.rebaseJulianToGregorianMicros(julianMicros);
 +                column.putLong(i, gregorianMicros);
++=======
+                 long gregorianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+                 column.putLong(i, DateTimeUtils.millisToMicros(gregorianMillis));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
                }
              }
            } else {
+             final boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
              for (int i = rowId; i < rowId + num; ++i) {
                if (!column.isNullAt(i)) {
++<<<<<<< HEAD
 +                long gregorianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
 +                column.putLong(i, DateTimeUtils.fromMillis(gregorianMillis));
++=======
+                 long julianMillis = dictionary.decodeToLong(dictionaryIds.getDictId(i));
+                 long julianMicros = DateTimeUtils.millisToMicros(julianMillis);
+                 column.putLong(i, rebaseMicros(julianMicros, failIfRebase));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
                }
              }
            }
@@@ -485,27 -514,29 +526,38 @@@
        defColumn.readLongs(
          num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
      } else if (originalType == OriginalType.TIMESTAMP_MICROS) {
-       if (rebaseDateTime) {
-         defColumn.readLongsWithRebase(
-           num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
-       } else {
+       if ("CORRECTED".equals(datetimeRebaseMode)) {
          defColumn.readLongs(
            num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn);
+       } else {
+         boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
+         defColumn.readLongsWithRebase(
+           num, column, rowId, maxDefLevel, (VectorizedValuesReader) dataColumn, failIfRebase);
        }
      } else if (originalType == OriginalType.TIMESTAMP_MILLIS) {
-       if (rebaseDateTime) {
+       if ("CORRECTED".equals(datetimeRebaseMode)) {
          for (int i = 0; i < num; i++) {
            if (defColumn.readInteger() == maxDefLevel) {
++<<<<<<< HEAD
 +            long micros = DateTimeUtils.fromMillis(dataColumn.readLong());
 +            column.putLong(rowId + i, RebaseDateTime.rebaseJulianToGregorianMicros(micros));
++=======
+             column.putLong(rowId + i, DateTimeUtils.millisToMicros(dataColumn.readLong()));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
            } else {
              column.putNull(rowId + i);
            }
          }
        } else {
+         final boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
          for (int i = 0; i < num; i++) {
            if (defColumn.readInteger() == maxDefLevel) {
++<<<<<<< HEAD
 +            column.putLong(rowId + i, DateTimeUtils.fromMillis(dataColumn.readLong()));
++=======
+             long julianMicros = DateTimeUtils.millisToMicros(dataColumn.readLong());
+             column.putLong(rowId + i, rebaseMicros(julianMicros, failIfRebase));
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
            } else {
              column.putNull(rowId + i);
            }
diff --cc sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
index 08fbca2995c,201ee16faeb..00000000000
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
@@@ -291,19 -289,10 +289,26 @@@ private[parquet] class ParquetRowConver
          }

        case TimestampType if parquetType.getOriginalType == OriginalType.TIMESTAMP_MILLIS =>
++<<<<<<< HEAD
 +        if (rebaseDateTime) {
 +          new ParquetPrimitiveConverter(updater) {
 +            override def addLong(value: Long): Unit = {
 +              val micros = DateTimeUtils.fromMillis(value)
 +              val rebased = rebaseJulianToGregorianMicros(micros)
 +              updater.setLong(rebased)
 +            }
 +          }
 +        } else {
 +          new ParquetPrimitiveConverter(updater) {
 +            override def addLong(value: Long): Unit = {
 +              updater.setLong(DateTimeUtils.fromMillis(value))
 +            }
++=======
+         new ParquetPrimitiveConverter(updater) {
+           override def addLong(value: Long): Unit = {
+             val micros = DateTimeUtils.millisToMicros(value)
+             updater.setLong(timestampRebaseFunc(micros))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
            }
          }

diff --cc sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
index e367b9cc774,6c333671d59..00000000000
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala
@@@ -187,24 -198,15 +198,29 @@@ class ParquetWriteSupport extends Write
                buf.order(ByteOrder.LITTLE_ENDIAN).putLong(timeOfDayNanos).putInt(julianDay)
                recordConsumer.addBinary(Binary.fromReusedByteArray(timestampBuffer))

-           case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MICROS if rebaseDateTime =>
-             (row: SpecializedGetters, ordinal: Int) =>
-               val rebasedMicros = rebaseGregorianToJulianMicros(row.getLong(ordinal))
-               recordConsumer.addLong(rebasedMicros)
-
            case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MICROS =>
              (row: SpecializedGetters, ordinal: Int) =>
++<<<<<<< HEAD
 +              recordConsumer.addLong(row.getLong(ordinal))
 +
 +          case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS if rebaseDateTime =>
 +            (row: SpecializedGetters, ordinal: Int) =>
 +              val rebasedMicros = rebaseGregorianToJulianMicros(row.getLong(ordinal))
 +              val millis = DateTimeUtils.toMillis(rebasedMicros)
 +              recordConsumer.addLong(millis)
 +
 +          case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS =>
 +            (row: SpecializedGetters, ordinal: Int) =>
 +              val millis = DateTimeUtils.toMillis(row.getLong(ordinal))
++=======
+               val micros = row.getLong(ordinal)
+               recordConsumer.addLong(timestampRebaseFunc(micros))
+
+           case SQLConf.ParquetOutputTimestampType.TIMESTAMP_MILLIS =>
+             (row: SpecializedGetters, ordinal: Int) =>
+               val micros = row.getLong(ordinal)
+               val millis = DateTimeUtils.microsToMillis(timestampRebaseFunc(micros))
++>>>>>>> fd2d55c9919... [SPARK-31405][SQL] Fail by default when reading/writing legacy datetime values from/to Parquet/Avro files
                recordConsumer.addLong(millis)
          }

Member

@HyukjinKwon HyukjinKwon left a comment


LGTM. Seems like it's mostly just because of DateTimeUtils.toMillis and DateTimeUtils.fromMillis.

@cloud-fan
Contributor Author

Yea, all the conflicts are caused by the different names of DateTimeUtils.toMillis/fromMillis in 3.0.
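
For reference, a sketch of that rename. The implementations below are assumed from the conversion semantics visible in the diff and written as standalone functions rather than Spark's actual DateTimeUtils:

```scala
// branch-3.0 names (HEAD side of the hunks):   DateTimeUtils.fromMillis / DateTimeUtils.toMillis
// master names (incoming side of the hunks):   DateTimeUtils.millisToMicros / DateTimeUtils.microsToMillis
// Both pairs perform the same conversion between epoch milliseconds and microseconds.

def millisToMicros(millis: Long): Long = Math.multiplyExact(millis, 1000L)

// Floor division so that pre-epoch (negative) timestamps round toward negative infinity.
def microsToMillis(micros: Long): Long = Math.floorDiv(micros, 1000L)

assert(millisToMicros(-1L) == -1000L)
assert(microsToMillis(-1L) == -1L) // -1 micros floors to -1 millis, not 0
```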

@cloud-fan
Contributor Author

retest this please

@SparkQA

SparkQA commented May 14, 2020

Test build #122616 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@dongjoon-hyun
Member

FYI, the dependency failure will be fixed by the following.

@cloud-fan
Contributor Author

retest this please

@HyukjinKwon
Member

HyukjinKwon commented May 15, 2020

@SparkQA

SparkQA commented May 15, 2020

Test build #122646 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5006 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@dongjoon-hyun
Member

dongjoon-hyun commented May 15, 2020

Ya. The failing jobs are on the same broken Jenkins machine, amp-jenkins-worker-05, while the running jobs are on amp-jenkins-worker-03 and amp-jenkins-worker-04.

@SparkQA

SparkQA commented May 15, 2020

Test build #5008 has finished for PR 28526 at commit 9f14144.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5007 has finished for PR 28526 at commit 9f14144.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@dongjoon-hyun
Member

dongjoon-hyun commented May 15, 2020

@cloud-fan, you can see HiveExternalCatalogVersionsSuite failing consistently in the two runs above.
cc @viirya.

We need #28536.

Member

@MaxGekk MaxGekk left a comment


LGTM

@SparkQA

SparkQA commented May 15, 2020

Test build #5013 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5016 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5014 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5015 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5017 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5018 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5019 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5020 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5024 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented May 15, 2020

Test build #122666 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5025 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5022 has finished for PR 28526 at commit 9f14144.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5023 has finished for PR 28526 at commit 9f14144.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5021 has finished for PR 28526 at commit 9f14144.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5031 has started for PR 28526 at commit 9f14144.

@SparkQA

SparkQA commented May 15, 2020

Test build #5033 has started for PR 28526 at commit 9f14144.

@SparkQA

SparkQA commented May 15, 2020

Test build #5035 has started for PR 28526 at commit 9f14144.

@SparkQA

SparkQA commented May 15, 2020

Test build #5037 has started for PR 28526 at commit 9f14144.

@SparkQA

SparkQA commented May 15, 2020

Test build #5032 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5034 has finished for PR 28526 at commit 9f14144.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@SparkQA

SparkQA commented May 15, 2020

Test build #5036 has finished for PR 28526 at commit 9f14144.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented May 16, 2020

Test build #122705 has finished for PR 28526 at commit 9f14144.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented May 16, 2020

Test build #122724 has finished for PR 28526 at commit 9f14144.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class AvroDeserializer(

dongjoon-hyun pushed a commit that referenced this pull request May 16, 2020
### What changes were proposed in this pull request?

It's quite annoying to be blocked by flaky tests in several PRs, so this PR disables them. The tests come from three PRs I've been watching recently:
#28526
#28463
#28517

### Why are the changes needed?

To make PR builder more stable

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

N/A

Closes #28547 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun pushed a commit that referenced this pull request May 16, 2020
(cherry picked from commit 2012d58)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
cloud-fan added a commit that referenced this pull request May 17, 2020
…atetime values from/to Parquet/Avro files


Closes #28526 from cloud-fan/backport.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Contributor Author

finally, jenkins pass... merging to 3.0!

@cloud-fan cloud-fan closed this May 17, 2020