Support reading INT96 timestamps as LongTimestampWithTimeZoneType #22781

Merged
merged 2 commits into from
Jul 25, 2024

Conversation

raunaqmorarka
Member

@raunaqmorarka raunaqmorarka commented Jul 24, 2024

Hive tables migrated to Iceberg through Spark require the ability to read INT96 timestamps as LongTimestampWithTimeZoneType.
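
For readers unfamiliar with INT96 timestamps, here is a minimal sketch of how such a value is commonly laid out and decoded. It is not the code from this PR. It assumes the usual convention of 12 little-endian bytes: 8 bytes of nanoseconds within the day followed by a 4-byte Julian day number. The class name and the `encode`/`decode` helpers are hypothetical and exist only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Int96TimestampSketch
{
    // Julian day number of 1970-01-01
    static final int JULIAN_EPOCH_DAY = 2_440_588;
    static final long SECONDS_PER_DAY = 86_400L;
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Hypothetical helper: packs nanos-of-day and Julian day into 12 little-endian bytes
    static byte[] encode(long nanosOfDay, int julianDay)
    {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }

    // Hypothetical helper: returns {epochSeconds, nanosOfSecond} in UTC
    static long[] decode(byte[] int96)
    {
        ByteBuffer buffer = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buffer.getLong();
        int julianDay = buffer.getInt();
        long epochSeconds = (julianDay - JULIAN_EPOCH_DAY) * SECONDS_PER_DAY + nanosOfDay / NANOS_PER_SECOND;
        long nanosOfSecond = nanosOfDay % NANOS_PER_SECOND;
        return new long[] {epochSeconds, nanosOfSecond};
    }

    public static void main(String[] args)
    {
        // 1970-01-02 01:01:01.123456789 UTC
        byte[] raw = encode(3_661L * NANOS_PER_SECOND + 123_456_789L, JULIAN_EPOCH_DAY + 1);
        long[] decoded = decode(raw);
        System.out.println("epochSeconds=" + decoded[0] + ", nanosOfSecond=" + decoded[1]);
    }
}
```

The decoded pair is an instant in UTC, so mapping it to a timestamp-with-time-zone type needs no local time zone conversion in this sketch.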

Description

Fixes #11338

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Fix failure to read Hive tables migrated to Iceberg through Apache Spark. ({issue}`11338`)

@cla-bot cla-bot bot added the cla-signed label Jul 24, 2024
@github-actions github-actions bot added the hive Hive connector label Jul 24, 2024
Contributor

@wendigo wendigo left a comment

Ship it!

@raunaqmorarka raunaqmorarka added the iceberg Iceberg connector label Jul 24, 2024
@raunaqmorarka raunaqmorarka requested a review from marcinsbd July 24, 2024 10:33
@@ -469,6 +469,32 @@ public void skip(int n)
};
}

public ValueDecoder<int[]> getInt96ToLongTimestampWithTimeZoneDecoder(ParquetEncoding encoding)
Contributor

Please note in the PR description that the following method can be used as a reference while reviewing this PR:

public ValueDecoder<int[]> getInt96ToLongTimestampDecoder(ParquetEncoding encoding, DateTimeZone timeZone)
{
    checkArgument(
            field.getType() instanceof TimestampType timestampType && !timestampType.isShort(),
            "Trino type %s is not a long timestamp",
            field.getType());
    int precision = ((TimestampType) field.getType()).getPrecision();
    return new InlineTransformDecoder<>(
            getInt96TimestampDecoder(encoding),
            (values, offset, length) -> {
                for (int i = offset; i < offset + length; i++) {
                    long epochSeconds = decodeFixed12First(values, i);
                    long nanosOfSecond = decodeFixed12Second(values, i);
                    if (timeZone != DateTimeZone.UTC) {
                        epochSeconds = timeZone.convertUTCToLocal(epochSeconds * MILLISECONDS_PER_SECOND) / MILLISECONDS_PER_SECOND;
                    }
                    if (precision < 9) {
                        nanosOfSecond = (int) round(nanosOfSecond, 9 - precision);
                    }
                    // epochMicros
                    encodeFixed12(
                            epochSeconds * MICROSECONDS_PER_SECOND + (nanosOfSecond / NANOSECONDS_PER_MICROSECOND),
                            (int) ((nanosOfSecond * PICOSECONDS_PER_NANOSECOND) % PICOSECONDS_PER_MICROSECOND),
                            values,
                            i);
                }
            });
}
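
The arithmetic in the final `encodeFixed12` call above can be isolated as a small standalone sketch. It splits seconds plus nanos-of-second into epoch microseconds and the leftover picoseconds within that microsecond. The class and method names here are hypothetical, and the constants are inlined rather than taken from Trino.

```java
final class LongTimestampSplitSketch
{
    static final long MICROSECONDS_PER_SECOND = 1_000_000L;
    static final long NANOSECONDS_PER_MICROSECOND = 1_000L;
    static final long PICOSECONDS_PER_NANOSECOND = 1_000L;
    static final long PICOSECONDS_PER_MICROSECOND = 1_000_000L;

    // Whole microseconds since the epoch; sub-microsecond nanos are dropped here
    static long epochMicros(long epochSeconds, long nanosOfSecond)
    {
        return epochSeconds * MICROSECONDS_PER_SECOND + nanosOfSecond / NANOSECONDS_PER_MICROSECOND;
    }

    // The sub-microsecond remainder, expressed in picoseconds (0 to 999_999)
    static int picosOfMicro(long nanosOfSecond)
    {
        return (int) ((nanosOfSecond * PICOSECONDS_PER_NANOSECOND) % PICOSECONDS_PER_MICROSECOND);
    }

    public static void main(String[] args)
    {
        long epochSeconds = 1L;
        long nanosOfSecond = 123_456_789L;
        System.out.println("epochMicros=" + epochMicros(epochSeconds, nanosOfSecond));
        System.out.println("picosOfMicro=" + picosOfMicro(nanosOfSecond));
    }
}
```

With 1 second and 123456789 nanos, the 789 nanoseconds that do not fit in a whole microsecond are carried as 789000 picoseconds, so no precision is lost in the split.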

Comment on lines +640 to +642
if (precision < 9) {
    epochNanos = round(epochNanos, 9 - precision);
}
Contributor

Is this code section tested anywhere?

For example, with .123456789 in storage but MICROSECONDS or MILLISECONDS as the precision being read?

Member Author

Yes, this is exercised when TIMESTAMP_TZ_MICROS is tested: io.trino.parquet.reader.TestingColumnReader#WRITE_INT96 populates nanos, which results in rounding to the lower precision.
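
To make the scenario from the question concrete, here is a minimal sketch of rounding a nanosecond value down to a lower precision. It is not the Trino `round` implementation. `roundToPrecision` is a hypothetical helper, and half-up rounding is an assumption made for this example.

```java
final class PrecisionRoundingSketch
{
    // Hypothetical helper: rounds nanos to the given number of fractional-second digits (0 to 9),
    // using half-up rounding, and returns the result still expressed in nanoseconds
    static long roundToPrecision(long nanos, int precision)
    {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("precision must be between 0 and 9");
        }
        long factor = 1;
        for (int i = 0; i < 9 - precision; i++) {
            factor *= 10;
        }
        // floorDiv keeps the rounding direction consistent for negative values
        return Math.floorDiv(nanos + factor / 2, factor) * factor;
    }

    public static void main(String[] args)
    {
        long stored = 123_456_789L; // .123456789 in storage
        System.out.println("micros precision: " + roundToPrecision(stored, 6));
        System.out.println("millis precision: " + roundToPrecision(stored, 3));
    }
}
```

Under these assumptions, .123456789 read at microsecond precision becomes .123457 and at millisecond precision becomes .123.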

…Type

Hive tables migrated to iceberg through Spark require the ability to
read INT96 timestamps as LongTimestampWithTimeZoneType
Contributor

@wendigo wendigo left a comment

:shipit:

@wendigo wendigo merged commit 849d995 into trinodb:master Jul 25, 2024
59 checks passed
@github-actions github-actions bot added this to the 453 milestone Jul 25, 2024
@raunaqmorarka raunaqmorarka deleted the int96-ice branch July 25, 2024 15:18
Labels
cla-signed hive Hive connector iceberg Iceberg connector
6 participants