Convert TIMESTAMP_WITH_TIME_ZONE to primitive type #7480

Closed
wants to merge 1 commit into from

Contributor

@wypb wypb commented Nov 9, 2023

TimestampWithTimeZone is a primitive type in Presto. In contrast,
it is implemented as a Row<int64_t, int16_t> in Velox and is hence
implicitly treated as a complex type. This PR converts TIMESTAMP_WITH_TIME_ZONE
to a primitive type to achieve better consistency between Velox and Presto.
Details of the discussion can be found here.

CC: @mbasmanova @aditi-pandit @majetideepak @kagamiori

@wypb wypb marked this pull request as draft November 9, 2023 03:44

netlify bot commented Nov 9, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 5caa663
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/65d7ec3ff085540008274b20

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 9, 2023
@wypb wypb force-pushed the timestamp_with_time_zone branch 4 times, most recently from 2c08673 to 7d2c503 Compare November 9, 2023 05:36
@wypb wypb marked this pull request as ready for review November 9, 2023 06:45
Collaborator

majetideepak commented Nov 9, 2023

This change will also break the Presto TimestampTz semantics as mentioned here #2511 (comment)
But this change helps optimize TimestampTz type from Hive file formats. Let's make that clear in the description.

Contributor Author

wypb commented Nov 10, 2023

This change will also break the Presto TimestampTz semantics as mentioned here #2511 (comment)

I don't quite understand this. This PR represents TimestampTz as a bigint that contains both the timestamp and the time zone, which I think is consistent with this implementation in Presto.
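
For readers following along, the packed bigint layout described here can be sketched as follows. This is a minimal illustration and not the PR's code: the 12-bit mask and shift match the constants quoted in the review comments of this thread, unpackMillisUtc and unpackZoneKeyId are names that appear in the diff, and packTimestampWithTimeZone is a hypothetical helper name introduced only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Constants as quoted in the review thread.
constexpr int32_t kTimezoneMask = 0xFFF;
constexpr int32_t kMillisShift = 12;

// Hypothetical helper (name is an assumption): stores UTC milliseconds in the
// upper 52 bits and the time zone key in the lower 12 bits.
constexpr int64_t packTimestampWithTimeZone(int64_t millisUtc, int16_t zoneKey) {
  // Multiplying avoids left-shifting a negative value, which is undefined
  // before C++20. Very large millis values would still overflow.
  return millisUtc * (int64_t{1} << kMillisShift) + (zoneKey & kTimezoneMask);
}

// Recovers the UTC milliseconds. Right-shifting a negative value is
// implementation-defined before C++20; mainstream compilers shift
// arithmetically, which is what this sketch relies on.
constexpr int64_t unpackMillisUtc(int64_t packed) {
  return packed >> kMillisShift;
}

// Recovers the time zone key from the lower 12 bits.
constexpr int16_t unpackZoneKeyId(int64_t packed) {
  return static_cast<int16_t>(packed & kTimezoneMask);
}

// Round trips, including a timestamp before the epoch.
static_assert(unpackMillisUtc(packTimestampWithTimeZone(1000, 5)) == 1000);
static_assert(unpackZoneKeyId(packTimestampWithTimeZone(1000, 5)) == 5);
static_assert(unpackMillisUtc(packTimestampWithTimeZone(-1, 7)) == -1);
static_assert(unpackZoneKeyId(packTimestampWithTimeZone(-1, 7)) == 7);
```

Because both fields live in a single int64_t, the value can be stored and moved around like any other BIGINT, which is the point of the conversion.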

@majetideepak
Collaborator

@wypb you are right. I misunderstood Java's TimestampTz specification.
After reading some of the Java code, the semantics do match. Thanks!

@wypb wypb force-pushed the timestamp_with_time_zone branch from 10d44a7 to 17ad5d7 Compare November 16, 2023 02:45
Contributor Author

wypb commented Nov 16, 2023

Hi @kagamiori @aditi-pandit @mbasmanova, could you also help review this? Thank you.

Contributor Author

wypb commented Nov 22, 2023

Hi @kagamiori, could you also help review this? Thank you.

Contributor Author

wypb commented Nov 28, 2023

Hi @kagamiori @aditi-pandit @mbasmanova Any comments on this PR?

Contributor Author

wypb commented Dec 12, 2023

Hi @mbasmanova @kagamiori @aditi-pandit, could you help review this? Thank you.

@mbasmanova
Contributor

@aditi-pandit Aditi, would you help review this PR?

Collaborator

@aditi-pandit aditi-pandit left a comment

Thanks @wypb. Did a quick round of review.

return;
}

auto timestamp = this->toTimestamp(timestampWithTimezone);
auto dateTime = getDateTime(timestamp, nullptr);
adjustDateTime(dateTime, unit);
auto tzID = unpackZoneKeyId(timestampWithTimezone);
Collaborator

Nit: Can this local variable be inlined in line 870?

@@ -125,4 +124,18 @@ class TimestampWithTimeZoneTypeFactories : public CustomTypeFactories {

void registerTimestampWithTimeZoneType();

static constexpr int32_t TIME_ZONE_MASK = 0xFFF;
Collaborator

The Velox convention is "Use kPascalCase for static constants and enumerators", so kTimezoneMask and kMillisShift are better.

const auto milliseconds = *timestampWithTimezone.template at<0>();
auto milliseconds = unpackMillisUtc(timestampWithTimezone);
auto tzID = unpackZoneKeyId(timestampWithTimezone);

Timestamp timestamp = Timestamp::fromMillis(milliseconds);
Collaborator

No need for the local variables. They are used only once and can be inlined.

auto timezone =
util::getTimeZoneName(*timestampWithTimezone.template at<1>());
auto tzID = unpackZoneKeyId(timestampWithTimezone);
auto timezone = util::getTimeZoneName(tzID);
Collaborator

Inline the tzID variable.

result.template get_writer_at<0>() = utcTimestamp.getSeconds() * 1000;
result.template get_writer_at<1>() =
*timestampWithTimezone.template at<1>();
auto milliseconds = unpackMillisUtc(timestampWithTimezone);
Collaborator

Nit: Inline local variables used only once.

auto timestamps = BaseVector::create(BIGINT(), size, pool);
auto* rawTimestamps =
timestamps->asFlatVector<int64_t>()->mutableRawValues();
auto timestamps = AlignedBuffer::allocate<int64_t>(size, pool, false);
Collaborator

Nice refactor.

flatResult.setNull(row, true);
} else {
Timestamp ts = Timestamp::fromMillis(timestampVector->valueAt(row));
auto timestampWithTimezone =
input.as<SimpleVector<int64_t>>()->valueAt(row);
Collaborator

input.as<SimpleVector<int64_t>>() can be hoisted into a local variable outside of applyToSelectedNoThrow to avoid casting per row.
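
The suggestion is the usual "cast once, then loop" pattern. A self-contained sketch of the idea, using stand-in types rather than Velox's real classes (BaseVectorLike, Int64VectorLike, and sumSelectedRows are hypothetical names invented for this illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-ins for a type-erased vector and its typed view (not Velox types).
struct BaseVectorLike {
  virtual ~BaseVectorLike() = default;
};

struct Int64VectorLike : BaseVectorLike {
  std::vector<int64_t> values;
  int64_t valueAt(std::size_t row) const {
    return values[row];
  }
};

// Sums the selected rows. The downcast happens once, before the loop,
// instead of once per row inside it.
inline int64_t sumSelectedRows(
    const BaseVectorLike& input,
    const std::vector<std::size_t>& rows) {
  const auto* typed = dynamic_cast<const Int64VectorLike*>(&input);
  if (typed == nullptr) {
    throw std::invalid_argument("expected an int64 vector");
  }
  int64_t sum = 0;
  for (auto row : rows) {
    sum += typed->valueAt(row);
  }
  return sum;
}
```

The per-row body then touches only the already-typed pointer, which keeps the hot loop free of repeated casts.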

applyHashFunction(rows, *vector_.get(), hashes, [&](auto row) {
auto timestampWithTimeZone = vector_->valueAt<int64_t>(row);
auto timestamp = unpackMillisUtc(timestampWithTimeZone);
return hashInteger(timestamp);
Collaborator

timestampWithTimeZone and timestamp can both be inlined.
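
With both temporaries inlined, the lambda body reduces to a single expression. A hedged sketch of what that looks like in isolation: unpackMillisUtc and hashInteger are names taken from the diff, but their bodies here are assumptions, and std::hash is only a stand-in for the hasher's real integer hash.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

constexpr int32_t kMillisShift = 12;

// Name taken from the diff; the body is an assumption for illustration.
constexpr int64_t unpackMillisUtc(int64_t packed) {
  return packed >> kMillisShift;
}

// Stand-in integer hash, not the Presto-compatible one used by PrestoHasher.
inline uint64_t hashInteger(int64_t value) {
  return std::hash<int64_t>{}(value);
}

// Hashes only the timestamp part, with no intermediate locals.
inline uint64_t hashTimestampWithTimeZone(int64_t packed) {
  return hashInteger(unpackMillisUtc(packed));
}
```

Since only the millisecond part feeds the hash, two packed values that share an instant but carry different zone keys hash to the same value, which matches the "Hash only timestamp value" comment in the diff.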

@wypb wypb force-pushed the timestamp_with_time_zone branch 2 times, most recently from 397af42 to 3eb9510 Compare December 14, 2023 03:23
Contributor Author

wypb commented Dec 14, 2023

Hi @aditi-pandit Thank you for your review, I have modified the corresponding code according to the review comments. If you have time, please help to review it again. Thank you.

@wypb wypb force-pushed the timestamp_with_time_zone branch 5 times, most recently from e3796dd to c1f7dff Compare December 14, 2023 11:58
Contributor Author

wypb commented Feb 21, 2024

Hi @kagamiori wei, Thank you for your reply. I'll do it today.

@wypb wypb force-pushed the timestamp_with_time_zone branch 2 times, most recently from 048e21a to 1533794 Compare February 21, 2024 02:35
@facebook-github-bot
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@wypb wypb force-pushed the timestamp_with_time_zone branch 2 times, most recently from e35564a to 93819aa Compare February 21, 2024 05:56
Contributor

@spershin spershin left a comment

@wypb

Thanks for working on this!
I added a couple of nits and a question.

@@ -135,6 +135,11 @@ FOLLY_ALWAYS_INLINE void PrestoHasher::hash<TypeKind::BIGINT>(
// returns the corresponding value directly.
return vector_->valueAt<int64_t>(row);
});
} else if (isTimestampWithTimeZoneType(vector_->base()->type())) {
// Hash only timestamp value.
Contributor

I understand that this is how it was before, but why do we ignore the TZ part?

Contributor

TimestampWithTimeZoneType()
: RowType({"timestamp", "timezone"}, {BIGINT(), SMALLINT()}) {}
class TimestampWithTimeZoneType : public BigintType {
TimestampWithTimeZoneType() {}
Contributor

Lint recommends

TimestampWithTimeZoneType() = default;

Contributor Author

Already fixed.

Comment on lines 129 to 130
static constexpr int32_t kTimezoneMask = 0xFFF;
static constexpr int32_t kMillisShift = 12;
Contributor

Nit: static should not be used for globals in a header.
For these two, constexpr would suffice.
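
For context on this nit: a namespace-scope constexpr variable is implicitly const and therefore already has internal linkage in C++, so the static keyword adds nothing. A sketch of the header form the comment asks for (the surrounding header is not shown in this thread, so placement is an assumption):

```cpp
#include <cassert>
#include <cstdint>

// Header-friendly form: no `static` needed, each translation unit that
// includes the header gets its own internal-linkage copy.
constexpr int32_t kTimezoneMask = 0xFFF;
constexpr int32_t kMillisShift = 12;

// The mask covers exactly the bits below the shift.
static_assert((1 << kMillisShift) - 1 == kTimezoneMask);

// With C++17, `inline constexpr int32_t kMillisShift = 12;` is an
// alternative that yields a single entity across translation units.
```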

Contributor Author

Already fixed.

@wypb wypb force-pushed the timestamp_with_time_zone branch 2 times, most recently from 31be7da to 4e02577 Compare February 22, 2024 00:53
@facebook-github-bot
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

kagamiori commented Feb 22, 2024

Hi @wypb, the change to Base64Test.timestampWithTimezone has just been merged internally. Could you rebase this PR onto the latest main one more time? Thanks!

There is a conflict with PrestoSerializer.cpp by the way.

@wypb wypb force-pushed the timestamp_with_time_zone branch from 980eb7b to 02b512b Compare February 23, 2024 00:45
@wypb wypb force-pushed the timestamp_with_time_zone branch from 7b961af to 5caa663 Compare February 23, 2024 00:52
Contributor Author

wypb commented Feb 23, 2024

Hi @kagamiori, I have synchronized the latest code, and the relevant unit tests seem to have passed. PTAL, thank you.

@facebook-github-bot
Contributor

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@majetideepak
Collaborator

I see that this landed yesterday here 6fdaff7
Should we close this PR?

Contributor

kagamiori commented Feb 28, 2024

I see that this landed yesterday here 6fdaff7 Should we close this PR?

Yeah, not sure why this PR was not auto-closed. I'm closing it now.

@kagamiori kagamiori closed this Feb 28, 2024
@wypb wypb deleted the timestamp_with_time_zone branch February 28, 2024 22:51
@kagamiori
Contributor

Hi @wypb, are you going to make a follow-up PR of prestodb/presto#21974 to remove the unused code for the old TimestampWithTimezone type and re-enable the unit test?

Contributor Author

wypb commented Mar 7, 2024

@kagamiori I will enable it today.

facebook-github-bot pushed a commit that referenced this pull request Mar 14, 2024
Summary:
TIMESTAMP_WITH_TIME_ZONE has been converted to a primitive type in #7480.

Update the documentation to match.

Pull Request resolved: #9042

Reviewed By: Yuhta

Differential Revision: D54885988

Pulled By: mbasmanova

fbshipit-source-id: 76a3a8cafbaf812ff7573816b57b1d1cde099471
Joe-Abraham pushed a commit to Joe-Abraham/velox that referenced this pull request Jun 7, 2024
…r#9042)

Summary:
TIMESTAMP_WITH_TIME_ZONE has been converted to a primitive type in facebookincubator#7480.

Update the documentation to match.

Pull Request resolved: facebookincubator#9042

Reviewed By: Yuhta

Differential Revision: D54885988

Pulled By: mbasmanova

fbshipit-source-id: 76a3a8cafbaf812ff7573816b57b1d1cde099471
Contributor

mbasmanova commented Jun 27, 2024

@wypb FYI, there is a correctness issue for queries that use this type: #10338


7 participants