Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Add support to filter out plans that use Timestamp with Timezone #23179

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kgpai
Copy link
Contributor

@kgpai kgpai commented Jul 11, 2024

Description

This change causes the native worker to throw if Timestamp with Timezone is encountered in the plan. This is gated on the configuration property 'fail_on_timestamp_with_timezone'.

Motivation and Context

Due to issue facebookincubator/velox#10338, there can be cases where comparisons with Timestamp with Timezone can be wrong.

Impact

Will add further details soon.

Test Plan

Working on adding tests.

== NO RELEASE NOTE ==

@kgpai kgpai requested a review from a team as a code owner July 11, 2024 01:13
@kgpai kgpai marked this pull request as draft July 11, 2024 01:14
if (auto joinNode =
std::dynamic_pointer_cast<const core::AbstractJoinNode>(planNode)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can AbstractJoinNode be used standalone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would cover both HashJoin and MergeJoin.

return true;
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AggregationNode::aggregates_::call is a CallTypedExprPtr that can contain a LambdaTypedExprPtr as input. Should we check it too?

@facebook-github-bot
Copy link
Collaborator

@gggrace14 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kgpai kgpai force-pushed the filter_out_timestamp_tz branch from 8ef9d99 to 1ec636e Compare July 11, 2024 08:05
@facebook-github-bot
Copy link
Collaborator

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kgpai kgpai force-pushed the filter_out_timestamp_tz branch from 1ec636e to 5c38d5a Compare July 11, 2024 17:38
@facebook-github-bot
Copy link
Collaborator

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kgpai kgpai force-pushed the filter_out_timestamp_tz branch from 5c38d5a to 064a415 Compare July 11, 2024 18:35
@facebook-github-bot
Copy link
Collaborator

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@aditi-pandit
Copy link
Contributor

@kgpai @gggrace14 @kagamiori : Am curious why the invalidation happens during Prestissimo plan conversion instead of the Presto sql parser/semantic validation of the plan ? This code is brittle and could need a lot of tweaking. Its better to reject such a query at the co-ordinator itself if we are using native engine.

It would be cleaner to drop the Timestamp with Timezone type and functions from the list here for native engine
https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/metadata/BuiltInTypeAndFunctionNamespaceManager.java#L681

https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/metadata/BuiltInTypeAndFunctionNamespaceManager.java#L681

There could be other approaches as well.

@tdcmeehan : Would be great to hear your thoughts.

@kgpai kgpai force-pushed the filter_out_timestamp_tz branch from 064a415 to 1a34455 Compare July 11, 2024 22:03
@facebook-github-bot
Copy link
Collaborator

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@amitkdutta
Copy link
Contributor

amitkdutta commented Jul 11, 2024

@kgpai @gggrace14 @kagamiori : Am curious why the invalidation happens during Prestissimo plan conversion instead of the Presto sql parser/semantic validation of the plan ? This code is brittle and could need a lot of tweaking. Its better to reject such a query at the co-ordinator itself if we are using native engine.

It would be cleaner to drop the Timestamp with Timezone type and functions from the list here for native engine https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/metadata/BuiltInTypeAndFunctionNamespaceManager.java#L681

https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/metadata/BuiltInTypeAndFunctionNamespaceManager.java#L681

There could be other approaches as well.

@tdcmeehan : Would be great to hear your thoughts.

@aditi-pandit We plan to fix this in velox as quickly as possible. We are putting this interim solution guarded by a query config and intend to remove it as soon as velox support is landed (which we expected to be done in a few weeks). Don't want to add any change in co-ordinator at this moment since its temporary. Ideally we want to make the velox change, but it will require a few weeks.

@kgpai kgpai force-pushed the filter_out_timestamp_tz branch from 1a34455 to 9200982 Compare July 12, 2024 00:19
@facebook-github-bot
Copy link
Collaborator

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@tdcmeehan
Copy link
Contributor

Understood that this is a temporary measure. But I don't understand the hesitation around adding this into the coordinator? It would just be a better check--it would occur during analysis, rather than during execution (which is after resources have been allocated, after queueing, tasks have been spun up and splits have begun to be enumerated). I also think it would just be simpler to put it there.

@amitkdutta
Copy link
Contributor

Understood that this is a temporary measure. But I don't understand the hesitation around adding this into the coordinator? It would just be a better check--it would occur during analysis, rather than during execution (which is after resources have been allocated, after queueing, tasks have been spun up and splits have begun to be enumerated). I also think it would just be simpler to put it there.

@tdcmeehan @aditi-pandit Yea coordinator is definitely a better place, I am trying here
#23191
I stumbled upon into two cases (#23191 (comment)), will be really helpful if you can guide and take a look. This change is very important and also time sensitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants