-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49985][SQL] Remove support for interval types in Variant #48215
[SPARK-49985][SQL] Remove support for interval types in Variant #48215
Conversation
common/variant/README.md
Outdated
Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types. | ||
|
||
[1] The parquet format does not have pure equivalents for the year-month and day-time interval types. Year-month intervals are usually represented using int32 values and the day-time intervals are usually represented using int64 values. However, these values don't include the start and end fields of these types. Therefore, Spark stores them in the column metadata. | ||
Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types. Type IDs 19 and 20 were used to represent interval types for whom support has been temporarily disabled, and therefore, these type IDs should not be used by new types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NOTE TO REVIEWERS: Please see if you approve of this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't future types use 19 and 20? Shouldn't intervals just be removed completely, and added later when the exact encoded format is agreed upon?
cc @cashmand For changes in the README file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshmotw-db I have a high-level question about the interval. Shouldn't we just remove the interval type, and "release" the ids?
common/variant/README.md
Outdated
Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types. | ||
|
||
[1] The parquet format does not have pure equivalents for the year-month and day-time interval types. Year-month intervals are usually represented using int32 values and the day-time intervals are usually represented using int64 values. However, these values don't include the start and end fields of these types. Therefore, Spark stores them in the column metadata. | ||
Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types. Type IDs 19 and 20 were used to represent interval types for whom support has been temporarily disabled, and therefore, these type IDs should not be used by new types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't future types use 19 and 20? Shouldn't intervals just be removed completely, and added later when the exact encoded format is agreed upon?
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
…on't work with Variants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshmotw-db Thanks!
LGTM
@cloud-fan @HyukjinKwon can you please look at this PR? Thanks! |
Merged to master. |
What changes were proposed in this pull request?
Support for interval types was added to the variant spec. This PR removes this support and removes the ability to cast from interval types to variant and vice versa.
Why are the changes needed?
I implemented interval support for Variant before, but because the Variant spec type is supposed to be open and compatible with other engines which may not support all the ANSI Interval types, more thought needs to be put into the design of these intervals in Variant.
Does this PR introduce any user-facing change?
Yes, after this change, users would no longer be able to cast between variants and intervals.
How was this patch tested?
Unit tests making sure that
Was this patch authored or co-authored using generative AI tooling?
No.