[SPARK-49985][SQL] Remove support for interval types in Variant #48215

Closed

@harshmotw-db harshmotw-db commented Sep 23, 2024

What changes were proposed in this pull request?

Support for interval types was previously added to the Variant spec. This PR removes that support, along with the ability to cast between interval types and Variant in either direction.

Why are the changes needed?

I previously implemented interval support for Variant. However, the Variant spec is meant to be open and compatible with other engines, which may not support all the ANSI interval types, so the design of intervals in Variant needs more thought.

Does this PR introduce any user-facing change?

Yes. After this change, users can no longer cast between variants and intervals.

How was this patch tested?

Unit tests verify that:

  1. It is not possible to construct variants containing intervals.
  2. It is not possible to cast variants to intervals.
  3. Interval IDs in variants are treated just like other unknown type IDs.
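
To illustrate the third point, here is a minimal Python sketch of a reader that treats the former interval IDs (19 and 20, per the README change in this PR) like any other unrecognized type ID. It is not Spark's implementation. The function and class names are hypothetical, the header layout (basic type in the low 2 bits, type ID in the upper 6 bits) is assumed from the Variant spec, and the table of known IDs is an illustrative subset only.

```python
# Sketch only: all names here are hypothetical, not Spark's API.
# Assumed header layout: low 2 bits = basic type, upper 6 bits = type ID.
BASIC_TYPE_MASK = 0x3
TYPE_INFO_SHIFT = 2
TYPE_INFO_MASK = 0x3F
PRIMITIVE = 0

# Illustrative subset of primitive type IDs (an assumption, not the full
# table). IDs 19 and 20 are deliberately absent: with interval support
# removed, they take the same code path as any other unknown ID.
KNOWN_PRIMITIVE_IDS = {0: "null", 1: "true", 2: "false", 5: "int32", 16: "string"}


class UnknownVariantTypeError(ValueError):
    """Raised when a primitive header carries an unrecognized type ID."""


def make_primitive_header(type_id: int) -> int:
    """Build a one-byte primitive header for the given type ID."""
    return (type_id << TYPE_INFO_SHIFT) | PRIMITIVE


def primitive_type_name(header: int) -> str:
    """Return the type name for a primitive header, or raise if unknown."""
    if (header & BASIC_TYPE_MASK) != PRIMITIVE:
        raise ValueError("not a primitive value header")
    type_id = (header >> TYPE_INFO_SHIFT) & TYPE_INFO_MASK
    if type_id not in KNOWN_PRIMITIVE_IDS:
        raise UnknownVariantTypeError(f"unknown primitive type ID {type_id}")
    return KNOWN_PRIMITIVE_IDS[type_id]
```

In this sketch, `primitive_type_name(make_primitive_header(19))` raises the same `UnknownVariantTypeError` as an arbitrary unassigned ID, so there is no interval-specific branch left to maintain.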

Was this patch authored or co-authored using generative AI tooling?

No.

Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types.

[1] The parquet format does not have pure equivalents for the year-month and day-time interval types. Year-month intervals are usually represented using int32 values and the day-time intervals are usually represented using int64 values. However, these values don't include the start and end fields of these types. Therefore, Spark stores them in the column metadata.
Type IDs 17 and 18 were originally reserved for a prototype feature (string-from-metadata) that was never implemented. These IDs are available for use by new types. Type IDs 19 and 20 were used to represent interval types for whom support has been temporarily disabled, and therefore, these type IDs should not be used by new types.
Contributor Author

NOTE TO REVIEWERS: Please see if you approve of this comment.

Contributor

Why can't future types use 19 and 20? Shouldn't intervals just be removed completely, and added later when the exact encoded format is agreed upon?

@harshmotw-db
Contributor Author

cc @cashmand For changes in the README file.

@gene-db gene-db left a comment

@harshmotw-db I have a high-level question about the interval. Shouldn't we just remove the interval type, and "release" the ids?

harshmotw-db changed the title from "[SPARK-48994][FOLLOW-UP] Disable support for interval types in Variant by default" to "[SPARK-48994][FOLLOW-UP] Remove support for interval types in Variant by default" on Oct 15, 2024
harshmotw-db requested a review from gene-db on October 15, 2024 19:57
harshmotw-db changed the title from "[SPARK-48994][FOLLOW-UP] Remove support for interval types in Variant by default" to "[SPARK-48994][FOLLOW-UP] Remove support for interval types in Variant" on Oct 15, 2024
@gene-db gene-db left a comment

@harshmotw-db Thanks!

LGTM

harshmotw-db changed the title from "[SPARK-48994][FOLLOW-UP] Remove support for interval types in Variant" to "[SPARK-49985] Remove support for interval types in Variant" on Oct 16, 2024
HyukjinKwon changed the title from "[SPARK-49985] Remove support for interval types in Variant" to "[SPARK-49985][SQL] Remove support for interval types in Variant" on Oct 17, 2024
@harshmotw-db
Contributor Author

@cloud-fan @HyukjinKwon can you please look at this PR? Thanks!

@HyukjinKwon
Member

Merged to master.
