Add Support for Iceberg table sort orders #21977

Merged on Feb 7, 2025 (1 commit)

Conversation

evanvdia
Contributor

@evanvdia evanvdia commented Feb 21, 2024

Description

Add support for the Iceberg connector to create sorted files. The sort order is configured with the sorted_by table property: when creating a table, specify an array of one or more columns to sort by.
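
To illustrate what sorting by an array of columns means for the rows that end up in a file, here is a minimal Java sketch. It is not the connector's writer code: the `SortedByDemo` class, the map-based rows, and the ascending, nulls-first ordering are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the connector's writer code.
final class SortedByDemo {
    // Builds a comparator that orders rows by each listed column in turn.
    // Ascending with nulls first is an assumed default for this sketch.
    static Comparator<Map<String, String>> sortedBy(List<String> columns) {
        Comparator<Map<String, String>> result = (a, b) -> 0;
        for (String column : columns) {
            Comparator<Map<String, String>> byColumn = Comparator.comparing(
                    (Map<String, String> row) -> row.get(column),
                    Comparator.nullsFirst(Comparator.<String>naturalOrder()));
            result = result.thenComparing(byColumn);
        }
        return result;
    }

    // Helper that builds a two-column row (name, join_date).
    static Map<String, String> row(String name, String joinDate) {
        Map<String, String> row = new HashMap<>();
        row.put("name", name);
        row.put("join_date", joinDate);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("bob", "2024-03-01"));
        rows.add(row("alice", "2023-11-15"));
        rows.add(row("carol", null));

        // Similar in spirit to sorted_by = ARRAY['join_date'].
        rows.sort(sortedBy(Arrays.asList("join_date")));

        for (Map<String, String> r : rows) {
            System.out.println(r.get("name") + " " + r.get("join_date"));
        }
    }
}
```

Under these assumptions the row with a null join_date comes first, followed by the remaining rows in ascending join_date order.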

Cherry-pick of trinodb/trino#14891

Motivation and Context

Issue : #21978

Test Plan

  • Added tests for sorted_by operations

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Connector Changes
* Add support for Iceberg table sort orders. A table can be created with a `sorted_by` property listing the columns used to sort the files written to the table.

@evanvdia evanvdia requested a review from a team as a code owner February 21, 2024 08:40
@evanvdia evanvdia requested a review from presto-oss February 21, 2024 08:40

linux-foundation-easycla bot commented Feb 21, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: evanvdia / name: Nidhin Varghese (d0a639b)


github-actions bot commented Feb 21, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 8789cd9...d0a639b.

No notifications.

Contributor

@steveburnett steveburnett left a comment

Thanks for the documentation! I had a few suggestions to help readability, and one question about the formatting of one of the two code examples.

Also please sign the Presto CLA by selecting the "Please click here to be authorized" link in the earlier comment.

presto-docs/src/main/sphinx/connector/iceberg.rst (5 review threads, outdated and resolved)
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch 2 times, most recently from d7a8899 to b65eae5 Compare February 22, 2024 11:16
Contributor

@steveburnett steveburnett left a comment

Thanks for the changes! This is better. I found a nit near the beginning, and I ask if you would recheck the formatting of the first code block example.

presto-docs/src/main/sphinx/connector/iceberg.rst (1 review thread, outdated and resolved)
@tdcmeehan tdcmeehan added the iceberg Apache Iceberg related label Feb 22, 2024
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 8c6b521 to 57e5e2a Compare February 23, 2024 14:15
steveburnett previously approved these changes Feb 23, 2024
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull of updated branch, new local build of docs. Thanks for the changes!

@hantangwangd hantangwangd self-requested a review February 24, 2024 01:16
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 57e5e2a to 07ed82a Compare February 27, 2024 10:11
Member

@hantangwangd hantangwangd left a comment

I took a first look; there are some problems to discuss.

presto-docs/src/main/sphinx/connector/iceberg.rst (1 review thread, outdated and resolved)
Comment on lines 189 to 317
icebergTable.sortOrder().fields().stream()
.map(SortField::fromIceberg)
.collect(toImmutableList()));
Member

This doesn't seem to support IcebergNativeMetadata yet. Maybe try using transaction.replaceSortOrder() to support sorted_by in IcebergNativeMetadata?

Contributor Author

We will analyse and work on this.

Contributor Author

Done

@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 07ed82a to 30ac341 Compare March 12, 2024 04:45
@tdcmeehan tdcmeehan requested a review from hantangwangd May 2, 2024 20:30
@tdcmeehan tdcmeehan self-assigned this May 2, 2024
@hantangwangd
Member

Hi @evanvdia, just wanted to confirm: are you still working on this?

@evanvdia
Contributor Author

evanvdia commented May 3, 2024

Hi @hantangwangd, yes, I am still working on this.

@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch 2 times, most recently from 7b3bd0f to 3b6903b Compare May 23, 2024 11:43
@tdcmeehan
Contributor

@evanvdia is this ready for review? There is a merge conflict.

@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 3b6903b to ef6df97 Compare June 5, 2024 16:23
@evanvdia
Contributor Author

evanvdia commented Jun 5, 2024

@tdcmeehan The merge conflicts are resolved, but the Iceberg unit tests are failing after rebasing.

image

Member

@hantangwangd hantangwangd left a comment

By the way, please let us know when this PR is ready for review, thanks.

continue;
}

Optional<PartitionData> partitionData = getPartitionData(pagePartitioner.getColumns(), transformedPage, position);
WriteContext writer = createWriter(partitionData);
Optional<PartitionData> partitionData = getPartitionData(pagePartitioner.getColumns(), page, position);
Member

Suggested change
- Optional<PartitionData> partitionData = getPartitionData(pagePartitioner.getColumns(), page, position);
+ Optional<PartitionData> partitionData = getPartitionData(pagePartitioner.getColumns(), transformedPage, position);

After checking the code, I think the problem was introduced by this change.
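
One plausible reading of why the argument matters, sketched in Java under stated assumptions: if the transformed page holds the outputs of the partition transforms, one column per partition field, then reading the same channel from the raw input page returns an unrelated source value. The `PartitionValueDemo` class, the string arrays standing in for pages, and the month-style transform are all hypothetical and not taken from the PR.

```java
// Hypothetical illustration; the array "pages" and the month transform are assumptions.
final class PartitionValueDemo {
    // Month-style partition transform on an ISO date: "2024-03-15" -> "2024-03".
    static String month(String isoDate) {
        return isoDate.substring(0, 7);
    }

    // Builds the "transformed page": one column per partition field,
    // holding the transform's output for every row of the input page.
    static String[][] transform(String[][] page, int sourceChannel) {
        String[][] transformed = new String[page.length][1];
        for (int position = 0; position < page.length; position++) {
            transformed[position][0] = month(page[position][sourceChannel]);
        }
        return transformed;
    }

    // Reads the single partition value of a row; the partition column is channel 0.
    static String partitionValue(String[][] partitionColumns, int position) {
        return partitionColumns[position][0];
    }

    public static void main(String[] args) {
        // Input page with columns (name, join_date); rows are stored row-major here.
        String[][] page = {{"alice", "2024-03-15"}, {"bob", "2024-04-02"}};
        String[][] transformedPage = transform(page, 1);

        // Reading from the transformed page yields the partition value of row 0.
        System.out.println(partitionValue(transformedPage, 0));
        // Reading the same channel from the raw page yields the name column instead.
        System.out.println(partitionValue(page, 0));
    }
}
```

In this sketch the two calls return different values for the same row, which is the kind of mismatch that could route rows to the wrong writer.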

Contributor Author

Thanks @hantangwangd, that was the issue. I have made the change and will let you know once the PR is ready for review.

@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from ef6df97 to 32840e1 Compare June 11, 2024 15:29
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 32840e1 to 5e60f75 Compare July 12, 2024 11:45
@evanvdia evanvdia requested a review from elharo as a code owner July 12, 2024 11:45
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch 3 times, most recently from f132df0 to 59aa7b6 Compare February 5, 2025 11:39
@evanvdia
Contributor Author

evanvdia commented Feb 5, 2025

@tdcmeehan All changes are done.

sorted_by = ARRAY['join_date']
)

Iceberg Connector does not support sort order transforms. The following sort order transformations are not supported:
Contributor

A warning is added when these are used, right? Can you add that information to this documentation?
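
As a rough sketch of the behavior being asked about, the Java below keeps sort fields that use the identity transform and records a warning for every other field. This is an assumption about the shape of the check, not the PR's code: the `SortTransformCheck` class, the warning text, and the choice to skip unsupported fields rather than fail are hypothetical. The transform names follow Iceberg's naming (`identity`, `bucket[4]`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the connector's implementation.
final class SortTransformCheck {
    // A sort field as (source column, transform name).
    static final class SortField {
        final String column;
        final String transform;

        SortField(String column, String transform) {
            this.column = column;
            this.transform = transform;
        }
    }

    // Returns the columns usable for sorting and appends one warning
    // to the given list for each field with a non-identity transform.
    static List<String> usableColumns(List<SortField> fields, List<String> warnings) {
        List<String> columns = new ArrayList<>();
        for (SortField field : fields) {
            if ("identity".equals(field.transform)) {
                columns.add(field.column);
            }
            else {
                warnings.add("Unsupported sort order transform '" + field.transform
                        + "' on column '" + field.column + "'");
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        List<String> columns = usableColumns(
                Arrays.asList(new SortField("join_date", "identity"), new SortField("id", "bucket[4]")),
                warnings);
        System.out.println(columns);
        System.out.println(warnings);
    }
}
```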

Contributor Author

@tdcmeehan Done

@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 59aa7b6 to e3c8cc4 Compare February 5, 2025 17:01
@evanvdia evanvdia requested a review from tdcmeehan February 6, 2025 03:38
@evanvdia
Contributor Author

evanvdia commented Feb 6, 2025

@tdcmeehan I have updated the doc. Could you please review?

@tdcmeehan
Contributor

@steveburnett can you check the docs one last time?

Contributor

@steveburnett steveburnett left a comment

Thanks for the documentation! A few minor suggestions of phrasing for consistency, nothing major.

presto-docs/src/main/sphinx/connector/iceberg.rst (3 review threads, outdated and resolved)
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from e3c8cc4 to 938c2e0 Compare February 7, 2025 03:55
@evanvdia
Contributor Author

evanvdia commented Feb 7, 2025

@steveburnett I have made the changes. Can you please take another look?

@evanvdia evanvdia requested a review from steveburnett February 7, 2025 04:40
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 938c2e0 to 999f8e2 Compare February 7, 2025 05:03
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thank you!

tdcmeehan previously approved these changes Feb 7, 2025
@tdcmeehan tdcmeehan dismissed their stale review February 7, 2025 15:16

No attribution

Contributor

@tdcmeehan tdcmeehan left a comment

Please properly attribute this commit in the commit message. See our contributing guide for instructions.

Cherry-pick of trinodb/trino#14891
Co-authored-by: Alexander Jo <jo.alex2144@gmail.com>
@evanvdia evanvdia force-pushed the iceberg_sorted_by_tbl_ppty branch from 999f8e2 to d0a639b Compare February 7, 2025 17:39
@ZacBlanco ZacBlanco merged commit 003d86a into prestodb:master Feb 7, 2025
55 checks passed
Labels
iceberg Apache Iceberg related
7 participants