Add UPDATE support for Iceberg #24281

Merged
ZacBlanco merged 2 commits into prestodb:master from upstream-iceberg-update on Feb 7, 2025

Contributor

@ZacBlanco ZacBlanco commented Dec 18, 2024

Description

This PR adds support in the Iceberg connector for UPDATE operations.

Motivation and Context

Row-level table UPDATE support

Impact

  • Users can now set the update_mode table property on Iceberg tables
  • Inserts and deletes now show operations as "overwrite" in the snapshot entries due to the new update implementation.
  • UPDATE <x> SET ... WHERE ... queries can now run successfully

Test Plan

Comprehensive set of unit tests for different tables and column types.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Changes
* Iceberg connector support for ``UPDATE`` SQL statements

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 18, 2024
@prestodb-ci prestodb-ci requested review from a team, anandamideShakyan and ShahimSharafudeen and removed request for a team December 18, 2024 22:05
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-update branch from e56fc7c to f0a29a0 Compare December 20, 2024 18:28
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-update branch from 94cc4b3 to 4aa34a0 Compare December 31, 2024 16:56
@ZacBlanco ZacBlanco marked this pull request as ready for review January 2, 2025 19:07
@ZacBlanco ZacBlanco requested a review from presto-oss January 2, 2025 19:07
@steveburnett
Contributor

Thanks for the release note! Formatting nits:

== RELEASE NOTES ==

Iceberg Changes
* Iceberg connector support for ``UPDATE`` SQL statements. :pr:`24281`

Member

@hantangwangd hantangwangd left a comment

Thanks for implementing this great feature. I found a few small problems and nits on a first rough pass, and will take a more detailed look in a couple of days.

Member

@hantangwangd hantangwangd left a comment

Thanks for this high-quality implementation. Overall it looks good to me; only a few small nits.

Member

@imjalpreet imjalpreet left a comment

Thanks for the PR! I'm sorry for the delay in reviewing it. I completed a partial review and will continue with the rest.

hantangwangd previously approved these changes Feb 2, 2025
Member

@hantangwangd hantangwangd left a comment

Thanks for all the work. It looks good to me; only a couple of small nits.

Member

@imjalpreet imjalpreet left a comment

Thanks for the PR, almost LGTM, just a couple of nits.

@steveburnett
Contributor

New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.

:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@ZacBlanco ZacBlanco requested a review from elharo as a code owner February 4, 2025 01:45
Contributor

@steveburnett steveburnett left a comment

Thanks for the doc! A few suggestions, looks good overall.

presto-docs/src/main/sphinx/connector/iceberg.rst (4 outdated review threads, resolved)
steveburnett previously approved these changes Feb 4, 2025
Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)

Pull updated branch, new local docs build, everything looks good. Thanks!

hantangwangd previously approved these changes Feb 4, 2025
@ZacBlanco ZacBlanco requested a review from imjalpreet February 5, 2025 00:57
imjalpreet previously approved these changes Feb 5, 2025
Member

@imjalpreet imjalpreet left a comment

LGTM, thanks for the PR @ZacBlanco

@tdcmeehan tdcmeehan self-assigned this Feb 5, 2025
Contributor

@tdcmeehan tdcmeehan left a comment

Please clean up the commits.

@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-update branch from 1a7292c to e3d8733 Compare February 6, 2025 19:13
Contributor

@tdcmeehan tdcmeehan left a comment

Just some nits, looks good

@@ -1237,8 +1256,8 @@ public static String metadataLocation(Table icebergTable)

/**
* Get the data location for target {@link Table},
* considering iceberg table properties {@code WRITE_DATA_LOCATION}, {@code OBJECT_STORE_PATH} and {@code WRITE_FOLDER_STORAGE_LOCATION}
* */
* considering iceberg table properties {@code WRITE_DATA_LOCATION}, {@code OBJECT_STORE_PATH} and {@code WRITE_FOLDER_STORAGE_LOCATION}
Contributor

I know the comments were improperly formatted before, but can you revert this? It would just muddle the git blame.

Contributor Author

IntelliJ likes to auto-format these... maybe I'll open a separate PR to apply formatting to the entire connector. This seems to happen quite often.

@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-update branch from e3d8733 to 3e8eacc Compare February 6, 2025 23:37
ZacBlanco and others added 2 commits February 7, 2025 11:22
Without this change, the UpdateOperator would throw an exception
stating that there was no valid page source. This occurs because the
driver which is responsible for setting the UpdatablePageSource
never calls the proper method due to never receiving any inputs.

This now handles the case where the page source is never set by
returning an EmptySplitPageSource.
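
The fallback described in this commit message can be illustrated with a minimal sketch. It is not the code from this PR: PageSource, EmptyPageSource, and UpdateOperatorSketch are hypothetical stand-ins defined only for this example, and the real Presto interfaces differ. The sketch assumes only what the message states, namely that the source may never be set when the driver receives no input, and that an empty source is returned in that case instead of throwing.

```java
import java.util.Optional;

public class PageSourceFallbackSketch {
    /** Hypothetical stand-in for a page source; not the Presto SPI interface. */
    interface PageSource {
        boolean isFinished();
        int rowCount();
    }

    /** Hypothetical stand-in for a source that produces no data. */
    static final class EmptyPageSource implements PageSource {
        @Override
        public boolean isFinished() {
            return true;
        }

        @Override
        public int rowCount() {
            return 0;
        }
    }

    /** Operator-like holder whose source is only set once some input arrives. */
    static final class UpdateOperatorSketch {
        private Optional<PageSource> source = Optional.empty();

        void setSource(PageSource pageSource) {
            source = Optional.of(pageSource);
        }

        /** Falls back to an empty source rather than throwing when none was set. */
        PageSource pageSource() {
            return source.orElseGet(EmptyPageSource::new);
        }
    }

    public static void main(String[] args) {
        // No input ever reached the operator, so setSource was never called.
        UpdateOperatorSketch operator = new UpdateOperatorSketch();
        PageSource source = operator.pageSource();
        System.out.println(source.isFinished() + " " + source.rowCount()); // prints "true 0"
    }
}
```
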
This commit allows users to perform row-level updates when using
the Iceberg connector with Java-based workers.

This is achieved by having IcebergUpdatablePageSource
implement the updateRows method. The implementation passes
a generated row ID column as a field in the page required by
updateRows. Then, during updateRows, it generates a positionDelete
file entry for the row ID and writes the row's updated value to a
new page sink for the newly updated data.

These new files are then committed in a rowDelta transaction within
the Iceberg connector metadata after processing is complete.

Co-Authored-By: Nidhin Varghese <Nidhin.Varghese1@ibm.com>
Co-Authored-By: Anoop V S <anoop.v.s@ibm.com>
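
The update flow described in the commit message above (mark the old row position as deleted, write the updated value as new data, then expose both together) can be sketched in plain Java. This is an in-memory illustration under stated assumptions, not the connector's implementation: RowLevelUpdateSketch and its fields are hypothetical names, the set of positions stands in for a position-delete file, and the second list stands in for the new data file written by the page sink. No Iceberg or Presto APIs are used.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RowLevelUpdateSketch {
    /** Rows of an existing data file; never modified in place. */
    private final List<String> baseRows;
    /** Positions in the base file marked as deleted (stand-in for a position-delete file). */
    private final Set<Integer> positionDeletes = new HashSet<>();
    /** Updated values written as new data (stand-in for the new page sink's output). */
    private final List<String> newRows = new ArrayList<>();

    RowLevelUpdateSketch(List<String> baseRows) {
        this.baseRows = new ArrayList<>(baseRows);
    }

    /** An update is modelled as: delete the old row position, then append the new value. */
    void updateRow(int rowPosition, String newValue) {
        if (rowPosition < 0 || rowPosition >= baseRows.size()) {
            throw new IndexOutOfBoundsException("no such row position: " + rowPosition);
        }
        if (!positionDeletes.add(rowPosition)) {
            throw new IllegalStateException("row position already updated: " + rowPosition);
        }
        newRows.add(newValue);
    }

    /** Reader view once deletes and new rows are visible together. */
    List<String> read() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < baseRows.size(); i++) {
            if (!positionDeletes.contains(i)) {
                result.add(baseRows.get(i));
            }
        }
        result.addAll(newRows);
        return result;
    }

    public static void main(String[] args) {
        RowLevelUpdateSketch table = new RowLevelUpdateSketch(List.of("a", "b", "c"));
        table.updateRow(1, "B");
        System.out.println(table.read()); // prints "[a, c, B]"
    }
}
```

Note that the updated row moves to the end of the read view in this sketch, because the base file is left untouched and the new value lives in separate data.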
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-update branch from 3e8eacc to cd4df4f Compare February 7, 2025 20:32
@ZacBlanco ZacBlanco merged commit 2005b0b into prestodb:master Feb 7, 2025
55 checks passed
Labels
from:IBM PR from IBM
6 participants