Optimize Console Message Storage to Reduce Excessive Metadata Generation in Iceberg Table #3286

Merged: 6 commits, Mar 3, 2025

@kunwp1 (Collaborator) commented Feb 27, 2025

This PR addresses the excessive storage consumption caused by metadata generated when console messages are stored in the Iceberg table. A simple workflow produced 1.2 GB of files, made up of console message data and the related Iceberg metadata.

The root cause is similar to the issue addressed in PR #3281: each commit to the Iceberg table writes new snapshot and metadata files, so frequent commits generate excessive metadata.

To mitigate this, I reduced the frequency of commits, which significantly reduces the amount of generated metadata.

After this change, the same workflow produces only 184 KB of files, down from 1.2 GB.
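
The effect of commit frequency on metadata volume can be illustrated with a small simulation. This is a sketch, not the code changed in this PR: `SimulatedTable`, `write_messages`, and the batch size of 100 are hypothetical, and the model assumes only that every commit writes one new metadata file regardless of how many rows it carries.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedTable:
    """Toy stand-in for a table format that writes one metadata file per commit."""
    rows: list = field(default_factory=list)
    metadata_files: int = 0

    def commit(self, batch):
        # Each commit appends its rows and produces exactly one new metadata
        # file, no matter how many rows the batch contains (an assumption of
        # this model, not a measurement of Iceberg itself).
        self.rows.extend(batch)
        self.metadata_files += 1


def write_messages(table, messages, batch_size):
    """Buffer messages and commit once per `batch_size` messages."""
    buffer = []
    for message in messages:
        buffer.append(message)
        if len(buffer) >= batch_size:
            table.commit(buffer)
            buffer = []
    if buffer:
        # Flush the remainder so no message is lost.
        table.commit(buffer)


messages = [f"console message {i}" for i in range(1000)]

per_message = SimulatedTable()
write_messages(per_message, messages, batch_size=1)

batched = SimulatedTable()
write_messages(batched, messages, batch_size=100)

print(per_message.metadata_files, batched.metadata_files)  # 1000 10
```

Both tables end up with the same 1000 rows; only the number of commits, and therefore the number of metadata files in this model, differs.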

@kunwp1 added the fix label Feb 27, 2025
@kunwp1 requested a review from bobbai00 February 27, 2025 23:06
@kunwp1 self-assigned this Feb 27, 2025
@kunwp1 marked this pull request as ready for review February 28, 2025 00:33
@bobbai00 (Collaborator) left a comment

LGTM!

@kunwp1 merged commit e73f155 into master Mar 3, 2025
8 checks passed
@kunwp1 deleted the chris-fix-storage-console-messages branch March 3, 2025 19:58
kunwp1 added a commit that referenced this pull request Mar 4, 2025
…iler Warnings (#3302)

This PR addresses the concurrency issues that emerged in the console
messages and runtime statistics Iceberg tables following the integration
of PRs #3286 and #3281. The aggressive removal of old snapshots and
metadata in those earlier changes caused failures during concurrent read
and write operations. To resolve this, snapshots and metadata are now
lazily removed—triggered either when a workflow restarts or upon
lifecycle expiration.
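
The difference between eager and lazy snapshot removal can be sketched with a toy model. Everything here is hypothetical (`SnapshotStore`, `reader_survives`, the `expire_eagerly` flag), and it assumes the failure arises when a reader still references a snapshot that a concurrent writer has already removed; the commit message does not spell out the exact failure mode.

```python
class SnapshotStore:
    """Toy model of a table whose readers pin a snapshot while scanning."""

    def __init__(self):
        self.current_id = 0
        self.snapshots = {0: []}  # snapshot id -> rows visible in that snapshot

    def commit(self, rows, expire_eagerly):
        # Every commit creates a new snapshot on top of the current one.
        new_id = self.current_id + 1
        self.snapshots[new_id] = self.snapshots[self.current_id] + list(rows)
        self.current_id = new_id
        if expire_eagerly:
            self.expire_old_snapshots()

    def expire_old_snapshots(self):
        # Keep only the latest snapshot and drop everything older.
        self.snapshots = {self.current_id: self.snapshots[self.current_id]}

    def read(self, snapshot_id):
        # Raises KeyError if the pinned snapshot has been removed.
        return self.snapshots[snapshot_id]


def reader_survives(expire_eagerly):
    store = SnapshotStore()
    store.commit(["message 1"], expire_eagerly)
    pinned = store.current_id                    # a reader starts scanning here
    store.commit(["message 2"], expire_eagerly)  # a concurrent write lands
    try:
        store.read(pinned)
        return True
    except KeyError:
        return False


print(reader_survives(expire_eagerly=True))   # False
print(reader_survives(expire_eagerly=False))  # True
```

In the lazy case, `expire_old_snapshots()` would be called later, at a point where no reader is active, which in this PR's terms corresponds to a workflow restart or lifecycle expiration.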

Additionally, this update removes the Scala compiler warnings that were
previously occurring during code compilation.