Untracked Parquet Files on Storage Due to Schema Mismatch During Append #493

Open
JosepSampe opened this issue Nov 28, 2024 · 2 comments

@JosepSampe
Member

In the Qbeast Delta implementation, appending data with a mismatched schema to an existing Delta table, without the mergeSchema option set to true, leaves untracked Parquet files on storage.

The current logic writes data to storage before performing schema validation. When a schema mismatch is detected, an exception is raised and the write is aborted, but the Parquet files that were already written remain in storage and are not referenced by any transaction log entry.

The schema validation process should be updated to occur before writing data, preventing unreferenced (or orphaned) Parquet files on storage and ensuring consistency between storage and transaction logs.
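
As a rough illustration of the proposed ordering, here is a minimal, self-contained sketch in plain Python. It is not the Qbeast or Delta code: `validate_schema`, `append`, `SchemaMismatchError`, the dict-based schemas, and the text file standing in for a Parquet file are all hypothetical names and simplifications chosen for this example. The only point it shows is that when validation runs first, a rejected append leaves nothing behind on storage.

```python
import os


class SchemaMismatchError(Exception):
    """Hypothetical error raised when incoming data does not match the table schema."""


def validate_schema(table_schema, data_schema, merge_schema=False):
    # Schemas are modelled as {column_name: type_name} dicts (an assumption
    # made for this sketch; real schemas are richer than this).
    if merge_schema or table_schema == data_schema:
        return
    raise SchemaMismatchError(
        f"table schema {table_schema} does not match data schema {data_schema}"
    )


def append(table_dir, table_schema, data_schema, rows, merge_schema=False):
    # Validate BEFORE touching storage: if this raises, no file is created.
    validate_schema(table_schema, data_schema, merge_schema)

    # Stand-in for the expensive Parquet write that happens in the real system.
    file_name = f"part-{len(os.listdir(table_dir)):05d}.parquet"
    path = os.path.join(table_dir, file_name)
    with open(path, "w") as f:
        f.write(repr(rows))
    return path
```

With this ordering, calling `append` with a mismatched schema raises before any file is created, so the table directory is unchanged after the failure.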

@JosepSampe JosepSampe added the type: bug Something isn't working label Nov 28, 2024
@osopardo1
Member

osopardo1 commented Nov 28, 2024

I would not categorize this as a bug. It's OK to have files in storage that are not present in the DeltaLog. This is how Optimistic Concurrency works, and that is why there's a log in place. The same thing happens when deleting or updating data using Copy On Write. Documentation is a separate matter: if users want to read the table as plain Parquet, they should know this in advance.

Nevertheless, I agree that checking that parameter before writing would be a necessary enhancement, but because it would skip a compute-intensive process, not because it ensures consistency between storage and log.
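
To make the distinction between "files on storage" and "files in the log" concrete, the following sketch lists Parquet files in a table directory that no `add` action in the JSON commits under `_delta_log` refers to. It is only an illustration under stated assumptions: `referenced_files` and `untracked_parquet_files` are hypothetical helper names, the table is assumed to live on a local filesystem, checkpoint files are ignored, and paths in the log are assumed to be plain relative paths without URL-encoded characters.

```python
import json
import os


def referenced_files(table_dir):
    """Collect the paths named by `add` actions in the JSON commit files."""
    log_dir = os.path.join(table_dir, "_delta_log")
    paths = set()
    if not os.path.isdir(log_dir):
        return paths
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue  # skip checkpoints and other non-commit files
        with open(os.path.join(log_dir, name)) as f:
            for line in f:  # each commit is newline-delimited JSON actions
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)
                if "add" in action:
                    paths.add(action["add"]["path"])
    return paths


def untracked_parquet_files(table_dir):
    """Return Parquet files on disk that no `add` action ever referenced."""
    on_disk = set()
    for root, dirs, files in os.walk(table_dir):
        # Do not descend into the log directory itself.
        dirs[:] = [d for d in dirs if d != "_delta_log"]
        for name in files:
            if name.endswith(".parquet"):
                rel = os.path.relpath(os.path.join(root, name), table_dir)
                on_disk.add(rel.replace(os.sep, "/"))
    return sorted(on_disk - referenced_files(table_dir))
```

A reader that goes through the log sees only the referenced files, so the untracked ones are invisible to it; a reader that scans the directory as plain Parquet would pick them up as well.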

@JosepSampe JosepSampe removed the type: bug Something isn't working label Nov 28, 2024
@fpj
Contributor

fpj commented Jan 13, 2025

Related to #278.
