[Protocol Change Request] Add VacuumProtocolCheck PROTOCOL change request (#2693)

## Protocol Change Request

### Description of the protocol change
Adds the VacuumProtocolCheck PROTOCOL change proposal. Design Doc:
https://docs.google.com/document/d/15o8WO2T0vN21S5JG-FT_ZNhXFCWyh0i9tqhr9kBmZpE/edit#heading=h.4cz970y1mk93

Protocol RFC issue: #2630

### Willingness to contribute

The Delta Lake Community encourages protocol innovations. Would you or
another member of your organization be willing to contribute this
feature to the Delta Lake code base?

- [x] Yes. I can contribute.
- [ ] Yes. I would be willing to contribute with guidance from the Delta
Lake community.
- [ ] No. I cannot contribute at this time.

---------

Co-authored-by: Prakhar Jain <prakharjain09@gmail.com>
sumeet-db and prakharjain09 authored Mar 4, 2024
1 parent fda41dd commit f71fef7
Showing 2 changed files with 35 additions and 4 deletions.
10 changes: 6 additions & 4 deletions protocol_rfcs/README.md
@@ -16,10 +16,12 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,

### Proposed RFCs

| Date proposed | RFC file | Github issue | RFC title |
|:-|:-|:-|:-|
| 2023-02-02 | [in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |
| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
| Date proposed | RFC file | Github issue | RFC title |
|:--------------|:-----------------------------------------------------------------------------------------------------------------|:----------------------------------------------|:------------------------------|
| 2023-02-02 | [in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |
| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits |
| 2023-02-28 | [vacuum-protocol-check.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/vacuum-protocol-check.md) | https://github.com/delta-io/delta/issues/2630 | Enforce Vacuum Protocol Check |

### Accepted RFCs

29 changes: 29 additions & 0 deletions protocol_rfcs/vacuum-protocol-check.md
@@ -0,0 +1,29 @@
# Vacuum Protocol Check

This RFC introduces a new ReaderWriter feature named `vacuumProtocolCheck`. This feature ensures that the Vacuum operation consistently performs both reader and writer protocol checks. The motivation for this change is to address inconsistencies in Vacuum's behavior across different Delta implementations, as some of them skip the writer protocol checks in practice. This omission blocks any protocol changes that might impact Vacuum, including improvements to Vacuum itself. The writer protocol check addresses an oversight in the original Delta specification, where an older Delta client executing a Vacuum command might incorrectly delete files that are still in use by newer versions, potentially leading to data corruption.

**For further discussion of this protocol change, please refer to the GitHub issue: https://github.com/delta-io/delta/issues/2630**

--------


> ***New Section***
# VACUUM Protocol Check

The `vacuumProtocolCheck` ReaderWriter feature ensures consistent application of reader and writer protocol checks during `VACUUM` operations, addressing potential protocol discrepancies and mitigating the risk of data corruption due to skipped writer checks.

Enablement:
- The table must be on Writer Version 7 and Reader Version 3.
- The feature `vacuumProtocolCheck` must exist in the table `protocol`'s `writerFeatures` and `readerFeatures`.
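
As an illustration only, the two enablement conditions can be checked against a table's `protocol` action. This is a minimal sketch: the helper name `vacuum_protocol_check_enabled` is hypothetical, and the version test mirrors the wording above literally (exactly Reader Version 3 and Writer Version 7).

```python
import json

FEATURE = "vacuumProtocolCheck"


def vacuum_protocol_check_enabled(protocol: dict) -> bool:
    """Return True when the protocol action satisfies both enablement conditions."""
    return (
        # Mirrors the text literally: Reader Version 3 and Writer Version 7.
        protocol.get("minReaderVersion") == 3
        and protocol.get("minWriterVersion") == 7
        # The feature must be listed in both feature sets.
        and FEATURE in protocol.get("readerFeatures", [])
        and FEATURE in protocol.get("writerFeatures", [])
    )


# Example protocol action, shaped like an entry in a Delta log commit file.
action = json.loads("""
{"protocol": {"minReaderVersion": 3, "minWriterVersion": 7,
  "readerFeatures": ["vacuumProtocolCheck"],
  "writerFeatures": ["vacuumProtocolCheck"]}}
""")

print(vacuum_protocol_check_enabled(action["protocol"]))
```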

## Writer Requirements for Vacuum Protocol Check

This feature affects only VACUUM operations; standard commits are unaffected.

Before performing a VACUUM operation, writers must check the table's write protocol. The simplest implementation is an unconditional write protocol check for all tables, which removes the need to examine individual table properties.
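
The unconditional check could look roughly like the sketch below. It is not taken from any Delta implementation: the `vacuum` function, the `UnsupportedProtocolError` exception, the capability constants, and the `delete_unreferenced_files` callback are all hypothetical names chosen for illustration. The point it shows is that both protocol checks run before any file is deleted, regardless of which features the table lists.

```python
class UnsupportedProtocolError(Exception):
    """Raised when the table requires something this client does not support."""


# Hypothetical capabilities of this client.
MAX_READER_VERSION = 3
MAX_WRITER_VERSION = 7
SUPPORTED_READER_FEATURES = {"vacuumProtocolCheck"}
SUPPORTED_WRITER_FEATURES = {"vacuumProtocolCheck", "appendOnly"}


def _check(kind, min_version, features, max_version, supported):
    # Reject tables that need a newer protocol version than this client has.
    if min_version > max_version:
        raise UnsupportedProtocolError(f"{kind} version {min_version} is not supported")
    # Reject tables that list any feature this client does not know.
    unknown = set(features) - supported
    if unknown:
        raise UnsupportedProtocolError(f"unsupported {kind} features: {sorted(unknown)}")


def vacuum(protocol: dict, delete_unreferenced_files):
    """Run the reader and writer protocol checks, then delete files."""
    _check("reader", protocol["minReaderVersion"],
           protocol.get("readerFeatures", []),
           MAX_READER_VERSION, SUPPORTED_READER_FEATURES)
    _check("writer", protocol["minWriterVersion"],
           protocol.get("writerFeatures", []),
           MAX_WRITER_VERSION, SUPPORTED_WRITER_FEATURES)
    # Only reached when both checks pass.
    return delete_unreferenced_files()
```

Because the checks do not depend on whether `vacuumProtocolCheck` is listed, a client written this way also refuses to vacuum a table that enables some later writer feature it does not understand.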

Writers that do not implement VACUUM do not need to change anything and can safely write to tables that enable the feature.

## Recommendations for Readers of Tables with Vacuum Protocol Check feature

For tables with Vacuum Protocol Check enabled, readers don’t need to understand or change anything new; they just need to acknowledge the feature exists.
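
In practice, "acknowledging" the feature usually amounts to adding its name to the set of reader features a client accepts. A minimal sketch, assuming a hypothetical `can_read` helper and a client whose highest supported reader version is 3:

```python
# Hypothetical set of reader features this client recognizes.
SUPPORTED_READER_FEATURES = {"vacuumProtocolCheck"}
MAX_READER_VERSION = 3


def can_read(protocol: dict) -> bool:
    """Return True when every listed reader feature is recognized."""
    if protocol["minReaderVersion"] > MAX_READER_VERSION:
        return False
    # No read-path behavior changes; the name only has to be known.
    return set(protocol.get("readerFeatures", [])) <= SUPPORTED_READER_FEATURES
```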
