[CDF] Metadata Check for CDF Columns
jaceklaskowski committed Mar 1, 2024
1 parent 08645d0 commit c9e8a6b
Showing 4 changed files with 39 additions and 21 deletions.
45 changes: 31 additions & 14 deletions docs/OptimisticTransactionImpl.md
@@ -6,13 +6,13 @@ In other words, `OptimisticTransactionImpl` is a set of [actions](Action.md) as

## Contract

### <span id="clock"> Clock
### Clock { #clock }

```scala
clock: Clock
```

### <span id="deltaLog"> DeltaLog
### DeltaLog { #deltaLog }

```scala
deltaLog: DeltaLog
@@ -22,7 +22,7 @@ deltaLog: DeltaLog

`deltaLog` is part of the [TransactionalWrite](TransactionalWrite.md#deltaLog) abstraction and appears to redeclare it as a `val` (rather than a `def`).

### <span id="snapshot"> Snapshot
### Snapshot { #snapshot }

```scala
snapshot: Snapshot
@@ -36,7 +36,7 @@ snapshot: Snapshot

* [OptimisticTransaction](OptimisticTransaction.md)

## <span id="readVersion"> Table Version at Reading Time
## Table Version at Reading Time { #readVersion }

```scala
readVersion: Long
@@ -52,7 +52,7 @@ readVersion: Long
* `WriteIntoDelta` is requested to [write](commands/WriteIntoDelta.md#write)
* `ImplicitMetadataOperation` is requested to [updateMetadata](ImplicitMetadataOperation.md#updateMetadata)

## <span id="commit"> Transactional Commit
## Transactional Commit { #commit }

```scala
commit(
@@ -62,7 +62,7 @@ commit(

`commit` attempts to commit the given [Action](Action.md)s (as part of the [Operation](Operation.md)) and gives the commit version.

### <span id="commit-usage"> Usage
### Usage { #commit-usage }

`commit` is used when:

@@ -86,14 +86,6 @@ commit(
* [UpdateCommand](commands/update/UpdateCommand.md) is executed
* [WriteIntoDelta](commands/WriteIntoDelta.md) is executed

### <span id="performCdcMetadataCheck"> performCdcMetadataCheck

```scala
performCdcMetadataCheck(): Unit
```

`performCdcMetadataCheck`...FIXME

### <span id="commit-prepareCommit"><span id="commit-finalActions"> Preparing Commit

`commit` then [prepares a commit](#prepareCommit) (that gives the final actions to commit that may be different from the given [action](Action.md)s).
@@ -832,6 +824,31 @@ In other words, `canUpdateMetadata` holds `true` when both of the following hold

* `WriteIntoDelta` is requested to [write data out](commands/WriteIntoDelta.md#write)

## Metadata Check for CDF Columns { #performCdcMetadataCheck }

```scala
performCdcMetadataCheck(): Unit
```

??? warning "Procedure"
    `performCdcMetadataCheck` is a procedure (returns `Unit`) so _what happens inside stays inside_ (paraphrasing the [former advertising slogan of Las Vegas, Nevada](https://idioms.thefreedictionary.com/what+happens+in+Vegas+stays+in+Vegas)).

??? note "Noop"
    `performCdcMetadataCheck` does nothing (_noop_) when either the [newMetadata](#newMetadata) registry is empty or Change Data Feed is not enabled for the new metadata (i.e., [isCDCEnabledOnTable](change-data-feed/CDCReaderImpl.md#isCDCEnabledOnTable) is `false`).

When [new metadata](#newMetadata) is defined and the [Change Data Feed feature is enabled on the table](change-data-feed/CDCReaderImpl.md#isCDCEnabledOnTable) (per that new metadata), `performCdcMetadataCheck` takes the column names of the newly-assigned metadata and compares them with the [reserved column names of CDF (CDF-aware read schema)](change-data-feed/CDCReaderImpl.md#cdcReadSchema).

If any reserved CDF column names are found in the new metadata, `performCdcMetadataCheck` throws a `DeltaIllegalStateException` in either of the following cases:

* CDF was not enabled previously (in the initial metadata of the table snapshot) but reserved columns are present in the new schema
* CDF was enabled but reserved columns are present in the new metadata (i.e., in the data)
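
The check described above can be sketched in plain Scala without any Spark or Delta Lake dependencies. This is only an illustration under stated assumptions: `SimpleMetadata`, `CdcMetadataCheckSketch`, `isCdfEnabled` and `check` are hypothetical names (not Delta Lake APIs), a plain `IllegalStateException` stands in for `DeltaIllegalStateException`, and the error messages are made up. The reserved column names are the ones the CDF-aware read schema is understood to add.

```scala
// Hypothetical, simplified stand-in for Delta Lake's Metadata (an assumption).
final case class SimpleMetadata(
    columnNames: Seq[String],
    configuration: Map[String, String])

object CdcMetadataCheckSketch {
  // Column names reserved by the CDF-aware read schema.
  val reservedCdfColumns: Seq[String] =
    Seq("_change_type", "_commit_version", "_commit_timestamp")

  // Assumption: CDF is enabled when delta.enableChangeDataFeed is "true".
  def isCdfEnabled(metadata: SimpleMetadata): Boolean =
    metadata.configuration
      .get("delta.enableChangeDataFeed")
      .exists(_.equalsIgnoreCase("true"))

  def check(
      newMetadata: Option[SimpleMetadata],
      snapshotMetadata: SimpleMetadata): Unit =
    // Noop when there is no new metadata or CDF is not enabled in it.
    newMetadata.filter(isCdfEnabled).foreach { metadata =>
      val reservedColumnsUsed =
        reservedCdfColumns.intersect(metadata.columnNames)
      if (reservedColumnsUsed.nonEmpty) {
        val reason =
          if (!isCdfEnabled(snapshotMetadata))
            "CDF was not enabled previously but reserved columns are present"
          else
            "CDF was enabled but reserved columns are present in the data"
        // Stand-in for the DeltaIllegalStateException of the real method.
        throw new IllegalStateException(
          s"$reason: ${reservedColumnsUsed.mkString(", ")}")
      }
    }
}
```

With this sketch, `check(None, snapshot)` and a call with CDF-disabled new metadata both return without doing anything, while new metadata that enables CDF and also declares a `_change_type` column throws.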

---

`performCdcMetadataCheck` is used when:

* `OptimisticTransactionImpl` is requested to [commitImpl](#commitImpl)

## Logging

`OptimisticTransactionImpl` is a Scala trait and logging is configured using the logger of the [implementations](#implementations).
8 changes: 4 additions & 4 deletions docs/change-data-feed/CDCReaderImpl.md
@@ -275,15 +275,15 @@ isCDCEnabledOnTable(
spark: SparkSession): Boolean
```

`isCDCEnabledOnTable` is an alias of [metadataRequiresFeatureToBeEnabled](ChangeDataFeedTableFeature.md#metadataRequiresFeatureToBeEnabled).
`isCDCEnabledOnTable` [checks if the given metadata requires the Change Data Feed feature to be enabled](ChangeDataFeedTableFeature.md#metadataRequiresFeatureToBeEnabled) (based on [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property).

---

`isCDCEnabledOnTable` is used when:

* `OptimisticTransactionImpl` is requested to [performCdcMetadataCheck](../OptimisticTransactionImpl.md#performCdcMetadataCheck) and [performCdcColumnMappingCheck](../OptimisticTransactionImpl.md#performCdcColumnMappingCheck)
* `WriteIntoDelta` is requested to [write](../commands/WriteIntoDelta.md#write)
* `CDCReaderImpl` is requested to [changesToDF](#changesToDF)
* `OptimisticTransactionImpl` is requested to [performCdcColumnMappingCheck](../OptimisticTransactionImpl.md#performCdcColumnMappingCheck) and [performCdcMetadataCheck](../OptimisticTransactionImpl.md#performCdcMetadataCheck)
* `WriteIntoDelta` is requested to [write data out](../commands/WriteIntoDelta.md#write)
* `CDCReaderImpl` is requested to [create a DataFrame of changes](#changesToDF)
* `TransactionalWrite` is requested to [performCDCPartition](../TransactionalWrite.md#performCDCPartition)

## Logging
4 changes: 2 additions & 2 deletions docs/change-data-feed/ChangeDataFeedTableFeature.md
@@ -1,14 +1,14 @@
# ChangeDataFeedTableFeature

`ChangeDataFeedTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property to control [Change Data Feed](index.md) feature.

`ChangeDataFeedTableFeature` is a [LegacyWriterFeature](../table-features/LegacyWriterFeature.md) with the following properties:

Property | Value
---------|------
[Name](../table-features/LegacyWriterFeature.md#name) | `changeDataFeed`
[Minimum writer protocol version](../table-features/LegacyWriterFeature.md#minWriterVersion) | `4`

`ChangeDataFeedTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property to control [Change Data Feed](index.md) feature.

## metadataRequiresFeatureToBeEnabled { #metadataRequiresFeatureToBeEnabled }

??? note "FeatureAutomaticallyEnabledByMetadata"
3 changes: 2 additions & 1 deletion docs/table-features/FeatureAutomaticallyEnabledByMetadata.md
@@ -12,13 +12,14 @@ metadataRequiresFeatureToBeEnabled(
spark: SparkSession): Boolean
```

Controls whether this [TableFeature](TableFeature.md) should be supported and enabled because its metadata requirements are satisfied
Controls whether this [TableFeature](TableFeature.md) should be enabled because its metadata requirements are satisfied (e.g., a table property is enabled in the [configuration](../Metadata.md#configuration) of the given [Metadata](../Metadata.md))

Enabled (`true`) for automatically enabled features (based on [metadata](../Metadata.md) configuration)

See:

* [AppendOnlyTableFeature](../append-only-tables/AppendOnlyTableFeature.md#metadataRequiresFeatureToBeEnabled)
* [ChangeDataFeedTableFeature](../change-data-feed/ChangeDataFeedTableFeature.md#metadataRequiresFeatureToBeEnabled)
* [DeletionVectorsTableFeature](../deletion-vectors/DeletionVectorsTableFeature.md#metadataRequiresFeatureToBeEnabled)
* [RowTrackingFeature](../row-tracking/RowTrackingFeature.md#metadataRequiresFeatureToBeEnabled)
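
To illustrate how a metadata-driven feature could be resolved, here is a minimal sketch in plain Scala. All names in it (`TableMetadata`, `AutoEnabledFeature`, `ChangeDataFeedFeatureSketch`, `FeatureResolutionSketch`) are hypothetical simplifications rather than Delta Lake classes, and the `SparkSession` parameter of the real method is left out on purpose. It assumes a feature is required when its table property is set to `true` in the metadata configuration.

```scala
// Hypothetical, simplified stand-in for Delta Lake's Metadata (an assumption).
final case class TableMetadata(configuration: Map[String, String])

// Simplified counterpart of a metadata-enabled table feature.
// The real method also takes a SparkSession, omitted in this sketch.
trait AutoEnabledFeature {
  def name: String
  def metadataRequiresFeatureToBeEnabled(metadata: TableMetadata): Boolean
}

object ChangeDataFeedFeatureSketch extends AutoEnabledFeature {
  val name: String = "changeDataFeed"

  // Assumption: the feature is required when the table property is "true".
  def metadataRequiresFeatureToBeEnabled(metadata: TableMetadata): Boolean =
    metadata.configuration
      .get("delta.enableChangeDataFeed")
      .exists(_.equalsIgnoreCase("true"))
}

object FeatureResolutionSketch {
  // Names of the candidate features whose metadata requirements are satisfied.
  def featuresToEnable(
      metadata: TableMetadata,
      candidates: Seq[AutoEnabledFeature]): Seq[String] =
    candidates
      .filter(_.metadataRequiresFeatureToBeEnabled(metadata))
      .map(_.name)
}
```

Keeping the decision behind a single Boolean method means each feature owns its own rule, and the caller only has to filter the candidates.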
