From c9e8a6b94b7b95ebb9d3db41ad7077b14531ff2b Mon Sep 17 00:00:00 2001
From: Jacek Laskowski
Date: Fri, 1 Mar 2024 21:02:43 +0100
Subject: [PATCH] [CDF] Metadata Check for CDF Columns

---
 docs/OptimisticTransactionImpl.md             | 45 +++++++++++++------
 docs/change-data-feed/CDCReaderImpl.md        |  8 ++--
 .../ChangeDataFeedTableFeature.md             |  4 +-
 .../FeatureAutomaticallyEnabledByMetadata.md  |  3 +-
 4 files changed, 39 insertions(+), 21 deletions(-)

diff --git a/docs/OptimisticTransactionImpl.md b/docs/OptimisticTransactionImpl.md
index b56bcdf2f3..2e2eb55149 100644
--- a/docs/OptimisticTransactionImpl.md
+++ b/docs/OptimisticTransactionImpl.md
@@ -6,13 +6,13 @@ In other words, `OptimisticTransactionImpl` is a set of [actions](Action.md) as

 ## Contract

-###  Clock
+### Clock { #clock }

 ```scala
 clock: Clock
 ```

-###  DeltaLog
+### DeltaLog { #deltaLog }

 ```scala
 deltaLog: DeltaLog
@@ -22,7 +22,7 @@ deltaLog: DeltaLog

 `deltaLog` is part of the [TransactionalWrite](TransactionalWrite.md#deltaLog) abstraction and seems to change it to `val` (from `def`).

-###  Snapshot
+### Snapshot { #snapshot }

 ```scala
 snapshot: Snapshot
@@ -36,7 +36,7 @@ snapshot: Snapshot

 * [OptimisticTransaction](OptimisticTransaction.md)

-## Table Version at Reading Time
+## Table Version at Reading Time { #readVersion }

 ```scala
 readVersion: Long
@@ -52,7 +52,7 @@ readVersion: Long
 * `WriteIntoDelta` is requested to [write](commands/WriteIntoDelta.md#write)
 * `ImplicitMetadataOperation` is requested to [updateMetadata](ImplicitMetadataOperation.md#updateMetadata)

-## Transactional Commit
+## Transactional Commit { #commit }

 ```scala
 commit(
@@ -62,7 +62,7 @@ commit(

 `commit` attempts to commit the given [Action](Action.md)s (as part of the [Operation](Operation.md)) and gives the commit version.

-### Usage
+### Usage { #commit-usage }

 `commit` is used when:

@@ -86,14 +86,6 @@ commit(
 * [UpdateCommand](commands/update/UpdateCommand.md) is executed
 * [WriteIntoDelta](commands/WriteIntoDelta.md) is executed

-### performCdcMetadataCheck
-
-```scala
-performCdcMetadataCheck(): Unit
-```
-
-`performCdcMetadataCheck`...FIXME
-
 ### Preparing Commit

 `commit` then [prepares a commit](#prepareCommit) (that gives the final actions to commit that may be different from the given [action](Action.md)s).
@@ -832,6 +824,31 @@ In other words, `canUpdateMetadata` holds `true` when both of the following hold

 * `WriteIntoDelta` is requested to [write data out](commands/WriteIntoDelta.md#write)

+## Metadata Check for CDF Columns { #performCdcMetadataCheck }
+
+```scala
+performCdcMetadataCheck(): Unit
+```
+
+??? warning "Procedure"
+    `performCdcMetadataCheck` is a procedure (returns `Unit`) so _what happens inside stays inside_ (paraphrasing the [former advertising slogan of Las Vegas, Nevada](https://idioms.thefreedictionary.com/what+happens+in+Vegas+stays+in+Vegas)).
+
+??? note "Noop"
+    `performCdcMetadataCheck` does nothing (_noop_) when executed with either the [newMetadata](#newMetadata) registry empty or [Change Data Feed disabled on the table](change-data-feed/CDCReaderImpl.md#isCDCEnabledOnTable).
+
+For the [new metadata](#newMetadata) and [Change Data Feed feature enabled on the table](change-data-feed/CDCReaderImpl.md#isCDCEnabledOnTable), `performCdcMetadataCheck` takes the column names of the newly-assigned metadata and compares them with the [reserved column names of CDF (CDF-aware read schema)](change-data-feed/CDCReaderImpl.md#cdcReadSchema).
+
+If there are any reserved CDF column names found in the new metadata, `performCdcMetadataCheck` throws a `DeltaIllegalStateException` for the following:
+
+* CDF was not enabled previously (in the initial metadata of the table snapshot) but reserved columns are present in the new schema
+* CDF was enabled but reserved columns are present in the new metadata (i.e., in the data)
+
+---
+
+`performCdcMetadataCheck` is used when:
+
+* `OptimisticTransactionImpl` is requested to [commitImpl](#commitImpl)
+
 ## Logging

 `OptimisticTransactionImpl` is a Scala trait and logging is configured using the logger of the [implementations](#implementations).
diff --git a/docs/change-data-feed/CDCReaderImpl.md b/docs/change-data-feed/CDCReaderImpl.md
index 9f2da9ba97..0bc2bc206d 100644
--- a/docs/change-data-feed/CDCReaderImpl.md
+++ b/docs/change-data-feed/CDCReaderImpl.md
@@ -275,15 +275,15 @@ isCDCEnabledOnTable(
   spark: SparkSession): Boolean
 ```

-`isCDCEnabledOnTable` is an alias of [metadataRequiresFeatureToBeEnabled](ChangeDataFeedTableFeature.md#metadataRequiresFeatureToBeEnabled).
+`isCDCEnabledOnTable` [checks if the given metadata requires the Change Data Feed feature to be enabled](ChangeDataFeedTableFeature.md#metadataRequiresFeatureToBeEnabled) (based on [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property).

 ---

 `isCDCEnabledOnTable` is used when:

-* `OptimisticTransactionImpl` is requested to [performCdcMetadataCheck](../OptimisticTransactionImpl.md#performCdcMetadataCheck) and [performCdcColumnMappingCheck](../OptimisticTransactionImpl.md#performCdcColumnMappingCheck)
-* `WriteIntoDelta` is requested to [write](../commands/WriteIntoDelta.md#write)
-* `CDCReaderImpl` is requested to [changesToDF](#changesToDF)
+* `OptimisticTransactionImpl` is requested to [performCdcColumnMappingCheck](../OptimisticTransactionImpl.md#performCdcColumnMappingCheck) and [performCdcMetadataCheck](../OptimisticTransactionImpl.md#performCdcMetadataCheck)
+* `WriteIntoDelta` is requested to [write data out](../commands/WriteIntoDelta.md#write)
+* `CDCReaderImpl` is requested to [create a DataFrame of changes](#changesToDF)
 * `TransactionalWrite` is requested to [performCDCPartition](../TransactionalWrite.md#performCDCPartition)

 ## Logging
diff --git a/docs/change-data-feed/ChangeDataFeedTableFeature.md b/docs/change-data-feed/ChangeDataFeedTableFeature.md
index 1b5677d181..8d061cb9e1 100644
--- a/docs/change-data-feed/ChangeDataFeedTableFeature.md
+++ b/docs/change-data-feed/ChangeDataFeedTableFeature.md
@@ -1,5 +1,7 @@
 # ChangeDataFeedTableFeature

+`ChangeDataFeedTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property to control [Change Data Feed](index.md) feature.
+
 `ChangeDataFeedTableFeature` is a [LegacyWriterFeature](../table-features/LegacyWriterFeature.md) with the following properties:

 Property | Value
@@ -7,8 +9,6 @@ Property | Value
 [Name](../table-features/LegacyWriterFeature.md#name) | `changeDataFeed`
 [Minimum writer protocol version](../table-features/LegacyWriterFeature.md#minWriterVersion) | `4`

-`ChangeDataFeedTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property to control [Change Data Feed](index.md) feature.
-
 ## metadataRequiresFeatureToBeEnabled { #metadataRequiresFeatureToBeEnabled }

 ??? note "FeatureAutomaticallyEnabledByMetadata"
diff --git a/docs/table-features/FeatureAutomaticallyEnabledByMetadata.md b/docs/table-features/FeatureAutomaticallyEnabledByMetadata.md
index 4a80c4875f..5fcd91c142 100644
--- a/docs/table-features/FeatureAutomaticallyEnabledByMetadata.md
+++ b/docs/table-features/FeatureAutomaticallyEnabledByMetadata.md
@@ -12,13 +12,14 @@ metadataRequiresFeatureToBeEnabled(
   spark: SparkSession): Boolean
 ```

-Controls whether this [TableFeature](TableFeature.md) should be supported and enabled because its metadata requirements are satisfied
+Controls whether this [TableFeature](TableFeature.md) should be enabled because its metadata requirements are satisfied (e.g., a table property is enabled in the [configuration](../Metadata.md#configuration) of the given [Metadata](../Metadata.md))

 Enabled (`true`) for automatically enabled features (based on [metadata](../Metadata.md) configuration)

 See:

 * [AppendOnlyTableFeature](../append-only-tables/AppendOnlyTableFeature.md#metadataRequiresFeatureToBeEnabled)
+* [ChangeDataFeedTableFeature](../change-data-feed/ChangeDataFeedTableFeature.md#metadataRequiresFeatureToBeEnabled)
 * [DeletionVectorsTableFeature](../deletion-vectors/DeletionVectorsTableFeature.md#metadataRequiresFeatureToBeEnabled)
 * [RowTrackingFeature](../row-tracking/RowTrackingFeature.md#metadataRequiresFeatureToBeEnabled)