Commit

System-Wide Defaults of Table Properties + CDF's spark.databricks.delta.properties.defaults.enableChangeDataFeed system-wide configuration property
jaceklaskowski committed Jan 28, 2024
1 parent a06a78c commit d8c10ad
Showing 41 changed files with 120 additions and 114 deletions.
4 changes: 2 additions & 2 deletions docs/DeltaErrorsBase.md
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@ convertToDeltaRowTrackingEnabledWithoutStatsCollection: Throwable
* `messageParameters`:
* [spark.databricks.delta.stats.collect](configuration-properties/index.md#DELTA_COLLECT_STATS)
* The [default session config key](table-features/TableFeatureProtocolUtils.md#defaultPropertyKey) of [RowTrackingFeature](row-tracking/RowTrackingFeature.md)
* The [default table property key](DeltaConfig.md#defaultTablePropertyKey) of [delta.enableRowTracking](DeltaConfigs.md#ROW_TRACKING_ENABLED)
* The [default table property key](table-properties/DeltaConfig.md#defaultTablePropertyKey) of [delta.enableRowTracking](table-properties/DeltaConfigs.md#ROW_TRACKING_ENABLED)

---

@@ -32,7 +32,7 @@ modifyAppendOnlyTableException(
* `errorClass`: `DELTA_CANNOT_MODIFY_APPEND_ONLY`
* `messageParameters`:
* The given `tableName`
* [delta.appendOnly](DeltaConfigs.md#IS_APPEND_ONLY)
* [delta.appendOnly](table-properties/DeltaConfigs.md#IS_APPEND_ONLY)

---

6 changes: 3 additions & 3 deletions docs/DeltaLog.md
@@ -380,7 +380,7 @@ getSnapshotAt(
checkpointInterval: Int
```

`checkpointInterval` is the current value of [checkpointInterval](DeltaConfigs.md#CHECKPOINT_INTERVAL) table property ([from](DeltaConfigs.md#fromMetaData) the [Metadata](#metadata)).
`checkpointInterval` is the current value of [checkpointInterval](table-properties/DeltaConfigs.md#CHECKPOINT_INTERVAL) table property ([from](table-properties/DeltaConfigs.md#fromMetaData) the [Metadata](#metadata)).

`checkpointInterval` is used when:

@@ -573,7 +573,7 @@ minFileRetentionTimestamp: Long
tombstoneRetentionMillis: Long
```

`tombstoneRetentionMillis` gives the value of [deletedFileRetentionDuration](DeltaConfigs.md#TOMBSTONE_RETENTION) table property ([from](DeltaConfigs.md#fromMetaData) the [Metadata](#metadata)).
`tombstoneRetentionMillis` gives the value of [deletedFileRetentionDuration](table-properties/DeltaConfigs.md#TOMBSTONE_RETENTION) table property ([from](table-properties/DeltaConfigs.md#fromMetaData) the [Metadata](#metadata)).

`tombstoneRetentionMillis` is used when:

@@ -707,7 +707,7 @@ assertRemovable(): Unit
??? warning "Procedure"
`assertRemovable` is a procedure (returns `Unit`) so _what happens inside stays inside_ (paraphrasing the [former advertising slogan of Las Vegas, Nevada](https://idioms.thefreedictionary.com/what+happens+in+Vegas+stays+in+Vegas)).

With [delta.appendOnly](DeltaConfigs.md#IS_APPEND_ONLY) table property enabled, `assertRemovable` throws a [DeltaUnsupportedOperationException](DeltaErrors.md#modifyAppendOnlyTableException).
With [delta.appendOnly](table-properties/DeltaConfigs.md#IS_APPEND_ONLY) table property enabled, `assertRemovable` throws a [DeltaUnsupportedOperationException](DeltaErrors.md#modifyAppendOnlyTableException).

---

2 changes: 1 addition & 1 deletion docs/FileAction.md
@@ -17,7 +17,7 @@ Isolation Level | Description
[SnapshotIsolation](IsolationLevel.md#SnapshotIsolation) | No data changes (`dataChange` is `false` for all `FileAction`s to be committed)
[Serializable](IsolationLevel.md#Serializable) |  

There can be no [RemoveFile](RemoveFile.md)s with `dataChange` enabled for [appendOnly](DeltaConfigs.md#appendOnly) unmodifiable tables (or an [UnsupportedOperationException is thrown](DeltaLog.md#assertRemovable)).
There can be no [RemoveFile](RemoveFile.md)s with `dataChange` enabled for [appendOnly](table-properties/DeltaConfigs.md#appendOnly) unmodifiable tables (or an [UnsupportedOperationException is thrown](DeltaLog.md#assertRemovable)).

dataChange Value | When
-----------------|---------
2 changes: 1 addition & 1 deletion docs/InitialSnapshot.md
@@ -33,7 +33,7 @@ Snapshot | Value

Metadata | Value
---------|------
[configuration](#configuration) | [mergeGlobalConfigs](DeltaConfigs.md#mergeGlobalConfigs)
[configuration](#configuration) | [mergeGlobalConfigs](table-properties/DeltaConfigs.md#mergeGlobalConfigs)
[createdTime](#createdTime) | Current time (in ms)

## <span id="computedState"> computedState
2 changes: 1 addition & 1 deletion docs/Metadata.md
@@ -65,7 +65,7 @@ deltaLog.snapshot.metadata.id
columnMappingMode: DeltaColumnMappingMode
```

`columnMappingMode` is the value of [columnMapping.mode](DeltaConfigs.md#COLUMN_MAPPING_MODE) table property ([from this Metadata](DeltaConfig.md#fromMetaData)).
`columnMappingMode` is the value of [columnMapping.mode](table-properties/DeltaConfigs.md#COLUMN_MAPPING_MODE) table property ([from this Metadata](table-properties/DeltaConfig.md#fromMetaData)).

`columnMappingMode` is used when:

4 changes: 2 additions & 2 deletions docs/MetadataCleanup.md
@@ -13,11 +13,11 @@

### <span id="enableExpiredLogCleanup"> enableExpiredLogCleanup

`MetadataCleanup` uses [enableExpiredLogCleanup](DeltaConfigs.md#ENABLE_EXPIRED_LOG_CLEANUP) table configuration to enable [log cleanup](#doLogCleanup).
`MetadataCleanup` uses [enableExpiredLogCleanup](table-properties/DeltaConfigs.md#ENABLE_EXPIRED_LOG_CLEANUP) table configuration to enable [log cleanup](#doLogCleanup).

### <span id="deltaRetentionMillis"> logRetentionDuration

`MetadataCleanup` uses [logRetentionDuration](DeltaConfigs.md#LOG_RETENTION) table configuration for [cleanUpExpiredLogs](#cleanUpExpiredLogs) (to determine `fileCutOffTime`).
`MetadataCleanup` uses [logRetentionDuration](table-properties/DeltaConfigs.md#LOG_RETENTION) table configuration for [cleanUpExpiredLogs](#cleanUpExpiredLogs) (to determine `fileCutOffTime`).

## <span id="doLogCleanup"> Cleaning Up Expired Logs

4 changes: 2 additions & 2 deletions docs/OptimisticTransactionImpl.md
@@ -117,7 +117,7 @@ With all [action](FileAction.md)s with [dataChange](FileAction.md#dataChange) fl

### <span id="commit-registerPostCommitHook"> Registering Post-Commit Hook

`commit` [registers](#registerPostCommitHook) the [GenerateSymlinkManifest](post-commit-hooks/GenerateSymlinkManifest.md) post-commit hook when there is a [FileAction](FileAction.md) among the actions and the [compatibility.symlinkFormatManifest.enabled](DeltaConfigs.md#SYMLINK_FORMAT_MANIFEST_ENABLED) table property is enabled.
`commit` [registers](#registerPostCommitHook) the [GenerateSymlinkManifest](post-commit-hooks/GenerateSymlinkManifest.md) post-commit hook when there is a [FileAction](FileAction.md) among the actions and the [compatibility.symlinkFormatManifest.enabled](table-properties/DeltaConfigs.md#SYMLINK_FORMAT_MANIFEST_ENABLED) table property is enabled.

### <span id="commit-doCommitRetryIteratively"><span id="commit-commitVersion"><span id="commit-needsCheckpoint"> doCommitRetryIteratively

@@ -393,7 +393,7 @@ Attempting to commit version [attemptVersion] with [n] actions with [isolationLe
A commit triggers checkpointing when the following all hold:

1. The committed version is any version greater than `0`
1. The committed version is a multiple of [delta.checkpointInterval](DeltaConfigs.md#CHECKPOINT_INTERVAL) table property
1. The committed version is a multiple of [delta.checkpointInterval](table-properties/DeltaConfigs.md#CHECKPOINT_INTERVAL) table property
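The two conditions above can be sketched as a small predicate. This is a minimal illustration, not Delta Lake's actual code: `needsCheckpoint` is a hypothetical helper name, and the interval is passed in directly (and assumed to be positive) rather than read from the table metadata.

```scala
// Hypothetical helper: models only the two conditions listed above.
// Assumes checkpointInterval > 0.
def needsCheckpoint(committedVersion: Long, checkpointInterval: Int): Boolean =
  committedVersion > 0 && committedVersion % checkpointInterval == 0

// With a checkpoint interval of 10, find the versions in 0..30 that qualify.
val interval = 10
val checkpointed = (0L to 30L).filter(needsCheckpoint(_, interval))
// checkpointed holds 10, 20 and 30; version 0 is excluded by the first condition
```

A smaller interval produces checkpoints more often, which shortens log replay at the cost of more checkpoint files.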

### <span id="doCommit-stats"> CommitStats

4 changes: 2 additions & 2 deletions docs/Protocol.md
@@ -114,7 +114,7 @@ Protocol(0, 2)

### <span id="requiredMinimumProtocol-appendOnly"> Append-Only Table

`requiredMinimumProtocol` reads [appendOnly](DeltaConfigs.md#IS_APPEND_ONLY) table property (from the [table configuration](Metadata.md#configuration) of the given [Metadata](Metadata.md)).
`requiredMinimumProtocol` reads [appendOnly](table-properties/DeltaConfigs.md#IS_APPEND_ONLY) table property (from the [table configuration](Metadata.md#configuration) of the given [Metadata](Metadata.md)).

If set, `requiredMinimumProtocol` creates a new [Protocol](#creating-instance) with the [minWriterVersion](#minWriterVersion) to be `3`.

@@ -144,7 +144,7 @@ Protocol(0, 4)

### <span id="requiredMinimumProtocol-change-data-feed"> Change Data Feed

`requiredMinimumProtocol` checks whether [delta.enableChangeDataFeed](DeltaConfigs.md#CHANGE_DATA_FEED) table property is enabled (in the given [Metadata](Metadata.md)).
`requiredMinimumProtocol` checks whether [delta.enableChangeDataFeed](table-properties/DeltaConfigs.md#CHANGE_DATA_FEED) table property is enabled (in the given [Metadata](Metadata.md)).

If enabled, `requiredMinimumProtocol` creates a new [Protocol](#creating-instance) with the [minWriterVersion](#minWriterVersion) to be `4`.

2 changes: 1 addition & 1 deletion docs/Snapshot.md
@@ -46,7 +46,7 @@ init(): Unit
numIndexedCols: Int
```

`numIndexedCols` is the value of [dataSkippingNumIndexedCols](DeltaConfigs.md#DATA_SKIPPING_NUM_INDEXED_COLS) table property.
`numIndexedCols` is the value of [dataSkippingNumIndexedCols](table-properties/DeltaConfigs.md#DATA_SKIPPING_NUM_INDEXED_COLS) table property.

??? note "Lazy Value"
`numIndexedCols` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
2 changes: 1 addition & 1 deletion docs/TransactionalWrite.md
@@ -222,7 +222,7 @@ getOptionalStatsTrackerAndStatsCollection(

`getOptionalStatsTrackerAndStatsCollection` [getStatsSchema](#getStatsSchema) (for the given `output` and `partitionSchema`).

`getOptionalStatsTrackerAndStatsCollection` reads the value of [delta.dataSkippingNumIndexedCols](DeltaConfigs.md#DATA_SKIPPING_NUM_INDEXED_COLS) table property (from the [Metadata](OptimisticTransactionImpl.md#metadata)).
`getOptionalStatsTrackerAndStatsCollection` reads the value of [delta.dataSkippingNumIndexedCols](table-properties/DeltaConfigs.md#DATA_SKIPPING_NUM_INDEXED_COLS) table property (from the [Metadata](OptimisticTransactionImpl.md#metadata)).

`getOptionalStatsTrackerAndStatsCollection` creates a [StatisticsCollection](StatisticsCollection.md) (with the [tableDataSchema](StatisticsCollection.md#tableDataSchema) based on [spark.databricks.delta.stats.collect.using.tableSchema](configuration-properties/DeltaSQLConf.md#DELTA_COLLECT_STATS_USING_TABLE_SCHEMA) configuration property).

4 changes: 2 additions & 2 deletions docs/append-only-tables/AppendOnlyTableFeature.md
@@ -7,7 +7,7 @@ Property | Value
[Name](../table-features/LegacyWriterFeature.md#name) | `appendOnly`
[Minimum writer protocol version](../table-features/LegacyWriterFeature.md#minWriterVersion) | `2`

`AppendOnlyTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.appendOnly](../DeltaConfigs.md#appendOnly) table property to control [Append-Only Tables](index.md) feature.
`AppendOnlyTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.appendOnly](../table-properties/DeltaConfigs.md#appendOnly) table property to control [Append-Only Tables](index.md) feature.

## metadataRequiresFeatureToBeEnabled { #metadataRequiresFeatureToBeEnabled }

@@ -21,4 +21,4 @@ Property | Value

`metadataRequiresFeatureToBeEnabled` is part of the [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md#metadataRequiresFeatureToBeEnabled) abstraction.

`metadataRequiresFeatureToBeEnabled` is the value of [delta.appendOnly](../DeltaConfigs.md#IS_APPEND_ONLY) table property (from the [Metadata](../DeltaConfig.md#fromMetaData)).
`metadataRequiresFeatureToBeEnabled` is the value of [delta.appendOnly](../table-properties/DeltaConfigs.md#IS_APPEND_ONLY) table property (from the [Metadata](../table-properties/DeltaConfig.md#fromMetaData)).
4 changes: 2 additions & 2 deletions docs/append-only-tables/index.md
@@ -11,11 +11,11 @@ hide:
* `DeltaSink` to [addBatch](../spark-connector/DeltaSink.md#addBatch) in `Complete` output mode
* [RemoveFile](../RemoveFile.md)s with [dataChange](../RemoveFile.md#dataChange) (at [prepareCommit](../OptimisticTransactionImpl.md#prepareCommit))

Append-Only Tables is enabled on a delta table using [delta.appendOnly](../DeltaConfigs.md#IS_APPEND_ONLY) table property (indirectly, through [AppendOnlyTableFeature](AppendOnlyTableFeature.md) that is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) and uses this table property).
Append-Only Tables is enabled on a delta table using [delta.appendOnly](../table-properties/DeltaConfigs.md#IS_APPEND_ONLY) table property (indirectly, through [AppendOnlyTableFeature](AppendOnlyTableFeature.md) that is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) and uses this table property).

## Demo

Create a delta table with [delta.appendOnly](../DeltaConfigs.md#appendOnly) table property enabled.
Create a delta table with [delta.appendOnly](../table-properties/DeltaConfigs.md#appendOnly) table property enabled.

=== "SQL"

4 changes: 2 additions & 2 deletions docs/auto-compaction/AutoCompactBase.md
@@ -70,8 +70,8 @@ getAutoCompactType(
`getAutoCompactType` is the value of the following (in the order of precedence):

1. [spark.databricks.delta.autoCompact.enabled](../configuration-properties/DeltaSQLConf.md#autoCompact.enabled), if configured.
1. [delta.autoOptimize](../DeltaConfigs.md#AUTO_OPTIMIZE) table property
1. [delta.autoOptimize.autoCompact](../DeltaConfigs.md#AUTO_COMPACT) table property
1. [delta.autoOptimize](../table-properties/DeltaConfigs.md#AUTO_OPTIMIZE) table property
1. [delta.autoOptimize.autoCompact](../table-properties/DeltaConfigs.md#AUTO_COMPACT) table property

`getAutoCompactType` defaults to `false`.
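The precedence order above can be read as a chain of fallbacks. The sketch below is a simplified model under stated assumptions: each source is represented as an `Option[Boolean]` (with `None` meaning "not set"), and `resolveAutoCompact` is a hypothetical name. The real method works with Delta Lake's own types, so treat this only as an illustration of the ordering.

```scala
// Simplified model of the precedence list above (names are illustrative).
// sessionConf  ~ spark.databricks.delta.autoCompact.enabled
// autoOptimize ~ delta.autoOptimize table property
// autoCompact  ~ delta.autoOptimize.autoCompact table property
def resolveAutoCompact(
    sessionConf: Option[Boolean],
    autoOptimize: Option[Boolean],
    autoCompact: Option[Boolean]): Boolean =
  sessionConf
    .orElse(autoOptimize)
    .orElse(autoCompact)
    .getOrElse(false) // nothing configured: fall back to false

// The session-level setting wins even when the table properties disagree.
val sessionWins = resolveAutoCompact(Some(false), Some(true), Some(true))
// With nothing set at session level, the table properties are consulted in order.
val tableProperty = resolveAutoCompact(None, None, Some(true))
```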

4 changes: 2 additions & 2 deletions docs/auto-compaction/index.md
@@ -4,9 +4,9 @@

Auto Compaction can be enabled system-wide using [spark.databricks.delta.autoCompact.enabled](../configuration-properties/index.md#spark.databricks.delta.autoCompact.enabled) configuration property.

Auto Compaction can be enabled on a delta table using [delta.autoOptimize.autoCompact](../DeltaConfigs.md#autoOptimize.autoCompact) table property.
Auto Compaction can be enabled on a delta table using [delta.autoOptimize.autoCompact](../table-properties/DeltaConfigs.md#autoOptimize.autoCompact) table property.

??? note "delta.autoOptimize Table Property is Deprecated"
[delta.autoOptimize](../DeltaConfigs.md#delta.autoOptimize) table property is deprecated.
[delta.autoOptimize](../table-properties/DeltaConfigs.md#delta.autoOptimize) table property is deprecated.

Auto Compaction uses [AutoCompact](AutoCompact.md) post-commit hook to be [executed](AutoCompactBase.md#run) at a [successful transaction commit](../OptimisticTransactionImpl.md#registerPostCommitHook) if there are files written to a delta table that can leverage compaction after a commit.
4 changes: 2 additions & 2 deletions docs/change-data-feed/ChangeDataFeedTableFeature.md
@@ -7,7 +7,7 @@ Property | Value
[Name](../table-features/LegacyWriterFeature.md#name) | `changeDataFeed`
[Minimum writer protocol version](../table-features/LegacyWriterFeature.md#minWriterVersion) | `4`

`ChangeDataFeedTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.enableChangeDataFeed](../DeltaConfigs.md#enableChangeDataFeed) table property to control [Change Data Feed](index.md) feature.
`ChangeDataFeedTableFeature` is a [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md) that uses [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property to control [Change Data Feed](index.md) feature.

## metadataRequiresFeatureToBeEnabled { #metadataRequiresFeatureToBeEnabled }

@@ -21,4 +21,4 @@ Property | Value

`metadataRequiresFeatureToBeEnabled` is part of the [FeatureAutomaticallyEnabledByMetadata](../table-features/FeatureAutomaticallyEnabledByMetadata.md#metadataRequiresFeatureToBeEnabled) abstraction.

`metadataRequiresFeatureToBeEnabled` is the value of [delta.enableChangeDataFeed](../DeltaConfigs.md#enableChangeDataFeed) table property in (the [configuration](../Metadata.md#configuration) of) the given [Metadata](../Metadata.md).
`metadataRequiresFeatureToBeEnabled` is the value of [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property in (the [configuration](../Metadata.md#configuration) of) the given [Metadata](../Metadata.md).
4 changes: 3 additions & 1 deletion docs/change-data-feed/index.md
@@ -4,6 +4,8 @@

Change Data Feed can be enabled on a delta table using [delta.enableChangeDataFeed](#delta.enableChangeDataFeed) table property.

Change Data Feed can be enabled globally (on all new delta tables) using [spark.databricks.delta.properties.defaults.enableChangeDataFeed](../table-properties/DeltaConfigs.md#spark.databricks.delta.properties.defaults) system-wide configuration property.
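Comparing the two property names on this page suggests how a system-wide default key relates to its table property: the `delta.` prefix is replaced with `spark.databricks.delta.properties.defaults.`. The sketch below models that mapping only. `defaultSessionKey` is a hypothetical helper name, not a Delta Lake API.

```scala
// Prefix taken from the configuration property named above.
val sqlConfPrefix = "spark.databricks.delta.properties.defaults."

// Hypothetical helper: derives the system-wide default key for a table property.
def defaultSessionKey(tableProperty: String): String =
  sqlConfPrefix + tableProperty.stripPrefix("delta.")

val cdfDefaultKey = defaultSessionKey("delta.enableChangeDataFeed")
// cdfDefaultKey == "spark.databricks.delta.properties.defaults.enableChangeDataFeed"
```

Assuming an active `SparkSession` named `spark`, the default could then be set with `spark.conf.set(cdfDefaultKey, "true")` before creating tables. As stated above, it applies to new delta tables only.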

CDF data changes are written out (by [DelayedCommitProtocol](../DelayedCommitProtocol.md)) to [_change_data](#_change_data) directory as `cdc-`-prefixed parquet-encoded change data files.

With [CDF-Aware Table Scan (CDF Read)](CDCReaderImpl.md#isCDCRead) (based on [readChangeFeed](../spark-connector/options.md#readChangeFeed) read option), [loading a delta table](../spark-connector/DeltaDataSource.md#RelationProvider-createRelation) gives data changes (not the data of a particular version of the delta table).
@@ -17,7 +19,7 @@ Change Data Feed was released in Delta Lake 2.0.0 (that was tracked under [Suppo

## delta.enableChangeDataFeed { #delta.enableChangeDataFeed }

Change Data Feed can be enabled on a delta table using [delta.enableChangeDataFeed](../DeltaConfigs.md#enableChangeDataFeed) table property (through [ChangeDataFeedTableFeature](ChangeDataFeedTableFeature.md)).
Change Data Feed can be enabled on a delta table using [delta.enableChangeDataFeed](../table-properties/DeltaConfigs.md#enableChangeDataFeed) table property (through [ChangeDataFeedTableFeature](ChangeDataFeedTableFeature.md)).

```sql
ALTER TABLE delta_demo
2 changes: 1 addition & 1 deletion docs/checkpoints/Checkpoints.md
@@ -93,7 +93,7 @@ In the end, `checkpoint` [cleans up the expired logs](../MetadataCleanup.md#doLo

`checkpoint` is used when:

* `OptimisticTransactionImpl` is requested to [postCommit](../OptimisticTransactionImpl.md#postCommit) (based on [checkpoint interval](../DeltaConfigs.md#CHECKPOINT_INTERVAL) table property)
* `OptimisticTransactionImpl` is requested to [postCommit](../OptimisticTransactionImpl.md#postCommit) (based on [checkpoint interval](../table-properties/DeltaConfigs.md#CHECKPOINT_INTERVAL) table property)
* `DeltaCommand` is requested to [updateAndCheckpoint](../commands/DeltaCommand.md#updateAndCheckpoint)

### checkpointAndCleanUpDeltaLog { #checkpointAndCleanUpDeltaLog }
2 changes: 1 addition & 1 deletion docs/checkpoints/index.md
@@ -2,7 +2,7 @@

[Delta Table Checkpoint](Checkpoints.md#checkpoint) is a process of writing out a [Snapshot](../Snapshot.md) of a delta table into one or more checkpoint files for faster state reconstruction (_future replays of the log_).

Delta Table Checkpoint happens regularly at a [transaction commit](../OptimisticTransactionImpl.md#doCommit) every [checkpoint interval](../DeltaConfigs.md#CHECKPOINT_INTERVAL) or once at a [transaction commit](../OptimisticTransactionImpl.md#updateAndCheckpoint) for the following commands:
Delta Table Checkpoint happens regularly at a [transaction commit](../OptimisticTransactionImpl.md#doCommit) every [checkpoint interval](../table-properties/DeltaConfigs.md#CHECKPOINT_INTERVAL) or once at a [transaction commit](../OptimisticTransactionImpl.md#updateAndCheckpoint) for the following commands:

* [CloneTableBase](../commands/clone/CloneTableBase.md)
* [ConvertToDeltaCommand](../commands/convert/ConvertToDeltaCommand.md)