Skip to content

Commit

Permalink
Dynamic Partition Overwrite
Browse files Browse the repository at this point in the history
jaceklaskowski committed Feb 11, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
1 parent 155bf40 commit ba48bbf
Showing 12 changed files with 67 additions and 18 deletions.
2 changes: 1 addition & 1 deletion docs/commands/DeltaCommand.md
Original file line number Diff line number Diff line change
@@ -247,7 +247,7 @@ createSetTransaction(

* `DeleteCommand` is requested to [performDelete](delete/DeleteCommand.md#performDelete)
* `MergeIntoCommand` is requested to [commitAndRecordStats](merge/MergeIntoCommand.md#commitAndRecordStats)
* `UpdateCommand` is requested to [performUpdate](update/UpdateCommand.md#performUpdate)
* [Update](update/index.md) command is executed
* `WriteIntoDelta` is requested to [write data out](WriteIntoDelta.md#write)

## Logging
2 changes: 1 addition & 1 deletion docs/commands/WriteIntoDelta.md
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@

`WriteIntoDelta` is created when:

* [DeltaDynamicPartitionOverwriteCommand](DeltaDynamicPartitionOverwriteCommand.md) is executed
* [DeltaDynamicPartitionOverwriteCommand](../dynamic-partition-overwrite/DeltaDynamicPartitionOverwriteCommand.md) is executed
* `DeltaLog` is requested to [create an insertable HadoopFsRelation](../DeltaLog.md#createRelation) (when `DeltaDataSource` is requested to create a relation as a [CreatableRelationProvider](../spark-connector/DeltaDataSource.md#CreatableRelationProvider) or a [RelationProvider](../spark-connector/DeltaDataSource.md#RelationProvider))
* `DeltaCatalog` is requested to [create a delta table](../DeltaCatalog.md#createDeltaTable)
* `WriteIntoDeltaBuilder` is requested to [build a V1Write](../WriteIntoDeltaBuilder.md#build)
4 changes: 2 additions & 2 deletions docs/commands/delete/DeleteCommand.md
Original file line number Diff line number Diff line change
@@ -37,15 +37,15 @@
sparkSession: SparkSession): Seq[Row]
```

`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) abstraction.
`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/#run)) abstraction.

`run` requests the [TahoeFileIndex](#tahoeFileIndex) for the [DeltaLog](../../TahoeFileIndex.md#deltaLog) (and [asserts that the table is removable](../../DeltaLog.md#assertRemovable)).

`run` requests the `DeltaLog` to [start a new transaction](../../DeltaLog.md#withNewTransaction) for [performDelete](#performDelete).

In the end, `run` re-caches all cached plans (incl. this relation itself) by requesting the `CacheManager` ([Spark SQL]({{ book.spark_sql }}/CacheManager)) to recache the [target](#target).

## <span id="performDelete"> performDelete
## performDelete { #performDelete }

```scala
performDelete(
2 changes: 1 addition & 1 deletion docs/commands/merge/MergeIntoCommandBase.md
Original file line number Diff line number Diff line change
@@ -238,7 +238,7 @@ In the end, `buildTargetPlanWithIndex` creates a `Project` logical operator with
spark: SparkSession): Seq[Row]
```

`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) abstraction.
`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/#run)) abstraction.

`run` is a transactional operation that is made up of the following steps:

20 changes: 11 additions & 9 deletions docs/commands/update/UpdateCommand.md
Original file line number Diff line number Diff line change
@@ -28,18 +28,20 @@ Name | web UI
`scanTimeMs` | time taken to scan the files for matches
`rewriteTimeMs` | time taken to rewrite the matched files

## <span id="run"> Executing Command
## <span id="performUpdate"><span id="rewriteFiles"><span id="buildUpdatedColumns"> Executing Command { #run }

```scala
run(
sparkSession: SparkSession): Seq[Row]
```
??? note "RunnableCommand"

```scala
run(
sparkSession: SparkSession): Seq[Row]
```

`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) abstraction.
`run` is part of the `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/#run)) abstraction.

`run`...FIXME

### <span id="performUpdate"> performUpdate
### performUpdate

```scala
performUpdate(
@@ -50,7 +52,7 @@ performUpdate(

`performUpdate`...FIXME

### <span id="rewriteFiles"> rewriteFiles
### rewriteFiles

```scala
rewriteFiles(
@@ -64,7 +66,7 @@ rewriteFiles(

`rewriteFiles`...FIXME

### <span id="buildUpdatedColumns"> buildUpdatedColumns
### buildUpdatedColumns

```scala
buildUpdatedColumns(
13 changes: 13 additions & 0 deletions docs/configuration-properties/index.md
Original file line number Diff line number Diff line change
@@ -184,6 +184,19 @@ Default: `3`

Default: `.s3-optimization-`

### <span id="dynamicPartitionOverwrite.enabled"><span id="DYNAMIC_PARTITION_OVERWRITE_ENABLED"> dynamicPartitionOverwrite.enabled { #spark.databricks.delta.dynamicPartitionOverwrite.enabled }

**spark.databricks.delta.dynamicPartitionOverwrite.enabled**

**(internal)** Enables [Dynamic Partition Overwrite](../dynamic-partition-overwrite/index.md) (whether to overwrite partitions dynamically when `partitionOverwriteMode` is `dynamic` in either the SQL configuration or a `DataFrameWriter` option).
When disabled, `partitionOverwriteMode` will be ignored.

Default: `true`

Used when:

* `DeltaWriteOptionsImpl` is requested to [isDynamicPartitionOverwriteMode](../spark-connector/DeltaWriteOptionsImpl.md#isDynamicPartitionOverwriteMode)

### <span id="spark.databricks.delta.history.maxKeysPerList"><span id="DELTA_HISTORY_PAR_SEARCH_THRESHOLD"> history.maxKeysPerList { #history.maxKeysPerList }

**spark.databricks.delta.history.maxKeysPerList**
4 changes: 4 additions & 0 deletions docs/dynamic-partition-overwrite/.pages
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
title: Dynamic Partition Overwrite
nav:
- index.md
- ...
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# DeltaDynamicPartitionOverwriteCommand

`DeltaDynamicPartitionOverwriteCommand` is a `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) and a `V2WriteCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/V2WriteCommand/)) for dynamic partition overwrite using [WriteIntoDelta](WriteIntoDelta.md).
`DeltaDynamicPartitionOverwriteCommand` is a `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) and a `V2WriteCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/V2WriteCommand/)) for dynamic partition overwrite using [WriteIntoDelta](../commands/WriteIntoDelta.md).

`DeltaDynamicPartitionOverwriteCommand` sets [partitionOverwriteMode](../spark-connector/options.md#partitionOverwriteMode) option as `DYNAMIC` before [write](#run).

23 changes: 23 additions & 0 deletions docs/dynamic-partition-overwrite/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
---
hide:
- toc
---

# Dynamic Partition Overwrite

**Dynamic Partition Overwrite** is a `DataFrameWriter` ([Spark SQL]({{ book.spark_sql }}/DataFrameWriter)) feature that allows to only overwrite partitions (of a partitioned delta table) that have data written into it.

Dynamic Partition Overwrite can be enabled system-wide using [spark.databricks.delta.dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) configuration property.

??? note "DeltaIllegalArgumentException"
It is a [DeltaIllegalArgumentException](../spark-connector/DeltaWriteOptionsImpl.md#isDynamicPartitionOverwriteMode) for [spark.databricks.delta.dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) configuration property disabled yet the Dynamic Partition Overwrite Mode is [dynamic](../spark-connector/DeltaOptions.md#DYNAMIC).

!!! note "Conflicts with `replaceWhere`"
Dynamic Partition Overwrite cannot be used with [replaceWhere](../spark-connector/options.md#replaceWhere) option as they both specify which data to overwrite.

## Partition Overwrite Mode

**Partition Overwrite Mode** can be one of the following values (case-insensitive):

* [dynamic](../spark-connector/DeltaOptions.md#DYNAMIC)
* [static](../spark-connector/DeltaOptions.md#STATIC)
9 changes: 8 additions & 1 deletion docs/spark-connector/DeltaWriteOptionsImpl.md
Original file line number Diff line number Diff line change
@@ -55,7 +55,14 @@ rearrangeOnly: Boolean
isDynamicPartitionOverwriteMode: Boolean
```

`isDynamicPartitionOverwriteMode`...FIXME
`isDynamicPartitionOverwriteMode` determines the **partition overwrite mode** based on the value of [partitionOverwriteMode](#partitionOverwriteMode) option (in the [options](DeltaOptionParser.md#options)), if specified, or defaults to the value of `spark.sql.sources.partitionOverwriteMode` ([Spark SQL]({{ book.spark_sql }}/configuration-properties/#spark.sql.sources.partitionOverwriteMode)).

??? note "DeltaIllegalArgumentException: `DYNAMIC` mode with Dynamic Partition Overwrite disabled"
For `DYNAMIC` partition overwrite mode and [dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) disabled, `isDynamicPartitionOverwriteMode` reports a [DeltaIllegalArgumentException](../DeltaErrors.md#deltaDynamicPartitionOverwriteDisabled).

With [dynamicPartitionOverwrite.enabled](../configuration-properties/index.md#dynamicPartitionOverwrite.enabled) disabled and the partition overwrite mode is anything but [DYNAMIC](../spark-connector/DeltaOptions.md#DYNAMIC), `isDynamicPartitionOverwriteMode` is off (returns `false`).

Otherwise, `isDynamicPartitionOverwriteMode` is whether the partition overwrite mode is dynamic or not (static).

---

2 changes: 1 addition & 1 deletion docs/spark-connector/options.md
Original file line number Diff line number Diff line change
@@ -94,7 +94,7 @@ Mutually exclusive with [replaceWhere](#replaceWhere)

Used when:

* `DeltaDynamicPartitionOverwriteCommand` is executed
* [DeltaDynamicPartitionOverwriteCommand](../dynamic-partition-overwrite/DeltaDynamicPartitionOverwriteCommand.md) is executed (and sets `partitionOverwriteMode` to `DYNAMIC`)
* `DeltaWriteOptionsImpl` is requested to [isDynamicPartitionOverwriteMode](DeltaWriteOptionsImpl.md#isDynamicPartitionOverwriteMode) and for [partitionOverwriteModeInOptions](DeltaWriteOptionsImpl.md#partitionOverwriteModeInOptions)
* `WriteIntoDeltaBuilder` is requested to [overwriteDynamicPartitions](../WriteIntoDeltaBuilder.md#overwriteDynamicPartitions)

2 changes: 1 addition & 1 deletion mkdocs.yml
Original file line number Diff line number Diff line change
@@ -164,6 +164,7 @@ nav:
- ... | data-skipping/**.md
- ... | deletion-vectors/**.md
- developer-api.md
- ... | dynamic-partition-overwrite/**.md
- ... | generated-columns/**.md
- installation.md
- ... | liquid-clustering/**.md
@@ -282,7 +283,6 @@ nav:
- create-table-like/index.md
- Delete:
- ... | flat | commands/delete/**.md
- DeltaDynamicPartitionOverwriteCommand: commands/DeltaDynamicPartitionOverwriteCommand.md
- DESCRIBE DETAIL:
- ... | flat | commands/describe-detail/**.md
- DESCRIBE HISTORY:

0 comments on commit ba48bbf

Please sign in to comment.