Auto Compaction
jaceklaskowski committed Jan 25, 2024
1 parent 86b2601 commit c9e96e0
Showing 9 changed files with 119 additions and 2 deletions.
8 changes: 7 additions & 1 deletion docs/OptimisticTransactionImpl.md
Original file line number Diff line number Diff line change
@@ -698,11 +698,17 @@ registerPostCommitHook(
hook: PostCommitHook): Unit
```

??? warning "Procedure"
    `registerPostCommitHook` is a procedure (returns `Unit`) so _what happens inside stays inside_ (paraphrasing the [former advertising slogan of Las Vegas, Nevada](https://idioms.thefreedictionary.com/what+happens+in+Vegas+stays+in+Vegas)).

`registerPostCommitHook` registers (_adds_) the given [PostCommitHook](post-commit-hooks/PostCommitHook.md) to the [postCommitHooks](#postCommitHooks) internal registry.
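
The registration step can be pictured with a minimal sketch. `SimpleHook` and `SimpleTxn` are hypothetical, simplified stand-ins (the real `PostCommitHook.run` takes more parameters), and skipping a hook that is already registered is an assumption of this sketch, not something stated above.

```scala
import scala.collection.mutable.ArrayBuffer

object HookRegistrySketch {
  // Hypothetical, simplified stand-in for PostCommitHook.
  trait SimpleHook {
    def name: String
    def run(committedVersion: Long): Unit
  }

  // Hypothetical stand-in for the transaction that owns the internal registry.
  class SimpleTxn {
    private val postCommitHooks = ArrayBuffer.empty[SimpleHook]

    // Adds the hook to the registry.
    // Assumption: a hook that is already registered is not added again.
    def registerPostCommitHook(hook: SimpleHook): Unit = {
      if (!postCommitHooks.contains(hook)) {
        postCommitHooks += hook
      }
    }

    // Names of the registered hooks, in registration order.
    def registeredHookNames: Seq[String] = postCommitHooks.map(_.name).toList

    // Runs every registered hook, in registration order.
    def runPostCommitHooks(committedVersion: Long): Unit =
      postCommitHooks.foreach(_.run(committedVersion))
  }
}
```

Since the method returns `Unit`, the only way to observe a registration in this sketch is through the registry itself (`registeredHookNames`) or through the side effects of the hooks once they run.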

---

`registerPostCommitHook` is used when:

* `OptimisticTransactionImpl` is created (and registers [CheckpointHook](checkpoints/CheckpointHook.md)) and [commitImpl](#commitImpl) (to register [GenerateSymlinkManifest](post-commit-hooks/GenerateSymlinkManifest.md))
* `OptimisticTransactionImpl` is requested to [commitImpl](#commitImpl)
* `TransactionalWrite` is requested to [write data out](TransactionalWrite.md#writeFiles)

## setNewProtocolWithFeaturesEnabledByMetadata { #setNewProtocolWithFeaturesEnabledByMetadata }

4 changes: 4 additions & 0 deletions docs/auto-compaction/.pages
@@ -0,0 +1,4 @@
title: Auto Compaction
nav:
- index.md
- ...
11 changes: 11 additions & 0 deletions docs/auto-compaction/AutoCompact.md
@@ -0,0 +1,11 @@
# AutoCompact

`AutoCompact` is an [AutoCompactBase](AutoCompactBase.md).

??? note "case object"
    `AutoCompact` is a `case object` in Scala which means it is a class that has exactly one instance (itself).
    A `case object` is created lazily when it is referenced, like a `lazy val`.

    Learn more in [Tour of Scala](https://docs.scala-lang.org/tour/singleton-objects.html).
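
A small sketch of the singleton behaviour of a `case object`. `HookBase` and `MyAutoCompact` are hypothetical names that only mirror the `AutoCompactBase` / `AutoCompact` relationship; they are not the Delta Lake classes.

```scala
object CaseObjectSketch {
  // Hypothetical base trait, standing in for AutoCompactBase.
  trait HookBase {
    def name: String = "Auto Compact"
  }

  // A case object: a class with exactly one instance (itself).
  case object MyAutoCompact extends HookBase

  // Both values point at the very same instance.
  val first: HookBase = MyAutoCompact
  val second: HookBase = MyAutoCompact
}
```

Reference equality (`first eq second`) holds because there is nothing else to refer to, which is why such an object can be passed around and registered without creating it first.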

`AutoCompact` is [registered](../OptimisticTransactionImpl.md#registerPostCommitHook) when `TransactionalWrite` is requested to [write data out](../TransactionalWrite.md#writeFiles), provided that new files were actually added and the write is not part of the [Optimize](../commands/optimize/index.md) command.
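
The registration condition can be read as a simple predicate. The helper and its parameter names below are hypothetical and only restate the two conditions from the sentence above; how `TransactionalWrite` actually detects them is not covered here.

```scala
object RegistrationSketch {
  // Hypothetical helper: register the hook only when the write added files
  // and the write does not come from the Optimize command.
  def shouldRegisterAutoCompact(numAddedFiles: Int, isOptimizeCommand: Boolean): Boolean =
    numAddedFiles > 0 && !isOptimizeCommand
}
```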
60 changes: 60 additions & 0 deletions docs/auto-compaction/AutoCompactBase.md
@@ -0,0 +1,60 @@
# AutoCompactBase

`AutoCompactBase` is an [extension](#contract) of the [PostCommitHook](../post-commit-hooks/PostCommitHook.md) abstraction for [post-commit hooks](#implementations) that [perform auto compaction](#run).

## Implementations

* [AutoCompact](AutoCompact.md)

## Name { #name }

??? note "PostCommitHook"

    ```scala
    name: String
    ```

    `name` is part of the [PostCommitHook](../post-commit-hooks/PostCommitHook.md#name) abstraction.

`name` is **Auto Compact**.

## Executing Post-Commit Hook { #run }

??? note "PostCommitHook"

    ```scala
    run(
      spark: SparkSession,
      txn: OptimisticTransactionImpl,
      committedVersion: Long,
      postCommitSnapshot: Snapshot,
      actions: Seq[Action]): Unit
    ```

    `run` is part of the [PostCommitHook](../post-commit-hooks/PostCommitHook.md#run) abstraction.

`run` [determines the type of AutoCompact](#getAutoCompactType).

`run` returns early (and hence skips auto compaction) when [shouldSkipAutoCompact](#shouldSkipAutoCompact) is enabled.

In the end, `run` executes [compactIfNecessary](#compactIfNecessary) with the following:

* `delta.commit.hooks.autoOptimize` operation name
* `maxDeletedRowsRatio` unspecified (`None`)
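
The control flow of `run` described above could be sketched as follows. This is a simplification under stated assumptions: the skip decision is passed in as a plain `Boolean`, and `compactIfNecessary` is modelled as a hypothetical callback that receives only the operation name and `maxDeletedRowsRatio`.

```scala
object RunFlowSketch {
  // Operation name passed to compactIfNecessary (taken from the text above).
  val OpType: String = "delta.commit.hooks.autoOptimize"

  // Hypothetical, simplified `run`: the real method also takes the SparkSession,
  // the transaction, the committed version, the snapshot and the actions.
  def run(
      shouldSkipAutoCompact: Boolean,
      compactIfNecessary: (String, Option[Double]) => Unit): Unit = {
    if (!shouldSkipAutoCompact) {
      // maxDeletedRowsRatio is left unspecified (None).
      compactIfNecessary(OpType, None)
    }
  }
}
```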

### compactIfNecessary { #compactIfNecessary }

```scala
compactIfNecessary(
  spark: SparkSession,
  txn: OptimisticTransactionImpl,
  postCommitSnapshot: Snapshot,
  opType: String,
  maxDeletedRowsRatio: Option[Double]): Seq[OptimizeMetrics]
```

`compactIfNecessary` [prepares an AutoCompactRequest](AutoCompactUtils.md#prepareAutoCompactRequest).

When [shouldCompact](AutoCompactRequest.md#shouldCompact) is disabled, `compactIfNecessary` returns no [OptimizeMetrics](../commands/optimize/OptimizeMetrics.md).

Otherwise, with [shouldCompact](AutoCompactRequest.md#shouldCompact) turned on, `compactIfNecessary` [performs auto compaction](AutoCompact.md#compact).
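
A minimal sketch of that decision, assuming hypothetical `Request` and `Metrics` types in place of `AutoCompactRequest` and `OptimizeMetrics`, and with the request preparation and the compaction itself passed in as functions. Returning an empty `Seq` for the "no metrics" case is an assumption of this sketch.

```scala
object CompactIfNecessarySketch {
  // Hypothetical stand-in for AutoCompactRequest.
  final case class Request(shouldCompact: Boolean)
  // Hypothetical stand-in for OptimizeMetrics.
  final case class Metrics(numFilesCompacted: Long)

  def compactIfNecessary(
      prepareRequest: () => Request,
      compact: Request => Seq[Metrics]): Seq[Metrics] = {
    // Step 1: prepare the auto-compaction request.
    val request = prepareRequest()
    // Step 2: compact only when the request says so; otherwise report nothing.
    if (request.shouldCompact) compact(request) else Seq.empty
  }
}
```

Passing the two steps as functions keeps the sketch independent of Spark and of the transaction, so only the branching on `shouldCompact` is shown.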
3 changes: 3 additions & 0 deletions docs/auto-compaction/AutoCompactRequest.md
@@ -0,0 +1,3 @@
# AutoCompactRequest

`AutoCompactRequest` is...FIXME
21 changes: 21 additions & 0 deletions docs/auto-compaction/AutoCompactUtils.md
@@ -0,0 +1,21 @@
# AutoCompactUtils

## prepareAutoCompactRequest { #prepareAutoCompactRequest }

```scala
prepareAutoCompactRequest(
  spark: SparkSession,
  txn: OptimisticTransactionImpl,
  postCommitSnapshot: Snapshot,
  partitionsAddedToOpt: Option[PartitionKeySet],
  opType: String,
  maxDeletedRowsRatio: Option[Double]): AutoCompactRequest
```

`prepareAutoCompactRequest`...FIXME

---

`prepareAutoCompactRequest` is used when:

* `AutoCompactBase` is requested to [compactIfNecessary](AutoCompactBase.md#compactIfNecessary)
3 changes: 3 additions & 0 deletions docs/auto-compaction/index.md
@@ -0,0 +1,3 @@
# Auto Compaction

The **Auto Compaction** feature in Delta Lake uses the [AutoCompact](AutoCompact.md) post-commit hook, which is [run](AutoCompactBase.md#run) after a [successful transaction commit](../OptimisticTransactionImpl.md#registerPostCommitHook).
10 changes: 9 additions & 1 deletion docs/post-commit-hooks/PostCommitHook.md
@@ -1,6 +1,6 @@
# PostCommitHook

`PostCommitHook` is an [abstraction](#contract) of [post-commit hooks](#implementations) to be [executed](#run) right after a successful [transaction commit](../OptimisticTransactionImpl.md#commit)).
`PostCommitHook` is an [abstraction](#contract) of [post-commit hooks](#implementations) to be [executed](#run) at the end of a successful [transaction commit](../OptimisticTransactionImpl.md#commit).

## Contract

@@ -14,6 +14,7 @@ User-friendly name of the hook (for error reporting)

See:

* [AutoCompactBase](../auto-compaction/AutoCompactBase.md#name)
* [CheckpointHook](../checkpoints/CheckpointHook.md#name)

Used when:
@@ -32,8 +33,12 @@ run(
committedActions: Seq[Action]): Unit
```

??? warning "Procedure"
    `run` is a procedure (returns `Unit`) so _what happens inside stays inside_ (paraphrasing the [former advertising slogan of Las Vegas, Nevada](https://idioms.thefreedictionary.com/what+happens+in+Vegas+stays+in+Vegas)).

See:

* [AutoCompactBase](../auto-compaction/AutoCompactBase.md#run)
* [CheckpointHook](../checkpoints/CheckpointHook.md#run)

Used when:
@@ -42,5 +47,8 @@ Used when:

## Implementations

* [AutoCompactBase](../auto-compaction/AutoCompactBase.md)
* [CheckpointHook](../checkpoints/CheckpointHook.md)
* [GenerateSymlinkManifestImpl](GenerateSymlinkManifest.md)
* `IcebergConverterHook`
* `UpdateCatalogBase`
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -152,6 +152,7 @@ nav:
- Features:
- features/index.md
- ... | append-only-tables/**.md
- ... | auto-compaction/**.md
- ... | change-data-feed/**.md
- ... | table-valued-functions/**.md
- ... | check-constraints/**.md