From ccfa999c9cca16476cc18fa147aca03e8a9c14cf Mon Sep 17 00:00:00 2001
From: Liam Brannigan <liambrannigan@Liams-MacBook-Pro.local>
Date: Thu, 6 Feb 2025 10:46:53 +0000
Subject: [PATCH] Add review changes

Signed-off-by: Liam Brannigan <liambrannigan@Liams-MacBook-Pro.local>
---
 docs/usage/working-with-partitions.md | 32 ++++++++++++++++-----------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/docs/usage/working-with-partitions.md b/docs/usage/working-with-partitions.md
index d1cd4ce252..72eafeeb05 100644
--- a/docs/usage/working-with-partitions.md
+++ b/docs/usage/working-with-partitions.md
@@ -9,7 +9,7 @@ Below, we demonstrate how to create, query, and update partitioned Delta tables,
 
 To create a partitioned Delta table, specify one or more partition columns when creating the table. Here we partition by the country column.
 ```python
-from deltalake import write_deltalake
+from deltalake import write_deltalake, DeltaTable
 import pandas as pd
 
 df = pd.DataFrame({
@@ -98,9 +98,10 @@ print(pdf)
 
 ### Overwriting a Partition
 
-You can overwrite a specific partition, leaving the other partitions intact. Pass in `mode="overwrite"` together with a predicate string.
+To overwrite one or more specific partitions, set `mode="overwrite"` together with a predicate string that specifies
+which partitions are present in the new data. With the predicate set, `deltalake` is able to skip the other partitions.
 
-In this example we overwrite the `DE` paritition with new data.
+In this example we overwrite the `DE` partition with new data.
 
 ```python
 df_overwrite = pd.DataFrame({
@@ -134,16 +135,17 @@ print(pdf)
 
 ## Updating Partitioned Tables with Merge
 
-You can perform merge operations on partitioned tables in the same way you do on non-partitioned ones. Simply provide a matching predicate that references partition columns if needed.
+You can perform merge operations on partitioned tables in the same way you do on non-partitioned ones. If only a subset of the existing partitions needs to be read, provide a matching predicate that references the partition columns represented in the source data. The predicate then allows `deltalake` to skip reading the partitions it does not reference.
 
-You can match on both the partition column (country) and some other condition. This example shows a merge operation that checks both the partition column ("country") and a numeric column ("num") when merging:
+This example shows a merge operation that checks both the partition column (`"country"`) and another column (`"num"`) when merging:
 - The merge condition (predicate) matches target rows where both "country" and "num" align with the source.
-- When a match occurs, it updates the "letter" column; otherwise, it inserts the new row.
+- If a match is found between a source row and a target row, the `"letter"` column is updated with the source data.
+- Otherwise, if no match is found for a source row, the new row is inserted, creating a new partition if necessary.
 
 ```python
 dt = DeltaTable("tmp/partitioned-table")
 
-source_data = pd.DataFrame({"num": [1, 101], "letter": ["A", "B"], "country": ["US", "US"]})
+source_data = pd.DataFrame({"num": [1, 101], "letter": ["A", "B"], "country": ["US", "CH"]})
 
 (
     dt.merge(
@@ -166,7 +168,7 @@ print(pdf)
 
 ```plaintext
     num letter country
-0   101      B      US
+0   101      B      CH
 1     1      A      US
 2     2      b      US
 3   900      m      DE
@@ -192,10 +194,11 @@ print(pdf)
 
 ```plaintext
     num letter country
-0   900      m      DE
-1  1000      n      DE
-2    10      x      CA
-3     3      c      CA
+0   101      B      CH
+1   900      m      DE
+2  1000      n      DE
+3    10      x      CA
+4     3      c      CA
 ```
 This command logically deletes the data by creating a new transaction.
 
@@ -204,6 +207,9 @@ This command logically deletes the data by creating a new transaction.
 ### Optimize & Vacuum
 
 Partitioned tables can accummulate many small files if a partition is frequently appended to. You can compact these into larger files on a specific partition with [`optimize.compact`](../../delta_table/#deltalake.DeltaTable.optimize).
+
+To target compaction at specific partitions, include partition filters.
+
 ```python
  dt.optimize.compact(partition_filters=[("country", "=", "CA")])
  ```
@@ -212,4 +218,4 @@ Then optionally [`vacuum`](../../delta_table/#deltalake.DeltaTable.vacuum) the t
 
 ### Handling High-Cardinality Columns
 
-Partitioning can be very powerful, but be mindful of using high-cardinality columns (columns with too many unique values). This can create an excessive number of directories and can hurt performance. For example, partitioning by date is typically better than partitioning by user_id if user_id has millions of unique values.
+Partitioning can reduce the time it takes to update and query a table, but be mindful of partitioning on high-cardinality columns (columns with many unique values). Doing so can create an excessive number of partition directories, which can hurt performance. For example, partitioning by date is typically better than partitioning by `user_id` if `user_id` has millions of unique values.