Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Delta Protocol] Follow up comments for ClusteringTable feature #2294

Closed

Conversation

dabao521
Copy link
Contributor

@dabao521 dabao521 commented Nov 15, 2023

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Follow up comments for ClusteringTable feature #2264

We had late comments after the original PR merged, and this PR is addressing those missed comments.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

No

- A clustering implementation is free to add additional information such as adding a new user-controlled metadata domain to keep track of its metadata.
- Writers must not define clustered and partitioned table at the same time.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @ryan-johnson-databricks / @imback82 , this is the new rule to disallow partitioned table . This is to address comment and comment

}
}
```
The example above converts `configuration` field into JSON format, including escaping characters. Here's how it looks in plain JSON for better understanding.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROTOCOL.md Outdated
@@ -1057,27 +1059,48 @@ When Row Tracking is enabled (when the table property `delta.enableRowTracking`

The Clustered Table feature facilitates the physical clustering of rows that share similar values on a predefined set of clustering columns.
This enhances query performance when selective filters are applied to these clustering columns through data skipping.
Clustering columns must be specified during the initial definition of a clustered table, and they can be modified after the table has been created.
Clustering columns can be sprecified when creating a table or later, as long as the table doesn't have partition columns.
Copy link
Contributor Author

@dabao521 dabao521 Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @imback82 / @ryan-johnson-databricks , I have updated here and below to let clustering table feature to be enabled either during table creation or at a later stage . This is to address comment.

Copy link
Collaborator

@ryan-johnson-databricks ryan-johnson-databricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dabao521 dabao521 requested a review from imback82 November 16, 2023 04:39
@dabao521 dabao521 requested a review from imback82 November 16, 2023 04:58
Copy link
Collaborator

@imback82 imback82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants