Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Add table feature manipulation utilities in the Protocol #4157

Merged
merged 8 commits into from
Feb 20, 2025

Conversation

vkorukanti
Copy link
Collaborator

@vkorukanti vkorukanti commented Feb 14, 2025

Description

Utilities for

  • finding the implicitly and explicitly enabled table features
  • adding a feature to the protocol and creating an upgraded protocol

How was this patch tested?

Unittests.

Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Left some comments.

List<String> readerFeatures,
List<String> writerFeatures) {
Set<String> readerFeatures,
Set<String> writerFeatures) {
this.minReaderVersion = minReaderVersion;
this.minWriterVersion = minWriterVersion;
this.readerFeatures = readerFeatures;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to using null?

I think I'd prefer, at the very least, that callers of getReaderFeatures and getWriterFeatures do not need to perform null checks. Perhaps storing null internally is fine (and then just expose the empty set)? But perhaps storing an empty Set would be better?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Storing an empty list, but when we are serializing to Row for writing to Delta log, written as null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about doing this the other way around, store null in the feature sets and return empty sets when invoking protocol.readerAndWriterFeatures? I believe this is what we do in delta.

/**
* Get the set of features that are implicitly enabled by the protocol. Features are implicitly
* enabled if they reader and/or writer version is less than the versions that support the
* explicit features specifying in `readerFeatures` and `writerFeatures` sets.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: specified instead of specifying

Can you also give an example? IMO a succinct clear example is worth 1000 words of description. Ditto for other methods -- short examples would make it super clear.

* enabled if they are present in the `readerFeatures` and/or `writerFeatures` sets.
*/
public Set<TableFeature> getExplicitlyEnabledFeatures() {
// TODO: Should we throw an exception if we encounter a feature that is not known to Kernel yet?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't happen in this method, right? That should probably happen when we construct the protocol?

i.e. if Delta Kernel is pointed to a table and the table has Delta Table readerWriter Feature foo that Kernel doesn't support, we should throw?

and again on the write path we should check if there's a writer only feature that we don't support?

Let me know if I'm missing something -- I'd figure we were already doing that today

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this code to go through TableFeatures::getTableFeature so that way we always through error for unknown features.

The case is: read the protocol from the delta log, and you ask for the Protocol from all enabled features and process further (check if it is supported or not). We should throw an error here for an unknown feature.

* enabled if they reader and/or writer version is less than the versions that support the
* explicit features specifying in `readerFeatures` and `writerFeatures` sets.
*/
public Set<TableFeature> getImplicitlyEnabledFeatures() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw -- is it implicitly "enabled" or implicitly "supported"? I think the latter, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to say supported everywhere.

@@ -155,9 +155,9 @@ public Transaction build(Engine engine) {
if (!newWriterFeatures.isEmpty()) {
logger.info("Automatically enabling writer features: {}", newWriterFeatures);
shouldUpdateProtocol = true;
List<String> oldWriterFeatures = protocol.getWriterFeatures();
Set<String> oldWriterFeatures = protocol.getWriterFeatures();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a comment, but a question on the existing code for my understanding: TableFeatures.extractAutomaticallyEnabledWriterFeatures looks for writer features. So what is the mechanism to enable reader-writer features such as timestampNtz?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are just looking for the writer features (because Kernel has no support for any readerwriter feature yet).

This will be changed to TableFeatures.extractAutomaticallyEnabledFeatures which enables both readerWriter and writerOnly features, and also have methods to compute the new Protocol

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we change it already to TableFeatures.extractAutomaticallyEnabledFeatures and add an assertion that there is no reader+writer feature atm with FeatureAutomaticallyEnabledByMetadata trait? That way we know we won't forget that later on.

List<String> readerFeatures,
List<String> writerFeatures) {
Set<String> readerFeatures,
Set<String> writerFeatures) {
this.minReaderVersion = minReaderVersion;
this.minWriterVersion = minWriterVersion;
this.readerFeatures = readerFeatures;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator

@scottsand-db scottsand-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a couple of questions/comments -- after this, will approve. Thanks!

* already supports 'invariants' implicitly.
* <li>current protocol is (1, 7, writerFeature='rowTracking,domainMetadata' and the new feature
* to add is 'appendOnly' -> (1, 7, writerFeature='rowTracking,domainMetadata,appendOnly')
* <li>current protocol is (1, 7, writerFeature='rowTracking,domainMetadata' and the new feature
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: so how do we (kernel developers) "upgrade" a protocol from (1, 7) to then include column mapping?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the steps:

  1. start with Protocol(3, 7)
  2. add the required features (in this case columnMapping) and normalize it
  3. Check if the current protocol is upgradable to the protocol created in (2) - i.e does the currentProtocol already supports what is needed (implicitly or explicitly)
  4. If yes, we can return currentProtocol
  5. Otherwise see what new features we are adding and can they be auto upgradeable based on the table feature properties.

This method will be in the next PR (link here)

*/
public Protocol denormalized() {
// Denormalization can only be applied to legacy protocols.
if (supportsWriterFeatures) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice if we had a separate method (with some documentation) that explained the relationship between supportsWriterFeatures and is-legacy-protocol?

private def isLegacyProtocol() = supportsWriterFeatures

This would make it clearer in the code, and then you could also add some detail/comments in one spot that explaiend why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added utilities

@vkorukanti vkorukanti merged commit c612006 into delta-io:master Feb 20, 2025
21 checks passed
Copy link
Contributor

@andreaschat-db andreaschat-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vkorukanti thank you for doing this. I left a couple of comments.

this.minReaderVersion = minReaderVersion;
this.minWriterVersion = minWriterVersion;
this.readerFeatures = readerFeatures;
this.writerFeatures = writerFeatures;
this.readerFeatures = requireNonNull(readerFeatures, "readerFeatures cannot be null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about this. Isn't supposed kernel to be able to support legacy features?

if (o == null || getClass() != o.getClass()) {
return false;
}
Protocol protocol = (Protocol) o;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to otherProtocol?

/////////////////////////////////////////////////////////////////////////////////////////////////
/**
* Get the set of features that are implicitly supported by the protocol. Features are implicitly
* supported if the reader and/or writer version is less than the versions that supports the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* supported if the reader and/or writer version is less than the versions that supports the
* may have implicit features. The implicit feature set is determined by collecting the legacy features with reader and writer versions less or equal to the protocol's reader and writer versions.

/// Public methods related to table features interaction with the protocol ///
/////////////////////////////////////////////////////////////////////////////////////////////////
/**
* Get the set of features that are implicitly supported by the protocol. Features are implicitly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Get the set of features that are implicitly supported by the protocol. Features are implicitly
* Get the set of features that are implicitly supported by the protocol. Only legacy protocols

return supportedFeatures;
}

/** Create a new {@link Protocol} object with the given {@link TableFeature} supported. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Create a new {@link Protocol} object with the given {@link TableFeature} supported. */
/** Create a new {@link Protocol} object with the given {@link TableFeature} iterable supported. */

@@ -384,8 +393,29 @@ public static TableFeature getTableFeature(String featureName) {
return tableFeature;
}

/** Does reader version supports explicitly specifying reader feature set in protocol? */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I find the comment in delta more clean:
Check if this protocol is capable of adding features into its readerFeatures field.. Same for the writer features below.

}
});
} else {
// ensure we don't get (minReaderVersion, minWriterVersion) that satisfy the readerWriter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Ensure we don't etc

List<String> readerFeatures,
List<String> writerFeatures) {
Set<String> readerFeatures,
Set<String> writerFeatures) {
this.minReaderVersion = minReaderVersion;
this.minWriterVersion = minWriterVersion;
this.readerFeatures = readerFeatures;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about doing this the other way around, store null in the feature sets and return empty sets when invoking protocol.readerAndWriterFeatures? I believe this is what we do in delta.

}

// Tests for `normalized
Seq(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we have DELTA_PROTOCOL_DEFAULT_READER_VERSION and DELTA_PROTOCOL_DEFAULT_WRITER_VERSION in kernel?

*/
public Protocol withFeature(TableFeature feature) {
// Add required dependencies of the feature
Protocol protocolWithDependencies = withFeatures(feature.requiredFeatures());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, withFeatures(feature.requiredFeatures()) won't denormalize the protocol. So, if there is a depedent feature with higher reader/writer versions that won't be reflected in protocolWithDependencies . Perhaps we should add a comment it about it here. I tried to fix this in the past but other work got into the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants