-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add table feature manipulation utilities in the Protocol #4157
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Left some comments.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/Protocol.java
Outdated
Show resolved
Hide resolved
List<String> readerFeatures, | ||
List<String> writerFeatures) { | ||
Set<String> readerFeatures, | ||
Set<String> writerFeatures) { | ||
this.minReaderVersion = minReaderVersion; | ||
this.minWriterVersion = minWriterVersion; | ||
this.readerFeatures = readerFeatures; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a benefit to using null
?
I think I'd prefer, at the very least, that callers of getReaderFeatures
and getWriterFeatures
do not need to perform null checks. Perhaps storing null internally is fine (and then just expose the empty set)? But perhaps storing an empty Set would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Storing an empty list, but when we are serializing to Row for writing to Delta log, written as null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about doing this the other way around, store null in the feature sets and return empty sets when invoking protocol.readerAndWriterFeatures
? I believe this is what we do in delta.
/** | ||
* Get the set of features that are implicitly enabled by the protocol. Features are implicitly | ||
* enabled if they reader and/or writer version is less than the versions that support the | ||
* explicit features specifying in `readerFeatures` and `writerFeatures` sets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: specified
instead of specifying
Can you also give an example? IMO a succinct clear example is worth 1000 words of description. Ditto for other methods -- short examples would make it super clear.
* enabled if they are present in the `readerFeatures` and/or `writerFeatures` sets. | ||
*/ | ||
public Set<TableFeature> getExplicitlyEnabledFeatures() { | ||
// TODO: Should we throw an exception if we encounter a feature that is not known to Kernel yet? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't happen in this method, right? That should probably happen when we construct the protocol?
i.e. if Delta Kernel is pointed to a table and the table has Delta Table readerWriter Feature foo
that Kernel doesn't support, we should throw?
and again on the write path we should check if there's a writer only feature that we don't support?
Let me know if I'm missing something -- I'd figure we were already doing that today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this code to go through TableFeatures::getTableFeature
so that way we always through error for unknown features.
The case is: read the protocol from the delta log, and you ask for the Protocol from all enabled features and process further (check if it is supported or not). We should throw an error here for an unknown feature.
* enabled if they reader and/or writer version is less than the versions that support the | ||
* explicit features specifying in `readerFeatures` and `writerFeatures` sets. | ||
*/ | ||
public Set<TableFeature> getImplicitlyEnabledFeatures() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw -- is it implicitly "enabled" or implicitly "supported"? I think the latter, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to say supported everywhere.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/Protocol.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/tablefeatures/TableFeatures.java
Outdated
Show resolved
Hide resolved
70962de
to
d213b1b
Compare
@@ -155,9 +155,9 @@ public Transaction build(Engine engine) { | |||
if (!newWriterFeatures.isEmpty()) { | |||
logger.info("Automatically enabling writer features: {}", newWriterFeatures); | |||
shouldUpdateProtocol = true; | |||
List<String> oldWriterFeatures = protocol.getWriterFeatures(); | |||
Set<String> oldWriterFeatures = protocol.getWriterFeatures(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a comment, but a question on the existing code for my understanding: TableFeatures.extractAutomaticallyEnabledWriterFeatures looks for writer
features. So what is the mechanism to enable reader-writer features such as timestampNtz
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are just looking for the writer features (because Kernel has no support for any readerwriter feature yet).
This will be changed to TableFeatures.extractAutomaticallyEnabledFeatures
which enables both readerWriter and writerOnly features, and also have methods to compute the new Protocol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we change it already to TableFeatures
.extractAutomaticallyEnabledFeatures
and add an assertion that there is no reader+writer feature atm with FeatureAutomaticallyEnabledByMetadata
trait? That way we know we won't forget that later on.
List<String> readerFeatures, | ||
List<String> writerFeatures) { | ||
Set<String> readerFeatures, | ||
Set<String> writerFeatures) { | ||
this.minReaderVersion = minReaderVersion; | ||
this.minWriterVersion = minWriterVersion; | ||
this.readerFeatures = readerFeatures; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
d213b1b
to
1822011
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Just a couple of questions/comments -- after this, will approve. Thanks!
kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/Protocol.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/Protocol.java
Outdated
Show resolved
Hide resolved
* already supports 'invariants' implicitly. | ||
* <li>current protocol is (1, 7, writerFeature='rowTracking,domainMetadata' and the new feature | ||
* to add is 'appendOnly' -> (1, 7, writerFeature='rowTracking,domainMetadata,appendOnly') | ||
* <li>current protocol is (1, 7, writerFeature='rowTracking,domainMetadata' and the new feature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: so how do we (kernel developers) "upgrade" a protocol from (1, 7) to then include column mapping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the steps:
- start with Protocol(3, 7)
- add the required features (in this case columnMapping) and normalize it
- Check if the current protocol is upgradable to the protocol created in (2) - i.e does the currentProtocol already supports what is needed (implicitly or explicitly)
- If yes, we can return currentProtocol
- Otherwise see what new features we are adding and can they be auto upgradeable based on the table feature properties.
This method will be in the next PR (link here)
*/ | ||
public Protocol denormalized() { | ||
// Denormalization can only be applied to legacy protocols. | ||
if (supportsWriterFeatures) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice if we had a separate method (with some documentation) that explained the relationship between supportsWriterFeatures and is-legacy-protocol?
private def isLegacyProtocol() = supportsWriterFeatures
This would make it clearer in the code, and then you could also add some detail/comments in one spot that explaiend why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added utilities
kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/Protocol.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vkorukanti thank you for doing this. I left a couple of comments.
this.minReaderVersion = minReaderVersion; | ||
this.minWriterVersion = minWriterVersion; | ||
this.readerFeatures = readerFeatures; | ||
this.writerFeatures = writerFeatures; | ||
this.readerFeatures = requireNonNull(readerFeatures, "readerFeatures cannot be null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about this. Isn't supposed kernel to be able to support legacy features?
if (o == null || getClass() != o.getClass()) { | ||
return false; | ||
} | ||
Protocol protocol = (Protocol) o; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to otherProtocol
?
///////////////////////////////////////////////////////////////////////////////////////////////// | ||
/** | ||
* Get the set of features that are implicitly supported by the protocol. Features are implicitly | ||
* supported if the reader and/or writer version is less than the versions that supports the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* supported if the reader and/or writer version is less than the versions that supports the | |
* may have implicit features. The implicit feature set is determined by collecting the legacy features with reader and writer versions less or equal to the protocol's reader and writer versions. |
/// Public methods related to table features interaction with the protocol /// | ||
///////////////////////////////////////////////////////////////////////////////////////////////// | ||
/** | ||
* Get the set of features that are implicitly supported by the protocol. Features are implicitly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Get the set of features that are implicitly supported by the protocol. Features are implicitly | |
* Get the set of features that are implicitly supported by the protocol. Only legacy protocols |
return supportedFeatures; | ||
} | ||
|
||
/** Create a new {@link Protocol} object with the given {@link TableFeature} supported. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** Create a new {@link Protocol} object with the given {@link TableFeature} supported. */ | |
/** Create a new {@link Protocol} object with the given {@link TableFeature} iterable supported. */ |
@@ -384,8 +393,29 @@ public static TableFeature getTableFeature(String featureName) { | |||
return tableFeature; | |||
} | |||
|
|||
/** Does reader version supports explicitly specifying reader feature set in protocol? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I find the comment in delta more clean:
Check if this protocol is capable of adding features into its readerFeatures field.
. Same for the writer features below.
} | ||
}); | ||
} else { | ||
// ensure we don't get (minReaderVersion, minWriterVersion) that satisfy the readerWriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Ensure we don't etc
List<String> readerFeatures, | ||
List<String> writerFeatures) { | ||
Set<String> readerFeatures, | ||
Set<String> writerFeatures) { | ||
this.minReaderVersion = minReaderVersion; | ||
this.minWriterVersion = minWriterVersion; | ||
this.readerFeatures = readerFeatures; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about doing this the other way around, store null in the feature sets and return empty sets when invoking protocol.readerAndWriterFeatures
? I believe this is what we do in delta.
} | ||
|
||
// Tests for `normalized | ||
Seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Do we have DELTA_PROTOCOL_DEFAULT_READER_VERSION
and DELTA_PROTOCOL_DEFAULT_WRITER_VERSION
in kernel?
*/ | ||
public Protocol withFeature(TableFeature feature) { | ||
// Add required dependencies of the feature | ||
Protocol protocolWithDependencies = withFeatures(feature.requiredFeatures()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note, withFeatures(feature.requiredFeatures())
won't denormalize the protocol. So, if there is a depedent feature with higher reader/writer versions that won't be reflected in protocolWithDependencies
. Perhaps we should add a comment it about it here. I tried to fix this in the past but other work got into the way.
Description
Utilities for
How was this patch tested?
Unittests.