fix!: ensure Iceberg layouts own the SeekableChannelsProvider #6371
Really like the change, we should get this in before the release.
Minor comments.
...ions/iceberg/src/main/java/io/deephaven/iceberg/internal/DataInstructionsProviderLoader.java
...ions/iceberg/src/main/java/io/deephaven/iceberg/internal/DataInstructionsProviderPlugin.java
extensions/iceberg/s3/src/main/java/io/deephaven/iceberg/util/S3InstructionsProviderPlugin.java
...src/main/java/io/deephaven/extensions/trackedfile/TrackedSeekableChannelsProviderPlugin.java
@@ -54,4 +54,23 @@ public SeekableChannelsProvider fromServiceLoader(@NotNull final URI uri,
        }
        throw new UnsupportedOperationException("No plugin found for uri: " + uri);
    }
I think we can remove the `fromServiceLoader(uri, specialInstructions)` path now, especially since we have a breaking release coming up.
Just because one part of a release is breaking does not mean that we should feel free to break anything; there should be some level of nuance involved in each breaking change.
I did a quick scan of DHE usages; they have one test usage of `fromServiceLoader` afaict.
Util/channel/src/main/java/io/deephaven/util/channel/SeekableChannelsProviderPlugin.java
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
extensions/iceberg/src/main/java/io/deephaven/iceberg/layout/IcebergBaseLayout.java
Changeset is fine, modulo incredibly minor quibbles. I guess the real question is, what do we give up? It seems to me like having access to the full URI is potentially more powerful, and that we're preventing "hybrid" Iceberg catalogs with more than one URI scheme in use (which is (1) something I kind of want to build, and (2) not something that would totally surprise me in the wild).
...ions/iceberg/src/main/java/io/deephaven/iceberg/internal/DataInstructionsProviderPlugin.java
extensions/s3/src/main/java/io/deephaven/extensions/s3/GCSSeekableChannelProviderPlugin.java
Good catch regarding URIs with different schemes. We can either keep a set of channel providers in the Base Layout class or, for now, add an assertion that all the location keys have the same URI scheme as the table location.
I've added an explicit URI check; it's a good point. That said, I'm almost certain that we don't support multiple URI schemes today given the pain points of configuration (not necessarily speaking of DH).
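For readers following the thread, the kind of scheme check discussed above could look roughly like this. It is a minimal sketch and not the code from the PR: `UriSchemeCheck` and `requireSameScheme` are hypothetical names, and the actual check in `IcebergBaseLayout` may be structured differently.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical helper for illustration only; not part of the Deephaven API.
final class UriSchemeCheck {
    private UriSchemeCheck() {}

    /**
     * Returns the data file URI unchanged if its scheme matches the table location's scheme,
     * otherwise throws.
     */
    static URI requireSameScheme(final String tableScheme, final URI dataFileUri) {
        Objects.requireNonNull(tableScheme, "tableScheme");
        final String fileScheme = dataFileUri.getScheme();
        // URI schemes are case-insensitive, so compare ignoring case
        if (fileScheme == null || !tableScheme.equalsIgnoreCase(fileScheme)) {
            throw new IllegalArgumentException(String.format(
                    "Data file URI '%s' has scheme '%s', expected '%s'",
                    dataFileUri, fileScheme, tableScheme));
        }
        return dataFileUri;
    }

    public static void main(String[] args) {
        // The scheme is derived once from the table location (assumed here to be an s3 URI)
        final String tableScheme = URI.create("s3://bucket/warehouse/table").getScheme();
        requireSameScheme(tableScheme, URI.create("s3://bucket/warehouse/table/data/f.parquet"));
        try {
            requireSameScheme(tableScheme, URI.create("file:///tmp/f.parquet"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A hybrid catalog, as raised earlier in the review, would instead need something like a map from scheme to provider rather than a hard failure.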
This greatly improves the efficiency of Iceberg reading. Previously, a `SeekableChannelsProvider` was created per URI; now only one is created per layout (that is, per Table).

To aid in this update, objects that were previously created on demand in `IcebergBaseLayout` are now created once upon construction. To enable this, it was noted that only the URI scheme is relevant for discrimination, not the full URI to the data files. Thus, we can use the URI scheme as provided via `org.apache.iceberg.Table#location` to do any up-front loading.

The various interfaces that take a URI have been updated to take a URI scheme instead. While this change could technically have been made in a non-breaking fashion by delegating existing URI methods to URI scheme methods, the existence of the URI methods encourages the wrong mental model and is easy to misuse, so they have been removed.

One of the `ParquetTableLocationKey` constructors has been deprecated and marked for removal. A more appropriate constructor has been added.

BREAKING CHANGE:

- `SeekableChannelsProviderLoader.fromServiceLoader` has been removed, replaced with `SeekableChannelsProviderLoader.load`.
- `DataInstructionsProviderLoader.fromServiceLoader` has been removed, replaced with `DataInstructionsProviderLoader.load`.
- `SeekableChannelsProviderPlugin` methods have been changed; they now use a `String` for the URI scheme instead of a `URI`.
- The `DataInstructionsProviderPlugin.createInstructions` method has been changed; it now uses a `String` for the URI scheme instead of a `URI`.
- `IcebergTableParquetLocationKey` has added a new `SeekableChannelsProvider` parameter.
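To illustrate the idea in the description, here is a minimal sketch of loading a provider by URI scheme once at layout construction and reusing it for every data file. All types below (`ChannelsProvider`, `ChannelsProviderPlugin`, `ChannelsProviderLoader`, `Layout`, `CountingPlugin`) are simplified, hypothetical stand-ins, not the Deephaven interfaces; the real signatures, including any special-instructions arguments, are not shown in this conversation and are assumed away here.

```java
import java.net.URI;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a channels provider; the real interface is richer.
interface ChannelsProvider {
    String read(URI uri);
}

// Hypothetical plugin keyed on the URI scheme rather than the full URI.
interface ChannelsProviderPlugin {
    boolean isCompatible(String uriScheme);

    ChannelsProvider createProvider(String uriScheme);
}

final class ChannelsProviderLoader {
    private final List<ChannelsProviderPlugin> plugins;

    ChannelsProviderLoader(List<ChannelsProviderPlugin> plugins) {
        this.plugins = List.copyOf(plugins);
    }

    // Picks the first plugin that accepts the scheme
    ChannelsProvider load(String uriScheme) {
        for (ChannelsProviderPlugin plugin : plugins) {
            if (plugin.isCompatible(uriScheme)) {
                return plugin.createProvider(uriScheme);
            }
        }
        throw new UnsupportedOperationException("No plugin found for uri scheme: " + uriScheme);
    }
}

final class Layout {
    private final String uriScheme;
    private final ChannelsProvider provider; // created once, shared by every data file

    Layout(URI tableLocation, ChannelsProviderLoader loader) {
        this.uriScheme = tableLocation.getScheme();
        this.provider = loader.load(uriScheme);
    }

    String read(URI dataFile) {
        if (!uriScheme.equalsIgnoreCase(dataFile.getScheme())) {
            throw new IllegalArgumentException("Unexpected scheme for " + dataFile);
        }
        return provider.read(dataFile);
    }
}

// Test double that counts how many providers were created.
final class CountingPlugin implements ChannelsProviderPlugin {
    final AtomicInteger created = new AtomicInteger();

    @Override
    public boolean isCompatible(String uriScheme) {
        return "s3".equalsIgnoreCase(uriScheme);
    }

    @Override
    public ChannelsProvider createProvider(String uriScheme) {
        created.incrementAndGet();
        return uri -> "bytes of " + uri;
    }
}

final class LayoutDemo {
    public static void main(String[] args) {
        final CountingPlugin plugin = new CountingPlugin();
        final Layout layout = new Layout(
                URI.create("s3://bucket/table"), new ChannelsProviderLoader(List.of(plugin)));
        layout.read(URI.create("s3://bucket/table/data/a.parquet"));
        layout.read(URI.create("s3://bucket/table/data/b.parquet"));
        System.out.println("providers created: " + plugin.created.get());
    }
}
```

Because the plugin only ever sees a scheme string, there is no way for it to branch on the full data file URI, which is the mental model the description argues for.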