Core persistence refactor phase 1 - extract BasePersistence and BaseMetaStoreManager to isolate all "transaction" behaviors #1070
Conversation
…stence stack Remove the EntityCacheEntry wrapper since the timestamp fields were never used anyways; instead the underlying Caffeine cache transparently handles access times, and the types of entries we cache are simply the existing ResolvedPolarisEntity. Remove interactions of business logic with explicit "cache entries", instead operating on ResolvedPolarisEntity. Fix the equals()/hashCode() behavior of PolarisEntity to be compatible with PolarisBaseEntity as intended. Improve code comments to explain the (current) relationship between PolarisEntity and PolarisBaseEntity, and clarify the behaviors in Resolver.java. Fully remove the PolarisRemoteCache interface and its methods. Add different methods that aren't cache-specific instead.
…erize ResolverTest
… of CallContext This appeared to be some leaky divergence that occurred after CallContext had been removed, but PolarisMetaStoreSession really is only a low-level implementation detail that should never be handled by BasePolarisCatalog/FileIOFactory. This plumbs CallContext explicitly into the FileIOFactory and FileIOUtil methods and thus removes a large source of CallContext.getCurrentContext calls; now the threadlocal doesn't have to be set at all in BasePolarisCatalogTest.
Extract a basic abstract base class BaseMetaStoreManager which can hold shared logic between implementations of PolarisMetaStoreManager
…g to implement UNDROP but for now are entirely unused. We can reintroduce it with a better design against multiple backends when/if we want to implement UNDROP.
…toreSession; only leave the transaction-specific methods in PolarisMetaStoreSession
…ence-poc-extract-abstract-base
…ransactional-style PolarisMetaStoreSession, so that BasePersistence properly exposes a lookupEntityByName method where impls that use secondary indexes can easily just lookup an entity by name instead of doing two lookups.
…tityActive protected-visibility; remove all callsites where PolarisMetaStoreManagerImpl calls it. Technically, being in the same package doesn't prevent it from leaking, but we could reposition PolarisMetaStoreSession into a separate transaction-specific package to help protect it from leaking the lower-level abstractions.
…nd add writeEntity method to BasePersistence, with a default impl in PolarisMetaStoreSession containing what was previously in PolarisMetaStoreManagerImpl. This now protects all writes in PolarisMetaStoreManagerImpl from dealing with the three-table implementation detail. Technically this slightly changes the ordering of updates within a transaction for renameEntity, but it is arguably a more correct ordering, and the ordering doesn't interleave reads anyway.
…ap behavior from the underlying BasePersistence. Push down all the deleteFromEntities* methods into PolarisMetaStoreSession and add deleteEntity to BasePersistence, which encapsulates handling the separate slices.
…nto a new IntegrationPersistence interface; these methods encapsulate certain type-specific behaviors that are indirectly tied to persistence entities, such as principal secrets, storage integrations, etc.
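Taken together, the commits above converge on roughly the following interface shape. This is an illustrative sketch assembled from the descriptions in this thread (imports omitted); it is not the exact merged code, and the parameter lists are simplified.

// Illustrative sketch only: approximate shape of the extracted BasePersistence
// interface as described by the commit messages above.
public interface BasePersistence {

  // Single lookup by name; backends with secondary indexes can answer this directly
  // instead of doing the old two-phase lookup.
  @Nullable
  PolarisBaseEntity lookupEntityByName(
      @Nonnull PolarisCallContext callCtx,
      long catalogId,
      long parentId,
      int typeCode,
      @Nonnull String name);

  // Aggregate write that hides the three-table implementation detail; originalEntity
  // enables compare-and-swap behavior in the backend.
  void writeEntity(
      @Nonnull PolarisCallContext callCtx,
      @Nonnull PolarisBaseEntity entity,
      boolean nameOrParentChanged,
      @Nullable PolarisBaseEntity originalEntity);

  // Aggregate delete that encapsulates handling of the separate slices.
  void deleteEntity(@Nonnull PolarisCallContext callCtx, @Nonnull PolarisBaseEntity entity);
}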
Thanks for this PR @dennishuo ! I think it is a step in the right direction for eventual NoSQL support :)
Some preliminary comments for now... more to come.
@@ -60,6 +60,10 @@ public CallContext resolveCallContext(
        .addKeyValue("headers", headers)
        .log("Resolving CallContext");

    // TODO: Once we have non-transactional-database persistence stores, this should be
    // pushed down for the metaStoreManagerFactory to inject Transactional-DB specific things
    // (including the MetaStoreSession" into the PolarisCallContext. The non-transactional
Why would PolarisCallContext have DB-specific stuff? How do you envision the purpose and use cases of PolarisCallContext?
The MetaStoreSessionFactory kind of implies at least baking in the realmContext during the construction of the handle into the persistence store. So in general one might decorate the handle into the persistence store with various elements of traceability/auditability, blast-radius reduction (e.g. if it could connect to the backend with some kind of scoped internal credentials), which is why the current architecture treats the handle into the persistence backend as being a property of the call context.
All that can be implemented as request-scoped beans, AFAIK 🤔 So, the "context" method parameters become injected fields or constructor parameters. Would that work?
Per discussion, we can continue fine-tuning the CallContext syntax in a follow-up.
 * null if this is expected to be a brand-new entity
 */
void writeEntity(
    @Nonnull PolarisCallContext callCtx,
I'm generally against "call contexts" as explicit method parameters (or thread locals) from the API design POV.
Is there a reason why call context objects cannot be injected into impl. classes (Request-scoped beans in CDI)?
I'm very against using ThreadLocals as well, but it seems to me the opposite of ThreadLocal is explicit method parameter. In general I like the approach in core Iceberg code of trying to be very explicit as much as possible and avoiding injection.
"RequestScoped" seems prone to many of the same problems as ThreadLocals, where the provision of state leaves too much to the magic framework.
The main use case I'm thinking of here though is that not all CallContexts necessarily come in through the framework; we have the whole Tasks subsystem where we want an explicit definition of the CallContext that is constructed proactively by the system-initiated task handler
Regarding tasks in general and the concrete use case of "purging tables", all that's eventually needed are some parameters. For "purge table" that's an object describing the table and some Iceberg parameters - the actual "purge" itself doesn't need persistence at all. What we eventually need is a robust tasks framework (#774) - what individual task "runnables" need and how they work is up to those. If those need some persistence interaction, we can always have factories.
Regarding the parameter, yeah, I also think it's not necessary. If an implementation needs it, it could take it as a constructor argument.
I do not agree that the CDI framework is "magic" :) Critical points are controlled by the application: either through annotations or producer methods.
That said, dealing with the "call context" stuff and the persistence API in the same PR is probably overkill. At the same time, I believe it is important to have as clean a persistence API as possible (i.e. a minimal set of parameters).
How about removing the call context from this interface and using specialization when integrating with existing callers?
For example: instead of persistence.writeEntity(ctx, ...), do persistenceProvider.forContext(ctx).writeEntity(...).
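A minimal sketch of that suggested shape, with hypothetical provider/handle names (none of this is existing Polaris code):

// Hypothetical sketch of the suggested specialization: bind the call context once,
// and return a handle whose methods no longer carry a context parameter.
public interface PersistenceProvider {
  ContextBoundPersistence forContext(PolarisCallContext ctx);
}

public interface ContextBoundPersistence {
  void writeEntity(
      PolarisBaseEntity entity, boolean nameOrParentChanged, PolarisBaseEntity originalEntity);
  // ...the remaining BasePersistence-style methods, minus the callCtx parameter...
}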
Yeah, I'm being half-facetious about the "magic" ;)
I try not to be a Luddite, but maybe I'm just old-fashioned. As the saying goes, "any sufficiently advanced injection of boilerplate code is indistinguishable from magic".
In this particular case though I was also spooked at how easily it could potentially slip through to accidentally inject something at the wrong scope: #1000 (comment)
Somehow the producer graphs were happy to inject an ApplicationScoped CallContext without any idea where it came from (!?)
inject an ApplicationScoped CallContext without any idea where it came from (!?)
Nope - producers are not ambiguous. If there is an ambiguity, Quarkus will fail the build. (Weld fails at runtime.)
I think you're explaining that each bean can only have one producer, but that's not the kind of ambiguity referred to above. If you have a long chain of method calls where A.x calls B.y which calls C.z and so on, it can often be unclear at what point in time some object that's used in C.z was actually produced by the injection framework. It's ambiguous where & when the object was constructed, not what method actually did the construction.
From a different angle: can PolarisCallContext contain all the necessary information for all possible implementations of BasePersistence? I'd like to avoid it becoming a union of all possible dependencies.
Let's continue fine-tuning the CallContext syntax in the follow-up.
 * @param grantRec entity record to write, potentially replacing an existing entity record with
 *     the same key
 */
void writeToGrantRecords(
It looks like grant records are different from "entities", right? If so, why not have a separate persistence interface for them?
It might be possible, but we'll have to think more about what kind of consistency guarantees we want to preserve given the upper layer's ResolvedEntity semantic, which is a view of the entity with its associated grants all together.
In practice, right now to preserve the intended behavior of Polaris' grant-based authorization, grants and entities are tightly coupled under the same implied persistence store with interrelated ids, grantRecordsVersions etc.
Perhaps if we really want to extract the GrantsManager to be 100% independent, we might want to introduce a different construct, like ForeignGrantRecords or something, where the referential integrity is no longer based on a shared persistence layer. I think that's a plausible option, but we can portray that as "new behavior" while we adapt the existing behavior as-is.
I foresee the need to have pluggable authZ. It doesn't necessarily need to be this specific RBAC. It could be ABAC, another RBAC or even "noop". WDYT about making the current RBAC objects just entities?
I don't think we are tied to a shared persistence layer. The foreign key constraints are implied, not explicit, so the grant records don't actually need to point to a valid entity id (valid from the grant record's persistence layer's perspective). What matters is that the grant version of the entity changes when the grant records are changed, but even that is not guaranteed in a non-transactional database system. I.e., there is no CAS that guarantees grant records are persisted and that the entity grant version changes atomically.
Introducing a ForeignGrantRecords concrete type means that the authorizer has to handle both native grants and external grants, which is more code to maintain and test.
grant version of the entity changes when the grant records are changed
Very interesting... I think I missed that point before. Could someone point to current code that relies on this?
Line 403 in bd27a2e
// grants have changed, we need to bump-up the grants version
granteeEntity.setGrantRecordsVersion(granteeEntity.getGrantRecordsVersion() + 1);
this.writeEntity(callCtx, ms, granteeEntity, false);
Which impacts Resolver:
polaris/polaris-core/src/main/java/org/apache/polaris/core/persistence/resolver/Resolver.java
Line 559 in bd27a2e
if (versions == null
    || entity.getEntityVersion() != versions.getEntityVersion()
    || entity.getGrantRecordsVersion() != versions.getGrantRecordsVersion()) {
  // if null version we need to invalidate the cached entry since it has probably been
  // dropped
  if (versions == null) {
    if (this.cache != null) {
      this.cache.removeCacheEntry(resolvedEntity);
    }
    refreshedResolvedEntity = null;
  } else {
    // refresh that entity. If versions is null, it has been dropped
    if (this.cache != null) {
      refreshedResolvedEntity =
          this.cache.getAndRefreshIfNeeded(
              this.polarisCallContext,
              entity,
              versions.getEntityVersion(),
              versions.getGrantRecordsVersion());
    } else {
      ResolvedEntityResult result =
          this.polarisMetaStoreManager.refreshResolvedEntity(
              this.polarisCallContext,
@dennishuo : lots of comments :) but do not feel compelled to address them all in this PR. Many are discussion points. Many can be addressed later as long as we're in agreement about the general direction.
Thanks for starting this refactoring!
The only kind of critical point from my POV is dealing with multi-entity changes as a first-class persistence API feature.
/**
 * Delete Polaris entity and grant record metadata from all tables. This is used during metadata
 * bootstrap to reset all tables to their original state
What is this "original state"?
Changed this comment to just say "...reset all tables to an empty state"; any further bootstrapping state would be defined within shared business logic in the PolarisMetaStoreManagerImpl or BaseMetaStoreManager layer, but this "persistence" layer doesn't need to worry about it.
Sorry, I'm still confused... Would the "list" methods find anything after deleteAll?
No, everything should be 100% empty after deleteAll.
Ok, that's what I expected :) I guess it's just a matter of rephrasing the javadoc :)
 *
 * @param callCtx call context
 */
void deleteAll(@Nonnull PolarisCallContext callCtx);
What is the scope of "all"? Realm?
Yes. Updated comment to say "within the realm defined by the contents of the {@code callCtx}".
Practically we need two very different "purge realm" operations. "Prefix key deletion" and similar isn't supported by all databases. For databases that don't support "prefix key deletion" we'd have to scan the database and delete matching rows.
For databases that do support it, deleting a whole realm can be very time consuming (think: transaction timeouts, aborted requests, etc).
What I'm thinking of is realm-state management: created (populate all the things) -> active (usable) -> inactive (just disabled) -> purging (realm can be purged) -> purged. This would give us some flexibility and I think better guarantees. When a realm's created, a bunch of things need to happen - with pluggable authZ for example even things we don't know - the "realm initial population" can be performed by the things that "core Polaris" needs plus things that plugins need.
To delete and eventually purge a realm, it should be set to "inactive" first to prevent having intermediate dirty states and requests running against a realm that's being purged (or marked as such). A regularly run maintenance job could then do the actual cleanup.
WDYT?
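For illustration, the proposed lifecycle could be modeled roughly as the following enum; the name, constants, and comments are a sketch of the states described above, not existing code.

// Sketch of the proposed realm lifecycle states; not part of the current codebase.
public enum RealmState {
  CREATED,   // being populated: core Polaris things plus plugin-specific initialization
  ACTIVE,    // usable; requests are served against the realm
  INACTIVE,  // just disabled; no requests served, data still present
  PURGING,   // marked for cleanup; a maintenance job may delete data incrementally
  PURGED     // all data removed
}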
Yeah, I suspect there will be a lot more adjustments to the common bootstrap and purge code still to come, especially when it comes to async purge and maybe having a safe retention period before the final purge; we can continue to refine in the follow-up.
long catalogId,
long parentId,
@Nonnull PolarisEntityType entityType,
int limit,
What's the use for limit when the caller cannot resume listing later?
We'll need a pageToken for real pagination, agreed. Right now it looks like the upper layer uses it basically for "hasChildren" checks.
Looks like you can replace the last 3 parameters, if this returns a Stream
For pagination we'd need a "proper" page token that guards against changes between any two pages, to make it impossible that the same entries are returned in more than one page.
For posterity, assume there are 5 tables:
- table-A
- table-C
- table-E
- table-G
- table-I
Assume a page-size of 2.
The first request yields
- table-A
- table-C
Then "table-B" is created
The second request should yield
- table-E
- table-G
- table-I
However, org.apache.iceberg.rest.CatalogHandlers#paginate uses integer indexes, so the second request would actually yield "table-C" again:
- table-C
- table-E
- table-G
- table-I
Implying that all list* calls return results in a deterministic order (e.g. ascending by name).
See #273 for a pagination implementation that's not based on the integer index
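For illustration only, a minimal sketch of a name-based continuation token, assuming listings come back sorted ascending by name as described above (this is not the #273 implementation, and the class/method names are made up):

import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch: the page token is simply the last name returned, and the next page
// starts strictly after it. Unlike integer indexes, a newly created "table-B" cannot
// cause "table-C" to be returned twice.
public final class NamePaginationSketch {
  public static List<String> nextPage(List<String> namesSortedAscending, String afterToken, int pageSize) {
    return namesSortedAscending.stream()
        .filter(name -> afterToken == null || name.compareTo(afterToken) > 0)
        .limit(pageSize)
        .collect(Collectors.toList());
  }
}
// Page 1 over [table-A, table-C, table-E, table-G, table-I] with page size 2 yields
// [table-A, table-C] and token "table-C"; even after "table-B" is created, page 2 is
// [table-E, table-G].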
 * @param parentId id of the parent, either a namespace or a catalog
 * @return true if the parent entity has children
 */
boolean hasChildren(
Why is it critical to have this as a dedicated API method?
Good question. Not sure, for now just preserving behavior. Looks like there's a mix of using hasChildren and listEntities(limit...).
If you do not mind, let's remove it to reduce the API breadth. We can always add something later if this use case needs to be optimized... I kind of doubt that it will even come to that, because deleting namespaces is less frequent than getting the children of a namespace.
 * @return a storage integration object
 */
@Nullable
<T extends PolarisStorageConfigurationInfo> PolarisStorageIntegration<T> createStorageIntegration(
Can PolarisStorageConfigurationInfo be a simple entity in BasePersistence?
Yeah, that was actually one of the original designs, and the current Catalog nesting was thought of as a "simplification" to define "inline StorageConfigurationInfo".
In theory they can be first-class entities and we can have catalogs take referential ids/uris to the configuration info.
Why does persistence need to know about object-storage at all? I think persistence and object-storage are very different things. This function and loadPolarisStorageIntegration are factories for object-storage specific functionality.
Yeah, we might be able to lift these one or two layers higher up in followups.
 * @param principalId principal id
 */
@Nonnull
PolarisPrincipalSecrets generateNewPrincipalSecrets(
I'd think secret generation should not be a persistence concern.
Persistence needs to persist a referential ID though
What do you mean by "persist a referential ID"?
This can be better standardized in the next few follow-ups, especially as we also add a ConnectionConfigurationInfo for Catalog Federation, but in general the idea is to leave room (maybe optionally) for the lower-level customizable logic to add persistence-related book-keeping, if secrets or integrations create heavyweight resources.
In OpenCatalog, for example, the machinery might create an "OauthIntegration" in a separate subsystem to serve each new Polaris PrincipalEntity, and there would be a unique integrationId kept track of within the internalProperties of the PrincipalEntity, and likewise the principalId would be stored in the external OAuthIntegration.
This does start getting into the realm of "type-specific extensions" to the underlying persistence model, as suggested in Yufei's proposal, but in a more limited/low-level scope.
Generally I'd agree though that these interfaces could use some refinement. I'm thinking one of our follow-up refactors can focus on this "integration provider" concept to make it more natural across implementations. Maybe it's not strictly a "persistence concern" ultimately, but just a "pluggable integration object framework that may interact with the core persistence model".
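As a purely hypothetical illustration of that cross-referencing bookkeeping (class names and property keys made up for the example, not existing Polaris code):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the principal keeps the external integration's id in its internal
// properties, and the external integration keeps the principal id, so either side can be
// reconciled later. None of these names exist in Polaris today.
final class OauthIntegrationRecord {
  final String integrationId;
  final long servedPrincipalId;

  OauthIntegrationRecord(String integrationId, long servedPrincipalId) {
    this.integrationId = integrationId;
    this.servedPrincipalId = servedPrincipalId;
  }
}

final class PrincipalIntegrationBookkeepingSketch {
  public static void main(String[] args) {
    long principalId = 42L;
    // The external subsystem records which principal it serves...
    OauthIntegrationRecord integration = new OauthIntegrationRecord("oauth-abc123", principalId);
    // ...and the principal's internal properties record the integration id.
    Map<String, String> principalInternalProperties = new HashMap<>();
    principalInternalProperties.put("oauth-integration-id", integration.integrationId);
    System.out.println(principalInternalProperties);
  }
}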
 * <p>Note that APIs to the actual persistence store are very basic, often point read or write to
 * the underlying data store. The goal is to make it really easy to back this using databases like
 * Postgres or simpler KV store.
 * Extends BasePersistence to express a more "transaction-oriented" control flow for backing stores
Can we rename the class to something like TransactionalMetastore or TransactionOrientedMetaStore? I'd also suggest moving it to a separate package, like org.apache.polaris.core.persistence.transactional, so that it provides a clear structure for other types of metastore to extend.
Done
…ameLookupRecord, remove unused method
…nto a new package "transactional".
Thanks @dennishuo! I think it's a good starting point.
@Nonnull PolarisCallContext callCtx,
long catalogId,
long parentId,
int typeCode,
the spec does allow things of different types to have the same name
I suspect you refer to the Iceberg APIs/spec. IMO it's rather a mistake in Iceberg to allow having a view and a table with the exact same name - just thinking about: "SELECT * from this_thing" ... is it a table or a view or a UDF? OTOH, there are checks that prevent creating a table using the same name as an existing view.
long catalogId,
long parentId,
int typeCode,
@Nonnull String name);
The (most dominant) caller of this is when paths within a catalog are resolved.
We already have all "exploded paths" available at the call site, so I guess it's more ergonomic to delegate the resolution of these "exploded paths" to something specialized for that. The nice side effect would be that call sites wouldn't have to deal with parent-child relationships but only with paths.
@Nullable
PolarisBaseEntity lookupEntityByName(
    @Nonnull PolarisCallContext callCtx,
    long catalogId,
Isn't catalogId superfluous? Or do multiple catalogs have overlapping catalog-object IDs? IIRC all IDs are unique? Removing this parameter would also remove the need for the special NULL_ID handling, and PolarisEntityId could also go away.
 * which can support a runInTransaction semantic, while providing default implementations of some of
 * the BasePersistence methods in terms of lower-level methods that subclasses must implement.
 */
public abstract class TransactionalPersistence implements BasePersistence, IntegrationPersistence {
The common denominator for atomic operations in non-transactional databases is a CAS on a single row/key. Like "fetch all things", then "check all requirements", then "persist all the changes" and eventually "perform a single CAS" to perform the atomic change. How would this be implemented for example for a "rename table" operation or a "multi table commit" involving all the changes to the individual entities and consistency guarantees?
As discussed, I think one main next step after getting this initial restructuring merged will be to push down the "atomicBatchConditionalWrite" method and make the multi-table control flow plumb through to that.
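For illustration, such a pushed-down method might look roughly like the following; the name follows the comment above, while the exact signature, return type, and semantics are assumptions (Polaris-type imports omitted):

import java.util.List;

// Hypothetical sketch of a batch conditional write pushed down into the persistence layer.
// The caller supplies the entity states it based its changes on; the backend applies all
// writes or none, e.g. by funnelling the whole batch through a single CAS on one row/key.
public interface AtomicBatchWriteSketch {
  /**
   * @param callCtx call context
   * @param originalEntities the states the caller read; the write must fail if any of them
   *     changed in the meantime
   * @param entitiesToWrite the new states to persist atomically as one unit
   * @return true if the whole batch was applied, false if a conflicting change was detected
   */
  boolean atomicBatchConditionalWrite(
      PolarisCallContext callCtx,
      List<PolarisBaseEntity> originalEntities,
      List<PolarisBaseEntity> entitiesToWrite);
}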
import org.apache.polaris.core.persistence.IntegrationPersistence;

/**
 * Extends BasePersistence to express a more "transaction-oriented" control flow for backing stores
I have a concern with this approach, as I put in https://lists.apache.org/thread/rf5orxs815zs4h64p4rwp03q3pbgxb5r
TL;DR: If we go with the semantics of "change is durable after a persistence API method call returns", then any kind of API grouping (e.g. a few operations in the same Tx) is not logically correct, because the caller of the "wrapped" persistence deals with the same interface and can expect each call's changes to be durable independently of other calls' changes. This does not hold when actual changes are committed only at the end of a transaction.
To clarify: I support transaction persistence implementations in Polaris :) All I'm saying is that the transaction aspects should not be exposed in the persistence API, but should be handled privately by the transactional persistence impl.
Yeah it's a good point. Right now, before all the refactoring, I guess the upper layer (PolarisMetaStoreManagerImpl) is the ultimate authority on the unit of durability, and a custom impl of the traditional PolarisMetaStoreSession might "enforce" this by having ensureInWriteTransaction barriers, for example (OpenCatalog does this in its custom metastore session impl).
After this refactoring, I think it should be possible to unify under the semantic of "each method call is independently durable/committed", with the addition of the new atomicBatchConditionalWrite method (or whatever we call it).
As it stands in this PR, after pushing down the entitiesActive and entitiesChangeTracking stuff, the vast majority of the functionality no longer truly depends on the external layer defining the durability unit as a transaction, so we just need to chip away a few more things.
The main exceptions are called out in https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0#heading=h.2cx2yoanf9c4
- GrantRecords - we can sacrifice perfect "atomicity" in favor of slightly weaker but still good enough "happens-before" semantics, and just document the implications
- Types that create external "integrations" and other peripheral entities (Catalogs, Principals, Drop -> Tasks) - also sacrifice atomicity in favor of multi-phase commit types of behaviors and/or happens-before guarantees.
Thanks again @dennishuo for starting the refactoring effort!
From my POV we can probably merge this PR in its current state and continue this effort in follow-up PRs.
 *
 * @param callCtx call context
 * @param entity entity to persist
 * @param nameOrParentChanged if true, also write it to by-name lookups if applicable
"by-name" lookups looks like the interface is making too many assumptions about the implementation.
Can we say that the caller is responsible for setting this flag in case the parent of the name of the entity changes so that the implementation could use it as a hint to perform additional validation. In any case, the implementation is responsible for ensuring that names are unique within each namespace and that the same object cannot be found under more than one parent.
WDYT?
Ah sorry, missed this one before merging; as you suggested in another comment as well, I'll look to revamp all the javadoc comments in the newly extracted interfaces to better reflect the latest thinking and avoid references to old implementation details
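For reference, the suggested wording could fold into the javadoc roughly like this; it is only a sketch, and the parameter list is approximated from the diff excerpts above rather than copied from the merged code.

/**
 * Persists the given entity.
 *
 * @param callCtx call context
 * @param entity entity to persist
 * @param nameOrParentChanged hint set by the caller whenever the entity's name or parent
 *     changes; implementations may use it to perform additional validation, but remain
 *     responsible for keeping names unique within a namespace and for ensuring the same
 *     entity is never reachable under more than one parent
 * @param originalEntity the previously persisted state, or null for a brand-new entity
 */
void writeEntity(
    @Nonnull PolarisCallContext callCtx,
    @Nonnull PolarisBaseEntity entity,
    boolean nameOrParentChanged,
    @Nullable PolarisBaseEntity originalEntity);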
…etaStoreManager to isolate all "transaction" behaviors (apache#1070)

* Restructure persistence class hierarchy and remove vestigial interfaces

  Extract a basic abstract base class BaseMetaStoreManager which can hold shared logic between implementations of PolarisMetaStoreManager

* Remove all "entitiesDropped" members; these were vestigial from trying to implement UNDROP but for now are entirely unused. We can reintroduce it with a better design against multiple backends when/if we want to implement UNDROP.

* Extract BasePersistence interface as parent interface of PolarisMetaStoreSession; only leave the transaction-specific methods in PolarisMetaStoreSession

* Push all evidence of the two-phase lookupEntityByName into only the transactional-style PolarisMetaStoreSession, so that BasePersistence properly exposes a lookupEntityByName method where impls that use secondary indexes can easily just lookup an entity by name instead of doing two lookups.

* Turn PolarisMetaStoreSession into an abstract class and make lookupEntityActive protected-visibility; remove all callsites where PolarisMetaStoreManagerImpl calls it. Technically, while in the same package this doesn't prevent it from leaking, but we could reposition PolarisMetaStoreSession into a separate transaction-specific package to help protect it from leaking the lower-level abstractions.

* Pushdown all calls to writeToEntities into PolarisMetaStoreSession, and add writeEntity method to BasePersistence, with a default impl in PolarisMetaStoreSession containing what was previously in PolarisMetaStoreManagerImpl. This now protects all writes in PolarisMetaStoreManagerImpl from dealing with the three-table implementation detail. Technically slightly changes the ordering of updates within a transaction for renameEntity, but is arguably a more correct ordering, and the ordering doesn't interleave reads anyways.

* Add originalEntity to the writeEntity method to enable compare-and-swap behavior from the underlying BasePersistence. Pushdown all the deleteFromEntities* methods into PolarisMetaStoreSession and add deleteEntity to BasePersistence which encapsulates handling the separate slices.

* Break out external-integration related methods from BasePersistence into a new IntegrationPersistence interface; these methods encapsulate certain type-specific behaviors that are indirectly tied to persistence entities, such as principal secrets, storage integrations, etc.

* Improve javadoc comments, rename PolarisEntityActiveRecord to EntityNameLookupRecord, remove unused method

* Rename PolarisMetaStoreSession to TransactionalPersistence and move into a new package "transactional".

* Also move the PolarisTreeMap* classes into the transactional package

* Move PolarisMetaStoreManagerImpl into the "transactional" package per PR suggestion
Though this isn't 100% of the persistence refactor yet for unlocking non-transactional databases, this gets the main structure of the lower-level store layer mostly settled and possibly ready for parallel implementation work.
Fully remove the vestigial "entitiesDropped" methods; these were originally for UNDROP but ended up not being used; they can be reintroduced with better design in the future if we want to add UNDROP.
Extract BasePersistence interface as parent interface of PolarisMetaStoreSession, which now becomes an abstract class; only leave the transaction-specific methods in PolarisMetaStoreSession.
Eliminate all direct interaction with the "triplet tables" (entities/entitiesActive/entitiesChangeTracking) from PolarisMetaStoreManagerImpl in favor of aggregate methods declared in BasePersistence, and have the abstract base class PolarisMetaStoreSession handle the juggling of the three related tables. This includes lookupEntities[Active], writeToEntities[Active], deleteFromEntities[Active], etc.
The new writeEntity method in BasePersistence now also includes originalEntity to make it more explicit what to use for the compare-and-swap. Technically we 90% only need the originalEntity.getEntityVersion() and originalEntity.getGrantRecordsVersion(), and in just the renameEntity case we also need to use originalEntity.getParentId() and originalEntity.getName(), but for now it doesn't hurt to just plumb in the entire original entity.
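For illustration, a backend making use of originalEntity for that compare-and-swap might perform a check along these lines; this is a sketch rather than the merged code, using only the getters named above.

// Sketch: the version fields drive the compare-and-swap, while renameEntity additionally
// uses the original parent id and name to locate the old by-name record.
static boolean versionsUnchanged(PolarisBaseEntity current, PolarisBaseEntity original) {
  return current.getEntityVersion() == original.getEntityVersion()
      && current.getGrantRecordsVersion() == original.getGrantRecordsVersion();
}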