Core persistence refactor phase 1 - extract BasePersistence and BaseMetaStoreManager to isolate all "transaction" behaviors #1070
Conversation
…stence stack Remove the EntityCacheEntry wrapper since the timestamp fields were never used anyways; instead the underlying Caffeine cache transparently handles access times, and the types of entries we cache are simply the existing ResolvedPolarisEntity. Remove interactions of business logic with explicit "cache entries", instead operating on ResolvedPolarisEntity. Fix the equals()/hashCode() behavior of PolarisEntity to be compatible with PolarisBaseEntity as intended. Improve code comments to explain the (current) relationship between PolarisEntity and PolarisBaseEntity, and clarify the behaviors in Resolver.java. Fully remove the PolarisRemoteCache interface and its methods. Add different methods that aren't cache-specific instead.
…erize ResolverTest
… of CallContext This appeared to be some leaky divergence that occurred after CallContext had been removed, but PolarisMetaStoreSession really is only a low-level implementation detail that should never be handled by BasePolarisCatalog/FileIOFactory. This plumbs CallContext explicitly into the FileIOFactory and FileIOUtil methods and thus removes a large source of CallContext.getCurrentContext calls; now the threadlocal doesn't have to be set at all in BasePolarisCatalogTest.
Extract a basic abstract base class BaseMetaStoreManager which can hold shared logic between implementations of PolarisMetaStoreManager
…g to implement UNDROP but for now are entirely unused. We can reintroduce it with a better design against multiple backends when/if we want to implement UNDROP.
…toreSession; only leave the transaction-specific methods in PolarisMetaStoreSession
…ence-poc-extract-abstract-base
…ransactional-style PolarisMetaStoreSession, so that BasePersistence properly exposes a lookupEntityByName method where impls that use secondary indexes can easily just lookup an entity by name instead of doing two lookups.
…tityActive protected-visibility; remove all callsites where PolarisMetaStoreManagerImpl calls it. Technically, being in the same package doesn't prevent it from leaking, but we could reposition PolarisMetaStoreSession into a separate transaction-specific package to help protect it from leaking the lower-level abstractions.
…nd add writeEntity method to BasePersistence, with a default impl in PolarisMetaStoreSession containing what was previously in PolarisMetaStoreManagerImpl. This now protects all writes in PolarisMetaStoreManagerImpl from dealing with the three-table implementation detail. Technically this slightly changes the ordering of updates within a transaction for renameEntity, but it is arguably a more correct ordering, and the ordering doesn't interleave reads anyway.
…ap behavior from the underlying BasePersistence. Push down all the deleteFromEntities* methods into PolarisMetaStoreSession and add deleteEntity to BasePersistence, which encapsulates handling the separate slices.
…nto a new IntegrationPersistence interface; these methods encapsulate certain type-specific behaviors that are indirectly tied to persistence entities, such as principal secrets, storage integrations, etc.
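Taken together, the commits above converge on roughly the following interface shape. This is an illustrative sketch assembled from the descriptions in this thread (imports omitted); it is not the exact merged code, and the parameter lists are simplified.

// Illustrative sketch only: approximate shape of the extracted BasePersistence
// interface as described by the commit messages above.
public interface BasePersistence {

  // Single lookup by name; backends with secondary indexes can answer this directly
  // instead of doing the old two-phase lookup.
  @Nullable
  PolarisBaseEntity lookupEntityByName(
      @Nonnull PolarisCallContext callCtx,
      long catalogId,
      long parentId,
      int typeCode,
      @Nonnull String name);

  // Aggregate write that hides the three-table implementation detail; originalEntity
  // enables compare-and-swap behavior in the backend.
  void writeEntity(
      @Nonnull PolarisCallContext callCtx,
      @Nonnull PolarisBaseEntity entity,
      boolean nameOrParentChanged,
      @Nullable PolarisBaseEntity originalEntity);

  // Aggregate delete that encapsulates handling of the separate slices.
  void deleteEntity(@Nonnull PolarisCallContext callCtx, @Nonnull PolarisBaseEntity entity);
}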
Thanks for this PR @dennishuo ! I think it is a step in the right direction for eventual NoSQL support :)
Some preliminary comments for now... more to come.
@@ -60,6 +60,10 @@ public CallContext resolveCallContext(
        .addKeyValue("headers", headers)
        .log("Resolving CallContext");

    // TODO: Once we have non-transactional-database persistence stores, this should be
    // pushed down for the metaStoreManagerFactory to inject Transactional-DB specific things
    // (including the MetaStoreSession" into the PolarisCallContext. The non-transactional
Why would PolarisCallContext have DB-specific stuff? How do you envision the purpose and use cases of PolarisCallContext?
The MetaStoreSessionFactory kind of implies at least baking in the realmContext during the construction of the handle into the persistence store. So in general one might decorate the handle into the persistence store with various elements of traceability/auditability, blast-radius reduction (e.g. if it could connect to the backend with some kind of scoped internal credentials), which is why the current architecture treats the handle into the persistence backend as being a property of the call context.
All that can be implemented as request-scoped beans, AFAIK 🤔 So, the "context" method parameters become injected fields or constructor parameters. Would that work?
Per discussion, we can continue fine-tuning the CallContext syntax in a follow-up.
 * null if this is expected to be a brand-new entity
 */
void writeEntity(
    @Nonnull PolarisCallContext callCtx,
I'm generally against "call contexts" as explicit method parameters (or thread locals) from the API design POV.
Is there a reason why call context objects cannot be injected into impl. classes (Request-scoped beans in CDI)?
I'm very against using ThreadLocals as well, but it seems to me the opposite of ThreadLocal is explicit method parameter. In general I like the approach in core Iceberg code of trying to be very explicit as much as possible and avoiding injection.
"RequestScoped" seems prone to many of the same problems as ThreadLocals, where the provision of state leaves too much to the magic framework.
The main use case I'm thinking of here though is that not all CallContexts necessarily come in through the framework; we have the whole Tasks subsystem where we want an explicit definition of the CallContext that is constructed proactively by the system-initiated task handler
Regarding tasks in general and the concrete use case of "purging tables", all that's eventually needed are some parameters. For "purge table" that's an object describing the table and some Iceberg parameters - the actual "purge" itself doesn't need persistence at all. What we eventually need is a robust tasks framework (#774) - what individual task "runnables" need and how they work is up to those. If those need some persistence interaction, we can always have factories.
Regarding the parameter, yeah, I also think it's not necessary. If an implementation needs it, it could take it as a constructor argument.
I do not agree that the CDI framework is "magic" :) Critical points are controlled by the application: either through annotations or producer methods.
That said, dealing with the "call context" stuff and the persistence API in the same PR is probably overkill. At the same time, I believe it is important to have as clean a persistence API as possible (i.e. a minimal set of parameters).
How about removing the call context from this interface and using specialization when integrating with existing callers?
For example: instead of persistence.writeEntity(ctx, ...), do persistenceProvider.forContext(ctx).writeEntity(...).
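A minimal sketch of that suggested shape, with hypothetical provider/handle names (none of this is existing Polaris code):

// Hypothetical sketch of the suggested specialization: bind the call context once,
// and return a handle whose methods no longer carry a context parameter.
public interface PersistenceProvider {
  ContextBoundPersistence forContext(PolarisCallContext ctx);
}

public interface ContextBoundPersistence {
  void writeEntity(
      PolarisBaseEntity entity, boolean nameOrParentChanged, PolarisBaseEntity originalEntity);
  // ...the remaining BasePersistence-style methods, minus the callCtx parameter...
}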
Yeah, I'm being half-facetious about the "magic" ;)
I try not to be a Luddite, but maybe I'm just old-fashioned. As the saying goes, "any sufficiently advanced injection of boilerplate code is indistinguishable from magic".
In this particular case though I was also spooked at how easily it could potentially slip through to accidentally inject something at the wrong scope: #1000 (comment)
Somehow the producer graphs were happy to inject an ApplicationScoped CallContext without any idea where it came from (!?)
inject an ApplicationScoped CallContext without any idea where it came from (!?)
Nope - producers are not ambiguous. If there is an ambiguity, Quarkus will fail the build. (Weld fails at runtime.)
I think you're explaining that each bean can only have one producer, but that's not the kind of ambiguity referred to above. If you have a long chain of method calls where A.x calls B.y which calls C.z and so on, it can often be unclear at what point in time some object that's used in C.z was actually produced by the injection framework. It's ambiguous where & when the object was constructed, not what method actually did the construction.
From a different angle: can PolarisCallContext contain all the necessary information for all possible implementations of BasePersistence? I'd like to avoid it becoming a union of all possible dependencies.
Let's continue fine-tuning the CallContext syntax in the follow-up.
 * @param grantRec entity record to write, potentially replacing an existing entity record with
 *     the same key
 */
void writeToGrantRecords(
It looks like grant records are different from "entities", right? If so, why not have a separate persistence interface for them?
It might be possible, but we'll have to think more about what kind of consistency guarantees we want to preserve given the upper layer's ResolvedEntity semantic, which is a view of the entity with its associated grants all together.
In practice, right now to preserve the intended behavior of Polaris' grant-based authorization, grants and entities are tightly coupled under the same implied persistence store with interrelated ids, grantRecordsVersions etc.
Perhaps if we really want to extract the GrantsManager to be 100% independent, we might want to introduce a different construct, like ForeignGrantRecords or something, where the referential integrity is no longer based on a shared persistence layer. I think that's a plausible option, but we can portray that as "new behavior" while we adapt the existing behavior as-is.
I foresee the need to have pluggable authZ. It doesn't necessarily need to be this specific RBAC. It could be ABAC, another RBAC or even "noop". WDYT about making the current RBAC objects just entities?
I don't think we are tied to a shared persistence layer. The foreign key constraints are implied, not explicit, so the grant records don't actually need to point to a valid entity id (valid from the grant record's persistence layer's perspective). What matters is that the grant version of the entity changes when the grant records are changed, but even that is not guaranteed in a non-transactional database system. I.e., there is no CAS that guarantees grant records are persisted and that the entity grant version changes atomically.
Introducing a ForeignGrantRecords concrete type means that the authorizer has to handle both native grants and external grants, which is more code to maintain and test.
grant version of the entity changes when the grant records are changed
Very interesting... I think I missed that point before. Could someone point to current code that relies on this?
Line 403 in bd27a2e
// grants have changed, we need to bump-up the grants version
granteeEntity.setGrantRecordsVersion(granteeEntity.getGrantRecordsVersion() + 1);
this.writeEntity(callCtx, ms, granteeEntity, false);
Which impacts Resolver:
polaris/polaris-core/src/main/java/org/apache/polaris/core/persistence/resolver/Resolver.java
Line 559 in bd27a2e
if (versions == null
    || entity.getEntityVersion() != versions.getEntityVersion()
    || entity.getGrantRecordsVersion() != versions.getGrantRecordsVersion()) {
  // if null version we need to invalidate the cached entry since it has probably been
  // dropped
  if (versions == null) {
    if (this.cache != null) {
      this.cache.removeCacheEntry(resolvedEntity);
    }
    refreshedResolvedEntity = null;
  } else {
    // refresh that entity. If versions is null, it has been dropped
    if (this.cache != null) {
      refreshedResolvedEntity =
          this.cache.getAndRefreshIfNeeded(
              this.polarisCallContext,
              entity,
              versions.getEntityVersion(),
              versions.getGrantRecordsVersion());
    } else {
      ResolvedEntityResult result =
          this.polarisMetaStoreManager.refreshResolvedEntity(
              this.polarisCallContext,
@dennishuo : lots of comments :) but do not feel compelled to address them all in this PR. Many are discussion points. Many can be addressed later as long as we're in agreement about the general direction.
Thanks for starting this refactoring!
The only kind of critical point from my POV is dealing with multi-entity changes as a first-class persistence API feature.
/**
 * Delete Polaris entity and grant record metadata from all tables. This is used during metadata
 * bootstrap to reset all tables to their original state
What is this "original state"?
Changed this comment to just say "...reset all tables to an empty state"; any further bootstrapping state would be defined within shared business logic in the PolarisMetaStoreManagerImpl or BaseMetaStoreManager layer, but this "persistence" layer doesn't need to worry about it.
Sorry, I'm still confused... Would the "list" methods find anything after deleteAll?
No, everything should be 100% empty after deleteAll.
Ok, that's what I expected :) I guess it's just a matter of rephrasing the javadoc :)
 *
 * @param callCtx call context
 */
void deleteAll(@Nonnull PolarisCallContext callCtx);
What is the scope of "all"? Realm?
Yes. Updated comment to say "within the realm defined by the contents of the {@code callCtx}".
Practically we need two very different "purge realm" operations. "Prefix key deletion" and similar isn't supported by all databases. For databases that don't support "prefix key deletion" we'd have to scan the database and delete matching rows.
For databases that do support it, deleting a whole realm can be very time consuming (think: transaction timeouts, aborted requests, etc).
What I'm thinking of is realm-state management: created (populate all the things) -> active (usable) -> inactive (just disabled) -> purging (realm can be purged) -> purged. This would give us some flexibility and I think better guarantees. When a realm's created, a bunch of things need to happen - with pluggable authZ for example even things we don't know - the "realm initial population" can be performed by the things that "core Polaris" needs plus things that plugins need.
To delete and eventually purge a realm, it should be set to "inactive" first to prevent having intermediate dirty states and requests running against a realm that's being purged (or marked as such). A regularly run maintenance job could then do the actual cleanup.
WDYT?
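For illustration, the proposed lifecycle could be modeled roughly as the following enum; the name, constants, and comments are a sketch of the states described above, not existing code.

// Sketch of the proposed realm lifecycle states; not part of the current codebase.
public enum RealmState {
  CREATED,   // being populated: core Polaris things plus plugin-specific initialization
  ACTIVE,    // usable; requests are served against the realm
  INACTIVE,  // just disabled; no requests served, data still present
  PURGING,   // marked for cleanup; a maintenance job may delete data incrementally
  PURGED     // all data removed
}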
Yeah, I suspect there will be a lot more adjustments to the common bootstrap and purge code still to come, especially when it comes to async purge and maybe having a safe retention period before the final purge; we can continue to refine in the follow-up.
long catalogId,
long parentId,
@Nonnull PolarisEntityType entityType,
int limit,
What's the use for limit when the caller cannot resume listing later?
We'll need a pageToken for real pagination, agreed. Right now it looks like the upper layer uses it basically for "hasChildren" checks.
Looks like you can replace the last 3 parameters, if this returns a Stream
For pagination we'd need a "proper" page token that guards against changes between any two pages, to make it impossible that the same entries are returned in more than one page.
For posterity, assume there are 5 tables:
- table-A
- table-C
- table-E
- table-G
- table-I
Assume a page-size of 2.
The first request yields
- table-A
- table-C
Then "table-B" is created
The second request should yield
- table-E
- table-G
- table-I
However, org.apache.iceberg.rest.CatalogHandlers#paginate uses integer indexes, so the second request would actually yield "table-C" again:
- table-C
- table-E
- table-G
- table-I
Implying that all list* calls return results in a deterministic order (e.g. ascending by name).
See #273 for a pagination implementation that's not based on the integer index
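For illustration only, a minimal sketch of a name-based continuation token, assuming listings come back sorted ascending by name as described above (this is not the #273 implementation, and the class/method names are made up):

import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch: the page token is simply the last name returned, and the next page
// starts strictly after it. Unlike integer indexes, a newly created "table-B" cannot
// cause "table-C" to be returned twice.
public final class NamePaginationSketch {
  public static List<String> nextPage(List<String> namesSortedAscending, String afterToken, int pageSize) {
    return namesSortedAscending.stream()
        .filter(name -> afterToken == null || name.compareTo(afterToken) > 0)
        .limit(pageSize)
        .collect(Collectors.toList());
  }
}
// Page 1 over [table-A, table-C, table-E, table-G, table-I] with page size 2 yields
// [table-A, table-C] and token "table-C"; even after "table-B" is created, page 2 is
// [table-E, table-G].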
 * @param parentId id of the parent, either a namespace or a catalog
 * @return true if the parent entity has children
 */
boolean hasChildren(
Why is it critical to have this as a dedicated API method?
Good question. Not sure, for now just preserving behavior. Looks like there's a mix of using hasChildren and listEntities(limit...).
If you do not mind, let's remove it to reduce the API breadth. We can always add something later if this use case needs to be optimized... I kind of doubt that it will even come to that, because deleting namespaces is less frequent than getting the children of a namespace.
 * @return a storage integration object
 */
@Nullable
<T extends PolarisStorageConfigurationInfo> PolarisStorageIntegration<T> createStorageIntegration(
Can PolarisStorageConfigurationInfo be a simple entity in BasePersistence?
Yeah, that was actually one of the original designs, and the current Catalog nesting was thought of as a "simplification" to define "inline StorageConfigurationInfo".
In theory they can be first-class entities and we can have catalogs take referential ids/uris to the configuration info.
Why does persistence need to know about object-storage at all? I think persistence and object-storage are very different things. This function and loadPolarisStorageIntegration are factories for object-storage specific functionality.
Yeah, we might be able to lift these one or two layers higher up in followups.
 * @param principalId principal id
 */
@Nonnull
PolarisPrincipalSecrets generateNewPrincipalSecrets(
I'd think secret generation should not be a persistence concern.
Persistence needs to persist a referential ID though
What do you mean by "persist a referential ID"?
This can be better standardized in the next few follow-ups, especially as we also add a ConnectionConfigurationInfo for Catalog Federation, but in general the idea is to leave room (maybe optionally) for the lower-level customizable logic to add persistence-related book-keeping, if secrets or integrations create heavyweight resources.
In OpenCatalog, for example, the machinery might create an "OauthIntegration" in a separate subsystem to serve each new Polaris PrincipalEntity, and there would be a unique integrationId kept track of within the internalProperties of the PrincipalEntity, and likewise the principalId would be stored in the external OAuthIntegration.
This does start getting into the realm of "type-specific extensions" to the underlying persistence model, as suggested in Yufei's proposal, but in a more limited/low-level scope.
Generally I'd agree though that these interfaces could use some refinement. I'm thinking one of our follow-up refactors can focus on this "integration provider" concept to make it more natural across implementations. Maybe it's not strictly a "persistence concern" ultimately, but just a "pluggable integration object framework that may interact with the core persistence model".
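As a purely hypothetical illustration of that cross-referencing bookkeeping (class names and property keys made up for the example, not existing Polaris code):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the principal keeps the external integration's id in its internal
// properties, and the external integration keeps the principal id, so either side can be
// reconciled later. None of these names exist in Polaris today.
final class OauthIntegrationRecord {
  final String integrationId;
  final long servedPrincipalId;

  OauthIntegrationRecord(String integrationId, long servedPrincipalId) {
    this.integrationId = integrationId;
    this.servedPrincipalId = servedPrincipalId;
  }
}

final class PrincipalIntegrationBookkeepingSketch {
  public static void main(String[] args) {
    long principalId = 42L;
    // The external subsystem records which principal it serves...
    OauthIntegrationRecord integration = new OauthIntegrationRecord("oauth-abc123", principalId);
    // ...and the principal's internal properties record the integration id.
    Map<String, String> principalInternalProperties = new HashMap<>();
    principalInternalProperties.put("oauth-integration-id", integration.integrationId);
    System.out.println(principalInternalProperties);
  }
}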
 * <p>Note that APIs to the actual persistence store are very basic, often point read or write to
 * the underlying data store. The goal is to make it really easy to back this using databases like
 * Postgres or simpler KV store.
 * Extends BasePersistence to express a more "transaction-oriented" control flow for backing stores
Can we rename the class to something like TransactionalMetastore or TransactionOrientedMetaStore? I'd also suggest moving it to a separate package, like org.apache.polaris.core.persistence.transactional, so that it provides a clear structure for other types of metastore to extend.
Done
…ameLookupRecord, remove unused method
…nto a new package "transactional".
Thanks @dennishuo! I think it's a good starting point.
@Nonnull PolarisCallContext callCtx,
long catalogId,
long parentId,
int typeCode,
the spec does allow things of different types to have the same name
I suspect you refer to the Iceberg APIs/spec. IMO it's rather a mistake in Iceberg to allow having a view and a table with the exact same name - just thinking about: "SELECT * from this_thing" ... is it a table or a view or a UDF? OTOH, there are checks that prevent creating a table using the same name as an existing view.
long catalogId,
long parentId,
int typeCode,
@Nonnull String name);
The (most dominant) caller of this is when paths within a catalog are resolved.
We already have all "exploded paths" available at the call site, so I guess it's more ergonomic to delegate the resolution of these "exploded paths" to something specialized for that. The nice side effect would be that call sites wouldn't have to deal with parent-child relationships but only with paths.
@Nullable
PolarisBaseEntity lookupEntityByName(
    @Nonnull PolarisCallContext callCtx,
    long catalogId,
Isn't catalogId superfluous? Or do multiple catalogs have overlapping catalog-object IDs? IIRC all IDs are unique? Removing this parameter would also remove the need for the special NULL_ID handling, and PolarisEntityId could also go away.
 * which can support a runInTransaction semantic, while providing default implementations of some of
 * the BasePersistence methods in terms of lower-level methods that subclasses must implement.
 */
public abstract class TransactionalPersistence implements BasePersistence, IntegrationPersistence {
The common denominator for atomic operations in non-transactional databases is a CAS on a single row/key. Like "fetch all things", then "check all requirements", then "persist all the changes" and eventually "perform a single CAS" to perform the atomic change. How would this be implemented for example for a "rename table" operation or a "multi table commit" involving all the changes to the individual entities and consistency guarantees?
As discussed, I think one main next step after getting this initial restructuring merged will be to push down the "atomicBatchConditionalWrite" method and make the multi-table control flow plumb through to that.
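For illustration, such a pushed-down method might look roughly like the following; the name follows the comment above, while the exact signature, return type, and semantics are assumptions (Polaris-type imports omitted):

import java.util.List;

// Hypothetical sketch of a batch conditional write pushed down into the persistence layer.
// The caller supplies the entity states it based its changes on; the backend applies all
// writes or none, e.g. by funnelling the whole batch through a single CAS on one row/key.
public interface AtomicBatchWriteSketch {
  /**
   * @param callCtx call context
   * @param originalEntities the states the caller read; the write must fail if any of them
   *     changed in the meantime
   * @param entitiesToWrite the new states to persist atomically as one unit
   * @return true if the whole batch was applied, false if a conflicting change was detected
   */
  boolean atomicBatchConditionalWrite(
      PolarisCallContext callCtx,
      List<PolarisBaseEntity> originalEntities,
      List<PolarisBaseEntity> entitiesToWrite);
}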
import org.apache.polaris.core.persistence.IntegrationPersistence;

/**
 * Extends BasePersistence to express a more "transaction-oriented" control flow for backing stores
I have a concern with this approach, as I put in https://lists.apache.org/thread/rf5orxs815zs4h64p4rwp03q3pbgxb5r
TL;DR: If we go with the semantics of "change is durable after a persistence API method call returns", then any kind of API grouping (e.g. a few operations in the same Tx) is not logically correct, because the caller of the "wrapped" persistence deals with the same interface and can expect each call's changes to be durable independently of other calls' changes. This does not hold when actual changes are committed only at the end of a transaction.
To clarify: I support transaction persistence implementations in Polaris :) All I'm saying is that the transaction aspects should not be exposed in the persistence API, but should be handled privately by the transactional persistence impl.
Yeah it's a good point. Right now, before all the refactoring, I guess the upper layer (PolarisMetaStoreManagerImpl) is the ultimate authority on the unit of durability, and a custom impl of the traditional PolarisMetaStoreSession might "enforce" this by having ensureInWriteTransaction barriers, for example (OpenCatalog does this in its custom metastore session impl).
After this refactoring, I think it should be possible to unify under the semantic of "each method call is independently durable/committed", with the addition of the new atomicBatchConditionalWrite method (or whatever we call it).
As it stands in this PR, after pushing down the entitiesActive and entitiesChangeTracking stuff, the vast majority of the functionality no longer truly depends on the external layer defining the durability unit as a transaction, so we just need to chip away a few more things.
The main exceptions are called out in https://docs.google.com/document/d/1U9rprj8w8-Q0SnQvRMvoVlbX996z-89eOkVwWTQaZG0/edit?tab=t.0#heading=h.2cx2yoanf9c4
- GrantRecords - we can sacrifice perfect "atomicity" in favor of slightly weaker but still good enough "happens-before" semantics, and just document the implications
- Types that create external "integrations" and other peripheral entities (Catalogs, Principals, Drop -> Tasks) - also sacrifice atomicity in favor of multi-phase commit types of behaviors and/or happens-before guarantees.
Thanks again @dennishuo for starting the refactoring effort!
From my POV we can probably merge this PR in its current state and continue this effort in follow-up PRs.
 *
 * @param callCtx call context
 * @param entity entity to persist
 * @param nameOrParentChanged if true, also write it to by-name lookups if applicable
"by-name" lookups looks like the interface is making too many assumptions about the implementation.
Can we say that the caller is responsible for setting this flag in case the parent of the name of the entity changes so that the implementation could use it as a hint to perform additional validation. In any case, the implementation is responsible for ensuring that names are unique within each namespace and that the same object cannot be found under more than one parent.
WDYT?
Ah sorry, missed this one before merging; as you suggested in another comment as well, I'll look to revamp all the javadoc comments in the newly extracted interfaces to better reflect the latest thinking and avoid references to old implementation details
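For reference, the suggested wording could fold into the javadoc roughly like this; it is only a sketch, and the parameter list is approximated from the diff excerpts above rather than copied from the merged code.

/**
 * Persists the given entity.
 *
 * @param callCtx call context
 * @param entity entity to persist
 * @param nameOrParentChanged hint set by the caller whenever the entity's name or parent
 *     changes; implementations may use it to perform additional validation, but remain
 *     responsible for keeping names unique within a namespace and for ensuring the same
 *     entity is never reachable under more than one parent
 * @param originalEntity the previously persisted state, or null for a brand-new entity
 */
void writeEntity(
    @Nonnull PolarisCallContext callCtx,
    @Nonnull PolarisBaseEntity entity,
    boolean nameOrParentChanged,
    @Nullable PolarisBaseEntity originalEntity);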
…etaStoreManager to isolate all "transaction" behaviors (apache#1070)

* Restructure persistence class hierarchy and remove vestigial interfaces

  Extract a basic abstract base class BaseMetaStoreManager which can hold shared logic between implementations of PolarisMetaStoreManager

* Remove all "entitiesDropped" members; these were vestigial from trying to implement UNDROP but for now are entirely unused. We can reintroduce it with a better design against multiple backends when/if we want to implement UNDROP.

* Extract BasePersistence interface as parent interface of PolarisMetaStoreSession; only leave the transaction-specific methods in PolarisMetaStoreSession

* Push all evidence of the two-phase lookupEntityByName into only the transactional-style PolarisMetaStoreSession, so that BasePersistence properly exposes a lookupEntityByName method where impls that use secondary indexes can easily just lookup an entity by name instead of doing two lookups.

* Turn PolarisMetaStoreSession into an abstract class and make lookupEntityActive protected-visibility; remove all callsites where PolarisMetaStoreManagerImpl calls it. Technically, while in the same package this doesn't prevent it from leaking, but we could reposition PolarisMetaStoreSession into a separate transaction-specific package to help protect it from leaking the lower-level abstractions.

* Pushdown all calls to writeToEntities into PolarisMetaStoreSession, and add writeEntity method to BasePersistence, with a default impl in PolarisMetaStoreSession containing what was previously in PolarisMetaStoreManagerImpl. This now protects all writes in PolarisMetaStoreManagerImpl from dealing with the three-table implementation detail. Technically slightly changes the ordering of updates within a transaction for renameEntity, but is arguably a more correct ordering, and the ordering doesn't interleave reads anyways.

* Add originalEntity to the writeEntity method to enable compare-and-swap behavior from the underlying BasePersistence. Pushdown all the deleteFromEntities* methods into PolarisMetaStoreSession and add deleteEntity to BasePersistence which encapsulates handling the separate slices.

* Break out external-integration related methods from BasePersistence into a new IntegrationPersistence interface; these methods encapsulate certain type-specific behaviors that are indirectly tied to persistence entities, such as principal secrets, storage integrations, etc.

* Improve javadoc comments, rename PolarisEntityActiveRecord to EntityNameLookupRecord, remove unused method

* Rename PolarisMetaStoreSession to TransactionalPersistence and move into a new package "transactional".

* Also move the PolarisTreeMap* classes into the transactional package

* Move PolarisMetaStoreManagerImpl into the "transactional" package per PR suggestion
Though this isn't 100% of the persistence refactor yet for unlocking non-transactional databases, this gets the main structure of the lower-level store layer mostly settled and possibly ready for parallel implementation work.
Fully remove the vestigial "entitiesDropped" methods; these were originally for UNDROP but ended up not being used; they can be reintroduced with better design in the future if we want to add UNDROP.
Extract BasePersistence interface as parent interface of PolarisMetaStoreSession, which now becomes an abstract class; only leave the transaction-specific methods in PolarisMetaStoreSession.
Eliminate all direct interaction with the "triplet tables" (entities/entitiesActive/entitiesChangeTracking) from PolarisMetaStoreManagerImpl in favor of aggregate methods declared in BasePersistence, and have the abstract base class PolarisMetaStoreSession handle the juggling of the three related tables. This includes lookupEntities[Active], writeToEntities[Active], deleteFromEntities[Active], etc.
The new writeEntity method in BasePersistence now also includes originalEntity to make it more explicit what to use for the compare-and-swap. Technically we 90% only need the originalEntity.getEntityVersion() and originalEntity.getGrantRecordsVersion(), and in just the renameEntity case we also need to use originalEntity.getParentId() and originalEntity.getName(), but for now it doesn't hurt to just plumb in the entire original entity.
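For illustration, a backend making use of originalEntity for that compare-and-swap might perform a check along these lines; this is a sketch rather than the merged code, using only the getters named above.

// Sketch: the version fields drive the compare-and-swap, while renameEntity additionally
// uses the original parent id and name to locate the old by-name record.
static boolean versionsUnchanged(PolarisBaseEntity current, PolarisBaseEntity original) {
  return current.getEntityVersion() == original.getEntityVersion()
      && current.getGrantRecordsVersion() == original.getGrantRecordsVersion();
}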