Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FederatedGraphStore double-caching causes desync issues in replicated deployment #2457

Closed
sw96411 opened this issue Jul 5, 2021 · 11 comments · Fixed by #2595
Closed

FederatedGraphStore double-caching causes desync issues in replicated deployment #2457

sw96411 opened this issue Jul 5, 2021 · 11 comments · Fixed by #2595
Assignees
Labels
bug Confirmed or suspected bug federated-store Specific to/touches the federated-store module

Comments

@sw96411
Copy link
Contributor

sw96411 commented Jul 5, 2021

FederatedGraphStore maintains both a uk.gov.gchq.gaffer.cache.Cache of the various Graphs managed by a federated store, and it's own private Map<FederatedAccess, Set> named 'storage' (the latter storing deserialised objects).

There is no mechanism to ensure that the Map reflects any changes in the Cache. This could be caused, for instance, by a deployment scenario where the customer is running multiple instances of Gaffer and trying to use a shared CacheLoader implementation to share data between them. This is demonstrated (mostly. Sorta) by the unit test in sw96411@15a2cdd

The trivial solution is simply to abandon the use of the private Map and always deserialise the objects from the Cache on demand. I'm reluctant to suggest this, though, as it would greatly increase the cost of calling any of the methods on FederatedStore.

Other options available include refreshing the Map from the Cache at a maximum frequency (even once per second would in theory save a lot of deserialisation), writing a serial number to the cache on modification and then testing this serial number on read, or creating a specialisation of Cache which can record the last time the Cache (or part thereof) was modified and then delegating the handling of this issue off to that somehow.

I'd be happy to PR in a solution based on any of the above suggestions, but before I do, I wanted to check with the collective:

  1. Is this scenario (multiple "hot spare" instances of Gaffer collaborating around a shared cache) even meant to be supported?
  2. How does this issue interact with FederatedStore ChangedGraphId with updated Cache will result in an orphaned table of data. #2435? Is this effectively a duplicate?

Tagging in @GCHQDev404 as I see they have made commits to related issues and @m29827
as they were the last editor on FederatedGraphStorage.

Grateful for thoughts from any of the community, of course!

@GCHQDev404 GCHQDev404 reopened this Jul 5, 2021
GCHQDev404 added a commit that referenced this issue Jul 5, 2021
…ion with FederatedGraphStorage and further testing.
@GCHQDev404
Copy link
Contributor

GCHQDev404 commented Jul 5, 2021

  1. sharing caches is not supported, hasn't been a requirement. I would need some more information in and around why a shared cache would be required between multiple Gaffer Instances.

  2. There is PR for FederatedStore ChangedGraphId with updated Cache will result in an orphaned table of data. #2435 (it passes locally) any work carried out on this branch likely will need to merged.

@GCHQDev404
Copy link
Contributor

GCHQDev404 commented Jul 5, 2021

examine commit 1324d92

GCHQDev404 added a commit that referenced this issue Jul 5, 2021
…ion with FederatedGraphStorage and further testing.
@sw96411
Copy link
Contributor Author

sw96411 commented Jul 6, 2021

Thanks @GCHQDev404. For clarity, I think instead of "Shared Caches" I should probably have said something like "CacheLoaders with a common persistent store".

This is required whenever you have multiple instances of gaffer running over the same data for resiliency or scalability. I.e. imagine you have at least two instances of Gaffer behind a load balancer, running over the same data. A user creates a graph on instance A, and then another user queries for the list of all graphs on instance B. We want (with maybe some acceptable level of lag) the user querying on B to be aware of the graph created on A.

Again, I'm still not 100% sure this is something Gaffer actually claims to support right now, although it seems like a reasonable requirement. But might make this a feature request rather than a bugfix and hence affect priority.

Re: the included commit. Thanks v much for this, it's the kind of thing I was thinking of. One thought: If a graph has been changed, rather than added or deleted (or a graph has been deleted and then another with the same name added), then will this cause OverWritingExceptions from the checkExisting and validateExisting methods?

Sorry for not giving you a unit test for this, I'm writing this between meetings but will try to drop one in later today if it's still relevant

(ETA clarification on "Shared Caches")

GCHQDev404 added a commit that referenced this issue Jul 11, 2021
…hronisation with FederatedGraphStorage and further testing."

This reverts commit 1324d92
GCHQDev404 added a commit that referenced this issue Jul 11, 2021
@GCHQDev404
Copy link
Contributor

@sw96411 Please examine the latest branch. https://github.com/gchq/Gaffer/tree/gh-2457-double-caching-issue
It has 1 remaining failing test, that requires fixing.
But the solution has removed the inner storage map from FederatedGraphStorage and I think is 99% complete.
It also contains the Table renaming branch, this might need to be teased out before merging.

@sw96411
Copy link
Contributor Author

sw96411 commented Jul 16, 2021

Just leaving a quick note based on our offline conversations to say that following an e2e test, this code in itself looks good, but performance issues arise, at least partly due to #2478. I think we suggested looking to minimise the frequency with which we call .getGraph() on GraphSerialisables in this code as part of the solution, but it's the E2E performance the matters rather than any particular solution

GCHQDev404 added a commit that referenced this issue Jul 16, 2021
GCHQDev404 added a commit that referenced this issue Jul 16, 2021
@GCHQDev404
Copy link
Contributor

Branch has been refactored, JobTracker cache has been changed to accept a naming suffix the same as FederatedStorageCache.

Please examine https://github.com/gchq/Gaffer/compare/gh-2457-double-caching-issue

@GCHQDev404
Copy link
Contributor

This contains many breaking changes and will need to be reviewed before merging into V1 Gaffer.
likely will need to be V2.

@sw96411
Copy link
Contributor Author

sw96411 commented Aug 13, 2021

Just a quick note to say that the code looks good and I've had a quick test against my team's dev environment. Although I wasn't able to do a full E2E test due to dependency issues, I was able to validate that the performance issues mentioned above no longer occur. I think this code is ready to enter the formal merge/test pipeline at the author's convenience

@sw96411
Copy link
Contributor Author

sw96411 commented Aug 13, 2021

P.S. Thanks v much @GCHQDev404 for the excellent work on this issue :-)

@n3101 n3101 added the federated-store Specific to/touches the federated-store module label Dec 7, 2021
@n3101 n3101 modified the milestones: v2.0.0, v2.0.0-alpha-0.3 Jan 19, 2022
GCHQDev404 added a commit that referenced this issue Feb 28, 2022
…nto gh-2457-double-caching-issue-merge-alpha1

# Conflicts:
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorageTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorageTraitsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreToFederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedAdminIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphWithHooksHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAggregateHandlerTest.java
GCHQDev404 added a commit that referenced this issue Jul 25, 2022
…' into gh-2457-double-caching-issue-merge-alpha3

# Conflicts:
#	core/graph/src/main/java/uk/gov/gchq/gaffer/graph/Graph.java
#	core/graph/src/main/java/uk/gov/gchq/gaffer/graph/GraphSerialisable.java
#	core/graph/src/test/java/uk/gov/gchq/gaffer/graph/GraphSerialisableTest.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedStore.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/FederatedAddGraphHandlerParent.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorageTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreAuthTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreCacheTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreGetTraitsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreGraphVisibilityTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStorePublicAccessTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreToFederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreWrongGraphIDsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedAdminIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphWithHooksHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAggregateHandlerTest.java
@GCHQDev404 GCHQDev404 self-assigned this Sep 22, 2022
GCHQDev404 added a commit that referenced this issue Oct 23, 2022
…ha' into gh-2457-double-caching-issue-merge-alpha3

# Conflicts:
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedStore.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/FederatedOperationChainValidator.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedOperationHandler.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorageTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreDefaultGraphsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedAdminIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedStoreRecursionIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/FederatedOperationHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphWithHooksHandlerTest.java
GCHQDev404 added a commit that referenced this issue Oct 23, 2022
…ha' into gh-2457-double-caching-issue-merge-alpha3

# Conflicts:
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedStore.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/FederatedOperationChainValidator.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedOperationHandler.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorageTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreDefaultGraphsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedAdminIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedStoreRecursionIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/FederatedOperationHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphWithHooksHandlerTest.java
GCHQDev404 added a commit that referenced this issue Oct 23, 2022
…ha' into gh-2457-double-caching-issue-merge-alpha3

# Conflicts:
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedStore.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/FederatedOperationChainValidator.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedOperationHandler.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorageTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreDefaultGraphsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedAdminIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/integration/FederatedStoreRecursionIT.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/FederatedOperationHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphHandlerTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedAddGraphWithHooksHandlerTest.java
GCHQDev404 added a commit that referenced this issue Oct 25, 2022
@t92549 t92549 linked a pull request Oct 26, 2022 that will close this issue
GCHQDev404 added a commit that referenced this issue Oct 27, 2022
…hanging GraphSerialisable causes backwards compatability issues.
GCHQDev404 added a commit that referenced this issue Oct 28, 2022
GCHQDev404 added a commit that referenced this issue Oct 31, 2022
GCHQDev404 added a commit that referenced this issue Oct 31, 2022
GCHQDev404 added a commit that referenced this issue Oct 31, 2022
@GCHQDev404
Copy link
Contributor

Migration

The Old cache will not work because the name of the cache will not be found/used.
Solution. delete the cache directory. turn on the FederatedStore it will be empty, re-connect to stores.
Store will continue to point to persistant data. (Proxy and Accumulo.)

GCHQDev404 added a commit that referenced this issue Nov 2, 2022

* gh-2457-double-caching-issue weak initial step, requires synchronisation with FederatedGraphStorage and further testing.

* gh-2457-double-caching-issue remove FederatedGraphStorage local map, using cache only.

* gh-2457-double-caching-issue remove FederatedGraphStorage test fixes

* gh-2457-double-caching-issue remove FederatedGraphStorage review.

* gh-2457-double-caching-removing-graphstorage minimising use of GraphSerialisable.getGraph()

* gh-2457-double-caching-removing-graphstorage gh-2478 JobTracker cache can have Suffix name.

* gh-2457 double caching issue fix for persisting graph names in tests.

* Merge remote-tracking branch 'origin/v2-alpha' into gh-2357-federatedstore-federated-operation-merge-alpha2
!!!With 1 failing class of Tests!!!

* gh-2457 GraphSerialisable not being able to Mock has failing tests. changing GraphSerialisable causes backwards compatability issues.

* gh-2457 Fixed GraphSerialisable equals.

* gh-2457 checkstyle

* gh-2457 PR requests.

* gh-2457 PR requests.

* gh-2457 PR requests.
@n3101
Copy link

n3101 commented Nov 2, 2022

Closed by #2595

@n3101 n3101 closed this as completed Nov 2, 2022
GCHQDev404 added a commit that referenced this issue Nov 2, 2022
GCHQDev404 added a commit that referenced this issue Nov 2, 2022
…re-RemoveGraphAndDeleteAccumuloTable

# Conflicts:
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
GCHQDev404 added a commit that referenced this issue Nov 2, 2022
…re-federated-operation-merge-mapping

# Conflicts:
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedGraphStorage.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedStore.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreProperties.java
#	store-implementation/federated-store/src/main/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/impl/FederatedOperationHandler.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreDefaultGraphsTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/FederatedStoreTest.java
#	store-implementation/federated-store/src/test/java/uk/gov/gchq/gaffer/federatedstore/operation/handler/FederatedOperationHandlerTest.java
@t92549 t92549 added the bug Confirmed or suspected bug label Nov 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed or suspected bug federated-store Specific to/touches the federated-store module
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants