(8/8) Persist changes to ProtectedStoragePayload objects implementing PersistablePayload #3640
Conversation
Instead of using a subclass that overrides a value, use Guice to inject the real value of 10000 in the app and let the tests override it with their own.
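A minimal sketch of the pattern, assuming hypothetical names (the annotation label and consuming class below are illustrative, not Bisq's actual identifiers):

```java
import com.google.inject.AbstractModule;
import com.google.inject.Inject;
import com.google.inject.name.Named;
import com.google.inject.name.Names;

// Hypothetical consumer of the injected constant.
class SequenceNumberStore {
    private final int maxMapSize;

    @Inject
    SequenceNumberStore(@Named("MAX_MAP_SIZE") int maxMapSize) {
        this.maxMapSize = maxMapSize;
    }
}

// The app module binds the real value...
class AppModule extends AbstractModule {
    @Override
    protected void configure() {
        bindConstant().annotatedWith(Names.named("MAX_MAP_SIZE")).to(10000);
    }
}

// ...and tests install their own module with a small value instead of subclassing.
class TestModule extends AbstractModule {
    @Override
    protected void configure() {
        bindConstant().annotatedWith(Names.named("MAX_MAP_SIZE")).to(5);
    }
}
```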
Remove unused imports and clean up some access modifiers now that the final test structure is complete
Previously, this interface was called each time an item was changed. This required listeners to understand the performance implications of multiple adds or removes in a short time span. Instead, give each listener the ability to process a list of added or removed entries, which helps them avoid performance issues. This patch is just a refactor: each listener is still called once for each ProtectedStorageEntry. Future patches will change this.
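The resulting listener shape might look like the sketch below; the method names follow the text, but Bisq's actual HashMapChangedListener may differ in detail (ProtectedStorageEntry refers to Bisq's existing type):

```java
import java.util.Collection;

interface HashMapChangedListener {
    // One callback per batch instead of one callback per entry.
    void onAdded(Collection<ProtectedStorageEntry> addedEntries);
    void onRemoved(Collection<ProtectedStorageEntry> removedEntries);
}
```

A call site that changes a single item keeps its old behavior by passing a singleton, e.g. `listener.onRemoved(Collections.singletonList(entry));`, until the expire path starts batching.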
Minor performance overhead for constructing MapEntry and Collections of one element, but keeps the code cleaner and all removes can still use the same logic to remove from map, delete from data store, signal listeners, etc. The MapEntry type is used instead of Pair since it will require less operations when this is eventually used in the removeExpiredEntries path.
…batch All current users still call this one-at-a-time, but it gives the expire code path the ability to remove in a batch.
This will cause HashMapChangedListeners to receive just one onRemoved() call for the expire work instead of an onRemoved() call for each item. This required some updates to the remove validation in tests so that it correctly compares onRemoved with multiple items.
…ch removes bisq-network#3143 identified an issue where tempProposals listeners were being signaled once for each item that was removed during the P2PDataStore operation that expired old TempProposal objects. Some of the listeners are very expensive (ProposalListPresentation::updateLists()), which results in large UI performance issues. Now that the infrastructure is in place to receive updates from the P2PDataStore in a batch, the ProposalService can apply all of the removes received from the P2PDataStore at once. This results in only one onChanged() callback for each listener. The end result is that updateLists() is only called once, and the performance problems are reduced. This removes the need for bisq-network#3148; those interfaces will be removed in the next patch.
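A hedged sketch of what such a batched listener might do; `tempProposals`, `TempProposalPayload`, and `getProposal()` are stand-ins inferred from the discussion rather than Bisq's exact API:

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public void onRemoved(Collection<ProtectedStorageEntry> removedEntries) {
    List<Proposal> removedProposals = removedEntries.stream()
            .map(ProtectedStorageEntry::getProtectedStoragePayload)
            .filter(payload -> payload instanceof TempProposalPayload)
            .map(payload -> ((TempProposalPayload) payload).getProposal())
            .collect(Collectors.toList());
    // One removeAll() -> one list-change event -> one updateLists() call,
    // instead of one expensive callback per removed item.
    tempProposals.removeAll(removedProposals);
}
```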
Now that the only user of this interface has been removed, go ahead and delete it. This is a partial revert of f5d75c4 that includes the code that was added into ProposalService that subscribed to the P2PDataStore.
Write a test that shows the incorrect behavior for bisq-network#3629, the hashmap is rebuilt from disk using the 20-byte key instead of the 32-byte key.
Addresses the first half of bisq-network#3629 by ensuring that the reconstructed HashMap always has the 32-byte key for each payload. It turns out the TempProposalStore persists the ProtectedStorageEntrys on-disk as a List and doesn't persist the key at all. Then, on reconstruction, it creates the 20-byte key for its internal map. The fix is to update the TempProposalStore to use the 32-byte key instead. This means that all writes, reads, and reconstruction of the TempProposalStore use the 32-byte key, which matches perfectly with the in-memory map of the P2PDataStorage that expects 32-byte keys. It is important to note that until all seednodes receive this update, nodes will continue to have both the 20-byte and 32-byte keys in their HashMap.
Addresses the second half of bisq-network#3629 by using the HashMap, not the protectedDataStore, to generate the known keys in the requestData path. This won't yield any bandwidth reduction until all seednodes have the update and only have the 32-byte key in their HashMap. Fixes bisq-network#3629.
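The core of the key fix can be illustrated with plain JDK types: rebuild the persisted map keyed by the full 32-byte SHA-256 digest, matching what the in-memory map expects. This is a self-contained sketch, not the actual store code:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StoreRebuildSketch {
    // Full SHA-256 digest: 32 bytes, the same key form the in-memory map uses.
    static byte[] get32ByteHash(byte[] serializedPayload) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(serializedPayload);
    }

    // On reconstruction from disk, key every entry by the 32-byte hash instead
    // of a 20-byte hash, so the rebuilt map matches the live one.
    static Map<ByteBuffer, byte[]> rebuild(List<byte[]> persistedPayloads) throws NoSuchAlgorithmException {
        Map<ByteBuffer, byte[]> map = new HashMap<>();
        for (byte[] payload : persistedPayloads)
            map.put(ByteBuffer.wrap(get32ByteHash(payload)), payload);
        return map;
    }
}
```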
The only user has been migrated to getMap(). Delete it so future development doesn't have the same 20-byte vs 32-byte key issue.
In order to implement remove-before-add behavior, we need a way to verify that the SequenceNumberMap was the only item updated.
It is possible to receive a RemoveData or RemoveMailboxData message before the relevant AddData, but the current code does not handle it. This results in internal state updates and signal handlers being called when an Add is received with a lower sequence number than a previously seen Remove. Minor test validation changes allow tests to specify that only the SequenceNumberMap should be written during an operation.
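A minimal sketch of the idea, with illustrative names: the sequence number is recorded even when a Remove arrives for a payload we never added, so a late Add with a stale number can be rejected.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SequenceNumberGuard {
    // Highest sequence number seen per payload hash, persisted separately
    // from the payload map itself.
    private final Map<ByteBuffer, Integer> sequenceNumberMap = new ConcurrentHashMap<>();

    boolean isValid(ByteBuffer payloadHash, int incomingSeqNr) {
        Integer known = sequenceNumberMap.get(payloadHash);
        return known == null || incomingSeqNr > known;
    }

    // Called for Adds AND Removes, including remove-before-add.
    void record(ByteBuffer payloadHash, int seqNr) {
        sequenceNumberMap.put(payloadHash, seqNr);
    }
}
```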
Force-pushed ae26bff to 3695752
Now that we have introduced remove-before-add, we need a way to validate that the SequenceNumberMap was written, but nothing else. Add this feature to the validation path.
In order to aid in propagation of remove() messages, broadcast them in the event the remove is seen before the add.
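Building on the SequenceNumberGuard sketch above, the broadcast behavior might look like this (again illustrative, not Bisq's exact code): a valid Remove is re-broadcast even when there was nothing to delete locally.

```java
import java.nio.ByteBuffer;

class RemoveHandlerSketch {
    private final SequenceNumberGuard guard = new SequenceNumberGuard();

    boolean onRemove(ByteBuffer payloadHash, int seqNr, Runnable broadcast) {
        if (!guard.isValid(payloadHash, seqNr))
            return false;                 // stale sequence number: drop
        guard.record(payloadHash, seqNr); // the SequenceNumberMap is still written
        broadcast.run();                  // propagate even on remove-before-add
        return true;
    }
}
```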
Now that there are cases where the SequenceNumberMap and Broadcast are called, but no other internal state is updated, the existing helper functions conflate too many decisions. Remove them in favor of explicitly defining each state change expected.
Force-pushed 3695752 to 2336754
```java
ProtectedStorageEntry previous = protectedDataStoreService.putIfAbsent(hashOfPayload, protectedStorageEntry);
if (previous == null)
    protectedDataStoreListeners.forEach(e -> e.onAdded(protectedStorageEntry));
protectedDataStoreService.put(hashOfPayload, protectedStorageEntry);
```
This means that we will have many more save procedures. Looking at the code in FileManager:75 ff., I don't feel confident making this change now, as it significantly increases the risk of corrupted data stores, which is very bad.
From a quick look at the FileManager code, I doubt it is thread-safe; there seems to be an obvious race condition in 79-83. Can we have that tested and fixed if needed before we apply this patch?
I'm not sure I understand the full concern. These objects use the same path as all of the other persistable objects that are written on trades, account signings, etc. Are you suggesting we hold the release until FileManager.java:75 is looked into?
A ConcurrentHashMap is used in all the TempProposal code, but I think your comments are more about the fact that no FileManager tasks are thread-safe, regardless of the Storage data structures. Is that right?
> I'm not sure I understand the full concern.

The storage code is most probably broken, and we have had corrupted databases in the past (which are very hard to recover from, especially because the max message size is nearly reached and seednodes/clients have a hard time sending/receiving the whole 10MB of data). If we increase the number of writes (by persisting every time something we already have gets updated), we raise the chance of hitting that race condition. Nothing more, nothing less.

> Are you suggesting we hold the release until FileManager.java:75 is looked into?

No, there is no need to hold the release. The broken code is already shipped, so there is nothing we can do about that. If by "hold the release" you mean holding the PR until this is fixed: yes.

> aren't threadsafe regardless of the Storage data structures. Is that right?

Yes, correct. I believe the storage code needs a closer look, fast and with high priority.
I agree the FileManager could use some love. I am not aware of many file corruption issues, and there are routines to handle that case: move the corrupted file to a backup folder and recreate it. Whether recreating is problematic depends on the file. Some files are then taken from resources and just require the delta update (trade stats, account age). Others, like PendingTrades, Disputes, etc., are more problematic as they are user data. But there are rolling backups, and so far I am not aware that it has ever caused a serious problem.
More problems happen with corrupted wallet files, but that is handled inside BitcoinJ (from where I borrowed most of the persistence code base).
So, in conclusion: yes, it would be good to get the persistence code thread-safe and improved. But I don't consider that we have lots of problems in that area at the moment (I have not looked into the code changes and cannot comment on the increased risk).
Besides that, I think it is one of the major performance bottlenecks. Some objects get written to disk too often, and especially if they are large and users have non-SSD disks, this comes with considerable costs.
Fix a bug introduced in d484617 that did not properly handle a valid use case for duplicate sequence numbers. For in-memory-only ProtectedStoragePayloads, the client nodes need a way to reconstruct the Payloads after startup from peer and seed nodes. This involves sending a ProtectedStorageEntry with a sequence number that is equal to the last one the client had already seen. This patch adds tests that confirm the bug and the fix, as well as the changes necessary to allow adding Payloads that were previously seen but removed during a restart.
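A hedged sketch of the corrected rule, with illustrative names: strictly greater sequence numbers are always valid, and an equal one is accepted only when the payload is absent from the in-memory map (the restart/reconstruction case).

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AddValidatorSketch {
    private final Map<ByteBuffer, Integer> sequenceNumberMap = new ConcurrentHashMap<>();
    private final Map<ByteBuffer, Object> inMemoryMap = new ConcurrentHashMap<>();

    boolean isValidForAdd(ByteBuffer payloadHash, int seqNr) {
        Integer known = sequenceNumberMap.get(payloadHash);
        if (known == null || seqNr > known)
            return true;
        // Equal sequence number: allow re-adding an entry we saw before a
        // restart but no longer hold in memory. A payload removed via a
        // Remove message carries a higher stored seqNr, so it stays rejected.
        return seqNr == known && !inMemoryMap.containsKey(payloadHash);
    }
}
```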
Although the code was correct, it was hard to understand the relationship between the to-be-written object and the savePending flag. Trade two dependent atomics for one and comment the code to make it more clear for the next reader.
Fix a bug in the FileManager where a saveLater called with a low delay won't execute until the delay specified by a previous saveLater call. The trade-off here is the execution of a task that returns early vs. losing the requested delay.
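One way to realize the described trade-off, sketched with plain JDK scheduling (FileManager's actual code will differ): every saveLater schedules a task at its own requested delay; whichever task fires first writes the pending object, and later tasks find nothing to do and return early.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class DelayedSaverSketch<T> {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    // Single atomic holding the object waiting to be written; null means no pending save.
    private final AtomicReference<T> pendingObject = new AtomicReference<>();

    void saveLater(T object, long delayMs) {
        pendingObject.set(object);
        executor.schedule(() -> {
            // A sooner-scheduled task may have already claimed the object,
            // in which case this task returns early.
            T toWrite = pendingObject.getAndSet(null);
            if (toWrite != null)
                writeToDisk(toWrite);
        }, delayMs, TimeUnit.MILLISECONDS);
    }

    private void writeToDisk(T object) {
        // serialize and persist; stubbed for the sketch
    }
}
```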
Only one caller remains after dead code removal.
Now that we want to make changes to the MapStoreService, it isn't sufficient to have a Fake of the ProtectedDataStoreService. Tests now use a REAL ProtectedDataStoreService and a FAKE MapStoreService to exercise more of the production code and allow future testing of changes to MapStoreService.
With the addition of ProtectedStorageEntrys, there are now persistable maps that have different payloads and the same keys. In the ProtectedDataStoreService case, the value is the ProtectedStorageEntry, which has a createdTimeStamp, sequenceNumber, and signature that can all change while still containing an identical payload.

Previously, the service was only updating the on-disk representation on the first object and never again. So, when it was recreated from disk, it would not have any of the updated metadata. This was just copied from the append-only implementation, where the value was the Payload, which was immutable.

This hasn't caused any issues to this point, but it causes strange behavior such as always receiving seqNr==1 items from seednodes on startup. It is good practice to keep the in-memory objects and on-disk objects in sync, and it removes an unexpected failure in future dev work that expects the same behavior as the append-only on-disk objects.
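The behavioral difference driving the change, reduced to plain JDK maps:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PutVsPutIfAbsent {
    public static void main(String[] args) {
        Map<String, String> onDisk = new ConcurrentHashMap<>();
        onDisk.putIfAbsent("payloadHash", "entry(seqNr=1)");
        onDisk.putIfAbsent("payloadHash", "entry(seqNr=2)"); // ignored: stale metadata persists
        System.out.println(onDisk.get("payloadHash"));       // entry(seqNr=1)

        onDisk.put("payloadHash", "entry(seqNr=2)");         // unconditional put: disk matches memory
        System.out.println(onDisk.get("payloadHash"));       // entry(seqNr=2)
    }
}
```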
There were no users.
Force-pushed 2336754 to 44a11a0
New commits start at 66f71e5