Prevent parallel snapshots #4404
state/accountsDB.go
	return adb, nil
}

func getAccountsDb(args ArgsAccountsDB) *AccountsDB {
I think createAccountsDb would be a more suitable name.
You're right, changed
state/accountsDB.go
	return false
}

if !trieStorageManager.ShouldTakeSnapshot() {
Return directly here? Like:
return trieStorageManager.ShouldTakeSnapshot()
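A minimal sketch of the suggested simplification: keep any earlier guard that returns false, then return the boolean from ShouldTakeSnapshot() directly instead of branching on it. The storageManager interface, stubStorage, errNoStorage and shouldTakeSnapshot are hypothetical stand-ins, not the project's types.

```go
package main

import "errors"

// storageManager is a hypothetical stand-in for the trie storage manager;
// only the method discussed in the review is modelled.
type storageManager interface {
	ShouldTakeSnapshot() bool
}

// stubStorage is a hypothetical test double.
type stubStorage struct{ take bool }

func (s stubStorage) ShouldTakeSnapshot() bool { return s.take }

// errNoStorage is a hypothetical error used for the early-exit path.
var errNoStorage = errors.New("no storage manager")

// shouldTakeSnapshot keeps the early guard for the error case, then returns
// the storage manager's answer directly instead of wrapping it in an if.
func shouldTakeSnapshot(sm storageManager, err error) bool {
	if err != nil {
		return false
	}
	return sm.ShouldTakeSnapshot()
}
```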
Done.
trieStorageManager := adb.mainTrie.GetStorageManager()
val, err := trieStorageManager.GetFromCurrentEpoch([]byte(common.ActiveDBKey))
if err != nil || !bytes.Equal(val, []byte(common.ActiveDBVal)) {
	startSnapshotAfterRestart(adb, args)
Calling startSnapshotAfterRestart launches a separate goroutine, which is correct ✅
Ideally, we wouldn't call such a function in a constructor (NewAccountsDB), but it can be left as it is for the moment (it was the same before).
Also, optionally, extract bytes.Equal(val, []byte(common.ActiveDBVal)) into a separate variable.
Extracted into a separate variable. startSnapshotAfterRestart is moved out of the constructor in #4367.
@@ -1005,14 +1012,11 @@ func (adb *AccountsDB) RecreateAllTries(rootHash []byte) (map[string]common.Trie
	return nil, err
}

recreatedTrie, err := adb.mainTrie.Recreate(rootHash)
allTries, err := adb.recreateMainTrie(rootHash)
That recreateMainTrie returns allTries is not obvious from the name. Is there a better name for recreateMainTrie?
I don't have any great ideas.
if err != nil {
	log.Error("snapshotState error", "err", err.Error())
trieStorageManager, epoch, shouldTakeSnapshot := adb.prepareSnapshot(rootHash)
if !shouldTakeSnapshot {
Perhaps log a message that we won't take the snapshot yet?
There is a log message inside prepareSnapshot:
log.Debug("skipping snapshot",
"last snapshot rootHash", adb.lastSnapshot.rootHash,
"rootHash", rootHash,
"last snapshot epoch", adb.lastSnapshot.epoch,
"epoch", epoch,
"isSnapshotInProgress", adb.isSnapshotInProgress.IsSet(),
)
func (adb *AccountsDB) prepareSnapshot(rootHash []byte) (common.StorageManager, uint32, bool) {
	trieStorageManager, epoch, err := adb.getTrieStorageManagerAndLatestEpoch()
	if err != nil {
		log.Error("snapshot user state error", "err", err.Error())
Shouldn't this be considered a fatal error (e.g. stop node)?
There is a chance that the next snapshot will succeed, so the node can continue processing. What do you think, @iulianpascalau?
state/accountsDB.go

go adb.markActiveDBAfterSnapshot(stats, errChan, rootHash, "snapshotState user trie", epoch)

adb.waitForCompletionIfRunningInImportDB(stats)
Perhaps, in the future, we could also add a waitForCompletionIfHeavilySyncing() (for when the node is out of sync). From my understanding of the logs (if correct), that would greatly reduce the snapshot time. Does this make sense?
E.g., import-db finishes the snapshot in 2-3 minutes.
The snapshot process is started on a goroutine. During import-db, epochs are processed much faster than they would be in real time, so an epoch may finish processing before its snapshot completes. adb.waitForCompletionIfRunningInImportDB(stats) was added to ensure that processing waits for the snapshot to finish.
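The pattern described above, start the snapshot on a goroutine and block only in import-db mode, can be sketched as follows. This assumes a sync.WaitGroup as the completion signal; snapshotStats, accountsDB, importDBMode and takeSnapshot are hypothetical simplifications, and only the waitForCompletionIfRunningInImportDB name comes from the diff.

```go
package main

import "sync"

// snapshotStats is a hypothetical stand-in for snapshotStatistics; only the
// completion signal is modelled.
type snapshotStats struct {
	wg sync.WaitGroup
}

// accountsDB is a hypothetical, heavily reduced stand-in.
type accountsDB struct {
	importDBMode bool
}

// takeSnapshot starts the (simulated) snapshot work on its own goroutine and
// then waits only if running in import-db mode.
func (adb *accountsDB) takeSnapshot(stats *snapshotStats, work func()) {
	stats.wg.Add(1)
	go func() {
		defer stats.wg.Done()
		work()
	}()
	adb.waitForCompletionIfRunningInImportDB(stats)
}

// waitForCompletionIfRunningInImportDB blocks until the snapshot finishes
// when epochs are replayed faster than real time; otherwise it returns
// immediately and the snapshot keeps running in the background.
func (adb *accountsDB) waitForCompletionIfRunningInImportDB(stats *snapshotStats) {
	if !adb.importDBMode {
		return
	}
	stats.wg.Wait()
}
```

In normal mode the caller continues processing while the snapshot runs; in import-db mode the next epoch cannot start before the snapshot is done.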

func (adb *AccountsDB) prepareSnapshot(rootHash []byte) (common.StorageManager, uint32, bool) {
	trieStorageManager, epoch, err := adb.getTrieStorageManagerAndLatestEpoch()
	if err != nil {
Perhaps also log the results of this call (e.g. the returned epoch?) - for debugging purposes.
It is logged. If we skip the snapshot, we log the epoch there. If we take the snapshot, we log the epoch in the "starting snapshot..." message.
}

func (adb *AccountsDB) markActiveDBAfterSnapshot(stats *snapshotStatistics, errChan chan error, rootHash []byte, message string, epoch uint32) {
	stats.PrintStats(message, rootHash)

	defer adb.isSnapshotInProgress.Reset()
So the function markActiveDBAfterSnapshot() is now more like a doStuffOnSnapshotCompletion.
Right. Renamed to processSnapshotCompletion. I look forward to a better naming idea 😅
if !trieStorageManager.ShouldTakeSnapshot() {
	log.Debug("skipping snapshot for rootHash", "hash", rootHash)
trieStorageManager, epoch, shouldTakeSnapshot := adb.prepareSnapshot(rootHash)
if !shouldTakeSnapshot {
Perhaps log a message for not taking the snapshot?
It is logged inside prepareSnapshot
The base branch was changed.
Codecov Report
@@ Coverage Diff @@
## rc/v1.4.0 #4404 +/- ##
==========================================
Coverage 73.84% 73.84%
==========================================
Files 689 689
Lines 88127 88120 -7
==========================================
- Hits 65075 65073 -2
+ Misses 18140 18137 -3
+ Partials 4912 4910 -2
…-snapshot-in-progress
# Conflicts:
#	state/accountsDB.go
System test passed.
Description of the reasoning behind the pull request (what feature was missing / how the problem was manifesting itself / what was the motive behind the refactoring)
Proposed Changes
isSnapshotInProgress flag in order to prevent this kind of behaviour, and skip a snapshot if another one is running.
Testing procedure