Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add custom error for trie get node #4724

Merged
merged 37 commits into from
Apr 21, 2023

Conversation

ssd04
Copy link
Contributor

@ssd04 ssd04 commented Nov 21, 2022

Reasoning behind the pull request

  • When syncing the trie is synced based on root hash, it would be better to sync based only on the missing nodes

Proposed changes

  • Handle get nodes from db error in order to pass also the missing node hash

Testing procedure

  • System test with scenario:
    • with #missing-nodes-sed
    • AccountsStatePruningEnabled and PeerStatePruningEnabled on false

Pre-requisites

Based on the Contributing Guidelines the PR author and the reviewers must check the following requirements are met:

  • was the PR targeted to the correct branch?
  • if this is a larger feature that probably needs more than one PR, is there a feat branch created?
  • if this is a feat branch merging, do all satellite projects have a proper tag inside go.mod?

@ssd04 ssd04 self-assigned this Nov 21, 2022
@codecov-commenter
Copy link

codecov-commenter commented Nov 21, 2022

Codecov Report

❗ No coverage uploaded for pull request base (feat/sync-missing-trie-nodes@a729637). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head 3666928 differs from pull request most recent head 2a46bcc. Consider uploading reports for the commit 2a46bcc to get more accurate results

Additional details and impacted files
@@                       Coverage Diff                       @@
##             feat/sync-missing-trie-nodes    #4724   +/-   ##
===============================================================
  Coverage                                ?   70.86%           
===============================================================
  Files                                   ?      674           
  Lines                                   ?    87714           
  Branches                                ?        0           
===============================================================
  Hits                                    ?    62161           
  Misses                                  ?    20875           
  Partials                                ?     4678           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

process/sync/shardblock.go Outdated Show resolved Hide resolved
errors/missingTrieNodeError.go Outdated Show resolved Hide resolved
errors/missingTrieNodeError.go Outdated Show resolved Hide resolved
storage/pruning/triePruningStorer.go Outdated Show resolved Hide resolved
trie/interface.go Outdated Show resolved Hide resolved
trie/node.go Outdated Show resolved Hide resolved
@ssd04 ssd04 marked this pull request as ready for review December 7, 2022 09:19
@iulianpascalau iulianpascalau self-requested a review December 7, 2022 09:56
@@ -180,37 +182,32 @@ func (boot *MetaBootstrap) setLastEpochStartRound() {
func (boot *MetaBootstrap) SyncBlock(ctx context.Context) error {
err := boot.syncBlock()
if errors.IsGetNodeFromDBError(err) {
errSync := boot.syncAccountsDBs()
getNodeErr, ok := err.(*errors.GetNodeFromDBErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misleading names. We have the if errors.IsGetNodeFromDBError(err) { instruction and then we can not cast it to a *errors.GetNodeFromDBErr ? Let's refactor this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not done. We have the same function and the same cast. Please refactor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added interface here

process/sync/metablock.go Show resolved Hide resolved
@@ -144,7 +144,12 @@ func (boot *ShardBootstrap) StartSyncingBlocks() {
func (boot *ShardBootstrap) SyncBlock(ctx context.Context) error {
err := boot.syncBlock()
if errors.IsGetNodeFromDBError(err) {
errSync := boot.syncUserAccountsState()
getNodeErr, ok := err.(*errors.GetNodeFromDBErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here, misleading names, please refactor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not done

@@ -1033,6 +1033,11 @@ func (ps *PruningStorer) RangeKeys(_ func(key []byte, val []byte) bool) {
debug.PrintStack()
}

// GetIdentifier returns the identifier for storer
func (ps *PruningStorer) GetIdentifier() string {
return ps.identifier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we somehow, change the paths and names for the accounts storers, the whole code will break. We need to find a proper way to use storers identifiers in this case. Perhaps using the trie containers implementation as seen in factory/state/stateComponents.go L151-L178?
This happens because we have 2 sets of constants declared: one in trie/trieFactory/trieFactoryArgs.go and the other one in dataRetriever/unitType.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from what i've tested, it still seems to work, will think more on the refactoring here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for now because the strings from both constant sets are the same. Will need to refactor this somehow. You can refactor now or add a TODO + task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added todo and separate task for this

…s-for-storage-mappers

refactor to use storer ids for dataRetriever
iulianpascalau
iulianpascalau previously approved these changes Dec 28, 2022
iulianpascalau
iulianpascalau previously approved these changes Jan 20, 2023
# Conflicts:
#	dataRetriever/factory/resolverscontainer/metaResolversContainerFactory.go
#	dataRetriever/factory/resolverscontainer/metaResolversContainerFactory_test.go
#	dataRetriever/factory/resolverscontainer/shardResolversContainerFactory.go
#	dataRetriever/factory/resolverscontainer/shardResolversContainerFactory_test.go
#	factory/consensus/consensusComponents.go
#	factory/processing/blockProcessorCreator_test.go
#	genesis/process/genesisBlockCreator.go
#	genesis/process/genesisBlockCreator_test.go
#	integrationTests/state/stateTrie/stateTrie_test.go
#	integrationTests/state/stateTrieSync/stateTrieSync_test.go
#	integrationTests/testProcessorNode.go
#	node/nodeRunner.go
#	process/block/shardblock.go
#	process/coordinator/process.go
#	process/sync/baseSync.go
#	process/sync/metablock_test.go
#	process/sync/shardblock_test.go
#	testscommon/components/components.go
#	testscommon/components/default.go
#	update/genesis/import.go
#	update/genesis/import_test.go
# Conflicts:
#	testscommon/components/default.go
@@ -13,13 +13,13 @@ require (
github.com/google/gops v0.3.18
github.com/gorilla/websocket v1.5.0
github.com/mitchellh/mapstructure v1.5.0
github.com/multiversx/mx-chain-core-go v1.2.0
github.com/multiversx/mx-chain-core-go v1.2.1-0.20230403113932-916b16d18978
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proper releases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be merged in a feature branch. Will create a release tag when the feature branch will be merged.

@@ -719,6 +719,11 @@ func (tc *transactionCoordinator) CreateMbsAndProcessCrossShardTransactionsDstMe
"total gas penalized", tc.gasHandler.TotalGasPenalized(),
"error", errProc,
)

if core.IsGetNodeFromDBError(errProc) {
return nil, 0, false, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. Also, return errProc instead of err

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, removed this check. As discussed, it is not needed here.

@@ -2026,6 +2026,10 @@ func (sp *shardProcessor) createMiniBlocks(haveTime func() bool, randomness []by
log.Debug("elapsed time to create mbs to me", "time", elapsedTime)
if err != nil {
log.Debug("createAndProcessCrossMiniBlocksDstMe", "error", err.Error())

if core.IsGetNodeFromDBError(err) {
return nil, nil, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit test?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the check, not needed here.

@@ -524,7 +523,8 @@ func (adb *AccountsDB) loadDataTrie(accountHandler baseAccountHandler, mainTrie

dataTrie, err := mainTrie.Recreate(accountHandler.GetRootHash())
if err != nil {
return fmt.Errorf("trie was not found for hash, rootHash = %s, err = %w", hex.EncodeToString(accountHandler.GetRootHash()), err)
log.Error("trie was not found for hash", "rootHash", accountHandler.GetRootHash(), "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log error & return error at the same time? Maybe an error wrap will help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the changes, the err is now wrapped before returning. Removed the log print.

trie/node.go Outdated
dbWithID, ok := db.(dbWriteCacherWithIdentifier)
if !ok {
log.Trace(common.GetNodeFromDBErrorString, "error", err, "key", n, "db type", fmt.Sprintf("%T", db))
log.Warn("db does not have an identifier", "db type", fmt.Sprintf("%T", db))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log warn while returning the error. Why not wrap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use log trace if you already return this error in the caller side as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapped the err.

@@ -88,7 +88,7 @@ func (tr *patriciaMerkleTrie) Get(key []byte) ([]byte, uint32, error) {

val, depth, err := tr.root.tryGet(hexKey, rootDepthLevel, tr.trieStorage)
if err != nil {
err = fmt.Errorf("trie get error: %w, for key %v", err, hex.EncodeToString(key))
log.Error("trie get error", "error", err.Error(), "key", key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, log error while returning. Why not wrap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the changes, use wrapped err.

@@ -687,6 +687,17 @@ func (tsm *trieStorageManager) GetBaseTrieStorageManager() common.StorageManager
return tsm
}

// GetIdentifier returns the identifier of the main storer
func (tsm *trieStorageManager) GetIdentifier() string {
dbWithIdentifier, ok := tsm.mainStorer.(dbWriteCacherWithIdentifier)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

errors/missingTrieNodeError.go Show resolved Hide resolved
@@ -1,24 +1,43 @@
package errors

import (
"strings"

"encoding/hex"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go imports?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


if IsClosingError(err) {
return false
// NewGetNodeFromDBErrWithKey will create a new instance of GetNodeFromDBErr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetNodeFromDBErrWithKey?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -1090,6 +1089,11 @@ func (txs *transactions) CreateAndProcessMiniBlocks(haveTime func() bool, random

if err != nil {
log.Debug("createAndProcessMiniBlocksFromMe", "error", err.Error())

if core.IsGetNodeFromDBError(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will have this change the desired effect? In the caller side, func (tc *transactionCoordinator) CreateMbsAndProcessTransactionsFromMe, there is only a print when this method returns error. I think that the behavior will remain the same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, removed.

@@ -719,6 +719,11 @@ func (tc *transactionCoordinator) CreateMbsAndProcessCrossShardTransactionsDstMe
"total gas penalized", tc.gasHandler.TotalGasPenalized(),
"error", errProc,
)

if core.IsGetNodeFromDBError(errProc) {
return nil, 0, false, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want that the execution of miniblocks dest me to continue with other mini blocks (from other shards) we need only to break here. The return will be in format return miniblocks, numTxsAdded, false, nil. Otherwise all the execution dest me will be skipped. Which is the real intention here?

Copy link
Contributor

@BeniaminDrasovean BeniaminDrasovean Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, as discussed.

trie/node.go Outdated
dbWithID, ok := db.(dbWriteCacherWithIdentifier)
if !ok {
log.Trace(common.GetNodeFromDBErrorString, "error", err, "key", n, "db type", fmt.Sprintf("%T", db))
log.Warn("db does not have an identifier", "db type", fmt.Sprintf("%T", db))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use log trace if you already return this error in the caller side as well.

@@ -88,7 +88,7 @@ func (tr *patriciaMerkleTrie) Get(key []byte) ([]byte, uint32, error) {

val, depth, err := tr.root.tryGet(hexKey, rootDepthLevel, tr.trieStorage)
if err != nil {
err = fmt.Errorf("trie get error: %w, for key %v", err, hex.EncodeToString(key))
log.Error("trie get error", "error", err.Error(), "key", key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same here

@@ -5,6 +5,7 @@ import (
"encoding/hex"
"encoding/json"
"fmt"
"github.com/multiversx/mx-chain-go/dataRetriever"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go imports

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -292,9 +292,9 @@ func (si *stateImport) getTrie(shardID uint32, accType Type) (common.Trie, error
return trieForShard, nil
}

trieStorageManager := si.trieStorageManagers[triesFactory.UserAccountTrie]
trieStorageManager := si.trieStorageManagers[dataRetriever.PeerAccountsUnit.String()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UserAccountsUnit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Changed.

@@ -2156,3 +2156,23 @@ func TestShardBootstrap_NilInnerBootstrapperClose(t *testing.T) {
bootstrapper := &sync.ShardBootstrap{}
assert.Nil(t, bootstrapper.Close())
}

func TestUnwrapGetNodeFromDBErr(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@BeniaminDrasovean BeniaminDrasovean merged commit da1e15e into feat/sync-missing-trie-nodes Apr 21, 2023
@BeniaminDrasovean BeniaminDrasovean deleted the sync-missing-keys-only branch April 21, 2023 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants