Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trie: new deleteFromDB option (default: false) #1219

Merged
merged 7 commits into from
May 12, 2021

Conversation

holgerd77
Copy link
Member

@holgerd77 holgerd77 commented Apr 22, 2021

Fixes #1210

I am coming more and more to the conclusion that this might be the only change we need. We shouldn't comment out stuff (or remove) directly in the DB manager to let people the freedom to delete themselves if they wish, this is what would be expected from the direct DB class.

Some historical reference in between: _formatNode has been moved from CheckpointTrie to BaseTrie here.

All Trie delete operations - both from del() and from batch() - are routed through this _formatNode method, so this should shield all possible high-level trie deletes.

I am unsure about the default setting of a potential new option deleteFromDB. My tendency is to switch to this keep-behavior to shield users from an unwanted delete behavior. However I am not sure about the side effects. This will lead to larger state growth - also e.g. in our client - if used for continuous trie operations. This would be an argument to start with remaining the delete-behavior (so set deleteFromDB to true by default) but then instantiate the default trie in the VM with false.

Hmm, very much unsure about these decisions.

Nevertheless this is my proposal what I would currently have a tendency towards.

@codecov
Copy link

codecov bot commented Apr 22, 2021

Codecov Report

Merging #1219 (61e6431) into master (7fe126c) will decrease coverage by 2.03%.
The diff coverage is n/a.

Impacted file tree graph

Flag Coverage Δ
block 82.57% <ø> (ø)
blockchain 84.23% <ø> (ø)
client 84.58% <ø> (-0.23%) ⬇️
common 88.07% <ø> (ø)
devp2p 83.88% <ø> (ø)
ethash 82.08% <ø> (ø)
tx 88.68% <ø> (-0.14%) ⬇️
vm 81.05% <ø> (-5.73%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

type: 'del',
key: hashRoot,
})
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this change is also changing the semantics of this conditional clause. Before if - when using a CheckpointTrie - being not within a checkpoint the remove calls would have been transitioned into a put operation. I am not seeing the logic of this and would assume this is falsy behaviour with no side effects (or maybe with side effects, not 100% sure), since the node references are removed from the trie anyhow (so it has no effect if the value is saved (again) in the DB. I wouldn't exclude the possibility though that I am overlooking something here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a good observation and indeed weird

@ryanio
Copy link
Contributor

ryanio commented Apr 26, 2021

this is looking good to me. should we add a test setting the stateRoot back and accessing a "deleted" value? then it will succeed with deleteFromDb=false and fail (return null?) with deleteFromDb=true

@holgerd77
Copy link
Member Author

@ryanio test addition was a great suggestion since this revealed that this fix is actually not fixing the problem. I added a test here d7d7c38 and this is - against assumption - also returning v1 with deleteFromDB=true version when setting back the state root.

I tracked a bit with console.log and delete operations are actually not added in _formatNode since the rlpNode.length >= 32 condition is not satisfied on delete calls and therefore no delete is added. My first conclusion here is that we then have this delete behavior really only on checkpointing when we do the commit calls.

First-thought alternative suggestion to fix would therefore be to add a doDeletes=false option to the CheckpointTrie.commit() method. But I also have to think this further through and let this sink in a bit. Suggestions, some alternative analysis or reasoning and/or additional remarks welcome.

@holgerd77 holgerd77 force-pushed the trie-no-node-delete-option branch 2 times, most recently from 991e498 to 84e7fe4 Compare April 29, 2021 09:51
@holgerd77
Copy link
Member Author

holgerd77 commented Apr 29, 2021

Ok, I would now open this up for review. 😄

As discussed in the call we now keep (respectively: set) the no-delete option as default and generally do this as a general option for the Trie and no option on CheckpointTrie.commit() or something.

I've now found out why the tests failed, this is due to this rlpNode.length >= 32 check in the _formatNode() function. I was not able to "fix" this directly since I don't understand where this is coming from or what it does, various tests fail though if this is removed. So I added as a TODO to further analyze. The worst thing which can happen here atm is that there are some values less which are deleted when the deleteFromDB option is active. So I would regard this as an acceptable fix for now if we don't get to the root here on this, since overall the status quo is significantly improved by this PR respectively the HardHat bug and overall potentially dangerous delete behavior prevented.

Overall this PR has the following properties: I followed all the potential delete paths on the Trie methods, so various entry points which lead to either del() calls directly, pushing del operations an a DB stack, put() operations with an empty value and putting put operations with an empty value on a DB stack. All these paths end up on the _formatNode direct del DB stack addition, so all of this is prevented on the Trie with the PR. I also added some additional code docs to make this explicit. This also goes for the Trie subclasses CPTrie and SecureTrie, which all call into the super classes.

On the other hand the operations on the DB wrappers directly are not shielded - which I would argue should be the case since the DB classes should just do the DB operations if called, so people still have the chance to directly operate on the DB.

This makes me think respectively remember: we do the following direct DB operation in the StateManager within the VM:

async putContractCode(address: Address, value: Buffer): Promise<void> {
    const codeHash = keccak256(value)

    if (codeHash.equals(KECCAK256_NULL)) {
      return
    }

    await this._trie.db.put(codeHash, value)

    const account = await this.getAccount(address)
    account.codeHash = codeHash
    await this.putAccount(address, account)
  }

Is this still something dangerous in this context, e.g. when a numm value is passed? I have to say I am generally not understanding this code part and why the DB is accessed directly in this context and this is not done through the higher level Trie interface.

@ryanio
Copy link
Contributor

ryanio commented Apr 29, 2021

Regarding putContractCode, I think the reason the underlying db is used to put is because otherwise it would be routed through SecureTrie.put() which would keccak the key a second time, and codeHash is already keccak'd.

I am wondering if we change from this._trie.db.put(codeHash, value) to this._trie.put(value, value) if it would then properly keccak the key. But I am also wondering if it was bypassing the checkpointing or cache mechanisms on purpose? What if the state is reverted?


Regarding the rlpNode.length >= 32 check in _formatNode(), I believe it's because it's used in updateNode when creating a new branch or extension node and if the length is >= 32 it needs to be further formatted as it can't be stored as is.

I found on eth.wiki it says:

When one node is referenced inside another node, what is included is H(rlp.encode(x)), where H(x) = sha3(x) if len(x) >= 32 else x and rlp.encode is the RLP encoding function.

@holgerd77
Copy link
Member Author

@ryanio are there any code changes/updates you would recommend me to do derived from the last comment you made (putContractCode / greater-32-check)? 🤔 Or should we do this putContractCode check in a separate PR?

@ryanio
Copy link
Contributor

ryanio commented May 4, 2021

@holgerd77 hm, thinking about it more, putContractCode must use the underlying db so it doesn't format the nodes via baseTrie.put. I started looking through the YP to see where it explains to store contract code and it says (bottom of page 3):

The account state, σ[a], comprises the following four fields:
...
codeHash: The hash of the EVM code of this account—this is the code that gets executed should this address receive a message call; it is immutable and thus, unlike all other fields, cannot be changed after construction.
All such code fragments are contained in the state database under their corresponding hashes for later retrieval.

The last sentence explains why we use the underlying db instead of formatting the trie, so I think the current code is correct.

The greater-than-32 check also seems correct to me and doesn't need any code changes.

@ryanio ryanio force-pushed the trie-no-node-delete-option branch from 84e7fe4 to f02c8fe Compare May 5, 2021 19:23
@ryanio
Copy link
Contributor

ryanio commented May 5, 2021

Rebased the branch on master.

@alcuadrado
Copy link
Member

alcuadrado commented May 5, 2021

As mentioned on Discord, this PR doesn't fix the bug, at least not in Balancer's codebase.

This repo can be used as a test. You need to check out the state-manager-bug.

To use the latest/released version of the state manager, just run yarn and yarn test. 58 tests will fail.

To try with a local build of the vm, run yarn, yarn link the vm, and yarn test. 58 tests are also failing.


Update on this comment (from @holgerd77): we have re-analyzed and this comment is outdated, bug has been fixed and confirmed.

@holgerd77
Copy link
Member Author

Ok, I've now pushed a last update on the TODO comment with the suggestions from Ryan. I guess with the latest analysis done by @alcuadrado and @ryanio we can now merge here and do a release? 😄

Can someone please review and approve?

@holgerd77
Copy link
Member Author

@jochem-brouwer @ryanio Can this get a final review please? 😄

jochem-brouwer
jochem-brouwer previously approved these changes May 11, 2021
Copy link
Member

@jochem-brouwer jochem-brouwer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this looks great!

@ryanio
Copy link
Contributor

ryanio commented May 12, 2021

I would still like to remove this TODO before merging: #1219 (comment)

@holgerd77
Copy link
Member Author

@ryanio could you do? I am somewhat on family holidays during the next days until Monday.

resolve TODO note for rlpNode.length >= 32
Copy link
Contributor

@ryanio ryanio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@ryanio ryanio merged commit 4d3edaa into master May 12, 2021
@ryanio ryanio deleted the trie-no-node-delete-option branch May 12, 2021 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trie: do not delete values from DB by default
5 participants