Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Light friendly storage tracking: track last-modification-time entry in storage #425

Closed
wants to merge 12 commits into from

Conversation

svyatonik
Copy link
Contributor

@svyatonik svyatonik commented Jul 26, 2018

This PR implements "Track last-modification-time entry in storage" section of #131 .

Brief overview:

  1. runtime decides if it uses prefixes in the storage
  2. runtime saves 2 values in the storage when it is going to use prefixed values - ":prefix_len" and ":prefix"
    2.1) all storage values are supposed to have prefix of ":prefix_len" (if the entry ":prefix_len" exists)
    2.2) all new storage values are prepended by prefix from ":prefix" entry
    2.3) both values are inserted at the genesis (see TODO below)
    2.4) ":prefix" is updated (with encoded block#) when new blocks starts execution
  3. prefixes are appended/stripped by runtime-io functions
  4. when runtime deletes storage value, it is not actually deleted, but:
    4.1) prefix is written to the storage entry
    4.2) deleted key is inserted into special KeysSet structure (with ":deleted" prefix)
    4.3) when new blocks starts execution, runtime checks if it is a good time to purge deleted values (if current_block_number % purge_interval == 0)
    4.4) purge traverses KeysSet contents and for each deleted key checks (before purging this key):
    4.4.1) if it is still deleted
    4.4.2) if current_block_number - deleted_at_block_number > min_age_before_purge
    4.5) storage purge is not a part of consensus (i.e. there's no guarantee that old values will be purged every N blocks)
  5. there are now several places where substrate should know if values are prefixed:
    5.1) when it reads values directly from the storage (e.g. ":code")
    5.2) when it abouts to delete keys with given prefix
  6. because of (5), there are 2 pieces of code which are shared by substrate && runtime:
    6.6) to strip prefix, file substrate/runtime-io/src/prefix_shared.rs is compiled both by runtime-io and substrate-state-machine
    6.7) to remember keys that we need to purge (when deleting by prefix), file substrate/runtime-io/src/keys_set.rs is compiled both by runtime-io and substrate-state-machine

TODOs (in order of importance)

1) ":prefix" or ":prefix_len" are read on every storage access by runtime-io. We need to cache these values at least for whole "CodeExecutor::call" duration. This could be done if Ext would provide some kind of HashMap<&'static [u8], Vec>-based cache done
1.5) right now high-level tests for this new functionality are mostly located in demo-executor crate. Would be good to have low-level tests in substrate-runtime-system crate. +also add the test for temporary values deletion (make sure they're not scheduled for the purge) - right now this is a transparent (hardcoded in state-root) part of high-level tests;
2) add support (at substrate level) for runtimes to migrate storage from non-prefixed to prefixed and back forth (this would be slow since it would affect all values in the storage)
3) provide value(s) change logs to light nodes && use this to remove mined transactions/extrinsics from the pool
4) add support (at substrate level) for runtimes to change prefix length

TODOs for potential issues

1) I've found that here we only remove values from the backend, while leaving values in the overlay. Is this intended (@pepyakin ?)? Used the same strategy in save_pefix_keys, but imho it is a potential issue. UPD: paritytech/polkadot#426 fixed

@svyatonik svyatonik added the A0-please_review Pull request needs code review. label Jul 26, 2018
@svyatonik svyatonik added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Jul 26, 2018
@gavofyork
Copy link
Member

Hmm, my preference is for the "Ordered, indexed, Merklised per-block change-trie" idea rather than the prefixing idea, but happy to play in this direction to see how it goes. One particularly problematic case is there we have a huge referendum proposal (last one was 700KB) and its existence needs to be tested (this happens several times every block). In the current model, we only need to test existence of the key in the Merkle tree; if it's there we don't actually need to load the value to know that the proposal "exists". In this new model that's not enough since it might be there but in a zombie "deleted" state - we'll have to load the value (at least the first 4 bytes) to check whether it really is in the trie. That may be problematic to do efficiently without incurring the full cost of the 700KB load.

storage purge is not a part of consensus (i.e. there's no guarantee that old values will be purged every N blocks)

It must be since it alters the storage trie root.

@svyatonik
Copy link
Contributor Author

@gavofyork

  1. thanks for sharing info about large values - this must be optimized as you've told (to read prefix only). Will do that in parallel with TODO#1. Right now this should't be a problem, because this PR doesn't alters any existing chain (they have no ":prefix_len" entry) && if there are no prefix, exists_storage works as before
  2. storage root check is a part of consensus, of course. And given that the storage is the same on all nodes && ":code" runs deterministically, all scheduled values will be purged (as it is in the current implementation). I'm talking here about potential 2nd level consensus rule - like "ensure that there are no footprints after block N%interval==0". This is redundant. Probably ignore this statement at all :)

@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Jul 26, 2018
this.ext.set_prefix(&prefix, value);
Ok(())
},
ext_save_pefix_keys(prefix_data: *const u8, prefix_len: u32, set_prefix_data: *const u8, set_prefix_len: u32) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there's an r missing: ext_save_prefix_keys

@gavofyork
Copy link
Member

it would be good to know if it's even possible to read the first few bytes of a storage item without incurring the cost of reading all from disk, since testing for existence is a pretty important thing to be able to do cheaply.

@svyatonik
Copy link
Contributor Author

Just for the record:

  1. haven't found any API (in RocksDB) which just checks existence or read first few bytes of value;
  2. in this PR I've used other deletion strategy than described by @rphmeier in Protocol: Light-client friendly storage tracking #131. Probably that wasn't a good move by me, since it have lead to all this value-existence discussion. There are 2 problems with the "move under :deleted: prefix", which I have found inconvenient in usage/implementation:
    2.1) when we're setting the storage value, we do not know if it exists => on every update we should (in addition to prepending with prefix) we need to read existing value first and if it is None, check it under ":deleted" prefix + optionally remove if it is there. I thought it would be better to defer all deletion-related checks to the "purge block";
    2.2) we can't just put all deleted values under single ":deleted" key - otherwise we could purge the value before it will stay in deleted state for at least N blocks. I thought of creating several ":deleted" maps - like ":delete:at_block_10000", ":delete:at_block_20000", etc, but this requires checking several (at least 2) maps in (2.1).

So, given what I've said, if you found another approach better than what I did here, please comment + request changes.

@svyatonik
Copy link
Contributor Author

Found some issues when tried to run local testnet with prefixes => gotissues for now.

@svyatonik svyatonik added A4-gotissues and removed A0-please_review Pull request needs code review. labels Jul 31, 2018
@svyatonik svyatonik added A0-please_review Pull request needs code review. and removed A4-gotissues labels Jul 31, 2018
@svyatonik
Copy link
Contributor Author

svyatonik commented Jul 31, 2018

The problem (state_root of the built block != state_root after block execution) was a result of temporary values deletion issue. Specifically - when block is built, some values are created and deleted in the block. As a result, these values have been scheduled for deletion at 'purge block'. This isn't correct, since we only need to schedule values that have been pre-existed before block execution => fixed it.

Also: adding TODO#1.5

@svyatonik
Copy link
Contributor Author

svyatonik commented Aug 1, 2018

Added cached_storage() method to Ext. Caching all storage values seems a bad solution => only ":prefix" and ":prefix_len" are cached for now. This had slightly improved performance. Block import times are (native executor):

  • prefix-enabled runtime + cache is disabled: 0.2910150000000001;
  • prefix-enabled runtime + cache is enabled: 0.27259384615384613;
  • no-prefix runtime + cache is enabled: 0.2519619047619048;
  • no-prefix runtime + master branch (not prefix manipulations at all): 0.24247500000000002.

@ddorgan ddorgan closed this Aug 25, 2018
@ddorgan ddorgan reopened this Aug 25, 2018
@svyatonik svyatonik changed the title Prefix storage values Light friendly storage tracking: track last-modification-time entry in storage Aug 29, 2018
@gavofyork gavofyork added A1-onice and removed A0-please_review Pull request needs code review. labels Sep 5, 2018
@gavofyork
Copy link
Member

Superceded by #628 . Reopen later if that doesn't work out (though I expect it will).

@gavofyork gavofyork closed this Sep 5, 2018
@svyatonik svyatonik deleted the storage_value_prefix4 branch June 21, 2019 06:33
liuchengxu added a commit to chainx-org/substrate that referenced this pull request Aug 23, 2021
helin6 pushed a commit to boolnetwork/substrate that referenced this pull request Jul 25, 2023
Signed-off-by: Gregory Hill <gregorydhill@outlook.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants