
Some changes tries optimizations #2840

Merged: 3 commits merged into master from changes_tries_initial_optimization on Jul 16, 2019

Conversation

@svyatonik (Contributor) commented Jun 11, 2019

I'd like to enable changes tries in the default (staging) chain specification in the near future, so that our next testnet will have them enabled. But first some statistics should be collected to choose appropriate defaults for digest_interval and digest_levels. I've already tested several configurations to estimate the maximal number of changes that could be covered by top-level digests without a significant performance drop. Some rough measurements (made without the changes in this PR) are at the end of the description (fwiw).

What I found during testing is that there are a few easy-to-implement optimizations. This PR cuts block-with-CT (changes trie) import time by ~30% by:

  • reducing the number of allocations in build.rs;
  • switching from HashSet to BTreeSet in OverlayedValue::extrinsics, because we had to sort this set (by converting it to a BTreeSet) anyway (a sketch follows this list);
  • building the CT directly instead of first computing the CT root via trie_root(input_pairs) and then building the CT to store in the DB: the root is computed during the build anyway.
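
A minimal sketch of the BTreeSet point, with a hypothetical stand-in for the real OverlayedValue (the actual struct lives in the state-machine code and carries more fields): insertion keeps the extrinsic indices sorted, so the changes trie build no longer needs a separate collect-and-sort pass.

```rust
use std::collections::BTreeSet;

// Hypothetical stand-in for the state-machine's OverlayedValue.
#[derive(Default)]
struct OverlayedValue {
    // Indices of the extrinsics that changed this value, kept sorted.
    extrinsics: Option<BTreeSet<u32>>,
}

impl OverlayedValue {
    fn note_extrinsic(&mut self, index: u32) {
        // BTreeSet orders (and dedups) on insert, so iteration below is
        // already sorted; the old HashSet needed an explicit sort step.
        self.extrinsics.get_or_insert_with(BTreeSet::new).insert(index);
    }
}

fn main() {
    let mut value = OverlayedValue::default();
    for &i in [2u32, 0, 1, 2].iter() {
        value.note_extrinsic(i);
    }
    // Prints 0, 1, 2: sorted and deduplicated without extra work.
    for i in value.extrinsics.as_ref().unwrap() {
        println!("{}", i);
    }
}
```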

I have another optimization implemented: an in-memory cache that holds the data required to build digest tries (instead of reading lower-level CTs from the DB). But (1) I'm not sure in which cases it actually helps significantly, so that still needs to be confirmed, and (2) it should be a separate PR.
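
For illustration only, a rough sketch of what such a cache could look like; all names here are hypothetical, not the deferred implementation itself: keep the input pairs of recently built tries in memory and fall back to a DB read only on a miss.

```rust
use std::collections::HashMap;

type BlockNumber = u64;
// (key, value) input pairs of a single changes trie.
type InputPairs = Vec<(Vec<u8>, Vec<u8>)>;

// Hypothetical build cache: holds the inputs of lower-level tries so a
// digest block can fold them in without reading them back from the DB.
#[derive(Default)]
struct BuildCache {
    recent: HashMap<BlockNumber, InputPairs>,
}

impl BuildCache {
    fn insert(&mut self, block: BlockNumber, pairs: InputPairs) {
        self.recent.insert(block, pairs);
    }

    // Prefer cached pairs; fall back to a DB read (e.g. after restart).
    fn get_or_read<F>(&self, block: BlockNumber, read_from_db: F) -> InputPairs
    where
        F: FnOnce(BlockNumber) -> InputPairs,
    {
        self.recent
            .get(&block)
            .cloned()
            .unwrap_or_else(|| read_from_db(block))
    }
}

fn main() {
    let mut cache = BuildCache::default();
    cache.insert(1, vec![(b"key".to_vec(), b"val".to_vec())]);
    let pairs = cache.get_or_read(1, |_| unreachable!("cache hit expected"));
    assert_eq!(pairs.len(), 1);
}
```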

==========================================================================

In a chain where every block changes 1000 different keys:

  • importing a block without CT takes ~0.01s;
  • importing a non-digest block with CT takes ~0.018s;
  • importing an l1-digest block which covers 8 blocks (i.e. ~8000 different keys changed within this range) takes ~0.072s;
  • importing an l2-digest block which covers 64 blocks (i.e. ~64000 different keys changed within this range) takes ~0.45s.

In a chain where every block changes 5000 different keys:

  • importing a block without CT takes ~0.027s;
  • importing a non-digest block with CT takes ~0.060s;
  • importing an l1-digest block which covers 8 blocks (i.e. ~40000 different keys changed within this range) takes ~0.25s;
  • importing an l2-digest block which covers 64 blocks (i.e. ~320000 different keys changed within this range) takes ~2.106s.
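
To make the interval/level terminology above concrete, here is a small self-contained sketch (my own illustration, assuming digest_interval = 8 and digest_levels = 2 as in these measurements): an l1 digest is built every 8 blocks and covers the previous 8 tries; an l2 digest is built every 64 blocks and covers the previous 8 l1 digests.

```rust
// Sketch: which digest level (0 = ordinary CT) a block produces, given
// the configuration described above. Names are illustrative only.
fn digest_level(block: u64, digest_interval: u64, digest_levels: u32) -> u32 {
    let mut level = 0;
    let mut interval = 1;
    for _ in 0..digest_levels {
        interval *= digest_interval;
        if block > 0 && block % interval == 0 {
            level += 1;
        } else {
            break;
        }
    }
    level
}

fn main() {
    assert_eq!(digest_level(7, 8, 2), 0);  // ordinary (non-digest) CT
    assert_eq!(digest_level(8, 8, 2), 1);  // l1 digest: covers 8 blocks
    assert_eq!(digest_level(64, 8, 2), 2); // l2 digest: covers 64 blocks
    println!("ok");
}
```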

@svyatonik added the A0-please_review (Pull request needs code review.) and M4-core labels on Jun 11, 2019
@svyatonik added the A3-in_progress (Pull request is in progress. No review needed at this stage.) label and removed A0-please_review on Jun 12, 2019
@svyatonik (Contributor, Author)

I think I found some other things that could be eliminated; in progress.

@svyatonik (Contributor, Author)

I thought about exploiting the fact that extrinsics are executed in order, i.e. replacing extrinsics: Option<BTreeSet<u32>> in OverlayedValue with extrinsics: Option<Vec<u32>>, which would be naturally ordered. But it's probably not the best idea, because it is the runtime that updates EXTRINSIC_INDEX, so in-order updates aren't guaranteed.

@svyatonik added and removed the A0-please_review and A3-in_progress labels on Jun 12, 2019
@svyatonik (Contributor, Author)

The last update of this PR: I'm adding results for an 'average' case, where every block:

  1. updates 130 unique (across all blocks) keys;
  2. adds fake updates for 20 unique (across all blocks) keys;
  3. updates the same 50 keys at every block;
  4. adds fake updates for the same 10 keys at every block.

With CT enabled, importing a block takes ~0.00399s. When digests are created every 512 blocks, importing the digest block takes ~0.35114s. When digests are created every 1024 blocks, importing the digest block takes ~0.69611s.

In conclusion, I think that before enabling CT for the testnet:

  1. it makes sense to allow changing the CT configuration on an active chain;
  2. we should probably later add some auto-maintenance mechanism that limits the number of changes that can be covered by digest tries.

@svyatonik added the A0-please_review (Pull request needs code review.) label and removed A3-in_progress on Jun 20, 2019
@gavofyork (Member)

> limit number of changes that could be covered by digest tries

How would this work?

@svyatonik (Contributor, Author)

Something like this, though I have no details yet: (1) count the number of keys that have to be covered by the next top-level CT digest; (2) when this crosses the limit, create the digest CT at the next block and emit a signal (digest item) saying that the top-level digest CT was created ahead of schedule; (3) reset the counter to zero and go to (1).

A configuration change should also be announced to light clients using a special digest item.
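
A minimal sketch of that trigger with hypothetical names (the real mechanism would sit in the changes trie build path, and the signal would be an actual digest item):

```rust
// Hypothetical early-digest trigger: once the pending input pairs for
// the next top-level digest cross a limit, build the digest at the
// next block, emit a signal, and reset the counter.
struct DigestLimiter {
    pending_pairs: u64,
    max_pairs: u64,
}

// Stand-in for a digest item announcing an out-of-schedule digest (or
// a configuration change) to light clients.
#[derive(Debug)]
enum CtSignal {
    EarlyTopLevelDigest,
}

impl DigestLimiter {
    fn on_block(&mut self, new_pairs: u64) -> Option<CtSignal> {
        self.pending_pairs += new_pairs;
        if self.pending_pairs >= self.max_pairs {
            // Step (2): digest ahead of schedule plus a signal;
            // step (3): reset the counter.
            self.pending_pairs = 0;
            Some(CtSignal::EarlyTopLevelDigest)
        } else {
            None
        }
    }
}

fn main() {
    let mut limiter = DigestLimiter { pending_pairs: 0, max_pairs: 1000 };
    for block in 1u64..=10 {
        // Pretend every block contributes 300 input pairs.
        if let Some(signal) = limiter.on_block(300) {
            println!("block {}: {:?}", block, signal);
        }
    }
}
```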

@svyatonik (Contributor, Author)

Most probably "number of keys" should read "number of changes trie input pairs" that have to be read to build the top-level digest block, because it is impossible to get len(unique keys from all lower-level digests) without extra computation.

@gavofyork (Member)

Ok, makes sense. Might it be possible to do incremental CT builds, to spread the cost over all the blocks rather than have a single block that takes ages?

@svyatonik (Contributor, Author)

> might it be possible to do incremental CT builds to spread the cost over all the blocks rather than have a single block that takes ages?

I considered this at the very beginning: you know that the next digest trie will be built at block #1024, so while the tries for the previous 1023 blocks are being built, you also update the trie for block #1024; when you reach #1024, you already have an almost-complete digest trie. But then I thought about forks: if the chain forks at #500 and fork1 then forks again at #700, you end up with 4 potential future digest tries that have to be updated on every block. That seemed too heavy (performance-wise) and too complex (code-wise) to maintain, so I dropped the idea.

@gavofyork (Member)

ok

@Demi-Marie (Contributor) left a comment

This looks good, although I do not understand this code very well.

@gavofyork merged commit f7489f0 into master on Jul 16, 2019
@gavofyork deleted the changes_tries_initial_optimization branch on Jul 16, 2019 at 01:25