From 4b6b868b1ce6dd3a5e86922de5e8d2faeb1618d4 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Fri, 14 Aug 2020 14:47:07 -0700 Subject: [PATCH 01/35] wip Co-authored-by: Jane Lusby --- book/src/dev/rfcs/0003-state-updates.md | 284 ++++++++++++++++++++++++ 1 file changed, 284 insertions(+) create mode 100644 book/src/dev/rfcs/0003-state-updates.md diff --git a/book/src/dev/rfcs/0003-state-updates.md b/book/src/dev/rfcs/0003-state-updates.md new file mode 100644 index 00000000000..9c9de77511a --- /dev/null +++ b/book/src/dev/rfcs/0003-state-updates.md @@ -0,0 +1,284 @@ +# State Updates + +- Feature Name: state_updates +- Start Date: 2020-08-14 +- Design PR: XXX +- Zebra Issue: XXX + +# Summary +[summary]: #summary + +Zebra manages chain state in the `zebra-state` crate, which allows state +queries via asynchronous RPC (in the form of a Tower service). The state +system is responsible for contextual verification in the sense of [RFC2], +checking that new blocks are consistent with the existing chain state before +committing them. This RFC describes how the state is represented internally, +and how state updates are performed. + +[RFC2]: ./0002-parallel-verification.md + +# Motivation +[motivation]: #motivation + +We need to be able to access and modify the chain state, and we want to have +a description of how this happens and what guarantees are provided by the +state service. + +# Definitions +[definitions]: #definitions + +* **state data**: Any data the state service uses to represent chain state. + +* **structural/semantic/contextual verification**: as defined in [RFC2]. + +* **block chain**: A sequence of valid blocks linked by inclusion of the + previous block hash in the subsequent block. Chains are rooted at the + *genesis* block and extend to a *tip*. + +* **chain state**: The state of the ledger after application of a particular + sequence of blocks (state transitions). + +* **difficulty**: The cumulative proof-of-work from genesis to the chain tip. 
+ +* **best chain**: The chain with the greatest difficulty. This chain + represents the consensus state of the Zcash network and transactions. + +* **side chain**: A chain which is not contained in the best chain. + +* **chain reorganization**: Occurs when a new best chain is found and the + previous best chain becomes a side chain. + +* **orphaned block**: A block which is no longer included in the best chain. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +XXX fill in after writing other details + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## State Components + +Zcash (as implemented by `zcashd`) differs from Bitcoin in its treatment of +transaction finality. If a new best chain is detected that does not extend +the previous best chain, blocks at the end of the previous best chain become +orphaned (no longer included in the best chain). Their state updates are +therefore no longer included in the best chain's chain state. The process of +rolling back orphaned blocks and applying new blocks is called a chain +reorganization. Bitcoin allows chain reorganizations of arbitrary depth, +while `zcashd` limits reorganizations to 100 blocks. + +This difference means that in Bitcoin, chain state only has probabilistic +finality, while in Zcash, chain state is final once it is beyond the reorg +limit. To simplify our implementation, we split the representation of the +state data at the finality boundary provided by the reorg limit. + +State data from blocks *above* the reorg limit is stored in-memory using +immutable data structures from the `im` crate. State data from blocks *below* +the reorg limit is stored persistently using `sled`. This allows a +simplification of our state handling, because only final data is persistent. + +We choose `im` because it provides best-in-class manipulation of persistent +data structures. 
We choose `sled` because of its ease of integration and API +simplicity. + +One downside of this design is that restarting the node loses the last 100 +blocks, but node restarts are relatively infrequent and a short re-sync is +cheap relative to the cost of additional implementation complexity. + +## Service Interface +[service-interface]: #service-interface + +The state is accessed asynchronously through a Tower service interface. +Determining what guarantees the state service can and should provide to the +rest of the application requires considering two sets of behaviors: + +1. behaviors related to the state's external API (a `Buffer`ed `tower::Service`); +2. behaviors related to the state's internal implementation (using `sled`). + +Making this distinction helps us to ensure we don't accidentally leak +"internal" behaviors into "external" behaviors, which would violate +encapsulation and make it more difficult to replace `sled`. + +In the first category, our state is presented to the rest of the application +as a `Buffer`ed `tower::Service`. The `Buffer` wrapper allows shared access +to a service using an actor model, moving the service to be shared into a +worker task and passing messages to it over a multi-producer single-consumer +(mpsc) channel. The worker task receives messages and makes `Service::call`s. +The `Service::call` method returns a `Future`, and the service is allowed to +decide how much work it wants to do synchronously (in `call`) and how much +work it wants to do asynchronously (in the `Future` it returns). + +This means that our external API ensures that the state service sees a +linearized sequence of state requests, although the exact ordering is +unpredictable when there are multiple senders making requests. + +In the second category, the Sled API presents itself synchronously, but +database and tree handles are clonable and can be moved between threads.
All +that's required to process some request asynchronously is to clone the +appropriate handle, move it into an async block, and make the call as part of +the future. (We might want to use Tokio's blocking API for this, but this is +an implementation detail). + +Because the state service has exclusive access to the sled database, and the +state service sees a linearized sequence of state requests, we have an easy +way to opt in to asynchronous database access. We can perform sled operations +synchronously in the `Service::call`, waiting for them to complete, and be +sure that all future requests will see the resulting sled state. Or, we can +perform sled operations asynchronously in the future returned by +`Service::call`. + +If we perform all *writes* synchronously and allow reads to be either +synchronous or asynchronous, we ensure that writes cannot race each other. +Asynchronous reads are guaranteed to read at least the state present at the +time the request was processed, or a later state. + +In summary: + +- **Sled reads** may be done synchronously (in `call`) or asynchronously (in + the `Future`), depending on the context; + +- **Sled writes** must be done synchronously (in `call`). + +## In-memory data structures +[in-memory]: #in-memory + +At a high level, the in-memory data structures store a collection of chains, +each rooted at the highest finalized block. Each chain consists of a map from +heights to blocks. Chains are stored using an ordered map from difficulty to +chains, so that the map ordering is the ordering of best to worst chains. + +- XXX fill in details on exact types + +- XXX work out whether we should store extra data (e.g., a HashSet of UTXOs + spent by some block etc.) to speed up checks. + +When a new block extends the best chain past 100 blocks, the old root is +removed from the in-memory state and committed to sled. + +## Sled data structures +[sled]: #sled + +Sled provides a persistent, thread-safe `BTreeMap<&[u8], &[u8]>`. 
Each map is +a distinct "tree". Keys are sorted using lex order on byte strings, so +integer values should be stored using big-endian encoding (so that the lex +order on byte strings is the numeric ordering). + +We use the following Sled trees: + +| Tree | Keys | Values | +|---------------------|-----------------------|-------------------------------------| +| `blocks_by_hash` | `BlockHeaderHash` | `Block` | +| `hash_by_height` | `BE32(height)` | `BlockHeaderHash` | +| `tx_by_hash` | `TransactionHash` | `BlockHeaderHash || BE32(tx_index)` | +| `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` | + +Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`. + +## Request / Response API +[request-response]: #request-response + +The state API is provided by a pair of `Request`/`Response` enums. Each +`Request` variant corresponds to particular `Response` variants, and it's +fine (and encouraged) for caller code to unwrap the expected variants with +`unreachable!` on the unexpected variants. This is slightly inconvenient but +it means that we have a unified state interface with unified backpressure. + +This API includes both write and read calls. Spotting `Commit` requests in +code review should not be a problem, but in the future, if we need to +restrict access to write calls, we could implement a wrapper service that +rejects these, and export "read" and "write" frontends to the same inner service. + +### `Request::CommitBlock(Arc<Block>)` +[request-commit-block]: #request-commit-block + +Performs contextual validation of the given block, committing it to the state +if successful. Returns `Response::Added(BlockHeaderHash)` with the hash of +the newly committed block or an error. + +If the parent block is not committed, add the block to an internal queue for +future processing. + +Otherwise, attempt to perform contextual validation checks and then commit +the given block to the state.
The exact list of contextual validation checks +will be specified in a later RFC. If contextual validation checks succeed, +the new block is added to one of the in-memory chains. If the resulting chain +is longer than 100 blocks, the oldest block is now past the reorg limit, so +it is removed from the in-memory chain and committed to sled as described +below. + +Finally, process any queued children of the newly committed block the same way. + +### `Request::CommitFinalizedBlock` +[request-commit-finalized-block]: #request-commit-finalized-block + +Commits a finalized block to the sled state, skipping contextual validation. +The block's parent must be the current sled tip. This is exposed for use in +checkpointing, which produces in-order finalized blocks. Returns +`Response::Added(BlockHeaderHash)` with the hash of the committed block if +successful. + +This should be implemented as a wrapper around a function also called by +[`Request::CommitBlock`](#request-commit-block), which should: + +1. Obtain the highest entry of `hash_by_height` as `(old_height, old_tip)`. +Check that `block`'s parent hash is `old_tip` and its height is +`old_height+1`, or error. This check is performed as defense-in-depth +to prevent database corruption, but it is the caller's responsibility to +commit finalized blocks in order. + +2. Insert `(block_hash, block)` into `blocks_by_hash` and + `(BE32(height), block_hash)` into `hash_by_height`. + +3. Iterate over the enumerated transactions in the block. For each transaction: + 1. Insert `(transaction_hash, block_hash || BE32(tx_index))` to `tx_by_hash`; + 2. For each `TransparentInput::PrevOut { outpoint, .. }` in the + transaction's `inputs()`, remove `outpoint` from `utxo_by_outpoint`. + 3. For each `output` in the transaction's `outputs()`, construct the + `outpoint` that identifies it, and insert `(outpoint, output)` into `utxo_by_outpoint`. + These updates can be performed using a sled `Batch`.
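The three finalized-commit steps listed above can be sketched against in-memory `BTreeMap`s standing in for the Sled trees. This is a minimal illustration under loud assumptions: `Hash`, `OutPoint`, `Transaction`, `Block`, and `FinalizedState` are hypothetical simplifications (not Zebra's types), outputs are reduced to plain `u64` values, and the parent/height check returns an error as step 1 describes. A real implementation would group these writes into a single sled `Batch`, as the text suggests.

```rust
use std::collections::BTreeMap;

/// Hypothetical 32-byte hash standing in for block and transaction hashes.
type Hash = [u8; 32];

/// Hypothetical transparent outpoint: creating transaction's hash and output index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct OutPoint {
    hash: Hash,
    index: u32,
}

/// Hypothetical, heavily simplified transaction.
struct Transaction {
    hash: Hash,
    /// Outpoints spent by this transaction's `PrevOut` inputs.
    spends: Vec<OutPoint>,
    /// Values of the transparent outputs this transaction creates.
    outputs: Vec<u64>,
}

/// Hypothetical, heavily simplified block.
struct Block {
    hash: Hash,
    parent: Hash,
    height: u32,
    transactions: Vec<Transaction>,
}

/// In-memory maps standing in for the Sled trees in the table above.
#[derive(Default)]
struct FinalizedState {
    blocks_by_hash: BTreeMap<Hash, Block>,
    hash_by_height: BTreeMap<[u8; 4], Hash>,
    tx_by_hash: BTreeMap<Hash, (Hash, [u8; 4])>,
    utxo_by_outpoint: BTreeMap<OutPoint, u64>,
}

impl FinalizedState {
    /// Commits `block` following steps 1-3, returning its hash.
    fn commit_finalized(&mut self, block: Block) -> Result<Hash, String> {
        // Step 1: defense-in-depth check that `block` extends the current tip.
        // (This simplified sketch accepts any first block into an empty state.)
        if let Some((old_height, old_tip)) = self.hash_by_height.iter().next_back() {
            let old_height = u32::from_be_bytes(*old_height);
            if block.parent != *old_tip || block.height != old_height + 1 {
                return Err(format!("block at height {} does not extend the tip", block.height));
            }
        }
        let block_hash = block.hash;

        // Step 2 (first half): index the block hash by big-endian height.
        self.hash_by_height.insert(block.height.to_be_bytes(), block_hash);

        // Step 3: per-transaction index and UTXO updates.
        for (tx_index, tx) in block.transactions.iter().enumerate() {
            self.tx_by_hash.insert(tx.hash, (block_hash, (tx_index as u32).to_be_bytes()));
            for outpoint in &tx.spends {
                self.utxo_by_outpoint.remove(outpoint);
            }
            for (index, value) in tx.outputs.iter().enumerate() {
                let outpoint = OutPoint { hash: tx.hash, index: index as u32 };
                self.utxo_by_outpoint.insert(outpoint, *value);
            }
        }

        // Step 2 (second half): store the block itself; done last because it moves `block`.
        self.blocks_by_hash.insert(block_hash, block);
        Ok(block_hash)
    }
}
```

Because `hash_by_height` is keyed by big-endian bytes, `iter().next_back()` returns the highest height, which is exactly the "highest entry" lookup step 1 relies on.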
+ +### `Request::Depth(BlockHeaderHash)` +[request-depth]: #request-depth + +Computes the depth in the best chain of the block identified by the given +hash, returning + +- `Response::Depth(Some(depth))` if the block is in the main chain; +- `Response::Depth(None)` otherwise. + +### `Request::Tip` +[request-tip]: #request-tip + +Returns `Response::Tip(BlockHeaderHash)` with the current best chain tip. + +### `Request::BlockLocator` +[request-block-locator]: #request-block-locator + +- XXX fill in + +### `Request::Transaction(TransactionHash)` +[request-transaction]: #request-transaction + +- XXX fill in + +### `Request::Block(BlockHeaderHash)` +[request-block]: #request-block + +- XXX fill in + +# Drawbacks +[drawbacks]: #drawbacks + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +# Prior art +[prior-art]: #prior-art + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +# Future possibilities +[future-possibilities]: #future-possibilities From 4f2303b38584b6fe26df21efa653ed1c29e8ff53 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Fri, 14 Aug 2020 17:50:56 -0700 Subject: [PATCH 02/35] wip2: add nullifiers Co-authored-by: Jane Lusby --- book/src/dev/rfcs/0003-state-updates.md | 59 +++++++++++++++++-------- 1 file changed, 40 insertions(+), 19 deletions(-) diff --git a/book/src/dev/rfcs/0003-state-updates.md b/book/src/dev/rfcs/0003-state-updates.md index 9c9de77511a..3ef66092dd1 100644 --- a/book/src/dev/rfcs/0003-state-updates.md +++ b/book/src/dev/rfcs/0003-state-updates.md @@ -167,12 +167,14 @@ order on byte strings is the numeric ordering). 
We use the following Sled trees: -| Tree | Keys | Values | -|---------------------|-----------------------|-------------------------------------| -| `blocks_by_hash` | `BlockHeaderHash` | `Block` | -| `hash_by_height` | `BE32(height)` | `BlockHeaderHash` | -| `tx_by_hash` | `TransactionHash` | `BlockHeaderHash || BE32(tx_index)` | -| `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` | +| Tree | Keys | Values | +|----------------------|-----------------------|-------------------------------------| +| `blocks_by_hash` | `BlockHeaderHash` | `Block` | +| `hash_by_height` | `BE32(height)` | `BlockHeaderHash` | +| `tx_by_hash` | `TransactionHash` | `BlockHeaderHash || BE32(tx_index)` | +| `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` | +| `sprout_nullifiers` | `sprout::Nullifier` | `()` | +| `sapling_nullifiers` | `sapling::Nullifier` | `()` | Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`. @@ -200,17 +202,17 @@ the newly committed block or an error. If the parent block is not committed, add the block to an internal queue for future processing. -Otherwise, attempt to perform contextual validation checks and then commit -the given block to the state. The exact list of contextual validation checks -will be specified in a later RFC. If contextual validation checks succeed, -the new block is added to one of the in-memory chains. If the resulting chain -is longer than 100 blocks, the oldest block is now past the reorg limit, so -it is removed from the in-memory chain and committed to sled as described -below. +Otherwise, attempt to perform contextual validation checks and then commit the +given block to the state. The exact list of contextual validation checks will +be specified in a later RFC. If contextual validation checks succeed, the new +block is added to one of the in-memory chains.
If the resulting chain is +longer than 100 blocks, the oldest block is now past the reorg limit, so we +remove it from all in-memory chains and commit it to sled as described below +in `CommitFinalizedBlock`. Finally, process any queued children of the newly committed block the same way. -### `Request::CommitFinalizedBlock` +### `Request::CommitFinalizedBlock(Arc<Block>)` [request-commit-finalized-block]: #request-commit-finalized-block Commits a finalized block to the sled state, skipping contextual validation. @@ -224,7 +226,7 @@ This should be implemented as a wrapper around a function also called by 1. Obtain the highest entry of `hash_by_height` as `(old_height, old_tip)`. Check that `block`'s parent hash is `old_tip` and its height is -`old_height+1`, or error. This check is performed as defense-in-depth +`old_height+1`, or panic. This check is performed as defense-in-depth to prevent database corruption, but it is the caller's responsibility to commit finalized blocks in order. @@ -232,12 +234,31 @@ commit finalized blocks in order. `(BE32(height), block_hash)` into `hash_by_height`. 3. Iterate over the enumerated transactions in the block. For each transaction: - 1. Insert `(transaction_hash, block_hash || BE32(tx_index))` to `tx_by_hash`; + + 1. Insert `(transaction_hash, block_hash || BE32(tx_index))` to + `tx_by_hash`; + 2. For each `TransparentInput::PrevOut { outpoint, .. }` in the - transaction's `inputs()`, remove `outpoint` from `utxo_by_outpoint`. + transaction's `inputs()`, remove `outpoint` from `utxo_by_outpoint`. + 3. For each `output` in the transaction's `outputs()`, construct the - `outpoint` that identifies it, and insert `(outpoint, output)` into `utxo_by_outpoint`. - These updates can be performed using a sled `Batch`. + `outpoint` that identifies it, and insert `(outpoint, output)` into + `utxo_by_outpoint`. + + 4. For each [`JoinSplit`] description in the transaction, + insert `(nullifiers[0],())` and `(nullifiers[1],())` into + `sprout_nullifiers`. + + 5.
For each [`Spend`] description in the transaction, insert + `(nullifier,())` into `sapling_nullifiers`. + +[`JoinSplit`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.JoinSplit.html +[`Spend`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.Spend.html + + +These updates can be performed in a batch or without necessarily iterating +over all transactions, if the data is available by other means; they're +specified this way for clarity. ### `Request::Depth(BlockHeaderHash)` [request-depth]: #request-depth From 5898590e7f569450d3b26c2dfe8f995cb3fc02e7 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Mon, 31 Aug 2020 15:29:13 -0700 Subject: [PATCH 03/35] Update book/src/dev/rfcs/0003-state-updates.md Co-authored-by: teor --- book/src/dev/rfcs/0003-state-updates.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/book/src/dev/rfcs/0003-state-updates.md b/book/src/dev/rfcs/0003-state-updates.md index 3ef66092dd1..3bb9ae6d6d2 100644 --- a/book/src/dev/rfcs/0003-state-updates.md +++ b/book/src/dev/rfcs/0003-state-updates.md @@ -169,9 +169,9 @@ We use the following Sled trees: | Tree | Keys | Values | |----------------------|-----------------------|-------------------------------------| -| `blocks_by_hash` | `BlockHeaderHash` | `Block` | -| `hash_by_height` | `BE32(height)` | `BlockHeaderHash` | -| `tx_by_hash` | `TransactionHash` | `BlockHeaderHash || BE32(tx_index)` | +| `blocks_by_hash` | `block::Hash` | `Block` | +| `hash_by_height` | `BE32(height)` | `block::Hash` | +| `tx_by_hash` | `transaction::Hash` | `block::Hash || BE32(tx_index)` | | `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` | | `sprout_nullifiers` | `sprout::Nullifier` | `()` | | `sapling_nullifiers` | `sapling::Nullifier` | `()` | From 4b75d21255e33a7f23704f2c3633bd1f8cd5f58d Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Mon, 31 Aug 2020 20:31:17 -0700 Subject: [PATCH 04/35] Move to RFC number 5 --- book/src/SUMMARY.md | 1 + 
.../dev/rfcs/{0003-state-updates.md => 0005-state-updates.md} | 0 2 files changed, 1 insertion(+) rename book/src/dev/rfcs/{0003-state-updates.md => 0005-state-updates.md} (100%) diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 4adadf0bb61..de80dda91a0 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -14,6 +14,7 @@ - [Parallel Verification](dev/rfcs/0002-parallel-verification.md) - [Inventory Tracking](dev/rfcs/0003-inventory-tracking.md) - [Asynchronous Script Verification](dev/rfcs/0004-asynchronous-script-verification.md) + - [State Updates](dev/rfcs/0005-state-updates.md) - [Diagrams](dev/diagrams.md) - [Network Architecture](dev/diagrams/zebra-network.md) - [zebra-checkpoints](dev/zebra-checkpoints.md) diff --git a/book/src/dev/rfcs/0003-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md similarity index 100% rename from book/src/dev/rfcs/0003-state-updates.md rename to book/src/dev/rfcs/0005-state-updates.md From 5d5a3f8d6b39410ddf9975f23192000856ad33f1 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Mon, 7 Sep 2020 21:29:11 -0700 Subject: [PATCH 05/35] rfc: add PR link to state update RFC --- book/src/dev/rfcs/0005-state-updates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 3bb9ae6d6d2..2b79a26f90c 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -2,7 +2,7 @@ - Feature Name: state_updates - Start Date: 2020-08-14 -- Design PR: XXX +- Design PR: https://github.com/ZcashFoundation/zebra/pull/902 - Zebra Issue: XXX # Summary From 6b8a5d7bf488e217da7353f7ff4012f0a0e94491 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Mon, 7 Sep 2020 21:29:51 -0700 Subject: [PATCH 06/35] rfc: change state RFC to store blocks by height. 
The rationale for this change is described in the document: it means that we write blocks only to one end of the Sled tree, and hopefully helps us with spatial access patterns. This should help alleviate a major cause of memory use in Zebra's current WIP Sled structure, which is that: - blocks are stored in random, sparse order (by hash) in the B-tree; - the `Request::GetDepth` method opens the entire block store and queries a random part of its block data to determine whether a hash is present; - if present, it deserializes the complete block data of both the given block and the current tip block, to compute the difference in block heights. This access pattern forces a large amount of B-tree data to remain resident, and could probably be avoided if we didn't do that. --- book/src/dev/rfcs/0005-state-updates.md | 58 +++++++++++++++++++------ 1 file changed, 45 insertions(+), 13 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 2b79a26f90c..5314dcda9da 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -169,15 +169,36 @@ We use the following Sled trees: | Tree | Keys | Values | |----------------------|-----------------------|-------------------------------------| -| `blocks_by_hash` | `block::Hash` | `Block` | -| `hash_by_height` | `BE32(height)` | `block::Hash` | -| `tx_by_hash` | `transaction::Hash` | `block::Hash || BE32(tx_index)` | +| `hash_by_height` | `BE32(height)` | `block::Hash` | +| `height_by_hash` | `block::Hash` | `BE32(height)` | +| `block_by_height` | `BE32(height)` | `Block` | +| `tx_by_hash` | `transaction::Hash` | `BE32(height) || BE32(tx_index)` | | `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` | | `sprout_nullifiers` | `sprout::Nullifier` | `()` | | `sapling_nullifiers` | `sapling::Nullifier` | `()` | Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`. 
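The `BE32(height)` keys in the table above depend on the earlier observation that Sled orders keys lexicographically by bytes. The sketch below uses a standard `BTreeMap<Vec<u8>, _>`, which also orders `Vec<u8>` keys lexicographically, as a stand-in for a Sled tree. It shows that big-endian keys iterate in numeric height order while little-endian keys do not. The `be32` and `iteration_order` helpers are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper matching the `BE32(height)` notation: 4 big-endian bytes.
fn be32(n: u32) -> [u8; 4] {
    n.to_be_bytes()
}

/// Returns heights in the order a byte-ordered map iterates them,
/// using `encode` to turn each height into its key bytes.
fn iteration_order(heights: &[u32], encode: fn(u32) -> [u8; 4]) -> Vec<u32> {
    let mut tree: BTreeMap<Vec<u8>, u32> = BTreeMap::new();
    for &h in heights {
        tree.insert(encode(h).to_vec(), h);
    }
    tree.values().copied().collect()
}
```

With heights `[1, 255, 256, 65536]`, big-endian keys iterate as `1, 255, 256, 65536`, while little-endian keys iterate as `65536, 256, 1, 255`. The latter would break "highest entry of `hash_by_height`" tip lookups, and it would also scatter appends across the tree instead of writing only at the end.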
+### Notes on Sled trees + +- The `hash_by_height` and `height_by_hash` trees map between + block heights and block hashes in both directions. (Since the Sled state only stores finalized + state, this mapping is actually a bijection.) + +- Blocks are stored by height, not by hash. This has the downside that looking + up a block by hash requires an extra level of indirection. The upside is + that blocks with adjacent heights are adjacent in the database, and many + common access patterns, such as helping a client sync the chain or doing + analysis, access blocks in (potentially sparse) height order. In addition, + the fact that we commit blocks in order means we're writing only to the end + of the Sled tree, which may help save space. + +- Transaction references are stored as a `(height, index)` pair referencing the + height of the transaction's parent block and the transaction's index in that + block. This would more traditionally be a `(hash, index)` pair, but because + we store blocks by height, storing the height saves one level of indirection. + + ## Request / Response API [request-response]: #request-response @@ -216,13 +237,19 @@ Finally, process any queued children of the newly committed block the same way. [request-commit-finalized-block]: #request-commit-finalized-block Commits a finalized block to the sled state, skipping contextual validation. -The block's parent must be the current sled tip. This is exposed for use in -checkpointing, which produces in-order finalized blocks. Returns -`Response::Added(BlockHeaderHash)` with the hash of the committed block if -successful. +This is exposed for use in checkpointing, which produces in-order finalized +blocks. Returns `Response::Added(BlockHeaderHash)` with the hash of the +committed block if successful.
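The `Request`/`Response` convention described earlier, where each request has one expected response variant and callers treat any other variant as `unreachable!`, might look like this for a finalized commit. This is a sketch under assumptions: the enums, the `Hash` alias, and the `call` and `commit_finalized` functions are hypothetical simplifications. The real variants carry `Arc<Block>`, and a plain synchronous function stands in for the `Buffer`ed Tower service.

```rust
/// Hypothetical 32-byte block hash.
type Hash = [u8; 32];

/// Simplified request enum (the real variants carry blocks, not just hashes).
enum Request {
    CommitFinalizedBlock { hash: Hash },
    Tip,
}

/// Simplified response enum shared by all requests.
#[derive(Debug, PartialEq)]
enum Response {
    Added(Hash),
    Tip(Hash),
}

/// Stand-in for `Service::call`: each request maps to one expected response variant.
fn call(request: Request) -> Result<Response, String> {
    match request {
        Request::CommitFinalizedBlock { hash } => Ok(Response::Added(hash)),
        // Placeholder tip, for illustration only.
        Request::Tip => Ok(Response::Tip([0; 32])),
    }
}

/// Caller-side helper: unwrap the expected variant and treat any other as a bug.
fn commit_finalized(hash: Hash) -> Result<Hash, String> {
    match call(Request::CommitFinalizedBlock { hash })? {
        Response::Added(added) => Ok(added),
        _ => unreachable!("CommitFinalizedBlock always returns Response::Added"),
    }
}
```

Errors still flow through the `Result`. Only a response of the wrong variant, which would indicate a bug in the state service, hits `unreachable!`.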
-This should be implemented as a wrapper around a function also called by -[`Request::CommitBlock`](#request-commit-block), which should: +If the parent block is not committed, add the block to an internal queue for +future processing. Otherwise, call the wrapper function described below, then +process any queued children. (Although the checkpointer generates verified +blocks in order when it completes a checkpoint, the blocks are committed in the +response futures, so they may arrive out of order). + +Committing a block to the sled state should be implemented as a wrapper around +a function also called by [`Request::CommitBlock`](#request-commit-block), +which should: 1. Obtain the highest entry of `hash_by_height` as `(old_height, old_tip)`. Check that `block`'s parent hash is `old_tip` and its height is @@ -230,12 +257,14 @@ Check that `block`'s parent hash is `old_tip` and its height is to prevent database corruption, but it is the caller's responsibility to commit finalized blocks in order. -2. Insert `(block_hash, block)` into `blocks_by_hash` and - `(BE32(height), block_hash)` into `hash_by_height`. +2. Insert: + - `(hash, height)` into `height_by_hash`; + - `(height, hash)` into `hash_by_height`; + - `(height, block)` into `block_by_height`. 3. Iterate over the enumerated transactions in the block. For each transaction: - 1. Insert `(transaction_hash, block_hash || BE32(tx_index))` to + 1. Insert `(transaction_hash, block_height || BE32(tx_index))` to `tx_by_hash`; 2. For each `TransparentInput::PrevOut { outpoint, .. }` in the @@ -255,7 +284,6 @@ commit finalized blocks in order. [`JoinSplit`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.JoinSplit.html [`Spend`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.Spend.html - These updates can be performed in a batch or without necessarily iterating over all transactions, if the data is available by other means; they're specified this way for clarity. 
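The queueing rule above holds a block whose parent is not yet committed, and after each commit it drains any queued children. It can be sketched with a map from parent hash to waiting blocks. The `Block` and `QueuedCommits` types are hypothetical, and the `committed` vector stands in for the actual Sled commit (including the tip check, which this sketch omits).

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical 32-byte block hash.
type Hash = [u8; 32];

/// Hypothetical minimal block: its own hash and its parent's hash.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Block {
    hash: Hash,
    parent: Hash,
}

/// Commits blocks in parent-before-child order, queueing any that arrive early.
struct QueuedCommits {
    committed_hashes: HashSet<Hash>,
    /// Committed blocks in commit order (stands in for the Sled trees).
    committed: Vec<Block>,
    /// Blocks waiting for their parent, keyed by parent hash.
    queued_by_parent: HashMap<Hash, Vec<Block>>,
}

impl QueuedCommits {
    /// Starts from a state whose tip is `tip` (e.g. an already-committed genesis block).
    fn new(tip: Hash) -> Self {
        QueuedCommits {
            committed_hashes: HashSet::from([tip]),
            committed: Vec::new(),
            queued_by_parent: HashMap::new(),
        }
    }

    fn submit(&mut self, block: Block) {
        if !self.committed_hashes.contains(&block.parent) {
            // Parent not committed yet: queue the block for later.
            self.queued_by_parent.entry(block.parent).or_default().push(block);
            return;
        }
        // Parent is committed: commit this block, then any queued descendants.
        let mut ready = vec![block];
        while let Some(next) = ready.pop() {
            self.committed_hashes.insert(next.hash);
            self.committed.push(next);
            if let Some(children) = self.queued_by_parent.remove(&next.hash) {
                ready.extend(children);
            }
        }
    }
}
```

If a checkpoint's response futures deliver heights 3, 2, 1 in that order, the first two are queued, and submitting height 1 commits all three in height order.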
@@ -269,11 +297,15 @@ hash, returning - `Response::Depth(Some(depth))` if the block is in the main chain; - `Response::Depth(None)` otherwise. +Implemented by querying `height_by_hash`. + ### `Request::Tip` [request-tip]: #request-tip Returns `Response::Tip(BlockHeaderHash)` with the current best chain tip. +Implemented by querying `hash_by_height`. + ### `Request::BlockLocator` [request-block-locator]: #request-block-locator From 8bbdc0fac32edc5b78d9aa99626ce299671be7a9 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Tue, 8 Sep 2020 11:15:29 -0700 Subject: [PATCH 07/35] rfc: add sprout and sapling anchors to sled trees. Co-authored-by: Deirdre Connolly --- book/src/dev/rfcs/0005-state-updates.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 5314dcda9da..1c496703433 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -176,6 +176,8 @@ We use the following Sled trees: | `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` | | `sprout_nullifiers` | `sprout::Nullifier` | `()` | | `sapling_nullifiers` | `sapling::Nullifier` | `()` | +| `sprout_anchors` | `sprout::tree::Root` | `()` | +| `sapling_anchors` | `sapling::tree::Root` | `()` | Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`. @@ -262,7 +264,10 @@ commit finalized blocks in order. - `(height, hash)` into `hash_by_height`; - `(height, block)` into `block_by_height`. -3. Iterate over the enumerated transactions in the block. For each transaction: +3. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout + and Sapling anchors (XXX: how??) + +4. Iterate over the enumerated transactions in the block. For each transaction: 1. 
Insert `(transaction_hash, block_height || BE32(tx_index))` to `tx_by_hash`; From 4a1d1df70a4064687c5d2df20d50463feea61084 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Tue, 8 Sep 2020 11:49:01 -0700 Subject: [PATCH 08/35] rfc: fill in details of state service requests. --- book/src/dev/rfcs/0005-state-updates.md | 55 ++++++++++++++++++++++--- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 1c496703433..778c8078eff 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -293,7 +293,7 @@ These updates can be performed in a batch or without necessarily iterating over all transactions, if the data is available by other means; they're specified this way for clarity. -### `Request::Depth(BlockHeaderHash)` +### `Request::Depth(block::Hash)` [request-depth]: #request-depth Computes the depth in the best chain of the block identified by the given @@ -302,29 +302,72 @@ hash, returning - `Response::Depth(Some(depth))` if the block is in the main chain; - `Response::Depth(None)` otherwise. -Implemented by querying `height_by_hash`. +Implemented by querying: + +- (non-finalized) XXX parts of the non-finalized state; +- (finalized) the `height_by_hash` tree. ### `Request::Tip` [request-tip]: #request-tip Returns `Response::Tip(BlockHeaderHash)` with the current best chain tip. -Implemented by querying `hash_by_height`. +Implemented by querying: + +- (non-finalized) XXX parts of the non-finalized state; +- (finalized) the `hash_by_height` tree. ### `Request::BlockLocator` [request-block-locator]: #request-block-locator -- XXX fill in +Returns `Response::BlockLocator(Vec<block::Hash>)` with hashes starting from +the current chain tip and reaching backwards towards the genesis block. The +first hash is the current chain tip. The last hash is the tip of the +finalized portion of the state.
If the state is empty, the block locator is +also empty. + +This can be used by the sync component to request hashes of subsequent +blocks. + +Implemented by querying: + +- (non-finalized) XXX parts of the non-finalized state; +- (finalized) the `hash_by_height` tree. ### `Request::Transaction(TransactionHash)` [request-transaction]: #request-transaction -- XXX fill in +Returns + +- `Response::Transaction(Some(Transaction))` if the transaction identified by + the given hash is contained in the state; + +- `Response::Transaction(None)` if the transaction identified by the given + hash is not contained in the state. + +Implemented by querying: + +- (non-finalized) XXX parts of the non-finalized state; +- (finalized) the `tx_by_hash` (to get the parent block) and then + `block_by_height` (to get the transaction data) trees. ### `Request::Block(BlockHeaderHash)` [request-block]: #request-block -- XXX fill in +Returns + +- `Response::Block(Some(Arc<Block>))` if the block identified by the given + hash is contained in the state; + +- `Response::Block(None)` if the block identified by the given hash is not + contained in the state. + +Implemented by querying: + +- (non-finalized) XXX parts of the non-finalized state; +- (finalized) the `height_by_hash` (to get the block height) and then + `block_by_height` (to get the block data) trees. + # Drawbacks [drawbacks]: #drawbacks From 1c8ae25a524d21b8cef13c9e056599a70c8a040e Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Tue, 8 Sep 2020 12:09:10 -0700 Subject: [PATCH 09/35] rfc: extract commit process from API description --- book/src/dev/rfcs/0005-state-updates.md | 119 ++++++++++++++---------- 1 file changed, 70 insertions(+), 49 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 778c8078eff..9e0105ae3d5 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -48,8 +48,16 @@ state service.
* **chain reorganization**: Occurs when a new best chain is found and the previous best chain becomes a side chain. +* **reorg limit**: The longest reorganization accepted by Zcashd, 100 blocks. + * **orphaned block**: A block which is no longer included in the best chain. +* **non-finalized state**: State data corresponding to blocks above the reorg + limit. This data can change in the event of a chain reorg. + +* **finalized state**: State data corresponding to blocks below the reorg + limit. This data cannot change in the event of a chain reorg. + # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -74,10 +82,11 @@ finality, while in Zcash, chain state is final once it is beyond the reorg limit. To simplify our implementation, we split the representation of the state data at the finality boundary provided by the reorg limit. -State data from blocks *above* the reorg limit is stored in-memory using -immutable data structures from the `im` crate. State data from blocks *below* -the reorg limit is stored persistently using `sled`. This allows a -simplification of our state handling, because only final data is persistent. +State data from blocks *above* the reorg limit (*non-finalized state*) is +stored in-memory using immutable data structures from the `im` crate. State +data from blocks *below* the reorg limit (*finalized state*) is stored +persistently using `sled`. This allows a simplification of our state +handling, because only finalized data is persistent. We choose `im` because it provides best-in-class manipulation of persistent data structures. We choose `sled` because of its ease of integration and API @@ -87,6 +96,13 @@ One downside of this design is that restarting the node loses the last 100 blocks, but node restarts are relatively infrequent and a short re-sync is cheap relative to the cost of additional implementation complexity. 
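The split between non-finalized and finalized state described above can be modeled in a few lines. This is a minimal illustration under stated assumptions, not Zebra's implementation: blocks are reduced to their heights, a `Vec` stands in for the persistent `sled` trees, and the names `SplitState` and `REORG_LIMIT` are hypothetical.

```rust
use std::collections::VecDeque;

/// The reorg limit from the RFC text: at most 100 non-finalized blocks.
const REORG_LIMIT: usize = 100;

/// Hypothetical, heavily simplified state where blocks are just heights.
#[derive(Default)]
struct SplitState {
    /// Stand-in for the persistent `sled` store: only ever appended to.
    finalized: Vec<u32>,
    /// In-memory non-finalized blocks, lowest height at the front.
    non_finalized: VecDeque<u32>,
}

impl SplitState {
    /// Extend the best chain, then move any block that falls past the
    /// reorg limit out of memory and into the finalized store.
    fn push(&mut self, height: u32) {
        self.non_finalized.push_back(height);
        while self.non_finalized.len() > REORG_LIMIT {
            let oldest = self
                .non_finalized
                .pop_front()
                .expect("length was checked above");
            self.finalized.push(oldest);
        }
    }
}
```

In this model only `finalized` would survive a restart. That is why a restarted node has to re-sync up to the last 100 blocks.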
+Another downside of this design is that we do not achieve exactly the same +behavior as Zcashd in the event of a 51% attack: Zcashd limits *each* chain +reorganization to 100 blocks, but permits multiple reorgs, while Zebra limits +*all* chain reorgs to 100 blocks. In the event of a successful 51% attack on +Zcash, this could be resolved by wiping the Sled state and re-syncing the new +chain, but in this scenario there are worse problems. + ## Service Interface [service-interface]: #service-interface @@ -155,7 +171,24 @@ chains, so that the map ordering is the ordering of best to worst chains. spent by some block etc.) to speed up checks. When a new block extends the best chain past 100 blocks, the old root is -removed from the in-memory state and committed to sled. +removed from the non-finalized state and committed to the finalized state. + +## Committing non-finalized blocks + +If the parent block is not committed, add the block to an internal queue for +future processing. + +Otherwise, attempt to perform contextual validation checks and the commit the +given block to the state. The exact list of contextual validation checks will +be specified in a later RFC. If contextual validation checks succeed, commit +the new blocks to the non-finalized state as described below. Next, if the +resulting non-finalized chain is longer than 100 blocks, the oldest block is +now past the reorg limit. Remove it from the non-finalized state and commit +it as a finalized block, as described in the next section. + +XXX: fill in details of non-finalized state. + +Finally, process any queued children of the newly committed block the same way. ## Sled data structures [sled]: #sled @@ -200,52 +233,11 @@ Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`. block. This would more traditionally be a `(hash, index)` pair, but because we store blocks by height, storing the height saves one level of indirection. 
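The `tx_by_hash` value described above (`block_height || BE32(tx_index)`) can be sketched as an 8-byte location that points straight into `block_by_height`. This sketch makes several assumptions:

- The height is also encoded as a big-endian `u32`. Fixed-width big-endian encoding keeps byte-wise order identical to numeric order.
- The helper names are hypothetical.
- The `HashMap` stand-ins for the sled trees are hypothetical.

```rust
use std::collections::HashMap;

type TxHash = [u8; 32];

/// Encode (height, index) as BE32(height) || BE32(tx_index).
/// Assumption: the height uses the same BE32 encoding as the index.
fn encode_tx_location(height: u32, tx_index: u32) -> [u8; 8] {
    let mut bytes = [0u8; 8];
    bytes[..4].copy_from_slice(&height.to_be_bytes());
    bytes[4..].copy_from_slice(&tx_index.to_be_bytes());
    bytes
}

/// Inverse of `encode_tx_location`.
fn decode_tx_location(bytes: [u8; 8]) -> (u32, u32) {
    let height = u32::from_be_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let tx_index = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    (height, tx_index)
}

/// Look up a transaction: the stored height indexes `block_by_height`
/// directly, so no extra hash -> height lookup is needed.
fn lookup_tx<'a>(
    tx_by_hash: &HashMap<TxHash, [u8; 8]>,
    block_by_height: &'a HashMap<u32, Vec<String>>,
    hash: &TxHash,
) -> Option<&'a String> {
    let (height, tx_index) = decode_tx_location(*tx_by_hash.get(hash)?);
    block_by_height.get(&height)?.get(tx_index as usize)
}
```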
- -## Request / Response API -[request-response]: #request-response - -The state API is provided by a pair of `Request`/`Response` enums. Each -`Request` variant corresponds to particular `Response` variants, and it's -fine (and encouraged) for caller code to unwrap the expected variants with -`unreachable!` on the unexpected variants. This is slightly inconvenient but -it means that we have a unified state interface with unified backpressure. - -This API includes both write and read calls. Spotting `Commit` requests in -code review should not be a problem, but in the future, if we need to -restrict access to write calls, we could implement a wrapper service that -rejects these, and export "read" and "write" frontends to the same inner service. - -### `Request::CommitBlock(Arc)` -[request-commit-block]: #request-commit-block - -Performs contextual validation of the given block, committing it to the state -if successful. Returns `Response::Added(BlockHeaderHash)` with the hash of -the newly committed block or an error. - -If the parent block is not committed, add the block to an internal queue for -future processing. - -Otherwise, attempt to perform contextual validation checks and the commit the -given block to the state. The exact list of contextual validation checks will -be specified in a later RFC. If contextual validation checks succeed, the new -block is added to one of the in-memory chains. If the resulting chain is -longer than 100 blocks, the oldest block is now past the reorg limit, so we -remove it from all in-memory chains and commit it to sled as described below -in `CommitFinalizedBlock`. - -Finally, process any queued children of the newly committed block the same way. - -### `Request::CommitFinalizedBlock(Arc)` -[request-commit-finalized-block]: #request-finalized-block - -Commits a finalized block to the sled state, skipping contextual validation. -This is exposed for use in checkpointing, which produces in-order finalized -blocks. 
Returns `Response::Added(BlockHeaderHash)` with the hash of the -committed block if successful. +## Committing finalized blocks If the parent block is not committed, add the block to an internal queue for -future processing. Otherwise, call the wrapper function described below, then -process any queued children. (Although the checkpointer generates verified +future processing. Otherwise, commit the block described below, then +commit any queued children. (Although the checkpointer generates verified blocks in order when it completes a checkpoint, the blocks are committed in the response futures, so they may arrive out of order). @@ -293,6 +285,35 @@ These updates can be performed in a batch or without necessarily iterating over all transactions, if the data is available by other means; they're specified this way for clarity. + +## Request / Response API +[request-response]: #request-response + +The state API is provided by a pair of `Request`/`Response` enums. Each +`Request` variant corresponds to particular `Response` variants, and it's +fine (and encouraged) for caller code to unwrap the expected variants with +`unreachable!` on the unexpected variants. This is slightly inconvenient but +it means that we have a unified state interface with unified backpressure. + +This API includes both write and read calls. Spotting `Commit` requests in +code review should not be a problem, but in the future, if we need to +restrict access to write calls, we could implement a wrapper service that +rejects these, and export "read" and "write" frontends to the same inner service. + +### `Request::CommitBlock(Arc)` +[request-commit-block]: #request-commit-block + +Performs contextual validation of the given block, committing it to the state +if successful. Returns `Response::Added(BlockHeaderHash)` with the hash of +the newly committed block or an error. 
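In the request/response convention above, each `Request` variant has a known `Response` variant, and callers treat any other variant as `unreachable!`. The sketch below illustrates this synchronously, under these assumptions:

- `MockState::call` stands in for the asynchronous Tower `Service::call`.
- Only two variants are modeled.
- Hashes are plain byte arrays.
- Depth is counted from the tip, which has depth 0.

```rust
type BlockHash = [u8; 32];

/// Simplified stand-ins for the state `Request`/`Response` enums.
#[derive(Debug)]
enum Request {
    Depth(BlockHash),
    Tip,
}

#[derive(Debug, PartialEq)]
enum Response {
    Depth(Option<u32>),
    Tip(BlockHash),
}

/// Hypothetical state holding the best chain's hashes, genesis first.
struct MockState {
    best_chain: Vec<BlockHash>,
}

impl MockState {
    /// Synchronous stand-in for the async `Service::call`.
    fn call(&self, request: Request) -> Response {
        match request {
            Request::Depth(hash) => {
                // Position counted from the tip: the tip has depth 0.
                let depth = self
                    .best_chain
                    .iter()
                    .rev()
                    .position(|h| *h == hash)
                    .map(|d| d as u32);
                Response::Depth(depth)
            }
            Request::Tip => Response::Tip(*self.best_chain.last().expect("non-empty chain")),
        }
    }
}

/// Caller-side pattern: any response variant other than the expected one is a bug.
fn depth(state: &MockState, hash: BlockHash) -> Option<u32> {
    match state.call(Request::Depth(hash)) {
        Response::Depth(depth) => depth,
        _ => unreachable!("Depth requests always return Response::Depth"),
    }
}
```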
+ +### `Request::CommitFinalizedBlock(Arc)` +[request-commit-finalized-block]: #request-finalized-block + +Commits a finalized block to the sled state, skipping contextual validation. +This is exposed for use in checkpointing, which produces in-order finalized +blocks. Returns `Response::Added(BlockHeaderHash)` with the hash of the +committed block if successful. ### `Request::Depth(block::Hash)` [request-depth]: #request-depth From a9559c28dbd3396162ed3b914e88e9d0f6474b84 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Tue, 8 Sep 2020 12:56:19 -0700 Subject: [PATCH 10/35] rfc: add anchor parameters to CommitBlock. These have to be computed by a verifier, so passing them as parameters means we don't recompute them. --- book/src/dev/rfcs/0005-state-updates.md | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 9e0105ae3d5..7202b45dcf5 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -300,20 +300,37 @@ code review should not be a problem, but in the future, if we need to restrict access to write calls, we could implement a wrapper service that rejects these, and export "read" and "write" frontends to the same inner service. -### `Request::CommitBlock(Arc)` +### `Request::CommitBlock` [request-commit-block]: #request-commit-block +```rust +CommitBlock { + block: Arc, + sprout_anchor: sprout::tree::Root, + sapling_anchor: sapling::tree::Root, +} +``` + Performs contextual validation of the given block, committing it to the state if successful. Returns `Response::Added(BlockHeaderHash)` with the hash of the newly committed block or an error. 
-### `Request::CommitFinalizedBlock(Arc)` +### `Request::CommitFinalizedBlock` [request-commit-finalized-block]: #request-finalized-block +```rust +CommitFinalizedBlock { + block: Arc, + sprout_anchor: sprout::tree::Root, + sapling_anchor: sapling::tree::Root, +} +``` + Commits a finalized block to the sled state, skipping contextual validation. This is exposed for use in checkpointing, which produces in-order finalized blocks. Returns `Response::Added(BlockHeaderHash)` with the hash of the committed block if successful. + ### `Request::Depth(block::Hash)` [request-depth]: #request-depth From 64cbc666876a101d1dfe505461852543518194e1 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Tue, 8 Sep 2020 16:41:04 -0700 Subject: [PATCH 11/35] WIP for in memory state structs --- book/src/dev/rfcs/0005-state-updates.md | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 7202b45dcf5..2e18dd83089 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -167,6 +167,25 @@ chains, so that the map ordering is the ordering of best to worst chains. 
- XXX fill in details on exact types +- **Chain**: `(im::OrdMap>, HashSet, HashSet)` + - Ord impl is ordered by work + - push => add a block to the end of a chain, does contextual verification + checks, extracts info from block for extra data sets + - pop => remove the lowest block + - fork => create a new chain fork based on a given block(hash) within + another chain, calls push repeatedly +- **ChainSet**: `(BTreeSet, BTreeMap>)` + - `fn finalize(&mut self) -> Arc` + - `fn commit_block(&mut self, block: Arc) -> Result<(), Err>` + - iterate through chains, if the parent of `block` is the tip of any + chains, try to push block onto that chain + - if not, iterate through chains, checking if the parent is contained in + that chain + - if so, fork the chain at the parent and try to push `block` onto that + fork and add newly extended chain if the push succeeds + - if not, queue block, prune queued blocks that are below the reorg limit + + - XXX work out whether we should store extra data (e.g., a HashSet of UTXOs spent by some block etc.) to speed up checks. From c830c0c5aadc48824117c2503c9e15204f95ac4a Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Thu, 10 Sep 2020 11:43:13 -0700 Subject: [PATCH 12/35] tweeks from end of session with henry --- book/src/dev/rfcs/0005-state-updates.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 2e18dd83089..d5fb1c50f5c 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -167,11 +167,11 @@ chains, so that the map ordering is the ordering of best to worst chains. 
- XXX fill in details on exact types -- **Chain**: `(im::OrdMap>, HashSet, HashSet)` +- **Chain**: `(im::OrdMap>, HashSet, HashSet, HashSet, HashSet)` - Ord impl is ordered by work - push => add a block to the end of a chain, does contextual verification checks, extracts info from block for extra data sets - - pop => remove the lowest block + - pop => remove the lowest block, remove references to contents from block in various extra data sets (nullifiers, hashes, etc) - fork => create a new chain fork based on a given block(hash) within another chain, calls push repeatedly - **ChainSet**: `(BTreeSet, BTreeMap>)` From d26ecb825457209c11b332146047e51f2d57c5b1 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 11 Sep 2020 12:57:05 -0700 Subject: [PATCH 13/35] more updates from pairing --- book/src/dev/rfcs/0005-state-updates.md | 103 +++++++++++++++++++++++- 1 file changed, 99 insertions(+), 4 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index d5fb1c50f5c..b1dd6448b69 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -165,16 +165,111 @@ each rooted at the highest finalized block. Each chain consists of a map from heights to blocks. Chains are stored using an ordered map from difficulty to chains, so that the map ordering is the ordering of best to worst chains. +- index queued blocks by height rather than by hash because it lets us simultaniously limit the number of candidates as well as know when we want to prune the queue. 
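Indexing queued blocks by height, as suggested above, turns pruning into a single range operation. A minimal sketch, assuming simplified blocks that carry only a placeholder hash and parent hash; `QueuedBlocks`, `take_children`, and `prune` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Simplified block: placeholder hash and parent hash.
#[derive(Clone, Debug, PartialEq)]
struct QueuedBlock {
    hash: u64,
    parent: u64,
}

/// Hypothetical queue of blocks whose parents are not yet committed.
#[derive(Default)]
struct QueuedBlocks {
    by_height: BTreeMap<u32, Vec<QueuedBlock>>,
}

impl QueuedBlocks {
    fn queue(&mut self, height: u32, block: QueuedBlock) {
        self.by_height.entry(height).or_default().push(block);
    }

    /// Remove and return the queued children of a newly committed block,
    /// keeping any other blocks queued at the child height.
    fn take_children(&mut self, parent_height: u32, parent_hash: u64) -> Vec<QueuedBlock> {
        let child_height = parent_height + 1;
        let queued = match self.by_height.remove(&child_height) {
            Some(queued) => queued,
            None => return Vec::new(),
        };
        let (children, others): (Vec<_>, Vec<_>) =
            queued.into_iter().partition(|block| block.parent == parent_hash);
        if !others.is_empty() {
            self.by_height.insert(child_height, others);
        }
        children
    }

    /// Drop every queued block at or below the finalized tip height.
    fn prune(&mut self, finalized_height: u32) {
        // `split_off` returns the entries with keys >= its argument.
        self.by_height = self.by_height.split_off(&(finalized_height + 1));
    }
}
```

Because `BTreeMap` keeps heights sorted, one `split_off` call discards every block at or below the finalized height. Such blocks could only attach below the reorg limit, so they can never be committed.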
+ - XXX fill in details on exact types -- **Chain**: `(im::OrdMap>, HashSet, HashSet, HashSet, HashSet)` - - Ord impl is ordered by work + +```rust +struct Chain { + blocks: BTreeMap>, + height_by_hash: HashMap, + + utxos: HashSet, + sapling_anchors: HashSet, + sprout_anchors: HashSet, + sapling_nullifiers: HashSet, + sprout_nullifiers: HashSet, + partial_cumulative_work: PartialCumulativeWork, +} + +impl Chain { + // Push a block into a chain as the new tip + fn push(&mut self, block: Arc) -> Result<(), Error> { + // Do contextual validation checks... + + // Add block to end of `self.blocks` + // Add hash to `height_by_hash` + // Add new utxos and remove consumed utxos from `self.utxos` + // Add anchors to the appropriate `self._anchors` + // Add nullifiers to the appropriate `self._nullifiers` + // Add work to `self.partial_cumulative_work` + } + + fn pop_root(&mut self) -> Arc { + // Remove the lowest height block from `self.blocks` + // Remove the corresponding hash from `self.height_by_hash` + // Remove new utxos from `self.utxos` + // Remove the anchors from the appropriate `self._anchors` + // Remove the nullifiers from the appropriate `self._nullifiers` + + // Return the block + } + + fn pop_tip(&mut self) -> Arc { + // Remove the hightest height block from `self.blocks` + // Remove the corresponding hash from `self.height_by_hash` + // Add consumed utxos and remove new utxos from `self.utxos` + // Remove anchors from the appropriate `self._anchors` + // Remove the nullifiers from the appropriate `self._nullifiers` + // Subtract work from `self.partial_cumulative_work` + + // return the block + } + + fn fork(&self, parent: block::Hash) -> Self { + // assert self contains parent + // clone self + // while clone.tip.hash != parent { let _ = self.pop_tip(); } + // return clone + } +} + +impl Ord for Chain { + fn cmp(&self, other: &Self) -> Ordering { + // first compare partial_cumulative_work, return ordering if not equal + // if tied compare blockheaderhashes 
of the respective tips, return ordering + } +} + +struct ChainSet { + chains: BTreeSet, + queued_blocks: BTreeMap>>, +} + +impl ChainSet { + fn finalize(&mut self) -> Arc { + // move all chains to a temporary vec of chains + // pop root block from the best chain, called `block` hereafter + // add best chain back to `self.chains` + // iterate over the remaining of chains + // if chain starts with `block` remove `block` and re-add too `self.chains` + // if chain doesn't start with `block` drop chain + + // return `block` + } + + fn commit_block(&mut self, block: Arc) -> Result<(), Error> { + // iterate through chains, + // if parent of `block` is tip of any chains, try to push block onto + // that chain + // + // if no chains end in `block`'s parent search for a chain that contains `block` + // if a chain is found, for chain at `block.parent`, try to push `block` onto `fork`, if successful add fork to `self.chains` + // if no chain is found queue block, prune queued blocks that are below the reorg limit + } +} +``` + +- **Chain**: `(im::OrdMap>, HashSet, HashSet, HashSet, HashSet, PartialCumulativeWork)` + - Ord impl is ordered by work, tie break using block header hash - push => add a block to the end of a chain, does contextual verification checks, extracts info from block for extra data sets - pop => remove the lowest block, remove references to contents from block in various extra data sets (nullifiers, hashes, etc) + - pop_tip => remove the highest block, remove references to contents in the block - fork => create a new chain fork based on a given block(hash) within - another chain, calls push repeatedly -- **ChainSet**: `(BTreeSet, BTreeMap>)` + another chain, clones the original chain and calls pop_tip repeatedly until the given block is the tip +- **ChainSet**: `(BTreeSet, queued_blocks: BTreeMap>>)` - `fn finalize(&mut self) -> Arc` - `fn commit_block(&mut self, block: Arc) -> Result<(), Err>` - iterate through chains, if the parent of `block` is the tip of 
any From c18af0b16abf1b5f0c2207076e85392da56b89d9 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 11 Sep 2020 16:02:59 -0700 Subject: [PATCH 14/35] rewrite non-finalized state sections --- book/src/dev/rfcs/0005-state-updates.md | 268 ++++++++++++++---------- 1 file changed, 152 insertions(+), 116 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index b1dd6448b69..d224d04c728 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -83,14 +83,11 @@ limit. To simplify our implementation, we split the representation of the state data at the finality boundary provided by the reorg limit. State data from blocks *above* the reorg limit (*non-finalized state*) is -stored in-memory using immutable data structures from the `im` crate. State -data from blocks *below* the reorg limit (*finalized state*) is stored -persistently using `sled`. This allows a simplification of our state -handling, because only finalized data is persistent. - -We choose `im` because it provides best-in-class manipulation of persistent -data structures. We choose `sled` because of its ease of integration and API -simplicity. +stored in-memory and handles multiple chains. State data from blocks *below* +the reorg limit (*finalized state*) is stored persistently using `sled` and +only tracks a single chain. This allows a simplification of our state +handling, because only finalized data is persistent and the logic for +finalized data handles less invariants. One downside of this design is that restarting the node loses the last 100 blocks, but node restarts are relatively infrequent and a short re-sync is @@ -165,11 +162,19 @@ each rooted at the highest finalized block. Each chain consists of a map from heights to blocks. Chains are stored using an ordered map from difficulty to chains, so that the map ordering is the ordering of best to worst chains. 
-- index queued blocks by height rather than by hash because it lets us simultaniously limit the number of candidates as well as know when we want to prune the queue. +- index queued blocks by height rather than by hash because it lets us + simultaniously limit the number of candidates as well as know when we want + to prune the queue. - XXX fill in details on exact types +### `Chain` Type +[chain-type]: #chain-type + +We represent the non-finalized portion of a chain with the following data +structure and API: + ```rust struct Chain { blocks: BTreeMap>, @@ -182,127 +187,158 @@ struct Chain { sprout_nullifiers: HashSet, partial_cumulative_work: PartialCumulativeWork, } +``` -impl Chain { - // Push a block into a chain as the new tip - fn push(&mut self, block: Arc) -> Result<(), Error> { - // Do contextual validation checks... - - // Add block to end of `self.blocks` - // Add hash to `height_by_hash` - // Add new utxos and remove consumed utxos from `self.utxos` - // Add anchors to the appropriate `self._anchors` - // Add nullifiers to the appropriate `self._nullifiers` - // Add work to `self.partial_cumulative_work` - } - - fn pop_root(&mut self) -> Arc { - // Remove the lowest height block from `self.blocks` - // Remove the corresponding hash from `self.height_by_hash` - // Remove new utxos from `self.utxos` - // Remove the anchors from the appropriate `self._anchors` - // Remove the nullifiers from the appropriate `self._nullifiers` - - // Return the block - } - - fn pop_tip(&mut self) -> Arc { - // Remove the hightest height block from `self.blocks` - // Remove the corresponding hash from `self.height_by_hash` - // Add consumed utxos and remove new utxos from `self.utxos` - // Remove anchors from the appropriate `self._anchors` - // Remove the nullifiers from the appropriate `self._nullifiers` - // Subtract work from `self.partial_cumulative_work` - - // return the block - } - - fn fork(&self, parent: block::Hash) -> Self { - // assert self contains parent - // 
clone self - // while clone.tip.hash != parent { let _ = self.pop_tip(); } - // return clone - } -} +The `Chain` type consists of a set of blocks, representing the non-finalized +portion of the chain it represents where the lowest height block's parent is +the tip of the finalized state. All of the other members cache information +contained within that set of blocks for fast lookup. -impl Ord for Chain { - fn cmp(&self, other: &Self) -> Ordering { - // first compare partial_cumulative_work, return ordering if not equal - // if tied compare blockheaderhashes of the respective tips, return ordering - } -} +The `Chain` type exposes 3 public functions to manipulate chain data structures and one private helper function. + +#### `pub fn push(&mut self, block: Arc) -> Result<(), Error>` + +Push a block into a chain as the new tip if the block is a valid extension of +that chain. + +1. Run contextual validation checks on block against Self +1. Update cummulative data members + - Add block to end of `self.blocks` + - Add hash to `height_by_hash` + - Add new utxos and remove consumed utxos from `self.utxos` + - Add anchors to the appropriate `self._anchors` + - Add nullifiers to the appropriate `self._nullifiers` + - Add work to `self.partial_cumulative_work` + +#### `pub fn pop_root(&mut self) -> Arc` + +Remove the lowest height block of the non-finalized portion of a chain. + +1. Remove the lowest height block from `self.blocks` +1. Update cummulative data members + - Remove the block's hash from `self.height_by_hash` + - Remove new utxos from `self.utxos` + - Remove the anchors from the appropriate `self._anchors` + - Remove the nullifiers from the appropriate `self._nullifiers` +1. Return the block + +**Note**: We do not subtract work from `self.partial_cummulative_work`. This +is to make make the ordering of chains stable while finalizing blocks. 
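The `push` and `pop_root` steps above can be sketched with a cut-down `Chain`. This sketch assumes simplified blocks: placeholder `u64` hashes and nullifiers, and `u128` work. A duplicate-nullifier check stands in for the full contextual validation, and the UTXO, anchor, and transaction caches are omitted.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Simplified block: placeholder hashes, revealed nullifiers, and work.
#[derive(Clone, Debug)]
struct Block {
    hash: u64,
    parent: u64,
    nullifiers: Vec<u64>,
    work: u128,
}

#[derive(Debug, PartialEq)]
enum PushError {
    /// The block's parent is not this chain's tip.
    NotTipChild,
    /// Stand-in contextual check: nullifier already revealed in this chain.
    DuplicateNullifier(u64),
}

/// Cut-down, hypothetical version of the RFC's `Chain`.
struct Chain {
    blocks: BTreeMap<u32, Block>,
    height_by_hash: HashMap<u64, u32>,
    nullifiers: HashSet<u64>,
    partial_cumulative_work: u128,
}

impl Chain {
    /// Create a chain whose lowest block (at `height`) sits on the finalized tip.
    fn new(root: Block, height: u32) -> Self {
        let mut chain = Chain {
            blocks: BTreeMap::new(),
            height_by_hash: HashMap::new(),
            nullifiers: HashSet::new(),
            partial_cumulative_work: 0,
        };
        chain.add_to_caches(root, height);
        chain
    }

    /// Height and hash of the highest block.
    fn tip(&self) -> (u32, u64) {
        let (height, block) = self.blocks.iter().next_back().expect("chains are never empty");
        (*height, block.hash)
    }

    fn add_to_caches(&mut self, block: Block, height: u32) {
        self.height_by_hash.insert(block.hash, height);
        self.nullifiers.extend(block.nullifiers.iter().copied());
        self.partial_cumulative_work += block.work;
        self.blocks.insert(height, block);
    }

    /// Push `block` as the new tip if it is a valid extension of this chain.
    fn push(&mut self, block: Block) -> Result<(), PushError> {
        let (tip_height, tip_hash) = self.tip();
        if block.parent != tip_hash {
            return Err(PushError::NotTipChild);
        }
        if let Some(n) = block.nullifiers.iter().find(|n| self.nullifiers.contains(*n)) {
            return Err(PushError::DuplicateNullifier(*n));
        }
        self.add_to_caches(block, tip_height + 1);
        Ok(())
    }

    /// Remove and return the lowest block. Its work is deliberately not
    /// subtracted, so the chain's ordering stays stable while finalizing.
    fn pop_root(&mut self) -> Block {
        let height = *self.blocks.keys().next().expect("chains are never empty");
        let block = self.blocks.remove(&height).expect("height was just looked up");
        self.height_by_hash.remove(&block.hash);
        for nullifier in &block.nullifiers {
            self.nullifiers.remove(nullifier);
        }
        block
    }
}
```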
+ +#### `pub fn fork(&self, new_tip: block::Hash) -> Option` + +Fork a chain at the block with the given hash, if it is part of this chain. + +1. If `self` does not contain `new_tip` return `None` +2. Clone self as `forked` +3. While the tip of `forked` is not equal to `new_tip` + - call `forked.pop_tip()` and discard the old tip +4. Return `forked` + +#### `fn pop_tip(&mut self) -> Arc` + +Remove the highest height block of the non-finalized portion of a chain. + +1. Remove the highest height block from `self.blocks` +1. Update cummulative data members + - Remove the corresponding hash from `self.height_by_hash` + - Add consumed utxos and remove new utxos from `self.utxos` + - Remove anchors from the appropriate `self._anchors` + - Remove the nullifiers from the appropriate `self._nullifiers` + - Subtract work from `self.partial_cumulative_work` +1. Return the block +#### `Ord` + +The `Chain` type also implements `Ord` for reorganizing chains. First chains +are compared by their `partial_cummulative_work`. Ties are then broken by +comparing `BlockHeaderHashes` of the tips of each chain. + +### `ChainSet` Type +[chainset-type]: #chainset-type + +The `ChainSet` type represents the set of all non-finalized state. It +consists of a set of non-finalized but verified chains and a set of +unverified blocks which are waiting for the full context needed to verify +them to become available. 
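The `fork`, `pop_tip`, and `Ord` descriptions above can be sketched together. This is a simplified model, assuming each block is a placeholder `(hash, work)` pair and that comparing `u64` tip hashes is an acceptable stand-in for comparing block header hashes.

```rust
use std::cmp::Ordering;

/// Hypothetical minimal chain: (hash, work) pairs, lowest height first.
#[derive(Clone, Debug)]
struct Chain {
    blocks: Vec<(u64, u128)>,
    partial_cumulative_work: u128,
}

impl Chain {
    fn tip_hash(&self) -> u64 {
        self.blocks.last().expect("chains are never empty").0
    }

    /// Remove the highest block and subtract its work.
    fn pop_tip(&mut self) -> (u64, u128) {
        let block = self.blocks.pop().expect("chains are never empty");
        self.partial_cumulative_work -= block.1;
        block
    }

    /// Clone this chain and pop tips until `new_tip` is the tip,
    /// or return `None` if `new_tip` is not part of this chain.
    fn fork(&self, new_tip: u64) -> Option<Chain> {
        if !self.blocks.iter().any(|(hash, _)| *hash == new_tip) {
            return None;
        }
        let mut forked = self.clone();
        while forked.tip_hash() != new_tip {
            forked.pop_tip();
        }
        Some(forked)
    }
}

// Order by cumulative work first, breaking ties with the tip hash.
impl Ord for Chain {
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            .then_with(|| self.tip_hash().cmp(&other.tip_hash()))
    }
}

impl PartialOrd for Chain {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Chain {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Chain {}
```

`PartialEq` is defined through `cmp`, so a `BTreeSet<Chain>` treats chains with equal work and equal tips as the same entry. The best chain is then the last element, `chains.iter().next_back()`.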
+ +`ChainState` is defined by the following structure: + +```rust struct ChainSet { chains: BTreeSet, queued_blocks: BTreeMap>>, } - -impl ChainSet { - fn finalize(&mut self) -> Arc { - // move all chains to a temporary vec of chains - // pop root block from the best chain, called `block` hereafter - // add best chain back to `self.chains` - // iterate over the remaining of chains - // if chain starts with `block` remove `block` and re-add too `self.chains` - // if chain doesn't start with `block` drop chain - - // return `block` - } - - fn commit_block(&mut self, block: Arc) -> Result<(), Error> { - // iterate through chains, - // if parent of `block` is tip of any chains, try to push block onto - // that chain - // - // if no chains end in `block`'s parent search for a chain that contains `block` - // if a chain is found, for chain at `block.parent`, try to push `block` onto `fork`, if successful add fork to `self.chains` - // if no chain is found queue block, prune queued blocks that are below the reorg limit - } -} ``` -- **Chain**: `(im::OrdMap>, HashSet, HashSet, HashSet, HashSet, PartialCumulativeWork)` - - Ord impl is ordered by work, tie break using block header hash - - push => add a block to the end of a chain, does contextual verification - checks, extracts info from block for extra data sets - - pop => remove the lowest block, remove references to contents from block in various extra data sets (nullifiers, hashes, etc) - - pop_tip => remove the highest block, remove references to contents in the block - - fork => create a new chain fork based on a given block(hash) within - another chain, clones the original chain and calls pop_tip repeatedly until the given block is the tip -- **ChainSet**: `(BTreeSet, queued_blocks: BTreeMap>>)` - - `fn finalize(&mut self) -> Arc` - - `fn commit_block(&mut self, block: Arc) -> Result<(), Err>` - - iterate through chains, if the parent of `block` is the tip of any - chains, try to push block onto that chain - - if 
not, iterate through chains, checking if the parent is contained in - that chain - - if so, fork the chain at the parent and try to push `block` onto that - fork and add newly extended chain if the push succeeds - - if not, queue block, prune queued blocks that are below the reorg limit - - -- XXX work out whether we should store extra data (e.g., a HashSet of UTXOs - spent by some block etc.) to speed up checks. - -When a new block extends the best chain past 100 blocks, the old root is -removed from the non-finalized state and committed to the finalized state. +And provides the following two public methods for manipulating the +non-finalized state: -## Committing non-finalized blocks +#### `pub fn finalize(&mut self) -> Arc` -If the parent block is not committed, add the block to an internal queue for -future processing. +Finalize the lowest height block in the non-finalized portion of a chain and +updates all side chains to match. -Otherwise, attempt to perform contextual validation checks and the commit the -given block to the state. The exact list of contextual validation checks will -be specified in a later RFC. If contextual validation checks succeed, commit -the new blocks to the non-finalized state as described below. Next, if the -resulting non-finalized chain is longer than 100 blocks, the oldest block is -now past the reorg limit. Remove it from the non-finalized state and commit -it as a finalized block, as described in the next section. +1. Move all chains from `self.chains` into a temporary buffer, a `Vec` for + example`, so they can be mutated. +1. Remove the lowest height block from the best chain with + `let block = best_chain.pop_root();` +1. Add `best_chain` back to `self.chains` +1. For each remaining `chain` + - If `chain` starts with `block`, remove `block` and add `chain` back to + `self.chains` + - Else, drop `chain` +1. Return `block` -XXX: fill in details of non-finalized state. 
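The `finalize` steps above can be sketched over a `BTreeSet` of chains. This sketch assumes:

- Chains are reduced to `VecDeque`s of placeholder `u64` hashes plus their work.
- `finalize` is only called when the best chain is longer than the reorg limit, so the best chain never becomes empty.

```rust
use std::cmp::Ordering;
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical minimal chain: block hashes, lowest height first.
#[derive(Clone, Debug)]
struct Chain {
    blocks: VecDeque<u64>,
    partial_cumulative_work: u128,
}

impl Chain {
    fn root(&self) -> u64 {
        *self.blocks.front().expect("chains are never empty")
    }
    fn tip(&self) -> u64 {
        *self.blocks.back().expect("chains are never empty")
    }
    /// Remove the lowest block without subtracting its work.
    fn pop_root(&mut self) -> u64 {
        self.blocks.pop_front().expect("chains are never empty")
    }
}

// Order by work, then by tip hash.
impl Ord for Chain {
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            .then_with(|| self.tip().cmp(&other.tip()))
    }
}
impl PartialOrd for Chain {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl PartialEq for Chain {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Chain {}

struct ChainSet {
    chains: BTreeSet<Chain>,
}

impl ChainSet {
    /// Finalize the lowest block of the best chain and update side chains to match.
    fn finalize(&mut self) -> u64 {
        // Move every chain out of the set so the chains can be mutated.
        let mut chains: Vec<Chain> = std::mem::take(&mut self.chains).into_iter().collect();
        // The set iterates in ascending order, so the best chain is last.
        let mut best_chain = chains.pop().expect("at least one chain");
        let finalized = best_chain.pop_root();
        self.chains.insert(best_chain);

        for mut chain in chains {
            // Chains sharing the finalized root keep their remaining blocks;
            // the others fork below the new finalized tip and are dropped.
            if chain.root() == finalized {
                chain.pop_root();
                if !chain.blocks.is_empty() {
                    self.chains.insert(chain);
                }
            }
        }
        finalized
    }
}
```

`pop_root` does not subtract work, so each reinserted chain keeps its relative position in the set.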
+### `pub fn commit_block(&mut self, block: Arc) -> Result<(), Error>` + +Try to commit `block` to the non-finalized state. + +1. For each `chain` + - if `block.parent` == `chain.tip` + - try to push `block` onto that chain + - return result of `chain.push(block)` +1. Find the first chain that contains `block.parent` and fork it with + `block.parent` as the new tip + - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));` +1. If `fork` is `Some` + - try to push `block` onto that chain + - return result of `chain.push(block)` +1. Else add `block` to `self.queued_blocks` + +### `pub fn process_queued_blocks(&mut self)` + +XXX: fill out description + +XXX: Do we need to add some channels to the `queued_blocks` for notifying +consumers that their blocks have been processed? + +In Summary: + +- `Chain` represents the non-finalized portion of a single chain +- `ChainSet` represents the non-finalized portion of all chains and all + unverified blocks that are waiting for context to be available. +- `chain_set::commit_block` handles committing or queueing blocks and + reorganizing chains but not finalizing them +- Finalized blocks are returned from `finalize` and must still be committed + to disk afterwards + +## Committing non-finalized blocks -Finally, process any queued children of the newly committed block the same way. +Given the above structures for manipulating the non-finalized state new +`non-finalized` blocks are commited in 3 steps. First we commit the block to +the in memory state, then we finalize the lowest height block if it is past +the reorg limit, finally we process any queued blocks and prune any that are +now past the reorg limit. + +1. Try to commit the block to the non-finalized state with + `chain_set.commit_block(block)?;` +1. 
If the best chain is longer than the reorg limit + - Finalize the lowest height block in the best chain with + `let finalized = chain_set.finalize()?;` + - commit `finalized` to disk with `CommitFinalizedBlock` +1. Process and prune any queued blocks with + `chain_set.process_queued_blocks();` ## Sled data structures [sled]: #sled From 636bcdf2cdc9240c33634e9e2a6f96003c04e1bf Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 11 Sep 2020 17:05:07 -0700 Subject: [PATCH 15/35] update query instructions for each request --- book/src/dev/rfcs/0005-state-updates.md | 35 ++++++++++++++++--------- 1 file changed, 23 insertions(+), 12 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index d224d04c728..057db4195f9 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -179,6 +179,7 @@ structure and API: struct Chain { blocks: BTreeMap>, height_by_hash: HashMap, + tx_by_hash: HashMap, utxos: HashSet, sapling_anchors: HashSet, @@ -189,12 +190,13 @@ struct Chain { } ``` -The `Chain` type consists of a set of blocks, representing the non-finalized +The `Chain` type consists of a set of blocks, containing the non-finalized portion of the chain it represents where the lowest height block's parent is -the tip of the finalized state. All of the other members cache information -contained within that set of blocks for fast lookup. +the tip of the finalized state. All of the other members of `Chain` cache +information contained within that set of blocks for fast lookup. -The `Chain` type exposes 3 public functions to manipulate chain data structures and one private helper function. +The `Chain` type exposes 3 public functions to manipulate chain data +structures and one private helper function. #### `pub fn push(&mut self, block: Arc) -> Result<(), Error>` @@ -205,6 +207,8 @@ that chain. 1. 
Update cummulative data members - Add block to end of `self.blocks` - Add hash to `height_by_hash` + - for each `transaction` in `block` + - add key: `transaction.hash` and value: `(height, tx_index)` to `tx_by_hash` - Add new utxos and remove consumed utxos from `self.utxos` - Add anchors to the appropriate `self._anchors` - Add nullifiers to the appropriate `self._nullifiers` @@ -217,6 +221,8 @@ Remove the lowest height block of the non-finalized portion of a chain. 1. Remove the lowest height block from `self.blocks` 1. Update cummulative data members - Remove the block's hash from `self.height_by_hash` + - for each `transaction` in `block` + - remove `transaction.hash` from `tx_by_hash` - Remove new utxos from `self.utxos` - Remove the anchors from the appropriate `self._anchors` - Remove the nullifiers from the appropriate `self._nullifiers` @@ -239,9 +245,11 @@ Fork a chain at the block with the given hash, if it is part of this chain. Remove the highest height block of the non-finalized portion of a chain. -1. Remove the highest height block from `self.blocks` +1. Remove the highest height `block` from `self.blocks` 1. Update cummulative data members - Remove the corresponding hash from `self.height_by_hash` + - for each `transaction` in `block` + - remove `transaction.hash` from `tx_by_hash` - Add consumed utxos and remove new utxos from `self.utxos` - Remove anchors from the appropriate `self._anchors` - Remove the nullifiers from the appropriate `self._nullifiers` @@ -492,8 +500,8 @@ hash, returning Implemented by querying: -- (non-finalized) XXX parts of the non-finalized state; -- (finalized) the `height_by_hash` tree. +- (non-finalized) the `height_by_hash` map in the best chain +- (finalized) the `height_by_hash` tree ### `Request::Tip` [request-tip]: #request-tip @@ -502,8 +510,8 @@ Returns `Response::Tip(BlockHeaderHash)` with the current best chain tip. 
Implemented by querying: -- (non-finalized) XXX parts of the non-finalized state; -- (finalized) the `hash_by_height` tree. +- (non-finalized) the highest height block in the best chain +- (finalized) the `hash_by_height` tree only if there is no `non-finalized` state ### `Request::BlockLocator` [request-block-locator]: #request-block-locator @@ -519,7 +527,7 @@ blocks. Implemented by querying: -- (non-finalized) XXX parts of the non-finalized state; +- (non-finalized) the `hash_by_height` map in the best chain - (finalized) the `hash_by_height` tree. ### `Request::Transaction(TransactionHash)` @@ -535,7 +543,9 @@ Returns Implemented by querying: -- (non-finalized) XXX parts of the non-finalized state; +- (non-finalized) the `tx_by_hash` map (to get the parent block) of each + chain starting with the best chain, and then find block in `blocks` of that + chain. - (finalized) the `tx_by_hash` (to get the parent block) and then `block_by_height` (to get the transaction data) trees. @@ -552,7 +562,8 @@ Returns Implemented by querying: -- (non-finalized) XXX parts of the non-finalized state; +- (non-finalized) the `height_by_hash` of each chain starting with the best + chain, then find block in `blocks` of that chain. - (finalized) the `height_by_hash` (to get the block height) and then `block_by_height` (to get the block data) trees. From 323a73082d1d9308e1d35c9ab883f77390e83840 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Mon, 14 Sep 2020 18:13:32 -0700 Subject: [PATCH 16/35] more updates --- book/src/dev/rfcs/0005-state-updates.md | 61 +++++++++++++------------ 1 file changed, 33 insertions(+), 28 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 057db4195f9..187bcef352a 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -162,13 +162,6 @@ each rooted at the highest finalized block. Each chain consists of a map from heights to blocks. 
Chains are stored using an ordered map from difficulty to chains, so that the map ordering is the ordering of best to worst chains. -- index queued blocks by height rather than by hash because it lets us - simultaniously limit the number of candidates as well as know when we want - to prune the queue. - -- XXX fill in details on exact types - - ### `Chain` Type [chain-type]: #chain-type @@ -270,22 +263,22 @@ consists of a set of non-finalized but verified chains and a set of unverified blocks which are waiting for the full context needed to verify them to become available. -`ChainState` is defined by the following structure: +`ChainState` is defined by the following structure and API: ```rust struct ChainSet { chains: BTreeSet, - queued_blocks: BTreeMap>>, + + queued_blocks: BTreeMap, + queued_blocks_by_parent: BTreeMap>, + queued_by_height: BTreeMap>, } ``` -And provides the following two public methods for manipulating the -non-finalized state: - #### `pub fn finalize(&mut self) -> Arc` -Finalize the lowest height block in the non-finalized portion of a chain and -updates all side chains to match. +Finalize the lowest height block in the non-finalized portion of the best +chain and updates all side chains to match. 1. Move all chains from `self.chains` into a temporary buffer, a `Vec` for example`, so they can be mutated. @@ -298,36 +291,48 @@ updates all side chains to match. - Else, drop `chain` 1. Return `block` -### `pub fn commit_block(&mut self, block: Arc) -> Result<(), Error>` +### `pub fn queue(&mut self, block: QueuedBlock)` + +Queue a non-finalized block to be committed to the state. -Try to commit `block` to the non-finalized state. +After queueing a non-finalized block, this method checks whether the newly +queued block (and any of its descendants) can be committed to the state + +1. Add `block` to `self.queued_blocks` and related members +1. add `block.hash` to a new list of `pending` blocks to be processed +1. 
while let Some(`hash`) = pending.pop() + - lookup the `block` for `hash` + - if `self.commit_block(block)` returns Some(`result`) + - broadcast `result` via `block.rsp_tx` + - remove `block` from `self.queued_blocks` and related members + - for each `hash` in `self.queued_blocks_by_parent.get(block.hash)` + - add `hash` to `pending` + +### `fn commit_block(&mut self, block: Arc) -> Option>` + +Try to commit `block` to the non-finalized state. Returns `None` if the block +cannot be committed due to missing context. 1. For each `chain` - if `block.parent` == `chain.tip` - try to push `block` onto that chain - - return result of `chain.push(block)` + - return Some(result) where `result` is result of `chain.push(block)` 1. Find the first chain that contains `block.parent` and fork it with `block.parent` as the new tip - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));` 1. If `fork` is `Some` - try to push `block` onto that chain - - return result of `chain.push(block)` -1. Else add `block` to `self.queued_blocks` - -### `pub fn process_queued_blocks(&mut self)` - -XXX: fill out description - -XXX: Do we need to add some channels to the `queued_blocks` for notifying -consumers that their blocks have been processed? + - if successful add `fork` to `self.chains` + - return Some(result) where `result` is result of `fork.push(block)` +1. Else return None In Summary: - `Chain` represents the non-finalized portion of a single chain - `ChainSet` represents the non-finalized portion of all chains and all unverified blocks that are waiting for context to be available. 
-- `chain_set::commit_block` handles committing or queueing blocks and - reorganizing chains but not finalizing them +- `chain_set::queue` handles queueing and or commiting blocks and + reorganizing chains (via `commit_block`) but not finalizing them - Finalized blocks are returned from `finalize` and must still be committed to disk afterwards From d6d63bf2575caf938feaca1eceb78bd8601df9a5 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Tue, 15 Sep 2020 12:44:20 -0700 Subject: [PATCH 17/35] updates from pairing with henry --- book/src/dev/rfcs/0005-state-updates.md | 53 +++++++++++++++---------- 1 file changed, 33 insertions(+), 20 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 187bcef352a..54061ac0129 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -270,8 +270,8 @@ struct ChainSet { chains: BTreeSet, queued_blocks: BTreeMap, - queued_blocks_by_parent: BTreeMap>, - queued_by_height: BTreeMap>, + queued_by_parent: BTreeMap>, + queued_by_height: BTreeMap>, } ``` @@ -280,15 +280,21 @@ struct ChainSet { Finalize the lowest height block in the non-finalized portion of the best chain and updates all side chains to match. -1. Move all chains from `self.chains` into a temporary buffer, a `Vec` for - example`, so they can be mutated. +1. Extract the best chain from `self.chains` into `best_chain` +1. Extract the rest of the chains into a `side_chains` temporary variable, so + they can be mutated 1. Remove the lowest height block from the best chain with `let block = best_chain.pop_root();` 1. Add `best_chain` back to `self.chains` -1. For each remaining `chain` +1. For each remaining `chain` in `side_chains` - If `chain` starts with `block`, remove `block` and add `chain` back to `self.chains` - Else, drop `chain` +1. 
for each `height` in `self.queued_by_height` where the height is lower than the + new reorg limit + - for each `hash` in `self.queued_by_height` at `height` + - Remove the key `hash` from `self.queued_blocks` and store the removed `block` + - Find and remove `hash` from `self.queued_by_parent` using `block.parent`'s hash 1. Return `block` ### `pub fn queue(&mut self, block: QueuedBlock)` @@ -298,17 +304,23 @@ Queue a non-finalized block to be committed to the state. After queueing a non-finalized block, this method checks whether the newly queued block (and any of its descendants) can be committed to the state -1. Add `block` to `self.queued_blocks` and related members -1. add `block.hash` to a new list of `pending` blocks to be processed -1. while let Some(`hash`) = pending.pop() - - lookup the `block` for `hash` - - if `self.commit_block(block)` returns Some(`result`) - - broadcast `result` via `block.rsp_tx` - - remove `block` from `self.queued_blocks` and related members - - for each `hash` in `self.queued_blocks_by_parent.get(block.hash)` - - add `hash` to `pending` +1. Check if the parent block exists in any current chain +1. If it does, call `let ret = self.commit_block(block)` + 1. Call `self.process_queued(new_parents)` if `ret` is `Some` +1. Else Add `block` to `self.queued_blocks` and related members and return + +### `fn process_queued(&mut self, new_parent: block::Hash)` -### `fn commit_block(&mut self, block: Arc) -> Option>` +1. Create a list of `new_parents` and populate it with `new_parent` +1. While let Some(parent) = new_parents.pop() + - for each `hash` in `self.queued_by_parent.remove(&parent.hash)` + - lookup the `block` for `hash` + - remove `block` from `self.queued_blocks` + - remove `hash` from `self.queued_by_height` + - let result = `self.commit_block(block)`; + - add `result` to `new_parents` + +### `fn commit_block(&mut self, block: QueuedBlock) -> Option` Try to commit `block` to the non-finalized state. 
Returns `None` if the block cannot be committed due to missing context. @@ -316,15 +328,18 @@ cannot be committed due to missing context. 1. For each `chain` - if `block.parent` == `chain.tip` - try to push `block` onto that chain - - return Some(result) where `result` is result of `chain.push(block)` + - broadcast `result` via `block.rsp_tx` + - return Some(block.hash) if `result.is_ok()` 1. Find the first chain that contains `block.parent` and fork it with `block.parent` as the new tip - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));` 1. If `fork` is `Some` - try to push `block` onto that chain - if successful add `fork` to `self.chains` - - return Some(result) where `result` is result of `fork.push(block)` -1. Else return None + - broadcast `result` via `block.rsp_tx` + - return Some(block.hash) if `result.is_ok()` +1. Else panic, this should be unreachable because `commit_block` is only + called when it's ready to be committed. In Summary: @@ -350,8 +365,6 @@ now past the reorg limit. - Finalize the lowest height block in the best chain with `let finalized = chain_set.finalize()?;` - commit `finalized` to disk with `CommitFinalizedBlock` -1. 
Process and prune any queued blocks with - `chain_set.process_queued_blocks();` ## Sled data structures [sled]: #sled From f8ea9f1ec7cf6d661fa3db566685c25b7f55a24a Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Tue, 15 Sep 2020 13:56:40 -0700 Subject: [PATCH 18/35] updates from proofreading solo --- book/src/dev/rfcs/0005-state-updates.md | 69 +++++++++++++++++-------- 1 file changed, 48 insertions(+), 21 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 54061ac0129..9f441eef768 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -3,7 +3,8 @@ - Feature Name: state_updates - Start Date: 2020-08-14 - Design PR: https://github.com/ZcashFoundation/zebra/pull/902 -- Zebra Issue: XXX +- Zebra Issue: https://github.com/ZcashFoundation/zebra/issues/1049 + # Summary [summary]: #summary @@ -63,6 +64,8 @@ state service. XXX fill in after writing other details +XXX(jane): I am planning on writing a guide-level explanation of the interface to zebra-state, intended for consumers of the `zebra-state` crate. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -152,7 +155,8 @@ In summary: - **Sled reads** may be done synchronously (in `call`) or asynchronously (in the `Future`), depending on the context; -- **Sled writes** must be done synchronously (in `call`). +- **Sled writes** must be done synchronously (in `call`), which is guaranteed + by the state's external API (a `Buffer`ed `tower::Service`). ## In-memory data structures [in-memory]: #in-memory @@ -165,6 +169,11 @@ chains, so that the map ordering is the ordering of best to worst chains. ### `Chain` Type [chain-type]: #chain-type +The `Chain` type consists of a set of blocks, containing the non-finalized +portion of the chain it represents where the lowest height block's parent is +the tip of the finalized state. 
All of the other members of `Chain` cache +information contained within that set of blocks for fast lookup. + We represent the non-finalized portion of a chain with the following data structure and API: @@ -183,20 +192,13 @@ struct Chain { } ``` -The `Chain` type consists of a set of blocks, containing the non-finalized -portion of the chain it represents where the lowest height block's parent is -the tip of the finalized state. All of the other members of `Chain` cache -information contained within that set of blocks for fast lookup. - -The `Chain` type exposes 3 public functions to manipulate chain data -structures and one private helper function. - #### `pub fn push(&mut self, block: Arc) -> Result<(), Error>` Push a block into a chain as the new tip if the block is a valid extension of that chain. 1. Run contextual validation checks on block against Self + 1. Update cummulative data members - Add block to end of `self.blocks` - Add hash to `height_by_hash` @@ -212,6 +214,7 @@ that chain. Remove the lowest height block of the non-finalized portion of a chain. 1. Remove the lowest height block from `self.blocks` + 1. Update cummulative data members - Remove the block's hash from `self.height_by_hash` - for each `transaction` in `block` @@ -219,6 +222,7 @@ Remove the lowest height block of the non-finalized portion of a chain. - Remove new utxos from `self.utxos` - Remove the anchors from the appropriate `self._anchors` - Remove the nullifiers from the appropriate `self._nullifiers` + 1. Return the block **Note**: We do not subtract work from `self.partial_cummulative_work`. This @@ -229,9 +233,12 @@ is to make make the ordering of chains stable while finalizing blocks. Fork a chain at the block with the given hash, if it is part of this chain. 1. If `self` does not contain `new_tip` return `None` + 2. Clone self as `forked` + 3. While the tip of `forked` is not equal to `new_tip` - call `forked.pop_tip()` and discard the old tip + 4. 
Return `forked` #### `fn pop_tip(&mut self) -> Arc` @@ -239,6 +246,7 @@ Fork a chain at the block with the given hash, if it is part of this chain. Remove the highest height block of the non-finalized portion of a chain. 1. Remove the highest height `block` from `self.blocks` + 1. Update cummulative data members - Remove the corresponding hash from `self.height_by_hash` - for each `transaction` in `block` @@ -247,6 +255,7 @@ Remove the highest height block of the non-finalized portion of a chain. - Remove anchors from the appropriate `self._anchors` - Remove the nullifiers from the appropriate `self._nullifiers` - Subtract work from `self.partial_cumulative_work` + 1. Return the block #### `Ord` @@ -281,20 +290,26 @@ Finalize the lowest height block in the non-finalized portion of the best chain and updates all side chains to match. 1. Extract the best chain from `self.chains` into `best_chain` + 1. Extract the rest of the chains into a `side_chains` temporary variable, so they can be mutated + 1. Remove the lowest height block from the best chain with `let block = best_chain.pop_root();` + 1. Add `best_chain` back to `self.chains` + 1. For each remaining `chain` in `side_chains` - If `chain` starts with `block`, remove `block` and add `chain` back to `self.chains` - Else, drop `chain` + 1. for each `height` in `self.queued_by_height` where the height is lower than the new reorg limit - for each `hash` in `self.queued_by_height` at `height` - Remove the key `hash` from `self.queued_blocks` and store the removed `block` - Find and remove `hash` from `self.queued_by_parent` using `block.parent`'s hash + 1. Return `block` ### `pub fn queue(&mut self, block: QueuedBlock)` @@ -305,13 +320,16 @@ After queueing a non-finalized block, this method checks whether the newly queued block (and any of its descendants) can be committed to the state 1. Check if the parent block exists in any current chain + 1. If it does, call `let ret = self.commit_block(block)` - 1. 
Call `self.process_queued(new_parents)` if `ret` is `Some` + - Call `self.process_queued(new_parents)` if `ret` is `Some` + 1. Else Add `block` to `self.queued_blocks` and related members and return ### `fn process_queued(&mut self, new_parent: block::Hash)` 1. Create a list of `new_parents` and populate it with `new_parent` + 1. While let Some(parent) = new_parents.pop() - for each `hash` in `self.queued_by_parent.remove(&parent.hash)` - lookup the `block` for `hash` @@ -330,39 +348,48 @@ cannot be committed due to missing context. - try to push `block` onto that chain - broadcast `result` via `block.rsp_tx` - return Some(block.hash) if `result.is_ok()` + 1. Find the first chain that contains `block.parent` and fork it with `block.parent` as the new tip - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));` + 1. If `fork` is `Some` - try to push `block` onto that chain - if successful add `fork` to `self.chains` - broadcast `result` via `block.rsp_tx` - return Some(block.hash) if `result.is_ok()` + 1. Else panic, this should be unreachable because `commit_block` is only called when it's ready to be committed. In Summary: - `Chain` represents the non-finalized portion of a single chain + - `ChainSet` represents the non-finalized portion of all chains and all unverified blocks that are waiting for context to be available. + - `chain_set::queue` handles queueing and or commiting blocks and reorganizing chains (via `commit_block`) but not finalizing them + - Finalized blocks are returned from `finalize` and must still be committed to disk afterwards +- `finalize` handles pruning queued blocks that are past the reorg limit + ## Committing non-finalized blocks Given the above structures for manipulating the non-finalized state new -`non-finalized` blocks are commited in 3 steps. 
First we commit the block to -the in memory state, then we finalize the lowest height block if it is past -the reorg limit, finally we process any queued blocks and prune any that are -now past the reorg limit. +`non-finalized` blocks are commited in two steps. First we commit the block +to the in memory state, then we finalize the lowest height block if it is +past the reorg limit, finally we process any queued blocks and prune any that +are now past the reorg limit. + +1. Try to commit or queue the block to the non-finalized state with + `chain_set.queue(block)?;` -1. Try to commit the block to the non-finalized state with - `chain_set.commit_block(block)?;` 1. If the best chain is longer than the reorg limit - - Finalize the lowest height block in the best chain with + - Finalize the lowest height block in the best chain with `let finalized = chain_set.finalize()?;` - commit `finalized` to disk with `CommitFinalizedBlock` @@ -423,9 +450,9 @@ which should: 1. Obtain the highest entry of `hash_by_height` as `(old_height, old_tip)`. Check that `block`'s parent hash is `old_tip` and its height is -`old_height+1`, or panic. This check is performed as defense-in-depth -to prevent database corruption, but it is the caller's responsibility to -commit finalized blocks in order. +`old_height+1`, or panic. This check is performed as defense-in-depth to +prevent database corruption, but it is the caller's responsibility (e.g. the +zebra-state service's responsibility) to commit finalized blocks in order. 2. 
Insert: - `(hash, height)` into `height_by_hash`; From f27d44edb4b7c64a76a4cccb58d50b91c37f84a9 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Tue, 15 Sep 2020 14:43:14 -0700 Subject: [PATCH 19/35] add guide level explanation to state rfc --- book/src/dev/rfcs/0005-state-updates.md | 65 ++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 2 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 9f441eef768..c83c84dc952 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -62,9 +62,70 @@ state service. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -XXX fill in after writing other details +The `zebra-state` crate provides an implementation of the chain state storage +logic in a zcash consensus node. Its main responsibility is to store chain +state, validating new blocks against the existing chain state in the process, +and to allow later querying of said chain state. `zebra-state` provides this +interface via a `tower::Service` based on the actor model with a +request/response interface for passing messages back and forth between the +state service and the rest of the application. + +The main entry point for the `zebra-state` crate is the `init` function. This +function takes a `zebra_state::Config` and constructs a new state service, +which it returns wrapped by a tower::Buffer. This service is then interacted +with via the `tower::Service` trait. -XXX(jane): I am planning on writing a guide-level explanation of the interface to zebra-state, intended for consumers of the `zebra-state` crate. 
+```rust +use tower::{Service, ServiceExt}; + +let config = app_config(); +let state_config = config.state; +let network = config.network; + +let state = zebra_state::on_disk::init(state_config, network); +let request = zebra_state::Request::GetBlockLocator { genesis: genesis_hash }; +let response = state.ready_and().await?.call(request).await?; + +assert!(matches!(response, zebra_state::Response::BlockLocator(_))); +``` + +**Note**: The `tower::Service` API requires that `ready` is always called +exactly once before each `call`. It is up to users of the zebra state service +to uphold this contract. + +The service itself is clonable. When cloned it only clones the buffered +interface, and not the wrapped service, providing shared access to a common +chain state across multithreaded applications. + +The set of operations supported by `zebra-state` are encoded in its `Request` +enum. This enum has one variant for each supported operation. + +```rust +pub enum Request { + CommitBlock { + block: Arc, + }, + CommitFinalizedBlock { + block: Arc, + }, + Depth(Hash), + Tip, + BlockLocator, + Transaction(Hash), + Block(HashOrHeight), + + // .. some variants omitted +} +``` + +`zebra-state` breaks down its requests into two categories and provides +different guarantees for category. Those that modify the state and those that +do not. Requests that update the state are guaranteed to run sequentially and +will never race against each other. Requests that read state are done +asynchronously and are guaranteed to read at least the state present at the +time the request was processed, or a later state. The state service avoids +race conditions between the read state and the written state by doing all +contextual verification internally. 
# Reference-level explanation [reference-level-explanation]: #reference-level-explanation From ad46826171c2650f406aa71d03dc63ee8da89027 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Tue, 15 Sep 2020 14:50:51 -0700 Subject: [PATCH 20/35] add drawbacks section --- book/src/dev/rfcs/0005-state-updates.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index c83c84dc952..377a2bf7bd0 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -677,14 +677,15 @@ Implemented by querying: # Drawbacks [drawbacks]: #drawbacks -# Rationale and alternatives -[rationale-and-alternatives]: #rationale-and-alternatives +- Restarts can cause `zebrad` to redownload up to the last one hundred blocks + it verified. -# Prior art -[prior-art]: #prior-art +- The service interface puts some extra responsibility on callers to ensure + it is used correctly and does not verify the usage is correct at compile + time. -# Unresolved questions -[unresolved-questions]: #unresolved-questions +- the service API is verbose and requires manually unwrapping enums -# Future possibilities -[future-possibilities]: #future-possibilities +- We do not handle reorgs the same way zcashd does, and could in theory need + to delete our entire on disk state and resync the chain in some + pathological reorg cases. 
\ No newline at end of file From 28e4bbe11e71063aafabfe9f421c2153579daa9a Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Wed, 16 Sep 2020 11:49:52 -0700 Subject: [PATCH 21/35] Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: Henry de Valence --- book/src/dev/rfcs/0005-state-updates.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 377a2bf7bd0..ff6705016dc 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -83,7 +83,7 @@ let state_config = config.state; let network = config.network; let state = zebra_state::on_disk::init(state_config, network); -let request = zebra_state::Request::GetBlockLocator { genesis: genesis_hash }; +let request = zebra_state::Request::BlockLocator; let response = state.ready_and().await?.call(request).await?; assert!(matches!(response, zebra_state::Response::BlockLocator(_))); @@ -688,4 +688,4 @@ Implemented by querying: - We do not handle reorgs the same way zcashd does, and could in theory need to delete our entire on disk state and resync the chain in some - pathological reorg cases. \ No newline at end of file + pathological reorg cases. From d64d951234e82d27e69b106bfd36abb2bd205844 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Wed, 16 Sep 2020 11:50:58 -0700 Subject: [PATCH 22/35] Apply suggestions from code review Co-authored-by: Henry de Valence --- book/src/dev/rfcs/0005-state-updates.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index ff6705016dc..98a14a858a3 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -72,7 +72,7 @@ state service and the rest of the application. The main entry point for the `zebra-state` crate is the `init` function. 
This function takes a `zebra_state::Config` and constructs a new state service, -which it returns wrapped by a tower::Buffer. This service is then interacted +which it returns wrapped by a `tower::Buffer`. This service is then interacted with via the `tower::Service` trait. ```rust @@ -93,9 +93,7 @@ assert!(matches!(response, zebra_state::Response::BlockLocator(_))); exactly once before each `call`. It is up to users of the zebra state service to uphold this contract. -The service itself is clonable. When cloned it only clones the buffered -interface, and not the wrapped service, providing shared access to a common -chain state across multithreaded applications. +The `tower::Buffer` wrapper is `Clone`able, allowing shared access to a common state service. This allows different tasks to share access to the chain state. The set of operations supported by `zebra-state` are encoded in its `Request` enum. This enum has one variant for each supported operation. @@ -227,7 +225,7 @@ each rooted at the highest finalized block. Each chain consists of a map from heights to blocks. Chains are stored using an ordered map from difficulty to chains, so that the map ordering is the ordering of best to worst chains. -### `Chain` Type +### The `Chain` type [chain-type]: #chain-type The `Chain` type consists of a set of blocks, containing the non-finalized From 821d0b1aff369d798813905f058ddf075973d1ac Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Wed, 16 Sep 2020 11:51:26 -0700 Subject: [PATCH 23/35] Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: Henry de Valence --- book/src/dev/rfcs/0005-state-updates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 98a14a858a3..71b0f2e3a7b 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -421,7 +421,7 @@ cannot be committed due to missing context. 1. 
Else panic, this should be unreachable because `commit_block` is only called when it's ready to be committed. -In Summary: +### Summary - `Chain` represents the non-finalized portion of a single chain From eb15ba5cf918d9766db6c6b30be5591dee37a932 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Wed, 16 Sep 2020 15:38:11 -0700 Subject: [PATCH 24/35] apply changes from code review --- book/src/dev/rfcs/0005-state-updates.md | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 71b0f2e3a7b..60b4951088b 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -78,10 +78,6 @@ with via the `tower::Service` trait. ```rust use tower::{Service, ServiceExt}; -let config = app_config(); -let state_config = config.state; -let network = config.network; - let state = zebra_state::on_disk::init(state_config, network); let request = zebra_state::Request::BlockLocator; let response = state.ready_and().await?.call(request).await?; @@ -209,13 +205,12 @@ synchronous or asynchronous, we ensure that writes cannot race each other. Asynchronous reads are guaranteed to read at least the state present at the time the request was processed, or a later state. -In summary: +### Summary - **Sled reads** may be done synchronously (in `call`) or asynchronously (in the `Future`), depending on the context; -- **Sled writes** must be done synchronously (in `call`), which is guaranteed - by the state's external API (a `Buffer`ed `tower::Service`). 
+- **Sled writes** must be done synchronously (in `call`) ## In-memory data structures [in-memory]: #in-memory From eafe22a82c79de7b92f40f68d0786bcfa4d14068 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Wed, 16 Sep 2020 16:58:58 -0700 Subject: [PATCH 25/35] clarify iteration --- book/src/dev/rfcs/0005-state-updates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 60b4951088b..db2999e1d3f 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -360,7 +360,7 @@ chain and updates all side chains to match. 1. for each `height` in `self.queued_by_height` where the height is lower than the new reorg limit - - for each `hash` in `self.queued_by_height` at `height` + - for each `hash` in `self.queued_by_height.remove(height)` - Remove the key `hash` from `self.queued_blocks` and store the removed `block` - Find and remove `hash` from `self.queued_by_parent` using `block.parent`'s hash From 69d106bee55e866f9f448883308bfdd1a52bf499 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Wed, 16 Sep 2020 17:15:14 -0700 Subject: [PATCH 26/35] Apply suggestions from code review Co-authored-by: teor --- book/src/dev/rfcs/0005-state-updates.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index db2999e1d3f..a3b19703357 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -117,7 +117,7 @@ different guarantees for category. Those that modify the state and those that do not. Requests that update the state are guaranteed to run sequentially and will never race against each other. Requests that read state are done asynchronously and are guaranteed to read at least the state present at the -time the request was processed, or a later state. 
The state service avoids +time the request was processed by the service, or a later state present at the time the request future is executed. The state service avoids race conditions between the read state and the written state by doing all contextual verification internally. @@ -280,7 +280,7 @@ Remove the lowest height block of the non-finalized portion of a chain. 1. Return the block **Note**: We do not subtract work from `self.partial_cummulative_work`. This -is to make make the ordering of chains stable while finalizing blocks. +is to make sure that the ordering of chains is stable while finalizing blocks. #### `pub fn fork(&self, new_tip: block::Hash) -> Option` @@ -316,8 +316,9 @@ Remove the highest height block of the non-finalized portion of a chain. The `Chain` type also implements `Ord` for reorganizing chains. First chains are compared by their `partial_cummulative_work`. Ties are then broken by -comparing `BlockHeaderHashes` of the tips of each chain. +comparing `block::Hash`es of the tips of each chain. +**Note**: Unlike `zcashd`, Zebra does not use block arrival times as a tie-breaker for the best tip. Since Zebra downloads blocks in parallel, download times are not guaranteed to be unique. Using the `block::Hash` provides a consistent tip order. (As a side-effect, the tip order is also consistent after a node restart, and between nodes.) ### `ChainSet` Type [chainset-type]: #chainset-type @@ -326,7 +327,7 @@ consists of a set of non-finalized but verified chains and a set of unverified blocks which are waiting for the full context needed to verify them to become available. -`ChainState` is defined by the following structure and API: +`ChainSet` is defined by the following structure and API: ```rust struct ChainSet { @@ -423,7 +424,7 @@ cannot be committed due to missing context. - `ChainSet` represents the non-finalized portion of all chains and all unverified blocks that are waiting for context to be available. 
-- `chain_set::queue` handles queueing and or commiting blocks and +- `ChainSet::queue` handles queueing and or commiting blocks and reorganizing chains (via `commit_block`) but not finalizing them - Finalized blocks are returned from `finalize` and must still be committed @@ -610,7 +611,7 @@ Returns `Response::Tip(BlockHeaderHash)` with the current best chain tip. Implemented by querying: - (non-finalized) the highest height block in the best chain -- (finalized) the `hash_by_height` tree only if there is no `non-finalized` state +- (finalized) the highest height block in the `hash_by_height` tree only if there is no `non-finalized` state ### `Request::BlockLocator` [request-block-locator]: #request-block-locator @@ -642,10 +643,10 @@ Returns Implemented by querying: -- (non-finalized) the `tx_by_hash` map (to get the parent block) of each +- (non-finalized) the `tx_by_hash` map (to get the block that contains the transaction) of each chain starting with the best chain, and then find block in `blocks` of that chain. -- (finalized) the `tx_by_hash` (to get the parent block) and then +- (finalized) the `tx_by_hash` (to get the block that contains the transaction) and then `block_by_height` (to get the transaction data) trees. ### `Request::Block(BlockHeaderHash)` @@ -671,7 +672,7 @@ Implemented by querying: [drawbacks]: #drawbacks - Restarts can cause `zebrad` to redownload up to the last one hundred blocks - it verified. + it verified in the best chain, and potentially some recent side-chain blocks. - The service interface puts some extra responsibility on callers to ensure it is used correctly and does not verify the usage is correct at compile @@ -682,3 +683,4 @@ Implemented by querying: - We do not handle reorgs the same way zcashd does, and could in theory need to delete our entire on disk state and resync the chain in some pathological reorg cases. +- testnet rollbacks are infrequent, but possible: each rollback will require additional state service code. 
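A minimal Rust sketch of the chain ordering described in this patch: chains compare first by `partial_cumulative_work`, and ties are broken by the tip hash. `BlockHash`, the reduced `Chain` struct, and `best_chain` are hypothetical stand-ins, not Zebra's real types. The RFC does not say which hash direction wins a tie, so "greater hash wins" is an assumption.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// Hypothetical stand-in for `block::Hash`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct BlockHash(pub [u8; 32]);

/// Simplified `Chain` keeping only the fields the ordering needs.
#[derive(Clone, Debug)]
pub struct Chain {
    /// Non-finalized block hashes, lowest height first.
    pub blocks: Vec<BlockHash>,
    /// Work contributed by the non-finalized blocks only.
    pub partial_cumulative_work: u128,
}

impl Chain {
    /// Hash of the highest block, or `None` for an empty chain.
    pub fn tip(&self) -> Option<BlockHash> {
        self.blocks.last().copied()
    }
}

impl Ord for Chain {
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            // Tie-break on the tip hash (assumed: greater hash wins), so the
            // order never depends on block download or arrival times.
            .then_with(|| self.tip().cmp(&other.tip()))
    }
}

impl PartialOrd for Chain {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Equality is kept consistent with `Ord`: same work and same tip.
impl PartialEq for Chain {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Chain {}

/// With this ordering, the best chain is the greatest element of the set.
pub fn best_chain(chains: &BTreeSet<Chain>) -> Option<&Chain> {
    chains.iter().next_back()
}
```

Because the tie-breaker depends only on block data, two nodes holding the same chains choose the same best tip, and so does a single node after a restart.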
From 077c1d9fee46e783b50da58caf590b772ecce959 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Thu, 17 Sep 2020 14:06:44 -0700 Subject: [PATCH 27/35] apply changes from code review --- book/src/dev/rfcs/0005-state-updates.md | 57 +++++++++++++++---------- 1 file changed, 34 insertions(+), 23 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index a3b19703357..cdf8d6419ac 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -223,13 +223,24 @@ chains, so that the map ordering is the ordering of best to worst chains. ### The `Chain` type [chain-type]: #chain-type -The `Chain` type consists of a set of blocks, containing the non-finalized -portion of the chain it represents where the lowest height block's parent is -the tip of the finalized state. All of the other members of `Chain` cache -information contained within that set of blocks for fast lookup. -We represent the non-finalized portion of a chain with the following data -structure and API: +The `Chain` type represents a chain of blocks. Each block represents an +incremental state update, and the `Chain` type caches the cumulative state +update from its root to its tip. + +The `Chain` type is used to represent the non-finalized portion of a complete +chain of blocks rooted at the genesis block. The parent block of the root of +a `Chain` is the tip of the finalized portion of the chain. + +The `Chain` type supports serveral operations to manipulate chains, `push`, +`pop_root`, and `fork`. `push` is the most fundamental operation and handles +contextual validation of chains as they are extended. `pop_root` is provided +for finalization, and is how we move blocks from the non-finalized portion of +the state to the finalized portion. `fork` on the other hand handles creating +new chains for `push` when new blocks arrive whose parent isn't a tip of an +existing chain. 
+ +The `Chain` type is defined by the following struct and API: ```rust struct Chain { @@ -253,7 +264,7 @@ that chain. 1. Run contextual validation checks on block against Self -1. Update cummulative data members +1. Update cumulative data members - Add block to end of `self.blocks` - Add hash to `height_by_hash` - for each `transaction` in `block` @@ -269,7 +280,7 @@ Remove the lowest height block of the non-finalized portion of a chain. 1. Remove the lowest height block from `self.blocks` -1. Update cummulative data members +1. Update cumulative data members - Remove the block's hash from `self.height_by_hash` - for each `transaction` in `block` - remove `transaction.hash` from `tx_by_hash` @@ -301,7 +312,7 @@ Remove the highest height block of the non-finalized portion of a chain. 1. Remove the highest height `block` from `self.blocks` -1. Update cummulative data members +2. Update cumulative data members - Remove the corresponding hash from `self.height_by_hash` - for each `transaction` in `block` - remove `transaction.hash` from `tx_by_hash` @@ -310,7 +321,7 @@ Remove the highest height block of the non-finalized portion of a chain. - Remove the nullifiers from the appropriate `self._nullifiers` - Subtract work from `self.partial_cumulative_work` -1. Return the block +3. Return the block #### `Ord` @@ -346,26 +357,26 @@ chain and updates all side chains to match. 1. Extract the best chain from `self.chains` into `best_chain` -1. Extract the rest of the chains into a `side_chains` temporary variable, so +2. Extract the rest of the chains into a `side_chains` temporary variable, so they can be mutated -1. Remove the lowest height block from the best chain with +3. Remove the lowest height block from the best chain with `let block = best_chain.pop_root();` -1. Add `best_chain` back to `self.chains` +4. Add `best_chain` back to `self.chains` -1. For each remaining `chain` in `side_chains` +5. 
For each remaining `chain` in `side_chains` - If `chain` starts with `block`, remove `block` and add `chain` back to `self.chains` - Else, drop `chain` -1. for each `height` in `self.queued_by_height` where the height is lower than the +6. for each `height` in `self.queued_by_height` where the height is lower than the new reorg limit - for each `hash` in `self.queued_by_height.remove(height)` - Remove the key `hash` from `self.queued_blocks` and store the removed `block` - Find and remove `hash` from `self.queued_by_parent` using `block.parent`'s hash -1. Return `block` +7. Return `block` ### `pub fn queue(&mut self, block: QueuedBlock)` @@ -376,16 +387,16 @@ queued block (and any of its descendants) can be committed to the state 1. Check if the parent block exists in any current chain -1. If it does, call `let ret = self.commit_block(block)` +2. If it does, call `let ret = self.commit_block(block)` - Call `self.process_queued(new_parents)` if `ret` is `Some` -1. Else Add `block` to `self.queued_blocks` and related members and return +3. Else Add `block` to `self.queued_blocks` and related members and return ### `fn process_queued(&mut self, new_parent: block::Hash)` 1. Create a list of `new_parents` and populate it with `new_parent` -1. While let Some(parent) = new_parents.pop() +2. While let Some(parent) = new_parents.pop() - for each `hash` in `self.queued_by_parent.remove(&parent.hash)` - lookup the `block` for `hash` - remove `block` from `self.queued_blocks` @@ -404,17 +415,17 @@ cannot be committed due to missing context. - broadcast `result` via `block.rsp_tx` - return Some(block.hash) if `result.is_ok()` -1. Find the first chain that contains `block.parent` and fork it with +2. Find the first chain that contains `block.parent` and fork it with `block.parent` as the new tip - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));` -1. If `fork` is `Some` +3. 
If `fork` is `Some` - try to push `block` onto that chain - if successful add `fork` to `self.chains` - broadcast `result` via `block.rsp_tx` - return Some(block.hash) if `result.is_ok()` -1. Else panic, this should be unreachable because `commit_block` is only +4. Else panic, this should be unreachable because `commit_block` is only called when it's ready to be committed. ### Summary @@ -443,7 +454,7 @@ are now past the reorg limit. 1. Try to commit or queue the block to the non-finalized state with `chain_set.queue(block)?;` -1. If the best chain is longer than the reorg limit +2. If the best chain is longer than the reorg limit - Finalize the lowest height block in the best chain with `let finalized = chain_set.finalize()?;` - commit `finalized` to disk with `CommitFinalizedBlock` From 1f9403a4f2315096c5f28c27de2ef23becf9107d Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Thu, 17 Sep 2020 18:24:32 -0700 Subject: [PATCH 28/35] Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: teor --- book/src/dev/rfcs/0005-state-updates.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index cdf8d6419ac..975c896de60 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -113,7 +113,7 @@ pub enum Request { ``` `zebra-state` breaks down its requests into two categories and provides -different guarantees for category. Those that modify the state and those that +different guarantees for each category: requests that modify the state, and requests that do not. Requests that update the state are guaranteed to run sequentially and will never race against each other. 
Requests that read state are done asynchronously and are guaranteed to read at least the state present at the From 04a97a316739ae6fae04ec69b5354ad08cc36784 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Thu, 17 Sep 2020 18:50:13 -0700 Subject: [PATCH 29/35] Apply suggestions from code review Co-authored-by: teor --- book/src/dev/rfcs/0005-state-updates.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 975c896de60..464f23c74c0 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -133,7 +133,9 @@ orphaned (no longer included in the best chain). Their state updates are therefore no longer included in the best chain's chain state. The process of rolling back orphaned blocks and applying new blocks is called a chain reorganization. Bitcoin allows chain reorganizations of arbitrary depth, -while `zcashd` limits reorganizations to 100 blocks. +while `zcashd` limits chain reorganizations to 100 blocks. (In `zcashd`, the +new best chain must be a side-chain that forked within 100 blocks of the tip +of the current best chain.) This difference means that in Bitcoin, chain state only has probabilistic finality, while in Zcash, chain state is final once it is beyond the reorg @@ -217,7 +219,7 @@ time the request was processed, or a later state. At a high level, the in-memory data structures store a collection of chains, each rooted at the highest finalized block. Each chain consists of a map from -heights to blocks. Chains are stored using an ordered map from difficulty to +heights to blocks. Chains are stored using an ordered map from cumulative work to chains, so that the map ordering is the ordering of best to worst chains. ### The `Chain` type @@ -230,7 +232,8 @@ update from its root to its tip. 
The `Chain` type is used to represent the non-finalized portion of a complete chain of blocks rooted at the genesis block. The parent block of the root of -a `Chain` is the tip of the finalized portion of the chain. +a `Chain` is the tip of the finalized portion of the chain. As an exception, the finalized +portion of the chain is initially empty, until the genesis block has been finalized. The `Chain` type supports serveral operations to manipulate chains, `push`, `pop_root`, and `fork`. `push` is the most fundamental operation and handles @@ -262,7 +265,8 @@ struct Chain { Push a block into a chain as the new tip if the block is a valid extension of that chain. -1. Run contextual validation checks on block against Self +1. Run contextual validation checks on block against self and the finalized state + - genesis blocks have no context - the CheckpointVerifier performs the necessary `zero height` and `null parent hash` checks 1. Update cumulative data members - Add block to end of `self.blocks` @@ -447,7 +451,7 @@ cannot be committed due to missing context. Given the above structures for manipulating the non-finalized state new `non-finalized` blocks are commited in two steps. First we commit the block -to the in memory state, then we finalize the lowest height block if it is +to the in memory state, then we finalize all lowest height blocks that are past the reorg limit, finally we process any queued blocks and prune any that are now past the reorg limit. 
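The finalization steps in this patch can be sketched in Rust as follows. The best chain's root is popped repeatedly while the best chain is longer than the reorg limit. Side chains that share that root keep the rest of their blocks, and side chains that forked below it are dropped. The `u64` hashes, the reduced `Chain` and `ChainSet` structs, and `finalize_past_limit` are hypothetical simplifications. Production code would pass a limit of 100 and send each returned block to `CommitFinalizedBlock`.

```rust
use std::collections::VecDeque;

/// Hypothetical simplified block hash.
type BlockHash = u64;

/// Simplified non-finalized chain: block hashes, lowest height (root) first.
struct Chain {
    blocks: VecDeque<BlockHash>,
    partial_cumulative_work: u64,
}

struct ChainSet {
    chains: Vec<Chain>,
}

impl ChainSet {
    /// Index of the best chain: most work, ties broken by tip hash.
    fn best_index(&self) -> Option<usize> {
        self.chains
            .iter()
            .enumerate()
            .max_by_key(|(_, c)| (c.partial_cumulative_work, c.blocks.back().copied()))
            .map(|(i, _)| i)
    }

    fn best_len(&self) -> usize {
        self.best_index().map_or(0, |i| self.chains[i].blocks.len())
    }

    /// Pop the root of the best chain, and prune side chains that no longer
    /// connect to the new finalized tip.
    fn finalize(&mut self) -> Option<BlockHash> {
        let best = self.best_index()?;
        let root = self.chains[best].blocks.pop_front()?;
        let mut kept = Vec::new();
        for (i, mut chain) in std::mem::take(&mut self.chains).into_iter().enumerate() {
            if i == best {
                kept.push(chain);
            } else if chain.blocks.front() == Some(&root) {
                // This side chain also contains the finalized block: drop it here too.
                chain.blocks.pop_front();
                kept.push(chain);
            }
            // Otherwise the side chain forked below the new finalized tip and is dropped.
        }
        self.chains = kept;
        Some(root)
    }
}

/// Finalize until the best chain is no longer than `reorg_limit`, returning the
/// finalized hashes in height order.
fn finalize_past_limit(set: &mut ChainSet, reorg_limit: usize) -> Vec<BlockHash> {
    let mut finalized = Vec::new();
    while set.best_len() > reorg_limit {
        match set.finalize() {
            Some(hash) => finalized.push(hash),
            None => break,
        }
    }
    finalized
}
```

Finalizing in a loop, rather than one block per commit, lets the best chain shrink back to the limit after a large batch of blocks arrives at once.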
From a19970fd31829988bdd8ea75a7b6533fc034d72a Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 18 Sep 2020 12:39:16 -0700 Subject: [PATCH 30/35] Apply suggestions from code review Co-authored-by: teor --- book/src/dev/rfcs/0005-state-updates.md | 53 +++++++++++++++++++------ 1 file changed, 40 insertions(+), 13 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 464f23c74c0..71bfd14c5e5 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -45,6 +45,8 @@ state service. represents the consensus state of the Zcash network and transactions. * **side chain**: A chain which is not contained in the best chain. + Side chains are pruned at the reorg limit, when they are no longer + connected to the finalized state. * **chain reorganization**: Occurs when a new best chain is found and the previous best chain becomes a side chain. @@ -59,6 +61,14 @@ state service. * **finalized state**: State data corresponding to blocks below the reorg limit. This data cannot change in the event of a chain reorg. +* **non-finalized tips**: The highest blocks in each non-finalized chain. These + tips might be at different heights. + +* **finalized tip**: The highest block in the finalized state. The tip of the best + chain is usually 100 blocks (the reorg limit) above the finalized tip. But it can + be lower during the initial sync, and after a chain reorganization, if the new + best chain is at a lower height. + # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -330,7 +340,7 @@ Remove the highest height block of the non-finalized portion of a chain. #### `Ord` The `Chain` type also implements `Ord` for reorganizing chains. First chains -are compared by their `partial_cummulative_work`. Ties are then broken by +are compared by their `partial_cumulative_work`. Ties are then broken by comparing `block::Hash`es of the tips of each chain. 
**Note**: Unlike `zcashd`, Zebra does not use block arrival times as a tie-breaker for the best tip. Since Zebra downloads blocks in parallel, download times are not guaranteed to be unique. Using the `block::Hash` provides a consistent tip order. (As a side-effect, the tip order is also consistent after a node restart, and between nodes.) @@ -374,6 +384,8 @@ chain and updates all side chains to match. `self.chains` - Else, drop `chain` +5. calculate the new finalized tip height from the new `best_chain` + 6. for each `height` in `self.queued_by_height` where the height is lower than the new reorg limit - for each `hash` in `self.queued_by_height.remove(height)` @@ -413,8 +425,7 @@ queued block (and any of its descendants) can be committed to the state Try to commit `block` to the non-finalized state. Returns `None` if the block cannot be committed due to missing context. -1. For each `chain` - - if `block.parent` == `chain.tip` +1. Search for the first chain where `block.parent` == `chain.tip`. If it exists: - try to push `block` onto that chain - broadcast `result` via `block.rsp_tx` - return Some(block.hash) if `result.is_ok()` @@ -459,9 +470,14 @@ are now past the reorg limit. `chain_set.queue(block)?;` 2. If the best chain is longer than the reorg limit - - Finalize the lowest height block in the best chain with - `let finalized = chain_set.finalize()?;` - - commit `finalized` to disk with `CommitFinalizedBlock` + - Finalize all lowest height blocks in the best chain, and commit them to disk with `CommitFinalizedBlock`: + ``` + while self.best_chain().len() > reorg_limit { + let finalized = chain_set.finalize()?; + let request = CommitFinalizedBlock { finalized }; + sled_state.ready_and().await?.call(request).await?; + }; + ``` ## Sled data structures [sled]: #sled @@ -487,6 +503,8 @@ We use the following Sled trees: Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`. 
+**Note:** We do not store the cumulative work for the finalized chain, because the finalized work is equal for all non-finalized chains. So the additional non-finalized work can be used to calculate the relative chain order, and choose the best chain. + ### Notes on Sled trees - The `hash_by_height` and `height_by_hash` trees provide the bijection between @@ -524,11 +542,19 @@ Check that `block`'s parent hash is `old_tip` and its height is prevent database corruption, but it is the caller's responsibility (e.g. the zebra-state service's responsibility) to commit finalized blocks in order. +The genesis block does not have a parent block. For genesis blocks, +check that `block`'s parent hash is `null` (all zeroes) and its height is `0`. + 2. Insert: - `(hash, height)` into `height_by_hash`; - `(height, hash)` into `hash_by_height`; - `(height, block)` into `block_by_height`. +3. If the block is a genesis block, skip any transaction updates. + +(Due to a [bug in zcashd](https://github.com/ZcashFoundation/zebra/issues/559), genesis block transactions +are ignored during validation.) + 3. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout and Sapling anchors (XXX: how??) @@ -601,7 +627,7 @@ CommitFinalizedBlock { Commits a finalized block to the sled state, skipping contextual validation. This is exposed for use in checkpointing, which produces in-order finalized -blocks. Returns `Response::Added(BlockHeaderHash)` with the hash of the +blocks. Returns `Response::Added(block::Hash)` with the hash of the committed block if successful. ### `Request::Depth(block::Hash)` @@ -610,31 +636,32 @@ committed block if successful. Computes the depth in the best chain of the block identified by the given hash, returning -- `Response::Depth(Some(depth))` if the block is in the main chain; +- `Response::Depth(Some(depth))` if the block is in the best chain; - `Response::Depth(None)` otherwise. 
Implemented by querying: -- (non-finalized) the `height_by_hash` map in the best chain +- (non-finalized) the `height_by_hash` map in the best chain, and - (finalized) the `height_by_hash` tree ### `Request::Tip` [request-tip]: #request-tip -Returns `Response::Tip(BlockHeaderHash)` with the current best chain tip. +Returns `Response::Tip(block::Hash)` with the current best chain tip. Implemented by querying: - (non-finalized) the highest height block in the best chain -- (finalized) the highest height block in the `hash_by_height` tree only if there is no `non-finalized` state +if the `non-finalized` state is empty +- (finalized) the highest height block in the `hash_by_height` tree ### `Request::BlockLocator` [request-block-locator]: #request-block-locator Returns `Response::BlockLocator(Vec)` with hashes starting from the current chain tip and reaching backwards towards the genesis block. The -first hash is the current chain tip. The last hash is the tip of the -finalized portion of the state. If the state is empty, the block locator is +first hash is the best chain tip. The last hash is the tip of the +finalized portion of the state. If the finalized and non-finalized states are both empty, the block locator is also empty. This can be used by the sync component to request hashes of subsequent From cc62811ec03a506244fc0d840d8a88cde7e39ade Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 18 Sep 2020 12:40:39 -0700 Subject: [PATCH 31/35] Apply suggestions from code review Co-authored-by: Deirdre Connolly --- book/src/dev/rfcs/0005-state-updates.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 71bfd14c5e5..4a4418caaee 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -555,8 +555,9 @@ check that `block`'s parent hash is `null` (all zeroes) and its height is `0`. 
(Due to a [bug in zcashd](https://github.com/ZcashFoundation/zebra/issues/559), genesis block transactions are ignored during validation.) -3. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout - and Sapling anchors (XXX: how??) +3. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout and Sapling anchors. + +**Note**: The Sprout and Sapling anchors are the roots of the Sprout and Sapling note commitment trees that have already been calculated for the last transaction(s) in the block that have `JoinSplit`s in the Sprout case and/or `Spend`/`Output` descriptions in the Sapling case. These should be passed as fields in the `Commit*Block` requests. 4. Iterate over the enumerated transactions in the block. For each transaction: From 2cb0eb31e1156fe6af14705cab57ca97f7368fa3 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 18 Sep 2020 12:42:52 -0700 Subject: [PATCH 32/35] Apply suggestions from code review Co-authored-by: teor --- book/src/dev/rfcs/0005-state-updates.md | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 4a4418caaee..83df9363b07 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -687,10 +687,11 @@ Returns Implemented by querying: - (non-finalized) the `tx_by_hash` map (to get the block that contains the transaction) of each - chain starting with the best chain, and then find block in `blocks` of that - chain. -- (finalized) the `tx_by_hash` (to get the block that contains the transaction) and then - `block_by_height` (to get the transaction data) trees. 
+ chain starting with the best chain, and then find block that chain's `blocks` (to get the block containing + the transaction data) +if the transaction is not in any non-finalized chain: +- (finalized) the `tx_by_hash` tree (to get the block that contains the transaction) and then + `block_by_height` tree (to get the block containing the transaction data). ### `Request::Block(BlockHeaderHash)` [request-block]: #request-block @@ -706,9 +707,10 @@ Returns Implemented by querying: - (non-finalized) the `height_by_hash` of each chain starting with the best - chain, then find block in `blocks` of that chain. -- (finalized) the `height_by_hash` (to get the block height) and then - `block_by_height` (to get the block data) trees. + chain, then find block that chain's `blocks` (to get the block data) +if the block is not in any non-finalized chain: +- (finalized) the `height_by_hash` tree (to get the block height) and then + the `block_by_height` tree (to get the block data). # Drawbacks @@ -726,4 +728,4 @@ Implemented by querying: - We do not handle reorgs the same way zcashd does, and could in theory need to delete our entire on disk state and resync the chain in some pathological reorg cases. -- testnet rollbacks are infrequent, but possible: each rollback will require additional state service code. +- testnet rollbacks are infrequent, but possible, due to bugs in testnet releases. Each testnet rollback will require additional state service code. 
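The lookup order described for `Request::Block` is sketched below in Rust. Each non-finalized chain is searched best chain first, and the finalized trees are consulted only if no chain has the block. Plain `HashMap`/`BTreeMap` values stand in for the in-memory maps and the sled trees, and string labels replace the block data. All names here are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

// Hypothetical simplified types; block data is just a label here.
type BlockHash = u64;
type Height = u32;

/// Simplified in-memory chain with only the maps this lookup needs.
#[derive(Default)]
struct Chain {
    height_by_hash: HashMap<BlockHash, Height>,
    blocks: BTreeMap<Height, String>,
}

/// Simplified stand-in for the finalized sled trees.
#[derive(Default)]
struct FinalizedState {
    height_by_hash: HashMap<BlockHash, Height>,
    block_by_height: HashMap<Height, String>,
}

/// Look a block up in each non-finalized chain (best chain first), then
/// fall back to the finalized state.
fn lookup_block<'a>(
    chains_best_first: &'a [Chain],
    finalized: &'a FinalizedState,
    hash: BlockHash,
) -> Option<&'a str> {
    chains_best_first
        .iter()
        .find_map(|chain| {
            let height = chain.height_by_hash.get(&hash)?;
            chain.blocks.get(height).map(String::as_str)
        })
        .or_else(|| {
            let height = finalized.height_by_hash.get(&hash)?;
            finalized.block_by_height.get(height).map(String::as_str)
        })
}
```

Transaction lookups follow the same shape, with `tx_by_hash` in place of `height_by_hash` as the first step.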
From 190a9bbcada132090650f5ec2f805f46f48bde16 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 18 Sep 2020 12:58:01 -0700 Subject: [PATCH 33/35] add info about default constructing chains when forking from finalized state --- book/src/dev/rfcs/0005-state-updates.md | 39 +++++++++++++++++++++---- 1 file changed, 33 insertions(+), 6 deletions(-) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index 83df9363b07..aa11dd8a322 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -339,11 +339,31 @@ Remove the highest height block of the non-finalized portion of a chain. #### `Ord` -The `Chain` type also implements `Ord` for reorganizing chains. First chains +The `Chain` type implements `Ord` for reorganizing chains. First chains are compared by their `partial_cumulative_work`. Ties are then broken by comparing `block::Hash`es of the tips of each chain. -**Note**: Unlike `zcashd`, Zebra does not use block arrival times as a tie-breaker for the best tip. Since Zebra downloads blocks in parallel, download times are not guaranteed to be unique. Using the `block::Hash` provides a consistent tip order. (As a side-effect, the tip order is also consistent after a node restart, and between nodes.) +**Note**: Unlike `zcashd`, Zebra does not use block arrival times as a +tie-breaker for the best tip. Since Zebra downloads blocks in parallel, +download times are not guaranteed to be unique. Using the `block::Hash` +provides a consistent tip order. (As a side-effect, the tip order is also +consistent after a node restart, and between nodes.) + +#### `Default` + +The `Chain` type implements `Default` for constructing new chains whose +parent block is the tip of the finalized state. This implementation should be +handled by `#[derive(Default)]`. + +1. 
initialise cumulative data members + - Construct an empty `self.blocks`, `height_by_hash`, `tx_by_hash`, `self.utxos`, `self._anchors`, `self._nullifiers` + - Zero `self.partial_cumulative_work` + +**Note:** The chain can be empty if: + - after a restart - the non-finalized state is empty + - during a fork from the finalized tip - the forked Chain is empty, because all its blocks have been `pop`ped + + ### `ChainSet` Type [chainset-type]: #chainset-type @@ -426,9 +446,9 @@ Try to commit `block` to the non-finalized state. Returns `None` if the block cannot be committed due to missing context. 1. Search for the first chain where `block.parent` == `chain.tip`. If it exists: - - try to push `block` onto that chain - - broadcast `result` via `block.rsp_tx` - - return Some(block.hash) if `result.is_ok()` + - try to push `block` onto that chain + - broadcast `result` via `block.rsp_tx` + - return Some(block.hash) if `result.is_ok()` 2. Find the first chain that contains `block.parent` and fork it with `block.parent` as the new tip @@ -440,7 +460,14 @@ cannot be committed due to missing context. - broadcast `result` via `block.rsp_tx` - return Some(block.hash) if `result.is_ok()` -4. Else panic, this should be unreachable because `commit_block` is only +4. If `block.parent` == `finalized_tip.hash` + - Construct a new `Chain` with `Chain::default` + - try to push `block` onto that chain + - if successful add `fork` to `self.chains` + - broadcast `result` via `block.rsp_tx` + - return Some(block.hash) if `result.is_ok()` + +5. Else panic, this should be unreachable because `commit_block` is only called when it's ready to be committed. 
### Summary From 4a035357485e01606fe2a14da156dfb29eaddb84 Mon Sep 17 00:00:00 2001 From: Jane Lusby Date: Fri, 18 Sep 2020 14:29:40 -0700 Subject: [PATCH 34/35] Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: teor --- book/src/dev/rfcs/0005-state-updates.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md index aa11dd8a322..348f6686018 100644 --- a/book/src/dev/rfcs/0005-state-updates.md +++ b/book/src/dev/rfcs/0005-state-updates.md @@ -582,6 +582,11 @@ check that `block`'s parent hash is `null` (all zeroes) and its height is `0`. (Due to a [bug in zcashd](https://github.com/ZcashFoundation/zebra/issues/559), genesis block transactions are ignored during validation.) +3. If the block is a genesis block, skip any transaction updates. + +(Due to a [bug in zcashd](https://github.com/ZcashFoundation/zebra/issues/559), genesis block transactions +are ignored during validation.) + 3. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout and Sapling anchors. **Note**: The Sprout and Sapling anchors are the roots of the Sprout and Sapling note commitment trees that have already been calculated for the last transaction(s) in the block that have `JoinSplit`s in the Sprout case and/or `Spend`/`Output` descriptions in the Sapling case. These should be passed as fields in the `Commit*Block` requests. 
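Here is a rough Rust sketch of the `commit_block` decision order from patch 33. The block first tries to extend an existing tip. Failing that, it forks a chain that contains its parent, and as a last resort it starts a new `Chain::default()` when its parent is the finalized tip. The types are hypothetical simplifications, and contextual validation is omitted. Where the RFC panics, this sketch returns `None` instead so the behaviour is easy to test.

```rust
// Hypothetical simplified block hash.
type BlockHash = u64;

/// Simplified non-finalized chain: block hashes, lowest height first.
#[derive(Clone, Debug, Default)]
struct Chain {
    blocks: Vec<BlockHash>,
}

impl Chain {
    fn tip(&self) -> Option<BlockHash> {
        self.blocks.last().copied()
    }

    /// A copy of this chain truncated so that `new_tip` is its tip, if present.
    fn fork(&self, new_tip: BlockHash) -> Option<Chain> {
        let pos = self.blocks.iter().position(|&h| h == new_tip)?;
        Some(Chain { blocks: self.blocks[..=pos].to_vec() })
    }
}

struct Block {
    hash: BlockHash,
    parent: BlockHash,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Extended,
    Forked,
    NewChain,
}

fn commit_block(chains: &mut Vec<Chain>, finalized_tip: BlockHash, block: &Block) -> Option<Outcome> {
    // 1. The parent is the tip of an existing chain: extend that chain.
    if let Some(chain) = chains.iter_mut().find(|c| c.tip() == Some(block.parent)) {
        chain.blocks.push(block.hash);
        return Some(Outcome::Extended);
    }
    // 2. The parent is inside some chain: fork at the parent and extend the fork.
    if let Some(mut fork) = chains.iter().find_map(|c| c.fork(block.parent)) {
        fork.blocks.push(block.hash);
        chains.push(fork);
        return Some(Outcome::Forked);
    }
    // 3. The parent is the finalized tip: start a new, initially empty chain.
    if block.parent == finalized_tip {
        let mut chain = Chain::default();
        chain.blocks.push(block.hash);
        chains.push(chain);
        return Some(Outcome::NewChain);
    }
    // 4. Missing context (the RFC treats this case as unreachable).
    None
}
```

Step 3 covers the empty-chain cases noted in the patch, such as a restart with an empty non-finalized state.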
From 5840cfd7c1368c0bb5d36f9e0a3d6b74c4f9a54d Mon Sep 17 00:00:00 2001
From: Jane Lusby
Date: Fri, 18 Sep 2020 14:47:32 -0700
Subject: [PATCH 35/35] move contextual verification out of Chain

---
 book/src/dev/rfcs/0005-state-updates.md | 97 ++++++++++++++-----------
 1 file changed, 56 insertions(+), 41 deletions(-)

diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md
index aa11dd8a322..67aa7abcdf8 100644
--- a/book/src/dev/rfcs/0005-state-updates.md
+++ b/book/src/dev/rfcs/0005-state-updates.md
@@ -253,6 +253,14 @@ the state to the finalized portion.
 `fork` on the other hand handles creating new chains for `push` when new blocks arrive whose parent isn't a tip of an existing chain.
 
+**Note:** The `Chain` type's API is only designed to handle non-finalized
+data. The genesis block and all pre sapling blocks are always considered to
+be finalized blocks and should not be handled via the `Chain` type through
+`CommitBlock`. They should instead be committed directly to the finalized
+state with `CommitFinalizedBlock`. This is particularly important with the
+genesis block since the `Chain` will panic if used while the finalized state
+is completely empty.
+
 The `Chain` type is defined by the following struct and API:
 
 ```rust
@@ -270,13 +278,9 @@ struct Chain {
 }
 ```
 
-#### `pub fn push(&mut self, block: Arc<Block>) -> Result<(), Error>`
-
-Push a block into a chain as the new tip if the block is a valid extension of
-that chain.
+#### `pub fn push(&mut self, block: Arc<Block>)`
 
-1. Run contextual validation checks on block against self and the finalized state
-    - genesis blocks have no context - the CheckpointVerifier performs the necessary `zero height` and `null parent hash` checks
+Push a block into a chain as the new tip
 
 1. Update cumulative data members
     - Add block to end of `self.blocks`
@@ -294,7 +298,7 @@ Remove the lowest height block of the non-finalized portion of a chain.
 
 1. Remove the lowest height block from `self.blocks`
 
-1. Update cumulative data members
+2. Update cumulative data members
     - Remove the block's hash from `self.height_by_hash`
     - for each `transaction` in `block`
       - remove `transaction.hash` from `tx_by_hash`
@@ -302,10 +306,7 @@ Remove the lowest height block of the non-finalized portion of a chain.
     - Remove the anchors from the appropriate `self._anchors`
     - Remove the nullifiers from the appropriate `self._nullifiers`
 
-1. Return the block
-
-**Note**: We do not subtract work from `self.partial_cummulative_work`. This
-is to make sure that the ordering of chains is stable while finalizing blocks.
+3. Return the block
 
 #### `pub fn fork(&self, new_tip: block::Hash) -> Option`
 
@@ -356,12 +357,14 @@ parent block is the tip of the finalized state.
 This implementation should be handled by `#[derive(Default)]`.
 
 1. initialise cumulative data members
-    - Construct an empty `self.blocks`, `height_by_hash`, `tx_by_hash`, `self.utxos`, `self._anchors`, `self._nullifiers`
+    - Construct an empty `self.blocks`, `height_by_hash`, `tx_by_hash`,
+    `self.utxos`, `self._anchors`, `self._nullifiers`
     - Zero `self.partial_cumulative_work`
 
 **Note:** The chain can be empty if:
   - after a restart - the non-finalized state is empty
-  - during a fork from the finalized tip - the forked Chain is empty, because all its blocks have been `pop`ped
+  - during a fork from the finalized tip - the forked Chain is empty, because
+    all its blocks have been `pop`ped
 
 
 ### `ChainSet` Type
@@ -404,15 +407,15 @@ chain and updates all side chains to match.
       `self.chains`
     - Else, drop `chain`
 
-5. calculate the new finalized tip height from the new `best_chain`
+6. calculate the new finalized tip height from the new `best_chain`
 
-6. for each `height` in `self.queued_by_height` where the height is lower than the
+7. for each `height` in `self.queued_by_height` where the height is lower than the
    new reorg limit
     - for each `hash` in `self.queued_by_height.remove(height)`
       - Remove the key `hash` from `self.queued_blocks` and store the removed `block`
       - Find and remove `hash` from `self.queued_by_parent` using `block.parent`'s hash
 
-7. Return `block`
+8. Return `block`
 
 ### `pub fn queue(&mut self, block: QueuedBlock)`
 
@@ -446,7 +449,7 @@ Try to commit `block` to the non-finalized state. Returns `None` if the block
 cannot be committed due to missing context.
 
 1. Search for the first chain where `block.parent` == `chain.tip`. If it exists:
-    - try to push `block` onto that chain
+    - push `block` onto that chain
     - broadcast `result` via `block.rsp_tx`
     - return Some(block.hash) if `result.is_ok()`
 
@@ -455,15 +458,8 @@ cannot be committed due to missing context.
     - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));`
 
 3. If `fork` is `Some`
-    - try to push `block` onto that chain
-    - if successful add `fork` to `self.chains`
-    - broadcast `result` via `block.rsp_tx`
-    - return Some(block.hash) if `result.is_ok()`
-
-4. If `block.parent` == `finalized_tip.hash`
-    - Construct a new `Chain` with `Chain::default`
-    - try to push `block` onto that chain
-    - if successful add `fork` to `self.chains`
+    - push `block` onto that chain
+    - add `fork` to `self.chains`
     - broadcast `result` via `block.rsp_tx`
     - return Some(block.hash) if `result.is_ok()`
 
@@ -493,11 +489,23 @@ to the in memory state, then we finalize all lowest height blocks that are
 past the reorg limit, finally we process any queued blocks and prune any that are now past the reorg limit.
 
-1. Try to commit or queue the block to the non-finalized state with
-   `chain_set.queue(block)?;`
+1. Run contextual validation on `block` against the finalized and non
+   finalized state
+
+2. If `block.parent` == `finalized_tip.hash`
+    - Construct a new `Chain` with `Chain::default`
+    - push `block` onto that chain
+    - add `fork` to `chain_set.chains`
+    - broadcast `result` via `block.rsp_tx`
+    - return Some(block.hash) if `result.is_ok()`
+
+3. commit or queue the block to the non-finalized state with
+   `chain_set.queue(block);`
+
+4. If the best chain is longer than the reorg limit
+    - Finalize all lowest height blocks in the best chain, and commit them to
+    disk with `CommitFinalizedBlock`:
 
-2. If the best chain is longer than the reorg limit
-    - Finalize all lowest height blocks in the best chain, and commit them to disk with `CommitFinalizedBlock`:
+
     ```
     while self.best_chain().len() > reorg_limit {
         let finalized = chain_set.finalize()?;
@@ -584,8 +592,6 @@ are ignored during validation.)
 
 3. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout and Sapling anchors.
 
-**Note**: The Sprout and Sapling anchors are the roots of the Sprout and Sapling note commitment trees that have already been calculated for the last transaction(s) in the block that have `JoinSplit`s in the Sprout case and/or `Spend`/`Output` descriptions in the Sapling case. These should be passed as fields in the `Commit*Block` requests.
-
 4. Iterate over the enumerated transactions in the block. For each transaction:
 
     1. Insert `(transaction_hash, block_height || BE32(tx_index))` to
@@ -605,6 +611,12 @@ are ignored during validation.)
     5. For each [`Spend`] description in the transaction, insert `(nullifier,())` into `sapling_nullifiers`.
 
+**Note**: The Sprout and Sapling anchors are the roots of the Sprout and
+Sapling note commitment trees that have already been calculated for the last
+transaction(s) in the block that have `JoinSplit`s in the Sprout case and/or
+`Spend`/`Output` descriptions in the Sapling case. These should be passed as
+fields in the `Commit*Block` requests.
+
 [`JoinSplit`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.JoinSplit.html
 [`Spend`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.Spend.html
 
@@ -688,9 +700,9 @@ if the `non-finalized` state is empty
 Returns `Response::BlockLocator(Vec)` with hashes starting from the current chain tip and reaching backwards towards the genesis block. The
-first hash is the best chain tip. The last hash is the tip of the
-finalized portion of the state. If the finalized and non-finalized states are both empty, the block locator is
-also empty.
+first hash is the best chain tip. The last hash is the tip of the finalized
+portion of the state. If the finalized and non-finalized states are both
+empty, the block locator is also empty.
 
 This can be used by the sync component to request hashes of subsequent blocks.
@@ -713,12 +725,14 @@ Returns
 
 Implemented by querying:
 
-- (non-finalized) the `tx_by_hash` map (to get the block that contains the transaction) of each
-  chain starting with the best chain, and then find block that chain's `blocks` (to get the block containing
-  the transaction data)
+- (non-finalized) the `tx_by_hash` map (to get the block that contains the
+  transaction) of each chain starting with the best chain, and then find
+  block that chain's `blocks` (to get the block containing the transaction
+  data)
 if the transaction is not in any non-finalized chain:
-- (finalized) the `tx_by_hash` tree (to get the block that contains the transaction) and then
-  `block_by_height` tree (to get the block containing the transaction data).
+- (finalized) the `tx_by_hash` tree (to get the block that contains the
+  transaction) and then `block_by_height` tree (to get the block containing
+  the transaction data).
 ### `Request::Block(BlockHeaderHash)`
 [request-block]: #request-block
@@ -755,4 +769,5 @@ if the block is not in any non-finalized chain:
 - We do not handle reorgs the same way zcashd does, and could in theory need to delete our entire on disk state and resync the chain in some pathological reorg cases.
-- testnet rollbacks are infrequent, but possible, due to bugs in testnet releases. Each testnet rollback will require additional state service code.
+- testnet rollbacks are infrequent, but possible, due to bugs in testnet
+  releases. Each testnet rollback will require additional state service code.
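Patch 35 makes three changes that fit together. `push` becomes infallible, because contextual validation now runs before a block reaches `Chain`. The lowest-height block is removed while its cumulative data is undone. Finalization is driven by `while self.best_chain().len() > reorg_limit`. A minimal sketch of how these pieces interact, assuming hypothetical simplified types that are not Zebra's definitions:

- `Block` has only `hash` and `height`.
- A `VecDeque` plus a `height_by_hash` map stands in for all of the cumulative data.
- `pop_root` is an illustrative name for the lowest-height removal.
- `finalize_past_limit` collects finalized blocks in a `Vec` instead of committing them to disk with `CommitFinalizedBlock`.

```rust
use std::collections::{HashMap, VecDeque};

// Hypothetical simplified types; not Zebra's actual definitions.
type Hash = u64;

#[derive(Clone, Debug, PartialEq)]
struct Block {
    hash: Hash,
    height: u32,
}

#[derive(Debug, Default)]
struct Chain {
    blocks: VecDeque<Block>,
    height_by_hash: HashMap<Hash, u32>,
}

impl Chain {
    /// Infallible: contextual validation is assumed to have run before `push`.
    fn push(&mut self, block: Block) {
        self.height_by_hash.insert(block.hash, block.height);
        self.blocks.push_back(block);
    }

    /// Hypothetical name: remove the lowest-height block and undo its cumulative data.
    fn pop_root(&mut self) -> Option<Block> {
        let block = self.blocks.pop_front()?;
        self.height_by_hash.remove(&block.hash);
        Some(block)
    }

    fn len(&self) -> usize {
        self.blocks.len()
    }
}

/// Hypothetical helper: move blocks from the bottom of `best_chain` into
/// `finalized` until the chain is no longer than `reorg_limit`.
fn finalize_past_limit(best_chain: &mut Chain, reorg_limit: usize, finalized: &mut Vec<Block>) {
    while best_chain.len() > reorg_limit {
        // `len() > reorg_limit` implies the chain is non-empty here.
        let block = best_chain.pop_root().expect("chain is non-empty");
        finalized.push(block);
    }
}
```

A `VecDeque` is used here because the finalization loop removes one block from the front on every iteration, and `pop_front` does that in constant time.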