docs: update docs for blobs #9807

Closed · wants to merge 40 commits

Commits
d2544a9
feat: first blobs commit - ts + nr only - overflow err
MirandaWood Sep 25, 2024
9cefc9a
feat: hash blob in chunks, force native sim for block-root
MirandaWood Sep 27, 2024
df93cf5
feat: add ts blob class, use real blob in rollup, more tests
MirandaWood Oct 2, 2024
c5539eb
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Oct 4, 2024
5e92023
feat: post merge fixes, add sponge to epoch prover, cleanup
MirandaWood Oct 4, 2024
1d17855
fix: more merge fixes + cleanup
MirandaWood Oct 4, 2024
aff5e35
feat: purge txseffectshash from nr and ts
MirandaWood Oct 7, 2024
dfe722b
feat: publish blobs to L1, verify, test, a lot of cleanup
MirandaWood Oct 11, 2024
bc0678c
chore: fmt, test refactors + fixes
MirandaWood Oct 21, 2024
ad090f6
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Oct 21, 2024
bf98bbc
fix: multiple post merge fixes, refactor err logging, prover-coord te…
MirandaWood Oct 24, 2024
2cd7570
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Oct 24, 2024
8a8dcab
feat: reinstate bignum code after fix, post merge fixes and fmt
MirandaWood Oct 24, 2024
cc01696
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Oct 24, 2024
6f92b1f
chore: revert e2e prover coord fix in favour of master
MirandaWood Oct 24, 2024
233d3fd
feat: added custom existing sponge absorber to save gates, fmt
MirandaWood Oct 25, 2024
7c1749d
feat: tightly pack blobs, encode and decode with prefixes
MirandaWood Oct 29, 2024
247a49d
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Oct 29, 2024
8f3d10f
chore: post merge fixes, formatting
MirandaWood Oct 31, 2024
20674cf
fix: apply fix for public processor state (PR9634)
MirandaWood Nov 1, 2024
38725f8
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 1, 2024
77dcf8e
chore: update fixtures + fmt
MirandaWood Nov 1, 2024
580b1b2
feat: add blob check override, simulate txs
MirandaWood Nov 4, 2024
2166780
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 4, 2024
99ac96e
feat: update bignum, fmt, comments, small fixes
MirandaWood Nov 5, 2024
43bb5f0
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 5, 2024
88733fe
chore: post merge fix
MirandaWood Nov 5, 2024
37f7af4
chore: fix viem kzg (ty Mike), cleanup
MirandaWood Nov 6, 2024
9bb2c21
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 6, 2024
c667c50
feat: added total len to encoding, comments, tests, fmt
MirandaWood Nov 6, 2024
3a7a997
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 6, 2024
225ba00
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 6, 2024
27d839e
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 6, 2024
f241744
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 6, 2024
9c4c4da
chore: remove unused body calldata variable name
MirandaWood Nov 6, 2024
6cbab7e
chore: update hardcoded test val
MirandaWood Nov 6, 2024
deeb044
Update ci.yml
ludamad Nov 6, 2024
afe79e0
chore: cleanup, fmt
MirandaWood Nov 7, 2024
38796a0
Merge remote-tracking branch 'origin' into mw/blob-circuit
MirandaWood Nov 7, 2024
3ea637d
docs: update docs for blobs
MirandaWood Nov 7, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
```diff
@@ -157,7 +157,7 @@ jobs:
       concurrency_key: build-x86
     # prepare images locally, tagged by commit hash
     - name: "Build E2E Image"
-      timeout-minutes: 40
+      timeout-minutes: 90
      if: (needs.configure.outputs.non-docs == 'true' && needs.configure.outputs.non-barretenberg-cpp == 'true') || github.ref_name == 'master'
      run: |
        earthly-ci ./yarn-project+export-e2e-test-images
```
10 changes: 5 additions & 5 deletions barretenberg/cpp/src/barretenberg/vm/aztec_constants.hpp
```diff
@@ -18,7 +18,7 @@
 #define GAS_FEES_LENGTH 2
 #define GAS_LENGTH 2
 #define CALL_CONTEXT_LENGTH 4
-#define CONTENT_COMMITMENT_LENGTH 4
+#define CONTENT_COMMITMENT_LENGTH 3
 #define CONTRACT_STORAGE_READ_LENGTH 3
 #define CONTRACT_STORAGE_UPDATE_REQUEST_LENGTH 3
 #define GLOBAL_VARIABLES_LENGTH 9
@@ -32,13 +32,13 @@
 #define PUBLIC_INNER_CALL_REQUEST_LENGTH 13
 #define STATE_REFERENCE_LENGTH 8
 #define TOTAL_FEES_LENGTH 1
-#define HEADER_LENGTH 24
-#define PUBLIC_CIRCUIT_PUBLIC_INPUTS_LENGTH 866
-#define PUBLIC_CONTEXT_INPUTS_LENGTH 41
+#define HEADER_LENGTH 23
+#define PUBLIC_CIRCUIT_PUBLIC_INPUTS_LENGTH 865
+#define PUBLIC_CONTEXT_INPUTS_LENGTH 40
 #define AVM_VERIFICATION_KEY_LENGTH_IN_FIELDS 86
 #define AVM_PROOF_LENGTH_IN_FIELDS 4176
 #define AVM_PUBLIC_COLUMN_MAX_SIZE 1024
-#define AVM_PUBLIC_INPUTS_FLATTENED_SIZE 2914
+#define AVM_PUBLIC_INPUTS_FLATTENED_SIZE 2913
 #define MEM_TAG_FF 0
 #define MEM_TAG_U1 1
 #define MEM_TAG_U8 2
```
```diff
@@ -655,7 +655,6 @@ bb::fr WorldState::compute_initial_archive(const StateReference& initial_state_r
         0,
         0,
         0,
-        0,
         // state reference - the initial state for all the trees (accept the archive tree)
         initial_state_ref.at(MerkleTreeId::L1_TO_L2_MESSAGE_TREE).first,
         initial_state_ref.at(MerkleTreeId::L1_TO_L2_MESSAGE_TREE).second,
```
```diff
@@ -168,7 +168,7 @@ TEST_F(WorldStateTest, GetInitialTreeInfoForAllTrees)
     EXPECT_EQ(info.meta.size, 1);
     EXPECT_EQ(info.meta.depth, tree_heights.at(MerkleTreeId::ARCHIVE));
     // this is the expected archive tree root at genesis
-    EXPECT_EQ(info.meta.root, bb::fr("0x1200a06aae1368abe36530b585bd7a4d2ba4de5037b82076412691a187d7621e"));
+    EXPECT_EQ(info.meta.root, bb::fr("0x234e13471e10c9cf5e8d6aee4c62952f6ea211d06c2b6912b13da2578e2c1ec7"));
 }
 }
```
20 changes: 20 additions & 0 deletions docs/docs/migration_notes.md
@@ -7,6 +7,26 @@ keywords: [sandbox, aztec, notes, migration, updating, upgrading]
Aztec is in full-speed development. Literally every version breaks compatibility with the previous ones. This page attempts to target errors and difficulties you might encounter when upgrading, and how to resolve them.

## 0.62.0

### Blobs
We now publish all DA (data availability) in EVM blobs rather than calldata. This replaces all code that touched the `txsEffectsHash`.
In the rollup circuits, instead of hashing each child circuit's `txsEffectsHash` to form a tree, we track tx effects by absorbing them into a sponge for blob data (hence the name: `spongeBlob`). This sponge is treated like the state trees, in that we check that each rollup circuit follows on from the previous one:

```diff
- let txs_effects_hash = sha256_to_field(left.txs_effects_hash, right.txs_effects_hash);
+ assert(left.end_sponge_blob.eq(right.start_sponge_blob));
+ let start_sponge_blob = left.start_sponge_blob;
+ let end_sponge_blob = right.end_sponge_blob;
```
This sponge is used in the block root circuit to confirm that an injected array of all `txEffects` matches those rolled up so far in the `spongeBlob`. The `txEffects` array is then used to construct the polynomial behind the blob commitment on L1 and to prove an opening of it (done efficiently thanks to the barycentric formula).
On L1, we publish the array as a blob and verify the above proof of opening. This confirms that the tx effects in the rollup circuit match the data in the blob:

```diff
- bytes32 txsEffectsHash = TxsDecoder.decode(_body);
+ bytes32 blobHash = _validateBlob(blobInput);
```
Here `blobInput` contains the proof of opening and the evaluation calculated in the block root rollup circuit. It is stored and later used as a public input when verifying the epoch proof.
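
For orientation, EIP-4844's point evaluation precompile consumes these values as a single 192-byte input. A minimal TypeScript sketch of the packing (the function name and `Uint8Array` plumbing are illustrative, not the exact aztec-packages ABI):

```ts
// Pack the point evaluation precompile's five inputs in order (EIP-4844):
// versioned_hash (32) | z (32) | y (32) | commitment C (48) | kzg proof (48).
function buildBlobInput(
  versionedHash: Uint8Array, // 32 bytes: the blobhash
  z: Uint8Array,             // 32 bytes: challenge point
  y: Uint8Array,             // 32 bytes: claimed evaluation p(z)
  commitment: Uint8Array,    // 48 bytes: KZG commitment C
  kzgProof: Uint8Array,      // 48 bytes: proof of opening
): Uint8Array {
  const input = new Uint8Array(192);
  let offset = 0;
  for (const part of [versionedHash, z, y, commitment, kzgProof]) {
    input.set(part, offset);
    offset += part.length;
  }
  return input;
}
```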

### [TXE] Single execution environment
Thanks to recent advancements in Brillig, TXE performs every single call as if it were a nested call, spawning a new ACVM or AVM simulator without performance loss.
This ensures every single test runs in a consistent environment and allows for clearer test syntax:
180 changes: 180 additions & 0 deletions docs/docs/protocol-specs/data-publication-and-availability/blobs.md
@@ -0,0 +1,180 @@
---
title: Blobs
---

## Implementation

### Technical Background

Essentially, we replace publishing a tx's effects as calldata with publishing them in a blob. Data inside a blob is *not available* to the EVM, so we cannot simply hash the same data on L1 and in the rollup circuits and check that the hashes match, as we do now.

Instead, publishing a blob makes the `blobhash` available:

```solidity
/**
 * blobhash(i) returns the versioned_hash of the i-th blob associated with _this_ transaction.
 * bytes[0:1]: 0x01
 * bytes[1:32]: the last 31 bytes of the sha256 hash of the kzg commitment C.
 */
bytes32 blobHash;
assembly {
    blobHash := blobhash(0)
}
```

Where the commitment $C$ is a KZG commitment to the data inside the blob over the BLS12-381 curve. There are more details [here](https://notes.ethereum.org/@vbuterin/proto_danksharding_faq#What-format-is-blob-data-in-and-how-is-it-committed-to) on exactly what this is, but briefly, given a set of 4096 data points inside a blob, $d_i$, we define the polynomial $p$ as:

$$p(\omega^i) = d_i.$$

In the background, this polynomial is found by interpolating the $d_i$ s (evaluations) against the $\omega^i$ s (points), where $\omega^{4096} = 1$ (i.e. is a 4096th root of unity).

This means our blob data $d_i$ is actually the polynomial $p$ given in evaluation form. Working in evaluation form, particularly when the polynomial is evaluated at roots of unity, gives us a [host of benefits](https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html#evaluation-form). One of those is that we can commit to the polynomial (using a precomputed trusted setup for secret $s$ and BLS12-381 generator $G_1$) with a simple linear combination:

$$ C = p(s)G_1 = \sum_{i = 0}^{4095} d_i \big(l_i(s)G_1\big),$$

where $l_i(x)$ are the [Lagrange polynomials](https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html#lagrange-polynomials) and each $l_i(s)G_1$ is a precomputed point from the trusted setup. The details for us are not important - the important part is that we can commit to our blob by simply multiplying each data point by the corresponding element of the Lagrange-basis trusted setup and summing the results!
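
As a sketch of that linear combination (using the `@noble/curves` BLS12-381 implementation, and assuming the 4096 Lagrange-basis setup points $l_i(s)G_1$ from the EIP-4844 ceremony are already deserialized; `lagrangeSetup` is an illustrative name):

```ts
import { bls12_381 } from "@noble/curves/bls12-381";

const G1 = bls12_381.G1.ProjectivePoint;

// Assumed: the 4096 Lagrange-basis trusted setup points l_i(s)·G1,
// already parsed into curve points (name is illustrative).
declare const lagrangeSetup: (typeof G1.ZERO)[];

// C = sum_i d_i · (l_i(s)·G1): scale each setup point by its datum and sum.
function commitToBlob(blobData: bigint[]): typeof G1.ZERO {
  let c = G1.ZERO;
  for (let i = 0; i < blobData.length; i++) {
    // multiply() rejects a zero scalar, so skip zero data points.
    if (blobData[i] !== 0n) {
      c = c.add(lagrangeSetup[i].multiply(blobData[i]));
    }
  }
  return c;
}
```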

### Proving DA

So to prove that we are publishing the correct tx effects, we just do this sum in the circuit, and check the final output is the same $C$ given by the EVM, right? Wrong. The commitment is over BLS12-381, so we would be calculating hefty wrong-field elliptic curve operations.

Thankfully, there is a more efficient way, already implemented in the [`blob`](https://github.com/AztecProtocol/aztec-packages/tree/master/noir-projects/noir-protocol-circuits/crates/blob) crate in aztec-packages.

Our goal is to efficiently show that the tx effects accumulated in the rollup circuits are the same $d_i$ s in the blob committed to by $C$ on L1. To do this, we can provide an *opening proof* for $C$. In the circuit, we evaluate the polynomial at a challenge value $z$ and return the result: $p(z) = y$. We then construct a [KZG proof](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html#kate-proofs) of this opening in TypeScript (the proof is actually a commitment to the quotient polynomial $q(x)$), and verify it on L1 using the [point evaluation precompile](https://eips.ethereum.org/EIPS/eip-4844#point-evaluation-precompile) added as part of EIP-4844. It has inputs:

- `versioned_hash`: The `blobhash` for this $C$
- `z`: The challenge value
- `y`: The claimed evaluation value at `z`
- `commitment`: The commitment $C$
- `proof`: The KZG proof of opening

It checks:

- `assert kzg_to_versioned_hash(commitment) == versioned_hash`
- `assert verify_kzg_proof(commitment, z, y, proof)`
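
The first check is easy to reproduce off-chain; here is a minimal sketch of EIP-4844's `kzg_to_versioned_hash` using Node's built-in SHA-256:

```ts
import { createHash } from "node:crypto";

const VERSIONED_HASH_VERSION_KZG = 0x01;

// kzg_to_versioned_hash(C) = 0x01 || sha256(C)[1:], per EIP-4844.
function kzgToVersionedHash(commitment: Uint8Array): Uint8Array {
  if (commitment.length !== 48) throw new Error("C must be 48 bytes");
  const digest = Uint8Array.from(
    createHash("sha256").update(commitment).digest(),
  );
  digest[0] = VERSIONED_HASH_VERSION_KZG; // overwrite the first byte
  return digest;
}
```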

As long as we use our tx effect fields as the $d_i$ values inside the circuit, and use the same $y$ and $z$ in the public inputs of the Honk L1 verification as input to the precompile, we have shown that $C$ indeed commits to our data. Note: I'm glossing over some details here which are explained in the links above (particularly the 'KZG Proof' and 'host of benefits' links).

But isn't evaluating $p(z)$ in the circuit also a bunch of very slow wrong-field arithmetic? No! Well, yes, but not as much as you'd think!

To evaluate $p$ in evaluation form at some value not in its domain (i.e. not one of the $\omega^i$ s), we use the [barycentric formula](https://dankradfeist.de/ethereum/2021/06/18/pcs-multiproofs.html#evaluating-a-polynomial-in-evaluation-form-on-a-point-outside-the-domain):

$$p(z) = A(z)\sum_{i=0}^{4095} \frac{d_i}{A'(\omega^i)} \frac{1}{z - \omega^i}.$$

What's $A(x)$, you ask? Doesn't matter! One of the nice properties we get by defining $p$ as an interpolation over the roots of unity is that the above formula simplifies to:

$$p(z) = \frac{z^{4096} - 1}{4096} \sum_{i=0}^{4095} \frac{d_i\omega^i}{z - \omega^i}.$$

We can precompute all the $\omega^i$ s (and their negations) and $4096^{-1}$; the $d_i$ s are our tx effects, and $z$ is the challenge point (discussed more below). This means computing $p(z)$ is theoretically 4096 wrong-field multiplications and 4096 wrong-field divisions, far fewer operations than BLS12-381 elliptic curve arithmetic would require.
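
To make that concrete, here is a minimal out-of-circuit sketch of the simplified formula over the BLS12-381 scalar field (plain bigints; it assumes $z$ lies outside the evaluation domain and ignores the bit-reversal ordering used by the real EIP-4844 setup):

```ts
// BLS12-381 scalar field modulus and the EIP-4844 blob width.
const r = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001n;
const WIDTH = 4096n;

const mod = (a: bigint): bigint => ((a % r) + r) % r;

// Square-and-multiply modular exponentiation.
function pow(base: bigint, exp: bigint): bigint {
  let acc = 1n;
  let b = mod(base);
  let e = exp;
  while (e > 0n) {
    if (e & 1n) acc = mod(acc * b);
    b = mod(b * b);
    e >>= 1n;
  }
  return acc;
}

// Inverse via Fermat's little theorem (r is prime).
const inv = (a: bigint): bigint => pow(a, r - 2n);

// Primitive 4096th root of unity: 7 generates Fr*, per the EIP-4844 spec.
const omega = pow(7n, (r - 1n) / WIDTH);

// p(z) = ((z^4096 - 1) / 4096) * sum_i( d_i * w^i / (z - w^i) )
function evalBarycentric(d: bigint[], z: bigint): bigint {
  let sum = 0n;
  let wi = 1n; // w^i, updated incrementally
  for (let i = 0; i < Number(WIDTH); i++) {
    sum = mod(sum + mod(d[i] * wi) * inv(mod(z - wi)));
    wi = mod(wi * omega);
  }
  const factor = mod(mod(pow(z, WIDTH) - 1n) * inv(WIDTH));
  return mod(factor * sum);
}
```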

### Rollup Circuits

#### Base

We need to pass up *something* encompassing the tx effects to the rollup circuits, so they can be used as $d_i$ s when we prove the blob opening. The simplest option would be to `poseidon2` hash the tx effects instead and pass those up, but that has some issues:

- If we have one hash per base rollup (i.e. per tx), we have an ever increasing list of hashes to manage.
- If we hash these in pairs, then we need to recreate the rollup structure when we prove the blob.

The latter is doable, but means encoding some maximum number of txs, `N`, to loop over and potentially wasting gates for blocks with fewer than `N` txs. For instance, if we chose `N = 96`, a block with only 2 txs would still have to loop 96 times. Plus, a block could never have more than 96 transactions without a fork.

Instead, we manage state in the vein of `PartialStateReference`: each base provides a `start` and `end` state, and subsequent merge rollup circuits check that these follow on from one another. The base circuit itself simply proves that adding its tx's data moves the state from `start` to `end`.

To encompass all the tx effects, we use a `poseidon2` sponge and absorb each field. We also track the number of fields added to ensure we don't overflow the blob (4096 BLS fields, which *can* fit 4112 BN254 fields, but adding the mapping between these is complex). Given that this struct is a sponge used for a blob, I have named it:

```rs
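// The IV encodes the expected input length shifted 64 bits, i.e.
// FIELDS_PER_BLOB * 2^64 (an assumption: mirrors Noir stdlib's Poseidon2 IV).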
global IV: Field = (FIELDS_PER_BLOB as Field) * 18446744073709551616;

struct SpongeBlob {
sponge: Poseidon2,
fields: u32,
}

impl SpongeBlob {
fn new() -> Self {
Self {
sponge: Poseidon2::new(IV),
fields: 0,
}
}
// Add fields to the sponge
fn absorb<let N: u32>(&mut self, input: [Field; N], in_len: u32) {
// in_len is all non-0 input
for i in 0..in_len {
self.sponge.absorb(input[i]);
}
self.fields += in_len;
}
// Finalise the sponge and output poseidon2 hash of all fields absorbed
fn squeeze(&mut self) -> Field {
self.sponge.squeeze()
}
}
```

To summarise: each base circuit starts with a `start` `SpongeBlob` instance, which is either blank or from the preceding circuit, then calls `.absorb()` with the tx effects as input. Just like the output `BaseOrMergeRollupPublicInputs` has a `start` and `end` `PartialStateReference`, it will also have a `start` and `end` `SpongeBlob`.

#### Merge

We simply check that the `left`'s `end` `SpongeBlob` == the `right`'s `start` `SpongeBlob`, and assign the output's `start` `SpongeBlob` to be the `left`'s and the `end` `SpongeBlob` to be the `right`'s.
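
In pseudocode (a sketch; the `SpongeBlobState` shape is illustrative, the real struct carries the full sponge state):

```ts
// Illustrative stand-in for the circuit's SpongeBlob state.
type SpongeBlobState = { spongeState: bigint; fields: number };

// Merge rule: the right child must start where the left child ended.
function mergeSpongeBlobs(
  left: { start: SpongeBlobState; end: SpongeBlobState },
  right: { start: SpongeBlobState; end: SpongeBlobState },
): { start: SpongeBlobState; end: SpongeBlobState } {
  if (
    left.end.spongeState !== right.start.spongeState ||
    left.end.fields !== right.start.fields
  ) {
    throw new Error("right rollup does not follow on from left");
  }
  return { start: left.start, end: right.end };
}
```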

#### Block Root

The current route is to inline the blob functionality inside the block root circuit.
<!-- We would allow up to 3 blobs to be proven in one block root rollup. For simplicity, the below explanation will just summarise what happens for a single blob. -->

First, we must gather all our tx effects ($d_i$ s). These will be injected as private inputs to the circuit and checked against the `SpongeBlob`s from the pair of `BaseOrMergeRollupPublicInputs` that we know contain all the effects in the block's txs. Like the merge circuit, the block root checks that the `left`'s `end` `SpongeBlob` == the `right`'s `start` `SpongeBlob`.

It then calls `squeeze()` on the `right`'s `end` `SpongeBlob` to produce the hash of all effects that will be in the blob. Let's call this `h`. The raw injected tx effects are `poseidon2` hashed and we check that the result matches `h`. We now have our set of $d_i$ s.

We now need to produce a challenge point `z`. This value must encompass the two 'commitments' used to represent the blob data: $C$ and `h` (see [here](https://notes.ethereum.org/@vbuterin/proto_danksharding_faq#Moderate-approach-works-with-any-ZK-SNARK) for more on the method). We simply provide $C$ as a public input to the block root circuit, and compute `z = poseidon2(h, C)`.
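
A sketch of those two steps out of circuit (`poseidon2Hash` is an assumed helper, and packing $C$ into fields is illustrative):

```ts
// Assumed helper: a Poseidon2 hash over field elements, treated as a
// black box here (the circuit uses Noir's Poseidon2).
declare function poseidon2Hash(inputs: bigint[]): bigint;

// Check injected tx effects against the sponge output, then derive z.
function deriveChallenge(
  txEffects: bigint[],        // injected d_i values (private inputs)
  spongeHash: bigint,         // h = right's end SpongeBlob, squeezed
  commitmentFields: bigint[], // C packed into circuit fields (illustrative)
): bigint {
  // The injected effects must re-hash to the sponge's squeezed output.
  const h = poseidon2Hash(txEffects);
  if (h !== spongeHash) throw new Error("tx effects do not match spongeBlob");
  // z binds both representations of the blob data: h and C.
  return poseidon2Hash([h, ...commitmentFields]);
}
```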

The block root now has all the inputs required to call the blob functionality described above. Along with the usual `BlockRootOrBlockMergePublicInputs`, we also have `BlobPublicInputs`: $C$, $z$, and $y$.

<!-- TODO(Miranda): Add details of block merge and root here once we know how they will look with batching -->

### L1 Contracts

#### Rollup

The function `propose()` takes in these `BlobPublicInputs` and a TypeScript-generated `kzgProof` alongside its usual inputs for proposing a new L2 block. The transaction also includes our blob sidecar(s). We verify that the `BlobPublicInputs` correspond to the sidecars by calling the EVM's point evaluation precompile:

```solidity
// Input for the point evaluation precompile: 192 bytes, packed as
// versioned_hash (32) | z (32) | y (32) | commitment C (48) | kzg proof (48).
// The blobhash was extracted when the blob was submitted earlier, and the
// opening proof is computed in ts and inserted here.
bytes memory input = abi.encodePacked(blobHashes[blockHash], z, y, C, kzgProof);

// Staticcall the point eval precompile https://eips.ethereum.org/EIPS/eip-4844#point-evaluation-precompile :
(bool success,) = address(0x0a).staticcall(input);
require(success, "Point evaluation precompile failed");
```

We have now linked the `BlobPublicInputs` ($C$, $z$, and $y$) to a published EVM blob. We still need to show that these inputs were generated in our rollup circuits corresponding to the blocks we claim. For each proposed block, we store them:

```solidity
blobPublicInputs[blockNumber] = BlobPublicInputs({ z: z, y: y, c: c });
```

Then, when the epoch proof is submitted in `submitEpochRootProof()`, we access these to verify the ZKP:

```solidity
// blob_public_inputs: 3 fields (z, y, c) per block in the epoch
for (uint256 i = 0; i < _epochSize; i++) {
  uint256 j = currentIndex + i * 3;
  publicInputs[j] = blobPublicInputs[previousBlockNumber + i + 1].z;
  publicInputs[j + 1] = blobPublicInputs[previousBlockNumber + i + 1].y;
  publicInputs[j + 2] = blobPublicInputs[previousBlockNumber + i + 1].c;
}
```

Note that we do not need to check that our $C$ matches the `blobhash` - the precompile does this for us.