Import expired blobs #5391

Open · realbigsean opened this issue Mar 11, 2024 · 7 comments
Labels: das (Data Availability Sampling), database, deneb

realbigsean (Member)

Description

  • Initializing a new archive node that wishes to store all blobs back to the Deneb fork will need a way to import blobs into the db.
  • Syncing them over the network might place an undue burden on archive nodes, because you would end up banning all non-archive peers (unless we make peer scoring changes around this). This might not work well at all if there are too few archive nodes. It might work just by updating min_epochs_for_blob_sidecars_requests to u64::MAX.
  • A better option would be to add a --blob-provider flag that points to an archive node's beacon API (sketched below).
  • We could also potentially support importing a blobs db from a file.
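
A minimal sketch of what a --blob-provider backfill could look like, assuming the provider serves the standard beacon API blob sidecars endpoint; `backfill_blobs`, `store_blobs` and the slot bounds are hypothetical stand-ins, not Lighthouse's actual interfaces:

```rust
// Hypothetical sketch of a `--blob-provider` backfill loop (not Lighthouse code).
// Requires the `reqwest` crate with the `blocking` feature.

fn backfill_blobs(
    provider_url: &str,
    deneb_start_slot: u64,
    oldest_blob_slot: u64,
    mut store_blobs: impl FnMut(u64, Vec<u8>) -> Result<(), String>,
) -> Result<(), String> {
    // Walk backwards from the node's current prune boundary to the Deneb fork slot,
    // fetching each slot's blob sidecars from the archive node's beacon API.
    for slot in (deneb_start_slot..oldest_blob_slot).rev() {
        let url = format!("{provider_url}/eth/v1/beacon/blob_sidecars/{slot}");
        let resp = reqwest::blocking::get(&url).map_err(|e| e.to_string())?;
        match resp.status().as_u16() {
            200 => {
                let body = resp.bytes().map_err(|e| e.to_string())?;
                // Hand the raw response to the (hypothetical) blob store for
                // decoding, verification and writing.
                store_blobs(slot, body.to_vec())?;
            }
            404 => continue, // no block at this slot
            code => return Err(format!("provider returned {code} for slot {slot}")),
        }
    }
    Ok(())
}
```
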
realbigsean (Member, Author) commented Mar 14, 2024

We should try to include an option for this in the next release, because I think it'll be an important feature once blobs start expiring on mainnet. I'm not sure which path is easier or more useful, so I would appreciate feedback. We could end up implementing both, but I think starting with one for the next release makes sense.

RPC download

  • If --blob-provider $URL is added, always download via the beacon API up to the configured prune boundary.
  • Could this run as a process mostly independent of sync? It would just be required to lag behind sync.
    • How would this handle a scenario where blob data is invalid or unavailable? Should it cause a crash? (One non-fatal approach is sketched below.)
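
One possible (hypothetical) answer to the crash question: treat an unavailable provider as retryable and invalid data as a recorded gap, so the backfill can keep lagging sync instead of taking the node down. `FetchOutcome` and `fetch_with_retry` below are illustrative names, not existing Lighthouse types:

```rust
use std::{thread, time::Duration};

// Illustrative outcome type for a single blob fetch from the provider.
enum FetchOutcome {
    Ok(Vec<u8>),
    Unavailable,     // provider has pruned (or never had) these blobs
    Invalid(String), // e.g. verification against the local block failed
}

fn fetch_with_retry(mut fetch: impl FnMut() -> FetchOutcome, max_attempts: u32) -> Option<Vec<u8>> {
    for attempt in 0..max_attempts {
        match fetch() {
            FetchOutcome::Ok(bytes) => return Some(bytes),
            // Back off and retry instead of crashing the node.
            FetchOutcome::Unavailable => thread::sleep(Duration::from_secs(1u64 << attempt)),
            FetchOutcome::Invalid(err) => {
                eprintln!("invalid blob data from provider: {err}");
                return None; // record a gap and carry on
            }
        }
    }
    None
}
```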

DB import/export

When Lighthouse is shut down:

  • lighthouse db export-blobs --file $FILENAME
  • lighthouse db import-blobs --file $FILENAME

On startup:

  • Should lighthouse identify the db changes and go back to verify old blobs if the node is synced?
  • Should this only be possible on a fresh sync?
  • Do we need to add a flag --archive-blobs? Or should this be identified automatically?
  • LH will need to be aware of old blobs during sync so as to not duplicate downloads and writes, and to verify blocks older than the prune boundary against local blobs (see the sketch after this list).
  • How should gaps in the db be handled if they are older than the prune boundary?
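
As a sketch of the "don't duplicate downloads" point above (a hypothetical helper with simplified inputs, not the real sync code), the decision could look like this per block:

```rust
// Decide whether sync should download blobs for a block, given what is already
// stored locally. All parameters are simplified stand-ins for the real DB state.
fn needs_blob_download(
    block_slot: u64,
    prune_boundary_slot: u64,
    locally_stored_blob_count: usize,
    expected_blob_count: usize, // number of KZG commitments in the block
) -> bool {
    if expected_blob_count == 0 {
        return false; // the block has no blobs at all
    }
    if locally_stored_blob_count == expected_blob_count {
        return false; // already imported: avoid duplicate downloads and writes
    }
    // Inside the DA window the blobs must be downloaded; outside it, a missing
    // blob is a gap that can only be filled by an import or a blob provider.
    block_slot >= prune_boundary_slot
}
```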

jonathanudd

Any updates regarding this?

michaelsproul (Member)

@jonathanudd Not yet, sorry. We've been blocked a bit on implementing PeerDAS, but now that the bulk of that code is in unstable we can set to work on a future-compatible design here.

One complication is that with PeerDAS most nodes will cease to store whole blobs, so only supernodes (nodes that opt to store all blobs) will be able to implement the HTTP API for fetching/exporting blobs. Every other node will just have fragments ("columns").

Another issue is partial blob storage. At the moment Lighthouse uses a single marker to track which blobs are available: the oldest_blob_slot. The simplest way to implement an archive mode without changing this would be to store every blob from oldest_blob_slot forwards, and just lower that slot when we import a batch. However a more useful mode for L2s is probably to archive just a subset of blobs that they're interested in. This would break the current oldest_blob_slot model, and require us to always consider the possibility of gaps. This may be fine, as we need to consider gaps even when importing the full history, as @realbigsean identified in his comment.

Another complication related to partial blob storage is that we store lists of blobs in the DB indexed by block root, and maintain an invariant that we either have 0 blobs for a block, or all of them. This invariant would be harder to change, so the simplest form of partial blob storage would have to be on a per-block basis.
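
A minimal sketch of that per-block invariant, using simplified types rather than Lighthouse's actual schema: an import is only acceptable if the sidecar indices exactly cover the block's KZG commitments, or the block stores no blobs at all.

```rust
// Simplified stand-in for a blob sidecar; the real type also carries the blob,
// KZG commitment, proof and signed block header.
struct SidecarSummary {
    index: u64, // position of the blob within its block
}

/// Returns true if storing `sidecars` for a block with `kzg_commitment_count`
/// commitments respects the "zero blobs or all blobs" invariant.
fn satisfies_all_or_nothing(sidecars: &[SidecarSummary], kzg_commitment_count: usize) -> bool {
    if sidecars.is_empty() {
        return true; // storing nothing for this block is always allowed
    }
    if sidecars.len() != kzg_commitment_count {
        return false;
    }
    // Indices must be exactly 0..count, each present once.
    let mut seen = vec![false; kzg_commitment_count];
    for sidecar in sidecars {
        match seen.get_mut(sidecar.index as usize) {
            Some(flag) if !*flag => *flag = true,
            _ => return false, // out-of-range or duplicate index
        }
    }
    true
}
```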

One way forward might be:

  • Define an SSZ-based format for blobs, similar to Era files. Something like Vec<BlobSidecar<E>> in slot-ascending order could be the starting point. We may also want some metadata, like: file format version, network identifier, slot range (slot of the oldest blob in the file, slot of the newest blob in the file). A possible shape is sketched after this list.
  • We could assume for now that we only need to handle the Deneb case, or we could build in support for PeerDAS data columns in the file format. Using data columns would allow ordinary nodes (not supernodes) to import and export data. Alternatively, maybe the file format should be flexible enough to support both columns and full blobs post-PeerDAS?
  • Implement lighthouse db import-blobs/export-blobs in terms of the above file format. To appease Lighthouse's DB invariants we could check that the sidecars for a given slot are complete, by comparing the list of blob sidecars to the blob KZG commitments from the block in the DB.
  • Implement a consistency check in Lighthouse's HTTP API to guard against returning wrong data when there are gaps. This is related to: Will lighthouse return an empty list when fetching an expired blob? #5640. We should probably load the block header and check that the DB contains the correct (complete) number of blobs prior to returning. We can error out with a 404 if the blobs are unavailable.
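
For the file format in the first bullet, one possible shape, with plain Rust structs standing in for SSZ containers (in practice these would derive the SSZ Encode/Decode traits, and all field names here are illustrative rather than a spec):

```rust
// Illustrative archive file layout, loosely following the metadata suggested above.
struct BlobArchiveHeader {
    format_version: u16,
    /// Network identifier, e.g. a fork digest.
    network_id: [u8; 4],
    /// Slot of the oldest blob in the file.
    start_slot: u64,
    /// Slot of the newest blob in the file.
    end_slot: u64,
}

/// Stand-in for `BlobSidecar<E>`; the real sidecar also carries the KZG
/// commitment, proof and a signed block header for verification.
struct BlobSidecarStub {
    slot: u64,
    index: u64,
    blob: Vec<u8>,
}

struct BlobArchiveFile {
    header: BlobArchiveHeader,
    /// Sidecars in slot-ascending order. A post-PeerDAS variant could hold
    /// data columns here instead of (or as well as) full blobs.
    sidecars: Vec<BlobSidecarStub>,
}
```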

Down the line we can:

  • Encourage other clients to adopt the format we develop. This could be after some iteration on the design.
  • Develop standalone tools for manipulating the archive files: e.g. filtering to achieve partial blob storage.

michaelsproul (Member)

Mac and I are going to start working on this

michaelsproul (Member)

Some feedback based on the initial MVP:

  • Would be good to be able to import/export without shutting down the node.

michaelsproul (Member)

New plan:

  1. Add a Lighthouse POST endpoint that accepts a list of blob sidecars (SSZ or JSON). It could do some lightweight verification: check that we have the corresponding block in our DB, and optionally verify the KZG commitments and other proofs. (A sketch of this verification order follows below the list.)
  2. Add a subcommand to lighthouse db that imports blobs from either blob sidecar files in a directory or the GET API of another BN, and POSTs them to the target BN using the POST API from (1).
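
A sketch of the verification order for the endpoint in (1), with hypothetical closures standing in for the DB lookup and KZG check; this is illustrative, not the actual Lighthouse API:

```rust
// Illustrative server-side flow for the proposed POST endpoint.
struct ImportedSidecar {
    block_root: [u8; 32],
    index: u64,
    // blob, commitment, proof and signed header elided
}

fn import_blob_batch(
    sidecars: Vec<ImportedSidecar>,
    block_exists: impl Fn(&[u8; 32]) -> bool, // lookup against the local DB
    verify_kzg: impl Fn(&ImportedSidecar) -> bool,
    verify: bool, // e.g. driven by a `?verify=true` query parameter
) -> Result<usize, String> {
    // Lightweight check first: reject the whole batch if any referenced block is
    // unknown, before doing any database writes.
    for sidecar in &sidecars {
        if !block_exists(&sidecar.block_root) {
            return Err(format!("unknown block root for sidecar index {}", sidecar.index));
        }
    }
    // Optional heavier verification.
    if verify {
        for sidecar in &sidecars {
            if !verify_kzg(sidecar) {
                return Err(format!("KZG verification failed for sidecar index {}", sidecar.index));
            }
        }
    }
    // At this point the whole batch would be written atomically.
    Ok(sidecars.len())
}
```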

Things to check:

  • Should we support importing blobs (not sidecars) from disk? I think the EF have a stash of these.
  • Eventually we could support importing a subset of blobs per block, but this is a bit hard with LH's current DB schema, so we might wait to do this later.

michaelsproul (Member) commented Dec 6, 2024

Notes from today's call:

  • For importing batches:
    • The whole batch either imports or it doesn't.
    • Iterate to check that all referenced blocks exist prior to doing any database ops.
    • If ?verify=true, also check KZG commitments/inclusion proofs, and verify that all blobs in the same BlobSidecarList have the same block_root (requires hashing the block header).
  • For duplicate blobs:
    • Try to load the existing blobs for the block root. If they're identical to the ones from the API, then silently skip. If they are different, then something is probably wrong, so raise an error. We could verify which of the two is correct, but only if ?verify=true, so it's simpler just to error in this case (see the sketch after this list).
  • Front end:
    • lighthouse db export-blobs: use the std API to write SSZ Vec<BlobSidecarList<..>> files in a directory. TODO: how to name these files?
      • Alternative 0: named by slot range (a bit dodgy for unfinalized blobs).
      • Alternative 1: write blobs in one file per block, named by block root, e.g. 0x00beef_00.ssz.
      • Alternative 2: name by hash of the list of block roots in the batch? Hash-based addressing is kind of nice.
    • lighthouse db import-blobs: use the Lighthouse custom API (batched) to import all batches of blobs from a directory to a running BN. Eventually: standardise this import API.
  • Additional checks:
    • If user is running with prune_blobs=false and tries to import blobs older than DA period: error.
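
A small sketch of the duplicate-handling rule above, with simplified types in place of Lighthouse's stored blob lists:

```rust
// Simplified stand-in for the blobs stored under one block root.
#[derive(PartialEq)]
struct StoredBlobs(Vec<Vec<u8>>);

enum ImportAction {
    Write,
    SkipDuplicate,
}

/// Identical re-imports are silently skipped; mismatching blobs are an error.
fn handle_existing(existing: Option<&StoredBlobs>, incoming: &StoredBlobs) -> Result<ImportAction, String> {
    match existing {
        None => Ok(ImportAction::Write),
        Some(current) if current == incoming => Ok(ImportAction::SkipDuplicate),
        Some(_) => Err("blobs for this block root differ from those already stored".to_string()),
    }
}
```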
