-
Notifications
You must be signed in to change notification settings - Fork 784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve range sync with PeerDAS #6258
Comments
This does seem necessary to me, also because requesting all custody columns from the same peer means you have to find a peer that exactly matches your custody right? so retries would happen frequently |
+1 to the above. One of the bugs we had earlier on The proposed change here would allow us to start requesting blocks and columns without having to wait for peers to be available across all custody subnets (for supernodes that would mean requests would be delayed until it has peers across all 128 subnets!). |
Noting another issue with backfill sync: Backfill sync sources blocks and blobs / data columns from peers with the _by_range RPCs. We bundle those results in an RpcBlock and send it to the processor. Since now the block and column peers may be different we need to attribute fault to the right peer. Currently Change the function to fn process_backfill_blocks() {
check_hash_chain(downloaded_blocks)?;
check_availability(downloaded_blocks)?;
import_historical_block_batch(downloaded_blocks)?;
} Where |
Description
Currently range sync and backfill sync fetch blocks and blobs from the network with this sequence:
block_by_range
andblobs_by_range
to the SAME ONE peerThis strategy is not optimal but good enough for now. However with PeerDAS, the worst-case number of requests per batch increases from 2 (blocks + blobs) to 2 +
DATA_COLUMN_SIDECAR_SUBNET_COUNT / CUSTODY_REQUIREMENT = 2+32
(if not connected to any larger node.If we extend the current paradigm, a single failure on a columns_by_range request will trigger a retry of all 34 requests. Not optimal 😅
A solution is to make the "components_by_range" request able to retry each individual request. This is what block lookup requests do, where each component (block, blobs, custody) has its own state and retry count.
Your feedback requested here 👇
Going that direction would add a bunch of lines to sync, first of all I want to check if you agree with the problem, and that it's actually necessary to make each request retry-able.
The text was updated successfully, but these errors were encountered: