-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChainSync refactor #693
ChainSync refactor #693
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like while in bootstrap phase, every hello we get, we'll be doing a sync from that Tipset? So if there are 3 Hello messages, they all get scheduled? Or am I misreading?
No, only tipsets that aren't connected get scheduled, which is the point of having multiple workers (so one doesn't get sidetracked with an invalid chain and have to start from scratch again) |
What do you mean by connected here? If i understand, connected means common ancestor? Lets say we receive a Hello from PeerA. We start doing a reverse header sync. Half way through a reverse header sync, we receive another Hello from PeerB. PeerB will probably have a newer head than PeerA. If we start to do a reverse header sync on PeerB as well, in most cases (assuming no attacks or weird things which I think is the most likely and far more common scenario), we will have a large amount of the same fetches. This is because a reverse header sync can take quite a bit of time if the chain is long, and the blocks dont get persisted into the store until ALL the blocks from head -> 0 are fetched, so WorkerB fetching from PeerB will not know that WorkerA fetching from PeerA is already fetching the same blocks. |
It's all based on the hello messages received, they get grouped as they come in, so as long as there isn't a break in between two epochs (probably rare) they will be considered on the same chain. This is determined by checking if either the tipset parent is the other or if they are the same. This logic existed, but is used more correctly than before (can still be improved on later). This is the point of having those tipset buckets (to track which tipsets are being synced and what to assign another worker to) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it L(Real)GTM. I noticed that currently we dont verify the winning post proof. Cant remember why that is, but that can be addressed in another PR. Also nice one getting rid of the network_handler. I always got mixed up between that and the context when i was heads deep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome job! Great separating out the sync_worker
logic away from sync.rs
. Just a few nits and more questions than anything.
let ts: Tipset = Tipset::new(vec![header.clone()])?; | ||
// TODO this is really broken, should not be setting the tipset metadata to a tipset with just | ||
// the one header. | ||
pub async fn set_tipset_tracker(&self, header: &BlockHeader) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see the need for the tip_index.rs
file anymore. I recall implementing it really early thinking it would be good for tipset data retrievals but it doesn't look like we are using it anywhere for that purpose? We set the tracker but never make use of it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was planning on leaving it for now because out of scope of this and the changes were already a lot
blockchain/chain_sync/src/sync.rs
Outdated
use vm::TokenAmount; | ||
|
||
// TODO revisit this type, necessary for two sets of Arc<Mutex<>> because each state is | ||
// on seperate thread and needs to be mutated independently, but the vec needs to be read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: seperate -> separate
.chain_store | ||
.heaviest_tipset() | ||
.await | ||
.map(|heaviest| ts.weight() >= heaviest.weight()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be ts.weight() > heaviest.weight()
? Do we want to validate and set peer head when weight is equal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, intentional that if the weight is the same it will also be considered. Also the logic matches Lotus (Although they setup the function differently)
} | ||
|
||
/// informs the syncer about a new potential tipset | ||
/// This should be called when connecting to new peers, and additionally | ||
/// when receiving new blocks from the network | ||
pub async fn inform_new_head(&mut self, peer: PeerId, fts: &FullTipset) -> Result<(), Error> { | ||
pub async fn inform_new_head(&mut self, peer: PeerId, ts: &FullTipset) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed in the Lotus implementation they are adding the peerId
to their ChainExchange
pool, should there be a TODO to add this peer to the available peers for blocksync peers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're doing it when the hello message is handled by the syncer. I don't see why it would need to be handled after this process
commit 8ef5ae5 Author: Austin Abell <austinabell8@gmail.com> Date: Wed Sep 16 12:30:01 2020 -0400 Peer stats tracking and selection (#701) * Added peer manager logic and started to move blocksync handling into network context * Migrate to using handled blocksync requests and remove a duplicate code * Fix test use * Fix duration subtract logic * Update peer manager updates from hello requests * Docs and cleanup * more cleanup commit dc0ff4c Author: Eric Tu <6364934+ec2@users.noreply.github.com> Date: Tue Sep 15 14:43:18 2020 -0400 Semantic Validation for Messages (#703) * semantic validation * suggestions * suggestions commit 3411459 Author: Austin Abell <austinabell8@gmail.com> Date: Mon Sep 14 15:14:48 2020 -0400 ChainSync refactor (#693) * Switch chain store to threadsafe * Update genesis to arc reference * wip refactoring chain sync to workers in async tasks * Update network event handling and remove NetworkHandler * Update tipset scheduling logic * Update peer retrieval to take a random sample of available peers * Cleanup and enabling all existing tests * fix worker task spawn * Add TODO for emit event ignoring and change to error log * oops * Update comment * Fix typo commit 66ca99e Author: Eric Tu <6364934+ec2@users.noreply.github.com> Date: Mon Sep 14 13:48:22 2020 -0400 Fix StateManager use in different components (#694) * wrap sm in arc * lint * fix tests * suggestions * clippy commit 96b64cb Author: nannick <rajarupans@gmail.com> Date: Mon Sep 14 10:33:27 2020 -0400 Drand ignore env variable (#697) * First commit * Using enviromental vairable instead of cli flag * FIxing dumb mistakes * Renaming drand flag * UPdate latest_beacon_entry * Adding drand check * Adding doc commit 548a464 Author: Austin Abell <austinabell8@gmail.com> Date: Thu Sep 10 13:22:22 2020 -0400 Print out conformance results and add log for skips (#695) commit 0d7b16c Author: Stepan <s.s.naumov@gmail.com> Date: Tue Sep 8 09:01:05 2020 -0400 Add CLI command to add Genesis Miner to Genesis Template (#644) * AddMinerGenesis cmd command * fixing tests * addressing comments * removing jsons
* WIP * More changes * Even more changes * Amt docs * Clippy & copyright * Squashed commit of the following: commit 8ef5ae5 Author: Austin Abell <austinabell8@gmail.com> Date: Wed Sep 16 12:30:01 2020 -0400 Peer stats tracking and selection (#701) * Added peer manager logic and started to move blocksync handling into network context * Migrate to using handled blocksync requests and remove a duplicate code * Fix test use * Fix duration subtract logic * Update peer manager updates from hello requests * Docs and cleanup * more cleanup commit dc0ff4c Author: Eric Tu <6364934+ec2@users.noreply.github.com> Date: Tue Sep 15 14:43:18 2020 -0400 Semantic Validation for Messages (#703) * semantic validation * suggestions * suggestions commit 3411459 Author: Austin Abell <austinabell8@gmail.com> Date: Mon Sep 14 15:14:48 2020 -0400 ChainSync refactor (#693) * Switch chain store to threadsafe * Update genesis to arc reference * wip refactoring chain sync to workers in async tasks * Update network event handling and remove NetworkHandler * Update tipset scheduling logic * Update peer retrieval to take a random sample of available peers * Cleanup and enabling all existing tests * fix worker task spawn * Add TODO for emit event ignoring and change to error log * oops * Update comment * Fix typo commit 66ca99e Author: Eric Tu <6364934+ec2@users.noreply.github.com> Date: Mon Sep 14 13:48:22 2020 -0400 Fix StateManager use in different components (#694) * wrap sm in arc * lint * fix tests * suggestions * clippy commit 96b64cb Author: nannick <rajarupans@gmail.com> Date: Mon Sep 14 10:33:27 2020 -0400 Drand ignore env variable (#697) * First commit * Using enviromental vairable instead of cli flag * FIxing dumb mistakes * Renaming drand flag * UPdate latest_beacon_entry * Adding drand check * Adding doc commit 548a464 Author: Austin Abell <austinabell8@gmail.com> Date: Thu Sep 10 13:22:22 2020 -0400 Print out conformance results and add log for skips (#695) commit 0d7b16c Author: Stepan <s.s.naumov@gmail.com> Date: Tue Sep 8 09:01:05 2020 -0400 Add CLI command to add Genesis Miner to Genesis Template (#644) * AddMinerGenesis cmd command * fixing tests * addressing comments * removing jsons * Delete old deadline tests * Fix verify_block_signature * Fix state_api * Silent unused variable warnings * clippy * clippy * clippy * Fix miner worker addr function * fix load_sectors_from_set * Fix get_miner_info * miner faults update * Setup faults and recoveries logic * Add todo for checking verification function * Update to specs-actors v0.9.3 * Fix test * Remove Cid.Undef check * PR fixes * Use the actor_error! macro in the miner actor * Fix type mismatch * PR changes * Use return value * Rename file * More PR fixes * Add assertions and error message * Downcast errors originating from Deadlines::{load_deadline, update_deadline} and State::save_deadlines * Rename Sectors::{new, load} to Sectors::{load, load_sector} * Fix inverted logic * Fix state manager * Remove unused parameters * clippy * Replace function by const, remove duplication, remove comment * Rename field, fix macro formatting, remove unnecessary vec * Downcast errors Co-authored-by: austinabell <austinabell8@gmail.com>
Summary of changes
Changes introduced in this pull request:
SyncWorker
struct, and theChainSyncer
just handles network events and scheduling/ handling sync workersSo the loading/ fetching of tipsets from hello protocol is done in separate threads as to not block the
ChainSyncer
polling or the network event channel will fill and stall (what was happening before). This can be handled better but I want to keep the scope of this PR to a minimum as it's already changing a lot.Reference issue to close (if applicable)
Closes #690
Other information and links