Investigate sync slowness near the chain tip #2877
Comments
I've only seen this issue near the chain tip, so I've edited the ticket name. I've also added some suggestions for diagnosing the issue.
This was referenced Oct 14, 2021
Hey team! Please add your planning poker estimate with ZenHub @conradoplg @dconnolly @jvff @oxarbitrage @teor2345 @upbqdn
This seems to have been fixed by #2921. We should re-open this ticket if we see the slowness recurring.
Motivation
It seems syncing is slower than usual. We should investigate and check if it's normal behavior or if there is some issue.
In one instance, I interrupted a (synced) Zebra and restarted it. It took 30 min for it to reach the tip again and start the mempool.
In another instance, it took 40 minutes to download, verify and commit ~400 blocks (while behind the tip), which seems like a lot.
One particular thing I noticed (which may or may not be related) is that most block verifications are cancelled. For example, my current node, which has been running for ~2 hours, has ~8K cancelled verifications and only ~400 verified blocks. These cancellations come from the restart in ChainSync::sync. Here is a log excerpt from when that was happening.
Diagnostic Suggestions
We can find the location of the errors by tracking the heights reached by the sync downloader, inbound block gossip downloader, BlockVerifier, non-finalised state, and finalised state.
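As a rough illustration of the height tracking suggested above, here is a minimal sketch using only the standard library. The `Stage` enum, `HeightTracker` type, and `lag` method are hypothetical names made up for this example and are not part of Zebra. A real implementation would more likely report these heights through the project's existing metrics or tracing setup.

```rust
use std::collections::BTreeMap;

/// The components named in this ticket. This enum is hypothetical,
/// it does not exist in Zebra.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Stage {
    SyncDownloader,
    InboundGossipDownloader,
    BlockVerifier,
    NonFinalizedState,
    FinalizedState,
}

/// Records the highest block height each stage has reached so far.
#[derive(Debug, Default)]
pub struct HeightTracker {
    max_heights: BTreeMap<Stage, u32>,
}

impl HeightTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that `stage` handled a block at `height`.
    /// Only the maximum height per stage is kept.
    pub fn record(&mut self, stage: Stage, height: u32) {
        let entry = self.max_heights.entry(stage).or_insert(height);
        *entry = (*entry).max(height);
    }

    /// The highest height seen for `stage`, if any.
    pub fn height(&self, stage: Stage) -> Option<u32> {
        self.max_heights.get(&stage).copied()
    }

    /// How many blocks `downstream` is behind `upstream`.
    /// Returns `None` until both stages have reported a height.
    pub fn lag(&self, upstream: Stage, downstream: Stage) -> Option<u32> {
        let up = self.height(upstream)?;
        let down = self.height(downstream)?;
        Some(up.saturating_sub(down))
    }
}
```

A large and growing lag between two stages (for example, the sync downloader and the BlockVerifier) would point at the stage where blocks are getting stuck or cancelled.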
This might also be related to the duplicate block errors in #1372 - the same block can get downloaded multiple times, and cause an error, which sometimes restarts the syncer. We recently added a block gossip task in #2729. Having more block gossips might have made duplicate blocks worse, because the syncer and downloader are more likely to download them.
We can check by adding metrics for each kind of error. That will be easier once we know where the errors are coming from.
We could also look at the trace logs and see what the specific errors are.
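The per-error metrics suggested above could look something like the following sketch. `SyncErrorKind` and `ErrorCounters` are hypothetical names for illustration only, and the error kinds are just the ones mentioned in this ticket. In practice the counts would probably be exported through the node's existing metrics setup rather than kept in a plain map.

```rust
use std::collections::HashMap;

/// Error categories mentioned in this ticket. Hypothetical, not a Zebra type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum SyncErrorKind {
    DuplicateBlock,
    CancelledVerification,
    Other,
}

/// A counter per error kind, so we can see which errors dominate.
#[derive(Debug, Default)]
pub struct ErrorCounters {
    counts: HashMap<SyncErrorKind, u64>,
}

impl ErrorCounters {
    pub fn new() -> Self {
        Self::default()
    }

    /// Add one to the counter for `kind`.
    pub fn increment(&mut self, kind: SyncErrorKind) {
        *self.counts.entry(kind).or_insert(0) += 1;
    }

    /// The number of errors of `kind` seen so far (zero if none).
    pub fn count(&self, kind: SyncErrorKind) -> u64 {
        self.counts.get(&kind).copied().unwrap_or(0)
    }

    /// The total number of errors across all kinds.
    pub fn total(&self) -> u64 {
        self.counts.values().sum()
    }
}
```

Comparing the counts for duplicate blocks and cancelled verifications against the number of verified blocks would show whether the duplicate block errors from #1372 are what is restarting the syncer.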