
Investigate sync slowness near the chain tip #2877

Closed
conradoplg opened this issue Oct 14, 2021 · 3 comments
Labels
C-bug Category: This is a bug I-slow Problems with performance or responsiveness

Comments

@conradoplg
Collaborator

conradoplg commented Oct 14, 2021

Motivation

It seems syncing is slower than usual. We should investigate and check if it's normal behavior or if there is some issue.

In one instance, I interrupted a synced Zebra node and restarted it. It took 30 minutes for it to reach the tip again and start the mempool.

In another instance, it took 40 minutes to download, verify and commit ~400 blocks (while behind the tip), which seems like a lot.

One thing I noticed, though I don't know whether it is related, is that most block verifications are cancelled. For example, my current node, which has been running for ~2 hours, has ~8K cancelled verifications and only ~400 verified blocks. These cancellations come from the restart in ChainSync::sync. Here is a log excerpt from when that was happening.

Diagnostic Suggestions

We can find the location of the errors by tracking the heights reached by the sync downloader, inbound block gossip downloader, BlockVerifier, non-finalised state, and finalised state.
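
A minimal, std-only sketch of the height tracking suggested above. The `Stage` enum and `HeightTracker` type are hypothetical names, not Zebra APIs; in a real node each component would call `record` as it handles a block, and the heights in `main` are illustrative only. A large gap between two adjacent stages would point at where blocks are stalling.

```rust
use std::collections::HashMap;

/// Hypothetical pipeline stages, named after the components listed in the issue.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Stage {
    SyncDownloader,
    GossipDownloader,
    BlockVerifier,
    NonFinalizedState,
    FinalizedState,
}

/// Records the highest block height each stage has reached.
#[derive(Default)]
struct HeightTracker {
    max_height: HashMap<Stage, u32>,
}

impl HeightTracker {
    /// Record that `stage` handled a block at `height`, keeping the maximum seen.
    fn record(&mut self, stage: Stage, height: u32) {
        let entry = self.max_height.entry(stage).or_insert(height);
        *entry = (*entry).max(height);
    }

    /// Highest height reached by `stage`, if it has reported anything yet.
    fn height(&self, stage: Stage) -> Option<u32> {
        self.max_height.get(&stage).copied()
    }

    /// How far `later` lags behind `earlier` (None if either stage has no data).
    fn gap(&self, earlier: Stage, later: Stage) -> Option<u32> {
        let a = self.height(earlier)?;
        let b = self.height(later)?;
        Some(a.saturating_sub(b))
    }
}

fn main() {
    let mut tracker = HeightTracker::default();
    // Illustrative heights only, not measurements.
    tracker.record(Stage::SyncDownloader, 1_000);
    tracker.record(Stage::BlockVerifier, 950);
    tracker.record(Stage::FinalizedState, 600);

    println!(
        "downloader -> verifier gap: {:?}",
        tracker.gap(Stage::SyncDownloader, Stage::BlockVerifier)
    );
    println!(
        "verifier -> finalized gap: {:?}",
        tracker.gap(Stage::BlockVerifier, Stage::FinalizedState)
    );
}
```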

This might also be related to the duplicate block errors in #1372: the same block can get downloaded multiple times and cause an error, which sometimes restarts the syncer. We recently added a block gossip task in #2729. Having more block gossips might have made duplicate blocks worse, because the syncer and the gossip downloader are more likely to both download the same block.
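
One way to check the duplicate-download hypothesis is to count how often the same block hash is downloaded more than once. This is a hypothetical sketch: `DuplicateDetector` and the 32-byte `BlockHash` alias are assumptions, not Zebra types. A real tracker would also need to prune old hashes so the set does not grow without bound.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a block hash.
type BlockHash = [u8; 32];

/// Counts downloads of block hashes that have already been downloaded.
#[derive(Default)]
struct DuplicateDetector {
    seen: HashSet<BlockHash>,
    duplicates: u64,
}

impl DuplicateDetector {
    /// Returns true if this is the first download of `hash`;
    /// otherwise increments the duplicate counter and returns false.
    fn observe(&mut self, hash: BlockHash) -> bool {
        let first = self.seen.insert(hash);
        if !first {
            self.duplicates += 1;
        }
        first
    }
}

fn main() {
    let mut detector = DuplicateDetector::default();
    let block = [7u8; 32];
    // The syncer downloads the block...
    detector.observe(block);
    // ...and then the gossip downloader fetches the same block again.
    detector.observe(block);
    println!("duplicate downloads: {}", detector.duplicates);
}
```

Both downloaders would share one detector in this sketch, so a duplicate is counted no matter which of them fetched the block first.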

We can check by adding metrics for each kind of error. That will be easier once we know where the errors are coming from.
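
A minimal sketch of per-error-kind counting, assuming the errors can be classified into a small enum. `SyncErrorKind`, its variants, and `ErrorCounters` are hypothetical; a node would more likely export these counts through its metrics system than keep them in an in-memory map.

```rust
use std::collections::BTreeMap;

/// Hypothetical classification of sync and verification errors.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum SyncErrorKind {
    Cancelled,
    DuplicateBlock,
    Timeout,
    Other,
}

/// One counter per error kind; BTreeMap keeps the output order stable.
#[derive(Default)]
struct ErrorCounters {
    counts: BTreeMap<SyncErrorKind, u64>,
}

impl ErrorCounters {
    /// Increment the counter for `kind`, creating it at zero if needed.
    fn increment(&mut self, kind: SyncErrorKind) {
        *self.counts.entry(kind).or_insert(0) += 1;
    }

    /// Current count for `kind` (zero if it has never occurred).
    fn get(&self, kind: SyncErrorKind) -> u64 {
        self.counts.get(&kind).copied().unwrap_or(0)
    }
}

fn main() {
    let mut counters = ErrorCounters::default();
    // Illustrative events only.
    for kind in [
        SyncErrorKind::Cancelled,
        SyncErrorKind::Cancelled,
        SyncErrorKind::DuplicateBlock,
    ] {
        counters.increment(kind);
    }
    for (kind, count) in &counters.counts {
        println!("{:?}: {}", kind, count);
    }
    println!("cancelled total: {}", counters.get(SyncErrorKind::Cancelled));
}
```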

We could also look at the trace logs and see what the specific errors are.

@conradoplg conradoplg added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage labels Oct 14, 2021
@teor2345 teor2345 changed the title Investigate sync slowness Investigate sync slowness near the chain tip Oct 14, 2021
@teor2345
Contributor

I've only seen this issue near the chain tip, so I've edited the ticket name.

I've also added some suggestions for diagnosing the issue.

@oxarbitrage oxarbitrage added this to the 2021 Sprint 21 milestone Oct 19, 2021
@teor2345 teor2345 added C-bug Category: This is a bug I-slow Problems with performance or responsiveness P-Medium and removed C-enhancement Category: This is an improvement labels Oct 20, 2021
@mpguerra
Contributor

This seems to have been fixed by #2921.

We should reopen this issue if we see it recurring.

@mpguerra mpguerra removed this from the 2021 Sprint 21 milestone Oct 25, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Oct 29, 2021