Investigate sync slowness near the chain tip #2877

conradoplg · 2021-10-14T17:03:25Z

Motivation

It seems syncing is slower than usual. We should investigate and check if it's normal behavior or if there is some issue.

In one instance, I interrupted a (synced) Zebra and restarted it. It took 30 min for it to reach the tip again and start the mempool.

In another instance, it took 40 minutes to download, verify and commit ~400 blocks while behind the tip, which seems like a lot.

One thing I noticed, which may or may not be related, is that most block verifications are cancelled. For example, my current node has been running for ~2 hours and has ~8K cancelled verifications but only ~400 verified blocks. These cancellations come from the restart in ChainSync::sync. Here is a log excerpt from when that was happening.

Diagnostic Suggestions

We can find the location of the errors by tracking the heights reached by the sync downloader, inbound block gossip downloader, BlockVerifier, non-finalised state, and finalised state.
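A minimal sketch of what that height tracking could look like, assuming each stage can report the heights it reaches. The Stage enum, HeightTracker, observe and lag are hypothetical names for illustration, not existing Zebra APIs. The idea is that the stage with the largest lag behind its upstream stage is where blocks are piling up or being dropped.

```rust
/// The pipeline stages listed in the issue. Hypothetical: this enum only
/// exists to index the tracker below.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    SyncDownloader = 0,
    GossipDownloader = 1,
    BlockVerifier = 2,
    NonFinalizedState = 3,
    FinalizedState = 4,
}

/// Records the highest block height each stage has reported.
#[derive(Debug, Default)]
pub struct HeightTracker {
    heights: [Option<u32>; 5],
}

impl HeightTracker {
    /// Records that `stage` reached `height`, keeping the maximum seen so far.
    pub fn observe(&mut self, stage: Stage, height: u32) {
        let slot = &mut self.heights[stage as usize];
        *slot = Some(slot.map_or(height, |seen| seen.max(height)));
    }

    /// The highest height `stage` has reported, if any.
    pub fn height(&self, stage: Stage) -> Option<u32> {
        self.heights[stage as usize]
    }

    /// How many blocks `behind` lags `ahead`, or `None` if either stage
    /// has not reported a height yet.
    pub fn lag(&self, ahead: Stage, behind: Stage) -> Option<u32> {
        let ahead = self.height(ahead)?;
        let behind = self.height(behind)?;
        Some(ahead.saturating_sub(behind))
    }
}
```

Logging lag(Stage::SyncDownloader, Stage::BlockVerifier) and lag(Stage::BlockVerifier, Stage::NonFinalizedState) periodically would show which hop stalls. Some lag between the non-finalised and finalised state is expected by design, so that pair is less informative.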

This might also be related to the duplicate block errors in #1372: the same block can get downloaded multiple times and cause an error, which sometimes restarts the syncer. We recently added a block gossip task in #2729. More block gossip might have made duplicate blocks worse, because the syncer and the inbound gossip downloader are then more likely to download the same block.
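One way to picture the duplicate-download problem is a shared registry that both downloaders consult before fetching a block. This is a hypothetical sketch, not how Zebra is structured: it assumes the syncer and the gossip downloader could share one DownloadRegistry, and BlockHash, try_claim and finish are illustrative names. (The change actually tracked for this issue, #2890, takes a different approach and ignores AlreadyInChain errors in the syncer.)

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a block hash.
pub type BlockHash = [u8; 32];

#[derive(Debug, Default)]
struct Inner {
    /// Blocks some downloader is currently fetching or verifying.
    in_flight: HashSet<BlockHash>,
    /// Blocks already committed. A real node would query its state
    /// service instead of keeping this unbounded set.
    committed: HashSet<BlockHash>,
}

/// Cloneable handle shared by both downloaders (hypothetical).
#[derive(Debug, Clone, Default)]
pub struct DownloadRegistry {
    inner: Arc<Mutex<Inner>>,
}

impl DownloadRegistry {
    /// Returns true if the caller should download `hash`, i.e. no other
    /// downloader has claimed it and it is not already committed.
    pub fn try_claim(&self, hash: BlockHash) -> bool {
        let mut inner = self.inner.lock().expect("registry lock poisoned");
        if inner.committed.contains(&hash) {
            return false;
        }
        // `insert` returns false if the hash was already in flight.
        inner.in_flight.insert(hash)
    }

    /// Releases a claim. On failure the block can be claimed again.
    pub fn finish(&self, hash: BlockHash, committed: bool) {
        let mut inner = self.inner.lock().expect("registry lock poisoned");
        inner.in_flight.remove(&hash);
        if committed {
            inner.committed.insert(hash);
        }
    }
}
```

With a registry like this, the second downloader to see a block skips it instead of racing the first one into a duplicate-block error.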

We can check by adding metrics for each kind of error. That will be easier once we know where the errors are coming from.
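A minimal in-process sketch of per-kind error counting, using only the standard library. The kind labels ("cancelled", "already_in_chain", ...) are hypothetical examples rather than Zebra's actual error variants; a real implementation would more likely export these through the node's existing metrics backend.

```rust
use std::collections::BTreeMap;

/// Counts sync errors by a short kind label (labels are hypothetical).
#[derive(Debug, Default)]
pub struct ErrorCounters {
    counts: BTreeMap<&'static str, u64>,
}

impl ErrorCounters {
    /// Increments the counter for `kind`.
    pub fn record(&mut self, kind: &'static str) {
        *self.counts.entry(kind).or_insert(0) += 1;
    }

    /// Current count for `kind`, zero if never recorded.
    pub fn get(&self, kind: &str) -> u64 {
        self.counts.get(kind).copied().unwrap_or(0)
    }

    /// The most frequent error kind and its count, if any were recorded.
    pub fn most_common(&self) -> Option<(&'static str, u64)> {
        self.counts
            .iter()
            .max_by_key(|(_, count)| **count)
            .map(|(kind, count)| (*kind, *count))
    }
}
```

Fed with the numbers from the report above (~8K cancellations against ~400 verified blocks), a breakdown like this would immediately show cancellations dominating, and the other labels would reveal what triggers the restarts.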

We could also look at the trace logs and see what the specific errors are.


teor2345 · 2021-10-14T20:23:27Z

I've only seen this issue near the chain tip, so I've edited the ticket name.

I've also added some suggestions for diagnosing the issue.

mpguerra · 2021-10-22T09:32:22Z

Hey team! Please add your planning poker estimate with ZenHub @conradoplg @dconnolly @jvff @oxarbitrage @teor2345 @upbqdn

mpguerra · 2021-10-25T21:56:30Z

This seems to have been fixed by #2921.

We should reopen this if we see it recurring.

conradoplg added C-enhancement Category: This is an improvement S-needs-triage Status: A bug report needs triage labels Oct 14, 2021

teor2345 changed the title ~~Investigate sync slowness~~ Investigate sync slowness near the chain tip Oct 14, 2021

This was referenced Oct 14, 2021

Ignore duplicate block AlreadyInChain errors in the syncer #2883

Closed

Ignore AlreadyInChain error in the syncer #2890

Merged

oxarbitrage added this to the 2021 Sprint 21 milestone Oct 19, 2021

teor2345 added C-bug Category: This is a bug I-slow Problems with performance or responsiveness P-Medium and removed C-enhancement Category: This is an improvement labels Oct 20, 2021

teor2345 mentioned this issue Oct 24, 2021

Handle duplicate block errors #1372

Closed


mpguerra closed this as completed Oct 25, 2021

mpguerra removed this from the 2021 Sprint 21 milestone Oct 25, 2021

mpguerra removed the S-needs-triage Status: A bug report needs triage label Oct 29, 2021
