Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential Fix for stuck nodes: Don't lose future blocks if multiple future blocks received at the same height #489

Merged
merged 6 commits into from
Nov 30, 2018

Conversation

jsolman
Copy link
Contributor

@jsolman jsolman commented Nov 29, 2018

I've been observing for some time now that nodes can seem to get stuck and stop syncing. After spending some time in the TaskManager it seemed that somehow a block hash must be getting into knownHashes but somehow getting lost and never making it into block_cache_unverified in Blockchain.cs. After a careful audit of how that could happen, I belive I've found a scenario that can cause it.

Update: The original scenario that this change started with turned out not to be an issue. The second change does fix the situation where a valid block could be removed from block_cache_unverified while still in knownHashes if an invalid future block were sent to the network above the current height.

…ht, that is not the next block it should go into the unverified cache.
@jsolman jsolman requested a review from erikzhang November 29, 2018 20:53
@jsolman
Copy link
Contributor Author

jsolman commented Nov 29, 2018

Added another commit also to fix receiving potential invalid future blocks being able to cause syncing to get stuck

@jsolman
Copy link
Contributor Author

jsolman commented Nov 30, 2018

There is something else besides this needed to fix nodes getting stuck. The hunt continues.

It may have just been due to the bug I introduced. in b85e21f fixd by 9caf2e1. Testing again now.

@jsolman jsolman changed the title Potential Fix for stuck nodes: Don't lose future blocks that don't verify if one above the header height Potential Fix for stuck nodes: Don't lose future blocks if two future blocks received at the same height Nov 30, 2018
@jsolman jsolman changed the title Potential Fix for stuck nodes: Don't lose future blocks if two future blocks received at the same height Potential Fix for stuck nodes: Don't lose future blocks if multiple future blocks received at the same height Nov 30, 2018
@erikzhang
Copy link
Member

So, the only scenario that this fix solves is that someone sent fake blocks on the network?

@jsolman
Copy link
Contributor Author

jsolman commented Nov 30, 2018

Yeah at this point, it solves for someone sending fake blocks on the network, unless there is any way for CN nodes to ever send an invalid block out, which I think there isn't. Do we know for sure no one has sent fake blocks on the network?

@erikzhang
Copy link
Member

If you have tested it well, I will merge it.

@jsolman
Copy link
Contributor Author

jsolman commented Nov 30, 2018

Give me 12 hours to test more. I’m running it now and it seems fine, but I’m headed to bed now and would like will examine logs in the morning.

@erikzhang
Copy link
Member

OK. If your test is complete, you can merge it by yourself.

@jsolman
Copy link
Contributor Author

jsolman commented Nov 30, 2018

Thanks, I will do so when ready then.

@vncoelho
Copy link
Member

vncoelho commented Nov 30, 2018

@jsolman, yesterday we tested it and the results keep the same, it does not finally solve the issue.

However, @erikzhang, the following analysis may help us to keep tracking/hunting this syncing issue.

When syncing problems, a common state is:
99/22/79/39/37, namely:
{block.Index}/{Height}/{header_index.Count}/{block_cache.Count}/{block_cache_unverified.Count}

We receive a block with higher index and we do not have the header yet.
Some blocks are in the cache and some are unverified.

This keeps until on lucky we are able to process the process and reach the top again.

@jsolman jsolman merged commit 07896db into master Nov 30, 2018
@shargon shargon deleted the PotentiallyFixNodesGettingStuck branch November 30, 2018 16:58
@jsolman jsolman mentioned this pull request Dec 5, 2018
rodoufu pushed a commit to rodoufu/neo that referenced this pull request Mar 5, 2019
Thacryba pushed a commit to simplitech/neo that referenced this pull request Feb 17, 2020
Added the following:
test, notices, overview

Updated:
introduction, python
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants