Some consensus nodes propose only the miner transaction when there are transactions in the network #817
Comments
Nice, my friend, good initiative. Their mempool is probably empty; it is a problem in the tx rebroadcasting. |
I have tested on testnet.
All in all, it seems we have a routing problem? |
Good work! Did you manage to log the messages received 'per node'? We now need confirmation that:
Is there another possible cause? I can only think about these two considering the evidence you provided. Edit: If you have to reboot the node to make it work, is it evidence that the error is in the node itself, and not with other peers? |
@KickSeason, it is this. This was one of the hypotheses, now you tested and found a fact. Nice work. The class neo/neo/Network/P2P/RemoteNode.cs Lines 230 to 234 in 70bf2f5
After that, it is again set to true neo/neo/Network/P2P/RemoteNode.cs Lines 106 to 110 in 70bf2f5
Which is a message usually created from here: neo/neo/Network/P2P/Connection.cs Lines 131 to 144 in 70bf2f5
I believe we need to better arrange the timeouts and this requirement set of |
@lock9, aligned with this, we have the priorities on messages, which might be one of the reasons why they still participate in the consensus but "do not have time" to receive the other types of TCP data. |
Interesting observations... perhaps the message is received but not passed on internally by Akka? |
I will continue testing and record all p2p messages. If the node received the transaction message, we can be sure it's an Akka problem or a node-internal problem. |
any news @KickSeason ? |
@shargon, can you take a look at my last message above about the |
No progress. I'm testing on testnet. I can only access one consensus node, so I have to wait for it to reproduce this issue. |
I built a p2p-plugin and filter tx inv and tx message:
I wonder why there are so many hashes for which the CN didn't get the tx. I searched for those hashes. They aren't on the blockchain. |
Those hashes are in known hashes but not received by |
NGD has set up another testnet to reproduce the issue. |
Could NGD test it too with the last patch #865? |
We just set up another testnet to try reproducing this issue. But it hasn't happened yet. |
Thanks for everything, @KickSeason! |
@shargon It has already happened on Testnet No.2; we're analyzing the consensus log. |
Good news, the first step is "reproduce it" |
Hello guys. I added logging to neo based on v2.10.2 and reproduced the issue. Here is the log file: 2019-07-02.log At height 31922, this node is Primary. It proposes an empty block. @shargon @vncoelho @jsolman @igormcoelho I think there are so many high-priority messages in TaskManagerMailbox that no new task is created for tx. I will add mailbox queue count logs, and logs for the block messages. This node can handle block messages. A block message is also a high-priority letter. |
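The starvation hypothesis above can be illustrated with a minimal Python sketch. This is not neo's or Akka's actual mailbox code; `StrictPriorityMailbox` and `run` are hypothetical names, and the arrival rates are made up. It only shows that when high-priority letters arrive at least as fast as the actor can handle them, a strict-priority mailbox never reaches the low-priority ones.

```python
from collections import deque


class StrictPriorityMailbox:
    """Hypothetical two-level mailbox: low priority is served only when high is empty."""

    def __init__(self):
        self.high = deque()
        self.low = deque()

    def post(self, message, high_priority):
        (self.high if high_priority else self.low).append(message)

    def take(self):
        """Return the next message to handle, or None if the mailbox is empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


def run(ticks, high_per_tick):
    """Per tick: `high_per_tick` high-priority letters and one low-priority
    tx task arrive, and the actor handles exactly one message.
    Returns (handled tx tasks, tx tasks still waiting)."""
    box = StrictPriorityMailbox()
    handled_low = 0
    for t in range(ticks):
        for i in range(high_per_tick):
            box.post(("block", t, i), high_priority=True)
        box.post(("tx", t), high_priority=False)
        msg = box.take()
        if msg is not None and msg[0] == "tx":
            handled_low += 1
    return handled_low, len(box.low)
```

With `run(100, 1)` no tx task is ever handled and all 100 remain queued, while `run(100, 0)` handles every one. Logging the queue counts per priority, as proposed above, is what would confirm or rule this out on a real node.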
It is like a
Add more logs on memorypool please |
Hmm, if it receives the transaction but doesn’t think it is valid, but adds it in known hashes, but it later becomes valid, that could be a problem if knownhashes won’t allow it to receive it again. We should check if this is what is happening. |
However that would not cause all blocks by that CN to be completely empty of all but the miner transaction. So that may not explain what is happening. |
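To make the known-hashes hypothesis concrete, here is a minimal Python sketch. It is not neo's actual C# code; `KnownHashFilter`, `on_inv`, `on_tx`, and `simulate` are hypothetical names. It shows how a hash that stays remembered after a rejected transaction blocks a later re-receipt, and how forgetting the hash on rejection avoids that.

```python
class KnownHashFilter:
    """Hypothetical model of a node's known-hashes set (not neo's real API)."""

    def __init__(self, forget_on_reject):
        self.known = set()
        self.mempool = set()
        self.forget_on_reject = forget_on_reject

    def on_inv(self, tx_hash):
        """Return True if the node should request the transaction body."""
        if tx_hash in self.known:
            return False  # already seen, so it is never requested again
        self.known.add(tx_hash)
        return True

    def on_tx(self, tx_hash, is_valid):
        """Handle a received transaction; return True if it entered the mempool."""
        if is_valid:
            self.mempool.add(tx_hash)
            return True
        if self.forget_on_reject:
            self.known.discard(tx_hash)  # allow a later retry
        return False


def simulate(forget_on_reject):
    """Announce the same tx twice: invalid the first time, valid the second."""
    node = KnownHashFilter(forget_on_reject)
    if node.on_inv("tx1"):
        node.on_tx("tx1", is_valid=False)
    if node.on_inv("tx1"):
        node.on_tx("tx1", is_valid=True)
    return "tx1" in node.mempool
```

`simulate(False)` leaves the mempool without the transaction, `simulate(True)` accepts it on the second announcement. As noted above, this alone would not explain blocks that are empty of every transaction.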
The knownhashes count is in the log file: about 154000 hashes, which is around 4MB. |
Will add mempool log and test |
Could you test in parallel, with and without patch? |
You mean the knownhashes fix patch? |
Yes, just to confirm whether this solves the problem, at least in the same time frame |
Btw
This is where I added the inv tx log. I think it has nothing to do with knownHashes in ProtocolHandler if the log line is printed. |
I think we made great progress. I added more logs and reproduced it again on our testnet2. You can look at the TaskManagerMailbox.log first. It is crowded with a lot of low-priority messages. |
13.300 low priority messages???? |
yes |
It's the same URL; is this right? |
Sorry, mistake. |
I would like to know how many are duplicates; we should check the hash before appending to the queue |
How many hours until the issue appears? |
3 days after restart. Only 1 node behaves like this. |
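A hash check before enqueueing could look like the following minimal Python sketch. `DedupQueue` is a hypothetical name, not a type from neo or Akka; it keeps a set of hashes currently waiting in the queue and counts how many duplicates were refused, which also answers the "how many duplicates" question.

```python
from collections import deque


class DedupQueue:
    """Hypothetical queue that refuses a message whose hash is already waiting."""

    def __init__(self):
        self._queue = deque()
        self._pending = set()  # hashes currently in the queue
        self.duplicates = 0    # how many pushes were refused

    def push(self, msg_hash, payload):
        """Enqueue unless the same hash is already pending; return True if enqueued."""
        if msg_hash in self._pending:
            self.duplicates += 1
            return False
        self._pending.add(msg_hash)
        self._queue.append((msg_hash, payload))
        return True

    def pop(self):
        """Remove and return the oldest (hash, payload) pair."""
        msg_hash, payload = self._queue.popleft()
        self._pending.discard(msg_hash)
        return msg_hash, payload

    def __len__(self):
        return len(self._queue)
```

The set lookup costs O(1) per message, so the check stays cheap even when the queue holds thousands of entries.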
I would like to know the type of the messages in the queue, something like is that possible? Thanks for your work! |
Will do. |
We should cache the consensus messages on the CN nodes in case they arrive out of order. Other nodes should cache also for a time and drop duplicates and prevent storms of consensus messages. |
Yes, we can have a consensus message pool |
Also, any consensus message from a height less than the current block can just be dropped immediately and not forwarded. This alone will probably fix the issue. |
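The three ideas above (cache out-of-order messages, drop duplicates, drop anything below the current block) can be sketched together in a few lines of Python. This is an illustration under assumptions, not neo's consensus code: `ConsensusMessagePool`, `receive`, and `advance` are hypothetical names, and `current_block` is assumed to be the height the node is currently trying to agree on.

```python
class ConsensusMessagePool:
    """Hypothetical pool for consensus messages (not neo's real implementation)."""

    def __init__(self, current_block):
        self.current_block = current_block
        self.seen = {}    # message hash -> height, used to drop duplicates
        self.future = {}  # height -> payloads that arrived too early

    def receive(self, height, msg_hash, payload):
        """Classify an incoming message as 'drop', 'cache', or 'process'."""
        if height < self.current_block:
            return "drop"  # stale: neither processed nor forwarded
        if msg_hash in self.seen:
            return "drop"  # duplicate: prevents relay storms
        self.seen[msg_hash] = height
        if height > self.current_block:
            self.future.setdefault(height, []).append(payload)
            return "cache"  # arrived out of order, keep for later
        return "process"

    def advance(self, new_block):
        """Move to a new height and return the messages cached for it."""
        self.current_block = new_block
        # Forget bookkeeping for heights that are now stale.
        self.seen = {h: ht for h, ht in self.seen.items() if ht >= new_block}
        for height in [ht for ht in self.future if ht < new_block]:
            del self.future[height]
        return self.future.pop(new_block, [])
```

Pruning in `advance` keeps both structures bounded by the number of messages for the current and future heights, so the pool itself cannot become another source of unbounded growth.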
Recently we found that some consensus nodes propose only the miner transaction. You can find details on neoscan from blocks 3846424 to 3856930. During this time, only 2 out of 7 consensus nodes proposed non-miner transactions.
There is also an old problem where a certain consensus node sometimes won't propose any transaction except the miner transaction. This issue was discussed before in #474, but it does not seem to be solved.
I will do some tests first to find the reason.
First, I will check whether the transactions arrived in the mempool of the problem CN.
Do you have any idea about this? Is a solution to this currently in progress?
@vncoelho @igormcoelho @jsolman