Recovering from a restart/crash where an onchain transaction is in the mempool but not yet mined #2933

LefterisJP · 2018-10-30T23:04:45Z

Problem Definition

We have seen many cases lately where people's nodes get a TransactionUnderpriced error or similar at restart. This can probably happen when an on-chain transaction has been sent and added to the mempool but not yet mined, and the node is then restarted, either deliberately or after a crash.

It does not seem that we are able to recover from this at the moment. When recovering from a restart we process all pending transactions, which re-processes the pending ContractSendXXX event and resends the on-chain transaction, leading to a nonce reuse or transaction underpriced error.

Task

We need to be able to handle this scenario. With mainnet transactions taking so long to get mined, it is really easy to hit.

How can we achieve this?

We should keep the transaction hash of all in-flight chain transactions somewhere, probably either in or close to the pending transactions of the ChainState. Whenever we send a transaction through the proxy, the chainstate should already have it in its pending transactions, so we can add the transaction hash there.

Then, when the node restarts and we process the pending transactions, we can get the transaction hash of each one and call getTransactionByHash() on it. If this returns None, we continue by forwarding the event to the raiden event handler for processing. If, on the other hand, it returns a transaction, we know that the transaction is still waiting to be mined, so we don't do anything.
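The restart check described above could look roughly like the sketch below. It is only an illustration: `PendingTransaction`, `transactions_to_resend` and the `get_transaction_by_hash` callable are hypothetical names, not existing Raiden code, and the callable is assumed to return None for a hash the node does not know.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PendingTransaction:
    """Hypothetical stand-in for an entry in the ChainState's pending transactions."""

    event_name: str  # name of the ContractSendXXX event that triggered the send
    transaction_hash: Optional[str] = None  # None if nothing was sent before the crash


def transactions_to_resend(
    pending: List[PendingTransaction],
    get_transaction_by_hash: Callable[[str], Optional[dict]],
) -> List[PendingTransaction]:
    """Return only the pending entries that should be handed to the event handler again.

    Assumption: get_transaction_by_hash returns None when the node does not
    know the hash, and a transaction dict otherwise.
    """
    to_resend = []
    for entry in pending:
        if entry.transaction_hash is None:
            # We crashed before the transaction was ever sent, so send it now.
            to_resend.append(entry)
        elif get_transaction_by_hash(entry.transaction_hash) is None:
            # The node has no record of the transaction, so send it again.
            to_resend.append(entry)
        # Otherwise the node already knows the transaction: do nothing.
    return to_resend
```

One thing to keep in mind: a transaction that has already been mined is also returned by the node (with a block number set), so this check only decides "resend or not" and does not by itself tell pending and mined apart.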

Notes:

- It will take a bit of work to make the chainstate accessible from inside the proxy. Maybe a better/cleaner approach exists to achieve the same results. Open to suggestions.
- Another facet of this issue can be seen in "Overwrite in-flight transactions" #2801, where the same idea is taken even further by also considering canceling or replacing pending on-chain transactions.


hackaugusto · 2018-10-30T23:12:13Z

> We have seen many cases lately where people's nodes get a TransactionUnderpriced error or similar at restart

I have three ideas why this could happen:

- There is a pending transaction with the same nonce in the pool. Either:
  - The same account is being used by another application.
  - The eth_getTransactionCount RPC call is returning an invalid nonce.
- The error is also being used for something unrelated to transaction overwriting (what?)

Before working on this I recommend finding out which of the above reasons (or some other one) is causing the underpriced error.

LefterisJP · 2018-10-30T23:54:48Z

> I have three ideas why this could happen:

Why not also that the node crashed or restarted while an on-chain transaction was waiting to be mined, and it is still pending when we restart?

That's what I think is happening. At that point we try to resend the transaction once we get it out of the pending transactions at restart, but we hit the same nonce since the earlier transaction has not been mined yet.

I think your comment here:

> The eth_getTransactionCount RPC call is returning an invalid nonce.

hits the spot. We write 'pending' there, but as far as I recall that does not check the mempool. At least for geth, there has been an issue open about this for 2.5 years now.

Parity has the same problem in this issue.
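To make the suspected failure concrete, here is a toy model of it. `ToyNode` is purely hypothetical and much simpler than geth or Parity: it assumes the transaction count ignores the mempool, and that a transaction reusing a nonce is rejected unless it pays a strictly higher gas price (real clients apply their own replacement rules).

```python
class ToyNode:
    """Toy stand-in for an Ethereum client; not real geth or Parity behaviour."""

    def __init__(self) -> None:
        self.mined_count = 0  # number of transactions already mined for our account
        self.mempool = {}  # nonce -> gas price of the transaction waiting to be mined

    def get_transaction_count(self) -> int:
        # Assumption under discussion: the count does not include the mempool.
        return self.mined_count

    def send_transaction(self, nonce: int, gas_price: int) -> None:
        if nonce < self.mined_count:
            raise ValueError("nonce too low")
        if nonce in self.mempool and gas_price <= self.mempool[nonce]:
            # Simplified replacement rule, only here for illustration.
            raise ValueError("replacement transaction underpriced")
        self.mempool[nonce] = gas_price


node = ToyNode()

# Before the crash: send a transaction that ends up waiting in the mempool.
node.send_transaction(node.get_transaction_count(), gas_price=10)

# After the restart: the node reports the same count, so we pick the same nonce.
nonce_after_restart = node.get_transaction_count()
try:
    node.send_transaction(nonce_after_restart, gas_price=10)
    resend_error = None
except ValueError as error:
    resend_error = str(error)
```

In this model the resend after the restart fails because the nonce is already taken by the transaction from before the crash.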

hackaugusto · 2018-10-31T00:14:53Z

Uh-oh. If the node doesn't tell us how many transactions are in the mempool, the restart code will not work at all. Either we have to switch to remote signing, or store the nonce locally.
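A minimal sketch of the "store the nonce locally" option, assuming the last used nonce is persisted in a small JSON file. `LocalNonceStore`, `next_nonce` and the file layout are made up for illustration; a real implementation would more likely live next to the existing node state.

```python
import json
from pathlib import Path
from typing import Optional


class LocalNonceStore:
    """Hypothetical on-disk record of the last nonce this node used."""

    def __init__(self, path: Path) -> None:
        self._path = path

    def load(self) -> Optional[int]:
        # Returns None if no transaction has been recorded yet.
        if not self._path.exists():
            return None
        return json.loads(self._path.read_text())["last_used_nonce"]

    def save(self, nonce: int) -> None:
        self._path.write_text(json.dumps({"last_used_nonce": nonce}))


def next_nonce(node_reported_count: int, store: LocalNonceStore) -> int:
    """Pick the next nonce without trusting the node to count the mempool.

    node_reported_count is whatever eth_getTransactionCount returned. The
    locally stored nonce wins whenever the node's count lags behind it.
    """
    last_used = store.load()
    if last_used is None:
        return node_reported_count
    return max(node_reported_count, last_used + 1)
```

The store would have to be updated with `save()` every time a transaction is actually sent. This only helps if the account is used by this node alone; it does nothing for the case where another application shares the account.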

LefterisJP · 2018-10-31T09:35:51Z

As discussed at breakfast, either the transaction_hash approach or, as you say, storing the nonce locally would perhaps work for the short term. Let me investigate a bit when I have more time. All input is welcome!

LefterisJP · 2018-11-01T18:13:48Z

Done by: #2943

LefterisJP added this to the Red eyes testnet 16 milestone Oct 31, 2018

LefterisJP self-assigned this Oct 31, 2018

LefterisJP added the State / Investigating and Severity / Medium labels Oct 31, 2018

LefterisJP closed this as completed Nov 1, 2018

LefterisJP mentioned this issue Sep 25, 2019: Stop using web3.txpool.inspect for geth #4976 (closed)

LefterisJP mentioned this issue Nov 5, 2019: Fully support Infura #5217 (closed)
