Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid submitting txs when simulation fails, except for one special case #1542

Merged
merged 9 commits into from
Nov 12, 2021

Conversation

romac
Copy link
Member

@romac romac commented Nov 4, 2021

Closes: #1479
Related: #712

Description

This PR changes the way the relayer reacts to tx simulation failures in the following way:

  • if the tx simulation failed because the client state height is too low, the relayer will fallback on the default gas setting and attempt to submit the tx anyway.

    @ancazamfir: https://github.com/informalsystems/ibc-rs/pull/1542#issuecomment-961766790
    Regarding the first case, it's been a lot on my mind as we see this very often in the packet worker. It happens when we have to split N msgs in a number of Tx-es with n (aka max_msg_num) msgs each, with n < N and when the first message is UpdateClient.
    We may have the worker determine n from the config and prepend the UpdateClient before every batch of n-1 packets. But by the time it gets serialized and into the runtime, it might have to split again due to max_tx_size. The other possibility that we discussed before is to not serialize messages at all and let the runtime ensure that each batch in the Tx starts with UpdateClient.
    I think that if we could fix this then we can drop the Tx on all simulation errors.

  • if the tx simulation failed for any other reason, we abort the submission of the tx and bubble the error up, and let the worker deal with the error. At the moment, workers do not handle these errors in any specific manner, and will just attempt to send the tx again for a fixed number of times. In that case, the overall behavior of the relayer does not change, except that we do not waste fees unnecessarily anymore because we won't attempt to actually submit the tx using the default gas.

How to test

  1. Create a channel

    ❯ hermes create channel ibc-0 ibc-1 --port-a transfer --port-b transfer
    
  2. In the Hermes config, set max_msg_num = 5 for chain ibc-0

  3. Start Hermes

    ❯ hermes start
    
  4. Send N > 5 packets to ibc-0

    ❯ hermes tx raw ft-transfer ibc-0 ibc-1 transfer channel-0 9999 -o 1000 -n 10 -k user2
    
  5. Check the hermes start logs, you should find something like:

    2021-11-05T14:12:09.633184Z  WARN ThreadId(30) [ibc-1] estimate_gas: failed to simulate tx, falling back on default gas because the error is potentially recoverable: gRPC call failed with status: status: InvalidArgument, message: "failed to execute message; message index: 0: acknowledge packet verification failed: packet acknowledgement verification failed: failed packet acknowledgement verification for client (07-tendermint-0): client state height < proof height ({0 243} < {0 554}): invalid height: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
    2021-11-05T14:12:09.633290Z DEBUG ThreadId(30) [ibc-1] send_tx: using 900000000 gas, fee Fee { amount: "900000stake", gas_limit: 900000000 }
    2021-11-05T14:12:09.639044Z DEBUG ThreadId(30) [ibc-1] send_tx: broadcast_tx_sync: Response { code: Ok, data: Data([]), log: Log("[]"), hash: transaction::Hash(BA94AE4CA198F56E27B4A44DA5E508A2E2207E306F475E5285D873296D892170) }
    

For contributor use:

  • Added a changelog entry, using unclog.
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • If applicable: Unit tests written, added test to CI.
  • Updated relevant documentation (docs/) and code comments.
  • Re-reviewed Files changed in the Github PR explorer.

@romac romac force-pushed the romac/1479-avoid-useless-txs branch from 4e75c61 to da1e368 Compare November 4, 2021 14:26
@romac romac changed the title Avoid resubmitting redundant txs when tx simulation fails Avoid submitting txs when simulation fails, except for one special case Nov 4, 2021
@adizere
Copy link
Member

adizere commented Nov 4, 2021

Feedback so far: love the PR title! 💯

Quick idea we can iterate on:

if the tx simulation failed because the chain sent an account sequence mismatch error, the relayer will fallback on the default gas setting and attempt to submit the tx anyway. This is because we have seen cases in production where this happened, for reasons which are yet to be clarified.

Since we noticed this special case in production, I wonder if it makes sense to quickly parse some production logs: (1) grep for sequence mismatch error at the simulation step, and (2) derive a rough success vs. failure ratio for the submitted transactions. If many tx-es succeed, the solution here is sound! Otherwise we can think about revising the approach. Would that make sense?

@ancazamfir
Copy link
Collaborator

ancazamfir commented Nov 5, 2021

A few thoughts:

  • we have to continue for errors that include ..verification failed: ...client state height < proof height :
2021-11-04T13:28:00.277866Z ERROR ThreadId(26) [ibc-0] send_tx: failed to estimate gas, falling back on default gas, error: GRPC call return error status status: InvalidArgument, message: "failed to execute message; message index: 0: acknowledge packet verification failed: packet acknowledgement verification failed: failed packet acknowledgement verification for client (07-tendermint-0): client state height < proof height ({1 323} < {1 400}): invalid height: invalid request", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }
  • for account sequence mismatch we need to re-cache but with the understanding that there is still no guarantee that we get it right as other Tx-es may be in the mempool. The only cases I know where this happens are:

    • the wallet is being used outside of this hermes instance (in this case expected > got)
    • a previous Tx has passed CheckTx but dropped (e.g. by the mempool) before DeliverTx has a chance to run. One example is when we send max gas > block max gas (in this case expected < got)
      Continuing with sending the Tx will not solve this I think.
  • for all other errors is probably good to just drop the Tx with error like you propose

Regarding the first case, it's been a lot on my mind as we see this very often in the packet worker. It happens when we have to split N msgs in a number of Tx-es with n (aka max_msg_num) msgs each, with n < N and when the first message is UpdateClient.
We may have the worker determine n from the config and prepend the UpdateClient before every batch of n-1 packets. But by the time it gets serialized and into the runtime, it might have to split again due to max_tx_size. The other possibility that we discussed before is to not serialize messages at all and let the runtime ensure that each batch in the Tx starts with UpdateClient.
I think that if we could fix this then we can drop the Tx on all simulation errors.

So in summary I thinking we should consider:

  • fix first issue
  • recache account sequence on SDK 32 error
  • drop Tx for ALL simulation errors

@romac romac marked this pull request as ready for review November 5, 2021 14:18
@romac romac requested a review from ancazamfir November 5, 2021 14:18
self.max_gas(),
));
Err(e)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still check that the estimated gas is smaller than the configured max gas, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@adizere adizere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like solid work Romain, I'm still testing and will merge soon!

relayer/src/chain/cosmos.rs Show resolved Hide resolved
@adizere adizere merged commit e511439 into master Nov 12, 2021
@adizere adizere deleted the romac/1479-avoid-useless-txs branch November 12, 2021 10:30
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this pull request Sep 13, 2022
…se (informalsystems#1542)

* Rename CosmosSdk::estimate_gas to simulate_gas

* Rephrase a couple gRPC-related errors

* Recover from `account seq mismatch` error during tx simulation

* Add changelog entry

* Fix doc test

* Undo naming change

* Continue when client state height too low rather than on account sequence mismatch

* Check that the configured max gas is smaller than the max gas consensus param

* Re-introduce check that the estimated gas is smaller than the max gas
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hermes resubmits already received packets
3 participants