Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear packets on hermes start #1201

Merged
merged 5 commits into from
Jul 15, 2021
Merged

Clear packets on hermes start #1201

merged 5 commits into from
Jul 15, 2021

Conversation

ancazamfir
Copy link
Collaborator

@ancazamfir ancazamfir commented Jul 14, 2021

Closes: #1200
Partially closes: #1196

Description

Hermes does not clear packets on start if there are no IBC events and no clear_packets_interval trigger.
The e2e script masks this error because it sends some packets, starts hermes and then sends more packets (issue masked here because this triggers packet clearing) before checking that all packets are cleared.

  • Fixed script to catch the bug
  • Fixed the bug

For contributor use:

  • Updated the Unreleased section of CHANGELOG.md with the issue.
  • If applicable: Unit tests written, added test to CI.
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Updated relevant documentation (docs/) and code comments.
  • Re-reviewed Files changed in the Github PR explorer.

@ancazamfir ancazamfir marked this pull request as ready for review July 14, 2021 14:18
@ancazamfir ancazamfir requested review from adizere and romac as code owners July 14, 2021 14:18
// at predefined block intervals.
if self.clear_packets_interval != 0
&& height.revision_height % self.clear_packets_interval == 0
if link.a_to_b.clear_packets()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two remarks for this change:

  1. It can trigger link.a_to_b.do_clear_packets repeatedly until the clear_packets flag is set to false. This flag is set to false only when the worker receives a command with a batch of events (that is the first match arm above, on line 103).

  2. If I understand correctly, with this change there may be two separate invocations to link.a_to_b.do_clear_packets:

    • one invocation upon NewBlock (this call site), which would clear packets for that NewBlock's height
    • another invocation with the call site within update_schedule(batch), which would clear packets for the height of that batch, see the snippet below:

https://github.com/informalsystems/ibc-rs/blob/872a5030f9bcc46cfbbb71803554731b3a12ca16/relayer/src/link/relay_path.rs#L299-L306

If the height of NewBlock event and the batch height are the same, then we'll clear packets twice for the same height. Worst case, this will schedule twice the execution of the same operational data.

BTW I tried to reproduce point (2) but was not able. I was only able to reproduce problem (1) above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted an alternative fix
628425b

Feel free to revert if this approach is not appropriate.

Copy link
Collaborator Author

@ancazamfir ancazamfir Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the height of NewBlock event and the batch height are the same, then we'll clear packets twice for the same height. Worst case, this will schedule twice the execution of the same operational data.

Ah yes, good point! This is in fact the case in v0.5.0 and older versions also. We used to get the IBC event before the New Block event (at same height) so we had to clear on IBC event also. With the changes in the scheduler and monitor we now always get the NewBlock before the IBC event. I think we can clear packets on NewBlock only. I will try some changes along these lines so we simplify the code quite a bit.

I was only able to reproduce problem (1) above.

How?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How?

Steps to reproduce (1)

  1. dev-env with two chains, and create a channel between the two;
  2. git co 872a503 and start hermes. The logs show

Jul 15 09:24:16.570 INFO [ibc-1:transfer/channel-0 -> ibc-0] clearing pending packets from events before height 1-312
Jul 15 09:24:16.581 INFO [ibc-1:transfer/channel-0 -> ibc-0] finished scheduling the clearing of pending packets
Jul 15 09:24:16.886 INFO [ibc-0:transfer/channel-0 -> ibc-1] clearing pending packets from events before height 0-324
Jul 15 09:24:16.902 INFO [ibc-0:transfer/channel-0 -> ibc-1] finished scheduling the clearing of pending packets
Jul 15 09:24:17.723 INFO [ibc-1:transfer/channel-0 -> ibc-0] clearing pending packets from events before height 1-313
Jul 15 09:24:17.733 INFO [ibc-1:transfer/channel-0 -> ibc-0] finished scheduling the clearing of pending packets
Jul 15 09:24:18.091 INFO [ibc-0:transfer/channel-0 -> ibc-1] clearing pending packets from events before height 0-325
Jul 15 09:24:18.101 INFO [ibc-0:transfer/channel-0 -> ibc-1] finished scheduling the clearing of pending packets
Jul 15 09:24:18.872 INFO [ibc-1:transfer/channel-0 -> ibc-0] clearing pending packets from events before height 1-314
Jul 15 09:24:18.880 INFO [ibc-1:transfer/channel-0 -> ibc-0] finished scheduling the clearing of pending packets
Jul 15 09:24:19.190 INFO [ibc-0:transfer/channel-0 -> ibc-1] clearing pending packets from events before height 0-326

The problem was: the flag link.a_to_b.clear_packets is never set to false, so upon every NewBlock event, the method do_clear_packets is invoked.

@adizere adizere self-requested a review July 15, 2021 07:27
Copy link
Member

@adizere adizere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@adizere adizere merged commit 9190230 into master Jul 15, 2021
@adizere adizere deleted the clearonstart branch July 15, 2021 08:09
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this pull request Sep 13, 2022
* Fix script to catch the packets not cleared on start

* Clear packets on start

* Attempt at an alternative fix

* Clear packets only on NewBlock event

* Changelog

Co-authored-by: Adi Seredinschi <adi@informal.systems>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hermes does not clear packets on start Hermes issues found by Dex scalability tests on v0.6.0
2 participants