-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not skip flaky test in CI #85
Do not skip flaky test in CI #85
Conversation
cc @fho Well, this sure is interesting. With only 1 cpu in use the Anyway, on my 16-core laptop, I tested this out with the following command:
This is a typical sequence of errors in the RabbitMQ logs:
The changes in this PR take into account the fact that you shouldn't use a channel after any channel exception (which was not the case prior to these changes). Notice the connection exception due to the fact that the Go client snuck in a So my guess at this time is that the increased concurrency available via |
Part of #77 Break on channel error, logging Add enough checks so that test passes, comment out some logging.
Cool, a 👍 from @fho is as good as any review. Merging once the tests pass. Have a great weekend. |
@lukebakken 😅 I actually only read your comment so far. |
Sounds good, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR changes the TestRepeatedChannelExceptionWithPublishAndMaxProcsIssue46
testcase in a way that it does not fail on the same condition then before.
The actual bug, that a connection becomes unusable after a message is published on a channel for a non-existing exchange still happens from now and then.
I can reproduce the issue with the previous version of the TestRepeatedChannelExceptionWithPublishAndMaxProcsIssue46
testcase via go test -count=100 -cpu 24 -race -v -tags integration -run=TestRepeatedChannelExceptionWithPublishAndMaxProcsIssue46 ./...
on top of ffeb46d
Thanks for taking the time to look at this @fho
Publishing to an exchange that doesn't exist causes a channel-level exception from RabbitMQ. After such an exception, you can't continue to use that channel. The test used to continue publishing to the same channel which violates the protocol. In addition, the test might also sneak a So, the original test was flawed but it worked because the tests always used to run with Really this test exposes why a RabbitMQ publisher must use publisher confirms, or at a bare minimum, the We can continue discussion in a new issue if you'd like. I'll wait to publish version 1.4.0 of this library in case I'm way off-base here! |
@lukebakken thanks for the extended explanation.
My understanding of the testcase is that it should ensure that the connection stays usable and that it is always possible to open a new channel. The error that is caused by publishing a message to an non-existing exchange should only render the channel unusable. After the testcase changes, these conditions are not ensured anymore:
Therefore I think the original issue can not be caught anymore after those changes. |
@fho I just woke up in the middle of the night and thought again about the test and realized there is indeed a bug somewhere in the library. Then of course you've already commented to say as much. I'll open a new issue in the morning. Thanks! |
Follow-up to #85 When this test fails, RabbitMQ logs the following connection exception: ``` 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> Channel error on connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest'), channel 20: 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> operation basic.publish caused a channel exception not_found: no exchange 'not-existing-exchange' in vhost '/' 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> Error on AMQP connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest', state: running), channel 20: 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> operation basic.publish caused a connection exception channel_error: "expected 'channel.open'" ```
Follow-up to #85 When this test fails, RabbitMQ logs the following connection exception: ``` 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> Channel error on connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest'), channel 20: 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> operation basic.publish caused a channel exception not_found: no exchange 'not-existing-exchange' in vhost '/' 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> Error on AMQP connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest', state: running), channel 20: 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> operation basic.publish caused a connection exception channel_error: "expected 'channel.open'" ```
Follow-up to #85 When this test fails, RabbitMQ logs the following connection exception: ``` 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> Channel error on connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest'), channel 20: 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> operation basic.publish caused a channel exception not_found: no exchange 'not-existing-exchange' in vhost '/' 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> Error on AMQP connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest', state: running), channel 20: 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> operation basic.publish caused a connection exception channel_error: "expected 'channel.open'" ```
* Revert test to demonstrate actual bug Follow-up to #85 When this test fails, RabbitMQ logs the following connection exception: ``` 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> Channel error on connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest'), channel 20: 2022-05-24 11:00:12.747989+00:00 [error] <0.19502.2> operation basic.publish caused a channel exception not_found: no exchange 'not-existing-exchange' in vhost '/' 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> Error on AMQP connection <0.19347.2> (172.17.0.1:46318 -> 172.17.0.2:5672, vhost: '/', user: 'guest', state: running), channel 20: 2022-05-24 11:00:12.748614+00:00 [error] <0.19347.2> operation basic.publish caused a connection exception channel_error: "expected 'channel.open'" ``` * Extend number of iterations * Looks like this has a good chance of fixing the issue related to #85 * Indicate that a channel is closed immediately after decoding a channelClose frame * Close the channel prior to the Once call in the same manner as the Connection * No need to set closed again, wording * Convert to the correct error type, thanks to @Gsantomaggio * Conversion fixes
Part of #77