-
Notifications
You must be signed in to change notification settings - Fork 145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
connection: fix: reader go-routine is leaked on connection close #70
Conversation
c222ead
to
3e2ea21
Compare
@fho no rush at all, but when you get time could you check out the integration test failures? I'm checking it out too when I have time. |
@lukebakken please create seperate commits for the changes that you are doing, instead of editing mine, also please don't force-push my branch. I'll have a look regarding the ci failures next week, one seems to be because of missing go.sum changes. |
3e2ea21
to
c222ead
Compare
I've reverted the squash, but please note that the changes will probably be squashed when merged into I added a step to GitHub actions to run |
thanks!
Running
The tests failed because the go routine leak detector found a heartbeat go-routine still running when the new Those go-routine leaks already exists in the main branch and happen because some testcases did not close the connection they opened. |
I had to add it explicitly because the CI runs failed without it 🤷 |
d981c68
to
0c5ba36
Compare
Thanks for your contributions. I merged #73 into |
I'll have a look. I'm doing it like described here: go test -tags=integration -c -o tests
for test in $(go test -tags integration -list . | grep -E "^(Test|Example)"); do ./tests -test.v -test.run "^$test\$" && echo -n "." || echo -e "\n$test failed"; done |
hrm, I could not reproduce the test failure locally so far. Another test fails on my machine sometimes that does not fail on CI though :-)
|
@lukebakken 💦 I found the leaked go-routine, it was in The tests now succeeded. |
Well done! You found it just by reviewing the code? |
When a message was sent and it's response was received while the connection was closed or an error happened, the reader go-routine could get stuck and be leaked. The reader go routine tries to send a received message to the unbuffered c.rpc channel via the dispatch0() and dispatchN() methods. The call() method reads from the rpc channel. If an error happened while the dispatch method sends a message to the rpc channel, the call() method could terminate because it read an error from c.errors or because c.errors was closed. To prevent the scenario: - the reader go-routine now closes c.rpc when it terminates, - The call() method, reads from c.rpc until a message was received or it is closed. When c.rpc is closed, it reads an error from c.errors or wait until c.errors is closed. When it reads an error, it returns it. If it is closed it returns ErrClosed. This ensures that the messages is read from c.rpc before call() returns. It also ensures that when a message was received that it is processed. Previously it could happen that the message was silently ignored because c.errors returned an error or was closed. tests: add testcase to ensure reader routine terminates Add a testcase for the bug that the reader go-routine tries to send a message to the buffered rpc channel but call() terminated because it read an error from the errors chan or the errors chan was closed. It cause that reader routine gets stuck forever and does not terminate when the connection is closed. More information: rabbitmq#69. This testcase does not reproduce the issue reliably, but it is triggered in ~80% of executions. Bump GH actions versions add missing go.sum file tests/TestRequiredServerLocale: close connection on testcase termination The TestRequiredServerLocale testcase was not closing the connection that it opened. This caused the goleak detector in the TestReaderGoRoutineTerminatesWhenMsgIsProcessedDuringClose testcase to complain about a leaked heartbeat go-routine. tests: remove duplicate goleak invocation goleak is now called in TestMain(). The invocation in the TestReaderGoRoutineTerminatesWhenMsgIsProcessedDuringClose testcase can be removed. tests/ExampleConnection_reconnect: close connection on termination ExampleConnection_reconnect was not closing the opened connection on termination. This caused the goleak checker to complain about a leaked heartbeat go routine. Close the connection.
221de11
to
2dfde48
Compare
Thank you! |
@lukebakken I somewhen tried running the tests via Before I was running it via the commands from #70 (comment) or simply via When I understood that I defined the env variable, run the tests as described in #70 (comment) and then I found the culprit. :-) |
@lukebakken thank you too :-) |
When a message was sent and it's response was received while the
connection was closed or an error happened, the reader go-routine could
get stuck and be leaked.
The reader go routine tries to send a received message to the unbuffered
c.rpc channel via the dispatch0() and dispatchN() methods.
The call() method reads from the rpc channel.
If an error happened while the dispatch method sends a message to the
rpc channel, the call() method could terminate because it read an error
from c.errors or because c.errors was closed.
To prevent the scenario:
is closed. When c.rpc is closed, it reads an error from c.errors or
wait until c.errors is closed.
When it reads an error, it returns it. If it is closed it returns
ErrClosed.
This ensures that the messages is read from c.rpc before call() returns.
It also ensures that when a message was received that it is processed.
Previously it could happen that the message was silently ignored because
c.errors returned an error or was closed.
This fixes #69.
A testcase can be found in PR #71