Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UDP receiver async - fix panic during shutdown (async mode only) #29121

Merged

Conversation

hovavza
Copy link
Contributor

@hovavza hovavza commented Nov 12, 2023

Description:
Link to tracking Issue: Fixes #29120
Fixing a bug - panic happens during stop method in async mode only (didn't affect the default non-async mode).
When stop is called, it closes the messageQueue channel, signaling to processMessagesAsync to stop running. However, readMessagesAsync sometimes tries to write into the closed channel (depends whether the method is currently reading from the closed connection or currently trying to write to the channel), and as a result, panic error happens.

Separated between wg (waitGroup that serves non-async code and processMessagesAsync) & the new wg_reader (waitGroup serving readMessagesAsync only). This allows us to first stop readMessagesAsync, wait for it to finish, before closing the channel.
Instead, stop (in async mode) should do the following:

  1. Close the connection - signaling readMessagesAsync to stop - the messageQueue channel will remain open until that method is done so there's no risk of panic (due to writing to a closed channel).
  2. Wait for readMessagesAsync to finish (wait for new wg_reader).
  3. Close messageQueue channel (signaling processMessagesAsync to stop)
  4. Wait for processMessagesAsync to finish (wait sg).

Link to tracking Issue: 29120

Testing: Unitests ran. Ran concrete strato, stopped & restarted multiple times, didn't see any panic (and stop completed successfully as expected)

Documentation: None.

@hovavza hovavza requested a review from djaglowski as a code owner November 12, 2023 10:27
@hovavza hovavza requested a review from a team November 12, 2023 10:27
@atoulme
Copy link
Contributor

atoulme commented Nov 13, 2023

The changes make sense to me, is there a way to write a test to reproduce the issue?

@hovavza
Copy link
Contributor Author

hovavza commented Nov 13, 2023

The changes make sense to me, is there a way to write a test to reproduce the issue?
@atoulme , I'm not sure how. There is the TestInput test method that would fail if the Stop method failed (including panic), including in the async mode. However, it's a matter of luck and volume. The readMessagesAsync method needs to be after the read-from-port line and before the channel addition while the Stop is called. In low traffic scenarios like the tests, this wouldn't happen.

@crobert-1
Copy link
Member

Hello @hovavza, can you update the PR description with the following line?

Link to tracking Issue: Fixes #29120

This will allow GitHub to properly link this to the issue it's fixing.

@hovavza
Copy link
Contributor Author

hovavza commented Nov 14, 2023

Hello @hovavza, can you update the PR description with the following line?

Link to tracking Issue: Fixes #29120

This will allow GitHub to properly link this to the issue it's fixing.

@crobert-1 done

@hovavza
Copy link
Contributor Author

hovavza commented Nov 16, 2023

@djaglowski @crobert-1 @atoulme , can we complete this PR?

@djaglowski djaglowski merged commit c3d4d65 into open-telemetry:main Nov 16, 2023
82 checks passed
@github-actions github-actions bot added this to the next release milestone Nov 16, 2023
RoryCrispin pushed a commit to ClickHouse/opentelemetry-collector-contrib that referenced this pull request Nov 24, 2023
…n-telemetry#29121)

**Description:** 
Link to tracking Issue: Fixes open-telemetry#29120
Fixing a bug - panic happens during stop method in async mode only
(didn't affect the default non-async mode).
When stop is called, it closes the messageQueue channel, signaling to
processMessagesAsync to stop running. However, readMessagesAsync
sometimes tries to write into the closed channel (depends whether the
method is currently reading from the closed connection or currently
trying to write to the channel), and as a result, panic error happens.

Separated between wg (waitGroup that serves non-async code and
processMessagesAsync) & the new wg_reader (waitGroup serving
readMessagesAsync only). This allows us to first stop readMessagesAsync,
wait for it to finish, before closing the channel.
Instead, stop (in async mode) should do the following:
1. Close the connection - signaling readMessagesAsync to stop - the
messageQueue channel will remain open until that method is done so
there's no risk of panic (due to writing to a closed channel).
2. Wait for readMessagesAsync to finish (wait for new wg_reader).
3. Close messageQueue channel (signaling processMessagesAsync to stop)
4. Wait for processMessagesAsync to finish (wait sg).

**Link to tracking Issue:** 29120

**Testing:** Unitests ran. Ran concrete strato, stopped & restarted
multiple times, didn't see any panic (and stop completed successfully as
expected)

**Documentation:** None.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UDP receiver async - panic during shutdown (async mode only)
5 participants