
Shutdown behavior for groupbyprocessor #1465

Closed
jpkrohling opened this issue Jul 31, 2020 · 4 comments · Fixed by #1842
Assignees
Labels
bug (Something isn't working) · help wanted (Good issue for contributors to OpenTelemetry Service to pick up)

Comments

@jpkrohling
Member

While working on #1362, there was a discussion about what the processor's shutdown behavior should be with regard to in-flight traces:

what should happen with in-flight traces when the processor is shutting down? Should we just discard them? Should we flush them immediately, blocking until they are consumed, possibly honoring the context's deadline?

To unblock that PR, this issue was created so that the appropriate solution can be agreed on and implemented in a follow-up PR.

@jpkrohling jpkrohling added the bug (Something isn't working) label Jul 31, 2020
@tigrannajaryan tigrannajaryan added this to the GA 1.0 milestone Aug 12, 2020
@tigrannajaryan tigrannajaryan added the help wanted (Good issue for contributors to OpenTelemetry Service to pick up) label Sep 2, 2020
@jpkrohling
Member Author

This should also be assigned to me.

@jpkrohling
Member Author

cc @nilebox, as you reviewed the original PR.
cc @pjanotti, @bogdandrutu and @tigrannajaryan for your opinions.

The groupbyprocessor's premise is to release traces to the next processor only once a trace has been completed. It's unclear what the behavior during shutdown should be: should in-memory traces be discarded, as they are potentially incomplete? Or should they all be flushed, giving them a chance of being persisted, even if only partially?
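For illustration, here is a minimal Go sketch of that premise: spans are buffered by trace ID and a group is only released to the next consumer after a wait period. All names here (TraceID, Span, Consumer, groupByTrace, waitDuration) are hypothetical placeholders, not the collector's actual API.

```go
package groupbytrace

import (
	"sync"
	"time"
)

// Hypothetical placeholder types; not the collector's API.
type TraceID [16]byte

type Span struct {
	TraceID TraceID
	Name    string
}

type Consumer interface {
	ConsumeSpans(spans []Span)
}

type groupByTrace struct {
	mu           sync.Mutex
	buffer       map[TraceID][]Span
	waitDuration time.Duration
	next         Consumer
}

// onSpan buffers the span under its trace ID; the first span seen for a
// trace schedules the release of the whole group after waitDuration.
func (p *groupByTrace) onSpan(s Span) {
	p.mu.Lock()
	_, known := p.buffer[s.TraceID]
	p.buffer[s.TraceID] = append(p.buffer[s.TraceID], s)
	p.mu.Unlock()

	if !known {
		time.AfterFunc(p.waitDuration, func() { p.release(s.TraceID) })
	}
}

// release hands the accumulated group to the next consumer and forgets it.
func (p *groupByTrace) release(id TraceID) {
	p.mu.Lock()
	spans := p.buffer[id]
	delete(p.buffer, id)
	p.mu.Unlock()

	if len(spans) > 0 {
		p.next.ConsumeSpans(spans)
	}
}
```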

@tigrannajaryan
Member

tigrannajaryan commented Sep 3, 2020

I believe the desirable shutdown behavior is to stop receiving new data, drain all in-memory data from the pipeline, flush the exporters and exit the process. For groupbyprocessor this would mean flushing the accumulated (even if incomplete) traces to the next consumer. I do not think there is an expectation to wait for a trace to be complete before shutdown is complete.
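A hedged sketch of that behavior, reusing the hypothetical groupByTrace type from the earlier sketch: on shutdown, flush every accumulated group, even if incomplete, to the next consumer, and give up once the shutdown context's deadline is reached. This illustrates the idea only; it is not the processor's actual implementation.

```go
import "context"

// Shutdown flushes all buffered (possibly incomplete) traces to the next
// consumer. It honors the deadline carried by ctx: once ctx is done,
// whatever is still buffered is dropped and the context error is returned.
func (p *groupByTrace) Shutdown(ctx context.Context) error {
	p.mu.Lock()
	defer p.mu.Unlock()

	for id, spans := range p.buffer {
		select {
		case <-ctx.Done():
			return ctx.Err() // time limit reached; remaining traces are dropped
		default:
		}
		if len(spans) > 0 {
			p.next.ConsumeSpans(spans)
		}
		delete(p.buffer, id)
	}
	return nil
}
```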

@jpkrohling
Member Author

I think that's in line with @nilebox's opinion. I'll work on this, then.

jpkrohling added a commit to jpkrohling/opentelemetry-collector that referenced this issue Sep 25, 2020
* Drain the queue upon shutdown, with a time limit. Fixes open-telemetry#1465.
* Added metrics to the groupbyprocessor, making it easier to understand what's going on in case of problems. See open-telemetry#1811.
* Changes the in-memory storage to unlock its RLock when the method returns. Fixes open-telemetry#1811.

Link to tracking Issue: open-telemetry#1465 and open-telemetry#1811
Testing: unit + manual tests
Documentation: see README.md

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
tigrannajaryan pushed a commit that referenced this issue Oct 1, 2020
…ring shutdown. (#1842)

Fixed deadlock in groupbytrace processor.

* Drain the queue upon shutdown, with a time limit. Fixes #1465.
* Added metrics to the groupbyprocessor, making it easier to understand what's going on in case of problems. See #1811.
* Changes the in-memory storage to unlock its RLock when the method returns. Fixes #1811.

Link to tracking Issue: #1465 and #1811
Testing: unit + manual tests
Documentation: see README.md

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
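The commit's last item concerns the read lock held by the in-memory storage; the deadlock it mentions is the familiar pattern of an RLock that is not released on every return path before a writer tries to acquire the lock. Below is a minimal, hypothetical illustration of the fixed pattern; the storage type and method names are placeholders, not the actual code.

```go
import "sync"

// memoryStorage is a hypothetical stand-in for the processor's in-memory
// storage; the real type and methods differ.
type memoryStorage struct {
	mu     sync.RWMutex
	traces map[string][]byte
}

// get releases the read lock when the method returns, on every code path,
// so a later writer (e.g. a delete during shutdown) is not left waiting
// forever on a read lock that was never unlocked.
func (s *memoryStorage) get(key string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()

	v, ok := s.traces[key]
	return v, ok
}
```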