Shutdown should only be called if Start was called #9682
Comments
@djaglowski I can pick this up, if you haven't already planned to work on it. |
Thanks @VihasMakwana. I recommend we wait to hear from @open-telemetry/collector-maintainers before working on this. |
I've encountered this functionality of calling
I totally understand the idea of "we shouldn't have to shutdown something that hasn't been started", but I have concerns if we remove this requirement. Happy to share more details on my concern if there's more interest in removing this requirement, or if my documentation link is not relevant. |
Thanks for linking the documentation for this @crobert-1.
To be clear, I'm not suggesting we remove the requirement. I just think that the cited issue demonstrates that, due to interactions between components, it may be a difficult problem to fully prevent. Therefore, we could choose to prevent some problems by avoiding calls to Shutdown which we know are at best no-ops. |
Okay, appreciate the clarification. Referring back to the Panics are worse than memory leaks, but just wanted to bring up that we're hitting issues either way. Also, this seems like it's adding complexity for the sake of not requiring components to meet the specification, which doesn't seem right to me. Interested to hear from others, just some things to consider. 👍 |
That's exactly why we have the lifecycle tests auto-generated for every component. One part of the test is to call |
@dmitryax, the difference here is that the lifecycle tests only validate each component in isolation. The problem here was an interaction between two components. I think it's a different kind of problem which we can't very well test for, but could prevent. |
Please see the original report here: #6507 Lifecycle tests at least make sure that each component is responsible for its own lifecycle. We can expand this over time. I believe one action we can take to allow for this is to have each component The journald receiver is one of the last components that doesn't implement lifecycle tests, as per #27849. I would be happy to spend time to make it compatible with generated tests, and it will become more resilient as a result. Between generated tests and go leak tests, we are trying to raise the bar for components so we can eventually try for more complex scenarios, such as the ones introduced with OpAmp, which might make the collector able to reload its configuration. |
Context: open-telemetry/opentelemetry-collector-contrib#31476 (comment)
Describe the bug
A failure to start any component results in a call to Shutdown on all components. Since some components never received a Start call to begin with, some expected state may not be present.
What did you expect to see?
We have had some previous problems with Shutdown causing panics because Start was never called. However, if I recall correctly, we generally operate under the assumption that a component's Shutdown SHOULD only be called after the same component's Start is called.
What did you see instead?
Due to the behavior here, an error returned by any component results in Shutdown being called on every component.
What version did you use?
v0.95.0
Additional Context
Ideally, Shutdown functions should be resilient enough that this problem does not occur in any case. However, it may not always be clear to component authors that they need to protect against scenarios where Start was never called on the component.
Potential Solution
When iterating through the list of components to start, keep track of which ones have actually been started. When shutting down, call Shutdown only on components which had Start called.
Another minor stability benefit here is that the shutdown order will perfectly correspond to the start order (though reversed). Currently, the order is determined independently via a topological sort. In some cases, this sort may be non-deterministic, so the exact sequence of components is not consistent. This is generally not a problem, but in theory could complicate troubleshooting which relies on component startup order.
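The bookkeeping described above could look roughly like the following. This is a minimal sketch, not the collector's actual implementation: the `Component` interface is a simplified stand-in, and `startAll`, `shutdownStarted`, `fake`, and `demo` are hypothetical names introduced only for illustration. It also assumes that a component whose Start returned an error is not shut down, which is one possible reading of the proposal.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Component is a simplified stand-in for a collector component (assumption).
type Component interface {
	Start(ctx context.Context) error
	Shutdown(ctx context.Context) error
}

// startAll starts components in order and returns the ones whose Start
// returned nil, in start order. It stops at the first failure.
func startAll(ctx context.Context, comps []Component) ([]Component, error) {
	started := make([]Component, 0, len(comps))
	for _, c := range comps {
		if err := c.Start(ctx); err != nil {
			return started, err
		}
		started = append(started, c)
	}
	return started, nil
}

// shutdownStarted calls Shutdown only on started components, in reverse
// start order, and joins any errors (errors.Join requires Go 1.20+).
func shutdownStarted(ctx context.Context, started []Component) error {
	var errs error
	for i := len(started) - 1; i >= 0; i-- {
		errs = errors.Join(errs, started[i].Shutdown(ctx))
	}
	return errs
}

// fake is a hypothetical component that records lifecycle calls.
type fake struct {
	name    string
	failing bool
	log     *[]string
}

func (f *fake) Start(context.Context) error {
	if f.failing {
		return fmt.Errorf("%s: start failed", f.name)
	}
	*f.log = append(*f.log, "start "+f.name)
	return nil
}

func (f *fake) Shutdown(context.Context) error {
	*f.log = append(*f.log, "shutdown "+f.name)
	return nil
}

// demo starts four components, the third of which fails, and returns the
// recorded lifecycle calls together with the combined error.
func demo() ([]string, error) {
	var log []string
	comps := []Component{
		&fake{name: "a", log: &log},
		&fake{name: "b", log: &log},
		&fake{name: "c", failing: true, log: &log},
		&fake{name: "d", log: &log},
	}
	ctx := context.Background()
	started, startErr := startAll(ctx, comps)
	if startErr == nil {
		return log, nil
	}
	err := errors.Join(startErr, shutdownStarted(ctx, started))
	return log, err
}
```

Calling `demo()` from a `main` function records Start for a and b, then Shutdown for b and a. Component c fails to start and component d is never reached, so neither receives a Shutdown call. Whether the failing component itself should still be shut down is a design choice this sketch does not settle.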