logstransformprocessor deadlocks under load #16604
This is a similar problem to the one pointed out in #15378. I believe the right fix is for this processor to create a completely new stanza pipeline for each invocation. Then, leveraging the back-propagation work @sumo-drosiek has in his PR #16452, it can loop over the channel and receive from it until all messages are processed. In my case, I want to use the "move" transformer because it appears to be the only transform processor that supports arbitrary path walking on an object. As a temporary workaround, I've invoked the batch process of fromPDataConverter in a goroutine and put a timeout on the output select. While that does prevent the deadlock, the number of goroutines keeps growing over time, because the number of routines selecting from the channel does not match the number of messages being pushed onto it.
Here is a commit that works for my particular case by ensuring processLogs handles the logs pushed to it synchronously. However, it might have issues if any of the stanza operators are filters, as in the other issue linked above. I'll wait for some feedback before moving forward on any of these fixes.
Pinging code owners for processor/logstransform: @djaglowski @dehaansa. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Resolved by #17079
Component(s)
processor/logstransform
What happened?
Description
Under load, the logs transform processor can deadlock.
The primary issue is that every invocation of logstransformprocessor.processLogs shares the same stanza pipeline and a common instance of the emitter. Each invocation pushes log messages into the pipeline's processing channel (a blocking send) and then waits on an output channel at the tail end of that pipeline.
However, because the emitter is shared, an invocation of processLogs does not necessarily receive the logs it pushed through the pipeline. Each log entry sent through the stanza pipeline is appended to a batch in LogEmitter, and that batch is pushed to the outputChannel only once it holds 100 entries (the maximum batch size) or the flush timeout fires.
As a result, multiple concurrent invocations can push messages to be processed while only one of them receives the emitted batch. Every other invocation stays blocked selecting from an outputChannel that has nothing in it.
The remaining invocations (the bottom 3 in the report's illustration) stay blocked because no more logs are being emitted. When a new invocation comes in, only one of the blocked routines becomes unblocked. If the upstream receivers of those pipelines do not time out and cancel their context, this effectively creates a deadlock.
Steps to Reproduce
I have example test cases where you can see the effect of this in commit 7fb27bb.
Expected Result
Each batch of log messages pushed into logstransform should be fully processed by processLogs.
Actual Result
The logs output by each call to processLogs are somewhat arbitrary because emitted batches could contain a combination of logs from other asynchronous invocations.
Collector version
v0.66.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")
OpenTelemetry Collector configuration
No response
Log output
Additional context
No response