-
Notifications
You must be signed in to change notification settings - Fork 750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
appender: fix WorkerGuard not waiting for writer destruction #1713
Conversation
tracing-appender/src/non_blocking.rs
Outdated
@@ -277,7 +277,16 @@ impl Drop for WorkerGuard { | |||
// when the `Worker` calls `recv()` on a zero-capacity channel. Use `send_timeout` | |||
// so that drop is not blocked indefinitely. | |||
// TODO: Make timeout configurable. | |||
let _ = self.shutdown.send_timeout((), Duration::from_millis(1000)); | |||
match self.shutdown.send_timeout((), Duration::from_millis(1000)) { | |||
Err(SendTimeoutError::Disconnected(_)) => (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we get the Disconnected
error, this means that the Receiver
side of the channel has already been dropped...but that doesn't necessarily mean that the worker thread has terminated yet. should we still call join
on the JoinHandle
in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we get the Disconnected error, this means that the Receiver side of the channel has already been dropped...but that doesn't necessarily mean that the worker thread has terminated yet.
right, this can be racy also
If Disconnected
can occur only during worker thread (which owns Receiver
) destruction, I think it is safe to use join
here.
Went ahead and merged Disconnected
with Ok
case.
Co-authored-by: Eliza Weisman <eliza@buoyant.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me overall! i commented on a couple of additional things that might be worth thinking about.
if let Err(e) = self.writer.flush() { | ||
eprintln!("Failed to flush. Error: {}", e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm assuming the flush here was removed because we will already always flush on shutdown in Worker::work()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's why I felt safe to remove it
but the actual reason was that it could block (or introduce undefined delay)(#1125 (comment)) when joining the worker. To be honest, I would've just joined the thread, but the IO argument does make sense and I just followed with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realised that within this logic it might be a good idea to drop writer as well: 45da512. There could be some IO there in destructor.
Can be though of as a continuation to #1120 and #1125. Example with problematic racy behavior: ``` use std::io::Write; struct TestDrop<T: Write>(T); impl<T: Write> Drop for TestDrop<T> { fn drop(&mut self) { println!("Dropped"); } } impl<T: Write> Write for TestDrop<T> { fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> { self.0.write(buf) } fn flush(&mut self) -> std::io::Result<()> { self.0.flush() } } fn main() { let writer = TestDrop(std::io::stdout()); let (non_blocking, _guard) = tracing_appender::non_blocking(writer); tracing_subscriber::fmt().with_writer(non_blocking).init(); } ``` Running this test case in a loop with `while ./test | grep Dropped; do done`, it can be seen that sometimes writer (`TestDrop`) is not dropped and the message is not printed. I suppose that proper destruction of non-blocking writer should properly destroy underlying writer. Solution involves joining `Worker` thread (that owns writer) after waiting for it to almost finish avoiding potential deadlock (see #1120 (comment))
Motivation
Can be though of as a continuation to #1120 and #1125.
Example with problematic racy behavior:
Running this test case in a loop with
while ./test | grep Dropped; do done
, it can be seen that sometimes writer (TestDrop
) is not dropped and the message is not printed.I suppose that proper destruction of non-blocking writer should properly destroy underlying writer.
Solution
Solution involves joining
Worker
thread (that owns writer) after waiting for it to almost finish avoiding potential deadlock (see #1120 (comment))