Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appender: fix WorkerGuard not waiting for writer destruction #1713

Merged
merged 7 commits into from
Nov 13, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
34 changes: 24 additions & 10 deletions tracing-appender/src/non_blocking.rs
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ pub const DEFAULT_BUFFERED_LINES_LIMIT: usize = 128_000;
#[must_use]
#[derive(Debug)]
pub struct WorkerGuard {
_guard: Option<JoinHandle<()>>,
handle: Option<JoinHandle<()>>,
sender: Sender<Msg>,
shutdown: Sender<()>,
}
Expand Down Expand Up @@ -259,7 +259,7 @@ impl<'a> MakeWriter<'a> for NonBlocking {
impl WorkerGuard {
fn new(handle: JoinHandle<()>, sender: Sender<Msg>, shutdown: Sender<()>) -> Self {
WorkerGuard {
_guard: Some(handle),
handle: Some(handle),
sender,
shutdown,
}
Expand All @@ -268,21 +268,35 @@ impl WorkerGuard {

impl Drop for WorkerGuard {
fn drop(&mut self) {
match self
.sender
.send_timeout(Msg::Shutdown, Duration::from_millis(100))
{
let timeout = Duration::from_millis(100);
match self.sender.send_timeout(Msg::Shutdown, timeout) {
Ok(_) => {
// Attempt to wait for `Worker` to flush all messages before dropping. This happens
// when the `Worker` calls `recv()` on a zero-capacity channel. Use `send_timeout`
// so that drop is not blocked indefinitely.
// TODO: Make timeout configurable.
let _ = self.shutdown.send_timeout((), Duration::from_millis(1000));
let timeout = Duration::from_millis(1000);
match self.shutdown.send_timeout((), timeout) {
Err(SendTimeoutError::Timeout(_)) => {
eprintln!(
"Shutting down logging worker timed out after {:?}.",
timeout
);
}
_ => {
// At this point it is safe to wait for `Worker` destruction without blocking
if let Some(handle) = self.handle.take() {
if handle.join().is_err() {
eprintln!("Logging worker thread panicked");
}
};
}
}
}
Err(SendTimeoutError::Disconnected(_)) => (),
Err(SendTimeoutError::Timeout(e)) => println!(
"Failed to send shutdown signal to logging worker. Error: {:?}",
e
Err(SendTimeoutError::Timeout(_)) => eprintln!(
"Sending shutdown signal to logging worker timed out after {:?}",
timeout
),
}
}
Expand Down
6 changes: 2 additions & 4 deletions tracing-appender/src/worker.rs
Original file line number Diff line number Diff line change
Expand Up @@ -73,17 +73,15 @@ impl<T: Write + Send + Sync + 'static> Worker<T> {
match self.work() {
Ok(WorkerState::Continue) | Ok(WorkerState::Empty) => {}
Ok(WorkerState::Shutdown) | Ok(WorkerState::Disconnected) => {
drop(self.writer); // drop now in case it blocks
let _ = self.shutdown.recv();
break;
return;
}
Err(_) => {
// TODO: Expose a metric for IO Errors, or print to stderr
}
}
}
if let Err(e) = self.writer.flush() {
eprintln!("Failed to flush. Error: {}", e);
}
Comment on lines -84 to -86
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm assuming the flush here was removed because we will already always flush on shutdown in Worker::work()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's why I felt safe to remove it

but the actual reason was that it could block (or introduce undefined delay)(#1125 (comment)) when joining the worker. To be honest, I would've just joined the thread, but the IO argument does make sense and I just followed with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realised that within this logic it might be a good idea to drop writer as well: 45da512. There could be some IO there in destructor.

})
}
}