Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PubSub publish timeout #8

Merged
merged 2 commits into from
Apr 26, 2022

Conversation

rnarubin
Copy link
Collaborator

This adds a timeout to publishing messages to PubSub, as well as
improves some logging in the default retry implementation.

The initial timeout implementation here is done a semver-compatible way,
and with a relatively simple configurable fixed timeout. In the next
major semver I'd like to make the configuration required, and
potentially add some time-based retrying and timeout backoff.

@rnarubin rnarubin requested a review from blogle April 22, 2022 21:20
@rnarubin rnarubin self-assigned this Apr 22, 2022
This adds a timeout to publishing messages to PubSub, as well as
improves some logging in the default retry implementation.

The initial timeout implementation here is done a semver-compatible way,
and with a relatively simple configurable fixed timeout. In the next
major semver I'd like to make the configuration required, and
potentially add some time-based retrying and timeout backoff.
@rnarubin rnarubin force-pushed the better_publish_timeout branch from 0bd0ded to b220eea Compare April 22, 2022 23:22
@blogle
Copy link

blogle commented Apr 25, 2022

In the next major semver I'd like to make the configuration required, and potentially add some time-based retrying and timeout backoff.

I see logic for exponential backoff and that code is part of the pubsub client struct. Are we just not using that logic yet, or is this referring to something else?

let publish = client.publish(tonic_request);

// apply a timeout to the publish
let publish_fut = tokio::time::timeout(timeout, publish).map_err(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know all that much about grpc... do we need both a timeout on the client and server side?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Client side is the more obvious one, in case the server never responds, or a network problem, or whatever. Server side is more of a good convention; if the server takes too long and knows the client will timeout, the server is allowed to free resources after that time, which can help responsiveness overall when under load.


// apply a timeout to the publish
let publish_fut = tokio::time::timeout(timeout, publish).map_err(
|tokio::time::error::Elapsed { .. }| {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we hit the server side timeout, does tokio return the same time::error:Elapsed error here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server timeout would result in an error from the wrappee publish future. That's why the match is on a double Result<Result<T, Status> Status>. I'd have used flatten but it's not stable yet rust-lang/rust#70142

backoff_ms = %interval.as_millis(),
remaining_attempts = match self.intervals.size_hint() {
(_lower_bound, Some(upper_bound)) => upper_bound,
(lower_bound, None) => lower_bound
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't grok why the lower bound of the iterator would be indicative of the remaining attempts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's not exactly indicative. I thought about having "unknown" if there's no upper bound. In practice this will never matter because the only possible iterator will always have an upper bound (.take(max_retries.unwrap_or(usize::MAX)))

@rnarubin
Copy link
Collaborator Author

In the next major semver I'd like to make the configuration required, and potentially add some time-based retrying and timeout backoff.

I see logic for exponential backoff and that code is part of the pubsub client struct. Are we just not using that logic yet, or is this referring to something else?

There's exponential backoff for retries (i.e. try again after increasing sleep). Timeouts could also use such a backoff (i.e. wait for a response with increasing timeout), instead of the current fixed timeout, and potentially the same or similar backoff code. However the current retry logic is count-limited and not time-limited, which means if the backoff is expontential and the timeout is exponential, retrying a series of timed-out requests will take N attempts which might add to many minutes of time. If retries were time-limited, then there'd be a better overall cap to the possible delay from a series of timeouts.

Having time-limited retries is a worthwhile change, but will take some work and I want to get these timeouts in place first.

@rnarubin rnarubin merged commit 53b8cae into standard-ai:master Apr 26, 2022
@rnarubin rnarubin deleted the better_publish_timeout branch April 26, 2022 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants