Emit metrics for record_progress endpoint #709
Merged
Previously we were only tracking the worker time, not the endpoint time. We see a direct correlation between a job's throughput and the worker time. This seems wrong to me: as long as the worker keeps up with the input rate, throughput shouldn't be affected.
Note that we believe the worker should not affect the HTTP endpoint at all: the two are connected by a bounded queue, and pushing into the queue is done with `try_send`, which shouldn't block (https://docs.rs/crossbeam-channel/latest/crossbeam_channel/struct.Sender.html#method.try_send) and returns an error if the queue is full. We already emit a metric when the queue is full, and that's not happening here. The hope is that the extra metric gives us some clue to what the problem is.
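To illustrate why the endpoint is expected to be independent of the worker, here is a minimal sketch of the pattern described above. It is not the code from this PR. It uses the standard library's `std::sync::mpsc::sync_channel` as a stand-in for the crossbeam-channel bounded queue, because its `try_send` likewise returns immediately with a `Full` error when the buffer is at capacity. The `record_progress` function, the `Progress` struct, and the `QUEUE_FULL_TOTAL` counter are hypothetical names that stand in for the real handler and metrics backend. The handler times only its own work, which corresponds to the endpoint metric this PR adds.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, SyncSender, TrySendError};
use std::time::{Duration, Instant};

/// Hypothetical counter standing in for the existing "queue full" metric.
static QUEUE_FULL_TOTAL: AtomicU64 = AtomicU64::new(0);

/// Hypothetical progress update sent from the HTTP handler to the worker.
#[derive(Debug)]
struct Progress {
    job_id: u64,
    offset: u64,
}

/// Hypothetical `record_progress` handler: enqueue without blocking and
/// return whether the update was accepted plus how long the handler took
/// (the kind of endpoint timing this PR starts emitting).
fn record_progress(tx: &SyncSender<Progress>, update: Progress) -> (bool, Duration) {
    let start = Instant::now();
    let accepted = match tx.try_send(update) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            // Queue is at capacity: count it instead of waiting for the worker.
            QUEUE_FULL_TOTAL.fetch_add(1, Ordering::Relaxed);
            false
        }
        // The worker side has hung up; nothing can be enqueued.
        Err(TrySendError::Disconnected(_)) => false,
    };
    (accepted, start.elapsed())
}

fn main() {
    // Bounded queue with capacity 2. `_rx` keeps the receiver alive, but no
    // worker drains it, so the third push finds the queue full.
    let (tx, _rx) = sync_channel::<Progress>(2);
    let results: Vec<bool> = (0..3)
        .map(|i| record_progress(&tx, Progress { job_id: 1, offset: i }).0)
        .collect();
    println!("{:?} full={}", results, QUEUE_FULL_TOTAL.load(Ordering::Relaxed));
}
```

Running this prints `[true, true, false] full=1`. With this shape, a slow worker should surface as a rising queue-full count rather than as slower endpoint calls. That is why a correlation between endpoint behaviour and worker time would point to something outside this path.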
Metric graphs: