-
-
Notifications
You must be signed in to change notification settings - Fork 646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional metrics for remote byte stores #20138
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, with some stylistic comments.
src/rust/engine/remote_provider/remote_provider_opendal/src/lib.rs
Outdated
Show resolved
Hide resolved
} | ||
Err(_) => Metric::RemoteStoreWriteErrors, | ||
}; | ||
workunit.increment_counter(result_metric, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging this single step into the two branches above would make the code flow easier, I think. And same for next file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in, have two calls to increment_counter
?
It might read a bit better, but my thinking was that it's nice to "guarantee" that we record the result against some metric, on every code path, which this style does better than having to remember to call increment_counter
on every branch. Only a minor objection, though.
Prompted by #20138 (comment), this adds two helpers for a common pattern: increment a metric counter or record a metric observation _if_ the current thread has a workunit handle set. As a related drive-by, this also notices that `increment_counter` takes `&mut self` but is happy with `&self` (and similarly for one `record_observation` function), and so swaps it to use `&self`.
I'm going to merge this, @stuhood please let me know if I've missed some context about these metrics and I can fix it up retrospectively. |
This adds a bunch of additional metrics to provide more insight into the byte store behaviour: for each of the "read", "write" and "list missing digests" (aka "exists") operations, track:
This mimics the metrics for the remote action cache.
The minor shuffling of code in this PR also fixes a bug with the
RemoteStoreBlobBytesDownloaded
metric: before this PR, any successful network request would count as bytes downloaded, even if the digest didn't exist/wasn't downloaded (i.e. it successfully determined that the digest wasn't cached). This PR restricts the metric to only count digest bytes when the network request is successful and the digest actually existed/was downloaded.I'm hopeful this will at least give more insight into the scale of the problem in #20133, in addition to being generally useful as "what's pants up to" reporting.