
Add additional metrics for remote byte stores #20138

Merged: 5 commits merged on Nov 7, 2023

@huonw (Contributor) commented Nov 1, 2023

This adds several metrics that give more insight into the byte store's behaviour: for each of the "read", "write" and "list missing digests" (aka "exists") operations, it tracks:

  • number of attempts
  • number of successes (for "read", this is further split into "cached" and "uncached", i.e. whether the successful network request found data or not)
  • number of errors
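A minimal Rust sketch of the counting scheme in the list above, assuming a read result of `Ok(Some(bytes))` when the digest was found, `Ok(None)` when the request succeeded but the digest was not cached, and `Err` on failure. The `Metric` variant names and the `Counters`, `record_read` and `record_write` items are hypothetical illustrations, not the PR's actual code; "list missing digests" would follow the same shape as `record_write`.

```rust
use std::collections::HashMap;

// Hypothetical metric names, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Metric {
    ReadAttempts,
    ReadCached,
    ReadUncached,
    ReadErrors,
    WriteAttempts,
    WriteSuccesses,
    WriteErrors,
}

// Toy counter store keyed by metric.
#[derive(Default)]
struct Counters(HashMap<Metric, u64>);

impl Counters {
    fn increment(&mut self, metric: Metric, amount: u64) {
        *self.0.entry(metric).or_insert(0) += amount;
    }

    fn get(&self, metric: Metric) -> u64 {
        self.0.get(&metric).copied().unwrap_or(0)
    }
}

// One read = one attempt, then exactly one of cached / uncached / error.
fn record_read(counters: &mut Counters, result: &Result<Option<Vec<u8>>, String>) {
    counters.increment(Metric::ReadAttempts, 1);
    let outcome = match result {
        Ok(Some(_)) => Metric::ReadCached,  // request succeeded, data found
        Ok(None) => Metric::ReadUncached,   // request succeeded, digest absent
        Err(_) => Metric::ReadErrors,       // request failed
    };
    counters.increment(outcome, 1);
}

// One write = one attempt, then exactly one of success / error.
fn record_write(counters: &mut Counters, result: &Result<(), String>) {
    counters.increment(Metric::WriteAttempts, 1);
    let outcome = match result {
        Ok(()) => Metric::WriteSuccesses,
        Err(_) => Metric::WriteErrors,
    };
    counters.increment(outcome, 1);
}
```

With this shape, attempts always equal the sum of the outcome counters for each operation, which makes the metrics easy to cross-check.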

This mimics the metrics for the remote action cache.

The minor code reshuffling in this PR also fixes a bug in the RemoteStoreBlobBytesDownloaded metric: previously, any successful network request counted the digest's bytes as downloaded, even when the digest didn't exist remotely and nothing was downloaded (i.e. the request successfully determined that the digest wasn't cached). This PR restricts the metric to count digest bytes only when the network request succeeds and the digest actually existed and was downloaded.
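The difference between the old and fixed behaviour of RemoteStoreBlobBytesDownloaded can be sketched as two small functions. This is an illustration under assumptions, not the Pants implementation: `Digest`, `ReadResult` and both function names are hypothetical, and a read result is modelled as `Ok(Some(bytes))` (found), `Ok(None)` (not cached) or `Err` (request failed).

```rust
// Hypothetical stand-in for a digest: only its size matters here.
struct Digest {
    size_bytes: u64,
}

// Ok(Some(bytes)): found and downloaded; Ok(None): request succeeded but the
// digest was not cached; Err: the request itself failed.
type ReadResult = Result<Option<Vec<u8>>, String>;

// Old behaviour as described: any successful request counted the digest's bytes.
fn bytes_downloaded_old(digest: &Digest, result: &ReadResult) -> u64 {
    match result {
        Ok(_) => digest.size_bytes,
        Err(_) => 0,
    }
}

// Fixed behaviour: count bytes only when the digest existed and was downloaded.
fn bytes_downloaded_fixed(digest: &Digest, result: &ReadResult) -> u64 {
    match result {
        Ok(Some(_)) => digest.size_bytes,
        Ok(None) | Err(_) => 0,
    }
}
```

The two versions only disagree on `Ok(None)`, which is exactly the "successfully determined that the digest wasn't cached" case.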

I'm hopeful this will at least give more insight into the scale of the problem in #20133, in addition to being generally useful as "what's pants up to" reporting.

@huonw added the category:internal label (CI, fixes for not-yet-released features, etc.) Nov 1, 2023
@huonw marked this pull request as ready for review November 2, 2023 04:00
@huonw requested review from stuhood and tgolsson November 2, 2023 04:00
@tgolsson (Contributor) left a comment

Looks good to me, with some stylistic comments.

```rust
    // Reviewed excerpt; the preceding lines of the `match` are not shown.
    }
    Err(_) => Metric::RemoteStoreWriteErrors,
};
workunit.increment_counter(result_metric, 1);
```
Contributor:

I think merging this single step into the two branches above would make the code flow easier to follow. The same applies to the next file.

Contributor (Author):

As in, have two calls to `increment_counter`?

It might read a bit better, but my thinking was that it's nice to "guarantee" that the result is recorded against some metric on every code path. This style does that better than having to remember to call `increment_counter` in every branch. It's only a minor objection, though.
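The style being defended can be sketched in isolation: the `match` evaluates to a metric, and a single call after it records the result. This is a toy reconstruction, not the PR's code; `Workunit` here is a hypothetical stand-in, and `RemoteStoreWriteSuccesses` is a hypothetical name (only `RemoteStoreWriteErrors` appears in the reviewed excerpt).

```rust
use std::cell::RefCell;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Metric {
    RemoteStoreWriteSuccesses, // hypothetical name
    RemoteStoreWriteErrors,
}

// Toy stand-in for a workunit: remembers every (metric, amount) it is given.
#[derive(Default)]
struct Workunit {
    recorded: RefCell<Vec<(Metric, u64)>>,
}

impl Workunit {
    fn increment_counter(&self, metric: Metric, amount: u64) {
        self.recorded.borrow_mut().push((metric, amount));
    }
}

fn finish_write(workunit: &Workunit, result: Result<usize, String>) -> Result<usize, String> {
    // Every arm must evaluate to a `Metric` (anything else is a type error),
    // and the match must be exhaustive, so no path can skip choosing one.
    let result_metric = match &result {
        Ok(_) => Metric::RemoteStoreWriteSuccesses,
        Err(_) => Metric::RemoteStoreWriteErrors,
    };
    // A single recording point shared by every code path.
    workunit.increment_counter(result_metric, 1);
    result
}
```

The alternative suggested in review puts an `increment_counter` call inside each arm; that reads more directly, but a newly added arm could forget the call without any compiler complaint.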

huonw added a commit that referenced this pull request Nov 5, 2023
Prompted by
#20138 (comment),
this adds two helpers for a common pattern: increment a metric counter
or record a metric observation _if_ the current thread has a workunit
handle set.

As a related drive-by, this also notices that `increment_counter` takes
`&mut self` but is happy with `&self` (and similarly for one
`record_observation` function), and so swaps it to use `&self`.
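The commit message describes helpers that record a metric only if the current thread has a workunit handle set, and notes that `&self` suffices. A minimal sketch of that pattern, assuming a thread-local `Option` handle and a `Mutex`-guarded store for interior mutability; `WorkunitStore`, `CURRENT_HANDLE`, the `*_if_in_workunit` helpers and the metric names are all hypothetical, not Pants' actual API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for a workunit's metric storage.
#[derive(Default)]
struct WorkunitStore {
    counters: Mutex<HashMap<&'static str, u64>>,
    observations: Mutex<HashMap<&'static str, Vec<u64>>>,
}

impl WorkunitStore {
    // `&self` is enough: the Mutex provides interior mutability.
    fn increment_counter(&self, name: &'static str, amount: u64) {
        *self.counters.lock().unwrap().entry(name).or_insert(0) += amount;
    }

    fn record_observation(&self, name: &'static str, value: u64) {
        self.observations.lock().unwrap().entry(name).or_default().push(value);
    }

    fn counter(&self, name: &'static str) -> u64 {
        self.counters.lock().unwrap().get(name).copied().unwrap_or(0)
    }

    fn observed(&self, name: &'static str) -> Vec<u64> {
        self.observations.lock().unwrap().get(name).cloned().unwrap_or_default()
    }
}

thread_local! {
    // The handle (if any) for the workunit running on this thread.
    static CURRENT_HANDLE: RefCell<Option<Arc<WorkunitStore>>> = RefCell::new(None);
}

fn set_thread_handle(handle: Option<Arc<WorkunitStore>>) {
    CURRENT_HANDLE.with(|h| *h.borrow_mut() = handle);
}

fn current_handle() -> Option<Arc<WorkunitStore>> {
    CURRENT_HANDLE.with(|h| h.borrow().clone())
}

// Hypothetical helper: increments only if this thread has a handle set.
// Returns whether anything was recorded.
fn increment_counter_if_in_workunit(name: &'static str, amount: u64) -> bool {
    match current_handle() {
        Some(store) => {
            store.increment_counter(name, amount);
            true
        }
        None => false,
    }
}

// Hypothetical helper: records an observation only if a handle is set.
fn record_observation_if_in_workunit(name: &'static str, value: u64) -> bool {
    match current_handle() {
        Some(store) => {
            store.record_observation(name, value);
            true
        }
        None => false,
    }
}
```

Call sites then need a single line instead of repeating the "is there a handle?" check, and silently do nothing when no workunit is active.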
@huonw (Contributor, Author) commented Nov 7, 2023

I'm going to merge this. @stuhood, please let me know if I've missed some context about these metrics, and I can fix it up retrospectively.

@huonw merged commit eeb4905 into main Nov 7, 2023
@huonw deleted the huonw/more-store-metrics branch November 7, 2023 06:34
Labels: category:internal (CI, fixes for not-yet-released features, etc.)
2 participants