Add additional metrics for remote byte stores #20138

huonw · 2023-11-01T23:58:27Z

This adds a bunch of additional metrics to provide more insight into the byte store behaviour: for each of the "read", "write" and "list missing digests" (aka "exists") operations, track:

- number of attempts
- number of successes (for "read", this is further split into "cached" and "uncached", i.e. whether the successful network request found data or not)
- number of errors

This mimics the metrics for the remote action cache.

The minor shuffling of code in this PR also fixes a bug with the RemoteStoreBlobBytesDownloaded metric: before this PR, any successful network request would count as bytes downloaded, even if the digest didn't exist/wasn't downloaded (i.e. it successfully determined that the digest wasn't cached). This PR restricts the metric to only count digest bytes when the network request is successful and the digest actually existed/was downloaded.
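To make the corrected accounting concrete, here is a minimal sketch of how a single read could be classified. It is not the code from this PR: apart from `RemoteStoreBlobBytesDownloaded`, every name here (`ReadAttempts`, `ReadCached`, `ReadUncached`, `ReadErrors`, `Counters`, `record_read`) is a hypothetical stand-in, and the in-memory `HashMap` is only an assumption in place of the engine's real workunit counters.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the read counters described above; only
/// `RemoteStoreBlobBytesDownloaded` is a name taken from the PR text.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Metric {
    ReadAttempts,
    ReadCached,
    ReadUncached,
    ReadErrors,
    RemoteStoreBlobBytesDownloaded,
}

/// Minimal in-memory counter sink (an assumption, not the engine's workunit store).
#[derive(Default)]
pub struct Counters(HashMap<Metric, u64>);

impl Counters {
    pub fn increment(&mut self, metric: Metric, by: u64) {
        *self.0.entry(metric).or_insert(0) += by;
    }

    pub fn get(&self, metric: Metric) -> u64 {
        self.0.get(&metric).copied().unwrap_or(0)
    }
}

/// Record the outcome of one read.
/// `Ok(Some(n))`: the digest existed and `n` bytes were downloaded.
/// `Ok(None)`: the request succeeded but the digest was not in the store.
/// `Err(_)`: the request itself failed.
pub fn record_read(counters: &mut Counters, result: &Result<Option<u64>, String>) {
    counters.increment(Metric::ReadAttempts, 1);
    let outcome = match result {
        Ok(Some(bytes)) => {
            // Bytes are counted only when data was actually downloaded.
            counters.increment(Metric::RemoteStoreBlobBytesDownloaded, *bytes);
            Metric::ReadCached
        }
        // A successful request that finds nothing downloads no bytes.
        Ok(None) => Metric::ReadUncached,
        Err(_) => Metric::ReadErrors,
    };
    counters.increment(outcome, 1);
}
```

With this shape, a read that successfully determines the digest is absent bumps the "uncached" counter but leaves the bytes-downloaded total untouched, which is the behaviour the fix is after.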

I'm hopeful this will at least give more insight into the scale of the problem in #20133, in addition to being generally useful as "what's pants up to" reporting.

tgolsson

Looks good to me, with some stylistic comments.

src/rust/engine/remote_provider/remote_provider_opendal/src/lib.rs

tgolsson · 2023-11-02T21:01:37Z

src/rust/engine/fs/store/src/remote.rs

+                    }
+                    Err(_) => Metric::RemoteStoreWriteErrors,
+                };
+                workunit.increment_counter(result_metric, 1);


Merging this single step into the two branches above would make the code flow easier, I think. And same for next file.

huonw · 2023-11-02

As in, have two calls to increment_counter?

It might read a bit better, but my thinking was that it's nice to "guarantee" that we record the result against some metric, on every code path, which this style does better than having to remember to call increment_counter on every branch. Only a minor objection, though.
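For readers following the thread, the two styles under discussion can be sketched side by side. This is an illustration only: `Recorder`, `record_per_branch`, `record_once` and `RemoteStoreWriteSuccesses` are hypothetical names, and only `RemoteStoreWriteErrors` and `increment_counter` appear in the diff above.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Metric {
    RemoteStoreWriteSuccesses, // hypothetical name
    RemoteStoreWriteErrors,    // name taken from the diff
}

/// Hypothetical recorder standing in for a workunit.
#[derive(Default)]
pub struct Recorder {
    pub successes: u64,
    pub errors: u64,
}

impl Recorder {
    pub fn increment_counter(&mut self, metric: Metric, by: u64) {
        match metric {
            Metric::RemoteStoreWriteSuccesses => self.successes += by,
            Metric::RemoteStoreWriteErrors => self.errors += by,
        }
    }
}

/// Style suggested in review: each branch records its own metric.
/// A newly added branch has to remember to call `increment_counter`.
pub fn record_per_branch(rec: &mut Recorder, result: &Result<(), String>) {
    match result {
        Ok(()) => rec.increment_counter(Metric::RemoteStoreWriteSuccesses, 1),
        Err(_) => rec.increment_counter(Metric::RemoteStoreWriteErrors, 1),
    }
}

/// Style used in the diff: the match must produce a metric on every path,
/// so the compiler rejects a branch that forgets to choose one.
pub fn record_once(rec: &mut Recorder, result: &Result<(), String>) {
    let result_metric = match result {
        Ok(()) => Metric::RemoteStoreWriteSuccesses,
        Err(_) => Metric::RemoteStoreWriteErrors,
    };
    rec.increment_counter(result_metric, 1);
}
```

Both functions record the same counts; the difference is that in `record_once` the type checker enforces that every branch yields a `Metric`, whereas `record_per_branch` relies on each branch making the call itself.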

Prompted by #20138 (comment), this adds two helpers for a common pattern: increment a metric counter or record a metric observation _if_ the current thread has a workunit handle set. As a related drive-by, this also notices that `increment_counter` takes `&mut self` but is happy with `&self` (and similarly for one `record_observation` function), and so swaps it to use `&self`.
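A rough sketch of what such a helper might look like, under stated assumptions: `WorkunitHandle`, `set_thread_handle`, `increment_counter_if_in_workunit` and the `RemoteStoreReadAttempts` variant are illustrative names rather than the engine's actual API, and the handle is modelled as a thread-local `Option`. The `Mutex` inside the handle is what lets `increment_counter` take `&self` rather than `&mut self` in this sketch.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Metric {
    RemoteStoreReadAttempts, // hypothetical name
}

/// Hypothetical workunit handle. Counters sit behind a `Mutex`, so
/// incrementing only needs a shared reference.
#[derive(Clone, Default)]
pub struct WorkunitHandle {
    counters: Arc<Mutex<HashMap<Metric, u64>>>,
}

impl WorkunitHandle {
    pub fn increment_counter(&self, metric: Metric, by: u64) {
        *self.counters.lock().unwrap().entry(metric).or_insert(0) += by;
    }

    pub fn counter(&self, metric: Metric) -> u64 {
        self.counters.lock().unwrap().get(&metric).copied().unwrap_or(0)
    }
}

thread_local! {
    // The handle for the current thread, if any (an assumption about how
    // "has a workunit handle set" could be modelled).
    static CURRENT_HANDLE: RefCell<Option<WorkunitHandle>> = RefCell::new(None);
}

/// Set or clear the current thread's handle.
pub fn set_thread_handle(handle: Option<WorkunitHandle>) {
    CURRENT_HANDLE.with(|h| *h.borrow_mut() = handle);
}

/// Increment `metric` if this thread has a handle set; otherwise do nothing.
pub fn increment_counter_if_in_workunit(metric: Metric, by: u64) {
    CURRENT_HANDLE.with(|h| {
        if let Some(handle) = h.borrow().as_ref() {
            handle.increment_counter(metric, by);
        }
    });
}
```

Call sites then reduce to a single line such as `increment_counter_if_in_workunit(Metric::RemoteStoreReadAttempts, 1)`, and code running outside any workunit silently records nothing.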

huonw · 2023-11-07T06:34:28Z

I'm going to merge this, @stuhood please let me know if I've missed some context about these metrics and I can fix it up retrospectively.

huonw added 3 commits November 2, 2023 10:30

Basic store read/write attempt/success/fail metrics

e7ceef7

Metrics for existence queries

300a955

Docs

0f47866

huonw added the category:internal CI, fixes for not-yet-released features, etc. label Nov 1, 2023

huonw marked this pull request as ready for review November 2, 2023 04:00

huonw requested review from stuhood and tgolsson November 2, 2023 04:00

tgolsson approved these changes Nov 2, 2023

huonw mentioned this pull request Nov 2, 2023

Add helper for conditionally recording metrics #20143

Merged

huonw added 2 commits November 6, 2023 09:32

Merge remote-tracking branch 'upstream/main' into huonw/more-store-me…

83a589c

…trics

Use metric helpers

c37484f

huonw merged commit eeb4905 into main Nov 7, 2023

huonw deleted the huonw/more-store-metrics branch November 7, 2023 06:34

huonw mentioned this pull request Dec 6, 2023

GitHub Actions Cache backend easily hits rate limit errors #20133

Open
