This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

TransactionScheduler: detailed consume worker metrics #33895

Merged: 2 commits, merged Nov 20, 2023


apfitzge
Contributor

Problem

  • ConsumeWorker threads do not report any detailed transaction processing metrics

Summary of Changes

  • Add detailed metrics to be collected by consume worker threads
  • Because worker threads are lazy, i.e. they sleep unless there is work to do, the scheduler controller reports the worker metrics rather than the workers themselves
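The split of responsibilities described above can be sketched in Rust as follows. This is a minimal sketch, not the PR's code: `ConsumeWorkerMetrics`, `consume_worker`, the field names, and the use of a `std::sync::mpsc` channel are hypothetical stand-ins. It illustrates why a sleeping worker cannot drive its own reporting: the worker blocks on its channel and only touches shared atomic counters when a batch arrives, while the controller thread is the one that reads and resets them.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Instant;

/// Hypothetical per-worker metrics, written by the worker and read by the controller.
#[derive(Default)]
struct ConsumeWorkerMetrics {
    batches_processed: AtomicU64,
    transactions_processed: AtomicU64,
    process_us: AtomicU64,
}

/// Hypothetical worker loop. Iterating the receiver blocks (the thread sleeps)
/// until a batch arrives, so a reporting timer placed here could not fire while idle.
fn consume_worker(receiver: mpsc::Receiver<Vec<u64>>, metrics: Arc<ConsumeWorkerMetrics>) {
    for batch in receiver {
        let start = Instant::now();
        // Stand-in for executing the batch's transactions.
        let processed = batch.len() as u64;
        metrics.batches_processed.fetch_add(1, Ordering::Relaxed);
        metrics.transactions_processed.fetch_add(processed, Ordering::Relaxed);
        metrics
            .process_us
            .fetch_add(start.elapsed().as_micros() as u64, Ordering::Relaxed);
    }
}

fn main() {
    let metrics = Arc::new(ConsumeWorkerMetrics::default());
    let (sender, receiver) = mpsc::channel();
    let worker = {
        let metrics = Arc::clone(&metrics);
        thread::spawn(move || consume_worker(receiver, metrics))
    };

    // The scheduler controller hands out work...
    sender.send(vec![1, 2, 3]).unwrap();
    sender.send(vec![4, 5]).unwrap();
    // Closing the channel ends the worker loop; done here only so the demo terminates.
    drop(sender);
    worker.join().unwrap();

    // ...and the controller, not the worker, drains the counters when it reports.
    let batches = metrics.batches_processed.swap(0, Ordering::Relaxed);
    let transactions = metrics.transactions_processed.swap(0, Ordering::Relaxed);
    println!("batches={batches} transactions={transactions}");
}
```

Run as-is, this prints `batches=2 transactions=5`. Keeping the counters in shared atomics means the worker never needs to wake up just to publish them.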

Fixes #


codecov bot commented Oct 27, 2023

Codecov Report

Merging #33895 (8201125) into master (8c8cd66) will decrease coverage by 0.1%.
Report is 12 commits behind head on master.
The diff coverage is 93.6%.

❗ Current head 8201125 differs from pull request most recent head 76ca5ea. Consider uploading reports for the commit 76ca5ea to get more accurate results.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #33895     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         819      819             
  Lines      220119   220324    +205     
=========================================
+ Hits       180390   180554    +164     
- Misses      39729    39770     +41     

github-actions bot added the stale label ([bot only] Added to stale content; results in auto-close after a week.) Nov 13, 2023
apfitzge removed the stale label Nov 13, 2023
apfitzge force-pushed the consume_worker_metrics branch from c7b1fff to 8201125 November 17, 2023 20:50
Comment on lines 169 to 170
if self.interval.should_update(REPORT_INTERVAL_MS)
&& self.latch.swap(false, Ordering::Relaxed)
Contributor Author

So the three metric kinds (count, error, timing) all share the same report interval and this latch variable. The latch tells us whether anything happened, i.e. whether the worker thread did more than sleep for the entire interval. When the worker receives work, it sets the latch to true; the scheduler resets it to false on the first interval expiration after that.
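A minimal Rust sketch of the latch pattern described in this comment follows. Only the condition `self.interval.should_update(REPORT_INTERVAL_MS) && self.latch.swap(false, Ordering::Relaxed)` comes from the diff; the `WorkerMetrics` struct, its fields, the atomic-timestamp interval check, the 20 ms interval value, and passing the current time in as `now_ms` (to keep the demo deterministic) are all assumptions for illustration. The point it shows is the short-circuit: the latch is only swapped back to `false` once the interval has expired, so the first expiration after any work yields exactly one report, while idle intervals yield nothing.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical interval value; the PR's actual REPORT_INTERVAL_MS is not shown.
const REPORT_INTERVAL_MS: u64 = 20;

/// Hypothetical metrics shared by one consume worker and the scheduler controller.
#[derive(Default)]
struct WorkerMetrics {
    /// Raised by the worker when it receives work; cleared by the scheduler when it reports.
    latch: AtomicBool,
    /// One example "count" metric; error and timing metrics would share the same latch.
    num_transactions: AtomicU64,
    /// Timestamp (ms) of the last interval expiration.
    last_update_ms: AtomicU64,
}

impl WorkerMetrics {
    /// Worker side: accumulate the count, then raise the latch.
    fn record_batch(&self, num_transactions: u64) {
        self.num_transactions.fetch_add(num_transactions, Ordering::Relaxed);
        self.latch.store(true, Ordering::Relaxed);
    }

    /// Returns true, and restarts the interval, once REPORT_INTERVAL_MS has elapsed.
    /// compare_exchange ensures only one caller wins a given expiration.
    fn should_update(&self, now_ms: u64) -> bool {
        let last = self.last_update_ms.load(Ordering::Relaxed);
        now_ms.saturating_sub(last) >= REPORT_INTERVAL_MS
            && self
                .last_update_ms
                .compare_exchange(last, now_ms, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
    }

    /// Scheduler side: mirrors the condition in the diff. Returns the drained count
    /// when a report would be emitted, or None for an idle or unexpired interval.
    fn maybe_report(&self, now_ms: u64) -> Option<u64> {
        if self.should_update(now_ms) && self.latch.swap(false, Ordering::Relaxed) {
            Some(self.num_transactions.swap(0, Ordering::Relaxed))
        } else {
            None
        }
    }
}

fn main() {
    let metrics = Arc::new(WorkerMetrics::default());

    // Interval expires with no work done: latch is false, nothing is reported.
    assert_eq!(metrics.maybe_report(25), None);

    // A worker thread receives a batch and raises the latch.
    let worker_metrics = Arc::clone(&metrics);
    thread::spawn(move || worker_metrics.record_batch(64))
        .join()
        .unwrap();

    // Interval (restarted at 25 ms) has not expired: no report, latch stays set.
    assert_eq!(metrics.maybe_report(30), None);
    // First expiration after the work: one report, and the latch is cleared.
    assert_eq!(metrics.maybe_report(45), Some(64));
    // Following interval with no new work: silent again.
    assert_eq!(metrics.maybe_report(70), None);
    println!("ok");
}
```

Because `&&` short-circuits, an unexpired interval leaves the latch untouched, so work recorded mid-interval is still reported at the next expiration rather than being lost.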

apfitzge marked this pull request as ready for review November 17, 2023 22:08
apfitzge requested a review from tao-stones November 17, 2023 22:08

tao-stones (Contributor) left a comment

LGTM. This set of metrics, combined with the QoS and banking stats, should provide a good view of how the scheduler performs.

apfitzge merged commit 8a298f1 into solana-labs:master Nov 20, 2023
17 checks passed
2 participants