feat: new estimator variance check #7197

Merged
merged 3 commits on Jul 19, 2022

Conversation

jakmeier
Contributor

The condition to check for high variance of measurements so far has been
that no single sample is further than 15% away from the mean. We often
missed that when running with 5 iterations and one warm-up iteration.
The counter-intuitive side of this check is that it becomes less likely
that variance is considered low enough as more iterations are added.

The new condition computes the actual variance and standard deviation.
Now we expect that the standard deviation is lower than 10% of the mean.

Also add an outer loop to the block overhead computation. Usually there
is a loop + a number of actions inside a block. But this estimation
has empty blocks, and therefore each measurement so far was
just a single apply-block call. Grouping 10 empty blocks together
for each measurement and adding an outer loop for `--iters` reduces
the overall variance.
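The new check could be sketched roughly as follows. This is a minimal illustration, not the actual estimator code; the function name and signature are made up, but the statistics match the description: flag a measurement set as high-variance when the sample standard deviation exceeds 10% of the mean.

```rust
// Hedged sketch of the new variance check (names are illustrative).
// A measurement set is flagged as high-variance when the sample
// standard deviation (n - 1 divisor) exceeds 10% of the mean.
fn is_high_variance(samples: &[f64]) -> bool {
    if samples.len() <= 1 {
        // Too few samples to estimate variance; treat as high variance.
        return true;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance: Bessel's correction (divide by n - 1).
    let variance =
        samples.iter().map(|v| (mean - v).powi(2)).sum::<f64>() / (n - 1.0);
    variance.sqrt() > 0.1 * mean
}
```

Unlike the old per-sample 15% rule, this criterion does not get harder to satisfy simply because more iterations are added.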

@jakmeier jakmeier requested a review from matklad July 15, 2022 17:57
@jakmeier
Contributor Author

Main motivation behind this: CE results have a lot of UNCERTAIN HIGH-VARIANCE results.

For time measurements, the block overhead in particular was marked uncertain. And now that it gets subtracted from many other measurements, this infects all those other estimations that then also get marked uncertain.

@jakmeier jakmeier marked this pull request as ready for review July 18, 2022 07:52
@jakmeier jakmeier requested a review from a team as a code owner July 18, 2022 07:52
Contributor

@matklad matklad left a comment


This is reasonable!

I wonder though, should we add an extra check for outliers (eg, check that no measurement is 2x larger than all other ones, or something)?

Sometimes we have a case where, eg, a first iteration is very slow (due to some caching), and it'd be useful to catch those specifically.

@@ -441,6 +441,16 @@ impl ops::AddAssign for GasCost {
}
}

impl iter::Sum for GasCost {
    fn sum<I: Iterator<Item = Self>>(mut iter: I) -> Self {
        let mut accum = iter.next().unwrap_or(GasCost::zero(GasMetric::Time));
Contributor


Yeah, I recall it being tricky to deal with the case where there are no gas costs, so you don't know the metric. If we rely on this being a neutral element (e.g. even if the actual metric is icount), then perhaps let's add a zero-arg `GasCost::zero()`?

Actually, thinking about this more, the reason why GasCost tracks the metric is historical:

originally, we measured only time or only icount, and had to thread the metric throughout. Since then, we've switched to a multidimensional metric (i.e. we count IO costs separately).

So perhaps a bigger refactor is due, where we say that GasCost is a product of measurements for several metrics, where each metric is an Option (so each specific gas cost knows which metrics it actually tracked). Then zero would naturally return all components as Some(0) (i.e. zero includes all metrics, measured as zero).
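A rough sketch of that refactor direction (the struct shape and field names here are purely illustrative assumptions, not the actual nearcore types):

```rust
// Hypothetical sketch: each metric component is an Option, so a cost
// knows which metrics it actually tracked, and zero() includes all of
// them. Field names are made up for illustration.
#[derive(Clone, Debug, PartialEq)]
struct GasCost {
    time_ns: Option<f64>,
    instructions: Option<f64>,
    io_read_bytes: Option<f64>,
    io_write_bytes: Option<f64>,
}

impl GasCost {
    // Zero across every metric, so it is a neutral element for sums
    // regardless of which metric a measurement later fills in.
    fn zero() -> Self {
        GasCost {
            time_ns: Some(0.0),
            instructions: Some(0.0),
            io_read_bytes: Some(0.0),
            io_write_bytes: Some(0.0),
        }
    }
}
```

With this shape, `iter::Sum` no longer needs a metric argument to produce a neutral starting element.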

Contributor Author


Yeah such a refactor makes sense, I'll do it after merging this.


!all_below_threshold
let variance =
samples.iter().map(|value| (mean - *value).powi(2)).sum::<f64>() / samples.len() as f64;
Contributor


if samples.len() <= 1 { return f64::INFINITY }
/ (samples.len() - 1)

Contributor Author


I guessed you meant to say two things here

  1. the method should be robust to all type-correct inputs, such as empty samples (=> I changed it to return true in such a case.)
  2. We should use the sample standard deviation instead, i.e. divide by n-1 instead of n. (=> done)

If I misinterpreted your comment, please let me know :)

@jakmeier
Contributor Author

I wonder though, should we add an extra check for outliers (eg, check that no measurement is 2x larger than all other ones, or something)?

Sometimes we have a case where, eg, a first iteration is very slow (due to some caching), and it'd be useful to catch those specifically.

@matklad If I understand your concern correctly, then I believe variance is already good enough for that because it is quadratically sensitive to large deviations. Let me maybe put some numbers on the table as an example. CE runs with 1 warm-up iteration and 5 measured iterations. If the non-warm-up results were [200,100,100,100,100] then mean = 120, variance = 2000, and standard deviation = 44.72. So the percentage of stddev / mean = 37.27%, far above the 10% the new check is set at.

Or do you suggest adding an extra check, in the sense that we would mark it not as "HIGH-VARIANCE" in such a case but instead "SINGLE-OUTLIER"?
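For concreteness, the numbers in the example above can be reproduced with a small helper (illustrative only, not estimator code):

```rust
// Reproduces the worked example: mean, sample variance, and standard
// deviation for a set of measurements.
fn stats(samples: &[f64]) -> (f64, f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance: divide by n - 1.
    let variance =
        samples.iter().map(|v| (mean - v).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, variance, variance.sqrt())
}
```

For `[200, 100, 100, 100, 100]` this gives mean 120, variance 2000, and standard deviation ≈ 44.72, i.e. stddev / mean ≈ 37.27%, well above the 10% threshold.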

@matklad
Contributor

matklad commented Jul 19, 2022

I believe variance is already good enough for that because it is quadratically sensitive to large deviations

Yeah, this makes sense, I guess just variance is enough then!

@near-bulldozer near-bulldozer bot merged commit 39ce78d into near:master Jul 19, 2022
nikurt pushed a commit that referenced this pull request Jul 20, 2022
near-bulldozer bot pushed a commit that referenced this pull request Jul 21, 2022
* Display block and chunk producers of an epoch.

* Epoch length

* test: simplify mocking in setup_mock_all_validators (#7194)

There are many lovable aspects to `setup_mock_all_validators`! What this PR deals with is "tying the knot" infra. `setup_mock_validator` is based on creating `n` validators and setting up the mock network between them. To create a mock network, you need all `n` validators, and to create a validator, you need a mock network. The cycle is broken at runtime, in three different ways: 

* inside `setup_mock_all_validators`, we have `locked_connectors` 
* at some call-sites, we overwrite the whole `peer_manager_mock`
* yet at other call-sites, we essentially create a copy of `locked_connectors`

This PR unifies that to a single infra: 

* there's a single `Arc<OnceCell<Vec<Addr<Client>>>>` inside `setup_mock_all_validators`
* `OnceCell::wait` is used as a barrier 
* The `&[Addr<Client>]` is passed in as an argument to `peer_manager_mock`, so that it doesn't have to re-invent a way to get a hold of those

This changes the signature of the function, so the diff is pretty large. It is designed to be reviewed per-commit.

* Fix m1 build issue (#7199)

* renamed Validator to AccountData (#7206)

https://nearinc.atlassian.net/browse/CPR-94

* feat: new estimator variance check (#7197)


* test: more useful assert (#7211)

Got this one spuriously(?) firing in some test with nightly features.
Let's make the failure a bit more helpful here!

* trie_key: don’t store single bytes as slices (#7200)

trie_key module stores separators and keys as slices even though they
are all just one-byte long.  There are parts of the code which assume
the length to be one and won’t work if the length ever changes.

Simplify the code by keeping the values as plain bytes rather than
slices so that it’s explicit in their type that they are always just
one byte.

* test: simplify cops support in KeyValue runtime (#7213)

* eagerly compute the assignment of validators to blocks & chunks, which I find a bit easier to wrap my head around (you can print `EpochValidatorSet`s and just see the result)
* compress the code by centralizing cfg handling

* Refactor: remove some dead code (#7216)

Closes #7203 

There are a few structs that do not need to derive `Serialize` and `Deserialize`. Removing the derives highlights that there are some structs that we do not need at all, and that we do not need to define a custom deserializer: `bytes_as_string`.

Tests
-------
- `cargo check --tests --benches` does not complain.
- Other existing tests

* refactor: remove metric field in `GasCost` (#7215)

Explicitly tracking the metric inside `GasCost` is awkward for zero
values and it requires the estimator configuration to be passed around
more than strictly necessary.

This change makes the metric implicit, by virtue of having or not having
the measurement values stored inside.

For conversion to gas, and when printing, `ICount` measurements are
given priority over `Time` measurements when values for both exist.
This could be changed if both need to be tracked simultaneously; for
now only one of them is of interest at a time.

* export rocksdb cf size as prometheus metric (#6635)

Creating a function to get integer RocksDB properties for every column and adding a way to export them as a Prometheus metric.
Reorganising `parse_statistics` to write stats into a provided result param.

* refactor: `call_main` utility for integration test (#7217)

We have been copying the same code multiple times; this refactor
pulls the functionality out into `TestEnv`.

* Replace a map with a for-loop

Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
Co-authored-by: Phuong Nguyen <ChaoticTempest@users.noreply.github.com>
Co-authored-by: pompon0 <pompon.pompon@gmail.com>
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Co-authored-by: Michal Nazarewicz <mina86@mina86.com>
Co-authored-by: Akhilesh Singhania <akhi3030@gmail.com>
Co-authored-by: posvyatokum <posvyatokum@edu.hse.ru>