feat: new estimator variance check #7197

Merged
merged 3 commits on Jul 19, 2022

Conversation

jakmeier
Contributor

The condition to check for high variance of measurements so far has been
that no single sample is further than 15% away from the mean. We often
missed that when running with 5 iterations and one warm-up iteration.
The counter-intuitive side of this check is that it becomes less likely
that variance is considered low enough as more iterations are added.

The new condition computes the actual variance and standard deviation.
Now we expect that the standard deviation is lower than 10% of the mean.

Also add an outer loop to the block overhead computation. Usually there
is a loop + a number of actions inside a block. But this estimation
has empty blocks, and therefore each measurement so far was
just a single apply-block call. Grouping 10 empty blocks together
for each measurement and adding an outer loop for `--iters` reduces
the overall variance.
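The new check could be sketched roughly as follows. This is a minimal illustration, not the actual estimator code; the function name and signature are made up, but the statistics match the description: flag a measurement set as high-variance when the sample standard deviation exceeds 10% of the mean.

```rust
// Hedged sketch of the new variance check (names are illustrative).
// A measurement set is flagged as high-variance when the sample
// standard deviation (n - 1 divisor) exceeds 10% of the mean.
fn is_high_variance(samples: &[f64]) -> bool {
    if samples.len() <= 1 {
        // Too few samples to estimate variance; treat as high variance.
        return true;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance: Bessel's correction (divide by n - 1).
    let variance =
        samples.iter().map(|v| (mean - v).powi(2)).sum::<f64>() / (n - 1.0);
    variance.sqrt() > 0.1 * mean
}
```

Unlike the old per-sample 15% rule, this criterion does not get harder to satisfy simply because more iterations are added.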

@jakmeier jakmeier requested a review from matklad July 15, 2022 17:57
@jakmeier
Contributor Author

Main motivation behind this: CE results have a lot of UNCERTAIN HIGH-VARIANCE results.

For time measurements, the block overhead in particular was marked uncertain. And now that it gets subtracted from many other measurements, this infects all those other estimations that then also get marked uncertain.

@jakmeier jakmeier marked this pull request as ready for review July 18, 2022 07:52
@jakmeier jakmeier requested a review from a team as a code owner July 18, 2022 07:52
Contributor

@matklad matklad left a comment


This is reasonable!

I wonder though, should we add an extra check for outliers (eg, check that no measurement is 2x larger than all other ones, or something)?

Sometimes we have a case where, eg, a first iteration is very slow (due to some caching), and it'd be useful to catch those specifically.

@@ -441,6 +441,16 @@ impl ops::AddAssign for GasCost {
}
}

impl iter::Sum for GasCost {
    fn sum<I: Iterator<Item = Self>>(mut iter: I) -> Self {
        let mut accum = iter.next().unwrap_or(GasCost::zero(GasMetric::Time));
Contributor


Yeah, I recall it being tricky to deal with the case where there are no gas costs, so you don't know the metric. If we rely on this being a neutral element (e.g. even if the actual metric is icount), then perhaps let's add a zero-arg `GasCost::zero()`?

Actually, thinking about this more, the reason why GasCost tracks the metric is historical:

originally, we measured only time or only icount, and had to thread the metric throughout. Since then, we've switched to a multidimensional metric (i.e. we count IO costs separately).

So perhaps a bigger refactor is due, where we say that GasCost is a product of measurements for several metrics, where each metric is an Option (so each specific gas cost knows which metrics it actually tracked). Then zero would naturally return all components as Some(0) (i.e. zero includes all metrics, measured as zero).
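A rough sketch of that refactor direction (the struct shape and field names here are purely illustrative assumptions, not the actual nearcore types):

```rust
// Hypothetical sketch: each metric component is an Option, so a cost
// knows which metrics it actually tracked, and zero() includes all of
// them. Field names are made up for illustration.
#[derive(Clone, Debug, PartialEq)]
struct GasCost {
    time_ns: Option<f64>,
    instructions: Option<f64>,
    io_read_bytes: Option<f64>,
    io_write_bytes: Option<f64>,
}

impl GasCost {
    // Zero across every metric, so it is a neutral element for sums
    // regardless of which metric a measurement later fills in.
    fn zero() -> Self {
        GasCost {
            time_ns: Some(0.0),
            instructions: Some(0.0),
            io_read_bytes: Some(0.0),
            io_write_bytes: Some(0.0),
        }
    }
}
```

With this shape, `iter::Sum` no longer needs a metric argument to produce a neutral starting element.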

Contributor Author


Yeah such a refactor makes sense, I'll do it after merging this.


!all_below_threshold
let variance =
samples.iter().map(|value| (mean - *value).powi(2)).sum::<f64>() / samples.len() as f64;
Contributor


if samples.len() <= 1 { return f64::INFINITY }
/ (samples.len() - 1)

Contributor Author


I guessed you meant to say two things here

  1. the method should be robust to all type-correct inputs, such as empty samples (=> I changed it to return true in such a case.)
  2. We should use the sample standard deviation instead, i.e. divide by n-1 instead of n. (=> done)

If I misinterpreted your comment, please let me know :)

@jakmeier
Contributor Author

I wonder though, should we add an extra check for outliers (eg, check that no measurement is 2x larger than all other ones, or something)?

Sometimes we have a case where, eg, a first iteration is very slow (due to some caching), and it'd be useful to catch those specifically.

@matklad If I understand your concern correctly, then I believe variance is already good enough for that because it is quadratically sensitive to large deviations. Let me maybe put some numbers on the table as an example. CE runs with 1 warm-up iteration and 5 measured iterations. If the non-warm-up results were [200,100,100,100,100] then mean = 120, variance = 2000, and standard deviation = 44.72. So the percentage of stddev / mean = 37.27%, far above the 10% the new check is set at.

Or do you suggest adding an extra check, in the sense that we would mark it not as "HIGH-VARIANCE" in such a case but instead "SINGLE-OUTLIER"?
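For concreteness, the numbers in the example above can be reproduced with a small helper (illustrative only, not estimator code):

```rust
// Reproduces the worked example: mean, sample variance, and standard
// deviation for a set of measurements.
fn stats(samples: &[f64]) -> (f64, f64, f64) {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance: divide by n - 1.
    let variance =
        samples.iter().map(|v| (mean - v).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, variance, variance.sqrt())
}
```

For `[200, 100, 100, 100, 100]` this gives mean 120, variance 2000, and standard deviation ≈ 44.72, i.e. stddev / mean ≈ 37.27%, well above the 10% threshold.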

@matklad
Contributor

matklad commented Jul 19, 2022

I believe variance is already good enough for that because it is quadratically sensitive to large deviations

Yeah, this makes sense, I guess just variance is enough then!

@near-bulldozer near-bulldozer bot merged commit 39ce78d into near:master Jul 19, 2022
nikurt pushed a commit that referenced this pull request Jul 20, 2022
near-bulldozer bot pushed a commit that referenced this pull request Jul 21, 2022
* Display block and chunk producers of an epoch.

* Epoch length

* test: simplify mocking in setup_mock_all_validators (#7194)

There are many lovable aspects to `setup_mock_all_validators`! What this PR deals with is "tying the knot" infra. `setup_mock_validator` is based on creating `n` validators and setting up the mock network between them. To create a mock network, you need all `n` validators, and to create a validator, you need a mock network. The cycle is broken at runtime, in three different ways: 

* inside `setup_mock_all_validators`, we have `locked_connectors` 
* at some call-sites, we overwrite the whole `peer_manager_mock`
* yet at other call-sites, we essentially create a copy of `locked_connectors`

This PR unifies that to a single infra: 

* there's a single `Arc<OnceCell<Vec<Addr<Client>>>>` inside `setup_mock_all_validators`
* `OnceCell::wait` is used as a barrier 
* The `&[Addr<Client>]` is passed in as an argument to `peer_manager_mock`, so that it doesn't have to re-invent a way to get a hold of those

This changes the signature of the function, so the diff is pretty large. It is designed to be reviewed per-commit.

* Fix m1 build issue (#7199)

* renamed Validator to AccountData (#7206)

https://nearinc.atlassian.net/browse/CPR-94

* feat: new estimator variance check (#7197)


* test: more useful assert (#7211)

Got this one spuriously(?) firing in some test with nightly features.
Let's make the failure a bit more helpful here!

* trie_key: don’t store single bytes as slices (#7200)

trie_key module stores separators and keys as slices even though they
are all just one-byte long.  There are parts of the code which assume
the length to be one and won’t work if the length ever changes.

Simplify the code by keeping the values as plain bytes rather than
slices so that it’s explicit in their type that they are always just
one byte.

* test: simplify cops support in KeyValue runtime (#7213)

* eagerly compute the assignment of validators to blocks & chunks, which I find a bit easier to wrap my head around (you can print `EpochValidatorSet`s and just see the result)
* compress the code by centralizing cfg handling

* Refactor: remove some dead code (#7216)

Closes #7203 

There are a few structs that do not need to derive `Serialize` and `Deserialize`. Removing the derives highlights that there are some structs that we do not need at all, and that we do not need to define a custom deserializer: `bytes_as_string`.

Tests
-------
- `cargo check --tests --benches` does not complain.
- Other existing tests

* refactor: remove metric field in `GasCost` (#7215)

Explicitly tracking the metric inside `GasCost` is awkward for zero
values and it requires the estimator configuration to be passed around
more than strictly necessary.

This change makes the metric implicit, by virtue of having or not having
the measurement values stored inside.

For conversion to gas, and when printing, `ICount` measurements are
given priority over `Time` measurements when values for both exist.
This could be changed if both need to be tracked simultaneously; for
now only one of them is of interest at a time.

* export rocksdb cf size as prometheus metric (#6635)

Creating a function to get integer RocksDB properties for every column and adding a way to export them as a Prometheus metric.
Reorganising `parse_statistics` to write stats into a provided result param.

* refactor: `call_main` utility for integration test (#7217)

We have been copying the same code multiple times; this refactor
pulls the functionality out into `TestEnv`.

* Replace a map with a for-loop

Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
Co-authored-by: Phuong Nguyen <ChaoticTempest@users.noreply.github.com>
Co-authored-by: pompon0 <pompon.pompon@gmail.com>
Co-authored-by: Jakob Meier <mail@jakobmeier.ch>
Co-authored-by: Michal Nazarewicz <mina86@mina86.com>
Co-authored-by: Akhilesh Singhania <akhi3030@gmail.com>
Co-authored-by: posvyatokum <posvyatokum@edu.hse.ru>