Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Flat and differential runtime config files #6693

Merged
merged 18 commits into from
May 10, 2022

Conversation

jakmeier
Copy link
Contributor

@jakmeier jakmeier commented Apr 25, 2022

Replace the JSON runtime config file per version with a base
parameters.txt file plus differential files per version.
The RuntimeConfig remains untouched. But it is now loaded from the
flat list of parameters, that converts it to JSON structured data following
the exact layout we use in code, that is then converted to an actual struct
using serde derived code.

The new TXT files are written in a custom format for key-values separated
by a column. Keys must match a defined parameter and values are strings
or integers. Differential files include old and new values with a -> between
them.

The direct benefit of this is better readability for config changes and
an easier and less error-prone process of doing such changes.

Indirectly, this could be the first step to further clean up code
around gas costs.
(1) the gas profile could use these parameters to track costs instead
of its own enum Cost.
(2) The params-estimator can link estimations to params more easily.

Test plan

The usage of RuntimeConfigStore has not changed, only how it is
created. Therefore, tests can ensure all stored RuntimeConfig
versions are exactly the same as before. And this should be sufficient
to confirm this change has no side-effects.

To implement this, we keep the old JSON files in the legacy_configs
directory. The test test_old_and_new_runtime_config_format_match then
iterates through all existing versions and compares the serde
deserialization with configs loaded from TXT files. This is all we need to
verify correctness of the parameters loaded through the new files.

The new TXT file format also comes with custom parsing logic. Therefore,
a few unit tests to make sure it behaves the way we think in all
circumstances.

Replace the JSON runtime config file per version with a base
`parameters.txt` file plus differntial files per version.
The `RuntimeConfig` remains untouched. But it is now loaded from the
flat list of parameters, instead of through serde derived code.

The direct benefit of this is better readibilty for config changes and
an easier and less error-prone process of doing such changes.

Indirectly, this could be the first step to further clean up code
around gas costs.
(1) the gas profile could use these parameters to track costs instead
of its own `enum Cost`.
(2) The params-estimator can link estimations to params more easily.

Test plan
---------
The usage of `RuntimeConfigStore` has not changed, only how it is
created. Therefore, tests can ensure all stored `RuntimeConfig`
versions are exactly the same as before. And this should be sufficient
to confirm this change has no side-effects.

To implement this, we keep the old JSON files in the `legacy_configs`
directory. The test `test_old_and_new_runtime_config_format_match` then
iterates through all existing versions and compares the serde
deserialization with configs loaded from TXT files.
@jakmeier jakmeier requested a review from matklad April 25, 2022 20:02
@jakmeier jakmeier requested a review from a team as a code owner April 25, 2022 20:02
alt_bn128_g1_sum_element: read_parameter(params, Parameter::WasmAltBn128G1SumElement),
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's maybe move these bulky statements to the parameter module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean all the constructors? (from_parameters, test and free) Or just the new one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My latest commit moves all new constructors to parameter_table but left other constructors where they are. Not sure if this is what you had in mind.

MaxLocalsPerContract,
}

pub fn load_parameters_from_txt(arg: &[u8]) -> BTreeMap<Parameter, String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn load_parameters_from_txt(arg: &[u8]) -> BTreeMap<Parameter, String> {
pub(crate) fn load_parameters_from_txt(arg: &[u8]) -> BTreeMap<Parameter, String> {

I think it makes sense to avoid pub as much as possible. Cross-crate deps are the only deps that matter, so they should stand out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I defined this in primitives-core crate, where as all callers are in primitives crate. So your suggestion doesn't work out of the box. But is also seems wrong to have code define on crate A and only call it in crate B.

However, the type Parameter must be in primitives-core because it should be the new source-of-truth replacing some types in core/primitives-core/src/profile.rs. And primitives-core is not allowed to depend on other crates, according to our architecture.md.

So, in my latest commit, I moved all the logic to construct over to primitives. To still have the constructor syntax like VMLimitConfig::from_parameters(params), I defined an extension trait FromParameterTable. (I can also rewrite to params.vm_limit_config() if you think that's better.)

This refactor sort of works but also brings new problems. I previously liked that each config struct had all constructors (for testing and for production) right next to each other, whereas now they are defined in different crates. And there are other things, like wasmer2_stack_limit_default that used to be private but now turned public in the refactor.

Not sure if it is an overall improvement or not.

@matklad matklad requested a review from Longarithm April 27, 2022 11:40
@jakmeier
Copy link
Contributor Author

I wrote up the motivation behind this in a separate issue: #6710

The new type `ParameterTable` wraps the BTreeMap<Parameter,String>
that holds a list of parameter - value pairs.

Move all logic around the parameter table to the primitives crate,
as opposed to `primitives-core` where it was before.

Also includes other small fixes from review comments.
jakmeier added 5 commits May 2, 2022 19:11
The `ParameterTable` now contains `serde_json::Value` instead of a
`String`. To fill it in, a simple custom parsing function is used.
The method `runtime_config_json` converts the table to a JSON
representation using the correct structure.
Derived serde code then translates to a `RuntimeConfig`.
@jakmeier
Copy link
Contributor Author

jakmeier commented May 3, 2022

Okay, this is ready for re-review. You probably want to look through all files again but here is the summary of my changes.

  • The ParameterTable now stores serde_json::Value instead of String.
  • To convert from ParameterTable to RuntimeConfig, I go through JSON and serde-derived code. While creating the JSON, the WASM external costs and the limit configurations are constructed by iterating over the corresponding parameters. All other parameters are still placed manually.
  • Fees, that is all action + receipt costs that have a split between send_sir/send_not_sir/execution are listed in enum FeeParameter which makes constructing the JSON a bit easier.
  • Differential and base files now use slightly different format. (base: key: value and diff: key: new -> old)
  • Addressed all comments from reviews so far

I think the big-picture questions left for me are:

  1. Are we all okay with using a custom format? Or should we use a standard format. I could live with both and the work to go from what we have now to a standard format is moderate.
  2. Does the TXT -> JSON -> Rust struct strategy make sense the way I implemented it? It is roughly what I understood based on review comments and a quick discussion with matklad yesterday. I think in principle it does make a lot of sense to use serde for the final type fitting. But there are a couple of rough corners when actually implementing it. Ideas for improvement are welcome.
  3. Have I forgotten things that could benefit from tests?

@jakmeier jakmeier requested review from nagisa and matklad May 3, 2022 10:08
// We do not have "arbitrary_precision" serde feature enabled, thus we
// can only store up to `u64::MAX`, which is `18446744073709551615` and
// has 20 characters.
if raw_number.len() < 20 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we happy with the behaviour here when somebody specifies 10^19 which is 20 digits long but is still 263.11 and fits into u64? serde_json does have an error code for out-of-range numbers internally but sadly it isn't exposed :(

If not, I'm worried that serde_json might not actually be the right tool for us tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I think I am okay with that. Mostly because I am committed to not touch the RuntimeConfig and its old deserialization code in this PR. This makes it easier to argue that this is a non-observable change, based on the tests provided.

What I find more troublesome is that values using #[serde(with = "u128_dec_format")] must use a string, not a number, regardless of whether it fits in a u64. I tend towards keeping it as-is for now and fixing it when needed. (It will fail in many tests if we try to use values that are shorter than 20 characters for a value parsed with u128_dec_format.)

But I am curious, do you have alternatives in mind that could be used in our case instead of serde_json?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably isn't terribly difficult to implement a custom deserializer just for the values. An example of an implementation that does this is the aforementioned config-rs (I encourage to at least skim throug the link here). It has handling for u128 built-in too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see, a custom value parser would make sense. Also thanks for the link with the example.

As things stand right now, I don't see a strong motivation to do it right now. As long as we have JSON as intermediary, we have to store numbers as strings anyway and as long as we don't lose precision on the way, I think it doesn't matter too much where the cut-off point is.

Copy link
Collaborator

@nagisa nagisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good. I imagine we can change the exact implementation of value deserializer later on if we find that json is not actually suitable for our needs.

That said, please give other reviewers some time to look through this still.

@jakmeier
Copy link
Contributor Author

jakmeier commented May 3, 2022

I think this looks pretty good. I imagine we can change the exact implementation of value deserializer later on if we find that json is not actually suitable for our needs.

That said, please give other reviewers some time to look through this still.

Thanks! Yes, I am not really in a hurry here. Since some people are taking time off this week, I will probably wait to merge this until next week.

Copy link
Member

@Longarithm Longarithm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for making config updates more simple and transparent!

Copy link
Contributor

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor style comments, but they are optional and can be followed-up!

@@ -0,0 +1,151 @@
# Gas economics config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a good idea to add a preamble along the lines of

# Protocol config for the *initial* version of the protocol.
# Diffs are applied to get a config at a specifiv protocol version

(note -- wording is very provisional, please refine it :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea, I've written something along those lines.

.expect("Failed parsing initial testnet config")
}

pub(crate) fn from_parameters_txt(txt_file: &str) -> Result<RuntimeConfig, InvalidConfigError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

txt_file is a bad name:

  • ambiguity between Path and String (contents of the file)
  • this doesn't have to be File

Maybe just impl FromStr for RuntimeConfig and impl FromStr for ParameterTable?

Also, does this need to be pub(crate)? Another alternative would be to just new(t: ParameterTable), if this doesn't increase visibility of the table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done impl FromStr for ParameterTable and impl FromStr for ParameterTableDiff.

But for RuntimeConfig, I went with new, also replacing fn from_parameters(params: &ParameterTable) -> Result<Self, serde_json::Error> on the way.

Comment on lines +227 to +229
if value.is_empty() {
return Ok(serde_json::Value::Null);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check feels like somethig worth pushing to the caller perhaps. Just a gut automatic reaction, maybe this is wrong in context :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this. Most of the parameters are number types and I would rather have it as null when they are unspecified. Also, serde will not like that, I think.

}

#[test]
fn test_old_runtime_config_data() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice assuarance that we didn't change values in the process of migration! Let's remove it once the PR is merged (that is, I think the only value of this test is to check this specific PR, and, once that it, wec an remove the test without the loss of confidence? )!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@near-bulldozer near-bulldozer bot merged commit d914b64 into near:master May 10, 2022
jakmeier added a commit to jakmeier/nearcore that referenced this pull request May 11, 2022
`ci.sh`: No longer used because the buildkite pipeline has ben archived.
The same task, among others, is performed by continuous estimation.

`compare_costs.py` and `safety_multiplier.py`: Both operate on the
replaced JSON runtime config format. We have not used these scripts in
a long time, since estimations did not produce complete JSONs anyway.
But there always was the chance that we might go that way again.
Now, however,  that near#6693 replaced the JSON format, these scipts are
truely obsolete and better of deleted.
near-bulldozer bot pushed a commit that referenced this pull request May 11, 2022
`ci.sh`: No longer used because the buildkite pipeline has ben archived.
The same task, among others, is performed by continuous estimation.

`compare_costs.py` and `safety_multiplier.py`: Both operate on the
replaced JSON runtime config format. We have not used these scripts in
a long time, since estimations did not produce complete JSONs anyway.
But there always was the chance that we might go that way again.
Now, however,  that #6693 replaced the JSON format, these scripts are
truly obsolete and better of deleted.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants