Conversation
/benchmark runtime pallet pallet_contracts
Benchmark Runtime Pallet for branch "at-bump-wasmi" with command
`cargo run --quiet --release --features=runtime-benchmarks --manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic="*" --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs`
Toolchain: stable-x86_64-unknown-linux-gnu (default)
Results
> Why is this only a problem with …

It is trivial to create a thin wrapper around this struct in case you want to mimic the old behavior. To answer your question, you have two options: …

Prefer 2) if you are operating in a hot loop; otherwise it won't matter too much what you do there. If you already have a reusable byte buffer for other common operations, you could start using it here.
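For illustration only (this is not the wasmi API; `copy_region` and `copy_region_into` are hypothetical helper names), a minimal sketch of the two patterns discussed above: allocating a fresh buffer per call versus copying into a caller-owned, reusable buffer, the latter being the one to prefer in a hot loop.

```rust
/// Option 1: allocate a fresh buffer on every call (simple, but allocates each time).
fn copy_region(memory: &[u8], offset: usize, len: usize) -> Option<Vec<u8>> {
    let end = offset.checked_add(len)?;
    memory.get(offset..end).map(|slice| slice.to_vec())
}

/// Option 2: copy into a caller-owned, reusable buffer (preferred in a hot loop).
fn copy_region_into(memory: &[u8], offset: usize, len: usize, buf: &mut Vec<u8>) -> Option<()> {
    let end = offset.checked_add(len)?;
    let src = memory.get(offset..end)?;
    buf.clear();
    buf.extend_from_slice(src);
    Some(())
}

fn main() {
    let memory: Vec<u8> = (0..16).collect();

    // One-off read: allocating is fine.
    let copy = copy_region(&memory, 4, 4).expect("in bounds");
    assert_eq!(copy, vec![4, 5, 6, 7]);

    // Hot loop: reuse a single buffer across iterations.
    let mut buf = Vec::with_capacity(4);
    for offset in 0..4 {
        copy_region_into(&memory, offset, 4, &mut buf).expect("in bounds");
        println!("{:?}", buf);
    }
}
```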
It isn't a problem, really. It was just a convenience API to save myself this …
…path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs
Should we run a burn-in just in case?
I put it on ice because it turns out that we have a performance regression. While the weights with regard to instantiation time and host functions go down, we have an almost 50% regression for instruction weights. @Robbepop confirmed that with a coremark benchmark just now. A burn-in is a good idea after we resolve the regression.
Turns out this is more complex than stating that … I tested:

```toml
[profile.release]
lto = false        # or "fat"
codegen-units = 16 # or 1
```

The Wasm …

Summary

For more information about the huge difference in …

Runtime uses …
I'd welcome it if we at least ran benchmarks in order to see what is really happening.
What do you mean? Weights are the benchmark.
Sorry, I mean benchmarking wasmi version 0.10.0 instead of version 0.11.0. With this we can better locate the slowdown.
/benchmark runtime pallet pallet_contracts
Benchmark Runtime Pallet for branch "at-bump-wasmi" with command
`cargo run --quiet --release --features=runtime-benchmarks --manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic="*" --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs`
Toolchain: stable-x86_64-unknown-linux-gnu (default)
Results
I am currently running a benchmark here with …
…path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs
With …
That's great news! Especially for the upcoming … Ideally we'd check if using …
```diff
@@ -229,6 +229,7 @@ fn create_project_cargo_toml(
 	let mut release_profile = Table::new();
 	release_profile.insert("panic".into(), "abort".into());
 	release_profile.insert("lto".into(), true.into());
+	release_profile.insert("codegen-units".into(), 1.into());
```
Do you think it makes sense to have a `production` profile here, so as to reduce compile time for the normal dev flow? Polkadot opted for it, and Substrate in general as well.
Yeah, this would make sense. This is just an ad-hoc commit that will not be merged. However, it is already built with lto for every release build. Do you think `cgu=1` is so much slower that it is worth an extra profile?
I just tested it on my laptop; the compile time was always ~25s. So it probably makes no difference for small binaries and is not worth the effort here.
I did some benchmarks in #1262; it does make things significantly slower (tested node-runtime and Polkadot compilation there).
Also note that `release_profile.insert("lto".into(), true.into());` is `-C lto`, not `-C lto=fat`!
`lto=true` and `lto="fat"` are the same: https://doc.rust-lang.org/cargo/reference/profiles.html#lto
I reproducibly get two different sizes when switching between `-C lto` and `-C lto=fat`. Created a Zulip thread about this: https://rust-lang.zulipchat.com/#narrow/stream/122652-new-members/topic/-C.20lto.20and.20-C.20lto-fat.20are.20different.3F
I just tested wasmi with `lto="fat"` and `lto=true` and received the same outputs every time. In some cases Cargo did not even bother to recompile the crate. So I cannot confirm your observations, unfortunately.
I was adding `.append_to_rust_flags("-C lto=fat")` to `build.rs`; I guess you changed those lines above instead? Not sure how there could be any difference, though.
Yes I just changed the lines that @athei also changed.
If you append it to the build flags in the `build.rs`, you will essentially have both: `true` in the profile (set by the `substrate-wasm-builder` as per my patch) and then the hard-coded flags (as set by you). Not sure why this makes a difference. Unless the `substrate-wasm-builder` uses its `dev` profile 😱
I compared the benchmark results of this PR with the ones of #10709. With … We will merge this as soon as the runtime is built with those flags for production/benchmarks (@ggwpez) and we have run a burn-in.
This reverts commit fd792e7.
/benchmark runtime pallet pallet_contracts
Benchmark Runtime Pallet for branch "at-bump-wasmi" with command
`cargo run --quiet --profile=production --features=runtime-benchmarks --manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic="*" --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs`
Toolchain: stable-x86_64-unknown-linux-gnu (default)
Results
```rust
let mut production_profile = Table::new();
production_profile.insert("inherits".into(), "release".into());
production_profile.insert("lto".into(), "fat".into());
production_profile.insert("codegen-units".into(), 1.into());
```
Is there a reason that we use `panic = "abort"` for both the `release` and `dev` profiles but not for the `production` profile? It defaults to `panic = "unwind"`.
Off-topic: the tons of `.into()` calls kinda irk me; we should really think about changing the API of `Table::insert` to take a generic parameter `T: Into<WhateverThisIs>`.
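For illustration only, a minimal sketch of what such a generic `insert` could look like. `Table` and `Value` here are local stand-ins, not the actual TOML crate types, and the trait bounds are just one possible choice:

```rust
use std::collections::BTreeMap;

// Stand-in for the TOML value type, only to show the suggested signature.
#[derive(Debug)]
enum Value {
    String(String),
    Boolean(bool),
    Integer(i64),
}

impl From<&str> for Value {
    fn from(s: &str) -> Self {
        Value::String(s.to_owned())
    }
}

impl From<bool> for Value {
    fn from(b: bool) -> Self {
        Value::Boolean(b)
    }
}

impl From<i64> for Value {
    fn from(i: i64) -> Self {
        Value::Integer(i)
    }
}

// A table whose `insert` is generic over `Into`, so call sites can drop
// the explicit `.into()` conversions.
#[derive(Debug, Default)]
struct Table(BTreeMap<String, Value>);

impl Table {
    fn insert(&mut self, key: impl Into<String>, value: impl Into<Value>) {
        self.0.insert(key.into(), value.into());
    }
}

fn main() {
    let mut release_profile = Table::default();
    release_profile.insert("panic", "abort");
    release_profile.insert("lto", true);
    release_profile.insert("codegen-units", 1_i64);
    println!("{:?}", release_profile);
}
```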
`production` inherits from `release`, so it uses `abort` implicitly. We know for sure because we would get a build error if we tried to build a runtime with unwind support (we don't implement unwind support).
Ah, so overriding `profile.release` also overrides all inherited profiles. Makes sense.
#10747 is merged. The runtime is built with …
…--manifest-path=bin/node/cli/Cargo.toml -- benchmark --chain=dev --steps=50 --repeat=20 --pallet=pallet_contracts --extrinsic=* --execution=wasm --wasm-execution=compiled --heap-pages=4096 --output=./frame/contracts/src/weights.rs --template=./.maintain/frame-weight-template.hbs
Inspected the weights:
- Setup weights went up quite a bit.
- Load/Store instructions almost doubled in weight.
- Other instructions stayed the same or improved slightly
```diff
 fn instr_i64load(r: u32, ) -> Weight {
-	(74_212_000 as Weight)
-		// Standard Error: 0
-		.saturating_add((1_325_000 as Weight).saturating_mul(r as Weight))
+	(118_782_000 as Weight)
+		// Standard Error: 2_000
+		.saturating_add((2_711_000 as Weight).saturating_mul(r as Weight))
 }
 fn instr_i64store(r: u32, ) -> Weight {
-	(74_505_000 as Weight)
-		// Standard Error: 1_000
-		.saturating_add((1_423_000 as Weight).saturating_mul(r as Weight))
+	(118_311_000 as Weight)
+		// Standard Error: 5_000
+		.saturating_add((2_378_000 as Weight).saturating_mul(r as Weight))
 }
```
Load and store instruction performance almost halved (look at `r`).
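As a rough check on the numbers above: the per-repetition slope for `instr_i64load` goes from about 1,325,000 to 2,711,000 (≈2.0×), and for `instr_i64store` from about 1,423,000 to 2,378,000 (≈1.7×), which is where the "almost doubled in weight" observation comes from.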
```diff
-	(140_169_000 as Weight)
+	(187_520_000 as Weight)
```
General `call` overhead increased quite a bit. Most likely setting up a new module and instance just takes longer in the new version. Do we have a benchmark for this in wasmi (creating a new instance), @Robbepop?
Yes, there are benchmarks for checking the efficiency of module instantiation.
```diff
-	(210_420_000 as Weight)
-		// Standard Error: 133_000
-		.saturating_add((145_601_000 as Weight).saturating_mul(c as Weight))
-		// Standard Error: 8_000
-		.saturating_add((1_760_000 as Weight).saturating_mul(s as Weight))
+	(312_769_000 as Weight)
+		// Standard Error: 122_000
+		.saturating_add((142_000_000 as Weight).saturating_mul(c as Weight))
+		// Standard Error: 7_000
+		.saturating_add((1_798_000 as Weight).saturating_mul(s as Weight))
```
The overhead of instantiating per code byte seems to stay the same. General overhead goes up. I assume setting up the wasm instance is slower now.
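In numbers: the constant term rises from about 210,420,000 to 312,769,000 (≈ +49%), while the slope on `c` is essentially unchanged (145,601,000 → 142,000,000) and the slope on `s` moves only marginally (1,760,000 → 1,798,000).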
Performance regressed too much. We will wait for the new wasmi_v1 before trying again.
Only minor changes in wasmi. Most importantly, we got rid of that `core` feature, which allows us to make the `sp-sandbox` manifest much nicer.

@Robbepop The deprecation of `memory.get` doesn't feel good. It just requires the user to introduce the tedious extra step of allocating a buffer in the common case where we want to allow a dynamically sized buffer. Or how else would we deal with this code:

substrate/client/executor/wasmi/src/lib.rs, lines 175 to 182 in 33c518e

Why didn't the dependabot pick this up?