-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wasmparser
: VisitOperator API
#697
wasmparser
: VisitOperator API
#697
Conversation
The trait method identifiers have been automatically generated using the `heck` crate. This is the same that I would have used if I were to generate the trait using a proc. macro. A proc. macro was not used since I wanted to keep the PR as simple and contained as possible. I figured all those trait methods do not necessarily need docs since that would have been very daunting work to add. Also docs for all the trait methods can be added later. One downside of the non-generated trait is that we are going to have to keep those identifiers in sync manually.
Note: This was primarily copy pasted from the respective match construct used so far.
This was the biggest chunk of the work and it was also not trivial to copy over from the `match` based code constructs. I tried to deduplicate most of the same validation logic by splitting out those parts into their own functions such as OperatorValidator::check_v128_shift_op etc. We might want to use these VisitOperator impl in the respective `match` based impementation to deduplicate validation logic. This refactoring should be simple since we will only forward to the respective visitor based method. The minor downside is that we can no longer merge match arms while forwarding.
This was relatively straight forward and most of the work could be automated. This basically forwards all the methods to the underlying OperatorValidator impls and adds offsets to respective errors as does the original implementation. I prevent rustfmt from doing its job here so that we can do bulk line based transformations of the code.
This is the API to the new visitor based operator validation.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, thanks! One worry I have is that this is introducing two ways to do the same thing that we need to keep in sync an up-to-date. I think that the overhead here is somewhat low because it's easy to spot mistakes and such but I want to make sure that everything is at least thoroughly tested. For example I don't think there should be two different paths that read the binary encoding of an operator. The style implemented in the operator validator where Operator
is deleted to visit methods I think makes sense.
I unfortunately can't open validator/operators.rs
in a browser to provide more detailed feedback:
- Could the
VisitOperator
trait have a methodvisit_operator
which delegates to the appropriatevisit
method internally? This could then be used by the operator validator I believe. - Passing around
resources: &I
seems pretty onerous. If you'd like I think we could move ownership of the resources into the operator validator instead of having it separately owned in the function validator. - If possible I'd ideally like to remove the dependence on
Operator
if we can. Given this PR is there any reason for it to stick around?
For benchmarks the in-repository benchmark isn't really that good I think, do you have more representative numbers on a more idiomatic module perhaps? I suspect that they're still wins, but it may be bigger than 10% since the modules in this repository are all test files which have few instructions.
This comment was marked as outdated.
This comment was marked as outdated.
This is due to Alex C. mentioning that its closure based usage is pretty heavy on the Rust comiler. Benchmarks show that performance is not affected by this change.
wasmparser
: Add VisitOperator trait and impls
Ah this is just a convenience method if
This crate does not provide a strong API stability guarantee, there's a whole lot of major versions and new things get added all the time. This basically isn't a factor when designing this crate. Instead we've been designing this crate as "keep it up to day with abstractions that all fit together", which is why given this trait I'm not sure we still need an
Is that 6ms before this change or after this change? Are you able to see other bottlenecks in |
This comment was marked as resolved.
This comment was marked as resolved.
This was demanded by Alex C. during code review. In general this cleaned up a lot of code in the OperatorValidator impl. However, this design is incompatible with the way the ModuleState::check_init_expr method works and causes a borrow check error there. I have not yet been able to fully resolve this borrow check error since I am missing information about the involved behaviours.
This comment was marked as resolved.
This comment was marked as resolved.
…-visit-operator # Conflicts: # crates/wasmparser/src/validator/core.rs # crates/wasmparser/src/validator/operators.rs
This comment was marked as resolved.
This comment was marked as resolved.
This PR again moves the WasmModuleResources out of the OperatorValidator but keeps many of the improvements by now taking over a pair of (offset, resources): (usize, T) for all visitation methods.
Also use it from OperatorValidator::process_operator method.
This required adding a lifetime to the VisitOperator trait to preserve the lifetime on the BrTable.
I'm ok deferring dealing with
which I think should solve your use case? Otherwise I have posted Robbepop#2 to simplify the operator validator (in my mind at least). The main goal of the operator validator is to be as absolutely simple and close to the spec as we can manage. This is critical for correctness so having extra layers of abstractions and distinctions like |
Er sorry and to clarify with Robbepop#2 I personally feel this is good to go and iteration can continue in-tree. |
Yes that would be fine for my use case. Thank you for the review and for creating the issue which allows us to merge this PR and defer the rest of the work. I understand your concern with keeping the operator validator as simple as possible. I will review your changes and push them to update this PR for merge. |
Simplify the operator validator
@alexcrichton I just benchmarked the PR with your simplifications and noticed that performance drastically went down from 18% improvements to just 12% improvements. Even if you are not convinced by the Usually while working on this PR I was implementing everything in steps and checking performance after each step to assert constant improvement. However, this was not possible with your changes as everything was squashed into a single commit. Hard to guess where the difference is coming from but I feel this should probably be dealt with as follow up. After all the entire essence of this new API is for the sake of performance. |
No I simply assumed that performance wouldn't change, let me investigate why it changed locally and see if I can't figure it out. I'm with you though in that this is supposed to be performant, I am not trying to compromise on the efficiency here. |
Cool! For best results try benchmarking using the following profile since it makes a lot of difference using this new API: [profile.bench]
lto = "fat"
codegen-units = 1 |
I checked out the commit prior to mine and I added a benchmark that additionally validates the entirety of spidermonkey.wasm. Before after my PR to this PR I get:
Taking a look at the profile I wrote this trivial patch: diff --git a/crates/wasmparser/src/binary_reader.rs b/crates/wasmparser/src/binary_reader.rs
index f54a1414..983c0057 100644
--- a/crates/wasmparser/src/binary_reader.rs
+++ b/crates/wasmparser/src/binary_reader.rs
@@ -994,12 +994,19 @@ impl<'a> BinaryReader<'a> {
///
/// If `BinaryReader` has no bytes remaining.
pub fn read_u8(&mut self) -> Result<u8> {
- self.ensure_has_byte()?;
- let b = self.buffer[self.position];
+ let b = match self.buffer.get(self.position) {
+ Some(b) => *b,
+ None => return self.eof_err(),
+ };
self.position += 1;
Ok(b)
}
+ #[cold]
+ fn eof_err<T>(&self) -> Result<T> {
+ Err(BinaryReaderError::eof(self.original_position(), 1))
+ }
+
/// Advances the `BinaryReader` up to four bytes to parse a variable
/// length integer as a `u32`.
/// and got this speedup:
Basically I can't really reproduce the numbers you're seeing. For performance work I think it would be best to settle on a set of benchmarks to validate (maybe Otherwise I suspect there's quite a lot of low hanging fruit on Are you able to quantify where the slowdown you're seeing is coming from? Did I miss something perhaps? I'm also on an aarch64 machine instead of an x86_64 machine which could make things different. |
This comment was marked as outdated.
This comment was marked as outdated.
@alexcrichton I conducted benchmarks using the
|
Benchmark | Default Profile | Opt. Profile | Diff% |
---|---|---|---|
parse |
47.228 ms | 38.095 ms | -19.602% |
validate |
88.523 ms | 87.496 ms | -1.1601% |
rf-visit-operator
Branch
Benchmark | Default Profile | Opt. Profile | Diff% | Diff% to main (Default) |
Diff% to main (Opt.) |
---|---|---|---|---|---|
parse |
51.140 ms | 40.412 ms | -20.977% | +8.2832% | +6.0806% |
validate |
79.736 ms | 70.894 ms | -11.089% | -9.9262% | -18.974% |
rf-visit-operator
Branch (Cleanup Reverted)
Benchmark | Default Profile | Opt. Profile | Diff% | Diff% to main (Default) |
Diff% to main (Opt.) |
---|---|---|---|---|---|
parse |
51.335 ms | 40.490 ms | -21.125% | +9.3440% | +7.0290% |
validate |
78.600 ms | 68.066 ms | -13.402% | -10.849% | -21.926% |
Where Opt. Profile is
[profile.bench]
lto = "fat"
codegen-units = 1
Conclusions
- The
parsing
benchmark using the defaultprofile.bench
regressed compared tomain
branch by roughly 8-9%. This is not okay and needs investigation. - The
main
branch does not really profit from the optimizedprofile.bench
with respect to itsvalidate
benchmark. - There is a small but significant performance difference between the
rf-visit-operator
branch and its version with theOperatorValidator
cleanup reverted for both profiles. - The
VisitOperator
API allowsrustc
to better optimize thevalidation
procedures.- Default Profile: 10% improvement
- Optimized Profile: 18-21% improvement
- Using the optimized
profile.bench
yields roughly 20% performance improvement for all branches for theparse
benchmarks. (Unrelated to therf-visit-operator
PR)
Remarks
While I can understand profiling for the default bench
profile is important for most users I think it is at least as important to also profile for the profile with maximum amounts of optimizations turned on for the cases were that matters. In wasmi
we would never compile a release without this particular profile since it provides up to 400% better performance in some cases with an average performance boost of 100%. Of course wasmi
is probably an outlier as an interpreter but certainly not the only performance sensitive project out there.
Update
Using flamegraph
I was able to spot a major performance difference in BinaryReader::read_memarg
that could explain the difference between the parse
benchmark results. I tried out multiple efforts to equalize performance without success. I suppose that the OperatorFactory
is probably not the ideal construct. On the other hand we could get rid of enum Operator
anyways and therefore also of OperatorFactory
et. al.
Using an x86_64 machine I can roughly reproduce these results. In taking a deeper look all I can conclude though is that there's just so much low-hanging fruit I don't think it makes sense at this time to draw broad conclusions. For example the lto build before my patch is indeed ~8% faster than the lto build after my patch, but all the changes in functions seem to just be little layout things here and there where all the hot functions have so many better ways to be optimized than the changes I made. I'm unable to reproduce your slowdown relative to My read on this is that this PR should go ahead and get merged and further iteration can happen in-tree. There's lots of little things I see which can be improved in the validator such as outlining error paths to cold functions to enable more inlining and such. How do you feel about that? Basically taking the current state of this PR as a baseline and continuing to optimize from there. |
I agree, let's merge and progress in-tree! :) I totally share your thoughts on the optimization matters and will do some researching post merge. |
Ok sounds good! I think we probably need to rewrite the |
This commit fixes a regression introduced in bytecodealliance#697 which could cause a panic when validating an invalid wasm module. The issue introduced was that a [check that the control stack is non-empty][check] was lost in the refactoring of the operator validator. This check ran for every single operator and verified that there was a frame on the control stack that the operator could be attached to, otherwise it means instructions were present after the end of the function. The current design of `VisitOperator` doesn't have an easy place to slot this in so I decided to fix this via a different route than was implemented before. Anything which operates on the control stack now checks to see if it's empty instead of asserting it's non-empty. Operators which don't touch the control stack are then checked by ensuring that the `end` opcode which emptied the control stack was the last operator processed in the function. This exposed a minor issue where when validating const expressions the offset that was passed in as the final offset of the expression was actually the first offset of the expression. Additionally this adds some tests to exercise this corner case (unsure why the spec test suite doesn't have them already!) [check]: https://github.com/bytecodealliance/wasm-tools/blob/8732e0bc8a579cd9f15d9134af997c5d3d95af5d/crates/wasmparser/src/validator/operators.rs#L581-L583
This commit fixes an accidental regression from bytecodealliance#697 in the refactoring I added at the end where manual modification of `self.control` was replaced with `self.push_ctrl`. For the exceptions proposal specifically this is not valid since the `block_type` for the frame doesn't get the parameters pushed when the `catch` and `catch_all` instructions are encountered.
Fix an accidental regression from #697
This commit pulls in the latest versions of all of the `wasm-tools` family of crates. There were two major changes that happened in `wasm-tools` in the meantime: * bytecodealliance/wasm-tools#697 - this commit introduced a new API for more efficiently reading binary operators from a wasm binary. The old `Operator`-based reading was left in place, however, and continues to be what Wasmtime uses. I hope to update Wasmtime in a future PR to use this new API, but for now the biggest change is... * bytecodealliance/wasm-tools#703 - this commit was a major update to the component model AST. This commit almost entirely deals with the fallout of this change. The changes made to the component model were: 1. The `unit` type no longer exists. This was generally a simple change where the `Unit` case in a few different locations were all removed. 2. The `expected` type was renamed to `result`. This similarly was relatively lightweight and mostly just a renaming on the surface. I took this opportunity to rename `val::Result` to `val::ResultVal` and `types::Result` to `types::ResultType` to avoid clashing with the standard library types. The `Option`-based types were handled with this as well. 3. The payload type of `variant` and `result` types are now optional. This affected many locations that calculate flat type representations, ABI information, etc. The `#[derive(ComponentType)]` macro now specifically handles Rust-defined `enum` types which have no payload to the equivalent in the component model. 4. Functions can now return multiple parameters. This changed the signature of invoking component functions because the return value is now bound by `ComponentNamedList` (renamed from `ComponentParams`). This had a large effect in the tests, fuzz test case generation, etc. 5. Function types with 2-or-more parameters/results must uniquely name all parameters/results. This mostly affected the text format used throughout the tests. I haven't added specifically new tests for multi-return but I changed a number of tests to use it. Additionally I've updated the fuzzers to all exercise multi-return as well so I think we should get some good coverage with that.
This commit pulls in the latest versions of all of the `wasm-tools` family of crates. There were two major changes that happened in `wasm-tools` in the meantime: * bytecodealliance/wasm-tools#697 - this commit introduced a new API for more efficiently reading binary operators from a wasm binary. The old `Operator`-based reading was left in place, however, and continues to be what Wasmtime uses. I hope to update Wasmtime in a future PR to use this new API, but for now the biggest change is... * bytecodealliance/wasm-tools#703 - this commit was a major update to the component model AST. This commit almost entirely deals with the fallout of this change. The changes made to the component model were: 1. The `unit` type no longer exists. This was generally a simple change where the `Unit` case in a few different locations were all removed. 2. The `expected` type was renamed to `result`. This similarly was relatively lightweight and mostly just a renaming on the surface. I took this opportunity to rename `val::Result` to `val::ResultVal` and `types::Result` to `types::ResultType` to avoid clashing with the standard library types. The `Option`-based types were handled with this as well. 3. The payload type of `variant` and `result` types are now optional. This affected many locations that calculate flat type representations, ABI information, etc. The `#[derive(ComponentType)]` macro now specifically handles Rust-defined `enum` types which have no payload to the equivalent in the component model. 4. Functions can now return multiple parameters. This changed the signature of invoking component functions because the return value is now bound by `ComponentNamedList` (renamed from `ComponentParams`). This had a large effect in the tests, fuzz test case generation, etc. 5. Function types with 2-or-more parameters/results must uniquely name all parameters/results. This mostly affected the text format used throughout the tests. I haven't added specifically new tests for multi-return but I changed a number of tests to use it. Additionally I've updated the fuzzers to all exercise multi-return as well so I think we should get some good coverage with that.
This commit fixes some more regressions from bytecodealliance#697 where an empty `control` stack was previously unexpected but now possible in some instructions. All of these bugs were identified by oss-fuzz.
This commit fixes some more regressions from #697 where an empty `control` stack was previously unexpected but now possible in some instructions. All of these bugs were identified by oss-fuzz.
This commit pulls in the latest versions of all of the `wasm-tools` family of crates. There were two major changes that happened in `wasm-tools` in the meantime: * bytecodealliance/wasm-tools#697 - this commit introduced a new API for more efficiently reading binary operators from a wasm binary. The old `Operator`-based reading was left in place, however, and continues to be what Wasmtime uses. I hope to update Wasmtime in a future PR to use this new API, but for now the biggest change is... * bytecodealliance/wasm-tools#703 - this commit was a major update to the component model AST. This commit almost entirely deals with the fallout of this change. The changes made to the component model were: 1. The `unit` type no longer exists. This was generally a simple change where the `Unit` case in a few different locations were all removed. 2. The `expected` type was renamed to `result`. This similarly was relatively lightweight and mostly just a renaming on the surface. I took this opportunity to rename `val::Result` to `val::ResultVal` and `types::Result` to `types::ResultType` to avoid clashing with the standard library types. The `Option`-based types were handled with this as well. 3. The payload type of `variant` and `result` types are now optional. This affected many locations that calculate flat type representations, ABI information, etc. The `#[derive(ComponentType)]` macro now specifically handles Rust-defined `enum` types which have no payload to the equivalent in the component model. 4. Functions can now return multiple parameters. This changed the signature of invoking component functions because the return value is now bound by `ComponentNamedList` (renamed from `ComponentParams`). This had a large effect in the tests, fuzz test case generation, etc. 5. Function types with 2-or-more parameters/results must uniquely name all parameters/results. This mostly affected the text format used throughout the tests. I haven't added specifically new tests for multi-return but I changed a number of tests to use it. Additionally I've updated the fuzzers to all exercise multi-return as well so I think we should get some good coverage with that.
* Upgrade wasm-tools crates, namely the component model This commit pulls in the latest versions of all of the `wasm-tools` family of crates. There were two major changes that happened in `wasm-tools` in the meantime: * bytecodealliance/wasm-tools#697 - this commit introduced a new API for more efficiently reading binary operators from a wasm binary. The old `Operator`-based reading was left in place, however, and continues to be what Wasmtime uses. I hope to update Wasmtime in a future PR to use this new API, but for now the biggest change is... * bytecodealliance/wasm-tools#703 - this commit was a major update to the component model AST. This commit almost entirely deals with the fallout of this change. The changes made to the component model were: 1. The `unit` type no longer exists. This was generally a simple change where the `Unit` case in a few different locations were all removed. 2. The `expected` type was renamed to `result`. This similarly was relatively lightweight and mostly just a renaming on the surface. I took this opportunity to rename `val::Result` to `val::ResultVal` and `types::Result` to `types::ResultType` to avoid clashing with the standard library types. The `Option`-based types were handled with this as well. 3. The payload type of `variant` and `result` types are now optional. This affected many locations that calculate flat type representations, ABI information, etc. The `#[derive(ComponentType)]` macro now specifically handles Rust-defined `enum` types which have no payload to the equivalent in the component model. 4. Functions can now return multiple parameters. This changed the signature of invoking component functions because the return value is now bound by `ComponentNamedList` (renamed from `ComponentParams`). This had a large effect in the tests, fuzz test case generation, etc. 5. Function types with 2-or-more parameters/results must uniquely name all parameters/results. This mostly affected the text format used throughout the tests. I haven't added specifically new tests for multi-return but I changed a number of tests to use it. Additionally I've updated the fuzzers to all exercise multi-return as well so I think we should get some good coverage with that. * Update version numbers * Use crates.io
Closes #695.
@alexcrichton Review should follow the commits. I added commit messages to make understanding of the changes simpler. Large parts of the code have been automatically generated and bulk transformed.
The major downside of this approach is that the
VisitOperator
trait is really ... huge and there currently is no way for users to implement just parts of it. However, that should probably not matter much for the projects of the BytecodeAlliance. I would not recommend default implementing the methods. Instead users should take care of this work.A minor downside is that maintainers of the
wasmparser
crate now have to sync changes in theOperator
variants for the identifiers of theVisitOperator
trait. Therefore we should strife to create a proc. macro based solution in the future but I deemed it out of scope for this already big PR.