GC Optimization Guidebook
As mentioned in the wasm-opt prompt engineering page, Binaryen's primary optimization pipeline has been tuned on LLVM output. That is, when you run `wasm-opt -O3` you get a useful set of optimizations to run on something that `clang`/`rustc`/etc. emitted at `-O3`. Wasm GC languages may benefit from a different set of optimizations, primarily because such languages generally do not use LLVM (e.g., Java, Dart, etc.), so the "shape" of the Wasm they emit is different, and different Binaryen optimizations may work better.
In addition to the fact that LLVM is often not used, Wasm GC is fundamentally different from Wasm MVP in that it is much closer to a true IR for optimization. In particular, in the Wasm MVP output of C/C++/Rust etc. we don't have an explicit user stack (just a region of linear memory that is managed manually), we don't have explicit allocations (`malloc`/`free` are just functions that manage another region of linear memory), and we don't have explicit function pointers (just integers that are used as offsets into a table). All that greatly limits what optimizations can be done on Wasm MVP, compared to, say, LLVM IR. As a result, LLVM `-O3` can do a lot of things you can't do on Wasm, like move an allocation from the user stack to a local. But that changes with Wasm GC, in particular:
- Allocations are explicit, using `struct.new` and `array.new`.
- Function pointers are explicit, using `ref.func` and `call_ref`.
This makes Wasm GC much more optimizable, as now we can do things like move an allocation into locals (and Binaryen does that in the Heap2Local pass). So Binaryen cannot be a general-purpose optimizer for Wasm MVP content - MVP is not expressive enough - but it can be for Wasm GC. And that is good, since those languages often do not use LLVM as mentioned above, and so Binaryen can help out by doing general-purpose optimizations.
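As a sketch of what that enables (a hypothetical module with illustrative names, not from any particular toolchain), here is a non-escaping allocation of the kind Heap2Local can lower into plain locals:

```wat
(module
  (type $point (struct (field $x i32) (field $y i32)))
  (func $len-squared (param $a i32) (param $b i32) (result i32)
    (local $p (ref $point))
    ;; The allocation is explicit, unlike a malloc call in Wasm MVP, and
    ;; the reference never escapes this function...
    (local.set $p (struct.new $point (local.get $a) (local.get $b)))
    ;; ...so Heap2Local can replace the struct with two i32 locals,
    ;; removing the allocation entirely.
    (i32.add
      (i32.mul (struct.get $point $x (local.get $p))
               (struct.get $point $x (local.get $p)))
      (i32.mul (struct.get $point $y (local.get $p))
               (struct.get $point $y (local.get $p))))))
```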
To summarize all that, `wasm-opt -O3` on LLVM output does some useful refining of LLVM's optimizations for Wasm MVP, but on Wasm GC Binaryen can do a lot more and can be used as the primary toolchain optimizer. Doing so requires more than just passing `-O3` or such, and that is the topic of the rest of this page.
A single `wasm-opt -O3` is enough for LLVM output, as mentioned above, since that output is already quite optimized. For other code you may want to run the entire Binaryen optimization pipeline more than once, which you can do like this:

```
wasm-opt -O3 -O3
```

Each `-O3` not only sets the optimization level to 3 but also asks the tool to run the full optimization pipeline, and so it can be specified more than once (this is different from the UI of gcc and clang). The same holds for `-Os` etc.
If your compiler has not done much general-purpose optimization before Binaryen runs, you may want several rounds of optimization. For example, J2Wasm uses 6 or so at the time of writing this doc.
The Grand Unified Flow Analysis (GUFA) is an optimization that scans the entire program and makes inferences about what content can appear where. This can help MVP content, but it really shines on GC because it infers a lot about types. In particular, it can find when a location must contain a constant, which can lead to devirtualization, a crucial optimization.

GUFA is a heavyweight optimization and is not run by default. You run it manually with `--gufa`. When to run it, and how many times, is worth experimenting with, but you can try things like this:
- `-O3 --gufa -O3`: one run of the main optimization pipeline, then GUFA, then another run of the pipeline to take advantage of GUFA's findings.
- `-O3 --gufa -O3 -O3`
- `-O3 --gufa -O3 --gufa -O3`
- etc.

You can also try `-Os` instead of `-O3`, etc.
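To illustrate the devirtualization mentioned above, here is a sketch (all names hypothetical) of a vtable whose field is only ever written a single function reference:

```wat
(module
  (type $fn (func (param i32) (result i32)))
  (type $vtable (struct (field (ref $fn))))
  (func $only-impl (type $fn)
    (i32.add (local.get 0) (i32.const 1)))
  ;; The only value ever stored in the vtable field is $only-impl.
  (global $vt (ref $vtable)
    (struct.new $vtable (ref.func $only-impl)))
  (func $caller (param $x i32) (result i32)
    ;; GUFA can infer that the field read here must yield $only-impl,
    ;; so this indirect call_ref can become a direct call (and may then
    ;; be inlined by later passes).
    (call_ref $fn
      (local.get $x)
      (struct.get $vtable 0 (global.get $vt)))))
```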
It can be useful to run `--metrics` in the middle, to see the impact of a pass. For example, `wasm-opt --metrics -O3 --metrics` will dump metrics once, then optimize, then dump metrics again. The second dump contains a diff against the previous metrics, so you can see stats on the change in the number of each type of instruction.
By default Binaryen uses the isorecursive type system as defined in the Wasm GC spec. You can switch to "nominal mode", however:

```
wasm-opt --nominal
```
In nominal mode each type is a simple nominal type. That means, in particular, that if a type is not used it can be removed from the binary. That is not the case with the isorecursive type system, as if a rec group is alive then all of its contents must remain alive, even if unused.
The point of the isorecursive type system is to allow separate modules to interact (by matching up their rec groups). If you don't need that then using nominal mode is better.
Technically, nominal mode is equivalent to using a single big rec group for everything - which is compliant with the Wasm GC spec - and also not caring about changing the form of that rec group.
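For instance, in this illustrative fragment (hypothetical type names) the two types share a rec group, so under the isorecursive system keeping one keeps both:

```wat
;; If $used is alive, the whole rec group survives, including $unused.
;; In nominal mode (effectively one big rec group whose shape we do not
;; preserve), $unused can simply be dropped from the binary.
(rec
  (type $used   (struct (field i32)))
  (type $unused (struct (field i64))))
```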
By default we assume that any type that escapes to the outside may be inspected and interacted with. For example, if a struct escapes then the outside may read a field, and if a function reference escapes then it may be called. That means we cannot alter that field or that function, say by refining the type of the field or removing a parameter, etc. If you do not have such interactions with the outside then you can tell Binaryen to assume a closed world:

```
wasm-opt --closed-world
```
In a closed world we run several additional important passes, and some other passes become more effective, so this is quite important to do if you can.
Note that you can still let references escape to the outside. For example, the outside might hold onto a reference (caching it on some other object, say) and pass it back in. The only thing that is disallowed is for the outside to interact with the contents of the reference.
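A sketch of that distinction (hypothetical exports): the outside may hold the reference returned by `make` and later pass it back to `use`, but under `--closed-world` it must not touch the struct's fields itself:

```wat
(module
  (type $obj (struct (field i32)))
  ;; The reference escapes, but only as an opaque value.
  (func (export "make") (result anyref)
    (struct.new $obj (i32.const 42)))
  ;; The outside passes the reference back in; all field accesses
  ;; happen inside the module, so the closed-world assumption holds.
  (func (export "use") (param $r anyref) (result i32)
    (struct.get $obj 0 (ref.cast (ref $obj) (local.get $r)))))
```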
The Binaryen optimizer has many passes that do type-based inference. For example, if a type's field is only ever written a single value, then we can infer the result of all reads of that field in the entire program (which is the case for things like vtables in J2Wasm, for example). Such type-based optimization gets more powerful the more refined the type information is: it is better to use different types as much as possible rather than reusing the same type in many contexts. Binaryen has two optimizations that can help here: TypeSSA, which "splits" types, basically defining a new type at each `struct.new`, and TypeMerging, which "coalesces" types, finding different types that do not actually need to be different and folding them together.
The general idea is that you want to split types as much as possible, then run optimizations, and then merge them at the end. The merge is useful because unnecessarily split types add size to the binary, but it is important to do it at the end because after the merge the optimizer can do less.
For example, you can try this:

```
wasm-opt --type-ssa -O3 -O3 --type-merging -O3
```
That is, split types, then optimize twice, then merge, then optimize again. (The last optimization pass can sometimes do a little more work after merging, as once types are merged, functions that now have identical types can sometimes themselves be merged.)
Alternatively, if your toolchain already emits very refined types (already using a new type in every location where that makes sense) then you can omit `--type-ssa`.
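Conceptually (hypothetical type names), TypeSSA rewrites something like this:

```wat
;; Before TypeSSA: both allocations share $t, so information about the
;; field's contents is mixed between them.
(struct.new $t (i32.const 1))
(struct.new $t (i32.const 2))

;; After TypeSSA: each allocation gets a fresh subtype of $t, so passes
;; can infer each field's constant value separately. TypeMerging can
;; later fold $t.1 and $t.2 back together if the split bought nothing.
(struct.new $t.1 (i32.const 1))
(struct.new $t.2 (i32.const 2))
```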
As with earlier suggestions, it is a good idea to experiment with various options in terms of the order and number of optimization cycles that you run.
See the main wasm-opt prompt engineering page for general advice on optimizer options.