-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC to remove old Cranelift x86 backend. #12
Conversation
5561603
to
e74a43e
Compare
backend? | ||
|
||
Question 2: Is there any functionality in the old backend that we have | ||
not yet adequately replicated in the new backend? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe couple of specialized instructions?
least register definitions and platform-specific legalizations. We | ||
will need to replace some of the legalizations that the new backend | ||
relies on with handwritten versions in the `simple_legalize` | ||
framework. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only simple_legalize runs for the new backends AFAIK.
Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this! I'd be actually happy to see the old backend go away :-)
Is there any functionality in the old backend that we have not yet
adequately replicated in the new backend?
Unfortunately, as much as I love the new backend, the old backend still has better performance in terms of compile times. I know that @cfallin has been working on a new register allocator (thanks!), but early results don't suggest a large improvement in terms of compile times compared to the current regalloc.rs
implementation that's used in the new backend, so even with those two we might still be slower at compiling code than the old backend. In the future, with the new constraint system, the new register allocator ought to be faster for sure, but there is a tiny risk that even then, the old-backend would compiler faster.
Of course, compilation speed isn't the main worry, and the new backend has only brought improvements in terms of maintainability, lowering complexity and improving the quality of the generated code. I simply slightly worry that if we abandoned the old backend too eagerly, then we would send the message that we don't really support the JIT use-case (viz. fast compile times), and that wasmtime ought to be used only in AOT contexts.
Alternatively, we could make acceptance of this RFC conditional on the new backend compiling not being X% slower at compilation than the old backend is, or just plan and prototype new improvements that ought to lower compile times in the new backend, so we can confidently avoid the risk of just getting worse compile times.
Thoughts?
Co-authored-by: Benjamin Bouvier <public@benj.me>
Indeed, I just spoke with @bnjbvr, and I agree with his main point: it would be really great to solve the compile-time issue before we remove the old backend, at least with the current set of tradeoffs. The old backend's maintenance and complexity costs impose on us more as time goes on, but for now (certainly for the next few months) this is minimal. The other point I hadn't considered was that, at some point as I continue to work on compile-time performance, next with instruction-selector work (which should move us over to a constraint-based API for the regalloc as @bnjbvr notes, hopefully with good effects), I'll want to actually compare with the old backend as well to show conclusively that we've improved compile time. For that reason, it would be useful to keep it working until then. It's still worthwhile to discuss how the removal would actually happen, and what simplifications it would allow, of course, so I think we should keep discussing that as ideas occur to us! |
Would it make sense to at least move the old backend to some sub-directory that clearly identifies it as legacy? Right now, it's not at all obvious which code one should look at as a newcomer to Cranelift, and fixing that seems quite valuable. |
Yes, I agree, moving |
These backends will be removed in the future (see bytecodealliance/rfcs#12 and the pending bytecodealliance#3009 in this repo). In the meantime, to more clearly communicate that they are using "legacy" APIs and will eventually be removed, this PR places them in an `isa/legacy/` subdirectory. No functional changes otherwise.
One thing I noticed with my PR to remove it from cranelift-codegen-meta was that compilation and execution of cranelift-codegen-meta becomes near instantaneously instead of taking like 20s. I think compilation of cranelift-codegen also became a bit faster. Stripping the whole old backend will likely improve compilation time of cranelift-codegen a lot and reduce the compiled size significantly. |
A clarification on my previous comment: we're not using the old backend anymore, and we would be fine removing it, as we've already taken and accepted the compile-times hit that came with it. The previous comment rather expressed a touch of worry that despite all its greatness, the new backend is a bit slower than the older one in terms of compilation times, which performance hasn't improved a lot over the last few months (not blaming anyone, of course; anyone, including us, could have contributed patches to that effect). The Cranelift team does have a plan now for improving this situation, by 1. using the new regalloc in the future, 2. using a better instruction selector to make use of the new regalloc. But even this plan has considerable risks, as we don't have a clear view on what the final compile-time performance will be, when compared to just keeping the old backend. Overall, my point is that we should make sure that the next framework improvement will not result in another compile-time performance hit, that we wouldn't be ready to take. For one, switching to the new regalloc before using its full capacities would be such a compile-time hit. I am confident that this isn't a controversial point of view, and I do note that the forces at hand are working hard towards lower compile-times in general, which is highly appreciated. |
Thanks @bnjbvr! I agree 100% that getting compile-times down is the most important goal we have in front of us w.r.t. performance work, and things should not become any slower than they are. The ongoing odyssey with regalloc performance has been a frustration to me personally but of course it's just a really fundamentally hard problem. Earlier perf benchmarking with regalloc2 and the SSA-based inputs, together with profiling results when using the regalloc.rs shim, suggest to me that the many reg-to-reg moves inserted when lowering to VCode are a significant source of slowdown; so the hope (educated guess?) is that with the operand-constraints model in the native regalloc2 API, this should get much better. That's still a risk until we get there and actually measure, though, as you note. My thought w.r.t. regalloc2 and ISLE is also that we will need to carefully measure perf before flipping any more switches, and I personally agree that we need to be really careful to avoid any regressions. Possibly this means that ISLE comes online first and we get to the point that we can then use regalloc2's API natively before switching it on by default, and just use the shim for fuzzing it until then; I'm not sure (and this should be discussed in a separate RFC). But for the topic at hand, I'll say for the record here that I agree, compile-perf should and must improve. |
(Though as an addendum to the above, I should note that last time I measured regalloc2 with the shim, it still improved compile-time performance slightly on most benchmarks, and more significantly in worst-cases like clang.wasm, so I guess I only mean that we should measure carefully, and possibly regalloc2 comes in before ISLE and still offers an intermediate improvement. Again, topic for another RFC!) |
Motion to finalize with a disposition to mergeGiven the latest comments above, I think that we should consider moving forward with this RFC. Specifically, it appears that (i) there are no active users of the old backend any more, and (ii) the maintenance burden continues to exist, and continues to raise little questions and issues at inopportune moments; so I think it is time! Stakeholders sign-off(I'm borrowing the stakeholders list from #14 here) ArmDFINITYEmbark StudiosFastly
Google/EnvoyIntelMicrosoftMozillaIBMwasmCloudUnaffiliated |
We have signoffs from multiple groups now, so: Entering Final Call Period
The FCP will end on Tuesday, September 28. |
This is fine with me. |
The FCP has now ended with no further discussion, so it's now time to merge this RFC. Thanks all for the many good points and discussion above! As a next step, we can look at moving forward the existing draft PR to remove the old x86 backend, and then further cleanups from there. |
Rendered
This RFC describes the process by which we hope to remove the old Cranelift x86 backend, with the appropriate consensus beforehand and the appropriate cleanups afterward.
It has been spurred by @bjorn3's work to draft the technical work of actually snipping it out in bytecodealliance/wasmtime#3009 (thanks!).