Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC to remove old Cranelift x86 backend. #12

Merged
merged 3 commits into from
Sep 28, 2021

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Jun 21, 2021

Rendered

This RFC describes the process by which we hope to remove the old Cranelift x86 backend, with the appropriate consensus beforehand and the appropriate cleanups afterward.

It has been spurred by @bjorn3's work to draft the technical work of actually snipping it out in bytecodealliance/wasmtime#3009 (thanks!).

@cfallin cfallin force-pushed the remove-old-backend branch from 5561603 to e74a43e Compare June 21, 2021 18:51
backend?

Question 2: Is there any functionality in the old backend that we have
not yet adequately replicated in the new backend?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe couple of specialized instructions?

least register definitions and platform-specific legalizations. We
will need to replace some of the legalizations that the new backend
relies on with handwritten versions in the `simple_legalize`
framework.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only simple_legalize runs for the new backends AFAIK.

Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
Copy link
Member

@bnjbvr bnjbvr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this! I'd be actually happy to see the old backend go away :-)

Is there any functionality in the old backend that we have not yet
adequately replicated in the new backend?

Unfortunately, as much as I love the new backend, the old backend still has better performance in terms of compile times. I know that @cfallin has been working on a new register allocator (thanks!), but early results don't suggest a large improvement in terms of compile times compared to the current regalloc.rs implementation that's used in the new backend, so even with those two we might still be slower at compiling code than the old backend. In the future, with the new constraint system, the new register allocator ought to be faster for sure, but there is a tiny risk that even then, the old-backend would compiler faster.

Of course, compilation speed isn't the main worry, and the new backend has only brought improvements in terms of maintainability, lowering complexity and improving the quality of the generated code. I simply slightly worry that if we abandoned the old backend too eagerly, then we would send the message that we don't really support the JIT use-case (viz. fast compile times), and that wasmtime ought to be used only in AOT contexts.

Alternatively, we could make acceptance of this RFC conditional on the new backend compiling not being X% slower at compilation than the old backend is, or just plan and prototype new improvements that ought to lower compile times in the new backend, so we can confidently avoid the risk of just getting worse compile times.

Thoughts?

Co-authored-by: Benjamin Bouvier <public@benj.me>
@cfallin
Copy link
Member Author

cfallin commented Jun 24, 2021

Indeed, I just spoke with @bnjbvr, and I agree with his main point: it would be really great to solve the compile-time issue before we remove the old backend, at least with the current set of tradeoffs. The old backend's maintenance and complexity costs impose on us more as time goes on, but for now (certainly for the next few months) this is minimal.

The other point I hadn't considered was that, at some point as I continue to work on compile-time performance, next with instruction-selector work (which should move us over to a constraint-based API for the regalloc as @bnjbvr notes, hopefully with good effects), I'll want to actually compare with the old backend as well to show conclusively that we've improved compile time. For that reason, it would be useful to keep it working until then.

It's still worthwhile to discuss how the removal would actually happen, and what simplifications it would allow, of course, so I think we should keep discussing that as ideas occur to us!

@tschneidereit
Copy link
Member

Would it make sense to at least move the old backend to some sub-directory that clearly identifies it as legacy? Right now, it's not at all obvious which code one should look at as a newcomer to Cranelift, and fixing that seems quite valuable.

@cfallin
Copy link
Member Author

cfallin commented Jun 24, 2021

Yes, I agree, moving isa/x86/ to isa/legacy/x86/ or something like that would be a good first-step-to-removal. I'll put together a quick PR for that.

cfallin added a commit to cfallin/wasmtime that referenced this pull request Jun 24, 2021
These backends will be removed in the future (see
bytecodealliance/rfcs#12 and the pending bytecodealliance#3009 in this repo).

In the meantime, to more clearly communicate that they are using
"legacy" APIs and will eventually be removed, this PR places them in an
`isa/legacy/` subdirectory. No functional changes otherwise.
@bjorn3
Copy link
Contributor

bjorn3 commented Jun 24, 2021

The old backend's maintenance and complexity costs impose on us more as time goes on, but for now (certainly for the next few months) this is minimal.

One thing I noticed with my PR to remove it from cranelift-codegen-meta was that compilation and execution of cranelift-codegen-meta becomes near instantaneously instead of taking like 20s. I think compilation of cranelift-codegen also became a bit faster. Stripping the whole old backend will likely improve compilation time of cranelift-codegen a lot and reduce the compiled size significantly.

@bnjbvr
Copy link
Member

bnjbvr commented Sep 16, 2021

A clarification on my previous comment: we're not using the old backend anymore, and we would be fine removing it, as we've already taken and accepted the compile-times hit that came with it.

The previous comment rather expressed a touch of worry that despite all its greatness, the new backend is a bit slower than the older one in terms of compilation times, which performance hasn't improved a lot over the last few months (not blaming anyone, of course; anyone, including us, could have contributed patches to that effect).

The Cranelift team does have a plan now for improving this situation, by 1. using the new regalloc in the future, 2. using a better instruction selector to make use of the new regalloc. But even this plan has considerable risks, as we don't have a clear view on what the final compile-time performance will be, when compared to just keeping the old backend.

Overall, my point is that we should make sure that the next framework improvement will not result in another compile-time performance hit, that we wouldn't be ready to take. For one, switching to the new regalloc before using its full capacities would be such a compile-time hit. I am confident that this isn't a controversial point of view, and I do note that the forces at hand are working hard towards lower compile-times in general, which is highly appreciated.

@cfallin
Copy link
Member Author

cfallin commented Sep 16, 2021

Thanks @bnjbvr!

I agree 100% that getting compile-times down is the most important goal we have in front of us w.r.t. performance work, and things should not become any slower than they are. The ongoing odyssey with regalloc performance has been a frustration to me personally but of course it's just a really fundamentally hard problem. Earlier perf benchmarking with regalloc2 and the SSA-based inputs, together with profiling results when using the regalloc.rs shim, suggest to me that the many reg-to-reg moves inserted when lowering to VCode are a significant source of slowdown; so the hope (educated guess?) is that with the operand-constraints model in the native regalloc2 API, this should get much better. That's still a risk until we get there and actually measure, though, as you note.

My thought w.r.t. regalloc2 and ISLE is also that we will need to carefully measure perf before flipping any more switches, and I personally agree that we need to be really careful to avoid any regressions. Possibly this means that ISLE comes online first and we get to the point that we can then use regalloc2's API natively before switching it on by default, and just use the shim for fuzzing it until then; I'm not sure (and this should be discussed in a separate RFC). But for the topic at hand, I'll say for the record here that I agree, compile-perf should and must improve.

@cfallin
Copy link
Member Author

cfallin commented Sep 16, 2021

(Though as an addendum to the above, I should note that last time I measured regalloc2 with the shim, it still improved compile-time performance slightly on most benchmarks, and more significantly in worst-cases like clang.wasm, so I guess I only mean that we should measure carefully, and possibly regalloc2 comes in before ISLE and still offers an intermediate improvement. Again, topic for another RFC!)

@cfallin
Copy link
Member Author

cfallin commented Sep 16, 2021

Motion to finalize with a disposition to merge

Given the latest comments above, I think that we should consider moving forward with this RFC. Specifically, it appears that (i) there are no active users of the old backend any more, and (ii) the maintenance burden continues to exist, and continues to raise little questions and issues at inopportune moments; so I think it is time!

Stakeholders sign-off

(I'm borrowing the stakeholders list from #14 here)

Arm

DFINITY

Embark Studios

Fastly

Google/Envoy

Intel

Microsoft

Mozilla

IBM

wasmCloud

Unaffiliated

@cfallin
Copy link
Member Author

cfallin commented Sep 17, 2021

We have signoffs from multiple groups now, so:

Entering Final Call Period

https://github.com/bytecodealliance/rfcs/blob/main/accepted/rfc-process.md#making-a-decision-merge-or-close

Once any stakeholder from a different group has signed off, the RFC will move into a 10 calendar day final comment period (FCP), long enough to ensure that other stakeholders have at least a full business week to respond.

The FCP will end on Tuesday, September 28.

@uweigand
Copy link
Member

This is fine with me.

@cfallin
Copy link
Member Author

cfallin commented Sep 28, 2021

The FCP has now ended with no further discussion, so it's now time to merge this RFC. Thanks all for the many good points and discussion above! As a next step, we can look at moving forward the existing draft PR to remove the old x86 backend, and then further cleanups from there.

@cfallin cfallin merged commit 2821d03 into bytecodealliance:main Sep 28, 2021
@cfallin cfallin deleted the remove-old-backend branch September 28, 2021 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants