-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fiber switching for WebAssembly #13107
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of stylistic suggestions.
Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
Thanks! Too much time not programming in a beautiful language, I guess 😂 |
Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
…nto feat/wasm32-fibers
…dy enabled by default.
Turns out LLVM produces some really repetitive code when not optimizing. The generated wasm code for We can either run the spec with V8 or optimize before running with wasmtime. As wasmtime is such a popular runtime for wasm, I opted for optimizing before executing. This only became a problem with this PR because the asyncify pass increases the generated code size even more and it crossed some limit. |
So does that mean the workaround in #4516 (comment) somehow does not work when targetting WASM? |
Yes, it still generates too much code. Also, compiling the specs in release mode takes 20s on my machine. But without optimizations it takes 50s! I didn't check, but these huge functions are most likely the cause. Using static data embedded in a string literal would do wonders here. #4516 (comment) |
.github/workflows/wasm32.yml
Outdated
@@ -51,7 +51,7 @@ jobs: | |||
run: make crystal # TODO: Remove this after next version update | |||
|
|||
- name: Build spec/wasm32_std_spec.cr | |||
run: bin/crystal build spec/wasm32_std_spec.cr -o wasm32_std_spec.wasm --target wasm32-wasi --release | |||
run: bin/crystal build spec/wasm32_std_spec.cr -o wasm32_std_spec.wasm --target wasm32-wasi -Duse_pcre --release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wasm lib for PCRE2 should be available since #13109. Why -Duse_pcre
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just merging master, so this is what's in master today
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, but that seems to be a mistake. Apparently 46a6ac7 unintentionally merged it back in.
I think we can just remove it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it make more sense to fix master and re-merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, maybe. Don't think it matters much, though.
Just stating the obvious: wasm specs are failing.
|
I did a significant refactoring of this PR to make is safer and more organized. Relevant changes:
Unfortunately the CI is still failing and I can't reproduce locally (even with wasm being almost deterministic). As the crash is inside dlmalloc, I'm suspecting that we are running out of memory and we need the GC... I'll investigate that futher. |
It's fixed! What was going on?By default the main stack comes after data and grows down. Nothing prevents it from overflowing and overwriting something in the data section. This is very dangerous.
The thing I overlooked is that by default the stack has only 64KB. That's very likely to be too short for most Crystal programs. Also, it seems that the See this issue: WebAssembly/wasi-libc#233
How was it fixed?The first change is to use the linker flag This is the new layout:
The stack can still grow into the unused region, but that's fine since this region is really unused. I'm still not sure why I couldn't reproduce this locally and it only happened with the CI. C'est la vie. Please give me another review round. Thanks! |
@@ -384,6 +384,7 @@ module Crystal | |||
[{cl, cmd, nil}] | |||
elsif program.has_flag? "wasm32" | |||
link_flags = @link_flags || "" | |||
link_flags += " --stack-first -z stack-size=#{8 * 1024 * 1024}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we should reuse Fiber::StackPool::STACK_SIZE
or perhaps introduce a more generic property for this. It would also be great to allow configuration of the stack size at compile time.
Perhaps a generic CRYSTAL_STACK_SIZE
environment variable could be useful for this? It would populate this value here and Fiber::StackPool::STACK_SIZE
and have 8MB as default value.
Just thinking ahead, this can be implemented outside this PR. (It's probably better to extract it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like having a CRYSTAL_STACK_SIZE
environment variable! But let's do that on another PR.
Does this PR alone enable any new standard library specs? |
Not any entire spec file. But there are some I can enable by adding flag checks inside the spec files. Should I? Channels is a good example. Most tests pass, but some require exception handling to work. |
Yes, that would be great. All specs that work should be enabled 👍 |
I'm not sure what the progress of this was, but I merged Bizarrely enough though I can't run all the specs, because I ran into a cranelift bug for wasmtime, and wasmer's single-pass compiler crashes on arm64, and some other runtimes fail because the functions are so large...etc etc. (I think LLVM's being a little over-aggressive with its inlining(?), but I digress.) |
This PR is the base work to support GC and Event Loop on Wasm. With these changes fibers are now usable and you can play with Channels, as long as you don't do IO or
sleep
inside a Fiber.This PR requires some explanation since stack switching on WebAssembly is very different from other platforms.
WebAssmbly as built by LLVM has 3 stacks:
local.get
andlocal.set
instructions. These frames are private and can't be manipulated outside of the function.__stack_pointer
global and grows down. This is not a feature of WebAssembly, just a convention that LLVM uses.We need to save and restore all three stacks for this to work.
There is an ongoing proposal to add stack switching to WebAssembly, but as of this writing it is not standardized and isn't implemented by any runtime. But I believe it will still take a few years before it is supported everywhere. At this point, this proposal is just an "optimization" of what Asyncify already does. See details at https://github.com/WebAssembly/stack-switching.
Here we use an alternative implementation on top of Binaryen's Asyncify pass. It is a static transformation on the WebAssembly program that must be performed after the Crystal program is compiled. It rewrites every relevant function so that it writes its stack frame to memory and rewinds to the caller, as well as recovering the stack from memory. It is all controlled by a global state variable. Please read more details at https://github.com/WebAssembly/binaryen/blob/main/src/passes/Asyncify.cpp.
Stack switching is accomplished by the cooperation between two functions:
Fiber.swapcontext
andFiber.wrap_with_fibers
. Before they are explained, the memory layout must be understood:The first 1024 bytes of memory are unused (but they are usable, including the address 0!). Follows the static data like constant strings. This would be read only data on other platforms, but here it is writable. The special symbols
__data_start
and__data_end
are generated by LLVM to mark this region. Then follows the main stack. A__heap_base
symbol points to the bottom of this stack. The global__stack_pointer
starts at this position and is moved by the program during execution. After that, the rest of the memory follows, used mainly by malloc. It is important to note that during execution everything is just memory, there is no real difference between those sections.For our implementation of stack switching we manage this memory layout on every stack:
The header stores 4 pointers as metadata:
[0]
=> A function pointer to fiber_main[1]
=> A pointer to the Fiber instance[2]
=> The current position on the asyncify buffer (starts after the header, grows up)[4]
=> The end position on the asyncify buffer (the current stack top)On the main stack all of them are guaranteed to start null.
Stack switching should happens as follows:
CrystalMain
should callFiber.wrap_with_fibers
, passing a proc to the main function.Fiber.swapcontext
will be called. It will:a. store the new context at
Fiber.@@next_context
.b. store the current stack_pointer global on context.stack_top.
c. update the current position of the asyncify buffer to be just after the header
d. update the end position of the asyncify buffer to be the current stack top
e. mark
Fiber.is_manipulating_stack_with_asyncify = true
f. begin stack unwinding with
LibAsyncify.start_unwind()
and returns.Fiber.wrap_with_fibers
again. We know that we are unwinding asFiber.is_manipulating_stack_with_asyncify
is marked. This means we have to either start a new fiber or rewind into a previously running fiber. If there is a asyncify buffer, then setup the rewinding process. Then call into the fiber main function. If it's null, then this is the main fiber, just call the original block.For all of this to work we need to run the asyncify pass on the compiler output after linking. I modified the compiler to just do that using Binaryen
wasm-opt
. As I was already there, I also enabled running Binaryen optimizations when building with--release
, those should improve the output size a lot.Also, seems like something is broken when importing custom wasm modules and building without single_module: the import module name disappears. I'll open an issue for that, but in the mean time I added a workaround to always build wasm as a single module. It is required to "link" against asyncify.
In the future the GC will work by unwinding and then rewinding the same fiber again, so that it is able to scan pointers on the asyncify buffer.