-
-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use .init_array on all WASM targets, not just WASI/Emscripten #76
Conversation
What happens if the embedder makes more than a single function call into the same instance of a module? |
If that happens despite the heuristics determining that it shouldn't, then the constructors will get called multiple times. |
Specifically for the constructors used by this crate, what would be the observable behavior for someone using this crate's public API? |
I think you'd wind up with collections returning multiple copies of everything submitted to them. To be clear, to whatever extent this is a problem, it's problem for all WASM targets. I don't think this PR exacerbates it. |
The underlying data structure for inventory is a linked list of statically allocated nodes, so I was uncertain how that could end up containing a variable number of copies of each submitted value dependent on the runtime number of calls to __wasm_call_ctors. Having n duplicates of each value for n init calls is not one of the possible options with this data structure. The only possible outcomes are:
From inspecting Registry::submit, the actual behavior is 2, so inventory::iter::<T> would be an infinite loop. That is bad enough that I think it would need to be solved before declaring that wasm is supported. Either the submit logic needs to be changed (on cfg wasm only) or the iterator needs to correctly handle a circular list by stopping when the list's head is reached for a second time. From https://github.com/llvm/llvm-project/blob/llvmorg-19.1.6/lld/wasm/Writer.cpp#L1525-L1591, it seems like the __wasm_call_ctors implementation in wasm-ld is geared around the following kind of code: namespace {
std::string s = compute_s();
} where calling the ctor on every entry into the module, and calling the dtor on every exit, is fine and doesn't result in a memory leak. It seems misguided to me, considering scenarios like: const std::vector<T> vec = {...};
void enqueue(std::size_t i) {
static const T *prev;
if (prev) {
dequeue(*prev);
}
prev = &vec[i];
} which is fine on non-wasm, but on wasm, the dtor+ctor between consecutive calls would cause use-after-free. The wasm-ld implementation I would have expected is that it generates a boolean that remembers whether constructors have already run, and only runs them on the first call. But I guess they considered that problematic due to destructors never being run. |
Ah my bad; I only glanced over the linked-list code and didn't notice that the allocations were static. So yeah, a circular list would be the result here, which is rather bad because it would be difficult for users of the crate to figure out what was going wrong. I think a wasm-only change to the submit logic is the easiest and most robust solution. I'll amend this PR to add that. |
I've updated this PR to ensure idempotence. I tested it by manually invoking __wasm_call_ctors() multiple times from the entry point of a test program. This confirmed both that, without this patch, it results in a circular linked list, and that it works correctly when this patch is included. I also updated the original commit by tweaking the target conditionals to ensure that it's logically impossible for them to overlap, and added some documentation about WebAssembly support and how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
wasm-ld
's support for.init_array
isn't specific to WASI or Emscripten; it works regardless of what platform is being targeted. This PR adjusts the conditional-compilation attributes accordingly.I was surprised that getting
wasm32-unknown-unknown
working was quite this easy, because I expected that in order to get constructors to run, I would have to explicitly invoke__wasm_call_ctors
from the top of my entrypoint function, but it turns out thatwasm-ld
inserts this automatically!This confused me for a while because when I looked at the disassembly I wasn't seeing any safeguards against
__wasm_call_ctors
being called multiple times if the embedder makes multiple calls into an instantiated module. It turns out thatwasm-ld
has a somewhat crummy heuristic for handling this: if you're doing what it considers a "command-style" link and you never invoke__wasm_call_ctors
explicitly anywhere in your module, then it assumes that the embedder is only ever going to make one call into any given instance of the module and that it should put the__wasm_call_ctors
call at the top of every exported function. Otherwise, it refrains and assumes that the embedder will take responsibility for causing__wasm_call_ctors
to run at the appropriate time.That seems a little janky to me, but the jank is orthogonal to anything that I think this crate needs to concern itself with. So I think this PR is correct, and if anything needs to change about how
__wasm_call_ctors
is called then that's for LLVM to deal with.