Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use .init_array on all WASM targets, not just WASI/Emscripten #76

Merged
merged 3 commits into from
Jan 7, 2025

Conversation

dfoxfranke
Copy link
Contributor

@dfoxfranke dfoxfranke commented Jan 6, 2025

wasm-ld's support for .init_array isn't specific to WASI or Emscripten; it works regardless of what platform is being targeted. This PR adjusts the conditional-compilation attributes accordingly.

I was surprised that getting wasm32-unknown-unknown working was quite this easy, because I expected that in order to get constructors to run, I would have to explicitly invoke __wasm_call_ctors from the top of my entrypoint function, but it turns out that wasm-ld inserts this automatically!

This confused me for a while because when I looked at the disassembly I wasn't seeing any safeguards against __wasm_call_ctors being called multiple times if the embedder makes multiple calls into an instantiated module. It turns out that wasm-ld has a somewhat crummy heuristic for handling this: if you're doing what it considers a "command-style" link and you never invoke __wasm_call_ctors explicitly anywhere in your module, then it assumes that the embedder is only ever going to make one call into any given instance of the module and that it should put the __wasm_call_ctors call at the top of every exported function. Otherwise, it refrains and assumes that the embedder will take responsibility for causing __wasm_call_ctors to run at the appropriate time.

That seems a little janky to me, but the jank is orthogonal to anything that I think this crate needs to concern itself with. So I think this PR is correct, and if anything needs to change about how __wasm_call_ctors is called then that's for LLVM to deal with.

@dtolnay
Copy link
Owner

dtolnay commented Jan 6, 2025

What happens if the embedder makes more than a single function call into the same instance of a module?

@dfoxfranke
Copy link
Contributor Author

If that happens despite the heuristics determining that it shouldn't, then the constructors will get called multiple times.

@dtolnay
Copy link
Owner

dtolnay commented Jan 6, 2025

Specifically for the constructors used by this crate, what would be the observable behavior for someone using this crate's public API?

@dfoxfranke
Copy link
Contributor Author

I think you'd wind up with collections returning multiple copies of everything submitted to them.

To be clear, to whatever extent this is a problem, it's problem for all WASM targets. I don't think this PR exacerbates it.

@dtolnay
Copy link
Owner

dtolnay commented Jan 7, 2025

The underlying data structure for inventory is a linked list of statically allocated nodes, so I was uncertain how that could end up containing a variable number of copies of each submitted value dependent on the runtime number of calls to __wasm_call_ctors. Having n duplicates of each value for n init calls is not one of the possible options with this data structure.

The only possible outcomes are:

  1. correct behavior, i.e. init is accidentally idempotent
  2. linked list becomes circular
  3. undefined behavior

From inspecting Registry::submit, the actual behavior is 2, so inventory::iter::<T> would be an infinite loop. That is bad enough that I think it would need to be solved before declaring that wasm is supported. Either the submit logic needs to be changed (on cfg wasm only) or the iterator needs to correctly handle a circular list by stopping when the list's head is reached for a second time.

From https://github.com/llvm/llvm-project/blob/llvmorg-19.1.6/lld/wasm/Writer.cpp#L1525-L1591, it seems like the __wasm_call_ctors implementation in wasm-ld is geared around the following kind of code:

namespace {
std::string s = compute_s();
}

where calling the ctor on every entry into the module, and calling the dtor on every exit, is fine and doesn't result in a memory leak. It seems misguided to me, considering scenarios like:

const std::vector<T> vec = {...};

void enqueue(std::size_t i) {
  static const T *prev;
  if (prev) {
    dequeue(*prev);
  }
  prev = &vec[i];
}

which is fine on non-wasm, but on wasm, the dtor+ctor between consecutive calls would cause use-after-free.

The wasm-ld implementation I would have expected is that it generates a boolean that remembers whether constructors have already run, and only runs them on the first call. But I guess they considered that problematic due to destructors never being run.

@dfoxfranke
Copy link
Contributor Author

Ah my bad; I only glanced over the linked-list code and didn't notice that the allocations were static. So yeah, a circular list would be the result here, which is rather bad because it would be difficult for users of the crate to figure out what was going wrong. I think a wasm-only change to the submit logic is the easiest and most robust solution. I'll amend this PR to add that.

@dfoxfranke
Copy link
Contributor Author

I've updated this PR to ensure idempotence. I tested it by manually invoking __wasm_call_ctors() multiple times from the entry point of a test program. This confirmed both that, without this patch, it results in a circular linked list, and that it works correctly when this patch is included.

I also updated the original commit by tweaking the target conditionals to ensure that it's logically impossible for them to overlap, and added some documentation about WebAssembly support and how __wasm_call_ctors works.

Copy link
Owner

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dtolnay dtolnay merged commit b131245 into dtolnay:master Jan 7, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants