-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient memory passing between WASM and host #314
Comments
Being able to use caller-supplied memory buffers for dynamically-sized return values of import calls has been discussed and also filed in #175. Having thought about it a bit more in the interim, it does seem quite doable (even addressing the tricky cases mentioned in #175) to add as a new
If you're writing a preview2-to-1 adapter by hand (and thus you know the semantics of the call and thus that, in the function signature
and if the preview2 caller does the above cute-little-hack, then I think the whole path ends up forwarding the original caller-supplied buffer all the way to the preview1 call. There is the additional overhead of the |
Thanks @lukewagner , I'm glad to see that this scenario was taken into account already and there's a clean solution for that. I've created a simple example just to make sure I understand what you propose (there's lots of corner cases not being handled, it's just for demonstration purposes) - let me know if I got this right:
The implementation is slightly different to what you proposed as I don't override realloc implementation. By doing that, we could probably apply the same optimization for non-adapter usecases, but I'd rather keep the first iteration as non-invasive as possible (although this approach won't work if the interface returns more than one list, which I don't think is the case for the preview1 adapter). We can extend the approach as next step if needed (or just wait for the #175 to have a proper support for that). Given that supporting the adapter would require changes in wasi-libc, I'm curious if the maintainers would be open for those modifications (@sunfishcode , @sbc100, @abrown ). I think we'd need to have a high-level agreement whether the adapter is a way to go to support preview1 in wasi-libc, but for now just testing the water - would that be something we could perhaps have in wasi-libc? |
The wasmtime-(compiled to wasm)-based adapter @cpetig has been working on might fit the bill here. |
Thanks for sharing. I actually used some of the work done by @cpetig shared on Zulip to generate core modules with preview2 cannonical ABI to test my adapter :) (thanks @cpetig for the work BTW). |
@loganek I might be missing a constraint, but I think it should be possible to write this preview2-to-1 adapter without any changes to wasi-libc, since the adapter doesn't have to know about this realloc trick; it can just call realloc as a black box (and maybe a caller-supplied buffer is used, and maybe not). The trick is how to get line 22 of your adapter to be able to call the preview2-core-module-defined |
Hmm, I can't imagine making the change without modifications of the wasi-libc; the wasi-libc code is the only place where the caller-supplied buffer is available. Once the preview2 function is being called, this information is lost. So I think that somewhere before the preview2 function is called, the wasi-libc must store the pointer of the caller-supplied buffer (for example, see here: https://github.com/WebAssembly/wasi-libc/pull/477/files#diff-36afeb04f659e0e36e627b55010e437bd0855e5a0b6edf82348cb7528b3c3d17R48). Also, wasi-libc unconditionally copies the data from the buffer returned by preview2 function into a caller-supplied buffer, and frees the list (e.g. here https://github.com/WebAssembly/wasi-libc/pull/477/files#diff-36afeb04f659e0e36e627b55010e437bd0855e5a0b6edf82348cb7528b3c3d17R60-R61). If the buffer is the one that the caller provided, that function will try to release that buffer (which is wrong). So I don't really see how we could do that optimization without changing a bit wasi-libc code. I think preview1->preview2 is a bit different - preview1 implementation calls preview2 function. Because preview1 has direct access to the buffer, it can do the "pseudo" allocation within the adapter. Let me know if I'm missing anything obvious here. I'd love to encapsulate all the hacky logic within the adapter, but can't think of any good way to do that. |
Ah, I think the missing link is that while, iiuc, the preview1-to-2 adapter does do this caller-provided-buffer optimization, it looks like the native preview2 wasi-libc implementation does not (yet). But let's say that native preview2 wasi-libc added the caller-provided-buffer optimization (which would be well-motivated as a pure optimization): then this is not a detail that the preview2-to-1 adapter would have to care about: the preview2-to-1 adapter simply calls wasi-libc's Does that make sense? If so, then what's nice is that there is no extra ABI contract beyond what plain preview2 specifies, which should reduce coupling. |
@lukewagner ok looks like we both talked about the same thing :) So we both agree that wasi-libc changes are necessary - I'll prepare a brief plan with changes and present it to the community. In my example repo I'm doing exactly that:
I implement it as a separate function because I simply didn't want to modify the auto-generated Let me know if you're aligned with above and if so, I think we can close the issue. |
Ah, ok, gotcha, good to hear. Overall, it seems like a good optimization to have, so I think we're aligned that it'd be a good idea to do somehow. I'm not sure where exactly in the toolchain is the right place to make those changes, but what you're saying sounds plausible, so I'll leave you to it to discuss with the folks working on the individual parts. |
Hi @lukewagner , could you please explain a bit more about how to use using caller-supplied buffer for avoiding allocation and copy during calling read() API. I made a diagram for the workingflow based on my understanding. It may be inaccurate. Could you please share how to apply this method as you mentioned? |
Talk it a bit with my colleague Liang He. We figure it might be something like below: This workflow only works for the native as the import provider. It implies the wasm runtime will do private handling for the native implementer of the WIT interface, like skipping all the memory reallocation and copying during the function call. And the native implementation of import and libc read() will share some special information. |
Hi @xwang98 I think there important thing to know is that there's a The proposal here is that in the libc's // This is WASI libc code
typedef struct {
void* buffer;
size_t len;
} user_allocator_t;
_Thread_local user_allocator_t user_allocator = {0};
void *cabi_realloc(void *ptr, size_t old_size, size_t align, size_t new_size) {
if (user_allocator.buffer && user_allocator.len >= new_size) {
return buffer;
} else {
// buffer wasn't provided, fallback to malloc
return malloc(new_size);
}
}
void read(const char* path, char *buf, size_t buf_size)
{
user_allocator.buffer = buf;
user_allocator.len = buf_size;
// we don't pass the buffer, but cabi_realloc knows about it already
void* ret = preview2_read(path, size); // call a function from the runtime
if (ret != buf) {
// something went wrong, and runtime couldn't use the caller-provided buffer
memcpy(buf, ret, size);
free(ret);
}
// reset caller-provided allocator
user_allocator.buffer = NULL;
user_allocator.len = 0;
}
// This is Runtime implementation
void* preview2_read(const char* path, size)
{
void *buf = cabi_realloc(NULL, 0, 1, size);
fread(buf, size, 1, file);
return buf;
} The implementation above just presents a high-level idea and obviously misses a lot of edge cases, but I hope that makes things clear. When it comes to preview2->preview1 adapter, the I've implemented the idea here: https://github.com/loganek/wasm-wsp2-efficient-memory/ (please note though that I don't use |
@loganek it looks like our lastest diagram pretty much follows what you explained. The following question is what if there are two It looks like a potential need for WIT to support a dual-access memory concept. It is not preferred to introduce too many tricks in the libc implementation as it is so foundational. |
@xwang98 I was mainly interested in the preview2 to preview1 adapter scenario and I don't think there's a need for more than one pointer (but didn't look at the all API s yet, so I might be wrong). One idea that comes to my mind though is to have a list/stack of buffer pointers in the allocator instead of just one, and consecutive calls to realloc would get/pop pointers from the stack. That'd however require a contract between the wash code and runtime code that defines the order of allocations. |
may or may not be helpful. just FYI: WebAssembly/WASI#594 |
Hi all,
I do apologize in advance if this topic was already discussed and documented somewhere - if that's the case, I'd be grateful for any pointers to those discussions.
I've been recently looking into some of the ways we can produce preview1-compatible WASM modules and at the same time not affecting the overall development of WASI. In our case, we run WASM runtime on customer devices, and the runtime in some cases can't be updated - so we're stuck with preview1 in the host for at least 4-5 years (and likely longer).
I've explored a few different options to stay up-to-date with the tooling (mainly, wasi-libc) and still remain compatible with the old runtimes we have in production:
Whereas all three options are still on the table, I think the 3rd one is the least intrusive one, requires very little maintenance overhead (I think it's almost an one-time effort) and enables us to completely abandon preview1 references in the tooling.
I've started working on a prototype of the adapter. So far I've prototyped (it's very buggy and limited, don't use it yet :) ) clock and sockets APIs (using WAMR-specific preview1-like interfaces). I was able to run a simple tcp client/server application using @dicej 's wasi-libc branch. When implementing the streams API I realized that interfaces that return a list, e.g.
don't allow users to pass a pre-allocated buffer (e.g. on the stack); instead, host is expected to call WASM's malloc to allocate a memory for the return buffer. This is a problem for many embedded applications where dynamic allocations are not recommended or even not allowed.
In addition to that, for this to work with libc, the buffer allocated by the host must be then copied to a buffer provided as a parameter for the libc function, so we have something like (very high-level flow, missing lots of details but hopefully clearly explains the problem):
So we not only spend some cycles to go from host back to WASM to call allocation function (which depending on the implementation might be slow) but also we only allocate a memory temporarily to copy the value back to a buffer already allocated by the user. This is how it works in wasi-libc and I'm not sure if other languages have similar problem - even if not, C and other languages relying on wasi-libc like C++ or Rust are probably popular enough to not neglect this problem.
I understand why list (and perhaps other data types) used as return values require allocating memory - WASM code doesn't know how big the return value is, so it's reasonable to let runtime do the allocation. However, there are some cases (like read function) where user already provides the maximum requested size, and in those cases they might want to also provide a buffer that's been already allocated.
I wonder if the problem of efficient data passing between host and WASM was discussed, and if so, what's the recommendation? I'm not sure whether this is a problem with a component model per se, or is it more a problem with a design of specific interfaces (or both)? For example, would it be possible to have a
read
function to be something like:? So the host knows that the list already points to a buffer of a specific length, and it should fill it with the data? This is just one of the ideas, but I'm curious what others think.
Please note it's not just a problem for the adapter from preview2 to preview1, and it's not only about any specific interface (I can imagine lots of different proposals follow the same pattern). If the problem is not being addressed, I think it might be a blocker for some of the embedded usecases to onboard to preview2+ (those projects would either stick to preview1 in some form, or not use WASM at all).
The text was updated successfully, but these errors were encountered: