RFC: Allow for static, freeable, String for fast handoff to Ruby #6
It would be so cool if it was possible to get this working. The following is me just writing things out as I'm thinking through it; sorry it's long and rambling.

For some reason I was under the impression that Ruby strings (despite storing their length) had to be NULL terminated, but now I can't find any reference to that. If that's not an obstacle, then freeing the string is the tricky part. It's not safe for Ruby to use its allocator to free memory Rust has allocated, as there's no guarantee they are the same allocator. As soon as the

I think the safe way would be to use a finaliser; this is basically how Ruby lets you handle freeing C/Rust data 'wrapped' in a Ruby object. Finalisers are called some time after the object has been GC'd, so you know Ruby doesn't have any references, and neither should any (correctly written) C/Rust extensions.

Unfortunately the finaliser API for plain Ruby objects isn't great, and I've not implemented the API for it yet. The signature is `VALUE rb_define_finalizer(VALUE obj, VALUE block);`, so you first have to construct a block with `VALUE rb_proc_new(rb_block_call_func_t func, VALUE callback_arg);`, where `rb_block_call_func_t` points to a function with the signature `VALUE rb_block_call_func(VALUE yielded_arg, VALUE callback_arg, int argc, const VALUE *argv, VALUE blockarg)`.

So the finaliser block is called without any arguments, so So we get to provide a function pointer If we just had some data on the heap in a The problem with Ruby has a few APIs like this, and the way to work around it is you pass a pointer to a closure as This is easy in say Aside: I have just realised I screwed up

Once we're heap allocating a closure and relying on Ruby's finaliser API (which isn't really designed to handle more than a small number of finalisers, and slows down the GC), it might be that we've lost the benefits of not copying the string.
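The workaround mentioned above applies to C APIs that accept only a function pointer plus one opaque argument (like `callback_arg` in `rb_proc_new`). It is usually done by boxing the closure, passing the raw pointer as that argument, and letting a generic `extern "C"` trampoline turn it back into the closure. What follows is a minimal sketch, not magnus code. `call_with_data` is a hypothetical stand-in for the C side so the example runs without Ruby, and it uses `*mut c_void` where Ruby would use a `VALUE`.

```rust
use std::ffi::c_void;

// Shape of a C callback that receives one opaque data pointer.
type Callback = extern "C" fn(data: *mut c_void);

// Hypothetical stand-in for a C API such as `rb_proc_new`: it stores nothing
// and simply invokes the callback once, so the example runs without Ruby.
fn call_with_data(func: Callback, data: *mut c_void) {
    func(data);
}

// Trampoline: a non-capturing `extern "C"` function, monomorphised for the
// concrete closure type `F`, that rebuilds the boxed closure and calls it.
extern "C" fn trampoline<F: FnOnce()>(data: *mut c_void) {
    // SAFETY: `data` came from `Box::into_raw` in `run_once` with this same
    // `F`, and the callback is invoked exactly once, so it is freed once.
    let closure: Box<F> = unsafe { Box::from_raw(data as *mut F) };
    closure();
}

// Heap-allocate the closure and pass it through the opaque data argument.
fn run_once<F: FnOnce()>(f: F) {
    let data = Box::into_raw(Box::new(f)) as *mut c_void;
    call_with_data(trampoline::<F>, data);
}
```

With a real finaliser the callback would run later, during GC, so the boxed closure (and anything it owns) sits on the heap until then. That per-string allocation is part of the overhead the comment above is worried about.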
But if you'd still like to give it a try, I think the function for setting a finaliser should have this signature (in the `gc` module):

```rust
use std::ops::Deref;

pub fn define_finalizer<T, F>(value: T, func: F) -> Result<(), Error>
where
    T: Deref<Target = Value>,
    F: FnOnce(),
{
    todo!();
}
```

It could probably just use and then you could do something like:

```rust
// I think this name and signature make it clear it's taking ownership of `s`
// to convert it to a `Self`
pub fn from_string(s: String) -> Self {
    let ptr = s.as_ptr();
    let len = s.len();
    let r_string = unsafe { Self::new_lit(ptr as _, len as _) };
    // as far as Rust knows the ownership of `s` is moved into the
    // finaliser and it'll leave the memory `s` points to alone until
    // the finaliser is run. The finaliser func doesn't do anything with
    // `s`, it'll just drop it once it's run.
    // The finaliser is run some time after the string is GC'd, so nothing
    // will still be using it
    crate::gc::define_finalizer(r_string, move || {
        // I think this is how you'd force something to move into a closure
        // but I don't remember and haven't tested this
        let _s = s;
    }).unwrap(); // define_finalizer can fail, we don't expect it to here
    r_string
}
```

Some other things to check would be how this interacts with cloning/duping the string in Ruby, and methods that return copy-on-write references to the original string (I think
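The ownership idea behind `from_string` can be seen in isolation with a hypothetical `FakeRString` standing in for a Ruby string. This is a sketch, not magnus's API, and nothing in it touches Ruby. `FakeRString` keeps only a pointer and length into the Rust `String`'s buffer, and owns a finaliser closure that holds the `String` itself. Dropping the value plays the role of Ruby's GC running the finaliser.

```rust
/// Hypothetical stand-in for a Ruby string that borrows bytes owned by a
/// finaliser, and runs that finaliser when it is "collected" (dropped).
struct FakeRString {
    ptr: *const u8,
    len: usize,
    finalizer: Option<Box<dyn FnOnce()>>,
}

impl FakeRString {
    fn from_string(s: String) -> Self {
        let ptr = s.as_ptr();
        let len = s.len();
        // Moving `s` into the closure moves only its (ptr, len, capacity)
        // header; the heap buffer stays where it is until the closure runs.
        let finalizer: Box<dyn FnOnce()> = Box::new(move || drop(s));
        FakeRString { ptr, len, finalizer: Some(finalizer) }
    }

    fn as_bytes(&self) -> &[u8] {
        // SAFETY: the buffer is owned by `self.finalizer`, which has not run
        // yet because it only runs in `drop`.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl Drop for FakeRString {
    fn drop(&mut self) {
        // "GC": run the finaliser, which drops the original `String`.
        if let Some(finalize) = self.finalizer.take() {
            finalize();
        }
    }
}
```

No bytes are copied: the pointer `FakeRString` reads from is the original `String`'s buffer. The flip side is that anything still holding `ptr` after the finaliser has run would be reading freed memory.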
I figured out another way to tie the lifetime of some Rust data to a Ruby object. Using There's an example in these changes: 91ee513#diff-676d4870958cd433b65507e695efedb29f7a535ba9551cf276e21f1b8e2f25eaR149
I'm pretty sure the strings do not have to be null terminated, since it is actually fully possible to store null bytes in a Ruby string.
Can you give an example here?
Just a note that I do plan to get back to you on this, but my personal life (newborn baby) is currently occupying all my time.
Congrats on the newborn! Take your time for the most important things ❤️
So I'm far from an expert. I don't even really have any hands-on experience; I'm just piecing things together from what I've read. Let's make sure we're on the same page to start. My understanding of the code here is that we're using
The changes also provide a method to clear the So we start with something like this:
Then we call
Then we forget the Rust string and get:
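In plain Rust, "forgetting" a `String` means giving up Rust's ownership of its heap buffer without freeing it. This is usually done with `std::mem::ManuallyDrop` (or `mem::forget`). The only sound way to free that buffer later is to rebuild the `String` from the same pointer, length and capacity, so Rust's own allocator releases it. A sketch follows; `leak_parts` and `reclaim` are hypothetical helper names.

```rust
use std::mem::ManuallyDrop;

/// Give up Rust's ownership of the buffer without freeing it and return its
/// raw parts. The buffer leaks unless these exact parts reach `reclaim`.
fn leak_parts(s: String) -> (*mut u8, usize, usize) {
    let mut s = ManuallyDrop::new(s);
    (s.as_mut_ptr(), s.len(), s.capacity())
}

/// Rebuild the `String` from parts produced by `leak_parts`. Dropping the
/// result returns the memory to Rust's global allocator, the one that
/// allocated it.
///
/// # Safety
/// `ptr`, `len` and `cap` must come from a single call to `leak_parts` and
/// must not be used again afterwards.
unsafe fn reclaim(ptr: *mut u8, len: usize, cap: usize) -> String {
    unsafe { String::from_raw_parts(ptr, len, cap) }
}
```

Handing those parts to some other allocator's `free` instead of `reclaim` is exactly the mismatch the rest of this comment is about.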
The boundary between the "Rust Heap" and "Ruby Heap" is kind of weird, as everything is running in the same process, so it's the same heap, but there is potential for the memory to be managed by different allocators. Ruby's configure script has an option to compile with jemalloc, and before that was added it was a common optimisation to patch Ruby with a faster allocator (usually the same jemalloc). Rust used to exclusively use jemalloc, but as of 1.28 the

These two together mean that while it's pretty common for both Ruby and Rust to be using the system allocator, it's also quite possible they will be using different allocators. The really hard part is knowing when both are using the same allocator. It's not really possible when developing a publicly available extension gem, as users can do what they want. I'm not sure documenting it is even enough; I don't think there are many developers at my job who realise the Ruby we're using in production has been compiled with jemalloc.

There are lots of bits of documentation that suggest memory allocated with a particular allocator should only ever be freed with the same allocator's A particularly good example is Rust's (currently unstable)
This is particularly relevant, as what we're trying to do here is the kind of thing

My rough understanding of why one allocator can't free memory allocated by another is that the two allocators could have different internal bookkeeping, and freeing memory an allocator is unaware of could corrupt that bookkeeping. For example, say allocator A just directly makes syscalls to request memory for each allocation, and allocator B pre-allocates a block of memory, then hands out chunks of that for each allocation. Allocating with A and then freeing with B will corrupt B's record keeping.

So to me this all adds up to: there's no guarantee the allocator Rust uses to allocate the string and the allocator Ruby uses to free the string are the same, so there's no way to safely do what these changes are currently doing. The safe way would be to reconstitute the string into a Rust `String` when Ruby is done with it, then drop it.

The obvious API for this is Ruby's finalisers, but after a quick benchmark the finaliser API is devastatingly slow, so it's of no use in this case. The other workaround I can think of (wrapping a Rust struct holding the string in a Ruby object and assigning that to an ivar of the String, so it'll be GC'd at the same time as the String) is also too much of a slowdown to be useful.

Part of the problem seems to be that there isn't actually that big of a speedup to be gained; the conversion from Rust String to Ruby String is already pretty quick. So there's basically no room for a workaround for freeing the string.

I'd still love to find a way to get this working¹; I just can't see a path to it myself.

¹ The reverse of this,
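The allocator-B scenario can be made concrete with a toy arena. This is a hypothetical illustration, not how jemalloc or the system allocator are implemented. The arena hands out chunks of one pre-allocated block and keeps a count of live chunks. Its `free` checks whether it owns the pointer so the mismatch is visible. Real allocators generally don't make that check and would silently corrupt their bookkeeping instead.

```rust
/// Toy "allocator B": pre-allocates one block and hands out chunks of it.
struct Arena {
    block: Vec<u8>,
    next: usize, // offset of the next unused byte
    live: usize, // bookkeeping: chunks handed out and not yet freed
}

impl Arena {
    fn new(size: usize) -> Self {
        Arena { block: vec![0; size], next: 0, live: 0 }
    }

    /// Hand out the next `n` bytes (n > 0), or `None` if the block is exhausted.
    fn alloc(&mut self, n: usize) -> Option<*mut u8> {
        if n == 0 || self.next + n > self.block.len() {
            return None;
        }
        // SAFETY: `self.next + n <= len`, so the offset stays inside `block`.
        let chunk = unsafe { self.block.as_mut_ptr().add(self.next) };
        self.next += n;
        self.live += 1;
        Some(chunk)
    }

    /// Does `p` point into this arena's block?
    fn owns(&self, p: *const u8) -> bool {
        let start = self.block.as_ptr() as usize;
        let addr = p as usize;
        addr >= start && addr < start + self.block.len()
    }

    /// Return a chunk. Freeing a pointer this arena never handed out would
    /// corrupt `live`, so this toy version refuses instead.
    fn free(&mut self, p: *mut u8) -> Result<(), &'static str> {
        if !self.owns(p) {
            return Err("pointer was not allocated by this arena");
        }
        self.live -= 1;
        Ok(())
    }
}
```

In the test below, a `Box` from Rust's global allocator plays the role of "allocator A" memory being handed to the wrong `free`.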
This has got me thinking… In We even thought that reporting the memory usage might be unsafe, but it turned out GC is not triggered if I wonder if we used Although dubious, if that’s the case we may be able to use the same allocator in Rust + Ruby…
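Swapping the allocator on the Rust side uses the stable `#[global_allocator]` attribute and the `GlobalAlloc` trait. The sketch below forwards to `std::alloc::System` and keeps a running total of requested bytes. The comments mark where a Ruby-backed version might call Ruby's `ruby_xmalloc`/`ruby_xfree` instead. That substitution is an untested assumption: as far as I know `ruby_xmalloc` promises no more alignment than `malloc`, and it can trigger GC from inside an allocation. Neither issue is handled here, so treat this as a starting point only.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Running total of bytes ever requested through the global allocator.
static TOTAL_ALLOCATED: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        TOTAL_ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        // A hypothetical Ruby-backed allocator would call `ruby_xmalloc` here.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // ...and `ruby_xfree` here, so both sides share one allocator.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in this program now goes through `CountingAlloc`.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn total_allocated() -> usize {
    TOTAL_ALLOCATED.load(Ordering::Relaxed)
}
```

Because this replaces the allocator for the whole program (every dependency included), a gem doing this would be making the choice for everything linked into its extension, not just for strings handed to Ruby.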
There's a currently unstable Rust feature called The especially useful part is that the allocator becomes a type parameter (with a default of

```rust
pub fn from_string(s: String<RbAllocator>) -> RString {
    ...
}
```

and our requirement that the string has been allocated with the same allocator as Ruby is enforced by the type system. The But I guess we'd be waiting on the Rust PR to add allocator support to `String` before it's even possible to prototype this on nightly.
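Until `String` gains an allocator parameter, the type-level part of this idea can be approximated on stable Rust. The approach is a buffer type that is generic over a small allocator trait and frees itself through that same allocator. In this sketch `RawAlloc`, `Bytes` and `only_ruby_bytes` are hypothetical names. `RbAllocator` forwards to the system allocator so the example runs without Ruby; a real one might call `ruby_xmalloc`/`ruby_xfree`. Unlike the proposed `String<RbAllocator>`, this copies the bytes once on construction.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::marker::PhantomData;
use std::ptr::{self, NonNull};

/// Minimal allocator interface; a stable stand-in for the unstable `Allocator` trait.
trait RawAlloc {
    unsafe fn alloc(layout: Layout) -> *mut u8;
    unsafe fn free(ptr: *mut u8, layout: Layout);
}

/// Hypothetical stand-in for Ruby's allocator (forwards to the system allocator here).
struct RbAllocator;

impl RawAlloc for RbAllocator {
    unsafe fn alloc(layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }
    unsafe fn free(ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Owned bytes whose type records which allocator must free them.
struct Bytes<A: RawAlloc> {
    ptr: *mut u8,
    len: usize,
    _alloc: PhantomData<A>,
}

impl<A: RawAlloc> Bytes<A> {
    fn copy_from(src: &[u8]) -> Self {
        let len = src.len();
        if len == 0 {
            // Zero-size requests aren't allowed through `GlobalAlloc`; use a dangling pointer.
            return Bytes { ptr: NonNull::<u8>::dangling().as_ptr(), len, _alloc: PhantomData };
        }
        let layout = Layout::array::<u8>(len).expect("length overflows a Layout");
        let ptr = unsafe { A::alloc(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        // SAFETY: `ptr` is a fresh allocation of `len` bytes that cannot overlap `src`.
        unsafe { ptr::copy_nonoverlapping(src.as_ptr(), ptr, len) };
        Bytes { ptr, len, _alloc: PhantomData }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` is valid for `len` bytes (or dangling with `len == 0`) until drop.
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}

impl<A: RawAlloc> Drop for Bytes<A> {
    fn drop(&mut self) {
        if self.len != 0 {
            let layout = Layout::array::<u8>(self.len).expect("layout was valid at allocation");
            // Freed through the same allocator type that allocated it.
            unsafe { A::free(self.ptr, layout) };
        }
    }
}

/// Accepts only `RbAllocator` buffers; passing a `Bytes<SomeOtherAlloc>` would not compile.
fn only_ruby_bytes(b: &Bytes<RbAllocator>) -> &[u8] {
    b.as_slice()
}
```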
So I stumbled upon another interesting way to do this type of thing.
So I've been thinking about devising a mechanism to hand off a Rust `String` to Ruby in a way that avoids a `memcpy` on the Ruby side. This would allow for extremely fast, zero-copy handoffs of strings to Ruby.

I have a proof of concept in this PR, and I think it's close. The issue I'm running into is that since `RString` implements `Copy` (and wisely so, I think), it is possible to attempt to access a potentially freed pointer on the `RString`.

Unfortunately, I do not really know of a good way around it... Was curious to hear if you had any ideas or thoughts 😄
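One general Rust pattern keeps `Copy` handles from outliving their backing memory: give the handle a lifetime tied to the owner with `PhantomData`. Copies stay cheap, but the borrow checker rejects any copy that is used after the owner is gone. The sketch uses hypothetical `Owner`/`StrView` types rather than magnus's `RString`. It only helps while the owner has a Rust scope. When Ruby's GC decides the lifetime, as in this issue, there is no Rust scope to borrow from, so this is a partial answer at best.

```rust
use std::marker::PhantomData;

/// Owns the bytes; stands in for whatever keeps the buffer alive.
struct Owner {
    buf: String,
}

/// A cheap `Copy` view into `Owner`'s buffer. The `'a` lifetime ties every
/// copy to a borrow of the owner, so none can be used after it is dropped.
#[derive(Clone, Copy)]
struct StrView<'a> {
    ptr: *const u8,
    len: usize,
    _owner: PhantomData<&'a Owner>,
}

impl Owner {
    fn view(&self) -> StrView<'_> {
        StrView { ptr: self.buf.as_ptr(), len: self.buf.len(), _owner: PhantomData }
    }
}

impl<'a> StrView<'a> {
    fn as_str(&self) -> &'a str {
        // SAFETY: `ptr`/`len` describe `Owner::buf`, which is valid UTF-8 and,
        // because of `'a`, still alive and not mutated while the view exists.
        unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(self.ptr, self.len)) }
    }
}
```

Calling `drop(owner)` and then using any copied view afterwards is a compile error (the view still borrows `owner`). That is the use-after-free that a lifetime-free `Copy` handle allows.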