Upgrade CString to not allocate on empty. #40547
r? @brson (rust_highfive has picked a reviewer for you, use r? to override)
src/libstd/ffi/c_str.rs
Outdated
let slice = slice::from_raw_parts(ptr, len as usize);
CString { inner: mem::transmute(slice) }
if len == 1 {
debug_assert_eq!(ptr, NUL_TERMINATED_EMPTY.as_ptr() as *mut c_char);
This assertion makes no sense. It requires every `ptr` pointing to a NUL byte to be `NUL_TERMINATED_EMPTY`, which is a breaking change and also weird.
(It precludes code like the one below from working.)
let cstring = CString::from_raw(b"\0");
...
mem::forget(cstring);
src/libstd/ffi/c_str.rs
Outdated
@@ -274,7 +287,12 @@ impl CString {
/// Failure to call `from_raw` will lead to a memory leak.
#[stable(feature = "cstr_memory", since = "1.4.0")]
pub fn into_raw(self) -> *mut c_char {
Box::into_raw(self.into_inner()) as *mut c_char
if let Some(bytes) = self.into_inner() {
if bytes.len() > 1 {
I do not see a reason for this conditional. Always return the pointer. Alternatively, assert that if bytes.len() == 1, the pointer is the same as that of the NUL_TERMINATED_EMPTY slice.
That is already an assertion; I accidentally added that condition. Will remove.
Was CString NonZero before this change? It won't be after the change. Transmute::<Option, Something>(a) will stop compiling, too, if it compiled before...
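To make the layout concern concrete, here is a minimal sketch that compares the sizes of two stand-in structs. `OldLayout` and `NewLayout` are hypothetical names that only mirror the `inner` field before and after the change; they are not the real `CString`. The sketch assumes current rustc behaviour, where the non-null pointer inside `Box<[u8]>` provides a single niche value, and it reports sizes rather than promising specific numbers.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the field layout before the change.
#[allow(dead_code)]
pub struct OldLayout {
    inner: Box<[u8]>,
}

// Hypothetical stand-in for the field layout proposed in the PR.
#[allow(dead_code)]
pub struct NewLayout {
    inner: Option<Box<[u8]>>,
}

/// Returns the sizes of (OldLayout, Option<OldLayout>, NewLayout, Option<NewLayout>).
pub fn layout_sizes() -> (usize, usize, usize, usize) {
    (
        size_of::<OldLayout>(),
        size_of::<Option<OldLayout>>(),
        size_of::<NewLayout>(),
        size_of::<Option<NewLayout>>(),
    )
}
```

If the inner `Option` already consumes the null niche, `Option<NewLayout>` has nothing left to reuse and may need extra space for its own discriminant, which is the regression the comment above is asking about.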
src/libstd/ffi/c_str.rs
Outdated
#[stable(feature = "rust1", since = "1.0.0")]
pub struct CString {
// Invariant 1: the slice ends with a zero byte and has a length of at least one.
// Invariant 2: the slice contains only one zero byte.
// Improper usage of unsafe function can break Invariant 2, but not Invariant 1.
inner: Box<[u8]>,
inner: Option<Box<[u8]>,
Looks like this lacks a `>` here?
@oli-obk It was because it was just a wrapper for a I've also fixed the errors; initially I had also coded this change to replace
src/libstd/ffi/c_str.rs
Outdated
@@ -260,8 +267,13 @@ impl CString {
#[stable(feature = "cstr_memory", since = "1.4.0")]
pub unsafe fn from_raw(ptr: *mut c_char) -> CString {
let len = libc::strlen(ptr) + 1; // Including the NUL byte
Should the strlen be moved into the else branch?
yes it should!
How about change and initialize it to
@mzji That won't work because
To preserve null-pointer optimization, you can keep the original
@abonander We cannot pass the ownership of both stack memory and static memory over the FFI boundary, right?
@mzji Static memory is fine because it's guaranteed to be around for the life of the program. There is some issue if the consumer of the string means to mutate it (because the static will be stored in a non-writable region, and also potentially shared with other instances), but that's not commonly done, to my knowledge; string manipulations generally involve copying to a new allocation. It's also meaningless to mutate a zero-length C-string because all you could mutate is the nul-byte, in which case your handling of C-strings is erroneous to begin with, and a segfault from attempting to write non-writable memory would probably be the nicest way to discover that bug. Leaking heap memory is also fine because it won't ever be reclaimed unless you get the pointer back and free it yourself. That's the point of However, stack memory is an issue because if you're wanting to pass a C-string to some library which will then hold onto it, having the backing memory on the stack is an issue because when your function returns and another one is called, you'll have some other code using the exact same memory region as your C-string. |
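The three kinds of memory discussed in this comment can be sketched as follows. This is an illustration under stated assumptions, not code from the PR: the names `EMPTY`, `static_empty_ptr`, `heap_round_trip`, and `stack_len` are hypothetical, and only `CString::new`, `CString::into_raw`, `CString::from_raw`, and `CStr::from_bytes_with_nul` are real standard-library calls.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Static memory: lives for the whole program, so a pointer to it never dangles
// (as long as the binary or library that owns it stays loaded).
static EMPTY: &[u8] = b"\0";

pub fn static_empty_ptr() -> *const c_char {
    EMPTY.as_ptr() as *const c_char
}

// Heap memory: `into_raw` deliberately leaks the allocation, and it stays valid
// until the same pointer is handed back to `from_raw`.
pub fn heap_round_trip(text: &str) -> String {
    let raw: *mut c_char = CString::new(text).expect("no interior NUL").into_raw();
    // ... foreign code could hold on to `raw` here ...
    // SAFETY: `raw` came from `CString::into_raw` above and was not freed or resized.
    let owned = unsafe { CString::from_raw(raw) };
    owned.into_string().expect("valid UTF-8")
}

// Stack memory: the buffer dies when this function returns, so a pointer to it
// may only be lent for the duration of a call, never retained by foreign code.
pub fn stack_len() -> usize {
    let buf = *b"hi\0"; // copied into this stack frame
    let borrowed = CStr::from_bytes_with_nul(&buf).expect("exactly one trailing NUL");
    borrowed.to_bytes().len()
}
```

Only the heap case involves an allocator, which is why only pointers obtained from `into_raw` may be passed back to `from_raw`.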
@abonander OK, the 'C code will hold the pointer' part makes sense. Anyway, that shouldn't be called 'passing ownership', since the buffer will be freed by Rust code, not C code. I misunderstood this.
That's why I put "ownership" in quotes. Even if the concept of ownership isn't built into the language, the C code (or any other language that's ABI-compatible with C) is still taking responsibility for the pointer and freeing it prematurely could cause Bad Things™ to happen.
What I thought is actually that the C code just takes a borrow (in the Rust sense) of the string. However, yeah, some C code just does some cute bad things... So putting the empty string in static memory could almost immediately reveal that some C code works incorrectly.
If the FFI code doesn't assume the pointer will remain valid after the call returns, then passing a temporary pointer is fine; you would use
Also one thing we should remember, is that the memory should be freed by the allocator which allocates it. That means, even for pointers generated by
766957d to 4417e8b
How often are empty strings passed through FFI? Is there a specific use case this is targeting?
I'm not sure if this scenario is a concern for rust libraries, but potentially you could load a dynamic library at runtime using dlopen, call a function returning an empty string, unload it and then try to access the string.
8c89b4d to 063cfe8
This is not true. The dynamic libraries and their static memory may be loaded and unloaded at will. All in all, I find it ever so harder to justify making this change.
We discussed this PR during libs triage the other day, and we were in general hesitant to accept this due to the difference from how types like @clarcharr can you clarify use cases and perhaps provide some real-world benchmarks to motivate this PR?
@clarcharr thanks for the contribution! We're looking for some clarification; would you mind responding to the previous comment?
@shepmaster @alexcrichton sorry for not responding! I kind of forgot about this PR and I'm not 100% sure if I have a compelling enough argument for it. Right now, my main thoughts were that it would support a future implementation of Uses of that would basically be anywhere where some Rust code needs to build strings for C code. Which, again, I'm not sure how common that is. I'd be willing to come up with some benchmarks if this kind of thing is desired, although right now I am leaning toward closing this and starting more of a discussion elsewhere on how desired |
Also for context, I've currently been looking up inconsistencies for rust-lang/rfcs#1876 that can be fixed now without an RFC, so that there's less that'd be needed to talk about for a future RFC. I sort of included this in that list as a "maybe we want this" because I figured that it'd be easier to come up with an implementation than to speculate about how it'd be done. |
Hm ok. I figured that we wouldn't end up expending that much effort enhancing |
We currently do have that method, but the main downside I see is that you either a) have to go unsafe and assert yourself that there are no internal NULs or b) pay an extra penalty checking yourself. I also figure that this is similar to what you get from And considering how they'd all just be shallow façades over |
Ok well in any case the original thoughts of the libs team were that lacking motivation/benchmarks we'd be inclined to close this PR. Would you like to gather that though before closing?
Personally I think it's best to close this now and wait until more discussion/an RFC happens before continuing. I can reopen later if there's a desire to merge this later.
This is done by letting `CString` contain an empty slice, while also returning a static, NUL-terminated slice when necessary.
This shouldn't be too much of a maintenance burden, but I'll put this PR up and let the libs team decide if they want to keep this.
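A minimal sketch of the approach described above, under the assumption that "empty" is represented by a zero-length boxed slice (which does not allocate). `SketchCString` is a hypothetical type written for illustration; it is not the PR's actual diff and omits most of the real `CString` API. The static's name `NUL_TERMINATED_EMPTY` is borrowed from the review thread.

```rust
use std::os::raw::c_char;

// One shared, statically allocated empty C string: just the NUL terminator.
pub static NUL_TERMINATED_EMPTY: [u8; 1] = [0];

// Hypothetical stand-in for the proposed representation.
pub struct SketchCString {
    // Either empty (no heap allocation) or non-empty and NUL-terminated.
    inner: Box<[u8]>,
}

impl SketchCString {
    /// Builds a string from bytes, rejecting interior NUL bytes.
    pub fn new(bytes: &[u8]) -> Option<SketchCString> {
        if bytes.contains(&0) {
            return None;
        }
        if bytes.is_empty() {
            // A zero-length boxed slice does not touch the allocator.
            return Some(SketchCString { inner: Vec::new().into_boxed_slice() });
        }
        let mut buf = Vec::with_capacity(bytes.len() + 1);
        buf.extend_from_slice(bytes);
        buf.push(0);
        Some(SketchCString { inner: buf.into_boxed_slice() })
    }

    /// Returns the bytes including the terminator, falling back to the static
    /// slice when nothing was allocated.
    pub fn as_bytes_with_nul(&self) -> &[u8] {
        if self.inner.is_empty() {
            &NUL_TERMINATED_EMPTY[..]
        } else {
            &self.inner[..]
        }
    }

    /// Pointer suitable for lending to C for the duration of a call.
    pub fn as_ptr(&self) -> *const c_char {
        self.as_bytes_with_nul().as_ptr() as *const c_char
    }
}
```

The cost of this design is the extra branch on every access, plus the special handling that `into_raw` and `from_raw` would need for the static pointer, which is what the review comments above focus on.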