new API to read wide_strings from Memory (for Windows) #66470
r? @zackmdavis (rust_highfive has picked a reviewer for you, use r? to override)
src/librustc_mir/interpret/memory.rs
Outdated
    pub fn read_wide_str(&self, ptr: Scalar<M::PointerTag>) -> InterpResult<'tcx, &[u8]> {
        let widestr_u8_initbyte = self.read_bytes(ptr, Size::from_bytes(1))?;
        let mut widestr_len = 0; // length in bytes
        unsafe {
Could a safety comment be added to this block?
Better yet, could you please not use `unsafe`? Just follow the implementation strategy of `read_c_str`; that one does not use `unsafe`, either.
r? @RalfJung
src/librustc_mir/interpret/memory.rs
Outdated
        unsafe {
            let mut tracker = &widestr_u8_initbyte[0] as *const u8;
            while !(*tracker == 0 && *tracker.add(1) == 0) {
                tracker = tracker.add(2);
This is actually incorrect `unsafe` code. You are taking a slice of length 1, turning it into a raw pointer, and reading outside the bounds of that slice. That's UB due to violating Rust's aliasing rules.
Even worse, isn't this going out-of-bounds of the allocation and causing dangling pointer deref's in case there is no trailing double-NULL?
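A minimal sketch of how the scan could avoid both problems raised above, written without `unsafe`. It operates on an ordinary byte slice rather than on Miri's `Memory`, and `find_wide_str` is a hypothetical helper name used only for illustration, not part of rustc. The idea is to walk the buffer in 2-byte units and report failure when the buffer ends before a zero unit appears, instead of reading past it.

```rust
/// Hypothetical helper (not the rustc `Memory` API): returns the bytes of a
/// wide string up to, but excluding, the first aligned 2-byte zero unit.
/// Returns `None` if the buffer ends before such a unit is found.
fn find_wide_str(bytes: &[u8]) -> Option<&[u8]> {
    // `chunks_exact(2)` yields only complete 2-byte units that lie inside
    // `bytes`, so no read can leave the slice.
    let unit_count = bytes
        .chunks_exact(2)
        .position(|unit| unit[0] == 0 && unit[1] == 0)?;
    Some(&bytes[..unit_count * 2])
}
```

In the interpreter the "buffer ends" case would presumably map to an `InterpResult` error rather than `None`; that mapping is left out here.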
Looks like your code doesn't actually typecheck yet.
Thank you 😄 May I ask how I can see such detailed error messages when I build rust from source code locally using |
That indicates a very serious issue with your local setup, which you should try to get fixed. I suggest asking for help on Discord or Zulip (I'm afraid I won't have much time in the near future, else I'd offer help myself).
I will leave an update once I test setting/getting environment variables in Windows using the new API function.
    /// Reads bytes until a `0x00` is encountered. Will error if the end of the allocation
    /// is reached before a `0x00` is found.
`0x00` is still a single byte. You meant `0x0000`, I think.
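To illustrate why the distinction matters, here is a small sketch that encodes a string as little-endian UTF-16 with a `0x0000` terminator. `to_wide_le` is a hypothetical name chosen for this example. For ASCII text every second byte is already `0x00`, so a scan that stops at the first single zero byte would cut the string off after one character.

```rust
/// Hypothetical helper: encodes `s` as UTF-16 little-endian bytes and
/// appends the two-byte terminator 0x0000.
fn to_wide_le(s: &str) -> Vec<u8> {
    s.encode_utf16()
        .chain(std::iter::once(0u16)) // terminator is one full 16-bit unit
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}
```

For `"Hi"` this yields the bytes `48 00 69 00 00 00`: the first single zero byte sits at index 1, while the actual terminator starts at index 4.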
Thanks, this looks great! I have one high-level concern though: the return value of these methods is still a byte slice in target endianness. Actually turning this into
Ping from Triage: @JOE1994 any updates?
@joelpalmer Hi, I apologize for delaying the process.
From my understanding, it is not really necessary to care about converting a byte slice to I made a pull-request(rust-lang/miri#1098) to the MIRI repo to demonstrate how I intend to use Please correct me if I'm mistaken. Thank you |
Why would that be the case? The "wide_str" is UTF-16 encoded, which works in units of 2 bytes. You cannot treat this as a UTF-8 or ASCII string and expect that to make any sense.
Yeah, that won't work. You have to properly account for the fact that the in-memory encoding of these strings is UTF-16 (or rather, the quirky Windows version of UTF-16, though that does not make anything any worse).
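As a rough sketch of what accounting for the encoding could look like: the bytes are first grouped into 16-bit units according to the target's byte order, and only then decoded as UTF-16. The `Endian` enum and `decode_wide_str` are hypothetical names for this example and are not taken from rustc or Miri.

```rust
/// Hypothetical stand-in for the target's byte order.
#[derive(Clone, Copy)]
enum Endian {
    Little,
    Big,
}

/// Hypothetical helper: interprets `bytes` (terminator already removed) as
/// UTF-16 code units in the given byte order and decodes them.
/// Returns `None` for an odd byte count or for ill-formed UTF-16.
fn decode_wide_str(bytes: &[u8], endian: Endian) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|c| {
            let pair = [c[0], c[1]];
            match endian {
                Endian::Little => u16::from_le_bytes(pair),
                Endian::Big => u16::from_be_bytes(pair),
            }
        })
        .collect();
    // Strict decoding: unpaired surrogates are rejected here.
    String::from_utf16(&units).ok()
}
```

Note that `String::from_utf16` rejects unpaired surrogates, which Windows strings may contain. On a Windows host, `std::os::windows::ffi::OsStringExt::from_wide` accepts such units, so an `OsString` may be the better target type there.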
Thank you for your correction. I intended to show a very rough draft of using I will follow your directions from MIRI's side, and come back here after that. |
It's okay, you making that PR helped a lot for me to understand where your misunderstanding was rooted. :)
Ping from Triage: Any update @JOE1994?
I will finish working on this PR once my final exam finishes tomorrow. I apologize for the delay.
Since this issue has been open for quite long without much progress, I'll make a new PR later once I make some progress locally. Thank you all for your feedback 🙂
rust-lang/miri#707 (comment)
Struct `rustc_mir::interpret::Memory` provides an API (`read_c_str`) for reading null-terminated strings, but does not provide an API for reading wide_strings (2 bytes per character; terminated with 2 consecutive null-bytes). A new API `read_wide_str` for reading wide_strings will be helpful in writing code for MIRI in Windows.

Any feedback would be appreciated! Thank you 👍