Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new API to read wide_strings from Memory (for Windows) #66470

Closed
wants to merge 12 commits into from
Closed

new API to read wide_strings from Memory (for Windows) #66470

wants to merge 12 commits into from

Conversation

JOE1994
Copy link
Contributor

@JOE1994 JOE1994 commented Nov 16, 2019

rust-lang/miri#707 (comment)

Struct rustc_mir::interpret::Memory provides an API(read_c_str) for reading null-terminated strings, but does not provide an API for reading wide_strings(2 bytes per character; terminated with 2 consecutive null-bytes). A new API read_wide_str for reading wide_strings will be helpful in writing code for MIRI in Windows.

Any feedback would be appreciated! Thank you 👍

@rust-highfive
Copy link
Collaborator

r? @zackmdavis

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 16, 2019
@rust-highfive

This comment has been minimized.

pub fn read_wide_str(&self, ptr: Scalar<M::PointerTag>) -> InterpResult<'tcx, &[u8]> {
let widestr_u8_initbyte = self.read_bytes(ptr, Size::from_bytes(1))?;
let mut widestr_len = 0; // length in bytes
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a safety comment be added to this block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better yet, could you please not use unsafe? Just follow the implementation strategy of read_c_str, that one does not use unsafe, either.

@Centril
Copy link
Contributor

Centril commented Nov 16, 2019

r? @RalfJung

@rust-highfive rust-highfive assigned RalfJung and unassigned zackmdavis Nov 16, 2019
@rust-highfive

This comment has been minimized.

unsafe {
let mut tracker = &widestr_u8_initbyte[0] as *const u8;
while !(*tracker == 0 && *tracker.add(1) == 0) {
tracker = tracker.add(2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually incorrect unsafe code. You are taking a slice of length 1, turning it into a raw pointer, and reading outside the bounds of that slice. That's UB due to violating Rust's aliasing rules.

Copy link
Member

@RalfJung RalfJung Nov 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even worse, isn't this going out-of-bounds of the allocation and causing dangling pointer deref's in case there is no trailing double-NULL?

@RalfJung RalfJung added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 19, 2019
@rust-highfive

This comment has been minimized.

@rust-highfive

This comment has been minimized.

@RalfJung
Copy link
Member

Looks like your code doesn't actually typecheck yet.

2019-11-23T14:27:02.8712977Z error[E0308]: mismatched types
2019-11-23T14:27:02.8713428Z    --> src/librustc/mir/interpret/allocation.rs:340:23
2019-11-23T14:27:02.8713785Z     |
2019-11-23T14:27:02.8714148Z 340 |            .position(|&(l,r)| l == 0 && r == 0) {
2019-11-23T14:27:02.8714839Z     |
2019-11-23T14:27:02.8714839Z     |
2019-11-23T14:27:02.8715168Z     = note: expected type `(&u8, &u8)`

@JOE1994
Copy link
Contributor Author

JOE1994 commented Nov 25, 2019

Looks like your code doesn't actually typecheck yet.
2019-11-23T14:27:02.8712977Z error[E0308]: mismatched types
2019-11-23T14:27:02.8713428Z --> src/librustc/mir/interpret/allocation.rs:340:23
2019-11-23T14:27:02.8713785Z |
2019-11-23T14:27:02.8714148Z 340 | .position(|&(l,r)| l == 0 && r == 0) {
2019-11-23T14:27:02.8714839Z |
2019-11-23T14:27:02.8714839Z |
2019-11-23T14:27:02.8715168Z = note: expected type (&u8, &u8)

Thank you 😄 May I ask how I can see such detailed error messages when I build rust from source code locally using x.py? I could only see a more generic error message like "gcc ~ ~ build failed."

@RalfJung
Copy link
Member

RalfJung commented Nov 25, 2019

I could only see a more generic error message like "gcc ~ ~ build failed."

That indicates a very serious issue with your local setup, which you should try to get fixed. I suggest asking for help on Discord or Zulip (I'm afraid I won't have much time in the near future, else I'd offer help myself).

./x.py --stage 0 build src/librustc_mir should pass successfully before you push to a PR, and when your system is set up properly it will show you error messages like the one above.

@JOE1994 JOE1994 requested a review from RalfJung November 26, 2019 01:38
@JOE1994
Copy link
Contributor Author

JOE1994 commented Nov 26, 2019

I will leave an update once I test setting/getting environment variables in Windows using the new API function.

Comment on lines 327 to 328
/// Reads bytes until a `0x00` is encountered. Will error if the end of the allocation
/// is reached before a `0x00` is found.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0x00 is still a single byte. You meant 0x0000, I think.

@RalfJung
Copy link
Member

Thanks, this looks great!

I have one high-level concern though: the return value of these methods is still a byte slice in target endianess. Actually turning this into u16s will require adjusting to host endianess, will it not? How did you intend to use that in Miri? I assume you will want a u16 slice eventually. Just transmuting this u8 slice to u16 will not give you the right result.

@joelpalmer
Copy link

Ping from Triage: @JOE1994 any updates?

@JOE1994
Copy link
Contributor Author

JOE1994 commented Dec 2, 2019

@joelpalmer Hi, I apologize for delaying the process.
I'll be back with an update in no more than 2 days.

@JOE1994
Copy link
Contributor Author

JOE1994 commented Dec 4, 2019

I have one high-level concern though: the return value of these methods is still a byte slice in target endianess. Actually turning this into u16s will require adjusting to host endianess, will it not? How did you intend to use that in Miri? I assume you will want a u16 slice eventually. Just transmuting this u8 slice to u16 will not give you the right result.

From my understanding, it is not really necessary to care about converting a byte slice to u16s when using read_wide_str to emulate SetEnvironmentVariableW & GetEnvironmentVariableW in Windows.

I made a pull-request(rust-lang/miri#1098) to the MIRI repo to demonstrate how I intend to use read_wide_str in MIRI.

Please correct me if I'm mistaken. Thank you

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2019

From my understanding, it is not really necessary to care about converting a byte slice to u16s when using read_wide_str to emulate SetEnvironmentVariableW & GetEnvironmentVariableW in Windows.

Why would that be the case? The "wide_str" is UTF-16 encoded, which works in units of 2 bytes. You cannot treat this as a UTF-8 or ASCII string and expect that to make any sense.

I made a pull-request(rust-lang/miri#1098) to the MIRI repo to demonstrate how I intend to use read_wide_str in MIRI.

Yeah that won't work. You have to properly account for the fact that the in-memory encoding of these strings is UTF-16 (or rather, the quirky Windows version of UTF16, though that does not make anything any worse).

@JOE1994
Copy link
Contributor Author

JOE1994 commented Dec 4, 2019

Why would that be the case? The "wide_str" is UTF-16 encoded, which works in units of 2 bytes. You cannot treat this as a UTF-8 or ASCII string and expect that to make any sense.

Thank you for your correction.
For emulating GetEnvironmentVariableW with MIRI on Windows, I wanted to use Memory::write_bytes, which takes impl IntoIterator<Item = u8> as one of its arguments.
I was blindly thinking that since the argument type is an IntoIterator of u8, it wouldn't be necessary to convert a byte-slice to a wide string..

I intended to show a very rough draft of using read_wide_str on MIRI's side with the PR,
but I should have reviewed it more carefully before making the PR. I apologize for the mess 😿

I will follow your directions from MIRI's side, and come back here after that.

@RalfJung
Copy link
Member

RalfJung commented Dec 4, 2019

I intended to show a very rough draft of using read_wide_str on MIRI's side with the PR,
but I should have reviewed it more carefully before making the PR. I apologize for the mess crying_cat_face

It's okay, you making that PR helped a lot for me to understand where your misunderstanding was rooted. :)

@joelpalmer
Copy link

Ping from Triage: Any update @JOE1994?

@JOE1994
Copy link
Contributor Author

JOE1994 commented Dec 9, 2019

Ping from Triage: Any update @JOE1994?

I will finish working on this PR once my final exam finishes tomorrow. I apologize for the delay..

@JOE1994
Copy link
Contributor Author

JOE1994 commented Dec 31, 2019

Since this issue has been open for quite long without much progress, I'll make a new PR later once I make some progess locally. Thank you all for your feedback 🙂

@JOE1994 JOE1994 closed this Dec 31, 2019
@rust-highfive
Copy link
Collaborator

The job x86_64-gnu-llvm-7 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-12-31T16:41:26.3849704Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-12-31T16:41:26.4074649Z ##[command]git config gc.auto 0
2019-12-31T16:41:26.4139845Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-12-31T16:41:26.4197114Z ##[command]git config --get-all http.proxy
2019-12-31T16:41:26.4353126Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/66470/merge:refs/remotes/pull/66470/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants