Windows: set main thread name without re-encoding #123534
rustbot has assigned @Mark-Simulacrum.
```rust
/// Const convert UTF-8 to UTF-16, for use in the `wide_str` macro.
///
/// Note that this is designed for use in const contexts, so it is not optimized.
/// Output elements beyond the encoded length are left as 0. If `UTF16_LEN` is
/// too small, indexing panics (a compile-time error during const evaluation).
pub const fn to_utf16<const UTF16_LEN: usize>(s: &str) -> [u16; UTF16_LEN] {
    let mut output = [0_u16; UTF16_LEN];
    let mut pos = 0;
    let s = s.as_bytes();
    let mut i = 0;
    while i < s.len() {
        match s[i].leading_ones() {
            // Decode UTF-8 based on its length.
            // See https://en.wikipedia.org/wiki/UTF-8
            0 => {
                // ASCII is the same in both encodings
                output[pos] = s[i] as u16;
                i += 1;
                pos += 1;
            }
            2 => {
                // Bits: 110xxxxx 10xxxxxx
                output[pos] = ((s[i] as u16 & 0b11111) << 6) | (s[i + 1] as u16 & 0b111111);
                i += 2;
                pos += 1;
            }
            3 => {
                // Bits: 1110xxxx 10xxxxxx 10xxxxxx
                output[pos] = ((s[i] as u16 & 0b1111) << 12)
                    | ((s[i + 1] as u16 & 0b111111) << 6)
                    | (s[i + 2] as u16 & 0b111111);
                i += 3;
                pos += 1;
            }
            4 => {
                // Bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                let mut c = ((s[i] as u32 & 0b111) << 18)
                    | ((s[i + 1] as u32 & 0b111111) << 12)
                    | ((s[i + 2] as u32 & 0b111111) << 6)
                    | (s[i + 3] as u32 & 0b111111);
                // Re-encode as UTF-16 (see https://en.wikipedia.org/wiki/UTF-16):
                // - Subtract 0x10000 from the code point
                // - For the high surrogate, shift right by 10 then add 0xD800
                // - For the low surrogate, take the low 10 bits then add 0xDC00
                c -= 0x10000;
                output[pos] = ((c >> 10) + 0xD800) as u16;
                output[pos + 1] = ((c & 0b1111111111) + 0xDC00) as u16;
                i += 4;
                pos += 2;
            }
            // Valid UTF-8 cannot have any other values
            _ => unreachable!(),
        }
    }
    output
}
```
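The `4 =>` arm is the only place where one code point turns into two UTF-16 units. As a minimal sketch, that step can be pulled out into a hypothetical `surrogate_pair` helper (not part of the PR) and cross-checked against the standard library's `char::encode_utf16`:

```rust
/// Hypothetical helper: splits a supplementary-plane code point
/// (U+10000..=U+10FFFF) into a UTF-16 surrogate pair, using the same
/// three steps as the `4 =>` arm of `to_utf16`.
const fn surrogate_pair(code_point: u32) -> [u16; 2] {
    // Step 1: subtract 0x10000, leaving a 20-bit value.
    let c = code_point - 0x10000;
    // Step 2: the high surrogate carries the top 10 bits.
    let high = ((c >> 10) + 0xD800) as u16;
    // Step 3: the low surrogate carries the bottom 10 bits.
    let low = ((c & 0x3FF) + 0xDC00) as u16;
    [high, low]
}

fn main() {
    // U+1F600 lies outside the Basic Multilingual Plane.
    let pair = surrogate_pair(0x1F600);

    // Cross-check against the standard library's own UTF-16 encoder.
    let mut buf = [0u16; 2];
    let expected = '\u{1F600}'.encode_utf16(&mut buf);
    assert_eq!(&pair[..], &expected[..]);

    println!("{:04X} {:04X}", pair[0], pair[1]); // prints "D83D DE00"
}
```

The ends of the supplementary range map the same way: U+10000 becomes `D800 DC00` and U+10FFFF becomes `DBFF DFFF`.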
nice work. I feel like at least some of this should be using more public std API instead of a bunch of sorcerous isopsephia, but I looked for equivalents and couldn't find any in the stdlib, so this will do for now.
r=me with comment
@bors r=workingjubilee
Okay, following fmease's explanation, I think using a decl macro (macros 2.0) would be fine, since we're std and get to use nightly features when we want:
`wide_str!` creates a null-terminated UTF-16 string, whereas `utf16!` creates a UTF-16 string without appending a null terminator.
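To make the difference concrete, here is a sketch of how the two macros could sit on top of `to_utf16`. It assumes a hypothetical `utf16_len` helper that counts UTF-16 units at compile time, and it uses stable `macro_rules!` instead of the macros 2.0 (`decl_macro`) form the PR ended up with, so names and details will differ from std. The only difference between the two macros is one extra array element, which `to_utf16` leaves as 0 and which therefore acts as the null terminator.

```rust
/// Hypothetical helper: counts the UTF-16 code units needed to encode `s`.
const fn utf16_len(s: &str) -> usize {
    let s = s.as_bytes();
    let (mut i, mut len) = (0, 0);
    while i < s.len() {
        // The leading byte gives the UTF-8 sequence length; only 4-byte
        // sequences (U+10000 and above) need two UTF-16 units.
        match s[i].leading_ones() {
            0 => { i += 1; len += 1; }
            2 => { i += 2; len += 1; }
            3 => { i += 3; len += 1; }
            4 => { i += 4; len += 2; }
            _ => unreachable!(),
        }
    }
    len
}

/// Same decoding steps as the `to_utf16` above; unused trailing slots stay 0.
const fn to_utf16<const UTF16_LEN: usize>(s: &str) -> [u16; UTF16_LEN] {
    let mut output = [0_u16; UTF16_LEN];
    let s = s.as_bytes();
    let (mut i, mut pos) = (0, 0);
    while i < s.len() {
        match s[i].leading_ones() {
            0 => {
                output[pos] = s[i] as u16;
                i += 1;
                pos += 1;
            }
            2 => {
                output[pos] = ((s[i] as u16 & 0b11111) << 6) | (s[i + 1] as u16 & 0b111111);
                i += 2;
                pos += 1;
            }
            3 => {
                output[pos] = ((s[i] as u16 & 0b1111) << 12)
                    | ((s[i + 1] as u16 & 0b111111) << 6)
                    | (s[i + 2] as u16 & 0b111111);
                i += 3;
                pos += 1;
            }
            4 => {
                let c = (((s[i] as u32 & 0b111) << 18)
                    | ((s[i + 1] as u32 & 0b111111) << 12)
                    | ((s[i + 2] as u32 & 0b111111) << 6)
                    | (s[i + 3] as u32 & 0b111111))
                    - 0x10000;
                output[pos] = ((c >> 10) + 0xD800) as u16;
                output[pos + 1] = ((c & 0x3FF) + 0xDC00) as u16;
                i += 4;
                pos += 2;
            }
            _ => unreachable!(),
        }
    }
    output
}

/// Sketch of `utf16!`: UTF-16 code units only, no terminator.
macro_rules! utf16 {
    ($s:literal) => {{
        const S: &str = $s;
        const OUT: [u16; utf16_len(S)] = to_utf16::<{ utf16_len(S) }>(S);
        &OUT
    }};
}

/// Sketch of `wide_str!`: one extra slot, left as 0 by `to_utf16`,
/// serves as the null terminator.
macro_rules! wide_str {
    ($s:literal) => {{
        const S: &str = $s;
        const OUT: [u16; utf16_len(S) + 1] = to_utf16::<{ utf16_len(S) + 1 }>(S);
        &OUT
    }};
}

const MAIN_PLAIN: &[u16] = utf16!("main");
const MAIN_WIDE: &[u16] = wide_str!("main");
const MIXED: &[u16] = utf16!("h\u{e9}\u{20ac}\u{1F600}");

fn main() {
    assert_eq!(MAIN_PLAIN, &[0x6D_u16, 0x61, 0x69, 0x6E][..]);
    assert_eq!(MAIN_WIDE, &[0x6D_u16, 0x61, 0x69, 0x6E, 0][..]); // trailing null
    // 1-, 2-, 3- and 4-byte UTF-8 sequences agree with the runtime encoder.
    let runtime: Vec<u16> = "h\u{e9}\u{20ac}\u{1F600}".encode_utf16().collect();
    assert_eq!(MIXED, &runtime[..]);
}
```

Taking `$s:literal` and keeping the helper consts `S` and `OUT` inside a block means they are invisible at the call site and cannot pick up a caller's item of the same name.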
Ok, I've rewritten it to use macros 2.0. I did the same for both macros for the sake of consistency.
Yay! (I don't mean to be annoying; I just don't think we should embrace a fragile proliferation of underscores if we don't have to.)
@bors r+
…llaumeGomez
Rollup of 7 pull requests

Successful merges:

- rust-lang#118391 (Add `REDUNDANT_LIFETIMES` lint to detect lifetimes which are semantically redundant)
- rust-lang#123534 (Windows: set main thread name without re-encoding)
- rust-lang#123659 (Add support to intrinsics fallback body)
- rust-lang#123689 (Add const generics support for pattern types)
- rust-lang#123701 (Only assert for child/parent projection compatibility AFTER checking that theyre coming from the same place)
- rust-lang#123702 (Further cleanup cfgs in the UI test suite)
- rust-lang#123706 (rustdoc: reduce per-page HTML overhead)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of rust-lang#123534 - ChrisDenton:name, r=workingjubilee

Windows: set main thread name without re-encoding

As a minor optimization, we can skip the runtime UTF-8 to UTF-16 conversion.
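The saving can be seen side by side. This sketch assumes the thread name being set is `"main"` (the name Rust gives the main thread) and uses a hypothetical `to_wide_at_runtime` function as a stand-in for the conversion that is skipped; the hand-written `MAIN_WIDE` array stands in for what a `wide_str!("main")` call would produce at compile time.

```rust
use std::iter;

/// Hypothetical stand-in for the runtime path: re-encode the UTF-8 name
/// into a freshly allocated, null-terminated UTF-16 buffer on every call.
fn to_wide_at_runtime(name: &str) -> Vec<u16> {
    name.encode_utf16().chain(iter::once(0)).collect()
}

/// Precomputed path: the same code units, fixed at compile time. Written out
/// by hand here; a `wide_str!("main")` call would yield the equivalent array.
const MAIN_WIDE: [u16; 5] = [b'm' as u16, b'a' as u16, b'i' as u16, b'n' as u16, 0];

fn main() {
    // Both paths produce identical code units; only the runtime path
    // allocates and loops over the input.
    let runtime = to_wide_at_runtime("main");
    assert_eq!(runtime.as_slice(), &MAIN_WIDE[..]);
}
```

The runtime version allocates and re-encodes each time it runs, while the precomputed array leaves nothing to convert when the thread name is set.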