Revisiting OsStr and UTF-8 on Windows #2741

mqudsi · 2019-08-09T22:19:43Z

As of Windows 10 build 17035 (stabilized in the April 2018 Update), all (in reality, most) Windows APIs called via the "legacy" narrow-string interfaces (e.g. CreateFileA instead of CreateFileW) have been updated to be UTF-8 compatible when the code page is set to UTF-8 (rather than expecting CP-1252 or the non-Unicode system character set). This also affects non-suffixed functions like fopen.

I haven't dug in deeply enough to see what the behavior is when the original data is not valid Unicode (e.g. how this intersects with WTF-8), but it does provide some opportunities to unify the behavior between unix and Windows ffi, e.g. using u8 rather than u16 for OsStrExt.
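To make the u8-versus-u16 split concrete, here is a minimal sketch of how the two platform-specific `OsStrExt` traits expose an `OsStr` today. The helper name `raw_unit_count` is hypothetical, and the sketch assumes a Unix or Windows target (other targets have no matching trait here).

```rust
use std::ffi::OsStr;

/// Hypothetical helper: counts the raw code units that the platform's
/// `OsStrExt` trait exposes. On Unix these are `u8` bytes.
#[cfg(unix)]
fn raw_unit_count(s: &OsStr) -> usize {
    use std::os::unix::ffi::OsStrExt;
    s.as_bytes().len()
}

/// On Windows the same trait name exposes `u16` units instead, so
/// portable code cannot share one implementation.
#[cfg(windows)]
fn raw_unit_count(s: &OsStr) -> usize {
    use std::os::windows::ffi::OsStrExt;
    s.encode_wide().count()
}
```

For pure ASCII input both variants agree, but a string such as "é" is two units on Unix (UTF-8 bytes) and one unit on Windows (a single UTF-16 code unit), which is the kind of divergence the proposal would like to remove.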

Of course, none of this is going to be backported to the older versions of Windows we would like to continue supporting, so I'm not sure how that would ultimately play with cfg targets, either.

nagisa · 2019-08-09T23:52:45Z

Unfeasible for at least a few dozen more years, until virtually everybody is using the April 2018 or a later version of Windows (rather than it being an equal split between 7 and 10, with sprinkles of XP, Vista, 8 and 8.1 here and there).

Regardless, I feel that "when codepage is set to utf-8" is ultimately what puts a spoke in the wheel here.

Diggsey · 2019-08-10T00:05:30Z

Also, UTF-8 being supported doesn't change the fact that file system paths are represented as (possibly UTF-16) u16 arrays.
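The "possibly UTF-16" caveat can be illustrated with a small sketch: a u16 array containing an unpaired surrogate is a legal Windows file name but has no UTF-8 representation. The helper names below are hypothetical; only `String::from_utf16` and `String::from_utf16_lossy` are standard library calls.

```rust
/// Hypothetical helper: strict conversion of wide units to UTF-8.
/// Returns `None` when the units are not well-formed UTF-16.
fn wide_to_utf8(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Hypothetical helper: lossy conversion. Unpaired surrogates become
/// U+FFFD, so the original name can no longer be recovered.
fn wide_to_utf8_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

Passing `[0x0061, 0xD800]` (the letter "a" followed by a lone high surrogate) makes the strict version return `None`, while the lossy version silently changes the name. Either outcome is a problem for a path type that must round-trip.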

mqudsi · 2019-08-10T00:57:55Z

Sorry for not quoting, I'm on mobile.

As I understand it, it would be as simple as configuring/forcing the code page to UTF-8 for the running application as part of the environment startup code that wraps fn main() for MSVC targets.

Regarding compatibility: I don’t think we should explicitly aim to sunset support for older operating systems (now or in the future). I was thinking more along the lines of alternate targets akin to msvc.legacy and msvc, with the former being what we currently have in place. Obviously I'm only thinking aloud at this point in time, though!

crlf0710 · 2019-08-10T04:06:51Z

By the way, the fact that OsStr doesn't support slicing on Windows greatly limits its usefulness. I think the API surface should expand a little.

Ixrec · 2019-08-10T13:40:27Z

I don't think it's feasible for Rust's standard library to assume it can quietly change the code page for any application it's part of, simply because you can have Rust libraries inside a non-Rust application. Though even for a 100% Rust codebase, I'm not sure we'd ever want std APIs to be implicitly changing subtle process-wide settings like code page. Making OsStr be UTF-8 probably won't be a viable proposal until all non-end-of-life versions of Windows reject non-UTF-8 paths by default.

But if the motivation is just to make OsStr slicing a thing, I believe #2295 is an accepted (albeit apparently never implemented?) RFC to enable pattern matching and slicing OsStrs by adopting the OMG-WTF-8 encoding for Windows.
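As background for why slicing is delicate, here is a minimal sketch of plain WTF-8 encoding, the scheme that lets an unpaired surrogate survive in a byte buffer. It is not the OMG-WTF-8 extension that the RFC describes, and the function name `encode_wtf8` is hypothetical.

```rust
/// Hypothetical helper: encodes wide units as WTF-8-style bytes.
/// Well-formed surrogate pairs become ordinary 4-byte UTF-8; a lone
/// surrogate gets the 3-byte "generalized UTF-8" form instead.
fn encode_wtf8(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::new();
    for decoded in char::decode_utf16(units.iter().copied()) {
        match decoded {
            Ok(c) => {
                // Ordinary scalar value: standard UTF-8.
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            Err(e) => {
                // Lone surrogate (0xD800..=0xDFFF): 3-byte pattern
                // 1110xxxx 10xxxxxx 10xxxxxx, which strict UTF-8 forbids.
                let s = e.unpaired_surrogate() as u32;
                out.push(0xE0 | (s >> 12) as u8);
                out.push(0x80 | ((s >> 6) & 0x3F) as u8);
                out.push(0x80 | (s & 0x3F) as u8);
            }
        }
    }
    out
}
```

For well-formed input the result is byte-identical to UTF-8, so `str`-style slicing rules apply. A lone surrogate such as 0xD800 encodes to the bytes ED A0 80, which `std::str::from_utf8` rejects, so a slice of such a buffer cannot simply be handed out as a `&str`.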

LunarLambda · 2024-05-21T10:25:46Z

> As I understand it, it would be as simple as configuring/forcing the code page to UTF-8 for the running application as part of the environment startup code that wraps fn main() for MSVC targets.

Applications can set the code page to UTF-8 only if they embed an application manifest specifying UTF-8 support. Rust does not do this today, and I don't know that it should. Even then, what about a cdylib, or in the future a dylib, being loaded by an application without such a manifest? Now you have Rust code assuming it can speak UTF-8 to system APIs when, in fact, it can't.
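One way a library could avoid that assumption is to ask the process which code page is active before using any narrow-string API. The following is only a sketch: `GetACP` and the value 65001 (CP_UTF8) are real Win32 items, while `active_code_page` and `narrow_apis_speak_utf8` are hypothetical names, and the sketch assumes that `GetACP` reports 65001 when the host's manifest opts in to UTF-8.

```rust
/// Win32 identifier for the UTF-8 code page (CP_UTF8).
const CP_UTF8: u32 = 65001;

#[cfg(windows)]
mod sys {
    // Note: on edition 2024 this block must be written `unsafe extern`.
    #[link(name = "kernel32")]
    extern "system" {
        /// Win32 `UINT GetACP(void)`: the process's active ANSI code page.
        pub fn GetACP() -> u32;
    }
}

/// Hypothetical helper: the active code page, or `None` off Windows.
#[cfg(windows)]
fn active_code_page() -> Option<u32> {
    // SAFETY: GetACP takes no arguments and has no preconditions.
    Some(unsafe { sys::GetACP() })
}

#[cfg(not(windows))]
fn active_code_page() -> Option<u32> {
    None
}

/// Hypothetical helper: true only when the host process (which may not
/// be written in Rust) is actually running with the UTF-8 code page.
fn narrow_apis_speak_utf8() -> bool {
    active_code_page() == Some(CP_UTF8)
}
```

A check like this has to run at every call site, or at least once per process, because a cdylib has no control over the manifest of the application that loads it. When the check fails, the code would need to fall back to the wide (`W`) APIs.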

jonas-schievink added the labels A-ffi (FFI related proposals), A-string (Proposals relating to strings), and A-windows (Proposals relating to Windows) on Nov 15, 2019.
