Revisiting OsStr and UTF-8 on Windows #2741
Unfeasible for at least a few dozen more years, until virtually everybody is running Windows from April 2018 or later (rather than the current roughly even split between 7 and 10, with a sprinkling of XP, Vista, 8 and 8.1 here and there). Regardless, I feel that "when codepage is set to utf-8" is ultimately what puts a spoke in the wheel here.
Also, UTF-8 being supported doesn't change the fact that file system paths are represented as `u16` arrays, which are not guaranteed to be valid UTF-16.
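To make the "not guaranteed to be valid UTF-16" point concrete, the sketch below treats a path as a raw slice of `u16` code units. It assumes, as is commonly noted, that Windows does not require file names to be well-formed UTF-16, so an unpaired surrogate can appear in one; `wide_to_string` is a hypothetical helper, not a std or Windows API.

```rust
/// Hypothetical helper: interpret a wide (u16) path as UTF-16.
/// Returns None when the units are not well-formed UTF-16,
/// e.g. when they contain an unpaired surrogate.
fn wide_to_string(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

fn main() {
    // "a.txt" encoded as UTF-16 code units: well-formed.
    let good: Vec<u16> = "a.txt".encode_utf16().collect();
    // 'a', a lone high surrogate (0xD800), then ".txt": a sequence of
    // u16 values that is not well-formed UTF-16.
    let bad: Vec<u16> = [0x0061u16, 0xD800]
        .iter()
        .copied()
        .chain(".txt".encode_utf16())
        .collect();

    println!("{:?}", wide_to_string(&good)); // Some("a.txt")
    println!("{:?}", wide_to_string(&bad)); // None
    // A lossy conversion replaces the surrogate with U+FFFD,
    // so the original name cannot be recovered from the result.
    println!("{}", String::from_utf16_lossy(&bad));
}
```

This is why a `String` (or any strictly UTF-8 type) cannot stand in for a Windows path without either failing or losing information.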
Sorry for not quoting, I'm on mobile. As I understand it, it would be as simple as configuring (or forcing) the code page to UTF-8 for the running application as part of the environment startup code implemented for translating …
Regarding compatibility: I don't think we should explicitly aim to sunset support for older operating systems (now or in the future); I was thinking more along the lines of alternate targets akin to …
By the way, the fact that …
I don't think it's feasible for Rust's standard library to assume it can quietly change the code page for any application it's part of, simply because you can have Rust libraries inside a non-Rust application. Though even for a 100% Rust codebase, I'm not sure we'd ever want …
But if the motivation is just to make …
Applications can set the code page to UTF-8 only if they embed an application manifest declaring UTF-8 support. Rust does not do this today, and I don't know that it should. Even then, what about a …
As of Windows 10 build 17035 (stabilized in the April 2018 update), all (in reality, most) Windows APIs can be called via the "legacy" narrow-string interfaces (e.g. `CreateFileA` instead of `CreateFileW`), which have been updated to be UTF-8 compatible when the codepage is set to UTF-8 (rather than expecting CP-1252 or the non-Unicode system character set). This also affects non-suffixed functions like `fopen`.
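As a rough illustration of the difference between the two interface families, the sketch below builds the NUL-terminated buffers each would be handed for the same file name: UTF-16 code units for a wide (`W`) function, and UTF-8 bytes for a narrow (`A`) function, assuming the process code page is UTF-8 (code page 65001). It deliberately makes no FFI calls; `to_wide_nul` and `to_narrow_utf8_nul` are hypothetical helper names.

```rust
use std::ffi::CString;
use std::iter;

/// Hypothetical helper: the buffer shape a wide API such as CreateFileW
/// expects, i.e. UTF-16 code units followed by a terminating 0.
fn to_wide_nul(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(iter::once(0)).collect()
}

/// Hypothetical helper: the buffer a narrow API such as CreateFileA would
/// receive, assuming the active code page is UTF-8: UTF-8 bytes plus NUL.
/// Returns None if the input contains an interior NUL byte.
fn to_narrow_utf8_nul(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

fn main() {
    let name = "café.txt";
    let wide = to_wide_nul(name);
    let narrow = to_narrow_utf8_nul(name).expect("no interior NUL");
    // 'é' is one UTF-16 code unit but two UTF-8 bytes.
    println!("wide: {} units incl. NUL", wide.len()); // 9
    println!("narrow: {} bytes incl. NUL", narrow.as_bytes_with_nul().len()); // 10
}
```

Under any other active code page, a narrow function would interpret these same bytes in that code page instead, which is why the `A` route only helps when UTF-8 is active.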
I haven't dug in deeply enough to see what the behavior is when the original data is not valid Unicode (e.g. how this intersects with WTF-8), but it does provide some opportunities to unify the behavior between Unix and Windows FFI, e.g. using `u8` rather than `u16` for `OsStrExt`.

Of course, none of this is going to be backported to the older versions of Windows we would like to continue supporting, so I'm not sure how that would ultimately play with `cfg` targets, either.
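One way to picture how `u16`-based data could be exposed as `u8` without losing invalid sequences is WTF-8, which the issue mentions. For context, `std::os::unix::ffi::OsStrExt` exposes bytes via `as_bytes`, while `std::os::windows::ffi::OsStrExt` exposes `u16` units via `encode_wide`. Below is a minimal encoder sketch; the name `wtf8_from_wide` is hypothetical and this is not a claim about how the standard library is implemented. Well-formed UTF-16 becomes ordinary UTF-8, and each unpaired surrogate is written as the three-byte sequence UTF-8 would use for that code point if surrogates were allowed.

```rust
/// Hypothetical WTF-8 encoder sketch: converts possibly ill-formed UTF-16
/// into bytes. Valid scalar values (including surrogate pairs) become
/// ordinary UTF-8; each unpaired surrogate becomes a 3-byte sequence.
fn wtf8_from_wide(units: &[u16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len());
    for item in char::decode_utf16(units.iter().copied()) {
        match item {
            Ok(c) => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
            Err(e) => {
                // Surrogates lie in U+D800..=U+DFFF, so the generalized
                // UTF-8 form is always three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
                let cp = e.unpaired_surrogate();
                out.push(0xE0 | (cp >> 12) as u8);
                out.push(0x80 | ((cp >> 6) & 0x3F) as u8);
                out.push(0x80 | (cp & 0x3F) as u8);
            }
        }
    }
    out
}

fn main() {
    // A surrogate pair (U+1F600) decodes to a normal 4-byte UTF-8 sequence.
    println!("{:02X?}", wtf8_from_wide(&[0xD83D, 0xDE00]));
    // 'a' followed by a lone high surrogate keeps the surrogate as bytes.
    let bytes = wtf8_from_wide(&[0x0061, 0xD800]);
    println!("{:02X?}", bytes);
    // Those bytes are not valid UTF-8, so they cannot become a String.
    println!("valid UTF-8: {}", String::from_utf8(bytes).is_ok());
}
```

The output is valid UTF-8 exactly when the input was valid UTF-16, so well-formed names work with ordinary string APIs while ill-formed ones are still represented without loss.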