Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8. #96869
r? rust-lang/libs
@bors r+ rollup=never
📌 Commit f4c93bcbe837ab5220beaa2892fc249f49a9e740 has been approved by
⌛ Testing commit f4c93bcbe837ab5220beaa2892fc249f49a9e740 with merge 7f13587a59ff3c5350b77bf9334994f91f05d9fc...
💔 Test failed - checks-actions
`src/librustdoc/html/render/context.rs` had an assert that rustdoc's render context was a certain size. It's already not checked on most platforms, as it's just a guard against the context growing unexpectedly, so I've now added a patch to disable the check on Windows too.
☔ The latest upstream changes (presumably #97433) made this pull request unmergeable. Please resolve the merge conflicts.
There's a merge conflict now. Needs a rebase.
@rustbot label -S-waiting-on-author +S-waiting-on-review
@bors r+
📌 Commit e620b42644e3404377fa0fae0afd660c6bd77999 has been approved by
🌲 The tree is currently closed for pull requests below priority 1000. This pull request will be tested once the tree is reopened.
Merge branch 'master' into main
Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the string is known to contain UTF-8. This is efficiently computed in many common situations, such as when a `Wtf8Buf` is constructed from a `String` or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16 decoding and already checking for surrogates. This makes `OsString::into_string` O(1) rather than O(N) on Windows in common cases. And, it eliminates the need to scan through the string for surrogates in `Args::next` and `Vars::next`, because the strings are already being translated with `Wtf8Buf::from_wide`. Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`, such as `DirEntry::file_name` and `fs::read_link`, so with this patch, users of those functions can subsequently call `.into_string()` without paying for an extra scan through the string for surrogates.
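To make the idea in this commit message concrete, here is a minimal sketch of a buffer that carries such a flag. It is not the actual `Wtf8Buf` code: `SketchBuf` and its methods are hypothetical names, and the slow path uses a plain `String::from_utf8` validation as a stand-in for whatever scan the real implementation performs. The only point is that constructors that already know their input is UTF-8 (or that already inspect every code unit, as a `from_wide`-style decoder does) can record that fact for free, so `into_string` can skip the O(N) pass.

```rust
/// Hypothetical, simplified stand-in for a WTF-8 buffer with a
/// "known UTF-8" flag. Not the real `Wtf8Buf`.
#[derive(Debug)]
struct SketchBuf {
    bytes: Vec<u8>,
    /// True only if `bytes` is known to be valid UTF-8.
    is_known_utf8: bool,
}

impl SketchBuf {
    /// A `String` is always valid UTF-8, so the flag costs nothing here.
    fn from_string(s: String) -> Self {
        SketchBuf { bytes: s.into_bytes(), is_known_utf8: true }
    }

    /// Decode potentially ill-formed UTF-16. The decoder already looks at
    /// every code unit, so spotting unpaired surrogates adds no extra pass.
    fn from_wide(units: &[u16]) -> Self {
        let mut bytes = Vec::with_capacity(units.len());
        let mut is_known_utf8 = true;
        for item in char::decode_utf16(units.iter().copied()) {
            match item {
                Ok(c) => {
                    let mut tmp = [0u8; 4];
                    bytes.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
                }
                Err(e) => {
                    // Unpaired surrogate: store it in the 3-byte
                    // generalized-UTF-8 form and clear the flag.
                    let s = e.unpaired_surrogate() as u32;
                    bytes.push(0xE0 | (s >> 12) as u8);
                    bytes.push(0x80 | ((s >> 6) & 0x3F) as u8);
                    bytes.push(0x80 | (s & 0x3F) as u8);
                    is_known_utf8 = false;
                }
            }
        }
        SketchBuf { bytes, is_known_utf8 }
    }

    /// O(1) when the flag is set; otherwise fall back to a full scan.
    fn into_string(self) -> Result<String, SketchBuf> {
        if self.is_known_utf8 {
            // SAFETY: every constructor above sets the flag only when
            // `bytes` is valid UTF-8.
            Ok(unsafe { String::from_utf8_unchecked(self.bytes) })
        } else {
            // Assumption: ordinary UTF-8 validation as the slow path.
            // It rejects encoded surrogates, which is what we need here.
            String::from_utf8(self.bytes).map_err(|e| SketchBuf {
                bytes: e.into_bytes(),
                is_known_utf8: false,
            })
        }
    }
}
```

In this sketch, a buffer built from `[0x0068, 0x0069]` keeps the flag set and converts without a scan, while one built from `[0x0061, 0xD800]` clears the flag and `into_string` returns `Err`.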
This assert is just making sure the size of `Context` doesn't grow unexpectedly, and it's already not being checked on every platform. `PathBuf` now has a different size on Windows, so adjust this to avoid checking the size on Windows.
I think I used the GitHub merge-conflict UI and didn't realize that generated a merge commit. I've now fixed that.
Ping @joshtriplett; this is already
@bors r=joshtriplett
☀️ Test successful - checks-actions |
Finished benchmarking commit (25ea5a3): comparison URL.
Overall result: no relevant changes - no action needed
@rustbot label: -perf-regression
Instruction count: This benchmark run did not return any relevant results for this metric.
Max RSS (memory usage): Results. This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
/// It is possible for `bytes` to have valid UTF-8 without this being
/// set, such as when we're concatenating `&Wtf8`'s and surrogates become
/// paired, as we don't bother to rescan the entire string.
is_known_utf8: bool,
Adding a field here introduced a subtle bug in PathBuf: #124409.
Privacy-breaking transmutes are "fun". ;)
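For readers following the thread, the doc comment on `is_known_utf8` quoted above says the flag is conservative: the bytes can become valid UTF-8 without the flag being set. The following is a rough sketch of that situation, not the real implementation. `ConcatSketch`, `from_surrogate`, and `push` are hypothetical names. Two buffers each hold one half of a surrogate pair; appending one to the other joins them into a single 4-byte character, but the flag is left as it was because recomputing it would mean rescanning the string.

```rust
/// Hypothetical, simplified WTF-8-like buffer; not the real `Wtf8Buf`.
#[derive(Debug)]
struct ConcatSketch {
    bytes: Vec<u8>,
    is_known_utf8: bool,
}

/// Decode a 3-byte encoded surrogate (ED xx yy) back to its code unit.
fn decode_surrogate(b: &[u8]) -> u16 {
    0xD800 | ((b[1] as u16 & 0x3F) << 6) | (b[2] as u16 & 0x3F)
}

impl ConcatSketch {
    /// Build a buffer holding a single unpaired surrogate.
    fn from_surrogate(s: u16) -> Self {
        let s = s as u32;
        let bytes = vec![
            0xE0 | (s >> 12) as u8,
            0x80 | ((s >> 6) & 0x3F) as u8,
            0x80 | (s & 0x3F) as u8,
        ];
        ConcatSketch { bytes, is_known_utf8: false }
    }

    /// Append `other`, joining a trailing lead surrogate in `self` with a
    /// leading trail surrogate in `other` into one supplementary character.
    fn push(&mut self, other: &ConcatSketch) {
        let n = self.bytes.len();
        let ends_with_lead = n >= 3
            && self.bytes[n - 3] == 0xED
            && (0xA0..=0xAF).contains(&self.bytes[n - 2]);
        let starts_with_trail = other.bytes.len() >= 3
            && other.bytes[0] == 0xED
            && (0xB0..=0xBF).contains(&other.bytes[1]);

        if ends_with_lead && starts_with_trail {
            let hi = decode_surrogate(&self.bytes[n - 3..]);
            let lo = decode_surrogate(&other.bytes[..3]);
            // A lead followed by a trail always decodes to one char.
            let c = char::decode_utf16([hi, lo].iter().copied())
                .next()
                .unwrap()
                .unwrap();
            self.bytes.truncate(n - 3);
            let mut tmp = [0u8; 4];
            self.bytes.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
            self.bytes.extend_from_slice(&other.bytes[3..]);
            // The flag is deliberately NOT recomputed: `self` held a
            // surrogate, so it was already false, and proving the result
            // is now UTF-8 would require rescanning the whole buffer.
        } else {
            self.bytes.extend_from_slice(&other.bytes);
            self.is_known_utf8 = self.is_known_utf8 && other.is_known_utf8;
        }
    }
}
```

Pushing a buffer holding `0xDE00` onto one holding `0xD83D` leaves bytes that pass `std::str::from_utf8`, yet `is_known_utf8` stays `false`. A false flag therefore only means "not known", so a scan is still needed, whereas a true flag has to be trustworthy.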
…ng-utf8-invariant, r=<try> Make PathBuf less Ok with adding UTF-16 then `into_string` Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869. try-job: x86_64-msvc
…ring-utf8-invariant, r=joboet Make PathBuf less Ok with adding UTF-16 then `into_string` Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869. try-job: x86_64-msvc
Rollup merge of rust-lang#126305 - workingjubilee:fix-os-string-to-string-utf8-invariant, r=joboet Make PathBuf less Ok with adding UTF-16 then `into_string` Fixes rust-lang#126291 which is, as far as I can tell, a regression introduced by rust-lang#96869. try-job: x86_64-msvc
Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the string is known to contain UTF-8. This is efficiently computed in many common situations, such as when a `Wtf8Buf` is constructed from a `String` or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16 decoding and already checking for surrogates.

This makes `OsString::into_string` O(1) rather than O(N) on Windows in common cases.

And, it eliminates the need to scan through the string for surrogates in `Args::next` and `Vars::next`, because the strings are already being translated with `Wtf8Buf::from_wide`.

Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`, such as `DirEntry::file_name` and `fs::read_link`, so with this patch, users of those functions can subsequently call `.into_string()` without paying for an extra scan through the string for surrogates.
r? @ghost