-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OS string string-like interface #1309
Conversation
See rust-lang/rust#26499 (related discussion) |
Thanks. I'd missed that discussion.
That also suggests the question of whether these functions should take things like |
```rust | ||
/// Returns true if the string starts with a valid UTF-8 sequence | ||
/// equal to the given `&str`. | ||
fn starts_with_str(&self, prefix: &str) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When adding string-like APIs to OsStr
, it may be best to try to stick to original str
APIs as much as possible, while also allowing all kinds of fun functionality for OsStr
. Along those lines, perhaps this API could look like:
fn starts_with<S: AsRef<OsStr>>(&self, prefix: S) -> bool;
That should cover this use case as well as something like os_str.starts_with(&other_os_string)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable. I'll make that change.
Thanks for the RFC @wthrowe! It does seem high time that we start expanding the API for My thoughts on this in the past have been primarily along the lines of:
|
I'll look into writing up equivalents of some of the more general pattern matching |
Yeah I agree that |
|
||
This is analogous to the existing `OsStr::to_string_lossy` method, but | ||
transfers ownership. This operation can be done without a copy if the | ||
`OsString` contains UTF-8 data or if the platform is Windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading between lines I think you already know that, but WTF-8 to UTF-8 conversion can be done in place: https://simonsapin.github.io/wtf-8/#converting-wtf-8-utf-8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. In fact, the internal WTF-8 implementation in libstd already has an into_string_lossy
that does the right thing. It just isn't exposed at the OsString
level.
Also adds more explanation of how OS strings are interpreted.
This seems to not prevent anything actually useful and avoids confusion. Also matches `str` better.
New version, now with more patterns! The previous proposed functions have mostly been replaced with a new interface matching I did decide to not propose The main uncertainty in the new version is what to do with patterns that match the empty string. The non-iterator functions (like |
One more thought on patterns: With more complicated implementors of |
|
||
```rust | ||
/// Returns true if `needle` is a substring of `self`. | ||
fn contains_os<S: AsRef<OsStr>>(&self, needle: S) -> bool; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it may be a bit onerous to remember that contains
and contains_os
are both methods on an OsStr
. I think we'd be in a better place (e.g. especially in mirroring str
) if we could avoid extra methods like this. It may involve perhaps a new Pattern
trait down below, but we may also be able to substitute AsRef<OsStr>
for Pattern
because str
is in theory far more ubiquitous than OsStr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some kind of OsPattern
trait implemented for P: Pattern
and OsStr
could probably work. (Implementing for AsRef<OsStr>
as well would be forbidden by coherence, I believe.) There might be some trickiness due to starts_with
and contains
having different bounds. Have to think about it.
Edit: I think that can be dealt with by bounds on the trait methods.
No reason such a thing couldn't be used for replace
as well, except that for some reason str
doesn't accept patterns there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I forget the exact reason that replace on strings doesn't take a pattern, but I think there may be a good reason? (cc @Kimundi)
Otherwise yeah having an OsPattern
trait seems not-too-bad here perhaps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a possible rough design:
Trait OsPattern<'a>
is basically identical to Pattern<'a>
except that the str
s become OsStr
s, the Searcher
is bound on OsSearcher<'a>
, and is_contained_in
has an extra bound (more on that later).
Safe trait OsSearcher<'a>
only has methods haystack
(same as Searcher<'a>
) and is_prefix_of
.
Safe trait ReverseOsSearcher<'a>
requires OsSearcher<'a>
and adds is_suffix_of
. OsStr::ends_with
requires this bound.
Safe trait FullOsSearcher<'a>
requires OsSearcher<'a>
and adds is_contained_in
. OsStr::contains
requires this bound.
If we additionally want to support replace
with patterns, then we also have:
Unsafe trait IndexedOsSearcher<'a>
requires FullOsSearcher<'a>
and adds the remainder of the Searcher<'a>
methods, except that all the usize
returns are changed to OsIndex<'a>
s. OsStr::replace
requires this bound.
Enum OsIndex<'a>
has variants Unicode(&'a str, usize)
and NonUnicode(&'a OsStr, NonUnicodeIndex)
. The first field in each variant is expected to be a substring of the haystack, and the second is an index into that substring. NonUnicodeIndex
is a struct with basically no public interface. (OS-specific interfaces may be added later.)
If we decide to add methods like the split_prefix
mentioned above we will likely need to add more variants on the OsSearcher
trait. Ideally that would be figured out before any of this is stabilized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in the limit I think this may end up duplicating the API surface area of the string pattern traits, but there's also a question as to whether this needs to be so full-blown just yet. In theory OS strings are used much more rarely than regular strings (especially for various text manipulation routines), so we may be able to get by with a much smaller API surface area.
I wonder if perhaps this could expose a relatively straightforward, if not as generic, API today which leaves room to this sort of expansion in the future but doesn't take the leap quite just yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reflecting on this, I think at the very least IndexedOsSearcher
is silly, because I can't think of any case where one would implement FullOsSearcher
but not it, so they can be merged.
The only reasonable simplification of the API that I can think of that can be generalized to something like this is to just bound on the Pattern
family of traits. Then there are no new traits needed, and everything works except for starts_with
, ends_with
, contains
, and replace
with an OsStr
. That's probably not too bad as a restriction, and I believe changing to OsPattern
at some later time would be fully backwards compatible.
I've unfortunately not been able to find much time to allocate to this RFC, so @Kimundi (who originally developed the pattern API for strings) is gonna take over this. |
Hi, just wanted to say that haven't forgotten about this RFC, but I'm currently investigating how a fully general pattern API (for arbitrary slice types) would look like, and whether there might be incompatibilities or method name clashes with this RFC. |
@Kimundi Thanks for the update - could really do with this functionality! |
Update: I have a incomplete, undocumented prototype at https://github.com/Kimundi/rust_pattern_api_v2 that provides the same Pattern API + iterators for both I need to find time to do a proper writeup for my findings, but in regard to this RFC the cliff notes are:
|
The libs team discussed this RFC recently and the conclusion was that while we'd like to implement something along these lines the RFC will need much of a revamp now and we unfortunately haven't been able to reach the author, so we're going to close. We're of course quite willing to entertain RFCs in this area though! |
Rendered
Prototype implementation (also covers #1307)
Cc: #900, #1307