deprecate Unicode functions that will be moved to crates.io #24428
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override)
Branch updated from 1e709bf to c596b9a.
@@ -161,6 +161,9 @@ enum DecompositionType {
/// External iterator for a string decomposition's characters.
///
/// For use with the `std::iter` module.
#[allow(deprecated)]
#[deprecated(reason = "use the crates.io `unicode-decomp` library instead",
             since = "1.0.0-nightly-20150415")]
These `since` tags should all be "1.0.0" for now (i.e. they're applicable for the 1.0.0 release).
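For comparison, outside the standard library the stable form of this attribute takes `since` and `note` (the std-internal attribute in the diff used `reason`). A minimal sketch, assuming hypothetical `old_width` and `display_columns` functions that are illustrative only and not part of any real crate:

```rust
/// Hypothetical replacement API. Naive assumption for illustration:
/// one display column per `char`.
pub fn display_columns(s: &str) -> usize {
    s.chars().count()
}

/// Hypothetical old API, kept only for compatibility; it forwards to the new one.
#[deprecated(since = "1.0.0", note = "use `display_columns` instead")]
pub fn old_width(s: &str) -> usize {
    display_columns(s)
}

/// A caller that still needs the old API silences the warning locally,
/// mirroring the `#[allow(deprecated)]` in the diff.
#[allow(deprecated)]
fn legacy_caller() -> usize {
    old_width("abc")
}

fn main() {
    println!("{}", legacy_caller());
}
```

Any call to `old_width` without the `allow` produces a deprecation warning that includes the `note` text, which is how users get pointed to the replacement.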
Branch updated from 1bf8bd0 to 7aec8a0.
But why? I think the unicode crate is the right place for these features.
Branch updated from 7aec8a0 to d14a96c.
I'm also interested to understand the rationale for breaking these into fine-grained crates, rather than building out
The two things that convinced me each function should be in its own crate are:
I suppose one other point in favor is that there are already a few small crates out there that provide their own small bits of Unicode functionality, so an omnibus crate would end up either duplicating their functionality or requiring people to pull in multiple crates anyway.
Branch updated from d14a96c to 516795d.
@liigo the idea of moving these out has been discussed in #24402, #24340, #15628, and rust-lang/rfcs#1054. In short, there is no particular reason to have these in the standard library, and including them is a burden on both libstd and libunicode because of the stability guarantees that libstd wants to provide to users.
I don't think grapheme/width management should move out of libunicode, since they're strongly related to Unicode. How about moving the whole of libunicode out instead? cc @alexcrichton
I think maybe you're all talking about different things. One is: should these three new … The other is: should this functionality be in a crate distributed with rustc (as opposed to crates.io)? The … As @kwantam said, removing them from … But there is some compiler usage where this PR simply adds …
(We could theoretically offer a
@SimonSapin I agree regarding renaming libunicode. I was going to suggest libcore_unicode: the non-deprecated functionality really is being used in libcore, libcollections, and libstd, so calling it librustc_unicode might suggest a narrower set of uses than is actually the case. Regarding the use of … If there's general consensus on a rename, I can rename libunicode to whatever name we decide on (libcore_unicode and librustc_unicode have been suggested so far), and then leave behind a dummy libunicode that re-exports everything from lib{core,rustc}_unicode with a
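A minimal sketch of the rename-plus-shim idea, using inline modules as hypothetical stand-ins for the two crates. The `is_digit` helper is illustrative only, and deprecating the shim module as a whole is an assumption of this sketch, not something the comment states:

```rust
// Stand-in for the renamed crate (hypothetical contents).
mod rustc_unicode {
    /// Hypothetical helper: true if `c` is an ASCII digit.
    pub fn is_digit(c: char) -> bool {
        c.is_ascii_digit()
    }
}

// Stand-in for the old name, kept only as a re-exporting shim so existing
// paths keep resolving. Deprecating it is an assumption of this sketch.
#[deprecated(since = "1.0.0", note = "use `rustc_unicode` instead")]
mod unicode {
    pub use super::rustc_unicode::*;
}

/// Old-style caller that still goes through the shim.
#[allow(deprecated)]
fn via_old_name(c: char) -> bool {
    unicode::is_digit(c)
}

fn main() {
    // Both paths reach the same function.
    assert!(via_old_name('7'));
    assert!(rustc_unicode::is_digit('7'));
    assert!(!via_old_name('x'));
}
```

The glob re-export means nothing has to be listed twice; the cost is that the shim silently grows with whatever the renamed crate exports.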
I don’t have an opinion on the new name,
If that’s an option, I’d rather have it in this PR than |
👍 if everyone else is on board.
I'm ok not handling graphemes and friends in width calculations for the compiler, hardwiring to 1 seems like it's definitely fine for now. I don't have a super strong opinion on one crate vs many crates, but I might err on the side of small crates for now as it pushes back on the idea of a "dumping ground" for unicode-related functionality and as @huonw suggested we can always have our own facade crate if necessary. I would also be fine renaming libunicode in-tree, and it's also probably fine not to have much of a deprecation strategy as it looks like very few crates are still using it and it's unstable to start out with. I would recommend
@alexcrichton would you prefer I leave the feature name the same (
I think leaving the same feature name is fine, e.g. "unicode support in general".
Branch updated from 516795d to 9952769.
👍 PR updated as discussed.
I'm on board with the proposed changes.
Branch updated from d1341f9 to 503533c.
⛄ The build was interrupted to prioritize another pull request.
⌛ Testing commit 29d1252 with merge 7616f19...
⛄ The build was interrupted to prioritize another pull request.
⌛ Testing commit 29d1252 with merge b7661c9...
⛄ The build was interrupted to prioritize another pull request.
⌛ Testing commit 29d1252 with merge c9069fe...
💔 Test failed - auto-mac-64-nopt-t
Not sure how to proceed here. I don't have a Mac on which to try to reproduce, and I find it slightly hard to believe that a segfault building libserialize has anything to do with this PR.
@bors: retry
Ah, I believe this was spurious.
Thanks!
This patch
1. renames libunicode to librustc_unicode,
2. deprecates several pieces of libunicode (see below), and
3. removes references to deprecated functions from librustc_driver and libsyntax. This may change pretty-printed output from these modules in cases involving wide or combining characters used in filenames, identifiers, etc.

The following functions are marked deprecated:
1. char.width() and str.width():
   --> use unicode-width crate
2. str.graphemes() and str.grapheme_indices():
   --> use unicode-segmentation crate
3. str.nfd_chars(), str.nfkd_chars(), str.nfc_chars(), str.nfkc_chars(), char.compose(), char.decompose_canonical(), char.decompose_compatible(), char.canonical_combining_class():
   --> use unicode-normalization crate
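Why the normalization and segmentation functions need real Unicode tables rather than plain `char` iteration: the same visible text can be spelled with different code-point sequences. A std-only sketch (no crates.io dependency assumed) with two spellings of "é". Canonical normalization, the job of the unicode-normalization crate, would map both to one sequence, and unicode-segmentation would report each as a single grapheme cluster; neither step is reimplemented here.

```rust
/// Returns two spellings of "é": precomposed U+00E9, and 'e' followed by
/// U+0301 COMBINING ACUTE ACCENT (the canonically decomposed form).
fn two_spellings_of_e_acute() -> (&'static str, &'static str) {
    ("\u{e9}", "e\u{301}")
}

fn main() {
    let (precomposed, decomposed) = two_spellings_of_e_acute();

    // `==` on str compares the underlying UTF-8 bytes; no normalization happens.
    assert_ne!(precomposed, decomposed);

    // Both render as one user-perceived character, yet code-point counts differ.
    assert_eq!(precomposed.chars().count(), 1);
    assert_eq!(decomposed.chars().count(), 2);

    // UTF-8 lengths differ too: 2 bytes vs 1 + 2 bytes.
    assert_eq!(precomposed.len(), 2);
    assert_eq!(decomposed.len(), 3);

    println!("{:?} vs {:?}", precomposed, decomposed);
}
```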
I suppose it’s a bit late now, but shouldn’t deprecating/removing an entire crate plus a bunch of commonly used methods require an RFC, even if it’s marked as unstable? I know there’s been a lot of discussion about this elsewhere, but I thought that any decent-sized change should still go through the full RFC process instead of informally reaching a decision through discussion scattered across GitHub.
In fairness, at this point nothing's been removed from libunicode, only deprecated (well, and renamed). We can still put things through the RFC process if that's deemed necessary. It's possible that we could leave the width-related functions behind a rustc_private feature gate to keep #8706 closed. It's not clear to me how critical that bug is.
@P1start unfortunately requiring an RFC for nearly any modification to libstd is probably infeasible, especially when it comes to unstable APIs. We've long thought that these APIs would move out of the standard library at some point as they've stuck out as not quite belonging for some time now, but we may not have communicated that clearly enough. Right now we don't have a great story for external libraries in the rust-lang organization (and elsewhere) in terms of evolving their API, possibly coming back into libstd, etc. I would expect an RFC if these libraries are to be re-included, but for now I wouldn't expect an RFC to move unstable features out into external crates.
Hm, I'm unclear why the compiler's use of these functions was removed: isn't the point of having the
@huonw I would personally be somewhat uncomfortable maintaining these tables in the standard library solely for private use by the compiler when they don't come up in practice that often, so to slate them for deletion, the compiler was hardwired to treat each char as occupying one column on the screen.
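A minimal sketch of what "one column per char" means for diagnostic output, assuming a hypothetical `caret_line` helper (not the compiler's actual code). The caret offset is the number of chars before the span, so it lines up for ASCII but drifts for wide characters (CJK characters typically occupy two terminal columns) and for combining marks (which occupy none):

```rust
/// Hypothetical helper: builds the line of spaces and carets that would sit
/// under `source`, marking the byte range `start..end`.
/// Assumes every char occupies exactly one terminal column.
fn caret_line(source: &str, start: usize, end: usize) -> String {
    // Columns before the span: one per char, regardless of display width.
    let pad = source[..start].chars().count();
    // Columns covered by the span, same assumption; always at least one caret.
    let len = source[start..end].chars().count().max(1);
    format!("{}{}", " ".repeat(pad), "^".repeat(len))
}

fn main() {
    // Mark `y` at byte range 8..9.
    let ascii = "let x = y;";
    println!("{}\n{}", ascii, caret_line(ascii, 8, 9));

    // `y` starts at byte 13: "let " (4) + "名前" (2 x 3) + " = " (3).
    // The caret is padded by 9 chars, but a terminal that renders each CJK
    // character two columns wide displays `y` at column 11.
    let wide = "let 名前 = y;";
    println!("{}\n{}", wide, caret_line(wide, 13, 14));
}
```

For the ASCII line the caret lands exactly under `y`; for the second line it lands two columns early, which is the kind of pretty-printing change the patch description warns about.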