-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accept underscores in unicode escapes #43716
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
47416d3
to
7d1fa02
Compare
src/libsyntax/parse/lexer/mod.rs
Outdated
/// | ||
/// At this point, we have already seen the \ and the u, the { is the current character. We | ||
/// will read at least one digit, and up to 6, and pass over the }. | ||
/// At this point, we have already seen the `\` and the `u`, the `{` is the current character. We |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI is unhappy with this line because it is too long 🙂
[00:03:14] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:968: line longer than 100 chars
[00:03:15] some tidy checks failed
|
@SimonSapin what do you think about this? Also ping @rust-lang/lang |
Making numeric escape sequences consistent with integer literals makes sense to me. 👍 |
Please don't allow prefixes or suffixes, but otherwise this seems like a great idea. |
@MaloJaffre |
14085a8
to
0bac86c
Compare
Thanks for the suggestion @petrochenkov. Edit: Travis failure looks spurious (workers failed to start) |
src/libsyntax/parse/lexer/mod.rs
Outdated
loop { | ||
match self.ch { | ||
Some('}') => { | ||
if valid && count == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if count == 0
would give the same result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because in the case \u{#}
, we don't want to say that the escape is empty, so we check there was no invalid characters before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, this is in a loop, okay then.
src/libsyntax/parse/lexer/mod.rs
Outdated
self.err_span_char(start_bpos, | ||
self.pos, | ||
"invalid character in unicode escape", | ||
c); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error can now be reported a lot of times in case of unterminated unicode escapes.
It probably should be reported only the first time.
src/libsyntax/parse/mod.rs
Outdated
diag.struct_span_err(span, "invalid unicode character escape") | ||
.help("unicode escape must be at most 10FFFF") | ||
.emit(); | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid an changing the return type to option here and just return something like Replacement character U+FFFD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaloJaffre
Could you also squash commits after updating the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @petrochenkov!
Ok, I will shortly do another round of changes and squash everything.
Implementation LGTM, modulo comments. @rfcbot fcp merge |
I have no rights for @rfcbot, could someone start an FCP? |
0bac86c
to
d4e0e52
Compare
@petrochenkov Done. Edit: Travis failure looks spurious (OSX jobs failed to start). |
Friendly ping @nikomatsakis, to start a FCP, if there are no concerns about the implementation. |
@MaloJaffre |
I think this is fine without a feature gate. (Though I’m not in any team that would make that decision.) |
@rfcbot fcp merge |
Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams: No concerns currently listed. Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me. |
self.bump(); | ||
count += 1; | ||
if let Some('_') = self.ch { | ||
// disallow leading `_` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a compile-fail
test checking that leading _
is disallowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a parse-fail
test that checks that, do I need to move it to compile-fail
?
🔔 This is now entering its final comment period, as per the review above. 🔔 |
The final comment period is now complete. |
@bors r+ |
📌 Commit d4e0e52 has been approved by |
Accept underscores in unicode escapes Fixes #43692. I don't know if this need an RFC, but at least the impl is here!
☀️ Test successful - status-appveyor, status-travis |
It is worth noting that most syntax highlighters will need updating to support this. (I just did Vim.) We need something like a mailing list for syntax highlighters where syntax changes can be announced. Regular expression highlighters will now need something like |
Rust gained this in rust-lang/rust#43716.
Changes in Rust highlighting: * Add missing types (keywords) and missing integer suffixes: `i128` & `u128` * https://doc.rust-lang.org/std/primitive.i128.html * https://doc.rust-lang.org/std/primitive.u128.html * Fix: Allow underscores in unicode escapes: rust-lang/rust#43716 Sources: * Rust documentation: https://doc.rust-lang.org/ * Rust in Vim: https://github.com/rust-lang/rust.vim/blob/master/syntax/rust.vim * Rust in ACE: https://github.com/ajaxorg/ace/blob/master/lib/ace/mode/rust_highlight_rules.js
Fixes #43692.
I don't know if this need an RFC, but at least the impl is here!