-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add check for possible CStr literals in pre-2021 #118691
Conversation
r? @b-naber (rustbot has picked a reviewer for you, use r? to override) |
@@ -640,6 +640,17 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
// extra info for `c"str"` before 2021 edition or `c "str"` in all editions | |||
if self.prev_token.is_ident_named(Symbol::intern("c")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.prev_token.is_ident_named(Symbol::intern("c")) | |
if self.prev_token.is_ident_named(sym::c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably also want to check for cr
(as in cr""
, raw c-string literals). If I'm not mistaken “sym::c && StrRaw
” only accounts for c r""
(notice the space).
// extra info for `c"str"` before 2021 edition or `c "str"` in all editions | ||
if self.prev_token.is_ident_named(Symbol::intern("c")) | ||
&& let TokenKind::Literal(token::Lit { kind: token::Str | token::StrRaw(..), .. }) = | ||
&self.token.kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could check if the c
immediately precedes the string literal via self.prev_token.span.hi() == self.token.span.lo()
to rule out the false positives like c ""
(notice the space) since I don't know how helpful the c-str notes are for c ""
. For comparison, we don't provide any special diagnostics for b ""
and r ""
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you noted
We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.
it should be as simple as self.prev_token.span.eq_ctxt(self.token.span) && self.token.span.edition < Edition::Edition2021
.
The eq_ctxt
check prevents the notes about cstr literals from misfiring on:
// edition: <2021
macro_rules! m { ($x:ident) => { $x"" } }
fn main() { m!(c) }
Your current implementation fires here (I think, unless I'm misremembering and we have an uninterpolated NtIdent
🤔).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fixing this up, but using this macro as an example, self.prev_token.span.eq_ctxt(self.token.span) == true
. The previous token's span appears to be that of the $x
token in the macro expansion body, rather than the c
argument to the macro, so they have the same context. Is this correct, or am I missing something?
&self.token.kind | ||
{ | ||
err.note("you may be trying to declare a CStr literal"); | ||
err.note("`CStr` literals require edition 2021"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err.note("`CStr` literals require edition 2021"); | |
err.note("`c-string literals require at least edition 2021"); |
&& let TokenKind::Literal(token::Lit { kind: token::Str | token::StrRaw(..), .. }) = | ||
&self.token.kind | ||
{ | ||
err.note("you may be trying to declare a CStr literal"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could think about making this diagnostic translatable. https://rustc-dev-guide.rust-lang.org/diagnostics/translation.html#writing-translatable-diagnostics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also think about moving this check above the creation of err
or alternatively .cancel()
the err
and provide a specialized error message instead of expected one of [symbol soup]
like proposed in #118654: c-string literals are not supported in Rust 2018 and older
.
I've gone ahead and removed the ambiguity with whitespace, leaving that to just be a standard "expected one of" errors for consistency with This could still misfire on the characters |
r? fmease |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's great! Sorry for taking so long. Your code looks good, I just have some nitpicks about the comments and the diagnostics.
Lastly, could you squash your commits in one? After that, it should be ready to go!
@@ -640,6 +640,34 @@ impl<'a> Parser<'a> { | |||
} | |||
} | |||
|
|||
// Extra info for `c"str"` before 2021 edition or `c "str"` in all editions. The heuristic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Extra info for `c"str"` before 2021 edition or `c "str"` in all editions. The heuristic | |
// Try to detect an attempt by the user to write a c-string literal before the 2021 edition. The heuristic |
or something like that (it gives a better summary imo). Note that your comment still mentions c "str"
(with a space) which is no longer accurate.
// edition where c-string literals are not allowed. There is the slight possibility of a | ||
// false positive for a `cr#` that wasn't intended to start a c-string literal, but the | ||
// lexer was greedy and didn't preserve whether the `r#` on its own would have started a | ||
// valid raw string literal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is fine but you could also think about expanding upon it with something along the lines of “and we don't want to perform unbounded lookahead here to check if we have a sequence of hashes followed by a string literal”.
&& self.prev_token.span.hi() == self.token.span.lo() | ||
&& !self.token.span.at_least_rust_2021() | ||
{ | ||
err.cancel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since this subdiagnostic can have false positives, it might be wiser to keep the “token soup” error, not sure. When I suggested canceling the error, I assumed the diagnostic would be 100% accurate and we could replace the main message of the diagnostic similar to error[E0670]: `async fn` is not permitted in Rust 2015
.
Your call though. If you'd like to keep the shorter message, I'd go with unexpected token
(dropping the found
) and with unexpected token
over found here
since the current phrasing isn't used in any other of rustc's error messages or only very rarely regarding the latter.
let mut err = | ||
self.struct_span_err(self.token.span, format!("found unexpected token {descr}")); | ||
err.span_label(self.token.span, "found here".to_owned()); | ||
err.note("you may be trying to declare a c-string literal"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err.note("you may be trying to declare a c-string literal"); | |
err.note("you might have meant to write a c-string literal"); |
Not a native speaker but the word declare doesn't feel right to me in this context.
self.struct_span_err(self.token.span, format!("found unexpected token {descr}")); | ||
err.span_label(self.token.span, "found here".to_owned()); | ||
err.note("you may be trying to declare a c-string literal"); | ||
err.note("c-string literals require edition 2021 or later"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
err.note("c-string literals require edition 2021 or later"); | |
err.note("c-string literals require the 2021 edition or later"); |
err.note("c-string literals require edition 2021 or later"); | |
err.note("c-string literals require Rust 2021 or later"); |
(minor) Preexisting diagnostics always seem to be using one of the suggested expressions.
c6729bd
to
2c96025
Compare
Okay, reverted to token soup because I agree it's more appropriate for this case; we really just want a hint here, not a whole new error (also saves me the trouble of being too creative with error messages). Also changed |
Thanks! :) @bors r+ rollup |
…mease Add check for possible CStr literals in pre-2021 Fixes [rust-lang#118654](rust-lang#118654) Adds information to errors caused by possible CStr literals in pre-2021. The lexer separates `c"str"` into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have written `c "str"` in a later edition, so to not confuse people who _are_ using a recent edition, I also added a note about whitespace. We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much. This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!
…mpiler-errors Rollup of 7 pull requests Successful merges: - rust-lang#118691 (Add check for possible CStr literals in pre-2021) - rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work) - rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage) - rust-lang#119089 (effects: fix a comment) - rust-lang#119096 (Yeet unnecessary param envs) - rust-lang#119118 (Fix arm64e-apple-ios target) - rust-lang#119134 (resolve: Feed visibilities for unresolved trait impl items) r? `@ghost` `@rustbot` modify labels: rollup
…mpiler-errors Rollup of 7 pull requests Successful merges: - rust-lang#118691 (Add check for possible CStr literals in pre-2021) - rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work) - rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage) - rust-lang#119089 (effects: fix a comment) - rust-lang#119096 (Yeet unnecessary param envs) - rust-lang#119118 (Fix arm64e-apple-ios target) - rust-lang#119134 (resolve: Feed visibilities for unresolved trait impl items) r? `@ghost` `@rustbot` modify labels: rollup
…iaskrgr Rollup of 7 pull requests Successful merges: - rust-lang#118691 (Add check for possible CStr literals in pre-2021) - rust-lang#118973 (rustc_codegen_ssa: Don't drop `IncorrectCguReuseType` , make `rustc_expected_cgu_reuse` attr work) - rust-lang#119071 (-Znext-solver: adapt overflow rules to avoid breakage) - rust-lang#119089 (effects: fix a comment) - rust-lang#119094 (Add function ABI and type layout to StableMIR) - rust-lang#119102 (Add arm-none-eabi and armv7r-none-eabi platform-support documentation.) - rust-lang#119107 (subtype_predicate: remove unnecessary probe) Failed merges: - rust-lang#119135 (Fix crash due to `CrateItem::kind()` not handling constructors) - rust-lang#119141 (Add method to get instance instantiation arguments) r? `@ghost` `@rustbot` modify labels: rollup
Rollup merge of rust-lang#118691 - chfogelman:improve-cstr-error, r=fmease Add check for possible CStr literals in pre-2021 Fixes [rust-lang#118654](rust-lang#118654) Adds information to errors caused by possible CStr literals in pre-2021. The lexer separates `c"str"` into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have written `c "str"` in a later edition, so to not confuse people who _are_ using a recent edition, I also added a note about whitespace. We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much. This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!
Fixes #118654
Adds information to errors caused by possible CStr literals in pre-2021.
The lexer separates
c"str"
into two tokens if the edition is less than 2021, which later causes an error when parsing. This error now has a more helpful message that directs them to information about editions. However, the user might also have writtenc "str"
in a later edition, so to not confuse people who are using a recent edition, I also added a note about whitespace.We could probably figure out exactly which scenario has been encountered by examining spans and editions, but I figured it would be better not to overcomplicate the creation of the error too much.
This is my first code PR and I tried to follow existing conventions as much as possible, but I probably missed something, so let me know!