-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start tracking quoting style in the AST #10298
Conversation
Performance is neutral on all benchmarks according to Codspeed. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. It would be nice to use the new string flags in the formatter to avoid parsing them from source. But that's nothing for this PR.
ruff/crates/ruff_python_formatter/src/string/mod.rs
Lines 121 to 144 in 72bf1c2
pub(crate) fn parse(input: &str) -> StringPrefix { | |
let chars = input.chars(); | |
let mut prefix = StringPrefix::empty(); | |
for c in chars { | |
let flag = match c { | |
'u' | 'U' => StringPrefix::UNICODE, | |
'f' | 'F' => StringPrefix::F_STRING, | |
'b' | 'B' => StringPrefix::BYTE, | |
'r' => StringPrefix::RAW, | |
'R' => StringPrefix::RAW_UPPER, | |
'\'' | '"' => break, | |
c => { | |
unreachable!( | |
"Unexpected character '{c}' terminating the prefix of a string literal" | |
); | |
} | |
}; | |
prefix |= flag; | |
} | |
prefix | |
} |
I posted a solution that avoids the need to change FStringStart
. I would prefer if we keep it as is to reduce the diff size.
crates/ruff_python_ast/src/nodes.rs
Outdated
} | ||
|
||
/// Does the string have a `u` or `U` prefix? | ||
pub const fn is_u_string(&self) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit,
pub const fn is_u_string(&self) -> bool { | |
pub const fn is_unicode(&self) -> bool { |
Potentially without the string
postfix because that's given by the enclosing type. E.g it's also is_triple_quoted
and not is_triple_quoted_string
. E.g let's compare some call-sites: string_literal.is_unicode_string()
seems repetivie in comparison to string_literal.is_unicode()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same response as #10298 (comment)
impl From<ruff_python_ast::str::QuoteStyle> for Quote { | ||
fn from(value: ruff_python_ast::str::QuoteStyle) -> Self { | ||
match value { | ||
ruff_python_parser::QuoteStyle::Double => Self::Double, | ||
ruff_python_parser::QuoteStyle::Single => Self::Single, | ||
ruff_python_ast::str::QuoteStyle::Double => Self::Double, | ||
ruff_python_ast::str::QuoteStyle::Single => Self::Single, | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we replace the Quote
type with QuoteStyle
? I would prefer to minimize the number of Quote
like types we have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do, but it would require making the ruff_python_literal
crate have a dependency on the ruff_python_ast
crate (or vice versa), due to this:
ruff/crates/ruff_python_codegen/src/stylist.rs
Lines 123 to 130 in 965adbe
impl From<Quote> for StrQuote { | |
fn from(val: Quote) -> Self { | |
match val { | |
Quote::Single => StrQuote::Single, | |
Quote::Double => StrQuote::Double, | |
} | |
} | |
} |
If we used ruff_python_ast::str::QuoteStyle
there instead of ruff_python_codegen::stylist::Quote
, we wouldn't be able to have the implementation of that trait in the ruff_python_codegen
crate, since the QuoteStyle
enum comes from the ruff_python_literal
crate. ruff_python_literal
does not currently depend on ruff_python_ast
(nor vice versa).
.with_value(capital_env_var.into_boxed_str()) | ||
.with_prefix({ | ||
if env_var.is_unicode() { | ||
StringLiteralPrefix::UString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think I would prefer Unicode
, it avoids the string
repetition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike Unicode
, because all strings in Python are unicode... the u
prefix does nothing at runtime, it's purely there for Python 2 compatibility. (In fact, it was originally removed in Python 3.0, but they added it back in a later Python 3 version because it was so difficult to write code that worked on both Python 2 and Python 3 if the prefix was gone in Python 3.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. But it is the unicode flag ;) Can you do a quick check on what we call it in other places in the code base? It would be nice if it is consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a grep -- there's a pyupgrade rule called UnicodeKindPrefix
, but other than that, I can't see much use of the term Unicode
across the codebase to describe this prefix...
I'd still rather keep this as-is -- I personally would find it really confusing to refer to these strings as "unicode strings", given that all strings are unicode strings, even without the prefix
c8bdaf0
to
7ab7c5a
Compare
7ab7c5a
to
703dcf3
Compare
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information: - The quoting style used (double or single quotes) - Whether the string is triple-quoted or not - Whether the string is raw or not This PR is a followup to astral-sh#10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the `ExprGenerator` in a followup PR, which will allow us to provide a fix for astral-sh#7799. The information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on `Tok::String`, `Tok::FStringStart` and `Tok::FStringMiddle` tokens in astral-sh#10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node. Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a `u` prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.
Summary
This PR modifies our AST so that nodes for string literals, bytes literals and f-strings all retain the following information:
This PR is a followup to #10256. Like with that PR, this PR does not, in itself, fix any bugs. However, it means that we will have the necessary information to preserve quoting style and rawness of strings in the
ExprGenerator
in a followup PR, which will allow us to provide a fix for #7799.The PR should be easiest to review commit-by-commit:
ast::StringLiteral
nodesast::BytesLiteral
nodesast::FString
nodesThe information is recorded on the AST nodes using a bitflag field on each node, similarly to how we recorded the information on
Tok::String
,Tok::FStringStart
andTok::FStringMiddle
tokens in #10298. Rather than reusing the bitflag I used for the tokens, however, I decided to create a custom bitflag for each AST node.Using different bitflags for each node allows us to make invalid states unrepresentable: it is valid to set a
u
prefix on a string literal, but not on a bytes literal or an f-string. It also allows us to have better debug representations for each AST node modified in this PR.Test Plan
cargo test