Clarify string escaping #160

Closed
ndmitchell opened this issue Feb 5, 2021 · 2 comments

Comments

@ndmitchell
Contributor

The spec defines string escapes like \a as \x07, but doesn't say whether \x07 is itself a legitimate escape. The octal escapes list \119 as \t9, but the spec doesn't say what rules are used to determine that it's a single-character escape. There is no notation for specifying Unicode values. Should we just copy the spec from https://python-reference.readthedocs.io/en/latest/docs/str/escapes.html?
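
For reference, the rule Python uses is that an octal escape consumes at most three octal digits, which is why \119 reads as \11 (tab) followed by a literal 9. Below is a minimal sketch of that rule in Python. `decode_octal_escapes` is a hypothetical helper written for illustration, not part of any Starlark implementation, and it deliberately ignores every other escape (including escaped backslashes).

```python
OCTAL_DIGITS = "01234567"


def decode_octal_escapes(source: str) -> str:
    """Decode only octal escapes: a backslash plus one to three octal digits.

    Hypothetical helper for illustration. All other escapes, including
    an escaped backslash, are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(source):
        starts_octal = (
            source[i] == "\\"
            and i + 1 < len(source)
            and source[i + 1] in OCTAL_DIGITS
        )
        if starts_octal:
            # Greedily take up to three octal digits after the backslash.
            j = i + 1
            while j < len(source) and j < i + 4 and source[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(source[i + 1:j], 8)))
            i = j
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

With this rule, `decode_octal_escapes(r"\119")` yields a tab followed by `9`, because `9` is not an octal digit and so ends the escape after `\11`.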

@alandonovan
Contributor

alandonovan commented Feb 5, 2021

See #112, which I think covers this issue. (I've been hoping to fix #112 today, though time keeps slipping away.) In short: text strings will permit \xXX escapes for values in the range 0-127, and byte strings will permit them for values in the range 0-255.

Unicode code points will be denoted as \uXXXX or \UXXXXXXXX, which may appear in text or byte strings. In a text string, the escape denotes the UTF-k encoding of the 16- or 32-bit Unicode code point. In a byte string, the escape denotes its UTF-8 encoding. Text and byte string literals may also contain unescaped non-ASCII code points, such as "Ω" or b"Ω" in the source file, which is assumed to be encoded as UTF-8. (Bazel has a bug in which its source files are currently assumed to be Latin1, so we may need to temporarily disallow non-ASCII in literals in Bazel, to avoid confusion.)
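
The rules described in this comment can be sketched roughly as follows. This is a hedged illustration in Python, not code from starlark-go or Bazel: `decode_x_escape` and `decode_u_escape_in_bytes` are hypothetical names, and the error handling (including rejecting surrogates and out-of-range code points) is an assumption rather than something stated in the thread.

```python
import string


def _require_hex(hex_digits: str, allowed_lengths: tuple) -> None:
    """Reject anything that is not a run of hex digits of an allowed length."""
    if len(hex_digits) not in allowed_lengths or any(
        c not in string.hexdigits for c in hex_digits
    ):
        raise ValueError(f"malformed hex escape digits: {hex_digits!r}")


def decode_x_escape(hex_digits: str, is_bytes: bool) -> int:
    """Return the value of a \\xXX escape, enforcing the proposed ranges.

    Text strings allow 0-127; byte strings allow 0-255.
    """
    _require_hex(hex_digits, (2,))
    value = int(hex_digits, 16)
    limit = 255 if is_bytes else 127
    if value > limit:
        kind = "byte" if is_bytes else "text"
        raise ValueError(f"escape value {value} is out of range for a {kind} string")
    return value


def decode_u_escape_in_bytes(hex_digits: str) -> bytes:
    """Return the UTF-8 encoding of a \\uXXXX or \\UXXXXXXXX escape.

    Models the byte-string case only. Assumption: surrogates and values
    above U+10FFFF are errors (Python raises for both here).
    """
    _require_hex(hex_digits, (4, 8))
    return chr(int(hex_digits, 16)).encode("utf-8")
```

For example, `decode_x_escape("80", False)` raises `ValueError` because 128 is outside the text-string range, while `decode_u_escape_in_bytes("03A9")` gives the two UTF-8 bytes for "Ω". The text-string case for \u is left out because its result depends on the implementation's UTF-k encoding.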

facebook-github-bot pushed a commit to facebook/starlark-rust that referenced this issue Feb 6, 2021
Summary: Our escape characters are not implemented as per the Starlark spec, and the Starlark spec isn't very complete. I've raised bazelbuild/starlark#160 to get more details in the spec, but the temptation is to just follow the Python character escaping spec.

Reviewed By: bobyangyf

Differential Revision: D26276328

fbshipit-source-id: c50a2a677707257a9b481a45a65343cd109cf715
@brandjon
Member

brandjon commented Jan 8, 2025

The current spec lists all valid escapes and says escapes not listed are an error. #112 should cover the need for unicode escapes. Closing.

@brandjon brandjon closed this as completed Jan 8, 2025