Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement RFC 3349, mixed utf8 literals #120286

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nnethercote
Copy link
Contributor

RFC: rust-lang/rfcs#3349
Tracking issue: #116907

r? @ghost

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 23, 2024
@rustbot
Copy link
Collaborator

rustbot commented Jan 23, 2024

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

@nnethercote nnethercote marked this pull request as draft January 23, 2024 23:51
@rust-log-analyzer

This comment has been minimized.

@nnethercote nnethercote force-pushed the 3349-mixed-utf8-literals branch from bc4ce4a to f2dee33 Compare January 24, 2024 06:42
@rust-log-analyzer

This comment has been minimized.

@nnethercote nnethercote force-pushed the 3349-mixed-utf8-literals branch 2 times, most recently from 8b2b3a7 to 585f313 Compare January 25, 2024 01:31
@rustbot rustbot added A-testsuite Area: The testsuite used to check the correctness of rustc T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) labels Jan 25, 2024
@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 25, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 25, 2024
…, r=<try>

Implement RFC 3349, mixed utf8 literals

RFC: rust-lang/rfcs#3349
Tracking issue: rust-lang#116907

r? `@ghost`
@bors
Copy link
Contributor

bors commented Jan 25, 2024

⌛ Trying commit 585f313 with merge b626f8d...

@bors
Copy link
Contributor

bors commented Jan 25, 2024

☀️ Try build successful - checks-actions
Build commit: b626f8d (b626f8d9c6f947aa83d4a7f690e96eff73339152)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b626f8d): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.8% [0.8%, 0.8%] 1
Improvements ✅
(primary)
-0.2% [-0.3%, -0.2%] 6
Improvements ✅
(secondary)
-0.1% [-0.1%, -0.1%] 1
All ❌✅ (primary) -0.2% [-0.3%, -0.2%] 6

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.7% [3.7%, 3.7%] 1
Regressions ❌
(secondary)
9.7% [9.7%, 9.7%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-5.1% [-6.8%, -4.0%] 3
All ❌✅ (primary) 3.7% [3.7%, 3.7%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 662.082s -> 661.981s (-0.02%)
Artifact size: 308.26 MiB -> 308.25 MiB (-0.00%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 25, 2024
@nnethercote
Copy link
Contributor Author

No significant perf changes.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Jan 26, 2024
@nnethercote
Copy link
Contributor Author

This doesn't fully implement the RFC, because I think a couple of things in the RFC are a bad idea. Details in this comment.

Specifically:
- Allow unicode chars in b"" and br"" literals. This is done by changing
  `Mode::allow_unicode_chars` to succeed on `ByteStr` and `RawByteStr`.
- Allow unicode escapes in b"" literals. This is done by changing
  `Mode::allow_unicode_escapes` to succeed on `ByteStr`.

Byte string literals can already have high bytes (`\x80`..`\xff`).
Because they now also support unicode chars, they can now be mixed utf8,
so we use `unescape_mixed`/`cook_mixed` instead of
`unescape_unicode`/`cook_unicode` to process them.

A new type `Rfc3349`, is used to implement the feature gating. Values of
that type are threaded through the unescaping code to track whether
rules from rfc3349 are required for unescaping to succeed.

Test changes:

- tests/ui/mixed-utf8-literals/basic.rs: new `check-pass` UI test with
  various literals exercising the new forms.

- tests/ui/attributes/key-value-non-ascii.rs: changed from a byte string
  literal to a byte literal; we just need some kind of problem with a
  literal to preserve the test's intent.

- tests/ui/parser/raw/raw-byte-string-literals.rs: moved the raw byte
  string literal with a non-ASCII char to `basic.rs`.

- tests/ui/parser/byte-string-literals.rs: similar.

- tests/ui/parser/issues/issue-23620-invalid-escapes.rs: moved one case
  fully to `basic.rs`, and one partially.

- tests/ui/parser/unicode-control-codepoints.rs: left the code
  unchanged, but the errors are now about mixed-utf8-literals being
  feature gated.

- tests/ui/suggestions/multibyte-escapes.rs: moved one case to
  `basic.rs`.

- compiler/rustc_lexer/src/unescape/tests.rs: various adjustments
  - two cases that previously failed now succeed
  - added some more cases for the newly supported syntax

I wasn't sure how to handle rust-analyzer in general, so I just allowed
mixed utf8 literals everywhere without complaint.
@nnethercote nnethercote force-pushed the 3349-mixed-utf8-literals branch from 585f313 to c144da3 Compare January 30, 2024 05:37
@bors
Copy link
Contributor

bors commented Jan 30, 2024

☔ The latest upstream changes (presumably #120491) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC Dylan-DPC added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 6, 2024
@Dylan-DPC
Copy link
Member

@nnethercote any updates on this? thanks

@nnethercote
Copy link
Contributor Author

I nominated it for lang-team discussion six months ago, and haven't heard anything since.

@Dylan-DPC Dylan-DPC added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Oct 6, 2024
@apiraino apiraino added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Oct 10, 2024
@apiraino
Copy link
Contributor

I'll signal that this needs to be discussed by T-lang

@rustbot label +S-blocked

@rustbot rustbot added the S-blocked Status: Blocked on something else such as an RFC or other implementation work. label Nov 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-testsuite Area: The testsuite used to check the correctness of rustc perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-blocked Status: Blocked on something else such as an RFC or other implementation work. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants