-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support UTF-32 position encoding #14141
Conversation
Looks like this is a native encoding for Emacs at least!
Together with emacs-lsp/lsp-mode#3958, this fixes the infamous Emacs vs 💩 problem. |
@@ -192,9 +193,9 @@ pub(crate) fn apply_document_changes( | |||
let mut line_index = LineIndex { | |||
// the index will be overwritten in the bottom loop's first iteration | |||
index: Arc::new(ide::LineIndex::new(&text)), | |||
// We don't care about line endings or offset encoding here. | |||
// We don't care about line endings here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC @lnicola , the comment about irrelevance of encoding here turned out to be wrong at least for tests. So I just blindly threaded the correct encoding here. I wonder if that might have fixed some latent bugs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's from c8b9ec8. But I don't remember the code well enough to say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I took a second look here, and it does look like an outright bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh dear, yeah, this one was busted pretty substantially. If we didn't get reports about that, that means that no-one actually uses utf-8 😢
It only worked with VS Code because Code uses utf16.
Co-authored-by: Stig Brautaset <stig@brautaset.org>
I considered also
So it seems safer to not do this. |
In emacs-lsp/lsp-mode#3344 (comment) @yyoncho seems to be open to applying one patch (I'm not sure which), so maybe this can be fixed "properly" soon. Unfortunately, it seems that nobody bothered to even test the two previous patches (yours likely didn't work). I still think the proper fix is to send the correct UTF-16 positions, because in the meanwhile everybody will move on to |
assert_eq!(col_index.utf16_to_utf8_col(1, 18), TextSize::from(20)); // space | ||
assert_eq!(col_index.utf16_to_utf8_col(1, 19), TextSize::from(21)); // second メ at 21..24 | ||
let text: String = { | ||
let mut chars: Vec<char> = ((0 as char)..char::MAX).collect(); // Neat! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find this impl, does it skip the surrogates? I guess it's not a problem today, but might be worth keeping it in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The impl is here:
https://doc.rust-lang.org/stable/src/core/iter/range.rs.html#409
My understanding is that this includes chars which are encoded by surrogate pairs, but not individual surrogates (as those would produce invalid utf8)
LGTM, I'd be in favor of merging this rather sooner than later, so it has a bit of time to bake until next Monday. |
For me, the fact that Emacs native coordinates are in unicode code points feels like a very strong argument to support this on the server. Eglot today handles this correctly by re-encoding whole lines: which feels pretty wasteful to me. (at the same time, Emacs should have used Rust-style utf8 indexes for internal coordinates, as that's a superior scheme oveal) |
bors r+ |
@bors r+ |
☀️ Test successful - checks-actions |
(I assume that utf-32 is native to emacs based on joaotavora/eglot#916 (comment)) |
//! | ||
//! Currently, we use oorandom instead, but it misses these two utilities. | ||
//! Perhaps we should switch to `fastrand`, or our own small prng, it's not like | ||
//! we need anything move complicatied that xor-shift. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already merged but it looks like there are at least 3 typos in this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two typos, would you mind pointing out the third one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "move complicatied that" should be "more complicated than".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the second one, what's the third? https://github.com/rust-lang/rust-analyzer/pull/14154/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- move -> more
- complicatied -> complicated
- that -> than
🤔 maybe I didn't express myself clearly... your PR seems to fix everything in one go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I missed than
, but matklad kindly pointed it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, if you don't mind there's a staarts
typo in https://blog.dend.ro/introducing-sd-notify/ (very nice crate btw!).
Looks like this is a native encoding for Emacs at least!