Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Syntax highlighting breaks in rust file #206

Closed
ahkrr opened this issue Jun 10, 2021 · 7 comments
Closed

Bug: Syntax highlighting breaks in rust file #206

ahkrr opened this issue Jun 10, 2021 · 7 comments
Labels
A-tree-sitter Area: Tree-sitter E-help-wanted Call for participation: Extra attention is needed

Comments

@ahkrr
Copy link

ahkrr commented Jun 10, 2021

Helix was build with the current git/master branch.

hx_syntax.mp4

I have noticed that in some of my code, methods get highlighted with the color for variables.
In this case, TokenStream::form_iter and .set_span should also be green.

hx_syn

I tried to look into the issue and ended up in helix-core/src/syntax.rs +1309 HighlightIter::next.
At this point I have no desire to try to understand whats going on in there and where the problem could be originating form.
I ende my search at layer.captures.

I tried the default theme.toml and it also breaks, but it is less legible (in my opinion).

@archseer
Copy link
Member

For the video, I've noticed that and I think it's an issue with multiple cursors, the edits sent to tree sitter aren't generated with the right offsets for subsequent cursors:

pub(crate) fn generate_edits(
old_text: RopeSlice,
changeset: &ChangeSet,
) -> Vec<tree_sitter::InputEdit> {
use Operation::*;
let mut old_pos = 0;
let mut new_pos = 0;
let mut edits = Vec::new();
let mut iter = changeset.changes.iter().peekable();
// TODO; this is a lot easier with Change instead of Operation.
fn point_at_pos(text: RopeSlice, pos: usize) -> (usize, Point) {
let byte = text.char_to_byte(pos); // <- attempted to index past end
let line = text.char_to_line(pos);
let line_start_byte = text.line_to_byte(line);
let col = byte - line_start_byte;
(byte, Point::new(line, col))
}
fn traverse(point: Point, text: &Tendril) -> Point {
let Point {
mut row,
mut column,
} = point;
// TODO: there should be a better way here
for ch in text.bytes() {
if ch == b'\n' {
row += 1;
column = 0;
} else {
column += 1;
}
}
Point { row, column }
}
while let Some(change) = iter.next() {
let len = match change {
Delete(i) | Retain(i) => *i,
Insert(_) => 0,
};
let mut old_end = old_pos + len;
match change {
Retain(_) => {
new_pos += len;
}
Delete(_) => {
let (start_byte, start_position) = point_at_pos(old_text, old_pos);
let (old_end_byte, old_end_position) = point_at_pos(old_text, old_end);
// TODO: Position also needs to be byte based...
// let byte = char_to_byte(old_pos)
// let line = char_to_line(old_pos)
// let line_start_byte = line_to_byte()
// Position::new(line, line_start_byte - byte)
// deletion
edits.push(tree_sitter::InputEdit {
start_byte, // old_pos to byte
old_end_byte, // old_end to byte
new_end_byte: start_byte, // old_pos to byte
start_position, // old pos to coords
old_end_position, // old_end to coords
new_end_position: start_position, // old pos to coords
});
}
Insert(s) => {
let (start_byte, start_position) = point_at_pos(old_text, old_pos);
let ins = s.chars().count();
// a subsequent delete means a replace, consume it
if let Some(Delete(len)) = iter.peek() {
old_end = old_pos + len;
let (old_end_byte, old_end_position) = point_at_pos(old_text, old_end);
iter.next();
// replacement
edits.push(tree_sitter::InputEdit {
start_byte, // old_pos to byte
old_end_byte, // old_end to byte
new_end_byte: start_byte + s.len(), // old_pos to byte + s.len()
start_position, // old pos to coords
old_end_position, // old_end to coords
new_end_position: traverse(start_position, s), // old pos + chars, newlines matter too (iter over)
});
} else {
// insert
edits.push(tree_sitter::InputEdit {
start_byte, // old_pos to byte
old_end_byte: start_byte, // same
new_end_byte: start_byte + s.len(), // old_pos + s.len()
start_position, // old pos to coords
old_end_position: start_position, // same
new_end_position: traverse(start_position, s), // old pos + chars, newlines matter too (iter over)
});
}
new_pos += ins;
}
}
old_pos = old_end;
}
edits
}

(note, that TODO might be easier to address now that transaction.changes_iter() exists.)


For the second example can you give me a copy-pastable example so I can inspect it on https://tree-sitter.github.io/tree-sitter/playground ? I suspect we might be taking the wrong scope out here, it could be that spans.last() fixes it:

// TODO: scope matching: biggest union match? [string] & [html, string], [string, html] & [ string, html]
// can do this by sorting our theme matches based on array len (longest first) then stopping at the
// first rule that matches (rule.all(|scope| scopes.contains(scope)))
// log::info!(
// "scopes: {:?}",
// spans
// .iter()
// .map(|span| theme.scopes()[span.0].as_str())
// .collect::<Vec<_>>()
// );
let style = match spans.first() {
Some(span) => theme.get(theme.scopes()[span.0].as_str()),
None => theme.get("ui.text"),
};

The best behavior might be to iter over all the scopes and merge styles.

@archseer archseer added A-tree-sitter Area: Tree-sitter E-help-wanted Call for participation: Extra attention is needed labels Jun 10, 2021
@ahkrr
Copy link
Author

ahkrr commented Jun 10, 2021

The image is from some toy-code of mine that is from here byte-match.
It is only a single file 117 lines. The example is at the end of src/lib.rs

@archseer
Copy link
Member

Ah okay, I see it now, vec! is a macro invocation and everything inside it just gets parsed as identifiers since it's possible it's not valid rust code. I think there was an issue open on tree-sitter-rust about it, I'd prefer if a parse was at least attempted.

@ahkrr
Copy link
Author

ahkrr commented Jun 10, 2021

While I am at it, I have noticed another issue. Same file as the image before.
The _ in _ => in a match doesn't get recognized in any category.
And there are other elements in the macros that don't have a category.
This image is with all categories set to #000000, except for ui stuff, and the fallback that is ui.text at #ffffff
hx_sy

@ahkrr
Copy link
Author

ahkrr commented Jun 10, 2021

"{" @punctuation.bracket
"}" @punctuation.bracket
"," @punctuation.delimiter
":" @punctuation.delimiter

(identifier) @variable

I added these in my own rust/highlights.scm
If someone does this black and white trick on their own stuff, they will see more uncategorized things.
Should these rules or a some variation of them be introduced into the official rust/highlights.scm?

@archseer
Copy link
Member

We likely want to port some of these punctuation/operator rules: https://github.com/nvim-treesitter/nvim-treesitter/blob/3b516c8e2b29dc2f8bfa5425a1b9369496918884/queries/rust/highlights.scm#L213

I probably never noticed because I don't highlight those. We do definitely accept these types of query improvements to runtime/ because the upstream tree-sitter files were very barebones.

@archseer
Copy link
Member

Resolved by #430

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tree-sitter Area: Tree-sitter E-help-wanted Call for participation: Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants