[JSONSelection] Track location information for parsed structures using `Ranged<T>` trait and `WithRange<T>` wrapper #5987

benjamn · 2024-09-10T20:09:00Z

In order to provide more helpful parsing and error messages for JSONSelection, and to enable IDE/LSP features like syntax highlighting and tooltip hovering in the future, the JSONSelection parser needs to track where each parsed substructure came from within the original source string.

The nom_locate crate was helpful for replacing raw &str inputs with the type Span<'a> = LocatedSpan<&'a str> type, which nom (mostly) seamlessly accepts everywhere &str was previously expected. However, while the LocatedSpan type makes it easy to determine the starting offset of a given AST structure, the ending offset typically needs to be derived from parsing the structure's children. To bootstrap this parsing from the bottom up, I found it helpful to add a parsed_span combinator to match specific concrete tokens and capture their range information.

This would all have been considerably easier if every AST structure was a struct, because then I could simply have added a range: Option<(usize, usize)> (aka range: OffsetRange) field to each struct, and then make sure that field gets populated during parsing. Where possible, that's what I did, but there are a number of AST structures that make more sense as enum types, and we also need to keep track of location information for primitive/string types that cannot store their own ranges.

To work with this variety of AST structures, I introduced a Ranged<T> trait, which is implemented by all the structures that can provide (or cheaply derive) their own ranges. For those that cannot, there's a WithRange<T> wrapper struct that implements Ranged<T>. I considered using WithRange<T> (note: fka Parsed<T>) for everything, but that proved cumbersome and redundant, so the later commits in this PR walk back the use of WithRange<T> to just the types that need it, leaving the rest to implement Ranged<T> directly, without a wrapper.

To make this range information immediately useful, I am now including it as one of the fields of the ApplyToError struct, making runtime apply_to_path errors easier to debug. I would also like to include range information (along with more descriptive error messages) in parse errors generated by nom, but I haven't found a good way to replace the opaque nom::error::Error objects (which only store the input string and an error code) with errors that allow custom messages and range information, so that will have to wait for a subsequent PR.

You can review this PR commit-by-commit if you like, but I changed my mind a few times (e.g. renaming Parsed<T> to WithRange<T> after adding Range<T>), so it might be easier to review the whole set of changes together.

router-perf · 2024-09-10T20:09:31Z

#5987 (comment)

lennyburdette

i didn't go over this with a fine-toothed comb, but the overall structure looks sound and the tests look good. i'm inclined to merge this early, probably right after our next release, so we can iterate on diagnostics for the language server. thank you for this!

apollo-federation/src/sources/connect/json_selection/known_var.rs

#5987 (comment)

Now that not every T::parse method returns an IResult<Span, Parsed<T>>, it makes sense to name the wrapper struct WithRange<T> to reflect why only some AST structures need to be wrapped (that is, they would have difficulty storing or otherwise deriving their own ranges).

#5987 (comment)

#5987 (comment) #5987 (comment)

Note that this commit also changes the hash to include error.range (along with error.message and error.path), which practically guarantees no two ApplyToError objects will ever be deduplicated, so we might not even need to implement Hash if we move away from using IndexSet to accumulate the errors. #5987 (comment)

…ath. This method isn't actually used anywhere yet, but changing the return type seems like a good idea: #5987 (comment)

#5987 (comment)

Previously, the parser would always consume both leading and trailing spaces and comments when parsing (most) AST structures. This was a convenient convention because it meant the caller of a given ::parse method typically did not have to worry about consuming trailing spaces immediately after calling the child ::parse method, since they would already have been consumed. Now that we are tracking location information with the help of nom_locate, it's relatively easy to determine the start offset of a given AST structure (typically input.location_offset()), but it's trickier to determine the end offset, because the input text represents the entire remaining suffix of the original input, with no helpful indication where the end of the current AST node might be. If we shift our conventions to stop consuming trailing spaces and comments, then the caller of a given ::parse method can rely on the end offset returned by the child ::parse method to be the end of the child AST node, and can typically use that information to determine the end offset of the parent AST node, with help from merge_ranges. This change also allows us to store useful range information for the PathList::Empty variant, which previously always had `range: None`. Even though these ranges are empty (start == end), we can simplify the full_range calculation in PathList::parse_with_depth if we don't have to check whether matches!(rest.node(), PathList::Empty) and handle that case specially. Instead, we can typically compute full_range from merge_ranges(first_child.range(), rest.range()) without having to fall back to other ranges when rest is PathList::Empty (which previously meant rest.range() would be None). This empty range now captures the location where parsing of the previous AST nodes actually stopped, so rest.range() is always usable (not None) and accurate (no trailing spaces_or_comments included).

#5987 (comment) #5987 (comment)

#5987 (comment)

nicholascioli · 2024-09-12T20:36:25Z

Thanks for working through all of the comments! The only thing left is if you want to add a quick comment about the trailing space. Otherwise, feel free to squash and merge!

The inclusion of "array" and "string" in the error message felt confusing to me, since it's not obvious that the array/string in question is the input value that came before the `->`. #5987 (comment)

benjamn requested review from lennyburdette, dylan-apollo and pubmodmatt September 10, 2024 20:09

benjamn self-assigned this Sep 10, 2024

benjamn requested review from nicholascioli, a team, dariuszkuc, sachindshinde, goto-bus-stop, SimonSapin, lrlna, TylerBloom and duckki as code owners September 10, 2024 20:09

benjamn marked this pull request as draft September 10, 2024 21:56

This comment was marked as resolved.

Sign in to view

benjamn added a commit that referenced this pull request Sep 10, 2024

Update external (apollo-router) snapshot tests, again.

ce36e85

#5987 (comment)

benjamn force-pushed the benjamn/JSONSelection-parsing-and-error-ranges branch from 18dffb2 to ce36e85 Compare September 10, 2024 21:59

benjamn marked this pull request as ready for review September 10, 2024 22:47

lennyburdette approved these changes Sep 11, 2024

View reviewed changes

apollo-federation/src/sources/connect/json_selection/known_var.rs Outdated Show resolved Hide resolved

benjamn added a commit that referenced this pull request Sep 11, 2024

Rename outdated into_parsed to into_with_range.

13d41b3

#5987 (comment)

benjamn added a commit that referenced this pull request Sep 11, 2024

Rename outdated into_parsed to into_with_range.

8a35874

#5987 (comment)

benjamn force-pushed the benjamn/JSONSelection-parsing-and-error-ranges branch from 13d41b3 to 8a35874 Compare September 11, 2024 15:46

This comment was marked as resolved.

Sign in to view

benjamn added a commit that referenced this pull request Sep 11, 2024

Update external (apollo-router) snapshot tests, again.

46946e8

#5987 (comment)

benjamn added a commit that referenced this pull request Sep 11, 2024

Rename outdated into_parsed to into_with_range.

845e2c7

#5987 (comment)

benjamn force-pushed the benjamn/JSONSelection-parsing-and-error-ranges branch from 351f0a6 to 845e2c7 Compare September 11, 2024 16:18

benjamn added a commit that referenced this pull request Sep 11, 2024

Update external (apollo-router) snapshot tests, again.

4a966f9

#5987 (comment)

benjamn added a commit that referenced this pull request Sep 11, 2024

Rename outdated into_parsed to into_with_range.

b57decb

#5987 (comment)

benjamn force-pushed the benjamn/JSONSelection-parsing-and-error-ranges branch from 845e2c7 to b57decb Compare September 11, 2024 18:29

benjamn added 19 commits September 12, 2024 16:09

Store range in StarSelection struct to avoid Parsed<StarSelection>.

c74f655

Remove StripRanges and PrettyPrintable for unused Parsed<JSONSelection>.

edc560c

Store range in MethodArgs struct to avoid WithRange<MethodArgs>.

634e48e

Use named OffsetRange type to represent Option<(usize, usize)>.

f55e8ff

Fix ending offsets for PathList ranges.

c9ff757

Update external (apollo-router) snapshot tests, again.

0b5edf6

#5987 (comment)

Delegate more to PathList in PathSelection::external_var_paths.

608eb13

Rename outdated into_parsed to into_with_range.

8da3208

#5987 (comment)

Use std::ops::Range for OffsetRange rather than (usize, usize).

2af7164

#5987 (comment) #5987 (comment)

Avoid parsed_ variable name prefix in favor of ranged_.

30c6f51

Return &[JSON] slice instead of cloned Vec<JSON> from ApplyToError::p…

34fa7c9

…ath. This method isn't actually used anywhere yet, but changing the return type seems like a good idea: #5987 (comment)

Use serde to simplify ApplyToError::from_json.

5f72df4

#5987 (comment)

Simplify PathSelection::apply_to_path for PathList::Key case.

8dae616

#5987 (comment)

Switch some larger assert_eq! tests to assert_debug_snapshot!.

62629a8

#5987 (comment) #5987 (comment)

Remove Ranged<T>::node() and use WithRange<T>::as_ref() instead.

3c7c3ab

#5987 (comment)

Implement PrettyPrintable for inner types, not WithRange<_>.

725afd8

#5987 (comment)

benjamn force-pushed the benjamn/JSONSelection-parsing-and-error-ranges branch from 1454e59 to 725afd8 Compare September 12, 2024 20:13

nicholascioli approved these changes Sep 12, 2024

View reviewed changes

Improve ->get(index) out of bounds errors and their tests.

b08dbb9

The inclusion of "array" and "string" in the error message felt confusing to me, since it's not obvious that the array/string in question is the input value that came before the `->`. #5987 (comment)

benjamn merged commit 9a34ca3 into next Sep 12, 2024
10 of 12 checks passed

benjamn deleted the benjamn/JSONSelection-parsing-and-error-ranges branch September 12, 2024 21:45

benjamn restored the benjamn/JSONSelection-parsing-and-error-ranges branch September 12, 2024 21:45

This was referenced Sep 13, 2024

[JSONSelection] Remove &mut errors: IndexSet<ApplyToError> from apply_to_path method signatures #6003

Merged

[JSONSelection] Support $(...) syntax to simplify expression injection #5994

Merged

fix: group mapping errors #6020

Merged

benjamn mentioned this pull request Sep 30, 2024

[JSONSelection] Report better parsing error messages using extra: X field of LocatedSpan<T, X = ()> #6089

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[JSONSelection] Track location information for parsed structures using `Ranged<T>` trait and `WithRange<T>` wrapper #5987

[JSONSelection] Track location information for parsed structures using `Ranged<T>` trait and `WithRange<T>` wrapper #5987

benjamn commented Sep 10, 2024

router-perf bot commented Sep 10, 2024 •

edited by lennyburdette

Loading

This comment was marked as resolved.

lennyburdette left a comment

This comment was marked as resolved.

nicholascioli commented Sep 12, 2024

[JSONSelection] Track location information for parsed structures using Ranged<T> trait and WithRange<T> wrapper #5987

[JSONSelection] Track location information for parsed structures using Ranged<T> trait and WithRange<T> wrapper #5987

Conversation

benjamn commented Sep 10, 2024

router-perf bot commented Sep 10, 2024 • edited by lennyburdette Loading

This comment was marked as resolved.

lennyburdette left a comment

Choose a reason for hiding this comment

This comment was marked as resolved.

nicholascioli commented Sep 12, 2024

[JSONSelection] Track location information for parsed structures using `Ranged<T>` trait and `WithRange<T>` wrapper #5987

[JSONSelection] Track location information for parsed structures using `Ranged<T>` trait and `WithRange<T>` wrapper #5987

router-perf bot commented Sep 10, 2024 •

edited by lennyburdette

Loading