Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JSONSelection] Support -> method syntax for inline data transforms #5762

Merged
merged 49 commits into from
Aug 27, 2024

Conversation

benjamn
Copy link
Member

@benjamn benjamn commented Aug 1, 2024

This PR extends JSONSelection syntax to allow any field value (including fields selected by nested.path.selections) to be transformed in-line using a "method" syntax that looks like either field->method or field->method(arg1, arg2, ...), where method is an identifier that specifies which method implementation to invoke against the preceding field's value.

When a -> method takes arguments, those arguments can be any JSON literal expression, with an additional allowance for PathSelection values, so you can refer to $variables like $this or $args or $ (see #5142) within the JSON literal. This additional capability is why I introduced the JSLiteral enum and related parse*/ApplyTo methods, rather than just reusing serde_json_bytes::Value. In some sense, this extension brings the JSLiteral syntax closer to JavaScript literal syntax (which can refer to variables) than to JSON syntax, which is why I called it JSLiteral rather than JSONLiteral.

The implementation of a method has access not only to its arguments, but also to the input value preceding the ->. Sometimes, you may need to refer to that input value within the method arguments, which is possible using the newly introduced @ symbol. This syntax was directly inspired by JSONPath syntax, which uses the @ character for analogous purposes (representing the current value within filter expressions).

This syntax was first inspired by the need to operate on arrays, using methods like array->first, array->last, and array->slice(0, 3), but the set of potential use cases for the syntax has grown rapidly since then, and is very much still growing. You can find examples of all the currently supported methods in the README.md, and their actual implementations can be found in the json_selection/methods.rs file.

For now, the set of available methods is fixed by the Rust implementation, and registered in a lazy_static! map. In the future, we might make the method list configurable, with a flexible type system to validate inputs and outputs, but for now it seems simpler to restrict this functionality to methods we know about and have carefully reviewed.

The most important commit is the final one, but I included a number of supporting refactoring commits beforehand.

benjamn and others added 10 commits August 1, 2024 17:09
This is a follow-up to PR #5156, which reordered some rules in the
JSONSelection grammar but neglected to regenerate the SVG diagrams.
Although it may seem like a minor thing, we need an outer PathSelection
wrapper around an inner PathList so we can apply certain logic only at
the beginning of the path, which was not previously possible using only
the PathSelection enum.

For example, during the recursion of pretty_print_with_indentation, we
don't know if a given PathList::Key is the first key in the path, in
which case it _might_ need a leading '.' (depending on is_single_key),
or some subsequent key, in which case it always needs a leading '.'. If
we make a distinction between PathSelection and PathList, we can easily
apply that special logic only at the beginning of the path in
PathSelection::pretty_print_with_indentation.

As a side benefit, we can keep the PathSelection::path field private,
since PathSelection is now a struct rather than an enum, which should
help maintain the various invariants we want to enforce upon the path
(e.g. variables can only come at the beginning, and selections can only
come at the end), because code outside the json_selection directory can
no longer create any arbitrary chain of PathSelection enum variants.
I originally thought we might allow integer literals in path selections,
such as data.results.0, but that functionality will be ultimately/better
achieved using `->` methods that operate on arrays (`->first`, `->last`,
`->at(n)`, `->slice(start, end)`, etc), leaving the PathList::Key
variant to represent only string-valued keys.
We want the single-PathSelection version of JSONSelection to be capable
of operating directly on arrays (e.g. `$->slice(0, 5) { id }`), so that
decision should not be forced at the JSONSelection level.
@router-perf
Copy link

router-perf bot commented Aug 1, 2024

CI performance tests

  • const - Basic stress test that runs with a constant number of users
  • demand-control-instrumented - A copy of the step test, but with demand control monitoring and metrics enabled
  • demand-control-uninstrumented - A copy of the step test, but with demand control monitoring enabled
  • enhanced-signature - Enhanced signature enabled
  • events - Stress test for events with a lot of users and deduplication ENABLED
  • events_big_cap_high_rate - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity
  • events_big_cap_high_rate_callback - Stress test for events with a lot of users, deduplication enabled and high rate event with a big queue capacity using callback mode
  • events_callback - Stress test for events with a lot of users and deduplication ENABLED in callback mode
  • events_without_dedup - Stress test for events with a lot of users and deduplication DISABLED
  • events_without_dedup_callback - Stress test for events with a lot of users and deduplication DISABLED using callback mode
  • extended-reference-mode - Extended reference mode enabled
  • large-request - Stress test with a 1 MB request payload
  • no-tracing - Basic stress test, no tracing
  • reload - Reload test over a long period of time at a constant rate of users
  • step-jemalloc-tuning - Clone of the basic stress test for jemalloc tuning
  • step-local-metrics - Field stats that are generated from the router rather than FTV1
  • step-with-prometheus - A copy of the step test with the Prometheus metrics exporter enabled
  • step - Basic stress test that steps up the number of users over time
  • xlarge-request - Stress test with 10 MB request payload
  • xxlarge-request - Stress test with 100 MB request payload

@benjamn benjamn changed the title [JSONSelection] Support -> method syntax for inline data transforms and flexible value injection [JSONSelection] Support -> method syntax for inline data transforms Aug 1, 2024
@benjamn benjamn force-pushed the benjamn/JSONSelection-method-syntax branch from fdb1408 to 61566c7 Compare August 2, 2024 22:27
Copy link
Member

@dylan-apollo dylan-apollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some comments on just the docs, in thinking about the methods themselves from a users' perspective. I do think before we stabilize any of these, we need to try them out on some full examples, which means we should add expansion support for them before continuing. In particular, expansion needs to know if a method resolves to an object, and if so, what the shape of that object is.

@benjamn
Copy link
Member Author

benjamn commented Aug 23, 2024

@nicholascioli @dylan-apollo Heads up that I grouped all the experimental methods we are not yet making public into a future:: module namespace (f1daee6), so you can more easily breeze over those method implementations in review.

Copy link
Contributor

@nicholascioli nicholascioli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass!

The only public methods we want to expose related to the
`ApplyTo[Internal]` traits are the top-level `apply_to` and
`apply_with_vars` methods, not internal methods like `apply_to_path` or
`apply_to_array`.

This commit hoists the `apply_to` and `apply_with_vars` methods up to
the `JSONSelection` namespace (only), and requires each other AST type
to implement its own `ApplyToInternal` trait / `apply_to_path` method,
which is only visible within the `json_selection` module.
The "External" naming emphasizes that we are only collecting variables
like $this, $args, and $context, and not $ or @ variables.
@benjamn benjamn requested review from a team as code owners August 26, 2024 23:22
Copy link
Member

@dylan-apollo dylan-apollo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've reached peak iteration on this PR, and I'm happy to merge it if you are @benjamn! Discussed in a few places but, for posterity, we won't be documenting these as stable until we tackle a few follow-up items (all tracked in Jira, mostly validation-related). Merging it lets us sneak it into the release, though, so we can start experimenting with methods before letting users go wild (and start working on those follow-ups ASAP).

Thanks for putting all the effort into this, especially with all our questions and feedback!

@benjamn benjamn merged commit 2e56be0 into next Aug 27, 2024
11 checks passed
@benjamn benjamn deleted the benjamn/JSONSelection-method-syntax branch August 27, 2024 19:19
benjamn added a commit that referenced this pull request Sep 13, 2024
benjamn added a commit that referenced this pull request Sep 13, 2024
benjamn added a commit that referenced this pull request Sep 17, 2024
benjamn added a commit that referenced this pull request Feb 11, 2025
We previously decided to allow leading 0s, per this discussion:
#5762 (comment)

However, since we use the parsing logic of `serde_json::Number` to
produce the internal representation, and that logic fails given leading
zeros, we need to normalize the input `&str` a bit before calling
`number.parse()`. We already perform some similar normalizations, e.g.
ensuring the fractional part is at least a 0 (not empty after the `.`),
so there's precedent.
benjamn added a commit that referenced this pull request Feb 11, 2025
We previously decided to allow leading 0s, per this discussion:
#5762 (comment)

However, since we use the parsing logic of `serde_json::Number` to
produce the internal representation, and that logic fails given leading
zeros, we need to normalize the input `&str` a bit before calling
`number.parse()`. We already perform some similar normalizations, e.g.
ensuring the fractional part is at least a 0 (not empty after the `.`),
so there's precedent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants