Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use JSON11 to parse and serialize long numerals #784

Merged
merged 1 commit into from
Jun 3, 2024

Conversation

AMoo-Miki
Copy link
Collaborator

@AMoo-Miki AMoo-Miki commented Jun 1, 2024

Description

Changing from the RegEx based approach to an AST one dramatically reduces the encounters with false-positives, reducing the amount of time it takes to parse very large values.

Check List

  • New functionality includes testing.
    • All tests pass
  • Linter check was successfull - yarn run lint doesn't show any errors
  • Commits are signed per the DCO using --signoff
  • Changelog was updated.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like some of the CI has failed, take a look?

Are _stringifyWithBigInt and _parseWithBigInt used anywhere after this or are they dead code? Remove.

Are there tests for this that do big ints that cover these paths?

What are the performance numbers, even if anecdotal?

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is JSON11 a superset of sjson? Why not just use that in all cases and get rid of all the special handling of big ints?

Would it make sense to let users configure the serializer/deserializer instead of having special handling of big ints and importing mutliple libraries?

Signed-off-by: Miki <miki@amazon.com>
@AMoo-Miki
Copy link
Collaborator Author

AMoo-Miki commented Jun 3, 2024

Are _stringifyWithBigInt and _parseWithBigInt used anywhere after this or are they dead code? Remove.

Removed dead code.

Are there tests for this that do big ints that cover these paths?

Yes; I cleaned them up to remove the unnecessary tests.

What are the performance numbers, even if anecdotal?

An experiment with 10,000 false positives took over 90s earlier. With JSON11, it takes 200-ish ms.

Looks like some of the CI has failed, take a look?

Fixed.

@AMoo-Miki
Copy link
Collaborator Author

AMoo-Miki commented Jun 3, 2024

Is JSON11 a superset of sjson?

JSON11 is a superset of JSON5 which is spec-wise a superset of JSON. By contrast, secure-json-parse is just a wrapper around the built-in JSON.parse to throw an exception if it finds a key named __proto__. secure-json-parse has no special abilities beyond that.

Why not just use that in all cases and get rid of all the special handling of big ints?

JSON.parse being a built-in method is 17 times faster than JSON5.parse and by proxy JSON11's. We are giving users the choice between (A) fast and imprecise for large integers, and (B) slow and precise. I would hazard a guess that not many users deal with precise large integers because BigInt is a newer concept in JS.

Would it make sense to let users configure the serializer/deserializer instead of having special handling of big ints and importing mutliple libraries?

It is a user provided option named enableLongNumeralSupport which allows them to choose between A and B.

Copy link
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better than the previous custom implementation, so good to go.

Generally, in the combination of shouldHandleLongNumerals=true, numeralsAreNumbers getting set by the first parse, and JSON11.stringify succeeding, we parse twice.

This just begs the question of why all this complication, what would not be possible if the caller specified that they want to use JSON vs. JSON11 vs. JSONXYZ instead of enableLongNumeralSupport?

@AMoo-Miki
Copy link
Collaborator Author

On a side note: I believe the way secure-json-parse works is not useful. It either removes the property or throws an exception. When there is a correct way of handling it by retaining __proto__ but preventing it from poisoning the prototype, we shouldn't destroying data.

@dblock dblock merged commit c450c35 into opensearch-project:main Jun 3, 2024
66 of 67 checks passed
@dblock
Copy link
Member

dblock commented Jun 3, 2024

On a side note: I believe the way secure-json-parse works is not useful. It either removes the property or throws an exception. When there is a correct way of handling it by retaining __proto__ but preventing it from poisoning the prototype, we shouldn't destroying data.

I don't think I understand this, but open issues!

@AMoo-Miki
Copy link
Collaborator Author

This is better than the previous custom implementation, so good to go.

I am glad I was able to make time for this.

Generally, in the combination of shouldHandleLongNumerals=true, numeralsAreNumbers getting set by the first parse, and JSON11.stringify succeeding, we parse twice.

I could just make it look at shouldHandleLongNumerals and then use JSON11. I feel with numeralsAreNumbers, we get to save on running the slower method unnecessarily. If we assume t is the time it takes to run the built-in methods and 17t would be the time for the JSONx methods, if a user asks for long numerals and only half their calls reference long numerals:

  1. With giving them the option to choose the JSONx library:
    • All calls will take 17t and making 100 calls will take 1700t.
  2. With the current detection:
    1. calls that don't need treatment will use the built-in one and take t.
    2. calls that need treatment will use t + 17t because they have to run both and will take 18t.
    • Making 100 calls will take 950t.

This just begs the question of why all this complication, what would not be possible if the caller specified that they want to use JSON vs. JSON11 vs. JSONXYZ?

That is certainly an option but for mixed-use customers, it would lead to maintaining too many options.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants