Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing Nan/Infinity breaks most standard JSON parsers #208

Closed
clelange opened this issue Oct 3, 2023 · 9 comments · Fixed by #242
Closed

Allowing Nan/Infinity breaks most standard JSON parsers #208

clelange opened this issue Oct 3, 2023 · 9 comments · Fixed by #242
Labels
schema Issues related to the schema definition

Comments

@clelange
Copy link

clelange commented Oct 3, 2023

I found that allowing NaN and Infinity breaks standard JSON parsers, e.g. web browsers, probably because it violates https://datatracker.ietf.org/doc/html/rfc8259#section-6
An example file is https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration/-/blob/master/POG/MUO/2018_UL/muon_Z.json.gz
This was introduced in #85 and is actually a feature of rapidjson. It converts the value to std::numeric_limits<double>::infinity(). This is a nice feature, but I'm not sure it's worth the caveat of breaking JSON parsing...

Two questions come to my mind:

  1. Do we need infinity? I would say maybe, since one could also just use a large number such as 999999 or collider centre-of-mass energy, but I understand that "infinity" is nicer.
  2. Is it worth using rapidjson-specific features? I would rather say no.

I would propose to use infinity, but as a quoted, case-insensitive string instead and leave the parsing/value assignment to correctionlib. In this way, the JSON standard would be preserved and one could still open the file in RFC8259-compliant applications. What do you think?

@nsmith-
Copy link
Collaborator

nsmith- commented Oct 24, 2023

It originally came up as a way to handle #83, and yes a large number would work similarly in practice. We could alter the schema and replace basically every float with float | "inf" | "-inf" or similar, but it looks like a pretty invasive change. It would be backwards-compatible, so a minor version schema update would be OK, with a deprecation of using the Infinity keyword in the JSON itself for the eventual v3.
Given the effort, I'm actually somewhat inclined to recommend replacing current usage of Infinity with a large number.

@nsmith-
Copy link
Collaborator

nsmith- commented Jan 24, 2024

@IzaakWN given you originally requested the asymmetric flow, do you have any opinions on using a "large number" instead of infinity?

@IzaakWN
Copy link
Contributor

IzaakWN commented Jan 24, 2024

No, not really strong opinion either way. As long as it's large enough to cover all realistic physics use cases (like eta, pT, mass, ...), as well as technical limitations by different data types (mainly in python and C++ I guess) . :)

@nsmith- nsmith- added the schema Issues related to the schema definition label Feb 28, 2024
@clelange
Copy link
Author

clelange commented Mar 5, 2024

I think using a large number sounds good. We could recommend something that hopefully will never be reached such as 999999?

@nsmith-
Copy link
Collaborator

nsmith- commented Mar 5, 2024

I was leaning towards implementing float | "inf" | "-inf" but the two approaches are not in conflict with each other.

@clelange
Copy link
Author

clelange commented Mar 5, 2024

For my understanding, what would this look like for:

"edges": [
    15.0,
    20.0,
    25.0,
    30.0,
    40.0,
    50.0,
    60.0,
    120.0,
    Infinity
],

@nsmith-
Copy link
Collaborator

nsmith- commented Mar 5, 2024

"edges": [
    15.0,
    20.0,
    25.0,
    30.0,
    40.0,
    50.0,
    60.0,
    120.0,
    "inf"
],

@clelange
Copy link
Author

clelange commented Mar 5, 2024

OK, fine for me. Renders OK in browsers.

@nsmith-
Copy link
Collaborator

nsmith- commented Jun 18, 2024

This has gotten more urgent since something changed in pydantic between 2.6.4 and 2.7 that makes the serialization of float('inf') now null, possibly related to pydantic/pydantic-core#872.
Regardless, I will add a change to support the proposed syntax of "inf", "+inf", "-inf"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
schema Issues related to the schema definition
Development

Successfully merging a pull request may close this issue.

3 participants