JSON parsing fails for integer values greater than BiggestInt.max #15413
Comments
This would need another JsonNodeKind and branch of JsonNode specifically for uints, which would have no other use outside this specific case (integers between [2^63, 2^64)). A simple alternative is parsing to a float, which is technically what JS does. A |
I’ve had this problem for ages. As far as I can tell, JSON doesn’t define a range for integers and unfortunately, we cannot complete a parse successfully when our native type cannot hold the JSON integer value. The right solution fixes our stdlib, I think, but someone who cares about our hooks system should weigh in with a sympathetic design. What do our other 3rd-party JSON libraries do? |
We should parse to float for these cases. Or maybe pretend it's a string literal. UInt64 solves nothing. |
Should be doable by fixing #14696 (comment) |
It will be very tricky for the caller to handle logic where a JInt is normally expected but, once the value exceeds some threshold, a JFloat arrives instead and silently loses precision. IMHO, throwing an error makes sense. |
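To make the precision concern concrete, here is a minimal sketch (the helper name `collapsesInFloat` is made up for illustration and is not part of any library). It checks whether two distinct unsigned 64-bit values end up as the same float64, which is what happens for neighbouring integers at or above 2^63, because float64 carries only 53 bits of mantissa.

```nim
# Hypothetical helper, for illustration only: reports whether two
# distinct uint64 values become indistinguishable once stored as float64.
proc collapsesInFloat(a, b: uint64): bool =
  a != b and float64(a) == float64(b)

let big = 9223372036854775808'u64      # 2^63, the first value BiggestInt cannot hold
let bigPlusOne = 9223372036854775809'u64

# Both values round to the same float64, so the difference is lost.
doAssert collapsesInFloat(big, bigPlusOne)

# Small integers survive the round trip without loss.
doAssert not collapsesInFloat(1'u64, 2'u64)
```

A caller that receives the float has no way to tell which of the two integers was in the JSON text.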
If the library was generic on the integer type then the user could choose appropriate type for their use case including a bigint from nimble. Is it possible to make it generic with a default to BiggestInt without breaking compatibility? |
D's std.json (see https://dlang.org/phobos/std_json.html) handles large uint's correctly; nim can handle them correctly too.

import std.stdio;
import std.json;
void main(string[]args){
  string s = "9223372036854775808";
  JSONValue j = parseJSON(s);
  assert(j.toString == "9223372036854775808");
  assert(j.type == JSONType.uinteger);
  assert(j.get!ulong == 9223372036854775808);
}

(note that https://github.com/dlang-community/std_data_json or https://vibed.org/api/vibe.data.json/ is more popular) |
@timotheecour Whats |
no, it's an enum member encoding the parsed type; see code here: https://github.com/dlang/phobos/blob/master/std/json.d
here is how these map to internal D types:

struct JSONValue
{
import std.exception : enforce;
union Store
{
string str;
long integer;
ulong uinteger;
double floating;
JSONValue[string] object;
JSONValue[] array;
}
private Store store;

and here's the parsing logic:

if (isNegative)
{
value.store.integer = parse!long(data);
value.type_tag = JSONType.integer;
}
else
{
// only set the correct union member to not confuse CTFE
ulong u = parse!ulong(data);
if (u & (1UL << 63))
{
value.store.uinteger = u;
value.type_tag = JSONType.uinteger;
}
else
{
value.store.integer = u;
value.type_tag = JSONType.integer;
}
}

which is hopefully self explanatory: it tries to fit into a
float is a bad workaround here; and Big Int is not needed to solve this problem. We can adapt the above D solution which (hopefully) results in no breaking change; only parsed integers that don't fit in |
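For reference, the D logic above could be adapted to Nim roughly as follows. This is only a sketch under stated assumptions: `IntKind`, `ParsedInt` and `parseJsonInteger` are hypothetical names that do not exist in std/json, and the input is assumed to be an already-validated JSON integer token. Only `parseBiggestInt`, `parseBiggestUInt` and `startsWith` come from std/strutils.

```nim
import std/strutils

type
  IntKind = enum            # hypothetical tag, mirrors D's integer/uinteger
    ikSigned, ikUnsigned
  ParsedInt = object        # hypothetical result type, not part of std/json
    case kind: IntKind
    of ikSigned: i: int64
    of ikUnsigned: u: uint64

proc parseJsonInteger(s: string): ParsedInt =
  ## Prefers the signed representation and falls back to unsigned
  ## only for values in [2^63, 2^64).
  if s.startsWith('-'):
    ParsedInt(kind: ikSigned, i: parseBiggestInt(s))
  else:
    let u = parseBiggestUInt(s)       # raises ValueError above 2^64 - 1
    if u > uint64(high(int64)):
      ParsedInt(kind: ikUnsigned, u: u)
    else:
      ParsedInt(kind: ikSigned, i: int64(u))

doAssert parseJsonInteger("42").kind == ikSigned
doAssert parseJsonInteger("9223372036854775808").kind == ikUnsigned
```

With this shape, every value that parsed before keeps its signed kind; the unsigned kind only appears for inputs that previously failed.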
How does that help? Sounds like a terrible product to use. We need BigInt and I think it’s pretty clear why; the JSON spec demands it. Anything else is just going to require fixing later. |
Json numbers are not BigInt, they are arbitrary precision decimal numbers. The problem here is the same with string and BigInt, they require an additional tiny allocation and pointer indirection making json parsing even slower than it already is. So my recommendation is to not write to the decimal field unless absolutely necessary.
WTF float? convert the number to a lossy type and then back to uint. What could ever go wrong with that one. String would work I agree, but also an additional uint64 field would fix the problem. |
I have to agree with @krux02 here; Json numbers are not BigInt, so using BigInt wouldn't solve the general problem of numbers, and most importantly it would incur overhead in the common case; furthermore there is no ETA on std/bigints, see #14696 (essentially, either it's a slow nim implementation/wrapper or it's a fast nim wrapper around GMP but which has LGPL licensing concerns, see #14696 (comment)) my proposal remains:
benefits:
|
I don't know why @krux02 brought up the number type; this issue is about the integer type. I'm against changing I'm curious why you think that we'll have bigint overhead for all integers. It seems to me that we'll know how many digits the integer has, so most smaller integers should be quite easy to accommodate natively without any bigint machinery. |
I don't see anything wrong with adding Note: other libraries I found that support (at least) uint64 in json do as I suggest (more research welcome; by no means complete):
I don't follow this. No code will break and the mapping from a PR=> #15545 |
I'm with you so far. 👍
I know. But that doesn't mean there's nothing wrong with it, right? It just means you don't see it. Once you start changing the semantics, you ruin the point of JSON. If you're using it in any kind of way beyond that which it is specified for, then basically you are doing it wrong. The whole idea is that we agree on what we know and don't know about the data.
Not my problem. I didn't invent JSON nor did I write our implementation of it. I'm just trying to explain that you're going down the wrong path if you decide to attempt to ascribe more precision to types that are intentionally vague. This isn't adding abstraction, it's removing it. Now, how are you not going to break my code when you add a new |
If the new enum value only occur if you encounter values that were impossible to use before, then I don't think it would break your code. As timothee suggested, the uint type should only be used for integer values that cannot be represented with
Well, to be precise, Json only has one type for numbers. And these are arbitrary precision decimal types (I call it JNumber for now), because that is exactly what you can represent in ASCII. Nim's native types are fixed precision integer and floating point types with up to 64 bits of precision. On top of that, Nim's floating point numbers (IEEE 754) also allow NaN and ±∞. This means that JNumber is not a subset of any of the built-in Nim types, nor is float32/float64 a subset of JNumber, and the implementation of Json now has to deal with this mess. One thing we do know is that almost all numbers that end up in json are serialized from some native number type, not hand written nor from some arbitrary precision library. So it totally makes sense to special-case all these built-in types to support them as well and as fast as possible. So even if the arbitrary precision number type were already supported, it would still make sense to add this UInt special case for performance reasons. |
First, maybe you can explain what I'm missing in the following exhaustive case statement:

case kind
of JInt: discard
of JFloat: discard
of JString: discard
of JBool: discard
of JObject: discard
of JArray: discard
of JNull: discard

This will result in a compile-time error if you add a new value to

I agree that the implementation is poor. Again, if you want my support, show me how the change does not break existing code. Alternatively, show how the change won't itself need to be changed in some breaking way in the future. Honestly, I would sooner support a solution that is 100x slower, supports only one |
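The exhaustiveness point is easy to check with a self-contained version of that case statement. In this sketch `describe` is a hypothetical helper; the seven branches are the members of `JsonNodeKind` listed in the comment above. Because there is no `else` branch, the compiler requires every enum member to be covered, so adding an eighth member to the enum would make this proc stop compiling.

```nim
import std/json

proc describe(n: JsonNode): string =
  ## Hypothetical helper: maps each node kind to a label.
  ## No `else` branch, so the case must cover every JsonNodeKind member.
  case n.kind
  of JInt: "int"
  of JFloat: "float"
  of JString: "string"
  of JBool: "bool"
  of JObject: "object"
  of JArray: "array"
  of JNull: "null"

doAssert describe(parseJson("1")) == "int"
doAssert describe(parseJson("[1, 2]")) == "array"
```

Code that already ends its case statement with an `else` branch would keep compiling, but it would then handle the new kind through that fallback, which may or may not be what its author intended.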
Yes, sorry, you are right. For a moment I forgot that in Nim case statements fail to compile when they are not fully covered. |
What about |
JavaScript doesn't have 64 bit unsigned numbers at all, this discussion is silly. If we look at the spec, https://www.json.org/json-en.html it seems clear to me that any number must be accepted, including numbers much larger than |
Sure, but this is JSON, not JavaScript. If we break with |
I like
this gives best of both worlds:
|
Not a fan, but I'm thinking I'll be writing my own JSON parser, so don't let me stand in the way of "progress". |
@disruptek well, if you do that, you should probably start here: https://github.com/Araq/packedjson |
Am I understanding correctly that your plan is to break code by introducing |
In the meantime I found a better solution: A parse flag so that "unparsable" numbers are mapped to |
but then you cannot distinguish
or
one way is to map to a new type |
I've updated #15545 as described in #15413 (comment)
EDIT: and see #15646 for my proposed approach to mitigate impact of enum changes. |
Yes, true, but not really worth worrying about. "JString" is bad, but floating point numbers already need to fall back to a string anyway if they want to serialize infinity somehow. So I think "numbers might be in a string literal" is acceptable. |
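On the caller's side, "numbers might be in a string literal" could be handled along these lines. This is a sketch, not the stdlib's behaviour: `toUInt64` is a hypothetical helper, and it assumes that an integer too large for BiggestInt reaches the caller as a JString node holding the decimal digits, as proposed in this thread. The nodes are built by hand here so the example does not depend on what a particular Nim version's parser produces.

```nim
import std/[json, strutils]

proc toUInt64(n: JsonNode): uint64 =
  ## Hypothetical helper, not part of std/json.
  case n.kind
  of JInt:
    let i = n.getBiggestInt()
    if i < 0:
      raise newException(ValueError, "negative value: " & $i)
    result = uint64(i)
  of JString:
    # Assumption: out-of-range integers arrive as their decimal text.
    result = parseBiggestUInt(n.getStr())
  else:
    raise newException(ValueError, "expected an integer, got " & $n.kind)

doAssert toUInt64(newJInt(42)) == 42'u64
doAssert toUInt64(newJString("18446744073709551615")) == high(uint64)
```

The drawback raised earlier still applies: from the node alone, this helper cannot tell a quoted "123" in the JSON text from a number that was mapped to a string.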
…ge via special encoded JString
* fixes nim-lang#15413 * better hide it properly * see if this makes our list of important packages happy
The JSON parsing functions (like parseJson and parseFile) fail when trying to parse an integer whose value exceeds the maximum value that BiggestInt can hold, even if that value could be represented by the BiggestUInt type.
Example
See https://play.nim-lang.org/#ix=2yON
Current Output
Expected Output
Possible Solution
In json.nim, integers are represented using the BiggestInt type (see https://github.com/nim-lang/Nim/blob/devel/lib/pure/json.nim#L184 and https://github.com/nim-lang/Nim/blob/devel/lib/pure/json.nim#L795), which is signed and thus cannot contain any values that are greater than the maximum value for BiggestInt. A solution would be to detect if BiggestInt can hold the integer value and if not, use BiggestUInt.
Additional Information