[BUG] GetJsonObject does not normalize non-string output #10218
Comments
Hi @SurajAralihalli, as I confirmed with Chong, I am assigning this issue to you.
This commit will solve items 1, 2, and 4.
For floating point numbers I think the rule is the same as double to string in Java; if so, we can reuse
Describe the bug
GetJsonObject on the CPU will first parse the JSON, and then when it goes to output the result it will convert the parsed data back to a JSON string. This results in the new string being normalized. On the GPU we do not do any of this. Instead we just copy the original character range back out. The following bugs can show up because of this.
- `{ "a" : "A" }` becomes `{"a":"A"}` on the CPU, but `{ "a" : "A" }` on the GPU
- `{'a':'A"'}` becomes `{"a":"A\""}` on the CPU, but stays as `{'a':'A"'}` on the GPU
- In the simplest case Spark strips unneeded trailing zeros for floating point numbers. `[100.0,200.000,351.980]` on the CPU becomes `[100.0,200.0,351.98]`, but on the GPU it is unchanged
- For larger floating point numbers it can be converted to scientific notation, or have the notation normalized. `[12345678900000000000.0]` becomes `[1.23456789E19]` on the CPU, but is unchanged on the GPU. `[1E308]` becomes `[1.0E308]` on the CPU.
- But for very large/small float numbers that would not fit in a double, they are turned into "Infinity"/"-Infinity". `[1.0E309,-1E309,1E5000]` becomes `["Infinity","-Infinity","Infinity"]`
- But integer like numbers are not modified. `[12345678900000000000]` just stays the same, even for numbers that are very, very large. i.e. `"1" + ("0" * 400)`
- `{"a":"B\'"}` becomes `{"a":"B'"}`. Escaping the ' character is not needed here.

That said we don't need to worry about normalizing nulls, as they are always `null` and nothing else is allowed, or booleans because `true` and `false` are the only ones supported.
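To make the parse-then-reserialize behavior concrete, here is a minimal Python sketch that reproduces several of the CPU outputs listed in this issue. It is not Spark's implementation and not the plugin's fix. The names `java_style_double`, `to_normalized_json`, and `normalize` are hypothetical, and the float formatting is only an approximation of Java's `Double.toString`, on the assumption (raised in the comments on this issue) that Spark follows that rule. Python's `json` module is strict, so the single-quote and `\'` cases are not covered here.

```python
import json
import math
from decimal import Decimal


def java_style_double(x: float) -> str:
    """Approximate Java's Double.toString (assumed to be the rule Spark follows)."""
    if math.isnan(x):
        return "NaN"
    if math.isinf(x):
        return "Infinity" if x > 0 else "-Infinity"
    sign = "-" if math.copysign(1.0, x) < 0 else ""
    if x == 0:
        return sign + "0.0"
    d = Decimal(repr(abs(x)))  # repr gives the shortest round-trip digits
    digits = "".join(map(str, d.as_tuple().digits)).rstrip("0") or "0"
    exp = d.adjusted()  # exponent when written as d.ddd x 10^exp
    if 1e-3 <= abs(x) < 1e7:
        # Plain decimal notation with at least one fractional digit.
        if exp >= 0:
            whole = digits[: exp + 1].ljust(exp + 1, "0")
            frac = digits[exp + 1:] or "0"
        else:
            whole = "0"
            frac = "0" * (-exp - 1) + digits
        return sign + whole + "." + frac
    # Scientific notation such as 1.0E308 or 1.23456789E19.
    mantissa_tail = digits[1:] or "0"
    return sign + digits[0] + "." + mantissa_tail + "E" + str(exp)


def to_normalized_json(value) -> str:
    """Serialize a parsed JSON value compactly, with no optional whitespace."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    if isinstance(value, int):
        return str(value)  # integer-like numbers are left as-is
    if isinstance(value, float):
        text = java_style_double(value)
        # Non-finite values are emitted as quoted strings, as in the issue.
        return text if math.isfinite(value) else '"' + text + '"'
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        return "[" + ",".join(to_normalized_json(v) for v in value) + "]"
    if isinstance(value, dict):
        items = (
            json.dumps(k, ensure_ascii=False) + ":" + to_normalized_json(v)
            for k, v in value.items()
        )
        return "{" + ",".join(items) + "}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def normalize(text: str) -> str:
    """Parse the JSON text, then write it back out in normalized form."""
    return to_normalized_json(json.loads(text))


if __name__ == "__main__":
    for sample in ['{ "a" : "A" }', "[100.0,200.000,351.980]",
                   "[12345678900000000000.0]", "[1.0E309,-1E309,1E5000]"]:
        print(sample, "->", normalize(sample))
```

Copying the original character range, as the GPU path currently does, corresponds to skipping `normalize` entirely and returning `text` unchanged, which is why the whitespace, trailing-zero, and notation differences appear.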