Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect decoding of 65-bit integer #701

Closed
jdmsu4 opened this issue Jun 7, 2024 · 1 comment · Fixed by #751
Closed

Incorrect decoding of 65-bit integer #701

jdmsu4 opened this issue Jun 7, 2024 · 1 comment · Fixed by #751

Comments

@jdmsu4
Copy link

jdmsu4 commented Jun 7, 2024

Description

There seems to be a case where 65 bit integers are improperly decoded. I discovered this in a large dynamically generated json file at seemingly random times using the msgspec lib until I tracked down and isolated an example (below). Both larger and smaller encoded integers do not seem to have any issue. I expected the value of 19933688932870350000 to be consistently encoded and decoded as such, but instead received the number 1482881526185800828 as the output. As I was trying to figure out what was going on, I tried using the standard library json.loads() and got the expected value out. When examining the binary representation of the number, I noticed the incorrect value was equivalent to the correct one, besides the leading 4 bits:

Decimal Binary value Bits
Input 19933688932870350000 10001010010010100010000001110010000110100010010110011010001111100 65
Output (incorrect) 1482881526185800828 1010010010100010000001110010000110100010010110011010001111100 61

Here is a minimally reproducible example:

import json
import msgspec

# incorrect value
encoded_json = msgspec.json.encode({"test_value": 19933688932870350000})
decoded_json =  msgspec.json.decode(encoded_json)
print(decoded_json)

# correct value
comp_encoded_json = msgspec.json.encode({"test_value": 19933688932870350000})
comp_decoded_json = json.loads(comp_encoded_json)
print(comp_decoded_json)
@sigprof
Copy link

sigprof commented Aug 6, 2024

I suppose that this piece of code does not detect the overflow correctly:

msgspec/msgspec/_core.c

Lines 11902 to 11909 in 2c37da0

if (
(digit_count > 19) &&
(
(is_float) ||
((integer_end - integer_start) != 20) ||
(*integer_start != '1') ||
(mantissa <= ONE_E18)
)

Apparently the problem appears with the numbers starting from (1 << 64) + (10 ** 18) + 1, for which the above code will have mantissa = ONE_E18 + 1:

>>> test_value = (1 << 64) + (10 ** 18)
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(19446744073709551616, 19446744073709551616)

>>> test_value = (1 << 64) + (10 ** 18) + 1
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(1000000000000000001, 19446744073709551617)

and continues up to 19999999999999999999 (then at 20000000000000000000 the *integer_start != '1' part triggers, and the code starts behaving correctly again):

>>> test_value = 19999999999999999999
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(1553255926290448383, 19999999999999999999)

>>> test_value = 19999999999999999999 + 1
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(20000000000000000000, 20000000000000000000)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants