Incorrect decoding of 65-bit integer #701

jdmsu4 · 2024-06-07T23:55:53Z

Description

There seems to be a case where 65 bit integers are improperly decoded. I discovered this in a large dynamically generated json file at seemingly random times using the msgspec lib until I tracked down and isolated an example (below). Both larger and smaller encoded integers do not seem to have any issue. I expected the value of 19933688932870350000 to be consistently encoded and decoded as such, but instead received the number 1482881526185800828 as the output. As I was trying to figure out what was going on, I tried using the standard library json.loads() and got the expected value out. When examining the binary representation of the number, I noticed the incorrect value was equivalent to the correct one, besides the leading 4 bits:

	Decimal	Binary value	Bits
Input	19933688932870350000	10001010010010100010000001110010000110100010010110011010001111100	65
Output (incorrect)	1482881526185800828	1010010010100010000001110010000110100010010110011010001111100	61

Here is a minimally reproducible example:

import json
import msgspec

# incorrect value
encoded_json = msgspec.json.encode({"test_value": 19933688932870350000})
decoded_json =  msgspec.json.decode(encoded_json)
print(decoded_json)

# correct value
comp_encoded_json = msgspec.json.encode({"test_value": 19933688932870350000})
comp_decoded_json = json.loads(comp_encoded_json)
print(comp_decoded_json)

The text was updated successfully, but these errors were encountered:

sigprof · 2024-08-06T15:25:49Z

I suppose that this piece of code does not detect the overflow correctly:

msgspec/msgspec/_core.c

Lines 11902 to 11909 in 2c37da0

    
           if ( 
        
               (digit_count > 19) && 
        
               ( 
        
                   (is_float) || 
        
                   ((integer_end - integer_start) != 20) || 
        
                   (*integer_start != '1') || 
        
                   (mantissa <= ONE_E18) 
        
               )

Apparently the problem appears with the numbers starting from (1 << 64) + (10 ** 18) + 1, for which the above code will have mantissa = ONE_E18 + 1:

>>> test_value = (1 << 64) + (10 ** 18)
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(19446744073709551616, 19446744073709551616)

>>> test_value = (1 << 64) + (10 ** 18) + 1
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(1000000000000000001, 19446744073709551617)

and continues up to 19999999999999999999 (then at 20000000000000000000 the *integer_start != '1' part triggers, and the code starts behaving correctly again):

>>> test_value = 19999999999999999999
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(1553255926290448383, 19999999999999999999)

>>> test_value = 19999999999999999999 + 1
>>> msgspec.json.decode(json.dumps(test_value)), test_value
(20000000000000000000, 20000000000000000000)

jcrist mentioned this issue Oct 20, 2024

Fix JSON decoding of > 64 bit integers #751

Merged

jcrist closed this as completed in #751 Oct 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect decoding of 65-bit integer #701

Incorrect decoding of 65-bit integer #701

jdmsu4 commented Jun 7, 2024

sigprof commented Aug 6, 2024

Incorrect decoding of 65-bit integer #701

Incorrect decoding of 65-bit integer #701

Comments

jdmsu4 commented Jun 7, 2024

Description

sigprof commented Aug 6, 2024