Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Confusion about Neo.Json.Serialize #1191

Closed
erikzhang opened this issue Oct 28, 2019 · 9 comments · Fixed by #1197
Closed

Confusion about Neo.Json.Serialize #1191

erikzhang opened this issue Oct 28, 2019 · 9 comments · Fixed by #1197
Labels
Discussion Initial issue state - proposed but not yet accepted

Comments

@erikzhang
Copy link
Member

erikzhang commented Oct 28, 2019

In InteropService, there are two SYSCALLs about JSON, Neo.Json.Serialize and Neo.Json.Deserialize.

Neo.Json.Serialize converts StackItems to JSON objects. Everything looks great, the only problem is how to convert ByteArray.

We know in NeoVM, there is no differece between byte array and string. They are both stored in a ByteArray. Currently we use ByteArray.GetString(), and convert it to a JString. That means we have the following conversion:

"test" => "test"
0x1234 => "\u00124"

The first conversion is what we need, but we prefer the second one to be:

0x1234 => "1234"

If we use ByteArray.GetByteArray() and then use ToHexString(), we get the following conversion:

"test" => "74657374"
0x1234 => "1234"

This time, we are pleased with the second conversion, but the first conversion for test is not good.

We need to decide which way to use.

Option 1: ByteArray.GetString()
Option 2: ByteArray.GetByteArray() and then ToHexString()
Option 3: Create new JSON APIs. Such as Neo.Json.NewJArray, Neo.Json.AddItem, Neo.Json.NewJObject, Neo.Json.AddProperty.

@erikzhang erikzhang added the Discussion Initial issue state - proposed but not yet accepted label Oct 28, 2019
@ixje
Copy link
Contributor

ixje commented Oct 28, 2019

I prefer option 2

"test" => "74657374"
0x1234 => "1234"

My reasoning is that byte arrays should be represented as byte arrays, not with some strange encoding in front of it. If the byte array data happens to be decodable to a human readable string that's an extra and up to the user.

@lock9
Copy link
Contributor

lock9 commented Oct 29, 2019

I'm in favor of option 2 too.

@shargon
Copy link
Member

shargon commented Oct 29, 2019

I prefer option 2 too

@igormcoelho
Copy link
Contributor

igormcoelho commented Oct 30, 2019

Looks like option 2 is better indeed, since string will be representing a bytearray as an explicit hexstring... it takes more space, but it's much more readable. And we cannot affort to have crazy chars inside json anyway. Good idea.

@shargon
Copy link
Member

shargon commented Nov 5, 2019

Usually in json is used base64 instead of Hex, did you consider the use of base64?

@erikzhang
Copy link
Member Author

Base64 has a higher compression ratio. I like it.

@ixje
Copy link
Contributor

ixje commented Nov 5, 2019

Base64 has a higher compression ratio. I like it.

Base64 adds overhead both on processing and size (ref)

An obvious method to escape binary data is to use Base64. However, Base64 has a high processing overhead. Also it expands 3 bytes into 4 characters which leads to an increased data size by around 33%.

In [1]: import base64

In [2]: array = b'\x11'*64

In [3]: array
Out[3]: b'\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11\x11'

In [4]: base64.b64encode(array)
Out[5]: b'EREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREREQ=='

In [5]: len(base64.b64encode(array))
Out[5]: 88

In [6]: len(array)
Out[6]: 64

@erikzhang
Copy link
Member Author

erikzhang commented Nov 5, 2019

Base64 encodes 6bit into one char. Hex encodes 4 bits into one char. There is no byte array in JSON, so we have to encode byte array into string. Either use base64 or hex.

@ixje
Copy link
Contributor

ixje commented Nov 5, 2019

oh right forgot the output has to be a string, still early morning 😅
base64 sounds good as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Initial issue state - proposed but not yet accepted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants