Skip to content
This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

BigInteger to Hex in most compressed format (NeoVM 3) #170

Closed
igormcoelho opened this issue Jun 12, 2019 · 12 comments
Closed

BigInteger to Hex in most compressed format (NeoVM 3) #170

igormcoelho opened this issue Jun 12, 2019 · 12 comments

Comments

@igormcoelho
Copy link
Contributor

igormcoelho commented Jun 12, 2019

A more uniform view of the number zero can be achieved by always represent it in its most compressed format. 0x0000 is zero, 0x00 is also zero, but 0x'' (empty array) is also zero. So we should adopt empty array as our new standard. Currently, the standard (by C# BigInteger) is "0x00" I think (although 0x"" is also recognized as zero...)
This helps in many things:

  • PUSH0 now may push VM_Integer zero (as it is an empty array now, not 0x00, this eventually causes confusion during compiling...)
  • In general, PUSHX should push a number (VM_Int), and PUSHF (which is currently equals to PUSH0), should push a VM_Boolean False.
    So, Number 0 as bytearray, will actually be an empty bytearray, as expected. Before this, I used to make some strange computations, like PUSH0 PUSH0 ADD, not being equals to PUSH0, for example. Now this wont happen anymore.

If someone wants the hexstring zero, it can do PUSHBYTES1 00, as always.

@igormcoelho
Copy link
Contributor Author

This is very useful for NEP-5... if you have Balance zero, it will make it an empty bytearray (which can be ripped out automatically, if other proposal is accepted on Neo #824).

@ixje
Copy link
Contributor

ixje commented Jun 13, 2019

PUSH0 now may push VM_Integer zero (as it is an empty array now, not 0x00, this eventually causes confusion during compiling...)

🤔 Looking at

case OpCode.PUSH0:
{
context.EvaluationStack.Push(EmptyBytes);
if (!CheckStackSize(true)) return false;
break;
I don't see how can that happen? Can you elaborate?


So, Number 0 as bytearray, will actually be an empty bytearray, as expected

I don't think that's expected behaviour at all
C#

int x = 0;
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(x))); // 00-00-00-00

python

In [1]: x = 0

In [2]: x.to_bytes(4, 'little')
Out[2]: b'\x00\x00\x00\x00'

Limiting it to a single byte (0x00) could make sense, but empty byte array for an actual value can't be right.


I actually think we should fix it like we did here #132 for the Boolean GetByteArray() issue.

If the instructions PUSH1 - PUSH16 push the numbers 1 - 16, then having PUSH0 not pushing the number 0 is a deviation from the convention.

Here are 4 tests (json);

  1. compare 2 x PUSH0 with NUMEQUAL
  2. compare 2 x PUSH0 with EQUAL
  3. ADD 2 x PUSH0, Compare to PUSH0 with NUMEQUAL
  4. ADD 2 x PUSH0, Compare to PUSH0 with EQUAL

Number 4 is the test you described. In the current state 1-3 pass, 4 will fail (as you described). As seen above the current PUSH0 logic pushes EmptyBytes that looks as follows

private static readonly byte[] EmptyBytes = new byte[0];

change this to

private static readonly byte[] EmptyBytes = { 0 };

and all 4 tests pass as should be expected and the naming convention of the opcodes actually becomes compliant. (We probably want to rename EmptyBytes)
-edit-
and we should likely push it as an int instead of bytearray, otherwise PUSH1-16 gives Integer types whereas PUSH0 gives a ByteArray type.

@igormcoelho
Copy link
Contributor Author

igormcoelho commented Jun 13, 2019

In fact, the C# BigInteger from numerics is the one we use as standard for integers (thats why I emphasized it as VM_Integer). Negative values are two complement, and sometimes an extra zero is required on the left to differentiate from negative values. My proposal just affects zero value. Some months ago I fixed python project, in the opposite direction... it considered zero as empty array, what I propose now for Neo3.

CityOfZion/neo-python-core#50

@igormcoelho
Copy link
Contributor Author

@ixje this is the current standard for numbers:
https://docs.microsoft.com/en-us/dotnet/api/system.numerics.biginteger.tobytearray?view=netframework-4.8

//    -1 (FF) -> 1 bytes: FF
//    1 (01) -> 1 bytes: 01
//    0 (00) -> 1 bytes: 00
//    120 (78) -> 1 bytes: 78
//    128 (0080) -> 2 bytes: 80 00
//    255 (00FF) -> 2 bytes: FF 00
//    1024 (0400) -> 2 bytes: 00 04
//    -9223372036854775808 (8000000000000000) -> 8 bytes: 00 00 00 00 00 00 00 80
//    9223372036854775807 (7FFFFFFFFFFFFFFF) -> 8 bytes: FF FF FF FF FF FF FF 7F
//    90123123981293054321 (04E2B5A7C4A975E971) -> 9 bytes: 71 E9 75 A9 C4 A7 B5 E2 04

What I want to do is to change:
// 0 (00) -> 1 bytes: 00
to
// 0 (00) -> 0 bytes:

This will make it fully standard with the rule of being the most compressed possible, and will automatically help solving many interesting things.

igormcoelho added a commit to igormcoelho/neo-vm that referenced this issue Jun 13, 2019
@igormcoelho
Copy link
Contributor Author

@ixje please take a look at tests from PR: #173

@ixje
Copy link
Contributor

ixje commented Jun 13, 2019

I took a look. I like having a clear PUSH0 and PUSHBYTES0 👍 .

However, I still don't think it is good practice to turn 0x00 into 0 bytes. It's very unconventional and confusing.

@igormcoelho
Copy link
Contributor Author

igormcoelho commented Jun 14, 2019

In fact,this is transparent for users and devs... the most affected ones its us, here, devs creating dev tools for the devs 😂 Its important to be space efficient, specially on a high performant blockchain.
On specification it will be straightforward to explain it this way, nice and clean ;) still waiting for more comments, specially @lightzero.

@ixje
Copy link
Contributor

ixje commented Jun 14, 2019

I'm trying to see the issue from your side but I just cannot see how this is more transparent than having a natural relationship of PUSHx pushes 1 byte with the value of x on the stack? It feels to me that you're pushing for it for storage reasons more than it being logical from convention/standard perspective.


On specification it will be straightforward to explain it this way, nice and clean

In your case the specification must say something like.

PUSH<x> pushes 1 byte with the value of <x> on the stack, with the exception of PUSH0 which pushes an empty array

We can drop the whole second line if we do not adopt this empty bytearray behaviour.


What is the main reason behind this proposal? Reducing storage needs?
If so, why don't we look at places that store a big amount of bytes and try to reduce those instead? e.g. storage keys. Do we really need a UInt20 prefix or can we reduce that? Or how about removing the grouping bytes? What about changing from UInt256 data types in blocks to something else etc.

In general 00 bytes also compress well and I'm sure the Leveldb people thought about that.

@igormcoelho
Copy link
Contributor Author

igormcoelho commented Jun 14, 2019

PUSH<x> pushes 1 byte with the value of <x> on the stack, with the exception of PUSH0 which pushes an empty array

this is incorrect. The current proposed behavior is: PUSH<x> pushes StackItem Integer x into the stack. This is valid for x=-1 until 16. You can check the tests, there's no exception here. PUSH0will push Integer zero, the same way PUSH16 will push Integer sixteen.

The other "family" of opcodes is: PUSHBYTES<y>, that pushes y bytes into the stack. If y=1, we push 1 byte; if y=2, we push two bytes; if y=0, we push 0 bytes. Everything is standard now.

If you look into C# BigInteger, having zero as 0x00 is an exception, not the rule case. The rule case is simply removing all leading zeroes when not necessary, except when BigInteger is zero... Note that if you load an empty byte array, it will also be recognized as zero, but if you serialize it again, it won't be empty, it will be 0x00. This is not good in my opinion, and if you look at the devpack, this is the cause to some workarounds that won't be necessary anymore.

The magic on the storage is not the single byte, but the ability to garbage collect automatically every existing key with empty bytearray as value... in NEP-5 scenario, it means that it will be impossible to create empty addresses without assets, occupying a big space on the key side and all the key prefixes. Without this, it's not possible to implement automatic garbage collection.

@ixje
Copy link
Contributor

ixje commented Jun 14, 2019

I don't even understand what the "automatic" in "automatic garbage" collection means in this case. I see nothing preventing some algorithm looking for keys with 1 0x00 byte value as opposed to searching for an empty bytearray. Anyway I'm retracting myself, do as the C# people see fit. I'm too tired of discussing 99% C# proponents on this repo not caring about what a change does for any of the alternative languages. All I can say it that all these obscure things are more than likely going to result in unexpected behaviour in alternative languages as they're super easy to miss (and I say that from a perspective of having cleaned up many of those misses in the current audit I'm doing).

@vncoelho
Copy link
Member

Your answer is like a lesson, @igormcoelho. A true professor, brother, even without dominating C# you can think in all languages.
Igor's background in C\C++ is what makes the real difference.

@igormcoelho
Copy link
Contributor Author

igormcoelho commented Jun 14, 2019

@ixje this is still under discussion, topic is not closed: see PR #173. I asked your opinion here because it's valuable.

In fact, I just retracted from the creation of PUSHBYTES0, because it could possibly break (or require name changes) on the tools of many people, including our neo-avm-optimizer, neo-one, perhaps neoscan avm parser, all blockchain explorers, and mainly neo-python. This is not a C# protocol, it is a language agnostic protocol. I have been implementing Neo on C++ and trying to help neo-python to fix bugs in the past, so I can guarantee to you that no decision is based on C# here. I'm not a C# developer, and when NEO3 Specification (https://github.com/neo-project/specification) is ready, there will be no references in it for the C# language. If you take any existing biginteger library, all of them will have the same or nearly the same behavior. Negative is two's complement, and positive byte array is pruned. Standard python would put biginteger 0 as empty array, so this change is beneficial to python and not beneficial to C# (but good to Neo3 protocol on general).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants