Add eth-evm-code codec #61

Closed
wants to merge 1 commit into from

ghost commented Oct 4, 2017

This is the binary code stored in a smart contract account, referenced in Ethereum by `keccak256(evm_code_binary)`. It is raw binary data that the Ethereum Virtual Machine (EVM) interprets.
whyrusleeping requested a review from kumavis on October 4, 2017, 13:39
whyrusleeping (Member)

Is this yet another distinct Ethereum format?

ghost (Author) commented Oct 4, 2017

@whyrusleeping It is raw binary. The workaround is to reference it with the `raw` codec (0x55). We would like a dedicated codec for metadata purposes, i.e., so we would know "Oh, this IPLD object is EVM code."

whyrusleeping (Member)

Hrm... if there isn't any inherent structure to the data, then I really hesitate to add another new code for it.

Stebalien (Member)

In general, that's not what CIDs are for. CIDs allow one to interpret unstructured binary as structured data, but they shouldn't (usually) be used to determine the intended use of the data (unixfs currently does this, but it shouldn't).

For example, we've considered adding a code for webasm, but that's because it's an AST, not just raw binary. Even then, if a user provided a CBOR-encoded webasm AST, an IPLD webasm VM shouldn't care.

ghost mentioned this pull request on Oct 4, 2017
ghost (Author) commented Oct 4, 2017

I see. I'm short on arguments, as I'm not particularly clear on the overall vision for CIDs. While the points above make sense to me, I'd like to get @kumavis's feedback, as he is the "end user" of this information.

Stebalien (Member)

Basically, IPLD == structured data. CIDs exist to allow us to be compatible with existing merkledags (e.g., Ethereum, Git, and our own protobuf-based merkledag used in unixfs). In general, IPLD does not encode type information beyond basic primitives such as byte array, array, string, number, float, map, etc. (exact primitives TBD by the end of this week or I start kidnapping people).

kumavis (Contributor) commented Oct 4, 2017

Yeah, there's not a lot of structure to this. You could index it by program counter or by opcode index (skipping the inline data associated with an opcode), but there are no external links.
But there is value in identifying what the data is: "this is EVM code".

I'm indifferent.
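The "index by program counter or opcode index" idea can be sketched in a few lines of Python. This is a hypothetical helper, not part of any IPLD or Ethereum library. It assumes the standard EVM encoding, where PUSH1 through PUSH32 (0x60 to 0x7f) are followed by 1 to 32 bytes of inline data and every other opcode occupies a single byte.

```python
# Hypothetical sketch: map "opcode index" -> program counter for EVM bytecode,
# skipping the inline data that follows PUSH1..PUSH32.
from typing import List

PUSH1 = 0x60   # PUSH1 carries 1 byte of inline data
PUSH32 = 0x7F  # PUSH32 carries 32 bytes of inline data


def instruction_offsets(code: bytes) -> List[int]:
    """Return the program counter of each instruction, in order.

    Element i of the result is the program counter of opcode index i.
    """
    offsets = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        offsets.append(pc)
        if PUSH1 <= op <= PUSH32:
            # PUSHn is followed by n immediate bytes, where n = op - 0x5F
            pc += 1 + (op - PUSH1 + 1)
        else:
            pc += 1
    return offsets


if __name__ == "__main__":
    # PUSH1 0x80, PUSH1 0x40, MSTORE: a common contract preamble
    print(instruction_offsets(bytes.fromhex("6080604052")))
```

Here the two PUSH1 operands are skipped, so opcode indices 0, 1, 2 land on program counters 0, 2, 4. As noted above, nothing in this walk yields links to other objects, which is why the data has no IPLD structure to resolve.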

daviddias (Member) left a comment

A couple of questions:

  • Is there a need for this to be "serialized and deserialized"?
  • Is there any local resolve to happen?

From the thread, it sounds like you won't benefit from having EVM code as an IPLD format.

ghost (Author) commented Oct 6, 2017

@diasdavid

Is there a need for this to be "serialized and deserialized"?
Is there any local resolve to happen?

No and no. 😄

Just a little bit of context...

We have eth-account-snapshots. One of their fields is `codeHash` (for now, just the keccak256 hash of the EVM code the account holds; we want it to become a CID so we can fetch the code).

What are we doing now? We import this data as a "raw" (0x55) element and hash it with "keccak256" (0x1b). One example is `zb34VweGrraA5TkNXMLYj1AEUw9GfqsU9ov6PM9QiKjxKFf6J`. We have already imported 30,261 of these, which are the unique versions of the 1,845,516 smart contracts on the current Ethereum mainnet (yeah, there are a huge number of duplicates in there).
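To make the "raw" (0x55) plus "keccak256" (0x1b) combination concrete, here is a minimal, stdlib-only Python sketch that unpacks a `z`-prefixed CIDv1 string like the example above into its version, codec, multihash code, and digest. It assumes the base58btc multibase (`z` prefix, Bitcoin alphabet) and unsigned-varint fields described in the CID and multiformats specs; the helper names are hypothetical, and it does no validation beyond checking the digest length.

```python
# Hypothetical sketch: decode a base58btc ("z"-prefixed) CIDv1 into its parts.
from typing import Dict, Tuple

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(s: str) -> bytes:
    """Decode a base58btc string (Bitcoin alphabet) to bytes."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big") if n else b""
    # Each leading '1' character encodes one leading zero byte
    pad = len(s) - len(s.lstrip("1"))
    return b"\x00" * pad + body


def read_varint(buf: bytes, i: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint at offset i; return (value, next offset)."""
    value, shift = 0, 0
    while True:
        b = buf[i]
        value |= (b & 0x7F) << shift
        i += 1
        if not b & 0x80:
            return value, i
        shift += 7


def parse_cid_v1(cid: str) -> Dict[str, object]:
    """Split a CIDv1 into <version><codec><multihash code><digest length><digest>."""
    if not cid.startswith("z"):
        raise ValueError("expected base58btc multibase prefix 'z'")
    raw = b58decode(cid[1:])
    version, i = read_varint(raw, 0)
    codec, i = read_varint(raw, i)      # 0x55 = raw
    mh_code, i = read_varint(raw, i)    # 0x1b = keccak-256
    mh_len, i = read_varint(raw, i)
    digest = raw[i:]
    if len(digest) != mh_len:
        raise ValueError("digest length does not match multihash header")
    return {"version": version, "codec": codec, "mh_code": mh_code, "digest": digest}


if __name__ == "__main__":
    parts = parse_cid_v1("zb34VweGrraA5TkNXMLYj1AEUw9GfqsU9ov6PM9QiKjxKFf6J")
    print(parts["version"], hex(parts["codec"]), hex(parts["mh_code"]), len(parts["digest"]))
```

Note that nothing in the decoded CID says "EVM code": the codec only says "raw bytes" and the multihash only says "keccak-256", which is exactly the gap this PR was trying to fill.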

So, as @kumavis said, there is value in identifying that this is not just ordinary raw data hashed with keccak256, but actual Ethereum smart contract code. I second him, as I see meta value in figuring out the contents of a CID via its codec. But @Stebalien argues well that a CID shouldn't convey this information. Both arguments make sense to me, so I am indifferent.

Do we close this PR then?

daviddias (Member) commented Oct 9, 2017

@Stebalien is right: a CID should not convey that information. That doesn't mean the metadata shouldn't exist elsewhere in the overall data structure. Think of a CID as Content Address + File Extension: the File Extension part lets the IPLD resolver figure out what the content is, how to parse it, and how to traverse it, and that is pretty much it (for now).

Also, if I'm understanding correctly, you already know that `codeHash` points to EVM code. Isn't that enough meta info to know that the data behind that pointer is EVM code?

ghost (Author) commented Oct 9, 2017

All clear. Closing this issue.

ghost closed this on Oct 9, 2017
4 participants