This repository has been archived by the owner on Aug 20, 2021. It is now read-only.

Next steps #1

Open
whyrusleeping opened this issue Jul 3, 2017 · 28 comments

Comments

@whyrusleeping
Member

To get this integrated more officially into go-ipfs, we will first need to clean up the code here a little (make sure it complies with golint and the vetting tools) and then make it conform to the newer go-ipld-format plugin semantics (it needs a DecodeBlock method matching this: https://github.com/ipfs/go-ipld-format/blob/master/coding.go#L13). Then in an init function, it needs to register itself in the decoders map.

Then, any build of go-ipfs that imports this package will automatically be able to handle ethereum types.

There are also a few changes from https://github.com/ipfs/go-ipfs/compare/feat/zcash that we will need to get merged (primarily the changes to the ipfs dag put command that allows hex input).

And finally, this package doesn't implement handling for all the different ethereum object types. I only did block, transaction, and transaction trie parsing. Working on support for state trie processing will be nice.

cc @hermanjunge

@whyrusleeping
Member Author

also cc @kumavis

@whyrusleeping
Member Author

Once all the above is done, building ipfs with ethereum support would still require running a custom binary. I'm really wanting to add support for plugins to ipfs, and that would nicely solve the issue. You could just build this package as a plugin, put it in the ipfs/plugins directory, and bam! you have the ability to traverse ethereum dags

@kumavis

kumavis commented Jul 3, 2017

yeah i agree plugins would be awesome - look forward to that

@kumavis

kumavis commented Jul 3, 2017

I only did block, transaction, and transaction trie parsing. Working on support for state trie processing will be nice.

Currently the eth-ipfs bridge only serves blocks, transactions (no tries), and state and storage tries; it serves no tx receipts or tx receipt tries. These are missing because parity either does not index them by hash (tx receipts) or doesn't store them at all (tx trie, tx receipt trie).

@ghost

ghost commented Jul 20, 2017

Done

  • Learned and ran gx tools to update dependencies.
  • Learned and ran the vetting tools.
  • Added docs and hacks to be able to develop this element outside Linux.
  • Added the respective Pull Request.
  • Realized that we were working with the output of ethapi's BlockByHeader. We need to work with eth-block as a block header. Added those notes.
  • Also realized that the shortest path to extract the ethereum blockchain into IPFS is to patch ethapi to give us from geth all the data we need. One repo with the extracting CLI, and another one with a patch to geth will be developed in parallel.

Next Steps

  • Refactor so that eth-block is a block header (refactor/block-header).
  • Add support for state trie element retrieval (feat/state-trie).
  • Extend this program into a repo to be called go-ipld-eth-dump-star: a CLI app where you provide a block hash and the program recursively retrieves and adds to IPFS the block header, ommer list, state trie elements, etc., until all 8 IPLD types are covered. This should be the cornerstone of the ambitious project of having the whole eth blockchain on IPFS.
    • To accomplish above, we will need to build a patch to go-ethereum's ethapi package, to retrieve whatever we need.
    • Support for the rest of the 8 IPLD types.
[0x90] eth-block 
  ommers: eth-block-list
  txs: eth-tx-trie
  receipts: eth-tx-receipt-trie
  state: eth-state-trie

[0x91] eth-tx (local data only)
[0x92] eth-tx-receipt (local data only)
[0x93] eth-account-snapshot (local data only)

[0x94] eth-block-list (rlp array)
  [leaves]: eth-block

[0x95] eth-tx-trie (merkle trie)
  [leaves]: eth-tx

[0x96] eth-tx-receipt-trie (merkle trie)
  [leaves]: links to eth-tx-receipt

[0x97] eth-state-trie (secure merkle trie)
  [leaves]: links to eth-account-snapshot

[0x98] eth-storage-trie (secure merkle trie)
  [leaves]: links to raw binary

@ghost

ghost commented Jul 20, 2017

Paging @Kubuxu @Stebalien as per @whyrusleeping's advice.

@ghost

ghost commented Jul 20, 2017

And linking to our evil world domination plan repo MetaMask/eth-ipfs-browser-client#1


@Stebalien
Member

Then in an init function, it needs to register itself in the decoders map.

FYI, we've stopped doing this. Instead, just call Register(codec, decoder) at some point before trying to decode an eth block (this makes it easier to register eth decoders from plugins).
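
To illustrate the explicit-registration pattern described here, a minimal language-neutral sketch in Python follows. It is not the go-ipld-format API (which is Go): the names `register`, `decode`, and `decode_eth_block`, and the placeholder decoder body, are hypothetical. Only the codec number 0x90 for eth-block comes from this thread.

```python
from typing import Callable, Dict

# Hypothetical registry: codec number -> decoder function.
_decoders: Dict[int, Callable[[bytes], dict]] = {}

ETH_BLOCK = 0x90  # codec number listed for eth-block in this thread


def register(codec: int, decoder: Callable[[bytes], dict]) -> None:
    """Associate a decoder with a codec; must be called before decoding."""
    _decoders[codec] = decoder


def decode(codec: int, raw: bytes) -> dict:
    """Dispatch raw block bytes to whichever decoder was registered."""
    try:
        decoder = _decoders[codec]
    except KeyError:
        raise ValueError(f"no decoder registered for codec {codec:#x}") from None
    return decoder(raw)


def decode_eth_block(raw: bytes) -> dict:
    # Placeholder only: a real decoder would RLP-decode the block header here.
    return {"codec": "eth-block", "size": len(raw)}


# Explicit registration (for example from a plugin's start-up code),
# instead of relying on an import-time side effect.
register(ETH_BLOCK, decode_eth_block)
```

Because registration is an ordinary call rather than an import side effect, a plugin can decide when, and whether, to make its decoders available.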

@ghost

ghost commented Jul 20, 2017

@Stebalien
Member

Ah, ok. Just wanted to make sure there wasn't out-of-date information floating around.

@ghost

ghost commented Jul 20, 2017

@Stebalien please, if you have some time, take a look at the PR that attempts to document each public function, so that golint stops nagging me 😉. Most of the comments are ~~stolen~~ borrowed from the go-ipld-format interface.

I am building from this first PR.

@ghost

ghost commented Aug 6, 2017

Moving forward with this. A big refactor was already done here in #5. Some time will be spent on working on an importer. We can use the material in the plugin's directory README to make a blog post in the future.

@ghost

ghost commented Aug 7, 2017

Got some interesting data on my first attempt to import

[Screenshot: screen_shot_2017-08-07_at_6 31 26_am (image not preserved)]

The importing performance should improve with a truckload of cheap machines (or maybe research `amazon's lambda`s?) and a shared stack. Redis comes to mind.


herman
[07:15] 
Finally. One answer to a question

[07:15] 
2017/08/07 07:11:10 From the stack: 0xc7041743ad5152d8d13815ca6be379ff3b4c994069cc419867ab0d890d460b5f
2017/08/07 07:11:10     z45oqTS7yKVxeLJE8H1Q5o8nTusiARceKKt7hMkbED8PDeaCHQ2
2017/08/07 07:11:10         This is a leaf
2017/08/07 07:11:10             Node imported. Count = 12352
2017/08/07 07:11:10 From the stack: 0x84269463e5e9ecf08491d8745b98cec308498076c2cacbbe1c6e7adbe5d00438
2017/08/07 07:11:10     z45oqTS3UJgmLqbXEdANJGbbHKTHJcdZhvhkkrsoD6XL2A4dftb
2017/08/07 07:11:10         Adding 0xef6d2178835239b85ea68f9b3c2201ee49daf3744ebeb48901cc9374d9b97b9d (idx: b) to the stack
2017/08/07 07:11:10         Adding 0x06214d858b09063e9efe886d4f634348a7845a729807472bec1dbb26c40ac136 (idx: 5) to the stack
2017/08/07 07:11:10             Node imported. Count = 12353
2017/08/07 07:11:10 From the stack: 0x06214d858b09063e9efe886d4f634348a7845a729807472bec1dbb26c40ac136
2017/08/07 07:11:10     z45oqTRtzNeddu43X6Xvt8SBFmtVxukPrPZeBe4tiGNZYCeHf7K
2017/08/07 07:11:11         This is a leaf
2017/08/07 07:11:11             Node imported. Count = 12354
2017/08/07 07:11:11 From the stack: 0xef6d2178835239b85ea68f9b3c2201ee49daf3744ebeb48901cc9374d9b97b9d
2017/08/07 07:11:11     z45oqTSAh4htdRWX3DNXP1Ze2sQJ55UrukYNYoSMVitNXeY4P9n
2017/08/07 07:11:11         This is a leaf
2017/08/07 07:11:11             Node imported. Count = 12355
2017/08/07 07:11:11 From the stack: 0x1a202509db353cf86ea03dc0a9864a2c40af91e8bd28c1dc8ac56818824ed638
2017/08/07 07:11:11     z45oqTRvLRn9vW9u7VeEE7rqtrr4jz8ks5zzDSggNkF8BGgW9My
2017/08/07 07:11:11         This is a leaf
2017/08/07 07:11:11             Node imported. Count = 12356
Stack Empty. We are done here :D
[07:15] 
Genesis Block has `8,892` accounts (see
https://github.com/ethereum/pyethsaletool/blob/master/genesis_block.json)


[07:16] 
And `12,356` state trie nodes (it took from `06:27:09` to `07:11:11` to traverse them all,
tunneling to the source: me in Chile, `mantis` in `Azure East 2`)


[07:17] 
Etherscan says that the latest block (`#4127835`) has `5,270,884` accounts
https://etherscan.io/accounts


herman
[07:31] 
So, a naive download at this rate to the latest block should take `435` hours ->
https://www.wolframalpha.com/input/?i=(06:27:09+to+07:11:11)+*+(5270327%2F8892)
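
The 435-hour figure can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic as the WolframAlpha link: it assumes import time scales linearly with the number of accounts, and it uses the account count from the link (5270327) rather than the 5,270,884 quoted from Etherscan, which changes the result by well under an hour.

```python
from datetime import datetime

FMT = "%H:%M:%S"
start = datetime.strptime("06:27:09", FMT)
end = datetime.strptime("07:11:11", FMT)
elapsed_s = (end - start).total_seconds()  # time to traverse the genesis state trie

genesis_accounts = 8892
latest_accounts = 5270327  # figure used in the WolframAlpha link

# Naive assumption: cost grows linearly with the number of accounts.
estimate_hours = elapsed_s * (latest_accounts / genesis_accounts) / 3600
print(f"{elapsed_s:.0f} s for genesis, ~{estimate_hours:.0f} h naive estimate")
```

The genesis traversal took 2642 seconds, and scaling by the account ratio (about 593x) gives roughly 435 hours.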


[07:34] 
Now.
1) We will do the retrieval from a machine local to the parity server.
2) As you get more blocks, the odds of "repeating" trie nodes increase
(that's the whole point of using a state trie).
3) We have to figure out a way to parallelize this process
(as stated above, several machines or lambdas, plus a common stack in redis, for example), or
4) We can do the initial job by just plugging into an inactive levelDB for earlier blocks,
and then use the API for the latest blocks.


[07:35] 
Anyways. We will figure out something, as always. At last we have numbers to start with!
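
The log earlier in this comment shows a stack-driven walk over the state trie. A minimal sketch of that loop in Python might look like the following. The node shape, the `fetch` callback, and the in-memory `store` are hypothetical stand-ins for the real RLP nodes and the remote API; the `seen` set reflects point 2 above (nodes repeated across blocks need to be imported only once).

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical node shape: ("leaf", []) or ("branch", [child keys]).
Node = Tuple[str, List[str]]


def import_trie(root: str, fetch: Callable[[str], Node]) -> int:
    """Walk a trie with an explicit stack; return how many nodes were imported."""
    stack: List[str] = [root]
    seen: Set[str] = set()  # already-imported nodes are skipped
    count = 0
    while stack:
        key = stack.pop()  # "From the stack: 0x..."
        if key in seen:
            continue
        seen.add(key)
        kind, children = fetch(key)  # stands in for the remote API call
        count += 1  # "Node imported. Count = N"
        if kind == "branch":
            stack.extend(children)  # "Adding 0x... to the stack"
    return count


# Tiny in-memory stand-in for the remote node store.
store: Dict[str, Node] = {
    "root": ("branch", ["a", "b"]),
    "a": ("branch", ["c", "d"]),
    "b": ("leaf", []),
    "c": ("leaf", []),
    "d": ("leaf", []),
}
imported = import_trie("root", store.__getitem__)
```

In this shape, `stack` and `seen` are the only shared state, so they are the pieces that would move into something like Redis if several workers were to split the job.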

@Kubuxu
Member

Kubuxu commented Aug 7, 2017

There should also be a major perf improvement if you switch your IPFS node from flatfs to badger (still WIP). What you can do right now is:

  1. disable DHT - --dht=none option for the daemon for initial add
  2. enable the flatfs NoSync option in the config

@ghost

ghost commented Aug 21, 2017

OK. #7 is the second (and hopefully last) heavy overhaul. Now we can talk about organic growth, continuous improvements and such.

Current focus is making a fast and decent importer (https://github.com/hermanjunge/go-ipld-eth-import, to be given away to ipfs someday) for the eth-state-trie elements.

TODO

  • [0x96] - eth-state-trie. Support input for RLP encoded state trie elements.

    • HINT: We get them from the Parity IPFS API.
    • Develop this library feature in tandem with go-ipld-eth-import.
    • And [0x97] - eth-account-snapshot
  • [0x95] - eth-tx-receipt:

    • Propose a script to get all receipts from a block and make a JSON array of them.
    • Support the input of this JSON array to form the eth-tx-receipt-trie ([0x96]) leaves, and the eth-tx-receipt objects.
  • The rest of the IPLD ETH Types:

    • [0x91] - eth-block-list
    • [0x98] - eth-storage-trie

@ghost

ghost commented Sep 20, 2017

This one PR moves the needle to the right.

Docs

We have a pretty decent doc to make a huge blog post on this! Pinging @whyrusleeping as you requested this.

TODO

Operational

  • Finish the "cold" importer, which takes data from a disconnected (hence "cold") geth levelDB and imports it into a local IPFS node.
  • Implement the "hot" importer: A stripped-down geth node that will connect to the DevP2P grid, taking the latest updates in the blockchain and importing them into the local IPFS node.

Remaining codecs to implement

  • [0x95] - eth-tx-receipt:

    • Propose a script to get all receipts from a block and make a JSON array of them.
    • Support the input of this JSON array to form the eth-tx-receipt-trie ([0x96]) leaves, and the eth-tx-receipt objects.
  • The rest of the IPLD ETH Types:

    • [0x91] - eth-block-list
    • [0x98] - eth-storage-trie

Ideas

  • Transform
    • One for eth address -> <block-cid>/address/<eth-address>/balance to be <block-cid>/root/<keccak256(eth-address)>/balance.
    • One for tx -> <block-cid>/txs/<tx-id>/nonce to be <block-cid>/tx/<rlp(tx-id)>/nonce.
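
As a rough illustration of the first transform idea, here is a sketch in Python. The function name and the injected `hash_address` callback are hypothetical; a real implementation would pass a keccak-256 function, which the Python standard library does not provide (`hashlib.sha3_256` is NIST SHA-3, not Ethereum's Keccak-256).

```python
from typing import Callable


def rewrite_address_path(path: str, hash_address: Callable[[str], str]) -> str:
    """Rewrite <cid>/address/<addr>/<rest> to <cid>/root/<hash(addr)>/<rest>."""
    parts = path.split("/")
    if len(parts) < 3 or parts[1] != "address":
        return path  # not an address-style path; leave it untouched
    cid, _, address, *rest = parts
    return "/".join([cid, "root", hash_address(address), *rest])


def fake_hash(addr: str) -> str:
    # Stand-in for the demo only; a real one would return keccak-256 hex.
    return "hash(" + addr + ")"


print(rewrite_address_path("<block-cid>/address/0xabc/balance", fake_hash))
```

The tx transform would follow the same shape, with `txs` rewritten to `tx` and the key function swapped for an RLP encoder.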

@ghost

ghost commented Sep 20, 2017

This is a write up on IPFS/notes I made the other day.

@ghost

ghost commented Oct 4, 2017

@dryajov

Remaining codecs to implement

  • The rest of the IPLD ETH Types:

    • [0x91] - eth-block-list
    • [0x98] - eth-storage-trie
  • Fix eth-account-snapshot to reference:

    • [0x98] - Its storage root (an eth-storage-trie)
    • [0x55] - A raw keccak256 hash referencing the EVM Code.
  • [0x95] - eth-tx-receipt:

    • Propose a script to get all receipts from a block and make a JSON array of them.
    • Support the input of this JSON array to form the eth-tx-receipt-trie ([0x96]) leaves, and the eth-tx-receipt objects.

### EVM Code codec

The following PRs need to be approved before the 0x99 codec can be included here. Please follow them closely, as they involve a practical discussion of whether it makes sense to add a new codec or to stick to 0x55 (raw data), since EVM code has no structure.

* multiformats/multicodec#61
* ipfs/go-cid#37

@ghost ghost self-assigned this Nov 7, 2017
@rmulhol

rmulhol commented May 14, 2018

@hermanjunge How did you fetch the state trie rlp data that's in the test data directory? I see that you mentioned using the parity ipfs api - how did you determine the cid to pass in? Did you use this tool? If so, did you generate the eth-state-trie cid from a block hash, a state root hash, or something different?

@ghost

ghost commented May 14, 2018

Did you use this tool?

That's correct, https://github.com/kumavis/eth-ipld-cli

If so, did you generate the eth-state-trie cid from a block hash, a state root hash, or something different?

The root of the state trie can be obtained from the block header. Successive trie hashes are obtained when you retrieve this first element from a database (i.e. the ipfs-parity API) and then continue traversing. To know the traversal path, you need to hash (keccak-256) the value of the ethereum address. There is a section documenting an example of performing this operation manually with the ipfs client and the plugin in this repository. You can even find code to create the hash in that section.

Hope this answers your question.
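
To make the traversal path concrete, here is a small sketch in Python. It assumes the keccak-256 digest of the address has already been computed elsewhere (Python's `hashlib.sha3_256` is NIST SHA-3 and gives a different result, so it is not used here). The function names are hypothetical, and only branch nodes are handled; extension and leaf nodes consume several nibbles at once.

```python
from typing import Dict, List, Optional


def path_nibbles(hashed_address_hex: str) -> List[str]:
    """Split a 32-byte digest (hex) into the 64 nibbles that form the trie path."""
    h = hashed_address_hex.lower()
    if h.startswith("0x"):
        h = h[2:]
    if len(h) != 64:
        raise ValueError("expected a 32-byte digest in hex")
    int(h, 16)  # raises ValueError on non-hex input
    return list(h)


def next_hash(branch_children: Dict[str, str], nibble: str) -> Optional[str]:
    """Pick the child hash of a branch node for one path nibble, if present."""
    return branch_children.get(nibble) or None


# Made-up digest for illustration; a real one is keccak256(address).
nibbles = path_nibbles("ab" * 32)
# Branch node in the shape of a map from nibble to child hash.
children = {"a": "9c2bea51084f610a779574f0cf23f4f8e406766fe1d845078ce95e370b06aa02"}
print(nibbles[0], next_hash(children, nibbles[0]))
```

Each child hash found this way is then converted to a CID and fetched, and the walk continues with the next nibble.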

@rmulhol

rmulhol commented May 14, 2018

Thanks for the quick reply, @hermanjunge! That example is really helpful, and I'm super excited to see where this project goes/potentially contribute.

Quick follow up - the example works for fetching the state root of the genesis block (and for traversing to accounts from there). Do you know whether it's possible to perform similar operations on subsequent blocks?

For example, with the genesis block, I know that the cid for the header is z43AaGF73rnZ14vjAkMQ8xoNfBShmq8qaiqFuELAx1vxSTzfGY2 and the cid for the root is z45oqTS97WG4WsMjquajJ8PB9Ubt3ks7rGmo14P5XWjnPL7LHDM, and I can traverse downward to learn information about accounts from there.

However, for block 5,000,000, it appears that the cid for the header is z43AaGF1A8G45wosbcDDkCMWyNt5FfWc1UMM3EzrdS9ZTGN419B and the cid for the root is z45oqTS15RnXKjQMUS4gtmpJJzeuKeYLE2yw1pdi98NUxCH6YZi, but the parity ipfs api call of http://localhost:5001/api/v0/block/get?arg=z45oqTS15RnXKjQMUS4gtmpJJzeuKeYLE2yw1pdi98NUxCH6YZi yields an error of State root not found (at least for me). Any idea what might be happening here?

@ghost

ghost commented May 14, 2018

It is highly probable that your parity client has pruned that state from the database, or has not even obtained that element during its synchronization. You may want to try with a later block and state trie.

@ghost

ghost commented May 14, 2018

I checked with my running server, and it failed for block 5,000,000. However, it succeeded for a recent block (5,614,095). Here,

https://etherscan.io/block/5614095

gave me its hash 0x536c2a4cf78f03268dc7f2bac2e5ce541d13fad0179891c47cd6825cedcb5829

eth-ipld cid 0x536c2a4cf78f03268dc7f2bac2e5ce541d13fad0179891c47cd6825cedcb5829

# gives  "ethBlock": "z43AaGExLSyBxdzcVdwGtC4X3Ydf2ftKYzQsyr1W6MioA3cZT4c",

curl --output - http://localhost:5001/api/v0/block/get?arg=z43AaGExLSyBxdzcVdwGtC4X3Ydf2ftKYzQsyr1W6MioA3cZT4c | eth-ipld block

# and I am able to get the stateRoot
# "stateRoot": "0x9c3e5ae1dcdbfcde4d804a4e54e793c8ac6328151d2dcf95438df04d98fe9703",

# which I convert to cid
# eth-ipld cid 0x9c3e5ae1dcdbfcde4d804a4e54e793c8ac6328151d2dcf95438df04d98fe9703

# then

curl --output - http://localhost:5001/api/v0/block/get?arg=z45oqTS56MVrDkBQoQFs5mcxLHty2msTu2D3cJfdZfFsirxKnaN | eth-ipld rlp

# gives

[
  "0c416261069b8763ea27d6eafb97351e511c951ac8b6eeed5ecd02a59a85e080",
  "dd5f3eaef4a1aa058a7da097c28be246d812d0c921ee2eb4ced6a8088e34e723",
...
  "8771e32a2fabd77211b95bd12e1b67db6755601ae046da94a07dc983c7300bd4",
  ""
]
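
For readers without the `eth-ipld` tool, the hash-to-CID step used above can be sketched in plain Python. This is an approximation under stated assumptions, not the tool's actual code: it assumes a CIDv1 laid out as version, codec varint, and a keccak-256 multihash (code 0x1b, 32-byte digest), rendered as base58btc with the `z` multibase prefix. The codec numbers 0x90 (eth-block) and 0x96 (eth-state-trie) are the ones used in this thread.

```python
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

ETH_BLOCK = 0x90       # codec numbers as used in this thread
ETH_STATE_TRIE = 0x96
KECCAK_256 = 0x1B      # assumed multihash code for keccak-256


def varint(n: int) -> bytes:
    """Unsigned varint (7 bits per byte, high bit marks continuation)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def base58btc(data: bytes) -> str:
    """Base58 with the Bitcoin alphabet; leading zero bytes become '1'."""
    num = int.from_bytes(data, "big")
    chars = []
    while num:
        num, rem = divmod(num, 58)
        chars.append(B58[rem])
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + "".join(reversed(chars))


def eth_cid(hash_hex: str, codec: int) -> str:
    """Wrap an existing 32-byte keccak-256 hash as a CIDv1 string."""
    digest = bytes.fromhex(hash_hex[2:] if hash_hex.startswith("0x") else hash_hex)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte hash")
    raw = b"\x01" + varint(codec) + varint(KECCAK_256) + varint(len(digest)) + digest
    return "z" + base58btc(raw)


block_cid = eth_cid("0x536c2a4cf78f03268dc7f2bac2e5ce541d13fad0179891c47cd6825cedcb5829", ETH_BLOCK)
state_cid = eth_cid("0x9c3e5ae1dcdbfcde4d804a4e54e793c8ac6328151d2dcf95438df04d98fe9703", ETH_STATE_TRIE)
print(block_cid)
print(state_cid)
```

Note that no hashing happens here: the block hash and state root are already keccak-256 digests, so the CID only wraps them with a version, a codec, and a multihash header. If the assumptions hold, the two printed values should correspond to the CIDs passed to `block/get` in the commands above.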

@AFDudley

Hi Herman!

I was able to run:
curl --output - http://localhost:5001/api/v0/block/get?arg=z45oqTS56MVrDkBQoQFs5mcxLHty2msTu2D3cJfdZfFsirxKnaN | eth-ipld node

and get this result:

{ "type": "branch", "children": { "0": "0c416261069b8763ea27d6eafb97351e511c951ac8b6eeed5ecd02a59a85e080", "1": "dd5f3eaef4a1aa058a7da097c28be246d812d0c921ee2eb4ced6a8088e34e723", "2": "ecab2131db994982a26b540241fc4e7710b2aa1301383794a2ba12ff3200d5f4", "3": "cc486e899be905efdfbea3cd6b66d16e06e6c71759859d957eab69979eff875f", "4": "2834b562daf7e045516c2c85bd60a42ff4bcbc729efb240a6245e99a2c126f5f", "5": "d1f0e775c71bc99cc1db69ac4275283693bf7d701b8ea7db02c72e0a46b97405", "6": "8c5b2b89a8eb9488507c057a43a068cbda6ec937ecca32b29823e78d67dbe977", "7": "57d2db06cb923f043d1e16b170cb14a35b2efc96f272fd29c39e546d44b881ac", "8": "ebbbb6b1320a5ff55b2d8c35e52b810cd1eb4f90f0ef8f87d6fbaf0580ad950e", "9": "e659edc1f12beb959cf1405d6a4d8e669c77b109cfe4f4a56c7b748094c878ac", "a": "9c2bea51084f610a779574f0cf23f4f8e406766fe1d845078ce95e370b06aa02", "b": "01d04ace33310b608c1c751b6775c1ab91041efab45a5a91c817c29165c50bee", "c": "cf7d5c9c8b86721a18700afef55d316aed43635d955a631f490729237a27f168", "d": "fcf1c49c0585961b3e03f0fcecdb3f2cc23f3a9c796973f7b27a160841937500", "e": "71338d7803e20cf47d411e9ba0a6594492d62c656c5a53b16f825b934c6bcdce", "f": "8771e32a2fabd77211b95bd12e1b67db6755601ae046da94a07dc983c7300bd4" }, "value": "0x" }

But when I ran it with eth-ipld block I get this error:
Error: wrong number of fields in data
    at Object.exports.defineProperties (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-util/dist/index.js:698:15)
    at new module.exports (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-block/header.js:79:9)
    at getStdin.buffer.then (/usr/local/lib/node_modules/eth-ipld/commands/block.js:31:20)
    at process._tickCallback (internal/process/next_tick.js:109:7)

could you explain what's going on? Thanks!

@ghost

ghost commented May 14, 2018

eth-ipld block processes the RLP of a block header

@AFDudley

Yes. Your line (with the typo fixed):
curl --output - http://localhost:5001/api/v0/block/get?arg=z45oqTS56MVrDkBQoQFs5mcxLHty2msTu2D3cJfdZfFsirxKnaN | eth-ipld block

returns:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   532    0   532    0     0   6897      0 --:--:-- --:--:-- --:--:--  7000
Error: wrong number of fields in data
    at Object.exports.defineProperties (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-util/dist/index.js:698:15)
    at new module.exports (/usr/local/lib/node_modules/eth-ipld/node_modules/ethereumjs-block/header.js:79:9)
    at getStdin.buffer.then (/usr/local/lib/node_modules/eth-ipld/commands/block.js:31:20)
    at process._tickCallback (internal/process/next_tick.js:109:7)

Not:
[
  "0c416261069b8763ea27d6eafb97351e511c951ac8b6eeed5ecd02a59a85e080",
  "dd5f3eaef4a1aa058a7da097c28be246d812d0c921ee2eb4ced6a8088e34e723",
...
  "8771e32a2fabd77211b95bd12e1b67db6755601ae046da94a07dc983c7300bd4",
  ""
]

Are we using different versions of eth-ipld block ?

Thanks again.

@ghost

ghost commented May 15, 2018

You are right, @AFDudley. I checked my .bash_history: bad copy-pasta. I meant eth-ipld rlp. Apologies. Typo corrected above.

@AFDudley

Thanks, I was able to replicate that with a more recent block.

@ghost ghost removed their assignment Aug 13, 2020