Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation of era1 + export + verify #1998

Merged
merged 7 commits into from
Feb 9, 2024
Merged

Conversation

kdeme
Copy link
Contributor

@kdeme kdeme commented Jan 31, 2024

Initial implementation of era1 + export and verify command in the existing eth_data_exporter.

Implementation of era1 file format as currently described in: ethereum/go-ethereum#26621

Main issue currently is: 3c67911#diff-1948da0f3a7cfa79607319be10ae226dae3c9d9bbbd6ff35faea52df948c571cR40-R43

# - Snappy does not give the same compression result as the implementation used
# by geth for some block headers and block bodies. This is an issue if we want
# to rely on sha256sum as checksum for the individual era1 files as is suggested
# in https://github.com/ethereum/go-ethereum/pull/26621

Other TODOs:

  • General clean-up / rework some code
  • Validation of blocks and receipts in verify command
  • Testing of last (incomplete) era.
  • Testing index (Later to be used when Portal bridge uses era files)

Regarding the exportEra1 command: The export happens by requesting the block data over EL json-rpc API. So it is not directly integrated in Nimbus. This is done because it is difficult to currently get Nimbus EL to sync until merge point on mainnet, and with this architecture it is/should be compatible with other clients. Downside is that it will obviously be slower than directly accessing the db.

@lightclient
Copy link

I'm just curious if you verify the checksums do not match? I actually haven't checked any impls to verify if this is actually an issue in practice or just theory.

@kdeme
Copy link
Contributor Author

kdeme commented Feb 1, 2024

I'm just curious if you verify the checksums do not match? I actually haven't checked any impls to verify if this is actually an issue in practice or just theory.

Yes, the checksums do not match.
When I compared some era1 files created from your branch with some of this nimbus implementation, I noticed occasionally (not all, in fact a minority) records of BlockHeaders and BlockBodys to be different. When I further investigated, it turned out the the compressed byte string was different, but once I decompressed both, the resulting RLP encoded byte string was the same. I will further investigate when this occurs and whether it is some sort of ambiguity in snappy.

My era1 verification code can however perfectly read (snappy decode works fine) the geth created era1 files and verify them (accumulator creation, txRoot, ommersHash and receiptsRoot checks).

@lightclient
Copy link

Okay fantastic. I never really came up with an appropriate solution to the checksum issue. I think it is still valuable to provide per-client because of how fast it can be verified. But it is good to know the accumulator is in agreement.

@kdeme
Copy link
Contributor Author

kdeme commented Feb 5, 2024

One more issue was noticed when decoding the later geth era1 files, fixed here: status-im/nim-eth#672
Invalid encoder/decoder on Receipt lists on our side.

@kdeme
Copy link
Contributor Author

kdeme commented Feb 6, 2024

I've noticed another issue when decoding the BlockIndex of the geth era1 files.
It seems that in the geth implementation each offset is relative to the position after the index that was read. While I understood from the era spec/implementation that each offset is relative to the (fixed) position of the beginning of the BlockIndex record.

https://github.com/ethereum/go-ethereum/pull/26621/files#diff-deee055223d3338d7aee5bae75e02c99f5e9b1c70405d9ec3d0d95ff99807479R182-R187

edit: Upon reading the era1 spec better, it seems it is specified that way: starting-number is the first block number in the archive. Every index is a defined relative to index's location in the file. The total number of block entries in the file is recorded in count.
edit2: Upon reading that above sentence again, it isn't that clear actually that this is what it means.

@tersec
Copy link
Contributor

tersec commented Feb 6, 2024

Okay fantastic. I never really came up with an appropriate solution to the checksum issue. I think it is still valuable to provide per-client because of how fast it can be verified. But it is good to know the accumulator is in agreement.

How much slower is checksum(decode_snappy(snappy_encoded_blob)) than checksum(snappy_encoded_blob)?

The thing which should match is the decoded, not encoded, version.

@kdeme
Copy link
Contributor Author

kdeme commented Feb 8, 2024

This first version is ready to be merged if status-im/nimbus-eth2#5866 gets in.

Has been tested manually by exporting some of the biggest (in data size) eras. And also the latest, incomplete era.
Verify has been tested both with self generated ones and with the ones from geth, the latter does still need the small adaption in index code.

Exporting of era1 files does take very long (due to the current architecture) and is thus not that practical to use for the full range.

@kdeme kdeme marked this pull request as ready for review February 9, 2024 08:10
@kdeme
Copy link
Contributor Author

kdeme commented Feb 9, 2024

This has now also been tested on an era1 file generated with the latest (corrected) version in go-ethereum (ethereum/go-ethereum#28959).

@kdeme kdeme merged commit 2fb90cf into master Feb 9, 2024
21 checks passed
@kdeme kdeme deleted the era1-intial-impl branch February 9, 2024 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants