-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: lookup txs by block number instead of block hash #19431
core: lookup txs by block number instead of block hash #19431
Conversation
I've built an image with your last commit (the hacky one) on top of the latest version of your previous PR (currently pending merge) and deployed them on 2 VMs (06/07 (old-pr/old-pr-and-this) for Martin). Will report on the fast sync results in half a day. |
Thanks @karalabe ! Excited to see if we get any additional savings here. |
Sync is not yet done, but your numbers seem to be fairly accurate. PR is about 12GB smaller than without it. |
Fast sync done, 12GB saved! |
d90ef2b
to
3ab452b
Compare
@karalabe That's great! I've update the PR with the actual code and backwards compatibility with the prior transaction lookup formats. |
8ae92dd
to
48177b5
Compare
Transaction hashes now store a reference to their corresponding block number as opposed to their hash. In benchmarks this was shown to reduce storage by over 12 GB. The main limitation of this approach is that transactions on non-canonical blocks could never be looked up, however that is currently not supported. The database version has been upgraded to version 5 and the transaction lookup process is backwards-compatible with the prior two transaction lookup formats prexisting in the database instance. Tests have been added to ensure this.
48177b5
to
835abc4
Compare
@@ -27,28 +29,36 @@ import ( | |||
|
|||
// ReadTxLookupEntry retrieves the positional metadata associated with a transaction | |||
// hash to allow retrieving the transaction or receipt by hash. | |||
func ReadTxLookupEntry(db ethdb.Reader, hash common.Hash) common.Hash { | |||
func ReadTxLookupEntry(db ethdb.Reader, hash common.Hash) *uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worthwhile to have two methods? One that returns a uint64
and one that returns hash
? That way, we would sometimes not needs to do lookup tx -> hash, read hashto number -> read cannon hash (number).
It would also make the code for the callers a bit simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holiman Could you point me to the code that you think would benefit from this? How performance critical is that code as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about ReadTransaction
and ReadReceipt
.
They do
ReadTxLookupEntry, ReadCanonicalHash,ReadBody
.
If it's an old tx, it would internally be
read a hash, ReadHeaderNumber, ReadCanonicalHash,ReadBody
.
which could be
read a hash -> ReadBody
.
Maybe it would just overcomplicate things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holiman I agree that legacy databases pay a penalty here. If a user was using the client for constantly looking up transactions they may feel the effects of this slowdown. However, if this was a serious user I would probably expect them to resync. We could also offer a CLI tool to do these various database upgrades. @karalabe let me know if this is something that would be valuable to have. It would be done in a follow up PR.
Apart from my 2 tiny nits, LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small question, otherwise LGTM
return nil | ||
} | ||
// Database v6 tx lookup just stores the block number | ||
if len(data) < common.HashLength { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am thinking is this the final step of txLookup
? Since there is no flag at all now for the v6 format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing was updated to use this format: https://github.com/ethereum/go-ethereum/pull/19431/files#diff-0fba94aa54693ff2a4f26210c404dbd9R59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear what you mean by a flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the "flag" for blob. For example, if the blob can be decoded into v3 structure, this is kind of "flag". Also if the length of blob is 32, it is kind of flag for v4, v5.
So probably in the future if we want to change the format again, it can be painful.
But just thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the flag here is < 32 bytes
:) But all in all I think we always want to try to decode into the newest version first. People who upgrade to 1.9 will with a high chance resync, so might as well use the happy path and not do 3 decoding to get it right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rjl493456442 My intention was how @karalabe interpreted the code. If either of you feel like this needs to be made more clear and have a suggestion about how to do it please feel free to adjust it or let me know what to do.
* core: lookup txs by block number instead of block hash Transaction hashes now store a reference to their corresponding block number as opposed to their hash. In benchmarks this was shown to reduce storage by over 12 GB. The main limitation of this approach is that transactions on non-canonical blocks could never be looked up, however that is currently not supported. The database version has been upgraded to version 5 and the transaction lookup process is backwards-compatible with the prior two transaction lookup formats prexisting in the database instance. Tests have been added to ensure this. * core/rawdb: tiny review nit fixes
Transaction hashes now store a reference to their corresponding
block number as opposed to their hash. In benchmarks this was
shown to reduce storage by over 12 GB.
The main limitation of this approach is that transactions on
non-canonical blocks could never be looked up, however that is
currently not supported.
The database version has been upgraded to version 5 and the
transaction lookup process is backwards-compatible with the
prior two transaction lookup formats prexisting in the
database instance. Tests have been added to ensure this.