feat: badger: add a has check before writing to reduce duplicates #10680
Conversation
I'm seeing 9-10s block times, which seem normal. But I'll continue testing to see if anything unusual happens.
Force-pushed from c3a7488 to 0cff56a.
Looks good. Average 10.5s block times both ways.
Actually, I'm seeing 10.2s block times with this patch, so it may even be speeding things up a bit by not writing duplicate state.
Can we do the update and return a sentinel error if the key already exists?
It might be more efficient, as it saves a round trip in badger.
Not sure I understand. This patch is checking if we have something before writing it. That's not an error, just something to be optimized.
Not an error, I was suggesting using a sentinel error. Basically, move the occurs check into the db.Update transaction.
// Check if we have it before writing it.
switch err := db.View(func(txn *badger.Txn) error {
    _, err := txn.Get(k)
    return err
}); err {
case badger.ErrKeyNotFound:
case nil:
    // Already exists, skip the put.
    return nil
default:
    return err
}

// Then write it.
Should be relatively more efficient to do this check inside the db.Update below and discard the transaction if the key exists, instead of starting a new transaction. Although in practice it probably doesn't matter much.
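For illustration only, a minimal sketch of that suggestion, assuming a hypothetical errAlreadyExists sentinel and putIfAbsent helper (the badger/v2 import path is also an assumption, none of this is in the actual patch):

package main

import (
    "errors"

    badger "github.com/dgraph-io/badger/v2"
)

// errAlreadyExists is an illustrative sentinel, not part of the patch.
var errAlreadyExists = errors.New("key already exists")

// putIfAbsent performs the existence check inside the same read-write
// transaction that does the write, so only one badger transaction is started.
func putIfAbsent(db *badger.DB, k, v []byte) error {
    err := db.Update(func(txn *badger.Txn) error {
        switch _, err := txn.Get(k); err {
        case nil:
            // Already stored; abort the write via the sentinel.
            return errAlreadyExists
        case badger.ErrKeyNotFound:
            return txn.Set(k, v)
        default:
            return err
        }
    })
    if errors.Is(err, errAlreadyExists) {
        return nil // treat "already there" as success
    }
    return err
}

Note that the Get happens inside db.Update, i.e. inside a read-write transaction, which is the dependency discussed in the next comment.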
I considered that, but it forces a read/write dependency (a read-write transaction). That does guarantee no overwrites, but I was concerned about performance (IIRC, transactions in badger had some scaling issues).
I've been running this for several days now and haven't had any issues.
VM flush used to stop flushing once it encountered a block already present in the blockstore, but that changed with the FVM. DAG traversal now happens on the FVM side, so we instead flush all reachable and newly written blocks without checking the store. We could check if we already have them, but that would require crossing the Rust/Go FFI boundary, which tends to be very slow. I also significantly prefer the new VM flush logic, as it's resilient to missing blocks.
You're right, we should. I'll try to find someone with such a node.
The store bloat comes from putting those new CIDs over and over again in Badger. Badger is an LSM tree, and it happily writes the entry to L0/L1 and the value log again, even if it already exists. For mutable data this makes sense (eventual compaction takes care of preserving the latest value, or however many values you ask it to keep), but for immutable data it never makes sense to rewrite it. It's entirely redundant.
The write workloads are extremely user-dependent here. A has cache would make sense if we think the combination of blockstore write workloads from StateCompute, StateReplay, and block validation will produce a good hit rate. Because the first two are entirely user-driven, I'm not convinced. What would a good value be?
You're worried that the cost of the has check, where the index spills to disk, will add significant latency? Intuitively I think the risk is small or non-existent, as Badger will prioritise keeping the top levels of the index in memory (i.e. the most recent data), and that is specifically what block validation will use. It may be noticeable for StateCompute and StateReplay, but those are user workloads? Nevertheless, we should test, yep!
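To make the has-cache idea concrete, here is a hedged sketch that layers a small LRU of recently seen keys in front of the badger check. The cachedStore type, the hashicorp/golang-lru dependency, and the wiring are illustrative assumptions, not part of this PR:

package main

import (
    badger "github.com/dgraph-io/badger/v2"
    lru "github.com/hashicorp/golang-lru"
)

// cachedStore is a hypothetical wrapper: it remembers recently written keys
// so hot-path writes (e.g. block validation) can skip badger entirely.
type cachedStore struct {
    db   *badger.DB
    seen *lru.Cache
}

func newCachedStore(db *badger.DB, size int) (*cachedStore, error) {
    c, err := lru.New(size)
    if err != nil {
        return nil, err
    }
    return &cachedStore{db: db, seen: c}, nil
}

func (s *cachedStore) Put(k, v []byte) error {
    key := string(k) // LRU keys must be comparable
    if s.seen.Contains(key) {
        return nil // cache hit: skip both the badger read and the write
    }
    // Cache miss: fall back to the has-check-then-write path from this PR.
    err := s.db.Update(func(txn *badger.Txn) error {
        _, err := txn.Get(k)
        if err == nil {
            return nil // already stored
        }
        if err != badger.ErrKeyNotFound {
            return err
        }
        return txn.Set(k, v)
    })
    if err == nil {
        s.seen.Add(key, struct{}{})
    }
    return err
}

Whether this pays off depends entirely on the hit rate of the combined StateCompute, StateReplay, and validation workloads, which is exactly the open question above.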
LGTM
Probably fine to merge and see how it performs.
Ideally we'd check on a full node first (I can give you access to one if you don't have it already).
I've tested it on a snapshot node, but not a node with significant historical state. I don't think this will slow that node down, but I'm happy to test if you give me access.
Related Issues
fixes #10457
Proposed Changes
Check if we already have blocks before writing them (in badger only). By default, badger just appends the block, which grows the datastore until it is garbage collected. Worse, badger garbage collection is expensive and not fully effective.
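For context, the shape of the check being added is roughly the following, a simplified sketch of the reviewed snippet above with hypothetical has and k names (the badger/v2 import is an assumption):

package main

import badger "github.com/dgraph-io/badger/v2"

// has reports whether the key is already present, so the caller can skip
// the redundant write.
func has(db *badger.DB, k []byte) (bool, error) {
    err := db.View(func(txn *badger.Txn) error {
        _, err := txn.Get(k)
        return err
    })
    switch err {
    case nil:
        return true, nil
    case badger.ErrKeyNotFound:
        return false, nil
    default:
        return false, err
    }
}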
Additional Info
Downsides:
However, the average block sync time (on my machine) is 10.5s before and after this patch, so it shouldn't matter.
Checklist
Before you mark the PR ready for review, please make sure that:
The PR title is in the form <PR type>: <area>: <change being made>
    example: fix: mempool: Introduce a cache for valid signatures
    PR type: fix, feat, build, chore, ci, docs, perf, refactor, revert, style, test
    area: e.g. api, chain, state, market, mempool, multisig, networking, paych, proving, sealing, wallet, deps