-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor for better caching #30
Conversation
|
||
// If we have _no_ links, we've collapsed everything. | ||
if nd.links[0] == nil { | ||
return 0, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this is a bug in the current AMT. If we insert a high key, then delete it, we don't round-trip back to the empty AMT. (we fail to reset the height).
f18774b
to
e6282fa
Compare
Results:
|
8e17758
to
948f5cf
Compare
This refactor: 1. Ensures we never write anything till we flush. This: 1. Saves us quite a bit of time/gas when making many state modifications. 2. Gives us some room to further optimize batch operations without requiring a network upgrade. 3. Is much easier to reliably re-implement (e.g., in other languages). 2. Completely decodes nodes on load, and re-encodes them on save. This means all bitfield operations are isolated to two functions and bitfields do not need to be maintained in state (they're generated on the fly on flush). 3. Checks a bunch of invariants. We can't check everything, but we can at least avoid doing anything terribly incorrect.
Most batch-delete operations can be safely implemented without changing observed behavior (blockstore access patterns) as long as the indices are pre-sorted.
This patch set introduces a breaking change to the format (fixes a bug).
948f5cf
to
9c15d17
Compare
@ZenGround0 I've rebased this PR on master. Please review when you get a chance. |
in terms of the algorithm (discounting caching) this doesn't change anything except for that minor collapse fix does it? |
Yes. It should (modulo the bug fix) produce the same CIDs). |
Thanks @Stebalien. Should we repackage to |
The last commit in this PR repackages to v3. The tricky part will be consuming v3 and v2 at the same time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done yet but dropping off some comments before I finish going through this tomorrow.
MaxIndexBits = 63 | ||
WidthBits = 3 | ||
Width = 1 << WidthBits // 8 | ||
BitfieldSize = 1 // ((width - 1) >> 3) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if this line did the calculation explicitly it would match the line above and constrain constants at runtime to avoid errors
BitfieldSize = 1 // ((width - 1) >> 3) + 1 | |
BitfieldSize = ((Width - 1) >> 3) + 1 // 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see the assert in the init function below. Any reason to prefer that over constraining as suggested above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if there is someway to document that the magic number 3 here is log of number of bits in a byte and unrelated to WidthBits it may help preempt some confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When this PR is merged I'll rebase and clean up my big docs PR #23, make it lighter-touch and easier to merge as just docs (currently has a couple of minor code changes to help with making it reader-friendly). That should help with making a lot of these magic values clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we thought the math was too confusing and opaque so we just asserted it. I'm going to clean this all up in my PR to make the width configurable.
expLinks []cid.Cid | ||
expVals []*cbg.Deferred | ||
cache []*Node | ||
store cbor.IpldStore | ||
} | ||
|
||
func NewAMT(bs cbor.IpldStore) *Root { | ||
return &Root{ | ||
store: bs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: not your naming but fyi using bs
for a cbor IpldStore always makes me do a double take when I come back to this code because this thing is close to but not quite a BlockStore. Probably not worth changing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a bit funky. We can address it in a followup PR.
node.go
Outdated
if !expectLeaf { | ||
return nil, errLeafUnexpected | ||
} | ||
for x := byte(0); x < internal.Width; x++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we drop the cast to byte to help avoid confusion? Reading through the dense bit operations on line 42 I kept getting thrown by byte x not having bit ops performed on it. I'm missing how the cast helps since the loop bounds should keep all array indexes in bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this was a micro optimization but it almost certainly doesn't make a difference. I'll remove it.
continue | ||
} | ||
if ln.dirty { | ||
subn, err := ln.cached.flush(ctx, bs, height-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any possible code path that can hit this, but if the dirty bit is somehow set without a cached node then this will panic. What are your thoughts on checking for this condition and returning an error in this case? Is panicing preferable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be an invariant and I'd usually panic in cases like this. But given the blockchain context, it probably makes sense to be extra defensive.
04f4207
to
cb3d5f0
Compare
Motivation: Updates to the AMT usually involves multiple unnecessary writes during the modifications. This refactor punts all writes to the end.
This refactor:
Note: technically, we'll still:
-- @whyrusleeping