-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support HAMT #4
Support HAMT #4
Conversation
add builders and utitlities for working with unixfs data protobufs
Add HAMT Implementation + Shared iterator to use with basic directory
move basic directory to its own package, redo root package based on more flexible refication and more flexible node prototype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇 🎉 💯
data/builder/builder.go
Outdated
"github.com/ipld/go-ipld-prime/fluent/qp" | ||
) | ||
|
||
func BuildUnixFs(fn func(*Builder)) (data.UnixFSData, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment to this toplevel method with pointer to example of using it to build a node
type UnixFSData struct { | ||
dataType Int | ||
data optional Bytes | ||
filesize optional Int; | ||
blocksizes [Int] | ||
|
||
hashType optional Int | ||
fanout optional Int | ||
mode optional Int | ||
mtime optional UnixTime | ||
} representation map | ||
*/ | ||
|
||
ts.Accumulate(schema.SpawnStruct("UnixFSData", | ||
[]schema.StructField{ | ||
schema.SpawnStructField("DataType", "Int", false, false), | ||
schema.SpawnStructField("Data", "Bytes", true, false), | ||
schema.SpawnStructField("FileSize", "Int", true, false), | ||
schema.SpawnStructField("BlockSizes", "BlockSizes", false, false), | ||
schema.SpawnStructField("HashType", "Int", true, false), | ||
schema.SpawnStructField("Fanout", "Int", true, false), | ||
schema.SpawnStructField("Mode", "Int", true, false), | ||
schema.SpawnStructField("Mtime", "UnixTime", true, false), | ||
}, | ||
schema.SpawnStructRepresentationMap(nil), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to @warpfork we really should figure out how to generate this intermediate code-gen format from ipld schemas :)
gengo "github.com/ipld/go-ipld-prime/schema/gen/go" | ||
) | ||
|
||
func main() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sure to add a go:generate
line somewhere in this repo so that a go generate ./...
at the root re-runs this.
data/unmarshal.go
Outdated
switch fieldNum { | ||
case Data_DataTypeWireNum: | ||
if wireType != protowire.VarintType { | ||
return fmt.Errorf("protobuf: (UnixFSData) invalid wireType, field: DataType, expected %d, got %d", protowire.VarintType, wireType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
factor out a constant format string from this and below?
"google.golang.org/protobuf/encoding/protowire" | ||
) | ||
|
||
func DecodeUnixFSData(src []byte) (UnixFSData, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to follow the Decoder func(NodeAssembler, io.Reader) error
interface here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea kinda -- it's never going to be registered as a multicodec, and this provides a bunch of weird type casts and artificial reader constructions all around. This isn't an IPLD format per se. We will always be decoding from within a larger block.
links := n._substrate.FieldLinks() | ||
link := lookup(links, key) | ||
if link == nil { | ||
return nil, schema.ErrNoSuchField{Type: nil /*TODO*/, Field: ipld.PathSegmentOfString(key)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leaving comment to flag this todo :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one's actually an upstream design problem. (When I drafted that error type, I thought having codegen spit out values for self-description of the types it had just emitted would be a "just around the corner" feature, and so I'd put a pointer to type info here. That turned out to be misplaced optimism. It's likely this should be replaced by a convention of type name strings instead.)
I'll make an issue upstream to at least track this.
|
||
// satisfy schema.TypedNode | ||
func (UnixFSBasicDir) Type() schema.Type { | ||
return nil /*TODO:typelit*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why isn't this just TypeMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that's not really the same thing. There are a ton of restrictions on format. It's actually a struct from the schema systems point of view. Actually, given it doesn't adequately satisfy that function we should remove it.
shardCache := make(map[ipld.Link]*_UnixFSHAMTShard, substrate.FieldLinks().Length()) | ||
bf := BitField(data) | ||
return &_UnixFSHAMTShard{ | ||
ctx: ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
holding onto context past construction may be a mis-match that confuses things. currently context is passed in during a load and the caller may assume that's tracking the needed loads to the datastore to construct the node, such that once they get their constructed top-level node back they cancel the context.
if later they attempt to do a map iterator over the children in a directory they aren't necessarily going to make the connection that their previous construction-time context is still in play.
On the flip side, the whole point of the hamt is that we can lazy-load, and we do need a context of some sort while doing that, and I don't see any others to pull in that make any more sense.
thoughts @warpfork
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. the fix would be to pass context to all the ipld node methods, which is not a bad idea per say, but a real big change.
I mean the reality is all these loads are happening in LookupByString, which is a bit confusing I think, since who would expect that massive amounts of fetching from datastores and even the network is happening here. But, on the other hand, I don't have anything better right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup. understood that this is what the interface lets us do for now, but also pointing it out to @warpfork that the interface forces this in it's current design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would very much prefer that interfaces are changed to allow context. Storing in a struct temporarily is problematic in that once more code gets built on this the embedded context gets harder to remove, and the context's lifetime may be confusing. If this cannot be changed now, please leave a TODO to indicate this is a workaround that needs to be addressed later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting feedback, thanks for this.
- It is true that currently storing a context is really the only way it's possible to go with this.
- Agree that storing context means lifetime can be unclear and that seems kind of problematic.
- Confirm that adding contexts to all methods is a pretty big change.
@mvdan and I actually talked about this (the possibility of adding context parameters to everything) previously too, and at the time put it back down as insufficiently convinced that it provided value for the amount of line noise it adds. (Mostly because the amount of syntactic bulk in these APIs already often feels like it's kind of pushing it.) But it's definitely possible it's still worth reconsidering.
As an interesting comparison, the new golang fs
packages also weighed a similar decision -- whether to add contexts to almost all file IO methods -- and they decided there to store context in a filesystem structure as well. I don't know if that's right or wrong either, but it's food for thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it helps to think about things, this article describes why it is important to pass Context rather than store it in another type. It also highlights a rare case where storing Context in a struct type may make sense, and how to do so safely:
https://blog.golang.org/context-and-structs
Perhaps @mvdan could offer his take on this. He has commented on this in the past: golang/go#22602 (comment)
hamt/util.go
Outdated
} | ||
|
||
if !nd.FieldHashType().Exists() || uint64(nd.FieldHashType().Must().Int()) != HashMurmur3 { | ||
return fmt.Errorf("only murmur3 supported as hash function") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have these errors as constants, here & below?
cbf9156
to
55bf1b3
Compare
have reifier add path selection to all protobuf nodes, even ones that are not unixfs
55bf1b3
to
d6733f6
Compare
respond to PR comments -- make error types, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just a few minor style comments. Looks very good.
data/builder/builder.go
Outdated
b := &Builder{MapAssembler: ma} | ||
fn(b) | ||
if !b.hasBlockSizes { | ||
qp.MapEntry(ma, "BlockSizes", qp.List(0, func(ipld.ListAssembler) {})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider a const
for "BlockSizes"
and other map entry keys.
data/errors.go
Outdated
if !ok { | ||
actualName = "Unknown Type" | ||
} | ||
return fmt.Sprintf("Incorrect Node Type: (UnixFSData) expected type: %s, actual type: %s", expectedName, actualName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lower case error strings
} | ||
if u.FieldDataType().Int() == Data_HAMTShard { | ||
return HAMTShardPerimissionsDefault | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using switch u.FieldDataType().Int() { ... }
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right :)
data/unmarshal.go
Outdated
for { | ||
if len(remaining) == 0 { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider doing this test in the for
:
for len(remaining) != 0 {
...
}
Same comment for other consumeX
functions.
shardCache := make(map[ipld.Link]*_UnixFSHAMTShard, substrate.FieldLinks().Length()) | ||
bf := BitField(data) | ||
return &_UnixFSHAMTShard{ | ||
ctx: ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would very much prefer that interfaces are changed to allow context. Storing in a struct temporarily is problematic in that once more code gets built on this the embedded context gets harder to remove, and the context's lifetime may be confusing. If this cannot be changed now, please leave a TODO to indicate this is a workaround that needs to be addressed later.
hamt/util.go
Outdated
out := int(mkmask(i) & curb) | ||
hb.consumed += i | ||
return out | ||
} else if i < leftb { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point -- copied code but you are totally right.
hamt/util.go
Outdated
return nil | ||
} | ||
|
||
func Log2Size(nd data.UnixFSData) int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be exported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair
reification.go
Outdated
} | ||
builder, ok := reifyFuncs[data.FieldDataType().Int()] | ||
if !ok { | ||
return nil, fmt.Errorf("No reification for this UnixFS node type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "No" --> "no"
Use ipld-primes reifier to avoid uncomfortable node prototype stuff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put in notes for @gammazero comments and otherwise pushed.
Support HAMT This commit was moved from ipfs/go-unixfsnode@5a89cd4
Goals
Support both regular directories and HAMTS for path lookup
Implementation