Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: change block layout for Pointers #53

Closed
rvagg opened this issue Jul 22, 2020 · 7 comments · Fixed by #60
Closed

Proposal: change block layout for Pointers #53

rvagg opened this issue Jul 22, 2020 · 7 comments · Fixed by #60

Comments

@rvagg
Copy link
Member

rvagg commented Jul 22, 2020

Current IPLD schema for this HAMT (ref ipld/specs#282):


```ipldsch
type Node struct {
  bitmap Bytes
  pointers [ Pointer ]
} representation tuple

type Pointer union {
  | &Node "0"
  | Bucket "1"
} representation keyed

type Bucket [ KV ]

type KV struct {
  key Bytes
  value Value
} representation tuple

Notable is the keyed union which, in CBOR for the Bucket case becomes: MAP + STRING + "1" + ARRAY + elements ... (0x161, 0x97, 0x49, 0x129 plus an array element, for the single KV Bucket case).

This ought to be a kinded union:

type Pointer union {
  | &Node link
  | Bucket list
} representation kinded

i.e. encode just ARRAY + elements and TAG(52) + CID (0x129 plus an array element, for the single KV Bucket case) and a decode simply inspects that first token to determine which one it is. This is all hand-rolled in pointer_cbor.go and it could be swapped out to decode and switch on if maj == cbg.MajArray at the top of UnmarshalCBOR I think. But it would change the block layout, making current chain data unreadable and clients that don't understand this unable to read new stuff. But I think we have a window where that's possible (like the proposed hash alg change)?

/cc @warpfork

@rvagg rvagg added the need/triage Needs initial labeling and prioritization label Jul 22, 2020
@rvagg rvagg mentioned this issue Jul 22, 2020
@ianopolous
Copy link
Contributor

2 cents: That's exactly how we serialize our champ in peergos and it works well.

@rvagg
Copy link
Member Author

rvagg commented Jul 23, 2020

/cc @anorth @Stebalien @whyrusleeping

@warpfork
Copy link
Contributor

+1 I think this would be a perfectly reasonable change.

(I'm speculating this code/layout might predate when we introduced and enumerated all the union strategies we expect to support in IPLD via Schemas, and so maybe it was made with some conservative understanding of what would be well-supported. If so: let it be updated to be known: yep, kinded representation strategy for unions are 100% definitely a thing we've described and will support.)

@Stebalien
Copy link
Member

That seems reasonable. It would save us ~3 bytes per entry?

@rvagg
Copy link
Member Author

rvagg commented Jul 27, 2020

Not per entry, per element of each node's data array, each of which may contain up to 3 entries or a link to a child node, so it's hard to quantify exactly how much this will save.

A full (theoretical) node with a bitWidth of 5 will have 32 of these, so that's 96 bytes shaved off that node by not including the {"0":..} wrapper, regardless of whether the node contains actual entries or links to child nodes. It'll also make encoding slightly cheaper, and decoding will be much safer because it only has to check for the type of a single token which can only be either a tag(42) or an array, anything else should be rejected.

@whyrusleeping
Copy link
Member

@rvagg This seems like a good change to make, It is definitely a breaking change, but as you pointed out, we do need to make a change to switch the hash function over anyways, and we're resetting the networks pretty regularly. Saving 3 bytes per data element is pretty significant.

If we can get this is pretty quickly (i wish i had seen this issue sooner) then we can hopefully land it before things get more real

rvagg referenced this issue in rvagg/go-hamt-ipld Aug 11, 2020
From:

	type Pointer union {
		&Node "0"
		Bucket "1"
	} representation keyed

i.e. {"0": CID} or {"1": [KV...]}

To:

	type Pointer union {
		&Node link
		Bucket list
	} representation kinded

i.e. CID or [KV...]

Also removes redundant refmt tags

Closes: https://github.com/ipfs/go-hamt-ipld/issues/53
@rvagg
Copy link
Member Author

rvagg commented Aug 11, 2020

done in #60

rvagg referenced this issue in rvagg/go-hamt-ipld Aug 11, 2020
From:

	type Pointer union {
		&Node "0"
		Bucket "1"
	} representation keyed

i.e. {"0": CID} or {"1": [KV...]}

To:

	type Pointer union {
		&Node link
		Bucket list
	} representation kinded

i.e. CID or [KV...]

Also removes redundant refmt tags

Closes: https://github.com/ipfs/go-hamt-ipld/issues/53
Stebalien referenced this issue in rvagg/go-hamt-ipld Aug 17, 2020
From:

	type Pointer union {
		&Node "0"
		Bucket "1"
	} representation keyed

i.e. {"0": CID} or {"1": [KV...]}

To:

	type Pointer union {
		&Node link
		Bucket list
	} representation kinded

i.e. CID or [KV...]

Also removes redundant refmt tags

Closes: https://github.com/ipfs/go-hamt-ipld/issues/53
@anorth anorth removed the need/triage Needs initial labeling and prioritization label Aug 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants