feat(multicodecs): add filecoin multicodecs #161

Closed
wants to merge 3 commits into from

hannahhoward

Goals

Filecoin needs to be able to construct CIDs for:

  • Piece Commitment (CommP)
  • Data Commitment (CommD)
  • Replica Commitment (CommR)

These are used to uniquely identify:

  • a Filecoin Piece (unsealed)
  • a Filecoin Sector (unsealed)
  • a Filecoin Sector (sealed & replicated)

According to @porcuquine:

  • Filecoin uses custom hashing
  • Hashing is different for sealed and unsealed data

Implementation

I have defined three serialization codecs (one for each of the three types of data) and two hashing algorithms.

Assuming these changes are accepted I will make changes to go-multihash and go-cid.

add serialization and hashing codecs for filecoin
table.csv Outdated
@@ -429,3 +429,8 @@ holochain-key-v0, holochain, 0x947124, Holochain v0 pub
holochain-key-v1, holochain, 0x957124, Holochain v1 public key + 8 R-S (63 x Base-32)
holochain-sig-v0, holochain, 0xa27124, Holochain v0 signature + 8 R-S (63 x Base-32)
holochain-sig-v1, holochain, 0xa37124, Holochain v1 signature + 8 R-S (63 x Base-32)
fil-piece-unsealed, serialization, 0xfi01, Filecoin piece, raw data (CID = Piece Commitment)
Member

What's the format of this data? CAR?

Member

Should this be IPLD?

Author

It's neither. The CID here is the root of the merkletree constructed from the bytes that make up the CAR.

table.csv Outdated
fil-sector-unsealed, serialization, 0xfi02, Filecoin sector, raw data (CID = Data commitment)
fil-sector-sealed, serialization, 0xfi03, Filecoin sector, sealed and replicated (CID = Replication Commitment)
fil-hash-unsealed, multihash, 0xfi04, Filecoin unsealed commitment hash (custom hashing alg)
fil-hash-sealed, multihash, 0xfi05, Filecoin sealed commitment hash (custom hashing alg)
Member

What's the actual hashing algorithm? Or is this some kind of special merkle-tree based hash?

Author

according to @porcuquine, it is indeed custom. @porcuquine describes it as "SHA256 with a twist" -- different enough that he says it shouldn't just be SHA-256

@Stebalien
Member

What are the size constraints on these codes? I.e., what goes on chain?

@hannahhoward
Author

The size of CommP at least is 32 bytes.

@hannahhoward
Author

But @jbenet would like it to be an actual CID -- as in:
32 bytes CommP, which is itself a custom hashing format + hash identifier (fil-hash-unsealed) + serialization codec (the piece is not simply a CAR; it is a CAR + bit padding + padding at the end, from which merkle-hashes are constructed for every 32 bytes, and then there are hashes of each of those hashes, in a tree all the way up to a root hash -- it doesn't feel accurate to say the serialization is anything but a filecoin customization)

@hannahhoward
Author

I'd ask everyone to pretend I didn't just act like the letter 'i' is a valid hex character.

Comment on lines +432 to +434
fil-piece-unsealed, serialization, 0xf101, Filecoin piece- raw data (CID = Piece Commitment)
fil-sector-unsealed, serialization, 0xf102, Filecoin sector- raw data (CID = Data commitment)
fil-sector-sealed, serialization, 0xf103, Filecoin sector- sealed and replicated (CID = Replication Commitment)
Member

Is the documentation available about the structure of all those ones? Are those really unique formats or just blobs of bytes? Multicodecs are for formats, not identifiers.

Member

@hannahhoward the piece commitment and the data commitment should be the same, only need one of those

@magik6k
Contributor

magik6k commented Feb 7, 2020

To be able to resolve those nodes the codec should indicate the depth of the graph, otherwise you can't know if you have an intermediate node or a leaf

For a 32G sector this means we'd need codecs for 30 levels (each node is 32 bytes), and possibly more levels for a bit of future-proofing (64 for a nice round number?)
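
As a rough check of the 30-level figure, here is a minimal sketch. It assumes a binary tree, 32-byte nodes, and that "32G" means 32 GiB; the function name `merkle_levels` is made up for illustration.

```python
def merkle_levels(sector_bytes: int, node_bytes: int = 32, arity: int = 2) -> int:
    """Count the levels of hashing between the leaves and the root."""
    nodes = sector_bytes // node_bytes  # number of leaf nodes
    levels = 0
    while nodes > 1:
        nodes = (nodes + arity - 1) // arity  # ceiling division: parents at the next level up
        levels += 1
    return levels


SECTOR_32_GIB = 32 * 2**30
print(merkle_levels(SECTOR_32_GIB))
```

Under these assumptions a 32 GiB sector has 2^30 leaves, so one codec per level would indeed mean 30 entries.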

@Stebalien
Member

I'd really rather not abuse codecs that way. I'm going to try to think up a better approach.

@rvagg
Member

rvagg commented Apr 9, 2020

Edit: unnecessary detail in here, will add a new comment below with pertinent details.

This is related to a problem I'm having right now doing CID stuff for blockchain data in that we're addressing merkle nodes and not an entire block that we can hash. I'd really like to have a consistent approach to this that we can reuse.

The CommP as a "CID" is really just a CID for a 64-byte string that's a concatenation of two hashes. It's plain SHA2-256 over that. So it could be locked in as that. You could, if you wanted to, make a CID using the same coding for each node in the CommP merkle, all the way down to the base 32-byte hashes of the underlying data.

What I'm trying to solve for right now in serializing BTC blocks is getting from the header to the transactions by way of a 2-ary merkle tree where the root is all the header has. go-ipld-btc takes the interesting approach of overloading the bitcoin-tx multicodec to say "this could be a 64-byte concatenation of hashes and is therefore a merkle node, and if not, then it's the leaf transaction".

https://github.com/ipld/go-ipld-btc/blob/5fe5af640eda869dc1236673de9fd321ba14062b/parsing.go#L133-L138
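
The overloading described above can be sketched roughly as follows. This is an illustration of the length-based dispatch as described in this comment, not a port of the go-ipld-btc code; the function name and return shape are hypothetical.

```python
from typing import Tuple, Union


def decode_tx_or_node(block: bytes) -> Union[Tuple[bytes, bytes], bytes]:
    """Treat a 64-byte block as a merkle node, anything else as a leaf.

    A merkle node is returned as its (left, right) pair of 32-byte child
    hashes; a leaf is returned unchanged as the raw transaction bytes.
    """
    if len(block) == 64:
        return block[:32], block[32:]
    return block


node = bytes([1]) * 32 + bytes([2]) * 32
print(decode_tx_or_node(node))
```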

It seems that we need a more consistent way of addressing merkle structures with CIDs because they pop up so frequently. But in the meantime, saying that 0xf101 is simply a SHA2-256 over a 64-byte string is probably enough to get an entry in the table? In which case the description Filecoin piece, raw data (CID = Piece Commitment) would need to be changed to talk about being a merkle node. The unfortunate implication of not having a better solution is that if I were to unpack a CommP down to its base data into a string of CID+Blocks then it'd be a CID for every node in the merkle tree, each of which is 64-bytes, all the way to the edges, which is a lot of nodes and a lot of wastage because CIDs are so long--this is what I'm dealing with in trying to serialize BTC blocks into CID+Blocks.

@porcuquine

porcuquine commented Apr 9, 2020

I think you know this, and it's tangential to what you are discussing, but I want to remind you that the CommP/CommD hash is not just SHA2-256. It's SHA2-256, with truncation to 254 bits (which I'm intentionally not fully specifying here). I mention it mainly so the description above isn't taken out of context as a specification by some future reader.

@rvagg
Member

rvagg commented Apr 9, 2020

Edit: unnecessary detail in here, will add a new comment below with pertinent details.

@porcuquine

is not just SHA2-256. It's SHA2-256, with truncation to 254 bits

But that's only at the base of the merkle tree in the underlying data, for the purpose of CID ~= hash(block) the block is just level n-1 of the merkle tree which doesn't have anything fancy about it, it's just a plain hash of two concatenated hashes.

Edit: it does mean that when you get to the base of the merkle tree, if you were addressing nodes by this CID style, you'd be addressing something different, not the concatenation of two hashes, but the underlying data with 2-bit spacing at 254-bit intervals, which is weird, but the merkle tree doesn't care.

There's currently no mechanism to say that "this CID addresses all of this underlying data", it only addresses an immediate "block" which can be passed through some hashing algorithm. i.e. it breaks down if you want to use a CID to address anything beyond a node in a merkle tree. But for the purpose of https://github.com/filecoin-project/go-fil-commcid, simplifying it to say that's all it is is probably enough, it's just not very satisfying when you push the CID ~= hash(block) concept and want block to be something larger.

@porcuquine

I'm not sure if we are talking past each other or not. In Filecoin, the Merkle trees whose roots are called either CommP or CommR are composed of binary hashes as I described: SHA2-256 with the truncation, not plain SHA2-256. This is important because each node in the tree has to fit into one BLS12-381 field element when included as a private input to the Merkle inclusion proofs contained in the PoRep SNARK.

@rvagg
Member

rvagg commented Apr 9, 2020

Edit: unnecessary detail in here, will add a new comment below with pertinent details.

@porcuquine oh yeah, sorry about that .. https://github.com/filecoin-project/rust-fil-proofs/blob/d7896c29ef3c0cc8c04f9fab7ef434e6691ed480/storage-proofs/core/src/hasher/sha256.rs#L75 and I even implemented that in javascript so should have remembered! https://github.com/rvagg/js-fil-utils/blob/52255d1603ac49d912d8a1ede66bab5c0baa228b/merkle.js#L10

Well that complicates it further then. Hash algorithm is something like "sha2-256-254" which maybe needs a multihash entry of its own to make this work.

@rvagg
Member

rvagg commented Apr 9, 2020

Edit: unnecessary detail in here, will add a new comment below with pertinent details.

... and that's what this PR is for, the sha2-256-254 _multihash_, with the _multicodec_ for CommP currently being ["raw"](https://github.com/filecoin-project/go-fil-commcid/blob/2b8bd03caca59c436d953f32fd825c055b612e18/commcid.go#L69) which probably isn't right and may need its own entry.

I guess for sealed data, describing it as sha2-256-254 doesn't cut it since those extra 2 bits are going to be filled with something that we can't specify without additional context? For unsealed we can at least say 0x00.

@porcuquine

@porcuquine oh yeah, sorry about that .. https://github.com/filecoin-project/rust-fil-proofs/blob/d7896c29ef3c0cc8c04f9fab7ef434e6691ed480/storage-proofs/core/src/hasher/sha256.rs#L75 and I even implemented that in javascript so should have remembered! https://github.com/rvagg/js-fil-utils/blob/52255d1603ac49d912d8a1ede66bab5c0baa228b/merkle.js#L10

Hence my mild panic.

I guess for sealed data, describing it as sha2-256-254 doesn't cut it since those extra 2 bits are going to be filled with something that we can't specify without additional context? For unsealed we can at least say 0x00.

I'm not 100% sure what you're discussing, so I'll be guarded. SHA2-256 doesn't contribute to any root of sealed data. The only Merkle trees created on sealed data use Poseidon with a more complicated tree structure.

@rvagg
Member

rvagg commented Apr 9, 2020

Sorry for my spam in here @hannahhoward et al. I've <details>d my comments above, you can ignore them for now. I'm writing up a clearer set of thoughts on how to handle this.

@hannahhoward one thing I would like to know is whether this PR or https://github.com/filecoin-project/go-fil-commcid/blob/master/commcid.go reflects current desired state. The latter uses raw as a codec type but this PR introduces both codecs and multihashes. Does this just reflect the uncertainty surrounding this issue, or limitations of go-cid in being unable to represent these new codecs yet?

@rvagg
Member

rvagg commented Apr 10, 2020

(Skip down to "Suggestions" at the bottom of this if you ain't got time for all this text; scroll back up for background.)

We're trying to extend CID & multiformats a bit beyond what they should be used for here I think, but the edges are blurry already so let's be clear before proceeding. Please excuse my basic framing here, I know it's probably tedious for some.

What are these values we're trying to identify:

  • Comm{P,D,R} are roots of merkle trees over a large blob of data. That data at the base of the merkle is a transformation of source data in the case of unsealed data (zero-padding for size then fr32 bit padding). So the relationship to source data is not as crisp as hashfn(data). The relationship gets even more complicated for sealed data I believe.
  • Unsealed data uses a binary merkle tree structure using SHA2-256 with the 2 bits at the end of each zeroed, so quite novel.
  • Sealed data uses a very novel merkle tree structure; I gather section 5.2 of the poseidon paper describes it, pretty opaque to most mortals I think, with whatever hashing algorithm it needs for its proofs.
  • Comm{P,D,R} are not hashes of raw data in the sense that multihash uses the term "hash". Multihash to date doesn't cover the case of addressing a block of data via a merklization process. And these processes use more than a plain merkle.

CIDs and multihashes

A CID is:

"a typed content address: a tuple of (content-type, content-address)..."

It's for content-addressed data where address = hashfn(data). Each hashfn should be in the multicodec table as a multihash:

Multihash is a protocol for differentiating outputs from various well-established cryptographic hash functions, addressing size + encoding considerations.

(This is being baked into the IETF standard submission for multihash @ https://datatracker.ietf.org/doc/draft-snell-multihash/)

To date, as far as I'm aware, all of the multihash entries in the multicodec table are relatively discrete, in that they are not compound algorithms and they fall within this "well-established" rule. My expertise fails here in trying to find the limits to this classification but one multihash that seemingly breaks this is dbl-sha2-256, which I think we can thank BTC for. Aside from this, we haven't introduced anything that would be a compound process to go from data to address. Specifically, there are no multihashes that get close to describing a merkle-style hash that would let you take a merkle root and say that it is the content-address of the data at the bottom of that merkle tree.

Multihash & this PR

As it stands, I believe (my reading could be faulty!) this PR is proposing to extend the multihash concept and say that: 0xf104 and 0xf105 are compound "hashing" functions. They include many complex steps. In the case of unsealed, 0xf104, something like: merkle(arity=2, hash=sha2-256-truncated, data=zero-padded(fr32-padded(source))). And in the case of sealed data, 0xf105: the novel poseidon merkle structure with whatever other processes it needs to do proofs (beyond me atm).

Compounding this, I believe the sealed "hashing" process would use additional context external to the algorithm (for the ZK proof) making it impossible, or impractical for anyone to take data, run hashfn() and produce address to validate correctness, unless they have that additional context (I think this is true, my understanding of sealing is extremely limited).

You could take a 0xf104 (unsealed) chunk and run a "hash function" hashfn over data and produce address independent of any context. It's just extremely complicated, as outlined above. https://github.com/rvagg/js-fil-utils is basically this, you can see the complexity. But again, we're pushing the limits of what a "multihash" was meant to do.
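
To make the shape of `merkle(arity=2, hash=sha2-256-truncated, ...)` concrete, here is a rough sketch of the tree-building part only. It assumes "truncated" means masking the last digest byte with `0b00111111` (zeroing its two most significant bits), and it takes already-padded 32-byte leaves as input; the zero-padding and fr32 steps are left out entirely. The function names are illustrative and do not come from any Filecoin library.

```python
import hashlib
from typing import Sequence


def sha2_256_trunc254_padded(data: bytes) -> bytes:
    """SHA2-256 with the top two bits of the final byte zeroed (assumed masking rule)."""
    digest = bytearray(hashlib.sha256(data).digest())
    digest[31] &= 0b00111111
    return bytes(digest)


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Root of a binary merkle tree over 32-byte leaves (leaf count must be a power of two)."""
    count = len(leaves)
    if count == 0 or count & (count - 1):
        raise ValueError("leaf count must be a non-zero power of two")
    if any(len(leaf) != 32 for leaf in leaves):
        raise ValueError("every leaf must be exactly 32 bytes")
    level = list(leaves)
    while len(level) > 1:
        # Each parent is the truncated hash of its two concatenated children.
        level = [
            sha2_256_trunc254_padded(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


leaves = [bytes([i]) * 32 for i in range(4)]
root = merkle_root(leaves)
print(root.hex())
```

Every node produced this way has its top two bits clear, which is the property the thread ties to fitting a node into a single field element.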

CID & this PR

As per the definition above, a CID is a content-type plus the content-address—or multihash.

So considering this content-type: If we're addressing underlying piece and sector data then that might make sense. It's still a bit of a stretch, but if you include everything that takes you from the source data to the "hash" as the "hash function" then this could work. But a CID is normally paired with a "block", the thing it's pointing to. So we'd need to be clear exactly what we're pointing to in each of these cases? In the case of CommP is it the source data of the piece, or is it the padded piece, or something else? What about CommD and CommR?

What we need to resolve

For multihash:

  1. What are the bounds of the processes that we want to call a "hash function" for the purpose of a multihash for each of these things?
  2. What is the "content" that we are producing a "content address" for?
  3. Is this even a good fit for a multihash?

e.g. for CommP:

  • The hash function could be the full merkle(arity=2, hash=sha2-256-truncated, data=zero-padded(fr32-padded(source))) (all of it from source data) and the "content" would therefore be the source data (CAR file?).
  • The hash function could be just the root of the merkle tree sha2-256-truncated(concatenation of root-1 merkle node hashes) and the source data would therefore be just the concatenation of the two hashes at the merkle level just under the root. This is similar to an approach taken by go-ipld-btc, which encodes all nodes of the merkle tree from the txMerkleRoot to the transactions themselves, giving each of them a CID. That gets out of hand with a large tree of course and it's not clear in the case of Comm{P,D,R} that you'd even want to do that.

For CID / multicodec:

  1. What is the appropriate "tag"?
  2. Is this even a good fit for a multicodec and therefore a CID?

This PR uses serialization but I don't think that's correct when you compare it to the other entries with that tag (cbor, protobuf, json, etc.).

Suggestions

We have 2 get-out-of-jail tools here that we can fall back on if needed:

  • identity / 0x00 multihash
  • raw / 0x55 ipld multicodec

I think "well-established cryptographic hash functions" disqualifies the addition of the new multihashes. Both "well-established" and "hash functions" trip us up. I think even dbl-sha2-256 stretches "well-established" but has become pervasive enough in the coin ecosystem that you could give it a pass. Even if we could narrow it down to something like "SHA2-256 with 2 trailing bits zeroed", that's too novel for what multihash currently is. Forcing this is going to require a redefinition of "multihash", which isn't out of bounds of course, but it will have consequences we need to carefully consider beyond this issue.

At this stage I think the multihash for these things, if one is required (i.e. to make a CID), should be identity / 0x00. This would be saying that "this is the actual thing, not a hash of a thing that you can load and navigate in to", which I think may fit with how CommP (in particular) would be used when in CID form. You're not addressing the underlying data, you don't even have it to check that your CID+Block pair makes sense and will likely have difficulty getting the Block anyway.

For the other part, the "multicodec", it doesn't seem as clear either way. I don't think serialization is correct. But we have addresses, records and namespaces in here so it doesn't seem entirely out of place to have an entry that's not a codec in the strict sense of the word.

We've already set the precedent that tag could be used for broader classification (see holochain). So tag could just be filecoin. 0xf101, 0xf102, 0xf103 could work (aside from Jeromy's comment about Piece and Data being redundant?) as specifiers that the thing included in this CID by an identity "hash" can be used to talk to Filecoin about things of that sort.

Any future requirement to differentiate algorithms can iterate in the 0xf1 range, bumping to the next code and saying "it's v2 of CommX".

Example of a CommP to a CID using this:

  • Raw CommP: 97b8a11ea031fe2128ed396046af6353132e4a522345eede9f6aac7036284c02
  • Identity multihash: 002097b8a11ea031fe2128ed396046af6353132e4a522345eede9f6aac7036284c02
  • 0xf101 content type / "codec" CIDv1: 0181e203002097b8a11ea031fe2128ed396046af6353132e4a522345eede9f6aac7036284c02
  • base32 CID: baga6eayaecl3rii6uay74iji5u4warvpmnjrglskkirul3w6t5vky4bwfbgae
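
The steps in this example can be reproduced with a short sketch. It assumes the usual layout of a binary CIDv1 (varint version, varint codec, then the multihash as varint code, varint length, digest) and lowercase unpadded base32 with a `b` multibase prefix; the helper names are made up for illustration.

```python
import base64


def encode_varint(value: int) -> bytes:
    """Unsigned LEB128 varint: 7 bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def commp_to_identity_cid(commp: bytes, codec: int = 0xF101) -> bytes:
    """Wrap a raw commitment in an identity multihash (0x00) and a CIDv1 header."""
    multihash = encode_varint(0x00) + encode_varint(len(commp)) + commp
    return encode_varint(1) + encode_varint(codec) + multihash


def to_base32_multibase(data: bytes) -> str:
    """Multibase 'b' prefix followed by lowercase base32 without padding."""
    return "b" + base64.b32encode(data).decode("ascii").lower().rstrip("=")


commp = bytes.fromhex("97b8a11ea031fe2128ed396046af6353132e4a522345eede9f6aac7036284c02")
cid = commp_to_identity_cid(commp)
print(cid.hex())
print(to_base32_multibase(cid))
```

Note that 0xf101 does not fit in two varint bytes, which is why the codec shows up as the three bytes `81e203` in the hex form.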

If we have something that's not a serialization or an ipld "multicodec", does it make sense that you could make a CID from it? What does such a CID even represent and how would you use it in a way that requires encapsulation inside a CID? I suppose a disembodied CID that tells you it's a 0xf101 could at least tell you that you should be able to query Filecoin commitment piece API for information about it or use it in a retrieval request. I'm not so sure about what you'd use the others for outside of Filecoin itself.

@jbenet
Member

jbenet commented Apr 14, 2020

Quick notes -- not done but out of time, will return later on --

  • Poseidon should pass the multihash bar -- meaning that if we think the bar for inclusion to multihash excludes fns like poseidon because they're too new or too esoteric, then the bar needs redefining. I understand the goal of making sure "well known" implies "broadly adopted", but in order to be useful multihash should operate at a few levels earlier -- ie much more permissive/lenient. I think the level of well known-ness of Poseidon (which is very new, very esoteric) fits within the original intention for multihash.
  • this does not mean all implementations should include these functions, or even carry the entries in their local tables, but rather that the global table should allocate a number, and those implementations that want to can provide support
  • There are infinite numbers in the multicodec table, and not a lot of demand for them at present (affecting small numbers) such that we have to worry about denying codec inclusion. the important part is to be specific, not to save an entry in a sparse table
  • the broader goal of multiformats is to self-describe well-typed objects. It is better to assign a new number than not represent the information or fake it (identity/raw).
  • The internal hashing of all the relevant trees is -- by definition -- a proper merkle tree. One can either (a) treat all nodes in the tree as ipld objects (my preferred approach), or (b) elect to treat the whole intermediate tree as an implicit special purpose object/format for applications like filecoin.
  • Yes the truncated sha256 is weird, but it's not that weird. truncating cryptographic artifacts is not unusual in a lot of crypto systems. many crypto systems force funky constraints like that. For multihash, multicodec, and cids to be useful to crypto systems in the large, they will all have to learn to deal with them.
  • If cids and multiformats can't easily meet constraints like these from near-reach merkle-tree-galore applications like filecoin, i doubt their long-term utility. put another way: i think that success for these formats requires being useful for these sorts of use cases, and finding the right ways of getting all these cryptographic artifacts to coexist and play well together -- that's why multiformats calls for self-description: to let applications meet their constraints while also communicating them to other applications

@rvagg
Member

rvagg commented Apr 16, 2020

So we need to address the flexibility of the definition of "multihash" first.

There's 3 approaches:

  1. Nope - these things are too novel for "well-established", stick with opaque values and use identity
  2. Accept the novel hash functions used throughout the merkle trees in Filecoin as multihash entries, leaving identification a matter of identifying merkle nodes (primarily the roots but each node could be addressed in the same way all the way down).
  3. Accept the general merklisation processes, including their node hash functions, as multihash entries. We'd be opening an interesting door with this option and it's justifiable since you're still essentially just hashing some base value, you're just doing it in a very complicated way. But then we could do things like identify bitcoin transactions using their standard transaction IDs rather than having to interpret those IDs as roots of a merkle tree.

Having talked it through with @porcuquine, I've put up #170 and #171 to cover approach 2. I buy Poseidon as a valid multihash entry, it's just got a lot of possible permutations so we need to make sure we fit just enough (but not too much) information into the entries to sufficiently differentiate them. sha2-256-trunc2 is not unreasonable considering our existing dbl-sha2-256 entry; but it does open the door to weird and wonderful permutations of existing hash functions if we accept it.

If we accept those as valid extensions to multihash, then #172 could wrap them up as CIDs. Borrowing the first 3 entries from this PR. But we'd have to be clear that these identifiers, in the CID : Block relationship, are only identifying nodes (primarily roots) of merkle trees as their "Blocks", not underlying data.

@rvagg
Member

rvagg commented Apr 23, 2020

Summary of actions to get this closed out, please skim this and register any disagreement now @Stebalien @jbenet @whyrusleeping @porcuquine @mikeal @magik6k @dignifiedquire:

  • add sha2-256-trunc2 multihash: 0x1012 #170 adds a sha2-256-trunc254-padded multihash that's specifically SHA2-256 with the trailing two bits zeroed out (specifics in the table notes)
  • Add two Poseidon Filecoin variant multihashes #171 adds two multihashes: poseidon-bls12_381-a2-fc1 and poseidon-bls12_381-a2-fc1-sc for Poseidon with BLS12-381 arity of 2 and Filecoin specific parameters, with a high-security variant with additional circuits that's available if needed at some future date without needing to redo a secure setup for regenerating parameters.
  • add filecoin commitment merkle root codecs #172 adds two multicodecs, fil-commitment-unsealed for both CommP and CommD and fil-commitment-sealed for CommR. The unsealed get to share because the CommP merkle is strictly a subset of the full CommD merkle. These multicodecs are currently using the tag filecoin but we may change that to ipld in a future refactor of this table that cleans up tags for consistency. These new multicodecs strictly describe a node in a merkle tree--if you were to "load" them, you'd only get the node which could be unmarshalled by their respective codecs as a list of links to child nodes or base data.

@jbenet
Member

jbenet commented Apr 23, 2020

that works for me 👍

thank you very much for taking this on @rvagg -- i know this is really tricky work, with lots of ramifications, and i really appreciate the care and thoughtfulness you put into the whole thread of things

@porcuquine

Amen.

@rvagg
Member

rvagg commented Apr 29, 2020

We had a bit of a diversion in #172 over the nature of the "codecs", thanks to @hannahhoward for pinging us on that. My original attempt to shoehorn these in as IPLD-like codecs was a failure, but @vmx and I are mostly comfortable with moving forward with simply adding them as identifier descriptors.

So the multihash says "what hash function was involved in generating this content address" and the codec says "what type of content is this addressing" while stopping a bit short of saying "how can I decode the data if I were to fetch it", which is where we fail at these being true IPLD-like codecs. We get to differentiate between sealed and unsealed addresses (CIDs), but you couldn't practically traverse into them like an IPLD codec. The tag stays as filecoin for now rather than serialization or ipld to make this distinction.
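
A small sketch of how a consumer might read those two answers back out of a binary CIDv1. It assumes the standard varint-prefixed layout and uses the codes mentioned elsewhere in this thread; the lookup tables, type, and function names are illustrative only, and the all-zero digest is a placeholder rather than a real commitment.

```python
from typing import NamedTuple, Tuple

# Codes as listed in this thread; anything else is reported by its hex value.
MULTIHASH_NAMES = {
    0x00: "identity",
    0x1012: "sha2-256-trunc254-padded",
    0xB401: "poseidon-bls12_381-a2-fc1",
}
CODEC_NAMES = {
    0x55: "raw",
    0xF101: "fil-commitment-unsealed",
    0xF102: "fil-commitment-sealed",
}


class ParsedCid(NamedTuple):
    codec: str    # "what type of content is this addressing"
    hash_fn: str  # "what hash function was involved"
    digest: bytes


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode an unsigned varint at pos; return (value, position after it)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parse_cidv1(cid: bytes) -> ParsedCid:
    version, pos = read_varint(cid, 0)
    if version != 1:
        raise ValueError("only CIDv1 is handled in this sketch")
    codec, pos = read_varint(cid, pos)
    hash_code, pos = read_varint(cid, pos)
    length, pos = read_varint(cid, pos)
    digest = cid[pos:pos + length]
    if len(digest) != length:
        raise ValueError("digest shorter than its declared length")
    return ParsedCid(
        CODEC_NAMES.get(codec, hex(codec)),
        MULTIHASH_NAMES.get(hash_code, hex(hash_code)),
        digest,
    )


# version 1, codec 0xf101, multihash 0x1012, length 32, placeholder digest
example = bytes.fromhex("0181e203" + "9220" + "20" + "00" * 32)
parsed = parse_cidv1(example)
print(parsed.codec, parsed.hash_fn, len(parsed.digest))
```

Nothing here decodes the addressed data itself, which matches the point above: the CID identifies the kind of commitment and the hash function, not a traversable block format.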

We also had a good discussion about qualification for this table and how we need a clearer description of what gets to be in here and what doesn't. The purpose needs to be more clearly stated so contributions to the table are easier to approve or reject. There also needs to be some room for contributors to be able to justify inclusion for their own reasons if there isn't an obvious disqualification according to the purpose of the table.

In the case of Filecoin, we still don't fully understand why CIDs would be useful for each of the 3 cases, or where they might get used outside of the system for them to need an entry in here. If someone could explain that it would be interesting for us. But grokking all of these details is going to be too great a burden for maintenance of this table in general (for just these entries it's taken many collective hours of learning—this was positive for both @vmx and I but won't scale).

rvagg added a commit that referenced this pull request May 12, 2020
SHA2-256 with the trailing 2 bits zeroed out. Primary current use is
Filecoin.

Ref: #161
rvagg added a commit that referenced this pull request May 12, 2020
Reserving the 0xb400 range for Poseidon variants, allowing FIL to
iterate on the `fcX` extension of the name where they stay with
BLS12-381 and arity=2. High security variant is for extra circuits
that are usable in case new attacks arise from the standard variant.

Ref: #161
Ref: https://eprint.iacr.org/2019/458.pdf
rvagg added a commit that referenced this pull request May 12, 2020
These describe roots & nodes of a merkle tree, not the underlying
data. In the case of CommP and CommD they are binary merkle trees
using sha2-256-trunc2. For CommR they are novel structure merkle
trees using poseidon-bls12_381-a2-fc1.

All nodes of the respective merkle trees could also be described
using this codec if required, all the way to base data. It is
anticipated that the primary use will be restricted to the roots.

Ref: #161
Closes: #161
Closes: #167
@rvagg rvagg closed this in #172 May 12, 2020
rvagg added a commit to rvagg/go-multihash that referenced this pull request May 12, 2020
@rvagg
Member

rvagg commented May 12, 2020

sha2-256-trunc254-padded: 0x1012
poseidon-bls12_381-a2-fc1: 0xb401
poseidon-bls12_381-a2-fc1-sc: 0xb402
fil-commitment-unsealed: 0xf101
fil-commitment-sealed: 0xf102

8 participants