Skip to content
This repository has been archived by the owner on Jun 6, 2023. It is now read-only.

Murmur3 based HAMTs are not secure with attacker influenced keys #517

Closed
Stebalien opened this issue Jun 22, 2020 · 14 comments · Fixed by #769
Closed

Murmur3 based HAMTs are not secure with attacker influenced keys #517

Stebalien opened this issue Jun 22, 2020 · 14 comments · Fixed by #769
Assignees
Labels
P1 High priority, required for basic network functionality and growth
Milestone

Comments

@Stebalien
Copy link
Member

If an attacker can influence the keys used in a HAMT, the HAMT must use a cryptographically secure hash function. As-is, if an attacker can brute-force several (3) 64bit murmur3 hashes, they can prevent a key from being inserted into the HAMT.

Even if the key is a sha256 hash, brute-forcing 2**64 SHA256 hashes (enough to pick a specific murmur3 key) is well within the power of a bitcoin miner.

TL;DR: The hash needs to have at least 128 bits of security (but doesn't need to be safe against the birthday attack).

@ZenGround0
Copy link
Contributor

if an attacker can brute-force several (3) 64bit murmur3 hashes, they can prevent a key from being inserted into the HAMT.

Could you explain the mechanism behind this or link to somewhere it has already been discussed?

@Stebalien
Copy link
Member Author

If I know you're going to insert key Kt with hash Ht = murmur3(Kt).

  1. I can create keys Kx, Ky, and Kz such that the murmur3(K{x,y,z}) == Kt.
  2. Insert these into the HAMT.
  3. You'll then try to insert Kt.
  4. Try to insert a 4th key in this shard, hitting https://github.com/ipfs/go-hamt-ipld/blob/0310ad2b0b1f880f8412150aa7dd9b1e6b9fb79d/hamt.go#L337-L358 because the shard is full.
  5. Repeatedly deepen the hamt until you run out of bits in the hash.
  6. Return an error.

@Stebalien
Copy link
Member Author

@rvagg
Copy link
Member

rvagg commented Jul 22, 2020

Is this going to get locked into Lotus at some point soon and therefore need to go into the specification? I'm working on an appendix to the IPLD HashMap spec which can be linked from go-hamt-ipld and the Filecoin specs that outlines the precise ways that the Filecoin HAMT varies and would like to be able to say what Filecoin uses. I imagine this might also need a SHA2-256 implementation alongside the current default @ https://github.com/ipfs/go-hamt-ipld/blob/master/hash.go ?

@austinabell
Copy link
Contributor

austinabell commented Jul 22, 2020

TL;DR: The hash needs to have at least 128 bits of security (but doesn't need to be safe against the birthday attack).

Is there a reason against using the 128 bit murmur3? 64 bit version is just a chopped 128 and there doesn't seem to be a clear reason to me why 64 bit hashes are being used as they are not being persisted in any way (in rust impl we are using 128 bit murmur3 because it wasn't specified to specifically use 64 bit but this should be resolved soon so we don't hit this edge case).

Edit: Seems to me, if I'm not misremembering things, that you would have to brute force all these hashes: 3x first 60 bits matching, first 55 bits, first 50 bits, ... until the height 0. But yeah this doesn't seem much better. (with bit width 5 you don't even have to match the last 4 bits for the brute forced 3 keys)

@Stebalien
Copy link
Member Author

Murmur3 is not a cryptographic hash function. Technically murmur3-128 should be fine as long as the input keys are already cryptographic hashes of something, but that still makes me nervous (and I'm not even sure that that's the case here).

3x first 60 bits matching, first 55 bits, first 50 bits, ... until the height 0.

Not quite. Buckets only apply to leaves (i.e., three values to a bucket). Once you fill a bucket, you break it out into a new shard.

So you'd need to crack 3 hashes, not 3*64.

Is this going to get locked into Lotus at some point soon and therefore need to go into the specification?

@rvagg Which part? The fact that filecoin will use sha256?

@austinabell
Copy link
Contributor

Murmur3 is not a cryptographic hash function. Technically murmur3-128 should be fine as long as the input keys are already cryptographic hashes of something, but that still makes me nervous (and I'm not even sure that that's the case here).

I agree, just mentioning if you don't want actual breaking changes

3x first 60 bits matching, first 55 bits, first 50 bits, ... until the height 0.

Not quite. Buckets only apply to leaves (i.e., three values to a bucket). Once you fill a bucket, you break it out into a new shard.

Yes, I'm aware of this, I may have explained poorly. the 3 hashes you need are the ones in the leaf, and all of the other hashes to brute force (55, 50, 45, .., 5) are to push the height to the max. Because each item inserted to push the height needs to have 5*height bits matching to ensure the collision.

So you'd need to crack 3 hashes, not 3*64.

Sorry, bad wording, I meant just brute forcing 3 hashes with the first 60 bits matching, didn't mean to say you need 3*60 hashes

In any case, this is semantics, I agree with the outcome and let me know where I can find the final decision so we can make the change on our end

@Stebalien
Copy link
Member Author

Sorry, bad wording, I meant just brute forcing 3 hashes with the first 60 bits matching, didn't mean to say you need 3*60 hashes

Ah, yes, you're right. My point was that you don't need to brute-force at each level. If you brute-force 3 hashes at one level, then try to insert a 4th item with the same hash, that will immediately try to recursively expand the HAMT till it runs out of hash bits.

In any case, this is semantics, I agree with the outcome and let me know where I can find the final decision so we can make the change on our end

👍

I believe the final decision is "use sha256" but I'm not sure what remains to make that happen.

@austinabell
Copy link
Contributor

Ah, yes, you're right. My point was that you don't need to brute-force at each level. If you brute-force 3 hashes at one level, then try to insert a 4th item with the same hash, that will immediately try to recursively expand the HAMT till it runs out of hash bits.

No? Because if the hash is the same, the value will be overwritten. The hash has to be distinct, but match the previous bytes. I verified on the go impl and my impl but maybe I'm just overlooking something here.

@Stebalien
Copy link
Member Author

Because if the hash is the same, the value will be overwritten

That shouldn't be the case. We should get to:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L355

Then call:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L294

Then fail with ErrMaxDepth.

@austinabell
Copy link
Contributor

Because if the hash is the same, the value will be overwritten

That shouldn't be the case. We should get to:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L355

Then call:

https://github.com/ipfs/go-hamt-ipld/blob/21886a1edb600df1cc6cead007fcba408f47af91/hamt.go#L294

Then fail with ErrMaxDepth.

Ahhh you're definitely right, my fault. So basically you would need 4 hashes where the first 60 bits are the same and the last 4 different.

I would still argue even without this case being hit that 64 bit hashes would still cause issues because a collision would just overwrite the data and that seems equally as bad as not allowing certain values in.

@Stebalien
Copy link
Member Author

I agree, both cases are equally bad.

Stebalien added a commit to Stebalien/lotus that referenced this issue Jul 23, 2020
This way there's a single source of truth. Preparation for fixing
filecoin-project/specs-actors#517 (requires changing
HAMT parameters).
Stebalien added a commit to Stebalien/lotus that referenced this issue Jul 23, 2020
This way there's a single source of truth. Preparation for fixing
filecoin-project/specs-actors#517 (requires changing
HAMT parameters).
Stebalien added a commit that referenced this issue Jul 23, 2020
And switch to sha256 based HAMTs to fix #517.

Requires filecoin-project/lotus#2539 in lotus (so our
HAMTs remain compatible).
Stebalien added a commit that referenced this issue Jul 23, 2020
And switch to sha256 based HAMTs to fix #517.

Requires filecoin-project/lotus#2539 in lotus (so our
HAMTs remain compatible).
Stebalien added a commit that referenced this issue Jul 23, 2020
And switch to sha256 based HAMTs to fix #517.

Requires filecoin-project/lotus#2539 in lotus (so our
HAMTs remain compatible).
@anorth anorth added the P1 High priority, required for basic network functionality and growth label Jul 23, 2020
@anorth anorth added this to the 💙 Mainnet milestone Jul 23, 2020
@rvagg
Copy link
Member

rvagg commented Jul 23, 2020

Is this going to get locked into Lotus at some point soon and therefore need to go into the specification?

@rvagg Which part? The fact that filecoin will use sha256?

Yeah, I've pinned down the specifics of the Filecoin HAMT in ipld/specs#282 that we're using for spec documentation but that will need to reflect a change to SHA2-256 and that'll need to get communicated to other implementers too. I'm mostly interested in knowing how likely this change is.

There's also https://github.com/ipfs/go-hamt-ipld/issues/53 which is a potential breaking change that should get done (and documented) sooner than later or it'll become a big problem later.

I hope this isn't derailing this issue too much, but as a side note related to all of this: as I've pointed out before, chain height is the main mechanism available at the moment for versioning a lot of these things, but it's not a very nice way of versioning when you're trying to inspect independent portions of the chain, so IMO these things shouldn't be considered all that flexible over time, the fewer changes the better. Unless you want to add more explicit versioning fields to the data structures that consume the HAMT and other complex things that are likely to get tweaked over time. Another example of potential tweakage is the bitWidth of the HAMT which is currently set to 5 in Filecoin, which gives a branching factor of 32. We never did research into the costs of that decision and it's mostly a guess that it's a good tradeoff between storage size and mutation cost. I'm guessing that there may be a point when we can back up and do some research and want to change that number to either fatten or thin the tree to account for how it's being used with this specific data set.

@anorth
Copy link
Member

anorth commented Jul 23, 2020

The setting of bitwidth=5 is not a guess, but an empirically determined number from experiments performed by @hannahhoward and @acruikshank in mid-2019. However, the dominant workload for that HAMT was different to the workload now, so it's quite possible that the optimum is an adjacent value now. I do consider it pretty much locked-in for good for the HAMTs that will exist in code at genesis, though.

We should repeat the benchmarking.

Stebalien added a commit to filecoin-project/lotus that referenced this issue Jul 23, 2020
This way there's a single source of truth. Preparation for fixing
filecoin-project/specs-actors#517 (requires changing
HAMT parameters).
Stebalien added a commit to Stebalien/lotus that referenced this issue Jul 23, 2020
This way there's a single source of truth. Preparation for fixing
filecoin-project/specs-actors#517 (requires changing
HAMT parameters).
Stebalien added a commit that referenced this issue Jul 23, 2020
And switch to sha256 based HAMTs to fix #517.
sbwtw pushed a commit to sbwtw/lotus that referenced this issue Jul 25, 2020
This way there's a single source of truth. Preparation for fixing
filecoin-project/specs-actors#517 (requires changing
HAMT parameters).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
P1 High priority, required for basic network functionality and growth
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants