-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
k-bucket support for proper kad bootstrapping #38
k-bucket support for proper kad bootstrapping #38
Conversation
@Stebalien This is the k-bucket work for correct dht bootstrapping. Please review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other random comment: these functions should have comments describing what they do and any requirements (e.g., the slice returned by GetAllBuckets
must not be modified or appended to).
func (rt *RoutingTable) GetAllBuckets() []*Bucket { | ||
rt.tabLock.RLock() | ||
defer rt.tabLock.RUnlock() | ||
return rt.Buckets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment explaining why this is safe (it's append-only).
if CommonPrefixLen(ConvertPeerID(peerID), rt.local) == targetCpl { | ||
return peerID, err | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. This is very unfortunate. Given our network size, we should have at least 16 buckets). That's 2**16 hashes to find a peer in that last bucket.
I don't know how to fix this on our current DHT. Given how we've defined the protocol, we have to know the un-hashed key.
Honestly, I think the best solution is to pre-compute a table with a few known-good bootstrap keys in each of the first 20 buckets. If you can write a program to do this (ideally highly parallel), I can run it on a really beefy machine for a few days.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing context here, but a couple comments:
- Why do we need a public non-test function for computing a random peer ID?
- Even if we needed to generate random peer IDs we don't need to hash anything since the
randPeerID
function does not return the material that gets hashed into the PeerID. As a result, if I haven
buckets and want bucketi
that has 20 matching prefix bits with me all I should need to do is randomly generate a single number from 0 to 2^(n-20) and then just prepend the matching 20 bits. So very little computation (one random number generation) should be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann this is for libp2p/go-libp2p-kad-dht#375 (I've updated the PR description).
2
Due to how our DHT works, the we need to send the unhashed key with the request. That means we need to know the unhashed key. Kademlia as described by the paper assumes that all keys will be pre-hashed. If we wanted to do that, we'd have to pre-hash the keys when we store records in the datastore (e.g., peer routing information).
We could modify the DHT to send both the unhashed and hash keys but that seems like a lot of extra work just to support bootstrapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien thx for the PR pointer. Fair enough about not wanting to do much changing to the dht message types if we don't have to. However, I'm not sure the solution of pre-generating keys helps. We can pre-generate keys for a single peer, but generating them for all peers is going to need require packaging a ton of keys (because the bucket distance is computed based on your local ID).
In any event it still doesn't seem like we need the hash function inside of randPeerID
, just the one in ConvertPeerID
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah, now I feel like an idiot. This isn't going to work (we'd need to ship ~1m peer IDs).
In any event it still doesn't seem like we need the hash function inside of randPeerID, just the one in ConvertPeerID.
The issue is that we need to make a DHT request with this key. To make the DHT request, we need to know the non-hashed key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out this didn't need a beefy machine and we really don't need 1m peer IDs. I've created a branch with a table mapping 20 bit hash prefixes to numbers that can be used to generate peer IDs that hash to these 20 bit hash prefixes: https://github.com/libp2p/go-libp2p-kbucket/tree/feat/bucket-prefix-gen
More concretely, to find a peer ID in a bucket N, we'll have to:
- Generate a random 20 bit number.
- Hash our peer ID (sha256)
- Replace the first N bits with the first N bits from our hashed peer ID.
- Lookup this random number in the table in the table to get SEED.
- Encode SEED into a 32 byte slice using BigEndian ordering.
- Use this slice as the hash digest in a sha256 multihash.
- This multihash is the peer ID.
When this peer ID is hashed, the first N bits should match the result of step (2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside is that this increases our binary size by 4MiB. But I'm not that concerned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reduced it to 16 bits (16 buckets) to reduce the size to 256KiB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien This was a really cool idea ! Have rebased on top of your branch & made all the changes. Let me know what you think.
PS: Made some minor changes to generate/main.go(line 31 & 58) & generated a new prefix-map so we can use it exactly as you've mentioned in the algorithm above.
Basically, instead of crafting the multi-hash like this :
[]byte{mh.Sha256, 32, 24 zero byte slices, 8 bytes of an Uint64}
We craft it as:
[]byte{mh.Sha256, 32, 8 bytes of an Uint64, 24 zero byte slices}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, apologies for the delay in getting this up. Things were a bit hectic at work.
table.go
Outdated
if _, err := io.ReadFull(r, buf); err != nil { | ||
return "", err | ||
} | ||
h, _ := mh.Sum(buf, mh.SHA2_256, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From above:
In any event it still doesn't seem like we need the hash function inside of randPeerID, just the one in ConvertPeerID.
The issue is that we need to make a DHT request with this key. To make the DHT request, we need to know the non-hashed key.
Sorry for not making this clearer. We still have to hash to turn peerID
into a kad appropriate hash. However, in randPeerID
there's no reason we need to hash random bytes when we could just craft a multihash with some random bytes in it (e.g. mh.Encode(buf, mh.ID)
might work well and buf only needs to be 8 bytes long since 2^8=256
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann Good point ! We've done something similar now.
@Stebalien has come up with a mapping m such that
Let k be a random uint64.
Let crmh(k) = A crafted sha2-256 multihash where the hash digest is k.
Then, m[first sixteen bits of sha2-256(crmh(k))] = k.
I've rebased on top of his branch and changed the code accordingly. Do let me know what you think !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable to me, clever that we can avoid sending some bits on the wire by just using a bunch of storage (proportional to global network size).
b1d55d1
to
776df76
Compare
table.go
Outdated
log.Debugf("failed to generate random peerID in bucket %d, error is %+v", bucketID, err) | ||
continue | ||
// generate random 16 bits | ||
r := rand.New(rand.NewSource(time.Now().UnixNano())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just single call to rand after rand.New
can be biased. The rand
should probably be in *RoutingTable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Kubuxu That sounds interesting. Even if I seed it with the current time in nanos ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, pseudo-random algorithms provide pseudo-random numbers if you use them for some time. There can be an initial bias if you use only the one number from the prng. It depends on the implementation of the prng itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless we want to be able to manipulate the random number (i.e. for testing) why don't we just use rand's built in global? Are we concerned about contention on that lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the built-in global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay. It's been a bit busy here.
bucket.go
Outdated
@@ -108,3 +127,5 @@ func (b *Bucket) Split(cpl int, target ID) *Bucket { | |||
} | |||
return newbuck | |||
} | |||
|
|||
//go:generate go run ./generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(should probably put this at the top, actually.
generate/main.go
Outdated
@@ -55,7 +56,7 @@ func main() { | |||
|
|||
printf("package %s\n\n", pkg) | |||
printf("// Code generated by generate/generate_map.go DO NOT EDIT\n") | |||
printf("var keyPrefixMap = [...]uint32{") | |||
printf("var keyPrefixMap = [...]uint64{") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint32 should be sufficient. Using uint64 will double the size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien Makes sense. Have also changed the map generation logic to use uint32 instead of uint64 to craft the hash digest.
table.go
Outdated
log.Debugf("failed to generate random peerID in bucket %d, error is %+v", bucketID, err) | ||
continue | ||
// generate random 16 bits | ||
r := rand.New(rand.NewSource(time.Now().UnixNano())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the built-in global.
generate/main.go
Outdated
hasher := sha256.New() | ||
|
||
for i := uint64(0); count < target; i++ { | ||
binary.BigEndian.PutUint64(inp[2:], i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we had to do this. Putting this at the end should have worked just as well (but either way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think of it, you are right. This will work just as well if I put the integer in the same position while crafting the multihash.
Just seemed liked a good idea at that time 😄
table.go
Outdated
} | ||
|
||
// GenRandPeerID generates a random peerID in bucket=bucketID | ||
func (rt *RoutingTable) GenRandPeerID(bucketID int) (peer.ID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, let's just panic if the user passes a negative bucket. That's a program error, not a runtime error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien Thanks for the beautiful bit magic. Just had to make some changes to it. Have explained them in this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also remove the error from the signature.
} | ||
|
||
// test generate rand peer ID | ||
for bucketID := 0; bucketID < nBuckets; bucketID++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's test this with with more buckets (40?) to check for overflow issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Stebalien Generating 40 buckets takes a lot of time as a new bucket is created ONLY IF the last bucket is full. This slows down the tests. So testing with 21 for now as it's still an overflow & hits all the code-paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be slow... We shouldn't have to generate 40 buckets, just ask for peer IDs in buckets 1-40 (where everything after 16 would just be random IDs.
But this should be good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I misunderstood your initial comment. So you meant simply passing in a bucketID of upto 40 to the GenRandPeerID method & not actually having 40 buckets. That makes more sense. Apologies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But really, anything over 16 is plenty.
Fix bootstrapping id generation logic
@Stebalien Have addressed your comments & made the changes. Please take a look. Thanks ! |
2e4fbe5
to
71292a3
Compare
71292a3
to
d5e5a48
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two small changes but otherwise LGTM. Thanks for catching the bit toggling issue!
table.go
Outdated
@@ -70,7 +70,7 @@ func (rt *RoutingTable) GetAllBuckets() []*Bucket { | |||
// GenRandPeerID generates a random peerID in bucket=bucketID | |||
func (rt *RoutingTable) GenRandPeerID(bucketID int) (peer.ID, error) { | |||
if bucketID < 0 { | |||
return "", errors.New("bucketID must be non-negative") | |||
panic(errors.New(fmt.Sprintf("bucketID %d is not non-negative", bucketID))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: error.New
isn't necessary.
table.go
Outdated
} | ||
|
||
// GenRandPeerID generates a random peerID in bucket=bucketID | ||
func (rt *RoutingTable) GenRandPeerID(bucketID int) (peer.ID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also remove the error from the signature.
@Stebalien Have made all the changes. I think PR can be merged now. Thanks ! |
For libp2p/go-libp2p-kad-dht#375.