RFM-16: Effectiveness of Bitswap Discovery Process #25
Conversation
Modifying the value of `ProviderSearchDelay` seems to have a limited global impact, as it concerns only 1%-2% of all Bitswap requests. However, we believe it is worth accelerating these requests. Another obvious improvement is reducing Bitswap flooding, as it causes a lot of, sometimes unnecessary, overhead in the network.

### Removing the `ProviderSearchDelay`
From our measurements, we observed that Bitswap content discovery is very efficient; however, every Bitswap request generates more than 1700 messages on average, which is substantial. For the specific content that isn't found by Bitswap discovery (1%-2% of the observed traffic), the DHT lookup is currently delayed by 1 second, a significant overhead. From the results of this study, we propose setting the `ProviderSearchDelay` to `0` for standard `kubo` nodes, or in other words, starting the DHT lookup concurrently with the Bitswap discovery process.
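For illustration, a minimal sketch of how the delay could be zeroed out when constructing a Bitswap client. This assumes the go-bitswap `ProviderSearchDelay` option and `New` constructor signature of the time, so treat the wiring as schematic rather than definitive:

```go
package main

import (
	"context"
	"time"

	bitswap "github.com/ipfs/go-bitswap"
	bsnet "github.com/ipfs/go-bitswap/network"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
	exchange "github.com/ipfs/go-ipfs-exchange-interface"
)

// newExchange wires up a Bitswap client whose DHT provider search starts
// immediately (delay 0) instead of after the default 1 second.
func newExchange(ctx context.Context, net bsnet.BitSwapNetwork, bs blockstore.Blockstore) exchange.Interface {
	return bitswap.New(ctx, net, bs,
		bitswap.ProviderSearchDelay(0*time.Second), // default: 1 * time.Second
	)
}
```

If I'm not mistaken, kubo also exposes this knob in its config as `Internal.Bitswap.ProviderSearchDelay`, so one could experiment with different values without a code change.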
> From the results of this study, we propose to set the
I actually don't follow this conclusion. Bitswap seems to be great and we don't need the DHT overhead 🤷
Even if Bitswap is fast and accurate, it would be good to start the DHT walk concurrently, to make it faster.
The tradeoff here is to send 3-6 additional messages (~1700 are already sent) to decrease the tail latency by 1 second. In 98% of the cases, content resolution would take the same time, but in 2% of the cases it would take 1s less.
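To make the tradeoff concrete, a back-of-the-envelope calculation using the numbers quoted in this thread (2% of requests save the full 1s delay, ~4.5 extra messages on average on top of ~1700):

$$
\mathbb{E}[\text{latency saved}] = 0.98 \cdot 0\,\text{s} + 0.02 \cdot 1\,\text{s} = 20\,\text{ms per request}, \qquad \text{message overhead} \approx \frac{4.5}{1700} \approx 0.26\%
$$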
@guillaumemichel the overhead of the DHT is mainly related to the connection establishment part, which takes several RTTs to complete. In terms of messages sent it is minimal, indeed. I'm still not opposed to the 0ms setting here, although given the very small percentage of queries not found through Bitswap, it might be more reasonable to consider setting it to 400ms.
I agree that setting the `ProviderSearchDelay` to 0ms would be a minor improvement vs 400ms or even 1s. However, the price to pay for this small improvement is even more negligible.
Moreover, 0ms would be appropriate for all peers, even poorly connected ones, and we don't introduce a new magic number :)
Starting the DHT walk concurrently with the Bitswap request would imply initially sending [$\alpha=3$](https://github.com/libp2p/specs/tree/master/kad-dht#alpha-concurrency-parameter-%CE%B1) additional messages. If we get the block from Bitswap before we hear back from the DHT, the DHT walk is aborted, so the overhead is limited to 3 messages. If we hear back from the DHT before getting a block from Bitswap, the DHT walk continues: one additional message is sent for each response we get from the DHT. Note that the number of in-flight DHT requests is limited to 3. Therefore, in the case where the Bitswap request is successful, we expect between 3 and 6 additional messages, which represents an overhead of $\frac{4.5}{1720}=0.262\%$. In the case where the Bitswap request doesn't succeed within the first second, no additional messages are sent and the node doesn't wait one full second in vain.
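To illustrate the abort-on-first-win behaviour described above, a minimal sketch in plain Go; `fetchViaBitswap` and `fetchViaDHTProviders` are hypothetical stand-ins, not real go-bitswap or go-libp2p-kad-dht calls:

```go
package main

import (
	"context"
	"errors"

	blocks "github.com/ipfs/go-block-format"
	cid "github.com/ipfs/go-cid"
)

// Hypothetical stand-ins for the Bitswap broadcast and the
// DHT provider-search + fetch code paths.
func fetchViaBitswap(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	return nil, errors.New("stub")
}
func fetchViaDHTProviders(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	return nil, errors.New("stub")
}

// getBlock starts both discovery paths at once (ProviderSearchDelay = 0).
// Whichever path delivers the block first wins; the deferred cancel aborts
// the in-flight requests of the losing path, which is what bounds the
// overhead to a handful of messages.
func getBlock(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	result := make(chan blocks.Block, 1)
	for _, fetch := range []func(context.Context, cid.Cid) (blocks.Block, error){
		fetchViaBitswap, fetchViaDHTProviders,
	} {
		fetch := fetch
		go func() {
			if b, err := fetch(ctx, c); err == nil {
				select {
				case result <- b: // first winner only
				default:
				}
			}
		}()
	}

	select {
	case b := <-result:
		return b, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```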
Okay, I see. However, I think the connection establishment to other DHT peers is super expensive compared to the Bitswap message exchange with an already connected peer. So I think this may not be a fair comparison.
> $\frac{4.5}{1720}=0.262\%$

the `%` isn't rendered correctly :/
> connection establishment to other DHT peers is super expensive

IIUC, connection establishment is expensive in time (e.g. multiple RTTs are required to open a connection to a new peer). So it is true that it may require more messages than planned (e.g. if 3 RTTs are required to open a new connection plus the DHT request, it would make 9-18 additional messages), but this remains very small compared to the number of messages sent by Bitswap.
We are trading more messages for more speed.
The steps can be defined either by setting a fixed number of peers to contact at each step, or by setting a percentage of block coverage that should be served by the contacted peers. The number of peers contacted in each step should start small and follow exponential growth. Ideally, a score for each provider over the past time period should be saved to disk on node shutdown.
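A minimal sketch of such an escalating broadcast, under assumed parameters (initial batch of 4, 250 ms per step, growth factor 2) and with a hypothetical `sendWant` helper:

```go
package main

import (
	"context"
	"time"

	cid "github.com/ipfs/go-cid"
	peer "github.com/libp2p/go-libp2p/core/peer"
)

// sendWant is a hypothetical helper that sends a Bitswap WANT for c to p.
func sendWant(ctx context.Context, p peer.ID, c cid.Cid) { /* ... */ }

// broadcastInSteps contacts peers in exponentially growing batches instead
// of flooding every connected peer at once. ranked should list peers by
// their provider score, best first; found is signalled when a batch yields
// the block, which stops the escalation.
func broadcastInSteps(ctx context.Context, c cid.Cid, ranked []peer.ID, found <-chan struct{}) {
	batch := 4                     // assumed initial step size
	step := 250 * time.Millisecond // assumed per-step wait before escalating
	for i := 0; i < len(ranked); {
		end := i + batch
		if end > len(ranked) {
			end = len(ranked)
		}
		for _, p := range ranked[i:end] {
			go sendWant(ctx, p, c)
		}
		select {
		case <-found: // a contacted peer has the block
			return
		case <-ctx.Done():
			return
		case <-time.After(step): // no luck yet; try a larger batch
		}
		i = end
		batch *= 2 // exponential growth in peers per step
	}
}
```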
#### Request Context
Another possibility is to use contexts in content routing. A CID could be bundled with additional information, such as a prefix providing more information about the content type, or where it might be stored. For instance, if the requested content is an NFT, it is likely to be served by a nft.storage peer, so we want the node to broadcast the request to nft.storage peers only. Each node could maintain a list of potential providers for each context, and prioritize requesting the peers associated with the request's context. Other systems, such as the Indexers and DHTs, could benefit from context routing too. However, this idea goes against the flat-namespace principle that IPFS uses.
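For illustration, a small sketch of what a per-context provider table could look like; the type, the labels, and the fallback behaviour are all assumptions, not an existing IPFS API:

```go
package main

import (
	peer "github.com/libp2p/go-libp2p/core/peer"
)

// contextProviders maps a request-context label (e.g. "nft") to peers that
// previously served content of that kind. A real implementation would score
// entries and persist them across restarts.
type contextProviders map[string][]peer.ID

// candidates returns the peers to try first for a request context, falling
// back to the regular broadcast set when the context is unknown.
func (cp contextProviders) candidates(label string, fallback []peer.ID) []peer.ID {
	if ps, ok := cp[label]; ok && len(ps) > 0 {
		return ps // e.g. nft.storage peers for the "nft" context
	}
	return fallback
}
```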
This reminds me of magnet links: https://en.wikipedia.org/wiki/Magnet_URI_scheme
This study showed Bitswap to be a fast and accurate means of finding content in the IPFS network, with a discovery success rate of 98%, and 75% of the content fetched within 200 ms. However, we measured that Bitswap effectively floods the network, soliciting 853 peers per request on average and sending a total of 1714 messages. The high success rate can be explained by the fact that most content is served by a very small number of peers: roughly 10 peers serve 60% of the content requested in our study. Over time, nodes will eventually discover these super providers, and hence requesting content from these peers is likely to result in a successful fetch. We cannot be certain that the list of CIDs we used for our measurements is fully representative of IPFS traffic, but we double-checked by taking 2 different sources of CIDs, and the results were similar.
In order to accelerate Bitswap, we suggest removing the `ProviderSearchDelay` and starting the DHT lookup concurrently with the Bitswap broadcast. The network overhead is minimal (~0.26%), and the tail latency decreases by 1 second. If removing the `ProviderSearchDelay` isn't an option, decreasing its value would help. Limiting the query broadcasts from Bitswap would help reduce the traffic in the network. A significant improvement would be to carefully select the peers from which we request content, rather than flooding the network with a broadcast.
As said above, I think the network overhead is significantly higher - it could still be worth it though. Things to consider:
- how often we need to connect to a peer during the DHT walk (we could already be connected to it, with a ~5% chance)
- what the overhead of a connection establishment (handshakes) is

I think if we had these numbers we could compare it with the Bitswap message exchange.
I asked this question in #libp2p-chatter. Anyway, if Bitswap returns the block while the DHT handshakes are still ongoing, all DHT traffic is aborted. Hence, we don't expect more than 3-6 additional messages if the content is fetched by Bitswap in 200ms. If the content is fetched in 500ms, then we may have 6-9 additional messages, etc.
Suppose we want to act upon the `ProviderSearchDelay` and reduce it to 500 ms. Now we compare this solution with Bitswap and the DHT starting concurrently.
- Concurrent DHT & Bitswap would reduce tail latency by 500 ms.
- Concurrent DHT & Bitswap adds as many messages as the DHT can exchange with at most 3 peers in 500 ms. This number is expected to be very limited (<10), unless many of the DHT server nodes are a very small ping distance away, plus a correction term for the case where Bitswap itself takes longer than 500 ms, which is also expected to be very small (spelled out below).
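Spelling out that correction term in notation of my own (none of these symbols appear in the comment above): with $p = \Pr[\text{Bitswap needs} > 500\,\text{ms}]$, $t_{bs}$ the time Bitswap takes to find the content, and $r$ the DHT message rate towards at most 3 servers,

$$
\mathbb{E}[\text{extra messages}] \approx m_{500} + p \cdot (t_{bs} - 0.5\,\text{s}) \cdot r
$$

where $m_{500} < 10$ is the number of DHT messages exchanged during the first 500 ms.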
Co-authored-by: Dennis Trautwein <dennis.trautwein@posteo.de>
Excellent report! Several nitpicks and beautification suggestions, and a few comments on structural changes. Thanks a lot @guillaumemichel! Great work!
Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Thank you @guillaumemichel! This has been a good read.
Again, thank you for the extensive write-up!
| 8. | 12D3KooWENiDwDCPnbEQKHHsDnSsE5Y3oLyXnxuyhjcCEBK9TvkU | 2051 |
| 9. | 12D3KooWC9L4RjPGgqpzBUBkcVpKjJYofCkC5i5QdQftg1LdsFb2 | 1826 |
As this list may not be up-to-date, it is very likely that all peers in this list are actually operated by nft.storage.
For what it is worth, these two peers (out of the 4 non-nft.storage peers) are the only peers running `kubo/0.17.0/4485d6b` using an ovh.net IP address. The nft.storage nodes as well as the remaining non-nft.storage nodes seem to be running the same kubo version, namely `kubo/0.14.0/e0fabd6`, where reverse DNS resolves to nsone.net.
Let me know in case further digging here is helpful. Feel free to ignore.
Very interesting
Co-authored-by: Max Inden <mail@max-inden.de>
Great work @guillaumemichel! Some minor edit suggestions. Have a look and merge! 🚀
Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Here comes the long-awaited RFM16 report on the Effectiveness of the Bitswap Discovery Process.
All feedback is welcome.
Note: The measurement tool implementation and the data analysis scripts are not included yet, but will be added before this PR is merged.