RFM-16: Effectiveness of Bitswap Discovery Process #25
Conversation
Modifying the value of `ProviderSearchDelay` seems to have a limited global impact, as it concerns only 1%-2% of all Bitswap requests. However, we believe it is worth accelerating these requests. Another obvious improvement is reducing Bitswap flooding, as it causes a lot of, sometimes unnecessary, overhead in the network.

### Removing the `ProviderSearchDelay`
From our measurements, we observed that Bitswap content discovery is very efficient; however, every Bitswap request generates more than 1700 messages on average, which is substantial. For the specific content that isn't found by Bitswap discovery (1%-2% of the observed traffic), the DHT lookup is currently delayed by 1 second, a significant overhead. From the results of this study, we propose setting the `ProviderSearchDelay` to `0` for standard `kubo` nodes, or in other words, starting the DHT lookup concurrently with the Bitswap discovery process.
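For illustration, a minimal sketch of how the delay could be zeroed out when constructing a Bitswap client. This assumes the go-bitswap `ProviderSearchDelay` option and `New` constructor signature of the time, so treat the wiring as schematic rather than definitive:

```go
package main

import (
	"context"
	"time"

	bitswap "github.com/ipfs/go-bitswap"
	bsnet "github.com/ipfs/go-bitswap/network"
	blockstore "github.com/ipfs/go-ipfs-blockstore"
	exchange "github.com/ipfs/go-ipfs-exchange-interface"
)

// newExchange wires up a Bitswap client whose DHT provider search starts
// immediately (delay 0) instead of after the default 1 second.
func newExchange(ctx context.Context, net bsnet.BitSwapNetwork, bs blockstore.Blockstore) exchange.Interface {
	return bitswap.New(ctx, net, bs,
		bitswap.ProviderSearchDelay(0*time.Second), // default: 1 * time.Second
	)
}
```

If I'm not mistaken, kubo also exposes this knob in its config as `Internal.Bitswap.ProviderSearchDelay`, so one could experiment with different values without a code change.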
> From the results of this study, we propose to set the
I actually don't follow this conclusion. Bitswap seems to be great and we don't need the DHT overhead 🤷
Even if Bitswap is fast and accurate, it would be good to start the DHT walk concurrently, to make it faster.
The tradeoff here is to send 3-6 additional messages (~1700 are already sent) to decrease the tail latency by 1 second. In 98% of the cases, content resolution would take the same time, but in 2% of the cases it would take 1s less.
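To make the tradeoff concrete, a back-of-the-envelope calculation using the numbers quoted in this thread (2% of requests save the full 1s delay, ~4.5 extra messages on average on top of ~1700):

$$
\mathbb{E}[\text{latency saved}] = 0.98 \cdot 0\,\text{s} + 0.02 \cdot 1\,\text{s} = 20\,\text{ms per request}, \qquad \text{message overhead} \approx \frac{4.5}{1700} \approx 0.26\%
$$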
@guillaumemichel the overhead of the DHT is mainly related to the connection establishment part, which takes several RTTs to complete. In terms of messages sent it is minimal, indeed. I'm still not opposed to the 0ms setting here, although given the very small percentage of queries not found through Bitswap, it might be more reasonable to consider setting it to 400ms.
I agree that setting the `ProviderSearchDelay` to 0ms would be a minor improvement vs 400ms or even 1s. However, the price to pay for this small improvement is even more negligible.
Moreover, 0ms would be appropriate for all peers, even poorly connected ones, and we don't introduce a new magic number :)
Starting the DHT walk concurrently with the Bitswap request would imply initially sending [$\alpha=3$](https://github.com/libp2p/specs/tree/master/kad-dht#alpha-concurrency-parameter-%CE%B1) additional messages. If we get the block from Bitswap before we hear back from the DHT, the DHT walk is aborted, so the overhead is limited to 3 messages. If we hear back from the DHT before getting a block from Bitswap, the DHT walk continues: one additional message is sent for each response we get from the DHT. Note that the number of in-flight DHT requests is limited to 3. Therefore, in the case where the Bitswap request is successful, we expect between 3 and 6 additional messages, which represents an overhead of $\frac{4.5}{1720}=0.262\%$. In the case where the Bitswap request doesn't succeed within the first second, no additional messages are sent and the node doesn't wait one full second in vain.
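To illustrate the abort-on-first-win behaviour described above, a minimal sketch in plain Go; `fetchViaBitswap` and `fetchViaDHTProviders` are hypothetical stand-ins, not real go-bitswap or go-libp2p-kad-dht calls:

```go
package main

import (
	"context"
	"errors"

	blocks "github.com/ipfs/go-block-format"
	cid "github.com/ipfs/go-cid"
)

// Hypothetical stand-ins for the Bitswap broadcast and the
// DHT provider-search + fetch code paths.
func fetchViaBitswap(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	return nil, errors.New("stub")
}
func fetchViaDHTProviders(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	return nil, errors.New("stub")
}

// getBlock starts both discovery paths at once (ProviderSearchDelay = 0).
// Whichever path delivers the block first wins; the deferred cancel aborts
// the in-flight requests of the losing path, which is what bounds the
// overhead to a handful of messages.
func getBlock(ctx context.Context, c cid.Cid) (blocks.Block, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	result := make(chan blocks.Block, 1)
	for _, fetch := range []func(context.Context, cid.Cid) (blocks.Block, error){
		fetchViaBitswap, fetchViaDHTProviders,
	} {
		fetch := fetch
		go func() {
			if b, err := fetch(ctx, c); err == nil {
				select {
				case result <- b: // first winner only
				default:
				}
			}
		}()
	}

	select {
	case b := <-result:
		return b, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```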
Okay, I see. However, I think the connection establishment to other DHT peers is super expensive compared to the Bitswap message exchange with an already connected peer. So I think this may not be a fair comparison.
> $\frac{4.5}{1720}=0.262\%$

the `%` isn't rendered correctly :/
> connection establishment to other DHT peers is super expensive

IIUC, connection establishment is expensive in time (e.g. multiple RTTs are required to open a connection to a new peer). So it is true that it may require more messages than planned (e.g. if 3 RTTs are required to open a new connection plus the DHT request, it would make 9-18 additional messages), but this remains very small compared to the number of messages sent by Bitswap.
We are trading more messages for more speed.
The steps can be defined either by setting a fixed number of peers to contact at each step, or by setting a percentage of block coverage that should be served by the contacted peers. The number of peers contacted in each step should start small and follow exponential growth. Ideally, a score for each provider over the past time period should be saved to disk on node shutdown.
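A minimal sketch of such an escalating broadcast, under assumed parameters (initial batch of 4, 250 ms per step, growth factor 2) and with a hypothetical `sendWant` helper:

```go
package main

import (
	"context"
	"time"

	cid "github.com/ipfs/go-cid"
	peer "github.com/libp2p/go-libp2p/core/peer"
)

// sendWant is a hypothetical helper that sends a Bitswap WANT for c to p.
func sendWant(ctx context.Context, p peer.ID, c cid.Cid) { /* ... */ }

// broadcastInSteps contacts peers in exponentially growing batches instead
// of flooding every connected peer at once. ranked should list peers by
// their provider score, best first; found is signalled when a batch yields
// the block, which stops the escalation.
func broadcastInSteps(ctx context.Context, c cid.Cid, ranked []peer.ID, found <-chan struct{}) {
	batch := 4                     // assumed initial step size
	step := 250 * time.Millisecond // assumed per-step wait before escalating
	for i := 0; i < len(ranked); {
		end := i + batch
		if end > len(ranked) {
			end = len(ranked)
		}
		for _, p := range ranked[i:end] {
			go sendWant(ctx, p, c)
		}
		select {
		case <-found: // a contacted peer has the block
			return
		case <-ctx.Done():
			return
		case <-time.After(step): // no luck yet; try a larger batch
		}
		i = end
		batch *= 2 // exponential growth in peers per step
	}
}
```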
#### Request Context
Another possibility is to use contexts in content routing. A CID could be bundled with additional information, such as a prefix providing more information about the content type, or where it might be stored. For instance, if the requested content is an NFT, it is likely to be served by a nft.storage peer, so we want the node to broadcast the request to nft.storage peers only. Each node could maintain a list of potential providers for each context, and prioritize requesting the peers associated with the request's context. Other systems, such as the Indexers and DHTs, could benefit from context routing too. However, this idea goes against the flat-namespace principle that IPFS uses.
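For illustration, a small sketch of what a per-context provider table could look like; the type, the labels, and the fallback behaviour are all assumptions, not an existing IPFS API:

```go
package main

import (
	peer "github.com/libp2p/go-libp2p/core/peer"
)

// contextProviders maps a request-context label (e.g. "nft") to peers that
// previously served content of that kind. A real implementation would score
// entries and persist them across restarts.
type contextProviders map[string][]peer.ID

// candidates returns the peers to try first for a request context, falling
// back to the regular broadcast set when the context is unknown.
func (cp contextProviders) candidates(label string, fallback []peer.ID) []peer.ID {
	if ps, ok := cp[label]; ok && len(ps) > 0 {
		return ps // e.g. nft.storage peers for the "nft" context
	}
	return fallback
}
```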
This reminds me of magnet links: https://en.wikipedia.org/wiki/Magnet_URI_scheme
This study showed Bitswap to be a fast and accurate means of finding content in the IPFS network, with a discovery success rate of 98%, and 75% of the content fetched within 200 ms. However, we measured that Bitswap effectively floods the network, soliciting 853 peers per request on average and sending a total of 1714 messages. The high success rate can be explained by the fact that most content is served by a very small number of peers: roughly 10 peers serve 60% of the content requested in our study. Over time, nodes will eventually discover these super providers, and hence requesting content from these peers is likely to result in a successful fetch. We cannot be certain that the list of CIDs we used for our measurements is fully representative of IPFS traffic, but we double-checked by taking 2 different sources of CIDs, and the results were similar.
In order to accelerate Bitswap, we suggest removing the `ProviderSearchDelay` and starting the DHT lookup concurrently with the Bitswap broadcast. The network overhead is minimal (~0.26%), and the tail latency decreases by 1 second. If removing the `ProviderSearchDelay` isn't an option, decreasing its value would help. Limiting the query broadcasts from Bitswap would help reduce the traffic in the network. A significant improvement would be to carefully select the peers from which we request content, rather than flooding the network with a broadcast.
As said above, I think the network overhead is significantly higher - it could still be worth it though. Things to consider:
- how often we need to connect to a peer during the DHT walk (we could already be connected to it, with a ~5% chance)
- what the overhead of a connection establishment (handshakes) is

I think if we had these numbers we could compare it with the Bitswap message exchange.
I asked this question in #libp2p-chatter. Anyway, if Bitswap returns the block while the DHT handshakes are still ongoing, all DHT traffic is aborted. Hence, we don't expect more than 3-6 additional messages if the content is fetched by Bitswap in 200ms. If the content is fetched in 500ms, then we may have 6-9 additional messages, etc.
Suppose we want to act upon the `ProviderSearchDelay` and reduce it to 500 ms. Now we compare this solution with Bitswap and the DHT starting concurrently.
- Concurrent DHT & Bitswap would reduce tail latency by 500 ms.
- Concurrent DHT & Bitswap adds as many messages as the DHT can exchange with at most 3 peers in 500 ms. This number is expected to be very limited (<10), unless many of the DHT server nodes are a very small ping distance away, plus a correction term for the case where Bitswap itself takes longer than 500 ms, which is also expected to be very small (spelled out below).
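Spelling out that correction term in notation of my own (none of these symbols appear in the comment above): with $p = \Pr[\text{Bitswap needs} > 500\,\text{ms}]$, $t_{bs}$ the time Bitswap takes to find the content, and $r$ the DHT message rate towards at most 3 servers,

$$
\mathbb{E}[\text{extra messages}] \approx m_{500} + p \cdot (t_{bs} - 0.5\,\text{s}) \cdot r
$$

where $m_{500} < 10$ is the number of DHT messages exchanged during the first 500 ms.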
Co-authored-by: Dennis Trautwein <dennis.trautwein@posteo.de>
Excellent report! Several nitpicks and beautification suggestions, and a few comments on structural changes. Thanks a lot @guillaumemichel! Great work!
Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Thank you @guillaumemichel! This has been a good read.
Again, thank you for the extensive write-up!
| 8. | 12D3KooWENiDwDCPnbEQKHHsDnSsE5Y3oLyXnxuyhjcCEBK9TvkU | 2051 |
| 9. | 12D3KooWC9L4RjPGgqpzBUBkcVpKjJYofCkC5i5QdQftg1LdsFb2 | 1826 |
As this list may not be up-to-date, it is very likely that all peers in this list are actually operated by nft.storage.
For what it is worth, these two peers (out of the 4 non-nft.storage peers) are the only peers running `kubo/0.17.0/4485d6b` using an ovh.net IP address. The nft.storage nodes as well as the remaining non-nft.storage nodes seem to be running the same kubo version, namely `kubo/0.14.0/e0fabd6`, where reverse DNS resolves to nsone.net.
Let me know in case further digging here is helpful. Feel free to ignore.
Very interesting
Co-authored-by: Max Inden <mail@max-inden.de>
Great work @guillaumemichel! Some minor edit suggestions. Have a look and merge! 🚀
Co-authored-by: Yiannis Psaras <52073247+yiannisbot@users.noreply.github.com>
Here comes the long-awaited RFM16 report on the Effectiveness of the Bitswap Discovery Process.
All feedback is welcome.
Note: The measurement tool implementation and the data analysis scripts are not included yet, but will be added before this PR is merged.