IPFS consumes a large amount of network traffic #2917

Closed
cminnoy opened this issue Jun 28, 2016 · 28 comments
Labels
help wanted Seeking public contribution on this issue kind/bug A bug in existing code (including security flaws) status/deferred Conscious decision to pause or backlog topic/dht Topic dht

Comments

@cminnoy

cminnoy commented Jun 28, 2016

Ubuntu 16.01 Intel x64
ipfs version 0.4.1

Type bug
Area DHT?
Priority high

Description:

Hi,

I'm running two instances of IPFS (on two different machines), each connected to a different ISP.
For a while now I have noticed very high network traffic from those nodes. They run only IPFS,
so there is no other network traffic from those machines.
Last week I didn't use IPFS on those machines, but the daemon was still running.
During that one week, the first node consumed 58 gigabytes of network traffic, and the second node
59.9 gigabytes. This traffic was generated entirely by IPFS while I was not actively using it.
Surely this must be a serious bug. When I stop the IPFS daemon service, the network traffic stops immediately.

Cheers,

Chris
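
For scale, the weekly totals reported above can be turned into an average sustained rate. This is a minimal sketch, assuming "Gigabyte" means decimal gigabytes (10^9 bytes) and that the traffic was spread evenly over seven days; the function name is illustrative only. Under those assumptions the first node averaged roughly 96 kB/s.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def average_rate(total_gb, seconds=SECONDS_PER_WEEK):
    """Return (kB/s, kbit/s) for a transfer total given in decimal gigabytes."""
    bytes_per_second = total_gb * 1e9 / seconds
    return bytes_per_second / 1e3, bytes_per_second * 8 / 1e3


# Weekly totals reported for the two idle nodes.
for label, total in (("node 1", 58.0), ("node 2", 59.9)):
    kbytes, kbits = average_rate(total)
    print(f"{label}: {kbytes:.1f} kB/s ({kbits:.0f} kbit/s)")
```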

@Kubuxu
Member

Kubuxu commented Jun 28, 2016

0.4.1 is a very, very old version. Please try a build from the current master branch; it contains many performance improvements, though we are still working on it.

@gsf

gsf commented Jun 28, 2016

Same behavior on my nodes as well, running on master.

@Stebalien
Member

If you run a public facing node, it will serve up all blocks it has downloaded on request. Unfortunately, even if you haven't downloaded anything, you'll still be constantly bombarded with wantlists and download requests.

@Kubuxu
Member

Kubuxu commented Jun 30, 2016

Not really download requests but general chatter (DHT upkeep, which we try to balance) and wantlists, which we will try to optimize. Reducing passive traffic is on our list, and this version includes a few improvements, for example #2817, but the network needs to adopt it before you will see a reduction in traffic.

@Stebalien
Member

Stebalien commented Jun 30, 2016

It would also be nice if ipfs stats bw gave an accurate bandwidth estimate. For me, actual (passive) bandwidth usage is usually over an order of magnitude greater than that reported by ipfs. Unfortunately, this is probably hard to implement.

@cminnoy
Author

cminnoy commented Jul 28, 2016

Checked v0.4.3-dev yesterday/today. In 24 hours the DHT consumed 6.2 gigabytes of network traffic (split almost evenly between receiving and sending). Hardly a good figure.

@whyrusleeping whyrusleeping added kind/bug A bug in existing code (including security flaws) topic/dht Topic dht labels Jul 28, 2016
@pchiusano

pchiusano commented Jul 28, 2016

Great that this is being worked on... this issue is super important for anyone thinking of running IPFS nodes in the cloud where they will have to pay for bandwidth.

Can you give any sort of sense for what the ideal, expected amount of bandwidth usage would be assuming that:

  • You aren't requesting files from anyone
  • You have an initially totally empty IPFS node running. Thus there aren't any files you are uniquely holding.

Should it be 0... very close to 0? It depends (on what exactly?) The system feels like a black box at the moment. I have no idea whether this is just a simple bug to fix or something inherently problematic with the algorithms being used.

I don't really know what I'm talking about :) but it seems like DHT upkeep should be minimal bandwidth, unless other nodes are redundantly polling you. Likewise wantlist traffic seems like it should be low unless there is massive flux in the set of files stored network-wide and/or massive flux in demand for files.

Another general comment / feature request - provide a setting for amount of bandwidth you want to allocate to IPFS, and have routing, etc, make use of this information somehow.

@whyrusleeping
Member

@pchiusano Ideally, bandwidth usage would be quite low, but configurable depending on how helpful to the network you want your node to be. On average, I think hitting < 50kbps is a good 'low' goal. DHTs are very chatty, and the DHT employed by ipfs is even more talkative than, for example, BitTorrent's mainline DHT due to the way we do content routing.

In the short term, we're looking into implementing hard limits on outgoing and incoming bandwidth. This has the potential to cause severe slowdowns, but should keep ipfs running without destroying low bandwidth connections.
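
One common way to enforce a hard bandwidth limit of this kind is a token bucket. The sketch below is purely illustrative and is not taken from the IPFS codebase: the class name, its parameters, and the reading of "50kbps" as 50 kilobits per second (6250 bytes per second) are all assumptions. The clock is injectable so the behaviour can be checked without waiting.

```python
import time


class TokenBucket:
    """Allow `rate` bytes per second on average, with bursts up to `burst` bytes."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock
        self.tokens = float(burst)  # start with a full bucket
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at `burst`.
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_send(self, nbytes):
        """Spend tokens and return True if nbytes fits the budget, else return False."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Hypothetical usage with a fake clock: 50 kbit/s is 6250 bytes per second.
fake_now = [0.0]
bucket = TokenBucket(rate=6250, burst=6250, clock=lambda: fake_now[0])
print(bucket.try_send(6000))  # fits in the initial burst
print(bucket.try_send(1000))  # rejected, only 250 bytes of budget remain
fake_now[0] += 1.0            # one second later the bucket has refilled
print(bucket.try_send(1000))
```

A rejected send has to be queued or dropped by the caller, which is where the severe slowdowns mentioned above would come from.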

On the still-short-but-slightly-longer term we are looking at different options for content routing. That portion of the dht could be swapped out by a 'tracker' or 'federated dht' of supernodes to reduce the bandwidth consumption on any node choosing to use that system. This obviously impairs the decentralization of the system, but will be a good option in many cases. Even with this system, you will still be able to fall back to searching through the dht if you want to.

In the longer-ish term, we're hoping to implement more advanced routing systems and combine 'supernodes' with more exciting DHT algorithms (see coral clustered DHTs), as well as optimizing many aspects of how the content routing system works.

If any of this is interesting to anyone, I highly encourage you to get involved and help out. There's always plenty to do :)

@pchiusano

Thanks for the detailed reply. Do you have a formal description of the DHT algorithm being used? And to what extent can that be changed while still being IPFS?

@whyrusleeping
Member

@pchiusano We don't have a formal writeup of the logic for our current DHT implementation, but it is an implementation of Kademlia with peer routing, a direct value storage layer (for IPNS records and public keys) as well as an indirect content storage layer (for storage records of who has which objects). Our K-value as per Kademlia is 20.

The majority of the bandwidth problem is the sheer number of provider records we store. Each provider record needs to be stored on the K closest peers to the block it's referencing, which means lots of DHT crawling during large adds. We have a few different ideas on improving this, such as batching outgoing provider storage (to save on the number of RPCs required) and integrating supernodes into the logic to short circuit certain routines.
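
To make the cost concrete, here is a rough sketch of how Kademlia picks the K closest peers by XOR distance, and why unbatched provider records multiply quickly. It is not the go-ipfs implementation: the SHA-256 key derivation, the function names, and the simulated peer set are assumptions for illustration. The RPC count is only a lower bound, since it ignores the lookup traffic needed to find those peers in the first place.

```python
import hashlib

K = 20  # replication parameter quoted in the thread


def key_id(data):
    """Map arbitrary bytes to a 256-bit integer ID (SHA-256 is an assumption here)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest_peers(target, peer_ids, k=K):
    """Return the k peer IDs with the smallest XOR distance to target."""
    return sorted(peer_ids, key=lambda pid: pid ^ target)[:k]


def provide_rpc_count(num_blocks, k=K):
    """Lower bound on store RPCs when every block is announced separately to k peers."""
    return num_blocks * k


# Simulated network of 1000 peers and one block to announce.
peers = [key_id(f"peer-{i}".encode()) for i in range(1000)]
block = key_id(b"example block")
targets = closest_peers(block, peers)
print(len(targets))               # number of peers that would store the record
print(provide_rpc_count(10_000))  # store RPCs for a 10,000-block add, unbatched
```

Batching helps when several records share a destination peer and can travel in a single RPC instead of one RPC per record.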

@Stebalien
Member

(theoretically, not all nodes need to run a full DHT).

@jbenet
Member

jbenet commented Aug 4, 2016

  • not all nodes need a dht
  • resource constraints are much needed, and coming. please help us make
    them!
  • can even resource constrain per protocol (keeping dht serving low for
    example)
  • definitely want to spend way less bw.

We should experiment with some "client only" dht nodes relatively soon.
This will mean upgrading the protocol, as some expectations would need to change.

@ulrichard

My boss came to me today and told me that my machine generated a lot of traffic. I found out that it was IPFS. To me that was totally unexpected. Throttling would be highly desirable.

@d10r

d10r commented Oct 8, 2016

My idle ipfs node now generates >2GB traffic per hour (I know because that's the default threshold for warning emails at Hetzner).

At the moment, connected peers count is >300. Would it help to limit the number of connected peers?

@whyrusleeping
Member

@d10r what does ipfs stats bw --proto=/ipfs/dht report?
Other protocol options to try checking are:

  • /ipfs/kad/1.0.0 <- new dht protocol, should be more efficient
  • /ipfs/bitswap
  • /ipfs/bitswap/1.0.0 <- newer clients

The older /ipfs/dht protocol is still the most widely deployed, but as the network migrates to 0.4.3 (and 0.4.4, which is currently 'master') the bandwidth consumption there should go down.

If you're running a recent build from source, you can disable the dht (almost) entirely by running your daemon with ipfs daemon --routing=dhtclient. This feature is still very experimental, but it should help. Let me know if you try it and notice any odd behaviour.

@Kubuxu
Member

Kubuxu commented Oct 9, 2016

The old dht protocol will have a lot more overhead that won't be included in the stats (I would estimate it to be about 20-30%).

@gsf

gsf commented Oct 10, 2016

Here's a report from my mostly idle node that's been running on master for a couple of days. Thanks for the continued efforts to constrain resource use!

# ipfs stats bw --proto=/ipfs/dht
Bandwidth
TotalIn: 718 MB
TotalOut: 2.2 GB
RateIn: 53 B/s
RateOut: 1.6 kB/s
# ipfs stats bw --proto=/ipfs/kad/1.0.0
Bandwidth
TotalIn: 58 MB
TotalOut: 140 MB
RateIn: 78 B/s
RateOut: 1.5 kB/s
# ipfs stats bw --proto=/ipfs/bitswap
Bandwidth
TotalIn: 9.2 GB
TotalOut: 78 kB
RateIn: 36 kB/s
RateOut: 0 B/s
# ipfs stats bw --proto=/ipfs/bitswap/1.0.0
Bandwidth
TotalIn: 9.7 MB
TotalOut: 357 B
RateIn: 0 B/s
RateOut: 0 B/s
# ps aux | grep ipf[s]
root     12010 11.8 78.9 1351400 399524 pts/2  Sl+  Oct08 443:20 ipfs daemon
# ipfs --version
ipfs version 0.4.4-dev
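
Reports like the one above are easier to compare once converted to plain byte counts. The following is a small sketch of a parser for the text format shown in this thread. It assumes the CLI prints decimal SI units (1 kB = 1000 B), and the function and constant names are made up for the example.

```python
import re

# Decimal multipliers; assumes the CLI prints SI units (kB = 1000 B).
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}
LINE = re.compile(r"^(TotalIn|TotalOut|RateIn|RateOut):\s*([\d.]+)\s*([kMGT]?B)(?:/s)?$")


def parse_bw(text):
    """Parse a human-readable `ipfs stats bw` report into a dict of byte counts."""
    stats = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match:
            field, value, unit = match.groups()
            stats[field] = float(value) * UNITS[unit]
    return stats


# The /ipfs/bitswap report from the comment above.
sample = """Bandwidth
TotalIn: 9.2 GB
TotalOut: 78 kB
RateIn: 36 kB/s
RateOut: 0 B/s
"""
report = parse_bw(sample)
# Ratio of bytes received to bytes sent; a large value means mostly inbound traffic.
print(report["TotalIn"] / max(report["TotalOut"], 1))
```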

@d10r

d10r commented Oct 13, 2016

My stats now for ~48h, ipfs 0.4.3:

ipfs@ipfs:~$ ipfs stats bw --proto=/ipfs/dht
Bandwidth
TotalIn: 5.1 GB
TotalOut: 5.4 GB
RateIn: 14 kB/s
RateOut: 1.8 kB/s
ipfs@ipfs:~$ ipfs stats bw --proto=/ipfs/bitswap
Bandwidth
TotalIn: 7.1 GB
TotalOut: 137 MB
RateIn: 2.4 kB/s
RateOut: 0 B/s

It's now much lower than before, probably because I've set swarm address filters for non-public IP blocks after having been warned by my hosting provider (same as #1226).
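
For readers who want to try the same thing, the sketch below builds a list of filter strings for common non-public IPv4 ranges. The choice of ranges and the `/ip4/<address>/ipcidr/<prefix>` string format are assumptions based on how address filters are usually written for IPFS; check them against the configuration documentation for your version before use.

```python
import ipaddress
import json

# Private, carrier-grade NAT, and link-local IPv4 ranges.
# Which ranges to block is an assumption; adjust for your own network.
NON_PUBLIC = [
    "10.0.0.0/8",
    "100.64.0.0/10",
    "169.254.0.0/16",
    "172.16.0.0/12",
    "192.168.0.0/16",
]


def to_filter(cidr):
    """Format a CIDR block as a multiaddr-style filter string."""
    net = ipaddress.ip_network(cidr)
    return f"/ip4/{net.network_address}/ipcidr/{net.prefixlen}"


filters = [to_filter(c) for c in NON_PUBLIC]
print(json.dumps(filters, indent=2))
```

The resulting JSON list is the kind of value that would go into the swarm address filter setting of the node's configuration.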

@whyrusleeping
Member

The bitswap numbers are surprisingly high. How do the bitswap TotalIn numbers compare to the amount of data you've downloaded with ipfs?

@d10r

d10r commented Oct 14, 2016

I haven't downloaded anything during that time.
New stats:

Bandwidth
TotalIn: 11 GB
TotalOut: 208 MB
RateIn: 3.0 kB/s
RateOut: 0 B/s

Possibly related: A few weeks ago I tried https://github.com/davidar/ipfs-maps, resulting in the import of some tile files to my ipfs node.
So there is content on the node which could in theory be fetched by others. I don't however believe that to be the case. How can I check this?

@gsf

gsf commented Oct 14, 2016

Similar stats for my node restarted a few days ago. Nothing downloaded. Only thing pinned is that quick-start directory (QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG).

# ipfs stats bw --proto=/ipfs/bitswap
Bandwidth
TotalIn: 13 GB
TotalOut: 481 kB
RateIn: 3.6 kB/s
RateOut: 0 B/s

@Kubuxu Kubuxu added the status/ready Ready to be worked label Nov 28, 2016
@whyrusleeping
Member

whyrusleeping commented Sep 3, 2017

This should be better in the upcoming 0.4.11 release. Bitswap sessions reduce the number of wantlist entries that bitswap broadcasts to everyone.

@chilarai

It still consumes a lot of bandwidth. When will 0.4.11 be released?

@bogdanbiv

@chilarai I will also be retesting on 0.4.13 (this is more of a self-reminder).

@Stebalien
Member

With IPFS 0.4.13 and dhtclient off, I've been seeing about 5 KiB/s (overestimate) in background traffic. That should sum to about 3 GiB of traffic per week.
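
The weekly figure can be checked with a quick calculation. This is a minimal sketch that assumes the 5 KiB figure is a per-second rate sustained for a full week; the function name is illustrative. Under that assumption the total comes to about 2.9 GiB, consistent with the estimate above.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604800 seconds


def weekly_gib(rate_kib_per_s):
    """Convert a steady rate in KiB/s into GiB transferred per week."""
    return rate_kib_per_s * SECONDS_PER_WEEK / (1024 * 1024)


print(f"{weekly_gib(5):.2f} GiB/week")
```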

@chilarai

chilarai commented Dec 3, 2017

Thanks @bogdanbiv @Stebalien

@Stebalien Stebalien added status/deferred Conscious decision to pause or backlog and removed status/ready Ready to be worked labels Dec 18, 2018
@eingenito
Contributor

Closing this issue as old/fixed. Please reopen if you are still seeing the same behavior.

@Stebalien
Member

It's still bad but yeah, this issue isn't directly actionable.
