Understanding the impact of latency on Libtorrent throughput #2620
Comments
Current operational measurement scripts: https://github.com/vandenheuvel/libtorrent-latency-benchmark |
This is relevant for achieving onion routing with defence against traffic confirmation attacks. |
ToDo for next meet, from Libtorrent API docs + tuning:
|
Investigate |
Background: http://blog.libtorrent.org/2015/07/slow-start/ and http://blog.libtorrent.org/2011/11/requesting-pieces/ Additional boost: asynchronous disk I/O. Detailed solution: I’m unable to get more than 20Mbps with a single peer on a 140ms RTT link (simulated delay with no packet loss).
|
ToDo, try to obtain more resilience against latency in Libtorrent with single seeder, single leecher. Plus read current research on traffic correlation attacks. The basics are covered here. Quote: 'recent stuff is downright scary, like Steven Murdoch's PET 2007 paper about achieving high confidence in a correlation attack despite seeing only 1 in 2000 packets on each side'. |
Strange observation that it takes 60 to 100 seconds for speed to pick up. 1 iteration to find the magic bottleneck... |
good! next bottleneck... |
The magic parameter settings have now been discovered, resulting in 35MByte/s Libtorrent throughput. Please document your LXC containers. @qstokkink understands the details of the tunnel community... |
You will probably want a repeatable experiment using the Gumby framework. You are in luck, I created a fix for the tunnel tests just a few weeks ago: https://github.com/qstokkink/gumby/tree/fix_hiddenseeding . You can use that branch to create your own experiment, extending the Once you have the experiment running (which is a good first step before you start modifying things - you will probably run into missing packages/libraries etc.), you can edit the TunnelCommunity class in the Tribler project. If you want to add delay to:
You can combine relaying nodes and the sending node into one by adding delay here (which does not include the exit node) |
1 Tribler seed + 1 Tribler leecher with normal Bittorrent with 200ms - 750ms latency is limited by congestion control. Read the details here. To push Tribler towards 35MB/s with added Tor-like relays we probably at some point this year need to see the internal state of the congestion control loop. ToDo for 2017: measure congestion window (cwnd) statistics during hidden seeding. |
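A rough back-of-the-envelope check of why the congestion window matters here: sustained throughput is bounded by window / RTT, so the window needed for a target rate is rate × RTT. The sketch below is only arithmetic on the numbers mentioned in this thread (35 MB/s, 200 ms to 750 ms); the function names are made up for illustration and it assumes no packet loss.

```python
def required_window_bytes(throughput_bytes_per_s, rtt_s):
    """Bytes that must be in flight to sustain the given throughput at this RTT."""
    return throughput_bytes_per_s * rtt_s


def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound on throughput for a fixed window and RTT (no loss assumed)."""
    return window_bytes / rtt_s


if __name__ == "__main__":
    target = 35 * 10**6  # 35 MB/s, the plain-libtorrent figure from this thread
    for rtt_ms in (200, 750):
        window = required_window_bytes(target, rtt_ms / 1000)
        print(f"RTT {rtt_ms} ms: about {window / 1e6:.2f} MB must be in flight")
```

So at 200 ms the sender needs roughly 7 MB outstanding to reach 35 MB/s, which is why cwnd statistics are worth logging.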
We managed to do some new tests. We ran Tribler with only the HTTP API and libtorrent enabled. We found that the performance of libtorrent within Tribler is significantly worse than plain libtorrent. Below is a summary of results so far. Note that these values are for a single seeder, single leecher test.
Note that "magic" is the increasing of
|
@vandenheuvel Assuming all libtorrent versions are the same etc.: the only way Tribler interfaces with a libtorrent download is by retrieving its stats every second ( After writing that I found this in the libtorrent manual:
Even though this shouldn't be that bad, you could try and write a loop which gets the torrent handle status every second on your naked experiment and see how that affects things |
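A minimal sketch of such a polling loop, assuming `handle` is a libtorrent `torrent_handle` from the Python bindings (anything with a `status()` method returning an object with `download_rate` will do). `poll_status` and its parameters are hypothetical names, not part of Tribler or libtorrent.

```python
import time


def poll_status(handle, interval_s=1.0, iterations=10, sleep=time.sleep):
    """Call handle.status() once per interval and collect the download rates."""
    rates = []
    for _ in range(iterations):
        status = handle.status()  # the per-second stats call under suspicion
        rates.append(status.download_rate)  # bytes per second
        sleep(interval_s)
    return rates
```

Running the plain script once with and once without this loop in a background thread should show whether the status calls alone explain the slowdown.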
@vandenheuvel in addition, we are also processing all libtorrent alerts every second but I don't think this leads to much overhead actually. Could you try to disable the alert processing (by commenting out this line: https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L72)? |
Very impressive work guys:
Tribler shamefully collapses. Clearly something to dive into! Did the tip of fully disabling the stats+perhaps 5 second stats sampling lead to any results? btw, can you also expand this |
Our test results show a download speed of ~2.5 MB/s at 200 ms with our plain script as well, when we introduce the EPoll reactor into the code. This is similar to the results found in the previous tests with Tribler. However, tests with our plain script and the Select reactor show the original results that we obtained before introducing the reactor, or even higher: a top speed of 30 MB/s. |
Interesting. Could you also try the normal Poll reactor (it is supposed to be faster than Select for large socket counts)? |
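For switching reactors between runs, something like the sketch below may help. It assumes Twisted is installed and relies on each reactor module exposing `install()`; the helper name and the short keys are made up here. The install call has to happen before anything imports `twisted.internet.reactor`.

```python
import importlib

# Twisted reactor modules; each one exposes an install() function.
REACTOR_MODULES = {
    "select": "twisted.internet.selectreactor",
    "poll": "twisted.internet.pollreactor",
    "epoll": "twisted.internet.epollreactor",
}


def install_reactor(name):
    """Install the named Twisted reactor and return its module path."""
    if name not in REACTOR_MODULES:
        raise ValueError(f"unknown reactor: {name!r}")
    module = importlib.import_module(REACTOR_MODULES[name])
    module.install()  # must run before twisted.internet.reactor is imported
    return REACTOR_MODULES[name]
```

Calling `install_reactor("poll")` at the very top of the test script would then select the Poll reactor for that run.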
Strangely enough, results are quite different between our script and Tribler. Summary:
It may take a little while for the download to come up to speed (~ 60 seconds), but after that the throughput is quite steady. Our next step will be profiling. |
32MByte / sec. So.. Python3 and 200ms latency results. This fascinating mystery deepens. |
We just ran our script under both |
Due to an update for
All the below results are created by a modified script as well (200 ms):
Notes:
|
hmmm. so the lower non-script table is all Tribler? |
In the above post, all results are from our own script. We retested everything non-Tribler. We're not sure what this change of results (especially the |
|
in the previous post, your description of how the congestion window is adjusted is correct. That's how it's supposed to work. I don't think there are any problems with it, but it sounds like you do. If so, would you mind elaborating on which behavior you think is wrong? Another (I think simpler) way of looking at that formula is this:
The final change to Now, this is made a bit more complicated by the slow-start logic in there, as well as the logic to detect whether the sender is not saturating the current cwnd, in which case we don't keep growing it indefinitely. |
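For readers following along, the per-ACK window update as specified in RFC 6817 (LEDBAT) can be sketched as below. This is the RFC's formulation, not necessarily libtorrent's exact code or the formula discussed above; it leaves out slow start and the clamp for a sender that is not filling its window. The `mss=1400` default is an assumption.

```python
TARGET_MS = 100.0  # RFC 6817: target queuing delay, at most 100 ms
GAIN = 1.0         # RFC 6817: gain, at most 1
MIN_CWND = 2       # RFC 6817: minimum window, in segments


def ledbat_update(cwnd, queuing_delay_ms, bytes_newly_acked, mss=1400):
    """Return the new congestion window (bytes) after one ACK."""
    # Positive while the queuing delay is below target, negative above it.
    off_target = (TARGET_MS - queuing_delay_ms) / TARGET_MS
    cwnd += GAIN * off_target * bytes_newly_acked * mss / cwnd
    return max(cwnd, MIN_CWND * mss)
```

With an empty queue the window grows by at most about one MSS per RTT; once the measured queuing delay exceeds the target, the same term shrinks the window in proportion to the overshoot.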
@arvidn, I expect a widely variable delay caused by other connections maintained by intermediate peers. I would not expect latency added by these intermediate peers to have the distribution similar to that of the typical Internet router. These peers are regular PCs or seedboxes, with their connection always filled to the point of congestion, ruled by quite un-sophisticated queue management algorithms. Besides, there are always 2-3 of them in the way, producing a superposition of latency distributions. So, I assume the (almost) worst possible case - the normal distribution. As was noted by @shalunov, to be sure, I should get the distribution from a real tunnel connection on Tribler. |
@arvidn, regarding your question on LEDBAT Again, I'm no expert on differential equations and stability analysis, so I need to recheck everything several times. |
From these plots, it is obvious that we are not limited by the window size. Instead, we are limited by packet loss and overall instability of connection (and probably by buffer bloat).
Traditional congestion control and error correction do not work in these circumstances, and we are not going to invent our own. Instead, we can utilize Tribler peer network to simultaneously create several circuits to a single peer, for a single download. We could gradually create new circuits/connections until the download speed stops growing: that would signal that either leecher's or seeder's uplink bandwidth is saturated. |
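The "add circuits until speed stops growing" idea could look roughly like this. It is only a sketch: `measure_speed(n)` is a hypothetical callable that returns the observed download speed with `n` circuits, and the 5% minimum gain and the cap of 8 circuits are arbitrary assumptions.

```python
def grow_circuits(measure_speed, max_circuits=8, min_gain=0.05):
    """Add circuits one at a time until the speed gain drops below min_gain.

    Returns the number of circuits kept and the speed measured with them.
    """
    circuits = 1
    best = measure_speed(circuits)
    while circuits < max_circuits:
        speed = measure_speed(circuits + 1)
        if speed < best * (1 + min_gain):
            break  # no meaningful gain: assume an uplink is saturated
        circuits += 1
        best = speed
    return circuits, best
```

In practice the speed would have to be averaged over a long enough period, given that downloads in this thread took around 60 seconds to come up to speed.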
Great progress again. Solid performance plots. With tokens we will move to online resource utilization of below 10%! There is strong evidence for that, please ensure to understand Figure 6 in our measurement study. Relays, seeds and exit nodes will be mostly idle when they demand payments. No premature optimizations please. Slowly we are getting to the heart of the problems and fixing them. This is not a buffer bloat problem I believe, but a CPU overload problem. Just saw a "htop" of these boxes. We lack load management, rejection of excess requests, or anything similar at the exit nodes. These boxes are moving TeraBytes per day and are chronically overloaded, leading to slow servicing of requests. A simple heuristic of rejecting new circuits when overloaded (cpu >85%, core pegging) is likely to dramatically clear up everything. But first we need that token economy to work.. |
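The rejection heuristic mentioned above might be as small as the sketch below. The function name, the 99% "pegged" level and the choice to accept when no measurement is available are all assumptions; the per-core percentages could come from something like `psutil.cpu_percent(percpu=True)`, which is not used here to keep the example dependency-free.

```python
def should_accept_circuit(per_core_cpu_percent, avg_threshold=85.0, pegged_at=99.0):
    """Return False when the box looks overloaded, True otherwise."""
    if not per_core_cpu_percent:
        return True  # no measurement available: do not reject
    average = sum(per_core_cpu_percent) / len(per_core_cpu_percent)
    any_core_pegged = max(per_core_cpu_percent) >= pegged_at
    # Reject when average load exceeds the threshold or a single core is pegged.
    return average <= avg_threshold and not any_core_pegged
```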
math model of oversupply economy: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cpe.2856 |
That’s pretty heavy congestive+processing delay. The LEDBAT algorithm seems to correctly respond by slowing down as designed. Throwing more traffic at the overloaded relay nodes (with whatever mechanism—parameter adjustment or not) would not make them have more capacity. I would leave the parameters at defaults and figure out where the delays are coming from and why. |
Good news! ...arvidn finally did tests on a Windows OS and: There are a couple other posts by me on the same thread that may be relevant here, so I'll link to them to save searching: arvidn/libtorrent#3542 (comment) |
🤦 🤦♂️ 🤦♀️ |
@Seeker2 , thanks so much for this update! Please keep us informed about investigating the issue! |
@Seeker2 and @egbertbouman |
Old news for others to catch up: |
As this post demonstrates: This is especially painful news: |
Well, according to the qBittorrent post, the thing is at least reproducible. Maybe someone in our team could volunteer to spend around a month of their life fixing this. But that is unlikely to happen before September because we have much more pressing issues in Tribler right now, and basically everyone in the team is getting at least a month of vacation this summer. |
More issues found with uTP in libtorrent: |
Another example of uTP packet loss occurring even on non-Windows OSes: A partial workaround mentioned for Debian:
|
It's now nearing the end of September...a year later. Seeding torrents has become even more difficult for me due to indirect consequences of libtorrent's uTP-related problems: arvidn/libtorrent#3542 (comment). As to the root of the uTP problems... more sinister causes need to be considered: arvidn/libtorrent#7107 (comment) |
If someone is attempting to seed thousands of torrents that are also being seeded on average by 10+ other seeds, overall latency may be noticeably increased due to this bug: |
Problem: Tor-like tunnels introduce significant latency.
Measure how 25ms to 1000ms of latency affects the Libtorrent throughput. Aim to understand how LEDBAT parameters can be set to maximize throughput. Create an experimental environment using containers. Use netem to create latency between two containers. Start seeder at one container, download across the connecting bridge with added latency.
Initial results: performance is severely affected by a latency of just 50 ms.
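The netem part of the setup described above can be sketched as a small command builder. It only prints the `tc` commands; running them needs root inside the container or on the bridge. The interface name `eth0` and the helper name are assumptions, and the optional jitter argument uses netem's normal distribution.

```python
def netem_delay_command(interface, delay_ms, jitter_ms=None):
    """Build a `tc` command that adds a fixed (or jittered) delay with netem."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root",
           "netem", "delay", f"{delay_ms}ms"]
    if jitter_ms is not None:
        # Normally distributed variation around the base delay.
        cmd += [f"{jitter_ms}ms", "distribution", "normal"]
    return cmd


if __name__ == "__main__":
    # Sweep the latency range named in this issue.
    for delay in (25, 50, 200, 1000):
        print(" ".join(netem_delay_command("eth0", delay)))
```

Between runs, the existing qdisc has to be removed or changed (`tc qdisc del` / `tc qdisc change`) before a new delay is applied.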