catch limited bandwidth issues sooner #999

totaam · 2015-10-12T12:38:01Z

Got some logs which show:

window[2].damage.out_latency.90p : 34
window[2].damage.out_latency.avg : 44
window[2].damage.out_latency.cur : 2
window[2].damage.out_latency.max : 1391
window[2].damage.out_latency.min : 0

Which means that it takes around 44ms to compress and send the packet out to the network layer, often less.
Except that in some cases it can take 1391ms!!

There is another one, which isn't quite as bad:

window[2].damage.out_latency.90p : 324
window[2].damage.out_latency.avg : 119
window[2].damage.out_latency.cur : 25
window[2].damage.out_latency.max : 408
window[2].damage.out_latency.min : 1

At that point the UI became sluggish, about 0.5s behind the actual actions.

Not entirely sure what we should be doing here: by the time the OS is pushing back to us, it is too late already and things will be slow because there isn't enough bandwidth to service us.

Maybe we can watch the "damage out latency" more carefully and immediately increase the batching delay to prevent further degradation?

The text was updated successfully, but these errors were encountered:

totaam · 2015-10-23T03:02:09Z

afarr forwarded some links:

Linux: https://calomel.org/network_loss_emulation.html
Windows: http://jagt.github.io/clumsy/index.html
OSX: http://www.slowyapp.com/

totaam · 2015-10-25T05:29:13Z

2015-10-25 05:29:13: antoine commented

Re: linux traffic control via tc.
Just as I remembered it, the documentation is absolutely awful.

First, your need to install kernel-modules-extra: Is Traffic Control (tc) broken in Fedora 17? (Bug 823316 - unable to simulate drops with tc / netem),

The documentation found at the linux foundation is incomplete: [http://www.linuxfoundation.org/collaborate/workgroups/networking/netem]: when trying to add the latency, you may get this totally unhelpful message: RTNETLINK answers: No such file or directory (which file would that be? no files involved here at all!)
You need to:
modprobe sch_netem
Added difficulty: for testing, it is much easier to run everything on the same system.
Unfortunately, even when using the system's public network IP, the network subsystem will take a shortcut and route through the loopback device.
So you have to apply the rules there, ie:
tc qdisc add dev lo root netem delay 100ms 50ms 25%
And remember to remove them when you're done as this will interfere with lots of things..
tc qdisc del dev lo root netem
For the record, some alternatives:

[http://lartc.org/wondershaper/] can only limit bandwidth, not introduce latency

ip_relay a barebones tcp shaping proxy in perl

Bandwidth Limiting HOWTO - sounds like it should be relevant, but it isn't: hopelessly out of date

using iptables

On OSX and FreeBSD, there's also (obviously) ipfw:

simulate a slow internet connection

Throttling Bandwidth on OS X

And in OSX Lion onwards with xcode: Network Link Conditioner

totaam · 2015-12-09T14:32:15Z

I have not found a simple solution to this problem - not one that can be merged this late in the release cycle. Re-scheduling. (hopefully some of the changes can be backported).

But I did find a huge bug in the process: r11376. (backported in 11380).

totaam · 2016-03-24T09:15:51Z

r12200 (12202 for v0.16.x branch) should prevent the line jitter from causing drastic changes in the base batch delay
r12158 (12196 for v0.16.x branch) makes it possible to tune the number of soft-expired sends - as this may make things worse on bandwidth constrained links

totaam · 2016-04-14T05:08:25Z

2016-05-13 00:21:31: afarr commented

Just as a note (as much so I will remember &/or be able to find more easily as for anyone else's benefit), I've managed to get some other tc functions to work as well: loss, reorder, delete all, and list active (examples above for an eth0 device).

To list rules: tc -s qdisc ls dev eth0.

To add loss: tc qdisc add dev eth0 root netem loss 1%, or tc qdisc add dev eth0 root netem loss 2% 40% to make it more jittery.

To add reorder: tc qdisc add dev eth0 root netem reorder 2%, or tc qdisc add dev eth0 root netem reorder 2% 50% to make it more jittery.

To delete all the tc rules: tc qdisc del dev eth0 root.

totaam · 2016-08-08T07:24:42Z

The low-level network code is a bit messy, in large part because of win32 and the way it (doesn't) handle blocking sockets...

r13270 ensures we don't penalise win32 clients (workaround is now only applied to win32 shadow servers), backported in 13271
r13272: code refactoring / cleanup
r13273: more detailed packet accounting

At the moment, we detect the network bottleneck because the network write call takes longer to return, handling WSAEWOULDBLOCK and socket.timeout would be more explicit.
Maybe we shouldn't be using blocking sockets? Or maybe reads can be blocking but it would be useful if writes were not so that we could detect when the network layer cannot handle any more data. (assuming that we can distinguish)
Or maybe we need to use different code altogether for win32 and posix?

2017-11-16 14:31:08: antoine uploaded file `log-queue-to-send.patch` (1.3 KiB)

logs immediate queue to send delays

totaam · 2017-11-16T17:51:51Z

Simple way to reproduce the problems:

trickle -d 100 -u 100 xpra ...

Launch glxgears and then enlarge the window, the sudden increase in bandwidth consumption causes a spike in send latency:

19:54:44,796 queue to send delay (start):   0ms
19:54:44,797 queue to send delay (end)  :   0ms
19:54:44,800 queue to send delay (start):   0ms
19:54:44,800 queue to send delay (end)  :   0ms
19:54:44,807 queue to send delay (start):   0ms
19:54:44,894 queue to send delay (end)  :  87ms
19:54:45,189 queue to send delay (start):   0ms
19:54:45,189 queue to send delay (end)  :   0ms
19:54:45,193 queue to send delay (start):   0ms
19:54:45,194 queue to send delay (end)  :   1ms
19:54:45,200 queue to send delay (start):   0ms
19:54:45,285 queue to send delay (end)  :  85ms
19:54:45,552 queue to send delay (start):   0ms
19:54:46,001 queue to send delay (end)  : 448ms
19:54:46,001 queue to send delay (start): 353ms
19:54:46,167 queue to send delay (end)  : 519ms
19:54:46,167 queue to send delay (start): 350ms
19:54:46,201 queue to send delay (end)  : 384ms
19:54:46,201 queue to send delay (start):  88ms
19:54:46,242 queue to send delay (end)  : 129ms
19:54:46,449 queue to send delay (start):   0ms
19:54:46,488 queue to send delay (end)  :  39ms
19:54:46,867 queue to send delay (start):   0ms
19:54:46,867 queue to send delay (end)  :   0ms
19:54:47,124 queue to send delay (start):   0ms
19:54:47,124 queue to send delay (end)  :   0ms

The heuristics quickly adapt to hitting the ceiling in this case: it takes under a second to go back to no delay at all (probably thanks to the batch delay adjusting dynamically with the backlog).
We should be able to spot the first packet that takes a long time to push out and raise the batch delay immediately.
When the send speed is restricted, its value ends up being pretty close to the actual line constraints, so we could implement a soft bandwidth limit, detected at runtime.

In other cases, things take longer to settle:

22:23:34,639 queue to send delay (end)  :   0ms for 12KB using   h264: 35205KB/s
22:23:34,659 queue to send delay (start):   0ms
22:23:34,782 queue to send delay (end)  : 123ms for 12KB using   h264: 99KB/s
22:23:34,782 queue to send delay (start): 103ms
22:23:34,783 queue to send delay (end)  : 104ms for 12KB using   h264: 121KB/s
22:23:34,783 queue to send delay (start):  83ms
22:23:34,913 queue to send delay (end)  : 213ms for 12KB using   h264: 60KB/s
22:23:34,914 queue to send delay (start): 193ms
22:23:35,046 queue to send delay (end)  : 325ms for 13KB using   h264: 40KB/s
22:23:35,046 queue to send delay (start): 306ms
22:23:35,176 queue to send delay (end)  : 436ms for 12KB using   h264: 29KB/s
22:23:35,176 queue to send delay (start): 416ms
22:23:35,308 queue to send delay (end)  : 548ms for 13KB using   h264: 23KB/s
22:23:35,308 queue to send delay (start): 527ms
22:23:35,436 queue to send delay (end)  : 656ms for 12KB using   h264: 19KB/s
22:23:35,436 queue to send delay (start): 636ms
22:23:35,551 queue to send delay (end)  : 750ms for 11KB using   h264: 15KB/s
22:23:35,551 queue to send delay (start): 700ms
22:23:35,703 queue to send delay (end)  : 852ms for 15KB using   h264: 17KB/s
22:23:35,703 queue to send delay (start): 789ms
22:23:35,852 queue to send delay (end)  : 938ms for 14KB using   h264: 15KB/s
22:23:35,852 queue to send delay (start): 845ms
22:23:36,015 queue to send delay (end)  : 1007ms for 16KB using   h264: 16KB/s
22:23:36,015 queue to send delay (start): 907ms
22:23:36,174 queue to send delay (end)  : 1066ms for 15KB using   h264: 14KB/s
22:23:36,174 queue to send delay (start): 945ms
22:23:36,300 queue to send delay (end)  : 1071ms for 12KB using   h264: 11KB/s
22:23:36,608 queue to send delay (start):   0ms
22:23:36,632 queue to send delay (end)  :  24ms for 2KB using   h264: 97KB/s
22:23:36,661 queue to send delay (start):   0ms
22:23:36,661 queue to send delay (end)  :   0ms for 2KB using   h264: 9176KB/s

And in the meantime, packets (and that's any packet, not just picture updates...) can take a second to send.. so the user experience suffers.
The difficulty here is that during those first ~120ms, we may have already encoded and queued another 6 packets...
We may get a signal in the form of a socket timeout or of partial send, but this would be caught very deep in the network layer, far from where we would be able to do anything about it.
So I have a patch that tries to deal with this by scheduling very short timers, that we cancel if the send completes in time. This timeout value is still TBD, and may vary from OS to OS... testing will tell.
This at least prevents things from getting completely out of control, but it does create some stuttering as it overcompensates. Synchronizing via the UI thread may not be the right thing to do here (too slow / expensive).
Also, it turns any form of congestion into an increase in batch delay, which may not be the right thing to do if the send queue is busy (could be another window), as this is too drastic.
Combining this approach with a soft bandwidth limit could smooth things out better.

totaam · 2017-11-17T12:55:02Z

2017-11-17 12:55:02: antoine uploaded file `bandwidth-congestion-handling.patch` (12.2 KiB)

work in progress patch

totaam · 2017-11-17T13:05:24Z

Some preparatory / related work:

r17447: remove "send-speed" batch factor (unreliable and superseded by this ticket)
r17448 + r17449: code cleanup / tidy up

The patch above applies to r17449.

Still TODO:

seems that a race condition can cause all screen updates to stop!
verify that this change doesn't negatively affect full-speed connections (extra calculations and branches, timers, etc)
verify on other OS, other types of connections (udp, ssh, etc)
expose new data via "xpra info"
derive the soft bandwidth limit
try harder to avoid using png: when limiting the bandwidth to 1Mbps (see re-implement bandwidth constraint option #417), a single medium size png frame can take 5 seconds to send! (~600KB)
need to adjust speed and quality auto-tuning with congestion events, and include (soft) bandwidth limit in the heuristics
move the encode thread pause code: it applies to damage packets from all window sources since they all share the same thread (generalize the code so source doesn't need to know about the damage sequence)
clipboard packets also use the encode thread and generally don't need throttling (and throttling them can cause problems.. priority queues?) - same for cleanup functions, those don't need to wait
ideally: get rid of closures from queue_damage_packet
some constants should be derived: how long we pause sending during congestion, how long a send is expected to take

totaam · 2017-11-18T18:00:34Z

2017-11-18 18:00:34: antoine uploaded file `bandwidth-congestion-handling-v6.patch` (33.9 KiB)

updated patch

totaam · 2017-11-18T18:14:07Z

Much improved patch attached:

adds some xpra info (more needed)
detects bandwidth limit, moves it to source so it applies to all windows
discounts speed and quality when we detect congestion
tries harder not to use png

Most work items from comment:16 remain, and also some new ones:

split the patch and merge the bandwidth detection first, fix cystats
we should support bandwidth-limit=none to turn off all congestion code, even detected bandwidth - without using env vars
try harder to stop queuing more packets when we already have late_packets
bandwidth budget: evaluate the compressed frame size before proceeding? (on very limited bandwidth setup, each frame may use 25% or more of the bandwidth, allowing it through when we get just under 100% of the budget may still be too much)
auto-refresh kicks in because we delay things so much that we never reach the code that re-schedules it... same for get_refresh_subregion_encoding
initial speed and quality should be lower when the bandwidth-limit is low
cythonize more? use numpy for replacing deque? see Ring buffers in Python/Numpy
maybe the encode queue logic could be re-worked to support multi-queues per window, to try to provide fairer queuing

totaam · 2017-11-19T07:19:29Z

Available bandwidth detection added in r17452.
This alone makes a huge improvement in user experience on bandwidth constrained links.

totaam · 2017-11-19T08:08:15Z

2017-11-19 08:08:15: antoine uploaded file `bandwidth-congestion-handling-v7.patch` (21.7 KiB)

updated patch to apply to r17452

totaam · 2017-11-19T18:10:37Z

More reliable and accurate bandwidth limit auto-detection in r17455.

Main todo items for this release:

lots of testing
try harder to avoid png: auto-refresh, video refresh, full frame updates can all trigger it...
adjust speed and quality with congestion events and bandwidth limit, initial settings should be lower? just when bandwidth limit is set?
try harder to delay regions when we hit congestion, don't queue any more!

totaam · 2017-11-20T09:44:05Z

Updates:

r17456: when we detect bandwidth problems: lower speed and quality, increase batch delay
r17458: also slow down the refresh timer

The big remaining problem: a single png frame can kill the bandwidth...
And we can trigger it from quite a few places. Even a jpeg at high quality can use up the whole budget for several seconds.
Then we also end up queuing too many screen updates. Maybe we can inspect the size of the queue and give up on it?
We should also switch to h264 more quickly.

totaam · 2017-11-20T16:09:26Z

2017-11-20 16:09:26: antoine uploaded file `bandwidth-hacks.patch` (12.4 KiB)

with these hacks, it becomes possible to run glxspheres on a 1Mbps connection

totaam · 2017-11-23T06:13:53Z

Updates:

r17459, r17460 + r17465 fixup: fix cystats tests, add info attributes
r17461: start at a lower quality and don't update the values until we have processed a few frames
r17462 prevent slow decode speeds from making us use high encoding speed too often (ie: png can be quite slow)
r17466: honour fixed speed and fixed quality immediately (backport in 17472)
re-add webp codec #1694: r17468: re-add webp (r17467 for html5, r17469 win32 build fixup), r17471: use webp more often

totaam · 2017-11-23T07:58:28Z

2017-11-23 07:58:28: antoine uploaded file `delay-refresh.patch` (3.9 KiB)

delay refresh

totaam · 2017-11-24T15:41:32Z

More updates:

r17480 change auto-refresh handling to avoid scheduling lossless updates when we're still processing or sending other updates (r17481 for video regions)
r17482: take bandwidth limit value into account for calculating auto-refresh delay, speed, quality
r17488: fix: keep track of regions to refresh when video region changes
r17490: refresh timer fix

I can now run glxspheres at 4 to 8fps on a 1Mbps connection, without getting any stuttering!

If anything, we use less bandwidth than we should (~250Kbps). Being under the limit is good, as it allows the odd large frame to go through without causing too many problems, but this means that the bandwidth-limit detection code never fires, and maybe we could raise the quality or speed (framerate) a bit.

New related ticket for followup: #1700 "faster damage processing"

For very limited bandwidth situations, it would be nice to have a more progressive picture refresh, either flif (#992) or just raising the jpeg / webp quality up, only doing true lossless if we have time and bandwidth.

totaam · 2017-11-26T11:20:04Z

With all the latest improvements and fixes (r17491, r17493, r17515, r17516, etc) and running glxspheres for testing, the server ends up turning on b-frames (#800), video scaling and YUV420P subsampling. It then reaches 25fps on a 1Mbps connection!

I think this is as good as it is going to get for this release. We can improve it some more in #1700.

@maxmylyn: bear in mind that a lossless refresh will use up a few seconds' worth of bandwidth, this is unavoidable. We try to delay the refresh as much as possible, especially when we know the bandwidth is limited - this works better if we know in advance that there are limits (#417).
It's a balancing act between quality, bandwidth usage and framerate.

We need results from the automated tests to ensure that the performance has not regressed for the non-bandwidth-constrained case.

When debugging, we need: "-d refresh,compress,regionrefresh", and the usual "xpra info". And a reproducible test case that does not involve user interaction, as I've launched glxspheres literally thousands of times to get here.

Some links for testing bandwidth constraints:

netlimiter (win32) is quite easy to use, not sure how well it simulates network congestion
Limit incoming and outgoing bandwidth and latency in linux
Manually Throttle the Bandwidth of a Linux Network Interface
How to add latency and bandwidth-limit interface using tc?
Advanced traffic control

totaam · 2017-11-28T16:30:07Z

To try to mitigate the problems with the auto-refresh killing the bandwidth, r17540 switches to "almost-lossless" (using high quality lossy) when bandwidth is scarce.
Note: in some cases, webp lossless will compress better than jpeg lossy... it really depends on the type of content. #1699 may help with those decisions.

totaam · 2017-12-11T23:53:24Z

2017-12-11 23:53:24: maxmylyn commented

Now that I'm not dealing with a horrible Fedora 25->26 upgrade I've had time to finally catch up on my tickets. While testing #1638 by watching some video files and reading Wikipedia at the same time (partly to pass the time, partly to add some extra strain on the session) I noticed that the session aggressively lowers quality to the point that it became almost impossible to read after scrolling or switching tabs unless I let the session sit for a couple seconds. I noticed the aggressive quality drops, but I thought I was going crazy when I thought my overall performance had tanked sometime last week - but the session was still responsive, so it wasn't lagging or drawing spinners to indicate a slow server. I went perusing through the timeline to see if anything had changed in the last couple weeks and I found the link to this ticket in r17540.

So, I rolled my server back to r17539 and rebuilt it, and re-launched a session - and lo and behold everything was pretty again. While the quality can drop if you aggressively scroll (as expected) generally the server refreshes much quicker and to the layman's eye the session looks and behaves much nicer.

So I guess I have a half question and a half response to this - would something in r17540 cause the server to aggressively pursue lower quality even when bandwidth wouldn't be scarce? My server is on a full gigabit (and fully wired, albeit kinda lengthy, Ethernet) LAN so there's no reason why it should think that it has limited bandwidth. Or am I completely wrong? (totally likely)

Also, unrelated, I'll be up physically at the test box to yet again run a Fedora upgrade to 26.

totaam · 2017-12-12T05:43:42Z

I noticed that the session aggressively lowers quality to the point that it became almost impossible to read after scrolling or switching tabs unless I let the session sit for a couple seconds

Aggressively lowering the quality is one thing, but this sounds like the auto-refresh is kicking in too slowly.

I found the link to this ticket in r17540

Well, this ticket has been set to "critical" and was assigned to you for the 2.2 release, that would be a more obvious way.

I rolled my server back to r17539 and rebuilt it...

Any other later revision than r17539? There are ~68 revisions since then.

would something in r17540 cause the server to aggressively pursue lower quality

No.
It only switches to non-lossless auto-refresh if both of those conditions are met:

current quality is lower than 70%
there were congestion events recently, or the bandwidth-limit (measured or specified) is lower than 1Mbps

This would manifest itself as an auto-refresh that is lossy but near lossless, this does not affect how quickly it kicks in.

It does nothing to the "quality" setting, and AFAICR - the only changes to the quality setting calculations only apply when there are bandwidth constraints (r17456).

You didn't include any debug information (ie: "xpra info" would be a start, also "-d refresh,regionrefresh" since this a refresh issue), so we can't say for sure if this is at play here.

The changesets that do change how we handle auto-refresh are: r17491, r17481, r17480, r17458, r17456.

In particular, r17481+r17480 change when we cancel and schedule the auto-refresh, to ensure that we only schedule one after we have sent a lossy packet (and not just at the time the screen update is queued up).

totaam · 2017-12-12T19:23:18Z

@maxmylyn commented:

It only switches to non-lossless auto-refresh if both of those conditions are met:
...
It does nothing to the "quality" setting, and AFAICR - the only changes to the quality setting calculations only apply when there are bandwidth constraints (r17456).

Okay I had a feeling that I was mistaken but wanted to make sure, thanks for confirming that.

So I did what I should have done yesterday and turned on the OpenGL paint boxes to investigate what's actually going on rather than blindly guessing. What I'm seeing is that as of 17607 it's painting an entire Firefox window with h264 (that is still the blue one, right?) under certain situations. In certain situations it makes - like when I switch from tab to tab rather quickly or click through a series of Wikipedia links quickly. But there are other cases where it doesn't make sense - the two most notably:

Hovering over the citations on Wikipedia pages (causes a small pop up to appear) causes rather large h264 paints - in some cases full window
Clicking on a different window and back to Firefox causes a full window refresh with h264

After some further investigation - it's much easier to repro what I'm seeing by opening an Xterm and running for i in {1..500}; do dmesg; done and then let the Xterm sit for a second or two. With the OpenGL paint boxes (and the refresh logs running) it becomes obvious to me that it's the autorefresh that's painting as lossy.

So before I walk through the revisions you listed, try a few other things, and then get some logs / Xpra info, do you want to create a new ticket or is that related enough to this one to continue here? (In case you're still awake, but I think that's highly unlikely)

totaam · 2018-04-06T00:25:27Z

2018-04-06 00:25:27: maxmylyn commented

Upped my server and client to r18990:

I get low bandwidth warnings fairly regularly, and the limits almost immediately drop my bandwidth down to the minimum. I'll attach requested logs in a second or two.

In this log snap, I connect, and almost immediately the level drops and I never see the bandwidth go above ~3megabits. However towards the end all of a sudden the level increases and everything gets much clearer. Which is weird, but I'm glad I had the logs running to catch it. Hopefully it helps.

EDIT: In the log sample I'm attaching, I started a server with Firefox and an Xterm as start childs - I connected, opened a couple tabs and loaded up an HTML5 video in one and switched between it and a couple text sites a few times while the logs were running.

Before I give this back, I'd like to request the XPRA_BANDWIDTH_DETECTION and XPRA_MIN_BANDWIDTH flags get turned into control commands and/or at the very least arguments that can be fed into an xpra start. I'm sure there are lots of people who would like to either turn the heuristics off or at the very least leave it on auto but set it to a higher minimum on the fly. I know that there are a few people here who would definitely be interested.

totaam · 2018-04-06T00:26:23Z

2018-04-06 00:26:23: maxmylyn uploaded file `999dbandwidthstats.log` (1434.0 KiB)

started a session with an Xterm and Firefox, connected, and opened a few tabs and watched an HTML5 video

totaam · 2018-04-06T17:28:49Z

2018-04-06 17:28:49: antoine commented

... the limits almost immediately drop my bandwidth down to the minimum ...
Bugs squashed:

r18999: this would have caused the bandwidth detection to go to the minimum value instead of starting disabled. 1Mbps! no wonder you were seeing all sorts of problems

r19000 prevented limits from being raised again once the congestion subsided

Please try again.

I'd like to request ...
Let's first iron out the bugs before deciding what should and should not be configurable, and where. Environment variables are suitable places for the more obscure configuration options, ideally bandwidth detection can be made reliable enough that those options can remain hidden.
FYI, they can already be fed to xpra start using:

the environment variable directly

the command line switches for environment variables

same switches via configuration files

totaam · 2018-04-19T19:11:44Z

2018-04-19 19:11:44: maxmylyn commented

Okay I've finally been able to carve out enough time to address some of the more important tickets:

My system is currently sitting at r19020 for client and server, for reference.

As it is now, the bandwidth detection is working much better - I get a little push back upon initial connection, but it smooths out quite a bit after a few seconds, which is good.

I'm going to ask for a control command or start flag to disable it again, because I know that the first thing I'm going to be asked is for exactly that.

totaam · 2018-04-22T09:16:05Z

2018-04-22 09:16:05: antoine commented

I get a little push back upon initial connection, but it smooths out quite a bit after a few seconds, which is good.
What is "push back"?
Do you have the log?

totaam · 2018-05-03T21:57:40Z

2018-05-03 21:57:40: maxmylyn commented

The pushback I'm seeing is the heuristics withhold a lot of the initial bandwidth for the first couple of seconds. Upon initial connection, everything is blurry and slow to respond (as is typical during network hiccups), and bandwidth monitors show it consistently using <1-2 mbps. Until a few seconds later.

I'll attach a log, but I'm not of the opinion that this behavior is entirely undesirable. It might not be a bad idea to put some small limits on the initial connection if the network can't handle that much traffic that instantaneously. I think we might be better served by slightly upping the min-bandwidth default. But I'll leave that decision to you.

For reference: I've upped my systems to r19184. However, I'm still running Fedora 26, I should carve out a day very soon to upgrade.

totaam · 2018-05-03T21:58:21Z

2018-05-03 21:58:21: maxmylyn uploaded file `999slowstart.log` (175.4 KiB)

Initial connection logs with -d bandwidth,stats showing the initial pushback described in the comments

totaam · 2018-05-06T18:31:06Z

2018-05-06 18:31:06: antoine commented

The pushback I'm seeing is the heuristics withhold a lot of the initial bandwidth for the first couple of seconds.
There are different heuristics that cause this.
(..)
I think we might be better served by slightly upping the min-bandwidth default

For the initial quality and speed (default is 40), r19222 allows you to change those if you wish, ie: XPRA_INITIAL_QUALITY=80 XPRA_INITIAL_QUALITY=80 xpra start ...

In your log there were 3 congestion events in the first second, and none after:
2018-05-03 13:52:36,151 Python/Gtk2 Linux Fedora 26 TwentySix x11 client version 2.3-[r19184](../commit/46637ba9afc80292579cc78f8f7b6d6d1f159cbe) 64-bit
2018-05-03 13:52:36,913 record_congestion_event(late-ack for sequence      4: late by 143ms, target latency=105 ((1, 4, 0, 100)), 4940, 0)
2018-05-03 13:52:36,976 record_congestion_event(late-ack for sequence      2: late by  52ms, target latency=103 ((1, 2, 0, 100)), 5133, 0)
2018-05-03 13:52:37,012 record_congestion_event(late-ack for sequence      5: late by  43ms, target latency=105 ((1, 4, 0, 100)), 2940, 0)
So r19224 increases the ack tolerance for the first few seconds after the client connects, or after a window reset.
This should prevent those relatively small screen updates from skewing the congestion detection.

Related: r19223 also raises the minimum bandwidth we detect from just 1Mbps to 5Mbps, most users should have at least that nowadays, right?
This one is unlikely to help with the initial bandwidth though: it kicks in when we detect bandwidth congestion and when things just start we don't have enough data yet to really trigger it.

totaam · 2018-05-10T21:31:29Z

2018-05-10 21:31:29: maxmylyn commented

Upped my server and client to r19268:

I've been playing around with this for the last couple days (mostly using Xpra to write some documentation for something else entirely), and the changes you mentioned in comment:58 have wrapped up the last issue. As far as I'm concerned, this ticket is pretty much done. I'm gonna pass to you in case there's anything left you want to do.

totaam · 2018-06-01T12:43:11Z

Found another wrapper tool which may be useful for simulating various network conditions: comcast

totaam · 2018-07-08T10:16:44Z

2018-07-08 10:16:44: antoine commented

Follow up: #1855, #1912, #2156, #2421, #2812

totaam closed this as completed Jun 1, 2018

totaam added the v0.14.x label Jan 22, 2021

This was referenced May 22, 2021

better TCP_NODELAY handling: only use it when it is useful #619

Closed

re-implement bandwidth constraint option #417

Closed

totaam mentioned this issue Apr 12, 2022

handle connection issues, session events, network drop outs, power events, etc #540

Closed

catch limited bandwidth issues sooner #999

catch limited bandwidth issues sooner #999

Comments

totaam commented Oct 12, 2015 • edited Loading

totaam commented Oct 23, 2015 • edited Loading

totaam commented Oct 25, 2015

2015-10-25 05:29:13: antoine commented

totaam commented Dec 9, 2015 • edited Loading

totaam commented Mar 24, 2016 • edited Loading

totaam commented Apr 14, 2016 • edited Loading

totaam commented May 13, 2016

2016-05-13 00:21:31: afarr commented

totaam commented Aug 8, 2016 • edited Loading

totaam commented Sep 26, 2016 • edited Loading

totaam commented Feb 20, 2017 • edited Loading

totaam commented Apr 14, 2017 • edited Loading

totaam commented Apr 22, 2017 • edited Loading

totaam commented Nov 16, 2017

2017-11-16 14:31:08: antoine uploaded file log-queue-to-send.patch (1.3 KiB)

totaam commented Nov 16, 2017 • edited Loading

totaam commented Nov 17, 2017

2017-11-17 12:55:02: antoine uploaded file bandwidth-congestion-handling.patch (12.2 KiB)

totaam commented Nov 17, 2017 • edited Loading

totaam commented Nov 18, 2017

2017-11-18 18:00:34: antoine uploaded file bandwidth-congestion-handling-v6.patch (33.9 KiB)

totaam commented Nov 18, 2017 • edited Loading

totaam commented Nov 19, 2017 • edited Loading

totaam commented Nov 19, 2017

2017-11-19 08:08:15: antoine uploaded file bandwidth-congestion-handling-v7.patch (21.7 KiB)

totaam commented Nov 19, 2017 • edited Loading

totaam commented Nov 20, 2017 • edited Loading

totaam commented Nov 20, 2017

2017-11-20 16:09:26: antoine uploaded file bandwidth-hacks.patch (12.4 KiB)

totaam commented Nov 23, 2017 • edited Loading

totaam commented Nov 23, 2017

2017-11-23 07:58:28: antoine uploaded file delay-refresh.patch (3.9 KiB)

totaam commented Nov 24, 2017 • edited Loading

totaam commented Nov 26, 2017 • edited Loading

totaam commented Nov 28, 2017 • edited Loading

totaam commented Dec 11, 2017

2017-12-11 23:53:24: maxmylyn commented

totaam commented Dec 12, 2017 • edited Loading

totaam commented Dec 12, 2017 • edited Loading

totaam commented Apr 6, 2018

2018-04-06 00:25:27: maxmylyn commented

totaam commented Apr 6, 2018

2018-04-06 00:26:23: maxmylyn uploaded file 999dbandwidthstats.log (1434.0 KiB)

totaam commented Apr 6, 2018

2018-04-06 17:28:49: antoine commented

totaam commented Apr 19, 2018

2018-04-19 19:11:44: maxmylyn commented

totaam commented Apr 22, 2018

2018-04-22 09:16:05: antoine commented

totaam commented May 3, 2018

2018-05-03 21:57:40: maxmylyn commented

totaam commented May 3, 2018

2018-05-03 21:58:21: maxmylyn uploaded file 999slowstart.log (175.4 KiB)

totaam commented May 6, 2018

2018-05-06 18:31:06: antoine commented

totaam commented May 10, 2018

2018-05-10 21:31:29: maxmylyn commented

totaam commented Jun 1, 2018 • edited Loading

totaam commented Jul 8, 2018

2018-07-08 10:16:44: antoine commented

totaam commented Oct 12, 2015 •

edited

Loading

totaam commented Oct 23, 2015 •

edited

Loading

totaam commented Dec 9, 2015 •

edited

Loading

totaam commented Mar 24, 2016 •

edited

Loading

totaam commented Apr 14, 2016 •

edited

Loading

totaam commented Aug 8, 2016 •

edited

Loading

totaam commented Sep 26, 2016 •

edited

Loading

totaam commented Feb 20, 2017 •

edited

Loading

totaam commented Apr 14, 2017 •

edited

Loading

totaam commented Apr 22, 2017 •

edited

Loading

2017-11-16 14:31:08: antoine uploaded file `log-queue-to-send.patch` (1.3 KiB)

totaam commented Nov 16, 2017 •

edited

Loading

2017-11-17 12:55:02: antoine uploaded file `bandwidth-congestion-handling.patch` (12.2 KiB)

totaam commented Nov 17, 2017 •

edited

Loading

2017-11-18 18:00:34: antoine uploaded file `bandwidth-congestion-handling-v6.patch` (33.9 KiB)

totaam commented Nov 18, 2017 •

edited

Loading

totaam commented Nov 19, 2017 •

edited

Loading

2017-11-19 08:08:15: antoine uploaded file `bandwidth-congestion-handling-v7.patch` (21.7 KiB)

totaam commented Nov 19, 2017 •

edited

Loading

totaam commented Nov 20, 2017 •

edited

Loading

2017-11-20 16:09:26: antoine uploaded file `bandwidth-hacks.patch` (12.4 KiB)

totaam commented Nov 23, 2017 •

edited

Loading

2017-11-23 07:58:28: antoine uploaded file `delay-refresh.patch` (3.9 KiB)

totaam commented Nov 24, 2017 •

edited

Loading

totaam commented Nov 26, 2017 •

edited

Loading

totaam commented Nov 28, 2017 •

edited

Loading

totaam commented Dec 12, 2017 •

edited

Loading

totaam commented Dec 12, 2017 •

edited

Loading

2018-04-06 00:26:23: maxmylyn uploaded file `999dbandwidthstats.log` (1434.0 KiB)

2018-05-03 21:58:21: maxmylyn uploaded file `999slowstart.log` (175.4 KiB)

totaam commented Jun 1, 2018 •

edited

Loading