Better support for high-latency connections #724

Closed
robertfoss opened this issue Mar 9, 2016 · 3 comments

@robertfoss

While doing remote backups, borgbackup does not perform particularly well over high-latency links. Bandwidth usage remains low and is limited entirely by the latency of the link.

Presumably borg only does one operation at a time, which would explain why the bandwidth usage is so low.

Is there an option for parallelization or is this a bug?
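
A back-of-envelope illustration of the ceiling a strict one-request-at-a-time protocol hits (the RTT and payload size here are hypothetical, just for illustration):

```python
# Hypothetical numbers: one synchronous request per round trip.
rtt_s = 0.1            # 100 ms round-trip time on a high-latency link
payload_bytes = 2**20  # 1 MiB transferred per request/response cycle

# With only one request in flight at a time, throughput is capped by
# the round-trip time, no matter how much bandwidth the link has:
max_throughput = payload_bytes / rtt_s
print(f"{max_throughput / 2**20:.1f} MiB/s")  # -> 10.0 MiB/s ceiling
```

Keeping N requests in flight would raise that ceiling by roughly a factor of N.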

@ThomasWaldmann (Member)

Borg is currently single-threaded, but there is an experimental (and not usable yet) multithreading branch (+ PR) in the repo.

I don't know precisely what issue you hit, as the first post is a bit short on practical details, but I think the current lack of parallelization is primarily a problem for the first backup, when a lot of data is pushed, and less so for subsequent backups, when only the changes are processed. Well, except if a lot changes. ;)

In another ticket, there is also the idea of assembling segments locally and then transmitting them to the remote - that would also save some roundtrips.
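
For illustration, the round-trip saving could look roughly like this (a hypothetical sketch; `upload` and the segment size are made up, this is not borg's repository code):

```python
import io

SEGMENT_SIZE = 4 * 2**20  # hypothetical 4 MiB segment limit

def store_chunks(chunks, upload):
    """Assemble chunks into segments locally, then upload each segment
    in a single operation instead of paying one round trip per chunk."""
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
        if buf.tell() >= SEGMENT_SIZE:
            upload(buf.getvalue())  # one round trip covers many chunks
            buf = io.BytesIO()
    if buf.tell():
        upload(buf.getvalue())      # flush the final partial segment
```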

If you think there is anything we could improve beyond these, please add more details to make this ticket actionable.

@robertfoss (Author)

The problem seems to be that none of my resources are saturated by borgbackup activity: bandwidth, CPU, and storage are all only very lightly loaded. That means the performance bottleneck is somewhere else, which points to network latency as the culprit.

Multi-threading would help the situation. But say we ran 4 parallel threads doing what 1 thread does now and got a 4x performance increase from that; the available resources would still be only very lightly loaded.
A more clever multi-threading approach could of course have better performance characteristics.
Doing async network I/O could probably improve performance too. The key to good performance for remote backups is utilizing the network resources to their full capacity and preventing latency from becoming the bottleneck.
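
A minimal sketch of what I mean by async I/O (the `store_chunk` coroutine and the 100 ms delay are hypothetical stand-ins, not borg's actual RPC):

```python
import asyncio

async def store_chunk(chunk):
    # Stand-in for a remote store operation; the sleep models one
    # network round trip on a high-latency link.
    await asyncio.sleep(0.1)  # hypothetical 100 ms RTT

async def backup(chunks, max_in_flight=16):
    # Keep several requests in flight so the round trips overlap
    # instead of being paid one after another.
    sem = asyncio.Semaphore(max_in_flight)

    async def bounded(chunk):
        async with sem:
            await store_chunk(chunk)

    await asyncio.gather(*(bounded(c) for c in chunks))

# 64 chunks done strictly sequentially would wait ~6.4 s in total;
# with 16 in flight the same waits overlap into roughly 0.4 s.
asyncio.run(backup([b"x"] * 64))
```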

If there are any tests I could run or any details I could provide, please give me a heads up.

@ThomasWaldmann (Member)

You also see that for local backups. It is just because everything is done sequentially: read (wait), chunk, hash, compress, encrypt, store (wait), plus (at segment end) sync (wait).

The I/O wait times are currently not used for anything else, because there is just one thread.
Network latency might add more wait time here, but the waiting exists locally as well.
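
To illustrate how threads could overlap those waits, here is a minimal producer/consumer sketch (the stage functions are placeholders, not borg's code):

```python
import queue
import threading

def stage(func, inbox, outbox):
    # Run one pipeline stage in its own thread, so its wait time
    # overlaps with the other stages instead of blocking them.
    while True:
        item = inbox.get()
        if item is None:        # sentinel: shut this stage down
            outbox.put(None)
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    # Chain read -> chunk -> hash -> compress -> encrypt -> store
    # style stages through bounded queues.
    boxes = [queue.Queue(maxsize=8) for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, boxes[i], boxes[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        boxes[0].put(item)
    boxes[0].put(None)
    results = []
    while (out := boxes[-1].get()) is not None:
        results.append(out)
    for t in threads:
        t.join()
    return results
```

In CPython the GIL limits how much CPU-bound work overlaps, but the I/O waits (the read/store/sync steps) do overlap this way.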

BTW, there is another ticket about bandwidth limiting. Just saying that maxing out the resources is not desirable for everybody / every scenario. Of course it would be nice if one could. ;)

I'm closing this for now as it is somewhat a duplicate of other tickets / other ongoing work, e.g.:

#37
#191
#631 + multithreading branch (buggy!)
