Unnecessary chunks cache sync? #7278
What's going on is that the borg client doesn't cache archive chunks when it creates new archives. It only caches the data when it fetches it remotely after another host has added to the repo. I think this is correct behaviour with the commands I showed above, because host1 initially has no way to predict that the repo will be accessed by multiple hosts. So I think it's good that it didn't cache things as the cache can get large. But if you iterate the above commands, exchanging creates from host1 and host2, then it seems odd that the clients don't add to the cached chunks when they create new archives. It means that the client later fetches data that it at one point had locally, which is wasteful. So this is more of a feature request than a bug: if chunks.archive.d is in use and an archive is created, add the relevant info to the cache. I'm not sure whether this is trivial, or if the data would need to be massaged to be in the right format. (Also, I verified this behaviour with 1.2.3. Not sure why I was still running 1.2.0.) |
Correct, if one client is alone, it would never need that. Sorry, I initially didn't read the issue report precisely, I only had a quick glance (overlooking that you simulated 2 hosts). What you've seen is currently expected behaviour. |
Hmm, I am thinking about closing this as "works as expected":
|
We could regard the existence of a non-empty chunks.archive.d as a hint that multiple clients are accessing the repo, so we may as well save the data we already have locally. About the memory use, I guess it depends on whether we need to create an entirely new hashtable, or can just compress and write out one that we already have. I don't know exactly what is stored or how it is created, or how big it is. |
@jdchristensen ah yes, a non-empty dir is a good indication. But we do not have that separate per-archive chunks index data when borg is creating an archive; it just updates the master chunks index (which contains the chunks index for all archives / for the repo). Creating a one-archive chunks index would need maintaining a 2nd HT or creating a "diff" between the updated master chunks index and the previous master chunks index. |
Presumably a one-archive chunks index would be a lot smaller than the full chunks index, so we aren't really talking about 2x the memory, right? And if we're only creating it to write it to disk, we could create it in compact form, paying no attention to the hash function. Is there already code that does this when a client notices that it is missing chunks index data from one or more archives? |
2x is the worst case for a 100% deduplication rate (so the one-archive chunks index is the same size as the all-archives chunks index). "A lot smaller" might only happen if you have a lot of archives and a lot of changes between your archives. That chunks index archive is only used for the chunks index resync, nowhere else (IIRC). |
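To make the two options discussed above concrete, here is a minimal sketch in plain Python. It is not borg code: the real chunks index is a hashtable that stores more than a reference count, and the names `create_archive` and `diff_index` are hypothetical. It only illustrates (a) maintaining a second index while an archive is created, (b) deriving the same data as a "diff" between the master index before and after, and (c) why a fully deduplicated archive gives a one-archive index as large as the master index.

```python
from collections import Counter


def create_archive(master, chunk_ids):
    """Add one archive's chunk references to the master index.

    Also builds and returns a one-archive index (the "2nd HT" option).
    Both indexes map chunk id -> reference count; this is an assumed,
    simplified stand-in for borg's real index entries.
    """
    per_archive = Counter()
    for cid in chunk_ids:
        master[cid] += 1
        per_archive[cid] += 1
    return per_archive


def diff_index(before, after):
    """Derive a one-archive index as the refcount difference (the "diff" option).

    Requires a snapshot of the master index taken before the archive was created.
    """
    return Counter(
        {cid: after[cid] - before[cid] for cid in after if after[cid] != before[cid]}
    )


master = Counter()
first = create_archive(master, ["a", "b", "c"])

# Second archive with 100% deduplication: every chunk already exists.
snapshot = Counter(master)
second = create_archive(master, ["a", "b", "c"])

# Worst case: the one-archive index has as many entries as the master index.
same_size = len(second) == len(master)
# Both options yield the same one-archive index.
same_result = diff_index(snapshot, master) == second
```

Note that the diff option needs a full copy of the previous master index, so in this sketch it costs at least as much memory as the second-index option.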
Have you checked borgbackup docs, FAQ, and open GitHub issues?
Yes. There's a chance this is related to #7274 , but I think it is probably different.
Is this a BUG / ISSUE report or a QUESTION?
BUG
System information. For client/server mode post info for both machines.
Ubuntu 22.04
Your borg version (borg -V).
borg 1.2.0. Also happens with master branch borg2.
Operating system (distribution) and version.
Ubuntu 22.04
Hardware / network configuration, and filesystems used.
N/A
How much data is handled by borg?
N/A
Full borg command line that led to the problem (leave out excludes and passwords)
See below.
Describe the problem you're observing.
When a host accesses a borg repo after another host has accessed it, sometimes some archives have their chunks cache synced even when they should already be in the cache. I first noticed it in #7277 (with borg2), where the line
shouldn't be there, since that archive was made with the same host. Then I reproduced it with borg 1.2.0, using the steps shown below.
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
In this case, both archives made on "host1" have their data fetched, even though we are currently on host1. See the two lines marked with "******". Also, the line marked with "<<<<<<" shows that borg thinks there is nothing locally cached. In #7277, only one gets synced that shouldn't, so it is less extreme there. Not sure what is going on.