Files cache: use ctime instead of mtime #911
As we always back up file metadata, we are only interested in changes to file contents (not in file metadata changes), so mtime fits better.
Well, here lies the risk. Suppose I modify the file (without changing its size) and then use utimes to set the mtime back to what it was: the next borg run would miss the change.
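The scenario above can be reproduced with Python's standard library. This is a minimal sketch assuming a POSIX system; `mtime_key`, `tamper_preserving_mtime` and `demo` are hypothetical helpers, not borg code. The file is overwritten in place with same-size data and `os.utime` then restores the original timestamps, so an (mtime, size, inode) key is identical before and after even though the content differs.

```python
import os
import tempfile


def mtime_key(path):
    """(mtime, size, inode): the default change-detection key discussed in this thread."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size, st.st_ino)


def tamper_preserving_mtime(path, new_data):
    """Hypothetical helper: overwrite a file in place with same-size data,
    then put its original atime/mtime back with os.utime()."""
    st = os.stat(path)
    if len(new_data) != st.st_size:
        raise ValueError("new_data must be exactly as long as the file")
    # "r+b" opens the existing file without truncating it, so the inode is kept.
    with open(path, "r+b") as f:
        f.write(new_data)
    # Restore atime and mtime with nanosecond precision; ctime cannot be set this way.
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


def demo():
    """Return (key_unchanged, new_content) for a freshly tampered temp file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "binary")
        with open(path, "wb") as f:
            f.write(b"original")
        before = mtime_key(path)
        tamper_preserving_mtime(path, b"tampered")
        with open(path, "rb") as f:
            content = f.read()
        return before == mtime_key(path), content


if __name__ == "__main__":
    print(demo())  # (True, b'tampered')
```

Only the ctime, which both the write and the `os.utime` call update, still records that something happened to the file.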
Then you have shot yourself in the foot. :-P
What if it was not me but:
Well, borg is not meant to be an intrusion detection system. As far as a backup program is concerned, it would just not back up the hacked binary. Even if we used ctime, an attacker could still modify the binary, then modify the borg cache and update the ctime there, so there is no protection against this.
Besides "hackability" (detecting which is not our goal), mtime seems a better fit, as we need the files cache to detect content changes (to re-chunk if needed, but only if needed). So isn't ctime the wrong timestamp for this anyway?
ctime: the inode or file change time. It gets updated when the file attributes are changed, e.g. the owner or the permissions, or when the file is moved to another filesystem, but it is also updated when you modify the file.
mtime: the file modification time. It gets updated when you modify a file, i.e. whenever you update a file's content or save it.
Most of the time ctime and mtime will be the same, unless only the file attributes are updated; in that case only the ctime gets updated.
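The ctime/mtime distinction above can be observed directly. This is a minimal sketch assuming a POSIX filesystem (on Windows, `st_ctime` means something else and `os.chmod` only toggles the read-only flag); `times_ns` and `demo` are illustrative names, and the short sleeps exist only so that coarse-grained filesystem clocks can tick between steps.

```python
import os
import stat
import tempfile
import time


def times_ns(path):
    """Return (mtime_ns, ctime_ns) for a path."""
    st = os.stat(path)
    return st.st_mtime_ns, st.st_ctime_ns


def demo(pause=0.05):
    """Compare which timestamps a metadata change and a content change update."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data")
        with open(path, "wb") as f:
            f.write(b"x" * 16)
        m0, c0 = times_ns(path)

        # Metadata-only change (permissions): bumps ctime, leaves mtime alone.
        time.sleep(pause)  # let coarse-grained filesystem clocks tick
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
        m1, c1 = times_ns(path)

        # Content change: truncating and rewriting bumps both mtime and ctime.
        time.sleep(pause)
        with open(path, "wb") as f:
            f.write(b"y" * 16)
        m2, c2 = times_ns(path)

    return {
        "chmod_changed_mtime": m1 != m0,
        "chmod_changed_ctime": c1 != c0,
        "write_changed_mtime": m2 != m1,
        "write_changed_ctime": c2 != c1,
    }


if __name__ == "__main__":
    print(demo())
```

A ctime-based key therefore flags both steps as changes, while an mtime-based key flags only the content rewrite.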
We do not want to detect file attribute changes like owner changes; borg always stores them anyway. So, close?
As long as you ignore deliberate abusers who change a few bytes in the file (leaving the size intact) and then reset the mtime to what it was (mtime can be changed by the user, ctime cannot), it's OK to close.
I have seen the Debian package manager, on Ubuntu, replace files while setting the mtime back to what it was before. Since the file was the same size, rsync did not back it up (neither would Borg have). The other backup software I used, which correctly used the ctime, backed it up just fine. I'm not sure I see why this is a debate. Using ctime will always back up a file when it changes; using mtime will usually back up a file that has changed. Why would we want to do the correct thing only most of the time? The cost, as mentioned above, is possibly having to reread a file whose metadata changed. Could we not make this an option? Those who want the mostly-correct behavior could use mtime, and those of us who would actually like everything backed up could use ctime.
@d3zd3z That sounds like rather stupid behaviour of the package manager in the first place. Did it also not change the inode number? By default, borg considers (mtime, size, inode).
At least dpkg will overwrite files if they are already present, so the inode number doesn't change. The only thing that was different on the file was the ctime. There are also revision control systems that set the mtime on files. Those would at least be expected to give a file a time that matches its contents, so if it ended up with the same mtime, it would also have the same contents. But there is a lot of software out there, and it is hard to know what crazy things people might do. Even the 'touch' command can easily set an mtime back.
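Whether the inode number helps in the dpkg case depends on how a tool replaces a file. Below is a minimal sketch (POSIX assumed; the helper names are hypothetical and do not describe what dpkg actually does) contrasting an in-place overwrite, which keeps the inode, with the common write-to-temporary-then-rename pattern, which gives the path a new inode.

```python
import os
import tempfile


def overwrite_in_place(path, data):
    """Truncate and rewrite the existing file (or create it): the inode is kept."""
    with open(path, "wb") as f:
        f.write(data)


def replace_via_rename(path, data):
    """Write a sibling temp file, then atomically rename it over the target: new inode."""
    tmp = path + ".new"
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)


def demo():
    """Return (inode kept by in-place overwrite, inode kept by rename replacement)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file")
        overwrite_in_place(path, b"v1")
        ino_created = os.stat(path).st_ino
        overwrite_in_place(path, b"v2")
        ino_overwritten = os.stat(path).st_ino
        replace_via_rename(path, b"v3")
        ino_renamed = os.stat(path).st_ino
        return ino_created == ino_overwritten, ino_overwritten == ino_renamed


if __name__ == "__main__":
    print(demo())  # (True, False) on typical POSIX filesystems
```

Only the rename-based replacement changes the inode part of an (mtime, size, inode) key; an in-place overwrite followed by an mtime reset leaves that key identical.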
Someone asked when ctime/mtime are updated:
Additional information:
Just verified: OS X also returns the mtime in the ctime field when mounting a filesystem that doesn't support ctime. This means that if we use ctime on such filesystems, we would get the same behaviour we have now (barring consistent inodes).
Added this to the 1.1.0 milestone to keep it on the radar. We shouldn't make such fundamental changes in patch releases. It's already rather late, considering we are at rc3, so we will only be able to do it if we can do it rather safely and quickly. It will also delay 1.1.0, as we will need at least another rc just for that.
One downside of using ctime is that chown/chmod -R ... bigfiles/ would make borg chunk/hash all the big files again.
I did some experiments under win10+cygwin (on NTFS):
I also did experiments under native win10 (on NTFS):
I prefer the current behaviour. Using mtime+size+inode is a fairly standard heuristic that works almost all of the time, whereas using ctime is going to cause a lot of unnecessary chunking in many cases. I'm also concerned because some programs that scan filesystems (including some backup programs) explicitly reset the atime after reading a file, and this causes the ctime to update, which would cause borg to rechunk all files. Is it really the case that dpkg updates a file but preserves its length, mtime and inode? I don't recall ever seeing that behaviour, and find it hard to believe. Which package (and version), and which file?
@jdchristensen Are you sure these backup programs really reset the atime (rather than using the NOATIME open mode available to root and to the file's owner, which does not touch the atime when reading the file)? Borg uses the NOATIME open mode if possible.
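For reference, here is a minimal sketch of opening a file without updating its atime, in the spirit of the NOATIME open mode mentioned above. It is not borg's implementation; it assumes Linux semantics, where `os.O_NOATIME` exists and the kernel rejects it with `EPERM` unless the caller owns the file or is privileged, and it falls back to a normal open elsewhere. `open_noatime` and `read_noatime` are hypothetical names.

```python
import errno
import os
import tempfile

# os.O_NOATIME exists only on Linux; elsewhere this is 0 and has no effect.
O_NOATIME = getattr(os, "O_NOATIME", 0)


def open_noatime(path):
    """Open a file read-only, avoiding an atime update where the OS allows it.

    Linux only accepts O_NOATIME from the file's owner or a privileged process;
    anyone else gets EPERM, in which case we retry with a plain open.
    """
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)  # O_BINARY: Windows only
    if O_NOATIME:
        try:
            return os.open(path, flags | O_NOATIME)
        except OSError as e:
            if e.errno != errno.EPERM:
                raise
    return os.open(path, flags)


def read_noatime(path):
    """Read a whole file through open_noatime()."""
    fd = open_noatime(path)
    with os.fdopen(fd, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file")
        with open(path, "wb") as f:
            f.write(b"data")
        print(read_noatime(path))
```

Unlike resetting the atime afterwards, this approach never touches the file's timestamps, so it leaves the ctime alone as well.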
I'm not sure how common it is for atime to be reset, but I've heard of it. For example, tar has this option:
I also recall hearing about indexing programs doing this, perhaps before other approaches were available. In any case, even ignoring such programs, ctime changes for many reasons that don't involve changing the content, while mtime is precisely intended to indicate when the content was last changed, so I think it's the best thing to look at. But I could understand someone wanting ctime as an option.
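The side effect described above can be reproduced with a hypothetical `read_and_restore_atime` helper. It is a sketch assuming POSIX semantics, where restoring timestamps via `os.utime` marks the ctime for update, so a program that carefully preserves atime leaves a fresh ctime behind. The sleep only gives coarse-grained filesystem clocks time to tick.

```python
import os
import tempfile
import time


def read_and_restore_atime(path):
    """Hypothetical helper: read a file, then put its atime (and mtime) back.

    The os.utime() call that restores the timestamps itself updates the ctime.
    """
    st = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
    return data


def demo(pause=0.05):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file")
        with open(path, "wb") as f:
            f.write(b"payload")
        before = os.stat(path)
        time.sleep(pause)  # let coarse-grained filesystem clocks tick
        read_and_restore_atime(path)
        after = os.stat(path)
    return {
        "mtime_same": after.st_mtime_ns == before.st_mtime_ns,
        "atime_same": after.st_atime_ns == before.st_atime_ns,
        "ctime_changed": after.st_ctime_ns != before.st_ctime_ns,
    }


if __name__ == "__main__":
    print(demo())
```

Under a ctime-based files cache, every file processed this way would look changed on the next run even though its content and mtime are untouched.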
In any case, whichever behaviour you implement, please keep an option for the other one. Only I, as a user, know whether I mess around with my mtime or atime or, if I have a "usual" system, can rely on these timestamps. So the cases in which I need one or the other may differ; there is no one-size-fits-all solution here. Also, the hacking argument (i.e. an attacker wanting to prevent some files from getting backed up without anyone noticing) is, IMHO, still valid.
An attacker could use mtime/atime for the same purpose: just reset it, which is far subtler than deleting the file.
Borg could always do an additional check when this is detected, using the file hash, so that no false positives occur.
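A minimal sketch of that two-stage idea, assuming a hypothetical `CacheEntry` record that stores a SHA-256 of each file's content alongside its ctime, size and inode (this is not borg's files cache format). A cheap stat comparison decides most files; only when it reports a difference is the content hashed to tell a metadata-only change from a real one.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Hypothetical files-cache entry, for illustration only."""
    ctime_ns: int
    size: int
    inode: int
    sha256: bytes


def file_sha256(path, bufsize=1 << 20):
    """Hash a file's content in fixed-size blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.digest()


def make_entry(path):
    st = os.stat(path)
    return CacheEntry(st.st_ctime_ns, st.st_size, st.st_ino, file_sha256(path))


def check(path, entry):
    """Return (status, entry); status is 'unchanged', 'metadata-only' or 'changed'."""
    st = os.stat(path)
    if (st.st_ctime_ns, st.st_size, st.st_ino) == (entry.ctime_ns, entry.size, entry.inode):
        return "unchanged", entry
    # The cheap key differs: confirm with a content hash before re-chunking.
    digest = file_sha256(path)
    new_entry = CacheEntry(st.st_ctime_ns, st.st_size, st.st_ino, digest)
    if digest == entry.sha256:
        return "metadata-only", new_entry
    return "changed", new_entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "file")
        with open(path, "wb") as f:
            f.write(b"hello")
        entry = make_entry(path)
        status, entry = check(path, entry)
        print(status)  # unchanged
```

Note that hashing still reads the whole file, so this avoids re-chunking and storing data again, but not the read I/O that the files cache is meant to save.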
…gbackup#911 using ctime is the safer option for a backup tool (see borgbackup#911), but --use-mtime can be given if using mtime is good enough or if there are any issues with ctime on the platform / filesystem.
Summary of Twitter feedback:
What do other backup programs do? Not to say that Borg can't do something different, but it might speak to user expectations and to what the default should be. Research I was able to do on my phone:
Duplicity: mtime
restic: mtime
unison is a directory synchronization program that I've been using on various systems for over 15 years to synchronize all of my files. It uses mtime. rsync uses mtime by default. GNU tar uses mtime when updating an archive, but uses ctime when comparing against a snapshot file.
Note:
sshfs has a client-side cache for various things, including stat() results, so it might give incoherent results over short timespans. This shouldn't be a problem, though, because the default timeout is relatively short and (after the timeout) mtime/ctime get propagated from the server to the client.
From IRC:
The inode number changes, too.
Everybody, please have a look at PR #3024 (and comment either there or here). I'd like to fix this for borg 1.1.0.
vfat / linux:
ntfs / linux (FUSE):
implement files cache mode control, fixes #911
You can now control the files cache mode using this option:

--files-cache={ctime,mtime,size,inode,rechunk,disabled}* (only some combinations are supported)

Previously, only these modes were supported:
- mtime,size,inode (default of borg < 1.1.0rc4)
- mtime,size (by using --ignore-inode)
- disabled (by using --no-files-cache)

Now, you additionally get:
- ctime as an alternative to mtime (safer), e.g.: ctime,size,inode (this is the new default of borg >= 1.1.0rc4)
- rechunk (consider all files as changed, rechunk them)

Deprecated:
- --ignore-inode (use modes without "inode")
- --no-files-cache (use "disabled" mode)

The tests needed some changes:
- previously, we used os.utime() to set a file's mtime (and atime) to specific values, but that does not work for ctime.
- now we use time.sleep() to create the "latest file" that usually does not end up in the files cache (see FAQ).

(cherry picked from commit 5e2de8b)
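To illustrate how a mode value such as ctime,size,inode could translate into a change-detection key, here is a minimal sketch. `parse_files_cache_mode` and `key_function` are hypothetical names, and the combination rules they enforce are assumptions for illustration; the commit message only says that some combinations are supported, without listing them.

```python
import os
from typing import Callable, Optional, Tuple

KNOWN = {"ctime", "mtime", "size", "inode", "rechunk", "disabled"}


def parse_files_cache_mode(spec: str) -> frozenset:
    """Parse a value such as "ctime,size,inode" into a set of mode entries.

    The combination rules below are illustrative assumptions, not borg's list.
    """
    entries = frozenset(part.strip() for part in spec.split(",") if part.strip())
    if not entries:
        raise ValueError("empty files cache mode")
    unknown = entries - KNOWN
    if unknown:
        raise ValueError(f"unknown files cache mode entries: {sorted(unknown)}")
    if "disabled" in entries and len(entries) > 1:
        raise ValueError("'disabled' cannot be combined with other entries")
    if {"ctime", "mtime"} <= entries:
        raise ValueError("use either 'ctime' or 'mtime', not both")
    return entries


def key_function(mode: frozenset) -> Optional[Callable[[os.stat_result], Tuple[int, ...]]]:
    """Return a function that builds the change-detection key from a stat result,
    or None when every file should be treated as changed (disabled / rechunk)."""
    if "disabled" in mode or "rechunk" in mode:
        return None
    getters = {
        "ctime": lambda st: st.st_ctime_ns,
        "mtime": lambda st: st.st_mtime_ns,
        "size": lambda st: st.st_size,
        "inode": lambda st: st.st_ino,
    }
    # Fixed field order, so the same mode always yields comparable keys.
    order = [name for name in ("ctime", "mtime", "size", "inode") if name in mode]
    return lambda st: tuple(getters[name](st) for name in order)


if __name__ == "__main__":
    key = key_function(parse_files_cache_mode("ctime,size,inode"))
    print(key(os.stat(os.getcwd())))
```

Keeping the mode as a set of named fields makes the old behaviours (mtime,size,inode or mtime,size) and the new ctime default just different selections from the same stat result.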
ctime can't be set from userspace, while mtime is easily manipulated.
For most files (e.g. those not extracted from tarballs or installed by a package manager), these two are the same, which limits the impact of the change on existing files caches.
@verygreen noted that in the Windows world, ctime usually means "creation" time, not "change" time.
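Because of that platform difference, code that builds a change-detection key from `os.stat()` may want to pick the timestamp per platform. A minimal sketch under that assumption; `change_time_ns` is a hypothetical helper, and falling back to mtime on Windows is one possible policy, not borg's documented behaviour.

```python
import os
import sys


def change_time_ns(st: os.stat_result) -> int:
    """Best-effort 'last change' timestamp, in nanoseconds.

    On POSIX systems st_ctime_ns is the inode change time. On Windows it has
    traditionally held the creation time, so this sketch falls back to
    st_mtime_ns there (an assumed policy, not a universal rule).
    """
    if sys.platform == "win32":
        return st.st_mtime_ns
    return st.st_ctime_ns


if __name__ == "__main__":
    print(change_time_ns(os.stat(os.getcwd())))
```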