Allow --files-cache=size #5686

Closed
sebsauvage opened this issue Feb 12, 2021 · 13 comments · Fixed by #5831

Comments

@sebsauvage

Have you checked borgbackup docs, FAQ, and open Github issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

Feature suggestion

Describe the problem you're observing.

There are use cases where using size alone for file change detection makes sense.

rsync supports this feature (--size-only), and it would be convenient to have it in borg too (--files-cache=size)

@ThomasWaldmann
Member

Can you tell us more about the use cases where size-only makes sense?

@sebsauvage
Author

sebsauvage commented Feb 13, 2021

When backing up a dataset where filename+size guarantees uniqueness (e.g. photos taken by a smartphone).
Useful on storage where dates are unreliable (because they are changed by an external process, or the fs does not report date/time reliably).

Example: Backing up photos from a smartphone accessed via FUSE filesystem with unreliable dates.

Without this feature, borg would detect a change because date/time is different and re-read the file, which can be costly in bandwidth/time.
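The difference between the two detection strategies can be sketched as follows. This is a minimal illustration, not borg's files cache: `CacheEntry`, `snapshot`, `looks_unchanged`, and the mode strings `"ms"` and `"s"` are hypothetical names chosen for this example, and the unreliable timestamp is simulated with `os.utime`.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical record of what was seen at the previous backup."""
    size: int
    mtime_ns: int


def snapshot(path):
    """Record the size and mtime currently reported for path."""
    st = os.stat(path)
    return CacheEntry(size=st.st_size, mtime_ns=st.st_mtime_ns)


def looks_unchanged(entry, path, mode):
    """Return True if the file could be skipped without re-reading it.

    mode is "ms" (compare mtime and size) or "s" (compare size only).
    """
    st = os.stat(path)
    if st.st_size != entry.size:
        return False
    if "m" in mode and st.st_mtime_ns != entry.mtime_ns:
        return False
    return True


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "IMG_0001.jpg")
        with open(path, "wb") as f:
            f.write(b"\xff\xd8" + b"x" * 1000)
        entry = snapshot(path)
        # Simulate a filesystem that reports a different timestamp later on:
        # shift atime and mtime by ten seconds, leave the content untouched.
        shifted = entry.mtime_ns + 10 * 10**9
        os.utime(path, ns=(shifted, shifted))
        return looks_unchanged(entry, path, "ms"), looks_unchanged(entry, path, "s")
```

Under these assumptions, `demo()` reports the file as changed when mtime is compared and as unchanged when only the size is compared, which is the case where a re-read over a slow link would be avoided.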

@ThomasWaldmann
Member

Well, in the given use case it might usually be sufficient, but not always.

Imagine you take a photo and your photo app decides to put a timestamp and GPS coordinates into the metadata.

You make a backup of this, but later you decide to use some special software to remove such metadata from the file and the software just overwrites this information "in-place" with some fake date and fake gps coords. The file size will not change.

Then you make a backup again and the modified file will not be detected as modified and will be silently skipped and you won't have a backup of it.
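The scenario above can be reproduced with a small sketch. The file layout here is invented for illustration (a fake header, a fixed-width "GPS" field, and filler bytes), and `overwrite_in_place` and `digest` are hypothetical helpers. It shows that a same-length in-place overwrite changes the content hash while the size stays identical.

```python
import hashlib
import os
import tempfile


def overwrite_in_place(path, offset, new_bytes):
    """Overwrite bytes at offset without changing the file length."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)


def digest(path):
    """Return the SHA-256 hex digest of the file content."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "photo.bin")
        with open(path, "wb") as f:
            f.write(b"HEADER" + b"GPS:48.8566,2.3522" + b"PIXELS" * 100)
        size_before, hash_before = os.path.getsize(path), digest(path)
        # Replace the fake GPS field with a fake value of the same length.
        overwrite_in_place(path, len(b"HEADER"), b"GPS:00.0000,0.0000")
        size_after, hash_after = os.path.getsize(path), digest(path)
        return size_before == size_after, hash_before != hash_after
```

Here `demo()` returns `(True, True)`: the size is unchanged but the content differs, so a size-only comparison would skip the modified file.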

@ThomasWaldmann
Member

Also, in case there is some other file type in the same directory as your photos, like some fixed-size records database from some photo software, any modification of the db records will not trigger a fresh backup of the db file (only if the db file size changes e.g. by adding/removing records).

@sebsauvage
Author

Indeed, files could be modified in-place and keep the same size, hence not be detected as modified.
But this may be irrelevant for the dataset or the intentions of the user. It may even be the desired behaviour.

On datasets where files are: 1) only added and 2) remote, allowing the use of size-only detection would save a lot of bandwidth and time, because the file would not have to be re-transferred (e.g. MTP or FUSE on a slow remote fs as source).

@ThomasWaldmann
Member

OK, we can add this as a (non-default) option.

There also needs to be some docs about it warning users not to use this except when they specifically know that it will work for them. "if it breaks, you will own the parts."

OTOH, maybe users wanting to use this rather need a bugfix in the filesystem they use: having either a valid ctime or mtime should be something a user can expect from a filesystem.

@sebsauvage
Author

Thank you!

I totally agree about the warnings.

@Swanand01
Contributor

Hi, I am a beginner contributor and I'd like to take this up. Please brief me on the changes to be made.

@ThomasWaldmann
Member

A good starting point in the code is src/borg/archiver.py - the do_create function there implements the borg create command.

Also, you can search for files-cache in the file to find the options parsers for that (and the internally used attribute name set by the parser).

Then just navigate the source and use your global search function to find all places dealing with that.

@Swanand01
Contributor

Hello @ThomasWaldmann, I did as you said and looked for files-cache in archiver.py.

    rc = self.do_create(self.parse_args(['create', compression, '--files-cache=disabled', archive + '1', path]))

    fs_group.add_argument('--files-cache', metavar='MODE', dest='files_cache_mode',
                          type=FilesCacheMode, default=DEFAULT_FILES_CACHE_MODE_UI,
                          help='operate files cache in MODE. default: %s' % DEFAULT_FILES_CACHE_MODE_UI)

As far as I understand, we want the user to be able to pass --files-cache=size when using the borg create command. However, I can't work out which changes need to be made.
Please don't mind, I am a beginner.
Thanks.

@ThomasWaldmann
Member

ThomasWaldmann commented May 14, 2021 via email

@Swanand01
Contributor

Swanand01 commented May 15, 2021

Hi @ThomasWaldmann, I've looked up the FilesCacheMode() function.
VALID_MODES = ('cis', 'ims', 'cs', 'ms', 'cr', 'mr', 'd')
If I'm not mistaken, for the user to be able to set --files-cache=size, 's' will have to be added to VALID_MODES?

Also,

    '--files-cache')
        local files_cache_mode="ctime,size,inode mtime,size,inode ctime,size mtime,size rechunk,ctime rechunk,mtime disabled"
        COMPREPLY=( $(compgen -W "${files_cache_mode}" -- ${cur}) )
        return 0
        ;;

and

    set -l files_cache_mode "ctime,size,inode mtime,size,inode ctime,size mtime,size rechunk,ctime rechunk,mtime disabled"

will have to be extended with 'size'.

@ThomasWaldmann
Member

ThomasWaldmann commented May 15, 2021

Yes, add 's' to VALID_MODES. Not sure if we should change the autocompletions; the use case for size-only is very special.

Also, please check if changes at other places are required, just check all places using this stuff.
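As a rough sketch of what the change discussed above amounts to, the validator below accepts a comma-separated mode string and reduces it to sorted one-letter codes. Only `VALID_MODES` and the long option names come from this thread; the function name `parse_files_cache_mode`, the `ENTRIES_MAP` mapping, and the error messages are assumptions made for illustration and may not match borg's actual code.

```python
# Long option names mapped to one-letter codes (illustrative mapping,
# inferred from the mode strings quoted in the thread).
ENTRIES_MAP = {
    "ctime": "c",
    "mtime": "m",
    "size": "s",
    "inode": "i",
    "rechunk": "r",
    "disabled": "d",
}

# The modes quoted in the thread, plus the proposed size-only mode "s".
VALID_MODES = ("cis", "ims", "cs", "ms", "cr", "mr", "d", "s")


def parse_files_cache_mode(value):
    """Turn e.g. "ctime,size,inode" into "cis", rejecting unknown combinations."""
    entries = set(value.strip().split(","))
    unknown = entries - set(ENTRIES_MAP)
    if unknown:
        raise ValueError("unknown cache mode entries: %s" % ",".join(sorted(unknown)))
    # Sort the letters so the order typed by the user does not matter.
    mode = "".join(sorted(ENTRIES_MAP[entry] for entry in entries))
    if mode not in VALID_MODES:
        raise ValueError("cache mode must be one of: %s" % ",".join(VALID_MODES))
    return mode
```

With `"s"` in `VALID_MODES`, `parse_files_cache_mode("size")` returns `"s"`. Without it, the same call would raise `ValueError`, which is why `--files-cache=size` was rejected before the change.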
