Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "put updates shadow index" #5636

Merged

Conversation

ThomasWaldmann
Copy link
Member

@ThomasWaldmann ThomasWaldmann commented Jan 18, 2021

@timmc found:

"I notice that Repository.put doesn't add to shadow_index -- is that intentional? Not sure if there's an impact on this PR."

See commit comments.

@ThomasWaldmann ThomasWaldmann force-pushed the fix-put-updates-shadow-index branch 2 times, most recently from 5841886 to 4e5cf7b Compare January 18, 2021 21:00
self._delete(id, segment, offset)

def _delete(self, id, segment, offset):
# common code used by put and delete
self.shadow_index.setdefault(id, []).append(segment)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ this line was missing in original put code.

@codecov-io
Copy link

codecov-io commented Jan 18, 2021

Codecov Report

Merging #5636 (920319b) into master (c3df6fc) will decrease coverage by 0.23%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5636      +/-   ##
==========================================
- Coverage   83.09%   82.86%   -0.24%     
==========================================
  Files          38       38              
  Lines       10139    10135       -4     
  Branches     1680     1680              
==========================================
- Hits         8425     8398      -27     
- Misses       1213     1231      +18     
- Partials      501      506       +5     
Impacted Files Coverage Δ
src/borg/repository.py 83.97% <100.00%> (-0.06%) ⬇️
src/borg/platform/__init__.py 72.00% <0.00%> (-12.00%) ⬇️
src/borg/platform/base.py 77.58% <0.00%> (-3.45%) ⬇️
src/borg/helpers/misc.py 80.23% <0.00%> (-1.80%) ⬇️
src/borg/helpers/fs.py 85.33% <0.00%> (-1.34%) ⬇️
src/borg/xattr.py 48.83% <0.00%> (-1.17%) ⬇️
src/borg/locking.py 86.64% <0.00%> (-0.69%) ⬇️
src/borg/archive.py 80.50% <0.00%> (-0.54%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 83116e5...920319b. Read the comment docs.

@ThomasWaldmann
Copy link
Member Author

ThomasWaldmann commented Jan 18, 2021

OK, so looks like we had a bug here for quite a while.

But borg does usually not put something it already has in the repo - usually this would be pointless, except for:

  • chunkid 0 (manifest): updates always get put to id 0
  • borg recreate recompressing chunks: id is based on plaintext, so recompressed content gets put to same id

I just checked if borg does a explicit delete call in these cases: no, it does not. So the bug in put actually bit.

So, even if a user did not use recreate: everybody updates the manifest and the shadow_index was not updated for these.

Consequences?

@ThomasWaldmann
Copy link
Member Author

@enkore please review.

segment, size = self.io.write_delete(id)
self.compact[segment] += size
self.segments.setdefault(segment, 0)
self._delete(id, segment, offset)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth including a comment here indicating that the shadow_index will not get updated without performing an explicit delete here.

I'm not sure what to make of how borg's repository layer allows a PUT to shadow another PUT, but currently never does that (instead using DELETE + PUT). I'm not sure if PUT A, PUT A should be considered a sequence in need of repair.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment.

seems like put del put is also needed for other bookkeeping.
e.g. imagine borg recreate with recompress - data size will usually change here, but chunkid will stay same.

@timmc
Copy link

timmc commented Jan 23, 2021

Would it be sufficient to add to borg check's capabilities and add all but the final manifest to the shadow index?

@ThomasWaldmann
Copy link
Member Author

let's discuss consequences and borg check in #5661.

The shadow_index should be in same state after both of these sequences
(let's assume that A is not in repo yet for simplicity, but it does not matter):

a) explicit delete: put(A), delete(A), put(A), resulting in: PUT A, DEL A, PUT A repo contents

b) implicit delete: put(A), put(A), resulting in: PUT A, DEL A, PUT A repo contents
@ThomasWaldmann ThomasWaldmann force-pushed the fix-put-updates-shadow-index branch from 920319b to f079a83 Compare January 29, 2021 16:05
@ThomasWaldmann ThomasWaldmann merged commit dde13d7 into borgbackup:master Jan 29, 2021
@ThomasWaldmann ThomasWaldmann deleted the fix-put-updates-shadow-index branch January 29, 2021 16:31
@ThomasWaldmann ThomasWaldmann removed the bug label Feb 4, 2021
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this pull request Feb 4, 2021
…5661

A) the compaction code needs the shadow index only for this case:

segment A: PUT x, segment B: DEL x, with A < B  (DEL shadows the PUT).

B) for the following case, we have no shadowing DEL (or rather: it does not matter,
because there is a PUT right after the DEL) and x is in the repo index,
thus the shadow_index is not needed for the special case in the compaction code:

segment A: PUT x, segment B: DEL x PUT x

see also PR borgbackup#5636.

reverts f079a83
and clarifies the code by more comments.

we keep the code deduplication of 5f32b56
and just add a update_shadow_index param to make it not look like there was
something accidentally forgotten, which was the whole reason for the reverted
"fix".
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this pull request Feb 15, 2021
…5661

A) the compaction code needs the shadow index only for this case:

segment A: PUT x, segment B: DEL x, with A < B  (DEL shadows the PUT).

B) for the following case, we have no shadowing DEL (or rather: it does not matter,
because there is a PUT right after the DEL) and x is in the repo index,
thus the shadow_index is not needed for the special case in the compaction code:

segment A: PUT x, segment B: DEL x PUT x

see also PR borgbackup#5636.

reverts f079a83
and clarifies the code by more comments.

we keep the code deduplication of 5f32b56
and just add a update_shadow_index param to make it not look like there was
something accidentally forgotten, which was the whole reason for the reverted
"fix".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants