-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replication received_uuid blocker re snap to share promotion #2902
Comments
In the same forum thread, I documented a PoC with the suggested change:
changing: rockstor-core/src/rockstor/fs/btrfs.py Lines 2311 to 2314 in 1ddcf4b
to:
which resulted in successful replications beyond the usual failure point |
N.B. I have now observed this failure with quotas enabled (on receiving system): Reproduced with Rockstor 5.0.14-0 Leap 15.6 send & receive instances:
With the following qgroup details (receiving system):
N.B. in this reproducer instance there is no 2015 (rockstor) parent qgroup assignment. Only that of the default 0 group. |
@Hooverdan96 My previous comment reproducer details were observed with a trivial data set. This may well explain seeing this error while quotas are enabled. I'll continue with this issue while I have a reproducer and then look to the quotas related blocker that likely proceeds this issue when there is an actual real-life data payload. |
Likely pertinent historical reference from btrfs mailing list: https://www.spinics.net/lists/linux-btrfs/msg69951.html |
Notes on first 3 replication received subvol properties: 1stSend endNo longer available in reproducer systems as the oldest snapshot in replication is deleted. Receive endN.B. As this is the first replication event: this subvol has no parent.
2nd:Send end
Receive endN.B. This subvol has the first (1st above) as it's parent UUID subvol. Send receive working on sending differences between two subvols.
3rdSend end
Receive endN.B. In turn, this 3rd subvol has as its parent UUID the above 2nd subvol.
Original (Sending) source share infoHaving stopped the sending replication: to catch the final state of this replication failure reproducer we have the original source (sending side) share we were replicating showing up as follows:
|
@Hooverdan96 I'm just working through our options here, but remember that we already make allowances for our approach, I.e. the cascade of snapshots. We purposefully do not touch a 'live' receiving snapshot. And such a change is way too large for this late in the testing phase. But our code is such that we could look to improvements later. But not just yet I think. Still working on this one. But we do already account for this sensitivity: we were just not actually warned against what we do before hand. And that warning pertains to if the subvol we are modifyting was still involved in a send/receive. My understanding is that is is not: due to our precautions re the cascade sends. |
@Hooverdan96 Also note that a clone in btrfs speak is a little different to our clones. Here, as far as my understanding goes, we already maintain upstream advice via our snapshot cascade: and sending the differences. We send differences between ro snapshots only. The cascade then allows for us to do our 'repclone' (snap-to-share-supplant) which is to supplant a share with a snapshot. There-by updating the user-visible replication share. A snapshot is actually a clone (mostly instantaneous), and we already do this as part of our send/receive wrapper. It's where all the complexity comes from: and the purpose of our cascade in the first place. Incidentally we use to use 5 snapshot !!! But I changed it to 3 a few years ago. 5 really tended to confuse folks and could take a very long time to end up with results folks expected: an actual Share at the receiving end :) . We will have to enact some good technical docs for this whole process as I have to re-learn each time I look at it. But I think we have a good design of our own: it's just poorly documented for both us and the general users! Pretty sure we are good to go with your suggested force here: and didn't see a reference for removing a sending uuid. |
…#2902 When promoting the oldest of the 3 read-only snapshots received & retained by the replication service (btrfs send/receive wrapper), use the force flag during ro-to-rw/snap-to-share transition. At the time of this transition, this received subvol is no longer used for comparison in all future replication (btrfs send/receive) events. It represents an older version of the sending systems associated replication source share. Necessarily older by way of the constraints of the btrfs send/receive architecture, and the safeguards of the replication wrapper: a cascade of ro snapshots.
Yes, that explanation makes sense in the cloning context. And the point being that the third of these cascading snapshots will not be changed in between setting the read-write flag and it being promoted to share. |
…d-blocker-re-snap-to-share-promotion replication received_uuid blocker re snap to share promotion #2902
Closing as: |
As observed in the scenario described on the Rockstor community forum (users stevek, Hooverdan, phillxnet), when quotas are NOT enabled on the receiving system, it can happen that a snapshot cannot be promoted because the system fails to set the read-write (rw) property. In this scenario the receiving system was running Rockstor on OpenSUSE Tumbleweed.
https://forum.rockstor.com/t/disk-structure-under-mnt2-and-replication-question/9720/21
The resulting error message implies that using the
-f
(force) flag will allow the property setting.[EDIT by phillxnet] A dependency regarding reproducer systems: believed to pertain to Leap 15.6 / TW receiver side systems. Where a jump in kernel and btrfs was observed: container newer safe-guards that have lead to this
-f
requirement. See now associated and merged PR referenced below in comments.The text was updated successfully, but these errors were encountered: