Partition remapper (draft)
The remapper provides a way to non-destructively migrate from one partition layout to another without relying on external storage, networking, or enough RAM to hold the complete set of data.
Usually, if you want to re-partition your disk or change the filesystem, you need to move your data to another storage device, write the new partition table, create the new filesystems and move the data back onto them.
For nixos-assimilate we need to write the Nix store to the existing filesystem, reboot into the kernel of the new configuration, switch to a new partitioning schema and continue booting from the previously written Nix store. Of course, we could just provide a bare-minimum configuration expression that is built after re-partitioning, but that would introduce quite a lot of additional complexity and would be error-prone, particularly if there are networking failures.
A further advantage is that we can not only write the Nix store but also retain existing data, which could be useful in order to roll back to a previously working state.
The partition remapper consists of two virtual block devices that are backed by the real disk, so we have three devices:
- Virtual source device
- Virtual destination device
- Physical target device
Firstly, the source device behaves exactly the same way as the physical target device, so we can properly mount the filesystem(s) read-only on top of it. The reason for it to initially behave the same way (i.e., it's read-write and writes go to the actual physical target device) is that some filesystems need to write to disk even during a read-only mount.
Secondly, we set the source device to read-only as well, at which point the destination device is ready to receive the data.
The destination device maps the new partitioning layout and has a caching mechanism which only writes data to a block of the physical device once that block's old contents have been successfully read from the source device and thus are no longer needed.
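To make the behaviour of the two virtual devices more concrete, here is a minimal sketch in Python. The class and attribute names are made up for illustration, the translation of the new layout to physical positions is left out, and the real remapper would of course live at the block device level rather than in Python:

```python
BLOCK_SIZE = 4096


class PhysicalTarget:
    """Stands in for the real disk; reads and writes whole blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def read(self, idx):
        return self.blocks[idx]

    def write(self, idx, data):
        self.blocks[idx] = data


class VirtualSource:
    """Initially behaves exactly like the physical device (read-write), so
    filesystems that write even during a read-only mount keep working; it
    is switched to read-only once the mounts are in place."""

    def __init__(self, target):
        self.target = target
        self.read_only = False
        self.blocks_read = set()          # blocks already handed out

    def read(self, idx):
        self.blocks_read.add(idx)
        return self.target.read(idx)

    def write(self, idx, data):
        if self.read_only:
            raise PermissionError("source device is read-only")
        self.target.write(idx, data)


class VirtualDestination:
    """Maps the new layout; a block is only written to the physical device
    once its old contents have been read from the source device and are
    therefore no longer needed."""

    def __init__(self, target, source):
        self.target = target
        self.source = source
        self.cache = {}                   # block index -> data held in RAM

    def write(self, idx, data):
        if idx in self.source.blocks_read:
            self.target.write(idx, data)  # safe: old contents already read
        else:
            self.cache[idx] = data        # not safe yet, keep in the cache
        self._flush()

    def _flush(self):
        # Retry cached writes whose old contents have been read by now.
        for idx in [i for i in self.cache if i in self.source.blocks_read]:
            self.target.write(idx, self.cache.pop(idx))
```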
The above figure shows a device consisting of 5 blocks which should be remapped like this:
- Block 1 should go to block 4
- Block 2 should go to block 3
- Block 3 should go to block 1
- Block 4 should go to block 5
- Block 5 should go to block 2
And these are the detailed steps to do the transformation:
- Move block 1 into the cache (because block 4 wasn't transferred yet).
- Move block 2 into the cache (because block 3 wasn't transferred yet).
- Move block 3 to destination block 1 (because we already have block 1 in the cache) and move cached block 2 to destination block 3.
- Move block 4 into the cache (because block 5 wasn't transferred yet); since block 4 has now been read, we can move cached block 1 to destination block 4.
- Move block 5 to block 2 (it was already read in step 2) and move the remaining cached block 4 to destination block 5.
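The following sketch reproduces these steps for the example mapping above. The function name and the data representation (a plain list standing in for the disk) are purely illustrative, not the actual implementation:

```python
def remap_in_place(disk, remap):
    """Remap `disk` (a list of block contents, block i at disk[i - 1])
    according to `remap` (source block -> destination block), using only
    a small in-RAM cache. Illustrative only."""
    cache = {}           # source block number -> data waiting in RAM
    transferred = set()  # source blocks that have already been read

    for src in sorted(remap):
        data = disk[src - 1]
        transferred.add(src)
        dst = remap[src]

        if dst in transferred:
            # The destination's old contents were already read, so it is
            # safe to overwrite them directly (as with block 3 above).
            disk[dst - 1] = data
        else:
            # The destination still holds unread data, keep ours in RAM.
            cache[src] = data

        # Drain any cached blocks whose destination has become safe.
        for s in [s for s in cache if remap[s] in transferred]:
            disk[remap[s] - 1] = cache.pop(s)

    assert not cache     # every block must have reached the disk
    return disk


blocks = ["b1", "b2", "b3", "b4", "b5"]
print(remap_in_place(blocks, {1: 4, 2: 3, 3: 1, 4: 5, 5: 2}))
# -> ['b3', 'b5', 'b2', 'b1', 'b4'], i.e. block 1 ended up at position 4,
#    block 2 at position 3, and so on, matching the mapping above.
```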
Of course, this method has a few problems which need to be solved before we can implement it:
Especially for filesystems based on B-trees, metadata could be visited more than once; we also need to avoid hard links or other means of deduplication. We could solve this by hooking into the VFS and only working at the extent level, but it's not clear how well this would work with LVM. So let's investigate that and/or maybe find a better solution.
If the RAM available for our cache is quite small and there is too much overlap between the old and the new layout, we could run out of memory. A way to mitigate this would be to temporarily relocate the cached data to an area of the physical disk that has already been read, where we can be sure that it won't corrupt the mounted source filesystem(s).
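A rough sketch of how such scratch blocks could be chosen (this is only an assumption about the mitigation, not something the draft specifies): a physical block qualifies if its old contents have already been read from the source and its new contents have not been written yet; anything parked there has to be moved again before the block's final contents arrive.

```python
def scratch_candidates(num_blocks, transferred, written, parked):
    """Blocks that can temporarily hold spilled cache data: already read
    from the source (so the source filesystem no longer needs them) and
    not yet overwritten with data for the new layout."""
    return [b for b in range(1, num_blocks + 1)
            if b in transferred and b not in written and b not in parked]
```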
One example that comes to mind is a LUKS container, where it would make sense to randomize the data before formatting. In this particular case we could write the random data afterwards, because we already know which blocks are used by the new filesystem. However, there could be other cases, so let's check that first.