
7. Using MKG for backups

fabnicol edited this page May 21, 2021 · 54 revisions

MKG as a backup tool

Although MKG has been written to quickstart a bootstrapped Gentoo platform, it can also be used as a backup tool.
The overall concept is flexible and should be adapted to particular user needs, yet roughly runs as follows:

  • backing up user data is a fairly easy task that is left to the user's responsibility (a simple rsync script covers most needs)
  • backing up the operating system (system-wide binaries, configuration files and bootloaders) is harder and rarely feasible while the platform is running (so-called hot backups, which some commercial software boasts of performing). When it is feasible, it requires a lot of disk space and incurs computing overhead for compression and I/O operations.
  • MKG offers a simple alternative along the following lines for OS backups, which relies on the combination of the KISS principle and a benign neglect approach:
    • do not care about bootloaders and (most) configuration files, as they will be automatically recovered with standard specifications,
    • save a user-defined, flexible list of critical configuration files,
    • simplify the backup scheme so that backups can be performed in real time,
    • save your ebuild list as a core backup target.

It is up to users to adapt these principles to their needs. Below is the core of the backup script I personally use:

#!/bin/bash
# Before starting this, it is better to sync-update-and-clean the package tree
# to avoid inconsistencies.
# Do this rather manually than scripting:
# emerge --sync
# emerge -auDN --with-bdeps=y @world
# emerge --ask --depclean
# revdep-rebuild
#
updatedb
qlist -I | uniq > packages
# If you want to save package versioning, use: -Iv
for i in conf json yaml xml rc # [other formats here]
do
  locate -b "*.$i"
done | sed -r '/\/(run|dev|proc|sys|tmp|var\/tmp)/d' > config.list
# Adapt this if you use systemd rather than OpenRC as your init system
echo '/etc/init.d' >> config.list
# You may want to save all of /etc in the above line.
# Another option is to save /etc using a local Git repo.
# You should tweak this to gdm or lightdm etc.,
# depending on your display manager
echo '/usr/share/sddm' >> config.list
# [At this stage I fine-tune config.list. ]
tar --xattrs -cJvf config.tar.xz -T config.list
tar --xattrs -cpJvf bash.tar.xz /etc/bash /etc/profile ~/.bash*
tar --xattrs -cpJvf portage.tar.xz /etc/portage
tar --xattrs -cpJvf kernel.tar.xz /boot/config*

And that's about it. Backed-up data consists only of plain text files, with an average compressed size of 10 MB.
Backups can thus be run frequently through a cron/rsync job with almost no overhead or disk space usage. The recovery procedure then runs as follows:

  • create a USB key installer (I use a high-transfer rate stick):
    # ./mkg [...] device_installer ext_device=...

or download a binary release from the Release section of this site and copy it with dd as indicated there.

  • boot your computer to the USB recovery medium. Reboot after the cloning has ended (about 3 minutes).
  • once logged in to Gentoo, connect another USB stick holding the tarballs and extract the config, portage, kernel and bash tarballs into their original locations. Copy the packages file to your home directory (about 2-3 minutes).
    Baseline recovery takes about 6 minutes in total using quality USB sticks and an SSD main disk. Now:
  • reboot and run in your home:

tar --xattrs -xpJvf portage.tar.xz
# reset your portage specs to previous
rsync -avr etc/ /etc
# reset your kernel to previous
tar --xattrs -xpJvf kernel.tar.xz
rsync -avr boot/ /boot
emerge gentoo-sources
# or your previous kernel version if not latest stable, see config files in kernel.tar.xz:
# emerge =sys-kernel/gentoo-sources-(version)
eselect kernel set 1 # or 2 if restoring a newer kernel
cd /usr/src/linux && cp /boot/config*[your kernel version] .config && make syncconfig
make modules_prepare && cd -
# only if you had overlays:
# emerge app-portage/layman
# layman -S
# layman -a (your overlays)
emerge -auDN --keep-going --with-bdeps=y $(cat packages)

or, if you saved package versioning:

# emerge -auDN --keep-going --with-bdeps=y $(while read pack; do echo =${pack}; done < packages)

In the event of a failure, you should try this first:
# emerge --resume
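
If the merge stops on a build error, the resume step can be wrapped in a small helper. The sketch below is only one way to script it and is not part of the original procedure: `merge_or_resume` is a hypothetical name, and `--skipfirst` (a standard emerge option, valid only together with `--resume`) drops the package that failed so the rest of the list can proceed.

```shell
#!/bin/bash
# Hypothetical helper: run a merge command and fall back to
# "emerge --resume" if it fails. If resuming fails too, skip the
# package that broke the build and carry on with the remainder.
merge_or_resume() {
  "$@" && return 0
  emerge --resume && return 0
  emerge --resume --skipfirst
}

# Example (as root, from the directory holding the packages file):
# merge_or_resume emerge -uDN --keep-going --with-bdeps=y $(cat packages)
```

Any package skipped this way still has to be dealt with by hand afterwards.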

The time taken by the portage merge step varies, as it depends on how much software you installed on top of the baseline MKG Gentoo distribution.
It typically takes about 2 to 5 hours starting from the more complete version of the distribution.
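
Once the merge has finished, it may be worth checking that everything in the packages file was actually reinstalled. The following is a minimal sketch, assuming both lists use the same `qlist` format (with or without versions); `missing_packages` and the `installed.now` file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical check: print every line of the saved list ($1) that is
# absent from the list of currently installed packages ($2).
# -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
# grep exits with 1 when nothing is missing; that case is mapped to success.
missing_packages() {
  grep -Fxv -f "$2" "$1" || [ $? -eq 1 ]
}

# Example:
# qlist -Iv > installed.now   # or qlist -I, matching the saved list
# missing_packages packages installed.now
```

An empty output means every saved entry is present on the restored system.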

If your kernel version was more recent than that of the MKG distribution, you will have to rebuild it, which is a simple task (please consult the Gentoo installation manual again). In a nutshell, with administrative rights:

cd /usr/src/linux
make -j4 && make modules_install # change 4 into the number of assigned cores
rm /boot/vmlinuz* && rm /boot/initramfs* # in case you are short of space under /boot
make install
make clean
genkernel --install initramfs
grub-mkconfig -o /boot/grub/grub.cfg

Now reboot. You can now proceed with user data restoration using your rsync procedure of choice.
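
For the user data step, a sketch of an rsync-based restore could look as follows. The function name and the backup path in the example are assumptions, not part of MKG; adapt them to wherever your own rsync backups live.

```shell
#!/bin/bash
# Hypothetical helper: copy a saved directory ($1) back in place ($2).
# -a preserves permissions, ownership, timestamps and symlinks.
# The trailing slashes make rsync copy the contents of the source
# directory rather than the directory itself.
restore_user_data() {
  src="${1%/}"
  dest="${2%/}"
  rsync -a "$src/" "$dest/"
}

# Example, with /mnt/backup/home/alice as an assumed backup location:
# restore_user_data /mnt/backup/home/alice /home/alice
```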


Warning

  1. Take extra care when restoring configuration files, especially from /etc. You should always restore the full kernel and software packages before restoring your previous /etc directory (except for portage config files). Note that your previous passwords will also be restored in this case. Tricky issues may arise if the kernel or critical packages cannot be restored at the same versions as in your backup. This is why it is preferable to back up config files and package lists on a daily basis. Do not restore /etc if you are not in a position to restore all critical system packages at their original versions.
    In any case, do not forget to exclude /etc/fstab from your backup, as copying it back may cause boot failures. This is an acknowledged limitation of the present approach: the partition layout is not backed up.

  2. Restoring /opt and /usr/local is less risky. You should restore at least /usr/local after you have completely updated your @world set as indicated above, to avoid potential environment or versioning mismatches between similar software under /usr/local/ and /usr. Run ldconfig after this optional step.
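
The /etc/fstab exclusion from item 1 can be applied to config.list before the archive is created. This is a minimal sketch under the assumption that config.list holds one path per line, as in the backup script; `drop_fstab` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical filter: read a file list on stdin and drop /etc/fstab,
# so that it is never archived and never copied back.
# -x matches whole lines only, -F treats the pattern as a fixed string.
# grep exits with 1 when no line is left; that case is mapped to success.
drop_fstab() {
  grep -vxF '/etc/fstab' || [ $? -eq 1 ]
}

# Example: rewrite config.list before calling tar.
# drop_fstab < config.list > config.list.tmp && mv config.list.tmp config.list
```

Note that this only helps when /etc/fstab appears as its own entry. If the list contains /etc as a whole directory, the filter has no effect, and GNU tar's `--exclude` option would be the place to handle it.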


Much of the above code can be put together into a custom install script that will also fine-tune how configuration files are reinstalled (this cannot be generalized, as it is platform- or user-specific).
This procedure has been tested numerous times and is robust overall. Expect occasional rough edges though: you may have to resolve a small number of portage conflicts or USE flag/keyword issues, which usually reflect undetected inconsistencies in your original package tree.
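
As an illustration of such a custom script, here is a hypothetical skeleton, not the author's own. It announces each step and stops at the first failure, so that conflicts can be resolved by hand before continuing. `run_step` and `restore_all` are made-up names, and only the portage and package steps from above are shown; the kernel steps would be added in the same way.

```shell
#!/bin/bash
# Hypothetical skeleton for a restore script.
# run_step DESCRIPTION COMMAND [ARGS...]
# prints the description, runs the command, and reports a failure.
run_step() {
  desc=$1; shift
  printf '==> %s\n' "$desc"
  "$@" || { printf 'FAILED: %s\n' "$desc" >&2; return 1; }
}

# Chain the steps with && so that nothing runs after a failed step.
# Run as root from the directory holding the tarballs and packages file.
restore_all() {
  run_step "unpack portage config" tar --xattrs -xpJf portage.tar.xz &&
  run_step "restore portage specs" rsync -a etc/ /etc &&
  run_step "reinstall packages" emerge -uDN --keep-going --with-bdeps=y $(cat packages)
}
```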

Environmental issues

All source-based OS distributions, not only Gentoo or Arch but also the BSD family, have an environmental weak point: building is a power-hungry operation that consumes about 1 to 4 kWh for the baseline configuration. On top of this price tag, extra time and power must be added for user-installed software. As far as power expenses are concerned, source-based operating systems therefore seem to compare unfavorably to standard binary distributions, which cheaply duplicate and deploy their builds once they have been created.
My personal take on this issue is that all environmental footprints should be taken into consideration in the lifetime of a platform.
The replacement of power-demanding, conventional backups to hard disk with the very lightweight, energy-saving procedure outlined above more than makes up for the extra power required by source code compilation, at least if users refrain from overbuying backup disk space and instead use optical media or cheap USB sticks for critical text file backups.
Over the lifetime of a Gentoo platform, the environmental footprint of the disk hardware saved (including the power needed to keep it running) should be greater than the extra power costs incurred, by an order of magnitude if the saved footprint is evaluated based on the market value of disk hardware compared to electric power. The comparison should be even more favorable in the long run, with the expected rise in renewable electric power production.