
Due to deprecation of file-based pools, some of us need a ZFS qvm-pool backend #7009

Closed
Rudd-O opened this issue Oct 26, 2021 · 18 comments
Labels: bounty · C: storage · help wanted · P: default

Comments

Rudd-O commented Oct 26, 2021

The problem you're addressing (if any)

Some of us use a file system that is not a stock Fedora file system, and are not willing to trust our data to other file systems. In this particular case, we are talking about ZFS.

So far -- albeit complicated to use -- the file-based pool system has worked fine for us. This is going to get deprecated in Qubes OS 4.1 and will be removed in the next release.

This leaves the above-mentioned class of Qubes OS users without a working solution for backing our qubes with disk storage.

The solution you'd like

I would like to see a ZFS-based qvm-pool backend, preferably one that uses a tree of ZFS volumes (zvols) arranged similarly to the LVM layout. Cloning of VMs could be done extremely quickly using ZFS snapshots and clones, and ephemeral file systems -- such as the root file system in standard AppVMs -- can be supported in the same way. Due to the way ZFS works, it is likely that no loop devices or device-mapper mappings would need to be manipulated at all, which also simplifies how the backend is implemented.
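
To make the idea concrete, here is a minimal sketch of what such a backend could look like, assuming a hypothetical rpool/qubes dataset tree and shelling out to the zfs CLI. Dataset names, sizes, and function names are illustrative only, not the actual driver API:

```python
import subprocess

POOL = "rpool/qubes"  # hypothetical parent dataset, one sub-tree per VM

def zfs(*args):
    # The driver would run inside qubesd (as root), so no sudo or zfs-allow needed.
    subprocess.run(["zfs", *args], check=True)

def create_volume(vm, volume, size="10G"):
    # One zvol per qvm-pool volume; the VM gets /dev/zvol/<dataset> as its disk.
    zfs("create", "-p", "-V", size, f"{POOL}/{vm}/{volume}")

def clone_volume(src_vm, dst_vm, volume):
    # Cloning a VM is a snapshot plus a clone: constant time, no data copied.
    snap = f"{POOL}/{src_vm}/{volume}@clone-{dst_vm}"
    zfs("snapshot", snap)
    zfs("clone", "-p", snap, f"{POOL}/{dst_vm}/{volume}")
```

Ephemeral volumes, such as an AppVM's root, could likewise be cheap clones of a template snapshot that are destroyed on shutdown.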

(Nota bene: this effort would be greatly helped by documentation on how to implement new storage drivers.)

The value to a user, and who that user might be

The class of users is well-defined, although we understand that we are in a minority.

  • Superior data reliability and trustworthiness.
  • Higher performance when combining rotational and solid-state storage devices (ARC / ZIL).
  • Ability to perform snapshot sends and receives.
  • Optionally, automatic snapshots allowing user-configurable revert of VMs to previous states.
  • Given careful design, certain VMs could require an encryption key (specific to a particular ZFS volume) in order to boot.
  • TRIM works by default and autotrim can be turned on -- all that is required from the VM itself is that discard is used as a mount option. Even when that cannot be arranged, filling the VM's disks with zeroes still reclaims an enormous amount of space thanks to ZLE compression. (A rough sketch of these ZFS knobs follows this list.)
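
A rough sketch of the ZFS knobs behind the points above. The pool and dataset names (rpool, rpool/qubes, rpool/qubes/vault) are hypothetical, and this is plain zfs/zpool usage run as root in dom0, not anything a driver would have to ship:

```python
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

run("zpool", "set", "autotrim=on", "rpool")              # TRIM freed blocks automatically
run("zfs", "set", "compression=zle", "rpool/qubes")      # zero-filled regions take no space
run("zfs", "create", "-o", "encryption=on",              # per-dataset key; prompts for a
    "-o", "keyformat=passphrase", "rpool/qubes/vault")   # passphrase interactively
run("zfs", "snapshot", "-r", "rpool/qubes@daily")        # basis for zfs send / receive
```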

I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver -- installable as a package -- that I can test and verify myself, hopefully with the end goal of shepherding it into the main Qubes OS distribution at a later release.

Thanks in advance.

@Rudd-O Rudd-O added P: default Priority: default. Default priority for new issues, to be replaced given sufficient information. T: enhancement labels Oct 26, 2021
@DemiMarie

documentation on how to implement new storage drivers

Please make a separate issue for this unless one already exists.

Rudd-O commented Oct 26, 2021

Will do later.

Rudd-O commented Oct 26, 2021

Linking this here, as that work may be useful:

cfcs/qubes-storage-zfs#2

@andrewdavidwong andrewdavidwong added this to the Release TBD milestone Oct 26, 2021
@rustybird

I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver

Have you considered funding openzfs/zfs#405? It would make the file-reflink Qubes storage driver compatible with ZFS.

@DemiMarie

I would be willing to commit roughly 0.25 BTC to funding the work, contingent on the delivery of an optional driver

Have you considered funding openzfs/zfs#405? It would make the file-reflink Qubes storage driver compatible with ZFS.

It would also be incredibly useful in other areas, not just Qubes OS.

Rudd-O commented Oct 26, 2021

I considered it and rejected the idea at least for now:

  • It's more complex to get right, which is why it has been sitting in review for a while.
  • I would prefer a more native solution that directly uses one ZFS volume per qvm-pool volume, making reversion of a single VM much faster and simpler than the current process (clone the snapshot, mount the clone, delete the existing volume files from the VM dir, copy the old volume files into the VM dir, unmount the clone, destroy the clone).

ZFS volumes are the right abstraction level for this: what the VM wants is block devices, and what ZFS volumes (zvols) provide is block devices. The fact that the file driver currently has to perform a convoluted device-mapper / loopback dance to get things working is unfortunate. That complexity would go away with a ZFS-specific, zvol-based driver.
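
As a minimal illustration (hypothetical dataset naming, not the actual storage API), the block device a zvol-backed volume hands to the VM is just a path the kernel already provides:

```python
def block_device(vm_name: str, volume_name: str) -> str:
    # No losetup, no dmsetup: the kernel exposes every zvol under /dev/zvol/.
    return f"/dev/zvol/rpool/qubes/{vm_name}/{volume_name}"
```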

I agree with the premise that enabling reflink to work in ZFS would be a more fundamental and more generally beneficial change. I don't want the availability of that feature to block progress on this proposal.

@rustybird

But those limitations of the old 'file' driver don't apply to the 'file-reflink' driver. The latter doesn't use the device mapper (only loop devices - implicitly), and volumes can be instantly reverted to any previous revision with qvm-revert.

I also wouldn't underestimate the complexity of a ZFS-specific storage driver: the WIP zfs.py, which currently implements only a subset of the Qubes storage API, is already the largest driver (sloccount 768), compared to lvm.py (667) and reflink.py (365), both of which implement the full API.

I don't want the availability of that feature to block progress on this proposal.

Fair enough, they're your funds :)

@DemiMarie

I also wouldn't underestimate the complexity of a ZFS-specific storage driver: the WIP zfs.py, which currently implements only a subset of the Qubes storage API, is already the largest driver (sloccount 768), compared to lvm.py (667) and reflink.py (365), both of which implement the full API.

There appear to be several reasons for this:

  • zfs.py tries to avoid running much of the code as root, but qubesd already runs with root privileges, so this doesn't actually provide the intended protection. All of the sudo zfs allow calls can therefore be removed.
  • zfs.py has a rather verbose coding style (lots of multi-line array literals).
  • Some leftovers remain from the LVM2 driver.

Rudd-O commented Oct 28, 2021

The latter doesn't use the device mapper (only loop devices - implicitly),

Correct, and that is something I don't want in a ZFS driver.

and volumes can be instantly reverted to any previous revision with qvm-revert.

Cool to know. A ZFS driver would be able to do this by simply issuing zfs rollback path/to/volume as well.
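
A hedged sketch of what that could look like from a driver's point of view, assuming revisions are kept as named snapshots (dataset and snapshot names are illustrative only):

```python
import subprocess

def revert(dataset: str, revision: str) -> None:
    # `zfs rollback -r` also destroys any snapshots newer than the target,
    # which rollback requires when going back more than one revision.
    subprocess.run(["zfs", "rollback", "-r", f"{dataset}@{revision}"], check=True)

# e.g. revert("rpool/qubes/work/private", "before-update")  # hypothetical names
```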

Rudd-O commented Oct 28, 2021

So all of the sudo zfs allow calls can be removed.

Seconded.

@tlaurion

@andrewdavidwong the bounty label should be added here, since @Rudd-O offered a 0.25 BTC bounty above.

tlaurion commented Jul 21, 2022

Also, it seems (from personal, non-exhaustive research) that ZFS might be the only candidate we currently have for pool-wide volume deduplication -- which would let Qubes back up its own advice to clone templates for specialized usage -- without the storage cost growing exponentially with the amount of deployed software and the in-place upgrades forced by Fedora's fast-paced (annoying) release cycle.

I'm trying to gather input on pool-level deduplication here: https://forum.qubes-os.org/t/pool-level-deduplication/12654

Please shed some light if you have any advice.
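
For reference, deduplication in ZFS is a per-dataset property, with the dedup table's RAM footprint being the usual caveat. A minimal sketch (the dataset name is hypothetical):

```python
import subprocess

# Enable dedup only where duplicated template data actually lives.
subprocess.run(["zfs", "set", "dedup=on", "rpool/qubes/templates"], check=True)

# The achieved ratio is reported per pool.
subprocess.run(["zpool", "get", "dedupratio", "rpool"], check=True)
```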

@tlaurion

@Rudd-O have you seen QubesOS/qubes-core-admin#289?

Rudd-O commented Jul 21, 2022

@andrewdavidwong the bounty label should be added here, since @Rudd-O offered a 0.25 BTC bounty above.

I will happily honor my offer if the funds are used to finance the developer time to finish the half-written ZFS pool driver.

@andrewdavidwong andrewdavidwong added bounty This issue has a public bounty associated with it. help wanted This issue will probably not get done in a timely fashion without help from community contributors. labels Jul 22, 2022
@rustybird

FYI, reflink support for ZFS also seems to be progressing nicely in PR #13392.

@andrewdavidwong andrewdavidwong removed this from the Release TBD milestone Aug 13, 2023
ayakael commented Aug 30, 2023

Reflink support for ZFS is slated for release with OpenZFS 2.2. I've been using cfcs's work-in-progress ZFS driver, but this might make me drop it and use file-reflink directly.

@DemiMarie

This pool driver has already been implemented; closing as fixed.

Rudd-O commented Sep 4, 2023

Reflink support for ZFS is slated for release with OpenZFS 2.2. I've been using cfcs's work-in-progress ZFS driver, but this might make me drop it and use file-reflink directly.

That's good news.

Do note that the ZFS driver being released with Qubes 4.2 lets you use ZFS snapshots to natively take care of your VM storage, supports send + receive, and will take advantage of ZFS encryption (if your system is set up to use it). ZFS stability is, of course, legendary.
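
For example -- dataset names here are hypothetical and depend on how the pool is laid out -- a volume snapshot can be replicated to a backup pool with plain zfs send / zfs receive:

```python
import subprocess

SRC = "rpool/qubes/work/private@backup-2023-09-04"  # hypothetical source snapshot
DST = "backuppool/work-private"                     # hypothetical destination dataset

# Pipe `zfs send` into `zfs receive`, as a shell pipeline would.
send = subprocess.Popen(["zfs", "send", SRC], stdout=subprocess.PIPE)
subprocess.run(["zfs", "receive", DST], stdin=send.stdout, check=True)
send.stdout.close()
send.wait()
```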

I have a backport of the driver for 4.1 here: https://repo.rudd-o.com/q4.1/packages/qubes-core-dom0-4.1.33.1-40.qbs4.1.noarch.rpm . Do note that upgrading your 4.1 system will result in this package being erased, and you won't have a storage driver anymore.
