'zpool import' dynamically via udev #330
Would it make sense to implement this as a pool property? My use case: I have a Xen server with the VMs using zvols for disks, but now I'd like to have one VM actually use ZFS directly, so I'd like to mark its (separate) pool not to be imported by the Dom0 on system boot. I'll assign the physical disks to the DomU. In this case it would be OK for me to import it in rc.local. I suppose ideally this would look something like the sketch below:
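(A purely hypothetical illustration of such a per-pool property; the property name 'allowhost' and the placeholder UUID are invented here for the sake of discussion, not an existing ZFS feature:)

    # mark the DomU's pool so that only the host with this UUID imports it at boot
    $ zpool set allowhost=<host-uuid> domU-pool
    # a value of 'any' (the default) would keep today's behavior; 'none' would
    # act like a per-pool zfs_autoimport_disable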
That way I could avoid rc.local and put spl/zfs in the initrd to get to it earlier for other cases. A default of 'any' would keep the current behavior, while setting a pool to 'none' would give a 'zfs_autoimport_disable'-style behavior. It does raise the question of what to use for a UUID. The zfs module would probably need to read /etc/zfs/zfs-host.uuid or something like that (perhaps even creating it if missing); I think I'd do best with a symlink to /sys/hypervisor/uuid. Then again, zfs knows when I've moved a pool between machines - does it already have a host [g,u]uid? |
That might be nice, but part of the trouble is that it's not so straightforward to retrieve a pool property until that pool is imported. Now, these properties are cached in the zpool.cache file, so we could perhaps store them there, but that seems a bit clunky. My feeling is that the right way to do this on Linux is to disable all imports during module load, and then integrate the actual import with a udev helper. The udev helper will be invoked every time a new device appears. This helper can read the zfs label from the disk, which fully describes the pool configuration. From this it can determine if all the needed devices are available and only then trigger the import. This solves at least two problems which have come up.
|
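(For example - and this is just an illustration, not code from any patch - such a helper could read the on-disk label of each newly appeared device:)

    # the ZFS label on any vdev contains the pool name, pool GUID, and the full
    # vdev tree, so a helper can tell which pool a device belongs to and whether
    # the pool's other members have shown up yet (device path is an example)
    $ zdb -l /dev/sda1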
Even better! Event-driven should have many benefits over time. |
The idea of having yet another file for zfs to update in the initramfs (apart from zpool.cache and the related troubles that one already gives us) doesn't sound that good to me. Maybe a module parameter that can be set via the bootloader would do the trick more nicely? |
Actually, the idea here is to have one less file that needs to be updated - the zpool.cache file itself. All pools could be detected automatically, and only those with the required property (or passed somehow via the bootloader) would be imported. You would no longer need a zpool.cache file at all, although we'd probably still support it for legacy reasons. |
This is a great effort. Can this be merged? |
@Rudd-O If you're asking about this commit:
then, I'd say no, it's not ready to be merged. The commit message states the missing functionality that needs to be added to the patch before it is anywhere near complete, and then once that is added, testing is needed to ensure it works correctly. Unfortunately, I don't see myself working on that patch in the very near future, so if you have some spare cycles I would highly encourage you to pick it up and run with it. |
Do you need financial support to see this through, so you can work unmolested by other concerns? I would be willing to provide it, depending on what you see as necessary to see it through. |
I'm not even sure that's possible with the way my current employer, LLNL, works. I'll see what I can do about getting some time to work on this. |
I just want a kernel module option to disable auto-import. That is all. |
@Rudd-O You can just comment out |
Brian, I want the cache file to be read during import. I just don't want ZFS to read it and then attempt an import when the zfs.ko module loads. Today marks 20 hours of debugging a bug -- of the worst kind: initrd bugs -- caused by this. I have a mirrored pool where one disk is encrypted (and consequently not available when zfs.ko loads, on demand and as a side effect of udev events or whatnot, during the dracut cmdline phase) and the other disk isn't (thus available when zfs.ko loads). As a consequence, before the first zpool import in our mount-zfs.sh even executes, the pool is already half-assedly imported (missing one leg). I tried the spa_config_path workaround, and it doesn't work well. Mainly, if I set it to a dummy path, then later on ZFS writes the cache file to that dummy path. And, of course, it does NOT read the (valid) cache file, which is what I need in order to keep zfs_force from being necessary. It even replaced my /dev/null with a regular file. I ended up having to write this dumb workaround: https://github.com/Rudd-O/zfs/blob/master/dracut/90zfs/parse-zfs-pre.sh.in but this breaks the use of zpool.cache anyway, so I ended up having to use zfs_force=1 anyway. Well, at least it doesn't totally break import of mirrored pools. So, pretty please, now that we know it is inopportune to just import all pools on module load, can we please, please have a flag to turn this cancerous hell off? |
There are times when it is desirable for zfs to not automatically populate the spa namespace at module load time using the pools in the /etc/zfs/zpool.cache file. The zfs_autoimport_disable module option has been added to control this behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#330
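(For what it's worth, a minimal sketch of how that module option would typically be used once it exists - the drop-in file name is just a convention, and paths can differ per distribution:)

    # /etc/modprobe.d/zfs.conf - keep the module from importing anything on load
    options zfs zfs_autoimport_disable=1

    # then import explicitly, from the cache file, once the right devices exist
    $ zpool import -c /etc/zfs/zpool.cache -aN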
My tree's dracut scripts fix those boot issues caused by import-at-module-load time. Enjoy! |
@behlendorf A behavior like what you describe in the initial comment can be obtained using |
@behlendorf @ryao @Rudd-O @bill-mcgonigle @prakashsurya Why is this still open? Considering that we now/already have a |
FWIW, my old patch that's loosely related to this, #1587, was intended to go beyond the functionality of |
@prakashsurya Is this possible without a cache file, etc.? The pull request mentioned seems to create a special cache file for this, but that seems a little dumb, since we're removing the 'real' one... ? |
It depends what you mean by "cache file". It creates a temporary file to maintain state as udev populates the devices, but it's different from the existing zpool.cache file. This temporary file was only done as an optimization; alternatively, you could probe every disk on the system each time a new disk is initialized by udev to build up the pool configuration, but that turns an O(N) algorithm into O(N^2). And with "a lot" of disks, that'd take an absurd amount of time. Do you have a better idea about how to automatically import a pool after a reboot, once the |
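(To make the trade-off concrete, here is a rough sketch of that stateful idea - the state-file path, record format, and option letters are made up for illustration and are not the pull request's actual implementation:)

    #!/bin/sh
    # called by udev as devices come and go; remembers, per device, which pool
    # GUID its label claims, so each event costs one label read instead of a
    # full rescan of every disk
    STATE=/run/zimport.state
    DEV="$2"
    case "$1" in
      -a)
          POOL_GUID=$(zdb -l "$DEV" 2>/dev/null | awk '/pool_guid:/ {print $2; exit}')
          [ -n "$POOL_GUID" ] && echo "$DEV $POOL_GUID" >> "$STATE"
          ;;
      -r)
          grep -v "^$DEV " "$STATE" > "$STATE.new" && mv "$STATE.new" "$STATE"
          ;;
    esac
    # a real helper would now compare the devices recorded for that pool GUID
    # against the vdev tree in the label and run 'zpool import' once complete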
How about using ZED for this instead of UDEV? |
In my uninformed opinion, udev should signal zed when ZFS component devices appear and disappear, while zed should (in some cases) take action based on whether the devices can form a pool or not. But this raises all sorts of questions, like: should zed import an incomplete pool and then add devices that should have been in the pool to begin with? What happens when zed is not running at the time of the first few udev notifications? What happens if zed crashes? Et cetera. It's a complicated thing. I think ultimately what we all want is something like this: some running program gets a signal (or perhaps a program is executed at that point) when a pool is available for import, and that program decides whether to import the pool or not. With my old systemd-based setup, I sort of had something like that: I'd open the cachefile, ask it what devices to wait for and what pools they formed, and then generate unit files to import the pools based on those dependencies, which in turn were dependencies for the file systems within them (see the sketch below). It made for a fairly reliable system. Now I'm back to the mainline import-all unit files, which fail. |
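(A sketch of the kind of generated unit described above; the pool name, device names, and unit file name are examples only, not the actual generator's output:)

    # /run/systemd/generator/zfs-import-tank.service
    [Unit]
    Description=Import ZFS pool tank (generated from zpool.cache)
    BindsTo=dev-sda1.device dev-sdb1.device
    After=dev-sda1.device dev-sdb1.device
    Before=local-fs-pre.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/sbin/zpool import -N tank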
Well, 'udevadm monitor' should be reasonably simple to emulate inside zed. If zed is dealing with this, the script zed executes for it could have a config file where this is set. And if zed isn't running - well, the same problem/argument could apply to udevd; nothing is completely bulletproof. But considering zed is going to take care of a lot of actions regarding the pool (such as spares, keeping an eye open for checksum and I/O errors, and in the future most likely dealing with sharing/unsharing of smb/iscsi/nfs etc.), it makes sense that zed is responsible for deciding which pools should be imported and which not. So the init script might, instead of doing the mount/share etc. itself, just issue those commands to zed and have it do the actual work (through one of its scripts). I think that would simplify a lot! Currently there are probably five different init scripts (all different, but doing basically the same thing) in ZoL. On top of that, there are the different packages' init files (the one we have in pkg-zfs for example). I've tried to rectify that into TWO init scripts (one import+mount/umount and one share/unshare) that are supposed to work on ALL the platforms. But it's somewhat kludgy and possibly ugly in places because of this. If we could have zed do this, there wouldn't be any need for that - one action, one [zed] script, and it works everywhere because zed is... |
Integrating udev into the zed infrastructure might not be a bad way to go, now that the zed work has landed. When I originally started that work, zed was far from finished (it wasn't even started, IIRC). But now that it's here, I'm not opposed to designing the import infrastructure more tightly around it. One main thing we'd need to work out to make that happen is how to give userspace processes the ability to issue "events" into the zed infrastructure. I'm not too familiar with all of the zed machinery, but IIRC zed currently only consumes events issued by the kernel. Without thinking too hard about it, leveraging udev to submit events to zed (e.g. disk /dev/sdX has appeared/disappeared), but keeping the policy decisions and configuration within zed (e.g. 9 of 10 leaf vdevs of this raidz2 are present, go ahead and import), seems like a really clean solution to me. |
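(Just to make the "policy lives in zed" idea concrete, a purely hypothetical configuration sketch - neither this file nor these settings exist in zed today:)

    # /etc/zfs/zed.d/import-policy.conf  (hypothetical)
    # import once the pool is merely importable, even if redundancy is degraded
    IMPORT_WHEN=degraded          # alternatives might be: complete, never
    # wait a little after the last device event before deciding
    IMPORT_SETTLE_SECONDS=5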
@dun and I have talked about how the zed infrastructure could be extended to provide a few additional bits of functionality which might be useful here.
All of this functionality might be helpful in this context. |
@behlendorf @dun A 'cron-like' event driver would be nice too... Having ZED take care of daily/weekly/monthly scrubs and maintenance seems possibly more appropriate than cron. |
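(For comparison, the status quo that such a zed scheduler would replace is typically just a cron entry like this - the pool name and schedule are examples:)

    # /etc/cron.d/zfs-scrub - scrub 'tank' every Sunday at 03:00
    0 3 * * 0  root  /sbin/zpool scrub tank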
@behlendorf Tag this with 'zed' as well? |
Good thought |
This change introduces a new userspace command 'zimport' whose intention is to be used as a udev helper for dynamically importing ZFS pools as devices appear. Although this is still a work in progress, the command currently supports a stateful interface for adding and removing devices:

    # To add the /dev/sda1 device to the /tmp/zimport.cache file
    $ zimport -a /dev/sda1

    # To remove the /dev/sda1 device from the /tmp/zimport.cache file
    $ zimport -r /dev/sda1

Once the zimport cache file contains all of the vdevs for a given pool, it will automatically attempt to import the pool. For example, to import a pool consisting of two devices (sda1 and sdb1) run the following:

    $ zimport -a /dev/sda1
    $ zimport -a /dev/sdb1

It is also possible to use the '-c' option to specify a custom file name to use as the cache file. For example, to use '/var/run/zimport' instead of the default, one can do the following:

    $ zimport -a /dev/sda1 -c /var/run/zimport

Work that still needs to be implemented includes the following:

* l2arc devices are not yet supported.
* A udev rules helper needs to be added to make use of the 'zimport' command. This helper should run 'zimport -a %k' to add a device as it appears and 'zimport -r %k' as it disappears.
* The cache file needs to be cleared and/or devices removed from the cache file as pools are successfully imported.
* A configuration interface is needed to specify different policies which define the conditions that need to be met for a pool to be imported. For example:
  - Which pools should be imported?
  - Do all vdevs need to be present to import? (e.g. raidz degraded)
  - How much "slack" time between all devices being online and the import?
  Currently, all vdevs must be present before the pool is imported. Also, any pool detected will be imported if possible.
* It would also be useful to be able to specify scripts or executables to be run at specific points during the device recognition and import phase.

Signed-off-by: Prakash Surya <surya1@llnl.gov> Issue openzfs#330
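(As a concrete illustration of the udev rules helper the TODO list above calls for - the rules file name and match keys here are assumptions, not part of the patch:)

    # /lib/udev/rules.d/90-zimport.rules  (hypothetical)
    # %k expands to the kernel device name (e.g. sda1); blkid tags ZFS member
    # devices with ID_FS_TYPE=zfs_member, so only ZFS vdevs trigger the helper
    SUBSYSTEM=="block", ACTION=="add",    ENV{ID_FS_TYPE}=="zfs_member", RUN+="/sbin/zimport -a /dev/%k"
    SUBSYSTEM=="block", ACTION=="remove", RUN+="/sbin/zimport -r /dev/%k"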
Authored by: Andriy Gapon <avg@FreeBSD.org> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes openzfs#330 Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/8026 OpenZFS-commit: openzfs/openzfs@9b33e07
It's been a few years since there's been any chatter on this issue. Has anything changed related to this import behavior? I expect the issue still persists in some form, as I found my way here from a very recent OpenZFS document (https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html#mpt2sas). Just wondering what to expect here, as I do in fact use the mpt2sas driver with LSI HBA cards and have well into the double digits of drives to be initialized. I'm moving over to ZFS from mdadm. |
There are times when it would be desirable for zfs to not automatically import a pool at module load. Add a zfs_autoimport_disable module option to make this tunable without resorting to removing the cache file.