'zpool import' dynamically via udev #330
Would it make sense to implement this as a pool property? My use case: I have a Xen server with the VMs using zvols for disks, but now I'd like to have one VM actually use ZFS directly, so I'd like to mark its (separate) pool not to be imported by the Dom0 on system boot. I'll assign the physical disks to the DomU. In this case it would be OK for me to import it in rc.local. I suppose ideally this would look something like the sketch below:
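(A purely hypothetical illustration of such a per-pool property; the property name 'allowhost' and the placeholder UUID are invented here for the sake of discussion, not an existing ZFS feature:)

    # mark the DomU's pool so that only the host with this UUID imports it at boot
    $ zpool set allowhost=<host-uuid> domU-pool
    # a value of 'any' (the default) would keep today's behavior; 'none' would
    # act like a per-pool zfs_autoimport_disable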
That way I could avoid rc.local and put spl/zfs in the initrd to get to it earlier for other cases. A default of 'any' would keep the current behavior, while setting a pool to 'none' would give a 'zfs_autoimport_disable'-style behavior. It does raise the question of what to use for a UUID. The zfs module would probably need to read /etc/zfs/zfs-host.uuid or something like that (perhaps even creating it if missing); I think I'd do best with a symlink to /sys/hypervisor/uuid. Then again, zfs knows when I've moved a pool between machines - does it already have a host [g,u]uid? |
That might be nice, but part of the trouble is that it's not so straightforward to retrieve a pool property until that pool is imported. Now, these properties are cached in the zpool.cache file, so we could perhaps store them there, but that seems a bit clunky. My feeling is that the right way to do this on Linux is to disable all imports during module load, and then integrate the actual import with a udev helper. The udev helper will be invoked every time a new device appears. This helper can read the zfs label from the disk, which fully describes the pool configuration. From this it can determine if all the needed devices are available and only then trigger the import. This solves at least two problems which have come up.
|
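(For example - and this is just an illustration, not code from any patch - such a helper could read the on-disk label of each newly appeared device:)

    # the ZFS label on any vdev contains the pool name, pool GUID, and the full
    # vdev tree, so a helper can tell which pool a device belongs to and whether
    # the pool's other members have shown up yet (device path is an example)
    $ zdb -l /dev/sda1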
Even better! Event-driven should have many benefits over time. |
The idea of having yet another file for zfs to update in the initramfs (apart from zpool.cache and the related troubles that one already gives us) doesn't sound that good to me. Maybe a module parameter that can be set via the bootloader would do the trick more nicely? |
Actually, the idea here is to have one less file that needs to be updated - the zpool.cache file itself. All pools could be detected automatically, and only those with the required property (or passed somehow via the bootloader) would be imported. You would no longer need a zpool.cache file at all, although we'd probably still support it for legacy reasons. |
This is a great effort. Can this be merged? |
@Rudd-O If you're asking about this commit:
then, I'd say no, it's not ready to be merged. The commit message states the missing functionality that needs to be added to the patch before it is anywhere near complete, and then once that is added, testing is needed to ensure it works correctly. Unfortunately, I don't see myself working on that patch in the very near future, so if you have some spare cycles I would highly encourage you to pick it up and run with it. |
Do you need financial support to see this through, so you can work unmolested by other concerns? I would be willing to provide it, depending on what you see as necessary to see it through. |
I'm not even sure that's possible with the way my current employer, LLNL, works. I'll see what I can do about getting some time to work on this. |
I just want a kernel module option to disable auto-import. That is all. |
@Rudd-O You can just comment out |
Brian, I want the cache file to be read during import. I just don't want ZFS to read it and then attempt an import when the zfs.ko module loads. Today marks 20 hours of debugging a bug -- of the worst kind: initrd bugs -- caused by this. I have a mirrored pool where one disk is encrypted (and consequently not available when zfs.ko loads, on demand and as a side effect of udev events or whatnot, during the dracut cmdline phase) and the other disk isn't (thus available when zfs.ko loads). As a consequence, before the first zpool import in our mount-zfs.sh even executes, the pool is already half-assedly imported (missing one leg). I tried the spa_config_path workaround, and it doesn't work well. Mainly, if I set it to a dummy path, then later on ZFS writes the cache file to that dummy path. And, of course, it does NOT read the (valid) cache file, which is what I need in order to keep zfs_force from being necessary. It even replaced my /dev/null with a regular file. I ended up having to write this dumb workaround: https://github.com/Rudd-O/zfs/blob/master/dracut/90zfs/parse-zfs-pre.sh.in but this breaks the use of zpool.cache anyway, so I ended up having to use zfs_force=1 anyway. Well, at least it doesn't totally break import of mirrored pools. So, pretty please, now that we know it is inopportune to just import all pools on module load, can we please, please have a flag to turn this cancerous hell off? |
There are times when it is desirable for zfs to not automatically populate the spa namespace at module load time using the pools in the /etc/zfs/zpool.cache file. The zfs_autoimport_disable module option has been added to control this behavior. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#330
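(For what it's worth, a minimal sketch of how that module option would typically be used once it exists - the drop-in file name is just a convention, and paths can differ per distribution:)

    # /etc/modprobe.d/zfs.conf - keep the module from importing anything on load
    options zfs zfs_autoimport_disable=1

    # then import explicitly, from the cache file, once the right devices exist
    $ zpool import -c /etc/zfs/zpool.cache -aN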
My tree's dracut scripts fix those boot issues caused by import-at-module-load time. Enjoy! |
@behlendorf A behavior like what you describe in the initial comment can be obtained using |
@behlendorf @ryao @Rudd-O @bill-mcgonigle @prakashsurya Why is this still open? Considering that we now/already have a |
FWIW, my old patch that's loosely related to this, #1587, was intended to go beyond the functionality of |
@prakashsurya Is this possible without a cache file, etc.? The pull request mentioned seems to create a special cache file for this, but that seems a little dumb, since we're removing the 'real' one... ? |
It depends what you mean by "cache file". It creates a temporary file to maintain state as udev populates the devices, but it's different from the existing zpool.cache file. This temporary file was only done as an optimization; alternatively, you could probe every disk on the system each time a new disk is initialized by udev to build up the pool configuration, but that turns an O(N) algorithm into O(N^2). And with "a lot" of disks, that'd take an absurd amount of time. Do you have a better idea about how to automatically import a pool after a reboot, once the |
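(To make the trade-off concrete, here is a rough sketch of that stateful idea - the state-file path, record format, and option letters are made up for illustration and are not the pull request's actual implementation:)

    #!/bin/sh
    # called by udev as devices come and go; remembers, per device, which pool
    # GUID its label claims, so each event costs one label read instead of a
    # full rescan of every disk
    STATE=/run/zimport.state
    DEV="$2"
    case "$1" in
      -a)
          POOL_GUID=$(zdb -l "$DEV" 2>/dev/null | awk '/pool_guid:/ {print $2; exit}')
          [ -n "$POOL_GUID" ] && echo "$DEV $POOL_GUID" >> "$STATE"
          ;;
      -r)
          grep -v "^$DEV " "$STATE" > "$STATE.new" && mv "$STATE.new" "$STATE"
          ;;
    esac
    # a real helper would now compare the devices recorded for that pool GUID
    # against the vdev tree in the label and run 'zpool import' once complete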
How about using ZED for this instead of UDEV? |
In my uninformed opinion, udev should signal zed when ZFS component devices appear and disappear, while zed should (in some cases) take action based on whether the devices can form a pool or not. But this raises all sorts of questions, like: should zed import an incomplete pool and then add devices that should have been in the pool to begin with? What happens when zed is not running at the time of the first few udev notifications? What happens if zed crashes? Et cetera. It's a complicated thing. I think ultimately what we all want is something like this: some running program gets a signal (or perhaps a program is executed at that point) when a pool is available for import, and that program decides whether to import the pool or not. With my old systemd-based setup, I sort of had something like that: I'd open the cachefile, ask it what devices to wait for and what pools they formed, and then generate unit files to import the pools based on those dependencies, which in turn were dependencies for the file systems within them (see the sketch below). It made for a fairly reliable system. Now I'm back to the mainline import-all unit files, which fail. |
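(A sketch of the kind of generated unit described above; the pool name, device names, and unit file name are examples only, not the actual generator's output:)

    # /run/systemd/generator/zfs-import-tank.service
    [Unit]
    Description=Import ZFS pool tank (generated from zpool.cache)
    BindsTo=dev-sda1.device dev-sdb1.device
    After=dev-sda1.device dev-sdb1.device
    Before=local-fs-pre.target

    [Service]
    Type=oneshot
    RemainAfterExit=yes
    ExecStart=/sbin/zpool import -N tank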
Well, 'udevadm monitor' should be reasonably simple to emulate inside zed. If zed is dealing with this, the script zed executes for it could have a config file where this is set. And if zed isn't running - well, the same problem/argument could apply to udevd; nothing is completely bulletproof. But considering zed is going to take care of a lot of actions regarding the pool (such as spares, keeping an eye open for checksum and I/O errors, and in the future most likely dealing with sharing/unsharing of smb/iscsi/nfs etc.), it makes sense that zed is responsible for deciding which pools should be imported and which not. So the init script might, instead of doing the mount/share etc. itself, just issue those commands to zed and have it do the actual work (through one of its scripts). I think that would simplify a lot! Currently there are probably five different init scripts (all different, but doing basically the same thing) in ZoL. On top of that, there are the different packages' init files (the one we have in pkg-zfs for example). I've tried to rectify that into TWO init scripts (one import+mount/umount and one share/unshare) that are supposed to work on ALL the platforms. But it's somewhat kludgy and possibly ugly in places because of this. If we could have zed do this, there wouldn't be any need for that - one action, one [zed] script, and it works everywhere because zed is... |
Integrating udev into the zed infrastructure might not be a bad way to go, now that the zed work has landed. When I originally started that work, zed was far from finished (it wasn't even started, IIRC). But now that it's here, I'm not opposed to designing the import infrastructure more tightly around it. One main thing we'd need to work out to make that happen is how to give userspace processes the ability to issue "events" into the zed infrastructure. I'm not too familiar with all of the zed machinery, but IIRC zed currently only consumes events issued by the kernel. Without thinking too hard about it, leveraging udev to submit events to zed (e.g. disk /dev/sdX has appeared/disappeared), but keeping the policy decisions and configuration within zed (e.g. 9 of 10 leaf vdevs of this raidz2 are present, go ahead and import), seems like a really clean solution to me. |
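(Just to make the "policy lives in zed" idea concrete, a purely hypothetical configuration sketch - neither this file nor these settings exist in zed today:)

    # /etc/zfs/zed.d/import-policy.conf  (hypothetical)
    # import once the pool is merely importable, even if redundancy is degraded
    IMPORT_WHEN=degraded          # alternatives might be: complete, never
    # wait a little after the last device event before deciding
    IMPORT_SETTLE_SECONDS=5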
@dun and I have talked about how the zed infrastructure could be extended to provide a few additional bits of functionality which might be useful here.
All of this functionality might be helpful in this context. |
@behlendorf @dun A 'cron-like' event driver would be nice too... Having ZED take care of daily/weekly/monthly scrubs and maintenance seems possibly more appropriate than cron. |
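(For comparison, the status quo that such a zed scheduler would replace is typically just a cron entry like this - the pool name and schedule are examples:)

    # /etc/cron.d/zfs-scrub - scrub 'tank' every Sunday at 03:00
    0 3 * * 0  root  /sbin/zpool scrub tank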
@behlendorf Tag this with 'zed' as well? |
Good thought |
This change introduces a new userspace command 'zimport' whose intention is to be used as a udev helper for dynamically importing ZFS pools as devices appear. Although this is still a work in progress, the command currently supports a stateful interface for adding and removing devices:

    # To add the /dev/sda1 device to the /tmp/zimport.cache file
    $ zimport -a /dev/sda1

    # To remove the /dev/sda1 device from the /tmp/zimport.cache file
    $ zimport -r /dev/sda1

Once the zimport cache file contains all of the vdevs for a given pool, it will automatically attempt to import the pool. For example, to import a pool consisting of two devices (sda1 and sdb1) run the following:

    $ zimport -a /dev/sda1
    $ zimport -a /dev/sdb1

It is also possible to use the '-c' option to specify a custom file name to use as the cache file. For example, to use '/var/run/zimport' instead of the default, one can do the following:

    $ zimport -a /dev/sda1 -c /var/run/zimport

Work that still needs to be implemented includes the following:

* l2arc devices are not yet supported.
* A udev rules helper needs to be added to make use of the 'zimport' command. This helper should run 'zimport -a %k' to add a device as it appears and 'zimport -r %k' as it disappears.
* The cache file needs to be cleared and/or devices removed from the cache file as pools are successfully imported.
* A configuration interface is needed to specify different policies which define the conditions that need to be met for a pool to be imported. For example:
  - Which pools should be imported?
  - Do all vdevs need to be present to import? (e.g. raidz degraded)
  - How much "slack" time between all devices being online and the import?
  Currently, all vdevs must be present before the pool is imported. Also, any pool detected will be imported if possible.
* It would also be useful to be able to specify scripts or executables to be run at specific points during the device recognition and import phase.

Signed-off-by: Prakash Surya <surya1@llnl.gov> Issue openzfs#330
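(As a concrete illustration of the udev rules helper the TODO list above calls for - the rules file name and match keys here are assumptions, not part of the patch:)

    # /lib/udev/rules.d/90-zimport.rules  (hypothetical)
    # %k expands to the kernel device name (e.g. sda1); blkid tags ZFS member
    # devices with ID_FS_TYPE=zfs_member, so only ZFS vdevs trigger the helper
    SUBSYSTEM=="block", ACTION=="add",    ENV{ID_FS_TYPE}=="zfs_member", RUN+="/sbin/zimport -a /dev/%k"
    SUBSYSTEM=="block", ACTION=="remove", RUN+="/sbin/zimport -r /dev/%k"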
Authored by: Andriy Gapon <avg@FreeBSD.org> Approved by: Richard Lowe <richlowe@richlowe.net> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com> Closes openzfs#330 Ported-by: Giuseppe Di Natale <dinatale2@llnl.gov> OpenZFS-issue: https://www.illumos.org/issues/8026 OpenZFS-commit: openzfs/openzfs@9b33e07
It's been a few years since there's been any chatter on this issue. Has anything changed related to this import behavior? I expect the issue still persists in some form, as I found my way here from a very recent OpenZFS document (https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2022.04%20Root%20on%20ZFS.html#mpt2sas). Just wondering what to expect here, as I do in fact use the mpt2sas driver with LSI HBA cards and have well into the double digits of drives to be initialized. I'm moving over to ZFS from mdadm. |
There are times when it would be desirable for zfs to not automatically import a pool at module load. Add a zfs_autoimport_disable module option to make this tunable without resorting to removing the cache file.