CENTOS 7 dataset disappears #2600

Closed
dmaziuk opened this issue Aug 14, 2014 · 35 comments
@dmaziuk

dmaziuk commented Aug 14, 2014

possibly related to #2563

The zfs pool worked for a while, then at some point in the last few days -- maybe with the last kernel update? -- something broke. There is no zfs after boot.

Modules are loaded, systemd logs no errors, nfsd starts and shares the mountpoint, but the ZFS pool isn't there. That happens with or without /etc/default/zfs.

smartctl is happy with the drives, "zpool import -f tank" works fine, and a subsequent scrub reports no errors (there's no data there yet, though).

From clean reboot:

[root@manta ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md127 397G 2.0G 375G 1% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 8.6M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
[root@manta ~]# lsmod | grep zfs
zfs 1202168 3
zunicode 331251 1 zfs
zavl 15010 1 zfs
zcommon 51321 1 zfs
znvpair 93262 2 zfs,zcommon
spl 290139 5 zfs,zavl,zunicode,zcommon,znvpair
[root@manta ~]# zpool list
no pools available
[root@manta ~]# zfs list
no datasets available
[root@manta ~]# zpool status -v
no pools available
[root@manta ~]# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway
[root@manta ~]# zpool import -f tank
[root@manta ~]# zpool status -v
pool: tank
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Thu Aug 14 12:33:30 2014
config:

    NAME                                                  STATE     READ WRITE CKSUM
    tank                                                  ONLINE       0     0     0
      mirror-0                                            ONLINE       0     0     0
        ata-ST3000DM001-1CH166_W1F20LXR                   ONLINE       0     0     0
        ata-ST3000DM001-1E6166_W1F460D1                   ONLINE       0     0     0
    logs
      mirror-1                                            ONLINE       0     0     0
        ata-WDC_WD5003ABYX-01WERA1_WD-WMAYP3954112-part3  ONLINE       0     0     0
        ata-WDC_WD5003ABYX-01WERA1_WD-WMAYP4658203-part3  ONLINE       0     0     0

errors: No known data errors

@DeHackEd
Contributor

The commands you demonstrated above show you going from a not-imported pool to an imported pool. At the end do you still have nothing mounted?

As with LVM, the pool has to be imported before the filesystems within it become available. Also note that even with auto-import (/etc/zfs/zpool.cache exists with your pool in it), ZFS doesn't automount the filesystems contained within. Something (you or an init script) has to either import from scratch or run zfs mount -a to bring everything up.

Possibly helpful command: zfs list -t filesystem -o name,mounted,mountpoint
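
Concretely, a minimal recovery by hand on a box in this state would look something like the following (a sketch; the pool name tank and the cache-file variant are taken from the output elsewhere in this thread):

zpool import tank                                    # read the labels and import the pool (or: zpool import -c /etc/zfs/zpool.cache -aN)
zfs mount -a                                         # mount every dataset whose mountpoint property is set
zfs list -t filesystem -o name,mounted,mountpoint    # verify what got mounted where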

@dmaziuk
Author

dmaziuk commented Aug 14, 2014

What I'm saying is that I had the pool tank set up, working, and mounted at boot until I rebooted that host today. Now it is not mounted at boot anymore.

Yes, after I manually import the pool it's there. But until today it was there after reboot; I didn't have to manually import it.

The pool disappears at reboot.

@dmaziuk
Author

dmaziuk commented Aug 14, 2014

This might be useful info:
[root@manta ~]# rpm -q -a | grep zfs
zfs-dkms-0.6.3-1.el7.centos.noarch
libzfs2-0.6.3-1.el7.centos.x86_64
zfs-0.6.3-1.el7.centos.x86_64
zfs-release-1-2.el7.centos.noarch

@dmaziuk
Author

dmaziuk commented Aug 14, 2014

OK, now I got something: after exporting and re-importing the pool

Aug 14 14:06:31 manta zpool: cannot import 'tank': pool may be in use from other system
Aug 14 14:06:31 manta zpool: use '-f' to import anyway
Aug 14 14:06:31 manta systemd: zfs-import-cache.service: main process exited, code=exited, status=1/FAILURE
Aug 14 14:06:31 manta systemd: Failed to start Import ZFS pools by cache file.
Aug 14 14:06:31 manta systemd: Unit zfs-import-cache.service entered failed state.
Aug 14 14:06:31 manta systemd: Starting Mount ZFS filesystems...
Aug 14 14:06:31 manta systemd: Started Mount ZFS filesystems.

So the question is, should boot-time import run with -f, or is "may be in use from other system" a bug?

@cointer

cointer commented Aug 18, 2014

I am also having this issue on CentOS 7. I have to manually import the pool after reboot. This is odd behavior and, as mentioned, the -f option must be passed when re-importing the pool.

Perhaps the pool is not being exported when the ZFS service stops during a reboot?

@dmaziuk
Author

dmaziuk commented Aug 18, 2014

I think the question is why zfs thinks the pool "may be in use". Is there a lock of some sort that doesn't get cleared at shutdown?

I added -f to systemd's zfs-share target (or whatever it's called) for now.

@cointer

cointer commented Aug 18, 2014

It was my understanding that when the ZFS service stops, it is supposed to export all pools. The same "may be in use" behavior happens when you don't export a pool, move disks to another machine, and try to import it on that other machine.

I haven't tried it yet, but I bet if you export the pool before rebooting, it will import cleanly when the system comes back up. This obviously is not a solution, but perhaps it can be a clue as to what is happening. I can't try this on my systems at the moment; can you give it a try?

@dmaziuk
Author

dmaziuk commented Aug 18, 2014

Any idea where in that systemd POS the "stop" action should go? Anyway, it doesn't look like you can actually do it: import runs with -c /etc/zfs/zpool.cache, but zpool export doesn't accept "-a", "all", or anything other than the name of the pool -- which the rpm maintainer can't know in advance.

zfs-import-cache.service in /usr/lib/systemd/system is the one where I added "-f" to the zpool import, btw.
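
For the record, the workaround amounts to adding -f to that unit's ExecStart line, roughly like this (a sketch; the stock line is what ships in the rpm, as shown in the systemctl status output later in this thread):

# /usr/lib/systemd/system/zfs-import-cache.service
# stock:      ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN
# workaround: ExecStart=/sbin/zpool import -f -c /etc/zfs/zpool.cache -aN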

@dmaziuk
Author

dmaziuk commented Aug 18, 2014

Also, looking at stop() in /etc/init.d/zfs on CentOS 6, it only does a sync. There's a commented-out "zfs umount -a" -- understandable, you can't umount / at that point -- but the thing is, on CentOS 6.5 it works fine without a zpool export.

@x042

x042 commented Aug 26, 2014

I am also having this same issue on CentOS 7 with the latest kernel update.

I've figured out a workaround that solves the issue for me. I just made a shutdown "script" (not really much of a script, see below) which sleeps for 10 seconds and then runs zfs umount -a.

#!/bin/bash -

/usr/bin/sleep 10
/usr/sbin/zfs umount -a

It seems ZFS filesystems aren't gracefully unmounted at shutdown/reboot? Looks like this, #2575, and #2563 might all be related.
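
(Aside: the comment above doesn't say how the script is hooked into shutdown. One standard place for such a hook -- purely an assumption on my part, and the script name below is made up -- is systemd's late-shutdown directory:

install -m 0755 zfs-umount.sh /usr/lib/systemd/system-shutdown/zfs-umount.sh   # systemd-shutdown runs every executable it finds there shortly before the final halt/reboot

Whether that fires early enough to unmount cleanly is exactly the kind of thing that needs testing here; a service with an ExecStop, as discussed further down, is the other option.)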

behlendorf added this to the 0.6.4 milestone Aug 26, 2014
behlendorf added the Bug label Aug 26, 2014
@l1k
Contributor

l1k commented Oct 6, 2014

Possibly fixed by #2766 if Dracut is used.

behlendorf pushed a commit to behlendorf/zfs that referenced this issue Oct 7, 2014
Make use of Dracut's ability to restore the initramfs on shutdown and
pivot to it, allowing for a clean unmount and export of the ZFS root.
No need to force-import on every reboot anymore.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2195
Issue openzfs#2476
Issue openzfs#2498
Issue openzfs#2556
Issue openzfs#2563
Issue openzfs#2575
Issue openzfs#2600
Issue openzfs#2755
Issue openzfs#2766
@dmaziuk
Author

dmaziuk commented Oct 10, 2014

Dunno what you mean by "if Dracut is used" -- what's used is whatever the plain "basic server" CentOS 7 install has -- but after yum-updating zfs/spl to 0.6.3-1.1.el7.centos and dkms to 2.2.0.3-26.el7, my /tank is gone again after reboot:

systemctl status zfs-import-cache
zfs-import-cache.service - Import ZFS pools by cache file
Loaded: loaded (/usr/lib/systemd/system/zfs-import-cache.service; static)
Active: failed (Result: exit-code) since Fri 2014-10-10 17:27:17 EDT; 26s ago
Process: 548 ExecStart=/sbin/zpool import -c /etc/zfs/zpool.cache -aN (code=exited, status=1/FAILURE)

Main PID: 548 (code=exited, status=1/FAILURE)

Oct 10 17:27:17 batfish.bmrb.wisc.edu zpool[548]: cannot import 'tank': pool may be in use from other system
Oct 10 17:27:17 batfish.bmrb.wisc.edu zpool[548]: use '-f' to import anyway
Oct 10 17:27:17 batfish.bmrb.wisc.edu systemd[1]: zfs-import-cache.service: main process exited, code=exited, status=1/FAILURE
Oct 10 17:27:17 batfish.bmrb.wisc.edu systemd[1]: Failed to start Import ZFS pools by cache file.
Oct 10 17:27:17 batfish.bmrb.wisc.edu systemd[1]: Unit zfs-import-cache.service entered failed state.

@dmaziuk
Author

dmaziuk commented Oct 10, 2014

PS: could you please not overwrite my /usr/lib/systemd/system/zfs-import-cache.service (it has "-f" in it) until you've actually fixed this?

@l1k
Contributor

l1k commented Oct 10, 2014

@dmaziuk: "pool may be in use from other system" means the pool was not exported on shutdown. Are you using ZFS for / (root partition)? I gathered from what you wrote above that you may be using ZFS for the root partition. If you do, you need to apply the patch in #2766, this will cause a proper export of a ZFS root on shutdown. Dracut is included in CentOS since version 6, it's the framework for generating the init ramdisk.

@dmaziuk
Author

dmaziuk commented Oct 10, 2014

PPS: this is not a ZFS root, if by "root" you mean "/".

@l1k
Contributor

l1k commented Oct 10, 2014

Okay. Then I was mistaken and this is NOT fixed by #2766, rather there must be an issue with the initscript or systemd unit responsible for exporting all zpools on shutdown. Which one is used, an initscript or a systemd unit?

@dmaziuk
Author

dmaziuk commented Oct 10, 2014

It's systemd, and if you look up at the comments made on Aug 18 and 26 you'll find that yes, we know it's the umount/export that's missing, and that it seemed to work without it on CentOS 6.

@l1k
Contributor

l1k commented Oct 12, 2014

@dmaziuk:
Please add this line to the [Unit] section in /usr/lib/systemd/system/zfs-mount.service:

Conflicts=final.target

Add this line to the [Service] section:

ExecStop=/bin/sh -c '/sbin/zpool list -H -o name | /usr/bin/xargs /sbin/zpool export'

Add this line to the [Service] section in /usr/lib/systemd/system/zfs-share.service:

ExecStop=/sbin/zfs unshare -a

Remove the -f you added in /usr/lib/systemd/system/zfs-import-cache.service so that zpools are not force-imported.
Activate these changes:

# systemctl daemon-reload
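
Pulled together, the modified zfs-mount.service would contain roughly this (a sketch showing only the added lines; the stock Description/ExecStart entries stay as they are):

# /usr/lib/systemd/system/zfs-mount.service -- added lines only
[Unit]
Conflicts=final.target

[Service]
ExecStop=/bin/sh -c '/sbin/zpool list -H -o name | /usr/bin/xargs /sbin/zpool export'

A drop-in under /etc/systemd/system/zfs-mount.service.d/ should work just as well and would survive package updates.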

Does this solve the issue?

@dmaziuk
Author

dmaziuk commented Oct 12, 2014

No. :(

It's weird:

  • If I manually run 'systemctl stop zfs-mount' before reboot, it boots with /tank mounted.
  • If I just reboot it, there is no /tank: the usual "may be in use by other system" from zfs-import-cache.
  • If I 'zpool import -f -c /etc/zfs/zpool.cache -aN', then '/bin/systemctl restart zfs-mount.service' (/tank is now mounted), then reboot -- it comes up with /tank mounted; however, if I reboot again, /tank is gone again.

So far the only thing that consistently works is the "use '-f' to import anyway".

@dmaziuk
Author

dmaziuk commented Oct 12, 2014

On 2nd thought I guess 'systemctl restart zfs-mount' does the explicit 'stop', so it's not weird that /tank is there if I reboot immediately afterwards -- it's the same as doing 'systemctl stop zfs-mount' before reboot.

Looks like on its own this systemd POS is not doing the 'stop' on reboot.

@l1k
Contributor

l1k commented Oct 12, 2014

Is this with the modifications I described above? Before you applied these modifications, if you invoked systemctl stop zfs-mount before reboot, was /tank mounted on the next boot?

I'm not sure what the default log target is on CentOS, but assuming that it's the systemd journal, you may be able to see what's going on by inspecting the journal with journalctl. If this is of no use, the brute force method to get information is this: Edit /etc/systemd/system.conf and in the [Manager] section, configure:

DefaultStandardOutput=journal+console
DefaultStandardError=journal+console

Then add a sleep to the ExecStop command in /usr/lib/systemd/system/zfs-mount.service like this:

ExecStop=/bin/sh -c '/sbin/zpool list -H -o name | /usr/bin/xargs /sbin/zpool export ; /bin/sleep 120'

This gives you 2 minutes to inspect the output of the zpool commands immediately before shutdown. If the machine does not pause for 2 minutes, the ExecStop command probably isn't run at all. In that case, try changing the Conflicts clause to run the ExecStop command earlier:

Conflicts=shutdown.target

I don't use CentOS myself, thus can't debug this, but maybe we can track it down together.
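
For the journal route, something along these lines should show what the ZFS units logged (a sketch; -b -1 needs a persistent journal, i.e. Storage=persistent in /etc/systemd/journald.conf):

journalctl -b -u zfs-import-cache.service -u zfs-mount.service     # current boot
journalctl -b -1 -u zfs-mount.service                              # previous boot, if the journal is persistent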

@dmaziuk
Author

dmaziuk commented Oct 13, 2014

I'm pretty sure it's not running "zfs-mount stop" with "Conflicts=shutdown.target" either. Unfortunately I hosed my test system while fscking around with systemd logging, I'm taking this week off, and I have more important stuff to finish before I leave tonight. So I can roll you a CentOS 7 KVM image next week when I come back and upload it someplace.

@dmaziuk
Author

dmaziuk commented Oct 29, 2014

OK, got a fresh batch of round tuits and did a yum update to the latest systemd. In all cases /tank was mounted before reboot:

  • with "ExecStart=/sbin/zpool import -f -c /etc/zfs/zpool.cache -aN" in zfs-import-cache.service /tank is mounted after reboot
  • with "Conflicts=shutdown.target" and "ExecStop=/bin/sh -c '/sbin/zpool list -H -o name | /usr/bin/xargs /sbin/zpool export'" in zfs-mount.service: no /tank after reboot.
  • with "Conflicts=shutdown.target" and "ExecStop=/bin/sh -c '/sbin/zpool list -H -o name | /usr/bin/xargs /sbin/zpool export ; /bin/sleep 120'" in zfs-mount.service: "systemctl restart zfs-mount" from CLI sleeps for 2 minutes.
  • with "Conflicts=shutdown.target" and "ExecStop=/bin/sh -c '/sbin/zpool list -H -o name | /usr/bin/xargs /sbin/zpool export ; /bin/sleep 120'" in zfs-mount.service: no 2-minute sleep on shutdown, no /tank after reboot.

So ExecStop isn't run with Conflicts=shutdown.target either.
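
One possibility worth ruling out (an aside, not something established in this thread): systemd only runs a unit's ExecStop if it still considers the unit active at shutdown, and a Type=oneshot service only stays "active" after its ExecStart exits when RemainAfterExit=yes is set. A quick check of how systemd sees the unit:

systemctl show zfs-mount.service -p Type,RemainAfterExit,ActiveState,SubState

If ActiveState comes back inactive right after boot, the ExecStop (and hence the export) will never fire on shutdown.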

@dmaziuk
Author

dmaziuk commented Oct 29, 2014

PS: [root@batfish system]# rpm -q systemd
systemd-208-11.el7_0.4.x86_64

@dmaziuk
Author

dmaziuk commented Nov 4, 2014

Update: curiouser and curiouser: I just set up another CentOS 7 box and all ZFS datasets got mounted on reboot (on the first reboot, at least). I'll see if I can figure out what's different...

@dmaziuk
Author

dmaziuk commented Nov 9, 2014

Good news, sort of: after yum-updating the kernel, the other machine has lost /tank too (same error: "may be in use by other system, use -f"). So at least this is reproducible: it all works until you update the kernel. (Is dkms rebuilding the modules changing some magic that invalidates the cache file?)

I'll see if I get time to play with "import-scan" tomorrow.

@dswartz
Contributor

dswartz commented Nov 9, 2014

Wasn't there a recent change relating to hostid?

@behlendorf
Contributor

There was a hostid fix merged to master recently. However, it's not part of the stable repository yet. You could try updating to the testing repository and see if it resolves the issue.
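
For anyone landing here: "pool may be in use from other system" is ZFS comparing the hostid recorded in the pool's labels at last import against the hostid of the running system. A rough way to eyeball both sides (a sketch, not a guaranteed diagnostic; the device name is taken from the zpool status output above, and the two tools print the id in different bases):

hostid                                                                        # what the running system reports (hex); SPL honors /etc/hostid if it exists
zdb -l /dev/disk/by-id/ata-ST3000DM001-1CH166_W1F20LXR-part1 | grep hostid    # what the pool label recorded (decimal)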

@dmaziuk
Author

dmaziuk commented Nov 10, 2014

Looks like rpms from testing cured the 2nd machine. I'll see when I can update the 1st one and make sure it works there too.

@dmaziuk
Author

dmaziuk commented Nov 11, 2014

Testing rpms fix the other machine, too, so I'm calling the fix reproducible and the problem solved & closed.
Thanks guys.

dmaziuk closed this as completed Nov 11, 2014
@behlendorf
Contributor

@dmaziuk Thanks for confirming this was fixed!

@rbrunner2

Sorry for the stupid question ... I am a newbie to CentOS and ZFS. I am running stock CentOS 7 and am about to allow system upgrades, and then I will build my first ZFS system. What must I do to pick up this change? Thanks!

@dmaziuk
Author

dmaziuk commented Nov 26, 2014

yum in epel-release
yum in kernel-devel zlib-devel libuuid-devel libblkid-devel libselinux-devel gcc dkms
yum in http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum in zfs

https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/

@dmaziuk
Author

dmaziuk commented Nov 26, 2014

DUH. Actually,

yum --enablerepo=zfs-testing in zfs

-- -133_g9635861 is the one with the hostid fix.

@rbrunner2

Thank you so much!

ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014