CENTOS 7 dataset disappears #2600
The commands you demonstrated above show you going from a non-imported pool to an imported pool. At the end do you still have nothing mounted? Like LVM, you need to have the pool imported before the filesystems within become available. Also note that even with auto-import (…). Possibly helpful command: […]
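For reference, a minimal sketch of the manual sequence being described, using the pool name tank from the report below:

zpool import tank    # load the pool, roughly like activating an LVM volume group
zfs mount -a         # mount any of its datasets that are not mounted yet
zfs list             # datasets and mountpoints should now be listed
df -h /tank          # ...and the mountpoint should show up in df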
What I'm saying is I had the pool tank set up and working and mounted at boot until I rebooted that host today. Now it is not mounted at boot anymore. Yes, after I manually import the pool it's there. But until today it was there after reboot; I didn't have to manually import it. The pool disappears at reboot.
This might be useful info: […]
OK, now I got something: after exporting and re-importing the pool

Aug 14 14:06:31 manta zpool: cannot import 'tank': pool may be in use from other system

So the question is, should boot-time import run with -f, or is "may be in use from other system" a bug?
I am also having this issue on CentOS 7. I have to manually import the pool after reboot. This is odd behavior, and as mentioned, the -f option must be passed when re-importing the pool. Perhaps the pool is not being exported when the zfs service stops at reboot?
I think the question is why ZFS thinks the pool "may be in use". Is there a lock of some sort that doesn't get cleared at shutdown? I added -f to systemd's zfs-share target (or whatever it's called) for now.
It was my understanding that when the ZFS service stops, it is supposed to export all pools. The same "may be in use" behavior happens when you don't export a pool, move the disks to another machine, and try to import it on that other machine. I haven't tried it yet, but I bet if you export the pool before rebooting, it will import cleanly when the system comes back up. This obviously is not a solution, but perhaps it can be a clue as to what is happening. I can't try this on my systems at the moment; can you give it a try?
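A minimal sketch of the test being proposed, assuming the pool is named tank as in the report below:

zpool export tank    # cleanly release the pool before rebooting
reboot
# ...after the machine comes back up:
zpool import tank    # if the export took effect, this should work without -f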
Any idea where in that systemd mess the "stop" action should go? Anyway, it doesn't look like you can actually do it: import runs with -c /etc/zfs/zpool.cache, but export doesn't accept "-a", or "all", or anything other than the name of the pool, which isn't known to the rpm maintainer. In /usr/lib/systemd/system, zfs-import-cache.service is the one where I added "-f" to the zpool import, btw.
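For context, that workaround amounts to adding -f to the import command in the unit; a rough sketch (the exact path and options vary by ZoL version, so treat this as illustrative rather than a copy of the shipped file):

# /usr/lib/systemd/system/zfs-import-cache.service -- relevant line only.
# Stock line, roughly:
#   ExecStart=/usr/sbin/zpool import -c /etc/zfs/zpool.cache -aN
# Workaround: force the import even if the pool looks "in use":
ExecStart=/usr/sbin/zpool import -c /etc/zfs/zpool.cache -aN -f

As the rest of the thread shows, this only papers over the missing export at shutdown rather than fixing it.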
Also, looking at stop() in /etc/init.d/zfs on CentOS 6, it only does sync. There's a commented-out "zfs umount -a" -- understandable, you can't umount / at that point -- but the thing is, on CentOS 6.5 it works fine without zpool export.
I am also having this same issue on CentOS 7 with the latest kernel update. I've figured out a workaround that solves the issue for me. I just made a shutdown "script" (not really much of a script, see below) which sleeps for 10 seconds and then runs zfs umount -a.

#!/bin/bash -
/usr/bin/sleep 10
/usr/sbin/zfs umount -a

Seems ZFS volumes aren't gracefully unmounted at shutdown/reboot? Looks like this, #2575, and #2563 might all be related.
Possibly fixed by #2766 if Dracut is used.
Make use of Dracut's ability to restore the initramfs on shutdown and pivot to it, allowing for a clean unmount and export of the ZFS root. No need to force-import on every reboot anymore.

Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

Issue openzfs#2195
Issue openzfs#2476
Issue openzfs#2498
Issue openzfs#2556
Issue openzfs#2563
Issue openzfs#2575
Issue openzfs#2600
Issue openzfs#2755
Issue openzfs#2766
Dunno what you mean by "if Dracut is used" -- what's used is whatever the plain "basic server" CentOS 7 install has -- but after yum-updating zfs/spl to 0.6.3-1.1.el7.centos and dkms to 2.2.0.3-26.el7, my /tank is gone again after reboot:
Main PID: 548 (code=exited, status=1/FAILURE)
Oct 10 17:27:17 batfish.bmrb.wisc.edu zpool[548]: cannot import 'tank': pool may be in use from other system
PS: could you please not overwrite my /usr/lib/systemd/system/zfs-import-cache.service (it has "-f") until you've actually fixed this?
@dmaziuk: "pool may be in use from other system" means the pool was not exported on shutdown. Are you using ZFS for / (root partition)? I gathered from what you wrote above that you may be using ZFS for the root partition. If you do, you need to apply the patch in #2766, this will cause a proper export of a ZFS root on shutdown. Dracut is included in CentOS since version 6, it's the framework for generating the init ramdisk. |
PPS: this is not a ZFS root, if by "root" you mean "/".
Okay. Then I was mistaken and this is NOT fixed by #2766; rather, there must be an issue with the initscript or systemd unit responsible for exporting all zpools on shutdown. Which one is used, an initscript or a systemd unit?
It's systemd, and if you look up at the comments made on Aug 18 and 26 you'll find that, yes, we know it's the umount/export that's missing, and that it seemed to work without it on CentOS 6.
@dmaziuk:

Add this line to the […]

Add this line to the […]

Remove the […]

Does this solve the issue?
No. :( It's weird:
So far the only thing that consistently works is the "use '-f' to import anyway".
On second thought, I guess 'systemctl restart zfs-mount' does the explicit 'stop', so it's not weird that /tank is there if I reboot immediately afterwards -- it's the same as doing 'systemctl stop zfs-mount' before the reboot. Looks like on its own this systemd POS is not doing the 'stop' on reboot.
Is this with the modifications I described above? Before you applied these modifications, if you invoked […] -- I'm not sure what the default log target is on CentOS, but assuming that it's the systemd journal, you may be able to see what's going on by inspecting the journal with […].

Then add a […]

This gives you 2 minutes to inspect the output of the zpool commands immediately before shutdown. If the machine does not pause for 2 minutes, the ExecStop command probably isn't run at all. In that case, try changing the […]
I don't use CentOS myself, so I can't debug this, but maybe we can track it down together.
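One concrete way to run that kind of check is a systemd drop-in that logs pool state and pauses when the unit is stopped. The file name, the 120-second pause, and the assumption that zfs-mount.service is the unit in question are illustrative, not taken from the comment above; treat it as a debugging sketch only:

# /etc/systemd/system/zfs-mount.service.d/debug.conf -- hypothetical drop-in
[Service]
# Record pool state when the unit is stopped at shutdown, then pause for
# two minutes so the output can be read on the console or in the journal.
ExecStop=/usr/sbin/zpool status
ExecStop=/usr/sbin/zpool list
ExecStop=/usr/bin/sleep 120

Run systemctl daemon-reload after creating it. If the machine does not pause at all on reboot, that by itself confirms the ExecStop commands are never being run.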
I'm pretty sure it's not running "zfs-mount stop" with "Conflicts=shutdown.target" either. Unfortunately I hosed my test system while fscking around with systemd logging, I'm taking this week off, and I have more important stuff to finish before I leave tonight. So I can roll you a CentOS 7 KVM image next week when I come back and upload it someplace.
OK, got a fresh batch of round tuits and did a yum update to the latest systemd. In all cases /tank was mounted before reboot:
So ExecStop isn't run with Conflicts=shutdown.target either.
PS. [root@batfish system]# rpm -q systemd
Update: curiouser and curiouser: I just set up another CentOS 7 box, and all ZFS datasets got mounted on reboot (on the first one, at least). I'll see if I can figure out what's different...
Good news, sort of: after yum-updating the kernel, the other machine has lost /tank too (same error: "may be in use by other system, use -f"). So at least this is reproducible: it all works until you update the kernel. (Is dkms rebuilding the modules changing some magic that invalidates the cache file?) I'll see if I get time to play with "import-scan" tomorrow.
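For anyone wanting to try the same experiment: "import-scan" refers to the scan-based import unit that ZoL ships alongside the cache-based one; whether both units are present depends on the packaged version, so this is only a sketch:

# Switch boot-time import from the cachefile to scanning the devices.
systemctl disable zfs-import-cache.service
systemctl enable zfs-import-scan.service
# Depending on the unit's conditions, a stale /etc/zfs/zpool.cache may also
# need to be moved aside for the scan-based unit to run.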
Wasn't there a recent change relating to hostid?
There was a hostid fix merged to master recently. However, it's not part of the stable repository yet. You could try updating to the testing repository and see if it resolves the issue.
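A rough sketch of what "updating to the testing repository" looks like on CentOS 7; the zfs-testing repo id is an assumption based on the stock zfs-release repo file, so check /etc/yum.repos.d/zfs.repo on your own system first:

# yum-config-manager is in the yum-utils package.
yum install yum-utils
yum-config-manager --enable zfs-testing
yum update zfs spl
reboot    # load the rebuilt modules and re-test the import at boot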
Looks like the RPMs from testing cured the 2nd machine. I'll see when I can update the 1st one and make sure it works there too.
The testing RPMs fix the other machine too, so I'm calling the fix reproducible and the problem solved & closed.
@dmaziuk Thanks for confirming this was fixed!
Sorry for the stupid question... I am a newbie to CentOS and ZFS. I am running stock CentOS 7 and am about to allow system upgrades, and then will build my first ZFS system. What must I do to pick up this change? Thanks!
https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/
DUH. Actually, […]
Thank you so much!
possibly related to #2563
The ZFS pool worked for a while; at some point in the last few days -- maybe the last kernel update? -- something broke. There is no ZFS after boot.
Modules are loaded, systemd logs no errors, nfsd starts and shares the mountpoint. The ZFS pool isn't there. That happens with or without /etc/default/zfs.
smartctl is happy with the drives, "zpool import -f tank" works fine, and a subsequent scrub reports no errors (there's no data there yet, though).
From clean reboot:
[root@manta ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md127 397G 2.0G 375G 1% /
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 8.6M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
[root@manta ~]# lsmod | grep zfs
zfs 1202168 3
zunicode 331251 1 zfs
zavl 15010 1 zfs
zcommon 51321 1 zfs
znvpair 93262 2 zfs,zcommon
spl 290139 5 zfs,zavl,zunicode,zcommon,znvpair
[root@manta ~]# zpool list
no pools available
[root@manta ~]# zfs list
no datasets available
[root@manta ~]# zpool status -v
no pools available
[root@manta ~]# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway
[root@manta ~]# zpool import -f tank
[root@manta ~]# zpool status -v
pool: tank
state: ONLINE
scan: scrub repaired 0 in 0h0m with 0 errors on Thu Aug 14 12:33:30 2014
config:
errors: No known data errors