zpool core dump #536
I'm getting weird behavior with the zpool and zfs commands; each invocation simply reports back by printing "Aborted".
Try running it under gdb and see what's happening. I occasionally see this sort of behavior in my automated test VMs too, but I've never been able to reproduce it when I needed to and run it to ground.
grove329@surya1: gdb zpool
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see: ...
Reading symbols from /sbin/zpool...(no debugging symbols found)...done.
(gdb) run status
Starting program: /sbin/zpool status
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 6426.

Program received signal SIGABRT, Aborted.
0x00002aaaacd56885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install zfs-0.6.0-rc6_1chaos.ch5.x86_64
(gdb) bt
#0  0x00002aaaacd56885 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00002aaaacd58065 in abort () at abort.c:92
#2  0x00002aaaabc21192 in make_dataset_handle_common (zhp=0x6223e0, zc=) at ../../lib/libzfs/libzfs_dataset.c:426
#3  0x00002aaaabc211ef in make_dataset_handle_zc (hdl=0x61b060, zc=0x7fffffff7bc0) at ../../lib/libzfs/libzfs_dataset.c:473
#4  0x00002aaaabc2169b in zfs_iter_filesystems (zhp=0x61cdb0, func=0x2aaaac2e2270 , data=0x7fffffffe220) at ../../lib/libzfs/libzfs_dataset.c:2500
#5  0x00002aaaac2e231a in update_zfs_shares_cb (zhp=0x61cdb0, pcookie=0x7fffffffe220) at ../../lib/libshare/libshare.c:242
#6  0x00002aaaabc1ed31 in zfs_iter_root (hdl=0x61b060, func=0x2aaaac2e2270 , data=0x7fffffffe220) at ../../lib/libzfs/libzfs_config.c:365
#7  0x00002aaaac2e29d7 in update_zfs_shares (init_service=) at ../../lib/libshare/libshare.c:326
#8  sa_init (init_service=) at ../../lib/libshare/libshare.c:97
#9  0x00002aaaac2e29f0 in libshare_init () at ../../lib/libshare/libshare.c:113
#10 0x00002aaaac2e3a16 in __do_global_ctors_aux () from //lib64/libshare.so.1
#11 0x00002aaaac2e12f3 in _init () from //lib64/libshare.so.1
#12 0x00002aaaac8f1990 in ?? ()
#13 0x00002aaaaaab94a5 in call_init (main_map=0x2aaaaaccc188, argc=-1404156232, argv=0x7fffffffe2e8, env=0x7fffffffe300) at dl-init.c:70
#14 _dl_init (main_map=0x2aaaaaccc188, argc=-1404156232, argv=0x7fffffffe2e8, env=0x7fffffffe300) at dl-init.c:134
#15 0x00002aaaaaaabb3a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#16 0x0000000000000002 in ?? ()
#17 0x00007fffffffe5a9 in ?? ()
#18 0x00007fffffffe5b5 in ?? ()
#19 0x0000000000000000 in ?? ()
It looks like the zfs_dmustats.dds_type just isn't set yet. See the second abort in make_dataset_handle_common():426.
Here's the piece of code triggering the abort.
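For reference, the check around libzfs_dataset.c:426 in make_dataset_handle_common() looks essentially like the following (paraphrased from the 0.6.0-era source; exact wording and line numbers may differ):

	/* Paraphrase of the type check whose abort() shows up in the
	 * backtrace above; not an exact quote of the shipped source. */
	if (zhp->zfs_dmustats.dds_is_snapshot)
		zhp->zfs_type = ZFS_TYPE_SNAPSHOT;
	else if (zhp->zfs_dmustats.dds_type == DMU_OST_ZVOL)
		zhp->zfs_type = ZFS_TYPE_VOLUME;
	else if (zhp->zfs_dmustats.dds_type == DMU_OST_ZFS)
		zhp->zfs_type = ZFS_TYPE_FILESYSTEM;
	else
		abort();	/* we should never see any other types */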
This seems odd. Just above the previous code snippet, it executes this:

	/*
	 * We've managed to open the dataset and gather statistics. Determine
	 * the high-level type.
	 */
	if (zhp->zfs_dmustats.dds_type == DMU_OST_ZVOL)
		zhp->zfs_head_type = ZFS_TYPE_VOLUME;
	else if (zhp->zfs_dmustats.dds_type == DMU_OST_ZFS)
		zhp->zfs_head_type = ZFS_TYPE_FILESYSTEM;
	else
		abort();

Since it is making essentially the same check on dds_type, shouldn't it have aborted in that earlier check first?
Just saw your comment Brian. I guess I should have refreshed this page sooner. If it's simply that zfs_dmustats.dds_type isn't set yet, shouldn't the earlier check have aborted as well?
Indeed, it should... clearly something more interesting is going on. I hadn't noticed the same check above. It's possible the zhp is being updated by multiple threads; I'd need to check the whole call path to see.
Although, since it's failing at the same place every time I run the command, a race between threads seems unlikely.
I'd add some debug code to see what's going on.
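A minimal version of that debug code could be something along these lines — purely a hypothetical sketch, dumping the relevant dmu stats fields just before the type checks in make_dataset_handle_common():

	/*
	 * Hypothetical debug aid: drop this into make_dataset_handle_common()
	 * just before the dds_type checks, to see what the kernel actually
	 * returned for the dataset that trips the abort().
	 */
	(void) fprintf(stderr, "DEBUG %s: dds_type=%d dds_is_snapshot=%d "
	    "dds_inconsistent=%d\n", zhp->zfs_name,
	    (int)zhp->zfs_dmustats.dds_type,
	    (int)zhp->zfs_dmustats.dds_is_snapshot,
	    (int)zhp->zfs_dmustats.dds_inconsistent);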
OK. If you point me to a branch whenever you get around to adding the debug code, I'll get it installed on grove.
Actually, I was hoping you would add some debug code. :) It might be a little while before I get a chance to look at this.
Oops. I read that as "I'll", not "I'd". Yeah, that works too; I'll dig into it some more.
OK. This confirms our suspicions. The dds_type being returned is DMU_OST_OTHER. It can be easily reproduced by running the ...
It looks to me like it would be safe (and correct) to return -1 in the DMU_OST_OTHER case. |
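A sketch of what that change could look like, assuming the same type checks quoted above (illustrative only, not the final patch):

	/*
	 * Sketch of the proposed change in make_dataset_handle_common():
	 * instead of abort()ing on an unrecognized type such as
	 * DMU_OST_OTHER, fail the handle creation so the caller simply
	 * skips that dataset.
	 */
	if (zhp->zfs_dmustats.dds_is_snapshot)
		zhp->zfs_type = ZFS_TYPE_SNAPSHOT;
	else if (zhp->zfs_dmustats.dds_type == DMU_OST_ZVOL)
		zhp->zfs_type = ZFS_TYPE_VOLUME;
	else if (zhp->zfs_dmustats.dds_type == DMU_OST_ZFS)
		zhp->zfs_type = ZFS_TYPE_FILESYSTEM;
	else
		return (-1);	/* unknown type, e.g. DMU_OST_OTHER */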
zpool create grove329 /dev/mapper/grove329_1 /dev/mapper/grove329_2 /dev/mapper/grove329_3
After the node crashed and was power cycled:
zpool status showed no pools available.
zpool import showed the two pools it could see (grove329 and grove330), but they were showing up with devices like sde and sdh instead of the multipath devices, and the pools were listed as unavailable.
A zpool import -d /dev/mapper showed the proper pools, and all the devices showed up as online.
When I try a zpool import -d /dev/mapper grove329, the zpool command core dumps. I tried a couple of times with and without the -f flag and it core dumped both times. Also, after this initial core dump, any subsequent zpool commands dump core as well.