Implement zeventd daemon #2
Comments
Additionally, one case we need to make sure to handle here is detecting when a disk is going bad but has not yet failed. Often an educated guess about this can be made based on non-success sense codes returned by the drive during I/O. The Linux port promotes non-success sense codes to zevents for consumption by the zeventd. These events can be used to determine that the drive should be proactively failed and a hot spare kicked in.
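A toy sketch of that policy, assuming it lives in user space: count non-success sense-code events per vdev and, once a made-up threshold is hit, proactively replace the drive with a spare by shelling out to `zpool replace`. The threshold, table size, pool/vdev/spare names, and the use of system() are all illustrative assumptions, not anything the code does today.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define	MAX_VDEVS	64
#define	SENSE_ERR_LIMIT	8	/* hypothetical "drive is going bad" threshold */

static struct {
	char	name[128];
	int	errors;
} vdev_errs[MAX_VDEVS];

/* Record one non-success sense-code event against a vdev. */
static void
note_sense_error(const char *pool, const char *vdev, const char *spare)
{
	for (int i = 0; i < MAX_VDEVS; i++) {
		if (vdev_errs[i].name[0] == '\0')
			strncpy(vdev_errs[i].name, vdev,
			    sizeof (vdev_errs[i].name) - 1);
		if (strcmp(vdev_errs[i].name, vdev) != 0)
			continue;
		if (++vdev_errs[i].errors < SENSE_ERR_LIMIT)
			return;

		/* Threshold reached: kick in the hot spare. */
		char cmd[512];
		snprintf(cmd, sizeof (cmd), "zpool replace %s %s %s",
		    pool, vdev, spare);
		(void) system(cmd);
		vdev_errs[i].errors = 0;
		return;
	}
}

int main(void)
{
	/* Simulate nine sense-code events against the same (made up) vdev. */
	for (int i = 0; i < 9; i++)
		note_sense_error("tank", "/dev/sdb", "/dev/sdc");
	return (0);
}
```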
The zeventd daemon also looks like the correct way to automatically handle autoexpand. See my first comment in issue #120 for how this should be done.
I'd like to work on this.
Wonderful, we'd love the help. This is a daemon we know we're going to need but just haven't found the time to write yet. However, we have given it some thought, so let me sketch out what we're thinking and we can start a discussion plus implementation from there. At a high level, here's how the zeventd needs to work.

The daemon will most likely be started by the zfs init.d script at system boot time. It will need some minimal configuration from an /etc/zfs/zeventd.conf file. It would be reasonable to simply have the init script source this file and pass any needed configuration in as command line options. Alternately, a parser could be added to zeventd to read in the config file.

Once started, the zeventd needs to consume events posted from the kernel via the /dev/zfs ioctl interface. These events are nvlists which are packed in the kernel, then passed via ioctl() to user space where they are unpacked. The zpool events command shows exactly how this can be done. Depending on the configuration options, the event should then be sent to syslog.

Next, we want to add the ability for zeventd to exec an arbitrary helper application. Typically this will just be a script, which allows us to easily customize the behavior. The zeventd should pass the full event to stdin of the helper in an easily parsed form, for example the verbose zpool events -v form. We can then provide a default helper script which parses this and handles the common cases.
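Since the helper interface is the piece most people would customize, here is a minimal sketch of how zeventd could hand one formatted event to a helper on its stdin, using only POSIX calls. The helper path and the literal event text are placeholders; in a real daemon the text would come from the unpacked nvlist (e.g. formatted like zpool events -v).

```c
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

static int
run_helper(const char *helper, const char *event_text)
{
	int fds[2];
	pid_t pid;
	int status;

	if (pipe(fds) != 0 || (pid = fork()) < 0)
		return (-1);

	if (pid == 0) {
		/* Child: read the event from stdin, exec the helper. */
		dup2(fds[0], STDIN_FILENO);
		close(fds[0]);
		close(fds[1]);
		execl(helper, helper, (char *)NULL);
		_exit(127);		/* exec failed */
	}

	/* Parent: write the formatted event, then wait for the helper. */
	close(fds[0]);
	(void) write(fds[1], event_text, strlen(event_text));
	close(fds[1]);			/* helper sees EOF when we're done */
	waitpid(pid, &status, 0);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}

int main(void)
{
	/* Hypothetical helper path and event text, for illustration only. */
	return (run_helper("/etc/zfs/zeventd-helper.sh",
	    "class = ereport.fs.zfs.io\npool = tank\n"));
}
```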
For all of the above we should try to be consistent with the Solaris FMA behavior as much as possible. If people want different behavior they can of course tweak the script to their liking. That is exactly why we're making the policy bit a script and not placing it in the heart of zeventd. We can also provide some helper functions in the script to make customization easy, and list all the possible events and what they mean. The easiest way to implement the core of the zeventd will be as a user space daemon written in C. This way the daemon can easily leverage certain existing libraries in the zfs package to simplify the kernel ioctl, the nvlist manipulation, and event formatting. There are already helper functions to do all these things, so it should be relatively straightforward.
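As a rough illustration of the "user space daemon written in C" part, a bare-bones daemonization skeleton using only POSIX calls; nothing in it is ZFS-specific, error handling is omitted for brevity, and the event loop it would drive is left as a placeholder.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

static void
daemonize(void)
{
	if (fork() > 0)		/* parent exits, child carries on */
		exit(0);
	setsid();		/* become session leader, drop the tty */
	if (fork() > 0)		/* second fork: can never reacquire a tty */
		exit(0);

	umask(0);
	(void) chdir("/");

	/* Point the standard descriptors at /dev/null. */
	int fd = open("/dev/null", O_RDWR);
	dup2(fd, STDIN_FILENO);
	dup2(fd, STDOUT_FILENO);
	dup2(fd, STDERR_FILENO);
	if (fd > STDERR_FILENO)
		close(fd);
}

int main(void)
{
	daemonize();
	for (;;) {
		/* event consumption via /dev/zfs would go here */
		sleep(60);
	}
}
```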
Sounds good to me. I will set up a stubbed-out server in zfs/cmds/zeventd and will work towards having a parser for a config file that won't do much at first, then follow on with reading events from /dev/zfs, etc. I'm only cursorily familiar with the behavior of Solaris FMA. I have a dev environment set up under openSUSE 11.3 now, have built both zfs and spl, and have created a zpool that mounted. Will follow up with more later. Any suggested resources to look at to learn more about Solaris FMA?
Sounds like a plan. To be honest I'm not all that familiar with Solaris FMA either, but I don't think we need/want to replicate it entirely. At least initially we just want to make sure we have enough infrastructure in place to handle the common events like disk failure, autoreplace, and autoexpand. We can determine if we need a more complicated analysis engine later.
Quick questions: Do zfs events have some sort of computationally unique IDs? Can individual events be cleared, as opposed to clearing all events? Consider the following: zeventd is running, recognizes a disk failure event, and runs a script that is configured to launch in the event of such a failure; presumably this script attempts to move a hot spare into place. Some time later, an admin restarts zeventd using the init.d script. zeventd rereads events and sees the same series that it acted upon earlier. That would be bad! This is moot if doing a zpool_next_event(...) using blocking I/O will only read new events and not old ones (which looks like what happens in my initial testing). Initially, I was thinking that zeventd could keep a log of the events it has acted upon, using some unique ID associated with each event (an ID produced by the underlying ZFS kernel module), and not act on any event that it has already logged. And/or zeventd could remove events after it has acted on them (I don't like this idea; I'd rather leave the event log intact). Perhaps zeventd could inject log events as it acts on other events, so that the chain of event -> action/resolution is maintained in the zfs event log itself. This would also be a way for it to keep track of events that it has already acted upon.
There is supposed to be an ENA (Error Numeric Association) nvpair in the nvlist which is close to this. However, I just noticed there is not an ENA for all event types. This field is not added for events generated by zfs_post_common(); we may need to add this. If you look at the header in module/zfs/zfs_fm.c you can read a pretty good description of how ENAs are supposed to be used under Solaris FMA. The way events work at the moment is that the kernel maintains a list of the last 64 events. These events are then accessed via an ioctl() on /dev/zfs. Each process, when it opens /dev/zfs, is allowed to read events as a private stream. This allows multiple consumers of /dev/zfs events to operate concurrently without destroying each other's events. However, when the device is closed and reopened, the process will start getting events again from the oldest to the newest. You're absolutely right that this could be very bad if the zeventd was restarted and reprocessed the same events. Once we get the ENA added to all events you could start tracking those in a log file. Or you could try something even simpler to start with: all the events are time stamped by the kernel when they were generated. If you simply store the time stamp of the last event processed by zeventd, then you would know which events to skip if the daemon was restarted. Events older than the logged time stamp must already have been processed and can be ignored. There may be a small race here with something like ntp adjusting your clock, but that seems like an exceptionally rare case.
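A minimal sketch of the "remember the last processed timestamp" idea, assuming a simple state file. The file location and the event representation are invented for illustration; a real zeventd would pull the timestamp out of the event nvlist posted by the kernel.

```c
#include <stdio.h>
#include <stdint.h>

#define	STATE_FILE	"/var/run/zeventd.state"	/* hypothetical path */

struct zevent {
	uint64_t	ev_time;	/* kernel timestamp for the event */
	/* ... unpacked nvlist payload would live here ... */
};

static uint64_t
load_last_timestamp(void)
{
	FILE *fp = fopen(STATE_FILE, "r");
	unsigned long long ts = 0;

	if (fp != NULL) {
		if (fscanf(fp, "%llu", &ts) != 1)
			ts = 0;
		fclose(fp);
	}
	return ((uint64_t)ts);
}

static void
save_last_timestamp(uint64_t ts)
{
	FILE *fp = fopen(STATE_FILE, "w");

	if (fp != NULL) {
		fprintf(fp, "%llu\n", (unsigned long long)ts);
		fclose(fp);
	}
}

/* Returns nonzero if the event is newer than anything already processed. */
static int
event_is_new(const struct zevent *ev, uint64_t last_seen)
{
	return (ev->ev_time > last_seen);
}

int main(void)
{
	struct zevent ev = { .ev_time = 42 };	/* stand-in for a real event */
	uint64_t last = load_last_timestamp();

	if (event_is_new(&ev, last)) {
		/* act on the event, then remember how far we got */
		save_last_timestamp(ev.ev_time);
	}
	return (0);
}
```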
Ahh yes, I had noticed the ENA and thought it might be this. I like your suggestion of storing a high resolution copy of the time as an initial start and look forward to having ENA on all events.
Seems to me like udev or udisks already provide what you are looking for, and you should route events through them (making zeventd, if needed, a consumer of those interfaces).
I have been busy with a job transition but am coming back to this. Re: udev and udisks: udev is for devices that the kernel notices are new or have been removed, and is not appropriate for passing messages for zeventd. zeventd would most likely be a path for messages to end up in udisks.
Hi Brian,
No, I can't see this code being finished by -rc9. But you're right, it absolutely needs to happen, and sooner would be better than later. We just haven't had enough time with the other outstanding issues to get to this. You could probably cobble together some scripts to handle this job if you're desperate and need it right away.
Hi Brian, while composing some C code for a zeventd, the time shifts are driving me crazy. There are always time shifts within a system. On startup there is a time shift when the timezone gets applied, and startup is the most common point for vdev failures. If we think about traveling around with a zpool, then there are lots of them. Also, NTP time shifts occur quite regularly.
@pyavdr Concerning the time shift problem for ENAs, we could perhaps include the current value of the kernel's monotonically increasing clock along with the timestamp. Or, even simpler, a monotonically increasing count of ENAs which have been generated since the zfs module was loaded (although that might break down if you were to unload/reload the module). Really, the contents of the event are entirely up to us, so don't feel too locked in to what's there now. Offhand I'm not exactly sure which events should be used to trigger a replacement. I'd need to look at it carefully; it wouldn't be the worst idea to look and see what Solaris does in their FMA. I haven't had a chance to seriously look into either #250 or #544 yet. However, you're right that getting them fixed is a bit of a prerequisite for this work.
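To make that suggestion concrete, here is a sketch of a restart-safe event identifier built from a per-load identifier plus a monotonically increasing sequence number, so wall-clock adjustments (NTP, timezone) never reorder or collide IDs. The names and the use of CLOCK_MONOTONIC are assumptions for the example; nothing like this exists in the code yet.

```c
#include <stdint.h>
#include <time.h>

struct zevent_id {
	uint64_t	eid_load;	/* fixed at module load time */
	uint64_t	eid_seq;	/* increments for every posted event */
};

static uint64_t zevent_load_id;
static uint64_t zevent_seq;		/* would be atomic in the kernel */

static void
zevent_id_init(void)
{
	struct timespec ts;

	/* CLOCK_MONOTONIC is immune to NTP steps; good enough to tag a load. */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	zevent_load_id = (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
	zevent_seq = 0;
}

static struct zevent_id
zevent_id_next(void)
{
	struct zevent_id id = { zevent_load_id, ++zevent_seq };
	return (id);
}

int main(void)
{
	zevent_id_init();
	struct zevent_id id = zevent_id_next();
	return (id.eid_seq == 1 ? 0 : 1);
}
```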
Brian, while searching I found this: http://gdamore.blogspot.com/2010/07/zfs-disk-monitoring.html Is there any chance you can contact him to ask for the code of his monitor?
Garrett's been busy. It sounds like he's willing to share the code with anyone, and it's certainly worth taking a look at. Although, since it's an FMA module, I'm not sure how useful it will be.
Brian, while testing around failed disks, the zpool events for some cases are missing. In my VMware SUSE guest I created a quite large pool with 16 vdevs and a zfs dataset on it. Next I removed one vdev with the VMware device manager and copied a file to the pool. After that the pool shows state (zpool status) "degraded", with the missing vdev marked first as "faulted" and then "removed". But there is only a simple "state change" event in the zpool events.
As of zfs-0.4.9 the infrastructure is in place to monitor events generated by zfs from user space. Currently the only consumer of this ABI is the 'zpool events' command, which can be used to print the events, which are just nvlists. What we need now is a zeventd daemon which uses this interface to monitor zevents. Initially this daemon can simply forward the events cleanly off to syslog, but longer term we want to be able to have specific events trigger actions. For example, in the case of a disk failure event you may want to send an email or perhaps kick off a suite of diagnostics against the newly failed drive.
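As a rough shape for the "forward to syslog, then trigger actions" behavior described above, a sketch using syslog(3). The dispatch logic, the action stub, and the sample event text are placeholders, though the ereport.fs.zfs.* class names do match what zpool events prints.

```c
#include <string.h>
#include <syslog.h>

static void
act_on_disk_failure(const char *event_text)
{
	/* e.g. send mail or exec a diagnostics helper (not shown). */
	(void) event_text;
}

static void
zeventd_handle(const char *class, const char *event_text)
{
	/* Always forward the raw event to syslog first. */
	syslog(LOG_NOTICE, "zevent %s: %s", class, event_text);

	/* Then apply whatever policy is configured for this class. */
	if (strcmp(class, "ereport.fs.zfs.io") == 0 ||
	    strcmp(class, "ereport.fs.zfs.checksum") == 0)
		act_on_disk_failure(event_text);
}

int main(void)
{
	openlog("zeventd", LOG_PID, LOG_DAEMON);
	/* A real daemon would be driven by events read from /dev/zfs. */
	zeventd_handle("ereport.fs.zfs.io", "pool=tank vdev=/dev/sdb");
	closelog();
	return (0);
}
```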