-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various ZED fixes #6858
Various ZED fixes #6858
Conversation
0290a22
to
f7a56d1
Compare
tests/runfiles/issue-2562.run
Outdated
outputdir = /var/tmp/test_results | ||
|
||
[tests/functional/fault] | ||
tests = ['auto_online_001_pos', 'auto_replace_001_pos', 'auto_spare_001_pos', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should try setting TEST_ZFSTESTS_ITERS
in the TEST
file to a reasonable value and set TEST_ZFSTESTS_TAGS
to fault
in this case. It should simplify your testing quite a bit and help you avoid making a separate run file for testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a bit awkward using a custom runfile in this case, but i want to run those fault
tests just once to verify i don't break anything when i update functions shared by that group (that is, before i drop the custom runfile) and instead "stress" the new test i'm adding: it's also easier to read the log from my browser if the output is always coming from the same "new" test.
One thing i would like to be able to do from the TEST
file is to disable the codecov builder/bot while the PR is still work in progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be able to come up with something soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is exactly the fix I've wanted to get to for a while now.
I'm not sure how to explain the crash you observed locally but it wouldn't surprise me if it was something specific to the scsi_debug module. It doesn't look like the automated testing hit this specified issue.
On a related note, we also wanted to add another auto_spare
test case which verifies that in the face of multiple failures multiple spares are kicked in. This should work with the code as it stands today.
2d3d72a
to
713eede
Compare
log_must modprobe -r scsi_debug | ||
fi | ||
load_scsi_debug $(($SPA_MINDEVSIZE/1024/1024+32)) $SDHOSTS $SDTGTS $SDLUNS '512e' | ||
SPARE_DEVICE=$(get_debug_device) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You’re creating a ‘512e’ scsi device here, but you never reload the ‘scsi_debug’ module to recreate the ‘512b’ device. You can’t depend on test ordering or that other tests don’t just assume that they should re-setup ‘scsi_debug’.
Since we need different device types, would it be worth considering having each test in this test group perform a ‘load_scsi_debug’ call to set up an appropriate scsi device? It might simplify the test group overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinatale2 sorry for the late reply, i've been unable to work on this due to an unexpected workload.
Loading the scsi_debug driver in each test group sounds good to me; i'm also trying to update the other test case to exercise ZED with multiple, parallel faults. I'm going to push something once my local builder is able to run these changes, hopefully before next monday.
OT: does it make sense to split this into two separate commits ("update ZED to handle ashift" + "new spare test case with multiple faults")? I don't mind the extra (minimal) step required if it's going to help keep the git history more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@loli10K I'm OK with this either as a single commit or as two. All I ask if that you open a new PR for the multiple fault test case if you do decide to split it. It makes it easier for me to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then let's merge this in a single commit, it seems easier for everyone.
On a more related note, moving the 'load_scsi_debug' call into the individual scripts that need it enabled some tests that were previously skipped (scsi_debug
was missing, and triggering a log_unsupported
in the setup script): now 'auto_spare_002_pos' seems to be failing on a couple of Ubuntu builders (Ubuntu 17.04 x86_64 / Ubuntu 16.04 x86_64).
13:24:38.47 NOTE: Performing local cleanup via log_onexit (cleanup)
13:24:38.48 pool: testpool
13:24:38.48 state: ONLINE
13:24:38.48 status: One or more devices has experienced an unrecoverable error. An
13:24:38.48 attempt was made to correct the error. Applications are unaffected.
13:24:38.48 action: Determine if the device needs to be replaced, and clear the errors
13:24:38.48 using 'zpool clear' or replace the device with 'zpool replace'.
13:24:38.48 see: http://zfsonlinux.org/msg/ZFS-8000-9P
13:24:38.48 scan: none requested
13:24:38.48 config:
13:24:38.48
13:24:38.48 NAME STATE READ WRITE CKSUM
13:24:38.48 testpool ONLINE 0 0 0
13:24:38.48 mirror-0 ONLINE 0 0 0
13:24:38.48 /mnt/file-1 ONLINE 0 0 142
13:24:38.48 /mnt/file-2 ONLINE 0 0 0
13:24:38.48 /mnt/file-3 ONLINE 0 0 0
13:24:38.48 /mnt/file-4 ONLINE 0 0 0
13:24:38.48 spares
13:24:38.48 /mnt/spare-1 AVAIL
13:24:38.48
13:24:38.48 errors: No known data errors
For some reason ZED doesn't kick in the hot spare. I've been trying to reproduce this for the past few hours on a local builder (Ubuntu 16.04 x86_64) without luck, the test is always a PASS here.
c28b5cf
to
3688d37
Compare
I think you want the test group to be skipped properly if If that's the case, you could introduce a new function similar to the following:
You can then call that within Hopefully I understood you correctly, if not let me know. |
3688d37
to
4bb3689
Compare
@dinatale2 sorry for not being clear, what i meant to say is this "new" failures ("auto_spare_002_pos") don't seem to be caused by this PR and we're only seeing them now because my changes are enabling tests that were previously skipped. I think it makes sense to keep them enabled and running, since they don't really need 'scsi_debug' loaded to run. EDIT: nevermind, we now have a new failure, somehow ZED is still running when expected to be stopped ("ZED already running" in setup.ksh, Ubuntu builders); this is different from the other issue i was seeing (device with many CKSUM not getting faulted by ZED) which could be caused by ZED not running during those tests. |
7a449ee
to
3a28e80
Compare
if [[ -f ${ZEDLET_DIR}/zed.pid ]]; then | ||
zedpid=$(cat ${ZEDLET_DIR}/zed.pid) | ||
kill $zedpid | ||
wait $zedpid | ||
rm -f ${ZEDLET_DIR}/zed.pid | ||
fi | ||
|
||
log_note "ZED stopped" | ||
ps -C zed | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something isn't quite right here (http://build.zfsonlinux.org/builders/Ubuntu%2017.04%20x86_64%20%28TEST%29/builds/1583/steps/shell_8/logs/log):
Test: /usr/share/zfs/zfs-tests/tests/functional/fault/cleanup (run as root) [00:00] [PASS]
06:52:31.22 SUCCESS: zpool create -f foopool27889 loop0 loop1 loop2
06:52:31.39 SUCCESS: zpool destroy -f foopool27889
06:52:31.39 NOTE: Stopping ZED
06:52:31.40 PID TTY TIME CMD
06:52:31.40 8743 ? 00:00:01 zed
06:52:31.41 NOTE: ZED stopped
06:52:31.42 PID TTY TIME CMD
06:52:31.42 8743 ? 00:00:01 zed
06:52:31.42 28187 ? 00:00:00 zed
06:52:31.43 SUCCESS: rm -f /var/tmp/zed/zed.rc
06:52:31.43 SUCCESS: rm -f /var/tmp/zed/zed-functions.sh
06:52:31.44 SUCCESS: rm -f /var/tmp/zed/all-syslog.sh
06:52:31.45 SUCCESS: rm -f /var/tmp/zed/all-debug.sh
06:52:31.45 SUCCESS: rm -f /var/tmp/zed/state
06:52:31.46 SUCCESS: rm -f /var/tmp/zed/zed.log
06:52:31.47 SUCCESS: rm -f /var/tmp/zed/zed.debug.log
06:52:31.47 SUCCESS: rm -f /etc/zfs/vdev_id.conf
06:52:31.48 SUCCESS: rm -f /var/tmp/zed/vdev_id.conf
Test: /usr/share/zfs/zfs-tests/tests/functional/fault/setup (run as root) [00:00] [FAIL]
06:52:31.62 SUCCESS: mkdir /var/tmp/zed
06:52:31.63 SUCCESS: touch /var/tmp/zed/vdev_id.conf
06:52:31.63 SUCCESS: ln -s /var/tmp/zed/vdev_id.conf /etc/zfs/vdev_id.conf
06:52:31.64 SUCCESS: cp /etc/zfs/zed.d/zed.rc /var/tmp/zed
06:52:31.65 SUCCESS: cp /etc/zfs/zed.d/zed-functions.sh /var/tmp/zed
06:52:31.65 SUCCESS: sed -i /\#ZED_DEBUG_LOG=.*/d /var/tmp/zed/zed.rc
06:52:31.66 SUCCESS: umask 0022
06:52:31.67 SUCCESS: cp /usr/lib/x86_64-linux-gnu/zfs/zed.d/all-syslog.sh /var/tmp/zed
06:52:31.67 SUCCESS: cp /usr/lib/x86_64-linux-gnu/zfs/zed.d/all-debug.sh /var/tmp/zed
06:52:31.68 SUCCESS: umask 0000
06:52:31.68 NOTE: Starting ZED
06:52:31.69 PID TTY TIME CMD
06:52:31.69 8743 ? 00:00:01 zed
06:52:31.69 28281 ? 00:00:00 zed
06:52:31.70 ZED already running
06:52:31.70 NOTE: Performing test-fail callback (/usr/share/zfs/zfs-tests/callbacks/zfs_dbgmsg.ksh)
3793eed
to
b3f1924
Compare
wait $zedpid | ||
while ps -p $zedpid > /dev/null; do | ||
sleep 1 | ||
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait
(shell-builtin) doesn't seem to work when zed_start()
and zed_stop()
are run in different processes (setup.ksh and cleanup.ksh respectively): this resulted in several failures running fault
tests with TEST_ZFSTESTS_ITERS="10"
.
b49d8e9
to
ab66cf3
Compare
@behlendorf we are de-referencing an invalid pointer in "libzfs.so.2.0.0", which is killing ZED. It's
I may have found the smoking gun in another core (crash is in "libc.so"); we have two different threads iterating over
IIRC libzfs is not thread-safe. I've tried reproducing the crash with the following patch applied but this seems to "avoid" the issue, which confirms my theory: diff --git a/lib/libzfs/libzfs_config.c b/lib/libzfs/libzfs_config.c
index 67379d0..2285369 100644
--- a/lib/libzfs/libzfs_config.c
+++ b/lib/libzfs/libzfs_config.c
@@ -49,6 +49,7 @@
#include <libuutil.h>
#include "libzfs_impl.h"
+#include "libuutil_impl.h"
typedef struct config_node {
char *cn_name;
@@ -401,6 +402,8 @@ zpool_iter(libzfs_handle_t *hdl, zpool_iter_f func, void *data)
if (!hdl->libzfs_pool_iter && namespace_reload(hdl) != 0)
return (-1);
+ (void ) pthread_mutex_lock(&hdl->libzfs_ns_avl->ua_pool->uap_lock);
+
hdl->libzfs_pool_iter++;
for (cn = uu_avl_first(hdl->libzfs_ns_avl); cn != NULL;
cn = uu_avl_next(hdl->libzfs_ns_avl, cn)) {
@@ -423,6 +426,8 @@ zpool_iter(libzfs_handle_t *hdl, zpool_iter_f func, void *data)
}
hdl->libzfs_pool_iter--;
+ (void ) pthread_mutex_unlock(&hdl->libzfs_ns_avl->ua_pool->uap_lock);
+
return (0);
} This is in no way a valid "fix": a better solution is probably to avoid using libzfs in a multi-thread context, maybe just running To reproduce the crash i'm running these in 3 different shells:
Finally, this is taking a lot more than i had expected and is starting to aggregate a lot of changes unrelated to the initial PR: sorry about that. Also, apologies for the wall of text. |
84cda9b
to
6757314
Compare
Right, it's not. So the caller needs to implement the needed locking. What you could do is either.
Option 1) would be my preference since it's nice and simple, but there may be some reason why this isn't workable which I'm not seeing.
No problem, I'm glad to see these issues addressed. But speaking of changes unrelated to the original PR while you're here you could also convert the ZED over to using the |
1a53bc4
to
a435665
Compare
cmd/zed/agents/zfs_mod.c
Outdated
@@ -559,15 +556,13 @@ zfs_iter_pool(zpool_handle_t *zhp, void *data) | |||
for (pool = list_head(&g_pool_list); pool != NULL; | |||
pool = list_next(&g_pool_list, pool)) { | |||
|
|||
if (pool->uap_enable_tid != 0) | |||
continue; /* entry already processed */ | |||
if (strcmp(zpool_get_name(zhp), | |||
zpool_get_name(pool->uap_zhp))) | |||
continue; | |||
if (zfs_toplevel_state(zhp) >= VDEV_STATE_DEGRADED) { | |||
/* send to a background thread; keep on list */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The entry can be removed from the list here the same as in the upstream code.
a435665
to
743a526
Compare
* Teach ZED to handle spares usingi the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropiate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
743a526
to
fd9b69b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for tackling this cleanup.
@@ -83,6 +82,7 @@ | |||
#include <sys/sunddi.h> | |||
#include <sys/sysevent/eventdefs.h> | |||
#include <sys/sysevent/dev.h> | |||
#include <thread_pool.h> | |||
#include <pthread.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit, you can probably now drop the pthread.h
include since the code now no longer directly uses pthreads (only through the thread pool library)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropping the #include <pthread.h>
doesn't seem to result in any build error (it's included in thread_pool.h
anyway), but since we are using pthread_create()
to spawn zfs_enum_pools()
it seems clearer to explicitly include the header file:
C symbol: pthread_create
File Function Line
0 zfs_agents.c zfs_agent_init 311 if (pthread_create(&g_agents_tid, NULL, zfs_agent_consumer_thread,
1 zfs_mod.c zfs_slm_init 861 if (pthread_create(&g_zfs_tid, NULL, zfs_enum_pools, NULL) != 0) {
2 zed_disk_event.c zed_disk_event_init 374 if (pthread_create(&g_mon_tid, NULL, zed_udev_monitor, g_mon) != 0) {
3 thread_pool.c create_worker 236 error = pthread_create(&thread, &tpool->tp_attr, tpool_worker, tpool);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to leave it then.
@loli10K when you're happy with this go ahead and remove the WIP label. |
@behlendorf i'm bulding this locally right now, i'll try to remove the |
@loli10K merged, thanks! |
* Teach ZED to handle spares usingi the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropiate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#2562 Closes openzfs#6858
* Teach ZED to handle spares usingi the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropiate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#2562 Closes openzfs#6858
* Teach ZED to handle spares usingi the configured ashift: if the zpool 'ashift' property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS natively detects them as 4K block devices. * Introduce an additional auto_spare test case which verifies that in the face of multiple device failures an appropiate number of spares are kicked in. * Fix zed_stop() in "libtest.shlib" which did not correctly wait the target pid. * Fix ZED crashing on startup caused by a race condition in libzfs when used in multi-threaded context. * Convert ZED over to using the tpool library which is already present in the Illumos FMA code. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: loli10K <ezomori.nozomu@gmail.com> Closes openzfs#2562 Closes openzfs#6858
Description
zed_stop()
in "libtest.shlib"TODO:
Motivation and Context
If the zpool "ashift" property is set then ZED should use its value when kicking in a hotspare; with this change 512e disks can be used as spares for VDEVs that were created with ashift=9, even if ZFS
natively detects them as 4K block devices.
#2562 (comment)
Consider a ZFS mirror of "old" 512b devices (i'm using file vdevs just for testing):
When one of the "old" disks start failing we want to keep the storage redundant, so we add a "new" 512e spare disk to the pool (
scsi_debug
device):Now one of the disk start failing: ZED detects this but is unable to kick in the hot spare. From
strace -f -e trace=ioctl zed -Ffv
:The pool remains without redundancy, because we attempted to replace the failing device with a disk providing a different sector size:
We need to provide a custom
ashift
value to replace the disk:How Has This Been Tested?
Test added to the ZTS. Full disclosure, the test panics my test builder:
Types of changes
Checklist:
Signed-off-by
.