zpool add intermittently fails #71
Comments
Actually, after looking more closely, I think spa_scan_get_stats() returning ENOENT is a red herring. With better instrumentation I'm now focusing on spa_validate_aux_devs() and vdev_label_init().
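As an illustration of the approach in this comment, the following toy program shows why tracing the return value at each level of a call chain matters: a routine deeper in the tree can return ENOENT without that value being the one userspace actually sees. The fake_* functions are stand-ins for the real ZFS routines, and the discarded return value is only a hypothetical way such a red herring can arise.

```c
#include <errno.h>
#include <stdio.h>

static int
fake_vdev_label_init(void)
{
	return (ENOENT);	/* pretend this is the real origin */
}

static int
fake_spa_validate_aux_devs(void)
{
	int err = fake_vdev_label_init();

	fprintf(stderr, "vdev_label_init() -> %d\n", err);
	return (err);		/* propagated unchanged to the caller */
}

static int
fake_spa_scan_get_stats(void)
{
	return (ENOENT);	/* also ENOENT, but discarded below: a red herring */
}

int
main(void)
{
	(void) fake_spa_scan_get_stats();	/* return value never reaches userspace */

	int err = fake_spa_validate_aux_devs();
	fprintf(stderr, "spa_validate_aux_devs() -> %d\n", err);
	return (err != 0);
}
```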
This is fixed by d877ac6.
dajhorn referenced this issue in zfsonlinux/pkg-zfs on Jan 6, 2012:
The taskq_t's active thread list is sorted based on its tqt_ent->tqent_id field. The list is kept sorted solely by inserting new taskq_thread_t's in their correct sorted location; no other means is used. This means that once inserted, if a taskq_thread_t's tqt_ent->tqent_id field changes, the list runs the risk of no longer being sorted.

Prior to the introduction of the taskq_dispatch_prealloc() interface, this was not a problem as a taskq_ent_t actively being serviced under the old interface should always have a static tqent_id field. Thus, once the taskq_thread_t is added to the taskq_t's active thread list, the taskq_thread_t's tqt_ent->tqent_id field would remain constant.

Now, this is no longer the case. Currently, if using the taskq_dispatch_prealloc() interface, any given taskq_ent_t actively being serviced _may_ have its tqent_id value incremented. This happens when the preallocated taskq_ent_t structure is recursively dispatched. Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id field silently modified from under its feet. If this were to happen to a taskq_thread_t on a taskq_t's active thread list, this would compromise the integrity of the order of the list (as the list _may_ no longer be sorted).

To get around this, the taskq_thread_t's taskq_ent_t pointer was replaced with its own static copy of the tqent_id. So, as a taskq_ent_t is pulled off of the taskq_t's pending list, a static copy of its tqent_id is made and this copy is used to sort the active thread list. Using a static copy is key in ensuring the integrity of the order of the active thread list. Even if the underlying taskq_ent_t is recursively dispatched (and has its tqent_id modified), this static copy stored inside the taskq_thread_t will remain constant.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #71
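To make the mechanism concrete, here is a minimal standalone C sketch of the idea in that commit message: the worker keeps its own copy of the entry's id, taken at dequeue time, and the sorted active list is keyed on that copy, so a recursive re-dispatch that bumps the entry's id cannot unsort the list. The struct and function names below are illustrative stand-ins, not the SPL's taskq code.

```c
#include <stdio.h>

struct task_ent {
	unsigned long id;		/* may change if the entry is re-dispatched */
};

struct worker {
	unsigned long active_id;	/* private copy taken at dequeue time */
	struct worker *next;		/* singly linked, kept sorted by active_id */
};

/* Insert a worker into the active list, keeping it sorted by active_id. */
static void
active_list_insert(struct worker **head, struct worker *w)
{
	struct worker **pp = head;

	while (*pp != NULL && (*pp)->active_id < w->active_id)
		pp = &(*pp)->next;

	w->next = *pp;
	*pp = w;
}

/* A worker picks up an entry: copy the id before running the task. */
static void
worker_service(struct worker *w, struct worker **active, struct task_ent *t)
{
	w->active_id = t->id;		/* static copy, immune to re-dispatch */
	active_list_insert(active, w);

	t->id += 1;			/* simulate a recursive re-dispatch bumping the id */
}

int
main(void)
{
	struct worker w1 = { 0 }, w2 = { 0 }, w3 = { 0 };
	struct worker *active = NULL;
	struct task_ent a = { 10 }, b = { 20 }, c = { 15 };

	worker_service(&w1, &active, &a);
	worker_service(&w2, &active, &b);
	worker_service(&w3, &active, &c);

	for (struct worker *w = active; w != NULL; w = w->next)
		printf("active id %lu\n", w->active_id);	/* prints 10, 15, 20 */

	return (0);
}
```

The design point is simply that the sort key lives in the worker, not in the mutable entry, so nothing the dispatcher does to the entry afterwards can reorder the list.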
dajhorn referenced this issue in zfsonlinux/pkg-zfs on Jan 6, 2012:
A preallocated taskq_ent_t's tqent_flags must be checked prior to servicing the taskq_ent_t. Once a preallocated taskq entry is serviced, the ownership of the entry is handed back to the caller of taskq_dispatch, thus the entry's contents can potentially be mangled.

In particular, this is a problem in the case where a preallocated taskq entry is serviced, and the caller clears its tqent_flags field. Thus, when the function returns and task_done is called, it looks as though the entry is **not** a preallocated task (when in fact it **is** a preallocated task). In this situation, task_done will place the preallocated taskq_ent_t structure onto the taskq_t's free list. This is a **huge** mistake. If the taskq_ent_t is then freed by the caller of taskq_dispatch, the taskq_t's free list will hold a pointer to garbage data. Even worse, if nothing has overwritten the freed memory before the pointer is dereferenced, it may still look as though it points to a valid list_head belonging to a taskq_ent_t structure.

Thus, the task entry's flags are now copied prior to servicing the task. This copy is then checked to see if it is a preallocated task, and used to determine whether the entry needs to be passed down to the task_done function.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #71
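The fix can be sketched in standalone C: snapshot the entry's flags before invoking the task function, and make the "hand it to task_done / put it on the free list" decision from the snapshot rather than from the possibly mangled entry. The names and flag value below are illustrative, not the SPL's.

```c
#include <stdio.h>

#define TASK_PREALLOC	0x1	/* illustrative flag bit, not the SPL value */

struct task_ent {
	unsigned int flags;
	void (*func)(struct task_ent *);
};

/*
 * A task function that, like a caller of the prealloc interface, takes
 * ownership of the entry back and scribbles on its flags.
 */
static void
mangle_own_flags(struct task_ent *t)
{
	t->flags = 0;
}

/* Service routine: copy the flags up front, decide from the copy afterwards. */
static void
service(struct task_ent *t)
{
	unsigned int flags = t->flags;	/* snapshot before the entry can change */

	t->func(t);			/* entry may now be mangled or even freed */

	if (flags & TASK_PREALLOC)
		printf("preallocated entry: not placed on the free list\n");
	else
		printf("internal entry: would be returned to the free list\n");
}

int
main(void)
{
	struct task_ent t = { TASK_PREALLOC, mangle_own_flags };

	service(&t);	/* correctly treated as preallocated even though
			   the callback cleared t.flags */
	return (0);
}
```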
richardelling pushed a commit to richardelling/zfs that referenced this issue on Oct 15, 2018:
* Fixed hdr.len check in io_receiver thread
Signed-off-by: satbir <satbir.chhikara@gmail.com>
sdimitro pushed a commit to sdimitro/zfs that referenced this issue on Feb 14, 2022.
This issue was closed.
I sometimes observe a failure to add a cache or log vdev to a zpool. If I issue the same command repeatedly, it eventually succeeds. I only see this for real disks, never for loopback or scsi_debug devices. Also, if I run zpool under strace, the problem never happens, so there seems to be a subtle timing issue.
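For anyone hitting the same symptom, the "retry until it works" observation above can be scripted as a crude workaround; this is only a sketch with placeholder pool and device names, and it does not address the underlying race.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	/* Placeholder pool and device names. */
	const char *cmd = "zpool add tank cache /dev/disk/by-id/DISK-ID";
	int tries;

	for (tries = 1; tries <= 10; tries++) {
		if (system(cmd) == 0) {
			printf("succeeded on attempt %d\n", tries);
			return (0);
		}
		sleep(1);	/* give the device nodes a moment to settle */
	}

	fprintf(stderr, "giving up after %d attempts\n", tries - 1);
	return (1);
}
```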
This happens when the call zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, &zc) returns ENOENT. By instrumenting the kernel code with SPL debugging macros, I found where ENOENT is getting returned from and dumped a stack trace:
For reference, here is the implicated code. I have confirmed that scn != NULL. This is as far as I've dug into the problem so far.