zpool add intermittently fails #71
Comments
Actually, after looking more closely, I think spa_scan_get_stats() returning ENOENT is a red herring. With better instrumentation I'm now focusing on spa_validate_aux_devs() and vdev_label_init().
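As an illustration of the approach in this comment, the following toy program shows why tracing the return value at each level of a call chain matters: a routine deeper in the tree can return ENOENT without that value being the one userspace actually sees. The fake_* functions are stand-ins for the real ZFS routines, and the discarded return value is only a hypothetical way such a red herring can arise.

```c
#include <errno.h>
#include <stdio.h>

static int
fake_vdev_label_init(void)
{
	return (ENOENT);	/* pretend this is the real origin */
}

static int
fake_spa_validate_aux_devs(void)
{
	int err = fake_vdev_label_init();

	fprintf(stderr, "vdev_label_init() -> %d\n", err);
	return (err);		/* propagated unchanged to the caller */
}

static int
fake_spa_scan_get_stats(void)
{
	return (ENOENT);	/* also ENOENT, but discarded below: a red herring */
}

int
main(void)
{
	(void) fake_spa_scan_get_stats();	/* return value never reaches userspace */

	int err = fake_spa_validate_aux_devs();
	fprintf(stderr, "spa_validate_aux_devs() -> %d\n", err);
	return (err != 0);
}
```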
This is fixed by d877ac6.
dajhorn referenced this issue in zfsonlinux/pkg-zfs on Jan 6, 2012:
The taskq_t's active thread list is sorted based on its tqt_ent->tqent_id field. The list is kept sorted solely by inserting new taskq_thread_t's in their correct sorted location; no other means is used. This means that once inserted, if a taskq_thread_t's tqt_ent->tqent_id field changes, the list runs the risk of no longer being sorted.

Prior to the introduction of the taskq_dispatch_prealloc() interface, this was not a problem as a taskq_ent_t actively being serviced under the old interface should always have a static tqent_id field. Thus, once the taskq_thread_t is added to the taskq_t's active thread list, the taskq_thread_t's tqt_ent->tqent_id field would remain constant.

Now, this is no longer the case. Currently, if using the taskq_dispatch_prealloc() interface, any given taskq_ent_t actively being serviced _may_ have its tqent_id value incremented. This happens when the preallocated taskq_ent_t structure is recursively dispatched. Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id field silently modified from under its feet. If this were to happen to a taskq_thread_t on a taskq_t's active thread list, this would compromise the integrity of the order of the list (as the list _may_ no longer be sorted).

To get around this, the taskq_thread_t's taskq_ent_t pointer was replaced with its own static copy of the tqent_id. So, as a taskq_ent_t is pulled off of the taskq_t's pending list, a static copy of its tqent_id is made and this copy is used to sort the active thread list. Using a static copy is key in ensuring the integrity of the order of the active thread list. Even if the underlying taskq_ent_t is recursively dispatched (and has its tqent_id modified), this static copy stored inside the taskq_thread_t will remain constant.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #71
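To make the mechanism concrete, here is a minimal standalone C sketch of the idea in that commit message: the worker keeps its own copy of the entry's id, taken at dequeue time, and the sorted active list is keyed on that copy, so a recursive re-dispatch that bumps the entry's id cannot unsort the list. The struct and function names below are illustrative stand-ins, not the SPL's taskq code.

```c
#include <stdio.h>

struct task_ent {
	unsigned long id;		/* may change if the entry is re-dispatched */
};

struct worker {
	unsigned long active_id;	/* private copy taken at dequeue time */
	struct worker *next;		/* singly linked, kept sorted by active_id */
};

/* Insert a worker into the active list, keeping it sorted by active_id. */
static void
active_list_insert(struct worker **head, struct worker *w)
{
	struct worker **pp = head;

	while (*pp != NULL && (*pp)->active_id < w->active_id)
		pp = &(*pp)->next;

	w->next = *pp;
	*pp = w;
}

/* A worker picks up an entry: copy the id before running the task. */
static void
worker_service(struct worker *w, struct worker **active, struct task_ent *t)
{
	w->active_id = t->id;		/* static copy, immune to re-dispatch */
	active_list_insert(active, w);

	t->id += 1;			/* simulate a recursive re-dispatch bumping the id */
}

int
main(void)
{
	struct worker w1 = { 0 }, w2 = { 0 }, w3 = { 0 };
	struct worker *active = NULL;
	struct task_ent a = { 10 }, b = { 20 }, c = { 15 };

	worker_service(&w1, &active, &a);
	worker_service(&w2, &active, &b);
	worker_service(&w3, &active, &c);

	for (struct worker *w = active; w != NULL; w = w->next)
		printf("active id %lu\n", w->active_id);	/* prints 10, 15, 20 */

	return (0);
}
```

The design point is simply that the sort key lives in the worker, not in the mutable entry, so nothing the dispatcher does to the entry afterwards can reorder the list.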
dajhorn referenced this issue in zfsonlinux/pkg-zfs on Jan 6, 2012:
A preallocated taskq_ent_t's tqent_flags must be checked prior to servicing the taskq_ent_t. Once a preallocated taskq entry is serviced, the ownership of the entry is handed back to the caller of taskq_dispatch, thus the entry's contents can potentially be mangled.

In particular, this is a problem in the case where a preallocated taskq entry is serviced, and the caller clears its tqent_flags field. Thus, when the function returns and task_done is called, it looks as though the entry is **not** a preallocated task (when in fact it **is** a preallocated task). In this situation, task_done will place the preallocated taskq_ent_t structure onto the taskq_t's free list. This is a **huge** mistake. If the taskq_ent_t is then freed by the caller of taskq_dispatch, the taskq_t's free list will hold a pointer to garbage data. Even worse, if nothing has overwritten the freed memory before the pointer is dereferenced, it may still look as though it points to a valid list_head belonging to a taskq_ent_t structure.

Thus, the task entry's flags are now copied prior to servicing the task. This copy is then checked to see if it is a preallocated task, and used to determine whether the entry needs to be passed down to the task_done function.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #71
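The fix can be sketched in standalone C: snapshot the entry's flags before invoking the task function, and make the "hand it to task_done / put it on the free list" decision from the snapshot rather than from the possibly mangled entry. The names and flag value below are illustrative, not the SPL's.

```c
#include <stdio.h>

#define TASK_PREALLOC	0x1	/* illustrative flag bit, not the SPL value */

struct task_ent {
	unsigned int flags;
	void (*func)(struct task_ent *);
};

/*
 * A task function that, like a caller of the prealloc interface, takes
 * ownership of the entry back and scribbles on its flags.
 */
static void
mangle_own_flags(struct task_ent *t)
{
	t->flags = 0;
}

/* Service routine: copy the flags up front, decide from the copy afterwards. */
static void
service(struct task_ent *t)
{
	unsigned int flags = t->flags;	/* snapshot before the entry can change */

	t->func(t);			/* entry may now be mangled or even freed */

	if (flags & TASK_PREALLOC)
		printf("preallocated entry: not placed on the free list\n");
	else
		printf("internal entry: would be returned to the free list\n");
}

int
main(void)
{
	struct task_ent t = { TASK_PREALLOC, mangle_own_flags };

	service(&t);	/* correctly treated as preallocated even though
			   the callback cleared t.flags */
	return (0);
}
```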
richardelling pushed a commit to richardelling/zfs that referenced this issue on Oct 15, 2018:
* Fixed hdr.len check in io_receiver thread
Signed-off-by: satbir <satbir.chhikara@gmail.com>
sdimitro pushed a commit to sdimitro/zfs that referenced this issue on Feb 14, 2022.
This issue was closed.
I sometimes observe a failure to add a cache or log vdev to a zpool. If I issue the same command repeatedly, it eventually succeeds. I only see this for real disks, never for loopback or scsi_debug devices. Also, if I run zpool under strace, the problem never happens, so there seems to be a subtle timing issue.
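For anyone hitting the same symptom, the "retry until it works" observation above can be scripted as a crude workaround; this is only a sketch with placeholder pool and device names, and it does not address the underlying race.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	/* Placeholder pool and device names. */
	const char *cmd = "zpool add tank cache /dev/disk/by-id/DISK-ID";
	int tries;

	for (tries = 1; tries <= 10; tries++) {
		if (system(cmd) == 0) {
			printf("succeeded on attempt %d\n", tries);
			return (0);
		}
		sleep(1);	/* give the device nodes a moment to settle */
	}

	fprintf(stderr, "giving up after %d attempts\n", tries - 1);
	return (1);
}
```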
This happens when the call zfs_ioctl(hdl, ZFS_IOC_VDEV_ADD, &zc) returns ENOENT. By instrumenting the kernel code with SPL debugging macros, I found where ENOENT is getting returned from and dumped a stack trace:
For reference, here is the implicated code. I have confirmed that scn != NULL. This is as far as I've dug into the problem so far.