
Calling zio_interrupt() from vdev_disk_io_start() may cause RCU stall #3840

Closed
behlendorf opened this issue Sep 25, 2015 · 16 comments

@behlendorf
Contributor

This issue is for the remainder of the work alluded to in #3652. The vdev_disk_io_start() function needs to be updated so that it doesn't call zio_interrupt() directly in the case of an error. That work should instead be deferred to prevent the same zio from appearing on multiple taskqs concurrently.
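
For context, the error path in question looks roughly like the following (a simplified schematic, not a verbatim excerpt of vdev_disk.c):

	/*
	 * Schematic only.  vdev_disk_io_start() is already running in the
	 * context of a zio taskq thread.  Completing the zio inline via
	 * zio_interrupt() immediately re-dispatches it onto the interrupt
	 * taskq, so the same zio (and its embedded taskq_ent_t) can briefly
	 * be referenced by two taskqs.
	 */
	static void
	vdev_disk_io_start(zio_t *zio)
	{
		vdev_t *v = zio->io_vd;

		if (!vdev_readable(v)) {
			zio->io_error = SET_ERROR(ENXIO);
			zio_interrupt(zio);	/* direct call - the hazard */
			return;
		}

		/* ... normal bio submission ... */
	}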

------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x8f/0xa0() (Tainted: P
-- ------------   )
Hardware name: RHEV Hypervisor
list_add corruption. prev->next should be next (ffff88021495ca40), but was
ffff880123b05610. (prev=ffff880123b05610).
Modules linked in: zfs(P)(U) zcommon(P)(U) zunicode(P)(U) znvpair(P)(U)
zavl(P)(U) splat(U) spl(U) zlib_deflate nfsd nfs_acl auth_rpcgss exportfs
lockd sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter
ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 sg serio_raw virtio_balloon virtio_console
virtio_net i2c_piix4 i2c_core ext4 jbd2 mbcache virtio_scsi virtio_blk sr_mod
cdrom virtio_pci virtio_ring virtio pata_acpi ata_generic ata_piix dm_mirror
dm_region_hash dm_log dm_mod [last unloaded: spl]
Pid: 14249, comm: z_wr_int_6 Tainted: P           -- ------------
2.6.32-573.3.1.el6.x86_64 #1
Call Trace:
 [<ffffffff81077491>] ? warn_slowpath_common+0x91/0xe0
 [<ffffffff81077596>] ? warn_slowpath_fmt+0x46/0x60
 [<ffffffff812a38df>] ? __list_add+0x8f/0xa0
 [<ffffffffa0745ac0>] ? zio_execute+0x0/0x180 [zfs]
 [<ffffffffa021c42f>] ? taskq_dispatch_ent+0x15f/0x170 [spl]
 [<ffffffffa0745ac0>] ? zio_execute+0x0/0x180 [zfs]
 [<ffffffffa06e7a6e>] ? spa_taskq_dispatch_ent+0x7e/0xb0 [zfs]
 [<ffffffffa073f643>] ? zio_taskq_dispatch+0x73/0xa0 [zfs]
 [<ffffffffa06ff2c1>] ? vdev_disk_dio_free+0x51/0x60 [zfs]
 [<ffffffffa073f6a5>] ? zio_interrupt+0x15/0x20 [zfs]
 [<ffffffffa06ff349>] ? vdev_disk_dio_put+0x79/0x90 [zfs]      
 [<ffffffffa06ff750>] ? __vdev_disk_physio+0x3f0/0x460 [zfs]
 [<ffffffffa06ff84f>] ? vdev_disk_io_start+0x8f/0x1d0 [zfs]
 [<ffffffffa0740dcf>] ? zio_vdev_io_start+0xaf/0x300 [zfs]
 [<ffffffffa07436c5>] ? zio_nowait+0xb5/0x180 [zfs]
 [<ffffffffa06fbbeb>] ? vdev_probe+0x15b/0x240 [zfs]
 [<ffffffffa06fcdf0>] ? vdev_probe_done+0x0/0x280 [zfs]
 [<ffffffffa073fecd>] ? zio_vdev_io_done+0xdd/0x190 [zfs]
 [<ffffffffa0745b98>] ? zio_execute+0xd8/0x180 [zfs]
 [<ffffffffa021bfff>] ? taskq_thread+0x24f/0x520 [spl]
 [<ffffffff810672b0>] ? default_wake_function+0x0/0x20
 [<ffffffffa021bdb0>] ? taskq_thread+0x0/0x520 [spl]
 [<ffffffff810a101e>] ? kthread+0x9e/0xc0
 [<ffffffff8100c28a>] ? child_rip+0xa/0x20
 [<ffffffff810a0f80>] ? kthread+0x0/0xc0
 [<ffffffff8100c280>] ? child_rip+0x0/0x20
---[ end trace d8ec92eb438a10af ]---
@bprotopopov
Contributor

@behlendorf can you please provide more detail as to the situations that might lead to a zio appearing on multiple taskqs concurrently?

@behlendorf
Contributor Author

@bprotopopov my concern was specifically with the vdev_probe_done() done callback, which does something a little strange by resubmitting the vd->vdev_probe_zio to zio_nowait(). It does this in the context of one of the taskq threads, so it's critical that taskq_thread() initializes t->tqent_list before calling the task's function.

The current code does clearly do this, so perhaps this is no longer an issue. To my knowledge we haven't had any subsequent reports of it.

@bprotopopov
Contributor

bprotopopov commented Mar 1, 2017

Hi, @behlendorf

to make sure we are on the same page, I believe you are referring to

static int
taskq_thread(void *args)
{
...
                if ((t = taskq_next_ent(tq)) != NULL) {
                        list_del_init(&t->tqent_list);
...
}

Given the list_del_init() above and the subsequently dropped tq->tq_lock, it would seem that __zio_execute() (or any other tqent_func()) runs in the context of a taskq thread unencumbered, and is free to re-dispatch the entry if it wishes to do so. One could be even more defensive by adding an INIT_LIST_HEAD(&t->tqent_list) in taskq_dispatch_ent() before adding to the lists, and/or perhaps an assert() to that effect.

I have also seen this stack on my systems a few times, but it never seemed to lead to the kind of meltdown one would expect from an entry actually showing up on two linked lists. That leads me to believe this is a warning about an uninitialized data structure that is subsequently re-initialized in a benign fashion.

What do you think?

@bprotopopov
Contributor

bprotopopov commented Mar 1, 2017

@behlendorf, I still feel like something doesn't quite fit in this picture.

Looking at taskq_dispatch_ent() and the warning above, I get the impression that the warning is reporting an invalid state of the list itself (head->prev->next != head), as opposed to an invalid state of the taskq_ent being added to it. I can only think of this happening when an entry is in fact added to two lists, and the next time we try to add, we find the (first) list in this state.

So this might be something more serious, e.g. a zio is dispatched twice in a row without going through the taskq_thread() context.
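
To make the failure mode concrete, here is a small user-space sketch of my own (a minimal doubly-linked list with the same shape as the kernel's list_head; the helper names mirror the kernel's but the bodies are simplified) showing how dispatching one entry onto two queues produces exactly this warning on a later add:

	#include <stdio.h>

	struct list_head { struct list_head *next, *prev; };

	static void list_init(struct list_head *h) { h->next = h->prev = h; }

	static void list_add_tail(struct list_head *new, struct list_head *head)
	{
		/* The CONFIG_DEBUG_LIST-style check on the destination list. */
		if (head->prev->next != head)
			printf("list_add corruption. prev->next should be next "
			    "(%p), but was %p. (prev=%p).\n", (void *)head,
			    (void *)head->prev->next, (void *)head->prev);
		new->prev = head->prev;
		new->next = head;
		head->prev->next = new;
		head->prev = new;
	}

	static void list_del_init(struct list_head *e)
	{
		e->prev->next = e->next;
		e->next->prev = e->prev;
		list_init(e);
	}

	int main(void)
	{
		struct list_head pend, prio, ent, other;

		list_init(&pend); list_init(&prio);
		list_init(&ent); list_init(&other);

		list_add_tail(&ent, &pend);   /* dispatch #1: fine */
		list_add_tail(&ent, &prio);   /* dispatch #2: same entry, second
					       * queue; pend.prev is now stale */
		list_del_init(&ent);          /* taskq thread pulls it off prio */
		list_add_tail(&other, &pend); /* next add to pend fires the
					       * "prev->next should be next"
					       * warning */
		return (0);
	}

Note that the stale pointer reported is the entry itself (was == prev), which matches the ffff880123b05610 value appearing twice in the warning above.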

@behlendorf
Contributor Author

Right. That's exactly what the warning is detecting, and what should have been prevented by the lines you posted above. If you're able to reproduce this on a test system, one thing that might be worth doing is adding another ASSERT at the top of taskq_dispatch_ent() to verify that the passed taskq_ent_t is always properly initialized. That way we could catch the issue slightly sooner.

        ASSERT(taskq_empty_ent(t));

@bprotopopov
Contributor

Yes, so, the stack above is the 'victim' stack, and the damage has already been done.

The fact is that this type of ASSERT()

 ASSERT(taskq_empty_ent(&zio->io_tqent));

already seems to be present in places where zios are dispatched. Which means that a zio is likely being acted on by two threads concurrently (each thread adding to a different list), and the ASSERT()s are not catching it due to timing and lack of serialization?

Maybe, in addition to the ASSERT()s, I could add some sort of atomic reference count that is incremented before the ASSERT()/dispatch and atomically decremented/tested in dispatch after the list_add(), to make sure only one thread at a time is dispatching a given zio?
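
A rough sketch of that kind of debug guard (everything here is hypothetical, not existing code: io_dispatching is not a real zio_t member, and ideally the decrement would live inside taskq_dispatch_ent() right after list_add_tail(), still under t->tqent_lock):

	/* Hypothetical debug field added to zio_t:  atomic_t io_dispatching; */

	/* Around every dispatch of this zio: */
	VERIFY3S(atomic_inc_return(&zio->io_dispatching), ==, 1);
	ASSERT(taskq_empty_ent(&zio->io_tqent));
	taskq_dispatch_ent(tq, func, zio, TQ_SLEEP, &zio->io_tqent);
	atomic_dec(&zio->io_dispatching);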

@bprotopopov
Contributor

bprotopopov commented Mar 1, 2017

Actually, there is a locked section around adding to the list already:

...
	spin_lock(&t->tqent_lock);

	/*
	 * Mark it as a prealloc'd task.  This is important
	 * to ensure that we don't free it later.
	 */
	t->tqent_flags |= TQENT_FLAG_PREALLOC;

	/* Queue to the priority list instead of the pending list */
	if (flags & TQ_FRONT)
		list_add_tail(&t->tqent_list, &tq->tq_prio_list);
	else
		list_add_tail(&t->tqent_list, &tq->tq_pend_list);

	t->tqent_id = tq->tq_next_id;
	tq->tq_next_id++;
	t->tqent_func = func;
	t->tqent_arg = arg;
	t->tqent_taskq = tq;
	t->tqent_birth = jiffies;

	spin_unlock(&t->tqent_lock);
...

The ASSERT(taskq_empty_ent(t)) could be simply placed after spin_lock(&t->tqent_lock) and before list_add_tail().
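
In other words, something along these lines (a sketch of the proposed placement, not necessarily the exact patch that was later merged):

	spin_lock(&t->tqent_lock);

	/* Catch a second dispatch of an entry that is still linked
	 * on some queue's pending or priority list. */
	ASSERT(taskq_empty_ent(t));

	t->tqent_flags |= TQENT_FLAG_PREALLOC;

	if (flags & TQ_FRONT)
		list_add_tail(&t->tqent_list, &tq->tq_prio_list);
	else
		list_add_tail(&t->tqent_list, &tq->tq_pend_list);
	...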

@behlendorf
Contributor Author

Yes, but the problem is I'm not sure it's going to tell you anything new. The stack trace above was dumped only a few lines farther down from the list_add_tail() in this same locked section.

@bprotopopov
Contributor

bprotopopov commented Mar 1, 2017

Well, no - the previous code did not check under the lock, so even if two threads were to independently check that the entry's list head was empty, they could do:

thread1 - check entry() - OK
thread2 - check entry() - OK
thread1 - list_add() (checks the list head - fine)
thread2 - list_add() (checks the list head - fine)

and then the next time someone removes the entry from the second list (which leaves the first list's head->prev->next pointing back at prev) and then adds to the first list, we would get the warning message.

With the check under the lock, I believe we will catch the second list_add() in the act, which seems like the info we need.

@behlendorf
Contributor Author

My mistake; I was under the impression that __list_add_valid() verified that the list head being added was initialized and empty. That seems not to be the case. Given that, I'm all for adding the ASSERT as you've described.
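
For reference, the check that fired in the trace above is in the RHEL 6 kernel's lib/list_debug.c (__list_add in the trace; in newer kernels the same checks live in __list_add_valid()), and it looks roughly like this:

	void __list_add(struct list_head *new, struct list_head *prev,
	    struct list_head *next)
	{
		WARN(next->prev != prev,
		    "list_add corruption. next->prev should be prev (%p), "
		    "but was %p. (next=%p).\n", prev, next->prev, next);
		WARN(prev->next != next,
		    "list_add corruption. prev->next should be next (%p), "
		    "but was %p. (prev=%p).\n", next, prev->next, prev);
		next->prev = new;
		new->next = next;
		new->prev = prev;
		prev->next = new;
	}

It only validates the neighbors of the insertion point in the destination list; it never looks at new->next or new->prev, so an entry that is still linked somewhere else sails through until a later add trips over the stale pointers.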

@bprotopopov
Contributor

@behlendorf unfortunately, I don't have access to the system that reproduced this anymore.
Do you think it might make sense to put the ASSERT() in, just so that next time someone sees this the info is available?

@behlendorf
Contributor Author

Yup, I think that'd be reasonable.

@bprotopopov
Contributor

Hi, @behlendorf

would you prefer that I open a separate issue with a short explanation of what this is for?

@behlendorf
Contributor Author

Let's just reference this issue, which includes the full discussion above, in the commit message.

@bprotopopov
Contributor

Sounds good

behlendorf pushed a commit to openzfs/spl that referenced this issue Aug 8, 2017
taskq work item to more than one queue concurrently. Also, please
see discussion in openzfs/zfs#3840.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #609
tonyhutter pushed a commit to openzfs/spl that referenced this issue Aug 8, 2017
taskq work item to more than one queue concurrently. Also, please
see discussion in openzfs/zfs#3840.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes #609
Fabian-Gruenbichler pushed a commit to Fabian-Gruenbichler/spl that referenced this issue Sep 6, 2017
taskq work item to more than one queue concurrently. Also, please
see discussion in openzfs/zfs#3840.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Boris Protopopov <boris.protopopov@actifio.com>
Closes openzfs#609
@behlendorf
Contributor Author

Closing for now; this issue is no longer reproducible, and a debug patch was merged in case it surfaces again.
