
__taskq_destroy() hang #71

Closed
behlendorf opened this issue Dec 14, 2011 · 17 comments

@behlendorf
Contributor

The following hang was observed (rarely) after running all of xfstests, while it was unloading the zfs modules and destroying the pools. It looks like this was accidentally introduced by the recent taskq optimization in issue #65. There appears to be a case where we're waiting for a work item we think was queued but which never gets executed.

SPL: Loaded module v0.6.0, using hostid 0x007f0100
ZFS: Loaded module v0.6.0, ZFS pool version 28, ZFS filesystem version 5
INFO: task zpool:7661 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool         D 0000000000000001     0  7661  21432 0x00000080
 ffff88003d533ca8 0000000000000086 0000000000000001 0000000000000000
 ffff88003d533c38 ffffffff81062384 ffff88003d533c38 000000004d97173a
 ffff88010e965078 ffff88003d533fd8 000000000000f508 ffff88010e965078
Call Trace:
 [] ? enqueue_task_fair+0x64/0x100
 [] ? prepare_to_wait+0x4e/0x80
 [] __taskq_wait_id+0x65/0xa0 [spl]
 [] ? autoremove_wake_function+0x0/0x40
 [] __taskq_wait+0x40/0x50 [spl]
 [] __taskq_destroy+0x3c/0xf0 [spl]
 [] spa_deactivate+0x60/0x190 [zfs]
 [] spa_export_common+0xfd/0x310 [zfs]
 [] spa_destroy+0x1a/0x20 [zfs]
 [] zfs_ioc_pool_destroy+0x1e/0x40 [zfs]
 [] zfsdev_ioctl+0xfd/0x1d0 [zfs]
 [] vfs_ioctl+0x22/0xa0
 [] do_vfs_ioctl+0x84/0x580
 [] sys_ioctl+0x81/0xa0
 [] system_call_fastpath+0x16/0x1b
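For reference, the wait that's stuck in __taskq_wait_id() above boils down to blocking until the taskq's lowest outstanding id advances past the id being waited on. The following is only a simplified sketch of that idea, not the actual spl-taskq.c code; tq_lowest_id and tq_lock are real fields, the rest (including the function name) is condensed for illustration:

/*
 * Simplified sketch, not the actual __taskq_wait_id() implementation:
 * the caller blocks until tq_lowest_id has moved past the id it is
 * waiting on.  If a task was accounted as dispatched but never runs,
 * or tq_lowest_id ever moves backwards, this condition never becomes
 * true and the caller hangs exactly as in the trace above.
 */
static void
taskq_wait_id_sketch(taskq_t *tq, taskqid_t id)
{
        spin_lock(&tq->tq_lock);
        while (tq->tq_lowest_id <= id) {
                spin_unlock(&tq->tq_lock);
                /* The real code sleeps on a waitqueue here and is woken
                 * by the worker threads as tasks complete. */
                schedule();
                spin_lock(&tq->tq_lock);
        }
        spin_unlock(&tq->tq_lock);
}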
@ghost ghost assigned prakashsurya Dec 14, 2011
@prakashsurya
Member

Have you only seen this the one time we talked about?

@prakashsurya
Member

I seem to be able to reliably reproduce this issue running ZFS's zpios-sanity script on my ARCH VM. Here's another stack:

[  563.764133] ------------[ cut here ]------------
[  563.764588] kernel BUG at mm/slub.c:2969!
[  563.764966] invalid opcode: 0000 [#1] SMP 
[  563.765375] last sysfs file: /sys/devices/virtual/bdi/zfs-6/uevent
[  563.765950] CPU 3 
[  563.766157] Modules linked in: zpios loop zfs(P) zcommon(P) zunicode(P) znvpair(P) zavl(P) splat spl zlib zlib_deflate ipv6 ext2 snd_ens1370 gameport snd_rawmidi 8139cp snd_seq_device snd_pcm 8139too serio_raw mii snd_timer i2c_piix4 floppy psmouse thermal snd pcspkr i2c_core soundcore evdev snd_page_alloc processor button ext4 mbcache jbd2 crc16 sr_mod cdrom sd_mod pata_acpi uhci_hcd ata_piix libata usbcore scsi_mod [last unloaded: zpios]
[  563.770392] Pid: 5972, comm: lt-zpool Tainted: P           2.6.32.49-1-lts #1 Bochs
[  563.771104] RIP: 0010:[<ffffffff81134228>]  [<ffffffff81134228>] kfree+0x128/0x130
[  563.771823] RSP: 0018:ffff880060c83d08  EFLAGS: 00010046
[  563.772318] RAX: 0100000000000000 RBX: ffffffffa07d5986 RCX: ffff8800718fa028
[  563.772978] RDX: 000000000038c7d0 RSI: ffffea00018d76b0 RDI: ffff8800718fa020
[  563.773637] RBP: ffff880060c83d28 R08: 0000000000000001 R09: 0000000000000000
[  563.774095] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800718fa020
[  563.774095] R13: ffff8800633a3250 R14: 0000000000000006 R15: ffff88006306b901
[  563.774095] FS:  00007fcbca771b40(0000) GS:ffff880001b80000(0000) knlGS:0000000000000000
[  563.774095] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  563.774095] CR2: 00007fcbc81c60ee CR3: 0000000060c42000 CR4: 00000000000006e0
[  563.774095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  563.774095] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  563.774095] Process lt-zpool (pid: 5972, threadinfo ffff880060c82000, task ffff88006306b980)
[  563.774095] Stack:
[  563.774095]  ffff880060c83d38 ffff880075567c00 ffff880075567c68 ffff8800633a3250
[  563.774095] <0> ffff880060c83d38 ffffffffa07d5986 ffff880060c83d68 ffffffffa07d7bbb
[  563.774095] <0> ffff880060c83d78 ffff880074f2c000 0000000000000000 ffff880074f2c000
[  563.774095] Call Trace:
[  563.774095]  [<ffffffffa07d5986>] kmem_free_debug+0x16/0x20 [spl]
[  563.774095]  [<ffffffffa07d7bbb>] __taskq_destroy+0xbb/0xf0 [spl]
[  563.774095]  [<ffffffffa08f3275>] spa_deactivate+0xb5/0x210 [zfs]
[  563.774095]  [<ffffffffa08f7d8a>] spa_export_common+0x13a/0x370 [zfs]
[  563.774095]  [<ffffffffa08f801a>] spa_destroy+0x1a/0x20 [zfs]
[  563.774095]  [<ffffffffa0926e1e>] zfs_ioc_pool_destroy+0x1e/0x40 [zfs]
[  563.774095]  [<ffffffffa092b30c>] zfsdev_ioctl+0xdc/0x1b0 [zfs]
[  563.774095]  [<ffffffff81157c8a>] vfs_ioctl+0x2a/0xa0
[  563.774095]  [<ffffffff81117dd0>] ? unmap_region+0x150/0x170
[  563.774095]  [<ffffffff8115820d>] do_vfs_ioctl+0x7d/0x530
[  563.774095]  [<ffffffff81161b8f>] ? alloc_fd+0x4f/0x150
[  563.774095]  [<ffffffff81158741>] sys_ioctl+0x81/0xa0
[  563.774095]  [<ffffffff81012072>] system_call_fastpath+0x16/0x1b
[  563.774095] Code: bf 88 4c 00 4d 85 ed 0f 84 26 ff ff ff 49 8b 45 00 49 83 c5 08 4c 89 e6 48 89 df ff d0 49 8b 45 00 48 85 c0 75 eb e9 08 ff ff ff <0f> 0b eb fe 0f 1f 40 00 55 48 89 e5 0f 1f 44 00 00 48 81 ef a8 
[  563.774095] RIP  [<ffffffff81134228>] kfree+0x128/0x130
[  563.774095]  RSP <ffff880060c83d08>
[  563.774095] ---[ end trace 1758a610091151e0 ]---

And it looks to be hitting the BUG_ON here, which would indicate kfree() was handed a pointer that is neither a valid slab object nor part of a compound page, i.e. a bad or double free:

void kfree(const void *x)
{
        struct page *page;
        void *object = (void *)x;

        trace_kfree(_RET_IP_, x);

        if (unlikely(ZERO_OR_NULL_PTR(x)))
                return;

        page = virt_to_head_page(x);
        if (unlikely(!PageSlab(page))) {
                BUG_ON(!PageCompound(page));
                kmemleak_free(x);
                put_page(page);
                return;
        }
        slab_free(page->slab, page, object, _RET_IP_);
}
EXPORT_SYMBOL(kfree);

@behlendorf
Contributor Author

Just a thought, but since you're in a kmem_free() you might enable the basic SPL debugging and the memory tracking. This will help you immediately catch any memory handling mistakes. In addition, adding an ASSERT(tq->tq_flags & TQ_ACTIVE); check to the top of __taskq_destroy() might not be a bad idea. I could see something like what you're describing occur if for some reason taskq_destroy() is called twice.
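Roughly something like this at the top of __taskq_destroy() is what I mean (a sketch only; the rest of the function is elided and unchanged):

void
__taskq_destroy(taskq_t *tq)
{
        ASSERT(tq);
        /* Sketch: TQ_ACTIVE should still be set on a live taskq, so a
         * second (erroneous) taskq_destroy() of the same taskq would
         * trip this assertion immediately rather than corrupting the
         * already-freed memory further down. */
        ASSERT(tq->tq_flags & TQ_ACTIVE);

        /* ... existing teardown code continues unchanged ... */
}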

@prakashsurya
Member

Hmm, this may be revealing.. The first time I ran zpios-sanity with debug enabled on the SPL side I hit this ASSERT:

[ 2328.281520] SPLError: 1804:0:(spl-taskq.c:506:taskq_thread()) ASSERTION(tq->tq_lowest_id > id) failed
[ 2328.282387] SPLError: 1804:0:(spl-taskq.c:506:taskq_thread()) SPL PANIC
[ 2328.282996] SPL: Showing stack for process 1804
[ 2328.283427] Pid: 1804, comm: z_wr_iss/2 Tainted: P           2.6.32.49-1-lts #1
[ 2328.284098] Call Trace:
[ 2328.284429]  [<ffffffffa0473477>] spl_debug_dumpstack+0x27/0x40 [spl]
[ 2328.285030]  [<ffffffffa0474bd2>] spl_debug_bug+0x82/0xd0 [spl]
[ 2328.285582]  [<ffffffffa047fa3a>] taskq_thread+0x52a/0x850 [spl]
[ 2328.286145]  [<ffffffff81056c00>] ? default_wake_function+0x0/0x20
[ 2328.286721]  [<ffffffffa047f510>] ? taskq_thread+0x0/0x850 [spl]
[ 2328.287281]  [<ffffffff81084188>] kthread+0x88/0x90
[ 2328.287737]  [<ffffffff8104d478>] ? finish_task_switch+0x48/0xd0
[ 2328.288317]  [<ffffffff810130aa>] child_rip+0xa/0x20
[ 2328.288800]  [<ffffffff81084100>] ? kthread+0x0/0x90
[ 2328.289277]  [<ffffffff810130a0>] ? child_rip+0x0/0x20
[ 2328.289755] SPL: Dumping log to /tmp/spl-log.1323983531.1804

@prakashsurya
Member

I wonder if I goofed something by introducing this change:

@@ -481,10 +481,6 @@ taskq_thread(void *args)
                 if (pend_list) {
                         t = list_entry(pend_list->next, taskq_ent_t, tqent_list);
                         list_del_init(&t->tqent_list);
+                        /* In order to support recursively dispatching a
+                         * preallocated taskq_ent_t, tqent_id must be
+                         * stored prior to executing tqent_func. */
+                        id = t->tqent_id;
                         tqt->tqt_ent = t;
                         taskq_insert_in_order(tq, tqt);
                         tq->tq_nactive++;
@@ -497,6 +493,7 @@ taskq_thread(void *args)
                         tq->tq_nactive--;
                         list_del_init(&tqt->tqt_active_list);
                         tqt->tqt_ent = NULL;
-                        id = t->tqent_id;
                         task_done(tq, t);

                         /* When the current lowest outstanding taskqid is

@behlendorf
Contributor Author

That's a good start. It would be worthwhile to change that to an ASSERT3S so we can actually see the bogus values; perhaps the splat tests would hit this as well with debugging enabled. As for the change you mention, I don't see how it could cause this unless the tqent_func is tinkering with the tqent_id, which seems very unlikely. I'd be more suspicious of something like the taskq_insert_in_order() changes, although those look good too.
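For clarity, that's just swapping the plain assertion for the value-printing variant, along these lines:

        /* Prints both values when the comparison fails, instead of
         * just the expression text. */
        ASSERT3S(tq->tq_lowest_id, >, id);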

@prakashsurya
Member

Well, tqent_func does indeed increment tqent_id if it is using a preallocated taskq_ent_t, although I don't know if that falls into the category of "tinkering". That was the reason I needed to move it in the first place.
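To spell that out, the hazard the moved line guards against looks roughly like this (condensed from the taskq_thread() loop, locking elided; this is a sketch, not the verbatim code):

        t = list_entry(pend_list->next, taskq_ent_t, tqent_list);
        list_del_init(&t->tqent_list);

        id = t->tqent_id;               /* must be captured here ...       */

        t->tqent_func(t->tqent_arg);    /* ... because the function may
                                         * re-dispatch this same preallocated
                                         * taskq_ent_t, assigning it a new,
                                         * larger tqent_id                 */

        /* Reading t->tqent_id down here would pick up the id of the *new*
         * dispatch, so the tq_lowest_id accounting could go backwards,
         * which is exactly the ASSERTION(tq->tq_lowest_id > id) failure
         * shown above. */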

@prakashsurya
Member

Well I ran the SPL's taskq splat tests with --enable-debug{,-kmem,-kmem-tracking}, but didn't hit the ASSERT I'm seeing.

I also changed it to an ASSERT3S as you suggested and hit it again:

[ 1758.245810] SPLError: 31360:0:(spl-taskq.c:506:taskq_thread()) VERIFY3(tq->tq_lowest_id > id) failed (1785 > 1786)

@behlendorf
Contributor Author

So we're sure taskq_lowest_id() went backwards somehow. I think your idea about performing a cross check on taskq_lowest_id(), and making sure we are really getting the lowest value, is a good one. Perhaps we have an issue on the taskq_insert_in_order() side of things.

@prakashsurya
Member

Can you provide a sanity check for me on this patch:

diff --git a/module/spl/spl-taskq.c b/module/spl/spl-taskq.c
index b2b0e6c..1ac0e60 100644
--- a/module/spl/spl-taskq.c
+++ b/module/spl/spl-taskq.c
@@ -410,6 +410,9 @@ taskq_insert_in_order(taskq_t *tq, taskq_thread_t *tqt)
        taskq_thread_t *w;
        struct list_head *l;
 
+       taskq_thread_t *big = NULL;
+       taskq_thread_t *sml = NULL;
+
        SENTRY;
        ASSERT(tq);
        ASSERT(tqt);
@@ -425,6 +428,14 @@ taskq_insert_in_order(taskq_t *tq, taskq_thread_t *tqt)
        if (l == &tq->tq_active_list)
                list_add(&tqt->tqt_active_list, &tq->tq_active_list);
 
+       list_for_each_prev(l, &tq->tq_active_list) {
+               sml = big;
+               big = list_entry(l, taskq_thread_t, tqt_active_list);
+               if (sml != NULL) {
+                       ASSERT3S(big->tqt_ent->tqent_id, >, sml->tqt_ent->tqent_id);
+               }
+       }
+
        SEXIT;
 }
 

Am I traversing the list correctly (i.e. big should always have a bigger tqent_id than sml)? Or maybe I confused big and sml.. With the above patch I hit:

[  433.664553] SPLError: 24820:0:(spl-taskq.c:435:taskq_insert_in_order()) VERIFY3(big->tqt_ent->tqent_id > sml->tqt_ent->tqent_id) failed (1 > 2)
[  433.665768] SPLError: 24820:0:(spl-taskq.c:435:taskq_insert_in_order()) SPL PANIC

in the taskq:order SPL splat test.

@prakashsurya
Member

I must be confusing the order direction; I swapped the comparison operator and it passed.

@behlendorf
Contributor Author

Right, your operator was just wrong. Using list_for_each_next instead of list_for_each_prev might make it easier to read.
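That is, walk the list head-to-tail and compare each entry with the one before it, something like this sketch (the standard forward-walk macro is list_for_each(); prev below is just a local, not an SPL name):

        taskq_thread_t *w;
        taskq_thread_t *prev = NULL;
        struct list_head *l;

        /* Sketch: the active list is kept sorted by ascending tqent_id,
         * so walking forward every entry should compare >= its
         * predecessor. */
        list_for_each(l, &tq->tq_active_list) {
                w = list_entry(l, taskq_thread_t, tqt_active_list);
                if (prev != NULL)
                        ASSERT3S(w->tqt_ent->tqent_id, >=,
                            prev->tqt_ent->tqent_id);
                prev = w;
        }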

@prakashsurya
Member

Ok, so it does appear the active list isn't staying sorted as it should.. I hit this:

[   87.703653] SPLError: 1083:0:(spl-taskq.c:435:taskq_insert_in_order()) VERIFY3(big->tqt_ent->tqent_id >= sml->tqt_ent->tqent_id) failed (630 >= 873)
[   87.704932] SPLError: 1083:0:(spl-taskq.c:435:taskq_insert_in_order()) SPL PANIC

With this patch applied:

diff --git a/module/spl/spl-taskq.c b/module/spl/spl-taskq.c
index b2b0e6c..79e0c9b 100644
--- a/module/spl/spl-taskq.c
+++ b/module/spl/spl-taskq.c
@@ -410,6 +410,9 @@ taskq_insert_in_order(taskq_t *tq, taskq_thread_t *tqt)
        taskq_thread_t *w;
        struct list_head *l;
 
+       taskq_thread_t *big = NULL;
+       taskq_thread_t *sml = NULL;
+
        SENTRY;
        ASSERT(tq);
        ASSERT(tqt);
@@ -425,6 +428,14 @@ taskq_insert_in_order(taskq_t *tq, taskq_thread_t *tqt)
        if (l == &tq->tq_active_list)
                list_add(&tqt->tqt_active_list, &tq->tq_active_list);
 
+       list_for_each_prev(l, &tq->tq_active_list) {
+               big = sml;
+               sml = list_entry(l, taskq_thread_t, tqt_active_list);
+               if (big != NULL) {
+                       ASSERT3S(big->tqt_ent->tqent_id, >=, sml->tqt_ent->tqent_id);
+               }
+       }
+
        SEXIT;
 }

@prakashsurya
Member

With the taskq_thread id field changes we talked about, it got past the previous failures but got caught up here:

[  304.798362] SPLError: 5353:0:(spl-taskq.c:663:__taskq_destroy()) ASSERTION(!(t->tqent_flags & TQENT_FLAG_PREALLOC)) failed
[  304.799381] SPLError: 5353:0:(spl-taskq.c:663:__taskq_destroy()) SPL PANIC

@prakashsurya
Member

Hmm, yeah so far it seems that I may have picked up an old build artifact which caused the above assertion. I cleaned out my git repos (git clean -dxf && git reset --hard) and was able to loop through the zpios-sanity tests about 20 times until I hit this:


[ 4647.350227] SPLError: 5959:0:(zio.c:1109:zio_taskq_dispatch()) ASSERTION(taskq_empty_ent(&zio->io_tqent)) failed
[ 4647.351505] SPLError: 5959:0:(zio.c:1109:zio_taskq_dispatch()) SPL PANIC
[ 4647.352302] SPL: Showing stack for process 5959
[ 4647.352868] Pid: 5959, comm: txg_sync Tainted: P           2.6.32.49-1-lts #1
[ 4647.353709] Call Trace:
[ 4647.354013]  [<ffffffffa0309477>] spl_debug_dumpstack+0x27/0x40 [spl]
[ 4647.354769]  [<ffffffffa030abd2>] spl_debug_bug+0x82/0xd0 [spl]
[ 4647.355477]  [<ffffffffa054f3ee>] zio_taskq_dispatch+0x1ae/0x1c0 [zfs]
[ 4647.356251]  [<ffffffffa0555344>] zio_wait+0x1a4/0x3f0 [zfs]
[ 4647.356919]  [<ffffffff813960ae>] ? mutex_unlock+0xe/0x10
[ 4647.357566]  [<ffffffffa04cb72e>] dsl_pool_sync+0x14e/0x7f0 [zfs]
[ 4647.358297]  [<ffffffffa04f2ee3>] ? spa_errlog_sync+0x1f3/0x260 [zfs]
[ 4647.359090]  [<ffffffffa054fb48>] ? zio_destroy+0x138/0x270 [zfs]
[ 4647.359815]  [<ffffffffa04e74fd>] spa_sync+0x39d/0xc40 [zfs]
[ 4647.360481]  [<ffffffff81044b88>] ? __wake_up_common+0x58/0x90
[ 4647.361176]  [<ffffffffa04fb9ae>] txg_sync_thread+0x2ae/0x4f0 [zfs]
[ 4647.361939]  [<ffffffffa04fb700>] ? txg_sync_thread+0x0/0x4f0 [zfs]
[ 4647.362678]  [<ffffffffa0312d61>] thread_generic_wrapper+0x71/0xd0 [spl]
[ 4647.363462]  [<ffffffffa0312cf0>] ? thread_generic_wrapper+0x0/0xd0 [spl]
[ 4647.364256]  [<ffffffffa0312cf0>] ? thread_generic_wrapper+0x0/0xd0 [spl]
[ 4647.365049]  [<ffffffff81084188>] kthread+0x88/0x90
[ 4647.365624]  [<ffffffff8104d478>] ? finish_task_switch+0x48/0xd0
[ 4647.366340]  [<ffffffff810130aa>] child_rip+0xa/0x20
[ 4647.366925]  [<ffffffff81084100>] ? kthread+0x0/0x90
[ 4647.367508]  [<ffffffff810130a0>] ? child_rip+0x0/0x20

@prakashsurya
Member

Well, maybe I'm wrong about the TQENT_FLAG_PREALLOC assertion above being a false alarm. Although it took running zpios-sanity nearly 50 times, I hit it again with debug enabled in the SPL but not in ZFS:


[ 1108.332022] SPLError: 28618:0:(spl-taskq.c:649:__taskq_destroy()) ASSERTION(!(t->tqent_flags & TQENT_FLAG_PREALLOC)) failed
[ 1108.333125] SPLError: 28618:0:(spl-taskq.c:649:__taskq_destroy()) SPL PANIC
[ 1108.333772] SPL: Showing stack for process 28618
[ 1108.334218] Pid: 28618, comm: lt-zpool Tainted: P           2.6.32.49-1-lts #1
[ 1108.334889] Call Trace:
[ 1108.335129]  [<ffffffffa04b0477>] spl_debug_dumpstack+0x27/0x40 [spl]
[ 1108.335730]  [<ffffffffa04b1bd2>] spl_debug_bug+0x82/0xd0 [spl]
[ 1108.336283]  [<ffffffffa04ba907>] __taskq_destroy+0x257/0x4a0 [spl]
[ 1108.336879]  [<ffffffffa05cb6c2>] ? spa_config_exit+0xa2/0xe0 [zfs]
[ 1108.337474]  [<ffffffffa05bd275>] spa_deactivate+0xb5/0x210 [zfs]
[ 1108.338079]  [<ffffffffa05c1d8a>] spa_export_common+0x13a/0x370 [zfs]
[ 1108.338689]  [<ffffffffa05c201a>] spa_destroy+0x1a/0x20 [zfs]
[ 1108.339234]  [<ffffffffa05f0e1e>] zfs_ioc_pool_destroy+0x1e/0x40 [zfs]
[ 1108.339848]  [<ffffffffa05f530c>] zfsdev_ioctl+0xdc/0x1b0 [zfs]
[ 1108.340403]  [<ffffffff81157c8a>] vfs_ioctl+0x2a/0xa0
[ 1108.340876]  [<ffffffff81117dd0>] ? unmap_region+0x150/0x170
[ 1108.341405]  [<ffffffff8115820d>] do_vfs_ioctl+0x7d/0x530
[ 1108.341910]  [<ffffffff81161b8f>] ? alloc_fd+0x4f/0x150
[ 1108.342012]  [<ffffffff81158741>] sys_ioctl+0x81/0xa0
[ 1108.342012]  [<ffffffff81012072>] system_call_fastpath+0x16/0x1b
[ 1108.342012] SPL: Dumping log to /tmp/spl-log.1324063836.28618

@prakashsurya
Member

Aha! So yes, as we suspected, the flags going in are not necessarily the flags coming out. In this particular case, 0x1 (TQENT_FLAG_PREALLOC) was set prior to servicing the task, but not after it finished:

[  699.510038] SPLError: 18278:0:(spl-taskq.c:496:taskq_thread()) VERIFY3(tqt->tqt_flags == t->tqent_flags) failed (1 == 0)
[  699.512556] SPLError: 18278:0:(spl-taskq.c:496:taskq_thread()) SPL PANIC
[  699.513192] SPL: Showing stack for process 18278
[  699.513638] Pid: 18278, comm: z_null_iss/0 Tainted: P           2.6.32.49-1-lts #1
[  699.514343] Call Trace:
[  699.514585]  [<ffffffffa04b0477>] spl_debug_dumpstack+0x27/0x40 [spl]
[  699.515190]  [<ffffffffa04b1bd2>] spl_debug_bug+0x82/0xd0 [spl]
[  699.515748]  [<ffffffffa04bba2d>] taskq_thread+0x2ed/0x8a0 [spl]
[  699.516314]  [<ffffffff81056c00>] ? default_wake_function+0x0/0x20
[  699.516895]  [<ffffffffa04bb740>] ? taskq_thread+0x0/0x8a0 [spl]
[  699.517460]  [<ffffffff81084188>] kthread+0x88/0x90
[  699.517921]  [<ffffffff8104d478>] ? finish_task_switch+0x48/0xd0
[  699.518487]  [<ffffffff810130aa>] child_rip+0xa/0x20
[  699.518955]  [<ffffffff81084100>] ? kthread+0x0/0x90
[  699.519423]  [<ffffffff810130a0>] ? child_rip+0x0/0x20
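The debug check that fired is essentially this pattern, sketched below: snapshot the flags before the entry is executed and compare afterwards. A mismatch means the preallocated entry was re-dispatched, and its TQENT_FLAG_PREALLOC bit rewritten, while this thread was still servicing it. (Sketch only, not the exact spl-taskq.c code.)

        tqt->tqt_flags = t->tqent_flags;        /* snapshot before execution */

        t->tqent_func(t->tqent_arg);            /* may re-dispatch the entry */

        ASSERT3S(tqt->tqt_flags, ==, t->tqent_flags);   /* fired: (1 == 0)   */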

behlendorf pushed a commit to behlendorf/spl that referenced this issue Dec 17, 2011
The taskq_t's active thread list is sorted based on its
tqt_ent->tqent_id field. The list is kept sorted solely by inserting
new taskq_thread_t's in their correct sorted location; no other
means is used. This means that once inserted, if a taskq_thread_t's
tqt_ent->tqent_id field changes, the list runs the risk of no
longer being sorted.

Prior to the introduction of the taskq_dispatch_prealloc() interface,
this was not a problem as a taskq_ent_t actively being serviced under
the old interface should always have a static tqent_id field. Thus,
once the taskq_thread_t is added to the taskq_t's active thread list,
the taskq_thread_t's tqt_ent->tqent_id field would remain constant.

Now, this is no longer the case. Currently, if using the
taskq_dispatch_prealloc() interface, any given taskq_ent_t actively
being serviced _may_ have its tqent_id value incremented. This happens
when the preallocated taskq_ent_t structure is recursively dispatched.
Thus, a taskq_thread_t could potentially have its tqt_ent->tqent_id
field silently modified from under its feet. If this were to happen
to a taskq_thread_t on a taskq_t's active thread list, this would
compromise the integrity of the order of the list (as the list
_may_ no longer be sorted).

To get around this, the taskq_thread_t's taskq_ent_t pointer was
replaced with its own static copy of the tqent_id. So, as a taskq_ent_t
is pulled off of the taskq_t's pending list, a static copy of its
tqent_id is made and this copy is used to sort the active thread
list. Using a static copy is key in ensuring the integrity of the
order of the active thread list. Even if the underlying taskq_ent_t
is recursively dispatched (and has its tqent_id modified), this
static copy stored inside the taskq_thread_t will remain constant.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#71
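In code terms, the fix described above amounts to something like this condensed sketch (not the verbatim commit; only the relevant members are shown, and tqt_id is the field holding the static copy):

typedef struct taskq_thread {
        struct list_head        tqt_active_list;
        taskqid_t               tqt_id;         /* static copy of the id of
                                                 * the entry being serviced */
        /* ... other members elided ... */
} taskq_thread_t;

/* In taskq_thread(), when an entry is pulled off the pending list: */
        t = list_entry(pend_list->next, taskq_ent_t, tqent_list);
        list_del_init(&t->tqent_list);
        tqt->tqt_id = t->tqent_id;      /* copy the id, do not keep a pointer */
        taskq_insert_in_order(tq, tqt); /* now sorts on tqt->tqt_id, which a
                                         * recursive re-dispatch of the entry
                                         * can no longer change */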