System hangs (bad RIP value) when disk used in pool is removed (zfs-0.6.5.1) #3821
Hello,

I created a new zpool with one disk and started copying files. Then I unplugged the disk, and the system hung instead of suspending I/O. After a few tests I was able to capture this call trace:

![call_trace](https://cloud.githubusercontent.com/assets/7203173/10047534/f71f532a-620e-11e5-8b30-2708e6cf6942.jpg)

On version 0.6.4 everything works well. With the latest ZoL it is impossible to reach the suspended-I/O state because the system always hangs.
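A minimal reproduction sketch, assuming a scratch disk at /dev/sdb and a pool named tank (the sysfs delete hook stands in for physically pulling a SATA/SCSI disk):

```sh
# Create a single-disk pool and generate sustained writes
zpool create tank /dev/sdb
cp -r /usr /tank &

# Simulate hot-removal of the disk; physically unplugging it has the
# same effect
echo 1 > /sys/block/sdb/device/delete

# Expected: ZFS suspends I/O on the pool; on 0.6.5.1 the system hangs
# here instead
zpool status tank
```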
Comments

@ab-oe if you're able to reproduce this, could you try setting the module option spl_taskq_thread_dynamic=0?
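For reference, a typical way to set this SPL module parameter; taskqs are created when the module loads, so setting it at load time is the reliable route (the modprobe.d file name is just a convention):

```sh
# Persist the setting for the next time the spl module loads
echo "options spl spl_taskq_thread_dynamic=0" > /etc/modprobe.d/spl.conf

# Or pass it by hand when loading the module
modprobe spl spl_taskq_thread_dynamic=0

# Confirm what the running module picked up
cat /sys/module/spl/parameters/spl_taskq_thread_dynamic
```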
@behlendorf unfortunately, setting spl_taskq_thread_dynamic to 0 doesn't resolve this issue. The system hung just like before.
I performed a bisection and found that the issue with the missing I/O suspend was introduced in b39c22b: the z_null_int thread takes 100% of a CPU and the system is barely responsive, but it still works. I captured a call trace in that state as well.
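For anyone repeating the bisection, it looks roughly like this in a ZoL checkout (tag names assumed to match the repository's release tags):

```sh
git bisect start
git bisect bad zfs-0.6.5.1    # first release showing the hang
git bisect good zfs-0.6.4     # last known-good release
# at each step: rebuild, reload the modules, rerun the unplug test, then
git bisect good               # or: git bisect bad
git bisect reset              # when finished
```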
@ab-oe thanks for posting the debugging. This definitely looks like a duplicate of #3652, and it's clear that the z_null_int thread is getting blocked spinning on the taskq spin lock. Thanks for bisecting the change, that's helpful. Have you tried reverting b39c22b and setting spl_taskq_thread_dynamic to 0? Does that resolve the issue?
@behlendorf yes, it works with b39c22b reverted; it works even if spl_taskq_thread_dynamic is set to 1. I haven't yet found the commit that causes the immediate system hang when a disk is removed.
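Reverting the suspect commit for a test build is a one-liner in a source checkout; the modules then need to be rebuilt and reloaded:

```sh
git revert --no-edit b39c22b
./autogen.sh && ./configure && make && make install
# reload the spl/zfs modules (or reboot) and repeat the hot-unplug test
```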
Fix proposed in #3833.
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side effect of making vdev_disk_io_start() synchronous for certain I/Os, which in turn allowed vdev_disk_io_start() to re-dispatch zios, resulting in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780. This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices, where there is no benefit to queuing to aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3652 Issue #3780 Issue #3785 Issue #3817 Issue #3821 Issue #3829 Issue #3832 Issue #3870
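A quick way to see which devices the restricted READ_SYNC/WRITE_SYNC behavior applies to is the block layer's rotational flag (the device name here is an example):

```sh
# ROTA: 1 = rotational (flags not set), 0 = non-rotational/SSD (flags set)
lsblk -d -o NAME,ROTA
cat /sys/block/sda/queue/rotational   # per-device sysfs view
```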
Resolved by 5592404, which will be cherry-picked into the 0.6.5.2 release.
@behlendorf thank you. Commit 5592404 fixes this issue.