Receiving invalid backup stream hangs pool #6524

Closed · nedbass opened this issue Aug 17, 2017 · 18 comments
@nedbass (Contributor) commented Aug 17, 2017

System information

Type                    Version/Name
Distribution Name       TOSS (RHEL 7.3 derivative)
Distribution Version    toss-release-3.1-4.1rc2
Linux Kernel            3.10.0-514.26.2.1chaos.ch6_1.x86_64
Architecture            x86_64
ZFS Version             v0.7.0-31_gc8f9061
SPL Version             v0.7.0-12_g9df9692

Describe the problem you're observing

ZFS hangs when receiving an incremental backup stream.

Describe how to reproduce the problem

I can reproduce the problem with this script. It only happens if I write enough data to the dataset and if I change the dnodesize property from auto to legacy between the two snapshots.

#!/bin/bash

ZFS=./cmd/zfs/zfs
ZPOOL=./cmd/zpool/zpool
POOL=tank
DS=$POOL/fish
ROOTMTPT=/mnt/$POOL
MTPT=/mnt/$DS
NEWDS=$POOL/new
RFILE=./rfile

dd if=/dev/urandom of=$RFILE bs=262144 count=1

truncate -s 10g /tmp/a
$ZPOOL create -f -O mountpoint=$ROOTMTPT $POOL /tmp/a

mkfiles () {
        for ((i=0;i<5000;i++)) ; do
        dd if=$RFILE of=$MTPT/file-$i bs=262144 2> /dev/null
        done
}

$ZFS create -o dnsize=auto $DS
mkfiles
umount $MTPT
$ZFS snap $DS@a

$ZFS set dnsize=legacy $DS
$ZFS mount $DS
rm $MTPT/*
mkfiles
umount $MTPT
$ZFS snap $DS@b

$ZFS send $DS@a > senda
$ZFS send -i $DS@a $DS@b > sendb
$ZFS recv $NEWDS < senda
$ZFS recv $NEWDS < sendb

Include any warning/errors/backtraces from the system logs

[  138.335288] SPL: Loaded module v0.7.0-12_g9df9692
[  138.402534] SPLAT: Loaded module v0.7.0-12_g9df9692
[  140.931713] ZFS: Loaded module v0.7.0-31_gc8f9061, ZFS pool version 5000, ZFS filesystem version 5
[  961.068457] INFO: task txg_quiesce:50966 blocked for more than 120 seconds.
[  961.076907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  961.086224] txg_quiesce     D 0000000000000000     0 50966      2 0x00000080
[  961.094685]  ffff882033bf3d38 0000000000000046 ffff882033bf3fd8 ffff882005e8ce70
[  961.103556]  ffff882033bf3fd8 ffff882033bf3fd8 ffff882005e8ce70 ffff881e6e780338
[  961.112421]  ffff881e6e780270 ffff881e6e780340 ffff881e6e780298 0000000000000000
[  961.121300] Call Trace:
[  961.124590]  [<ffffffff816a2e39>] schedule+0x29/0x70
[  961.130705]  [<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[  961.138167]  [<ffffffff810b4b80>] ? wake_up_atomic_t+0x30/0x30
[  961.145237]  [<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[  961.152055]  [<ffffffffa0e8c2eb>] txg_quiesce_thread+0x2fb/0x410 [zfs]
[  961.159918]  [<ffffffffa0e8bff0>] ? txg_init+0x2b0/0x2b0 [zfs]
[  961.166981]  [<ffffffffa0534fe1>] thread_generic_wrapper+0x71/0x80 [spl]
[  961.175014]  [<ffffffffa0534f70>] ? __thread_exit+0x20/0x20 [spl]
[  961.182370]  [<ffffffff810b3a0f>] kthread+0xcf/0xe0
[  961.188364]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
[  961.196211]  [<ffffffff816ae398>] ret_from_fork+0x58/0x90
[  961.202794]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
[  961.210645] INFO: task lt-zfs:95520 blocked for more than 120 seconds.
[  961.218488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  961.227797] lt-zfs          D 0000000000000000     0 95520  50879 0x00000080
[  961.236287]  ffff880f1170b710 0000000000000082 ffff880f1170bfd8 ffff881036bd2f10
[  961.245176]  ffff880f1170bfd8 ffff880f1170bfd8 ffff881036bd2f10 ffff881ff02202f8
[  961.254074]  ffff881ff0220240 ffff881ff0220300 ffff881ff0220268 0000000000000000
[  961.262973] Call Trace:
[  961.266282]  [<ffffffff816a2e39>] schedule+0x29/0x70
[  961.272440]  [<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[  961.279938]  [<ffffffff810b4b80>] ? wake_up_atomic_t+0x30/0x30
[  961.287043]  [<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[  961.293905]  [<ffffffffa0e8b93f>] txg_wait_synced+0xef/0x140 [zfs]
[  961.301405]  [<ffffffffa0e31679>] dmu_recv_cleanup_ds+0x49/0xc0 [zfs]
[  961.309191]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
[  961.315713]  [<ffffffff810bf9a4>] ? __wake_up+0x44/0x50
[  961.322141]  [<ffffffff816a0dc5>] ? mutex_lock+0x5/0x42
[  961.328563]  [<ffffffff816a30b5>] ? _cond_resched+0x5/0x50
[  961.335274]  [<ffffffff816a0de5>] ? mutex_lock+0x25/0x42
[  961.341834]  [<ffffffffa0e31635>] ? dmu_recv_cleanup_ds+0x5/0xc0 [zfs]
[  961.349721]  [<ffffffffa0e35fae>] dmu_recv_stream+0x19e/0x1090 [zfs]
[  961.357414]  [<ffffffffa0e2f510>] ? receive_cksum+0x40/0x40 [zfs]
[  961.364824]  [<ffffffffa0ec07c9>] zfs_ioc_recv_impl+0x2d9/0x10c0 [zfs]
[  961.372719]  [<ffffffffa0ec1785>] zfs_ioc_recv+0x1d5/0x2d0 [zfs]
[  961.380007]  [<ffffffff81006200>] ? amd_pmu_addr_offset+0x30/0x60
[  961.387377]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
[  961.393881]  [<ffffffff810955f7>] ? capable+0x17/0x20
[  961.400088]  [<ffffffff816a30b5>] ? _cond_resched+0x5/0x50
[  961.406798]  [<ffffffffa0ec15b5>] ? zfs_ioc_recv+0x5/0x2d0 [zfs]
[  961.414091]  [<ffffffffa0ebed96>] zfsdev_ioctl+0x606/0x650 [zfs]
[  961.421365]  [<ffffffffa0ebe790>] ? pool_status_check.part.37+0xe0/0xe0 [zfs]
[  961.429881]  [<ffffffff8121bba5>] do_vfs_ioctl+0x305/0x510
[  961.436545]  [<ffffffff8121be51>] SyS_ioctl+0xa1/0xc0
[  961.442714]  [<ffffffff816ae449>] system_call_fastpath+0x16/0x1b

More compactly:

[root@jet22:zfs]# cat /proc/`pidof txg_sync`/stack 
[<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[<ffffffffa053a6b5>] __cv_wait_sig+0x15/0x20 [spl]
[<ffffffffa0e8c8ce>] txg_sync_thread+0x22e/0x510 [zfs]
[<ffffffffa0534fe1>] thread_generic_wrapper+0x71/0x80 [spl]
[<ffffffff810b3a0f>] kthread+0xcf/0xe0
[<ffffffff816ae398>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
[root@jet22:zfs]# cat /proc/`pidof txg_quiesce`/stack 
[<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa0e8c2eb>] txg_quiesce_thread+0x2fb/0x410 [zfs]
[<ffffffffa0534fe1>] thread_generic_wrapper+0x71/0x80 [spl]
[<ffffffff810b3a0f>] kthread+0xcf/0xe0
[<ffffffff816ae398>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
[root@jet22:zfs]# cat /proc/`pidof lt-zfs`/stack 
[<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa0e8b93f>] txg_wait_synced+0xef/0x140 [zfs]
[<ffffffffa0e31679>] dmu_recv_cleanup_ds+0x49/0xc0 [zfs]
[<ffffffffa0e35fae>] dmu_recv_stream+0x19e/0x1090 [zfs]
[<ffffffffa0ec07c9>] zfs_ioc_recv_impl+0x2d9/0x10c0 [zfs]
[<ffffffffa0ec1785>] zfs_ioc_recv+0x1d5/0x2d0 [zfs]
[<ffffffffa0ebed96>] zfsdev_ioctl+0x606/0x650 [zfs]
[<ffffffff8121bba5>] do_vfs_ioctl+0x305/0x510
[<ffffffff8121be51>] SyS_ioctl+0xa1/0xc0
[<ffffffff816ae449>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
@nedbass (Contributor, Author) commented Aug 17, 2017

Ran into this while hunting a reproducer for #6366.

@behlendorf (Contributor) commented:

By chance, did you try writing additional data to the pool to force a few more txg syncs? It looks like the txg_sync thread is waiting in txg_thread_wait()->cv_wait_sig() for a signal that somehow got missed or lost.

@nedbass (Contributor, Author) commented Aug 17, 2017

Nope, that doesn't work. Eventually the writes hang waiting for a new open txg.

Here is the tail of the debug log when the hang occurs:

1502992692   space_map.c:91:space_map_load(): object=528  offset=0  size=950
1502992692   space_map.c:91:space_map_load(): object=911  offset=0  size=7c8
1502992692   zap_micro.c:1560:zap_cursor_retrieve(): error 2
1502992692   dnode.c:551:dnode_allocate(): os=ffff882036fb7000 obj=545 txg=913 blocksize=131072 ibs=17 dn_slots=1
1502992692   dbuf.c:2340:dbuf_findbp(): error 2
1502992692   space_map.c:91:space_map_load(): object=905  offset=0  size=110
1502992692   zap.c:766:fzap_checksize(): error 22
1502992692   dnode_sync.c:121:free_blocks(): ds=          (null) obj=3d num=1
1502992692   txg.c:543:txg_sync_thread(): txg=914 quiesce_txg=915 sync_txg=917
1502992692   txg.c:661:txg_wait_synced(): broadcasting sync more tx_synced=913 waiting=917 dp=ffff882036fb2800
1502992692   dnode_sync.c:62:dnode_increase_indirection(): os=ffff882036fee800 obj=616, increase to 2
1502992692   zap.c:766:fzap_checksize(): error 22
1502992692   dnode_sync.c:121:free_blocks(): ds=          (null) obj=3d num=1
1502992692   txg.c:661:txg_wait_synced(): broadcasting sync more tx_synced=914 waiting=917 dp=ffff882036fb2800
1502992692   txg.c:594:txg_quiesce_thread(): txg=915 quiesce_txg=916 sync_txg=917

@nedbass (Contributor, Author) commented Aug 17, 2017

> It looks like the txg_sync thread is waiting in txg_thread_wait()->cv_wait_sig() for a signal

It's actually in cv_wait() here in txg_quiesce():

        /*
         * Quiesce the transaction group by waiting for everyone to txg_exit().
         */
        for (c = 0; c < max_ncpus; c++) {
                tx_cpu_t *tc = &tx->tx_cpu[c];
                mutex_enter(&tc->tc_lock);
                while (tc->tc_count[g] != 0)
                        cv_wait(&tc->tc_cv[g], &tc->tc_lock);
                mutex_exit(&tc->tc_lock);
        }
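
For context, the thing that wakes that cv_wait() is txg_rele_to_sync(), which every holder of the open txg must eventually call. Below is a paraphrased sketch of the release path (trimmed for illustration, not the verbatim txg.c source):

void
txg_rele_to_sync(txg_handle_t *th)
{
        tx_cpu_t *tc = th->th_cpu;
        int g = th->th_txg & TXG_MASK;

        mutex_enter(&tc->tc_lock);
        ASSERT3U(tc->tc_count[g], !=, 0);
        if (--tc->tc_count[g] == 0)
                cv_broadcast(&tc->tc_cv[g]);    /* wakes the cv_wait() above */
        mutex_exit(&tc->tc_lock);

        th->th_cpu = NULL;      /* defensive */
}

If any holder never gets here, tc_count[g] stays non-zero and the quiesce thread above waits forever.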

@behlendorf (Contributor) commented:

Interesting. Are there any clues as to which process didn't call txg_rele_to_sync(), or just hasn't yet, and is holding up the whole txg sync process?

@nedbass (Contributor, Author) commented Aug 17, 2017

> Interesting. Are there any clues as to which process didn't call txg_rele_to_sync(), or just hasn't yet, and is holding up the whole txg sync process?

I didn't see any obvious clues in the stack traces for all processes:

https://gist.github.com/nedbass/9a954e356b99910d6e06f01005e043b4

@nedbass (Contributor, Author) commented Aug 17, 2017

Under a debug build I hit the following ASSERT when receiving the incremental stream:

2017-08-17 15:48:05 [  214.484025] PANIC at dmu_tx.c:1157:dmu_tx_abort()
2017-08-17 15:48:05 [  214.489889] Showing stack for process 132314
2017-08-17 15:48:05 [  214.495243] CPU: 9 PID: 132314 Comm: receive_writer Tainted: P           OE  ------------   3.10.0-514.26.2.1chaos.ch6_1.x86_64 #1
2017-08-17 15:48:05 [  214.495244] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2017-08-17 15:48:05 [  214.495247]  ffffffffa0fdd3cc 00000000c83e131a ffff880fa273bb30 ffffffff8169d4bc
2017-08-17 15:48:05 [  214.495248]  ffff880fa273bb40 ffffffffa05332d4 ffff880fa273bcc8 ffffffffa05333a9
2017-08-17 15:48:05 [  214.495250]  ffff880fa273bbd8 ffffffff00000028 ffff880fa273bcd8 ffff880fa273bc78
2017-08-17 15:48:05 [  214.495250] Call Trace:
2017-08-17 15:48:05 [  214.495258]  [<ffffffff8169d4bc>] dump_stack+0x19/0x1b
2017-08-17 15:48:05 [  214.495266]  [<ffffffffa05332d4>] spl_dumpstack+0x44/0x50 [spl]
2017-08-17 15:48:05 [  214.495273]  [<ffffffffa05333a9>] spl_panic+0xc9/0x110 [spl]
2017-08-17 15:48:05 [  214.495278]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
2017-08-17 15:48:05 [  214.495279]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
2017-08-17 15:48:05 [  214.495282]  [<ffffffffa052ddba>] ? spl_kmem_free+0x2a/0x40 [spl]
2017-08-17 15:48:05 [  214.495284]  [<ffffffffa052ddba>] ? spl_kmem_free+0x2a/0x40 [spl]
2017-08-17 15:48:05 [  214.495322]  [<ffffffffa0e49ef5>] dmu_tx_abort+0x65/0x70 [zfs]
2017-08-17 15:48:05 [  214.495336]  [<ffffffffa0e41b8d>] receive_object.isra.10+0x31d/0x6f0 [zfs]
2017-08-17 15:48:05 [  214.495349]  [<ffffffffa0e42198>] receive_process_record+0x118/0xb50 [zfs]
2017-08-17 15:48:05 [  214.495351]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
2017-08-17 15:48:05 [  214.495364]  [<ffffffffa0e42c70>] receive_writer_thread+0xa0/0x230 [zfs]
2017-08-17 15:48:05 [  214.495376]  [<ffffffffa0e42bd0>] ? receive_process_record+0xb50/0xb50 [zfs]
2017-08-17 15:48:05 [  214.495379]  [<ffffffffa052ffe1>] thread_generic_wrapper+0x71/0x80 [spl]
2017-08-17 15:48:05 [  214.495381]  [<ffffffffa052ff70>] ? __thread_exit+0x20/0x20 [spl]
2017-08-17 15:48:05 [  214.495384]  [<ffffffff810b3a0f>] kthread+0xcf/0xe0
2017-08-17 15:48:05 [  214.495387]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
2017-08-17 15:48:05 [  214.495388]  [<ffffffff816ae398>] ret_from_fork+0x58/0x90
2017-08-17 15:48:05 [  214.495390]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
2017-08-17 15:48:05 Aug 17 15:48:06 jet22 kernel: VERIFY(tx->tx_txg == 0) failed
2017-08-17 15:48:05 Aug 17 15:48:06 jet22 kernel: PANIC at dmu_tx.c:1157:dmu_tx_abort()
2017-08-17 15:48:05 Aug 17 15:48:05 jet22 kernel: [  214.478771] VERIFY(tx->tx_txg == 0) failed
2017-08-17 15:48:05 Aug 17 15:48:05 jet22 kernel: [  214.484025] PANIC at dmu_tx.c:1157:dmu_tx_abort()

@nedbass (Contributor, Author) commented Aug 17, 2017

I changed it to an ASSERT0 so we could see the actual txg value:

2017-08-17 15:57:52 Aug 17 15:57:53 jet22 kernel: VERIFY3(0 == tx->tx_txg) failed (0 == 119)
2017-08-17 15:57:52 Aug 17 15:57:53 jet22 kernel: PANIC at dmu_tx.c:1157:dmu_tx_abort()
2017-08-17 15:57:52 Aug 17 15:57:52 jet22 kernel: [  213.145403] VERIFY3(0 == tx->tx_txg) failed (0 == 119)
2017-08-17 15:57:52 Aug 17 15:57:52 jet22 kernel: [  213.151762] PANIC at dmu_tx.c:1157:dmu_tx_abort()
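
For reference, the debugging change was of this shape (a sketch against dmu_tx_abort() in dmu_tx.c; the rest of the function is elided):

-	ASSERT(tx->tx_txg == 0);
+	ASSERT0(tx->tx_txg);	/* VERIFY3-style, so the offending value gets printed */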

@behlendorf (Contributor) commented:

It looks like that dmu_tx_abort() call was changed from dmu_tx_commit() by the recent encryption patch. That's not OK: once a tx is assigned, it can't be aborted and must be committed. We'll need to change that back. Let's also check with @tcaputi what the reasoning was here.
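
To spell out the rule with a minimal sketch (the dmu_tx_* calls are the real DMU API; the surrounding fragment and do_work_under_tx() are illustrative placeholders, with os and object assumed in scope):

        dmu_tx_t *tx = dmu_tx_create(os);
        dmu_tx_hold_bonus(tx, object);

        int error = dmu_tx_assign(tx, TXG_WAIT);
        if (error != 0) {
                /* Not yet assigned (tx->tx_txg == 0), so abort is legal. */
                dmu_tx_abort(tx);
                return (error);
        }

        error = do_work_under_tx(tx);   /* placeholder for the real work */

        /*
         * Once assigned, the tx holds the open txg and must be committed
         * even if the work failed; aborting it instead trips the VERIFY in
         * a debug build and otherwise leaks the txg hold, which is the
         * hang reported above.
         */
        dmu_tx_commit(tx);
        return (error);

That is the shape of receive_object() here: the tx in question had already been assigned (tx_txg == 119 above), so its error path needs dmu_tx_commit(), not dmu_tx_abort().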

@nedbass (Contributor, Author) commented Aug 17, 2017

I confirmed that the hang doesn't happen on the commit prior to the encryption patch.

@nedbass (Contributor, Author) commented Aug 17, 2017

It turns out that the test script here is a reproducer for #6366, and the hang happens during the cleanup of the failed receive operation.

@behlendorf (Contributor) commented:

Does this fix the issue as expected?

diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index aca5019..1f4410d 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -2492,7 +2492,7 @@ receive_object(struct receive_writer_arg *rwa, struct drr_
                    drro->drr_bonustype, drro->drr_bonuslen, tx);
        }
        if (err != 0) {
-               dmu_tx_abort(tx);
+               dmu_tx_commit(tx);
                return (SET_ERROR(EINVAL));
        }
 

nedbass changed the title from "Incremental receive hangs" to "Receiving invalid backup stream hangs pool" on Aug 18, 2017
@nedbass (Contributor, Author) commented Aug 18, 2017

> Does this fix the issue as expected?

Yes.

@tcaputi (Contributor) commented Aug 18, 2017

Sorry, it looks like that was a typo on my part. I believe I mistook it for the error handling for dmu_tx_assign(). Thanks for finding this.

@spacelama commented:

Speaking of dmu_tx_assign(), any chance #6317 is related? That hang seems to be kicked along by forcing extra txg syncs, and one of its stack traces involves dmu_tx_assign().

@tcaputi (Contributor) commented Aug 21, 2017

@spacelama Probably not; this bug came in with the encryption patch.
@behlendorf I'm preparing a follow-up to the encryption patch that will fix this issue along with a few other bugs found so far. I should have the PR submitted today or tomorrow.

@behlendorf (Contributor) commented:

@tcaputi I also saw some recent discussion in the OpenZFS Encryption PR regarding a crash when trying to zfs send/receive from an unencrypted to an encrypted dataset. Have we seen this issue in ZoL?

@tcaputi (Contributor) commented Aug 21, 2017

Yes, that's one thing I'm fixing as well. That one should be relatively trivial: it only happens when doing a --raw send of an unencrypted dataset, so it seems like there is just a missing check somewhere.

FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 28, 2019
This patch fixes several issues discovered after
the encryption patch was merged:

* Fixed a bug where encrypted datasets could attempt
  to receive embedded data records.

* Fixed a bug where dirty records created by the recv
  code weren't properly setting the dr_raw flag.

* Fixed a typo where a dmu_tx_commit() was changed to
  dmu_tx_abort().

* Fixed a few error handling bugs unrelated to the
  encryption patch in dmu_recv_stream()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#6512 
Closes openzfs#6524 
Closes openzfs#6545