Receiving invalid backup stream hangs pool #6524

Closed · nedbass opened this issue Aug 17, 2017 · 18 comments
@nedbass (Contributor) commented Aug 17, 2017

System information

Type                    Version/Name
Distribution Name       TOSS (RHEL 7.3 derivative)
Distribution Version    toss-release-3.1-4.1rc2
Linux Kernel            3.10.0-514.26.2.1chaos.ch6_1.x86_64
Architecture            x86_64
ZFS Version             v0.7.0-31_gc8f9061
SPL Version             v0.7.0-12_g9df9692

Describe the problem you're observing

ZFS hangs when receiving an incremental backup stream.

Describe how to reproduce the problem

I can reproduce the problem with this script. It only happens if I write enough data to the dataset and if I change the dnodesize property from auto to legacy between the two snapshots.

#!/bin/bash

ZFS=./cmd/zfs/zfs
ZPOOL=./cmd/zpool/zpool
POOL=tank
DS=$POOL/fish
ROOTMTPT=/mnt/$POOL
MTPT=/mnt/$DS
NEWDS=$POOL/new
RFILE=./rfile

dd if=/dev/urandom of=$RFILE bs=262144 count=1

truncate -s 10g /tmp/a
$ZPOOL create -f -O mountpoint=$ROOTMTPT $POOL /tmp/a

mkfiles () {
        for ((i=0;i<5000;i++)) ; do
        dd if=$RFILE of=$MTPT/file-$i bs=262144 2> /dev/null
        done
}

$ZFS create -o dnsize=auto $DS
mkfiles
umount $MTPT
$ZFS snap $DS@a

$ZFS set dnsize=legacy $DS
$ZFS mount $DS
rm $MTPT/*
mkfiles
umount $MTPT
$ZFS snap $DS@b

$ZFS send $DS@a > senda
$ZFS send -i $DS@a $DS@b > sendb
$ZFS recv $NEWDS < senda
$ZFS recv $NEWDS < sendb

Include any warning/errors/backtraces from the system logs

[  138.335288] SPL: Loaded module v0.7.0-12_g9df9692
[  138.402534] SPLAT: Loaded module v0.7.0-12_g9df9692
[  140.931713] ZFS: Loaded module v0.7.0-31_gc8f9061, ZFS pool version 5000, ZFS filesystem version 5
[  961.068457] INFO: task txg_quiesce:50966 blocked for more than 120 seconds.
[  961.076907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  961.086224] txg_quiesce     D 0000000000000000     0 50966      2 0x00000080
[  961.094685]  ffff882033bf3d38 0000000000000046 ffff882033bf3fd8 ffff882005e8ce70
[  961.103556]  ffff882033bf3fd8 ffff882033bf3fd8 ffff882005e8ce70 ffff881e6e780338
[  961.112421]  ffff881e6e780270 ffff881e6e780340 ffff881e6e780298 0000000000000000
[  961.121300] Call Trace:
[  961.124590]  [<ffffffff816a2e39>] schedule+0x29/0x70
[  961.130705]  [<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[  961.138167]  [<ffffffff810b4b80>] ? wake_up_atomic_t+0x30/0x30
[  961.145237]  [<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[  961.152055]  [<ffffffffa0e8c2eb>] txg_quiesce_thread+0x2fb/0x410 [zfs]
[  961.159918]  [<ffffffffa0e8bff0>] ? txg_init+0x2b0/0x2b0 [zfs]
[  961.166981]  [<ffffffffa0534fe1>] thread_generic_wrapper+0x71/0x80 [spl]
[  961.175014]  [<ffffffffa0534f70>] ? __thread_exit+0x20/0x20 [spl]
[  961.182370]  [<ffffffff810b3a0f>] kthread+0xcf/0xe0
[  961.188364]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
[  961.196211]  [<ffffffff816ae398>] ret_from_fork+0x58/0x90
[  961.202794]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
[  961.210645] INFO: task lt-zfs:95520 blocked for more than 120 seconds.
[  961.218488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  961.227797] lt-zfs          D 0000000000000000     0 95520  50879 0x00000080
[  961.236287]  ffff880f1170b710 0000000000000082 ffff880f1170bfd8 ffff881036bd2f10
[  961.245176]  ffff880f1170bfd8 ffff880f1170bfd8 ffff881036bd2f10 ffff881ff02202f8
[  961.254074]  ffff881ff0220240 ffff881ff0220300 ffff881ff0220268 0000000000000000
[  961.262973] Call Trace:
[  961.266282]  [<ffffffff816a2e39>] schedule+0x29/0x70
[  961.272440]  [<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[  961.279938]  [<ffffffff810b4b80>] ? wake_up_atomic_t+0x30/0x30
[  961.287043]  [<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[  961.293905]  [<ffffffffa0e8b93f>] txg_wait_synced+0xef/0x140 [zfs]
[  961.301405]  [<ffffffffa0e31679>] dmu_recv_cleanup_ds+0x49/0xc0 [zfs]
[  961.309191]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
[  961.315713]  [<ffffffff810bf9a4>] ? __wake_up+0x44/0x50
[  961.322141]  [<ffffffff816a0dc5>] ? mutex_lock+0x5/0x42
[  961.328563]  [<ffffffff816a30b5>] ? _cond_resched+0x5/0x50
[  961.335274]  [<ffffffff816a0de5>] ? mutex_lock+0x25/0x42
[  961.341834]  [<ffffffffa0e31635>] ? dmu_recv_cleanup_ds+0x5/0xc0 [zfs]
[  961.349721]  [<ffffffffa0e35fae>] dmu_recv_stream+0x19e/0x1090 [zfs]
[  961.357414]  [<ffffffffa0e2f510>] ? receive_cksum+0x40/0x40 [zfs]
[  961.364824]  [<ffffffffa0ec07c9>] zfs_ioc_recv_impl+0x2d9/0x10c0 [zfs]
[  961.372719]  [<ffffffffa0ec1785>] zfs_ioc_recv+0x1d5/0x2d0 [zfs]
[  961.380007]  [<ffffffff81006200>] ? amd_pmu_addr_offset+0x30/0x60
[  961.387377]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
[  961.393881]  [<ffffffff810955f7>] ? capable+0x17/0x20
[  961.400088]  [<ffffffff816a30b5>] ? _cond_resched+0x5/0x50
[  961.406798]  [<ffffffffa0ec15b5>] ? zfs_ioc_recv+0x5/0x2d0 [zfs]
[  961.414091]  [<ffffffffa0ebed96>] zfsdev_ioctl+0x606/0x650 [zfs]
[  961.421365]  [<ffffffffa0ebe790>] ? pool_status_check.part.37+0xe0/0xe0 [zfs]
[  961.429881]  [<ffffffff8121bba5>] do_vfs_ioctl+0x305/0x510
[  961.436545]  [<ffffffff8121be51>] SyS_ioctl+0xa1/0xc0
[  961.442714]  [<ffffffff816ae449>] system_call_fastpath+0x16/0x1b

More compactly:

[root@jet22:zfs]# cat /proc/`pidof txg_sync`/stack 
[<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[<ffffffffa053a6b5>] __cv_wait_sig+0x15/0x20 [spl]
[<ffffffffa0e8c8ce>] txg_sync_thread+0x22e/0x510 [zfs]
[<ffffffffa0534fe1>] thread_generic_wrapper+0x71/0x80 [spl]
[<ffffffff810b3a0f>] kthread+0xcf/0xe0
[<ffffffff816ae398>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
[root@jet22:zfs]# cat /proc/`pidof txg_quiesce`/stack 
[<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa0e8c2eb>] txg_quiesce_thread+0x2fb/0x410 [zfs]
[<ffffffffa0534fe1>] thread_generic_wrapper+0x71/0x80 [spl]
[<ffffffff810b3a0f>] kthread+0xcf/0xe0
[<ffffffff816ae398>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff
[root@jet22:zfs]# cat /proc/`pidof lt-zfs`/stack 
[<ffffffffa053a655>] cv_wait_common+0x125/0x150 [spl]
[<ffffffffa053a695>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa0e8b93f>] txg_wait_synced+0xef/0x140 [zfs]
[<ffffffffa0e31679>] dmu_recv_cleanup_ds+0x49/0xc0 [zfs]
[<ffffffffa0e35fae>] dmu_recv_stream+0x19e/0x1090 [zfs]
[<ffffffffa0ec07c9>] zfs_ioc_recv_impl+0x2d9/0x10c0 [zfs]
[<ffffffffa0ec1785>] zfs_ioc_recv+0x1d5/0x2d0 [zfs]
[<ffffffffa0ebed96>] zfsdev_ioctl+0x606/0x650 [zfs]
[<ffffffff8121bba5>] do_vfs_ioctl+0x305/0x510
[<ffffffff8121be51>] SyS_ioctl+0xa1/0xc0
[<ffffffff816ae449>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
@nedbass (Contributor, Author) commented Aug 17, 2017

Ran into this while hunting a reproducer for #6366.

@behlendorf (Contributor) commented:

By chance, did you try writing additional data to the pool to force a few more txg syncs? It looks like the txg_sync thread is waiting in txg_thread_wait()->cv_wait_sig() for a signal that somehow got missed or lost.

@nedbass (Contributor, Author) commented Aug 17, 2017

Nope, that doesn't work. Eventually the writes hang waiting for a new open txg.

Here is the tail of the debug log when the hang occurs:

1502992692   space_map.c:91:space_map_load(): object=528  offset=0  size=950
1502992692   space_map.c:91:space_map_load(): object=911  offset=0  size=7c8
1502992692   zap_micro.c:1560:zap_cursor_retrieve(): error 2
1502992692   dnode.c:551:dnode_allocate(): os=ffff882036fb7000 obj=545 txg=913 blocksize=131072 ibs=17 dn_slots=1
1502992692   dbuf.c:2340:dbuf_findbp(): error 2
1502992692   space_map.c:91:space_map_load(): object=905  offset=0  size=110
1502992692   zap.c:766:fzap_checksize(): error 22
1502992692   dnode_sync.c:121:free_blocks(): ds=          (null) obj=3d num=1
1502992692   txg.c:543:txg_sync_thread(): txg=914 quiesce_txg=915 sync_txg=917
1502992692   txg.c:661:txg_wait_synced(): broadcasting sync more tx_synced=913 waiting=917 dp=ffff882036fb2800
1502992692   dnode_sync.c:62:dnode_increase_indirection(): os=ffff882036fee800 obj=616, increase to 2
1502992692   zap.c:766:fzap_checksize(): error 22
1502992692   dnode_sync.c:121:free_blocks(): ds=          (null) obj=3d num=1
1502992692   txg.c:661:txg_wait_synced(): broadcasting sync more tx_synced=914 waiting=917 dp=ffff882036fb2800
1502992692   txg.c:594:txg_quiesce_thread(): txg=915 quiesce_txg=916 sync_txg=917

@nedbass (Contributor, Author) commented Aug 17, 2017

> It looks like the txg_sync thread is waiting in txg_thread_wait()->cv_wait_sig() for a signal

It's actually in cv_wait() here in txg_quiesce():

        /*
         * Quiesce the transaction group by waiting for everyone to txg_exit().
         */
        for (c = 0; c < max_ncpus; c++) {
                tx_cpu_t *tc = &tx->tx_cpu[c];
                mutex_enter(&tc->tc_lock);
                while (tc->tc_count[g] != 0)
                        cv_wait(&tc->tc_cv[g], &tc->tc_lock);
                mutex_exit(&tc->tc_lock);
        }
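
For context, the thing that wakes that cv_wait() is txg_rele_to_sync(), which every holder of the open txg must eventually call. Below is a paraphrased sketch of the release path (trimmed for illustration, not the verbatim txg.c source):

void
txg_rele_to_sync(txg_handle_t *th)
{
        tx_cpu_t *tc = th->th_cpu;
        int g = th->th_txg & TXG_MASK;

        mutex_enter(&tc->tc_lock);
        ASSERT3U(tc->tc_count[g], !=, 0);
        if (--tc->tc_count[g] == 0)
                cv_broadcast(&tc->tc_cv[g]);    /* wakes the cv_wait() above */
        mutex_exit(&tc->tc_lock);

        th->th_cpu = NULL;      /* defensive */
}

If any holder never gets here, tc_count[g] stays non-zero and the quiesce thread above waits forever.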

@behlendorf (Contributor) commented:

Interesting. Are there any clues as to which process didn't call txg_rele_to_sync(), or just hasn't yet, and is holding up the whole txg sync process?

@nedbass (Contributor, Author) commented Aug 17, 2017

> Interesting. Are there any clues as to which process didn't call txg_rele_to_sync(), or just hasn't yet, and is holding up the whole txg sync process?

I didn't see any obvious clues in the stack traces for all processes:

https://gist.github.com/nedbass/9a954e356b99910d6e06f01005e043b4

@nedbass (Contributor, Author) commented Aug 17, 2017

Under a debug build I hit the following ASSERT when receiving the incremental stream:

2017-08-17 15:48:05 [  214.484025] PANIC at dmu_tx.c:1157:dmu_tx_abort()
2017-08-17 15:48:05 [  214.489889] Showing stack for process 132314
2017-08-17 15:48:05 [  214.495243] CPU: 9 PID: 132314 Comm: receive_writer Tainted: P           OE  ------------   3.10.0-514.26.2.1chaos.ch6_1.x86_64 #1
2017-08-17 15:48:05 [  214.495244] Hardware name: Intel Corporation S2600WTTR/S2600WTTR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2017-08-17 15:48:05 [  214.495247]  ffffffffa0fdd3cc 00000000c83e131a ffff880fa273bb30 ffffffff8169d4bc
2017-08-17 15:48:05 [  214.495248]  ffff880fa273bb40 ffffffffa05332d4 ffff880fa273bcc8 ffffffffa05333a9
2017-08-17 15:48:05 [  214.495250]  ffff880fa273bbd8 ffffffff00000028 ffff880fa273bcd8 ffff880fa273bc78
2017-08-17 15:48:05 [  214.495250] Call Trace:
2017-08-17 15:48:05 [  214.495258]  [<ffffffff8169d4bc>] dump_stack+0x19/0x1b
2017-08-17 15:48:05 [  214.495266]  [<ffffffffa05332d4>] spl_dumpstack+0x44/0x50 [spl]
2017-08-17 15:48:05 [  214.495273]  [<ffffffffa05333a9>] spl_panic+0xc9/0x110 [spl]
2017-08-17 15:48:05 [  214.495278]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
2017-08-17 15:48:05 [  214.495279]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
2017-08-17 15:48:05 [  214.495282]  [<ffffffffa052ddba>] ? spl_kmem_free+0x2a/0x40 [spl]
2017-08-17 15:48:05 [  214.495284]  [<ffffffffa052ddba>] ? spl_kmem_free+0x2a/0x40 [spl]
2017-08-17 15:48:05 [  214.495322]  [<ffffffffa0e49ef5>] dmu_tx_abort+0x65/0x70 [zfs]
2017-08-17 15:48:05 [  214.495336]  [<ffffffffa0e41b8d>] receive_object.isra.10+0x31d/0x6f0 [zfs]
2017-08-17 15:48:05 [  214.495349]  [<ffffffffa0e42198>] receive_process_record+0x118/0xb50 [zfs]
2017-08-17 15:48:05 [  214.495351]  [<ffffffff816b0171>] ? ftrace_call+0x5/0x2f
2017-08-17 15:48:05 [  214.495364]  [<ffffffffa0e42c70>] receive_writer_thread+0xa0/0x230 [zfs]
2017-08-17 15:48:05 [  214.495376]  [<ffffffffa0e42bd0>] ? receive_process_record+0xb50/0xb50 [zfs]
2017-08-17 15:48:05 [  214.495379]  [<ffffffffa052ffe1>] thread_generic_wrapper+0x71/0x80 [spl]
2017-08-17 15:48:05 [  214.495381]  [<ffffffffa052ff70>] ? __thread_exit+0x20/0x20 [spl]
2017-08-17 15:48:05 [  214.495384]  [<ffffffff810b3a0f>] kthread+0xcf/0xe0
2017-08-17 15:48:05 [  214.495387]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
2017-08-17 15:48:05 [  214.495388]  [<ffffffff816ae398>] ret_from_fork+0x58/0x90
2017-08-17 15:48:05 [  214.495390]  [<ffffffff810b3940>] ? kthread_create_on_node+0x140/0x140
2017-08-17 15:48:05 Aug 17 15:48:06 jet22 kernel: VERIFY(tx->tx_txg == 0) failed
2017-08-17 15:48:05 Aug 17 15:48:06 jet22 kernel: PANIC at dmu_tx.c:1157:dmu_tx_abort()
2017-08-17 15:48:05 Aug 17 15:48:05 jet22 kernel: [  214.478771] VERIFY(tx->tx_txg == 0) failed
2017-08-17 15:48:05 Aug 17 15:48:05 jet22 kernel: [  214.484025] PANIC at dmu_tx.c:1157:dmu_tx_abort()

@nedbass (Contributor, Author) commented Aug 17, 2017

I changed it to an ASSERT0 so we could see the actual txg value:

2017-08-17 15:57:52 Aug 17 15:57:53 jet22 kernel: VERIFY3(0 == tx->tx_txg) failed (0 == 119)
2017-08-17 15:57:52 Aug 17 15:57:53 jet22 kernel: PANIC at dmu_tx.c:1157:dmu_tx_abort()
2017-08-17 15:57:52 Aug 17 15:57:52 jet22 kernel: [  213.145403] VERIFY3(0 == tx->tx_txg) failed (0 == 119)
2017-08-17 15:57:52 Aug 17 15:57:52 jet22 kernel: [  213.151762] PANIC at dmu_tx.c:1157:dmu_tx_abort()
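
For reference, the debugging change was of this shape (a sketch against dmu_tx_abort() in dmu_tx.c; the rest of the function is elided):

-	ASSERT(tx->tx_txg == 0);
+	ASSERT0(tx->tx_txg);	/* VERIFY3-style, so the offending value gets printed */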

@behlendorf (Contributor) commented:

It looks like that dmu_tx_abort() call was changed from dmu_tx_commit() by the recent encryption patch. That's not OK: once a tx is assigned, it can't be aborted and must be committed. We'll need to change that back. Let's also check with @tcaputi what the reasoning was here.
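
To spell out the rule with a minimal sketch (the dmu_tx_* calls are the real DMU API; the surrounding fragment and do_work_under_tx() are illustrative placeholders, with os and object assumed in scope):

        dmu_tx_t *tx = dmu_tx_create(os);
        dmu_tx_hold_bonus(tx, object);

        int error = dmu_tx_assign(tx, TXG_WAIT);
        if (error != 0) {
                /* Not yet assigned (tx->tx_txg == 0), so abort is legal. */
                dmu_tx_abort(tx);
                return (error);
        }

        error = do_work_under_tx(tx);   /* placeholder for the real work */

        /*
         * Once assigned, the tx holds the open txg and must be committed
         * even if the work failed; aborting it instead trips the VERIFY in
         * a debug build and otherwise leaks the txg hold, which is the
         * hang reported above.
         */
        dmu_tx_commit(tx);
        return (error);

That is the shape of receive_object() here: the tx in question had already been assigned (tx_txg == 119 above), so its error path needs dmu_tx_commit(), not dmu_tx_abort().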

@nedbass (Contributor, Author) commented Aug 17, 2017

I confirmed that the hang doesn't happen on the commit prior to the encryption patch.

@nedbass (Contributor, Author) commented Aug 17, 2017

It turns out that the test script here is a reproducer for #6366, and the hang happens during the cleanup of the failed receive operation.

@behlendorf (Contributor) commented:

Does this fix the issue as expected?

diff --git a/module/zfs/dmu_send.c b/module/zfs/dmu_send.c
index aca5019..1f4410d 100644
--- a/module/zfs/dmu_send.c
+++ b/module/zfs/dmu_send.c
@@ -2492,7 +2492,7 @@ receive_object(struct receive_writer_arg *rwa, struct drr_
                    drro->drr_bonustype, drro->drr_bonuslen, tx);
        }
        if (err != 0) {
-               dmu_tx_abort(tx);
+               dmu_tx_commit(tx);
                return (SET_ERROR(EINVAL));
        }
 

nedbass changed the title from "Incremental receive hangs" to "Receiving invalid backup stream hangs pool" on Aug 18, 2017
@nedbass (Contributor, Author) commented Aug 18, 2017

> Does this fix the issue as expected?

Yes.

@tcaputi (Contributor) commented Aug 18, 2017

Sorry, it looks like that was a typo on my part. I believe I mistook it for the error handling for dmu_tx_assign(). Thanks for finding this.

@spacelama commented:

Speaking of dmu_tx_assign(), any chance #6317 is related? That hang seems to be kicked along by forcing extra txg syncs, and one of its stack traces involves dmu_tx_assign().

@tcaputi (Contributor) commented Aug 21, 2017

@spacelama Probably not; this bug came in with the encryption patch.
@behlendorf I'm preparing a follow-up to the encryption patch that will fix this issue along with a few other bugs found so far. I should have the PR submitted today or tomorrow.

@behlendorf (Contributor) commented:

@tcaputi I also saw some recent discussion in the OpenZFS Encryption PR regarding a crash when trying to zfs send/receive from an unencrypted to an encrypted dataset. Have we seen this issue in ZoL?

@tcaputi (Contributor) commented Aug 21, 2017

Yes, that's one thing I'm fixing as well. That one should be relatively trivial: it only happens when doing a --raw send of an unencrypted dataset, so it seems like there is just a missing check somewhere.

FransUrbo pushed a commit to FransUrbo/zfs that referenced this issue Apr 28, 2019
This patch fixes several issues discovered after
the encryption patch was merged:

* Fixed a bug where encrypted datasets could attempt
  to receive embedded data records.

* Fixed a bug where dirty records created by the recv
  code weren't properly setting the dr_raw flag.

* Fixed a typo where a dmu_tx_commit() was changed to
  dmu_tx_abort().

* Fixed a few error handling bugs unrelated to the
  encryption patch in dmu_recv_stream()

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#6512 
Closes openzfs#6524 
Closes openzfs#6545