
Send stuck in D state #3655

Closed
snajpa opened this issue Aug 1, 2015 · 5 comments

Comments

@snajpa (Contributor) commented Aug 1, 2015

We're running the version at https://github.com/vpsfreecz/zfs (roughly current master).

We switched from rsync backups to send/recv just today, and if this keeps happening we'll be without backups :(

Any ideas for a quick workaround, please?

[root@node9.prg.vpsfree.cz]
 ~ # ps aux | grep zfs | grep -v "\["
root      487100  0.0  0.0 127492  1388 pts/0    D+   22:16   0:00 zfs send vz/private/802 2015-08-01T18:51:35
root      487132  0.0  0.0 106100  1140 ?        S    22:16   0:00 sh -c exec zfs send vz/private/802@2015-08-01T18:51:35 | nc 172.16.0.5 10000 2>&1
root      487133  0.0  0.0 127492  1384 ?        D    22:16   0:00 zfs send vz/private/802 2015-08-01T18:51:35
root      554088  0.0  0.0 103260   864 pts/3    S+   22:24   0:00 grep zfs
[root@node9.prg.vpsfree.cz]
 ~ # cat /proc/487100/stack
[<ffffffffa0251935>] taskq_wait_id+0x65/0xa0 [spl]
[<ffffffffa03042dd>] spa_taskq_dispatch_sync+0x8d/0xc0 [zfs]
[<ffffffffa02cb232>] dump_bytes+0x42/0x50 [zfs]
[<ffffffffa02cb66a>] dump_write+0x20a/0x240 [zfs]
[<ffffffffa02cc96c>] backup_cb+0x71c/0x760 [zfs]
[<ffffffffa02cd2c6>] traverse_visitbp+0x476/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cdce1>] traverse_dnode+0x71/0xd0 [zfs]
[<ffffffffa02cd547>] traverse_visitbp+0x6f7/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cdce1>] traverse_dnode+0x71/0xd0 [zfs]
[<ffffffffa02cd3a1>] traverse_visitbp+0x551/0x7b0 [zfs]
[<ffffffffa02cd784>] traverse_impl+0x184/0x400 [zfs]
[<ffffffffa02cda96>] traverse_dataset+0x56/0x60 [zfs]
[<ffffffffa02cba0b>] dmu_send_impl+0x36b/0x4f0 [zfs]
[<ffffffffa02cc067>] dmu_send_obj+0x197/0x210 [zfs]
[<ffffffffa033a107>] zfs_ioc_send+0xa7/0x280 [zfs]
[<ffffffffa033db25>] zfsdev_ioctl+0x495/0x4d0 [zfs]
[<ffffffff811cb492>] vfs_ioctl+0x22/0xa0
[<ffffffff811cb634>] do_vfs_ioctl+0x84/0x5b0
[<ffffffff811cbbaf>] sys_ioctl+0x4f/0x80
[<ffffffff8100b122>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
[root@node9.prg.vpsfree.cz]
 ~ # cat /proc/487133/stack
[<ffffffffa0251935>] taskq_wait_id+0x65/0xa0 [spl]
[<ffffffffa03042dd>] spa_taskq_dispatch_sync+0x8d/0xc0 [zfs]
[<ffffffffa02cb232>] dump_bytes+0x42/0x50 [zfs]
[<ffffffffa02cc59a>] backup_cb+0x34a/0x760 [zfs]
[<ffffffffa02cd2c6>] traverse_visitbp+0x476/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cd23b>] traverse_visitbp+0x3eb/0x7b0 [zfs]
[<ffffffffa02cdce1>] traverse_dnode+0x71/0xd0 [zfs]
[<ffffffffa02cd3a1>] traverse_visitbp+0x551/0x7b0 [zfs]
[<ffffffffa02cd784>] traverse_impl+0x184/0x400 [zfs]
[<ffffffffa02cda96>] traverse_dataset+0x56/0x60 [zfs]
[<ffffffffa02cba0b>] dmu_send_impl+0x36b/0x4f0 [zfs]
[<ffffffffa02cc067>] dmu_send_obj+0x197/0x210 [zfs]
[<ffffffffa033a107>] zfs_ioc_send+0xa7/0x280 [zfs]
[<ffffffffa033db25>] zfsdev_ioctl+0x495/0x4d0 [zfs]
[<ffffffff811cb492>] vfs_ioctl+0x22/0xa0
[<ffffffff811cb634>] do_vfs_ioctl+0x84/0x5b0
[<ffffffff811cbbaf>] sys_ioctl+0x4f/0x80
[<ffffffff8100b122>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
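
For reference, besides reading /proc/<pid>/stack per process as above, the kernel can dump the stacks of every task stuck in uninterruptible (D) sleep in one go via sysrq, assuming sysrq is enabled; a general sketch, not output captured from this box:

echo 1 > /proc/sys/kernel/sysrq      # enable sysrq if it is not already
echo w > /proc/sysrq-trigger         # "show blocked state": dump stacks of all D-state tasks
dmesg | tail -n 200                  # the stack traces land in the kernel log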
@kernelOfTruth (Contributor) commented
@snajpa, perhaps you'll find some mitigating steps in

#3148 (comment)

#3613 (comment)

Looking at https://www.illumos.org/issues/3705 reminded me of the slab-tuning switches:

spl_kmem_cache_kmem_limit & spl_kmem_cache_slab_limit (http://git.net/zfs-discuss/dsc19010.html)
spl_kmem_cache_magazine_size (https://github.com/zfsonlinux/spl/blob/master/module/spl/spl-kmem-cache.c#L73)
spl_kmem_cache_reclaim (#2570 (comment))
spl_kmem_alloc_max (#3041 (comment))

The first three settings and the last one might be worth looking into (see the sketch after this list for how to inspect and set them).

Not sure if that will help, though.
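
For anyone who wants to experiment with the tunables listed above: on ZFS on Linux they are SPL module parameters, so their current values can be read under /sys/module/spl/parameters and persistent values can be set via a modprobe options file. A minimal sketch; the value 16384 is an arbitrary example rather than a recommendation, and parameters exported read-only can only be changed at module load time:

# read the current values of the parameters mentioned above
grep . /sys/module/spl/parameters/spl_kmem_cache_kmem_limit \
       /sys/module/spl/parameters/spl_kmem_cache_slab_limit \
       /sys/module/spl/parameters/spl_kmem_cache_magazine_size \
       /sys/module/spl/parameters/spl_kmem_alloc_max

# change a read-write parameter at runtime (example value only)
echo 16384 > /sys/module/spl/parameters/spl_kmem_cache_slab_limit

# or set it persistently so it applies at the next module load
echo "options spl spl_kmem_cache_slab_limit=16384" > /etc/modprobe.d/spl.conf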

@snajpa (Contributor, Author) commented Aug 30, 2015

No matter what I do, none of what @kernelOfTruth suggested has led anywhere (but thanks anyway!).

The receiving side is Illumos, and it's definitely not stuck.

@tomassrnka commented
This bug still occurs on 0.6.5.3 (3.19 & 4.1 kernels).

@behlendorf (Contributor) commented
Does anyone know if this is still an issue in master?

@behlendorf (Contributor) commented
This should be resolved in recent 0.6.5.x releases.
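
For anyone checking whether a given machine already runs a fixed release, the loaded module versions can be read directly; a quick sketch, assuming the ZFS and SPL kernel modules are loaded:

cat /sys/module/zfs/version /sys/module/spl/version   # versions of the loaded kernel modules
dmesg | grep -E "ZFS: Loaded|SPL: Loaded"             # module load banners also show the versions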
