Intermittent stalling of ZFS pool #3979

Closed
gvdijnsen opened this issue Nov 4, 2015 · 24 comments
Labels
Status: Inactive, Type: Performance

Comments

@gvdijnsen

We have had intermittent crashes of our file server, initially spaced weeks apart, but with the recent updates this has become more frequent. Yesterday it crashed 3 times in 24 hours. Some observations:

  • Just before everything stalls, memory use drops rapidly from 90GB (which is normal) to around 40GB.
  • Access to the ZFS pool becomes increasingly slow, and grinds to a halt completely after a while.
  • A 'zpool iostat -v 1' shows 0 IOPS and 0 throughput for all disks.
  • All NFS processes start waiting on access and eventually throw a stack trace:
[Tue Nov  3 07:27:11 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Nov  3 07:27:11 2015] nfsd            D ffff880ffee13180     0  6640      2 0x00000000
[Tue Nov  3 07:27:11 2015]  ffff880fafd15c80 0000000000000046 ffff880fafd18000 ffff880fafd15fd8
[Tue Nov  3 07:27:11 2015]  0000000000013180 0000000000013180 ffff880fafd18000 ffff880d17599c28
[Tue Nov  3 07:27:11 2015]  ffff880d17599c2c ffff880fafd18000 00000000ffffffff ffff880d17599c30
[Tue Nov  3 07:27:11 2015] Call Trace:
[Tue Nov  3 07:27:11 2015]  [<ffffffff817285e9>] schedule_preempt_disabled+0x29/0x70
[Tue Nov  3 07:27:11 2015]  [<ffffffff8172a455>] __mutex_lock_slowpath+0x135/0x1b0
[Tue Nov  3 07:27:11 2015]  [<ffffffff8172a4ef>] mutex_lock+0x1f/0x2f
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05d2761>] nfsd_lookup_dentry+0xa1/0x490 [nfsd]
[Tue Nov  3 07:27:11 2015]  [<ffffffff810cd340>] ? getboottime+0x30/0x40
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05d2bb9>] nfsd_lookup+0x69/0x130 [nfsd]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa0514da0>] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05dc1b1>] nfsd3_proc_lookup+0xe1/0x1a0 [nfsd]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05cdd3b>] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa051163d>] svc_process_common+0x46d/0x6d0 [sunrpc]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05119a7>] svc_process+0x107/0x170 [sunrpc]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05cd71f>] nfsd+0xbf/0x130 [nfsd]
[Tue Nov  3 07:27:11 2015]  [<ffffffffa05cd660>] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Nov  3 07:27:11 2015]  [<ffffffff8108b7d2>] kthread+0xd2/0xf0
[Tue Nov  3 07:27:11 2015]  [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Nov  3 07:27:11 2015]  [<ffffffff817347e8>] ret_from_fork+0x58/0x90
[Tue Nov  3 07:27:11 2015]  [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0

I have tried several memory/spl tweaks (which mostly come from the issues here on GitHub), disabled snapshotting, disabled replication, disabled scrubbing and upgraded everything to the latest version available. It is probably load related, because we have a slave here that mirrors the master with exactly the same setup and it has never failed.

This is a large system with roughly 80TB of data available, of which 30TB is in use. It has 128G RAM. Special boot options (most were added in the past because of other crashes or optimizations):
zfs_arc_max=68719476736
zfs_arc_meta_limit=17179869184
zfs_prefetch_disable=1
spl_kmem_cache_slab_limit=0
zfs_vdev_cache_size=1310720
zfs_vdev_cache_max=131072
zfs_vdev_cache_bshift=17

Ubuntu 14.04.3 LTS
Kernel: 3.13.0-66-generic
zfs-dkms / zfsutils: 0.6.5.3-1~trusty

Any help would be appreciated!

@kernelOfTruth
Contributor

ough :(

@gvdijnsen could you please post output of

cat /proc/spl/kstat/zfs/arcstats

when it's working properly and when those stalls & memory collapses happen ?

Could you please attempt to set

zfs_arc_min

to a value, say, 24 or 32 GB ?

I doubt that this will help and believe that it is related to NFS, but there have been some issues with stalls and ARC collapsing in the recent past

Make sure to set those options in /etc/modprobe.d/zfs.conf (or similar) since setting them via /sys/ oftentimes doesn't seem to stick or even work
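
For reference, a minimal sketch of both ways of setting it, assuming the 32 GB suggestion above (34359738368 bytes); treat the exact value as an example, not a recommendation:

# /etc/modprobe.d/zfs.conf -- applied at module load, survives reboots
options zfs zfs_arc_min=34359738368

# runtime change via /sys (may not always stick, as noted above)
echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_min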

@gvdijnsen
Author

Thank you for extending a helping hand here!

I found your answer just as I had rebooted after another crash. One thing I forgot to mention is that when rebooting while this behaviour is displayed, the shutdown sequence hangs with the message "umount2: Device or resource busy" and it needs a reset.

Because this is a production system I cannot reboot at will, so I put the suggested change in /etc/modprobe.d/zfs.conf so it will be active at the next reboot. I also set it through 'echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_min' so that it may perhaps be active this time around. An immediate effect seems to be that the memory consumption (at least for now, uptime 1:50) no longer seems to go past 50GB, which is a change from recent behaviour at least. I will keep you updated.

@gvdijnsen
Author

Ok it crashed again, so I have some more information for you.

The arcstats as I dumped them after an uptime of around 15 hrs. Everything was working smoothly at that time:
6 1 0x01 91 4368 6329068452 55247022853839
name type data
hits 4 297557532
misses 4 41666862
demand_data_hits 4 15492913
demand_data_misses 4 5044451
demand_metadata_hits 4 281919966
demand_metadata_misses 4 35520282
prefetch_data_hits 4 103732
prefetch_data_misses 4 1056410
prefetch_metadata_hits 4 40921
prefetch_metadata_misses 4 45719
mru_hits 4 93028899
mru_ghost_hits 4 516540
mfu_hits 4 204390219
mfu_ghost_hits 4 519825
deleted 4 14348831
mutex_miss 4 1723
evict_skip 4 3489265
evict_not_enough 4 501026
evict_l2_cached 4 74643544576
evict_l2_eligible 4 55270809600
evict_l2_ineligible 4 535376780288
evict_l2_skip 4 1
hash_elements 4 4743317
hash_elements_max 4 4743317
hash_collisions 4 3727992
hash_chains 4 555840
hash_chain_max 4 5
p 4 57787568128
c 4 68719476736
c_min 4 34359738368
c_max 4 68719476736
size 4 68481911872
hdr_size 4 698717608
data_size 4 51302054400
metadata_size 4 5831588352
other_size 4 10348536824
anon_size 4 1891328
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 43601494016
mru_evictable_data 4 40806964736
mru_evictable_metadata 4 46756352
mru_ghost_size 4 13786593792
mru_ghost_evictable_data 4 13351921664
mru_ghost_evictable_metadata 4 434672128
mfu_size 4 13530257408
mfu_evictable_data 4 10495049728
mfu_evictable_metadata 4 7239168
mfu_ghost_size 4 54430862848
mfu_ghost_evictable_data 4 54065164800
mfu_ghost_evictable_metadata 4 365698048
l2_hits 4 2561445
l2_misses 4 39105378
l2_feeds 4 56007
l2_rw_clash 4 0
l2_read_bytes 4 6233748480
l2_write_bytes 4 32917245952
l2_writes_sent 4 46728
l2_writes_done 4 46728
l2_writes_error 4 0
l2_writes_lock_retry 4 8
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 472
l2_cdata_free_on_write 4 36
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 51203695104
l2_asize 4 25893342720
l2_hdr_size 4 301014688
l2_compress_successes 4 1933854
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 2736550
arc_meta_used 4 17179857472
arc_meta_limit 4 17179869184
arc_meta_max 4 17270630208
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2110943232

Around one day later, it crashed again, displaying the same behavior as I have mentioned before. This is the arcstats dump:
6 1 0x01 91 4368 6329068452 160622092889234
name type data
hits 4 783644686
misses 4 105794049
demand_data_hits 4 38305010
demand_data_misses 4 10365616
demand_metadata_hits 4 745194483
demand_metadata_misses 4 94324328
prefetch_data_hits 4 103733
prefetch_data_misses 4 1056502
prefetch_metadata_hits 4 41460
prefetch_metadata_misses 4 47603
mru_hits 4 174273135
mru_ghost_hits 4 995124
mfu_hits 4 609232839
mfu_ghost_hits 4 1930000
deleted 4 33748066
mutex_miss 4 4400
evict_skip 4 8434906
evict_not_enough 4 1241088
evict_l2_cached 4 223849254400
evict_l2_eligible 4 112828147712
evict_l2_ineligible 4 1020344815104
evict_l2_skip 4 191
hash_elements 4 6413616
hash_elements_max 4 6462685
hash_collisions 4 12107913
hash_chains 4 953974
hash_chain_max 4 6
p 4 63189267456
c 4 68719476736
c_min 4 34359738368
c_max 4 68719476736
size 4 67059956128
hdr_size 4 649741024
data_size 4 49526423552
metadata_size 4 6387539968
other_size 4 10008623560
anon_size 4 512512
anon_evictable_data 4 0
anon_evictable_metadata 4 0
mru_size 4 37708841472
mru_evictable_data 4 32923219456
mru_evictable_metadata 4 14125056
mru_ghost_size 4 1035494912
mru_ghost_evictable_data 4 1035443712
mru_ghost_evictable_metadata 4 51200
mfu_size 4 18204609536
mfu_evictable_data 4 16600908800
mfu_evictable_metadata 4 12973568
mfu_ghost_size 4 62457486848
mfu_ghost_evictable_data 4 62448094208
mfu_ghost_evictable_metadata 4 9392640
l2_hits 4 11006289
l2_misses 4 94787721
l2_feeds 4 162359
l2_rw_clash 4 1
l2_read_bytes 4 35790534144
l2_write_bytes 4 63248285696
l2_writes_sent 4 108993
l2_writes_done 4 108993
l2_writes_error 4 0
l2_writes_lock_retry 4 15
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_evict_l1cached 4 0
l2_free_on_write 4 1406
l2_cdata_free_on_write 4 75
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 93080623104
l2_asize 4 52930593792
l2_hdr_size 4 487628024
l2_compress_successes 4 3123307
l2_compress_zeros 4 0
l2_compress_failures 4 0
memory_throttle_count 4 0
duplicate_buffers 4 0
duplicate_buffers_size 4 0
duplicate_reads 4 0
memory_direct_count 4 0
memory_indirect_count 4 0
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 6609772
arc_meta_used 4 17533532576
arc_meta_limit 4 17179869184
arc_meta_max 4 17533542096
arc_meta_min 4 16777216
arc_need_free 4 0
arc_sys_free 4 2110943232

The dmesg errors are different though, so I have included them here:
[Fri Nov 6 22:35:51 2015] INFO: task spl_system_task:1149 blocked for more than 120 seconds.
[Fri Nov 6 22:35:51 2015] Tainted: P OX 3.13.0-66-generic #108-Ubuntu
[Fri Nov 6 22:35:51 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Nov 6 22:35:51 2015] spl_system_task D ffff880ffef13180 0 1149 2 0x00000000
[Fri Nov 6 22:35:51 2015] ffff880fe2b9d650 0000000000000046 ffff880fe3688000 ffff880fe2b9dfd8
[Fri Nov 6 22:35:51 2015] 0000000000013180 0000000000013180 ffff880fe3688000 ffffffffa03a5330
[Fri Nov 6 22:35:51 2015] ffffffffa03a5334 ffff880fe3688000 00000000ffffffff ffffffffa03a5338
[Fri Nov 6 22:35:51 2015] Call Trace:
[Fri Nov 6 22:35:51 2015] [] schedule_preempt_disabled+0x29/0x70
[Fri Nov 6 22:35:51 2015] [] __mutex_lock_slowpath+0x135/0x1b0
[Fri Nov 6 22:35:51 2015] [] mutex_lock+0x1f/0x2f
[Fri Nov 6 22:35:51 2015] [] buf_hash_find+0xa0/0x150 [zfs]
[Fri Nov 6 22:35:51 2015] [] arc_read+0x10f/0xa80 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_prefetcher+0x141/0x180 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x4cb/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] ? add_reference.isra.15+0x6a/0xb0 [zfs]
[Fri Nov 6 22:35:51 2015] [] ? arc_read+0x478/0xa80 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_dnode+0x71/0xe0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x6fb/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_dnode+0x71/0xe0 [zfs]
[Fri Nov 6 22:35:51 2015] [] traverse_visitbp+0x5df/0x7f0 [zfs]
[Fri Nov 6 22:35:51 2015] [] ? native_sched_clock+0x13/0x80
[Fri Nov 6 22:35:51 2015] [] ? sched_clock+0x9/0x10
[Fri Nov 6 22:35:51 2015] [] traverse_prefetch_thread+0xde/0x160 [zfs]
[Fri Nov 6 22:35:51 2015] [] ? prefetch_needed.isra.5+0x40/0x40 [zfs]
[Fri Nov 6 22:35:51 2015] [] taskq_thread+0x20d/0x410 [spl]
[Fri Nov 6 22:35:51 2015] [] ? wake_up_state+0x20/0x20
[Fri Nov 6 22:35:51 2015] [] ? taskq_cancel_id+0x120/0x120 [spl]
[Fri Nov 6 22:35:51 2015] [] kthread+0xd2/0xf0
[Fri Nov 6 22:35:51 2015] [] ? kthread_create_on_node+0x1c0/0x1c0
[Fri Nov 6 22:35:51 2015] [] ret_from_fork+0x58/0x90
[Fri Nov 6 22:35:51 2015] [] ? kthread_create_on_node+0x1c0/0x1c0

Hopefully you can make something of this!

@kernelOfTruth
Contributor

@gvdijnsen What exactly is happening during these stalls ? Do these correlate with pre-set cron jobs ?

Any special memory-hungry programs running ? rsync ? zfs send (snapshots) ?

Also please specify "crash", what still works ?

it either looks like stack issues or heavy-load-related problems (the error message appears different on non-preemptible kernels) - at least now (#3148)

previously it looked like it was related to nfs - you don't use nfs4 so that can be ruled out ? (https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/1315955)

I'd say to disable the dynamic taskqs, but they're already disabled by default with 0.6.5.3 (Disable dynamic taskqs by default to avoid deadlock openzfs/spl#484)

Is there a specific reason you set spl_kmem_cache_slab_limit to 0 ?

CC'ing

@dweeezil and @behlendorf in case they've seen similar behavior

@gvdijnsen
Author

@kernelOfTruth During the stalls, nothing out of the ordinary is happening. That is to say, nothing I know of that might cause this specifically. However, the system is in heavy use by potentially over 130 users and processes. This is a heavily loaded production system: 10,000 IOPS and 100-300 MB/s of throughput on average, with peaks of well over 800MB/s at busy moments. The users are often developers checking stuff in or out with git or other tools, and the system is also used in automated release builds that may load it quite heavily. Most access happens through NFS and it is near impossible to track down everything that is happening on the system at all times.

However I can help you with some of the questions:

  • We reduced snapshot frequency (and cleanup) from once every hour to once per day and this did nothing to change the frequency of stalls. Also, the stalls do not specifically happen at the time of a snapshot.
  • Same for replication jobs.
  • We only use NFS3, NFS4 can be ruled out.
  • A "crash" is indeed too vague, and I should not have called it that. The system actually never crashes, but the underlying ZFS file system just stops working; All NFS processes stop and local access to the system is impaired. Writes to it hang immediately and indefinitely. An 'ls' ran locally on the system sometimes works for a few dirs, but eventually stalls and hangs there forever. In the past we had situations where a "echo 2 > /proc/sys/vm/drop_caches" could pull the filesystem out of it's misery, but in the last weeks that has never helped.
  • We have had these kinds of problems in the past, but they were spaced much further apart: in our tests we never saw them, but after we took the system into production they started to appear: months apart at first, but lately that became weeks and now it is days if not hours. This, however, might be fully explained by the fact that we have gradually been putting this system to heavier use as we went along, and may not be due to regressions in the code at all.
  • The reason for spl_kmem_cache_slab_limit is that a while ago we had a problem with memory becoming trapped in SLAB and never released, leading to situations where we ran out of memory after a while. I picked this parameter up from https://utcc.utoronto.ca/~cks/space/blog/linux/ZFSonLinuxMemoryWhere and since we have been using it, these problems are gone. Please tell me if you want me to disable that tweak. It might have been fixed in the current release, I do not know.
  • We have had dynamic taskqs disabled for a while, but this has not helped.

Were you able to glean anything from the arcstats I sent you?

@gvdijnsen
Author

FYI: We have had several of these stalls since my last reports. I found out that if we stop nfs-kernel-server, wait a few minutes for all threads to exit, and then start nfs-kernel-server again, the system is restored to working order. This at least indicates that it is related to NFS. This does not always work out, though; the nfsd threads sometimes refuse to die and then we need a reboot.
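
In case it is useful to others, a rough sketch of that workaround as shell commands, assuming the Ubuntu 14.04 service name nfs-kernel-server (the wait loop is just one way of checking that the nfsd kernel threads have exited):

service nfs-kernel-server stop
# wait for the nfsd kernel threads to exit before restarting
while pgrep -x nfsd > /dev/null; do sleep 10; done
service nfs-kernel-server start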

@dweeezil
Contributor

Following up because I noticed my name mentioned in this issue: I've not really had the time to follow this issue too closely, but it sounds like increasing zfs_arc_min helped. In my opinion, raising it and possibly zfs_arc_meta_min from their very low defaults is very important in many ZoL use cases, mainly where there are significant non-ZFS tasks which need to use a lot of memory.

@kernelOfTruth
Contributor

@dweeezil thanks for your input ! Alright, so if I understood it correctly, ZFS needs to have a certain minimum for ARC, otherwise everything would inevitably stall and come to a grinding halt, since all data structures, etc. are on ZFS and everything else in turn depends on that - interesting ...

I remember having read that it's partially similar on one of the other architectures where OpenZFS runs ? But on Linux the situation is aggravated due to the Pagecache, Slab and other mechanisms - so it's kind of hard to strike a (perfect) balance ?

I read in some of the issues that setting zfs_arc_min and zfs_arc_max to the same value fixed a problem, but I can't recall right now which one in particular

@gvdijnsen one other thing you could attempt would be setting the min and max values for ARC, and the min and max values for meta(data), to the same value (but of course keeping the ratio of ARC data to metadata) and seeing whether that stabilizes things, with the ARC max perhaps a little lower than before to leave enough RAM for the non-filesystem processes

@gvdijnsen
Author

Hi all,
Based on everything I read (both here and elsewhere), I now have the following changes in place for when it next stalls:

  • Installed kernel 3.16.0-52 (larger stack)
  • Set zfs_arc_meta_min to 8589934592 (as per @dweeezil's suggestion)
  • Installed irqbalance (to better spread IRQ handling over the available cores)
  • Lifted the txqueuelen to 10000 (might settle some buffer issues at 10 Gbit speed)

@dweeezil I will try your suggestion about setting the ARC min and max to the same value. Can't try everything at once, as much as I would like ;-). Also, I do not quite understand your remark about "keeping the ratio of ARC, data to metadata". Could you elaborate on that please?

Anyway, the following zfs options will now be active on the next reboot:
options zfs zfs_arc_max=68719476736
options zfs zfs_arc_min=34359738368
options zfs zfs_arc_meta_limit=17179869184
options zfs zfs_arc_meta_min=8589934592
options zfs zfs_prefetch_disable=1
options spl spl_kmem_cache_slab_limit=0
options zfs zfs_vdev_cache_size=1310720
options zfs zfs_vdev_cache_max=131072
options zfs zfs_vdev_cache_bshift=17
options spl spl_taskq_thread_dynamic=0

@kernelOfTruth
Contributor

@gvdijnsen I meant that you shouldn't make zfs_arc_meta* too large compared to zfs_arc_min and zfs_arc_max

I believe I made that mistake once at the beginning 😏

*fingers crossed* let's see how that works out :)

@gvdijnsen
Author

@dweeezil @kernelOfTruth From my understanding, it is like this:

arc_min <= arc_meta_min <= arc_meta_limit <= arc_max

So if arc_min = arc_max, the only way to satisfy this is to set them all to the same value. I am probably misunderstanding how this works. Please advise: what would you personally choose for all these values on a 130GB RAM ZFS/NFS/Samba server?

@kernelOfTruth
Contributor

@gvdijnsen that's also how I understand it,

the recommended default is around 1/2 of RAM for ARC and 1/4 of RAM for metadata;

so going by the data - I'd go slightly below half of RAM for ARC and raise the size of metadata for ARC in relation to that (meaning a little more than 25%, since metadata appears to be actively used; there's also the L2ARC with a not-so-high hit rate, but that might improve with longer runtime):

(The topmost values are the suggested values according to my understanding and the threads I have read so far; if there are additional changes to be made, please go ahead 😄)

After, recommended (static):
  arc_min      62,277,025,792  (58 GB, data)
  arc_max      62,277,025,792  (58 GB, data)
  arc_meta_min 17,270,630,208  (16 GB, metadata)
  arc_meta_max 17,270,630,208  (16 GB, metadata)

Before (dynamic):
  max      68,719,476,736  (64 GB)
  min      34,359,738,368  (32 GB)
  meta max 17,270,630,208  (16 GB)
  meta min 16,777,216      (0.016 GB)

ARC contents observed with the current (dynamic) settings:
  before (no stall yet):  47.7G data, 5.4G metadata
  after (during stall):   46G data, 5.9G metadata
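
If it helps, here is a sketch of how the suggested static values above could be expressed in /etc/modprobe.d/zfs.conf, assuming the "meta max" above maps to the zfs_arc_meta_limit tunable (arc_meta_max in arcstats is a statistic rather than a setting):

options zfs zfs_arc_min=62277025792
options zfs zfs_arc_max=62277025792
options zfs zfs_arc_meta_min=17270630208
options zfs zfs_arc_meta_limit=17270630208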

@gvdijnsen
Author

I am sorry for the silence from my side, but there is a reason for that:

Over the last two weeks we have not had a single crash. Curiously, I have not implemented the change suggested by @dweeezil and @kernelOfTruth yet. There are two possible reasons I can see:

  • We upped the txqueuelen from 1000 to 10000 after a tip we got from a storage specialist (see the sketch after this list). I do not see why it would help, but it is the only change we made since the last crash.
  • The IOPS requested from the system suddenly fell by 20% (on average; peak IOPS is 40% lower) around the same time two weeks ago. As it turns out, someone was repeatedly running large rsyncs over NFS and this has since stopped.
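
For completeness, the txqueuelen change from the first point amounts to something like the following, assuming eth0 is the NFS-facing interface (the interface name is only an example):

ip link set dev eth0 txqueuelen 10000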

On one hand I am glad that the system is stable now, but I am suspicious since we had periods with weeks of stability before and this perceived stability may well be caused by the 20% drop in IOPS we are seeing. I am waiting for a good opportunity to reboot the system and run with the new ARC settings...

@kernelOfTruth
Contributor

@gvdijnsen glad it's gotten more stable !

In relation to that, raising nr_requests might also make a difference

@gvdijnsen
Author

Hi, just to let you know that it stalled again just before the weekend. I brought it back up with the recommended settings @kernelOfTruth provided (not nr_requests yet; that is our next step if this turns out not to work). Now running 5 days steady...

@behlendorf added the Type: Performance label on Dec 9, 2015
@gvdijnsen
Author

Stalled after 11 days. I am at a loss as to what to do... I have now started to raise some values like /proc/sys/net/core/rmem_default because they were quite low, but I am not sure what else to do...

nr_requests is a per-disk setting. Any advice on a good value for that? The default seems to be 128.

@kernelOfTruth
Contributor

@gvdijnsen sorry for the long delay, I've been moving to a different country and slightly lost track of some things

nr_requests should be set to a fairly high value when running with RAID arrays or several drives,

so an nr_requests of 1024 would be a good start; note however that higher values might lead to slightly higher latency
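
As a sketch, assuming the pool sits on /dev/sd* whole disks (adjust the glob to your setup), and keeping in mind that this does not persist across reboots:

for q in /sys/block/sd*/queue/nr_requests; do
    echo 1024 > "$q"
done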

Was there any change on your side ?

Meanwhile, several locking issues were addressed in December and made it into master (I was also affected by some of them); seemingly only the locking inversion fix made it into the 0.6.5.4 release

Your issue could be related to #4106 ZFS 0.6.5.3 servers hang trying to get mutexes, since your dmesg messages remind me of similar output in that issue

also referencing: #4166 live-lock in arc_reclaim, blocking any pool IO



Tl;dr

applying patches:
openzfs/spl@08fbc02
to SPL

and

to ZFS
#4106 (comment)

#4124 Fix z_hold_mtx deadlock
(2 commits)

then set zfs_object_mutex_size to 4096


should clearly help with your issue
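
For the last step, a one-line sketch of making that setting persistent, assuming the same /etc/modprobe.d/zfs.conf approach as earlier:

options zfs zfs_object_mutex_size=4096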

@gvdijnsen
Author

On our side nothing much has changed. We stall and reboot every once in a while (about twice a week) and try to keep load as low as possible until we have a solution. Meanwhile, to my dismay, we are looking at an exit scenario to fall back to ext4 :-(.

I have looked at the issues you mentioned, and I am quite positive that these look very much like our problem, so the fixes for them should be able to help us out. However, they are not in the current stable release (0.6.5.4) and I am not bold enough to roll my own and use it on our production system, so we will have to wait until 0.6.5.5 comes along. I could try to run it on our test system, but that one doesn't crash because we cannot load it in the same manner as we do in production. Hopefully 0.6.5.5 will come along soon.

For now, I have set nr_requests for all disks to 1024 in production and will keep you updated on the results!

@behlendorf
Contributor

@gvdijnsen sorry I'm a little late to this issue, but if you could post all the stack traces from a system when this happens I could tell you for sure whether the #4106 patches will resolve your issue. These changes have just been merged to master, so they may make it into 0.6.5.5.

@gvdijnsen
Author

@behlendorf Thanks for looking into this, even if it is late in the game. I am grateful for any help I can get. These are the stack traces from a recent stall, just before the reboot:
[Tue Jan 5 22:53:34 2016] INFO: task spl_system_task:1161 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] spl_system_task D ffff880ffeed30c0 0 1161 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fe3d93540 0000000000000046 ffff880fe3e51460 ffff880fe3d93fd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe74b1460 ffffffffc0615d50
[Tue Jan 5 22:53:34 2016] ffffffffc0615d54 ffff880fe3e51460 00000000ffffffff ffffffffc0615d58
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] buf_hash_find+0xa0/0x150 [zfs]
[Tue Jan 5 22:53:34 2016] [] arc_read+0x10f/0xa80 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_prefetcher+0x141/0x180 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x4cb/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[Tue Jan 5 22:53:34 2016] [] ? spl_kmem_alloc+0xc7/0x170 [spl]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_dnode+0x71/0xe0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x6fb/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x312/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_dnode+0x71/0xe0 [zfs]
[Tue Jan 5 22:53:34 2016] [] traverse_visitbp+0x5df/0x7f0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? sched_clock+0x9/0x10
[Tue Jan 5 22:53:34 2016] [] ? sched_clock_cpu+0x85/0xc0
[Tue Jan 5 22:53:34 2016] [] traverse_prefetch_thread+0xde/0x160 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? prefetch_needed.isra.5+0x40/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] taskq_thread+0x20d/0x410 [spl]
[Tue Jan 5 22:53:34 2016] [] ? wake_up_state+0x20/0x20
[Tue Jan 5 22:53:34 2016] [] ? taskq_cancel_id+0x120/0x120 [spl]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task arc_prune:1268 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] arc_prune D ffff881ffe6930c0 0 1268 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fcc74f940 0000000000000046 ffff880fcc72d180 ffff880fcc74ffd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe25f5bb0 ffffffffc0615d50
[Tue Jan 5 22:53:34 2016] ffffffffc0615d54 ffff880fcc72d180 00000000ffffffff ffffffffc0615d58
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] ? multilist_insert+0xac/0xc0 [zfs]
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] arc_buf_remove_ref+0xa7/0x150 [zfs]
[Tue Jan 5 22:53:34 2016] [] dbuf_rele_and_unlock+0x9c/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? update_curr+0xe4/0x180
[Tue Jan 5 22:53:34 2016] [] dbuf_rele+0x36/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] dnode_rele_and_unlock+0x80/0x90 [zfs]
[Tue Jan 5 22:53:34 2016] [] dnode_rele+0x39/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] dbuf_rele_and_unlock+0x379/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? __slab_free+0xfe/0x2c3
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] dbuf_rele+0x36/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] dmu_buf_rele+0xe/0x10 [zfs]
[Tue Jan 5 22:53:34 2016] [] sa_handle_destroy+0x80/0xe0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_zinactive+0xa7/0x180 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_inactive+0x67/0x240 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? truncate_pagecache+0x59/0x60
[Tue Jan 5 22:53:34 2016] [] zpl_evict_inode+0x43/0x60 [zfs]
[Tue Jan 5 22:53:34 2016] [] evict+0xb4/0x180
[Tue Jan 5 22:53:34 2016] [] dispose_list+0x39/0x50
[Tue Jan 5 22:53:34 2016] [] prune_icache_sb+0x47/0x60
[Tue Jan 5 22:53:34 2016] [] super_cache_scan+0x105/0x170
[Tue Jan 5 22:53:34 2016] [] zfs_sb_prune+0xf2/0x160 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_prune_sb+0x26/0x30 [zfs]
[Tue Jan 5 22:53:34 2016] [] arc_prune_task+0x1f/0x50 [zfs]
[Tue Jan 5 22:53:34 2016] [] taskq_thread+0x20d/0x410 [spl]
[Tue Jan 5 22:53:34 2016] [] ? wake_up_state+0x20/0x20
[Tue Jan 5 22:53:34 2016] [] ? taskq_cancel_id+0x120/0x120 [spl]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task arc_prune:1299 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] arc_prune D ffff880ffee730c0 0 1299 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fcc003940 0000000000000046 ffff880fcc7f1e90 ffff880fcc003fd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe747f010 ffffffffc0615d50
[Tue Jan 5 22:53:34 2016] ffffffffc0615d54 ffff880fcc7f1e90 00000000ffffffff ffffffffc0615d58
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] ? multilist_insert+0xac/0xc0 [zfs]
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] arc_buf_remove_ref+0xa7/0x150 [zfs]
[Tue Jan 5 22:53:34 2016] [] dbuf_rele_and_unlock+0x9c/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? free_one_page+0x1f5/0x330
[Tue Jan 5 22:53:34 2016] [] dbuf_rele+0x36/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] dnode_rele_and_unlock+0x80/0x90 [zfs]
[Tue Jan 5 22:53:34 2016] [] dnode_rele+0x39/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] dbuf_rele_and_unlock+0x379/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? __switch_to+0xe4/0x580
[Tue Jan 5 22:53:34 2016] [] ? __slab_free+0xfe/0x2c3
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] dbuf_rele+0x36/0x40 [zfs]
[Tue Jan 5 22:53:34 2016] [] dmu_buf_rele+0xe/0x10 [zfs]
[Tue Jan 5 22:53:34 2016] [] sa_handle_destroy+0x80/0xe0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_zinactive+0xa7/0x180 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_inactive+0x67/0x240 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? truncate_pagecache+0x59/0x60
[Tue Jan 5 22:53:34 2016] [] zpl_evict_inode+0x43/0x60 [zfs]
[Tue Jan 5 22:53:34 2016] [] evict+0xb4/0x180
[Tue Jan 5 22:53:34 2016] [] dispose_list+0x39/0x50
[Tue Jan 5 22:53:34 2016] [] prune_icache_sb+0x47/0x60
[Tue Jan 5 22:53:34 2016] [] super_cache_scan+0x105/0x170
[Tue Jan 5 22:53:34 2016] [] zfs_sb_prune+0xf2/0x160 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_prune_sb+0x26/0x30 [zfs]
[Tue Jan 5 22:53:34 2016] [] arc_prune_task+0x1f/0x50 [zfs]
[Tue Jan 5 22:53:34 2016] [] taskq_thread+0x20d/0x410 [spl]
[Tue Jan 5 22:53:34 2016] [] ? wake_up_state+0x20/0x20
[Tue Jan 5 22:53:34 2016] [] ? taskq_cancel_id+0x120/0x120 [spl]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task nfsd:7079 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] nfsd D ffff880ffeed30c0 0 7079 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff881fd7ad3a50 0000000000000046 ffff881fd021bd20 ffff881fd7ad3fd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fb6e1b2f0 ffff880fe5191180
[Tue Jan 5 22:53:34 2016] ffff880fe5191184 ffff881fd021bd20 00000000ffffffff ffff880fe5191188
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] zfs_zget+0x17b/0x320 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_vget+0x230/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_fh_to_dentry+0x72/0xb0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] exportfs_decode_fh+0x72/0x2e0
[Tue Jan 5 22:53:34 2016] [] ? cache_check+0x106/0x3d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? exp_find+0xfc/0x1b0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[Tue Jan 5 22:53:34 2016] [] ? aa_alloc_task_context+0x27/0x40
[Tue Jan 5 22:53:34 2016] [] fh_verify+0x2f6/0x5e0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? groups_free+0x54/0x60
[Tue Jan 5 22:53:34 2016] [] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] nfsd3_proc_getattr+0x69/0xe0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process_common+0x440/0x6d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process+0x107/0x170 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd+0xc7/0x130 [nfsd]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task nfsd:7085 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] nfsd D ffff880ffeed30c0 0 7085 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff881fd1efb910 0000000000000046 ffff881fd0dcc750 ffff881fd1efbfd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe24e32f0 ffff880fc2e611f0
[Tue Jan 5 22:53:34 2016] ffff880fc2e611f4 ffff881fd0dcc750 00000000ffffffff ffff880fc2e611f8
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] zfs_zget+0x17b/0x320 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_dirent_lock+0x49a/0x580 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_dirlook+0x81/0x310 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_lookup+0x2c5/0x310 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_lookup+0x8e/0x110 [zfs]
[Tue Jan 5 22:53:34 2016] [] lookup_real+0x1d/0x50
[Tue Jan 5 22:53:34 2016] [] __lookup_hash+0x33/0x40
[Tue Jan 5 22:53:34 2016] [] lookup_one_len+0xe6/0x140
[Tue Jan 5 22:53:34 2016] [] encode_entryplus_baggage+0xb2/0x1f0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] encode_entry.isra.12+0x322/0x370 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? nfs3svc_encode_entry+0x20/0x20 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfs3svc_encode_entry_plus+0x14/0x20 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_readdir+0x19f/0x2a0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_direct_splice_actor+0x20/0x20 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] nfsd3_proc_readdirplus+0xc7/0x210 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process_common+0x440/0x6d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process+0x107/0x170 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd+0xc7/0x130 [nfsd]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task nfsd:7122 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] nfsd D ffff880ffeed30c0 0 7122 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fb63afa50 0000000000000046 ffff880fc8951460 ffff880fb63affd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe74b1460 ffff880fe5191180
[Tue Jan 5 22:53:34 2016] ffff880fe5191184 ffff880fc8951460 00000000ffffffff ffff880fe5191188
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] zfs_zget+0x17b/0x320 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_vget+0x230/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_fh_to_dentry+0x72/0xb0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] exportfs_decode_fh+0x72/0x2e0
[Tue Jan 5 22:53:34 2016] [] ? cache_check+0x106/0x3d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? aa_alloc_task_context+0x27/0x40
[Tue Jan 5 22:53:34 2016] [] ? kmem_cache_alloc_trace+0x58/0x1f0
[Tue Jan 5 22:53:34 2016] [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[Tue Jan 5 22:53:34 2016] [] ? aa_alloc_task_context+0x27/0x40
[Tue Jan 5 22:53:34 2016] [] fh_verify+0x2f6/0x5e0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? groups_free+0x54/0x60
[Tue Jan 5 22:53:34 2016] [] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] nfsd3_proc_getattr+0x69/0xe0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process_common+0x440/0x6d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process+0x107/0x170 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd+0xc7/0x130 [nfsd]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task nfsd:7123 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] nfsd D ffff881ffe7d30c0 0 7123 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fb202ba50 0000000000000046 ffff880fc89528c0 ffff880fb202bfd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe7581460 ffff880fe5191180
[Tue Jan 5 22:53:34 2016] ffff880fe5191184 ffff880fc89528c0 00000000ffffffff ffff880fe5191188
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] zfs_zget+0x17b/0x320 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_vget+0x230/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_fh_to_dentry+0x72/0xb0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] exportfs_decode_fh+0x72/0x2e0
[Tue Jan 5 22:53:34 2016] [] ? cache_check+0x106/0x3d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? exp_find+0xfc/0x1b0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? put_prev_entity+0x31/0x3f0
[Tue Jan 5 22:53:34 2016] [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[Tue Jan 5 22:53:34 2016] [] ? aa_alloc_task_context+0x27/0x40
[Tue Jan 5 22:53:34 2016] [] fh_verify+0x2f6/0x5e0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? groups_free+0x54/0x60
[Tue Jan 5 22:53:34 2016] [] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] nfsd3_proc_getattr+0x69/0xe0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process_common+0x440/0x6d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process+0x107/0x170 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd+0xc7/0x130 [nfsd]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task nfsd:7125 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] nfsd D ffff881ffe7d30c0 0 7125 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fb64c3a50 0000000000000046 ffff880fc8957010 ffff880fb64c3fd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fc831c750 ffff880fe5191180
[Tue Jan 5 22:53:34 2016] ffff880fe5191184 ffff880fc8957010 00000000ffffffff ffff880fe5191188
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] zfs_zget+0x17b/0x320 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_vget+0x230/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_fh_to_dentry+0x72/0xb0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] exportfs_decode_fh+0x72/0x2e0
[Tue Jan 5 22:53:34 2016] [] ? cache_check+0x106/0x3d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? exp_find+0xfc/0x1b0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[Tue Jan 5 22:53:34 2016] [] ? aa_alloc_task_context+0x27/0x40
[Tue Jan 5 22:53:34 2016] [] fh_verify+0x2f6/0x5e0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? groups_free+0x54/0x60
[Tue Jan 5 22:53:34 2016] [] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] nfsd3_proc_getattr+0x69/0xe0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process_common+0x440/0x6d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process+0x107/0x170 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd+0xc7/0x130 [nfsd]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] INFO: task nfsd:7141 blocked for more than 120 seconds.
[Tue Jan 5 22:53:34 2016] Tainted: P OE 3.16.0-52-generic #71~14.04.1-Ubuntu
[Tue Jan 5 22:53:34 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 5 22:53:34 2016] nfsd D ffff880ffef130c0 0 7141 2 0x00000000
[Tue Jan 5 22:53:34 2016] ffff880fb8edfa50 0000000000000046 ffff881fddcc65e0 ffff880fb8edffd8
[Tue Jan 5 22:53:34 2016] 00000000000130c0 00000000000130c0 ffff880fe7538000 ffff880fe5191180
[Tue Jan 5 22:53:34 2016] ffff880fe5191184 ffff881fddcc65e0 00000000ffffffff ffff880fe5191188
[Tue Jan 5 22:53:34 2016] Call Trace:
[Tue Jan 5 22:53:34 2016] [] schedule_preempt_disabled+0x29/0x70
[Tue Jan 5 22:53:34 2016] [] __mutex_lock_slowpath+0xd5/0x1d0
[Tue Jan 5 22:53:34 2016] [] mutex_lock+0x1f/0x2f
[Tue Jan 5 22:53:34 2016] [] zfs_zget+0x17b/0x320 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? getboottime+0x30/0x40
[Tue Jan 5 22:53:34 2016] [] ? mutex_lock+0x12/0x2f
[Tue Jan 5 22:53:34 2016] [] ? rrw_enter_read_impl+0xc1/0x170 [zfs]
[Tue Jan 5 22:53:34 2016] [] zfs_vget+0x230/0x3e0 [zfs]
[Tue Jan 5 22:53:34 2016] [] zpl_fh_to_dentry+0x72/0xb0 [zfs]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] exportfs_decode_fh+0x72/0x2e0
[Tue Jan 5 22:53:34 2016] [] ? cache_check+0x106/0x3d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? exp_find+0xfc/0x1b0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? put_prev_entity+0x31/0x3f0
[Tue Jan 5 22:53:34 2016] [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
[Tue Jan 5 22:53:34 2016] [] ? aa_alloc_task_context+0x27/0x40
[Tue Jan 5 22:53:34 2016] [] fh_verify+0x2f6/0x5e0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] ? groups_free+0x54/0x60
[Tue Jan 5 22:53:34 2016] [] ? unix_gid_show+0x110/0x110 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] nfsd3_proc_getattr+0x69/0xe0 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd_dispatch+0xbb/0x200 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process_common+0x440/0x6d0 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] svc_process+0x107/0x170 [sunrpc]
[Tue Jan 5 22:53:34 2016] [] ? nfsd_destroy+0x80/0x80 [nfsd]
[Tue Jan 5 22:53:34 2016] [] nfsd+0xc7/0x130 [nfsd]
[Tue Jan 5 22:53:34 2016] [] kthread+0xd2/0xf0
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0
[Tue Jan 5 22:53:34 2016] [] ret_from_fork+0x58/0x90
[Tue Jan 5 22:53:34 2016] [] ? kthread_create_on_node+0x1c0/0x1c0

Hope this is the same problem that is solved with the patch!!

@gvdijnsen
Author

Actually, it seems the full set of stack traces is too long to submit here, but I hope you can work with this. Let's hope you can verify that our problem is solved with this patch...

@behlendorf
Contributor

Yes, it looks like your issue should already be addressed in the master branch. And we'll cherry pick those fixes back for the next point release.

@gvdijnsen
Author

@behlendorf Is there a place where I can find a probable release date for 0.6.5.5? I am kind of anxious to fix this on my production server...

@behlendorf
Contributor

@gvdijnsen if you don't mind rolling your own version, I'd suggest pulling the latest code from the spl/zfs-0.6.5-release branches. The fixes you need are applied there and that's where we'll tag 0.6.5.5.

https://github.com/zfsonlinux/spl/tree/spl-0.6.5-release
https://github.com/zfsonlinux/zfs/tree/zfs-0.6.5-release
