
poor lstat and rename performance - dirent cache congestion? #3829

Closed
woffs opened this issue Sep 24, 2015 · 7 comments
Labels: Type: Performance (Performance improvement or performance problem)

Comments


woffs commented Sep 24, 2015

Symptoms:

  • lstat and rename performance degrades after reading lots of large
    directories (find, rsync, backup scenario)
  • disk utilisation is not elevated, but actually lower than normal
  • perf does not show any suspicious deadlocks or spins
  • no hanging kernel threads (txg_sync); everything looks fine in that
    corner

My system:

  • linux 3.16.7-ckt11-1+deb8u4 (Debian Jessie)
  • zfs 0.6.5.1-2
  • two pools, 113 ZFS filesystems, lz4, no dedup, no L2ARC
  • NUMA system (two nodes), 96 GB RAM

After

  • setting vm.drop_caches=2 (which apparently clears the ARC), or
  • setting zfs_arc_meta_limit and zfs_arc_max to larger values

performance is restored for a short time (until some cache fills up
again). Interestingly, the ARC does not need to be near arc_meta_limit
or c_max for performance to degrade.

Setting primarycache=metadata brings no mitigation, and setting zfs_arc_meta_strategy=0 does not help either.
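
For reference, the mitigations above correspond roughly to the following commands (run as root; the ARC values are illustrative and tank/fs is a placeholder dataset name):

  # Drop dentries and inodes, which in turn evicts ARC metadata
  echo 2 > /proc/sys/vm/drop_caches

  # Raise the ARC limits at runtime (example values, in bytes)
  echo 50000000000 > /sys/module/zfs/parameters/zfs_arc_max
  echo 25000000000 > /sys/module/zfs/parameters/zfs_arc_meta_limit

  # Cache only metadata for a given filesystem
  zfs set primarycache=metadata tank/fs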

Downgrading to linux 3.2.68-1+deb7u3 + zfs 0.6.4-16-544f71-wheezy brings back very good performance.

This problem (or a similar one) must have been introduced a few commits after 544f71 and was apparently not fully resolved in 0.6.5.1.

Perhaps related to


woffs commented Sep 24, 2015

A stack trace of a Perl process that is mainly renaming lots of directory entries:

[<ffffffff8127cabf>] __blk_run_queue+0x2f/0x40
[<ffffffff81281073>] blk_queue_bio+0x323/0x360
[<ffffffff810968f0>] default_wake_function+0x0/0x10
[<ffffffffa10a8646>] __vdev_disk_physio+0x446/0x460 [zfs]
[<ffffffffa10a8af5>] vdev_disk_io_start+0x75/0x1b0 [zfs]
[<ffffffffa10e44d9>] zio_vdev_io_start+0x99/0x2e0 [zfs]
[<ffffffffa10e79cf>] zio_nowait+0xaf/0x180 [zfs]
[<ffffffffa10af31d>] vdev_raidz_io_start+0x14d/0x2c0 [zfs]
[<ffffffffa10acfb0>] vdev_raidz_child_done+0x0/0x20 [zfs]
[<ffffffffa10e44d9>] zio_vdev_io_start+0x99/0x2e0 [zfs]
[<ffffffffa10e79cf>] zio_nowait+0xaf/0x180 [zfs]
[<ffffffffa10abb90>] vdev_mirror_io_start+0xa0/0x1a0 [zfs]
[<ffffffffa10ab200>] vdev_mirror_child_done+0x0/0x20 [zfs]
[<ffffffffa10e461d>] zio_vdev_io_start+0x1dd/0x2e0 [zfs]
[<ffffffffa10e79cf>] zio_nowait+0xaf/0x180 [zfs]
[<ffffffffa10407de>] arc_read+0x5de/0xa80 [zfs]
[<ffffffffa1047eae>] dbuf_read+0x2ae/0x920 [zfs]
[<ffffffffa10510b0>] dmu_buf_hold+0x50/0x80 [zfs]
[<ffffffffa10afe9a>] zap_get_leaf_byblk+0x4a/0x290 [zfs]
[<ffffffffa10af9aa>] zap_idx_to_blk+0xda/0x150 [zfs]
[<ffffffffa10b0145>] zap_deref_leaf+0x65/0x70 [zfs]
[<ffffffffa10b0c61>] fzap_lookup+0x51/0x160 [zfs]
[<ffffffffa054e97f>] spl_kmem_alloc+0xbf/0x170 [spl]
[<ffffffffa10b56c4>] zap_lookup_norm+0x104/0x1d0 [zfs]
[<ffffffffa10b57bf>] zap_lookup+0x2f/0x40 [zfs]
[<ffffffffa10be052>] zfs_dirent_lock+0x512/0x5c0 [zfs]
[<ffffffffa10b8a99>] zfs_zaccess_aces_check+0x199/0x360 [zfs]
[<ffffffffa10be186>] zfs_dirlook+0x86/0x2d0 [zfs]
[<ffffffffa10d2714>] zfs_lookup+0x2c4/0x310 [zfs]
[<ffffffffa10edf26>] zpl_lookup+0x86/0x100 [zfs]
[<ffffffff811b0f79>] lookup_real+0x19/0x50
[<ffffffff811b180f>] __lookup_hash+0x2f/0x40
[<ffffffff811b5b00>] SYSC_renameat2+0x1f0/0x530
[<ffffffff811b4fc1>] do_unlinkat+0xd1/0x2c0
[<ffffffff811acecc>] SYSC_newlstat+0x2c/0x40
[<ffffffff8151164d>] system_call_fast_compare_end+0x10/0x15
[<ffffffffffffffff>] 0xffffffffffffffff
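
Traces like this one can typically be captured from procfs; a minimal sketch, with 12345 standing in for the process ID:

  # Dump the current kernel stack of a process (run as root)
  cat /proc/12345/stack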

behlendorf added the Type: Performance label Sep 24, 2015
behlendorf added this to the 0.7.0 milestone Sep 24, 2015
@behlendorf
Contributor

@woffs thanks for filing this. I wasn't aware things had regressed; we'll want to git bisect this to find the offending patch.
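
A bisection between the versions named in the report might look like the sketch below (the zfs-0.6.5.1 tag name is assumed from the ZFS on Linux release scheme):

  git bisect start
  git bisect bad zfs-0.6.5.1    # first known-bad release
  git bisect good 544f71        # last known-good commit from the report
  # after each step: build and load the module, run the rsync/find
  # workload, then mark the result
  git bisect good               # or: git bisect bad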

behlendorf modified the milestones: 0.6.5.3, 0.7.0 Sep 24, 2015
behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 24, 2015
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio
based on the ZIO_PRIORITY_* flag passed in.  This had the unnoticed
side-effect of making the vdev_disk_io_start() synchronous for
certain I/Os.

This in turn allowed vdev_disk_io_start() to re-dispatch zios,
which could result in RCU stalls when a disk was removed from the
system.  Additionally, this could negatively impact performance and
may explain the performance regressions reported in both
openzfs#3829 and openzfs#3780.

This patch resolves the issue by making the blocking behavior
dependent on a 'wait' flag being passed rather than overloading
the passed bio flags.

Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to
non-rotational devices where there is no benefit to queuing to
aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3780
Issue openzfs#3829
Issue openzfs#3652
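
Since the READ_SYNC/WRITE_SYNC hint is restricted to non-rotational devices by this patch, one can check how the block layer classifies a given vdev via sysfs; a quick sketch, with sda as a placeholder device:

  # 1 = non-rotational (SSD), 0 = rotational
  cat /sys/block/sda/queue/rotational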

woffs commented Sep 25, 2015

Note: I could lift performance in my nightly rsync-find-backup scenario to almost half of the usual level by lowering the ARC to ⅓ of RAM and spawning more parallel rsync threads (6 instead of 4). Glad I did not have to drop_caches all night. ☺

I don't know whether the small improvement in my setup comes from lowering the ARC or from the added parallelism.
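
For reference, capping the ARC at one third of this machine's 96 GB RAM could be done roughly as follows (34359738368 bytes = 32 GiB; an illustrative value, not a recommendation):

  # Limit the ARC to 32 GiB on a loaded module
  echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max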

@behlendorf
Contributor

@woffs I believe patch #3833 will address this regression, and it'll be part of the next point release. If you could verify the fix, that would be appreciated.


woffs commented Sep 25, 2015

Thanks a lot. The patched module is running. In 12 hours, after the backup cycle, we will know more about performance and stability.
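
One way to confirm which module build is actually loaded is the module's version node in sysfs (assuming the standard ZFS on Linux location):

  # Print the version string of the currently loaded zfs module
  cat /sys/module/zfs/version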

behlendorf modified the milestones: 0.6.5.3, 0.6.5.2 Sep 25, 2015
behlendorf added a commit that referenced this issue Sep 25, 2015
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio
based on the ZIO_PRIORITY_* flag passed in.  This had the unnoticed
side-effect of making the vdev_disk_io_start() synchronous for
certain I/Os.

This in turn allowed vdev_disk_io_start() to re-dispatch zios,
which could result in RCU stalls when a disk was removed from the
system.  Additionally, this could negatively impact performance and
explains the performance regressions reported in both #3829 and #3780.

This patch resolves the issue by making the blocking behavior
dependent on a 'wait' flag being passed rather than overloading
the passed bio flags.

Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to
non-rotational devices where there is no benefit to queuing to
aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3652
Issue #3780
Issue #3785
Issue #3817
Issue #3821
Issue #3829
Issue #3832
Issue #3870
@behlendorf
Contributor

This is expected to be resolved by 5592404, which will be cherry-picked into the 0.6.5.2 release. If that's not the case, we can reopen this issue.


woffs commented Sep 26, 2015

Hit. Performance is great. Everything is fast.

behlendorf added a commit that referenced this issue Sep 30, 2015