poor lstat and rename performance - dirent cache congestion? #3829
Comments
A stack trace of a perl process mainly renaming lots of directory entries:
@woffs thanks for filing this. I wasn't aware things had regressed; we'll want to git bisect this to find the offending patch.
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side effect of making vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's, which would result in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and may explain the performance regressions reported in both openzfs#3829 and openzfs#3780.

This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3780
Issue openzfs#3829
Issue openzfs#3652
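For illustration, here is a minimal, self-contained C sketch of the pattern that commit message describes. It is not the actual vdev_disk.c code; the types and submit helpers are made-up stand-ins.

```c
/* bio_wait_sketch.c -- illustrative only, NOT the real OpenZFS vdev_disk.c.
 * Shows the idea from the commit message: whether a submission blocks is
 * decided by an explicit 'wait' argument instead of being inferred from
 * SYNC flags on the bio, and the SYNC hint is only applied on
 * non-rotational devices, where skipping I/O aggregation costs nothing. */
#include <stdbool.h>
#include <stdio.h>

enum io_dir { IO_READ, IO_WRITE };

struct sketch_bio {
    enum io_dir   dir;
    unsigned long flags;              /* base op plus optional SYNC hint */
};

#define SKETCH_SYNC (1UL << 0)

/* Stand-ins for the block-layer entry points. */
static void submit_bio_async(struct sketch_bio *bio)
{
    printf("async submit, dir=%d, sync hint=%lu\n",
           bio->dir, bio->flags & SKETCH_SYNC);
}

static void submit_bio_and_wait(struct sketch_bio *bio)
{
    printf("blocking submit, dir=%d, sync hint=%lu\n",
           bio->dir, bio->flags & SKETCH_SYNC);
}

static void vdev_submit_sketch(struct sketch_bio *bio, bool rotational,
                               bool wait)
{
    /* Only hint "synchronous" to the block layer on non-rotational
     * devices, where there is no benefit to queuing for aggregation. */
    if (!rotational)
        bio->flags |= SKETCH_SYNC;

    if (wait)
        submit_bio_and_wait(bio);     /* caller explicitly asked to block */
    else
        submit_bio_async(bio);        /* default: never block the pipeline */
}

int main(void)
{
    struct sketch_bio a = { IO_WRITE, 0 };
    struct sketch_bio b = { IO_READ,  0 };

    vdev_submit_sketch(&a, /*rotational=*/true,  /*wait=*/false);
    vdev_submit_sketch(&b, /*rotational=*/false, /*wait=*/true);
    return 0;
}
```

The key point is that blocking becomes an explicit decision of the caller, so setting SYNC hints on a bio can no longer accidentally turn the submission path synchronous.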
Note: I could lift the performance in my nightly rsync-find-backup scenario to almost half of the usual level by lowering the ARC to ⅓ of RAM and spawning more parallel rsync threads (6 instead of 4). Glad I didn't have to drop_caches all night. ☺ I don't know whether the small improvement in my setup comes from lowering the ARC or from parallelizing.
Thanks a lot. The patched module is running. In 12 hours, after the backup cycle, we will know more about performance and stability.
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio based on the ZIO_PRIORITY_* flag passed in. This had the unnoticed side effect of making vdev_disk_io_start() synchronous for certain I/Os. This in turn resulted in vdev_disk_io_start() being able to re-dispatch zio's, which would result in RCU stalls when a disk was removed from the system. Additionally, this could negatively impact performance and explains the performance regressions reported in both #3829 and #3780.

This patch resolves the issue by making the blocking behavior dependent on a 'wait' flag being passed rather than overloading the passed bio flags. Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to non-rotational devices where there is no benefit to queuing to aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3652
Issue #3780
Issue #3785
Issue #3817
Issue #3821
Issue #3829
Issue #3832
Issue #3870
This is expected to be resolved by 5592404, which will be cherry-picked into the 0.6.5.2 release. If that's not the case, we can reopen this issue.
Hit. Performance is great. Everything is fast.
Symptoms:

- very poor lstat and rename performance when working through lots of directories (find, rsync, backup scenario)
- nothing suspicious around txg_sync; everything looks fine in that corner
My system:
After doing vm.drop_caches=2 (which apparently clears the ARC), or after setting zfs_arc_meta_limit and zfs_arc_max to larger values, performance is restored for a short time (until some cache is filled up again). Interestingly, the ARC does not need to be anywhere near arc_meta_limit or c_max, respectively, for the performance to get degraded.
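For reference, here is a minimal C sketch of that temporary workaround. It assumes the ZFS on Linux module parameters are exposed under /sys/module/zfs/parameters/ (as on 0.6.x) and uses arbitrary placeholder limits; in practice the same thing is normally done by echoing the values from a root shell.

```c
/* arc_workaround_sketch.c -- rough sketch of the temporary workaround
 * described above (drop caches, or raise the ARC limits); not a
 * recommended tool.  Paths assume a Linux host with the ZFS module
 * loaded; the limit values are arbitrary placeholders.  Run as root. */
#include <stdio.h>
#include <stdlib.h>

static int write_string(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        perror(path);
        return -1;
    }
    if (fprintf(f, "%s\n", value) < 0) {
        perror(path);
        fclose(f);
        return -1;
    }
    return fclose(f);
}

int main(void)
{
    /* drop_caches=2 frees reclaimable slab objects (dentries, inodes);
     * per the report above this apparently clears the ARC as well. */
    if (write_string("/proc/sys/vm/drop_caches", "2") != 0)
        return EXIT_FAILURE;

    /* Alternatively, raise the ARC limits instead of dropping caches.
     * 8 GiB / 6 GiB below are placeholders -- size them for your host. */
    write_string("/sys/module/zfs/parameters/zfs_arc_max", "8589934592");
    write_string("/sys/module/zfs/parameters/zfs_arc_meta_limit", "6442450944");

    return EXIT_SUCCESS;
}
```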
Setting primarycache=metadata brings no mitigation, and setting zfs_arc_meta_strategy=0 does not help. Downgrading to Linux 3.2.68-1+deb7u3 + zfs 0.6.4-16-544f71-wheezy brings back very good performance.

The (or a similar) problem must have been introduced a few commits after 544f71 and apparently was not fully resolved in 0.6.5.1.
Perhaps related to