-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Illumos #5244 zio pipeline callers should explicitly invoke next stage #2828
Conversation
Is there something specific holding this patch back from landing? I see that the upstream Illumos patch that @ryao was referencing has been merged. This is causing a fair number of testing failures with Lustre at OST unmount time (https://jira.hpdd.intel.com/browse/LU-5242) so it would be great to get this fixed. This is one of the last issues blocking our ability to enforce review-zfs testing for every Lustre patch submission. |
…xt stage openzfs#2828 References: https://www.illumos.org/projects/illumos-gate//issues/5244 https://reviews.csiden.org/r/119/diff/# Porting Notes: 1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c. 2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c. 3. Our changes to use the TQ_PUSHPAGE extension and vdev_file_taskq caused merge conflicts in module/zfs/vdev_file.c. The taskq import that I plan to send in the future would have prevented this particular conflict. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: Adam Leventhal ahl@delphix.com Reviewed by: Alex Reece alex.reece@delphix.com Reviewed by: Christopher Siden christopher.siden@delphix.com Ported-by: Richard Yao richard.yao@clusterhq.com
@adilger have you verified that this patch resolves your issue? I know there was some speculation that this would resolve #2523 but frankly that was just a (wrong) hunch. And in fact the real root cause of #2523 has been identified and a patch proposed. That said, the only thing holding up this patch is getting it reviewed and then merged. It's on my short list but I've been heads down on the kmem rework for the last week or so. |
Bumping to 0.6.5 this isn't critical and delaying it will all us to pull in the Illumos dependencies. |
…xt stage openzfs#2828 References: https://www.illumos.org/projects/illumos-gate//issues/5244 https://reviews.csiden.org/r/119/diff/# Porting Notes: 1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c. 2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c. 3. Our changes to use the TQ_PUSHPAGE extension and vdev_file_taskq caused merge conflicts in module/zfs/vdev_file.c. The taskq import that I plan to send in the future would have prevented this particular conflict. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: Adam Leventhal ahl@delphix.com Reviewed by: Alex Reece alex.reece@delphix.com Reviewed by: Christopher Siden christopher.siden@delphix.com Ported-by: Richard Yao richard.yao@clusterhq.com * files that needed updating: - module/zfs/vdev_disk.c - module/zfs/vdev_file.c
…nvoke next stage openzfs#2828" This reverts commit 9291279.
…xt stage openzfs#2828 References: https://www.illumos.org/projects/illumos-gate//issues/5244 https://reviews.csiden.org/r/119/diff/# Porting Notes: 1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c. 2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c. 3. Our changes to use the TQ_PUSHPAGE extension and vdev_file_taskq caused merge conflicts in module/zfs/vdev_file.c. The taskq import that I plan to send in the future would have prevented this particular conflict. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: Adam Leventhal ahl@delphix.com Reviewed by: Alex Reece alex.reece@delphix.com Reviewed by: Christopher Siden christopher.siden@delphix.com Ported-by: Richard Yao richard.yao@clusterhq.com ---------------------------------------------------------------------------------- * files that needed updating: - module/zfs/vdev_disk.c - module/zfs/vdev_file.c (2nd attempt, ~kernelOfTruth)
…nvoke next stage openzfs#2828" This reverts commit 9291279. conflicts with: commit 92119cc Author: Brian Behlendorf <behlendorf1@llnl.gov> Date: Sun Jul 13 14:35:19 2014 -0400 Mark IO pipeline with PF_FSTRANS In order to avoid deadlocking in the IO pipeline it is critical that pageout be avoided during direct memory reclaim. This ensures that the pipeline threads can always make forward progress and never end up blocking on a DMU transaction. For this very reason Linux now provides the PF_FSTRANS flag which may be set in the process context. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> not sure where that commit came from, it's not in master (?!) but commits where always fetched from master /sadpanda
Recently ran into an issue that looked to be very similar to issues in this particular pull request as well as referenced issues. Currently running Centos 6.5 Kernel 3.14.4 with ZFS 0.6.3.1 with some additonal pull requests that were recommended by @ryao including the mutex serialization fix to SPL authored by @tuxoko My system is 15VCPU with 30GB of RAM running on an AWS HVM instnace with a single 1TB device with a single Zpool with a heavy NFS workload. We were running a 320GB L2arc made up of 2x 160GB ephemeral storage devices. We were also running ZFS send and receive for replication to another machine. The initial complaint was of NFS clients being disconnected after TXG sync had gotten stuck. The dmesg output looks like this
Started working with @dweeezil and @ryao after we were able to succesfully reproduce the issue. I am happy to after numerous reproductions we applied this patch and the issue appears to be resolved at this point |
@ryao it would be great if you could rebase this on master so we could get a fresh round of tests in before it's merged. |
e81dc3d
to
8e8589a
Compare
so I may have spoken to soon on this pull request solving this issue completely. After multiple hours of running with this patch we have seen the following new issue. Apr 26 22:19:10 ip-50-0-0-122 kernel: INFO: task zfs:5492 blocked for more than 120 seconds. |
@behlendorf I have been focused on developing the ability to upstream code changes to Illumos this past week, so I have relied solely on the buildbot for testing fixes to merge conflicts in vdev_file.c. The latest push resolved the merge conflicts correctly and is largely passing the buildbot. The two failures are an issue with the Ubuntu repository and a spurious failure in ztest. I am going to rebase on latest master and do another push. Hopefully all tests will be green with no spurious failures this time. @eolson78's second issue appears to be #2060, which is resolved by #3348. |
…xt stage References: https://www.illumos.org/issues/5244 https://reviews.csiden.org/r/119/diff/# Porting Notes: 1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c. 2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c. 3. Our changes to use the TQ_PUSHPAGE extension and vdev_file_taskq caused merge conflicts in module/zfs/vdev_file.c. The taskq import that I plan to send in the future would have prevented this particular conflict. Reviewed by: Matthew Ahrens mahrens@delphix.com Reviewed by: Adam Leventhal ahl@delphix.com Reviewed by: Alex Reece alex.reece@delphix.com Reviewed by: Christopher Siden christopher.siden@delphix.com Ported-by: Richard Yao <richard.yao@clusterhq.com>
@eolson78 thanks for the follow up. This fix has now been merged. |
…next stage 5244 zio pipeline callers should explicitly invoke next stage Reviewed by: Adam Leventhal <ahl@delphix.com> Reviewed by: Alex Reece <alex.reece@delphix.com> Reviewed by: Christopher Siden <christopher.siden@delphix.com> Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Richard Elling <richard.elling@gmail.com> Reviewed by: Dan McDonald <danmcd@omniti.com> Reviewed by: Steven Hartland <killing@multiplay.co.uk> Approved by: Gordon Ross <gwr@nexenta.com> References: https://www.illumos.org/issues/5244 illumos/illumos-gate@738f37b Porting Notes: 1. The unported "2932 support crash dumps to raidz, etc. pools" caused a merge conflict due to a copyright difference in module/zfs/vdev_raidz.c. 2. The unported "4128 disks in zpools never go away when pulled" and additional Linux-specific changes caused merge conflicts in module/zfs/vdev_disk.c. Ported-by: Richard Yao <richard.yao@clusterhq.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#2828
References:
https://www.illumos.org/projects/illumos-gate//issues/5244
https://reviews.csiden.org/r/119/diff/#
Porting Notes:
merge conflict due to a copyright difference in module/zfs/vdev_raidz.c.
additional Linux-specific changes caused merge conflicts in
module/zfs/vdev_disk.c.
caused merge conflicts in module/zfs/vdev_file.c. The taskq import that
I plan to send in the future would have prevented this particular
conflict.
Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: Adam Leventhal ahl@delphix.com
Reviewed by: Alex Reece alex.reece@delphix.com
Reviewed by: Christopher Siden christopher.siden@delphix.com
Ported-by: Richard Yao richard.yao@clusterhq.com