
Host freezes under 0.6.5.1, ubuntu vivid #3832

Closed
adrienkohlbecker opened this issue Sep 24, 2015 · 3 comments

Comments

@adrienkohlbecker

Hello,

Last week I updated my server from Ubuntu 14.10 to 15.04, switching from a custom 3.16 kernel to the mainstream 3.19 kernel. As a result, ZFS was updated from 0.6.3 to 0.6.5. This was an in-place upgrade, not a reinstall from scratch.

Since then, I've been getting intermittent lockups that freeze the system completely. So far this has not happened when the server is lightly used, only while a VM and/or a Docker container is running.
In that case, I get between 30 minutes and a couple of hours of uptime before the host freezes and I have to force a reset.

I updated to 0.6.5.1 yesterday, but that did not fix the problem.
I managed to capture the following screenshots from the remote KVM, but that's all I have: the display is usually asleep when it happens, and there is nothing in syslog.

[Screenshot: remote KVM console, 2015-09-23 10:38:41]

[Screenshot: remote KVM console, 2015-09-23 10:38:50]

This is a bit beyond my expertise, but I'm willing to help debug if needed.
Otherwise, is there a way for me to install 0.6.4 or 0.6.3 while this gets sorted out? The packages have been removed from the PPA.

Best,

@behlendorf
Contributor

@adrienkohlbecker this is a duplicate of #3652 and is on the short list of things to be fixed in the next point release. As a short-term workaround, setting spl_taskq_thread_dynamic to 0 has been reported to help:

echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic

I'm not sure of the easiest way to roll back packages on Ubuntu.
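For reference, the workaround amounts to writing 0 to an SPL module parameter through sysfs. The following is a minimal Python sketch of that one-line echo, assuming root privileges and a loaded spl module. The function name disable_dynamic_taskq_threads and its root argument are hypothetical, added only so the write can be pointed at a scratch directory for testing. Like the echo command, the change is not persistent: it is lost when the spl module is reloaded or the host reboots.

```python
from pathlib import Path

# Parameter path quoted in the thread, relative to the sysfs mount point.
PARAM = Path("module/spl/parameters/spl_taskq_thread_dynamic")


def disable_dynamic_taskq_threads(root: Path = Path("/sys")) -> str:
    """Write 0 to spl_taskq_thread_dynamic and return the value read back.

    Hypothetical helper. Needs root privileges when root is /sys; the
    setting only lasts until the spl module is reloaded or the host reboots.
    """
    param = root / PARAM
    if not param.exists():
        raise FileNotFoundError(f"{param} not found; is the spl module loaded?")
    param.write_text("0\n")  # same bytes as `echo 0 > ...`
    return param.read_text().strip()
```

On an affected host, calling disable_dynamic_taskq_threads() as root should return "0" if the kernel accepted the write.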

behlendorf added this to the 0.6.5.2 milestone on Sep 24, 2015
@adrienkohlbecker
Author

Ah, good to know. I skimmed the list of issues and didn't think this was related to removing disks. I'll try the workaround, thanks!

behlendorf added a commit that referenced this issue Sep 25, 2015
Commit b39c22b set the READ_SYNC and WRITE_SYNC flags for a bio
based on the ZIO_PRIORITY_* flag passed in.  This had the unnoticed
side-effect of making vdev_disk_io_start() synchronous for
certain I/Os.

This in turn resulted in vdev_disk_io_start() being able to
re-dispatch zios, which would result in RCU stalls when a disk
was removed from the system.  Additionally, this could negatively
impact performance and explains the performance regressions reported
in both #3829 and #3780.

This patch resolves the issue by making the blocking behavior
dependent on a 'wait' flag being passed rather than overloading
the passed bio flags.

Finally, the WRITE_SYNC and READ_SYNC behavior is restricted to
non-rotational devices where there is no benefit to queuing to
aggregate the I/O.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #3652
Issue #3780
Issue #3785
Issue #3817
Issue #3821
Issue #3829
Issue #3832
Issue #3870
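To make the two behavioural changes in the commit message concrete, here is a toy Python model. It is not the ZFS code: Priority, SubmitPlan, old_plan and new_plan are hypothetical names, and the model only captures which decision depends on what. Before the fix, the sync hint derived from the priority also decided whether submission blocked; after it, blocking comes from an explicit wait flag, and the sync hint is applied only on non-rotational devices.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Priority(Enum):
    """Hypothetical stand-ins for the ZIO_PRIORITY_* classes."""
    SYNC_READ = auto()
    SYNC_WRITE = auto()
    ASYNC_READ = auto()
    ASYNC_WRITE = auto()


SYNC_PRIORITIES = {Priority.SYNC_READ, Priority.SYNC_WRITE}


@dataclass(frozen=True)
class SubmitPlan:
    sync_hint: bool  # analogous to tagging the bio READ_SYNC / WRITE_SYNC
    wait: bool       # True if the submitter blocks until the I/O completes


def old_plan(priority: Priority) -> SubmitPlan:
    """Behaviour described for b39c22b: blocking is inferred from the sync flag."""
    sync = priority in SYNC_PRIORITIES
    return SubmitPlan(sync_hint=sync, wait=sync)


def new_plan(priority: Priority, nonrotational: bool, wait: bool = False) -> SubmitPlan:
    """Behaviour described for the fix: blocking comes only from the explicit
    wait argument, and the sync hint is limited to non-rotational devices."""
    sync = priority in SYNC_PRIORITIES
    return SubmitPlan(sync_hint=sync and nonrotational, wait=wait)
```

For example, in this model a sync read on a spinning disk goes from a blocking submission with the sync hint under old_plan to a non-blocking submission without the hint under new_plan.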
@behlendorf
Contributor

Resolved by 5592404, which will be cherry-picked into the 0.6.5.2 release.

behlendorf added a commit that referenced this issue Sep 30, 2015