Assertion failure when importing mirrored pool #604
Comments
I should probably add that I used the Ubuntu LiveCD with the daily PPA to do repartitioning. I had a failure occur when trying to import rpool in order to send a snapshot to the backup pool before I had repartitioned the disk. The dmesg output was at the following url: http://paste.pocoo.org/show/565855/ After the import hung, I tried running "zfs" and "zpool" once or twice, which locked each terminal. :/
Interestingly, I saw this for the first time yesterday as well; I filed it as issue #602.
Do you want to keep this one open until we can verify a fix for this resolves things on my system? With that said, I just hit this again: [ 177.536568] SPLError: 6297:0:(zio.c:2893:zio_done()) ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL) failed I had booted using a dracut-built initramfs when trying to debug the cause of a deadlock in a genkernel-built initramfs. bpool had not been mounted by dracut, so I ran the relevant commands. The logs are available from the following urls: http://www.cs.stonybrook.edu/~ryao/spl-log.1331849987.6297
Sure, I don't mind leaving both open, although it might be a while before I get around to looking at this. I have a lot on my plate.
@behlendorf My guess is that there is a race condition that triggers this when imports are fast. Before commit 613d88e, imports were slow enough that we avoided the unsafe region of the race condition, but after it was committed, imports into new pools were made fast enough that we enter the unsafe region with much greater certainty. I realize this is a bit of hand-waving, but my familiarity with the codebase is not good enough to go into more detail.
You may be right about blaming 613d88e, but I suspect different reasons. One accidental side effect of this change we're just noticing is that if you create a new zpool with the updated alignment over an existing zpool with the old alignment, then one of the old vdev labels will not get overwritten and will appear at the beginning of partition 9. That may cause issues during import, since it will now detect the new pool in partition one and the old pool in partition 9.
This probably goes without saying, but a workaround for this issue is to compile without --enable-debug. I was previously compiling both SPL and ZFS with --enable-debug, but I discovered that Ubuntu's daily PPA was unaffected, found the difference to be --enable-debug, and worked around the issue by disabling it. So far, there has been no ill effect from disabling the assertions, although the cause of the assertion failure needs more attention. Time permitting, I will try to investigate it, although I make no promises.
Yes, by default all the assertions are disabled since they do have a negative impact on performance. I just tend to always leave them enabled when working. I doubt there will be any ill effects from this particular error, but you're right, it still needs to be explained and fixed.
@gentoofan If you still have a pool which consistently hits this issue, could you try reverting commit 2c6d0b1 and see if that resolves the problem? I suspect this patch may have had unexpected consequences.
@gentoofan Or better yet, if you still have a test case, can you try the above fix, behlendorf/zfs@1813cc0, which I'm virtually certain will resolve the issue which was introduced by 2c6d0b1.
I tried behlendorf/zfs@1813cc0, which works with --disable-debug, but if I compile with --enable-debug, the system will hang when attempting to load the zfs module. This is during the early boot process. I am booting off a ZFS rootfs. I will try reverting 2c6d0b1 tomorrow.
I tried reverting 2c6d0b1 and the result is the same as when I had applied behlendorf/zfs@1813cc0. I am not sure what is causing this and unfortunately, I have no way of finding out.
With --enable-debug and the patch applied, the system just hangs for the test case pool. That's odd; presumably this only occurs for the test case pool?
This patch was slightly flawed and allowed zio->io_logical to potentially not be reinitialized for a new zio. This could lead to assertion failures in specific cases when debugging is enabled (--enable-debug) and I/O errors are encountered. It may also have caused problems when issuing logical I/Os. Since we want to make sure this workaround can be easily removed in the future (when we have the real fix), I'm reverting this change and applying a new version of the patch which includes the zio->io_logical fix. This reverts commit 2c6d0b1. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs#602 Issue openzfs#604
Closing issue, I'm quite sure this fixes the described assertion. If you can reproduce a new failure mode, let's open a new issue to track it.
I have two pools across 6 drives. One is a mirrored pool called bpool and takes 64MB of each drive. The other is a raidz2 pool called rpool and takes the remainder. Both are 1MB aligned at both their start and their end. I had just booted the system after repartitioning (and restoring a snapshot) to align rpool's end at a 1MB boundary when I encountered an assertion failure:
[ 276.990683] SPLError: 14061:0:(zio.c:2893:zio_done()) ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL) failed
[ 276.990726] SPLError: 14061:0:(zio.c:2893:zio_done()) SPL PANIC
[ 276.990751] SPL: Showing stack for process 14061
[ 276.990758] Pid: 14061, comm: z_rd_int/3 Tainted: P O 3.2.8 #12
[ 276.990763] Call Trace:
[ 276.990786] [] ? spl_debug_dumpstack+0x1d/0x40 [spl]
[ 276.990798] [] ? spl_debug_bug+0x73/0x100 [spl]
[ 276.990826] [] ? zio_worst_error+0x14f6/0x1ff0 [zfs]
[ 276.990849] [] ? zio_buf_free+0x220/0x11d0 [zfs]
[ 276.990869] [] ? zil_vdev_offline+0x2f6/0x6a0 [zfs]
[ 276.990899] [] ? vdev_queue_io_done+0x79/0x69e0 [zfs]
[ 276.990919] [] ? zio_execute+0xd7/0x8f0 [zfs]
[ 276.990932] [] ? __taskq_destroy+0xbf9/0x11d0 [spl]
[ 276.990941] [] ? __schedule+0x29e/0x740
[ 276.990950] [] ? try_to_wake_up+0x270/0x270
[ 276.990961] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 276.990972] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 276.990980] [] ? kthread+0x96/0xa0
[ 276.990989] [] ? kernel_thread_helper+0x4/0x10
[ 276.990997] [] ? kthread_worker_fn+0x180/0x180
[ 276.991033] [] ? gs_change+0xb/0xb
[ 276.991125] SPL: Dumping log to /tmp/spl-log.1331804541.14061
The contents of /tmp/spl-log.1331804541.14061 are available at the following address:
http://www.cs.stonybrook.edu/~ryao/spl-log.1331804541.14061
The exact sequence of events was as follows: I ran the zfs command, then dmesg to see the assertion failure. I had exported bpool at step 9 because I wanted to import it to force an update of /etc/zfs/zpool.cache for the initramfs. I have /boot symlinked to /bpool/boot, but nothing should have touched it when I had done the export.