
Assertion failure when importing mirrored pool #604

Closed
ryao opened this issue Mar 15, 2012 · 14 comments

@ryao
Contributor

ryao commented Mar 15, 2012

I have two pools across 6 drives. One is a mirrored pool called bpool and takes 64MB of each drive. The other is a raidz2 pool called rpool and takes the remainder. Both are 1MB aligned at both their start and their end. I had just booted the system after repartitioning (and restoring a snapshot) to align rpool's end at a 1MB boundary when I encountered an assertion failure:

[ 276.990683] SPLError: 14061:0:(zio.c:2893:zio_done()) ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL) failed
[ 276.990726] SPLError: 14061:0:(zio.c:2893:zio_done()) SPL PANIC
[ 276.990751] SPL: Showing stack for process 14061
[ 276.990758] Pid: 14061, comm: z_rd_int/3 Tainted: P O 3.2.8 #12
[ 276.990763] Call Trace:
[ 276.990786] [] ? spl_debug_dumpstack+0x1d/0x40 [spl]
[ 276.990798] [] ? spl_debug_bug+0x73/0x100 [spl]
[ 276.990826] [] ? zio_worst_error+0x14f6/0x1ff0 [zfs]
[ 276.990849] [] ? zio_buf_free+0x220/0x11d0 [zfs]
[ 276.990869] [] ? zil_vdev_offline+0x2f6/0x6a0 [zfs]
[ 276.990899] [] ? vdev_queue_io_done+0x79/0x69e0 [zfs]
[ 276.990919] [] ? zio_execute+0xd7/0x8f0 [zfs]
[ 276.990932] [] ? __taskq_destroy+0xbf9/0x11d0 [spl]
[ 276.990941] [] ? __schedule+0x29e/0x740
[ 276.990950] [] ? try_to_wake_up+0x270/0x270
[ 276.990961] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 276.990972] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 276.990980] [] ? kthread+0x96/0xa0
[ 276.990989] [] ? kernel_thread_helper+0x4/0x10
[ 276.990997] [] ? kthread_worker_fn+0x180/0x180
[ 276.991033] [] ? gs_change+0xb/0xb
[ 276.991125] SPL: Dumping log to /tmp/spl-log.1331804541.14061

The contents of /tmp/spl-log.1331804541.14061 are available at the following address:

http://www.cs.stonybrook.edu/~ryao/spl-log.1331804541.14061
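For what it's worth, spelling out what the failed check asserts (a minimal sketch with stand-in types, not the real zio_t from zio.c): when a zio is its own logical I/O it is a pool-level I/O, so it should carry a block pointer and not be attached to any vdev.

    #include <assert.h>
    #include <stddef.h>

    /* Stand-in types, only to illustrate the invariant behind
     * ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL): the panic above
     * means a zio reached zio_done() looking like a logical (pool-level)
     * I/O while still attached to a vdev or lacking a block pointer. */
    typedef struct vdev vdev_t;
    typedef struct blkptr blkptr_t;

    typedef struct zio {
        vdev_t      *io_vd;         /* NULL for pool-level (logical) I/O */
        blkptr_t    *io_bp;         /* block pointer being read or written */
        struct zio  *io_logical;    /* the logical I/O this zio belongs to */
        int          io_error;      /* result of the I/O */
    } zio_t;

    /* The shape of the check that fails in zio_done() on the error path. */
    void
    done_invariant(const zio_t *zio)
    {
        if (zio->io_error && zio == zio->io_logical)
            assert(zio->io_vd == NULL && zio->io_bp != NULL);
    }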

The exact sequence of events was as follows:

  1. Restore the backup of rpool from the backup pool (external drive) after repartitioning and recreating rpool
  2. Boot the system
  3. Encounter a failure due to rpool being faulted
  4. Reboot to the initramfs rescue mode
  5. Export rpool
  6. Notice that the backup pool (no longer connected) is listed, so export it
  7. Import rpool
  8. Resume the boot sequence
  9. Log into the system
  10. Recompile the initramfs and, while it is running, export bpool
  11. Import bpool, which hangs
  12. Run a zfs command
  13. Run dmesg and see the assertion failure

I exported bpool at step 10 because I wanted to re-import it to force an update of /etc/zfs/zpool.cache for the initramfs. I have /boot symlinked to /bpool/boot, but nothing should have touched it when I did the export.

@ryao
Contributor Author

ryao commented Mar 15, 2012

I should probably add that I used the Ubuntu LiveCD with the daily PPA to do the repartitioning. I had a failure occur when trying to import rpool in order to send a snapshot to the backup pool, before I had repartitioned the disk. The dmesg output is at the following URL:

http://paste.pocoo.org/show/565855/

After the import hung, I tried running "zfs" and "zpool" once or twice, which locked each terminal. :/

@behlendorf
Contributor

Interestingly, I saw this for the first time yesterday as well; I filed it as issue #602.

@ryao
Contributor Author

ryao commented Mar 15, 2012

Do you want to keep this one open until we can verify a fix for this resolves things on my system?

With that said, I just hit this again:

[ 177.536568] SPLError: 6297:0:(zio.c:2893:zio_done()) ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL) failed
[ 177.536595] SPLError: 6297:0:(zio.c:2893:zio_done()) SPL PANIC
[ 177.536614] SPL: Showing stack for process 6297
[ 177.536616] Pid: 6297, comm: z_rd_int/1 Tainted: P O 3.2.9 #2
[ 177.536618] Call Trace:
[ 177.536628] [] ? spl_debug_dumpstack+0x1d/0x40 [spl]
[ 177.536631] [] ? spl_debug_bug+0x73/0x100 [spl]
[ 177.536642] [] ? zio_worst_error+0x14f6/0x1ff0 [zfs]
[ 177.536647] [] ? zio_buf_free+0x220/0x11d0 [zfs]
[ 177.536652] [] ? zil_vdev_offline+0x2f6/0x6a0 [zfs]
[ 177.536660] [] ? vdev_queue_io_done+0x79/0x69e0 [zfs]
[ 177.536665] [] ? zio_execute+0xd7/0x8f0 [zfs]
[ 177.536668] [] ? __taskq_destroy+0xbf9/0x11d0 [spl]
[ 177.536671] [] ? __schedule+0x29e/0x740
[ 177.536674] [] ? try_to_wake_up+0x270/0x270
[ 177.536677] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 177.536680] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 177.536682] [] ? kthread+0x96/0xa0
[ 177.536685] [] ? kernel_thread_helper+0x4/0x10
[ 177.536687] [] ? kthread_worker_fn+0x180/0x180
[ 177.536688] [] ? gs_change+0xb/0xb
[ 177.536757] SPL: Dumping log to /tmp/spl-log.1331849987.6297
[ 177.537705] SPLError: 6300:0:(zio.c:2893:zio_done()) ASSERTION(zio->io_vd == NULL && zio->io_bp != NULL) failed
[ 177.537746] SPLError: 6300:0:(zio.c:2893:zio_done()) SPL PANIC
[ 177.537771] SPL: Showing stack for process 6300
[ 177.537778] Pid: 6300, comm: z_rd_int/4 Tainted: P O 3.2.9 #2
[ 177.537783] Call Trace:
[ 177.537803] [] ? spl_debug_dumpstack+0x1d/0x40 [spl]
[ 177.537814] [] ? spl_debug_bug+0x73/0x100 [spl]
[ 177.537838] [] ? zio_worst_error+0x14f6/0x1ff0 [zfs]
[ 177.537860] [] ? zio_buf_free+0x220/0x11d0 [zfs]
[ 177.537881] [] ? zil_vdev_offline+0x2f6/0x6a0 [zfs]
[ 177.537910] [] ? vdev_queue_io_done+0x79/0x69e0 [zfs]
[ 177.537931] [] ? zio_execute+0xd7/0x8f0 [zfs]
[ 177.537943] [] ? __taskq_destroy+0xbf9/0x11d0 [spl]
[ 177.537951] [] ? __schedule+0x29e/0x740
[ 177.537959] [] ? try_to_wake_up+0x270/0x270
[ 177.537970] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 177.537981] [] ? __taskq_destroy+0x980/0x11d0 [spl]
[ 177.537988] [] ? kthread+0x96/0xa0
[ 177.537997] [] ? kernel_thread_helper+0x4/0x10
[ 177.538026] [] ? kthread_worker_fn+0x180/0x180
[ 177.538034] [] ? gs_change+0xb/0xb
[ 177.538152] SPL: Dumping log to /tmp/spl-log.1331849987.6300

I had booted using a dracut-built initramfs when trying to debug the cause of a deadlock in a genkernel-built initramfs. bpool had not been mounted by dracut, so I ran zpool import bpool, which froze. I then checked dmesg and saw a double assertion failure.

The logs are available from the following URLs:

http://www.cs.stonybrook.edu/~ryao/spl-log.1331849987.6297
http://www.cs.stonybrook.edu/~ryao/spl-log.1331849987.6300

@behlendorf
Contributor

Sure, I don't mind leaving both open, although it might be a while before I get around to looking at this; I have a lot on my plate.

@ryao
Contributor Author

ryao commented Mar 15, 2012

@behlendorf My guess is that there is a race condition that triggers this when imports are fast. Before commit 613d88e, imports were slow enough that we avoided the unsafe window of the race, but after it was committed, imports of new pools became fast enough that we hit the unsafe window far more often.

I realize this is a bit of hand-waving, but my familiarity with the codebase is not good enough to go into more detail.

@behlendorf
Contributor

You may be right to blame 613d88e, but I suspect a different reason. One accidental side effect of this change that we're just noticing: if you create a new zpool with the updated alignment over an existing zpool with the old alignment, then one of the old vdev labels will not get overwritten and will appear at the beginning of partition 9. That may cause issues during import, since it will now detect the new pool in partition 1 and the old pool in partition 9.
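To illustrate why a leftover label can end up there, here is a simplified sketch assuming the standard on-disk layout of four 256 KB vdev labels, two at each end of the device; the 64 MB partition size below is only a hypothetical example, not taken from this pool.

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified sketch of vdev label placement (assumes the standard layout:
     * four 256 KB labels, L0/L1 at the front of the device, L2/L3 at the end).
     * When a pool is recreated with different partition alignment, the old
     * trailing labels can survive beyond the new partition's end and later be
     * detected as a stale pool, e.g. at the start of partition 9. */
    #define VDEV_LABEL_SIZE (256ULL * 1024)
    #define VDEV_LABELS     4

    static uint64_t
    label_offset(uint64_t psize, int l)
    {
        /* L0, L1 live at the front; L2, L3 at the back of the device. */
        return ((uint64_t)l * VDEV_LABEL_SIZE +
            (l < VDEV_LABELS / 2 ? 0 : psize - VDEV_LABELS * VDEV_LABEL_SIZE));
    }

    int
    main(void)
    {
        uint64_t psize = 64ULL * 1024 * 1024;   /* hypothetical 64 MB partition */

        for (int l = 0; l < VDEV_LABELS; l++)
            printf("L%d at byte offset %llu\n", l,
                (unsigned long long)label_offset(psize, l));
        return (0);
    }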

@ryao
Contributor Author

ryao commented Mar 16, 2012

This probably goes without saying, but a workaround for this issue is to compile without --enable-debug. I had been compiling both SPL and ZFS with --enable-debug, discovered that Ubuntu's daily PPA was unaffected, traced the difference to --enable-debug, and worked around the issue by disabling it.

So far, there has been no ill effect from disabling the assertions, although the cause of the assertion failure needs more attention. Time permitting, I will try to investigate it, although I make no promises.
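For anyone wondering why --enable-debug makes the difference: in a debug build the assertions are real checks that panic the thread, while in a default build they compile away entirely. A rough, self-contained sketch of the idea (not the exact SPL/ZFS macros; ZFS_DEBUG here is just an illustrative compile-time switch):

    #include <stdio.h>
    #include <stdlib.h>

    /* Rough sketch of how the debug assertions behave; not the exact SPL/ZFS
     * macro definitions.  With --enable-debug the condition is evaluated and a
     * failure panics the offending thread (the SPLError / SPL PANIC output
     * above); without it the check compiles to nothing, which is why the same
     * import succeeds on a non-debug build. */
    #ifdef ZFS_DEBUG
    #define ASSERT(cond)                                                \
        do {                                                            \
            if (!(cond)) {                                              \
                fprintf(stderr, "ASSERTION(" #cond ") failed\n");       \
                abort();        /* stands in for SPL PANIC */           \
            }                                                           \
        } while (0)
    #else
    #define ASSERT(cond)    ((void)0)       /* compiled out entirely */
    #endif

    int
    main(void)
    {
        int io_error = 5;       /* pretend an I/O failed with EIO */

        /* Aborts only when built with -DZFS_DEBUG (i.e. a debug build). */
        ASSERT(io_error == 0);
        printf("io_error = %d; reached the end, so the assertion was "
            "compiled out\n", io_error);
        return (0);
    }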

@behlendorf
Contributor

Yes, by default all the assertions are disabled since they do have a negative impact on performance. I just tend to always leave them enabled when working. I doubt there will be any ill effects from this particular error, but you're right that it still needs to be explained and fixed.

@behlendorf
Contributor

@gentoofan If you still have a pool which consistently hits this issue, could you try reverting commit 2c6d0b1 and see if it resolves the problem? I suspect this patch may have had unexpected consequences.

@behlendorf
Contributor

@gentoofan Or better yet, if you still have a test case, can you try the above fix, behlendorf/zfs@1813cc0, which I'm virtually certain will resolve the issue introduced by 2c6d0b1.
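For context on what went wrong with 2c6d0b1: a recycled zio can keep a stale io_logical pointer, so a vdev-level zio may later satisfy the "zio == zio->io_logical" test in zio_done() and trip the assertion above. A hypothetical sketch of the idea behind the fix, with stand-in types and a made-up helper name (not the actual patch):

    #include <stddef.h>

    /* Stand-in types and a hypothetical helper, not the real zio code.  The
     * point: whenever a zio structure is reused, io_logical must be
     * reinitialized; otherwise a vdev-level child can inherit a stale pointer,
     * look like the logical I/O in zio_done(), and trip the assertion. */
    typedef struct vdev vdev_t;
    typedef struct blkptr blkptr_t;

    typedef struct zio {
        vdev_t      *io_vd;
        blkptr_t    *io_bp;
        struct zio  *io_logical;
    } zio_t;

    void
    zio_reuse_init(zio_t *zio, vdev_t *vd, blkptr_t *bp, zio_t *parent_logical)
    {
        zio->io_vd = vd;
        zio->io_bp = bp;
        zio->io_logical = NULL;     /* never carry over a stale pointer */

        if (vd == NULL && bp != NULL)
            zio->io_logical = zio;              /* pool-level: its own logical */
        else if (parent_logical != NULL)
            zio->io_logical = parent_logical;   /* child: inherit from parent */
    }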

@ryao
Contributor Author

ryao commented Mar 21, 2012

I tried behlendorf/zfs@1813cc0, which works with --disable-debug, but if I compile with --enable-debug, the system will hang when attempting to load the zfs module. This is during the early boot process. I am booting off a ZFS rootfs.

I will try reverting 2c6d0b1 tomorrow.

@ryao
Contributor Author

ryao commented Mar 21, 2012

I tried reverting 2c6d0b1 and the result is the same as when I had applied behlendorf/zfs@1813cc0. I am not sure what is causing this and unfortunately, I have no way of finding out.

@behlendorf
Contributor

So with --enable-debug and the patch applied the system just hangs. That's odd; presumably this only occurs with the test case pool?

behlendorf added a commit to behlendorf/zfs that referenced this issue Mar 21, 2012
This patch was slightly flawed and allowed for zio->io_logical
to potentially not be reinitialized for a new zio.  This could
lead to assertion failures in specific cases when debugging is
enabled (--enable-debug) and I/O errors are encountered.  It
may also have caused problems when issuing logical I/Os.

Since we want to make sure this workaround can be easily removed
in the future (when we have the real fix), I'm reverting this
change and applying a new version of the patch which includes
the zio->io_logical fix.

This reverts commit 2c6d0b1.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#602
Issue openzfs#604
@behlendorf
Contributor

Closing this issue; I'm quite sure this fixes the described assertion. If you can reproduce a new failure mode, let's open a new issue to track it.

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this issue Sep 26, 2023

Merge remote-tracking branch '6.0/stage' into 'master'