Unable to import pool, mirror pool gets corrupted #10942

Closed
arturpzol opened this issue Sep 17, 2020 · 20 comments
Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)


@arturpzol

System information

Type Version/Name
Distribution Name ALL
Distribution Version N/A
Linux Kernel 4.19, 5.7
Architecture x86_64
ZFS Version 0.8.3, 2.0.0-rc1_71_g51de0fc
SPL Version 0.8.3, 2.0.0-rc1_71_g51de0fc

Describe the problem you're observing

I have a mirrored pool created from local and remote disks. The remote disks are connected via iSCSI (Open-iSCSI as the initiator and LIO as the target), and the nodes are configured in a cluster. When the node that has the pool active is forcibly shut down, the pool sometimes cannot be imported on the second node:

zpool import -d /dev/disk/by-id/ 
   pool: Pool-0
     id: 16191863491827003787
  state: FAULTED
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-EY
 config:

        Pool-0                                FAULTED  corrupted data
          mirror-0                            DEGRADED
            scsi-SQEMU_QEMU_HARDDISK_47136-6  UNAVAIL
            scsi-SQEMU_QEMU_HARDDISK_47135-2  ONLINE
          mirror-1                            FAULTED  corrupted data
            scsi-SQEMU_QEMU_HARDDISK_47135-5  ONLINE
            scsi-SQEMU_QEMU_HARDDISK_47136-2  UNAVAIL
          mirror-2                            DEGRADED
            scsi-SQEMU_QEMU_HARDDISK_47136-5  UNAVAIL
            scsi-SQEMU_QEMU_HARDDISK_47135-1  ONLINE
          mirror-3                            DEGRADED
            scsi-SQEMU_QEMU_HARDDISK_47135-6  ONLINE
            scsi-SQEMU_QEMU_HARDDISK_47136-4  UNAVAIL
          mirror-4                            DEGRADED
            scsi-SQEMU_QEMU_HARDDISK_47135-4  ONLINE
            scsi-SQEMU_QEMU_HARDDISK_47136-3  UNAVAIL
          mirror-5                            DEGRADED
            scsi-SQEMU_QEMU_HARDDISK_47136-1  UNAVAIL
            scsi-SQEMU_QEMU_HARDDISK_47135-3  ONLINE

zpool import -d /dev/disk/by-id/ Pool-0 -f  
internal error: Invalid exchange
Aborted

When the node that was shut down is powered back on and the pool is imported with all disks present, everything works correctly (the pool can be imported).

I have tested several kernels (4.19, 5.7) and ZFS versions (0.8.3, 2.0.0-rc1_71_g51de0fc), on real hardware and in virtual machines, and in all cases the corruption can be reproduced. On ZFS 0.7 with kernel 4.4 I cannot reproduce this issue.

One ZFS parameter that might have an impact on this, zfs_nocacheflush, is set to 0:

cat /sys/module/zfs/parameters/zfs_nocacheflush 
0

Write cache (WCE bit) on all disks is disabled:

...
[  108.809737] sd 6:0:0:0: [sdk] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  108.819202] sd 5:0:0:0: [sdl] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  108.824804] sd 4:0:0:0: [sdm] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
...
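
For reference, the WCE bit can also be queried directly with sdparm, assuming that utility is installed (the device name is just an example):

# query the Write Cache Enable (WCE) bit from the SCSI caching mode page
sdparm --get=WCE /dev/sdk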

It looks like the mirror vdevs are not fully synchronized - is that possible?

I have a 100% reproducible scenario, so I can debug it more deeply - please suggest anything I should look at.

Describe how to reproduce the problem

Include any warning/errors/backtraces from the system logs


zdb -e 16191863491827003787

Configuration for import:
        vdev_children: 6
        version: 5000
        pool_guid: 16191863491827003787
        name: 'Pool-0'
        state: 0
        hostid: 1992193376
        hostname: 'node-76be7560'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 16191863491827003787
            children[0]:
                type: 'mirror'
                id: 0
                guid: 5605788562835368998
                metaslab_array: 97
                metaslab_shift: 29
                ashift: 12
                asize: 21459632128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 7178498062918981115
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47136-6-part1'
                    phys_path: 'ip-15.15.15.23:3261-iscsi-sdb-lun-0'
                    whole_disk: 1
                    DTL: 225
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 16591098662696590643
                    whole_disk: 1
                    DTL: 150
                    create_txg: 4
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-2-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47135-2-part1'
                    phys_path: 'pci-0000:00:06.0-scsi-0:0:0:2'
            children[1]:
                type: 'mirror'
                id: 1
                guid: 282143093284297119
                metaslab_array: 92
                metaslab_shift: 29
                ashift: 12
                asize: 21459632128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 6472600858552425507
                    whole_disk: 1
                    DTL: 151
                    create_txg: 4
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-5-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47135-5-part1'
                    phys_path: 'pci-0000:00:06.0-scsi-0:0:0:5'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 17113231675049857162
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47136-2-part1'
                    phys_path: 'ip-15.15.15.23:3261-iscsi-sdf-lun-0'
                    whole_disk: 1
                    DTL: 230
                    create_txg: 4
            children[2]:
                type: 'mirror'
                id: 2
                guid: 9984053040191654874
                metaslab_array: 91
                metaslab_shift: 29
                ashift: 12
                asize: 21459632128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 15526353223082081112
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47136-5-part1'
                    phys_path: 'ip-15.15.15.23:3261-iscsi-sdc-lun-0'
                    whole_disk: 1
                    DTL: 229
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 2300092093248624877
                    whole_disk: 1
                    DTL: 193
                    create_txg: 4
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47135-1-part1'
                    phys_path: 'pci-0000:00:06.0-scsi-0:0:0:1'
            children[3]:
                type: 'mirror'
                id: 3
                guid: 17574792320296115157
                metaslab_array: 90
                metaslab_shift: 29
                ashift: 12
                asize: 21459632128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 3766421981009873448
                    whole_disk: 1
                    DTL: 189
                    create_txg: 4
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-6-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47135-6-part1'
                    phys_path: 'pci-0000:00:06.0-scsi-0:0:0:6'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 13185060219019474904
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47136-4-part1'
                    phys_path: 'ip-15.15.15.23:3261-iscsi-sdd-lun-0'
                    whole_disk: 1
                    DTL: 228
                    create_txg: 4
            children[4]:
                type: 'mirror'
                id: 4
                guid: 15735968152016949572
                metaslab_array: 89
                metaslab_shift: 29
                ashift: 12
                asize: 21459632128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 13836446814032290996
                    whole_disk: 1
                    DTL: 186
                    create_txg: 4
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47135-4-part1'
                    phys_path: 'pci-0000:00:06.0-scsi-0:0:0:4'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 12366175748760901668
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47136-3-part1'
                    phys_path: 'ip-15.15.15.23:3261-iscsi-sde-lun-0'
                    whole_disk: 1
                    DTL: 227
                    create_txg: 4
            children[5]:
                type: 'mirror'
                id: 5
                guid: 36146539366144861
                metaslab_array: 84
                metaslab_shift: 29
                ashift: 12
                asize: 21459632128
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 1483558466681550060
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47136-1-part1'
                    phys_path: 'ip-15.15.15.23:3261-iscsi-sdg-lun-0'
                    whole_disk: 1
                    DTL: 226
                    create_txg: 4
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 18020501035025221906
                    whole_disk: 1
                    DTL: 182
                    create_txg: 4
                    path: '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-3-part1'
                    devid: 'scsi-SQEMU_QEMU_HARDDISK_47135-3-part1'
                    phys_path: 'pci-0000:00:06.0-scsi-0:0:0:3'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2
zdb: can't open '16191863491827003787': Invalid exchange

ZFS_DBGMSG(zdb) START:
spa.c:5490:spa_import(): spa_import: importing 16191863491827003787
spa_misc.c:408:spa_load_note(): spa_load(16191863491827003787, config trusted): LOADING
vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa 16191863491827003787. txg 2222
spa_misc.c:408:spa_load_note(): spa_load(16191863491827003787, config untrusted): using uberblock with txg=2222
vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
spa_misc.c:393:spa_load_failed(): spa_load(16191863491827003787, config trusted): FAILED: vdev_load failed [error=52]
spa_misc.c:408:spa_load_note(): spa_load(16191863491827003787, config trusted): UNLOADING
ZFS_DBGMSG(zdb) END
cat /proc/spl/kstat/zfs/dbgmsg

1600344485   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344486   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344487   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344488   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344489   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344490   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344492   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344493   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344494   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344495   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344496   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344498   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000000299/1000000000
1600344498   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000034646/1000000000
1600344498   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000040601/1000000000
1600344498   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000052292/1000000000
1600344498   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000003818/1000000000
1600344498   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000001729/1000000000
1600344498   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344498   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344498   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344498   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
1600344498   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): spa_load_retry: rewind, max txg: 2220
1600344498   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): LOADING
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000009622/1000000000
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000008885/1000000000
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000004451/1000000000
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000004366/1000000000
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000002216/1000000000
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000000178/1000000000
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344499   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config untrusted): using uberblock with txg=2220
1600344499   spa.c:2566:spa_activity_check_duration(): fail_intvals=0 import_delay=11035437170 ub_mmp mmp_interval=1000 ub_mmp_delay=103543717 import_intervals=10
1600344499   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344501   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344502   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344503   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344504   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344505   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344506   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344508   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344509   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344510   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344511   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344512   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344514   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000029411/1000000000
1600344514   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000008779/1000000000
1600344514   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000033282/1000000000
1600344514   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000020355/1000000000
1600344514   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000003404/1000000000
1600344514   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000001911/1000000000
1600344514   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344514   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344514   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344514   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
1600344514   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): spa_load_retry: rewind, max txg: 2219
1600344514   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): LOADING
1600344515   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000043111/1000000000
1600344515   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000146774/1000000000
1600344515   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000221573/1000000000
1600344515   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000378374/1000000000
1600344515   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000008859/1000000000
1600344515   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000000670/1000000000
1600344516   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344516   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config untrusted): using uberblock with txg=2219
1600344516   spa.c:2566:spa_activity_check_duration(): fail_intvals=0 import_delay=11229268880 ub_mmp mmp_interval=1000 ub_mmp_delay=122926888 import_intervals=10
1600344516   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344517   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344518   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344519   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344520   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344521   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344522   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344524   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344525   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344526   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344527   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344528   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344530   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000018064/1000000000
1600344530   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000196572/1000000000
1600344530   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000214534/1000000000
1600344530   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000201421/1000000000
1600344530   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000020634/1000000000
1600344530   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000002988/1000000000
1600344530   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344530   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344530   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344530   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
1600344531   spa.c:5638:spa_tryimport(): spa_tryimport: importing Pool-0
1600344531   spa_misc.c:408:spa_load_note(): spa_load($import, config trusted): LOADING
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000003614/1000000000
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000294689/1000000000
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000006579/1000000000
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000004015/1000000000
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000003699/1000000000
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000001092/1000000000
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344532   spa_misc.c:408:spa_load_note(): spa_load($import, config untrusted): using uberblock with txg=2222
1600344532   spa.c:2566:spa_activity_check_duration(): fail_intvals=0 import_delay=11020506040 ub_mmp mmp_interval=1000 ub_mmp_delay=102050604 import_intervals=10
1600344532   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344533   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344534   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344535   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344536   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344537   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344539   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344540   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344541   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344542   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344543   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344544   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344545   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa $import. txg 2222
1600344547   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000007030/1000000000
1600344547   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000068439/1000000000
1600344547   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000064695/1000000000
1600344547   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000176550/1000000000
1600344547   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000002782/1000000000
1600344547   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000000129/1000000000
1600344548   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344548   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344548   spa_misc.c:393:spa_load_failed(): spa_load($import, config trusted): FAILED: vdev_load failed [error=52]
1600344548   spa_misc.c:408:spa_load_note(): spa_load($import, config trusted): UNLOADING
1600344548   spa.c:5490:spa_import(): spa_import: importing Pool-0
1600344548   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): LOADING
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000002579/1000000000
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000004640/1000000000
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000020853/1000000000
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000150143/1000000000
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000363331/1000000000
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000001486/1000000000
1600344549   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2222
1600344549   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config untrusted): using uberblock with txg=2222
1600344550   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000001888/1000000000
1600344550   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000177254/1000000000
1600344550   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000129735/1000000000
1600344550   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000183481/1000000000
1600344550   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000000879/1000000000
1600344550   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000001828/1000000000
1600344550   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344550   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344550   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344550   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
1600344550   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): spa_load_retry: rewind, max txg: 2221
1600344550   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): LOADING
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000003879/1000000000
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000000285/1000000000
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000013811/1000000000
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000010647/1000000000
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000002844/1000000000
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000000298/1000000000
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344551   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config untrusted): using uberblock with txg=2221
1600344551   spa.c:2566:spa_activity_check_duration(): fail_intvals=0 import_delay=11065171540 ub_mmp mmp_interval=1000 ub_mmp_delay=106517154 import_intervals=10
1600344551   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344552   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344554   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344555   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344556   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344557   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344558   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344559   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344560   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344561   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344563   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344564   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344565   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-1-part1': best uberblock found for spa Pool-0. txg 2221
1600344567   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000004114/1000000000
1600344567   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000030912/1000000000
1600344567   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000075731/1000000000
1600344567   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000008639/1000000000
1600344567   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000000305/1000000000
1600344567   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000011433/1000000000
1600344567   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344567   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344567   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344567   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
1600344567   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): spa_load_retry: rewind, max txg: 2220
1600344567   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): LOADING
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000010132/1000000000
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000002113/1000000000
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000002244/1000000000
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000001101/1000000000
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000001364/1000000000
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000000065/1000000000
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344568   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config untrusted): using uberblock with txg=2220
1600344568   spa.c:2566:spa_activity_check_duration(): fail_intvals=0 import_delay=11035437170 ub_mmp mmp_interval=1000 ub_mmp_delay=103543717 import_intervals=10
1600344568   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344569   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344571   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344572   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344573   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344574   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344575   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344576   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344577   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344578   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344579   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344581   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2220
1600344583   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000026888/1000000000
1600344583   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000149610/1000000000
1600344583   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000182842/1000000000
1600344583   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000195391/1000000000
1600344583   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000009413/1000000000
1600344583   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000005239/1000000000
1600344583   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344583   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344583   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344583   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
1600344583   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): spa_load_retry: rewind, max txg: 2219
1600344583   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): LOADING
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1000009625/1000000000
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1000002502/1000000000
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1000014042/1000000000
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1000000477/1000000000
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1000003089/1000000000
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1000000969/1000000000
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344584   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config untrusted): using uberblock with txg=2219
1600344584   spa.c:2566:spa_activity_check_duration(): fail_intvals=0 import_delay=11229268880 ub_mmp mmp_interval=1000 ub_mmp_delay=122926888 import_intervals=10
1600344584   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344585   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344586   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344588   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344589   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344590   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344591   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344592   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344593   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344594   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344595   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47135-4-part1': best uberblock found for spa Pool-0. txg 2219
1600344597   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-6-part1': open error=2 timeout=1005886511/1000000000
1600344597   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-5-part1': open error=2 timeout=1008791787/1000000000
1600344597   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-1-part1': open error=2 timeout=1006079848/1000000000
1600344597   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-3-part1': open error=2 timeout=1008842173/1000000000
1600344597   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-2-part1': open error=2 timeout=1006176490/1000000000
1600344597   vdev.c:125:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47136-4-part1': open error=2 timeout=1008848700/1000000000
1600344598   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): unable to read the metaslab array [error=52]
1600344598   vdev.c:130:vdev_dbgmsg(): mirror-1 vdev (guid 282143093284297119): vdev_load: metaslab_init failed [error=52]
1600344598   spa_misc.c:393:spa_load_failed(): spa_load(Pool-0, config trusted): FAILED: vdev_load failed [error=52]
1600344598   spa_misc.c:408:spa_load_note(): spa_load(Pool-0, config trusted): UNLOADING
@arturpzol added the "Status: Triage Needed" and "Type: Defect" labels on Sep 17, 2020
@rincebrain
Contributor

Is it always the same vdev (e.g. mirror-1) that reports corruption when this happens?
If you cleanly export the pool on the one host, can you cleanly import it on the other?
If you bring up the host that was forcibly shut down and do the import, does it report having to repair any data on the disk that the other server reported as corrupted?

Is your setup that each of the two servers has one local and one remote disk in each vdev, so when you forcibly power cycle the one hosting the pool, the "remote" disks all become unavailable?

In general (note that this is my vague understanding and may be wrong), I would expect writes to be considered "succeeded" when they're written to all disks in a mirror vdev, so getting corruption on only one disk of a pair surprises me somewhat, depending on how "forced" the shutdown was. I'd further expect any sort of mangling to be limited to "oops, we're rolling back to the last stable transaction on disk", not "ooh, this disk is mangled, I can't make progress."

It looks like the error you're getting, 52/EBADE, is used internally to mean checksum error, and the log says it tried rolling back several transaction groups and still got checksum errors trying to load the metaslab list from that disk.
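
As a quick sanity check of that errno mapping (assuming python3 is available): errno 52 on Linux is EBADE, which is exactly the "Invalid exchange" string that zpool and zdb print:

python3 -c 'import errno, os; print(errno.errorcode[52], "-", os.strerror(52))'
# EBADE - Invalid exchange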

@arturpzol
Author

@rincebrain

Is it always the same vdev (e.g. mirror-1) that reports corruption when this happens?

It differs - sometimes it is mirror-1, sometimes mirror-3, seemingly at random.

If you cleanly export the pool on the one host, can you cleanly import it on the other?

Yes

If you bring up the host that was forcibly shut down and do the import, does it report having to repair any data on the disk that the other server reported as corrupted?

Yes, a scrub repairs some errors on those disks.
Additionally, if I unplug the Ethernet cable (so iSCSI cannot connect and each host sees only its local disks), I can import the pool on the node that was forcibly shut down but cannot import it on the node that stayed online, which indicates that the mirror is not fully synchronized.

Is your setup that each of the two servers has one local and one remote disk in each vdev, so when you forcibly power cycle the one hosting the pool, the "remote" disks all become unavailable?

Yes, each mirror vdev is created from one local and one remote disk.

I think it is quite important that on ZFS 0.7 with kernel 4.4 I cannot reproduce this corruption :(

@arturpzol
Author

I have tested the ZFS 0.7.13 release and it seems the corruption does not happen on that version. The first ZFS version where the corruption occurs is ZFS 0.8-rc1.

@behlendorf
Contributor

@arturpzol I suspect this may be your issue. When the device doesn't support a write cache ZFS won't issue cache flush commands since it was told there's no cache to flush.

[  108.809737] sd 6:0:0:0: [sdk] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA

However, it sounds like some iSCSI targets may misreport this. ZFS will only disable cache flushes when it issues a flush to the device and the device returns a "not supported" error. I'm not sure exactly what LIO does, but it's possible that with the 4.4 kernel it accepts the command even though it claims not to support it. I'd suggest testing zfs-0.8.4 using the 4.4 kernel if you haven't already.

If you're comfortable building from source, you can apply the following patch to log a console message when ZFS determines that a device doesn't support cache flushing and disables flushes for it.

diff --git a/module/os/linux/zfs/vdev_disk.c b/module/os/linux/zfs/vdev_disk.c
index e6e7df3..f99baa2 100644
--- a/module/os/linux/zfs/vdev_disk.c
+++ b/module/os/linux/zfs/vdev_disk.c
@@ -621,8 +621,10 @@ BIO_END_IO_PROTO(vdev_disk_io_flush_completion, bio, error)
        zio->io_error = -error;
 #endif
 
-       if (zio->io_error && (zio->io_error == EOPNOTSUPP))
+       if (zio->io_error && (zio->io_error == EOPNOTSUPP)) {
+               printk(KERN_WARNING "ZFS: Disabling cache flushes");
                zio->io_vd->vdev_nowritecache = B_TRUE;
+       }
 
        bio_put(bio);
        ASSERT3S(zio->io_error, >=, 0);
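
With that patch applied, the message (if it ever fires) should show up in the kernel log, for example:

dmesg | grep "ZFS: Disabling cache flushes"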

@behlendorf removed the "Status: Triage Needed" label on Sep 23, 2020
@arturpzol
Author

@behlendorf I have tested:

  • ZFS 0.7.13 with kernel 4.19 - corruption does not occur
  • ZFS 0.8.4 with kernel 4.4 - corruption occurs
  • ZFS 0.8-rc1 with kernel 4.19 - corruption occurs
  • ZFS 0.8.3 with kernel 4.19 - corruption occurs
  • ZFS 0.8.3 with kernel 5.7 - corruption occurs
  • ZFS 2.0.0-rc1_71_g51de0fc with kernel 4.19 - corruption occurs

so I think the kernel version doesn't matter.

Additionally, cache flushes are enabled in all of my environments. I took the liberty of modifying your patch:

Index: module/zfs/vdev_disk.c
===================================================================
--- module/zfs/vdev_disk.c      (revision 56008)
+++ module/zfs/vdev_disk.c      (working copy)
@@ -638,8 +638,12 @@
        zio->io_error = -error;
 #endif
 
-       if (zio->io_error && (zio->io_error == EOPNOTSUPP))
+       if (zio->io_error && (zio->io_error == EOPNOTSUPP)){
+               printk(KERN_WARNING "ZFS: Disabling cache flushes");
                zio->io_vd->vdev_nowritecache = B_TRUE;
+        } else {
+               printk(KERN_WARNING "ZFS: Cache flushes enabled");
+        }
 
        bio_put(bio);
        ASSERT3S(zio->io_error, >=, 0);

and the kernel log shows:

[ 1072.095822] ZFS: Cache flushes enabled
[ 1072.095827] ZFS: Cache flushes enabled
[ 1072.450951] ZFS: Cache flushes enabled

I also tested ZFS 0.8 with zvol_request_sync=1, but it made no difference.

Is there anything else I can check?

@behlendorf
Contributor

Yes, it does seem that way, though it's a little surprising since this code hasn't changed in a long time. The next thing I'd suggest checking is that the cache flushes themselves are being issued successfully. This is done by the vdev_disk_io_flush() function, which is called by vdev_disk_io_start(); you should see one cache flush per vdev for each TXG. You can add a little debugging after the call to verify this is happening. Something like this:

         error = vdev_disk_io_flush(vd->vd_bdev);
+        zfs_dbgmsg("vdev_disk_io_flush(%s) = %d", vd->vd_path, error);
         if (error == 0) {

The internal log can be read by dumping the /proc/spl/kstat/zfs/dbgmsg file.
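
For example (if the log appears empty, the zfs_dbgmsg_enable module parameter may need to be turned on first):

# make sure internal debug messages are collected (runtime-settable module parameter)
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
# read the log; writing 0 to the file clears it
cat /proc/spl/kstat/zfs/dbgmsg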

@arturpzol
Author

@behlendorf with the debugging in place I can see that the flush is executed each time without error:

1601020981   vdev_disk.c:734:vdev_disk_io_start(): vdev_disk_io_flush(/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47209-5-part1) = 0
1601020981   vdev_disk.c:734:vdev_disk_io_start(): vdev_disk_io_flush(/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47209-1-part1) = 0
1601020981   vdev_disk.c:734:vdev_disk_io_start(): vdev_disk_io_flush(/dev/disk/by-id/scsi-SQEMU_QEMU_HARDDISK_47209-6-part1) = 0

I have noticed one important point. After a hard shutdown, once the node is back an automatic resilver is performed (because the remote iSCSI disks are back), and if I manually run a scrub after the resilver finishes, the corruption does not occur. To trigger the corruption I need to shut down the nodes one after the other, so a full example scenario is:

  1. Pool active on Node A, mirrored vdev based on local and remote iSCSI disks.
  2. Force shutdown the Node A.
  3. Pool is imported on Node B (by the cluster).
  4. Power on Node A. Its disks become available again on Node B and a resilver is performed.
  5. Force shutdown the Node B.
  6. Pool is imported on Node A (by the cluster).
  7. Power on Node B. Its disks become available again on Node A and a resilver is performed.
  8. Force shutdown the Node A.
  9. Node B tries to import the pool, but without success. At this point the issue usually occurs, with one of these errors:

internal error: Invalid exchange
or
Destroy and re-create the pool from a backup source.

At this point, if I power on node A and its disks come back, the pool can be imported successfully.

@behlendorf as you wrote, the flushing code hasn't changed in a long time, so maybe the resilver (rewritten in ZFS 0.8-rc1) is incomplete and loses some data.
I also tried with zfs_scan_legacy=1 and zfs_resilver_disable_defer=1, but the effect is the same.
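
For reference, both tunables can be toggled at runtime through the module parameters, e.g.:

echo 1 > /sys/module/zfs/parameters/zfs_scan_legacy
echo 1 > /sys/module/zfs/parameters/zfs_resilver_disable_defer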

@arturpzol
Author

arturpzol commented Oct 2, 2020

@behlendorf I have reproduced the issue with shared storage: two SAS JBODs and two nodes in the cluster, with each mirror vdev built from disks in both JBODs. To trigger the corruption, one JBOD and the node with the active pool were powered off, and so on.

I wanted to rule out the iSCSI initiator/target layer (the remote disks), so the issue can also be reproduced over a normal hardware connection.

@arturpzol
Author

@behlendorf I did a git bisect and found that commit a1d477c from ZFS 0.8-rc1 introduced this corruption.
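
Roughly the bisect workflow, assuming the upstream release tags (a sketch, not the exact commands used):

git bisect start
git bisect bad zfs-0.8.0-rc1    # first release where the corruption occurs
git bisect good zfs-0.7.13      # last release known to be unaffected
# at each step: build and load the modules, run the reproduction script below,
# then mark the commit with "git bisect good" or "git bisect bad"
git bisect reset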

The issue can be simulated with the following script:

#!/bin/bash

set -x

MIRROR1=("sdb" "sdc")
MIRROR2=("sdd" "sde")

POOL="Tank"
HOST_NR="0"

zpool create ${POOL} mirror ${MIRROR1[@]} mirror ${MIRROR2[@]}
zfs create -s -b 131072 -o sync=always -V 10737418240 ${POOL}/1
zvol_wait

exit_from_main_loop () {
    stop_io
    exit
}

run_io()
{
    dd if=/dev/zero of=/dev/${POOL}/1 oflag=direct &>/dev/null &
}

stop_io()
{
    killall dd
    sleep 15
}

wait_for_pool_online()
{
    zpool reopen ${POOL}
    sleep 15

    while ! cat /proc/spl/kstat/zfs/${POOL}/state | grep ONLINE; do
        sleep 5
    done
}

wait_for_resilver_to_finish()
{
    while zpool status ${POOL} | grep "resilver in progress" ; do
        sleep 5
    done
    
    while ! zpool status ${POOL} | grep "resilvered" ; do
        sleep 5
    done
}

test_import()
{
    zpool export ${POOL}
    sleep 5
    zpool import -d /dev/disk/by-id/ ${POOL}
    if [ "`echo $?`" -ne 0 ]; then
        exit
    fi
    zvol_wait
}


main_loop ()
{
    date
    run_io
    
    sleep 10
    # Remove two devices
    echo 1 > /sys/block/${MIRROR1[0]}/device/delete
    echo 1 > /sys/block/${MIRROR2[0]}/device/delete
    sleep 5
    
    stop_io
    test_import
    run_io

    sleep 15
    # "Plug" back removed devices
    echo "- - -" > /sys/class/scsi_host/host${HOST_NR}/scan
    sleep 30
    wait_for_pool_online
    sleep 15
    wait_for_resilver_to_finish
    
    sleep 10
    # Remove two devices
    echo 1 > /sys/block/${MIRROR1[1]}/device/delete
    echo 1 > /sys/block/${MIRROR2[1]}/device/delete
    sleep 5

    stop_io
    test_import
    run_io
    
    sleep 15
    # "Plug" back removed devices
    echo "- - -" > /sys/class/scsi_host/host${HOST_NR}/scan
    sleep 30
    wait_for_pool_online
    sleep 15
    wait_for_resilver_to_finish
    sleep 10
    stop_io
}

while true; do
    trap exit_from_main_loop EXIT INT TERM
    main_loop
done

The test takes a few loops (e.g. with ZFS 0.8.3) to trigger the corruption with one of the errors below:

zpool import -d /dev/disk/by-id/

   pool: Tank
     id: 2783007011672722007
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-5E
 config:

        Tank                        FAULTED  corrupted data
          mirror-0                  DEGRADED
            wwn-0x5000c5003c3551cb  ONLINE
            wwn-0x5000c500436f5bbb  UNAVAIL
          mirror-1                  FAULTED  corrupted data
            wwn-0x5000c50043716123  ONLINE
            wwn-0x5000c50043b38587  UNAVAIL


zpool import -d /dev/disk/by-id/ Tank
internal error: Invalid exchange

Another case:

zpool import -d /dev/disk/by-id/ Tank
cannot import 'Tank': I/O error
        Destroy and re-create the pool from
        a backup source.

Of course, if we plug the removed devices back in, the corruption disappears:

echo "- - -" > /sys/class/scsi_host/host1/scan
zpool reopen


zpool import -d /dev/disk/by-id/ Tank
zpool status
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 266K in 0 days 00:00:01 with 0 errors on Wed Oct 14 12:43:18 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        Tank                        ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            wwn-0x5000c5003c3551cb  ONLINE       0     0     0
            wwn-0x5000c500436f5bbb  ONLINE       0     0     2
          mirror-1                  ONLINE       0     0     0
            wwn-0x5000c50043716123  ONLINE       0     0     1
            wwn-0x5000c50043b38587  ONLINE       0     0     1

errors: No known data errors

However, on real hardware, when disks are actually damaged, plugging them back in is not possible.

Similar bugs are reported in #10161 and #10910

I tried disabling the features feature@device_removal and feature@obsolete_counts, which were introduced in a1d477c (see the sketch below), but unfortunately the issue can still be reproduced.
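
A sketch of how those features were disabled at pool creation (pool and device names follow the test script above; once a feature has been enabled on an existing pool it cannot be disabled, so the pool has to be created this way):

zpool create -o feature@device_removal=disabled \
             -o feature@obsolete_counts=disabled \
             Tank mirror sdb sdc mirror sdd sde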

@behlendorf, @ahrens do you see any change in this commit that could lead to such corruption (e.g. resilver, synchronization or something else)?

@arturpzol
Author

@behlendorf, @ahrens is there any chance you could take a look at this issue?

I tried to partially revert commit a1d477c, but it has a lot of dependencies and a revert is not easily possible.

When PR #6900 was in progress, some comments about corruption were posted and a new PR, openzfs/openzfs#561, was proposed. I tried it, but unfortunately the corruption also occurs with that PR.

@behlendorf
Copy link
Contributor

@arturpzol thanks for bisecting this to narrow it down. I'll try to find some time to investigate.

@arturpzol
Author

I tried initializing the pool using zpool initialize as suggested in #10161; triggering the corruption becomes more difficult, but after some time an assertion from dmu_buf_hold_array occurs:

[10695.200342] VERIFY3(0 == dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs, &dbp)) failed (0 == 5)
[10695.200344] PANIC at dmu.c:1096:dmu_write()
[10695.200345] Showing stack for process 4943
[10695.200352] CPU: 0 PID: 4943 Comm: txg_sync Kdump: loaded Tainted: P           O      4.19.75-oe64-00000-g6bb029e #47
[10695.200352] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[10695.200354] Call Trace:
[10695.200384]  dump_stack+0x5c/0x7b
[10695.200421]  spl_panic+0xc7/0x110 [spl]
[10695.200517]  ? zio_nowait+0xaa/0x140 [zfs]
[10695.200542]  ? dmu_buf_hold_array_by_dnode+0x35f/0x4a0 [zfs]
[10695.200546]  ? spl_kmem_cache_free+0x132/0x1e0 [spl]
[10695.200570]  dmu_write.part.10+0xf5/0x100 [zfs]
[10695.200617]  space_map_write+0x190/0x600 [zfs]
[10695.200631]  ? spl_kmem_zalloc+0xa8/0x140 [spl]
[10695.200635]  ? kvm_clock_get_cycles+0xd/0x10
[10695.200637]  ? ktime_get_raw_ts64+0x34/0xb0
[10695.200663]  ? dmu_tx_create_dd+0x90/0xd0 [zfs]
[10695.200692]  metaslab_sync+0x4b4/0xba0 [zfs]
[10695.200703]  ? kvm_clock_get_cycles+0xd/0x10
[10695.200704]  ? ktime_get_raw_ts64+0x34/0xb0
[10695.200729]  ? dmu_tx_create_dd+0x90/0xd0 [zfs]
[10695.200761]  vdev_sync+0x63/0x1c0 [zfs]
[10695.200795]  spa_sync+0x5ce/0xf50 [zfs]
[10695.200835]  ? spa_txg_history_init_io+0xf9/0x100 [zfs]
[10695.200866]  txg_sync_thread+0x2c2/0x490 [zfs]
[10695.200901]  ? txg_wait_open+0xd0/0xd0 [zfs]
[10695.200905]  ? __thread_exit+0x10/0x10 [spl]
[10695.200908]  thread_generic_wrapper+0x73/0x80 [spl]
[10695.200915]  kthread+0x105/0x140
[10695.200919]  ? kthread_park+0xa0/0xa0
[10695.200922]  ret_from_fork+0x35/0x40

@behlendorf did you have a chance to look into this issue?

@arturpzol
Author

I have done some research: if feature@spacemap_histogram is disabled, the corruption does not happen. Also, changing the size of space_map_phys_t (only for testing, of course), for example by removing the unused smp_object field, does not cause the corruption:

Index: include/sys/space_map.h
===================================================================
--- include/sys/space_map.h     (revision 56232)
+++ include/sys/space_map.h     (working copy)
@@ -55,8 +55,8 @@
  * for backward compatibility.
  */
 typedef struct space_map_phys {
-       /* object number: not needed but kept for backwards compatibility */
-       uint64_t        smp_object;
+//     /* object number: not needed but kept for backwards compatibility */
+//     uint64_t        smp_object;
 
        /* length of the object in bytes */
        uint64_t        smp_length;
Index: module/zfs/space_map.c
===================================================================
--- module/zfs/space_map.c      (revision 56232)
+++ module/zfs/space_map.c      (working copy)
@@ -737,12 +737,12 @@
         * now contains the object number but is maintained for backwards
         * compatibility.
         */
-       sm->sm_phys->smp_object = sm->sm_object;
-
-       if (range_tree_is_empty(rt)) {
-               VERIFY3U(sm->sm_object, ==, sm->sm_phys->smp_object);
-               return;
-       }
+//     sm->sm_phys->smp_object = sm->sm_object;
+//
+//     if (range_tree_is_empty(rt)) {
+//             VERIFY3U(sm->sm_object, ==, sm->sm_phys->smp_object);
+//             return;
+//     }
 
        if (maptype == SM_ALLOC)
                sm->sm_phys->smp_alloc += range_tree_space(rt);

Maybe some part of the space map is written at a wrong offset, or something silently overwrites part of the space map. As written above, this issue was introduced in a1d477c.
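
In case it helps the investigation, a hedged sketch of how the on-disk space maps can be inspected with zdb (pool name as above; repeating -m increases the metaslab/space-map verbosity):

# on an imported pool
zdb -mmm Tank
# or against an exported pool, pointing zdb at the device directory
zdb -e -p /dev/disk/by-id -mmm Tank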

@behlendorf are you able to look into it or suggest what can be checked next?

@behlendorf behlendorf self-assigned this Nov 10, 2020
@arko-pl

arko-pl commented Nov 14, 2020

I reproduced this issue with @arturpzol's script. It seems that the code related to the mutex removal from the RT, introduced in a1d477c, caused this issue. Here's the change that fixes it; for test purposes I dropped the locking of the vd->vdev_dtl_lock mutex to avoid a deadlock.

diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index f16e630..52fc7b7 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -2473,21 +2473,10 @@ vdev_dtl_contains(vdev_t *vd, vdev_dtl_type_t t, uint64_t txg, uint64_t size)
        ASSERT(t < DTL_TYPES);
        ASSERT(vd != vd->vdev_spa->spa_root_vdev);
 
-       /*
-        * While we are loading the pool, the DTLs have not been loaded yet.
-        * Ignore the DTLs and try all devices.  This avoids a recursive
-        * mutex enter on the vdev_dtl_lock, and also makes us try hard
-        * when loading the pool (relying on the checksum to ensure that
-        * we get the right data -- note that we while loading, we are
-        * only reading the MOS, which is always checksummed).
-        */
-       if (vd->vdev_spa->spa_load_state != SPA_LOAD_NONE)
-               return (B_FALSE);
-
-       mutex_enter(&vd->vdev_dtl_lock);
+//     mutex_enter(&vd->vdev_dtl_lock);
        if (!range_tree_is_empty(rt))
                dirty = range_tree_contains(rt, txg, size);
-       mutex_exit(&vd->vdev_dtl_lock);
+//     mutex_exit(&vd->vdev_dtl_lock);
 
        return (dirty);
 }
@@ -4119,8 +4108,7 @@ vdev_stat_update(zio_t *zio, uint64_t psize)
        if (zio->io_vd == NULL && (zio->io_flags & ZIO_FLAG_DONT_PROPAGATE))
                return;
 
-       if (spa->spa_load_state == SPA_LOAD_NONE &&
-           type == ZIO_TYPE_WRITE && txg != 0 &&
+       if (type == ZIO_TYPE_WRITE && txg != 0 &&
            (!(flags & ZIO_FLAG_IO_REPAIR) ||
            (flags & ZIO_FLAG_SCAN_THREAD) ||
            spa->spa_claiming)) {

@behlendorf
Contributor

@arturpzol @arko-pl thank you for doing the hard work of identifying the offending commit and isolating the problematic code! Based on your findings and test case I've opened PR #11218 with a fix similar to the one above (but with locking). It's held up well in my local testing using your test case; however, there's nothing quite like the real thing. If it wouldn't be too much to ask, would you mind verifying the fix in your test environment?

@ahrens
Member

ahrens commented Nov 18, 2020

Great find, thanks!

@behlendorf how do the missing DTL entries cause the pool to not be importable? Without the DTL entries that are encountered as a result of the ZIL claims, shouldn't we try both sides of the mirror? Unless there's some block that is not checksummed and we are blindly trusting what we get from disk?

@behlendorf
Contributor

behlendorf commented Nov 18, 2020

@ahrens the heart of the problem is that the missing DTL entries cause any resilvering which happens before SPA_LOAD_NONE is set to incorrectly decide there's nothing to repair. That results in the pool determining that full redundancy has been restored for all blocks when it may not have been. The pool then can't be imported because we don't actually have any valid copies of the blocks. This test is particularly good at causing the problem since it repeatedly fails one half of the mirror, exports, imports, resilvers, then repeats the process with the other half of the mirror.

So the issue is really more the use of SPA_LOAD_NONE in vdev_dtl_contains() and not any specific DTL entries.

@ahrens
Member

ahrens commented Nov 19, 2020

Interesting. It would be good to preserve that reasoning somewhere - in a comment or the commit message.

Maybe we shouldn't be doing resilvering while loading (before SPA_LOAD_NONE), which would be another way of addressing the problem. That should speed up the import as well. We can also make the changes proposed here if you like.

@arturpzol
Author

@behlendorf thank you very much for the fix; it looks promising. So far the tests have been stable, with no pool import problems, for several hours. I will let you know the full results after testing on a few different environments.

@arturpzol
Author

@behlendorf after a few days of testing on a few different environments, the fix seems to be stable. I think this bug can be closed. Thank you.
