
Faulted after upgrade on linux - files intact on BSD ZFS #984

Closed
MiceMiceRabies opened this issue Sep 23, 2012 · 12 comments
Labels: Type: Documentation

Comments

@MiceMiceRabies

I just upgraded ZFS on Linux, and after a reboot my pool is faulted with "too many errors" on all disks.
I am running Debian squeeze with kernel 3.2.0. I'm not too sure where to go from here... I can supply any info needed to help track this bug.

@dechamps
Contributor

Start with zpool status?

@MiceMiceRabies
Author

I tried an export first, thinking that re-importing might correct the error. I was wrong. Also, zpool import -fD tank does not import; here's the output, which is the same as what status shows.

bill@workbox:~$ sudo zpool import
pool: tank
id: 13503037218073075533
state: UNAVAIL
status: One or more devices are faulted.
action: The pool cannot be imported due to damaged devices or data.
config:

tank                                               UNAVAIL  insufficient replicas
  raidz1-0                                         UNAVAIL  insufficient replicas
    scsi-SATA_SAMSUNG_HD103SJS246J90Z150704-part1  FAULTED  too many errors
    scsi-SATA_SAMSUNG_HD103SJS2AEJ1BZ406106-part1  FAULTED  too many errors
    scsi-SATA_SAMSUNG_HD103SJS2AEJ1BZ406107-part1  FAULTED  too many errors
    scsi-SATA_SAMSUNG_HD204UIS2H7J1BZC11220-part1  FAULTED  too many errors
    scsi-SATA_SAMSUNG_HD204UIS2H7J1BZC11225-part1  FAULTED  too many errors

@cwedgwood
Contributor

@MiceMiceRabies are all the devices available at the time /etc/init.d/zfs runs?

I have a system here where the SAS controller takes maybe 20 seconds to find all the disks, so it typically faults on import.
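
If that's what is happening here, a manual workaround is roughly the following (the grep pattern is just illustrative):

udevadm settle                             # wait for udev to finish creating the device links
ls /dev/disk/by-id/ | grep -i samsung      # confirm all pool members are actually visible
zpool import -d /dev/disk/by-id tank       # then retry the import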

@MiceMiceRabies
Author

@cwedgwood - I just tried restarting via init.d with no luck.

I did boot into the latest NAS4FREE and was able to import successfully with:
# zpool import -f tank
All files are still intact, so I know my disks are not actually faulted.

Another interesting thing to note: during my last upgrade I noticed that the modules did not get built for the other kernels on the system.

@ryao
Contributor

ryao commented Sep 24, 2012

Have you tried importing the pool after exporting it from FreeBSD?
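
Roughly, assuming the pool is currently imported on the NAS4FREE side:

zpool export tank                      # on BSD: release the pool cleanly
zpool import -d /dev/disk/by-id tank   # back on Linux: search by-id and import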

@MiceMiceRabies
Author

@ryao, yes, that was the next step I tried; I still get the same output on the Linux side: state UNAVAIL, and all drives faulted.

@Ukko-Ylijumala

ZoL seems to be trying to import the ZFS pool from partitions. Are the disks in GPT format? Did you initially make the pool on whole disks or on partitions?

CONFIG_EFI_PARTITION=y enabled in kernel config?

Do issues #94, #489 or #955 have relevance for this? The most likely candidate for your problem might be the last one (AF disk detection). Which ZoL version are you running? Is your /etc/zfs/zpool.cache stale?
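
A few quick things to check (paths and commands assume a Debian-style setup):

grep CONFIG_EFI_PARTITION /boot/config-$(uname -r)   # GPT/EFI partition support in the running kernel
dpkg -l | grep zfs                                   # which ZoL packages/version are actually installed
ls -l /etc/zfs/zpool.cache                           # a stale cache can point the import at the wrong devices
dmesg | grep -iE 'zfs|spl'                           # confirm the modules loaded for this kernel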

@MiceMiceRabies
Author

The disks are GPT. The layout is three 1 TB drives and two 2 TB drives. The 2 TB drives are partitioned in half, so I have a total of five devices in my pool; the leftover partitions are used for LVG.

CONFIG_EFI_PARTITION=y is enabled in the kernel config.

As for ZoL - Version: 0.6.0.80-0ubuntu2~lucid1

/etc/zfs/zpool.cache doesn't exist; the only file in /etc/zfs/ is zdev.conf.

I think this looks more like issue #94 than the others because of the 1049kB offset. As for the others, I am not too sure, as I am very new to ZFS.

Here's my drive layout:

For the 2TB drives (quantity 2)

Model: ATA SAMSUNG HD204UI (scsi)
Disk /dev/sd[b-c]: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name              Flags
 1      1049kB  1000GB  1000GB  zfs          zfs
 9      1000GB  1000GB  8389kB
 2      1000GB  2000GB  1000GB  zfs          Linux filesystem

And the 1TB drives (quantity 3)

Model: ATA SAMSUNG HD103SJ (scsi)
Disk /dev/sd[d-f]: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number  Start   End     Size    File system  Name  Flags
 1      1049kB  1000GB  1000GB  zfs          zfs
 9      1000GB  1000GB  8389kB
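
In case it's useful, the ZFS labels on partition 1 of each disk can be dumped directly to confirm they still look like valid members of the pool (device name below is just an example):

sudo zdb -l /dev/sdd1    # prints the vdev labels: pool name, GUIDs, and the cached config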

@dechamps
Contributor

I think this looks more like issue #94 than the others because of the 1049kB offset.

No. I'll just repeat what I said in #94:

Just FYI, parted doesn't have any unit calculation issues; it just defaults to SI units (kB) rather than IEC units (kiB):

(parted) print                                                            
Model: ATA Hitachi HDP72502 (scsi)
 Disk /dev/sdj: 250GB
 Sector size (logical/physical): 512B/512B
 Partition Table: gpt

Number  Start   End    Size    File system  Name  Flags
 1      1049kB  250GB  250GB   zfs          zfs
 9      250GB   250GB  8389kB

(parted) unit kib                                                         
(parted) print                                                            
Model: ATA Hitachi HDP72502 (scsi)
 Disk /dev/sdj: 244198584kiB
 Sector size (logical/physical): 512B/512B
 Partition Table: gpt

Number  Start         End           Size          File system  Name  Flags
 1      1024kiB       244190208kiB  244189184kiB  zfs          zfs
 9      244190208kiB  244198400kiB  8192kiB

Sure, choosing kB instead of kiB as the default unit is a surprising decision from the parted folks, but that's not the issue here.

There is no "1049kB offset" issue. 1049kB is 1024kiB. Everything is fine.
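
To spell out the arithmetic: 1024 kiB = 1024 × 1024 B = 1,048,576 B, and 1,048,576 B ÷ 1000 = 1048.576 kB, which parted rounds and prints as 1049 kB.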

@MiceMiceRabies
Author

So then it's something else I'm missing. Good to know it's not #94.

@MiceMiceRabies
Author

Fixed!
I had done my upgrade while running a Xen kernel, which caused the issue.

To fix it, I purged the ubuntu-zfs package, rebooted into a non-Xen kernel (3.2 bpo), and reinstalled the ubuntu-zfs package.
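
In commands, that was roughly (assuming the zfs-native PPA / DKMS packaging):

sudo apt-get purge ubuntu-zfs        # remove the packages whose modules were built under the Xen kernel
# reboot into the non-Xen 3.2 backports kernel, then:
sudo apt-get install ubuntu-zfs      # DKMS rebuilds the modules against the running kernel
sudo zpool import tank               # in case the init script doesn't import it automatically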

Now all my disks appear online again with no issues:

pool: tank
state: ONLINE
scan: none requested
config:

NAME                                               STATE     READ WRITE CKSUM
tank                                               ONLINE       0     0     0
  raidz1-0                                         ONLINE       0     0     0
    scsi-SATA_SAMSUNG_HD103SJS246J90Z150704-part1  ONLINE       0     0     0
    scsi-SATA_SAMSUNG_HD103SJS2AEJ1BZ406106-part1  ONLINE       0     0     0
    scsi-SATA_SAMSUNG_HD103SJS2AEJ1BZ406107-part1  ONLINE       0     0     0
    scsi-SATA_SAMSUNG_HD204UIS2H7J1BZC11220-part1  ONLINE       0     0     0
    scsi-SATA_SAMSUNG_HD204UIS2H7J1BZC11225-part1  ONLINE       0     0     0

errors: No known data errors

@behlendorf
Contributor

Great, I'm glad you got it sorted out.
