
Adding second spare to pool causes core dump. #4247

Closed
waxhead opened this issue Jan 20, 2016 · 4 comments

Comments

waxhead commented Jan 20, 2016

Hi,

This is the RAID and disk setup on the controller:
hptraidconf
HighPoint RAID Management Command Line Utility v3.3
Copyright (C) 2009 HighPoint Technologies, Inc. All rights reserved.

Login:RAID
Password:
HighPoint CLI>query arrays

ID Capacity(GB) Type Status Block Sector Cache Name

1/1 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_0
1/2 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_1
1/3 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_2
1/4 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_3
1/5 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_4
1/6 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_5
1/7 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_6
1/8 4000.79 Hard Disk LEGACY -- 512B NONE HPT DISK 0_7

This is the disks by id:

root@piglet~ # ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx 1 root root 9 Jan 20 19:48 ata-Corsair_Force_3_SSD_12166503000013411391 -> ../../sda
lrwxrwxrwx 1 root root 10 Jan 20 19:49 ata-Corsair_Force_3_SSD_12166503000013411391-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 ata-Corsair_Force_3_SSD_12166503000013411391-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jan 20 19:48 ata-Corsair_Force_3_SSD_12166503000013411391-part3 -> ../../sda3
lrwxrwxrwx 1 root root 9 Jan 20 19:48 ata-SAMSUNG_DVDWBD_SH-B083L_R78L6GHZ406356 -> ../../sr0
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0000000000 -> ../../sdb
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0000000000-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0000000000-part9 -> ../../sdb9
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0100000000 -> ../../sdc
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0100000000-part1 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0100000000-part9 -> ../../sdc9
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0200000000 -> ../../sdd
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0200000000-part1 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0200000000-part9 -> ../../sdd9
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0300000000 -> ../../sde
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0300000000-part1 -> ../../sde1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0300000000-part9 -> ../../sde9
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0400000000 -> ../../sdf
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0400000000-part1 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0400000000-part9 -> ../../sdf9
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0500000000 -> ../../sdg
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0500000000-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0500000000-part9 -> ../../sdg9
lrwxrwxrwx 1 root root 9 Jan 20 20:07 scsi-200193c0600000000 -> ../../sdh
lrwxrwxrwx 1 root root 9 Jan 20 19:49 scsi-200193c0700000000 -> ../../sdi
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0700000000-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 Jan 20 19:49 scsi-200193c0700000000-part9 -> ../../sdi9

One of the disks in the RAID failed and was eventually replaced. The RAIDZ-2 is meant to be 7 data disks plus a spare; you can see there are only 7 disks in the pool and 05 is the spare.

pool: datastore
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: resilvered 2.61T in 20h7m with 0 errors on Tue Jan 19 16:04:57 2016
config:

    NAME                            STATE     READ WRITE CKSUM
    datastore                       DEGRADED     0     0     0
      raidz2-0                      DEGRADED     0     0     0
        scsi-200193c0000000000      ONLINE       0     0     0
        scsi-200193c0100000000      ONLINE       0     0     0
        scsi-200193c0200000000      ONLINE       0     0     0
        scsi-200193c0300000000      ONLINE       0     0     0
        scsi-200193c0400000000      ONLINE       0     0     0
        spare-5                     DEGRADED     0     0     0
          spare-0                   DEGRADED     0     0     0
            old                     OFFLINE      0     0     0
            scsi-200193c0500000000  ONLINE       0     0     0
          scsi-200193c0500000000    ONLINE       0     0     0
        scsi-200193c0700000000      ONLINE       0     0     0
    spares
      scsi-200193c0500000000        INUSE     currently in use

errors: No known data errors

I'm trying to add /dev/disk/by-id/scsi-200193c0600000000 back into the pool. It was suggested that I add it back in as a spare, which should push 05 into the pool(?).

So this is the error:

root@piglet~ # zpool add datastore spare /dev/disk/by-id/scsi-200193c0600000000
zpool: zpool_vdev.c:895: Assertion `nvlist_lookup_string(cnv, ZPOOL_CONFIG_PATH, &path) == 0' failed.
Aborted (core dumped)
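
For readers unfamiliar with the hot-spare workflow that "push 05 into the pool" alludes to, the usual lifecycle looks roughly like the sketch below. This is a generic illustration using the device names from the status output above, not a sequence taken from this thread, and <failed-device> is a placeholder:

# a failed disk is (manually or automatically) replaced by the hot spare,
# which then shows as INUSE under the raidz2 vdev once the resilver finishes
zpool replace datastore <failed-device> scsi-200193c0500000000

# once a permanent replacement disk has been installed and resilvered,
# detaching the spare returns it to the pool's spares list
zpool detach datastore scsi-200193c0500000000

# the cached pool configuration, including the spares list, can be dumped
# with zdb when debugging odd vdev states like the one above
zdb -C datastore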

behlendorf (Contributor) commented:

Thanks for taking the time to file this issue so we can resolve it. You've got an interesting case here: it looks like you've replaced a spare with a spare, which isn't a common case. That may be related to the issue.

waxhead (Author) commented Jan 21, 2016

Brian,

Thanks for taking the time to reply. I agree that the pool is in an interesting state. Put it down to weird things happening, label changes, and sheer ignorance on my part.

For the most part I'm looking to return the pool to 7 disks with a spare.

I realised that I didn't include info on the OS and ZFS version in my report. I'll update that tonight.

Peter.

waxhead (Author) commented Jan 22, 2016

Additional information:

Linux piglet 3.19.0-43-generic #49-Ubuntu SMP Sun Dec 27 19:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
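
Only the kernel version appears above; the ZFS and SPL versions mentioned in the previous comment are still missing. On a ZFS on Linux system they can usually be collected with something like the following (a sketch; it assumes the zfs and spl kernel modules are loaded and, for the second command, a Debian/Ubuntu packaging):

# module versions as reported by the loaded kernel modules
cat /sys/module/zfs/version /sys/module/spl/version

# or via the packaging system
dpkg -l | grep -E 'zfs|spl'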

loli10K (Contributor) commented Feb 4, 2018

This is probably fixed by 390d679.
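
For anyone wanting to check whether their build is still affected, the core scenario (adding a second spare while the first spare is INUSE) can be approximated with file-backed vdevs. This is only a rough sketch with hypothetical file paths, and it may not recreate the exact nested spare-within-spare state shown in the original report:

# create small sparse files to use as vdevs (bash brace expansion)
truncate -s 256M /var/tmp/d{0..6} /var/tmp/s{0,1}

# raidz2 pool with seven data vdevs and one hot spare
zpool create test raidz2 /var/tmp/d0 /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 \
    /var/tmp/d4 /var/tmp/d5 /var/tmp/d6 spare /var/tmp/s0

# fail one data vdev and activate the spare, leaving it INUSE
zpool offline test /var/tmp/d6
zpool replace test /var/tmp/d6 /var/tmp/s0

# adding a second spare at this point is the operation that crashed above;
# on a fixed build it should simply succeed
zpool add test spare /var/tmp/s1

zpool destroy test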

loli10K closed this as completed on Feb 4, 2018.