manually replaced drive with spare, now new drive = unavail, corrupted data #3215

Closed
cousins opened this issue Mar 23, 2015 · 3 comments

Comments

cousins commented Mar 23, 2015

I had a drive (vdev id = 1-2) that was showing read errors. The OEM sent me a replacement drive, so I decided to manually replace the failing drive, bringing in the hot spare (vdev id = 1-44) I had set up. zpool history shows:

2015-03-19.12:54:53 zpool offline pool3 1-2

--- physically replaced the drive ---

2015-03-19.14:19:17 zpool replace pool3 1-2 1-44
2015-03-23.15:28:50 zpool clear pool3
2015-03-23.15:32:46 zpool online pool3 1-2

Now the status looks like:

[root@nfs2 ~]# zpool status
  pool: pool3
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Mon Mar 23 15:32:36 2015
    2.04T scanned out of 126T at 1.08G/s, 32h42m to go
    0 repaired, 1.61% done
config:

    NAME            STATE     READ WRITE CKSUM
    pool3           ONLINE       0     0     0
      raidz2-0      ONLINE       0     0     0
        0-0         ONLINE       0     0     0
        0-1         ONLINE       0     0     0
        0-2         ONLINE       0     0     0
        0-24        ONLINE       0     0     0
        0-25        ONLINE       0     0     0
        1-0         ONLINE       0     0     0
        1-1         ONLINE       0     0     0
        spare-7     ONLINE       0     0     0
          1-2       UNAVAIL      0     0     0  corrupted data
          1-44      ONLINE       0     0     0
        1-24        ONLINE       0     0     0
        1-25        ONLINE       0     0     0
        1-26        ONLINE       0     0     0
      raidz2-1      ONLINE       0     0     0
        0-3         ONLINE       0     0     0
        0-4         ONLINE       0     0     0
        0-5         ONLINE       0     0     0
        0-26        ONLINE       0     0     0
        0-27        ONLINE       0     0     0
        1-3         ONLINE       0     0     0
        1-4         ONLINE       0     0     0
        1-5         ONLINE       0     0     0
        1-27        ONLINE       0     0     0
        1-28        ONLINE       0     0     0
        1-29        ONLINE       0     0     0
      raidz2-2      ONLINE       0     0     0
        0-6         ONLINE       0     0     0
        0-7         ONLINE       0     0     0
        0-8         ONLINE       0     0     0
        0-28        ONLINE       0     0     0
        0-29        ONLINE       0     0     0
        1-6         ONLINE       0     0     0
        1-7         ONLINE       0     0     0
        1-8         ONLINE       0     0     0
        1-30        ONLINE       0     0     0
        1-31        ONLINE       0     0     0
        1-32        ONLINE       0     0     0
      raidz2-3      ONLINE       0     0     0
        0-9         ONLINE       0     0     0
        0-10        ONLINE       0     0     0
        0-11        ONLINE       0     0     0
        0-30        ONLINE       0     0     0
        0-31        ONLINE       0     0     0
        1-9         ONLINE       0     0     0
        1-10        ONLINE       0     0     0
        1-11        ONLINE       0     0     0
        1-33        ONLINE       0     0     0
        1-34        ONLINE       0     0     0
        1-35        ONLINE       0     0     0
      raidz2-4      ONLINE       0     0     0
        0-12        ONLINE       0     0     0
        0-13        ONLINE       0     0     0
        0-14        ONLINE       0     0     0
        0-32        ONLINE       0     0     0
        0-33        ONLINE       0     0     0
        1-12        ONLINE       0     0     0
        1-13        ONLINE       0     0     0
        1-14        ONLINE       0     0     0
        1-36        ONLINE       0     0     0
        1-37        ONLINE       0     0     0
        1-38        ONLINE       0     0     0
      raidz2-5      ONLINE       0     0     0
        0-15        ONLINE       0     0     0
        0-16        ONLINE       0     0     0
        0-17        ONLINE       0     0     0
        0-34        ONLINE       0     0     0
        0-35        ONLINE       0     0     0
        1-15        ONLINE       0     0     0
        1-16        ONLINE       0     0     0
        1-17        ONLINE       0     0     0
        1-39        ONLINE       0     0     0
        1-40        ONLINE       0     0     0
        1-41        ONLINE       0     0     0
    logs
      mirror-6      ONLINE       0     0     0
        ssd0-part1  ONLINE       0     0     0
        ssd1-part1  ONLINE       0     0     0
    cache
      ssd0-part2    ONLINE       0     0     0
      ssd1-part2    ONLINE       0     0     0
    spares
      1-44          INUSE     currently in use

errors: No known data errors

I have tried replacing 1-44 with 1-2 again:

[root@nfs2 ~]#  zpool replace pool3 1-44 1-2
cannot replace 1-44 with 1-2: can only be replaced by another hot spare

and I even tried removing 1-2:

[root@nfs2 ~]#  zpool remove pool3 1-2
cannot remove 1-2: only inactive hot spares, cache, top-level, or log devices can be removed

I'm not sure what to do now. The goal is to get the new 1-2 to take over and put 1-44 back in its role as a hot spare. Can anyone advise?

Thanks,

Steve

cousins commented Mar 30, 2015

Anybody have any ideas on this?

I have another system very similar to this one that now has a bad drive (scrubbing turns up 225 READ errors and 291 WRITE errors), yet the spare was not pulled in to replace it and all drives are still ONLINE. I'd like to replace the drive, but I don't want to run into the same thing as before. Maybe I shouldn't use the spare at all? Should I do:

zpool offline pool2 mpathaf         # uses mpath names instead of vdev-id's
pull the drive
put in new drive
make sure multipath brings in the new drive
zpool replace pool2 mpathaf mpath??   # where mpath?? is the new mpath dev.
zpool online pool2 mpath??

Maybe the last two lines need to be switched?
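
For the "make sure multipath brings in the new drive" step, I'm picturing something like this (mpathXX is just a placeholder for whatever name the new disk ends up with):

multipath -r        # reload the multipath maps after inserting the new drive
multipath -ll       # check that the new disk shows up, with all of its paths, under some mpathXX name

and then running the replace against that mpathXX name.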

Thanks,

Steve

cousins commented Mar 31, 2015

Am I submitting this to the wrong place?

cousins commented Apr 2, 2015

Resolved on the #zfsonlinux IRC channel. The answer to the main problem was to just do:

zpool replace pool3 1-2 1-2 

It resilvered and put the spare back as a spare. Thanks @DeHackEd.
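
For anyone hitting the same thing, a quick way to watch it finish (just how I checked it here):

zpool status pool3                       # spare-7 goes away once the new 1-2 finishes resilvering
zpool status pool3 | grep -A2 spares     # 1-44 goes back to AVAIL instead of INUSE

The spare detached on its own once the resilver completed.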

The answer to the general question of how to replace a drive in-place seems to be:

zpool offline pool_name disk_name
pull the drive
put new drive in
zpool replace pool_name disk_name new_disk_name

Depending on the environment, there may be some steps between putting the new drive in and running the replace command so that new_disk_name ends up with the appropriate name (using vdev_id.conf and/or multipath, for instance), and it may well be the same name as disk_name. For instance, mine are vdev_id names like 1-2, indicating enclosure 1, slot 2.
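
For reference, those names come from /etc/zfs/vdev_id.conf on my systems. With an enclosure/slot style config the replacement drive normally just inherits the slot name, but the simplest way to pin a name to one specific disk is an alias entry, something like this (the by-id link below is made up; use whatever the new disk actually shows under /dev/disk/by-id):

# /etc/zfs/vdev_id.conf
alias 1-2   /dev/disk/by-id/dm-uuid-mpath-3600c0ff000dead0000000000   # hypothetical multipath WWN for the new disk

then udevadm trigger should recreate /dev/disk/by-vdev/1-2 pointing at the new drive.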

The downside to this is that it degrades the array while the resilver runs. A safer way is to use a spare disk, but first you need to remove the spare from the pools it is assigned to. My spare is called 1-44, so I would do:

zpool remove pool3 1-44
zpool remove pool4 1-44
zpool replace pool3 1-2 1-44

Then, once the resilver is done, I'm assuming 1-2 would be unassigned. I think I'd do:

zpool offline pool3 1-2
pull the drive
put the new drive in

Then I'd either make the new drive the spare or replace 1-44 with the new drive sitting in slot 1-2, in order to keep things consistent and balanced across my SAS cards and four backplanes.
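
Either way, the last step would be re-registering a spare in both pools; something like this (untested, just going from the zpool man page):

zpool add pool3 spare 1-2        # option 1: the new drive in slot 1-2 becomes the shared spare
zpool add pool4 spare 1-2

or, to keep 1-44 as the spare:

zpool replace pool3 1-44 1-2     # resilver the new drive into the data slot
zpool add pool3 spare 1-44       # then put 1-44 back as the shared spare
zpool add pool4 spare 1-44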
