manually replaced drive with spare, now new drive = unavail, corrupted data #3215

Closed
cousins opened this issue Mar 23, 2015 · 3 comments

Comments

cousins commented Mar 23, 2015

I had a drive (vdev id = 1-2) that was showing read errors. The OEM sent me a replacement drive, so I decided to manually replace the failing drive, bringing in the hot spare (vdev id = 1-44) I had set up. zpool history shows:

2015-03-19.12:54:53 zpool offline pool3 1-2

--- physically replaced the drive ---

2015-03-19.14:19:17 zpool replace pool3 1-2 1-44
2015-03-23.15:28:50 zpool clear pool3
2015-03-23.15:32:46 zpool online pool3 1-2

Now the status looks like:

[root@nfs2 ~]# zpool status
  pool: pool3
 state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Mon Mar 23 15:32:36 2015
    2.04T scanned out of 126T at 1.08G/s, 32h42m to go
    0 repaired, 1.61% done
config:

    NAME            STATE     READ WRITE CKSUM
    pool3           ONLINE       0     0     0
      raidz2-0      ONLINE       0     0     0
        0-0         ONLINE       0     0     0
        0-1         ONLINE       0     0     0
        0-2         ONLINE       0     0     0
        0-24        ONLINE       0     0     0
        0-25        ONLINE       0     0     0
        1-0         ONLINE       0     0     0
        1-1         ONLINE       0     0     0
        spare-7     ONLINE       0     0     0
          1-2       UNAVAIL      0     0     0  corrupted data
          1-44      ONLINE       0     0     0
        1-24        ONLINE       0     0     0
        1-25        ONLINE       0     0     0
        1-26        ONLINE       0     0     0
      raidz2-1      ONLINE       0     0     0
        0-3         ONLINE       0     0     0
        0-4         ONLINE       0     0     0
        0-5         ONLINE       0     0     0
        0-26        ONLINE       0     0     0
        0-27        ONLINE       0     0     0
        1-3         ONLINE       0     0     0
        1-4         ONLINE       0     0     0
        1-5         ONLINE       0     0     0
        1-27        ONLINE       0     0     0
        1-28        ONLINE       0     0     0
        1-29        ONLINE       0     0     0
      raidz2-2      ONLINE       0     0     0
        0-6         ONLINE       0     0     0
        0-7         ONLINE       0     0     0
        0-8         ONLINE       0     0     0
        0-28        ONLINE       0     0     0
        0-29        ONLINE       0     0     0
        1-6         ONLINE       0     0     0
        1-7         ONLINE       0     0     0
        1-8         ONLINE       0     0     0
        1-30        ONLINE       0     0     0
        1-31        ONLINE       0     0     0
        1-32        ONLINE       0     0     0
      raidz2-3      ONLINE       0     0     0
        0-9         ONLINE       0     0     0
        0-10        ONLINE       0     0     0
        0-11        ONLINE       0     0     0
        0-30        ONLINE       0     0     0
        0-31        ONLINE       0     0     0
        1-9         ONLINE       0     0     0
        1-10        ONLINE       0     0     0
        1-11        ONLINE       0     0     0
        1-33        ONLINE       0     0     0
        1-34        ONLINE       0     0     0
        1-35        ONLINE       0     0     0
      raidz2-4      ONLINE       0     0     0
        0-12        ONLINE       0     0     0
        0-13        ONLINE       0     0     0
        0-14        ONLINE       0     0     0
        0-32        ONLINE       0     0     0
        0-33        ONLINE       0     0     0
        1-12        ONLINE       0     0     0
        1-13        ONLINE       0     0     0
        1-14        ONLINE       0     0     0
        1-36        ONLINE       0     0     0
        1-37        ONLINE       0     0     0
        1-38        ONLINE       0     0     0
      raidz2-5      ONLINE       0     0     0
        0-15        ONLINE       0     0     0
        0-16        ONLINE       0     0     0
        0-17        ONLINE       0     0     0
        0-34        ONLINE       0     0     0
        0-35        ONLINE       0     0     0
        1-15        ONLINE       0     0     0
        1-16        ONLINE       0     0     0
        1-17        ONLINE       0     0     0
        1-39        ONLINE       0     0     0
        1-40        ONLINE       0     0     0
        1-41        ONLINE       0     0     0
    logs
      mirror-6      ONLINE       0     0     0
        ssd0-part1  ONLINE       0     0     0
        ssd1-part1  ONLINE       0     0     0
    cache
      ssd0-part2    ONLINE       0     0     0
      ssd1-part2    ONLINE       0     0     0
    spares
      1-44          INUSE     currently in use

errors: No known data errors

I have tried replacing 1-44 with 1-2 again:

[root@nfs2 ~]#  zpool replace pool3 1-44 1-2
cannot replace 1-44 with 1-2: can only be replaced by another hot spare

and I even tried removing 1-2:

[root@nfs2 ~]#  zpool remove pool3 1-2
cannot remove 1-2: only inactive hot spares, cache, top-level, or log devices can be removed

I'm not sure what to do now. The goal is to get the new 1-2 to take over and put 1-44 back in its role as a hot spare. Can anyone advise?

Thanks,

Steve

cousins commented Mar 30, 2015

Anybody have any ideas on this?

I have another system very similar to this one that now has a bad drive (scrubbing turns up 225 READ errors and 291 WRITE errors), yet the spare was not pulled in to replace it and all drives are still ONLINE. I'd like to replace the drive, but I don't want to run into the same thing as before. Maybe I shouldn't use the spare at all? Should I do:

zpool offline pool2 mpathaf         # uses mpath names instead of vdev-id's
pull the drive
put in new drive
make sure multipath brings in the new drive
zpool replace pool2 mpathaf mpath??   # where mpath?? is the new mpath dev.
zpool online pool2 mpath??

Maybe the last two lines need to be switched?
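
For the "make sure multipath brings in the new drive" step, I'm picturing something like this (mpathXX is just a placeholder for whatever name the new disk ends up with):

multipath -r        # reload the multipath maps after inserting the new drive
multipath -ll       # check that the new disk shows up, with all of its paths, under some mpathXX name

and then running the replace against that mpathXX name.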

Thanks,

Steve

cousins commented Mar 31, 2015

Am I submitting this to the wrong place?

cousins commented Apr 2, 2015

Resolved on the #zfsonlinux IRC channel. The answer to the main problem was to just do:

zpool replace pool3 1-2 1-2 

It resilvered and put the spare back as a spare. Thanks @DeHackEd.
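
For anyone hitting the same thing, a quick way to watch it finish (just how I checked it here):

zpool status pool3                       # spare-7 goes away once the new 1-2 finishes resilvering
zpool status pool3 | grep -A2 spares     # 1-44 goes back to AVAIL instead of INUSE

The spare detached on its own once the resilver completed.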

The answer to the general question of how to replace a drive in-place seems to be:

zpool offline pool_name disk_name
pull the drive
put new drive in
zpool replace pool_name disk_name new_disk_name

Depending on the environment, there may be some steps between putting the new drive in and running the replace command so that new_disk_name ends up with the appropriate name (using vdev_id.conf and/or multipath, for instance), and it may well be the same name as disk_name. For instance, mine are vdev_id names like 1-2, indicating enclosure 1, slot 2.
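
For reference, those names come from /etc/zfs/vdev_id.conf on my systems. With an enclosure/slot style config the replacement drive normally just inherits the slot name, but the simplest way to pin a name to one specific disk is an alias entry, something like this (the by-id link below is made up; use whatever the new disk actually shows under /dev/disk/by-id):

# /etc/zfs/vdev_id.conf
alias 1-2   /dev/disk/by-id/dm-uuid-mpath-3600c0ff000dead0000000000   # hypothetical multipath WWN for the new disk

then udevadm trigger should recreate /dev/disk/by-vdev/1-2 pointing at the new drive.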

The downside to this is that it degrades the array while the resilver runs. A safer way is to use a spare disk, but first you need to remove the spare from the pools it is assigned to. My spare is called 1-44, so I would do:

zpool remove pool3 1-44
zpool remove pool4 1-44
zpool replace pool3 1-2 1-44

Then, once the resilver is done, I'm assuming 1-2 would be unassigned. I think I'd do:

zpool offline pool3 1-2
pull the drive
put the new drive in

Then I'd either make the new drive the spare or replace 1-44 with the new drive sitting in slot 1-2, in order to keep things consistent and balanced across my SAS cards and four backplanes.
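
Either way, the last step would be re-registering a spare in both pools; something like this (untested, just going from the zpool man page):

zpool add pool3 spare 1-2        # option 1: the new drive in slot 1-2 becomes the shared spare
zpool add pool4 spare 1-2

or, to keep 1-44 as the spare:

zpool replace pool3 1-44 1-2     # resilver the new drive into the data slot
zpool add pool3 spare 1-44       # then put 1-44 back as the shared spare
zpool add pool4 spare 1-44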
