4

I have an 8-bay NAS running Fedora 29 (kernel version 4.20.8) and zfs version 0.7.12. All of the drive bays are used for a zfs pool named “tank.” Here is the zpool layout:

tank
  mirror-0
    sda
    sdb
  mirror-1
    sdc
    sdd
  mirror-2
    sde
    sdf
  cache
    sdg
  spare
    sdh

One of the drives (sdb) is failing SMART tests with uncorrected offline reallocated sectors, but is still reported as “ONLINE” by ‘zpool status.’ I want to physically replace sdb with a new drive (sdi).

Since there are no available physical bays, I plan to use the following to replace the drive:

 zpool offline tank sdb
 zpool replace tank sdb sdh
 zpool detach tank sdb
 echo 1 | sudo tee /sys/block/sdb/device/delete
 # Remove the physical hard drive associated with sdb and plug in new physical drive mapped to sdi

I do not know how best to proceed from here. Is it better to:

(a) just add sdi as a new spare (leaving sdh as a permanent replacement for sdb):

 zpool add tank spare sdi

(b) replace sdh with sdi and have sdh go back to the spare drive pool?

 zpool replace tank sdh sdi
 zpool detach tank sdh

In this case “better” means less administrative complexity going forward. For example, if sdh goes bad after applying option (a), would a ‘detach’ (or other) command fail or produce unexpected results because sdh used to be a spare? Also, I’m not sure whether the steps under option (b) are incomplete or incorrect.

Notes:

  1. Pool and device names are simplified (e.g., the vdevs are actually mapped IDs)
  2. I know the kernel and ZFS versions are ancient, but I am resolving this failing drive ahead of the upgrade
  3. The controller card and bays support hot swapping
  4. I searched and read topics on ZoL disk replacement as well as Oracle’s docs, but haven’t seen a topic on best practice (sorry if I missed it)
2
  • Is there a raid active on hardware level?
    – Turdie
    Nov 12 at 20:50
  • No raid at the hardware level. Thank you.
    – user489879
    Nov 12 at 20:58

2 Answers

2

There's no difference between the options when all is said and done — drives are interchangeable. "Just add sdi as the new spare" requires fewer steps and minimizes the amount of time that you spend in a resyncing state, so it's the natural choice.
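As a rough sketch of that route, the commands from the question can be strung together as below. The pool and device names (tank, sdb, sdh, sdi) are the ones used in the question; the `DRY_RUN` switch, the `run` helper and the `plan_option_a` function are hypothetical conveniences added here so that the script only prints the commands by default. It assumes the resilver onto sdh is allowed to finish before sdb is detached.

```shell
#!/bin/sh
# Sketch of option (a): promote the existing spare (sdh), then add the new
# disk (sdi) as the spare. Names come from the question; DRY_RUN, run and
# plan_option_a are hypothetical helpers, not part of ZFS.
set -eu

POOL=tank
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

plan_option_a() {
    # Take the failing disk out of service and resilver onto the spare.
    run zpool offline "$POOL" sdb
    run zpool replace "$POOL" sdb sdh

    # Check progress; assumption: wait until the resilver has completed
    # before going any further.
    run zpool status "$POOL"

    # Detaching the original disk makes sdh a permanent mirror member.
    run zpool detach "$POOL" sdb

    # (Physically swap the failed drive for the new one here.)

    # Register the new disk as the pool's hot spare.
    run zpool add "$POOL" spare sdi
}

plan_option_a
```

Running it as-is prints the plan; setting `DRY_RUN=0` (as root) would execute the commands, so it is worth reading the printed plan against `zpool status` first.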

1
  • Marking this as the answer as it works, though I prototyped a file-based zpool to confirm that subsequent maintenance (at least replacement of a failed in-use spare) wasn’t impacted by this approach. Thank you for your time.
    – user489879
    Nov 17 at 13:06
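A file-based prototype of the kind mentioned in this comment might look roughly like the following. This is not the commenter's actual test: the pool name `testtank`, the directory `/var/tmp/zfs-proto`, the image file names, and the `DRY_RUN`/`run`/`prototype_plan` helpers are all assumptions made for illustration. It uses a single mirror plus a spare, promotes the spare, and then replaces the promoted spare in turn.

```shell
#!/bin/sh
# Sketch: rehearse the spare promotion on a throwaway file-backed pool.
# All names (testtank, /var/tmp/zfs-proto, d0..d3.img) are hypothetical.
set -eu

POOL=testtank
DIR=/var/tmp/zfs-proto
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands (default), 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prototype_plan() {
    # Sparse backing files; file vdevs must be given as absolute paths.
    run mkdir -p "$DIR"
    for i in 0 1 2 3; do
        run truncate -s 128M "$DIR/d$i.img"
    done

    # One mirror (d0, d1) with d2 as the hot spare.
    run zpool create "$POOL" mirror "$DIR/d0.img" "$DIR/d1.img" spare "$DIR/d2.img"

    # Same sequence as on the real pool: d1 plays the failing disk.
    run zpool offline "$POOL" "$DIR/d1.img"
    run zpool replace "$POOL" "$DIR/d1.img" "$DIR/d2.img"
    run zpool detach "$POOL" "$DIR/d1.img"
    run zpool add "$POOL" spare "$DIR/d3.img"

    # Later maintenance: the promoted former spare (d2) now "fails".
    run zpool offline "$POOL" "$DIR/d2.img"
    run zpool replace "$POOL" "$DIR/d2.img" "$DIR/d3.img"
    run zpool detach "$POOL" "$DIR/d2.img"

    # Inspect the result, then throw the test pool away.
    run zpool status "$POOL"
    run zpool destroy "$POOL"
}

prototype_plan
```

When executing for real (`DRY_RUN=0`, as root), it is probably worth checking `zpool status` between the replace and detach steps, even though resilvering such small files should be quick.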
2

First of all, create a good backup.

Then:

1: Offline the failing drive: zpool offline tank sdb

2: Replace sdb with sdi: zpool replace tank sdb sdi

3: Detach the old drive (sdh): zpool detach tank sdh

3
  • Thank you, and that is the process I mentioned in my original question (though I did not include the pool name in the command syntax…updated that above). My question was about what follows the removal of the disk (i.e., leave the spare as the permanent replacement and add the new disk as a spare, or temporarily use the spare with a procedure to replace the faulty disk and restore the spare pool with the original spare). Please note that sdh is the newly purchased disk (sdb is the faulty disk).
    – user489879
    Nov 12 at 21:33
  • 2
    Also, please note that there is no open bay (so I can’t do a straightforward zpool replace tank sdb sdi)
    – user489879
    Nov 12 at 21:45
  • There is some confusion (typo(s)?) going on with this answer. Step 3 is detaching sdh but sdh wasn't referenced in the previous steps. Step 2 is using sdi but sdi can't exist until the failing drive is replaced, because there isn't a drive bay for it. And the question covers/explains what they are going to do about that anyways (use the spare), they are wondering what to do after that stage...
    – mbrig
    Nov 13 at 19:50
