capablanca’s three zpools were a mixture of WD Green and Red disks:

- archive had one Green, one Red disk
- public_storage had one Green, one Red disk
- storage had one Red disk

Yes, capablanca has all Western Digital disks - everyone has their favourite disk brand and mine for the longest time was WD. I’m inclined towards picking HGST or Toshiba these days due to Backblaze’s data, but the current array of WD disks is still at 100%.
However, I have had three WD Green disk failures over the years. Two had actually failed within 2 weeks of each other. Even if it is unscientific, I don’t trust them anymore.
I had done a fair amount of reorganizing my files (one step at a time, of course) and had moved / deduplicated everything on the public_storage pool. The plan was then to shuffle the disks around: I wanted to move the Red disk that public_storage was using into the archive pool and remove the Green disk.

Another objective of this shuffle was to switch to 4k blocks with ZFS. All my pools were using the smaller 512-byte sector size (ashift=9, ashift being the base-2 logarithm of the sector size) and I wanted to switch to 4096-byte sectors (ashift=12).
Yet another objective was to use /dev/diskid instead of GPT labels. I wanted the same functionality as Linux’s /dev/disk/by-id directory.
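The /dev/diskid entries come from FreeBSD’s GEOM disk_ident label class, controlled by a loader tunable; if the entries are missing on your system, the class may have been disabled. A minimal sketch via /boot/loader.conf:

```shell
# Enable /dev/diskid/DISK-* entries based on each disk's serial number
kern.geom.label.disk_ident.enable="1"
```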
So, let the great shuffle begin. The process was a little like the riddle where the farmer has to take lettuce, a rabbit, and a wolf across the river…
The first step was disk identification:
# glabel list | grep -B 3 zfs
Geom name: ada0
Providers:
1. Name: label/zfs2
Geom name: ada2
Providers:
1. Name: label/zfs1
Geom name: ada3
Providers:
1. Name: label/zfs4
Geom name: ada4
Providers:
1. Name: label/zfs3
Geom name: ada5
Providers:
1. Name: label/zfs5
# camcontrol identify /dev/ada0
serial number WD-WMC300563174
# camcontrol identify /dev/ada2
serial number WD-WMAZA9460416
# camcontrol identify /dev/ada3
serial number WD-WCC4M2773993
# camcontrol identify /dev/ada4
serial number WD-WMC300578369
# camcontrol identify /dev/ada5
serial number WD-WMAZA0686209
The zpools had the following disk configuration:
archive:
zfs1 -> ada2 -> WD-WMAZA9460416 (WDC WD20EARX)
zfs2 -> ada0 -> WD-WMC300563174 (WDC WD20EFRX)
storage:
zfs3 -> ada4 -> WD-WMC300578369 (WDC WD20EFRX)
public_storage:
zfs4 -> ada3 -> WD-WCC4M2773993 (WDC WD20EFRX)
zfs5 -> ada5 -> WD-WMAZA0686209 (WDC WD20EARS)
where the EARX and EARS models are Green disks and the EFRX models are Reds.
Here’s the plan:

- Destroy the public_storage zpool so the disks WCC4M2773993 and WMAZA0686209 can be used
- Create tank using WCC4M2773993 (Red disk)
- Attach WMAZA0686209 (Green disk) as an ashift=9 disk to archive (archive then contains 3 disks)
- rsync /archive to /tank
- Remove WMC300563174 (Red disk) from archive, zero it out (archive then contains 2 disks)
- Attach WMC300563174 (Red disk) to tank as a two disk mirror
- Rename archive to oldarchive and rename tank to archive
First, I destroyed the public_storage pool:
# zpool destroy public_storage
Now the disks were available for use.
For both disks, I zeroed out the partition, ZFS, and glabel information. In my case:
# geli detach /dev/label/zfs5
# dd if=/dev/zero of=/dev/label/zfs5 bs=1m count=10
# dd if=/dev/zero of=/dev/ada5 bs=1m seek=1907719
This gives ZFS the whole disk.
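The seek value in that second dd is just the disk’s size in 1 MiB blocks minus 10, so only the tail of the disk (where glabel stores its metadata and GPT keeps its backup header) gets zeroed. A sketch of the arithmetic, assuming the 2 TB disk’s mediasize in bytes as diskinfo would report it:

```shell
# mediasize in bytes of the 2 TB disk (assumed; `diskinfo -v /dev/ada5` reports it)
mediasize=2000398934016
# disk size in whole 1 MiB blocks, matching dd's bs=1m
blocks=$((mediasize / 1048576))
# start zeroing 10 MiB before the end of the disk
seek=$((blocks - 10))
echo "$seek"   # prints 1907719, the value used above
```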
Now, /dev/diskid has the same functionality as /dev/disk/by-id in Linux: entries show up as /dev/diskid/DISK-WD-WCC4M2773993 in the case of one of the Red disks. But entries in /dev/diskid have the odd quirk of disappearing when referred to by another name. For example, if I used /dev/ada3 instead of /dev/diskid/DISK-WD-WCC4M2773993, then the DISK-WD-WCC4M2773993 entry would disappear and I would need to reboot.
So, I rebooted, then took the Red disk from the public_storage pool and made the tank zpool:
# zpool create tank /dev/diskid/DISK-WD-WCC4M2773993.eli
# zpool status tank
pool: tank
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
diskid/DISK-WD-WCC4M2773993.eli ONLINE 0 0 0
errors: No known data errors
I verified that tank is using the correct sector size:
# zdb -C tank | grep ashift
ashift: 12
Note that this requires using a sector size of 4096 with geli init.
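ZFS picks the ashift from the sector size the .eli provider advertises, which is why geli has to be told about 4k sectors up front. A hedged sketch of creating such a provider (the passphrase/keyfile options actually used aren’t shown in this post, so this is the bare minimum form):

```shell
# -s 4096: the encrypted provider advertises 4096-byte sectors,
# so `zpool create` on the .eli device selects ashift=12
geli init -s 4096 /dev/diskid/DISK-WD-WCC4M2773993
geli attach /dev/diskid/DISK-WD-WCC4M2773993
```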
Next, I added public_storage’s Green disk WMAZA0686209 to archive, and waited for the resilver:
# zpool attach archive label/zfs1.eli /dev/diskid/DISK-WD-WMAZA0686209.eli
# zpool status archive
pool: archive
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Tue Dec 13 00:35:38 2016
3.48M scanned out of 1.30T at 396K/s, (scan is slow, no estimated time)
3.28M resilvered, 0.00% done
config:
NAME STATE READ WRITE CKSUM
archive ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
label/zfs1.eli ONLINE 0 0 0
label/zfs2.eli ONLINE 0 0 0
diskid/DISK-WD-WMAZA0686209.eli ONLINE 0 0 0 (resilvering)
Once done, I rsynced all the data from /archive to /tank. Those following along: don’t skip this step, it’s important :)
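The exact rsync invocation isn’t shown above; a minimal sketch with flags of my choosing (not necessarily the author’s):

```shell
# -a: archive mode (permissions, times, symlinks); -H: preserve hard links
# The trailing slashes copy the *contents* of /archive into /tank
rsync -aH /archive/ /tank/
```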
Then, I removed zfs2 (the Red disk) from archive:
# zpool offline archive label/zfs2.eli
# zpool detach archive label/zfs2.eli
# zpool status archive
pool: archive
state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on software that does not support feature
flags.
scan: resilvered 1.30T in 10h20m with 0 errors on Tue Dec 13 10:56:27 2016
config:
NAME STATE READ WRITE CKSUM
archive ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
label/zfs1.eli ONLINE 0 0 0
diskid/DISK-WD-WMAZA0686209.eli ONLINE 0 0 0
At this point, using /dev/label/zfs2 had made the corresponding /dev/diskid entry disappear, so I rebooted.
I attached WMC300563174 (zfs2) to tank:
# zpool attach tank diskid/DISK-WD-WCC4M2773993.eli diskid/DISK-WD-WMC300563174.eli
# zpool status tank
pool: tank
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Dec 14 13:00:29 2016
4.97M scanned out of 1.51T at 1018K/s, 442h5m to go
4.67M resilvered, 0.00% done
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
diskid/DISK-WD-WCC4M2773993.eli ONLINE 0 0 0
diskid/DISK-WD-WMC300563174.eli ONLINE 0 0 0 (resilvering)
errors: No known data errors
After the resilver, zpool status showed:
pool: archive
state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on software that does not support feature
flags.
scan: resilvered 1.30T in 10h20m with 0 errors on Tue Dec 13 10:56:27 2016
config:
NAME STATE READ WRITE CKSUM
archive ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
label/zfs1.eli ONLINE 0 0 0
diskid/DISK-WD-WMAZA0686209.eli ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: resilvered 1.51T in 5h19m with 0 errors on Wed Dec 14 18:19:35 2016
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
diskid/DISK-WD-WCC4M2773993.eli ONLINE 0 0 0
diskid/DISK-WD-WMC300563174.eli ONLINE 0 0 0
errors: No known data errors
Resilvering tank took half the time for the exact same data! Possible explanations for this include the 4k sectors (ashift=12 instead of ashift=9) and the fact that tank’s resilver wrote to a Red disk while archive’s wrote to a slower Green one. I didn’t investigate further.
At this point /archive and /tank contained the same data. I renamed archive to oldarchive, and tank to archive, by simply exporting each pool and importing it with a different name:
# zpool export archive
# zpool export tank
# zpool import archive oldarchive
# zpool import tank archive
# zpool status
pool: archive
state: ONLINE
scan: resilvered 1.51T in 5h19m with 0 errors on Wed Dec 14 18:19:35 2016
config:
NAME STATE READ WRITE CKSUM
archive ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
diskid/DISK-WD-WCC4M2773993.eli ONLINE 0 0 0
diskid/DISK-WD-WMC300563174.eli ONLINE 0 0 0
errors: No known data errors
pool: oldarchive
state: ONLINE
status: The pool is formatted using a legacy on-disk format. The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
pool will no longer be accessible on software that does not support feature
flags.
scan: resilvered 1.30T in 10h20m with 0 errors on Tue Dec 13 10:56:27 2016
config:
NAME STATE READ WRITE CKSUM
oldarchive ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
label/zfs1.eli ONLINE 0 0 0
diskid/DISK-WD-WMAZA0686209.eli ONLINE 0 0 0
All that is left to do now is destroy oldarchive, and either repurpose those disks for something else or recycle them.