
GELI fail (or how I almost lost capablanca's zpools)
Tags: [freebsd] [capablanca] [zfs] [defcon 1]
Published: 27 Jan 2017 09:21

After the last post I decided to perform a FreeBSD upgrade on capablanca. The upgrade itself went smoothly, and I performed the standard shutdown sequence: zpool export, geli detach, and reboot. When it came time to bring the zpools back up, however, the shell script that attaches each disk and imports each zpool stopped at the first disk:

+ geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/diskid/DISK-WD-WCC4M2773993
Enter passphrase: 
geli: Wrong key for diskid/DISK-WD-WCC4M2773993
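
For context, the bring-up script is essentially a loop like the following (a minimal sketch - the real script isn't shown here, but the + trace above is it running under sh -x, and the -e is what makes it stop at the first failure):

#!/bin/sh -ex
# Attach each disk's GELI provider, then import the pool.
# Sketch only; the serials are the archive mirror members from this post.
for serial in WD-WCC4M2773993 WD-WMC300563174; do
    geli attach -k /root/geli_key_${serial} /dev/diskid/DISK-${serial}
done
zpool import archive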

I retried the passphrase many times and nothing worked. Not something you ever want to see.

Luckily the second disk worked!

NAME                                 STATE     READ WRITE CKSUM
archive                              DEGRADED     0     0     0
  mirror-0                           DEGRADED     0     0     0
    11764393544983926301             UNAVAIL      0     0     0  was /dev/diskid/DISK-WD-WCC4M2773993.eli
    diskid/DISK-WD-WMC300563174.eli  ONLINE       0     0     0

I was saved by the two-disk mirror. I went through my original process of reinitializing:

# zpool detach archive diskid/DISK-WD-WCC4M2773993.eli
# geli init -l 256 -s 4096 -K /root/geli_key_WD-WCC4M2773993 /dev/diskid/DISK-WD-WCC4M2773993
# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/diskid/DISK-WD-WCC4M2773993
# zpool attach archive diskid/DISK-WD-WMC300563174.eli diskid/DISK-WD-WCC4M2773993.eli

Everything looked OK:

  pool: archive
 state: ONLINE
  scan: resilvered 1.55T in 11h7m with 0 errors on Wed Jan 11 11:41:24 2017
config:

        NAME                                 STATE     READ WRITE CKSUM
        archive                              ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            diskid/DISK-WD-WMC300563174.eli  ONLINE       0     0     0
            diskid/DISK-WD-WCC4M2773993.eli  ONLINE       0     0     0

But the original geli attach failure left me uneasy. This has never happened before - FreeBSD has always been solid for me.

I started repurposing the disks of oldarchive (see previous post) by adding them to the archive and storage pools:

# geli init -l 256 -s 4096 -K /root/geli_key_WD-WMAZA0686209 /dev/diskid/DISK-WD-WMAZA0686209
# geli attach -k /root/geli_key_WD-WMAZA0686209 /dev/diskid/DISK-WD-WMAZA0686209
# cat /dev/zero > /dev/diskid/DISK-WD-WMAZA0686209.eli
# zpool attach archive diskid/DISK-WD-WMC300563174.eli diskid/DISK-WD-WMAZA0686209.eli

This went OK as well:

  pool: archive
 state: ONLINE
  scan: resilvered 1.55T in 6h15m with 0 errors on Wed Jan 11 23:28:10 2017
config:

        NAME                                 STATE     READ WRITE CKSUM
        archive                              ONLINE       0     0     0
          mirror-0                           ONLINE       0     0     0
            diskid/DISK-WD-WMC300563174.eli  ONLINE       0     0     0
            diskid/DISK-WD-WCC4M2773993.eli  ONLINE       0     0     0
            diskid/DISK-WD-WMAZA0686209.eli  ONLINE       0     0     0

Before fully detaching every disk, though, I decided to test detaching and re-attaching each one individually. If each re-attachment went OK, I would feel more comfortable.

# geli detach /dev/diskid/DISK-WD-WCC4M2773993
# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/diskid/DISK-WD-WCC4M2773993
Enter passphrase:
# 

# geli detach /dev/diskid/DISK-WD-WMC300563174
# geli attach -k /root/geli_key_WD-WMC300563174 /dev/diskid/DISK-WD-WMC300563174
Enter passphrase:
geli: Wrong key for diskid/DISK-WD-WMC300563174
# 

It happened again!

  pool: archive
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 1.55T in 6h15m with 0 errors on Wed Jan 11 23:28:10 2017
config:

    NAME                                 STATE     READ WRITE CKSUM
    archive                              DEGRADED     0     0     0
      mirror-0                           DEGRADED     0     0     0
        7825631058441138981              UNAVAIL      0     0     0  was /dev/diskid/DISK-WD-WMC300563174.eli
        diskid/DISK-WD-WCC4M2773993.eli  ONLINE       0     0     0
        diskid/DISK-WD-WMAZA0686209.eli  ONLINE       0     0     0

And WMC300563174 was the disk that saved the zpool last time!

I didn’t know what to think. Had I been using these tools wrong all these years? I didn’t trust myself anymore.

Before doing anything else, I went out and bought a 2TB external HDD, formatted it as LUKS + ext4, and rsynced all the data over just to be safe.
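
For the curious, the backup amounted to something like the following, run on a Linux box (a sketch - the device name and paths here are hypothetical):

# cryptsetup luksFormat /dev/sdb
# cryptsetup luksOpen /dev/sdb backup
# mkfs.ext4 /dev/mapper/backup
# mount /dev/mapper/backup /mnt/backup
# rsync -a /archive/ /mnt/backup/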

I ran through the potential reasons for the passphrase not working, then tried to reproduce the errors. First, I created a fake GELI disk on a zvol and detached and attached it a few times:

# zfs create -V 1G archive/testdisk
# geli init -l 256 -s 4096 -K /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk
# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk
Enter passphrase:
# 
# geli detach /dev/zvol/archive/testdisk
# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk
Enter passphrase:
#

OK, so what about the theory that GELI was erasing its own metadata?

# cat /dev/zero > /dev/zvol/archive/testdisk.eli
# geli detach /dev/zvol/archive/testdisk
# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk
Enter passphrase: 
#

Didn’t seem like it, and again that would be a pretty bad bug.

I tried GELI + zpool:

# zfs create -V 100M archive/testdisk

# geli init -l 256 -s 4096 -K /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk 
Enter new passphrase: 
Reenter new passphrase: 

Metadata backup can be found in /var/backups/zvol_archive_testdisk.eli and
can be restored with the following command:

    # geli restore /var/backups/zvol_archive_testdisk.eli /dev/zvol/archive/testdisk

# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk
Enter passphrase: 

# zpool create test /dev/zvol/archive/testdisk.eli

# zpool status test
  pool: test
 state: ONLINE
  scan: none requested
config:

    NAME                         STATE     READ WRITE CKSUM
    test                         ONLINE       0     0     0
      zvol/archive/testdisk.eli  ONLINE       0     0     0

errors: No known data errors

# dd if=/dev/random of=/test/random
dd: /test/random: No space left on device
97282+0 records in
97281+0 records out
49807872 bytes transferred in 9.445392 secs (5273246 bytes/sec)

# zpool export test

# geli detach /dev/zvol/archive/testdisk

# geli attach -k /root/geli_key_WD-WCC4M2773993 /dev/zvol/archive/testdisk
Enter passphrase: 
#

# zpool import test
# ls /test/
random

Seems OK; however, this test was not performed on real hardware.

So - nothing definitive.

I decided to change a few things at once (based only on my theories about what went wrong, not on anything concrete): partition the disks instead of handing GELI the whole device, and stop keeping key files, relying on the passphrase alone.

There are two additional reasons for partitioning the disks: I wanted to hand GELI an aligned partition on which it could lay out 4k sectors, and I wanted to use GPT labels instead of diskid. GPT labels are stored in the partition metadata, so they will not be overwritten by processes that are given only the partition to work with. I also didn’t like the fragility of diskid - one wrong write to /dev/adX and they disappear.

I also wanted to reduce the potential for losing the zpools due to being unable to access the key files. What if zroot disappeared?
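
One related safeguard worth noting: geli init drops a metadata backup under /var/backups (you can see it in the output below), and geli backup / geli restore can keep a copy of that metadata off the machine entirely. A sketch, using one of the GPT labels created below and a hypothetical destination:

# geli backup /dev/gpt/WCC4M2773993p1 /mnt/usb/WCC4M2773993p1.backup
# geli restore /mnt/usb/WCC4M2773993p1.backup /dev/gpt/WCC4M2773993p1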

Here’s the procedure for each disk: detach it from the pool and from GELI, wipe the start and end of the disk to destroy the old whole-disk GELI metadata, create a GPT scheme with a single 4k-aligned freebsd-zfs partition labeled with the disk’s serial number, geli init the partition with a passphrase only (no key file), attach it, and attach the resulting .eli provider to the pool.

Example:

# zpool detach archive diskid/DISK-WD-WCC4M2773993.eli
# geli detach diskid/DISK-WD-WCC4M2773993.eli
# glabel status | grep WCC4M2773993
               diskid/DISK-WD-WCC4M2773993     N/A  ada3
# dd if=/dev/zero of=/dev/ada3 bs=1m count=10
10+0 records in
10+0 records out
10485760 bytes transferred in 0.112651 secs (93081728 bytes/sec)
# dd if=/dev/zero of=/dev/ada3 bs=1m seek=1907719
dd: /dev/ada3: short write on character device
dd: /dev/ada3: end of device
11+0 records in
10+1 records out
10575872 bytes transferred in 0.153967 secs (68689256 bytes/sec)
# gpart create -s gpt ada3
ada3 created
# gpart add -a 4k -t freebsd-zfs -l WCC4M2773993p1 ada3
ada3p1 added
# geli init -s 4096 -l 256 /dev/gpt/WCC4M2773993p1 
Enter new passphrase: 
Reenter new passphrase: 

Metadata backup can be found in /var/backups/gpt_WCC4M2773993p1.eli and
can be restored with the following command:

    # geli restore /var/backups/gpt_WCC4M2773993p1.eli /dev/gpt/WCC4M2773993p1

# geli attach /dev/gpt/WCC4M2773993p1
Enter passphrase: 
# zpool attach archive diskid/DISK-WD-WMAZA0686209.eli /dev/gpt/WCC4M2773993p1.eli 
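
For reference, the second dd in that session wipes the last ~10MB of the disk, which is where the old whole-disk GELI metadata lives (GELI stores its metadata in the provider’s last sector). These are 2TB disks - 2000398934016 bytes, which diskinfo(8) reports and which is consistent with the dd output above (1907719 1m blocks seeked plus 10575872 bytes written) - so the seek offset is just:

# echo $((2000398934016 / 1048576 - 10))
1907719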

After doing this with all the disks, zpool status looks like:

# zpool status archive
  pool: archive
 state: ONLINE
  scan: scrub repaired 0 in 8h37m with 0 errors on Tue Jan 17 08:45:57 2017
config:

    NAME                        STATE     READ WRITE CKSUM
    archive                     ONLINE       0     0     0
      mirror-0                  ONLINE       0     0     0
        gpt/WMC300563174p1.eli  ONLINE       0     0     0
        gpt/WCC4M2773993p1.eli  ONLINE       0     0     0
        gpt/WMAZA0686209p1.eli  ONLINE       0     0     0

errors: No known data errors

So test it:

# zpool export archive

# geli detach gpt/WMC300563174p1.eli
# geli attach gpt/WMC300563174p1
Enter passphrase: 
#
# geli detach gpt/WCC4M2773993p1.eli
# geli attach gpt/WCC4M2773993p1
Enter passphrase: 
# 
# geli detach gpt/WMAZA0686209p1.eli
# geli attach gpt/WMAZA0686209p1
Enter passphrase: 
# 
# zpool import archive

All the reattachments worked. By repurposing the two disks from oldarchive, archive is now a three-disk mirror and storage is a two-disk mirror.

This configuration has survived a few reboots since, so I’m comfortable with the setup again. Still, the episode has shaken my confidence in either FreeBSD or my understanding of the tools. Given that capablanca is my primary file server, drives dying or zpools going away could mean actual data loss if the newest data hasn’t been replicated yet.

I still don’t know why this happened. I’ve been using whole-disk GELI + zpool since before 2013 without problems. I don’t think the FreeBSD update itself did anything, but I can’t rule it out completely - it was too coincidental that I updated and saw this shortly afterwards.

Unfortunately none of my attempts to reproduce the problem have yielded anything.