ZFS - Another Introduction

I don't think the world needs another intro/tutorial about ZFS. But writing about a subject helps me think about it, and particularly to put those thoughts in order and make connections between them. And I hope this can help people who ... well, think like me.

History

ZFS began life as a file system for Solaris in 2001, where it gained a dedicated following. Wikipedia points out (as does OpenZFS's own lead-in) that it's actually both a file system and a volume manager - an important distinction, and part of what makes it so powerful. It was open-sourced in 2005 under the CDDL ("Common Development and Distribution License," from Sun), which is still the license OpenZFS is distributed under. Oracle close-sourced its version again after acquiring Sun in 2010, so the OpenZFS project is now an open, independent fork. The CDDL licensing leads to some interesting problems with the Linux port: as Wikipedia's CDDL page puts it, "The Free Software Foundation (FSF) considers it a free software license, but one which is incompatible with the GNU General Public License (GPL)." As a result, ZFS can't be merged into the Linux kernel itself, and most distributions don't ship it pre-built (Ubuntu is the notable exception): you have to install it afterward, and on Debian that means building the kernel module on your own machine. And while it's highly recommended for your data partitions, support isn't entirely there for OS/boot partitions.

Installation

Installing ZFS on Debian was surprisingly easy (I would strongly recommend you NOT use Debian's documentation on the subject, as its instructions on creating new FSs are very poor for beginners). The package lives in Debian's contrib section, so contrib needs to be enabled in your apt sources. After that, all that's needed is:

# apt update && apt install zfsutils-linux

This installed seven packages (that will vary by system, depending on what you already have installed) and gave me a text-based warning that included: "You are going to build OpenZFS using DKMS in such a way that they are not going to be built into one monolithic binary." No work was required on my part: the package install did everything needed, although it did take a while for the build/install to finish.
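
Once the build finishes, it's worth confirming that the kernel module actually loaded before going any further. What follows is only a minimal sketch, assuming a Linux system that lists loaded modules in /proc/modules. The function name module_loaded and its optional file argument are made up for illustration (the file argument just lets you point it at a sample file):

```shell
#!/bin/sh
# Succeeds if the named kernel module appears in /proc/modules
# (or in another file with the same layout, given as the second argument).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Typical use after the package install, as root:
#   module_loaded zfs || modprobe zfs
#   zfs version    # prints userland and kernel module versions (OpenZFS 0.8+)
```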

Concepts

Before you create your first ZFS file system, it's a good idea to understand the principles ZFS operates on. For that, I highly recommend this documentation: https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/ - yes, it's from 2012, and as a result there are some inaccuracies caused by changes over time (most notably on encryption and compression, and likely package names). But it's still the recommended source for understanding the underlying concepts of the file system.

Create a Volume

Now that you've ignored my recommendation to go do some more reading, let's create a file system. First, look at the output of ls -l /dev/disk/by-path/ or better (in my opinion) ls -l /dev/disk/by-id/. You can use references like /dev/sdb7 or similar ... but as Linux occasionally changes the drive letters at boot, you're laying yourself open to major problems down the line. That's why it's recommended you use the ID or Path references to the drives, which shouldn't change.
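
If you already know a drive by its kernel name, the by-id directory can be searched for the stable names that point at it. This is a rough sketch rather than a polished tool: the function name stable_names_for is invented here, and it assumes GNU readlink (for the -f option). The optional second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# List every stable name in a by-id style directory that points at a given
# kernel device name (e.g. sdb7). The directory defaults to /dev/disk/by-id.
stable_names_for() {
    dev=$1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        target=$(readlink -f "$link") || continue
        if [ "$(basename "$target")" = "$dev" ]; then
            basename "$link"
        fi
    done
}

# Example: stable_names_for sdb7
```

A single partition often has more than one stable name (an ata- name and a wwn- name, for instance), so expect several lines of output.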

So far, the only two ZFS commands I've found myself using are zpool for volume management, and zfs for file system management - although both have a plethora of subcommands.

I used two same-size partitions to create a mirrored volume (in ZFS terms, a "pool"):

# zpool create share mirror ata-SAMSUNG_HD133SI_S1Y5J91S533453-part1 ata-ST31500340AS_9VS487N1-part7

Recommendations online say that you should use raw disks rather than partitions if possible - but I only read that after I set this up, and one of the two drives is significantly larger than the other so I could only have used one raw. This is nevertheless fairly good and gave me a mount point called /share that was a 1TB ZFS volume, mirrored. I'll add that "share" is a name of my choice, while a lot of people - kind of the de facto standard - call that volume "tank." Your "tank" can also be subdivided to host several file systems, but aside from using mirroring, I chose the simplest case and just used all the space as one FS.
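
For the subdivided case, each extra file system is one zfs create call against the pool. The sketch below wraps that in a small loop. The function name create_datasets and the file system names photos and backups are invented for illustration; the pool name share is the one created above.

```shell
#!/bin/sh
# Create one child file system per name under an existing pool.
# Each one is mounted beneath the pool's mount point by default,
# e.g. share/photos at /share/photos.
create_datasets() {
    pool=$1
    shift
    for name in "$@"; do
        zfs create "$pool/$name" || return 1
    done
}

# Example, as root ("photos" and "backups" are made-up names):
#   create_datasets share photos backups
#   zfs list -r share
```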

Destroy a Volume

Because I initially read the bad Debian documentation, I ended up with a messed up set of file systems and volumes (the zpool command given above is reasonably accurate, and not the cause of the problems). That left me needing to start over. To remove a file system, there's the zfs destroy ... command, but I wanted to get rid of the entire ZFS volume, not just the file system on it. For that we have zpool destroy .... Let's take a look at the setup left by Debian's complex and unclear guidance (and get rid of it):

root@zserver:/tank# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank   928G   178K   928G        -         -     0%     0%  1.00x    ONLINE  -
root@zserver:/tank# zfs list
NAME         USED  AVAIL     REFER  MOUNTPOINT
tank         178K   899G       24K  /tank
tank/share    24K   899G       24K  /share
root@zserver:/tank# zfs unmount tank/share
root@zserver:/tank# cd ..
root@zserver:/# zfs unmount tank
root@zserver:/# zpool destroy tank

Notice I didn't use Linux's standard umount command: it appears it would work (I haven't tested), but not all Linux utilities react to ZFS volumes quite as you might expect. The one that got me is lsblk: it shows the disks and partitions that ZFS is on, but not the mount points ZFS creates. It took me a while to realize that this actually makes sense: lsblk lists block devices, and a ZFS file system isn't one. It's mounted from the pool rather than from any single partition, so lsblk has no mount point to show against the underlying partitions. If I were more familiar with LVM or Software RAID I'd be used to this, but until now I've used them very little.
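
Since lsblk won't show them, one way to see where ZFS file systems are mounted is to read the kernel's mount table and keep only the entries of type zfs. A small sketch, assuming Linux's /proc/mounts; the function name zfs_mounts is made up, and its optional argument only exists so it can be run against a sample file:

```shell
#!/bin/sh
# Print "dataset on mountpoint" for every mounted ZFS file system, read from
# the kernel's mount table (fields: source, mount point, type, options, ...).
zfs_mounts() {
    awk '$3 == "zfs" { print $1 " on " $2 }' "${1:-/proc/mounts}"
}

# Related commands that answer the same question:
#   zfs mount         # with no arguments, lists mounted ZFS file systems
#   findmnt -t zfs    # util-linux view, filtered by file system type
```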

Features

ZFS has a huge feature set. My commentary here is based almost entirely on reading (and probably not enough of that) rather than personal experience. Encryption is kind of expected these days, but it's good that they have it. On-the-fly compression is very cool - particularly given that everyone says it's "cheap" (low memory use and very fast), so you should switch it on. De-duplication is slow and expensive, and no one seems to recommend it. Snapshots are, again, very cheap and highly recommended. There are quotas. There are RAID-equivalents to striping and mirroring, parity RAID in the form of RAID-Z, and pretty much any mix of these. End-to-end checksumming detects corrupted data on any pool, although ZFS can only repair it automatically if you have redundant disks - and if you do, it appears to be better than standard RAID. And best of all, the send and receive commands, which allow you to send snapshots to an image file or to another system to be unpacked as an FS there, greatly simplifying backups.
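
To make a few of these concrete, here is a hedged sketch of switching on compression and of saving a snapshot to an image file. The function names, the snapshot label, the paths, and the host name otherhost are all invented for illustration; the zfs subcommands themselves (set, snapshot, send, receive) are the standard ones.

```shell
#!/bin/sh
# Turn on lz4 compression for a dataset. It applies to data written from
# now on; files already on disk are not recompressed.
enable_compression() {
    zfs set compression=lz4 "$1"
}

# Take a snapshot named dataset@label and write it to a stream file.
snapshot_to_file() {
    dataset=$1
    label=$2
    outfile=$3
    zfs snapshot "$dataset@$label" || return 1
    zfs send "$dataset@$label" > "$outfile"
}

# Example, as root (the label and paths are made up):
#   enable_compression share
#   snapshot_to_file share 2024-05-01 /backup/share-2024-05-01.zfs
# To restore from the file, or to copy to another machine instead:
#   zfs receive share/restored < /backup/share-2024-05-01.zfs
#   zfs send share@2024-05-01 | ssh otherhost zfs receive tank/share
```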

The zpool scrub ... command takes the place of the fsck family of commands - and it works on a live, mounted file system, which fsck generally can't do safely. That's a huge step forward.
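
A scrub is started per pool and runs in the background, so checking on it is a separate step. The sketch below is one possible way to script that; the function names are invented. It relies on zpool status -x, which prints a single all-clear line when no pool has problems.

```shell
#!/bin/sh
# Kick off a scrub. The command returns right away; the scrub itself
# keeps running in the background while the pool stays in use.
start_scrub() {
    zpool scrub "$1"
}

# Succeeds only if ZFS reports no problems on any pool.
pools_healthy() {
    [ "$(zpool status -x)" = "all pools are healthy" ]
}

# Example, as root:
#   start_scrub share
#   zpool status share    # shows scrub progress and any errors found
#   pools_healthy || echo "check zpool status"
```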

All of which sounds great. Are there problems? Yes. The fuller the file system is, the less efficiently ZFS behaves. The number I remember is 80% (I could be wrong about this, don't take it as gospel): if your FS is more than 80% full, ZFS will perform progressively worse. As previously mentioned, it's not really ready to be your root partition in Linux (particularly given that most Linux-based OSes don't ship with it). And finally, ZFS really, really likes to have some memory to play in. I think the ZFS docs said it was okay in 2GB of memory, but general opinion is that you really want 8GB of memory if you're using ZFS. While this is fairly common on machines these days, it isn't necessarily true of that box you had sitting around that you decided to turn into a NAS ...
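
Whatever the right number turns out to be, it's easy to keep an eye on how full each pool is. A minimal sketch, using the 80% figure above purely as a default (it is the half-remembered number, not an official limit); the function name pools_over is invented:

```shell
#!/bin/sh
# Print every pool whose capacity is above a percentage limit (default 80).
# "zpool list -H" drops the header and separates columns with tabs.
pools_over() {
    limit=${1:-80}
    zpool list -H -o name,capacity |
    while read -r name cap; do
        pct=${cap%"%"}                # strip the trailing percent sign
        case $pct in
            ''|*[!0-9]*) continue ;;  # skip anything that is not a number
        esac
        if [ "$pct" -gt "$limit" ]; then
            echo "$name is ${pct}% full"
        fi
    done
}

# Example: pools_over 80
```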

Two things I've noticed that I think qualify more as "quirks" than "problems": pools can be grown, but for the most part they can't be shrunk. And there's the issue of space sharing between file systems in a pool: every file system in a pool reports the whole pool's free space as available to it, while its used figure counts only its own files. Nothing in a file system's own numbers tells you that the same free space is also promised to its siblings. I find this deeply misleading.
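
The effect is easy to see by adding up the AVAIL column. The sketch below does that and compares it with the figure for the pool's root file system. It assumes no quotas or reservations are set, and that the root file system is the first line of the listing (the default sort is by name); the function name avail_report is invented.

```shell
#!/bin/sh
# Compare the naive sum of every file system's AVAIL with the free space
# the pool really has. "-p" prints exact byte counts instead of 899G.
avail_report() {
    zfs list -H -p -o name,available -r "$1" |
    awk -F '\t' '
        NR == 1 { shared = $2 }    # first line is the pool root file system
        { sum += $2 }
        END {
            gib = 1024 * 1024 * 1024
            printf "file systems: %d\n", NR
            printf "sum of AVAIL: %.0f GiB\n", sum / gib
            printf "actually free: %.0f GiB\n", shared / gib
        }
    '
}

# Example: avail_report tank
```

Run against the tank pool listed earlier, with two file systems each showing 899G available, the sum comes to 1798 GiB even though only 899 GiB is really free.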

A final, disappointing note: I thought I was going to attach a couple of USB drives to a Raspberry Pi, mirror them with ZFS, and use the result as a network share. First, default Raspberry Pi OS balks at installing the "zfsutils-linux" package, claiming it doesn't exist. Second, Jeff Geerling says "ZFS does not enjoy USB drives, though it can work on them. I wouldn't really recommend ZFS for the Pi 4 model B or other Pi models that can't use native SATA, NVMe, or SAS drives." Since Geerling seems to be obsessively testing everything related to the Raspberry Pi, I listen when he says things like this.

Even with all these limitations in mind, I'm planning on making this my FS of choice for all (non-Raspberry Pi) data partitions going forward.