I recently came across a magical little project called buttervolume, a Docker volume plugin that manages your Docker (and potentially Podman) volumes as BTRFS subvolumes. It solves a problem I’ve had with OCI container-hosted workloads for years: migrating centralized storage volumes between discrete systems.
VM orchestrators handle volume migration better…
…or I guess, more accurately, they can actually do it. I’ve been hosting services for almost my entire career. I cut my teeth on Linux and web server hosting at a local ISP and hosting provider just out of high school. And yet, I still remember the magic I experienced with my first VMware vSphere deployment. Not only was it possible to easily host multiple servers on the same system, it was possible to move those virtual servers between hosts. Gone were the days when hardware maintenance meant extended downtime. Server migration between hosts was a quick and easy process. And it was good.
And then those newfangled OCI containers started to become a thing. Now we could easily ship container images and entire application stack configurations directly to a system. It was cool! However, OCI container orchestration doesn’t have great tooling for stateful workload portability. Horizontally scaling stateless applications across nodes is a first-class citizen. Moving stateful systems, though, such as database servers, is a frustrating exercise.
Most OCI container orchestrators, for all their fantastic benefits, assume that if you want to migrate stateful workloads between systems, you’re using decentralized or hyper-converged storage. At the enterprise and hyperscaler level, that’s a fair assumption. However, at the small to medium business level, including us self-hosters, decentralized or hyper-converged storage platforms are probably a layer of complexity that gives you very little in exchange for a significant increase in cost and management burden. The only OCI container orchestrator I’m aware of that can handle volume migration is Incus. As an aside, I was working on a project to build some of the service and configuration discovery pieces I depend on around Incus. That ended up being more project than I could reasonably take on, however, and progress stalled. I still very much like the Incus project, and would love to find the right project to use it for.
Container stacks in VMs: great for portability, bad for maintenance
When I first started leveraging OCI containers, I treated them essentially as convenient methods for deploying appliances. I started with a One VM Per Application Stack design, where an application and its dependent services were all colocated on one system. This simplified maintenance orchestration for the physical hosts, but came with a great deal of management overhead. At the time, Portainer was in its infancy, and its per-host licensing costs would have quickly shot this idea in the foot. Very few other Docker orchestrators existed, and many of those have since been abandoned. We were in the death throes of Swarm V1, and companies like Rancher, who built their original product atop Docker, had already swapped over to Kubernetes orchestration.
This meant that I was doing container and OS maintenance either by hand or with Ansible. Additionally, I was paying the resource cost of multiple VMs in both efficiency and administration. VMware was doing a good job of coalescing identical RAM pages, but I was still dealing with a litany of small 2-core, 2 GB RAM virtual machines, each running its own kernel and each needing OS and application patches. So when I started to play around with a new hosting model in my homelab, I decided to eschew all of that and run my applications on bare metal.
Colocated container stacks: great deployment and maintenance experience, at the cost of host maintenance
The workflow efficiencies at the application layer were astounding. I could now take advantage of awesome container orchestrators like Portainer, and eventually Dockhand (the player I’m the most interested in these days). I could start leveraging GitOps for my application stack deployments. Traefik’s Docker-powered service discovery allowed me to keep my application router config colocated with the rest of my stack config, giving me TLS for every service I hosted. It was great! Right up until the services in the homelab became “home prod”.
If I’m being honest with myself, I could tolerate some downtime for regular maintenance like OS patching. No one was looking to me to provide five nines of uptime. And hey, if barely one nine of uptime is good enough for GitHub, it should be good enough for any of us, right? That said, all it took was one maintenance event going sideways and violating the partner approval SLO for me to pine for the days when moving workloads between hosts was easy again.
Gotta be a way to have my cake and eat it too though, yeah?
For a few years now, I’ve been scrounging around for a way to solve this problem, and I’ve made very little headway. None of the container orchestrators I’ve found solve it, because they’ve all concluded that migrating stateful workloads backed by host storage simply isn’t a problem they care to solve. VM orchestrators like Proxmox, Incus, XCP-ng, Nutanix, and others support this use case, largely, IMO, because it’s a “brass tacks” feature of VM hosting orchestration.
Kubernetes? Run pv-migrate, and enjoy a logical copy of your data with no snapshotting/checkpointing of the underlying volume. Docker and Podman? Same situation, my friend, but without the convenient tooling, which means you’re spooling up a temporary work container to tarball your volumes, schlepping the data to the target host, and then restoring it. Nomad? Nothing that I could find, although Nomad operates in a more node-agnostic fashion, so the answer is probably to lean on whatever solution exists for the container runtime underneath it.
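For reference, here’s roughly what that Docker tarball dance looks like when scripted. This is a minimal sketch, not a tool I’m recommending. The `migrate_volume_tarball` and `run` functions are hypothetical. The sketch also assumes:

- the volume already exists on the target,
- the `alpine` image is available on both hosts,
- scp/SSH access to the target works,
- the same work directory exists on both ends.

Set `DRY_RUN=1` to print the commands instead of running them.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# migrate_volume_tarball VOLUME TARGET_HOST [WORKDIR]
# Hypothetical sketch of the manual tar-based migration of a named volume.
migrate_volume_tarball() {
    vol=$1
    target=$2
    workdir=${3:-/tmp}
    archive="$vol.tar.gz"

    # 1. Throwaway container mounts the volume read-only and tars it into WORKDIR.
    run docker run --rm -v "$vol:/data:ro" -v "$workdir:/backup" alpine \
        tar -C /data -czf "/backup/$archive" . || return 1

    # 2. Schlep the archive to the target host.
    run scp "$workdir/$archive" "$target:$workdir/$archive" || return 1

    # 3. Restore it into the existing volume on the target host.
    run ssh "$target" docker run --rm -v "$vol:/data" -v "$workdir:/backup" alpine \
        tar -C /data -xzf "/backup/$archive"
}
```

Note that this is still a logical copy, the same as pv-migrate. Nothing stops the source from being written to while `tar` runs, so you’d want the stack stopped first.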
None of these orchestrator-backed solutions come anywhere close to the experience I was looking for. But I had an idea of how it could be solved. In my opinion, the ideal solution would make snapshotting and synchronization of container volumes a first-class citizen by taking advantage of advanced filesystems that support these features natively.
I had been hoping to find a solution built around ZFS. ZFS is a venerable, battle-tested filesystem with RAID-esque multi-disk array support. More importantly, it has the concept of datasets, a storage abstraction that enables individual tuning for target workloads. It can also create filesystem snapshots, with facilities to send those snapshots to other systems. Docker and Podman both support ZFS as a storage driver, and that functionality is useful. However, it is limited in scope to handling the unique characteristics of OCI images, and doesn’t extend to volumes any of the more advanced features my desired architecture needs.
Ultimately, to create individual snapshots of container volumes, each volume would need to be its own ZFS dataset, which the built-in driver doesn’t support. For functionality like what I was shooting for, Docker and Podman expect you to use a volume plugin. For a time, a number of folks were writing volume plugins. However, as Docker Engine and Swarm fell out of favor for multi-host orchestration, many of these plugins languished. Much to my dismay, this includes a ZFS volume plugin that does exactly what I was hoping for. There are some forks with a bit more attention, but ZFS development has been accelerating fairly rapidly (especially on Linux). That gives me concerns about onboarding a plugin in a fresh deployment whose best-maintained forks haven’t seen development in five years.
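To make the ZFS idea concrete, here’s roughly what a per-volume dataset buys you. You get an atomic snapshot of a single volume that can be streamed to another host, and later runs only need to send what changed since a previous snapshot. It’s a sketch under assumptions:

- the `zfs_replicate` function and dataset names like `tank/volumes/...` are hypothetical,
- you have root (or delegated ZFS permissions) on both hosts,
- SSH works between the hosts.

`DRY_RUN=1` prints the commands instead of executing them.

```shell
# zfs_replicate DATASET TARGET_HOST TAG [PREV_TAG]
# Hypothetical helper: snapshot one dataset (one container volume) and
# stream it to the same dataset name on TARGET_HOST.
zfs_replicate() {
    dataset=$1
    target=$2
    tag=$3
    prev=${4:-}
    snap="$dataset@$tag"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "zfs snapshot $snap"
        if [ -n "$prev" ]; then
            echo "zfs send -i $dataset@$prev $snap | ssh $target zfs receive $dataset"
        else
            echo "zfs send $snap | ssh $target zfs receive $dataset"
        fi
        return 0
    fi

    # Atomic, point-in-time snapshot of just this dataset.
    zfs snapshot "$snap" || return 1
    if [ -n "$prev" ]; then
        # Incremental stream: only the changes since the previous snapshot.
        zfs send -i "$dataset@$prev" "$snap" | ssh "$target" zfs receive "$dataset"
    else
        # Full stream; creates the dataset on the target.
        zfs send "$snap" | ssh "$target" zfs receive "$dataset"
    fi
}
```

The first run (no previous tag) sends a full stream. Subsequent runs pass the previous tag to send an incremental stream. If the target copy has been modified since that previous snapshot, `zfs receive` will refuse the incremental stream, which is where its `-F` flag comes in.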
However, after talking with a colleague who’s working on an ambitious project to build a Docker Swarm-like orchestrator atop Podman, I decided to look into BTRFS. BTRFS lacks trustworthy support for RAID-like multi-disk arrays, at least for the parity-based RAID 5/6 profiles. It does, however, have a feature similar to ZFS datasets in BTRFS subvolumes, with accompanying snapshot support. Additionally, a feather in its cap is that BTRFS can be set up to let users create and delete subvolumes without root-level privileges (deletion, for instance, requires the user_subvol_rm_allowed mount option). That could help support both rootless Docker and Podman. I’ve used BTRFS for single-device setups for a while, and I depend on it as the root filesystem for my workstation. So I was a little miffed that I had discarded it as an option altogether when exploring solutions for this case.
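BTRFS supports the same shape of workflow with subvolumes:

1. Take a read-only snapshot, which `btrfs send` requires.
2. Stream it to another host with `btrfs send` / `btrfs receive`.
3. For incremental sends, pass a parent snapshot with `-p`.

Again, this is a minimal sketch. `btrfs_replicate`, the snapshot directory, and the remote directory are hypothetical. It assumes root on both hosts, both directories on btrfs filesystems, and SSH between the hosts.

```shell
# btrfs_replicate SUBVOL SNAPDIR TARGET_HOST REMOTE_DIR TAG [PARENT]
# Hypothetical helper: snapshot SUBVOL read-only into SNAPDIR and stream it
# into REMOTE_DIR on TARGET_HOST. PARENT, if given, is an earlier snapshot
# that already exists on the target, so only the difference is sent.
btrfs_replicate() {
    subvol=$1
    snapdir=$2
    target=$3
    remote_dir=$4
    tag=$5
    parent=${6:-}
    snap="$snapdir/$(basename "$subvol")@$tag"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "btrfs subvolume snapshot -r $subvol $snap"
        if [ -n "$parent" ]; then
            echo "btrfs send -p $parent $snap | ssh $target btrfs receive $remote_dir"
        else
            echo "btrfs send $snap | ssh $target btrfs receive $remote_dir"
        fi
        return 0
    fi

    # btrfs send only accepts read-only snapshots, hence -r.
    btrfs subvolume snapshot -r "$subvol" "$snap" || return 1
    if [ -n "$parent" ]; then
        # Incremental: send only what changed relative to PARENT.
        btrfs send -p "$parent" "$snap" | ssh "$target" btrfs receive "$remote_dir"
    else
        # Full stream of the snapshot.
        btrfs send "$snap" | ssh "$target" btrfs receive "$remote_dir"
    fi
}
```

Subvolumes created by `btrfs receive` are read-only. To actually use the copy, you’d take a writable snapshot of it on the target (`btrfs subvolume snapshot` without `-r`).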
No longer fixated on ZFS-oriented solutions, I quickly found buttervolume. It’s not only a volume plugin that creates container volumes as BTRFS subvolumes; it also comes with companion tooling for container volume snapshotting, replication, and scheduling. That removes so much of the tooling burden I had expected to take on with ZFS.
Proof of Concept
I assumed that, after years of support, buttervolume would do what it says on the tin. Still, I wanted to specifically test migrating an application stack, complete with its persistent data, between two discrete systems. For this, I set up a couple of Debian 13 VMs (Host A and Host B), installed Docker and btrfs-progs, and (mostly) followed the installation instructions in buttervolume’s GitHub repo. The instructions seem to be a bit of a hodgepodge, as things have changed over time and the README seems to have been updated piecemeal. I essentially followed the manual installation steps instead of the CLI utility-based setup, as later instructions recommend adding some helper functions to your shell profile that call the CLI utility via the volume plugin container.
From here, on Host A, I drafted a Compose manifest for a test application stack. It’s a simple example, using the nginx image with a container volume backed by the buttervolume plugin.
```yaml
name: migration-test
services:
  nginx:
    image: nginx
    volumes:
      - datavol:/data
volumes:
  datavol:
    driver: ccomb/buttervolume
```
Once I started the stack, I verified that the volume was a BTRFS subvolume, and then copied some data into the volume through the nginx container.
```console
root@docker-btrfs-node1:~> docker compose up -d
[+] up 1/1
 ✔ Container migration-test-nginx-1 Started
root@docker-btrfs-node1:~> docker volume ls
DRIVER                      VOLUME NAME
ccomb/buttervolume:latest   migration-test_datavol
root@docker-btrfs-node1:~> ls /var/lib/buttervolume/volumes/
migration-test_datavol
root@docker-btrfs-node1:~> btrfs subvolume list /var/lib/buttervolume
ID 256 gen 26 top level 5 path volumes/migration-test_datavol
root@docker-btrfs-node1:~> docker compose cp /etc nginx:/data
[+] copy 1/1
 ✔ migration-test-nginx-1 Copied /etc to migration-test-nginx-1:/data
```
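If you’d rather check this from a script than eyeball `btrfs subvolume list`, there’s a handy property to lean on. On btrfs, the root directory of every subvolume has inode number 256. This is a sketch assuming GNU `stat`; `is_btrfs_subvolume` is a hypothetical helper.

```shell
# is_btrfs_subvolume PATH
# Hypothetical helper: succeeds only if PATH is on a btrfs filesystem
# and is the root directory of a subvolume.
is_btrfs_subvolume() {
    # stat -f reports the filesystem type; a missing path yields empty output and fails.
    [ "$(stat -f -c %T "$1" 2>/dev/null)" = btrfs ] || return 1
    # On btrfs, every subvolume's root directory has inode number 256.
    [ "$(stat -c %i "$1")" = 256 ]
}
```

On Host A, `is_btrfs_subvolume /var/lib/buttervolume/volumes/migration-test_datavol && echo subvolume` should print `subvolume`.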
Next, I moved over to Host B. I copied the same Compose manifest from Host A, and created the stack, but did not start it.
```console
root@docker-btrfs-node2:~> docker compose create
[+] create 3/3
 ✔ Network migration-test_default Created
 ✔ Volume migration-test_datavol Created
 ✔ Container migration-test-nginx-1 Created
```
A key feature of buttervolume’s sync command is that if the volume already exists on the sync target, it simply updates it with the latest data. This is great, because it lets the Compose stack own the volume, which greatly simplifies management. Finally, I ran the sync on Host B to pull the data over from Host A.
```console
root@docker-btrfs-node2:~> buttervolume sync migration-test_datavol host-a.example.com
```
Surprisingly, that was it. After starting the stack, I opened an interactive shell inside the nginx container, navigated to /data, and found the contents that were created on Host A. Host B’s instance of the stack had everything it needed to operate that service. Magic. The end result was that the Docker stack on Host B owned the data volume. Any management tasks (for instance, cleaning up the subvolume if the stack was destroyed) would work just as they would for any other container volume.
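Poking around inside the container is a decent smoke test, but for data I cared about I’d want something more rigorous. Here’s a minimal sketch, assuming GNU coreutils and findutils on both hosts; `tree_digest` is a hypothetical helper. It hashes every regular file under a directory along with its relative path, then prints a single digest for the whole tree. Running it against the volume’s directory on each host should print the same value.

```shell
# tree_digest DIR
# Hypothetical helper: prints one SHA-256 digest covering the relative path
# and contents of every regular file under DIR, in a stable (C-locale) order.
tree_digest() {
    (
        cd "$1" || exit 1
        find . -type f -print0 \
            | LC_ALL=C sort -z \
            | xargs -0 -r sha256sum \
            | sha256sum \
            | cut -d' ' -f1
    )
}
```

Run `tree_digest /var/lib/buttervolume/volumes/migration-test_datavol` on Host A and Host B and compare the output. It deliberately ignores ownership, permissions, timestamps, symlinks, and empty directories, so it verifies file contents rather than proving a perfect replica.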
So… where to from here?
For Docker Engine-based container hosts, the system works as intended. The next step would be to get this functionality integrated into an orchestrator. I’ve reached out to the Dockhand team to see if there’s any appetite for onboarding this as part of their stack. It’s a big ask to go to someone else’s project and propose such a massive feature, but I’m at least hoping for
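Until an orchestrator picks this up, the proof-of-concept steps are easy enough to script. This is a minimal sketch I haven’t battle-tested. `migrate_stack` and `run` are hypothetical names. It assumes:

- the Compose file lives at the same path on both hosts,
- SSH from the target to the source works,
- `buttervolume sync` behaves as it did in the PoC, pulling the named volume from the given host into the existing local volume.

The PoC didn’t need it, but for a real workload I’d stop the source stack before a final sync, so the last copy isn’t taken while something is still writing.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# migrate_stack COMPOSE_FILE VOLUME SOURCE_HOST
# Hypothetical wrapper around the PoC steps; run it on the *target* host.
# BUTTERVOLUME can point at a shell helper if the CLI isn't on PATH.
migrate_stack() {
    compose_file=$1
    volume=$2
    source_host=$3
    bv=${BUTTERVOLUME:-buttervolume}

    # Create the stack (and its volume) without starting it, so Compose owns the volume.
    run docker compose -f "$compose_file" create || return 1
    # First pass while the source stack is still running.
    run "$bv" sync "$volume" "$source_host" || return 1
    # Stop the source stack so its data stops changing.
    run ssh "$source_host" docker compose -f "$compose_file" stop || return 1
    # Second pass to capture anything written since the first one.
    run "$bv" sync "$volume" "$source_host" || return 1
    # Bring the stack up on this host.
    run docker compose -f "$compose_file" up -d
}
```

Whether that second sync is any faster than the first depends on how buttervolume transfers data, which I haven’t dug into.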
For Podman, however, there’s some work to do. I plan to do a PoC similar to the one I did with Docker Engine. As I understand it, though, there’s a gap in support for user namespace remapping, which I believe would be essential for both rootless Docker and Podman-based workflows. I’m looking into whether this is something I’d be willing to tackle. For now, given that I’m already entrenched in Docker Engine, I can hold off on a potential migration to Podman if it stands in the way of solving a nit I’ve been facing for a very long time.
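Here’s one reason remapping can complicate replication. This is my own illustration, not necessarily the specific gap in buttervolume.

Under Docker’s userns-remap, container UID 0 maps to the first UID in the remap user’s /etc/subuid range. Files in a volume are therefore owned on disk by those shifted host UIDs. If Host A and Host B hand out different subordinate ranges, a byte-for-byte replica lands on Host B owned by UIDs that don’t correspond to the same users inside the container.

The sketch below shows the arithmetic, assuming a single `name:start:count` subuid entry; `host_uid` is a hypothetical helper.

```shell
# host_uid SUBUID_ENTRY CONTAINER_UID
# Hypothetical helper: maps a container UID to its host UID under Docker's
# userns-remap, given one /etc/subuid entry of the form name:start:count.
# Container UID 0 maps to `start`, UID 1 to start+1, and so on.
host_uid() {
    entry=$1
    cuid=$2
    start=$(printf '%s' "$entry" | cut -d: -f2)
    count=$(printf '%s' "$entry" | cut -d: -f3)
    # UIDs outside the subordinate range have no mapping.
    if [ "$cuid" -lt 0 ] || [ "$cuid" -ge "$count" ]; then
        return 1
    fi
    echo $((start + cuid))
}
```

Suppose one host uses `dockremap:231072:65536` and another uses `dockremap:100000:65536`. Container UID 101 would then be host UID 231173 on the first and 100101 on the second, so the replicated files would need their ownership shifted to line up.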
That said, assuming my colleague’s Podman-based orchestrator continues to grow legs, I’d be happy to throw my hat in the ring to enable this functionality there as well. I wish he would come around to using the Compose spec for defining application stacks (hehe). Even so, a lot of his design shows promise and would benefit from this feature.
For now, though? I have an aging Docker Engine-based host that I need to figure out a migration plan for. I think the next step is looking at migrating from Traefik to Caddy, taking advantage of Caddy plugins for decentralized certificate storage and RFC 2136-based dynamic DNS registration.