I recently set up Bichon, a really neat mail archiving app from the creator of RustMailer. Bichon was everything I was looking for in a mail archiver: web-based access without the cruft of a webmail client, plus the ability to regularly ingest email from IMAP accounts. Not only could I store and search my email archive, I could easily keep it up to date in real time, giving me peace of mind that my archive would no longer be in the hands of a third-party provider. What follows is the saga of how I failed at setting up my own archive, recovered from that mistake, and ended up with a very solid workflow going forward. If you’re interested in my Bichon setup, including how I wired it up to my new provider, Proton Mail, keep an eye out for a quick write-up with examples on my new Codeberg repo for personal cloud / self-hosted infrastructure.

My misadventures in setting up a mail archive started with Google’s sudden announcement that they were killing off their legacy free G Suite / Google Workspace / Google Apps For Your Domain accounts in 2022. I had migrated my email from Google to another provider beforehand, but still relied on the old account for access to legacy messages. I didn’t need it often, but when I did, it was handy to have. While the planned cancellation ended up being replaced by a new “legacy single user” plan, the rug had already been pulled, and I decided I wanted off Mr. Alphabet’s Wild Ride.

After pulling my email archive from Google Takeout, I had several gigabytes of email that could only be read on a device with a full email client. I definitely had my data, but the export was designed to be opened in a mail client, which wasn’t ideal. I really wanted a solution where access to my old mail didn’t depend on having that data on whatever device I was using. I started looking for archiving solutions, but at the time I couldn’t find anything that was really fit for purpose. What I really wanted was a web-based interface to search for and read messages. I didn’t need a full email server or a full webmail client, but those appeared to be the only options available at the time. I ended up running a Dovecot server in a Docker container. The Dovecot Docker image’s default config seemed like a great starting point: it would automagically create an email account whenever a client connected, with a single password for all accounts set via an environment variable. My plan was to set up Dovecot first and wire up a webmail client later. I connected a regular mail client and imported my Gmail backup. Once that arduous process finally completed, things seemed great.

That is, until I performed a single container image update. The dark angel of “things worked fine on day 1, but day 2+ is full of terrors” solutions struck again. Somehow, Dovecot decided that the old mailbox wasn’t real anymore, and when my mail client connected, it created a fresh database with no emails in it. For all I know, this is sane behavior for the default config, or perhaps some data that should have been persisted wasn’t, but it certainly wasn’t what I expected. This left me with an empty mailbox and a directory full of Dovecot files that, at first glance, were not in any format I could do anything with. Of course, I had also made the terrible mistake of deleting the exported emails (hey, the server’s set up and being backed up, we’re in good shape!), and my Google account was now long gone, leaving me in quite a pickle. Reading Dovecot’s documentation and various write-ups on the internet led me to believe that my emails were now in Dovecot’s proprietary database format, which would require dark sorcery to recover. Dejected, I saved the Dovecot container and its data, but assumed the mail was effectively lost unless I put in significant effort.

Fast forward to a week ago: I decided to switch email providers again, which once more left me wanting to archive the messages from my old provider. Although I chose to import those old messages into the new provider, the move lit a fire under me to try setting up a mail archive again. In the intervening years, a few folks had started open source projects that were exactly what I needed. It was amazing! Where were these mail archiving projects before? I readily found three options to choose from, and I settled on Bichon. Bichon is not perfect: while it can import email dumps as well as ingest mail on an ongoing basis via IMAP, you cannot combine the two under one “mailbox”. It’s a minor nit in what otherwise seems to be a solid app.

But those old Gmail messages were still gnawing at me. I knew I had to be able to recover them. Ready to dive deep into Dovecot’s codebase, documentation and tooling, I copied the data out of the container volume and started poking around. It was at this point that I realized I had made a terrible assumption years ago, when this problem first reared its ugly head. Had I questioned it then, it would have saved me so much headache and heartache. Dovecot does have a binary database format that emails can be stored in, but that is not the format the Docker container’s default config uses. Dovecot does store messages under a file-naming scheme tied to its message index database. Still, the files on disk were standard .EML email messages, so anything that could read .EML files could open them. It was all there, just waiting for me to recover it.
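For anyone in a similar spot, here is a minimal Python sketch of that recovery step. It assumes, as turned out to be true in my case, that each file named like u.2591 holds one standard RFC 822 message. The function name `recover`, the From/Date header check, and the output naming are hypothetical choices, not part of Dovecot’s tooling. If your files turn out to carry extra storage headers, the sketch will need adjusting.

```python
import re
import sys
from email.parser import BytesParser
from pathlib import Path

# Assumption: each file named like "u.2591" holds one standard RFC 822 message.
UID_FILE = re.compile(r"^u\.\d+$")


def recover(src: Path, dest: Path) -> int:
    """Copy every u.<uid> file under src into dest as .eml; return how many were copied."""
    dest.mkdir(parents=True, exist_ok=True)
    parser = BytesParser()
    count = 0
    for path in sorted(src.rglob("u.*")):
        if not path.is_file() or not UID_FILE.match(path.name):
            continue
        raw = path.read_bytes()
        # Parse headers only; skip files with neither a From nor a Date header.
        headers = parser.parsebytes(raw, headersonly=True)
        if headers["From"] is None and headers["Date"] is None:
            continue
        # Name the output after its relative path so UIDs in different folders don't collide.
        name = "_".join(path.relative_to(src).parts) + ".eml"
        (dest / name).write_bytes(raw)
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python recover.py <recovered-volume-dir> <output-dir>
    copied = recover(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"recovered {copied} messages")
```

The copy is byte-for-byte, so nothing in the original messages is rewritten. The resulting .eml files can then be opened by any mail client that reads them, or handed to an archiver as an email dump.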

Why had I assumed that my data was in Dovecot’s proprietary format? Well, it was a mixture of not spending enough dedicated time on the problem and trusting Stack Overflow at first glance. I should have known better! Dovecot’s naming convention left a litany of files like u.2591 and u.385109 on the filesystem, so I assumed they certainly could not be plain EML files. Yet the very sources that led me to that conclusion said the proprietary files would have a different prefix.

Despite the trials and tribulations, I’m so very thankful for how things landed. I have an email archiving solution that handles both my legacy provider dumps and regular ingest from my current provider, with a web interface for easy access across all of my devices in my personal cloud. Most importantly, I dug a little deeper into the problem, questioned some assumptions, and found that, thankfully, things weren’t quite as bad as I thought.