What Mailbox Format Do You Use And Why?
"I currently store all of my e-mail in a local mbox-style IMAP store in ~/mail/, so that I am not tied to any particular mail client. However, I am planning on syncing my mail across multiple machines (home, work, and soon a laptop) so I need to have mail in a form which can be synced easily. mbox is bad for this because if I grab mail on one machine, and later delete some mails from the same folder on another machine, then sync, the new mails will be lost. This is where maildir is good - each message is a separate file. But why do so many people hate it? If I do change over to maildir, what IMAP/SMTP servers should I use? A hacked sendmail/UoW IMAP? Courier-IMAP + QMail? Something else? How do other people keep their mailstores synced across many machines, and what software do they use?"
I've been using 'mbox' for -- gawd, can I say this? -- fifteen years, and it's served me well. 'mbox's advantages for me are that it is efficient with disk space (you don't eat an inode per message), and that it is quick to search.
9 times out of 10, when I'm searching my mail, typically with 'grep', I'm looking for something in the body, not the headers. With 'maildir', you have to open each message and search it. This is preposterously slow. There is also the danger that the shell's wildcard expansion limits may be exceeded if you have a lot of messages. With 'mbox', 'grep' opens the one file and slurps through it quickly.
Remote synchronization is not an issue for me. All my email resides on my laptop, which follows me everywhere.
However, I'm hip to 'maildir's increased reliability. I have over 2000 messages in my outgoing box alone, and I'd hate to have a system hiccup destroy any of it. If I could search the bodies of a 'maildir' spool as quickly as an 'mbox' spool, I could be convinced to switch.
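For what it's worth, the wildcard-expansion limit can be sidestepped by walking the directory from a script instead of letting the shell expand `*`. Below is a minimal sketch in Python, assuming a standard Maildir layout (`cur/` and `new/` subdirectories) and LF line endings on disk; the function name `grep_maildir` is made up for illustration. It still opens every message, so it does nothing for the speed complaint, and like `grep` it matches raw bytes without decoding base64 or quoted-printable bodies.

```python
import os

def grep_maildir(maildir, needle):
    """Return sorted paths of messages whose body contains `needle`.

    The match is a case-insensitive search on the raw bytes after the
    first blank line (the header/body separator).
    """
    needle = needle.lower().encode()
    hits = []
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        # os.scandir streams directory entries, so no shell glob is involved
        with os.scandir(folder) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                with open(entry.path, "rb") as fh:
                    raw = fh.read()
                # Everything after the first blank line is the body
                _headers, sep, body = raw.partition(b"\n\n")
                if sep and needle in body.lower():
                    hits.append(entry.path)
    return sorted(hits)
```

Something like `grep_maildir(os.path.expanduser("~/Maildir"), "pizza")` would then list the matching message files (the path is only an example).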
Schwab
Editor, A1-AAA AmeriCaptions
Originally, the reason we switched to maildir was that even without NFS, mbox was corrupting our filesystems. Not just the files, mind you, but the filesystems themselves. It was a total pain in the ass, and we damn near left Linux for FreeBSD. This was using 2.0.36 and Sendmail. We had to put /var/spool/mail on its own partition so we could unmount and fsck it until we found a solution. Between that and problems with files > 500MB, my opinion of Linux 2.0 is very bad.
:)
Our solution was moving to qmail and using Maildir mailboxes for our users. We never saw the problem again.
Recently, I've switched to courier mail server (http://www.courier-mta.org/) on all my non-production machines to evaluate it. I'm really, really happy with it. Courier is a complete mail system, not just an IMAP server, so you might take a look at the whole package. The whole thing is RFC compliant, which causes trouble for software that isn't, but that's a fault in the other software.
As a final rant against UW-IMAP: I hate it. It loads the whole damn mailbox being checked into memory (regardless of the type), which creates a huge load every time someone with a large mailbox checks their mail. This problem affects the POP3 server as well, since that also uses the c-client code.
That's just plain wrong. Qmail supports both maildir and mbox. I've been using qmail with only mbox files for years...
And at only $87/user client access license (courtesy of Shopper.com), it's a STEAL...
(oh, plus Win2000)...
(oh, plus a machine with at LEAST 256-512MB RAM)...
(oh, plus a backup solution to backup the DB live)...
(oh, plus some sort of a firewall/gateway... you wouldn't want this DIRECTLY on the 'NET..!)
-- You can't idiot-proof anything, because they're always coming out with better idiots.
ArsDigita has a great article on using Oracle as a backend for your mail and ACS as a front end.
Prevent email address forgery. Publish SPF records for y
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
My mailbox works just fine, and it hasn't changed in over 20 years! It sits at shoulder height just to the right of my front door. Here's the advantages:
-No encryption techniques necessary
-rarely have to waste time with forwarded jokes
-Best of all, the spam it collects is occasionally useful (I know all the pizza deals available in town).
From hell's heart I fstab at /dev/hdc
As someone who is, as we speak, supposed to be implementing an IMAP server, let me say this: If the person who dreamed up RFC2060 says that X is "slow and dangerous" run, DO NOT WALK, to leap onto the X bandwagon--it'll be the wave of the future.
--
MailOne
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
I think the guys who wrote Cyrus IMAP server got it right. I have been using Cyrus for about 4 years now and I rarely delete mail. The server is still responsive and full body text searches are pretty speedy, even on the P133 server that it is running on. I think keeping each mail in a separate file, and making a directory for each folder, is the way to go. It also makes it very simple to restore a lost mail message and to index the whole mailbox. Anyway, that's my two cents.
http://www.jwz.org/doc/
has a number of essays about mail on Unix systems, including problems with mail box formats.
I use Xemacs/Gnus/nnml so all my mail is stored as individual files, which is handy (as other posters have said) and has its downsides, as they have said too (grep now bitches if passed all files in my main mail box). Still, I like it, best system I've used. Not so great for the multiple hosts thing though.
Or you could run your mail and xemacs on one machine, and either read your mail in a terminal, or open X windows on your local display. Look up gnuserve to do that, I think.
Plato seems wrong to me today
Both formats have problems. A true enterprise-grade message store will use an embedded database with transactions support.
Fortunately, a solution to this problem is being developed right now. The Citadel/UX project is developing a robust communications server that will compete with products like OpenMail, Groupwise, and Exchange. SMTP and POP3 are already in place; IMAP will be available by the end of the year. Web-based access works as well. After that's done we'll be writing plug-ins for both Evolution and Outlook, in order to facilitate all of the 'shiny things' working as well: calendars, address books, etc.
So, you might ask, what mailbox format does it use? None of the above. Messages are stored in a database, like they should be. The Berkeley DB package from Sleepycat Software (yes, it's open source) is used for robust back-end storage, including transaction and logging support.
I'd encourage any developers who are looking for the open source world's "Exchange Killer" to get involved in this project.
--
Tired of FB/Google censorship? Visit UNCENSORED!
Email messages are a specifically interesting topic. They're (for the most part) text, and tend to be larger than database fields want to be (on the order of 1+ kB each ranging all the way up to many megabytes in common practice).
This makes most mail messages poor choices for database storage (for example, you want to be able to use "grep" on mail or compress in place). Headers, on the other hand, are a major win in a database ("select messageid from headers where user = 'me' and date > yesterday and fromaddr = 'taco@slashdot.org'" should be fast even if I have tens of thousands of messages).
The easy solution is to keep the headers in the database, and then just keep maildirs with the original messages in the normal filesystem, with the filenames in the database alongside the headers (something like message.headerid => headers.id, with message.text holding the path to the maildir entry for this message).
This combines the best of both worlds. This also means that while it's easy to corrupt your database with a single bug in your code, you can always re-build it from the on-disk messages.
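A rough sketch of that split, in Python with SQLite standing in for whatever database you'd really use. The table layout, the column names, and the functions `rebuild_index` and `paths_from` are all made up for illustration; the only thing taken from the idea above is that the database holds headers plus a path, and that it can be thrown away and rebuilt from the files on disk.

```python
import os
import sqlite3
from email.parser import BytesParser
from email.utils import parseaddr

def rebuild_index(db, maildir):
    """Drop and recreate the header index from the messages on disk."""
    db.execute("DROP TABLE IF EXISTS headers")
    db.execute(
        "CREATE TABLE headers ("
        " id INTEGER PRIMARY KEY,"
        " fromaddr TEXT, subject TEXT, date TEXT,"
        " path TEXT UNIQUE)"  # path points at the maildir entry
    )
    parser = BytesParser()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                # headersonly=True stops parsing at the header/body boundary
                msg = parser.parse(fh, headersonly=True)
            fromaddr = parseaddr(str(msg.get("From", "")))[1]
            db.execute(
                "INSERT INTO headers (fromaddr, subject, date, path)"
                " VALUES (?, ?, ?, ?)",
                (fromaddr, str(msg.get("Subject", "")),
                 str(msg.get("Date", "")), path),
            )
    db.commit()

def paths_from(db, fromaddr):
    """Return on-disk paths of all messages from the given address."""
    rows = db.execute(
        "SELECT path FROM headers WHERE fromaddr = ? ORDER BY path",
        (fromaddr,),
    )
    return [row[0] for row in rows]
```

The bodies never enter the database, so grep and in-place compression keep working on the files, and a corrupted index costs you nothing more than one `rebuild_index` run.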
Maildir is better because:
1) it is more reliable over nfs. Maildir is designed to not need file-level locking, which sucks over nfs.
2) maildir is more resistant to catastrophic corruption since each email is a separate file.
3) maildir keeps metadata about the email in the email's filename, rather than a separate index file. This helps prevent the metadata, such as "replied-to" and "forwarded this", from getting out of sync.
4) filesystem-level tools work well with maildir. You don't need special "formail"-type tools to work with them; bash scripting is capable of doing it all by itself.
5) maildir is better positioned to take advantage of advanced new filesystems like reiserfs. When reiserfs has a plugin for file-level transparent compression, maildir will be able to selectively and invisibly compress emails to the disk without requiring other programs/scripts to decompress them before use.
Study maildir, it's just plain better.
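To make points 1 and 3 concrete, here is a small sketch in Python, not a full implementation. It assumes the usual Maildir conventions: a message is written under `tmp/` and then renamed into `new/`, and flags live after `:2,` in the filename. The helper names `deliver` and `flags_of` and the exact unique-name recipe are illustrative assumptions.

```python
import itertools
import os
import socket
import time

# Standard single-letter Maildir flags found after ":2," in a filename
FLAG_NAMES = {"D": "draft", "F": "flagged", "P": "passed",
              "R": "replied", "S": "seen", "T": "trashed"}

_counter = itertools.count()

def deliver(maildir, raw_message):
    """Deliver without locking: write into tmp/, then rename into new/.

    rename() is atomic within one filesystem, so a reader scanning
    new/ never sees a half-written message.
    """
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = "%d.%d_%d.%s" % (time.time(), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    with open(tmp_path, "xb") as fh:  # "x" refuses to overwrite an existing file
        fh.write(raw_message)
        fh.flush()
        os.fsync(fh.fileno())
    new_path = os.path.join(maildir, "new", unique)
    os.rename(tmp_path, new_path)
    return new_path

def flags_of(filename):
    """Decode the flags carried in a maildir filename."""
    base = os.path.basename(filename)
    _unique, sep, info = base.partition(":2,")
    if not sep:
        return set()  # freshly delivered messages carry no flags yet
    return {FLAG_NAMES[c] for c in info if c in FLAG_NAMES}
```

So a file named `1234.host:2,RS` decodes to replied and seen, and since the state is in the name itself there is no separate index to drift out of sync.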