Improving Unix Mail Storage?
At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we currently store it today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage?
CaraCalla asks: "Does anybody know a good, free solution for storing mail on unix hosts? The reason that I ask this question is my discontent with available techniques:
- mbox: There are problems with locking, corruption, access-times, and bloat.
- Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI). And just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program just to display the subjects.
- Cyrus: Basically the same as Maildir with database features.
- UW-Imap mbx: That's classical mbox with extensions allowing multiple access.
- Evolution: Basically mbox with database features.
- Windows clients: Typically some proprietary db-format. Pathetic.
But the thing that bugs me most is disk space. Typical inboxes are made of 5% to 10% of Text including Headers and HTML. The rest are BASE64- (or UU-) encoded pictures, word documents, zip archives and so on. The problem here is the encoding which wastes considerable amounts of space (at least one third).
Some ideas about the ideal mail-storage:
- One file per Mailbox-folder, allowing multiple folders per user. Should those files reside in one central location or in users' homedirs?
- Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?
- File format: gdbm, Sleepycat db? Something new?
- Should the security model allow users to directly access their files, grep them, copy them around?
- Shared folders, virtual domains?
- Unicode support in folder names? Imap message-IDs, flags, useragent specific state-information?
- How would MTAs deliver mail? How would clients access? File-locking (NFS)?
- What about backwards-compatibility? Writing libmailstore (anyone)? adopting UW c-client?
Does my ideal mailstorage exist somewhere? Is somebody working on a project addressing this? Does anybody have some other hints? And please no mbox/Maildir flamewar!"
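The "at least one third" figure in the question follows from how base64 works: every 3 bytes of binary data become 4 ASCII characters, and line breaks add a little more. A minimal sketch to measure it, assuming Python's standard `base64` module and random bytes as a stand-in for a real attachment (the function name `base64_overhead` is made up for this illustration):

```python
import base64
import os

def base64_overhead(raw: bytes) -> float:
    """Return the fractional size increase from MIME-style base64 encoding."""
    # encodebytes() emits lines of at most 76 characters, each ending in a
    # newline. Mail uses CRLF line endings, so the overhead inside a real
    # message is slightly higher than what is measured here.
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw) - 1

attachment = os.urandom(300_000)  # stand-in for a binary attachment
print(f"overhead: {base64_overhead(attachment):.1%}")
```

The result lands a little above one third because of the line breaks.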
Exchange is actually a pretty decent mail server, although only using it for mail is pretty dumb - its groupware features are the killer app. It exposes both benefits (in particular, single storage of messages with multiple recipients) and flaws (if your db goes boom, it affects all your users - or at least all your users in a given mail partition) of database-based mail storage.
I remember seeing a project to combine mail storage with PostgreSQL a while ago. Anyone know what happened to it?
Lead developer, http://wisptools.net
Something like Maildir .. if the FS is slow and can't handle that kind of application, then we need to improve our filesystems!
Lots of applications need lightweight databases with indexes, locking, and atomic operations. Why not bake this into the filesystem, and it won't have to be just for email, it will have many uses.
I was thinking about this the other day as I was working on a logging system for a large in-house email filtering system.. similar problem, except instead of storing emails, I'm storing small XML fragments describing the structure of each email and what was done to each. So far the easiest solution was large monolithic XML files, and an external index pointing in the large file (i.e., like mbox + a DB index). As it grows we'll probably have to move it to a "real" database.
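The "mbox + a DB index" arrangement mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the classic mbox convention that each message starts with a line beginning "From ", it keeps the index as a plain in-memory list rather than a real database, and the function names are hypothetical.

```python
import io

def build_offset_index(mbox):
    """Return the byte offset of every line that starts a message ("From ")."""
    offsets = []
    mbox.seek(0)
    while True:
        position = mbox.tell()
        line = mbox.readline()
        if not line:
            break
        if line.startswith(b"From "):
            offsets.append(position)
    return offsets

def read_message(mbox, offsets, n):
    """Read message n by seeking straight to it instead of scanning the file."""
    start = offsets[n]
    mbox.seek(start)
    if n + 1 < len(offsets):
        return mbox.read(offsets[n + 1] - start)
    return mbox.read()  # last message runs to end of file

sample = io.BytesIO(
    b"From alice@example.com Mon Jan  1 00:00:00 2001\n"
    b"Subject: first\n\nhello\n\n"
    b"From bob@example.com Tue Jan  2 00:00:00 2001\n"
    b"Subject: second\n\nworld\n"
)
index = build_offset_index(sample)
print(read_message(sample, index, 1).decode())
```

The index has to be rebuilt or patched whenever the big file is rewritten, which is the part a "real" database would take over.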
There is a need for something like sleepycat DB + ReiserFS on steroids..
Automatically deliver mail to recipients who can then save the mail on their own machines. It's like distributed processing except it's distributed storage.
If someone isn't logged on to receive their mail (like those saps who turn their machines off every night), then forward the mail to /dev/null
I have been pwned because my
The great advantage of the current system is that it is very easy to move your e-mail from one program or computer to another with little hassle and/or risk. With any type of database system, you introduce a level of complexity that virtually assures that only one e-mail program will be able to read your e-mail. I think the best solution, as far as I am concerned, is to just stick with the current mbox format but allow attachments to be deleted independently, though that is just personal preference. But I think we should be wary of adding any complexity that endangers the portability of mail. Also, the other thing to be said for the mbox format is that if worst comes to worst you can still access your e-mail with a text editor and/or grep.
/. punchingbag jwz has some strong opinions about using databases (etc.) for mail storage. I tend to agree: everything can read from and write to files, there are no versioning issues, they can be easily transported among different operating and file systems, they can be backed up easily. But it's another wheel to reinvent, so everyone hop to it at once and then lose interest in two or three weeks!
Chris
M-x auto-bs-mode
Of course, with traditional UNIX file systems, this is a bit slow. The thing to do is to fix the file system, not to kludge ever more complex mail formats on top of it. ReiserFS goes much of the way; we now also need some system calls to open and read multiple files with a single call.
Until file systems catch up, one kludge is as good as another. UNIX mbox format is at least simple, so I stick with that.
Life is not that simple. All databases are limited by the size of the basic block, and if you can't fit your data into that block performance takes a hit.
With PostgreSQL this is a compile-time option, default 8k, and it can go up to 32k.
It *is* possible to store larger items, esp. if they're 'TOASTable' or blobs, but this often just pushes the problem of dealing with thousands of files onto the database. Only now it's a lot harder to figure out why performance sucks.
Does this mean that database solutions won't work? Of course not. But it does mean that simple solutions won't scale well when you're dealing with massive amounts of data.
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken
If you think that you can replicate what Exchange does in "a couple hours of time" you have not been at it long enough.
There are two excellent reasons that so many people use Exchange.
1) In general, it works out of the box. A company with someone with meager knowledge can set up a fairly complex mail handling system without much help.
2) It does A LOT. In its most basic configuration it does what you need 10 or more programs in Linux to do, not to mention that most of those 10 don't exist.
Rage against the machine all you want, but when your boss says you will have shared contacts and calendars and your clients will run Windows; find me a solution that comes within miles of the ease of Outlook and Exchange and I'll give you a cookie.
Actually, I'll probably give you several thousand dollars.
What is the problem with Maildir? I mean, if you're going to store email, you might as well use ReiserFS. I don't have any problems with big mailboxes, and the extra integrity of the email messages is worth the (non-noticeable) delay.
Used to get the mbox corrupted once in a while. Never had problems with Maildir.
Je ne parle pas francais.
I liked this until the server (well cluster actually) that served our EMEA operation fell over. EMC, Compaq and Microsoft fought over who was at fault and in the end 22 hours later the thing had been rebuilt and restored from tape. This was a solution put together by a Microsoft Premier Support partner that was supposed to have 5 9's availability and fell over in its first couple months! Instead of 0 lost email we had all emails that hadn't been in the last tape cycle lost, along with any emails that timed out waiting for the server to come back up. Not only that, but no one could read their email for an entire day (2 business days actually).
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
The problem with using database formats is that you can't access them with vi. How many times has your mail client crashed attempting to read an email, but you still _need_ to get access to it? If it's in a database (proprietary or not), you're up the creek. If it's stored in a flat file, you at least have the option of using vi/emacs/grep to find and read the email, and then excise it.
This has happened to me in Netscape, Kmail, Outlook, Evolution, Eudora, etc. Every single one has had problems at one point or another. The best programs are the ones that are _truly_ open, and let you get at the mail from other directions.
Don't doubt the power of the text utilities in Unix. :)
Jason Pollock
Heheh...I read a funny quote here on slashdot earlier today that I think applies:
I've heard from a lot of people who consider themselves experts that ReiserFS is not stable, never has been, never will be, all that fun stuff. But I know better, because I have data. Hard numbers...I know I can run a Squid box harder and at higher loads for longer on ReiserFS than ext2 or ext3. I know that I can run a Squid machine for 2 years with ReiserFS cache partitions with uptimes over a year, with the reboot after all that time being for a kernel upgrade.
Yes, there have been data corruption issues for some people for ReiserFS. But I'm on the ext3 and jfs mailing lists as well...I know they have data corruptions of their own. It's a fact of life when dealing with computers, things go wrong for everyone at some point. I simply don't believe the masses when they tell me ReiserFS is not suitable for production use, because I have more machines to administer than the vast majority of slashdotters, and I believe I can trust ReiserFS. I trust my opinion above most.
Shredding and compressing mail messages is almost always a bad idea. Essentially *nobody* does it correctly, and you can't reconstruct messages in their original byte-for-byte formats, which trashes digital signatures. You won't save much disk space, because real text doesn't take up enough space for anybody except a big ISP mailsystem to worry about, and binary attachments usually only compress well if they've been encoded in some non-8-bit-transparency format like base64 or uuencode. About the only time it wins is when one person on your keep-mail-on-server mailsystem is sending an attachment to a bunch of people who can then all use the original, which is to say they should probably have stored the file on the web and mailed a URL. If you're going to do things like this, get yourself a compression-equipped filesystem and just store your raw mail messages there.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Quite right. Just try it. You might be a bit surprised by the results.
News for Nerds. Stuff that Matters? Like hell.
There is a standard to move E-mail around. It's called RFC 2822.
Rudd-O - http://rudd-o.com/
Sorry, restores are even more important. I hope you check your backup strategy by trying a recovery every so often. Many a time I have heard people who "thought they had a backup" and then it turns out that the thing that was being backed up was in an inconsistent state.
Even more simple solution - just send everyone a text message.
In it you can put a link to the html variant for those that want that - and put that variant on a web server.
Email should be text. Web pages should be HTML.
Just saying it like it are.
I don't think things are that bad - for example, Cyrus with its indexes works pretty well with large (20,000+) folders. And things like searches are pretty fast with a client like evolution that does a lot of caching.
I would take the simple structure of Cyrus over the easy to break "database" files of Exchange server any day.
So NNTP solved this in, IMHO, a rather elegant way...
You have directories corresponding to newsgroups or mail folders or whatnot. i.e. alt.swedish.chef.bork.bork.bork is really alt/swedish/chef/bork/bork/bork
Articles are numeric, i.e. \d+ for Perl types. The raw message is stored in each file.
In each directory, there's a file called .overview, which is just the summary information for all the files.
Thus, you can have zillions of small files, and happily grep and copy them to your heart's content. But you never do a 'ls' on a huge directory, you always just look through the .overview file. Or grep through it, if you like.
So, in that sense, it's very much the best of both worlds. And, on the same box, you can specify rules on who can access the folders, so one file can be read by multiple people. Ooh.
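A rough sketch of the overview-file idea described above: one numeric file per article, plus a single summary file per directory that a reader scans instead of opening every article. The tab-separated layout here is only loosely modelled on news overview files and carries fewer fields; the field choice and the function name `write_overview` are assumptions made for illustration.

```python
import os
import tempfile
from email import message_from_binary_file

OVERVIEW_NAME = ".overview"

def write_overview(folder):
    """Summarise every numeric article file in `folder` into one overview file."""
    names = sorted(
        (n for n in os.listdir(folder) if n.isascii() and n.isdigit()), key=int
    )
    lines = []
    for name in names:
        with open(os.path.join(folder, name), "rb") as handle:
            msg = message_from_binary_file(handle)
        fields = [name] + [str(msg.get(h, "")) for h in ("Subject", "From", "Date")]
        # Tabs separate fields, so whitespace inside a header is flattened.
        lines.append("\t".join(" ".join(field.split()) for field in fields))
    with open(os.path.join(folder, OVERVIEW_NAME), "w", encoding="utf-8") as out:
        out.write("".join(line + "\n" for line in lines))
    return len(lines)

with tempfile.TemporaryDirectory() as folder:
    for number, subject in ((1, "bork"), (2, "Re: bork")):
        with open(os.path.join(folder, str(number)), "wb") as article:
            article.write(
                b"From: chef@example.com\nSubject: " + subject.encode() + b"\n\nBork!\n"
            )
    write_overview(folder)
    with open(os.path.join(folder, OVERVIEW_NAME), encoding="utf-8") as overview:
        print(overview.read(), end="")
```

Listing a folder then costs one sequential read of the summary file, and grep works on both the summary and the individual articles.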
GNUS, an Emacs based mail/news reader, uses a variant of this called nnml, which rocks.
Of course, when you get down to it, JWZ arguments aside, databases start to really look like what you want, especially on a corporate level when you're tossing the same piece of mail around to tons of different folks.
-e
This is not how mail is actually stored on disk in Plan 9. The "real" mail storage is just mbox files. What rpeppe has described is the view that the mail storage system provides to clients.
I agree it's very sweet, but the question is primarily dealing with the actual storage format.
---glv
From what I've been told, there are currently no plans for a Linux Notes client. The reason is that there is just not enough money in the Linux desktop market right now to justify the expense and hassle of trying to port the Notes client (it would be hellish because of the GUI and for various other reasons). Most people at IBM who use Linux on the desktop use Notes under WINE. It works reasonably well for basic email and calendaring. Besides, as someone else said, the Web is a more practical cross-platform solution than a native GUI port anyway.
VMSmail's storage format is instructive. Each message is represented by a single record in an indexed file. A short message body is simply tucked into the record along with the headers and other metadata. Long bodies (more than around 2kb IIRC) are stored as individual files and their header records point to the files by name.
Of course you all realized at once that the main file can get out of sync with the directory which holds the external bodies. It does, sometimes, and fixing it up can be a pain. Any storage method which partitions a single message among multiple files is going to have similar problems. But it works pretty well, and it shouldn't be too hard to write a tool to groom the message store in case of inconsistency. It's worth study.
It was a natural choice on VMS, which has really good multi-indexed file support in the base package. It works well with text messages, which often do fall within the size limit for avoiding external storage of the message body. Today it suffers the same problem that mbox does -- people use email differently now.
I have in excess of 46K email messages in my account alone, not to mention everyone else's accounts on my company's mail server. We use cyrus IMAP and qmail, both of which use the Maildir format mailboxes ... every client I've used (Mozilla, Communicator, Outlook/OL Express, Mail.app on OS X, Eudora, and Papi-Mail on PalmOS) seems to have absolutely no problem with this setup. Most MUAs are intelligent enough not to download all your headers every time you connect, so unless you're getting 1000+ new emails every time you open a particular folder, you're generally not going to need to read all those headers every time.
The server that runs this is a measly 600MHz PIII w/ 128MB RAM running RedHat 6.2 w/ a 20GB hard drive. I haven't gotten even close to running out of inodes, to my knowledge, and my server never goes down (really, the only times it's gone down is when power has been cut to it and this has only happened twice in the past 1.8 yrs ... long live Rackspace).
Maildir is specifically designed to handle mailboxes with large numbers of emails in them, contrary to other formats such as mbox. The problem with any sort of DB approach is the waste of space, even if you compress. A basic course in file structures will teach you a wealth of knowledge in this regard.
Imagine this: you have a table that stores everything you need to know about an email. You have a few distinct fields for commonly accessed headers (subject, from, to, cc, etc.) each of which would need to be 'text' blobs, since you cannot limit their size (you've seen the emails that have to/cc fields that are miles long, right?) - well, 'text' fields are notoriously poorly optimized in database engines and quite difficult to search (you can create an index on a part of a text field, but that might not be enough, right?). Next you have the message body which would also need to be a text field since you don't limit its length, either.
Now, since the space for these fields (which don't *ever* change) is not optimized in the slightest, you might think that compressing them is a good idea, right? Well, what if an email is deleted - then you start looking at fragmented space in your database table which would need to be compacted periodically (much as mbox/.mbx files do today, if I recall).
All in all, storing each message to its own file is not really *that* bad ... optimize the file subsystem beneath it, maybe allow for compression/encryption or that sort of thing, but otherwise, the folks that put together Maildir have certainly done a decent job!
Sometimes it makes me want to scream that Microsoft gets away with this stuff and no one seems to care.
One thing to keep in mind about Exchange is that it's really a X.400 mail system, with some proprietary routing features kludged on top, lots of back-compat MS Mail features kludged on top of that, and then (as the last afterthought) SMTP kludged on top of all that. Next time you are at a computer book store, gander at the architecture diagram for Exchange -- it's so complex that it _should_ make you queasy. The thing just reeks of early-90s incorrect design assumptions.
So it shouldn't be a shock that it can't handle a large number of SMTP edge cases. Frankly, nobody would buy a product like Exchange if it didn't have the Microsoft logo on it and a nice client which gets installed with Word and Excel.
Microsoft, in their heart of hearts knows that it's a piece of shit, but it's _their_ piece of shit. And it happens to sell well, and any product that profitable can't be all that bad.
I wouldn't be shocked if numerous skunkworks projects have come and gone at MS to replace their Big X.400 Jet DB Kludge with a real Internet-savvy mail server, but the politics of the place probably dictate that they lumber on with what's working (that also explains products like Windows ME).
I don't think it makes sense to store email in dbm files. It's too sketchy - what happens when the dbm file gets corrupted? The nice thing about flat files is that if something goes wrong, you can fix it with vi.
I think the right solution to the problem is to key off the message ID, which is supposed to be unique. Then define a mail folder as simply a list of message IDs. Messages can appear in more than one folder, but hopefully not in no folders.
To make this efficient, I'd hash the message ID, and use a hierarchy of directories, because Unix doesn't do well with large flat directories. The hierarchy could auto-extend, so that as one subdirectory fills up, you do a sub-hash and split it into more directories.
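One way the hashed hierarchy could look, as a sketch only: hash the message ID, use the leading hex digits of the digest as directory levels, and add a level when directories fill up. SHA-256, the two-character level width, and the name `message_path` are choices made for this illustration and do not come from the comment.

```python
import hashlib

def message_path(message_id, depth=2, width=2):
    """Map a Message-ID to a relative path such as 'ab/cd/abcd...'."""
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    # Each directory level is the next `width` hex digits of the digest,
    # so raising `depth` splits an existing directory into sub-directories.
    levels = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return "/".join(levels + [digest])

print(message_path("<20020101120000.GA1234@example.com>"))
print(message_path("<20020101120000.GA1234@example.com>", depth=3))
```

With two hex digits per level each directory has at most 256 entries, and because the leaf name is the full digest, a deeper path is always an extension of the shallower one.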
The problem of tiny files is a real one. The solution is probably to make the bottom of a hash a file rather than a directory, and store more than one message in each such file. You don't have to store a lot of messages in these files to win - even ten messages would produce a big win, and would be pretty efficient.
The format of the individual files should probably be indexed sequential access - that is, a TOC at the front, and then the contents as plain text, nothing fancy. The TOC should be in ASCII, not binary, and you should be able to rebuild the TOC by looking at the file.
Babyl used to use a control character as a delimiter, which worked pretty nicely - much better than using "^From ". Ever seen >From in an email message? That's because Unix mail uses "^From " as an inter-message delimiter, so it has to quote it, and it does so stupidly. So use ^_ as a delimiter, and if ^_ appears in the email message, just double it. Take a doubled ^_ out when reading a message.
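The doubling scheme can be sketched as below. ^_ is the ASCII control character 0x1F. One detail is an assumption added here so that parsing stays unambiguous: each message is terminated by ^_ followed by a newline, so a message that ends in ^_ cannot be confused with one that begins with it. The names `pack` and `unpack` are hypothetical.

```python
DELIM = b"\x1f"           # ^_ (ASCII unit separator)
TERMINATOR = DELIM + b"\n"  # assumed end-of-message marker

def pack(messages):
    """Join messages into one blob, doubling any ^_ inside a message."""
    return b"".join(m.replace(DELIM, DELIM * 2) + TERMINATOR for m in messages)

def unpack(blob):
    """Split a blob back into messages, collapsing each doubled ^_ to one."""
    messages, current, i = [], bytearray(), 0
    while i < len(blob):
        byte = blob[i:i + 1]
        following = blob[i + 1:i + 2]
        if byte != DELIM:
            current += byte
            i += 1
        elif following == DELIM:      # escaped ^_ inside a message
            current += DELIM
            i += 2
        elif following == b"\n":      # end of the current message
            messages.append(bytes(current))
            current = bytearray()
            i += 2
        else:
            raise ValueError(f"stray delimiter at offset {i}")
    if current:
        raise ValueError("trailing data without a terminator")
    return messages

mails = [b"From the desk of...\nplain body\n", b"odd \x1f byte", b"\x1fleading"]
assert unpack(pack(mails)) == mails
```

Lines beginning with "From " need no quoting at all in this scheme, which is the point of the comment.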
As for compression, I don't think it's worth doing at first. Disk space is cheap. Yes, my email folder is pretty huge, but it's really not a major problem. Making the storage system extra-complicated by uncompressing MIME is something to add on after you've got something more basic that works - you don't have to solve every problem all at once.
As for folder scan performance, you can make a cache, and have the mail program scan the cache from time to time when it's idle to clean up errors. This is much better than trying to come up with a format that's optimized toward folders - if you try to optimize toward folders, you wind up creating all kinds of problems, IMHO.
The questioner makes the correct observation that Maildir is very slow with large directories when performing aggregate operations such as viewing the inbox.
Unfortunately the questioner doesn't notice the corollary that the single-file-per-folder solution will tend to be slower for *unit* operations -- adding newly arrived mail becomes a problem because of locking issues, removing deleted mail necessitates compacting the file and so forth.
I worked at the 8th largest web based e-mail provider -- they provide cobranded web based e-mail for over half a million domains, with over 12,000,000 mailboxes when I left.
A gentleman we interviewed who had left a competitor told us about a major problem they had: They were using stock maildir to store messages, and with a *slightly* larger userbase than us they were crushing a $1,000,000 EMC SAN capable of handling some 8,000 NFS operations per second (Or was it 16,000? Can't recall...) -- 300-400 NFS operations to view an inbox just isn't good. My employer was using a low-end NetApp capable of handling something like 4,000 NFS operations per second (Again, don't remember for certain -- it was half or less of the EMC box's capacity though) and the box was only at 20% of its throughput capacity, with nearly as much mail coming through the system.
The *one* key architectural difference we made was storing certain headers in a MySQL database -- from, subject, sent date, etc. The stuff you need to view an inbox or what have you.
Following such an approach -- particularly with a DB capable of fine-grained locking gives you the best of both worlds: Fast aggregate operations (use the DB to aggregate and index data for inbox-viewing, searches, and so forth), and fast unit operations (using individual files to store messages). And writing software to interact with such a mailbox remains very simple.
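A small sketch of that split, with messages kept as individual files and the inbox-view headers kept in a database. The comment used MySQL; SQLite from the Python standard library stands in for it here so the example is self-contained, and the table layout and function names are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from email import message_from_bytes

def open_index(path=":memory:"):
    """Open (or create) the header index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS headers ("
        " filename TEXT PRIMARY KEY, sender TEXT, subject TEXT, sent TEXT)"
    )
    return db

def deliver(db, folder, filename, raw):
    """Write the message to its own file, then record its headers in the index."""
    with open(os.path.join(folder, filename), "wb") as handle:
        handle.write(raw)
    msg = message_from_bytes(raw)
    row = (filename,) + tuple(str(msg.get(h, "")) for h in ("From", "Subject", "Date"))
    db.execute("INSERT INTO headers VALUES (?, ?, ?, ?)", row)
    db.commit()

def list_inbox(db):
    """Build the inbox view from the index alone; no message file is opened."""
    query = "SELECT filename, sender, subject FROM headers ORDER BY filename"
    return db.execute(query).fetchall()

with tempfile.TemporaryDirectory() as folder:
    index = open_index()
    deliver(index, folder, "0001", b"From: a@example.com\nSubject: hi\n\nbody\n")
    deliver(index, folder, "0002", b"From: b@example.com\nSubject: re: hi\n\nbody\n")
    for row in list_inbox(index):
        print(row)
    index.close()
```

Viewing the inbox becomes one query instead of hundreds of file operations, while delivery and deletion still touch only a single message file plus one row.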
You can use compression on the individual files to save space, or you could be courageous and come up with a binary-safe hierarchical file format that can represent a MIME document efficiently in order to "undo" the 35+% penalty encoding poses. If you're really gutsy you could then compress that file. Or, in order to really maximize performance you could simply opt to compress *segments* of the file (think binary attachments -- leave headers and text/HTML sections uncompressed), so that viewing a mail doesn't involve decompressing it -- only accessing large attachments would incur that penalty. In fact, this gives you room to make user-definable performance vs. space tradeoffs: Let the user decide what sorts of things get compressed. Want to save the maximum amount of disk space? Compress everything. Maximum speed? Compress nothing. (And in that event you don't even have to pay the CPU penalty of MIME-decoding the attachment!)
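The per-segment idea might look something like this sketch, which uses Python's standard `email` and `zlib` modules: text parts are stored as-is, while other parts are stored decoded and compressed. The record layout and the name `store_parts` are invented for the example, it stores only the parts and not the message headers, and it makes no attempt to reproduce the original message byte for byte, which is the concern raised elsewhere in this thread about signatures.

```python
import zlib
from email import message_from_bytes
from email.message import EmailMessage

def store_parts(raw, compress_binary=True):
    """Split a MIME message into (content_type, encoding, data) records."""
    records = []
    for part in message_from_bytes(raw).walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # decode=True undoes base64 / quoted-printable, recovering raw bytes.
        payload = part.get_payload(decode=True) or b""
        if compress_binary and part.get_content_maintype() != "text":
            records.append((part.get_content_type(), "zlib", zlib.compress(payload)))
        else:
            records.append((part.get_content_type(), "raw", payload))
    return records

demo = EmailMessage()
demo["Subject"] = "report"
demo.set_content("See attached.")
demo.add_attachment(b"\x00" * 30_000, maintype="application",
                    subtype="octet-stream", filename="blank.bin")
raw = demo.as_bytes()
stored = store_parts(raw)
print("on the wire:", len(raw), "bytes; stored parts:",
      sum(len(data) for _, _, data in stored), "bytes")
```

Passing `compress_binary=False` gives the "compress nothing" end of the tradeoff; the demo attachment is all zeros, so the saving it shows is far better than real attachments would give.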