Slashdot Mirror


Improving Unix Mail Storage?

At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we store it today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage? CaraCalla asks: "Does anybody know a good, free solution for storing mail on Unix hosts? The reason I ask is my discontent with the available techniques:
  • mbox: There are problems with locking, corruption, access-times, and bloat.
  • Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI). And just try to open a Maildir with 1000+ messages and see how long it takes your favorite mail program merely to display the subjects.
  • Cyrus: Basically the same as Maildir with database features.
  • UW-IMAP mbx: That's classic mbox with extensions allowing concurrent access by multiple clients.
  • Evolution: Basically mbox with database features.
  • Windows clients: Typically some proprietary db-format. Pathetic.
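The Maildir complaint comes down to the lack of an index: a client that only wants the subject lines still has to open and parse one file per message. A minimal Python sketch of that access pattern, assuming the standard Maildir layout (`new/` and `cur/` subdirectories); `list_subjects` is a hypothetical helper, not code from any particular mail client:

```python
import os
from email import policy
from email.parser import BytesHeaderParser


def list_subjects(maildir: str) -> list[str]:
    """Return the Subject of every message in a Maildir (hypothetical helper).

    Maildir keeps no index, so this opens one file per message:
    the cost grows linearly with the number of messages.
    """
    parser = BytesHeaderParser(policy=policy.default)
    subjects = []
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        # Sorted by file name for a stable order; real clients usually sort by date.
        for name in sorted(os.listdir(folder)):
            with open(os.path.join(folder, name), "rb") as fp:  # one open() per message
                headers = parser.parse(fp)  # parses the header block only, not the body
            subjects.append(str(headers.get("Subject", "")))
    return subjects
```

Clients that stay fast on large Maildirs (mutt with its header cache, for example) avoid this loop by keeping a separate index of headers next to the message files.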

But the thing that bugs me most is disk space. In a typical inbox, only 5% to 10% of the data is text, including headers and HTML. The rest is Base64- (or uu-) encoded pictures, Word documents, zip archives and so on. The problem here is the encoding, which wastes a considerable amount of space (at least one third).
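The "one third" figure follows from how Base64 works: every 3 input bytes become 4 output characters, and RFC 2045 also caps encoded lines at 76 characters (57 input bytes), each followed by a line break. A small Python sketch of that arithmetic; `mime_base64_size` is a hypothetical helper, cross-checked against the standard library's `base64.encodebytes`, which wraps lines the same way but with a bare `\n` instead of CRLF:

```python
import base64
import os


def mime_base64_size(n: int, eol: bytes = b"\r\n") -> int:
    """Bytes occupied by n raw bytes after MIME Base64 encoding.

    Each group of 3 input bytes becomes 4 output characters, and
    RFC 2045 caps lines at 76 characters (57 input bytes), each
    followed by a line break.
    """
    chars = 4 * -(-n // 3)  # ceil(n / 3) groups, 4 characters each
    lines = -(-n // 57)     # ceil(n / 57) encoded lines
    return chars + lines * len(eol)


# Cross-check against the stdlib encoder, which emits bare b"\n" line breaks.
for size in (0, 1, 2, 3, 56, 57, 58, 1000, 4096):
    assert mime_base64_size(size, eol=b"\n") == len(base64.encodebytes(os.urandom(size)))
```

For a 1 MiB attachment, `mime_base64_size(1048576)` gives 1,434,898 bytes, so the copy sitting in the mailbox is about 37% larger than the file itself (for large inputs the ratio approaches 4/3 + 2/57, roughly 1.368).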

Some ideas about the ideal mail-storage:

  • One file per mailbox folder, allowing multiple folders per user. Should those files reside in one central location or in the users' home directories?
  • Compression: Should messages be broken into pieces and the MIME attachments stored separately (so that the text parts could still be searched without decompressing the whole file)?
  • File format: gdbm, Sleepycat db? Something new?
  • Should the security model allow users to directly access their files, grep them, copy them around?
  • Shared folders, virtual domains?
  • Unicode support in folder names? IMAP message IDs, flags, user-agent-specific state information?
  • How would MTAs deliver mail? How would clients access it? File locking (NFS)?
  • What about backwards compatibility? Writing libmailstore (anyone)? Adopting UW c-client?
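As a sketch only, here is one way several items on this wish list could fit together in Python: one file per folder (an SQLite database via the standard `sqlite3` module, swapped in here for the gdbm/Sleepycat options mentioned above), headers and text parts kept uncompressed and searchable, and attachments stored decoded (no Base64) and zlib-compressed. The `FolderStore` class and its schema are hypothetical; the sketch assumes a single writer, ignores NFS locking, flags and shared folders, and does not reproduce the original message byte for byte.

```python
import sqlite3
import zlib
from email import policy
from email.parser import BytesParser

SCHEMA = """
CREATE TABLE IF NOT EXISTS message (
    uid     INTEGER PRIMARY KEY AUTOINCREMENT,  -- never reused, like IMAP UIDs
    flags   TEXT NOT NULL DEFAULT '',
    headers TEXT NOT NULL,                      -- plain text, searchable
    body    TEXT NOT NULL                       -- text/* parts, plain text
);
CREATE TABLE IF NOT EXISTS attachment (
    uid          INTEGER NOT NULL REFERENCES message(uid),
    filename     TEXT,
    content_type TEXT NOT NULL,
    data         BLOB NOT NULL                  -- decoded, then zlib-compressed
);
"""


class FolderStore:
    """One SQLite file per mail folder (hypothetical design, not an existing tool)."""

    def __init__(self, path: str):
        self.db = sqlite3.connect(path)
        self.db.executescript(SCHEMA)

    def deliver(self, raw: bytes) -> int:
        msg = BytesParser(policy=policy.default).parsebytes(raw)
        headers = "".join(f"{k}: {v}\n" for k, v in msg.items())
        text_parts, attachments = [], []
        for part in msg.walk():
            if part.is_multipart():
                continue  # containers carry no payload of their own
            ctype = part.get_content_type()
            if ctype.startswith("text/") and not part.is_attachment():
                text_parts.append(part.get_content())
            else:
                # decode=True undoes Base64/quoted-printable, so the
                # transfer-encoding overhead is not stored on disk.
                data = part.get_payload(decode=True) or b""
                attachments.append((part.get_filename(), ctype, zlib.compress(data, 9)))
        with self.db:  # one transaction per delivery
            cur = self.db.execute(
                "INSERT INTO message (headers, body) VALUES (?, ?)",
                (headers, "\n".join(text_parts)),
            )
            uid = cur.lastrowid
            self.db.executemany(
                "INSERT INTO attachment (uid, filename, content_type, data) VALUES (?, ?, ?, ?)",
                [(uid, *a) for a in attachments],
            )
        return uid

    def search(self, needle: str) -> list[int]:
        """Search headers and text parts without touching attachment data."""
        pattern = f"%{needle}%"
        rows = self.db.execute(
            "SELECT uid FROM message WHERE headers LIKE ? OR body LIKE ? ORDER BY uid",
            (pattern, pattern),
        )
        return [row[0] for row in rows]

    def attachment(self, uid: int, index: int = 0) -> bytes:
        """Return the index-th attachment of a message, decompressed."""
        row = self.db.execute(
            "SELECT data FROM attachment WHERE uid = ? ORDER BY rowid LIMIT 1 OFFSET ?",
            (uid, index),
        ).fetchone()
        return zlib.decompress(row[0])
```

Keeping the text in plain columns is what lets `search` run without decompressing anything; the trade-off is that handing a standards-conforming message to an IMAP client means re-encoding the attachments on the way out.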

Does my ideal mail storage exist somewhere? Is somebody working on a project addressing this? Does anybody have any other hints? And please, no mbox/Maildir flamewar!"

4 of 554 comments

  1. Re:One folder to rule them all... by alen · · Score: 1, Troll

    That's MS Exchange alright. It's been doing all this and more for almost a decade. And managed properly it's very stable. It's one of the good products that MS makes.

  2. Re:One folder to rule them all... by whirred · · Score: 0, Troll

    Are you on crack? Calling Exchange's "groupware features" anything but an utter joke is absurd. They're still trying to catch up to what Lotus has been doing for years, and they aren't doing a very good job of it.

    If you just want to run email, Exchange/Outlook is fine. If you want a collaborative groupware solution with workflow built in, Domino/Notes is currently the only answer.

    Plus, Domino runs on Linux, AIX, Solaris, NT, 2000, OS/2, AS/400... The list goes on and on. As for a shared database, just set up shared mail.

    Not to mention, unlike Exchange, when one mail database gets hosed your whole server doesn't get scrapped. And you aren't supporting Microsoft.

  3. Re:Maildirs by Ian+Bicking · · Score: 2, Troll

    > In case you haven't noticed, the default settings for the Linux ext[23] filesystems are to allocate one inode per 4096 or 8192 bytes of disk space, which happens to be pretty much the size of an average e-mail message. So, in other words, you are unlikely to run out of inodes before you run out of disk space, since both are going to be used up at pretty much the same clip.

    This doesn't make sense to me, at least not as presented. You are going to run out of inodes at exactly the same time you run out of disk space, because they are one and the same thing. In fact, I believe all the inodes are created when you create your filesystem, all space is mapped to an inode (though of course one file can use multiple inodes).

    The issue is not the waste of inodes, but the waste of disk space, because the smallest file chunk is one inode worth of space. It's usually said that if you have 4k inodes, you'll lose 2k (on average) per file. This is not really correct, because inodes themselves take up space. I remember reading a paper somewhere many years ago where they estimated that most users would find 4k inodes better than smaller values, because in normal file distributions the space you save with the smaller inode is less than the space taken by the increased number of inodes themselves. However, this would lead one to believe you should have really big inodes and really big files, and then you'll be very efficient.

    But really, none of this should be given much weight until someone does a statistical analysis of just how inefficient a one-mail-per-file system is. It might not be significant, or it might be insignificant compared to storing base64 messages, or it may be insignificant compared to the benefits of compression. It's bad form to optimize before profiling, and the many-file inefficiency concerns feel like they are more based on intuition and less on fact. But then, someone must have studied it, so maybe not.
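Measuring the per-file overhead this comment asks about is straightforward on a Unix system: compare each file's logical size (`st_size`) with the space actually allocated to it (`st_blocks`, counted in 512-byte units on Linux and most Unix systems). A minimal Python sketch; `maildir_slack` is a hypothetical helper, and it counts data blocks only, not the inodes or directory entries themselves:

```python
import os


def maildir_slack(root: str) -> tuple[int, int, int]:
    """Walk a mail tree and compare logical size with allocated space.

    Returns (files, bytes_in_messages, bytes_allocated).
    st_blocks is reported in 512-byte units, independent of the
    filesystem's own block size.
    """
    files = logical = allocated = 0
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            st = os.lstat(os.path.join(dirpath, name))  # lstat: don't follow symlinks
            files += 1
            logical += st.st_size
            allocated += st.st_blocks * 512
    return files, logical, allocated
```

Calling `maildir_slack(os.path.expanduser("~/Maildir"))` on a real mail tree gives the slack directly as `allocated - logical`, and dividing by the file count gives the average waste per message. Sparse files, or filesystems that store small files inline, can make the allocated figure smaller than the logical one.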

  4. Re:Exchange brain-damage by bmetz · · Score: 1, Troll

    "Gets away with it"? Come on. Someone made a mistake in their team. Yell at MS and the next version will fix it. It happens.

    I doubt there's a sinister plot in MS to mess up people's emails. They've got to use these products too, you know.

    --
    What did you eat today? http://www.atetoday.com/