Slashdot Mirror


Improving Unix Mail Storage?

At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we store it today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage? CaraCalla asks: "Does anybody know a good, free solution for storing mail on Unix hosts? The reason I ask is my discontent with the available techniques:
  • mbox: There are problems with locking, corruption, access-times, and bloat.
  • Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGI) and just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program to only display the subjects.
  • Cyrus: Basically the same as Maildir with database features.
  • UW-Imap mbx: That's classical mbox with extensions allowing multiple access.
  • Evolution: Basically mbox with database features.
  • Windows clients: Typically some proprietary db-format. Pathetic.

But the thing that bugs me most is disk space. Typical inboxes consist of 5% to 10% text, including headers and HTML. The rest is BASE64- (or UU-) encoded pictures, Word documents, zip archives and so on. The problem here is the encoding, which wastes considerable amounts of space (it inflates the binary data by at least one third).
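
The "at least one third" figure can be checked directly. The following is a minimal sketch, not a measurement of any real mailbox: the helper name base64_overhead and the random payload are illustrative assumptions. Base64 turns every 3 input bytes into 4 output characters, and MIME line wrapping adds a little more on top.

```python
import base64
import os

def base64_overhead(raw: bytes) -> float:
    """Return the fractional size increase caused by MIME-style Base64."""
    # encodebytes() adds a newline after every 76 output characters,
    # similar to a MIME body, so the result is slightly above the bare 1/3.
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw) - 1.0

# Random bytes stand in for a binary attachment (hypothetical data).
payload = os.urandom(300_000)
print(f"overhead: {base64_overhead(payload):.1%}")
```

Real messages use CRLF line endings, so the overhead on the wire is a bit higher than what this sketch reports.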

Some ideas about the ideal mail-storage:

  • One file per mailbox folder, allowing multiple folders per user. Should those files reside in one central location or in users' home directories?
  • Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?
  • File format: gdbm, Sleepycat db? Something new?
  • Should the security model allow users to directly access their files, grep them, copy them around?
  • Shared folders, virtual domains?
  • Unicode support in folder names? IMAP message IDs, flags, user-agent-specific state information?
  • How would MTAs deliver mail? How would clients access? File-locking (NFS)?
  • What about backwards compatibility? Writing libmailstore (anyone)? Adopting UW c-client?

Does my ideal mailstorage exist somewhere? Is somebody working on a project addressing this? Does anybody have some other hints? And please no mbox/Maildir flamewar!"
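
One way to picture the compression idea from the list (text kept searchable, attachments stored separately) is the sketch below. It is only an illustration under assumptions: the function name split_message, the choice of zlib, and the in-memory return value are all hypothetical, and it does not address locking, delivery, or an on-disk format.

```python
import email
import zlib
from email import policy
from email.message import EmailMessage

def split_message(raw: bytes):
    """Return (searchable_text, attachments) for one message.

    attachments is a list of (filename, zlib-compressed decoded bytes).
    """
    msg = email.message_from_bytes(raw, policy=policy.default)
    text_parts = [f"{name}: {value}" for name, value in msg.items()]
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_maintype() == "text" and not part.is_attachment():
            text_parts.append(part.get_content())
        else:
            # decode=True undoes Base64, so the binary is stored raw, then compressed
            payload = part.get_payload(decode=True) or b""
            attachments.append((part.get_filename() or "unnamed", zlib.compress(payload)))
    return "\n".join(text_parts), attachments

# Hypothetical message with one binary attachment.
demo = EmailMessage()
demo["Subject"] = "report"
demo.set_content("See the attached file.")
demo.add_attachment(b"\x00" * 10_000, maintype="application",
                    subtype="octet-stream", filename="blob.bin")
text, files = split_message(demo.as_bytes())
print(text)
print([(name, len(blob)) for name, blob in files])
```

Searching then only touches the text, while each attachment is decompressed on demand. Rebuilding the exact original bytes of the message from these pieces is a separate problem that this sketch does not solve.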

4 of 554 comments

  1. I vote for a filesystem-based database by Dr. Awktagon · · Score: 5, Insightful

    Something like Maildir .. if the FS is slow and can't handle that kind of application, then we need to improve our filesystems!

    Lots of applications need lightweight databases with indexes, locking, and atomic operations. Why not bake this into the filesystem, and it won't have to be just for email, it will have many uses.

    I was thinking about this the other day as I was working on a logging system for a large in-house email filtering system.. similar problem, except instead of storing emails, I'm storing small XML fragments describing the structure of each email and what was done to each. So far the easiest solution was large monolithic XML files, and an external index pointing in the large file (i.e., like mbox + a DB index). As it grows we'll probably have to move it to a "real" database.
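
The "mbox + a DB index" layout mentioned in this comment can be sketched roughly as follows. This is an assumption-laden illustration, not the commenter's system: build_index and read_message are hypothetical names, the index is a plain in-memory list rather than a database, and a line starting with "From " is assumed to always begin a new message (which holds only if body lines are escaped, as mbox writers usually do).

```python
import os
import tempfile

def build_index(path: str) -> list[tuple[int, int]]:
    """Scan an mbox-style file once and return (offset, length) per message."""
    starts = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                starts.append(pos)
            pos += len(line)
    ends = starts[1:] + [pos]
    return [(s, e - s) for s, e in zip(starts, ends)]

def read_message(path: str, index: list[tuple[int, int]], n: int) -> bytes:
    """Fetch message n with a single seek instead of re-parsing the file."""
    offset, length = index[n]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Two made-up messages in one monolithic file.
sample = (b"From a@example.org Mon Jan  1 00:00:00 2001\n"
          b"Subject: first\n\nhello\n\n"
          b"From b@example.org Mon Jan  1 00:00:01 2001\n"
          b"Subject: second\n\nworld\n\n")
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
index = build_index(tmp.name)
second = read_message(tmp.name, index, 1)
os.unlink(tmp.name)
print(index)
print(second.decode())
```

The index has to be rebuilt or updated whenever the big file changes, which is where the locking and corruption worries about mbox come back in.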

    There is a need for something like sleepycat DB + ReiserFS on steroids..

  2. Something to keep in mind... by cwinters · · Score: 5, Insightful

    /. punchingbag jwz has some strong opinions about using databases (etc.) for mail storage. I tend to agree: everything can read from and write to files, there are no versioning issues, they can be easily transported among different operating and file systems, they can be backed up easily. But it's another wheel to reinvent, so everyone hop to it at once and then lose interest in two or three weeks!

    --

    Chris
    M-x auto-bs-mode

  3. Re:The Reiser guys have some ideas. by SwellJoe · · Score: 5, Insightful
    I have also heard from someone who does Linux consulting who won't use ReiserFS. Overall, I don't call it stable.


    Heheh...I read a funny quote here on slashdot earlier today that I think applies:

    The plural of anecdote is not data.


    I've heard from a lot of people who consider themselves experts that ReiserFS is not stable, never has been, never will be, all that fun stuff. But I know better, because I have data. Hard numbers...I know I can run a Squid box harder and at higher loads for longer on ReiserFS than ext2 or ext3. I know that I can run a Squid machine for 2 years with ReiserFS cache partitions with uptimes over a year, with the reboot after all that time being for a kernel upgrade.

    Yes, there have been data corruption issues for some people for ReiserFS. But I'm on the ext3 and jfs mailing lists as well...I know they have data corruptions of their own. It's a fact of life when dealing with computers, things go wrong for everyone at some point. I simply don't believe the masses when they tell me ReiserFS is not suitable for production use, because I have more machines to administer than the vast majority of slashdotters, and I believe I can trust ReiserFS. I trust my opinion above most.

  4. Don't speculate. Profile. by Doktor Memory · · Score: 5, Insightful
    Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes, space (unless perhaps you use Linux/ReiserFS or SGI)
    Psssst. It's not 1978 any more. Inodes are cheap. So is disk space. Stop spreading FUD.
    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program to only display the subjects.
    Quite right. Just try it. You might be a bit surprised by the results.
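
For anyone who wants to try it, here is a minimal timing sketch using Python's standard mailbox module. The message contents, the count of 1000, and the helper names make_maildir and list_subjects are all assumptions for illustration. The numbers it prints depend on the filesystem and on caching (the files were just written, so the cache is warm), so treat them as a starting point for profiling rather than a verdict.

```python
import mailbox
import os
import tempfile
import time
from email.message import EmailMessage

def make_maildir(path: str, count: int) -> mailbox.Maildir:
    """Create a Maildir at path and fill it with small synthetic messages."""
    box = mailbox.Maildir(path, create=True)
    for i in range(count):
        msg = EmailMessage()
        msg["From"] = "sender@example.org"
        msg["Subject"] = f"test message {i}"
        msg.set_content("body\n" * 20)
        box.add(msg)
    return box

def list_subjects(box: mailbox.Maildir) -> list[str]:
    """Open every message file and pull out its Subject header."""
    return [msg["Subject"] for msg in box]

with tempfile.TemporaryDirectory() as tmp:
    box = make_maildir(os.path.join(tmp, "Maildir"), 1000)
    start = time.perf_counter()
    subjects = list_subjects(box)
    elapsed = time.perf_counter() - start
    print(f"{len(subjects)} subjects in {elapsed:.3f} s")
```

Note that this parses each message in full; a mail program that reads only the headers, or caches them, would have less work to do.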
    --

    News for Nerds. Stuff that Matters? Like hell.