Improving Unix Mail Storage?
In the beginning there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we do it today. I admit, given how email has changed over the past 5 years (in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage?
CaraCalla asks: "Does anybody know a good, free solution for storing mail on unix hosts? The reason that I ask this question is my discontent with available techniques:
- mbox: There are problems with locking, corruption, access-times, and bloat.
- Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI), and just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program just to display the subjects.
- Cyrus: Basically the same as Maildir with database features.
- UW-IMAP mbx: That's classical mbox with extensions allowing multiple access.
- Evolution: Basically mbox with database features.
- Windows clients: Typically some proprietary db-format. Pathetic.
But the thing that bugs me most is disk space. In a typical inbox only 5% to 10% is text, including headers and HTML. The rest is base64- (or uu-) encoded pictures, Word documents, zip archives and so on. The problem here is the encoding, which wastes a considerable amount of space: the encoded form is at least one third larger than the original data.
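The "one third" figure follows from how base64 works: every 3 input bytes become 4 output characters, and MIME (RFC 2045) also breaks the encoded text into lines of at most 76 characters, each ending in CRLF. The short Python sketch below (the function name base64_overhead is purely illustrative) measures the resulting ratio; for large attachments it converges to about 4/3 × 78/76 ≈ 1.368, i.e. roughly 37% more space than the decoded data.

```python
import base64
import os


def base64_overhead(raw: bytes, line_length: int = 76) -> float:
    """Ratio of MIME base64 size to raw size, counting CRLF line breaks."""
    if not raw:
        raise ValueError("need at least one byte")
    encoded = base64.b64encode(raw)  # 4 output characters per 3 input bytes
    # RFC 2045 caps encoded lines at 76 characters, each terminated by CRLF.
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    wrapped = b"\r\n".join(lines) + b"\r\n"
    return len(wrapped) / len(raw)


if __name__ == "__main__":
    attachment = os.urandom(300_000)  # stand-in for a binary attachment
    print(f"{base64_overhead(attachment):.3f}")  # prints 1.368
```

The ratio does not depend on the content of the attachment, only on its length, so random bytes are as good a stand-in as a real JPEG or Word file.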
Some ideas about the ideal mail-storage:
- One file per mailbox folder, allowing multiple folders per user. Should those files reside in one central location or in users' home directories?
- Compression: Should messages be broken into pieces, with the MIME attachments stored separately (so the text parts could still be searched without decompressing the whole file)?
- File format: gdbm, Sleepycat db? Something new?
- Should the security model allow users to directly access their files, grep them, copy them around?
- Shared folders, virtual domains?
- Unicode support in folder names? IMAP message IDs, flags, user-agent-specific state information?
- How would MTAs deliver mail? How would clients access? File-locking (NFS)?
- What about backwards compatibility? Writing a libmailstore (anyone)? Adopting UW c-client?
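The compression idea from the list can be sketched with Python's standard email and zlib modules: keep the text parts as plain, searchable strings and store each attachment decoded (so the base64 inflation disappears) and compressed. This is a minimal illustration, not an existing mail store; the function name split_message, the "part-N" naming for unnamed parts, and the choice of zlib are all assumptions, and where the blobs end up (gdbm, Sleepycat db, separate files) is left open.

```python
import email
import email.policy
import zlib
from email.message import EmailMessage


def split_message(raw: bytes):
    """Split a raw RFC 822 message into searchable text and compressed attachments.

    Returns (text_parts, blobs): text_parts is a list of str bodies kept
    uncompressed so they can still be searched; blobs maps a filename to the
    zlib-compressed *decoded* attachment bytes.
    """
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    text_parts, blobs = [], {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # multipart containers carry no payload of their own
        filename = part.get_filename()
        if filename is None and part.get_content_maintype() == "text":
            text_parts.append(part.get_content())
        else:
            # decode=True undoes base64 / quoted-printable, so we compress the
            # original bytes instead of the inflated transfer encoding.
            data = part.get_payload(decode=True) or b""
            key = filename or f"part-{len(blobs)}"  # hypothetical naming scheme
            blobs[key] = zlib.compress(data, 9)
    return text_parts, blobs


if __name__ == "__main__":
    # Build a small test message with one text body and one binary attachment.
    demo = EmailMessage()
    demo["Subject"] = "report"
    demo.set_content("See attached.")
    demo.add_attachment(bytes(range(256)) * 64, maintype="application",
                        subtype="octet-stream", filename="data.bin")
    raw = demo.as_bytes()
    texts, blobs = split_message(raw)
    stored = sum(len(t.encode()) for t in texts) + sum(map(len, blobs.values()))
    print(f"raw message: {len(raw)} bytes, stored: {stored} bytes")
```

Real attachments such as JPEGs and zip archives are already compressed, so for them most of the saving comes from dropping the base64 encoding rather than from zlib.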
Does my ideal mail storage exist somewhere? Is somebody working on a project addressing this? Does anybody have any other hints? And please, no mbox/Maildir flamewar!"
That's MS Exchange alright. It's been doing all this and more for almost a decade. And managed properly it's very stable. It's one of the good products that MS makes.
Settle down Bucky! You're talking about a whole new approach. This is Slashdot, home of Open Sores users. We only reinvent shit that's been around for ages.
Are you on crack? Calling Exchange's "groupware features" anything but an utter joke is absurd. They're still trying to catch up to what Lotus has been doing for years, and they aren't doing a very good job of it.
If you just want to run email, Exchange/Outlook is fine. If you want a collaborative groupware solution with workflow built in, Domino/Notes is the only answer, currently.
Plus, Domino runs on Linux, AIX, Solaris, NT, 2000, OS/2, AS/400... The list goes on and on. As for a shared database, just set up shared mail.
Not to mention, unlike Exchange, when one mail database gets hosed your whole server doesn't get scrapped. And you aren't supporting Microsoft.
The issue is not the waste of inodes, but the waste of disk space, because the smallest unit a file can occupy is one block. It's usually said that with 4 KB blocks you'll lose 2 KB (on average) per file. This is not quite correct, because the bookkeeping for blocks takes up space too. I remember reading a paper many years ago estimating that most users would find 4 KB blocks better than smaller sizes, because in typical file-size distributions the space saved with smaller blocks is less than the space taken by the extra metadata needed to track them. However, this would lead one to believe you should have really big blocks and really big files, and then you'll be very efficient.
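The per-file waste described here is internal fragmentation: a file occupies whole allocation blocks, so the unused tail of its last block is lost. If file sizes are spread evenly modulo the block size, that averages half a block per file, which is where the "lose 2 KB with 4 KB blocks" rule of thumb comes from. A minimal Python sketch of the arithmetic (slack_bytes is an illustrative name); it deliberately ignores inode and indirect-block metadata as well as tail packing, which ReiserFS uses to store small files more compactly.

```python
def slack_bytes(file_sizes, block_size=4096):
    """Bytes lost to internal fragmentation when every file occupies whole blocks.

    Simplified model: ignores inode and indirect-block metadata and any
    tail packing or inline storage of small files.
    """
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)  # ceiling division; an empty file uses 0 blocks
        total += blocks * block_size - size
    return total


if __name__ == "__main__":
    sizes = [1, 4096, 5000]
    print(slack_bytes(sizes))  # 4095 + 0 + 3192 = 7287
```

Feeding it the message sizes from a real mailbox, at several candidate block sizes, gives a quick first estimate of how much a one-file-per-message layout costs.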
But really, none of this should be given much weight until someone does a statistical analysis of just how inefficient a one-mail-per-file system is. It might not be significant, or it might be insignificant compared to storing base64 messages, or it may be insignificant compared to the benefits of compression. It's bad form to optimize before profiling, and the many-file inefficiency concerns feel like they are more based on intuition and less on fact. But then, someone must have studied it, so maybe not.
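A starting point for the statistical analysis asked for above is to compare, for every file in an existing Maildir, its apparent size (st_size) with the space actually allocated on disk (st_blocks). A minimal Python sketch; the default path ~/Maildir is an assumption, and st_blocks is counted in 512-byte units on Linux and the BSDs, which POSIX does not strictly guarantee.

```python
import os
import stat
import sys


def measure_tree(root):
    """Return (file_count, apparent_bytes, allocated_bytes) for regular files under root."""
    count = apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            count += 1
            apparent += st.st_size
            # st_blocks counts 512-byte units on Linux and the BSDs.
            allocated += st.st_blocks * 512
    return count, apparent, allocated


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/Maildir")
    n, used, alloc = measure_tree(root)
    print(f"{n} files, {used} bytes of data, {alloc} bytes allocated")
    if n:
        print(f"average overhead: {(alloc - used) / n:.0f} bytes per file")
```

Running it over the same mail stored on ext2, ReiserFS and XFS would turn the intuition about many small files into actual numbers. Note that the count excludes inode space itself, which is preallocated on many filesystems.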
"Gets away with it"? Come on. Someone made a mistake in their team. Yell at MS and the next version will fix it. It happens.
I doubt there's a sinister plot in MS to mess up people's emails. They've got to use these products too, you know.