Improving Unix Mail Storage?
At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the formats we use today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage?
CaraCalla asks: "Does anybody know a good, free solution for storing mail on unix hosts? The reason that I ask this question is my discontent with available techniques:
- mbox: There are problems with locking, corruption, access-times, and bloat.
- Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI). Just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program merely to display the subjects.
- Cyrus: Basically the same as Maildir with database features.
- UW-IMAP mbx: That's classical mbox with extensions allowing multiple access.
- Evolution: Basically mbox with database features.
- Windows clients: Typically some proprietary db-format. Pathetic.
But the thing that bugs me most is disk space. A typical inbox is only 5% to 10% text, including headers and HTML. The rest is BASE64- (or UU-) encoded pictures, Word documents, zip archives and so on. The problem here is the encoding, which wastes a considerable amount of space: the encoded form is at least one third larger than the original data.
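The "one third" figure can be checked with a few lines of Python. This is only a sketch: the function name and the random test payload are made up for illustration, and it measures MIME-style base64 as produced by the standard library (76-character lines), not any particular mail client.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return how much larger the base64 form is, as a fraction of the raw size."""
    # encodebytes() emits 4 output characters per 3 input bytes and adds a
    # newline after every 76 characters, as MIME bodies do.
    encoded = base64.encodebytes(raw)
    return (len(encoded) - len(raw)) / len(raw)


if __name__ == "__main__":
    payload = os.urandom(300_000)  # stand-in for a binary attachment
    print(f"base64 overhead: {base64_overhead(payload):.1%}")
```

The 4-for-3 expansion alone accounts for one third; the line breaks push it a little higher, which matches the "at least" in the post.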
Some ideas about the ideal mail-storage:
- One file per mailbox folder, allowing multiple folders per user. Should those files reside in one central location or in users' home directories?
- Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?
- File format: gdbm, Sleepycat db? Something new?
- Should the security model allow users to directly access their files, grep them, copy them around?
- Shared folders, virtual domains?
- Unicode support in folder names? IMAP message-IDs, flags, user-agent-specific state information?
- How would MTAs deliver mail? How would clients access? File-locking (NFS)?
- What about backwards compatibility? Writing libmailstore (anyone)? Adopting UW c-client?
Does my ideal mail storage exist somewhere? Is somebody working on a project addressing this? Does anybody have some other hints? And please no mbox/Maildir flamewar!"
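To make the "break messages into pieces" idea concrete, here is a rough sketch in Python using only the standard library. It is not an existing mail store: the function name, the choice of zlib, and keying attachments by their SHA-256 digest are all assumptions made for illustration. Text parts and headers stay uncompressed so they remain searchable, while attachments are decoded from base64 first and then compressed.

```python
import hashlib
import zlib
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_message(raw: bytes):
    """Split a message into (searchable text, {sha256 hex: compressed attachment})."""
    msg = message_from_bytes(raw, policy=policy.default)
    text_parts = [f"{name}: {value}" for name, value in msg.items()]
    blobs = {}
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_maintype() == "text":
            text_parts.append(part.get_content())
        else:
            data = part.get_payload(decode=True)  # undo base64 / quoted-printable
            digest = hashlib.sha256(data).hexdigest()
            blobs[digest] = zlib.compress(data)
            # Leave a placeholder so the text still records the attachment.
            text_parts.append(f"[attachment {part.get_filename()} sha256={digest}]")
    return "\n".join(text_parts), blobs


if __name__ == "__main__":
    demo = EmailMessage()
    demo["Subject"] = "report"
    demo.set_content("See attached.")
    demo.add_attachment(b"\x00" * 10_000, maintype="application",
                        subtype="octet-stream", filename="zeros.bin")
    text, blobs = split_message(demo.as_bytes())
    print(text)
    print({digest: len(blob) for digest, blob in blobs.items()})
```

A real store would also have to keep enough MIME structure to rebuild the original message byte for byte, which this sketch does not attempt.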
SPAM is a burden to everyone. As a system admin, I was told to do something about it. After some research, the best solution was to implement SpamAssassin on our Linux mail server. I tried sendmail SPAM filters, procmail rules, etc. SpamAssassin is undoubtedly the best solution and I recommend it to everyone. It needs to be implemented at the server level, so email your ISP if you don't have root access. It is a simple Perl script that can be run with sendmail (using a C++ version) or in procmail (Perl). It is very easy to set up using Perl's CPAN.
How does it work so well? SpamAssassin checks the headers and body of every email passing into the mail server. It searches the email for certain keywords and phrases and other SPAM characteristics and assigns points to the email based on these. It works very well and has many options, including the ability to have "black lists" and "white lists" in file glob format.
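For anyone wondering what "assigns points" looks like in practice, here is a toy version of the idea in Python. It is not SpamAssassin's code or rule set: the patterns, scores, threshold, and list entries below are invented for illustration, and only the general approach (sum the points of matching rules, adjust for glob-style white and black lists, compare against a threshold) follows the description above.

```python
import re
from fnmatch import fnmatch

# Hypothetical rules: (pattern, points). Real rule sets are far larger.
RULES = [
    (re.compile(r"free money", re.I), 2.5),
    (re.compile(r"click here", re.I), 1.5),
    (re.compile(r"^Subject:.*!!!", re.I | re.M), 1.0),
]
WHITELIST = ["*@example.org"]   # glob patterns, made up for this sketch
BLACKLIST = ["*@spam.example"]
THRESHOLD = 3.0                 # illustrative cut-off


def score(sender: str, message: str) -> float:
    """Sum the points of every matching rule, then adjust for list membership."""
    points = sum(p for pattern, p in RULES if pattern.search(message))
    if any(fnmatch(sender, glob) for glob in WHITELIST):
        points -= 100.0
    if any(fnmatch(sender, glob) for glob in BLACKLIST):
        points += 100.0
    return points


def is_spam(sender: str, message: str) -> bool:
    return score(sender, message) >= THRESHOLD


if __name__ == "__main__":
    mail = "Subject: WIN!!!\n\nFree money, just click here"
    print(score("someone@elsewhere.example", mail), is_spam("someone@elsewhere.example", mail))
```

Because each rule only nudges the total, no single keyword condemns a message, which is a large part of why this style of filter produces fewer false positives than plain keyword blocking.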
So far I have blocked about 94% of the SPAM coming in through our mail server. It only misses a couple and is highly configurable! Download and install it!
Cheers,
Tom
http://tomgould.com/
Just edit your .procmailrc:
:0
/dev/null
And all your problems are solved.
Very few systems give alternate functional views to different users. In order to send a letter to a section, I'd have to find out the name of a person in the section, send it to that person's email, and then hope that person is in.
What is needed is a parallel view where people can add functions to their role (at user level, for example).
So an email to recruitment@sample.com will get to the recruitment folder, and any of the recruitment officers can deal with it.
The one catch is spam. If every domain has a "recruitment" address, then one could send out mail to "recruitment@[each domain]" a lot more easily than getting the right name for each domain.
The idea is that a section inbox should be available to a section, and not an individual, and that people in the section should be given access when appropriate. A section would then retain the same name, regardless of the personnel making it up.
None of the mail systems that I see grasp this point.
One could have view sets, which are alternate tree structures, with the accounts at the leaf objects. One could be in the flat "name" tree, or access the personnel\recruitment intray, or whatever.
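A bare-bones model of the section inbox, sketched in Python, might look like the following. Everything here is hypothetical (class names, method names, the in-memory storage); it only shows the point being argued: mail is addressed to the section, and membership decides who may read it, so the address survives staff changes.

```python
from dataclasses import dataclass, field


@dataclass
class SectionInbox:
    """A mailbox owned by a role or section rather than by a person."""
    address: str
    members: set = field(default_factory=set)
    messages: list = field(default_factory=list)


class MailDirectory:
    def __init__(self):
        self.sections = {}

    def add_section(self, address, members=()):
        self.sections[address] = SectionInbox(address, set(members))

    def deliver(self, address, message):
        # Delivery targets the section; no individual's name is needed.
        self.sections[address].messages.append(message)

    def read(self, user, address):
        inbox = self.sections[address]
        if user not in inbox.members:
            raise PermissionError(f"{user} is not in {address}")
        return list(inbox.messages)


if __name__ == "__main__":
    directory = MailDirectory()
    directory.add_section("recruitment@sample.com", {"alice"})
    directory.deliver("recruitment@sample.com", "CV attached")
    # Staff change: the section keeps its address and its mail.
    directory.sections["recruitment@sample.com"].members = {"bob"}
    print(directory.read("bob", "recruitment@sample.com"))
```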
OS/2 - because choice is a terrible thing to waste.