Slashdot Mirror


Improving Unix Mail Storage?

At first, there was mbox, then there was Maildir, and Bill begat Outlook and .mbx. CaraCalla wonders if there is a better way to store mail than the way we currently store it today. I admit, with the changes that email has undergone over the past 5 years (changes in what is being sent, not necessarily in how it is sent), it may be time to reinvent the mail format. Read on for CaraCalla's analysis of the current mail options, and his thoughts on where we may go in the future. If you were to design your own MUA, how would you design its mail storage? CaraCalla asks: "Does anybody know a good, free solution for storing mail on unix hosts? The reason that I ask this question is my discontent with available techniques:
  • mbox: There are problems with locking, corruption, access-times, and bloat.
  • Maildir: Do you really want to clutter your system with millions of small files? That's a waste of inodes and space (unless perhaps you use Linux/ReiserFS or SGI). And just try to open a Maildir with 1000+ mails and see how long it takes your favorite mail program just to display the subjects.
  • Cyrus: Basically the same as Maildir with database features.
  • UW-Imap mbx: That's classical mbox with extensions allowing multiple access.
  • Evolution: Basically mbox with database features.
  • Windows clients: Typically some proprietary db-format. Pathetic.

But the thing that bugs me most is disk space. Typical inboxes are 5% to 10% text, including headers and HTML. The rest is BASE64- (or UU-) encoded pictures, Word documents, zip archives and so on. The problem here is the encoding, which wastes a considerable amount of space (the encoded data is at least one third larger than the original).
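As a rough check on the "one third" figure, here is a minimal Python sketch that measures how much a random binary payload grows once it is base64-encoded with MIME-style line wrapping. The function name `base64_overhead` and the 300,000-byte sample size are illustrative assumptions, not part of the original post.

```python
import base64
import os


def base64_overhead(raw_size: int) -> float:
    """Return the fractional growth when raw_size random bytes are base64-encoded."""
    raw = os.urandom(raw_size)
    # encodebytes() breaks the output into 76-character lines, as MIME bodies do.
    encoded = base64.encodebytes(raw)
    return (len(encoded) - raw_size) / raw_size


if __name__ == "__main__":
    print(f"growth: {base64_overhead(300_000):.1%}")
```

Base64 maps every 3 input bytes to 4 output characters, so the growth is one third before line breaks are counted; the line breaks push it slightly higher.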

Some ideas about the ideal mail-storage:

  • One file per Mailbox-folder, allowing multiple folders per user. Should those files reside in one central location or in users Homedirs?
  • Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?
  • File format: gdbm, Sleepycat db? Something new?
  • Should the security model allow users to directly access their files, grep them, copy them around?
  • Shared folders, virtual domains?
  • Unicode support in folder names? Imap message-IDs, flags, useragent specific state-information?
  • How would MTAs deliver mail? How would clients access? File-locking (NFS)?
  • What about backwards-compatibility? Writing libmailstore (anyone)? adopting UW c-client?

Does my ideal mail storage exist somewhere? Is somebody working on a project addressing this? Does anybody have some other hints? And please no mbox/Maildir flamewar!"

21 of 554 comments

  1. One folder to rule them all... by Pig+Hogger · · Score: 5, Interesting
    Stuff the mail in one folder, one to rule them all.

    But put multiple indexes (by sender, subject, date, whatever key-classes you want to assign messages) and the possibility to restrict the range displayed. With careful programming, you can manage many users who won't be able to read each other's mail, except as required.

    This way, you can arrange your mail as you please.

    No more message duplication. Send a memo to 250 people? Just send it once, but tag it as readable by the 250 sendees.

    Of course, this calls for an SQL database... :) :) :)
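One way to picture the "store once, tag as readable" idea is the sketch below. It uses Python's built-in `sqlite3` purely as a stand-in for whatever SQL database the poster had in mind; the table names (`message`, `readable_by`) and the helper functions are hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) single-folder schema and return a connection."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS message (
            id      INTEGER PRIMARY KEY,
            sender  TEXT NOT NULL,
            subject TEXT NOT NULL,
            sent_at TEXT NOT NULL,
            raw     BLOB NOT NULL
        );
        -- One row per (message, reader): the body itself is stored only once.
        CREATE TABLE IF NOT EXISTS readable_by (
            message_id INTEGER NOT NULL REFERENCES message(id),
            username   TEXT NOT NULL,
            PRIMARY KEY (message_id, username)
        );
        -- Multiple indexes over the one big folder.
        CREATE INDEX IF NOT EXISTS message_by_sender ON message(sender);
        CREATE INDEX IF NOT EXISTS message_by_date   ON message(sent_at);
        CREATE INDEX IF NOT EXISTS readable_by_user  ON readable_by(username);
    """)
    return db


def store(db, sender, subject, sent_at, raw, recipients):
    """Store one copy of a message and tag it as readable by each recipient."""
    cur = db.execute(
        "INSERT INTO message (sender, subject, sent_at, raw) VALUES (?, ?, ?, ?)",
        (sender, subject, sent_at, raw),
    )
    db.executemany(
        "INSERT INTO readable_by (message_id, username) VALUES (?, ?)",
        [(cur.lastrowid, user) for user in recipients],
    )
    db.commit()
    return cur.lastrowid


def list_for(db, username):
    """List (id, sender, subject) for the messages this user may read, oldest first."""
    rows = db.execute(
        """SELECT m.id, m.sender, m.subject
           FROM message m JOIN readable_by r ON r.message_id = m.id
           WHERE r.username = ?
           ORDER BY m.sent_at""",
        (username,),
    )
    return rows.fetchall()
```

A memo to 250 people would then be one row in `message` and 250 small rows in `readable_by`.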

    1. Re:One folder to rule them all... by mikefoley · · Score: 3, Interesting

      Do they still not have single mailbox restore? Do you still need to build a separate Exchange server just to restore a single mailbox or message?

      Exchange made my life miserable for many years in the 93-95 timeframe. It might be better now.

      The concepts weren't bad (db for mail, etc) but the execution was terrible. I was field testing Exchange (for Alpha NT) when I was at DEC and asked the Exchange manager point blank about single mailbox restore and he said "Why?" My answer "When my boss wants that email he really needs yesterday, you're telling me I have to build a totally new system and restore 8GB (at the time) of data just to restore a single mail message????"

      "Uh, yea?"

      No thanks...

      --
      What's my Karma Mr. Burns? "Excellent"
    2. Re:One folder to rule them all... by jcoy42 · · Score: 2, Interesting
      Exchange is actually a pretty decent mail server

      The problem is, mail is a critical app, and in some ways exchange *really* misses the point.

      I think the most frustrating situation I've ever seen as a system administrator was when we were doing a scheduled reboot of the exchange server. After about 20 minutes waiting for it to resync itself & shutdown, my boss, one of those "smart enough to be dangerous types", decided it was a typical NT hang & to go ahead and hit the reset button (he did it in a single motion without a word, there was nothing anyone could do).

      It took almost 2 weeks to get things straightened out. We had backups, but it turned out there was a 2048 meg bug with NT restores that had been re-introduced by a recent server upgrade, and we had problems getting the patch rolled into the new code (Legato tech support - need I say more?).

      350 people *screaming* for 2 weeks. I was very glad I was not the mail administrator.. but very sad to be sitting next to him.
      --
      Never trust an atom. They make up everything.
    3. Re:One folder to rule them all... by H310iSe · · Score: 4, Interesting

      Ditto. Either Exchange is impossible to administer well or just very, very hard. Until recently you couldn't restore a single user's mailbox (there was brick backup, but eventually even MS admitted it didn't work); you had to restore the entire server to get back one corrupted data store.

      At one firm I was at exchange went down and it took 3 days of 24 hour work to get it back up (I guess we were 'lucky') - the solution? a $50,000 backup server that does absolutely nothing but wait for the main exchange server to go down. First time we had to use it, we were down for a day, it didn't come up.

      I'm currently looking for a mail server, any server, that does mail well for 50 - 500 users (I'd settle for 50-100). I've played w/ xMail, it's tough to config. Heard good things about WorldMail (qualcomm?) but not used it. Heard free BSD's qmail (?) is good as well. I'm very interested in anyone who has info about free or cheap mail servers that can be configured in a day or two of work. If that exists.

      --
      closed minded is as closed minded does
  2. My fast, easy solution. by crazney · · Score: 2, Interesting

    Well. My solution for storing A LOT of BIG email but still browsing fast is to use MySQL. My mail client is Pronto! (written by Muhri, in perl, gtk, etc). I have several tens of thousands of emails in about 10 different folders. Reaction time is immediate and searching is pretty damn quick as well.
    The mysql server is at work, and I can view my mail from anywhere simply by pointing my client at my IP. Presto.

    I'm also slowly writing a MySQL-based IMAP server which will hopefully be compatible with Pronto!... But as with so many projects, it'll probably take some time to complete...

    David

    --
    stuff
  3. Eudora by cpaluc · · Score: 1, Interesting
    I'm quite happy with Eudora's mail storage technique. The messages are stored in a format much like mbox except that the attachments are stripped out and dumped in a user-specified directory. This leaves text-only mailboxes that are reasonably small in size. They can be searched easily/quickly and they can be compressed even smaller for storage/backup. I really don't see the point of retaining attachments within the mbox file - apart from the inefficiency, they're not accessible from the shell/OS (eg. you can't grep your attachments unless you manually export them).

    This is one feature I miss in Linux mail clients. At one stage I wrote a perl filter to achieve this functionality with Kmail.

  4. Well.... by BJH · · Score: 2, Interesting

    One file per Mailbox-folder, allowing multiple folders per user. Should those files reside in one central location or in users Homedirs?

    Depends on how the user accesses their mail. If they read their mail only on the local machine, it should be in their home dir. If the server allows multiple forms of access (like local + IMAP), central storage makes sense. There's a lot of other issues here, like backup methods.

    Compression: Should messages be broken into pieces and the MIME-attachments stored separately (thus searching of the text parts would still be possible without decompressing the whole file)?

    No. Separating a single mail into its component parts is just asking for trouble (not to mention that it massively increases your locking problems).

    File format: gdbm, Sleepycat db? Something new?

    Personally, I like Maildir, since it lets me use standard tools like grep to find particular mails. I admit that a more efficient method is probably required these days.

    Should the security model allow users to directly access their files, grep them, copy them around?

    Yes, of course. It's their mail - let them do what they want with it. The mail app must be able to deal with that.

    Shared folders, virtual domains?

    Shared folders would be nice - IMAP can do that now, although it's overengineered and not necessarily fully implemented in any particular IMAP server. Virtual domains I've never had any use for myself...

    Unicode support in folder names?

    Why not?

    Imap message-IDs, flags, useragent specific state-information?

    As you say, IMAP does that already...

    File-locking (NFS)?

    More the fault of NFS than the mail software (and I believe NFS4 handles locking better).

  5. Re:I vote for a filesystem-based database by GoRK · · Score: 3, Interesting

    This, as several other threads note, is the approach that Hans Reiser is taking with his filesystem. That is, if the filesystem is not good enough for storing our (large-grained) data, so that we resort to what basically amounts to indexed archive files or databases full of BLOB objects, then our filesystems are broken. A directory with 1,000,000 files in it shouldn't take any longer to return a sorted directory listing than one with, say, 10 files, because it all should be indexed behind the scenes. The same goes for the problems of inode starvation, fsck, etc. Programs such as mail clients and servers (for most people anyway), or any other apps that need simple storage, should use the filesystem as the storage mechanism.

  6. OS400 has been doing this for years by Starbuck · · Score: 1, Interesting

    on a REAL computer (albeit big iron), OS400 does exactly what they are proposing. Sure, the as400 has a bunch of smaller processors that operate the individual subsystems, but isn't this somewhat like what the video card industry is stepping towards in terms of GPUs. If your hard drive handled all of the hard drive tasks (meaning it only requests/sends data to the CPU) things would be a lot faster. Also a lot of proprietary hardware, but that's what standards are for. something like this is years away, but there is a limit on how bloated and stupid an OS can get. (sorry XP, but your 1000MB butt is too big for my taste.)

  7. Re:Quantum-like Storage by akh · · Score: 2, Interesting

    A similar idea has already been implemented. Some Canadian researchers used an existing 8000km fiber optic network as a storage device. Basically, the network is configured as a loop and the data to be stored is simply sent onto the network. Packets of data are placed onto the network and can be pulled from it as they pass a node on the network. It's kind of like a cross between a token ring network and a mercury delay line. You can find a few more details from this link.

    --
    Accept Eris as your Fnord and personally sate her
  8. Usenet-style, with overview database. by strredwolf · · Score: 4, Interesting

    Plain and simple. Switch from mail to Usenet. Maildir-like structure, but with a .overview (XOVER) file to help out with indexing.

    Storage is another problem, though... but Usenet messages can be sidetracked a bit with the encoding.

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
  9. Re:No Notes on Linux by Anonymous Coward · · Score: 2, Interesting

    as i understand it, and i know i don't know much, it looks a lot like notes is going to be ditched in favor of web-ish access. i would guess that notes and the sametime client are probably going to get obsoleted at the same time...

    of course i have no real information, but that's how it looks. i don't think anyone believes notes is actually a good piece of work. i certainly hope not.

    [posting anonymously so as not to irritate my superiors]

  10. Looking for a problem that doesn't exist by NotZed · · Score: 5, Interesting

    I lost the plot half way through this, but here's some food for thought anyway. Now I should get back to work ...

    Z

    I think that this is looking for the solution to a problem that doesn't really exist in the first place. Although I guess it depends somewhat on what you define as 'Unix mail'.

    I'm a developer on Evolution, and primarily on Camel, Evolution's email library. I'm not sure I'd rave about it (although I think Camel is a mostly beautiful piece of code ;), but it works reasonably well, and we've had a chance to try and deal with users with lots of email.

    What IS 'Unix mail'?

    I would define Unix mail as mail (rfc822 format) downloaded and stored locally on a per-user basis. IMAP, Exchange, and other remote protocols are very different beasts.

    Why are DBMS's not suitable for 'Unix mail'?

    Once you have a remote server you have to do things differently than if you have local access. Using a DBMS, and having a trained administrator to manage it, are practical considerations, as are the benefits you might get from this configuration. These solutions don't really make sense for standalone users. They shouldn't need to install and manage databases, complex backup procedures, and so forth, just to read their email.

    i.e. rdbms's are:
    hard to setup
    hard to maintain
    another major point of failure

    If, however, I was to design a multi-user groupware server, then a DBMS would come into serious consideration - at the backend at least. It allows you to do things like easily consolidate authentication outside of the operating system (the idea of having a 'shell account' to access mail is somewhat outdated), and it allows you to save space by storing common data, like attachments and email content, in a single place and redirecting it to multiple recipients (which is a common practice within organisations). It may be practical to use a mixture: an RDBMS to store textual parts or indices to data stored in a more conventional filesystem.

    But even with an RDBMS backend, I would personally probably still stick to IMAP to serve it to actual clients. The IMAP protocol is a bit heavy, but not really that bad, and it serves email; I don't think there's really any need to reinvent the wheel here.

    So ...

    If you define unix mail as I have, and separate it from a *mail server*, then you rule out full blown RDBMS's, and are left with:

    single file database
    multiple file database

    I'm not even going to mention XML because I think it is the single most stupid idea anyone's come up with. It is completely unsuitable for this purpose.

    And well, there's really no reason not to use MIME to store the messages. MIME already does everything you can possibly do with email (since, uh, it is how the email *will* be sent), any client will already have to deal with it, and mime decoding is for the most part really quite simple and fast anyway. Translating the mime format into some other storage format really doesn't make sense.

    single file databases

    mbox

    Mbox is a single file database. It's just that everyone that uses it generally writes their own access code. This is where problems with 'locking' come about, either because the underlying filesystem doesn't support it properly (e.g. some nfs implementations), or everyone's clients don't use the same locking mechanism. This is really just an implementation issue anyway. There would be nothing to stop someone writing a common 'mbox.db' library that stored everything in completely compatible mbox files, which took all the work out of it, and then you'd have an mbox DBMS ...

    mbox scales OK: without any caching of header information it handles on the order of 2K messages in an interactive timescale, and quite a lot more if you don't mind some short delays (i.e. on the order of the time it takes Mozilla to start up).

    Appending and reading is quick, and reliable - assuming the filesystem works, which is a pretty safe assumption to make. This is assuming the mailbox is first summarised at first opening, otherwise looking up messages can be slow, because you have to scan the whole file first.

    The only operation that is slow is expunging messages, and at worst case isn't really any slower than copying a whole file across to another file.

    The only other issue is agreement on the 'standard' for what constitutes an mbox file. For example, Solaris uses and honours the 'Content-Length' header, and thus it does not translate any lines beginning with "From " into the conventional ">From ". Some mail clients translate "(>*)From " into ">\1From " (using sed syntax) and vice versa; others do not. There is no standard, just some conventions, some of which aren't easy to determine either.
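The reversible quoting convention mentioned above can be sketched in a few lines of Python. The function names `quote_from` and `unquote_from` are made up for illustration, and this covers only the `(>*)From ` variant, not the Content-Length approach.

```python
import re

# A body line that starts with zero or more '>' followed by "From ".
_FROM_LINE = re.compile(r"^(>*)From ")
# The same line after quoting: one extra leading '>'.
_QUOTED_FROM_LINE = re.compile(r"^>(>*From )")


def quote_from(line: str) -> str:
    """Add one '>' so the line cannot be mistaken for an mbox separator."""
    return _FROM_LINE.sub(r">\1From ", line)


def unquote_from(line: str) -> str:
    """Strip exactly one '>' from a quoted From line; leave other lines alone."""
    return _QUOTED_FROM_LINE.sub(r"\1", line)
```

Because already-quoted lines gain another `>` on the way in and lose exactly one on the way out, a round trip returns the original body. The older convention that only rewrites a bare `From ` cannot tell a quoted line from one that was `>From ` to begin with.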

    Because you need to keep the whole index in memory at once, this can become expensive, but you could use a secondary database as an index into the real file. But eventually you hit a point where the cost of expunging does get too expensive. You could just archive the mail regularly, or use a format like maildir instead.
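A minimal sketch of such a secondary index, assuming the mbox uses the `>From ` quoting convention so that every line starting with `From ` is a message separator. The function names are hypothetical, and a real client would persist the offsets rather than rescan on every start.

```python
def build_offset_index(path):
    """Scan an mbox file once and return the byte offset of each message."""
    offsets = []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            # Assumes body lines starting with "From " were quoted on delivery.
            if line.startswith(b"From "):
                offsets.append(pos)
            pos += len(line)
    return offsets


def read_message(path, offsets, n):
    """Return message n (separator line included) by seeking, without a rescan."""
    start = offsets[n]
    with open(path, "rb") as f:
        f.seek(start)
        if n + 1 < len(offsets):
            return f.read(offsets[n + 1] - start)
        return f.read()
```

The offsets only stay valid while the file is appended to; an expunge rewrites the file and invalidates them, which is the cost the comment describes.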

    gdbm/db/etc

    db files wrap the single file in a common api that handles all of the locking issues and access issues for you. Some have different features, e.g. querying capability, logging and transactions, etc.

    We've never tried to use db for this purpose, more just because we didn't think it was worth it. All you really get with a minimal implementation is the ability to store and retrieve a blob of data using a single key. Writing is fairly slow because the database has to manage more details for you (locking, allocating blocks, unlocking, etc). You could use multiple db files as indices to perform multiple-key searches, but they are quite slow at creating them (we tried using db for the content indices and it was way too slow).

    i.e. even if you store the data in a db file, which gives you a slight benefit of inbuilt referential integrity, you still need to provide additional indices to actually be able to use it in any useful way. Evolution suffers this problem with the addressbook which stores vCards in db records.

    Most db libraries (all?) also don't provide any mechanism to stream data. You either get the whole lot into memory, or you get none of it. So for large messages you're limited by memory (well, Evolution is anyway, but it doesn't have to be). Yes, memory is cheap, but it is still a consideration, and it would certainly rule out a simple database in a multi-user environment.

    db files are also slower than native files, especially for large objects. You're mapping an arbitrarily sized chunk of data to some 'database blocks', which are then stored in an arbitrarily sized 'database file' which the operating system is then mapping to its 'filesystem blocks'.

    multifile solutions

    Well I guess this comes down to mh and maildir. mh isn't really suitable for anything, because of its just plain bad design and lack of defined semantics. There's no way to guarantee anything about its operation.

    maildir - I like. It moves the scourge of trying to implement a reliable, scalable, multiple-access database almost entirely into the operating system layer. Operating systems already do this very well - they manage hundreds of thousands of files randomly written across your disks, without skipping a beat.

    No operation requires more than a single message size of data, and the operating system already indexes the message, via its filename. Sure, ext2 doesn't do such a swell job with long directories, but that can be addressed (and the same problem can be addressed on just about any platform). For 'free' you get concurrent multiple-reader, multiple-writer database access, without any of the considerable problems you have to solve to implement it otherwise.

    The maildir 'protocol' is simple, reliable, and it works.
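To show how little a delivery agent has to do, here is a hedged Python sketch of a maildir-style delivery: write the message under `tmp/`, then move it into `new/` once it is complete. The function name `deliver` and the exact unique-name recipe are assumptions for illustration, and it uses `rename` where the original maildir description uses link-then-unlink.

```python
import itertools
import os
import socket
import time

# Per-process counter so two deliveries in the same instant get distinct names.
_counter = itertools.count()


def deliver(maildir: str, message: bytes) -> str:
    """Write message into maildir/new via maildir/tmp and return the final path."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Name built from time, pid, a counter and the host name (illustrative recipe).
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = f"{time.time_ns()}.{os.getpid()}_{next(_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL makes creation fail rather than overwrite an existing file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    # Readers only look in new/ and cur/, so they never see a half-written file.
    os.rename(tmp_path, new_path)
    return new_path
```

No lock is taken anywhere: the rename is what publishes the message.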

    Again, it can easily be augmented by a client with additional indices, but delivery agents, which don't care about existing email, don't need to suffer that overhead at all.

    Some other comments specific to the question:

    Compression. Personally I don't see the point. But a maildir-like structure would fit well with compression. Flat files would be the worst (e.g. mbox), and block-file formats (like db files) would also work well with compression. The good thing about email is it is 'write once': you don't edit or change the messages in the mailbox.

    External attachments. I guess it's possible, but again, it isn't really worth it in most cases. Parsing MIME is *fast*. It is much faster than parsing xml, and besides, people rarely look at an email more than once or twice. There isn't much use going off and storing the attachment in a high-performance reading format if it isn't going to be accessed often, and it just places a greater burden on your server.

    base64, etc. Well, it's entirely possible simply to store the messages in 'binary' format. Assuming the boundary markers are checked properly, Camel can work with binary encoded mail messages, and probably at least some other mail clients can too. There are some problems with some of the extremely broken openpgp/pgp/mime specs which suddenly say that mail transports aren't allowed to alter the *transport* encodings of some parts, but well, these specs are just braindead, and can be worked around.

    Security model. Well, talking about Unix mail, not server mail, the filesystem is adequate.

    Shared folders - is not an issue for unix mail.

    Unicode. Well, you can write unicode filenames to most unix filesystems, even if 'ls' doesn't show it right.

    MTA. Nothing could be simpler or safer than maildir as a delivery format. The mta doesn't have to care about any client-side indices, the mua will simply update them when it incorporates the new messages, etc.

    Writing libmailstore? Mate, it's called Camel, and it's already written. Camel already does mbox, maildir, mh, it can read spool files directly (it doesn't create a summary file or build any indexes), it can talk imap, pop, and has partial support for nntp. If someone gave me a decent RDBMS table schema and a carton of pale, I could probably write a MySQL backend in a couple of days, well, assuming the MySQL api is mt-safe.

    Finally, some comments on evolution.

    Evolution isn't reinventing any wheel. We use standard mbox format (if such a thing really exists anyway). We use standard maildir format, etc. Yes we may optionally create body indices, and we do usually create on-disk binary/compressed 'summaries' of the data, but these are really just on-disk caches of in-memory data structures, rather than anything to do with the mail storage format.

    We put mail in another location, but everyone else has done that too, elm:Mail, pine:mail (or is it the other way around?), netscape:ns_mail, etc. At least we now offer the option to read most of this 'in place'.

    The main problems evolution has with scalability is:

    indexing.

    Indexing is quite costly. The original index code was written somewhat like a database: it handled all internal data structures, used blocks of data, etc. It was slow, and it scaled poorly. Definitely some of the algorithm choices and the implementation weren't that hot, but it shows that such a solution isn't as simple as at first thought. Using libdb was impossibly slow (like several orders of magnitude slower).

    The new stuff is a lot better, but can still use a lot of resources while indexing, and copies the whole file (well 2 files) across when performing expunges, but they are only performed occasionally, and the indices are smaller than the original indices, so in practice it scales much much better.

    the summaries

    The summaries are indices of a sort anyway. They are an in-memory tree of a subset of the information on each message. Enough information to display a list of messages, and perform vfoldering operations. Even though we do some tricks, like sharing common strings, the summary can get very large.

    But it's a tradeoff I thought was worth it, rather than using on-disk summaries. The api's are much easier to use, and the problem gets pushed to the user - if they want to have folders with 100K messages, they should expect it to use a bit of memory. The on-disk size of the summaries is very small too, although I guess it could be made even smaller if we consolidated common strings.

    per-message memory use

    Currently, a lot of data gets copied around in memory. Every time you read a message, at least 1 whole copy of the (decoded) message is in memory at a given time (yes, including attachments). For IMAP this can get even worse (2-3 copies of a given attachment at a given time), because it doesn't stream enough. Most of this could use a disk-backing without changing any api's though, and well, I'm rewriting IMAP.

    Wrapping up ...

    And yeah, we're talking 100K messages here, not 1400. My 500MHz Celeron laptop has about 35K messages stored over about 10 mbox files, and it starts up in under 10 seconds, and that includes all of the bonobo/activation overhead (which is very significant). Yeah, it uses a bit of memory, but memory is cheap on a personal workstation.

    In short. The current mailbox formats we have suffice for "Unix mail". Add some archiving abilities to your mail client (even RDBMS backed mail clients need archiving), and you'll never have to delete a message again, and still get work done and still use mbox.

    If you want to talk about writing a server - well who cares, you can do whatever you want, because everyone has to go through your interface anyway (you DO NOT want clients accessing data under you, that's what DBMS's are all about in the first place ... and you don't want 1-tier applications), so it doesn't matter what format you use under the belt - you can choose the format which best suits what you're trying to do.

    It seems some people think 1-tier applications (client code talking directly to a database) are the way to go for multi-user environments. They're not; they don't scale and are impossible to maintain. Nobody writes any real software like that anymore, unless you're writing dodgy VB toy apps.

    --
    _ // `Thinking is an exercise to which all too few brains
    \\/ are accustomed' - First Lensman
  11. Use standards by dybdahl · · Score: 2, Interesting

    There is absolutely no reason to abandon the standard e-mail file format, including uuencode for file formats. Doing that, you would end up with a file format that depends on certain versions of the e-mail file format to work optimally. If you want to reduce harddisk space, zip it like OpenOffice.org does.

    E-mails are documents. Documents belong into the home directory, and so do e-mails. If you want to do something new, you should use the harddisk folders as e-mail storage, so that e-mails, spreadsheets and documents mix. This probably requires inventing a new ".e-mail" file format so that e-mails can be properly recognized and indexed.

    Storing one e-mail in one file is not a problem as long as you index the filenames properly, for which you can use gdbm.

    Dybdahl.

  12. mbox.funkified by yem · · Score: 2, Interesting

    This is all very interesting because I'm slowly writing an IMAP server at the moment..

    But here's the setup I'm currently using:

    Inbox:
    /var/mail/$USER
    Subfolders:
    /var/mail/$USER-folders/$FOLDER/.messages

    Eg:

    /var/mail/
    |-- root
    |-- fred
    `-- fred-folders
        |-- 1ZB
        |   `-- .messages
        |-- Friends
        |   `-- .messages
        |-- Games
        |   |-- .messages
        |   |-- Rune-Beta
        |   |   `-- .messages
        |   `-- Tribes
        |       `-- .messages
        `-- Mailing Lists
            |-- .messages
            |-- EFNZ chat
            |   `-- .messages
            `-- Hard News
                `-- .messages

    I started with uw-imap but I want to store messages and subfolders together. Plain uw-imap doesn't do this and last time I checked, neither does Maildir. So I did a [kludgy, incomplete] mod and produced the above. Works for me :)

    Get the patch: http://home.y3m.net/uw-imap-2001a-nested-folders.patch
    (diff against imap-2001a)

    In the server I'm working on you will be able to implement a relatively simple C++ API to do your own storage. So you can use Maildir, mbox, PostgreSQL, whatever. We'll see.

    flame away :P

    --
    No, I did not read the f***ing article!
  13. Some thoughts by ChrisJones · · Score: 3, Interesting

    There seem to be two discussions going on in the comments today, one about mail storage for an MUA and one for storing mail on servers.
    As far as the client end is concerned, from the point of view of writing an MUA, having an SQL backend is a complete godsend because you have to write virtually no IO code; you can put all the logic in the queries. However, there are some tricks you need to use to keep up the speed, most importantly to use two tables, one for metadata and one for the mails themselves. This keeps the speed up by keeping the metadata table small (maybe on a better RDBMS than MySQL this wouldn't make a difference, but I found that >10,000 mails all in a single table in MySQL got quite slow until I moved the metadata into a separate table).
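The two-table trick might look roughly like the sketch below. It uses Python's built-in `sqlite3` rather than MySQL so it runs anywhere, and the table names (`mail_meta`, `mail_body`) and helper functions are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create a small metadata table and a separate table for the bulky bodies."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS mail_meta (
            id       INTEGER PRIMARY KEY,
            folder   TEXT NOT NULL,
            sender   TEXT NOT NULL,
            subject  TEXT NOT NULL,
            received TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS mail_body (
            id  INTEGER PRIMARY KEY REFERENCES mail_meta(id),
            raw BLOB NOT NULL
        );
        CREATE INDEX IF NOT EXISTS meta_by_folder ON mail_meta(folder, received);
    """)
    return db


def add_mail(db, folder, sender, subject, received, raw):
    """Insert the metadata row, then the body row under the same id."""
    cur = db.execute(
        "INSERT INTO mail_meta (folder, sender, subject, received) VALUES (?, ?, ?, ?)",
        (folder, sender, subject, received),
    )
    db.execute("INSERT INTO mail_body (id, raw) VALUES (?, ?)", (cur.lastrowid, raw))
    db.commit()
    return cur.lastrowid


def list_folder(db, folder):
    """Folder listing: touches only the small metadata table."""
    rows = db.execute(
        "SELECT id, sender, subject FROM mail_meta WHERE folder = ? ORDER BY received",
        (folder,),
    )
    return rows.fetchall()


def fetch_body(db, mail_id):
    """Load one raw message, only when the user actually opens it."""
    row = db.execute("SELECT raw FROM mail_body WHERE id = ?", (mail_id,)).fetchone()
    return row[0] if row else None
```

Listing a folder never reads a message body, which is the point of the split.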
    The obvious downside of using a DB for client end storage is that you have to have a central DB server, or one on each client, and you need to admin one more set of authentication/permission details, plus you can't move the mail very easily to other MUAs. IMO a much better solution would be to keep the use of SQL/RDBMS, but move the DB into the filesystem so you can just have a bunch of files with metadata stored in the fs. Need to make an mbox? "cat ~/mail/* >>/tmp/my_new_mbox".
    From the server point of view, many people have been mentioning Exchange/Domino etc. Personally I can't stand Exchange; I've had to admin it on several occasions and it's generally done everything it can to stop me from having an easy life (just thought I'd air my prejudice against Exchange in the spirit of fairness and honesty ;) I've never used OpenMail/Domino/Notes/whatever, but I guess they do roughly the same thing, which is a pretty good idea. However, these things all have the distinct disadvantage that they use proprietary protocols and aren't particularly cheap. There's always IMAP, which many people really like, but which I feel is too complex a protocol (compare with the infant levels of complexity in POP3).

    With a colleague of mine, I'm working on a set of POP3 extensions that give some IMAP-like features, but are really designed to keep multiple mail clients in sync with each other by way of a transaction log. There are still some limitations, but I think I know what they are and how to fix them (e.g. not enough metadata can be associated with each mail yet). It adds about 6 or 7 commands to POP3 and currently lacks any decent client support, but I have written a fairly usable library and patch to gnu-pop3d for it. I've just submitted it as my University final year project, so I'll try and get the protocol description documentation online soon. In the meantime, if you're interested, it's on SourceForge.

    --
    Chris "Ng" Jones
    cmsj@tenshu.net
    www.tenshu.net
  14. An alternate proposal by Anonymous Coward · · Score: 2, Interesting

    What bugs me most about current mail technology is the problem of distributed mail handling.

    I access my mail on all kinds of devices, sometimes online sometimes not.

    My main problem is not so much which mail server / retrieval / presentation to use, since they all share the same inability to give me a working distributed solution.

    For online usage IMAP is sufficient, but if I go offline with my laptop or iPAQ, I'm lost.

    POP isn't very efficient either: since only one of my clients can be the deleter, I must make sure that I have synced all my other devices before the deleter removes the message.

    Since I use tons of folders for my mail (some of my stored mails date back to the late '80s), I'm basically forced to use IMAP so my folders are in sync on all the devices, but again that only works online.

    Further, it only works if my IMAP server is online. That can be trouble if I'm in some far-off part of the world and for some reason have no contact with my mail server.

    What I would like is a concept I call SyncMail

    A distributed DB system. First I set up some 3-4 primaries, spread out on the net with completely different access routes. Each of them gets an MX record.

    The sending MTA is happy to deliver to a secondary mail server if the primary is offline.

    But here comes the magic!

    The system regarded as a secondary MX by the rest of the world is in fact a primary!

    Instead of queueing the message, it sucks it into its DB, tags it with its own internal server ID, and tries to sync it to all other SyncMail primaries.

    Sooner or later the new mail is propagated to all the primaries.

    On the client side, the SyncMail app contacts all the primaries, checks against a private index, and syncs all new mails, trying the closest server first.

    Since each mail is tagged with the primaries it has been delivered to, no mail is retrieved to the client more than once.

    Now I have a complete local mail tree in my client, regardless of which primary I was able to contact. Sure, if a mail was delivered to a primary that goes offline before the client syncs, and that primary hasn't been able to sync it to the others, I won't get it until that primary comes back online. But what the heck: with POP/IMAP, if my mail server is offline I'm completely out of business, so the loss is definitely smaller in this case.
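
    A toy in-memory model of the sync rules described so far, just to make the bookkeeping concrete. Nothing here is the actual SyncMail code: the class names, the "server:sequence" message IDs and the use of plain dicts instead of a real DB or network are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """One SyncMail primary; store maps message ID -> raw message."""
    name: str
    online: bool = True
    store: dict = field(default_factory=dict)

    def deliver(self, msg_id, raw):
        """Accept a mail from a sending MTA (ID assumed to embed our name)."""
        self.store[msg_id] = raw

    def sync_from(self, other):
        """Copy anything we lack from a peer, if both ends are reachable."""
        if self.online and other.online:
            for msg_id, raw in other.store.items():
                self.store.setdefault(msg_id, raw)


@dataclass
class Client:
    """A mail client with a private index of message IDs it already has."""
    index: set = field(default_factory=set)
    mailbox: dict = field(default_factory=dict)

    def sync(self, primaries):
        """Fetch unseen mail; primaries should be ordered closest first."""
        fetched = []
        for primary in primaries:
            if not primary.online:
                continue  # skip unreachable servers, try the next one
            for msg_id, raw in primary.store.items():
                if msg_id not in self.index:  # never fetch a mail twice
                    self.index.add(msg_id)
                    self.mailbox[msg_id] = raw
                    fetched.append(msg_id)
        return fetched
```

    If p1 receives a mail, p2 pulls it with p2.sync_from(p1), and p1 then drops off the net, a client calling sync([p1, p2]) still gets the mail from p2. A mail that only ever reached an offline primary shows up on a later sync, once that primary is back.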

    And for my iPAQ I just configure the client to work with a few important folders and to skip attachments, to save storage.

    And for sending, all clients store outgoing mail in an outbox, which is then synced to the primaries; once it gets to a primary it is sent by normal SMTP. This way I solve the problem of being able to send mail with proper originating SMTP headers. Of course the outbox is synced as well, so I get a reference copy of my mail on all systems.

    I have started on a SyncMail application and someday I might be able to complete it, but there is so much work all the time :(

    Would anybody else be interested in this concept? Maybe we could complete it together.

    Or if this is a really stupid idea, I'd be glad if someone would point it out, so that I can focus on finding a better solution.

  15. Re:Don't speculate. Profile. by mgedmin · · Score: 3, Interesting
    An interesting comparison, but it's a comparison of Courier-IMAP vs UW IMAP, and not just Maildir vs mbox.

    I once tried benchmarking Maildir vs mbox for my mail archives (mailboxes with ~3000 messages). On ext2 Maildir was a loss:

    • Mutt took twice as long to open a Maildir as an mbox from cold cache.
    • Mutt still took a bit longer to open Maildir than mbox from hot cache.
    • On ext2 with 4K blocks mbox ate 13 MB of space, Maildir ate 21 MB.
    • Small UI degradation: Mutt wouldn't show the number of lines in a message from a Maildir, and it wouldn't show percent progress indicator while reading the Maildir.
    Basically for my situation (read-only mail archives with large numbers of messages, which are rarely in filesystem cache, ext2 and constant disk space shortage) mbox was better. But my situation (personal, mostly static mail archives) is remarkably different from running an IMAP server.
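
    The space difference mostly comes from block rounding: every Maildir message occupies a whole number of filesystem blocks, while an mbox only wastes the tail of one block. A back-of-the-envelope sketch follows. It ignores inodes, directory entries, indirect blocks and the mbox "From " lines, so treat it as a rough lower bound on the gap, not a model of ext2.

```python
def blocks_used(size, block_size=4096):
    """Number of whole blocks needed to hold `size` bytes (ceiling division)."""
    return -(-size // block_size)


def maildir_bytes(sizes, block_size=4096):
    """Allocated space with one file per message: each file rounds up alone."""
    return sum(blocks_used(s, block_size) for s in sizes) * block_size


def mbox_bytes(sizes, block_size=4096):
    """Allocated space with all messages in one file: one rounding at the end."""
    return blocks_used(sum(sizes), block_size) * block_size


if __name__ == "__main__":
    # Invented example: 3000 messages of 4500 bytes each.
    sizes = [4500] * 3000
    print("mbox:   ", mbox_bytes(sizes))
    print("Maildir:", maildir_bytes(sizes))
```

    Feeding it the real sizes of your own messages (e.g. from os.stat) gives a quick estimate before you convert anything.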

    I did this test in 2000. I should probably try again some day with Reiserfs, but I heard various people telling me it doesn't improve Maildir performance. Can't say anything until I try myself.

    I therefore recommend that you try it yourself and see if Maildirs really help in your situation.
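
    For anyone who wants to repeat the timing part without Mutt, here is a rough sketch using Python's standard mailbox module. It measures Python's parser, not your MUA, and it does nothing about the hot/cold cache distinction (you would have to drop the filesystem cache yourself between runs); the function names are made up.

```python
import mailbox
import time


def list_subjects(box):
    """Read every message in the mailbox and collect its Subject header."""
    return [str(msg.get("Subject", "")) for msg in box]


def time_open(factory, path):
    """Time opening `path` and listing all subjects.

    `factory` is mailbox.mbox or mailbox.Maildir.
    Returns (elapsed_seconds, subjects).
    """
    start = time.perf_counter()
    box = factory(path)
    try:
        subjects = list_subjects(box)
    finally:
        box.close()
    return time.perf_counter() - start, subjects
```

    Call it as time_open(mailbox.Maildir, "/path/to/Maildir") and time_open(mailbox.mbox, "/path/to/mbox") on copies of the same archive and compare.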

  16. the plan 9 approach by rpeppe · · Score: 5, Interesting
    as a basis for an approach i like what plan 9 does. the mail is made available to clients as a filesystem (provided by a user level program). each mail message gets its own directory; each mime attachment gets its own subdirectory within that message (and recursively, as MIME is recursive).

    here's a little transcript:

    % cd /mail/fs/mbox
    % lc
    Directories:
    1 113 128 142 157 171 186 20 214 229 243 258 272 287 300 315 33 344 359 373 388 401 416 430 445 46 474 56 70 85
    [...]
    % cd 318
    % lc
    Files:
    bcc date filename info messageid rawbody sender type
    body digest from inreplyto mimeheader rawheader subject unixheader
    cc disposition header lines raw replyto to

    Directories:
    1 2 3
    % head raw
    Return-Path:
    Received: from punt-1.mail.demon.net by mailstore for rog@vitanuova.com
    id 1021665470:10:17045:138; Fri, 17 May 2002 19:57:50 GMT
    Received: from psuvax1.cse.psu.edu ([130.203.4.6]) by punt-1.mail.demon.net
    id aa1016828; 17 May 2002 19:57 GMT
    Received: from psuvax1.cse.psu.edu (psuvax1.cse.psu.edu [130.203.6.6])
    by mail.cse.psu.edu (CSE Mail Server) with ESMTP
    id 27DA4199BE; Fri, 17 May 2002 15:57:13 -0400 (EDT)
    Delivered-To: 9fans@cse.psu.edu
    Received: from acl.lanl.gov (plan9.acl.lanl.gov [128.165.147.177])
    % head body
    This is a multi-part message in MIME format.
    --upas-mbyuptynpdsmbjuyeermihdgur
    Content-Disposition: inline
    Content-Type: text/plain; charset="US-ASCII"
    Content-Transfer-Encoding: 7bit

    Hi,

    If you seek excitement and thrills you need to look no further than
    Plan9 -- it gives you everything and then some, but in a good way (or
    % cd 2
    % lc
    Files:
    bcc date filename info messageid rawbody sender type
    body digest from inreplyto mimeheader rawheader subject unixheader
    cc disposition header lines raw replyto to
    % cat mimeheader
    Content-Type: image/jpeg
    Content-Disposition: attachment; filename=iostats.jpg
    Content-Transfer-Encoding: base64
    % page body
    reading through graphics...
    %
    "raw" contains the raw data that makes up the message. "body" contains the data after the encoding formats have been applied (hence in that case /mail/fs/mbox/318/2/body is a jpeg file, viewable directly by any usual jpeg viewer).

    the beauty of this scheme is that it hides the underlying storage scheme from the mail clients. if i wish to change things so that the underlying storage format is many files [currently it uses a traditional mbox format], none of the mail client programs have to change.
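
    for those without a plan 9 box, a rough imitation of the layout in python. this is not how upas/fs works (that is a file server synthesising the tree on demand); the sketch just writes a static copy to disk, handles only a handful of the files from the transcript, and skips "body" for multipart containers. the function names are made up.

```python
import email
from email import policy
from pathlib import Path


def explode(part, directory):
    """Write one MIME part into `directory`, recursing into sub-parts."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "type").write_text(part.get_content_type() + "\n")
    # A few header files, written only when the header is present.
    for name in ("from", "to", "subject", "date"):
        value = part[name]
        if value is not None:
            (directory / name).write_text(str(value) + "\n")
    filename = part.get_filename()
    if filename:
        (directory / "filename").write_text(filename + "\n")
    if part.is_multipart():
        # Sub-parts become numbered subdirectories: 1, 2, 3, ...
        for n, child in enumerate(part.get_payload(), start=1):
            explode(child, directory / str(n))
    else:
        # decode=True undoes base64 / quoted-printable transfer encoding,
        # so an attached jpeg lands on disk as a real jpeg.
        payload = part.get_payload(decode=True) or b""
        (directory / "body").write_bytes(payload)


def explode_message(raw_bytes, directory):
    """Parse a raw RFC 822 message and lay it out as a directory tree."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    explode(msg, directory)
```

    after explode_message(raw, "mbox/318"), a jpeg attached as the second part would sit in mbox/318/2/body, ready for any image viewer or for grep over the text parts.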

    plus i can use grep, diff, shell scripts, etc directly on the messages in my mailbox. procmail eat your heart out.

  17. Re:hmmm by AdTropis · · Score: 2, Interesting

    when i read your post, i immediately thought of a Jamie Zawinski article that i read a few weeks ago:

    http://www.jwz.org/doc/mailsum.html

    he talks about this very thing. quite interesting if you ask me.

  18. Maildir and 1000+ Mails by PCGod · · Score: 2, Interesting
    and just try to open a Maildir with 1000+ mails and see how long it takes your favorite Mailprogram to only display the subjects.

    Until about 3 days ago, I had 1700+ messages in my Maildir, and pine (patched to support Maildir) opened my inbox in about two seconds. Compare this with my sent-mail folder, which had about the same number of messages in it. That folder is stored in mbox format and it took 5+ seconds to open AND CLOSE. I believe that Maildir is the fastest option, short of keeping a separate database.