Slashdot Mirror


What Mailbox Format Do You Use And Why?

RossyB asks: "What format for my mailbox is best? The University of Washington IMAP server only supports mbox, and claims that maildir is slow and dangerous. Qmail only supports maildir, and claims that mbox is slow and dangerous! Who is right? Why?" I think one of the large problems with the adoption of maildir is the lack of MUAs that support it.

"I currently store all of my e-mail in a local mbox-style IMAP store in ~/mail/, so that I am not tied to any particular mail client. However, I am planning on syncing my mail across multiple machines (home, work, and soon a laptop) so I need to have mail in a form which can be synced easily. mbox is bad for this because if I grab mail on one machine, and later delete some mails from the same folder on another machine, then sync, the new mails will be lost. This is where maildir is good - each message is a separate file. But why do so many people hate it? If I do change over to maildir, what IMAP/SMTP servers should I use? A hacked sendmail/UoW IMAP? Courier-IMAP + QMail? Something else? How do other people keep their mailstores synced across many machines, and what software do they use?"

20 of 364 comments

  1. Re:Need better filesystem for maildir by Christopher Cashell · · Score: 3

    Actually, I believe this is one of the things that ReiserFS excels at.

    I have very limited experience with Reiser myself, so perhaps someone else can provide more details, but as I understand it ReiserFS is capable of dealing with thousands of small files extremely efficiently (through the use of tree structures to hold the filesystem). From what I've read, it would be a fairly ideal file system for things like maildir storage.

    In fact, now that the 2.4.1 kernel is out, with included stable ReiserFS support, I might just give this a shot. ;-)

    -- Toph

    --
    Topher
  2. Take a look at Cyrus by thule · · Score: 3

    Cyrus http://asg.web.cmu.edu/cyrus/ seems to use a hybrid approach. Messages are stored in individual files, but the envelope information is stored in dbm format. So opening up a mailbox and listing messages is very fast. So is searching unless you want to do a full body search on all emails. Give it a try. It supports IMAP, POP, and LMTP.

  3. UW Imap mailbox formats by Outland Traveller · · Score: 3

    If you look under the hood of the UW Imap server, you will see that it supports many more formats than straight mbox. I don't think that maildir is one of them, unfortunately, but there are a few (mbx comes to mind) that overcome some of the more blatant shortcomings of mbox.

    Is UW Imap free software? If so, someone should feel free to give it maildir, db, sql, or other mailbox support. For some reason I seem to remember that UW Imap was not free software, even though the source is available (some weird academic license hostile to commercial use?). The author is a good programmer and active in the standards process, but can be abrasive to work with.

  4. Re:JWZ and me by bkeeler · · Score: 3
    Your commandline does not solve the problem that the original invocation of xargs was intended to solve - passing a *huge* number of files to grep on the commandline (grep * in a directory with a ton of files) causes it to break.
    Yes, it does solve that problem. xargs knows the system-specific limit on how long a command line can be, and will invoke the given command multiple times if necessary.

    Thus

    $ /bin/ls | xargs grep "foo"
    might end up invoking, if you have thousands of files, something like
    $ grep "foo" 1 2 3 ... 467 468
    $ grep "foo" 469 470 ... 876 877
    and so on. Using the -i flag to xargs just means it has to create a separate process for each grep, taking a lot of extra time.
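
    The batching described above can be sketched in Python. This is only a rough model of the idea, not how xargs is actually implemented: the batch_args helper and the max_bytes budget are names made up for this example, and the real limit is system-specific and counted differently.

```python
from typing import Iterable, Iterator, List


def batch_args(command: List[str], args: Iterable[str], max_bytes: int) -> Iterator[List[str]]:
    """Yield command lines, each kept within max_bytes, xargs-style.

    Hypothetical helper: length is counted as the bytes of each word plus
    one separator, which only approximates what a real system counts.
    """
    base = sum(len(word.encode()) + 1 for word in command)
    batch: List[str] = []
    used = base
    for arg in args:
        cost = len(arg.encode()) + 1
        # Start a new invocation when this argument would overflow the budget.
        if batch and used + cost > max_bytes:
            yield command + batch
            batch, used = [], base
        batch.append(arg)
        used += cost
    if batch:
        yield command + batch


if __name__ == "__main__":
    # Pretend the directory holds files named 1..1000 and the limit is tiny.
    files = [str(n) for n in range(1, 1001)]
    for line in batch_args(["grep", "foo"], files, max_bytes=200):
        print(" ".join(line)[:60], "...")
```

    Every file name ends up in exactly one invocation, so grep runs a handful of times rather than once per file.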

    --

  5. Re:Outlook corporate mailbox by iceT · · Score: 3

    Ok. First off, Outlook is the client. Not the mail server. The mail server is called Exchange. Try not to mix the two. I can use Outlook against MANY back ends, including HP's Openmail, (almost) any IMAP/POP3 server, or no backend at all.

    Second, you cite three 'benefits' to Exchange:

    Fast: Define fast. The Exchange/Outlook RPC is great over a 100Mbit network, but try it over a dial-up line, or some line with a high latency. The performance goes right down the crapper, because the protocol is very 'chatty'. The client and server communicate back and forth repeatedly to get a task done. IMAP/POP3 are infinitely better in adverse environments, because their protocols are 'batch' oriented. A couple of commands, and you have data streaming to the client. Another example: over that same high-latency connection, try forwarding a message with an attachment. The attachment has to be uploaded to the server before you can COMPOSE YOUR MESSAGE. On the server side alone, every internet message has to be 'decoded' into MAPI body parts for storage in the database. If it pukes on a body-part, it'll crash your information store. The IMAP servers do/can parse the messages based on MIME body parts, but that is only when necessary. Exchange parses EVERY internet message, and at a lower level than the MIME body parts.

    Second, you cite 'scalability'. I ran a 7000 mailbox UofW POP3 server on a dual 166MHz Solaris box with 256MB of RAM. The concurrency was about 25%, and the server ran with a load-average of about 1.2. My previous employer is having trouble running 2500 users on a quad PII-450 with 1GB of RAM at a 50% concurrency. How is that scalability?

    Third, you mention 'workgroup features'. True, Exchange includes a fairly decent calendar service, but this discussion is about e-mail. If you want to talk about workgroup functions, we can do that... (btw, voting is a client function, as is the task management. There is no true 'workflow' in that because there is no central process tracking the work. It's all source-routing/message updates.)

    You also said that Qmail is technically correct, but it's not going to do my company's productivity any good. This may be true. But talk to me when your company starts to interact with OTHER companies, and tell me how well Exchange does. Internet software is designed for interoperability, and when you're dealing with other companies, THAT'S what will make your company productive.

    As for security, I'll leave that to the rest of these guys. I already like the comment about the 5 days w/out mail due to the I Love You virus.

    --
    -- You can't idiot-proof anything, because they're always coming out with better idiots.
  6. Try MH+(S)IMAP by KMitchell · · Score: 3

    I ran into the "sync" mail issue a while back and came up with the following criteria:

    1) I want to be able to read mail both from a GUI-based mail prog (Outlook, Eudora, Netscape, whatever) **AND** from a shell

    2) I want to be able to access live and "older" mail anytime from (at least) home and work, preferably both my home and work email accounts.

    3) I do not want to send any cleartext passwords

    What I came up with is the following:

    At home I run the UW-IMAP server, and store my incoming mail in MH folders. Stunnel does a fine job of adding SSL support to IMAP.

    At work we run Netscape's Mail server which actively supports SIMAP.

    Either at home or at work, both servers (and all the mail in all the folders) are available.

    Just about the only thing missing is the ability to read my work mail from a shell, but that's where most of the big ugly attachments are, anyway...

  7. Re:it is nice by OlympicSponsor · · Score: 3

    "...if I had to resubscribe every time I use a new client."

    You have to type your password into the new client--maybe we should store that on the server too?

    "What if there was no last session for the client?"

    Then everything is RECENT. I realize this loses you a feature, namely that you can't see only those messages in client B that you didn't see in client A. But you don't have that feature now. Why not? Because there is a race condition in the spec: if a message comes in AFTER the last time you check your mail (in client A) but BEFORE you logout (with client A) that message won't be RECENT in client B.
    --
    MailOne

    --
    Non-meta-modded "Overrated" mods are killing Slashdot
    (Hey Ryan! Here's your proof!)
  8. I'm an 'mbox' user... by ewhac · · Score: 4

    I've been using 'mbox' for -- gawd, can I say this? -- fifteen years, and it's served me well. 'mbox's advantages for me are that it is efficient with disk space (you don't eat an inode per message), and that it is quick to search.

    9 times out of 10, when I'm searching my mail, typically with 'grep', I'm looking for something in the body, not the headers. With 'maildir', you have to open each message and search it. This is preposterously slow. There is also the danger that the shell's wildcard expansion limits may be exceeded if you have a lot of messages. With 'mbox', 'grep' opens the one file and slurps through it quickly.
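
    For what it's worth, the wildcard-expansion limit can be sidestepped by walking the directory from a program instead of handing the shell a glob. A minimal Python sketch follows, assuming the conventional maildir layout with "new" and "cur" subdirectories; search_maildir is a made-up name. It still opens every message file, so it avoids the argument-list limit but not the per-file cost, and it matches raw bytes only (headers included, encoded bodies not decoded).

```python
import os
from typing import Iterator


def search_maildir(maildir: str, needle: bytes) -> Iterator[str]:
    """Yield paths of messages in a maildir whose raw bytes contain needle.

    Hypothetical helper for illustration; assumes "new" and "cur"
    subdirectories hold one message per file.
    """
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        # scandir streams directory entries; no wildcard expansion is involved.
        with os.scandir(folder) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                with open(entry.path, "rb") as handle:
                    if needle in handle.read():
                        yield entry.path
```

    Something like list(search_maildir(os.path.expanduser("~/Maildir"), b"foo")) would then return the matching message paths, where ~/Maildir is just an example location.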

    Remote synchronization is not an issue for me. All my email resides on my laptop, which follows me everywhere.

    However, I'm hip to 'maildir's increased reliability. I have over 2000 messages in my outgoing box alone, and I'd hate to have a system hiccup destroy any of it. If I could search the bodies of a 'maildir' spool as quickly as an 'mbox' spool, I could be convinced to switch.

    Schwab

  9. Why I don't use mbox by MSG · · Score: 4

    Originally, the reason we switched to maildir was that even without NFS, mbox was corrupting our filesystems. Not just the files, mind you, but the filesystems themselves. It was a total pain in the ass, and we damn near left Linux for FreeBSD. This was using 2.0.36 and Sendmail. We had to put /var/spool/mail on its own partition so we could unmount and fsck it until we found a solution. Between that and problems with files > 500MB, my opinion of Linux 2.0 is very bad.

    Our solution was moving to qmail and using Maildir mailboxes for our users. We never saw the problem again. :)

    Recently, I've switched to Courier mail server (http://www.courier-mta.org/) on all my non-production machines to evaluate it. I'm really, really happy with it. Courier is a complete mail system, not just an IMAP server, so you might take a look at the whole package. The whole thing is RFC compliant, which causes trouble for software that isn't, but that's a fault in the other software.

    As a final rant against UW-IMAP: I hate it. It loads the whole damn mailbox being checked into memory (regardless of the type), which creates a huge load every time someone with a large mailbox checks their mail. This problem affects the POP3 server as well, since that also uses the c-client code.

  10. Qmail also supports mbox by scm · · Score: 4
    "Qmail only supports maildir..."

    That's just plain wrong. Qmail supports both maildir and mbox. I've been using qmail with only mbox files for years...

  11. Re:Exchange Mailbox format by iceT · · Score: 4

    And at only $87/user client access license (courtesy of Shopper.com), it's a STEAL...

    (oh, plus Win2000)...

    (oh, plus a machine with at LEAST 256-512MB RAM)...

    (oh, plus a backup solution to backup the DB live)...

    (oh, plus some sort of a firewall/gateway... you wouldn't want this DIRECTLY on the 'NET..!)

    --
    -- You can't idiot-proof anything, because they're always coming out with better idiots.
  12. Re:Enterprise-grade messaging for Linux/Unix by FattMattP · · Score: 4

    ArsDigita has a great article on using Oracle as a backend for your mail and ACS as a front end.

    --
    Prevent email address forgery. Publish SPF records for y
  13. Re:My mailbox by Wraithlyn · · Score: 4
    Unfortunately, "analog snailmail boxes" are highly susceptible to quite a few undesirable things:
    • High latency
    • Address spoofing
    • Packet flooding (AOL CDs)
    • Denial of Service attacks (rednecks driving by in pickups with baseball bats)
    --
    "Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
  14. My mailbox by Rudeboy777 · · Score: 4

    My mailbox works just fine, and it hasn't changed in over 20 years! It sits at shoulder height just to the right of my front door. Here are the advantages:

    -No encryption techniques necessary

    -rarely have to waste time with forwarded jokes

    -Best of all, the spam it collects is occasionally useful (I know all the pizza deals available in town).

    --

    From hell's heart I fstab at /dev/hdc

  15. A word of advice by OlympicSponsor · · Score: 4

    As someone who is, as we speak, supposed to be implementing an IMAP server, let me say this: If the person who dreamed up RFC2060 says that X is "slow and dangerous", run, DO NOT WALK, to leap onto the X bandwagon--it'll be the wave of the future.
    --
    MailOne

    --
    Non-meta-modded "Overrated" mods are killing Slashdot
    (Hey Ryan! Here's your proof!)
  16. Cyrus Rocks by anewsome · · Score: 5

    I think the guys who wrote Cyrus IMAP server got it right. I have been using Cyrus for about 4 years now and I rarely delete mail. The server is still responsive and full body text searches are pretty speedy, even on the P133 server that it is running on. I think keeping each mail in a separate file, and making a directory for each folder, is the way to go. It also makes it very simple to restore a lost mail message and to index the whole mailbox. Anyway, that's my two cents.

  17. JWZ and me by Chaostrophy · · Score: 5

    http://www.jwz.org/doc/
    has a number of essays about mail on Unix systems, including problems with mail box formats.

    I use Xemacs/Gnus/nnml so all my mail is stored as individual files, which is handy (as other posters have said) and has its downsides, as they have said too (grep now bitches if passed all files in my main mail box). Still, I like it, best system I've used. Not so great for the multiple hosts thing though.

    Or you could run your mail and xemacs on one machine, and either read your mail in a terminal, or open X windows on your local display. Look up gnuserve to do that, I think.

    --
    Plato seems wrong to me today
  18. Enterprise-grade messaging for Linux/Unix by IGnatius T Foobar · · Score: 5

    Both formats have problems. A true enterprise-grade message store will use an embedded database with transaction support.

    Fortunately, a solution to this problem is being developed right now. The Citadel/UX project is developing a robust communications server that will compete with products like OpenMail, Groupwise, and Exchange. SMTP and POP3 are already in place; IMAP will be available by the end of the year. Web-based access works as well. After that's done we'll be writing plug-ins for both Evolution and Outlook, in order to facilitate all of the 'shiny things' working as well: calendars, address books, etc.

    So, you might ask, what mailbox format does it use? None of the above. Messages are stored in a database, like they should be. The Berkeley DB package from Sleepycat Software (yes, it's open source) is used for robust back-end storage, including transaction and logging support.

    I'd encourage any developers who are looking for the open source world's "Exchange Killer" to get involved in this project.
    --

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
  19. A few thoughts on message storage by ajs · · Score: 5

    Email messages are a specifically interesting topic. They're (for the most part) text, and tend to be larger than database fields want to be (on the order of 1+ kB each ranging all the way up to many megabytes in common practice).

    This makes most mail messages poor choices for database storage (for example, you want to be able to use "grep" on mail or compress in place). Headers, on the other hand, are a major win in a database ("select messageid from headers where user = 'me' and date > yesterday and fromaddr = 'taco@slashdot.org'" should be fast even if I have tens of thousands of messages).

    The easy solution is to keep the headers in the database, and then just keep maildirs with the original messages in the normal filesystem, with the filenames in the database alongside the headers (something like message.headerid => headers.id, where message.text is a path to the maildir entry for this message).

    This combines the best of both worlds. This also means that while it's easy to corrupt your database with a single bug in your code, you can always re-build it from the on-disk messages.
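
    A rough sketch of that split, using Python's sqlite3 and email modules purely as stand-ins. The table layout, column names, and the rebuild_index and paths_from helpers are all assumptions made for this example, not a schema anyone above proposed; only the headers go into the database, and the index can be thrown away and rebuilt from the files at any time.

```python
import os
import sqlite3
from email.parser import BytesHeaderParser
from email.utils import parseaddr
from typing import List


def rebuild_index(db: sqlite3.Connection, maildir: str) -> int:
    """Recreate the header index from the message files; return the row count."""
    db.execute("DROP TABLE IF EXISTS headers")
    db.execute(
        "CREATE TABLE headers ("
        " id INTEGER PRIMARY KEY, fromaddr TEXT, subject TEXT,"
        " date TEXT, path TEXT UNIQUE)"
    )
    parser = BytesHeaderParser()  # reads headers only; bodies stay on disk
    count = 0
    for sub in ("new", "cur"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                msg = parser.parse(handle)
            # Store the bare address so lookups do not depend on display names.
            fromaddr = parseaddr(str(msg.get("From", "")))[1]
            db.execute(
                "INSERT INTO headers (fromaddr, subject, date, path)"
                " VALUES (?, ?, ?, ?)",
                (fromaddr, str(msg.get("Subject", "")), str(msg.get("Date", "")), path),
            )
            count += 1
    db.commit()
    return count


def paths_from(db: sqlite3.Connection, fromaddr: str) -> List[str]:
    """Return on-disk paths of messages sent from the given address."""
    rows = db.execute(
        "SELECT path FROM headers WHERE fromaddr = ? ORDER BY path", (fromaddr,)
    )
    return [row[0] for row in rows]
```

    The query hits only the database, and the returned paths can still be fed to grep or any other file-level tool.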

  20. Maildir is WAY better by benploni · · Score: 5

    Maildir is better because:
    1) it is more reliable over NFS. Maildir is designed to not need file-level locking, which sucks over NFS.

    2) maildir is more resistant to catastrophic corruption since each email is a separate file.

    3) maildir keeps metadata about the email in the email's filename, rather than a separate index file. This helps prevent the metadata, such as "replied-to" and "forwarded this", from getting out of sync.

    4) filesystem-level tools work well with maildir. You don't need special "formail" type tools to work with them; bash scripting is capable of doing it all by itself.

    5) maildir is better positioned to take advantage of advanced new filesystems like ReiserFS. When ReiserFS has a plugin for file-level transparent compression, maildir will be able to selectively and invisibly compress emails to the disk without requiring other programs/scripts to decompress them before use.

    Study maildir, it's just plain better.
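
    To make the no-locking and filename-metadata points concrete, here is a simplified Python sketch. The deliver and flags_of helpers are made up for illustration, the unique-name recipe is a shortcut rather than a full implementation of the maildir naming rules, and a rename stands in for whatever a given delivery agent really does. The flag letters after ":2," follow the usual maildir convention.

```python
import os
import socket
import time
from itertools import count

_counter = count()  # distinguishes deliveries made within the same second

FLAG_NAMES = {
    "D": "draft", "F": "flagged", "P": "passed",
    "R": "replied", "S": "seen", "T": "trashed",
}


def deliver(maildir: str, message: bytes) -> str:
    """Write message into maildir/tmp, then rename it into maildir/new.

    Readers only look in new/ and cur/, so they never see a partial file
    and no lock is taken.
    """
    unique = f"{int(time.time())}.{os.getpid()}_{next(_counter)}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as handle:  # "x" fails if the name already exists
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path


def flags_of(filename: str) -> set:
    """Return the flag names encoded after ':2,' in a maildir filename."""
    _, sep, info = os.path.basename(filename).partition(":2,")
    if not sep:
        return set()
    return {FLAG_NAMES[c] for c in info if c in FLAG_NAMES}
```

    Because the flags live in the name itself, marking a message as replied is just a rename, and there is no index file to drift out of sync.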