What Mailbox Format Do You Use And Why?
"I currently store all of my e-mail in a local mbox-style IMAP store in ~/mail/, so that I am not tied to any particular mail client. However, I am planning on syncing my mail across multiple machines (home, work, and soon a laptop) so I need to have mail in a form which can be synced easily. MBox is bad for this because if I grab mail on one machine, and later delete some mails from the same folder on another machine, then sync, the new mails will be lost. This is where maildir is good - each message is a separate file. But why do so many people hate it? If I do change over to maildir, what IMAP/SMTP servers should I use? A hacked sendmail/UoW IMAP? Courier-IMAP + QMail? Something else? How do other people keep their mailstores synced across many machines, and what software do they use?"
ls | xargs grep "search string"
xargs is cool
If the poster was looking for a nice cross-platform, easily synchronized mailbox format, I don't think Exchange is going to be right for him, especially since he apparently runs (gasp) Linux.
I read the internet for the articles.
I think you forgot to read the message. The poster wants a mailbox format that he can download to a laptop, PDA, or some other device not connected to the net, read and respond to email, and then synchronize the system back up when he gets back online (i.e., copy over the mbox or maildir).
Exchange didn't like that the last time I used it; in fact, it acts more or less like mbox in that respect, although it depends on how your mail is locally stored. Plus, Exchange is pretty expensive for people who don't have large expense accounts and have to support a large base of people.
Finally, how does supporting POP and IMAP make you a "lot more cross platform than UNIX mail"? Especially since UNIX based mail systems can do the same thing (and can share mailboxes between other non-Exchange servers if need be). My biggest beef with Exchange is the binary message format. Just try to resurrect a slightly damaged file, or search/modify something without having to fire up your mail client or web browser.
maildir's speed is far better under reiserfs. I can't say that there's no slowdown, but it's certainly much smaller. You can also do a significant amount of filtering based on filename, rather than mbox (where you've no choice but to grep through every message in the file).
Re the wildcard expansion limit, xargs can handle that.
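The point about filtering on filename can be made concrete. What follows is a minimal sketch, assuming the common Maildir convention that a message's flags are appended to its filename after ":2," (S for seen, R for replied, F for flagged, and so on). The helper names `parse_flags` and `unseen_messages` are made up for illustration.

```python
import os


def parse_flags(filename):
    """Return the set of flag letters that follow ':2,' in a Maildir filename."""
    _, sep, info = filename.partition(":2,")
    return set(info) if sep else set()


def unseen_messages(maildir):
    """List files in cur/ that lack the 'S' (seen) flag, without opening any of them."""
    cur = os.path.join(maildir, "cur")
    return sorted(name for name in os.listdir(cur) if "S" not in parse_flags(name))
```

Only the directory listing is read, so the cost depends on how well the filesystem handles large directories, not on message sizes.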
ReiserFS has been designed to be easily portable. That Linux is its {current,primary} target means little -- throw $20K (or whatever they charge) at Reiser and his team and a port will get done.
Given that some organizations spend that much on a single server, this is pretty damn reasonable.
And he's entirely right -- my experience confirms that using ReiserFS makes maildir handling much faster than under ext2.
...you reinvented a wheel. In particular, the MAPI architecture.
While much M$ software is poorly designed, MAPI is an exception. MAPI is a pretty flexible, intelligent architecture for all things messaging.
MAPI allows you to do things like substitute message stores, address book stores, etc., by treating them as abstract components. Exactly what you're claiming to have done with your "data store API".
I don't want to be too critical, but I hope you folks looked at MAPI before you went out inventing another API...
Let's try not to let fact interfere with our speculation here, OK?
I just moved my qmail/courier-imap mail system from freebsd to linux (ffs to ext2) and performance is not so good when dealing with large directories. I'd definitely recommend using an advanced fs on your mail server like sgi's xfs, ibm's jfs, or reiserfs. xfs is especially cool, because if the file is small enough, the file itself is stored in the inode and no space is allocated. That sounds great for storing messages to me! SGI has also created a redhat7/i386 installer cd, which allows for xfs-only systems (with 2.4) from the get-go. Tried it out last night, works like a champ.
as far as mta's go, does anyone know if qmail supports secure sendmail (using sasl)? I'm running an old version of postfix on my relays, time to update.
cheers,
-o
Actually, what it means is that you can use a high-performance, industry standard query language like SQL to extract data, instead of having to kluge together a patchwork of file and stream manipulation tools.
1) It needs to be *trivial* to use standard unix tools on the mailbox to find things.
e.g.,
rmm `scan
should remove all messages with badthing[12] in the heading, where f4 is an alias for
sed -e 's/\(....\).*/\1/' |tr '\n' ' '
[I'll admit that I was briefly worried the first time that this was
my reaction to a bunch of messages from a mailer gone nuts . . . ]
2) it would be nice for the system to be hostile to abusive mailings--not by content, but from the idiots that send plain text messages in html and mime. That's not a user preference; it's *wrong*.
3) Must be command line friendly. MUAs are for sissies. Real men read from the command line.
hawk
Actually, the Cyrus IMAP Server is open source and takes a similar approach. It's been deployed and in use at Carnegie Mellon for quite some time, and I believe that the older version of the engine is the basis for a number of commercial mail servers out there, including the iPlanet mail server.
sigs are a waste of space
It used to be that qmail was only allowed to be distributed in source code form, and the CDB database system (it's a cool thing well worth looking at) used a license that was somewhat incompatible with the GPL. There seemed to be some rancor to the effect that the GPL wasn't a "free" license; seemingly an independent recreation of the "BSD bigot" approach to software licensing.
If you're not part of the solution, you're part of the precipitate.
o_dev_t st_dev;
o_ino_t st_ino;
o_mode_t st_mode;
o_nlink_t st_nlink;
o_uid_t st_uid;
o_gid_t st_gid;
o_dev_t st_rdev;
off_t st_size;
time_t st_atime;
time_t st_mtime;
time_t st_ctime;
};
which adds up to around 48 bytes, and add to that the size of the directory entry that attaches to the inode.
It's not necessarily "ludicrously big," but it's space overhead nonetheless.
As for "flaming," it's somewhat unfortunate that Dan
If he tried to find some places for agreement, his software would probably get used more. Some of it's really very neat, cdb and the microscopic DNS server being particular examples...
The fact that he comes from a pretty strongly "pure math" background means that he comes up with substantially different ideas than most people. The PM factor adds in two particularly useful things:
This may be important on a big mail server where inodes or disk space may wind up being scarce commodities.
There are then nontechnical issues.
The creator of Maildir, Dan Bernstein, is a, um, "somewhat prickly character." Take a look at his criticisms of Postfix for some mild material. Comparative discussions of Postfix and qmail have resulted in extremely inflammatory discussions. And Bernstein's attitudes towards the GPL seem similarly "inflammatory." This appears to have put some people off his software, whether rightly or wrongly.
Personally, I use Postfix as my MTA, and push messages through Maildir as an interim step to pushing them into MH, which is only a fairly small step removed from Maildir...
Exchange has its good points, this is true, but the biggest problem I have with it is that it holds my data hostage. I can't get at the mail spools if something dies and, if it does, you're fucked unless you also bought support contracts.
We've been running qmail + vpopmail for over 1500 people with Maildir formatted message stores without a problem for over two years now. When something breaks, I can fix it. Data is stored either in the database or in regular old files. It seems to work very well on a mediocre P2 and has all the good stuff: (A)POP, IMAP (courier-IMAP), selective relaying (relaying is allowed after a successful POP or IMAP authentication), user-run mailing lists (ezmlm) and web configuration (vpopmail has a web client). Oh yes and Squirrelmail for the web based mail reading folk.
There's one thing I learned early on and that's that I don't like having my data held hostage. What I recommend to the companies I advise is that pretty much any software is all right so long as either a) it's open-API, b) it's open source, or c) I get copies (and updates) of the data formats. Surprisingly few companies balk at this.
"'Tis great confidence in a friend to tell him your faults, greater to tell him his." --Poor Richard's Almanac
Your commandline does not solve the problem that the original invocation of xargs was intended to solve - passing a *huge* number of files to grep on the commandline (grep * in a directory with a ton of files) causes it to break.
xargs works two different ways depending on how you invoke it.
is equivalent to
or
Whereas invoking xargs like this :
is congruent to :
So, umm, there, and such.
</pedant>
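The batching idea behind this sub-thread (xargs packs as many file names as will fit into each command line and then starts another invocation) can be sketched in a few lines. This is an illustration only: the function name `batch_args` and both default limits are assumptions, not the values any real xargs uses.

```python
def batch_args(args, max_bytes=131072, max_args=5000):
    """Yield lists of arguments whose combined size stays within max_bytes.

    Each argument is charged its encoded length plus one byte for the
    terminating NUL, roughly how an exec-style argument vector is measured.
    """
    batch, size = [], 0
    for arg in args:
        cost = len(arg.encode()) + 1
        # Start a new batch when adding this argument would exceed a limit.
        if batch and (size + cost > max_bytes or len(batch) >= max_args):
            yield batch
            batch, size = [], 0
        batch.append(arg)
        size += cost
    if batch:
        yield batch
```

Each yielded list would become one grep invocation, so a directory with a huge number of messages costs a handful of process launches instead of one per file.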
--
Filesystem-level tools work well with maildir. You don't need special "formail" type tools to work with them; bash scripting is capable of doing it all by itself.
Yeah, being able to grep to find a particular message properly is really handy - as is being able to kill all the messages containing 'University Diploma' with just find, grep and rm...
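For readers who prefer a script to a find/grep/rm pipeline, here is a rough equivalent. It is a sketch under stated assumptions: the function name `delete_matching` is invented, the standard cur/ and new/ subdirectories are assumed to exist, and, like grep, it matches raw bytes, so a phrase hidden inside base64 or quoted-printable encoding will not be found.

```python
import os


def delete_matching(maildir, phrase, dry_run=True):
    """Find message files under cur/ and new/ whose raw bytes contain phrase.

    With dry_run=True (the default) nothing is removed; the matching paths
    are only returned so they can be reviewed first.
    """
    needle = phrase.encode()
    hits = []
    for sub in ("cur", "new"):
        for entry in os.scandir(os.path.join(maildir, sub)):
            if not entry.is_file():
                continue
            with open(entry.path, "rb") as fh:
                if needle in fh.read():
                    hits.append(entry.path)
    if not dry_run:
        for path in hits:
            os.remove(path)
    return sorted(hits)
```

Because each message is its own file, removing one never touches the others, which is the property being praised here.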
The other thing I've found in the past with mbox is that if you're really unlucky, the POP3 server will make a temporary copy of your (whole!) mailbox before doing a UIDL/LIST. qpopper used to do this at least, and you really knew about it when someone had a 30Mb mailbox. Maildir has a minimum of file shuffling and reading/rewriting.
"don't fall into the fallacy of believing that Perl can solve social problems. Maybe Perl 6 can, but that's a ways off"
The Citadel/UX project is developing a robust communications server that will compete with products like OpenMail, Groupwise, and Exchange.
On the face of it, this statement makes no sense at all. The big mail communications servers these days are the Internet MTAs, which in all the major ISPs handle typically many millions of messages per day on behalf of millions of customers per ISP. As others on this thread have mentioned, Exchange runs out of steam if you push it beyond some 2000 users per server -- it just doesn't scale, so it's not "Enterprise Grade" by any stretch of the imagination, it's out by 2-3 orders of magnitude. You've got to stop believing manufacturer's propaganda.
You should compare Citadel/UX to qmail or Exim installations in large ISPs, not against toy systems. Server farms with dozens of hierarchically-organized, multi-CPU MTAs which provide the massive underpinning to the world's Internet mail traffic, those are the "Enterprise grade" systems of today, not the relatively puny corporate systems of yesteryear being portrayed as "Enterprise grade" by manufacturers of personal computer software with more money than experience.
I feel I must also comment on your novel use of the word "robust". If one compares the reliability, availability and robustness of a flat file to that of even the simplest database system, the mind boggles that anyone could consider the database system as anything but the less reliable of the two by a colossal amount.
We run massive database systems here from the best regarded RDBMS manufacturer in the industry and configured with their help, yet even our DBAs will admit that the reliability of their databases is not brilliant. In contrast, the reliability of Exim is, er, well, it has never failed, so I guess the reliability is infinite. And I hear that qmail is likewise excellent in that respect. How the hell is a database going to improve on that kind of reliability and robustness?
Even the best databases crash and corrupt data every once in a while, and a new database could easily be less stable rather than more. But I've never had a flat file crash on me.
If it makes you feel any better, Unix is a sort of combined I/O multiplexer and storage mechanism, which inevitably makes it a particular kind of database too. To get the most out of it you should leverage its capabilities instead of trying to impose a totally different semantic on top of it. You'll never gain robustness by adding complexity.
"The question of whether machines can think is no more interesting than [] whether submarines can swim" - Dijkstra
However, you'll be happy to know that we've wrapped all of the database calls into a data store API. Recently we made the transition from GDBM to Berkeley DB without having to rewrite everything -- just drop in a new data store module and re-import the data (yes, there's an import/export utility). It would be quite straightforward for someone to write a data store module that uses MySQL, Oracle, or whatever.
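The "data store API" idea described here, where callers talk to an abstract store and the backend can be swapped, might look roughly like the sketch below. Nothing in it is taken from the Citadel/UX source: the class names, the three-method interface, and the use of SQLite as a second backend are all assumptions chosen to keep the example self-contained.

```python
import sqlite3
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Hypothetical minimal interface that every backend must provide."""

    @abstractmethod
    def put(self, key, value): ...

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def keys(self): ...


class MemoryStore(DataStore):
    """Dictionary-backed store, standing in for the old backend."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def keys(self):
        return sorted(self._data)


class SqliteStore(DataStore):
    """SQL-backed store, standing in for the new backend."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v BLOB)")

    def put(self, key, value):
        with self.db:  # commits on success, rolls back on error
            self.db.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def get(self, key):
        row = self.db.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return bytes(row[0]) if row else None

    def keys(self):
        return [r[0] for r in self.db.execute("SELECT k FROM kv ORDER BY k")]


def migrate(src, dst):
    """Export every record from one backend and import it into another."""
    for key in src.keys():
        dst.put(key, src.get(key))
```

The `migrate` helper plays the role of the import/export utility mentioned above: since both sides speak the same interface, it needs no knowledge of either backend.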
--
Tired of FB/Google censorship? Visit UNCENSORED!
I'm very concerned about security, so I configured Courier-IMAP to ONLY provide SSL/TLS secure POP and IMAP. I set it up to provide insecure (non-SSL) service only on localhost (127.0.0.1), but not visible over the network. That way SquirrelMail or MUAs running on my server can get to it without SSL, which is OK because there's no way for someone else on the wire to eavesdrop. Of course, I also have the .htaccess file for SquirrelMail set up to only serve over SSL/TLS (see below), and I don't allow telnet, rlogin, or non-SSL'd FTP into my server.
I'm somewhat interested in coming up with a database back end for the IMAP server, so that old archived email can be stored more efficiently than either a maildir or mbox, but still be readily accessible.
# .htaccess for SSL-only services
# Options -Indexes
<IfDefine HAVE_SSL>
SSLRequireSSL
# insert the https: URL of the service in the next line
# for automatic redirect if the user attempts a non-SSL connection
ErrorDocument 403 https://host/webmail/
</IfDefine>
<IfDefine !HAVE_SSL>
# this is to make sure that if the web server is accidentally started without
# mod_ssl, the web pages won't be served up insecurely
Deny from all
</IfDefine>
refile +foobar `pick -from foobar`
will move all messages from "foobar" into my foobar folder in about 15 keystrokes (with autocompletion).
refile -link +foobar `pick -search project6` +project6
will refile messages in my foobar folder containing the text "project6" to my project6 folder using hard links. Now the messages exist in both folders.
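What "exists in both folders" means at the filesystem level can be shown with a small sketch. It assumes the MH layout of one numbered file per message inside a folder directory; the function name `refile_link` and the numbering rule (highest existing number plus one) are simplifications for illustration, not a description of what refile itself does internally.

```python
import os


def refile_link(src_folder, msg, dst_folder):
    """Hard-link message file `msg` from src_folder into dst_folder.

    The new link gets the next free message number in the destination.
    Both names then refer to the same inode, so no message data is copied.
    """
    os.makedirs(dst_folder, exist_ok=True)
    existing = [int(n) for n in os.listdir(dst_folder) if n.isdigit()]
    new_name = str(max(existing, default=0) + 1)
    os.link(os.path.join(src_folder, msg), os.path.join(dst_folder, new_name))
    return new_name
```

Hard links only work within a single filesystem, so both folders have to live on the same partition.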
I can type inc, show, next, comp, etc. in any terminal window at home or at work, and the right thing happens (with a few ssh tricks and gnuclient). No fumbling for some icon to click on, or waiting for the gui to come up, or finding the window running my mail agent...
The only drawback is that after a few hundred thousand messages scattered in hundreds of folders indexing the files for backup can take a bit of time, "what do you think I'm running here, a news server?"
"You have to type your password into the new client--maybe we should store that on the server too?"
Yes, you do have to store the password (or a derivative thereof) on the server. Otherwise, the server would never know if you typed in the correct password or not. But, I think you're poorly trying to make a point that not all data should be stored on the server.
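As an illustration of storing "a derivative thereof", here is a minimal sketch using a salted, iterated hash from the Python standard library. The function names and the iteration count are assumptions, and the scheme only fits logins where the client sends the password itself over a protected channel; challenge-response mechanisms need a different stored form.

```python
import hashlib
import hmac
import os


def make_record(password, iterations=200_000):
    """Derive a salted hash to store on the server in place of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest


def verify(password, record):
    """Recompute the hash from a login attempt and compare in constant time."""
    salt, iterations, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

The server can check a typed password against the record, but the record alone does not reveal the password.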
It's true; not all data should be stored on the server. Like certain subscriptions. Of course, the client doesn't have to use the server's capabilities to manage subscriptions.
I would like to have a client that allows me to choose server-based or client-based management of subscriptions and recent messages. That way, I could say "I always want this subscription, but this other subscription should only show up when I'm using balsa from home" or something. That would not be possible if the server could not store subscriptions, but the ability to store subscriptions does not prevent the client from doing its own management.
And race conditions in the spec should be fixed. They're not excuses to throw away the idea entirely.
-Dave
Citizens Against Plate Tectonics
maildir format does not scale well to large mailboxes on large servers because it has no sort of overview cache information. Mark Crispin (author of UW imapd) correctly deduced that MH format sucked for the same reason that qmail format sucks, and refused to implement it. Without a way to do overview information, getting headers to do the message list is excessively slow.
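One way to picture the missing "overview cache" is a small side index that is refreshed from the message files and can always be rebuilt from them. The sketch below is an assumption-laden illustration, not how any particular IMAP server works: the table layout, the function name `refresh_overview`, and the choice of SQLite are all invented for the example.

```python
import os
import sqlite3
from email.parser import BytesHeaderParser


def refresh_overview(db, maildir):
    """Bring a header cache in line with the files in cur/.

    Only files not yet in the cache are opened; entries whose file has
    disappeared are dropped. Returns the number of files that were parsed.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS overview "
        "(name TEXT PRIMARY KEY, sender TEXT, subject TEXT)"
    )
    cur = os.path.join(maildir, "cur")
    on_disk = set(os.listdir(cur))
    cached = {row[0] for row in db.execute("SELECT name FROM overview")}
    parser = BytesHeaderParser()
    added = sorted(on_disk - cached)
    for name in added:
        with open(os.path.join(cur, name), "rb") as fh:
            headers = parser.parse(fh)
        db.execute(
            "INSERT INTO overview VALUES (?, ?, ?)",
            (name, str(headers.get("From", "")), str(headers.get("Subject", ""))),
        )
    for name in cached - on_disk:
        db.execute("DELETE FROM overview WHERE name = ?", (name,))
    db.commit()
    return len(added)
```

A message list can then be served from the `overview` table with one query. Since a flag change renames the file, this simple version re-parses a message whenever its flags change; keying on the part of the name before ":2," would avoid that.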
Yes, I'm aware of that. The problem is that it's dog-slow. Opening and scanning 2000 files for one mailbox alone is just darned painful. Even if the mailbox is hundreds of megabytes in size, 'grep' will operate on it faster if it's a single file than if it's zillions of separate files.
Also, when your mailbox grows to thousands of messages, the wildcard expansion in the shell ('*' in your example) may overflow or truncate, and you may not actually scan all the messages. Yes, you can resort to foreach, but then not only are you opening zillions of files, you're discretely launching 'grep' a zillion times as well.
Like I said, I admire the reliability of 'maildir', and it's certainly more flexible in certain ways. If I could get the same or similar search speed out of 'maildir', I'd switch. But for the moment, 'mbox' serves my purposes.
Schwab
Editor, A1-AAA AmeriCaptions
I have experience working for a company that hosted millions of users with a maildir format. There are some problems with it. First, some filesystems are just not built for having zillions of inodes and tiny files. WAFL, used by Network Appliance, can fail under this sort of load. Secondly, maildir file names can be quite long. There was a bug in a version of Solaris where the operating system would not cache file contents of an NFS-mounted file whose name was longer than 31 characters. This can result in very poor performance.
IMHO you seem to mix up a few things. You shouldn't compare Exchange and qmail, since the latter is written with the "do-one-thing-but-do-it-right" paradigm in mind. it is only an MTA, that means it delivers mail to your mailbox, while exchange does a whole lot more. using the right tools (e.g. courier-imap, procmail, fetchmail, etc.) you can get most (if not all) functionality you get with exchange. and each of those tools uses the DOTBDIR paradigm. you just need to combine the appropriate tools, and voila...
IMAP? courier-imap.
security? you have to make a tradeoff since you refuse to use proper products, there's no tradeoff using qmail-courier-imap-ssl-mutt-whatever.
workgroup facilities? there are a lot (Evolution, many webbased) so that's a moot point. you get everything.
Resources? qmail is a lot more resource friendly than Exchange...
still there's a tradeoff using OS-tools, you need somebody put all this together. a smart guy...
This combines the best of both worlds. This also means that while it's easy to corrupt your database with a single bug in your code, you can always re-build it from the on-disk messages.
Yes, it is great until the two get out of sync. If you can limit access to the raw filesystem, then that'll eliminate most of the problems, and most of the advantages.
Besides, databases are a lot better (these days) at storing large hunks of arbitrary data, so I'd just stick everything in the database.
That or use a future version of reiserfs, which could give you a database-like view of your filesystem.
What I do is configure maildirs for everyone on the mail server, using either qmail or postfix (both can deliver to maildir; qmail is more minimalistic but a bit confusing, postfix is about as good and a lot more understandable), and then set up qmail's pop3 daemon (even if using postfix to deliver). This combination has worked so well for me that I use it both on server and on my desktop computer (getting mail from pop3 with fetchmail, delivering into maildirs, reading with mutt).
The only thing to make sure with maildir is that you have enough inodes. But that's easy to handle when formatting the partition, and (even better) you could use reiserfs, which has dynamic inode allocation and handles large directories of small files very well.
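Checking whether a partition is running short of inodes can be scripted. This is a minimal sketch using `os.statvfs`, which is available on Unix-like systems; the function name `inode_usage` is made up, and filesystems that allocate inodes dynamically may report a total of zero, which the code treats as "unknown".

```python
import os


def inode_usage(path):
    """Report inode totals for the filesystem that holds `path`."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    used_fraction = (total - free) / total if total else None
    return {"total": total, "free": free, "used_fraction": used_fraction}
```

Running it against the mail spool before and after a busy week gives a rough idea of how quickly one-file-per-message storage consumes inodes.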
--
A couple of considerations have to be made regarding the choice of mailbox format. Here are my thoughts:
Flat mbox file:
pros: easy to set up, accessible
cons: subject to locking issues, not scalable, limited to local fs
Maildir format:
pros: fast, highly scalable, good performance, very few locking issues, reliable
cons: limited user access to directory
Proprietary db format:
pros: transactions, scalable
cons: expensive, corrupts easily
word of warning: back up frequently if you are using MS Exchange.
University politics are vicious precisely because the stakes are so small. -- Henry Kissinger
Not FUD: The Exchange guys at my old job were sharp, loved Microsoft products, and generally kept Exchange up.
Their points:
1) No version of Exchange had a stable message store until 5.5SP1. According to them, that's at least 3 years on the market, corrupting mail all along! But it does work fine now, and Ex2000 solves the '1 big database' problem.
2) They had weekly maintenance downtime to handle the database issues. That meant they took turns coming in on Sunday mornings. Whoop for them.
3) Even so they still occasionally had niggling database consistency problems which they never could quite work out. When these things were happening, people would get nervous because basically the server could crash anytime. Many times they had to go offline and restore the entire message store from tape to solve these things.
Meanwhile, I used to do some Notes stuff. Notes has its own problems, but at least you could back up and restore mailboxes with the COPY command, as well as solve DB corruption and whitespace issues (which cropped up rarely) with the server online. I never had to come in on the weekends at least. But to prove this isn't FUD, I'd take the Outlook interface over Notes or Netscape any day of the week.
--
Business. Numbers. Money. People. Computer World.
Courier-IMAP works with it. So every IMAP client works with it. Of course, mutt works fine with it, too.
It is RECENT for that particular client but not for the end user. And end-users are the ultimate target of email systems. Clients just help make reading the e-mail less painful.
I'm still scratching my head trying to come up with a scenario where a user would want all of his mail to suddenly be marked UNSEEN behind his back. On the other hand, every user I've ever met likes the scenario where switching to a different client maintains the state of his email world.
But you don't have that feature now.
There is a vast difference between a race condition that might erroneously flag some mail and a design that always erroneously flags all mail. In the four years I've been using IMAP I've never had this race condition hit me. Despite your claim, I do have this feature now.
I use exmh as my mail client. The mh tools use a separate file per message. Here are the issues with it as I see them:
Advantages:
* Easy to access any message with standard Unix text utilities (grep, more, and such).
* No worry about corrupting the entire mailbox if one message gets clobbered by a broken client (or broken file system or whatnot).
* Incremental backups and synchronization are easier.
Disadvantages:
* Uses lots of storage. [Oh wait, I work for a storage company, so this is an advantage.]
* With one file per message, you can get more files in a directory than your shell will allow you to use as command line arguments. (e.g., `grep important *` may fail)
I guess the big safety issue is how well it behaves if you have more than one mail client accessing your email at a time. I don't see this as a very likely situation, but still something that should work.
I used to work in a place that stored about 20 terabytes of certain documents it worked with, which varied in size from 1K to 5G each. Median size was about 40M. All the meta data, like what customer it applied to, dates of processing, and so forth, were stored in a database. But the actual document file never was. The network path to the document was in the database, but the documents were stored on hundreds of Novell (ick) file servers. The database was still the major bottleneck of the whole operation. All these wonderful database facilities like SQL don't mean squat when the main functionality was to get the document, process it, and store it back, which is what happened most of the time. Of course it was nice to have the SQL when you needed to manually check on things or do some odd searches. But I would never store bulk data in a database; only the pointer to it would go in there. Databases are faster at complex searching, but not at bulk delivery of data.
now we need to go OSS in diesel cars
Why are they concentrating so much in a single box like that anyway? Why not a few separate smaller boxes?
You have got to be fucking kidding me!
I haven't used Exchange 2000 but Exchange 5.5's mailbox format is a piece of shit! It's one huge flat file. And I mean HUGE. Plus the "Jet" database format it uses is slow as balls. And to top it all off, you have to take the service offline to defrag it! Unless you love getting up several Saturday mornings a month because your users can't check their email, Exchange isn't for anyone.
-Lee
-----BEGIN GEEK CODE BLOCK----- Version: 3.12 GIT d? s: a-- C++++ UL++++ P++ L+++ E- W++ N o-- K- w--- O- M+ V PS+ P
That or use a future version of reiserfs, which could give you a database-like view of your filesystem.
I find future versions of trendy software to be pretty impossible to use in building a workable solution...
"The future's good and the present is nothing to sneeze at." - Roblimo's last
Three different string encapsulations. That should be enough to tag it as bletcherous.
-russ
Don't piss off The Angry Economist
Doesn't seem too hard to me. As far as consistency checking goes, you can ignore the on-disk text except for displaying the message. If you want to use the headers from the file to refresh the database in the event of corruption, fine, but it's not a big requirement.
Any backups of my data that I keep are also stored in the same physical universe, but I don't use this as an excuse not to keep backups. Having the headers lying in a plain text file to sanity-check against can only help. Generally when one assesses risk, one works with cost/benefit tradeoffs. What you propose is very costly in terms of database resources, whereas duplicating headers on disk is very cheap. This cost comes in terms of disk space, time used to duplicate the data (which in a very large system could be staggering for every message body), etc. I think you will find that the benefits of storing headers twice will far outweigh the cost of having done so. I can't say the same for storing open-ended (in terms of size) message bodies in a relational database.
Nice idea, but we're talking about software design here, not system administration procedures. Clearly a sysadmin should be backing the data up, but to tell the user, "something looks odd here, go chase down a sysadmin and make him restore a backup," is a lot less friendly than, "I found some corrupt headers in message 501719, correcting..."
Because it makes it harder for root to read everyone else's email? Actually, as a sysadmin myself, I'd love to have mail stored in a way where only the recipient can read it.
Unless the "binary" is encrypted data then it's hardly going to make a difference. Also the encryption key had better not be stored anywhere. Otherwise "su -l \" will do the trick anyway.
Let alone that in many environments encrypting mail in such a way that only the user could read it would be a very bad idea.
Maildir seems elegant at first, but it has one problem: our filesystems suck. We need a filesystem that is good and fast at creating, opening, and deleting files, even when there are 20000 files in a single directory.
What filesystem do you have /var/spool/news under?
I balked when one of the sysadmins at my work suggested trying out our experimental Exchange 2000 server. But it supports IMAP and webmail, so I checked it out. Man, is that great! The webmail is really what puts it ahead of just plain IMAP, although I guess an IMAP server could have a webmail client as well.
Try having a look at www.courier-mta.org
What about current implementations of mbox that need to be converted into maildir... Can this even be done in an orderly fashion, or is it just slash and burn /var/spool/mail?
Quite trivial, since it's simply a matter of cutting up files into smaller bits. Can't have anything else accessing the mbox file at the time, but once the MTAs and MUAs have been switched to maildir then nothing else should be looking at it.
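The "cutting up files into smaller bits" step can be sketched with Python's standard `mailbox` module, which reads mbox files and writes Maildir directories. The function name `mbox_to_maildir` is made up, and the sketch assumes that delivery into the mbox has already been stopped; the lock is only a second line of defence.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a (new) Maildir.

    Returns the number of messages converted. The source file is left
    untouched so it can be kept as a backup until the result is verified.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    count = 0
    src.lock()  # guard against another process writing mid-conversion
    try:
        for message in src:
            # Wrapping in MaildirMessage carries over status flags where possible.
            dst.add(mailbox.MaildirMessage(message))
            count += 1
    finally:
        src.unlock()
        src.close()
    dst.close()
    return count
```

Running it once per folder, then pointing the MTA and MUAs at the new directories, is the orderly version of the migration being asked about.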
If maildir is indeed the great thing that some people make it out to be, you'd think that there would be more people switching.
The problem is MUA writers tending to ignore maildir. Even though they will happily put the effort into more complex or redundant ways of accessing email. e.g. kmail has inbuilt POP3 support, but every machine it can run on can also run fetchmail.
I can just imagine what went on in his head: "hmmm....I have to find some format for storing RFC 822 messages together. Oh, I know! I'll just throw them in one big file. Wait, but how will I know where e-mails end? What's one of the popular words in English? From! I'll use "From " to distinguish e-mails, and let people quote from-lines"
There is always MMDF which does the same thing, except for using ^A as a message separator. Other than this it has all the same "features" as mbox.
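The "From " quoting being mocked here can be shown in a few lines. This sketch follows the reversible variant usually called mboxrd, in which any body line that starts with zero or more ">" characters followed by "From " gains one extra ">"; other mbox variants quote differently, and the function names are made up.

```python
import re

# A body line that would be mistaken for (or confused with) a message separator.
_FROM_LINE = re.compile(rb"^(>*From )", re.MULTILINE)
# The same line after quoting: one extra leading '>'.
_QUOTED_LINE = re.compile(rb"^>(>*From )", re.MULTILINE)


def quote_body(body):
    """Prefix '>' to every line matching '>*From ' so it cannot end a message."""
    return _FROM_LINE.sub(rb">\1", body)


def unquote_body(body):
    """Strip exactly one '>' from every quoted From-line, undoing quote_body."""
    return _QUOTED_LINE.sub(rb"\1", body)
```

Every reader and writer of the file has to agree on this convention, which is part of why single-file formats attract the complaints in this thread.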
UW IMAP has different requirements. It doesn't just place mail on the system and leave it at that; it needs to read all of the mail that is there. Maildir will suck for that task. With maildir the IMAP server will need to open every single file and read some info and then close the file again. If you have a folder with a couple hundred emails in it it will very quickly thrash your system (100s of opening and closing of files for just a couple bytes of data from each one). MBox on the other hand, you just open one file per folder and parse through that. For the delivery this can cause problems because you need to open it and append to the file. For reading it is much easier on the system, but you take up a lot more memory and have the possibility of corruption (if you are careful that is a pretty low possibility).
Except that mail "readers" don't just read. They also do things such as add metadata, delete, move mail around, etc. With mbox, metadata is commonly done through adding extra headers into the existing file; inserting stuff into the middle of a file is expensive, as well as meaning that anything other than exclusive access probably isn't possible. With maildir it's simply a matter of renaming the file. To delete a message with mbox you either have to leave holes in the middle of the file (and "compress" it later) or rewrite as you go. With maildir simply delete the file. To move with mbox it's a matter of a file append followed by a delete. With maildir it's simply a rename.
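The three maildir operations described above each map onto a single filesystem call. The following is a minimal sketch, assuming the usual new/ and cur/ subdirectories and the ":2," flag suffix; the function names are invented, and the rename-based move only works when both maildirs are on the same filesystem.

```python
import os


def mark_seen(maildir, name):
    """Add metadata: move a message from new/ to cur/ with the Seen flag set."""
    base = name.split(":2,")[0]
    target = base + ":2,S"
    os.rename(os.path.join(maildir, "new", name), os.path.join(maildir, "cur", target))
    return target


def move_message(src_maildir, dst_maildir, name):
    """Move a message between folders with one rename, no copying of data."""
    os.rename(
        os.path.join(src_maildir, "cur", name),
        os.path.join(dst_maildir, "cur", name),
    )


def delete_message(maildir, name):
    """Delete a message; no other message in the folder is touched."""
    os.remove(os.path.join(maildir, "cur", name))
```

None of these rewrite or lock a shared file, which is the contrast with mbox being drawn here.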
For maildirs, you would do a mv(1) into the tmp subdir, which is essentially free, rather than qpopper's copy to a different fs (/tmp), which is slow and expensive.
Actually the latter is probably even more expensive since it isn't a simple matter of copying the data a chunk at a time from one file to another. The code doing the copying needs to look at the data being copied, either generating an index or verifying an index... As well as adding metadata by inserting extra data into the file (or the copy).
It abuses the filesystem with one file per message in the same way that mh folders do.
/var/spool/news must really "abuse the filesystem" then. Odd that in nearly 20 years no one has come up with an alternative.
Guess
Unless you're running a decent btree structured filesystem like XFS, ReiserFS or JFS, expect a performance hit if you get thousands of messages in a single mailbox.
Expect an even bigger performance hit if you have lots of messages in the same file. You must use lots of locking (and it must be reliable, otherwise the whole thing will get corrupted). Things such as index files must be understood by every piece of software which does anything with the file, etc. Effectively you will end up trying to emulate a file system in user space software.
1) it is more reliable over nfs. Maildir is designed to not need file-level locking, which sucks over nfs.
There is another consequence of this: maildir supports an arbitrary number of processes reading and writing at the same time. The mailbox format requires complex locking, and even then adding new messages has to be strictly serial.
Maildir is also a better analogy with paper mail. Mailbox would be something like you have a scroll of all the messages pasted together which you periodically have to hand to the postman for more bits to be stuck on the end...
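A rough Python sketch of how a lock-free maildir delivery can look: write the message under tmp/, then rename it into new/. The unique-name recipe here (time, pid, counter, hostname) is an assumption for illustration, and real delivery agents differ in the details.

```python
import itertools
import os
import socket
import time

_sequence = itertools.count()  # distinguishes deliveries made by one process

def unique_name():
    """Build a name that should not collide with other writers (illustrative recipe)."""
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    return "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), next(_sequence), host)

def deliver(root, data):
    """Write data (bytes) to tmp/, then rename it into new/. No lock is taken."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    name = unique_name()
    tmp_path = os.path.join(root, "tmp", name)
    new_path = os.path.join(root, "new", name)
    with open(tmp_path, "xb") as f:  # "x" fails instead of overwriting an existing file
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Readers only look in new/ and cur/, so they never see a half-written message.
    os.rename(tmp_path, new_path)
    return new_path
```

Any number of processes can call deliver() on the same maildir at once, since each one only ever touches its own file.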
The other thing I've found in the past with mbox is that if you're really unlucky, the POP3 server will make a temporary copy of your (whole!) mailbox before doing a UIDL/LIST.
It's not just POP3 servers which do this; indeed it's almost the standard way of processing a mailbox file.
You CAN use NFS, if you want -- without getting real paranoid.
You could even use SMB to access the mailbox from a Windows workstation. (Or at least you could if the software existed.)
A point which hasn't been mentioned is that accessing email from a workstation using file sharing (the same file sharing which is in use anyway) means no need for additional password entry (or storing passwords in plain text/reversible encryption formats). The user simply needs to log in and their mail is there. If they log in on more than one machine everything still works fine too...
Actually, what it means is that you can use a high-performance, industry standard query language like SQL to extract data
SQL being so "standard" that software vendors demand a specific implementation... Pull the other one, it's got bells on!
The only objection I can see the Linux camp having is qmail is released under a "non-free" (as in freedom) license.
The licence is likely to upset both GPL and BSD diehards. Also, who the author is may be an issue too...
Both formats have problems. A true enterprise-grade message store will use an embedded database with transactions support.
.INI files. With the same problems, it's an "all eggs in one basket approach" and difficult to deal with when things go wrong.
Sounds like big iron propaganda... Both mailbox and maildir have the advantage of being conceptually simple. The database solution is complex, probably more complex than is needed for storing email in the first place.
Arguing for a database looks to me quite similar to the arguments as to why the Windows registry is better than
Using mailbox means that a problem with John's mailbox probably won't affect Jane's. Using maildir means that a problem with one of John's messages probably won't affect the rest of them. Using some kind of DB could easily mean that when John has a problem with mail, everyone has the same problem.
Coz if you put everything in one or two big proprietary boxes people can charge you a lot for consultancy and support.
;) ) Unix admins.
As well as these boxes being expensive, since they need to be reliable and hot swappable redundant. Not that even that will help when something external such as a router, cable, etc fails.
Whereas if you distribute the mail load to 10x the number of boxes (albeit cheap off-the-shelf boxes), you just need maybe one or two decent (backup/redundancy
Unfortunately RAIC or RAIB doesn't quite have the ring of RAID.
With mail the load can be distributed. In fact I believe people don't really mind having their email addresses being user@tag.domain.com. It's the marketing/PR guys who'd complain. Heck market it as user@neighbourhood.domain.com
Assuming the distribution needs to be that obvious in the first place...
We had issues with databases becoming corrupt (and hey, 150000 users like it when they lose all their mail), the database being overly bogged down (guess what, fopen is faster than going through a database) amongst other things.
As opposed to one user out of 150,000 losing their mail with mailbox or one user out of 150,000 losing some of their mail with maildir.
While granted I'm sure that bugs such as these can be worked around
What's the point of applying a workaround to get a complex system to work when you could simply apply a KISS approach?
I believe mbx format is better than both maildir and mbox. I think part of it is in binary, which makes it faster.
It may make it faster but it also means that you can easily be tied to specific hardware/software combinations in order to be able to read your email.
Sorry to tell you, but this doesn't work.
(storing email in a database that is)
I worked as lead programmer at a mail provider and was in charge of the system's design from the start. The ingenious idea to store email in a database, while it sounds good... is rather horrific. We had issues with databases becoming corrupt (and hey, 150000 users like it when they lose all their mail), the database being overly bogged down (guess what, fopen is faster than going through a database) amongst other things.
While granted I'm sure that bugs such as these can be worked around, databases were meant for holding fields of data, not whole files, especially binary ones (and before you say that email is ASCII, think of other languages where they use multibyte encodings etc.)
this is basically X.400, upon which OpenMail is based. But instead of having a JDBC/ODBC/Whatever link to a relational database, the architecture is to have a mail-specific "mail store" which stores the mail in SOME way, and then client tools which just talk to the mail store (and client tools in this case are things like an IMAP daemon). It's basically this model, assuming that you are dealing with your mail store as a Mail-specific interface rather than raw SQL.
Wasn't there a recent article on a MySQL file system? Wouldn't this be the best of both worlds?
It's 10 PM. Do you know if you're un-American?
If you're using POP, typically the MUA downloads and removes all mail from the server. If that's the case with you folks, then you're taking advantage of the client's CPU and storage, rather than the server. That's swell if you already have heavy-duty clients with good backups and your people don't move around much. IMAP makes more sense if you need a central message store, though.
Both are reasonable choices, but it's unfair to compare them in an apples-to-oranges fashion.
I'd agree that "prickly" is a good word to describe Bernstein; a while back I wrote a chapter of a book on email and while I was writing it I had nightmares about him reaming me for a minor error. Luckily, the book seems to have escaped his notice.
Having used qmail for a few years, I can indeed say that it is a safe and reliable product. But I wouldn't recommend it for a novice sysadmin; DJB is a really smart guy, and he seems to have little patience for those who aren't.
As to his views on licensing, here is the distribution policy for his software. He strictly forbids distribution of qmail except in forms approved by him:
http://cr.yp.to/qmail/dist.html
Exchange defrags itself online, but deleted items become white space. The only way to remove that white space is by an offline defrag. But if you have your act together from the start (planning, esp. mailbox limits), tons of whitespace shouldn't be an issue.
Stop spreading FUD.
ostiguy
Your statement was true until about 12 months ago.
MS is going to enterprise style per CPU licenses with its xxxxx 2000 products. Exchange 2k will be pushed heavily at ISPs and especially ASPs. MS will expect companies to scale highly, with x,000's of users per box.
ostiguy
I can think of two advantages right off:
/Maildir/ instead of messing around with the MAILBOX file.
1) If your MAILBOX file gets trashed, you're out your entire e-mail directory. If one MAILDIR file gets messed up, you've only lost one e-mail.
2) If you get a messed up e-mail that you can't read in a mail program (and this DOES happen) you only have to delete the corresponding message in the
I know that UofW claims Maildir take a performance hit, but I've not noticed one. There's all sorts of web resources on tweaking UofW to pump out e-mails faster. I'm currently using Qmail + IMAP-2000 (with the Maildir patch on Qmail's site) on a P100 w/ 32Mb of RAM and I've got it pumping out IMAP as fast as my work's commercial server does.
Some people take their .sig way too seriously
A few points:
Tune your file system for what it's used for. Your /home directories (where the mail will be stored by default) should be set to have a relatively large number of inodes because of a tendency toward small files in there.
Read the docs on updatedb -- set the exclusions to include "/home/*/Maildir" if you wish.
Maildir also allows for multiple processes accessing a 'mailbox' because it uses per-file locking on per-message files, not a lock on an entire mbox itself. This allows for situations where 6 people all have the same IMAP shared folders for shared incoming mail (like an accounting office, or tech support) without locking problems for the MUA or IMAP server.
- Michael T. Babcock (Yes, I blog)
Use rgrep or GNU grep's -r option to do a recursive search:
grep -ri "slashdot" ~/Maildir/*
- Michael T. Babcock (Yes, I blog)
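If grep isn't available on the machine in question, roughly the same recursive, case-insensitive search can be sketched in Python. Like grep, this matches raw bytes only, so text inside base64 or quoted-printable parts won't be found; the function name is just for illustration.

```python
import os

def grep_maildir(root, needle):
    """Return paths of files under root whose raw bytes contain needle (case-insensitive)."""
    wanted = needle.lower().encode("utf-8")
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if wanted in f.read().lower():
                    hits.append(path)
    return sorted(hits)
```

Usage would be something like grep_maildir(os.path.expanduser("~/Maildir"), "slashdot").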
Does someone want to explain how mbox is better for concurrent access than Maildir? If you do some good coding, they're equal. For Maildir though, you just do read locks on individual files in your Maildir when opening them to present them to the user, and you create new files to write new messages, which doesn't have any effect on (eg 25) other processes accessing that Maildir.
- Michael T. Babcock (Yes, I blog)
Take a look at Courier-IMAP. It handles Maildir quite efficiently.
- Michael T. Babcock (Yes, I blog)
What he may not realise is that distributors and VARs are the ones who (usually) get the first calls from their clients. If those people (like RedHat) can modify the software (a la GPL) to behave like the rest of their software, they'll find it easier to support themselves. RedHat distributes a version of the Linux kernel with several patches added so that their customers will be happy (based on their own presuppositions). Does the kernel mailing list still get questions from those users? Of course. Does it take much to tell them to contact their distributor instead? No.
- Michael T. Babcock (Yes, I blog)
That was in 25-user bundles, new users (no upgrades)... I imagine there are volume discounts... my previous employer got Exchange 5.5 down to $37/mailbox at 150,000 user volumes...
Of course, $35 is the current rate for Exchange 5.5.
Meanwhile, Sendmail.com's advanced server (SMTP/POP3/IMAP4) is $3/user (500user minimum).
-- You can't idiot-proof anything, because they're always coming out with better idiots.
An even better solution is a length of 132 lbs (to the yard) rail.
:)
And a largish electromagnetic coil, to make a neato rail gun. Add some optical recognition software to identify the Rednecks on your video camera feed and away you go
I have to agree here.
I used the Washington Uni IMAPd on our branch mailserver for a little while, but it became painfully slow due to the mbox format it used.
Try this simple mbox experiment - delete one email from a folder containing 1000 mails, or a few emails with megabyte attachments.
With only 15 - 20 users the (admittedly underpowered - P133 16meg) server was on its knees most of the time. It was rare to see a load average of less than 3.
So I upgraded to a PII 266 with 64meg, and installed the Cyrus Imapd, and since then, the userbase has doubled, and since quite a few of us don't ever delete most of our emails (you never know when you might need to recall a conversation with a client from 2 years ago), the server now handles around 50,000 emails totaling nearly a couple of gig of data, without breaking a sweat. I've never seen the load hit 1, and nobody has ever complained about speed since.
IMAP is the easiest and safest way of making your mail available from any machine. And since the mail is stored on a central server, there's no synchronisation problems at all.
But if you don't want to go to the (rather small) trouble of installing an IMAPd, or your favourite mail program that you just can't be without doesn't support IMAP, then I would say that maildir is the way to go unless you tend to delete messages as soon as you've read them, in which case it probably doesn't matter much. (Although I noticed another comment that maildir works well over NFS due to not needing file level locking.)
Advanced users are users too!
Now where did you get that load of crap from? Certainly not from the documentation, because it (the QMail FAQ) states:
Is qmail compatible with sendmail? Answer: Yes. qmail supports .forward, /etc/aliases, binmail deliveries to a central mail spool in the usual mbox format, the /usr/{lib,sbin}/sendmail interface for mail injection, and the normal UNIX user database in /etc/passwd. There is a checklist for large sites moving from sendmail to qmail.
Do get the facts right before drawing conclusions whether something is or isn't possible.
To delete an email in mbox you have to read the entire spool file, then write it out without the one email. Very slow, very resource intensive.
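To show why this is expensive, here is a bare-bones Python sketch of an mbox delete. It assumes the common convention that every message starts with a "From " line and that body lines beginning with "From " have been escaped on delivery, and it deliberately leaves out the locking a real implementation would need.

```python
import os

def delete_from_mbox(path, index):
    """Rewrite the mbox at path without message number index (0-based).

    Returns the number of bytes that had to be written back.
    """
    tmp_path = path + ".rewrite"  # hypothetical temp name, same directory as the mbox
    current = -1
    written = 0
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        for line in src:
            if line.startswith(b"From "):  # separator line: a new message begins
                current += 1
            if current != index:
                dst.write(line)
                written += len(line)
    os.replace(tmp_path, path)
    return written
```

The cost is proportional to the size of the whole spool rather than the size of the message being removed, which is the complaint being made here.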
Wu-Imap supports maildir, however the problem is that imap needs information such as the subject and sender from the header of the email. That brings you right back to open/close/read on every mail. Why bother?
Where I work we use wu-imap for pop. We didn't realize the performance implications until the last minute, then wrote a patch to the code that doesn't load the body of the email until it is needed. With that patch, you can't run imap, only the pop gateway but the performance is great. We might have used qmail or something similar if we had it to do over again, but at the time we had a lot invested in wu-imap and went with it, _with_ our patches. Otherwise wu-imap loads every mail from the mailbox into memory... kinda defeats the whole purpose of maildir. (btw.. we have 54k named users popping two 300mhz single cpu suns.)
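The "don't load the body until it is needed" idea can be sketched in a few lines of Python. This is only an illustration of the approach, not the patch described above: it stops reading at the first blank line and hands the header block to the standard email parser.

```python
from email.parser import BytesHeaderParser

def read_headers(path):
    """Parse only the header block of a one-message-per-file mail file."""
    header_lines = []
    with open(path, "rb") as f:
        for line in f:
            if line in (b"\n", b"\r\n"):  # the blank line ends the headers
                break
            header_lines.append(line)
    return BytesHeaderParser().parsebytes(b"".join(header_lines))
```

For example, read_headers(path)["Subject"] gives the subject without parsing a multi-megabyte attachment further down the file.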
Maildir also performs better over NFS. That's because 1) stat isn't expensive, 2) you don't have to load the entire file over the network just to delete one email, and 3) there are no locking issues.
Finally, note that most people who pop their email don't even have new email. This is the most important fact. Mbox format requires that you read the whole spool just so that a client can say, OK, I already have these emails. Very wasteful.
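As a rough illustration of the difference, a POP-style listing over a maildir needs only a directory scan and a stat per message; no message file is opened. This Python sketch assumes the part of the filename before ":" can serve as the unique ID, and that the on-disk size is close enough to what the client expects (strictly, POP3 counts octets with CRLF line endings).

```python
import os

def pop_listing(root):
    """Return sorted (unique_id, size) pairs for every message in new/ and cur/."""
    entries = []
    for sub in ("new", "cur"):
        with os.scandir(os.path.join(root, sub)) as it:
            for entry in it:
                if entry.is_file():
                    uid = entry.name.split(":", 1)[0]  # drop the ":2,<flags>" suffix
                    entries.append((uid, entry.stat().st_size))
    return sorted(entries)
```

A client that already has every listed ID can disconnect at this point without the server having read a single message body.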
So go with pop, shun imap, go with maildir. Imap would be great if you had a integrated MUA/MDA which could save headers in a database or somesuch upfront. Otherwise imap requires too much as a protocol I think.
Bullshit. Yes, Outlook pretends to know where you're going and is often wrong but this virus has nothing to do with that. The ILOVEYOU virus was spread simply because users could simply double-click on an attachment to execute it. The difference between Outlook and other clients in this regard is that other clients make it a pain in the ass to do so because their developers never thought anyone would want to execute an attachment. So it's more like giving you a car. If people give you the directions to hell which you choose to follow, Outlook will simply take you there faster. Other clients might save you from going there because they make it too far away.
Mmmm.. Donuts
I have had too many security issues with IMAP. POP may suck, but it does so the same way every day.
Take a look at what sorts (and counts) of messages / files your users tend to have. Choose your mailstore / filesystem based on that.
--
Free Software: Like love, it grows best when given away.
Despite a campaign to get rid of the worst {ab,}users (many had 'leave on server' accidentally turned on), there were still many users with mailboxes over 50MB. (In one case, a customer with dialup accounts had a friend that sent them a digitized home video (3 copies). They never managed to get their email after that because they kept on concluding that the connection had timed out.)
In any case, I could notice the system load go up when a user with a large mailbox tried to POP their mail. Even with server mode turned on, mailboxes regularly got thrashed back and forth. Had I been free to do so (and understood qmail at the time), I would have definitely considered going to maildir format for performance reasons. (we had customized the software to remove just about every other performance issue).
Problems that I can see with maildir:
- On NFS, directory searches tend to be expensive if you have 'full' directories. (then again, there are problems with POP and NFS
- You're going to eat more inodes (so reformat the drive already!)
- You have to open a file for every message... but you only need to read the beginning of it to get a message list (RFS, here we come!)
- The Mail(1) command won't work anymore (real nice for all sorts of customer support stuff).
- It's gonna hurt you if users have lots of small messages (the way that email used to be).
- Regular file commands can be used for all sorts of mailbox work (pointed out by someone else).
- you don't thrash files around whenever a user checks email, and/or deletes one or two messages (of hundreds)
- You CAN use NFS, if you want -- without getting real paranoid.
-
I see MailDir format being the biggest winner when you have users with mailboxes full of big messages (15
.DOC attachment anyone?)
I don't see how mailbox format can be a security consideration. I can see a specific program that uses that format having problems. Related, but different issues. On the other hand:
--
Free Software: Like love, it grows best when given away.
Let's see now: A medium-small ISP with 10,000 customers, at $87 each... that's almost $1M. (minus discounts). In that space, the cost of a dual PIII with 1MB ram and firewall is going to be trivial.
Bluntly put, that pricing is obscene..
Another thing: This is about mailbox formats, not software (though Exchange would mandate their preferred format).
--
Free Software: Like love, it grows best when given away.
Whenever you use the word "or" in specifying network applications, it's often worth considering whether you should use the word "and". It might be worth making subscriptions modifiable on a per client and per user basis. It might even be worth having several profiles storable on the server per user and the current profile decided by the client. You do have to draw a line when complexity increases too much but e-mail is one thing where flexibility is important.
Rich
It's always possible that your IMAP server isn't written to the spec of course. It wouldn't be the first time a programmer has done what's sensible and not what's written down on a piece of paper.
Of course, then someone usually writes a piece of software that relies on the braindead part of the spec and everything breaks.
Rich
How does it work on something much bigger?
We have about 40,000 users spread over 6 POP servers (Pentium 600s or something). And we have people like me with about 1/4 million messages in our folders...
I'd love to see us do IMAP, but I think it would choke without racks and racks of servers.
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
First of all, databases can handle large amounts of arbitrary data, such as BLOBs or big chunks of text.
Secondly, I think you missed the fact that maintaining two separate data stores (database for headers, filesystem for message content) will certainly be more work than just using one or the other.
Lastly, a filesystem and a database are both stored on a magnetic disk (for the most part). How is it easier to corrupt one than the other? Back in the good ol' MS-DOS days, I lost data when BOTH copies of the FAT were corrupted. I've also had to manually rebuild NTFS partition data. In both cases, there were relatively small points of failure.
So, what do you do in the reverse case from what you described? Suppose the portion of the filesystem containing the mail is lost. Can you rebuild those lost messages from the database, which would have stored only the headers?
I say do one or the other, and then BACKUP OFTEN!
In addition to the advantages mentioned above, it fully supports accessing the same maildrop by multiple clients at the same time, which UW only partially does (the last time I looked, anyway). It also doesn't have the innumerable security problems that UW had forever that gave IMAP a bad name. You can send mail directly to a maildrop other than your inbox, if you set the permissions on it. You can even share your maildrops with other users on the same system with a nice ACL system. They've recently implemented a deliver-time filtering system (that I haven't used yet), which fixes the one drawback that it ever had.
You should definitely go with Cyrus if you think you will ever have the need to access your mail from multiple places.
Fuck 'im up, Tim! His views are invalid! -Pirate Corp$
Despite that, I'm cutting my teeth on qmail, well, because that O'Reilly's Sendmail book is just too fucking big for me to spend my valuable time reading. I may be getting a bit O/T here - qmail seems to be more like a "real" unix program than sendmail. Small discrete modules that pipe output to other small discrete modules, each mutually untrusting, instead of one all-inclusive behemoth of a program. There are also some easy to use tools for it like vpopmail, makes virtual domains a cinch. Some of the "big boys" are using it too, so it seems to be proven. Of course, qmail does not enjoy the what, 80-85% market penetration that sendmail does.
cat
>
/bin/grep: Argument list too long
./waters.roger/Bring_the_Boys_Back_Home
./shakespears.sister/_Hello_(Turn_Your_Radio_On)
./pink.floyd/Is_there_anybody_out_there?
/bin/ls | xargs grep "foo"
.: directory
./Roger_Waters-It's_a_Miracle.mp3: symbolic link
../../.mp3/3155
/bin/ls | xargs -i grep "foo" {}
Main Entry: pedant
obsolete : a male schoolteacher
2 a : one who makes a show of knowledge b : one
who is unimaginative or who unduly emphasizes
minutiae in the presentation or use of knowledge
c: a formalist or precisionist in teaching
Which one of these applies exactly? Because your post is incorrect.
>Your commandline does not solve the problem that
>the original invocation of xargs was intended to
>solve - passing a *huge* number of files to grep
>on the commandline (grep * in a directory with a
>ton of files) causes it to break.
Actually, it *does* solve the issue that xargs was intended to solve - shells have finite (often 100 or 255 character) commandline buffers.
For example:
bash-2.04$ find . -type f | wc -l
35416
bash-2.04$ grep -il "is there anybody out \
there" `find -type f`
bash:
bash-2.04$ find . -type f -print0 | xargs -0 \
grep -il "is there anybody out there"
I've done this on Linux, Solaris, OpenBSD, HP/UX, Openserver and Unixware without any problems.
> $
> is the equivilant to
> $ grep "foo" `/bin/ls`
This is almost, but not quite true.
First difference is that `/bin/ls` will be executed and substituted into the commandline buffer of the shell you're using, thus giving the possibility of an overrun.
The second difference is that you don't have to worry about escaping out special characters when using the backtick operators, but you do when using xargs.
For example, I have a directory containing a file with a single quote in its name. So this happens:
bash-2.04$ ls | xargs file
xargs: unmatched single quote
To work around this, it's usually much better to use find.
bash-2.04$ find . -print0 | xargs -0 file
to
You brought up syntax like:
> $
This is mostly for cases where you need to repeat a command with a series of arguments, but again where the argument would be too long.
It is roughly equivalent to
for I in `/bin/ls`; do $I; done
Matt
Actually, if you want maildir support, you don't have to go with qmail. Postfix does it as well. While I haven't used qmail so I can't compare the two, Postfix is very easy to configure and get up and running. I gave up on sendmail after two weeks when I couldn't get it do what I wanted. Postfix on the other hand took me maybe four hours to completely install and configure to my liking. It also has the same benefits of qmail from a discrete module standpoint -- definitely not the big behemoth-all-in-one that sendmail is. Anyway, check it out at here. An article that I found to be helpful when getting started was here.
The program supports around a dozen different mailbox formats, easily user selectable with an ~/.imaprc file. Some are just legacy stuff, but a few are clearly suitable in different situations.
I think UW recommends the "mbx" format for most situations - fast, safe in concurrent-access situations, etc. Clearly unlike either UNIX mbox or Qmail maildir.
Of course, if what you REALLY care about is fast IMAP access instead of being able to bypass the IMAP server with client-side solutions, you should look at Cyrus IMAP server as well.
Ask a silly person, get a silly answer.
If I'm creating security problems for myself, I'd like to know what they are.
__________________
What happens when you have > 32768 messages in a given folder under Solaris 2.5.1 running UFS?
--
Do daemons dream of electric sleep()?
/. is a commercial entity. goto slashdot.com
I'm going to put on my Captain Obvious costume and state that the administrator has more to do with the system succeeding or failing than the package itself (Unless of course it's MS Exchange, then failure is assured).
I'm not trying to be cute, it's just that any system can be slow and dangerous if not properly administered. Get the best talent you can and give them the support they need; you'll end up with a good system.
He who joyfully marches in rank and file has already earned my contempt. - "Big Al" Einstein
My Outlook client crashes all the time; it often can't find the server due to slow WINS lookups; when it crashes it takes forever to reopen due to the massive mail store (.pst file). Maybe the admins like it, but I sure don't.
sulli
RTFJ.
Sound just like.... BeOS! Amazing what duplication exists in the technical world....
(Except BeOS has an advantage: the database is the filesystem, which brings unparalleled speed and stability, not to mention that you can edit the messages with a resource editor.)
I use mbox for no special reason. It's the default for sendmail, exim, postfix, procmail, and all that good stuff. In fact, I don't know what SMTP server defaults to maildir.
That's about the only reason, but let me tell you why I stick with mbox, instead of switching to maildir.
Clean trees. Remember, every time you run updatedb (or locate.updatedb if you come from FreeBSD), it's looking for files. The more messages you have, the more files it's got to hunt down. That means time and size. Keeping clean directory trees is generally a good thing, it helps keep things where they belong.
Inodes. Many people keep thousands upon thousands of messages. This means thousands upon thousands of files. Multiply this by a few hoard-happy users, and you're quickly running out of inodes in your /var filesystem. Then you start getting horrible messages like "Could not create /var/tmp/tmp.file". Not a good thing.
Cluster waste. Don't get clusterfucked! Remember, each file needs to fill a cluster, no matter how big the file is. Thousands of files can waste significant amounts of diskspace.
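The waste is easy to estimate. A small Python sketch, assuming every file occupies a whole number of fixed-size blocks (filesystems with fragments or tail packing will waste less than this); the 4096-byte default is just an example value.

```python
def allocated_bytes(size, block_size):
    """Disk space taken by a file of the given size, rounded up to whole blocks."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size

def total_slack(sizes, block_size=4096):
    """Bytes lost to partially filled last blocks across all the files."""
    return sum(allocated_bytes(s, block_size) - s for s in sizes)
```

For instance, 10,000 messages of 1,500 bytes each on 4,096-byte blocks lose 2,596 bytes apiece, about 26 MB in total, whereas the same mail in a single mbox file loses less than one block.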
The only disadvantage I see is filesystem damage. If you trash your filesystem, it is easy to wipe out a lifetime of messages by ruining your mail file. Maildir reduces the damage to (potentially) only a few messages. However, I think this is insignificant.
A new year calls for a new signature.
Databases are not without problems. Most importantly, they can easily leave you stranded with data in a format you can't read, they guarantee data integrity only under a very limited set of failures and will corrupt your data otherwise, they have their own set of bizarre resource limits, and they require considerably more complex software and maintenance.
It's also a common fallacy to assume that scaling up requires a database. But mailboxes of different users don't interact with one another. If a file-system-based mailbox system works fine for one user, it will work fine for each of 1000 users. In fact, in many cases, a database makes the problem worse, because if messages for different users are stored in the same database, using a database introduces interactions between users that weren't there before.
Both maildir and the standard mailbox format work fine and do scale up. They do have their limitations, problems, and failure modes. But, you know what, so do databases. In practice, in my experience, mailbox-based mailers seem to work more reliably than Exchange, so being an "Exchange-killer" isn't much of an advertisement.
As for mailbox and maildir based mailers, you can avoid almost all problems by turning on message size limits and per-user quotas; even very liberal limits will keep accidents from happening while not interfering with normal mail traffic.
...until I started implementing.
Example: Why does ENVELOPE exist? It just lists header fields that could be obtained from BODY[HEADER]. All it really does is provide them in a different format--is that a job the server should be doing?
Example: Why are subscriptions stored on the server? That is client data. What if different clients (of the same user) want to subscribe to different folders? What if IMAP is being used as a front end to an existing mail system that can't have mods made to it?
Example: Why does the RECENT flag exist? First of all, it's client data. Second of all, if it didn't exist at all the client would be perfectly able to calculate RECENTness just by storing the UIDs from the last session.
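The client-side calculation suggested here could look roughly like the following Python sketch. It assumes the mailbox's UIDVALIDITY has not changed between sessions, so UIDs are never reused and only ever increase; the function names are made up.

```python
def recent_uids(current_uids, last_session_uids):
    """UIDs present now that the client had not seen in its last session."""
    return sorted(set(current_uids) - set(last_session_uids))

def recent_since(current_uids, highest_seen_uid):
    """Same idea, storing only the highest UID seen (UIDs are assigned in ascending order)."""
    return sorted(uid for uid in current_uids if uid > highest_seen_uid)
```

Note that this gives "new to this client", which is not quite the same thing as the server-side flag, since the server marks a message as recent for only one session across all clients.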
And these are the larger issues. It's a huge mish-mash of bizarreness--it's no wonder it's made almost no inroads against POP3, whatever the user-side usefulness (and I have to admit IMAP should be nicer than POP3).
--
MailOne
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
I've worked in the past with Isocor (now Critical Path) N-Plex Ultra, a high-end mail server product for ISPs.
One of the ways they reached very high performance (200 messages/second on a Sun E450, when a good conventional Unix mailer like Postfix or Qmail will more typically be running at 20 messages per second on equivalent hardware) was their proprietary message store organized as an object database.
Using this database and a multi-threaded server architecture, they could batch multiple email commits in a single disk write before acknowledging a SMTP transaction. The one file per message in the queue architecture of sendmail, postfix, qmail and the like means each email will require at least two disk writes (one for the file itself, one for the directory entry).
Obviously, using the standard Unix format has benefits in terms of ease of customizability, but there is a performance tradeoff to openness.
This is less of an issue than it appears, as there aren't that many companies or even ISPs that require such a high volume of email handling capacity in a relay, and those who do can easily expand capacity using a load-balanced server farm of cheap machines for much less than the licensing costs of proprietary software alone.
Actually, I believe this is one of the things that ReiserFS excels at.
I have very limited experience with Reiser myself, so perhaps someone else can provide more details, but as I understand it ReiserFS is capable of dealing with thousands of small files extremely efficiently (Through the use of tree structures to hold the filesystem). From what I've read, it would be a fairly ideal file system for things like maildir storage.
In fact, now that the 2.4.1 kernel is out, with included stable ReiserFS support, I might just give this a shot. ;-)
-- Toph
Topher
Cyrus http://asg.web.cmu.edu/cyrus/ seems to use a hybrid approach. Messages are stored in individual files, but the envelope information is stored in dbm format. So opening up a mailbox and listing messages is very fast. So is searching unless you want to do a full body search on all emails. Give it a try. It supports IMAP, POP, and LMTP.
If you look under the hood of the UW Imap server, you will see that it supports many more formats than straight mbox. I don't think that maildir is one of them, unfortunately, but there are a few (mbx comes to mind) that overcome some of the more blatant shortcomings of mbox.
Is UW Imap free software? If so, someone should feel free to give it maildir, db, sql, or other mailbox support. For some reason I seem to remember that UW Imap was not free software, even though the source is available (some weird academic license hostile to commercial use?). The author is a good programmer and active in the standards process, but can be abrasive to work with.
Thus
might end up invoking, if you have thousands of files, something like and so on. Using the -i flag to xargs just means it has to create a separate process for each grep, taking a lot of extra time.
Ok. First off, Outlook is the client. Not the mail server. The mail server is called Exchange. Try not to mix the two. I can use Outlook against MANY back ends, including HP's Openmail, (almost) any IMAP/POP3 server, or no backend at all.
Second, you cite three 'benefits' to Exchange:
Fast: Define fast. The Exchange/Outlook RPC is great over a 100MB network, but try it over a dial-up line, or some line with high latency. The performance goes right down the crapper, because the protocol is very 'chatty'. The client and server communicate back and forth repeatedly to get a task done. IMAP/POP3 are infinitely better in adverse environments, because their protocol is 'batch' oriented. A couple of commands, and you have data streaming to the client. Another example: over that same high-latency connection, try forwarding a message with an attachment. The attachment has to be uploaded to the server before you can COMPOSE YOUR MESSAGE. On the server side alone, every internet message has to be 'decoded' into MAPI body parts for storage in the database. If it pukes on a body part, it'll crash your information store. The IMAP servers do/can parse the messages based on MIME body parts, but only when necessary. Exchange parses EVERY internet message, and at a lower level than the MIME body parts.
Second, you cite 'scalability'. I ran a 7000-mailbox UofW POP3 server on a dual 166MHz Solaris box with 256MB of RAM. The concurrency was about 25%, and the server ran with a load average of about 1.2. My previous employer is having trouble running 2500 users on a quad PII-450 with 1GB of RAM at 50% concurrency. How is that scalability?
Third, you mention 'workgroup features'. True, Exchange includes a fairly decent calendar service, but this discussion is about e-mail. If you want to talk about workgroup functions, we can do that... (btw, voting is a client function, as is the task management. There is no true 'workflow' in that because there is no central process tracking the work. It's all source-routing/message updates.)
You also said that Qmail is technically correct, but it's not going to do my company's productivity any good. This may be true. But talk to me when your company starts to interact with OTHER companies, and tell me how well Exchange does. Internet software is designed for interoperability, and when you're dealing with other companies, THAT'S what will make your company productive.
As for security, I'll leave that to the rest of these guys. I already like the comment about the 5 days w/out mail due to the I Love You virus.
-- You can't idiot-proof anything, because they're always coming out with better idiots.
I ran into the "sync" mail issue a while back and came up with the following criteria:
1) I want to be able to read mail both from a GUI-based mail prog (Outlook, Eudora, Netscape, whatever) **AND** from a shell
2) I want to be able to access live and "older" mail anytime from (at least) home and work, preferably both my home and work email accounts.
3) I do not want to send any cleartext passwords
What I came up with is the following:
At home I run the UW-IMAP server, and store my incoming mail in MH folders. Stunnel does a fine job of adding SSL support to IMAP.
At work we run Netscape's Mail server which actively supports SIMAP.
Either at home or at work, both servers (and all the mail in all the folders) are available.
Just about the only thing missing is the ability to read my work mail from a shell, but that's where most of the big ugly attachments are, anyway...
"...if I had to resubscribe every time I use a new client."
You have to type your password into the new client--maybe we should store that on the server too?
"What if there was no last session for the client?"
Then everything is RECENT. I realize this loses you a feature, namely that you can't see only those messages in client B that you didn't see in client A. But you don't have that feature now. Why not? Because there is a race condition in the spec: if a message comes in AFTER the last time you check your mail (in client A) but BEFORE you logout (with client A) that message won't be RECENT in client B.
--
MailOne
I've been using 'mbox' for -- gawd, can I say this? -- fifteen years, and it's served me well. 'mbox's advantages for me are that it is efficient with disk space (you don't eat an inode per message), and that it is quick to search.
9 times out of 10, when I'm searching my mail, typically with 'grep', I'm looking for something in the body, not the headers. With 'maildir', you have to open each message and search it. This is preposterously slow. There is also the danger that the shell's wildcard expansion limits may be exceeded if you have a lot of messages. With 'mbox', 'grep' opens the one file and slurps through it quickly.
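A small script sidesteps the wildcard-expansion limit, though not the cost of opening every file. This is only a sketch: the function name is invented, it assumes the usual cur/ and new/ subdirectories, and like plain grep it matches raw bytes, so text inside base64 or quoted-printable parts will be missed.

```python
import os


def grep_maildir(maildir, needle):
    """Yield paths of messages in cur/ and new/ whose raw bytes contain needle."""
    wanted = needle.encode()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        # scandir walks the directory itself, so no shell glob is involved
        # and the argument-list limit never comes into play.
        with os.scandir(folder) as entries:
            for entry in entries:
                if entry.is_file():
                    with open(entry.path, "rb") as fh:
                        if wanted in fh.read():
                            yield entry.path
```

Usage would be something like `for path in grep_maildir(os.path.expanduser("~/Maildir"), "search string"): print(path)`, where the ~/Maildir location is just an example.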
Remote synchronization is not an issue for me. All my email resides on my laptop, which follows me everywhere.
However, I'm hip to 'maildir's increased reliability. I have over 2000 messages in my outgoing box alone, and I'd hate to have a system hiccup destroy any of it. If I could search the bodies of a 'maildir' spool as quickly as an 'mbox' spool, I could be convinced to switch.
Schwab
Editor, A1-AAA AmeriCaptions
Originally, the reason we switched to maildir was that even without NFS, mbox was corrupting our filesystems. Not just the files, mind you, but the filesystems themselves. It was a total pain in the ass, and we damn near left Linux for FreeBSD. This was using 2.0.36 and Sendmail. We had to put /var/spool/mail on its own partition so we could unmount and fsck it until we found a solution. Between that and problems with files > 500MB, my opinion of Linux 2.0 is very bad.
:)
Our solution was moving to qmail and using Maildir mailboxes for our users. We never saw the problem again.
Recently, I've switched to the Courier mail server (http://www.courier-mta.org/) on all my non-production machines to evaluate it. I'm really, really happy with it. Courier is a complete mail system, not just an IMAP server, so you might take a look at the whole package. The whole thing is RFC compliant, which causes trouble for software that isn't, but that's a fault in the other software.
As a final rant against UW-IMAP: I hate it. It loads the whole damn mailbox being checked into memory (regardless of the type), which creates a huge load every time someone with a large mailbox checks their mail. This problem affects the POP3 server as well, since that also uses the c-client code.
That's just plain wrong. Qmail supports both maildir and mbox. I've been using qmail with only mbox files for years...
ArsDigita has a great article on using Oracle as a backend for your mail and ACS as a front end.
Prevent email address forgery. Publish SPF records for y
"Mind, as manifested by the capacity to make choices, is to some extent present in every electron." -Freeman Dyson
My mailbox works just fine, and it hasn't changed in over 20 years! It sits at shoulder height just to the right of my front door. Here are the advantages:
-No encryption techniques necessary
-Rarely have to waste time with forwarded jokes
-Best of all, the spam it collects is occasionally useful (I know all the pizza deals available in town).
From hell's heart I fstab at /dev/hdc
As someone who is, as we speak, supposed to be implementing an IMAP server, let me say this: If the person who dreamed up RFC2060 says that X is "slow and dangerous" run, DO NOT WALK, to leap onto the X bandwagon--it'll be the wave of the future.
--
MailOne
I think the guys who wrote Cyrus IMAP server got it right. I have been using Cyrus for about 4 years now and I rarely delete mail. The server is still responsive and full body text searches are pretty speedy, even on the P133 server that it is running on. I think keeping each mail in a separate file, and making a directory for each folder, is the way to go. It also makes it very simple to restore a lost mail message and to index the whole mailbox. Anyway, that's my two cents.
http://www.jwz.org/doc/
has a number of essays about mail on Unix systems, including problems with mail box formats.
I use Xemacs/Gnus/nnml so all my mail is stored as individual files, which is handy (as other posters have said) and has its downsides, as they have said too (grep now bitches if passed all files in my main mail box). Still, I like it; it's the best system I've used. Not so great for the multiple hosts thing though.
Or you could run your mail and xemacs on one machine, and either read your mail in a terminal, or open X windows on your local display. Look up gnuserv to do that, I think.
Plato seems wrong to me today
Both formats have problems. A true enterprise-grade message store will use an embedded database with transaction support.
Fortunately, a solution to this problem is being developed right now. The Citadel/UX project is developing a robust communications server that will compete with products like OpenMail, Groupwise, and Exchange. SMTP and POP3 are already in place; IMAP will be available by the end of the year. Web-based access works as well. After that's done we'll be writing plug-ins for both Evolution and Outlook, in order to facilitate all of the 'shiny things' working as well: calendars, address books, etc.
So, you might ask, what mailbox format does it use? None of the above. Messages are stored in a database, like they should be. The Berkeley DB package from Sleepycat Software (yes, it's open source) is used for robust back-end storage, including transaction and logging support.
I'd encourage any developers who are looking for the open source world's "Exchange Killer" to get involved in this project.
--
Tired of FB/Google censorship? Visit UNCENSORED!
Email messages are a specifically interesting topic. They're (for the most part) text, and tend to be larger than database fields want to be (on the order of 1+ kB each ranging all the way up to many megabytes in common practice).
This makes most mail messages poor choices for database storage (for example, you want to be able to use "grep" on mail or compress in place). Headers, on the other hand, are a major win in a database ("select messageid from headers where user = 'me' and date > yesterday and fromaddr = 'taco@slashdot.org'" should be fast even if I have tens of thousands of messages).
The easy solution is to keep the headers in the database, and then just keep maildirs with the original messages in the normal filesystem, with the filenames in the database alongside the headers (something like message.headerid => headers.id and message.text is a path to the maildir entry for this message).
This combines the best of both worlds. This also means that while it's easy to corrupt your database with a single bug in your code, you can always re-build it from the on-disk messages.
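Here is a rough sketch of that hybrid, including the rebuild-from-disk step. The table layout, column names, and function name are my own assumptions (a single headers table with a path column rather than the two tables described above), and SQLite merely stands in for whatever database you would really use.

```python
import os
import sqlite3
from email.parser import BytesParser


def rebuild_index(db, maildir):
    """Throw away the header index and rebuild it from the messages on disk."""
    db.execute("DROP TABLE IF EXISTS headers")
    db.execute(
        "CREATE TABLE headers ("
        "id INTEGER PRIMARY KEY, fromaddr TEXT, subject TEXT, date TEXT, path TEXT)"
    )
    parser = BytesParser()
    for sub in ("cur", "new"):
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, "rb") as fh:
                # headersonly=True skips parsing the body structure; the
                # message itself stays in the filesystem, untouched.
                msg = parser.parse(fh, headersonly=True)
            db.execute(
                "INSERT INTO headers (fromaddr, subject, date, path) VALUES (?, ?, ?, ?)",
                (
                    str(msg.get("From", "")),
                    str(msg.get("Subject", "")),
                    str(msg.get("Date", "")),
                    path,
                ),
            )
    db.commit()


def paths_from(db, fromaddr):
    """Return the on-disk paths of all indexed messages from one sender."""
    rows = db.execute("SELECT path FROM headers WHERE fromaddr = ?", (fromaddr,))
    return [row[0] for row in rows]
```

Because the index is derived data, a corrupted database costs you a rebuild and nothing else. A real version would want to normalize the From header (display names, case) before storing it.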
Maildir is better because:
1) it is more reliable over NFS. Maildir is designed to not need file-level locking, which sucks over NFS.
2) maildir is more resistant to catastrophic corruption since each email is a separate file.
3) maildir keeps metadata about the email in the email's filename, rather than a separate index file. This helps prevent the metadata, such as "replied-to" and "forwarded this", from getting out of sync.
4) filesystem-level tools work well with maildir. You don't need special "formail" type tools to work with them; bash scripting is capable of doing it all by itself.
5) maildir is better positioned to take advantage of advanced new filesystems like reiserfs. When reiserfs has a plugin for file-level transparent compression, maildir will be able to selectively and invisibly compress emails to the disk without requiring other programs/scripts to decompress them before use.
Study maildir, it's just plain better.
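For anyone who hasn't looked at it, points 1 and 3 above boil down to something like the sketch below. It is simplified: the unique-name scheme is a cut-down version of what the maildir spec describes, the function names are mine, and a real delivery agent should handle more failure cases.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()


def deliver(maildir, raw_message):
    """Write the message into tmp/, then rename it into new/.

    Readers only look at new/ and cur/, so they never see a half-written
    file and no lock is needed. Returns the final path.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    host = socket.gethostname().replace("/", "_").replace(":", "_")
    unique = "%d.%d_%d.%s" % (time.time(), os.getpid(), next(_counter), host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "xb") as fh:  # "x" fails if the name already exists
        fh.write(raw_message)
        fh.flush()
        os.fsync(fh.fileno())
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path


def flags_of(filename):
    """Return the flag letters stored after ':2,' in a cur/ filename.

    For example R means replied, S seen, P passed (forwarded), T trashed.
    """
    _, sep, info = filename.rpartition(":2,")
    return set(info) if sep else set()
```

Since the flags live in the filename, marking a message as replied is a rename, which is why the metadata cannot drift away from the message it describes.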