Good POP3 Server for Huge Mailboxes?
brainchill asks: "I've got about 10,000 users split between a couple of quad 550 MHz Xeon machines. The machines have 2 GB of RAM. The problem is that the UW POP3 server takes a huge hit in both CPU and memory utilization when a 40+ MB mail spool is requested via POP3. Sometimes it's bad enough to drag these monster boxes to their knees. What other POP3 daemons do you guys have experience with, and how do they perform with large mailboxes?"
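Part of why a single 40+ MB spool hurts is that a plain mbox file has no index. Before the server can answer even a STAT or LIST, it has to read the whole file to find where each message starts. The sketch below shows that cost. It is a simplified illustration that assumes classic mbox framing, where each message begins with a line starting `From `. It is not UW's actual code, and the function names are hypothetical.

```python
from typing import BinaryIO, List, Tuple


def scan_mbox(spool: BinaryIO) -> List[int]:
    """Return the stored size in bytes of each message in an mbox spool.

    Simplified: sizes are counted as stored on disk (LF line endings) and
    the "From " separator line itself is not counted. A real server also
    handles CRLF conversion and ">From " quoting.
    """
    sizes: List[int] = []
    current = None  # bytes seen in the current message; None before the first separator
    for line in spool:  # reads every line, so the cost grows with the spool size
        if line.startswith(b"From "):
            if current is not None:
                sizes.append(current)  # close off the previous message
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        sizes.append(current)
    return sizes


def pop3_stat(spool: BinaryIO) -> Tuple[int, int]:
    """Answer a POP3 STAT command: (message count, total size)."""
    sizes = scan_mbox(spool)
    return len(sizes), sum(sizes)
```

The scan costs the same whether the client wants one message or all of them. Indexed formats such as mbx, and one-file-per-message formats such as Maildir, exist to avoid exactly this full read.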
Read the various qmail + whatever guides. Also, remember that system tuning can make a difference as well.
Personally, I use mbx (NOT mbox), and it scales wonderfully. The mailbox is fully indexed to speed up searching, and it can be accessed simultaneously by several processes.
I can only second that. qmail runs like a charm and scales well.
Check out cr.yp.to/qmail.html and www.qmail.org.
I work for an ISP where we have about 50,000 email users. Maildir is great when a mailbox holds only a few messages, and if one of those messages happens to be big, it doesn't matter. However, if a user has tens of thousands of emails of whatever size in their mailbox (which happens far, far more often than you might think), then just getting a list of files in the directory can take ages. When a user has masses of small messages (under 2 KB each), mbox would probably be faster.
Whilst I'd definitely recommend Maildir over mbox, it's certainly not going to solve all the problems.
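To see where the time goes, consider what a POP3 server has to do to answer STAT on a Maildir. It lists `new/` and `cur/` and looks at the size of every file. The sketch below shows that work. The function name is hypothetical, and real servers differ in the details. With tens of thousands of messages, both the directory scan and the per-file `stat()` calls grow with the message count.

```python
import os
from typing import Tuple


def maildir_stat(maildir: str) -> Tuple[int, int]:
    """Answer POP3 STAT for a Maildir: (message count, total bytes).

    Only new/ and cur/ hold delivered mail; tmp/ holds deliveries in progress.
    """
    count = 0
    total = 0
    for sub in ("new", "cur"):
        with os.scandir(os.path.join(maildir, sub)) as entries:
            for entry in entries:  # one directory entry per message
                if entry.is_file():
                    count += 1
                    total += entry.stat().st_size  # one stat() per message
    return count, total
```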
Have you looked at Cyrus? It is probably best known as an IMAP server, but it has very nice POP3 support as well.
Cyrus stores messages in a variation of the Maildir format (one file per message), and it maintains a database of the flags, headers, etc. for the messages in each folder to speed up access.
Notable features include shared mail folders (with independent views), quotas, multiple mail partitions (with the ability to move users across partitions on the fly), duplicate email checking, and a server-side filtering language (Sieve).
Most of this is probably more useful if you use IMAP, but Cyrus should scale quite well as a POP server too.
Another option is using a filesystem that handles large numbers of files in a directory.
Probably the best option is to have your local mail delivery program both write out the message and update a header cache. The POP server then simply reads the cache to get the information it needs to present to the user, while still working with the message files to hand the user his/her messages.
Modifying the local delivery app is trivial if you use, say, qmail-local or procmail to do the work; I could probably whip something up in a couple of hours. However, it does break one of the main principles of Maildir: no locking. You'll have to lock the header cache file in order to append to it or delete from it.
Someone else recommended Cyrus; that might also turn out to be a useful option, since it already does something like this.
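The header-cache idea fits in a few functions. This is a minimal sketch, and several parts of it are assumptions:

- The cache is a hypothetical `.headercache` file in the Maildir root.
- It holds one tab-separated `filename, size, subject` line per message.
- Locking uses Unix `flock()`.

The function names are illustrative and are not part of qmail-local or procmail. Entries are keyed on the name the message was delivered under. A real POP server would also have to handle Maildir renaming files when they move to `cur/` and gain flags.

```python
import fcntl
import os
from email.parser import BytesHeaderParser
from typing import List, Tuple

CACHE_NAME = ".headercache"  # hypothetical cache file in the Maildir root


def record_delivery(maildir: str, filename: str, raw: bytes) -> None:
    """Delivery-side hook: append one tab-separated line for a new message.

    Call it after the message has been moved into new/ under `filename`.
    """
    headers = BytesHeaderParser().parsebytes(raw)
    subject = " ".join(str(headers.get("Subject", "")).split())  # flatten folding and tabs
    line = f"{filename}\t{len(raw)}\t{subject}\n"
    with open(os.path.join(maildir, CACHE_NAME), "a", encoding="utf-8") as cache:
        fcntl.flock(cache, fcntl.LOCK_EX)  # unlike Maildir itself, the cache needs a lock
        cache.write(line)
        cache.flush()
        fcntl.flock(cache, fcntl.LOCK_UN)


def read_cache(maildir: str) -> List[Tuple[str, int, str]]:
    """POP3 side: get (filename, size, subject) without opening any message."""
    path = os.path.join(maildir, CACHE_NAME)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as cache:
        fcntl.flock(cache, fcntl.LOCK_SH)  # shared lock: readers don't block each other
        lines = cache.read().splitlines()
        fcntl.flock(cache, fcntl.LOCK_UN)
    entries = []
    for line in lines:
        name, size, subject = line.split("\t", 2)
        entries.append((name, int(size), subject))
    return entries


def forget(maildir: str, filename: str) -> None:
    """POP3 side: drop a deleted message's entry, rewriting the cache under the lock."""
    path = os.path.join(maildir, CACHE_NAME)
    if not os.path.exists(path):
        return
    with open(path, "r+", encoding="utf-8") as cache:
        fcntl.flock(cache, fcntl.LOCK_EX)
        kept = [entry_line for entry_line in cache.read().splitlines(keepends=True)
                if not entry_line.startswith(filename + "\t")]
        cache.seek(0)
        cache.truncate()
        cache.writelines(kept)
        cache.flush()
        fcntl.flock(cache, fcntl.LOCK_UN)
```

A POP3 STAT or LIST then only reads `read_cache()`. A DELE followed by QUIT would unlink the message file and call `forget()`.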
However, if a user has tens of thousands of emails of whatever size in their mailbox (happens far, far more often than you might think) then just getting a list of files in the directory can take an age.
This is a filesystem problem. Use a better one. On FreeBSD, enable dirhash. On Linux, use ReiserFS or ext3 with htree.
Good Christ, you'd think that by the time you outgrew a QUAD XEON mail server with only 5,000 users, you'd have reevaluated performance before plunking down what must have been 10-15 grand or more at the time on a second one!
Your mailbox format is all wrong. Storing all messages in a single file is pretty much the worst way to do anything useful. You want to explore an alternative storage format such as mbx or Maildir. I personally use Maildir on ReiserFS on Linux and have had good luck. (The filesystem is VERY important for Maildir. ReiserFS's block tail support and directory indexing give it major disk space and speed advantages for a Maildir mail server, while running Maildir on XFS would instantly kill your server.) I hear mbx is pretty good too if you're stuck on some sort of standard filesystem, since it uses indexing and fewer files than Maildir. The downside is that it's not as immediately parseable as Maildir or mbox, i.e. you couldn't just write a script to, say, delete extremely high-scoring spam messages from any user who hasn't checked their mail in over 3 months, or do the other things ISPs might routinely do to maintain their servers.
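Here is what such a maintenance script could look like against Maildir, where every message is its own file. This is a minimal sketch with two assumptions:

- Messages carry SpamAssassin-style `X-Spam-Status: ... score=N` headers. Older versions wrote `hits=N`.
- The modification time of the Maildir's `cur/` directory stands in for "last checked mail". This is a rough, hypothetical heuristic. A POP3 server that moves messages from `new/` to `cur/` will update that time.

`purge_idle_spam` and its defaults are hypothetical, and it only reports matches unless `dry_run=False`.

```python
import os
import re
import time
from email.parser import BytesHeaderParser
from typing import List, Optional

# Assumed SpamAssassin-style header, e.g. "X-Spam-Status: Yes, score=23.4 required=5.0"
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")
SECONDS_PER_DAY = 86400


def spam_score(path: str) -> Optional[float]:
    """Return the spam score recorded in a message's headers, or None."""
    with open(path, "rb") as f:
        headers = BytesHeaderParser().parse(f)
    match = SCORE_RE.search(str(headers.get("X-Spam-Status", "")))
    return float(match.group(1)) if match else None


def purge_idle_spam(maildir: str, threshold: float = 15.0, idle_days: int = 90,
                    now: Optional[float] = None, dry_run: bool = True) -> List[str]:
    """Find (and unless dry_run, delete) high-scoring spam in an idle Maildir."""
    now = time.time() if now is None else now
    # Hypothetical activity heuristic: when cur/ last changed.
    last_activity = os.stat(os.path.join(maildir, "cur")).st_mtime
    if now - last_activity < idle_days * SECONDS_PER_DAY:
        return []  # user checked mail recently; leave the mailbox alone
    victims = []
    for sub in ("new", "cur"):  # tmp/ holds deliveries still in progress
        folder = os.path.join(maildir, sub)
        for name in os.listdir(folder):
            path = os.path.join(folder, name)
            score = spam_score(path)
            if score is not None and score >= threshold:
                victims.append(path)
                if not dry_run:
                    os.remove(path)
    return sorted(victims)
```

Run it from cron over each user's Maildir, first with the default dry run so you can review the list of matches.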
Finally, if you plan to scale way up (60,000+ users), you need to start looking at better cluster systems than just a couple of machines. Specialize several machines, some for mail storage and some for talking POP3. Look at something like POPular, which is specialized POP3 server clustering software.
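Once storage and POP3 front ends live on separate machines, each front end has to work out which storage machine holds a given user's mailbox. One simple approach is sketched below. It is not necessarily how POPular does it. The node names and the function name are hypothetical, and the username is hashed with a stable hash so every front end agrees.

```python
import hashlib
from typing import Dict, Optional, Sequence


def storage_node(user: str, nodes: Sequence[str],
                 overrides: Optional[Dict[str, str]] = None) -> str:
    """Pick the storage machine holding `user`'s mailbox.

    An explicit override table wins, so individual users can be moved
    between machines; everyone else is placed by a stable hash.
    """
    if overrides and user in overrides:
        return overrides[user]
    # hashlib gives the same answer on every front end, unlike Python's hash()
    digest = hashlib.sha1(user.lower().encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]
```

Plain modulo placement moves most users when a node is added. This is why the override table exists, and why a lookup database or consistent hashing is worth considering as the cluster grows.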
~GoRK
I chose Cyrus for a customer who needed a MySQL backend for his server, but I quickly ran into a problem: the minimum timeout for unlocking the mailbox in the Cyrus POP3 server is 10 minutes (yes, that's right: 10 _minutes_!). As people with buggy mailers (*cough* Outlook Express *cough*) are very common nowadays, I was forced to patch the sources to weed out that stupid limitation. What's sad is that I found lots of messages on their mailing list discussing this problem going back a long time, and that one-liner patch never made it into the tree, which leads me to think the authors are unconcerned about the needs of their users (I hope I'm mistaken here)...
:-) Or else, you could try using LARTs on your unruly, mailbox-filling users... Good luck to you, anyway!
BTW, there is a fine POP3 server that we've used without problems for a year (and we have customers who *never* empty their mboxes, so we have huge 300 MB horrors lying on the primary MX's hard disk). It has no frills but works like a charm. It's called Solid-POP3; it's Polish, made by the same people who brought you the PLD Linux distro, and you can download it directly or just do an `apt-get install solid-pop3d' if you run good ole' Debian.
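There is a trade-off here. A server can keep a POP3 mailbox locked for a client that has silently gone away, or it can free the lock so the user can reconnect. For context, RFC 1939 says that if a POP3 server has an inactivity autologout timer, it MUST be at least 10 minutes, which may be where Cyrus's floor comes from. Below is a minimal sketch of idle-based lock expiry with the timeout as a parameter. `MailboxLocks` and its methods are hypothetical and are not Cyrus code.

```python
import time
from typing import Callable, Dict, List

# RFC 1939: a POP3 inactivity autologout timer, if present, MUST be at least 10 minutes.
RFC1939_MIN_IDLE_SECONDS = 10 * 60


class MailboxLocks:
    """Hypothetical table of mailboxes locked by active POP3 sessions.

    A lock is dropped once its session has been idle for idle_timeout
    seconds, so a client that vanished without QUIT cannot lock its
    owner out forever.
    """

    def __init__(self, idle_timeout: float = RFC1939_MIN_IDLE_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._last_activity: Dict[str, float] = {}

    def reap(self) -> List[str]:
        """Release every lock whose session has been idle too long."""
        now = self.clock()
        stale = [box for box, seen in self._last_activity.items()
                 if now - seen >= self.idle_timeout]
        for box in stale:
            del self._last_activity[box]
        return stale

    def acquire(self, mailbox: str) -> bool:
        """Try to lock a mailbox for a new session; False if it is still held."""
        self.reap()
        if mailbox in self._last_activity:
            return False
        self._last_activity[mailbox] = self.clock()
        return True

    def touch(self, mailbox: str) -> None:
        """Record client activity (call on every command the session handles)."""
        if mailbox in self._last_activity:
            self._last_activity[mailbox] = self.clock()

    def release(self, mailbox: str) -> None:
        """Drop the lock on a clean QUIT."""
        self._last_activity.pop(mailbox, None)
```

Passing an `idle_timeout` below 600 seconds is the kind of change the patch above makes. It buys faster recovery from crashed clients at the cost of RFC 1939 compliance.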