Infrastructure for One Million Email Accounts?
cfsmp3 asks: "I have been asked to define the infrastructure for the email system of a huge company which, fed up with Exchange, wants to replace its entire system with something non-Microsoft. I have done this before, but not for anything of this scale. Suppose you are given a chance to build from scratch an email system that has to support around one million accounts. Some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, 99.9% uptime is expected... where would you start?"
For starters, uptime should usually be higher than 99.9% for a site this large. 99.9% uptime means 40-45 minutes of downtime a month. Try going for 99.99% at least, though from what I saw a few years back, this usually increases the cost by about 250%.
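To put numbers on that, here is a quick back-of-the-envelope sketch. The function name and the 30-day month are my own assumptions for illustration; the only input is the availability percentage.

```python
def downtime_minutes(availability_pct: float, days: float = 30.0) -> float:
    """Minutes of downtime allowed over `days` at the given availability."""
    total_minutes = days * 24 * 60          # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_pct / 100.0)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}% uptime allows {downtime_minutes(pct):.2f} min per 30 days")
```

At 99.9% that works out to about 43 minutes per 30-day month, which is where the 40-45 minute range comes from (28-day versus 31-day months). Each extra nine cuts the budget by a factor of ten.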
+1 funny, -2 overrated. Life isn't fair.
I'd start by talking to vendors. Consult with some sendmail gurus, Notes guys, etc. Any of these people/companies would salivate at the thought of being a part of a project this large. First, talk to the client and hammer out the real needs with solid performance requirements, timeframes, growth expectations (meaning real numbers), etc. Put together a well-thought-out Request For Proposal and send it out to as many applicable vendors as interest you. Then just stand back and play the role of ringmaster. The vendors will give you all the ideas you need.
Just do one thing, please: make sure that the client is honest-to-goodness serious about this. I absolutely hate getting pie-in-the-sky RFPs from people who are just kicking the tires. It's a good way to burn bridges by not looking professional.
Entrepreneur : (noun), French for "unemployed"
I agree. The Google appliance should implement Gmail and a web front end for administration. Like the Cobalt machines of yore, only better. Google-ified.
It really is the best email.
I'm not sure that there is any commercial solution that can support 1 million email accounts well. That's why Yahoo and Google have built their own custom systems. Some custom engineering may be required.
For pop3 & imap4rev1, look at:
http://www.dbmail.org/index.php?page=overview
You still need an MTA. I think qmail is the fastest and best, but I'd use exim, as it's easier.
Database - not sure if MySQL and PostgreSQL will scale with dbmail.
I'd say use FreeBSD, because of the ports collection (don't Linux-flame me). However, something like Solaris 10 x86 (or Solaris + Sun hardware) might provide a bit better scaling, plus HA hardware, SAN support, support in general, etc. It is a bit tougher for OSS software installs, though (in my experience).
Or maybe this is a legitimate cry for help from EDS, who duped the US Navy into thinking they could actually outsource IT on the exact scale that the poster is talking about. Mind you, no one has ever provided ubiquitous support for an organization as large as the Department of the Navy, but they somehow convinced Congress that they could do it for $6 billion.
Just so you know. Most of us out in South East Asia refer to NMCI (Navy-Marine Corps Intranet) as the Not Mission Capable Intranet.
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
WalMart runs the world's biggest Exchange install. They and msft are quite proud of it, actually...
The Navy may want to take a page out of Walmart's book, if they're having that much trouble.
... hi bingo
My God no! Friends don't let friends use qmail. Want reasons why?
1) It's a bitch to install. Won't even compile on modern Linux distributions. You have to patch it to compile it and the patch isn't even hosted on qmail's site.
2) It's a bitch to configure. Rather than parsing a single configuration file, qmail relies heavily on the presence of individual files in a directory.
3) Not not not not scalable! That's a myth. Doesn't properly batch jobs together. Hell! qmail was originally designed to be run from inetd!
4) Heavy reliance on other daemontools.
5) Breaks well-known and understood UNIX standards.
6) Security through lack-of-functionality.
7) Not really secure despite the claims.
8) No longer maintained.
9) No features. Adding them requires patching, and patching, and more patching.
Serious sysadmins don't use qmail and for damn good reason. I don't give a damn if Yahoo did manage to string it together and make it work well. In short, qmail isn't particularly suited for deployment in any capacity.
Slackware, what else when it must be secure, stable, and easy?
> you could do the entire thing with MySQL if you REALLY wanted to
I am so tired of people shoving everything into relational databases. What queries are you going to run against your database, anyway? SELECT * FROM messages WHERE read=0? Try "ls new" in your maildir. The reason things never scale right is because people design things to be "new" and "cool" like putting their e-mail into a relational database. No. Just use the filesystem. It, and its supporting tools, have been around for 30 years! It Just Works! It doesn't use any userspace memory! There are no permissions issues, because the kernel controls the permissions. It's the optimal solution.
The filesystem is really really efficient (for e-mail) and really really reliable.
Please, don't use a database!
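For anyone who hasn't looked inside a maildir, the "ls new" point can be sketched roughly like this. It assumes the usual tmp/new/cur layout; the function names and the sample file names are made up for illustration (real names are generated by the delivery agent).

```python
import os
import tempfile


def unread_messages(maildir: str) -> list[str]:
    """Return the file names in <maildir>/new, i.e. mail no client has seen yet."""
    new_dir = os.path.join(maildir, "new")
    return sorted(e.name for e in os.scandir(new_dir) if e.is_file())


def make_demo_maildir(root: str) -> str:
    """Create a throwaway maildir with the standard tmp/new/cur subdirectories."""
    maildir = os.path.join(root, "Maildir")
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub))
    return maildir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        md = make_demo_maildir(root)
        # Hypothetical message file names, one unread and one already seen.
        with open(os.path.join(md, "new", "1111.M1.host"), "w") as fh:
            fh.write("Subject: unread\n\nbody\n")
        with open(os.path.join(md, "cur", "0999.M0.host"), "w") as fh:
            fh.write("Subject: seen\n\nbody\n")
        print(unread_messages(md))
```

The "query" is a single directory listing, with no index to maintain and no daemon in the path. Whether that holds up at a million accounts depends on the filesystem and how the directories are spread across storage.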
My other car is first.
The Walmart Exchange site was not properly backed up for "years". Mostly because Exchange was not 3rd-party-software friendly at all, and M$ didn't have much of their own backup software to offer. Veritas and Legato couldn't bend over enough for a million users.
Walmart invited countless consulting firms and data backup experts. They deployed Exchange strictly because M$ was willing to "support" them. To say they were vulnerable to a major IT disaster would be an understatement. The Navy wants nothing to do with Walmart's IT.
Definitely agree on point 9. I maintain a mail server with over 2,000 users. Currently running Qmail with the following patches:
chkuser-2.0.8b-release.tar.gz
doublebounce-trim.patch
netqmail-1.05-tls-20050329.patch
outgoingip.patch
qmail-smtpd-auth-0.31.tar.gz
qmail-smtpd-auth-close3.patch
qmail-smtpd_gmfcheck.patch
qmail-spf-rc5.patch
Most of these patches require hand-editing the sources and Makefiles to successfully merge them all into the stock qmail or netqmail base. Lots of manually reading through *.rej files to make it all work.
In order to simplify new installations I've created my own personal CVS repository for my Qmail sources. I commit changes to the tree whenever a new patch comes out with functionality I need. Hence on a new install I simply check out my custom tree and compile.
The initial work was a royal pain in the ass; however, once it was all up and running, the stability and performance have been excellent.
I don't know if you actually have experience running a mail server or not or if you just wanted to go off on your relational db rant, but mail data tends to be created and deleted A LOT with varying size files, and file-based structures on a mail server create serious fragmentation problems. If you do decide to go this way, allow plenty of free drive space - well above normal recommendations - like 80% free or more.
Also, many people have their mail clients set with ridiculously frequent mail check times (like every minute), and on a file-based system each check requires a trip to the drive and back. Even with the data on a RAID array with a decent read/write cache, you're still going through the disk subsystem, whereas with a database it would all be in memory.
What's wrong with SELECT * FROM messages WHERE userid=xyz AND read=0? That is a cakewalk for a properly indexed DBMS. On a medium-sized server (say, quad processor w/ 8-16GB RAM) there is more userspace memory than OS memory space.
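Here's roughly what "properly indexed" means for that query, sketched with Python's built-in sqlite3 so it runs anywhere. The table layout, index name, and sample rows are assumptions for illustration, not anyone's production schema, and SQLite is only a stand-in for whatever DBMS you'd actually deploy.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory messages table with a composite (userid, read) index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " userid INTEGER NOT NULL,"
        " read INTEGER NOT NULL DEFAULT 0,"
        " subject TEXT)"
    )
    # The index lets the lookup go straight to one user's unread rows.
    conn.execute("CREATE INDEX idx_messages_user_read ON messages (userid, read)")
    conn.executemany(
        "INSERT INTO messages (userid, read, subject) VALUES (?, ?, ?)",
        [(1, 0, "hello"), (1, 1, "old news"), (2, 0, "other user"), (1, 0, "again")],
    )
    return conn


def unread_for(conn: sqlite3.Connection, userid: int) -> list[tuple]:
    """Return (id, subject) for every unread message belonging to userid."""
    cur = conn.execute(
        "SELECT id, subject FROM messages WHERE userid = ? AND read = 0 ORDER BY id",
        (userid,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = make_db()
    print(unread_for(db, 1))
    # Ask SQLite how it plans to execute the lookup.
    for row in db.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM messages WHERE userid = 1 AND read = 0"
    ):
        print(row)
```

With the composite index in place, the mail check touches only the matching index entries instead of scanning every message in the table.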
The mods are on crack, the meta-mods are on pot
Music is everybody's possession.
It's only publishers who think that people own it.
Fuck Beta
~John Lenno
All of these systems will be running sendmail.
You're high. Building a massive production email system on Sendmail 8 is slow-motion suicide. If the security holes don't get you, the terrible configuration methods and complete lack of scalability will, never mind the fact that Sendmail Inc is trying desperately to replace the product.
"Most managable with [...] heavy customization?" I'd laugh if I wasn't crying. And I'm crying because I used to work for a company that deployed a massively customized sendmail infrastructure -- and I was one of the poor bastards who had to maintain it. Trust me, you don't want to do this. Ever.
Yes, milter is cool. No, it's not cool enough to justify burning CPU cycles on sendmail in 2005.
Even Sendmail Inc tacitly admits that Sendmail's design is garbage: take a look at the design document for Sendmail X, and note carefully how much it resembles Postfix and Qmail. There are very good reasons for this.
News for Nerds. Stuff that Matters? Like hell.
The Plan 9 OS has a filesystem that does just this. I think it was called Venti. Basically, it hashes the data blocks on the filesystem and stores each unique block only once. There was (is?) a project to port the filesystem to Linux.
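The idea is easy to sketch. This is a toy illustration of hash-addressed block storage, not Venti's actual format: the class name, the fixed 4096-byte block size, and the choice of SHA-256 are all my assumptions.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: each unique block is kept exactly once."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}   # digest -> block contents

    def put(self, data: bytes) -> list[str]:
        """Split data into fixed-size blocks and return the list of block digests."""
        keys = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            key = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(key, block)   # no-op if the block already exists
            keys.append(key)
        return keys

    def get(self, keys: list[str]) -> bytes:
        """Reassemble the original data from its block digests."""
        return b"".join(self.blocks[k] for k in keys)


if __name__ == "__main__":
    store = BlockStore()
    # 8192 bytes of non-repeating data: two distinct 4096-byte blocks.
    attachment = b"".join(i.to_bytes(4, "big") for i in range(2048))
    first = store.put(attachment)    # the same attachment delivered to one user...
    second = store.put(attachment)   # ...and then to another
    print(len(store.blocks), first == second)
```

Storing the same attachment for a second recipient adds no new blocks, only another list of digests. For mail, that means a large attachment sent to a thousand mailboxes costs about one copy of the data.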
- Raynet --> .
Of course nearly everyone who uses it hates it, because it seems unnecessarily complicated. But this is precisely the kind of situation Domino was designed to handle: scaling. If you can get by with Sendmail, you don't need or want Domino, but if you want to manage a million email accounts, this is one of the first places I'd look.
This is exactly what Notes was designed to do: scale. People have been building systems on this scale with Notes for nearly twenty years. Not only can you scale it by moving parts of your email system onto mainframe-class iron, but you can also distribute it and build all kinds of flexibility and redundancy into your system to meet virtually any messaging requirement (e.g. choose an alternate MTA for high-priority traffic when there are Internet disruptions). Naturally there's some complexity involved, but if you can get by with sendmail you probably shouldn't be using Notes.
What's more important is the management of accounts and identity, which is distributed, delegatable, and backed by robust cryptographic certificate management. You can let a subsidiary manage its own accounts; they can subdelegate that to a division, and the division can subdelegate that to the IT staff on site. At each level, policies can be set, enforced, and changed for lower levels.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.