Infrastructure for One Million Email Accounts?
cfsmp3 asks: "I have been asked to define the infrastructure for the email system of a huge company which, fed up with Exchange, wants to replace its entire system with something non-Microsoft. I have done this before, but not for anything of this scale. Suppose you are given a chance to build from scratch an email system that has to support around one million accounts. Some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, 99.9% uptime is expected... where would you start?"
Gmail is beta.
:)
Gmail does not have guaranteed uptime.
You do not pin your company's communications system on something you cannot sign an SLA for.
Need I go on?
And please don't forget to use Maildir for email storage, it's very good for backup and very easy to manage.
http://www.michel.eti.br
Seriously. If high availability systems are not your company's core competency, call IBM, Red Hat, Sun, Oracle, Novell. Tell them you have a million users. Tell them you have a very fat checkbook and that you want them to provide you with a complete solution. Tell them that nothing but 5 nines of uptime will do.
DO NOT implement a half-assed solution. Unless you really know what you're doing (and if you did, you wouldn't be asking this question), don't assume that a million Linux servers strewn about a million offices and data centers is the best solution, even if it is easiest to set up and administer. Maybe it is; come up with a proposal with hard numbers and see how they compare to the vendors. A million dollars spent on a Sun E10000, an Oracle Grid subscription (scales perfectly, right?), or a million IBM engineers flown into your site when an emergency happens may be worth paying for.
Actually, I was going to use "Obviously" as my subject line... so I'll just respond to yours.
I work with Exchange, and think that the chances are better that they just had shitty architecture to begin with. Exchange is a great platform and scales well, so if the original people wouldn't do it, well then f*ck em.
Still convinced to migrate? Well, something with multiple datacenters, large scale, a compressed SAN backend, and a lot of clustering will do it. Shit, you could do the entire thing with MySQL if you REALLY wanted to. Moving the existing data over will be a huge pain no matter what you migrate to, though.
My suggestion? Don't just jump off Exchange. Do a proper requirements analysis and you might find it is a lot cheaper to just redesign the existing architecture.
This is the best advice he'll get? Sheesh.
Think this through -- a lot of e-mail programs check every 20 minutes. Assuming I actually hit any without duplications, I could potentially need 400 minutes or over six hours to get all my mail. Since it's random, it could take days.
And that's just for starters with this lame scheme. If I want to check mail, say, from the field on a dial-up once a day... hopefully you can see how badly this would suck.
What the guy should do is buy an e-mail system that can handle 1,000,000 users and not screw around trying to chewing-gum together his own solution.
Sometimes it's best to just let stupid people be stupid.
Resign. You're obviously in way over your head if you have to resort to asking Slashdot readers for advice like this.
if you need another reason not to use qmail, this is a good one.
A single server? For one million users?
Insert "imagine a Beowulf cluster of those" joke here, except it isn't a joke.
I think you might be underestimating the requirements for this large a project that "must scale perfectly". The "99.9% uptime is expected" requirement alone requires multiple internet connections, a large cluster of front end servers, and redundant database servers, preferably located in different states. (i.e., "What do you mean our only server is in New Orleans?")
I don't think the average Dell dual Xeon box is up to the task for this large a project...
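To put the uptime figures in this thread in perspective, here is a rough back-of-the-envelope sketch in Python. The function name and the 365-day year are my own assumptions, not anything from the original question.

```python
def allowed_downtime_hours(uptime_percent, days=365):
    """Hours of downtime per period that still meet the uptime target."""
    return (1 - uptime_percent / 100) * days * 24


# Compare "three nines" (the stated requirement) with "five nines".
for target in (99.9, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% uptime allows {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")
```

Three nines leaves a bit under nine hours of downtime a year; five nines leaves roughly five minutes. That budget has to cover planned maintenance as well as failures, which is why redundancy at every layer matters.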
Tequila: It's not just for breakfast anymore!
All of the parent's suggestions are decent, but there are a few alternatives that may make sense:
-Cyrus IMAP, while a monster to build and configure, can handle a pretty heavy load, and the latest versions can handle a lot of load-balancing internally.
-Exim's nice. I'm a Postfix man, myself. Sendmail is king, though. I'm not going to claim to like it, but it's up to the task, and there's something to be said for using a standard tool.
-While things like MD4 are okay for hashing, they're kind of CPU-intensive. Consider something like "second and third letter of username" that takes less CPU time. The right answer here depends a lot on the relative speed of CPU versus disk. If you can get dedicated hardware to do this (rare, but it exists), use whatever hashing the hardware supports.
-Consider some sort of cache (maybe even separate machines) between incoming SMTP and SpamAssassin/ClamAV. When the 2am spam run hits, your incoming SMTP machines can become overloaded. The downside: deciding what to do with mail that's not rejected the moment it's received.
-Set up a "mail machine" configuration with whatever OS and tools you use, and make it possible to create a disk image quickly. You're going to need a lot of hardware, which means that you'll have enough random failures to make building machines by hand impractical. This also means "have at least one extra built machine/disk array/etc. powered-on and waiting at all times" for those 4am hardware failures.
-You may find that things like NFS just aren't fast enough. Be ready to look at SAN or shared "direct-looking" storage. The tough part: this is hard to discover during testing. It may be overkill, but don't lock it out as a possibility.
-I/O is king. CPU speed won't matter as much as bus speed, disk speed, and memory speed. This is why a lot of companies use banks of big proprietary unix machines for their mail, even if they use commodity PCs elsewhere.
-I don't trust hardware load balancers. Sometimes they're necessary (and they do make life better when they work), but they're a big single point of failure. Consider other ways to split the load, or at least ways to work around the load balancer if it should fail. The Cyrus aggregator can handle some of this.
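The hashing point in the list above can be sketched in Python. This is only an illustration under my own assumptions: the function names are made up, MD5 stands in for MD4 (which is not reliably available in Python's hashlib), and 256 buckets is an arbitrary choice.

```python
import hashlib
from collections import Counter


def letter_bucket(username):
    """Cheap bucket: the second and third letters of the username."""
    # Pad short names with "_" so every name yields two characters.
    return username.lower().ljust(3, "_")[1:3]


def hash_bucket(username, buckets=256):
    """Hash bucket: costs more CPU, spreads names more evenly.

    MD5 is a stand-in here for whatever hash the site actually picks.
    """
    digest = hashlib.md5(username.encode("utf-8")).digest()
    return digest[0] % buckets


# Hypothetical sample names, just to show how each scheme distributes them.
sample = ["alice", "alan", "albert", "bob", "carol"]
print(Counter(letter_bucket(u) for u in sample))
print(Counter(hash_bucket(u) for u in sample))
```

The letter scheme is nearly free but inherits whatever skew real usernames have; the hash scheme evens that out at the price of a digest per lookup. Which one wins depends, as the post says, on whether CPU or disk is the scarcer resource.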
Forward, retransmit, or republish anything I say here. Just don't misquote me.
Since you've taken things off topic, I'll grab the wheel and pull it right off a cliff.
The reason Exchange uses a database can be summed up in three words: Single Instance Store.
Say you send one 1MB Word document to 100 of your colleagues. In a relational database-based, Single Instance Store-driven mail server, that document takes up exactly 1MB on the server. If somebody in the organization forwards the Word doc to the remaining 900 people in your organization, how much space does it take on the server? 1MB.
Send a 1MB document to 1000 users on a flat, mbox-style mail server, and how much space is taken up on the server? 1000MB.
I see your point about some things, sure. Being able to jump in and restore a mailbox from tape by just dumping a folder somewhere is nice, but it just doesn't scale in terms of storage the way a db-driven mail system does.
Don't flame me as an MS advocate. There are times when an SIS-based email system is good, and there are times when a flat email system is good. I've run Exchange environments for 500+ people, and I've run Linux-based mail systems for 1000+ people. I'm just saying that your particular argument is one-sided and flawed.
Or you could just use a filesystem that supports hard-linking files (see: man ln), so you do not have to worry about that even when using a filesystem for this purpose. Since such a file is read-only, it could just be linked into all of those people's mailboxes. If you do not know what a hard link is, it is basically the same thing you are describing, except done in the filesystem and handled transparently by the kernel. Basically, every "file" you see in an Ext 2/3 filesystem is really just a pointer to where the file is stored, and any actual file can have as many of these links as you want. When there are no remaining links to a file, it is allowed to be deleted.
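A minimal sketch of the hard-link idea in Python, assuming one message file per mailbox directory. The helper name, file name and user names are made up for illustration; a real delivery agent would do considerably more.

```python
import os
import tempfile


def deliver_once(message, mailbox_dirs, filename="msg1.eml"):
    """Write the message once, then hard-link it into the other mailboxes."""
    first = os.path.join(mailbox_dirs[0], filename)
    with open(first, "wb") as fh:
        fh.write(message)
    for box in mailbox_dirs[1:]:
        # os.link creates a second directory entry for the same data.
        os.link(first, os.path.join(box, filename))
    return first


with tempfile.TemporaryDirectory() as root:
    boxes = []
    for user in ("alice", "bob", "carol"):
        path = os.path.join(root, user)
        os.mkdir(path)
        boxes.append(path)

    first = deliver_once(b"x" * 1024, boxes)
    link_count = os.stat(first).st_nlink
    inodes = {os.stat(os.path.join(b, "msg1.eml")).st_ino for b in boxes}
    print("link count:", link_count)
    print("distinct inodes:", len(inodes))
```

One caveat: hard links only work within a single filesystem, so this only saves space for recipients whose mailboxes live on the same volume.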
Centralization breaks the internet.
There is absolutely no reason at all to leave 80% free space, 15% is more than enough to ensure you don't have fragmentation problems (I am assuming you are using a reasonable filesystem of course).
Second, people with ridiculously frequent mail check times are not any more of a problem. Modern operating systems use file system caches. You do not have to touch the disk subsystem in any way; frequently accessed data will be in RAM.
And finally, a database has a lot of extra overhead, and there are a lot of deletes going on. Sure, such a select statement would work, but reading the files in one directory is an order of magnitude faster. And the deletes will really hammer your database. FFS+softupdates makes file deletion extremely fast. A relational database is not the answer for everything; stop trying to pretend it is. Use the right tool for the job, and for storing files, a filesystem is the right tool. It's not relational data, it doesn't need to be queried in arbitrary, complex ways, so it doesn't belong in a relational database.
what is so bad about POP3
Having never been near a computer, I have no idea. If I had to guess, I'd suppose that with a million users, 100,000 of them will have to be constantly reminded to delete their mail off the servers. 25,000 of them won't EVER delete their mail no matter what you do, and 5,000 will bitch and whine when you cap their fucking mailboxes. One of them will be the CEO, and he'll berate you in front of his smarmy suspender-wearing jerkoff golf buddies because you're a dumb hick that can't fit a terabyte of mp3s and porn (most of it redundant for chrissakes) into only 500 gigs of disk. You will also get to deal with countless issues involving different email clients. You would give almost anything to have a massive natural disaster wipe everything out so you didn't have to go to work tomorrow, but there's the wife and kids, so y'know, there it is.
my recommendations:
- Calculate with about 20-30 man days for the initial design. You'll need some software development for about 30-50 man days, and 100 man days for setup, testing and fine tuning. Figures may vary with skill and LWF. Time for integration into your backup service is not included.
- Use a directory service with a replication mechanism (LDAP preferred; we've done it with MySQL too). Every system except the load balancers will get a replica.
- The user data is stored on machines with Cyrus. Depending on machine size, user profile, mbox size, etc., you take between 5,000 and 50,000 users per system.
- The directory service knows which user is on which system. Prepare a script to move users from one server to another (including the mbox).
- Incoming IMAP connects go through a load balancer to frontend systems with the perdition proxy. Those will relay the requests, according to the directory, to the responsible IMAP server.
- Incoming HTTP requests will go through the load balancer to an Apache with SquirrelMail on the frontend systems. Those will convert the requests into IMAP requests and connect to the local perdition.
- Generate a web frontend for the user to set up auto reply, vacation and anti-spam settings.
- From those settings you can create SIEVE scripts for the user.
- Incoming and outgoing SMTP traffic is handled by systems with sendmail. Local delivery is handled by LMTP connects directly to the IMAP servers (cyrus can handle LMTP).
- Antivirus and antispam are handled through the milter interface and appropriate plugins. Plan for individual settings per user (can be generated from the data in the directory server).
- Load balancing SMTP is trivial.
- Add monitoring (e.g. Nagios), Backup and Restore (last one most important, nobody wants backup, all everyone wants is restore).
- If desired, use a cluster file system for those IMAP servers to have even more redundancy.
- Make sure you have access to the internal DNS of your company. If you can set up "mail.acmecompany.com" to point to several IPs (depending on location), this may ease your job a lot. If you cannot, this may be hard (and expensive) for your load balancers.
- You can scale everything horizontally in this concept. The choke point may be the load balancers.
- You can distribute the system easily onto several locations. Distribution over several continents is only recommended if you can either manage the DNS or the mail agent settings per continent.
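The directory-driven routing in the list above can be sketched in a few lines of Python. This is a toy under stated assumptions: an in-memory dict stands in for the replicated LDAP directory, the class, function and host names are all made up, and the "move" step only repoints the user (a real script would copy the mailbox first).

```python
import math


class Directory:
    """In-memory stand-in for the replicated directory service."""

    def __init__(self):
        self._backend_of = {}

    def assign(self, user, backend):
        self._backend_of[user] = backend

    def lookup(self, user):
        return self._backend_of[user]

    def move(self, user, new_backend):
        """Repoint a user to another backend and return the old one."""
        old = self._backend_of[user]
        self._backend_of[user] = new_backend
        return old


def route(directory, user):
    """What the frontend proxy does: pick the IMAP backend for a login."""
    return directory.lookup(user)


def backends_needed(total_users, users_per_backend):
    """Number of backend machines for a given per-machine user count."""
    return math.ceil(total_users / users_per_backend)


directory = Directory()
directory.assign("alice", "imap01.example.invalid")
directory.move("alice", "imap02.example.invalid")
print(route(directory, "alice"))
print(backends_needed(1_000_000, 50_000), "to",
      backends_needed(1_000_000, 5_000), "backends")
```

With the 5,000 to 50,000 users per system quoted above, a million accounts works out to somewhere between 20 and 200 backend machines, before counting frontends, SMTP hosts and spares.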
Please forgive me if I'm not completely correct. I'm only the sales rep.
With backup support you should be able to set up such a system in 6 to 12 months (the latter is more realistic for big companies).
Most probably users will complain about the lack of a calendar.
Most troublesome will be the migration phase (hope you realized I didn't mention it above). This depends so much on your current scenario that it is very difficult to give general advice.
> where would you start?
Contacting me ;-). Perhaps get a budget first. As I said, I'm sales....
Regards, Martin
One BIG issue between what people are running now and what they will HAVE to run soon is the little item of SOX compliance. Be VERY careful that your little million-user mail system is compliant, or the implementation costs will double. Believe me, I do this for a living and just saw one of our financial clients get stung big time.
*--- Sometimes a majority only means that all the fools are on the same side. ---*