Infrastructure for One Million Email Accounts?
cfsmp3 asks: "I have been asked to define the infrastructure for the email system of a huge company which, fed up with Exchange, wants to replace its entire system with something non-Microsoft. I have done this before, but not at anything like this scale. Suppose you are given a chance to build, from scratch, an email system that has to support around one million accounts. Some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, and 99.9% uptime is expected... where would you start?"
Gmail is beta.
:)
Gmail does not have guaranteed uptime.
You do not pin your company's communications system on something you cannot sign an SLA with.
Need I go on?
And please don't forget to use Maildir for email storage; it's very good for backup and very easy to manage.
http://www.michel.eti.br
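Part of why Maildir is easy to back up and manage is its delivery protocol: each message is its own file, written into tmp/ and then renamed into new/, so no locking of a shared mbox file is needed. A minimal sketch of that delivery step, assuming a POSIX filesystem where rename() within one filesystem is atomic; the unique-name format only loosely follows the Maildir convention (Python's standard `mailbox.Maildir.add()` does an equivalent delivery for you):

```python
import itertools
import os
import socket
import time

_deliveries = itertools.count()  # per-process counter to keep names unique


def maildir_deliver(maildir: str, message: bytes) -> str:
    """Deliver one message into a Maildir via the tmp/ -> new/ rename."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Unique name loosely following the Maildir convention: time.M<usec>P<pid>Q<n>.host
    now = time.time()
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = f"{int(now)}.M{int(now % 1 * 1e6)}P{os.getpid()}Q{next(_deliveries)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # 1. Write the whole message into tmp/ and force it to disk.
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    # 2. rename() is atomic within one filesystem, so a reader scanning new/
    #    sees either no file or the complete message, never a partial one.
    os.rename(tmp_path, new_path)
    return new_path
```

Because every message is a separate file that never changes after delivery, incremental backups only have to pick up new files.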
Seriously. If high-availability systems are not your company's core competency, call IBM, Red Hat, Sun, Oracle or Novell. Tell them you have a million users. Tell them you have a very fat checkbook and that you want them to provide you with a complete solution. Tell them that nothing but five nines of uptime will do.
DO NOT implement a half-assed solution. Unless you really know what you're doing (and if you did, you wouldn't be asking this question), don't assume that a million Linux servers strewn about a million offices and data centers is the best solution, even if it is the easiest to set up and administer. Maybe it is; come up with a proposal with hard numbers and see how it compares to the vendors' offers. A million dollars spent on a Sun E10000 and an Oracle Grid subscription (scales perfectly, right?), or a million IBM engineers flown to your site when an emergency happens, may be worth paying for.
Actually, I was going to use "Obviously" as my subject line... so I'll just respond to yours.
I work with Exchange, and I think the chances are better that they just had a shitty architecture to begin with. Exchange is a great platform and scales well, so if the original people couldn't make it work, well then f*ck 'em.
Still convinced you should migrate? Well, something with multiple datacenters, large scale, a compressed SAN backend, and a lot of clustering will do it. Shit, you could do the entire thing with MySQL if you REALLY wanted to. Moving the existing data over will be a huge pain no matter what you migrate to, though.
My suggestion? Don't just jump off Exchange; do a proper requirements analysis, and you might find it is a lot cheaper to just redesign the existing architecture.
This is the best advice he'll get? Sheesh.
Think this through: a lot of e-mail programs check every 20 minutes. Even assuming each check hit a different server, with no duplicates, I could need 400 minutes, or over six hours, to get all my mail. Since the server is picked at random, it could take days.
And that's just for starters with this lame scheme. If I want to check mail, say, from the field on a dial-up once a day... hopefully you can see how badly this would suck.
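The "it could take days" point is the coupon collector's problem. The scheme being criticized isn't quoted here; this sketch assumes it spreads a user's mail over 20 servers (inferred from 400 minutes / 20-minute interval) and that each check lands on one of them uniformly at random. The expected number of checks to have visited all n servers at least once is n times the n-th harmonic number:

```python
from fractions import Fraction


def expected_checks(servers: int) -> Fraction:
    """Expected number of uniformly random checks until every server has
    been hit at least once (coupon collector): n * (1 + 1/2 + ... + 1/n)."""
    return servers * sum(Fraction(1, k) for k in range(1, servers + 1))


SERVERS = 20       # assumption inferred from the post: 400 min / 20 min
INTERVAL_MIN = 20  # polling interval from the post

best_case = SERVERS * INTERVAL_MIN
average = float(expected_checks(SERVERS)) * INTERVAL_MIN
print(f"best case: {best_case} min, expected: {average:.0f} min ({average / 60:.1f} h)")
```

With these assumptions the expected figure is about 72 checks, roughly 24 hours of polling, versus the 400-minute best case, and the unlucky tail is longer still.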
What the guy should do is buy an e-mail system that can handle 1,000,000 users, and not screw around trying to hold his own solution together with chewing gum.
Sometimes it's best to just let stupid people be stupid.
Resign. You're obviously in way over your head if you have to resort to asking Slashdot readers for advice like this.
I was curious about that, too...
Wal-mart has an estimated 1.6 million employees. (source)
General Motors, by contrast, has approximately 360,000 employees.
The post says "around one million accounts" which is very different from one million employees. I have over ten email accounts that I actively use for receiving mail and four to six for sending.
An ISP could easily have millions of accounts. But since he said "huge" company, they were using Exchange, and because he's asking Slashdot my guess is that he's not at an ISP. Instead, I'd guess he's at a medium-sized company that might offer email accounts to its customers or at a large company that also contains many subsidiaries (but wants one email domain for all of those).
If you need another reason not to use qmail, this is a good one.
A single server? For one million users?
Insert "imagine a Beowulf cluster of those" joke here, except it isn't a joke.
I think you might be underestimating the requirements for a project this large that "must scale perfectly". The "99.9% uptime is expected" requirement alone calls for multiple internet connections, a large cluster of front-end servers, and redundant database servers, preferably located in different states (i.e., "What do you mean our only server is in New Orleans?").
I don't think the average Dell dual Xeon box is up to the task for this large a project...
Tequila: It's not just for breakfast anymore!
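To put the uptime numbers from this thread in concrete terms (99.9% in the question, "5 nines" suggested earlier), here is the yearly downtime budget for each target, plus the textbook formula for redundant components. A minimal sketch; the redundancy formula assumes failures are independent, which shared power, shared software bugs or a single hurricane can easily violate:

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def downtime_hours_per_year(availability: float) -> float:
    """Downtime budget per year for an availability target such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant, independent components when the
    service stays up as long as any one of them is up."""
    return 1.0 - (1.0 - single) ** copies


for label, target in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
    hours = downtime_hours_per_year(target)
    print(f"{label}: {hours:.2f} h/year = {hours * 60:.1f} min/year")

# Example: two independent uplinks that are each up 99% of the time.
print(f"{parallel_availability(0.99, 2):.4%}")
```

99.9% allows about 8.8 hours of downtime a year, while five nines allows only about 5 minutes, which is why the redundancy described above is not optional.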
All of the parent's suggestions are decent, but there are a few alternatives that may make sense:
-Cyrus IMAP, while a monster to build and configure, can handle a pretty heavy load, and the latest versions can handle a lot of load-balancing internally.
-Exim's nice. I'm a Postfix man, myself. Sendmail is king, though. I'm not going to claim to like it, but it's up to the task, and there's something to be said for using a standard tool.
-While things like MD4 are okay for hashing, they're kind of CPU-intensive. Consider something like "second and third letter of username" that takes less CPU time. The right answer here depends a lot on the relative speed of CPU versus disk. If you can get dedicated hardware to do this (rare, but it exists), use whatever hashing the hardware supports.
-Consider some sort of cache (maybe even separate machines) between incoming SMTP and SpamAssassin/ClamAV. When the 2am spam run hits, your incoming SMTP machines can become overloaded. The downside: deciding what to do with mail that's not rejected the moment it's received.
-Set up a "mail machine" configuration with whatever OS and tools you use, and make it possible to create a disk image quickly. You're going to need a lot of hardware, which means that you'll have enough random failures to make building machines by hand impractical. This also means "have at least one extra built machine/disk array/etc. powered-on and waiting at all times" for those 4am hardware failures.
-You may find that things like NFS just aren't fast enough. Be ready to look at SAN or shared "direct-looking" storage. The tough part: this is hard to discover during testing. It may be overkill, but don't lock it out as a possibility.
-I/O is king. CPU speed won't matter as much as bus speed, disk speed, and memory speed. This is why a lot of companies use banks of big proprietary unix machines for their mail, even if they use commodity PCs elsewhere.
-I don't trust hardware load balancers. Sometimes they're necessary (and they do make life better when they work), but they're a big single point of failure. Consider other ways to split the load, or at least ways to work around the load balancer if it should fail. The Cyrus aggregator can handle some of this.
Forward, retransmit, or republish anything I say here. Just don't misquote me.
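The hashing tip above is about spreading a million mailbox directories across subdirectories so no single directory gets huge. A minimal sketch of both schemes mentioned; MD4 is often unavailable in Python's hashlib on modern OpenSSL builds, so this swaps in MD5 (used only for distribution, not security), and the bucket depth of two levels is an assumption:

```python
import hashlib
import os


def path_by_letters(root: str, user: str) -> str:
    """Cheap scheme: bucket by the 2nd and 3rd letters of the username."""
    name = (user.lower() + "__")[:3]  # pad names shorter than 3 characters
    return os.path.join(root, name[1], name[2], user)


def path_by_digest(root: str, user: str) -> str:
    """Digest scheme: 256 x 256 buckets from the first 4 hex digits of MD5."""
    digest = hashlib.md5(user.lower().encode("utf-8"), usedforsecurity=False).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], user)


for u in ["alice", "albert", "alan", "bob"]:
    print(path_by_letters("/var/spool/mail", u), path_by_digest("/var/spool/mail", u))
```

The trade-off is the one the poster describes: the letter scheme costs almost no CPU but inherits the uneven distribution of letters in real usernames (note how three of the sample names land under "l"), while the digest spreads users evenly at the cost of computing a hash per lookup.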
What does Hotmail run these days?
I am under the impression that if Hotmail were running clusters of Exchange servers, Microsoft would be quite vocal about the enterprise scalability of Exchange.
Since you've taken things off topic, I'll grab the wheel and pull it right off a cliff.
The reason Exchange uses a database can be summed up in three words: Single Instance Store.
Say you send one 1MB Word document to 100 of your colleagues. In a relational database-based, Single Instance Store-driven mail server, that document takes up exactly 1MB on the server. If somebody in the organization forwards the Word doc to the remaining 900 people in your organization, how much space does it take on the server? 1MB.
Send a 1MB document to 1000 users on a flat, mbox-style mail server, and how much space is taken up on the server? 1000MB.
I see your point about some things, sure. Being able to jump in and restore a mailbox from tape by just dumping a folder somewhere is nice, but it just doesn't scale in terms of storage the way a db-driven mail system does.
Don't flame me as an MS advocate. There are times when an SIS-based email system is good, and there are times when a flat email system is good. I've run Exchange environments for 500+ people, and I've run Linux-based mail systems for 1000+ people. I'm just saying that your particular argument is one-sided and flawed.
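The Single Instance Store arithmetic above can be illustrated with a toy content-addressed store: each distinct body is kept once under its digest, and mailboxes only hold references. A minimal sketch, not how Exchange is actually implemented; real messages have per-recipient headers, so a real SIS deduplicates the shared body or attachment rather than whole messages:

```python
import hashlib
from typing import Dict, List


class SingleInstanceStore:
    """Toy single-instance store: each distinct body is stored once, keyed by
    its SHA-256 digest; each mailbox holds only a list of digests."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}           # digest -> content
        self.mailboxes: Dict[str, List[str]] = {}   # user -> digests

    def deliver(self, user: str, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        self.blobs.setdefault(digest, body)          # store only the first copy
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def bytes_stored(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())

    def bytes_if_flat(self) -> int:
        """What a flat mbox/Maildir server would store: one copy per recipient."""
        return sum(len(self.blobs[d]) for refs in self.mailboxes.values() for d in refs)


store = SingleInstanceStore()
doc = b"x" * 1_000_000  # stand-in for the 1MB Word document
for i in range(1000):
    store.deliver(f"user{i}", doc)
print(store.bytes_stored(), store.bytes_if_flat())  # 1000000 1000000000
```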
Or you could just use a filesystem that supports hard-linking files (see: man ln), so you do not have to worry about that even when using a filesystem for this purpose. Since such a file is read-only, it could just be linked into all of those people's mailboxes. If you do not know what a hard link is, it is basically the same thing you are describing, except done in the filesystem and handled transparently by the kernel. Basically, every "file" you see in an Ext2/3 filesystem is really just a pointer to where the file is stored, and any actual file can have as many of these links as you want. When there are no remaining links to a file, it is allowed to be deleted.
Centralization breaks the internet.
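The hard-link approach described above can be demonstrated in a few lines: one delivered file, linked into two more "mailboxes", gives three names but one copy of the data. A minimal sketch assuming a POSIX-style filesystem; note that hard links only work within a single filesystem (`os.link` fails across devices), and that Maildir flag changes are renames of one directory entry, which do not affect the other users' links:

```python
import os
import tempfile

# Hypothetical mail root with three mailboxes.
root = tempfile.mkdtemp()
for user in ("alice", "bob", "carol"):
    os.makedirs(os.path.join(root, user))

# Deliver the message once...
original = os.path.join(root, "alice", "msg1")
with open(original, "wb") as f:
    f.write(b"Subject: report\r\n\r\n" + b"x" * 1024)

# ...then hard-link it into the other mailboxes: new names, same inode.
for user in ("bob", "carol"):
    os.link(original, os.path.join(root, user, "msg1"))

print("link count:", os.stat(original).st_nlink)  # 3 names, one copy of the data

# Deleting alice's copy only removes one name; the data survives for bob and carol.
os.unlink(original)
bob_copy = os.path.join(root, "bob", "msg1")
print("bob still has it:", os.path.exists(bob_copy))
print("link count now:", os.stat(bob_copy).st_nlink)
```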
> Say you send one 1MB Word document to 100 of your colleagues. In a relational database-based, Single Instance Store-driven mail server, that document takes up exactly 1MB on the server. If somebody in the organization forwards the Word doc to the remaining 900 people in your organization, how much space does it take on the server? 1MB. Send a 1MB document to 1000 users on a flat, mbox-style mail server, and how much space is taken up on the server? 1000MB.
Speaking of which, is there any filesystem around that "automagically" detects redundancy and avoids storing the same data twice (i.e. two files with the same content end up being stored only once)? (I don't mean hardlinks. Suppose I download some file for the second time without knowing the first instance exists). I suspect this would add a lot of overhead to the filesystem driver, but it'd certainly be a cool feature.
Score: i, Imaginary
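What the question above describes is block-level deduplication, which some later filesystems offer (ZFS's dedup property, for example). The core idea fits in a toy: split files into blocks, store each distinct block once under its hash, and represent a file as a list of block hashes. A minimal sketch assuming fixed 4 KiB blocks; it also shows where the suspected overhead comes from, since every write hashes every block and the digest table has to live somewhere:

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4096  # assumption: fixed-size 4 KiB blocks


class DedupStore:
    """Toy block-level deduplication: each distinct block is stored once,
    keyed by its SHA-256 digest; a file is just a list of block digests."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.files: Dict[str, List[str]] = {}

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # keep only the first copy
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.blocks.values())


store = DedupStore()
payload = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))  # 4 distinct blocks
store.write("download-1.bin", payload)
store.write("download-2.bin", payload)  # the same file downloaded a second time
print(len(payload) * 2, store.physical_bytes())  # 32768 16384
```

Fixed-size blocks catch exact duplicates like the second download; data shifted by even one byte no longer lines up on block boundaries, which is why some dedup systems use content-defined chunking instead.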
Current mailserver system I designed and built is hosting 80,000 email accounts, and will scale out to a million quite cheaply by just adding more machines.
80,000 is trivial. I was running a 12-node system with 87,000 users 12 years ago, on hardware that was slower than a PlayStation.
Going from 100,000 to 1,000,000 isn't just ten times harder; you get into the area where a four-sigma system that works with few problems at 100k users dies horribly at 1000k users. There is a threshold where, instead of one broken machine being unusual, at least one machine is always broken, and it will often be broken in a way that is hard to diagnose.
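The "at least one machine is always broken" threshold follows from simple probability. A minimal sketch, assuming machines fail independently and that the machine count grows roughly in proportion to the user count; the 0.5% per-machine downtime figure is hypothetical:

```python
def p_any_broken(machines: int, p_down: float) -> float:
    """Probability that at least one of `machines` independent machines is
    down at a given moment, if each one is down with probability p_down."""
    return 1.0 - (1.0 - p_down) ** machines


P_DOWN = 0.005  # hypothetical: each box is down 0.5% of the time
for n in (10, 100, 1000):
    print(n, round(p_any_broken(n, P_DOWN), 3))
```

With these numbers, 10 machines have something broken about 5% of the time, 100 machines about 39% of the time, and 1000 machines over 99% of the time: at that scale "something is broken" is the normal state, and operations have to be designed around it.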
There is absolutely no reason to leave 80% free space; 15% is more than enough to avoid fragmentation problems (assuming you are using a reasonable filesystem, of course).
Second, people with ridiculously frequent mail-check intervals are not a problem either. Modern operating systems use filesystem caches. You do not have to touch the disk subsystem at all; frequently accessed data will be in RAM.
And finally, a database has a lot of extra overhead, and there are a lot of deletes going on. Sure, such a SELECT statement would work, but reading the files in one directory is an order of magnitude faster. And the deletes will really hammer your database, whereas FFS+softupdates makes file deletion extremely fast. A relational database is not the answer for everything; stop trying to pretend it is. Use the right tool for the job, and for storing files, a filesystem is the right tool. It's not relational data and it doesn't need to be queried in arbitrary, complex ways, so it doesn't belong in a relational database.
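For comparison with the SELECT-and-DELETE approach, here is what the filesystem side looks like for a Maildir: listing a mailbox is one directory scan of new/ and cur/, and deleting a message is a single unlink with no index or table to update. A minimal sketch assuming the standard Maildir layout:

```python
import os
from typing import List


def list_messages(maildir: str) -> List[str]:
    """Return message paths in a Maildir: unseen mail in new/ plus cur/."""
    paths: List[str] = []
    for sub in ("new", "cur"):
        with os.scandir(os.path.join(maildir, sub)) as entries:
            paths.extend(e.path for e in entries
                         if e.is_file() and not e.name.startswith("."))
    return sorted(paths)


def delete_message(path: str) -> None:
    """Deleting a message is one unlink; nothing else has to be updated."""
    os.unlink(path)
```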
What is so bad about POP3?
Having never been near a computer, I have no idea. If I had to guess, I'd suppose that with a million users, 100,000 of them will have to be constantly reminded to delete their mail off the servers. 25,000 of them won't EVER delete their mail no matter what you do, and 5,000 will bitch and whine when you cap their fucking mailboxes. One of them will be the CEO, and he'll berate you in front of his smarmy suspender-wearing jerkoff golf buddies because you're a dumb hick that can't fit a terabyte of mp3s and porn (most of it redundant for chrissakes) into only 500 gigs of disk. You will also get to deal with countless issues involving different email clients. You would give almost anything to have a massive natural disaster wipe everything out so you didn't have to go to work tomorrow, but there's the wife and kids, so y'know, there it is.
My recommendations:
- Plan on about 20-30 man-days for the initial design. You'll need about 30-50 man-days of software development, and 100 man-days for setup, testing and fine-tuning. Figures may vary depending on skill and LWF. Time for integration into your backup service is not included.
- Use a directory service with a replication mechanism (LDAP preferred; we've done it with MySQL too). Every system except the load balancers gets a replica.
- The user data is stored on machines running Cyrus. Depending on machine size, user profile, mailbox size etc., you put between 5,000 and 50,000 users on each system.
- The directory service knows which user is on which system. Prepare a script to move users from one server to another (including the mailbox).
- Incoming IMAP connections go through a load balancer to frontend systems running the perdition proxy. These relay the requests, according to the directory, to the responsible IMAP server.
- Incoming HTTP requests go through the load balancer to an Apache with SquirrelMail on the frontend systems. These convert the requests into IMAP requests and connect to the local perdition.
- Build a web frontend for users to set up auto-reply, vacation and anti-spam settings.
- From those settings you can generate Sieve scripts for each user.
- Incoming and outgoing SMTP traffic is handled by systems running sendmail. Local delivery is done over LMTP connections directly to the IMAP servers (Cyrus can handle LMTP).
- Antivirus and antispam are handled through the milter interface and appropriate plugins. Plan for individual settings per user (these can be generated from the data in the directory server).
- Load balancing SMTP is trivial.
- Add monitoring (e.g. Nagios), backup and restore (the last one is the most important: nobody wants backup, all everyone wants is restore).
- If desired, use a cluster filesystem for the IMAP servers to get even more redundancy.
- Make sure you have access to your company's internal DNS. If you can set up "mail.acmecompany.com" to point to several IPs (depending on location), this may ease your job a lot. If you cannot, this may be hard (and expensive) for your load balancers.
- You can scale everything horizontally in this design. The choke point may be the load balancers.
- You can easily distribute the system across several locations. Distribution over several continents is only recommended if you can manage either the DNS or the mail client settings per continent.
Please forgive me if I'm not completely correct; I'm only the sales rep. With backup support you should be able to set up such a system in 6 to 12 months (the latter is more realistic for big companies).
Most likely users will complain about the missing calendar.
The most troublesome part will be the migration phase (I hope you noticed I didn't mention it above). It depends so much on your current setup that it is very difficult to give general advice.
> where would you start?
By contacting me ;-). Perhaps get a budget first. As I said, I'm sales...
Regards, Martin
What you have here is an opportunity for a tremendous open source win against exchange, and you are about to stuff it up because you do not have a clue how to do it.
So, what you do right now is you go find someone who does know how to do it. And by that I mean someone who can demonstrate they know how. Which does not equate to having a low slashdot id; it equates to having done real projects of this scale.
So, how do you start? You ring IBM and get them to come in and talk to you. You ring Red Hat. You ring Accenture.
If you want impartial advice from someone who isn't a vendor (which is a good idea), then go find some companies that have a million-seat open source e-mail deployment in place and see if you can get their messaging admin to talk to you.
~~~~~ BigLig2? You mean there's another one of me?
In my opinion you're going to need a cluster of servers, or at least round-robin MX records for the servers. I personally think sendmail scales the best of the MTA packages and offers the best set of features and ease of maintenance, although a lot of people would argue it's intrinsically insecure. I've never had problems, but I kept our mail servers up to date. I would separate the SMTP machines the outside world uses to deliver mail to your system from the servers your own users use to send mail. I would also move delivery services (IMAP, POP, webmail) to their own machines instead of having them on the SMTP machines, and you would probably be best off using a NAS for the actual storage. This is actually a really interesting project. Good luck, and let us know how it turns out :)
Shadus
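"Round-robin MX" usually means publishing several MX records with the same preference value. The load spreading then comes from the sending side: RFC 5321 (section 5.1) requires a sending MTA to try lower preference values first and to randomize among hosts with equal preference. A minimal sketch of that ordering; the hostnames and preference values are placeholders:

```python
import random
from itertools import groupby
from typing import List, Optional, Tuple


def mx_delivery_order(records: List[Tuple[int, str]],
                      rng: Optional[random.Random] = None) -> List[str]:
    """Order MX hosts the way a sending MTA should: lowest preference value
    first, randomized among hosts that share the same preference."""
    if rng is None:
        rng = random.Random()
    order: List[str] = []
    for _, group in groupby(sorted(records), key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # equal preference: spread load across these hosts
        order.extend(hosts)
    return order


# Three equal-preference front-line MX hosts plus one backup (placeholder names).
records = [(10, "mx1.example.com"), (10, "mx2.example.com"),
           (10, "mx3.example.com"), (20, "backup-mx.example.com")]
print(mx_delivery_order(records))
```

Each sender gets a shuffled order of the three primary hosts, so inbound load spreads across them without a load balancer, and the backup is only tried once all of them fail.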
One BIG issue between what people are running now and what they will HAVE to run soon is the little item of SOX compliance. Be VERY careful that your little million-user mail system is compliant, or the implementation costs will double. Believe me, I do this for a living and just saw one of our financial clients get stung big time.
*--- Sometimes a majority only means that all the fools are on the same side. ---*
No, they cannot. Microsoft does not want you backing up mailboxes. You back up mailstores, which hold several (or hundreds; however many will fit on a single disk partition) mailboxes. This works great for disaster recovery: you restore the failed disk.
It is worthless for a single user who just deleted some important message. You end up building a new Exchange server, restoring the entire mailstore, then going into that box and grabbing the one message. Veritas (I presume Legato as well) has an option to go in and grab each message from the mailbox one at a time. However, this is slow: 1/5th the speed of a normal backup.
I work for, a company that competes with Veritas and Legato (though we go after much smaller accounts; big enterprises need things we don't provide). We do Exchange backup, and we are pretty sure that Veritas does it exactly the way we do. I strongly doubt anyone can scale mailbox-level backup to millions of users.