Slashdot Mirror


Infrastructure for One Million Email Accounts?

cfsmp3 asks: "I have been asked to define the infrastructure for the email system for a huge company which, fed up with Exchange, wants to replace their entire system with something non-Microsoft. I have done this before, but not at anything like this scale. Suppose you are given a chance to build from scratch an email system that has to support around one million accounts: some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, and 99.9% uptime is expected... where would you start?"

29 of 1,216 comments

  1. Obviously by SpiffyMarc · · Score: 5, Funny

    I'd start by submitting a question to Ask Slashdot.

    1. Re:Obviously by CDMA_Demo · · Score: 5, Funny

      I'd start by submitting a question to Ask Slashdot.

      Upon which the global "wankfest" will commence, leading to solutions ranging from Novell to qmail based solutions, upon which the OP will look for someone else for advice, upon which the OP will end up paying an IBM consultant to set up his company's email.

    2. Re:Obviously by WarPresident · · Score: 5, Funny

      I'd start by submitting a question to Ask Slashdot.

      Ah, a proof by contradiction, eh?

      --
      Here come da fudge!
    3. Re:Obviously by kryonD · · Score: 5, Interesting

      Or maybe this is a legitimate cry for help from EDS, who duped the US Navy into thinking they could actually outsource IT on the exact scale that the poster is talking about. Mind you, no one has ever provided ubiquitous support for an organization as large as the Department of the Navy, but they somehow convinced Congress that they could do it for $6B.

      Just so you know. Most of us out in South East Asia refer to NMCI (Navy-Marine Corps Intranet) as the Not Mission Capable Intranet.

      --
      I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
    4. Re:Obviously by whackco · · Score: 5, Insightful

      Actually, I was going to use "Obviously" as my subject line... so I'll just respond to yours.

      I work with Exchange, and think that the chances are better that they just had shitty architecture to begin with. Exchange is a great platform and scales well, so if the original people wouldn't do it, well then f*ck em.

      Still convinced to migrate? Well, something with multiple datacenters, large scale, a compressed SAN backend, and a lot of clustering will do it. Shit, you could do the entire thing with MySQL if you REALLY wanted to. Moving the existing data over will be a huge pain no matter what you migrate to, though.

      My suggestion? Don't just jump off Exchange; do a proper requirements analysis, and you might find it is a lot cheaper to just redesign the existing architecture.

    5. Re:Obviously by AKAImBatman · · Score: 5, Funny

      upon which the OP will end up paying an IBM consultant to set up his company's email.

      At which point the highly paid consultant will post a question to Ask Slashdot...

    6. Re:Obviously by 88NoSoup4U88 · · Score: 5, Funny

      The obvious answer is, of course: send all those thousand employees a Gmail invite!

    7. Re:Obviously by Stephan+Schulz · · Score: 5, Funny
      Or complain loudly enough to be an embarrassment to Microsoft, and they will supply equipment and support to get Exchange running smoothly!

      Yes, but who can afford the space, electricity and cooling for 500,000 servers (generously assuming that Exchange can handle 2 users per server)?
      --

      Stephan

    8. Re:Obviously by Karl+Cocknozzle · · Score: 5, Informative
      I work with Exchange, and think that the chances are better that they just had shitty architecture to begin with. Exchange is a great platform and scales well, so if the original people wouldn't do it, well then f*ck em.

      Your point about putting more effort up-front into design is well taken, but that advice applies to any platform...

      With that said, and without turning this thread into an Exchange bitchfest...

      Why in the hell can't you restore a mailbox from backup using only the tools you already have if the user is no longer present in Active Directory? You can't even export the mailbox with EXMERGE... Your choices are 1) 3rd party recovery tool (like Quest Recovery for Exchange) or 2) Build an ENTIRE OTHER SERVER and do a normal, full restore of the entire mail store so you can extract one measly mailbox.

      Obviously, the "Recovery Storage Group" feature is a VAST improvement over the old Exchange 5.5 way of bringing back just one mailbox (that being setting up another server), but this is a MAJOR duh situation on Microsoft's part. They seem to think that since their "best practice" is to never ever erase any user account ever ever ever, it's okay to leave this gaping flaw in their enterprise groupware product. Sorry, but I think that sucks. We paid out the ass for the "Enterprise" edition (to avoid the arbitrary 16 GB limit on the mail store), and goddammit, I should be able to bring back a mailbox without its corresponding AD account without wasting a whole day setting up another server... I've only had to do it once (today), but the whole time I was thinking how much easier a mailbox restore on my OS X Server at home would be... Just restore the frickin' files and move on with your life.
      --
      Who did what now?
    9. Re:Obviously by mollymoo · · Score: 5, Funny

      I didn't think there was anything more tragic to do on /. than boast about a first post. But the idea of boasting about a first post you didn't even make had never occurred to me. Kudos.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    10. Re:Obviously by pjbgravely · · Score: 5, Funny

      WalMart runs the world's biggest Exchange install. They and msft are quite proud of it, actually...


      Thanks, another reason to never shop there.

      --
      Star Trek, there may be hope.
  2. Easy. by Chess_the_cat · · Score: 5, Funny

    gmail.google.com

    --
    Support the First Amendment. Read at -1
  3. ~ 320K accounts by Anonymous Coward · · Score: 5, Informative

    At IBM we use Lotus Notes which has saved us LOTS of virus hassles. Every employee has an account and we're something like 320,000 worldwide. The mail "databases" are spread among Domino servers but I don't know what platform these run on, or what hardware specs they have. I imagine it's either Windows or Linux... but who knows, maybe we're using some of our PowerPC-based iSeries servers. These are the boxen formerly known as AS/400.

    1. Re:~ 320K accounts by DaveCar · · Score: 5, Funny

      The mail "databases" are spread among Domino servers

      Yeah, but we all know what happens when one of these Domino servers falls over ...

  4. Re:POP? by JoshWurzel · · Score: 5, Funny

    I'd ask for six bullets. Why would you want to risk getting the empty chamber?

  5. For the lazy... by Spy+der+Mann · · Score: 5, Informative

    Here's Slidey's post. (Disclaimer: Copyright blahblahblah appropriate people yadda yadda fair use etc etc don't sue me, thank you)

    ---
    OK, I work for a large UK ISP in the messaging (email) operations department. We currently have 2.5-3 million active accounts (and a load of suspended ones), and handle anywhere up to 12-16 million mails per day.

    Our setup is like this (this is simplified, though):

    Front line: anti-abuse MTAs. These do DNSBL-type lookups (SpamCop, Spamhaus and SORBS). We have 9 incoming.
    Next we have MTAs. They farm mail off to Brightmail servers, which do something similar to SpamAssassin. We have 6 incoming MTAs and 8 Brightmail servers (not enough; high load).
    After that they farm off to vscans (virus scanners, 6).
    After that, any mail that gets through is delivered to mail stores (8 + 2 hot spares).

    What you want to be doing is similar to the above: chaining the mail from one level to the next. The first level should be the RBLs. These are less processor intensive, and can remove a fair whack of your mail in one swoop. SpamAssassin is going to be more CPU intensive, since it has to open each mail and read the first x many bytes.

    I'd have separate machine(s) holding your master directory, and if you can get directory caches, do that too (to take the load off the master directory). Ours run Oracle.

    I don't know what your budget is, but split up the different tasks as much as possible. That way, if you need to add more to any pool (RBL lookups, SpamAssassin etc.), you just add another machine.

    One last thing: we also have a separate box just for postmaster mail (with Exim + SpamAssassin, funnily enough). It tends to get busy.

    Last edited by Slidey on 09-08-2005 at 11:19 PM
    --
    (end of quote)
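    The tiered layout Slidey describes (cheap RBL checks first, then content scanning, then delivery) can be sketched as a short-circuiting pipeline. This is a minimal Python sketch under stated assumptions, not his actual setup: the spam and virus stage functions are hypothetical offline stand-ins, and only the DNSBL query-name construction (reversed IPv4 octets prepended to the list's zone) follows the usual DNSBL convention. The zone name used in the example is illustrative.

```python
import ipaddress
import socket
from typing import Callable, List, Optional

# A stage returns a rejection reason, or None to pass the message on.
Stage = Callable[[str, bytes], Optional[str]]


def dnsbl_query_name(ip: str, zone: str) -> str:
    """DNSBL convention: reverse the IPv4 octets and append the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_stage(zone: str) -> Stage:
    """Build a stage that rejects clients whose address is listed in `zone`."""
    def check(client_ip: str, message: bytes) -> Optional[str]:
        try:
            socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        except socket.gaierror:
            # NXDOMAIN (not listed) and resolver failures both pass: fail open.
            return None
        return "listed in " + zone
    return check


def run_pipeline(stages: List[Stage], client_ip: str, message: bytes) -> str:
    """Run stages in order (cheapest first) and stop at the first rejection,
    so the expensive scanners never see mail the cheap checks already refused."""
    for stage in stages:
        reason = stage(client_ip, message)
        if reason is not None:
            return "rejected: " + reason
    return "delivered"


if __name__ == "__main__":
    # Hypothetical offline stand-ins for the spam and virus tiers.
    def fake_spam(ip: str, msg: bytes) -> Optional[str]:
        return "spam" if b"viagra" in msg.lower() else None

    def fake_virus(ip: str, msg: bytes) -> Optional[str]:
        return "virus" if b"EICAR" in msg else None

    print(dnsbl_query_name("192.0.2.1", "zen.spamhaus.org"))
    print(run_pipeline([fake_spam, fake_virus], "192.0.2.1", b"Subject: hi\r\n\r\nhello"))
```

    In a live setup the first entry in the list would be something like `dnsbl_stage(...)` for each blocklist you trust; adding capacity to any tier then means adding machines behind that one stage, as the post suggests.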

  6. Re:go to gmail by Chmarr · · Score: 5, Insightful

    Gmail is beta.

    Gmail does not have guaranteed uptime.

    You do not pin your company's communications system on something you cannot sign an SLA with.

    need I go on? :)

  7. CommunigatePro from Stalker.com by ejoe_mac · · Score: 5, Informative

    1) It'll run on anything: Win32, Linux, BSD, Solaris, x86, Xserves, Alphas, Power5
    2) It'll scale as big as you can dream - over 5 million accounts with clustering
    3) MAPI support

  8. Re:POP? by mre5565 · · Score: 5, Interesting
    A million users and they want POP3? Add a gun and a single bullet to your administration requirements.

    No doubt a well-deserved +5 for humor, but for those of us less in the know (and a chance at another +5 for Informative), what is so bad about POP3? Thx.
  9. Re:POP? by Mr.+Underbridge · · Score: 5, Funny

    I'd ask for six bullets. Why would you want to risk getting the empty chamber?

    I see that you are familiar with the subtle nuances of Polish Roulette.

  10. Simplicity is key. by chrome · · Score: 5, Informative

    My job is building systems like this. Current mailserver system I designed and built is hosting 80,000 email accounts, and will scale out to a million quite cheaply by just adding more machines.

    OpenLDAP

    You need a central configuration repository to store the email accounts, their passwords, etc. OpenLDAP is perfect for this, and you can replicate it out for scalability. Be prepared to learn about LDAP schemas.
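    Every POP3/IMAP login and every SMTP delivery ends up as an LDAP search keyed on an address that comes from outside, so that value should be escaped before it goes into a search filter. A minimal Python sketch of RFC 4515 escaping follows; the `mailUser` object class and `mail` attribute are hypothetical placeholders, since real schemas (Courier's, Exim's lookups, your own) differ. Where a client library offers its own helper, such as python-ldap's `ldap.filter.escape_filter_chars`, that is preferable.

```python
# Characters RFC 4515 requires to be escaped in a filter assertion value.
_SPECIAL = {"*", "(", ")", "\\", "\x00"}


def escape_filter_value(value: str) -> str:
    """Replace each special character with a backslash and two hex digits."""
    return "".join("\\%02x" % ord(ch) if ch in _SPECIAL else ch for ch in value)


def mailbox_filter(address: str) -> str:
    """Filter for one mailbox; the objectClass and attribute names are hypothetical."""
    return "(&(objectClass=mailUser)(mail=%s))" % escape_filter_value(address)


if __name__ == "__main__":
    print(mailbox_filter("joe@example.com"))
    # A hostile "address" is neutralized instead of widening the search.
    print(mailbox_filter("*)(objectClass=*"))
```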

    Exim

    Use Exim because it has a simple process model (a single binary that does all the work, like sendmail) but has a human readable configuration file and has to be the most flexible MTA out there. You will have customers with weird requirements sometimes, and Exim will be able to meet those. Plus, it has Exiscan-ACL built-in these days, which allows you to do virus scanning and spam scanning at the DATA stage, before the mail is actually accepted by the MTA. It means you can make the sending MTA deal with the bounces if the mail is a virus or is obvious spam.
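    The point of scanning at the DATA stage is that the verdict becomes the SMTP reply to the message body itself: a 5xx there means the receiving server never accepted the mail, so the sending MTA, not you, has to generate the bounce. The Python sketch below only illustrates that decision (in Exim it lives in the DATA ACL); it is not Exim configuration syntax, and the 15.0 spam-score threshold is an arbitrary assumption.

```python
from typing import Optional, Tuple


def data_stage_reply(virus_name: Optional[str], spam_score: float,
                     reject_threshold: float = 15.0) -> Tuple[int, str]:
    """Return the SMTP reply code and text to send after the final '.' of DATA.

    A 5xx here refuses the message while the sender is still connected, so the
    sending MTA owns the bounce. Accepting with 250 and bouncing later would
    instead mail a bounce to a (frequently forged) envelope sender.
    """
    if virus_name:
        return 550, "5.7.1 Message rejected: contains %s" % virus_name
    if spam_score >= reject_threshold:  # assumed, tunable threshold
        return 550, "5.7.1 Message rejected as spam"
    return 250, "2.0.0 OK, message queued"
```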

    Courier-IMAP for POP3 and IMAP access.

    Yeah, it's written by a sociopath, but nothing else works as well in the field. It works out of the box with sensible LDAP schemas and is fast, reliable and secure. It handles SSL, all the different authentication methods, what have you. Maildir compatible.

    Maildir message store.

    Store the mail in maildirs. Don't put them in /maildirs/domain.com/user/Maildir; split the domains up with a 2-level-deep hashing algorithm (if you're virtual hosting domains, which is what it sounds like to me), so make it something like /maildirs/xx/xx/domain.com/user/Maildir, where xx/xx might be something like 3f/6b (depending on the hash). Use MD4 for the hash because it's more balanced than MD5.
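    The two-level hashed layout can be sketched in a few lines of Python. This assumes the hash is taken over the lower-cased domain name and that the first four hex digits supply the two directory levels; the poster's exact scheme isn't given. It defaults to MD5 because `hashlib.new("md4")` only works when the underlying OpenSSL build still provides MD4 (many OpenSSL 3 builds do not); pass `algorithm="md4"` if yours does.

```python
import hashlib
import posixpath


def maildir_path(user: str, domain: str, root: str = "/maildirs",
                 algorithm: str = "md5") -> str:
    """Return root/xx/yy/domain/user/Maildir, where xx and yy are the first
    two pairs of hex digits of a hash of the domain."""
    domain = domain.lower()  # domain names are case-insensitive
    digest = hashlib.new(algorithm, domain.encode("utf-8")).hexdigest()
    return posixpath.join(root, digest[0:2], digest[2:4], domain, user, "Maildir")


if __name__ == "__main__":
    print(maildir_path("alice", "Example.COM"))
```

    With two levels of two hex digits there are 16^4 = 65,536 leaf directories, so no single directory has to hold every hosted domain.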

    NFS mount the maildirs from a fast NFS device like a NetApp. NetApps are recommended because you can plug them in and they just work, plus they are easy to scale by adding more trays.

    Linux NFS servers set up with heartbeat and shared disk also make a nice HA NFS, and would be cost effective, but you'll have to buy an array anyway (probably Fibre Channel), so it might be better to just get something that's completely integrated like the NetApp.
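    Part of why maildirs suit an NFS-mounted store is that delivery needs no locks: each message is written under a unique name in `tmp/` and only then moved into `new/`, so readers never see a half-written file. A minimal Python sketch of that delivery step follows (for real use, Python's standard `mailbox.Maildir` class does the same job); the unique-name format `time.pid_counter.hostname` is one common convention, used here as an assumption, and this sketch moves the file with `os.rename`.

```python
import itertools
import os
import socket
import tempfile
import time

_sequence = itertools.count()  # makes names unique within this process


def maildir_deliver(maildir: str, message: bytes) -> str:
    """Deliver `message` into `maildir` and return the path of the new file."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # The hostname keeps names unique when several delivery hosts share one
    # NFS-mounted store; '/' and ':' are encoded as the maildir convention asks.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    name = "%d.%d_%d.%s" % (int(time.time()), os.getpid(), next(_sequence), host)
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    with open(tmp_path, "xb") as f:  # "x": fail instead of overwriting
        f.write(message)
        f.flush()
        os.fsync(f.fileno())  # data is on disk before the message becomes visible
    os.rename(tmp_path, new_path)  # readers only ever see the complete file
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(maildir_deliver(os.path.join(root, "Maildir"), b"Subject: test\r\n\r\nhello\r\n"))
```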

    Spamassassin.

    It can be configured to scan mail at DATA time in the SMTP conversation. There's a LOT of configuration work here to make it play nice on a massively scaled platform, but it can be done. Mostly it needs to have things like auto-whitelisting and Bayesian filtering turned off, as the extra DB file work is a bit excessive.

    Actually, I'm sure there is a way to make it work with a less resource intensive repository, but using the standard SA rules seems to work well for my environment. *shrug*

    ClamAV.

    Free antivirus, it works, and integrates well with Exiscan-ACL. Set it up to scan via the daemon, and configure it to update every couple of hours from cron, and bob's your uncle.
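    "Scan via the daemon" means the MTA streams each message to a long-running clamd over a socket instead of starting a scanner process per mail. Exim's built-in clamd support (via Exiscan-ACL) handles this for you; the Python sketch below just shows the INSTREAM exchange at the protocol level. It assumes clamd is listening on TCP 127.0.0.1:3310 (many installs use a Unix socket instead), and the chunk size is arbitrary.

```python
import socket
import struct
from typing import Iterator


def instream_frames(data: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the byte frames clamd's INSTREAM command expects."""
    yield b"zINSTREAM\0"  # 'z' prefix: NUL-terminated command and reply
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is preceded by its length as a 4-byte big-endian integer.
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # a zero-length chunk ends the stream


def scan_with_clamd(data: bytes, host: str = "127.0.0.1", port: int = 3310,
                    timeout: float = 30.0) -> str:
    """Stream `data` to clamd and return its reply, e.g. 'stream: OK'.

    clamd refuses streams larger than its StreamMaxLength setting.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for frame in instream_frames(data):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):
            part = sock.recv(4096)
            if not part:  # connection closed before a full reply
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", "replace")
```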

    Scaling out

    Make every box the same. Make every box an MTA, a POP3/IMAP server, etc. Use something like Kickstart to automate builds so that you can build a machine in 10 minutes, and all you have to do is configure the IP address and plug it in. If you want to be REALLY sexy, you could make the machines boot off the network, and mount / from a shared NFS area, and make /var/spool/exim the internal mirrored disks. DHCP them, then all you do is plug a machine in and set it to PXE boot. Pretty trivial to do.

    Load balancing

    Hardware load balancers are pretty much a necessity. Don't touch Cisco stuff. It's not very good. Go with Foundry Networks ServerIrons. The XLs can handle 1 billion requests/day if you configure them in Direct Server Return mode (also known as DSR/Foundry switchback). Use it. It makes all the return traffic go directly out to the net, meaning your ServerIrons have to switch less traffic and track fewer sessions. For a million users, however, I would recommend a pair of ServerIron 450GTs, or bigger. Maybe one per VIP/service.
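    A per-day figure like "1 billion requests/day" is easier to compare with other numbers in this thread as an average rate. The Python sketch below does the conversion; the 3x peak-to-average factor is purely an assumption for illustration, since real mail traffic peaks depend on the user base.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float, peak_factor: float = 1.0) -> float:
    """Average rate for a daily volume, optionally scaled by an assumed peak factor."""
    return per_day / SECONDS_PER_DAY * peak_factor


if __name__ == "__main__":
    # The ServerIron XL figure above, as an average rate.
    print("1e9/day -> %.0f req/s average" % per_second(1e9))
    # Slidey's 12-16 million mails/day, with a hypothetical 3x peak factor.
    for daily in (12e6, 16e6):
        print("%.0f/day -> %.0f msg/s avg, %.0f msg/s at 3x peak"
              % (daily, per_second(daily), per_second(daily, peak_factor=3.0)))
```

    1e9 / 86,400 works out to roughly 11,600 requests per second on average.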

    Now, if this is all looking pretty daunting, you could always hire me to build it for you :)

  11. While we answer this question... by hellfire · · Score: 5, Funny

    ... Is anyone wondering what's going on at Microsoft right now?

    It starts with a Slashdot geek working in the email department spitting up his coffee, followed by a few rumors which make it up to a guy in accounting and customer service, followed by frantic management emails, including some inappropriate language, from Steve and Bill. Then a few good geeks start tracing who this cfsmp3 guy is and try to trace him to a company, while the sales reps begin cold-calling any customer running around 1 million accounts.

    And Microsoft will botch it because they have no experience in kowtowing and bootlicking, which are important skills for any company that wants to humbly keep its customers.

    --

    "All great wisdom is contained in .signature files"

  12. Easy by xihr · · Score: 5, Insightful

    Resign. You're obviously in way over your head if you have to resort to asking Slashdot readers for advice like this.

  13. Intelligent Architecture by Anonymous Coward · · Score: 5, Informative

    Hi Cliff,

    Sounds like a fantastic design opportunity here. The 5% of the project that is Enterprise architecture is what I enjoy the most as well. I'm assuming money probably isn't an object in terms of how much gear and bandwidth you may have to feed to this.

    I'm happy to let my fingers type away below. I'd love to keep in touch and see how you end up shaping this system. My email is allowmx at hotm...

    Before I ask, are there actually a million accounts? Or is that just a ceiling that you have to show proof of concept with?

    I've only implemented systems of up to about 250,000 accounts of any kind, but as I'm sure you're aware, the base transactional resource costing is essentially the same.

    For me, I would look at this for sure from at least these two angles:

    1) Knowing your transactional costs: how much of your hard resources (bandwidth, CPU and disk space) will each type of transaction in your system take? I mostly use this approach to get not an exact number but an idea of magnitude, and to detail where each cost arises so that the proper attention is applied to it.

    2) Failsafe intelligence and capacity in the infrastructure, as well as in the application layer. You have to know that your hardware, software, OS, business logic and applications are all monitorable, internally and externally, for availability and for actual "can I use it". Keep transactional logs, etc., so that information is available when the inevitable problems come up.

    Also, aim for as many of these layers as possible to be self-healing, and fungible to the point that your service delivery is homogeneous in as many ways as possible. If your network finds that something doesn't work or route, with mail you can find another way to route it. Having a transactional manager of some kind, direct or not, could be useful here, depending on what the client wants.
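    For outbound mail, finding another way to route it is built into SMTP itself: a domain can publish several MX records, and a sender tries them in order of preference. A minimal Python sketch of that fallback logic follows; `send` is a hypothetical callable standing in for a real SMTP attempt, and the MX list would normally come from a DNS lookup.

```python
import random
from typing import Callable, List, Tuple


def deliver_via_mx(mx_records: List[Tuple[int, str]],
                   send: Callable[[str], bool]) -> str:
    """Try MX hosts from lowest to highest preference value and return the
    host that accepted the message.

    Hosts sharing a preference are tried in random order (RFC 5321 asks
    senders to randomize among equal-preference MX records) to spread load.
    """
    ordered = sorted(mx_records, key=lambda rec: (rec[0], random.random()))
    for _preference, host in ordered:
        try:
            if send(host):
                return host
        except OSError:
            pass  # connection refused, timeout, etc.: try the next host
    raise RuntimeError("all MX hosts failed; leave the message queued and retry later")
```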

    99.9% uptime equates to about 526 minutes, or roughly 8.76 hours, that you _could_ be down each year. That's about 44 minutes a month.

    Based on that, having flexible, redundant tools set up in a high-availability arrangement at their respective operating capacities is key. I'm not sure whether your current Exchange problems stem from not enough equipment, bandwidth, or other stability issues, so I'll just assume that it's all of them :)

    I apologize if anyone else has already mentioned some of this, but here's some of what I've found to help me where email has become as crucial to a business as their cell phone.

    On the hardware level:

    - STORAGE: Everything goes on a SAN, if not more than one. Don't waste your time with anything less.
    - SERVERS: All servers should have redundant hot-swappable parts at the very least: power supplies and hard drives. I'd even suggest making the servers iSCSI-bootable so they can boot off the backbone. Beyond this, I like to buy my servers in piles of identical ones. Have 1-2 spare servers of each kind sitting there, ready to take the hot-swap drives from a failed server. That way, if a server dies, you can address the power supplies, or move the HDs from that machine into another identical server and get it up and running while you diagnose the hardware problem independently. My approach to any kind of problem is FIX, DETECT and REPAIR: get it up and running, find out what was wrong, make sure it's fixed for good. Too many of us stop at the first two ;)

    The idea I have in mind is a smaller-scale version of Google's beige-box army. Linux/BSD offer so many more transactions for each piece of hardware, so that works very much in your favor. Obviously you want something enterprise-grade to satisfy the client, such as the Compaq/HP ProLiants, etc. (I feel these servers have the best overall support, manageability and information tools, and their open Linux drivers interface wonderfully with open source operating systems.)

    Networking/Communication level:

    - Entire mail processing architecture communi

  14. Re:Qmail!! by Pharmboy · · Score: 5, Insightful

    A single server? For one million users?

    Insert "imagine a Beowulf cluster of those" joke here, except it isn't a joke.

    I think you might be underestimating the requirements for a project this large that "must scale perfectly". The "99.9% uptime is expected" requirement alone requires multiple internet connections, a large cluster of front-end servers, and redundant database servers, preferably located in different states (e.g., "What do you mean our only server is in New Orleans?").

    I don't think the average Dell dual Xeon box is up to the task for this large a project...

    --
    Tequila: It's not just for breakfast anymore!
  15. Re:I'd start by by moranar · · Score: 5, Funny

    Ah! Sendmail!

    --
    "I think it would be a good idea!"
    Gandhi, about Internet Security
  16. Re:NO GMAIL by Denis+Lemire · · Score: 5, Interesting

    Definitely agree on point 9. I maintain a mail server with over 2,000 users. It currently runs Qmail with the following patches:

    chkuser-2.0.8b-release.tar.gz
    doublebounce-trim.patch
    netqmail-1.05-tls-20050329.patch
    outgoingip.patch
    qmail-smtpd-auth-0.31.tar.gz
    qmail-smtpd-auth-close3.patch
    qmail-smtpd_gmfcheck.patch
    qmail-spf-rc5.patch

    Most of these patches require hand-editing the sources and Makefiles to successfully merge them all into the stock qmail or netqmail base. Lots of manually reading through *.rej files to make it all work.

    In order to simplify new installations I've created my own personal CVS repository for my Qmail sources. I commit changes to the tree whenever a new patch comes out with functionality I need. Hence on a new install I simply check out my custom tree and compile.

    The initial work was a royal pain in the ass, however, once it is all up and running the stability and performance has been excellent.

  17. Army Knowledge Online does it for 1.72 million use by kenblakely · · Score: 5, Informative

    AKO (www.us.army.mil) is the Army's official intranet portal. We provide email for over 1.72M users, and we move almost 3 million messages a day. We do it all with Sun Messaging Server 5.2 (soon to be JES3), and we have exactly two (count 'em) mail administrators. Sun mail is rock solid and scales great. We offer POP, SMTP, enterprise spam and virus filtering, as well as personal address books. We don't get the rich Outlook fat client, but then we want to be all web-based anyway. Can't say enough about Sun mail. If we had to do this with Exchange, I'd probably have to hire 50 admins and deploy an order of magnitude more machines.

  18. Re:Qmail!! by Allador · · Score: 5, Informative

    No. 0.1% != 0.1

    365 days * 24 hrs/day = 8760 hours per year

    0.1% downtime = 0.001 downtime

    8760 * 0.001 = 8.76 hrs

    You're off by two orders of magnitude.

    8.76 hrs / 12 months = 0.73 hrs/month = 43.8 minutes/month

    One 45-minute scheduled downtime (assuming it's scheduled) per month isn't terrible. It's not great, but costs really start to go up as you add nines beyond those 3.
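    Allador's arithmetic generalizes to any availability target. A small Python sketch, assuming a 365-day year and 12 equal months as in the calculation above:

```python
from typing import Tuple

HOURS_PER_YEAR = 365 * 24  # 8,760


def downtime_budget(availability: float) -> Tuple[float, float]:
    """Return (hours per year, minutes per month) of allowed downtime."""
    hours_per_year = HOURS_PER_YEAR * (1.0 - availability)
    minutes_per_month = hours_per_year * 60 / 12
    return hours_per_year, minutes_per_month


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        hours, minutes = downtime_budget(target)
        print("%.3f%% uptime: %.2f h/year, %.1f min/month" % (target * 100, hours, minutes))
```

    Each extra nine divides the budget by ten: 99.99% leaves only about 4.4 minutes a month.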