Slashdot Mirror


Infrastructure for One Million Email Accounts?

cfsmp3 asks: "I have been asked to define the infrastructure for the email system of a huge company which, fed up with Exchange, wants to replace its entire system with something non-Microsoft. I have done this before, but not at anything like this scale. Suppose you are given the chance to build from scratch an email system that has to support around one million accounts: some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, and 99.9% uptime is expected... where would you start?"

198 of 1,216 comments

  1. Obviously by SpiffyMarc · · Score: 5, Funny

    I'd start by submitting a question to Ask Slashdot.

    1. Re:Obviously by CDMA_Demo · · Score: 5, Funny

      I'd start by submitting a question to Ask Slashdot.

      Upon which the global "wankfest" will commence, producing solutions ranging from Novell to qmail, upon which the OP will look for someone else for advice, upon which the OP will end up paying an IBM consultant to set up his company's email.

    2. Re:Obviously by WarPresident · · Score: 5, Funny

      I'd start by submitting a question to Ask Slashdot.

      Ah, a proof by contradiction, eh?

      --
      Here come da fudge!
    3. Re:Obviously by dzelenka · · Score: 2, Funny

      Or complain loudly enough to be an embarrassment to Microsoft and they will supply equipment and support to get Exchange running smoothly!

      --
      Bah!
    4. Re:Obviously by kryonD · · Score: 5, Interesting

      Or maybe this is a legitimate cry for help from EDS, who duped the US Navy into thinking they could actually outsource IT on the exact scale the poster is talking about. Mind you, no one has ever provided ubiquitous support for an organization as large as the Department of the Navy, but they somehow convinced Congress that they could do it for $6B.

      Just so you know. Most of us out in South East Asia refer to NMCI (Navy-Marine Corps Intranet) as the Not Mission Capable Intranet.

      --
      I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
    5. Re:Obviously by whackco · · Score: 5, Insightful

      Actually, I was going to use "Obviously" as my subject line... so I'll just respond to yours.

      I work with Exchange, and think that the chances are better that they just had shitty architecture to begin with. Exchange is a great platform and scales well, so if the original people wouldn't do it, well then f*ck em.

      Still convinced you should migrate? Well, something with multiple datacenters, large scale, a compressed SAN backend, and a lot of clustering will do it. Shit, you could do the entire thing with MySQL if you REALLY wanted to. Moving the existing data over will be a huge pain no matter what you migrate to, though.

      My suggestion? Don't just jump off Exchange; do a proper requirements analysis, and you might find it is a lot cheaper to just redesign the existing architecture.

    6. Re:Obviously by EnderWiggnz · · Score: 4, Interesting

      WalMart runs the world's biggest Exchange install. They and msft are quite proud of it, actually...

      The Navy may want to take a page out of Walmart's book, if they're having that much trouble.

      --
      ... hi bingo ...
    7. Re:Obviously by AKAImBatman · · Score: 5, Funny

      upon which the OP will end up paying an IBM consultant to set up his company's email.

      At which point the highly paid consultant will post a question to Ask Slashdot...

    8. Re:Obviously by 88NoSoup4U88 · · Score: 5, Funny

      The obvious answer is, of course: send all those thousand employees a Gmail invite!

    9. Re:Obviously by Stephan+Schulz · · Score: 5, Funny
      Or complain loudly enough to be an embarrassment to Microsoft and they will supply equipment and support to get Exchange running smoothly!
      Yes, but who can afford the space, electricity and cooling for 500,000 servers (generously assuming that Exchange can handle 2 users per server)?
      --

      Stephan

    10. Re:Obviously by Karl+Cocknozzle · · Score: 5, Informative
      I work with Exchange, and think that the chances are better that they just had shitty architecture to begin with. Exchange is a great platform and scales well, so if the original people wouldn't do it, well then f*ck em.

      Your point about putting more effort up-front into design is well taken, but that advice applies to any platform...

      With that said, and without turning this thread into an Exchange bitchfest...

      Why in the hell can't you restore a mailbox from backup using only the tools you already have if the user is no longer present in Active Directory? You can't even export the mailbox with EXMERGE... Your choices are 1) 3rd party recovery tool (like Quest Recovery for Exchange) or 2) Build an ENTIRE OTHER SERVER and do a normal, full restore of the entire mail store so you can extract one measly mailbox.

      Obviously, the "Recovery Storage Group" feature is a VAST improvement over the old Exchange 5.5 way of bringing back just one mailbox (that being: set up another server), but this is a MAJOR duh situation on Microsoft's part. They seem to think that since their "best practice" is to never ever erase any user account ever ever ever, it's okay to leave this gaping flaw in their enterprise groupware product. Sorry, but I think that sucks. We paid out the ass for "Enterprise" edition (to avoid the arbitrary 16 GB limit on the mail store) and goddammit, I should be able to bring back a mailbox without its corresponding AD account without wasting a whole day setting up another server... I've only had to do it once (today), but the whole time I was thinking how much easier a mailbox restore on my OS X Server at home would be... Just restore the frickin' files and move on with your life.
      --
      Who did what now?
    11. Re:Obviously by HalWasRight · · Score: 2, Insightful

      Obviously school just started.

      --
      "This mission is too important to allow you to jeopardize it." -- HAL
    12. Re:Obviously by jrockway · · Score: 4, Interesting

      > you could do the entire thing with MySQL if you REALLY wanted to

      I am so tired of people shoving everything into relational databases. What queries are you going to run against your database, anyway? SELECT * FROM messages WHERE read=0? Try "ls new" in your maildir. The reason things never scale right is because people design things to be "new" and "cool" like putting their e-mail into a relational database. No. Just use the filesystem. It, and its supporting tools, have been around for 30 years! It Just Works! It doesn't use any userspace memory! There are no permissions issues, because the kernel controls the permissions. It's the optimal solution.

      The filesystem is really really efficient (for e-mail) and really really reliable.

      Please, don't use a database!
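      A minimal Python sketch of the "ls new" idea, assuming the standard Maildir layout (new/, cur/ and tmp/ subdirectories, with flags appended to filenames in cur/ after ":2,"). Strictly speaking, new/ only holds mail that no client has picked up yet; a message moved to cur/ stays unread until its flags include "S" (seen), so the sketch checks both. The function name unread_messages is hypothetical.

```python
import os


def unread_messages(maildir: str) -> list[str]:
    """List unread messages in a Maildir without any database.

    - Everything in new/ has not yet been picked up by a mail client.
    - Files in cur/ carry their flags after ":2,"; no "S" flag means unseen.
    """
    new_msgs = sorted(os.listdir(os.path.join(maildir, "new")))
    unseen_cur = []
    for name in sorted(os.listdir(os.path.join(maildir, "cur"))):
        # partition() yields an empty flag string when there is no ":2," suffix
        _base, _sep, flags = name.partition(":2,")
        if "S" not in flags:
            unseen_cur.append(name)
    return new_msgs + unseen_cur
```

      The "query" is just two directory listings plus a string check on each filename; the kernel's directory cache does the rest.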

      --
      My other car is first.
    13. Re:Obviously by superpulpsicle · · Score: 4, Interesting

      The Walmart exchange site was not properly backed up for "years". Mostly because Exchange was not 3rd party software friendly at all, and M$ didn't have much of their own backup software to offer. Veritas and Legato couldn't bend over enough for a million users.

      Walmart invited countless consulting firms and data backup experts. They deployed Exchange strictly because M$ was willing to "support" them. To say they were vulnerable to a major IT disaster would be an understatement. The Navy wants nothing to do with Walmart's IT.

    14. Re:Obviously by mollymoo · · Score: 5, Funny

      I didn't think there was anything more tragic to do on /. than boast about a first post. But the idea of boasting about a first post you didn't even make had never occurred to me. Kudos.

      --
      Chernobyl 'not a wildlife haven' - BBC News
    15. Re:Obviously by pjbgravely · · Score: 5, Funny

      WalMart runs the world's biggest Exchange install. They and msft are quite proud of it, actually...


      Thanks, another reason to never shop there.

      --
      Star Trek, there maybe hope.
    16. Re:Obviously by MarkGriz · · Score: 4, Funny

      Whoa there cowboy...

      He said "up".... beat yourself *up*

      --
      Beauty is in the eye of the beerholder.
    17. Re:Obviously by cecil_turtle · · Score: 3, Interesting

      I don't know whether you actually have experience running a mail server or just wanted to go off on your relational DB rant, but mail data tends to be created and deleted A LOT, in files of varying size, and file-based structures on a mail server create serious fragmentation problems. If you do decide to go this way, allow plenty of free drive space, well above normal recommendations: 80% free or more.

      Also, many people have their mail clients set with ridiculously frequent mail check intervals (like every minute), and on a file-based system each check requires a trip to the drive and back. Even with the data on a RAID array with a decent read/write cache, you're still going through the disk subsystem, whereas with a database it would all be in memory.

      What's wrong with SELECT * FROM messages WHERE userid=xyz and read=0? That is a cakewalk for a properly indexed dbms. On a medium sized server (say, quad processor w/ 8-16GB RAM) there is more userspace memory than os memory space.
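      For comparison, a hedged sketch of the query the parent describes, using Python's built-in sqlite3 as a stand-in for a real mail DBMS. The schema is an assumption, and the column is named is_read rather than read to sidestep keyword clashes. The composite index on (userid, is_read) is what makes the lookup cheap: the database can jump straight to one user's unread rows instead of scanning the whole table.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " userid TEXT NOT NULL,"
        " is_read INTEGER NOT NULL DEFAULT 0,"
        " subject TEXT)"
    )
    # Composite index: equality on userid, then on is_read.
    conn.execute("CREATE INDEX idx_user_unread ON messages (userid, is_read)")
    return conn


def unread_for(conn: sqlite3.Connection, userid: str) -> list[tuple[int, str]]:
    # Parameterized query; never splice user input into SQL text.
    rows = conn.execute(
        "SELECT id, subject FROM messages"
        " WHERE userid = ? AND is_read = 0 ORDER BY id",
        (userid,),
    )
    return rows.fetchall()
```

      Whether this beats a directory listing in practice depends on caching on both sides; the sketch only shows that the indexed query itself is trivial.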

    18. Re:Obviously by Not+The+Real+Me · · Score: 3, Insightful

      What does Hotmail run these days?

      I am under the impression that if Hotmail were running clusters of Exchange servers Microsoft would be quite vocal in the enterprise scalability of Exchange.

    19. Re:Obviously by jerkychew · · Score: 4, Insightful

      Since you've taken things off topic, I'll grab the wheel and pull it right off a cliff.

      The reason Exchange uses a database can be summed up in three words: Single Instance Store.

      Say you send one 1MB Word document to 100 of your colleagues. In a relational database-based, Single Instance Store-driven mail server, that document takes up exactly 1MB on the server. If somebody in the organization forwards the Word doc to the remaining 900 people in your organization, how much space does it take on the server? 1MB.

      Send a 1MB document to 1000 users on a flat, mbox-style mail server, and how much space is taken up on the server? 1000MB.

      I see your point about some things, sure. Being able to jump in and restore a mailbox from tape by just dumping a folder somewhere is nice, but it just doesn't scale in terms of storage the way a db-driven mail system does.

      Don't flame me as an MS advocate. There are times when an SIS-based email system is good, and there are times when a flat email system is good. I've run Exchange environments for 500+ people, and I've run Linux-based mail systems for 1000+ people. I'm just saying that your particular argument is one-sided and flawed.
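      A minimal sketch of the single-instance idea in Python. This is content-addressed storage with reference counting, one common way to get the behavior described; it is not a description of how Exchange implements it internally. Each distinct body is stored once under its SHA-256 digest, mailboxes hold only digests, and the body is freed when its last reference is deleted. The class and method names are hypothetical.

```python
import hashlib


class SingleInstanceStore:
    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}          # digest -> content, stored once
        self.refcount: dict[str, int] = {}         # digest -> number of references
        self.mailboxes: dict[str, list[str]] = {}  # user -> digests in that mailbox

    def deliver(self, user: str, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = body
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def delete(self, user: str, digest: str) -> None:
        self.mailboxes[user].remove(digest)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            # Last reference gone: reclaim the space.
            del self.blobs[digest]
            del self.refcount[digest]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())
```

      Delivering the same 1 MB attachment to 1,000 users leaves stored_bytes() at 1 MB rather than 1,000 MB, and the space comes back only after the last of those users deletes it.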

    20. Re:Obviously by cc.Scotty · · Score: 3, Funny

      Get your company signed up as an early adopter on the next beta version of Exchange. It will surely solve all your problems!

    21. Re:Obviously by AnyoneEB · · Score: 4, Insightful

      Or you could just use a filesystem that supports hard-linking files (see: man ln), so you don't have to worry about that even when using a filesystem for this purpose. Since such a file is read-only, it could simply be linked into all of those people's mailboxes. If you don't know what a hard link is, it is basically the same thing you are describing, except done in the filesystem and handled transparently by the kernel. Every "file" you see in an ext2/ext3 filesystem is really just a pointer (a directory entry) to where the file's data is stored, and any actual file can have as many of these links as you want. Only when no links to a file remain is its space freed.
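      A minimal sketch of hard-link delivery in Python, assuming all mailboxes live on one filesystem (hard links cannot cross filesystems) and that messages are never edited in place once delivered. The helper name deliver_by_hardlink and the directory layout are hypothetical. Every mailbox entry points at the same inode, so the body is stored once, and st_nlink tracks how many names still refer to it.

```python
import os


def deliver_by_hardlink(spool_file: str, mailbox_dirs: list[str]) -> int:
    """Link one spooled message into every recipient's mailbox directory.

    Returns the resulting link count of the underlying inode.
    """
    name = os.path.basename(spool_file)
    for mailbox in mailbox_dirs:
        # Creates a new directory entry for the same inode; no data is copied.
        os.link(spool_file, os.path.join(mailbox, name))
    return os.stat(spool_file).st_nlink
```

      Removing the spool entry afterwards leaves the data in place; the kernel frees the blocks only when the last mailbox deletes its link. In a Maildir-style store, changing a message's flags is a rename, which touches only that user's directory entry and not the shared content.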

      --
      Centralization breaks the internet.
    22. Re:Obviously by doshell · · Score: 3, Insightful

      Say you send one 1MB Word document to 100 of your colleagues. In a relational database-based, Single Instance Store-driven mail server, that document takes up exactly 1MB on the server. If somebody in the organization forwards the Word doc to the remaining 900 people in your organization, how much space does it take on the server? 1MB. Send a 1MB document to 1000 users on a flat, mbox-style mail server, and how much space is taken up on the server? 1000MB.

      Speaking of which, is there any filesystem around that "automagically" detects redundancy and avoids storing the same data twice (i.e. two files with the same content end up being stored only once)? (I don't mean hardlinks. Suppose I download some file for the second time without knowing the first instance exists). I suspect this would add a lot of overhead to the filesystem driver, but it'd certainly be a cool feature.

      --
      Score: i, Imaginary
    23. Re:Obviously by the+real+darkskye · · Score: 3, Interesting

      The mods are on crack, the meta-mods are on pot

      --
      Music is everybody's possession.
      It's only publishers who think that people own it.
      Fuck Beta
      ~John Lenno
    24. Re:Obviously by Aceto3for5 · · Score: 2, Insightful

      Amen to that. I support a base that is one of the last holdouts against NMCI. (IBM was involved in the bidding process originally, and once they saw the scale of the project, laughed and walked away.) As it is, we pay millions a year towards NMCI for the limited email-only version, which no one uses because it never works. Now that it's going to come into full bloom, the talk of the town here is that we will end up with two networks, two jacks at each desk, one NMCI and one functional. Talk about wasting tax money!

      The biggest infrastructure problem plaguing EDS right now is constructing a building large enough to hold all the money they are bilking out of us.

    25. Re:Obviously by AKAImBatman · · Score: 2, Informative

      No, it doesn't. Grep searches are horribly slow. If you're sorting through 2 gigabytes of email (a fairly common amount per user in corporations), you're going to be heavily limited by disk speed and processor time, i.e. searches could take on the order of minutes. Not good when you want to show a list of emails and the user attempts to sort by something, or search for that email from three years ago.
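      The usual answer to slow grep is to pay the cost once, at delivery time, by building an index. A toy inverted index in Python, as a sketch only: the tokenizer (lowercase alphanumeric runs) and the AND-only query semantics are assumptions, and production full-text engines add stemming, ranking and on-disk posting lists. A query then touches only the posting sets for its words instead of rereading gigabytes of mail.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters/digits after lowercasing.
TOKEN = re.compile(r"[a-z0-9]+")


class InvertedIndex:
    def __init__(self) -> None:
        # word -> set of message ids containing it
        self.postings: defaultdict[str, set[int]] = defaultdict(set)

    def add(self, msg_id: int, text: str) -> None:
        for word in TOKEN.findall(text.lower()):
            self.postings[word].add(msg_id)

    def search(self, query: str) -> set[int]:
        """Return ids of messages containing every word in the query."""
        words = TOKEN.findall(query.lower())
        if not words:
            return set()
        # .get() avoids inserting empty entries into the defaultdict
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```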

    26. Re:Obviously by Karl+Cocknozzle · · Score: 3, Informative
      You run the cleanup agent which shows you the tombstoned mailbox, you can then right click that and reconnect it to any Active Directory user.
      ...right up until the 30-day default and then your "tombstoned" mailboxes are gone, never to return--without the achingly painful "restore server" scenario. Hope you weren't counting on being able to bring them back until the end of time... Because unless you changed the default setting from 30-days, that is all the time you get. Sorry I didn't mention the 30+ days timeframe earlier, but I was on my way to the pub and didn't realize some exchange fanboy would be mortally offended by my least favorite feature of an otherwise decent product.
      --
      Who did what now?
    27. Re:Obviously by afidel · · Score: 2, Informative

      GE runs Exchange. I don't know of any company that has more employees likely to use email than GE (Walmart has more employees, but a LOT of them are minimum-wage drones who are unlikely to need email access). If they can make it work, and work well, I don't think anyone can deny that it's enterprise ready =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    28. Re:Obviously by joib · · Score: 4, Informative
    29. Re:Obviously by Fulcrum+of+Evil · · Score: 2, Interesting

      Exchange/Outlook will let you modify the attachment in place and keep it in your mailbox.

      Are you saying that I can send a file to 100 people, then edit it after I send it and leave the 100 people with no audit trail? That's horrible!

      --
      "We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
    30. Re:Obviously by Electrum · · Score: 2, Informative

      Please point to a mail system that actually uses the file system to hard link 1000 emails like the grandparent proposed.

      http://asg.web.cmu.edu/cyrus/download/imapd/overview.html#singleinstance
      http://doc.powerdns.com/powermail/indepth.html#AEN824

    31. Re:Obviously by chrisd · · Score: 2, Funny
      There are worse ideas.

      /me runs...

      --
      Co-Editor, Open Sources
      Open Source Program Manager, Google, Inc.
    32. Re:Obviously by raynet · · Score: 4, Interesting

      Plan 9 has a filesystem that does just this. I think it was called Venti. Basically it hashes the data blocks in the filesystem and only stores each unique block once. There was (is?) a project to port the filesystem to Linux.
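      A toy block-level deduplicating store in Python, in the spirit of what the parent describes: data is cut into blocks, each block is addressed by its hash, and a block already present is never stored again. Venti does name blocks by their SHA-1 hash (a "score"), but the fixed 8 KiB block size and everything else here are simplifying assumptions, not Venti's actual design.

```python
import hashlib

BLOCK_SIZE = 8192  # assumption: fixed-size blocks, for simplicity


class BlockStore:
    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}  # hash -> block, each stored once

    def put(self, data: bytes) -> list[str]:
        """Store data; return the list of block hashes needed to rebuild it."""
        scores = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            score = hashlib.sha1(block).hexdigest()
            self.blocks.setdefault(score, block)  # no-op if already stored
            scores.append(score)
        return scores

    def get(self, scores: list[str]) -> bytes:
        return b"".join(self.blocks[s] for s in scores)
```

      This also covers the "downloaded the same file twice" case: the second copy adds only a new list of hashes. The catch with fixed-size blocks is that inserting a single byte near the start shifts every later boundary, which is why many deduplicating systems use content-defined chunking instead.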

      --
      - Raynet --> .
    33. Re:Obviously by sco08y · · Score: 2, Interesting

      I am so tired of people shoving everything into relational databases.

      What relational DBMSs? All I've heard discussed are SQL products.

      The filesystem is really really efficient (for e-mail) and really really reliable.

      I'm tired of everyone shoveling everything into a filesystem.

      How are you going to run queries against your contacts? Or your appointments?

      How does a filesystem guarantee referential integrity? Can a filesystem guarantee an appointment doesn't exist for a bogus contact?

      *Any* kind of integrity? Can a filesystem guarantee that a message is well formed?
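      To make the parent's point concrete, a small sketch using Python's sqlite3: a foreign key stops an appointment from referencing a contact that does not exist. The schema and helper names are hypothetical. Note that SQLite only enforces foreign keys when PRAGMA foreign_keys = ON is set on the connection.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE appointments ("
        " id INTEGER PRIMARY KEY,"
        " contact_id INTEGER NOT NULL REFERENCES contacts(id),"
        " starts_at TEXT NOT NULL)"
    )
    return conn


def add_appointment(conn: sqlite3.Connection, contact_id: int, starts_at: str) -> None:
    # 'with conn' commits on success and rolls back if the insert raises.
    with conn:
        conn.execute(
            "INSERT INTO appointments (contact_id, starts_at) VALUES (?, ?)",
            (contact_id, starts_at),
        )
```

      An appointment for a bogus contact_id raises sqlite3.IntegrityError and nothing is written.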

    34. Re:Obviously by QuietLagoon · · Score: 2, Insightful
      Moving the existing data over will be a huge pain no matter what you migrate to, though.

      Yup, that's a big problem with Microsoft Exchange's proprietary datastore.

      Like the roach motel, data goes in, but you can't get it out.

    35. Re:Obviously by Tuna_Shooter · · Score: 4, Insightful

      One BIG issue between what people are running now and what they will HAVE to run soon is the little item of SOX compliance. Be VERY careful that your little million-user mail system is compliant, or the implementation costs will double. Believe me, I do this for a living and just saw one of our financial clients get stung big time.

      --
      *--- Sometimes a majority only means that all the fools are on the same side. ---*
    36. Re:Obviously by lgw · · Score: 2, Insightful

      Veritas and Legato couldn't bend over enough for a million users.

      I'm pretty sure that both Veritas and Legato can scale to a million Exchange mailboxes, but as it happens Walmart used Tivoli (which should scale that large as well, given its mainframe background). It's strange that they didn't have Exchange backups with a high-end backup product in place corporately, but I know next to nothing about Tivoli. Was Walmart just being cheap?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    37. Re:Obviously by bluGill · · Score: 3, Insightful

      No they cannot. Microsoft does not want you backing up mailboxes. You backup mailstores, which are several (hundred - however many will fit on a single disk partition) mailboxes. This works great for disaster recovery, you restore the failed disk.

      It is worthless for a single user who just deleted some important message. You end up building a new Exchange server, restoring the entire mailstore, then going into that box and grabbing the one message. Veritas (I presume Legato as well) has an option to go in and grab each message from the mailbox one at a time. However, this is slow: 1/5th the speed of a normal backup.

      I work for a company that competes with Veritas and Legato (though we go for much smaller accounts; big enterprises need things we don't provide). We do Exchange backup, and are pretty sure that Veritas is doing it exactly like us. I strongly doubt anyone can scale mailbox-level backup to millions of users.

  2. Easy. by Chess_the_cat · · Score: 5, Funny

    gmail.google.com

    --
    Support the First Amendment. Read at -1
    1. Re:Easy. by Anonymous Coward · · Score: 2, Insightful

      Assuming you don't mind Google scanning your internal email archives looking for interesting business information!

    2. Re:Easy. by nherm · · Score: 2, Funny

      Obviously this cfsmp3 guy is one of those PhDs that Google hired for creative solutions... so Google asked him how to expand its array of mail servers by one million accounts, and guess what the cheapest solution this brilliant CS PhD discovered is?

      Ask slashdot, of course!

      Nice try, google, you evil overlord...

      /tinfoilhat

    3. Re:Easy. by ComputerSherpa · · Score: 2, Informative

      I've had GMail go down once or twice in the year I've had my account. Problem might be on your end. Holy frick, I've had my GMail account for a year. And three days. O.O

      --
      Information wants to be anthropomorphized!
  3. Kerio by Anonymous Coward · · Score: 2, Informative

    I would start by talking to Kerio; their mail server is very scalable. www.kerio.com

    1. Re:Kerio by epiphani · · Score: 3, Informative

      Or outsource the whole damn thing. There are dozens of providers out there that could drop a rack worth of gear into your datacenter and maintain the whole thing with plenty more experience in handling mail systems of that size. And at that level, I'm sure you'd have no problem getting it branded however you like.

      disclaimer: I work for one of those companies.

      --
      .
  4. I'd start by by technoextreme · · Score: 4, Funny

    bashing my head up against a desk.

    --
    Ooo man the floppy drive is broken. No wait. The computer is just upside down.
    1. Re:I'd start by by moranar · · Score: 5, Funny

      Ah! Sendmail!

      --
      "I think it would be a good idea!"
      Gandhi, about Internet Security
    2. Re:I'd start by by Viper233 · · Score: 2

      Ah! Sendmail!
      I think bashing my head into my desk would result in far less pain (and brain damage) than trying to run sendmail....
      Yes, I'm bashing sendmail

    3. Re:I'd start by by jdunn14 · · Score: 2, Funny

      Come on, just move the keyboard under your head and bash away.... it'll make a valid sendmail config....

  5. Um... by Stevyn · · Score: 3, Informative

    I'd start by contacting people who know how to do it and can actually help you. A few responses on slashdot aren't going to help you along the entire process. Maybe even bring in a consultant.

    1. Re:Um... by ugo · · Score: 3, Funny

      I think he is the consultant.

  6. qmail by tadauphoenix · · Score: 2, Insightful

    I've always favored it, and with some scripting/automation, I wouldn't see why you couldn't scale that large with inexpensive hardware.

  7. For starters... by cached · · Score: 3, Interesting

    For starters, uptime should usually be higher than 99.9% for a site this large. 99.9% uptime means 40-45 minutes of downtime a month. Try going for at least 99.99%, though from what I saw a few years back, this usually increases the cost by about 250%.
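    The arithmetic behind those figures, as a short Python sketch (the function name is hypothetical): the downtime budget is the unavailable fraction times the length of the period. With a 30-day month, 99.9% allows 43.2 minutes and 99.99% about 4.3 minutes; months of 28 to 31 days give roughly 40 to 45 minutes at three nines.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Minutes of allowed downtime in a period at a given availability."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * 24 * 60


for nines in (99.9, 99.99, 99.999):
    print(f"{nines}%: {downtime_budget_minutes(nines):.1f} min per 30 days")
```

    Each extra nine cuts the budget by a factor of ten, which is why planned maintenance windows alone can consume a 99.99% target.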

    --
    +1 funny, -2 overrated. Life isn't fair.
  8. for spam... by file+cabinet · · Score: 2, Informative

    Take a look here: http://www.webhostingtalk.com/showthread.php?threadid=441925; the post by slidey is possibly the most useful.

  9. NO GMAIL by Anonymous Coward · · Score: 2, Informative

    I would have to say use Qmail on a FreeBSD/Linux system. If you look at Yahoo, they have millions of email accounts and use qmail, which is very stable and very portable.

    1. Re:NO GMAIL by Alan+Hicks · · Score: 4, Interesting
      I would have to say use Qmail

      My God no! Friends don't let friends use qmail. Want reasons why?

      1) It's a bitch to install. Won't even compile on modern Linux distributions. You have to patch it to compile it and the patch isn't even hosted on qmail's site.
      2) It's a bitch to configure. Rather than parsing a single configuration file, qmail relies heavily on the presence of individual files in a directory.
      3) Not not not not scalable! That's a myth. Doesn't properly batch jobs together. Hell! qmail was originally designed to be run from inetd!
      4) Heavy reliance on other daemontools.
      5) Breaks well-known and understood UNIX standards.
      6) Security through lack-of-functionality.
      7) Not really secure despite the claims.
      8) No longer maintained.
      9) No features. Adding them requires patching, and patching, and more patching.

      Serious sysadmins don't use qmail and for damn good reason. I don't give a damn if Yahoo did manage to string it together and make it work well. In short, qmail isn't particularly suited for deployment in any capacity.

      --
      Slackware, what else when it must be secure, stable, and easy?
    2. Re:NO GMAIL by Denis+Lemire · · Score: 5, Interesting

      Definitely agree on point 9. I maintain a mail server for over 2,000 users. Currently running Qmail with the following patches:

      chkuser-2.0.8b-release.tar.gz
      doublebounce-trim.patch
      netqmail-1.05-tls-20050329.patch
      outgoingip.patch
      qmail-smtpd-auth-0.31.tar.gz
      qmail-smtpd-auth-close3.patch
      qmail-smtpd_gmfcheck.patch
      qmail-spf-rc5.patch

      Most of these patches require hand-editing the sources and Makefiles to successfully merge them all into the stock qmail or netqmail base. Lots of manually reading through *.rej files to make it all work.

      In order to simplify new installations I've created my own personal CVS repository for my Qmail sources. I commit changes to the tree whenever a new patch comes out with functionality I need. Hence on a new install I simply check out my custom tree and compile.

      The initial work was a royal pain in the ass, however, once it is all up and running the stability and performance has been excellent.

  10. 1 Million Users! by joesucks · · Score: 2, Informative

    Wow, that is a pretty huge scale, but Google, MSN and Yahoo have supported that many users, and many more, all along, so why not open the back doors and see what they are doing? If it were me: Linux, obviously, high-availability clusters, and some kind of solid indexing. It's still email :)

  11. POP? by lseltzer · · Score: 4, Funny

    A million users and they want POP3? Add a gun and a single bullet to your administration requirements.

    1. Re:POP? by JoshWurzel · · Score: 5, Funny

      I'd ask for six bullets. Why would you want to risk getting the empty chamber?

    2. Re:POP? by tktk · · Score: 4, Funny

      I'd ask for enough bullets to handle the department that's making you do this.

    3. Re:POP? by Anonymous Coward · · Score: 2, Funny

      I'd ask for 900,000 bullets

    4. Re:POP? by mre5565 · · Score: 5, Interesting
      A million users and they want POP3? Add a gun and a single bullet to your administration requirements.
      No doubt a well-deserved +5 for humor, but for those of us less in the know (and a chance at another +5 for Informative), what is so bad about POP3? Thx.
    5. Re:POP? by euxneks · · Score: 3, Funny

      I'd ask for six bullets. Why would you want to risk getting the empty chamber?

      Exactly!

      Remember, redundancy is good! ;P

      --
      in girum imus nocte et consumimur igni
    6. Re:POP? by Mr.+Underbridge · · Score: 5, Funny

      I'd ask for six bullets. Why would you want to risk getting the empty chamber?

      I see that you are familiar with the subtle nuances of Polish Roulette.

    7. Re:POP? by QuasiEvil · · Score: 2, Insightful

      If my company would only go BACK to POP3, my life would be so much easier. First, we moved from POP3 to IMAP. No big deal, though I don't care for IMAP and the whole remote-folder thing; it just required me to modify fetchmail to dump mail into the mail spool on my Linux box, same as always. Then I set up Eudora on my Windows box to leave mail on the Linux box for 2 days. That way I can use Eudora as I want, mail is stored on my Windows box, and I can read it using pine over SSH for 48 hours. Worked great, did everything I needed for five years.

      As of six months ago, we have Exchange/Outlook, and no POP3/IMAP access to the server at all. You're stuck with Outlook or webmail based on how it's configured. After much reconfiguration, I finally got Outlook to behave mostly the way I want - including delivering mail locally rather than leaving it on some server a thousand miles away (literally, not joking here). Now if I didn't hate everything about Outlook...

      All I want, and all I've ever wanted, is to be able to grab my messages easily and put them on my machine, not stored on a server somewhere. POP3 is great for that. It does absolutely everything I want and need for mail, and it's dead simple. Even if you don't make it the standard implementation, it'd be nice if admins everywhere left those of us who know what we're doing the option of using it.
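      For the curious, the download-and-delete workflow the parent likes takes only a few lines with Python's standard poplib. This is a sketch: the host and credentials are placeholders, and save is a hypothetical callback that writes one raw message wherever you keep mail locally. In POP3, DELE only marks a message; the server removes marked messages when the client sends QUIT, so if saving fails partway and the session drops without QUIT, nothing is deleted.

```python
import poplib
from typing import Callable


def connect(host: str, user: str, password: str) -> poplib.POP3_SSL:
    conn = poplib.POP3_SSL(host)  # implicit TLS, port 995 by default
    conn.user(user)
    conn.pass_(password)
    return conn


def fetch_and_delete(conn, save: Callable[[bytes], None]) -> int:
    """Download every message, hand it to save(), then mark it for deletion."""
    count, _mailbox_size = conn.stat()
    for i in range(1, count + 1):  # POP3 message numbers start at 1
        _response, lines, _octets = conn.retr(i)
        save(b"\r\n".join(lines) + b"\r\n")  # retr() strips line endings
        conn.dele(i)  # only marked; removed when QUIT succeeds
    conn.quit()
    return count
```

      Usage would look like fetch_and_delete(connect("pop.example.com", "user", "secret"), save), with the hostname and credentials as placeholders. Saving before marking each message means a crash can at worst cause duplicates on the next run, never lost mail.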

    8. Re:POP? by lukewarmfusion · · Score: 3, Insightful

      I was curious about that, too...

      Wal-mart has an estimated 1.6 million employees. (source)

      General Motors, by contrast, has approximately 360,000 employees.

      The post says "around one million accounts" which is very different from one million employees. I have over ten email accounts that I actively use for receiving mail and four to six for sending.

      An ISP could easily have millions of accounts. But since he said "huge" company, they were using Exchange, and because he's asking Slashdot my guess is that he's not at an ISP. Instead, I'd guess he's at a medium-sized company that might offer email accounts to its customers or at a large company that also contains many subsidiaries (but wants one email domain for all of those).

    9. Re:POP? by sgt_doom · · Score: 2, Funny
      SO! China's Department of Public Security has finally gotten around to developing its own email.

      At last!

    10. Re:POP? by Anonymous Coward · · Score: 4, Insightful

      what is so bad about POP3

      Having never been near a computer, I have no idea. If I had to guess, I'd suppose that with a million users, 100,000 of them will have to be constantly reminded to delete their mail off the servers. 25,000 of them won't EVER delete their mail no matter what you do, and 5,000 will bitch and whine when you cap their fucking mailboxes. One of them will be the CEO, and he'll berate you in front of his smarmy suspender-wearing jerkoff golf buddies because you're a dumb hick that can't fit a terabyte of mp3s and porn (most of it redundant for chrissakes) into only 500 gigs of disk. You will also get to deal with countless issues involving different email clients. You would give almost anything to have a massive natural disaster wipe everything out so you didn't have to go to work tomorrow, but there's the wife and kids, so y'know, there it is.

  12. ~ 320K accounts by Anonymous Coward · · Score: 5, Informative

    At IBM we use Lotus Notes which has saved us LOTS of virus hassles. Every employee has an account and we're something like 320,000 worldwide. The mail "databases" are spread among Domino servers but I don't know what platform these run on, or what hardware specs they have. I imagine it's either Windows or Linux... but who knows, maybe we're using some of our PowerPC-based iSeries servers. These are the boxen formerly known as AS/400.

    1. Re:~ 320K accounts by DaveCar · · Score: 5, Funny

      The mail "databases" are spread among Domino servers

      Yeah, but we all know what happens when one of these Domino servers falls over ...

    2. Re:~ 320K accounts by Nefarious+Wheel · · Score: 2, Informative
      I've designed and administered Exchange, Notes, DEC All-In-One, a few *nix based mail systems and a few others, some of them quite large (water utilities, national postal systems among them). Notes took over the role of most egregiously unpleasant mail system to set up or administer when MS Mail died. Very admin-hostile.

      Argue for your favorite all you want, but friends don't specify Lotus Notes to friends.

      --
      Do not mock my vision of impractical footwear
    3. Re:~ 320K accounts by bittmann · · Score: 2, Insightful
      320,000 accounts on a single iSeries? Child's play. I doubt that IBM is only on "one box" though, given the wide-ranging network that Big Blue maintains.

      1 million total users at 99.9% uptime as per the original request? Not exactly "child's play", but honestly, not much harder.

      Domino on iSeries does seem to be a reasonable option for a deployment of this size, especially given the rather generous uptime allocation that is being offered..."3 nines" being EXTREMELY generous for an iSeries shop (you'd even be able to schedule monthly downtime on purpose and still meet this uptime goal.)
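
      To put the "3 nines" figure in perspective, the permitted downtime is simple arithmetic: (1 - availability) x period. A minimal sketch, assuming a 365-day year and a 30-day month (both simplifications):

```python
HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year
HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month

def allowed_downtime_hours(availability: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours

for nines in (0.999, 0.9999, 0.99999):
    per_year = allowed_downtime_hours(nines, HOURS_PER_YEAR)
    per_month_min = allowed_downtime_hours(nines, HOURS_PER_MONTH) * 60
    print(f"{nines:.3%}: {per_year:.2f} h/year, {per_month_min:.1f} min/month")
```

      At 99.9% that works out to about 8.76 hours a year, or roughly 43 minutes per 30-day month, so a short monthly maintenance window can fit inside the budget.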

      I do note that IBM has benchmarked Domino on a 16-way Power5-based iSeries at a 33ms response time for 175,000 concurrent users (details here: http://www-03.ibm.com/servers/eserver/iseries/domino/scalerecord.html)...and given the limited usage pattern of POP3 (yuck!), a properly-deployed solution should be able to meet the published needs with just one server. AND provide backup. AND enable the user to restore an individual mail store, mail box, or object on-demand. If high-availability or higher performance is necessary, 2 servers could be deployed in several different configurations (mirrors, clusters, HA failover, etc.).

      And if the moans of "Outlook-only users" get to be too much of a problem, IBM offers a "connector" that can offer MAPI access to Domino's mail store.

      Hell yes, I'm an iSeries fanboy. Those machines have proven themselves to be reliable, capable, economical systems over the long haul. Now, due to price, I wouldn't suggest deploying an iSeries as a simple file, print, web, or small-database server...but when you need to move freight, and *lots* of it, without spending hours every week operating and administering the system, it's hard to beat the venerable System/38 né AS/400 né iSeries.

  13. It's obvious by gulfan · · Score: 4, Informative
    Your first bet would be Ask Slashdot.

    However, I'd personally ask Google. They've done it, and even their search engine has information. From there I found an interesting Linux Journal article detailing the deployment of a hundred-thousand-user mail system, covering everything from the architecture to the software.

  14. Who to talk to by Effugas · · Score: 2, Informative

    I've heard surprisingly good things about Communigate Pro, though I have no idea if it scales that high.

    Mirapoint is probably _the_ vendor to speak to, though.

  15. openwave's email server does this but it's $$$ by Serveert · · Score: 2, Informative

    I'm sure other commercial vendors have it, but I do know that large companies like AT&T et al. use it to handle their email. It's a shrinkwrap product that does it all and then some, but it's very pricey.

    I'm sure you could hack together something to do this much like what google did. Might take some time but it's totally doable.

    --
    2 years and no mod points. Join reddit. Because openness is good.
    1. Re:openwave's email server does this but it's $$$ by dougnet · · Score: 2, Interesting

      I ran an InterMail MX system for about 3 years for a national ISP. The company that sells InterMail was called Software.com at the time... and then they merged with phone.com, and the combined entity was renamed Openwave. They provide many of the browsers used on cell phones... check an old phone and it probably says "phone.com"; a newer one will say "openwave". I used version 4.x of their InterMail MX product primarily and had a little experience with version 5.0.

      It is a fairly complex system but is obviously very powerful. The system used an Oracle database for all user information (LDAP on the front end, with the data stored in an Oracle DB on the back end) and also used an Oracle database for each Message Store server. For example, if an e-mail message was sent to 2000 users on your system, one instance of the message was saved to disk (in a hashed directory structure) and 2000 "links" were stored in the Oracle DB. Once all 2000 links were deleted (i.e. all users deleted the message), a garbage collection process would remove the message file. This can obviously save a lot of space on a busy system.

      The server scaled by adding Message Store Servers (MSS) and front-end POP/IMAP/Web servers. The front-end servers are typically set up for load balancing with F5 BigIPs or the like. The back-end servers (directory/LDAP server, MSS servers) are less redundant and require a cluster/HA solution. We had a 3-to-1 fail-over for our directory server and two MSS servers to one stand-by system. This was at least US $2M of hardware by the time you added an EMC Symmetrix for multiple TB of storage. This was a while ago and you may not need to use a tier 1 storage vendor... but when you're talking 1 million users and 99.9% uptime, you can't just throw something together and cross your fingers.

      Openwave also offered an InterMail Kx solution (thousands of users rather than millions of users) that was less complicated. Below that was post.office. The price at the time was negotiable and was generally based on the number of users. Their support was generally quite good. They appear to call the product Email MX now: http://www.openwave.com/us/products/wireline/email_mx/index.htm

      The main reason companies choose (or stay with) MS Exchange really comes down to these two things:
      1) Integration of the Windows Domain with the E-Mail account (often single sign on).
      2) Integrated Calendar

      I'm not sure if Openwave offers something comparable now with their product, but I'd much rather run a system with that many users on a Unix platform than on a ton of Windoze systems. As other posters have mentioned, if it is properly architected... many different options are possible.
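
      The single-instance scheme described above (one copy of the message body, one "link" per recipient mailbox, garbage collection once the last link is gone) can be modeled as a reference-counted store. This is a minimal in-memory sketch only; the class and method names are hypothetical, and InterMail itself kept the links in Oracle and the bodies in a hashed directory tree.

```python
import itertools
from collections import defaultdict

class SingleInstanceStore:
    """Hypothetical sketch: one stored body per delivered message, one link per mailbox."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.bodies = {}                   # message id -> body (stands in for the file on disk)
        self.refcount = {}                 # message id -> number of mailboxes linking to it
        self.mailboxes = defaultdict(set)  # user -> message ids in that user's mailbox

    def deliver(self, body: bytes, recipients) -> int:
        msg_id = next(self._ids)
        self.bodies[msg_id] = body         # written once, however many recipients
        self.refcount[msg_id] = 0
        for user in set(recipients):       # one link per distinct recipient
            self.mailboxes[user].add(msg_id)
            self.refcount[msg_id] += 1
        return msg_id

    def delete(self, user, msg_id) -> None:
        if msg_id in self.mailboxes[user]:
            self.mailboxes[user].discard(msg_id)
            self.refcount[msg_id] -= 1

    def collect_garbage(self) -> int:
        """Remove bodies no mailbox links to any more; return how many were removed."""
        orphans = [m for m, n in self.refcount.items() if n == 0]
        for m in orphans:
            del self.refcount[m]
            del self.bodies[m]
        return len(orphans)

store = SingleInstanceStore()
users = [f"user{i}" for i in range(2000)]
mid = store.deliver(b"Subject: all-hands\r\n\r\nMeeting at 10.\r\n", users)
print(len(store.bodies), store.refcount[mid])      # 1 2000
for u in users:
    store.delete(u, mid)
print(store.collect_garbage(), len(store.bodies))  # 1 0
```

      Deleting from a mailbox only drops a link; the body is reclaimed later in a separate pass, which keeps the delete path cheap on a busy store.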

  16. CommuniGate by Anonymous Coward · · Score: 2, Informative

    www.stalker.com

    Is able to run clusters, and clusters of clusters, and theoretically scale into the hundreds of millions of accounts. Offers all the things you want, and more. LDAP, ACAP, etc, etc, integrated webmail. Intelligent directory creation structures, etc.

    1. Re:CommuniGate by p0rkmaster · · Score: 3, Informative

      I second that recommendation. I've been running CommuniGate Pro for many years now, and I love it. There's a cellphone provider in Sweden that is hosting over a million accounts on a single 8-processor server - but for your requirements I'd probably recommend looking into CommuniGate's clustering solutions.

      --
      ... I like to keep an open mind, but not so open that my brains fall out. - Judge Harry Stone, Night Court
  17. earthlink's setup by Triumph+The+Insult+C · · Score: 2, Interesting

    earthlink's mail server complex has come up on freebsd-isp a few times

    this guy used to work at both sendmail and earthlink and he has links to some good resources

    --
    vodka, straight up, thank you!
  18. Please Please Please by xactuary · · Score: 2, Funny
    Let me send your peeps a million .mac invites. Then I'd be set for life! Mmmmwwwhhaaaaa!

    If that's too rich for ya, how about gmail invites? Slashdotters could come up with a million of those I bet.

    --
    Say hello to my little sig.
  19. Vendors by XorNand · · Score: 4, Interesting

    I'd start by talking to vendors. Consult with some sendmail gurus, Notes guys, etc. Any of these people/companies would salivate at the thought of being a part of a project this large. First, talk to the client and hammer out the real needs with solid performance requirements, timeframes, growth expectations, (meaning real numbers) etc. Put together a well thought-out Request For Proposal and send it out to as many applicable vendors as interest you. Then just stand back and play the role of ringmaster. The vendors will give you all the ideas you need.

    Just do one thing, please: make sure that the client is honest-to-goodness serious about this. I absolutely hate getting pie-in-the-sky RFPs from people who are just kicking the tires. It's a good way to burn bridges by not looking professional.

    --
    Entrepreneur : (noun), French for "unemployed"
    1. Re:Vendors by This+is+outrageous! · · Score: 3, Funny
      hammer out the real needs with solid performance requirements, timeframes, growth expectations, (meaning real numbers)
      Integers, kid. INTEGERS.

      Those newfangled "real numbers" are nothing but bullet-point creeping featuritis. Integers, on the other hand, have been around since at least Kernighan & Ritchie. They do one thing and do it well. Keep true to the Unix philosophy! Real numbers in information technology? Just say NO.

      --
      This is...

      O
      U
      T
      R
      A
      G
      E
      O
      U
      S

      !

  20. Split up the tasks by jgardn · · Score: 4, Informative

    There are three parts to your system: sending mail, receiving mail, and storing mail. Keep them separate.

    Your receivers will be a bank of servers running sendmail. They will do appropriate spam processing to reduce the amount of mail actually received. They feed the data into the storage servers.

    The storage system has the data partitioned so that all the data for one user goes to one server while all the data for another user goes to a different one. The storage system also has to provide POP and IMAP access. You may want a special setup where the IMAP or POP service knows which server to go to. Investigate having one giant virtual filesystem so that the system isn't too complicated.
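
    One simple way for the POP/IMAP layer to know which storage server holds a given user's mail is a deterministic mapping from address to server. A minimal sketch, assuming a fixed list of hypothetical store host names and a stable hash; many real deployments instead record each user's server in the directory (e.g. an LDAP attribute) so mailboxes can be moved one at a time.

```python
import hashlib

# Hypothetical message-store host names.
STORE_SERVERS = ["store01", "store02", "store03", "store04"]

def store_for(user: str, servers=STORE_SERVERS) -> str:
    """Map a mailbox address to one storage server with a stable hash.

    Python's built-in hash() is salted per process, so a hashlib digest is used
    to get the same answer on every front-end machine.
    """
    digest = hashlib.sha1(user.strip().lower().encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

print(store_for("alice@example.com"))
```

    Plain modulo hashing reassigns most users whenever a server is added, so for a growing system a directory lookup or consistent hashing avoids mass mailbox migrations.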

    Your webmail access will use IMAP to access the actual mail. It can be a completely different system.

    The sending system will be a chokepoint for all outgoing mail. You are going to scan it as it goes out to look for virus-sent emails or unauthorized messages. For instance, you may want marketing email to be processed differently than inter-office email and such.

    All of these systems will be running sendmail. I know sendmail has a bad rap for being insecure, but the insecurities have been found and since fixed. It is by far the most manageable system when it comes to large-scale deployments with heavy customization.

    --
    The radical sect of Islam would either see you dead or "reverted" to Islam.
    1. Re:Split up the tasks by michelcultivo · · Score: 4, Insightful

      And please don't forget to use Maildir for email storage, it's very good for backup and very easy to manage.
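
      Maildir's appeal is that each message is a separate file, written into tmp/ and then moved into new/, so delivery needs no mailbox locking and backups can copy files incrementally. Python's standard-library `mailbox.Maildir` class implements that layout; the sketch below uses a hypothetical per-user path inside a temporary directory.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path

def deliver_to_maildir(maildir_path: str, msg: EmailMessage) -> str:
    """Deliver one message into a Maildir, creating tmp/, new/ and cur/ if needed.

    Returns the key that mailbox.Maildir assigned to the message.
    """
    box = mailbox.Maildir(maildir_path, create=True)
    return box.add(msg)  # written under tmp/, then moved into new/

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "hello"
msg.set_content("Maildir test")

with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp) / "bob"  # hypothetical per-user Maildir location
    key = deliver_to_maildir(str(user_dir), msg)
    print(sorted(p.name for p in user_dir.iterdir()))  # ['cur', 'new', 'tmp']
```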

    2. Re:Split up the tasks by UndeadDude · · Score: 2, Interesting

      Having dealt with sendmail at scale, I would definitely say no. And if you think that it is the most configurable, sounds like there are some MTAs you still need to check out. I recommend Exim.

      I agree that you want to split things up-- make farms of large numbers of servers to make horizontal scaling easy. Store your user info in LDAP (OpenLDAP works very well, with very good data replication in 2.3.x). Most common server software will support LDAP and it scales very well.

      You need "layer-4 switching" to load balance across machines, and automatically disable systems/services that are down. You need something that will cluster. I recommend Foundry ServerIron switches. F5 BigIP is another common alternative.
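
      The essential behavior of a layer-4 balancer, spreading connections across a pool and skipping members that fail their health checks, can be modeled in a few lines. This only illustrates the idea, assuming a hypothetical `health_check` callable; it is not how a ServerIron or BigIP is actually configured.

```python
import itertools
from typing import Callable, Sequence

class RoundRobinPool:
    """Round-robin over backends, skipping any that fail a health check."""

    def __init__(self, backends: Sequence[str], health_check: Callable[[str], bool]):
        self.backends = list(backends)
        self.health_check = health_check
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> str:
        # Try each backend at most once per call so a fully-down pool fails fast.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if self.health_check(candidate):
                return candidate
        raise RuntimeError("no healthy backends in pool")

down = {"imap02"}  # hypothetical: pretend this server failed its health check
pool = RoundRobinPool(["imap01", "imap02", "imap03"], lambda host: host not in down)
print([pool.pick() for _ in range(4)])  # ['imap01', 'imap03', 'imap01', 'imap03']
```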

    3. Re:Split up the tasks by Antique+Geekmeister · · Score: 2, Informative

      You wrote: "Your receivers will be a bank of servers running sendmail. They will do appropriate spam processing to reduce the amount of mail actually received."

      That's 2 tasks. This requires absolutely robust, absolutely lightest-weight email servers, with serious caching. Sendmail can do it; Postfix can do it, and is vastly easier to manage. The syntax of sendmail configurations is just too arcane for most of us to deal with.

      Definitely add blacklist filtering and SPF on the front end, to reduce the load on all your other servers of handling and processing the spam. And very definitely create an SPF record for your own domain: much of your email will be to and from people inside your own domain, and being able to throw out all forgeries and inappropriately sent emails before wasting time on sophisticated virus or spam checking is a huge, huge, huge CPU win.

      This is in fact a big enough project that I'd contact Novell: they have support for those hardcase Outlook clients, they have good calendaring for Linux with their latest Evolution email clients and matching servers, and they've worked very hard on things at this scale.
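
      The cheap "throw out forgeries of our own domain before expensive scanning" check can be modeled as: if the envelope sender claims our domain, the connecting IP must fall inside the networks our SPF record authorizes. This is a deliberately simplified sketch, assuming a hypothetical domain and network list hard-coded rather than fetched from DNS; a real deployment should use a proper SPF implementation that evaluates the published record.

```python
import ipaddress

OUR_DOMAIN = "example.com"  # hypothetical domain
# Hypothetical networks, roughly what "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/25 -all" authorizes.
OUR_SENDING_NETWORKS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/25")]

def is_forged_own_domain(envelope_from: str, client_ip: str) -> bool:
    """True if the sender claims our domain but connects from outside our networks."""
    domain = envelope_from.strip().strip("<>").rpartition("@")[2].lower()
    if domain != OUR_DOMAIN:
        return False  # not our domain; leave it to the normal SPF/spam checks
    addr = ipaddress.ip_address(client_ip)
    # Membership is False across IP versions, so IPv6 clients fall through as unauthorized.
    return not any(addr in net for net in OUR_SENDING_NETWORKS)
```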

    4. Re:Split up the tasks by schave · · Score: 2, Informative

      Many people are now putting e-mail security devices in front of the "receivers".

      Products such as Ironport, Openwave Edge Gx, and Symantec Mail Security use technologies such as traffic shaping, reputation services, directory harvest attack detection, etc. to help keep spam out of your network.

    5. Re:Split up the tasks by fwc · · Score: 3, Informative
      This is right on the mark. I would differ in a few implementation details (aka I hate sendmail with a passion), but this is the way we do it at a medium-size ISP with a mail server "cluster" running in the thousands of mailboxes category.

      In short, we have mail servers accepting the mail and dropping it on a shared NFS server which stores all the mail. The incoming servers run spam and virus filtering and are responsible solely for delivering the mail to the customer's mail directory, which lives on the NFS server.

      On the client side, we run IMAP and POP3 servers which access the stored mail on the NFS server to deliver it to the clients.

      The exact software used for each of these functions is somewhat irrelevant. Once you split things up this way, you can also split the selection process, i.e. "which is the best server for accepting SMTP mail and dumping it in customers' mail directories?" can get a completely different answer from "what is the best NFS (or SAN) server to use to store the mail?", "what IMAP server should we be using?", "what webmail front end should we be using?", and so on.

      It also makes changing your mind down the road on any piece easier since you can actually run and test any one of these components in the live system as a final test before moving a replacement into the system.

      FWIW, I would *love* to consult on something this scale.

    6. Re:Split up the tasks by JChris · · Score: 2, Informative
      Your receivers will be a bank of servers running sendmail. They will do appropriate spam processing to reduce the amount of mail actually received.

      You might give serious consideration to outsourcing your spam and virus filtering.

    7. Re:Split up the tasks by drsmithy · · Score: 2, Informative
      There are three parts to your system: sending mail, receiving mail, and storing mail. Keep them separate.

      I would argue that should be sending, receiving, accessing, storing. I'm not so sure sending and receiving need to be separated either.

      The storage system has the data partitioned out so that all the data for one user would go to one server while all the data for another will go to a different one.

      Uh, sounds to me like you're suggesting a storage system for every user. I'm sure that's not what you meant, but it's what you wrote :).

      The storage system also has to provide POP and IMAP access. You may want a special setup where the IMAP or POP service known which server to go to. Investigate having one giant virtual filesystem so that the system isn't too complicated.

      You should separate where the mail is stored from where the mail is accessed, i.e. your IMAP and POP servers access a mail store on a SAN or NAS. Depending on load, things like Webmail might require yet another layer of separation (i.e. Webmail <-> IMAP <-> Storage).

      It's really important to separate the mail access system(s) from where the mail is actually stored, otherwise you are building a system with single points of failure and performance bottlenecks.

      All of these systems will be running sendmail.

      That would be an absolute nightmare. Postfix is just as functional and orders of magnitude easier to administer.

      Although, as I've said elsewhere, if this place really does have a million-seat Exchange environment, they're almost certainly not going to be able to replace that with Squirrelmail, IMAP and Postfix. Exchange does a hell of a lot more than just sending emails back and forth.

    8. Re:Split up the tasks by Niten · · Score: 2, Informative

      Good point. One thing to be aware of when using Maildir, though, is that since each message is stored in its own file you'll have to make sure you configure your filesystem so that it can handle holding a massive number of files/messages. If you configure an ext3 partition with the default number of inodes, for example, then with one inode per message you might find yourself running out of inodes before you run out of disk space.
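
      Because every Maildir message consumes an inode, it's worth monitoring inode headroom alongside free space. A minimal sketch using `os.statvfs` (POSIX-only); the 10% threshold is an arbitrary assumption.

```python
import os

def headroom(path: str) -> dict:
    """Fraction of free blocks and free inodes on the filesystem holding `path`."""
    st = os.statvfs(path)
    free_blocks = st.f_bavail / st.f_blocks if st.f_blocks else 0.0
    # Some filesystems report f_files == 0 (no fixed inode table); treat that as unlimited.
    free_inodes = st.f_favail / st.f_files if st.f_files else 1.0
    return {"blocks": free_blocks, "inodes": free_inodes}

def inode_warning(path: str, threshold: float = 0.10) -> bool:
    """True if the fraction of free inodes has dropped below `threshold`."""
    return headroom(path)["inodes"] < threshold

h = headroom(".")
print(f"free space {h['blocks']:.1%}, free inodes {h['inodes']:.1%}")
```

      If the inode fraction is falling faster than the block fraction, the filesystem will fill up on inodes first, and the fix (more inodes at mkfs time) means rebuilding the partition.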

    9. Re:Split up the tasks by thogard · · Score: 2, Interesting

      All of these systems will be running sendmail.

      That would be an absolute nightmare. Postfix is just as functional and orders of magnitude easier to administer.


      If it's a million seats, it's not going to be easy to admin at all. It will require several people that know MTAs inside and out, and sendmail has a track record in very large systems.

      Remember that in this case, the job will be 100% running an email system so the best tool for the job should be used, not the best tool for the admin.

    10. Re:Split up the tasks by thogard · · Score: 2, Informative

      Lots of things should be dead and buried (like :wq in your sig; where did that come from? There are very few people who were using ex in the days before :x or ZZ).
      However sendmail isn't one of them. Just because there was an issue with m4 (also something that should be gone) doesn't mean the core app is broken. M4 use started over a decade ago when even awk wasn't consistent on all unix systems.

      On one setup that has both sendmail and postfix, I know postfix loses far more mail than sendmail (which has lost none).

      I use sendmail because I can make it do everything I want it to, and sometimes I have to have an MTA that does odd or unusual things. I've spent time and learned how to make it do very unusual things (at the .cf macro level) and it is very powerful since it has a full programming language built in. I've used it in very large installations and it works and it keeps on working. It hasn't ever let me down. I can't say that for the other MTAs.

      Also a complete rewrite of sendmail is being done right now. Too bad it's taking away all the cool low-level macros, but I expect most people will find that an advantage.

  21. New Google Appliance by Anonymous Coward · · Score: 3, Interesting

    I agree. The Google appliance should implement Gmail and a web front end for administration. Like the Cobalt machines of yore, only better. Google-ified.

    It really is the best email.

  22. Novell? by lorien420 · · Score: 2, Informative

    www.myrealbox.com is a tech demo of NetMail and eDirectory.

    --
    "[We'll be] really getting inside your head and making it an unpleasant place to be" -- Trent Reznor
  23. If theyre using exchange by gad_zuki! · · Score: 2, Insightful

    they're probably using the groupware too. Are they also willing to ditch outlook?

    If you're looking for a groupware replacement, then you've got a big job ahead of you. Scalix is a mess, Bynari is a hack, etc. Even when you do get them running, the things end users end up buying, like PDAs and apps that hook into Outlook, are going to cause more problems.

    If it's just POP/IMAP you really can't go wrong. A good webmail option is kinda a catch. Squirrelmail is nice, but compared to OWA it's really out of its league.

    If your post told us what they were fed up with and how they used their system you'd get some real advice. Expect the usual postfix vs qmail vs sendmail vs whoever mini-flamewars.

  24. Unless It's A Very Old Exchange System... by zentec · · Score: 2, Insightful

    ...they need to think about this very carefully.

    I'm sure someone, somewhere within the enterprise is using features of Exchange that they won't get anywhere else. Not to sound like a Microsoft fan-boy sock puppet, but there are some features that Exchange has that people in a business environment just love.

    However, since you asked. I'd run Exim or Qmail and Cyrus IMAP.

  25. For the lazy... by Spy+der+Mann · · Score: 5, Informative

    Here's Slidey's post. (Disclaimer: Copyright blahblahblah appropriate people yadda yadda fair use etc etc don't sue me, thank you)

    ---
    ok i work for a large uk isp in the messaging (email) operations dept. we currently have 2.5-3 million active accounts (and a load of suspended), and manage anywhere up to 12-16 million mails per day

    our setup is like this (this is simplistic though):

    front line - anti abuse mta's - these do dnsbl type lookups (spamcop, spamhaus and sorbs). we have 9 incoming
    next we have mta's. they farm mail off to brightmail servers, which do similar to spamassassin. we have 6 incoming mtas, and 8 brightmail servers (not enough - high load)
    after that they farm off to vscans (6)
    after that any mail that gets through is delivered to mail stores (8 + 2 hot spares)

    what you want to be doing is similar to the above - chaining the mail from one level to the next. the first level should be the rbl's - these are less processor intensive, and can remove a fair whack of your mails in one swoop. spamassassin is going to be more cpu intensive, since it has to open each mail and read the first x many bytes

    i'd have separate machine(s) holding your master directory, and if you can get directory caches then do that too (to take the load off the master directory) - ours run oracle

    i don't know what your budget is, but split up the different tasks as much as possible. that way if you need to add more to any pool (rbl lookups, spamassassin etc) you just add another machine.

    one last thing - we also have a separate box just for postmaster mail (with exim + spamassassin funnily enough) - it tends to get busy

    Last edited by Slidey on 09-08-2005 at 11:19 PM
    --
    (end of quote)
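
    A DNSBL lookup, the cheap first-stage check described in the quote, works by reversing the octets of the connecting IPv4 address, appending the blocklist zone, and checking whether the name resolves. A minimal sketch, assuming Spamhaus's `zen.spamhaus.org` zone as the default (check each list's usage policy before querying it at volume); IPv6 is not handled.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNSBL query name for an IPv4 address, e.g. 1.2.3.4 -> 4.3.2.1.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that isn't IPv4
    return ".".join(reversed(str(addr).split("."))) + "." + zone

def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """True if the blocklist returns an A record for the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN means "not listed"; note that resolver failures also land here (fail open).
        return False
```

    Because a lookup is just a DNS query (and usually cached by a local resolver), it costs far less than opening and scanning the message, which is why it belongs at the front of the chain.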

    1. Re:For the lazy... by therus121 · · Score: 3, Informative
      I work with Slidey, but in the Solutions side of the team (i'm the guy who architects the infrastructure of the platform). Here's a few additions:

      1. Storage - "Disks, lots of Disks" - we use EMC DMX3000s for the stateful machines (~180TB raw), which work very nicely.

      Your back end needs to handle lots of small random writes - this makes storage vendors cringe when mentioned, as it makes a mockery of their lovely benchmarks.

      2. Clustering - you'll need that also on your master directory and message stores. Veritas is nice.

      3. Load balancing - For the front end boxes (pop, imap, web). Cisco CSS's are pretty good for this.

      4. OS - We run Solaris. It might not be the fastest thing around, but it works pretty much non-stop; has good vendor support and is very mature. RedHat might be on the horizon as well as Solaris for x86. Windows? don't be daft.

      5. Test environment. Have a scaled down exact copy of the production system to test things on. i can't stress how important this is.

      6. Proper automated server build procedure. One word - Jumpstart. All OS and application configs and builds in Jumpstart. So if you lose a box, it's no big deal rebuilding it at 3am on Saturday morning when you've had a bevvy or two the night before, and all you feel like doing is chundering (i speak from experience - a SunFire 6800 does not respond well to projectile vomit)

      One correction of Slideys post, we now have 16 brightmail boxes (10 in, 6 out) and it's not enough.

      Cheers.

  26. Still Have to Engineer it by DavidDPD · · Score: 3, Interesting

    I'm not sure that there is any commercial solution that can support 1 million email accounts well, which is why Yahoo and Google have built their own custom systems. Some engineering may be required.

    For pop3 & imap4rev1, look at:
    http://www.dbmail.org/index.php?page=overview

    You still need an MTA. I think qmail is the fastest and best, but I'd use exim, as it's easier.

    Database - not sure if MySQL and PostgreSQL will scale with dbmail.

    I'd say use FreeBSD, because of the ports collection (Don't linux Flame me). However, something like Solaris 10 x86 (or Solaris+Sun Hardware) might provide a bit better scaling, and HA hardware, SAN support, support in general, etc. Though, a bit tougher on the OSS software installs (In My Experience)

    1. Re:Still Have to Engineer it by QuasiEvil · · Score: 2, Insightful

      I'd strongly consider exim and maybe postfix if you're not looking to go with good ol' sendmail. That's the voice of a five year qmail user talking.

      I currently run qmail in a small production environment, handling about 20k messages a day. It's small, but enough to point out the cracks.

      qmail does many things well, but it also is a product of DJB-bizarroworld. The worst of the offenses, in my book, is that due to his security model, the smtp receiver will accept messages to any recipient, not just valid ones. Then, if it can't figure out what to do with it, it generates a bounce message - which usually bounces. This can kill a machine and a network connection during a dictionary spammer attack. Implementing SMTP-AUTH with qmail is a royal, gigantic, immense, overwhelming pain in the ass. It took me several hours to get it all patched together and working.

      Want any of the above to work? Patch. Want a blacklist of users that shouldn't get mail? Patch. Want SPF support? Patch. Want the non-POSIX use of errno to be fixed? Patch. Usually, the patches don't go together smoothly, so you wind up spending hours figuring out the rejected chunks and how to properly patch them together. And this is a modern MTA?

      While I've patched qmail to deal with a host of issues, there's no reason a modern MTA should need to be patched for most of these. The rcpt authentication thing is just downright dumb, and smtp-auth is reasonably widely supported with the ESMTP standard.
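
      The alternative to accept-then-bounce is to reject unknown recipients during the SMTP dialogue by answering RCPT TO with a permanent 5xx error, so the sending server (not yours) owns the failure and no backscatter is generated. A minimal sketch of that decision, assuming a hypothetical in-memory set of valid addresses in place of a real user directory:

```python
# Hypothetical stand-in for the real user directory (LDAP, database, etc.).
VALID_RECIPIENTS = {"alice@example.com", "bob@example.com"}

def rcpt_reply(address: str, valid=VALID_RECIPIENTS) -> str:
    """SMTP reply line for a RCPT TO command."""
    # Local parts are technically case-sensitive, but most sites treat them case-insensitively.
    addr = address.strip().strip("<>").lower()
    if addr in valid:
        return "250 2.1.5 OK"
    # Reject now: the sending MTA generates any bounce, so no backscatter from us.
    return "550 5.1.1 User unknown"

print(rcpt_reply("<Alice@Example.com>"))    # 250 2.1.5 OK
print(rcpt_reply("<nobody@example.com>"))   # 550 5.1.1 User unknown
```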

      I'm testing exim right now, and I'm pretty happy with it. It's fairly light, does everything I want and need, and isn't the configuration quagmire of sendmail. As soon as I rebuild the mail server, I'm switching the production environment away from qmail.

      If you're a hard-core qmail adherent, that's great. It's fast and reasonably easy to configure in its basic form. However, I prefer something that's more standards-compliant and feature-rich right out of the tarball.

      My advice to anybody considering qmail for the first time is to try it, but consider other popular MTAs like exim and postfix as well, including the 800lb. gorilla, sendmail. It's a pain, but get the O'Reilly book and you can do positively anything (and I do mean anything) you want with it.

    2. Re:Still Have to Engineer it by bani · · Score: 4, Insightful

      if you need another reason not to use qmail, this is a good one.

  27. When all else fails... goto spec.org by pci · · Score: 2, Informative

    Using this as a reference point (and from recommendations I've heard)...
    I recommend CommuniGate.

  28. Didn't We Just Have This Question? by vigilology · · Score: 2, Informative
  29. Re:go to gmail by Chmarr · · Score: 5, Insightful

    Gmail is beta.

    Gmail does not have guaranteed uptime.

    You do not pin your company's communications system on something you cannot sign an SLA with.

    need I go on? :)

  30. Gmail accounts... by slashname3 · · Score: 2, Funny

    I have several gmail accounts I can give you. Once you have several of these you can assign gmail accounts to the rest of your users. :)

  31. I would use postfix by shunk · · Score: 2, Informative

    From my experience postfix scales the best for sending and receiving email. Use postfix + (mysql or ldap) + amavisd-new + clamav (or some proprietary alternative) + spamassassin. Cyrus is probably the best for pop and imap access. Squirrelmail for webmail.

  32. CommunigatePro from Stalker.com by ejoe_mac · · Score: 5, Informative

    1) It'll run on anything - Win32, Linux, BSD, Solaris, x86, XServers, Alphas, Power5
    2) It'll scale as big as you can dream - over 5 million accounts with clustering
    3) MAPI support

    1. Re:CommunigatePro from Stalker.com by msblack · · Score: 2, Informative

      You want to blame the makers of CommuniGate Pro for enforcing the terms of their license? I take it that you believe customers should be entitled to infinite upgrades at no charge. CGP users are always able to use the version of CGP they purchased or their last upgrade before the license expiration for as long as they please.

      This so called time bomb applies only to FORMER customers who upgrade without a current license. Sounds fair to me.

      --
      signature pending slashdot approval
  33. Hire Matt Simerson, the creator of MailToaster by ChrisKnight · · Score: 3, Informative

    My number one suggestion is hire someone who has built scalable mail systems, and written tons of code to support them: Matt Simerson

    You can learn about him, and his mail projects at http://www.tnpi.biz/internet/mail/toaster.shtml

    -Chris Knight

    --
    -- This sig is only a test. If this were a real sig it would say something witty. --
  34. Scalable e-mail systems? by shub · · Score: 3, Informative
    Try Googling for "Scalable E-mail Systems" and "Scalable IMAP services". Of course, I'm biased since most of the top hits are from the slides from the presentations that I've done at LISA 2000, LISA 2002, etc....

    My slides relevant to this discussion can be found at http://www.shub-internet.org/brad/papers/dihses/ and http://www.shub-internet.org/brad/papers/sistpni/.

    And yes, Nick Christenson has been a long-time friend and co-author of mine.

    Feel free to contact me directly if you want some referrals.

    --
    Brad Knowles
    http://daily.daemonnews.org/ -- if you're not
  35. Google services by naoursla · · Score: 2, Funny

    I bet Google would be willing to sell you a solution.

  36. Start with universities by dubl-u · · Score: 2, Insightful

    I'd start seeing what universities near you use. They won't be as big, but a large school should have circa 100k accounts and a lot of the same issues you'll face. They may already describe their infrastructure somewhere on the web. And offering to take two or three of the mail guys out to lunch or dinner will get you a ton of the nitty-gritty details and smart questions to ask yourself (and vendors).

    Then once you think you have a solution, budget plenty of time for extensive testing against simulated load. Make sure you simulate failures by, e.g., pulling plugs randomly. Buy the hardware and software *after* you're 100% sure it works, not before. And where possible, roll your solution out gradually, so that small problems don't turn into MCFs.

  37. IBM Z990 by 1c3mAn · · Score: 2, Informative

    Contact IBM. A mainframe running z/VM is your solution here.

    99.9% reliability is more than normal for those machines. It is modular enough to expand to whatever you may need in the future, and it has the data-processing horsepower to actually handle the 20k or so concurrent users at a time, with the hard drive space to match that many users as well.

    Run linux or unix on top of VM and you should be fine.

    Product Page for Z990:
    http://www-03.ibm.com/servers/eserver/zseries/z990/

  38. Re:Let the vendors do the work. by joe_bruin · · Score: 4, Insightful

    Seriously. If high-availability systems are not your company's core competency, call IBM, Red Hat, Sun, Oracle, Novell. Tell them you have a million users. Tell them you have a very fat checkbook and that you want them to provide you with a complete solution. Tell them that nothing but 5 nines of uptime will do.

    DO NOT implement a half-assed solution. Unless you really know what you're doing (and if you did, you wouldn't be asking this question), don't assume that a million Linux servers strewn about a million offices and data centers is the best solution, even if it is easiest to set up and administer. Maybe it is; come up with a proposal with hard numbers and see how it compares to the vendors'. A million dollars spent on a Sun E10000 and an Oracle Grid subscription (scales perfectly, right?), or a million IBM engineers flown into your site when an emergency happens, may be worth paying for.

  39. qmail-ldap is best suited to this task by Lost+Found · · Score: 2, Interesting

    qmail-ldap is best suited to this task. Reasons:

    1. You can sleep at night knowing that you're running the only MTA in widespread deployment that has never once had its security compromised; in fact, qmail's author Dan Bernstein still offers cash to the first one to be successful...

    2. You can sleep at night knowing that the core MTA, qmail, has reliably handled some of the largest e-mail operations in the history of the internet. Its design is such that on a properly configured system, you'll never lose a single e-mail. Hotmail actually used qmail for a long time, even after Microsoft bought them - Microsoft repeatedly tried to replace it with Exchange, which kept buckling under the load.

    3. Qmail is very modular, allowing you to pick and choose your components wisely.

    4. Qmail uses the Maildir format its author pioneered. Maildir is NFS-safe and not proprietary or complicated (binary formats like PST are often subject to corruption), etc.

    5. LDAP makes it easy to manage massive amounts of accounts.

    In any case... qmail-ldap is already running large sites with millions of users. Info:

    http://www.qmail-ldap.org/wiki/Documentation

    I've set one of these systems up on an IT cluster at my current office, and I must say that it is not only very robust but also really easy to manage.

  40. Plan. Test. Spec. Deploy. by MattW · · Score: 4, Informative

    (1) Plan a server setup which can handle the load. The requirements may change, but one million users is a fair bit. How many incoming and outgoing emails is that, on average? Figure that out, using a network sniffer or sniffers on existing traffic if need be (although logs should work). Then use this to calculate the number of servers needed for an outgoing smtp farm and an incoming MX farm. Figure out how much storage space is to be provided per user, and then figure out how you want that storage space to be accessible. Probably your best bet is to have a round-robin DNS farm of imap/pop servers which proxy connections, based on the user's login, to a backend farm of actual mailservers responsible for storage. Plan the ability to move users from server to server to rebalance as needed. Outgoing smtp is a lot easier since you're not really storing things long term. Plan a web farm for webmail (and pick software). Don't forget to plan some sort of backup, and make sure your system is flexible as far as email retention; chances are the email retention policy will change at some point and your setup should be able to change with it.

    (2) Test. For each server, hammer it. Test its load under as close to real-world circumstances as you can. Then create unrealistically punishing loads and see how it handles them. Plan in advance for how your server farm handles something like virus-generated mass emails causing 1000% spikes in load.

    (3) Using your testing results, spec out the actual hardware. RAID, cheap hardware, redundancy, etc. If you have control over the network choice, plan a location with multiple fiber trunks coming into the building and provider redundancy. Remember backhoes in concert? Don't get hit by that. Plan for server failures, drive failures, network failures, power failures, and security compromises.

    (4) Deploy! If you did the rest right, this is the easy part. You'll have redundant network connections, HSRP, redundant switches, a proxy farm, an imap/pop farm the proxies connect to, an smtp farm for outgoing emails, and a web server farm for serving up webmail (depending on how you choose to architect the disk space, the web farm and the pop/imap farm may be one and the same; depends on how you set things up.)
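    The proxy layer from step (1), where the imap/pop front ends look up which backend holds a user's mail and users can be moved to rebalance, could be sketched roughly as below. This is a minimal illustration under assumptions, not a real proxy: the `MailboxDirectory` class, the backend names and the hash-based first placement are hypothetical, and in practice the mapping would live in LDAP or a database rather than in memory.

```python
import hashlib


class MailboxDirectory:
    """Hypothetical user -> storage-backend map that a front-end
    IMAP/POP proxy would consult after authenticating a login."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.assigned = {}  # explicit placements, including rebalances

    def _initial_backend(self, user):
        # Stable first placement derived from a hash of the login.
        digest = hashlib.md5(user.encode("utf-8")).hexdigest()
        return self.backends[int(digest, 16) % len(self.backends)]

    def backend_for(self, user):
        user = user.lower()
        if user not in self.assigned:
            # Record the placement so that later changes to the backend
            # list do not silently move an existing mailbox.
            self.assigned[user] = self._initial_backend(user)
        return self.assigned[user]

    def move_user(self, user, backend):
        # Call only after the mailbox has been copied to the new store.
        if backend not in self.backends:
            raise ValueError(f"unknown backend: {backend}")
        self.assigned[user.lower()] = backend


directory = MailboxDirectory(["store01", "store02", "store03"])
print(directory.backend_for("alice@example.com"))
directory.move_user("alice@example.com", "store03")
print(directory.backend_for("alice@example.com"))  # store03
```

    Recording each placement explicitly, rather than recomputing the hash on every login, is what makes rebalancing safe: adding a backend would otherwise reshuffle the modulo and point existing users at empty mailboxes.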

    Here's a starter link to a setup which is smaller but, in principle, fairly similar:

    http://www.itd.umich.edu/umce/features/2004/cyrus.html

    Finally, if you don't want to screw it up, ask someone who has done it before. Paying someone $300/hr for a 10-30 hour review of your plan is dirt cheap compared to horking the setup. Someone who has worked in huge email environments (à la Hotmail) could show you gotchas before they bite you. (If you need help figuring out who to ask, I could even point you to some of the appropriate people.)

  41. YIKES! Tossing out the groupware?! by Dark+Coder · · Score: 4, Informative

    Gee whiz... I'm surprised that the groupware is getting tossed out. If even 20% of the users are accustomed to Outlook calendaring, they'll account for 95% of the complaints about a new system. An advance warning should be sent to all existing accounts (both on paper and by email) so that nothing falls through the cracks.

    Now to the mega-infrastructure that I set up for an undisclosed company for under 50K (and also didn't want groupware).

    1. Transport Sender (sendmail). That's right! Good ol' plain sendmail scales. It does require some pretty savvy tweaking, so get a Sendmail.com consultant on board just for this. Use SleepyCat DB for speed in all sendmail setups. For one million, I had about 23,000 transactions per minute during the day. You'll need 10 servers for this, for cushion (against some idiot sending an ISO attachment).

    2. Payload receiver (sendmail). A second group of machines to handle the reception of SMTP payloads.

    3. IMAP4S/POP3S - Hey what's with the "S"? Nothing like sending your user's password in the clear. Unless you enforce VLAN in your corporate environment and limit all IMAP4/POP3 to VLAN, the "S" is a mandatory security feature, inside and outside. Guess what "S" stands for?

    4. Webmail - SquirrelMail - Yet another dedicated server (to which I had to add two more load-balanced servers to handle the growing pains). Use https for login only.

    5. AntiVirus (ClamAV) - It was the best back then; now it's just running in the middle of the pack. sendmail has a milter interface that allows extensions such as MIMEDefang, wilter, rureal (reverse-DNS check), SpamAssassin, and SPF.

    6. Support - Half the effort goes into the webpages that 'hand-hold' these newbies through reconfiguring their machines. Worth the effort if you have over 20 expert PC users who can do their own boxens. Otherwise, do it yourself at each PC. These pages should cover Thunderbird, Evolution, as well as Outlook and Outlook Express.

    7. Learn to spin 11 plates, one on each pole. Keep them spinning... If they start to drop and break, bring in some more Unix dudes.
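    To put item 1's numbers in perspective, the back-of-the-envelope arithmetic below spreads the quoted 23,000 transactions per minute across the 10 sending servers, with and without one machine out of service. The `per_server_rate` helper is hypothetical, and it assumes load is spread evenly; real traffic is burstier than a daytime average.

```python
def per_server_rate(tx_per_minute, servers, failed=0):
    """Transactions per second each surviving server has to absorb,
    assuming the load balancer spreads traffic evenly."""
    surviving = servers - failed
    if surviving <= 0:
        raise ValueError("no servers left to take the load")
    return tx_per_minute / 60 / surviving


daytime = 23_000  # transactions per minute, as quoted in item 1
print(f"all 10 up: {per_server_rate(daytime, 10):.1f} tx/s per server")
print(f"one down:  {per_server_rate(daytime, 10, failed=1):.1f} tx/s per server")
```

    That is roughly 38 transactions per second per box, rising to about 43 with one machine out; the extra capacity the comment calls "cushion" is what absorbs that jump.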

  42. Re:Here's my plan and it's the best one you'll get by Reality+Master+101 · · Score: 4, Insightful
    So how will people get all their mail rather than a twentieth of it? Easy, you set up a round robin DNS on mail.DOMAIN.com.

    This is the best advice he'll get? Sheesh.

    Think this through: a lot of e-mail programs check every 20 minutes. Even in the best case, hitting each server exactly once with no repeats, I could potentially need 400 minutes, or over six hours, to get all my mail. Since it's random, it could take days.
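    The "it could take days" point can be made concrete. Assuming each check is independently routed to one of the 20 servers uniformly at random (an idealization of round-robin DNS that ignores resolver caching), the expected number of checks before every server has been visited at least once is the classic coupon-collector value n·H_n. A minimal sketch:

```python
def expected_polls_to_reach_all(servers):
    """Expected number of uniformly random polls before every server
    has been hit at least once (coupon collector: n * H_n)."""
    return servers * sum(1 / k for k in range(1, servers + 1))


servers = 20        # each server holds "a twentieth" of the mail
interval_min = 20   # the client checks every 20 minutes
polls = expected_polls_to_reach_all(servers)
print(f"{polls:.1f} polls, about {polls * interval_min / 60:.1f} hours")
```

    Under these assumptions the average is about 72 checks, or roughly a full day, just to touch every server once; mail that lands on an already-checked server in the meantime waits for that server's next random visit.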

    And that's just for starters with this lame scheme. If I want to check mail, say, from the field on a dial-up once a day... hopefully you can see how badly this would suck.

    What the guy should do is buy an e-mail system that can handle 1,000,000 users and not screw around trying to chewing-gum together his own solution.

    --
    Sometimes it's best to just let stupid people be stupid.
  43. Only 99.9% uptime? by Radak · · Score: 2, Insightful

    If my email system designer were satisfied with almost nine hours of downtime per year, I'd find a new designer.
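    The arithmetic behind "almost nine hours" is simply the unavailable fraction times the hours in a year. A small sketch (ignoring leap years), with the "five nines" figure that comes up elsewhere in the thread for comparison:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def downtime_hours_per_year(availability):
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


for label, a in [("three nines", 0.999), ("five nines", 0.99999)]:
    h = downtime_hours_per_year(a)
    print(f"{label}: {h:.2f} h/year ({h * 60:.1f} min)")
```

    99.9% allows about 8.76 hours a year; 99.999% allows only about 5 minutes.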

  44. Novell Groupwise or Lotus Notes by trboyden · · Score: 2, Informative

    Chances are you're not going to be just turning off those Exchange servers; you're going to need to migrate the data. That being the case, you're going to want something with good migration tools that can handle that much migration in a relatively short amount of time. I just completed an Exchange to Groupwise migration and there are some really great migration tools out there for it. Groupwise also meets all your requirements out of the box. Not to mention that by buying Novell you're (at least indirectly) supporting open source. I'm not as sure about Lotus Notes, but regardless, if you're going to have that many users, you want big-name vendor support.

  45. Commercial Package Options by schave · · Score: 2, Informative

    If you are interested in commercial packages, either Sun's Java System Messaging Server or Openwave's Mx product will easily scale to a million accounts and beyond. Many of the larger ISPs are using these packages or have their own custom mail server. Other possibilities may be Mirapoint (who offers an appliance-type solution) or Sendmail.com.

    If you are into benchmarks, the folks at SPEC have published results from several packages.

  46. Re:Ask Slashdot? by R3D · · Score: 2, Insightful

    Well, they're currently using Exchange.

  47. Re:Here's my plan and it's the best one you'll get by kashani · · Score: 2

    Wow. You've got no idea just what it would take to do this, do you? Or you're being extremely funny.

    1. users should be in a db.
    2. imap servers should be their own cluster
    3. pop servers should be their own cluster
    4. smtp servers should be their own cluster
    5. spam filtering should be their own cluster
    6. round robin DNS should be ditched in favor of hardware load balancing.

    kashani

    --
    - Why is the ninja... so deadly?
  48. Simplicity is key. by chrome · · Score: 5, Informative

    My job is building systems like this. Current mailserver system I designed and built is hosting 80,000 email accounts, and will scale out to a million quite cheaply by just adding more machines.

    OpenLDAP

    You need a central configuration repository to store the email accounts, their passwords, etc. OpenLDAP is perfect for this, and you can replicate it out for scalability. Be prepared to learn about LDAP schemas.

    Exim

    Use Exim because it has a simple process model (a single binary that does all the work, like sendmail) but has a human readable configuration file and has to be the most flexible MTA out there. You will have customers with weird requirements sometimes, and Exim will be able to meet those. Plus, it has Exiscan-ACL built-in these days, which allows you to do virus scanning and spam scanning at the DATA stage, before the mail is actually accepted by the MTA. It means you can make the sending MTA deal with the bounces if the mail is a virus or is obvious spam.

    Courier-IMAP for POP3 and IMAP access.

    Yeah, it's written by a sociopath, but nothing else works as well in the field. It works out of the box with sensible LDAP schemas and is fast, reliable and secure. Handles SSL, all the different authentication methods, what have you. Maildir compatible.

    Maildir message store.

    Store the mail in maildirs. Don't put them in /maildirs/domain.com/user/Maildir - split the domains up with a 2 level deep hashing algorithm (if you're virtual hosting domains, which is what it sounds like to me), so make it something like /maildirs/xx/xx/domain.com/user/Maildir, where xx/xx might be something like 3f/6b (depending on the hash). Use MD4 for the hash because it's more balanced than MD5.
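    A minimal sketch of that layout, assuming two hex characters per level taken from the start of the domain's hex digest. The comment recommends MD4; MD5 is swapped in here only because MD4 is frequently unavailable in Python's `hashlib` on OpenSSL 3 builds. The `maildir_path` name and the `/maildirs` root are illustrative.

```python
import hashlib
from pathlib import PurePosixPath


def maildir_path(domain, user, root="/maildirs"):
    """Build root/xx/xx/domain/user/Maildir from a hash of the domain.

    MD5 stands in for the MD4 the comment suggests; any stable hash
    with evenly spread hex output would serve the same purpose.
    """
    domain = domain.lower()
    digest = hashlib.md5(domain.encode("utf-8")).hexdigest()
    return str(PurePosixPath(root, digest[0:2], digest[2:4],
                             domain, user, "Maildir"))


print(maildir_path("domain.com", "user"))
```

    The two levels of 256 directories each give 65,536 leaf buckets, which keeps any single directory small even with a very large number of domains.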

    NFS mount the maildirs from a fast NFS device like a Netapp. Netapps are recommended because you can plug them in, and they just work, plus they are easy to scale by adding more trays.

    Linux NFS servers set up with heartbeat and shared disk also make a nice HA NFS, and would be cost effective, but you'll have to buy an array anyway (probably fiber channel), so it might be better to just get something that's completely integrated like the Netapp.

    Spamassassin.

    Can be configured to scan mail at DATA time in the SMTP conversation. A LOT of configuration work here to make it play nice on a massively scaled platform, but it can be done. Mostly it needs things like auto-whitelisting and Bayesian filtering turned off, as the extra DB file work is a bit excessive.

    Actually, I'm sure there is a way to make it work with a less resource intensive repository, but using the standard SA rules seems to work well for my environment. *shrug*

    ClamAV.

    Free antivirus, it works, and integrates well with Exiscan-ACL. Set it up to scan via the daemon, and configure it to update every couple of hours from cron, and bob's your uncle.

    Scaling out

    Make every box the same. Make every box an MTA, a POP3/IMAP server, etc. Use something like Kickstart to automate builds so that you can build a machine in 10 minutes, and all you have to do is configure the IP address and plug it in. If you want to be REALLY sexy, you could make the machines boot off the network, and mount / from a shared NFS area, and make /var/spool/exim the internal mirrored disks. DHCP them, then all you do is plug a machine in and set it to PXE boot. Pretty trivial to do.

    Load balancing

    Hardware load balancers are pretty much a necessity. Don't touch Cisco stuff. It's not very good. Go with Foundry Networks ServerIrons. The XLs can handle 1 billion requests/day if you configure them in Direct Server Return mode (also known as DSR/Foundry switchback). Use it. It makes all the return traffic go directly out to the net, meaning your ServerIrons have to switch less traffic and track fewer sessions. For a million users, however, I would recommend a pair of the ServerIron 450GTs, or bigger. Maybe one per VIP/Service.

    Now, if this is all looking pretty daunting, you could always hire me to build it for you :)

    1. Re:Simplicity is key. by Matt+Perry · · Score: 2, Interesting
      split the domains up with a 2 level deep hashing algorithm
      Could you please elaborate on this point and why you do it?
      --
      Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
    2. Re:Simplicity is key. by chiph · · Score: 2, Informative

      While I am not a mail guru, I expect it's because you've got a million users, and having that many directories all in one subdirectory means that the file system will be consuming a lot of cpu doing lookups. With a decent hash algorithm, you're down to approx. 15-16 directories in the 2nd directory level, which is quite manageable.
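      The "15-16" figure follows from simple division, assuming two hex characters per level (as in the /maildirs/xx/xx/ layout), a million entries, and a hash that spreads them evenly:

```python
entries = 1_000_000
one_level = 16 ** 2    # "xx"    -> 256 directories
two_levels = 16 ** 4   # "xx/xx" -> 65,536 leaf directories

print(f"one level:  {entries / one_level:,.0f} entries per directory")
print(f"two levels: {entries / two_levels:.1f} entries per directory")
```

      One level still leaves roughly 3,900 entries per directory; the second level brings it down to about 15.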

      We did something similar at my last job, where we had to maintain 9 million+ smallish files. We originally had one level of indirection and NTFS choked (huge amount of time spent in the kernel, not enough time running our app). Adding another directory level made it happy again.

      There may yet need to be another level of indirection at the folder level to handle those few people who never deleted any email over their 15-year career with the company.

      Chip H.

    3. Re:Simplicity is key. by Anonymous Coward · · Score: 2, Interesting

      (This is not a troll, all the following questions are honest.).

      > OpenLDAP

      IIRC, the replication feature was pretty buggy in some versions of OpenLDAP (2.2.x). Has it been really fixed in the latest versions ?

      > Exim

      What about qmail ? Have you ever tried it ?

      > MD4 [is] more balanced than MD5.

      Do you have evidence to back up this claim ?

      > NFS mount the maildirs from a fast NFS device like a Netapp.

      How do you provide data redundancy with such devices ? Do you replicate data on different NFS servers ? Why not use FreeBSD or Linux boxes as NFS servers ?

      > Hardware load balancers are pretty much a necessity.

      Why not use standard software load-balancing facilities provided by Linux and BSD systems ?

    4. Re:Simplicity is key. by PapaZit · · Score: 4, Insightful

      All of the parent's suggestions are decent, but there are a few alternatives that may make sense:

      -Cyrus IMAP, while a monster to build and configure, can handle a pretty heavy load, and the latest versions can handle a lot of load-balancing internally.

      -Exim's nice. I'm a Postfix man, myself. Sendmail is king, though. I'm not going to claim to like it, but it's up to the task, and there's something to be said for using a standard tool.

      -While things like MD4 are okay for hashing, they're kind of CPU-intensive. Consider something like "second and third letter of username" that takes less CPU time. The right answer here depends a lot on the relative speed of CPU versus disk. If you can get dedicated hardware to do this (rare, but it exists), use whatever hashing the hardware supports.

      -Consider some sort of cache (maybe even separate machines) between incoming SMTP and SpamAssassin/ClamAV. When the 2am spam run hits, your incoming SMTP machines can become overloaded. The downside: deciding what to do with mail that's not rejected the moment it's received.

      -Set up a "mail machine" configuration with whatever OS and tools you use, and make it possible to create a disk image quickly. You're going to need a lot of hardware, which means that you'll have enough random failures to make building machines by hand impractical. This also means "have at least one extra built machine/disk array/etc. powered-on and waiting at all times" for those 4am hardware failures.

      -You may find that things like NFS just aren't fast enough. Be ready to look at SAN or shared "direct-looking" storage. The tough part: this is hard to discover during testing. It may be overkill, but don't lock it out as a possibility.

      -I/O is king. CPU speed won't matter as much as bus speed, disk speed, and memory speed. This is why a lot of companies use banks of big proprietary unix machines for their mail, even if they use commodity PCs elsewhere.

      -I don't trust hardware load balancers. Sometimes they're necessary (and they do make life better when they work), but they're a big single point of failure. Consider other ways to split the load, or at least ways to work around the load balancer if it should fail. The Cyrus aggregator can handle some of this.
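      The cheap alternative to MD4 mentioned above ("second and third letter of username") might look like the sketch below. The `letter_bucket` and `spool_path` helpers, the `/var/spool/mail` root and the `_` padding for very short names are all hypothetical choices for illustration.

```python
def letter_bucket(username, pad="_"):
    """Bucket by the 2nd and 3rd characters of the username,
    padding names shorter than three characters."""
    name = username.lower().ljust(3, pad)
    return name[1], name[2]


def spool_path(username, root="/var/spool/mail"):
    b1, b2 = letter_bucket(username)
    return f"{root}/{b1}/{b2}/{username.lower()}"


print(spool_path("alice"))  # /var/spool/mail/l/i/alice
print(spool_path("al"))     # /var/spool/mail/l/_/al
```

      This costs almost no CPU, but letters in usernames are far from uniform, so some buckets will be much fuller than others; that is the CPU-versus-disk trade-off the comment describes.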

      --
      Forward, retransmit, or republish anything I say here. Just don't misquote me.
    5. Re:Simplicity is key. by Zak3056 · · Score: 2, Interesting

      OpenLDAP

      You need a central configuration repository to store the email accounts, their passwords, etc. OpenLDAP is perfect for this, and you can replicate it out for scalability. Be prepared to learn about LDAP schemas.


      I know this won't be a popular opinion, but given that he's migrating from Exchange, it's fairly likely that they're already an Active Directory shop... it doesn't make sense to abandon it for OpenLDAP, especially given that they're almost certainly Windows-only on the desktop and will still need AD even if they ditch Exchange.

      --
      What part of "shall not be infringed" is so hard to understand?
    6. Re:Simplicity is key. by chrome · · Score: 2, Informative

      Cyrus: No opinion on this. When I looked at it 3 years ago, it wasn't where I wanted it.

      Exim: I've tried the rest, Exim's the best. :)

      MD4: You do it once, when the account is created, and put the location of the Maildir into the LDAP directory. No CPU hit.

      Spam/ClamAV: I've found separating this stuff out makes it worse, not better. Having all the machines equal, and having lots of them, seems to work better. Don't ask me why, I'm not a professor at this stuff, I just know what works and what doesn't.

      Disk images: Don't do it. It's a dark road. I use Fedora Core 4 and Kickstart. I build RPMs of everything, including configs, and build it all with the kickstart. You could do something similar with Debian if that's your poison.

      NFS: NFS is good. Get a fast NFS server and you won't have problems. Use gig for the interconnect. SAN-based global file systems are not there yet. They are too buggy and unreliable.

      IO: CPU does help a lot, actually, if you're doing the spam/antivirus thing. If you don't do that, then fine.

      Hardware load balancers: Foundry kit is trustworthy. I've been using their stuff for years and never had any major problems with it. I've got ServerIrons that have been running for 3 years without a reboot and without a problem. The key is: understand how they work, and you won't have any problems.

    7. Re:Simplicity is key. by thogard · · Score: 3, Insightful

      Current mailserver system I designed and built is hosting 80,000 email accounts, and will scale out to a million quite cheaply by just adding more machines.
      80,000 is trivial. I was running a 12 node system with 87,000 users 12 years ago on hardware that was slower than a play station.

      The complexity of going from 100,000 to 1,000,000 isn't just 10 times harder; you start to get into the area where a four-sigma system works with few problems at 100k users but dies horribly at 1000k. There is a line past which, instead of one broken machine being unusual, at least one machine is always broken, and it will often be broken in a way that is hard to diagnose.
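      The "at least one machine is always broken" threshold falls out of basic probability. A minimal sketch, assuming machines fail independently and each is down a hypothetical 0.1% of the time (both assumptions are for illustration only):

```python
def prob_any_down(machines, p_down):
    """Chance that at least one machine is down at a given moment,
    assuming independent failures with per-machine probability p_down."""
    return 1 - (1 - p_down) ** machines


p = 0.001  # hypothetical: each box is down about 0.1% of the time
for n in (12, 100, 1000):
    print(f"{n:>5} machines: {prob_any_down(n, p):.1%}")
```

      With those assumed numbers, a 12-node system has something down about 1% of the time, while a 1,000-machine farm has something down roughly 63% of the time, so "broken" becomes the normal state rather than the exception.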

  49. Re:bring in a consultant? by MrKahuna · · Score: 2, Funny

    Well, he seems aware that he doesn't, in fact, know everything.

  50. While we answer this question... by hellfire · · Score: 5, Funny

    ... Is anyone wondering what's going on at Microsoft right now?

    It starts with a slashdot geek working in the email department spitting up his coffee, followed by a few rumors which make it up to a guy in accounting and customer service, followed by frantic management emails, including some inappropriate language, from Steve and Bill. Then a few good geeks start trying to trace this cfsmp3 guy to a company while the sales reps begin cold-calling any customer running around 1 million accounts.

    And Microsoft will botch it because they have no experience in kowtowing and bootlicking, which are important skills for any company who wants to humbly keep its customers.

    --

    "All great wisdom is contained in .signature files"

  51. Worst. Email. Client. EVER! by Greyfox · · Score: 3, Funny
    I've been subjected to bloated goats every time I've contracted out to IBM, and I've hated the experience every time. There are a number of projects going on inside the company to try to avoid having to use it, but no one's ever had a whole lot of success at it. IT steadfastly refuses to enable imap on the servers, ostensibly because the mail servers would not be able to handle the load of EVERY SINGLE IBM employee on the planet saying "OH THANK GOD!" at once and migrating to a mail client that doesn't SUCK DONKEY BALLS.

    Don't get me wrong. Notes isn't just a crappy E-mail client. It's also a crappy database access client that provides user-definable forms which can be used to populate rows in the database. When you start getting a LOT of rows, the performance really goes to shit unless you replicate the database down to your local hard drive.

    Rather than the Notes based solution, I would suggest an old 386 running BSD and Sendmail. That'd save you a lot of pain in the long run, versus dealing with Notes.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  52. Where to start (seriously) by einhverfr · · Score: 4, Informative

    First, you need to start by drafting real requirements. What do you need exactly? Antispam? Antivirus? Try to have it fill up at least a page.

    Once you have that done, you can start looking at solutions. You will have two parts to your solution:
    1) The DMZ email relays (possibly including other antispam/antivirus functions). You really need high availability here.

    2) Your email storage and retrieval systems. These may be a little more tolerant to downtime on an individual basis. But if you need to have redundancy here, there are ways to do it.

    I think Hotmail did fine with BSD and Qmail.* I am sure Postfix is equally capable.

    * Although Qmail itself has never had a security vulnerability discovered, you should be careful. TCPRules (on which qmail relies) has a vulnerability that can lead to root access for local users. This is not a problem on systems with no local users, however. I am not aware of any patch for the TCPRules vulnerability.

    --

    LedgerSMB: Open source Accounting/ERP
    1. Re:Where to start (seriously) by Russ+Nelson · · Score: 2, Interesting

      Yer blowin' smoke, of course. Everybody loves to claim that they've found a vulnerability in djb's code, but when it comes down to details, there are none.
      -russ

      --
      Don't piss off The Angry Economist
  53. suggestions by erase · · Score: 2, Informative

    quick 15 minute brainfart:

    in order to increase reliability, you want to adopt a clustered design - if a machine or two fail, nothing should happen to the service.

    in order for all the machines to be able to find the user preferences/passwords/etc, you'll want some sort of common storage for them. it could be on a shared filesystem, in ldap, mysql, etc. ldap is common and a good choice (it has very fast read/query performance) - make sure you use replication so an ldap server failure doesn't take you down (or better yet, a multi-master setup). if you use ldap or sql, make sure you are indexing correctly on the data you most commonly pull up.

    in order for all the machines to access the user's mail, you'll want some sort of shared message storage. a shared filesystem is easiest, you could choose from nfs, redhat gfs, veritas cluster fs, etc. if you use nfs, make sure the nfs server can failover to a backup system if the nfs master dies (netapps are great for this).

    rather than using round-robin dns, i'd invest in a load balancer. there are some free options for bsd and linux, but the commercial products are very nice and easy to use. f5 labs bigips are very nice, cisco CSSes are garbage.

    other suggestions about breaking the services into different groups are spot on. personally, i'd have 3-4 inbound smtp servers inside a loadbalanced pool that handled inbound mail and passed the messages to virus and spam scanning services before delivering them to the shared message store (your load might dictate you need more servers, but if you design right you can just add more as time goes on). i'd probably put pop3 and imap services on those hosts as well, and possibly only allow pop3s and imaps (the ssl encrypted variants).

    i'd also have a set of outbound mail servers that users would connect to in order to relay outbound mail. they would require smtp auth, and possibly only allow connections on smtps ports. spam/virus scanning would be performed before the message was accepted by the server, so users would get immediate feedback if their message didn't go through. the outbounds would not do any local delivery, so they would not mount the shared message store (you'll get proper bounces for all invalid mail addresses this way, instead of smtp rejections for invalid email addresses in local domains).

    i'd have another set of servers that did virus and spam scanning for both the inbound and outbound smtp servers. you'd want these machines to have faster cpus than the rest, as virus and spam scanning are usually quite cpu intensive. again, if your load increased (or was more than you had anticipated), the system is easy to grow just by adding more machines.

    another set of servers would handle the shared filesystem (if nfs, or gfs exported via gnbd), and possibly also the shared preferences store (ldap).

    the final set of servers would handle webmail.

    each set of servers should be firewalled from the others (especially the webmail servers, which are probably the most vulnerable to attack), with only the necessary traffic allowed through.

    qmail and postfix can easily read ldap, i'm sure sendmail can also (as can commercial solutions). anything will work for the smtp daemon.

    since you are supporting pop3 users, maildir is a better choice than mbox for your message stores. courier or cyrus would be a good choice, and come with pop3, imap, and MDA (message delivery agent) components.

    i'd have the inbounds accept mail from remote sources immediately (assuming the user being delivered to was valid) and have them hand off the message to an MDA, which would perform spam scanning, virus checking, and any user filtering configured before delivering the message to the user's mailstore. (scanning after the message is accepted uses more resources, but grants you more flexibility - users can have their own spamassassin settings, or you can add any number of filtration steps).

    for virus scanning, check out ClamAV. for spam scanning, look at spamassassin (

  54. Easy by xihr · · Score: 5, Insightful

    Resign. You're obviously in way over your head if you have to resort to asking Slashdot readers for advice like this.

  55. Re:NO Domino by Shalda · · Score: 2, Insightful

    Well, on the subject of what not to use, avoid Lotus Domino & Notes as well. Take your favorite horror story involving Exchange and substitute Domino for Exchange and Notes for Outlook and that's what it's like. Only Outlook is a much better mail client.
     
    There are dozens of perfectly good mail servers out there. The more features they have the more likely you are to have problems. It's a pretty simple equation.
     
    And if all else fails, you can write your own. I've written one, it's not very difficult (hacked it out in C# in a weekend). It's a very simple plain text protocol. But I wouldn't run the company on something I wrote in C# in a weekend. I don't even use it myself anymore. I'm running Exchange now for my personal mail server as that's what we run at work.

  56. Re:I worked at a company that did this... by TheBracket · · Score: 2, Informative
    I work with a similar setup, only rather than plain Qmail we use Qmail-LDAP. It works wonderfully, and has some nice (but really not amazing) clustering capabilities.


    We have account data stored in an LDAP store, mirrored to a second (read-only) store for redundancy/scaling when busy. LDAP scales wonderfully for read-heavy tasks such as this one.


    As has been mentioned separately, separating recipient (edge), storage, and outbound mail servers is really important. Our edge servers perform RBL checks, greylisting (on some domains that want it), SPF (ditto), reject various attachment types, perform a reverse-MX check to try to accept from valid addresses only, and perform a recipient address check to quickly reject incorrectly addressed messages. That cuts down 80% of incoming mail (with very few false positives). Mail is then forwarded to a second set of edge servers that run SpamAssassin (set to flag spam, not stop it) and ClamAV on attachments. Finally, it goes into the storage servers. POP3/IMAP/Webmail points at the mail directories on these servers. Our outgoing servers are quite a simple setup, with SMTP Auth (also hooked to LDAP). We also have a few listservs setup, but they are a side issue.
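    The edge-server idea, running cheap and decisive checks first so that only survivors reach the expensive SpamAssassin/ClamAV tier, can be sketched as an ordered chain that stops at the first rejection. This is an illustration under assumptions: only the recipient and attachment-type checks are shown (RBL, greylisting, SPF and reverse-MX lookups need DNS or state and are left out), and the `Envelope` type, helper names and reply strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Envelope:
    """Hypothetical minimal view of an incoming message at the edge."""
    sender: str
    recipient: str
    attachment_names: List[str] = field(default_factory=list)


# A check returns None to pass, or an SMTP-style rejection string.
Check = Callable[[Envelope], Optional[str]]


def recipient_check(valid_recipients: set) -> Check:
    def check(env: Envelope) -> Optional[str]:
        if env.recipient.lower() not in valid_recipients:
            return "550 no such user"
        return None
    return check


def attachment_check(blocked_suffixes: tuple) -> Check:
    def check(env: Envelope) -> Optional[str]:
        for name in env.attachment_names:
            if name.lower().endswith(blocked_suffixes):
                return f"550 attachment type not accepted: {name}"
        return None
    return check


def run_edge_checks(env: Envelope, checks: List[Check]) -> Optional[str]:
    """Run checks in order and stop at the first rejection.
    None means the message moves on to the scanning tier."""
    for check in checks:
        reason = check(env)
        if reason is not None:
            return reason
    return None


checks = [
    recipient_check({"alice@example.com"}),
    attachment_check((".exe", ".scr", ".pif")),
]
print(run_edge_checks(Envelope("bob@example.org", "nobody@example.com"), checks))
print(run_edge_checks(Envelope("bob@example.org", "alice@example.com", ["a.EXE"]), checks))
print(run_edge_checks(Envelope("bob@example.org", "alice@example.com", ["notes.txt"]), checks))
```

    Ordering matters: the cheapest checks that reject the most mail go first, which is how an edge tier can discard the bulk of incoming traffic before any content scanning happens.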


    Qmail is a bear to set up, and asking the author for advice is a good way to get flamed. Other than that, it works very well, we haven't had any security issues, and it's adequately fast, especially if you apply the "silly qmail todo" patch, which fixes concurrency problems under high load. It's part of the Qmail-LDAP distribution (as is almost everything else I listed).


    For servers, we use FreeBSD. I'm sure other OSes would do a fine job, but FreeBSD has been rock solid for us.

    --
    Lead developer, http://wisptools.net
  57. Intelligent Architecture by Anonymous Coward · · Score: 5, Informative

    Hi Cliff;

    Sounds like a fantastic design opportunity here. The 5% of the project that is Enterprise architecture is what I enjoy the most as well. I'm assuming money probably isn't an object in terms of how much gear and bandwidth you may have to feed to this.

    I'm happy to let my fingers type away below, I'd love to keep in touch and see how you end up shaping this system. my email is allowmx at hotm...

    Before I ask, are there actually a million accounts? Or is that just a ceiling that you have to show proof of concept with?

    I've only implemented systems of up to about 250,000 accounts of any kind, but as I'm sure you're aware, the base transactional resource costing is essentially the same.

    For me, I would look at this for sure from at least these two angles:

    1) Knowing your transactional costs: how much of your hard resources (bandwidth, CPU and disk space) will each type of transaction in your system take? I mostly use this approach to get not an exact number but an idea of magnitude, and to detail where each cost occurs so the proper attention is applied to it.

    2) Failsafe intelligence & capacity in the infrastructure, as well as failsafe intelligence & capacity at the application layer. You have to know that your hardware, software, OS, business logic and applications are all monitorable, internally and externally, both for availability and for actual "can I use it". Keep transactional logs, etc., so that information is available when the inevitable problems come up.

    Also, build in the capacity for as many of these layers as possible to be self-healing, and fungible to the point that your service delivery is homogeneous in as many ways as possible. If your network finds that something doesn't work or route, with mail you can find another way to route it. Having a transactional manager of some kind, direct or not, could be useful here depending on what the client wants.
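    Point 1 above, knowing your transactional costs, boils down to multiplying per-transaction cost by volume and summing per resource. A rough Python sketch; the transaction types and every number in it are hypothetical placeholders, not measurements:

```python
from dataclasses import dataclass


@dataclass
class TxType:
    name: str
    per_day: int      # how many of these happen per day
    cpu_ms: float     # CPU milliseconds each
    disk_ops: float   # disk operations each
    net_kb: float     # network kilobytes each


def daily_totals(tx_types):
    """Multiply per-transaction cost by volume and sum each resource."""
    cpu_hours = sum(t.per_day * t.cpu_ms for t in tx_types) / 3_600_000
    disk_ops = sum(t.per_day * t.disk_ops for t in tx_types)
    net_gib = sum(t.per_day * t.net_kb for t in tx_types) / (1024 * 1024)
    return cpu_hours, disk_ops, net_gib


# Every number below is a made-up placeholder; measure your own.
workload = [
    TxType("smtp_inbound", 3_000_000, 20.0, 4.0, 50.0),
    TxType("pop3_retrieve", 1_000_000, 5.0, 2.0, 40.0),
    TxType("imap_select", 2_000_000, 10.0, 3.0, 10.0),
]
cpu_hours, disk_ops, net_gib = daily_totals(workload)
print(f"{cpu_hours:.1f} CPU-hours/day, {disk_ops / 86_400:.0f} avg IOPS, {net_gib:.0f} GiB/day")
```

    Daily averages hide the busy hours, so in practice you would size for a peak rate rather than these averages; the point of the exercise is the order of magnitude and which transaction type dominates each resource.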

    99.9% uptime equates to about 526 minutes, or 8.76 hours, that you _could_ be down each year. That's about 44 minutes a month.

    Based on that, having flexible, redundant tools set up in a high-availability arrangement at their respective operating capacities is key. I'm not sure whether your current Exchange problems are made worse by not enough equipment, bandwidth, or other stability issues, so I'll just assume that it's all of them :)

    I apologize if anyone else has already mentioned some of this, but here's some of what I've found to help me where email has become as crucial to a business as their cell phone.

    On the hardware level:

    - STORAGE: Everything goes on a SAN, if not more than one. Don't waste your time with anything less.
    - SERVERS: All servers have redundant hot-swappable parts at the very least: power and hard drives. I'd even suggest making the servers iSCSI bootable so they can boot off the backbone. Beyond this, I like to buy my servers in piles of identical ones. Have 1-2 spare servers of each kind sitting there, ready to throw hot-swap drives into from a failed server. That way, if a server dies, you can address the power supplies, or get the HDs in that machine into another identical server and get it up and running while you diagnose the hardware problem independently. My approach to any kind of problem is FIX, DETECT and REPAIR: get it up and running, find out what was wrong, make sure it's fixed for good. Too many of us stop at the first two ;)

    The idea I have in mind is a smaller-scale version of Google's beige-box army. Linux/BSD get so many more transactions out of each piece of hardware, so that works very much in your favor. Obviously you'd want something enterprise grade to satisfy the client, such as the Compaq/HP ProLiants, etc. I feel these servers have the best overall support, manageability and information tools, and their open Linux drivers interface wonderfully with open source operating systems.

    Networking/Communication level:

    - Entire mail processing architecture communi

  58. You need a staff of 10 to 20 to run this... by Temkin · · Score: 2, Informative


    One million email accounts is quite a lot. You're getting into the big-league ISP category with something like this. It's not a one-person operation to put something like this together; you're going to need a substantial number of well-trained people. There are only a couple of players in the field at this level. Sun's JES Messaging system owns a sizeable chunk of the market, followed by OpenWave and a small gaggle of fly-by-nights with unproven track records.

    Some of the larger email systems however are homegrown using open source parts. Yahoo and Google immediately come to mind, and they do work quite well. But you probably don't have the resources that they do to engineer & test something like this. Yahoo is rumored to have more than 200 people working on email alone.

    Sun has a deployment like this canned, sitting on a shelf in Santa Clara. Tell them what you need, write a check, and they'll show up with the kit. 99.999% uptime if you write a big enough check. Make them throw in the Waveset stuff.

  59. Oh, dear God, you RECOMMEND Notes? by Anonymous Coward · · Score: 3, Informative

    Of course, everyone should note that recommendation is coming from an IBM employee.

    Sorry, but Lotus Notes sucks; it's an abomination in almost every way. It's bloated, slow, buggy and has what is arguably the worst user interface ever (The User Interface Hall Of Shame said they could have based their entire site on this one app!) Sure, it does group meeting notes and can let you check other people's calendars but it falls flat as an email system. If it can't do the basics, who cares about the "advanced" features.

    Doubt me? Okay. Let's try a little experiment.
    First, sort your inbox by subject. Oh, I forgot. YOU CAN'T. Well, let me take that back. You can if you simply follow these simple instructions...
    First, you need to have Domino Designer installed. In Designer, open Folders in the left pane, then open the $Inbox folder and highlight the Subject column. In the window with Column properties, on the second tab, check the "Click on column header to sort..." checkbox. Close the $Inbox folder window. To prevent design refresh, in Folders view, right-click the $Inbox folder, choose Design properties and on the third tab check "Prohibit design refresh or replace to change".
    [blinks eyes in disbelief]

    Un. Fucking. Believable.

    Oh, and the feature I like the best is the pop-up dialog that tells you you have new mail. So you click to make that go away, switch over to LN to read the new mail and it's not there... Oh, yes, that's right, you have to press F9 to actually download the email to your client, even after being notified by an obnoxious popup that you have new mail.

    Want to know another neat little feature related to that F9 key? According to our LN System Admin, get a few dozen people to all press and hold the F9 key for a few seconds at the same time and you can crash the Domino server backend requiring the server to reboot. Nice.

    I could go on but I think I've made my point. I have never, ever, encountered anyone who has switched from Notes and been pleased with the change.

  60. Outbound queues by dskoll · · Score: 2, Insightful

    You probably want a FallbackMX host (or a bank of them) so backed-up outbound queues don't interfere with normal outbound processing.

    The FallbackMX hosts can use a file system optimized for directories with lots of files in them (and can of course themselves be tuned as the parent poster suggested.)

  61. See what others are using by henry.thorpe · · Score: 2, Insightful

    I'd start by seeing what the big ISPs are using.

    That's a matter of doing an MX lookup, telnetting to one of their gateways on port 25, and seeing if you can infer from their banners what mail system they are running (for the inbound SMTP gateways, anyway, since there's nothing to prevent them from layering different products). Look to mailing list archives for messages sent from the various domains, and see what the headers tell you about their outbound mail path.

    Example: Inbound Comcast HSI:

    $ dig comcast.net mx
    ;; ANSWER SECTION:
    comcast.net. 250 IN MX 5 gateway-r.comcast.net.

    $ nc -vv smtp.comcast.net 25
    Connection to smtp.comcast.net 25 port [tcp/smtp] succeeded!
    220 comcast.net - Maillennium ESMTP/MULTIBOX sccrmhc14 #274

    So, they use something claiming to be 'Maillennium'.

    If you do this for AOL, you'll see some weird-looking, probably custom AOL gateway. Earthlink says something like:
    'ESMTP EarthLink SMTP Server', AT&T WorldNet is also Maillennium, Verizon.net declares 'MailPass SMTP server v1.2.0', and so on.
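    The banner check above is easy to script. A minimal Python sketch using the standard library's smtplib; the keyword table is a hypothetical starting point built from the banners quoted here, and a plain substring match is far cruder than real fingerprinting:

```python
import smtplib

# Hypothetical keyword table, seeded from banners quoted in this thread.
KNOWN_BANNERS = {
    "maillennium": "Maillennium",
    "earthlink": "EarthLink SMTP Server",
    "mailpass": "MailPass",
    "postfix": "Postfix",
    "sendmail": "Sendmail",
    "exim": "Exim",
}


def grab_banner(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Connect, read the 220 greeting, and disconnect without sending mail."""
    smtp = smtplib.SMTP(timeout=timeout)  # no host given, so no connection yet
    try:
        code, msg = smtp.connect(host, port)  # returns the greeting reply
        return f"{code} {msg.decode(errors='replace')}"
    finally:
        smtp.close()


def guess_mta(banner: str) -> str:
    """Naive guess: first known keyword found in the greeting wins."""
    lowered = banner.lower()
    for keyword, product in KNOWN_BANNERS.items():
        if keyword in lowered:
            return product
    return "unknown (banner may be customised)"
```

    For example, guess_mta(grab_banner("smtp.comcast.net")) would report "Maillennium" if the greeting still reads as it did above. A customised banner defeats this entirely, which is where fingerprinting comes in.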

    If you really wish to probe to see whether this is open-source-ish stuff with obfuscated banners, you can try fingerprinting them using smtpscan (http://www.greyhats.org/outils/smtpscan/) to find out whether it's really just Postfix or Sendmail hiding behind that custom 220 banner. Actually, the smtpscan fingerprint file is an interesting read all by itself...

  62. More specific? by Grendel+Drago · · Score: 2, Interesting

    Could you be a bit more specific on the following items?

    5) Breaks well-known and understood UNIX standards.

    Which standards are these? Are you talking about the errno fiasco?

    6) Security through lack-of-functionality.

    What sort of functionality is provided by, say, postfix, that qmail simply won't do?

    7) Not really secure despite the claims.

    How's that? Do you have $500? If not, what's the security vulnerability that the author refuses to acknowledge?

    Which of these problems that you enumerate are not addressed by netqmail?

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:More specific? by killjoe · · Score: 2, Interesting

      "What sort of functionality is provided by, say, postfix, that qmail simply won't do?"

      Qmail has almost no features out of the box. It can't talk to LDAP, it can't handle multiple domains, and it does not reject mail for unknown users (instead it queues up a bounce message, which means each spam message generates one outgoing message).

      In order to get qmail to do what Exim and Postfix do, you have to apply half a dozen patches and recompile.

      Of course unless the guy who did the compile took very careful notes you have no idea what your particular installation of qmail is capable of either.

      I inherited a qmail install one time and it was a nightmare to maintain. When somebody decided to start sending me 100 thousand emails a day to unknownuser@mydomain.com and my message queue got to be hours long, I only had two options.

      1) Gather all the patches used to build the original qmail (again no real way of knowing) and then add yet another patch and recompile.

      2) Install postfix.

      Guess what I did?
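      The bounce problem described above comes from accepting mail first and bouncing it later. Rejecting an unknown recipient during the SMTP session generates no outbound message at all. A minimal Python sketch of that decision at RCPT TO time; the domain and user list are hypothetical stand-ins for an LDAP or database lookup, and the reply texts are illustrative:

```python
# Hypothetical directory of valid mailboxes; a real edge server would
# query LDAP or a database here instead.
LOCAL_DOMAINS = {"mydomain.com"}
VALID_USERS = {"mydomain.com": {"alice", "bob"}}


def rcpt_reply(address: str) -> str:
    """SMTP reply to RCPT TO, decided during the session."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain not in LOCAL_DOMAINS:
        return "554 5.7.1 Relay access denied"
    # Local parts are compared case-insensitively here, as most sites do.
    if local.lower() in VALID_USERS.get(domain, set()):
        return "250 2.1.5 OK"
    return "550 5.1.1 User unknown"


def bounces_generated(recipients, reject_at_rcpt: bool) -> int:
    """Accept-then-bounce sends one bounce per unknown user; rejecting sends none."""
    unknown = sum(1 for r in recipients if rcpt_reply(r).startswith("550"))
    return 0 if reject_at_rcpt else unknown
```

      With 100,000 messages a day to unknownuser@mydomain.com, accept-then-bounce means 100,000 queued bounces, most of them to forged senders; rejecting at RCPT TO leaves the sending side to deal with the 550.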

      --
      evil is as evil does
  63. Cyrus IMAP by Rheingold · · Score: 2, Informative

    Cyrus IMAP is designed for this size of installation. You can split the backends up with Murder on the front-ends to distribute load; divide mailboxes on each host between filesystems (which you'd presumably spread over multiple disks); use a SAN and GFS or another shared-storage cluster filesystem and share the spool among servers; or use the new pre-release 2.3 code with mailbox replication and more discrete, commodity components. There are lots of other features designed for large-scale implementations.

    For authentication, of course you have choices among LDAP, Kerberos (both of which are usable even if you're stuck with a Windows domain for authentication), PAM and other things. Very flexible; too flexible for some and it can be a bit confusing.

    I've been working on rewriting the HOWTO; although I haven't made a ton of progress, it may still be useful to you: http://nakedape.cc/info/Cyrus-IMAP-HOWTO and here's a presentation I put together for LinuxFest Northwest: http://nakedape.cc/info/Cyrus-IMAP-Intro.

    You mention a million mailboxes, but that doesn't really mean much--that is just an estimate of storage requirements. What is more important to determine is how many concurrent users you will have and how much actual traffic--storage is cheap, memory not so much.
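    To make that concrete, here is a back-of-envelope calculation in Python. Every ratio in it (5% of accounts active at peak, 1.5 connections per active user, 1 MB of RAM per session, 10,000 sessions per backend, one spare) is an assumption to be replaced with numbers from your own logs:

```python
import math


def concurrent_sessions(accounts: int, active_fraction: float, sessions_per_user: float) -> int:
    """Peak simultaneous POP/IMAP sessions, from account count and usage ratios."""
    return round(accounts * active_fraction * sessions_per_user)


def backends_needed(sessions: int, sessions_per_backend: int, spares: int = 1) -> int:
    """Servers needed to hold the sessions, plus spares for failover."""
    return math.ceil(sessions / sessions_per_backend) + spares


# Placeholder ratios; see the lead-in.
sessions = concurrent_sessions(1_000_000, 0.05, 1.5)
ram_gib = sessions * 1.0 / 1024  # 1 MB per session, in GiB
servers = backends_needed(sessions, 10_000)
print(sessions, round(ram_gib, 1), servers)
```

    Notice that the million mailboxes only enter once; doubling the active fraction doubles the memory and server count, while the mailbox count alone mostly drives disk.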

    --
    Wil
    wiki
  64. Re:bring in a consultant? by subterfuge · · Score: 2, Funny

    so he can't be management...

  65. Re:Qmail!! by Pharmboy · · Score: 5, Insightful

    A single server? For one million users?

    Insert "imagine a Beowulf cluster of those" joke here, except it isn't a joke.

    I think you might be underestimating the requirements for a project this large that "must scale perfectly". The "99.9% uptime is expected" requirement alone calls for multiple internet connections, a large cluster of front-end servers, and redundant database servers, preferably located in different states. (i.e.: "What do you mean our only server is in New Orleans?")

    I don't think the average Dell dual Xeon box is up to the task for this large a project...

    --
    Tequila: It's not just for breakfast anymore!
  66. I/O by Graymalkin · · Score: 4, Informative

    While not quite a million users, HEC Montréal switched from Netscape Messaging Server running on AIX to Postfix/Cyrus/SquirrelMail running on Linux. Linux Journal ran a really nice article and a follow-up about their transition.

    One of the first things the school did was figure out how exactly their current system was failing them. Their old AIX boxes were being stressed just by the volume of mail coming through the system, they had little power left over to do any sort of filtering. This led to users getting drowned in unwanted e-mail which only exacerbated the existing load issues. This is one of the first things you need to do, figure out why your current system isn't working properly. You'll be better equipped to fix the problems when they've actually been identified.

    HEC Montréal also went for heavy redundancy and specialization. Instead of a handful of servers sharing all of the tasks equally, each node in the cluster has its own job, with every class of job having a backup server. Every job is going to take a beating with so many users, even if only a fraction of them are using the system at any given time.

    I'd say the most important part of what you're doing will be modeling your current use. Are you getting a ton of traffic from viruses and worms spreading over your internal network? Do you get huge amounts of spam traffic to users? In such cases filtering at your SMTP servers will relieve the rest of the system from extraneous traffic. While you might need really beefy external SMTP servers you won't need nearly as much storage space on a SAN or NAS.

    --
    I'm a loner Dottie, a Rebel.
  67. Try Backup Exec for Single Mailbox Restorations by LazloToth · · Score: 2, Informative

    Not defending Microsoft here, but I have to take care of an Exchange 2003 Enterprise server, and I wouldn't think of trying to do it without Symantec (formerly Veritas) Backup Exec with the additional Exchange agent. Yes, you can back up and restore individual mailboxes, and even individual messages. Backup Exec has its quirks, but it's the best thing going if you have to take care of Outlook users. Over the years, starting with Exchange 5.5, Backup Exec has saved my rear when information stores got corrupted, log files were deleted accidentally, and so on. Combined with a nice, fast AIT tape library, it's a great data preservation product for the small- to medium-size enterprise.

    --


    It's only funny until someone gets hurt. Then, it's hilarious.
  68. Re:Easy by Anonymous Coward · · Score: 2, Insightful

    Why is this marked as funny? It should be marked as informative.

    Unless the person wanted to start an Exchange flame war with his post, he clearly has no idea how to design an enterprise email infrastructure.

    All the technology in the world can't help you if you don't understand what you are doing and based on his broad sweeping question, it would be easy to assume that he doesn't.

    If he is the amateur email administrator that he has made himself out to be, no amount of advice or technology can help him.

    If he can't design the email infrastructure he definitely won't be able to properly implement and manage it either.

    Better leave this kind of work to the professionals.

  69. Army Knowledge Online does it for 1.72 million use by kenblakely · · Score: 5, Informative

    AKO (www.us.army.mil) is the Army's official intranet portal. We provide email for over 1.72M users, and we move almost 3 million messages a day. We do it all with Sun Messaging Server ver5.2 (soon to be JES3) and we have exactly 2 (count 'em) two mail administrators. Sun mail is rock solid and scales great. We offer POP, SMTP, enterprise spam and virus filtering, as well as personal address books. We don't get the rich Outlook fat client, but then we want to be all web-based anyway. Can't say enough about Sun mail. If we had to do this with Exchange, I'd have to hire prolly 50 admins and deploy an order of magnitude more machines.

  70. My vote is for Notes by mferrare · · Score: 2, Interesting
    I'd put my vote in for Notes also. Its architecture should scale to meet your requirements, what with distributing your setup across many servers and using replication. Granted, the client isn't the best by any means (more on this later), but the application itself is quite good. Your laptop users can replicate their e-mail locally, which is a simple procedure. I replicate my Notes mail locally just so I can index my mailbox on my local drive.

    But the real advantage of Notes is as a distributed applications platform. If you want to expand past e-mail and start writing applications such as leave management, room booking or technical documentation databases, then this is where Notes really shines. They're all databases, and they can all be replicated, so they take advantage of the same redundancy that your e-mail will use. And if you need to travel, you just replicate the databases you want onto your notebook and take them with you. It's fantastic.

    Ah, the mail client
    Why oh why does the client suck SO MUCH!! At my previous company the management were looking at moving to exchange simply because Outlook is so much a better client than what Notes (even R6) is. It's a big fat piece of bloatware (as has been discussed many times here). My main peeve is that if you edit an attachment inside an e-mail you can't save it back into the e-mail! eg: here's a typical scenario:
    Not using Notes (Outlook, Thunderbird, Mail.app all let you do this):

    • Receive e-mail with an attachment
    • dbl-click on the attachment, edit it, save it
    • forward the e-mail, including the saved attachment, to someone else
    Simple huh?
    With Notes:
    • Receive e-mail with an attachment
    • Detach the attachment from the e-mail message. Save it somewhere
    • Use windows explorer (or whatever) to find the attachment, edit it and save it
    • Forward the message
    • before sending, delete the original attachment and replace it with the copy you have saved on your hard drive somewhere
    • send the message
    • delete your copy of the attachment
    Sigh!!!

    WHY!?!?!?!?

    But despite all that crap I still think it's an excellent platform and one you should consider. It has support for encryption and also supports IMAP (although not very well, I hear). A lot of large corporations run it; I've worked for 2 large investment banks, both of which run it. You can also integrate IM into it (with Sametime) and remote meetings too (with Sametime Meeting). Also, IBM PS are good at setting it up. For something this scale you'll be up for $$$ anyway, so I'd be looking at having someone come in to help you, and they're pretty good (I don't work for IBM!).

    --
    Why would anyone want to use a text editor that is not vi?
  71. Re:Qmail!! by Allador · · Score: 5, Informative

    No. 0.1% != 0.1

    365 days * 24 hrs/day = 8760 hours per year

    0.1% downtime = 0.001 downtime

    8760 * 0.001 = 8.76 hrs

    You're off by two orders of magnitude.

    8.76 hrs / 12 months = 0.73 hrs/month = 43.8 minutes/month

    One 45-minute scheduled downtime (assuming it's scheduled) per month isn't terrible. It's not great, but costs really start to go up as you add nines beyond those 3.
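    The same arithmetic, extended to the other common "nines", as a small Python helper (a 365-day year and 12 equal months are simplifying assumptions):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (ignores leap years)


def downtime_budget(availability_pct: float):
    """Allowed downtime in minutes per year, per month and per day."""
    down_fraction = 1 - availability_pct / 100
    per_year = MINUTES_PER_YEAR * down_fraction
    return per_year, per_year / 12, per_year / 365


for pct in (99.0, 99.9, 99.99, 99.999):
    year, month, day = downtime_budget(pct)
    print(f"{pct}%: {year / 60:.2f} h/year, {month:.1f} min/month, {day * 60:.1f} s/day")
```

    For 99.9% it reproduces the 8.76 hours per year and 43.8 minutes per month above, which is about 86 seconds per day.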

  72. Re:Qmail!! by Zarel · · Score: 3, Informative
    The "99.9% uptime is expected" suggests a fixation with Windows NT on flaky servers. 99.9% equates to 876 hours of outage a year. Quite frankly the requirement for 99.9% availability suggests the equirer does not know what they are talking about.
    That's not right... 99.9% uptime means 1/1000 of the time is outage. 1/1000 is less than 1/365, so 99.9% uptime means less than 24 hours of outage a year, not 876. It's actually somewhere around 8. I suppose you meant 8.76 hours of outage?
    --
    Want a high quality FOSS RTS game? Try Warzone 2100!
  73. Re:Qmail!! by LurkerXXX · · Score: 2, Informative
    What???

    Check your math before telling him he doesn't know what he's talking about.

    It's 8.76 hours of outage a year.

  74. Try this... by drasfr · · Score: 2, Informative

    Just an idea... if you want to go with open sources products in your company.

    First, the most important is the backend storage.

    - I would try using a SAN for storage, like a small CLARiiON, for example. I would carve the storage for the mail there on a volume.
    - I would create a set of export servers that would connect directly to the SAN and re-export the volumes to a set of front-end servers using a combination of GNBD, GFS, etc...
    See this document:
    - http://www.redhat.com/magazine/008jun05/features/gfs/
    - configure a set of servers that would now act as the mail servers themselves (front ends). I would strongly suggest using Maildir. Courier IMAP is great for the POP3/IMAP accounts. Install this on all the machines. For the SMTP agent you could use Courier, but I usually prefer Exim.
    - run both the IMAP/POP/SMTP servers on all the servers, using maildir only.
    - use a MySQL database to store the users' information (passwords, email addresses, etc.). You might want to configure 2 MySQL servers: one as the master, which receives all the writes, and the other as a slave, balanced with the master for reads, since reads of user information and accounts will probably be 99% of the database activity.
    - use a load balancer to put in front of all the frontend servers, do a load balancing for all the services (POP3/IMAP/SMTP) with sticky session that will try to keep the same users on the same machines when they try to download their mail.
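    The MySQL read/write split in the list above can be sketched as a tiny router. This is an illustration, not a real driver: the host names are hypothetical, and only statements beginning with SELECT are treated as reads:

```python
import itertools


class ReadWriteRouter:
    """Send every non-SELECT statement to the master and rotate SELECTs
    across the master and its replicas."""

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        self._readers = itertools.cycle([master, *replicas])

    def route(self, sql: str) -> str:
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        # Anything that is not a plain read goes to the master, the safe default.
        return next(self._readers) if verb == "SELECT" else self.master


router = ReadWriteRouter("db-master", ["db-replica"])
print(router.route("UPDATE users SET password = ? WHERE login = ?"))
print([router.route("SELECT maildir FROM users WHERE login = ?") for _ in range(4)])
```

    Two caveats with this design: replication is asynchronous, so a read issued right after a write (say, a password change) may hit a replica that hasn't caught up yet, and locking reads such as SELECT ... FOR UPDATE must go to the master. Routing those explicitly to the master is the usual fix.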

    When you are running out of capacity, simply add new front ends, put them behind the load balancers and voila...
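    Adding front ends is only painless if the sticky mapping doesn't reshuffle everyone. A naive hash(user) % N does: most users move when N changes. One alternative, swapped in here and not something the post specifies, is rendezvous (highest-random-weight) hashing keyed on the username; many load balancers key on client IP instead, so treat this as a sketch:

```python
import hashlib


def pick_frontend(user: str, frontends: list[str]) -> str:
    """Rendezvous hashing: score every frontend for this user and take the
    highest, so the same user always lands on the same frontend."""

    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}|{user}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(frontends, key=score)
```

    When a fourth front end is added, a user moves only if the new node now has their top score, so roughly a quarter of users move (all of them to the new box) and everyone else stays where their session cache already is.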

    Of course, I would advise going right away with powerful dual 3.6 GHz P4 servers and something like 4 GB of memory. That is powerful and can certainly serve a LOT of users per server.

    My 2c, written quickly. I apologize if it's not complete, but I am pretty sure the general idea is there and sound.

    open to comments

  75. You are wrong in every way. by Some+Random+Username · · Score: 4, Insightful

    There is absolutely no reason at all to leave 80% free space, 15% is more than enough to ensure you don't have fragmentation problems (I am assuming you are using a reasonable filesystem of course).

    Second, people with ridiculously frequent mail-check intervals are not any more of a problem. Modern operating systems use file system caches. You do not have to touch the disk subsystem in any way; frequently accessed data will be in RAM.

    And finally, a database has a lot of extra overhead, and there are a lot of deletes going on. Sure, such a select statement would work, but reading the files in one directory is an order of magnitude faster. And the deletes will really hammer your database. FFS+softupdates makes file deletion extremely fast. A relational database is not the answer for everything; stop trying to pretend it is. Use the right tool for the job, and for storing files, a filesystem is the right tool. It's not relational data, it doesn't need to be queried in arbitrary, complex ways, so it doesn't belong in a relational database.
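    For the filesystem side of this argument, here is what Maildir-style delivery looks like, sketched in Python. The naming scheme loosely follows D. J. Bernstein's Maildir conventions (time, P for process ID, Q for a per-process delivery counter, hostname); treat it as an illustration rather than a spec-complete delivery agent:

```python
import itertools
import os
import socket
import time

_counter = itertools.count()  # makes names unique within this process


def deliver(maildir: str, message: bytes) -> str:
    """Write into tmp/, fsync, then rename into new/.

    rename() is atomic within one filesystem, so a POP/IMAP server scanning
    new/ never sees a half-written message.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Maildir asks for "/" and ":" in the hostname to be escaped.
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    unique = f"{int(time.time())}.P{os.getpid()}Q{next(_counter)}.{host}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, new_path)
    return new_path


def list_new(maildir: str) -> list[str]:
    """Unread messages are simply the files in new/: one directory read."""
    return sorted(os.listdir(os.path.join(maildir, "new")))
```

    Listing new mail is one directory read and deleting a message is one unlink(), which is the operation profile the post is describing.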

    1. Re:You are wrong in every way. by cecil_turtle · · Score: 2, Informative

      With such a strongly worded title you'd think you'd actually have some experience to back up your claims. Memory access is faster than disk access, period. I don't care what file system you're on or what kind of caches the OS implements, fact is it's going to go to the disk almost immediately to store the change. And we're not talking about one user checking every minute. We're talking about tens of thousands of users checking every minute or few minutes. That's a continuous load on the disk - not a desirable situation for a server. Also remember that access logs are also written to disk as well.

      I'm not arguing that relational db's are the way to store everything; I'm totally about the right tool for the job. But file systems are good for storing files, they're not intended for the level of data updates (new files / deleted files) that a high use email server generates. Databases are. Also disk writes from databases are also optimized if your database is well designed and resistant to paging. If you don't want a RELATIONAL database, fine. There are other types of databases you know. Mail servers don't have anything in common with file servers in terms of resource usage.

    2. Re:You are wrong in every way. by eh2o · · Score: 2, Interesting

      As I said before, all of those things add up to a constant overhead. (but maybe you never took a class on algorithms so you don't know what I'm talking about...)

      In order to say that an RDBMS is an order of magnitude slower, one must show that as load increases the overhead of the DB grows faster than that of a FS doing the same task. (And, generally, to say that this difference is "an order of magnitude", the spread between them should increase at least linearly.)

      Doing a trace on a DB for a simple query tells you absolutely nothing about its scalability.

    3. Re:You are wrong in every way. by csirac · · Score: 2, Informative
      there is no partitioning of memory to "kernelspace" and "userspace"

      Yes, there is. Try some kerneltrap articles to learn more about Linux (and OS internals in general :-). This article describes how systems with > 1GiB "big memory" works on Linux on ia32, which is reminiscent of the himem.sys days of MS-DOS ;).

      2^32 = 4 "GiB". That's all you can address with 32 bits.

      On ia32, Linux allocates 1GiB of virtual address space to the kernel, and the remaining 3GiB to user space.

      Thus, the maximum amount of physical memory that can be mapped to a stock ia32 kernel is 1GiB.

      This is enabled via the PAE (Physical Address Extension) extension of the PentiumPro processors. PAE addresses the 4 GB physical memory limitation and is seen as Intel's answer to AMD 64-bit and AMD x86-64. PAE allows processors to access physical memory up to 64 GB (36 bits of address bus). However, since the virtual address space is just 32 bits wide, each process can't grow beyond 4 GB. The mechanism used to access memory from 4 GB to 64 GB is essentially the same as that of accessing the 1 GB - 4 GB RAM via the HIGHMEM solution discussed above.


      There are awkward things you can do at kernel compile-time to get more than 1GiB accessible to the kernel on ia32, but it's not as pretty as you seem to be thinking.
    4. Re:You are wrong in every way. by swillden · · Score: 2, Informative

      There are awkward things you can do at kernel compile-time to get more than 1GiB accessible to the kernel on ia32, but it's not as pretty as you seem to be thinking.

      Well, if setting CONFIG_HIGHMEM=y counts as "awkward". The kernel docs say that's the correct setting for machines with between 1 and 4 GiB of RAM.

      From my laptop (with 1.5GB RAM and a Linux 2.6.13 kernel, with CONFIG_HIGHMEM=y, which is actually the default setting on most distros these days):

      %cat /proc/meminfo
      MemTotal: 1555496 kB
      MemFree: 32000 kB
      Buffers: 137900 kB
      Cached: 1228596 kB

      1228596 KiB == 1199.8 MiB == 1.1717 GiB

      So my kernel is currently using more than 1GiB for caching disk storage. In fact, my kernel can address up to 4GiB of RAM. I have another 1GiB DIMM on the way (which will push my laptop to 2 GiB RAM), so in a few days I'll be able to show my machine caching around 1.7GiB. (Yes, there is a reason I need 2 GiB RAM in my laptop, and it's actually not file caching).
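      The unit conversions above are easy to check. A small Python sketch that parses /proc/meminfo-style text; the sample values in the check are the ones quoted above:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style lines such as "Cached: 1228596 kB".

    The kernel's "kB" unit is really KiB (1024 bytes), so values are kept in KiB.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def kib_to_gib(kib: int) -> float:
    return kib / (1024 * 1024)
```

      On a Linux box you'd feed it the contents of /proc/meminfo; for the Cached line above it gives 1199.8 MiB, or about 1.1717 GiB.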

      For machines with more than 4GiB of RAM you have to use PAE. That will allow the system to use up to 64GiB of RAM, but each process (including the kernel, even though it's not really a process) can only access 4GiB. So, your argument holds some water in the case that:

      • The machine has more than 4GiB to use for caching.
      • The machine has a 32-bit processor
      • The database engine runs multiple cache daemons, each of which caches up to the 4GiB of data it can address, and the actual server process has some mechanism for querying the correct cache daemon for a particular chunk of data.

      In that case, the database can cache more than the kernel. I'm not aware of any database engine that has such cache daemon processes. IMO, if you're putting more than 4 GiB in the box, you should probably go ahead and buy an Opteron for it also, avoiding the whole issue.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    5. Re:You are wrong in every way. by greenhide · · Score: 2, Insightful

      In a related vein:

      I'm a lowly web programmer, not nearly as brilliant in the programming field as these other geniuses here, but I find it interesting that almost all web programming books tell you that if you can move processing into the database query instead of running it in your application code, it'll be faster.

      This is so rarely the case. Unless you have a very powerful database server, odds are good that a lot of the aggregate functions you might want to run will go much, much faster if you simply do a simple select in the database and then loop through the processing in the web app code. Not sure why this is true, but it is.
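      For what it's worth, here are the two approaches side by side in Python with an in-memory SQLite table (the table and rows are made up). Both give the same answer; which one is faster depends on row counts, indexes, and round trips, so time it on your own data rather than trusting either rule of thumb:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (mailbox TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("alice", 1200), ("alice", 800), ("bob", 5000)],
)

# Option 1: let the database aggregate.
in_db = dict(conn.execute("SELECT mailbox, SUM(size) FROM messages GROUP BY mailbox"))

# Option 2: plain SELECT, aggregate in application code.
in_app = {}
for mailbox, size in conn.execute("SELECT mailbox, size FROM messages"):
    in_app[mailbox] = in_app.get(mailbox, 0) + size

print(in_db == in_app)
```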

      A month or two ago I heard a great quote on Car Talk that I think should be plastered to every programmer, scientist, and engineer's bulletin board:

      "Reality often astonishes theory."

      In all honesty, though, I think that a database *would* be up to the task, even for 1M+ users. Consider Amazon, which probably gets several thousand simultaneous hits each second. And each page they pull up involves much more complex data searches than a simple mailbox.

      I'd say the key concerns here aren't surrounding efficiency of processing. Mail servers, no matter how configured, are relatively low on the scale of computational complexity. It's more a size issue than anything else. The main problem will be determining how to store the data in a way that is safe, secure, fast, and reliable. Because the data needs to be redundant and widely dispersed (as in the New Orleans example someone pointed out above), it may be that a database, while not the fastest tool, may be the best tool for the problem.

      I'll admit I know nothing about how one would go about making identical file systems available simultaneously on many distant servers. But I'm guessing that once you start doing that, you're increasing the complexity of the system in any case.

      --
      Karma: Chevy Kavalierma.
    6. Re:You are wrong in every way. by BorisAmmerlaan · · Score: 2, Insightful
      A friend did this:
      for i in `seq -w 1 1000000`; do mkdir $i; done

      So you took the nearest LART and Enlightened him.

      Seriously, though - is there ever a reason to stick 1,000,000 objects into one container without any regard whatsoever to the type of objects or container? (Ignorance doesn't count.)
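If you do need a million mailbox directories on one filesystem, a common workaround is to fan them out into hashed subdirectories so that no single directory gets huge. Below is a minimal sketch. It assumes a root/xx/yy/username layout derived from a SHA-256 of the username. The layout and function name are illustrative only and do not reproduce any particular server's scheme.

```python
import hashlib
from pathlib import Path

def mailbox_path(root: Path, user: str, levels: int = 2, width: int = 2) -> Path:
    """Map a user to root/<hex>/<hex>/<user> using a hash of the name.

    The hash is only used to spread entries evenly, not for security.
    """
    digest = hashlib.sha256(user.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, user)

print(mailbox_path(Path("/var/spool/imap"), "alice"))  # hypothetical spool root
```

With two levels of two hex characters there are 256 * 256 = 65,536 leaf directories, so a million users average about 15 entries per directory instead of 1,000,000 in one.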

  76. Re:Obviously - RTFM!!! by elrick_the_brave · · Score: 2, Informative

    Exchange 2003, any edition. You can scavenge the restored database and bind it to any account that doesn't have an Exchange mailbox, i.e. a new temporary account... RTFM!!!!!

    --
    (1st sig) If this were a snappy sig, you'd be reading it right now. (2nd sig) I'm a karma whore. >Insert FUD here
  77. Re:go to gmail by plazman30 · · Score: 2, Informative

    Not to be a smart ass, but it's not SLA agreement. It's an SLA. SLA stands for service level agreement. SLA agreement would be service level agreement agreement.

  78. Re:Qmail!! by ObjetDart · · Score: 3, Funny
    99.9% uptime allows for almost 15 minutes of downtime a day. Even for a mom-and-pop business, that is becoming unacceptable.

    Yeah. Well, if 1 minute, 26 seconds is "almost" 15 minutes, anyway.
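The arithmetic is easy to check. Here is a small Python sketch that converts an availability target into allowed downtime per day and per year:

```python
def allowed_downtime(availability: float) -> tuple:
    """Return (minutes per day, hours per year) of permitted downtime."""
    unavailable = 1.0 - availability
    per_day_min = 24 * 60 * unavailable
    per_year_h = 24 * 365 * unavailable
    return per_day_min, per_year_h

for a in (0.999, 0.9999, 0.99999):
    day, year = allowed_downtime(a)
    print(f"{a:.3%}: {day:.2f} min/day, {year:.2f} h/year")
```

For 99.9% this gives 1.44 minutes a day (about 1 minute 26 seconds) and 8.76 hours a year.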

    --
    I read Usenet for the articles.
  79. Really want to know why the client isn't as slick? by CFD339 · · Score: 2, Informative

    Simple. It's cross-platform. The entire product is cross-platform. Yeah, like Java. Only they did it before Java was even a pipe dream: the late '80s.

    It has this thing called a separation layer. All the code except the UI is the same on all the platforms. There used to be clients for OS/2, Mac, Win16, Win32, and Solaris. The client side got scaled back because nobody paid for the others, so the client is Win32 and Mac now, soon with code under Linux as part of the next-generation client. Lots of people are using it on Wine.

    Now, the server is still cross-platform: Win32, Linux, AIX, iSeries (AS/400), zSeries.

    The problem with making something cross platform is, you don't use all the nifty little Windows specific integration and custom pretty things. You don't get something for nothing -- you have to make all those bits.

    Oh, the other thing? Outlook feels integrated because everything automatically does the Windows auto-launch ActiveX thing. Just highlight a message subject and, bingo! Embedded code launches! That's why you get viruses and worms.

    If stuff wants to run in Notes, it has to have a signature. Ohhh, public/private key signatures and encryption. When? 1991. Huh? Yeah, since 1991.

    If something wants to run in Notes, it needs PERMISSION to run. Thus, no viruses or worms unless you're stupid enough to tell them "OK, sure, go screw up my machine".

    Yes, the development environment is weird and pretty unsophisticated. It takes a lot of time to learn because it's not like other things. BUT I can make it do cool, secure, reliable things at a tenth of the cost you can in J2EE or MS .NET.

    Excited about JSR170? Ah, me too. The Notes database internals match it almost perfectly. Domino will make a great JSR170 back end. Hell, it's almost that already.

    Meantime, you trolls are whining about a product that runs in Linux as a server and (using Wine) as a client. Runs on Mac. Has a fully functional JAVA environment for development and a remote API through CORBA and DIIOP. No no, instead you'll use a proprietary only -- Windows Only, Active Directory Only, Virus Distribution Engine from Microsoft.

    ahahahahahaha. Enjoy it!

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  80. no, it will not be sendmail by Doktor+Memory · · Score: 3, Interesting

    All of these systems will be running sendmail.

    You're high. Building a massive production email system on Sendmail 8 is slow-motion suicide. If the security holes don't get you, the terrible configuration methods and complete lack of scalability will, never mind the fact that Sendmail Inc is trying desperately to replace the product.

    "Most managable with [...] heavy customization?" I'd laugh if I wasn't crying. And I'm crying because I used to work for a company that deployed a massively customized sendmail infrastructure -- and I was one of the poor bastards who had to maintain it. Trust me, you don't want to do this. Ever.

    Yes, milter is cool. No, it's not cool enough to justify burning CPU cycles on sendmail in 2005.

    Even Sendmail Inc tacitly admits that Sendmail's design is garbage: take a look at the design document for Sendmail X, and note carefully how much it resembles Postfix and Qmail. There are very good reasons for this.

    --

    News for Nerds. Stuff that Matters? Like hell.

  81. The question is bogus. Hypothetically..... by CFD339 · · Score: 2, Informative

    Hypothetically (since nobody is dumb enough to believe this is a real-life case of a million users being defined by someone betting his career on Slashdot trolls):

    If it were me starting from scratch, the model for a million users is the Internet itself: SMTP, DNS, and maybe a big LDAP directory tool. For calendaring, you're SOL, but nobody calendars with a million people. That's meaningless. Calendaring is only useful at the workgroup level anyway. Look to any good workgroup calendaring tool and let users define their own working groups.

    Now, backing off the big, stupid million-user number: in the real corporate world, you have two real players and a ton of also-rans. The two real players are IBM/Lotus with Notes and Microsoft with Exchange.

    The market is split roughly evenly. In the US Microsoft leads a bit; in Europe and EMEA IBM/Lotus leads. Exact margins and actual numbers are hard as hell to track down. IBM doesn't release them and Microsoft likes to count every copy of Office as an Outlook seat. Suffice it to say both companies own about a hundred million actual users.

    The basic trade-off between the two: with Exchange you get tighter integration with Active Directory and smooth look-and-feel integration on Windows. It feels like all part of the operating system. On purpose. On the other hand, you're forced to use Active Directory, forced to use Win32, and all that integration without any real security means viruses are unstoppable. With Notes you get a bulky client that many users find hard to understand. You also get almost 100% prevention of virus spread (it has built-in security) and other goodies. It's also a development platform, and it's cross-platform. The client is Win32 and Mac, and users have written HOWTO docs for Wine. The server is Linux, Win32, AIX, zSeries, and iSeries (AS/400).

    You may not know this, but BOTH can use the Outlook client. Yes, the outlook client is supported with a Domino mail infrastructure. Who'd have thunk it?

    Oh, and Domino supports other mail clients too: POP3, IMAP, and a very good web browser interface, all at once for the same person if you like. It's got native SMTP support, as well.

    What Notes isn't is pretty. Most people say Outlook is prettier. OK. Easy to do if you own the OS and make software that only runs in one environment.

    So, I hear rants about Notes. I hear trolls whining about a product that runs in Linux as a server and (using Wine) as a client. Runs on Mac. Has a fully functional JAVA environment for development and a remote API through CORBA and DIIOP.

    No no, instead they'll use a proprietary only -- Windows Only, Active Directory Only, Virus Distribution Engine from Microsoft.

    You gotta love that. Why? Well, it's pretty.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  82. Google It! by DieBase99 · · Score: 2, Informative

    I searched Google for "email system" "1 million users" and this page came up: @Mail with large user bases. It even gives you a case study of Hotmail! The company is called @Mail; it is the exact same solution that seeqmail.com uses, and they have over a million users. Read it, find out more, and Google some more. Don't pay over-priced consultants unless it is something you have absolutely no expertise in. It is your job to figure out how to get it done.

  83. Re:13 Million mail infrastructure by louissypher · · Score: 2, Insightful

    I built and administer mail for around 100k users. There is no f'ing way you can run 13 million accounts on 10 machines. One webmail server for 13 million people?

    --
    www.bleepyou.com
  84. Re:Qmail!! by Loconut1389 · · Score: 4, Funny

    You used to work for NASA right?

  85. Plain Text by Craig+Ringer · · Score: 2, Informative

    Yes, passwords are transmitted in plain text. The same goes for IMAP and SMTP. You do make your users authenticate for SMTP, right? Picking another protocol will not help in this regard.

    What you need to do is support STARTTLS for these protocols. That lets the client connect then negotiate an encrypted connection with the server before sending passwords. It's easy to configure the server to refuse to authenticate the client unless an SSL session has been set up if that's what your security policy dictates. It's also possible to have the server demand a client certificate from the client before setting up the SSL connection, adding an extra layer of authentication.
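From the client side, the same policy (never send a password before the channel is encrypted) looks roughly like the sketch below, which uses Python's standard smtplib and ssl modules. The host name, the credentials and the choice of port 587 (the mail submission port) are assumptions for illustration. The smtp_factory parameter exists only so that the logic can be exercised without a real server.

```python
import smtplib
import ssl

def login_with_starttls(host, user, password, port=587, smtp_factory=smtplib.SMTP):
    """Connect, upgrade the session with STARTTLS, then authenticate.

    Refuses to send credentials if the server does not advertise STARTTLS.
    The caller owns the returned connection and should call quit() on it.
    """
    conn = smtp_factory(host, port)
    conn.ehlo()
    if not conn.has_extn("starttls"):
        conn.quit()
        raise RuntimeError(f"{host} does not offer STARTTLS; not sending credentials")
    # Verify the server certificate against the system CA store.
    conn.starttls(context=ssl.create_default_context())
    conn.ehlo()  # capabilities (including AUTH) are re-announced after TLS
    conn.login(user, password)
    return conn
```

Usage would be something like `conn = login_with_starttls("smtp.example.com", "alice", "secret")`, followed by `conn.send_message(...)` and `conn.quit()`.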

    You'll probably also have to support the old IMAPS, POP3S, and SMTPS standards, but they should be considered deprecated and only kept in place for crap clients that don't know about STARTTLS.

  86. Needs to be said: by StarsAreAlsoFire · · Score: 3, Funny

    From ASR ( http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html )
    Re : Mail Transfer Agents

    Qmail: a small office of neatly dressed clerks, delivering short clipped remarks to queries, and handling mail with a rude impersonality, except in the case of failure, where they let their hair down, have an after-hours beer and let you know about it, pointing to the pertinent header sections.

    MMDF: A jumped up mailroom boy with a chip on his shoulder. Loves the bureaucracy and takes great pride in stamping "illegal address" in red ink on any mail it passes. Unpacks all the mail and repacks it in his own special envelopes before delivery to end users.

    PP: MMDF gone mad with standards fever. Think "Brazil".

    No, PP is... well, see, when it receives a letter, it chops it into small pieces, then translates bits of it using an English-Hungarian phrasebook and puts all the bits into various pigeon-holes. When it gets round to delivering the message, it collects all the bits, translates them back using a Hungarian-English phrasebook, tapes them together, and loses the letter. Some time later, you get a bounce message:

          ----- The following addresses had permanent fatal errors -----

          ----- Transcript of session follows ----- ... while talking to bloat.example.com.:
    >>> RCPT To:
      550 My hovercraft is full of eels

    PP is John Cleese.

    Sendmail: Shiva as a postman. Many arms delivering mail, dancing, taking drugs, destroying as it sees fit. Often makes creative changes to the mail for kicks, but ultimately can be persuaded to do anything with the right incantation...and that includes giving you other people's mail.

    VMail: No experience yet, but I'd guess something like a wizened old man sitting on the porch outside the post office. Looks at everyone who passes by with deep suspicion, but turns out to be friendly and helpful once he realises you're not there to rob the place.

    Micro$oft IMC: The Scarlet Pimpernel of postmen. Hard to find, impossible to order about, but every once in a while it saves a piece of mail from disaster. Sometimes even with its head(ers) intact.

    cc:Mail SMTPLINK: A 5-year-old child left in charge of a large sorting office. Can't reach over the counter properly, can't handle more than one letter at once and has to go looking for a grownup whenever it wants to deliver mail to other towns. Often opens parcels to look for shiny things inside, then just delivers the wrapping paper onwards.

    cc:mail UUCPLINK: an insane madman sitting in a box. Mail is thrown into a box where unknown things happen to it... sometimes mail actually leaves the box... usually to be delivered to the administrator of a totally unrelated post office, containing a complaint that the madman could not find the recipient in his dark box and would you please contact the person with the key of the box. Of course, the only way to reach that person is by mail, and even if the box is opened the madman cannot be persuaded to send mail to unknown addressees, such as the person with the key, anyway...
    Gus, Pete Bentley, Malcolm Ray, Perry Rovers

  87. Re:Qmail!! by nagizli · · Score: 2, Insightful

    Rather than debating how much downtime that works out to, which is completely worthless, I'd rather you skimmed through the specs of FreeBSD & Qmail, if they exist. I'd also look for companies that provide installation and support of FreeBSD and consult them on how much this installation could cost. I'd also look for successful projects with Qmail & FreeBSD.

    I'd take into consideration the fact that UNIX-based solutions are far more lightweight than Microsoft's, so you have no idea what you're talking about unless you've managed one yourself. Before debating how long 0.1% downtime is, I'd rather you consider other numbers which are of much more importance to you now.

  88. Backups by Craig+Ringer · · Score: 4, Informative

    Backups.

    With POP3, the client downloads mail and deletes it off the server. Without a significantly butchered POP3 server there's no way to hold copies of that mail for a period of time (say, to ensure it goes on to your archival tapes, or to make sure you can recover files the user deleted accidentally). It's one less thing to worry about if their workstation / laptop dies, too - just give 'em another one. If more mail clients supported LDAP address books and WebDAV calendars this would be even nicer; as it is I still have to keep their mail folders in their network home dir so I can back up their address book.

    You can back up POP3 boxes if you're on a corporate network, by forcing the client to keep its spools on the user's homedir. That tends to be slow and inefficient, though, and it doesn't let you do things like transparently split out attachments and store only one copy of an identical attachment for everybody.

    It's also easy to lose mail with POP3 if your client does something silly. Most clients seem pretty decent now, but I remember old Eudora versions used to DELE mail off the server then crash, corrupting their mailboxes. Woohoo.
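One way a client can avoid the DELE-then-crash problem is to write each message durably to disk before marking it deleted. Under RFC 1939, DELE only marks a message. The server removes marked messages when the client sends QUIT, and must not remove them if the session ends without a QUIT. Below is a minimal sketch written against the interface of Python's poplib.POP3 object. The spool directory and the msg-N.eml naming are hypothetical; a real client would use unique names, such as the UIDL.

```python
import os
from pathlib import Path

def fetch_and_delete(pop, spool_dir: Path) -> int:
    """Download every message, saving each durably before DELE.

    `pop` is assumed to be an already-authenticated poplib.POP3 or
    POP3_SSL object. Returns the number of messages saved.
    """
    count, _mailbox_size = pop.stat()
    for num in range(1, count + 1):
        _resp, lines, _octets = pop.retr(num)
        path = spool_dir / f"msg-{num}.eml"  # hypothetical naming scheme
        with open(path, "wb") as fh:
            fh.write(b"\r\n".join(lines) + b"\r\n")
            fh.flush()
            os.fsync(fh.fileno())  # on disk before we ask the server to delete
        pop.dele(num)  # only marks the message; nothing is removed yet
    pop.quit()  # server enters UPDATE state and removes marked messages
    return count
```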

    IMAP gives admins much more control over user mail. You can back up their mail folders, including their outbox and filed mail. You can enforce mail lifetime limits if your information retention policy requires it. You can store single copies of duplicate messages and attachments. You can give users access to shared mailboxes, and to each other's mailboxes where necessary. You can manage their mail folders remotely ("I can't delete $message, help!"). You can set up filters that deliver mail into sub-mailboxes automatically. Good clients automatically sync the IMAP mailbox so it can be used when the client is offline, like POP3. You can have your anti-spam software learn from their mail client's Junk folder. It's just much saner for business environments, in much the same way that network home directories and thin clients are much saner than a bunch of desktops with local storage.

    IMAP also permits you to give the user a single view of their mailboxes from their desktop and when they're on the road, or accessing their mail from home. Don't even talk about "leave mail on server" for POP3 - users WILL misconfigure it and suck all their mail down onto one of their machines, then come to you looking for help cleaning up the resulting awful mess.

    Now, for an ISP, things are the opposite. You want to get the users' mail through your system and get rid of it. Most ISPs only offer POP3 and have small mailbox caps, so the user can't set their client to never delete mail off the server. They don't want to be responsible for user mail; they want it off their hands ASAP. An ISP can just tell a user who deleted a message and then wants it back "well, that was silly then, wasn't it?". An ISP doesn't want to back up 5 years' worth of mail for 500,000 users.

    My point is that for corporate environments IMAP is so superior that it's almost nuts to offer anything else, but for an ISP POP3 is a much more viable option. So what's so bad about POP3 depends entirely on what your needs are.

  89. Quick setup by mseeger · · Score: 4, Insightful
    Hi,

    my recommendations:

    • Budget about 20-30 man-days for the initial design. You'll need about 30-50 man-days of software development and 100 man-days for setup, testing and fine-tuning. Figures may vary with skill and LWF. Time for integration into your backup service is not included.
    • Use a directory service with a replication mechanism (preferably LDAP; we've done it with MySQL too). Every system except the load balancers will get a replica.
    • The user data is stored on machines running Cyrus. Depending on machine size, user profile, mailbox size etc., you put between 5,000 and 50,000 users on each system.
    • The directory service knows which user is on which system. Prepare a script to move users from one server to another (including the mailbox).
    • Incoming IMAP connections go through a load balancer to frontend systems running the perdition proxy. Those relay the requests, according to the directory, to the responsible IMAP server.
    • Incoming HTTP requests go through the load balancer to an Apache with SquirrelMail on the frontend systems. Those convert the requests into IMAP requests and connect to the local perdition.
    • Build a web frontend for the user to set up auto-reply, vacation and anti-spam settings.
    • From those settings you can create SIEVE scripts for the user.
    • Incoming and outgoing SMTP traffic is handled by systems running sendmail. Local delivery is handled by LMTP connections directly to the IMAP servers (Cyrus can handle LMTP).
    • Antivirus and antispam are handled through the milter interface and appropriate plugins. Plan for individual settings per user (these can be generated from the data in the directory server).
    • Load balancing SMTP is trivial.
    • Add monitoring (e.g. Nagios), backup and restore (the last one is the most important: nobody wants backup, all everyone wants is restore).
    • If desired, use a cluster file system for the IMAP servers to get even more redundancy.
    • Make sure you have access to your company's internal DNS. If you can set up "mail.acmecompany.com" to point to several IPs (depending on location), this may ease your job a lot. If you cannot, this may be hard (and expensive) for your load balancers.
    • You can scale everything horizontally in this concept. The choke point may be the load balancers.
    • You can distribute the system easily across several locations. Distribution over several continents is only recommended if you can manage either the DNS or the mail agent settings per continent.
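The step where settings become SIEVE scripts can be quite small. Below is a minimal Python sketch. It assumes hypothetical per-user settings collected by the web frontend, and it assumes the spam filter marks spam with an "X-Spam-Flag: YES" header, as SpamAssassin does. Adjust the header to whatever your filter actually adds. The script only uses the fileinto and vacation Sieve extensions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MailSettings:
    # Hypothetical per-user settings from the web frontend.
    spam_folder: Optional[str] = "Junk"
    vacation_message: Optional[str] = None
    vacation_days: int = 7

def sieve_quote(text: str) -> str:
    # Sieve quoted strings escape backslash and double quote (RFC 5228).
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_sieve(s: MailSettings) -> str:
    requires, rules = [], []
    if s.spam_folder:
        requires.append('"fileinto"')
        # Assumes the spam filter stamps "X-Spam-Flag: YES" on spam.
        rules.append(
            'if header :is "X-Spam-Flag" "YES" {\n'
            f"  fileinto {sieve_quote(s.spam_folder)};\n"
            "  stop;\n"  # don't send vacation replies to spam
            "}"
        )
    if s.vacation_message:
        requires.append('"vacation"')
        rules.append(f"vacation :days {s.vacation_days} {sieve_quote(s.vacation_message)};")
    header = f"require [{', '.join(requires)}];\n" if requires else ""
    return header + "\n".join(rules) + "\n"

print(build_sieve(MailSettings(vacation_message="Out until Monday")))
```

The generated text would then be installed in the user's mail store, for example over ManageSieve.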
    Please forgive me if I'm not completely correct. I'm only the sales rep ;-). But we've done it several times for ISPs. OSS software usually does the biggest part of the work. Usually some components (depending on existing contracts and knowledge) are commercial software (e.g. antivirus, load balancers, cluster file system). Typical operating systems are Solaris or Linux.

    With backup support you should be able to set up such a system in 6 to 12 months (the latter more realistic for big companies).

    Most probably users will complain about the lack of a calendar.

    Most troublesome will be the migration phase (hope you realized I didn't mention it above). This depends so much on your current scenario that it is very difficult to give general advice.

    > where would you start?

    Contacting me ;-). Perhaps get a budget first. As I said, I'm sales....

    Regards, Martin

  90. Stop right now by biglig2 · · Score: 3, Insightful

    What you have here is an opportunity for a tremendous open source win against exchange, and you are about to stuff it up because you do not have a clue how to do it.

    So, what you do right now is you go find someone who does know how to do it. And by that I mean someone who can demonstrate they know how. Which does not equate to having a low slashdot id; it equates to having done real projects of this scale.

    So, how do you start? You ring IBM and get them to come in and talk to you. You ring Red Hat. You ring Accenture.

    If you want impartial advice from someone who isn't a vendor (which is a good idea), then go find some companies that have million-seat open source e-mail deployments in place and see if you can get their messaging admins to talk to you.

    --
    ~~~~~ BigLig2? You mean there's another one of me?
    1. Re:Stop right now by vidarh · · Score: 2, Informative
      1m mailboxes isn't much, and doesn't require a big complex system. Been there, done that, and learned a lot (both about what to do and what not to do) in the process, but the short story is that it isn't particularly hard.

      The main challenges when I was doing it 5 years ago (I designed and wrote most of the prototype of a free webmail system, and managed the development team that completed it) were the lack of good open source webmail solutions, the lack of scalable mail storage systems, and hardware limitations.

      Today there's a huge number of GOOD IMAP based webmail packages, such as IMP, and mail storage isn't much of a problem anymore - you can get a couple of TB of storage relatively cheaply.

      Today, if I were going to do this in a corporate setting, I'd buy 3-4 small, cheap servers to process inbound/outbound mail; 2-3 reasonably high-powered machines with good IO capacity and RAID5 to split the users' mail storage and POP/IMAP access across (IO is more or less the ONLY thing that really matters: whenever you need to make a choice, always choose higher IO capacity over almost anything else); 2 machines for an LDAP directory of which server each user is on; and 2-3 cheap servers to run the web frontend on.
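The "LDAP directory of which server the user is on" is essentially a lookup table plus a placement policy. Below is a toy Python sketch. An in-memory dict stands in for LDAP, and the backend names are hypothetical. Placing new users on the least-populated backend is just one possible policy.

```python
from collections import Counter

class MailboxDirectory:
    """Toy stand-in for the 'which server is this user on' lookup."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.location = {}  # user -> backend host

    def load(self):
        counts = Counter(self.location.values())
        return {b: counts.get(b, 0) for b in self.backends}

    def provision(self, user):
        # Place new mailboxes on the least-populated backend.
        if user not in self.location:
            loads = self.load()
            self.location[user] = min(self.backends, key=lambda b: loads[b])
        return self.location[user]

    def route(self, user):
        # What a front-end IMAP/POP proxy would ask before forwarding a login.
        return self.location[user]

    def move(self, user, new_backend):
        # Only the pointer changes here; copying the mail itself is separate.
        if new_backend not in self.backends:
            raise ValueError(f"unknown backend {new_backend}")
        self.location[user] = new_backend
```

Moving a user is then a two-step job: copy the mail to the new backend, then flip the directory entry so that the front ends send the next login there.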

      All in all, for that kind of scale, if your total cost pans out to more than 20-30 cents per user in hardware these days, you're doing something very, very wrong (and you can manage for MUCH less depending on the usage patterns of your users and how much time you're willing to spend on tweaking the software).
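Applied to the million accounts in the original question, the 20-30 cents per user ceiling works out to a total hardware budget of roughly:

```latex
1\,000\,000 \times \$0.20 = \$200\,000
\qquad\text{to}\qquad
1\,000\,000 \times \$0.30 = \$300\,000
```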

  91. Re:Army Knowledge Online does it for 1.72 million by joib · · Score: 2, Informative

    My university also uses the Sun Messaging server. But we're only about 15000 students, so it's not a huge deal. But it works really well, at least compared to the old system with NFS-mounted mailboxes; there were constant problems with that, and it was overloaded and slow too.

  92. Hula by Norny · · Score: 2, Informative

    It's unfortunate you got so many junk answers to your query (e.g. "resign", gmail, .mac, etc). I had a server running ~15,000 accounts on a Pentium 133 with IMail 7 a while back. It wasn't pretty, but mail got sent and received as it should.

    Hula claims to scale pretty well, integrate with ClamAV and SpamAssassin, and have lots of other cool gimmicks for calendars and such. For 1 million accounts, I'd get some sort of dedicated spam/virus filter, though.

  93. Over here by guruevi · · Score: 2, Insightful

    We do it with a bunch of Postfix servers and MySQL. The MySQL is going to be clustered soon but currently runs separately on each server. Each server has MySQL and Postfix and generates statistics. Currently the most heavily loaded machine (10,000 mail accounts) eats about 1-5% of CPU (single Xeon with 3x72 GB SCSI RAID5). We estimated you can push about 100,000 accounts per server given enough disk space (we are planning to put it on an Apple SAN solution) and by separating out the MySQL database. There are about 10 mails/sec passing through the server (in/out). By comparison, an environment with 1000-2000 Exchange e-mail accounts takes up 2 dual-processor servers for the frontend and 2 single-processor servers for the backend (storage), and it needs to be migrated to a $70,000 storage solution because the current one doesn't give enough throughput. The problem is that each time a secretary opens a calendar (e.g. to schedule an appointment with management), all those mailboxes, schedules, calendars and notes are opened, searched through and synced (about 2000 MB of data transfer in a few seconds), while the IMAP protocol doesn't do that and provides the same functionality.

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  94. Re:Army Knowledge Online does it for 1.72 million by ataddei · · Score: 2, Informative

    Sun has the Sun Outlook Connector that allows MS Outlook to behave normally while there are Sun Messaging, Calendar, Addressbook and Directory servers instead of MS Exchange. In addition, Sun has a SAFE methodology and toolkit to migrate out of MS Exchange.

  95. What a shame. by jotaeleemeese · · Score: 2, Insightful

    Somebody who obviously has never been trusted with a challenge in his job.

    Sad.

    --
    IANAL but write like a drunk one.
  96. NMCI Blows by HangingChad · · Score: 3, Informative
    Just so you know. Most of us out in South East Asia refer to NMCI (Navy-Marine Corps Intranet) as the Not Mission Capable Intranet.

    When it works at all it's slow. Sometimes you can hit the Send button and just sit there and wait a while.

    When we work on a Navy project, we've had to start bringing our own equipment and hubs. Even their developer machines come loaded with 10-year-old software, and you can't get your email and be logged in as a developer at the same time. To check mail you have to log out, log back in under a different account, then log back in as a developer. The NMCI machines are boat anchors.

    NMCI is the worst defeat the US Navy has ever suffered.

    --
    That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
  97. Well by Shads · · Score: 3, Insightful

    In my opinion you're going to need a cluster of servers, or at least round-robin MX records for the servers. I personally think sendmail scales the best of the MTA packages and offers the best set of features and ease of maintenance, although a lot of people would argue it's intrinsically insecure... I've never had problems, but I kept our mail servers up to date. I would separate the SMTP machines the outside world uses to deliver mail to your space from the servers your own users use to send mail. I would also move delivery services (IMAP, POP, webmail) to their own machines instead of having them on the SMTP machine, and you would probably be best off using a NAS for the actual storage medium. This is actually a really interesting project. Good luck and let us know how it turns out :)
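Round-robin MX works because sending servers are expected to try MX hosts in order of preference value, lowest first, and to randomize among hosts that share a preference (RFC 5321, section 5.1). Below is a minimal Python sketch of that ordering, with hypothetical host names. Giving several inbound servers the same preference is what spreads the load across them.

```python
import random

def mx_delivery_order(records, rng=random):
    """Order (preference, host) MX records as a sending MTA would try them:
    lowest preference value first, shuffled among equal preferences."""
    by_pref = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = by_pref[pref][:]
        rng.shuffle(hosts)  # spread load across equally ranked servers
        order.extend(hosts)
    return order

records = [(10, "mx1.example.com"), (10, "mx2.example.com"), (20, "backup.example.com")]
print(mx_delivery_order(records))
```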

    --
    Shadus
  98. My 2p on where to start.. by chewitt · · Score: 2, Interesting
    My 2p, based on experience of designing, managing and being commercially responsible for large-scale messaging systems for the last 6-8 years (where large scale covers 500k users to 9m users), is that you don't want to use OSS as the core for projects this size. This may sound somewhat heretical to the /. audience, but if you're serious about the uptime constraints (99.9% is light - 99.999% is where you need to be and 100% is what you should be aiming at) and factor in that someone's business somewhere is going to depend heavily on the success of this system, you *need* the focussed support and SLAs that you will only get from a commercial vendor. You're still going to glue the system together with a number of open technologies, and there will be substantial customisation to meet your needs, but the core of the system needs to be rock-solid. In general my experience has been that much OSS mail componentry is fantastic at lower scales, both technically and commercially; however, the admin burden rises unacceptably when the collective sum of all those components needs maintaining, even in the hands of highly skilled administrators. Mail platforms at these scales constantly have problems/issues in them somewhere due to the unpredictability of a million users alone, so one of your biggest concerns is how you overcome them. Being dependent upon the OSS community or internal resources to perform a root cause analysis and fix a code bug while your system is running live is not a situation you can afford to be in.

    Some things to consider: MS Exchange is a lot more than just mail. If calendaring and other forms of group-working are involved, then the task at hand is substantially more complex than for a mail-only system. Also, these days, with viruses and spam being endemic, the platform needs to incorporate a framework that handles them, as well as policy-driven content management controls, at its core rather than as bolt-ins or bolt-ons. Are you bound by any regulatory requirements? Geography is a major influence, and if this is a business platform, how does this affect your strategies for resilience, disaster recovery and backup of the platform? In a perverse way, most of the decisions you have to make when building systems of this size are business decisions (the cost of retraining users on new mail clients is a favourite of CTOs), not decisions specifically about the products/technologies involved.

    So, exactly what type of hardware/software and surrounding infrastructure you need to assemble to create 'the whole' is a somewhat open-ended question without going into a decent level of detail on your requirements and the drivers behind them. However, once you go north of about 500k users the number of commercial vendors tails off dramatically. If you include group-working as a factor, it reduces further. I'll not start suggesting names (I currently work for a vendor in this space, and self-plugging's not in the spirit that /. operates on), but I'd recommend starting out by talking to some of the analyst groups that have staff researching this end of the messaging market (Radicati, Gartner, Butler Group) and then opening a dialogue with vendors as appropriate.

  99. Has anyone suggested Gmail? by mrlatito · · Score: 2, Funny

    Gmail is open to everyone now right....just sign up for 1,000,000 gmail accounts and go on vacation! Let the engineers at google do it.

  100. Notes/Domino by hey! · · Score: 3, Interesting

    Of course nearly everyone who uses it hates it, because it seems unnecessarily complicated. But this is precisely the kind of situation Domino was designed to handle: scaling. If you can get by with Sendmail, you don't need or want Domino, but if you want to manage a million email accounts, this is one of the first places I'd look.

    This is exactly what Notes was designed to do: scale. People have been building systems on this scale with Notes for nearly twenty years. You can not only scale it by moving parts of your email system onto mainframe-class iron, but you can also distribute it and build all kinds of flexibility and redundancy into your system to meet virtually any messaging requirement (e.g. choose an alternate MTA for high-priority traffic when there are Internet disruptions). Naturally there's some complexity involved, but if you can get by with sendmail you probably shouldn't be using Notes.

    What's more important is account and identity management, which is distributed, delegatable, and backed by robust cryptographic certificate management. You can let a subsidiary manage its own accounts, the subsidiary can subdelegate that to a division, and the division can subdelegate it to the IT staff on site; at each level, policies can be set, enforced, and changed for the levels below.
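    To make the delegation idea concrete, here is a minimal Python sketch of a delegation tree with inherited and enforced policies. It is a hypothetical, generic model: the `AdminUnit` class, its fields, and the rule that a key enforced at one level cannot be overridden further down are all assumptions for illustration, not how Notes/Domino actually implements its administration or certificate hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdminUnit:
    """Hypothetical node in a delegation tree (company -> subsidiary -> division -> site)."""
    name: str
    parent: Optional["AdminUnit"] = None
    admins: set = field(default_factory=set)      # who may manage this unit and everything below it
    policies: dict = field(default_factory=dict)  # settings made at this level
    enforced: set = field(default_factory=set)    # keys lower levels may not override (assumed rule)

    def chain(self):
        """Units from the root down to this one."""
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))

    def can_administer(self, admin: str) -> bool:
        # An admin delegated at this level or at any ancestor may manage this unit.
        return any(admin in unit.admins for unit in self.chain())

    def effective_policy(self) -> dict:
        # Walk root -> leaf: lower levels override inherited settings
        # unless a higher level has enforced (locked) that key.
        merged, locked = {}, set()
        for unit in self.chain():
            for key, value in unit.policies.items():
                if key not in locked:
                    merged[key] = value
            locked |= unit.enforced
        return merged
```

    For example, a corporate level that enforces a 10 MB attachment limit still lets a subsidiary raise its own mailbox quota, and a site inherits both results.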

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  101. Cyrus by DanFluidMind · · Score: 2, Informative

    I would seriously look at Cyrus (http://asg.web.cmu.edu/cyrus/), which is designed to be scalable for huge numbers of email accounts. And the email users don't have to have accounts on the Unix boxes. It stores the messages in the file system but sets up index databases so that accessing the mailboxes is fast. It can also handle single-instance storage of the messages sent to multiple mailboxes.

  102. Re:Qmail!! by hensley · · Score: 2, Informative

    Just do the math:

    1 yr = 24 * 365 h = 8760 h
    99.9% reliability => 8760 h * 0.999 = 8751.24 h uptime, i.e. 8.76 h downtime
    99.99% reliability => 8760 h * 0.0001 = 0.876 h downtime = 52.56 minutes

    You're off by an order of magnitude!
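    The same arithmetic as a small Python helper, assuming a 365-day year as above (leap years ignored):

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h; leap years ignored, as in the math above


def downtime_hours_per_year(availability: float) -> float:
    """Maximum downtime per year allowed by an availability target such as 0.999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return HOURS_PER_YEAR * (1.0 - availability)


for target in (0.999, 0.9999):
    hours = downtime_hours_per_year(target)
    print(f"{target:.2%}: {hours:.3f} h/year = {hours * 60:.2f} min/year")
```

    Run as-is, it reports 8.760 h (525.60 min) per year for 99.9% and 0.876 h (52.56 min) per year for 99.99%.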

  103. and then you need a copy by diegocgteleline.es · · Score: 2, Interesting

    You have a flawed assumption in that the file is read only. Exchange/Outlook will let you modify the attachment in place and keep it in your mailbox.

    ...and then Exchange WILL have to write a new copy of the data, because you just modified it and it is no longer the same as before; you can't keep using the shared copy. If the 1000 users leave the file unchanged it's fine, but if they all modify it you need 1000 copies of it.
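    A minimal sketch of the copy-on-modify behaviour described above, assuming a hypothetical in-memory store keyed by SHA-256 digest with reference counts (an illustration of the idea, not how Exchange actually lays out its store): an identical body delivered to many mailboxes is kept once, and editing one recipient's copy creates a second blob while everyone else keeps the original.

```python
import hashlib


class SingleInstanceStore:
    """Hypothetical in-memory store: identical bodies are stored once and
    reference-counted; modifying a shared body gives that user a new copy."""

    def __init__(self):
        self.blobs = {}      # digest -> message bytes
        self.refcount = {}   # digest -> number of mailbox entries pointing at it
        self.mailboxes = {}  # user -> {message_id: digest}

    def _put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def _release(self, digest: str) -> None:
        # Drop the blob once no mailbox references it any more.
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blobs[digest]

    def deliver(self, users, message_id: str, data: bytes) -> None:
        for user in users:
            box = self.mailboxes.setdefault(user, {})
            digest = self._put(data)
            if message_id in box:          # redelivery replaces the old entry
                self._release(box[message_id])
            box[message_id] = digest

    def modify(self, user: str, message_id: str, new_data: bytes) -> None:
        # Copy-on-write: only this user's entry moves to the new blob;
        # other recipients keep pointing at the original.
        old = self.mailboxes[user][message_id]
        self.mailboxes[user][message_id] = self._put(new_data)
        self._release(old)
```

    Delivering one attachment to 1000 users leaves a single blob; after one user edits it there are two, with the original still referenced 999 times.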

    Sharing something between people (which for some reason database people call "single instance store", I've learned today) can be done in a filesystem as well as in a database. Databases are "one-size-fits-all" tools: not always the best solution, but one you have a good chance of making work even when it isn't the best. Linus said something similar when it was suggested he build Git on top of MySQL. If you really know what you're going to do with the data, and you KNOW that a filesystem is enough, why use a database? It's like buying a 900HP car for your mother: STUPID. "Let's do it just because we can" is a good first step if what you want is overengineered, bloated software.

    Because a filesystem IS a database, except that instead of a SQL-ish interface you have a "read(), write(), readdir()" kind of interface, which happens to be really fast (filesystems are implemented inside the kernel; they're reliable, much simpler, easy to manage, etc.).

    When you use a database like MySQL, you're just running a database on top of, uh, another database (the filesystem), which makes no sense. It WILL work, but that doesn't mean it's the best possible solution.

    Despite all this, BTW, hardlinks are NOT always the solution to the "share a file between 1000 users" problem. They can be, but remember that you can't make hardlinks between different filesystems. I have no idea whether you can use LVM to solve this, or whether ACLs + symbolic links could be used to implement it in a delivery agent. And if you can't (I don't really know), someone really should think about adding something to filesystems to allow it, like Plan 9 did, because it makes sense.
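    For the common case where mailboxes sit on one filesystem, a delivery agent can hard-link a spooled message into each mailbox and fall back to a real copy when the kernel refuses with `EXDEV` (cross-device link). This is a minimal Python sketch; the function name, spool path and flat mailbox-directory layout are hypothetical, and it does not answer the LVM/ACL questions raised above. Hard-linked entries share one inode, so a client that edits the file in place changes it for every recipient; a writer has to write a new file and rename it over its own entry instead.

```python
import errno
import os
import shutil


def deliver_single_instance(spool_path: str, mailbox_dirs, filename: str) -> dict:
    """Deliver one spooled message into many (hypothetical) mailbox directories.

    Hard-links where possible, so only one copy exists on disk; falls back to
    a real copy when a mailbox is on a different filesystem (EXDEV), because
    hard links cannot cross filesystem boundaries.
    Returns {mailbox_dir: "link" or "copy"}.
    """
    result = {}
    for mbox in mailbox_dirs:
        dest = os.path.join(mbox, filename)
        try:
            os.link(spool_path, dest)
            result[mbox] = "link"
        except OSError as exc:
            if exc.errno != errno.EXDEV:
                raise  # permission errors, existing files, etc. are real failures
            shutil.copy2(spool_path, dest)
            result[mbox] = "copy"
    return result
```

    With every mailbox on the spool's filesystem, all entries come back as "link" and the spool file's link count rises by one per recipient.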

  104. Re:Some info on how it works by TheLink · · Score: 2, Insightful

    "Ironically, Microsoft is developing WinFS which is supposed to be able to automatically hardlink files transparently, thus the filesystem will automatically support Instance Store for every application. This is actually a pretty neat feature!"

    Not if you really want a copy.

    For most normal users, disk space isn't a big problem. If it is, duplicate files aren't usually the cause of the problem.

    When I make a copy of a file, I don't want the O/S to just add a link to the same file.

    I want a frigging copy.

    If there's a bad sector or something goes wrong, the chances are higher that I can recover the data if I have a _real_ copy.

    I use a file system for storing data. If disk storage were such a big problem, Google etc. wouldn't be giving out GBs to users for _free_.

    I/O is a bigger problem. Disks store a lot more nowadays, but are not that much faster.

  105. Re:qmail-ldap can do it by JacobKreutzfeld · · Score: 2, Interesting

    I used qmail-ldap to build a service that has had zero downtime, planned or unplanned, in over a year. A handful of 1U servers offer SMTP(S), IMAP(S), POP(S), webmail, and local DNS and LDAP caches. They store mail on a back-end NetApp that all servers access via NFS. Accounts are added on one master LDAP server, which replicates to the cache slaves on each 1U server. I can add capacity to the NetApp and add servers to handle load with no downtime. The 1U servers sit behind a redundant pair of F5 load balancers.

    We were able to apply OS patches box-by-box, taking them out of service individually, but without any downtime to the service. Very nice.
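    The box-by-box patching described above amounts to a rolling update. A minimal Python sketch follows; `drain`, `patch`, `healthy` and `enable` are hypothetical hooks you would wire to your load balancer's management interface and OS tooling (nothing here is an F5 or qmail-ldap API), and `min_in_service` is an assumed capacity floor.

```python
from typing import Callable, List


def rolling_patch(servers: List[str],
                  drain: Callable[[str], None],
                  patch: Callable[[str], None],
                  healthy: Callable[[str], bool],
                  enable: Callable[[str], None],
                  min_in_service: int) -> None:
    """Patch servers one at a time so the pool never drops below min_in_service."""
    in_service = set(servers)
    for server in servers:
        # Refuse to start if pulling this box would leave too little capacity.
        if len(in_service) - 1 < min_in_service:
            raise RuntimeError(
                f"taking {server} out would leave fewer than {min_in_service} in service")
        drain(server)               # stop sending new connections to this box
        in_service.discard(server)
        patch(server)               # apply OS patches while it is out of rotation
        if not healthy(server):
            # Leave a failed box out of the pool and stop the rollout.
            raise RuntimeError(f"{server} failed its health check; leaving it out")
        enable(server)              # put it back behind the load balancer
        in_service.add(server)
```

    Because shared state lives on the NetApp and in replicated LDAP, any one front-end box can drop out like this without users noticing, which is what makes the one-at-a-time approach safe.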

    Others are using qmail-ldap for large ISPs, of the size you are asking about. Check out their mailing list.