Infrastructure for One Million Email Accounts?
cfsmp3 asks: "I have been asked to define the infrastructure for the email system of a huge company which, fed up with Exchange, wants to replace its entire system with something non-Microsoft. I have done this before, but not for anything of this scale. Suppose you are given a chance to build from scratch an email system that has to support around one million accounts. Some corporate, some personal, some free. POP, IMAP, webmail, etc. are requirements. The system must scale perfectly, 99.9% uptime is expected... where would you start?"
I'd start by submitting a question to Ask Slashdot.
n/t
Oh FP.
gmail.google.com
Support the First Amendment. Read at -1
On the OS front, I'm assuming that you'd be allowed to use the OS of your choice as part of the design. Is that correct?
Why not gmail?
With an Ask Slashdot Question?
"It's not like your minds are as open as the source you love..." - Me to the majority of Slashdot.
I would start by talking to Kerio; their mailserver is very scalable. www.kerio.com
bashing my head up against a desk.
Ooo man the floppy drive is broken. No wait. The computer is just upside down.
I'd start by contacting people who know how to do it and can actually help you. A few responses on slashdot aren't going to help you along the entire process. Maybe even bring in a consultant.
i believe that cyrus imap was designed specifically for large scalable systems. it can scale to multiple servers and uses a database for hashing the email... (afaik)
I've always favored it, and with some scripting/automation, I wouldn't see why you couldn't scale that large with inexpensive hardware.
I have a feeling you're not going to find the answer you're looking for, as the scale you're talking about is indeed beyond the scope of work that most of us work in.
For starters, uptime should usually be higher than 99.9% for this large a site. 99.9% uptime means 40-45 minutes of downtime a month. Try going for 99.99% at least, though this usually increases the cost by about 250% according to what I have seen a few years back.
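The downtime figure above is easy to check. A minimal sketch of the arithmetic, assuming a 30-day month and no excluded maintenance windows (both are assumptions, not something the poster specified):

```python
# Downtime budget implied by an availability target.
# Assumption: a 30-day month (43,200 minutes), nothing excluded for maintenance.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60


def downtime_minutes(availability_percent: float,
                     period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Return the minutes of downtime the target allows within the period."""
    return period_minutes * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    print(f"{target}% -> {downtime_minutes(target):.1f} minutes per 30-day month")
```

So 99.9% allows roughly 43 minutes a month, and 99.99% cuts that to a little over 4 minutes.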
+1 funny, -2 overrated. Life isn't fair.
take a look here: http://www.webhostingtalk.com/showthread.php?threadid=441925 .. the post by slidey is possibly the most useful.
Exchange!
I would have to say use Qmail on a FreeBSD/Linux system. If you look at Yahoo, they have millions of email accounts and use qmail, which is very stable and very portable.
Wow, that is a pretty huge scale, but Google, MSN and Yahoo have supported that many users, and many more, all along, so why not look in through the back door to see what they are doing? If it were me: Linux obviously, high-availability clusters, some kind of solid indexing. It's still email :)
A million users and they want POP3? Add a gun and a single bullet to your administration requirements.
At IBM we use Lotus Notes which has saved us LOTS of virus hassles. Every employee has an account and we're something like 320,000 worldwide. The mail "databases" are spread among Domino servers but I don't know what platform these run on, or what hardware specs they have. I imagine it's either Windows or Linux... but who knows, maybe we're using some of our PowerPC-based iSeries servers. These are the boxen formerly known as AS/400.
I'd have to go with whatever system MySpace uses. I can't believe that any system could sustain such a heavy flow of pointless "X updated!" emails to hundreds of millions of users...
'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
Specifically, the pain-killer aisle.
However, I'd personally ask Google. They've done it and even their search engine has information. From there I found an interesting link on Linux Journal detailing the deployment of a large hundred-thousand-user mail system, from the architecture to the software.
I've heard surprisingly good things about Communigate Pro, though I have no idea if it scales that high.
Mirapoint is probably _the_ vendor to speak to, though.
I think they want an actual company email, so the address reads john@company.com.
I'm sure other commercial vendors have it but I do know that large companies like ATT et al use it to handle their email. It's a shrinkwrap product that does it all and then some but it's very pricy.
I'm sure you could hack together something to do this much like what google did. Might take some time but it's totally doable.
2 years and no mod points. Join reddit. Because openness is good.
I'd call IBM, Red Hat or any other large vendor and ask them. If there are big $$$$ involved, which there probably are, the vendors will jump through hoops for you. You'll have your own little circus.
www.stalker.com
Is able to run clusters, and clusters of clusters, and theoretically scale into the hundreds of millions of accounts. Offers all the things you want, and more. LDAP, ACAP, etc, etc, integrated webmail. Intelligent directory creation structures, etc.
earthlink's mail server complex has come up on freebsd-isp a few times
this guy used to work at both sendmail and earthlink and he has links to some good resources
vodka, straight up, thank you!
Definitely beowulf cluster of "dead" BSD clusters...
No, but the answer is simple in two words....
Geographic Cluster
If that's too rich for ya, how about gmail invites? Slashdotters could come up with a million of those I bet.
Say hello to my little sig.
Lotus Domino is a viable alternative to Exchange, although it's probably not very popular with the /. crowd.
Whatever you do, I think the most important part is to think through the migration process. It's good you've already done it before, but 1 million people could mean a lot of angry phone calls.
Good Luck.
I'd start with talking to vendors. Consult with some sendmail gurus, Notes guys, etc. Any of these people/companies would salivate at the thought of being a part of a project this large. First, talk to the client and hammer out the real needs with solid performance requirements, timeframes, growth expectations (meaning real numbers), etc. Put together a well thought-out Request For Proposal and send it out to as many applicable vendors as interest you. Then just stand back and play the role of ringmaster. The vendors will give you all the ideas you need.
Just do one thing, please: make sure that the client is honest-to-goodness serious about this. I absolutely hate getting pie-in-the-sky RFPs from people who are just kicking the tires. It's a good way to burn bridges by not looking professional.
Entrepreneur : (noun), French for "unemployed"
If I was your boss and found out that your approach to architecting something with a large investment, high uptime demands and a large user base was to ask Slashdot, your arse would feel my boot, followed closely by the pavement. This sounds like a pretty poorly run place; if you need to ask Slashdot about something of this scale then you are far better off not touching it.
dunno, but ask slashdot is probably the worst choice...
Oracle E-mail Server. Oracle can easily handle your type of data volume and up time requirements.
This assumes that you'll have the hardware for it, of course.
I'll ask my boss to hire someone with a clue instead of me because going and asking ./ is all I could think of.
There are three parts to your system: sending mail, receiving mail, and storing mail. Keep them separate.
Your receivers will be a bank of servers running sendmail. They will do appropriate spam processing to reduce the amount of mail actually received. They feed the data into the storage servers.
The storage system has the data partitioned out so that all the data for one user goes to one server while all the data for another goes to a different one. The storage system also has to provide POP and IMAP access. You may want a special setup where the IMAP or POP service knows which server to go to. Investigate having one giant virtual filesystem so that the system isn't too complicated.
Your webmail access will use IMAP to access the actual mail. It can be a completely different system.
The sending system will be a chokepoint for all outgoing mail. You are going to scan it as it goes out to look for virus-sent emails or unauthorized messages. For instance, you may want marketing email to be processed differently than inter-office email and such.
All of these systems will be running sendmail. I know sendmail has a bad rap for being insecure, but the insecurities have been found and since fixed. It is by far the most manageable system when it comes to large-scale deployments with heavy customization.
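The per-user partitioning of the storage tier described above can be sketched as a deterministic mapping from mailbox name to server. This is only an illustration: the server names and the `storage_server_for` helper are hypothetical, and hashing is one possible way to do the mapping, not something the post prescribes.

```python
import hashlib

# Hypothetical backend names; a real deployment would load these from configuration.
STORAGE_SERVERS = ["store01", "store02", "store03", "store04"]


def storage_server_for(user: str, servers=STORAGE_SERVERS) -> str:
    """Map a mailbox name to exactly one storage server, deterministically."""
    # Normalise case so "John@..." and "john@..." land on the same server.
    digest = hashlib.sha256(user.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


print(storage_server_for("john@company.com"))
```

Plain modulo hashing reshuffles most users whenever the number of servers changes, so a site that expects to grow would more likely keep the user-to-server mapping in a directory that the POP/IMAP front end consults.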
The radical sect of Islam would either see you dead or "reverted" to Islam.
I agree. The Google appliance should implement Gmail and a web front end for administration. Like the Cobalt machines of yore, only better. Google-ified.
It really is the best email.
They know everything, they are the "uber" in ubersmart.
Or was it the "goober" in goobersmart?
anyways....
www.myrealbox.com is a tech demo of NetMail and eDirectory.
"[We'll be] really getting inside your head and making it an unpleasant place to be" -- Trent Reznor
they're probably using the groupware too. Are they also willing to ditch outlook?
If you're looking for a groupware replacement, then you've got a big job ahead of you. Scalix is a mess, Bynari is a hack, etc. When you do get them running, the things end users end up buying, like PDAs and apps that hook into Outlook, are going to cause more problems.
If it's just POP/IMAP you really can't go wrong. A good webmail option is kind of a catch. SquirrelMail is nice, but compared to OWA it's really out of its league.
If your post told us what they were fed up with and how they used their system you'd get some real advice. Expect the usual postfix vs qmail vs sendmail vs whoever mini-flamewars.
What if he IS the highly paid consultant?
Don't count out the possibility of Exchange. If set up properly it can be very powerful and scalable, and it is easier to administer than most Unix alternatives.
...they need to think about this very carefully.
I'm sure someone, somewhere within the enterprise is using features of Exchange that they won't get anywhere else. Not to sound like a Microsoft fan-boy sock puppet, but there are some features of Exchange that people in a business environment just love.
However, since you asked. I'd run Exim or Qmail and Cyrus IMAP.
My opinion, for what it's worth, is that you'll have a hard time meeting all those requirements perfectly in one product. You usually have to make a trade-off, because some systems are more scalable but may not provide great webmail, while others may have a great webmail interface but not manage free accounts as well. I think you have to find out which requirements are the most important for the email system and meet those, instead of looking for the perfect product. I doubt that you'll find it ...
GPL all the way!!!!
Linux of course. 2 machines. You can use wimax for the interweb.
Did I mention FSF only? Corrupt software sux.
LAMP has to be used, it is sooo much better. Mysql can scale to 1Billion users so its best bet.
Perl should be used for front end as it is fast.
Maybe ajax rendering, but need GPL component as AJax - GPL = SUX.
What are they fed up with? I think it would be easier to recommend something if we understood the current problems. I mean, Exchange is fairly awesome:
- MAPI is hard to beat as a mail client protocol
- The webmail client is quite good
- The integration of calendars and such
Isn't Exchange the benchmark everyone is trying to reach? Why go backwards? I know not everyone loves Microsoft, but Exchange is really good stuff.
Here's Slidey's post. (Disclaimer: Copyright blahblahblah appropriate people yadda yadda fair use etc etc don't sue me, thank you)
---
ok i work for a large uk isp in the messaging (email) operations dept. we currently have 2.5-3 million active accounts (and a load of suspended), and manage anywhere upto 12-16million mails per day
our setup is like this (this is simplistic though):
front line - anti abuse mta's - these do dnsbl type lookups (spamcop, spamhaus and sorbs). we have 9 incoming
next we have mta's. they farm mail off to brightmail servers, which do similar to spamassassin. we have 6 incoming mtas, and 8 brightmail servers (not enough - high load)
after that they farm off to vscans (6)
after that any mail that gets through is delivered to mail stores (8 + 2 hot spares)
what you want to be doing is similar to the above - chaining the mail from one level to the next. the first level should be the rbl's - these are less processor intensive, and can remove a fair whack of your mails in one swoop. spamassassin is going to be more cpu intensive, since it has to open each mail and read the first x many bytes
id have separate machine(s) holding your master directory, and if you can get directory caches then do that too (to take the load off the master directory) - ours run oracle
i dont know what your budget is, but split up the different tasks as much as possible. that way if you need to add more to any pool (rbl lookups, spamassassin etc) you just add another machine..
one last thing - we also have a separate box just for postmaster mail (with exim + spamassassin funnily enough) - it tends to get busy
Last edited by Slidey on 09-08-2005 at 11:19 PM
--
(end of quote)
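The chaining Slidey describes, cheap blocklist checks first and expensive content scanning later, can be sketched roughly as below. The DNSBL name construction (reversed IPv4 octets prepended to the list's zone) is the standard convention; the zone name, the in-memory blocklist and the two stage functions are hypothetical stand-ins for real DNS lookups and a real content scanner.

```python
import ipaddress
from typing import Callable, Iterable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup would query for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


# A stage returns a rejection reason, or None to pass the message along.
Stage = Callable[[dict], Optional[str]]


def run_pipeline(message: dict, stages: Iterable[Stage]) -> Optional[str]:
    """Run stages in order, cheapest first, stopping at the first rejection."""
    for stage in stages:
        reason = stage(message)
        if reason is not None:
            return reason
    return None


# Hypothetical stand-ins: a real first tier would resolve dnsbl_query_name()
# in DNS, and a real second tier would call a content scanner.
BLOCKED_IPS = {"192.0.2.1"}


def blocklist_stage(message: dict) -> Optional[str]:
    return "listed sender" if message["client_ip"] in BLOCKED_IPS else None


def content_stage(message: dict) -> Optional[str]:
    return "spam content" if "buy now" in message["body"].lower() else None


print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
print(run_pipeline({"client_ip": "192.0.2.1", "body": "hi"},
                   [blocklist_stage, content_stage]))
```

Because each tier is just another stage, adding capacity to one pool (more blocklist front ends, more scanners) does not disturb the others, which is the point of splitting the tasks up.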
There are companies that specialize in this, haven't followed them recently, so don't know who is still in business.
Check criticalpath.net, another possibility might be commtouch. I once dealt with these and other companies when looking to outsource the email portion of an internet service.
commercial sendmail on veritas clustered front end, fiberchannel storage on SAN for spools, probably with an ldap layer providing internal routing and backend for user profile data? -jms
http://www.zimbra.com/ The flash demos look nice anyway.
Done it. With Exchange, believe it or not. 2.5M seats, in a single Exchange/NT environment (not single server farm though - it was distributed...)
You haven't defined your real requirements, nor what 99.9% uptime really means. For such a large site, 99.9% uptime is generally defined in terms of full site responsiveness, outside of maintenance windows. Anything less is suicidal, and I'd walk away from it. Maintenance windows should more than cover your backup windows, planned upgrades, etc. This doesn't mean that you'll use each available window (say, Saturday night from 8pm to 4am or something), but it gives you a nice window for major planned events.
The cesspool just got a check and balance.
Novell's created an open-source mail server project called Hula that's based in part on their original NetMail codebase. Its aim is to provide a mail server that's easy to use and also scalable. Disclaimer: I haven't tried it, but have only heard about this.
Are we doing your homework for you? One would have to think that a company of such size in the real world would hire somebody who doesn't have to ask slashdot how to do his job.
I'm not sure that there is any commercial solution that can support 1 million email accounts well. Hence why Yahoo and Google have built their own custom systems. Some engineering may be required.
For pop3 & imap4rev1, look at:
http://www.dbmail.org/index.php?page=overview
Still need an MTA. I think qmail is the fastest and best, but I'd use exim, as it's easier.
Database - not sure if MySQL and PostgreSQL will scale with dbmail.
I'd say use FreeBSD, because of the ports collection (Don't linux Flame me). However, something like Solaris 10 x86 (or Solaris+Sun Hardware) might provide a bit better scaling, and HA hardware, SAN support, support in general, etc. Though, a bit tougher on the OSS software installs (In My Experience)
http://zimbra.com/ look at Zimbra
Try the hosted demo. These guys, and their work, are/is awesome.
disclaimer: I do not work for them, but it would be cool if I did.
http://www.jetcafe.org/~npc/book/sendmail/
A good book on sendmail performance tuning, although a lot of it covers the OS.
Then get The Practice of System and Network Administration.
http://www.everythingsysadmin.com/
I know it'll get blasted, and I thought it did suck originally, but I am surprised by its scalability and reliability.
It's not free, but it's not dependent on the Linux community either. There is a concentrated and very very dedicated support and development crew. Message store size can be up to 1/3 the size of Exchange, and moving servers around is a cinch.
I'm not a GroupWise admin or anything, but I have been an Exchange guy, and I feel your pain.
at the top of the pyramid - you have your mx servers/clusters... slap on your postfix+amavisd there to filter unwanted crap or cluster some barracudas... those would pass the wanted email to your routing servers/cluster at the middle of the pyramid. Then those would pass the email off to the pop3/imap servers (maybe one for each dept) at the bottom of the pyramid - enterprise-grade, connected to fibre-channel san ... Open source should do the trick if you have the $ to buy the hardware needed but not the software - I hear communigate pro (sp?) is nice if you are looking for something commercial. google for postfix or qmail for some nice howto's on the free stuff...
Using this as a reference point (and from recommendations I've heard)...
I recommend CommuniGate.
E-Mail Server Setup Advice?
As a VCP/LPIC (VMware Certified Professional, Linux Professional Institute Certified), I'd suggest considering a blade solution from IBM or Dell with VMware VIN (Virtual Infrastructure Node) installed to keep your server OS installations abstracted from the hardware. Use RHEL as your guest OS, which will run your specific software applications.
I'd be more than happy to consult on a large-scale VM installation.
Not just IMAP, but the whole shebang (MTA, webmail, POP3, IMAP, mailinglists, etc), plus you'd want OpenLDAP for storing all those passwords. I'm not sure how to set it up redundant and distributed, etc, but I'd wager that someone at the courier-mta website could point you in the right direction.
qmail is secure and scales wonderfully
:)
maybe first post??
Lotus is great. As a poor BOFH who normally admins Domino and who is now, due to the crap employment market, admining an Exchange setup, let's say I feel pain, lots of pain.
Domino - good security model, easy to implement, easy to keep secure, built like a truck and can take abuse, i.e. mail file sizes that would annihilate Exchange; Domino does not break into a sweat.
you will handle that many users without hassle
cheaper than exchange
less security hassles
simple and logical to set up
Call IBM; at the number of users you have, they will be selling their firstborns to get your business.
good luck and enjoy migrating from exchange
I find it hard to believe that Exchange Server supports a million accounts in any sort of configuration that wouldn't barf on itself every 30 seconds.
Just what is it this is replacing?
I would have to think that you want support for a setup like this. Your options realistically probably boil down to one choice.
You'll need a vendor with proven big time support and, unfortunately, OSS is not something you may be able to look at.*
With proven installations as large as 400,000 users in a single organization, your only choice is.....Lotus Domino. Pricey though.
* Wasn't Hotmail originally running on BSD? You may want to check its history.
I have several gmail accounts I can give you. Once you have several of these you can assign gmail accounts to the rest of your users. :)
It's a good mailserver ... a million accounts though ...
The higher the technology, the sharper that two-edged sword.
Might want to have a chat with Matt Simerson over at http://www.tnpi.biz/
http://www.tnpi.biz/internet/mail/toaster/intro/features.shtml
Good luck!
From my experience postfix scales the best for sending and receiving email. Use postfix + (mysql or ldap) + amavisd-new + clamav (or some proprietary alternative) + spamassassin. Cyrus is probably the best for pop and imap access. Squirrelmail for webmail.
1) It'll run on anything - Win32, Linux, BSD, Solaris, x86, XServers, Alphas, Power5
2) It'll scale as big as you can dream - over 5 million accounts with clustering
3) MAPI support
A scalable, open-source based email server that is particularly well suited if you have multiple domains etc. is Limacute, developed by Linpro, a Linux experts company in Norway. It is GPL and in use by at least one large mail-centric ISP.
There's also the Hula Project. It is based on Novell's NetMail. Novell used to claim that a single server easily could handle 100.000 users. The Hula project is adding calendar and other features.
I worked at a company that hosted mail for other companies. We had POP/IMAP/Webmail plus a bunch of other services.
Our secret to making it work? Qmail.
For an installation like this, Maildir format seems like a must to me. Plus, almost all of the free webmail clients support or require it.
Like others have said, you have to separate your storage, inbound mail, outbound mail and webmail services onto different hardware. Since we had many millions of mailboxes, we also had proxies in front of all of the client-facing servers to help minimize the impact of an individual server having issues. But these are all basic network design issues. The key is a secure, configurable MTA like Qmail that stores its mail in a friendly format that other apps can understand. De-couple everything and you should be able to scale up to AOL size, if need be.
"Don't blame me, I voted for Kodos!"
A million used to be a lot of accounts, but now it really isn't. The real questions are: how many mails will be sent and received every day, how many during the peak minute of the day, and how much long-term storage is needed. At [name of company removed] which hosted zillions of email accounts, we found many unexpected problems with the storage, such as NetApp being unhappy with hundreds of billions of tiny files, Solaris NFS being unable to deal with filenames longer than 32-characters, and so forth. But handling the front-end tasks of SMTP, POP, and IMAP for 1m accounts wasn't so difficult.
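The "peak minute of the day" question above can be answered from existing mail logs. A minimal sketch, assuming the delivery timestamps have already been parsed out of the logs into `datetime` objects (the sample values are made up for illustration):

```python
from collections import Counter
from datetime import datetime


def peak_minute(timestamps):
    """Return (minute, message_count) for the busiest minute in the input."""
    # Truncate every timestamp to the start of its minute, then count.
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    if not per_minute:
        raise ValueError("no timestamps given")
    return per_minute.most_common(1)[0]


# Made-up sample data: two messages in 09:00, one in 09:01.
sample = [
    datetime(2005, 9, 8, 9, 0, 5),
    datetime(2005, 9, 8, 9, 0, 40),
    datetime(2005, 9, 8, 9, 1, 2),
]
print(peak_minute(sample))
```

The same counter gives the daily total (the sum of all counts), so the ratio of peak to average load falls out of one pass over the log.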
If I were in your position I'd just call up HP and outsource the whole thing. If it was truly necessary to keep it in-house, I'd probably throw together three separate beefy machines: one to deal with the IMAP/POP clients, one to deal with the inbound queue, and one to deal with the outbound queue. Probably qmail or any other standard mailer would work fine. For the storage, you could use a small SAN with GFS.
My number one suggestion is hire someone who has built scalable mail systems, and written tons of code to support them: Matt Simerson
You can learn about him, and his mail projects at http://www.tnpi.biz/internet/mail/toaster.shtml
-Chris Knight
-- This sig is only a test. If this were a real sig it would say something witty. --
Openwave is definitely one way to go. Of course, I say this as someone who worked on the system, but nonetheless, it is designed to scale to millions of users across many hosts while still maintaining a single point of administrative interface. At the time I worked there, it was the ONLY mailserver that could scale that high OR offer a single interface for the entire cluster. That could be different now. Yes, it costs, but if you are supporting millions of users, the money is nothing compared to the costs of maintaining a cobbled together system. It is the email server used by many of the tier 1 ISP's and webmail systems.
Please let your superiors know so that someone more qualified can be hired to take your place.
...that is, someone who doesn't need to ask the motley masses of Slashdot for advice.
In other words, YOU'RE FIRED!
however, kudos to your comp for dropping Crapland Brand Software (aka MS)
I once installed a 13 million account mail system on a Linux infrastructure. As far as I know, it is still working today (I left that company).
The keys were:
- qmail (but postfix will work better nowadays)
- smtp (4 machines)
- pop/imap (4 machines)
- separate webmail: 1 machine is enough (2 for high availability)
- NDS (Netscape Directory Server) which is now owned by RedHat and opensourced.
Hope that helped.
My slides relevant to this discussion can be found at http://www.shub-internet.org/brad/papers/dihses/ and http://www.shub-internet.org/brad/papers/sistpni/.
And yes, Nick Christenson has been a long-time friend and co-author of mine.
Feel free to contact me directly if you want some referrals.
Brad Knowles
http://daily.daemonnews.org/ -- if you're not
Agreed.
QMail should be able to handle it fine, though he should of course expect to have the load distributed to quite a few machines.
I'd probably also set up a sizeable group of mail gateways on incoming mail, to filter the mass amounts of spam and viruses that a million email addresses are going to bring.
Ahh, you need AOL Mail. They have over 1 million [l]users :-P
Sun Microsystems JES (Messaging, LDAP, Calendar).
Problem solved.
No, not free. No, not open sourced. Great performance, full, robust, integrated enterprise level systems that can handle 1 million accounts like cake (I've dealt with JES/iPlanet deployments in the tens of millions of users).
How do I do my job again?
Thanks
-Hobotron
There is truth in humor.
I bet Google would be willing to sell you a solution.
I'd start seeing what universities near you use. They won't be as big, but a large school should have circa 100k accounts and a lot of the same issues you'll face. They may already describe their infrastructure somewhere on the web. And offering to take two or three of the mail guys out to lunch or dinner will get you a ton of the nitty-gritty details and smart questions to ask yourself (and vendors).
Then once you think you have a solution, budget plenty of time for extensive testing against simulated load. Make sure you simulate failures by, e.g., pulling plugs randomly. Buy the hardware and software *after* you're 100% sure it works, not before. And where possible, roll your solution out gradually, so that small problems don't turn into MCFs.
Contact IBM. A mainframe running z/VM is your solution here.
99.9% reliability is more than normal for those machines. It is modular enough to expand to whatever you may need in the future, and it has the data-processing horsepower to actually handle the 20k or so concurrent users at a time and the hard drive space to match that many users as well.
Run linux or unix on top of VM and you should be fine.
Product Page for Z990:
http://www-03.ibm.com/servers/eserver/zseries/z990/
That said, it's a beast of a system, not the easiest thing in the world to administrate by a long shot and Sun's commitment to further development seems a little "lacking" lately. It's also not especially cheap, but you should be able to negotiate some massive discounts on a deployment of that scale (well, what did you expect from Sun?). You should definitely also be thinking about getting a few people on Sun certification courses if you do go down that route.
UNIX? They're not even circumcised! Savages!
Since you are coming from Exchange, is it just the email that you are replacing? Is the customer expecting calendars, public folders, and all of the other knick-knacks in Exchange?
-- if you mod me down, I will become more powerful than you can possibly imagine
qmail-ldap is best suited to this task. Reasons:
1. You can sleep at night knowing that you're running the only MTA in widespread deployment that has never once had its security compromised; in fact, qmail's author Dan Bernstein still offers cash to the first one to be successful...
2. You can sleep at night knowing that the core MTA, qmail, has reliably handled some of the largest e-mail operations in the history of the internet. Its design is such that on a properly configured system, you'll never lose a single e-mail. Hotmail actually used qmail for a long time, even after Microsoft bought them - Microsoft repeatedly tried to replace it with Exchange, which kept buckling under the load.
3. Qmail is very modular, allowing you to pick and choose your components wisely.
4. Qmail uses the Maildir format its author pioneered. Maildir is NFS safe, not proprietary/complicated (often binary formats like PST are subject to corruption), etc.
5. LDAP makes it easy to manage massive amounts of accounts.
In any case... qmail-ldap is already running large sites with millions of users. Info:
http://www.qmail-ldap.org/wiki/Documentation
I've set one of these systems up on an IT cluster at my current office, and I must say that it is not only very robust but also really easy to manage.
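For readers who have not seen Maildir (point 4 above): each mailbox is a directory with `tmp`, `new` and `cur` subdirectories, and each message is a single file. Python's standard `mailbox` module can create one, which makes for a quick illustration. The addresses and the temporary path are placeholders, and this shows only the on-disk format, not qmail itself.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "Maildir")
    box = mailbox.Maildir(path, create=True)  # creates tmp/, new/ and cur/

    msg = EmailMessage()
    msg["From"] = "alice@example.com"   # placeholder addresses
    msg["To"] = "john@example.com"
    msg["Subject"] = "hello"
    msg.set_content("One message, one file.")

    # The message is written under tmp/ and then moved into new/,
    # so a reader never sees a half-written file.
    key = box.add(msg)

    layout = sorted(os.listdir(path))
    new_files = os.listdir(os.path.join(path, "new"))
    print(layout, len(new_files))
```

The write-then-rename delivery is what lets several delivery and access processes share a mailbox without lock files, which is the basis of the NFS-safety claim.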
We run 10 front end mx boxes that run postfix and deliver via lmtp to 10 lmtp servers that deliver the mail to netapps. lmtpd handles virus and spam filtering. Works like a charm. ipvsadm is a godsend.
(1) Plan a server setup which can handle the load. The requirements may change, but one million users is a fair bit. How many incoming and outgoing emails is that on average? Figure that out, using a network sniffer or sniffers on existing traffic if need be (although logs should work). Then use this to calculate the number of servers needed for an outgoing smtp farm and an incoming MX farm. Figure out how much storage space is to be provided per user, and then figure out how you want that storage space to be accessible. Probably your best bet is to have a round-robin DNS farm of imap/pop servers which proxy connections, based on the user's login, to a backend farm of actual mailservers responsible for storage. Plan the ability to move users from server to server to rebalance as needed. Outgoing smtp is a lot easier since you're not really storing things long term. Plan a web farm for webmail. (And pick software.) Don't forget to plan some sort of backup, and make sure your system is flexible as far as email retention goes; chances are the email retention policy will change at some point and your setup should be able to change with it.
(2) Test. For each server, hammer it. Test its load under as close to real world circumstances as you can. Then create unreal punishing loads and see how it handles it. Plan in advance for how your server farm handles something like virus-generated mass emails causing 1000% spikes in load.
(3) Using your testing results, spec out the actual hardware. RAID, cheap hardware, redundancy, etc. If you have control over the network choice, plan a location with multiple fiber trunks coming into the building and provider redundancy. Remember backhoes in concert? Don't get hit by that. Plan for server failures, drive failures, network failures, power failures, and security compromises.
(4) Deploy! If you did the rest right, this is the easy part. You'll have redundant network connections, HSRP, redundant switches, a proxy farm, an imap/pop farm the proxies connect to, an smtp farm for outgoing emails, and a web server farm for serving up webmail (depending on how you choose to architect the disk space, the web farm and the pop/imap farm may be one and the same; depends on how you set things up.)
Here's a starter link to a setup which is smaller but, in principle, fairly similar:
http://www.itd.umich.edu/umce/features/2004/cyrus.html
Finally, if you don't want to screw it up, ask someone who has done it before. Paying someone $300/hr for a 10-30 hour review of your plan is dirt cheap compared to horking the setup. Someone who has worked in huge email environments (a la, hotmail) could show you gotchas before they bite you. (If you need help figuring out who to ask, I could even point you to some of the appropriate people)
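Step (1) above has the proxies pick a backend from the user's login and leaves room to move users between servers. One way to picture that is an explicit lookup table, sketched below. The `MailboxDirectory` class and the backend names are hypothetical; in a real deployment this role would typically be played by a directory service, and moving a user also involves copying their mail, which is not shown.

```python
class MailboxDirectory:
    """Maps each login to the backend mail store that holds its mail."""

    def __init__(self, default_backend: str):
        self._default = default_backend
        self._location = {}  # login -> backend name

    def backend_for(self, login: str) -> str:
        """What a POP/IMAP proxy would ask before forwarding a connection."""
        return self._location.get(login, self._default)

    def move_user(self, login: str, new_backend: str) -> None:
        """Repoint a user at another backend (after their mail has been copied)."""
        self._location[login] = new_backend


directory = MailboxDirectory(default_backend="store01")
print(directory.backend_for("john"))   # store01
directory.move_user("john", "store02")
print(directory.backend_for("john"))   # store02
```

Because the proxies consult the table on every login, rebalancing is invisible to clients: they keep connecting to the same round-robin hostname.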
Slashdot should not be used to solve your homework assignments!
Gee whiz... I'm surprised that the groupware is getting tossed out. If as few as 20% of the users are accustomed to Outlook Calendaring, they'll represent 95% of the complaints in a new system. An advance warning to all existing accounts should be sent out (both paper and email) so that nothing falls through the cracks.
Now to the mega-infrastructure that I set up for an undisclosed company for under 50K (and also didn't want groupware).
1. Transport Sender (sendmail). That's right! Good ol' plain sendmail scales. It does require some pretty savvy tweaking, so get a Sendmail.com consultant onboard just for this. Use Sleepycat DB for speed for all sendmail setups. For one million accounts, I had about 23,000 transactions per minute during the day. You'll require 10 servers for this, for cushion (against some idiots sending an ISO attachment).
2. Payload receiver (sendmail). A second group of machines to handle the reception of SMTP payloads.
3. IMAP4S/POP3S - Hey what's with the "S"? Nothing like sending your user's password in the clear. Unless you enforce VLAN in your corporate environment and limit all IMAP4/POP3 to VLAN, the "S" is a mandatory security feature, inside and outside. Guess what "S" stands for?
4. Webmail - SquirrelMail - Yet another dedicated server (to which I had to add two more load-balanced servers to handle the growing pains). Use https for login only.
5. AntiVirus (ClamAV) - It was the best back then; now it's just running in the middle of the pack. sendmail has milter, which allows extensions such as MIMEDefang, wilter, rureal (reverse-DNS check), SpamAssassin, and SPF.
6. Support - Half the effort goes into the web pages that 'hand-hold' newbies through reconfiguring their machines. Worth the effort if you have over 20 expert PC users who can do their own boxen. Otherwise do it yourself at each PC. These pages should cover Thunderbird, Evolution, as well as Outlook and Outlook Express.
7. Learn to spin 11 plates, one on each pole. Keep them spinning... If they start to drop and break, bring in some more Unix dudes.
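To put the "23,000 transactions per minute on 10 servers" figure from the list above in perspective, here is a rough sizing sketch in Python. The function names, the per-server capacity parameter and the spike factor are my own assumptions for illustration; measure your own boxes before trusting any of these numbers.

```python
import math


def per_server_rate(transactions_per_minute: float, servers: int,
                    failed: int = 0) -> float:
    """Messages per second that each surviving server has to absorb."""
    alive = servers - failed
    if alive <= 0:
        raise ValueError("no servers left to carry the load")
    return transactions_per_minute / 60 / alive


def servers_needed(transactions_per_minute: float,
                   per_server_capacity: float,
                   spike_factor: float = 1.0,
                   spares: int = 1) -> int:
    """Servers required for a peak load, plus spare machines.

    per_server_capacity is in messages per second and is a hypothetical,
    measured-by-you value; spike_factor multiplies the normal load.
    """
    peak_per_second = transactions_per_minute / 60 * spike_factor
    return math.ceil(peak_per_second / per_server_capacity) + spares
```

With 23,000 transactions per minute spread over 10 machines, each one sees about 38 messages per second, which is where the "cushion" comes from.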
Postfix + Cyrus IMAP and Cyrus POP3. Separate your systems out (MTA v. Final Delivery). Use Cyrus Murder (as in a murder of crows, or a cluster for us normal people.)
Back it up with LDAP for all the joyful goodness it bears (authentication, address books, etc.) If you want stronger authentication, add in Kerberos V.
I'm sure others will suggest all-in-one packages, but most of the ones I have seen are really some combination of Postfix or Sendmail combined with OpenLDAP and Cyrus, anyway.
Take your time to think about load-balancing, storage, and test, test, test!
It'll be a couple weeks of work (assuming you already have hardware and networking and storage gear), but you'll likely end up with a bulletproof mail system.
I used to think printing on Unix sucked. Then I figured it out. Printing on Unix *does* suck. Like a Kirby.
use zombie machines
This is the best advice he'll get? Sheesh.
Think this through -- a lot of e-mail programs check every 20 minutes. Assuming I actually hit any without duplications, I could potentially need 400 minutes or over six hours to get all my mail. Since it's random, it could take days.
And that's just for starters with this lame scheme. If I want to check mail, say, from the field on a dial-up once a day... hopefully you can see how badly this would suck.
What the guy should do is buy an e-mail system that can handle 1,000,000 users and not screw around trying to chewing gum his own solution.
Sometimes it's best to just let stupid people be stupid.
If my email system designer were satisfied with almost nine hours of downtime per year, I'd find a new designer.
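For anyone checking the math on these uptime numbers, a quick Python sketch (the function name is mine, and it assumes a 365-day year):

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year allowed by a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for target in (99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime allows about {hours:.2f} hours per year "
          f"({hours * 60 / 12:.1f} minutes per month)")
```

99.9% works out to roughly 8.76 hours a year, which is the "almost nine hours" above.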
You get to spend hours each day playing FreeCell while you are waiting for Lotus to open a single email. I've gotten quite good.
You're an idiot.
Hire someone who has done something along that scale
I hear poaching MS staff is all the rage these days.
What about fusemail?
This is exactly the kind of service they offer.
Anyone have experience with them?
Nothing is inexplicable; only unexplained -Tom Baker, Doctor Who
Earthlink uses that in part, or did when I worked there. However, instead of getting a random chunk of your email, all the servers connect to the same file servers so that you get all your mail no matter what machine you hook into. Done right, it's just as fast and you don't have to worry about missing a time-sensitive email because you never logged into the one and only server it's on.
Good, inexpensive web hosting
Chances are you're not going to be just turning off those Exchange servers; you're going to need to migrate the data. That being the case, you're going to want something with good migration tools that can handle that much migration in a relatively short amount of time. I just completed an Exchange to Groupwise migration and there are some really great migration tools out there for it. Groupwise also meets all your requirements out of the box. Not to mention that by buying Novell you're (at least indirectly) supporting open source. I'm not as sure about Lotus Notes, but regardless, if you're going to have that many users, you want big name vendor support.
At best, you have to consider how your face will look with a chair shaped dent in it. At worst, he may bury you, then start throwing chairs at your grave. Either way, you should probably just stick with his shitty Exchange until he requires you to replace it with MMail, just like GMail, but ... better, more ... expensive.
1 million accounts doesn't really begin to explain what kind of solution you need. How many data centers do you have, what is your existing infrastructure, what kind of support will you be expected to provide to end users and to administrators, do you have special security needs to address (HIPAA, etc.), and so on...
You really need to determine what your needs are for the immediate future, then figure in growth, before you can start thinking about which solution or set of solutions (more likely I would think) is appropriate.
I could of course go into far greater detail, but then I would be looking for a piece of the action...
Good luck on finding what you need.
You misspelled it. The correct phrase is "Groupwise does suck." Majorly. At work, the FLAIM databases are always corrupt and in need of repair; the client is slower than molasses; the server is always crashing and must continually be rebooted. Oh, I'm sure Exchange is probably worse, but don't fall for Novell's worthless claims that Groupwise is a capable mail-handling platform!
I was told by a friend of a friend that this is what a lot of ISPs use, so we gave it a try. We downloaded the fully functional demo, installed, ported users over and started using it in about 30 minutes. Spent another couple hours customizing the web front end (which I'm sure could have been done much faster by anyone with a little graphics talent).
Good luck, don't forget to reply yourself on what you choose.
I am 100% serious.
I am very small, utmostly microscopic.
Split everything.
- Incoming MX's (exim)
  - Spam checking
  - sender verification
  - greylisting
  - route mail to the IMAP stores
- IMAP stores (exim plus maildir + dovecot)
  - break out people on different servers (a = imap-1, b = imap-2, etc..) trivial to do w/ exim and dovecot, also break out the maildirs by letters (/var/maildir/f/foo/)
- LDAP/Mysql
  - some kind of directory to store username, passwords, which imap store, etc.. on.
- outgoing MX's (postfix)
  - postfix queue handling can't be beat
  - smtp auth for users outgoing mail.
- IMAP proxy's (perdition)
  - http://www.vergenet.net/linux/perdition/
- U
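A minimal Python sketch of the "break people out by letter" idea from the list above. The store names follow the "a = imap-1, b = imap-2" pattern and the /var/maildir/f/foo/ layout from the comment; the catch-all store for names that don't start with a-z is my own assumption, and in practice the mapping would live in the LDAP/MySQL directory rather than in code.

```python
import string

# Letter -> store, following the "a = imap-1, b = imap-2" pattern.
STORES = {
    letter: f"imap-{index}"
    for index, letter in enumerate(string.ascii_lowercase, start=1)
}
DEFAULT_STORE = "imap-misc"  # assumption: catch-all for names not starting a-z


def imap_store_for(username: str) -> str:
    """Pick the IMAP store that holds this user's mail."""
    first = username[:1].lower()
    return STORES.get(first, DEFAULT_STORE)


def maildir_for(username: str, root: str = "/var/maildir") -> str:
    """Build a per-letter maildir path such as /var/maildir/f/foo/."""
    name = username.lower()
    if not name:
        raise ValueError("username must not be empty")
    return f"{root}/{name[0]}/{name}/"
```

A static split by first letter will be lopsided (lots of "s", few "x"), so treat this as the simplest possible starting point, not a balanced scheme.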
Look at postfix + mysql
http://www.sweeney.demon.co.uk/pfix_imap_virtual.html
Mostly, U will need a cluster for everything.
If you are looking for an all-around open source setup, start with this link. Later, to use LVS, the tool for making load-balancing clusters, go here:
http://www.linuxvirtualserver.org/
And if you are really looking for open source with cheap software costs (not very cheap TCO), you can also build your OWN SAN with ATA over Ethernet:
http://sourceforge.net/projects/aoetools/
And for webmail, a useful but also light interface:
http://www.squirrelmail.org/
With all the licence cost savings, you can Invest a lot of time, and have a fair amount of flexibility.
Sendmail inc, has high availability solutions:
www.sendmail.com
Also, you can spend a lot of money and buy a very big IBM machine with lots and lots of Lotus Notes licenses. With that kind of money spent, you can have IBM at your feet if a lawyer makes a good contract.
Also, to complete the solution you can setup nagios and mrtg for monitoring.
http://www.nagios.org/
http://people.ee.ethz.ch/~oetiker/webtools/mrtg/
I think, to set up the whole thing, U will need about 50 good servers (maybe u can try IBM OpenPower with virtualization, it IS a RISC CPU), and like.. humm.. a month of technical tests...
The mysql backend will give you centralized administration, LVS will provide scalability and good servers will give you uptime...
And if you like, you can even build Linux routers using Sangoma hardware:
http://wwww.sangoma.com/
Everything can be done with Linux by now... The question is how much responsibility you want to have regarding the stability and overall functionality of the solution.
IBM, HP, RedHat, SuSe, and ANY Linux consulting firm would be interested in having you as a success story.
Good Luck, and May the Source be With You
If it's a large company, they have money. If they want software with a corporate vendor behind it, look no further than Vircom's ModusMail software. It can authenticate against a wide variety of sources (AD, SQL, Radius, LDAP, etc.). The user mailboxes can be stored on a SAN array to deploy multiple front-end servers to increase uptime. Supports POP/IMAP/webmail, etc. EXCELLENT spam and virus filtering built in with automatic updates every 15 minutes. Admin is Windows GUI app or web-based. Cost is higher than you would pay for a FOSS solution, but uptime and ease of management make it a great option to look at. I don't work for Vircom, but I am a satisfied customer. See: http://www.vircom.com/
It's used by many of the largest and most successful companies across the world. With good reason - it works!
Now keep in mind this is a gateway solution: It screens incoming and outgoing mail for spam and viruses plus a whole lot more.
You still need a solution for storing the mail for 1 million accounts, but Ironmail will interface with any of them.
www.ciphertrust.com
They'll be looking for jobs soon enough, and they're as qualified for this as they are for their current jobs.
(he opined with karma to burn...)
"Win treats sysadmins better than users. Mac treats users better than sysadmins. Linux treats everyone like sysadmins."
postfix and dovecot-1.0alpha (the alpha label is a misnomer, it's been very stable for me for a long time).
make sure you have high quality hardware, and don't forget to secure the system. *make* everyone use tls/ssl and smtp-auth. have your users use port 587 (and 465 for those that need it), so you don't get blocked by all of the port 25 blocks (which exist just about everywhere).
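From the client side, "tls/ssl and smtp-auth on port 587" looks roughly like this with Python's standard smtplib. This is only a sketch: the host name, user and password are placeholders, and the `smtp_factory` parameter is something I added so the function can be exercised without a real server.

```python
import smtplib
import ssl
from email.message import EmailMessage


def send_via_submission(host: str, user: str, password: str,
                        msg: EmailMessage, port: int = 587,
                        smtp_factory=smtplib.SMTP) -> None:
    """Send msg over the submission port using STARTTLS and SMTP AUTH."""
    context = ssl.create_default_context()  # verifies the server certificate
    with smtp_factory(host, port) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)  # upgrade to TLS before sending credentials
        smtp.ehlo()                     # re-identify over the encrypted channel
        smtp.login(user, password)      # SMTP AUTH
        smtp.send_message(msg)
```

For the clients that need port 465, the connection is TLS from the first byte, so `smtplib.SMTP_SSL` would be used and the `starttls()` step dropped.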
I don't quite follow the round robin solution.
Wouldn't that mean that I'd have to press send/receive 20 times and there could still be a chance that some of my mail is left on a missed server?
--> Insert Funny Sig Here
Does anyone have any ideas on what one would do if they had users who depend on a Blackberry? I'm sure that if you have that many users it is quite possible that some of them already rely pretty heavily on them.
AFAIK (I could be wrong) there doesn't seem to be any effort by Research In Motion to include sendmail (or equiv.) support for their Enterprise Server product.. Not to mention real-time calendaring and contacts synchronization..
....move along....nothing to see here....
OK, having done this before, here are a few tips. Separate everything: inbound MTAs, outbound MTAs, web, IMAP, datastores and filtering. To scale, you need servers with very different requirements at every stage. Glue everything together with LDAP or SQL; just remember it needs to be dynamic. Assume that you will need to seamlessly move users from one data store to another on a regular basis and expand partitions on the fly (LVM is your friend here). The ability to alter the data flow on the fly is a must for performing maintenance etc. SANs can be your friend here, as they make moving data around easy, but they can also up the cost (10k gets you a nice server with 4-5TB of disk; 10k doesn't buy squat for SAN gear). If I could do it all again, I would look at iSCSI along with a cluster FS rather than IMAP/POP proxies/NFS.
As for the software side, go with what you know unless it's just incapable of doing the job. I like sendmail, the next guy likes qmail (programmers like it: lots of easy SQL hooks), but overall, having the techs be knowledgeable in it matters the most. For some cool hardware bits that can speed things up, look at solid state (RAM, not flash) disks for temp spools, as just about everything besides MS respects the committed-to-disk requirement for SMTP, and that's a big performance issue.
No sir I dont like it.
divide et impera
like any project, the best way to start is still thinking about dividing the original problem into small chunks. setup some rough timeline and find the correct person for each task to take over.
one big point would be all kind of administration and distribution of settings. maybe, LDAP will be the account storage of choice, but maybe, there are some other reasons against it (e.g. the need for a possibility to bulk change user properties fast and easily).
another big point is mail storage, affecting all kind of clustering you think of. huge and redundant NFS boxes are open for changes, but other storage solutions may be faster,....)
or your network design? your smtp hosts will receive tons of mail every day, should they be the same as the ones doing virus/spam checks and bouncing of non-existent accounts? it's not very useful to scan inbound mails you don't have a "local" recipient for. atm, at work, i use a setup where the first 2 smtp hosts (load balanced, but could be via dns too) just check against ldap whether the account exists + some minor checks, then relay internally to the next step (this way, there are free resources for larger worm and virus waves). step 2 is virus scanning and spam detection (just adding headers, bigger machines, the cpu intensive part). finally, step 3 is the correct delivery system that stores the message itself in the user's mailbox.
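the three steps above can be sketched in a few lines of python. everything here is a toy stand-in: the account set replaces the ldap lookup, the keyword check replaces real virus/spam scanners, and the dict replaces the mail store. the point is only the ordering, i.e. unknown recipients are dropped before any expensive scanning happens.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    recipient: str
    body: str
    headers: dict = field(default_factory=dict)


# Stand-in for the LDAP lookup; a real edge host would query the directory.
KNOWN_ACCOUNTS = {"alice@example.com", "bob@example.com"}


def edge_accept(msg: Message) -> bool:
    """Step 1: cheap check on the edge SMTP hosts; refuse unknown recipients."""
    return msg.recipient.lower() in KNOWN_ACCOUNTS


def scan(msg: Message) -> Message:
    """Step 2: the CPU-heavy part; it only adds headers, it never drops mail."""
    suspicious = "viagra" in msg.body.lower()  # toy rule, not a real scanner
    msg.headers["X-Spam-Flag"] = "YES" if suspicious else "NO"
    return msg


def deliver(msg: Message, mailboxes: dict) -> None:
    """Step 3: store the message in the recipient's mailbox."""
    mailboxes.setdefault(msg.recipient.lower(), []).append(msg)


def handle_inbound(msg: Message, mailboxes: dict) -> bool:
    """Run one message through all three steps; False means refused at the edge."""
    if not edge_accept(msg):
        return False  # never reaches the scanners
    deliver(scan(msg), mailboxes)
    return True
```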
at the same time, networking issues including firewall, internal interfaces and administrative access should be part of this considerations.
the easy setup combination point: after some research, the best for the last delivery and user access i found was qmail-ldap and courier. in your case, that depends on your requirement specification. comparatively easy questions like the mailbox format of choice should also be solved by this group.
security has to be an issue for all parts. how are you planning to secure your outer smtp? are there requirements that people from other networks (e.g. at home) use your smtp for sending mails (not using webmail, that'd be easier)?
monitoring & backups - 99.9% is a hard task, find the correct way of monitoring parts where you expect problems.
and in the end, the group where you will mostly spend your time: side effects of unclear specification.
If you are in charge of setting up an e-mail system for 1 million users and you are asking Slashdot for advice then you are in way over your head.
Tell them to use Google gmail and then start working on your resume.
Start by listing the requirements. You've got some protocols, uptime, and number of accounts, but you need more.
Storage requirements? Simultaneous usage? Likely peak load times? Account management? Backups? Disaster Recovery? etc...
The mail part is actually fairly trivial in comparison to building the necessary underlying infrastructure.
Computer Hacking Skills, Bow Staff skills, Bike Jumping skills.........
If you are interested in commercial packages, either Sun's Java System Messaging Server or Openwave's Mx product will easily scale to a million accounts and beyond. Many of the larger ISPs are using these packages or have their own custom mail server. Other possibilities may be Mirapoint (who offers an appliance type solution) or Sendmail.com
If you are into benchmarks, the folks at SPEC have published results from several packages.
Suggestions like using a public email service such as Gmail are just plain silly. Asking on Slashdot is almost as silly.
No offense, but if you have to ask here, you are probably in over your head on this job. I would seriously consider securing the assistance of someone who has done this before.
There are many highly scalable and non-ms servers available... examples: www.tnsoft.com, netwinsite.com, www.kerio.com, etc...
BUT the question is...
- what features do you require?
- why is the client fed up with Exchange? (which happens to be one of the premier and among the most ubiquitous corporate email servers)
- what OS will this email system operate on?
- What is your budget?
The CEO, Julia Hanna Farris, has 20 years of experience working on messaging systems for Bell, then for Lotus Notes and then in a few other start ups, and she is a babe as well. There is an interview with her over at IT Conversations that you might want to listen to.
With the paid for edition you get all the features of exchange without the cost and without the security risks of running Windows servers.
It's a stupid question for two reasons:
1) Instead of saying 'No _________' (insert evil monopoly here), you need to ask what Exchange isn't doing. What specifically is wrong with Exchange, other than that its parent has the initials B.G.? Any management that would start a requirement by casting a curse on a particular company is just asking for trouble. It's a no-win situation, because in two years, if Sun ONE, sendmail, whatever bombs on you, where will you go? Part of IS's duty in an organization is to help management ask better questions.
2) Why are they putting this on one person? I have several friends who work for a large corporation in town that runs a large installation of e-mail, and no single person has responsibility over the entire system; there's probably a staff of 30+ that runs their e-mail (and that's just a guess).
Anyone who asks one person to take on a project of that scope is asking not for an answer but for a scapegoat.
Shop your resume.
--pete
Wow. You've got no idea just what it would take to do this do you? Or you're being extremely funny.
1. users should be in a db.
2. imap servers should be their own cluster
3. pop servers should be their own cluster
4. smtp servers should be their own cluster
5. spam filtering should be its own cluster
6. round robin DNS should be ditched in favor of hardware load balancing.
kashani
- Why is the ninja... so deadly?
How much money are you willing to spend?
Do you need to keep the existing functionality of Exchange (ie: calendaring)?
The latter will limit your choices more than the former.
Funny, the first answer that came to mind was "Solaris." Sun gets a bad rap here, but really, their hardware is pretty good.
Is your organization decentralized? Be sure to think about administration and infrastructure issues associated with your particular topography. Ideally you'd want to keep your current processes, and slide another product underneath it.
Anyhow, be sure to phase it in. Run a pilot in one section. Better yet, run multiple pilots with multiple vendors if you can manage it.
Insist that the vendors do the pilot for free. If possible try and run within your existing infrastructure somehow, with live users. Find a department or two that's willing to volunteer, and use them as part of the test.
Vendors won't lie to you, but their data is obviously biased. Be sure to understand exactly what you're trying to figure out from each vendor. Keep metrics now, and while you're doing the pilot, so you have hard data as part of the pilot.
If you have requirements, be sure to share them with the vendors when they design their pilot. List -everything-, including every feature in webmail, every administrative capability that you'd like, etc. Be specific. "Supports IMAP" isn't good enough. "Supports the entire IMAP feature suite" is better. Listing every IMAP feature that you want is the best. It's not overkill, because inevitably the feature you need will be implemented in a later release and you'll be screwed.
The idea isn't to beat one vendor with the requirements. Nobody will be able to do everything. The idea is that at least you know what each product/suite can do and what it can't, so you can adjust the expectations.
Be sure that the people in the food chain forward all external entities inquiring about the project back down to you. Vendors will attempt to affect the requirements by talking to your boss, your boss' boss, etc. Make sure that everyone knows that you are in charge, and that you are the point of contact.
Hmmm. Understand what your infrastructure is now. You can't replace it if you don't know what it is. Understand as much as you can about it, so you can spew statistics out the wazoo at meetings. If you know, for example, that 8% of everyone checks their email after business hours, you'll overpower everyone else in the meeting with your obvious technical Mojo.
Be nice, but firm. Save being a dick for vendors. Your job is to keep your internal customers happy. The Sales Engineers are not your friends, they're there to make sure you buy their product.
You can yell at sales people. It is a negotiating tactic, and can be very effective. Don't yell at the SEs, because they don't have any real authority.
And whatever you do, don't change everything at once!
Lastly, keep your boss (and your boss' boss) in the loop as to your progress. When the s**t hits the fan, they have to back you up. They can't do that if they have no idea what's happening.
Good luck!
I would like to shoot every person who modded this crap up as Informative.
It should be Funny, or modded down to oblivion.
Do away with email. Everyone then gets a hot secretary. When you want to send a message to someone, she memorizes it and then runs over to that person's cube and repeats it. If the message is too long, she can write it on her clothing and just drop it off for the other employee.
It definitely saves money on equipment, and I know I'd be a much happier person when I got an email.
My job is building systems like this. Current mailserver system I designed and built is hosting 80,000 email accounts, and will scale out to a million quite cheaply by just adding more machines.
OpenLDAP
You need a central configuration repository to store the email accounts, their passwords, etc. OpenLDAP is perfect for this, and you can replicate it out for scalability. Be prepared to learn about LDAP schemas.
Exim
Use Exim because it has a simple process model (a single binary that does all the work, like sendmail) but has a human readable configuration file and has to be the most flexible MTA out there. You will have customers with weird requirements sometimes, and Exim will be able to meet those. Plus, it has Exiscan-ACL built-in these days, which allows you to do virus scanning and spam scanning at the DATA stage, before the mail is actually accepted by the MTA. It means you can make the sending MTA deal with the bounces if the mail is a virus or is obvious spam.
Courier-IMAP for POP3 and IMAP access.
Yeah, it's written by a sociopath, but nothing else works as well in the field. It works out of the box with sensible LDAP schemas and is fast, reliable and secure. Handles SSL, all the different authentication methods, what have you. Maildir compatible.
Maildir message store.
Store the mail in maildirs. Don't put them in /maildirs/domain.com/user/Maildir - split the domains up with a 2 level deep hashing algorithm (if you're virtual hosting domains, which is what it sounds like to me), so make it something like /maildirs/xx/xx/domain.com/user/Maildir, where xx/xx might be something like 3f/6b (depending on the hash). Use MD4 for the hash because it's more balanced than MD5.
NFS mount the maildirs from a fast NFS device like a Netapp. Netapps are recommended because you can plug them in, and they just work, plus they are easy to scale by adding more trays.
Linux NFS servers set up with heartbeat and shared disk also make a nice HA NFS, and would be cost effective, but you'll have to buy an array anyway (probably fiber channel) so it might be better to just get something that's completely integrated like the Netapp.
Spamassassin.
Can be configured to scan mail at DATA time in the SMTP conversation. A LOT of configuration work here to make it play nice on a massively scaled platform, but it can be done. Mostly it needs to have things like the auto whitelisting and Bayesian filtering turned off, as the extra DB file work is a bit excessive.
Actually, I'm sure there is a way to make it work with a less resource intensive repository, but using the standard SA rules seems to work well for my environment. *shrug*
ClamAV.
Free antivirus, it works, and integrates well with Exiscan-ACL. Set it up to scan via the daemon, and configure it to update every couple of hours from cron, and bob's your uncle.
Scaling out
Make every box the same. Make every box an MTA, a POP3/IMAP server, etc. Use something like Kickstart to automate builds so that you can build a machine in 10 minutes, and all you have to do is configure the IP address and plug it in. If you want to be REALLY sexy, you could make the machines boot off the network, and mount / from a shared NFS area, and make /var/spool/exim the internal mirrored disks. DHCP them, then all you do is plug a machine in and set it to PXE boot. Pretty trivial to do.
Load balancing
Hardware load balancers are pretty much a necessity. Don't touch cisco stuff. It's not very good. Go with Foundry Networks ServerIrons. The XLs can handle 1 billion requests/day if you configure them in Direct Server Return mode (also known as DSR/Foundry switchback). Use it. It makes all the return traffic go directly out to the net, meaning your ServerIrons have to switch less traffic and track fewer sessions. I would recommend however for a million users a pair of the ServerIron 450GTs, or bigger. Maybe one per VIP/Service.
Now, if this is all looking pretty daunting, you could always hire me to build it for you :)
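The two-level hashed maildir layout described in this comment (/maildirs/xx/xx/domain.com/user/Maildir) could be computed along these lines. This is a sketch, not the poster's code: the function name is mine, hashing the domain name is my reading of "split the domains up", and I've used MD5 as a stand-in because MD4 is frequently unavailable in current Python/OpenSSL builds. Swap in whatever hash your platform provides.

```python
import hashlib


def hashed_maildir(domain: str, user: str, root: str = "/maildirs") -> str:
    """Return root/xx/xx/domain/user/Maildir, where xx/xx comes from a hash."""
    domain = domain.lower()
    # MD5 is a stand-in here; the comment above recommends MD4.
    digest = hashlib.md5(domain.encode("utf-8")).hexdigest()
    level1, level2 = digest[0:2], digest[2:4]
    return f"{root}/{level1}/{level2}/{domain}/{user}/Maildir"
```

Two hex characters per level gives 256 x 256 buckets, so even a very large number of hosted domains leaves each directory with a manageable number of entries.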
cant you just buy any one of a number of
email appliances that can do this? why does
everyone insist on building from scratch?
also, the number of boxes doesnt really matter
so much as the actual volume of messages.
your solution would be ten times easier if you bought a load balancer such as a cisco css or bigip, and stored mail centrally.
For 1 million accounts, Post Traumatic Stress Disorder seems more likely.
That's why I'm pretty sure the aforementioned post was a rather good troll. Either that, or he has a $100/day crack habit.
What the guy should do is buy an e-mail system that can handle 1,000,000 users and not screw around trying to chewing gum his own solution.
Think that's retarded? Think about the idiot company that trusted the design of their 1M-account server to some clown that's never set one up before, and thinks slashdot is the best place to start looking. They're absolutely fucked, and they deserve all of it.
First you need to think about your storage. Storage will be the key to the uptime, scalability, etc. Email is inherently tough on storage, just like OLTP work, so whatever solution you use has to be able to handle high loads of small, random I/O. Once you get your storage needs determined (how big will each mailbox be? will there be size restrictions on attachments? what will likely be the read:write ratio?) you'll have to decide on a system to run it on. I'm not familiar enough with non-Microsoft solutions to make a suggestion, but for that many users it will be interesting to see what you decide.
My personal opinion, though, is you have to start with your storage systems. Everything relies on that in email solutions.
I'd start by firing your incompetent arse.
ummm, whats the budget on this project?
... Is anyone wondering what's going on at Microsoft right now?
It starts with a slashdot geek working in the email department spitting up his coffee, followed by a few rumors which make it up to a guy in accounting and customer service, followed by frantic management emails, including some inappropriate language, from Steve and Bill. Then a few good geeks start tracing who this cfsmp3 guy is and try to trace him to a company while the salesreps begin coldcalling any customers running around 1 million customers.
And Microsoft will botch it because they have no experience in kowtowing and bootlicking, which are important skills for any company that wants to humbly keep its customers.
"All great wisdom is contained in .signature files"
I recently listened to an interesting IT Conversations show about Scalix, a Linux-based e-mail solution that can handle large volumes. http://www.itconversations.com/shows/detail654.html/
http://www.scalix.com/products/index.html/
Bryan
Don't get me wrong. Notes isn't just a crappy E-mail client. It's also a crappy database access client that provides user-definable forms which can be used to populate rows in the database. When you start getting a LOT of rows, the performance really goes to shit unless you replicate the database down to your local hard drive.
Rather than the Notes based solution, I would suggest an old 386 running BSD and Sendmail. That'd save you a lot of pain in the long run, versus dealing with Notes.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
For that amount of users I would look into CommuniGate Pro from Stalker software. I have trouble recommending it as much as I used to since their price increases but for as large as what you're doing it would be rock solid.
It is very easy to cluster, does all the standard IMAP/POP/SMTP/Webmail along with some more uncommon features in an email server, Radius, LDAP, SIP, and MAPI (for an extra fee which can let Outlook connect to it as if it were Exchange with full Groupware functionality).
I know I sound like a sales guy for Stalker, but I have been very happy with their mail server for a few years now.
First, you need to start by drafting real requirements. What do you need exactly? Antispam? Antivirus? Try to have it fill up at least a page.
Once you have that done, you can start looking at solutions. You will have two parts to your solution:
1) The DMZ email relays (possibly including other antispam/antivirus functions) You really need high availability here.
2) Your email storage and retrieval systems. These may be a little more tolerant to downtime on an individual basis. But if you need to have redundancy here, there are ways to do it.
I think Hotmail did fine with BSD and Qmail.* I am sure Postfix is equally capable.
* Although Qmail itself has never had a security vulnerability discovered, you should be careful. TCPRules (on which qmail relies) has a vulnerability that can lead to root access for local users. This is not a problem on systems with no local users, however. I am not aware of any patch for the TCPRules vulnerability.
LedgerSMB: Open source Accounting/ERP
That said, it does have healthy clustering support. That was the only thing that made Domino tolerable for us as a mail server.
They really have to do something about all the panics and task shutdowns that Domino suffers, no matter what the equipment. There's something screwy in there somewhere - something with buffer handling or whatever, pointers getting mangled. It's probably bad databases ultimately but, after you run Lotus' consistency checks nightly on close to a terabyte of databases (spread over 10 servers) you'd expect valid data, right?
The need for something like Cassetica's NotesMedic to restart your client after a crash is kind of lame too. That should have been fixed ages ago.
I have to say that Exchange is better, sadly. I hate Exchange, but it suffers from none of these issues.
HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
Seriously, if someone asked me to do that, I'd be on the phone with IBM and Novell immediately. I can almost guarantee you they'd be more than happy to help and your job would be safe. Don't try to do this yourself...
quick 15 minute brainfart:
in order to increase reliability, you want to adopt a clustered design - if a machine or two fail, nothing should happen to the service.
in order for all the machines to be able to find the user preferences/passwords/etc, you'll want some sort of common storage for them. it could be on a shared filesystem, in ldap, mysql, etc. ldap is common and a good choice (it has very fast read/query performance) - make sure you use replication so an ldap server failure doesn't take you down (or better yet, a multi-master setup). if you use ldap or sql, make sure you are indexing correctly on the data you most commonly pull up.
in order for all the machines to access the user's mail, you'll want some sort of shared message storage. a shared filesystem is easiest, you could choose from nfs, redhat gfs, veritas cluster fs, etc. if you use nfs, make sure the nfs server can failover to a backup system if the nfs master dies (netapps are great for this).
rather than using round-robin dns, i'd invest in a load balancer. there are some free options for bsd and linux, but the commercial products are very nice and easy to use. f5 labs bigips are very nice, cisco CSSes are garbage.
other suggestions about breaking the services into different groups are spot on. personally, i'd have 3-4 inbound smtp servers inside a loadbalanced pool that handled inbound mail and passed the messages to virus and spam scanning services before delivering them to the shared message store (your load might dictate you need more servers, but if you design right you can just add more as time goes on). i'd probably put pop3 and imap services on those hosts as well, and possibly only allow pop3s and imaps (the ssl encrypted variants).
i'd also have a set of outbound mail servers that users would connect to to relay outbound mail. they would require smtp auth, and possibly only allow connections on smtps ports. spam/virus scanning would be performed before the message was accepted by the server, so users would get immediate feedback if their message didn't go through. the outbounds would not do any local delivery, so they would not mount the shared message store (you'll get proper bounces for all invalid mail addresses this way, instead of smtp rejections for invalid email addresses in local domains).
i'd have another set of servers that did virus and spam scanning for both the inbound and outbound smtp servers. you'd want these machines to have faster cpus than the rest, as virus and spam scanning are usually quite cpu intensive. again, if your load increased (or was more than you had anticipated), the system is easy to grow just by adding more machines.
another set of servers would handle the shared filesystem (if nfs, or gfs exported via gnbd), and possibly also the shared preferences store (ldap).
the final set of servers would handle webmail.
each set of servers should be firewalled from the others (especially the webmail servers, which are probably the most vulnerable to attack), with only the necessary allowed traffic going through.
qmail and postfix can easily read ldap, i'm sure sendmail can also (as can commercial solutions). anything will work for the smtp daemon.
since you are supporting pop3 users, maildir is a better choice than mbox for your message stores. courier or cyrus would be a good choice, and come with pop3, imap, and MDA (message delivery agent) components.
i'd have the inbounds accept mail from remote sources immediately (assuming the user being delivered to was valid) and have them hand off the message to an MDA, which would perform spam scanning, virus checking, and any user filtering configured before delivering the message to the user's mailstore. (scanning after the message is accepted uses more resources, but grants you more flexibility - users can have their own spamassassin settings, or you can add any number of filtration steps).
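a rough sketch of that accept-then-filter flow, in python. the two scanner functions are placeholders i made up (a real setup would hand the message to something like spamassassin and clamav), and the folder names and rule format are assumptions too. checking for viruses before spam is just one possible ordering.

```python
from email.message import EmailMessage

def has_virus(msg):
    # Placeholder check: a real MDA would pass the message to a virus scanner.
    return any((part.get_filename("") or "").lower().endswith(".exe")
               for part in msg.walk())

def looks_like_spam(msg):
    # Placeholder check: a real MDA would ask a spam scanner for a score.
    return "viagra" in (msg["Subject"] or "").lower()

def choose_destination(msg, user_rules=()):
    """Decide where an already-accepted message goes.

    user_rules is a sequence of (predicate, folder) pairs, standing in for
    whatever per-user filtering the site supports.
    """
    if has_virus(msg):
        return "quarantine"
    if looks_like_spam(msg):
        return "Junk"
    for predicate, folder in user_rules:
        if predicate(msg):
            return folder
    return "INBOX"
```

because the message was already accepted at smtp time, every branch here ends in a folder or a quarantine rather than an smtp rejection.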
for virus scanning, check out ClamAV. for spam scanning, look at spamassassin (
find someone with mail experience. Large scale mail experience. They're out there.
/. does not count.
I'm giving that advice 'cos I've been to the 1m mark, and its not as simple as it sounds. You need clue, and you need experience, and
Hire someone (with clue and relevant background) who has the knowledge before you make ANY decisions. After that its easy.
Sendmail is fine
Sorry, lost you right about here. Sendmail is unacceptable crap. I have these high-volume servers (tens of thousands of users) and sendmail literally locked up trying to process the queue (yes I'd split the queues and optimized sendmail with what I thought were good settings). When it DID work, it left cruft in the queue directory that left me wondering "did it even deliver those messages..what the hell?"
Postfix has been working a LOT nicer. I also use qmail a lot in "set and forget" situations. Sendmail is junk. Microsoft could've made it better.
I can't even imagine 1,000,000 users on sendmail, even if it is spread over 20 servers. *shiver*
Assign usernames and passwords to all users and create all the accounts on every single machine (more on this later).
You do have a feel for how many users 1,000,000 is, don't you?
Easy, you set up a round robin DNS on mail.DOMAIN.com. This way whenever a user checks their mail, they'll randomly end up on a different mail server, therefore collecting more of their mail.
Are you making this shit up as you go along? Have you ever dealt with real users? A MILLION of them?
Them: "How come my important file hasn't arrived in my inbox yet? My client sent it ten minutes ago!"
"Just check your mail twenty times, it should show up after about 10 tries. kthxbye!"
Yeah, that'll go over well.
Better idea, partition the users over the 20 machines and put a gateway in front that proxies them to the correct machine based on their username or some other criteria. You can write a FAST POP proxy (probably IMAP too, never tried though) in Python using event-driven techniques.
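A minimal sketch of the "which machine owns this user" step such a proxy would need, assuming a fixed list of 20 backends. The hostnames are made up, and hashing the username is only one possible criterion:

```python
import hashlib

# Hypothetical backend names; the only given is that there are 20 machines.
BACKENDS = [f"mailstore{n:02d}.example.internal" for n in range(20)]

def backend_for(username, backends=BACKENDS):
    """Map a username to exactly one backend, the same one every time."""
    key = username.strip().lower().encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

if __name__ == "__main__":
    for user in ("alice", "bob", "carol"):
        print(user, "->", backend_for(user))
```

Note that a plain modulo moves most users to a different machine when the backend count changes, so a real deployment would more likely store the user-to-server mapping in the directory and only use something like this for initial placement.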
Set up all the accounts there and write a Perl script which logs into all the other boxes on POP3 for every account, then puts the messages into the folders on the IMAP server. Get this script to run (with crontab) every minute.
Okay, okay, I get it. You're just fucking with this guy, right? You don't REALLY do it this way????
Good luck!
Yeah you'll need it...
hire someone who doesn't have to ask slashdot how to do their job.
The Kruger Dunning explains most post on
Check them out, they are doing what SUN destroyed. www.raqport.com They have a nice email appliance...
My firm has recently consulted for an email service provider that handles mail for about ten million end user accounts. Until recently, they were running everything through a large and growing bank of content filtering servers. As traffic has increased, the load on their filtering machines has increased exponentially, as have the storage requirements for their anti spam quarantine system.
Whatever you do, please add some kind of throttling-style connection control in front of the content filtering systems to limit the rate at which spammers can connect to your content filters. With content filtering and blacklists alone, you will only get about 95% of the spam and your infrastructure costs will know no limit. Add connection control and you can get the last five percent under control while also significantly reducing the amount of mail that ends up wasting time and space in the quarantine.
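To make the idea concrete, here is a minimal per-IP token bucket in Python. It is only a sketch of rate-based connection control in general, not a description of any product mentioned here; the class name, the default rate and burst values, and the in-memory dictionary are all assumptions.

```python
import time

class ConnectionThrottle:
    """Per-IP token bucket: allow `burst` connections at once, refill `rate` per second."""

    def __init__(self, rate=1.0, burst=5, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock      # injectable so the behaviour can be tested
        self.buckets = {}       # ip -> (tokens_left, time_of_last_update)

    def allow(self, ip):
        """Return True if this connection may proceed to the content filters."""
        now = self.clock()
        tokens, last = self.buckets.get(ip, (float(self.burst), now))
        # Refill for the time elapsed since the last attempt, capped at burst.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A front-end would call `allow(peer_ip)` on each new SMTP connection and defer or drop the ones that return False. The dictionary here grows without bound, so anything long-running would also need to expire idle entries.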
My company sells a traffic shaping connection control system. Fancier appliance-based options such as the Symantec 8160 are also available if you have large amounts to spend and a propensity to go with the big name.
To my knowledge something like this is not yet available in open source -- probably because it has only recently made sense for large mail receivers such as your client.
Our home page: http://www.mailchannels.com/
Am I the only person in the world who's actually gotten a message saying gmail is down. This happens to me every few weeks and usually lasts a minute or two. Not a big deal, but certainly not going to meet your reliability requirements.
Guess there is a reason they are still in beta.
This article:
www.linuxjournal.com/article/7323
may be helpful to you. It's not on quite the same scale, but it may be helpful. I know quite a few universities and companies run enormous Cyrus clusters with LDAP and a good UNIX MTA.
You might also want to ask on the info-cyrus mailing list.
The place to start is at the beginning - step one. What are the requirements? Without those, any product being touted is pointless. Really, you need to consider things like:
- Mail storage capacity per user
- Geographical diversity - where are the users
- Functions - are calendaring et al. required, or is it just email?
- What anticipated growth is there for the system?
- What business continuity requirements are there? Is it acceptable for a single site to be out, or is absolutely everything mission critical?
That's a start. Clearly, the outcome of these questions will help you determine what the business requirements are for the system, and from there you can build an RFP and start talking to vendors to determine the most cost-effective option that meets your requirements. Suggesting any particular solution at this stage is academic.
PS: To all the posters that said "look at gmail!" - gmail has an entirely different purpose than a corporate email system!
Hiring someone smarter than you. "Must scale perfectly" BWAHAHAHAHAHA go away.
Resign. You're obviously in way over your head if you have to resort to asking Slashdot readers for advice like this.
Last year, my company migrated from Exchange to Lotus Notes/Domino. (Domino is the name of the server product, Notes is the name of the client). Migration of data from Exchange to Notes was fairly easy. The best thing was the server runs on Linux, Solaris, and all IBM mainframes. Of course, Windows 2000/2003 Server are supported, but I prefer not to purchase CALs. For 50 users, I think we spent $3000. That included webmail, pop3 server, imap server, everything. I really don't think Notes gets the credit it deserves sometimes. Did I mention that it doubles as a development environment, where you can build applications fairly easily? Good luck!
You may find this interesting and useful: Argentina.com: A Case Study. They supposedly built a large-scale, low-cost email system with high reliability at a company with less than 15 employees.
"Open Source?" - Press any key to continue
Well, on the subject of what not to use, avoid Lotus Domino & Notes as well. Take your favorite horror story involving Exchange and substitute Domino for Exchange and Notes for Outlook and that's what it's like. Only Outlook is a much better mail client.
There are dozens of perfectly good mail servers out there. The more features they have the more likely you are to have problems. It's a pretty simple equation.
And if all else fails, you can write your own. I've written one, it's not very difficult (hacked it out in C# in a weekend). It's a very simple plain text protocol. But I wouldn't run the company on something I wrote in C# in a weekend. I don't even use it myself anymore. I'm running Exchange now for my personal mail server as that's what we run at work.
bance.net
Hi Cliff;
:)
;)
Sounds like a fantastic design opportunity here. The 5% of the project that is Enterprise architecture is what I enjoy the most as well. I'm assuming money probably isn't an object in terms of how much gear and bandwidth you may have to feed to this.
I'm happy to let my fingers type away below, I'd love to keep in touch and see how you end up shaping this system. my email is allowmx at hotm...
Before I ask, are there actually a million accounts? Or is that just a ceiling that you have to show proof of concept with?
I've only implemented up until about 250,000 accounts of any kind, as I'm sure you're probably aware, the base transactional resource costing is essentially the same..
For me, I would look at this for sure from at least these two angles:
1) knowing your transactional costs (how much of your hard resources, i.e. bandwidth, cpu and disk space, each type of transaction in your system will take). I mostly use this approach to get not an exact number, but an idea of magnitude, and detail where it happens on its own, to make sure the proper attention is applied to them.
2) Failsafe intelligence & capacity in the infrastructure, as well as failsafe intelligence & capacity at the application layer. You have to know that your hardware, software, os, business logic and applications are all monitorable internally and externally, both for availability and for actual "can I use it". Transactional logs, etc., so that information is available when the inevitable problems come up.
Also, having a capacity for as many of these layers to be self-healing, and fungible to the point that your service delivery is homogenous in as many ways possible. If your network finds something doesnt work or route, with mail, you can find another way to route it. Having a transactional manager of some kind, direct or not, could be useful in this case depending on what the client wants.
99.9% uptime equates to about 526 minutes, or 8.76 hours, you _could_ be down each year. That's about 44 minutes a month.
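Availability figures like this are easy to get wrong by a factor of ten, so here is a small Python check. It assumes a 365-day year and an even split across twelve months; the function name is just for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

def allowed_downtime_minutes(uptime_percent, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in a period at the given uptime percentage."""
    return period_minutes * (100.0 - uptime_percent) / 100.0

if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        per_year = allowed_downtime_minutes(target)
        print(f"{target}% uptime: {per_year:.1f} min/year, "
              f"{per_year / 60:.2f} hours/year, {per_year / 12:.1f} min/month")
```

Each extra nine divides the budget by ten, which is why the step from 99.9% to 99.99% changes the design so much.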
Based on that, having flexible, redundant tools set up in a high-availability arrangement at their respective operating capacities is key. I'm not sure if your current Exchange problems are being aided by not enough equipment, bandwidth, or other stability issues, so I'll just assume that it's all of them.
I apologize if anyone else has already mentioned some of this, but here's some of what I've found to help me where email has become as crucial to a business as their cell phone.
On the hardware level:
- STORAGE: Everything goes on a SAN, if not more than one. Don't waste your time with anything less.
- SERVERS: All servers have redundant hot swappable parts, at the very least power and hard drives. I'd even suggest making the servers iSCSI bootable so they can boot off the backbone. Beyond this, I like to buy my servers in piles of identical ones. Have 1-2 spare servers of each kind sitting there, ready to throw hot swap drives into from a failed server. That way if a server dies, you can address the power supplies, or get the HDs in that machine into another identical server and get it up and running while you diagnose the hardware problem independently. My approach to any kind of problem is FIX, DETECT and REPAIR. Get it up and running, find out what was wrong, make sure it's fixed for good. Too many of us stop at the first too
The idea I have in mind is a smaller scale of a google beige box army. linux/bsd offer so many more transactions for each piece of hardware, so that works very much in your favor. Obviously something enterprise grade to satisfy the client, such as the Compaq/HP Proliants, etc. (I feel these servers have the best overall support, manageability and information tools, and their openlinux drivers interface wonderfully with open source operating systems.)
Networking/Communication level:
- Entire mail processing architechture communi
I ran Cyrus-imap on a production server for a .com back in the day. It had .5 million accounts on the box with around 100-200 simultaneous web users hitting the daemon constantly. This was back in cyrus-imap 1.6 days. Cyrus performed very well except for logins. This was due to a flatfile that no longer exists in the 2.x release. Cyrus is probably the fastest most scalable opensource imapd/pop server out there.
If you don't mind a commercial solution, I can't imagine anything more scalable than CommuniGate from Stalker Technologies.
Create a mail system for 5,000 users. Create 200 subdomains. Run a forwarder which sends mail from the main address to the appropriate subdomain.
Boss: Damn email server is down again
ThisClown: Too bad we don't use an open source solution we set up ourselves... it would always work AND we would save money.
Boss: Really?
ThisClown: Sure, open source is great (blah blah boring open source diatribe)
Boss: sounds great, make it happen
ThisClown: [drops brick realizing his boss was taking him seriously] uhhhh sure, but I have these other priorities...
Boss: Never mind them, make us an email system that can handle [holds pinky to mouth] 1 MILLION users..
ThisClown: Goes to computer and asks slashdot what to do.
I like OS, but you know how some people can go on and on about it. Especially people who don't really understand the magnitude of a task they're being given.
The Kruger Dunning explains most post on
Show me on the doll where his noodly appendage touched you.
A req analysis for sure is the start, with a point by point list of functions.... we currently run 29 mtas with 16 storage racks. qmail definitely figures in your design. others vary. Have fun!
with the backbone.
load balancers, fiber channel, and shared storage
look into a clustering file system on a NAS or SAN
store your mailboxes there, I recommend looking at coraid.com
otherwise just daisychain a set of load balanced networks together
smtpd->spam/AntiVirus->mailman/postoffice delivery to SAN/NAS->POP3/IMAP/Web client->spam/antivirus->smtpd
each a separate network, load balancers in between, separate your receiving smtpd daemon network from your sending smtpd daemon network, and your antispam and antivirus networks should include more than one product
You could give Oracle Collaboration Suite a try.
One million email accounts is quite a lot. You're getting into the big league ISP category with something like this. It's not a one person operation to put something like this together. You're going to need a substantial number of well trained people to do this. There are only a couple of players in the field at this level. Sun's JES Messaging system owns a sizeable chunk of the market, followed by OpenWave and a small gaggle of fly-by-nights with unproven track records.
Some of the larger email systems however are homegrown using open source parts. Yahoo and Google immediately come to mind, and they do work quite well. But you probably don't have the resources that they do to engineer & test something like this. Yahoo is rumored to have more than 200 people working on email alone.
Sun has a deployment like this canned, sitting on a shelf in Santa Clara. Tell them what you need, write a check, and they'll show up with the kit. 99.999% uptime if you write a big enough check. Make them throw in the Waveset stuff.
Of course, everyone should note that recommendation is coming from an IBM employee.
Sorry, but Lotus Notes sucks; it's an abomination in almost every way. It's bloated, slow, buggy and has what is arguably the worst user interface ever (The User Interface Hall Of Shame said they could have based their entire site on this one app!) Sure, it does group meeting notes and can let you check other people's calendars but it falls flat as an email system. If it can't do the basics, who cares about the "advanced" features.
Doubt me? Okay. Let's try a little experiment.
First, sort your inbox by subject. Oh, I forgot. YOU CAN'T. Well, let me take that back. You can if you simply follow these simple instructions...
First, you need to have Domino Designer installed. In Designer, open Folders in left pane, then open folder $Inbox, highlight the Subject column. In the window with Columns properties in second tab you can check-in the "Click on column header to sort..." checkbox. Close $Inbox folder window. To prevent design refresh, in Folders view, right-click on $Inbox folder, choose Design properties and on third tab check-in "Prohibit design refresh or replace to change".
[blinks eyes in disbelief]
Un. Fucking. Believable.
Oh, and the feature I like the best is the pop-up dialog that tells you you have new mail. So you click to make that go away, switch over to LN to read the new mail and it's not there... Oh, yes, that's right, you have to press F9 to actually download the email to your client, even after being notified by an obnoxious popup that you have new mail.
Want to know another neat little feature related to that F9 key? According to our LN System Admin, get a few dozen people to all press and hold the F9 key for a few seconds at the same time and you can crash the Domino server backend requiring the server to reboot. Nice.
I could go on but I think I've made my point. I have never, ever, encountered anyone who has switched from Notes and been pleased with the change.
Oh come on !!
Just get yourself one of those ultra fast AMD servers and put exchange on it with a 1,000,000 user license. I have been told by countless AMD fans that the new AMD processor is XX times faster than anything on the planet.
I think 1 server will do it for you, but you might want to raid the server just in case!
Slashdot taught me how to use the preview button!
If you are expecting a million users you should expect to put some cash into this. Develop the majority of it from scratch, get up to date on RFCs and how different web mail services and servers vary from them (this is honestly the hardest part).
First, develop an incoming server that is specific to your aims, have it read from your user database (the only thing you should use any database server for in this instance is a user database), and have it dump the raw message to files in some sort of predetermined directory structure (find out the particular limitations for your OS, and test them, you will regret it if you don't). Don't do anything with the incoming mail server more than verifying the user and dumping the raw information.
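One common way to build a "predetermined directory structure" is to fan users out by a hash prefix so no single directory ends up with a million entries. A minimal Python sketch follows; the spool root, the two-level layout and the function name are assumptions, and usernames are assumed to be validated already (no slashes and so on).

```python
import hashlib
from pathlib import Path

def spool_dir(root, username, levels=2, width=2):
    """Return root/ab/cd/username, where 'abcd...' is a hash of the username."""
    name = username.strip().lower()
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # Take `levels` chunks of `width` hex characters as nested directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return Path(root).joinpath(*parts, name)

if __name__ == "__main__":
    print(spool_dir("/var/spool/bigmail", "alice"))
```

With two levels of two hex characters there are 65,536 leaf buckets, so a million users averages out to roughly fifteen user directories per bucket. Whether that suits a given filesystem is exactly the kind of limit worth testing.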
Next, you write the POP server, the web interface, and if you are going to have an internal mail client then you write that (look into IMAP for ideas on protocol). Have only these things do any sort of indexing of messages, and when they do make them write to a universally (among your software) readable index file so you don't do any work more than you have to.
Whatever you do don't use a database to handle your mail. It may work fast initially but I can promise you with any regularly available database that you will start getting very slow with more than 200000 users.
Use sendmail for your outgoing mail servers. Have dedicated servers compiled with only what is needed to function as a mail server. Rotate what smtp servers your mail clients use and expect to have a whole lot of mail that is going to wrong addresses and mail servers that have really slow connections, so have a few fallback servers that have much larger timeouts that the primary servers send the mail to once they are unable to send it themselves. That way the majority of the mail gets out fast.
You can honestly write all of this from scratch in a few months. Writing it from scratch gives you the functionality required for your situation. Look at bug databases for similar programs and make sure your software doesn't have similar exploits. Learn from the mistakes of others. Don't use windows for any servers (not trying to be biased) except maybe on your RAID arrays. Read the writeup (don't have the link) about the issues that the hotmail team had when converting hotmail to windows.
Remember, a lot of people don't check their mail frequently, don't waste CPU time on something that is not immediately necessary.
---------
Swearing is the crutch of inarticulate mother fuckers.
A couple of V880's can handle 1,000,000 email users easy.
Qmail is best. Preferably on a FreeBSD server. So hard to kill it in any way.
Get a server with RAIDed SCSI disks preferably hot-pluggable. Install FreeBSD, Qmail and other packages you might need as you go.
Ideally keep the emails in a Maildir format.
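For anyone who hasn't touched Maildir: each message is its own file, written into tmp/ and then moved into new/, so there is no big mailbox file to lock. A minimal sketch using Python's standard mailbox module (the function name and addresses are just for illustration):

```python
import mailbox
from email.message import EmailMessage

def deliver(maildir_path, sender, recipient, subject, body):
    """Store one message in a Maildir, creating the Maildir if needed."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # create=True makes the tmp/, new/ and cur/ subdirectories on first use.
    box = mailbox.Maildir(maildir_path, create=True)
    try:
        return box.add(msg)  # returns the key of the stored message
    finally:
        box.close()

if __name__ == "__main__":
    key = deliver("/tmp/demo-Maildir", "a@example.com", "b@example.com",
                  "test", "hello")
    print("stored as", key)
```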
I dont know where the Novell idea came from.
"Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
You probably want a FallbackMX host (or a bank of
them) so backed-up outbound queues don't interfere with normal outbound processing.
The FallbackMX hosts can use a file system optimized for directories with lots of files in them (and can of course themselves be tuned as the parent poster suggested.)
Perhaps you should look at hanging vanilla SMTP and IMAP servers off Netscape Directory Server (now Fedora Directory Server)? I think it supports multimaster replication and the code is OSS so if it doesn't work you can get inside and adapt it.
I'd shorten the conversation to this:
ThisClown: Hmm.. what stupid question can I post to Slashdot to get me on the front page.
[Light bulb goes on over his head]
ThisClown: Aha! I can ask how to make an email system to support a large number of users! Yes!
[Posts to Slashdot]
Editor: Shit, I've gotta do 15 minutes of work or my fellow "editors" will know I've been slacking off all week. What's in the queue? Hmm... slightly technical question about email systems. Well, I'm not too familiar with this whole "email" thing, but what the hell, seems like it would be good for some page hits.
[Posts to Front Page]
How we know is more important than what we know.
I'd start by seeing what the big ISPs are using.
That's a matter of doing an mx lookup, telneting to one of their gateways on port 25, and seeing if you can infer from their banners what mail system that they are running (for the inbound smtp gateways, anyway-- since there's nothing to prevent them from layering different products). Look to mailing list archives for messages sent from the various domains, and see what the headers tell you about their outbound mail path.
Example: Inbound Comcast HSI:
$ dig comcast.net mx
;; ANSWER SECTION:
comcast.net. 250 IN MX 5 gateway-r.comcast.net.
comcast.net. 250 IN MX 5 gateway-r.comcast.net.
$ nc -vv smtp.comcast.net 25
Connection to smtp.comcast.net 25 port [tcp/smtp] succeeded!
220 comcast.net - Maillennium ESMTP/MULTIBOX sccrmhc14 #274
So, they use something claiming to be 'Maillennium'.
If you do this for AOL, you'll see some weird-looking, probably custom AOL gateway. Earthlink says something like:
'ESMTP EarthLink SMTP Server', AT&T WorldNet is also Maillennium, Verizon.net declares 'MailPass SMTP server v1.2.0', and so on.
If you really wish to probe to see if this is opensource-ish stuff with obfuscated banners, you can try fingerprinting them using smtpscan <http://www.greyhats.org/outils/smtpscan/> to find out whether it's really just Postfix or Sendmail hiding behind that custom 220 banner. Actually, the smtpscan fingerprint file is an interesting read all by itself...
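If you'd rather script the banner check than do it by hand, something like this Python sketch would do. The marker list is just the names quoted above plus Postfix and Sendmail. It is nowhere near a real fingerprint database like the one smtpscan ships, and banners can of course be changed by the admin.

```python
import socket

# Illustrative substrings only, taken from the banners quoted in this post.
MARKERS = ("Maillennium", "EarthLink", "MailPass", "Postfix", "Sendmail")

def read_banner(host, port=25, timeout=10.0):
    """Connect to an SMTP port and return the first greeting line (or '')."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = sock.recv(1024)
    lines = data.decode("ascii", errors="replace").splitlines()
    return lines[0] if lines else ""

def guess_software(banner):
    """Return the first known marker found in a 220 greeting, else None."""
    if not banner.startswith("220"):
        return None
    lowered = banner.lower()
    for marker in MARKERS:
        if marker.lower() in lowered:
            return marker
    return None

if __name__ == "__main__":
    sample = "220 comcast.net - Maillennium ESMTP/MULTIBOX sccrmhc14 #274"
    print(guess_software(sample))
```

To check a live gateway, pass the result of `read_banner(mx_host)` to `guess_software`, using a host from the MX lookup.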
That's over EIGHT HOURS of downtime a year! Shit, Exchange should be fine for that job! I doubt any other webserver could have downtime even in the vicinity of what you're talking.
I'm not turning to stone at the sight of her, but Julie Farris also does not come anywhere close to my definition of a "babe".
Maybe she cleans up nice, but based on everything Google Images can dig up... not a babe.
Could you be a bit more specific on the following items?
5) Breaks well-known and understood UNIX standards.
Which standards are these? Are you talking about the errno fiasco?
6) Security through lack-of-functionality.
What sort of functionality is provided by, say, postfix, that qmail simply won't do?
7) Not really secure despite the claims.
How's that? Do you have $500? If not, what's the security vulnerability that the author refuses to acknowledge?
Which of these problems that you enumerate are not addressed by netqmail?
--grendel drago
Laws do not persuade just because they threaten. --Seneca
There's been some mindblowing stuff on it recently re: corporate memory and technology that I really enjoyed reading - the guy is kinda funny as well. He's really looking at things in an abstract/philosophical way. He said this morning that Google's been visiting too.
Ever heard of TAI/MAI/NAI?
Technology augmented intelligence, machine augmented intelligence, network augmented intelligence?
PC2?
Me neither, before I found this blog a couple of days ago. This guy is talking like he invented them, and it sounds pretty interesting. I want to see what he says next.
http://nrg78.com/
I command thee!
hahahaha... that's some funny shit.
Postfix+Dovecot+Squirrelmail=TheWin
...and tell them they're in danger of losing your company's business.
Ballmer will be on a plane to your location a couple hours later, and he'll have his negotiation hat on when he shows up at your door. You'll get a serious discount on upgrading your entire organization to Exchange 2003, and The Powers That Be at your company will take it, since migrating from Exchange would have been a painful mess anyway.
Cyrus IMAP is designed for this size of installation. You can split the backends up with Murder on the front-ends to distribute load; divide mailboxes on each host between filesystems (which, you'd presumably spread over multiple disks); use a SAN and GFS or other shared-storage cluster filesystems and share the spool among servers; use the new pre-release 2.3 code with mailbox replication and use more discrete, commodity components. Lots of other features that are designed for large-scale implementations.
For authentication, of course you have choices among LDAP, Kerberos (both of which are usable even if you're stuck with a Windows domain for authentication), PAM and other things. Very flexible; too flexible for some and it can be a bit confusing.
I've been working on rewriting the HOWTO, although I haven't made a ton of progress, it may still be useful to you: http://nakedape.cc/info/Cyrus-IMAP-HOWTO and here's a presentation I put together for Linuxfest Northwest: http://nakedape.cc/info/Cyrus-IMAP-Intro.
You mention a million mailboxes, but that doesn't really mean much--that is just an estimate of storage requirements. What is more important to determine is how many concurrent users you will have and how much actual traffic--storage is cheap, memory not so much.
Wil
wiki
expertise, will definitely save money in the long run.
You missed an important one. Round-robin DNS doesn't work that way: presented with a set of IP addresses for one hostname, it's almost entirely a client software decision on which IP address to reach out to. Couple that with DNS caching on the clients or their local DNS servers, and round-robin and DNS based failover servers can easily take more than 24 hours to reach even the next IP address of a round-robin set.
The obvious answer is of course: send all those thousand employees a Gmail invite!
:)
So, you have 1,000 invites in your Gmail?! I have a Gmail but that's really amazing! I only got 100 invites; even though the number was increased, it's still 10x less than you suggest. Or did you mean invite 100 people expecting they'd each invite 1 of the other people in the company?... Swell idea, but how are you going to keep track of who was invited and who wasn't?! After all that I think you're best off doing the solution yourself.
I hate to say this, but I would seriously consider http://www.firstclass.com/casestudies/Business/ with some sort of anti-spam http://www.barracudanetworks.com/ns/?L=en in front of it.
The only serious problems I have with it:
-lack of true RIM support
-hard to find quality administrators
But it has all the functionality you could possibly need and it Just Works.
IBM does not run the corporate Domino servers on Windows! they use pSeries boxes with AIX
I want one meeeeeeeeeeeeeellllllllioen doll... err, email accounts!
It would be awesome to send an all-employees memo out and have 1 million computer speakers announce YOU'VE GOT MAIL! simultaneously.
If you think this is a good idea, just write at the bottom of this post
>>Me Too!!!!!!!!
Which highlights another issue - I'd struggle to believe an existing implementation of that scale was using Exchange _only_ for email, so you're not really looking for /just/ an email system, you're probably looking for a groupware solution as well.
This is not a trivial thing to implement, and you're highly unlikely to get much worthwhile advice from Slashdot.
That said, the place to start is a *real* requirements specification. You need to figure out what services you need to provide, to how many users and at what availability level(s) (note that difference services might have completely different userbases and requirements). Once you've done that, you have all the information you need to either research everything yourself (without using things like Ask Slashdot), or hire someone else to do it for you. But until you know exactly what it is you're trying to build, you shouldn't be asking for advice on how to build it.
If I were you, I'd quit.
I've been in over my head before. But man, you've taken "OH SHIT" to a whole new level.
quit, you obviously can't handle this one.
Not advocating for Microsoft, but Exchange 2003 on the right hardware does run and scale very well, if you need the groupware features.
How could you NOT know that was a joke?
I haven't read all the comments so someone may have beaten me to the answer. But here is my answer based on the current state of the art and numerous industry studies for reliability and lowest TCO. Call Microsoft and ask for a copy of Exchange Server!!!!!!!!!!!
embedded linux
I can do nothing but sit back and admire the humor:)
I think I will get myself some AMD's if they are that good.
-Scott
There are ISP-grade products that do it. Sun has one. See http://www.sun.com/software/products/messaging_srvr/home_messaging.xml
You need to break up the jobs of message storage, client connections, and mail transfer into isolated components that can scale independently of each other and be clustered for scalability and high availability.
Message Transfer Agents (MTAs) are often dedicated to either inbound or outbound mail and also interface to scanning software (e.g. BrightMail Anti-Spam & Anti-Virus, see: http://enterprisesecurity.symantec.com/products/products.cfm?ProductID=642%20) to check for the usual suspects. For inbound mail, they leverage directory servers (which replicate with ease) to find the specific message store used to host the mailbox for the inbound message, and then route it correctly. These are load balanced for availability and scalability.
A user's mailbox will only exist on a single message store, but the message stores can be clustered for high-availability.
Client connections similarly allow an array of "message multiplexors" to scale that end of the problem. The multiplexors speak webmail, IMAP, and POP. Similar to the MTAs, they are load balanced. A user can connect to any multiplexor and a directory server is used to find that user's proper message store to connect them to their mailbox.
To the end user it looks like a single server that does POP, IMAP, and WebMail. In truth it's broken into components to achieve high scalability and availability.
A single message store can usually store a few hundred thousand mailboxes -- for a million mailboxes you'd probably only need a handful of them.
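The directory lookup and the store count described above can be sketched roughly as follows. The per-store capacity, user names, and hostnames are assumptions for illustration only, not figures from any Sun documentation:

```python
import math

# Assumption: "a few hundred thousand" mailboxes per store, taken as 250,000.
MAILBOXES_PER_STORE = 250_000


def stores_needed(total_mailboxes: int, per_store: int = MAILBOXES_PER_STORE) -> int:
    """Number of message stores required, rounded up."""
    return math.ceil(total_mailboxes / per_store)


# Hypothetical directory: each user lives on exactly one message store.
DIRECTORY = {
    "alice": "store1.example.com",
    "bob": "store3.example.com",
}


def route(user: str, directory: dict = DIRECTORY) -> str:
    """What a multiplexor or inbound MTA does: ask the directory which
    store hosts this user's mailbox, then hand the connection to it."""
    try:
        return directory[user]
    except KeyError:
        raise LookupError(f"no mailbox for {user!r}") from None


print(stores_needed(1_000_000))
print(route("alice"))
```

In a real deployment the dictionary would be an LDAP query against a replicated directory server, which is why the front-end boxes can stay stateless and interchangeable.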
If you want that type of uptime and have a budget to support it, I'd skip the administration and technical nightmares a server farm brings and look at an HP NonStop. Install HP OSS and use one of HP's ported packages or hire some coders to port a package to the platform. It's worth a look.
If you can split the users up, perhaps by suborganization or by geographic area, you might (and I say might because no one answer is right for everyone) be well served by having lots of different servers handling email for each group, and then aggregating it all together with a head end that handles email routing and directory services.
If you're putting more than a few hundred users into an Exchange environment, that's how Microsoft would have wanted you to do it. Although notoriously unreliable, the concept is sound. In the non-Microsoft world, you could build each area as a subdomain, deploy the usual tools (such as the SMTP and IMAP daemons of your choice), and then use OpenLDAP to tie it all together, and add some sort of Postfix or Sendmail routing system to make the subdomains invisible to the outside world.
Some organizations might even consider an open source email/groupware system like Citadel that can handle a distributed network like this; it can tie together lots of servers using its own peer-to-peer protocol and share a global address book without the need to use subdomains (and any individual server is capable of being an MX for the whole network, so you might not even need a hub server at the head end -- although you might want to use one anyway in order to centralize your border services like spam/virus filtering, archival for Sarbanes-Oxley auditing, etc.).
In summary, if you distribute and/or federate the email services, you gain the benefit of removing the single points of failure, and you can potentially put the servers closer to where the users are, reducing your bandwidth expenses.
Tired of FB/Google censorship? Visit UNCENSORED!
I think those 100 invites get refreshed everyday.
I just wonder how the employees will handle all the "bum bum"s and "top female" pictures floating around.
Not that I'm complaining. I bet it's hit-or-miss but I've seen some extremely interesting pictures there...
You can hold down the "B" button for continuous firing.
I would first look at the migration requirements.
- What do you have now?
- How fast could it be migrated
- to what?
Possible scenarios:
A lot of good suggestions so far. I've never done it at this scale; I've done an avg of 200k. But from a Sr Unix admin point of view I would suggest you split your systems by function like many suggested.
Frontend: I would take a look at the Barracuda networks appliances that do Spyware+Spam firewalling.
DB: LDAP (if you have the option to migrate easily)
MTA(s): postfix
IMAP/POP(s): Cyrus
Webmail/via IMAP: HORDE/IMP
Good luck with the migration! That's what will take the most planning.
What I found great about Exchange servers is the fact they have this awesome Outlook Web Access component. No other software I've ever seen comes close. Is there any chance that a similar program exists?
HD Trailers
We have a vanilla Pentium III 450 with a pair of 15k SCSI harddrives running software RAID. The OS is Debian stable with kernel 2.4.18. The load average gets high at times, but it works fine. In 8 days it will have been up one year. One year ago we took it down to upgrade the drives. The LDA is procmail.
The vast majority of the users use a simple web-based PHP mail client we wrote. It runs on the mail server and manipulates the mailbox files directly so it doesn't have to create any POP3 connections.
What makes it all work so well is our constraints, which you may or may not be able to use. We limit users to 5,000 messages in their mailbox. Every Saturday, I run a script I found about six years ago called expire_mail.pl to get rid of messages older than 60 days. The system is noticeably slower when the mailboxes are larger. It takes Linux much longer to append to a file on ext3 when the file is larger. The incoming max message size ('O MaxMessageSize=64000' in sendmail.cf) is 64,000 bytes. That's what saves us. With the system-wide /etc/procmailrc, we automatically delete messages that contain attachments that end in exe|vbs|shs|lnk|com|pif|bat|src|dll|vb|osx|hlp|scr|zip.
It all just works.
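For anyone who wants to play with the same two rules outside of sendmail and procmail, here is a rough Python sketch of the size limit and the attachment-extension check. It is not the poster's procmailrc, just an illustration of the policy; the function names are made up and the demo message is synthetic:

```python
import re
from email import message_from_bytes
from email.message import EmailMessage

# Same extension list as in the post, matched case-insensitively.
BLOCKED_EXT = re.compile(
    r"\.(exe|vbs|shs|lnk|com|pif|bat|src|dll|vb|osx|hlp|scr|zip)$",
    re.IGNORECASE,
)
MAX_MESSAGE_SIZE = 64_000  # bytes, mirroring MaxMessageSize=64000


def too_large(raw: bytes) -> bool:
    """True if the raw message exceeds the size limit."""
    return len(raw) > MAX_MESSAGE_SIZE


def has_blocked_attachment(raw: bytes) -> bool:
    """True if any MIME part carries a filename with a blocked extension."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        name = part.get_filename()
        if name and BLOCKED_EXT.search(name):
            return True
    return False


# Synthetic demo message with an executable attached.
demo = EmailMessage()
demo["Subject"] = "invoice"
demo.set_content("see attached")
demo.add_attachment(
    b"MZ", maintype="application", subtype="octet-stream", filename="invoice.EXE"
)
raw = demo.as_bytes()
print(too_large(raw), has_blocked_attachment(raw))
```

Note this only looks at declared filenames, like the extension rule in the post; it does nothing about renamed files.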
LOTUS NOTES!!!
www.postini.com
Postini rocks. They manage spam and antivirus. If something is caught they hold it for you, so your processing requirements are lower. They will also process outgoing mail. This way you only have to accept mail from and send mail out to Postini, making your servers more secure. They will also mailbag your mail if your site goes down and notify you of such.
Couple of things, most of them have to do with the user deleting their data inadvertently; however, the largest issue with POP3 is that it's a big-ass security hole. Passwords are transmitted in plain text.
If you really want to do something like this in house, hire someone like Nigel Metheringham (old friend of mine, haven't had any contact with him in years) who set up the mail system for freeserve.co.uk when they first got started. Look at what others have done.
Crucially, you will want several inbound MXes, several outbound SMTPs, and your IMAP server on the most robust hunk of metal and silicon that you can get your hands on.
Years ago I would have recommended UW-IMAP with mbx format, not mbox format. Now-a-days, I'd be more inclined to use Cyrus IMAP. As for sendmail, postfix or exim, I've got my personal favorites. Your choice will have to be based on more than my prejudices and biases. But do take a look at exim, many things were built into it for freeserve.co.uk. (Freeserve went from zero to more than a million users in a few short months when it started.)
Prime numbers are exactly what Alan Greenspan says they are -S. Minsky
Hotmail was great until MS converted it to windows.
While not quite a million users, HEC Montréal switched from Netscape Messaging Server running on AIX to Postfix/Cyrus/SquirrelMail running on Linux. Linux Journal ran a really nice article and a follow-up about their transition.
One of the first things the school did was figure out how exactly their current system was failing them. Their old AIX boxes were being stressed just by the volume of mail coming through the system, they had little power left over to do any sort of filtering. This led to users getting drowned in unwanted e-mail which only exacerbated the existing load issues. This is one of the first things you need to do, figure out why your current system isn't working properly. You'll be better equipped to fix the problems when they've actually been identified.
HEC Montréal also went for heavy redundancy and specialization. Instead of a handful of servers sharing all of the tasks equally each node in the cluster has its own job with every class of job having a backup server. Every job is going to take a beating with so many users, even if only a fraction of them are using the system at any given time.
I'd say the most important part of what you're doing will be modeling your current use. Are you getting a ton of traffic from viruses and worms spreading over your internal network? Do you get huge amounts of spam traffic to users? In such cases filtering at your SMTP servers will relieve the rest of the system from extraneous traffic. While you might need really beefy external SMTP servers you won't need nearly as much storage space on a SAN or NAS.
I'm a loner Dottie, a Rebel.
http://russnelson.com/
I call Shenanigans!
This is not a true story. Companies with millions of clients don't ask ignorant IT people who have to look to Slashdot for answers.
If it is done, the CTO should be fired, the entire IT department should be retrained, etc.
Frankly this is a CTO level decision. One does not just get "sick" of Exchange. The fact is that Exchange, given enough money and effort, works very well for most people that use the system. Companies with millions of clients can put enough money and effort into it.
This is almost too dumb to talk about... Shenanigans I say, Shenanigans!
He didn't say 'revolver', you are just assuming..
---- Booth was a patriot ----
Taken from the hula-project.org FAQ: How well does it scale? Insanely well. Scalability was the primary design parameter for the original codebase. Anecdotally, people have run 200,000 registered users on a single $4,000 PC, with a 25% concurrency rate (that's over 50,000 concurrently-connected users). Of course it will be more practical when it's finished but even now it seems stable enough to consider deployment.
mattdev@server$ touch
cannot touch `/dev/genitals': Permission denied
Check out qmailrocks.org for a fantastic full featured mail server install based around Qmail. Support for database users and ldap are options, and it includes spam filtering, web mail, and even an admin web interface. The website walks you through every single little step, and has paths for various linux flavours plus the BSD's and even Solaris.
In terms of scalability you're going to want to start with some honkin' hardware. You will also need to separate the sending (SMTP) servers from the receiving servers and the mail storage servers, in order to distribute your load. qmail.org has a ton of info as well about the Qmail system.
Read nothing in your post that I can see as being persuasive.
You did swear a lot.
I'm sure management has already asked those questions and more, maybe even complete with expletives.
Not defending Microsoft here, but I have to take care of an Exchange 2003 Enterprise server, and I wouldn't think of trying to do it without Symantec (formerly Veritas) Backup Exec with the additional Exchange agent. Yes, you can back up and restore individual mailboxes, and even individual messages. Backup Exec has its quirks, but it's the best thing going if you have to take care of Outlook users. Over the years, starting with Exchange 5.5, Backup Exec has saved my rear when information stores got corrupted, log files were deleted accidentally, and so on. Combined with a nice, fast AIT tape library, it's a great data preservation product for the small- to medium-size enterprise.
It's only funny until someone gets hurt. Then, it's hilarious.
And Sun has stated in several places their goal with the "Outlook Connector" is that a user should not be able to tell the difference between an Exchange backend versus a Sun JES backend. And if you roll out Sun's IM, you get a Jabber/XMPP server, too.
We have deployed it where I work supporting 75K accounts and the few boxes don't sweat.
You really should look at it. (I *think* that I have seen mentioned that Verizon and Telstra use it for their customers.)
If you have to ask that question, then you must not be qualified to even do this job? Someone qualified to do the job would already know the solution.
Also you can be sure there will never be buffer overflows or similar security problems.
I'm sure you know this, but you're going to need a clusterable technology. You need to have multiple redundant servers for this kind of load. Much better to be able to handle load by adding cheap PC-hardware servers than basing it on one huge server. James would let you build that if you want to.
Of course, James only makes sense if you're a Java type of person.
For high volume sites, my recommendation for an MTA would be Ironport C series appliances. They are really pretty bulletproof, and can handle a large volume of mail. You could use something along the lines of Exim for your mailbox servers (or exchange for that matter if your users liked the groupware aspects), fronting the mailboxes with the Ironports. Some of the larger mail installations on the net use Ironport as their MTA.
Novell's eDirectory will easily scale to handle 1 million accounts. In addition Novell has ported Groupwise to run on Linux. Of course Groupwise runs on NetWare as well which is still an awesome reliable OS and neither needs to be patched anywhere near as often as M$. Good luck!
I am glad someone got it.
Slashdot taught me how to use the preview button!
Guys, I have worked on a project that had to deploy about 1 million accounts previously and you simply cannot fault the Mirapoint solution. Check them out, as they have arguably the fastest mail platform on the planet, and also the most stable and cost effective. PS I don't work for Mirapoint so this isn't a blatant plug, but I have seen it do its thang first hand.
First, you migrate users in batches of 100 at a time (the invites do refresh regularly).
Then, when you engage new staff/contractors, you send a gmail invite as part of the recruitment process.
You could send invites to the heads of each department, and have them send invites to each of their direct reports, and so on right down the chain to the lowest intern.
You make the mistake of thinking you can educate the fundamental stupidity out of people. You can't.
While gmail itself is a non-option (it's like distributing hotmail accounts for your company), what about what they have set up? You might consult with them. Another user had an architecture you could also work with. Hope you have a budget.
This sig no verb.
Run Lotus Domino on this: http://www-03.ibm.com/servers/systems/systemz9/z9109/ :-)
For a big fee, companies like Yahoo and Google might lend you some expertise. Heck, it may even pay off to pay them to help you get up and running.
Assuming you aren't competing with them of course.
Also try contacting universities and large companies, they may have 6-figure mailbox-counts to keep track of.
I'm sure companies that sell competing products e.g. IBM etc. will also help you get up and running, for an appropriate consulting fee.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
There was an article in an issue of sysadmin magazine about 1 year ago about something like this. Not quite the same scale you mentioned, but more like 70,000 users. But their solution's biggest strength was its ability to scale, so it's a start.
the gist of it was they had a fibre channel SAN, which was shared by multiple headless servers at several levels
-squirrelmail based webmail
-postfix
-LDAP on postgreSQL
-cyrus IMAP/POP3
If you look back in the issues for the last year or so I bet you could find it...might be a start...
My company has high usage for the number of people we have, but it's still only ~500 users, so we just got a beefy redhat enterprise box...
sometimes, i wonder if i'm the only conservative on teh intarweb. ah well, back to mah hogs and warmongerin'....
I agree. That's where I usually start at. Helps clear my mind and relax me. I think for a project this large, you're going to need to get laid on a regular basis if you're planning on surviving this.
"Klaatu, verada, necktie!" -Ash
This can't be real.
YOU (as in one person) have been asked to figure out how to provide email access for ONE MILLION ACCOUNTS?
There is another comment on here saying that the entire IBM corporation only has around 300,000 email accounts. Do you know how many people they probably have running their email system?
And you need to replace Microsoft Exchange, probably the most capable corporate email system available? Do they require all the features that Exchange offers??
Sorry if I'm I can't imagine this is real.
Novell GroupWise on Linux.
All of the heavy lifting has been done for you; it's a scalable, secure, battle-tested solution running on an open-source platform.
Get Ironports for the front end SMTP processing and spam/virus filtering. Then get a Communigate Pro cluster for the back end with a SAN for storage.
It'll cost you about a third of what Exchange does and it's so, so painless, even for 1M accounts.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
"Get a server with RAIDed SCSI disks preferably hot-pluggable. "
And how do you achieve server failover for 99.9% uptime?
Dude, if you don't have a clue, don't even bother to post nonsense like that.
For 1m users and 99.9% uptime, you're going to need multiple servers and shared storage. That implies a level of experience that neither you nor the guy who asked the question has.
Your kind of homegrown solution causes more problems than anything. It will never scale and it will never achieve the uptime required.
Cripes. Amateurs.
Not unless your non-technical user-base can be treated for withdrawal. People in marketing, sales, consulting, management cannot/will not relearn another groupware system unless forced to by upper management.
Yes, you can have a robust IMAP4/webmail solution but without the integrated calendar and task delegation, you're in for a world of pain. We've tried. The Outhouse 2003 smack is too alluring for the mobile Crackberry/PDA use where their mail is at their fingertips.
What you need to ask is what do you want? A robust mail system with low cost of ownership or happy fratboys who pay the bills (including your salary).
Argentina.com has half a million accounts with POP3, SMTP, and dial up access using FreeBSD.
They did this with a hardware budget of US$75,000 over two years. Each user gets 300MB of mail space, and the cost for disk came out to US$3/GB. POP3 and SMTP uptime is about 99.95% (down 5 hours per year), while webmail access is about 99.5% uptime (down 2 days per year).
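As a sanity check on uptime percentages like these, the conversion to downtime is simple arithmetic. A small sketch (the function name is made up, and a 365-day year is assumed):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per year implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for pct in (99.5, 99.9, 99.95):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}% uptime -> {hours:.2f} hours/year ({hours / 24:.2f} days)")
```

By this calculation 99.95% works out to roughly 4.4 hours a year and 99.5% to roughly 1.8 days, so the "5 hours" and "2 days" quoted above are rounded up. The 99.9% asked for in the original question is about 8.8 hours a year.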
http://www.outblaze.com/index.php
FreeBSD 5
Dual 3.4 XEON Procs
4GB RAM
4 x 300GB Drives (Lotta space For Imap users)
Hardware RAID 5
FreeBSD 5
Dual 2.0 Xeon DB mysql server(for vpopmail)
2 x 70GB RAID 1
2GB RAM
Qmail, Vpopmail, Courier IMAP, apache, qmailadmin, vqadmin
That system would handle it and allow for redundancy
Notes Security Framework awesome. You can set access to the server, the mail database, the actual records and can even drill down to and encrypt a particular field in the record.
Viruses - No worries. When melissa came out companies on Exchange were down for hours if not days. Notes shops kept on trucking...
Notes Client UI Sucks - Use the Web INotes client, use Outlook (you have options).
Servers crash - Run it on a stable OS like OS390, Solaris, Linux, etc. (you have options).
Start with a frontend of servers that ONLY forward mail to your spam filters.
In the middle are your spam filters. Run SpamAssassin in daemon mode. These guys will forward mail to your delivery machines, using a DB-backed forwarding table.
The delivery machines run IMAP. While I haven't used it in about 2 years now, Cyrus IMAPd was a great system when I used it. It will do virtualization for you. Your users can connect to any of the IMAP servers and their requests will get forwarded to the correct server.
With this setup, we supported about 45k mail accounts using 5 servers. The load never got above .5.
Note that you're still SOL in the case of a server failure. However, if you study up on IMAP and POP3, you'll discover that they can't handle server failures at a design level. There's simply no way to handle losing a server.
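The DB-backed forwarding table in the middle tier is just an address-to-host lookup. A rough sketch using an in-memory SQLite database; the table layout, addresses, and hostnames are invented for illustration, and this is not the schema from the setup described above:

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def build_table(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory forwarding table of (address, delivery_host)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forwarding ("
        "address TEXT PRIMARY KEY, delivery_host TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO forwarding VALUES (?, ?)", rows)
    conn.commit()
    return conn


def next_hop(conn: sqlite3.Connection, address: str) -> Optional[str]:
    """Return the delivery machine for an address, or None if unknown."""
    row = conn.execute(
        "SELECT delivery_host FROM forwarding WHERE address = ?",
        (address.lower(),),
    ).fetchone()
    return row[0] if row else None


# Hypothetical entries: which delivery machine holds which mailbox.
table = build_table(
    [
        ("alice@example.com", "imap1.example.com"),
        ("bob@example.com", "imap2.example.com"),
    ]
)
print(next_hop(table, "Alice@Example.com"))
```

Moving a user to another delivery machine then becomes a one-row update, which is most of the appeal of keeping this mapping in a database.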
I've done something like this for a government entity with about 1.7 million accounts with similar constraints. Email me at anthony@adctech.biz or papillion@gmail.com or call me at (918) 926-0139. I can recommend, setup, do reports, etc for them for a reasonable price.
Anthony Papillion
Advanced Data Concepts, Inc.
"Quality Custom Software and IT Services"
You're running a million email accounts on exchange? You didn't say that, but still I'm asking. 'Cause I'd say that would be impossible. ...
But if your company plans to scale to 1 million live and running mail accounts these are the things that come to my mind:
1st: You need serious Iron for this.
Either up to a few hundred beefed-up rack PCs (depending on the mean usage of those 1 million accounts) for load balancing, admin, fault tolerance, data and automation or some uber-special Sun server solution for this. PCs are probably better. Scaling/maintaining is cheaper in the end.
2nd: I only know my way about things the size of a beercase so here goes my 2 cents for the PC solution:
Something stable (Linux or BSD) and an MTA that doesn't get in the way. Maintainability goes over speed at this point - I'd guess Postfix or Exim would be the ticket.
3rd: Consider a DB setup for storage. Also here I'd go OSS all the way. Postgres and Firebird scale very well, but even MySQL could pull this off if set up correctly. You have no big relational stuff, just a 'give data, take data and shut up' scenario here. And MySQL is fast. As in f*cking fast. If you have a good admin policy and automate that you're going to have zero fuss, zero slowpoking in your storage. MySQL might even be the best way.
4th: Process automation/admin. Pick a good PL and/or appserver for this. This should be scalable in itself and also shouldn't get in the way (again: maintainability over speed). I recommend checking out Python and Zope. Zope load balancing is a piece of cake and although it's a slowpoke unsuitable for the grunt work, its object relational DB is like sex with Claudia Schiffer when working with it. You could set up a handful of boxes with that and have your backend covered (no pun intended :-) ). Zope could maybe even do parts of the web frontend. But you'd have to test that for speed. Python alone is perfect for frontend though. The PL doesn't matter that much, but you should stick to one for everything you're doing. You're starting with a clean slate, you might as well honor that without building a messy bloat of 10 technologies.
5th: Your team. You need a team for this. A handful of people who help you build the system and document it and know what's going on. The "OSS expert but no-foam-around-mouth" type is good for this. All should use the same PL for automation and generally know what's going on, even if they specialize (storage, automation, web-frontend, etc).
6th: Facilities. This is the ballpark where you think about that as well. Fire safety, spare power with large UPS and maybe even generators. Fat lines. Do the math, add 40% and then build it. Same 40% rule goes for cost and final rollout schedule.
7th: Offsite backup for internal accounts. Check your requirements. If the company is large you'll need a remote backup site and some overturning backup policy.
Backup could also be done with two or three suitcases of external encrypted HDDs that are carried around if you want to save bandwidth for account access. Few think of a solution like this, but it actually is feasible, safe and cheap. And spare HDDs for replacement aren't a problem either.
8th: Politics.
External Contracts: Get nice-like with your ISP(s). You wanna have had a few beers with the guy you're explaining to that you've misjudged your requirements and need an extra 4 lines. Now. Or would like to switch off a few for the time being.
Your boss and his superiors: Keep them informed but make the decisions yourself. Don't pester them with techno babble. They wanna know you can do it yourself. At this scale they are more your partners than your superiors. You need to be up to it. Naturally. Rethink that matter before you give your people a thumbs up. Nothing bad about coming to the conclusion that external contractors would be the better solution. Be a professional, not a jerk.
Good luck.
We suffer more in our imagination than in reality. - Seneca
Anyone ever hear of TPF? They have the TPF Internet Mail Server...would work perfectly for this.
http://www.tnpi.biz/
I'd hire him if he was interested. Even if you don't go his route he'd be a good candidate to evaluate the proposals if you don't have someone who can do that on staff.
You should look at the following:
- A good L4 load balancer (Foundry, F5, Cisco, etc...)
- A good platform for inbound and outbound email filtering and relay to front Exchange; appliances exist that do this well (Ironport, Barracuda) and you can scale these via the aforementioned load balancer
- Build a good LDAP directory infrastructure; Sun makes a very good one - it will scale into 10's of millions of users
- Run your Exchange servers in VMware (the Data Center version) where you can quickly recover and back up your servers since the image is just a file
- Implement Exchange on a good SAN such as EMC or Hitachi
In the end you're going to have to ask vendors (or advocates for free solutions) to give you examples of configurations. There are certainly candidates. I believe Sun's JES is used by some large ISPs. Novell has a couple of products that might be interesting. Mirapoint makes appliances which are very much worth looking at. I'm sure you know what to ask these folks: give us examples of configurations that actually are used in large-scale installations. What I liked about Communigate is that their multisystem setups looked more symmetrical than some of the alternatives. The main reason I didn't look at them was that I was looking to do mail and calendaring and at the time they didn't have calendaring. They do now. Also look at what other services you're going to need: spam protection, support for Blackberry and other mobile access. Vendors ought to be able to tell you how to integrate these support services.
HINT
..
Just outsource to a company: http://www.loftmail.com/ They have the best corporate email setup.
of course have everyone sign up for a hotmail account. duh *ducks*
Outsource it to the Chinese. You'll get a great product and Gates' goat for another snub all at once.
/.'s Psychic-in-Residence: Psychic to the Geeks
I also work at a company that thinks Exchange is the ultimate tool, and host it about 1000 miles away from where I work. But it serves IMAP and LDAP, so I just use tbird from my Linux box and it works like a champ. I only fire up Outlook because of meeting notices. Poke around a bit and you might get tbird to work for you too.
- doug
AKO (www.us.army.mil) is the Army's official intranet portal. We provide email for over 1.72M users, and we move almost 3 million messages a day. We do it all with Sun Messaging Server ver5.2 (soon to be Jes3) and we have exactly 2 (count 'em) two mail administrators. Sun mail is rock solid and scales great. We offer POP, SMTP, enterprise SPAM and Virus filtering as well as personal address books besides. We don't get the rich Outlook fat client, but then we want to be all web-based anyway. Can't say enough about Sun mail. If we had to do this with Exchange, I'd have to hire prolly 50 admins and deploy an order of magnitude more machines.
If so, they've been sacked and your Exchange servers can cool down now.
I like microcars
But the real advantage of Notes is as a distributed applications platform. If you want to expand past e-mail and start writing applications such as leave management or room booking or technical documentation databases then this is where Notes really shines. And they're all databases and they can all be replicated so they take advantage of the same redundancy that your e-mail will use. And if you need to travel then you just replicate the databases you want onto your notebook and take them with you. It's fantastic.
Ah, the mail client
Why oh why does the client suck SO MUCH!! At my previous company the management were looking at moving to exchange simply because Outlook is so much a better client than what Notes (even R6) is. It's a big fat piece of bloatware (as has been discussed many times here). My main peeve is that if you edit an attachment inside an e-mail you can't save it back into the e-mail! eg: here's a typical scenario:
Not using Notes (outlook, thunderbird, mail.app all let you do this)
- Receive e-mail with an attachment
- dbl-click on the attachment, edit it, save it
- forward the e-mail, including the saved attachment, to someone else
Simple huh?
With Notes:
- Receive e-mail with an attachment
- Detach the attachment from the e-mail message. Save it somewhere
- Use windows explorer (or whatever) to find the attachment, edit it and save it
- Forward the message
- before sending, delete the original attachment and replace it with the copy you have saved on your hard drive somewhere
- send the message
- delete your copy of the attachment
Sigh!!! WHY!?!?!?!?
But despite all that crap I still think it's an excellent platform and one you should consider. It has support for encryption and also supports IMAP (although not very well I hear). A lot of large corporations run it. I've worked for 2 large investment banks, both of which run it. You can also integrate IM into it (with Sametime) and remote meetings also (with Sametime meeting). Also, IBM PS are good at setting it up. For something this scale you'll be up for $$$ anyway so I'd be looking at having someone come in to help you and they're pretty good (I don't work for IBM!).
Why would anyone want to use a text editor that is not vi?
What could be simpler?
You want Postfix + Cyrus IMAPD. These are your core elements; Postfix is easy to chroot, run SSL, SMTP auth, and will work with SQL or OpenLDAP, etc. Ditto for Cyrus. There exist ways to interface it with several account management sides. The fine folks at CMU have designed it to scale out the ying-yang. I've never had a peep of trouble out of either piece of software.
:)
Cyrus will also provide you with POP3, in addition to IMAP.
If you want to extend the system, there are many ways to do it. On top of my Cyrus/Postfix setup, I have a procmail glue layer which runs DSPAM and any custom rules. I use MySQL for the aliases, auth table, etc. I have mod_php and Apache set up with Squirrelmail. Email is the most complex suite of applications that my Linux server runs, and it runs it flawlessly. I have never lost a single bit of data. I'm using a RAID array with regular backups, though.
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
A good webmail option is kinda a catch. Squirrelmail is nice, but compared to OWA it's really out of its league.
I recently went through the quest for a decent webmail client for my home network. I have seen the promised land, and it is The Horde.
PHP front end. Multiple storage backends from filesystems to the standard gaggle of databases. An interesting web-accessible VFS that I can see being really useful in a corporate environment.
IMP (the mail component) can read mail from multiple sources -- either POP3, IMAP or IMAP/SSL (maybe more, those are just the ones that I know). It also deals with spam management at the individual client level.
Consolidated bookmarks that are web-accessible; notes; tasks; calendars; address book.
It can use LDAP (as well as about three dozen other things) for user authentication -- an important consideration when contemplating 1,000,000 users.
A little apache magic, and it's all SSL secured.
I don't know how it would work in a large, large environment, but with Postgresql for a backing store, I imagine it could scale as far as you wanted it to.
I've only been using it for a few days, but I'm really impressed.
I find your design interesting.
The only large mail system I'm familiar with uses Cyrus IMAPd's clustering facility, OpenLDAP, and postfix.
I'm particularly curious about the choice of Courier, and of NFS.
Courier I have little enough experience with to comment on. I was under the impression that it was a bit old and crufty, didn't have header caching for IMAP or other useful performance enhancements, and wasn't overly well suited to "sealed server" operation (rather than servers with direct user logins too). I would be interested to know why you passed up Cyrus IMAPd, as in my experience it's fantastic software that "just works" and I know there are sites that use it for gigantic volumes of mail.
I'm also interested in knowing what platform you're using given your use of NFS, though I guess maildir might be safe even on Linux NFS.
It sounds to me like you work AT Microsoft and have all finally seen the light!
Best of luck dude. Be sure to post your solution on the web so we can all learn from your experience (if you can).
First things first, you need reliable storage. We have a NetApp filer doing the job. The delivery mail spool is shared amongst servers over NFS on a dedicated gigabit LAN.
Second: Load balancing. We use an F5 BigIP to balance incoming mail connections (smtp, imap and pop3).
Third: separation of duties. We have a set of externally connected mail servers. These systems route all mail. Mail destined for local delivery gets transferred to a second set of internal mail servers. The external mail servers block spewing, ban known spammers and virus check smaller mail files. The internal mail servers accept mail only from the external mail servers. They run spam assassin and clamav to stop spam and viruses. We use exim as the mta, but anything will do.
That's it.
We use maildir for mail storage, cyrus-imapd for imap and a custom system for pop delivery (which sucks, but I won't get into that here).
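The "internal servers accept mail only from the external servers" rule in the comment above boils down to a peer-address check. A minimal sketch in Python, assuming the external relays sit in one hypothetical subnet (`10.0.1.0/28` is made up for illustration); in practice this would live in the MTA's own relay/host configuration rather than in a script:

```python
import ipaddress

# Hypothetical subnet holding the externally connected mail servers.
EXTERNAL_RELAYS = [ipaddress.ip_network("10.0.1.0/28")]


def accepts_from(peer_ip, allowed_networks=EXTERNAL_RELAYS):
    """Return True if an internal mail server should accept SMTP from peer_ip."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in allowed_networks)


if __name__ == "__main__":
    for peer in ("10.0.1.5", "10.0.1.16", "203.0.113.9"):
        print(peer, "accept" if accepts_from(peer) else "reject")
```

Anything outside the relay subnet gets refused before it ever reaches the spam and virus scanners, which is the whole point of the two-tier split.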
The Exchange Replacement How-To. LDAP, IMAP, POP3 (if you must), etc., etc. Open tech that works together and scales as high as you can add servers for it to...
geek. lawyer.
I have a server that handles 130,000 web access users -- has for something like 7 years. I'm 3000 miles away. I never get called. It's all automated.
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
I would start by hiring someone who knows what the hell they're doing, which you obviously don't if you have to ask here.
Part of the abstract:
I forget what 8 was for.
Just an idea... if you want to go with open source products in your company.
First, the most important is the backend storage.
- I would try using a SAN for storage, like a small CLARiiON for example. I would carve the storage for the mail there on a volume.
- I would create a set of export servers that would connect directly to the SAN and re-export the volumes to a set of front end servers using a combination of GNBD, GFS, etc...
See this document:
- http://www.redhat.com/magazine/008jun05/features/gfs/
- configure a set of servers that would act now as the mail servers themselves (frontends). I would strongly suggest using maildir. Courier-IMAP for the pop3/imap accounts is great. Install this on all the machines. For the SMTP agent you could use Courier but I usually prefer Exim.
- run both the IMAP/POP/SMTP servers on all the servers, using maildir only.
- use a mysql database to store the users' information (passwords, email addresses, etc...). You might want to configure 2 mysql servers: one as the master, which receives all the writes, and the other as a slave that serves reads, with reads balanced across both, since reading user and account information will probably be 99% of the database activity.
- use a load balancer to put in front of all the frontend servers, do a load balancing for all the services (POP3/IMAP/SMTP) with sticky session that will try to keep the same users on the same machines when they try to download their mail.
When you are running out of capacity, simply add new frontends, put them behind the load balancers and voila...
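One way to picture the "sticky session" idea is to derive the frontend from the user name, so the same user keeps landing on the same machine. The sketch below is an assumption, not what any particular load balancer does: it uses rendezvous (highest-random-weight) hashing, and the frontend names and user addresses are invented for illustration.

```python
import hashlib


def pick_frontend(user, frontends):
    """Pick a frontend for a user with rendezvous hashing.

    Every (frontend, user) pair gets a pseudo-random score; the frontend
    with the highest score wins. Adding a frontend only moves the users
    whose best score now belongs to the new machine.
    """
    def score(frontend):
        digest = hashlib.sha256(f"{frontend}|{user}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(frontends, key=score)


# Hypothetical frontend pool and user list.
frontends = ["fe1", "fe2", "fe3"]
users = [f"user{i}@example.com" for i in range(1000)]

before = {u: pick_frontend(u, frontends) for u in users}
after = {u: pick_frontend(u, frontends + ["fe4"]) for u in users}
moved = [u for u in users if before[u] != after[u]]

if __name__ == "__main__":
    print(f"{len(moved)} of {len(users)} users moved after adding fe4")
```

With a plain `hash(user) % N` scheme most users would change machines whenever N changes; here the only users that move are the ones that end up on the newly added frontend.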
Of course I would advise going right away with powerful 2x3.6GHz P4 servers and like 4GB of memory. That is powerful and can certainly serve a LOT of users per server already.
My 2c, written quickly. I apologize if it's not complete, but I am pretty sure the general idea is there and sound.
open to comments
There is absolutely no reason at all to leave 80% free space, 15% is more than enough to ensure you don't have fragmentation problems (I am assuming you are using a reasonable filesystem of course).
Second, people with ridiculously frequent mail check times are not any more of a problem. Modern operating systems use file system caches. You do not have to touch the disk subsystem in any way, frequently accessed data will be in RAM.
And finally, a database has a lot of extra overhead, and there are a lot of deletes going on. Sure, such a select statement would work, but reading the files in one directory is an order of magnitude faster. And the deletes will really hammer your database. FFS+softupdates makes file deletion extremely fast. A relational database is not the answer for everything, stop trying to pretend it is. Use the right tool for the job, and for storing files, a filesystem is the right tool. It's not relational data, it doesn't need to be queried in arbitrary, complex ways, so it doesn't belong in a relational database.
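To make the "it's just files" point concrete, here is a small sketch using the Maildir support in Python's standard `mailbox` module. The mailbox location and the addresses are invented for the example; the point is only that a delivery creates one file and a delete removes one file, with no table or index involved.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def deliver(box, sender, recipient, subject, body):
    """Deliver one message into a Maildir; returns the key of the new message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return box.add(msg)


def count_message_files(maildir_path):
    """Count message files sitting in the new/ and cur/ subdirectories."""
    root = Path(maildir_path)
    return sum(1 for sub in ("new", "cur")
               for p in (root / sub).iterdir() if p.is_file())


# Hypothetical mailbox location under a scratch directory.
maildir_path = str(Path(tempfile.mkdtemp()) / "alice")
box = mailbox.Maildir(maildir_path, create=True)

key = deliver(box, "bob@example.com", "alice@example.com", "hello", "test body\n")
files_after_delivery = count_message_files(maildir_path)

box.remove(key)  # a delete is a single unlink on the filesystem
files_after_delete = count_message_files(maildir_path)

if __name__ == "__main__":
    print(files_after_delivery, files_after_delete)
```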
Did you ever think that organizations with customer care need their messages stored in a relational database because they need to reference the threads, i.e. the history of the communication? And you could be talking 100's of thousands of inbound emails *every* day?
Exchange 2003 - any edition. You can scavenge the restored database and bind it to any account that doesn't have any exchange.. I.E. a new temporary account... RTFM!!!!!
(1st sig) If this were a snappy sig, you'd be reading it right now. (2nd sig) I'm a karma whore. >Insert FUD here
While Sendmail's the only other MTA with the flexibility and performance, even it can't keep up with Openwave Email Mx. And Sendmail is a beast to manage on a large scale, and it's only one piece of the puzzle. Trust me, the price is worth the performance and manageability. It has all the features you ask for, plus. Obviously, hardware is the other part. Netapps are a good mid-range product, but they love their stuff. Get ready to pay twice for it. There's a range of load balancers out there that fit the bill. If it's a high traffic site, go with Foundry's ServerIron. Otherwise, you'll want to just stick with Cisco.
www.openwave.com
Sorry, but calendaring is not there yet on *nixes, try back later. Yeah, iCal, WebDAV, blah.. nobody has done it yet for free; at least not in the users' eyes.
Hula's calendaring is looking swank these days, though. If the codebase ever becomes more modular (split out SMTP/POP/IMAP/Calendar), then, Hula might make some big waves.
You can roll your own inbound/outbound MXs and filtering boxes but the creamy filling that the people demand is going to come shrinkwrapped.
99.9% uptime and Windows are mutually exclusive so, yes, you need something non-Microsoft anyway.
As of 7 years ago, when I worked for IBM, the Notes installation for all of IBM's West Geoplex consisted of SP Nodes. Silver thins running AIX. About 6 frames worth (12 nodes per frame), using high nodes for backup with TSM. Everything was connected to SSA, I can't even remember how many drawers. I do remember when I was cold all I had to do was go stand behind the SSA racks. ;-) High nodes suck, just for the record.
;-)
Everything was tied in with the SP high speed switch, which was connected to two Ascend switch routers. (If I remember the company was bought by Lucent). The Notes complex for mail was tied in to the campus network via the switch routers, and also tied to the Notes Database complex (which was a similarly large SP installation.)
We were using gated to dynamically change the default routes if one of the campus network connections died.
We also used pman to monitor the health of the complex. pman notifies you instantly when something goes wrong, where as Tivoli monitoring only polls every few minutes. There were several occasions when I worked there where we detected and resolved problems before Tivoli ever noticed.
Is this relevant to what you want to do? Probably not, just reminiscing a bit. I'm one of the few I know who actually liked the SP.
What they do now, I have no idea. It wouldn't surprise me though if it was Regatta LPARs with GigE and Shark disk.
Now as far as Notes is concerned, RUN, do not walk, away. I can't stand it, but hey, it's your mail system.
"The avalanche has already started, it is too late for the pebbles to vote." -Kosh
I did some investigating on expandable mail systems and the only one I found was cyrus' murder project, http://asg.web.cmu.edu/cyrus/ , and so far it's worked quite well for me. The campus supports 10's of thousands of users. I don't know of any reason why it could not be expanded to hundreds of thousands and beyond. It supports high availability and is quite fast. Also, unlike other servers it allows a single namespace, so no imap1.domain.com, imap2.domain.com, everyone is just imap.domain.com. Check it out.
And another. DBMSs make integration with web applications a lot easier. Odds are there are already tools and classes and languages (like php) to easily mess with the db.
Also, if you want your email in a db without using exchange, try dbmail.
I've set up a dbmail/postfix installation to use with my company's web application we're developing. Though I have no idea how well these work with 1000000 users, they're worth exploring.
Simple. It's cross platform. The entire product is cross platform. Yeah, like Java. Only they did it before Java was a pipe dream. Late 80's.
.NET.
It has this thing called a separation layer. All the code except the UI is the same on all the platforms. Clients used to be for OS/2, Mac, Win16, Win32, and Solaris. The client side got scaled back because nobody paid for the others -- client is Win32 and Mac now -- soon with code under Linux as part of the next generation client. Lots of people are using it on Wine.
Now, the server is still cross platform. Win32, Linux, Aix, iseries (as/400), zseries.
The problem with making something cross platform is, you don't use all the nifty little Windows specific integration and custom pretty things. You don't get something for nothing -- you have to make all those bits.
Oh, the other thing? Outlook feels integrated because everything automatically does the Windows automatic-launch ActiveX thing. Just highlight a message subject, bingo! Embedded code launches! That's why viruses and worms.
If stuff wants to run in Notes, it has to have a signature. OHHH, public/private key signatures and encryption. When? 1991. Hunh? Yeah, since 1991.
If something wants to run in Notes -- it needs PERMISSION to run. Thus, no viruses or worms unless you're stupid enough to tell them "OK, sure, go screw up my machine".
Yes -- the development environment is weird and pretty unsophisticated. It takes a lot of time to learn because it's not like other things. BUT -- I can make it do cool, secure, reliable things at a tenth of the cost you can in J2EE or MS
Excited about JSR170? Ah, me too. The Notes database internals match it almost perfectly. Domino will make a great JSR170 back end. Hell, it's almost that already.
Meantime, you trolls are whining about a product that runs in Linux as a server and (using Wine) as a client. Runs on Mac. Has a fully functional JAVA environment for development and a remote API through CORBA and DIIOP. No no, instead you'll use a proprietary only -- Windows Only, Active Directory Only, Virus Distribution Engine from Microsoft.
ahahahahahaha. Enjoy it!
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
Either run it on Netware or OES. Running it on Netware gives you full clustering capability .. which means very little, if any downtime. (I run GW7 on a NW6.5 cluster, and if a node fails or is taken down for maintenance, it's back and running within about 5-10 seconds on a different node. The clients never see it being down, they only notice a few second delay in sending/opening an email).
GW 7 has NICE webmail, pop3/imap/pop3s/imaps/SOAP, is LDAP compliant, has PDA connect software, and the ability to do webmail with light devices (i.e. your phone).
= Grow a brain...
I'd check out the Sun Mail platform or the Openwave platform. They are pay-for, but scale very large! Sun also licenses per employee instead of per mailbox, which is a plus for email providers. The Sun platform has a plugin for Outlook that allows it to totally mimic Outlook back end things like calendar and such. http://www.openwave.com/ http://www.sun.com/software/javaenterprisesystem/index.xml
Just my opinion, For what it's worth.
Sendmail or postfix or even qmail will do the job just as well as exim. Just say "use whatever MTA you like" instead of trying to pretend your MTA of choice is the only way to go.
I found dovecot to be faster than courier, and use less RAM. It also does ldap, ssl, maildir, etc, etc.
Making a mess of ugly directories is not needed if you are using a decent filesystem. I know the BSDs' FFS has dirhash to make handling tens of thousands of files/dirs in a single directory work just fine, Solaris has something too. I'm sure one of the dozens of linux filesystems has this dealt with. And don't bother with a linux or BSD NFS server for something this size, just go netapp.
Spamassassin!?!? Good lord man, you will need dozens of servers just to run that. It's incredibly resource intensive, it needs its own server just for a few thousand users. Perl and tons of string mangling is not a resource efficient spam filtering solution. Use a statistical filter written in C.
Don't make all the boxes the same. It's a much more effective use of resources to dedicate these X boxes as MTAs, these as POP, these as IMAP, LDAP over there, etc, etc. You don't want all that stuff on every server, or you are wasting lots of RAM with identical processes on separate servers that can't share resources. It's also easier to tune the OS to fit exactly what the box is doing, which doesn't work so well when it's doing everything.
Hardware load balancers are not at all needed. Throw a couple OpenBSD machines running CARP and PF in front of the servers. It will be cheaper, and gives you firewall + load balancer in one.
Agreed. Notes is horrible. My employer uses it. It is almost the worst e-mail client imaginable. Only good thing about it is that it's better than i-notes. That isn't exactly setting the bar very high though...
Dude, that was fucking hilarious. Excellent troll, sir.
All of these systems will be running sendmail.
You're high. Building a massive production email system on Sendmail 8 is slow-motion suicide. If the security holes don't get you, the terrible configuration methods and complete lack of scalability will, never mind the fact that Sendmail Inc is trying desperately to replace the product.
"Most managable with [...] heavy customization?" I'd laugh if I wasn't crying. And I'm crying because I used to work for a company that deployed a massively customized sendmail infrastructure -- and I was one of the poor bastards who had to maintain it. Trust me, you don't want to do this. Ever.
Yes, milter is cool. No, it's not cool enough to justify burning CPU cycles on sendmail in 2005.
Even Sendmail Inc tacitly admits that Sendmail's design is garbage: take a look at the design document for Sendmail X, and note carefully how much it resembles Postfix and Qmail. There are very good reasons for this.
News for Nerds. Stuff that Matters? Like hell.
There have been several good answers for the front end. Here's a good backend architecture.
Make sure you virtualize. VMware ESX and VMotion are very cool. They have tons of info on their site for using virtualization to increase uptime and it's all true. I thought it was a load of BS until I started using it. It's great for DR and multiple sites.
flat backend filestore....
OK, I have nothing to do with these people. I have no stock in the company, I just think they have a cool product. http://datadomain.com/
Check out Data Domain if you're going to use a flat filesystem for the filestore. They use bit-pattern matching to provide a pseudo single-instance store, and (they claim) 20x compression (though with this technology and something like mail, you could probably approach 8x easily).
Their products do remote replication so you can have your multiple sites with the same mailstores.
also figure storage.... 1M users, figure average message size is 15K (in a single instance store system, no SIS, figure 75K). Figure everyone is going to have 1000 messages in their inbox.
So that's 15-75TB. If you could limit mailbox size reasonably, you could probably get away with the DataDomain DD460 without too much hassle. Put one at each site, set up asynchronous replication, buy 2 extras for backups in different locations and offset their synchro schedules. If you need a message that was deleted more than a few weeks ago, tell whoever wants it to go ask the corporate lawyers why you shouldn't keep email on tapes. If they really want to back it up to tape, email me and I'll build you an architecture on paper.
if you don't use a DD product for your backend, look at pillardata.com you could build a 20TB system for about $6/GB and when you fill it up, expand it for about $3.50/GB (in 4TB chunks)
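The back-of-the-envelope numbers in this comment are easy to re-run with your own assumptions. A quick sketch, using decimal units (1 TB = 1000 GB = 10^9 KB) and the figures quoted above; the function names are just for illustration:

```python
def mailstore_tb(users, messages_per_user, avg_message_kb):
    """Raw mail store size in TB (decimal units: 1 TB = 1e9 KB)."""
    total_kb = users * messages_per_user * avg_message_kb
    return total_kb / 1_000_000_000


def storage_cost(capacity_tb, dollars_per_gb):
    """Purchase cost in dollars for a capacity given in TB (1 TB = 1000 GB)."""
    return capacity_tb * 1000 * dollars_per_gb


# 1M users, 1000 messages each, 15K per message with single-instance
# storage and 75K without, as quoted in the comment.
with_sis_tb = mailstore_tb(1_000_000, 1000, 15)      # 15.0
without_sis_tb = mailstore_tb(1_000_000, 1000, 75)   # 75.0

# The quoted per-GB prices: a 20TB base system, then 4TB expansion chunks.
base_system = storage_cost(20, 6.00)        # 120000.0
expansion_chunk = storage_cost(4, 3.50)     # 14000.0

if __name__ == "__main__":
    print(with_sis_tb, without_sis_tb, base_system, expansion_chunk)
```

None of this accounts for compression, replication copies, or growth, so treat it as a floor rather than a purchase order.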
I do storage management for a living. I have about 160TB of accessible storage spinning right now. Backend I can do off the top of my head. Front end is best left to others, but the backend is always the same.
"We are not tolerant people. We prefer drastically effective solutions"
Malcolm Turnbull has called for the Government to give every Australian their own email address for life.
http://www.abc.net.au/news/newsitems/200509/s1456674.htm
Hypothetically (since nobody is dumb enough to believe this is a real life case of a million users being defined by someone betting his career on slashdot trolls)
If it were me starting from scratch -- the model for a million users is the internet itself. SMTP, DNS, and maybe a big LDAP directory tool. For calendaring, you're SOL, but nobody calendars with a million people. That's meaningless. Calendaring is only useful at the workgroup level anyway. Look to any good workgroup calendaring tool and let users define their own working groups.
Now, backing off the big million user stupid number. In the real corporate world, you have two real players and a ton of also-rans. The two real players are IBM/Lotus with Notes and Microsoft with Exchange.
The market is split roughly evenly. In the US Microsoft leads a bit, in Europe and EMEA IBM/Lotus leads. How much and actual numbers are hard as hell to track down. IBM doesn't release them and Microsoft likes to count every copy of Office as an Outlook seat. Suffice it to say both companies own about a hundred million actual users.
The basic trade off between the two - With Exchange you get tighter integration with Active Directory and smooth look and feel integration on Windows. It feels like it's all part of the operating system. On purpose. On the other hand, you're forced to use Active Directory, forced to use Win32, and all that integration without any real security means viruses are unstoppable. With Notes you get a bulky client that many users find hard to understand. You also get almost 100% prevention of virus spread (it has built in security) and other goodies. It's also a development platform and it's cross platform. The client is Win32 and Mac, and users have written howto docs for WINE. The server is Linux, Win32, AIX, zSeries, and iSeries (AS/400).
You may not know this, but BOTH can use the Outlook client. Yes, the outlook client is supported with a Domino mail infrastructure. Who'd have thunk it?
Oh, and Domino supports other mail clients too. POP3, IMAP, and a very good Web Browser -- all at once for the same person if you like. It's got native SMTP support, as well.
What Notes isn't is pretty. Most people say Outlook is prettier. Ok. Easy to do if you own the OS and make software that only runs in one environment.
So, I hear rants about Notes. I hear trolls whining about a product that runs in Linux as a server and (using Wine) as a client. Runs on Mac. Has a fully functional JAVA environment for development and a remote API through CORBA and DIIOP.
No no, instead they'll use a proprietary only -- Windows Only, Active Directory Only, Virus Distribution Engine from Microsoft.
You gotta love that. Why? Well, its pretty.
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
I searched on Google for "email system" for "1 million users" ...
this page came up:
@Mail with large user bases
-> it even gives you a case study of Hotmail!!!
the company is called @Mail
it is the exact same solution that seeqmail.com uses and they have over a million users.
Read it... Find out more... and Google some more
Don't pay over-priced consultants unless it is something you have absolutely no expertise in. It is your job to figure out how to get it done.
What?! You think that being down over 45 minutes every month is "scaling perfectly?"
Best Buy can have you arrested
365 (days) * 24 (hours) / 1000 = 8.76 (hours)
No?
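For anyone who wants to check the uptime numbers being thrown around in this thread without reaching for calc.exe, here is a small sketch. It uses exact fractions so the decimal point cannot wander, and assumes a 365-day year split into 12 equal months:

```python
from fractions import Fraction

HOURS_PER_YEAR = 365 * 24  # 8760


def allowed_downtime(availability_percent):
    """Return (hours per year, minutes per month) of downtime permitted.

    availability_percent is passed as a string such as "99.9" so it can be
    converted to an exact fraction instead of a rounded float.
    """
    down_fraction = (100 - Fraction(availability_percent)) / 100
    hours_per_year = HOURS_PER_YEAR * down_fraction
    minutes_per_month = hours_per_year * 60 / 12
    return float(hours_per_year), float(minutes_per_month)


if __name__ == "__main__":
    for target in ("99.9", "99.99"):
        hours, minutes = allowed_downtime(target)
        print(f"{target}%: {hours} h/year, {minutes} min/month")
```

So 99.9% allows 8.76 hours a year, which is about 43.8 minutes a month, and 99.99% cuts that to roughly 4.4 minutes a month.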
Good job. You wasted quite a bit of time proving in excruciating detail that i messed up, after everyone else alerted me about it.
Somehow i'm not surprised that, in your calc.exe fest, you missed the obvious - I missed a decimal space.
I wouldn't normally do it your way, though. I would click the "Forward" button, then double-click the attachment, select the "Edit" option, edit and save, and then send. But I could do it your way. The only requirement is that you have to put the received message into edit mode before you try to edit the attachment. There is no "Edit" command on the menu or action bar for received messages (because face it... the received message is supposed to stay the way the sender intended it), but you can do it either by ctrl-e or by double-clicking anywhere in the message body.
-rhs
"...the email system for a huge company, which fed up of Exchange, wants to replace their entire system..."
I think perhaps that a parenthetical rephrasing, such as,
"...a huge company (the subject), which, fed up with (not of) Exchange, wants to replace its (not their, as we want the possessive pronoun to match the subject) entire system..."
I administer GroupWise at one of the largest health care organizations in the United States, and I can say that I've been very pleased with its reliability, scalability, quality and its ability to run on either Netware or Linux.
Whenever other companies are having problems with email viruses, we don't seem to have the same problems. On top of this, our uptime is very impressive even in non-clustered sites.
Though it's not free, Novell's technical support is superb. Give it a try!
Ask a pigeon breeder how quickly he can round up a million birds.
I'm sure there are some very competent people here but for that size project you need professional help. You need to either look at Sun's messaging product or software.com's Intermail. I've implemented both and had issues with both. At 1mil accounts you shouldn't hit either product's scalability issues. Although I will say I was just involved in a major ISP's migration from Intermail to Sun's messaging for approx 7mil users because Intermail simply wasn't scaling. 130+ servers with lots of folks architecting the various layers.
Well, there's your problem. Instead of bringing in IBM, you need to bring in a qualified consultant, preferably an IBM Lotus Business Partner, with actual real-world experience in the trenches making Notes and Domino really work for organizations similar to yours, and who makes it their business to understand your problems, fix them, and transfer knowledge to you. You'll never get that from IBM. You're a small customer for them. You barely register on their radar screen. And contrary to what most people assume, the people IBM Global Services have out in the field supporting Notes and Domino simply don't have vast amounts of special inside knowledge and direct connections to development.
Netmail for added bells and whistles (plus "support") or Hula for the FOSS version of Netmail (I think under GPL). Netmail was supposedly designed for large numbers of users.
you mean decimal *place*. Maybe you're "not a History or English degree" after all and you just made a small calculation error. Or maybe you just made a spelling mistake. Either is UNACCEPTABLE.
Having been through this sort of thing before on a smaller-scale (we've done 3 mail system re-engineerings in the last 8.5 years, the last 2 LDAP-based and are just breaking into 6-figure mailbox count), I can make one big recommendation: Think VERY long and hard about your LDAP schema and make sure you get it right the first time. Do this LONG before you even think about other software/hardware.
If done properly and you get proper triggers done to push updates from your backend database to LDAP (or just make the LDAP your canonical data store for technical data if you can), everything else can be implemented using off-the-shelf software and hardware (think SAN backend for mailbox store with redundant switched paths between servers and storage and load-balancers in front of most of your components).
And as stated before, separate your roles and put them on discrete redundant clusters of machines.
Our architecture actually had the mailboxes in private space, with access coming via front-end MX for incoming mail, and pop/imap proxy for reading.
Assuming you need performance, zmailer is your best choice for outgoing email.
one million accounts. i'd ask for some proof of that first before even considering taking on the project.
A friend of mine at Iowa telecom just did this. He doesn't read /. so he probably won't see this, but if you really want to know how to do this you will find him.
poke around you might find him he's a bit fed up with
I'd Tell you all my secrets but I lie about my past
you could use exmerge on 5.5 to do mailbox-level backups and restores. it would stop single-instance storage for that user's old mail, though.
When using the openbsd machines for load balancing and firewalling, you can run spamd on them. It's openbsd's very small and very efficient greylisting daemon. Doesn't matter what MTA you use, uses almost no resources, and cuts out the vast majority of spam before it even touches the mail servers.
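For anyone who hasn't met greylisting: the first delivery attempt from an unknown (IP, sender, recipient) triple gets a temporary failure, and only a retry after some minimum delay is let through, on the theory that real MTAs retry and most spamware doesn't. The sketch below shows only that idea. It is not spamd's code, and the class name, delay and expiry values are assumptions picked for illustration.

```python
class Greylist:
    """Toy greylisting policy keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, expiry=4 * 3600):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.expiry = expiry        # seconds before an unretried entry is forgotten
        self.first_seen = {}        # triple -> time of first attempt
        self.passed = set()         # triples that have already retried correctly

    def check(self, ip, sender, recipient, now):
        """Return "accept" or "tempfail" for a delivery attempt at time `now`."""
        key = (ip, sender, recipient)
        if key in self.passed:
            return "accept"
        first = self.first_seen.get(key)
        if first is None or now - first > self.expiry:
            # Unknown triple, or it waited too long: start the clock (again).
            self.first_seen[key] = now
            return "tempfail"
        if now - first < self.min_delay:
            return "tempfail"
        self.passed.add(key)
        return "accept"


if __name__ == "__main__":
    g = Greylist()
    triple = ("192.0.2.1", "bob@example.org", "alice@example.com")
    for t in (0, 60, 400):
        print(t, g.check(*triple, now=t))
```

A production greylister also expires the pass list and persists its state; this version keeps everything in memory to stay short.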
Do whatever. And how did this get slashdotted? Didn't know a question could do that. Tried forever to get something slashdotted. My site for one when it opened. And here you go and get it up in the highest rank everywhere, a question.
ModLife.Net - If it ain't modded, what's the point?
Personally, I'd start by asking for a raise.
IBM / Lotus Domino sounds like a good fit. Supports webmail, POP3 and IMAP out of the box, servers are extremely reliable if administered properly. Scalable, supports clustering (and real clustering, not like Exchange). Runs on Linux, Windows, and big iron from IBM. Sounds pretty much like what you're looking for, since you didn't say "free".
I'd start by looking at Communigate Pro from Stalker Software. They have a Dynamic Cluster solution which taps out at 5 million accounts, and includes everything you are asking for. They have a Super Cluster that will handle 5 million+ if need be. Their prices are very reasonable, and they have won numerous awards. Their Network Computing stress test did something like 160,000 e-mails per hour with zero errors. They have a free unlimited trial to download, and runs on 21 platforms from Windows and Linux to QNX and BeOS!! http://www.stalker.com/content/solutions.htm
Because teenage pranks are fun when you're about to die!
I know this is going to come out all wrong, but I have been a mail admin for a while and if you could use multipal servers under one domain Lotus Domino on iSeries is by far the most stable mail collaberation product on the market. I know no one likes Notes, but it is stable and the iSeries is by far the most stabel production machine on the earth right now next to a mainframe and solaris. You can run Domino on Solaris too and you should be ok there too. There is great fail over services. Don't think because it's an IBM product too that it's big and crappy. Remeber that IBM purchased Lotus and it's stilll a great product and just keeps getting better. You can put tons of user on each Domino server and agin it's stable too. That is all for now. Yes I said Lotus Domino and Notes......what you going to do about it? No I did not use spell check.
David Vasta iSeries(AS/400) Admin & Junkie
Rediffmail uses qmail, and they have upwards of 30 million users. But I wouldn't dare to contradict your rhetoric with facts.
-russ
Don't piss off The Angry Economist
It's not free, but the best most scalable non-microsoft solution has got to be CommuniGate, it could scale to millions of accounts easily. Supports just about any OS and even includes almost any messaging protocol you can think of. Check it out at www.stalker.com.
It's been said before, but I'll have to throw my lot in with Communigate Pro. There have been installations of over 4.5 million.
Check out their page on dynamic clusters. I use it every day and must say it's the best investment I've made in commercial software.
For $.99 per user can use Yahoo's Business Email using your company's domain. From Yahoo: http://smallbusiness.yahoo.com/email/faq.php#8 Why should I outsource my email system to a Yahoo! Business Email plan? Our platform is a well-established system with millions of satisfied customers and billions of emails successfully delivered every month. With a Yahoo! Business Email plan, you can expect: * Cost savings: Yahoo! Business Mail can provide email for your entire company at just a fraction of the cost of mail systems you host yourself. * Immediate implementation: Yahoo! Business Email plans are so easy to set up and use, you can start configuring your email system minutes after your initial purchase. Other email systems may take weeks or months to implement. * Greater reliability: Yahoo! Business Email systems are monitored 24 hours a day to ensure performance and reliability. * Automatic Technology Upgrades: You don't have to worry that your email system will be outdated or need upgrading. Yahoo! Business Email plans automatically upgrade your email system with the latest hardware and software improvements.
I would not use the terms good with Hitachi. Maybe I am biased with EMC products, but they sell more than the next big three SAN vendors combined.
What luck for rulers, that men do not think. - Adolph Hitler
Stop using SQ or Horde/IMP, @Mail - http://atmail.com/ is a much more reliable and "Outlook" looking WebMail client.
Computer Associates
addict3d (more info than the CA link above)
It doesn't look very exploitable, but it is worrisome.
LedgerSMB: Open source Accounting/ERP
Having investigated scalable mail systems I would recommend at least taking a look at Mirapoint. They aren't perfect but they are professional and have a very nice solution, though it will cost you.
If you're going to do things yourself I'd suggest looking at some of the following:
Has most of the following already installed and you'd want to subscribe to a version of RHN that let you rapidly roll out new servers or upgrades/security patches to existing servers
Can be used for authentication and directing mail and pop/imap or even webmail session to the appropriate backend mail stores.
POP/IMAP proxy can use LDAP
Again can use LDAP
I used this to proxy/redirect webmail logins
Its cluster feature is actually quite handy, and its monitor scripts along with some Perl make for a quick and easy monitoring solution.
Using the above you can set up front-end mail exchangers doing various anti-spam and anti-virus work in a load-balanced setup, with dynamic banning of IPs based on logs of refused mail. They should make use of LDAP so you don't allow in any mail that is destined for a nonexistent user.
Then you can use this to balance multiple back end servers of virtually any description. You could even have multiple vendor solutions used for the backend servers. Of course you'll need to tie it all together with custom administration scripts, etc
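The "dynamic banning of IPs based on logs of refused mail" idea can be sketched roughly like this. This is only an illustration: it assumes Postfix-style reject lines (the pattern would need adjusting for any other MTA), and the function name and threshold are made up for the example. Feeding the result into a firewall or blocklist is left out.

```python
import re
from collections import Counter

# Assumed log shape (Postfix-style):
#   "... NOQUEUE: reject: RCPT from host.name[203.0.113.9]: 550 ..."
REJECT_RE = re.compile(r"reject: RCPT from \S+\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def ips_to_ban(log_lines, threshold=5):
    """Return the sorted client IPs with at least `threshold` refused recipients."""
    counts = Counter()
    for line in log_lines:
        match = REJECT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

In practice you would run this over a sliding time window rather than the whole log, and expire bans after a while.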
Stalker Communigate is very good but for that amount of traffic it will be expensive.
It's a great product though.
The parent poster couldn't be more wrong. It's actually hard to even read his post entirely.
:). 1.5 for 9...
;)
1) qmail is VERY EASY to install, and yes, due to its age you must patch it. This is why you should use netqmail from qmail.org.
2) qmail is the easiest MTA to configure PERIOD. Single config files with multiple data lines (domains) or even a single line of configuration (server name)? Sounds damn easy to me.
3) Not scalable? Are you insane? qmail scales with the operating system. It uses mostly system calls to complete its tasks, so it's very, very quick. But again, it depends on what you are doing. You can't run _everything_ on one machine, regardless of the MTA used.
4) "on other daemontools"? That doesn't even make sense. First, daemontools is a software package by that name; there are no "others". Second, in #3 you said it was designed for inetd. No, it was designed for tcpserver, not daemontools. Running it under daemontools ensures it's always running and started by init itself. I think you need to reread your installation files: djb blasts inetd and merely recommends daemontools.
5) It does? Such as? OH, I know, the installation setup. Well you can change that very easily when you install -- conf-home.
6) What? qmail is one of the most flexible MTAs ever designed (postfix might kick original qmail's ass out-of-the-box though). With something like netqmail with the QMAILQUEUE intercept patch, you can get in between any operations qmail does. And yes, thank you, it is very secure.
7) Oh snap! You just said it was secure and now you say it isn't!?! Pick a side! But ok, please enlighten us how a program written in 1997 hasn't been updated due to security issues (I am aware of the 64bit 4gb ram remote root exploit though, but good luck with that exploit code; datasize anyone?)?
8) Yes, you are right. To make matters worse, it's not a fair open source license either. You are 1 for 8 so far, good work!
9) Yes, also a very annoying problem. But many have taken the initiative to compile the most common and required features into nice toasters. Just hit Google for a qmail toaster. I personally roll my own qmail.
In the end you turn out to be nothing more than a poorly educated troll. Please be quiet, adults are speaking.
-mo
Ummm... I think it was a joke. Must be over your head.
My company employs a combination of several technologies which provide almost 100% uptime. Although no system will be perfect, I believe you can achieve that 99.9% service level.
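For a sense of what those service levels mean in minutes, the arithmetic is easy to check. A rough sketch, assuming a 30-day month (the function name is just for illustration):

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 30) -> float:
    """Minutes of downtime permitted over `days` at the given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

for target in (99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min per 30 days")
```

So 99.9% leaves roughly 43 minutes a month, and each extra nine cuts the budget by a factor of ten.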
Our e-mail enterprise product is CommuniGate Pro (CGP) from an unfortunately-named company called Stalker Software. CGP is in use by many ISPs, scales very well, and is high performance. We're much smaller than 1 million users with around 40,000 accounts. CGP supports SMTP, IMAP, POP, Webmail, LDAP, and has plug-ins for antispam and antivirus. As these functions require a lot of I/O and CPU horsepower I would configure a separate e-mail security appliance. Our CGP servers have a Unix load factor of about 1.00 or less.
For e-mail security, we use a pair of IronPort C60s as our border SMTP gateway. The C60s run Sophos antivirus and Symantec Brightmail. Brightmail has a false positive rate of 1 in 1,000,000, which is very important in large organizations. Each C60 can process several hundred thousand messages per hour, which is ideal for peak demand, and they are great for blocking zombie hosts. No system will block all spam or viruses. However, you can expect to catch roughly 98% of spam and 99.9% of viruses with no effort from the users. Power users can always employ additional spam filters in their e-mail client, such as Thunderbird.
DO NOT skimp on hardware. Buy high-end Intel or "Unix" servers (Sun, HP, IBM, etc.) and install your favorite flavor of Unix/Linux. Did someone else mention hot-pluggable redundant systems? DO NOT store e-mail messages on your e-mail system. Get yourself a real NAS or SAN server, such as the Network Appliance FAS series. Don't skimp on low-cost imitations. Our NetApp servers have a record of 100.00% uptime for the past five years. Honest! Our only downtime on the NetApp servers was for UPS or power maintenance, or filesystem migration. We have not experienced any downtime. Can I say it again?
Experts will argue whether you should run iSCSI or NFS. NFS is just as fast as iSCSI and can be shared across multiple servers. iSCSI and SAN volumes cannot be shared across multiple servers, so scaling an iSCSI volume to 1,000,000 users is out of the question. Because CGP manages account and file sharing mitigation, you don't have to worry about silly and incompatible NFS file locking utilities.
Good luck with your "project" and please let us know what you decide to use.
signature pending slashdot approval
Perhaps you shouldn't have made this statement:
I'm sorry but it just bugs me how many people throw around uptime figures without knowing what they mean.
Easy to point out someone else's mistake, eh? Oh... That taste in your mouth is crow.
The taste in my mouth is your mom's pussy juice.
I mean...what else?
Blar.
For example, the Cyrus IMAP server supports single instance store using file system storage (Maildir-style IIRC). You don't need a database to do it.
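To show the general idea of file-system single-instance storage, here is a minimal sketch using hard links: the message is written once and then linked into each recipient's mailbox directory. This is not Cyrus's actual code; the function name and directory layout are invented for the example, and it assumes the spool and all mailboxes sit on the same file system (hard links cannot cross file systems).

```python
import os
import tempfile

def deliver_single_instance(message: bytes, spool_dir: str, mailbox_dirs: list[str]) -> list[str]:
    """Write the message once, then hard-link it into every recipient mailbox."""
    fd, staged = tempfile.mkstemp(dir=spool_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(message)
    name = os.path.basename(staged)
    delivered = []
    for mbox in mailbox_dirs:
        os.makedirs(mbox, exist_ok=True)
        target = os.path.join(mbox, name)
        os.link(staged, target)  # new name for the same inode, no second copy
        delivered.append(target)
    os.unlink(staged)  # drop the staging name; data lives while any link remains
    return delivered
```

When one user deletes their copy, only that link goes away; the data is freed once the last recipient has deleted it.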
Instead of running one mega-honking-big, water-cooled mission critical mail system, consider breaking it down by division. Set up subdomains so that users have addresses such as "username@research.organization.com" or "username@london.organization.co.uk"
Then, each division can handle its own mail. Or, you can set up different mail clusters to handle each division. Still centrally managed, but easier if you break the load into smaller chunks.
Of course, none of this is possible if you're talking about, like, email for mobile phone customers, where all the addresses are in the format 3015551212@messaging.mobileprovider.com.
Anyway, that's my 2 cents. Don't spend it all in one place.
GW claims 10K+ users per (single CPU) server - we have about 6K accounts on a box that's also doing web access, smtp & anti-spam and it hardly gets into double-digit CPU so I think that is conservative. ~40K is the theoretical limit I think.
Clustering would help reliability and scalability, and you can run it on Linux or NetWare.
GW7 lets you run an Outlook client against the GW backend if you don't like the native client. It supports POP3, IMAP, etc., and there's also a Linux client.
Design (logical and physical) would depend on factors like the average mailbox size, the percentage of users likely to be online concurrently, and where the users are geographically.
I'm a UNIX Systems Admin from a University who runs the Sun Java Enterprise Messaging software (which has had several name changes over the years), and if you're looking for massive scaling, reliability and flexibility- I think this product is what you're after.
I think there are still a few issues with the Outlook connector for calendaring, but I haven't looked at it for a while. Though we don't run 1,000,000 users, we happily support over 50,000 on a not-so-big machine. With the right hardware behind it, high-performance LDAP servers, and Sun Cluster, I think you've got your answer.
I don't agree that it's difficult to administer. You set it up properly, and once you get your head around how it works (if you understand PMDF, this is a no-brainer), it's very manageable and goes like a train. It may take a little time to get it all up to scratch, and if you're not in the know Sun Pro Services should be able to sort it out for you.
As an admin of GW with 250K users I can personally say that it is the correct way to go for large-scale projects. The company I work for is spread out around the globe and our CIO demands 99.99% uptime (it's hard...I admit) or our jobs are on the line. We have a great agreement with Novell that allows us to do this with clusters (we've never lost an entire cluster at one time with OES Linux or NetWare even during updates, upgrades, etc). The new GroupWise client is loved by our users (easy to use) and the WebMail is just awesome. Administration is a cinch and has minimal stress (considering the requirements). As a technology that's been around and proven for years it is great. Integrating with eDirectory is great as well because of the marvel of partitioning. Assuming we don't have problems this year with an entire cluster we'll hit 100% in just a few more months for uptime but we still have the holiday rush to survive so here's hoping...
I just reread the parent after posting a long description of our configuration. Is this system in your basement? I ask because you mention personal and free accounts. You could put these special non-corporate accounts on a different system. Why burden a corporate system with non-corporate users?
If you must, CommuniGate Pro is great for this function because you can create multiple e-mail domains within a CGP cluster. CGP lets you delegate administrative functions for each e-mail domain. From our 40,000 user system, I would bet the farm that CGP easily scales to 1,000,000+ users on a fairly small cluster.
As others and I have emphasized, separate the functions on different systems. Install a front-end e-mail security appliance to handle blacklists, block zombies, LDAP attacks, antivirus, and antispam. Do NOT run these using MailScanner or SpamAssassin. Use a commercial appliance such as IronPort (major competitors are MiraPoint and MailFrontier).
Put your mailstore on a real NAS server such as a Network Appliance FAS-960 which can handle up to 32 terabytes and handle a large number of simultaneous transactions (NFS ops). Cheap RAID systems cannot support the load of 1,000,000 users. NetApp servers automatically generate instantaneous snapshots of the file system every hour thereby permitting easy restore of messages without going to backup tape or secondary storage.
Someone else mentioned installing a bunch of redundant fibre, etc. I would hope your e-mail system is installed in a data center with these features.
signature pending slashdot approval
You know, corporate accounts is sure as hell gonna notice $305,326.13 Michael!
Neither of your proposed solutions work. (1) fails because your big scary overused buzzword is not a buzzword at all. It's a person, with a family. And if you were to kill my kid/parent/sibling, I'd resolve to kill you. I wouldn't expect any less from the friends and family of someone I'd killed. For every person you kill, you create many more 'terrorists.' Ultimately, you are advocating genocide. But you're a smart cookie, so I have to assume you know and are ok with that. Kill 'em all, along with the niggers, gypsies, and jews, right Adolf? (2) They don't give a damn what religion you practice, they just want you to a) stop killing their friends, family, and elected leaders, and b) stop trying to tell them how to run their own damned country. Saudi Arabia is an Islamic state and that hasn't stopped Bin Laden and friends from attempting to topple that nation has it?
Killing people is a tarbaby. The harder you struggle, the more bogged down in it you become. If I had mod points, I'd mod you down based on your fascist sig alone.
Save 50% if you switch to Novell from a major competitor. http://www.novell.com/products/openenterpriseserver/promotion/index.html?sourceidint=rbanner_oesmigrate_promo
I hate to throw a commercial solution out here....
but, our company uses Lotus Notes and we have been satisfied with it. Not sure about its IMAP capabilities, but throw enough resources (servers) at it and take some time to learn how to manage it and things work pretty smoothly.
Yes, passwords are transmitted in plain text. So is IMAP, and so is SMTP. You do make your users authenticate for SMTP, right? Picking another protocol will not help in this regard.
What you need to do is support STARTTLS for these protocols. That lets the client connect then negotiate an encrypted connection with the server before sending passwords. It's easy to configure the server to refuse to authenticate the client unless an SSL session has been set up if that's what your security policy dictates. It's also possible to have the server demand a client certificate from the client before setting up the SSL connection, adding an extra layer of authentication.
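The "refuse to authenticate unless an encrypted session has been set up" policy boils down to a small state check on the server side. Here is a toy model of it, not a real SMTP or IMAP server: the class, the `check` callback and the reply texts are illustrative, with reply codes borrowed from SMTP.

```python
class MailSession:
    """Toy model of one client connection; not a real SMTP/IMAP implementation."""

    def __init__(self, require_tls: bool = True):
        self.require_tls = require_tls
        self.tls_active = False
        self.authenticated = False

    def starttls(self) -> str:
        # A real server would perform the TLS handshake at this point.
        self.tls_active = True
        return "220 Ready to start TLS"

    def auth(self, user: str, password: str, check) -> str:
        # Policy: never look at credentials that arrived over a plaintext link.
        if self.require_tls and not self.tls_active:
            return "530 Must issue a STARTTLS command first"
        if check(user, password):
            self.authenticated = True
            return "235 Authentication successful"
        return "535 Authentication credentials invalid"
```

A client that tries AUTH first gets the 530 and has to negotiate TLS before its password is ever accepted.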
You'll probably also have to support the old IMAPs, POP3s, and SMTPs standards, but they should be considered deprecated and only in place for crap clients that don't know about STARTTLS.
From ASR ( http://home.xnet.com/~raven/Sysadmin/ASR.Quotes.html )
... while talking to bloat.example.com.:
Re : Mail Transfer Agents
Qmail: a small office of neatly dressed clerks, delivering short clipped remarks to queries, and handling mail with a rude impersonality, except in the case of failure where they let their hair down and have an after-hours beer and let you know about it, pointing to the pertinent header sections.
MMDF: A jumped up mailroom boy with a chip on his shoulder. Loves the bureaucracy and takes great pride in stamping "illegal address" in red ink on any mail it passes. Unpacks all the mail and repacks it in his own special envelopes before delivery to end users.
PP: MMDF gone mad with standards fever. Think "Brazil".
No, PP is... well, see, when it receives a letter, it chops it into small pieces, then translates bits of it using an English-Hungarian phrasebook and puts all the bits into various pigeon-holes. When it gets round to delivering the message, it collects all the bits, translates them back using a Hungarian-English phrasebook, tapes them together, and loses the letter. Some time later, you get a bounce message:
----- The following addresses had permanent fatal errors -----
----- Transcript of session follows -----
>>> RCPT To:
550 My hovercraft is full of eels
PP is John Cleese.
Sendmail: Shiva as a postman. Many arms delivering mail, dancing, taking drugs, destroying as it sees fit. Often makes creative changes to the mail for kicks, but ultimately can be persuaded to do anything with the right incantation...and that includes giving you other people's mail.
VMail: No experience yet, but I'd guess something like a wizened old man sitting on the porch outside the post office. Looks at everyone who passes by with deep suspicion, but turns out to be friendly and helpful once he realises you're not there to rob the place.
Micro$oft IMC: The Scarlet Pimpernel of postmen. Hard to find, impossible to order about, but every once in a while it saves a piece of mail from disaster. Sometimes even with its head(ers) intact.
cc:Mail SMTPLINK: A 5 year old child left in charge of a large sorting office. Can't reach over the counter properly, can't handle more than one letter at once and has to go looking for a grownup whenever it wants to deliver mail to other towns. Often opens parcels to look for shiny things inside then just delivers the wrapping paper onwards.
cc:Mail UUCPLINK: an insane madman sitting in a box. Mail is thrown into a box where unknown things happen to it... sometimes mail actually leaves the box... usually to be delivered to the administrator of a totally unrelated post office and containing a complaint that the madman could not find the recipient in his dark box and would you please contact the person with the key of the box. Of course, the only way to reach that person is by mail and even if the box is opened the madman cannot be persuaded to actually send mail to unknown addressees to the person with the key anyway...
Gus, Pete Bentley, Malcolm Ray, Perry Rovers
Backups.
With POP3, the client downloads mail and deletes it off the server. Without a significantly butchered POP3 server there's no way to hold copies of that mail for a period of time (say, to ensure it goes on to your archival tapes, or to make sure you can recover files the user deleted accidentally). It's one less thing to worry about if their workstation / laptop dies, too - just give 'em another one. If more mail clients supported LDAP address books and WebDAV calendars this would be even nicer; as it is I still have to keep their mail folders in their network home dir so I can back up their address book.
You can back up POP3 boxes if you're on a corporate network, by forcing the client to keep its spools on the user's homedir. That tends to be slow and inefficient, though, and it doesn't let you do things like transparently split out attachments and store only one copy of an identical attachment for everybody.
It's also easy to lose mail with POP3 if your client does something silly. Most clients seem pretty decent now, but I remember old Eudora versions used to DELE mail off the server then crash, corrupting their mailboxes. Woohoo.
IMAP gives admins much more control over user mail. You can back up their mail folders, including their outbox and filed mail. You can enforce mail lifetime limits if your information retention policy requires it. You can store single copies of duplicate messages and attachments. You can give users access to shared mailboxes, and to each other's mailboxes where necessary. You can manage their mail folders remotely ("I can't delete $message, help!"). You can set up filters that deliver mail into sub-mailboxes automatically. Good clients automatically sync the IMAP mailbox so it can be used when the client is offline, like POP3. You can have your anti-spam software learn from their mail client's Junk folder. It's just much saner for business environments, in much the same way that network home directories and thin clients are much saner than a bunch of desktops with local storage are.
IMAP also permits you to give the user a single view of their mailboxes from their desktop and when they're on the road, or accessing their mail from home. Don't even talk about "leave mail on server" for POP3 - users WILL misconfigure it and suck all their mail down onto one of their machines, then come to you looking for help cleaning up the resulting awful mess.
Now, for an ISP, things are the opposite. You want to get the users' mail through your system and get rid of it. Most ISPs only offer POP3 and have small mailbox caps, so the user can't set their client to never delete mail off the server. They don't want to be responsible for user mail, they want it off their hands ASAP. An ISP can just tell a user who deleted a message then wants it back "well, that was silly then wasn't it?". An ISP doesn't want to back up 5 years worth of mail for 500,000 users.
My point is that for corporate environments IMAP is so superior that it's almost nuts to offer anything else, but for an ISP POP3 is a much more viable option. So what's so bad about POP3 depends entirely on what your needs are.
There are a few things you are going to want to consider in this. First, you really need to define what you want the mail system to be. If your requirements are simply POP/IMAP, then you can go with a variety of vendors or, if you are big on building it yourself, some open source offerings. However, with a million accounts, you aren't going to want to revisit all the Outlook clients to point them at POP/IMAP. For a direct Exchange replacement you might want to talk with CommuniGate (formerly Stalker).
You will also want to look at the mail gateways for inbound and outbound traffic. While open source is a great thing, managing the infrastructure for a system this large on open source can be a bit of a pain. So I would look at commercial vendors like IronPort, as they provide all the filtering and traffic management through a single interface. The vendors will provide all the design assistance you will need, so there really isn't a need to bring in a consultant.
So my design would be to have an MTA layer facing the net (potentially dedicated inbound and outbound MTAs), the mailstore layer in the protected net, and an LDAP master server with a couple of replicas for the mail systems to hit. I know IronPort and CommuniGate would do this well. While I am a big fan of open source, for a system this large you really want support and companies backing up the whole thing... Let the flames begin.
Use Notes/Domino on the backend and set up Outlook as the client. That way, people get to keep the same look and feel, but it's being handled by a much more scalable solution on the backend.
'He who has to break a thing to find out what it is, has left the path of wisdom.' -- Gandalf to Saruman
The company I work for has a tool for exactly this kind of situation called DBMail (available, open source, at http://www.wodan.net./). It does exactly this: it stores mail in a database, making your mail system as scalable as your database of choice. We currently have a few setups of half a million users, and there is no reason why a bigger setup wouldn't be possible. If you need help or want to know more, mail me :)
(5) Get yourself another job before all of the problems can be found. When they're found, come back as an even higher-paid consultant to fix them.
The ______ Agenda
Resign. Go work for Google. After you get some experience with how real large-scale projects work, go back to the same company to fix whatever mess the guy after you has made.
One million accounts, huh?
So one out of every 5000 humans in the universe will have an account on your mail infrastructure?
Even counting for dupes, I don't believe you'll ever serve a million email accounts. This isn't a technical thing. It's one of those 'gee there aren't enough people in the world for me to believe you' things.
But I might be wrong. People seem to estimate their customers and users in such a way that suggests a good chunk of everyone on the planet is using their stuff all the time. I just don't believe them until I've seen real evidence. Nothing personal.
Two words:
Qmail and FreeBSD.
You will not find a more stable, scalable solution anywhere.
What this company should do is fire this tool and hire someone that doesn't need to ask Slashdot how to handle their huge and apparently very important mail system.
I'm browsing at +3 so I'm not sure how many people have already suggested it, but you really should give Stalker a call and talk about CommuniGate.
Dan
my recommendations:
- Allow about 20-30 man days for the initial design. You'll need some software development for about 30-50 man days, and 100 man days for setup, testing and fine tuning. Figures may vary with skill and LWF. Time for integration into your backup service is not included.
- Use a directory service with a replication mechanism (preferably LDAP; we've done it with MySQL too). Every system except the load balancers will get a replica.
- The user data is stored on machines with Cyrus. Depending on machine size, user profile, mbox size etc., you put between 5,000 and 50,000 users on each system.
- The directory service knows which user is on which system. Prepare a script to move users from one server to another (including the mbox).
- Incoming IMAP connections go through a load balancer to frontend systems running the perdition proxy. Those will relay the requests, according to the directory, to the responsible IMAP server.
- Incoming HTTP requests will go through the load balancer to an Apache with SquirrelMail on the frontend systems. Those will convert the requests into IMAP requests and connect to the local perdition.
- Build a web frontend for the user to set up auto reply, vacation and anti-spam settings.
- From those settings you can create SIEVE scripts for the user.
- Incoming and outgoing SMTP traffic is handled by systems with sendmail. Local delivery is handled by LMTP connections directly to the IMAP servers (Cyrus can handle LMTP).
- Antivirus and antispam are handled through the milter interface and appropriate plugins. Plan for individual settings per user (these can be generated from the data in the directory server).
- Load balancing SMTP is trivial.
- Add monitoring (e.g. Nagios), backup and restore (the last one is most important: nobody wants backup, all everyone wants is restore).
- If desired, use a cluster file system for those IMAP servers to have even more redundancy.
- Make sure you have access to the internal DNS of your company. If you can set up "mail.acmecompany.com" to point to several IPs (depending on location) this may ease your job a lot. If you cannot, this may be hard (and expensive) for your load balancers.
- You can scale everything horizontally in this concept. The choke point may be the load balancers.
- You can easily distribute the system across several locations. Distribution over several continents is only recommended if you can manage either the DNS or the mail agent settings per continent.
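As an illustration of the "create SIEVE scripts from the user's settings" step in the list above, here is a minimal sketch. The settings keys (`spam_to_junk`, `vacation`), the "Junk" folder name and the X-Spam-Flag header test are assumptions made for the example; they depend on what your web frontend stores and what your spam filter actually adds to messages.

```python
def sieve_quote(text: str) -> str:
    """Quote a string for Sieve: escape backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

def build_sieve(settings: dict) -> str:
    """Turn a (hypothetical) per-user settings dict into a Sieve script."""
    requires, rules = [], []
    if settings.get("spam_to_junk"):
        requires.append("fileinto")
        # Assumes the spam filter tags messages with "X-Spam-Flag: YES".
        rules.append('if header :contains "X-Spam-Flag" "YES" {\n'
                     '    fileinto "Junk";\n'
                     '    stop;\n'
                     '}')
    vacation = settings.get("vacation")
    if vacation:
        requires.append("vacation")
        rules.append(f'vacation :days {int(vacation["days"])} '
                     f':subject {sieve_quote(vacation["subject"])} '
                     f'{sieve_quote(vacation["text"])};')
    header = ("require [" + ", ".join(f'"{r}"' for r in requires) + "];\n") if requires else ""
    return header + "\n".join(rules) + "\n"
```

The generated text would then be installed on the user's IMAP server by whatever upload mechanism your setup uses.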
Please forgive me if I'm not completely correct. I'm only the sales rep.
With backup support you should be able to set up such a system in 6 to 12 months (the latter more realistic for big companies).
Most probably users will complain about the lacking calendar.
Most troublesome will be the migration phase (hope you realized I didn't mention it above). This depends so much on your current scenario that it is very difficult to give general advice.
> where would you start?
Contacting me ;-). Perhaps get a budget first. As I said, I'm sales....
Regards, Martin
We have a fairly large-scale email system. There are 180,000 email accounts on the system. All can be used, but only 8,000 are really, truly active users. We use maildirs to store the emails. Adding vpopmail, spamassassin, and virus scanning to the pipeline does not slow anything down. Performance is great. It is not even a good machine. Plus I have had 100% uptime so far on it. If they go this route, vpopmail takes full advantage of relational databases for authentication. It speeds that up dramatically. It is a pretty neat system though.
I would prefer partitioning the users by having a compact lookup map on the MTAs handling incoming mail, rather than distributing them across mail servers by hashing or other means. This way you have some way to directly control the distribution. For instance so you can move specific users to specific machines, and if you want to scale up your system, you can have more control over which users you migrate where etc.
(If you have 1M users, aim for 20M email addresses. A map mapping each user to an autonomous mail system can be put into a very compact data structure which can be kept in RAM. This data structure need not be mutable -- you can rebuild and reload it periodically to include updates)
Ideally you would have some proxying for POP and IMAP so that users do not need to be aware of what mail system they need to connect to. They just go to pop.mail.example.com or imap.mail.example.com and you can use DNS to route them to a local proxy. The local proxy knows which mail system the user belongs to and connects the user there.
This scheme would require some modest amount of development work, but I think it would be justified in this case. None of it is too complex. For instance there are several ways to solve the user-mapping in the MTA, from the naive Perl kludged LMTP-backend doing central-database-lookup (takes about 30 minutes to implement) to an atomically updatable compressed trie-based map in the MTA, so you can reject mail to invalid recipients before delivery takes place. It would surprise me if there are no proxy servers for IMAP or POP available which can do the same mapping, but I guess you can have someone write a fairly scalable one in just a few weeks.
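A rough sketch of the lookup-map idea: an in-memory, read-only map from address to backend that is rebuilt off to the side and swapped in whole, and that lets the MTA reject unknown recipients up front. A plain dict stands in for the compressed trie here, and the class, function and backend names are made up for the example.

```python
from types import MappingProxyType

class UserMap:
    """Read-only address -> backend map that is rebuilt and swapped in as a whole."""

    def __init__(self, entries: dict):
        self._map = self._freeze(entries)

    @staticmethod
    def _freeze(entries: dict):
        return MappingProxyType({addr.lower(): backend for addr, backend in entries.items()})

    def reload(self, entries: dict) -> None:
        # Build the new map completely, then replace the reference in a single
        # assignment, so a reader sees either the old map or the new one.
        self._map = self._freeze(entries)

    def backend_for(self, address: str):
        return self._map.get(address.lower())

def rcpt_to(user_map: UserMap, address: str) -> str:
    """Answer an SMTP RCPT TO: accept known users, reject everyone else."""
    backend = user_map.backend_for(address)
    if backend is None:
        return "550 No such user"
    return f"250 OK (route to {backend})"
```

Moving a user to another machine is then just a changed entry in the next rebuild (plus copying the mailbox, of course).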
If you go for a monolithic solution for 1M users you will experience a lot of downtime and frustration and to keep a beast like that up is going to cost a lot more than having a bunch of cheap stand-alone systems.
I'm still surprised that this one is still under most people's radar... It uses a DB as its backend (MySQL, PostgreSQL). You've got to use something like Postfix, Exim, etc. for SMTP. This thing seriously ROCKS. I'm still waiting for it to hang. It scales very well and is definitely worth a check.
I would use a few servers as incoming SMTP servers in a round-robin DNS or with load balancers. I would also set up a few backup IN MX for the domains.
I would store mail on backend servers, each of which would have an SMTP, POP and IMAP daemon running.
For the mail checking I would put POP and IMAP proxies, which can also be in round robin DNS or with load balancers.
You would then need a system to redirect from the SMTP servers and POP/IMAP proxies to the right backend. This can be done with a database or an LDAP server. Or email addresses could be rewritten to @backend1.domain.com, etc...
It looks like that buffer overflow might be there, but it depends on stuffing lots of data into the RELAYCLIENT environment variable. Because qmail-qmtpd does not have the setuid bit, RELAYCLIENT must be set by root or the daemon user prior to dropping root. Hence this bug is totally unexploitable.
http://lists.grok.org.uk/pipermail/full-disclosure/2004-March/018191.html
But I agree with the sentiment - oh so close!
I'd agree with the comment of looking at outsourcing.
There are excellent email providers which could provide customized email solution that can easily scale to your need.
Should that not be an option, one solution that has scaled to that size for me in the past is the iPlanet Messaging Server from Sun. It is also used by a lot of ISPs for their customers. Very versatile, customizable and solid.
Good luck
Of course there are certain drawbacks, like when you use IMAP, the database loses sync, so you shouldn't mix it on individual accounts. But I don't use IMAP now so it's perfect for me. This could be fixed by modifying the IMAP server, which shouldn't be that difficult, I just don't have the need.
My point is, don't criticize if you don't know what you're talking about ;-)
Yours sincerely,
Peter
You're going to get the normal 99% BS from /. "experts" that have NEVER set up a mail server in an environment with REAL traffic. I don't know why so many people must render opinions about things they don't know anything about. Skimming a HOW-TO doesn't make you an expert.
I would suggest you find a professional with proven experience in this arena. Let's also hope your company has given you enough of a budget to do this properly.
Good Luck
You cannot replace Exchange with Qmail or anything like that. Exchange is a workgroup collaboration system, which is much much more functionality than plain e-mail.
If you want to implement for a big COMPANY, you must go IBM-Lotus Notes/Domino, because that is the only honestly full featured collaboration suite alternative (Novell's one lacks big time).
A single big IBM iSeries AS/400 or zSeries mainframe iron can serve 15k CONCURRENT Lotus users, either over the native OS, or by running Linux in virtual partitions. And those machines are rock solid, both by hardware and software. Just don't forget to reboot every decade.
You could use Sun iron also, but Sun is about to turtle in one-two years, so don't count on support. You also don't want to run Domino on Windows Server, do you? Reboot twice a month just because of security hotfixes? You are smarter than that!
The idea to implement a million mailbox system on PC iron with free software is only viable for webmail systems, that give mailboxes for free, because you do not have to give uptime warranty there.
Companies buy big black IBM iron, because their IT HQ staff do not want to work extra night shifts to fix all kinds of weird problems, they want to go home at 5pm and spend all their big income on their kids, wife, dog and car.
Geeks should not be allowed to go anywhere near a corporate data center, they should be shot on sight!
What the guy should do is buy an e-mail system that can handle 1,000,000 users and not screw around trying to chewing-gum together his own solution.
He who never tries never learns anything either... and he who learns how to achieve something like this is subsequently worth a lot of money.
Why not open source the project?? get some geekers to code some mutant linux based mammother system to send ooogles of spam(cough) mail.
> > > FWIW, I've experimented with 750k mailboxes on a single system with 8GB RAM and we
> > > plan to put that number in production in a couple of months here.
> >
> > Ouch, 750k? How many concurrent accesses?
> >
>
> We currently have 1.6M, 1.2M and 940k mailboxes in 3 boxes with fiber to a single emc storage, all boxes
> dual Xeon 3.4Ghz EM64T with 4G.
We tend to have quite large mailbox lists, but not as large as this. The biggest issues we've found with large mailbox lists are:
1. Number of concurrent connections.
If you support/encourage IMAP usage, then you tend to end up with quite a few more connections than POP.
Although technically IMAP connections can be very long lived, we find there are lots of short connections (mostly due to things like Outlook Express, which when doing a "sync" pass does a logout and login for each *folder* in a user's account!) and some long ones. With about 650,000 folders on one machine (about 130,000 users), we see about 3500 imapd processes at peak times. We use linux 2.6, and find that this is a good number of maximum processes to have. Although the kernel is just about O(1) for everything these days, we find that there does seem to be a bit of an elbow point around the 5000 process mark where things just seem to start showing higher latency and average loads on the server.
2. Size of mailboxes.db file
With a large mailbox file, you probably want to use the skiplist format. Part of the implementation of the skiplist db however is that the entire file is mmap'ed into memory. While this is generally fine since each process shares the same mmap file backing, with really large mailboxes.db files you can end up with just huge page tables.
For instance, the above 650,000 folders mailboxes.db is about 100M in size. With pages being 4k each, that means each process needs 25,600 pages just to mmap that file into its process space. If you have > 4GB of RAM, you have to use x86_64 or PAE mode in linux. Both of these mean that each page requires a 64bit page table entry (8 bytes). If you have 3500 processes then...
3500 * 25600 * 8 = 716800000 = 683M
Yes, that's 700M of memory just to hold the memory map of all your processes, no actual real data at all!!
This also means that you MUST use the high-PTE option in linux, or else you'll have lots of low memory pressure.
3. IO
CPU isn't an issue. IO definitely is. Cyrus uses minimal CPU on today's hardware, but it still is an IO hog.
That's part of the reason we sponsored the meta-data split patches that have gone into 2.3 so that you can separate out the email store part and the cyrus.* files onto separate partitions/spindles to improve overall performance. Where possible, split out:
user.seen state files
quota files
cyrus.* files
email spool files
Onto separate spindles/partitions. At least that way you'll be able to use something like "iostat -p ALL 120" to see which parts of your system are generating the largest IO.
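The page-table arithmetic from point 2 above can be checked with a few lines. This is only a back-of-the-envelope sketch: the function name is made up, and the defaults (4k pages, 8-byte page table entries) are simply the figures quoted in the post.

```python
def page_table_overhead_bytes(mapped_bytes: int, processes: int,
                              page_size: int = 4096, pte_bytes: int = 8) -> int:
    """Memory spent on page table entries when every process mmaps the same file."""
    pages_per_process = -(-mapped_bytes // page_size)  # ceiling division
    return pages_per_process * pte_bytes * processes


if __name__ == "__main__":
    # ~100M mailboxes.db mapped by 3500 imapd processes, as in the post
    overhead = page_table_overhead_bytes(100 * 1024 * 1024, 3500)
    print(f"{overhead} bytes = {overhead / 2**20:.1f} MiB of page tables")
```

Halving either the file size or the process count halves the overhead, which is why both the mailboxes.db size and the process ceiling matter here.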
I think the best Open Source solution is ISPMAN.
It uses LDAP as its backend database and is an all-in-one ISP solution with mail+web+dns+etc. You don't have to use everything if you don't want to.
The best thing is that "you may start with a single server to manage user's mailboxes and add more as you grow. ISPMan can manage this and allow you to create user's accounts and mailboxes on different servers. This does not affect the user at all but allows the system administrator to balance the load of mails on different machines."
/running right after you
I have an old 386 in my shed, 66Mhz DX with 24Mb of RAM and a 2Gb HDD that has been running as my mail server for 6 years since its last reboot. Something like this should do the trick for you so long as all your users don't want to check their mail all at the same time. ps, try beating that kind of uptime $1m data center. pps, i'm worried i may have to take it down soon to replace the power supply as the fan is getting slower :(
...get everyone a Gmail account!
Oh s**t!
Take a look at the Mirapoint mail appliances at http://www.mirapoint.com/. 99.999% reliability and scalable up the wazoo.
What you have here is an opportunity for a tremendous open source win against exchange, and you are about to stuff it up because you do not have a clue how to do it.
So, what you do right now is you go find someone who does know how to do it. And by that I mean someone who can demonstrate they know how. Which does not equate to having a low slashdot id; it equates to having done real projects of this scale.
So, how do you start? You ring IBM and get them to come in and talk to you. You ring Red Hat. You ring Accenture.
If you want impartial advice from someone who isn't a vendor (which is a good idea), then you go find a company that has a million-seat open source e-mail deployment in place and you see if you can get their messaging admin to talk to you.
~~~~~ BigLig2? You mean there's another one of me?
Have a look at GMX. they are one of the biggest E-mail providers in Europe, (mostly Germany and Austria)
http://www.gmx.de/
As far as i know, they have a 100% non MS Solution.
Mostly Linux clusters.
They seem to scale pretty well.
Hope this helps.
Greetings
Comment removed based on user account deletion
My university also uses the Sun Messaging server. But we're only about 15000 students, so it's not a huge deal. But it works really well, at least compared to the old system with NFS-mounted mailboxes; there were constant problems with that, and it was overloaded and slow too.
At firms I worked with (telephony companies, usually), scheduled downtime is not included in downtime numbers. Of course, it depends on the SLA, but this is how it worked in the "5 nines" days of Ma Bell. 5 nines (99.999% uptime) was basically a myth. :)
I just did a qmail install yesterday and even when it's a good program it has a long steep learning curve. Every time I install qmail I need to google around, read a lot of documents and understand new things to decide what I need. Netqmail is a good but insufficient starting point while qmailrocks.org's version is completely overblown (at least for me). While figuring out the patches and other tools I'd need I couldn't help plotting yet another qmail package of my own.
Improving netqmail and making it what qmail should be would be great.
As an aside, my personal feeling is that if DJB sticks to his licensing (and continues to ignore all the patching) we need to eventually rewrite qmail. It's getting worse by the day and the patches are already starting to conflict with each other.
Of course not. If implemented properly, it should be transparent. That means you dissolve the link when somebody performs a mutation on their instance. If 1000 people do this, it means you will have to store the 1000 variations. But you could also detect similarities with, e.g., hashing, and only store the unique objects and link to them for each instance.
;-)
Ironically, Microsoft is developing WinFS which is supposed to be able to automatically hardlink files transparently, thus the filesystem will automatically support Instance Store for every application. This is actually a pretty neat feature!
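A toy sketch of the instance-store idea described above: identical bodies are stored once under their hash, every mailbox entry is just a link, and modifying one instance re-links only that instance. The class and method names are hypothetical, and everything lives in memory purely to show the bookkeeping; no real mail server's on-disk format is implied.

```python
import hashlib
from typing import Dict, Tuple


class SingleInstanceStore:
    """Content-addressed message store: one copy per distinct body."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}             # digest -> message body
        self._links: Dict[Tuple[str, int], str] = {}   # (mailbox, msg id) -> digest

    def deliver(self, mailbox: str, msg_id: int, body: bytes) -> None:
        digest = hashlib.sha256(body).hexdigest()
        self._blobs.setdefault(digest, body)           # store only if unseen
        self._links[(mailbox, msg_id)] = digest

    def read(self, mailbox: str, msg_id: int) -> bytes:
        return self._blobs[self._links[(mailbox, msg_id)]]

    def modify(self, mailbox: str, msg_id: int, new_body: bytes) -> None:
        # "Dissolve the link": only this instance points at the new content.
        # Identical modifications by different users still share one copy.
        self.deliver(mailbox, msg_id, new_body)

    def unique_objects(self) -> int:
        return len(set(self._links.values()))


if __name__ == "__main__":
    store = SingleInstanceStore()
    for user in ("alice", "bob", "carol"):
        store.deliver(user, 1, b"All-hands meeting at 10am")
    store.modify("bob", 1, b"All-hands meeting at 10am [flagged]")
    print(store.unique_objects())
```

The sketch never garbage-collects bodies that no longer have links; a real store would need reference counting or a sweep for that.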
Yes, people will seem stupid when you assume they are. It is most usually about your assumptions, not them..
Instead of jumping on the problem, just think the obvious solution, and then patent that
http://www.debunkingskeptics.com/
It's unfortunate you got so many junk answers to your query (e.g. "resign", gmail, .mac, etc). I had a server running ~15,000 accounts on a Pentium 133 with IMail 7 a while back. It wasn't pretty, but mail got sent and received as it should.
Hula claims to scale pretty well, integrate with ClamAV and SpamAssassin, and have lots of other cool gimmicks for calendars and such. For 1 million accounts, I'd get some sort of dedicated spam/virus filter, though.
High level concepts can be outlined in a few paragraphs.
Even if the poster eventually calls the consultants, he can get enough ideas here to at least be properly informed of the overall direction a solution could take.
Why are there so many people around with a "don't bother" attitude?
IANAL but write like a drunk one.
We do it with a bunch of Postfix servers and MySQL. The MySQL is going to be clustered soon but currently runs separately on each server. Each server has MySQL and Postfix and generates statistics. Currently the most heavily loaded machine (10,000 mail accounts) uses about 1-5% of CPU (single Xeon with 3x72G SCSI RAID5). We estimated you can push about 100,000 accounts per server given enough disk space (we are planning to put it on an Apple SAN solution) and a separate MySQL database. About 10 mails/sec pass through the server (in and out).
By comparison, an environment with 1,000-2,000 Exchange email accounts takes up 2 dual-processor servers for the frontend and 2 single-processor servers for the backend (storage), and needs to be migrated to a $70,000 storage solution because the current one does not give enough throughput. The problem is that each time a secretary opens a calendar (e.g. to schedule an appointment with the management), all those mailboxes, schedules, calendars and notes are opened, searched through and synced (about 2000MB of data transfer in a few seconds), while the IMAP protocol doesn't do that and provides the same functionality.
Custom electronics and digital signage for your business: www.evcircuits.com
Sun has the Sun Outlook Connector that allows MS Outlook to behave normally while there are Sun Messaging, Calendar, Addressbook and Directory servers instead of MS Exchange. In addition Sun has a SAFE methodology and toolkit to migrate out of MS Exchange.
Somebody that obviously has never been trusted with a challenge on his job.
Sad.
Concerning the question from Cliff from yesterday about an alternative to MS Exchange, very clearly Sun has 3 reasons for you to move:
- Backend software servers to support one million and many more users on the messaging, directory and groupware. Sun software servers are OEMed into many other vendors in the telco space for normal email, unified messaging, etc., and at some huge corporate levels. I won't list how many telcos serve B2B customers and do messaging hosting. Sun's software is the clear leader here.
- The Sun Outlook Connector, which allows a user to keep the MS Outlook client but see all Sun servers for mail, calendar, tasks, global addressbook, private addressbook, notes, journals, etc. as if this were an MS Exchange server. So there is no need to retrain your users, and I can tell you it was exposed to the fire of real users!
- The Sun Groupware Migration Toolkit (SGMT), which allows you to SAFELY migrate all user data from MS Exchange (any version) with NO NEED to force users to change their passwords and no disk expansion on the mail side, with coexistence for user level, password, mail and even public folders. Again from real projects.
Let me know for an offline discussion if you need more details.
Heavy /. users at my company are in charge of systems providing email services in that range.
Actually I think I have spotted their reply already.
Many people underestimate the /. readership for I don't know what reason; several of the replies are pure gold for somebody in the situation of the poster.
I agree that there is not much point in using a database if what you need is a "traditional" email system. But a relational database allows you to go beyond what a traditional email system is capable of: I wrote Decimail (http://decimail.org/) to investigate these possibilities. Basically, Decimail is a PostgreSQL database with IMAP and SMTP daemons.
In Decimail, an IMAP mailbox is defined by an SQL query. So you can group messages by date, subject, sender, content and so on. The main contrast with other systems is that this categorisation is done retrospectively: there is no need to file messages when they arrive, but instead you find them when you want to read them. And each message can appear in more than one mailbox if it matches multiple queries.
Decimail is not particularly efficient: it's emphatically not what this questioner wants for his millions of users! But if you sometimes find the organisational features of your existing email system a bit limiting, it might be of interest.
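To illustrate the "mailbox is a query" idea, here is a small sketch using Python's built-in sqlite3 module. It is not Decimail's actual schema or code, and it uses SQLite rather than PostgreSQL; the table layout, mailbox names and sample messages are all invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, received TEXT)"
)
conn.executemany(
    "INSERT INTO messages (sender, subject, received) VALUES (?, ?, ?)",
    [
        ("boss@example.com", "Budget 2005", "2005-08-01"),
        ("list@example.org", "Budget discussion", "2005-08-02"),
        ("boss@example.com", "Lunch?", "2005-08-03"),
    ],
)

# Each "mailbox" is nothing more than a WHERE clause. These are trusted,
# admin-defined constants here, not user input.
MAILBOXES = {
    "From boss": "sender = 'boss@example.com'",
    "Budget": "subject LIKE '%Budget%'",
}


def list_mailbox(name: str) -> list:
    """Return the ids of all messages that currently match the mailbox query."""
    where = MAILBOXES[name]
    rows = conn.execute(f"SELECT id FROM messages WHERE {where} ORDER BY id")
    return [row[0] for row in rows]


if __name__ == "__main__":
    for name in MAILBOXES:
        print(name, list_mailbox(name))
```

The first message matches both queries, so it shows up in both mailboxes without being copied or filed anywhere.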
Yeah its not the quickest thing (spop) and you virtually can't send any attachments :(
... if you are just dealing with other ako users.
The portal is nice though
-- if you mod me down, I will become more powerful than you can possibly imagine
If you want to guarantee anything beyond 99.0% availability, you must have complete redundancy at least 200 miles apart. This distance makes all the MPLS links unusable, significantly increasing system complexity.
You never mentioned what your RTO and RPO were. If you can lose 24 hours worth of data, there are fairly standard methods. 12 hours is doable. Less than that and you need to spend a ton more money. SRDF/RA is interesting when you get down to the 5 minute area and don't want to write across the WAN for both locations.
Probably the easiest solution is to get 4 mainframes, 2 per site, create linux partitions on them and use some commercially supported MTA. Use all the mainframe replication facilities to do the remote replication daily.
Or you could use email like it was meant with federation and each dept or location having their own local server.
Don't forget about spam filtering, SOX compliance, and automatic encryption of external communications. IronMail merged with a PGP product can do this. The free PGP implementations make the data the individual's, not the company's. I'll just say that commercial PGP has "other solutions available" so the company still can get access to encrypted information.
When it works at all it's slow. Sometimes you can hit the Send button and just sit there and wait a while.
When we had to work on a Navy project we had to start bringing our own equipment and hubs. Even their developer machines come loaded with 10 year old software and you can't get your email and be logged in as a developer at the same time. To check mail you have to log out, log back in under a different account, then log back in as a developer. The NMCI machines are boat anchors.
NMCI is the worst defeat the US Navy has ever suffered.
That's our life, the big wheel of shit. - The Fat Man, Blue Tango Salvage
Poster: we have Exchange, it does not work, we are moving to something else.
drsmithy: you should use Exchange. It is the rock0rz!
Me: doh!
Your idea is good, but is it implemented anywhere?
Over my objections, a colleague tried to have the mail directories on one machine and the pop servers on four others. At light loads he got acceptable performance, and so put it in production. With several thousand accounts, 30-minute (not seconds, minutes!) delays between messages were common.
NFS (v2 and 3) is pessimal for constantly-updated files with ad-hoc locking mechanisms.
As suggested in the parent, distribute the mail files for a given user to a machine which provides the pop and imap services from local disk.
--dave
davecb@spamcop.net
Don't set yourself up for huge sustainment costs by building and maintaining your own e-mail system. Contract with someone who knows how to do it correctly and cheaply. Contact Google, Yahoo or MSN to have your million users added to existing infrastructure. I am sure they can give you your own domain and would be considerably cheaper. If all else fails, take a look at Oracle Mail. You will be pleasantly surprised.
A million users and they want POP3? Add a gun and a single bullet to your administration requirements.
An ISP, etc. could quite reasonably require a million POP3 accounts. Given that the submitter mentions Some corporate, some personal, some free I am guessing that employee users are in the minority here.
A few weeks ago, I read an interesting article about Schlund&Partner, one of the largest internet-related companies in Germany these days, developing an eMail solution of their own, because they weren't able to find anything suiting their demands. They're the guys behind GMX, arguably the most popular eMail service in Central Europe. Their eMailing system, Nemesis, is designed to provide scalability and redundancy all the way. Maybe you can get them to relicense it to your company (I don't know which license the project actually is under, sorry), or at least let you evaluate their solution so you can decide what you can actually expect from today's eMail gadgets for the enterprise.
:)
The article was published in the German-language Linux-Magazin, issue "August 2005".
Good luck!
:%s/Open Source/Free Software/g
YTARY!
Merging Lotus Domino with the power and stability of an IBM iSeries (aka AS/400) would give you the stability and robustness you require. Unfortunately, the cost isn't cheap.
The iSeries also runs Linux if you are looking for a stable, high-performance (but not cheap) 'server'. You may already have an iSeries if you have 1 million people working for you.
Lotus Domino runs on Linux so you could run it on a Lintel box and get the stability of Linux with the Domino feature set. Alternatively, an IBM mainframe (or any other that would run Linux) could be a Domino server, if you wanted to re-use existing equipment you may have.
As for the client, Lotus Notes runs on Windows and Linux (in WINE). Their web client, iNotes, while not perfect, performs nicely and has some security features built-in that you'd need for roaming users.
(no, I don't work for or sell IBM products)
The thing is, 1 million accounts is not simultaneous. There will be a substantial number who won't be on the server at any given time. The number of accounts isn't as much a concern as the size of the inboxes involved. If you run quotas and keep them down, you can have a decent machine handle these accounts even with one server. That is, until you add webmail. With that number, webmail would have to be a serious package. I know for my implementation, the speed of the webmail package dropped with large inboxes, so again quotas help. I'm with the suggestion above about a Beowulf cluster. Clusters are nice for redundancy, speed, and the ability to stay up indefinitely.
So you're mad because you can easily customize the mailbox interface in Notes? You've just shown that you can sort mail by subject, so where does the problem come in?
If you're angry with the company you work for for not making the UI enhancements you want you should be complaining to them about it, not the /. crew
I'd go with a unix based operating system, build a HA active/active cluster that consists of at least 3 nodes each. If you are an Intel shop I'd go with 6 Dell 6650 poweredge servers. Some form of SAN, or NAS storage. 1M users must mean a budget so I would go with an EMC CX600. I'd use Linux HA. Configure the first three servers as your mail hub, configure your accounts, imap, ldap, pop, webclient, etc. on your next three servers. Sendmail gets bashed but I'd tell you what... It is stable and when configured properly it is secure. LDAP can be populated via your Active Directory .. or why not rewrite and centralize your authentication on a modified Open LDAP server!
If you don't have the time to roll your own system, check out Scalix. Roger Williams University (my son attends there) uses Scalix. www.scalix.com
Software? Cyrus IMAP + Postfix. Make the spam filtering function somebody else's problem: either a service (Mailwise) or an appliance.
Hardware? Something high density... perhaps Opteron-based IBM blades? Clustered on some linux distro that IBM would provide a support contract for. I've also been looking at 64bit HPC stuff from Terrasoft solutions on Mercury HW, very intriguing.
At a previous company I was with, we ran CommunigatePro, with roughly 500,000 users on it. Our cluster had only 4 machines to support this, and we had plenty of room to grow. They offer great support, and have several installation services. Everything is customizable in it, and has all the features you requested.
http://www.accelerateglobalwarming.com
GMAIL!!
Stalker fucked their customers real hard a year ago.
My guess is that they have a new leadership with clueless MBAs and maybe planning for an IPO.
It is a real pity since Stalker was the best player, but my advice is to stay well away from them.
http://www.theregister.co.uk/2005/02/04/stalkers/
1) It's a bitch to install. Won't even compile on modern Linux distributions. You have to patch it to compile it and the patch isn't even hosted on qmail's site.
Yes, it's not for newbies like you. It's for people who know what they're doing. On which "modern distro" did it "not even compile", anyways?
2) It's a bitch to configure. Rather than parsing a single configuration file, qmail relies heavily on the presence of individual files in a directory.
Guess what, many admins prefer that approach over lengthy, nested config files.
3) Not not not not scalable! That's a myth. Doesn't properly batch jobs together. Hell! qmail was originally designed to be run from inetd!
qmail was designed to run from daemontools.
Where do you take all that bullshit from that you're spilling here?
Oh, and it scales quite well, quote from qmail.org:
USA.net's outgoing email, Address.com, Rediffmail.com, Colonize.com, Yahoo! mail, Network Solutions, Verio, MessageLabs (searching 100M emails/week for malware), listserv.acsu.buffalo.edu (a big listserv hub, using qmail since 1996), Ohio State (biggest US University), Yahoo! Groups, Listbot, USWest.net (Western US ISP), Telenordia, gmx.de (German ISP), NetZero (free ISP), Critical Path (email outsourcing service w/ 15M mailboxes), PayPal/Confinity, Hypermart.net, Casema, Pair Networks, Topica, MyNet.com.tr, FSmail.net, Mycom.com, and vuurwerk.nl.
4) Heavy reliance on other daemontools.
Yes, because
1. it's UNIX
2. all other tcpserver implementations were broken at the time (and afaik are still broken)
What are you gonna criticize next, qmails low memory footprint?
5) Breaks well-known and understood UNIX standards.
Which "well-known and understood UNIX standards" are you referring to? Do you have *any* clue what you're talking about? NO.
6) Security through lack-of-functionality.
Actually it's secure by design.
If you had the slightest clue about software architecture you'd have realized that on first glance.
7) Not really secure despite the claims.
So, you have found a vulnerability and collected the $500 USD from djb? Oh, you didn't?
Then shut the fuck up.
Your FUD is not appreciated, kiddy.
8) No longer maintained.
Says who? You? haha!
9) No features. Adding them requires patching, and patching, and more patching.
Yes, which you do *once* and then roll it into a nice package for future deploys.
Last time I had to change something in my qmail-tarball was over a year ago.
That tarball is all I need to pull up new installations with all the patches that I need.
Deploy takes exactly one line:
tar zxf qm.tgz && cd qmaili && make && make setup check
And, guess what, the Makefile even sets up the config for the box at hand which is easy because all it takes is stuff like echo `hostname -f` >/var/qmail/control/me etc.
With postfix I'd have to replace tokens in a config-file template (which will of course change syntax in newer versions) and other shit.
But from your statements I can tell that all this is way beyond your little head already. So next time you see the adults talking about stuff you don't grok you'd better shut up and listen instead of humiliating yourself like you just did.
And if you still didn't get it:
All your points are either outright *wrong* or display an embarrassing lack of clue. Go figure.
I didnt bother to read all of your comment. I stopped right around the point i noticed that your mathskills are worse than mine.. and that's saying a lot, haha. Basic math lesson below..
"99.9% uptime equates to about 526 minutes, or 87.6 hours you _could_ be down each year"
24(hrs/day) x 7 (days/week) x 52 (weeks/yr) = 8736 HOURS IN A YEAR.
8736 / 100 = 87.36. Therefore, 1% downtime (99.0% uptime) is 87hrs lost per year! 0.1% downtime (MUCH DIFFERENT THAN 1.0%!) is 1/10th of that number.. 8.7hrs/year.. roughly 43 minutes per month. You got tripped up because you multiplied 8700 by .01.. which [in this case] is the mathematical equivalent of "1 percent" .. NOT the remainder of "99.9 subtracted from 100" (which is what you THOUGHT you were multiplying it by).
I hate to sound so derisive.. but seriously, you start off saying you've built enterprise-class systems.. yet you don't know something as fundamental as how much uptime "3 nines" equates to? I can understand crappy math skills (i have them too), but I can't understand not knowing something so fundamental by heart. I play with cameras for a living, so maybe i've misjudged the amount of off-the-cuff knowledge an admin/architect of a 250k acct email system must have regarding uptime.. but WOW haha.
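For anyone who wants to redo the uptime arithmetic above without tripping over the decimal point, a short sketch follows. It assumes a 365-day year (8760 hours) rather than the 52-week year (8736 hours) used in the comment, so the results differ slightly; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; the comment above uses 52 weeks = 8736


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Minutes per year a service may be down at the given uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    downtime_fraction = (100.0 - uptime_percent) / 100.0
    return downtime_fraction * HOURS_PER_YEAR * 60


def downtime_minutes_per_month(uptime_percent: float) -> float:
    """Same budget averaged over twelve months."""
    return downtime_minutes_per_year(uptime_percent) / 12


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        per_year = downtime_minutes_per_year(target)
        print(f"{target}% uptime: {per_year / 60:.2f} h/year, "
              f"{downtime_minutes_per_month(target):.1f} min/month")
```

The mixed-up quote combined the two cases: about 526 minutes per year is the 99.9% figure, while 87.6 hours per year is the 99.0% figure.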
hey dude, fyi you messed up ;-)
Of course, none of the solutions offered here are in the least bit compliant with the myriad of regulations that are going to need to be addressed in an enterprise this large. Sarbanes-Oxley is just one of your problems.
Sorry, but if you need to ask this question on /., you need to outsource this project to a company that specializes in projects this size and will guarantee compliance.
In my opinion you're going to need a cluster of servers or at least round robin'd mx records for the servers. I personally think sendmail scales the best of the mta packages and offers the best set of features and ease of maintenance, although a lot of people would argue it's intrinsically insecure... I've never had problems, but I kept our mail servers up to date. I would separate the smtp machines the outside world uses to deliver mail to your space from the servers used by users of your service to deliver mail. I would also move delivery services (imap, pop, webmail) to their own machines instead of having them on the smtp machine, and you would probably be best to use a nas for the actual storage medium. This is actually a really interesting project. Good luck and let us know how it turns out :)
Shadus
Some things to consider: MS Exchange is a lot more than just mail. If calendaring and other forms of group-working are involved then the task at hand is substantially more complex than for a mail-only system. Also, these days with viruses and spam being endemic the platform needs to incorporate a framework that handles them, as well as policy-driven content management controls, at its core rather than have them as bolt-in's or bolt-on's. Are you bound by any regulatory requirements? Geography is a major influence, and if this is a business platform how does this affect your strategies for resilience, disaster recovery and backup of the platform? In a perverse way most of the decisions you have to make when building systems of this size are business decisions (what's the cost of retraining users to use new mail clients is a favourite of CTO's) and not specifically about the products/technologies involved.
So, exactly what type of hardware/software and surrounding infrastructure you need to assemble to create 'the whole' is a somewhat open-ended question without going into a decent level of detail on your requirements and the drivers behind them. However, once you go north of about 500k users the number of commercial vendors tails off dramatically. If you include group-working as a factor it reduces further. I'll not start suggesting names (I currently work for a vendor in this space and self-plugging's not in the spirit that /. operates on), but i'd recommend starting out by talking to some of the analyst groups that have staff researching this end of the messaging market (Radicati, Gartner, Butler Group) and then opening dialogue with vendors appropriately.
i believe Communigate Pro by Stalker software has a 1 million account single server system running for a couple of years now. They also have cluster configurations to eliminate that single point of failure. Very good product, very stable.
With this http://www.oracle.com/technology/products/cs/index.html you can implement this scalable and redundant; it offers all services mentioned plus more.
you can also use the oracle database as datastore for all kinds of docs, effectively replacing fileservers. Don't forget things like backup & recovery ...
I don't know how others think about this or have experiences with it but I think it worth some investigation. It has a price tag per user but with a full implementation it could be a nice price...
http://homepage.mac.com/ik_zelf/oracle
can you get in touch with me at afrest@hotmail.fr
Move all your users to eComStation clients, so you won't have to be concerned with AV software or any existing trojans.
Start porting your favorite, e-mail server to an Atari... what fun is this project unless you're going to run it on old arcane hardware which was never designed for anything remotely close?
If this were my company's move, I'd try to get John Terpstra of the Samba team to consult with us. See http://us1.samba.org/samba/team/ for contact information.
--dave
davecb@spamcop.net
I'm wondering, if they didn't know how to use Exchange properly, what makes them think that "Anything But Microsoft" is going to be any easier? Are they just going to try each one until they find one with default settings that most closely match what they want to do? Inquiring minds want to know.
HitScan
Well if the users get a 100 Kb mail box it might.
Gmail is open to everyone now right....just sign up for 1,000,000 gmail accounts and go on vacation! Let the engineers at google do it.
Of course nearly everyone who uses it hates it, because it seems unnecessarily complicated. But this is precisely the kind of situation Domino was designed to handle: scaling. If you can get by with Sendmail, you don't need or want Domino, but if you want to manage a million email accounts, this is one of the first places I'd look.
This is exactly what Notes was designed to do: scale. People have been building systems on this scale with notes for nearly twenty years. You can not only scale it by moving parts of your email system onto mainframe class iron, but you can distribute it and provide all kinds of flexibility and redundancy into your system to meet virtually any messaging requirement (e.g. choose an alternate MTA for high priority traffic when there are Internet disruptions). Naturally there's some complexity involved, but if you can get by with sendmail you probably shouldn't be using Notes.
What's more important is the management of accounts and identity, which is distributed, delegatable, and backed up by robust cryptographic certificate management. You can let a subsidiary manage its own accounts, they can subdelegate that to a division and the division can subdelegate that to the IT staff on site; at each level policies can be set, enforced, and changed for lower levels.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
I would seriously look at Cyrus (http://asg.web.cmu.edu/cyrus/), which is designed to be scalable for huge numbers of email accounts. And the email users don't have to have accounts on the Unix boxes. It stores the messages in the file system but sets up index databases so that accessing the mailboxes is fast. It can also handle single-instance storage of the messages sent to multiple mailboxes.
There are three answers to this. One is Qmail, which, as long as you are familiar with the program and are willing to invest the technical time in it, would give you the basics for mail. One drawback here is that you have no protection and, given the size of the network, would have to add an appliance such as Symantec's, which would only increase cost and IT time. The second would be to again use Qmail and run LDAP; you could then proxy your mail through a scrubber, lowering CPU usage and some maintenance time. The third solution, which I would highly suggest as your best bet, is to outsource this service to a subscription-based provider that also includes mail protection, i.e. anti-virus, anti-spam and anti-spyware, as well as the ability to control outgoing mail to protect sensitive company documents. There is only one company that I know of that provides this complete mail-with-protection service, and that is Sarron www.saron-corp.com.
Ugh. I just did this crap yesterday.
we don't do brick-level backups of mailboxes because it's too much overhead, so to restore a bunch of deleted contacts from a users mailbox I had to go back and restore the whole mailstore. conveniently, though, I was able to pop it on to the RSG on my new exchange 2003 box, which hurt much less than I thought it would.
I only wish the user had been on our new exchange 2003 box, which is backended onto a netapp filer- so far the snapshot backup/filer management for Exchange seems to work pretty well, and you can restore arbitrary mailboxes, emails, whatever, from any point in time that you have backups for.
Sometimes, though exchange just makes me want to start smoking again.
EOM
Don't dismiss this out of hand. I'm willing to bet that Google would licence the technology they use for private corporations, and you already know it can handle over a million users. Google is in the business of making money... (or that's what I've heard).
I would certainly give them a call and see what they are asking for a private version of Gmail.
I would use qmail, with courier and vpopmail running over top of it. For the web mail check out http://www.horde.org/. It's got some really great features.
For spam and virus check out a barracuda unit, simply amazing. http://www.barracudanetworks.com/ns/?L=en
-Pizentios
Exchange 2003 is the best option because of available support and scalability, BUT it must be correctly implemented. The company I work for has ~200,000 mailboxes that used to stretch across several Exchange 5.5 environments, TAO mail, Lotus Notes and GroupWise. We started a consolidation effort four years ago that is in its final stages (mostly because we've acquired other companies during that time that had to be migrated.) This was accomplished by building a brand new native mode Win2K infrastructure along with Exchange 2000 then migrating clone user accounts then mailboxes in phases. We went from a problematic patchwork of mail platforms with diverse support to one large Exchange 2000 (since upgraded to Exchange 2003 in preparation for Win 2K3) environment spread out all over the globe that is highly available, clustered and 99% up time. If you're planning this you definitely need Exchange 2003 installed from scratch and involve Microsoft. I've seen it work first hand.
I'm by no means a developer though I read up on email technology and providers all the time.
I'd consider contacting the good folks at:
http://www.fastmail.fm/
they provide one of the fastest and most standards compliant IMAP, pop3, and SMTP services I've ever used
They support lots of bandwidth and storage using, AFAIK, all open-source software, at a seemingly low cost per user
Also, the individual who maintains the following website might be of good assistance to you:
http://www.ii.com/internet/messaging/imap/isps/
and
http://www.ii.com/
I would look into HMail.
It comes with webmail or you can use a different one, IMAP, POP, SMTP, External accounts, Antivirus (ClamAV), blacklists, MySQL or MSSQL support, web based admin control panel that you can let users use to control their account, individual domains or the whole server, multihoming, action scripting and more. Oh and it's FREE. A good choice if you have to use MS as your server OS. Now of course it's never been tested for that many email accounts but I do know of people using it for multiple domains with 1000's of accounts on each domain.
I think Slashdot readers have enough spare Gmail invites to help this guy out, right?
Earlier this year me and my team rolled out the largest email system in Europe for $UK_ISP (not BT).
It caters for 4 million current users and can scale to an estimated 10 million.
We use Openwave MX software to do this - it was the only thing that would scale. I mean the *only* thing, nothing else could cope. We looked, trust me.
You need *lots* of hardware. This isn't a full list, but to give you an idea:
24 MTA machines
12 FEP (front end processing) machines
16 queue machines
48 mail storage machines
16 virus-scanning machines
2 dedicated DNS boxes
4 directory servers (to look up mailboxes)
16 webmail machines
Numerous other boxes including logservers, terminal servers and a jumpstart environment for quick rebuilds.
Typical box stats: SunFire V440/480, quad processor, 8gb / 16gb RAM where possible. All run Solaris 8/9.
These are hooked up by fibre to a couple of enormous EMC arrays, and a bunch of HP EVA storage also. Total capacity? currently ~48tb.
It's a massive project, and it's not perfect, or (ever) completely finished, but it works!
Good luck with your project, if I could give you one bit of advice it would be to take whatever spec you think you need and double it.
cheers
Super Awesome Broadband
You have a flawed assumption in that the file is read only. Exchange/Outlook will let you modify the attachment in place and keep it in your mailbox.
....and then, Exchange WILL have to write a new copy of the data, because you just modified it and the data is no longer the same as before - you can't use the same copy. If the 1000 users keep the same file it's fine; if they modify it you need 1000 copies of it.
Sharing something with people (which for some reason database people call "single instance store", I've learned today) can be done in both a filesystem and in a database. Databases are "one-size-fits-all" kinds of tools: not always the "best" solution, but one that you have a good chance of making work even if it's not the best solution. Linus said something similar when it was suggested that he develop GIT on top of MySQL... if you really know what you're going to do with the data, and you KNOW that a filesystem is enough, why use a database? It's like buying a 900HP car for your mother - STUPID. The "let's do it just because we can" attitude is a good step if what you want is to write overengineered, bloated software.
Because a filesystem IS a database. Except that instead of having a SQL-ish interface, you have a "read(), write(), readdir()" kind of interface. Which happens to be really fast (filesystems are implemented inside the kernel, they're reliable, they're much simpler, easy to manage, etc).
When you use a database like mysql, you're just using a database on top of, uh, another database (the filesystem). Which makes no sense. It WILL work, but that doesn't mean it's the "best possible solution".
Despite all this, BTW, hardlinks are NOT the solution for the "share a file between 1000 users" problem. They can be, but remember that you can't make hardlinks between different filesystems. I have no idea if you can use LVM to solve this, or if ACLs + symbolic links can be used to implement this in a delivery agent. And if you can't (I don't really know), someone really should think about adding something to filesystems to allow it like plan9 did, because it makes sense.
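For what it's worth, the hard-link idea is easy to try out. Below is a rough Python sketch, assuming every mailbox directory sits on the same filesystem (the limitation mentioned above). `deliver_shared` and the directory layout are made-up names for illustration, not any real delivery agent's API.

```python
import os
import tempfile

def deliver_shared(message: bytes, mailboxes: list, name: str) -> list:
    """Write the message once, then hard-link it into every other mailbox.

    Hypothetical helper: assumes all mailbox directories share one
    filesystem, because os.link() cannot cross filesystem boundaries.
    """
    first = os.path.join(mailboxes[0], name)
    with open(first, "wb") as fh:
        fh.write(message)
    paths = [first]
    for box in mailboxes[1:]:
        target = os.path.join(box, name)
        os.link(first, target)  # new directory entry, same inode, no data copy
        paths.append(target)
    return paths

# Demo in a throwaway directory with three pretend users.
root = tempfile.mkdtemp()
boxes = [os.path.join(root, user) for user in ("alice", "bob", "carol")]
for box in boxes:
    os.mkdir(box)

delivered = deliver_shared(b"Subject: hi\n\nbig attachment here\n", boxes, "msg1")
link_count = os.stat(delivered[0]).st_nlink
same_inode = len({os.stat(p).st_ino for p in delivered}) == 1
```

Note that all the links point at the same data, so anything that edits a message in place would change it for everybody. A writer would have to unlink its own entry and write a fresh file, which is exactly the "new copy on modify" cost discussed above.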
or he will kill you
I'd seriously consider Yahoo Webmail if it could be branded for the company.
Well, now that we've cleared up the benefits of IMAP, I'd like to add the real reason, which nobody seems to mention, of why people like POP3 over IMAP is PRIVACY. The idea is 'Pull it down, and off those company servers' before you get fired because some friend forgot to use your gmail account. When you host your own mail, IMAP rox. A Debian Cyrus/Squirrelmail solution has speedily pushed out my mail for years with much better virus and spam protection than most. Evolution (client, nearing the end of its REALLY buggy days) allows me to merge my corporate exchange and personal IMAP on my desktop, at home and at work (via OWA). Inappropriate emails are quickly drag 'n dropped out of Work->Inbox and into Personal->Funny. Rather than craft yourself some POP3 hack around IMAP, spend the time to setup your own personal IMAP system, and get yourself an ISP that lets you do this. (One that allows incoming port 25, authenticated outbound SMTP via ISP) Over the years I've beefed mine up using a VPN to a second/multiple mx locations for redundancy, as I have family members in the area - I use Rogers who leave your IP alone, when on UPS. Well, not a million users, but family and friends and an old Dell D233, on software RAID IDE. Debian packages with inherent security.. I sleep well on long vacations, and don't waste valuable company time trying to maintain my privacy. And YES, I might consider a Cyrus based solution, if you are looking for FREE and FAST. Postfix+Courier might work too, not sure how Courier scales, but if you like Maildir, GO FOR IT. iMac.
And yet somehow the fact remains you were still wrong. Hard to say Ooops and move on?
I wrote an article on a system I use in production for about 20,000 accounts that should scale up to what you're looking for. Obviously you'd want to add in more servers, nice RAID setups, etc. The nice thing about this setup is it separates onto different servers the inbound MX traffic from the POP3/IMAP traffic. Here is a link to the article:
http://www.samag.com/documents/s=8920/sam0311b/0311b.htm
Another benefit is if there is a hardware failure it either doesn't impact the system at all ( if you lose an MX server ) or it only impacts a small subset of your total addresses which makes it more manageable.
Do it like Hotmail does and use BSD.
You'll see a lot of trolling and flamebait regarding the Notes client; but the fact is, IBM Lotus Domino might meet your requirements.
It provides full e-mail with calendaring and scheduling. It supports POP3 and IMAP, and webmail, as well as S/MIME, MIME file attachments, HTML, and LDAP for directory lookup. It has been demonstrated dealing with 300,000+ simultaneous users on a single server. (In fact, the network bandwidth gave out before the server did.)
It scales from a single old x86 (I have some old quad Pentium 2 200MHz boxes running it fine) to a zSeries mainframe. It has clustering, so if you set it up right, when a server goes down users' clients move to a clustered replica and don't even notice a problem.
I'm not wild about the Notes UI either; but you owe it to yourself and your company to check out Domino as a possible solution, because you don't have to run the Notes client at all. You can even keep using the Exchange client, and just replace all the servers. There are tools to help migrate from Exchange. I can probably put you in touch with some people experienced with Exchange migrations if you like; e-mail me (address on personal web site, at bottom of page).
Another option might be IBM Workplace Messaging. That's focused around web mail. I have to confess I don't know how far it has been scaled up. (I'm not in sales.)
(Opinions mine, not IBM's.)
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
I just built a mail system that was intended to scale out to about 4 million users. We replaced over 70 windows machines running iMail with a small array of linux boxes behind load balancers. Mail is largely IO bound, so you pretty much have to get a kickass storage system. I specified fibrechannel connections to a generic SATA array with a lot of spindles. Because you're going to get a lot of delivery and read access concurrency, use Maildir as the spool storage. Select a good fast filesystem. We used Reiser for historical reasons, but XFS would also work well. We used Qmail as the transport with ldap patches and courier-imap(ssl) as the imap server, which practically gives you the webmail component for free. Spend your money on the SAN backend, and give each mail node a lot of ram. Apart from the Maildir requirement, you could build out a like architecture with postfix as well. To date, over 2 million accounts have been transitioned to the new system, and those 70 windows machines have been retasked as webservers. Here's the machine breakdown:
;)
2 * dual xeon/1G ram/small boot drive -- SMTP
6 * dual xeon/4G ram/small boot drive -- imap/pop
2 * dual xeon/4G ram/internal raid -- LDAP
4 Terabyte SATA Raid backend connected by fibrechannel cards
*note, if you have a requirement to save virus prone boxes from committing seppuku over foreign agents coming in on the mail channel, you'll have to scale the smtp requirements nearly arbitrarily to account for processing each message for malignant components at that stage. Personally, I think it's kind of distasteful to use Free (speech) systems to make windows secure, as it's sort of defeatist, but that's my political baggage. YMMV.
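To show why Maildir copes with concurrent delivery and reads, here is a small sketch using Python's standard `mailbox` module rather than qmail or courier themselves. The spool path and the addresses are placeholders. Each message becomes its own file, so no mailbox-wide lock is needed.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage

# Placeholder spool location; a real system would put this on the SAN.
spool = os.path.join(tempfile.mkdtemp(), "Maildir")
box = mailbox.Maildir(spool, create=True)  # creates tmp/, new/ and cur/

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "user@example.com"
msg["Subject"] = "test delivery"
msg.set_content("One message, one file.")

# The message is written under tmp/ first and only then moved into new/,
# so a reader never sees a half-written file.
key = box.add(msg)
new_files = os.listdir(os.path.join(spool, "new"))
stored_subject = box[key]["Subject"]
```

Many deliveries can run at once because each one only creates its own uniquely named file.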
I used qmail-ldap to build a service which has had zero downtime in over a year, planned or unplanned. I had a handful of 1U servers offering SMTP(S), IMAP(S), POP(S), WebMail, and local DNS and LDAP caches. They stored mail on a backend NetApp accessible to all servers via NFS. One master LDAP server was where accounts were added, and it replicated to the cache slaves on each 1U server. I can add capacity to the NetApp, and add servers to handle load with no downtime. The 1U servers are fronted by a redundant pair of F5 load balancers.
We were able to apply OS patches box-by-box, taking them out of service individually, but without any downtime to the service. Very nice.
Others are using qmail-ldap for large ISPs, of the size you are asking about. Check out their mailing list.
The company I work for (which shall remain anonymous, just like me) does it thus:
Use cisco content switching modules (CSMs) for load balancing all of the clusters, we no longer use POP and IMAP, but we used to. Just webmail now, I'll explain the old style, just the hardware side since that's pretty much all I was privy to. Filling in the rest though is just a matter of reading how to tie it all together, which is well documented.
LDAP - OpenLDAP:
2xDell 1850
IMAP - Courier IMAP:
8xDell 1850
POP - Can't say that I remember what software, courier possibly:
8xDell 1850
SMTP - DJB's Qmail:
12xDell 1850
Webmail - Squirrel Mail (we created our own later):
8xDell 1850
NAS - NetApp Filer:
2x960 (I believe) in a failover configuration
1) It's a bitch to install. Won't even compile on modern Linux distributions. You have to patch it to compile it and the patch isn't even hosted on qmail's site.
Look, it's annoying that Bernstein and the GLIBC authors have decided to take their mutual pissfest out on us, but "echo gcc -O2 -include /usr/include/errno.h > conf-cc; make" is not exactly going to kill you, now is it?
If it is going to kill you, there's always the net-qmail distribution.
2) It's a bitch to configure. Rather than parsing a single configuration file, qmail relies heavily on the presence of individual files in a directory.
A matter of taste, I guess. The single-config-per-file method makes it very easy to build kickstart/rpm profiles that add or remove certain features without having to carefully parse/edit a monolithic configuration file, but I can see how for a junior sysadmin it's a little more confusing than just "look in main.cf."
3) Not not not not scalable! That's a myth. Doesn't properly batch jobs together. Hell! qmail was originally designed to be run from inetd!
You really have no idea what you're talking about, do you? (Hint: qmail isn't sendmail, and qmail-smtpd isn't "qmail" any more than inetd is "unix".)
4) Heavy reliance on other daemontools.
You can use daemontools to manage qmail if you want to. It's not a requirement, and the official docs don't even suggest it.
5) Breaks well-known and understood UNIX standards.
Really? Which ones?
6) Security through lack-of-functionality.
It's an MTA. It transports mail. Securely, as it happens. This is a feature, not a bug.
7) Not really secure despite the claims.
Really? Care to enlighten us?
8) No longer maintained.
This is as close to an actual valid complaint as you've got here: it's certainly been a good long time since the last release. And yet, it still works.
9) No features. Adding them requires patching, and patching, and more patching.
Look, if you need an MTA that speaks LDAP, SQL and UUCP, has hooks into an integrated calendar, and polishes the bumpers on your car, it's probably true that qmail is not the tool you want to use. Have fun trying to manage whatever monstrosity it is that does.
It does one thing, and it does that one thing extremely well: some of us still consider that to be a virtue.
Serious sysadmins don't use qmail and for damn good reason.
I rather doubt you'd recognize a serious sysadmin if one bit you.
News for Nerds. Stuff that Matters? Like hell.
A little bit of research reveals this:
http://www.usenix.org/events/lisa03/tech/full_papers/elprin/elprin_html/
I don't suppose anyone's come across any newer research (or implementations using this approach)?!?
Don't forget there may be some sort of security required too. I'm not just talking about using SSL and TLS either. There may be some requirement to use two-factor auth and email encryption. Plus the potential privacy legislation issues to deal with. You could be up for some extra headaches here and possibly a free sports car or a lifetime of golf days from Vasco or RSA.
I would look at Lotus Notes on the biggest pSeries server you can find, or an i5 iSeries if you can. Infinitely manageable and configurable, with native or web interfaces (you do, however, get *much* more functionality with the native client) it additionally lends itself to considerable use in application design within Notes itself, as well as integration with the DB2 database as seen on the i5. Very very nice and I believe bullet-tested by the folks over at IBM.
I am not a Notes sales guy, just a happy admin.
I have a BUNCH of Gmail invites. Ya wan'em?
The REAL jabber has the user id: 13196
What you do today will cost you a day of your life
...would put such a giant system in a single place? Multiple mail servers don't have scalability problems.
Put one mail server in each department. Universities have done this nearly thirty years ago and it still works. It also saves bandwidth as intra-department email doesn't need to be routed. Everything works out of the box, they invented SMTP back in the 80ies to make this sort of thing possible.
If you really must give people their firstname.surname@company.com address, put the mapping into a database and have central routers forward accordingly.
Honestly, what's wrong with the tried and proven low tech solutions?
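The "mapping in a database" part is about as simple as it sounds. A minimal sketch, assuming an in-memory SQLite table stands in for the real database; the table name, the addresses and the `route` function are all invented for illustration:

```python
import sqlite3
from typing import Optional

# Stand-in for the central routing database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE routes (public TEXT PRIMARY KEY, internal TEXT NOT NULL)")
db.executemany(
    "INSERT INTO routes VALUES (?, ?)",
    [
        ("jane.doe@company.com", "jdoe@mail.engineering.company.com"),
        ("john.smith@company.com", "jsmith@mail.sales.company.com"),
    ],
)

def route(recipient: str) -> Optional[str]:
    """Return the department mailbox to forward to, or None if unknown.

    Assumption: addresses are matched case-insensitively, which is what
    most sites do in practice.
    """
    row = db.execute(
        "SELECT internal FROM routes WHERE public = ?",
        (recipient.strip().lower(),),
    ).fetchone()
    return row[0] if row else None

known = route("Jane.Doe@company.com")
unknown = route("nobody@company.com")
```

The central routers only need this one lookup per recipient; everything after the forward is the department server's problem.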
More than half the bugs listed are for nonDJB code. qmailadmin and masqmail are not by DJB.
Much of the rest are for running in a 64 bit environment. If you want to port some 32 bit apps to a 64 bit environment, no surprise that you might have to change a few things first before things run properly.
The "rcpt to" overflow DoS thingy isn't a problem in practice, because your qmail processes should have sane ulimits on them. You might want ulimits even if you run postfix or some other mailserver.
The other one happens if you allow users to send 2GB messages. If you don't and you have a ulimit limiting the amount of memory qmail-smtpd uses to <2GB, I think the process would die first (not sure what happens if you try the exploit without ulimits on a box with < 2GB free RAM...).
The other thing of course is: qmail-smtpd runs as qmaild and not root. So even if you do allow > 2GB messages and there is an overflow, the attacker only gets qmaild permissions.
The attacker needs to work a lot more to get further.
AFAIK DJB doesn't claim there are no bugs. He just gives a security guarantee. And so far, I don't see how these bugs will allow an attacker remote root on a qmail system. Even without ulimit controls, you'd just get DoS or qmaild.
I don't believe openbsd is ahead of the curve though ;). Ahead in some areas perhaps. But quite behind especially in performance.
It'll be fun to get another DJB vs Theo thing. When was the last one? ;)
Since there are 100 posts about Linux solutions, I'll make my other suggestion.
FirstClass ( www.firstclass.com ).
FirstClass is a great piece of groupware software that has been around since before the Internet. Calendars, Shared Conferences, Web Pages, VoiceMail/Fax, Shared Folders (available through CIFS/Windows File Sharing), IMAP, POP, Auth SMTP, EASY EASY EASY administration. The Mac, Linux, and Windows clients look identical, EASY setup for users, a server name, user name, password. No ugly SMTP/POP/delete messages and all of that. Different web mail templates, one looks and acts JUST like the client.
How many concurrent users, because 1mil users is not hard, 1mil concurrent users is. Also 99.9% uptime is not THAT hard, it's 9 hours of down time per year.
The downfall of FirstClass is that although you can have multiple "clustered" Internet boxes (http, smtp, pop, etc), you can only have one main server. Also everything (unless you POP) is kept ON the server, so you NEED an internet connection (or modem) to work with EMail.
Another bonus is that web pages are EASY, NO HTML required. Create a document, change the fonts and colors, drag image, BAM, the web page is done.
FirstClass is definitely worth a look.
-Ben
-=Down Syndrome in Maine
http://www.qmailrocks.org/ and a few other patches for much larger setups
This doesn't sound like a problem for a single, monolithic solution to me.
The "free" accounts--for one--should be a separate system.
Each team should have its own email system. Each team should have the option of having one of their own manage their email instead of IT. IT should define standards to use for interoperability.
Webmail should probably be a generic web IMAP interface with the requirement that each team's email system have IMAP turned on so it'll interop with the webmail.
This is the kind of setup that I've seen work best. But YMMV.
"Oh, the other thing? Outlook feels integrated because everything automatically does the windows automatica launch active-x thing. Just highlight a message subjet, bingo! Embedded code launches! that's why viruses and worms."
Stop spreading FUD. Outlook hasn't executed scripting in messages for years. While it's true that Outlook uses Trident ("IE") for displaying messages, it runs it in a mode with Javascript and ActiveX disabled. Even Trident flaws are unlikely to cause a compromise.
All recent email worms have been of the "download this executable and run it" variety. And Outlook 2003, by default, won't even let you download executables.
I am in the navy and we have horrific information technology problems. Basically the problem is that we have little internal expertise and have tried to make up for it with an unruly mass of contractors. If you are an IT contractor just be sure to use the phrase "web enabled" and the Navy will bite. We now have an absurd array of web pages that we are supposed to access for training, pay, medical, promotions, retirements....the list goes on. Guess what - each one is provided by a different contractor and requires its own usernames and passwords. I literally have 14 (not kidding) usernames and passwords written down that I need to do my job. What is particularly shocking is how many of these sites use your social security number as part of the sign-in process.
Oh, yeah, Guninski. He's a crank. Sure, if the sysadmin doesn't apply any process limits, an attacker can deny himself service. That's like saying that if you have a gun you can shoot yourself in the foot.
-russ
Don't piss off The Angry Economist
If your disk is flakey and your data is important, you will back it up on removable media which you store in a separate location.
With internet, it is even easier. You can use one of the online backup services, or do it yourself. What can be achieved rather painlessly today in backup is amazing: In one day I mirrored a Linux PC over internet, with full mirroring of the whole Debian-install, making for a complete redundant backup-machine which can be put online in minutes. Pretty fun, and I can clone the entire install anywhere I like, never having to install everything by hand again.
If you're relying on copies of files for backup, then I guess you never had a HD die on you. It's not fun, and copies won't help you. Solutions exists to get the data, but it's too pricey for individuals.
If you're into local redundancy, I guess RAID could be a cheap option, or just copy the files to another filesystem.
I think your gripe is with Microsoft usually not giving enough options to their customers, because I can't see why you don't like this solution, which of course in a sane system should be possible to turn off. For a workstation this is probably not needed anyways, but imagine what this can do for a CVS-sandbox fileserver.
I'm surprised that the filesystems for Linux haven't been doing this for years yet. It might have performance issues, but those should be solvable.
http://www.debunkingskeptics.com/
I would seriously start the design process with the storage system. Email is mostly just "storage" that is accessed by many different servers all at once. It needs to be fast, fault tolerant and easy to back up while in use; for this a "point in time snapshot" feature is good. You may want to talk to Sun about their "ZFS" filesystem on Solaris 10. If not, read up on what it does and then get something else like it.
Once you have bomb proof storage running on a cluster of servers, raid, hot spares, a transactional file system and all that, then you add smtp, imap, pop and webmail servers. Use lots of MX records, round robin DNS or whatever to load balance. If the storage doesn't work the system will not work, so get that right first.
Lots more details but "this is /." and people here think for only 2 seconds (if that) before they type an answer.
BTW 99.9 is setting the sights a bit low; that would allow about 8 hours of downtime per year. Shoot for another "9".
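The downtime figures are easy to check. A quick back-of-the-envelope calculation in Python, assuming a 365-day year and twelve equal months (both simplifications):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_hours_per_year(availability_percent: float) -> float:
    """Hours per year a service may be down at the given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)

for target in (99.9, 99.99):
    hours = downtime_hours_per_year(target)
    minutes_per_month = hours * 60 / 12
    print(f"{target}% -> {hours:.2f} h/year, {minutes_per_month:.1f} min/month")
```

Three nines works out to a bit under nine hours a year, and each extra nine cuts the allowance by a factor of ten.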
"All the other tests" are 2 other tests, searching and selecting all headers. This is not indicative of actual use, and doesn't demonstrate that mail storage should be a database. As I said previously, its just because they are comparing mysql with indices, to imap servers without indices. Throw dovecot in there with its indexing and all of a sudden mysql isn't faster at searching.
http://www.sun.com/software/products/messaging_srvr/home_messaging.xml
The Sun Java System Messaging Server is a high-performance, highly secure messaging platform--the leader in the service provider messaging market. Scaling from thousands to millions of users, the Java System Messaging Server is suitable for both service providers and enterprises interested in consolidating email servers and reducing total cost of ownership of communications infrastructure. It also provides extensive security features that help ensure the integrity of communications through user authentication, session encryption, and the appropriate content filtering to help prevent spam and viruses.
First of all, no, your RAM will not be enough to cache everything. Just like it won't with a database. You will end up with more RAM dedicated to caching with just filesystem, since a database takes up a bunch of RAM for other things.
And your last part makes no sense at all. Of course doing find will be slow. Just like select * would be slow on a table with a million rows. You shouldn't be trying to access everything at the same time no matter where you stored it.
And your mkdir problem is likely just because you don't have enough inodes. If you are creating a filesystem to store a lot of files and directories, you will create one with enough inodes to have them all.
And of course, you don't want to make a million directories in the same dir. Make a few thousand directories (one for each domain) and then have each of those contain a few hundred thousand maildirs (the users for that domain).
Feel free to ignore reality and pretend you need to build a database. Those "data structures" are called files, and the filesystem is already written, and it takes care of them for me. Pretty handy huh? And decent imap servers already have indexing, so that's taken care of too. Oops, you suddenly get all the benefits a database would give you, without the huge overhead.
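To make the directory layout concrete, here is one way it could look in Python. The per-domain directory follows the suggestion above; the extra hashed bucket level and the `maildir_path` name are my own assumptions, added to keep any single directory from growing too large.

```python
import os
import zlib

def maildir_path(root: str, address: str, fanout: int = 256) -> str:
    """Map an address to root/<domain>/<bucket>/<user>/Maildir.

    Hypothetical layout: the bucket is a CRC32 of the local part modulo
    `fanout`, so users spread evenly over at most `fanout` subdirectories.
    """
    local, _, domain = address.lower().partition("@")
    bucket = zlib.crc32(local.encode()) % fanout
    return os.path.join(root, domain, f"{bucket:02x}", local, "Maildir")

path = maildir_path("/var/mail", "Bob@Example.com")
```

With 256 buckets, a domain with a few hundred thousand users ends up with roughly a thousand maildirs per directory, and finding any one of them is a fixed number of lookups rather than a scan.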
If the files are not in the buffer cache using fs storage, then they would also not be in the DBs cache using a db for storage. You will have LESS RAM available for caching data if you use a database, since you now have all the other stuff you don't need from a database using up RAM too.
If a fileserver is going to "choke on a flood of ... disk ops" then so will a database server, or does it have magic powers to avoid accessing disks? Or are you trying to compare a fileserver with 512MB of RAM vs a database server with 16GB of RAM?
And the majority of mysql installations out there might well be used to provide an SQL frontend to simple, non-relational data, but that is definitely not the case with real databases. As I have explained repeatedly now, the only thing a database will get you is faster searches (SEARCHES, not ACCESS), and that is entirely because of the indexing. Use an imap server that does indexing and suddenly the database is offering you nothing.
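The point about indexing can be shown with a toy example. This is not how dovecot or mysql build their indexes; it is just a minimal inverted index in Python over made-up subject lines, to show that the speedup comes from the index and not from where the messages are stored.

```python
from collections import defaultdict

# Pretend mailbox: message id -> header text (invented sample data).
messages = {
    1: "Subject: quarterly budget review",
    2: "Subject: lunch on friday",
    3: "Subject: budget approved",
}

def scan(term: str) -> list:
    """No index: every message has to be read for every search."""
    return sorted(mid for mid, text in messages.items()
                  if term in text.lower().split())

# Build the index once: word -> set of message ids containing it.
index = defaultdict(set)
for mid, text in messages.items():
    for word in text.lower().split():
        index[word].add(mid)

def lookup(term: str) -> list:
    """With an index: one dictionary lookup, no message is opened."""
    return sorted(index.get(term, ()))

scanned = scan("budget")
indexed = lookup("budget")
```

Both calls return the same message ids; the difference is that `scan` costs grow with the mailbox size while `lookup` does not, whether the messages live in files or in tables.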
They call it a mail server, but it includes web, calendar, ftp, radius, pop3, imap, wap, smtp and just about every other relevant RFC you can think of. It includes Outlook compatibility (calendaring as well) and runs on just about anything. And, more importantly for what you want, it scales very well, supporting large volumes of incoming email, millions of users, and multi-machine clustering.
I've used it for many years on somewhere.com. Not many accounts, but it handily bounced (and spamtrapped) as many as a million messages a day.
a Gmail account.
Adding a database layer makes it even worse, typically increasing the chance the box will start swapping, while helping to drain the CPUs, and eat most of the memory you could have used for an index cache.
A poorly written SELECT statement can also be a very effective way of slowing a large email database to a crawl.
I'd like to introduce you to (my little friend...) the concept of creating subdirectories as a way of organizing data. :)
Each user could have their own subdirectory. There is no need to store everything in a single directory though -- the subdirectories could be further subdivided based on month or even day. The filenames themselves could be chosen so that commonly searched fields are available without needing to search the contents of the files. You don't have to search through the whole 2G of emails (just because Deus-Ex-change does that doesn't mean that you have do it that way).
Since 99% of searches are looking for something that happened the same day or a previous week, I don't think it would get bogged down that easily (but I'm willing to listen if you can find an exception).
(And this is without commenting on some of the bloat monstrously hypocritical idiots have tried to add to some of the common Unix utilities, but of course there are non-bloated versions around that run much faster. No forced obsolescence for me, thank you.)
Adding a database layer makes it even worse, typically increasing the chance the box will start swapping, while helping to drain the CPUs, and eat most of the memory you could have used for an index cache.
Um no. Adding an index to the data (what a DBFS really is) would speed up the search, not slow it down. The idea is to find the information you need as fast as you possibly can. There is no way that walking an index is going to be slower than churning through 2 gigabytes of data.
I'd like to introduce you to (my little friend...) the concept of creating subdirectories as a way of organizing data.
Um, no again. When I refer to 2GB of mail, I mean 2GB per user. When I was in tech support (back when drives weren't much larger than 2GBs!) we constantly had to repair the PST files of some poor sap who had inadvertently gone beyond the storage capacity of a single PST file. (FYI, PSTs corrupt silently instead of complaining about being full. It's quite annoying.)
There is no need to store everything in a single directory though -- the subdirectories could be further subdivided based on month or even day.
To what end? If I'm searching my entire mailbox, I still need to churn through all those subdirectories. Not to mention that your organization scheme may be counter-intuitive to me. I may prefer to organize my mail by project instead of date. (In fact, I really don't know of anyone who orders their mail by date.)
The filenames themselves could be chosen so that commonly searched fields are available without needing to search the contents of the files.
Or, the meta-data of a database file system could extract the necessary components, index them, and store them with the file itself. No need to munge one type of meta-data (filename) to support other types of meta-data (subject, from, to, etc.). Not to mention that your scheme hangs by a very loose thread. What happens if a user decides to rename the file?
Question, did you read the link I gave in the great grandparent post? You may find it informative.
Since 99% of searches are looking for something that happened the same day or a previous week,
No, that's not the normal pattern I see. Most searches are an attempt to find some obscure piece of information that's been lying dormant for years. For example, if a coworker gives me a username and password for use when they're unavailable, it could be anywhere from months to years before I need that information. Other examples include procedure documents, URLs, code documentation, and project information needed by follow-up projects.
That's what makes GMail so effective. None of that info is lost. It's all indexed and tagged so that you can easily search for it in the future. Filesystems should be able to replicate that experience.
Javascript + Nintendo DSi = DSiCade
Look at Samsung Contact
http://www.samsungcontact.com/
That seems justified.
I'm in the useful position of being the BOFH and mail admin at work - at a fairly flexible and reasonable workplace - so I'm able to use my IMAP mailbox without fuss.
I can imagine that may not be true for some. However, POP3 won't help you - it's still gone through your company's mail filters, been logged, and if the company is really dodgy been scanned for "flag" words / analysed or even had a copy stored. The download protocol doesn't matter - either your company is not reading your mail, or they are.
The point is that if your company doesn't permit the use of work email for personal stuff, you're generally better off following that. Even if it's not reasonable - because they're not likely to be reasonable about it if there's a problem, either.
I have to call bullshit. 1.72 million users and less than 3 million messages a day? So each user on average only sends and/or receives fewer than 2 messages per day?
It's not bullshit and you obviously haven't been around the Army. Every troop has an AKO account, maybe (and I say maybe) 20% of them log on a day and maybe 20% of them actually send emails with it. Every unit in the Army still runs its own independent unit email system (usually m$ exchange) which handles the vast bulk of US Military email traffic. NETCOM tried to force the issue with PKI CAC implementation but the Army resisted and DA G6 never backed them up. AKO is a good idea done badly and lacks serious command support but that is OT.
... active accounts is a different item though.
AKO really does have ~2 million accounts and maybe ~3 million emails daily
De Oppresso Liber
Except when the document is opened, changed, and version control added you DO have 900 different instances. Gee, aren't you glad that feature saved you?
I run Samsung Contact, which is an implementation of HP OpenMail. It was designed to scale to at least 32000 users per server, and can be used with a web client, with POP3, IMAP, and Outlook via MAPI. My users think it's Exchange, but it isn't. All running off a Linux server.
http://www.sun.com/software/products/messaging_srvr/home_messaging.xml
Designed for it, aggregate servers and load sharing, the works.
Where I work, we have a system similar to this. I noticed that nowhere do you list virus scanning and spam blocking. Of course, offering these services will *dramatically* increase the needs for infrastructure.
We use several sets of servers. The first servers are our MX servers. We have 5 of these and they process all incoming mail from the Internet at large. These servers are responsible for two things, delivery to local accounts, and processing incoming mail. They are all behind a Cisco CSS load balancer and have a limit set on their connections. There's a good reason for this, to be explained later. Besides the MX servers, we have 3 MQ servers. The MQ servers have no limits on their connections, as they are spill-over servers only, designed to queue up mail. This ensures that our spam and virus scanner servers do not become overloaded, even during a large spam attack, as the MQ servers must traverse their own, even more limited ingress on the load balancer. We also have SMTP servers, which are used by internal customers for sending mail. We have 3 of these. These servers also support SMTP AUTH over SSL/TLS for customers when they are off-network.
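For anyone trying to picture the spill-over arrangement, here is a rough sketch of the inbound decision only. It is not our load balancer configuration; the per-server connection limit and the class and function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pool:
    """A group of identical servers behind the load balancer."""
    name: str
    servers: int
    max_conn_per_server: Optional[int]  # None means no connection limit
    active: int = 0

    def has_room(self) -> bool:
        if self.max_conn_per_server is None:
            return True
        return self.active < self.servers * self.max_conn_per_server


def route(mx: Pool, mq: Pool) -> str:
    """Send a new inbound SMTP connection to MX if it has room, else to MQ."""
    target = mx if mx.has_room() else mq
    target.active += 1
    return target.name


if __name__ == "__main__":
    mx = Pool("MX", servers=5, max_conn_per_server=2)   # hypothetical limit
    mq = Pool("MQ", servers=3, max_conn_per_server=None)
    print([route(mx, mq) for _ in range(12)])
```

The point of the design is that the capped MX pool protects the scanners, while the uncapped MQ pool only has to accept and queue mail, which is cheap.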
All of these servers are served by a load balanced set of SV servers, or our spam and virus scanning servers. These servers are running your usual concoction of mimedefang/spamassassin/clamav and are used by both the SMTP and MX servers. We currently have 12 of these to keep up with peak loads. All mail servers are running Sendmail, as its milter interface has performed much better in our tests than any other MTA. Of course, the exact configuration of the servers is a bit of a secret, but we have separated queues to keep emails from filling up the queues. Each of our 3 queues has 10 sub-directories, to keep the number of actual files in each directory down and to limit disk I/O on such large directories. Filesystem choice makes a big difference here, so you'll want to figure out your average email type and determine what filesystem to use based on this. The more RAM you have, the better.
For mailbox checking, we have 3 sets of servers: our POP3 servers, of which we have 5, our 4 IMAP servers, and our 3 webmail servers. The webmail servers are running an IMAP proxy. Of course, all of these services are behind a load balancer. We even use the load balancers between services, such as webmail/IMAP. We use Courier IMAP, SquirrelMail, and nupop for these, though all are heavily modified to support features which aren't necessarily needed outside our environment, such as automatic username munging based on originating IP. Backend storage is provided by NetApp Filers.
This services about 500,000 email customers.
Appliance
Reliable
Low overhead
No downtime
Blindingly fast
A very very happy customer over several years...
First, let me say that this argument may have already been carried out quite eloquently and in greater depth in other threads. You may want to check them out. Maybe your view is already represented. Not that I want to duplicate many of the answers already given, but...
How does a DBFS index materially differ in any way from the existing indexing system of filesystem i-nodes (index nodes anyone? :) directories, caches and buffers, in a way that would matter to this application?
Sorry, somehow I misread that. But, my line of reasoning still applies. It's not the size of your attachments but the wisdom of your indexing method :)
It may be useless to argue about the details of specific implementations, but is this a critically useful feature for an email user? And why would you want to have your front end permit renaming files? That would seem to be a strange feature to have for an email client, let alone an email server, especially since the email has already been delivered. And why do you see the use of naming conventions as hanging on a loose thread? Metadata seems to work pretty well for websites, and the naming conventions get pretty arcane there.
I reject this. I think the most frequent search is to scroll up and down the page with the most recent emails of that day to look for a recently received email, and then to click on it and read it. That most frequent of searches is right under your nose -- perhaps that's why you didn't consider it. In wider searches, searching for data that's being automatically categorized for you is by far easier and faster than "look under every rock to search everything for me." If I know it was from last year, why should I wait for Clippy to search through all of this year's stuff? We don't have to work like Deus-Ex-Change on this one. Especially when you know where they're headed with it, and that they're indifferent (by design) to the impact of their relentless bloating.
Whether or not a DBFS is the right tool for the job depends more on how well tested it is, and less on the other incidentals. If it permits a bunch of extra, marginally useful indices but the performance doesn't change much, and instead it adds another layer of library bloat and new unknown bugs to place the 99.9% uptime requirement at risk, then why should anyone use it? Maybe a better question would be -- what does it do best, and is this really a case of it?
Some final questions to ponder: who is watching out that all of the new science being created isn't simply a renaming of old science principles? Is there any incentive to recognize the contributions that have come before, or is it more profitable to convince people that it is "new" somehow? If the old science isn't being taken into account, then, is the new science really science or just namespace cruft?
Another poster mentioned two mail servers which use hard links for this purpose.
Centralization breaks the internet.
OpenMail (on which Scalix was based) scaled to insane levels compared with Exchange; Scalix should be the same. If we're talking consumer ISP-style workloads, you should be able to approach 100K users on a smallish Intel server. The key is to have a decent SAN, as previous posters have pointed out.
Scalix can support just about every Outlook feature that Exchange can (forms being the notable exception). Any mailbox can be used with POP, IMAP, Outlook/MAPI, or the Scalix web client (SWA). SWA is an AJAX client, with a look'n'feel close to Outlook.
Scalix quotes 99.99% uptime, and I saw even better in OpenMail days. Again, a good SAN is a must.
GroupWise could be the solution: no viruses, and there are versions for Linux, Windows, NetWare, etc.
It's from Novell.
It's news to me. Of course, less than 56% of the Exchange market has gone past version 5.5 so I suppose that could be the reason I'm unaware of the change.
The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
"Um no. Adding an index to the data (what a DBFS really is) would speed up the search, not slow it down."
There is no reason why the plain old files in the filesystem could not be indexed. See spotlight for an excellent example.
evil is as evil does
We have 1000 users with one admin who devotes maybe half of her time to maintaining Domino. Outage? Try every 12 months when we successfully upgrade to the latest version of Domino. BTW, Domino 7 came out this week - we'll be upgrading by year end. Why are you still fumbling around with old versions?
Yes, look who really is dumb. Albert, what is wrong with you??? Can't you just tell him politely that he made an error like a civilized person? No, you can't. You have to be ubercranky and shout in his face. You barbaric twit.
And BTW, what about your sig? You don't use "tonight" in the past tense. "DID YOUR MOM SERVE YOU AN EXTRA HELPING OF DUMB THIS EVENING" maybe, or "WILL YOUR MOM SERVE YOU AN EXTRA HELPING OF DUMB TONIGHT" (although that one doesn't really make much sense).
You're an idiot and a troll, Albert. Go away and never come back, or, as you said, go commit suicide.
--
Bonk the Zonk! TMM for editor!
Trolling all trolls since 2001.
We use it here at our university. The JES Suite is pretty expensive for a million users, but much more affordable with .edu pricing. Scales _really_ well.
A significant set of comments here approach this problem from the server and sysadmin side. None of them approach the problem from the user interface or usability side. How can your implementation be successful if your users (up to 1 million) are unhappy with the way they are required to use it, or it's unusable to them (very common for those not technically inclined, as you are almost certain to have with that many email accounts)?
First you need to determine how the users will access their email, and how they are going to use it. Will this be webmail, client app, PDA, etc.
Then you need to determine what the user requirements are:
Also what applications they will be using to connect to this system such as: emacs, pine, mutt, thunderbird, outlook express, MS outlook, opera, evolution, mail.app, etc.
If possible try to enforce a policy restricting the use of email clients to a small subset, but do remember that there may be users on Macs, *nix PCs, Windows PCs, and potentially others. (NB: Avoid allowing Outlook Express if you wish to use IMAP)
Determine your security requirements for the mail system. Is everyone required to connect using SSL encrypted links?
Determine your minimum service levels required (99.9% uptime or higher, do note that every 9 beyond the first 3 can be expected to double the cost of the solution)
Determine support levels for hardware with respect to warranty, part availability, technician availability etc.
Determine backup requirements, are you required to be able to restore individual emails, individual mailboxes, all mailboxes, and how many levels of backups are required? Do you need to be able to restore emails deleted 4 months ago?
Quota requirements, are there limits on the size of a persons mailbox, can this be customized, are there limits on the size of an email a user can send, and the same for receiving. Will you allow a user to store 2+GB of email on your system?
Determine other legal requirements, such as a requirement to be able to retrieve any email sent through the system for auditing/legal purposes
Determine effectiveness of antivirus filtering and how many levels of antivirus filtering will you require to ensure robustness and the correct level of user protection?
Determine level of spam filtering required (generic, user specifiable, with headers, without headers,
Do you have mailing list or distribution list requirements (mailman?)
How many physical sites will be accessing this mail system (one office? multiple branches)
Will you be requiring a support ticketing system? (example: RT from http://www.bestpractical.com/)
Will users be able to customize their mail settings (enable/disable bayesian spam filtering, custom antispam rules, setting of spam thresholds, autoresponder messages, out of office replies, disable/enable spam filtering, disable/enable antivirus filtering)
What level of redundancy are you required to have? Do you need to provide redundant systems even if one datacentre is disconnected (somehow)
ie. main datacentre you use in the UK is disconnected for some reason outside your control, do your roaming users in the UK still need to be able to access their email without any loss through an alternative backup mail system in the US?
Can your users be split up into multiple sub-domains? ie. production, hr, finance, lists, support, technical, development, etc. And will they notice or can you hide it from the user with a simple server-side rewrite.
How are you going to measure the performance of the system once in place. wrt disk space, amount of connections, upti
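On the service-level item in the list above, it helps to turn "nines" into an actual downtime budget before negotiating. A small sketch, assuming a 30-day month and taking the "each extra nine doubles the cost" figure purely as the rule of thumb stated above (the function names are mine):

```python
MINUTES_PER_DAY = 24 * 60


def downtime_minutes(nines: int, days: int = 30) -> float:
    """Allowed downtime over `days` for an availability of N nines (3 means 99.9%)."""
    unavailable_fraction = 10.0 ** -nines
    return days * MINUTES_PER_DAY * unavailable_fraction


def relative_cost(nines: int, baseline_nines: int = 3) -> int:
    """Cost multiplier if every nine past the baseline doubles the price."""
    return 2 ** max(0, nines - baseline_nines)


if __name__ == "__main__":
    for nines in (3, 4, 5):
        print(nines, round(downtime_minutes(nines), 2), relative_cost(nines))
```

Three nines works out to roughly 43 minutes a month, and four nines to a little over 4 minutes, which is not much time to notice a failure, let alone fix it.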
Hi,
I would start looking at
Hardware:
two Sun Fire 2900s or 6900s
Hitachi (HDS) Storage Server
EMC DMX 1000
consider solid data solid state disk for message queues.
Software:
Oracle 10g, collaboration suite.
Veritas Cluster Server RAC Edition
Veritas VVR - EMC's SRDF is over priced
Veritas Foundation Suite DB Edition for Oracle.
Result:
Plenty of horsepower. Highly scalable and reliable.
Cost: See your Sun, EMC, Veritas and Oracle dealers.
Seriously, this is what I would do and I've had quite a bit of experience building large messaging systems. The above combination usually works well.
Avoid Linux, unless you like a lot of overtime.
How does a DBFS index materially differ in any way from the existing indexing system of filesystem i-nodes (index nodes anyone? :) directories, caches and buffers, in a way that would matter to this application?
The "index node" provides an index tree for a very simple type of query: The filesystem hierarchy. (This was addressed in the article I linked to.) However, the inodes provide no real information about a file other than its location, unless they are extended to include meta-data attributes. Using the email example, "To", "From", "CC", "Received", "Sent", and "Subject" are all meta-data fields you might expect to find in an email. With the meta-data, I can ask the index, "from:UnapprovedThought@gmail.com" and get back an instant result. Without the index, I have to churn through every email file on disk. Not to mention that I need to *parse* each file to find the info I'm looking for.
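To illustrate the kind of lookup I mean, here is a toy in-memory version of a meta-data index. It is only a sketch of the idea, not how any real DBFS stores its indexes; the class name and the "field:value" query syntax handling are assumptions, and it uses Python's standard `email` module to do the parsing once, at delivery time.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses

ADDRESS_FIELDS = ("from", "to", "cc")


class HeaderIndex:
    """Maps (field, value) pairs to the set of messages that carry them."""

    def __init__(self) -> None:
        self._index = defaultdict(set)

    def add(self, key: str, raw_message: str) -> None:
        """Parse a message once and record its header meta-data."""
        msg = message_from_string(raw_message)
        for field in ADDRESS_FIELDS:
            for _name, addr in getaddresses(msg.get_all(field, [])):
                if addr:
                    self._index[(field, addr.lower())].add(key)
        for word in msg.get("subject", "").lower().split():
            self._index[("subject", word)].add(key)

    def query(self, expression: str) -> set:
        """Answer a 'field:value' query without touching any message file."""
        field, _, value = expression.partition(":")
        return set(self._index.get((field.lower(), value.lower()), set()))


if __name__ == "__main__":
    idx = HeaderIndex()
    idx.add("m1", "From: Alice <alice@example.com>\nTo: bob@example.com\n"
                  "Subject: Budget review\n\nSee attached.")
    print(idx.query("from:alice@example.com"))
```

The query is a single dictionary lookup, however many messages are on disk. The cost has moved to delivery time, where each message is parsed exactly once.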
The scheme of using the filename does alleviate the situation somewhat, but it still is not tremendously fast (lots of I/O here), you still need to parse each filename, and it is limited in the number of fields it can contain.
It's not the size of your attachments but the wisdom of your indexing method
But you are advocating no index. I'm advocating an index built into the file system. Which is it?
And why would you want to have your front end permit renaming files?
You have no choice. If the user can see a file on disk, they can rename it. Your options are to extend the OS GUI to prevent the user from taking such action, or work with the user so that the client and the file system show consistent views. In a DBFS, the two *will* be consistent. So much so, that you may not even need an email client, or the client may be nothing more than a specialized version of the file browser.
I had scanned your article a while back looking for the technical guts of DBFS but lost interest in it for some reason.
In any case, an inode also provides a degree of locality, sort of like a stake in the ground. Files described by the inode are likely to huddle physically near that inode on the disk, and therefore quick to access if the disk head (a relatively slow moving object) is already hovering around that spot. Perhaps the "query result" is even within the same block or track that has been already read by accessing the inode, so that no further physical disk activity is needed. Replacing this with a non-locality based system may not result in similar performance, especially if a logical to physical hash is used to compute an actual physical location, and the location is far away from where the index is stored. It also opens the door to poor worst-case performance. In short, I would need to see comparative benchmarks of best-case and worst case before I believe this would be as you claim, both in single case "queries" and for a heavily loaded server having to handle a real-world load of multiple queries, inserts, deletes and updates at the same time, on different parts of the disk.
This sounds simplistic, as if the disk hardware and its natural latency were somehow absent from the picture. If it is instant, as you say, then there is also a fair chance that the same query would return instantly for a non-DBFS. Namely, because the block the information is in, is already cached...
The filename solution avoids most of the churnings. As do a finite number of indices. But, at some point the DBFS will also have to churn through the disk for a randomly chosen string in the body of the email message. Or, you could technically index everything, but I doubt you'll do that, based on the fact that these are large chunks of mostly random data coming in, and you'll be spending all of your time updating the index with each email that comes in and wondering why your new server is so slow. Under these two (at minimum) competing processes, physically, the disk head will be flying to-and-fro from the location of the index to the location of the data and back, and it's easy for me to think that this could be implemented improperly, and that the index itself would become fragmented, or inconsistent in a forced shutdown... in short, it could get very complicated, unreliable, unrecoverable, and yes, even slow.
Parsing using a non-bloated language built on a minimum of non-bloated libraries is still going to be faster than the disk speed, especially since these are roughly linear time searches. (Obviously, I'm not talking about a Windoze design here, where the purpose is to see how much RAM can be consumed inside of buzzword-du-jour subsystems so as to get you to replace your computer that much sooner.)
You have a gift for understatement.
Once again you seem to ignore that this information will typically be in the cache. Your DBFS will have a cache, and any other filesystem will also have its cache. You will want to use cache space for your index, right? Thus, there won't be "lots" of difference, except maybe in the reverse if you have lots of indices that no one actually uses.
Parsing a line of text isn't going to bring the heavens down. Especially when it is already in-cache and you can split the work up among several processors.
Most Army installations only use their AKO webmail accounts for forwarding to their installation e-mail servers, most of which as far as I know use Exchange. Most of the time the only use of AKO webmail comes when the installation specific e-mail system is down for maintenance ... so the response is probably relevant for those who use AKO webmail on a regular basis, but not Army-wide.
Judging by those numbers, though, I would say the Sun setup is great for forwarding, but not for high scale groupware.
The filename solution avoids most of the churnings.
And also limits the amount of information that can be stored. You're going to bump up against the upper-limit of the filename length, just from the subject. Add an email with a large number of To's or CC's, and your filename solution breaks down.
This sounds simplistic, as if the disk hardware and its natural latency were somehow absent from the picture. If it is instant, as you say, then there is also a fair chance that the same query would return instantly for a non-DBFS. Namely, because the block the information is in, is already cached...
Of course it's not truly instant, but it's close enough. Take the Spotlight search system as an example. It produces a list of files, as you type. For a comparison of performance, go to '/usr/bin' on a Unix system and attempt to use tab completion in BASH. Notice how slow BASH is at retrieving the results?
But, at some point the DBFS will also have to churn through the disk for a randomly chosen string in the body of the email message. Or, you could technically index everything,
Hallelujah, he finally gets it! Yes, index everything. Or more precisely, the hook for handling email files would divide the file up into keywords which would be added to the index. The amount of storage for this would be the one-time cost of the word plus 4-8 bytes for each instance of the word found across the entire filesystem.
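As a back-of-the-envelope model of that storage cost, here is a minimal inverted index sketch. It assumes one posting per (word, file) pair and 8 bytes per posting, which is my reading of "4-8 bytes for each instance"; the class name, the tokenizer, and the size estimate are illustrative only and ignore whatever overhead a real on-disk structure would add.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


class KeywordIndex:
    """Toy inverted index: word -> list of file ids containing that word."""

    def __init__(self, posting_bytes: int = 8) -> None:
        self.postings = defaultdict(list)
        self.posting_bytes = posting_bytes

    def add(self, file_id: int, text: str) -> None:
        """The 'hook' for a new file: split it into keywords and index them."""
        for word in set(WORD.findall(text.lower())):
            self.postings[word].append(file_id)

    def search(self, word: str) -> list:
        return list(self.postings.get(word.lower(), []))

    def estimated_bytes(self) -> int:
        """One-time cost of each word plus a fixed cost per posting."""
        return sum(len(word.encode("utf-8")) + self.posting_bytes * len(ids)
                   for word, ids in self.postings.items())


if __name__ == "__main__":
    index = KeywordIndex()
    index.add(1, "cheap viagra now")
    index.add(2, "Viagra again")
    print(index.search("viagra"), index.estimated_bytes())
```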
You will want to use cache space for your index, right?
Correct. But in the absence of the data being cached, it's much faster to burst read the index than it is to read through every file. Ideally, the head would never leave the platter while reading in the index, as opposed to studying each file on disk. Worst case for head movement in this situation would be O(fragments) for the index and O(files*fragments) for studying each file.
Unless these are modelled as subdirectories so that they don't take up space in the filename.
Mess, mess, mess. Not to mention that file systems still often have limits on the size of the full path.
I doubt you'll do that, based on the fact that these are large chunks of mostly random data coming in, and you'll be spending all of your time updating the index with each email that comes in and wondering why your new server is so slow
Who's talking about servers? I'm talking about clients. (Although servers would work just fine as well.) And updating the index is a minor amount of data to commit. As I said, the cost of the word, plus a 4-8 byte cost for each instance on the filesystem.
Also, the DBFS has to read an extra index, a storage area a normal filesystem doesn't have to maintain, physically translate to, update, or use up cache space to store.
Except that a Database File System would be built to maintain, physically translate to, update, and use cache space to store these indexes. Remember, I'm not advocating the use of a database on top of a file system. I'm advocating a more advanced file system that extends the current indexing capabilities.
FYI, BeFS was a full database file system, HFS+ now has DBFS features, and NTFS has a great deal of DBFS features (despite Microsoft ignoring the features in OSes prior to Vista).
You were worried about a parser, but here you've got a larger, more complicated filesystem driver taking up space in precious RAM, butting up against kernelspace and single-threadedly stuck to one CPU.
None of the above. I realize that you decided my article was a snore-fest, but I actually suggested using FUSE to stick the file system in userspace. Which means that the process can be multi-threaded, multi-processor, and freely use pageable memory.
I still don't see why the webpage/GUI writer, for instance, would be forced to explicitly throw in a rename button on the client (this is supposed to be a record of delivery, not a workgroup editing effort).
All the user has t
I can think of several ways to solve this offhand. If your filesystem has a fixed filename limit, or you can't tune your filesystem to increase the filename length, you can still store each email as a directory, with each field as a separate file within it, side-by-side with the message. If you don't want to do that, you can store as much as will fit and spill over longer fields to the message itself. Or, just have shorter fields. You could have a client-side address book that maps a hash id to recipient email address, etc. (There are a bunch more ways but I'm not going to bore you with them.)
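The first of those options, one directory per email with each field in its own file, could look roughly like this. It is a sketch only: the field list, the file names, and the assumption that the caller supplies a filename-safe message id are all mine.

```python
import tempfile
from email import message_from_string
from pathlib import Path
from typing import Optional

FIELDS = ("From", "To", "Cc", "Subject", "Date")


def store_as_directory(root: Path, message_id: str, raw_message: str) -> Path:
    """Write one directory per message: a file per header field plus the message."""
    msg = message_from_string(raw_message)
    msg_dir = root / message_id  # assumed to be filename-safe already
    msg_dir.mkdir(parents=True, exist_ok=False)
    for field in FIELDS:
        value = msg.get(field)
        if value is not None:
            # No filename length limit applies: long To/Cc lists live in a file.
            (msg_dir / field.lower()).write_text(value, encoding="utf-8")
    (msg_dir / "message").write_text(raw_message, encoding="utf-8")
    return msg_dir


def read_field(msg_dir: Path, field: str) -> Optional[str]:
    """Read a single field without parsing the message itself."""
    path = msg_dir / field.lower()
    return path.read_text(encoding="utf-8") if path.exists() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = store_as_directory(Path(tmp), "0001",
                               "From: alice@example.com\nSubject: Hi\n\nHello")
        print(sorted(p.name for p in d.iterdir()), read_field(d, "subject"))
```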
No :) It was quite fast for me. You must be doing something wrong.
Even if it is slow for you, your client won't typically be looking at a directory filled with a million little files (a disingenuous example if I may say so), because it will be smartly broken down into subcategories.
Everything, eh? Well, if your entire filesystem is on a SAN, that's one huge global index that has to span a single immense partition for 1 million users. The index would be so big that you would likely not be able to fit the entire thing within the RAM of a single server. If that happens, your system is in even more trouble than I thought earlier. I figure a 1 million user system will have to handle 100 million incoming emails a day, most of them during the morning peak. So, the SAN drives have to keep revisiting the index even for reads, not just for writes. And all that just so you don't have to store the word viagra in a million different places. That is, DBFS will be slow unless it has some form of distributed filesystem capabilities.
You want to keep re-reading the index from disk, rather than from cache? That doesn't sound like the ideal or best case. The ideal is to avoid as much disk access as possible.
I will glean from this that there are no benchmarks yet...
But the disk head has to move far away from where it was just to update a tiny thing. That's fine for a one-user system, but for a large email system it can gum up the works very quickly.
Don't worry, I may be dense, but that much has sunk in. I can sympathize -- freeing you from your subdirectory inhibition has been just as difficult. :)
Yeah, it's either run in userspace or you will have a huge index (filled with stuff like viagra1, v1agra, etc.) that you can't swap out. But the downside is that you've replaced the base of your information pyramid (that ought to be stable enough to build on) with an unproven component. It also doesn't improve matters that this new filesystem is multithreaded if you're trying to debug a file or index corruption issue. While that's happening all 1 million users will be waiting for the system to come back up, because for the global index to remain consistent, the entire system has to go down if only a part of it has failed.
DBFS sounds great for a small syste
1,000,000 email users
About 70% of users use system daily (if this is a webbased email like Hotmail, this is a high estimate)
Each user reads 30 emails and sends 10
A grand total of 19,600,000 emails per day (this is a high estimate)
Only 226 messages either read or sent per second per day
@ 100k avg. per message storage is 1.8 gb per day or 666 gb per year if you don't compress
An Apple xServe dual G5 w/16gb of RAM and xRAID 5 TB will work just fine
For uptime, two xServes w/ a load balancer both can send and receive, filter spam
For the webmail piece, you can cache the whole website or poor man's way - make a ramdisk and copy the website to the ramdisk, whole site is cached.
QMail or Sendmail take your pick, parse incoming mail via php/shell/perl script and round robin mail to diff mailbox dirs on the xRaid, cakewalk.
Add a separate web server for added scalability
Replace filesystem storage w/ an additional 2 node Oracle RAC cluster w/ email data partitioned over the xRaid to take advantage of additional high availability & disaster recovery features.
Bottom line, if you have 20 admins for this and a room full of hardware, you need some more skills...hehe.
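If you want to rerun the sizing with your own estimates, the arithmetic is easy to script. This sketch takes the assumptions listed above at face value (70% of users active, 30 reads and 10 sends each, 100k per message) and additionally assumes that every message handled is stored, which is an upper bound. Plugging those in directly gives 28 million messages and about 2.8 TB a day, larger than the round figures quoted, so treat all of these as rough and check them against real traffic.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_load(users: int, active_percent: int, read_per_user: int,
               sent_per_user: int, avg_message_kb: int) -> dict:
    """Estimate daily message volume, rate, and storage from simple averages."""
    active_users = users * active_percent // 100
    messages = active_users * (read_per_user + sent_per_user)
    return {
        "active_users": active_users,
        "messages_per_day": messages,
        "messages_per_second": messages / SECONDS_PER_DAY,
        # Assumes every message handled is stored once (1 GB = 1,000,000 KB).
        "storage_gb_per_day": messages * avg_message_kb / 1_000_000,
    }


if __name__ == "__main__":
    for name, value in daily_load(1_000_000, 70, 30, 10, 100).items():
        print(name, value)
```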