Behind the Scenes at Hotmail
mallumax writes "ACM Queue interviews Hotmail engineer Phil Smoot on how they manage more than 10,000 servers spread around the globe. Between them, they process billions of emails per day and are overseen by hundreds of administrators. To do that they have returned to the command line. From the article: 'Our operations group never wants to rely on any sort of user interface. Everything has to be scriptable and run from some sort of command line'. The overriding philosophy seems to be KISS. Also: tape backups are out and spam levels have stabilized."
The overriding philosophy seems to be KISS.
Don't try to tell me that the guys at Hotmail only want to Rock & Roll all night and party every day?!?
He who knows best knows how little he knows. - Thomas Jefferson
...who do they call for support? :-)
If I recall correctly, wasn't Hotmail originally run on UNIX boxes?
Excuse my speling.
Making The Bar Project
"Between them, they process billions of emails per day and are overseen by hundreds of administrators."
And how does the NSA process all that email? Now THAT would be an interesting technical challenge!
What OS does it run on, and which web server? I am not trying to be funny.
One ring to bind them - should probably have more fiber and fewer rings in their diet.
They've gone back to the command line? I wonder if it's SFU (Services for UNIX) where they at least have bash, or if they're having to wear out the "\" key and give their right pinky-finger carpal tunnel? /P
Quo usque tandem abutere, Nimbus, patientia nostra? (How long, Nimbus, will you abuse our patience?)
... stands for "Keep it simple, silly", or "Keep it simple, stupid". There are other variations on the acronym but the general idea is the same.
There are 2 kinds of people in this world. Those that can keep their train of thought,
Isn't this the same Hotmail that started out running Linux or a UNIX?
...and Windoze died under the load that the *NIX environment handled with aplomb.
Then Redmond purchased it to try to show that Windoze scaled just as well as *NIX.
I guess they finally managed to migrate it to Windoze... I find it very amusing that they avoid the GUI and run a CLI/scripted environment. The more things change, the more they stay the same.
Not only are the questions well picked, but some of the answers are quite interesting. For instance, Phil on scalability: Before reading this article, I always had Hotmail pegged as a hacked-together e-mail system less organized than a monkey sh*tfight, but if Phil speaks the truth, I've underestimated them. They're a hacked-together server mess with a lot of effort put into staying afloat--and they have been doing well for a long time.
I guess I've always taken my free Hotmail account for granted.
My work here is dung.
I used to get about 35 spam a day in my primary hotmail account that I'd had since 1997. Now it gets about 4 a day so things have improved, but my biggest concern about Hotmail is that its virus scanning is horrible. There have been several times when it would have let me download a virus attachment, or allowed multiple obvious virus messages through. They've switched to Trend from McAfee, but I think the problem still remains.
Saskboy's blog is good. 9 out of 10 dentists agree.
From the article:
Hotmail relies on less than 100 system administrators to manage it all.
From the summary:
Between them, they process billions of emails per day and are overseen by hundreds of administrators.
Brought to you by the high quality control here at /.
BF Can you quantify in some way the extent of the spam problem?
PS It is massive. Years ago we saw as many as 3 billion incoming messages. This has declined, but the estimates are that 75 percent of all e-mail is spam. Over the past couple of years our techniques have gotten better, and our partnerships with other major ISPs have improved. I would say spam is still gross and abusive, but it hasn't been getting worse lately.
We do continue to react to spam on a daily basis as spammers continue to seek out holes in our defenses. What we see now is more sophistication in the spammers--more phishing schemes, people trying to get credit card numbers and that kind of thing.
But didn't he get the memo from headquarters? Bill Gates said there would be no more spam! They better get to work -- they're running out of time!
GetOuttaMySpace - The Anti-Social Network
> To do that they have returned to the command line.
Absolutely.
I'm currently in the process of trying to change our company culture away from legacy GUI tools and toward command-line tools.
Scriptability is a highly under-rated goal. I'm not against GUI tools -- but they need to be built on top of scriptable utilities.
It's worth noting that anyone in the IT field knows that the command line is much more powerful than any GUI. And let's not forget that it's just cool to show your friends how you can manipulate mainframe servers without a mouse. :-)
I always thought that the command line was a user interface. You know, interfacing between a user and a computer.
It's hard to picture using a computer without any sort of user interface. I'm pretty sure that, in order to call it "using" a computer, some sort of interface must exist, be it keyboard mouse and monitor, binary switch, light gun, real gun, neural link, telekinesis, or whatever. Otherwise, you're not using it, are you?
On the other hand, maybe the article is correct: a lot of operations groups probably don't want to use "any sort of user interface" to communicate with their computers. They want to be sitting on a beach in Tahiti drinking daiquiris, thousands of miles away from the computers they're supposed to maintain.
Can anyone tell me how to set my sig on Slashdot?
I am genuinely amazed that they need even that many system admins. That breaks down to only 100 machines per administrator.
I have worked on projects with that many hosts before and only had maybe 10 colleagues.
In the landscape of today's megaservices, Hotmail just might be Mount Everest
Is this true? I thought Google might be the Everest. Anyway, speaking from personal experience, at my university every student has multiple Yahoo/Gmail accounts but just a handful use Hotmail. Can someone shed light on the actual number of users worldwide?
Have you guys ever sat back and wondered what the world would be like without spam? Think of how much processing power the Hotmail servers have to throw at filtering out spam. I know our company blocks around 75% of all incoming mail with RBLs before it even gets into the system to be further processed by the anti-spam tools, and yet spam STILL slips by all that. Could you imagine having a physical mailbox absolutely filled to capacity every single day with junk mail, to the point where you have trouble sifting through it all to find the legitimate mail and bills?
Smile, it confuses people
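For the curious: RBL checks like the ones described above are usually DNS-based blocklists. The mail server reverses the connecting IP's octets, appends the blocklist's zone, and does a DNS lookup; if the name resolves, the sender is listed and the mail can be refused before any content filtering runs. A minimal Python sketch, assuming IPv4 senders; the zone name dnsbl.example.org and the function names are hypothetical placeholders, not any particular mail server's API.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a DNS blocklist.

    The octets are reversed and the blocklist zone is appended, so
    192.0.2.1 checked against zone Z becomes "1.2.0.192.Z".
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def dns_lookup(name: str) -> bool:
    """Return True if the name resolves, i.e. the address is listed."""
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


def is_blocked(ip: str, zones, lookup=dns_lookup) -> bool:
    """Refuse the connection if any configured blocklist lists the sender's IP."""
    return any(lookup(dnsbl_query_name(ip, zone)) for zone in zones)
```

Because the check costs one DNS query per zone and happens at connection time, it is far cheaper than running content filters on the message body, which is why it sits in front of the rest of the anti-spam pipeline.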
It's like one commercial after another. 'See how great we are!!'
Right... it's always more interesting to read article after article about only unsuccessful operations run by people who aren't proud of what they do, and don't face huge, global challenges.
You're cranky because it's MS. If exactly the same article ran, substituting "gmail" and "google" for all of the other names, you'd say, "cool!"
Don't disappoint your bird dog. Go to the range.
Is that a name? I thought smoot was a unit of measurement.
Fight Spammers!
"Those who don't understand UNIX are doomed to reinvent it, poorly."
From the article and elaborating on the
Q: Are there scaling reasons to think about the benefits of a command line for managing over a GUI, or are there other things to think about?
A: Our operations group never wants to rely on any sort of user interface. Everything has to be scriptable and run from some sort of command line. That's the only way you're going to be able to execute scripts and gather the results over thousands of machines.
Also, we all remember the scaling issues that MS had when they took over Hotmail and initially tried to switch from FreeBSD to Windows.
MS had to port over cron jobs because cron is not something that is installed and used by default under Windows as it is under UNIX. They had to rewrite the "inefficient" Perl code that ran fine on FreeBSD in C++. They had to redo the memory allocation to prevent memory leaks in the new C++ code. Read about it from the goat's mouth: http://www.microsoft.com/technet/interopmigration
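The answer quoted above says the point of a command line is being able to "execute scripts and gather the results over thousands of machines." A minimal Python sketch of that fan-out/gather pattern follows. The article doesn't describe Hotmail's actual tooling (and on Windows it would not be ssh); here ssh with key-based auth is an assumption for a Unix-style fleet, and fan_out and ssh_runner are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(hosts, run_on_host, max_workers=32):
    """Run run_on_host(host) on every host concurrently and gather the results.

    Returns (results, failures): results maps host -> return value,
    failures maps host -> the exception raised, so one dead machine
    doesn't abort the whole run.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_on_host, host): host for host in hosts}
        for future in as_completed(futures):
            host = futures[future]
            try:
                results[host] = future.result()
            except Exception as exc:  # record the failure and keep going
                failures[host] = exc
    return results, failures


def ssh_runner(command, timeout=30):
    """Build a per-host function that runs `command` over ssh (assumes key-based auth)."""
    def run(host):
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, command],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return proc.stdout.strip()
    return run
```

Usage would look like `results, failures = fan_out(["web01", "web02"], ssh_runner("uptime"))` (hostnames hypothetical). Keeping failures separate from results is the part that matters at scale: with thousands of machines, some are always down.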
I can't wait until FreeBSD and other inferior OSes get tools to find memory leaks. One day....
(That last line was sarcasm and not a flame).
What's even more funny is that they won't rely on any user interface (that's what the article says). Because a command line isn't graphical, it's not a user interface?
Oh, I see. The command line, which works in a script, is a programmer interface. Programmers aren't users, of course.
Looks like the site is down; there is, however, a Coral Cache copy.
You might want to double-check your cut-and-paste.
I read somewhere, not too long ago, that Hotmail was set to get a new and improved user interface that would look a lot like Outlook. I haven't seen or heard anything since, certainly not on my Hotmail account. Can anyone shed some more light on this rumor? When, where and for whom is this update coming?
Or am I just being delusional again?
To or from? :-)
So long and thanks for all the spam
So sad that it should come to sham
Could anyone suggest a better rhyme for spam?
Why does he keep mistaking the word "use" for the word "leverage"? The only possible advantage I can see in substituting "leverage" is that it implies they are making the best use of these tools that they can. But most people would already assume they are not making the worst possible use of them, so it's interesting that the author feels it necessary to make that distinction.
One of the funniest bits of trivia about Hotmail is that, for a long time, it ran entirely on *BDS, even after it was bought by Microsoft.
I suppose they have changed to W2003 by now, but the image damage was done.
--
Superb hosting 20GB Storage, 1_TB_ bandwidth, ssh, $7.95
Huh? I recall my hotmail account having a "Find" button for a while now. Right in between the "Junk" and "Put in folder" buttons. Go ahead and have a look.
It's interesting, but for some specific uses, IIS does a great job of handling traffic. For example, streaming video from servers seems to run a lot better on IIS and to be a little less resource-intensive. I'm not sure about Hotmail as a whole, though.
[%] Cingular Ringtones
I think you're missing the point that these servers are geographically separate, and it may be worth the "inefficiency" of having a full-time or on-call administrator near each Hotmail colocation facility. If a cluster of servers in Egypt (just to pick a random country) became inaccessible, you wouldn't want to fly out an admin posted in England, even if it is only a few hours' flight. It's worth it to hire and train a local presence.
Except if you RTFA, you'll find it's less than 100. Dumbass.
Or is it just me?
I have to use a different account to keep in contact with those friends of mine who still use Hotmail and won't switch (mainly because they'd lose their friends who use Hotmail).
more than 10,000 servers spread around the globe ... are overseen by hundreds of administrators.
Heh. I used to work at Akamai which provides content delivery services for many of the biggest sites on the web. They have somewhere over 15,000 servers that are managed by tens of administrators, not hundreds. In fact, a typical NOCC (yes, 2 'C's for Akamai) shift at Akamai is only staffed by 8 or so people, with only a couple of senior level admins on call. And they're delivering all sorts of web-based content, including streaming, not just e-mail.
But then Akamai runs them all on Linux, whereas I believe Hotmail is all Windows-based. You do the math.
At first, it was BSD running on a bunch of identical custom-made sub-1U servers. But No! Then it was replaced by windows boxes . . . racks and racks of 99c Fry's keyboards velcroed to the backs and fronts of racks, with miles of small-gauge track, upon which ran diabolical steam-powered robots, each with a single arm and with fingers at the end, forever fixed at the precise spacing to stab the keyboards' CTRL-ALT-DEL keys. Noisily the robots rumbled back and forth on their appointed rounds . . .
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
The administrative mantra is to automate.
"they process billions of emails per day" and probably most of it is spam. who in his right mind uses a hotmail account anyway?
IAAL
He submits, of course, that any program can be written in any reasonable language -- for they are all, after all, Turing-machine equivalents. But the quality of the tools can make the difference between a feature being added next week and not at all.
If Hotmail's admins are back to the command line and scripting anyway, maybe they should have stuck with FreeBSD.
Look at how quickly Google is rolling new things out -- their platform allows them to.
In Soviet Washington the swamp drains you.
Surely the command line is a User Interface?
This apparently appeared in "People Weekly", April 24, 1989, v. 31, p. 93+
Harvard Bridge spans the Charles River linking Boston and Cambridge. In 1958 Lambda Chi Alpha took 5' 7" MIT freshman pledge Oliver R. Smoot, Jr. and rolled him head over heels the entire length of the bridge. Every ten smoots they calibrated the bridge, painting marks. The bridge was found to be exactly 364.4 smoots plus an ear. Successive pledge classes repainted the markings.
In 1987 the Mass. Dept. of Public Works decided the concrete of the bridge was due for replacement. They had no plans for smoot preservation. The Boston Press tracked down Oliver R. Smoot, Jr., who was then age 48 and executive vice president of the Computer and Business Equipment Manufacturers Association in Washington, D.C. He had no plans of being reused for new markings.
The Mass. Metropolitan District Commission, the government body in charge of the bridge went on record in support of smoots. They stated, "We recognize the smoots' role in local history. That's not to mean that the agency encourages graffiti painting. But smoots aren't just any kind of graffiti. They're smoots! If commemorative plaques and markers are not installed by the state once the bridge work is done, then we'll see that it's done."
Stephen Smoot, a son of Oliver R. Smoot, Jr., then age 21 and attending MIT, was ready to redo the smoot measurements, although he was 5' 11", so everything would be off.
There are a couple of pictures of Oliver R. Smoot, of MIT students ready to redo measurements with Stephen Smoot, and of a plaque that reads:
"This plaque placed in honor of THE SMOOT which joined the angstrom, meter and light year as standards of length, when in October 1958 the span of this bridge was measured, using the body of Oliver Reed Smoot, M.I.T. '62 and found to be precisely 364.4 smoots and one ear. Commemorated at our 25th reunion June 6, 1987 M.I.T. Class of 1962"
Another clipping states that the Mass. Dept. of Public Works gave two Smooted sections of sidewalk to the MIT museum at a ceremony. Continental Construction Company of Cambridge also agreed to make the new concrete sidewalk slabs 5' 7" long to coincide with the Smoots, instead of the usual 6' increments.
I'm sure Phil still hears about this
"It is a greater offense to steal men's labor, than their clothes"
I can't wait until FreeBSD and other inferior OSes get tools to find memory leaks. One day....
You don't think UNIX apps written in C++ leak like their windows counterparts? The problem there wasn't the operating system, it's that C++ should only be used by experts because it's so %^$^ hard to get right!
They tried to speed things up by going from Perl to a compiled language. If that was the true bottleneck and they were spending all their time in the Perl interpreter, the mistake was not that decision; the mistake was the choice of compiled language. (C++ is %^$^ hard!)
Don't get me wrong... I love C++. I also love to drive cars. I don't let my son drive cars OR code in C++ because both require training to be safe. Coding C++ is like driving a Formula 1 racer... one wrong move and BLAMMO...
XML is known as a key material required to create SMD: Software of Mass Destruction
Indeed it does! Though it is still filtering emails from competing services..
Maybe I didn't catch it, but I RTFA and didn't see the specific numbers of 10,000 servers or "hundreds" of admins mentioned. IIRC, it said they use fewer than 100 admins and several thousand servers, or language to that effect.
... and all the other junk they throw at you now. Yes, it used to be an enjoyable experience before Microsoft bought them. No SPAM, no cookies, no JavaScript, and it worked in any browser without problems. Ads were scarce - you'd never see more than one on a page. POP3 and auto-forwarding were standard features at no cost whatsoever. Heck, I was away for 4 months and still had my account when I returned. Boy, have things changed....
Slowly they took away each good feature one by one. The real nail in the coffin for me was when they deleted all my sent messages out of the blue and instituted that as their new policy. I left hotmail, went to Yahoo Mail, and never looked back. Why on earth does anyone put up with hotmail, when there are far superior alternatives available, like Yahoo and Gmail?
The Boston police have been known to use smoot markers to indicate accident locations on the bridge. Apparently Smoot's experience as a unit of measurement led to a life-long career; he eventually became Chairman of the Board of the American National Standards Institute, and later President of the International Organization for Standardization.
My favorite: "geo-distributed data centers when the speed of light becomes a factor" - that's a keeper from the Microsoft dustbin of bad apps (and bad writing).
Microsoft programmers seem to learn everything anew [and sadly, to create their own terminology for standards and terms already defined and accepted by standards groups].
They could have referenced what others had done, could have paid attention to the W3C groups and learned, like the rest of us did, about REpresentational State Transfer (REST) and the principles upon which the WWW was architected. But instead they usually recreate all such effort. The sad (lucky?) thing is they're not finished yet; they still lag behind.
"Embrace & Extend" in execution becomes "Misunderstand & Misapply".
Who needs Hotmail anyway? Just use Gmail (99% spam free) and be happy.
Hotmail can no longer be used with Outlook because Microsoft admits it's easier to block everyone and charge for access rather than cancel the accounts of spammers.
As an aside, I know this guy (Phil Smoot). He worked pretty closely with my dad at PG&E, and I remember when he was working on TerraServer. My dad says he's a hell of a tuba player. Check out the credits: http://www.cellophanesquare.com/item_music.asp?mg=15&id=R+++197127
The majority of people have a Hotmail account just to use MSN Messenger. In Europe everyone uses MSN Messenger.
Yet despite the talented people working on Hotmail, they still fall flat on their face in two apparently challenging areas:
1.) Logging in. You would think that since I already typed hotmail.com in the address bar, I wouldn't have to type "@hotmail.com" in the log-in form, but alas, the solution has eluded them. In fact, it seems to have escaped them altogether, since it used to be that way. Apparently having separate hotmail.com and msnmail.com logins, storing a cookie, or even just having a radio button is beyond the limits of their servers. The extra 12 characters I have to type wouldn't bug me so much, except that there's no logical reason for them.
2.) Logging out. The msn.com page that you're redirected to when you log out ranks almost as low as the AOL page that pops up when you log into AIM (a separate problem that can be solved by using GAIM) in terms of usefulness. The pointless crap they pass off as news on that site drives me up the wall (TomKat Wedding Colors to Include Fuscia, Poll: Will an asteroid hit the earth in 2029? blah blah blah). All I want to do is delete my spam, but I have to put up with this in order to do it.
With crap like that, I'm often tempted to ditch Hotmail. If they can't take seriously their role as a dustbin for email lists that I don't care about, I see no reason why I should bother to use their 250 MB of email storage. Oh wait, it's free... right.
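The log-in complaint above is easy to make concrete: if the user types no domain, the form could fill in a default instead of rejecting the bare name. A minimal Python sketch of that idea; normalize_login and the idea of feeding default_domain from the current site, a cookie, or a radio button are hypothetical illustrations, not how Hotmail's sign-in actually works.

```python
def normalize_login(login: str, default_domain: str = "hotmail.com") -> str:
    """Return a full sign-in address, appending default_domain when none was typed.

    default_domain is assumed to come from the site the user is on,
    a remembered cookie, or a domain picker (all hypothetical here).
    """
    login = login.strip()
    if "@" in login:
        return login  # user already typed a full address, e.g. someone@msn.com
    return f"{login}@{default_domain}"
```

For example, `normalize_login("jdoe")` gives `"jdoe@hotmail.com"`, while `normalize_login("jdoe", "msn.com")` gives `"jdoe@msn.com"`.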
The story is not about mail servers. It's about Microsoft being great.
The story they want you to pick up here is twofold.
1) Hotmail runs on a Microsoft OS now and definitely not BSD
2) Our admins write scripts and use the command line just like you might on UNIX/Linux
You're meant to be amazed and think, "Hmm, that Microsoft stuff must be just as good as UNIX/Linux, then. Maybe I'll put off the switch."
Don't be a mug.
I think M$ has plenty of information to create a true analysis. That would paint a great before & after picture for all of those people that want to migrate from Linux to Windows.
I can see the headlines now: Migrate to Windows, increase your server footprint by 1000% and your IT force tenfold!
Anyone that knows anything knows that Hotmail does not have any bragging rights.
just use some kind of *nix box and we all here at /. will love you. Until then ....
Most system administrators tend to spend the majority of their time doing semi-repetitive tasks, with relatively little variation in these tasks, which is obviously best geared towards a command line. This makes command line scripts that take arguments (to customize the action) much easier than a GUI.
On the other hand, most software engineers and several other disciplines have to deal with tasks that are vastly different, so that command line use to do these tasks would be a nightmare compared to a well-designed GUI. The bottom line: the repetitiveness of tasks determines the optimal choice of GUI vs. command line use.
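The point about repetitive tasks is easiest to see with a small script whose arguments customize one routine action. A minimal Python sketch, assuming a hypothetical "list (or delete) files older than N days" chore; the script's option names are illustrative, not taken from the article.

```python
import argparse
import os
import time


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        description="List (or delete) files older than a given age.")
    p.add_argument("paths", nargs="+", help="files to check")
    p.add_argument("--max-age-days", type=float, default=7.0,
                   help="age threshold in days (default: 7)")
    p.add_argument("--delete", action="store_true",
                   help="delete stale files instead of just listing them")
    return p


def stale_files(paths, max_age_days, now=None):
    """Return the paths whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    return [p for p in paths if os.path.getmtime(p) < cutoff]


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    for path in stale_files(args.paths, args.max_age_days):
        if args.delete:
            os.remove(path)
        print(path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Once the action is a command with flags, running it on one box, from cron, or across a whole fleet is the same invocation; a GUI dialog offers none of that.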
Compared to Google's clusters, they seem to be light years behind. As a software developer, I can tell you that the key to rolling out applications quickly is to have a decent framework in place. Whatever that framework might be (from shell scripts to Java monstrosities), once it's in place, developing apps on top of it is easy. Similarly, a well-thought-out app execution environment is golden.
If you ever check out Google's MapReduce, you'll see what I mean. It's so well thought out and so elegant that it's easy to believe they can scale outwards forever. You'd not be too far off if you thought that Microsoft were rethinking their whole production environment to compete with Google.
There's no way that Microsoft can quickly and easily roll out vast new applications that scale, because that whole clustering framework is completely opposite to what Windows provides.
Newsfollow.com
Spam, lovely Spam, wonderful Spam!
http://en.wikipedia.org/wiki/Spam_(Monty_Python_sketch)
http://en.wikipedia.org/wiki/Smoot
I do this kind of thing for a living. We have zero people regularly staffing lights-out colo cages all over the world. We're not within a few hundred miles of any server. On the odd occasion that we have to reseat a blade, physically unplug a server or replace a patch cord, we contract someone locally to do that under our supervision.
The only exception to this is where local law requires us to do this as in some EU countries.
But WTF do I know, I get modded for trolling and I've only been in IT for 25 years?
SFU is a port of OpenBSD's userland to Windows. As such it uses ksh, not bash.
I was a strong hotmail user before Microsoft took it down, uh, I mean took it over.
It was a great service! One of the first, and probably the best.
Microsoft took it over and there was no advancement or innovation for years (a decade?). Spam ate up my tiny inbox while Microsoft just threw MSN graphics all over the place.
When Gmail came out, I gave it a try. It was everything Hotmail could have been years ago if it hadn't been bought by MS! (Well, it COULD have been out of business, so I've got to give them that I suppose).
They forced Microsoft to pay a little attention to features. They gave out a little more storage and started blocking some spam, but it was too little too late.
In order to write this I decided to visit my hotmail inbox, I haven't been there for a while. 136 emails, and 43 have been detected as junk. They are ALL junk--A party invite from "heather", a Cola Quiz, etc. 136 undetected junk emails out of 179.
And even at that, they still only give 1/8 the amount of storage that Google does.
Crap, on top of that I just looked at a spam with pictures in it and it didn't auto-block them like Google does. Now I'm probably infected.
Thanks Microsoft!
From,
The guy who used to argue the advantages of Microsoft to the Unix admins...
Comparing Akamai with Hotmail is a no-go; think of handling spam, for example.
Cheap MS bashing, don't reward it.
Repeat after me: We are all individuals
Why don't you RTFA first, the summary was incorrect. He said "LESS THAN 100" admins.
"Network Operations Centre" Centre?
I think BDS came out of Utah in the 60's.
You remember Utah in the 60's? Back when Spock was taking too much LDS?
Sigh, I've sunk to stealing jokes from William Shatner.
1. I'm not english
:), I was a little pressed when I wrote that post...
Well I'm sorry for all the mistakes
BTW I do not think its much worse than the Slashdot average.
Pay no attention to that man behind the curtain....
Every time you call tech support, a little kitten dies.
at 99.9%. They call it 75% because they don't count those that have paid M$ for circumvention of their spam filters. They're UCE's.
(In case you think I'm kidding, do some searching - on Google, not on MSN, it won't show up there. One of the ways they turned Hotmail into a profit centre was by selling permission to spam their customers).
Not much more than what TFA describes. The heart of the article (for me as a system administrator and "architect"):
"PS If you rely on scale up, you'll probably get killed. You should always be relying on scale out."
So they are thinking about this at MS. It may not work as well in Windowsland as in BSDland right now, but they have their all-seeing eye turned in that direction.
And when Google says "large number of commodity machines" (as in your mapreduce link) I suspect they are just being their usual smart-ass selves. Those "commodity machines" are no doubt just as optimized, scrutinized, and standardized as Hotmail's.
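"Scale out" means adding more machines and spreading users across them rather than buying a bigger box. The article doesn't say how Hotmail assigns mailboxes to servers; consistent hashing is a standard technique for this, swapped in here purely as an illustration. A minimal Python sketch, where HashRing, the server names and the vnode count are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable across processes and runs, unlike Python's built-in hash().
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping keys (e.g. mailbox names) to servers."""

    def __init__(self, servers, vnodes=100):
        self._ring = []  # sorted list of (point, server)
        for server in servers:
            self.add(server, vnodes)

    def add(self, server, vnodes=100):
        # Each server owns many points on the ring to even out the load.
        for i in range(vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def lookup(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (_hash(key),))
        return self._ring[idx % len(self._ring)][1]
```

The design choice that matters for scale-out: when a server is added, only the keys that now fall on its new points move to it; every other mailbox stays where it was, so capacity can grow one commodity box at a time.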
If my memory serves me correctly, Microsoft bought Hotmail from Steve Jurvetson for $300mln. http://www.dfj.com/team/steve_bio.shtml
Per Aspera Ad Astra. (Through hardships to the stars.)
What do we know about Google's architecture? Not much more than what Hotmail discloses about their architecture in their article.
I'm not trying to be an advocate one way or the other - I think both Google and Hotmail are doing the best they can with the architectures they are constrained to. Hotmail has to eat their own dog food, for better or for worse.
(just a little over a year ago) At a fairly new data center, Hotmail's backend still had a LOT of Sun Enterprise 4500 boxes running Solaris. None of them were being phased out at all. But, all of the boxes that were being brought online were HP/Compaq boxes running Windows.
10,000 servers running windows, all connected, using IIS, and AD and they just figured out that the command line is "kuhl".
It's for breaking down large datasets and processing them in chunks.
I use MapReduce in Nutch to run mozdex.com; does that mean I'm more efficient than Microsoft?
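To make "breaking down large datasets and processing them in chunks" concrete, here is the classic word-count example of the MapReduce model in a minimal, single-process Python sketch. It only illustrates the map, shuffle and reduce phases; it is not Google's or Nutch's API, and a real framework would run the map tasks on many machines and handle failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    # Emit (word, 1) for every word in one chunk of input.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine every value for one key into a single result.
    return key, sum(values)


def map_reduce(chunks):
    # Each chunk could be mapped on a different machine; here it's sequential.
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

For example, `map_reduce(["spam spam eggs", "Spam ham"])` counts "spam" three times. Because map_phase looks at one chunk at a time and reduce_phase at one key at a time, both can be spread across as many machines as there are chunks and keys.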
We do continue to react to spam on a daily basis as spammers continue to seek out holes in our defenses.
After I gave up Hotmail for Gmail, I selected the option on my Hotmail account to only receive mail from my Hotmail contacts. Even so, I still receive a steady stream of spam on that account.
Badly.
That's how they do it.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
"Beware of he who would deny you access to information, for in his heart he dreams himself your master."
Yes, Interix produced SFU. And starting with 3.0 it uses an OpenBSD userland, like I said already. Download it and check for yourself with a "cd /bin && strings * | grep OpenBSD". I guess just talking out of your ass is easier though huh?
Good fucking christ dude, you are repeating the same, completely irrelevant nonsense over and over. Nobody gives a fuck about your ancient copies of Interix; we are talking about SFU.
"Since purchasing Softway Systems, Microsoft has screwed with and crippled the software product formerly known as Interix."
No, they have included it as part of their SFU package, along with the OpenBSD userland. Again, which part of this is difficult for you to grasp? You can confirm this from interix.com and microsoft.com; do so instead of repeating your crazy nonsense.
And which part of "SFU uses the OpenBSD userland, including pdksh" is in any way wrong? It's an easily verified fact, and your repeating things that have nothing at all to do with the subject won't change that. And how on earth would that make me a Microsoft astroturfer? This is the single most bizarre lack of logic I have seen on Slashdot, and I have discussed the world being 10,000 years old with a Christian fundamentalist whackjob. So congrats on the severe brain damage anyhow.