Debugging Microsoft.com
teslatug writes "Channel 9 has an interesting video interview with Chris St.Amand and Jeff Stucky who test and debug Microsoft.com. They reveal some of the big problems they used to face such as recycling processes every 5 minutes due to memory leaks and 32 bit limitations, and being unable to push more than 10 Mbits of data to their datacenters due to Windows' networking stack limitations."
WMV? You serious?
How the hell am I supposed to watch that?
The summary is missing the fact that many of their problems went away after upgrading to an early 64 bit version of Vista with its improved networking stack.
Help Brendan pay off his student loans
Why don't they just migrate to Apache on OpenBSD? :)
Oh, right...
1. mplayer
2. xine
Not that tough, really, now is it?
The secret to creativity is knowing how to hide your sources. -- Albert Einstein
I suppose that, transitively, it is due to a limitation in an archaic version of the BSD stack.
"Strangers have the best candy" -Me
Hey, Microsoft has to eat their own dogfood if they want to keep some modicum of credibility no matter how bad the food tastes...
Is anybody really surprised here? What they didn't tell us is that there's a top-secret Debian redundancy server running behind it just in case all hell breaks loose. Nothing to see here, move along.
Is that not one of the most ironic things you've ever heard? The limitations of the operating system made by the same company holding back another division? Shock and awe.
The limitations discussed in the video of the Windows TCP stack are not limited to Windows. These are limitations imposed by a to-the-spec implementation of TCP. TCP is 30+ years old, and it wasn't designed for the kinds of networks it runs on today.
The new TCP stack in Vista effectively implements TCP in such a way that it removes these limitations while preserving compatibility with old stack implementations.
Interviewer: "Hey dude."
Chris St.Amand "What up bro"
Interviewer: "So like what happened when you worked on microsoft.com? Oh but first...Did you get all the chicks at the bars when you mentioned your job or what?"
Chris St.Amand "Oh totally. I'd just say, 'what up babe. I work on the microsoft.com web portal' and she'd defrag my hard drive all night."
Interviewer: "Sweet. So what was your biggest hurdle writing all that HTML? After all that's a complicated language to master."
Chris St.Amand "It'd definitely have to be that F'ing page not found shit. You don't know how many times I'd go to microsoft.com after doing a big update and it'd just say four-oh something and the page just wouldn't show up. You know we tried to put up a 420 page not found but got in trouble with our boss."
Interviewer: "Yea totally! That would have been cool. Oh ummm let's see here. So what other problems did you have?"
Chris St.Amand: "Not being able to use FreeBSD to serve that shit. When I first heard I actually had to use Microsoft I was completely like, 'Not cool Bill. Not F'ing cool, Bill.'"
Interviewer: "Any thing else? Like was it hard to get up every day in the morning knowing that your existence was updating microsoft.com HTML?"
Chris St.Amand: "Yea I tried suicide a number of times. But then I discovered that I could just completely make up new HTML tags and that was a lot of fun."
Interviewer: "Make up HTML?"
Chris St.Amand: "Oh yea, we're Microsoft. When I first started they told me that no other browsers exist other than that big blue F'ing E and that no other operating systems exist. And that I could do whatever I wanted to do. So I just started making up *ALL KINDS* of crazy ass HTML."
Interviewer: "Cool dude. You rock. Anything else you want to mention?"
Chris St.Amand: "Yea you know all that crazy F'ed up HTML that all of our products output? You know without indentation and messed up question marks everywhere? That was me. I was all hung over the day I added that. And that's about it."
Interviewer: "Thanks Chris, I'm sure you'll go down in infamy for such a piece of F'ing shit web page and end up in some lame ass 'Don't write web pages like this' hall of fame."
Chris St.Amand: "Peace out and remember to eat your greens not smoke 'em!"
They should be redesigned.
That's a big problem with software made by companies:
1 - The company's cash flow is based around selling new versions of the software
2 - They can't sell their customers improvements that the customers can't see
3 - There is a fixed amount of time that can go by between one release and the next
4 - Resources are limited
Because of this, a major redesign won't be profitable: only the advanced users will notice the changes, 99% of their customers won't, so the software won't sell well. Bug fixes won't sell either, because they are also invisible to the naked eye of the majority of the userbase, and customers expect those changes to be free.
So some companies can only expect revenue from a given piece of software once a year, and they have to invest a limited set of resources in it over, say, 6 months, after which they have to freeze the feature set so they can start debugging. Seeing which things sell, they will obviously focus their attention on new features and a nicer GUI.
OTOH, a project that doesn't have a company running it can just put out lots of upgrades when needed and focus its time on making the software better, even if some of the changes won't be seen by most of its users.
With software prices dropping, and Free Software proving to be a better option, the budgets of software companies will be even more limited, and we won't see this situation changing anytime soon.
WTF am I doing replying to an AC at 5 A.M on a Friday night?
Slightly off topic, but the new Windows TCP stack will implement Microsoft's new Compound TCP, a.k.a. CTCP. More information can be found here:
http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=940
Am I the only one who looked at the title and thought: "debug microsoft.com? Who still uses .com files any more?"
Yup, thought so. I suck.
They reveal some of the big problems they used to face such as recycling processes every 5 minutes due to memory leaks and 32 bit limitations, and being unable to push more than 10 Mbits of data to their datacenters due to Windows' networking stack limitations."
Micro$oft needs 64 bit so it can leak more memory faster and stay running. Or at least this is how I read this.
As for the 10 Mbit/s limit, maybe they should put a Linux/BSD/UNIX cache in front of those servers like MSNBC did to get through the last Olympics.
Absolutely true. I used to work for a hosting company, we had GNU/Linux and Windows servers. ...
The GNU/Linux servers were the ones with more hits, and the ones that required less attention. The Windows servers were a Pandora's box of problems. IIS just can't hold up by itself. If you just serve static pages you are OK, but when people start using that asp + odbc shit, you have to restart IIS every 5 fucking minutes. We used to receive a stupid "too many connections" from ODBC in our log, and restarting the stupid services wouldn't do a damn thing. All you could do was restart the machine. Yes, restart a SERVER. That's about the worst thing a sysadmin can go through: the panic of not knowing if that crappy Windows was going to come back up or not. OTOH, our GNU/Linux machines, with sites running a variety of CGI apps (PHP, Perl, etc.), all using MySQL, supported 5 times the load of the Windows machines without complaining, and I'm talking about 300 sites on simple x86 hardware, less powerful than the hardware in the Windows machines, which died with fewer than 100 sites.
Well, do you think they want to give us Linux users the satisfaction of seeing Microsoft employees admitting faults in their software?
Microsoft does this all the time. They call it eating their own dogfood. In a way, it's quite smart actually. One, it shows customers that they aren't afraid to run their own product. Two, it helps them learn how to use and support their products in a large network. And three, it helps them find defects in the software.
Geek used to be a four letter word. Now it's a six-figure one.
According to Microsoft, the MSN Messenger service (which serves around 70 million people) used to run on 250 32-bit servers, and now it runs on just 25 or something like that... (apparently one of the big reasons was the limit on the number of TCP connections).
It's quite amazing to think that a service as huge as messenger can run on just 25 servers!
The AACS key is NOT 0xF606EEFD628B1CA427BEA93A9CA9773F
The following is just hearsay, as I've never actually worked for MS. But a couple of engineer buddies I used to work with did some subcontracting for MS, and they said they deployed a whole lot of internal-facing *nix servers during that period. I tend to believe it, because the MS security guys who taught some seminars I attended wouldn't confirm or deny that they used any Linux internally. If they could have denied it in clean conscience, wouldn't they have done so emphatically?
Working in a DevOps shop is like playing in a band made up entirely of keytarists.
IIS just can't hold up by itself, if you just serve static pages you are ok, but when people start using that asp + odbc shit, you have to restart IIS every 5 fucking minutes.
That's not because of IIS; it's because of the people writing the ASP apps and stupid admins not configuring IIS correctly. If you have stupid people writing applications, those applications have a tendency to do stupid things. Combine that with admins who don't properly isolate the applications running on IIS and you've got a recipe for requiring an IIS restart "every 5 fucking minutes".
Give me 5 minutes and I can write a nice app that takes down Apache no problem. A few infinite loops, perhaps each creating a dozen new database connections and allocating a massive string buffer in memory.
IIS 6.0 has a lot of features built into it that allow admins to configure application pools to isolate applications more effectively. You can configure those application pools to recycle automatically given certain criteria (like memory usage, CPU usage, # req/sec, # req/total, etc.), and the pools are isolated from each other so that if one dies due to a misbehaving application, the other applications on the system are not affected.
We used to receive a stupid "too many connections" from ODBC in our log, and restarting the stupid services wouldn't do a damn thing, all you could do was restart the machine, Yes, restart a SERVER.
Perhaps that's all you could do, but somebody who spent more than 10 minutes reading about administering IIS would know to recycle the ODBC COM+ application to clear out the connection pool. Then they would find the stupid people writing those crappy applications and fire them, or at least isolate their applications in a separate app pool or worker process. (Dllhost.exe.)
Spare me the anecdotal stories of your LAMP solutions doing so much better than your Windows solutions. You have absolutely no credibility given your complete ignorance.
Hmm, nearly-direct link to a 145-megabyte video file on the /. front page, posted right as the geeks of the world are getting home from work. What are you, crazy? Are you trying to Slashdot Microsoft?
Don't answer that.
"In a 32-bit world, you're a 2-bit user. You've got your own newsgroup, alt.total.loser." -Weird Al
Watch the video. That wasn't the problem.
The problem was connecting two datacenters that were physically separated by a long distance but connected with a high bandwidth pipe... the TCP protocol has problems with this because of latency issues.
Read this to see how they solved it.
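To put a rough number on why latency hurts here: a single TCP connection can have at most one window of data in flight per round trip, so throughput is capped at window / RTT no matter how fat the pipe is. A minimal Python sketch follows. The 80 ms round trip is a made-up figure for illustration, not something from the video, and 65,535 bytes is simply the largest window possible without TCP window scaling.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP connection: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


# Assumed values, for illustration only (not taken from the video).
WINDOW = 65_535   # bytes; largest window without the window scaling option
RTT = 0.080       # seconds; hypothetical round trip between two datacenters

limit = max_tcp_throughput_bps(WINDOW, RTT)
print(f"{limit / 1e6:.2f} Mbit/s")   # prints 6.55 Mbit/s
```

With those assumed numbers the cap lands in the single-digit Mbit/s range regardless of link capacity, which is the kind of ceiling the summary describes.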
Bandwidth vs Latency.
:)
Take a truck. A huge one. Fill it up with recorded DVDs and send it a hundred miles.
You'll have huge bandwidth.
But wait, somehow a DVD got lost in transit. What now?
You have to phone back and have a taxi pick up and deliver the missing DVD.
As you need that last DVD, you'll have to wait. Your bandwidth decreases.
Missing a DVD is pretty costly for you.
So you decide to load only a hundred DVDs per truck and use multiple smaller trucks. But this time none go missing, so you spent a lot of money on the extra trucks.
This issue is somewhat similar to Heisenberg's Uncertainty Principle. You cannot get maximum bandwidth and minimum latency.
Linux can respond faster if it has to. OS X doesn't do that because it does not want to.
It can also respond slower:
$ cat /proc/sys/net/ipv4/icmp_ratelimit
250
Tune it as you wish.
Yes, I had some beers today, and what?
No, no, no... they can saturate a 10MB/s connection easily. What they had problems with was database connections over a long distance (a problem with TCP, not Windows)... which they rectified (using a concept called CTCP). Check this paper out: http://research.microsoft.com/research/pubs/view.aspx?type=Technical%20Report&id=940
-everphilski-
You could scan through all of my old posts for background if you like, but back when NT 4.0 was brand new, I helped to save a failing ISP (for at least the next 6 months or so) by setting up a new mail server to replace the one that was failing every 2 to 10 minutes. I used a machine with less than half the power and resources of the machine already running... and loaded Slackware. I think the kernel was just over 1.00 at the time.
...I guess I've repeated enough digs on microsoft for one posting...
Yeah, "old technology" couldn't do anything better than new stuff like NT right? Come to think of it, there's not a LOT of difference between XP's kernel and NT's from what I understand... a few bug fixes here and there... but basically, it uses the same vulnerable messaging scheme and drivers running at ring-0 and all that.
After having this video playing in the background for awhile, one interview question caught my ear:
"So is your security getting better?..."
As an aside, it's funny to hear them concede that they're actually having to adjust for other browsers visiting their home page.
"Use standard-compliant code? Heresy!..."
The one on the left is Coke, the other 3 are Red Talking Rain. Personally, I'm a Green Talking Rain programmer, but I can respect the other side :) Talking Rain (particularly green) is the nectar of the programmers here in Seattle.
:)
You see, Microsoft started this great thing a few years back where every floor was stocked with 2 giant refrigerators of free soda. The rest of the local software companies quickly moved to copy this ingenious move, so you can't program and not be in contact with all the free soda you can drink. This sounds pretty cool until you've done it for about 2 years. At that point, assuming you are not a natural soda addict, the last thing on earth you want to drink is any kind of beverage with sugar in it, because you are so unbelievably sugared out. In comes Talking Rain. Talking Rain is a simple carbonated spring water, with just a hint of fruit oil added, and no sugar. Green Talking Rain adds lime oil, and Red Talking Rain adds raspberry, I think, although being a Greener myself, I never really paid attention. The fact that only senior programmers have completed this Talking Rain pupation allows you to glance at the trash can in someone's office and peg them as a Senior or Junior level developer. You will almost never see a Junior level developer drinking Talking Rain, and almost never see a Senior level one NOT drinking it. Kind of a free soda pecking order.
Of course I may be reading too much into this, but my Greener roots run deep.
Latency and bandwidth are not orthogonal when you have flow control. Try looking up 'bandwidth-delay product' and TCP windowing. To achieve 1 Gbit/s to Mars, the sender has to buffer every unacknowledged byte in case of packet loss. Available memory will throttle your throughput.
A quick web search says round trip times to Mars are between 10 and 50 minutes. Say 60 minutes * 60 seconds = 3600 seconds, which at 1 Gbit/s means 3600 gigabits (about 450 GB) of window space to achieve full line rate. Now consider some minor packet loss, and even with SACK you're buffering an unreasonable amount of data.
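For anyone who wants to check the arithmetic, the bandwidth-delay product is just link rate times round-trip time. A quick Python sketch, using the 1 Gbit/s rate and the round-trip figures quoted in this comment (the function name is mine, purely for illustration):

```python
def bandwidth_delay_product_bits(rate_bps: float, rtt_seconds: float) -> float:
    """Bits that must be in flight (and buffered by the sender) to fill the link."""
    return rate_bps * rtt_seconds


RATE = 1e9  # 1 Gbit/s, the line rate used in the comment

# Round-trip times in minutes: the quoted 10-50 range plus the rounded-up 60.
for rtt_minutes in (10, 50, 60):
    bits = bandwidth_delay_product_bits(RATE, rtt_minutes * 60)
    print(f"RTT {rtt_minutes:>2} min: {bits / 1e9:,.0f} Gbit = {bits / 8e9:,.0f} GB in flight")
```

For the 60-minute case this works out to 3600 Gbit, or 450 GB, of unacknowledged data the sender has to hold on to.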
Annoying that the parent got modded up with bad information and this post will likely be passed over.
Slashdot has turned from "Microsoft sucks" to waxing poetic about how Microsoft used to suck.
How times change...
It was one of their secondary sites, something like blah.microsoft.com. The ISP was supposed to be hosting it on a colo NT box as part of an outsourced hosting contract. Well, the site crashed constantly and the support team got sick of the late night pager calls, so they moved it over to a BSDI box with Apache, spoofed the server headers to read IIS, and never told the M$ guys.
And they still manage to have a service outage for at least a few minutes to a few hours a month. AIM and Yahoo! don't seem to do that to me.
Administration, software issues, whatever. MSN isn't that amazing, especially compared to the other services.
My blog. Good stuff (when I remember to update it). Read it.
All everyone's problems went away when they switched to WinXP?
Sorry, I thought everyone's problems went away when they switched to WinME?
Sorry, I thought everyone's problems went away when they switched to Win98.
Sorry, I thought everyone's problems went away when they switched to Win95.
All I seem to hear before a new Windows release is how xxxx is stable now, xxxx starts up in only 4 seconds, xxxx doesn't have this problem, xxxx doesn't have that problem.
Windows has had commercial server software for how long?
And it's just fixing a stack limitation when?
Heh, if you had to pay those MSFT licensing fees, I'm sure you'd find a way to reduce the number of Windows Servers you used too. ;)
Microsoft--and the two staffers shown in this video--deserve strong praise for the *unedited* candor, the self-deprecating humor, and the absence of spin on this video.
:-)
Maybe I've missed the comments, but what no one seems to mention here is that these guys--clearly both geeks at heart (in a good way)--really are peeling back a lot of the layers of MS's site. The candor about their security problems, the 2 GB memory issues, and a variety of other things was refreshing.
Heck, they even mention Firefox.
Good work all. Good work.
Running 'Nix is like owning a Lightsaber. It's "a more elegant weapon for a more civilized time."
At around 10:25 in the video Chris St. Amand, who runs Microsoft's website and data center, types in his password, which the camera recorded. And the video is hosted off of Microsoft's website...although I don't know how long that'll still be operational.
If you think that AIM never goes down, you have no idea what you're talking about. I've had AIM shit out on me MANY, MANY times, and yes, this is with the actual AIM client. It'll kick me off, and I won't be able to sign in for a few minutes, sometimes it'll get stuck at verifying login/password and just sit there until it times out, etc.
AIM has its server problems too.
Also, not everyone who disagrees with you is an astroturfer. As hard as it may be to believe, some people might ACTUALLY have different experiences and opinions than you.
But Apache never crashed (and this was on a comparatively memory-poor box by today's standards - 256 meg), just took a second or two ... and nobody else connected to the box complained.
Apache, like IIS, has a finite number of threads it uses to handle incoming requests. If you use up all those threads, Apache, and IIS, can't respond. You either must increase the number of threads or users will be denied access to the site. Eventually, you run out of system resources. In either case, you've prevented one (or likely a lot more) request from being fulfilled by the web server. End of story.
Your example is a foolish one. You never caused Apache to run out of resources. If you had, it would have "crashed" as the original poster meant it... it couldn't handle further requests. That wasn't because Apache is superior in some way to IIS, it's because your clicking didn't use up all the threads. Simple as that. That's what I was explaining... the same thing can happen to Apache as can happen to IIS. Just because Apache is open source doesn't make it invulnerable to resource exhaustion due to inept programmers.
No, it's Windows that pretty much has no credibility. The one thing it DOES have that nobody else has is the widest selection of trojans, viruses, worms, and idiot users.
That and the majority of the Fortune 500 companies running on it. Windows is a fully capable server platform, and there are countless examples to back that up... just as there are countless examples that show that Linux can be a capable server platform. My point was that IIS is not inherently flawed as the original poster suggested. In fact, IIS 6.0 is in my opinion the best web application server on the market if cost is not an issue. (Windows licenses can be too expensive for a small company.) It's had extremely few security holes (FAR fewer than Apache has in the same timeframe), it's very fast (thanks to advanced features like kernel mode listeners), it's extremely reliable thanks to application isolation, process recycling, and great management and monitoring tools, and it's host to many excellent development platforms from PHP to ASP.NET.
IIS 7.0 is shaping up to be even better with some great ways to customize the web server to make it as bare metal as possible if that's what you want.... taking a hint from Apache in this case.
But for you to sit there and question the intelligence of somebody who uses Windows as a server platform shows your ignorance. It shows you don't bother to really examine alternatives to what you're comfortable with. When choosing a platform for a project I make sure to consider as many things as possible... from portability requirements, to intellectual property issues, to performance, to cost, to ease of development. That's my job as a software architect. Sometimes I choose LAMP for its very low initial cost. (Basically free.) Sometimes I pick ASP.NET because of how robust the .NET framework is and how much bang for my buck I can get out of ASP.NET on IIS. Sometimes I pick Java for those rare cases one needs a server application to be portable.
Regardless, there are lots of options out there and until you're able to pick the best one for the job at hand you're just going to be limiting yourself for no good reason. Both career wise and intellectually.
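The thread-exhaustion point made earlier in this comment (any server with a finite worker pool stops answering once every worker is tied up) can be shown with a toy simulation. This is a rough Python sketch under simplifying assumptions: no request queue, made-up request timings, and nothing specific to Apache or IIS. The `Request` and `simulate` names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float   # seconds since start
    duration: float  # seconds the request holds a worker


def simulate(requests, workers):
    """Return (served, rejected) for a fixed-size worker pool with no queue."""
    busy_until = []  # finish times of requests currently holding a worker
    served = rejected = 0
    for req in sorted(requests, key=lambda r: r.arrival):
        # Free the workers whose requests finished before this arrival.
        busy_until = [t for t in busy_until if t > req.arrival]
        if len(busy_until) < workers:
            busy_until.append(req.arrival + req.duration)
            served += 1
        else:
            rejected += 1
    return served, rejected


# Hypothetical load: 4 runaway requests (think infinite loop) that never
# finish, followed by 10 ordinary 0.1 s requests arriving one per second.
runaway = [Request(arrival=0.0, duration=1e9) for _ in range(4)]
normal = [Request(arrival=1.0 + i, duration=0.1) for i in range(10)]

print(simulate(runaway + normal, workers=4))  # (4, 10): pool exhausted
print(simulate(runaway + normal, workers=8))  # (14, 0): spare workers remain
```

Raising the worker count only moves the ceiling: a fifth through eighth runaway request would exhaust the larger pool the same way, which is why isolating or fixing the misbehaving application matters more than which server is hosting it.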
Pardon me if I think you're lying through your teeth. How could they not notice that they're no longer connecting to a Windows server? They would still have to connect via FTP or some other protocol; did you spoof those too? Not just that, how did you manage to fake the whole directory tree? If they connected to upload files, they'd notice it was a Unix system by the file hierarchy and the fact that ASP DIDN'T WORK ANYMORE. Yes, there are some *nix ASP products, but they don't work that well. They'd definitely notice something was wrong the second they tried changing something on the website.