Keeping an Eye Out When Sites Go Down
miller60 writes "Are major web sites going down more often? Or are outages simply more noticeable? The New York Times looks at the recent focus on downtime at services like Twitter, and the services that have sprung up to monitor outages. When a site goes down, word spreads rapidly, fueled by blogs and forums. But there have also been a series of outages with real-world impact, affecting commodities exchanges, thousands of web sites and online stores."
Is downtime really more frequent? Or is it just more visible?
The answer is both.
"Kill 'em all and let Root sort 'em out"
Twitter's infrastructure is notoriously poorly thought out, and I sort of doubt they employed any systems administrators (or service engineers, or operations engineers, or whatever) up until recently.
I think the barrier to entry from an engineering standpoint has been lowered such that you can more easily make a site that appears to be pretty decent and attracts an audience. What is often missing is the behind-the-scenes work which ensures that the service is:
- Deployed properly, with testing and staging environments that actually mirror production.
- Fault-tolerant at every practical level. This gets expensive, so you see datacenter failures take down large swaths of sites that don't have multiple locations.
- Constantly monitored, including performance metrics, to find issues quickly or even before they happen.
This is the kind of work that always seems to take a back seat to development due to resource constraints, but it really needs to occur in tandem with the development process.
If you don't design a site from the ground up to be redundant and highly performing, it's pretty difficult to flip a switch and make it that way later. Which is basically what Twitter has found out. Whether or not this mentality is taking over the Interworld is another story though.
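For the "constantly monitored, including performance metrics" point, here is a rough Python sketch of what a single check might look like. It is only an illustration: the names (CheckResult, http_probe, run_check), the one-second latency budget, and the idea of flagging a slow-but-working service separately from a dead one are all my own assumptions, not anything Twitter or any particular monitoring service actually does.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool        # the probe finished without raising
    slow: bool      # it finished, but took longer than the latency budget
    seconds: float  # wall-clock time the probe took
    error: str = ""


def http_probe(url: str, timeout_s: float = 5.0) -> Callable[[], None]:
    """Build a probe that fetches `url`; urlopen raises on HTTP errors and timeouts."""
    def probe() -> None:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read(1)  # pull at least one byte so we know a body is coming back
    return probe


def run_check(name: str, probe: Callable[[], None],
              latency_budget_s: float = 1.0) -> CheckResult:
    """Run one probe, time it, and classify the outcome as ok, slow, or failed."""
    start = time.perf_counter()
    try:
        probe()
    except Exception as exc:  # any failure counts as "down" for this sketch
        elapsed = time.perf_counter() - start
        return CheckResult(name, False, False, elapsed, repr(exc))
    elapsed = time.perf_counter() - start
    return CheckResult(name, True, elapsed > latency_budget_s, elapsed)
```

You would run something like run_check("front page", http_probe("http://www.example.com/")) from cron or a loop and alert on anything not ok. The "slow" flag is the part that lets you catch trouble before it turns into an outage, since a service usually gets slow before it falls over.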
These days web pages are composed of multiple sources, often displaying content from multiple servers. Consider that 'back in the day' a web site was a static HTML file with multiple links. These days we have a 'site' linking to an image server, media server, advertising server, with sql backbones and other content providers. When one of these servers fails, often the whole works goes down.
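To put rough numbers on that: if a page only works when every one of its backends works, the availabilities multiply. A back-of-the-envelope Python sketch follows. The 99.9% figures are made up for illustration, and it assumes the components fail independently and that each one is a hard dependency, which is a simplification.

```python
import math

HOURS_PER_YEAR = 24 * 365


def serial_availability(parts):
    """Availability of a page that needs ALL of its parts up at once.

    Assumes independent failures, so the individual availabilities multiply.
    """
    return math.prod(parts)


def expected_downtime_hours(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical figures: five backends, each up 99.9% of the time.
dependencies = {
    "web": 0.999,
    "images": 0.999,
    "media": 0.999,
    "ads": 0.999,
    "sql": 0.999,
}

page = serial_availability(dependencies.values())
print(f"page availability: {page:.4%}")
print(f"expected downtime: {expected_downtime_hours(page):.1f} hours/year")
```

With those made-up numbers the page as a whole comes out at roughly 99.5%, or about 44 hours of downtime a year, versus under 9 hours for any single component on its own. Every extra hard dependency makes it worse.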
Which is also why many major sites are so slow to load on less-than-optimal connections (which many are still stuck with). Personally, I find all the bells and whistles distracting, complicating, and useless. It seems like sites compete to see how crowded and busy they can make their pages. Right up at the top of the list for me are sites that insist on displaying some stupid Flash screen (that adds nothing to the meat and potatoes content/function of the site) and give you no option for bypassing it. The Internet could be a marvelous animal for information if website designers could just resist the impulse to throw every possible widget and gewgaw into the mix. It not only adds little to the basic functionality of the site, but as pointed out above, just increases the number of individual elements that can fail and slow or stop a site in its tracks.
Me, if I want the MLB scores, or the news headlines, or to compare prices between a few retailers, all I need is the information, please -- I don't need a floor show accompanying it.
"Every great cause begins as a movement, becomes a business, and eventually degenerates into a racket." -- Eric Hoffer
So don't go there, don't click on links to it, and stop bitching about it. It only annoys you if you let it.
Or do you just like to whine?
Yes, they got a mention, because they can't fucking make the damn thing stop dying. If you want to be that prominent you need to get your shit together, or take the flak.
i am a soviet space shuttle
An important, related issue is the loss of local knowledge.
If you did a web startup ten years ago, you pretty much had to hire a sysadmin. If you had a good one, they would yell at your developers about their retarded, unscalable designs. Having a scary bearded man threaten you with defenestration has its downsides, but it does give you an incentive to consider the impact to operations.
The ever-lower cost of hosting is also a problem. If you tried to just throw $250k of hardware at a scaling issue back then, hopefully some executive would come by and ask some WTF-ish questions. (Unless you were at Boo.com or Webvan, natch.) But now, monthly rental on equivalent computing power is circa $400. Who'd bitch about that? Which allows you to really settle in to a totally unscalable architecture.