Keeping an Eye Out When Sites Go Down
miller60 writes "Are major web sites going down more often? Or are outages simply more noticeable? The New York Times looks at the recent focus on downtime at services like Twitter, and the services that have sprung up to monitor outages. When a site goes down, word spreads rapidly, fueled by blogs and forums. But there has also been a series of outages with real-world impact, affecting commodities exchanges, thousands of web sites and online stores."
So they're more likely to suffer downtime as any one of the many pieces can break, causing it to all go down. Look at a site like Drudge Report that gets massive traffic, but is really VERY simple to run. Then look at a site like Twitter or YouTube or something like that, which has many more services to operate and keep running together.
Agreed. Google and Slashdot are the two (depending on my mood) sites I test to see if I have an internet connection. If I can't reach one, I don't even bother testing the other - I assume it's on my end, and I've not yet been wrong.
What with the "software as a service" and "outsourcing system administration" fads, more sites are relying on other sites being up when they power up. This could become a problem in bringing a site back up after an outage. It's important to know which sites have "black start" capability, meaning they can start up without any resources from the outside.
You can save money by outsourcing Linux system administration to Tomsk, Russia, or Lotus system administration to India. "Remote System Administration for your Lotus Notes/Domino Servers, Infrastructure". But can you then restart your data center from a cold start, when the offshore admin people can't yet get in?
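To make the "black start" question concrete, here is a rough sketch of how one might check it on paper. Everything in it is an assumption for illustration: the service names, the dependency map, and the function `black_start_order` are all made up, and a real data center would have far messier dependencies. The idea is simply that anything depending on a resource you don't control (directly or indirectly) can never come up from a cold start on its own.

```python
from typing import Dict, List, Set, Tuple


def black_start_order(deps: Dict[str, Set[str]]) -> Tuple[List[str], Set[str]]:
    """Simulate a cold start.

    deps maps each service we control to the things it needs before it
    can start. Any name that appears only as a dependency (never as a
    key) is treated as external and is assumed unavailable.

    Returns (order in which services can be started, services left stuck).
    """
    started: List[str] = []
    remaining = set(deps)
    progress = True
    while progress:
        progress = False
        # sorted() copies the set, so removing items inside the loop is safe
        for svc in sorted(remaining):
            if deps[svc] <= set(started):
                started.append(svc)
                remaining.discard(svc)
                progress = True
    return started, remaining


# Hypothetical example: "hosted-sso" is an outside service we do not run.
example_deps = {
    "power": set(),
    "dns": {"power"},
    "auth": {"dns", "hosted-sso"},
    "web": {"dns", "auth"},
}
order, stuck = black_start_order(example_deps)
```

In this made-up case only `power` and `dns` come up; `auth` waits on the outside service and `web` waits on `auth`, so neither has black start capability.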
I can't really trust those network monitoring sites. They aren't accurate. All they can tell you is that the site is down "from their location". I work for a webhosting company, and I've run into numerous cases where a customer is screaming that his website is down because the network monitoring site sent him a report saying so. The truth of the matter was that the site was up the entire time (even the customer could get to the site when I had them actually try). If a node goes down anywhere between the monitoring site and the user's website, they get a false positive. On top of that, you have to wonder if any of these monitoring sites are also deliberately sending false reports. Back when I was working for an ISP, I remember there was some kind of network monitoring software that came out, and a number of people were installing it on their computers. It would start warning customers that their "network connection was saturated - blah blah blah" and customers would call in blaming us. Within a few days I started seeing reviews on the net about the product, and some research showed that it was deliberately generating false reports for anybody that wasn't with a certain large coaster shipping ISP. Apparently the software company was a shareholder. I can't remember what the name of the product was, however; this was back in the old dialup days.
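One common way to cut down on the "down from their location" problem is to probe from several places and only call the site down when most of them agree. The sketch below is not how any particular monitoring service works; the function `site_verdict`, the location names, and the 50% threshold are assumptions chosen for illustration, and the probe results are passed in as plain booleans rather than gathered over the network.

```python
from typing import Dict


def site_verdict(probe_results: Dict[str, bool], quorum: float = 0.5) -> str:
    """Combine reachability results from several vantage points.

    probe_results maps a (hypothetical) probe location to True if the
    site answered from there. Returns one of:
      "up"      - every probe reached the site
      "down"    - more than `quorum` of the probes failed
      "partial" - some probes failed, likely a path problem
      "unknown" - no probe data at all
    """
    if not probe_results:
        return "unknown"
    failures = sum(1 for ok in probe_results.values() if not ok)
    if failures == 0:
        return "up"
    if failures / len(probe_results) > quorum:
        return "down"
    return "partial"


# One probe out of three fails: more likely a broken route than a dead site.
verdict = site_verdict({"chicago": False, "london": True, "tokyo": True})
```

With a single vantage point this degenerates to exactly the behavior complained about above: one failed path and the report says "down".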