Keeping an Eye Out When Sites Go Down
miller60 writes "Are major web sites going down more often? Or are outages simply more noticeable? The New York Times looks at the recent focus on downtime at services like Twitter, and the services that have sprung up to monitor outages. When a site goes down, word spreads rapidly, fueled by blogs and forums. But there has also been a series of outages with real-world impact, affecting commodities exchanges, thousands of web sites and online stores."
Is downtime really more frequent? Or is it just more visible?
The answer is both.
"Kill 'em all and let Root sort 'em out"
I just use FreeBSD, PostgreSQL and lighttpd, so my sites and my clients' sites never go down. If the sites are inaccessible, it's because the hosting providers fucked up.
In my (admittedly anecdotal) experience, major websites are remarkably UP 100% of the time. I've never seen Google go down even once in the past few years.
If you can read this... 01110101 01110010 00100000 01100001 00100000 01100111 01100101 01100101 01101011
So they're more likely to suffer downtime as any one of the many pieces can break, causing it to all go down. Look at a site like Drudge Report that gets massive traffic, but is really VERY simple to run. Then look at a site like Twitter or YouTube or something like that, which has many more services to operate and keep running together.
Twitter's infrastructure is notoriously poorly thought out, and I sort of doubt they employed any systems administrators (or service engineers, or operations engineers, or whatever) up until recently.
I think the barrier to entry from an engineering standpoint has been lowered such that you can more easily make a site that appears to be pretty decent and attracts an audience. What is often missing is the behind-the-scenes work which ensures that the service is:
- Deployed properly, with testing and staging environments that actually mirror production.
- Fault-tolerant at every practical level. This gets expensive, so you see datacenter failures take down large swaths of sites that don't have multiple locations.
- Constantly monitored, including performance metrics, to find issues quickly or even before they happen.
This is the kind of work that always seems to take a back seat to development due to resource constraints, but it really needs to occur in tandem with the development process.
If you don't design a site from the ground up to be redundant and highly performing, it's pretty difficult to flip a switch and make it that way later. Which is basically what Twitter has found out. Whether or not this mentality is taking over the Interworld is another story though.
when the site you're using to monitor whether a site is down goes down?
PANIC AT THE DISCO!!
You DO recall that people are still using windows, right? Where's the confusion?
These days web pages are composed of multiple sources, often displaying content from multiple servers. Consider that 'back in the day' a web site was a static HTML file with multiple links. These days we have a 'site' linking to an image server, media server, advertising server, with SQL backbones and other content providers. When one of these pieces fails, often the whole works goes down.
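To put rough numbers on that, here is a minimal sketch. It assumes every piece fails independently and the page needs all of them up at once; the 99.9% figure is made up for illustration, not measured from any real site.

```python
from math import prod

def serial_availability(component_availabilities):
    """Availability of a page that needs every component up at the same time.

    Assumes failures are independent, which real outages often are not.
    """
    return prod(component_availabilities)

# Hypothetical: image, media, ad, SQL and web servers, each up 99.9% of the time.
components = [0.999] * 5
page = serial_availability(components)
print(f"page availability: {page:.4%}")
```

Five pieces at "three nines" each already leave the page as a whole below three nines, which is the point about complexity: every dependency you add multiplies in another chance to be down.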
Personally, I don't notice an increased frequency in site downtimes with any of the services that I use and I don't feel this is newsworthy. Of course, I don't use Twitter so maybe that's why.
Don't let the site go down, you'll put your eye out!!
SIGLOST && SIGUNUSED && SIGQUIT
More people are RTFA on Slashdot stories.
Most major sites use multiple ISPs and servers to ensure sites don't go down. My company uses ATT and Verizon for its backbones.
Any type of load balancer in front of several web servers and application servers would prevent about 99.99% of downtimes. That's of course barring poor coding and human error, but if you hire the right guys, that shouldn't be an issue.
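For anyone who hasn't seen one, the core idea fits in a few lines. This is only a toy sketch of round-robin selection that skips backends marked unhealthy; the class name, the backend names and the mark_down/mark_up calls are all hypothetical, and a real balancer would run its own health checks rather than be told.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy balancer: hands out backends in turn, skipping ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")

lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")  # pretend a health check failed
picks = [lb.pick() for _ in range(4)]
print(picks)
```

Note that this only covers a dead web server. The balancer itself, the database behind it and the datacenter around it are still single points of failure unless they are duplicated too.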
What with the "software as a service" and "outsourcing system administration" fads, more sites are relying on other sites being up when they power up. This could become a problem in bringing a site back up after an outage. It's important to know which sites have "black start" capability; they can start up without any resources from the outside.
You can save money by outsourcing Linux system administration to Tomsk, Russia, or Lotus system administration to India. "Remote System Administration for your Lotus Notes/Domino Servers, Infrastructure". But can you then restart your data center from a cold start, when the offshore admin people can't yet get in?
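One way to check for "black start" capability on paper is to write down what each service needs and see what can come up using local resources only. A rough sketch follows; the service names and the dependency map are invented for illustration.

```python
def black_start_order(deps, external):
    """Return (order, stuck).

    deps maps each service to the set of things it needs before it can start.
    external is the set of resources outside our own control.
    order is a valid cold-start sequence; stuck lists services that cannot
    start without an external resource, directly or through a dependency.
    """
    started, order = set(), []
    progress = True
    while progress:
        progress = False
        for svc in sorted(deps):
            if svc in started:
                continue
            needs = deps[svc]
            if needs & external:
                continue  # needs something from the outside
            if needs <= started:
                started.add(svc)
                order.append(svc)
                progress = True
    stuck = sorted(set(deps) - started)
    return order, stuck

# Hypothetical data center: login cannot come up until the offshore admins' VPN works.
deps = {
    "dns": set(),
    "database": {"dns"},
    "web": {"database", "dns"},
    "login": {"web", "offshore_admin_vpn"},
    "billing": {"login"},
}
order, stuck = black_start_order(deps, external={"offshore_admin_vpn"})
print("can start from cold:", order)
print("stuck without outside help:", stuck)
```

Anything that ends up in the stuck list is a service you cannot bring back on your own after a total outage.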
Are major web sites going down more often?
A bit more often now thanks to AVG?
We're not sure if the sites are already dead, or if the observers changed the outcome.
God spoke to me.
...is this just more sock puppetry for Twitter -- the single most annoying website on the planet, and the next biggest has-been?
Can we at least let one day go by without an article directly or indirectly about this POS?
How do "major web site" (as in "in any way important or at least interesting") and "Twitter" belong in the same sentence?
Now mod me flamebait and let's go on with our lives.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
AVG is probably why we have this post this week. There were a lot of timeouts last week, although Grisoft was not the only problemo. For a while Virgin Media customers in the UK lost a couple of continents last week, with the U.S.A. and Australia dropping off the map. I had to read Pravda instead of Slashdot for an hour or two...
My backup route actually worked fine and I was just in the middle of getting a squid proxy server of my own up and running when the network problems magically fixed themselves. There are lessons to be learned: if you need your internet more than is healthy, then you also need a backup plan. This could be a wifi sharing agreement with the neighbours or a proxy server at work that you can dial into at home. The internet does not dynamically re-route stuff when there is a problem with a major link. This is a problem. I thought we would have TCP/IP over ATM or something like that to solve that by now.
TrustSaas.com by Australian Online Solutions is an uptime monitoring service ('SaaS Weather Report') for Software as a Service (SaaS) run by an independent third party. It checks service status every 60 seconds (over 500,000 times per service per year), instantly alerting subscribers of problems by email and/or SMS. TrustSaaS records downtime with the highest resolution possible and response times are also analysed. This information is included in monthly reports delivered by email to subscribers, allowing them to make comparisons between providers, monitor end-user experience, verify Service Level Agreement (SLA) compliance and trust that SaaS is delivering on its promises.
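The numbers are easy to sanity-check. The sketch below is not how TrustSaaS works internally (that is not described here); it just shows the arithmetic behind a 60-second check interval and a simple uptime-versus-SLA comparison, with made-up check results.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def checks_per_year(interval_seconds):
    """How many checks fit in a non-leap year at the given interval."""
    return SECONDS_PER_YEAR // interval_seconds

def uptime_percent(results):
    """results: one boolean per check, True if the service responded."""
    if not results:
        raise ValueError("no checks recorded")
    return 100.0 * sum(results) / len(results)

def meets_sla(results, target_percent):
    return uptime_percent(results) >= target_percent

print(checks_per_year(60))  # consistent with "over 500,000 times per year"

# Hypothetical day of 60-second checks with three failed probes.
day = [True] * 1437 + [False] * 3
print(f"{uptime_percent(day):.2f}% up, meets 99.9% SLA: {meets_sla(day, 99.9)}")
```

Three missed checks in a day is enough to miss a 99.9% target for that day, which is why the check interval matters: coarser sampling can hide or exaggerate short outages.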
make the "outternet"
ahhaha......just like ham groups......
or "undernet"?
http://www.thewebsiteisdown.com
GPL: Free as in will
halcyon_on_twitter: Is there anybody out there?
UTF-8: There and Back Again
A century ago the price of gasoline worried very few people. Today there are calls to nationalize oil companies as "vital businesses"; somehow, they believe, nationalization improves things...
How long until these same Commies (or whatever they'll choose to call themselves, when the label-du-jour gets just as discredited) call for nationalization of Google or Amazon?
The nation cannot exist without a reliable search engine, can it? We must nationalize Google to ensure fair and equal access to knowledge for all.
Or: our least-privileged can least afford to buy the expensive books they need to get ahead. To help the poor with readily affordable knowledge we must have the government take over book distribution by nationalizing Amazon and other book-sellers, whose obscene profits the such-and-such Administration refuses to tax!
In Soviet Washington the swamp drains you.