Keeping an Eye Out When Sites Go Down

Short version... by MRe_nl · 2008-07-06 06:15 · Score: 4, Insightful

Is downtime really more frequent? Or is it just more visible?
The answer is both.

--
"Kill 'em all and let Root sort 'em out"

Re:Short version... by arth1 · 2008-07-06 06:25 · Score: 5, Insightful

I think monopolization plays a role too.
Back when people jumped between Altavista, Hotbot, Jeeves and other engines, one of them going down wasn't so bad -- you just used another, and a day later, you wouldn't even remember that one of them had been down. But these days, everyone and his dog uses Google, and if Google goes down, people won't know what to do. Similar for other sites and hubs -- they've become too big, and users have become too reliant on them.
So even if uptime has increased, the impact of downtime has become larger, in part due to the larger reliance on single systems.
Re:Short version... by negRo_slim · 2008-07-06 06:26 · Score: 1

The answer is both.
Perhaps. But in my personal experience the big names are reliable enough for me to continue to use their services. Seems like a slow news day, or someone wanted to stroke Alex Payne's ego for whatever reason, as his claim to fame doesn't seem to be all that worth a write-up by the NY Times.

--
On the Oregon Coast born and raised, on the beach is where I spent most of my days
Re:Short version... by Anonymous Coward · 2008-07-06 09:45 · Score: 0

... and a day later you wouldn't remember that any of those sites existed because they disappeared after hiring hundreds of people to maintain what was effectively a grep script with a few parameters thrown in for good measure.
Re:Short version... by Digital+End · 2008-07-06 09:54 · Score: 1

This is just the pre-story to "ISPs point to recent outages as proof that P2P traffic is causing death of internet"

--
Beware of he who would deny you access to information, for in his heart, he dreams himself your master.
Re:Short version... by Jurily · 2008-07-06 15:52 · Score: 1

Yeah, we need another Google, with the same resources and the same or better search algorithms.
Like that's going to happen anytime soon...
Re:Short version... by Huggs · 2008-07-07 00:07 · Score: 1

actually... my dog prefers http://www.msdewey.com/

I just use FreeBSD and PostgreSQL. by Anonymous Coward · 2008-07-06 06:16 · Score: 0

I just use FreeBSD, PostgreSQL and lighttpd, so my sites and my clients' sites never go down. If the sites are inaccessible, it's because the hosting providers fucked up.

Re:I just use FreeBSD and PostgreSQL. by Bill,+Shooter+of+Bul · 2008-07-06 17:24 · Score: 1

Yeah, anything between the users and the site can contribute to downtime. Users don't really know what the problem is when you are down; they just know you're down. They probably won't understand or really care why. Still, you'll probably have more downtime due to hardware or application-specific logic than due to the platform. FreeBSD, PostgreSQL, and lighttpd are great pieces of software, but the popular alternatives (Linux, MySQL, Apache) won't introduce any more downtime. There is no magic bullet.

--
Well.. maybe. Or Maybe not. But Definitely not sort of.

no... by religious+freak · 2008-07-06 06:16 · Score: 1

In my (admittedly anecdotal) experience, major websites are remarkably UP 100% of the time. I've never seen Google go down even once in the past few years.

--
If you can read this... 01110101 01110010 00100000 01100001 00100000 01100111 01100101 01100101 01101011

Re:no... by Anonymous Coward · 2008-07-06 06:22 · Score: 0

Google has a gazillion servers.
Re:no... by Nick+Fel · 2008-07-06 06:58 · Score: 5, Funny

I've seen Google down. Not completely unreachable, but not working. It was terrifying.
Re:no... by Koiu+Lpoi · 2008-07-06 07:01 · Score: 3, Interesting

Agreed. Depending on my mood, I test Google or Slashdot to see if I have an internet connection. If I can't reach one, I don't even bother testing the other; I assume it's on my end, and I've not yet been wrong.
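The reach-a-known-site test described above can be sketched in Python. This is a minimal illustration, not a real diagnostic tool: the reference hosts, port 443 and the three-second timeout are assumptions, and a plain TCP connect only shows that something answered, not that the site actually works.

```python
import socket
from typing import Callable, Iterable


def tcp_probe(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS errors, refusals and timeouts are all OSError subclasses
        return False


def diagnose(reference_hosts: Iterable[str],
             probe: Callable[[str], bool] = tcp_probe) -> str:
    """Guess whether an outage is on our end or theirs."""
    results = {host: probe(host) for host in reference_hosts}
    if not results:
        raise ValueError("need at least one reference host")
    if not any(results.values()):
        return "local"  # nothing answers: suspect our own link or DNS
    down = sorted(host for host, ok in results.items() if not ok)
    return "remote: " + ", ".join(down) if down else "ok"
```

Unlike the one-site habit, this checks every reference host, so "one of them is down" and "my connection is down" give different answers.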
Re:no... by matt3k · 2008-07-06 07:40 · Score: 0

Google has a gazillion servers.
No, Google has a googolplex.
Re:no... by Fumus · 2008-07-06 08:24 · Score: 1

Not that long ago Google and SourceForge were down for a moment because of some ISP problems, IIRC. I thought my internet connection was down or something, because I couldn't get to Slashdot or Google.
Re:no... by Opportunist · 2008-07-06 08:27 · Score: 2, Funny

Just because you searched for "sex" and "porn" and didn't get any results but linkpages and squatters doesn't mean the search engine's broken, ya know? :)

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:no... by Winckle · 2008-07-06 10:43 · Score: 1

About 3 or 4 years ago google had a DNS issue which meant it was unreachable via domain name, and only by typing in the actual IP address.
It was pretty weird to see google.com not responding, I initially assumed that my modem was having connection problems, but then I read on the SA forums about the problem.
Very weird to be without google.
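The failure mode described here, where the name didn't resolve but the servers still answered on their IP address, can be told apart from a real outage by checking name resolution and connectivity separately. A minimal sketch follows; `classify` and the `known_addresses` list (addresses you recorded while DNS still worked) are hypothetical names, and the resolver and connector are injectable so the logic can be tried without a network.

```python
import socket
from typing import Callable, Iterable, List


def system_resolve(name: str) -> List[str]:
    """Resolve a host name to its addresses using the system resolver."""
    infos = socket.getaddrinfo(name, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def system_connect(address: str, port: int) -> None:
    """Open and immediately close a TCP connection; raises OSError on failure."""
    socket.create_connection((address, port), timeout=3).close()


def classify(name: str, port: int = 443, known_addresses: Iterable[str] = (),
             resolve: Callable[[str], List[str]] = system_resolve,
             connect: Callable[[str, int], None] = system_connect) -> str:
    """Separate 'the name doesn't resolve' from 'the server doesn't answer'."""
    def any_answers(addresses: Iterable[str]) -> bool:
        for address in addresses:
            try:
                connect(address, port)
                return True
            except OSError:
                continue
        return False

    try:
        addresses = resolve(name)
    except OSError:  # socket.gaierror is an OSError subclass
        if any_answers(known_addresses):
            return "dns failure (still reachable by IP)"
        return "dns failure"
    return "up" if any_answers(addresses) else "unreachable"
```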

New sites are more complicated... by Anonymous Coward · 2008-07-06 06:19 · Score: 4, Interesting

So they're more likely to suffer downtime as any one of the many pieces can break, causing it to all go down. Look at a site like Drudge Report that gets massive traffic, but is really VERY simple to run. Then look at a site like Twitter or YouTube or something like that, which has many more services to operate and keep running together.

The twitter factor by ximenes · 2008-07-06 06:24 · Score: 5, Insightful

Twitter's infrastructure is notoriously poorly thought out, and I sort of doubt they employed any systems administrators (or service engineers, or operations engineers, or whatever) up until recently.

I think the barrier to entry from an engineering standpoint has been lowered such that you can more easily make a site that appears to be pretty decent and attracts an audience. What is often missing is the behind-the-scenes work which ensures that the service is:

- Deployed properly, with testing and staging environments that actually mirror production.
- Fault-tolerant at every practical level. This gets expensive, so you see datacenter failures take down large swaths of sites who don't have multiple locations.
- Constantly monitored, including performance metrics, to find issues quickly or even before they happen.
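As a rough illustration of the third point, here is a minimal Python monitor that records response times and only alerts after several consecutive slow or failed checks. The two-second latency budget and the three-check threshold are arbitrary assumptions, and the class and function names are hypothetical.

```python
import time
import urllib.request
from dataclasses import dataclass, field
from typing import List, Tuple


def http_check(url: str, timeout: float = 10.0) -> Tuple[bool, float]:
    """Fetch url once; return (succeeded, seconds taken)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts all derive from OSError
        ok = False
    return ok, time.monotonic() - start


@dataclass
class Monitor:
    slow_after: float = 2.0   # seconds; hypothetical latency budget
    alert_after: int = 3      # consecutive bad checks before paging anyone
    latencies: List[float] = field(default_factory=list)
    bad_streak: int = 0

    def record(self, ok: bool, seconds: float) -> bool:
        """Store one result; return True exactly when an alert should fire."""
        self.latencies.append(seconds)
        bad = (not ok) or seconds > self.slow_after
        self.bad_streak = self.bad_streak + 1 if bad else 0
        return self.bad_streak == self.alert_after
```

In a loop you might call `monitor.record(*http_check(url))` every minute and page someone when it returns True. Requiring a streak trades a few minutes of detection delay for fewer false alarms, and the stored latencies are what you would graph to spot trouble before it becomes an outage.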

This is the kind of work that always seems to take a back seat to development due to resource constraints, but it really needs to occur in tandem with the development process.

If you don't design a site from the ground up to be redundant and highly performing, it's pretty difficult to flip a switch and make it that way later, which is basically what Twitter has found out. Whether or not this mentality is taking over the Interworld is another story, though.

Re:The twitter factor by Anonymous Coward · 2008-07-06 06:44 · Score: 2, Insightful

Twitter made a big mistake by basing their technology around Ruby on Rails.
Ruby on Rails is, of course, great for CRUD-style websites. It makes development lightning fast, and easy as sin. Twitter doesn't exactly fall into that category. Although Ruby on Rails is flexible enough to develop a small-scale version of the Twitter application, it just isn't capable of scaling.
They really need to be looking into Erlang. Erlang is perfect for the type of software they need to provide the service they offer (see ejabberd for example). Plus it's open source, and it has a vibrant online community, and frequent releases, numerous conferences, interfacing with other languages, and other goodies.
Erlang originated from, and has been successfully used within, the telecom industry, which is very similar to the market Twitter is involved with. Thus they should learn from the masters, and use Erlang wherever possible for their core services.
Re:The twitter factor by jnovek · 2008-07-06 06:47 · Score: 5, Insightful

"If you don't design a site from the ground up to be redundant and highly performing, its pretty difficult to flip a switch and make it that way later. Which is basically what Twitter has found out."
And really, that's OK.
Sites like Twitter are popping up precisely because the bar is very low to get your idea out on the 'net and compete. Sure, the cost in dollars and person hours is much higher to refactor for stability later, but would Twitter have even come into existence if that was a requirement from the start? Would its founders have considered it a worthwhile risk?
Jason
Re:The twitter factor by Anonymous Coward · 2008-07-06 07:15 · Score: 2, Interesting

This is the kind of work that always seems to take a back seat to development due to resource constraints, but it really needs to occur in tandem with the development process.
That's not true. As the Twitter, Digg, Flickr, etc. examples clearly show, it's much more important to appear "pretty decent" when you corner the market than anything else. The cost of doing it properly from the get-go cannot be shouldered by a company with an unproven concept, either time- or money-wise. Most of these services are 99.9% user base and 0.1% implementation. If you can get the users with a rough sketch, it is then much easier to get the resources for even a complete rewrite of the server software. Besides, this isn't even a business-biased view: most programmers agree that the first implementation is for understanding the problem and the second implementation is for solving it.
Re:The twitter factor by ximenes · 2008-07-06 07:18 · Score: 1

I agree that there is a trade-off here. If you spend too much, take too long and aren't 'agile' enough, then your site will be old news by the time you get it out of the gate. No one cares, and it's all pointless.
On the other hand, if you don't spend any time worrying about the future, you will be totally unprepared if you reach your goal of user interest. Then the site doesn't work well enough to retain users, no one cares, and it's all pointless.
I think part of the overall issue is that while there are numerous frameworks and reusable components to ease development, most of them don't really add anything when it comes to future maintainability. In my experience, they often detract.
So much of what it takes to run a site properly is bespoke and closely guarded, unlike the vast development resources out there for the taking. It's good in a way (it keeps me employed), but it's also a waste of time. Ideally we should be solving new problems, not spending time getting Apache to rotate logs without restarting for the 1,000,000th time.
Re:The twitter factor by msimm · 2008-07-06 07:23 · Score: 1

I think the barrier to entry from an engineering standpoint has been lowered such that you can more easily make a site that appears to be pretty decent and attracts an audience.
I think you hit the nail on the head. Sites are increasingly complicated applications with a great set of increasingly complex tools available to help you bring your ideas to the public. Of course this doesn't help so much with the basics you've mentioned and to make things even more complicated the requirements for scaling are becoming increasingly complex; as we move from the read web to the write web scaling becomes less about replication and memory caching and more about complex sharding and well planned data layout.

In the old days it was only the big boys who really worried about these types of things, but today it's small and medium sized ventures doing it, and you can expect to see a few more short-cuts taken and a few blunders along the way.
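To make "sharding" concrete: the simplest scheme routes each user's rows to one of N database servers by hashing the user ID. The sketch below assumes hypothetical shard names and uses SHA-1 rather than Python's built-in `hash()`, which is randomized per process for strings and would therefore send the same user to different shards from different web servers.

```python
import hashlib
from typing import Sequence


def shard_for(user_id: str, shards: Sequence[str]) -> str:
    """Map a user to one shard with a stable hash (same answer on every server)."""
    if not shards:
        raise ValueError("no shards configured")
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]  # hypothetical hosts
```

One design caveat: with plain modulo, changing the number of shards remaps most users, which is why larger deployments tend to use consistent hashing or a lookup directory instead. That is part of the "well planned data layout" work.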

--
Quack, quack.
Re:The twitter factor by Anonymous Coward · 2008-07-06 07:45 · Score: 0

I'll be nice to you.
Have Apache log to syslog (a remote syslog host, of course).
That way, you have access to your precious logs when you need them (i.e., when your servers might actually be down).
Now go solve a new problem.
Re:The twitter factor by ximenes · 2008-07-06 08:06 · Score: 2, Insightful

OK, let's explore this. If I were to log to syslog, only the ErrorLog directive supports it. To do this with an access log (CustomLog), I would have to use the piped log feature to route output to a script that I write, which in turn writes to syslog for Apache.
This is exactly the sort of bespoke stuff I'm referring to. Why should this need to be implemented 1,000 times at company after company to accomplish the exact same thing?
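For reference, the piped-log workaround described above might look like the following Python script. This is a sketch under assumptions: the script name `log2syslog.py`, the `local6` facility and UDP port 514 are illustrative choices, and Apache would be pointed at it with a piped `CustomLog` line like the one in the comment.

```python
#!/usr/bin/env python3
# Hypothetical log2syslog.py, used from httpd.conf as a piped log, e.g.:
#   CustomLog "|/usr/local/bin/log2syslog.py loghost.example.com" combined
import logging
import logging.handlers
import sys
from typing import Iterable


def build_logger(host: str, port: int = 514) -> logging.Logger:
    """Logger that sends each record to a remote syslog host over UDP."""
    logger = logging.getLogger("apache-access")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL6,
    )
    handler.setFormatter(logging.Formatter("apache-access: %(message)s"))
    logger.addHandler(handler)
    return logger


def forward(lines: Iterable[str], logger: logging.Logger) -> int:
    """Send each non-empty line to the logger; return how many were sent."""
    sent = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            sent += 1
    return sent


if __name__ == "__main__" and len(sys.argv) > 1:
    # Apache writes one access-log line per request to our stdin.
    forward(sys.stdin, build_logger(sys.argv[1]))
```

Note that `SysLogHandler` defaults to UDP, so lines can be silently dropped if the network or the log host hiccups; whether that is acceptable for access logs is a site-by-site call.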
Re:The twitter factor by LEMONedIScream · 2008-07-06 08:10 · Score: 2, Insightful

Did you just get paid to write that?
Re:The twitter factor by ximenes · 2008-07-06 08:14 · Score: 2, Insightful

I agree to an extent, but I also think that not all of these sites will survive their re-implementation periods in the face of better-designed competitors. Flickr, for instance, is internally a mess. I presume part of this is due to poor initial implementation, but it's further compounded by a need to Yahooize it at every level.
I presume Twitter will encounter a mass exodus at some point, as its users are likely to be very keen to move on to the next big (and possibly more reliable) thing.
Every time a site is down, you run the risk of irretrievably losing a portion of your users. Once you get enough bad will going, you don't even have to have failures; just having a reputation as not being reliable can be enough.
Re:The twitter factor by drinkypoo · 2008-07-06 08:56 · Score: 1

By the time you get big enough to really have to worry about scalability more than just turning on caching, you ought to be able to produce enough revenue to reimplement the site. If not, obviously you aren't relevant (or you aren't clever enough.) :)

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:The twitter factor by dubl-u · 2008-07-06 09:14 · Score: 3, Insightful

Sites like Twitter are popping up precisely because the bar is very low to get your idea out on the 'net and compete. Sure, the cost in dollars and person hours is much higher to refactor for stability later, but would Twitter have even come into existence if that was a requirement from the start? Would its founders have considered it a worthwhile risk?
That's a common after-the-fact excuse for not thinking at all about performance, but I've concluded that it's mostly bullshit.
Sure, if you consider these questions up front and know what you're doing, it's completely possible to defer most of the work until things start to pick up. That's a very legitimate business decision, and if you get a big surprise in your growth curve, it's possible to get crushed. But with a little load testing, responsible development practices, and a little forethought, you've got a very good chance of avoiding a disaster. And none of that needs to be a big barrier to just getting something out.
On the other hand, if you just don't think about those questions at all, building things willy-nilly with no preparation for refactoring and growth down the road, then that's just idiotic. You are in effect betting that you will fail, in that your site will work only if it doesn't get popular. And with something like Twitter, where the network effect is king and you could only make money with a shitload of traffic, massive growth is the only way to succeed.
From what I can tell, Twitter is firmly in that second camp. They've been going for nearly two years, and they've been shaky for most of it. One black eye from a sudden surge is acceptable, and for some is even a badge of honor. But more than a year of load-based suckage, to the point where you are an international joke, is a sign of plain incompetence. Although it hasn't killed Twitter, it has killed other businesses, and Twitter is not out of the woods yet.
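The "little load testing" mentioned above doesn't need much tooling to get started. A minimal sketch, assuming `request` is any function that performs one request against a staging copy of the site (hypothetical here); it reports throughput, median latency and an approximate 95th-percentile latency. A real test would also ramp concurrency and count errors, which this version simply lets propagate.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def load_test(request: Callable[[], None], total: int,
              concurrency: int) -> Dict[str, float]:
    """Fire `total` requests from `concurrency` threads and summarize latency."""
    def timed(_: int) -> float:
        start = time.perf_counter()
        request()  # exceptions propagate and abort the run
        return time.perf_counter() - start

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total)))
    elapsed = time.perf_counter() - started
    return {
        "requests_per_second": total / elapsed,
        "median_s": statistics.median(latencies),
        # simple nearest-rank approximation of the 95th percentile
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }
```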
Re:The twitter factor by nabsltd · 2008-07-06 09:28 · Score: 2, Interesting

Fault-tolerant at every practical level. This gets expensive, so you see datacenter failures take down large swaths of sites who don't have multiple locations.
I work on a site that has pretty much every conceivable fault-tolerance you can get short of multiple sites: multiple separate ISPs leading to router and firewall hardware that is redundant for each ISP along with multiple load-balanced front-end web servers connected to load-balanced database and file servers (with every server running Solaris). Everything has multiple power supplies connected to different mains feeds and different generators. All of this is frightfully expensive and heavily monitored.
Yet, the #1 thing that is causing downtime is the failure of the clustering software on the file servers to actually fail over if something goes wrong. So, whenever the file system mounts fail, the whole system is down until those servers are rebooted, which takes 1-2 hours because of the clustering software.
Yet if those file servers had been relatively cheap, with no redundancy, they could have been rebooted quickly and the file system mounts would have recovered automatically within 15 minutes.
So, the moral here is that more fault-tolerance isn't always the best way to maintain uptime. Carefully deciding where to spend money on what type of fault-tolerance is going to get you more uptime in the long run of the real world, instead of spending unwisely to increase statistical uptime.
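The arithmetic behind this point is the usual steady-state availability formula, availability = MTBF / (MTBF + MTTR): recovery time counts just as much as failure frequency. The numbers below are hypothetical, chosen only to mirror the shape of the story (a clustered setup that hangs about once a month and takes 1.5 hours to reboot, versus plain servers that fail twice as often but recover in 15 minutes).

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    return HOURS_PER_YEAR * (1 - availability(mtbf_hours, mttr_hours))


# Hypothetical figures, not taken from the post:
clustered = downtime_hours_per_year(mtbf_hours=720, mttr_hours=1.5)  # ~monthly hang, 1.5 h reboot
simple = downtime_hours_per_year(mtbf_hours=360, mttr_hours=0.25)    # twice as many failures, 15 min recovery
```

With those assumed figures the clustered setup loses roughly 18.2 hours a year against about 6.1 hours for the simple one. The formula ignores partial outages and correlated failures, so treat it as a back-of-the-envelope check.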
Re:The twitter factor by msimm · 2008-07-06 09:30 · Score: 1

That's the idea. Although I'd hope that you put at least some consideration into things, planning and the real world don't always match up perfectly, all the more so because a lot of the technology you'll find yourself deploying is either new or still to be developed in-house.

Re:The twitter factor by dubl-u · 2008-07-06 09:34 · Score: 3, Insightful
By the time you get big enough to really have to worry about scalability more than just turning on caching, you ought to be able to produce enough revenue to reimplement the site. If not, obviously you aren't relevant (or you aren't clever enough.) :)
I've heard this theory a lot. With regrettable frequency, it's part of noob entrepreneur business plans. I see three big problems with it.
1. If a sudden surge in popularity is forcing you to work on scalability, that's exactly the point that you don't want to work on scalability. Finally, people care about your site! So now you want to give them cool new features regularly, so they don't go away again. Plus, they discover (and create) problems that you need to solve with new code.
2. Scaling is much harder to do when you're behind than when you're ahead. If you're already creaking under load, you run around doing a lot of quick fixes that do nothing for the long term. All of the budget you planned for that rebuild can quickly get eaten up just keeping things from catching on fire.
3. Per-user margins have been steadily declining for pretty much the life of the web. Decreased hardware and bandwidth costs mask some of that. And the vast growth of the internet audience makes up for the rest. But over time you have needed larger and larger numbers of people to have a viable web business. So you need to serve a lot more people to support a staff than you did early on.
Twitter is a good example of all of these problems. They surely started out saying they would worry about scaling later. Then later came, and they had other things to do: new features, dealing with abusers, setting up a customer support infrastructure. Their quick scaling fixes kept their heads barely above water, but they didn't do much for the long term. And they are still in the "grow big, grow fast" stage, so they don't have any revenue and would rather wait a while longer to deal with that.
Re:The twitter factor by drinkypoo · 2008-07-06 11:43 · Score: 1

And they are still in the "grow big, grow fast" stage, so they don't have any revenue and would rather wait a while longer to deal with that.
The problem isn't what is included in the plan, but what isn't. Google understands that sometimes things don't scale as well as you hoped, which is probably the real reason for the eternal beta status of so many google products; they can just close new subscriptions any time. Use the same model when the pressure is on and it will keep the fires burning low while you work on the replacement.
If you can make the thing work fast from day one, that's great. But a lot of these sites would never have existed if it weren't so easy to get them up and running (if not scalably) in the first place.
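A signup cap of the kind described here is easy to sketch. This is a minimal, hypothetical `SignupGate`: it admits users up to a limit, waitlists the rest, and admits from the waitlist when capacity is raised. Whether gating is wise at all is exactly what the following reply disputes.

```python
import threading
from typing import List


class SignupGate:
    """Admit new accounts up to a capacity limit; waitlist everyone else."""

    def __init__(self, max_active_users: int):
        self.max_active_users = max_active_users
        self.active = 0
        self.waitlist: List[str] = []
        self._lock = threading.Lock()  # signups arrive from many request threads

    def request_signup(self, email: str) -> bool:
        with self._lock:
            if self.active < self.max_active_users:
                self.active += 1
                return True
            self.waitlist.append(email)  # e.g. send a "we'll invite you soon" mail
            return False

    def raise_limit(self, new_limit: int) -> List[str]:
        """Raise capacity once the backend copes; return the emails admitted."""
        with self._lock:
            self.max_active_users = new_limit
            admitted: List[str] = []
            while self.waitlist and self.active < self.max_active_users:
                admitted.append(self.waitlist.pop(0))
                self.active += 1
            return admitted
```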

Re:The twitter factor by ximenes · 2008-07-06 11:48 · Score: 1

I agree, 'enterprise' solutions are often more trouble than they're worth.
There are a lot of solutions that look good on paper and then turn out to be serious pains in the ass in practice, or have a failure mode that is actually worse than the common method (but perhaps less likely).
One of my biggest gripes about systems administration is that there are all of these solutions to make life easier, but a lot of them are basically traps if you are running a large scale operation. Take centralized authentication for instance. Makes life a lot easier, reduces the possibility of mistakes due to manual intervention. Yet if that service is unavailable, it could bring the entire infrastructure down and require a non-trivial solution.
So then you fall back on the old method of manually updating /etc/passwd, which is also a super pain. You make something to automate the process, push out updates to systems as necessary (or use something like puppet or cfengine).
I guess my point is, this stuff has been thought through often, but the technology to handle these problems isn't ubiquitous. Companies have their own internal glue to deal with these types of problems, and it keeps getting reinvented over and over again.
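The trap described above, where a central auth service becomes a single point of failure, is often handled by falling back to a locally synced copy only when the directory can't be reached. A minimal sketch, assuming a hypothetical `central` callable (say, an LDAP bind) that raises `OSError` when the service is unreachable and returns False for a rejected login; PBKDF2 with 100,000 iterations is an illustrative choice.

```python
import hashlib
import hmac
from typing import Callable, Dict, Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Authenticator:
    """Ask the central directory first; use the local copy only if it's unreachable."""

    def __init__(self, central: Callable[[str, str], bool],
                 local_cache: Dict[str, Tuple[bytes, bytes]]):
        self.central = central          # e.g. an LDAP bind; hypothetical here
        self.local_cache = local_cache  # user -> (salt, hash), synced periodically

    def check(self, user: str, password: str) -> bool:
        try:
            return self.central(user, password)
        except OSError:  # directory down or unreachable: fall back
            entry = self.local_cache.get(user)
            if entry is None:
                return False
            salt, expected = entry
            return hmac.compare_digest(hash_password(password, salt), expected)
```

The important design choice is that a definite "no" from the directory is never overridden by the cache; only an unreachable directory triggers the fallback, so a disabled account stays disabled while the service is up.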
Re:The twitter factor by dubl-u · 2008-07-06 11:49 · Score: 1

Google understands that sometimes things don't scale as well as you hoped, which is probably the real reason for the eternal beta status of so many google products; they can just close new subscriptions any time. Use the same model when the pressure is on and it will keep the fires burning low while you work on the replacement.
This sounds plausible, but it is often a path to failure.
You can get away with growth limits during an early private beta phase, but turning away interested users when you catch fire with the general public is asking for trouble. Many of them will never come back. My pals at Google tell me that they now see GMail's slow-growth approach as a giant mistake, something that cost them lots of users that they have not so far managed to get back.

If you can make the thing work fast from day one, that's great. But a lot of these sites would never have even existed in the first place if it weren't so easy to get them up and running (if not scalably) in the first place.
That is a false dichotomy. Doing all the scaling work up front is indeed expensive. Being ready to do it in stages just as you need it isn't. You just have to know what you're doing.
Re:The twitter factor by drinkypoo · 2008-07-06 12:12 · Score: 1

Another reason to do it that way is that six months is more than enough time for some major evolution in web platforms. You could turn around and blink and a new major version could be out that gives you literally an order of magnitude more scalability.

Re:The twitter factor by Anonymous Coward · 2008-07-06 13:13 · Score: 0

Shouldn't this be modded funny? Erlang's a bloody joke!
Why would anybody choose Ruby on Rails knowing they will have to port their site to a real solution for any kind of scalability? Just start out in J2EE or .NET and be done with it!
Re:The twitter factor by Jeff+DeMaagd · 2008-07-06 14:23 · Score: 2, Insightful

It might even save money that way, but the way people talk about how Twitter is set up, it sounds like the people who set it up didn't even know what they were doing, as if they dropped out of school halfway through the database class. Given that they are still having problems, I think it's reasonable to suggest that they still don't know what they are doing, even though their VC funding should have allowed them to hire enough qualified people to fix the problem. As things stand, I wonder whether there really is any resale value in the company. At this point they have no revenue stream, not even ads as far as I can tell, so it looks like they're trying to build a service that gets bought out by a big company. I think whoever buys them would almost certainly not be buying them for the employees, the organization, the code or the infrastructure, but for the users and only the users. I see little value in anything there except what amounts to buying the users.
Re:The twitter factor by mcrbids · 2008-07-06 18:18 · Score: 1

If you don't design a site from the ground up to be redundant and highly performing, its pretty difficult to flip a switch and make it that way later. Which is basically what Twitter has found out. Whether or not this mentality is taking over the Interworld is another story though.
Truer words have never been spoken. I've successfully deployed an application that "bit" in the marketplace, and has grown rapidly. Since it's a niche product, you've never heard of it and probably never will. Nonetheless, we've been approaching the limits of what a single server + database server can accomplish, and have been on a year-long project to fix this.
We've been carefully building in the technology to scale linearly for a year now, testing extensively each step before rolling out the next change, and then doing it again, even as we continue to add features that customers want and need. Even with a properly layered software stack, it's a long, slow process that will probably never end so long as this company is making money. A fairly major update was just applied to production on 7/3.
We've done a fabulous job of running lean and mean on a single logic server - it's radically more difficult to build a high-integrity, database-driven application for a cluster with linear scalability!

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:The twitter factor by Doctor+O · 2008-07-06 23:27 · Score: 1

And really, that's OK.

In general, it would be OK, but Twitter having performance problems is something I just can't understand at all.
I mean, look at it. What does it do that couldn't be scaled by putting up web site load balancing, a DB cluster, and some clever caching?
I'm not trolling, I'm genuinely interested. Someone please enlighten me. I've been building web applications for almost ten years now and have no idea how they manage to perform so badly. And no, "RoR doesn't scale, n00b!" does not count. There are quite a few examples of complex RoR sites that perform well in spite of heavy traffic (see Xing and Multiply, among others).
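For the "clever caching" part of that question, the basic building block is a read-through cache with an expiry time. The sketch below is a generic illustration, not a description of how Twitter works; the loader, the TTL and the injectable clock are assumptions made so it can be tested.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Read-through cache: serve a stored value until it is ttl seconds old."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.loader = loader  # e.g. the expensive database query for one timeline
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # fresh hit: no database work
        value = self.loader(key)       # miss or stale: reload
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop a key, e.g. right after the user posts something new."""
        self._entries.pop(key, None)
```

Invalidating on write keeps a poster's own view fresh, while the TTL bounds how stale everyone else's copy can get.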

--
Who is General Failure and why is he reading my hard disk?
Re:The twitter factor by Builder · 2008-07-07 00:14 · Score: 1

See, I'd hire you based on that comment - well put.

But what happens by Anonymous Coward · 2008-07-06 06:39 · Score: 2, Funny

when the site you're using to monitor whether a site is down goes down?

PANIC AT THE DISCO!!

Re:But what happens by Magic5Ball · 2008-07-06 08:32 · Score: 1

This was recently discussed at the Outages list:
http://64.233.167.104/search?q=cache:gjmRbu02vRUJ:isotf.org/pipermail/outages/2008-June/000754.html+%22Outages+have+an+Outage%3F%22&hl=en&ct=clnk&cd=3&gl=ca&client=firefox-a
They're in the middle of migrating servers or something, so outages.org seems to be down at the moment:
https://puck.nether.net/pipermail/outages/2008-July/000084.html

--
There are 1.1... kinds of people.
Re:But what happens by Geak · 2008-07-06 16:46 · Score: 3, Interesting

I can't really trust those network monitoring sites. They aren't accurate. All they can tell you is that the site is down "from their location". I work for a web hosting company, and I've run into numerous cases where a customer is screaming that his website is down because the network monitoring site sent him a report saying so. In truth, the site was up the entire time (even the customer could get to it when I had him actually try). If a node goes down anywhere between the monitoring site and the customer's website, they get a false positive.

On top of that, you have to wonder whether any of these monitoring sites are deliberately sending false reports. Back when I was working for an ISP, some network monitoring software came out, and a number of people were installing it on their computers. It would start warning customers that their "network connection was saturated - blah blah blah", and customers would call in blaming us. Within a few days I started seeing reviews of the product on the net, and some research showed that it was deliberately generating false reports for anybody who wasn't with a certain large coaster-shipping ISP. Apparently the software company was a shareholder. I can't remember the name of the product, though; this was back in the old dial-up days.
Re:But what happens by teknognome · 2008-07-06 17:25 · Score: 1

Then you get things like Is isTwitterDown.com down?
Re:But what happens by Anonymous Coward · 2008-07-07 03:00 · Score: 0

I can't really trust those network monitoring sites. They aren't accurate.

I agree with you. An inaccurate monitoring service is in some ways worse than none at all if it generates false alarms that make people numb to alerts, or misses outages and gives a false sense of security.
Have you looked at Panopta? They have a unique system that confirms outages from multiple geographic locations before deciding a site is down. We've been using them for a few months, with no false alerts and several small outages that were detected.
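The multi-location idea is essentially a quorum vote. A minimal sketch, not any vendor's actual algorithm: each vantage point is a hypothetical probe function, and the site is only declared down if more than half of them fail, so a single broken route (the false-positive case described above) doesn't trigger an alert.

```python
from typing import Callable, Dict


def confirm_outage(probes: Dict[str, Callable[[], bool]],
                   quorum: float = 0.5) -> bool:
    """Report 'down' only if more than `quorum` of the vantage points fail."""
    if not probes:
        raise ValueError("need at least one vantage point")
    failures = 0
    for probe in probes.values():
        try:
            ok = probe()
        except Exception:  # a crashed probe counts as a failed check
            ok = False
        failures += 0 if ok else 1
    return failures > quorum * len(probes)
```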

You do recall... by Anonymous Coward · 2008-07-06 06:47 · Score: 0

You DO recall that people are still using windows, right? Where's the confusion?

More sites using multiple external sources by urbanriot · 2008-07-06 06:55 · Score: 2, Interesting

These days web pages are composed of multiple sources, often displaying content from multiple servers. Consider that 'back in the day' a web site was a static HTML file with multiple links. These days we have a 'site' linking to an image server, media server and advertising server, with SQL backends and other content providers. When one of these fails, often the whole works goes down.
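One common mitigation for this failure mode is to treat third-party pieces as optional: fetch them in parallel with a deadline and render a placeholder if they fail or time out. A minimal Python sketch; `render_page`, the widget names and the 0.5-second budget are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def render_page(core_html: str, widgets: Dict[str, Callable[[], str]],
                timeout: float = 0.5) -> str:
    """Render the page's own content plus whatever optional widgets arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(widgets)))
    futures = {name: pool.submit(fetch) for name, fetch in widgets.items()}
    deadline = time.monotonic() + timeout  # one shared budget for all widgets
    parts = [core_html]
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            parts.append(future.result(timeout=remaining))
        except Exception:  # timed out or raised: degrade instead of failing the page
            parts.append(f"<!-- {name} unavailable -->")
    pool.shutdown(wait=False)  # don't wait for hung widget calls
    return "\n".join(parts)
```

`shutdown(wait=False)` keeps a hung ad server from holding the page hostage, though the stuck worker thread keeps running in the background until its call returns.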

Personally, I don't notice an increased frequency in site downtimes with any of the services that I use and I don't feel this is newsworthy. Of course, I don't use Twitter so maybe that's why.

Re:More sites using multiple external sources by Stanislav_J · 2008-07-06 08:03 · Score: 2, Insightful

These days web pages comprise of multiple sources, often displaying content from multiple servers. Consider that 'back in the day' a web site was a static HTML file with multiple links. These days we have a 'site' linking to an image server, media server, advertising server, with sql backbones and other content providers. When one of these sites fail, often the whole works goes down.
Which is also why many major sites are so slow to load on less-than-optimal connections (which many people are still stuck with). Personally, I find all the bells and whistles distracting, complicating, and useless. It seems like sites compete to see how crowded and busy they can make their pages. Right up at the top of the list for me are sites that insist on displaying some stupid Flash screen (that adds nothing to the meat-and-potatoes content or function of the site) and give you no option for bypassing it. The Internet could be a marvelous animal for information if website designers could just resist the impulse to throw every possible widget and gewgaw into the mix. It not only adds little to the basic functionality of the site but, as pointed out above, just increases the number of individual elements that can fail and slow or stop a site in its tracks.
Me, if I want the MLB scores, or the news headlines, or to compare prices between a few retailers, all I need is the information, please. I don't need a floor show accompanying it.

--
"Every great cause begins as a movement, becomes a business, and eventually degenerates into a racket." -- Eric Hoffer

PHB to Webmaster... by brianc · 2008-07-06 06:56 · Score: 1

Don't let the site go down, you'll put your eye out!!

--

SIGLOST && SIGUNUSED && SIGQUIT

My theory: by Anonymous Coward · 2008-07-06 07:01 · Score: 0

More people are RTFA-ing Slashdot stories.

Most Always up by Anonymous Coward · 2008-07-06 07:02 · Score: 0

Most major sites use multiple ISPs and servers to ensure sites don't go down. My company uses AT&T and Verizon for its backbones.

Re:Most Always up by Anonymous Coward · 2008-07-06 07:22 · Score: 0

And it only takes one backhoe to destroy them all.
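The backhoe reply is the classic common-mode failure, and a little arithmetic shows why it matters. A minimal Python sketch, assuming each link's availability is a fixed probability, that the two ISPs fail independently, and that a single shared hazard (one trench carrying both circuits) can be modeled as a separate multiplicative factor. The 99.9% link figures and the 0.1% trench-cut chance are hypothetical numbers, not anything from the thread.

```python
def parallel_availability(*links):
    """Availability of redundant links: the site is down only if every link is down.

    Assumes link failures are independent of each other.
    """
    all_down = 1.0
    for a in links:
        all_down *= 1.0 - a
    return 1.0 - all_down


def with_common_mode(availability, shared_fault_probability):
    """Discount redundant availability by a fault that takes out all links at once (e.g. one trench)."""
    return (1.0 - shared_fault_probability) * availability


# Hypothetical figures: two 99.9% uplinks, and a 0.1% chance their shared trench gets dug up.
two_isps = parallel_availability(0.999, 0.999)
two_isps_one_trench = with_common_mode(two_isps, 0.001)
```

Under these made-up numbers, two independent 99.9% links give 99.9999%, but routing both through one trench drops the result to about 99.8999%, slightly worse than either link alone. Redundancy only pays when the paths are physically diverse.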

Load Balancer maybe?? by ZonkerWilliam · 2008-07-06 07:02 · Score: 0

Any type of load balancer in front of several web servers and application servers would prevent about 99.99% of downtime. That's barring poor coding and human error, of course, but if you hire the right guys, that shouldn't be an issue.
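For readers who haven't built one, the core of a load balancer is small. Here is a minimal round-robin sketch in Python, assuming some separate health checker calls `mark_down` and `mark_up` as backends fail and recover; the class name and the `web1`..`web3` hosts are hypothetical, and real balancers (HAProxy, nginx, hardware boxes) do far more.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any currently marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) health checker when a probe fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a backend passes its probes again.
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Walk at most one full lap of the ring looking for a healthy backend.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

Note what this does not cover: a bad deploy pushed to every backend, or a database they all share, fails all of them at once, which is the "poor coding and human error" caveat in practice.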

Blackstart capability by Animats · 2008-07-06 07:05 · Score: 4, Interesting

What with the "software as a service" and "outsourcing system administration" fads, more sites are relying on other sites being up when they power up. This could become a problem in bringing a site back up after an outage. It's important to know which sites have "black start" capability; they can start up without any resources from the outside.

You can save money by outsourcing Linux system administration to Tomsk, Russia, or Lotus system administration to India. "Remote System Administration for your Lotus Notes/Domino Servers, Infrastructure". But can you then restart your data center from a cold start, when the offshore admin people can't yet get in?
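One way to check for black start capability is to write down what each service needs before it can come up, then see which ones bottom out in something you don't run. A minimal Python sketch, assuming the dependency map is known and acyclic; it uses the standard library's `graphlib` (Python 3.9+), and every service name, including the outside `saas_sso` provider, is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def cold_start_plan(dependencies, external):
    """Split services into a cold-start boot order and a set that cannot start unaided.

    dependencies maps each service to the set of services it needs running first.
    external holds services we do not run ourselves (SaaS, offshore admin access, ...).
    """
    start_order, blocked = [], set()
    # static_order() yields every node after all of its dependencies (CycleError on loops).
    for service in TopologicalSorter(dependencies).static_order():
        if service in external:
            continue  # not ours to start
        needs = dependencies.get(service, ())
        if any(dep in external or dep in blocked for dep in needs):
            blocked.add(service)  # waits on something outside, directly or transitively
        else:
            start_order.append(service)
    return start_order, blocked


# Hypothetical data center: the login service leans on an outside SSO provider.
SERVICES = {
    "storage": set(),
    "db": {"storage"},
    "auth": {"saas_sso"},
    "app": {"db", "auth"},
    "web": {"app"},
}
EXTERNAL = {"saas_sso"}
```

With this map, only `storage` and `db` can be brought up from a cold start; `auth`, and everything stacked on it, waits for the outside world.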

Re:Blackstart capability by dubl-u · 2008-07-06 09:45 · Score: 3, Insightful

An important, related issue is the loss of local knowledge.
If you did a web startup ten years ago, you pretty much had to hire a sysadmin. If you had a good one, they would yell at your developers about their retarded, unscalable designs. Having a scary bearded man threaten you with defenestration has its downsides, but it does give you an incentive to consider the impact to operations.
The ever-lower cost of hosting is also a problem. If you tried to just throw $250k of hardware at a scaling issue back then, hopefully some executive would come by and ask some WTF-ish questions. (Unless you were at Boo.com or Webvan, natch.) But now, monthly rental on equivalent computing power is circa $400. Who'd bitch about that? That lets you really settle into a totally unscalable architecture.
Re:Blackstart capability by thogard · 2008-07-06 23:34 · Score: 1

You forgot about the renewed trend of reusing other people's code even when it complicates things. Just grab something off the web, link it in and hope it works. When you had a real sysadmin running the servers, the developers would write the few lines of code rather than download a package and its dependencies just to avoid some work. It seems like all our new development is mostly bogged down in getting everything we didn't write to work with everything else we didn't write.

Thanks, Grisoft by FilterMapReduce · 2008-07-06 07:09 · Score: 4, Funny

Are major web sites going down more often?

A bit more often now thanks to AVG?

Slashdot uncertainty principle by CrazyJim1 · 2008-07-06 07:10 · Score: 5, Funny

We're not sure if the sites are already dead, or if the observers changed the outcome.

--
God spoke to me.

Re:Slashdot uncertainty principle by Anonymous Coward · 2008-07-06 08:04 · Score: 0

MOD PARENT UP

or... by owlnation · 2008-07-06 07:56 · Score: 0, Troll

...is this just more sock puppetry for Twitter -- the single most annoying website on the planet, and the next biggest has-been?

Can we at least let one day go by without an article directly or indirectly about this POS?

Re:or... by Buran · 2008-07-06 08:36 · Score: 3, Insightful

So don't go there, don't click on links to it, and stop bitching about it. It only annoys you if you let it.
Or do you just like to whine?
Yes, they got a mention, because they can't fucking make the damn thing stop dying. If you want to be that prominent you need to get your shit together, or take the flak.

--
i am a soviet space shuttle
Re:or... by Anonymous Coward · 2008-07-06 15:10 · Score: 0

I haven't been to Twitter and I'm not quite sure what it is. I'm not sure I really care.
Does that mean I need to turn in my geek card, or that I've remained blissfully ignorant?

Could anyone give me a hint? by Opportunist · 2008-07-06 08:31 · Score: 1

How do "major web site" (as in "in any way important or at least interesting") and "Twitter" belong in the same sentence?

Now mod me flamebait and let's go on with our lives.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

Re:Could anyone give me a hint? by Anonymous Coward · 2008-07-06 09:14 · Score: 0

How do "major web site" (as in "in any way important or at least interesting") and "Twitter" belong in the same sentence?
Now mod me flamebait and let's go on with our lives.
It's a "major web site" in that a lot of people use it. You're welcome.
Re:Could anyone give me a hint? by dubl-u · 2008-07-06 09:57 · Score: 1

How do "major web site" (as in "in any way important or at least interesting") and "Twitter" belong in the same sentence?
They are at position 933 on Alexa's list of the world's most visited websites. I'd guess that means circa 1.5m registered users, 2.5m visitors/month, and 7.5m page views/day. As a comparison, they have about 2-3x the reach of Slashdot.
They may seem less well known to you than that because it's a social networking app that has spread mostly by word of mouth. If your friends use it, you won't be able to escape it; otherwise, it will seem irrelevant.
Re:Could anyone give me a hint? by Culture20 · 2008-07-06 13:13 · Score: 1

If your friends use it [Twitter], you won't be able to escape it; otherwise, it will seem irrelevant.
My friends have been using it a lot. It still seems irrelevant.
Re:Could anyone give me a hint? by Opportunist · 2008-07-06 19:25 · Score: 1

I know people who use it. I also know people using MySpace. But then again, I also know people eating at McD's...
Just because "everyone" does it doesn't make it relevant. Actually, if anything, it makes it irrelevant.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:Could anyone give me a hint? by dubl-u · 2008-07-07 07:50 · Score: 1

But then again, I also know people eating at McD's... Just because "everyone" does it doesn't make it relevant. Actually, if anything, it makes it irrelevant.
You should get together with that other guy. He thinks it's irrelevant because nobody's using it. You think it's irrelevant because everybody's using it. Together, you could be contemptuous and dismissive of everything!
What fun! It would be a party. Wait, no, it would be more like the opposite of a party. But either way, the rest of us would be better off.

no, but... by ClarisseMcClellan · 2008-07-06 11:51 · Score: 2, Interesting

AVG is probably why we have this post this week. There were a lot of timeouts last week, although Grisoft was not the only problemo. For a while Virgin Media customers in the UK lost a couple of continents last week, with the U.S.A. and Australia dropping off the map. I had to read Pravda instead of Slashdot for an hour or two...

My backup route actually worked fine, and I was just in the middle of getting a Squid proxy server of my own up and running when the network problems magically fixed themselves. There are lessons to be learned: if you need your internet more than is healthy, then you also need a backup plan. This could be a wifi sharing agreement with the neighbours or a proxy server at work that you can dial into from home. The internet does not dynamically re-route stuff when there is a problem with a major link. This is a problem. I thought we would have TCP/IP over ATM or something like that to solve that by now.

TrustSaaS just launched last week... by Anonymous Coward · 2008-07-06 12:05 · Score: 0

TrustSaas.com by Australian Online Solutions is an uptime monitoring service ('SaaS Weather Report') for Software as a Service (SaaS), run by an independent third party. It checks service status every 60 seconds (over 500,000 times per service per year), instantly alerting subscribers to problems by email and/or SMS. TrustSaaS records downtime with the highest resolution possible, and response times are also analysed. This information is included in monthly reports delivered by email to subscribers, allowing them to make comparisons between providers, monitor end-user experience, verify Service Level Agreement (SLA) compliance and trust that SaaS is delivering on its promises.
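The numbers in that pitch are easy to sanity-check, and so is what a once-a-minute probe can and cannot tell you. A minimal Python sketch, not TrustSaaS's actual implementation: it assumes each probe simply records up or down, then derives an uptime percentage and the longest outage. The `summarize` and `checks_per_year` names and the sample results are hypothetical.

```python
CHECK_INTERVAL_S = 60  # one probe per minute, as the post describes


def checks_per_year(interval_s=CHECK_INTERVAL_S):
    """Probes in a 365-day year at a fixed interval."""
    return 365 * 24 * 60 * 60 // interval_s


def summarize(results, interval_s=CHECK_INTERVAL_S):
    """Return (uptime percent, longest outage in seconds) from a list of probe results.

    results holds one boolean per probe, True meaning the service answered.
    The longest outage is only known to within one interval either way.
    """
    if not results:
        raise ValueError("no probe results")
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    uptime_pct = 100.0 * sum(results) / len(results)
    return uptime_pct, longest * interval_s
```

At 60 s that works out to 525,600 probes a year, consistent with "over 500,000". The flip side of sampling is resolution: an outage shorter than the interval can fall between two probes and never show up, so "highest resolution possible" here means one minute.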

so be it by Anonymous Coward · 2008-07-06 13:25 · Score: 0

make the "outternet"
ahhaha......just like ham groups......
or "undernet"?

The salesguy asked him to reboot the webserver. by Mr.+Hankey · 2008-07-06 16:20 · Score: 1

http://www.thewebsiteisdown.com

--
GPL: Free as in will

Twitter down? by halcyon1234 · 2008-07-07 01:30 · Score: 1

halcyon_on_twitter: Is there anybody out there?

--
UTF-8: There and Back Again

How long until calls to NATIONALIZE them? by mi · 2008-07-07 06:30 · Score: 0, Flamebait

A century ago the price of gasoline worried very few people. Today there are calls to nationalize oil companies as "vital businesses"; somehow, they believe, nationalization improves things...

How long until these same Commies (or whatever they'll choose to call themselves, when the label-du-jour gets just as discredited) call for nationalization of Google or Amazon?

The nation cannot exist without a reliable search engine, can it? We must nationalize Google to ensure fair and equal access to knowledge for all.

Or: our least-privileged can least afford to buy the expensive books they need to get ahead. To help the poor with readily affordable knowledge, we must have the government take over book distribution by nationalizing Amazon and other booksellers, whose obscene profits the such-and-such Administration refuses to tax!

--
In Soviet Washington the swamp drains you.

Slashdot Mirror

77 comments