Transatlantic Cable Fault Disrupts Internet In UK
An anonymous reader submits "Web traffic between the U.S. and Europe has been hit after an undersea cable developed a major fault on Tuesday. Because the TAT-14 cable network is shaped like a ring, it should be able to cope with one such failure -- but unfortunately the consortium that owns it hadn't fixed an earlier problem, just off the U.S. coast. Just shows how systems with built-in redundancy can still go badly wrong...."
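To make the ring argument concrete, here is a minimal sketch of why a ring survives one cut but not two. The four station names and the link layout are made up for illustration; they are not the real TAT-14 landing points, and the model ignores capacity entirely.

```python
from collections import deque

def reachable(nodes, links, start):
    """Return the set of nodes reachable from start over the given links."""
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

def is_connected(nodes, links):
    """True if every node can still reach every other node."""
    return reachable(nodes, links, nodes[0]) == set(nodes)

# Hypothetical four-station ring (assumption, not the real cable layout).
STATIONS = ["US-A", "US-B", "EU-A", "EU-B"]
RING = [("US-A", "US-B"), ("US-B", "EU-B"), ("EU-B", "EU-A"), ("EU-A", "US-A")]

def ring_without(*cut_links):
    """Return the ring's links with the given segments removed."""
    return [link for link in RING if link not in cut_links]

# One cut: traffic can still go the long way round.
one_cut = ring_without(("US-A", "US-B"))
# Two cuts: the ring falls apart into two disconnected halves.
two_cuts = ring_without(("US-A", "US-B"), ("EU-B", "EU-A"))

print(is_connected(STATIONS, one_cut))
print(is_connected(STATIONS, two_cuts))
```

In this toy model the first fault costs nothing but spare capacity, which is exactly why an unrepaired first fault is easy to live with until the second one arrives.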
My guess is that the initial problem was itself a fault in an undersea section of the cable. Those generally take two or more weeks to fix, and if the weather is really bad, they have to pull the boats back in, delaying things further.
No evidence, of course, but it seems like the most logical reason. Cables like the TAT-14 don't stay unfixed just because someone in management is lazy.
"Just shows how systems with built-in redundancy can still go badly wrong...."
No, it shows how well-designed redundancy can be overcome by bad management decisions! Engineering brought low by bean counters... Gee, when has that ever happened before?!
Any technology distinguishable from magic is not sufficiently advanced.
Any technology distinguishable from magic is insufficiently advanced. - Geek's corollary to Clarke's law
The real problem is in the design of networks. Information networks are designed to be fault-tolerant (famously but erroneously attributed to a desire to withstand nuclear attacks) -- multiple connections and a "mesh" network mean that if nodes break, traffic is routed elsewhere and the network continues to function. This works great, and there's no problem with it. But the problem is, humans don't build networks this way, and economics is against doing so.
If you're buying a network connection, you buy it from the best provider available, which naturally means network connections become concentrated among a few suppliers, who in turn find economies of scale and offer lower prices, attracting still more customers. Thus the economics of building networks naturally produces networks that have a few points of failure, or even a single one. We noticed this on September 11th, when the knockout of the huge links through New York noticeably slowed transatlantic traffic, even to sites other than CNN and the other news sites that were being toasted by demand at that point. Centralisation is something that we naturally do because it's economically efficient, but centralisation leads to problems for networks.
In the energy sector, things are even less flexible, because energy connections are a lot more expensive to set up and more difficult to maintain than information links. The US power cut was caused by the cascading failure of a daisy-chain of power stations around the Great Lakes. Nobody would build an information network that way any more, but it's still the natural way to build a power network. Italy's power cut was caused by a huge reliance on foreign power, supplied by JUST TWO LINKS to France -- one fell over, instantly overloading the second and knocking it out too.
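The two-link overload described above can be sketched in a few lines. This is a toy model, not a description of the actual grid: it assumes the total load is shared equally among surviving links, and the link names, capacities and load figures are invented purely for illustration.

```python
def cascade(capacities, total_load, initial_failure):
    """Simulate links tripping one after another.

    Assumption: total_load is split equally among the links still alive,
    and any link asked to carry more than its capacity trips.
    Returns (links that tripped, in order; links still alive, sorted).
    """
    alive = {name: cap for name, cap in capacities.items()
             if name != initial_failure}
    tripped = [initial_failure]
    while alive:
        share = total_load / len(alive)
        overloaded = [name for name, cap in alive.items() if share > cap]
        if not overloaded:
            break  # the survivors can carry the load
        for name in overloaded:
            del alive[name]
            tripped.append(name)
    return tripped, sorted(alive)

# Two links, each rated below the total load (made-up numbers).
two_links = {"link-1": 60, "link-2": 60}
print(cascade(two_links, total_load=100, initial_failure="link-1"))

# Same load, but a third link adds enough headroom to absorb one failure.
three_links = {"link-1": 60, "link-2": 60, "link-3": 60}
print(cascade(three_links, total_load=100, initial_failure="link-1"))
```

With two links, losing one pushes the whole load onto the other and takes it down as well; with a third link of the same size, the same failure is absorbed. The point is only that the extra headroom has to be paid for up front.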
Yes, we are critically reliant on these fragile networks. And yes, economic realities tend to cause these problems, but not because of privatization: it's simply because humans naturally tend to build poor networks, because those are cheaper -- no matter who pays the bills. To solve the problem, we need to pay more attention to networking theory when building all of our networks, and provide regulatory incentives to build better networks of both kinds.
Or one day, a critical failure will cause a cascading catastrophe, and it will be nobody's fault. We built the network to fail that way.
From the story: Just shows how systems with built-in redundancy can still go badly wrong
Well, built-in redundancy is just there to give you some time to fix problems before they disrupt activity. I mean, if I don't replace HDD A in my RAID-1 array when it is reported to be defective, there is no point in having a RAID-1 array. The company in charge is responsible. The "built-in redundancy" did its job fine. They just shouldn't have installed a system with redundancy if they didn't plan on fixing non-disruptive problems.
Write boring code, not shiny code!