Army DNS ROOT Server Down For 18+ Hours
An anonymous reader writes "The H-Root server, operated by the US Army Research Lab, spent 18 hours out of the last 48 being a void. Both RIPE's DNSMON and the h.root-servers.org site show this. How, in this day and age of network engineering, can we even entertain one of the thirteen root servers being unavailable for so long? I mean, the US Army doesn't even seem to make the effort to deploy more sites. Look at the other root operators who don't have the backing of the US government money machine. Many of them seem to be able to deploy redundant instances. Even the much-maligned ICANN seems to have managed to deploy 11 sites. All these root operators that have only one site need a good swift kick, or maybe they should pass the responsibility to others who are more committed to ensuring the Internet's stability."
So the Internet worked as it should, and routed around this disruption. The other root servers were unaffected, and still functioned fine. So what exactly is the problem?
This is what happens when you give contracts to the lowest bidder. The military may have tons of money, but that doesn't mean they spend it wisely. Even if it's not a contracted company taking care of these servers, and it's government employees (there's a difference), a LOT of those employees get their jobs based on keywords and general qualifications, and several have an 'I did my time in the military and retired, they owe me this for all the hard work I did before' attitude. Not everyone is like that, and I've met some government employees (in the tech field) who really did know their stuff, and not all contracts are bad -- but they can turn sour when a company steps in, says they'll do all that and more for this much less, and they really don't know what they're doing. I've seen that happen too. And if it's managed by soldiers... well. They always told us, you're a soldier first, and a 'whatever your job is' after. Most technically trained soldiers don't know how to do their job well, or even at all. They just tough it out until they're an NCO, and then they're supposed to be a leader and tell their underlings to do the work.
Don't be so harsh on the US military. They only have a trillion dollar budget, you know? How are you ever going to set up redundant systems if all you get is pocket change? You have to cut corners somewhere. Maybe it's time to increase their funding a bit more.
"It's too bad that stupidity isn't painful." - Anton LaVey
It has nothing to do with this being a US Army server. It has everything to do with bad design. The people given the responsibility of a root server should NOT take that responsibility lightly.
Hardware fails. That's just how it is. Even with the highest end hardware available today, outages can happen. This is why there are 13 root servers to start with. So long as they don't all go down at once, all is good. As far as 18 hours to recover, why is that bad? With 12 others to pick from, should this one be a high priority? I think not. Getting one's panties in a bunch because a server fails and takes some time to recover makes you sound like a silly management type. Most of us lived at least a large part of our lives without any root servers - or any servers at all. It's not the end of the world if DNS goes down. It will be ok, I promise.
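To make the "12 others to pick from" point concrete, here is a rough sketch of a resolver walking a list of root servers and moving on when one doesn't answer. It is only a simulation: `query_with_failover` and `h_root_is_down` are made-up names for illustration, nothing touches the network, and real resolvers use timeouts and smarter server selection than a simple in-order walk.

```python
import string

# The 13 root server names, a.root-servers.net through m.root-servers.net.
ROOT_SERVERS = [f"{letter}.root-servers.net" for letter in string.ascii_lowercase[:13]]


def query_with_failover(servers, is_up):
    """Return (server, attempts) for the first server that answers.

    `is_up` is a hypothetical stand-in for "send a query and wait for a reply".
    """
    for attempts, server in enumerate(servers, start=1):
        if is_up(server):
            return server, attempts
    raise RuntimeError("no root server answered")


def h_root_is_down(server):
    """Simulated outage: every root server answers except H."""
    return server != "h.root-servers.net"


# Worst case for this outage: the resolver happens to try H first.
order = ["h.root-servers.net"] + [s for s in ROOT_SERVERS if s != "h.root-servers.net"]
answered_by, attempts = query_with_failover(order, h_root_is_down)
print(answered_by, attempts)
```

Even when H is tried first, the cost is one wasted attempt before another letter answers.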
They're sticking to their motto and deploying an Army of one.
Whine much?
Rest assured, the government isn't holding back. Those non-redundant Army servers already cost an order of magnitude more than everybody else's redundant servers.
No sig today...
Could this simply be a part of the Cyber Storm III information warfare exercise?
http://www.military-technologies.net/2010/09/29/test-of-first-us-cyber-blitz-response-plan-begins/
Tell your friends about xenu.net
Agreed.
From the offending server's website: "BRL volunteered to host one of the original root servers ... to provide a root server for the MILNET in the event that MILNET had to be disconnected from the Internet."
The purpose of the G/H servers is not to support the greater good (that's a side benefit), but to ensure that the MILNET can function if the DoD cuts itself off from the rest of the internet.
And besides, if my math is correct, there are a total of 205 redundant root sites (http://www.root-servers.org/), so imagine going up and asking for funding...
[IT Guy] "General, we need money to add another redundant root server site, if all the sites go down the internet collapses!"
[General] "That sounds bad! How many redundant sites are there now?"
[IT Guy] "Only 205"
[General]
Actually, most of the root "servers" are "anycast" now (9 of 13), so a single site failure doesn't matter. The US DoD runs two (G and H). G is anycast. H isn't. There was no clarification as to what the issue was. It's easy to be quick to say "oh they suck", but shit happens sometimes. That's part of why we don't run on just one root nameserver. :)
For all we know, it could have been a planned outage. I kinda doubt it with that size window, but who knows. It was only 1 of 13, which makes it more like 1 of an awful lot, since 9 of the "servers" are really servers distributed worldwide. I was doing some monitoring a while back, showing how our traffic moved, and that included monitoring the root servers. It showed some really screwy routes, where one check would be in the US, and the next one would be somewhere in Europe.
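For anyone who wants to try that kind of monitoring, here is a minimal sketch of a probe that sends a root server a query for the root zone's NS records over UDP and times the reply. The function names are hypothetical, the server address is deliberately left for the caller to supply, and this is not how DNSMON or any particular monitoring setup works. It only tells you whether something answered and how fast. With anycast, which site answered depends on routing.

```python
import socket
import struct
import time


def build_root_ns_query(query_id):
    """Build a minimal DNS query asking for the NS records of the root zone."""
    # Header: ID, flags (all zero: standard query, recursion not desired),
    # then QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # Question: root name (a single zero byte), QTYPE=2 (NS), QCLASS=1 (IN).
    question = b"\x00" + struct.pack("!HH", 2, 1)
    return header + question


def probe(server_ip, timeout=2.0):
    """Return the round-trip time in seconds, or None if no usable reply arrives."""
    query = build_root_ns_query(0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:
            # Covers timeouts and unreachable-network errors alike.
            return None
        # Only count replies that echo our query ID.
        if reply[:2] != query[:2]:
            return None
        return time.monotonic() - start
```

Calling `probe(address)` every few minutes for each root server address and logging the `None` results would give a crude version of the outage graph in the story.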
Serious? Seriousness is well above my pay grade.
Actually, given the size and scope of the US military, you are right: 1 trillion dollars to them is about what pocket change is to most people.
I'm for increasing their budget more too. But I'm not sure that this outage wasn't planned. How better to test the ability to withstand a "cyber attack" than to lose your DNS servers and see if your departments can fully function without them? This ability would greatly decrease the time needed to change to an alternative system if ever needed, or more likely to regroup resources and work around it. I'm not so sure that this wasn't just a readiness test of some sort disguised as an accidental outage. It would make more sense to make it appear to be a problem server than to openly gear up to work around it, which would create another potential target for any attacks.
https://lists.dns-oarc.net/pipermail/dns-operations/2010-October/006142.html
Classification: UNCLASSIFIED
Caveats: NONE
> FYI, the H root server is currently experiencing an outage
> due to a SONET ring outage possibly caused by flooding from
> the tropical storm on the east coast. No estimated repair time.
H root returned to service at 12:30 UTC today. Fiber cut due to downed
utility poles. Repair was delayed due to high water.