Slashdot Mirror


Army DNS ROOT Server Down For 18+ Hours

An anonymous reader writes "The H-Root server, operated by the US Army Research Lab, spent 18 hours out of the last 48 being a void. Both RIPE's DNSMON and the h.root-servers.org site show this. How, in this day and age of network engineering, can we even entertain one of the thirteen root servers being unavailable for so long? I mean, the US Army doesn't even seem to make the effort to deploy more sites. Look at the other root operators who don't have the backing of the US government money machine. Many of them seem to be able to deploy redundant instances. Even the much-maligned ICANN seems to have managed to deploy 11 sites. All these root operators that have only one site need a good swift kick, or maybe they should pass the responsibility to others who are more committed to ensuring the Internet's stability."

29 of 154 comments

  1. Army Intelligence? by toygeek · · Score: 2, Funny

    An Oxymoron indeed!

    1. Re:Army Intelligence? by Mr2cents · · Score: 4, Funny

      Don't be so harsh on the US military. They only have a trillion dollar budget, you know? How are you ever going to set up redundant systems if all you get is pocket change? You have to cut corners somewhere. Maybe it's time to increase their funding a bit more.

      --
      "It's too bad that stupidity isn't painful." - Anton LaVey
    2. Re:Army Intelligence? by sumdumass · · Score: 3, Interesting

      Actually, given the size and scope of the US military, you are right, 1 trillion dollars is about pocket change to most people.

      I'm for increasing their budget more too. But I'm not sure that this outage wasn't planned. How better to test the ability to withstand a "cyber attack" than to lose your DNS servers and see if your departments can fully function without them. This ability would greatly decrease the time needed to change to an alternative system if ever needed or, more likely, to regroup resources and work around it. I'm not so sure that this wasn't just a readiness test of some sort disguised as an accidental outage or something. It would make more sense to make it appear to be a problem server than to actually gear up to work around it, which would create another potential target for any attacks.

    3. Re:Army Intelligence? by Runaway1956 · · Score: 2, Funny

      Careful - don't lump all the military together. It's the ARMY under discussion. My navy has problems, to be sure, but my navy can keep a server up and running. Not to mention, the navy wrote the book on repetitive redundancy. I think congress should take the server away from the army, and give to the navy. Overall security should improve, and physical security will most certainly improve. Our marines haven't lost a server yet!

      --
      "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
    4. Re:Army Intelligence? by Stupendoussteve · · Score: 2, Insightful

      Er... the Navy has outsourced to HP. In fact, to get out of the agreement they are having to pay to even receive information about the network configuration.

  2. So the Internet worked as it should... by Anonymous Coward · · Score: 5, Insightful

    So the Internet worked as it should, and routed around this disruption. The other root servers were unaffected, and still functioned fine. So what exactly is the problem?

    1. Re:So the Internet worked as it should... by jayhawk88 · · Score: 4, Funny

      Because it's Saturday, and we don't have anything else to get upset about! WE HAVE TO HAVE SOMETHING TO GET UPSET ABOUT, DON'T YOU UNDERSTAND?! How can I be expected to face the day if I'm not pissed off about something that doesn't directly affect me in any meaningful way?

    2. Re:So the Internet worked as it should... by OeLeWaPpErKe · · Score: 2, Insightful

      We've all made links in cat5 > 200 meters that work perfectly fine. Granted, perfect reliability is something else, but for a backup link in a datacenter that charges an arm and a leg for fiber connections and < 10% of that price for copper ... I've even been known to stick that link in a 10G copper interface card to see if it'd work (even if it didn't work). But I've had reliable gigabit copper links over > 250 meters operational for years. It helps a lot if they're the only ethernet link in a metal cable tray.

      And the opposite as well. Ever had an ethernet link inside a bundle of VDSL links ? The link was barely 30 meters, but the error counters mounted faster than the traffic counters. And the link stayed up, so the routing protocol saw no need to reroute. Now that was a bitch to deal with. Especially since we couldn't replace the cable with cat6.

      If your network design can't deal with signal loss on individual links, especially when known beforehand that said links are located in a warzone, you have other problems than theoretical maximum link distances. And even in general : hardware WILL fail, so prepare for failure instead of investing untold resources in preventing it.

  3. Why is it their problem? by sjs132 · · Score: 2, Insightful

    Because they don't have redundancy? Everyone gets mad because the USA wants to control the internet, but let something go bad and then someone wants to point fingers? Really? I just don't get the mentality of "We want you to do this for free" and then people turn around and B&M about the service being down for a bit.

    --
    --- Relax, that mass murderer is just trying to reduce our carbon footprint, one fetus at a time...
    1. Re:Why is it their problem? by Sprouticus · · Score: 4, Insightful

      It has nothing to do with this being a US Army server. It has everything to do with bad design. The people given the responsibility of a root server should NOT take that responsibility lightly.

    2. Re:Why is it their problem? by JWSmythe · · Score: 4, Informative

      Actually, most of the root "servers" are "anycast" now (9 of 13), so a single site failure doesn't matter. The US DoD runs two (G and H). G is anycast. H isn't. There wasn't clarification as to what the issue was. It's easy to be quick to say "oh they suck", but shit happens sometimes. That's part of why we don't run on just one root nameserver. :)

      For all we know, it could have been a planned outage. I kinda doubt it with that size window, but who knows. It was only 1 of 13, which makes it more like 1 of an awful lot since 9 of the "servers" are really servers distributed worldwide. I was doing some monitoring a while back, showing how our traffic moved, and that included monitoring the root servers. It made some really screwy routes, where one check would be in the US, and the next one would be somewhere in Europe.

      --
      Serious? Seriousness is well above my pay grade.
  4. One down, several dozens up by Anonymous Coward · · Score: 2, Insightful

    What's the problem? The point of redundancy isn't to keep all redundant instances up all the time. The system is designed to allow for downtime of quite a few servers.

  5. Lowest bidder by pixiekhatt · · Score: 4, Insightful

    This is what happens when you give contracts to the lowest bidder. The military may have tons of money, but that doesn't mean they spend it wisely. Even if it's not a contracted company taking care of these servers, and it's government employees (there's a difference), a LOT of those employees get their jobs based on keywords and general qualifications and several have an 'I did my time in the military and retired, they owe me this for all the hard work I did before' attitude. Not everyone is like that, and I've met some government employees (in the tech field) who really did know their stuff.. and not all contracts are bad -- but they can turn sour when a company steps in, says they'll do all that and more for this much less, and they really don't know what they're doing. I've seen that happen too. And if it's managed by soldiers.. well. They always told us, you're a soldier first, and a 'whatever your job is' after. Most technically trained soldiers don't know how to do their job well, or even at all. They just tough it out until they're an NCO, and then they're supposed to be a leader and tell their underlings to do the work.

    1. Re:Lowest bidder by Isao · · Score: 4, Interesting

      There are two main approaches to government contracting: Lowest Cost and Best Value. Contrary to popular belief, Lowest Cost is not always the one chosen, by a long shot. I also previously misunderstood "Close enough for government work." Turns out most "government work" has very specific requirements and specifications, or you don't get paid. If you see something different, please call Waste, Fraud & Abuse.

    2. Re:Lowest bidder by John+Hasler · · Score: 2, Insightful

      > This is what happens when you give contracts to the lowest bidder.

      Because they'd obviously get better results by giving them to the highest bidder...

      Try to get your head around concepts like "requirements", "specifications", and "lowest qualified bid". You not only do not get paid if you don't do the job you agreed to do, you may even have to pay the extra cost of having someone else do it over.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  6. There are 12 others - pick one. by Anonymous Coward · · Score: 5, Insightful

    Hardware fails. That's just how it is. Even with the highest end hardware available today, outages can happen. This is why there are 13 root servers to start with. So long as they don't all go down at once, all is good. As far as 18 hours to recover, why is that bad? With 12 others to pick from, should this one be a high priority? I think not. Getting one's panties in a bunch because a server fails and takes some time to recover makes you sound like a silly management type. Most of us lived at least a large part of our lives without any root servers - or any servers at all. It's not the end of the world if DNS goes down. It will be ok, I promise.

    1. Re:There are 12 others - pick one. by forkazoo · · Score: 5, Insightful

      > Most of us lived at least a large part of our lives without any root servers - or any servers at all. It's not the end of the world if DNS goes down. It will be ok, I promise.

      > You are an idiot.

      > At one time it wouldn't have been a disaster for DNS to go down. Now we have everything from business to business transactions to stock trading to government bonds to consumer purchases being done online. We have hospitals depending on the internet to get their plasma on time. We have a billion people using social networks for hours. We have farmers using the internet to check the weather, militaries using the internet to transmit vital intelligence, and kids using the internet to call home and say they'll be late.

      Meh. It's just one of 13 roots. Almost nobody queries it directly. If I have my DNS pointing to my ISP DNS, or to Google DNS, or to my own recursive caching DNS server which uses one of those as an upstream, all 13 root servers could be down for literally days and it's likely that almost nobody would ever notice. Most DNS servers will retain large caches of most domains. If something freaks out when the roots disappear, a few small ISPs might need to make some quick configuration changes. Some DNS changes wouldn't propagate properly until the DNS root servers were back online. But, frankly, life would go on. Making all of DNS go away would be pretty much impossible, short of taking out every node on the Internet.

      Yes, if *All 13* root servers suddenly died, there would be a few people who would get a late night at the office, but I certainly wouldn't see the effects directly.
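
The caching argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real resolver is implemented: `CachingResolver`, the `upstream` callable, and the explicit `now` timestamp are all hypothetical names invented for the example. It shows that an answer already in the cache keeps being served while the upstream is unreachable, and that lookups only start failing once the cached entry's TTL has run out.

```python
class CachingResolver:
    """Toy caching resolver (hypothetical; for illustration only)."""

    def __init__(self, upstream):
        # upstream: callable taking a name and returning (address, ttl_seconds).
        # It is expected to raise an exception when the upstream is unreachable.
        self.upstream = upstream
        self.cache = {}  # name -> (address, expires_at)

    def resolve(self, name, now):
        """Return the address for name, using the cache while the entry is fresh.

        now is a timestamp in seconds, passed in explicitly so the behaviour
        is deterministic and easy to test.
        """
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            # Still within the TTL: answer from cache without contacting upstream.
            return entry[0]
        # Cache miss or expired entry: this is the only point that needs upstream.
        address, ttl = self.upstream(name)
        self.cache[name] = (address, now + ttl)
        return address
```

With a two-day TTL chosen for illustration, a name resolved once at `now=0` would still be answered from cache at `now=172799` even if `upstream` raises on every call; only a lookup at or after `now=172800` would reach the dead upstream and fail.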

    2. Re:There are 12 others - pick one. by stephanruby · · Score: 2, Interesting

      > Go ahead and rub your nose in it until you get over your "how DARE you claim incompetence within the Army" offense.

      First, let me start by saying that the guy you replied to was rude, and I don't see why he needed to insult you to make his point. However...

      > What went wrong is that a server that's not supposed to ever go down went down.

      Your argument seems circular. Your assumption is that this root server is never supposed to go down. In this physical world, that's a pretty huge assumption to make.

      And no, saying that the server went down is no proof positive that it should never have happened. The fact is, there was redundancy and the redundancy kicked in as it was supposed to. Now we're saying the redundancy can be outside of root, or inside of root, it doesn't really matter. And you're saying that the redundancy has to be ***inside*** of root, there can be no other way.

      Tell us, have you read something that gave you that idea? Me, I'm thinking that you probably read that recent Times (or Newsweek) article. If anything, I'd agree that the article seemed only to romanticize and emphasize the importance of root servers, and I'd argue that it was more a piece of flamboyant storytelling than an actual report on an actual technology. The truth about root servers is far less sexy than what the article implied. The real truth is that if all the root DNS servers went down at the same time, most of the internet and its DNS would keep on working pretty nicely. We'd be running on old, possibly slightly outdated, cached DNS information, but that wouldn't really matter -- it's only the end results that would matter anyway.

  7. Really, I'm going to be the first? by Anonymous Coward · · Score: 4, Funny

    They're sticking to their motto and deploying an Army of one.

    1. Re:Really, I'm going to be the first? by Sulphur · · Score: 2, Funny

      When the movie comes out, will it be Steven Spielberg, James Cameron, or Mel Brooks?

  8. wow by buddyglass · · Score: 4, Insightful

    Whine much?

  9. Was it the monitoring system? by Antique+Geekmeister · · Score: 2, Interesting

    I've seen numerous instances where the monitoring system, itself, was confused or detached. The results on a chart are then quite confusing, unless you know how to backfill the data in the chart.

    Why, no, I've never been asked to do that for a 99.999% uptime SLA monitored site when some confused person in the offsite monitoring station put a bad IP address in /etc/hosts. No, no, no, couldn't happen.

    1. Re:Was it the monitoring system? by Anonymous Coward · · Score: 5, Informative

      https://lists.dns-oarc.net/pipermail/dns-operations/2010-October/006142.html
      Classification: UNCLASSIFIED
      Caveats: NONE

      > FYI, the H root server is currently experiencing an outage
      > due to a SONET ring outage possibly caused by flooding from
      > the tropical storm on the east coast. No estimated repair time.

      H root returned to service at 12:30 UTC today. Fiber cut due to downed
      utility poles. Repair was delayed due to high water.

  10. "backing of the US government money" by Joce640k · · Score: 4, Insightful

    Rest assured, the government isn't holding back. Those non-redundant Army servers already cost an order of magnitude more than everybody else's redundant servers.

    --
    No sig today...
  11. Non-story by A+beautiful+mind · · Score: 2, Interesting

    You have to realise that the layout of the root dns server hierarchy is historical. It is composed of organizations that are vastly different now than they were 20 years ago. The H root server people don't seem to care about things very much and there are a couple of other root servers where the organizations operating them don't put too much effort into things.

    Luckily, the internet doesn't really depend on them, as there are a couple of big organizations with heavy investment into making sure the root servers stay accessible all the time, like RIPE or Verisign. They operate thousands of physical machines at dozens of geographically distributed locations, all structured under one IP address, via anycast. This results in the situation where one logical root server outweighs another in terms of physical boxes by at least 100:1, if not more.

    My last information about the Verisign operated root servers, from a couple of years ago, is that they are ridiculously overprovisioned, operating well under 1% used capacity, even when subjected to a fairly large DDoS. As far as I know, the common DNS servers all support RTT banding, which basically means picking at random from the list of servers for a given resource that fall below a latency threshold; therefore they wouldn't really notice the H root being down.
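
A rough sketch of the RTT banding idea as described above, in Python. It makes several assumptions: `pick_server`, the 100 ms default threshold, and the convention that an unreachable server has an RTT of `None` are all made up for illustration, and real resolvers each use their own server-selection logic. The sketch only shows why a dead server drops out of the candidate list without anyone noticing.

```python
import random


def pick_server(rtts_ms, threshold_ms=100.0, rng=random):
    """Pick a server at random from those with RTT at or below threshold_ms.

    rtts_ms maps a server name to its measured round-trip time in
    milliseconds, or to None if the server did not answer at all.
    threshold_ms is an arbitrary illustrative value, not a standard.
    """
    # Drop servers that did not respond (for example, a root that is down).
    reachable = {name: rtt for name, rtt in rtts_ms.items() if rtt is not None}
    if not reachable:
        raise RuntimeError("no reachable servers")

    # The "band": every reachable server at or below the latency threshold.
    band = sorted(name for name, rtt in reachable.items() if rtt <= threshold_ms)
    if not band:
        # Nothing is fast enough, so fall back to the single fastest server.
        band = [min(reachable, key=reachable.get)]

    # Choosing at random spreads queries across all acceptable servers.
    return rng.choice(band)
```

Called with something like `{"a": 20.0, "h": None, "k": 35.0, "m": 180.0}`, only `a` and `k` can ever be returned: `h` is skipped because it never answered, and `m` is outside the band.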

    --
    It takes a man to suffer ignorance and smile
    Be yourself no matter what they say
  12. It's just a drill: Cyber Storm III by Xemu · · Score: 3, Interesting

    Could this simply be a part of the Cyber Storm III information warfare exercise?
    http://www.military-technologies.net/2010/09/29/test-of-first-us-cyber-blitz-response-plan-begins/

    --
    Tell your friends about xenu.net
  13. Re:Not the biggest problem out there.,,, by gnieboer · · Score: 4, Interesting

    Agreed.

    From the offending server's website: "BRL volunteered to host one of the original root servers ... to provide a root server for the MILNET in the event that MILNET had to be disconnected from the Internet."

    The purpose of the G/H servers is not to support the greater good (that's a side benefit), but to ensure that the MILNET can function if the DoD cuts itself off from the rest of the internet.

    And besides, if my math is correct, there are a total of 205 redundant root sites (http://www.root-servers.org/), so imagine going up asking for funding...
    [IT Guy] "General, we need money to add another redundant root server site, if all the sites go down the internet collapses!"
    [General] "That sounds bad! How many redundant sites are there now?"
    [IT Guy] "Only 205"
    [General]

  14. A good swift kick? by Klync · · Score: 2, Funny

    > All these root operators that have only one site need a good swift kick...

    Alright, anonymous coward, I nominate YOU to be the one to go and give the US Army a "good swift kick". See ya when you get back!

    --

    ----
    Not to be confused with Col.
  15. mod parent up - it has actually *gasp* information by shallot · · Score: 2, Funny

    Wish I had mod points...

    Of the 64 comments I see in full, only this one has actual pertinent information about the downtime.

    ...

    I must be new here. :)