Slashdot Mirror


Entire .SE TLD Drops Off the Internet

Icemaann writes "Pingdom and Network World are reporting that the SE tld dropped off the internet yesterday due to a bug in the script that generates the SE zone file. The SE tld has close to one million domains that all went down due to missing the trailing dot in the SE zone file. Some caching nameservers may still be returning invalid DNS responses for 24 hours."

16 of 207 comments

  1. change control / management, anyone? by SuperBanana · · Score: 5, Insightful

    I seriously hope someone is fired or loses a contract over this. Where was the validation, change control, etc? I would expect that at the TLD level, a change to a configuration file would have to be inspected by someone AND run through some syntax-checking scripts...

    As for the person who was modded up for saying "hey, no big deal, fixed in 30 minutes!", not quite. DNS servers (and individual computers!) cache negative results. Anything anyone did a query on during those 30 minutes will be negatively cached by their system and their local DNS server. Granted, a whole lot of local Swedish ISPs and network providers have probably flushed their DNS server caches, but it's still going to seriously impact traffic to many, many sites, especially for everyone outside Sweden.

    1. Re:change control / management, anyone? by e2d2 · · Score: 2, Insightful

      I'll go one better and say we should try him in a military tribunal and sentence him to hard time in ADX. That will send the world a message - NO MISTAKES OR ELSE.

      Get real man, this is a human error. Your struggle for perfection baffles my monkey brain.

    2. Re:change control / management, anyone? by davebooth · · Score: 3, Insightful

      Right AND wrong in one post :)

      Excessive paperwork, like 30 minutes to fill out a change request form for a 30-second edit to a config file and a SIGHUP to a daemon, is stupid, and you'll hear no argument from me on that. Change control per se, however, is essential, particularly in a large enterprise. Running part of that kind of infrastructure without change control would be like trying to manage the kernel source tree without cvs (or svn or $REPOS_OF_CHOICE; the analogy holds either way).

      The problem is not change control, it's the way it is implemented. Change control methodology is designed by PHBs who haven't actually done the tech work in years, if they ever did. It's then scribbled all over by a "business analyst" who thinks a sigpipe is a plumbing problem, and by the time the guys actually doing the work get hold of it, it has become a nightmare of procedural BS. All you really needed was a way to make sure everything you do to a live production system is documented, and that anything other than emergency break-fix at least got basic testing and a second pair of eyes looking at it before rolling it out.

      --
      I had a .sig once. It got boring.
    3. Re:change control / management, anyone? by RabidMonkey · · Score: 5, Insightful

      As a DNS admin myself, touching high value zones, let me tell you, missing a stupid dot happens all the time. All the change control in the world doesn't help when you just don't type one little period. Even more helpfully, most tools won't notice and the zone will pass a configuration check because missing the trailing "." is syntactically correct.
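
      To see why a syntax check lets this through: in the standard master-file format, a name that does not end in a dot is relative, and the zone origin gets appended to it. A minimal sketch of that rule follows. The `qualify` helper and the example names are made up for illustration and are not taken from the actual .SE zone or its generation script.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name the way a master-file parser would.

    Names ending in "." are absolute and kept as-is, "@" stands for the
    origin, and anything else is relative and gets the origin appended.
    """
    if not origin.endswith("."):
        raise ValueError("origin must be absolute (end with a dot)")
    if name == "@":
        return origin
    if name.endswith("."):
        return name
    if origin == ".":
        return name + "."
    return f"{name}.{origin}"


# Hypothetical name server targets; both spellings are valid syntax.
print(qualify("ns1.example.se.", "se."))  # ns1.example.se.
print(qualify("ns1.example.se", "se."))   # ns1.example.se.se.
```

      Both inputs parse cleanly, which is why a pure syntax check has nothing to complain about; only the second one points somewhere that does not exist.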

      Let me add as well that the "change management" that you want is just fantastic ... no making changes during core hours. When you run a 24/7 business, non-core hours means something like 2am. At 2am, I, and most mammals, are not at our mental best, so missing a single dot isn't horribly hard.

      The only thing I'd suggest they do is use an offline test box for zones, then promote that change to prod. Then, you can load all the mistakes you want, do your digs, and if stuff works, THEN you move it to prod. I never ever make changes on production servers, they are done offline, tested, then put into prod with scripts. It makes it a lot harder for missing periods to make it into production.

      Finally, this is a good reason why negative caching should have low TTLs. If you run a DNS server that can't handle low neg-caching TTLs, it's time to upgrade from a 386.
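
      For reference, RFC 2308 says a resolver should cache a negative answer for the lesser of the SOA record's TTL and its MINIMUM field, so the zone operator controls how long a bad answer sticks. A rough sketch of such a cache, assuming a made-up `NegativeCache` class with an injectable clock (not how any particular resolver implements it):

```python
import time


class NegativeCache:
    """Remember "no such name" answers for a limited time."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}   # lower-cased name -> expiry timestamp
        self._clock = clock  # injectable so the behaviour can be tested

    def remember_failure(self, name, soa_ttl, soa_minimum):
        # RFC 2308: cache for the lesser of the SOA TTL and its MINIMUM field.
        ttl = min(soa_ttl, soa_minimum)
        self._expires[name.lower()] = self._clock() + ttl

    def is_known_bad(self, name):
        key = name.lower()
        expiry = self._expires.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expires[key]  # entry has aged out
            return False
        return True


# Hypothetical values: a one-hour SOA TTL but a five-minute MINIMUM.
fake_now = [0.0]
cache = NegativeCache(clock=lambda: fake_now[0])
cache.remember_failure("example.se.", soa_ttl=3600, soa_minimum=300)
fake_now[0] = 301.0
print(cache.is_known_bad("example.se."))  # False
```

      With a low MINIMUM the failure is forgotten within minutes of the fix, at the cost of more repeat queries for names that really do not exist.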

      Cheers.

      --
      We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
    4. Re:change control / management, anyone? by Chris+Mattern · · Score: 2, Insightful

      Even more helpfully, most tools won't notice and the zone will pass a configuration check because missing the trailing "." is syntactically correct.

      Not if the configuration check you wrote checks for the trailing "." anyways. And if it doesn't, you need to rewrite it.
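
      A check like that can be quite small. The sketch below is an assumption-heavy illustration: the `find_relative_targets` name is invented, it only handles one record per line (no parenthesised multi-line records), and it treats every relative target as suspect even though relative targets are legal and sometimes intentional.

```python
# Record types whose rdata ends in a host name (a deliberate simplification).
TARGET_TYPES = {"NS", "CNAME", "MX", "PTR"}


def find_relative_targets(zone_text: str) -> list[tuple[int, str]]:
    """Return (line number, target) pairs for targets lacking a trailing dot."""
    problems = []
    for lineno, raw in enumerate(zone_text.splitlines(), start=1):
        line = raw.split(";", 1)[0]  # drop comments
        fields = line.split()
        # The type can never be the last field, so skip it when searching.
        for field in fields[:-1]:
            if field.upper() in TARGET_TYPES:
                target = fields[-1]
                if not target.endswith("."):
                    problems.append((lineno, target))
                break
    return problems


# Hypothetical zone fragment for illustration.
sample = """\
$ORIGIN se.
example  IN NS ns1.example.se.   ; fine
broken   IN NS ns1.broken.se     ; missing dot
"""

for lineno, target in find_relative_targets(sample):
    print(f"line {lineno}: target '{target}' has no trailing dot")
```

      Wired in as a gate before the zone is published, a non-empty result would block the rollout until a human has looked at it.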

  2. Re:No big deal by scott_karana · · Score: 2, Insightful

    While the impact of this is no big deal, it's still kind of scary that the people running a decently-sized ccTLD would make such a novice mistake on their zonefile.

  3. Re:unless you are swedish by CharlyFoxtrot · · Score: 3, Insightful

    its "no big deal" until you need to know something off the internet right now, high stakes

    I need to know what a fourteen year old thinks about copyright law and I need to know it NOW !

    --
    If all else fails, immortality can always be assured by spectacular error.
  4. Re:unless you are swedish by Hyppy · · Score: 3, Insightful

    The Internet was started as, and always has been, a "best effort" network. If a packet gets through, great. If not, well, it's not the end of the world. People have tried to code more and more resilient protocols on top to be as robust as possible, but in the end it's a very fragile system that can go down quite easily.

    Anything sufficiently "high stakes" shouldn't rely on an unreliable medium.

  5. Re:DNS is the problem by upside · · Score: 3, Insightful

    Except the Pakistan affair was about the BGP routing protocol. I agree the file format is nutty, though.

    I can't think of a better alternative to the hierarchical system, perhaps you have a suggestion. A flat namespace would be an administrative impossibility, not to mention the stress it would put on name servers. Increasing the number of TLDs would lessen the impact of a single failure, though.

    --
    I'm sorry if I haven't offended anyone
  6. Re:DNS is the problem by RalphSleigh · · Score: 2, Insightful

    Pakistan taking out Youtube had absolutely nothing to do with DNS, they wrongly propagated a BGP announcement for the youtube IPs outside of Pakistan, so about 1/3 of the internet routed traffic into their black hole instead of to Youtube. Pretty effective blocking had they kept it internal, but they didn't.

    --
    Come as you are, do what you must, be who you will.
  7. Re:No big deal by eln · · Score: 5, Insightful

    The actual downtime is no big deal, but the reason it happened is. Evidently, the registry for an entire country's domain likes to roll out changes to the primary zone file without any sort of testing or syntax checking first. Simply having a small network (one or two computers) running a test root server, and running your scripts against that first, would have discovered the bug.

    DNS is very simple, but it's just as prone to human error as anything else. If you're responsible for the records of a large number of domains (like, say, an entire country), you probably ought to take some time to develop proper testing and change control procedures before you fiddle with it. It sounds like these guys didn't take it seriously enough and got burned. I hope they'll learn their lesson from this and change their procedures.

  8. Re:DNS is the problem by bwalling · · Score: 2, Insightful

    You do recognize that most of the protocols and specifications running the Internet are decades old, right? The fact that they've lasted this long is really rather impressive.

    Besides, if we redesigned it now, it would be insanely complex and bloated, not to mention never fully implemented (CSS? ha!), as there would be too many parties "contributing".

  9. Re:No big deal by MrMista_B · · Score: 2, Insightful

    You expect them to be absolutely perfect all the time no matter what, forever and ever? /That's/ unrealistic.

  10. Re:No big deal by Anonymous Coward · · Score: 1, Insightful

    I expect automated sanity checks before a modified zonefile goes live. Like, what would a domain name server receive when asking for a well known domain under that TLD? If that doesn't result in at least some records, warn the admin that the zonefile might not be correct.
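
    Something along these lines, as a sketch only. The `failed_canaries` function, the canary names, and the `staging_answers` table are all hypothetical; in practice `lookup` would query a staging name server loaded with the new zone file rather than a dictionary.

```python
from typing import Callable, Iterable


def failed_canaries(
    lookup: Callable[[str], list[str]],
    canaries: Iterable[str],
) -> list[str]:
    """Return the canary names for which the candidate zone yields no records."""
    return [name for name in canaries if not lookup(name)]


# Stand-in for answers from a staging server holding the candidate zone.
staging_answers = {
    "example.se.": ["ns1.example.se."],
    "another.se.": [],
}

bad = failed_canaries(
    lambda name: staging_answers.get(name, []),
    ["example.se.", "another.se."],
)
if bad:
    print("WARNING: no records for", ", ".join(bad))
```

    Checking only for "at least some records" is the weakest useful test; comparing the answers against the values served by the current production zone would catch more.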

  11. Re:Why MaraDNS uses a special zone file format by grumbel · · Score: 2, Insightful

    Can MaraDNS handle IPv6 now? Last time I used it I had to ditch it in the end as IPv6 support was lacking.

  12. Re:No big deal by mcgrew · · Score: 2, Insightful

    I wish browsers would store the IP address of the page as well as the domain name in bookmarks. That way if the DNS server goes down you could still get to the site. Of course, the primary lookup should still be the domain name, since a site can have its address changed; the browser would only look at the IP if the DNS lookup failed.
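
    The lookup order described here is easy to sketch. The `resolve_with_fallback` function and the saved address below are illustrative assumptions, not an existing browser feature.

```python
import socket


def resolve_with_fallback(host, saved_ip, resolver=socket.gethostbyname):
    """Prefer a fresh DNS answer; fall back to the address saved with the bookmark."""
    try:
        return resolver(host)
    except OSError:  # socket.gaierror is a subclass of OSError
        return saved_ip


def broken_resolver(host):
    # Simulates a failed lookup so the fallback path can be shown offline.
    raise socket.gaierror("simulated DNS failure")


# 192.0.2.10 is a documentation-only address used as a placeholder.
print(resolve_with_fallback("www.example.se", "192.0.2.10", resolver=broken_resolver))
```

    The client would still have to send the original host name with the request, since many sites share one IP address and are told apart by name.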