Slashdot Mirror


Entire .SE TLD Drops Off the Internet

Icemaann writes "Pingdom and Network World are reporting that the .SE TLD dropped off the internet yesterday due to a bug in the script that generates the .SE zone file. The .SE TLD has close to one million domains, all of which went down because of a missing trailing dot in the .SE zone file. Some caching nameservers may keep returning invalid DNS responses for up to 24 hours."
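Why one missing character is enough: in a BIND-style zone file, a name that ends in a dot is absolute, and any name without one is relative, so the zone's $ORIGIN is silently appended to it. The sketch below models that rule in Python; qualify is a hypothetical helper written for illustration (it ignores escapes and mid-file $ORIGIN changes), not part of any DNS library.

```python
def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name the way a master-file parser would.

    Simplified model of the RFC 1035 rule: a trailing "." marks an absolute
    name; anything else is relative to the origin. Escaped dots and other
    edge cases are deliberately ignored.
    """
    if not origin.endswith("."):
        raise ValueError("origin must be an absolute name ending in '.'")
    if name == "@":
        return origin  # "@" is shorthand for the origin itself
    if name.endswith("."):
        return name  # absolute: used exactly as written
    return f"{name}.{origin}"  # relative: origin appended, no warning given


origin = "se."
print(qualify("ns1.example.se.", origin))  # ns1.example.se.
print(qualify("ns1.example.se", origin))   # ns1.example.se.se.
```

The second name parses without complaint; it just points at a host that doesn't exist.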

24 of 207 comments

  1. There goes my favorite website! by Anonymous Coward · · Score: 3, Funny

    Goat.se

  2. Re:No big deal by wsanders · · Score: 3, Informative

    Yeah, been there done that. *My* fumble only brought 10,000 domains down for about 10 minutes, and no one noticed. (I think all the domains hosted only cat pictures anyway.)

    Sorry, that's as big a responsibility as any employer has ever deemed suitable for my incompetent ass.

    --
    Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
  3. Re:No big deal by eldavojohn · · Score: 5, Funny

    The downtime lasted 30 minutes, and most domains were probably cached by nameservers anyway.

    I once viddied an animated documentary about a small town in Colorado that lost the internet for 22 minutes. It was not pretty. Our hearts and minds go out to you, people of Sweden. I cannot even fathom what that would be like ... I hope the looting and rioting has died down with the restoration of the internet.

    --
    My work here is dung.
  4. change control / management, anyone? by SuperBanana · · Score: 5, Insightful

    I seriously hope someone is fired or loses a contract over this. Where was the validation, change control, etc? I would expect that at the TLD level, a change to a configuration file would have to be inspected by someone AND run through some syntax-checking scripts...

    As for the person who was modded up for saying "hey, no big deal, fixed in 30 minutes!", not quite. DNS servers (and individual computers!) cache negative results. Anything anyone did a query on during those 30 minutes will be negatively cached by their system and their local DNS server. Granted, a whole lot of local Swedish ISPs and network providers have probably flushed their DNS server caches, but it's still going to seriously impact traffic to many, many sites, especially for everyone outside Sweden.
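    A rough model of what SuperBanana describes, assuming the failed lookups came back as NXDOMAIN: RFC 2308 has a resolver cache a negative answer for the smaller of the SOA record's own TTL and its MINIMUM field. (If the failures surfaced as SERVFAIL instead, RFC 2308 caps caching of those at five minutes.) The NegativeCache class and the SOA values below are illustrative assumptions, not the actual .se settings or any real resolver's code.

```python
import time


class NegativeCache:
    """Toy negative cache for NXDOMAIN answers (illustrative sketch only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry: dict[str, float] = {}  # lowercased name -> expiry time

    def store_nxdomain(self, name: str, soa_ttl: int, soa_minimum: int) -> int:
        # RFC 2308: negative TTL = min(TTL of the SOA record, SOA MINIMUM field)
        ttl = min(soa_ttl, soa_minimum)
        self._expiry[name.lower()] = self._clock() + ttl
        return ttl

    def is_negatively_cached(self, name: str) -> bool:
        key = name.lower()
        expiry = self._expiry.get(key)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._expiry[key]  # entry has aged out
            return False
        return True


# Simulated clock so the example doesn't have to sleep.
now = [0.0]
cache = NegativeCache(clock=lambda: now[0])
cache.store_nxdomain("example.se.", soa_ttl=7200, soa_minimum=3600)

now[0] = 30 * 60  # registry fixes the zone after 30 minutes...
print(cache.is_negatively_cached("example.se."))  # True: still "nonexistent"
now[0] = 60 * 60
print(cache.is_negatively_cached("example.se."))  # False: entry expired
```

Lowering the SOA MINIMUM shortens that window, which is the argument for keeping negative-caching TTLs low.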

    1. Re:change control / management, anyone? by Anonymous Coward · · Score: 4, Funny

      Sweden porn?

      IKEA instruction manuals?

    2. Re:change control / management, anyone? by Mathness · · Score: 4, Funny

      I seriously hope someone is fired or loses a contract over this.

      You'll be happy to know that the person responsible has been found. The person in question was described as having unusually bushy eyebrows and speaking in a thick Swedish accent. His last comment about the incident, before being dragged away, was "bork bork bork".

      --
      Carbon based humanoid in training.
    3. Re:change control / management, anyone? by davebooth · · Score: 3, Insightful
      Right AND wrong in one post :)

      Excessive paperwork, like spending 30 minutes filling out a change request form to make a 30-second edit to a config file and SIGHUP a daemon, is stupid, and you'll hear no argument from me on that. Change control per se, however, is essential, particularly in a large enterprise. Running part of that kind of infrastructure without change control would be like trying to manage the kernel source tree without CVS (or SVN, or $REPOS_OF_CHOICE; the analogy holds either way).

      The problem is not change control; it's the way it is implemented. Change control methodology is designed by PHBs who haven't actually done the tech work in years, if they ever did. It's then scribbled all over by a "business analyst" who thinks a sigpipe is a plumbing problem, and by the time the guys actually doing the work get hold of it, it has become a nightmare of procedural BS, when all you really needed was a way to make sure everything you do to a live production system is documented, and that anything other than an emergency break-fix at least got basic testing and a second pair of eyes before rolling out.

      --
      I had a .sig once. It got boring.
    4. Re:change control / management, anyone? by RabidMonkey · · Score: 5, Insightful

      As a DNS admin myself, touching high value zones, let me tell you, missing a stupid dot happens all the time. All the change control in the world doesn't help when you just don't type one little period. Even more helpfully, most tools won't notice and the zone will pass a configuration check because missing the trailing "." is syntactically correct.
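      One cheap heuristic for the case RabidMonkey describes: a relative target name that already ends in the zone's own name (say ns2.example.se inside the se. zone) almost certainly lost its trailing dot, because it would expand to ns2.example.se.se. The suspicious_targets function below is a hypothetical lint written for illustration; it assumes records have already been parsed into (owner, type, target) tuples with any MX preference stripped off.

```python
# Record types whose RDATA is (or ends in) a domain name worth checking.
NAME_TARGET_TYPES = {"NS", "CNAME", "MX", "PTR"}


def suspicious_targets(records, origin):
    """Return records whose relative target repeats the zone name.

    records: iterable of (owner, rtype, target) tuples as written in the file.
    origin:  the zone's origin, e.g. "se.".
    """
    zone = origin.rstrip(".").lower()
    flagged = []
    for owner, rtype, target in records:
        if rtype.upper() not in NAME_TARGET_TYPES:
            continue
        if target.endswith("."):
            continue  # absolute name: taken literally, nothing to flag
        name = target.lower()
        if name == zone or name.endswith("." + zone):
            flagged.append((owner, rtype, target))
    return flagged


records = [
    ("example", "NS", "ns1.example.se."),  # fine: absolute
    ("example", "NS", "ns2.example.se"),   # becomes ns2.example.se.se.
    ("other", "NS", "ns.other"),           # deliberately relative: ns.other.se.
]
print(suspicious_targets(records, "se."))
# [('example', 'NS', 'ns2.example.se')]
```

It won't catch every missing dot (a relative name that doesn't repeat the zone name is perfectly plausible), so it complements rather than replaces the offline test-and-promote approach.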

      Let me add as well that the "change management" you want is just fantastic: no making changes during core hours. When you run a 24/7 business, non-core hours means something like 2am. At 2am I, like most mammals, am not at my mental best, so missing a single dot isn't horribly hard.

      The only thing I'd suggest they do is use an offline test box for zones, then promote that change to prod. Then, you can load all the mistakes you want, do your digs, and if stuff works, THEN you move it to prod. I never ever make changes on production servers, they are done offline, tested, then put into prod with scripts. It makes it a lot harder for missing periods to make it into production.

      Finally, this is a good reason why negative caching should have low TTLs. If you run a DNS server that can't handle low neg-caching TTLs, it's time to upgrade from a 386.

      Cheers.

      --
      We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
  5. So I guess it's... by 6Yankee · · Score: 5, Funny

    ...borked!

  6. Re:An oft overlooked single point of failure? by sexconker · · Score: 3, Interesting

    Uh, it would make no difference.
    DNS is hierarchical, and has teh caching.

    2 independent groups running DNS would strive to make sure they sync with each other quickly - thus all failures would sync quickly too.

    The difference between
      - the delay of a correct change propagating across the two firms running DNS
      - the delay of an incorrect change propagating within a single DNS

    would essentially be zero.

    No good things could come from what you propose unless it was specifically designed to have a 24 hour delay or something.

    Can't get to milkmaids.se ? Try milkmaids.se via DNS2 to get a 24-hour old version.

    This is something the current DNS system could support: explicitly calling for older versions.

    In fact, it might be worthwhile. Somebody write an RFC.

  7. Re:unless you are swedish by CharlyFoxtrot · · Score: 3, Insightful

    its "no big deal" until you need to know something off the internet right now, high stakes

    I need to know what a fourteen year old thinks about copyright law and I need to know it NOW !

    --
    If all else fails, immortality can always be assured by spectacular error.
  8. Re:unless you are swedish by Hyppy · · Score: 3, Insightful

    The Internet was started as, and always has been, a "best effort" network. If a packet gets through, great. If not, well, it's not the end of the world. People have tried to code more and more resilient protocols on top to be as robust as possible, but in the end it's a very fragile system that can go down quite easily.

    Anything sufficiently "high stakes" shouldn't rely on an unreliable medium.

  9. DNS is the problem by cthulhuology · · Score: 4, Interesting

    It still boggles my mind that anyone thought zone files are a good idea. The file format is so damn brittle that a single byte can spell disaster. On top of that, the hierarchical naming structure presents an inherent systemic risk for all sub-domains, as exhibited by this .se fiasco. Never mind the injection attacks, Pakistan taking out YouTube, and the rest; you have organizations like Verisign which profit immensely off of keeping the system broken. And don't even bother mentioning DNSSEC, as it still doesn't resolve this fundamental issue. The next systemic fuckup will simply be a signed fuckup.

    1. Re:DNS is the problem by upside · · Score: 3, Insightful

      Except the Pakistan affair was about the BGP routing protocol. I agree the file format is nutty, though.

      I can't think of a better alternative to the hierarchical system; perhaps you have a suggestion. A flat namespace would be an administrative impossibility, not to mention the stress it would put on name servers. Increasing the number of TLDs would lessen the impact of a single failure, though.

      --
      I'm sorry if I haven't offended anyone
    2. Re:DNS is the problem by Anonymous Coward · · Score: 5, Funny

      Regedit32.exe

    3. Re:DNS is the problem by photon317 · · Score: 5, Informative

      Part of the problem with DNS these days, which your post exemplifies, is that from very early on "BIND's implementation of DNS" and "DNS the protocol" have been mashed together and confused by the RFC authors (who were involved with the BIND implementation and had motive to encourage the world to think only in BIND terms) and by basically everyone who ever used DNS in any capacity. Zonefiles are not implicit in DNS address resolution (neither for authoritative servers nor for recursive caches). They really aren't any part of the wire DNS protocol for resolving names. They *are* part of a wire protocol for secondary servers that slave zonefiles from primary servers, but even in that case it's really more a "BIND convention" than a necessity. Ultimately, how you transfer a zone's records from a master server to a slave server is up to however those two servers and their administrators agree to do so. You can skip the AXFR protocol that uses zonefiles and instead do something else that works for both of you. Inventing a new method of slaving zone data is easy and doesn't involve a complicated rollout. Some people today, for instance, just rsync zonefiles instead of using AXFR.
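      To illustrate photon317's point that the transfer method is just an agreement between master and slave, here is a sketch of a made-up JSON snapshot exchange used in place of AXFR. The JSON layout and the export_zone/import_zone names are hypothetical; the only standard piece is the serial comparison, which follows RFC 1982 serial-number arithmetic for 32-bit SOA serials.

```python
import json

HALF = 2 ** 31  # half of the 32-bit SOA serial space (RFC 1982)


def serial_newer(candidate: int, current: int) -> bool:
    """True if `candidate` is newer than `current` under RFC 1982 wraparound."""
    return (current < candidate and candidate - current < HALF) or (
        current > candidate and current - candidate > HALF
    )


def export_zone(serial: int, records: set) -> str:
    """Master side: serialize the zone as JSON (format invented for this sketch)."""
    return json.dumps({"serial": serial, "records": sorted(records)})


def import_zone(payload: str, current_serial: int, current_records: set):
    """Slave side: accept the snapshot only if its serial is newer.

    Returns (serial, records, added, removed).
    """
    data = json.loads(payload)
    if not serial_newer(data["serial"], current_serial):
        return current_serial, current_records, set(), set()
    # JSON turns tuples into lists; convert back so records are hashable.
    new_records = {tuple(r) for r in data["records"]}
    return (
        data["serial"],
        new_records,
        new_records - current_records,
        current_records - new_records,
    )


# Records are (owner, ttl, type, rdata) tuples.
master = {("example.se.", 3600, "NS", "ns1.example.se.")}
payload = export_zone(101, master)
serial, records, added, removed = import_zone(payload, 100, set())
print(serial, added, removed)
```

Something like rsync over SSH would do the same job; what matters is that both sides agree on the format and on how to tell which copy is newer.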

      It's really frustrating (believe me, I've done it) when you try to implement a new DNS server daemon from scratch from the RFCs, and you have to wade through this mess of "what's a BIND convention that doesn't matter and what's important to the actual DNS protocol for resolving names on the wire".

      --
      11*43+456^2
  10. Re:No big deal by CorporateSuit · · Score: 4, Funny

    The downtime lasted 30 minutes, and most domains were probably cached by nameservers anyway.

    I didn't notice the DNS freak out, but I did notice the internet's smug meter had dropped about 30%.

    --
    I am the richest astronaut ever to win the superbowl.
  11. Re:No big deal by eln · · Score: 5, Insightful

    The actual downtime is no big deal, but the reason it happened is. Evidently, the registrar for an entire country's domain likes to roll out changes to the primary zone file without any sort of testing or syntax checking first. Simply having a small network (one or two computers) running a test root server, and running your scripts against that first, would have discovered the bug.
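    A test resolver is the thorough option; a cheaper, complementary gate is to diff each generated zone against the last one that was published and refuse to push if an implausible share of delegations changed at once. The sketch below assumes the generator can produce a mapping from each delegated name to its set of fully qualified NS targets; the function name and the 1% threshold are assumptions for illustration.

```python
def safe_to_publish(old: dict, new: dict, max_changed_fraction: float = 0.01) -> bool:
    """Compare delegations (name -> set of NS targets) between two zone builds.

    Returns False when more than `max_changed_fraction` of the delegated
    names were added, removed, or had their NS set changed.
    """
    if not old:
        return False  # no baseline to compare against: require a human sign-off
    names = old.keys() | new.keys()
    changed = [n for n in names if old.get(n) != new.get(n)]
    return len(changed) / len(old) <= max_changed_fraction


old = {f"d{i}.se.": {"ns1.host.se."} for i in range(1000)}

# A generator bug that drops the trailing dot rewrites every delegation.
broken = {name: {"ns1.host.se.se."} for name in old}
print(safe_to_publish(old, broken))  # False

# A normal day: a handful of customers switch name servers.
normal = dict(old)
for i in range(5):
    normal[f"d{i}.se."] = {"ns.newhost.se."}
print(safe_to_publish(old, normal))  # True
```

A legitimate mass change (renumbering a big hosting provider, say) would trip it too; that's intended, since such a change deserves a second pair of eyes anyway.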

    DNS is very simple, but it's just as prone to human error as anything else. If you're responsible for the records of a large number of domains (like, say, an entire country), you probably ought to take some time to develop proper testing and change control procedures before you fiddle with it. It sounds like these guys didn't take it seriously enough and got burned. I hope they'll learn their lesson from this and change their procedures.

  12. Re:No big deal by JustOK · · Score: 3, Funny

    Are you my motherboard?

    --
    rewriting history since 2109
  13. There's møre to Sweden than .se by 93+Escort+Wagon · · Score: 5, Funny

    Wi nøt trei a høliday in Sweden this yer?

    See the løveli lakes

    The wonderful telephøne system

    And mani interesting furry animals

    --
    #DeleteChrome
    1. Re:There's møre to Sweden than .se by rainmaestro · · Score: 4, Funny

      We apologise for the fault in the previous post. Those responsible have been sacked.

  14. Re:TPB by KevinKnSC · · Score: 3, Funny

    It looks like someone messed up the summary. I'm pretty sure it should be:

    Peengdum und Netvurk Vurld ere-a repurteeng thet zee SE tld drupped ooffff zee internet yesterdey dooe-a tu a boog in zee screept thet generetes zee SE zune-a feele-a. Zee SE tld hes cluse-a tu oone-a meelliun dumeeens thet ell vent doon dooe-a tu meessing zee treeeling dut in zee SE zune-a feele-a. Sume-a cecheeng nemeserfers mey steell be-a retoorneeng infeleed DNS respunses fur 24 huoors.

  15. Re:No big deal by CorporateSuit · · Score: 5, Funny

    DNS is very simple, but it's just as prone to human error as anything else.

    Are you kidding? I've been programming DNS for a long time, and if theirs one thing I learned, its that programmers like me don't make errors.

    --
    I am the richest astronaut ever to win the superbowl.
  16. Re:No big deal by pyrrhonist · · Score: 4, Funny

    but I did notice the internet's smug meter had dropped about 30%.

    Norwegian detected.

    --
    Show me on the doll where his noodly appendage touched you.