Slashdot Mirror


Why Browsers Blamed DNS For Facebook Outage

Julie188 writes "That was probably the only time 'DNS' will ever be a trending term on Twitter. The cause was Facebook's 2.5-hour outage on Thursday, during which users trying to access the site were incorrectly told that a DNS error was to blame. In truth, experts who've read Facebook's explanation say the site went down because Facebook effectively launched a distributed denial-of-service attack on itself when a system admin misconfigured a database. So why was DNS blamed? The 27-year-old communications protocol has been known to cause other, somewhat similar outages."

5 of 96 comments

  1. Ageism by Vahokif · · Score: 5, Informative

    The 27-year-old communications protocol

    So? TCP/IP is 36 years old.

    1. Re:Ageism by morgan_greywolf · · Score: 2, Informative

      What does IPv4 have to do with DNS? (hint: nothing. Modern DNS servers support IPv6)

  2. Re:DNS? by rs79 · · Score: 2, Informative

    http://rs79.vrx.net/works/photoblog/2010/Sep/23/

    Notice the page, being served from facebook.com, saying "bad DNS". Think about that
    for a second.

    --
    Need Mercedes parts ?
  3. There WERE some DNS issues too! by ivan_w · · Score: 3, Informative

    The confusion might have come from the fact that when I looked, there seemed to also be some DNS problem.

    Basically, when queried directly, the servers that are authoritative for the zone were answering the 'ANY' query with a CNAME but not the associated A records, which they should have included, since the CNAME pointed to a host name within the same authority. At that point, any sensible resolver stops asking!

    This only lasted for a little while, though, so it might have been a glitch, or possibly a deliberate action related to how they were trying to fix the underlying issue, perhaps diverting traffic until they had solved the actual problem.

    --Ivan
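
The CNAME-without-A-records symptom described above can be modelled offline. This is only a sketch under stated assumptions: the `Record` type, the `addresses_from_answer` helper, the `example.com` host names and the `192.0.2.10` address are all made up for illustration, and nothing here reproduces Facebook's actual zone data. It shows a client that relies on a single answer section and gets no address when the A record for the CNAME target is missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One resource record from an answer section (hypothetical, simplified)."""
    name: str
    rtype: str  # "A" or "CNAME" in this sketch
    value: str


def _norm(name: str) -> str:
    # DNS names compare case-insensitively; ignore a trailing root dot.
    return name.lower().rstrip(".")


def addresses_from_answer(qname, answer, max_hops=8):
    """Follow CNAMEs using only the records present in one answer.

    Returns the list of addresses found, or an empty list when the
    chain dead-ends because no A record for the target was included.
    """
    name = _norm(qname)
    for _ in range(max_hops):  # bound the walk so a CNAME loop cannot hang
        addrs = [r.value for r in answer
                 if r.rtype == "A" and _norm(r.name) == name]
        if addrs:
            return addrs
        targets = [r.value for r in answer
                   if r.rtype == "CNAME" and _norm(r.name) == name]
        if not targets:
            return []
        name = _norm(targets[0])
    return []


# A complete answer: the CNAME plus the in-zone A record for its target.
COMPLETE = [
    Record("www.example.com", "CNAME", "web.example.com"),
    Record("web.example.com", "A", "192.0.2.10"),
]

# The symptom described above: the CNAME arrives, the A record does not.
CNAME_ONLY = [
    Record("www.example.com", "CNAME", "web.example.com"),
]

if __name__ == "__main__":
    print(addresses_from_answer("www.example.com", COMPLETE))
    print(addresses_from_answer("www.example.com", CNAME_ONLY))
```

Real recursive resolvers often send a follow-up query for the CNAME target, so this models only the case where the client gives up after the first answer.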

  4. Re:Did Facebook have an internal DNS failure? by rekoil · · Score: 3, Informative

    It didn't fail, they turned it off. This was the easiest way to "shut off the entire site" as their post-mortem describes. The DNS errors users saw were being generated by the front-end HTTP proxies, not by client browsers, which caused most of this confusion. Once the database issue cleared, they reactivated the DNS entries for the back-end servers one cluster at a time and the site came back.
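
A toy model of the behaviour described above, purely as a sketch: the `BackendDirectory` class, the `handle_request` function, the cluster names, the addresses and the 503 status code are all assumptions made for illustration, not details from Facebook's post-mortem. It shows a front-end proxy reporting a DNS-style error while back-end entries are switched off, then serving again as clusters are re-enabled one at a time.

```python
class BackendDirectory:
    """Stand-in for the internal DNS entries of back-end clusters (hypothetical)."""

    def __init__(self, clusters):
        self._clusters = dict(clusters)  # cluster name -> address
        self._enabled = set()            # entries currently published

    def enable(self, cluster):
        if cluster not in self._clusters:
            raise KeyError(cluster)
        self._enabled.add(cluster)

    def disable_all(self):
        # "Shut off the entire site" by withdrawing every entry.
        self._enabled.clear()

    def lookup(self, cluster):
        # Returns None when the entry is withdrawn, like a failed lookup.
        if cluster in self._enabled:
            return self._clusters[cluster]
        return None


def handle_request(directory, cluster):
    """Front-end proxy: resolve the back end, or report the failed lookup."""
    address = directory.lookup(cluster)
    if address is None:
        # The error page comes from the proxy, not from the user's browser.
        # The 503 status is an assumption of this sketch.
        return 503, "DNS lookup failed for backend " + cluster
    return 200, "proxied to " + address


if __name__ == "__main__":
    directory = BackendDirectory({"cluster-a": "10.0.0.1", "cluster-b": "10.0.0.2"})
    directory.disable_all()
    print(handle_request(directory, "cluster-a"))
    # Bring clusters back one at a time.
    directory.enable("cluster-a")
    print(handle_request(directory, "cluster-a"))
    print(handle_request(directory, "cluster-b"))
```

The point of the sketch is that the user's own resolver succeeds in reaching the front end; the "DNS" in the error message refers to a lookup the proxy performed on the user's behalf.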