Slashdot Mirror


Why Browsers Blamed DNS For Facebook Outage

Julie188 writes "That was probably the only time 'DNS' will ever be a trending term on Twitter. The cause was Facebook's 2.5-hour outage on Thursday, which incorrectly told users trying to access the site that a DNS error was to blame. In truth, experts who've read Facebook's explanation say the site went down because Facebook gave itself a distributed denial-of-service attack when a system admin misconfigured a database. So why was DNS blamed? The 27-year-old communications protocol has been known to cause other, somewhat similar outages."

20 of 96 comments

  1. Message saying DNS error by Anonymous Coward · · Score: 2, Interesting

    It wasn't your browser having a DNS error, it was the user facing servers at Facebook reporting DNS problems talking to whoever they talk to. Maybe when they decided the way to fix the problem was to take down the site, they just removed the back end server cluster from their internal DNS.

  2. Duh by vlm · · Score: 5, Insightful

    So why was DNS blamed?

    From http://www.facebook.com/note.php?note_id=431441338919&id=9445547199&ref=mf&_fb_noscript=1

    The way to stop the feedback cycle was quite painful - we had to stop all traffic to this database cluster, which meant turning off the site.

    I'm, uh, taking a wild guess that simply shutting off port 80 is not going to allow for a controllable ramp up... they could redirect to another site, Orkut or myspace would have been mildly humorous. I am mildly surprised they don't have a simple emergency box with a simple static "undergoing repair" page, but, whatever ...
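The "emergency box" idea above can be sketched in a few lines. This is only an illustration of a static outage page, not anything Facebook is known to run: the page text, the 600-second `Retry-After` value, and the names `maintenance_response`, `MaintenanceHandler` and `serve` are all assumptions made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page content and retry hint; neither comes from Facebook.
MAINTENANCE_BODY = b"<html><body><h1>Undergoing repair</h1></body></html>"
RETRY_AFTER_SECONDS = 600


def maintenance_response():
    """Return (status, headers, body) for the static outage page."""
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Length": str(len(MAINTENANCE_BODY)),
        # Retry-After asks well-behaved clients to back off before retrying.
        "Retry-After": str(RETRY_AFTER_SECONDS),
    }
    # 503 Service Unavailable marks the outage as temporary.
    return 503, headers, MAINTENANCE_BODY


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answer every GET with the same static page."""

    def do_GET(self):
        status, headers, body = maintenance_response()
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8080):
    """Run the emergency server until interrupted (assumed host and port)."""
    HTTPServer((host, port), MaintenanceHandler).serve_forever()
```

Calling `serve()` starts the page on the assumed address. A box like this does no database work at all, which is the point: it stays up while the real site is ramped back up behind it.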

    So, other than zapping the A records and waiting, what are they supposed to do? Bonus points if they were doing DNS based load balancing and simply unplugged their (dns based) load balancer.

    I have no dog in the fight, having deleted my facebook account months ago. It is kind of funny that a page of technobabble is described as "technical details" as if folks like us/me would find it to be a complete description rather than pretty vague. Then again we're dealing with farmville addicts and you can't reason with addicts.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    1. Re:Duh by PaganRitual · · Score: 2, Funny

      This whole situation does explain why my mother appeared to be sick on the couch at my parents' place on Thursday afternoon when I paid them a visit. With all the shaking and huddling under the covers and looking pale-faced I presumed she had come down with the flu or something.

      Then again we're dealing with farmville addicts and you can't reason with addicts.

      They aren't addicts, that's patently unfair. They can stop any time they want. What is most admirable about them is that they are simply so time-savvy that they coincide those times at which they wish to stop with the periods during which their crops have to be left to grow. Once the crops are ready for harvest, they desire to play again. It's really very simple and implies no addiction whatsoever.

      Seriously though, 2.5 hours? My experience with Farmville gives me a vague recollection that there are a fair few crops with a growth period of an hour or less, and the fact that crops wither and become unusable in the same time they take to grow makes me wonder how many people petitioned Zynga for free ... well, the game is free so technically (and literally) nothing of value was lost, but still, I'm sure they were crying about something.

      Now shut-up, it's nearly 4:01 server time and my rogue still needs the Brewfest boss' dagger to drop for it. 5 times and all I've seen is the mace which I can buy for fuck all anyway. My warlock has had two daggers already; maybe it's payback for the Midsummer event when my rogue got the staff twice and my warlock never saw it. THIS IS SUCH BULLSHIT.

  3. Ageism by Vahokif · · Score: 5, Informative

    The 27-year-old communications protocol

    So? TCP/IP is 36 years old.

    1. Re:Ageism by morgan_greywolf · · Score: 4, Insightful

      Really? DNS is broken? So typing say, http://slashdot.org/ doesn't work for you?

      No. DNS has a few security issues, but they're mostly minor. The fact that DNS works for millions of people every day without issue at least 99% of the time proves that DNS is a successful design, even if it could use some security updating.

    2. Re:Ageism by kasperd · · Score: 2, Interesting

      I think that comment was referring to the fact that some recent announcement said there are now 5 billion devices on the internet, and IPv4 supports only up to 3.7 billion devices.
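The 3.7 billion figure can be roughly reproduced by subtracting the large reserved blocks from the 2^32 possible IPv4 addresses. The sketch below assumes an illustrative, incomplete list of reserved ranges (private, loopback, link-local, multicast and the old "class E" space); the list and the name `usable_ipv4_addresses` are choices made for this example, not an official count.

```python
import ipaddress

# Assumed, non-exhaustive list of blocks that are not publicly routable.
RESERVED_BLOCKS = [
    "0.0.0.0/8",       # "this network"
    "10.0.0.0/8",      # private
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "172.16.0.0/12",   # private
    "192.168.0.0/16",  # private
    "224.0.0.0/4",     # multicast
    "240.0.0.0/4",     # reserved for future use
]


def usable_ipv4_addresses(reserved=RESERVED_BLOCKS):
    """Total IPv4 space minus the (non-overlapping) reserved blocks."""
    total = 2 ** 32
    reserved_count = sum(
        ipaddress.ip_network(block).num_addresses for block in reserved
    )
    return total - reserved_count


print(f"{usable_ipv4_addresses():,}")  # 3,706,585,088
```

Under these assumptions the result is about 3.7 billion, which matches the number quoted above. Adding smaller reserved ranges would lower it slightly.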

      --

      Do you care about the security of your wireless mouse?
    3. Re:Ageism by kasperd · · Score: 5, Insightful

      Some people think technology should be replaced just because it is old. But really, it should be replaced if it doesn't suit our needs and there is a different technology that does suit them.

      It is better to replace a 1-year-old technology that does not suit our needs than to replace a 50-year-old one that does. Usually when replacing, you want to replace with something newer. But in some cases it may turn out to be better to replace a new and misdesigned technology with an older and proven one.

      That said, there are improvements to both IP and DNS which should be rolled out because they fix real problems. The rollouts are not happening as fast as they ought to, mainly because it is problematic to roll out a change to the entire Internet, especially when not everybody involved is cooperating.

      But I don't think that really has anything to do with this outage.

      --

      Do you care about the security of your wireless mouse?
    4. Re:Ageism by oldspewey · · Score: 2, Funny

      So? TCP/IP is 36 years old.

      Yeah, but it still lives in its parents' basement.

      --
      If libertarians are so opposed to effective government, why don't they all move to Somalia?
    5. Re:Ageism by morgan_greywolf · · Score: 2, Informative

      What does IPv4 have to do with DNS? (hint: nothing. Modern DNS servers support IPv6)

    6. Re:Ageism by dlgeek · · Score: 2, Insightful

      And is definitely showing its age. There's been a big cry for years from those working at the really high end of networking that we need to replace (really just extend) TCP because it doesn't work well with high bandwidth-delay-product links. This is because the max window size and ramp-up algorithm (slow start) don't allow you to saturate the pipe quickly enough or even at all. There are several proposed extensions floating around to fix the problem but none of them have widespread adoption.
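To put numbers on the bandwidth-delay-product point, here is a small worked example. It assumes a 1 Gbit/s link with a 100 ms round-trip time and the classic 16-bit TCP window of 65,535 bytes with no window scaling; the link figures and function names are made up for illustration.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes, rtt_s):
    """Upper bound on throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


# Assumed example link: 1 Gbit/s with a 100 ms round-trip time.
LINK_BPS = 1_000_000_000
RTT_S = 0.1
UNSCALED_WINDOW = 65_535  # largest value of the 16-bit window field

print(bdp_bytes(LINK_BPS, RTT_S))                  # about 12.5 million bytes
print(max_throughput_bps(UNSCALED_WINDOW, RTT_S))  # about 5.24 million bit/s
```

Under these assumptions the pipe holds roughly 12.5 MB, while an unscaled window lets a single connection move only about 5 Mbit/s, around half a percent of the link. The window scale option raises that ceiling, but slow start still needs many round trips to grow the window that far.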

      This actually is the case with a lot of our old networking protocols - yes, they were incredibly well designed at the time, but many are showing that they need to be upgraded to reflect modern technology. Back to our original case, the original DNS protocol does have a lot of problems that have surfaced lately (think about the sequence number prediction stuff from a couple years back) which inspired the roll-out of DNSSEC. IPv4 is hitting its limits, but we're having trouble rolling out IPv6. How much easier would fighting spam be if SMTP had a strong authentication system for sent messages? Even HTTP, which has undergone several revisions, is again showing limitations, hence Google rolling out SPDY which allows predictive pushes, stream parallelism, etc.

      I don't think anyone seeks to criticize the designers of these protocols, and the protocols have excelled and scaled far, far beyond anyone's wildest expectations. That being said, they have been showing cracks lately as technology has grown, and nothing looks like it did back when they were written. However, we have hit a point where the difficulty in upgrading or replacing them is actually starting to hold us back.

  4. Re:DNS? by Mitchell314 · · Score: 3, Funny

    Then stop buying Dells. :P

    --
    I read TFA and all I got was this lousy cookie
  5. Not mission critical! by j_col · · Score: 2, Insightful

    I found the genuine panic from many Facebook users to this outage very amusing.

  6. Re:DNS? by rs79 · · Score: 2, Informative

    http://rs79.vrx.net/works/photoblog/2010/Sep/23/

    Notice the page, being served from facebook.com, saying "bad DNS". Think about that for a second.

    --
    Need Mercedes parts ?
  7. There WERE some DNS issues too! by ivan_w · · Score: 3, Informative

    The confusion might have come from the fact that when I looked, there seemed to also be some DNS problem.

    Basically, when asking directly, the servers that are authoritative for the zone were giving me a CNAME for the 'ANY' query, but not the associated A records, which they should, since the CNAME was pointing to a host name within the same authority. At this point, any sensible resolver stops asking!
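For anyone wanting to repeat that kind of check by hand, the sketch below builds the raw bytes of an 'ANY' query in the standard DNS wire format (12-byte header, then the question). The query ID, the placeholder name `www.example.com`, and the function names are assumptions for illustration; sending the packet and parsing the answer are left out.

```python
import struct

QTYPE_ANY = 255  # the "ANY" pseudo record type
QCLASS_IN = 1    # Internet class


def encode_name(name):
    """Encode a host name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label length in {name!r}")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name, qtype=QTYPE_ANY, query_id=0x1234, recursion_desired=False):
    """Build a single-question DNS query message."""
    # Recursion is off by default, as when asking an authoritative server.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, then question/answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)
    return header + question


packet = build_query("www.example.com")  # placeholder name
print(len(packet))  # 33
```

The resulting bytes could be sent over UDP to port 53 of one of the zone's authoritative servers. A reply carrying only a CNAME and no A records for an in-zone target would match what is described above.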

    This only lasted for a little while though - so it might have been a glitch or possibly a deliberate action related to how they were trying to fix the underlying issue - possibly holding off traffic until they had actually solved the problem.

    --Ivan

  8. Re:DNS? by kasperd · · Score: 2, Interesting

    Notice the page, being served from facebook.com, saying "bad DNS".

    I can understand why that may cause people to think the problem is with DNS. The error message looks like it came from an HTTP proxy. That would suggest that either the user had a proxy configured or Facebook was using a reverse proxy. If it was the latter, the DNS "problem" would be inside their network.

    --

    Do you care about the security of your wireless mouse?
  9. Re:So what? Big Whoop! by Sir_Lewk · · Score: 2, Funny

    No. Facebook doesn't do data-mining, and they don't serve ads. They simply pull money out of their ass.

    --
    "linux is just DOS with a UNIX like syntax" -- Galactic Dominator (944134)
  10. Re:Did Facebook have an internal DNS failure? by rekoil · · Score: 3, Informative

    It didn't fail, they turned it off. This was the easiest way to "shut off the entire site" as their post-mortem describes. The DNS errors users saw were being generated by the front-end HTTP proxies, not by client browsers, which caused most of this confusion. Once the database issue cleared, they reactivated the DNS entries for the back-end servers one cluster at a time and the site came back.
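A rough sketch of how a front-end proxy ends up showing a DNS error to the user: it tries to resolve the back-end name, fails, and turns that failure into an error page. The 502 status, the message text, and the names `proxy_status` and `backend.internal.example` are assumptions for illustration; Facebook's actual proxy behaviour is not documented in this thread.

```python
import socket


def proxy_status(backend_host, resolve=socket.getaddrinfo):
    """Return (status, message) a front-end proxy might send to the client.

    `resolve` is injectable so the lookup can be faked in tests.
    """
    try:
        resolve(backend_host, 80)
    except socket.gaierror as exc:
        # The browser reached the proxy fine; only the proxy's own
        # lookup of the back end failed.
        return 502, f"DNS error: cannot resolve backend {backend_host!r} ({exc})"
    return 200, f"backend {backend_host!r} resolved; request would be forwarded"


def removed_from_dns(host, port):
    """Fake resolver simulating a back-end name pulled from internal DNS."""
    raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")


status, message = proxy_status("backend.internal.example", resolve=removed_from_dns)
print(status, message)
```

With the back-end entries removed, every request gets the error page even though public DNS for the site itself is working. That fits the page being served from facebook.com while complaining about DNS.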

  11. Re:So what? Big Whoop! by kiwimate · · Score: 2, Insightful

    So is Slashdot.

    I don't know that finger-pointing is necessarily healthy - that tends to suggest CYA and childish blame games. But on a technical, IT-focused website, one might suppose that a lessons-learned exercise on the root cause of the failure of a massive website would be of interest and hopefully even an educational experience.

  12. You must have interesting firewall logs... by RulerOf · · Score: 2, Funny

    look at my own /etc/hosts file. From time to time I manage to bite myself on the ass with my block-list

    #Below is my custom DNS blocklist
    127.0.0.1 om.nom.nom.

    user@localhost:~$ ping om.nom.nom.

    --
    Boot Windows, Linux, and ESX over the network for free.
  13. Re:I disagree by Kvasio · · Score: 2, Interesting

    Yet, you failed to notice that /. is a site for nerds.
    Many nerds do not strive to cultivate their social skills.
    Checking their friends' status on a social network might not be at the top of their agendas.
    So: the event was notable, but not very important to many slashdotters.