Why Browsers Blamed DNS For Facebook Outage
Julie188 writes "That was probably the only time 'DNS' will ever be a trending term on Twitter. The cause was Facebook's 2.5-hour outage on Thursday, during which users trying to access the site were incorrectly told that a DNS error was to blame. In truth, experts who've read Facebook's explanation say the site went down because Facebook gave itself a distributed denial-of-service attack when a system admin misconfigured a database. So why was DNS blamed? The 27-year-old communications protocol has been known to cause other, somewhat similar outages."
The 27-year-old communications protocol
So? TCP/IP is 36 years old.
http://rs79.vrx.net/works/photoblog/2010/Sep/23/
Notice the page, being served from facebook.com, saying "bad DNS". Think about that for a second.
The confusion might have come from the fact that when I looked, there seemed to also be some DNS problem.
Basically, when I queried them directly, the servers that are authoritative for the zone were giving me a CNAME for the 'ANY' query but not the associated A records, which they should have included, since the CNAME pointed to a host name within the same authority. At that point, any sensible resolver stops asking!
This only lasted a little while, though, so it might have been a glitch or possibly a deliberate action related to how they were trying to fix the underlying issue, perhaps turning traffic away until they had solved the actual problem.
--Ivan
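For anyone who wants to see the check Ivan describes spelled out, here is a minimal sketch in Python. It does not query any real server: the `Record` type, the `unresolved_cname_targets` helper, and the example.com names and 192.0.2.10 address are all made up for illustration. It only models "the answer carries a CNAME but no address record for its target".

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    """A simplified DNS answer record (hypothetical type, for illustration only)."""
    name: str   # owner name, e.g. "www.example.com."
    rtype: str  # record type, e.g. "CNAME" or "A"
    value: str  # CNAME target or IP address


def norm(name: str) -> str:
    """Compare names case-insensitively and without the trailing dot."""
    return name.lower().rstrip(".")


def unresolved_cname_targets(answer: List[Record]) -> List[str]:
    """Return CNAME targets that the same answer gives no address for."""
    # Names that have an address record in this answer.
    addressed = {norm(r.name) for r in answer if r.rtype in ("A", "AAAA")}
    # Names that are themselves aliases (the chain continues in this answer).
    aliased = {norm(r.name) for r in answer if r.rtype == "CNAME"}
    return [
        norm(r.value)
        for r in answer
        if r.rtype == "CNAME"
        and norm(r.value) not in addressed
        and norm(r.value) not in aliased
    ]


# Made-up answers: one with only the CNAME, one that also carries the A record.
cname_only = [Record("www.example.com.", "CNAME", "edge.example.com.")]
complete = cname_only + [Record("edge.example.com.", "A", "192.0.2.10")]

print(unresolved_cname_targets(cname_only))
print(unresolved_cname_targets(complete))
```

A non-empty result for the first answer corresponds to the situation described above: an alias with nothing to follow it to.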
DNS didn't fail; they turned it off. This was the easiest way to "shut off the entire site," as their post-mortem describes. The DNS errors users saw were being generated by the front-end HTTP proxies, not by client browsers, which caused most of this confusion. Once the database issue cleared, they reactivated the DNS entries for the back-end servers one cluster at a time and the site came back.
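A rough sketch of how a front-end proxy could end up being the one reporting a DNS error, assuming the proxy looks up its back-end by host name on each request. This is not Facebook's code: the function names, the 502 status, and the error text are assumptions, and the resolver is passed in as a parameter so the behaviour can be tried without touching a real network.

```python
import socket
from typing import Callable, Optional, Tuple


def resolve_backend(hostname: str,
                    resolver: Callable[[str], str] = socket.gethostbyname
                    ) -> Optional[str]:
    """Return the back-end address, or None if the name does not resolve."""
    try:
        return resolver(hostname)
    except socket.gaierror:
        # The back-end's DNS entry is missing (for example, deliberately removed).
        return None


def proxy_response(hostname: str,
                   resolver: Callable[[str], str] = socket.gethostbyname
                   ) -> Tuple[int, str]:
    """Build the (status, body) a hypothetical front-end proxy would send."""
    address = resolve_backend(hostname, resolver)
    if address is None:
        # The client reached the proxy fine; the DNS failure is on the proxy's side.
        return 502, "DNS lookup failed for backend " + hostname
    return 200, "forwarding to " + address


def entry_removed(hostname: str) -> str:
    """Stand-in resolver that behaves as if the DNS entry were switched off."""
    raise socket.gaierror("name not known: " + hostname)


def entry_restored(hostname: str) -> str:
    """Stand-in resolver that behaves as if the DNS entry were back."""
    return "192.0.2.1"  # documentation-range address, not a real server


print(proxy_response("backend.example.internal", entry_removed))
print(proxy_response("backend.example.internal", entry_restored))
```

In the first case the browser has resolved and reached the site, yet the page it gets back talks about DNS, which matches the screenshot linked earlier in the thread.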