Wikipedia Explains Today's Global Outage
gnujoshua writes "The Wikimedia Tech Blog has a post explaining why many users were unable to reach Wikimedia sites due to DNS resolution failure. The article states, 'Due to an overheating problem in our European data center many of our servers turned off to protect themselves. As this impacted all Wikipedia and other projects access from European users, we were forced to move all user traffic to our Florida cluster, for which we have a standard quick failover procedure in place, that changes our DNS entries. However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally. This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.'"
How 'bout proofreading titles
DNA resolution failure
Because of this outage, I actually had to work this morning.
Human or otherwise?
I could see why the failover didn't work... They should try resolving names instead of nucleic acids. :\
It's the T-virus, run!
The blog link explaining the whole thing of course doesn't work for us Europeans, either.
However, shortly after we did this failover switch, it turned out that this failover mechanism was now broken, causing the DNS resolution of Wikimedia sites to stop working globally.
Good thing Wikimedia pays their System Administrators well enough to test their backup systems.
Some government pencil pusher mixed up wikileaks with wikipedia... after all the "strange tweets" from @wikileaks it sounded feasible ;)
...as proof of global warming?
Hate it when my DNA doesn't resolve.
Sorry, I know it's just a typo, but it's funny to me.
Think Deeply.
With both stormy and sunny days.
I don't know which is more awesome - that this article came up just as I was wondering what happened to Wikipedia, or that the post links to an article which I CAN'T READ BECAUSE WIKIPEDIA IS DOWN.
When did we start using DNA to resolve domain names? I mean we can fit a butt-load of information in a DNA strand but I think the overhead would be too high for DNS resolutions. (Should be DNS)
>>due to DNA resolution failure. ...Also known as mutation...or X-men
Can't we simply hack that by modifying gens.conf?
I noticed wikipedia wasn't resolving this morning.
Flushing my "DNA" cache fixed it ;-))
rndc flush
Everything I write is lies, read between the lines.
Guess it resolved to a chimp?
Whoa, why is the DNS resolving dATP.dGTP.dCTP.dATP?!?
This problem was quickly resolved, but unfortunately it may take up to an hour before access is restored for everyone, due to caching effects.
If you don't want to wait an hour for it to update, you can open a command prompt and type "ipconfig /flushdna".
Please be warned that this may also revert you to some sort of single-celled organism.
Obviously this was caused by Global Warming.
With DNA resolution problems, apple.com resolves to 64.38.232.180 (oranges.com).
"I'm not a quack, I'm a mad scientist! There's a difference." - Dr. Cockroach
We apologize for the inconvenience this has caused.
[Citation needed]
Summation 2
RNA actually translates the name into an IP address.
You could read up on this at http://en.wikipedia.org/wiki/Translation_(genetics) if wikipedia's ribosomes weren't down right now.
You see guys, this is why you regularly test your backup plans and failovers. This is equivalent to building maintenance making sure the fire extinguishers aren't expired... it's basic to IT. Unfortunately, Wikipedia just reminded us that what's basic isn't always what's remembered. Someone just lost their job.
#fuckbeta #iamslashdot #dicemustdie
Nothing to see here. Overheating was normal behavior after I updated the Pr0n article.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
...to the old setting by some Admin who edited it the last time, and who would be damned if he let anyone else get in the last word.
DNA resolution failure? sounds serious.
Wikipedia admins need to get out of their basement anyway.
Judging by traveling through Europe in the summer, they've never discovered "air conditioning."
Guess what Wikipedia, you are a free service. You could be up 10 hours a day or 24, existing is sufficient. Damn demanding internayz morons.
Well, looks like all the DNA jokes are now -1 off topic
Well played /., well played.
"There is a way that seems right to a man, but its end is the way of death." Proverbs 16:25 (NKJV)
But when I got to the wiktionary.org main page I didn't see any kind of note or warning.
Couldn't they have at least put up some kind of warning box, hopefully with a list of IP addresses underneath so that one could directly access the services when in dire need?
.
.
.
.
.
(I'm not really sure what constitutes "dire need" of wikimedia services, but I'm sure someone can come up with a list of relevant circumstances)
coding is life
Active/passive systems are a pain in the arse. The whole concept of testing failover in an active/passive situation is wrong. Anything which relies on human beings doing this and that and that and that is a bad solution.
Just run active/active and load-balance over both sites. If one fails its tests, you just pull it.
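To make the active/active idea concrete, here is a minimal sketch of a pool that round-robins over every site whose health check passes and simply skips one that fails. The class name, the site names, and the zero-argument health-check callables are all hypothetical; nothing here describes Wikimedia's actual setup.

```python
import itertools
from typing import Callable, Dict, List


class ActiveActivePool:
    """Round-robin over every site whose health check currently passes."""

    def __init__(self, checks: Dict[str, Callable[[], bool]]) -> None:
        # Maps a site name to a zero-argument health check (hypothetical API).
        self.checks = checks
        self._counter = itertools.count()

    def healthy_sites(self) -> List[str]:
        """Run every check and return the names of the sites that passed."""
        healthy = []
        for name, check in sorted(self.checks.items()):
            try:
                ok = check()
            except Exception:
                ok = False  # a check that crashes counts as a failed check
            if ok:
                healthy.append(name)
        return healthy

    def pick(self) -> str:
        """Return the next healthy site; a failing site is never returned."""
        sites = self.healthy_sites()
        if not sites:
            raise RuntimeError("no healthy site available")
        return sites[next(self._counter) % len(sites)]
```

Because both sites take traffic all the time, the "failover" path is the normal path and gets exercised on every request. A real deployment would run the checks on a timer rather than per request; this sketch checks on every pick only to stay short.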
Deleted
They couldn't get to the Wiki page about failover testing.
Have gnu, will travel.
From the Summary:
"Due to an overheating problem in our European data center many of our servers turned off to protect themselves"
"we were forced to move all user traffic to our Florida cluster"
I think Wikipedia needs to build some data centers further north.
I thought maybe they had simply deleted Wikipedia because some admin decided nothing on there was "notable".
You build your systems to be fault tolerant. They automatically continue with half the components missing. Automatically disable those which fail the continually running tests.
Build your backup tests into daily procedures. i.e. don't copy/scp files to other locations/servers/sites, restore them to the other location. Autorestore DB backups to the staging/test/dev/reporting systems daily.
Computers are there to do stuff automatically. Getting human beings to do them is prone to failure.
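As a rough illustration of "restore, don't just copy", here is a sketch that takes a backup, restores it to a second location, and only reports success if the restored bytes match the source. The function names and directory layout are made up for the example, and the "restore" step here is a plain file copy standing in for whatever a real restore would be (for instance, loading a DB dump into staging).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_and_verify(source: Path, backup_dir: Path, restore_dir: Path) -> bool:
    """Back up `source`, restore the backup elsewhere, and compare digests."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    restore_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: take the backup.
    backup = Path(shutil.copy2(source, backup_dir / source.name))
    # Step 2: restore from the backup (not from the source) to a second place.
    restored = Path(shutil.copy2(backup, restore_dir / source.name))
    # Step 3: the backup only counts if what came back matches the original.
    return sha256_of(restored) == sha256_of(source)
```

Run from a daily job, a `False` result would be the signal to alert someone, so a broken restore path gets noticed within a day rather than during an outage.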
Deleted
I didn't understand some terms in the summary, so I was about to wiki them... *sigh*
I see lots of comments stating that this would not have happened had admins run regular tests on the failover mechanisms. That seems a poor assumption: if the system happens to fail and then an outage occurs before the next scheduled test, one may not be aware of it.
We had this problem recently where we were testing our backup generator. Normally, we cut power to the local on-campus substation, which kicks in the generator and activates a failover mechanism, rerouting power. Well, the generator came on no problem but the failover mechanism was broken, so every server in the datacenter spontaneously lost power. Had we known the failover was broken, we would not have done the regular test. However, the last test on the failover (done directly without cutting power), a mere month prior, had shown the failover mechanism was fine.
Point being, unless you are going to literally continuously test everything, there is still some probability of an unexpected double failure.
-Ryan
AUWYHSTOT (Acronyms are Useless When You Have to Spell Them Out Too)
20:47 UTC+1, we are still without Wikipedia probably due to poor DNS propagation
If only their blog had mod points. All the comments are of the form "still down wherever I am".
every anarchist is a baffled dictator. Benito_Mussolini
...and nobody really gave a damn.
So why are they considered relevant again?
Darn, I thought Wikipedia was going to explain today's global outrage.
:q!
How many kids will go to school tomorrow and say they couldn't complete an assignment because Wikipedia is down?
Speaking of Wikipedia, an idea that has long been in my mind, but that I have never sat down and worked out, is distributed hosting of Wikipedia. The idea is that volunteers each contribute some resources (network capacity, storage space, RAM, and CPU cycles) to host and serve part of the content.
This way, we should be able to reduce the load on the (donation supported) Wikimedia servers, as well as increase the redundancy in the system.
Is anybody already working on this or are there perhaps even already implementations of this idea?
Please correct me if I got my facts wrong.
And here, I thought that the Great Firewall of China had been blocking access to politically-charged Websites again.
-Z
I was looking up curry recipes ... the really hot ones.
Man, I never thought my tapeworm would cause a global outage.
I started reading the article and wondered why there was such global outrage about dns resolution on Wikipedia, then I went back and looked at the title again...
If it ain't broke, DON'T fix it.
I happened to be looking up "DNS Failover" on Wikipedia at the time of the DNS failure
I was only 28,931 registrations away from having a 6-digit UID
Wikimedia is terribly understaffed. They have about 35 employees for one of the five largest sites on the Internet (and that includes legal/finance/MediaWiki devs/etc. staff). Basically the site is run by a dozen guys. Compare that to any other Top 10 site; this is just crazy.
Given their limited resources (both human and financial), it is amazing that Wikipedia is down so rarely. If you want the site to be more reliable, there is something you can do: Donate to the Wikimedia Foundation
Overheating?!? Must be Global Warming - ahem - [man made] Climate Change!
I was rather pissed. And the only thing I was going to do was look up a few math terms. Ended up using PlanetMath and a few other sites, but when Wiki came back, I checked them there as well, and guess what: they had the most comprehensive and informative articles. That's the first outage I remember since I started using Wiki.
It was as though a great calm came over the Force. Like the dawning of a new age, one based on freedom and facts. One where people were free to write articles without fear of deletion and condemnation. Or edit articles without fear of biased reversion, or banishment.
Suddenly people saw the wood for the trees, and realized there was a whole Internet out there with truth and beauty in it, where jack-booted book-burners were not only not in control, but not welcome either.
And then they brought wikipedia back up again; the black flags flew, and the click of heels was once again heard around the net.
Jokes aside. Indeed, I wish I were joking. There is a lot of pure evil at the heart of Wikipedia.
Trying to run anything reliable when you give any control you had to other random people on the Internet is doomed to fail.
I've heard a talk from someone who suggested moving to content-addressing: instead of giving you a URL, I give you a SHA-1 hash of the page you want (and maybe a URL to tell you where to start looking). Then, you don't care from where you get your data, as long as it matches. You can grab the page from the originating host, or from a local cache, or from a bunch of different peers, or from... well, you name it. As long as you get the bits that match the hash, you're happy.
I think the idea is (1) good; (2) pie in the sky; and (3) applicable here.
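For what it's worth, the content-addressing part is easy to sketch: ask any number of untrusted sources for a hash and accept the first answer whose bytes actually hash to it. The function names and the "source" callables below are hypothetical stand-ins for an origin server, a local cache, or a peer. SHA-1 is used because that is what the talk proposed; it has known collision attacks, so a new design would likely pick a stronger hash.

```python
import hashlib
from typing import Callable, Iterable, Optional


def content_address(data: bytes) -> str:
    """The address of a page is just the SHA-1 hex digest of its bytes."""
    return hashlib.sha1(data).hexdigest()


def fetch_by_hash(
    wanted: str,
    sources: Iterable[Callable[[str], Optional[bytes]]],
) -> bytes:
    """Try each source in turn; accept only bytes that hash to `wanted`."""
    for source in sources:
        data = source(wanted)  # a source returns bytes, or None if it has nothing
        if data is not None and content_address(data) == wanted:
            return data  # it does not matter who served it, the hash vouches for it
    raise LookupError(f"no source returned content matching {wanted}")
```

A mirror that serves stale or tampered bytes is skipped automatically, which is what makes handing the hosting to random volunteers less scary than it sounds. What this does not solve is the hard part: finding out which hash is the current version of an article.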
that bad and instead of having to apologize, you just have your buddies brag about how smart and competent you are and how well paid.
us little people down here in meatspace... when we f-up that bad, we get this thing called 'fired'.
You realise of course that pretty much all of Wikipedia is multiply mirrored ... answers.com, Google cache, Bing ...
http://rocknerd.co.uk