Slashdot Mirror


Power Outage Takes Wikimedia Down

Baricom writes "Just a few weeks after a major power outage took out well-known blogging service LiveJournal for several hours, almost all of Wikimedia Foundation's services are offline due to a tripped circuit breaker at a different colo. Among other services, Wikimedia runs the well-known Wikipedia open encyclopedia. Coincidentally, the foundation is in the middle of a fundraising drive to pay for new servers. They have established an off-site backup of the fundraising page here until power returns."

577 comments

  1. This is why you don't turn Google down by Anonymous Coward · · Score: 5, Funny

    They'll turn the lights off.

    1. Re:This is why you don't turn Google down by rs79 · · Score: 1, Funny

      "They'll turn the lights off."

      Nah, that's not it. When Hunter Thompson shot himself in the head last night the bullet kept going and hit a really unfortunate piece of equipment.

      --
      Need Mercedes parts ?
    2. Re:This is why you don't turn Google down by brilliant-mistake · · Score: 0

      Holy crap, that's not funny, dude. Hunter Thompson was a great human being and one of the seminal writers of our time. It's not cool to make light of his death.

    3. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      You kill yourself and people are going to talk. I'm sure he realized this.

    4. Re:This is why you don't turn Google down by brilliant-mistake · · Score: 0, Offtopic

      People should talk. They should talk about his work and his contribution to American culture. They shouldn't be making fun of him. He deserves better. What if someone you knew killed himself? Would it be funny if people joked about that?

    5. Re:This is why you don't turn Google down by Captain+Nitpick · · Score: 2, Informative

      Nobody's turned Google down. There have been no actual proposals to turn down yet.

      --
      But then again, I could be wrong.
    6. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      Yes, which is why people I know shouldn't kill themselves. I will laugh at them. In fact, if anyone I know expresses suicidal thoughts, I will tell them I will laugh if they kill themselves and if they don't want me to laugh, they shouldn't.

    7. Re:This is why you don't turn Google down by brilliant-mistake · · Score: 1

      Wow, what a swell friend you must be. Very supportive.

    8. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      Learn to spell before talking about Wikipedia.

    9. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      The fact is, I didn't know him. He's dead. Get over it.

    10. Re:This is why you don't turn Google down by citog · · Score: 1

      'Learn to laugh before you post on Slashdot' .. is a possible sarcastic response to that comment. 'Wikipaedo' should be a giveaway to the grandparent's intentions .. ermm .. you know what I mean :)

    11. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      No kidding. I think I just went blind.

    12. Re:This is why you don't turn Google down by multisync · · Score: 2, Insightful

      They should talk about his work and his contribution to American culture. They shouldn't be making fun of him. He deserves better.


      If Hunter S. Thompson were still alive, he'd be making fun of himself for killing himself.

      --
      I don't care why you're posting AC
    13. Re:This is why you don't turn Google down by benna · · Score: 0, Offtopic

      I think Hunter Thompson would probably rather his death not be taken so seriously.

      --
      "It is not how things are in the world that is mystical, but that it exists." -Ludwig Wittgenstein
    14. Re:This is why you don't turn Google down by oldwolf13 · · Score: 1

      That's awesome dude, and very much true.

      please note: I am a big Raoul Duke fan as well, R.I.P.

      He lived crazily... hell why die boring?

      --
      If I can't smoke and swear I'm fucked.
    15. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      I think he was trying to imply that Wikipedia is run by paedophiles (which is the British spelling of pedophiles) or something. It's really a rather incoherent insult though.

    16. Re:This is why you don't turn Google down by LWATCDR · · Score: 1

      "Hunter Thompson was a great human being" Not to be mean or anything but how was he a great human being? What war did he end? What life did he save? How many poor did he feed? How many people did he teach to read? I am not saying that he was not a good writer or even a great writer but a great human being? I would never make fun of his death because I am sure he was loved and for them it brings great sorrow. But saying he was a great human being seems to be over the top.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    17. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      Supporting someone post-suicide is not exceptionally useful, but it's a beautiful thought.

    18. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      How is Google's non-proposal different than when you raised money last time for new servers while refusing to say what you were going to spend the money on?

    19. Re:This is why you don't turn Google down by Captain+Nitpick · · Score: 1
      How is Google's non-proposal different than when you raised money last time for new servers while refusing to say what you were going to spend the money on?

      The Wikimedia Foundation said in broad terms what the money was going to be spent on. That they didn't itemize their spending for the next six months down to the penny does not constitute a refusal to say what it was going to be used for.

      --
      But then again, I could be wrong.
    20. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      Posters from the project refused on /. to even give a broad description of what type of hardware and other uses the money would be put to.

      Sorry, but when people are asking for money it is incumbent on them to say what they are going to use it for. And, if I recall correctly, quite a few of the wikipedia posters got highly upset when they were asked about plans for the money.

      Honest people do not do that.

    21. Re:This is why you don't turn Google down by 1lus10n · · Score: 1

      Lemme put this another way for you:
      How many people has he inspired to become writers ?
      How many people has he inspired to challenge the system ?
      How many people has he inspired to fight for their beliefs ?
      How many people has he inspired to get drunk and trash a hotel room ?

      Greatness is not measured purely by direct action. Just because I donate money to homeless people doesn't make me great. Greatness comes in many forms, inspiration being one of them. Many great artists (writers, musicians, painters etc) are great people because of what they inspire in others. Inspiration is greatness. Charity is greatness. Intelligence is greatness. Greatness comes in many forms.

      --
      "Two things are infinite: the universe and human stupidity; and I'm not sure about the universe." --Albert Einstein
    22. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0

      He did a fuckload more than you'll ever amount to, sunshine.

    23. Re:This is why you don't turn Google down by mdecarle · · Score: 2, Insightful

      Must you really know what the money is being spent on?

      If you donate money, you are asking them to continue to offer their great service to you and other people. How they achieve that goal, is up to them, no?

      You don't ask the Red Cross what they use your money for, do you? The organisation usually tells you afterwards.

    24. Re:This is why you don't turn Google down by rs79 · · Score: 1
      Make light of his death or bring it up for discussion? Oddly my submission of the death of "the only reporter in the 20th century to tell the truth" as a story here was rejected by some infantile swine.

      Make no mistake. I'm a fan. I'm the asshole who went to see one of his rare utterly brilliant public performances in Long Beach on April 5 1989, risked being thrown out by taking pictures, tape recorded it, transcribed it and posted it all to usenet.

      I think it was at a club called the Golden Bear, but hey, if you can remember where you saw HST you weren't really there.

      If you think he wouldn't like being talked about like this then you don't know him at all. By his own words:

      {Guy in audience} What would you like people to remember you by ?

      {HST} I don't have to worry about it.


      Sadly I fear part 1 is lost.

      http://ctr.vrx.net/hst/

      "Chew on that gibberish for a while you heartless scum" - HST.

      Day one of an HST-less world was a cold, dark and ugly place. Godspeed you savage twisted motherfucker.
      --
      Need Mercedes parts ?
    25. Re:This is why you don't turn Google down by rs79 · · Score: 1

      "Not to be mean or anything but how was he a great human being?"

      He walked with the Kings.

      --
      Need Mercedes parts ?
    26. Re:This is why you don't turn Google down by Anonymous Coward · · Score: 0
      You don't ask the Red Cross what they use your money for, do you?

      Quite frankly, after 9/11, people SHOULD be asking the red cross what they're using your money for. They solicited donations under the pretense that it was going to help families of 9/11, but in reality it went into the Red Cross war chest to pay for ... who knows what.

    27. Re:This is why you don't turn Google down by __aajqwr7439 · · Score: 1

      and the Angels.

      dn

    28. Re:This is why you don't turn Google down by Captain+Nitpick · · Score: 1
      Posters from the project refused on /. to even give a broad description of what type of hardware and other uses the money would be put to.

      A cursory glance at the story on the previous fundraiser reveals no such thing.

      Sorry, but when people are asking for money it is incumbent on them to say what they are going to use it for.

      Wikimedia has said what the money is to be used for. If you don't like the level of detail given, then don't donate. But stop implying that it's part of some conspiracy.

      And, if I recall correctly, quite a few of the wikipedia posters got highly upset when they were asked about plans for the money.

      Again, I saw no such thing in the previous fundraising discussion.

      Honest people do not do that.

      Honest people don't make vague unsubstantiated claims about people acting to defraud the public.

      --
      But then again, I could be wrong.
    29. Re:This is why you don't turn Google down by LWATCDR · · Score: 1

      You see that is where I have to disagree.
      Writers can be great, musicians can be great, and painters can be great. But the vast majority are no greater than a plumber or house painter. I did not challenge that he was a good writer or even a great writer but that does not make him a great human being.
      Even the list of things that you put down skipping the first and last do not make for a great human being.
      Think of all the great evil that has come from challenging the system and fighting for what they believe. Hitler challenged the system and fought for his beliefs. The Oklahoma City bomber challenged the system and fought for what he believed. Becoming a writer is no greater than becoming a plumber. Getting drunk and trashing a hotel room does not make anyone great.
      Sure Hunter Thompson may be a great writer but I do not see how he can be called a great human being. If you give him credit for the "good" you say he has done will you also give him blame for everyone who has read one of his books and died driving drunk or ended up in rehab for drug use? I bet the answer is no. People make up their own minds and control their own actions. They also choose what inspires them and what they do with that inspiration.
      A great human being is someone that serves others or changes the world. People like Salk or Mother Teresa. Hunter Thompson from what I have seen has not done that. He was in the end like musicians, painters, football stars, basketball stars, baseball stars, and actors an entertainer. I see a problem when we raise entertainment to the realm of human greatness. As I said he may be a great writer. But great human being? I still do not see it.
      I do feel bad for the pain that his friends and family are going through. I wish them well.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    30. Re:This is why you don't turn Google down by pk2000 · · Score: 1
      Think of all the great evil that has come from challenging the system and fighting for what they believe.
      USA?
    31. Re:This is why you don't turn Google down by jo42 · · Score: 1

      What is a 'Hunter Thompson' and why should I give a flaming fart?

    32. Re:This is why you don't turn Google down by LWATCDR · · Score: 1

      Some would even agree with that statement.
      That is why I specifically left out the people that destroyed the WTC.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    33. Re:This is why you don't turn Google down by elemental23 · · Score: 1

      Are you an idiot?

      http://www.nytimes.com/2005/02/22/books/22thompson.html

      I would post a Wikipedia URL but, you know...

      --
      I like my women like my coffee... pale and bitter.
    34. Re:This is why you don't turn Google down by 1lus10n · · Score: 1

      Wrong. Plumbers and the like can learn their trade from someone. Being a good plumber does not take vision or talent, it takes the ability to deal with shit and learn.

      You can't teach what Van Gogh had. You can't go to school to learn how to be the next da Vinci. You either have it or you don't.

      "Think of all the great evil that has come from challenging the system and fighting for what they believe. Hitler challenged the system and fought for his beliefs. The Oklahoma City bomber challenged the system and fought for what he believed."

      You have got to be high, or joking. The greatest evils in the history of this world were allowed to happen when other people stood by and LET them happen. Hitler did bad things. Horrible things. However, many great things came from the Nazis, not the least of which is the tolerance gained once people saw how truly horrible intolerance can be.

      "They also choose what inspires them and what they do with that inspiration."

      It's not that simple. Sometimes things just grab you. You can't choose what or who you love. I'm sure he probably turned quite a few people into drug addicts or drunks. Is that really a bad thing ? Stupid people die. If you drive drunk you are a stupid person. Hopefully your stupidity will make headlines in the paper so I can post it to the CoFD mailing list and we can all have a chuckle at your corpse's expense.

      Entertainment does not include thought. You speak of Mother Teresa, but in all likelihood without the written word and the thought provoking arguments of others before her we would not have had Mother Teresa.

      Football players entertain, they do not create, they do not challenge. Music has the ability to do these things, as do other forms of art like writing or film. Without these things many people would not have the opportunity to fight the system of oppression and greed. Without these things they would not know they are being oppressed. Look back through history at the role of writers, artists, poets and the like. They have always been at the forefront of change. They have brought about discussion on topics that the average person has been afraid to broach.

      That is greatness.

      --
      "Two things are infinite: the universe and human stupidity; and I'm not sure about the universe." --Albert Einstein
  2. Coincidence... ;) by Faust7 · · Score: 5, Funny

    Coincidentally, the foundation is in the middle of a fundraising drive to pay for new servers.

    "You see, guys? This is what could happen if we ever ran out of money. Now cough up some dough!"

    1. Re:Coincidence... ;) by xsupergr0verx · · Score: 5, Funny

      So... slashdot the offsite backup?

      --

      Click here for a free picture of an iPod!
    2. Re:Coincidence... ;) by daveo0331 · · Score: 5, Interesting

      On the other hand, subjecting the donation page to the Slashdot effect seems like a great way to reach the fundraising goal in no time. Assuming of course the page itself stays up.

      Seriously though, if you like wikipedia, consider donating, even if it's just 5 bucks. I think it's even tax deductible if you itemize.

      --
      Remember the days when Republicans were the party of fiscal responsibility?
    3. Re:Coincidence... ;) by AdmiralWeirdbeard · · Score: 1

      Hey, at least they had the dough to cover a backup of the fundraising site.
      I mean, you gotta spend money to make money, you know?

      --
      Come read my stupid blagablog. Rants and Giggles
    4. Re:Coincidence... ;) by Raul654 · · Score: 5, Informative

      I was just in freenode joking with Jimbo about this. He said he was wondering how long it would be before slashdot ran a story about it (2 hours) and asked people to please stop with the consideracy theories. Meanwhile, the devs are working fairly furiously to get it back up (Kate hasn't slept in 27 hours. Jimbo just declared Feb 22 to be Kate-day) (--A wikipedia admin.)

      --


      To make laws that man cannot, and will not obey, serves to bring all law into contempt.
      --E.C. Stanton
    5. Re:Coincidence... ;) by FuturePastNow · · Score: 1, Funny

      Consideracy theory? Is that like when you worry about people being nice to you?

      --
      Give a man fire, and you warm him for the night. Set a man on fire, and you warm him for the rest of his life.
    6. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      it's called a joke dude, have a heart. ;)

    7. Re:Coincidence... ;) by Raul654 · · Score: 3, Insightful

      No no, but with the google deal looming, the tin-foil-hatters are paying close attention to wikipedia, and every little thing gets overly-scrutinized.

      --


      To make laws that man cannot, and will not obey, serves to bring all law into contempt.
      --E.C. Stanton
    8. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      Hmm ... Kate. Kates are always cute.

      Got a pic? =)

    9. Re:Coincidence... ;) by josh3736 · · Score: 1
      Almost:
      The Wikimedia Foundation Inc., a Florida not-for-profit corporation, is registered as a charitable organization with the State of Florida's Division of Consumer Services, a division of the State of Florida's Department of Agriculture and Consumer Services, and may lawfully solicit donations under Florida law. However, Wikimedia is still in the process of obtaining official tax exempt status from the United States Internal Revenue Service. As it is a new organization (corporate status granted: June 20, 2003), you may not deduct donations from your federally-taxable income until the IRS determines Wikimedia is tax exempt. If you make a donation, you will receive the paperwork needed to claim it as a tax deduction once tax exempt status is granted; please contact a tax professional for the details of deducting such a donation.
    10. Re:Coincidence... ;) by Captain+Nitpick · · Score: 0, Offtopic

      Hmm ... Kate. Kates are always cute.

      Got a pic? =)

      You know not what you ask. Trust me.

      --
      But then again, I could be wrong.
    11. Re:Coincidence... ;) by thedustbustr · · Score: 1

      slashdot the fundraising page? Talk about massive revenue... Anyway, bandwidth isn't their concern *as I speak, you hear a boom as the offsite server goes up in smoke*

      --
      This sig is false.
    12. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      My guess is that she's the blond in this GIS.

    13. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      I think he's commenting on your awful spelling. I seriously hope you don't write anything for Wikipedia.

    14. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      I'm going to pick up some wikipedia swag. $5 from every item goes to the foundation.

      I'm sure one of those mugs would make a great pen-holder. Might as well get the coasters to go with it.

      Wikipedia swag here

    15. Re:Coincidence... ;) by m0ok · · Score: 1

      you know, that IS THE COOLEST VOICE EVARRR Soundwave! :D *memories*

      --
      *I am the anti-sig*
    16. Re:Coincidence... ;) by OverlordQ · · Score: 1

      Considering LokiTorrent made $40K+ and probably ran off with most of that, why can't Wikimedia come up with more?

      --
      Your hair look like poop, Bob! - Wanker.
    17. Re:Coincidence... ;) by Captain+Nitpick · · Score: 1
      "You see, guys? This is what could happen if we ever ran out of money. Now cough up some dough!"

      Daniel "mav" Mayer, the Wikimedia Foundation CFO had this to say on IRC:

      well if this were some type of grand conspiracy to get more money in the fund drive, then it is a dismal failure
      --
      But then again, I could be wrong.
    18. Re:Coincidence... ;) by register_ax · · Score: 0
      It took some digging but here is what I found.

      Kate most likely uses DVORAK. Her uname is UNIX UNIXPC SYSTEMS 3.0INTL mc68k. The timestamp is at 12:03 am, so she is either a night owl, or had recently reset CMOS.

      In this picture, you can see her desk layout. Note that if that is her wrist, she isn't a whale afterall. But we've never seen a fat troll either so we keep looking...

      But you wouldn't want to room up with her. Two geeks a god-send? One would think, but then take a moment to consider double the mess. It's better to fit a clean person and messy person together, and then reach for some medium ground. Trust me.

      Her clickety-clackety keyboard and cheap mouse, clicky

      But we want pictures. We want to fit this busty gal to someone of the likes of Kate Kohl, right? Well comrades, here is our-ahem-mess of a girl. BAM Nice picture eh? A real tease right. You see them nice lips through that soft, flowing ... erhmm ... raggedy, tress?

      So give me more I hear you say. Alas I've run clean. No more pictures this wild stead can round in. So is there any other defining factor before I pronounce my wedlock proposal I hear you ask. Indeed, the fact she uses .... drum roll ... KDE!! Yes, yes it is true. How horrible of a woman to do so. Well, it's all part of the show.

      Perhaps a "geek" on this website who has been deluded by the inviting K, will wed her on the morrow. How ignorant of the poor chap where her cosmetics are dumped for computer bloat, and will evermore tie us up with the admin task of maintaining a proper show. She's perfectly capable of administering her own programs I hear you say, but I ask you what women really carries around her own baggage when really to have a man toil at her feet? May that ignorant sap take her away from eye and mind so she may never taunt us with her feminine nature again! (well at least until next happy hour)

      Once more into the pussy let's dive, but out before the lock lies ringed.

    19. Re:Coincidence... ;) by Random+Chaos · · Score: 2, Informative

      Well...Slashdot has fully hit:

      Temporary fundraising site: "This account has exceeded it's bandwidth quota and has been temporarily disabled."

    20. Re:Coincidence... ;) by Anonymous Coward · · Score: 1, Interesting

      That's not "SYSTEMS", it's "System 5 R3.0"

      In other words, Wikipedia runs on SCO technology. By the Power of Moroni, SCOPOWER!!!

    21. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      It's all good. On 'pedia his spelling is corrected by others, on /. he gets flamed to medium crisp.

    22. Re:Coincidence... ;) by fbform · · Score: 1
      (--A wikipedia admin.)

      'ere! What sort of signature is that? You're supposed to type "--~~~~".

      --
      Time flies like an arrow. Fruit flies like a banana.
    23. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      SVR3 is WAYYY before SCO times.

    24. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      'ere! What sort of signature is that? You're supposed to type "--~~~~".

      Bogus. "~~~~". "--" is for incurable usenetters.

    25. Re:Coincidence... ;) by fredrikj · · Score: 4, Interesting

      On the other hand, subjecting the donation page to the Slashdot effect seems like a great way to reach the fundraising goal in no time. Assuming of course the page itself stays up.

      You do know that Wikipedia receives something like 100 times the traffic Slashdot does, right?

    26. Re:Coincidence... ;) by David+Gerard · · Score: 2, Interesting

      I can now see why Kate NEVER EVER emerges from her heavily-armed bunker in Oxfordshire.

      --
      http://rocknerd.co.uk
    27. Re:Coincidence... ;) by arafel · · Score: 1

      Yes, but the backup page isn't hosted on the Wikipedia servers. Otherwise it wouldn't be much of a backup. ;-)

      And, in fact, it's giving a "bandwidth exceeded" message as I type...

    28. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      This Kate happens to be a guy (transvestite or something).

    29. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      So it should be called 'wikipedia effect', not './ effect'...

    30. Re:Coincidence... ;) by DA_Chef · · Score: 3, Interesting

      Something like that, yes: Alexa's statistics

    31. Re:Coincidence... ;) by Anonymous Coward · · Score: 0

      Somehow, when you say 'busty', Kate Kohl is the last thing that comes to mind. I mean seriously, does a grown man want someone with the body of a starving 12 year old? When's the last time this Kate had a full meal, and digested it??

    32. Re:Coincidence... ;) by camt · · Score: 1

      Seriously though, if you like wikipedia, consider donating, even if it's just 5 bucks. I think it's even tax deductible if you itemize.

      It is *not* tax deductible yet. They have not yet received federal charity status, though they have applied for it.

      I believe once they are granted that status, even your previous donations are then deductible, though that is best left up to your accountant for verification.

    33. Re:Coincidence... ;) by Jamesday · · Score: 1

      Slashdot is pretty popular. Wikipedia really only does 7-10 times its traffic according to Alexa.com. Hard to say which is most undercounted though. Wikipedia probably gets more AOL traffic, Slashdot more people who don't do more than curse at the Alexa toolbar used for collecting the statistics. Neither counted.

  3. What Happened. by Anonymous Coward · · Score: 5, Informative

    What happened?
    At about 14:15 PST some circuit breakers were tripped in the colocation facility where our servers are housed. Although the facility has a well-stocked generator, this took out power to places inside the facility, including the switch that connects us to the network and all our servers.

    What's wrong?
    After some minutes, the switch and most of our machines had rebooted. Some of our servers required additional work to get up, and a few may still be sitting there dead but can be worked around.

    The sticky point is the database servers, where all the important stuff is. Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state. Attempting to bring up the master database and one of the slaves immediately after the downtime showed corruption in parts of the database. We're currently running full backups of the raw data on two other database slave servers prior to attempting recovery on them (recovery alters the data).

    If these machines also can't be recovered, we may have to restore from backup and replay log files which could take a while.
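
    The order of operations described above (copy the raw data first, because recovery alters it; attempt recovery; fall back to the last backup plus replayed logs) can be sketched roughly as follows. This is only an illustration under assumptions: `try_recovery`, `restore_from_backup`, the file name, and the directory layout are hypothetical stand-ins, not Wikimedia's actual tooling and not any MySQL API.

```python
import shutil
import tempfile
from pathlib import Path


def recover_database(data_dir, snapshot_dir, try_recovery, restore_from_backup):
    """Snapshot the raw data, attempt recovery, and fall back to backup + logs.

    try_recovery and restore_from_backup are hypothetical callables supplied by
    the caller; they stand in for whatever the real database tooling does.
    """
    data_dir, snapshot_dir = Path(data_dir), Path(snapshot_dir)

    # Recovery alters the data, so keep an untouched copy of the raw files first.
    shutil.copytree(data_dir, snapshot_dir)

    if try_recovery(data_dir):
        return "recovered"

    # Recovery failed: take the slow path. The snapshot stays on disk untouched.
    restore_from_backup(data_dir)
    return "restored-from-backup"


def demo():
    """Simulate a failed recovery on a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        data.mkdir()
        (data / "table.ibd").write_text("possibly corrupt pages")

        def failing_recovery(path):
            # Pretend the recovery attempt rewrites the file and then gives up.
            (path / "table.ibd").write_text("mangled")
            return False

        def restore(path):
            # Pretend to restore the last backup and replay the log files.
            (path / "table.ibd").write_text("backup + replayed logs")

        outcome = recover_database(data, Path(tmp) / "snapshot", failing_recovery, restore)
        raw_copy = (Path(tmp) / "snapshot" / "table.ibd").read_text()
        final = (data / "table.ibd").read_text()
        return outcome, raw_copy, final
```

    The point of the snapshot step is that a failed recovery attempt costs only time: the pre-recovery files are still available for another try or for inspection.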

    1. Re:What Happened. by khallow · · Score: 0, Redundant

      I hope this incident tipped the balance in favor of using UPS's to protect your servers and implementing some sort of off-site data backup.

    2. Re:What Happened. by Pinkfud · · Score: 1

      Oddly enough, I had just finished reading that message before coming over here. I'm a fairly regular contributor to the Wiktionary project. It's pretty rare that I learn about an event before reading it on Slashdot.

      --
      The world is my oyster. That's why it's always in a stew.
    3. Re:What Happened. by wakejagr · · Score: 2, Insightful

      Kudos to Wikimedia for actually explaining what happened and not just putting a "This page is down, please try again later" message up. Many people/companies/groups/etc would be too proud or too afraid of bad publicity to actually explain the problem.

      --
      Don't save Windows XP! http://www.petitiononline.com/jjw1xp/petition.html
    4. Re:What Happened. by bigberk · · Score: 1
      Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state
      Someone please remind me again why massive databases are not yet being implemented with simple discrete file storage on ReiserFS. Sure, MySQL will be faster once in memory but it sounds like the price you pay is lack of robust storage and difficult backup/recovery -- probably the most important part of running a database.
    5. Re:What Happened. by Anonymous Coward · · Score: 2, Insightful

      You do know that in real datacenters you don't have a UPS on each PC, but a UPS for the ROOM, and between this UPS and your servers you are going to need breakers, so if you put too many things on a circuit it may cause a problem, as simple as that.

    6. Re:What Happened. by Anonymous Coward · · Score: 5, Funny

      Real datacenters don't have PeeCees.

      Oh, maybe one, out at the guard's desk.

    7. Re:What Happened. by Anonymous Coward · · Score: 0

      You do know that in real datacenters you don't have a UPS on each PC, but a UPS for the ROOM

      You're a fucking moron. Why would redundancy make something not "real"?
      What constitutes a "real" datacenter. A room with a couple walls is a real datacenter.

      Anyway, as for UPSes, I have seen many instances of UPSes per blade or unit. There are tons of rackmountable per unit UPS systems on the market.

      It is a simple cost/benefit thing.

    8. Re:What Happened. by Anonymous Coward · · Score: 1, Insightful

      The sticky point is the database servers, where all the important stuff is. Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state.

      I don't get it, then why the fuck bother with InnoDB. Transactions/ACIDity imply a performance penalty over just cache and async write of a direct image. One pays this penalty for the benefits (usually critical for many applications) of data integrity and robustness. How would you like your bank to run on MySQL?

      This is the dumbest thing I've ever heard. I used to tell MySQL weenies that their DBMS sucked because it had no transaction support, then recently these annoying inbred fuckwits tell me that MySQL is just as good as Oracle because it has InnoDB support (we'll let the fact that the schema is kept in the shitball format slide)...Well apparently these morons don't have a fucking clue what transaction processing really means. Usually COMMIT and ROLLBACK are supposed to actually mean something... and even working 90% of the time doesn't cut it.

      I would never donate to this goddamn Wikipedia project as long as I know that the funds are going to end up being sapped to support their crippled shitball database.

    9. Re:What Happened. by Anonymous Coward · · Score: 0

      >I'm a fairly regular contributor to the Wiktionary project.

      Get ready to re-contribute, as it seems their database got corrupted.

    10. Re:What Happened. by afidel · · Score: 1

      In a real datacenter there are two sources of power to every rack so that stupid stuff like this doesn't kill an entire section =)

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    11. Re:What Happened. by Leo+McGarry · · Score: 3, Interesting

      What constitutes a "real" datacenter.

      One that complies with building and safety codes, for starters. In every jurisdiction with which I'm familiar -- admittedly not even close to all of them -- it's actually against the law to have a battery unit inside a data center cage. It's a violation of the safety code. When fire and rescue personnel go into a commercial building, they have to be sure that the power is really off. If there's a battery lying around somewhere, shorting to ground through a desk or door frame for instance, it can cause big problems.

      Ask around. I bet you'll find that your data center explicitly forbids customer-installed battery units.

    12. Re:What Happened. by Leo+McGarry · · Score: 1

      I don't think it's fair to compare a site like Wikipedia to a bank. With a bank, it's vital that transactions complete and that they not be lost somewhere due to a computer glitch. Wikipedia is, for all intents and purposes, a hobby site. If the whole thing were to vanish tomorrow, nobody would miss it.

      Of course, you're right on about not making a donation. If you want your money to go to a good use, make sure you vet who it is you're donating it to in advance. Do you really want to donate it to what basically amounts to some guy's personal home page gone way out of control?

    13. Re:What Happened. by mr_zorg · · Score: 2, Informative

      You laugh at this, but it's 100% true.

    14. Re:What Happened. by Anonymous Coward · · Score: 0

      Kudos to wakejagr for actually explaining what happened and not just putting a "This sig is down, please try again later" message up. Many slashdot people would be too proud or afraid of bad publicity to actually explain the problem.

    15. Re:What Happened. by StalinJoe · · Score: 1

      Ever since I found Wiktionary last fall, I have neglected my moderation and meta-moderation duties here on /. It's been some time since I've logged into this account.

      --
      "Those who cast the votes decide nothing; those who count the votes decide everything." - Josef Stalin
    16. Re:What Happened. by Cramer · · Score: 1
      • I would never donate to this goddamn Wikipedia project as long as I know that the funds are going to end up being sapped to support their crippled shitball database.
      Technically, your donation wouldn't be. The donations are used to pay for things like new servers, bandwidth, co-lo costs, etc. As far as I'm aware, the project hasn't bought any software -- which means MySQL AB hasn't been paid for the "crippled shitball database" being used. (And bomis may still be picking up the bill for the bandwidth and co-lo space.) All of the people working on the project are volunteers; they aren't paid for their contributions.
    17. Re:What Happened. by heliopilot · · Score: 1


      AC power distribution in a large data center can be quite complex. Although local authorities recognize the need for reliable backup power, amazingly, protection of life and property is considered more important than protection of data, and the design of the data center AC power systems reflects that philosophy.

      I worked in two different data centers in the 1980-1990 time-frame. They both used Halon fire suppression systems and a sophisticated alarm monitoring system. A halon recharge cost over $25,000. If there was an alarm/fire in the data center, ALL AC power (raw and UPS) was killed and the halon dumped.

      Raw and UPS AC power was routed through circuit breaker panels and large AC contactors. Large UPS systems must be designed to be more reliable than the municipal power, not a trivial task. In our centers, multiple redundant (50KW?) UPS systems ran in parallel and the output power was phase synchronized. Battery plants lived in their own vaults. At least one UPS (if not more) could drop off-line for repairs/maintenance without losing power to any equipment. There were numerous sub-panels behind the UPS system that fed individual circuits in different parts of the data center.

      When a fire alarm sounded, things got interesting. The staff of 30 had about 1 minute to evacuate the data center. The last one out stood by any door and held a large, red, spring loaded hold-off button depressed. As long as the button was depressed, nothing further happened. We were instructed to hold the button and observe conditions in the data center till either of two events happened. A) We observed serious smoke/fire, in which case we let go of the button and left, allowing all power to be cut, and the halon dump to proceed. Or B) Responding firemen arrived, at which time the system was manually deactivated and the source of the alarm determined and the cause rectified.

      Once, an electrician was working on the lighting system and accidentally shorted one of the smoke detector sensor alarm loops. This (by design?) tripped the halon dump and killed AC power immediately, with no opportunity to exit first. We staff had to exit in the dark, stumbling into racks of equipment and over tables and chairs through streams of downpouring halon (very noisy, like dozens of scuba tanks being emptied simultaneously). The "wind" of halon emptied dozens of 4' tall boxes of wide format printer paper feeding the dozen or so chain printers. Tens of thousands of feet of paper were blown randomly all over the center.

      I work today around a much smaller data center, but the convention is still to use a small number of large UPS's, not a large number of individual UPS's to drive individual PC's. I'm told there are some code limitations, but mostly it boils down to cost and reliability. The batteries in small "home/office" UPS's wear out pretty regularly and somewhat unpredictably. Most are "switchers" that do not decouple power spikes and noise on the mains completely. It is best to have a well designed backup power system consisting of redundant UPS systems in parallel, and an autostart emergency generator (which is often required by code in large buildings anyway).

    18. Re:What Happened. by Anonymous Coward · · Score: 0

      Reiser does not journal data, only the metadata. Turn off the computer while data is being written to disk and you will not only lose that data, but the data it was overwriting as well. The file system may not have to be "repaired" ever, but data does frequently just vanish from it. This is NOT proper journalling.

      Its overall complexity also seems to cause it to just generally lose data FASTER than good old ext2.

      Wake me up when you have a REAL journalling FS to suggest.

    19. Re:What Happened. by Anonymous Coward · · Score: 0

      Once, an electrician was working on the lighting system and accidentally shorted one of the smoke detector sensor alarm loops. This (by design?) tripped the halon dump and killed AC power immediately, with no opportunity to exit first.

      When I read it first I was sure that the electrician was killed, I swear.

    20. Re:What Happened. by Anonymous Coward · · Score: 0

      Just to be clear, the tripped breakers serviced racks occupied solely by Wikimedia, downstream of the UPS plant.

  4. Just like PBS by FunWithHeadlines · · Score: 1, Funny
    "Coincidentally, the foundation is in the middle of a fundraising drive to pay for new servers."

    So like PBS, they bring the service down to remind you they need the cash to provide you with the service you wanted to see but they just brought down.

  5. how ironic by iosmart · · Score: 0, Redundant

    Nothing for you to see here. Please move along. lol, or does slashdot just take a long time to update? great, i click submit and "The operation timed out while trying to connect to slashdot.org"

  6. News Update by Anonymous Coward · · Score: 5, Funny

    After returning from the power outage, the servers have just been slash-fried.

    1. Re:News Update by Captain+Nitpick · · Score: 1
      After returning from the power outage, the servers have just been slash-fried.

      The Slashdot effect is negligible compared to Wikipedia's normal insanely high traffic.

      --
      But then again, I could be wrong.
  7. They should ask for more... by PornMaster · · Score: 2, Informative

    If they bought actual servers with dual power supplies and got power from multiple PDUs at their data center, they would be much better off. If this is really because of a tripped breaker, then it's pretty inexcusable, since dual power supplies fed from separate circuits would have prevented it... unlike the LJ outage which was from the power being cut to all circuits.

    But if they're going to cobble together some whitebox crap servers, and not change the architecture, they'll be right back to an outage next time it happens.

    1. Re:They should ask for more... by Raul654 · · Score: 2, Insightful

      Right, because we all know money grows on trees...

      --


      To make laws that man cannot, and will not obey, serves to bring all law into contempt.
      --E.C. Stanton
    2. Re:They should ask for more... by v1 · · Score: 1

      From the sounds of it, the colo has a monster generator outside, and no UPS's inside. That sounds like a really bad plan, since generators can't cut in quick enough to stop a box from rebooting. It sounds like the equipment inside the building was unprotected... either that or the breakers tripped and the UPS's started screaming and nobody was there to do anything about it. (or the UPS's had squat for runtime or were overloaded - poor service no matter how you look at it)

      --
      I work for the Department of Redundancy Department.
    3. Re:They should ask for more... by Anonymous Coward · · Score: 0

      no, but apples grow on trees, and apples can be exchanged for money.

    4. Re:They should ask for more... by Anonymous Coward · · Score: 0

      What about herpes, which grows on my penis for free?

    5. Re:They should ask for more... by 42forty-two42 · · Score: 1

      There might've been a breaker after the UPS.

    6. Re:They should ask for more... by ergo98 · · Score: 1

      Any system should robustly support rebooting, and it's a little disconcerting seeing seemingly regular stories of these sites with systems that they need to kick, cajole, or press "ANY KEY" to get the site operational again.

      In the case of Wikipedia if they had a robust database backend the circuit would have been re-enabled, their systems would have powered on, and the world would be great again. At this colo the UPS was likely upstream of the breaker (like at LiveJournal).

    7. Re:They should ask for more... by Kris_J · · Score: 1

      Agreed. I "tested" a secondary domain controller one time when I "discovered" that one of the sockets in a UPS connected to the primary was a bit loose.

    8. Re:They should ask for more... by man_ls · · Score: 3, Insightful

      IIRC, that's the Fire Code. The breaker needs to be able to unconditionally kill all power inside the facility. Thus -- it kills the power post-UPS.

    9. Re:They should ask for more... by PornMaster · · Score: 3, Insightful

      Sometimes it costs more to do things wrong, in the long term, than to do them right.

    10. Re:They should ask for more... by Anonymous Coward · · Score: 0

      Amazing how omnipotent and yet, completely unforgiving some random Slashdotters are. I guess being god's own admin tends to give you unlimited hindsight.. in reverse. Or something.

      Anyway. I guess all I'm trying to say is that while you might have a point, you probably don't know everything that's going on down there, so trolling Slashdot is kinda pointless, no?

    11. Re:They should ask for more... by mboverload · · Score: 2, Insightful

      Hey man, they have their traffic doubling every 4 months, they NEVER planned for this success this early. Building infrastructure is hard when you never plan for it.

    12. Re:They should ask for more... by Anonymous Coward · · Score: 0

      if you find someone that will give you money for your herpes, let me know... i've got a couple of things i'd like to sell them as well.

    13. Re:They should ask for more... by Jamesday · · Score: 3, Informative

      The database servers have dual redundant supplies and the colo tells us that TWO circuit breakers tripped. Fun. Not. Do try to avoid having the same happen to you - losing both circuits isn't fun.

    14. Re:They should ask for more... by poopdeville · · Score: 1

      Sometimes it costs more to buy spiffy new servers than to recycle hardware. Particularly to a non-profit organization with shockingly fast growth. (Hint -- this isn't a for profit company. They don't make money doing it. There is no "loss" except to the community)

      --
      After all, I am strangely colored.
    15. Re:They should ask for more... by thedustbustr · · Score: 1

      Sometimes you have to make the short term price cuts instead of the long term savings because you don't have enough money to cover the less expensive alternative's initial cost. Have you ever rented an apartment? That's lost money. Could you afford to invest in a house right after school?

      --
      This sig is false.
    16. Re:They should ask for more... by brion · · Score: 4, Interesting

      Our database masters do have dual power supplies. The circuit breakers were tripped on both sides.

      --

      Chu vi parolas Vikipedion?

    17. Re:They should ask for more... by Anonymous Coward · · Score: 0

      You do realize most countries make money out of paper, right?

    18. Re:They should ask for more... by Donny+Smith · · Score: 1

      Losing one circuit is misfortune.

      Losing both is carelessness.

    19. Re:They should ask for more... by batkiwi · · Score: 1

      I'll assume this was added after you posted this, otherwise please RTFA:

      "At about 14:15 PST some circuit breakers were tripped in the colocation facility where our servers are housed. Although the facility has a well-stocked generator, this took out power to places inside the facility, including the switch that connects us to the network and all our servers. (Yes, even the machines with dual power supplies -- both circuits got shut off.)"

    20. Re:They should ask for more... by Anonymous Coward · · Score: 0

      Sure, your database servers have dual power supplies, but they are still connected to the mains power. You should have attached these redundant power supplies to redundant UPS units with enough battery power to give you time to respond to the datacenter's outage and perform a clean shutdown of the machines. You also need a reliable 'out-of-band' line into the database servers so you can get remote access to them in order to shut them down. I was the Systems Manager for a small ASP with a terribly limited budget but we had this situation covered. We may not have been able to stay up during the big summer blackout some years ago, but we had no problem flipping the switch when the power was restored. UPS power for both the database servers and the storage arrays. Breakers be damned. It's a poor excuse and bad planning. They need more than money.

      BTW my home server has the SAME setup - dual power supplies and dual UPS units to protect our data.

      BTW I only found out about this because I just tried to consult Wikipedia and found them down.

    21. Re:They should ask for more... by Anonymous Coward · · Score: 0

      And you're just a dumb fuck tard who doesn't know shit about nothing. So you should just shut the fuck up. Really. I worked as a sys admin at an ISP for a year or two, and believe me, this type of crap does happen. But the people who have their shit together are able to recover the fastest. I saw guys who, when http was not responding on their servers for more than 30 seconds, would call. If the situation wasn't handled by us within 20 minutes, the guy would be there in 10 minutes. This Wikipedia, I go there daily. They talk about how Wikipedia will be replacing Encyclopedia Britannica someday, well, they should have their shit together. Google wants to help them, they should have their shit together. I like MySQL as much as the next guy, but fuck that toy database. If you're going to do anything that is more than a recipe collection, then you should be using a fully transactional database. If you absolutely cannot go down without database corruption, get some fucking UPS's. Wikipedia, I love you. But grow the fuck up. You have an excellent system but you cannot be down 1/2 a day for replication. Unacceptable.

    22. Re:They should ask for more... by Scarblac · · Score: 1

      BTW my home server has the SAME setup

      Your home server setup would be ILLEGAL in a data center because the circuit breaker is required to take off all the power, for use in case of a fire.

      --
      I believe posters are recognized by their sig. So I made one.
    23. Re:They should ask for more... by Cramer · · Score: 2, Informative

      A word on breakers... first, they aren't fuses. They are magnetically thrown -- pull too much current through it and an electromechanical break is closed releasing the breaker contacts which are pushed/pulled apart by springs. As it's magnetically thrown, tripping one breaker can (and does) trip surrounding breakers. I've seen it happen a number of times -- with brand new breakers, even.

    24. Re:They should ask for more... by Anonymous Coward · · Score: 0

      I don't really care and just where are these "laws" about what goes on in a data centre. If I want my database servers to retain power from a UPS and perform a clean shutdown that's what they are going to do.

    25. Re:They should ask for more... by Anonymous Coward · · Score: 0

      I actually checked our TOS agreement and it specifically allows us to attach a UPS as additional protection to our servers - Just because you and your poorass company don't have a budget for good equipment don't try crawling up my ass over it. While you're busy recovering your database I'll be bringing services back on line.

      Here's the paragraph:

      We recommend regular backups of your information and installation of an Uninterruptible Power Source (UPS) for your own protection. Our power is supported by a commercial UPS system. However, extended power interruptions, beyond the capacity of our UPS system, are possible. Installation of a UPS system adequate to your needs is recommended. It is your responsibility to insure the protection of both your equipment and software from power loss.

      Any data centre that prevents a client from taking additional steps to protect their data and servers from power loss does not deserve to have clients.

    26. Re:They should ask for more... by Anonymous Coward · · Score: 0

      "any" system should support rebooting, I agree but databases are slightly different animals. If they go down unexpectedly you don't want them coming back up "automatically". You want to have your DBA(s) there to verify the integrity of that database and perform any necessary repairs/recovery before bringing up the database in full read/write operation. No, I'm not a DBA but I work and have worked with good DBAs and know what to expect and what is expected.

    27. Re:They should ask for more... by ergo98 · · Score: 1

      Sure, if you're talking about crap DBAs.

      Any real modern DB system, such as Oracle, MS SQL Server, Sybase, DB2...any of the real ones...has extensive software support for recovery from failures.

    28. Re:They should ask for more... by Joe5678 · · Score: 1

      That would just be the wording somebody threw in to cover their ass in case the power goes out.

      In reality most areas have laws that require an EPO (Emergency Power Off) switch to protect people. This is so that when somebody is getting electrocuted, you can cut the power, obviously a UPS that does not get shut off by the EPO switch would be in violation. Any larger UPS system will have EPO contacts, although I doubt many data centers are going to run the EPO switch to your personal UPS.

    29. Re:They should ask for more... by Anonymous Coward · · Score: 0

      So that's still fine - power is cut to the facility which protects someone working on infrastructure or in case of fire. Our UPS units in our cages continue to power our servers allowing for an orderly shutdown of database servers and other critical servers. FYI our UPS units aren't the cheap APC home units but rack mounted UPS units that do have contacts to allow connection to an EPO switch or BRB, however I don't think there is provision for us to connect to them in the data centre. Our servers continuing to receive direct power from our UPS units should in no way affect the safety of employees or others working in the data centre on other equipment. Would be nice if someone modded the other comments up a little so that folks would see them.

      FYI they were already "tested" in the big August power outage and performed admirably. Our three DB servers and storage arrays did an orderly shutdown. Our PDU (network addressable to power cycle servers and gear by script or direct intervention from remote connection) then did a power off on our arrays. A notebook that we keep running in our cage (from a compact flash based HD) provides additional monitoring and scripted control of other servers and equipment. It did require a site visit to power this notebook up again and to do a physical check on our equipment.

      Funny to hear all the moaning from others though as they fought with BIOS and servers that did not boot. We saw someone with a bunch of whitebox computers in some rather extreme frenzy. All our stuff was up and running though as per normal. Nice when a little foresight and planning actually works, isn't it? (Our clients appreciated the effort too BTW)

  8. Arghh by dauthur · · Score: 1

    Perfect. Now I can't look up stuff Q-wiki-ly. Har har.

    I was affected by LJ going down too, so I know how this is. Pain in my ass.

    1. Re:Arghh by Trejkaz · · Score: 1

      As they say, Google does have a cache of all the articles. Maybe this really was Google's fault somehow. ;-)

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    2. Re:Arghh by Anonymous Coward · · Score: 0

      Cry me a fucking river you info-junkie. What the fuck do wikipedia or live journal have to do with human progress? Information? Fuck all that will do for us. If the US decides to bomb Iran next, will it be for lack of information? If your wife leaves you or your kids are born with ADD, is that because of lack of information?

      No way. So kick your fucking info-habit and get free from the influence of all the info-bullies around here, too.

      You think information has been democratized through these technologies? Get off the pot. Look at the effects this Internet shit is having on our social fabric. Or your morality.

      Get it?

  9. Another indictment of MySql by Anonymous Coward · · Score: 5, Insightful

    Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state

    Ya know, I just don't understand why so many projects with such high visibility and requirements for reliability use a toy database like MySQL.

    Someone PLEASE tell me why. Because right now the only thing I can think is that people just don't know how to pronounce "Postgres".

    1. Re:Another indictment of MySql by ergo98 · · Score: 5, Interesting

      No database can guarantee data integrity in the case of a power failure.

      Barring a couple of extreme exceptions, of course a modern database system should protect integrity in the case of a power failure, or any other sudden system failure (kernel panic, GPF, whatever). In the case of the much maligned SQL Server, you can hit the power button all you want mid-transaction and you're going to get a blister on your finger before the database is corrupted.

    2. Re:Another indictment of MySql by Anonymous Coward · · Score: 5, Insightful

      No database can guarantee data integrity in the case of a power failure

      This is false. SQL Server 2000 (yeah, I know, instant mod-down) has a transaction log and so does Oracle and I'm sure every other half-decent database. ALL committed transactions are preserved and the data is in a consistent state.

      MySQL does not have this and the developers don't seem to care much about it. This is the problem with open-source in general, if someone is just doing it for fun they aren't going to spend any time on the stuff they don't care about personally.

    3. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      WRONG.

      Any real database server, which MySQL is most assuredly not, can guarantee data integrity since the last COMMIT. Once the COMMIT comes back, the data is supposed to be safely on the disk.

      If the database server loses power in the middle of a commit, it had damned better be able to rollback to back before the last commit. If it can't, it's a crappy database.

      MySQL can't do either. Its "transaction" support is a freaking joke. That we've now seen two sites suffer over a day of downtime simply due to MySQL being crap (LiveJournal and Wikipedia) should prove to anyone watching that you don't run important sites on MySQL.

      Maybe when Slashdot has to go down for several days when its MySQL server has power cut, then people will finally understand the lesson: you'd have to be insane to use MySQL for anything important. (Which, I suppose, means it's a perfect fit for LiveJournal. Laugh, it's a joke.)

    4. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Out-of-the-box fulltext searches.

      Nothing else really... this probably isn't even true anymore. But when you or someone else comes back with a link to Postgres' fulltext search functionality, it had better be as easy as MySQL's!

    5. Re:Another indictment of MySql by sploo22 · · Score: 5, Informative

      No database can guarantee data integrity in the case of a power failure.

      Think again. Techniques to do this have been around for years -- it's called stable storage. You just keep redundant copies of data that's changing, and use a neat and simple procedure to ensure that either they both get updated by a transaction, or the original data can be recovered. Certainly the most recent data might be lost, but there's no reason for the database to be corrupted or even in an inconsistent state.

      --
      Karma: Segmentation fault (tried to dereference a null post)
    6. Re:Another indictment of MySql by imroy · · Score: 5, Informative

      I just love stupid trolls that can't even use Google.

      Tsearch2 - full text extension for PostgreSQL
      DevX: Implementing Full Text Indexing with PostgreSQL - about Tsearch2.

      Tsearch2 is included in the postgresql-contrib package of at least Debian and Novell/SuSE. Is that "out of the box" enough for a clueless MySQL user?

    7. Re:Another indictment of MySql by Jamesday · · Score: 3, Informative

      Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?

      For the rest, we'll see as we get to them and, for any that fail, then look to see whether it was the disk controller or the disk drive lying about having the data written to battery backed up RAM or the disk surface.

      Wikipedia hasn't suffered a day of downtime yet for this reason and looks to be down for no more than a few hours this time. A previous incident lasting more than a day was a human or three screwing up and having two copies of the server software writing to the same database files without any locking to prevent conflicting updates. The result of that shouldn't surprise anyone.

    8. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Despite your rude tone, I am much obliged and am now a Postgres convert. Thanks!

    9. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Pfft. like you know what you are talking about?

      Jackass

    10. Re:Another indictment of MySql by fimbulvetr · · Score: 0

      How can an application running on an OS guarantee that the hardware has written the data correctly, and that it will be 100% after the next boot? Even in normal situations the DB cannot guarantee that data will be accessible.

    11. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Go read a book on database design and journaling. This problem was solved about 30 years ago.

    12. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      At least ONE? What kind of bullshit answer is that? "At least I still have ONE of my eyes, so there to you and your stupid safety glasses!"

      ANY real database server can sustain a power outage with no corruption. PostgreSQL, Sybase, MSSQL, Oracle, etc. ANY of them.

      MySql is a silly little toy.

    13. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      How can an application running on an OS guarantee that the hardware has written the data correctly, and that it will be 100% after the next boot?
      It's called journalling you idiot.

      In general, it, along with error correcting codes and hashes can make the statistical likelihood of a "false" transaction (a transaction committed when it never completed) arbitrarily small.
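
      A toy version of that idea, as a sketch only: each journal record carries its length and a CRC32, and recovery replays records until it hits the first one that is truncated or fails its checksum. The record layout and the names `encode_record` and `recover` are assumptions made up for this example; real database logs are considerably more involved.

```python
import struct
import zlib

# Assumed record layout: 4-byte payload length, 4-byte CRC32, then payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one journal entry so recovery can tell whether it is complete."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes):
    """Return the payloads of every intact record at the front of the log.

    Scanning stops at the first record that is cut short or whose checksum
    does not match, which is what a write interrupted by power loss
    typically looks like.
    """
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn or corrupted tail: treat as never committed
        records.append(payload)
        pos = start + length
    return records


log = encode_record(b"a=1") + encode_record(b"b=2")
print(recover(log))       # both records survive
print(recover(log[:-1]))  # last record lost its final byte, so it is dropped
```

      The half-written record is simply ignored on restart, so the worst case is losing the most recent uncommitted change rather than corrupting what was already there.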

    14. Re:Another indictment of MySql by hunterx11 · · Score: 1

      MediaWiki 1.4 has experimental support for PostgreSQL; I wouldn't be too surprised if Wikipedia switched over to it in the future. Such a transition would probably be fairly painful, though.

      --
      English is easier said than done.
    15. Re:Another indictment of MySql by Anonymous Coward · · Score: 2, Insightful

      I realise that since you seem to be involved with wikipedia, you'll be modded up no matter what. However, what you just said makes no logical sense. The grandparent basically said that mysql's transaction support sucks and consequently it can't guarantee db integrity over a power failure. You said that because *one* server came back up with no problems he should reassess mysql. You could have *all* your servers come back with no problems and it still wouldn't change the grandparent's assessment. You would just be getting lucky.

    16. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      I love you more than a person ought to love an anonymous person on a website.

    17. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Est stulti persistere in errore.

      Not sure if I matched the original word order (which doesn't really matter due to the way Latin works), but that means "It is characteristic of a fool to persist in error." I thought that phrase was suitable for this situation.

    18. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Have you dipshits at Wikipedia ever heard of journaling? How about ACID transactions? MySQL is a silly little toy database. Ditch that piece of shit and join us here in the year 2005. MySQL is free. Often times you get what you pay for.

    19. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      You sir, are one dumbass nigger. You should be dragged behind a pickup truck until your nutsack is worn off your wretched worthless body.

    20. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      What the fuck is wrong with you? I keep finding really stupid posts with your sig beneath them.

      Your opinions aren't worth shit, which is extremely surprising given that they seem to be composed almost entirely of shit.

    21. Re:Another indictment of MySql by Tough+Love · · Score: 4, Insightful

      Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?

      But one didn't. That's a much more informative data point.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    22. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Captain, sensors report a 100% correlation between uninformed posts and your name!

    23. Re:Another indictment of MySql by xenocide2 · · Score: 1

      you must be too familiar with XFS and ReiserFS. Try a filesystem with full journalling, rather than just metadata logging. Ext3 is a far cry from the X "if it hurts, don't do that" FS.

      --
      I Browse at +4 Flamebait

      Open Source Sysadmin

    24. Re:Another indictment of MySql by novakyu · · Score: 1
      Est stulti persistere in errore.

      Not sure if I matched the original word order (which doesn't really matter due to the way Latin works), but that means "It is characteristic of a fool to persist in error." I thought that phrase was suitable for this situation.

      I haven't seen that quote, and my Latin is a bit short and rusty (took one-year grammar course, and it's been a year since), but I don't think "est" should come at the beginning (or be capitalized, for that matter---capitalize only at the beginning of a paragraph, which you don't have here).

      "est" (or "sum, esse"), when it comes at the beginning of a sentence, means "there exists," rather than "is" (as a linking verb). Since here, it seems "persistere" is the subject (er... if I got it correctly that "stulti" is in dative... but it's pure guess...I guess it could be genitive, too---probably genitive, but that would still make "persistere" the subject), that can't be the sense of "est".

      But then, word order is nothing fixed in Latin (as you said), so it could be as you quoted, and I could be wrong. But I just wanted to point out that word order can change meaning (and definitely emphasis) in Latin.

      Now, if I am right, will you take your own advice?

    25. Re:Another indictment of MySql by Anonymous Coward · · Score: 0
      Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?

      With an ACID-compliant database, they all do, every time, barring a serious hardware failure. (Which a power loss is not.) One out of two (or N out of M, for N < M) is not good enough.

    26. Re:Another indictment of MySql by mbaciarello · · Score: 1

      capitalize only at the beginning of a paragraph, which you don't have here)

      I live in Rome, and I've never seen lower-case Latin in inscriptions on monuments or anything more formal than sexual jokes graffiti on walls in Pompeii. AFAIK lower-case (which was a direct variation of the uppercase anyhow) might be used in vulgar inscriptions and possibly personal writing, but not in more formal texts.

      Romans would have written "EST STVLTI PERSISTERE IN ERRORE", or the same with a mid-dot (don't know how to write it here, if you use OS X, it's alt-.) between words. They also had a knack for abbreviating words which stuck in later ages, possibly because writing on marble isn't that convenient...

      As for case, stulti is genitive of the adjective stultus, -a, -um - dative would be stulto for masculine and neuter genders. Genitive is appropriate here, as the phrase really means "It is (a) fool's [habit|custom|characteristic] to persist in error."

      Besides, I do agree word order in Latin is variable according to rhetorical emphasis, etc...

    27. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Using est with a genitive and an infinitive is an idiomatic construction that means "It is characteristic of x to y." Literally the sentence means "It is of a fool to persist in error." It has no real subject any more than the sentence "It is raining." has a real subject.

      I also adopted modern capitalization conventions out of habit and I don't think I'm especially wrong for doing so since that's often done when printing lists of Latin one-liners.

      I think you're right about not putting est at the beginning of the sentence though. It doesn't change the meaning any, but reflecting on it further there would be no reason for the original author to put emphasis on est when it plays a minor role in the sentence.

      Searching a bit on Google led me to this site which refers to this construction as the predicate genitive. One of the examples they give is:
      "multa loqui stulti est" = "To say many things is characteristic of a fool."

      And in the other examples est comes at the end of the sentence. So following this mold, I'd think that
      "persistere in errore stulti est"
      would be more likely.

      But I suppose that also introduces an ambiguity as to whether stulti is a predicate genitive or if it modifies errore.

      Anyway, I saw the phrase in Gavin Betts' "Teach Yourself Latin" and I don't have my copy at hand (and now this will probably bother me until I go and hunt it down). It's a very good book that I would highly recommend to anyone interested in learning Latin.

    28. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Who said anything about patching? Stupid troll shithead.

    29. Re:Another indictment of MySql by pete_m78 · · Score: 1
      This is the problem with open-source in general, if someone is just doing it for fun they aren't going to spend any time on the stuff they don't care about personally.

      Anti-Open Source FUD - doubly so, since MySQL is produced by a commercial company. MySQL's limitations are nothing to do with it being open source, although its popularity certainly is.

    30. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Perhaps the issue here isn't MySql - maybe the write-back caches on the hard drives are enabled.

      No database will cope with that; it will think data is committed when it isn't.

      Get a battery backed up RAID controller if you want reliable write-back caching.

    31. Re:Another indictment of MySql by Trogre · · Score: 1

      I'm sure the situation at hand will give a significant kick in the pants for transaction log development for MySQL.

      Or a massive shift to Postgres.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    32. Re:Another indictment of MySql by patriceCH · · Score: 1

      Correct me if I'm wrong, but MySQL does have a transaction log. It's called Binary Log and you can read about it in the documentation.

    33. Re:Another indictment of MySql by Rich0 · · Score: 1

      Someone PLEASE tell me why. Because right now the only thing I can think is that people just don't know how to pronounce "Postgres".

      I am using mysql for a few web apps - all FOSS out-of-the-box packages.

      The problem is that it seems like most developers haven't figured out how to support any database generically, and as a result you don't get the luxury of using whatever database you like.

      Now, if I were developing my own custom code I'd probably take a hard look at postgres. Right now I'm just stuck with what everybody else writes.

      I just don't get why people can't use generic SQL and figure out how to make their code database-independent. Most windows apps are written this way precisely so that you can develop it using an Access-database backend with zero data-protection, and then run it in production with some ultra-expensive Oracle setup.

    34. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      You are kidding me right? In .NET, any database provider must implement a standard set of interfaces and switching from one to the other is trivial. Of course if you have inline SQL or stored procs that need to be ported, that can be a bit of work. But from what I understand, MySQL uses some kind of bastardized SQL so I suppose that makes it more difficult too.

    35. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Did you read the article? Did you even read the summary? Did you even read THE TITLE??

      Whatever it has, it didn't work.

    36. Re:Another indictment of MySql by novakyu · · Score: 1
      I live in Rome, and I've never seen lower-case Latin in inscriptions on monuments or anything more formal than sexual jokes graffiti on walls in Pompeii. AFAIK lower-case (which was a direct variation of the uppercase anyhow) might be used in vulgar inscriptions and possibly personal writing, but not in more formal texts.

      Er... all inscriptions are in capitals---that much is indisputable. Handwritings, however, are mostly in lowercase (er...although the lowercase is probably a later invention... as is the case with Greek---in fact, older manuscripts (i.e. papyrus, etc.) are always in all-caps). So, either one has to pretend that he is copying an old manuscript verbatim, or he should make a correct transition to more modern representation---and not concoct some Anglo-Roman convention of capitalizing beginning of every Latin sentence (either all caps or all lowercase (except in particular circumstances as proper names or beginning of a paragraph), not anything in-between).

      Now, a question: so do you think that it is correct usage to begin this particular sentence with "est"? Is it possible for "est" at the beginning of sentence to mean anything other than "there is"? (which is quite different meaning than when "est" is used with subject-predicate)

      PS. Ah, now I recall the use: Genitive of Characteristic---a more common and less frivolous classification than Genitive of Military Accompaniment.

    37. Re:Another indictment of MySql by -brazil- · · Score: 1

      > But from what I understand, MySQL uses some kind of bastardized SQL

      As does Postgres. As does MSSQL. As does Oracle. I don't think there is a single DBMS in existence that doesn't have any proprietary "extensions" to SQL. Partially this is due to the earlier SQL standards missing vital features such as sequence/autoincrement, partially due to DB makers wanting to outshine the competition with features that may or may not be useful.

      --

      The illegal we do immediately. The unconstitutional takes a little longer.
      --Henry Kissinger

    38. Re:Another indictment of MySql by stienman · · Score: 1

      Because right now the only thing I can think is that people just don't know how to pronounce "Postgres".

      Gesundheit.

      -Adam

    39. Re:Another indictment of MySql by Coryoth · · Score: 1

      Databases that I know of that have a transaction log etc. such that sudden power outages etc. can, at worst, result in the last transaction failing, but no database or table corruption:

      PostgreSQL
      Sybase ASE
      Sybase IQ
      Oracle
      NCR Teradata
      DB2
      MS SQL

      I'm sure there are more. Why is everyone bringing up MS SQL server? It isn't that much less of a toy than MySQL. Sure, it does transactions, but compared to Oracle, ASE, DB2 and Teradata it is lightweight.

      Jedidiah.

    40. Re:Another indictment of MySql by gregfortune · · Score: 1

      Have I been missing something all this time?

      http://dev.mysql.com/doc/mysql/en/binary-log.html

    41. Re:Another indictment of MySql by stephenbooth · · Score: 1

      Last year I had to help port PHPBB from using MySQL to Oracle. The biggest problem we found wasn't that the SQL needed huge amounts of editing (just some stuff about making join types and setwise operations work right as I recall), it was data types. Oracle uses SQL92/ANSI datatypes whereas MySQL doesn't, so you end up having to change types of fields to their nearest ANSI equivalent. Sometimes this would leave the SQL working, other times it caused the SQL to need editing.

      Some time ago I read about the origins of MySQL. It was essentially written for a very specific purpose, a high volume data warehouse where if the data got screwed it could just be reloaded from the source files. This caused decisions to be made, correctly for that purpose, that are now coming back to haunt it when it's used for other purposes.

      Stephen

      --
      "Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
    42. Re:Another indictment of MySql by Jamesday · · Score: 2, Insightful

      Depends on the cause. If the database server software was being lied to by the OS, controller or drives I'm not sure just how much I'm inclined to blame the database server software.

      I am inclined to ask the database server vendor to see if they can find ways to protect against it and I've briefly discussed that already.

    43. Re:Another indictment of MySql by Jamesday · · Score: 1

      The problem is working out why it didn't do what it is capable of doing and did on one system. Did the grandparent really expect the database to survive OS, controller and/or drives all lying about what they have committed to disk? That's the sort of issue we appear to have.

      It is something which is worth trying to protect against.

      If you haven't seen it already, take a look at the results from LiveJournal's testing

    44. Re:Another indictment of MySql by mbaciarello · · Score: 1

      or he should make a correct transition to more modern representation

      Not sure what you mean... Leaving all caps aside, what would have been your proposed correct capitalization for the phrase?

      Now, a question: so do you think that it is correct usage to begin this particular sentence with "est"?

      That's a very good question, and at first glance I'd say you're right: it's not. All examples I could come up with include "est" in your first sense (est modus in rebus, for example.)

      However, I've also been trying to rearrange that sentence, and it just doesn't sound right if you move the "est." I should note, though, that I'm not a Latin expert (as you have surely guessed), it's just that you're exposed to five-year classes in our high schools, with a lot of translation. And you don't get to choose whether to take Latin or something else.

      I would say that it is in fact correct usage, when you're stressing the characteristic more than the subject of the phrase, in an absolute sense (not comparing or contrasting to anything else.)

      Even in English, or Italian for that matter, I'd say "It is foolish to run Windows Me on a Beowulf cluster," as opposed to "Running Windows Me on a Beowulf cluster is foolish," if I were to match the rhetorical style of the original Latin sentence. In Italian, where you don't need to explicitly use subject pronouns for all verbs, the analogy is even closer.

      "Stulti est persistere in errore" would certainly be acceptable, but at least to me, it would sound more like stressing "stulti" as opposed to someone or something else -- as in replying to some other statement, commenting or adding to it, etc...

    45. Re:Another indictment of MySql by GermsFromSpace · · Score: 1

      ... because many folks here are talking out their arse (i.e. regurgitating marketing info they've read/heard) and haven't really experienced or tried any of the bs they're shovelling

    46. Re:Another indictment of MySql by Heikki_Tuuri · · Score: 1
      Hi!

      Actually, MySQL/InnoDB has two transaction logs. The write-ahead log in InnoDB's log files is used in crash recovery. MySQL's 'binlog' is used in point-in-time recovery from a backup.

      Regards,
      Heikki
      Innobase Oy
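
The crash-recovery role of a write-ahead log described above can be illustrated with a small sketch. This is not InnoDB's log format or recovery algorithm; the record framing, the `encode_record`, `decode_record` and `recover` names, and the `page:` keys are all hypothetical, chosen only to show the idea of redoing intact log records and discarding a torn tail.

```python
import json
import zlib

def encode_record(txn_id, changes):
    """Frame one committed transaction as: payload + tab + CRC32 of payload."""
    payload = json.dumps({"txn": txn_id, "changes": changes}, sort_keys=True)
    return "%s\t%08x" % (payload, zlib.crc32(payload.encode("utf-8")))

def decode_record(line):
    """Return the record dict, or None if the line is torn or corrupted."""
    payload, sep, crc = line.rpartition("\t")
    if not sep or len(crc) != 8:
        return None
    try:
        if int(crc, 16) != zlib.crc32(payload.encode("utf-8")):
            return None
    except ValueError:
        return None
    return json.loads(payload)

def recover(checkpoint, log_lines):
    """Redo every intact log record on top of the last checkpoint.

    Replay stops at the first damaged record: anything after it was
    never durably committed, so it is discarded rather than trusted.
    """
    table = dict(checkpoint)
    applied = []
    for line in log_lines:
        record = decode_record(line)
        if record is None:
            break
        table.update(record["changes"])
        applied.append(record["txn"])
    return table, applied

# Simulated crash: the third record was only partly written.
log = [
    encode_record(1, {"page:Main": "v2"}),
    encode_record(2, {"page:Help": "v1"}),
    encode_record(3, {"page:Main": "v3"})[:20],
]
table, applied = recover({"page:Main": "v1"}, log)
print(applied)  # ids of the transactions that survived the crash
print(table)
```

The checksum is what lets recovery tell a half-written record from a complete one. It does not help if the storage stack acknowledged a log write that never reached the disk, which is the failure mode argued about elsewhere in this thread.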
    47. Re:Another indictment of MySql by Qubit · · Score: 1

      neat.

      That was something that I wanted for a couple of my projects (full text search on databases), but I hadn't yet bothered to go looking for what offerings there were for MySQL and PostgreSQL.

      Further proof that Slashdot is all you need®.

      --

      coding is life /* the rest is */
    48. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?

      Yes, Jamesday, if *one* database in a cluster is not corrupted after a power failure, it obviously means that the database management system is reliable... Let's just ignore that *every single one* of the live database servers failed, let's concentrate on the one that was off-line and was not processing any updates while the power went down. The entire cluster must have had hardware problems, all of them at the same time, even though the hardware was very different among them. I'm sorry, Jamesday, but are you joking? I really hope so. Because I've done a lot of contributions to Wikipedia under the assumption that it is run by competent people. And let me explain it if it isn't obvious: if a database gets corrupted because the controller didn't write everything to disk it means that THE DATABASE WAS NOT IN A CONSISTENT STATE DURING THAT OPERATION. And "consistent" is what the "C" in "ACID" stands for. Please don't make dumb excuses that one off-line server survived so the toy database you are advocating is good. This is utterly laughable if not downright insulting for anyone who is even a remotely competent DBA. I doubt you will even bother replying to this comment. Why bother when your posts are modded up anyway? Facing the fact that your favourite toy database is not to be used for serious data, you answer that the off-line server survived! What a joke.

    49. Re:Another indictment of MySql by Jamesday · · Score: 1

      The one which we used was "offline" only in the sense that we never use it for end user requests because it's used for bulk reporting, backup and apache web server work. It was applying transactions at a rate limited by its disk speed.

      I expect the mod points will be used as deemed appropriate by those using them, on the basis of their understanding the merits of the posts concerned.

    50. Re:Another indictment of MySql by novakyu · · Score: 1
      Not sure what you mean... Leaving all caps aside, what would have been your proposed correct capitalization for the phrase?

      All lower case. It's probably to some degree (who am I kidding---to a great degree) grammar nazi-ish, but, well, that's how much I liked seeing Latin sentences (or quotations) in all lower case---looks more hip than English quotes, for some reason. But, I guess this convention may not be universal.

      BTW, thanks for the detailed explanation about the word order---the original quote is probably just as sensible as they can come, as you said (normal, unmarked word order would be subject + object + verb, but I'm not sure how "persistere in errore", which I thought was acting as the subject, might be treated same as regular noun subjects...). Well, leaving that aside, I should take up more Latin while I'm in school---'been neglecting it too long while I was studying other languages.

    51. Re:Another indictment of MySql by fimbulvetr · · Score: 1

      I wasn't trying to make a point with file systems (I personally run reiser and xfs), I was simply trying to point out to these ignorant fucks who just read marketing brochures that being acid compliant (and even, gasp, using postgres) still doesn't protect against this:

      http://slashdot.org/comments.pl?sid=140219&cid=11750780

      But, considering they're mostly ACs, it's probably just the same guy and he doesn't have enough balls to put his name behind it.

    52. Re:Another indictment of MySql by jack_csk · · Score: 1

      Oh really? If you have never dealt with database corruption on mssql due to a power outage, you are not trying hard enough.
      Trust me, I was lucky that I found a decent backup of the master database file during one of my recoveries.

    53. Re:Another indictment of MySql by jack_csk · · Score: 1

      Ya, my Windows 2000 and Windows XP are NOT free, yet I don't feel I get what I pay for... And you are equating other free DBMS such as PostgreSQL to MySQL? I beg your pardon.

    54. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      The one which we used was "offline" only in the sense that we never use it for end user requests because it's used for bulk reporting, backup and apache web server work. It was applying transactions at a rate limited by its disk speed.

      First of all, what does "applying transactions" mean? Did the database process inserts and updates "at a rate limited by its disk speed" or was the vast majority of those transactions read-only? Even if that was indeed writing data across the entire database at the speed you describe and it survived the unexpected power outage, then it is NORMAL and EXPECTED for any ACID-compliant database. This is how EVERY server should deal with any power failure AND hardware failure resulting in not writing all of the data during writing. THAT is why any serious DBMS doesn't expect complex disk writes (by which I mean changing more than one byte) to be atomic, precisely because power/controller may fail DURING such an operation, after writing some, but not all of the data. We are talking about problems solved DECADES ago. I will not believe you are not aware of it, because you would have to be completely incompetent as a DBA, and I know you are not. So let's stop this farce and please finally answer this question once and for all: why do you not only state that the choice of using MySQL as the Wikimedia backend was wise but actually mislead people into thinking that one server in the cluster surviving a power outage resulting in hours of downtime was a success proving that MySQL is reliable? Why are you so strong about your opinion that MySQL is better suited for the task than PostgreSQL despite the facts proving otherwise? Because, let's face it, this is basically how all of your posts in this thread can be summarized, and this is what most of Slashdot readers (and Wikipedia users) would want to finally read about.

      I expect the mod points will be used as deemed appropriate by those using them, on the basis of their understanding the merits of the posts concerned.

      You must be new here.

    55. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      I modded up 5 of his posts, and will continue doing so if I feel like it.

    56. Re:Another indictment of MySql by novakyu · · Score: 1
      But I suppose that also introduces an ambiguity as to whether stulti is a predicate genitive or if it modifies errore.

      I think the ambiguity is partly resolved by the fact that, well, a sentence that goes like, "persisting in error of a fool is" doesn't make sense (with the "be" verb, either it should mean "there exists" or it needs a predicate), as I think "persistere" (and the whole phrase dependent on it) is the subject of the sentence, rather than the sentence being an impersonal (as the English example you cited is) construction.

      But, thanks for the good explanation---now my question is, why are you posting AC? (because it's off-topic?...)

    57. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      You only prove my point. Most of mods moderate "if they feel like it" and a guy being a Wikipedia DBA whom this story is all about is sure to get all of his posts modded up, even if he's just saying "one of our servers in the cluster survived a simple power outage so MySQL is a reliable RDBMS with real ACID." He didn't even address my arguments, read this entire thread instead of only following his posting history and modding up whatever he says. So far I have metamoderated two positive mods done to his posts as unfair. I will metamoderate all of them as unfair and if I only get mod points I will go back here and mod them down as overrated. All of them.

    58. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Did the grandparent really expect the database to survive OS, controller and/or drives all lying about what they have committed to disk?

      Yes!!!

      For God's sake, have you ever worked with a real database? If the OS, controller and/or drives never lied about what they had committed to disk you wouldn't need to implement your own techniques to ensure atomicity of transactions, would you? (Hint: "A" in "ACID" means "atomicity").

      It is something which is worth trying to protect against.

      You bet it is! That's why it has been protected against for decades in real DBMSs.

    59. Re:Another indictment of MySql by Anonymous Coward · · Score: 1, Informative

      > > Any real database server, which MySQL is most assuredly not, can guarentee data integrity since the last COMMIT.

      > Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?


      There is a huge difference between being capable of doing something and guaranteeing it. There is a difference between "sometimes, maybe, if you're lucky" and "always". I'm surprised that you can't see it. (Who the hell is moderating this thread, anyway?)

    60. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Depends on the cause. If the database server software was being lied to by the OS, controller or drives I'm not sure just how much I'm inclined to blame the database server software.

      If the database server software was being lied to by the OS, controller or drives, then it should still be in a consistent state! You might lose the last transaction or two and you might need to replay them from the log, but the database should NEVER get corrupted because of such a trivial failure. NEVER. As soon as you stop playing with toys and start working with real databases, you will exactly understand what ACID is all about and why it can save your ass in cases like this. I know that Wikipedia is just a hobby project and you are just a volunteer, but if something like that happened in the bank I work in, I would INSTANTLY lose my job and would never find a serious bank DBA job again. Wake up, the hardware errors you talk about (even if indeed every single server in the cluster had experienced them, which is highly doubtful) and much more serious ones were happening ten years ago, twenty years ago, thirty years ago. Crappy hardware is nothing new. (In fact, I would literally kill for so reliable hardware thirty years ago!) That's why real DBMSs ensure that in the case of such failures, the database is not corrupted. Use real RDBMS like everyone else and just get over it. Otherwise you sound like a kid who has just discovered MySQL and thinks that databases are something new and his toy is something great. This is just stupid.

    61. Re:Another indictment of MySql by Tough+Love · · Score: 1

      Depends on the cause. If the database server software was being lied to by the OS, controller or drives I'm not sure just how much I'm inclined to blame the database server software.

      The database server software was not being lied to by the OS. If the controller in the machine tested good after coming up and the memory tested good, then the OS was not being lied to by the hardware. Admit it, it was MySQL, just like a dozen experts have already told you.

      The chance that it was anything else is vanishingly small. By the way, I develop fault-tolerant systems for a living. I'm sure many of the others who have offered their opinions are similarly qualified. Particularly, read the post of the AC who responded to you, it's an accurate summary of the situation.

      Please, let's not entrust all the world's knowledge to a toy database.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    62. Re:Another indictment of MySql by Jamesday · · Score: 1

      Here's a portion of a report on the tests LiveJournal did. The chance isn't that small, in part because we have similar equipment, much from the same supplier. There's also a known OS glitch which is a possible factor, though this test doesn't cover that.

      ----

      The client picks random 16kB-aligned offsets on the partition and picks a random 32-bit number which it writes in hex (%08x) over a 16kB range. it reports to the spewserver both BEFORE and AFTER the disk write.

      -- the server notes what the client said it was about to do and what it reported doing.

      -- let it run for awhile....

      -- Pull the power...

      -- server notices client hasn't sent anything in 3 seconds, quits, writing out a map of what 32-bit number pattern should be at each sector.

      -- power on server

      -- copy map file from laptop (spewserver) to the server, run spewclient in verify mode. it dumps a histogram of errors per seconds-before-powerloss:

      Histogram of seconds before end:
      3 31
      4 7
      5 1
      65 1

      Well, the 3 seconds is really because the "end" is considered time AFTER the 3 second timeout, so that's kinda a bug. That should read 0,1,2 seconds before, not 3,4,5. But see how there are 31 regions that are bogus at t=0, 7 at t=-1, and 1 at t=-2?

      That means something was lying, and we don't buy that hardware until we get it configured so it doesn't lie.

      ----
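
The procedure in the quoted report can be sketched as a small in-memory simulation. This is not LiveJournal's actual spewclient/spewserver code; `LyingDisk`, `run_test`, the cache size and the 64-region partition are hypothetical stand-ins, and the "disk" here is a toy that acknowledges writes while they still sit in a volatile cache.

```python
import random

BLOCK = 16 * 1024  # the report uses 16kB-aligned regions

class LyingDisk:
    """Toy disk that acknowledges writes while holding them in a volatile cache."""

    def __init__(self, cache_slots):
        self.platter = {}   # offset -> pattern that is really durable
        self.cache = []     # acknowledged but not yet durable writes
        self.cache_slots = cache_slots

    def write(self, offset, pattern):
        self.cache.append((offset, pattern))
        # Oldest cached writes reach the platter once the cache overflows.
        while len(self.cache) > self.cache_slots:
            old_offset, old_pattern = self.cache.pop(0)
            self.platter[old_offset] = old_pattern
        return True  # reports "done" even if the data is only in cache

    def power_loss(self):
        self.cache = []  # everything still in cache vanishes

def run_test(disk, writes, seed=0):
    """Write random patterns, cut power, return offsets that fail verification."""
    rng = random.Random(seed)
    expected = {}  # the observer's map of what should be on disk
    for _ in range(writes):
        offset = rng.randrange(64) * BLOCK
        pattern = "%08x" % rng.getrandbits(32)
        # In the real test the client reports to the observer before the
        # write and again after it; only the "after" report is modelled here.
        if disk.write(offset, pattern):
            expected[offset] = pattern
    disk.power_loss()
    # Verify pass: every acknowledged write must be readable after reboot.
    return sorted(o for o, p in expected.items() if disk.platter.get(o) != p)

honest = run_test(LyingDisk(cache_slots=0), writes=200)
lying = run_test(LyingDisk(cache_slots=8), writes=200)
print(len(honest), len(lying))
```

A disk with no volatile cache passes with zero mismatches, while the caching one loses up to its last eight acknowledged writes. That is the same signature as the histogram above: the errors cluster in the final moments before power loss.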

      As you're probably aware, a system starting up doesn't tell you whether the disk drives are or aren't caching writes. You're probably also aware that some drives and/or controllers and/or drivers have been known to ignore flush requests and cache anyway.

      Now, it is possible to design a database so that it can handle such failures in the rest of the system. I discussed that with MySQL while we were in the early stages of recovery to ensure that they were aware of this issue.

      It's not only MySQL who are going to get such an approach from me. Two RAID controller vendors are going to as well, since the RAID controllers are supposed to be ensuring that the data is safely written.

      The approach I take when asking a vendor to add a feature doesn't include pitching one of their competitors to anyone reading about the incident.

    63. Re:Another indictment of MySql by Tough+Love · · Score: 1

      As you're probably aware, a system starting up doesn't tell you whether the disk drives are or aren't caching writes. You're probably also aware that some drives and/or controllers and/or drivers have been known to ignore flush requests and cache anyway. Now, it is possible to design a database so that it can handle such failures in the rest of the system. I discussed that with MySQL while we were in the early stages of recovery to ensure that they were aware of this issue. It's not only MySQL who are going to get such an approach from me.

      This is very simple. Test the drive or raid controller. See if it lies about cache flushing. If it doesn't, then it was MySQL's fault. I bet you five bucks it was MySQL's fault.

      Don't forget that a whole bunch of people who know what they're talking about told you it was MySQL's fault, and why they think so.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    64. Re:Another indictment of MySql by Anonymous Coward · · Score: 0

      Even if it's one guy, I agree with him. That makes two. You're stupid.

  10. mysql bad at disaster recovery? by bdigit · · Score: 4, Interesting

    This is not a troll or a flame at all, but between this and the livejournal servers, it sure sounds like hell if your mysql servers ever go down unexpectedly.

    Is mysql the only database like this, or does postgres get corrupted as well during unplanned downtime? If I recall from using MSSQL servers, we never had a problem like this. We would simply reboot the servers and not worry about tables being left in unrecoverable states. Please correct me if I am wrong though.

    Is there any way around this or will this always be a problem with mysql?

    1. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 1

      I think it's probably a problem with write caching. Most HD manufacturers turn it on, but it causes problems in exactly that kind of situation.

    2. Re:mysql bad at disaster recovery? by cnettel · · Score: 1
      It is only a problem if the OS/RDBMS config isn't careful about what is written when, and about how it enforces flushing of the right parts. Or, of course, the HD blatantly ignores any hint to force a flush operation and so on.

    3. Re:mysql bad at disaster recovery? by YU+Nicks+NE+Way · · Score: 4, Insightful

      There's a simple way around this: stick to PostgreSQL, MSSQL, Oracle, DB/2, or some other real database. MySQL doesn't make the grade, precisely because things like this can happen.

    4. Re:mysql bad at disaster recovery? by ctr2sprt · · Score: 5, Interesting
      We have a similar problem at work. There we don't endure database corruption, we just get broken replication. It appears to be working, but it actually isn't. So we have to take the master offline (actually just acquire a write lock on the DB, it can still answer SELECTs), tar up its (massive) database, scp it to the slaves, start the master, stop the slaves, untar the database, restart the slaves, and restart replication. The entire process can take several hours and it's easy to make mistakes. We put stickers on our MySQL servers saying "DO NOT REBOOT WITHOUT CONTACTING OPS MANAGEMENT," though unfortunately faulty DIMMs are illiterate.

      I don't know if PostgreSQL has similar problems, but I very much doubt that Oracle or DB2 do. I know that improved failover support has been a target of the PSQL developers for a little while now, so while it may not be on par with Oracle and DB2 it's probably closer than MySQL. At least for now.

      I wish this had prompted management to consider alternatives to MySQL, at least for our mission-critical database servers, but unfortunately it hasn't. They don't even see that we could sell an enterprise-level RDBMS as a significant feature - we're a webhosting company - and charge through the nose for it. Oh well. They don't listen to peons like me, they just make me fix MySQL replication every two weeks.

    5. Re:mysql bad at disaster recovery? by jsight · · Score: 1

      I don't know if this is a MySQL weakness or bad admin work.... but I do know that this does not happen on Postgres.

      We've had many a postgres-running machine suffer through unplanned reboots without anything more complicated after the restart than deleting a lock file.

    6. Re:mysql bad at disaster recovery? by fimbulvetr · · Score: 0, Flamebait

      UHH...Oracle replication is MUCH more tedious, time consuming and error prone than mysql. It's also almost a guarantee that any loss of an oracle server will require recovery. Mysql can handle reboots well.

      Not sure about DB2, but I'm pretty sure postgres is more difficult, considering it's not even native to the software (Last time I checked).

      BTW, write a damn script. Mysql was written for unix, unix thrives on scripts. If you can't handle writing a script, why the hell are you a DB admin?

    7. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      Postgresql is more hardened against power loss because transactions are not just posted against the DB but also onto a separate write-ahead log. The major DBs also use similar techniques -- I think Oracle even has the option to use 3 separate "redo" logs.

      That being said, the fact that both LJ and Wiki seem to require hours and hours of checking says they configured for speed instead of data integrity. Turn OFF write caching, turn ON full journaling and you should have much better luck with MySQL.

    8. Re:mysql bad at disaster recovery? by fimbulvetr · · Score: 1

      I don't think you _know_ this doesn't happen on postgres. I do think, however, you know this doesn't happen to you/your company on postgres _and_ you couldn't possibly hope to match how much i/o wiki(media|pedia) does in an hour.

    9. Re:mysql bad at disaster recovery? by ctr2sprt · · Score: 4, Interesting
      Mysql can handle reboots well.
      No. It can't. We have two concrete examples in this very page - one provided by Wikimedia, one provided by me - which directly contradict your statement. Maybe under some circumstances MySQL can handle reboots, but it's been proven already that it can't always do so. Perhaps your MySQL experience is not with high-load applications (at least not the level of load Wikimedia and my employer see).

      BTW, write a damn script. Mysql was written for unix, unix thrives on scripts. If you can't handle writing a script, why the hell are you a DB admin?
      Because the process doesn't lend itself well to scripting. For example, MySQL automatically releases locks when you close your connection to the DB. Presumably this is to avoid deadlocks and for other good reasons, but it's not trivial to write a script to do that. Also, since this is an important system, we don't like the idea of trusting computers to handle its repair: we want someone knowledgeable monitoring every step in case something doesn't work exactly right. I can of course sit there and watch the script do its thing, but that defeats the purpose of scripting the process in the first place.

      Regardless, the difficulty of the task is not the main issue. The main issue is that we are dealing with north of 1GB of data here, and on busy servers on a busy network that means restarting replication takes an hour or longer. So not only is performance reduced by 33% when we take the slaves offline one at a time, performance is reduced further by the traffic of tar/scp in the background. Not to mention the fact that, because we have a lock on the master's DB, you can't even consider the DB cluster fully functional.

    10. Re:mysql bad at disaster recovery? by fimbulvetr · · Score: 2, Insightful

      I'd rather just agree to disagree on this one, at this point it's all just what we have observed. It heavily depends on the situation, how the db is set up, etc.

      As far as the script, yes, it does have locks, and rightly so. It's not terribly tough to write a lock aware script. In my opinion, the replication setup is extremely easy to script. I'd much rather script it than sit in front of the console. Once I see it work, I know it will work every time, and I won't worry about something like me or a peer mistyping the server-id at 4:00am. Even at 20GB, it can't be terribly long at 100Mb/s.
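
      As a rough check on the 20GB estimate, here is a back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 10^9 bytes, 100 Mb/s = 10^8 bits per second) and a link running at full line rate with no protocol or compression overhead, which a busy network will not deliver. The function name is made up for illustration.

```python
def transfer_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal time to copy size_gb gigabytes over a link_mbit_per_s link."""
    bits = size_gb * 1e9 * 8                 # decimal gigabytes to bits
    return bits / (link_mbit_per_s * 1e6)    # megabits per second to bits per second


if __name__ == "__main__":
    seconds = transfer_seconds(20, 100)
    print(f"{seconds:.0f} s, about {seconds / 60:.0f} minutes")
```

      Under those assumptions 20GB works out to 1600 seconds, a little under half an hour per copy; real transfers on a loaded network would take longer.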

      You only need the lock on the master while you're tar'ing the snapshot for distribution to the other servers. Once it's tar'd, unlock the master, gzip, redistribute, tar zxvf, set up the slave and it will catch up.

    11. Re:mysql bad at disaster recovery? by Dachannien · · Score: 4, Funny

      So we have to take the master offline (actually just acquire a write lock on the DB, it can still answer SELECTs), tar up its (massive) database, scp it to the slaves, start the master, stop the slaves, untar the database, restart the slaves, and restart replication.

      You forgot the part where you have to take the chicken across first, because the fox won't eat the grain if you leave them alone.

    12. Re:mysql bad at disaster recovery? by artifex2004 · · Score: 1
      We put stickers on our MySQL servers saying "DO NOT REBOOT WITHOUT CONTACTING OPS MANAGEMENT," though unfortunately faulty DIMMs are illiterate.


      Yikes. Surely new DIMMs are cheaper than lost productivity and overtime, over a year's budget?
      Plus, who knows, maybe they can amortize them?
    13. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      Off the top of my head I can think of a trivial solution. I'm sure that real databases use something more sophisticated but basically for each record you write to disc you first zero out a particular byte, write the record, and then set the byte to a particular value to indicate "yes I really mean this." If the byte isn't set, the record is ignored.
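
      A minimal sketch of that marker-byte idea in Python, not how any real database does it. The slot layout, the constants and the function names are all assumptions made for illustration, and the scheme only helps if `os.fsync` really reaches the disk, which is exactly what lying drives and controllers break.

```python
import os
import tempfile

RECORD_SIZE = 16                 # payload bytes per slot (arbitrary for this sketch)
SLOT_SIZE = 1 + RECORD_SIZE      # one marker byte followed by the payload
VALID = b"\x01"
INVALID = b"\x00"


def _sync(f) -> None:
    """Push buffered data to the OS and ask the OS to push it to the disk."""
    f.flush()
    os.fsync(f.fileno())


def write_record(f, slot: int, payload: bytes, crash_before_commit: bool = False) -> None:
    """Write one fixed-size record using the zero / write / commit sequence."""
    assert len(payload) == RECORD_SIZE
    base = slot * SLOT_SIZE
    f.seek(base)
    f.write(INVALID)             # step 1: zero out the marker byte
    _sync(f)
    f.seek(base + 1)
    f.write(payload)             # step 2: write the record itself
    _sync(f)
    if crash_before_commit:
        return                   # simulate losing power before the commit
    f.seek(base)
    f.write(VALID)               # step 3: "yes I really mean this"
    _sync(f)


def read_records(f) -> list:
    """Return only the records whose marker byte is set."""
    f.seek(0)
    records = []
    while True:
        slot = f.read(SLOT_SIZE)
        if len(slot) < SLOT_SIZE:
            break
        if slot[:1] == VALID:
            records.append(slot[1:])
    return records


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        write_record(f, 0, b"A" * RECORD_SIZE)
        write_record(f, 1, b"B" * RECORD_SIZE, crash_before_commit=True)
        print(read_records(f))   # only the committed record is returned
```

      The record interrupted before step 3 is simply ignored on read, at the cost of three synchronous writes per record.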

    14. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      I'd rather just agree to disagree on this one

      That's mighty generous of you, considering as how you're complete full of shit and all. MqSql is a miserable toy used by morons who don't know any better.

    15. Re:mysql bad at disaster recovery? by Tough+Love · · Score: 2, Informative

      The main issue is that we are dealing with north of 1GB of data here

      Nice post. I'd just like to add that Wikipedia deals with north of 170 GB, not counting images.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    16. Re:mysql bad at disaster recovery? by darkpixel2k · · Score: 2

      We're a bunch of GEEKS. Why hasn't anyone invented some sort of device that could keep power going to a computer during a power outage? Maybe some sort of...battery backup? We could even give it a cool name like "Uninterruptible Power Supply"--or even a cool acronym like UPS.

      We could even attach circuitry to this 'UPS' that could send a signal to the computer when the power goes out or the battery runs low. If we work hard, we could even get this circuitry to give us an estimated time before the battery is totally dead. We could call this new technology 'smart signaling'.

      Wow--this would totally enable some cool features on the computer side. You could send an email, or even a page to the responsible server admin. You could even...
      ...wait for it...
      SHUTDOWN THE F*CKING DATABASE IN A SAFE MANNER!

      Ok--I understand the deal with LiveJournal and the EPO device--but why should a tripped circuit breaker cause a hard shutdown of a server? Seriously. Battery Backup people!

      Even my home computer does that. Power goes out, five minutes later the box begins shutting down...

      --
      There's no place like ::1 (I've completed my transition to IPv6)
    17. Re:mysql bad at disaster recovery? by SinaSa · · Score: 1

      If this is the case, then can you explain why MySQL managed to complete the recovery?

      I'm on #mediawiki on freenode, and one of their servers (the others are coming up now) "made the grade" as you put it.

      --
      The last digit of pi is four.
    18. Re:mysql bad at disaster recovery? by MightyMartian · · Score: 1

      And the fact that you can't spell "MySQL" doesn't in any way detract from your opinion, of course.

      --
      The world's burning. Moped Jesus spotted on I50. Details at 11.
    19. Re:mysql bad at disaster recovery? by sarahemm · · Score: 3, Informative

      A lot of datacentres don't allow UPSes within customer enclosures, as even if the EPO is triggered they keep supplying power which can be dangerous for fire/rescue crews. I'm aware this wasn't an EPO situation AFAIK, but the rules still apply.

    20. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      And the fact that you can't spell "MySQL" doesn't in any way detract from your opinion, of course.

      No, a typo does not detract from my opinion, fucktard.

    21. Re:mysql bad at disaster recovery? by sterwill · · Score: 3, Informative
      I know that PostgreSQL uses write-ahead-logging so it can avoid exactly these kinds of problems. It doesn't matter how much I/O PostgreSQL is doing; all writes go to the log. If the machine crashes, it replays the log file up to its most recent write. Worst case: data that was in the process of being appended to the log when the machine crashed didn't get flushed to disk, and that last transaction is lost. No tables are corrupt. No 6+ hour delay getting back online.

      You would know this too if you had read the PostgreSQL documentation.
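
      The log-then-replay idea described above can be sketched in a few lines of Python. This is a toy key/value log, not PostgreSQL's actual WAL format: the record layout (length, CRC32, payload) and the function names are assumptions made for illustration.

```python
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")    # payload length, CRC32 of the payload


def append(log, key: str, value: str) -> None:
    """Append one change to the log and sync it before acknowledging it."""
    payload = f"{key}={value}".encode()
    log.seek(0, os.SEEK_END)
    log.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
    log.flush()
    os.fsync(log.fileno())


def replay(log) -> dict:
    """Rebuild the table from the log, stopping at the first torn record."""
    table = {}
    log.seek(0)
    while True:
        header = log.read(HEADER.size)
        if len(header) < HEADER.size:
            break                            # clean end of log, or a torn header
        length, crc = HEADER.unpack(header)
        payload = log.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            break                            # last record never fully hit the disk
        key, _, value = payload.decode().partition("=")
        table[key] = value
    return table


if __name__ == "__main__":
    with tempfile.TemporaryFile() as log:
        append(log, "a", "1")
        append(log, "b", "2")
        # Simulate a crash partway through appending a third record.
        torn = b"c=3"
        log.write(HEADER.pack(len(torn), zlib.crc32(torn)) + torn[:1])
        log.flush()
        print(replay(log))                   # the torn record is dropped
```

      Only the half-written last record is lost; everything acknowledged before it is recovered. As with any such scheme, this assumes the sync call is honoured all the way down to the disk.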

    22. Re:mysql bad at disaster recovery? by Leo+McGarry · · Score: 1

      Fantastic post. So many comments around here attempt to be funny but are just dumb repetitions of in-jokes that were stupid the first time they were told. Your comment is literate and hilarious. Thanks for posting it.

    23. Re:mysql bad at disaster recovery? by normal_guy · · Score: 1

      Mine requires 10min. of UPS. One plug of dual power supply is on colo power, one is on my UPS.

      --

      Linux: Free if your time is worthless.
    24. Re:mysql bad at disaster recovery? by TheNarrator · · Score: 4, Informative
      PostgreSQL is far superior to MySQL in its disaster recovery ability, namely WAL (Write Ahead Logging). I've been using PostgreSQL since version 7.0 came out and I've never had it fail to come back up on me after any power outage or reset.

      http://www.postgresql.org/docs/8.0/interactive/wal.html

    25. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      So? That's your colo. Notice that the OP said "Many".

    26. Re:mysql bad at disaster recovery? by Jugalator · · Score: 2, Insightful

      They were lucky with that server?

      I mean, if only a few servers' databases survived, that may speak more of random luck: they happened to be in a state where nothing bad happened when the power outage occurred. If all of the databases survived, that speaks of MySQL being resistant to this sort of thing.

      --
      Beware: In C++, your friends can see your privates!
    27. Re:mysql bad at disaster recovery? by NonSequor · · Score: 1

      I've always been curious. How do you domesticate a fox?

      --
      My only political goal is to see to it that no political party achieves its goals.
    28. Re:mysql bad at disaster recovery? by 4A6F656C · · Score: 1

      A proper database server, one that is fully ACID compliant, will successfully recover from a power failure, as it will replay the logs in order to undo or redo changes that were occurring when the power outage happened. This is typically achieved using Write Ahead Logging (WAL). PostgreSQL, Oracle, MSSQL and other such RDBMS systems are capable of this. MySQL with InnoDB comes close, but not quite. I believe that MySQL MaxDB (which was SAPDB) is fully ACID compliant, if you wanted to stick with MySQL.

    29. Re:mysql bad at disaster recovery? by darkpixel2k · · Score: 1

      True. Although I don't have one for my home UPS, many of the larger server/datacenter UPS' have the ability to hook into an EPO.

      It just seems stupid to have power problems now-a-days with computers.

      --
      There's no place like ::1 (I've completed my transition to IPv6)
    30. Re:mysql bad at disaster recovery? by DAldredge · · Score: 1

      Lots of money and fancy cars.

    31. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 1

      Or, of course, the HD blatantly ignores any hint to force a flush operation and so on.

      This is a very real problem with MOST desktop-style ATA drives.

      It's possible that they built a baling-wire server and didn't confirm that the hardware was adequate for database use. Use SCSI, kids!

    32. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      As much as I hate to make a pop culture reference that's less than 5 years old I've got to say that this reminds me of the scene in Anchorman where Ron and Veronica are arguing over what San Diego means. First Ron makes up some bullshit story about what it means and then she calls him on it and he admits he was just trying to impress her and that no one knows what it really means. Then she says, no it means Saint Diego. And after a bit of this Ron quietly says they'll have to agree to disagree.

    33. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      In a perfect world, perhaps. Meanwhile, in reality, all of those CAN become corrupt.

      Ouch, I just recalled a story of a corrupted MSSQL at work. Same story. We got a freakishly good consultant in who fixed the thing, so it seemed it was not a hardware problem.

      His first reaction was "you haven't followed the on screen recovery instructions, have you?", "no...", "good! in that case there wouldn't have been any data left to recover!".

    34. Re:mysql bad at disaster recovery? by Cramer · · Score: 1

      Pulling the power to a database server can result in lost data and damage to the data files. Some databases are more resilient than others, but corruption can occur in just about any database. Sybase (various versions) will refuse to start a db that was not shut down correctly -- you either have to force start the db or check it first, even when there's a transaction log. I've personally seen postgres databases permanently corrupted by a power outage [the power supply blew up, fwiw.] Even Oracle isn't immune to backhoes, however corruption is pretty rare *grin*.

      Battery backed hardware RAID is good insurance, but "these things happen." And with the db traffic volumes of Wikipedia, it's much more likely to happen because data is constantly being written to disk. The choice of db engine really doesn't make a huge difference.

    35. Re:mysql bad at disaster recovery? by JollyFinn · · Score: 1

      >Is mysql the only dbase like this or does postgres get corrupted as well during unplanned downtime?
      >If I recall from using MSSQL servers, we never had a problem like this. We would simply reboot the
      >servers and not worry about tables being left in unrecoverable states.

      Okay, let's put it in other words. The database designed to ONLY run on MS Windows can handle system crashes FAR better than a database that is designed mainly for unix. Hmmm. Is there a reason for this? Can anyone guess why it's the case?

      --
      Emacs is good operating system, but it has one flaw: Its text editor could be better.
    36. Re:mysql bad at disaster recovery? by Cramer · · Score: 1

      Things like this can happen to any database. Anyone who will claim otherwise doesn't have the experience to be making such a claim. Corruption is possible with any database.

      MSSQL? You're joking, right? If you knew anything about wikipedia, you'd know how laughable that suggestion is. Oracle would be nice, but I'm not sure they'd hand over a license for free -- and oracle is certainly not a hands-free db engine.

    37. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      Talk about small potatoes. I work for a small organization and I manage Oracle databases that start at 500GB and go up from there. I wouldn't trust them to anything but Oracle. It's recovery ability is second to none.

    38. Re:mysql bad at disaster recovery? by gregorio · · Score: 1
      MSSQL? You're joking, right? If you knew anything about wikipedia, you'd know how laughable that suggestion is. Oracle would be nice, but I'm not sure they'd hand over a license for free -- and oracle is certainly not a hands-free db engine.
      Your hate for Microsoft/love for open-source is turning you blind.

      Real databases can corrupt data, of course. The last batch of transactions is the most vulnerable to corruption in case of server failure.

      But, in a real database, it is not going to completely destroy the database as a whole, just the last pieces of data. Things will just keep working, you'll only be out of a small fraction (the last one) of your data.

      About MS-SQL not being able to handle Wikimedia, you should first look at tpc.org before making false assumptions based on the wishful thoughts of your anti-MS zealotry dream. And you must not forget that they currently use MySQL, which is inferior to MS-SQL in all possible characteristics.
    39. Re:mysql bad at disaster recovery? by sydb · · Score: 1

      Unix doesn't "crash" so they didn't need to make MySQL crash-tolerant. Windows on the other hand....

      Do I win a prize?

      --
      Yours Sincerely, Michael.
    40. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      Yes, they were really lucky.

      From what I read in a mailing list message, the particular server that survived was offline for maintenance.

      So that's why it survived.

    41. Re:mysql bad at disaster recovery? by Sweetshark · · Score: 1

      The last batch of transactions is the most vulnerable to corruption in case of server failure.
      Even MySQL can do transactions now.

    42. Re:mysql bad at disaster recovery? by Jamesday · · Score: 5, Informative
      >>Can anyone guess why it's the case?

      Easily. See what those saying that MySQL can't do what MySQL does are promoting.:)

      LiveJournal found that it had some disk systems which lied about having committed writes. They have a preliminary tool which copies what it's writing to disk to a networked system and then compares the state after power-off and recovery to what the disk system said it had committed. They are going to make it available to the community as time allows.

      I expect we're going to find the same at Wikipedia. Here's a pretty typical error log, this one from the server which was master database server:

      050222 5:11:12 InnoDB: Database was not shut down normally.
      InnoDB: Starting recovery from log files...
      InnoDB: Starting log scan based on checkpoint at
      InnoDB: log sequence number 303 1283776146
      InnoDB: Doing recovery: scanned up to log sequence number 303 1289018880
      InnoDB: Doing recovery: scanned up to log sequence number 303 1294261760
      InnoDB: Doing recovery: scanned up to log sequence number 303 1299504640
      InnoDB: Doing recovery: scanned up to log sequence number 303 1304747520
      InnoDB: Doing recovery: scanned up to log sequence number 303 1309990400
      InnoDB: Doing recovery: scanned up to log sequence number 303 1315233280
      InnoDB: Doing recovery: scanned up to log sequence number 303 1320476160
      InnoDB: Doing recovery: scanned up to log sequence number 303 1325719040
      InnoDB: Doing recovery: scanned up to log sequence number 303 1330961920
      InnoDB: Doing recovery: scanned up to log sequence number 303 1336204800
      InnoDB: Doing recovery: scanned up to log sequence number 303 1341447680
      InnoDB: Doing recovery: scanned up to log sequence number 303 1346690560
      InnoDB: Doing recovery: scanned up to log sequence number 303 1347688389
      InnoDB: 1 transaction(s) which must be rolled back or cleaned up
      InnoDB: in total 14 row operations to undo
      InnoDB: Trx id counter is 1 935480064
      050222 5:11:13 InnoDB: Starting an apply batch of log records to the database...
      InnoDB: Progress in percents: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 InnoDB: Database page corruption on disk or a failed
      InnoDB: file read of page 8617985.
      InnoDB: You may have to recover from a backup.
      050222 5:12:20 InnoDB: Page dump in ascii and hex (16384 bytes):

      Observe that the database engine went back to its last checkpoint, noticed the partial transaction and undid it and was rolling ahead in the write-ahead log when it encountered a database page which failed its checksum test. That failed checksum test is why I think it's a problem with the disk system lying about what was written. You can get that when a database page spans two drives in a stripe set and one has committed the update while the other hasn't.
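
      To illustrate how a page checksum catches that kind of torn write, here is a small Python sketch. It is not InnoDB's real page format or checksum algorithm: the layout (a 4-byte CRC32 in front of the body), the even split of a page across two drives, and the function names are all assumptions for illustration. Only the 16384-byte page size is taken from the log above.

```python
import zlib

PAGE_SIZE = 16384        # page size reported in the error log
HALF = PAGE_SIZE // 2    # assume the stripe boundary falls mid-page


def make_page(body: bytes) -> bytes:
    """Build a page: 4-byte CRC32 of the body, then the zero-padded body."""
    assert len(body) <= PAGE_SIZE - 4
    body = body.ljust(PAGE_SIZE - 4, b"\x00")
    return zlib.crc32(body).to_bytes(4, "little") + body


def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    stored = int.from_bytes(page[:4], "little")
    return stored == zlib.crc32(page[4:])


if __name__ == "__main__":
    old = make_page(b"old row data " * 1000)
    new = make_page(b"new row data " * 1000)
    # Drive 1 committed its half of the new page; drive 2 still holds the old half.
    torn = new[:HALF] + old[HALF:]
    print(page_is_valid(old), page_is_valid(new), page_is_valid(torn))
```

      Both complete versions of the page verify, while the half-old, half-new page does not, which is the sort of failure the log reports for page 8617985.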

      In more typical situations MySQL simply applies the updates and all is well. I've had a server set up to exceed RAM with swap turned off and get killed every ten minutes for hours and recover every time.

      Just to be complete:
      • The database servers have dual redundant power supplies. TWO breakers at the colo tripped, taking out both.
      • The systems are a mix of SCSI and SATA, so no point in arguing about one being lousy. SATA and Linux win if you want a winner: it was a SATA box using Linux RAID 0 which completed full recovery. It wasn't one of the normal servers - it was used for backup and offline report generation.
      • Two different disk controller makers, one each for SCSI and SATA.
      • Battery backed up write cache on most of the main server disk controllers but the one without the battery backup for the write cache had the same problem (which shouldn't surprise anyone - that one should be expected not to recover well).
      • After LJ's experience we were after UPS systems in the racks but hadn't yet checked whether the local fire code allows them. Some don't, for el
    43. Re:mysql bad at disaster recovery? by neitzsche · · Score: 1

      Clue me please: how does one tell (on a running system, long after power on self test) that a DIMM has gone faulty?

      --
      "God is dead." - Frederik Nietzsche
    44. Re:mysql bad at disaster recovery? by Cramer · · Score: 3, Informative

      You misread the comment (and again, know little about wiki's setup). The point is, there aren't any Microsoft Windows boxes in the cluster. And I don't expect the Wikimedia Foundation to approve the cash outlay for buying windows and mssql licenses for the number of systems currently serving as database engines. Plus there's the complexity of those boxes not just being db servers, and the fact that none of the admins are anywhere near the actual hardware -- remote management of windows boxes is not something I would recommend (and not something easily done via ssh.)

      MySQL is not inferior in all possible characteristics... MSSQL is a windows only product. I cannot run it on Solaris, OS X, AIX, Tru64, linux, etc. It, thus, loses on that characteristic. Wiki is not a windows shop, so stop wasting your time suggesting a windows product. The cluster is running linux. Bring linux software to the table and we'll talk. Wiki uses mysql because it's free and it's fast. My suggestion would be Oracle, but it's most certainly not cheap or free.

    45. Re:mysql bad at disaster recovery? by GermsFromSpace · · Score: 1

      lol ... I don't pretend to understand disk technology down to the electron level but ... I feel I can safely say that if you were to say, run a magnet over the surface of one of the disk's platters then any database would have a problem. This is obviously an extreme example but my point is that hardware trumps software. Does anyone know *exactly* what happened physically inside the server when the power went off? (uber h/w geeks please don't waste your time) The correct answer is 'NO'. Therefore to start making generalisations about the robustness of the db s/w used in the server (much less about db software not used in the server) seems presumptuous at best.

      As for the parent of this message ... Sure, you can implement large scale db solutions in MSSQL. Just spend a significant amount of money on the db software. Have some dork who's devoted his life to M$ and has the time/inclination to decipher the inner workings of the bloated, over designed, over complicated, La Brea Tar Pit syndrome 'software' (btw, if u missed a classical computer education try 'The Mythical Man Month' by Frederick Brooks for a thorough description of the problem with 90% of M$ s/w and the clue to my tar pit ref. above) Then even with said dork, something doesn't work right no matter what he tries so he calls in M$ support at $250/hr. 3 weeks later they get it all working and just to end on a positive note, I'll say that it works without a hitch for the next 6 months :-P

      ... and btw, I've used many M$ products extensively, mostly out of necessity (i.e. market or situation driven) so I'm not just talking out of my exit.

      The reason we like MySQL in particular (and many other O/S solutions in general) is because generally they are:

      1. Free
      2. Easy to understand
      3. Easy to use
      4. Elegant solutions
      5. supported by a large group of knowledgeable, enthusiastic users

      Oh, and my last point about MSSQL ... it's always cool if dbs are fast. MySQL -- fast. MSSQL -- not so much. Once again, refer to the book mentioned above ...

    46. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      Get some sleep Jimbo. My therapist is hopeful Wiktionary will stay down for another day, so that my withdrawal episode will culminate in voluntary commitment to the local psychiatric hospital. Thorazine, here I come.

      [Mods, parent post is the first informative post in entire thread. Please mod up.]

    47. Re:mysql bad at disaster recovery? by jdavidb · · Score: 2, Informative

      Unfortunately for webhosting the demand for MySQL is higher than for the other available DBMS's, since most available open source software and gratis software that requires a database is going to have been developed originally with MySQL. I would much prefer to be using PostgreSQL for the applications I run with a hosting provider, but the apps I use don't function with it, and the hosting provider (NearlyFreeSpeech.net) doesn't offer anything else, anyway.

      I figure once the advantages of the other DBMS systems become more apparent (and enough disaster stories happen to highlight the advantages) the apps will begin to offer and improve support for PostgreSQL and others, and then there will be a demand for them and some hosting providers will begin to offer them. I do understand that PostgreSQL consumes a lot more resources than MySQL, though, so it will not be cheap.

      You get what you pay for. How much is reliability worth?

    48. Re:mysql bad at disaster recovery? by defile · · Score: 2, Interesting

      No. It can't. We have two concrete examples in this very page - one provided by Wikimedia, one provided by me - which directly contradict your statement. Maybe under some circumstances MySQL can handle reboots, but it's been proven already that it can't always do so. Perhaps your MySQL experience is not with high-load applications (at least not the level of load Wikimedia and my employer see).

      I don't mean to diminish what you guys do, or question your abilities. I simply want to offer my perspective because I've been in similar situations.

      I ran an extremely high load site off of MySQL for about 4 years. It started out modestly and went up to around 2000-3000 queries/sec hitting the RDBMS, about 30% of them data updates.

      There were genuine cases where MySQL annoyed the hell out of me.

      For example, the use of pthreads is a huge pain in the ass under Linux because all of the thread stacks share address space. On a 32-bit platform with a large InnoDB buffer pool, it's easy to run out of pointer bits. Once we switched to a replicated setup this wasn't such a big deal anymore. Moving to a 64-bit platform would've made this a non-issue too, but we didn't have the luxury of doing this at the time.

      Regarding data corruption? Every time I've blamed MySQL 4 + InnoDB for ruining my data, I've been too soon to do so. MySQL is often the messenger of a real underlying problem.

      What underlying problem? Well, OS, disk, or RAM for starters. And these aren't always easy to find.

      I went to enormous lengths to verify that all of those things were working properly so I could blame MySQL. Kernel updates, memtest86, I even ran (VA-)CTCS on it for a week and the machine showed no problems. But every time we'd bring MySQL up on it we'd encounter data corruption within 20 days or so.

      The site continued to run off of the remaining servers and we just hoped MySQL wouldn't "corrupt" those as well while we tried to figure out what the problem was.

      Anyway, one day a few weeks later someone was playing around with the retired machine and found that 1 time out of 20, on boot it wouldn't find one of the hard disks. Oh, and this disk just happened to be used in the MySQL data partition. We replaced the disk container and the machine hasn't had a problem since and runs in production.

      We should've just thrown the fucker away.

      I've wasted a lot of time because I had more confidence in machines than I should've. Most computers have terrible reliability, even the ones that are marketed as being reliable. Most people just don't notice. MySQL notices. :(

    49. Re:mysql bad at disaster recovery? by Pakaran2 · · Score: 1

      Are you or are you not aware that Wikimedia has a policy of only running free software on the servers? Actually the Kate's Tools were offline for a while largely because they didn't want to install Java (and I know there are free software JVM's, don't know what the situation was with using those).

      Also I believe a fair amount of the wiki code is MySQL specific.

    50. Re:mysql bad at disaster recovery? by Pakaran2 · · Score: 1

      I know Oracle does well on recovery, but what about its redundant apostrophe extraction capabilities?

    51. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      It tells you -- unless you're using cheap consumer toys, in which case you've obviously decided that reliability isn't important anyway.

    52. Re:mysql bad at disaster recovery? by sloth+jr · · Score: 1

      Any of the above databases will suffer the same problems if write-cache on the disks used for storage is turned on.

    53. Re:mysql bad at disaster recovery? by normal_guy · · Score: 1

      My point was that you're at the wrong colo if somebody tripping on an extension cord pulls down your site.

      --

      Linux: Free if your time is worthless.
    54. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      Huh,

      WAL is the generic concept which most databases follow, including MySQL.

    55. Re:mysql bad at disaster recovery? by Jamesday · · Score: 1

      The most recent case was a replacement for search where the prototype was in use and doing the same work on a server costing about $2,000 that two and a bit database servers costing $12-14,000 were doing. Java tool. Compatible with the latest java standard. Not allowed to run it apparently because the GNU JVM isn't compatible with the latest standards and at least one board member wants to use the GNU one, instead of using the Sun JVM until it catches up or the search engine gets further along in the prototype stage and gets to the point of doing compatibility work.

    56. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      I'd never have considered donating to Wikipedia before (even though I use it all the time), but...

      Since MySQL doesn't appear to be up to snuff, can you put a price tag on migration to, say, PostgreSQL? If there was a continuously-running "Wikipedia PostgreSQL fund", I'd plop some money in it.

      Wikipedia being slow or serving up bad pages is no big deal to me -- lots of web servers do that already -- but being down for a day while they play with the database isn't fun.

    57. Re:mysql bad at disaster recovery? by sirsnork · · Score: 1

      Decent server hardware will tell you via its management software, usually long before it causes the system to crash. Also, if you have chipkill memory installed, it will turn off the faulty chip on the memory so you can continue to run until you can organise a replacement piece and schedule some downtime.

      --

      Normal people worry me!
    58. Re:mysql bad at disaster recovery? by Jamesday · · Score: 1

      The only one not using a fancy caching controller. Comparatively plain Linux software RAID 0 in a box used for last resort recovery and report generation, which never sees normal end user queries. Usual pattern for this box is to be running some heavy reports for 6-12 hours then catching up in replication. Repeat daily. Since it was current, it was online and replicating at the time of the power loss. If it hadn't been actively replicating/writing we'd have had some logs from the master to replay into it to get it current. It's typically doing 25-60 queries per second when it's catching up in replication, limited by disk speed, and it was probably doing that at the time of the power loss.

    59. Re:mysql bad at disaster recovery? by Jamesday · · Score: 1

      At Wikimedia we're dealing with north of 180GB of data, though we're cutting that with better compression. Rsync from slave to slave on gigabit ethernet runs at anything from 20-40 megabytes per second while the network's servers are pushing out to the internet 80 or so megabits/s and doing about twice that on the internal connections. I expect it to take me 90-120 minutes to copy across all 180GB. Recovery of this sort was one of the reasons we switched from 100 megabit to gigabit for our internal network. We've benefitted in routine operations but this is the first time the investment has paid off in a major problem situation.
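      For anyone checking the estimate, here is a rough sketch of the arithmetic, assuming 1 GB = 1000 MB and a steady rate for the whole copy (the function name is made up for illustration):

```python
def copy_minutes(size_gb, rate_mb_per_s):
    """Minutes needed to copy size_gb gigabytes at a sustained rate in megabytes/s.

    Assumes 1 GB = 1000 MB and ignores startup cost and rate fluctuation.
    """
    seconds = size_gb * 1000 / rate_mb_per_s
    return seconds / 60


# The two ends of the 20-40 megabytes per second range quoted above.
for rate in (20, 40):
    print(f"{rate} MB/s -> {copy_minutes(180, rate):.0f} minutes")
```

      That gives 150 minutes at the slow end and 75 at the fast end, so the 90-120 minute estimate sits inside the range.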

      I'm curious about the nature of your problem - what's happening or not happening when replication stops, what error messages, if any, what happens when you do stop slave, start slave or a reset slave. Whether it's all slaves at the same time or only one of them.

      Wikimedia does use bots for quite a few things. One box is suffering from an FC2-related replication glitch and a script corrects that automatically. Same script automatically kills unacceptably slow queries but since we control everything we can set up rules for that which we know are sufficiently safe for the situation. Harder for your case, perhaps impossible without getting really upset customers.
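      Not the actual Wikimedia script, just a minimal sketch of what a slow query rule could look like. It assumes rows shaped like MySQL's SHOW PROCESSLIST output (Id, User, Command, Time); the 30 second limit, the protected user list and the function name are illustrative assumptions:

```python
def queries_to_kill(processlist, max_seconds=30, protected_users=("system user",)):
    """Return thread ids of queries that have run longer than max_seconds.

    processlist is a list of dicts with the keys Id, User, Command and Time,
    mirroring columns of SHOW PROCESSLIST. Threads owned by protected_users
    (for example replication threads) are never selected.
    """
    return [
        row["Id"]
        for row in processlist
        if row["Command"] == "Query"
        and row["Time"] > max_seconds
        and row["User"] not in protected_users
    ]


# Hypothetical snapshot: one slow user query, one fast one, one replication thread.
snapshot = [
    {"Id": 12, "User": "wikiuser", "Command": "Query", "Time": 95},
    {"Id": 13, "User": "wikiuser", "Command": "Query", "Time": 2},
    {"Id": 14, "User": "system user", "Command": "Connect", "Time": 4000},
]
print(queries_to_kill(snapshot))
```

      Each id returned would then be passed to KILL. Whether a fixed time limit is safe depends entirely on the workload, which is the point about controlling everything yourself.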

    60. Re:mysql bad at disaster recovery? by defile · · Score: 1

      The telehouse facility stayed up during 9/11 while World Trade Plaza was crumbling to the ground 4 blocks away. If they can stay up through that, everything that calls itself a colo should be able to.

      Well, it did go down a few days later because their UPS ran out of fuel, but it was because of political issues. The national guard wouldn't let the refueling truck into the area. I will admit that having your colo quarantined by a military force is an exceptional circumstance.

    61. Re:mysql bad at disaster recovery? by infra-red · · Score: 1

      Skipping over the database issues, I'm really wondering why two breakers would flip off at the same time. I suspect they have overloaded the circuits.

      I know with the servers at work, we are very careful to not go past 50% load on any one UPS since we split the Dual PS servers over them. If one goes down, it draws more power across the other. If it draws too much, the second UPS will fail too.
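      A minimal sketch of that 50% rule, assuming two UPSes of equal capacity and that the whole load of a failed unit shifts onto the survivor (the function name and wattages are made up for illustration):

```python
def survives_single_ups_failure(load_a_watts, load_b_watts, capacity_watts):
    """True if one UPS alone can carry the combined load of both.

    Assumes both units have the same capacity and that dual power supply
    servers move their full draw to the surviving unit.
    """
    return load_a_watts + load_b_watts <= capacity_watts


# Two 3000 W units at under 50% each: the survivor can take everything.
print(survives_single_ups_failure(1400, 1400, 3000))
# Two 3000 W units at 60% each: losing one overloads the other.
print(survives_single_ups_failure(1800, 1800, 3000))
```

      The same check applies to a pair of breakers feeding dual power supplies.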

      Also, wouldn't the raid controllers have battery protected write ahead cache? Even if the power was down suddenly, I would expect that the writes would still be completed to disk. Maybe I'm being naive with this expectation or I'm not fully comprehending the problem.

    62. Re:mysql bad at disaster recovery? by Anonymous Coward · · Score: 0

      This guy has never heard of consistency checks.. Whatever, just another dumbass "3v3r17h1ng M$ 5uXX!!!!" zealot, with a stupid posting history and no basic education at all... Screw you, kiddie. =]

    63. Re:mysql bad at disaster recovery? by Jamesday · · Score: 1

      You're not the only person wondering about the breaker issue. Last year we had an issue with an overloaded circuit which killed power to a rack so it's an issue we'll be discussing further. This and the things like fire and hurricane are part of why we're heading for more sites - we don't want problems at one place taking us down.

      Four of the five RAID controllers have battery backed up write cache. Two controller brands.

      There's an issue where Linux won't flush. There's the possibility that the controller didn't remember or didn't flush. There's the possibility that the drives were write caching. Or all of those at once may have happened. Your expectation matches mine. It still didn't happen.

      Since there have been three recent prominent cases where it didn't happen on multiple systems from multiple vendors it's likely to be very clear to MySQL that real installations do have this as a problem and they should be addressing it. Will be interesting to see what they come up with.

      So far two of the RAID systems have continuing problems which are keeping them from going back into service. One is rebuilding (which suggests that there was drive caching without flushing on command as an issue there). One was giving odd read results, so we're testing it and doing some planned changing of the RAID setup from 4 in 10 and 2 in 1 to 6 in 10 before we try putting it back into service.

      Databases aren't the current performance issue though - half of the Squid cache servers we were using in Florida have continuing problems. Hadn't quite finished setting up 6 new ones before the power problem.

      Excellent news for me is that we're at 61% of the $75,000 fundraising target already. If that continues and we go well over the target that makes it less necessary for me to be conservative in spending, so I can suggest more redundancy and such. My initial thought on a target was $100,000 and it's not a problem to spend all we get on speed and reliability things.

      Hopefully the promising fundraising results so far and our intent to continue operating at least one site from donations will reassure anyone concerned about us becoming too dependent on any single donor. Thanks to those who have donated!

  11. Not exactly by Man+in+Spandex · · Score: 1

    Wikimedia/pedia don't run after Homer

    Homer: Me Homer, I'm running from PBS.

  12. Power outages suck. by goofyheadedpunk · · Score: 2, Interesting

    Power outages suck, and a great way to protect from them is to distribute your project over a large area of electrical service.

    I know the wikimedia folks are fundraising for more servers, but I wonder if this will provide more incentive to accept Google's offer?

    --

    What if the entire Universe were a chrooted environment with everything symlinked from the host?
    1. Re:Power outages suck. by Anonymous Coward · · Score: 0

      If you know something about "Google's offer", you know more than most people, including the upper ranks of the Wikimedia Foundation. There are no offers, only ongoing negotiations and rumors.

  13. More information here... by Anonymous Coward · · Score: 5, Funny

    I found this useful information about power outages:
    http://www.wikipedia.org/search?/power_outage

    1. Re:More information here... by daveo0331 · · Score: 4, Funny

      Unfortunately that site you linked to appears to be slashdotted, or something. Here's a mirror:

      http://www.answers.com/topic/power-outage-1

      --
      Remember the days when Republicans were the party of fiscal responsibility?
    2. Re:More information here... by WhatAmIDoingHere · · Score: 1

      I hope you're kidding.

      --
      Not a Twitter sockpuppet... but I wish I was.
    3. Re:More information here... by Anonymous Coward · · Score: 0

      -1: Dipshit.

    4. Re:More information here... by Satertek · · Score: 1

      We must have slashdotted the power grid!

    5. Re:More information here... by Angry+Black+Man · · Score: 1

      maybe they should get one of these

      --
      the byproduct of years of oppression by the white man
  14. Join me, my friends! by mctk · · Score: 4, Funny
    --
    Paul Grosfield - the quicker picker upper.
    1. Re:Join me, my friends! by bradkittenbrink · · Score: 1

      you wrote this page, didn't you?

  15. Re:Shooting pains in my left arm by nsaneinside · · Score: 1

    Sorry, dude. You missed the LJ outage. This was the one where you should've posted some misleading-but-legitimate-looking "information."

  16. Oh, great... by ral315 · · Score: 3, Funny

    Even when the servers go back on, they'll be slashdotted.

    1. Re:Oh, great... by Anonymous Coward · · Score: 0

      slashdot effect does not impact wikimedia server.

    2. Re:Oh, great... by novakyu · · Score: 1
      slashdot effect does not impact wikimedia server.

      What, because it has +5 immunity?

      Seriously, though, wouldn't the traffic be slightly higher than normal---and the server might buckle...?

    3. Re:Oh, great... by slavemowgli · · Score: 1

      That is nothing you should worry about - Wikipedia's traffic is much, much higher than that of slashdot, so the idea that Wikipedia could be slashdotted is just as nonsensical as slashdotting, say, Google.

      --
      quidquid latine dictum sit altum videtur.
    4. Re:Oh, great... by Anonymous Coward · · Score: 0

      use google cache then

    5. Re:Oh, great... by arafel · · Score: 1

      Wikipedia's normal traffic is about 5 times higher than that of Slashdot. I don't think it should be a problem...

      A "Wikipedia effect" could probably bring down most servers around, though. :-)

  17. The Google Connection by lakerdonald · · Score: 0
    They should have been quicker implementing the new servers and bandwidth provided by Google.

    Google's like them nazi's:

    If you don't join their party, they'll come get you!

  18. There's a lesson to be learned here by Raul654 · · Score: 3, Funny

    As that economic genius, Eric Cartman taught us:

    1) Get something other people love
    2) Don't let them use it
    3) Profit!

    It doesn't hurt if you are running a fund drive at the same time, either.

    --


    To make laws that man cannot, and will not obey, serves to bring all law into contempt.
    --E.C. Stanton
    1. Re:There's a lesson to be learned here by SCVirus · · Score: 1

      The NHL's trying that.

    2. Re:There's a lesson to be learned here by David+Gerard · · Score: 1

      -1, Troll. ;-p

      --
      http://rocknerd.co.uk
  19. Aaaaand... by Faust7 · · Score: 4, Funny

    Meanwhile, the devs are working fairly furiously to get it back up

    Don't worry, we'll take care of your backup servers in the meantime. ;)

  20. Stupid question... by isny · · Score: 1

    Why didn't their servers have a UPS? If the power was down for only a few minutes, it wouldn't have been such a big deal.

    1. Re:Stupid question... by ScrewMaster · · Score: 2, Insightful

      Something still doesn't add up. Even if a backup generator autostarts successfully, there's a significant delay between mains failure, switchover, and the generator picking up the load. That's usually a few seconds or more, too long for a computer to run off the residual charge in its power supply filter caps. There would still have been an inverter-charger somewhere to keep the equipment running until the generator was fired up. Sounds like somebody screwed up, either by tripping the wrong breaker, or by designing the facility improperly to begin with.

      --
      The higher the technology, the sharper that two-edged sword.
    2. Re:Stupid question... by Carnildo · · Score: 2, Informative

      Fire code. When someone hits the Big Red Button, all electrical power in the server room must be out. Therefore, UPSs can't be located in server racks (or if they are, you need to go to the effort of wiring them into the BRB).

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    3. Re:Stupid question... by smcallah · · Score: 1

      PDU's in a data center are fed from the building UPS. You don't plug a UPS into a UPS, that's just pointless, and could actually cause an outage when there wouldn't otherwise be one. Yes, it would have stopped this outage. But imagine, you have a UPS in your rack. That UPS fails. But the entire datacenter is still running just fine, but your rack lost power because you decided it was a great idea to put in a UPS for "redundancy." Job = Lost. Dual power supplied servers are what would have helped here, not more UPS's. Generally, a real data center provides redundant power feeds served from 2 different PDU's which are each served by a different UPS. Which ideally should be served by 2 generators. But when a breaker is flipped, your only safety net is dual power supplies.

    4. Re:Stupid question... by Anonymous Coward · · Score: 0

      Yes, this thing you refer to, it's called a UPS.

      However, the building generally has a very large UPS that can back up the entire floor. The breaker that fails in this situation will be after the UPS, bringing down your gear.

    5. Re:Stupid question... by flyingsquid · · Score: 1
      I'm actually kind of a computer moron, but I'm a bio geek and I was reading some stuff about DNA. DNA can get damaged in a lot of ways. Mutations can get introduced on one of the strands of the helix, strands get broken, bases might match up wrong between the strands, and so forth. So how does life deal with this to achieve a very low error rate of one error per billion base pairs copied?

      The interesting thing is that it's not so much that DNA is really well-protected from abuse (though that's part of it), it's that the cells have really sophisticated tools for repairing this damage. Many biological systems take failure as a foregone conclusion, but have evolved some extremely sophisticated ways of functioning while damaged and rapidly repairing them and getting back up to speed (Hell, I broke a toe and hiked down and up the Grand Canyon while it was healing).

      Anyhow, there was an article in Scientific American a while back which argued for the same concept for computers. It said that reducing the failure rate wouldn't do nearly as much to reduce total downtime as making systems able to rapidly bounce back from failures.

    6. Re:Stupid question... by almightyjustin · · Score: 0, Redundant

      They did use dual power supplies, both circuits were shut off.

      --

      Omnes arx vestrum sunt adiuncta nobis.

    7. Re:Stupid question... by normal_guy · · Score: 1

      Their post says that both circuits in a dual power supply system failed. My crappy little colo locker has a 10min. UPS that draws power from the colo. One plug of the dual power supply is plugged into it, the other directly into the colo's generator-backed line. That way even if it takes them 10 minutes to get the thing started, we're good - and if the UPS fails randomly we're still good.

      --

      Linux: Free if your time is worthless.
    8. Re:Stupid question... by Anonymous Coward · · Score: 0

      And that says to me "Somebody done fucked up."

  21. Re:GET SOME PRIORITIES! by Anonymous Coward · · Score: 0, Offtopic
  22. Re:No!!! by Anonymous Coward · · Score: 0

    If you don't want your random nonsense modded down or deleted, post it on Everything2.com.

  23. ETA for read only service is now 2-4 hours. by Jamesday · · Score: 5, Informative

    So far one of our database servers has completed a successful recovery (we're working through them all). On a gigabit link it takes something between 90 minutes and 4 hours to rsync from one to another. As soon as we have two database servers working, we'll be restoring service in read only mode. Likely to be that 90 minutes to 4 hours from now as worst case.

    I'll post followups to this post later, as we're closer to being fully recovered.

    1. Re:ETA for read only service is now 2-4 hours. by Jamesday · · Score: 4, Informative

      May be longer so I withdraw that time estimate.

    2. Re:ETA for read only service is now 2-4 hours. by Anonymous Coward · · Score: 0

      I know this is the last thing on your mind, but I'm sure many people would be interested in the gory details of the restore, post sleep. Disasters suck, disaster recovery sucks as much, but learning from them is at least one thing you can gain.

    3. Re:ETA for read only service is now 2-4 hours. by strider44 · · Score: 1

      I can't wait. How come it's only when the site is down that you nearly instantaneously find ten things to look up on it?

    4. Re:ETA for read only service is now 2-4 hours. by sumbry · · Score: 1

      A lesson I learned a long time ago, always put the database servers on UPSes. Do so in a way that if primary power is ever lost, the servers shut themselves down and basically wait for human intervention (basically because if you reach this point, something horrible must have happened).

      Power is never supposed to go out in datacenters, you figure you're paying the money for it, but it always does... it's never just the power company.. always some combination of generator or UPSes being overloaded, circuit breakers being overloaded.. name any of 100 reasons. Power needs for server farms are such a complex thing that this is basically inevitable.

      Anyways, the bottom line is that your time (or the downtime) is worth way more than the price of a few extra UPSes (and some serial/usb cables to talk to the db servers and inform 'em to shutdown).

      In reality the DB boxes are the only things that you really have to consider this for. Outages across anything else you can deal with, but once the databases die you're pretty much looking at spending the night (or next coupla nights) in the datacenter.

      Been there, done that. :)

    5. Re:ETA for read only service is now 2-4 hours. by Anonymous Coward · · Score: 0

      So uninformed and condescending! It must be a junior high student in his school's computer lab!

    6. Re:ETA for read only service is now 2-4 hours. by Skapare · · Score: 1

      Since I have no idea how much data is involved, I can't say if this is an expected performance level of rsync or not. But I have noted that over the years of using rsync, specific parameters can be adjusted depending on the given circumstances to optimize the performance of rsync. In many cases it depends on the data content format being understood. And sometimes preparing certain seed data ahead of time can optimize the transfer as well, given the way rsync works. One example was when I needed to transfer a CD ISO image of a bootable system across a V.34 modem link. I already had most of the files that were on that ISO on the target system. I didn't have the tools handy to build the ISO directly on the target system, so what I did was just build a .tar file and set it up as the target file. Instead of 20 hours upload time, it went in about 25 minutes.

      Careful decisions with parameters like -W and -B can do well to refine the performance. And if there is no data ready on the target at all, piping a compressed tarball across works faster than rsync.

      Also, rsync can perform really badly with lots of files (millions) since it scans the file tree first, and does so on source and target without overlapping them in time (bad design there IMHO). In such cases, working with subdirectories separately can speed things up.

      --
      now we need to go OSS in diesel cars
    7. Re:ETA for read only service is now 2-4 hours. by imsabbel · · Score: 1

      they have a database, not tons of little files.
      And the current amount should be >100GB including the old revisions, so even over gigabit it should take some time

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    8. Re:ETA for read only service is now 2-4 hours. by Jamesday · · Score: 1

      Gory details will be provided.

      Suspect right now is the same thing LiveJournal found: disk systems lying about having committed writes. LJ has developed a testing tool for that, which has the writing system tell a networked remote system what the disk system says it's written, then compares that to the state after the power off and on to find out if it lied. The system which did a full recovery doesn't have a fancy caching disk controller, just Linux RAID 0. Which worked nicely.
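      This is not the LiveJournal tool itself, just a minimal sketch of the idea as described above. The writer flushes each record and only then counts it as acknowledged; after the power cycle, anything acknowledged but missing from disk means the disk system lied. In the real test the acknowledged list lives on a remote networked machine; here a plain local list stands in for it, and all names are illustrative assumptions:

```python
import os
import tempfile


def durable_append(path, seq):
    """Append one sequence number and ask the OS to flush it to stable storage."""
    with open(path, "ab") as f:
        f.write(f"{seq}\n".encode())
        f.flush()
        os.fsync(f.fileno())  # returns once the disk system claims the write is done


def read_surviving(path):
    """Return the set of sequence numbers present in the file after power-on.

    A torn final record could parse as a smaller number; a real tool would use
    fixed-width or checksummed records to rule that out.
    """
    if not os.path.exists(path):
        return set()
    with open(path, "rb") as f:
        return {int(tok) for tok in f.read().split() if tok.isdigit()}


def lost_acknowledged_writes(acknowledged, surviving):
    """Sequence numbers the disk system confirmed but that are not on disk."""
    return sorted(set(acknowledged) - set(surviving))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "writes.log")
        for seq in (1, 2, 3):
            durable_append(path, seq)
        # Pretend the remote witness also heard about write 4, which never landed.
        acknowledged = [1, 2, 3, 4]
        print(lost_acknowledged_writes(acknowledged, read_surviving(path)))  # prints [4]
```

      An empty result after pulling the plug mid-run is what an honest disk system should give.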

    9. Re:ETA for read only service is now 2-4 hours. by jdavidb · · Score: 1

      In case nobody else says it: thank you.

    10. Re:ETA for read only service is now 2-4 hours. by Jamesday · · Score: 1

      Thanks. Don't thank too much though: remember that the technical team is supposed to be preventing this, not recovering from it. But recovering beats not. :)

      Looks as though we're still 4-6 hours away from being read-write again. Catching up with the lag on the one we're restoring from is going fine, just takes a while. Seems very unlikely that we've lost any significant amount of data - of the order of fractions of a second to a second's worth just before the failure is most likely.

    11. Re:ETA for read only service is now 2-4 hours. by sumbry · · Score: 1

      nope, just an admin who's learned that the simple solutions are often the best ones.

      and if you think that post was uninformed, then you've obviously never spent the night in a datacenter bringing up a failed db server because of something that the data center techs told you never would happen - and it did anyways.

    12. Re:ETA for read only service is now 2-4 hours. by Skapare · · Score: 1

      If re-syncing an existing file, the time should vary depending on factors like the amount of difference that exists, and the rsync blocksize in use. If that's one single massive 100 GB file, the default rsync blocksize might be a terrible choice just because rsync will end up spending a lot of time scanning for each block match. If the granularity of the database is still small (e.g. units of difference won't get matched with a larger blocksize) then rsync may not even be a very good choice.

      If the file is being re-transferred in whole, rsync is a bad choice, even if the -W option is used.

      Don't get me wrong ... I love rsync. I use it all over the place. But it has limitations which can be stretched a good ways with careful choices. A single 100 GB file is probably a bit far. I hope they used the DB engine (I forget its name) that can store that on a RAID partition instead of in a filesystem.

      --
      now we need to go OSS in diesel cars
    13. Re:ETA for read only service is now 2-4 hours. by stephenbooth · · Score: 1

      Yep, I've done that. I've also spent a weekend recovering an Oracle database from multiple backups. Because of idiocy in management and stupidity amongst the ops we didn't have a single backup in nearly a month that had all files backed up correctly, fortunately by combining backups I could get all the files back but at different timestamps and a complete transaction log from before the earliest datafile up to the time of the failure (the failure being a sysadmin who thought he knew what he was doing going in and dropping the wrong database (a database with a different name, different password and on a different machine) then removing the filesystems the datafiles were in, all 'accidentally'). This meant that I could roll the files forward to just before the 'failure'.

      Stephen

      --
      "Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
  24. Backup power supply? by adeydas · · Score: 1

    I remember once my mail service provider went offline too a year or so back due to power failure but fortunately they had diesel generators for backup power. Doesn't Wikimedia have the same facility?

    1. Re:Backup power supply? by Anonymous Coward · · Score: 0

      If it's anything like the data center I work in, the UPS is inline with the power to the machines, e.g. 480 3 phase -> UPS -> breaker panel -> 110 run under the false floor. Sadly, the only problem with this is that a breaker can still trip and take everything on that circuit off, which is what I'm guessing actually happened.

    2. Re:Backup power supply? by brion · · Score: 3, Informative

      The colocation facility has diesel generators to protect against the outside power going out. Thanks to the miracle of circuit breakers, power circuits inside the facility shut off (including both circuits feeding our dual-power supply machines).

      --

      Chu vi parolas Vikipedion?

  25. YHBT. YHL. HAND. by Anonymous Coward · · Score: 1

    The only thing worse than somebody faking a heart attack on slashdot is someone else BELIEVING HIM!!

    1. Re:YHBT. YHL. HAND. by Phil+Karn · · Score: 1
      Maybe it was fake, maybe it was real. After an offline conversation, which you haven't seen, I think it was more likely real than not.

      But let's say it wasn't. Why in hell should that bother me? Are you really saying that it's better to let someone die than to take the horrible risk of possibly being thought a fool by someone whose opinion I couldn't care less about? If so, I think you should take some time to carefully re-examine your values about what's important in life.

      Last fall I lost a high school friend to a heart attack at age 50. His father had also died young, and my friend had already had a prior heart attack and was grossly overweight. Despite these risk factors and several hours of clear warning signs in the presence of friends who repeatedly offered help, he turned them down, went home and died. I've heard similar stories from a paramedic friend who had seen her share of people in clear cardiac distress denying that they had any problem at all -- most of whom later died. So I'm well aware of how important it can be to light a fire under someone who thinks they might be having a heart attack, and I don't see any reason to feel otherwise.

      Now go away.

  26. Distributed Wikipedia? by femto · · Score: 2, Interesting
    Isn't raising money for servers a short term solution? Surely the real solution is to invest time and effort into finding a way to distribute wikipedia across the 'net?

    Google seems to have succeeded in building a distributed platform. What about something similar to seti@home, which takes a chunk of each user's disk space and bandwidth and uses them to implement a virtual computer on which wikimedia projects may be run?

    Surely someone is already working on something like this (pointers anyone??)

    1. Re:Distributed Wikipedia? by midom · · Score: 3, Insightful
      Well, distributing a wiki is a task a bit more complex than distributing a search index (async!) or seti@home (async). You don't care in async data arrays whether the packet you sent to some node is an hour or a day old. You care about that in a wiki, because every user will be pressing the 'edit' button, and data should be consistent everywhere. We are working on distribution.
      • Distributed caches - now majority of hits are served by caches, and some of them are offsite. It was a pilot project for a while and now we're trying to design and build scalable infrastructure for that. But still, lots of edits are served uncached.
      • Distributed file systems - are there any? NFS is single-server system, MS has something, PVFS has no redundancy, GoogleFS is closed and not released, Coda, AFS, all of those just don't work. Right now we're trying to develop MogileFS (the perl-based app-level file storage by LiveJournal) store and sure there are other ideas.
      • Distributed database - there are no proper large database multimaster opensource solutions. MySQL with replication and transactional data store is used. In this event it would be great to have second datacenter nearby with additional DB replicas and gigabit interconnection, but that costs money. And app-level bidirectional replication is in plans for both MySQL and PostgreSQL. And SAN deployment is too costly.
      And yes, MediaWiki code has PostgreSQL support, but migrating from one database to another without proper tests, benchmarks and insurance isn't very mature.
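To make the consistency point concrete, here is a minimal sketch of a single master with asynchronous replicas. It is a toy model, not Wikimedia's actual setup: the class and method names (ReplicatedStore, replicate, for_edit) are made up for illustration. Reads for plain page views may come from a stale replica, while an edit view has to go to the master.

```python
import random


class ReplicatedStore:
    """Toy model: one master, N asynchronous replicas (all names hypothetical)."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet applied to the replicas

    def write(self, key, value):
        # All edits go to the single master; replicas catch up later.
        self.master[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply the backlog to every replica, in the order it was written.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, key, for_edit=False):
        # An editor must see the latest text, so edit views hit the master.
        if for_edit:
            return self.master.get(key)
        # Ordinary page views can tolerate a slightly stale replica.
        return random.choice(self.replicas).get(key)
```

Until replicate() runs, a plain read can miss the newest edit, which is fine for a search index but not for someone pressing 'edit'.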
    2. Re:Distributed Wikipedia? by InfiniteWisdom · · Score: 2, Insightful

      170GB isn't that big and people routinely run far more critical stuff without any kind of exotic seti@home-like distribution. What's really inexcusable is the fact that a power failure caused database corruption that turned a 2 minute power outage into major downtime.

    3. Re:Distributed Wikipedia? by Anonymous Coward · · Score: 1, Informative

      do a search for P2P Web Cache

    4. Re:Distributed Wikipedia? by SinaSa · · Score: 0, Troll

      Talk is cheap.

      What's inexcusable is YOU calling the work of volunteers to provide the world (literally) with an almost unlimited source of good information for free, inexcusable.

      Why don't you get off your ass and help?

      What am I doing to help you say? Well...they asked me to be rude to the rude people on slashdot... :D

      --
      The last digit of pi is four.
    5. Re:Distributed Wikipedia? by InfiniteWisdom · · Score: 1

      Why don't you get off your ass and help?

      And on what basis do you assume I'm not helping? Like you said: talk is cheap.

    6. Re:Distributed Wikipedia? by Anonymous Coward · · Score: 0

      Some references found through a quick CiteSeer search:

      C. Krick, F. Meyer auf der Heide, H. Räcke, B. Vöcking, M. Westermann: Data Management in Networks: Experimental Evaluation of a Provably Good Strategy. Theory Comput. Syst. 35(2): 217-245 (2002)

      http://citeseer.ist.psu.edu/meyeraufderheide99provably.html

      http://citeseer.csail.mit.edu/krick99data.html

  27. lame quotes rule by mrpuffypants · · Score: 4, Funny

    it's as though 300,000 people cried out and were suddenly silenced ...

    and then somebody diffed the change and made them speak again

    1. Re:lame quotes rule by Anonymous Coward · · Score: 1, Funny

      Guybrush Threepwood was an ordinary, uninspired character in a very boring game.

  28. Mod parent up! by Raul654 · · Score: 1

    Jamesday is wikipedia's chief sysadmin, so his comment is probably one of the most informative ones here

    --
    To make laws that man cannot, and will not obey, serves to bring all law into contempt.
    --E.C. Stanton
  29. URI to the Rescue by Doc+Ruby · · Score: 3, Interesting

    This outage, as well as our beloved slashdotting, is yet another argument for URIs, rather than just URLs. URLs are like IP#s; they're absolute pointers to specific object locations, in terms of the storage/retrieval interface of a single instance. URIs are virtual, like domain names. They are distributed in DNS, a Netwide database, updated for current lookup values for actual retrieval. URLs need the same kind of layer. Of course, some other characteristics of these objects must be reflected in the URI model that are not appropriate to IP#/domain names, like multiple identical copies, or perhaps versions.

    Just caching copies, either actively with a redirection URL, or passively in caching backbone webservers, isn't cutting it. Caching values is always better suited to solving performance problems, while creating its own concurrency and identity problems. Not to mention the publication limits of "opt-in" caches, like Coral or Google, which are an afterthought (and usually unknown) to the published object itself. Google has a huge, high-performance URL lookup system. It's taken quite a bit of value from the Internet, and from all the content creators it rides on to derive all its value. It gives back quite a bit, with its simple, fast, effective interface. Google is perfectly positioned to make its name truly synonymous with an Internet revolution (not just a pinnacle of search evolution) by implementing URIs. If Google let objects get looked up by a URI code as simple as, say, [A-Za-z0-9]+, it could get halfway to its namesake in objects with just 28 "digits"; just 7 digits would cover each object instance in its database right now, dozens of times over. If Google opened up such a URI protocol to anyone on the Web running such a "DIS" server, just like DNS, they could offload much of the work, avoid accusations of trying to "own the Internet", and improve their own service immeasurably, not least by making broken links in their database a quaint old curiosity. Will they rock our world, or will another big player, like Archive.org, do it, before Microsoft, desperate to distinguish MSN Search, ruins it for everyone with some kind of proprietary hack that favors MS objects?
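The "digits" arithmetic above can be checked with a short sketch. The function names are made up for illustration, and the only input is the 62-character alphabet [A-Za-z0-9] from the post. "Halfway to its namesake" is read here as halfway in the exponent, i.e. about 10^50 against a googol's 10^100; that reading is an assumption.

```python
import math

ALPHABET_SIZE = 26 + 26 + 10  # characters matched by [A-Za-z0-9]


def keyspace(length):
    """Number of distinct codes of exactly `length` characters."""
    return ALPHABET_SIZE ** length


def decimal_digits(length):
    """Size of the keyspace expressed as a power of ten (log10)."""
    return length * math.log10(ALPHABET_SIZE)


if __name__ == "__main__":
    print(keyspace(7))         # codes available with 7 characters
    print(decimal_digits(28))  # exponent of ten reached with 28 characters
```

Seven characters give a few trillion codes, and 28 characters land just past 10^50, so both claims hold up as orders of magnitude.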

    --
    make install -not war

    1. Re:URI to the Rescue by Amit+J.+Patel · · Score: 2, Interesting

      URLs contain a domain name. Domain names already provide a level of indirection. Why can't we use that level of indirection for Wikipedia's problem? I don't see what URIs buy us -- if we're already not using the indirection we have, what does a second level give us?

    2. Re:URI to the Rescue by fimbulvetr · · Score: 1

      If you would have used newlines and/or paragraphs I probably would have read it.

    3. Re:URI to the Rescue by Doc+Ruby · · Score: 3, Informative

      Because domain names equate to a single IP# (even if that number changes) - a single instance of the object. A URI is just a unique ID across the whole Net, for an object class, which can have single instances. A good URI scheme will take different states of that class into account, like different versions of the object. Domain names, as implemented in DNS, can't give us the one (URI) to many (instances) we obviously need to support scalability and distributed objects.

      --
      make install -not war

    4. Re:URI to the Rescue by Anonymous Coward · · Score: 0

      I don't believe that is what a URI does at all. You are making up something new, you should give it a new name.

    5. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      If you can't read a 6 and 10 line pair of paragraphs about a simple URI scheme, you're probably not going to be much help getting us from URLs to URIs. Just lay back and enjoy it when it works.

      --
      make install -not war

    6. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      That's because you don't know what a URI is. A URI is a "Uniform Resource Identifier", the first to be used of which is a URL. URIs uniquely identify an Internet object in a single Internet namespace. All the issues I mentioned are URI issues. You're new to the game; you should learn more before criticizing.

      --
      make install -not war

    7. Re:URI to the Rescue by fimbulvetr · · Score: 1

      So it'll all be done for me while I sit back and watch? Nice.

      Back to lifting weights while my girlfriend reads this story to me.

    8. Re:URI to the Rescue by John+Courtland · · Score: 1

      Throw this out next time, will save you trouble: URI from W3C

      --
      Slashdot is proof that Sturgeon's Law applies to mankind.
    9. Re:URI to the Rescue by Anonymous Coward · · Score: 0

      Actually, hostnames (not domain names) equate to IPs. Even still, depending on your DNS setup, you can have one hostname equate to several IPs, handed out on a rotation so that one server doesn't get hammered.

    10. Re:URI to the Rescue by mortonda · · Score: 1
      I guess I still don't understand your point:
      host www.yahoo.com
      www.yahoo.com is an alias for www.yahoo.akadns.net.
      www.yahoo.akadns.net has address 216.109.117.106
      www.yahoo.akadns.net has address 216.109.117.109
      www.yahoo.akadns.net has address 216.109.117.207
      www.yahoo.akadns.net has address 216.109.118.66
      www.yahoo.akadns.net has address 216.109.118.70
      www.yahoo.akadns.net has address 216.109.118.74
      www.yahoo.akadns.net has address 216.109.118.77
      www.yahoo.akadns.net has address 216.109.118.78
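As a rough illustration of what that multi-address answer does, here is a simulation of round-robin DNS. It makes no network calls; rotating_answers is a hypothetical helper, and the assumption that each reply is the record list shifted by one (with the client taking the first entry) is a simplification of real resolver behaviour.

```python
from collections import deque


def rotating_answers(addresses):
    """Yield successive DNS-style answers, rotating the record order each time."""
    records = deque(addresses)
    while True:
        yield list(records)
        records.rotate(-1)  # the next reply starts with the next address


if __name__ == "__main__":
    # First three addresses from the `host www.yahoo.com` output above.
    answers = rotating_answers(
        ["216.109.117.106", "216.109.117.109", "216.109.117.207"]
    )
    for _ in range(4):
        # A simple client connects to the first address in each reply.
        print(next(answers)[0])
```

This spreads load across instances, but it does nothing to keep the copies behind those addresses in sync.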
    11. Re:URI to the Rescue by J'raxis · · Score: 2, Insightful

      URIs are a superset of URLs and URNs. I think what you're talking about is a URN, isn't it? These are the URIs that specifically name something uniquely (for example, urn:isbn:1902593790 or urn:oid:1.3.6.1.4.1.20115) but don't necessarily help you locate it at a specific place.

    12. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      Fully Qualified Domain Names (FQDNs) each equate to a single IP in a single request. Round-robin IP resolution is a DNS hack that doesn't work in a distributed system like the Internet, because you're shuffling "lookup spaces" in a single namespace - the objects aren't really distributed. Closer to a real URI scheme, but similar to DNS IP rotation, is URL translation by, eg, the HTTPd, which converts a single URL to whichever object instance the HTTPd decides, on the fly. Rewriting the URL for a redirect of the client, or just retrieving the object from wherever on the Net and sending it back, are other scaling techniques. Cisco even has scaling hardware that keeps clones of servers sync'd across the Net, and routes to one or the other depending on their loads. But all of those are hacks. They overload the URL, DNS or IP syntax to misrepresent the structure and semantics of the objects and their distribution in terms of completely inappropriate protocol structures. So they're fragile - when taking them at face value, or upgrading the protocol, the extra dependencies make for unmanageable complexity. Better to have a single layer that accounts for all that complexity, and let each layer do what it is designed to do.

      --
      make install -not war

    13. Re:URI to the Rescue by ratnerstar · · Score: 1

      Uri can't help you here, man. Unless he could power wikimedia... with his mind!

      --
      Just because you sold your soul to the devil that needn't make you a teetotaler. --The Devil and Daniel Webster
    14. Re:URI to the Rescue by Chuck+Chunder · · Score: 1
      If you can't read a 6 and 10 line pair of paragraphs about a simple URI scheme, you're probably not going to be much help getting us from URLs to URIs. Just lay back and enjoy it when it works.
      There is a significant difference between "can't read" and "can't be bothered to read".
      --
      Boffoonery - downloadable Comedy Benefit for Bletchley Park
    15. Re:URI to the Rescue by Captain+Nitpick · · Score: 1

      And when the master MySQL database server goes down, it's still not going to work right because Wikipedia is a dynamic system with changing content. URIs are not a magic bullet that make things distributed without any other work.

      --
      But then again, I could be wrong.
    16. Re:URI to the Rescue by vidarh · · Score: 1
      You're talking out of your ass. With URI's you still need a method of finding the LOCATION of the objects you're looking for. That still means synchronizing multiple servers and finding a way of directing requests to the servers that happen to have the object.

      DNS is one way of achieving that. You can achieve the same with any directory/naming service provided you have clients that know how to use one of them.

      But ultimately you still need a naming service.

      Whether you make that service take into account just a part of the name (as with DNS with URL's coupled with an HTTP server) or the whole name, the end result is the same. Nothing is stopping you from doing this today by registering a domain and giving each page on your system a unique hostname and repoint your DNS to point to any object.

      Nothing is stopping you from using URI's to uniquely identify objects you happen to have lying around, either, but that doesn't help you if you can't find them.

    17. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      No, you're spouting the same flawed solutions I've debunked in this thread, and complaining that URI resolution still needs location lookups, as if I didn't point out exactly that. Read the rest of the thread before trying your obnoxious attacks.

      --
      make install -not war

    18. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      Each of those IP#s is a different namespace - the entire namespace, and its values, must be replicated for that DNS hack to work. For maybe just one object which is distributed. Sync'ing the entire collection of identical namespaces, especially with versioning, to say nothing of current access history state, is too expensive to do that way. A URI scheme, even as simple as what I proposed, is much more workable. Instead of shoehorning distributed object data into inappropriate structures like DNS, URI syntax can reflect semantics natural to the objects, rather than just minimally encapsulating the data.

      --
      make install -not war

    19. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      There is no spoon. There is no uri.

      --
      make install -not war

    20. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      But they can be bothered to post an obnoxious retort? That's beyond lazy, and less than stupid - it's fatuous. And you seem to like it...

      --
      make install -not war

    21. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      Of course not. That's why I mentioned a protocol, a distributed database. A Google hosted fraction, hardly a "master MySQL database". If it were that easy, I'd do it myself.

      --
      make install -not war

    22. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      Yes, URNames are the kind of URI that I'm talking about. Perhaps if I had posted "URN", more specifically, I wouldn't have gotten so many carping responses that think URLs are the be-all and end-all of URIs. But, considering the specifics of their complaints, I'm not sure. But at least someone understands the issue.

      --
      make install -not war

    23. Re:URI to the Rescue by ggvaidya · · Score: 1

      Something like this?

    24. Re:URI to the Rescue by Chuck+Chunder · · Score: 1

      It wasn't a retort or lazy or stupid or fatuous. It was letting you know you weren't communicating effectively. You can take that and learn from it or ignore it or get all offended by it. Your choice.

      --
      Boffoonery - downloadable Comedy Benefit for Bletchley Park
    25. Re:URI to the Rescue by Trogre · · Score: 1

      Uh, no. Just no.

      DNS name records can (and often do) contain multiple IP addresses for that very reason - load balancing. Perhaps not true load-and-fault-sensitive balancing, but load distribution at the very least.

      Try "host www.amazon.com" on different machines - you'll get any number of addresses back.

      Look up RFC 1794 some time, which has been implemented in both Linux and NT systems for at least the last six years.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    26. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      Uh, yes. Not "just yes", but for some complex reasons. Ie. the very flaws in DNS load balancing you mention are just the start for why it's not as good as URIs (or, to be more specific, URNs, as another poster suggested). URLs and DNS aren't a good enough model of the Internet to work all the time - we still have lots of problems. That's why URNs have garnered so much hard work over the years. My lookup is only one simple way. But it would have helped Wikipedia, and lots of the other problems I mentioned.

      --
      make install -not war

    27. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      Yes, much like that. Even TinyURL is a good start, much like my proposal. But not distributed enough a resolver network to offer full benefit.

      --
      make install -not war

    28. Re:URI to the Rescue by Doc+Ruby · · Score: 1

      No, you are just hoping that there's an oversimplified answer. PURL is "much like" what I'm describing. But does PURL have the distributed protocol I mentioned? No. That's one reason it's not "just like" the URI system I described. It's enough like it that there aren't any significant contradictions, but not enough like it to satisfy all its requirements. I hate all the bitching about how suggested tech solutions don't already solve all your problems. Of course we're making up stuff as we go along - that's the brainstorming part of design, validated by real design discipline. Most people don't do it in public, because of exactly this kind of counterproductive complaining.

      --
      make install -not war

    29. Re:URI to the Rescue by Anonymous Coward · · Score: 0

      Well, *I* never post anything to Slashdot until I have fully explored and validated all possible avenues and vectors of weakness, defects, shortcoming, fragility, blemishes, flaws, shortfalls, drawbacks, imperfections, chinks in the armor, or otherwise. I suggest you do the same.

  30. Where are you guys hosting from? by bigberk · · Score: 0

    Time to move your operations to Winnipeg, Canada. The power never stops flowing (in the 12 years I've lived here, I only remember two power failures in my residential neighbourhood). I really don't understand why there aren't network server operations set up in reliable power centres such as these.

    1. Re:Where are you guys hosting from? by Thu25245 · · Score: 1

      So, nobody ever crashes a truck into a transmission pole in Winnipeg?

      Ice never builds up on the lines in Winnipeg?

      Nobody ever cuts a buried line with a backhoe in Winnipeg?

      This happened because someone tripped a circuit breaker in the building where the servers were located. The grid itself was fine. No matter how reliable the generation facility, people will always muck things up.

    2. Re:Where are you guys hosting from? by Anonymous Coward · · Score: 0

      well what does it matter when the circuit breaker trips? That doesn't have to be the colo's fault

    3. Re:Where are you guys hosting from? by Rakishi · · Score: 2, Informative

      Because the power didn't actually go out?

    4. Re:Where are you guys hosting from? by Anonymous Coward · · Score: 0

      Well, first of all, they would HAVE TO LIVE IN WINNIPEG!!!!

    5. Re:Where are you guys hosting from? by topham · · Score: 1

      I can assure you the power does occasionally stop flowing in Winnipeg.

      While I was at a client site, a power failure occurred, caused by a hydro worker electrocuting himself on a power pole a block from the site.

      As well, contrary to the previous poster I know of about 24 power failures in the past 12 years in a particular neighborhood in Winnipeg. About 2 per year, on average, but 2004 tried to move that average up a couple of notches.

      Complete power failures will occur, sooner or later, at any single site. A significant online site really should be in multiple locations, but that adds significantly to the complexity of running it on a day to day basis. What's a few hours downtime really worth?

    6. Re:Where are you guys hosting from? by endersdouble · · Score: 1

      As strange as it may sound, I can still pull a circuit breaker when the power's live.

    7. Re:Where are you guys hosting from? by jeremymk · · Score: 1

      I guess it really depends on what area of Winnipeg you live in. I've been here all my life and have had the power go out more than a handful of times a year. Plus, no matter what, the breaker tripping will still knock out the power.

    8. Re:Where are you guys hosting from? by Anonymous Coward · · Score: 0

      haha i'll bet a Winnipegger posted that. amen, leave town and never look back.

    9. Re:Where are you guys hosting from? by lachlan76 · · Score: 1

      What's a few hours downtime really worth?

      Depending on who you are, it can be quite a lot.

    10. Re:Where are you guys hosting from? by HeghmoH · · Score: 1

      Sounds like it's going to be worth a lot of money for Wikimedia. I bet they get a ton of donations because of this event.

      --
      Mod down posts with a "Free Mac Mini/iPod" sig, they're spam!
    11. Re:Where are you guys hosting from? by Anonymous Coward · · Score: 0

      Hosting servers in Winnipeg??? Just how are the peering arrangements that far north? I'd stay away from there until I knew there were more than a few lines out. Don't think I'd rely on data links only from Good ol' Bell Canada for our business. They've caused more than a few local outages in the Toronto area due to fires and floods at central offices. There have always been alternatives to keep our servers available and online.

  31. Ironic by 42forty-two42 · · Score: 1
    From the google cache of their hardware growth planning:
    Question - don't you think a UPS system would also be a wise investment?
    1. Re:Ironic by Jamesday · · Score: 4, Informative

      Yes. I wrote that cached page and it's now a bit out of date. If local fire regulations permit the use of UPS systems in the racks (and it's not certain that they do), we're going to be installing them. We decided on that after LiveJournal's unfortunate experience, but we don't have them yet.

    2. Re:Ironic by Skapare · · Score: 1

      If that's a location that does not permit UPS systems in racks in computer rooms, then get moving to somewhere else. They would be in the minuscule minority. I can't say such does not exist; I've seen some very dumb regulations in some places, mostly large cities like Chicago and New York. But California seems to have "distributed stupidity", so who knows what nonsense you can find there.

      Sometimes regulations get misinterpreted. For example it is a common, and reasonable, regulation to prohibit lead-acid batteries like car batteries. UPS systems generally use gel-cell batteries, which are similar, but have the safety of a gel stabilized acid. But I've still encountered people that assume the two are alike. If someone says regulations prohibit something, find out what they are; which regulation and what specific section/part. Then start a talk on that.

      But with a UPS, be sure you have a power loss warning system that will alert you when the mains power coming in is down, or for any reason the UPS is drawing down the batteries. I generally recommend a continuous online double conversion UPS.

      --
      now we need to go OSS in diesel cars
    3. Re:Ironic by Jamesday · · Score: 1

      Good recommendation on the online UPS side - nice, clean power and no switching inconsistency. Might go cheaper on the slave database servers but the master should be very well protected.

      With a UPS, we will have things configured to be very conservative, shutting down the servers after no more than a minute or so on UPS power. Much nicer to be cleanly shut down than not.
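A conservative policy like that could be sketched roughly as follows. This is only an illustration: BatteryWatchdog and its update method are hypothetical names, the 60-second limit stands in for "a minute or so", and how the UPS status is actually read is left out.

```python
class BatteryWatchdog:
    """Decide when to start a clean shutdown after mains power is lost."""

    def __init__(self, limit_seconds=60):
        self.limit_seconds = limit_seconds
        self.on_battery_since = None  # time the current outage began, if any

    def update(self, on_battery, now):
        """Feed in the latest UPS status; return True once shutdown should begin."""
        if not on_battery:
            # Mains power is back, so forget the previous outage.
            self.on_battery_since = None
            return False
        if self.on_battery_since is None:
            self.on_battery_since = now
        return now - self.on_battery_since >= self.limit_seconds
```

A caller would poll the UPS every few seconds, pass the status and a timestamp to update(), and begin an orderly database shutdown the first time it returns True.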

      We are looking to have 1-4 remote sites this year, with database slaves we can switch to being masters in an emergency, so single location failures shouldn't be able to be so disruptive. But not there yet. This year has been "only" going from 3-4 servers to 50, all with donations buying the equipment and everyone involved donating their time. We're in pretty good shape overall (and wonderful shape compared to this time last year!) but not yet where we need to be given the expectations of continuous reliability for the busiest internet sites. Gradually improving that.

  32. Business plan by Anonymous Coward · · Score: 0

    1) Register a wiki domain name like wikisearch.org.
    2) Host a "backup" fundraising page there that sends money to us instead.
    3) Have someone mess with the Wikimedia circuit breaker.
    4) Send the power outage news to slashdot with our link.
    5) Profit!!

  33. UPS? by optimusNauta · · Score: 0, Redundant

    What wikipedia needs is some UPS technology between the wall and those critical servers they are spending hours restoring.

  34. Wikimedia Colo Facility.. by Anonymous Coward · · Score: 0

    I'm not surprised.

    The facility they are coloed in is considered "rickety" by many.

    From what I hear, they are expanding into a decidedly "non-rickety" location.

    Hopefully, this is the last outage we'll see due to these circumstances..

    1. Re:Wikimedia Colo Facility.. by MistabewM · · Score: 0

      Do you have a list of non-rickety hosting? =)

      --
      "A learning experience is one of those things that says, 'You know that thing you just did? Don't do that.'" - DNA
  35. I think it has something to do.... by Anonymous Coward · · Score: 0

    ...with all the porn sites that they host on the side.

  36. Integrity? by krem81 · · Score: 1

    Is there a way to ensure integrity of the data with such a setup?

    1. Re:Integrity? by Jamesday · · Score: 3, Interesting

      Yes. It's in our plans regardless of what happens with Google.

    2. Re:Integrity? by Anonymous Coward · · Score: 0

      Please tread carefully when dealing with Google. I know they claim to 'do no evil', but the Mathworld Mess highlights what can go wrong. Wikipedia is too precious to lose.

    3. Re:Integrity? by Captain+Nitpick · · Score: 1
      Please tread carefully when dealing with Google. I know they claim to 'do no evil', but the Mathworld Mess highlights what can go wrong. Wikipedia is too precious to lose.

      Unlike Mathworld, nobody has the authority to sign over the copyrights to Wikipedia's content (in bulk) to Google or anyone else.

      --
      But then again, I could be wrong.
    4. Re:Integrity? by Anonymous Coward · · Score: 0
      From the Mathworld page:
      Despite the facts that I (or volunteer contributors) wrote these entries and that...
      The Mathworld web site was similar in that copyright was not owned by one person. Possibly the volunteers signed their copyright over to Eric, but on the surface it appears that Eric did not have the authority to (unintentionally) give copyright to CRC. That didn't stop CRC.

      Presumably that is why Eric and CRC were contractually obliged to rewrite any missing articles at their own expense.

      As a result, CRC insisted that broad reproduction rights to all contributed material be secured. Furthermore, if we are not able to secure such rights, then Wolfram Research and I, at our own expense, must rewrite the entries in question from scratch for CRC to reproduce.

      It looks like the volunteers did the right thing by Eric and didn't pull their articles, which would have caused him grief??

    5. Re:Integrity? by Captain+Nitpick · · Score: 1
      The Mathworld web site was similar in that copyright was not owned by one person. Possibly the volunteers signed their copyright over to Eric, but on the surface it appears that Eric did not have the authority to (unintentionally) give copyright to CRC. That didn't stop CRC.

      It's ambiguous. The current permissions form merely grants Wolfram the right to do whatever they please with the material. I have no idea what the previous terms were.

      Wikipedia's content is explicitly under a license that allows for irrevocable rights to republication and derivative works. The worst that can happen without declaring the GFDL invalid is that the Foundation goes bankrupt. But we'd still have the right to take the database and go elsewhere.

      --
      But then again, I could be wrong.
    6. Re:Integrity? by Jamesday · · Score: 1

      The best protection possible is having the Foundation have no more rights than any other GFDL licensee. And to be absolutely sure that it's not acting as if it's some sort of association of all contributors or something, in which case someone could argue that it had the rights to do that sort of unfortunate assignment of rights and has no Communications Decency Act or DMCA protection from the acts of those who contribute content. At which point it's conceivable for it to lose a copyright infringement case and all rights to the content except those granted by the GFDL.

      One of the unfortunate things it's doing now is registering as a trademark things which are already common law trademarks of the authors of the work Wikipedia. It'll be interesting to see if someone objects to the trademark registration on that basis. I'm certainly considering it, because a successful registration will increase the vulnerability of the name of the work, changing it from an effectively impossible to lose trademark to one the Foundation can lose. Trademark registration for the Foundation being refused because of an existing common law trademark of the contributors would be a very positive result, protecting it from others without adding risk of loss.

      There are some who don't understand this sort of risk reduction and suggest silliness like having the Foundation have copyright assigned to it.

      This sort of thing is one of the reasons why I try to make it very clear that I am not a "member" of the Foundation and it has no power to act on my behalf in any matter regarding IP rights to what I've contributed. It's the best course to protecting the works.

  37. Re:The evils of Wikipedia by mrnobo1024 · · Score: 0

    All of these "values" are artistically incorporated in one person: Wikipedia.

    There's a person named Wikipedia now? Weird parents...

  38. 170 gigs? by mnmn · · Score: 1

    That aint much. My older harddisk is 200. I'm planning to get a 400 gig one.

    I wonder if wikimedia will ship the whole wikipedia on a few bzipped DVD isos to people who want a not-so-up-to-date encyclopaedia. I was researching a period of 1200AD, not much chance that data will change in the next few months.

    And I DO wonder why another database company doesn't take up a mirror of wikipedia, just to show the reliability, speed, scalability etc of their database.... great marketing tool especially if you own all the ad bars. Sybase? Ingres? MSSQL? sleepycat even?

    Why do I have a feeling someone kicked a Pentium1 server running freebsd with a 200GB harddisk somewhere out there...

    --
    "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
    1. Re:170 gigs? by CAIMLAS · · Score: 1

      This would be useful, if indeed Wikipedia was anywhere near complete. I too was researching an older era recently (the "Viking" era of the Nordic lands - around 700 - 1100, give or take), and I found more pertinent information on various sites than I did on wikipedia. While Wikipedia had some more encyclopedic information (a couple of maps, mainly), there wasn't much there at all that wasn't - at best - cursory.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    2. Re:170 gigs? by mnmn · · Score: 1

      I have been reading a lot on the neolithic age between the end of the last ice age ~10000BC and the birth of Christ. The Vikings' travels up to Greenland are interesting, as is the spread of the Inuit from the west, but have you read about the Tunit/Dorset people? They were an arctic people before the Inuit. They had strange hairdos, and their boat consisted of three blown up bladders of large mammals. There is one account of a European meeting a Dorset man in Hudson Bay, around when they were becoming extinct... I have found VERY little about the Dorset men, almost no studies have been done, and even fewer websites on them.

      Details of these things are there on the wiki, best thing being they are very approachable instead of exhaustive, and provide links for further info instead of getting so deep into the subject as to lose the reader. Wiki brings details in a very expectable way. I've been chasing my own history and who the heck we (Hazaras) are; turns out a Persian king, Shah Abbas, kicked us out into Afghanistan while also making us Shia from Buddhist 400 years ago. That's still debatable. Another interesting detail you'll notice about the mesolithic and neolithic ages is how various civilizations developed the same technology at about the same time, far away from each other, like pottery and farming, only a few thousand years after the end of the ice age. It shows the potential of those technologies was very strong, or perhaps people travelled much further on foot than we think, and so are interconnected much more deeply.

      Just some observations; I should start a blog.

      --
      "Give orange me give eat orange me eat orange give me eat orange give me you." -Nim Chimpsky
    3. Re:170 gigs? by brion · · Score: 2, Interesting

      The vast majority of this space is taken up by revision histories (and those are compressed!) Periodic database dumps are available for download. Image and multimedia uploads have been taking up a bigger share lately, but those are on a separate server which recovered just fine.

      A German company has published an end-user-friendly CD-ROM of material from the German-language Wikipedia, but afaik no one's published an English-language edition yet.

      --

      Chu vi parolas Vikipedion?

    4. Re:170 gigs? by Jamesday · · Score: 1

      About this time last year, 15,000 RPM SCSI drives had 73GB as their largest available size and cost about $700 each. We have 6 in the master database server, set up in RAID10 with about 210GB of usable space available. 7200 RPM and 10,000 RPM drives were available in larger sizes.
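
      For anyone checking the arithmetic, here is a rough sketch in Python. It assumes plain RAID 10 (mirrored pairs, striped), so half the raw capacity is usable; the gap between the computed figure and the "about 210GB" quoted above is presumably formatting and filesystem overhead, which is a guess and not something stated here.

```python
# Back-of-the-envelope RAID 10 capacity for the six-drive setup described above.
DRIVES = 6
DRIVE_SIZE_GB = 73


def raid10_usable_gb(drives: int, size_gb: float) -> float:
    """RAID 10 mirrors pairs of drives and stripes across the pairs,
    so only half of the raw capacity is usable."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return drives * size_gb / 2


usable = raid10_usable_gb(DRIVES, DRIVE_SIZE_GB)
print(f"{usable:.0f} GB usable before filesystem overhead")
```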

      We're definitely purchasing greater capacity for the next batch of master server systems. Thinking in terms of a terabyte or so and 12-16 drives.

      The traffic volume is also interesting. Think of about 200 million selects and 1.2 million inserts/updates per day and so far up to about 1300 hits per second. Slashdotting of us is only about 200-400 hits per second or so and stopped, usually, being a problem around April 2004.
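
      To put the daily totals on the same per-second footing as the hit rates, a quick sketch. These are flat averages over 24 hours, so real peaks would be higher, and database selects are not the same thing as HTTP hits; the variable names are just for illustration.

```python
# Convert the daily query totals quoted above into average per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60

selects_per_day = 200_000_000
writes_per_day = 1_200_000  # inserts + updates

avg_selects_per_sec = selects_per_day / SECONDS_PER_DAY
avg_writes_per_sec = writes_per_day / SECONDS_PER_DAY
reads_per_write = selects_per_day / writes_per_day

print(f"selects/sec (average): {avg_selects_per_sec:.0f}")
print(f"writes/sec  (average): {avg_writes_per_sec:.1f}")
print(f"reads per write:       {reads_per_write:.0f}")
```

      The read-heavy ratio is part of why read replicas help so much for a workload like this.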

    5. Re:170 gigs? by wongn · · Score: 1

      There's currently an editorial team working on editing the articles to a fixed state that corresponds to a certain degree of accuracy and quality. The idea is that this can be distributed in either paper or CD form to anyone who so wishes (and the developing world).

  39. Answers.com by stevemm81 · · Score: 3, Informative

    You can look things up on answers.com. They mirror Wikimedia, as well as other dictionaries/encyclopedias.

    1. Re:Answers.com by Anonymous Coward · · Score: 0

      mod parent up

  40. Xenu Strikes Again! by Anonymous Coward · · Score: 1, Informative

    It was Xenu! Great God of the Scientologists who caused the power outage. He/she/it was angry you didn't pay all your hard earned cash to learn the inner secrets to find out about he/she/it. Read all about it on Wikipedia. Oh, wait you can't!

    I find it an interesting coincidence the power outage happened so soon after that the Xenu article was featured. I may be paranoid, but the Scientologists have taken paranoia to a new dimension. They are not above dirty tricks. Karl "Turd Blossom" Rove could learn a thing or two.

    1. Re:Xenu Strikes Again! by MillionthMonkey · · Score: 5, Informative

      I find it an interesting coincidence the power outage happened so soon after that the Xenu article was featured.

      Gee, you just had to mention the X-word! Now this thread won't load for most Scientologists because the keyword filters they were forced to install by their Church will see "Xenu" and block the site. After all the mere sight of the word could cause "pneumonia and death" if you haven't paid the Church of Scientology for the proper preparation.

      Wikipedia's Xenu article has an interesting history if you look, as I did the other night when it was featured. Scientologists vandalize it regularly. You're supposed to pay them a half million (or some absurd sum of money) to find out about Xenu. After you find out, you're too embarrassed to admit to anybody that you paid a half million to learn that your problems are caused by bad science fiction, when you could have bought a house in Silicon Valley instead. So they obviously don't want a Wikipedia article giving away their half-million-dollar "trade secret" for free.

      One trick I saw was to use HTML entities to spell out insults at the top of the article- like "only an idiot would believe this" or something. In the editor window, the entities weren't rendered and each letter appeared as a hex code.
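
      For the curious, the mechanics are simple enough to sketch in Python: the edit box shows the raw numeric character references, while the rendered page shows the decoded letters. The sample text is a harmless stand-in, not what was actually in the article.

```python
import html

# Encode each character as a hexadecimal numeric character reference.
message = "hi"
encoded = "".join(f"&#x{ord(ch):x};" for ch in message)

print(encoded)                 # roughly what the wiki edit box would show
print(html.unescape(encoded))  # what a browser renders on the page
```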

      A more effective attack took a different approach. The vandal in this case changed "Scientologists" to "Muslims", "Scientology" to "Islam", and inserted a boring-sounding sentence at the end of the first paragraph claiming that "Xenu" is another name that Muslims use for "Allah". It completely discouraged you from reading further. If you didn't know better you wouldn't find out how "Allah" distributed the thetans around volcanoes on various planets and blew them up with hydrogen bombs, and how their blown-up spirits cause problems in your personal life today.

      This is OT, but what the hell, why not whack a beehive? Additional information on Xenu:
      Operation Clambake (Hubbard maintained that humans are descended from clams)
      The Xenu leaflet (all about Xenu- this information can save you lots of $$$$$)
      The road to Xenu (authored by a woman who got suckered)
      The Google cache of Wikipedia's Xenu article is also a must read.

      I'm wondering if I'll get a lot of freaks, downmoderations, and hostile AC replies after I post this. After all, that's the kind of thing that Hubbard called "fair game". If it sinks below default visibility I'll repost it again with my karma bonus, so you theta-clear-wannabes out there can save your points for someone else.

    2. Re:Xenu Strikes Again! by Silentnite · · Score: 3, Interesting

      If only I were a mod. Informative, and just plain funny if you ask me. I've read about that entire thing going back and forth and it's kinda odd. On the one hand I think that Wikipedia should limit who can change it. But on the other it's really neat and diverse to let everybody at it.

      Oh well.. Slightly OT

  41. What's the Name of Wikimedia's Colo? by Ron+Bennett · · Score: 1

    What's the name of Wikimedia's colo?

    Ron

    1. Re:What's the Name of Wikimedia's Colo? by timstarling · · Score: 2, Informative
    2. Re:What's the Name of Wikimedia's Colo? by Anonymous Coward · · Score: 1, Funny

      From their website:

      Neutelligent owns and operates it's data centers. Neutelligent is not co-located in someone else's data center. We are located in downtown Tampa,FL. ,in one of the only true NOC's in the southeast. Neutelligent has huge fiber commitments with the following carriers, totalling over 4.5 gigs: UUNET, Level 3, and EPiK. By doing so, that will ensure 0:00 down time, not 1%, we mean 0% downtime.

  42. Absolute power corrupts. by kiwidefunkt · · Score: 3, Funny

    As soon as I saw "Power corrupts. Power failure corrupts absolutely" I thought, the damn commies finally did it! But no, not hacked by commies...just by a renegade circuit breaker.

    --
    www.kiwilyrics.com - a wiki for lyrics
    1. Re:Absolute power corrupts. by exocett · · Score: 1

      Nah, I didn't think so- he would've replaced all the c's with k's. "Power korrupts. Power failure korrupts absolutely." The guy's soviet russian like that.

    2. Re:Absolute power corrupts. by maxwell+demon · · Score: 1

      You mean, as in "Konqueror", "KrawlSite", "Katalog", "eduKator", "Kasablanca", "Klusters", "Konference", ...?
      I knew there was something wrong with KDE! :-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
    3. Re:Absolute power corrupts. by LirQ · · Score: 1

      I was slashdotted! weeeeeeeeeeeeeeeeeeeet!

    4. Re:Absolute power corrupts. by Anonymous Coward · · Score: 0

      Of course. All of you free software nuts are communists according to Bill.

    5. Re:Absolute power corrupts. by Red_Faction · · Score: 1

      Victory to red faction We are the just ones!

  43. why, why, why? by CAIMLAS · · Score: 2, Insightful

    Why were they not using battery backup on their database servers (IE, their critical servers)? That way the servers would have the necessary 10 minutes (or whatever) so that they can shut down the DBs and power off the systems.

    This is a negligible cost for something as integral as an active sync with the work that people have performed - for free.

    Why is this not seen as important? "The wiki users will just recreate the material"? That's somewhat presumptuous.

    Now, livejournal I can understand not doing this (as there are many clients which allow people to sync with their online journals and the material is fairly culturally worthless), but wikipedia? It's one of the better things on the Internet.

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    1. Re:why, why, why? by ananke · · Score: 1

      Uhmm, most likely they do use UPS systems. Ones that reside outside of the racks. See, not everybody bothers with a separate UPS for each individual server, that's just silly. The breakers that got tripped were most likely between the PDUs and the racks.

      --
      --- d'oh
    2. Re:why, why, why? by Captain+Nitpick · · Score: 1
      Why were they not using battery backup on their database servers (IE, their critical servers)? That way the servers would have the necessary 10 minutes (or whatever) so that they can shut down the DBs and power off the systems.

      See this post by Jamesday.

      --
      But then again, I could be wrong.
    3. Re:why, why, why? by BrodeCo · · Score: 1

      Mmmm... LiveJournal did have some ability to do this, weirdly enough. They only lost data from one cluster after their power loss last month.

      I'm glad it turned out this way, as I only use Wikipedia to look up important facts & research historical data... I need my LiveJournal friends-list to find out which Harry Potter character someone I've never met would be!

  44. And I have a paper due... by Anonymous Coward · · Score: 0

    Where am I to do research? The internet? I haven't visited that place in years!

  45. Wikiolo. by Anonymous Coward · · Score: 0

    ba dum pish.

  46. more info by focitrixilous+P · · Score: 0, Redundant
    --
    SAILING MISHAP
  47. How long? by TeeRebel · · Score: 1

    So how long should it take before they resolve all of their issues? History reports are pretty hard without Wikipedia...

    1. Re:How long? by Anonymous Coward · · Score: 0

      Worse, the Internet's premier source of Ashlee Simpson information is down! How will I ever survive?

    2. Re:How long? by wongn · · Score: 1

      Should be editable within a couple more hours max - so I believe.

  48. Tin hats or tin heads? by fm6 · · Score: 1

    I had just begun working at Hurricane Electric when they had their big power failure. (It was the first day I was answering the phone on the help desk. Not a pleasant experience!) In that case the power loss was due to a mistake by a technician servicing the backup power supply. Then there was the Internap failure, which seems to have been caused by a similar human error. Now a third provider has had some weird circuit breaker issues. That makes three major outages in less than a year. Either there's some evil conspiracy, or a lot of different companies are using the same bad procedures.

    1. Re:Tin hats or tin heads? by YankeeInExile · · Score: 1

      ... there are a lot of companies using very bad practices.

      When I was at Random DotBomb dot Com and we were building out a cage at a colo facility in Sunnyvale, I asked for three circuits to our cage, each circuit from a different UPS.

      They looked at me like I had three heads.

      "But, it's UPS power ... why do you want diversity?" -- they asked stupidly. I responded, "Because sometimes UPSes fail. Sometimes Lefty the Lectrician turns off the wrong breaker. Sometimes Clumsy the Carpenter buzzes his Sawzall into a 480V riser." (A failure mode I remember happening -- the world is short one clumsy carpenter.)

      Some weeks after I made this irrational demand, they did have a UPS failure -- and a spectacular one at that -- remember, these are not the itty bitty 3 kVA units we use to keep our systems at home running, but enormous units capable of supplying 277/480 at 150 amperes -- with a spray of hot metal and slagged semiconductors and fire shooting out.

      Despite this catastrophic failure, which caused considerable strife to some of our cage-neighbors, our pathetic wannabe site kept going.

      --
      How does the Slashdot Effect happen given that no slashdotters ever RTFA?
    2. Re:Tin hats or tin heads? by fm6 · · Score: 1
      I only disagree with you on one point: there will never be a shortage of clumsy carpenters!

      What surprises me is that your colo provider was able to accommodate your request. At HE, the only power available comes from an inverter at the end of each cabinet row. (Power is converted from AC to DC, then back to AC; in effect the whole building is one big UPS.) You could, at some expense, wire a cabinet to multiple inverters, but that still wouldn't protect you from a failure at the main system -- which is, in fact, where the failure happened.

      What really bugged me at the time was the "shit happens" attitude afterwards. I was the only one who thought that maybe something was wrong with procedures (did anybody tell that tech, "don't touch that breaker"?). And since I was the clumsy ignorant newbie, my opinion counted for zilch.

      Before HE calls its lawyers (we did part on bad terms), I should point out that they don't seem to do any worse than any other colo company -- as this string of power failures indicates.

      What's needed is some kind of certification authority for colo and hosting providers. An independent person needs to go in and ask the hard questions: Redundant power systems? Redundant networks? (Most colo companies actually seem to have those, which says something about their mindset.) Well documented procedures? Sufficient staffing? Proper training? Enough money in the bank to keep operating?

      That last one was particularly painful for me. HE itself is in good financial shape, but there are fly-by-night outfits that call themselves colo companies, but really just resell rack space. A couple of HE's resellers went out of business when I was there, and it was not fun to tell their customers they couldn't even get their machines back. And flaky hosting companies that consist of one machine and one or two semi-competent entrepreneurs are legion.

    3. Re:Tin hats or tin heads? by YankeeInExile · · Score: 1

      Unfortunately for the world, you're right -- clumsy carpenters seem to be capable of reproducing faster than killing themselves with power tools and electrical equipment. More the pity.

      I went into internet stuff in 1989 from the telco business, and was absolutely horrified at the lack of rigor that most vendors were putting into their implementations.

      I remember one consulting client (dialup ISP in central California), who was single homed. I explained "You guys really should have at least two and preferably three network feeds." Their CTO told me "But we only can afford one T1 of bandwidth right now."

      "Take whatever budget you have, divide it in three, and get that much from three vendors...You are better off with three 256k fracs than a single point of failure."

      "No way! We're a serious ISP ... we can't have a fractional T1" said the CTO.

      "Your absolute failure to understand will come back to haunt you some day."

      Getting back to the Sunnyvale facility: As to the specifics -- getting two UPS feeds was pretty easy -- one came from the inverter that was "supposed" to feed us, and one from the cage row opposite us. It was getting the third, which required a construction order to run a conduit thirty-someodd feet from the other side of the room that took an act of deity.

      When I built out our facility in Kansas City, I forewent UPSes entirely and just ran the whole kit-and-kaboodle on Sun and Cisco gear with -48VDC inputs. There was a small 4kVA inverter to supply the handful of non-mission-critical things I could not get in DC supply versions (e.g. tape stacker, drop lights, couple of Wyse 50s).

      --
      How does the Slashdot Effect happen given that no slashdotters ever RTFA?
    4. Re:Tin hats or tin heads? by fm6 · · Score: 1
      As to the specifics -- getting two UPS feeds was pretty easy -- one came from the inverter that was "supposed" to feed us, and one from the cage row opposite us. It was getting the third, which required a construction order to run a conduit thirty-someodd feet from the other side of the room that took an act of deity.
      So you really didn't have three different UPSs. You just had three different inverters, all part of the same building-wide UPS. That saved you from an inverter failure, but doesn't help you at all if the whole system goes down -- as happened in the other three cases we've been talking about.

      You could have gotten your redundant inverters at HE, and probably with a lot less hassle, since they're a lot less bureaucratic than other providers. But by the same token, they're not very good at following, or even defining, procedures. And it was, as far as I can tell, a failure of procedure that caused their blackout.

    5. Re:Tin hats or tin heads? by YankeeInExile · · Score: 1

      Well, yes, all three batteries could have failed, and all three sets of rectifiers feeding those batteries could have failed. But that is a particularly unlikely event.
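
      A rough way to put numbers on "particularly unlikely", sketched in Python. The 1% per-feed failure probability is a made-up figure purely for illustration, and the calculation only holds if the three feeds fail independently; any shared upstream component breaks that assumption.

```python
def all_feeds_fail(p_single: float, feeds: int) -> float:
    """Probability that every feed is down at once, ASSUMING the feeds
    fail independently of each other (no shared upstream equipment)."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    return p_single ** feeds


P_SINGLE = 0.01  # hypothetical chance that one feed is down at a given moment

one_feed = all_feeds_fail(P_SINGLE, 1)
three_feeds = all_feeds_fail(P_SINGLE, 3)
print(f"one feed:    {one_feed:.6f}")
print(f"three feeds: {three_feeds:.6f}")
```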

      --
      How does the Slashdot Effect happen given that no slashdotters ever RTFA?
    6. Re:Tin hats or tin heads? by fm6 · · Score: 1

      When you talked about "connecting to the inverter" I assumed you meant that all the inverters were part of a single power supply system -- which is the case at HE.

  49. mysql.com here's a suckssess story for ya by Donny+Smith · · Score: 1

    >Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state. Attempting to bring up the master database and one of the slaves immediately after the downtime showed corruption in parts of the database.

    Well this is just great PR for MySQL.

    (To DBA: have you guys ever heard of the replicating feature?)

    1. Re:mysql.com here's a suckssess story for ya by mrnobo1024 · · Score: 0

      Wikimedia actually does use replication; in fact they have 5 database servers (or did last time Google crawled the meta-wiki).

    2. Re:mysql.com here's a suckssess story for ya by Cramer · · Score: 1

      FYI: the replicas were powered down too. It takes time to verify the integrity of dozens of servers -- and until you do, you don't know for sure that nothing is borked. Oh, and I guess you missed the part about the binlog filling up (stopping replication) and the db continuing on (breaking replication).

    3. Re:mysql.com here's a suckssess story for ya by Anonymous Coward · · Score: 0

      And 4 of them failed to maintain an intact DB. It was by luck or the grace of god that the 5th one survived.

      Lesson? MySQL is complete and utter shite and should die immediately.

  50. Blow Your Mind by thatgun · · Score: 1
    1. Re:Blow Your Mind by Anonymous Coward · · Score: 0

      Do my eyes deceive me? Everything2 has no Ashlee Simpson articles? How could it possibly be regarded as a useful resource in that case!

  51. Easy, brain-dead sql db recovery (if possible) by iamcf13 · · Score: 1, Offtopic

    Here are the ingredients to this solution:

    A completely designed, 100% empty database.

    A COMPLETE log of all the SQL statements that were applied to it IN the order they were used. This is obtained by the application logging the SQL statements to the SQL log file AFTER the SQL statement is successfully executed.

    When a database failure occurs, stop everything, 'replay' the backed up SQL logfile (that's on a separate backup system) on a copy of the empty DB there. TADA! you are back in business back to the point of failure!
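
    A minimal sketch of the idea in Python with the standard sqlite3 module. The table, the helper names and the in-memory list standing in for the log file are all invented for illustration; a real setup would append to a file on a separate machine and would have to deal with the downsides listed next.

```python
import sqlite3

# Hypothetical one-table schema standing in for the "completely designed,
# 100% empty database".
SCHEMA = "CREATE TABLE page (id INTEGER PRIMARY KEY, body TEXT)"


def new_empty_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def execute_and_log(conn, log, sql, params=()):
    """Run one statement and record it only AFTER it has succeeded."""
    conn.execute(sql, params)
    conn.commit()
    log.append((sql, params))


def replay(log) -> sqlite3.Connection:
    """Rebuild the data by applying the logged statements, in order,
    to a fresh copy of the empty database."""
    conn = new_empty_db()
    for sql, params in log:
        conn.execute(sql, params)
    conn.commit()
    return conn


log = []  # stand-in for the SQL log file kept on a separate backup system
live = new_empty_db()
execute_and_log(live, log, "INSERT INTO page (id, body) VALUES (?, ?)", (1, "draft"))
execute_and_log(live, log, "UPDATE page SET body = ? WHERE id = ?", ("final", 1))

restored = replay(log)
rows = restored.execute("SELECT id, body FROM page").fetchall()
print(rows)
```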

    The downsides....

    Redesigning the database will screw everything up unless the SQL statements used during the redesign are logged as well.

    All sql requests must be funneled through 1 and only 1 db connection. Otherwise the sql statements in the logfile stand a chance of being recorded 'out of sequence'. Here is a brief example:

    With one db connection, user 1 edits record x two separate times in succession then user 2 comes along behind user 1 and modifies the record with no problems. Without record locking or with indiscriminate multithreading, record x will be corrupted if user 2 edits record x between user 1's two consecutive edits. See the downside?

    The SQL logfile gets corrupted due to storage media failure. The only way around this would be to copy the log file to a backup mirror system on a periodic basis and verify it is a good backup copy using a strong cryptographic hash such as SHA-512 or for the utterly anal and paranoid, a byte-for-byte comparison.

    The EXTREME volume of data may/will make this approach infeasible due to time constraints -- too much data to restore via 'replaying'. 'Checkpointing' from a known good database state will cut down the size of the SQL log file but introduces the possibility of database corruption by simply using the wrong checkpoint database when replaying the sql statements.
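
    The hash check on the backup copy mentioned in the downsides above could look roughly like this in Python. The file names and the copy_and_verify helper are made up for the example, and a temporary directory stands in for the separate backup system.

```python
import hashlib
import os
import shutil
import tempfile


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large log never has to fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy the log, then confirm the copy hashes to the same value."""
    shutil.copyfile(src, dst)
    return sha512_of(src) == sha512_of(dst)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sql.log")      # hypothetical live log
    dst = os.path.join(tmp, "sql.log.bak")  # hypothetical backup copy
    with open(src, "w") as f:
        f.write("INSERT INTO page VALUES (1, 'draft');\n")
    ok = copy_and_verify(src, dst)
    print("backup verified:", ok)
```

    A matching hash only shows the copy equals the source at that moment; it says nothing about whether the source log was already damaged.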

    Speaking of 'tar' in the parent post, I 'cowrote' a simple, high-performance freeware Windows file archiver that combines file aggregation with data compression. If you want to try it out, it is here.

    1. Re:Easy, brain-dead sql db recovery (if possible) by Tough+Love · · Score: 3, Interesting

      A completely designed, 100% empty database.

      A COMPLETE log of all the SQL statements that were applied to it IN the order they were used. This is obtained by the application logging the SQL statements to the SQL log file AFTER the SQL statement is successfully executed.

      When a database failure occurs, stop everything, 'replay' the backed up SQL logfile (that's on a separate backup system) on a copy of the empty DB there. TADA! you are back in business back to the point of failure!


      Read the Wikipedia page. That's exactly what they've done, but because the MySQL database got corrupted, instead of just falling back a few minutes, they may have to go right back to a full backup and replay the log since then, which takes a lot more time than replaying a few transactions.

      The solution is to switch to a database that actually implements ACID (the second letter stands for "Consistency" and the last letter stands for "Durability" which is what failed here).

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    2. Re:Easy, brain-dead sql db recovery (if possible) by iamcf13 · · Score: 1

      The solution is to switch to a database that actually implements ACID (the second letter stands for "Consistency" and the last letter stands for "Durability" which is what failed here).

      Which RDBMSes that you know are 100% ACID compliant? Please name any you know.

    3. Re:Easy, brain-dead sql db recovery (if possible) by Tough+Love · · Score: 1

      Which RDBMSes that you know are 100% ACID compliant? Please name any you know.

      PostgreSQL and Ingres. I'll stop there, because I don't want to include any non-open source databases in my list. There are some I am not sure about.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    4. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      Oracle, DB2 and Sybase of course all pass the test. I'm unclear on why you would omit those. (I'm a Sybase fan, myself.)

    5. Re:Easy, brain-dead sql db recovery (if possible) by Tough+Love · · Score: 1

      Oracle, DB2 and Sybase of course all pass the test. I'm unclear on why you would omit those.

      Because there is no point in listing any databases that are not open source, for an open source project. Those you mention are very respectable of course, in their own world.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    6. Re:Easy, brain-dead sql db recovery (if possible) by Nefarious+Wheel · · Score: 1
      Which RDBMSes that you know are 100% ACID compliant? Please name any you know.

      A very close contender would be my old friend the (now Oracle-branded) RDB from now-defunct Digital, running on VMS. It's still in use after decades of people hitting it with very large hammers, and with the less-reliable hardware of yesteryear it went through an awful lot of refinement over time. Ten years in that shop, and after the first year of its introduction we didn't lose a single committed transaction from then on. Nothing is 100% reliable but you could definitely stick a few nines after the 99 percent mark on that one. Might even be cheap nowadays, despite the Oracle and HP branding.

      But you know the adage -- if it works, it's obsolete.

      --
      Do not mock my vision of impractical footwear
    7. Re:Easy, brain-dead sql db recovery (if possible) by noisehole · · Score: 1

      seriously, everytime a db discussion comes up here ppl are referring to mysql and postgres. how come firebird almost never gets any credits?

      acid is the first feature that gets mentioned on their factsheet http://firebird.sourceforge.net/guide/FBFactsheet.html

      and please, if a browser joke just came up to your mind, just drop it ;)

    8. Re:Easy, brain-dead sql db recovery (if possible) by TheRaven64 · · Score: 1

      Not, technically, an RDBMS, but SQLite is also ACID compliant, and even more free than PostgreSQL. Not useful in the same situations as a real RDBMS, but very nice for small or single-user things.

      --
      I am TheRaven on Soylent News
    9. Re:Easy, brain-dead sql db recovery (if possible) by stephenbooth · · Score: 1

      PostgreSQL, Oracle, DB2, Sybase and Ingres all pass the minimal requirement for ACID (normal operations) but only PostgreSQL and Oracle pass a strict requirement (when it all goes pear-shaped). They have multi-level redundancy and things like processes that watch the processes that do the work, and if the processes that do the work die, the watcher processes restart them or tidy up after them as appropriate.

      I'm not sure about Sybase or Ingres but I do know that read consistency in DB2 is a joke in non-trivial systems.

      Stephen

      --
      "Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
    10. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      Because there is no point in listing any databases that are not open source, for an open source project.

      Uh ... why? Is it some kind of pseudo-religious, political correctness thing for you? Does the idea of using the right tool for the job not carry any weight?

      Those you mention are very respectable of course, in their own world.

      I agree, except that "their own world" is "Earth."

    11. Re:Easy, brain-dead sql db recovery (if possible) by Anonymous Coward · · Score: 0

      Uh ... why? Is it some kind of pseudo-religious, political correctness thing for you? Does the idea of using the right tool for the job not carry any weight?

      No, it's because Wikimedia use only open-source software. There's no point listing software they'd never use. You're flaming the wrong person.

    12. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      Hm. I guess it is some kind of pseudo-religious political correctness thing.

      That's a shame. Whenever ideology gets in the way of technology, bad things result.

    13. Re:Easy, brain-dead sql db recovery (if possible) by Anonymous Coward · · Score: 0

      Hm. I guess it is some kind of pseudo-religious political correctness thing.

      I can't speak for Wikimedia, but I wouldn't characterise "ethics" as "pseudo-religious political correctness".

      That's a shame. Whenever ideology gets in the way of technology, bad things result.

      I wouldn't call GNU a bad thing, nor the FSF, the EFF or anything of that kind.

    14. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      I wouldn't characterise "ethics" as "pseudo-religious political correctness".

      That's okay. I wouldn't characterize only choosing to do business with inferior tools when superior tools are widely available because the inferior tools conform to some arbitrary standard of ideological acceptability "ethics."

      I wouldn't call GNU a bad thing, nor the FSF, the EFF or anything of that kind.

      Oh, gosh, I would. GNU has done more to set back the cause of software collaboration than any other single group. Their maniacal crusade against anybody with the slightest interest in profiting from the sale of computer software has completely turned the world off to the idea of public-domain software. The FSF is nearly as bad, and everybody knows that the EFF is just a political front-group for radical leftists who want to effect a fundamental change in the way our society works.

      Each of those things is very bad indeed. All the more so because they have apparently fooled people like yourself into thinking that they're not bad at all. Which is very, very sad.

    15. Re:Easy, brain-dead sql db recovery (if possible) by crayz · · Score: 1

      You may not agree with open-source philosophy, but to caricature it as "only choosing to do business with inferior tools when superior tools are widely available because the inferior tools conform to some arbitrary standard of ideological acceptability"

      Any so-called worldview you're working with here would condone Mengele in the same sentence as DB2, an overstatement so broad it's ironic coming from the keyboard of the person who criticized using the wrong tool for the job.

    16. Re:Easy, brain-dead sql db recovery (if possible) by Anonymous Coward · · Score: 0

      > > The solution is to switch to a database that actually implements ACID (the second letter stands for "Consistency" and the last letter stands for "Durability" which is what failed here).

      > Which RDBMSes that you know are 100% ACID compliant? Please name any you know.


      Even SQLite, for God's sake!

      The first point of the features list on their website: "Transactions are atomic, consistent, isolated, and durable (ACID) even after system crashes and power failures." [emphasis added]
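
      A small sketch of what the "atomic" part of that claim looks like in practice, using Python's standard sqlite3 module. The account table and the transfer helper are invented for the example. It only simulates an application error halfway through a transaction; surviving an actual power cut is the durability part, which a snippet like this cannot demonstrate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(conn, amount, fail_midway=False):
    """Move money from 'a' to 'b' inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,)
        )


try:
    transfer(conn, 50, fail_midway=True)
except RuntimeError:
    pass  # the half-finished transfer should have been rolled back

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```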

      Were you trying to be funny? The fact is that ACID is not so hard that no one has it. MySQL is the only one who seems to have a problem with it, but what is even more frightening is that the answer of MySQL developers was always "ACID? You don't really need it." Even today you specifically need to create tables using non-standard syntax if you want them to be ACID-compliant (and yes, they are MUCH slower than PostgreSQL).

      The bottom line is that ACID is something successfully implemented for DECADES in DBMSs. I won't ever trust a self-proclaimed database "expert" who has ever even suggested that ACID is not a top priority because even if that developer adds some half-assed ACID features later to his project, we will see problems like this one with Wikimedia.

      If you're serious about your data, you need ACID because it means that EVEN IF THE POWER IS DOWN and EVEN IF THE DISK DIDN'T WRITE WHAT IT SHOULD HAVE WRITTEN when the power went down then your database is still in a consistent state (you know, the "C" in ACID) as in any given point. I've been following the MySQL development since the beginning, and the history of MySQL attitude can be summed up as:

      1. First was: You don't really need ACID!
      2. Oh, you do? But ACID would slow down the DB.
      3. Oh, correct data is more important than speed? But in fact, no one has ACID. (YOU ARE HERE)
      4. Oh, they do? All of them? OK, here, have those new tables.
      5. Oh, they're slower than PostgreSQL? But they are real ACID!
      6. Oh, they're not? Wikimedia is screwed? ... (WE ARE HERE)

      I wonder how many years will have to pass before people will finally get over it and admit that MySQL is not a serious database...
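
      As a concrete illustration of the atomicity part of ACID discussed above, here is a minimal sketch using Python's bundled sqlite3 module. The `accounts` table and the `transfer` helper are hypothetical names invented for this example. It only shows that a failed transaction leaves no partial update behind; it runs against an in-memory database, so it says nothing about durability across a real power failure.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between two rows inside a single transaction."""
    # `with conn` commits if the block succeeds and rolls back if an
    # exception escapes, so both UPDATEs apply or neither does.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        row = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if row[0] < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and is committed
try:
    transfer(conn, "alice", "bob", 500)  # fails midway and is rolled back
except ValueError:
    pass

print(balances(conn))
```

      The second transfer debits the first row before it fails, yet the rollback restores it, which is the behaviour the "A" and "C" are supposed to guarantee.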

    17. Re:Easy, brain-dead sql db recovery (if possible) by Anonymous Coward · · Score: 0

      The solution is to switch to a database that actually implements ACID (the second letter stands for "Consistency" and the last letter stands for "Durability" which is what failed here).

      So the "C" stands for Consistent? But I thought that the "I" stands for inconsistent... uhm... Gotta go!

      --Anonymous MySQL developer.

    18. Re:Easy, brain-dead sql db recovery (if possible) by Anonymous Coward · · Score: 0

      I wouldn't characterize only choosing to do business with inferior tools when superior tools are widely available because the inferior tools conform to some arbitrary standard of ideological acceptability "ethics."

      My point sailed far over your head, didn't it?

      "arbitrary standard of ideological acceptability"? Did it ever occur to you that it's not some arbitrary standard, but a carefully chosen course of action? And that the process of deciding what is and isn't acceptable behaviour is...wait for it... known as ethics?

      Journalists, for example, often conform to certain ethics, such as attempting to tell the truth. It could probably help their careers if they just made stuff up occasionally, but that would go against their ethics.

      In the case of software, it might be the case sometimes that you can, in fact, get ahead quicker with closed-source tools. But that goes against some people's ethics, because they are forbidden from sharing those tools with their friends.

      You might disagree with these values, but you have no business saying that it's not a decision resulting from ethics.

      Their maniacal crusade against anybody with the slightest interest in profiting from the sale of computer software

      You are either ignorant or a troll. Buy GNU software from the FSF

    19. Re:Easy, brain-dead sql db recovery (if possible) by Jamesday · · Score: 1

      Agreed.

      Oracle is still not going to happen. Not going to get into a potential huge recurring license and support fee situation. The cost/benefit trade for it just isn't right for this job - license and support fees can buy other solutions instead.

    20. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      to caricature it as...

      You didn't finish your sentence.

      Any so-called worldview you're working with here would condone Mengele in the same sentence as DB2

      What?

    21. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      Did it ever occur to you that it's not some arbitrary standard, but a carefully chosen course of action?

      The possibility occurred, sure. But you don't have to be a brain surgeon to see that that's not what's going on here. It's totally political.

      In the case of software, it might be the case sometimes that you can, in fact, get ahead quicker with closed-source tools. But that goes against some people's ethics, because they are forbidden from sharing those tools with their friends.

      We have a word for that. The word is "stupid." Just because there's a rationalization for a stupid position doesn't make the position any less stupid.

      You are either ignorant or a troll.

      Hint: Somebody who disagrees with you is not automatically uninformed or insincere. We have a word for that, too. That word is "arrogant."

      So far you're stupid and arrogant. Wanna go for number three?

    22. Re:Easy, brain-dead sql db recovery (if possible) by Tough+Love · · Score: 1

      I wouldn't characterize only choosing to do business with inferior tools when superior tools are widely available because the inferior tools conform to some arbitrary standard of ideological acceptability "ethics."

      There is nothing whatsoever inferior about PostgreSQL or Ingres. Go away, troll.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    23. Re:Easy, brain-dead sql db recovery (if possible) by Leo+McGarry · · Score: 1

      Go away, troll.

      Were you drunk when you posted this? Were you on some kind of medication? Are you mentally ill in some way?

      I'm just wondering what possible excuse you could have for writing something so mind-blowingly rude.

      Whatever it is, it's sure to be a good one. Right?

    24. Re:Easy, brain-dead sql db recovery (if possible) by Tough+Love · · Score: 1

      Were you drunk when you posted this? Were you on some kind of medication? Are you mentally ill in some way? I'm just wondering what possible excuse you could have for writing something so mind-blowingly rude.

      There is no such thing as being too rude to a troll.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    25. Re:Easy, brain-dead sql db recovery (if possible) by Anonymous Coward · · Score: 0

      But you don't have to be a brain surgeon to see that that's not what's going on here. It's totally political.

      So far you have offered absolutely no reason for believing this. Substantiate that claim, please.

      We have a word for that. The word is "stupid."

      Choosing a particular course of action over another, more profitable action is not stupid when it is done for ethical reasons. You wouldn't call journalists stupid for choosing the truth over getting ahead in their careers, would you?

      Somebody who disagrees with you is not automatically uninformed or insincere.

      You stated that GNU and the FSF are radically opposed to selling software. That is completely false, the fact is easily verifiable and common knowledge.

      Now, if you didn't make that claim insincerely, and you weren't merely ignorant of the truth... what is your explanation for making that untruthful claim?

      So far you're stupid and arrogant.

      What basis do you have for calling me stupid? I never said that I agreed with their ethics, merely that it's an ethical decision not arbitrary wanking.

  52. Re:Paris Hilton Sidekick Hacked Photos and Phone N by Anonymous Coward · · Score: 0

    What is most pathetic is that those actual, real stars all gave their numbers to that moron. Money talks, I guess.

  53. *SIGH* Isn't it obvious I wasn't trolling? by FunWithHeadlines · · Score: 1

    I don't know who I angered, but I'm getting modded down more than usual, including at least three different moderators who voted the parent post a Troll. A troll? Anyone reading it can see I was making a joke based on what the submitter said. Not funny? OK, that's a valid criticism, for everyone has their own view of humor, and I respect that. But a troll? Wow, what did I do to them?

    1. Re:*SIGH* Isn't it obvious I wasn't trolling? by Anonymous Coward · · Score: 0

      quit whining

  54. Oh...ok... by buffy · · Score: 2, Funny

    So, _this_ is where I should be posting my outage reports! And here I've been sending them only to people who would care.

    "Slashdot...outage reports for nerds! Stuff that doesn't matter to me!"

    Lol!

    -buf

  55. Yea! Death to "Wiki" and everything "Wiki" related by Anonymous Coward · · Score: 0

    The sooner that asinine word exits the lexicon the better.

  56. MySQL not ACID by Tough+Love · · Score: 2, Insightful

    From the wikipedia page:

    At about 14:15 PST some circuit breakers were tripped in the colocation facility where our servers are housed. Although the facility has a well-stocked generator, this took out power to places inside the facility, including the switch that connects us to the network and all our servers. (Yes, even the machines with dual power supplies -- both circuits got shut off.)

    After some minutes, the switch and most of our machines had rebooted. Some of our servers required additional work to get up, and a few may still be sitting there dead but can be worked around.

    The sticky point is the database servers, where all the important stuff is. Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state.


    (Bolding mine.) This proves that MySQL is not ACID; there is no way that a power outage is supposed to cause corruption in a database. This is not a troll; this is a simple conclusion. I really think that Wikipedia should switch to PostgreSQL, which is considerably more mature in terms of ACID compliance.

    --
    When all you have is a hammer, every problem starts to look like a thumb.
    1. Re:MySQL not ACID by Heikki_Tuuri · · Score: 1

      Hi! InnoDB does have a transaction log, and uses fsync() to write data to disk, like PostgreSQL and almost all transactional databases do. InnoDB is an 'ACID' database. The problem in this case was that apparently fsync() did not write the data to disk, or to a non-volatile disk cache. There is not much a database can do in this situation.

      Regards,
      Heikki
      Innobase Oy
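
      The log-then-fsync() pattern described above can be sketched roughly as follows. This is not InnoDB's code; `append_durably` and the record format are hypothetical. Note that `os.fsync` only asks the operating system to flush its cache. Whether the bytes actually reach the platter or a non-volatile cache still depends on the drive and controller honouring the request, which is exactly the failure described in this comment.

```python
import os
import tempfile

def append_durably(path, record):
    """Append one log record and ask the OS to push it to stable storage."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # Python's buffer -> operating system
        os.fsync(f.fileno())  # operating system cache -> storage device

def read_log(path):
    """Return the log records currently stored in the file."""
    with open(path, "rb") as f:
        return f.read().splitlines()

log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
append_durably(log_path, b"BEGIN 1")
append_durably(log_path, b"COMMIT 1")

print(read_log(log_path))
```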

    2. Re:MySQL not ACID by Anonymous Coward · · Score: 0

      fsync() is not enough.
      http://www.postgresql.org/docs/8.0/interactive/wal.html

      Cheers.

  57. Taken down by CO$, coincidence or not! by friendscallmelenny · · Score: 3, Funny
    Yesterday they had frontpage Scientology entry with Xenu stuff. I told my friend, "That site will be in trouble soon."
    He thinks I'm a god now!

    perhaps I just inadvertently reached clear

  58. I just submitted a story... by Refrozen · · Score: 1

    I hope I don't get in trouble, but I submitted a story that said "My site went off line during the reboot to upgrade the kernel, then, went down for a few seconds while overwriting files to implement the new Refrozen Upload"

  59. WHO CARES by joeware · · Score: 1

    Big deal, who cares. Do I really need to know when some group's servers crash?

  60. Re:Shooting pains in my left arm by ptbarnett · · Score: 1
    Phil, I'm sure someone is going to give you a hard time about responding to a troll.

    It probably is a troll. But, maybe someone will think the joke was funny enough to remember the symptoms and recognize them if they happen for real at some point in the future.

  61. Notice their Database worries by iwadasn · · Score: 2, Funny


    Apparently one of their MySQL databases got corrupted as well. Figures. You'd think with all that volume they'd be wise enough to use a DB that can withstand a hard powercycle without losing data.

    Just remember, friends don't let friends use MySQL for important data.

    1. Re:Notice their Database worries by isorox · · Score: 1

      I work for News at a major UK broadcaster called The ZZZ. We use a supplier called XXXX to keep track of a couple-thousand hours of broadcast quality video. Central to all this is a program called YYYY - it links all the fragments of video and audio (about 5 frames long each) into a clip. An American Sports channel uses the same system.

      Thursday afternoon we lost 7 hours as their replication controller got its knickers in a twist and replicated a 12-hour clip backwards. We're still in testing so it wasn't critical, but a database that would allow this is not worthy of mission-critical use.

      Of course, by the time we had noticed the problem, restoring from a backup (which XXXX insist we don't need) would have meant losing just as much video. Merging the databases may have worked, but some areas of storage would have been overwritten.

      All very bad.

    2. Re:Notice their Database worries by iggymanz · · Score: 1

      yup, there's a few open source databases out there that seem to rebound from power failures very nicely, you can just have them restart in the rc scripts, but I've seen too many mysql backends LOSE DATA. It's very distressing to me to see otherwise very cool projects, like rubyonrails, fall in love with a toy like mysql & build on that crap.

    3. Re:Notice their Database worries by EDSdrone · · Score: 1

      Funny, I used to work for the ZZZ, in White City. My Sig. Oth. still works there and there's a bloke on her train who's implementing a broadcast archive to digital storage project. It sounded interesting, VERY large scale project..

    4. Re:Notice their Database worries by isorox · · Score: 1

      Wow, your signature has its own job?

      Oh.

      What train?

  62. Re:URI to the Rescue - Cisco Distributed Director by joejoejoejoe · · Score: 1
    I'm sure there are many devices and technology that break the one ip to one dns name, heck even dns breaks that with round-robin addressing...

    But as for hardware that can be used to serve two instances of a website, Cisco makes a product called Distributed Director.

    From the product description:
    Cisco DistributedDirector efficiently distributes Internet services among globally dispersed Internet server sites by leveraging the intelligence built into the Internet router-based infrastructure, standard Domain Name Services (DNS), and the Hypertext Transfer Protocol (HTTP). With DistributedDirector, customers can optimize server load distribution resulting in superior end-to-end server access performance.
    I am only mildly familiar with Distributed Director, but it gives different IP answers to DNS queries based on some formulas, one of which can be whichever server farm is considered closer to the client.

    In the case of this or a planned outage, with DD you can take a site (i.e. the down site) out of the active config. DD is for geographically dispersed server farms.
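
    To make the idea concrete, here is a rough sketch of "answer the DNS query with the nearest site that is still in the active config". It is not Cisco's algorithm or configuration syntax. The `Site` class, the site names, the distance figures and the IP addresses (taken from the reserved documentation ranges) are all assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    ip: str
    active: bool = True   # False = taken out of the active config

def pick_site(sites, distance_to_client):
    """Return the IP of the nearest active site, or None if all are down."""
    candidates = [s for s in sites if s.active]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda s: distance_to_client[s.name])
    return nearest.ip

sites = [Site("site-a", "192.0.2.10"), Site("site-b", "198.51.100.10")]
distance = {"site-a": 20, "site-b": 110}   # hypothetical metric, e.g. ms

answer_normal = pick_site(sites, distance)     # nearest site wins
sites[0].active = False                        # site-a loses power
answer_failover = pick_site(sites, distance)   # clients are sent to site-b

print(answer_normal, answer_failover)
```

    The point of doing this at the DNS layer is that clients need no changes: they simply receive a different address once the dead site is removed.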

    Cisco also makes a product called Local Director (both of these may have been replaced with "Intelligent Director" in some part, IDK anymore). LD allows you to balance across web servers for example (in the same server farm).

    Also as for a big caching system, most of the time I think the people that are serving something want to be the ones to serve it, directly. Reasons for not using your suggestion could include security, advertising revenue based on traffic stats, etc.
    --
    Silly Rabbit: tricks are for kids.
  63. Re:Shooting pains in my left arm by Phil+Karn · · Score: 1

    Could be. But we took it offline and exchanged a few more messages that make me think it was more likely to have been real.

  64. Rash of outages by RomulusNR · · Score: 1

    LJ was down, WP is down, Server Beach had an outage two weeks ago, and I at least have had the misfortune to have my ISP down for a week. Is it me, or does it seem like colocation center outages are becoming rampant lately?

    --
    Terrorists can attack freedom, but only Congress can destroy it.
  65. Re:Paris Hilton Sidekick Hacked Photos and Phone N by Anonymous Coward · · Score: 0

    OK, the "Eminem" voice has an Aussie accent, just like the douchebags that supposedly made the call. I call bullshit on this one.

  66. :::eyes UPS under table::: by shoemakc · · Score: 4, Funny

    :::eyes my UPS::::

    ::::ponders for a moment::::

    :::eyes the serial cable that gracefully shuts down said computer in the event of a power failure::::

    :::ponders some more::::

    :::eyes the spare UPS sitting in the corner that used to be connected to a database server::::

    Hmm, I think i'm almost onto something here, but i just can't seem to nail it down...

    -Chris

    --
    --an unbreakable toy is useful for breaking other toys--
  67. Fortunately, Wikicities is still online . . . by greenreaper · · Score: 2, Informative
  68. Proper fundraising link by Jugalator · · Score: 3, Informative

    The link in the article is broken, here's the proper one:
    http://wikimedia.org/fundraising/

    --
    Beware: In C++, your friends can see your privates!
  69. Consider, for a moment, by Anonymous Coward · · Score: 0

    that someone might as well trip the power cord between the UPS and the computer. Your UPS doesn't look so good now, does it?

    There are valid reasons for real databases having proper disaster recovery mechanisms built-in.

  70. Yep, she's a hottie. by Grendel+Drago · · Score: 1

    Man, I'd hit it. You hear that, kturner? I'd hit it! Judging from the back of her neck, that is.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  71. Points of failure. by Grendel+Drago · · Score: 1

    There are also valid reasons for having more than a single point of failure in a system.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  72. Doesn't make sense. by Grendel+Drago · · Score: 1

    You'd think that, since MySQL has been around for a number of years, and because other databases have it, that high reliability would have been contributed or at the very least funded by somebody.

    Maybe the performance penalty it incurs is prohibitive---one can run the site reliably, or one can run it fast, but not both. Ugh, what a choice.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:Doesn't make sense. by stephenbooth · · Score: 1

      Reliability does cost. Firstly in development: it's something else that has to be coded, and it usually involves a fairly high degree of complexity, so it is not simple to get right. Where the user will notice the cost is in performance. If you're not worried about reliability or data integrity then you can delay writing transactions to disk (disk I/O being one of the slower things that databases do) until there's no other activity on the system or you run out of memory to cache the transactions in. When you are worried about reliability then you have to write out a transaction as soon as it is committed; even if you've got a million and one other things going on you have to write it out then, preferably in more than one place (protects against bad blocks and sysadmins who want to clear space and think: "Well, it's only a log file.").
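
      The trade-off described above can be sketched by counting how often each policy has to force data to disk. `TxnLog` is a toy class invented for this example, not any real database's log. With `sync_on_commit=True` every commit pays for an fsync, and with `False` the writes are deferred and synced once at the end, which is faster but loses anything still buffered if the power goes.

```python
import os
import tempfile

class TxnLog:
    """Toy transaction log with two commit policies (illustration only)."""

    def __init__(self, path, sync_on_commit):
        self.f = open(path, "ab")
        self.sync_on_commit = sync_on_commit
        self.sync_calls = 0

    def _sync(self):
        self.f.flush()
        os.fsync(self.f.fileno())
        self.sync_calls += 1

    def commit(self, record):
        self.f.write(record + b"\n")
        if self.sync_on_commit:
            self._sync()   # reliable policy: disk I/O on every commit

    def close(self):
        self._sync()       # deferred policy relies on this single sync
        self.f.close()

def run(sync_on_commit, n=100):
    """Commit n records and report how many syncs were needed."""
    path = os.path.join(tempfile.mkdtemp(), "txn.log")
    log = TxnLog(path, sync_on_commit)
    for i in range(n):
        log.commit(b"txn %d" % i)
    log.close()
    return log.sync_calls

eager_syncs = run(sync_on_commit=True)
lazy_syncs = run(sync_on_commit=False)
print(eager_syncs, lazy_syncs)
```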

      Stephen

      --
      "Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
  73. Sooo.... by terpri · · Score: 1

    Loco colo severs servers?

  74. Uh huh. by Grendel+Drago · · Score: 1

    If Google let objects get looked up by a URI code as simple as say, [A-Za-z0-9]+ ... just 7 digits would cover each object instance in its database right now, dozens of times over. If Google opened up such a URI protocol to anyone on the Web running such a "DIS" server, just like DNS, they could offload much of the work...

    Yeah. I'm going to go register n8y9vtw before anyone else does. 'Cause everyone knows that the entirety of the namespace you mentioned is useful. Uh huh.
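
    For scale, the size of the quoted namespace is easy to work out. This is only the arithmetic for identifiers drawn from [A-Za-z0-9]; how many objects any search engine actually indexes is not something this sketch assumes.

```python
import string

ALPHABET = string.ascii_letters + string.digits   # [A-Za-z0-9], 62 symbols

def namespace_size(length, alphabet_size=len(ALPHABET)):
    """Number of distinct identifiers of exactly `length` characters."""
    return alphabet_size ** length

seven_char_ids = namespace_size(7)
print(f"{seven_char_ids:,}")   # roughly 3.5 trillion
```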

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
    1. Re:Uh huh. by Doc+Ruby · · Score: 1

      Why not? Oh, you think URLs' value is defined by how catchy they are. That's why there's so much money in typodomain squatting. Right.

      --
      make install -not war

  75. A simple explanation. by Grendel+Drago · · Score: 1

    Clearly, someone set up you the bomb.

    --grendel drago

    --
    Laws do not persuade just because they threaten. --Seneca
  76. Re:This is idiotic. by RPoet · · Score: 1

    Running Wikimedia is insanely costly. Did you donate to them? (I thought not)

    --
    "Oppression and harassment is a small price to pay to live in the land of the free." -- Montgomery Burns.
  77. Re:This is idiotic. by the+angry+liberal · · Score: 1

    Nope. I don't use them.

    Perhaps they should just give up since it costs so much.

  78. At the risk of pointing out the obvious... by Headcase88 · · Score: 2, Funny

    So now they'll have to put up a page to say "The temp page that says that our site is down is down. We are working around the clock to get the temp page back up.".

    --
    "When the atomic bomb goes off there's devastation...but when the atomic bong goes off there's celebraaaaation!"
    1. Re:At the risk of pointing out the obvious... by Anonymous Coward · · Score: 1, Funny
      The temp pages have been completed in an entirely different style at great expense and at the last minute.

      Signed, Joe Q. Llama

  79. Re:This is idiotic. by RPoet · · Score: 1

    Many of us do use them, many times each day, and find it an incredibly useful resource. It's just too bad it's so slow, and even worse when they're completely down like now, so we donate what we can.

    --
    "Oppression and harassment is a small price to pay to live in the land of the free." -- Montgomery Burns.
  80. Linux Kernel bug?!? by peterwilm · · Score: 2, Interesting

    I recall a discussion about fsync not being properly implemented both in Linux Kernel 2.4 as well as 2.6. I think it was patched in 2.6.9 or so, but not in 2.4.

    Unfortunately, I cannot find the thread any more. Does anybody remember?

    So, this might be rather a linux kernel bug, not a mysql bug.

    Secondly, why does everybody say that MySQL does not support ACID transactions? MySQL does advertise them. Are you talking about pre-4.0 MyISAM tables? Or do you suggest that 4.0/4.1 InnoDB tables aren't ACID-compliant either?

    1. Re:Linux Kernel bug?!? by peterwilm · · Score: 1

      mentioned LKML thread (and it was 2.6.5)

    2. Re:Linux Kernel bug?!? by Anonymous Coward · · Score: 0

      Secondly, why does everybody say that mysql does not support ACID-transactions?

      For two reasons:

      1. During the whole time MySQL wasn't ACID, practically every MySQL fanboy laughed when somebody complained that it wasn't ACID, saying that it wasn't necessary/it was bloat/it's too slow/etc. That's ingrained in many people's memories as MySQL people being dumb, ignorant, arrogant, and unwilling to add ACID.
      2. Even today, when you ask for InnoDB tables, it can quietly give you non-ACID tables without even bothering to tell you.
  81. more servers.. by martin · · Score: 1

    need servers in different co-los.

    Google suffered the same problem and that was the last time they had all their eggs in one basket (err, servers in one co-lo).

    1. Re:more servers.. by C00lCat · · Score: 1

      Best practice is to have several servers around the world. 1 in California, 1 in New York, 1 in central Europe, 1 in Japan or some other Far Eastern country. Why not Korea? Koreans have too much bandwidth for my taste...

      --
      "All your base" used to be fun.
  82. Re:This is idiotic. by vidarh · · Score: 1


    Yeah, because you should always give up whenever you meet some resistance, because you're destined to fail anyway.
    </sarcasm>

  83. Back up by Anonymous Coward · · Score: 0

    It seems to be working in read-only mode now. Now let's see how the single slave DB server reacts to the upcoming slashdot effect.

    1. Re:Back up by wongn · · Score: 1

      It seems to be currently working on a copy that is a good 35+ hours old. I'm not sure about anything more because the firewall here is allowing me access to neither the Google cache nor the LiveJournal technical blog (kinda ironic)

    2. Re:Back up by Meowing · · Score: 1

      Yes, it's currently behind. The site is up read-only while the logs play back and fill in the missing edits. The devs don't have an ETA for "back to normal" but at least it's there again!

  84. Snapshots by Anonymous Coward · · Score: 0
  85. Wikimedia-Wikipedia by essreenim · · Score: 1
    Yes, in England, and indeed all other English speaking countries except America, sublety survives. There is a silent 'a' but I guarantee you people (at least me anyway) pronounce 'paedo' differently to 'pedo'. We prounce it 'pee-do'. I have no idea how you pronounce it, probably 'pehdo'.. Anyawy, why can't you learn to pronounce and spell and stop changing the language.

    Anyway, I'm a big fan of the Wikimedia foundation and recently donated a small sum to their cause. I will donate again later if they still run and they need help.

    1. Re:Wikimedia-Wikipedia by essreenim · · Score: 1
      Anyawy,

      And that goes for me too!. But at least I'm not trying to change the language.

      Also, I'd like to add that I think Google's donation to Wikipedia is a good gesture. It is the first evidence I have seen of them repaying something to free software for what free software has done for them.

      May it continue.

  86. What about the pr0n servers? Doesn't anybody care? by Anonymous Coward · · Score: 0
    Dozens of pr0n servers in Southern California have gone off-line because of the heavy rains.

    If Slashdot has decided that it's a priority to report when popular servers experience downtime, shouldn't it let us know the status of our favorite pr0n servers, too?

    How much longer do I have to wait before I can once again get access to my daily pr0n? I need to know!

  87. Flaws in Wikipedia by ivanjs · · Score: 1

    The concept of wikipedia is a good one, but I experienced first hand a flaw in their system of open moderation. I wrote an article that used images from my own website that I created, and someone registered my entry as possible copyright violation of my own work, even though I clearly stated on the entry that the images were mine! So my entry has lingered in CV limbo awaiting judgement. Laughable at best... ivanjs

    1. Re:Flaws in Wikipedia by wongn · · Score: 1
      I wrote an article that used images from my own website that I created, and someone registered my entry as possible copyright violation of my own work, even though I clearly stated on the entry that the images were mine!
      Did you correct them? There are doubtless a few too many editors who, in haste, may often tag pages overzealously, but enough do well enough in checking to see if they truly are copyvio by reading notices. Your experience is only one example of a flaw, but no system is perfect in its effect, and for the vast part, the open moderation system *does* work.
    2. Re:Flaws in Wikipedia by ivanjs · · Score: 1

      Yes, I filled out the thing you're required to do to stop the article from being deleted, but it's still in CV. Even funnier-the guy who threw it into CV further said "well, if it's not copyright violation, it certainly is an example of neologistic vanity" or something similar. So he threw it into CV, without REALLY thinking it was CV because he didn't like the fact that I used images illustrating the concept? That's what the CV area is for???? Again, I LIKE wikipedia-that's why I put an entry in, because I liked the idea of community contribution, but when someone has the power to serve up your entry for deletion because of personal choice disguised as copyright violation, that's pretty lame. John

    3. Re:Flaws in Wikipedia by Anonymous Coward · · Score: 0

      You consider that a flaw? If someone else had been copying the information and pictures from your website without your permission, wouldn't you be GLAD that Wikipedia does the best it can to quickly find and eliminate copyright violations? Of course they get false positives sometimes. That's why the CV page is there -- to allow multiple eyeballs and common sense to separate the wheat from the chaff, and not just delete every suspicious item on sight in order to cover their asses.

    4. Re:Flaws in Wikipedia by wongn · · Score: 1

      What article was it anyway? If (though I doubt it, I just don't know anything here) it was original research, then it wouldn't be allowed on those grounds. I don't have any objections on "vanity" grounds, and wouldn't, but each editor to their own.

    5. Re:Flaws in Wikipedia by C00lCat · · Score: 1

      Well, that's bad, but what's worse is controversial arguments like the ones we have on religion, abortion, "freedom fighters" and all the other "good" stuff. As both parties believe they are right, it ends up painful. Some topics should be locked.

      --
      "All your base" used to be fun.
  88. Conspiracies by Anonymous Coward · · Score: 0

    There was a dispute last night that didn't go Jimbo's way, so maybe he pulled the plug? Makes for nice conspiracy anyway

  89. Oh No!! by Anonymous Coward · · Score: 0

    They should have used a UPS...we use it all the time!!

  90. It's not SATA by Jamesday · · Score: 2, Informative
    The best copy we have is on a lowly pair of 250GB SATA drives using Linux RAID 0, and since that's the best, it's the one we used.

    Every main database server had corrupt database pages. That is, 3 systems with battery backed up write caching controllers and SCSI drives and 2 SATA systems with write caching SATA controllers, one battery backed up, the other not, with two different SATA disk drive makers.

    Involved:
    • Two completely different caching controller brands
    • Two different SATA drive makers
    • Seagate only on the SCSI drive maker side

    Obvious speculation involves the controllers not telling the drives not to write buffer or the drives not listening. No point in getting into SCSI or SATA or this disc controller or that controller fights when there's this much variation involved.
  91. Mirror by Anonymous Coward · · Score: 0

    I've set up www.WikiMirror.com a while back.

    Let's see how it manages to survive the Slashdot effect ;)

    Note: Some pages might be out of date

  92. Write-ahead logging by Jamesday · · Score: 1

    Yes, so does the InnoDB engine in MySQL. Doesn't help so much when the drive system has lied about which page writes have been committed to disk and you have a RAID system where a page is spanning a couple of drives and one wrote its update to the page while the other didn't, which is what I speculate happened. That leaves you with a database page which fails its checksumming. Can recover from individual bad pages but not worth doing it when there's a complete copy available. Since we don't need to recover them, we aren't. Copying from one with no apparent damage instead.

    Did, of course, make a copy of the databases before trying to restart some of them, so we'd have that recovery option if it turns out that we need it.
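
    The torn-page scenario speculated about above can be illustrated with a small sketch. It is not InnoDB's page format or checksum algorithm. The 16 KB page size, the CRC32 trailer and the helper names are assumptions chosen for the example. Half of the page is taken from the new version (the drive that wrote) and half from the old version (the drive that did not), and the checksum no longer matches.

```python
import zlib

PAGE_SIZE = 16 * 1024     # assumed page size for this sketch
HALF = PAGE_SIZE // 2     # assume the page is striped across two drives

def make_page(payload):
    """Pad payload to PAGE_SIZE - 4 bytes and append a CRC32 trailer."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "big")

def page_is_valid(page):
    """Recompute the checksum and compare it with the stored trailer."""
    body, stored = page[:-4], int.from_bytes(page[-4:], "big")
    return zlib.crc32(body) == stored

old_page = make_page(b"revision 1 of the article")
new_page = make_page(b"revision 2 of the article")

# Torn write: the first drive stored its half of the new page, the
# second drive still holds its half (and the trailer) of the old page.
torn_page = new_page[:HALF] + old_page[HALF:]

print(page_is_valid(old_page), page_is_valid(new_page), page_is_valid(torn_page))
```

    Detection is the easy part. Repairing such a page needs an intact copy from somewhere else, which is why copying from an undamaged server was the simpler route here.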

    1. Re:Write-ahead logging by fimbulvetr · · Score: 1

      That's exactly the point I've been trying to make, it's just that you've done it much more succinctly. Thank you.

    2. Re:Write-ahead logging by Jamesday · · Score: 1

      Happy to oblige, since I really do understand what's happening with the systems I'm one of those looking after. And have been tracking the various investigations LiveJournal did.

      I will be chatting with MySQL about this though. Two big sites with nice controllers having the same problem (and in our case, two different vendors) means there's something they should really look into handling better, somehow. Certain that we aren't the only people affected, just the prominent ones who are willing to write about it in public.

    3. Re:Write-ahead logging by smitty45 · · Score: 1

      just out of curiosity...it wasn't 2.6.x kernel running on Opterons, was it ?

      and perhaps mysql using O_DIRECT ?

    4. Re:Write-ahead logging by Jamesday · · Score: 1

      Yes, it was 2.6.9 on Opterons (2.6.9-1.6_FC2smp and 2.6.9-1.681_FC3smp) . Don't know without checking about O_DIRECT. The one which survived completely intact is a P4 on 2.6.9-1.11_FC2.

    5. Re:Write-ahead logging by smitty45 · · Score: 1

      Heikki Tuuri has been talking quite a bit about instability of InnoDB on AMD Opterons, especially on 2.6.

      The fact is - it's just plain unpredictable and IMHO, broken. Andrew Morton has been given ample evidence of this, and it certainly looks like the majority of the problem lies in either virtual memory, or the 2.6 I/O scheduler.

      Friendster suffered the same problems, and came to one conclusion: use 2.4, until 2.6 is fixed.

      My current work shows the same thing: even a CHECK TABLE would fail with "page corruption", and it would appear as if it's random.

      FWIW, there's even a report on the mysql mailing list of a guy running his own test...he memory-mapped an entire 4gb or so, and got back different results when it was re-read. Very scary.

      Either way, I sympathize with you. :)

  93. OT? Operating Thetan? by Anonymous Coward · · Score: 0
    Oh well.. Slightly OT

    Slightly Operating Thetan?

    I don't think so. Sounds to me like an ARC-break!

    1. Re:OT? Operating Thetan? by Anonymous Coward · · Score: 0

      ot = offtopic

      -someone else

    2. Re:OT? Operating Thetan? by Anonymous Coward · · Score: 0

      Oooooo, so close, you almost got the joke. Better luck next time!

  94. More conspiracies... by Anonymous Coward · · Score: 0

    There was a dispute last night that didn't go Jimbo's way, so maybe he pulled the plug? Makes for nice conspiracy anyway...

    As long as the servers are not "outsourced", there is little transparency in WP. Google, please

  95. How much do we have to pay by Anonymous Coward · · Score: 0

    to keep them from tripping over the powercord again?

  96. Re:Shooting pains in my left arm by Anonymous Coward · · Score: 0

    no worries. Living in Ontario Canada there aren't any doctors. Government cutbacks and low Doctor pay.

  97. Indeed. by Anonymous Coward · · Score: 1, Insightful

    Sometimes the history of an article says just as much (if not more) than the article itself.

  98. Itemized spending is standard by SeanDuggan · · Score: 1

    Must you really know what the money is being spent on?
    If you donate money, you are asking them to continue to offer their great service to you and other people. How they achieve that goal, is up to them, no?

    Except, well, every major charitable organization of decent size issues itemized reports as to where the money is budgeted and how last year's money was actually spent versus budget. So yes, I will ask the organization how they plan to spend my money before giving it. Well, except for webcomics... After all, we already know they need the money for server costs, pens, crack...ers. Can't do without those saltines.

    --
    This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
  99. Latest news by saforrest · · Score: 4, Informative

    Posted on the mailing list wikipedia-l 32 minutes ago:

    From: Brion Vibber
    Reply-To: wikipedia-l@wikimedia.org
    To: Wikipedia-l, Wikimedia Foundation Mailing List, Wikimedia developers
    Date: Tue, 22 Feb 2005 04:47:56 -0800
    Subject: Re: [Wikipedia-l] Wiki Problems?

    Brion Vibber wrote:
    > There was some sort of power failure at the colocation facility. We're
    > in the process of rebooting and recovering machines.

    The power failure was due to circuit breakers being tripped within the colocation facility; some of our servers have redundant power supplies but *both* circuits failed, causing all our machines and the network switch to unceremoniously shut down.

    Whether a problem in MySQL, with our server configurations, or with the hardware (or some combination thereof), most of our database servers managed to glitch the data on disk when they went down. (Yes, we use InnoDB tables. This ain't good enough, apparently.)

    The good news: one server maintained a good copy, which we've been copying to the others to get things back on track. We're now serving all wikis read-only.

    The bad news: that copy was a bit over a day behind synchronization (it was stopped to run maintenance jobs), so in addition to slogging around 170gb of data to each DB server we have to apply the last day's update logs before we can restore read/write service.

    I don't know when exactly we'll have everything editable again, but it should be within 12 hours.

  100. Imagine by Anonymous Coward · · Score: 2, Funny

    Imagine what would happen if there were a link on Wikipedia's main page to Slashdot and from Slashdot back to Wikipedia... Boom?

    1. Re:Imagine by Anonymous Coward · · Score: 0

      Internet would explode :D

  101. Re::::eyes UPS under table::: by Anonymous Coward · · Score: 0

    ::::kicks out the cord between your computer and the UPS, which is analogous to what happened to Wikipedia::::

  102. What does that have to do with anything? by Anonymous Coward · · Score: 0

    They are using MySQL's transactional tables. It still corrupts the whole damn table a lot of the time when it's unexpectedly terminated. Real databases do not do this, they will not corrupt the table.

  103. This is exactly what I've been worrying about. by EDSdrone · · Score: 1

    Today in fact. I built a wiki to work as a knowledgebase. It works superbly on a small scale, and there's a possibility of using it on a larger scale. My main concern is the database; I'd like to port it to DB2 (we're an IBM shop) as I don't want to be the one that gets fired when what happened to Wikimedia happens to me. Might even get them to hire someone to do the job, if they're feeling generous...

  104. "fairly furiously"? by Anonymous Coward · · Score: 0

    Meanwhile, the devs are working fairly furiously to get it back up (Kate hasn't slept in 27 hours

    That doesn't sound right.

    "Fairly" means moderately, while "furiously" is pretty extreme.

    You could say something's "fairly big" which would mean it's pretty big, or you could say it's "gigantic" which means it has extreme size. But you'd never say it's "fairly gigantic", since the meanings of the two words conflict.

  105. Re:Power outages suck. - Conspiracy theory... by philipdl71 · · Score: 1

    I know the wikimedia folks are fundraising for more servers, but I wonder if this will provide more incentive to accept Google's offer?

    What you mean to ask is if Google's team of secret agents sneaking into the colocation facility and tripping the circuit breakers will result in wikipedia deciding to accept Google's offer and thus further their plans for world domination.

  106. Google cache without highlighting by Anonymous Coward · · Score: 0

    In case anyone else finds the highlighting as distracting as I do:

    Google cache of Wikipedia's Xenu article without highlighting

  107. Power overwhelming... by C00lCat · · Score: 1

    I am surprised their servers are so fragile. A UPS/surge protector, aside from generators, can do wonders... I thought they had some kick ass backup power... Although Wikipedia seems to be back, rather slow but better than none.

    --
    "All your base" used to be fun.
  108. In related news.... by AviLazar · · Score: 1

    Wikipedia updates one of its records regarding Slashdot.org (a.k.a. /.)

    The new record reads "You will rue this day /., you will rue this day..."

    --

    I mod down so you can mod up. Your welcome.
  109. Re:URI to the Rescue - Cisco Distributed Director by Doc+Ruby · · Score: 1

    Yes, that's true - I've used DD (and its predecessors) since 1999. It's got exactly the limitations addressed by URIs. Because URIs are a higher-level layer, while DD is a lower-level layer, addressing the problems of distributed objects keyed as URLs. DD makes you clone entire webservers for a single distributed object. And doesn't let you distinguish between versions, or other state differences. I expect that Cisco will make terrific products supporting URIs when we get software that uses more than URLs.

    --
    make install -not war

  110. Jonathan is misinformed! by dahamsta · · Score: 1

    Heh, from the Day 2 Fund Drive report:

    "What can I say? I owe half of what I know to the Wikipedia! Keep up the good work." by Jonathan Grose

    So a full quarter of what Jonathan knows is misinformed, inaccurate, or 1337 speek!

    (Nah, really I love Wikipedia. But Jonathan was asking for it.)

  111. Re:This is idiotic. by Anonymous Coward · · Score: 0

    Ok... but that's the point. YOU use them, have YOU donated? No? STFU about slowness then!

  112. ACID by Jamesday · · Score: 2, Informative

    Except it's now been a few years since MySQL incorporated InnoDB, so maybe it's time to move on and rejoice that it's now one of the free database servers with ACID support? This one happens to come with standard replication and fulltext search. Also with a range of other engines to choose. PostgreSQL, last I knew, doesn't have built-in replication, fulltext search and alternative storage engines but has its own particular strengths. In the end, every end user gets to benefit from the competition between excellent tools. Good for us all to be happy about that.

    1. Re:ACID by Anonymous Coward · · Score: 0

      Except it's now been a few years since MySQL incorporated InnoDB, so maybe it's time to move on

      I'm not saying that this concept is justified, I'm merely offering it as a partial explanation as to why people are so dismissive of MySQL's ACID. And I believe the complaint about it silently substituting non-ACID tables is still valid with the latest release.

      Also, people being told that ACID was merely fluff over and over by fanboys are not going to be inclined to "rejoice" now the people who told them that have finally figured out that it is, in fact, a really big requirement.

      PostgreSQL, last I knew, doesn't have built in replication, fulltext search and alternative storage engines

      Replication and full-text search are certainly available for PostgreSQL; they aren't "built-in" (although it does ship with a full-text search implementation, it's not installed by default), for example one third-party full-text search engine takes advantage of PostgreSQL's extension mechanisms to do what it needs to.

      Not sure about alternative storage engines, 8.0 does include tablespaces to tweak filesystem usage.

      But isn't it a little hypocritical though, to point out that people complaining about MySQL's ACID are going on out of date information - and then start pointing out PostgreSQL's shortcomings with "last I knew" statements that aren't exactly up-to-date themselves?

    2. Re:ACID by Jamesday · · Score: 1

      Since I'm mostly looking after MySQL servers I know I'm not current on all the latest PostgreSQL developments, so I thought it wise to leave an opening for those who know it better to correct me if I'd missed something new in the last few months. :)

  113. I followed your spam link... by Anonymous Coward · · Score: 0

    Are you tired of getting spam, scams, and malware via email?

    Yes! Not only via email, also via Slashdot comments!

    STOP YOUR Spam/scams/malware NOW!

    NOW? Oh dear God, where do I sign up!!!

    FREE download!

    This is too good to be true!!!!

    Learn how to stop your unwanted email now for FREE!

    All for FREE???

    New email filtering programs for Windows makes spam and malware sent by email 'almost impossible'.

    Where do I send the cheque???

    Now, seriously, I find it utterly hilarious that the solution to spam that READS like spam is spamvertized on Slashdot by its author who doesn't do anything else than spamming Slashdot threads with his spam links! Hilarious.

    Moderators: WAKE UP!

  114. Wikipedia now read-write by Jamesday · · Score: 1

    Wikipedia is now read-write on a limited number of servers. Enough for most things but we still have some features disabled as the rest of the database servers catch up. Any data loss was limited, so far as we can tell at present, to the last few seconds at most.

  115. InnoDB does use WAL by Heikki_Tuuri · · Score: 2, Informative

    Hi!

    InnoDB has used WAL since I wrote it in mid-1990s. To PostgreSQL, WAL came later, around 2000.

    Regards,
    Heikki
    Innobase Oy
  116. My advice by Pan+T.+Hose · · Score: 1

    I am having some pretty severe shooting pains from the base of my neck down to my left wrist. Coupled with a strange "squeezing" in my chest, and shortness of breath, I'm a little worried.

    And you should be worried because those shooting pains in your wrist sound like carpal tunnel syndrome. Try to use your keyboard mostly with your right wrist while the pain is stronger and do some exercises every fifteen minutes or so (push-ups are great for stretching wrists). The shortness of breath has nothing to do with carpal tunnel and is a normal symptom of being overworked. But keep in mind that while a few nice shots of espresso will help you stabilise your breath and feel less tired, only proper exercises can help your overworked wrists.

    Only a couple more hours of work. I might hop out and see the doc then.

    You should try to use your right wrist for the hardest jobs, do a lot of exercises, and if the symptoms don't disappear in a few weeks or months at most you should probably see a doctor. I wish you good luck and I hope your wrist problems will not stop you from posting on Slashdot.

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  117. battery backed RAID issues, geographic diversity by whitis · · Score: 1

    I find it curious that both LiveJournal and Wikipedia were using fancy battery backed RAID controllers and this still happened. I have no personal experience with such controllers but I assume they must work like this: the most recent N writes are stored in battery backed RAM. After power is restored, the controller rewrites ALL of the data in battery backed RAM in case some of it didn't make it out to disk.

    Now there are three scenarios where this could fail, that I can think of. In one, the total cache on all connected disk drives exceeds the total size of the battery backed RAM. Unless there is a message you can send to the drive forbidding it to use a portion of its cache for writes (ok to use all for reads), you are SOL in this case. The second scenario is that the controller clears some of the data in the battery backed RAM after it has been told by the drive that it has been written (via serial number) or after some amount of time (or number of write operations) has passed and this assumption is wrong. The third scenario is a variation of the second in which the controller assumes the drive does not cache writes at all and immediately invalidates the RAM copy as soon as a write xfer operation completes but that is rather pointless. I am ignoring problems due to equipment failures or week long power outages here.

    Write back caching on the drive is loaded with problems and in lieu of such a controller, it would seem that about the only way it can safely be enabled (without supplemental storage on a device without write back cache) is if there is the ability for the operating system to send a query to the drive to ask if a specific serial number of write operation has completed or at least to return the highest serial number such that all serial numbers lower than that have completed. Then the operating system can allow, for example, 15 processes to simultaneously schedule write operations (or one process multiple writes) which the drive can complete in optimal order but none of those processes can continue once they have called flush() until the data has actually been written.
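
    The "highest serial number such that everything below it has completed" query can be sketched in a few lines of Python. This is purely a model of the idea; I am not claiming any real drive or controller exposes such a command, and WriteTracker and its methods are hypothetical names. Here the watermark is the highest serial such that it and every lower serial are on disk.

```python
class WriteTracker:
    """Models a drive that may complete queued writes in any order."""

    def __init__(self) -> None:
        self._next_serial = 1
        self._done = set()     # completed serials that are above the watermark
        self._watermark = 0    # every serial <= this value has been written

    def submit(self) -> int:
        """Queue a write and return its serial number."""
        serial = self._next_serial
        self._next_serial += 1
        return serial

    def complete(self, serial: int) -> None:
        """Record that the drive finished the write with this serial."""
        if not (self._watermark < serial < self._next_serial):
            raise ValueError(f"serial {serial} is not an outstanding write")
        self._done.add(serial)
        # Advance the watermark across any now-contiguous run of completions.
        while self._watermark + 1 in self._done:
            self._watermark += 1
            self._done.remove(self._watermark)

    def watermark(self) -> int:
        return self._watermark

    def is_durable(self, serial: int) -> bool:
        """A flush() for this serial may return only once this is True."""
        return serial <= self._watermark


tracker = WriteTracker()
a, b, c = tracker.submit(), tracker.submit(), tracker.submit()
tracker.complete(c)            # drive reorders: the last write lands first
tracker.complete(a)
blocked = not tracker.is_durable(c)   # True: write b is still outstanding
tracker.complete(b)
released = tracker.is_durable(c)      # True: the watermark has reached c
```

    A process that called flush() after write c stays blocked while b is outstanding, even though c itself is already on disk, which is the conservative behaviour the paragraph above asks for.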

    I assume Wikipedia has something like 100 reads for every write. I wonder if the performance from allowing write caching is really necessary during normal operations (as opposed to database or index rebuilds or replication operations) and if it could be turned off most of the time. This is not a fix for the underlying problems (unless it stays off all the time) but would be a way to improve the odds considerably.

    It seems to me that data center procedures should be designed (and fire codes need to accommodate this) to allow operations to occur in a reasonable sequence. First the alarm goes off. Then a signal is transmitted to all servers to begin shutdown (can be delivered by ethernet). Lights must remain powered until people have a chance to evacuate. Then there is a reasonable delay to allow evacuation of personnel and to allow someone to reach the hold off switches. There would be separate hold off switches for halon and power down. Then halon would be released. Power would not be shut off until servers had time to power down or fire crews needed to enter the area for purposes other than verifying that the fire was out. Even if the fire is being maintained by emergency power, the halon should kill it so a facility with adequate halon should never need to actually do an emergency power off unless perhaps the fire affected an area outside the area protected by halon.

    At a minimum, I would expect the data center to have two complete UPS systems. These would supply power to racks in a cartesian arrangement where each rack was supplied power from two different UPSes, one from a row bus and one from a column bus. With the exception of the emergency power off required by fire codes, it would be very difficult for a rack to lose power on both busses unless there was a short from one b

  118. Re:What about the pr0n servers? Doesn't anybody ca by Qubit · · Score: 1

    You should go make a page on wikipedia that has information about what regional porn servers are up or down.

    Perhaps you could get Netcraft and Vivid Video to co-sponsor a page with up to date information on it -- just like what Netcraft does now. I suggest that you call it

    Vivid Craft ;-)

    (GoDaddy.com says that vividcraft.com is not taken yet!...)

    --

    coding is life /* the rest is */
  119. MOD PARENT UP by Anonymous Coward · · Score: 0

    This was a very good post. Unfortunately, it will be modded down because it dared to say the truth, while the laughable reply by incompetent Wikimedia DBA saying "Since at least one of our MySQL database servers has so far restarted successfully with all InnoDB data intact, perhaps you'd care to reconsider your assessment that MySQL is incapable of doing what it just did?" was obviously modded up... Yes, Jamesday, if *one* database in a cluster (the only one which was not live, I might add) is not corrupted, it means that a database is reliable... Let's just ignore that *every single one* of the live database servers was not reliable, let's concentrate on the one that was off-line and was not processing any updates while the power went down. And the rest of the servers must have had hardware problems, all of them at the same time. What a joke.

  120. not quite by Jamesday · · Score: 1

    There are 8 people who have decided to call themselves that and are doing something. There's no broad community action on it and it's not in any way any sort of official editorial team with any official role.

    Editing articles to a fixed state seems very unlikely to happen, since it's pretty thoroughly contrary to the method by which the project works and the complete and comprehensive objectives of the project. The general result of people trying to do it is them being barred from the project for uncooperative editing.

    Paper and CD are risky targets because they lose the CDA and OCILLA protections which keep wikipedia.org, the Wikimedia Foundation and other contributors very safe from legal action based on content.

  121. Re:Shooting pains in my left arm by Anonymous Coward · · Score: 0

    Frankly, could be a heart attack. I am a doctor. You need to go and see someone IMMEDIATELY, even if the pains subside. I hope that it isn't so, but sometimes chest pain which goes away (unstable angina) is a precursor of a heart attack. IF you are reading this, ask someone to take you to the doctor and get an ECG and your serum enzymes (CPK-MB) done, NOW.

  122. Re:This is idiotic. by RPoet · · Score: 1

    I have indeed donated, out of my poor student budget.

    --
    "Oppression and harassment is a small price to pay to live in the land of the free." -- Montgomery Burns.
  123. shouldn't MySQL handle that? by Trepidity · · Score: 1

    The standard in the "big iron" database world is that no matter what the hardware lies to you about, you can still come up in a consistent state, assuming that there is some time t in the past at which all data up to that point is successfully written to disk. Algorithms for figuring out what hasn't been written yet, even in the face of inconsistent write caches, are probably 30 years old by now.

    Losing recent changes is certainly acceptable, but the DB simply giving up and saying "restore from backup" isn't.
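
    For anyone wondering what "come up in a consistent state as of some time t" looks like, here is a toy Python sketch of one piece of it: finding where the valid part of a log ends. It is not the recovery algorithm of any particular database; the record format (length plus CRC32 header) and the names encode_record and recover are assumptions made up for the example.

```python
import struct
import zlib

HEADER = struct.Struct(">II")   # payload length, CRC32 of payload


def encode_record(key: str, value: str) -> bytes:
    """Serialize one 'key=value' update as a checksummed log record."""
    payload = f"{key}={value}".encode("utf-8")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log: bytes) -> dict:
    """Replay records in order, stopping at the first torn or corrupt one."""
    state = {}
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break   # bad tail: everything replayed so far is still consistent
        key, _, value = payload.decode("utf-8").partition("=")
        state[key] = value
        offset = start + length
    return state


log = encode_record("a", "1") + encode_record("b", "2") + encode_record("a", "3")

intact = recover(log)          # all three records apply
truncated = recover(log[:-2])  # power failed midway through the last record
```

    With the full log, the last update to "a" wins; with the truncated log, recovery stops before the damaged record and comes up with the state as of the second record. Recent changes are lost, but nothing is left half-applied.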

  124. MySQL moving up by Jamesday · · Score: 1

    Big iron (and expected corporate features) is still an area where MySQL is rapidly evolving. I doubt it'll take two years. Likely less with stimulus from high profile incidents.

    Restore from backup for MySQL really means "restore from your backup and replay your binary log until you get back to the point of failure". Or ask MySQL for assistance - they will look at such cases. Neither is as good as I'd like of course - either involves more extended unavailability of data when the site needs to be up, if with incomplete data, within minutes or a small number of hours.
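
    As a rough illustration of "restore from your backup and replay your binary log", here is a toy Python model. It is not MySQL's binary log format or tooling: LogEntry, restore, and the position numbers are invented for the sketch, and the only assumption is that the backup records the log position at which it was taken.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    position: int   # monotonically increasing position in the update log
    key: str
    value: str


def restore(snapshot: dict, snapshot_position: int, log: list) -> tuple:
    """Start from the backup and apply every logged update made after it."""
    state = dict(snapshot)
    position = snapshot_position
    for entry in sorted(log, key=lambda e: e.position):
        if entry.position <= position:
            continue   # already reflected in the backup
        state[entry.key] = entry.value
        position = entry.position
    return state, position


backup = {"Main_Page": "rev 10", "Xenu": "rev 4"}
update_log = [
    LogEntry(1, "Xenu", "rev 4"),
    LogEntry(2, "Main_Page", "rev 10"),
    LogEntry(3, "Main_Page", "rev 11"),
    LogEntry(4, "Slashdot", "rev 1"),
]

# The backup was taken at log position 2; entries 3 and 4 still need applying.
state, position = restore(backup, 2, update_log)
```

    The unavailability mentioned above comes from the fact that both steps take time proportional to the data: copying the backup, then replaying everything logged since it was taken.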

    On the followup side, additional power lines are being run to our racks and discussion with one RAID controller vendor indicates that a maximum of 20 minutes of battery backup can be expected. That's not long enough for a colo situation, so more followup with their engineers is needed, to see if they can produce something more realistic.