Slashdot Mirror


A Note On Thursday's Downtime

If you were browsing the site on Thursday, you may have noticed that we went static for a big chunk of the day. A few of you asked what the deal was, so here's a quick follow-up. The short version is that a storage fault led to significant filesystem corruption, and we had to restore a massive amount of data from backups. There's a post at the SourceForge blog going into a bit more detail, and describing the steps our Siteops team took (and is still taking) to restore service. (Slashdot and SourceForge share a corporate overlord, as well as a fair bit of infrastructure.)

75 comments

  1. oh okay by Anonymous Coward · · Score: 5, Insightful

    oh, I thought some of that shitware they sling got loose and bit them in the ass

    1. Re:oh okay by Anonymous Coward · · Score: 0

      +5 RealityBiteTheirAsses (ts)

    2. Re: oh okay by Anonymous Coward · · Score: 0

      Simple explanation: they had to reboot the system, and then it took all day for systemd to finish whatever it had to do before the system could come up.

  2. NSA backup by roman_mir · · Score: 0, Troll

    Isn't NSA world's most famous backup solution? Just ask them for the data. 'NSA cloud', now for more than just political prosecution and government oppression and corporate theft (and an occasional viewing of your dick). Backup your data with us! It is also paid by debt and inflation, so you don't even have to pay for the service, you have already paid!

    As to /. ..... I have no words, only adjectives left for you.

    1. Re:NSA backup by Tablizer · · Score: 5, Funny

      Isn't NSA world's most famous backup solution?

      Yes, but they have to kill you after they restore your data.

      Try Chinese gov't instead.

    2. Re:NSA backup by war4peace · · Score: 1

      Yup, they kill you BEFORE they restore your data.

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    3. Re:NSA backup by sound+vision · · Score: 1

      Adjectives are words though.

    4. Re:NSA backup by roman_mir · · Score: 1

      Not once I am done with them!

    5. Re:NSA backup by Anonymous Coward · · Score: 0

      be fair. they just remove your organs with good resale value and then what happens after that is up to you.

  3. Less Trolling by Anonymous Coward · · Score: 0

    Lately I have noticed quite a fair bit less of the typical trolls.

    Has Slashdot improved filtering mechanisms?

    1. Re:Less Trolling by roman_mir · · Score: 0, Troll

      /. does enough trolling of its own, all the sjw crap and more. Trolls are professionals now.

    2. Re:Less Trolling by Anonymous Coward · · Score: 0

      Nope!

      This place is so lame that even the trolls are giving up...

    3. Re:Less Trolling by Demonoid-Penguin · · Score: 4, Funny

      Lately I have noticed quite a fair bit less of the typical trolls.

      An increase in numbers has resulted in an increase in their diversity - plus there's so little time when they've so many places to go in their desperate battle for attention. The troll union proposed hot-bedding and time-sharing but, for obvious reason, were unable to get the propositions ratified.

      Noted troll think tank UnderTheBridgeWatch, recently published a report predicting that the recent balkanization of troll unions due to a large number of them getting married (the others just want to sleep around) under the new gay marriage laws will further increase the diversity of their appearance. Because unlike humans, trolls of the same gender do produce offspring.

    4. Re: Less Trolling by mnemotronic · · Score: 2

      Personally, I've off-shored & out-sourced a majority of my trolling, passive-aggressive self-righteous diatribes and compensated product endorsements. This leaves more time for, well, pr0n. Gotta have a life sometime ya know.

      --
      The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
    5. Re:Less Trolling by sound+vision · · Score: 1

      This is sad news. I was starting to miss the GNAA and goatse.

    6. Re:Less Trolling by Anonymous Coward · · Score: 0

      Thank you for your interest in joining the Gay Wigger Association of DICE* (GayWAD)! GayWADs worldwide are happy that you'd like to become part of our

      constantly enlarging member ship (come sail away 8======D~)

      Unlike other geek fraternities that you might have heard about, GayWAD accepts members of all races, creeds, and colors. We don't even have a technical inclination requirement. As our founders stated in the Annals of GayWAD, Chapter 1: "You don't have to be a geek, as long as you like it Greek." They were, of course, referring to the penis in anus style of sexual relations. Don't despair, as attaining full fabulous lifetime status in GayWAD is easy. The only prerequisites for membership in Gay Wigger Association of DICE* are that you meet all of the following conditions:

      1. 1. Ownership of penis, anus, or both

      To submit your Gay Wigger Association of DICE* Membership Application, simply do nothing. Congratulations, you're now a GayWAD!

      If you require a specific membership number for purposes such as framing, docking, or prestigious inclusion upon your business cards and resume, please take down this number: 69.

      Optionally, you may complete the following survey by replying to this post, indicating affirmative responses with an X in each appropriate box:

      GayWAD Membership Survey (OPTIONAL)

      [ ] I am gay
      [ ] I am a wigger
      [ ] I have used SLASHDOT VIDEO to find a sex partner

      After completion of this optional survey, your Slashdot post ID shall serve as your unique Gay Wigger Association of DICE* membership ID.

      Your GayWAD membership kit** is on its way.

      * GayWAD is neither affiliated with nor endorsed by DICE.COM.

      ** GayWAD membership kit no longer includes HIV self-test catheter.

    7. Re:Less Trolling by Anonymous Coward · · Score: 0

      I use to troll here several times a day. Anymore the only time I bother with Slashfag is when I'm utterly bored with everything else. I get around here about twice a week and most of the time look at the low posting numbers and just shrug to go to better sites.

    8. Re:Less Trolling by tepples · · Score: 1

      Because unlike humans, trolls of the same gender do produce offspring.

      How does that work biologically?

    9. Re:Less Trolling by Demonoid-Penguin · · Score: 1

      Because unlike humans, trolls of the same gender do produce offspring.

      How does that work biologically?

      Like flies on shit.

      Note that there is only one gender of trolls, and yet they increase in number. Proof!

      Witling faecus the rightfully endangered, potty-mouth troll.
      Habitat: under bridges, in sewage systems, amongst the cruft of computer systems. Dark places close to humans.
      Weaknesses: Humour, facts and sunlight. Humour causes them pain. Exposure to sustained Facts or strong Light are fatal.
      Appearance: Various aberrations. Recognisable by the unique pin-like growth on their neck - the only visible feature that reliably distinguishes them from sentient bipods.
      No pictures exist due to their sensitivity to light of any form (artist's impression).
      History: A hydrogen-sulphide based life-form, possibly originating from the interaction of decomposing faeces and swamp gas. Whether they in fact qualify as a life form, or possess sentient capabilities is uncertain. It's theorised that they appeared when the first ancestors of humans developed intelligence, and that trolls have been devolving ever since. The theory is much debated and purely hypothetical as the only historical records are in the form of ancient legends due to the lack of fossil record. They have no backbone and upon death leave only a nasty stain and a foul odour.
      Biology: Their "closed-loop" digestive system allows them to survive their entire life eating only their own excrement. As their brain is composed of only two cells (neither of which function) they are unable to support any distro other than Windows - and even then, only the sliding kind.
      Additional references: Troll study, Suler, J.R. and Phillips, W. (1998). Deviant Behavior in Multimedia Chat Communities.

    10. Re:Less Trolling by Demonoid-Penguin · · Score: 1

      This is sad news. I was starting to miss the GNAA and goatse.

      Take heart, there is always gonorrhoea and syphilis. And Foxx News.

  4. Sourceforge Badware risks ? by nickweller · · Score: 4, Informative

    Sourceforge is Badware risks .. http://i.imgur.com/Hhtgv0H.png

    1. Re:Sourceforge Badware risks ? by Anonymous Coward · · Score: 2, Informative

      Sourceforge used to be great but has been serving crapware for a couple of years. You'd have to be off your rocker to use it if you have any choice, either as an author or an end user.

      https://forum.filezilla-project.org/viewtopic.php?t=30240&start=90
      http://www.theregister.co.uk/2015/06/03/sourceforge_to_offer_only_optin_adware_after_gimp_grump/

  5. While you're at it, add some modern features by the_humeister · · Score: 4, Interesting

    like unicode support and ipv6.

    1. Re:While you're at it, add some modern features by dmomo · · Score: 5, Funny

      Unicode, schmunicode... they're adding REAL features, like social media icons, polls disguised as articles, hawt new barfy web-2-oh skinz, and the sexy removal of the "read more" link. Because BUZZFEED!

  6. At least it wasn't Beta by arglebargle_xiv · · Score: 4, Insightful

    Could have been far worse...

    1. Re:At least it wasn't Beta by Anonymous Coward · · Score: 1

      Beta is coming. ...basically the meeting went like this:

      "what do you mean they didn't like the change to beta?"

      "fools, they don't know what's good for them"

      "I know, the idiot users are like frogs. We can boil them slowly. Let's start making all of the beta changes gradually over 6-12 months."

      "Genius! That'll show them, they won't even notice that we've changed anything at all"

      "Raises all round?"

      "Sounds good to me chaps!" ...etc...etc... beta is coming whether you like it or not.

    2. Re: At least it wasn't Beta by Anonymous Coward · · Score: 0

      Forget it Jake - it's Betatown.

    3. Re:At least it wasn't Beta by buckfeta2014 · · Score: 0

      My ears are burning...

      --
      Buck Feta. You know what to do.
    4. Re:At least it wasn't Beta by Tablizer · · Score: 1

      If it were Beta, we wouldn't know the difference.

  7. A Blazing Storage moment ... by cold+fjord · · Score: 2

    All right! Nobody moves, or the storage gets it! .... Help me! Help me! .... Shut down! ..... Won't somebody help that bad drive?!

    The reboot is near.

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
    1. Re:A Blazing Storage moment ... by sumdumass · · Score: 1

      You kind of butchered it with the reboot is near being after the fact but otherwise excellent job.

      BTW, I think that is one of the best movies ever. Lots of people don't realize what it was making fun of though.

    2. Re:A Blazing Storage moment ... by JustOK · · Score: 1

      It's like twitter, have to read bottom to top.

      --
      rewriting history since 2109
    3. Re: A Blazing Storage moment ... by mnemotronic · · Score: 1

      Mongo LIKE.

      --
      The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
  8. Might be some cleanup still needed by cold+fjord · · Score: 2

    I clicked on a "firehose" link and the most recent story was "YouTube's ready to select a winner" from March 2013.

    But the "help us select the next story" link was ok, as was directly entering Slashdot.org/recent.

    Good luck with the restore / clean up / troubleshooting. That's not a fun way to spend a weekend.

    --
    much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
  9. Future of free hosting at Sourceforge? by decaffeinated · · Score: 3, Interesting

    Serious question: Just out of curiosity, who pays the bills for all of the infrastructure that keeps Sourceforge running?

    Hardware isn't free and employees aren't free. I seriously don't understand how Sourceforge has kept the lights on all these years.

    And by the way, I'm a very satisfied user of their services. But I do worry about their future.

    1. Re:Future of free hosting at Sourceforge? by fred911 · · Score: 5, Informative

      I think ever since the bundleware fiasco the revenue is generated by ads.

      https://en.wikipedia.org/wiki/...

      --
      09 F9 11 02 9D 74 E3 5B - D8 41 56 C5 63 56 88 C0 45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    2. Re:Future of free hosting at Sourceforge? by decaffeinated · · Score: 1

      Thanks. That's not chump change.

  10. Problems Persisting by Anonymous Coward · · Score: 0

    Anyone else get an "Internal Server Error" 500 from this page?

    Had to reload 2-3 times before it came through successfully.

    All is not well afloat the good ship SlashDice!

    Not APK.

  11. Thank you. by Etherwalk · · Score: 5, Insightful

    Thank you to the Slashdot team. Bringing systems back up like that is emergency-mode-fun, but a lot of work, and we appreciate it.

    1. Re:Thank you. by mlts · · Score: 1

      Have to agree here. Lot of people appreciate /. being up and going.

      One can armchair quarterback and talk about how corruption wouldn't happen with this filesystem or this SAN, but corruption and problems happen no matter what the platform.

    2. Re:Thank you. by Anonymous Coward · · Score: 0

      The Slashdot team doesn't give a shit about you. They just care about the ad revenue.

    3. Re:Thank you. by cerberusss · · Score: 1

      Amen. I've been visiting this site since around user ID 110,000 or so, and I've actually never experienced a full blackout. Static version every now and then, that's all.

      --
      8 of 13 people found this answer helpful. Did you?
    4. Re:Thank you. by Anonymous Coward · · Score: 0

      Not even 9/11 brought /. down. I blame DICE and the shared infrastructure personally.

    5. Re:Thank you. by reboot246 · · Score: 1

      Slashdot has ads?!?!?

  12. But it was webscale by Billly+Gates · · Score: 1

    It was fast as hell!

    here

  13. Timing by Tablizer · · Score: 1

    And right before the Pluto flyby.

    Seriously, though, imagine the thoughts going through NASA minds when the probe crapped out a week before the big encounter. Their toilets must have been full of bricks.

    It's not like rover problems where you can continue where you left off after you fix it. New Horizons couldn't stop.

  14. EMC by Anonymous Coward · · Score: 0

    Next time use an open source data storage solution. Proprietary and fickle technologies like EMC must go extinct!!

  15. Cause?? by scsirob · · Score: 4, Interesting

    It's great to see how you responded to the failure and got services resumed pretty quickly. However, I'd rather like to see a follow-up sometime, describing a root cause analysis. With all the clustered, distributed servers and filesystems you use today, such an outage shouldn't be possible, right?

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB
    1. Re:Cause?? by swb · · Score: 2

      The blog post was pretty content free about what exactly went wrong.

      I would have guessed they would have the functional ability to either restore a storage snapshot to get back an entire LUN or a VM from a VM-based backup, and maybe they did.

    2. Re:Cause?? by Anonymous Coward · · Score: 0

      They don't really want us discussing this.

      The article was posted around midnight (US time) on a Sat night.

      I'm also noticing on my front page that the article is "collapsed" into just its title. I've noticed that before on other articles, but have no idea what logic dictates whether an article is collapsed or expanded. Anyway, this means I have to click twice, once on the article title to expand it, and again on the stupid speech bubble, to get to the comments.

    3. Re:Cause?? by danomac · · Score: 4, Funny

      Maybe somebody from Dice downloaded something from Sourceforge and installed it on their servers?

    4. Re:Cause?? by Anonymous Coward · · Score: 0

      YUP, checking back here 24 hours later (now roughly midnight Sunday night in London) and the article has 68 posts and is at the bottom of the front page.

      Nicely handled by the editors/admin of the site, well done guys.

      But we know that you guys are playing games like this and it alienates us from you. Others have posted here a "thank you" for fixing the problem, and others pointed out that it's all crap they don't care and they just want the advertising revenue, which is further evidence of the bad will SlashDice is creating with the community.

      Once the community is gutted of the people who "know shit" and "post iconic" and "invaluable" comments, the site will fall. Sure, enjoy your million page impressions per day and hundreds of forum posts from drooling idiots. Yay! But the heart and soul of the site will be gutted, and without the core contributors, the site will fall.

      Interesting times! Digg failed years ago. Reddit looks to be well on its way now. And SlashDice is all but a shadow of its former self.

      Who will fill the void? Soylent? Or my personal favorite (the dev has done an incredible job) PipeDot? Or something else...

  16. ZFS by darkain · · Score: 1

    Serious question: how much of this could have either been prevented or restored much more quickly if they were using ZFS with proper parity, checksumming, snapshotting, and sending (backups)? This really is the one-size-fits-all storage solution at this point.

    1. Re:ZFS by Harlequin80 · · Score: 2

      Kinda depends on the failure. If your raid controller decides to die in a spasmodic on off on off way you can easily corrupt all your file systems in one go, zfs or otherwise. At that point if you didn't have redundant live storage pools it gets harder.

      Or of course there is the issue where someone does something stupid, like deleting files from live machines without thinking about what they are.

    2. Re:ZFS by Anonymous Coward · · Score: 1

      Do you have any idea how ZFS works? Since ZFS is copy-on-write, you cannot corrupt already written data, unless your controller writes completely unrelated blocks or some crazy shit like that which I've personally never seen before.

      Also, a good setup separates the redundancy domains into separate hardware, i.e. if you run RAID10, no two disks of a mirror live on the same controller, for example.

      Deleting files is trivially defeated by regular snapshots.

      The best thing about ZFS: You always know the state of your data. If ZFS says it checksums fine, then your data is fine, period. If it says your data is corrupt and can't heal it automatically, you can whip out your zfs send backups and trivially restore your complete pool under complete checksum protection.

      You don't ever have to verify a single bit of user data manually.
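
      The checksum claim above can be illustrated with a toy sketch. This is not ZFS's actual implementation: the `ChecksummedStore` class is hypothetical, and SHA-256 simply stands in for whatever checksum a real filesystem would use. It only shows why a store that keeps a checksum alongside each block can tell a good read from a silently damaged one.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Return a SHA-256 hex digest for one block of data."""
    return hashlib.sha256(block).hexdigest()


class ChecksummedStore:
    """Toy block store (hypothetical) that verifies every read."""

    def __init__(self):
        self.blocks = {}  # block id -> raw data
        self.sums = {}    # block id -> checksum recorded at write time

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.sums[block_id] = checksum(data)

    def read(self, block_id: int) -> bytes:
        data = self.blocks[block_id]
        if checksum(data) != self.sums[block_id]:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data


store = ChecksummedStore()
store.write(0, b"hello")
clean_read = store.read(0)

# Simulate silent on-disk corruption: the data changes, the checksum does not.
store.blocks[0] = b"hellp"
try:
    store.read(0)
    detected = False
except IOError:
    detected = True
```

      Detection is all this sketch does. Getting the data back still needs a second good copy somewhere, which is the point the replies below argue about.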

    3. Re:ZFS by Anonymous Coward · · Score: 0

      I've seen more customers lose data completely with ZFS than with any other tech. In fact, before ZFS we never lost data forever. ZFS has lots of failure modes that moot the design, as we found.

    4. Re:ZFS by Anonymous Coward · · Score: 0

      FUD

    5. Re:ZFS by Anonymous Coward · · Score: 0

      ZFS has almost no failure modes except when the data has been corrupted, but ZFS will always know if the data has or has not been corrupted.

    6. Re: ZFS by Anonymous Coward · · Score: 0

      There are two kinds of people in the world. Those who don't understand ZFS and those who use ZFS.

    7. Re:ZFS by Harlequin80 · · Score: 1

      Given I have seen sysadmin delete the backups to free up space you cannot always handle stupidity.

      And seriously? You cannot corrupt already written data? WTF. ZFS has a whole system built into it to periodically check if data has corrupted once on the disk. It's called scrub. Do you think they would have gone to a huge load of effort if no on-disk corruption ever happened?!?!?

      ZFS is very good at ensuring that there has been no "in transit" corruption by doing a CRC check of the written file before removing the original. It is fantastic for that. But it doesn't protect you from the data corruption; it only warns you when it occurs. If you have a controller spack out on you then you will end up with corruption. If the corruption happens fast enough, or you are not monitoring it enough, that corruption can destroy the file system.

      If you have redundant controllers and you catch it before the corruption spreads to the mirrored drive then you can recover. If however you lose a controller and have issues on your other controller you are stuck rebuilding from backups.
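
      The scrub-and-mirror argument above can be sketched roughly as follows. This is a simplified model, not how ZFS scrub is actually implemented: the `scrub` function, the two in-memory "mirrors", and the use of SHA-256 are all assumptions made for illustration. A block damaged on one side gets repaired from the other; a block damaged on both sides is reported as needing a restore from backup.

```python
import hashlib


def digest(data: bytes) -> str:
    """Checksum used to decide whether a copy of a block is intact."""
    return hashlib.sha256(data).hexdigest()


def scrub(mirror_a, mirror_b, sums):
    """Verify every block on both mirrors and repair from the good copy.

    Returns (repaired, unrecoverable) as lists of block indices.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(sums):
        ok_a = digest(mirror_a[i]) == expected
        ok_b = digest(mirror_b[i]) == expected
        if ok_a and ok_b:
            continue
        if ok_a:
            mirror_b[i] = mirror_a[i]   # heal side B from side A
            repaired.append(i)
        elif ok_b:
            mirror_a[i] = mirror_b[i]   # heal side A from side B
            repaired.append(i)
        else:
            unrecoverable.append(i)     # both copies bad: restore from backup
    return repaired, unrecoverable


blocks = [b"alpha", b"beta", b"gamma"]
sums = [digest(b) for b in blocks]
side_a = list(blocks)
side_b = list(blocks)

side_a[1] = b"XXXX"   # damaged on one side only
side_a[2] = b"YYYY"   # damaged on both sides
side_b[2] = b"ZZZZ"

repaired, unrecoverable = scrub(side_a, side_b, sums)
```

      In this toy run block 1 is healed and block 2 is not, which matches the comment: redundancy helps only while at least one copy is still good.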

  17. Ceph block devices by TheRealHocusLocus · · Score: 1

    [...] This incident impacted all block devices on our Ceph cluster.

    Power/communications/routing down event? Was monitor quorum lost? Inquiring minds that are not trolls are curious and grateful that the path to restoration was clear. Best wishes.

    --
    <blink>down the rabbit hole</blink>
  18. Autorefresh sucks by Anonymous Coward · · Score: 0

    Autorefresh sucks. Turn it off.

  19. No expert by ChrisMaple · · Score: 1

    I've negligible experience in this sort of failure and recovery, but...
    Shouldn't slashdot and sourceforge be entirely separate, so that the failure of one can't bring down the other?
    Shouldn't there be live redundant systems, so that when one fails, one of the redundant systems is switched online in minutes? I don't mean just redundant storage, but 3 or 4 systems running concurrently, taking the same input and monitoring to confirm that the output is the same.

    Is this too expensive or not technically feasible?
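
    The "3 or 4 systems taking the same input" idea is usually called majority voting. A minimal sketch, assuming each replica's output can be compared directly as a string (the `majority_output` helper and the sample outputs are hypothetical):

```python
from collections import Counter


def majority_output(outputs):
    """Return (winner, dissenters) for a list of replica outputs.

    Raises RuntimeError when no output has a strict majority.
    """
    counts = Counter(outputs)
    winner, votes = counts.most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority among replicas")
    # Indices of replicas that disagreed with the majority.
    dissenters = [i for i, out in enumerate(outputs) if out != winner]
    return winner, dissenters


winner, dissenters = majority_output(["page-v1", "page-v1", "garbage"])
```

    A dissenting replica would be the one to take offline and inspect. The sketch ignores the hard parts, such as keeping replica state in sync and handling outputs that legitimately differ (timestamps, session data).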

    --
    Contribute to civilization: ari.aynrand.org/donate
  20. Obligatory bogachev by mnemotronic · · Score: 1

    Your important files encryption produced on this computer: photos, videos, documents, etc. Here is a complete list of encrypted files, and you can personally verify this.

    Encryption was produced using a unique public key RSA-2048 generated for this computer. To decrypt files you need to obtain a private key. The single copy of the private key, which will allow you to decrypt the files, located on a secret server on the Internet; the server will destroy the key after a time specified in this window. After that, nobody and never will be able to restore files.

    To obtain the private key for this computer, which will automatically decrypt files, you need to pay 666 Bitcoin like similar amount in another country. Click >nezt> to select the method of payment and the currency. Any attempt to remove or damage this software will lead to the immediate destruction of the private key by server.

    --
    The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
  21. Software or Hardware failure? by RedMage · · Score: 1

    "Storage corruption" is fairly vague. I've been bit by it in the past - once due to a vendor software bug (Oracle block corruption), and once due to hardware (flaky storage controller chip writing garbage (Supermicro MB)). I would like to hear more about the root cause.

    RM

    --
    }#q NO CARRIER
  22. Great that /. could notify users by Streetlight · · Score: 2

    It looks like /. had a Plan B ready in the case of a catastrophic failure. For some sites one just gets a blank page with some strange message when that happens. /. did the right thing letting users know they had a problem and were working on it and then let us know a bit about what happened. Thanks, /. techs.

    --
    In a time of universal deceit, telling the truth is a revolutionary act. George Orwell
  23. Damn you, Slashdot! by __aaclcg7560 · · Score: 1

    On the same Thursday that Slashdot experienced data storage corruption, the 1TB hard drive on my Windows gaming PC crashed, reporting 4GB of free space available and unresponsive to IO block commands. (I've seen that behavior on USB sticks, but never on a hard drive.) Except for several years of email, all my data was on the file server. Oh, well. I got a good excuse to rebuild my eight-year-old PC, especially with Windows 10 around the corner. Meanwhile, I'm using a $250 Dell laptop for everything except gaming.

  24. It allowed spoofing comment scores (5:erocS) by tepples · · Score: 1

    They tried Unicode before. It allowed spoofing comment scores. SoylentNews claims to support Unicode; I wonder how it prevents spoofing comment scores.

  25. That rebuild though! by Gazzonyx · · Score: 2

    I say this as someone that runs ZFS on his backup/file server; if you do have to restore or resilver it can take a long while! A single slow drive in a vdev will limit the entire pool's IO (the extent of which is entirely dependent on topology, but the weakest link always crushes you in ZFS). After a handful of TB of data, even with a pool of mirrored vdevs and a flash cache device, the resilver for a single drive can take a day unless you've got some serious spindle count at high RPMs. Even SAS drives don't provide that many IOPS.
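
    The "can take a day" figure is easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not measurements from any real pool: 4 TB to rebuild at a sustained 50 MB/s.

```python
def resilver_hours(data_bytes: float, bytes_per_second: float) -> float:
    """Estimate rebuild time as data volume divided by sustained throughput."""
    return data_bytes / bytes_per_second / 3600


# Assumed figures for illustration only.
data_to_rebuild = 4e12       # 4 TB of allocated data
sustained_rate = 50e6        # 50 MB/s effective rate

hours = resilver_hours(data_to_rebuild, sustained_rate)
print(f"{hours:.1f} hours")  # about 22.2 hours
```

    Real resilvers tend to be limited by random I/O rather than sequential throughput, so the effective rate can be well below what a drive's spec sheet suggests.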

    --
    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

    1. Re:That rebuild though! by Harlequin80 · · Score: 1

      It gets orders of magnitude worse if you have two vdevs joined together in a single pool. I have 5 x 1.5g and 5 x 2g in a joint pool and I lost a 1.5. The resilvering process took days.

  26. Disclaimer by RyoShin · · Score: 1

    (Slashdot and SourceForge share a corporate overlord, as well as a fair bit of infrastructure.)

    Nice to see that blurb of text again. Can we get this to happen every time you post a Nerval's Lobster/Dice slashvertisement, too?