A Note On Thursday's Downtime

Posted by Soulskill on Saturday July 18, 2015 @04:25PM from the slashdotting-ourselves dept.

If you were browsing the site on Thursday, you may have noticed that we went static for a big chunk of the day. A few of you asked what the deal was, so here's a quick follow-up. The short version is that a storage fault led to significant filesystem corruption, and we had to restore a massive amount of data from backups. There's a post at the SourceForge blog going into a bit more detail, and describing the steps our Siteops team took (and is still taking) to restore service. (Slashdot and SourceForge share a corporate overlord, as well as a fair bit of infrastructure.)

oh okay by Anonymous Coward · 2015-07-18 16:30 · Score: 5, Insightful

oh, I thought some of that shitware they sling got loose and bit them in the ass
Sourceforge Badware risks ? by nickweller · 2015-07-18 16:41 · Score: 4, Informative

Sourceforge is Badware risks .. http://i.imgur.com/Hhtgv0H.png
1. Re:Sourceforge Badware risks ? by Anonymous Coward · 2015-07-18 17:48 · Score: 2, Informative
  
  Sourceforge used to be great but has been serving crapware for a couple of years. You'd have to be off your rocker to use it if you have any choice, either as an author or an end user.
  https://forum.filezilla-project.org/viewtopic.php?t=30240&start=90
  http://www.theregister.co.uk/2015/06/03/sourceforge_to_offer_only_optin_adware_after_gimp_grump/
While you're at it, add some modern features by the_humeister · 2015-07-18 16:49 · Score: 4, Interesting

like unicode support and ipv6.
1. Re:While you're at it, add some modern features by dmomo · 2015-07-18 17:50 · Score: 5, Funny
  
  Unicode, schmunicode... they're adding REAL features, like social media icons, polls disguised as articles, hawt new barfy web-2-oh skinz, and the sexy removal of the "read more" link. Because BUZZFEED!
At least it wasn't Beta by arglebargle_xiv · 2015-07-18 16:51 · Score: 4, Insightful

Could have been far worse...
Re:Less Trolling by Demonoid-Penguin · 2015-07-18 16:54 · Score: 4, Funny

Lately I have noticed quite a fair bit less of the typical trolls.
An increase in numbers has resulted in an increase in their diversity - plus there's so little time when they've so many places to go in their desperate battle for attention. The troll union proposed hot-bedding and time-sharing but, for obvious reasons, was unable to get the propositions ratified.
Noted troll think tank UnderTheBridgeWatch recently published a report predicting that the recent balkanization of troll unions due to a large number of them getting married (the others just want to sleep around) under the new gay marriage laws will further increase the diversity of their appearance. Because unlike humans, trolls of the same gender do produce offspring.
A Blazing Storage moment ... by cold+fjord · 2015-07-18 16:57 · Score: 2

All right! Nobody moves, or the storage gets it! .... Help me! Help me! .... Shut down! ..... Won't somebody help that bad drive?!
The reboot is near.

--
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Might be some cleanup still needed by cold+fjord · 2015-07-18 17:04 · Score: 2

I clicked on a "firehose" link and the most recent story was "YouTube's ready to select a winner" from March 2013.
But the "help us select the next story" link was ok, as was directly entering Slashdot.org/recent.
Good luck with the restore / clean up / troubleshooting. That's not a fun way to spend a weekend.

--
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
Future of free hosting at Sourceforge? by decaffeinated · 2015-07-18 17:04 · Score: 3, Interesting

Serious question: Just out of curiosity, who pays the bills for all of the infrastructure that keeps Sourceforge running?
Hardware isn't free and employees aren't free. I seriously don't understand how Sourceforge has kept the lights on all these years.
And by the way, I'm a very satisfied user of their services. But I do worry about their future.
1. Re:Future of free hosting at Sourceforge? by fred911 · 2015-07-18 17:13 · Score: 5, Informative
  
  I think ever since the bundleware fiasco the revenue is generated by ads.
  https://en.wikipedia.org/wiki/...
  
  --
  09 F9 11 02 9D 74 E3 5B - D8 41 56 C5 63 56 88 C0 45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Thank you. by Etherwalk · 2015-07-18 17:31 · Score: 5, Insightful

Thank you to the Slashdot team. Bringing systems back up like that is emergency-mode-fun, but a lot of work, and we appreciate it.
Re:NSA backup by Tablizer · 2015-07-18 18:11 · Score: 5, Funny

Isn't NSA world's most famous backup solution?

Yes, but they have to kill you after they restore your data.
Try Chinese gov't instead.

--
Table-ized A.I.
Cause?? by scsirob · 2015-07-18 18:56 · Score: 4, Interesting

It's great to see how you responded to the failure and got services resumed pretty quickly. However, I'd rather like to see a follow-up sometime, describing a root cause analysis. With all the clustered, distributed servers and filesystems you use today, such an outage shouldn't be possible, right?

--
To Terminate, or not to Terminate, that's the question - SCSIROB
1. Re:Cause?? by swb · 2015-07-18 23:12 · Score: 2
  
  The blog post was pretty content free about what exactly went wrong.
  I would have guessed they would have the functional ability to either restore a storage snapshot to get back an entire LUN or a VM from a VM-based backup, and maybe they did.
2. Re:Cause?? by danomac · 2015-07-19 09:04 · Score: 4, Funny
  
  Maybe somebody from Dice downloaded something from Sourceforge and installed it on their servers?
Re:ZFS by Harlequin80 · 2015-07-18 20:26 · Score: 2

Kinda depends on the failure. If your raid controller decides to die in a spasmodic on off on off way you can easily corrupt all your file systems in one go, zfs or otherwise. At that point if you didn't have redundant live storage pools it gets harder.
Or of course there is the issue where someone does something stupid, like deleting files from live machines without thinking about what they are.
Re: Less Trolling by mnemotronic · 2015-07-19 01:12 · Score: 2

Personally, I've off-shored & out-sourced a majority of my trolling, passive-aggressive self-righteous diatribes and compensated product endorsements. This leaves more time for, well, pr0n. Gotta have a life sometime ya know.

--
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
Great that /. could notify users by Streetlight · 2015-07-19 03:24 · Score: 2

It looks like /. had a Plan B ready in the case of a catastrophic failure. For some sites one just gets a blank page with some strange message when that happens. /. did the right thing letting users know they had a problem and were working on it and then let us know a bit about what happened. Thanks, /. techs.

--
In a time of universal deceit, telling the truth is a revolutionary act. George Orwell
That rebuild though! by Gazzonyx · 2015-07-19 22:48 · Score: 2

I say this as someone that runs ZFS on his backup/file server; if you do have to restore or resilver it can take a long while! A single slow drive in a vdev will limit the entire pool's IO (the extent of which is entirely dependent on topology, but the weakest link always crushes you in ZFS). After a handful of TB of data, even with a pool of mirrored vdevs and a flash cache device, the resilver for a single drive can take a day unless you've got some serious spindle count at high RPMs. Even SAS drives don't provide that many IOPS.

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
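
For a rough feel of the resilver arithmetic in the comment above, here is a minimal back-of-the-envelope sketch. It assumes a purely sequential copy bounded by the slowest drive involved. That is a simplification, since real resilvers are often limited by IOPS rather than streaming throughput. The function name and all figures are hypothetical, not measurements from any real pool.

```python
def resilver_hours(data_tb, drive_mb_per_s):
    """Estimate a lower bound on resilver time, in hours.

    data_tb        -- allocated data to copy, in terabytes (1 TB = 1,000,000 MB)
    drive_mb_per_s -- sustained throughput of each drive involved, in MB/s

    Assumption: the copy runs at the speed of the slowest drive.
    """
    slowest = min(drive_mb_per_s)      # the weakest link sets the pace
    data_mb = data_tb * 1_000_000      # terabytes to megabytes
    return data_mb / slowest / 3600    # seconds to hours


# Hypothetical pool: 4 TB to copy, one drive limping along at 50 MB/s.
print(round(resilver_hours(4, [150, 150, 50]), 1))
# Same pool with all drives healthy at 150 MB/s.
print(round(resilver_hours(4, [150, 150, 150]), 1))
```

Under these assumed numbers the slow drive stretches the job to most of a day, while the healthy pool finishes in well under half that time, which is in line with the "can take a day" estimate above.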