A Note On Thursday's Downtime
If you were browsing the site on Thursday, you may have noticed that we went static for a big chunk of the day. A few of you asked what the deal was, so here's a quick follow-up. The short version is that a storage fault led to significant filesystem corruption, and we had to restore a massive amount of data from backups. There's a post on the SourceForge blog going into a bit more detail, and describing the steps our Siteops team took (and is still taking) to restore service. (Slashdot and SourceForge share a corporate overlord, as well as a fair bit of infrastructure.)
like Unicode support and IPv6.
It's great to see how you responded to the failure and got services resumed pretty quickly. However, I'd rather like to see a follow-up sometime, describing a root cause analysis. With all the clustered, distributed servers and filesystems you use today, such an outage shouldn't be possible, right?
To Terminate, or not to Terminate, that's the question - SCSIROB