Slashdot Mirror


Open Compute Project Comes Under Fire

judgecorp writes: The Open Compute Project, the Facebook-backed effort to create low-cost open source hardware for data centers, has come under fire for a slack testing regime. The criticism was first aired at The Register, where an anonymous test engineer described the project's testing as a "complete and total joke." The founding director of the project, Cole Crawford, has penned an open letter in reply. The issue seems to be that the testing for standard highly reliable hardware used by telcos and the like is very thorough and expensive. Some want the OCP to use more rigorous testing to replicate that level of reliability. Crawford argues that web-scale data centers are designed to cope with hardware failures, and "Tier 1" reliability would be a waste of effort.

6 of 86 comments

  1. Smells like astroturf. by An Ominous Coward · · Score: 3, Insightful

    Probably Cisco trolling against a movement that's going to put them out of business.

    Sooner the better, I say.

  2. Cheap hardware. Smart Software by biojayc · · Score: 5, Insightful

    You don't need expensive hardware to run datacenters. You need cheap commodity hardware with smart software on top. Just ask Google or Facebook.

  3. Saying you test is easy. by digsbo · · Score: 4, Insightful

    But testing well is really, really hard. And expensive, especially for data center scenarios. If you haven't put it in an oven and observed the effects, it's not tested for telco data centers.

  4. Isn't this expected? by fuzzyfuzzyfungus · · Score: 4, Insightful

    I don't know if it's a good idea or not (probably depends on who you are, and I'm sure that there will be some people who chose incorrectly); but is it really a surprise that OCP would be doing their testing on the cheap 'n cheerful side of things?

    It was my understanding that their premise, from the beginning, was that existing hardware vendors were excessively focused on adding costly, thermally demanding, and often proprietary, features at the hardware level that were unnecessary if you were willing to compensate for their absence in your software design.

    There is obviously some level of reliability below which no compensation at the software level is possible (if you can't run the algorithm for detecting errors because it keeps glitching out, it's probably not going to work); but the impression they always conveyed was that many of the more sophisticated reliability mechanisms are really features aimed at people who are substantially less able to cope with failure, and are therefore willing to pay substantially more for hardware that can invisibly paper over a variety of moderately serious failures and allow the software on top to run without incident, rather than buying lots of cheap hardware that has a risk of going down in a screaming heap.

    So long as nobody gets any stupidly optimistic ideas, I don't really see the issue. Sure, if Facebook were about sending men to Mars, they should seriously consider having three CPUs running in lockstep and voting on all operations and so on; but this project is about delivering as many ad impressions per dollar as possible; no reason to get worked up over the occasional glitch.
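
    For anyone who hasn't seen the "three CPUs voting" idea, here is a rough software sketch of majority voting over three replicas. It is only an illustration: real lockstep voting is done in hardware on every operation, and the names `majority_vote` and `run_with_voting` are made up for this example.

```python
from collections import Counter


def majority_vote(results):
    """Return the value that a strict majority of replicas agree on.

    Raises RuntimeError when no value has more than half the votes.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas")
    return value


def run_with_voting(replicas, x):
    """Run the same computation on every replica and vote on the answer.

    `replicas` is a list of callables standing in for independent CPUs
    (a hypothetical stand-in, not how real lockstep hardware is wired).
    """
    return majority_vote([replica(x) for replica in replicas])


def healthy(x):
    return x * x


def faulty(x):
    # Simulates a replica that glitches and returns a wrong answer.
    return x * x + 1


# One faulty replica out of three is outvoted by the two healthy ones.
answer = run_with_voting([healthy, healthy, faulty], 7)
```

    With one bad replica the two good ones win the vote; with two bad replicas that disagree with each other there is no majority and the sketch raises instead of guessing.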

  5. 5 9's by The Raven · · Score: 4, Insightful

    I'm gonna side with OCP on this one. It is far more economical to deal with reliability via redundancy than it is via expensive parts. This is why we use RAID rather than speccing our drives to last 10 years minimum. All the big players in the datacenter market have put thousands of hours each into designing systems tolerant of missing parts.
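
    To put rough numbers on the redundancy argument, here is a back-of-the-envelope sketch. The availability figures are made-up assumptions, not measurements of OCP or any vendor's hardware, and the formula assumes replicas fail independently, which real deployments only approximate.

```python
def parallel_availability(node_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up.

    Assumes independent failures; shared power, network, or software bugs
    would make the real figure worse than this estimate.
    """
    return 1.0 - (1.0 - node_availability) ** replicas


# Hypothetical figures, chosen only for illustration.
cheap_node = 0.99       # commodity server, "two nines"
premium_node = 0.99999  # expensive server, "five nines"

# Availability of 1, 2, and 3 cheap replicas behind fault-tolerant software.
cheap_cluster = {n: parallel_availability(cheap_node, n) for n in (1, 2, 3)}

for n, availability in cheap_cluster.items():
    print(f"{n} cheap replica(s): {availability:.6f}")
print(f"1 premium node:     {premium_node:.6f}")
```

    Under those assumptions, three two-nines boxes work out to roughly six nines, better than the single five-nines box, but only if the software on top can actually survive losing a replica.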

    The downside is that writing custom stacks tolerant of missing pieces is fucking hard and a huge up-front investment for a company. Most off-the-shelf software does not have that level of redundancy and fault tolerance baked in already. This means that for many small to medium sized deployments it's cheaper to buy a really expensive fault tolerant rack of servers and run your off-the-shelf software on it than it is to buy into OCP with inexpensive hardware that's more open to failure, because your software is NOT open to failure.

    Different strokes for different folks and all. Use the right tool for the job. And OCP was made by companies with massive data farms to fit their needs... and their needs are probably not your needs.

    --
    "I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
  6. Be highly available in software, not hardware by poopie · · Score: 4, Insightful

    I suspect the Open Compute Project welcomes additional testing resources for the benefit of everyone... as long as it doesn't involve an oppressive amount of process that simply serves to slow down progress.

    But... Web scale IS different, so I can't blame the main sponsors for not prioritizing what isn't as important to them. Once you accept that ALL hardware fails, and that you can either pay more for more reliable hardware, or you can develop better software architecture to handle failures, you look at things differently. Spend your money once on good software engineering, instead of over and over on every server.