Slashdot Mirror


Build Your Own 135TB RAID6 Storage Pod For $7,384

An anonymous reader writes "Backblaze, the cloud-based backup provider, has revealed how it continues to undercut its competitors: by building its own 135TB Storage Pods which cost just $7,384 in parts. Backblaze has provided almost all of the information that you need to make your own Storage Pod, including 45 3TB hard drives, three PCIe SATA II cards, and nine backplane multipliers, but without Backblaze's proprietary management software you'll probably have to use FreeNAS, or cobble together your own software solution... A couple of years ago they showed how to make their first-generation, 67TB Storage Pods"

239 comments

  1. My God... by AngryDeuce · · Score: 1

    It's full of stars!!

    1. Re:My God... by ByOhTek · · Score: 2

      It's full of slashvertisements!!

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    2. Re:My God... by Kjella · · Score: 1

      Pr0n stars? Because we all know what it's really full of...

      --
      Live today, because you never know what tomorrow brings
    3. Re:My God... by x6060 · · Score: 2

      It's not a slashvertisement if they tell you how to build it yourself.....

    4. Re:My God... by TheRaven64 · · Score: 2

      Except for the bit about how it would be even better if you paid for their proprietary management software...

      --
      I am TheRaven on Soylent News
    5. Re:My God... by Dillon2112 · · Score: 5, Insightful

      My problem with Backblaze is their marketing is very misleading...they pit these storage pods up against cloud storage and assert that they are "cheaper", as though a storage pod is anything like cloud storage. It isn't. Sure, there's the management software issue that's already been mentioned, but they do no analysis on redundancy, power usage, security, bandwidth usage, cooling, drive replacement due to failure, administrative costs, etc. It's insulting to anyone who can tell the difference, but there are suits out there who read their marketing pitch and decide that current cloud storage providers like Google and Amazon are a rip off because "Backblaze can do the same thing for a twentieth the price!" It's nuts.

      You can see this yourself in their pricing chart at the bottom of their blog post. They assert that Backblaze can store a petabyte for three years for either $56k or $94k (if you include "space and power"). And then they compare that to S3 costing roughly $2.5 million. In their old graphs, they left out the "space and power" part, and I'm sure people complained about the inaccuracies. But they're making the same mistake again this time: they're implicitly assuming the cost of replicating, say, S3, is dominated by the cost of the initial hardware. It isn't. They still haven't included the cost of geographically distributing the data across data centers, the cost of drive replacement to account for drive failure over 3 years, the cost of the bandwidth to access that data, and it is totally unclear if their cost for "power" includes cooling. And what about maintaining the data center's security? Is that included in "space"?

      On a side note, I'd be interested to see their analysis on mean time between data loss using their system as it is priced in their post.

      You could say that Backblaze is serving a different need, so it doesn't need to incur all those additional costs, and you might be right, but then why are they comparing it to S3 in the first place? It's just marketing fluff, and it is in an article people are lauding for its technical accuracy. Meh.

    6. Re:My God... by ByOhTek · · Score: 0

      So, you are suggesting they aren't promoting their proprietary software, or their service (while criticizing the competitors) on that page?

      Odd, I must be on some pretty interesting, and highly specifically targeted, hallucinogens.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    7. Re:My God... by x6060 · · Score: 4, Insightful

      Did you notice how they even gave you the alternatives to their software? Essentially they are saying "We developed this for our own internal use and if you would LIKE to pay for it, it's cool. If you don't, then there are these other free alternatives." But then again, just because some company is mentioned in the article it MUST be a slashvertisement.

    8. Re:My God... by x6060 · · Score: 2

      You must be. They even GIVE you the free alternatives to their software. But in the same page they give you everything you need to do it yourself. You just have to add some hardware.

    9. Re:My God... by Archangel+Michael · · Score: 1

      Who said it would be better? Not the article. The article said you'd have to do it yourself, and if you're THAT good, you might make something better. If you don't want to do it yourself (FreeNAS) you can opt for their software to manage it, which GASP HORROR, they charge for.

      Challenge laid down, Open Source Community: make your own management software and create a new FS that doesn't have the limitations EXT4 has, without using LVM to get around those limitations, that is better than what these people offer.

      Complainers, like most of the ones here, just complain and say what is "wrong", but never offer up a solution and work to create an alternative that is better. It is much easier to "complain" about shit than to actually do it.

      Quit trying to quarterback from the sidelines.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    10. Re:My God... by Revotron · · Score: 1

      Suits will be suits. Backblaze proudly boasts that they're a great offsite backup solution, but they will quickly tell you that they are not a "cloud storage" provider. Their only business is offsite replication. They don't hide the fact that if you upload 500GB of data and then delete it off your computer, it will be removed from their systems as well.

      You didn't read the article. You know how I know that? They explicitly state that they don't have any costs for replacement hard drives over 3 years, because they're *all under warranty* for 3 years. When a drive fails, they get a new one from the manufacturer, no questions asked. And yes, actually, they do include bandwidth in the cost. They disclose it within the same paragraph.

      And on the topic of cooling, cooling is a cost that can't be directly assigned to one particular server because it's irresponsibly expensive to monitor the heat output of every individual server. The cost of cooling is an indirect cost and is always factored into operational overhead, and from there the operational overhead is allocated evenly across systems. I suspect that they're just giving the direct costs of storing X amount of data, which would also explain why the Amazon S3 price is ridiculously high - they don't have access to only the direct prices, so they're forced to use the list price which already includes all costs and profits associated with the service.

    11. Re:My God... by ByOhTek · · Score: 0

      Maybe so, but they still are rather rabidly promoting their products, especially in the last paragraph or two.

      Backblaze attributes its ongoing success to its ultra-frugal Storage Pods. While competitors in the online backup space are closing down or hoiking prices up wildly, Backblaze still manages to offer unlimited, secure backup for $5/month.

      Looks like marketing/advertising speak to me.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
    12. Re:My God... by Archangel+Michael · · Score: 3, Insightful

      First: ALL Marketing is misleading. That is what marketing does. Accentuate the positive, eliminate the negative. So complaining about that is just idiotic.

      Second: You could have a couple dozen Backblaze units, pay for a tech to monitor them 24/7/365 and replace all the drives twice over for what Amazon charges for the same thing. Sure, that doesn't include the cost for premises and high-speed Internet to multiple locations. But still, that is aggregated with all the other clients.

      Third: what are you paying for in the "cloud", I mean besides ethereal concepts? Does Amazon tell you how they do things? You probably know less about Amazon's (and the others') setup, so you're comparing something you know something about (not everything) versus something you know almost nothing about, and then complaining that they aren't doing it in a comparable way. You don't know.

      Fourth: Your basic assumption is that Backblaze has no contingency for drive replacement, which is false. Since these are "new" drives there might be insufficient data about failure rates and therefore the actual cost of replacement (never mind warranties) or having drives in both Hot and Cold Spare setups. I'm sure that Backblaze in their $5/MO service figures what it costs to store data, have spares, keep the Datacenter running and profitable. Even if they double the cost to $10, it still puts the others to shame.

      Have you compared the data loss rates for the last three years between Amazon and Backblaze? Can you even compare or is that data held secret (see point 1b). My point here, is that you're pulling shit out of your ass and thinking it doesn't stink. Even if it isn't directly comparable, it is at least in the realm of consideration, EVEN if everything you said is true. And at 10 times less in cost, that can buy a lot of redundancy. It is just a matter of perspective.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    13. Re:My God... by Anonymous Coward · · Score: 0

      Somewhat agreed, but who else do they compare themselves to?

      Especially in an era where "The Cloud" is such a buzzword, they need to fit themselves into other people's expectations of a service provider, sadly.

      Interesting as well is your push that S3 is distributed and redundant "enough" to stop one having to think about it. Really?

    14. Re:My God... by Anonymous Coward · · Score: 0

      I do consulting on the side for some major hosting companies doing things such as storage. I think you're seriously overestimating the systems some of them have in place. From what I've seen, it is more about how small a footprint they can take up in their data centers than about how much reliability there is.

    15. Re:My God... by Anonymous Coward · · Score: 1

      From their blog post:

      Our philosophy is to plan for equipment failure and build a system that operates in spite of it. We have a lot of redundancy, ensuring that if a drive fails, immediate replacement isn’t critical. So at his leisure, Sean also spends one day each week replacing drives that have gone bad. As of this week, Backblaze has more than 9,000 hard drives spinning in the datacenter, the oldest of which we purchased four years ago. We see fairly high infant mortality on the hard drives deployed in brand new pods, so we like to burn the pods in for a few days before storing any customer data. We have yet to see any drives die because of old age, which will be fascinating to monitor in the next few years. All told, Sean replaces approximately 10 drives per week, indicating a 5 percent per year drive failure rate across the entire fleet, which includes infant mortality and also the higher failure rates of previous drives.
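      The 5 percent figure in the quoted passage can be sanity-checked with a few lines of Python. This is only a rough sketch: the post says "more than 9,000" drives, so 9,000 is assumed here as a lower bound on the fleet size, and the function and constant names are made up for illustration.

```python
# Figures taken from the quoted blog passage. The fleet size is given as
# "more than 9,000", so 9_000 is an assumed lower bound, not an exact count.
DRIVES_IN_FLEET = 9_000
REPLACEMENTS_PER_WEEK = 10
WEEKS_PER_YEAR = 52


def annual_failure_rate(replacements_per_week: float, fleet_size: int) -> float:
    """Return the fraction of the fleet replaced over one year."""
    return replacements_per_week * WEEKS_PER_YEAR / fleet_size


rate = annual_failure_rate(REPLACEMENTS_PER_WEEK, DRIVES_IN_FLEET)
print(f"Approximate annual failure rate: {rate:.1%}")
```

      With the assumed 9,000 drives this comes out a little under 6 percent, which is in the same ballpark as the quoted 5 percent; a somewhat larger fleet (around 10,400 drives) would give 5 percent exactly.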

    16. Re:My God... by QuantumRiff · · Score: 1

      They don't sell the management software; it's internal. They only sell a backup service to end users, via a client.

      --

      What are we going to do tonight Brain?
    17. Re:My God... by x6060 · · Score: 2

      That's all well and good that you can point out a few places that "Looks like marketing/advertising speak" and ignore the fact that they give you all the tools and knowledge to do it yourself and replicate the success of the "ultra-frugal Storage Pods". Hell, they even give you some of the issues you will run into using one of the free alternatives AND EVEN GIVE YOU THE SOLUTIONS FOR THEM. OH! But there are a few places where they mention the product that they run off the very system they told you how to build and could make yourself. That must mean they are evil corporate fat-cats trying to cheat you out of your hard-earned dollars. DAMN THEM FOR PROVIDING COST EFFECTIVE BACKUP SPACE!!!!!!!! How DARE they seek to earn a living. I am guessing you would be fine with the "slashvertisement" if their service was free though.

    18. Re:My God... by TooMuchToDo · · Score: 1

      1) Build 135TB box
      2) Install Openstack.org's Object Storage system (free! Amazon S3 API Compliant!)
      3) Profit? Fuck profit! STORE ALL THE THINGS!

    19. Re:My God... by TooMuchToDo · · Score: 1

      Amazon's S3 is based off of MogileFS (the concept, not the code): http://danga.com/mogilefs/

      And if you want to run an S3 compliant system internally, you'll use openstack.org's object storage system:

      http://www.openstack.org/projects/storage/

      Ability to provide object storage services at multi-petabyte scale
      Free open source software, no licensing fees, ‘open-core,’ or ‘freemium’ model
      Written in python; easy to differentiate your offering with extensions and modifications
      Compatibility and established ecosystem with industry standard OpenStack API
      Support for Amazon S3 API for easy inbound migration
      Completely multi-tenant, with billing integration hooks
      Pluggable authentication mechanism for SSO integration
      Integrated reseller model allows for resale of services

    20. Re:My God... by Anonymous Coward · · Score: 0

      The cloud is just services/servers on the internet you stupid TWIT. Yes, it is cloud since it is on the internet. Cloud does not mean redundancy. I can put stuff on my comcast line and call it the Cloud. Cloud is a fucking marketing term, stop being stupid.

    21. Re:My God... by Anonymous Coward · · Score: 0

      Hi - the main issue is they are different services. Amazon is designed to host your production data, high performance, lots of bandwidth, HA, etc

      This is almost "Write-once/only" storage.

      Some of your points are invalid
        - "They still haven't included the cost of ..drive replacement to account for drive failure over 3 years"
        o They state that they have a 3 year warranty, so replacement parts are free

        - "Space"
        o I think it's a given that this covers all the facility costs - power, cooling, a security guard, etc.

    22. Re:My God... by Rich0 · · Score: 1

      You could have a couple dozen Backblaze units, pay for a tech to monitor them 24/7/365 and replace all the drives twice over for what Amazon charges for the same thing.

      Ok, you'll also have to write software to keep them in sync.

      Oh, and you'll need a tech at each of those locations. You'll also need physical security. The security guards will need a supervisor, as will the techs. You'll need a bathroom at those locations, which means you'll need janitors, and so on.

      Don't underestimate the cost of overhead if you're running a business. It can be rather substantial.

      All that said, if you're willing to run at medium scale I have no doubts that anybody can set up the equivalent of AWS for less - after all Amazon does it and makes a profit on top. The question is whether a business really wants to. I'm sure my employer goes through enough plastic trash bags in a week to fund the creation of a bag manufacturing line. However, at some point a business needs to decide what business it is in...

    23. Re:My God... by Dillon2112 · · Score: 1

      I'm not sure I ever said anything about having to think about it. I can tell you that S3 runs at least triple redundancy (enough to survive the loss of two data centers simultaneously). That's a very different product from what BackBlaze is selling.

    24. Re:My God... by Dillon2112 · · Score: 1

      I didn't mean to assert that BackBlaze's product has no place in the market. Heck, I'd love to build a couple of their pods for home use. My only point was that it is a product that has very different strengths than S3. The only reason I picked S3 in the comparison is because *they* picked S3 when they decided to discuss pricing.

    25. Re:My God... by guruevi · · Score: 1

      Given what Amazon and company charge for bandwidth and monthly storage (doubly so for redundancy), while still managing to lose your data or frequently lose the connection to your 'pod', it IS cheaper to build your own, even if you use a comparable top-of-the-line SAS/FC enclosure such as the SASBeast or SATABeast.

      Backblaze simply drops the price for 'backup' or large amounts of storage (talking about multiple racks full) even further than we are used to with current solutions. $8000 for 135TB ($60/TB) is CHEAP. We currently buy similar units at roughly $300/TB. I can have a fully redundant system and still come out cheaper, even if it is slightly less space- and energy-efficient. After all the costs are counted, Amazon is roughly $1500/TB.
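      For anyone who wants to redo the per-terabyte arithmetic, here is a small Python sketch. It assumes raw capacity (45 drives of 3TB each, as in the summary) with no allowance for RAID6 parity or filesystem overhead, and the $300/TB and $1500/TB figures are simply the ones quoted in this comment, not independently checked. The function and variable names are illustrative only.

```python
def cost_per_tb(total_cost_usd: float, capacity_tb: float) -> float:
    """Return the cost in US dollars per raw terabyte."""
    return total_cost_usd / capacity_tb


# 45 drives of 3TB each, per the story summary (raw capacity, before RAID6 parity).
POD_CAPACITY_TB = 45 * 3

exact = cost_per_tb(7_384, POD_CAPACITY_TB)    # parts cost from the summary
rounded = cost_per_tb(8_000, POD_CAPACITY_TB)  # rounded figure used in this comment

print(f"Parts cost from the summary: ${exact:.2f}/TB")
print(f"Rounded $8000 figure:        ${rounded:.2f}/TB")

# Per-TB prices quoted in the comment, compared against the rounded pod figure.
for label, quoted_usd_per_tb in [("similar units", 300), ("Amazon", 1500)]:
    ratio = quoted_usd_per_tb / rounded
    print(f"{label}: about {ratio:.1f}x the pod's cost per TB")
```

      The rounded figure lands just under $60/TB, so the quoted prices work out to roughly 5x and 25x the pod's raw cost per terabyte.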

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    26. Re:My God... by Dillon2112 · · Score: 1

      First: ALL Marketing is misleading. That is what marketing does. Accentuate the positive, eliminate the negative. So complaining about that is just idiotic.

      OK, there's also the concept of truth in advertising. I sense you want to argue about this, but false comparisons are different than "accentuating the positives".

      Second: You could have a couple dozen Backblaze units, pay for a tech to monitor them 24/7/365 and replace all the drives twice over for what Amazon charges for the same thing. Sure, that doesn't include the cost for premises and high-speed Internet to multiple locations. But still, that is aggregated with all the other clients.

      Amazon doesn't charge for one data center with a couple dozen rack-mounted machines being watched by one tech. If that's what you think S3 is, you are mistaken.

      Third: what are you paying for in the "cloud", I mean besides ethereal concepts? Does Amazon tell you how they do things? You probably know less about Amazon's (and the others') setup, so you're comparing something you know something about (not everything) versus something you know almost nothing about, and then complaining that they aren't doing it in a comparable way. You don't know.

      Actually, I know quite a bit. Even if you ignore things I learned on the job, if you just read their basic literature about what they offer, you can see that they use Dynamo on the back end and that they have infrastructure "designed to provide 99.999999999% durability and 99.99% availability of objects over a given year", as well as "sustain the concurrent loss of data in two facilities.". I happen to know how they derived those numbers, but that's not a useful discussion...the numbers were honestly derived, even if they can't empirically show them to be correct (the service simply hasn't been around long enough for that). The point is that S3 is in an entirely different ballpark from what BackBlaze offers.

      Fourth: Your basic assumption is that Backblaze has no contingency for drive replacement, which is false. Since these are "new" drives there might be insufficient data about failure rates and therefore the actual cost of replacement (never mind warranties) or having drives in both Hot and Cold Spare setups. I'm sure that Backblaze in their $5/MO service figures what it costs to store data, have spares, keep the Datacenter running and profitable. Even if they double the cost to $10, it still puts the others to shame.

      No, my basic assumption is that they don't factor in the costs for drive replacement. They say in their blog post that they have drives that are out of warranty. They don't include the costs to replace them in their analysis. Even for the drives that *are* under warranty, they don't factor in the cost of identifying them, ordering replacements from the manufacturer, and replacing the actual drive. In other words, I'm asserting that the dominating cost isn't necessarily the drive, it's the labor. They even say that the "hidden costs" of doing all this are the labor. They don't discuss what they pay Sean, but they aren't factoring it in to their graphs at the bottom of the post.

      Have you compared the data loss rates for the last three years between Amazon and Backblaze?

      No, I'm asking them to.

      Can you even compare or is that data held secret (see point 1b).

      S3 has a service level agreement for the durability of the data. I haven't seen this from BackBlaze. I still maintain that the services aren't comparable in any meaningful way.

      My point here, is that you're pulling shit out of your ass and thinking it doesn't stink.

      I'm not sure what you're trying to say here.

      Even if it isn't directly comparable, it is at least in the realm of consideration, EVEN if everything you said is true. And at 10 times less in cost, that can buy a lot

    27. Re:My God... by Dillon2112 · · Score: 1

      You didn't read the article.

      Not only did I read the article, I read their blog post. And the original blog post they had about their 67 TB pod, back when they first wrote it.

      You know how I know that? They explicitly state that they don't have any costs for replacement hard drives over 3 years, because they're *all under warranty* for 3 years. When a drive fails, they get a new one from the manufacturer, no questions asked.

      There's no such thing as not having any costs. They employ Sean to replace drives and build pods full time. That's a cost, and it's not included in their charts at the bottom of the blog post. Bandwidth isn't either, and they never claim it is. Read on.

       

      And yes, actually, they do include bandwidth in the cost. They disclose it within the same paragraph.

      And this is where it gets interesting. In the paragraph you refer to, they say it costs $2100 for them to run a rack of 10 pods for one month. If each pod has 135TB, and we define 3PB to be 3000TB, then we need 23 pods to get 3PB (this is assuming all they use is their new pod design, which isn't the case: they use their less dense legacy design as well). But let's not get off track.

      So, 2.3 racks (23 pods total), each costing $2100 a month to operate in terms of *space rental*, *power* and *bandwidth*. This is the cost that includes bandwidth. So, assuming we run 2.3 racks for 36 months, we get $2100 * 36 * 2.3 = $173,880, just to operate 3PB for 3 years. That doesn't even include the cost to build them, or Sean's salary. And yet, somehow, at the bottom of their post, they assert that it costs a *total* of $96,000 to build and operate 3PB for three years. Odd, no?

      My point is that the costs they quote at the bottom of their blog post are inaccurate. Even if you take out bandwidth, which they state is roughly 1/3 of their operating costs, we're still talking about $116k of operating costs, above and beyond the cost to build the machines.
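      The arithmetic above is easy to reproduce. The following Python sketch uses only the numbers quoted in this comment ($2100 per rack per month, 10 pods per rack, 135TB per pod, 3PB taken as 3000TB, bandwidth taken as one third of operating cost); the variable names are made up for illustration, and it assumes cost scales linearly with a fractional rack, as the comment does.

```python
import math

# Figures as quoted in the comment above.
POD_CAPACITY_TB = 135
TARGET_CAPACITY_TB = 3_000        # 3PB, defined as 3000TB
PODS_PER_RACK = 10
RACK_COST_PER_MONTH_USD = 2_100   # space rental, power and bandwidth
MONTHS = 36
BANDWIDTH_SHARE = 1 / 3           # "roughly 1/3 of their operating costs"

# Whole pods needed to reach the target capacity.
pods = math.ceil(TARGET_CAPACITY_TB / POD_CAPACITY_TB)

# Assumption: a partial rack is billed pro rata (2.3 racks for 23 pods).
# Integer multiplication happens first so the result is not thrown off by
# floating-point rounding of 2.3.
operating_cost = RACK_COST_PER_MONTH_USD * MONTHS * pods / PODS_PER_RACK

# Operating cost with the bandwidth share taken out.
without_bandwidth = operating_cost * (1 - BANDWIDTH_SHARE)

print(f"Pods needed: {pods}")
print(f"Three-year operating cost: ${operating_cost:,.0f}")
print(f"Excluding bandwidth:       ${without_bandwidth:,.0f}")
```

      This gives 23 pods, $173,880 over three years, and about $116k once bandwidth is removed, matching the figures in the comment.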

      Even after you factor all that in, they still aren't beginning to offer the service that Amazon or Google does. Again, my point is they are misleading folks when they compare their product to Amazon as though they have the same features. They don't.

    28. Re:My God... by Dillon2112 · · Score: 1

      I came back from a genetic research conference a couple of months ago where an IT professional at Argonne National Labs spoke about his efforts to build a compute and storage capability there to support the needs of the genetic sequencing community at the lab. Unlike BackBlaze, which has about 200-300 machines, he was managing over 250,000 cores (so, what, maybe 15,000 machines?) and said that it was getting to the point where it would be cheaper for him to move to a cloud service provider.

      The lesson I took away from that is that at the small scale, you can probably do it more cheaply, but as you scale up, the larger outfits end up offering a better deal, which isn't all that surprising.

      I thought that might be an interesting data point, since you basically said the same thing in your last paragraph.

    29. Re:My God... by Dillon2112 · · Score: 1

      S3 is redundant. They compared themselves to S3.

    30. Re:My God... by Dillon2112 · · Score: 1

      Even their own math on the auxiliary costs doesn't add up. I outline what I think are the inconsistencies in a peer thread to this one.

    31. Re:My God... by ByOhTek · · Score: 1

      Ummm... Fairly trivial thing to do.

      The only thing that is particularly novel/interesting about that setup is their custom case. Everything else is plain obvious or advert for their service.

      --
      Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
  2. Already approaching Petabytes? by Hermanas · · Score: 1

    Wow, are we already approaching Petabyte clusters? I'm still getting used to Terabyte!

    1. Re:Already approaching Petabytes? by Anonymous Coward · · Score: 0

      Well, that's at least a Petabit cluster...

    2. Re:Already approaching Petabytes? by gman003 · · Score: 1

      According to TFA's TFA, the company has a total capacity of 16 petabytes, using only 201 pods (many being the old 1.0 pods with 67TB storage).

    3. Re:Already approaching Petabytes? by Walker1337 · · Score: 1

      Most of the high end storage providers have Petabyte arrays now. I work for Netapp and I have personally installed a single cluster with 624 2TB SATA drives in it.

    4. Re:Already approaching Petabytes? by Narnie · · Score: 1

      If I ever have to admin a Petabyte cluster, I'd name it Petabear.

      --
      greed@All_Evils:~#
    5. Re:Already approaching Petabytes? by corbettw · · Score: 1

      I know, it's crazy! Storage numbers are increasing faster than the national debt!

      --
      God invented whiskey so the Irish would not rule the world.
  3. Again? by DeHackEd · · Score: 0
    1. Re:Again? by DeHackEd · · Score: 1

      Ugh, replying to myself. I missed the link in the post.

      But nothing's changed, right? It's the same chassis, same diagrams from backblaze. Only ~2 years of bigger drives is new.

    2. Re:Again? by Amouth · · Score: 1

      and different hardware/raid/multiplier/power harness setup..

      basically the same just updated - and worth a note.. i wish they sold or someone sold the setup sans drives (or just the bare case) - it looks fun to mess with but i don't have a lot of free time nowadays.

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    3. Re:Again? by Anonymous Coward · · Score: 0

      If you read the blog post the parts that changed besides the hard drives are: motherboard, CPU, RAM amount, additional Gigabit ethernet, Changed from Debian 4 to 5, changed file system from JFS to ext4, changed to a different PCIe SATA card to eliminate the 4th one that used to run on just a regular PCI slot amongst other things. In other words, RTFA.

    4. Re:Again? by Anonymous Coward · · Score: 0

      There is more updated in there than just the drives. From the Backblaze blog entry:

      We’ve made several improvements to the design that have doubled the performance of the storage pod. Most of the improvements were straightforward and helped by Moore’s Law. We bumped the CPU up from the Intel dual core CPU to the Intel i3 540 and upgraded the motherboard from one Gigabit Ethernet port to a Supermicro motherboard with two Gigabit Ethernet ports. RAM dropped in price, so we doubled it to 8 GB in the new pod. More RAM enables our custom Backblaze software layer to create larger disk caches that can really speed up certain types of disk I/O.

      In the first generation storage pod, we ran out of the faster PCIe slots and had to use one slower PCI slot, creating a bottleneck. Justin Stottlemyer from Shutterfly found a better PCIe SATA card, which enabled us to reduce the SATA cards from four to three. Our upgraded motherboard has three PCIe slots, completely eliminating the slower PCI bottleneck from the system.

    5. Re:Again? by symbolset · · Score: 1

      You can get this from the company that builds the cases for them, Protocase. Send an email to lpodgursky@protocase.com for details. It's $5395.00 (1-4 units) and $4995.00 (5-9 units). And yes, that's more than building it yourself naturally.

      --
      Help stamp out iliturcy.
    6. Re:Again? by nabsltd · · Score: 1

      i wish they sold or someone sold the setup sans drives (or just the bare case)

      TFA says the case is available from Protocase for $875 in single unit quantities.

      A "pod" is just a standard x86 PC in this custom 4U case. Sure, it has a few specific extras, but all are standard, off-the-shelf hardware that you can easily buy. Appendix A in the Backblaze blog post gives every detail you need.

      If you start with just 15 hard drives (for a total of 45TB), then the price would be about $3300. You probably only save about $500 by using a standard case, because a decent one with room for 15 or more drives will set you back at least $300.

    7. Re:Again? by Amouth · · Score: 1

      5k just for the case? or is that everything sans drives?

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    8. Re:Again? by symbolset · · Score: 1

      Everything sans drives.

      --
      Help stamp out iliturcy.
    9. Re:Again? by Amouth · · Score: 1

      oops - read the thing and missed the one-off price on the line item - thanks for pointing that out.

      --
      '...if only "Jumping to a Conclusion" was an event in the Olympics.'
    10. Re:Again? by symbolset · · Score: 1

      They also sell the case by itself. They wanted $872 for qty 1 on the case alone about 18 months ago, some reasonable customization extras are available (custom silkscreen logo, custom colors and so on). Shipping is extra. It's odd that they don't have a simple web store setup for this, but it looks like their business is almost exclusively bespoke tin bending.

      --
      Help stamp out iliturcy.
  4. OLD OLD NEWS by Anonymous Coward · · Score: 0

    They have had a blog post on this topic for almost a year at least.

    1. Re:OLD OLD NEWS by kalalau_kane · · Score: 1

      Sun has been selling this same design for several years -- Sun x4500 released October 2006. - 6 SATA controllers - 48 top loading SATA drives - 2 x86 CPU.

  5. Not enough by bryan1945 · · Score: 2

    For a true porn collector yet.

    --
    Vote monkeys into Congress. They are cheaper and more trustworthy.
    1. Re:Not enough by Anonymous Coward · · Score: 0

      It's even far from it if you want your porn in 3D and 8k definition, as it should be.

    2. Re:Not enough by rbrausse · · Score: 1

      fun fact: porn industry has problems with high definition

      The high-definition format is accentuating imperfections in the actors — from a little extra cellulite on a leg to wrinkles around the eyes. [..] "The biggest problem is razor burn," said Stormy Daniels, an actress, writer and director. "I'm not 100 percent sure why anyone would want to see their porn in HD."

    3. Re:Not enough by Anonymous Coward · · Score: 0

      "Perfection" is overrated. Where some see flaws, others might see features ;).

      The "Girl Next Door" and MILF types might not be "perfect" but they're still popular.

  6. Feelin' HOT HOT HOT by GameboyRMH · · Score: 0

    Something about all those drives being packed in there like hot metal sardines gives me a bad feeling...

    --
    "When information is power, privacy is freedom" - Jah-Wren Ryel
    1. Re:Feelin' HOT HOT HOT by L4t3r4lu5 · · Score: 1

      I wouldn't be surprised if the top of the case fit flush with the hard drive cases and was used as a heatsink. Alu top case, finned, with a bank of fans in push/pull configuration, and a hot/cold arrangement of ducting along the racks.

      That's how I'd do it, anyway.

      --
      Finally had enough. Come see us over at https://soylentnews.org/
    2. Re:Feelin' HOT HOT HOT by Anrego · · Score: 1

      The multipliers make me more nervous!

      Seriously... my experience with sata multipliers has been that they should be avoided at all costs.

    3. Re:Feelin' HOT HOT HOT by hjf · · Score: 3, Informative

      This is nothing new. You've never been in a datacenter before, kid. You can ask a grownup one day and he can take you there and you will feel the heat. And NOISE. No offense, but I think you're one of those gamer kids who builds rigs for max FPS, with esoteric water cooling and silent fans everywhere.

      Yeah, no, you don't need to pamper your hardware that much. Even laptop drives work way hot (60C+) for years with no issue.

      Most servers are built that way too. The Sun x4500 is extremely densely packed. And there are hundreds running just fine.

    4. Re:Feelin' HOT HOT HOT by Lorien_the_first_one · · Score: 1

      Thank you for pointing that out about laptop drives. I have one at home burning it up at over 50C.

      --
      The diversity and expression of human opinion is essential to human survival.
    5. Re:Feelin' HOT HOT HOT by Anonymous Coward · · Score: 0

      Something about all those drives being packed in there like hot metal sardines gives me a bad feeling...

      apparently it is not an issue as their blogpost says:

      We monitor the temperature of every drive in our datacenter through the standard SMART interface, and we’ve observed in the past three years that: 1) hard drives in pods in the top of racks run three degrees warmer on average than pods in the lower shelves; 2) drives in the center of the pod run five degrees warmer than those on the perimeter; 3) pods do not need all six fans—the drives maintain the recommended operating temperature with as few as two fans; and 4) heat doesn’t correlate with drive failure (at least in the ranges seen in storage pods).

    6. Re:Feelin' HOT HOT HOT by Anonymous Coward · · Score: 0

      Their very specific selection of SYBA-branded SATA card is because that card works best with the multipliers. They have this figured out already, or else how on earth would they have hundreds of these pods working well in a datacenter.

    7. Re:Feelin' HOT HOT HOT by houghi · · Score: 1

      I have one running as a server. The fan inside is broken so no cooling at all. It runs around 100C for several months now.

      --
      Don't fight for your country, if your country does not fight for you.
    8. Re:Feelin' HOT HOT HOT by Kjella · · Score: 1

      Well, that noise is the massive fans that keep the temperature of the equipment fairly close to ambient. If you quiet down the fans, the room temperature won't change much but power-hungry components will suddenly be way, way above room temperature. I had a really crappy cabinet crammed with back-to-back disks, didn't think much of it until they started dying... checked the SMART data, oh 75C for the top drive... that's 50C or so above the ambient temperature in the room. Better cabinet with more space, more and bigger fans, now it's down to 40-45C. It's not to "pamper" the hardware that they do it, it's to do it quietly. If you don't care that your gaming machine sounds like a jet engine taking off, there's no problem.

      --
      Live today, because you never know what tomorrow brings
    9. Re:Feelin' HOT HOT HOT by cmiller173 · · Score: 1

      You would be surprised that there is a piece of foam between the top of the case and the drives if you RTFA!

    10. Re:Feelin' HOT HOT HOT by gweihir · · Score: 1

      Thermal design is highly non-intuitive. So you experiment, measure and have monitoring and automated emergency-shutdown in place. You do not even need fan-monitoring with this setup. Just very simple disk-temperature monitoring will tell you when a fan is down. My guess would be that they can tolerate one fan failure for some time and do a forced shutdown if two go down.

      This is for experienced engineers. I have done things like this before, and I think I could design both hardware and software for these boxes. It is not magic, just solid engineering with a solid understanding of the problems involved.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    11. Re:Feelin' HOT HOT HOT by Anrego · · Score: 1

      Oh no doubt. I mean they are using these things reliably as you said, so I'm sure it works. Same can be said about the heat issues (though I guess that would be dependent on external cooling as well).

      Just saying that the mere mention of SATA multipliers makes me cringe and fear for my data/sanity :)

    12. Re:Feelin' HOT HOT HOT by bill_mcgonigle · · Score: 1

      Seriously... my experience with sata multipliers has been that they should be avoided at all costs.

      SAS multipliers with SATA drives is a better risk/cost balance, for the general case.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    13. Re:Feelin' HOT HOT HOT by serviscope_minor · · Score: 1

      Yeah, no, you don't need to pamper your hardware that much. Even laptop drives work way hot (60C+) for years with no issue.

      That sounds a little hot. Just logged into one of my compute servers and the sensors read between 34 and 44 degrees. Though it's a 1U quad 6100[*] with very little disk space. But in general, slightly cold is waaaaay worse than very hot since the oil gets too viscous. My laptop runs hotter (cpu reads between 50 and 70 degrees), but it has a flash disk.

      [*] The 1U quad 6100s are astonishingly dense. You see many vendors bragging about how some silly hacked up job made of infinite atom CPUs or ARM or MIPS is super dense and low power, and they tend not to stack up well against the 6100s in terms of flops / U (or often even cores / U) and don't do much (if at all) better in terms of flops per watt. I think AMD are the current winner in this regard.

      But yes, the heat and noise is quite astonishing.

      By the way, there are decent companies that will sell you a watercooled rig off the shelf if you need a GPU workstation that doesn't sound like a turbojet. They look a little funny and l33t-gamer but they work very well.

      --
      SJW n. One who posts facts.
    14. Re:Feelin' HOT HOT HOT by Penguinisto · · Score: 1

      The foam is there for good reason, too... you don't want hard drives banging (even at a sub-millimeter distance) against the top of the case - tends to wear out the drives, cause more errors, and makes noise.

      I know, I know... 'but they're flush!' Well, unless you custom-machined each HDD case *and* the unit case they went in, you're guaranteed to have a few drives in that type of physical array vibrate like that.

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    15. Re:Feelin' HOT HOT HOT by hjf · · Score: 1

      I was talking about guys that buy full-tower machines to make 4-way RAID 0+1 arrays, with each disk 20cm apart from the other, and a hard drive cooler (with two fans) for each drive. That's overkill.
      Just a small amount of wind running under the drive is enough to keep it cool. No need to keep it at room temperature. 50C is good enough.

      Yes, you need forced air (that is, fans). But a few correctly placed fans for the whole case, are enough.

  7. This is a huge step forward by mugurel · · Score: 3, Funny

    for both internet security and privacy: each of us can now store his own local copy of the internet and surf offline!

    1. Re:This is a huge step forward by AmberBlackCat · · Score: 1

      That would actually be nice. If every site I ever went to was cached locally. Like having a browser cache with unlimited size. It would be miles better than archive.org, if you remember a site from years ago and wish you could go back. Even better if it prefetched links you never clicked on.

    2. Re:This is a huge step forward by demonbug · · Score: 1

      for both internet security and privacy: each of us can now store his own local copy of the internet and surf offline!

      Of course, with my 150GB/month bandwidth cap it is going to take ~70 years to fill it up...
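
      The estimate above can be checked with a short back-of-the-envelope sketch. It assumes decimal units (1 TB = 1000 GB) and that the cap is fully used every month; the function name is made up for illustration.

```python
# Back-of-the-envelope check: how long a capped connection needs to fill one pod.
POD_CAPACITY_TB = 135   # raw pod capacity from the story (decimal terabytes)
MONTHLY_CAP_GB = 150    # bandwidth cap quoted in the comment


def months_to_fill(capacity_tb: float, cap_gb_per_month: float) -> float:
    """Months of saturating the cap needed to download capacity_tb terabytes."""
    return capacity_tb * 1000 / cap_gb_per_month


months = months_to_fill(POD_CAPACITY_TB, MONTHLY_CAP_GB)
years = months / 12
print(f"{months:.0f} months, about {years:.0f} years")
```

      Under those assumptions the figure comes out at 75 years, in line with the rough "~70 years" quoted.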

    3. Re:This is a huge step forward by Pharmboy · · Score: 1

      wget -m -p http://*

      Just run that in your cron.daily scripts and you are good to go!

      --
      Tequila: It's not just for breakfast anymore!
  8. But you cant use it without getting too hot? by drolli · · Score: 1

    Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

    1. Re:But you cant use it without getting too hot? by blackraven14250 · · Score: 1

      With those gigantic fans, and the track record they have, it's probably ok.

    2. Re:But you cant use it without getting too hot? by Black.Shuck · · Score: 2

      It's probably fine.

    3. Re:But you cant use it without getting too hot? by gweihir · · Score: 1

      First, it depends on airflow. That is pretty close to optimal in the design. Second, you can monitor disk temperature and even have an emergency slowdown or shut-off if they overheat. Monitoring and shut-down is easy to script, maybe half a day if you know what you are doing.
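
      A minimal sketch of the monitoring half of such a script, assuming smartmontools is installed and that the drive reports its temperature as SMART attribute 194 (some drives use 190 instead). The 55C limit, the 5C warning margin and the function names are assumptions for illustration, not anything Backblaze has published.

```python
import subprocess
from typing import Iterable, Optional

MAX_TEMP_C = 55      # assumed shutdown limit; check the drive's data sheet
WARN_MARGIN_C = 5    # assumed margin below the limit at which to warn

# Shortened example of what `smartctl -A /dev/sdX` prints for one attribute.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   118   098   000    Old_age   Always       -       32
"""


def parse_temperature(smartctl_output: str) -> Optional[int]:
    """Return the raw value of attribute 194, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


def read_temperature(device: str) -> Optional[int]:
    """Ask smartctl for one drive's attributes (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return parse_temperature(result.stdout)


def action_for(temps: Iterable[Optional[int]], limit: int = MAX_TEMP_C) -> str:
    """Decide what to do given the temperatures of all drives in the pod."""
    known = [t for t in temps if t is not None]
    if any(t >= limit for t in known):
        return "shutdown"
    if any(t >= limit - WARN_MARGIN_C for t in known):
        return "warn"
    return "ok"


print(parse_temperature(SAMPLE_OUTPUT), action_for([32, 40, 41]))
```

      The actual shut-off would be whatever the site prefers once `action_for` returns "shutdown"; run from cron every few minutes, this covers the "fan is down" case without any separate fan monitoring.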

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:But you cant use it without getting too hot? by demonbug · · Score: 3, Interesting

      Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

      According to their blog post about it, they see a variation of ~5 degrees within unit (middle drives to outside drives) and about 2 degrees from the lowest unit in a rack to the highest. They also indicate that the drives stay within the spec operating temperature range with only two of the six fans in each chassis running.

      Keep in mind these are 5400 RPM drives, not the 10K+ drives you would expect in an application where performance is critical. These are designed for one thing - lots of storage, cheap. No real worries about access times, IOPS, or a lot of the other performance measures that a more flexible storage solution would need to be concerned with. These are for backup only - nice large chunks of data written and (hopefully) never looked at again.

    5. Re:But you cant use it without getting too hot? by WuphonsReach · · Score: 1

      Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

      It doesn't take much airflow at all to keep drives down around 35-40C. Even a light breeze can be enough to drop drive temperatures 5-10C. They're only 5-10W devices (for 3.5" drives) which means they're easy to cool in comparison to the 100-200W video cards or the 95-150W CPUs.

      --
      Wolde you bothe eate your cake, and have your cake?
  9. Can't actually store 135TB of data by gman003 · · Score: 4, Interesting

    The article says it uses RAID 6 - 45 hard drives are in the pod, which are grouped into arrays of 15 that use RAID 6 (the groups being combined by logical volumes), which gives you an actual data capacity of 39TB per group (3TB * (15 - 2) = 39TB), which then becomes 117TB usable space (39TB * 3 = 117TB). The 135TB figure is what it would be if you used RAID 1, or just used them as normal drives (45 * 3TB = 135TB).

    And these are all "manufacturer's terabytes", which is probably 1,024,000,000,000 bytes per terabyte instead of 1,099,511,627,776 (2^40) bytes per terabyte like it should be. So it's a mere 108 terabytes, assuming you use the standard power-of-two terabyte ("tebibyte", if you prefer that stupid-sounding term).

    1. Re:Can't actually store 135TB of data by GameboyRMH · · Score: 2, Informative

      A manufacturer's terabyte would be 1,000,000,000,000 bytes.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    2. Re:Can't actually store 135TB of data by gman003 · · Score: 3, Informative

      Common usage for the past 50 years has been that, in the context of computer memory capacity, "tera-" is to be interpreted as 2^40 (with "giga-" being 2^30, and so on). You'll note that I included a sidenote on "tebibytes" to appease revisionists like you.

      PS: It's rather ironic that someone accusing me of bastardizing SI prefixes can't even spell "terabytes" properly. Unless you're somehow referring to Earth Bytes or something.

    3. Re:Can't actually store 135TB of data by Wildclaw · · Score: 1

      I haven't checked how Hitachi does it, but that's how Seagate and Western Digital do it.

      Bullshit. Neither of my new 2TB Western Digital disks come with 2048*10^9 storage space.

    4. Re:Can't actually store 135TB of data by Inda · · Score: 1

      Tell me about it!!!

      £4,561.68 still sounds like a steal. In fact, I might just steal one and save even more!

      I actually spend more than that on food for the family per year. I wonder...

      --
      This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
    5. Re:Can't actually store 135TB of data by Kjella · · Score: 4, Informative

      Hitachi:
      "Capacity - One GB is equal to one billion bytes and one TB equals 1,000GB (one trillion bytes) when referring to hard drive capacity."

      Western Digital:
      "As used for storage capacity, one megabyte (MB) = one million bytes, one gigabyte (GB) = one billion bytes, and one terabyte (TB) = one trillion bytes."

      Seagate (PDF product sheets):
      "When referring to hard drive capacity, one gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes."

      So no, no and more no. Sometimes there really should be a "-1, Wrong" moderation...
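
      With the manufacturers' definition (1 TB = 10^12 bytes), the conversion to power-of-two units is a one-liner. A small sketch, taking the 135TB raw and 117TB RAID 6 figures from this thread; the helper name is made up for illustration.

```python
TB = 10**12    # terabyte as drive manufacturers define it
TIB = 2**40    # tebibyte (power-of-two "terabyte")


def to_tib(size_bytes: int) -> float:
    """Convert a byte count to tebibytes."""
    return size_bytes / TIB


raw_bytes = 45 * 3 * TB                # 45 drives of 3 TB each
usable_bytes = 3 * (15 - 2) * 3 * TB   # three 15-drive RAID 6 groups

print(f"raw:    {raw_bytes // TB} TB = {to_tib(raw_bytes):.2f} TiB")
print(f"usable: {usable_bytes // TB} TB = {to_tib(usable_bytes):.2f} TiB")
```

      That puts the usable space at roughly 106.4 TiB before any filesystem overhead.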

      --
      Live today, because you never know what tomorrow brings
    6. Re:Can't actually store 135TB of data by FreeBSDbigot · · Score: 1

      The 135TB figure is what it would be if you used RAID 1

      Actually, RAID 1 (mirroring) would cut the usable space in half; RAID 0 (striping) would keep it at 135TB.

      --
      Orange whip? Orange whip? Three orange whips.
    7. Re:Can't actually store 135TB of data by OverlordQ · · Score: 0

      Just because it's been used that way in the past shouldn't be justification for continuing to bastardize it.

      --
      Your hair look like poop, Bob! - Wanker.
    8. Re:Can't actually store 135TB of data by complete loony · · Score: 1

      Data is also duplicated across different pods so you can lose one due to power supply issues and not care for a while. RAID across local groups of disks does seem a bit pointless when you already have a layer of redundancy across the whole rack.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    9. Re:Can't actually store 135TB of data by Just Some Guy · · Score: 2

      Stop bastardizing the SI prefixes. Terra is the prefix

      The irony: it is strong with this one.

      --
      Dewey, what part of this looks like authorities should be involved?
    10. Re:Can't actually store 135TB of data by thesh0ck · · Score: 0

      wrong

    11. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      Yeah, and from your Internet provider, is a megabit 1048576 bits? (answer: no, it is not).

    12. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      I'm going to bypass the whole debate and use Earth Bytes from now on.

    13. Re:Can't actually store 135TB of data by ari_j · · Score: 1

      Just a small quibble: RAID level 1 would give you a capacity of 3TB with an absurd amount of redundancy. Level 0 is that one that would give you 135TB striped across all 45 disks.

    14. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      raid1 is mirroring, that would result in 3*45/2=67,5 TB.

      You probably meant raid0, striping, that would use ALL space for user purpose and speed up transfers significantly compared to any other raid level.

    15. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      The metric system is way too clean as it is now, Americans need to bastardize it a little so they don't look too silly for not using it. To further help the cause make sure you act as if SI is case insensitive, and write km, Km and KM interchangeably.

    16. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      Since your argument is based on "common usage in the past 50 years" instead of arguing directly that overloading these prefixes is a good idea, I've been thinking... what is the minimum x such that "common usage in the past x years" is enough to justify doing something that is not directly justified?

    17. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      > The 135TB figure is what it would be if you used RAID 1, or just used them as normal drives (45 * 3TB = 135TB).

      I think you mean RAID-0 or JBOD. Raid-1 is mirroring - you'd get 3TB of data storage, all replicated 45 times.

    18. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      Just because something has been wrong for 50 years doesn't make it the truth.

    19. Re:Can't actually store 135TB of data by maxwell demon · · Score: 1

      Well, of course Terra is the prefix for 10^42. You know, Terra is really big. :-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
    20. Re:Can't actually store 135TB of data by QuantumRiff · · Score: 1

      There is "cloud storage" management software that would be awesome on these boxes (although they might benefit from a bit more CPU and RAM, and some more Gigabit NICs). When I read this blog article yesterday, I immediately went back to OpenStack. The examples for OpenStack Storage don't even bother with RAID, since the objects on the drive will be replicated to multiple other servers automatically. This could be very, very interesting.

      http://www.openstack.org/projects/storage/

      --
      What are we going to do tonight Brain?
    21. Re:Can't actually store 135TB of data by marcosdumay · · Score: 1

      Kelvin Mega what?

      On a side note, I know plenty of people that use the SI as if it was case insensitive. Other common bastardizations:

      Calling the unit "linear meter", and abbreviating it as "ml" (in Latin languages of course; English-speaking people are more likely to use some custom unit for that) (again, probably case insensitive, so replace it by Ml, mL or ML if you like).

      Abbreviating second as "sec", square meter as "sqm" or "quadm", "mqd", etc.

    22. Re:Can't actually store 135TB of data by gman003 · · Score: 1

      Huh. That's odd - I distinctly remember seeing otherwise. Oh well - guess I was wrong.

    23. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      The 135TB figure is what it would be if you used RAID 1, or just used them as normal drives (45 * 3TB = 135TB).

      And these are all "manufacturer's terabytes", which is probably 1,024,000,000,000 bytes per terabyte instead of 1,099,511,627,776 (2^40) bytes per terabyte like it should be. So it's a mere 108 terabytes, assuming you use the standard power-of-two terabyte ("tebibyte", if you prefer that stupid-sounding term).

      You mean RAID 0...RAID1 is 100% redundancy so would cut your storage in half.

    24. Re:Can't actually store 135TB of data by gman003 · · Score: 1

      Dammit, why do I keep getting those mixed up?

    25. Re:Can't actually store 135TB of data by Anonymous Coward · · Score: 0

      Not your fault; they should have been clearer. It's just every single manufacturer, stamping it on every single package, data sheet, and product listing for years.

    26. Re:Can't actually store 135TB of data by Roman Mamedov · · Score: 1
    27. Re:Can't actually store 135TB of data by olau · · Score: 1

      Except that we haven't talked about terabytes for more than the last few years since the capacities weren't there before. Nice try. :)

    28. Re:Can't actually store 135TB of data by serviscope_minor · · Score: 1

      All manufacturers have used base 10 for many years.

      I think you are thinking of the 1.44MB floppy where the "megabytes" are the bastard son of 10^3 and 2^10.

      --
      SJW n. One who posts facts.
    29. Re:Can't actually store 135TB of data by serviscope_minor · · Score: 1

      Common usage for the past 50 years has been that, in the context of computer memory capacity, "tera-" is to be interpreted as 2^40 (with "giga-" being 2^30, and so on). You'll note that I included a sidenote on "tebibytes" to appease revisionists like you.

      No, it has only been common in RAM since otherwise you end up with RAM chips with holes in them. RAM chips always use 2^N bits, not bytes.

      The use for magnetic storage has always been very inconsistent (with the pinnacle of silly units being reached with the 1.44MB floppy). This makes sense since there is no real need to have a power of two sector count, especially as these days the number of sectors per track varies to give effectively constant bitrate across the disk.

      Bandwidth has almost always been measured in base 10 units, since engineers tend to think about bitrates and MHz etc (who ever heard of 1KHz == 1024 Hz). It has been bastardized in the direction of base two units on and off in parts of the computer industry (e.g. display to users).

      For many computer uses, base 2 units remain useful.

      But the use for everything except RAM has always been inconsistent, so it really makes sense to distinguish between K M G T and Ki, Mi, Gi, Ti. It may sound silly, but at least it is unambiguous.

      --
      SJW n. One who posts facts.
    30. Re:Can't actually store 135TB of data by Rich0 · · Score: 1

      Yeah, but just think of the parallel seek times!

    31. Re:Can't actually store 135TB of data by Solandri · · Score: 2

      Some marketer at Maxtor(?) started the transition from the 2^20 definition of MB to 10^6 for HDDs in the mid-1990s. The (at the time) smaller HDD manufacturers like Western Digital quickly followed suit. Seagate was one of the later ones. IBM (now Hitachi) was the last one to make the switch - they held out until about 2000.

    32. Re:Can't actually store 135TB of data by DamnStupidElf · · Score: 1

      Having to replicate 145 TB over the network just because one disk failed is kind of pointless. RAID6 will just rebuild the failed disk locally.

    33. Re:Can't actually store 135TB of data by houstonbofh · · Score: 1

      Dammit, why do I keep getting those mixed up?

      Because you haven't lost a drive yet?

    34. Re:Can't actually store 135TB of data by gman003 · · Score: 1

      Probably. My knowledge of RAID is mostly theoretical at this point - I got to set up a server last week, used a five-disk RAID 5 plus two hot spares. Nothing's failed yet, although I expect to experience a failure within a year, as the disks are over a decade old.

    35. Re:Can't actually store 135TB of data by complete loony · · Score: 1

      If a disk fails, you lose 3TB not 145. You'd only lose access to 135TB if you lose connectivity to a whole pod, and even then the disks might be recoverable. So of their 9,000 disks, about 10 fail and are replaced per week. So you'd want to ensure that you can lose any random set of disks without losing data.

      From their first article, they put 15 disks into a RAID array and would need to lose 3 of them before they can't rebuild the array. But even when you do replace a drive, the rebuild on a 3 TB disk @ 150MB/s is going to leave quite a long window between drive failure and your data being safe again.

      Instead, you could do something close to what I believe Google does: make sure every block of data is stored on, say, 3 randomly chosen disks on separate machines (though you might want more than that). Then, if you lose any two disks at once, the number of data blocks that are down to one copy would be quite small and would only take a few seconds to replicate again.
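
      To put rough numbers on the rebuild window and the failure rate mentioned above, here is a small sketch. It assumes the rebuild streams at a constant 150 MB/s with no competing load (so it is a lower bound), and it annualizes "about 10 per week out of 9,000" naively; the function name is hypothetical.

```python
def rebuild_hours(drive_bytes: float, bytes_per_second: float) -> float:
    """Best-case time to rewrite a whole replacement drive, in hours."""
    return drive_bytes / bytes_per_second / 3600


hours = rebuild_hours(3e12, 150e6)        # 3 TB drive at 150 MB/s
annual_failure_rate = 10 * 52 / 9000      # ~10 failures a week across 9,000 drives

print(f"best-case rebuild: {hours:.1f} h")
print(f"annualized failure rate: {annual_failure_rate:.1%}")
```

      So even in the best case a group runs degraded for more than five hours per failed drive, which is the window the replicated-block approach tries to shrink.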

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    36. Re:Can't actually store 135TB of data by Stuarticus · · Score: 1

      as the disks are over a decade old.

      Wow, you must be able to store multiple DVDs on that array!

      --
      If you think someone isn't free to have a different definition of "freedom" you may be a tyrant.
    37. Re:Can't actually store 135TB of data by gman003 · · Score: 1

      Yeah. Total capacity: 126GB. Total usable in current config: 72GB.
      Hey, at least I stuck Linux on it. The rate I'm doing that, pretty soon the only Windows-running server on the network will be the domain controller.

    38. Re:Can't actually store 135TB of data by Elshar · · Score: 1

      You got a little mixed up on your RAID levels.

      RAID 1 is mirroring, so you'd get (45 * 3TB) / 2 = 67.5TB

      RAID 0 is concatenating, so you'd get 45 * 3TB = 135TB

      You really wouldn't want to do RAID0 for a 'backup' array. :)
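
      The different RAID 1 numbers quoted in this thread (67.5TB vs. 3TB) come from different layouts: mirrored pairs versus one 45-way mirror. A sketch comparing them with RAID 0 and the pod's RAID 6 groups; the level labels and function name are made up for illustration, and 45 is odd, so the mirrored-pairs figure is nominal.

```python
def usable_tb(level: str, drives: int, drive_tb: float, group_size: int = 15) -> float:
    """Nominal usable capacity in TB for a few simple layouts."""
    if level == "raid0":          # striping, no redundancy
        return drives * drive_tb
    if level == "raid1-pairs":    # two-way mirrors, half the raw space
        return drives * drive_tb / 2
    if level == "raid1-nway":     # every drive holds the same data
        return drive_tb
    if level == "raid6":          # two parity drives per group
        groups = drives // group_size
        return groups * (group_size - 2) * drive_tb
    raise ValueError(f"unknown level: {level}")


for level in ("raid0", "raid1-pairs", "raid1-nway", "raid6"):
    print(f"{level:12s} {usable_tb(level, 45, 3):6.1f} TB")
```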

    39. Re:Can't actually store 135TB of data by DamnStupidElf · · Score: 1

      To only lose 3TB out of the 145TB implies that the metadata is stored redundantly across all storage devices and that the data is not striped. If the data is striped then the loss of one drive makes most of it useless. If metadata is not redundant across several drives then you could lose the drive storing metadata about data on other drives.

      It's a cost vs. speed/simplicity trade-off to only replicate data. You have to buy 300% of raw storage for three full replicas, versus approximately 115% for RAID6 on 15 disks. For enclosure/server redundancy RAID6, or in general any (m,n) erasure code, can be used to aggregate enclosures with a small overhead for protecting data. The entire reason for designing erasure codes was to save time/space/money.
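
      The overhead comparison can be written out as a ratio of raw to usable storage. A minimal sketch treating both schemes as "k data units plus r redundancy units"; the function name is hypothetical.

```python
from fractions import Fraction


def raw_per_usable(data_units: int, redundancy_units: int) -> Fraction:
    """Raw storage bought per unit of usable storage."""
    return Fraction(data_units + redundancy_units, data_units)


raid6_overhead = raw_per_usable(13, 2)       # 15-disk RAID 6: 13 data + 2 parity
triple_replication = raw_per_usable(1, 2)    # one copy plus two extra replicas

print(f"RAID 6 on 15 disks: {float(raid6_overhead):.1%} of usable")
print(f"3 full replicas:    {float(triple_replication):.1%} of usable")
```

      Both layouts survive the loss of any two units; the erasure-coded one simply pays about 115% instead of 300% for it, at the price of a slower rebuild.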

  10. Deja Vooooooo.... by Anonymous Coward · · Score: 0

    Didn't we cover this story a couple of years ago with smaller drives?

    1. Re:Deja Vooooooo.... by cmiller173 · · Score: 1

      Didn't the summary say so and provide a link to the previous story? Of course, in addition to the drives getting bigger they changed a couple other things (MB, memory, CPU, SATA cards, SATA multipliers, wiring), but it is the same case so sure it's the same.

  11. The price is too high.. by adamjcoon · · Score: 1

    You can buy 68 internal drives (2TB each) for the low price of $5439.32 http://www.newegg.com/Product/Product.aspx?Item=N82E16822152245 I'm not a hardware expert, but I imagine you could connect them somehow for less than $1944.68.. ($7384 - $5439.32)

    1. Re:The price is too high.. by h4rr4r · · Score: 1

      $2000 to connect 68 drives seems crazy cheap. A good raid controller can cost more than that.

    2. Re:The price is too high.. by tomz16 · · Score: 2

      Nope, not at all... $2,000 is actually really cheap IMHO. Try to find a way to connect 68 drives cheaply (RAID cards and SATA multiplier backplanes are both pretty expensive). Don't forget that you also need a custom case, motherboard, ram, cpu, PS, and cooling for everything.

    3. Re:The price is too high.. by Anonymous Coward · · Score: 0

      "I'm not a hardware expert"

      If you hadn't told me, I'd have never known.

    4. Re:The price is too high.. by gman003 · · Score: 1

      Yes, 2TB drives are more cost-effective (price per terabyte) than the 3TB drives. But one of the major costs for Backblaze is power and space. They pay about $2,000 per month per rack in space rental, power and bandwidth, regardless of whether that rack is using 3TB drives or 300GB drives. So the difference in hardware costs is paid back by the increased density.

    5. Re:The price is too high.. by hjf · · Score: 1

      The price also includes custom made cases, fans, the power supplies, and custom-made port multiplier SATA backplanes. The custom parts make it pretty expensive, I guess.

    6. Re:The price is too high.. by b0bby · · Score: 1

      The 3TB drives are $6300 for 45 at newegg, and you'll need less cases/space/power for them - it's probably a wash in the end.
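
      For comparison, the price per raw terabyte of the options quoted in this thread works out as follows. This is only a sketch using the prices posted here; it ignores power, rack space and the extra chassis the 2TB option would need.

```python
def dollars_per_tb(total_price: float, drive_count: int, drive_tb: float) -> float:
    """Price per raw (decimal) terabyte."""
    return total_price / (drive_count * drive_tb)


bare_2tb = dollars_per_tb(5439.32, 68, 2)    # 68 bare 2 TB drives
bare_3tb = dollars_per_tb(6300.00, 45, 3)    # 45 bare 3 TB drives
whole_pod = dollars_per_tb(7384.00, 45, 3)   # complete pod, drives included

print(f"2 TB drives only: ${bare_2tb:.2f}/TB")
print(f"3 TB drives only: ${bare_3tb:.2f}/TB")
print(f"complete pod:     ${whole_pod:.2f}/TB")
```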

    7. Re:The price is too high.. by Anonymous Coward · · Score: 0

      cost it all up and add additional psu and controllers for 68 drives. we'll wait...let's all tell people how to do something they are already doing.

    8. Re:The price is too high.. by Anonymous Coward · · Score: 0

      I'm not a hardware expert, but I imagine you could connect them somehow for less than $1944.68.. ($7384 - $5439.32)

      Not that easily. Beside a strong power supply and a big case you need enough ports.
      Either you do one mainboard + processor + RAM + case for each, which means 7 computers with 10 ports each, leaving 300$ per machine.

      Or you need a mainboard that can take e.g. 5 12-port SATA controllers (which are normally PCIe x8, good luck finding a board with that many slots).
      Here you are strictly in server territory, and that means an immediate price x 10 :-) from what you are used to.

    9. Re:The price is too high.. by kiwimate · · Score: 2

      You might want to read the actual blog where they explain what they use in a bit of detail. This isn't my area of expertise either, but I do know that running 10 servers is very different from running 100 servers, which is also different from running 1000 servers. There are many questions that crop up that you really don't have to consider when you're down in the smaller arenas. (E.g. patch management - manually patching 10 servers is feasible and more cost effective than having an OTS solution; manually patching 1000 servers, not so much.)

      They do also state at the outset:

      In this post, we'll share how to make a 2.0 storage pod, and you're welcome to use the design. We'll also share some of our secrets from the last three years of deploying more than 16 petabytes worth of Backblaze storage pods. As before, our hope is that others can benefit from this information and help us refine the pods.

      My reading - they definitely know more about this than I do, and they're not too proud to admit there could be lessons they can learn from the community.

    10. Re:The price is too high.. by Savantissimo · · Score: 1

      The specific drives they recommend are $130 each (Hitachi Deskstar 5K3000 HDS5C3030ALA630 http://www.newegg.com/Product/Product.aspx?Item=N82E16822145490R ), $5850 for 45, ~1% failure rate vs. the 5% they were getting from other drives.

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    11. Re:The price is too high.. by houstonbofh · · Score: 1

      I'm not a hardware expert, but I imagine you could connect them somehow for less than $1944.68..

      Yes, it is usually cheaper to build hardware in your imagination...

  12. RAID-6 by hjf · · Score: 1

    RAID-6, really?
    After 5+ years working with ZFS, personally, I wouldn't touch md/extX/xfs/btrfs/whatever with a 10 foot pole. Solaris pretty much sucks (OpenSolaris is dead and the open source spinoffs are a joke), but for a storage backend it's years ahead of Linux/BSD.

    Sure, you can run ZFS on Linux (I did) and FreeBSD (I do), but for huge amounts of serious data? No thanks.

    1. Re:RAID-6 by TheRaven64 · · Score: 1

      Sure, you can run ZFS on Linux (I did) and FreeBSD (I do), but for huge amounts of serious data? No thanks.

      What do you count as a serious amount of data? And what makes the FreeBSD version inferior in your opinion (aside from being a slightly older version - I think -STABLE now has the latest OpenSolaris release)?

      Genuinely curious: I'm thinking of building a FreeBSD/ZFS NAS and I'd like to know if there's anything in particular that I need to look out for. Performance isn't really important, because most of the time I'll be accessing it over WiFi anyway, which is likely to be far more of a bottleneck than anything else. I'm planning on using 3 2TB disks, for 4TB of storage space in a RAID-Z configuration.

      --
      I am TheRaven on Soylent News
    2. Re:RAID-6 by Anonymous Coward · · Score: 0

      I am not trolling but genuinely interested: are there any other advantages to 6TB total storage for 4TB usable in RAID-Z as opposed to RAID-5, other than the checksumming and block-level de-duplication?

    3. Re:RAID-6 by TheRaven64 · · Score: 1

      RAID-Z doesn't have the write hole, so you can do it in software without needing either excessive fsync() or a battery backup. It also has the advantage of being more reliable in cases of partial failure. It's quite a common failure mode for hard disks to return sectors with errors. With RAID-5, you get a parity mismatch, but you don't know which of the drives is returning the bad data until one fails completely. With RAID-Z, the block checksum will fail on one of them, so the drive with the error is identified immediately. If you do have to rebuild, RAID-Z is aware of which bits of the drive actually contain data, so it doesn't require you to copy unused blocks, as RAID-5 implementations do.
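
      A toy illustration of that difference, as a minimal sketch only: this is not how any real RAID-5 or ZFS implementation lays out data, and the block contents, the use of SHA-256, and the helper name xor_blocks are all made up for the example.

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks plus parity, as they might sit on four drives.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Checksums kept apart from the data (hypothetical layout for illustration).
checksums = [hashlib.sha256(block).digest() for block in data]

# Drive 1 silently returns a corrupted block without reporting an error.
read_back = [data[0], b"BXBB", data[2]]

# Parity alone shows that something is wrong, but not which drive.
parity_ok = xor_blocks(read_back) == parity

# Per-block checksums pinpoint the bad block.
bad = [i for i, block in enumerate(read_back)
       if hashlib.sha256(block).digest() != checksums[i]]

# Rebuild the bad block from the surviving blocks plus parity.
survivors = [block for i, block in enumerate(read_back) if i != bad[0]]
repaired = xor_blocks(survivors + [parity])
```

      In this toy case parity_ok is False but says nothing about which drive lied, while bad comes out as [1] and the block can be rebuilt from the others.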

      That's ignoring all of the other nice features of ZFS, like atomic transactions, O(1) snapshots, deduplication, and so on.

      --
      I am TheRaven on Soylent News
    4. Re:RAID-6 by afidel · · Score: 1

      Why not? Having parity even when you've lost a disk is a good thing (you can do the same with RAIDZ2 obviously). They are going to be CPU/network latency/network bandwidth limited LONG before they are storage bandwidth limited so the double parity calculation isn't a big deal. Heck I run almost two thirds of my enterprise on vRAID6 on my EVA, other than a bad firmware update it would take a machine gun to make me lose data.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:RAID-6 by Anonymous Coward · · Score: 0

      Also stripes aren't of a fixed size under RAID-Z. That means it's not possible for there ever to be a partial stripe write, so writing performance should theoretically be faster (I haven't seen a huge difference in practice) without the write-backs.

    6. Re:RAID-6 by DamnStupidElf · · Score: 1

      copy-on-write snapshots (fast, space-efficient), automatic resilvering (scanning the disks for bit errors and automatically correcting them), inline compression, extremely large file and filesystem support, and probably others.

    7. Re:RAID-6 by hjf · · Score: 1

      Anything larger than a few terabytes I call a "serious amount". Why? Well, think how long it takes to fsck your 250GB drive. Now think how long it will take to fsck 4TB or more - some people can't wait 8 hours for an fsck to finish. ZFS doesn't have fsck - the filesystem is *always* consistent.

      The FreeBSD version is a port, ZFS was designed with Solaris in mind. Unices look the same at the command line, but it's the inner workings that make the difference. So I'd rather stick with Solaris, which is what ZFS is developed and tested to work in.

      Performance-wise, Solaris wins hands down. In my tests I consistently got 240MB/s reads in Solaris, while Linux was spotted at 160MB/s. No idea why.
      Curiously enough, power-efficiency-wise Solaris beats Linux too. My UPS reported a 14% idle load with Linux vs 11% with Solaris. Both had power management enabled, but Linux also had tickless and all sorts of funky PM stuff. FreeBSD's power management is next to nonexistent.

      If you're building a ZFS storage box, I strongly recommend you get an "HP Microserver". $300 gets you a little box with 4 disk trays, 2 extra SATA ports (need hacked BIOS to enable AHCI for them), and an Athlon II Neo processor. The power consumption of the thing is less than 50W, and it's SILENT. Gigabit Ethernet, VGA port, and 6 USB. It comes with a shitty 1GB RAM (but... ECC RAM!), but it has 2 DIMMs. I just added another 4GB ECC (Kingston generic, KVR1333D3E5), but you can discard the 1GB stick. It comes with a tiny 160GB drive - I just recycled for an old desktop. I run 4x1TB WD Green HDDs. By all means, get WD Greens. They're "slow" (not slower than your wifi) but they run cool and don't waste power.

      Careful with 4k sector drives (WDxxxEARS): I don't know about FreeBSD/ZFS, but Solaris aligns to 512 bytes and performance sucks. You will need to google "ashift" to see the workaround. Also, I'd tell you to get 4 disks instead of 3. You can't add more drives to a vdev, and a ZFS storage pool consists of one or more vdevs. Your ZFS pool will consist of 1 vdev of 3x2TB drives. If you want to add a 4th drive, your pool will be 3x2TB + 2TB. No redundancy on that one. You can't extend to 4x2TB from 3x2TB. Since most mobos have *at least* 4 drive connectors, use em or lose em.
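
      To make the 3-versus-4 disk point concrete, here is a rough model, not a ZFS tool: the Vdev class and helper names are hypothetical, capacity is simply (drives - parity) x size, and metadata overhead is ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vdev:
    drives: int
    drive_tb: int
    parity: int  # 0 = plain disk, 1 = RAID-Z, 2 = RAID-Z2

    @property
    def usable_tb(self) -> int:
        # Simplified: parity drives' worth of space is lost, nothing else.
        return (self.drives - self.parity) * self.drive_tb

    @property
    def survives_one_failure(self) -> bool:
        return self.parity >= 1

def pool_usable_tb(vdevs):
    return sum(v.usable_tb for v in vdevs)

def pool_redundant(vdevs):
    # A pool stripes across its vdevs, so losing any one vdev loses the pool.
    return all(v.survives_one_failure for v in vdevs)

planned = [Vdev(3, 2, 1)]                 # 3x2TB RAID-Z
grown_later = planned + [Vdev(1, 2, 0)]   # a 4th disk bolted on as its own vdev
built_with_four = [Vdev(4, 2, 1)]         # 4x2TB RAID-Z from the start

for name, pool in [("planned", planned), ("grown later", grown_later),
                   ("built with four", built_with_four)]:
    print(name, pool_usable_tb(pool), "TB usable, redundant:", pool_redundant(pool))
```

      Both four-disk layouts give 6TB in this model, but only the one built as a single RAID-Z vdev survives a drive failure.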

      Ah yes, never make the "OS pool" the data pool. ALWAYS run the OS from a separate drive. If something goes wrong, at least you can mount the data pool in another machine (or a fresh OS install).

    8. Re:RAID-6 by hjf · · Score: 1

      Yes. The most important one: manageability.

      You manage zfs with the "zpool" and "zfs" commands. The filesystem and software RAID are one, not md+ext4. If you replace all your drives with larger drives, zfs automagically grows to use all the space. The pool is portable - you can import it to any system without problems.

  13. Anything over 2TB should be ZFS... by QuietLagoon · · Score: 2
    ... if you really care about the data. ZFS has many more built-in data integrity checks, and more extensive ones, than vanilla RAID6 arrays.

    .
    Both FreeBSD and FreeNAS, in addition to OpenSolaris, support ZFS.

    1. Re:Anything over 2TB should be ZFS... by Anonymous Coward · · Score: 0

      Mod parent up. Using JFS instead of ZFS is the biggest mistake for this build.

    2. Re:Anything over 2TB should be ZFS... by brianwski · · Score: 4, Interesting

      ... if you really care about the data.

      (Disclaimer: I work at Backblaze) - If you really care about data, you *MUST* have end-to-end application level data integrity checks (it isn't just the hard drives that lose data!).

      Let's make this perfectly clear: Backblaze checksums EVERYTHING on an end-to-end basis (mostly we use SHA-1). This is so important I cannot stress this highly enough, each and every file and portion of file we store has our own checksum on the end, and we use this all over the place. For example, we pass over the data every week or so reading it, recalculating the checksums, and if a single bit has been thrown we heal it up either from our own copies of the data or ask the client to re-transmit that file or part of that file.

      At the large amount of data we store, our checksums catch errors at EVERY level - RAM, hard drive, network transmission, everywhere. My guess is that consumers just do not notice when a single bit in one of their JPEG photos has been flipped -> one pixel gets ever so slightly more red or something. Only one photo changes out of their collection of thousands. But at our crazy numbers of files stored we see it (and fix it) daily.
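
      For anyone wondering what application-level checksumming looks like in the small, here is a minimal sketch. It is NOT Backblaze's actual format or code: the store/verify/scrub helpers and the chunk names are hypothetical, and the only thing taken from the comment is the idea of a SHA-1 stored on the end of each piece and re-checked periodically.

```python
import hashlib

SHA1_LEN = 20  # a SHA-1 digest is 20 bytes

def store(blob: bytes) -> bytes:
    """Append a SHA-1 digest to the payload before it is written anywhere."""
    return blob + hashlib.sha1(blob).digest()

def verify(stored: bytes) -> bool:
    """Recompute the digest and compare it with the stored trailer."""
    payload, trailer = stored[:-SHA1_LEN], stored[-SHA1_LEN:]
    return hashlib.sha1(payload).digest() == trailer

def scrub(chunks: dict) -> list:
    """Return the names of chunks that need healing from another copy."""
    return sorted(name for name, stored in chunks.items() if not verify(stored))

chunks = {
    "photo.jpg/0": store(b"first part"),
    "photo.jpg/1": store(b"second part"),
}

# Flip a single bit in one stored chunk to simulate silent corruption.
damaged = bytearray(chunks["photo.jpg/1"])
damaged[0] ^= 0x01
chunks["photo.jpg/1"] = bytes(damaged)

needs_repair = scrub(chunks)
print(needs_repair)
```

      The scrub pass flags only the chunk with the flipped bit, which is the point at which a real system would fetch a good copy or ask the client to re-send.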

    3. Re:Anything over 2TB should be ZFS... by rubycodez · · Score: 1

      Except OpenSolaris is dead, better to keep data on a filesystem that runs on living OS. FreeNAS is FreeBSD based, so we're down to one open source OS that supports ZFS.

    4. Re:Anything over 2TB should be ZFS... by brianwski · · Score: 2

      Using JFS instead of ZFS is the biggest mistake for this build.

      (Disclaimer: I work at Backblaze) - We no longer deploy new pods with JFS, but over half our fleet of 200 pods are running JFS and we are perfectly happy with it. We worked through a couple bugs related to large volumes, but after that our main reason for using EXT4 going forward is that in our application EXT4 is measurably faster than JFS, and it is reassuring to be on a filesystem that is used by more people so it (hopefully) has more bugs fixed, etc.

      Earlier we were totally interested in ZFS, as it would replace RAID & LVM as well (and ZFS gets great reviews). But (to my understanding) native ZFS is not available on Linux and we're not really looking to switch to OpenSolaris.

      ANOTHER option down this line of thinking is switching to btrfs, but we haven't played with it yet.

    5. Re:Anything over 2TB should be ZFS... by F.Ultra · · Score: 1

      +1 I'm always surprised at the number of ZFS zealots who reason as if ZFS were a network filesystem. If ZFS were used on these nodes there would still be no real end-to-end protection, and yet the ZFS zealots go on and on about how important that end-to-end protection is :-)

    6. Re:Anything over 2TB should be ZFS... by PingPongBoy · · Score: 1

      But at our crazy numbers of files stored we see it (and fix it) daily.

      I do a lot of MD5 checks on my files. But I have yet to encounter a file that has gone bad on a hard drive over time while the rest of the drive stays good. If the file was written badly (rare but it happens), then the MD5 will be wrong. I have huge multigigabyte files and of course many smaller.

      As a rule I try to buy hard disks with more than one year of warranty. They used to be 5 years but a lot of them are now 1 year and 2 year. The 1 year models tended to be flaky, and the 2 year models tend to be quite reliable even with a lot of usage.

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
    7. Re:Anything over 2TB should be ZFS... by lopgok · · Score: 1

      You say you checksum EVERYTHING. I wonder what type of intel i3 processor supports ECC? From what I know, only intel xeon processors support ECC. Of course all amd processors support ECC and many cheap amd compatible motherboards do also. I bought a 3 core amd processor, a motherboard and 4gb of ECC ram for under $200, over a year ago. How reliable can a huge disk array be without ECC memory for the cpu?

    8. Re:Anything over 2TB should be ZFS... by Anonymous Coward · · Score: 0

      ... if you really care about the data.

      (Disclaimer: I work at Backblaze) - If you really care about data, you *MUST* have end-to-end application level data integrity checks (it isn't just the hard drives that lose data!).

      Let's make this perfectly clear: Backblaze checksums EVERYTHING on an end-to-end basis (mostly we use SHA-1). This is so important I cannot stress this highly enough, each and every file and portion of file we store has our own checksum on the end, and we use this all over the place. For example, we pass over the data every week or so reading it, recalculating the checksums, and if a single bit has been thrown we heal it up either from our own copies of the data or ask the client to re-transmit that file or part of that file.

      At the large amount of data we store, our checksums catch errors at EVERY level - RAM, hard drive, network transmission, everywhere. My guess is that consumers just do not notice when a single bit in one of their JPEG photos has been flipped -> one pixel gets ever so slightly more red or something. Only one photo changes out of their collection of thousands. But at our crazy numbers of files stored we see it (and fix it) daily.

      You sure do! Working for marketing, are you?

    9. Re:Anything over 2TB should be ZFS... by Anonymous Coward · · Score: 0

      Just curious if you are also using data de-duplication?

      Also might want to step up the advertising and internet presence, first I've heard of BackBlaze despite looking at remote backup systems for awhile.

      HEX

    10. Re:Anything over 2TB should be ZFS... by guruevi · · Score: 1

      We're back to Solaris Express which seems to be maintained by Oracle and spinoff Nexenta which also has a free version.

      ZFS does very well but it is a file system, not an application nor a distributed or networked file system (although it can be set up that way). I know applications that run end-to-end data integrity on top of ZFS because ZFS can maintain local integrity while the application keeps global integrity.

      Usually those applications are very slow to store, retrieve and repair data but sometimes it is necessary or we can deal with it if the bottleneck to the consumer is not in the storage solution (eg. on the consumer Internet your link becomes the bottleneck). For users requiring high bandwidth storage over eg. NFS, an end-to-end application would just delay everything unnecessarily.

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    11. Re:Anything over 2TB should be ZFS... by Anonymous Coward · · Score: 0

      Could Nexenta be shoehorned into it? In theory the SATA multiplier issue is fixed...

    12. Re:Anything over 2TB should be ZFS... by Elshar · · Score: 1

      Just out of curiosity, what would happen if a bit had changed over at BB's end? Would it then sync the BB's version of the file over to the client? Also, what about if both ends change at the same time (Highly unlikely, I know). Which side's version would be preferred?

  14. file system by roman_mir · · Score: 2

    When you choose which file system to use, you should consider what the purpose of the storage is. If it's to run a database, you may want to rethink the decision to go with a journaling file system, because databases often do their own journaling (like the PostgreSQL WAL), which means performance will be reduced if you put a journaling file system underneath that. Just my 0.0003 grams of gold.

    1. Re:file system by Anonymous Coward · · Score: 0

      Yo dawg, I heard you like journaling...

    2. Re:file system by QuantumRiff · · Score: 1

      They don't run databases on this storage. The ONLY way they access all this storage is via an HTTPS connection to the tomcat server running on the machine. They have some very, very interesting blog entries about how things scale when you go beyond a handful of servers.

      --

      What are we going to do tonight Brain?
    3. Re:file system by roman_mir · · Score: 1

      That's not why I wrote the comment, I saw that the access is over http, I wrote it because this story is an ad for this company, but also it's talking about building a system like that for your own use, and if you do it for your own use, why would you do http only?

    4. Re:file system by Anonymous Coward · · Score: 0

      I'm not sure it's wise to follow your advice.

      A filesystem journal protects against meta data corruption (two different files aren't corrupted and inter-mangled).
      A database journal ensures that writes within the file are either complete or not performed at all.

      Most databases have pre-allocated files, and the amount of meta-data journaling they create is fairly low. You want to have both journals in place so that you can recover fast in a crash.

    5. Re:file system by Anonymous Coward · · Score: 0

      Just my 0.0003 grams of gold.

      You mean your one and a half cent?

    6. Re:file system by roman_mir · · Score: 1

      you have to learn to count better than that.

  15. Feh. by toonces33 · · Score: 1

    It really won't cost that much because you can sell your furnace.

  16. 7K for software raid? and why a low end cpu? by Joe_Dragon · · Score: 1

    Why not use a SAS card?
    Why have three PCIe cards that are only x1 when an x4 or better card with more ports has more PCIe bandwidth, and some even have their own RAID CPU on them?

    Why use a low end I3 cpu in a 7K system? at least go to i5 even more so with software raid.

    1. Re:7K for software raid? and why a low end cpu? by drinkypoo · · Score: 4, Informative

      Hardware RAID controllers are stupid in this context. The only place they make sense is in a workstation, where you want your CPU for doing work, and if the controller dies you restore from backups or just reinstall. Using software RAID means never having to try to get a rebuilder software to convert the RAID from one format to another because the old controller isn't available any more, or because you can't get one when you really need one to get that project data out so you can ship and bill.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:7K for software raid? and why a low end cpu? by gman003 · · Score: 3, Insightful

      Because, for this project, raw storage capacity is much more important than performance. Besides, they claim their main bottleneck is the gigabit Ethernet interface - even software RAID, the PCIe x1, and the raw drive performance is less of a limiting factor.

      Yeah, in a situation where you need high I/O performance, this design would be less than ideal. But they don't - they're providing backup storage. They don't need heavy write performance, they don't need heavy read performance. They just need to put a lot of data on a disk and not break anything.

      PS: SAS doesn't really provide much better performance than SATA, and it's a lot more expensive. Same for hardware RAID - using those would easily octuple the cost of the entire system.

    3. Re:7K for software raid? and why a low end cpu? by gweihir · · Score: 1

      Very simple: Best bang for the buck. Your approach just increases cost without any real benefit in the target usage scenario. For example, the i5 is just a waste of money and energy. Hardware RAID drives up cost, but the only "advantage" it has is that it is easier to use for clueless people.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    4. Re:7K for software raid? and why a low end cpu? by pz · · Score: 3, Insightful

      No. Hardware controllers are the right solution in this context. These pods are not designed for individual users, but for corporations that can afford stockpiles of spare parts, so replacing a board can be done easily. Using hardware controllers allows many more drives per box, and thus per CPU. A populated 6-CPU motherboard is going to be less reliable, dissipate more heat, and require more memory than the special-purpose hardware approach that allows for a single CPU.

      Software RAID makes sense when you have a balance of storage bandwidth requirements to CPU capacity that is heavy on the CPU side. This box is designed for the opposite scenario, as the highly informative blog describes:

      http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

      (Yes, I know, expecting someone to read the blog would mean that they would have to read the linked article and then click through to the original post, a veritable impossibility. Still, it is recommended reading, especially the part about their experience with failure rates and how they have *one* guy replacing failed drives *one* day per week.)

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    5. Re:7K for software raid? and why a low end cpu? by Anonymous Coward · · Score: 0

      It's funny though... any large RAID array has other local bandwidth needs such as scrubbing the array or synchronizing a new disk. I wouldn't want an array that can only keep up with application requests.

      Also, if I were using these for backups, I'd be running rsync on each one, so they need a lot more disk speed than network speed on any typical backup scenario (doing the differential transfer causes full read on the source site and potentially full read plus full write on the destination, even if only a tiny fraction is actually transferred).

      Finally, I'd be adding more than one gigabit ethernet port to a storage device in an enterprise setting, if not sticking it into the backbone with a 10 gigabit ethernet NIC, which can be had for under $1k these days...

    6. Re:7K for software raid? and why a low end cpu? by greed · · Score: 1

      SAS does have better enclosure services support, and the SAS port expanders technology seems to be a lot more robust than SATA. (Part of the problem with SATA is, you're allowed "dumb" and "smart" port multipliers. And it's really hard to find out what is what from the label; you basically have to hope there's a photo on NewEgg where you can read the numbers off the chips.)

      SAS isn't guaranteed expensive. If you can find a non-RAID SAS controller, they're comparable to an equivalent SATA controller, but of course can drive SAS expanders, SAS and SATA drives.

      It's just very hard to find non-RAID SAS controllers. I've got the Supermicro AOC-SASLP-MV8 one on a bunch of machines; that sucker will let me push 700 MB/s to bog-standard Seagate SATA drives in a stripe-set. (That works out to the maximum sustained write speed on all 6 drives in the set; you can see performance drop off as you get closer to the disk hubs.) Yours for a measly $120CDN or so; fanout cables extra.

      That aside, the Backblaze pod approach--right down to the SiL chipset on the SATA controllers--is exactly how I've got my home media fileserver set up. Only, being a cheap bastard, I got a bunch of port-multiplying eSATA boxes on sale and didn't have to get the custom chassis. I can play back multiple Blu-Ray .m2ts files at once, what more do I need?

    7. Re:7K for software raid? and why a low end cpu? by nabsltd · · Score: 1

      Besides, they claim their main bottleneck is the gigabit Ethernet interface - even software RAID, the PCIe x1, and the raw drive performance is less of a limiting factor.

      This is absolutely true. Even with a pair of bonded 1Gb Ethernet connections, it's not nearly enough to keep up with a PCIe x1 in the real world. I'm moving to a single 10Gb connection from each server to iSCSI SAN because of this.

    8. Re:7K for software raid? and why a low end cpu? by gman003 · · Score: 1

      Well, let's do some math now. Two Gigabit Ethernet ports gives you... two gigabits of data. That's equal to a single PCIe 1.0 link, or half the bandwidth of a PCIe 2.0 link. Those are x1 links, I remind you - the smallest and slowest PCIe gets. That's also just a bit under the 2400 Mbit/s of SATA 2 (or SAS, if you like). And very few hard drives can sustain 2400 Mbit/s of read - even a 10krpm drive will barely reach half of that. So even accounting for system-generated traffic (RAID maintenance, etc.), you've got more than enough bandwidth internally for the network to be the bottleneck.
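
      The same arithmetic spelled out, as a back-of-the-envelope sketch. The line rates and 8b/10b encoding are the standard published ones; treating the three SATA cards as PCIe 1.0 x1 and the pod as having a single gigabit port are assumptions made here for a worst case, not something confirmed by the blog.

```python
def payload_mbit(line_rate_gbit: float) -> float:
    """Usable Mbit/s on a link with 8b/10b coding (8 payload bits per 10 line bits)."""
    return line_rate_gbit * 1000 * 8 / 10

links = {
    "2 x Gigabit Ethernet": 2 * 1000.0,   # figure used in the comment above
    "PCIe 1.0 x1": payload_mbit(2.5),     # 2.5 GT/s per lane
    "PCIe 2.0 x1": payload_mbit(5.0),     # 5 GT/s per lane
    "SATA 2": payload_mbit(3.0),          # 3 Gbit/s line rate
}

for name, mbit in links.items():
    print(f"{name:22s} {mbit:6.0f} Mbit/s")

# Assumption: three x1 controller cards at PCIe 1.0 speed, one gigabit port.
internal = 3 * links["PCIe 1.0 x1"]
network = 1000.0
headroom = internal / network
print(f"internal/network headroom: {headroom:.0f}x")
```

      Even under those pessimistic assumptions the controllers can move about six times what a single gigabit port can carry.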

      And you seem to have missed the point - you don't even NEED high performance. They're offsite backups - why do you need high I/O performance for something that gets written to once a week or so (daily, at most), and gets accessed even less frequently. Taking your "it could be performing better so why don't they ___" logic to its absurd extremes, they should be stuffing them with SSDs (gotta get that seek time down!), and it should all be on 40GBASE-X Ethernet, if not a nice thick Infiniband. Oh, and they absolutely need quadruple Xeons in each, otherwise you'd be bottlenecking that poor processor, and we can't have that.

    9. Re:7K for software raid? and why a low end cpu? by F.Ultra · · Score: 1

      No, Hardware RAID is completely wrong in this context because their usage is not CPU bound so they have CPU power left to spend on things like RAID. Software RAID does not require much CPU power anyways. And where did you get the 6-CPU motherboard from? The pods run with a single i5 CPU!

      Hardware RAID has several problems, first they tend to change on disk format from revision to revision and also to get a Hardware RAID with the performance and functionality of the Linux MD device you have to purchase some enterprise level RAID which does not come cheap.

    10. Re:7K for software raid? and why a low end cpu? by Anonymous Coward · · Score: 0

      Because, for this project, raw storage capacity is much more important than performance. Besides, they claim their main bottleneck is the gigabit Ethernet interface - even software RAID, the PCIe x1, and the raw drive performance is less of a limiting factor.

      10 gig ethernet isn't that expensive. And many gigabit cards will easily bond two channels together for more speed.

    11. Re:7K for software raid? and why a low end cpu? by Anonymous Coward · · Score: 0

      Hardware controllers are absolutely essential to get this density, but hardware RAID is not.

      For my workplace I spent a year researching options, planning, and then building and testing a test-case server based upon ZFS. It went well, so now we're migrating to a solution very similar to their "pods" but with BSD, ZFS, and hardware controllers as JBOD. Here are some of our observations:

      Hardware controllers are cheap and make your life much easier. Even x4 slots are simple to get and cheap to populate, and if you run the numbers they really are required (even for SATA 3.0)

      AMD was our only real option (at the time) due to ECC support. We're currently using consumer 6-core chips, but I can barely make a dent in CPU usage despite using cpu-heavy compression by default on all my pools. I probably could have bought low-end CPUs and still have been fine.

      The biggest bottleneck will likely be your protocols (cifs, nfs, iSCSI) unless you're careful. The little quirks of end-user software that go unnoticed can easily become a big problem when piped over the network to a remote server.

      The second biggest bottleneck is the network, but only -most- of the time, and this is an area where you have a lot of flexibility for future changes. It's true that I can max out a 1Gb/s connection with just a few clients, but if you plan a 20-drive 4U around that (with controllers etc), you'll get a nasty surprise when you suddenly need a fast internal transfer: you don't want to wait when moving data between pools or when rebuilding from lost disks.

      The complexity of everything grows exponentially when you outgrow a single chassis. Backblaze doesn't care much about the things discussed here (ext4, zfs) because they are working with distributed filesystems and a higher level of management.

  17. Backblaze is speaking about scalability in SF by Jim+Ethanol · · Score: 3, Informative

    If you're in the SF Bay Area check out http://geeksessions.com/ where Gleb Budman from Backblaze will be speaking about the Storage Pod and their approach to Network & Infrastructure scalability along with engineers from Zynga, Yahoo!, and Boundary. This event will also have a live stream on geeksessions.com.

    Full Disclosure: This is my event.

    50% discount to the event (about $8 bucks and free beer) for the Slashdot crowd here: http://gs22.eventbrite.com/?discount=slashdot

    1. Re:Backblaze is speaking about scalability in SF by gpuk · · Score: 2

      Hi Jim

      I'm quite a few timezones East of you, meaning the live stream will start at 0300 local on Wednesday for me. I'm willing to tough it out and stay up to watch it if necessary but it would be much more civilised if I could watch a playback. Will it be available for download later or is it live only?

      It sucks I've only just learnt about geeksessions :( Some of your earlier events look awesome

    2. Re:Backblaze is speaking about scalability in SF by Anonymous Coward · · Score: 0

      Are we allowed to throw tomatoes, eggs, or dead babies at the Zynga reps? You'd get a lot more attendees, I guarantee it.

    3. Re:Backblaze is speaking about scalability in SF by Anonymous Coward · · Score: 0

      Thanks for the heads up. I just purchased a ticket.

    4. Re:Backblaze is speaking about scalability in SF by Anonymous Coward · · Score: 0

      Check http://geeksessions.com after Tuesday. We will have the Networking and Infrastructure Scalability video archived on Justin.TV and possibly other platforms.

  18. Original blog post by Baloroth · · Score: 5, Informative

    Here is a link to Backblaze's actual blog entry for the new 135TB pods, and here is the one for the original 67TB pods. The blog article is actually quite fascinating. Apparently they are employee owned, use entirely off-the-shelf parts (except for the case, looks like), and recommend Hitachi drives (Deskstar 5K3000 HDS5C3030ALA630) as having the lowest failure rate of any manufacturer (less than 1% they say).

    I found it kinda amusing that ext4's 16TB volume limit was an "issue" for them. Not because it's surprising, but because... well, it's 16TB. The whole blog post is actually recommended reading for anyone looking to build their own data pods like this. It really does a good job showing their personal experience in the field and problems/not problems they have. For instance: apparently heat isn't an issue, as 2 fans are able to keep an entire pod within the recommended temperature (although they actually use 6). It'll be interesting to see what happens as some of their pods get older, as I suspect that their failure rate will get pretty high fairly soon (their oldest drives are currently 4 years old, I expect when they hit 5-6 years failures will start becoming much more common.) All in all, pretty cool. Oh, and it shows how much Amazon/Dell price gouges, but that shouldn't really shock anyone. Except the amount. A petabyte for three years is $94,000 with Backblaze, and $2,466,000 with Amazon.
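
    Putting those two quoted figures side by side, as a quick sketch: the dollar amounts are the ones above, while the 1 PB = 1000 TB and 36-month conversions are assumptions added here to get a per-terabyte number.

```python
# Cost of one petabyte for three years, as quoted in the comment.
backblaze_usd = 94_000
amazon_usd = 2_466_000

# Assumptions for the unit conversion: 1 PB = 1000 TB, three years = 36 months.
tb, months = 1000, 36

ratio = amazon_usd / backblaze_usd
backblaze_per_tb_month = backblaze_usd / (tb * months)
amazon_per_tb_month = amazon_usd / (tb * months)

print(f"Amazon / Backblaze: {ratio:.1f}x")
print(f"Backblaze: ${backblaze_per_tb_month:.2f} per TB-month")
print(f"Amazon:    ${amazon_per_tb_month:.2f} per TB-month")
```

    That works out to a bit over 26x, which says nothing about what each price includes, only how far apart the two numbers are.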

    P.S. I suspect they use ext4 over ZFS because ZFS, despite the built in data checks, isn't mature enough for them yet. They mention they used to use JFS before switching to ext4, so I suspect they have done some pretty extensive checking on this.

    --
    "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    1. Re:Original blog post by cgfsd · · Score: 1

      This reminds me of the Sun 4500 which held 48 drives. Based off an AMD processor running Solaris x86 and ZFS.

      The overall concept is great, but in practice replacing bad drives was a pain.

      When I asked the Sun rep about replacing the drives, he said about once a year or when you get about half a dozen drives failed, power down the system, pull it out of the rack and replace the failed drives.

      Would I store critical data on something like this, hell no. You get what you pay for.

    2. Re:Original blog post by Baloroth · · Score: 1

      They mention that they have one guy dedicated to building new pods and replacing old drives. Out of ~9000 drives and ~200 pods, they replace ~10 drives per week, and with the RAID6 data redundancy the chance of losing data is absolutely minimal. RAID6 uses 2 drives for data parity, so I believe you would need 3 drives out of 45 to fail within a week to actually lose data. I suspect they would shut a pod down if 2 drives in it failed at the same time. Since the failure rate, including infant mortality, is only ~5 percent per year per drive, the chances of even that happening are pretty tiny. I'm not sure what brand of drives the Sun 4500 uses, but 6 a year sounds like a lot. I'm guessing this is considerably more reliable. All in all, because they have a person dedicated to maintaining the system on a weekly basis, this seems like it wouldn't even be all that bad for critical data. I wouldn't make it your only copy (fires/storms do happen) but it definitely seems reliable as an offsite backup.
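
      A rough check of those numbers, as a sketch under stated assumptions: drive failures are treated as independent and spread evenly over the year (real failures tend to cluster), and the "3 of 45 in one week" threshold is the simplification used above rather than a description of how the pods actually group drives into RAID6 volumes. The helper prob_at_least is made up for the example.

```python
from math import comb

# Figures quoted in the comment above.
drives = 9000
annual_failure_rate = 0.05   # ~5 percent per drive per year
drives_per_pod = 45

# Expected replacements per week across the whole fleet.
expected_per_week = drives * annual_failure_rate / 52

# Assumption: failures are independent and spread evenly over the year.
p_week = annual_failure_rate / 52

def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial probability that at least k of n drives fail."""
    below = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - below

# Chance that 3 or more of a pod's 45 drives fail within the same week.
p_three_in_pod = prob_at_least(3, drives_per_pod, p_week)

print(f"expected replacements per week: {expected_per_week:.1f}")
print(f"P(>=3 of 45 fail in one week): {p_three_in_pod:.2e}")
```

      The first figure comes out at roughly 8.7 drives a week, in line with the ~10 they report; the second is small for any single pod-week, though it would need multiplying up over every pod and every week to say anything about the fleet.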

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    3. Re:Original blog post by femtobyte · · Score: 1

      Oh, and it shows how much Amazon/ Dell price gouges, but that shouldn't really shock anyone. Except the amount. A petabyte for three years is $94,000 with Backblaze, and $2,466,000 with Amazon.

      With services like Amazon S3, you aren't paying for just the storage space but also for the (considerably more complicated and expensive) access/availability to the data. Backblaze offers an entirely different type of service: bulk backup space, that will mostly be "write once, read never" --- the data is stored reliably, but certainly not available for random access by thousands of simultaneous connections. If you're using Amazon S3 for bulk backup, then yes, you are stupid and paying way too much. But if you need hosting for data for "live" web use, available over massive amounts of globally distributed bandwidth, then S3 is a rather competitively priced product.

    4. Re:Original blog post by Zemplar · · Score: 1

      That's bogus advice. ZFS supports hot-swap and you can replace drives as they fail, if you like, but one of the biggest benefits to ZFS is that ZFS corrects corruption other filesystems can't detect.

    5. Re:Original blog post by Baloroth · · Score: 1

      True. And I'll be honest, I didn't really think of that. Also, I'm pretty sure Amazon's service is intended for use with way smaller amounts of data, in which case it becomes much more cost effective and reasonable. Still, when you could build and maintain servers with 25 times the amount of data for the same cost as Amazon, I'm inclined to say that Amazon is overcharging. It may be that everyone else in the market does too, and I acknowledge that building that kind of infrastructure and software is no mean feat. Guess I shouldn't complain: the idea of hosting a petabyte of data that can be accessed at high speeds anywhere in the world... well, that's impressive.

      --
      "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
    6. Re:Original blog post by brianwski · · Score: 1

      RAID6 uses 2 drives for data parity, so I believe you would need 3 drives out of 45 to fail within a week to actually lose data. I suspect they would shut a pod down if 2 drives in it failed at the same time.

      (Disclaimer: I work at Backblaze) We have 3 RAID groups inside each 45 drive pod, each RAID group is 15 drives. So you need 3 drive failures out of one single 15 drive group to lose data. So... when the FIRST drive fails in one 15 drive RAID group, our software automatically stops accepting any more customer data on that particular 15 drive group and the management software puts the file system sitting on top of that RAID group into read only mode. This may seem obvious in retrospect, but we found writing to drives causes them to fail or pop out of RAID arrays at more than 100 times higher frequency than just keeping them spinning and reading the information off of them. So by doing this the customers can still restore data from that pod, and we're pretty relaxed about replacing that particular drive sometime in the next few days.
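
For readers sizing their own pod, the layout described here (3 groups of 15 drives, RAID6 with two drives' worth of parity per group, 3TB drives per the summary) implies the usable capacity below. This is a sketch of the arithmetic only; the function name is made up, and filesystem and LVM overhead are ignored.

```python
def usable_tb(groups=3, drives_per_group=15, parity_drives=2, drive_tb=3):
    """Usable capacity in decimal TB for several equal RAID groups,
    before any filesystem overhead."""
    return groups * (drives_per_group - parity_drives) * drive_tb

if __name__ == "__main__":
    raw = 3 * 15 * 3
    print(f"raw: {raw} TB, usable after RAID6 parity: {usable_tb()} TB")
```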

      When a second drive subsequently fails inside a pod, pagers start going off and a Backblaze employee starts driving towards the datacenter.

      With that said, it is worth noting that multiple simultaneous drive failures in one pod are WAAAY more common than pure statistics would indicate. If a SATA card fails, it has three SATA cables plugged into it leading to three separate port multipliers and ultimately is talking with 15 hard drives. So we'll see 15 drives simultaneously drop out of the RAID arrays in one pod and it's pretty obvious what just happened. No big deal, it doesn't (necessarily) corrupt any data. I'm just mentioning you can't take the random drive failure rates of one single drive and do straight multiplication to get to pod failure rates.

    7. Re:Original blog post by hackertourist · · Score: 1

      I'm just mentioning you can't take the random drive failure rates of one single drive and do straight multiplication to get to pod failure rates.

      ISTR that drives from one batch tend to fail in clusters, so when one goes it's time to assume the rest isn't far off. Does Backblaze have any data on this?

    8. Re:Original blog post by Mysticalfruit · · Score: 1

      Also, you can do cool things like we did, which was to systematically replace our 1TB drives for 2TB and ZFS magically saw double the space without having to reboot!

      --
      Yes Francis, the world has gone crazy.
    9. Re:Original blog post by Slashdot+Parent · · Score: 1

      Still, when you could build and maintain servers with 25 times the amount of data for the same cost as Amazon, I'm inclined to say that Amazon is overcharging.

      Amazon is definitely not cheap, but I don't think 25x is an apples to apples comparison.

      For one thing, Amazon provides you with "99.999999999% durability and 99.99% availability of objects over a given year". With Backblaze's NAS (which, by the way, is really frickin' amazing), your 1PB of data is one natural disaster away from becoming 0PB of data. So double the hardware/storage/cooling cost of your Backblaze solution, because if your data are important, you need a minimum of two of them in independent geographical regions.

      Next, they don't factor in the admin cost. Obviously that isn't going to bring you to equal pricing, but you really have to add that in. Who is going to keep your NAS boxes running in multiple facilities?

      Also, they say to install FreeNAS. Are you sure you got the configuration right to achieve the advertised fault tolerance? This type of problem is tricky, and getting it wrong could mean the loss of your important data. AWS is ready for you right now, and it isn't going to lose your data.

      Lastly, AWS charges only for what you use. That's the elastic nature of the thing. So let's say you are doing a data intensive monthly processing that needs 1PB of capacity for 1 or 2 days out of the month, but for the rest of the month, you only need 100TB. All of a sudden the numbers look very different. You can't give your admin a 90% pay cut for 28 days out of the month, but your AWS bill will be only for what you use.

      So AWS gives a ton of flexibility that is financially compelling for a lot of use cases. That being said, I completely agree with you that the use case of "permanent data storage that just keeps growing over time and is hopefully never read" isn't one where AWS can give you an attractive price. You'd be much, much better off with some other solution.

      --
      They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
    10. Re:Original blog post by brianwski · · Score: 1

      We don't have any data yet. The oldest Backblaze pods contain hard drives that are not quite 4 years old, so we haven't seen any old age mortality yet.

      Here is another totally random thought: We pay $1,400 / month / cabinet in physical space rental plus electricity, which comes to about $5 / drive / month / cabinet. Even if the old (smaller) drives last forever, there will come a moment where it is just a good financial decision to copy all the data off of them onto denser drives because in some number of months the savings in physical space rental and saving electricity (assuming energy use is fixed per drive) pays for the new drives. If a new hard drive is 10 times as dense, it saves Backblaze $4.50 / drive / month in physical space and electricity rental. In 22 months it pays for the $100 replacement drive. (I did that math super quick, so let me know if I'm off by a factor of 10.)
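
On the "off by a factor of 10" question, here is a sketch of the two ways the payback could be counted. The $5 per slot per month, 10x density and $100 drive price are the figures from the comment; the function names and the two interpretations are my own reading, not Backblaze's accounting.

```python
def payback_months_per_old_drive(drive_price, slot_cost_month, density_ratio):
    """Reading 1: each old drive retired saves its slot cost minus its
    1/density_ratio share of the new drive's slot, and that saving is set
    against the full price of a new drive."""
    saving = slot_cost_month * (1 - 1 / density_ratio)
    return drive_price / saving

def payback_months_per_new_drive(drive_price, slot_cost_month, density_ratio):
    """Reading 2: one new drive retires density_ratio old drives and still
    occupies one slot itself, so it frees (density_ratio - 1) slots."""
    saving = slot_cost_month * (density_ratio - 1)
    return drive_price / saving

if __name__ == "__main__":
    print(round(payback_months_per_old_drive(100, 5, 10), 1))  # 22.2
    print(round(payback_months_per_new_drive(100, 5, 10), 1))  # 2.2
```

The 22-month figure matches the first reading; if one $100 drive really absorbs the data of ten old drives, the second reading suggests the payback is roughly ten times faster.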

    11. Re:Original blog post by BitZtream · · Score: 1

      For one thing, Amazon provides you with "99.999999999% durability and 99.99% availability of objects over a given year"

      Wrong, they ADVERTISE that.

      They've proven this year already they can not possibly maintain such a record, they will be unable to make such a claim for several years to come ... unless they just ignore previous years performance ... and if you do that, you're an idiot.

      Marketing fluff != reality. Amazon isn't anywhere near what they claim to be. Useful? Sure. Living up to their marketing? Not unless your math somehow lets you get 99.9999999% reliability when you're down for a couple of days straight.

      At 99.99% you have less than an hour of downtime at your hands per year. Amazon will take decades before than can legitimately again claim 99.99% reliability. Ignoring previous events is just fucking retarded so this 'given year' shit is just a 'we don't actually mean what we say, but if we twist the words JUST RIGHT, idiots will believe its REALLY safe regardless of actual evidence to the contrary being all over the news'

      For reference however, Amazon claims 99.95%, which is roughly 4.5 hours ... so lets see, given that they were down for over ... oh I donno some were down over 36 for sure (waving my hand being one of them) which means ... they need about 8 years of absolutely flawless service before they can legitimately claim 99.95%.
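
The downtime figures being thrown around can be reproduced with a few lines, shown here as a sketch. It assumes a 365-day year and treats an availability percentage as a simple yearly downtime budget; the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

def downtime_budget_hours(availability_pct):
    """Hours of downtime per year allowed by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

def years_to_amortize(outage_hours, availability_pct):
    """Years of otherwise flawless service needed for one outage to fit
    within the long-run average budget."""
    return outage_hours / downtime_budget_hours(availability_pct)

if __name__ == "__main__":
    print(f"99.99%: {downtime_budget_hours(99.99) * 60:.1f} minutes/year")
    print(f"99.95%: {downtime_budget_hours(99.95):.2f} hours/year")
    print(f"36 h outage at 99.95%: {years_to_amortize(36, 99.95):.1f} years")
```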

      My servers are 100% reliable over any given period that they are online and functional. Isn't marketing bullshit fun?! And actually, my servers are more reliable than Amazon's, I've never in my life had a critical server/service down for an hour that wasn't planned well in advance. I admit, some of that is luck, and I'll probably get struck by lightning (or my servers will) for saying it, but if Amazon impresses you, you should not be making IT related decisions for anyone.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    12. Re:Original blog post by Slashdot+Parent · · Score: 1

      It's a shame that Amazon uses terminology that some people find confusing. When AWS claims 99.999999999% durability, they are referring to a percentage of objects lost, not accessibility of those objects. The 99.99% availability refers to your ability to access those stored objects. I can see that you found this confusing based on sentences such as the following:

      Not unless your math somehow lets you get 99.9999999% reliability when you're down for a couple of days straight.

      Not to belabor the point, but the 99.999999999% durability figure could not be measured in "days straight". An object can be either lost or not lost. I only bring this up so you understand why I assert that you did not, in the strictest sense, comprehend AWS's durability/availability claims.

      Along those lines, the latest figure I can recall about S3 objects is that they are storing 500 billion objects. Assuming I didn't lose track of any zeroes, that would mean that AWS could lose 5 objects annually and still be able to make that claim. If you are aware that S3 has lost, or is on pace to lose, 5 objects this year, I'd appreciate a link. If not, I'm going to assume this is just your confusion talking, and does not relate to any real-world facts.
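
Since the zeroes are easy to lose track of, here is a sketch of the arithmetic: the expected number of objects lost per year is the object count times (1 - durability). It uses `decimal` so the eleven nines are not blurred by floating point; the 500 billion object count is the figure recalled in the comment, not a verified number.

```python
from decimal import Decimal

def expected_annual_loss(object_count, durability_pct):
    """Expected objects lost per year for a stated durability percentage.

    durability_pct is passed as a string so Decimal keeps every digit.
    """
    loss_fraction = (Decimal(100) - Decimal(durability_pct)) / Decimal(100)
    return loss_fraction * object_count

if __name__ == "__main__":
    # 500 billion objects at eleven nines of durability
    print(expected_annual_loss(500_000_000_000, "99.999999999"))
```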

      They've proven this year already they can not possibly maintain such a record, they will be unable to make such a claim for several years to come ... unless they just ignore previous years performance ... and if you do that, you're an idiot.

      I'm not sure why you think that you've debunked AWS's claims by including prior years. If you'll bother to read and comprehend their claim, it's "99.99% availability of objects over a given year", emphasis mine. It's certainly fair game for you to include prior years' downtimes when evaluating your own storage needs; however, they make no claim that their 99.99% includes all prior years. The sword, of course, cuts both ways. I don't think they had any downtime in 2010, but that doesn't give them the right to have 105 minutes of downtime this year.

      So that brings us to you using prior years' performance in evaluating your storage needs. As I'm sure you know, past performance does not guarantee future results. S3 launched in early 2006, and the last major S3 outage I can think of was in the summer of '08 (I think it was 6 or 8 hours or something). You'd be right to say that if you average out 99.99% tolerances over a period of the 5 years that they've been around, they can't claim 99.99% uptime, year over year. Just based on that outage, alone.

      Ignoring previous events is just fucking retarded so this 'given year' shit is just a 'we don't actually mean what we say, but if we twist the words JUST RIGHT, idiots will believe its REALLY safe regardless of actual evidence to the contrary being all over the news'

      This yields an interesting question to ponder. Let's say you are evaluating S3 against Joe's Cloud Storage and Bait and Tackle Shop. Joe's launched on 1/1/2010 and has had zero down time. S3 launched in '06 and has had several hours' downtime. Both services claim 99.99% annual uptime. Whom do you trust more?

      My servers are 100% reliable over any given period that they are online and functional. Isn't marketing bullshit fun?! And actually, my servers are more reliable than Amazon's, I've never in my life had a critical server/service down for an hour that wasn't planned well in advance. I admit, some of that is luck,

      No, most of that is comparing apples with oranges and has nothing to do with luck. S3 does not have planned downtime. Ever. You need to include your planned downtime in your uptime metrics or kindly retract your 100% uptime claim.

      if Amazon impresses you, you should not be making IT related decisions for anyone.

      Well, S3 stores 500 billion objects and has had less than 1 day of downtime spread out over

      --
      They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
  19. Yes. by Anonymous Coward · · Score: 0

    ...And actual useful snapshot capabilities. And utilities so easy to use, even your freakin' grandma can sling about storage pools.

  20. Mod parent -1, Idiot by Anonymous Coward · · Score: 0

    No. most manufacturers define the terms as 1024 bytes per kilobyte, 1000 kilobytes per megabyte, 1000 megabytes per gigabyte, and 1000 gigabytes per terabyte. Which gets really confusing sometimes - they can't even stay consistent within their own system.

    I haven't checked how Hitachi does it, but that's how Seagate and Western Digital do it. I would assume Hitachi marks them the same way.

    No, actually, you're completely wrong.

    Hitachi (click Specifications):

    Capacity - One GB is equal to one billion bytes and one TB equals 1,000GB (one trillion bytes) when referring to hard drive capacity.

    Seagate:

    When referring to hard drive capacity, one gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes.

    Western Digital (click Specifications):

    As used for storage capacity, one megabyte (MB) = one million bytes, one gigabyte (GB) = one billion bytes, and one terabyte (TB) = one trillion bytes.

    Some floppies use hybrid measurements, but hard drives have been entirely powers of ten for ages.
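
To see what the powers-of-ten definition means in practice, here is a small sketch converting vendor (decimal) terabytes into the binary tebibytes that many operating systems report. The function name is illustrative.

```python
def decimal_tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40

if __name__ == "__main__":
    for tb in (3, 135):
        print(f"{tb} TB = {decimal_tb_to_tib(tb):.2f} TiB")
```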

  21. ... but you can't use it by savanik · · Score: 2

    With the latest bandwidth caps I'm seeing on my provider (AT&T U-verse), I can download data at a rate of 250 GB per month. So it'll take me 45 YEARS to fill up that 135 TB array. Something tells me they'll have better storage solutions by then.
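
The 45-year figure checks out with decimal units (1 TB = 1000 GB), as this sketch shows; the function name is illustrative.

```python
def years_to_fill(capacity_tb, cap_gb_per_month):
    """Years needed to fill capacity_tb when limited to a monthly data cap."""
    months = capacity_tb * 1000 / cap_gb_per_month
    return months / 12

if __name__ == "__main__":
    print(years_to_fill(135, 250))  # 45.0
```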

    In the meantime, I'm just waiting for Google to roll out the high-speed internet in my locale next year - maybe then I'll have a chance at filling up my current file server.

    1. Re:... but you can't use it by GodfatherofSoul · · Score: 1

      Crazy enough, you can actually *buy* content instead of downloading it from Pirate Bay.

      --
      I swear to God...I swear to God! That is NOT how you treat your human!
    2. Re:... but you can't use it by Anonymous Coward · · Score: 1

      And, crazily enough, that gets counted against your download cap too!

      Unless you're talking about buying the physical media and ripping it yourself, in which case... congratulations! That's completely irrelevant to his complaint!

    3. Re:... but you can't use it by Anonymous Coward · · Score: 0

      Or even create it! What a concept! Start a business as a wedding photographer or videographer. If you download a few hours of raw HD DV for post production work (say, in Cinelerra), you're going to need a TON of storage.

    4. Re:... but you can't use it by Anonymous Coward · · Score: 0

      For Realz??? Source or STFU.

    5. Re:... but you can't use it by Anonymous Coward · · Score: 0

      And amazingly enough, if you *buy* digital music/movies from iTunes/Amazon/whoever, you still have to download it.

    6. Re:... but you can't use it by pz · · Score: 1

      These pods are not intended for the individual user. Your ability to saturate a home pipe without filling up 135 TB isn't relevant.

      --

      Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
    7. Re:... but you can't use it by Anonymous Coward · · Score: 0

      And I expect you'll be wanting him to read all of it off of disks instead of downloading the content he bought?
      Nice.

    8. Re:... but you can't use it by Anonymous Coward · · Score: 0

      Crazy enough, you can actually *buy* digital content instead of ripping it from the physical media.

    9. Re:... but you can't use it by Dahamma · · Score: 1

      What would you be downloading legally over your Internet connection that would require 135TB of space to store? Just curious...

      Most people I know that use huge amounts of storage like that at home either rip their DVDs/BDs or download movies from bittorrent. I have to admit I don't feel much sympathy towards those hitting a data cap due to the latter...

    10. Re:... but you can't use it by CCarrot · · Score: 1

      Crazy enough, you can actually *buy* digital content instead of ripping it from the physical media.

      Yeah...along with a whackload of crap DRM* as a free bonus!

      Extra Special Price, Buy Now!!! Then Buy Again Later When Your Hardware Changes!! And Again!!!

      *(other than mp3's, they finally got that right)

      --
      "I love animals! Some are cute, others are tasty, what's not to like?" - Betsy Schroeder, Jeopardy contestant
    11. Re:... but you can't use it by houstonbofh · · Score: 1

      With the latest bandwidth caps I'm seeing on my provider (AT&T U-verse), I can download data at a rate of 250 GB per month. So it'll take me 45 YEARS to fill up that 135 TB array.

      LAN Party... :)

    12. Re:... but you can't use it by BitZtream · · Score: 1

      Wise men learned long ago that when sharing large amounts of files ... you mail a hard drive to someone.

      My friends and I have a large drive that rotates between a few of us. When you get the drive, you pull all the new stuff off of it, and put all of your new stuff on it, then ship the drive to the next person. Continue. Works even better when several of the people on the list all work together so there isn't any mail time between updates.

      We could easily fill that array up in a couple of weeks with no Internet connection between all of us.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    13. Re:... but you can't use it by BitZtream · · Score: 1

      I wouldn't.

      I do know some companies with projects whose design documents and related materials easily exceed terabytes per project.

      Think about a project like the space shuttle, an Airbus 380, or a big ass Boeing jumbo jet.

      You and I aren't going to be accessing those projects (well, I'm not, maybe you get lucky enough to work with some of those guys) but there are plenty of engineers who would have a reason to do so. I've seen far larger stores than this at a company which designs large aircraft for that very purpose.

      --
      Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
    14. Re:... but you can't use it by Dahamma · · Score: 1

      I guarantee you that those engineers are not downloading terabytes of highly confidential design documents over their residential cable Internet connection to a giant NAS in their home, even with a VPN. And if they do, it probably doesn't fall under any definition of "legally"...

      But even assuming some company is stupid enough to allow that, that's still 0.0000001% of the population, and if it's required by their job they will be able to expense any overages anyway.

  22. Meh by Anonymous Coward · · Score: 0

    Not really that useful for any data that needs to be accessible 100% of the time. The drives do not look to be hot swappable and there is no redundancy anywhere in the design.
    Even with all those RAID groups, with a single processor read/write times are going to be hideous. Also, not knowing about the software, your volumes/aggregates may be limited to a single RAID group, which limits the usefulness.

    Yeah, it's a cheap solution, but its usefulness in a production or backup environment is limited. There are storage providers out there that have systems with price points not much higher than this that aren't as unreliable.

    1. Re:Meh by pnutjam · · Score: 1

      The redundancy is unit based, not component based. This makes a lot of sense, it's what Google does. You don't have to go for expensive proprietary parts, you just buy two commodity parts (or more).

    2. Re:Meh by fuzzyfuzzyfungus · · Score: 1

      I think that you are looking for redundancy at too small a scale: Yes, per-box, there is very little redundancy. RAID-6 makes it not completely useless; but a PSU going out will take out half the box, which will render it pretty useless until the PSU comes back online, and if the mobo dies, game over.

      However, as the pictures suggested, they are running rather a lot of these boxes. Their (proprietary) software layer handles storing data across all the boxes and presenting it in some useful-to-the-backblaze-client way over the internet. An OSS analog would be something like Tahoe-FS treating each storage box as a backend server. In that scenario, you can, depending on the desired tradeoff between cost and risk, allow one or more entire servers to fail without compromising the overall logical filesystem...

    3. Re:Meh by brianwski · · Score: 1

      The drives do not look to be hot swappable

      (Disclaimer: I work at Backblaze) All SATA drives are inherently hot swappable, including the ones in the Backblaze pod. We have tried it, it worked the few times we did it. But for normal operations, we shut the pod down completely to swap drives. The first reason is that because the pods are stacked on top of each other and the drives are replaced from the top, we have to slide the pod out half way out of the rack like a drawer. It feels kinda wrong to slide servers around like that while the drives are spinning, so we avoid it (I have no proof it actually causes significant problems). Another reason is that with the top of the pod open, the cooling airflow isn't the same and some of the drives in the center start rising in temperature. This isn't fatal, but it puts you on a "timer" where you want to get the hot swap done within a reasonable amount of time (like 5 minutes) and get the pod closed back up again. Finally, it just seems safer to let the machine come up cleanly with the drive replaced. For our application it doesn't matter at all, no customer can possibly know or care if one, two, or ten pods are offline during a reboot.

  23. Engineering competence does give an edge by gweihir · · Score: 1

    I did something a bit similar on a smaller scale about 9 years ago. (Linux software RAID, 12 disks in a cheap server). The trick is to make sure that you pay something like 70% of the total hardware cost for the disks. It is possible, it can be done reliably, but you have to know what you are doing. If you are not a competent and enterprising engineer, forget it (or become one). But the largest cost driver in storage is that people want to buy storage pre-configured and in a box that they do not need to understand. This is not only very expensive, (when I researched this 9 years ago, disk part of total price was sometimes as low as 15%!), but gives you lower performance and lower reliability. And also less flexibility.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    1. Re:Engineering competence does give an edge by Walker1337 · · Score: 2

      But the largest cost driver in storage is that people want to buy storage pre-configured and in a box that they do not need to understand. This is not only very expensive, (when I researched this 9 years ago, disk part of total price was sometimes as low as 15%!), but gives you lower performance and lower reliability. And also less flexibility.

      You ain't kidding. I have installed systems for people that cost hundreds of thousands of dollars and they can't even give me basic information in order to complete the install. How many disks to each head? No idea. How big do you want your RAID groups? No idea. Excuse me sir, this IP and gateway are in different subnets, can I have another? That last one has actually happened more than once.

  24. ME WANT!! by Anonymous Coward · · Score: 0

    I can't imagine who has a need for such a ridiculous amount of storage, but nevertheless...

    ME WANT!

    After all, "640K ought to be enough for anybody"...okay, he was talking about memory, but still...

    *Sigh* (goes back to tinkering with 3 TB RAID array/server)

    1. Re:ME WANT!! by maxwell+demon · · Score: 1

      Yes, 640K disks with 640 Terabytes each ought to be enough for anybody. :-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
  25. SAN? by Anonymous Coward · · Score: 0

    My place is in the market for a new SAN device, so it was very interesting to see this post today. What kind of changes would people suggest in order to make this sort of thing perform better (and more reliably) as a SAN device instead of just backup storage?

    1. Re:SAN? by pnutjam · · Score: 1

      you could use openfiler, but you would want to swap some of your disk space for network controllers.

    2. Re:SAN? by Savantissimo · · Score: 1

      Number 1 thing would be more / bigger network links. I think this has used all its PCIe slots, so you might have to cut the capacity by 1/3 to put in a bigger network card - say a 4x Gigabit Ethernet card (cheap) or a 10gig card (more expensive, plus need the 10gig port to hook it to). Or get a motherboard with more slots.

      More speculatively:
      More RAM might help if you can set it up to cache the right things. Faster drives would help the IOPS (lower latency) but the bandwidth bottleneck is going to be the network. You'll likely want two or three of these boxes for redundancy and backup, too. (Plus spares of everything.) Or maybe a big tape loader, but at this scale I think whole-server redundancy is a lot less trouble. (Your backup will weigh ~145lbs, though.)

      ZFS is likely a more reliable way to go than RAID, though some think it's too new. If you add some SSDs, you can get better system response, too:
      "ZFS also supports both read and write caching, for which special devices can be used. Solid State Devices can be used for the L2ARC, or Level 2 adaptive replacement cache, speeding up read operations, while NVRAM buffered SLC memory can be boosted with supercapacitors to implement a fast, non-volatile write cache, improving synchronous writes. Finally, when mirroring, block devices can be grouped according to physical chassis, so that the filesystem can continue in the case of the failure of an entire chassis."- http://en.wikipedia.org/wiki/ZFS

      With a lot of RAM, mostly set up as a disk, a UPS, (and a shutdown script running on line power loss, of course) the write caching could likely be implemented more inexpensively than the SLC solution. It isn't that big a perceived performance increase on most systems, though, and adds some risk. Regular MLC SSD read caching with L2ARC can make the system far more responsive at very little cost, and with no reliability concerns if ZFS is set up correctly.

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    3. Re:SAN? by Savantissimo · · Score: 1

      See: http://bigip-blogs-adc.oracle.com/brendan/entry/test for more about ARC, L2ARC, using SSDs with ZFS. With 128GB RAM, 550GB of SSDs, and 18TB of disk, the speedup was 8.4x over just the RAM and disks, with 20x less latency. YMMV with different workloads.

      --
      "Is life so dear, or peace so sweet, as to be purchased at the price of chains and slavery?" - Patrick Henry
    4. Re:SAN? by fuzzyfuzzyfungus · · Score: 1

      My place is in the market for a new SAN device, so it was very interesting to see this post today. What kind of changes would people suggest in order to make this sort of thing perform better (and more reliably) as a SAN device instead of just backup storage?

      The changes that you would need to make to this device to turn it into a decent SAN would probably be rather more expensive than just buying the SAN from somebody who has economies of scale. You could just install an iSCSI target on the OS and call it a day; but performance would be deeply miserable and uptime not so exciting, by SAN standards.

      Such a comparatively unreliable node really starts to make sense if you are working at a scale where each storage pod is considered to be a swappable component where failure or downtime is acceptable. There are a number of filesystems, some OSS, some proprietary, which allow you to present a single logical filesystem whose contents(and a configurable amount of redundancy information) are spread among a (potentially large) number of storage nodes connected over an IP network.

      If you were talking about needing that amount of storage, you could set up a 'SAN' head node, based on a fairly powerful, all-the-redundant-bells-and-whistles enterprise grade server, which would run such a filesystem across a large number of these pods and present an iSCSI target to the rest of the network. It would still be on the slower-but-cheaper side of Real Serious SAN gear; but the correct choice of head node or head nodes could get reliability up there. If your needs are less than or equal to a single pod, though, you really can't bolt on incrementally more reliability or performance without causing the price to zoom up...

  26. They specified a RAID CPU by Quila · · Score: 1

    An Intel i3 540, more powerful than the CPU on most hardware RAID controllers. This thing will be doing very little other than handling the RAID sets.

  27. One guy? by EvilStein · · Score: 1

    hmm.

    What the hell else is Sean doing with his time? That's what the articles are really missing...

  28. Alternative Software by D-OveRMinD · · Score: 1

    Instead of FreeNAS, you can use Openfiler. Also, Open-E is really good, and has easy to setup block replication failover as well. If you want to go high end for custom storage, take a look at Datacore.

  29. ZFS by Anonymous Coward · · Score: 0

    P.S. I suspect they use ext4 over ZFS because ZFS, despite the built in data checks, isn't mature enough for them yet. They mention they used to use JFS before switching to ext4, so I suspect they have done some pretty extensive checking on this.

    ZFS is mentioned in the blog comments, as well as in the HN thread: they looked into it, but given that they decided to go with Linux on their servers, ZFS isn't really available in a stable fashion. If they had decided to go with (Open)Solaris or illumos or FreeBSD, then ZFS would be a more viable option.

    It'd certainly be a lot less convoluted than mdadm(RAID6) -> LVM(PV,VG, 3xLV) -> ext4. A 'zpool create mypool1 raidz2 disk1 disk2 ... diskN' is a lot simpler.
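
    Either layout gives up two drives' worth of space per double-parity group. A rough sketch of the capacity arithmetic, assuming the 45 drives are split into three equal RAID6 sets (suggested by the "3xLV" above, but an assumption) versus one hypothetical 45-wide raidz2 group; the function name is made up for illustration:

```python
def double_parity_usable_tb(total_drives: int, drive_tb: int, groups: int) -> int:
    """Usable TB when drives are split into equal RAID6/raidz2 groups.

    Each group loses two drives' worth of capacity to parity.
    Filesystem overhead is ignored.
    """
    if total_drives % groups != 0:
        raise ValueError("drives must divide evenly into groups")
    drives_per_group = total_drives // groups
    if drives_per_group < 4:
        raise ValueError("double parity needs at least 4 drives per group")
    return groups * (drives_per_group - 2) * drive_tb


# Assumed layout: three 15-drive RAID6 sets of 3 TB drives.
three_sets = double_parity_usable_tb(45, 3, groups=3)
# Hypothetical alternative: one 45-wide group (simple, but a very wide stripe).
one_wide = double_parity_usable_tb(45, 3, groups=1)

print(f"3 x 15-drive sets: {three_sets} TB usable")
print(f"1 x 45-drive set:  {one_wide} TB usable")
```

    The wide group buys 12 TB more space but tolerates only two failed drives across the whole pod, which is why the narrower sets are the more plausible choice.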

  30. But the raid cpu is on it's own where the system by Joe_Dragon · · Score: 1

    But the RAID CPU is on its own, whereas the system CPU has to do the video, networking, and the OS on top of doing the RAID work.

  31. Re:But the raid cpu is on it's own where the syste by Quila · · Score: 1

    This is a modern 3.1 GHz, dual-core CPU vs. ... let's take a Promise SATA RAID card with an Intel IOP333 controller. That's an 800 MHz ARMv5TE CPU, two ARM generations ago, not even superscalar. The i3 is going to have many cycles to spare after taking the load of three such controllers.

  32. Maybe I am missing the point by Anonymous Coward · · Score: 0

    I have a hard time envisioning such an extreme capacity versus throughput requirement.

    Like I said, part of it is the question of scrubbing which is a local task that scales with the amount of storage. I have seen much smaller SATA MD arrays (3-5 disks) which end up taking over a day per week just to complete their scrubbing with some light background load. If these larger arrays cannot effectively scale the scrubbing with many more disks, they could wind up saturating with nothing but scrubbing 24x7. How long does it take for a pod to scrub its entire array? How long does it take to sync a replacement 3 TB drive?
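
    For a back-of-the-envelope answer to those two questions, here is a minimal sketch. The throughput figures are placeholders, not measurements from a pod, and it assumes a scrub or resync has to read the full capacity once at a sustained rate:

```python
def hours_to_read(capacity_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to read capacity_tb (decimal TB) at a sustained MB/s rate."""
    total_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal units)
    return total_mb / throughput_mb_s / 3600


# Assumed rates, for illustration only:
REBUILD_MB_S = 100   # hypothetical sustained write rate to one replacement drive
SCRUB_MB_S = 500     # hypothetical aggregate read rate across the whole pod

rebuild_hours = hours_to_read(3, REBUILD_MB_S)
scrub_hours = hours_to_read(135, SCRUB_MB_S)

print(f"Resync one 3 TB drive: about {rebuild_hours:.1f} hours")
print(f"Scrub all 135 TB:      about {scrub_hours:.1f} hours")
```

    With these made-up rates the full scrub takes about three days, so a weekly scrub would already eat a large share of the pod's I/O time. Real numbers depend on the port multipliers and on background load.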

    And as for application requirements, I admit I have trouble understanding a new deployment that would be sized to the throughput of one or two disks as a relevant benchmark. When I think of RAID 6, I think of many disks in parallel and many times the throughput of a single disk. I also assume many times the typical client load, e.g. backup of a department full of PCs, or a rackful of servers, rather than one PC or one server.

  33. The drives alone cost more than $7.3k without RAID by BitZtream · · Score: 1

    Let's see... the first thing I see when I click on hard drives on Newegg is a 3TB drive for $180.

    So.

    135/3 = 45
    45 * $180 = 8100

    That's just drives, no RAID, no controllers, no chassis/case.

    With more digging I find a 5400 RPM drive for $139 (call it $140) ... so ...

    45 * 140 = 6300, but still just the drives ... and no RAID.
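
    The same arithmetic as a quick sketch, using the two retail prices above and the $7,384 figure from the summary; the function and constant names are made up for illustration:

```python
QUOTED_POD_COST = 7384  # total parts cost claimed in the summary, in dollars


def drive_cost(capacity_tb: int, drive_tb: int, price_per_drive: int) -> tuple[int, int]:
    """Return (number of drives, total drive cost) for a raw capacity target."""
    drives = capacity_tb // drive_tb
    return drives, drives * price_per_drive


for price in (180, 140):
    count, total = drive_cost(135, 3, price)
    left_over = QUOTED_POD_COST - total  # what remains for everything that isn't a drive
    print(f"{count} drives at ${price}: ${total}, leaving ${left_over} for the rest")

# Per-drive price at which the drives alone would consume the whole quoted cost.
print(f"Break-even drive price: ${QUOTED_POD_COST / 45:.2f}")
```

    A negative left-over amount means the drives alone exceed the quoted total at that price.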

    Can you find cheaper drives? I'm sure, I spent all of about 10 seconds looking, but I doubt you're going to want to.

    You guys are all wandering around arguing over the silliness of their slashvertisement (which it most certainly is) and various software implementations that would take the place of theirs and be better (which I don't disagree with one bit) but ... you entirely overlooked the fact that their statements are bald-faced lies. They didn't build it for that much. They may have ignored a bunch of costs and said 'we built it for X amount', but that's like saying the space program only cost American taxpayers the cost of shutting it down because we already did the other stuff so it doesn't count against the cost!

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  34. Re:The drives alone cost more than $7.3k without R by funky_vibes · · Score: 2

    And that takes into account price breaks and volume pricing?

    OEM packs of 10 to 25 drives exist from many manufacturers; did you look at those mfg part nos.?
    What about a full pallet?

    Only a moron would buy that amount of drives from a company that sells mainly to CONSUMERS.
    Even as a consumer, with large enough volumes, you may in some cases purchase straight from a distributor.

  35. Commercial Alternative? by kmand · · Score: 1

    While it's great that they posted the plans, some of the parts on the list are custom, and it's a bit too much hardware tinkering for me. What I would like to see is a similar commercially produced box, minus drives, for a few thousand. All the big players with turnkey solutions seem to sell only with drives at ridiculous prices.

    1. Re:Commercial Alternative? by Anonymous Coward · · Score: 0

      SuperMicro has 4U rack cases which can hold up to 36 3.5" disks and 72 2.5" disks. They also have JBOD case models based on these which can hold up to 45 3.5" disks. We got a couple with 24x3.5" bays and we are happy with them.