Slashdot Mirror


Build Your Own 135TB RAID6 Storage Pod For $7,384

An anonymous reader writes "Backblaze, the cloud-based backup provider, has revealed how it continues to undercut its competitors: by building its own 135TB Storage Pods which cost just $7,384 in parts. Backblaze has provided almost all of the information that you need to make your own Storage Pod, including 45 3TB hard drives, three PCIe SATA II cards, and nine backplane multipliers, but without Backblaze's proprietary management software you'll probably have to use FreeNAS, or cobble together your own software solution... A couple of years ago they showed how to make their first-generation, 67TB Storage Pods"

34 of 239 comments

  1. Re:My God... by ByOhTek · · Score: 2

    It's full of slashvertisements!!

    --
    Self proclaimed typo king, and inventor of the bear destroying coffee table (patent not pending).
  2. Not enough by bryan1945 · · Score: 2

    For a true porn collector yet.

    --
    Vote monkeys into Congress. They are cheaper and more trustworthy.
  3. This is a huge step forward by mugurel · · Score: 3, Funny

    for both internet security and privacy: each of us can now store his own local copy of the internet and surf offline!

  4. Can't actually store 135TB of data by gman003 · · Score: 4, Interesting

    The article says it uses RAID 6: the 45 hard drives in the pod are grouped into arrays of 15 that use RAID 6 (the groups being combined by logical volumes), which gives you an actual data capacity of 39TB per group (3TB * (15 - 2) = 39TB), which then becomes 117TB usable space (39TB * 3 = 117TB). The 135TB figure is what it would be if you used RAID 0, or just used them as normal drives (45 * 3TB = 135TB).

    And these are all "manufacturer's terabytes", which is probably 1,024,000,000,000 bytes per terabyte instead of 1,099,511,627,776 (2^40) bytes per terabyte like it should be. So it's a mere 108 terabytes, assuming you use the standard power-of-two terabyte ("tebibyte", if you prefer that stupid-sounding term).

    1. Re:Can't actually store 135TB of data by GameboyRMH · · Score: 2, Informative

      A manufacturer's terabyte would be 1,000,000,000,000 bytes.

      --
      "When information is power, privacy is freedom" - Jah-Wren Ryel
    2. Re:Can't actually store 135TB of data by gman003 · · Score: 3, Informative

      Common usage for the past 50 years has been that, in the context of computer memory capacity, "tera-" is to be interpreted as 2^40 (with "giga-" being 2^30, and so on). You'll note that I included a sidenote on "tebibytes" to appease revisionists like you.

      PS: It's rather ironic that someone accusing me of bastardizing SI prefixes can't even spell "terabytes" properly. Unless you're somehow referring to Earth Bytes or something.

    3. Re:Can't actually store 135TB of data by Kjella · · Score: 4, Informative

      Hitachi:
      "Capacity - One GB is equal to one billion bytes and one TB equals 1,000GB (one trillion bytes) when referring to hard drive capacity."

      Western Digital:
      "As used for storage capacity, one megabyte (MB) = one million bytes, one gigabyte (GB) = one billion bytes, and one terabyte (TB) = one trillion bytes."

      Seagate (PDF product sheets):
      "When referring to hard drive capacity, one gigabyte, or GB, equals one billion bytes and one terabyte, or TB, equals one trillion bytes."

      So no, no and more no. Sometimes there really should be a "-1, Wrong" moderation...

      --
      Live today, because you never know what tomorrow brings
    4. Re:Can't actually store 135TB of data by Just+Some+Guy · · Score: 2

      Stop bastardizing the SI prefixes. Terra is the prefix

      The irony: it is strong with this one.

      --
      Dewey, what part of this looks like authorities should be involved?
    5. Re:Can't actually store 135TB of data by Solandri · · Score: 2

      Some marketer at Maxtor(?) started the transition from the 2^20 definition of MB to 10^6 for HDDs in the mid-1990s. The (at the time) smaller HDD manufacturers like Western Digital quickly followed suit. Seagate was one of the later ones. IBM (now Hitachi) was the last one to make the switch - they held out until about 2000.
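The capacity arithmetic argued over in this sub-thread can be checked with a short sketch. It assumes the layout gman003 describes (45 drives of 3TB in three RAID 6 groups of 15) and the decimal terabyte of 10^12 bytes that the vendor quotes above define; the function and variable names are illustrative only.

```python
# Assumed layout from the thread: 45 x 3TB drives, three RAID 6 groups of 15.
DRIVES = 45
DRIVE_TB = 3            # decimal terabytes, per the vendor definitions quoted above
GROUP_SIZE = 15
PARITY_PER_GROUP = 2    # RAID 6 gives up two drives' worth of space per group

TB = 10**12             # decimal terabyte (drive vendors)
TIB = 2**40             # binary terabyte, i.e. tebibyte


def usable_tb(drives, drive_tb, group_size, parity):
    """Usable capacity in decimal TB once parity is subtracted from each group."""
    groups = drives // group_size
    return groups * (group_size - parity) * drive_tb


raw_tb = DRIVES * DRIVE_TB
usable = usable_tb(DRIVES, DRIVE_TB, GROUP_SIZE, PARITY_PER_GROUP)
usable_tib = usable * TB / TIB

print(f"raw: {raw_tb} TB, usable: {usable} TB, usable: {usable_tib:.1f} TiB")
```

With 10^12-byte terabytes the usable figure comes out near 106.4 TiB, slightly below the 108 estimated from the 1,024,000,000,000-byte assumption.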

  5. Re:The price is too high.. by tomz16 · · Score: 2

    Nope, not at all... $2,000 is actually really cheap IMHO. Try to find a way to connect 68 drives cheaply (RAID cards and SATA multiplier backplanes are both pretty expensive). Don't forget that you also need a custom case, motherboard, ram, cpu, PS, and cooling for everything.

  6. Anything over 2TB should be ZFS... by QuietLagoon · · Score: 2
    ... if you really care about the data. ZFS has many more built-in data integrity checks, and more extensive ones, than vanilla RAID6 arrays.

    Both FreeBSD and FreeNAS, in addition to OpenSolaris, support ZFS.

    1. Re:Anything over 2TB should be ZFS... by brianwski · · Score: 4, Interesting

      ... if you really care about the data.

      (Disclaimer: I work at Backblaze) - If you really care about data, you *MUST* have end-to-end application level data integrity checks (it isn't just the hard drives that lose data!).

      Let's make this perfectly clear: Backblaze checksums EVERYTHING on an end-to-end basis (mostly we use SHA-1). This is so important I cannot stress it enough: each and every file and portion of a file we store has our own checksum on the end, and we use this all over the place. For example, we pass over the data every week or so reading it, recalculating the checksums, and if a single bit has been thrown we heal it up either from our own copies of the data or ask the client to re-transmit that file or part of that file.

      With the large amount of data we store, our checksums catch errors at EVERY level - RAM, hard drive, network transmission, everywhere. My guess is that consumers just do not notice when a single bit in one of their JPEG photos has been flipped -> one pixel gets ever so slightly more red or something. Only one photo changes out of their collection of thousands. But at our crazy numbers of files stored we see it (and fix it) daily.
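The idea of a checksum "on the end" of each stored piece plus a periodic re-read can be sketched roughly as follows. This is only an illustration using SHA-1 from Python's standard `hashlib`; the record layout and the `seal`, `verify`, and `scrub` helpers are hypothetical and are not Backblaze's actual format or code.

```python
import hashlib

DIGEST_LEN = hashlib.sha1().digest_size  # SHA-1 digests are 20 bytes


def seal(payload: bytes) -> bytes:
    """Return the payload with its SHA-1 digest appended (hypothetical layout)."""
    return payload + hashlib.sha1(payload).digest()


def verify(record: bytes) -> bool:
    """Recompute the digest over the payload and compare it with the stored one."""
    payload, stored = record[:-DIGEST_LEN], record[-DIGEST_LEN:]
    return hashlib.sha1(payload).digest() == stored


def scrub(records: dict) -> list:
    """Periodic pass: names of records whose checksum no longer matches."""
    return [name for name, rec in records.items() if not verify(rec)]


store = {"a.jpg": seal(b"photo bytes"), "b.jpg": seal(b"more photo bytes")}

# Simulate a single flipped bit somewhere between RAM, disk, and network.
damaged = bytearray(store["b.jpg"])
damaged[0] ^= 0x01
store["b.jpg"] = bytes(damaged)

bad = scrub(store)
print(bad)  # records that would need healing from another copy or a re-upload
```

Because the check runs over the application's own records, it catches a flipped bit regardless of which layer introduced it.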

    2. Re:Anything over 2TB should be ZFS... by brianwski · · Score: 2

      Using JFS instead of ZFS is the biggest mistake for this build.

      (Disclaimer: I work at Backblaze) - We no longer deploy new pods with JFS, but over half our fleet of 200 pods are running JFS and we are perfectly happy with it. We worked through a couple bugs related to large volumes, but after that our main reason for using EXT4 going forward is that in our application EXT4 is measurably faster than JFS, and it is reassuring to be on a filesystem that is used by more people so it (hopefully) has more bugs fixed, etc.

      Earlier we were totally interested in ZFS, as it would replace RAID & LVM as well (and ZFS gets great reviews). But (to my understanding) native ZFS is not available on Linux and we're not really looking to switch to OpenSolaris.

      ANOTHER option down this line of thinking is switching to btrfs, but we haven't played with it yet.

  7. file system by roman_mir · · Score: 2

    When you choose which file system to use, you should consider what the purpose of the storage is. If it's to run a database, you may want to rethink the decision to go with a journaling file system, because databases often do their own journaling (like the PostgreSQL WAL), which means performance will drop if you put a journaling file system underneath that. Just my 0.0003 grams of gold.

  8. Re:My God... by x6060 · · Score: 2

    It's not a slashvertisement if they tell you how to build it yourself.....

  9. Re:Feelin' HOT HOT HOT by hjf · · Score: 3, Informative

    This is nothing new. You've never been in a datacenter before, kid. You can ask a grownup one day and he can take you there and you will feel the heat. And NOISE. No offense, but I think you're one of those gamer kids who builds rigs for max FPS, with esoteric water cooling and silent fans everywhere.

    Yeah, no, you don't need to pamper your hardware that much. Even laptop drives work way hot (60C+) for years with no issue.

    Most servers are built that way too. The Sun x4500 is extremely densely packed. And there are hundreds running just fine.

  10. Backblaze is speaking about scalability in SF by Jim+Ethanol · · Score: 3, Informative

    If you're in the SF Bay Area check out http://geeksessions.com/ where Gleb Budman from Backblaze will be speaking about the Storage Pod and their approach to Network & Infrastructure scalability along with engineers from Zynga, Yahoo!, and Boundary. This event will also have a live stream on geeksessions.com.

    Full Disclosure: This is my event.

    50% discount to the event (about $8 bucks and free beer) for the Slashdot crowd here: http://gs22.eventbrite.com/?discount=slashdot

    1. Re:Backblaze is speaking about scalability in SF by gpuk · · Score: 2

      Hi Jim

      I'm quite a few timezones East of you, meaning the live stream will start at 0300 local on Wednesday for me. I'm willing to tough it out and stay up to watch it if necessary but it would be much more civilised if I could watch a playback. Will it be available for download later or is it live only?

      It sucks I've only just learnt about geeksessions :( Some of your earlier events look awesome

  11. Original blog post by Baloroth · · Score: 5, Informative

    Here is a link to Backblaze's actual blog entry for the new 135TB pods, and here is the one for the original 67TB pods. The blog article is actually quite fascinating. Apparently they are employee owned, use entirely off-the-shelf parts (except for the case, looks like), and recommend Hitachi drives (Deskstar 5K3000 HDS5C3030ALA630) as having the lowest failure rate of any manufacturer (less than 1% they say).

    I found it kinda amusing that ext4's 16TB volume limit was an "issue" for them. Not because it's surprising, but because... well, it's 16TB. The whole blog post is actually recommended reading for anyone looking to build their own data pods like this. It really does a good job showing their personal experience in the field and problems/not problems they have. For instance: apparently heat isn't an issue, as 2 fans are able to keep an entire pod within the recommended temperature (although they actually use 6). It'll be interesting to see what happens as some of their pods get older, as I suspect that their failure rate will get pretty high fairly soon (their oldest drives are currently 4 years old; I expect that when they hit 5-6 years, failures will start becoming much more common). All in all, pretty cool. Oh, and it shows how much Amazon/Dell price gouge, but that shouldn't really shock anyone. Except the amount. A petabyte for three years is $94,000 with Backblaze, and $2,466,000 with Amazon.

    P.S. I suspect they use ext4 over ZFS because ZFS, despite the built in data checks, isn't mature enough for them yet. They mention they used to use JFS before switching to ext4, so I suspect they have done some pretty extensive checking on this.

    --
    "None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
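The 16TB ext4 limit mentioned above falls out of simple arithmetic if one assumes 32-bit block numbers and 4 KiB blocks, which is the commonly cited reason for the limit at the time. The sketch below works that out and, as a further assumption, uses the 117TB usable figure from earlier in the thread to estimate how many such volumes a pod would need at minimum. It does not describe how Backblaze actually partitions a pod.

```python
import math

BLOCK_SIZE = 4096          # bytes per block (assumed 4 KiB)
BLOCK_NUMBER_BITS = 32     # assumed 32-bit block addressing

TIB = 2**40

# Largest volume addressable under these assumptions.
max_volume_bytes = (2**BLOCK_NUMBER_BITS) * BLOCK_SIZE
max_volume_tib = max_volume_bytes / TIB

# Usable pod capacity from the thread: 117 decimal TB after RAID 6 parity.
usable_pod_tib = 117 * 10**12 / TIB
min_volumes = math.ceil(usable_pod_tib / max_volume_tib)

print(f"max volume: {max_volume_tib:.0f} TiB, minimum volumes per pod: {min_volumes}")
```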
  12. Re:7K for software raid? and why a low end cpu? by drinkypoo · · Score: 4, Informative

    Hardware RAID controllers are stupid in this context. The only place they make sense is in a workstation, where you want your CPU for doing work, and if the controller dies you restore from backups or just reinstall. Using software RAID means never having to try to get a rebuilder software to convert the RAID from one format to another because the old controller isn't available any more, or because you can't get one when you really need one to get that project data out so you can ship and bill.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  13. Re:7K for software raid? and why a low end cpu? by gman003 · · Score: 3, Insightful

    Because, for this project, raw storage capacity is much more important than performance. Besides, they claim their main bottleneck is the gigabit Ethernet interface - even software RAID, the PCIe x1, and the raw drive performance are less of a limiting factor.

    Yeah, in a situation where you need high I/O performance, this design would be less than ideal. But they don't - they're providing backup storage. They don't need heavy write performance, they don't need heavy read performance. They just need to put a lot of data on a disk and not break anything.

    PS: SAS doesn't really provide much better performance than SATA, and it's a lot more expensive. Same for hardware RAID - using those would easily octuple the cost of the entire system.

  14. Re:My God... by TheRaven64 · · Score: 2

    Except for the bit about how it would be even better if you paid for their proprietary management software...

    --
    I am TheRaven on Soylent News
  15. ... but you can't use it by savanik · · Score: 2

    With the latest bandwidth caps I'm seeing on my provider (AT&T U-verse), I can download data at a rate of 250 GB per month. So it'll take me 45 YEARS to fill up that 135 TB array. Something tells me they'll have better storage solutions by then.

    In the meantime, I'm just waiting for Google to roll out the high-speed internet in my locale next year - maybe then I'll have a chance at filling up my current file server.
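The 45-year estimate above is easy to verify. The sketch assumes decimal units (1TB = 1000GB) and that the whole 250GB monthly cap is spent filling the array; the variable names are illustrative.

```python
ARRAY_TB = 135            # raw pod capacity quoted in the article
CAP_GB_PER_MONTH = 250    # monthly bandwidth cap from the comment
GB_PER_TB = 1000          # decimal units assumed

months = ARRAY_TB * GB_PER_TB / CAP_GB_PER_MONTH
years = months / 12

print(f"{months:.0f} months, or {years:.0f} years")
```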

  16. Re:But you cant use it without getting too hot? by Black.Shuck · · Score: 2

    It's probably fine.

  17. Re:The price is too high.. by kiwimate · · Score: 2

    You might want to read the actual blog where they explain what they use in a bit of detail. This isn't my area of expertise either, but I do know that running 10 servers is very different from running 100 servers, which is also different from running 1000 servers. There are many questions that crop up that you really don't have to consider when you're down in the smaller arenas. (E.g. patch management - manually patching 10 servers is feasible and more cost effective than having an OTS solution; manually patching 1000 servers, not so much.)

    They do also state at the outset:

    In this post, we'll share how to make a 2.0 storage pod, and you're welcome to use the design. We'll also share some of our secrets from the last three years of deploying more than 16 petabytes worth of Backblaze storage pods. As before, our hope is that others can benefit from this information and help us refine the pods.

    My reading - they definitely know more about this than I do, and they're not too proud to admit there could be lessons they can learn from the community.

  18. Re:My God... by Dillon2112 · · Score: 5, Insightful

    My problem with Backblaze is their marketing is very misleading...they pit these storage pods up against cloud storage and assert that they are "cheaper", as though a storage pod is anything like cloud storage. It isn't. Sure, there's the management software issue that's already been mentioned, but they do no analysis on redundancy, power usage, security, bandwidth usage, cooling, drive replacement due to failure, administrative costs, etc. It's insulting to anyone who can tell the difference, but there are suits out there who read their marketing pitch and decide that current cloud storage providers like Google and Amazon are a rip off because "Backblaze can do the same thing for a twentieth the price!" It's nuts.

    You can see this yourself in their pricing chart at the bottom of their blog post. They assert that Backblaze can store a petabyte for three years for either $56k or $94k (if you include "space and power"). And then they compare that to S3 costing roughly $2.5 million. In their old graphs, they left out the "space and power" part, and I'm sure people complained about the inaccuracies. But they're making the same mistake again this time: they're implicitly assuming the cost of replicating, say, S3, is dominated by the cost of the initial hardware. It isn't. They still haven't included the cost of geographically distributing the data across data centers, the cost of drive replacement to account for drive failure over 3 years, the cost of the bandwidth to access that data, and it is totally unclear if their cost for "power" includes cooling. And what about maintaining the data center's security? Is that included in "space"?

    On a side note, I'd be interested to see their analysis on mean time between data loss using their system as it is priced in their post.

    You could say the Backblaze is serving a different need, so it doesn't need to incur all those additional costs, and you might be right, but then why are they comparing it to S3 in the first place? It's just marketing fluff, and it is in an article people are lauding for its technical accuracy. Meh.

  19. Re:Engineering competence does give an edge by Walker1337 · · Score: 2

    But the largest cost driver in storage is that people want to buy storage pre-configured and in a box that they do not need to understand. This is not only very expensive, (when I researched this 9 years ago, disk part of total price was sometimes as low as 15%!), but gives you lower performance and lower reliability. And also less flexibility.

    You ain't kidding. I have installed systems for people that cost hundreds of thousands of dollars and they can't even give me basic information in order to complete the install. How many disks to each head? No idea. How big do you want your RAID groups? No idea. Excuse me sir, this IP and gateway are in different subnets, can I have another? That last one has actually happened more than once.

  20. Re:But you cant use it without getting too hot? by demonbug · · Score: 3, Interesting

    Or can somebody tell me if the cooling of the HDs is ok if they are stacked like in the picture?

    According to their blog post about it, they see a variation of ~5 degrees within unit (middle drives to outside drives) and about 2 degrees from the lowest unit in a rack to the highest. They also indicate that the drives stay within the spec operating temperature range with only two of the six fans in each chassis running.

    Keep in mind these are 5400 RPM drives, not the 10K+ drives you would expect in an application where performance is critical. These are designed for one thing - lots of storage, cheap. No real worries about access times, IOPS, or a lot of the other performance measures that a more flexible storage solution would need to be concerned with. These are for backup only - nice large chunks of data written and (hopefully) never looked at again.

  21. Re:My God... by x6060 · · Score: 4, Insightful

    Did you notice how they even gave you the alternatives to their software? Essentially they are saying "We developed this for our own internal use and if you would LIKE to pay for it, it's cool. If you don't, then there are these other free alternatives." But then again, just because some company is mentioned in the article it MUST be a slashvertisement.

  22. Re:My God... by x6060 · · Score: 2

    You must be. They even GIVE you the free alternatives to their software. But in the same page they give you everything you need to do it yourself. You just have to add some hardware.

  23. Re:7K for software raid? and why a low end cpu? by pz · · Score: 3, Insightful

    No. Hardware controllers are the right solution in this context. These pods are not designed for individual users, but for corporations that can afford stockpiles of spare parts, so replacing a board can be done easily. Using hardware controllers allows many more drives per box, and thus per CPU. A populated 6-CPU motherboard is going to dissipate more heat, require more memory, and likely be less reliable than the special-purpose hardware approach that allows for a single CPU.

    Software RAID makes sense when you have a balance of storage bandwidth requirements to CPU capacity that is heavy on the CPU side. This box is designed for the opposite scenario, as the highly informative blog describes:

    http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/

    (Yes, I know, expecting someone to read the blog would mean that they would have to read the linked article and then click through to the original post, a veritable impossibility. Still, it is recommended reading, especially the part about their experience with failure rates and how they have *one* guy replacing failed drives *one* day per week.)

    --

    Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
  24. Re:My God... by Archangel+Michael · · Score: 3, Insightful

    First: ALL Marketing is misleading. That is what marketing does. Accentuate the positive, eliminate the negative. So complaining about that is just idiotic.

    Second: You could have a couple dozen Backblaze units, pay for a tech to monitor them 24/7/365 and replace all the drives twice over for what Amazon charges for the same thing. Sure, that doesn't include the cost of premises and high-speed Internet to multiple locations. But still, that is aggregated with all the other clients.

    Third: what are you paying for in the "cloud", I mean besides ethereal concepts. Does Amazon tell you how they do things? You probably know less about Amazon's (and the others') setup, so you're comparing something you know something about (not everything) versus something you know almost nothing about, and then complain that they aren't doing it in a comparable way. You don't know.

    Fourth: Your basic assumption is that Backblaze has no contingency for drive replacement, which is false. Since these are "new" drives there might be insufficient data about failure rates and therefore the actual cost of replacement (never mind warranties) or having drives in both Hot and Cold Spare setups. I'm sure that Backblaze in their $5/MO service figures what it costs to store data, have spares, keep the Datacenter running and profitable. Even if they double the cost to $10, it still puts the others to shame.

    Have you compared the data loss rates for the last three years between Amazon and Backblaze? Can you even compare or is that data held secret (see point 1b). My point here, is that you're pulling shit out of your ass and thinking it doesn't stink. Even if it isn't directly comparable, it is at least in the realm of consideration, EVEN if everything you said is true. And at 10 times less in cost, that can buy a lot of redundancy. It is just a matter of perspective.

    --
    Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  25. Re:My God... by x6060 · · Score: 2

    That's all well and good that you can point out a few places that "Looks like marketing/advertising speak" and ignore the fact that they give you all the tools and knowledge to do it yourself and replicate the success of the "ultra-frugal Storage Pods". Hell, they even give you some of the issues you will run into using one of the free alternatives AND EVEN GIVE YOU THE SOLUTIONS FOR THEM. OH! But there are a few places where they mention the product that they run off the very system they told you how to build and could make yourself. That must mean they are evil corporate fat-cats trying to cheat you out of your hard earned dollars. DAMN THEM FOR PROVIDING COST EFFECTIVE BACKUP SPACE!!!!!!!! How DARE they seek to earn a living. I am guessing you would be fine with the "slashvertisement" if their service was free though.

  26. Re:The drives alone cost more than $7.3k without R by funky_vibes · · Score: 2

    And that takes into account price breaks and volume pricing?

    There exist 10 to 25 OEM packs of drives from many manufacturers, did you look at those mfg part no.s?
    What about a full pallet?

    Only a moron would buy that amount of drives from a company that sells mainly to CONSUMERS.
    Even as a consumer, with large enough volumes, you may in some cases purchase straight from a distributor.