Slashdot Mirror


Build Your Own $2.8M Petabyte Disk Array For $117k

Chris Pirazzi writes "Online backup startup BackBlaze, disgusted with the outrageously overpriced offerings from EMC, NetApp and the like, has released an open-source hardware design showing you how to build a 4U, RAID-capable, rack-mounted, Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867. This works out to roughly $117,000 per petabyte, which would cost you around $2.8 million from Amazon or EMC. They have a full parts list and diagrams showing how they put everything together. Their blog states: 'Our hope is that by sharing, others can benefit and, ultimately, refine this concept and send improvements back to us.'"
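The per-petabyte figure in the summary can be checked with a few lines of arithmetic. The sketch below takes the $7,867 pod cost, the 67 TB capacity, and the $2.8 million vendor price straight from the summary. It assumes decimal units (1 PB = 1,000 TB), and the function and variable names are illustrative only.

```python
import math

POD_COST_USD = 7_867             # material cost of one pod, from the summary
POD_CAPACITY_TB = 67             # raw capacity of one pod, from the summary
VENDOR_COST_PER_PB = 2_800_000   # quoted Amazon/EMC figure, from the summary
TB_PER_PB = 1_000                # assumption: decimal petabyte


def cost_per_petabyte(pod_cost, pod_capacity_tb, tb_per_pb=TB_PER_PB):
    """Scale one pod's cost linearly to a full petabyte."""
    return pod_cost / pod_capacity_tb * tb_per_pb


def pods_for_petabyte(pod_capacity_tb, tb_per_pb=TB_PER_PB):
    """Whole pods needed to reach at least one petabyte."""
    return math.ceil(tb_per_pb / pod_capacity_tb)


diy = cost_per_petabyte(POD_COST_USD, POD_CAPACITY_TB)
pods = pods_for_petabyte(POD_CAPACITY_TB)
print(f"DIY cost per PB: ${diy:,.0f}")
print(f"Whole pods per PB: {pods} (${pods * POD_COST_USD:,} in parts)")
print(f"Vendor/DIY ratio: {VENDOR_COST_PER_PB / diy:.1f}x")
```

Since pods only come in whole units, reaching a full petabyte takes 15 of them, which lands slightly above the linearly scaled figure.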

17 of 487 comments

  1. You know why Amazon charges that much? by Nimey · · Score: 4, Insightful

    Support.

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
    1. Re:You know why Amazon charges that much? by Richard_at_work · · Score: 5, Insightful

      And backup, redundancy, hosting, cooling etc etc. The $117,000 cost quoted here is for raw hardware only.

    2. Re:You know why Amazon charges that much? by johnlcallaway · · Score: 4, Insightful

      It's great having someone tell you they will be there in three hours to replace your power supply, and then having to dedicate a staff person to escort them when they go out on the shop floor because some moron in security requires it. If they had just left a few spare parts you could do it yourself because everything just slides into place anyway.

      That 2.683M also pays for salaries, pretty building(s), advertising, research, conventions, and more advertising.

      I could hire a couple of dedicated staff to have 24x7 support for far less than 2.683M, plus a duplicate system worth of spare parts.

      This stuff isn't rocket science. Most companies don't need high-speed, fiber-optic disk array subsystems for a significant amount of their data, only for a small subset that needs blindingly fast speed. The rest can sit on cheap arrays. For example, all of my network-accessible files that I open very rarely but keep on the network because they get backed up. All of my 5 copies of database backups and logs that I keep because it's faster to pull them off of disk than request a tape from offsite. And it's faster to back up to disk, then to tape.

      BackBlaze is a good example of someone that needs a ton of storage, but not lightning-fast access. Having a reliable system is more important to them than one that has all the tricks and trappings of an EMC array that probably 10% of all EMC users actually use, but they all pay for.

      --
      I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
    3. Re:You know why Amazon charges that much? by interval1066 · · Score: 5, Insightful

      Backup: depends on the backup strategy. I could make this happen for less than an additional 10%. But ok, point taken.

      Redundancy: You mean as in plain redundancy? These are RAID arrays, are they not? You want redundancy at the server level? Now you're increasing the scope of the project, which the article doesn't address. (Scope error)

      Hosting: Again, the point of the article was the hardware. That's a little like accounting for the cost of a trip to your grandmother's, and factoring in the cost of your grandmother's house. A little out of scope.

      Cooling: I could probably get the whole project chilled for less than 6% of the total cost, depending on how cool you want the rig to run.

      I think you're looking for a wrench in the works where none exist.

      --
      Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
    4. Re:You know why Amazon charges that much? by MrNaz · · Score: 5, Insightful

      Redundancy can be had for another $117,000.
      Hosting in a DC will not even be a blip in the difference between that and $2.7m.

      EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.

      --
      I hate printers.
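Several of the replies above put numbers on the "missing" costs: under 10% extra for backup, under 6% for cooling, and another $117,000 for a redundant copy. A rough sketch that stacks those upper bounds on the hardware figure follows. Treating the percentages as fractions of the $117,000 base and simply adding them is an assumption made here for illustration, not something any commenter spelled out.

```python
BASE_HARDWARE = 117_000      # per-petabyte hardware cost from the summary
VENDOR_PRICE = 2_800_000     # per-petabyte vendor figure from the summary

# Upper bounds quoted by commenters; applying them to the base cost
# is an assumption made for this sketch.
BACKUP_FRACTION = 0.10
COOLING_FRACTION = 0.06
REDUNDANT_COPY = 117_000     # "another $117,000" for a second copy


def estimated_total(base, backup_frac, cooling_frac, redundant_copy):
    """Add the thread's overhead estimates to the raw hardware cost."""
    return base * (1 + backup_frac + cooling_frac) + redundant_copy


total = estimated_total(BASE_HARDWARE, BACKUP_FRACTION,
                        COOLING_FRACTION, REDUNDANT_COPY)
remaining = VENDOR_PRICE - total
print(f"Estimated DIY total: ${total:,.0f}")
print(f"Left over versus vendor price: ${remaining:,.0f}")
```

Hosting and staff time are left out because the thread gives no figure for them; whatever they cost has to fit in the remainder.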
  2. Ripoff by asaul · · Score: 4, Insightful

    Looks like a cheap downscale undersized version of a Sun X4500/X4540.

    And as others have pointed out, you pay a vendor because in 4 years they will still be stocking the drives you bought today, whereas for this setup you will be praying they are still on eBay.

    --
    "If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
  3. wtf? by pak9rabid · · Score: 5, Insightful

    FTA...

    But when we priced various off-the-shelf solutions, the cost was 10 times as much (or more) than the raw hard drives.

    Um... and what do you plan on running these disks with? HDs don't magically store and retrieve data on their own. The HDs are cheap compared to the other parts that create a storage system. That's like saying a Ferrari is a ripoff because you can buy an engine for $3,000.

  4. Re:A Very Shortsighted Article by Desler · · Score: 5, Insightful

    The point is that the costs of services like Amazon or NetApp, etc include the costs for support, server maintenance, upgrades, etc. That they are only comparing this to just the bare minimum price for this company to construct their server is highly misleading.

  5. Not that shortsighted for their purposes by Overzeetop · · Score: 5, Insightful

    Yeah, this only works if you're the geeks building the hardware to begin with. The real cost is in setup and maintenance. Plus, if the shit hits the fan, the CxO is going to want to find some big butts to kick. 67TB of data is a lot to lose (though it's only about 35 disks at max cap these days).

    These guys, however, happen to be the geeks, the maintainers, and the people-whose-butts-get-kicked-anyway all at once. This is not a project for a one- or two-man IT group that has to build a storage array for their 100-200 person firm. These guys are storage professionals with the hardware and software know-how to pull it off. Kudos to them for making it and sharing their project. It's a nice, compact system. It's a little bit of a shame that there isn't OTS software, but at this level you're going to be doing grunt work on it with experts anyway.

    FWIW, Lime Technology (lime-technology.com) will sell you a case, drive trays, and software for a quasi-RAID system that will hold 28TB for under $1500 (not including the 15 2TB drives - another $3k on the open market). This is only one-fault tolerant (though failure is more graceful than with a traditional RAID). I don't know if they've implemented hot spares or automatic failover yet (which would put them up to two-fault tolerant on the drives, like RAID6).

    --
    Is it just my observation, or are there way too many stupid people in the world?
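The capacity figures in the comment above follow from simple parity arithmetic: with equal-sized drives, each tolerated drive failure costs one drive's worth of space. A minimal sketch, assuming that simplified model; the function name is made up for illustration, and filesystem overhead is ignored.

```python
def usable_capacity_tb(drive_count, drive_size_tb, parity_drives):
    """Usable space when `parity_drives` drives' worth of capacity holds parity."""
    if parity_drives >= drive_count:
        raise ValueError("need at least one data drive")
    return (drive_count - parity_drives) * drive_size_tb


# 15 x 2 TB drives with single-fault tolerance, as in the comment
single = usable_capacity_tb(15, 2, parity_drives=1)
# Same drives with two-fault tolerance (RAID 6 style)
double = usable_capacity_tb(15, 2, parity_drives=2)
print(f"one-fault tolerant: {single} TB, two-fault tolerant: {double} TB")
```

With one drive's worth of parity this reproduces the 28TB quoted in the comment; giving up a second drive for RAID 6-style protection would leave 26TB under the same assumptions.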
  6. Let's try to be a bit more supportive here! by fake_name · · Score: 4, Insightful

    If an article went up describing how a major vendor released a petabyte array for $2M, the comments would be full of people saying "I could make an array with that much storage far cheaper!"

    Now someone has gone and done exactly that (they even used Linux to do it) and suddenly everyone complains that it lacks support from a major vendor.

    This may not be perfect for everyone's needs, but it's nice to see this sort of innovation taking place instead of blindly following the same path everyone else takes for storage.

  7. What's all the hate? by xrayspx · · Score: 5, Insightful

    These guys build their own hardware, think it might be able to be improved on or help the community, and they release the specs, for free, on the Internet. They then get jumped on by people saying "bbbb-but support!". They're not pretending to offer support, if you want support, pay the 2MM for EMC, if you can handle your own support in-house, maybe you can get away with building these out.

    It's like looking at KDE and saying "But we pay Apple and Microsoft so we get support" (even though, no you don't). The company is just releasing specs, if it fits in your environment, great, if not, bummer. If you can make improvements and send them back up-stream, everyone wins. Just like software.

    I seem to recall similar threads whenever anyone mentions open routers from the Cisco folks.

  8. Re:A Very Shortsighted Article by Anarke_Incarnate · · Score: 4, Insightful

    You will more than likely NOT have to take a node offline. The design looks like they place the drives into slip down hot plug enclosures. Most rack mounted hardware is on rails, not screwed to the rack. You roll the rack out, log in, fail the drive that is bad, remove it, hot plug another drive and add it to the array. You are now done.

    They went RAID 6, even though it is slow as shit, for the added failsafe mechanisms.

  9. Re:they are missing hardware mgmt by N1ck0 · · Score: 4, Insightful

    It's better at what they need it for. Based on the services and software they describe on their site, it looks like they store data in the classic redundant chunks distributed over multiple 'disposable' storage systems. In this situation most of the added redundancy that vendors put in their products doesn't add much value to their storage application. Thus having racks and racks of basic RAIDs on cheap disks and paying a few on-site monkeys to replace parts is more cost-effective than going to a more stable/tested enterprise storage vendor.

  10. Re:they are missing hardware mgmt by swillden · · Score: 4, Insightful

    personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.

    I don't think these folks believe their solution is better -- just cheaper. MUCH cheaper. So much cheaper that you can employ a team of people to maintain the "homebrew" solution and still save money.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  11. Re:Not ZFS? by mollog · · Score: 4, Insightful

    I have worked in disk storage design. This was a very cool project. This looks like a promising start and in some ways represents the future of storage: COTS parts. Others have pointed out some areas for improvement, cooling and the like.

    And I think I would use dual microATX motherboards, perhaps in their own cases to make them replaceable in case of failure.

    I realize that the layout of the drives was done with an eye toward airflow, but I personally don't like to see drives set on their edges. It's probably a personal bias, but I like to see drives set flat. The bearings seem to last longer that way. Just my personal experience.

    And, one final point, storage density is reaching the point where we can jam a lot of storage into a small space. Perhaps we have reached the point where we can start to spread things out and do things like put the drives in a separate enclosure or multiple enclosures. It makes designing, installing, and servicing easier. Use eSATA ports on the SATA cards to make external storage easier.

    --
    Best regards.
  12. are you a project manager by any chance? by leoc · · Score: 4, Insightful

    I like how you dismiss a detailed real world design example based simply on a claimed feature without any further substantiation. Very classy. I'm not saying you are wrong, but would it kill you to go into a little more detail about why these folks need "luck" when they are clearly very successful with their existing design?

    --
    STFU about slashdot bias.
  13. *sigh* by upside · · Score: 4, Insightful

    How about reading the section "A Backblaze Storage Pod is a Building Block".

    <snip> the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post). When you run a datacenter with thousands of hard drives, CPUs, motherboards, and power supplies, you are going to have hardware failures — it's irrefutable. Backblaze Storage Pods are building blocks upon which a larger system can be organized that doesn't allow for a single point of failure. Each pod in itself is just a big chunk of raw storage for an inexpensive price; it is not a "solution" in itself.

    Emphasis mine. I believe there are quite a few successful and reliable storage vendors not using ZFS. We get the point, you like it. Doesn't mean you can't succeed without it. Be more open-minded.

    --
    I'm sorry if I haven't offended anyone