Slashdot Mirror


Build Your Own $2.8M Petabyte Disk Array For $117k

Chris Pirazzi writes "Online backup startup Backblaze, disgusted with the outrageously overpriced offerings from EMC, NetApp and the like, has released an open-source hardware design showing you how to build a 4U, RAID-capable, rack-mounted, Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867. This works out to roughly $117,000 per petabyte, which would cost you around $2.8 million from Amazon or EMC. They have a full parts list and diagrams showing how they put everything together. Their blog states: 'Our hope is that by sharing, others can benefit and, ultimately, refine this concept and send improvements back to us.'"
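
The per-petabyte figure in the summary can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 1 PB = 1,000 TB and uses the rounded $2.8 million figure quoted above, and the variable names are illustrative.

```python
# Figures taken from the summary above.
POD_COST_USD = 7_867          # material cost of one 4U pod
POD_CAPACITY_TB = 67          # storage per pod
VENDOR_COST_PER_PB = 2_800_000  # rounded Amazon/EMC figure

TB_PER_PB = 1_000  # assumption: decimal petabyte

cost_per_tb = POD_COST_USD / POD_CAPACITY_TB
cost_per_pb = cost_per_tb * TB_PER_PB
vendor_ratio = VENDOR_COST_PER_PB / cost_per_pb

print(f"Cost per TB: ${cost_per_tb:,.2f}")
print(f"Cost per PB: ${cost_per_pb:,.0f}")
print(f"Vendor price is about {vendor_ratio:.0f}x the pod price")
```

Under these assumptions the pod price lands just above $117,000 per petabyte, in line with the headline.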

10 of 487 comments

  1. My plan comes to fruition! by elrous0 · · Score: 5, Informative

    Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!

    And how big is a petabyte, you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45-minute episodes at DVD-quality MPEG-2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.
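
The episode math above can be checked with a rough sketch. The comment does not state its inputs, so the average bitrate of 5 Mbit/s and the schedule of 260 episodes per year (five per week) are assumptions chosen here, as is the decimal petabyte.

```python
# Assumptions (not stated in the comment above):
BITRATE_BITS_PER_SEC = 5_000_000   # assumed average DVD-quality MPEG-2 bitrate
EPISODES_PER_YEAR = 260            # assumed five episodes per week
PB_BYTES = 10**15                  # assumed decimal petabyte

# Figures from the comment:
EPISODE_MINUTES = 45
CLAIMED_EPISODES = 550_000
DEBUT_YEAR = 1963

bytes_per_episode = EPISODE_MINUTES * 60 * BITRATE_BITS_PER_SEC // 8
episodes_per_pb = PB_BYTES // bytes_per_episode

# How many years of broadcasts the claimed episode count would cover.
years_covered = CLAIMED_EPISODES // EPISODES_PER_YEAR
final_year = DEBUT_YEAR + years_covered

print(f"Bytes per episode: {bytes_per_episode:,}")
print(f"Episodes per petabyte: {episodes_per_pb:,}")
print(f"Archive would run until about {final_year}")
```

With these inputs a petabyte holds a bit more than the 550,000 episodes claimed, and 550,000 episodes at 260 per year reaches the year 4078.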

    --
    SJW: Someone who has run out of real oppression, and has to fake it.
  2. Re:My math is a bit rusty... by Desler · · Score: 5, Informative
    It's not your math that's rusty, it's your reading skills.

    Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867.

  3. Re:A Very Shortsighted Article by SatanicPuppy · · Score: 4, Informative

    The focus of the article was only on the hardware, which was extremely low cost to the point of allowing massive redundancy...This is not an inherently flawed methodology.

    If you can deploy cheap 67 terabyte nodes, then you can treat each node like an individual drive, and swap them out accordingly.

    I'd need some actual uptime data to make a real judgment on their service vs their competitors, but I don't see any inherent flaws in building their own servers.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  4. Re:A Very Shortsighted Article by staeiou · · Score: 4, Informative

    We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.

    They actually do talk about that in the article. The difference in cost between a petabyte of their homegrown pods and the cheapest supplier (Dell) is about $700,000. The difference between their pods and cloud services is over $2.7 million per petabyte. And they have many, many petabytes. Even if you do add "a few hundred thousand a year for the people who need to maintain this hardware" - and Dell isn't going to come down in the middle of the night when your power goes out - they are still way, way on top.

    I know you don't pay premiums because you're stupid. But think about how much those premiums are actually costing you, what you are getting in return, and if it is worth it.

  5. Re:A Very Shortsighted Article by Tx · · Score: 4, Informative

    We don't pay premiums because we're stupid. We pay premiums because we're lazy.

    There, fixed that for you ;).

    Ok, that was glib, but you do seem to have been too lazy to read the article, so perhaps you deserve it. To quote TFA, "Even including the surrounding costs - such as electricity, bandwidth, space rental, and IT administrators' salaries - Backblaze spends one-tenth of the price in comparison to using Amazon S3, Dell Servers, NetApp Filers, or an EMC SAN." So they aren't ignoring the costs of IT staff administering this stuff as you imply, they're telling you the costs including the admin costs at their datacentre.

    --
    Oh no... it's the future.
  6. Re:they are missing hardware mgmt by SatanicPuppy · · Score: 5, Informative

    This sort of attitude is how Sun got its lunch eaten in the market in the first place.

    Yes, your hardware rocks. It's so fucking sexy I need new pants when I come into contact with it.

    It also costs more than a fucking Italian sports car.

    Turns out that if your awesome hardware is 10 times better than commodity hardware, but also 25 times as expensive, people are just going to buy more commodity hardware.

    I've got some Sun data appliances and I've got some Dell data appliances, and the only difference I've seen between them is purely one of cost. The only thing that ever breaks is drives.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
  7. Re:A Very Shortsighted Article by ianpatt · · Score: 3, Informative

    From the credits list: "Protocase for putting up with hundreds of small 3-D case design tweaks", which I assume is http://www.protocase.com/.

  8. Re:are you a project manager by any chance? by pyite · · Score: 5, Informative

    are you a project manager by any chance?

    Of course not. A project manager would look at this and go, "wow, we saved a lot of money!" It's pretty simple. ZFS does what most other filesystems do not; it guarantees data integrity at the block level by the use of checksums. When you're dealing with this many spindles and dense, non-enterprise drives, you are virtually guaranteed to get silent corruption. The article does not once mention any of the words corrupt.*, checksum, or integrity. The server doesn't use ECC RAM. The project, while well intentioned, should scare the crap out of anyone thinking about storing data with this company.
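
To illustrate the general idea of block-level checksums catching silent corruption, here is a minimal sketch. It is not how ZFS is implemented: the 4 KiB block size, the use of SHA-256, and all function names are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration


def split_blocks(data: bytes) -> list[bytes]:
    """Split data into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def checksum(block: bytes) -> str:
    """Checksum of one block (SHA-256 here, as an illustrative choice)."""
    return hashlib.sha256(block).hexdigest()


def find_corrupt_blocks(blocks: list[bytes], checksums: list[str]) -> list[int]:
    """Return indices of blocks whose contents no longer match their checksum."""
    return [i for i, (b, c) in enumerate(zip(blocks, checksums)) if checksum(b) != c]


# "Write" 16 KiB of data and record a checksum per block.
data = bytes(range(256)) * 64
blocks = split_blocks(data)
checksums = [checksum(b) for b in blocks]

# Simulate silent corruption: flip a single bit in block 2.
damaged = bytearray(blocks[2])
damaged[100] ^= 0x01
blocks[2] = bytes(damaged)

corrupt = find_corrupt_blocks(blocks, checksums)
print("Corrupt blocks:", corrupt)
```

Without a stored checksum, the flipped bit would be returned to the reader as if it were valid data. With one, the mismatch is detectable on read, and a redundant copy (if one exists) can be used instead.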

    --

    "Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman

  9. Re:Not ZFS? by FoolishBluntman · · Score: 3, Informative

    >That is scary as hell. You didn't know the drive failed??? Why?? How the heck did they know? Do you really provide them access to your data 24/7?? That's crazy!

    No, moron, high-end disk arrays "phone home" either by dedicated phone line or email when a disk failure occurs. The disk array immediately starts rebuilding a RAID set using a hot spare. The disk you receive in the mail or from an on-site call is to replace the failed drive. They don't need access to your data, just the status of the array subsystem.

    >The biggest argument against the large storage companies, is that large, dynamic companies don't use them. Amazon doesn't. Google doesn't. Facebook doesn't.

    The only company in your list that doesn't use a large storage company is Google. Most companies don't have the in-house expertise to keep track of their data. They outsource a lot of the work so they can concentrate on their core business.

  10. Re:A Very Shortsighted Article by sholto · · Score: 3, Informative

    I'd need some actual uptime data to make a real judgment on their service vs their competitors,

    I did an extensive interview with the Backblaze CEO. No hard data on uptime, but he says they lose one drive a week from the whole 1.5 petabyte system and have never had a pod fail. They've been running for a year. Here's the link to the story, which also has comments about the design and testing process: http://www.crn.com.au/News/154760,want-a-petabyte-for-under-us120000.aspx
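
The "one drive a week" figure can be turned into a rough annualized failure rate. This is only a sketch: the thread does not say how many drives are in the system, so the 45 drives per 67 TB pod used below is an assumption, and the result is an estimate rather than measured data.

```python
# Figures from the comment above.
SYSTEM_CAPACITY_TB = 1_500      # 1.5 petabytes, assuming 1 PB = 1,000 TB
FAILURES_PER_WEEK = 1

# Figure from the story summary.
POD_CAPACITY_TB = 67

# Assumption (not stated in this thread): drives per pod.
DRIVES_PER_POD = 45

pods = SYSTEM_CAPACITY_TB / POD_CAPACITY_TB
total_drives = pods * DRIVES_PER_POD
failures_per_year = FAILURES_PER_WEEK * 52
annual_failure_rate = failures_per_year / total_drives

print(f"Approximate pods: {pods:.1f}")
print(f"Approximate drives: {total_drives:.0f}")
print(f"Estimated annualized failure rate: {annual_failure_rate:.1%}")
```

Under these assumptions the system has on the order of a thousand drives, so one failure a week corresponds to an annualized failure rate of roughly 5 percent.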