Build Your Own $2.8M Petabyte Disk Array For $117k
Chris Pirazzi writes "Online backup startup BackBlaze, disgusted with the outrageously overpriced offerings from EMC, NetApp and the like, has released an open-source hardware design showing you how to build a 4U, RAID-capable, rack-mounted, Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867. This works out to roughly $117,000 per petabyte, which would cost you around $2.8 million from Amazon or EMC. They have a full parts list and diagrams showing how they put everything together. Their blog states: 'Our hope is that by sharing, others can benefit and, ultimately, refine this concept and send improvements back to us.'"
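For anyone who wants to check the headline arithmetic, here is a minimal sketch. It uses only the figures quoted in the summary ($7,867 per pod, 67 TB per pod, roughly $2.8 million per petabyte from the big vendors). The decimal petabyte (1,000 TB) and all the names are my own assumptions, and it covers raw hardware only.

```python
import math

# Figures quoted in the summary above.
POD_COST_USD = 7_867
POD_CAPACITY_TB = 67
VENDOR_COST_PER_PB_USD = 2_800_000

# Assumption: a decimal petabyte (1 PB = 1,000 TB).
TB_PER_PB = 1_000


def cost_per_petabyte(pod_cost_usd, pod_capacity_tb):
    """Pro-rated hardware cost of one petabyte, ignoring whole-pod rounding."""
    return pod_cost_usd / pod_capacity_tb * TB_PER_PB


def pods_needed(target_tb, pod_capacity_tb):
    """Whole pods required to reach at least target_tb of raw capacity."""
    return math.ceil(target_tb / pod_capacity_tb)


per_pb = cost_per_petabyte(POD_COST_USD, POD_CAPACITY_TB)
whole_pods = pods_needed(TB_PER_PB, POD_CAPACITY_TB)
whole_pod_cost = whole_pods * POD_COST_USD
vendor_ratio = VENDOR_COST_PER_PB_USD / per_pb

print(f"pro-rated cost per PB: ${per_pb:,.0f}")
print(f"whole pods for 1 PB:   {whole_pods} (${whole_pod_cost:,})")
print(f"vendor price is about {vendor_ratio:.1f}x the raw hardware cost")
```

Buying whole pods nudges the number up slightly, since you cannot buy a fraction of a chassis.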
Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!
And how big is a petabyte, you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45-minute episodes at DVD-quality MPEG-2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.
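The numbers above hang together, as far as I can tell. This is a rough sketch that works backwards from the 550,000-episode figure instead of assuming a bitrate, since the comment does not state one. The decimal petabyte (10^15 bytes) and the variable names are my assumptions.

```python
# Figures taken from the comment above.
EPISODES_ON_ONE_PB = 550_000
EPISODE_MINUTES = 45
FIRST_YEAR = 1963
LAST_YEAR = 4078

# Assumption: a decimal petabyte.
BYTES_PER_PB = 10**15

# Storage per episode and the average bitrate that figure implies.
bytes_per_episode = BYTES_PER_PB / EPISODES_ON_ONE_PB
episode_seconds = EPISODE_MINUTES * 60
implied_mbps = bytes_per_episode * 8 / episode_seconds / 1e6

# Airing rate implied by filling the server between 1963 and 4078.
episodes_per_year = EPISODES_ON_ONE_PB / (LAST_YEAR - FIRST_YEAR)

# Forward check: at 260 episodes a year (5 a week), when is the server full?
full_year = int(FIRST_YEAR + EPISODES_ON_ONE_PB / 260)

print(f"{bytes_per_episode / 1e9:.2f} GB per episode")
print(f"{implied_mbps:.2f} Mbit/s implied average bitrate")
print(f"{episodes_per_year:.1f} episodes per year, full in {full_year}")
```

The implied bitrate sits in the range DVD video normally uses, and the implied airing rate is about five episodes a week.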
SJW: Someone who has run out of real oppression, and has to fake it.
Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867.
But when we priced various off-the-shelf solutions, the cost was 10 times as much (or more) than the raw hard drives.
Um... and what do you plan on running these disks with? HDs don't magically store and retrieve data on their own. The HDs are cheap compared to the other parts that create a storage system. That's like saying a Ferrari is a ripoff because you can buy an engine for $3,000.
The point is that the prices of services like Amazon or NetApp include the costs of support, server maintenance, upgrades, etc. Comparing them to just the bare minimum price for this company to construct its server is highly misleading.
Yeah, this only works if you're the geeks building the hardware to begin with. The real cost is in setup and maintenance. Plus, if the shit hits the fan, the CxO is going to want to find some big butts to kick. 67TB of data is a lot to lose (though it's only about 35 disks at max cap these days).
These guys, however, happen to be the geeks, the maintainers, and the people-whose-butts-get-kicked-anyway all at once. This is not a project for a one- or two-man IT group that has to build a storage array for their 100-200 person firm. These guys are storage professionals with the hardware and software know-how to pull it off. Kudos to them for making it and sharing their project. It's a nice, compact system. It's a little bit of a shame that there isn't OTS software, but at this level you're going to be doing grunt work on it with experts anyway.
FWIW, Lime Technology (lime-technology.com) will sell you a case, drive trays, and software for a quasi-RAID system that will hold 28TB for under $1500 (not including the 15 2TB drives, another $3k on the open market). This is only one-fault tolerant (though failure is more graceful than with a traditional RAID). I don't know if they've implemented hot spares or automatic failover yet (which would put them up to 2 fault tolerant on the drives, like RAID6).
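A quick sanity check on those numbers, as a sketch only. It assumes one drive's worth of capacity goes to parity (which is what makes 15 x 2TB come out to 28TB), treats "under $1500" as an upper bound of $1,500, and compares against the 67TB / $7,867 figures from the summary. The function names are made up for illustration.

```python
def usable_tb(drive_count, drive_tb, parity_drives=1):
    """Usable capacity when parity_drives drives' worth of space holds parity."""
    return (drive_count - parity_drives) * drive_tb


def cost_per_tb(total_cost_usd, capacity_tb):
    """Hardware dollars per terabyte."""
    return total_cost_usd / capacity_tb


# Figures from the comment above ("under $1500" taken as an upper bound).
LIME_ENCLOSURE_USD = 1_500
LIME_DRIVES_USD = 3_000

lime_usable = usable_tb(drive_count=15, drive_tb=2)
lime_per_tb = cost_per_tb(LIME_ENCLOSURE_USD + LIME_DRIVES_USD, lime_usable)

# Figures from the story summary; 67 TB is the pod's stated capacity,
# so this is not a strict usable-vs-usable comparison.
pod_per_tb = cost_per_tb(7_867, 67)

print(f"Lime box: {lime_usable} TB usable, ${lime_per_tb:.2f}/TB")
print(f"Pod:      67 TB stated,  ${pod_per_tb:.2f}/TB")
```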
Is it just my observation, or are there way too many stupid people in the world?
For the 2.683M difference, that support better come with a "happy ending" for the entire staff...
where's the extensive stuff that sun (I work at sun, btw; related to storage) and others have for management? voltages, fan-flow, temperature points at various places inside the chassis, an 'ok to remove' led and button for the drives, redundant power supplies that hot-swap and drives that truly hot-swap (including presence sensors in drive bays). none of that is here. and these days, sas is the preferred drive tech for mission critical apps. very few customers use sata for anything 'real' (it seems, even though I personally like sata).
this is not enterprise quality no matter what this guy says.
there's a reason you pay a lot more for enterprise vendor solutions.
personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking it's BETTER than sun, hp, ibm and so on.
--
"It is now safe to switch off your computer."
Depends on how it works. Hopefully (or ideally) it's more like the google approach - build it to maintain data redundancy, initially with X% overcapacity. As disks fail, what do you do then? Nothing. When it gets down to 80% or so of original capacity (or however much redundancy you designed in), you chuck it and buy a new one. By then the tech is outdated anyways.
And backup, redundancy, hosting, cooling etc etc. The $117,000 cost quoted here is for raw hardware only.
These guys build their own hardware, think it might be able to be improved on or help the community, and they release the specs, for free, on the Internet. They then get jumped on by people saying "bbbb-but support!". They're not pretending to offer support. If you want support, pay the 2MM for EMC; if you can handle your own support in-house, maybe you can get away with building these out.
It's like looking at KDE and saying "But we pay Apple and Microsoft so we get support" (even though, no, you don't). The company is just releasing specs. If it fits in your environment, great; if not, bummer. If you can make improvements and send them back upstream, everyone wins. Just like software.
I seem to recall similar threads whenever anyone mentions open routers from the Cisco folks.
I like music
Backup: depends on the backup strategy. I could make this happen for less than an additional 10%. But ok, point taken.
Redundancy: You mean as in plain redundancy? These are RAID arrays, are they not? You want redundancy at the server level? Now you're increasing the scope of the project, which the article doesn't address. (Scope error)
Hosting: Again, the point of the article was the hardware. That's a little like accounting for the cost of a trip to your grandmother's, and factoring in the cost of your grandmother's house. A little out of scope.
Cooling: I could probably get the whole project chilled for less than 6% of the total cost, depending on how cool you want the rig to run.
I think you're looking for a wrench in the works where none exists.
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
Redundancy can be had for another $117,000.
Hosting in a DC will not even be a blip in the difference between that and $2.7m.
EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.
I hate printers.
Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?
Well... yes and no. When you've worked with high-end arrays, you learn that storage is only the beginning. NetApp and EMC provide far, far more. I was damned impressed when I first heard a presentation from NetApp about their technology, but the day that they called me up and told me that the replacement disk was in the mail and I answered, "I had a failure?" ... that was the day that I understood what data reliability was all about.
Since that time (over 10 years ago), the state of the art has improved over and over again. If you're buying a petabyte of storage, it's because you have a need that breaks most basic storage models, and the average sysadmin who thinks that storage is cheap is going to go through a lot of pain learning that he's wrong.
Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity. Until then, there are demands placed on you when you administrate that much storage which demand a very different class of device than a Linux box with a bunch of raid cards.
As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
Just make sure the wife doesn't catch you unit testing the outsourced part.
are you a project manager by any chance?
Of course not. A project manager would look at this and go, "wow, we saved a lot of money!" It's pretty simple. ZFS does what most other filesystems do not: it verifies data integrity at the block level by the use of checksums. When you're dealing with this many spindles and dense, non-enterprise drives, you are virtually guaranteed to get silent corruption. The article does not once mention any of the words corrupt.*, checksum, or integrity. The server doesn't use ECC RAM. The project, while well intentioned, should scare the crap out of anyone thinking about storing data with this company.
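To make the checksum point concrete, here is a toy sketch of block-level checksumming. It is not how ZFS is implemented (ZFS has its own on-disk format and checksum algorithms, none of which is modelled here); the class, the method names and the use of SHA-256 are all assumptions for illustration. The idea is only that a checksum recorded at write time lets a later read notice a bit that flipped silently.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when a block no longer matches the checksum stored at write time."""


class ChecksummedStore:
    """Toy in-memory block store that verifies every read (hypothetical)."""

    def __init__(self):
        self._blocks = {}  # block number -> raw bytes
        self._sums = {}    # block number -> SHA-256 hex digest

    def write(self, block_no, data):
        # Record the data together with a checksum computed at write time.
        self._blocks[block_no] = bytes(data)
        self._sums[block_no] = hashlib.sha256(data).hexdigest()

    def read(self, block_no):
        # Recompute the checksum and compare before handing data back.
        data = self._blocks[block_no]
        if hashlib.sha256(data).hexdigest() != self._sums[block_no]:
            raise ChecksumMismatch(f"block {block_no} failed verification")
        return data

    def flip_bit(self, block_no, bit_index):
        # Simulate silent corruption: the medium changes, the checksum does not.
        buf = bytearray(self._blocks[block_no])
        buf[bit_index // 8] ^= 1 << (bit_index % 8)
        self._blocks[block_no] = bytes(buf)


store = ChecksummedStore()
store.write(0, b"every episode of General Hospital")
clean = store.read(0)

store.flip_bit(0, bit_index=3)
try:
    store.read(0)
    detected = False
except ChecksumMismatch:
    detected = True

print("corruption detected:", detected)
```

A filesystem without this kind of check would simply return the damaged block, and on a plain RAID set nobody would know until the data was needed.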
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman