Build Your Own $2.8M Petabyte Disk Array For $117k
Chris Pirazzi writes "Online backup startup BackBlaze, disgusted with the outrageously overpriced offerings from EMC, NetApp and the like, has released an open-source hardware design showing you how to build a 4U, RAID-capable, rack-mounted, Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867. This works out to roughly $117,000 per petabyte, which would cost you around $2.8 million from Amazon or EMC. They have a full parts list and diagrams showing how they put everything together. Their blog states: 'Our hope is that by sharing, others can benefit and, ultimately, refine this concept and send improvements back to us.'"
Good luck with all the silent data corruption. Shoulda used ZFS.
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
Support.
Hail Eris, full of mischief...
E pluribus sanguinem
Before realizing that we had to solve this storage problem ourselves, we considered Amazon S3, Dell or Sun Servers, NetApp Filers, EMC SAN, etc. As we investigated these traditional off-the-shelf solutions, we became increasingly disillusioned by the expense. When you strip away the marketing terms and fancy logos from any storage solution, data ends up on a hard drive.
That's odd, where I work we pay a premium for what happens when the power goes out, what happens with a drive goes bad, what happens when maintenance needs to be performed, what happens when the infrastructure needs upgrades, etc. This article left out a lot of buzzwords but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?
You might as well add a few hundred thousand a year for the people who need to maintain this hardware and also someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.
My work here is dung.
Looks like a cheap downscale undersized version of a Sun X4500/X4540.
And as others have pointed out, you pay a vender because in 4 years they will still be stocking the drives you bought today, where as for this setup you will be praying they are still on ebay
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
AHhh, this is why the EMC guy committed suicide. It wasn't because he was dying of cancer.
Trolling is a art,
Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!
And how big is a petabyte you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45 minute episodes at DVD quality mpeg2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.
SJW: Someone who has run out of real oppression, and has to fake it.
How do you replace disks in the chassis? We've got 1,000 spinning disks and we've got a few failures a month. With 45 disks in each unit you are going to have to replace a few consumer grade drives.
Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867.
But when we priced various off-the-shelf solutions, the cost was 10 times as much (or more) than the raw hard drives.
Um..and what do you plan on running these disks with? HD's don't magically store and retreive data on their own. The HD's are cheap compared to the other parts that create a storage system. That's like saying a Ferrari is a ripoff because you can buy an engine for $3,000.
You misread. It's $7,867 per 67 terabytes. So at the hard disk standard for a petabyte (base 10, not base 2), 1000 TB == 1 PB:
(1000 TB / 67 TB) * $7,867 = $117417.91
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
If you check out what the company does, they are an online backup company. They don't host servers on this array, just backup data from your desktop. They just need massive amounts of space which they make redundant.
I love free shipping, even if it costs me more !! I like FREE STUFF !!
They designed and built it so they should know how to support it. If someone else builds one, just learning how to get that beast up and running is excellent hands on training.
Yeah, this only works if your the geeks building the hardware to begin with. The real cost is in setup and maintenance. Plus, if the shit hits the fan, the CxO is going to want to find some big butts to kick. 67TB of data is a lot to lose (though it's only about 35 disks at max cap these days).
These guys, however, happen to be both the geeks, the maintainers, and the people-whos-butts-get-kicked-anyway. This is not a project for a one or two man IT group that has to build a storage array for their 100-200 person firm. These guys are storage professionals with the hardware and software know how to pull it off. Kudos to them for making it and sharing their project. It's a nice, compact system. It's a little bit of a shame that there isn't OTS software, but at this level you're going to be doing grunt work on it with experts anyway.
FWIW, Lime Technology (lime-technology.com) will sell you a case, drive trays, and software for a quasi-RAID system that will hold 28TB for under $1500 (not including the 15 2TB drives - another $3k on the open market). This is only one fault tolerant, though failure is more graceful than a traditional RAID). I don't know if they've implemented hot spares or automatic failover yet (which would put them up to 2 fault tolerant on the drives, like RAID6).
Is it just my observation, or are there way too many stupid people in the world?
where's the extensive stuff that sun (I work at sun, btw; related to storage) and others have for management? voltages, fan-flow, temperature points at various places inside the chassis, an 'ok to remove' led and button for the drives, redundant power supplies that hot-swap and drives that truly hot-swap (including presence sensors in drive bays). none of that is here. and these days, sas is the preferred drive tech for mission critical apps. very few customers use sata for anything 'real' (it seems, even though I personally like sata).
this is not enterprise quality no matter what this guy says.
there's a reason you pay a lot more for enterprise vendor solutions.
personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.
--
"It is now safe to switch off your computer."
Reliant Technology sells you NetApp FAS 6040 for $78,500 with a maximum capacity of 840 drives, without the hard drive (source: Google Shopping). If you buy FAS 6040 with the drives, most vendors will use more expensive and less capacity 15k rpm drives instead of the 7200rpm drives the BlackBlaze Pod uses, and this makes up a lot of the price difference. The point is, you could buy NetApp and install it yourself with cheap off-the-shelf consumer drives and end up spending about the same magnitude amount of money. I estimate that NetApp would cost just 1.5x the amount.
NetApp FAS 6040 at $78,500 + 840 x 1.5TB drives at $120 each = $179,300 which gives you 1.26PB. Cost per petabyte is $142,500, only slightly more expensive than BlackBlaze $117,000 from the article. The real story is that BlackBlaze is able to show a competitive edge of $30,000, or being 20% cheaper.
I once had a signature.
and save $2,799,720.
If an article went up describing how a major vendor released a petabyte array for $2M the comments would full of people saying "I could make an array with that much storage far cheaper!"
Now someone has gone and done exactly that (they even used linuxto do it) and suddenly everyone complains that it lacks support from a major vendor.
This may not be perfect for everyones needs, but it's nice to see this sort of innovation taking place instead of blindy following the same path everyone else takes for storage.
These guys build their own hardware, think it might be able to be improved on or help the community, and they release the specs, for free, on the Internet. They then get jumped on by people saying "bbbb-but support!". They're not pretending to offer support, if you want support, pay the 2MM for EMC, if you can handle your own support in-house, maybe you can get away with building these out.
It's like looking at KDE and saying "But we pay Apple and Microsoft so we get support" (even though, no you don't). The company is just releasing specs, if it fits in your environment, great, if not, bummer. If you can make improvements and send them back up-stream, everyone wins. Just like software.
I seem to recall similar threads whenever anyone mentions open routers from the Cisco folks.
I like music
If you build a petabyte stack using 1.5TB disks you need about 800 drives including RAID overhead. With an MTBF for consumer drives of 500,000 hours, a drive will fail roughly every 10-15 days, if your design is good and you create no hotspots/vibration issues.
Rebuild times on large RAID sets are such that it is only a matter of time before they run a double drive failure and lose their customers data. The money they saved by going cheap will be spent on lawyers when they get the liability claims in.
If you RTFA, you will see that they are using RAID6 with 2 parity drives per raid, so a double drive failure can be handled, and it is only the less likely triple drive failure that will ruin them. It seems weak that they don't have hot-swappable drives in this configuration, but they have software that is managing the data across disk sets, and presumably they have redundant copies of data that keep the data accessible when one of their servers is taken down to replace a drive (if they don't, the downtimes due to replacing drives will make the service useless). This redundancy may also save them in the case that they actually lose a RAID set.
they used incredibly cheep-ass HBA's for no good reason.
In their defence:
A note about SATA chipsets: Each of the port multiplier backplanes has a Silicon Image SiI3726 chip so that five drives can be attached to one SATA port. Each of the SYBA two-port PCIe SATA cards has a Silicon Image SiI3132, and the four-port PCI Addonics card has a Silicon Image SiI3124 chip. We use only three of the four available ports on the Addonics card because we have only nine backplanes. We don't use the SATA ports on the motherboard because, despite Intel's claims of port multiplier support in their ICH10 south bridge, we noticed strange results in our performance tests. Silicon Image pioneered port multiplier technology, and their chips work best together.
where I work we pay a premium for what happens when the power goes out, what happens with a drive goes bad,
Whomever spec'd your systems should have accommodated obvious failures like this. As in, paying for colo, using servers with dual power supplies that fail over, sensible RAID strategy. Giving money to EMC in this situation is not sensible.
but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?
I have a couple of hundred drives going at any one time and I get an SNMP alert when a drive goes bad. I take one out of the closet and destroy the broken one. The RAID does the rest.
someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
Our storage strategy is N+1 all the way and required to be online 24/7 so failures are part of the plan. They are probably part of the plan at this startup.
We pay premiums so we can relax and concentrate on what we need to concentrate on.
I don't understand this. If your job is 89% software dev, then EMC may be the way to go. Expensive! But, it makes a little business sense. If you aren't spending most of your time writing software that adds value to your service/product, then EMC is doing your job and you are some kind of TPS generator. Do you pay a premium to blame someone else? I've had the opportunity to work in places like this and I've always passed because of the veiled contempt for IT.
Please, explain this to me.
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html
I like how you dismiss a detailed real world design example based simply on a claimed feature without any further substantiation. Very classy. I'm not saying you are wrong, but would it kill you to go into a little more detail about why these folks need "luck" when they are clearly very successful with their existing design?
STFU about slashdot bias.
The real value in a data storage system isn't in the hardware, it's in the data. And the real cost incurred in a data storage system is measured in the inability of the customer to access that data quickly, efficiently and (in the case of a disaster) at all.
If you need to crunch the data quickly, a higher-performing system is going to save you money in the end. Look at all the benchmarks: no home-grown systems are anywhere on the lists. If you want to stream through your data at several gigabytes per second, you need to pay for a fast interconnect. Putting 45 drives behind a single 1GbE just doesn't cut it.
Similarly, if you want to ensure that the data is protected (integrity, immutable storage for folks who need to preserve data and be certain it hasn't been tampered with, etc) and stored efficiently (single instance store, or dedupe, so you don't fill your petabytes of disks with a bajillion copies of the same photos of Anna Kournakova) then you need to pay for the extra goodness in that software and hardware as well.
Finally, if you want extremely high availability, then the cost of the hardware is miniscule compared to the cost of downtime. We had customers that would lose millions of dollars per service interruption. They're willing to pay a million dollars to eliminate or even reduce downtime.
These folks are essentially just building a box that makes a bunch of disks behave like a honking big tape drive. It's a viable business--that's all some folks need. But EMC et al are not going to lose any sleep over this.
Am I part of the core demographic for Swedish Fish?
How about reading the section "A Backblaze Storage Pod is a Building Block".
<snip> the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post). When you run a datacenter with thousands of hard drives, CPUs, motherboards, and power supplies, you are going to have hardware failures — it's irrefutable. Backblaze Storage Pods are building blocks upon which a larger system can be organized that doesn't allow for a single point of failure. Each pod in itself is just a big chunk of raw storage for an inexpensive price; it is not a "solution" in itself.
Emphasis mine. I believe there are quite a few successful and reliable storage vendors not using ZFS. We get the point, you like it. Doesn't mean you can't succeed without it. Be more open minded.
I'm sorry if I haven't offended anyone