Build Your Own $2.8M Petabyte Disk Array For $117k
Chris Pirazzi writes "Online backup startup BackBlaze, disgusted with the outrageously overpriced offerings from EMC, NetApp and the like, has released an open-source hardware design showing you how to build a 4U, RAID-capable, rack-mounted, Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867. This works out to roughly $117,000 per petabyte, which would cost you around $2.8 million from Amazon or EMC. They have a full parts list and diagrams showing how they put everything together. Their blog states: 'Our hope is that by sharing, others can benefit and, ultimately, refine this concept and send improvements back to us.'"
Support.
Hail Eris, full of mischief...
E pluribus sanguinem
Before realizing that we had to solve this storage problem ourselves, we considered Amazon S3, Dell or Sun Servers, NetApp Filers, EMC SAN, etc. As we investigated these traditional off-the-shelf solutions, we became increasingly disillusioned by the expense. When you strip away the marketing terms and fancy logos from any storage solution, data ends up on a hard drive.
That's odd, where I work we pay a premium for what happens when the power goes out, what happens with a drive goes bad, what happens when maintenance needs to be performed, what happens when the infrastructure needs upgrades, etc. This article left out a lot of buzzwords but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?
You might as well add a few hundred thousand a year for the people who need to maintain this hardware and also someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.
My work here is dung.
Looks like a cheap downscale undersized version of a Sun X4500/X4540.
And as others have pointed out, you pay a vender because in 4 years they will still be stocking the drives you bought today, where as for this setup you will be praying they are still on ebay
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
AHhh, this is why the EMC guy committed suicide. It wasn't because he was dying of cancer.
Trolling is a art,
Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!
And how big is a petabyte you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45 minute episodes at DVD quality mpeg2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.
SJW: Someone who has run out of real oppression, and has to fake it.
How do you replace disks in the chassis? We've got 1,000 spinning disks and we've got a few failures a month. With 45 disks in each unit you are going to have to replace a few consumer grade drives.
Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867.
But when we priced various off-the-shelf solutions, the cost was 10 times as much (or more) than the raw hard drives.
Um..and what do you plan on running these disks with? HD's don't magically store and retreive data on their own. The HD's are cheap compared to the other parts that create a storage system. That's like saying a Ferrari is a ripoff because you can buy an engine for $3,000.
Yeah, this only works if your the geeks building the hardware to begin with. The real cost is in setup and maintenance. Plus, if the shit hits the fan, the CxO is going to want to find some big butts to kick. 67TB of data is a lot to lose (though it's only about 35 disks at max cap these days).
These guys, however, happen to be both the geeks, the maintainers, and the people-whos-butts-get-kicked-anyway. This is not a project for a one or two man IT group that has to build a storage array for their 100-200 person firm. These guys are storage professionals with the hardware and software know how to pull it off. Kudos to them for making it and sharing their project. It's a nice, compact system. It's a little bit of a shame that there isn't OTS software, but at this level you're going to be doing grunt work on it with experts anyway.
FWIW, Lime Technology (lime-technology.com) will sell you a case, drive trays, and software for a quasi-RAID system that will hold 28TB for under $1500 (not including the 15 2TB drives - another $3k on the open market). This is only one fault tolerant, though failure is more graceful than a traditional RAID). I don't know if they've implemented hot spares or automatic failover yet (which would put them up to 2 fault tolerant on the drives, like RAID6).
Is it just my observation, or are there way too many stupid people in the world?
where's the extensive stuff that sun (I work at sun, btw; related to storage) and others have for management? voltages, fan-flow, temperature points at various places inside the chassis, an 'ok to remove' led and button for the drives, redundant power supplies that hot-swap and drives that truly hot-swap (including presence sensors in drive bays). none of that is here. and these days, sas is the preferred drive tech for mission critical apps. very few customers use sata for anything 'real' (it seems, even though I personally like sata).
this is not enterprise quality no matter what this guy says.
there's a reason you pay a lot more for enterprise vendor solutions.
personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.
--
"It is now safe to switch off your computer."
Get both Debian and ZFS.. Nexenta. Links in my sig.
http://dilemma.gulecha.org - My philospohical short film.
If an article went up describing how a major vendor released a petabyte array for $2M the comments would full of people saying "I could make an array with that much storage far cheaper!"
Now someone has gone and done exactly that (they even used linuxto do it) and suddenly everyone complains that it lacks support from a major vendor.
This may not be perfect for everyones needs, but it's nice to see this sort of innovation taking place instead of blindy following the same path everyone else takes for storage.
These guys build their own hardware, think it might be able to be improved on or help the community, and they release the specs, for free, on the Internet. They then get jumped on by people saying "bbbb-but support!". They're not pretending to offer support, if you want support, pay the 2MM for EMC, if you can handle your own support in-house, maybe you can get away with building these out.
It's like looking at KDE and saying "But we pay Apple and Microsoft so we get support" (even though, no you don't). The company is just releasing specs, if it fits in your environment, great, if not, bummer. If you can make improvements and send them back up-stream, everyone wins. Just like software.
I seem to recall similar threads whenever anyone mentions open routers from the Cisco folks.
I like music
What do you mean by more expensive? OpenSolaris with ZFS costs the same as Linux. And yes, You'll have to get up a lot less often in the middle of the night, since a few bad sectors aren't going to force a fail of the entire disk.
Disclaimer: Evolution comes with NO WARRANTY, except for the IMPLIED WARRANTY of FITNESS FOR A PARTICULAR PURPOSE.
Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?
Well... yes and no. When you've worked with high-end arrays, you learn that storage is only the beginning. NetApp and EMC provide far, far more. I was damned impressed when I first heard a presentation from NetApp about their technology, but the day that they called me up and told me that the replacement disk was in the mail and I answered, "I had a failure?" ... that was the day that I understood what data reliability was all about.
Since that time (over 10 years ago), the state of the art has improved over and over again. If you're buying a petabyte of storage, it's because you have a need that breaks most basic storage models, and the average sysadmin who thinks that storage is cheap is going to go through a lot of pain learning that he's wrong.
Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity. Until then, there are demands placed on you when you administrate that much storage which demand a very different class of device than a Linux box with a bunch of raid cards.
As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
I have worked in disk storage design. This was a very cool project. This looks like a promising start and in some ways represents the future of storage; COTS parts. Others have pointed out some areas of improvement, cooling and the like.
And I think I would use dual micro ATA motherboards, perhaps in their own cases to make them replaceable in case of failure.
I realize that the layout of the drives was done with an eye toward airflow, but I personally don't like to see drives set on their edges. It's probably a personal bias, but I like to see drives set flat. The bearings seem to last longer that way. Just my personal experience.
And, one final point, storage density is reaching the point where we can jam a lot of storage into a small space. Perhaps we have reached the point where we can start to spread things out and do things like put the drives in a separate enclosure or multiple enclosures. It makes designing, installing, and servicing easier. Use eSATA ports on the SATA cards to make external storage easier.
Best regards.
As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
Or they have been bought by one of the bigger storage companies.
I prefer rogues to imbeciles because they sometimes take a rest.
I like how you dismiss a detailed real world design example based simply on a claimed feature without any further substantiation. Very classy. I'm not saying you are wrong, but would it kill you to go into a little more detail about why these folks need "luck" when they are clearly very successful with their existing design?
STFU about slashdot bias.
I have news for you. The high end boxes from EMC, NetApp and the like have silent data corruption too!
>That is scary as hell. You didn't know the drive failed??? Why?? How the heck did they know? Do you really provide them access to your data 24/7?? That's crazy! No moron, high end disk arrays "phone home" either by dedicated phone line or email when a disk failure occurs. The disk array immediately starts rebuilding a RAID set using a hot spare. The disk you receive in the mail or from an on-site call is to replace the failed drive. They don't need access to your data, just the status of the array subsystem. >The biggest argument against the large storage companies, is that large, dynamic companies don't use them. Amazon doesn't. Google doesn't. Facebook doesn't. The only company in your list that doesn't use a large storage company is Google. Most companies don't have the in-house expertise to keep trace of their data. They out source a lot of the work so they can concentrate on their core business.
How about reading the section "A Backblaze Storage Pod is a Building Block".
<snip> the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post). When you run a datacenter with thousands of hard drives, CPUs, motherboards, and power supplies, you are going to have hardware failures — it's irrefutable. Backblaze Storage Pods are building blocks upon which a larger system can be organized that doesn't allow for a single point of failure. Each pod in itself is just a big chunk of raw storage for an inexpensive price; it is not a "solution" in itself.
Emphasis mine. I believe there are quite a few successful and reliable storage vendors not using ZFS. We get the point, you like it. Doesn't mean you can't succeed without it. Be more open minded.
I'm sorry if I haven't offended anyone
They're betting on the MTTF of the drives, on RAID, and on redundant system backups.
Yes, it's cheap hardware. Yes, cheap hardware fails more often than expensive hardware. Yes, cheap hardware is slower than expensive hardware. But you have to look at the offsets: they are building a backup service, where they don't need "instant" data access speeds. As for drive failures, I have some experience there. I have 57,000 cheap-ass consumer drives in service, and over 10,000 of them are 11 years old. They're dying at the rate of about ten failures per day. The key is to build your processes to tolerate and handle failures.
As long as your redundant systems are keeping copies of the data, and you understand exactly what the impact is of a failed component as well as have a recovery plan in place, why not use cheap hardware? Let's do a bit of math. The guy had a photo of himself standing behind about 18 of these boxes. That's 810 drives. If we lowball cheap drives at 300,000 hours MTBF, he'll see an average of two failures per month. It might take him $200 and an hour to recover each failed drive. We could keep doing the math on each component, but I suspect this is still a complete and total bargain that will meet his business needs very well.
It may not be as shiny as EMC or NetApp, and you have to do the legwork yourself, but why spend the extra money on a system that would provide him with "too much service"? From an ROI perspective, this guy is probably going to do very well, even though he may drive a few sysadmins crazy in the process.
John