Build Your Own $2.8M Petabyte Disk Array For $117k

Not ZFS? by pyite · 2009-09-02 02:10 · Score: 2, Insightful

Good luck with all the silent data corruption. Shoulda used ZFS.

--

"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman

Re:Not ZFS? by anilg · 2009-09-02 02:41 · Score: 4, Interesting

Get both Debian and ZFS.. Nexenta. Links in my sig.

--
http://dilemma.gulecha.org - My philosophical short film.
Re:Not ZFS? by Lord+Ender · 2009-09-02 02:59 · Score: 2, Insightful

Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?

--
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Re:Not ZFS? by chudnall · 2009-09-02 03:13 · Score: 3, Interesting

What do you mean by more expensive? OpenSolaris with ZFS costs the same as Linux. And yes, you'll have to get up a lot less often in the middle of the night, since a few bad sectors aren't going to force a fail of the entire disk.

--
Disclaimer: Evolution comes with NO WARRANTY, except for the IMPLIED WARRANTY of FITNESS FOR A PARTICULAR PURPOSE.
Re:Not ZFS? by ajs · 2009-09-02 03:15 · Score: 5, Interesting

Are you saying that with the more expensive system, disks never fail and nobody ever has to get up in the night?
Well... yes and no. When you've worked with high-end arrays, you learn that storage is only the beginning. NetApp and EMC provide far, far more. I was damned impressed when I first heard a presentation from NetApp about their technology, but the day that they called me up and told me that the replacement disk was in the mail and I answered, "I had a failure?" ... that was the day that I understood what data reliability was all about.
Since that time (over 10 years ago), the state of the art has improved over and over again. If you're buying a petabyte of storage, it's because you have a need that breaks most basic storage models, and the average sysadmin who thinks that storage is cheap is going to go through a lot of pain learning that he's wrong.
Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity. Until then, there are demands placed on you when you administrate that much storage which demand a very different class of device than a Linux box with a bunch of raid cards.
As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
Re:Not ZFS? by mollog · 2009-09-02 03:20 · Score: 4, Insightful

I have worked in disk storage design. This was a very cool project. This looks like a promising start and in some ways represents the future of storage; COTS parts. Others have pointed out some areas of improvement, cooling and the like.

And I think I would use dual micro ATX motherboards, perhaps in their own cases to make them replaceable in case of failure.

I realize that the layout of the drives was done with an eye toward airflow, but I personally don't like to see drives set on their edges. It's probably a personal bias, but I like to see drives set flat. The bearings seem to last longer that way. Just my personal experience.

And, one final point, storage density is reaching the point where we can jam a lot of storage into a small space. Perhaps we have reached the point where we can start to spread things out and do things like put the drives in a separate enclosure or multiple enclosures. It makes designing, installing, and servicing easier. Use eSATA ports on the SATA cards to make external storage easier.

--
Best regards.
Re:Not ZFS? by ImprovOmega · 2009-09-02 03:43 · Score: 2, Informative

I was damned impressed when I first heard a presentation from NetApp about their technology, but the day that they called me up and told me that the replacement disk was in the mail and I answered, "I had a failure?" ... that was the day that I understood what data reliability was all about.
Agreed. We've had similar experiences with HP EVA systems here at work, and it's wonderful =)

Someday, you'll have a petabyte disk in a 3.5" form-factor. At that point, you can treat it as a commodity.
As much as I want to believe this, I know that just as in the past the business will find a way to fill an array of such drives. They'll decide to do something silly like 24/7 recording of 1000 different cameras, or hourly snapshots of critical systems going back 3 months "just in case", or something. If you have seemingly unlimited amounts of cheap storage, the business *will* find a way to fill it.
Re:Not ZFS? by NotBornYesterday · 2009-09-02 04:05 · Score: 3, Insightful

As evidence of that, I submit that dozens of companies like the one in this article have existed over the years, and only a handful of them still exist. Those that still do have either exited the storage array business, or have evolved their offerings into something that costs a lot more to build and support than a pile of disks.
Or they have been bought by one of the bigger storage companies.

--
I prefer rogues to imbeciles because they sometimes take a rest.
Re:Not ZFS? by iphayd · 2009-09-02 05:28 · Score: 2, Interesting

On a similar note, they claim that they will back up any one computer for $5/month. Well, my one computer happens to be the backup node for my SAN, so they're going to need about 15 TB (it's a small SAN) to have 30 day backups for me. Please note that all of the files on my SAN are under 4 GB and I have a SAN, not a NAS, so my servers see it as a native hard drive.
Re:Not ZFS? by FoolishBluntman · 2009-09-02 05:40 · Score: 3, Interesting

I have news for you. The high end boxes from EMC, NetApp and the like have silent data corruption too!
Re:Not ZFS? by FoolishBluntman · 2009-09-02 05:50 · Score: 3, Informative

>That is scary as hell. You didn't know the drive failed??? Why?? How the heck did they know? Do you really provide them access to your data 24/7?? That's crazy!
No, moron, high-end disk arrays "phone home" either by dedicated phone line or email when a disk failure occurs. The disk array immediately starts rebuilding a RAID set using a hot spare. The disk you receive in the mail or from an on-site call is to replace the failed drive. They don't need access to your data, just the status of the array subsystem.
>The biggest argument against the large storage companies, is that large, dynamic companies don't use them. Amazon doesn't. Google doesn't. Facebook doesn't.
The only company in your list that doesn't use a large storage company is Google. Most companies don't have the in-house expertise to keep track of their data. They outsource a lot of the work so they can concentrate on their core business.
Re:Not ZFS? by anegg · 2009-09-02 06:04 · Score: 2, Informative

NetApp provides a function in the storage servers that they sell whereby significant events such as drive failures as well as general health check information can be sent to NetApp if you choose. The information is sent via e-mail or an HTTP POST (if I recall correctly). If you have support services, they monitor your installation via these messages, and will automatically send out a new drive if you have drive replacement services, for example. They do not have remote command access to your storage server (unless you chose to give them that by making the interface available outside of your firewall).
Re:Not ZFS? by iphayd · 2009-09-02 06:12 · Score: 2, Informative

So you are saying that they're happy to get their return on investment on their hardware alone in 44 years? I doubt it.
Re:Not ZFS? by rnturn · 2009-09-02 07:10 · Score: 2, Insightful

Because, you know, ZFS cures cancer and stops bad breath, too. Not to be too snarky but jeez... what did everybody do before ZFS came along?

--
CUR ALLOC 20195.....5804M
Re:Not ZFS? by plover · 2009-09-02 10:04 · Score: 3, Interesting

They're betting on the MTTF of the drives, on RAID, and on redundant system backups.
Yes, it's cheap hardware. Yes, cheap hardware fails more often than expensive hardware. Yes, cheap hardware is slower than expensive hardware. But you have to look at the offsets: they are building a backup service, where they don't need "instant" data access speeds. As for drive failures, I have some experience there. I have 57,000 cheap-ass consumer drives in service, and over 10,000 of them are 11 years old. They're dying at the rate of about ten failures per day. The key is to build your processes to tolerate and handle failures.
As long as your redundant systems are keeping copies of the data, and you understand exactly what the impact is of a failed component as well as have a recovery plan in place, why not use cheap hardware? Let's do a bit of math. The guy had a photo of himself standing behind about 18 of these boxes. That's 810 drives. If we lowball cheap drives at 300,000 hours MTBF, he'll see an average of two failures per month. It might take him $200 and an hour to recover each failed drive. We could keep doing the math on each component, but I suspect this is still a complete and total bargain that will meet his business needs very well.
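The failure estimate above can be checked with a few lines of arithmetic. This is only a sketch under stated assumptions: a constant failure rate (so expected failures are drive-hours divided by MTBF), 45 drives per box as described for the pod design, and the poster's own $200-per-failure guess. The function and variable names are made up for illustration.

```python
# Rough check of the "two failures per month" estimate.
# Assumes a constant failure rate, which real drives only approximate.

HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours in an average month


def expected_failures_per_month(drives: int, mtbf_hours: float) -> float:
    """Expected number of drive failures per month for a fleet of drives."""
    return drives * HOURS_PER_MONTH / mtbf_hours


boxes = 18                      # boxes visible in the photo, per the post
drives_per_box = 45             # assumption: the pod design's drive count
drives = boxes * drives_per_box  # 810 drives

failures = expected_failures_per_month(drives, mtbf_hours=300_000)
monthly_cost = failures * 200   # the poster's $200 per failed drive

print(f"{drives} drives, about {failures:.2f} failures per month")
print(f"roughly ${monthly_cost:.0f} per month in replacement cost")
```

Plugging in a more pessimistic MTBF shows how quickly the pager load grows, which is the number that matters when planning staffing.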
It may not be as shiny as EMC or NetApp, and you have to do the legwork yourself, but why spend the extra money on a system that would provide him with "too much service"? From an ROI perspective, this guy is probably going to do very well, even though he may drive a few sysadmins crazy in the process.

--
John
Re:Not ZFS? by therufus · 2009-09-02 12:10 · Score: 2, Interesting

You need to look at the grand scheme of things. Sure, you may get 5-10% of customers using massive amounts of data (over 500 GB), but when 90-95% of your customers are home users and small businesses who don't have their own data centers, and they may only have a 50 MB backup, their lack of use offsets the heavy users.
Imagine if in a 1 PB server, 750 TB of data was used by 10,000 individuals paying $5/mth and the other 250 TB was used by 50 individuals paying $5/mth. I failed at mathematics at school, but I'm sure the 10k will pay the data center costs that would be incurred by the 50.
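For anyone who wants the mathematics done, here is a minimal sketch of that scenario. It assumes the headline figure of $117,000 per petabyte as the only cost (raw hardware, no power, bandwidth or staff) and decimal units; the group names and variable names are illustrative, not anything from the article.

```python
# Cross-subsidy sketch: light users versus heavy users on one petabyte.
# Hardware-only cost; hosting, power and staff are deliberately ignored.

PRICE_PER_MONTH = 5.0            # dollars per customer
HARDWARE_COST_PER_PB = 117_000   # dollars, headline raw-hardware figure
TB_PER_PB = 1000

cost_per_tb = HARDWARE_COST_PER_PB / TB_PER_PB  # dollars per terabyte

groups = {
    "light users": {"customers": 10_000, "total_tb": 750},
    "heavy users": {"customers": 50, "total_tb": 250},
}

summary = {}
for name, g in groups.items():
    tb_each = g["total_tb"] / g["customers"]
    hardware_each = tb_each * cost_per_tb
    months_to_recover = hardware_each / PRICE_PER_MONTH
    summary[name] = (tb_each, hardware_each, months_to_recover)
    print(f"{name}: {tb_each:.3f} TB each, ${hardware_each:.2f} of disk, "
          f"paid back in {months_to_recover:.1f} months")

total_customers = sum(g["customers"] for g in groups.values())
monthly_revenue = total_customers * PRICE_PER_MONTH
payback_months = HARDWARE_COST_PER_PB / monthly_revenue
print(f"all customers: ${monthly_revenue:,.0f}/month, "
      f"hardware paid back in {payback_months:.1f} months")
```

On these assumptions a heavy user takes years to cover their own share of the disks, while the customer base as a whole covers the petabyte in a few months, which is the point the post is making.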

--
You moved your mouse. Please restart Windows for changes to take effect.

You know why Amazon charges that much? by Nimey · 2009-09-02 02:12 · Score: 4, Insightful

Support.

--
Hail Eris, full of mischief...

E pluribus sanguinem

Re:You know why Amazon charges that much? by bytethese · 2009-09-02 02:31 · Score: 5, Funny

For the 2.683M difference, that support better come with a "happy ending" for the entire staff...
Re:You know why Amazon charges that much? by drooling-dog · 2009-09-02 02:41 · Score: 4, Funny

Damn. I was going to offer support for half of that price until I saw this new requirement...
Re:You know why Amazon charges that much? by machine321 · 2009-09-02 02:46 · Score: 3, Funny

For 2.683M, you can probably afford to outsource that part.
Re:You know why Amazon charges that much? by Richard_at_work · 2009-09-02 02:47 · Score: 5, Insightful

And backup, redundancy, hosting, cooling etc etc. The $117,000 cost quoted here is for raw hardware only.
Re:You know why Amazon charges that much? by johnlcallaway · 2009-09-02 02:51 · Score: 4, Insightful

It's great having someone tell you they will be there in three hours to replace your power supply, that you then have to dedicate a staff person to be with when they go out on the shop floor because some moron in security requires it. If they had just left a few spare parts you could do it yourself because everything just slides into place anyway.

That 2.683M also pays for salaries, pretty building(s), advertising, research, conventions, and more advertising.

I could hire a couple of dedicated staff to have 24x7 support for far less than 2.683M, plus a duplicate system worth of spare parts.

This stuff isn't rocket science. Most companies don't need high-speed, fiber-optic disk array subsystems for a significant amount of their data, only for a small subset that needs blindingly fast speed. The rest can sit on cheap arrays. For example, all of my network accessible files that I open very rarely but keep on the network because they get backed up. All of my 5 copies of database backups and logs that I keep because it's faster to pull them off of disk than request a tape from offsite. And it's faster to back up to disk, then to tape.

BackBlaze is a good example of someone that needs a ton of storage, but not lightning-fast access. Having a reliable system is more important to them than one that has all the tricks and trappings of an EMC array that probably 10% of all EMC users actually use, but they all pay for.

--
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
Re:You know why Amazon charges that much? by interval1066 · 2009-09-02 02:56 · Score: 5, Insightful

Backup: depends on the backup strategy. I could make this happen for less than an additional 10%. But ok, point taken.
Redundancy: You mean as in plain redundancy? These are RAID arrays are they not? You want redundancy at the server level? Now you're increasing the scope of the project which the article doesn't address. (Scope error)
Hosting: Again, the point of the article was the hardware. That's a little like accounting for the cost of a trip to your grandmother's, and factoring in the cost of your grandmother's house. A little out of scope.
Cooling: I could probably get the whole project chilled for less than 6% of the total cost, depending on how cool you want the rig to run.
I think you're looking for a wrench in the works where none exists.

--
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
Re:You know why Amazon charges that much? by MrNaz · 2009-09-02 02:56 · Score: 5, Insightful

Redundancy can be had for another $117,000.
Hosting in a DC will not even be a blip in the difference between that and $2.7m.
EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.

--
I hate printers.
Re:You know why Amazon charges that much? by MoonBuggy · 2009-09-02 02:59 · Score: 4, Interesting

The lowest cost of an (apparently) comparable solution on their site is from Dell, at $826,000 per PB. That includes hardware and support but still requires hosting, cooling and so on at extra cost. To quote backup and redundancy as part of the cost seems misleading, since none of the solutions appear to include that.
Basically, comparing favourably to the Dell units simply requires that one can get support for less than $709,000. If you want to throw in backup and redundancy, then buy twice as many units - you've still got change from half a million compared to the single Dell unit in order to cover the extra power, support and cooling costs, not to mention that support costs don't necessarily scale linearly.
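The figures in this comparison follow directly from the two per-petabyte prices quoted in the thread. A quick sketch, with variable names chosen only for illustration:

```python
# Per-petabyte prices quoted in the thread (dollars).
DIY_PER_PB = 117_000    # homemade pods, raw hardware only
DELL_PER_PB = 826_000   # Dell figure, hardware plus support

# Support budget available before the Dell option becomes cheaper.
support_budget = DELL_PER_PB - DIY_PER_PB          # 709,000

# Buying two petabytes of pods for redundancy instead of one.
doubled_hardware = 2 * DIY_PER_PB                  # 234,000
left_over = DELL_PER_PB - doubled_hardware         # 592,000

print(f"support budget at parity: ${support_budget:,}")
print(f"left after doubling up:   ${left_over:,}")
```

The second figure is what remains for power, cooling and support after paying for a full duplicate set of pods.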
Re:You know why Amazon charges that much? by MadKeithV · 2009-09-02 03:21 · Score: 5, Funny

Just make sure the wife doesn't catch you unit testing the outsourced part.
Re:You know why Amazon charges that much? by NotBornYesterday · 2009-09-02 04:07 · Score: 4, Funny

"Sorry, I have to stay late tonight honey, ... I'm hard at work."

--
I prefer rogues to imbeciles because they sometimes take a rest.
Re:You know why Amazon charges that much? by Anonymous Coward · 2009-09-02 04:17 · Score: 2, Funny

Our support model is close to that. We give you the lube, then we tell you to $&%* yourselves.
Re:You know why Amazon charges that much? by Score+Whore · 2009-09-02 06:32 · Score: 4, Interesting

Redundancy can be had for another $117,000.
Hosting in a DC will not even be a blip in the difference between that and $2.7m.
EMC, Amazon etc are a ripoff and I have no idea why there are so many apologists here.
First these aren't even storage arrays in the same sense that EMC, Hitachi, NetApp, Sun, etc. provide. The only protocol you can use to access your data is https? WTF! Second the Hitachi array in my data center doesn't put 67 TB storage behind half a dozen single points of failure the way this thing does. Third the Hitachi array in my data center doesn't put 67 TB behind a dinky gigabit ethernet link. My Hitachi will provide me with 200,000 IOPS with 5 ms latency. I can hook a whole slew of hosts up to my SAN. I can take off-host, change-only copies of my data so backups don't bog down my production work. I can establish replication between the Hitachi here in this building and the second array four hundred miles away with write order fidelity and guaranteed RPOs.
Comparing this thing to enterprise class storage is like some sixteen year old adding a cold air intake and a coat of red paint to his Honda Civic then running around bragging that his car is somehow comparable to a Ferrari ("look they're both red!"). Every time I see something like this the only thing I learn is that yet another person doesn't actually "Get It" when it comes to storage.
HelloWorld.c is to the Linux kernel as this thing is to the Hitachi USP-V or EMC Symmetrix.
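To put the gigabit-link complaint in numbers, here is a back-of-the-envelope sketch. It assumes decimal terabytes, a link running at its full 1 Gbit/s line rate, and no protocol overhead, so the result is a best case rather than a measurement.

```python
# Best-case time to move one pod's worth of data over its network link.

CAPACITY_BYTES = 67 * 10**12     # 67 TB per pod, decimal units assumed
LINK_BITS_PER_SEC = 10**9        # gigabit Ethernet at full line rate

seconds = CAPACITY_BYTES * 8 / LINK_BITS_PER_SEC
hours = seconds / 3600
days = seconds / 86_400

print(f"{seconds:,.0f} s = {hours:.1f} hours = {days:.1f} days")
```

For a write-mostly backup target that may be acceptable; for restoring or re-replicating a whole pod it sets a floor of several days.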
Re:You know why Amazon charges that much? by Sandbags · 2009-09-02 06:55 · Score: 2, Informative

"Redundancy can be had for another $117,000."
...plus the inter-SAN connectivity
...plus the SAN fabric aware write splitting hardware and licensing
...plus the redundancy aware server connected to that SAN fabric
...plus the multipath HBA licensing for the servers
...plus multiple redundant HBAs per server and twice as many SAN fabric switches
...plus journaling and rollback storage, and block level deduplication within it (having a real-time copy is useless if you get infected with a virus)
...plus another real-time asynchronously replicated SAN at an offsite location at least 100 miles away
...plus the ISP connection to the offsite
...plus the staff to support an additional site and all the complex software and clusters
...plus cluster aware operating systems
This is why Tier 0 arrays cost in the millions...

--
There is no contest in life for which the unprepared have the advantage.
Re:You know why Amazon charges that much? by ToasterMonkey · 2009-09-02 08:32 · Score: 2, Informative

My Hitachi will provide me with 200,000 IOPS with 5 ms latency.
While that is just a TAD overkill for disk backup, these guys' $.11/GB is not something I'd trust my backups on.

HelloWorld.c is to the Linux kernel as this thing is to the Hitachi USP-V or EMC Symmetrix.
You nailed it.
Service Time/IOPS is less important here than trustworthy and proven controller hardware & software, and built in goodies like replication. That's why I would trust disk backups to Sun, NetApp, Hitachi, EMC, and not these people. Possibly home systems I guess, but bragging about homemade storage is a real turnoff.

A Very Shortsighted Article by eldavojohn · 2009-09-02 02:12 · Score: 3, Insightful

Before realizing that we had to solve this storage problem ourselves, we considered Amazon S3, Dell or Sun Servers, NetApp Filers, EMC SAN, etc. As we investigated these traditional off-the-shelf solutions, we became increasingly disillusioned by the expense. When you strip away the marketing terms and fancy logos from any storage solution, data ends up on a hard drive.

That's odd, where I work we pay a premium for what happens when the power goes out, what happens when a drive goes bad, what happens when maintenance needs to be performed, what happens when the infrastructure needs upgrades, etc. This article left out a lot of buzzwords but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?

You might as well add a few hundred thousand a year for the people who need to maintain this hardware and also someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.

We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.

--
My work here is dung.

Re:A Very Shortsighted Article by SatanicPuppy · 2009-09-02 02:23 · Score: 4, Informative

The focus of the article was only on the hardware, which was extremely low cost to the point of allowing massive redundancy... This is not an inherently flawed methodology.
If you can deploy cheap 67 terabyte nodes, then you can treat each node like an individual drive, and swap them out accordingly.
I'd need some actual uptime data to make a real judgment on their service vs their competitors, but I don't see any inherent flaws in building their own servers.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Re:A Very Shortsighted Article by Desler · 2009-09-02 02:26 · Score: 5, Insightful

The point is that the costs of services like Amazon or NetApp, etc include the costs for support, server maintenance, upgrades, etc. That they are only comparing this to just the bare minimum price for this company to construct their server is highly misleading.
Re:A Very Shortsighted Article by staeiou · 2009-09-02 02:27 · Score: 4, Informative

We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.
They actually do talk about that in the article. The difference in cost for one of the homegrown petabyte pods from the cheapest suppliers (Dell) is about $700,000. The difference between their pods and cloud services is over $2.7 million per petabyte. And they have many, many petabytes. Even if you do add "a few hundred thousand a year for the people who need to maintain this hardware" - and Dell isn't going to come down in the middle of the night when your power goes out - they are still way, way on top.

I know you don't pay premiums because you're stupid. But think about how much those premiums are actually costing you, what you are getting in return, and if it is worth it.
Re:A Very Shortsighted Article by Tx · 2009-09-02 02:28 · Score: 4, Informative

We don't pay premiums because we're stupid. We pay premiums because we're lazy.
There, fixed that for you ;).
Ok, that was glib, but you do seem to have been too lazy to read the article, so perhaps you deserve it. To quote TFA, "Even including the surrounding costs, such as electricity, bandwidth, space rental, and IT administrators' salaries, Backblaze spends one-tenth of the price in comparison to using Amazon S3, Dell Servers, NetApp Filers, or an EMC SAN." So they aren't ignoring the costs of IT staff administering this stuff as you imply; they're telling you the costs including the admin costs at their datacentre.

--
Oh no... it's the future.
Re:A Very Shortsighted Article by parc · 2009-09-02 02:37 · Score: 3, Interesting

At 67T per chassis and 45 drives documented per chassis, they're using 1.5T drives. 1 petabyte would then be 667 drives.
The worst part of this design that I see (and there's a LOT of bad to see) is the lack of an easy way to get to a failed drive. When a drive fails you're going to have to pull the entire chassis offline. Google did a study in 2007 of drive failure rates (http://labs.google.com/papers/disk_failures.pdf) and found the following failure rates over drive age (ignoring manufacturer):
3mo: 3% = 20 drives
6mo: 2% = 13 drives
1yr: 2% = 13 drives
2yr: 8% = 53 drives
Their logic is probably along the lines of "we're already paying someone to answer the pager in the middle of the night," but jeez, you're going to have to take a node offline every 2-3 days for the first year and then almost 2 a day after that!
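The drive counts in the list above can be reproduced from the chassis figures. This sketch simply applies the percentages as quoted in the post to a petabyte of 1.5 TB drives; it makes no claim about how the Google study defines those rates, and the names are illustrative.

```python
# Reproduce the per-petabyte drive count and failure counts from the post.

CHASSIS_TB = 67
DRIVES_PER_CHASSIS = 45

# Implied drive size: close to 1.5 TB once rounded to a product size.
implied_drive_tb = CHASSIS_TB / DRIVES_PER_CHASSIS

DRIVE_TB = 1.5
drives_per_pb = round(1000 / DRIVE_TB)   # 667 drives for one petabyte

# Failure percentages by drive age, exactly as quoted in the post.
quoted_rates = {"3mo": 0.03, "6mo": 0.02, "1yr": 0.02, "2yr": 0.08}

failed_drives = {age: round(drives_per_pb * rate)
                 for age, rate in quoted_rates.items()}

print(f"implied drive size: {implied_drive_tb:.2f} TB")
print(f"drives per petabyte: {drives_per_pb}")
for age, count in failed_drives.items():
    print(f"{age}: {count} drives")
```

Changing DRIVE_TB or the quoted rates re-derives the counts for a different pod configuration.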
Re:A Very Shortsighted Article by fulldecent · 2009-09-02 02:49 · Score: 2, Informative

>> You might as well add a few hundred thousand a year for the people who need to maintain this hardware and also someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
>> We don't pay premiums because we're stupid. We pay premiums so we can relax and concentrate on what we need to concentrate on.
Or... you could just buy ten of them and use the left over $1m for electricity costs and an admin that doesn't sleep

--
-- I was raised on the command line, bitch
Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 02:57 · Score: 4, Insightful

You will more than likely NOT have to take a node offline. The design looks like they place the drives into slip down hot plug enclosures. Most rack mounted hardware is on rails, not screwed to the rack. You roll the rack out, log in, fail the drive that is bad, remove it, hot plug another drive and add it to the array. You are now done.
They went RAID 6, even though it is slow as shit, for the added failsafe mechanisms.
Re:A Very Shortsighted Article by SatanicPuppy · 2009-09-02 03:09 · Score: 3, Insightful

Why would you bother? Just start off by writing the data to three nodes, and then you can swap new ones in and out silently. If your space really is cheap, then that's not a problem.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Re:A Very Shortsighted Article by Anarke_Incarnate · 2009-09-02 03:18 · Score: 2, Informative

The hardest part will be identifying the bad drives. That is ANOTHER feature that you pay for on expensive disk systems. The controllers will alert you to where the failed drive is, as well as oftentimes alerting the manufacturer of the failure. There have been times I have been called by a vendor to let me know a part and an on-site engineer were being dispatched for a failure my users were not even aware of yet due to it being off hours (and ops were asleep at the wheel).
Re:A Very Shortsighted Article by ianpatt · 2009-09-02 03:36 · Score: 3, Informative

From the credits list: "Protocase for putting up with hundreds of small 3-D case design tweaks", which I assume is http://www.protocase.com/.
Re:A Very Shortsighted Article by rijrunner · 2009-09-02 03:50 · Score: 4, Interesting

Having a couple decades of working both sides of the Support Divide, I am now of the opinion that the sole purpose of a Support Contract is to have someone at the other end of the phone to yell at. It makes people feel better and have a warm fuzzy. But, having had to schedule CEs to come onto site to replace failed hardware, I have generally found that that adds hours to any repair job. I would guess that you could power off this array, remove every single drive, move them to a new chassis, reformat them in NTFS, then back to JFS and still finish before a CE shows up on site. I recall that in the winter of 1994, *every* Seagate 4GB drive in our Sun boxes died.
What happens now when a drive goes bad is that a drive goes bad. You spot it through some monitoring software. You pick up the phone and call a 1-800 number. Someone asks a few questions like "What is your name? What is your quest? What is your favorite color?", then you hear typing in the background. After a bit, if you're lucky, they have you in the system correctly and can find your support contract for that box. Then, they give you a ticket number and put you on hold. Then, after a bit, an "engineering" rep will appear and say "What is the nature of the emergency?" and you then tell them the same stuff, except you get to add words like "var adm messages" or something. They'll tell you to send them some email so they can do some troubleshooting. You send them what they ask for. About an hour or so later, you get an email or call back saying that the drive has gone bad and needs to be replaced, which is pretty much the same thing you told them when you called in. They then tell you that you are on a Gold Contract with 24/7 support and that the CE has a 4 hour callback requirement from the time the call is dispatched to the CE. By this point, you are about 3-4 hours after the disk drive failed in the first place. Finally, the CE will call back after some amount of time to schedule a replacement. And here comes the real kicker.... In almost every instance for the last 10 years, we have had to do all maintenance during a scheduled window. At 1AM.
What happens now when something breaks is that someone fixes it.
Any business is faced with a Buy-It-Or-Build-It dilemma for any service or equipment. Since this was their core business, it certainly makes sense. And, it makes sense for any business of a certain size or set of skills. The reality is that the math is favoring consumer electronics for most applications because they are good enough for 85% of the business needs out there. The whole Cost-Benefit analysis must be periodically re-addressed. If you do not have $1 million a year in billed repair from a Support contract, is it worth $1 million a year for the contract? Seriously.. Even if you have a support contract, you're probably going to get billed time and materials on top of everything else.
With the math on this unit, you can build in massive layers of redundancy to greatly reduce even the possibility of the data being inaccessible and still come in far, far cheaper than any support contract, and you can schedule downtime because you have redundancy across multiple chassis.
Re:A Very Shortsighted Article by PRMan · 2009-09-02 05:45 · Score: 2, Interesting

I used to work at a company that paid a 20% premium on hardware for support from HP that was COMPLETELY WORTHLESS. I told them they would be better off just ordering a 6th computer for every 5 that they bought.
The guy would show up with no tools, not even a screwdriver, and then he would need to come back the next day (with a screwdriver). Then he didn't have the part (say RAM) that we told them in the first call and the day before. Then he showed up the next day with RAMBUS instead of DDR RAM. After 3 weeks, we got the machine back online.
Which means, in the meantime, since the person whose machine it was had to have something to work on, we had to cobble together a PC from no spare parts and then try to transfer their stuff off of their drive (because nobody ever heeded the "store everything on the U: and S: drive" mantra) and we worked like crazy to do it, eating up our whole day.
If we had had spare machines instead, we could have just replaced her RAM in 1 minute. Or, if it was the motherboard, put her drive in an identical replacement machine in 1 minute.

--
Peter predicted that you would "deliberately forget" creation 2000 years ago...
Re:A Very Shortsighted Article by sholto · 2009-09-02 17:55 · Score: 3, Informative

I'd need some actual uptime data to make a real judgment on their service vs their competitors,
I did an extensive interview with the Backblaze CEO. No hard data on uptime, but he says they lose one drive a week from the whole 1.5 petabyte system and have never had a pod fail. They've been running for a year. Here's the link to the story, which also has comments about the designing/testing process: http://www.crn.com.au/News/154760,want-a-petabyte-for-under-us120000.aspx
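The "one drive a week" figure can be turned into a rough annualized failure rate. This is a sketch resting on an assumption the interview does not confirm: that the 1.5 petabytes is raw capacity built from 1.5 TB drives, as the pod design discussed elsewhere in the thread suggests. The names are illustrative.

```python
# Rough annualized failure rate from "one drive a week".

SYSTEM_TB = 1500          # 1.5 petabytes, treated as raw capacity (assumption)
DRIVE_TB = 1.5            # assumption: pods are built from 1.5 TB drives
FAILURES_PER_WEEK = 1     # figure quoted from the interview

drive_count = SYSTEM_TB / DRIVE_TB              # about 1000 drives
failures_per_year = FAILURES_PER_WEEK * 52
annual_failure_rate = failures_per_year / drive_count

print(f"about {drive_count:.0f} drives, "
      f"{annual_failure_rate:.1%} failing per year")
```

If the 1.5 petabytes is usable rather than raw space, the drive count is higher and the rate correspondingly lower.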

Ripoff by asaul · 2009-09-02 02:14 · Score: 4, Insightful

Looks like a cheap downscale undersized version of a Sun X4500/X4540.

And as others have pointed out, you pay a vendor because in 4 years they will still be stocking the drives you bought today, whereas for this setup you will be praying they are still on eBay.

--
"If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton

Re:Ripoff by Anonymous Coward · 2009-09-02 02:29 · Score: 3, Insightful

why wouldn't you just build an entirely new pod with current disks and migrate the data? You could certainly afford it.
Re:Ripoff by timeOday · 2009-09-02 02:42 · Score: 5, Interesting

Depends on how it works. Hopefully (or ideally) it's more like the google approach - build it to maintain data redundancy, initially with X% overcapacity. As disks fail, what do you do then? Nothing. When it gets down to 80% or so of original capacity (or however much redundancy you designed in), you chuck it and buy a new one. By then the tech is outdated anyways.
Re:Ripoff by ciroknight · 2009-09-02 03:17 · Score: 2, Informative

Since most modern commercial-grade HDs come with a 3-5 year or better warranty these days [1], it's easier just to cash those in when the drives go bad and build a new box around the newer-model drives they ship you in return.

This is truly RAID, as Google, etc. have realized and developed. When the drives die, you don't cry over having the exact same drive stocked. You don't cry at all. At $8k a machine, you could actually afford to flat-out replace the entire box every 4 years and not affect your bottom line (since, you know, you're saving better than three times that by not going with one of the 'cloud vendors').

--
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
Re:Ripoff by PAjamian · 2009-09-02 12:46 · Score: 2, Interesting

Fine then, replace just the broken drives but as far as I'm aware Linux software raid 6 does not require the drives be the same model, or even the same size. You can get newer drives for the same or less cost as the old drives and just plug them in. Who cares if they have more capacity? Just let it go to waste if you must but it'll work just fine and certainly you won't have to be scrounging drives off of ebay.
Also consider that five years down the road we may have 10tb drives or better, but 1.5 tb drives should still be available on the consumer market (and keep in mind these are cheap consumer drives) for dirt cheap and these guys will probably be quite happy to use their same design with newer high capacity drives available at the time.

--
Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.

It's all clear now. by grub · 2009-09-02 02:17 · Score: 4, Funny

AHhh, this is why the EMC guy committed suicide. It wasn't because he was dying of cancer.

--
Trolling is a art,

My plan comes to fruition! by elrous0 · 2009-09-02 02:20 · Score: 5, Informative

Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!

And how big is a petabyte you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45 minute episodes at DVD quality mpeg2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.

--
SJW: Someone who has run out of real oppression, and has to fake it.

Re:My plan comes to fruition! by ShadowRangerRIT · 2009-09-02 02:27 · Score: 3, Funny

But what about storing the new episodes in HD? Clearly a masterpiece of TV such as this should not be stored at mere SD quality!

--
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
Re:My plan comes to fruition! by RMH101 · 2009-09-02 02:29 · Score: 4, Funny

I think we have a new metric unit of storage, to rival the (now deprecated) Library Of Congress SI unit.
Re:My plan comes to fruition! by ari_j · 2009-09-02 02:33 · Score: 5, Funny

Soon I shall have a single media server with every episode of "General Hospital" ever made stored at a high bitrate. WHO'S LAUGHING NOW, ALL YOU WHO DOUBTED ME!!!!
And how big is a petabyte you ask? There have been about 12,000 episodes of General Hospital aired since 1963. If you encoded 45 minute episodes at DVD quality mpeg2 bitrate, you could fit over 550,000 episodes of America's finest television show on a 1 petabyte server, enough to archive every episode of this remarkable show from its auspicious debut in 1963 until the year 4078.
Of all the computer systems out there, yours is the one for which becoming self-aware terrifies me the most.
Re:My plan comes to fruition! by maxume · 2009-09-02 02:49 · Score: 2, Interesting

William Shatner has continued to be awesome well into his 70s. He even went on Conan and mocked Sarah Palin (while gently ribbing himself).
Of the personalities in Hollywood, he is one I like quite a bit.

--
Nerd rage is the funniest rage.
Re:My plan comes to fruition! by Junior+J.+Junior+III · 2009-09-02 03:04 · Score: 2, Funny

I'm holding out for the porn version, Genital Horse Spittle.
Great donkey scenes.

--
You see? You see? Your stupid minds! Stupid! Stupid!
Re:My plan comes to fruition! by MartinSchou · 2009-09-02 07:09 · Score: 2, Interesting

You raise an "interesting" train of thought in my mind.
Encoding in 720p x264 you get something like 45 minutes in 1.1 GB. This gives you 60,900 episodes per 4U unit or 609,000 episodes per 40U rack.
In 1080p x264 you get something like 45 minutes in about 2.5 GB. This is 27,000 episodes per 4U unit or 270,000 episodes per 40U rack.
Assuming 22 episodes per season and a five year average run time, you end up with 220 episodes per show (typical science fiction shows).
Assuming 5 shows per week, 40 weeks a year, 10 year run time, you end up with 2,000 episodes per show (typical soaps).
So you could easily store 100 full sci-fi shows and 100 full soaps in one rack (that'd be 222,000 episodes), all stored in glorious 1080p.
IMDb lists the following statistics:

452,982 movies released theatrically.
792,565 TV episodes.
75,316 made for TV movies.
61,440 TV series.
77,624 direct to video movies.
Leaving out "TV series" (they average 12.9 episodes/series, which seems reasonable with the amount of cancelled series) I'll make the following assumptions about average run time:
Theatrical releases: 120 minutes
TV episodes: 35 minutes
TV movies: 90 minutes
Direct to video: 100 minutes
That's a total of 96,638,455 minutes. Encoding that in 720p would require 2,362,274 GB or 5,315,117 GB for 1080p.
What's my point? Well, for one thing you couldn't ever watch it, as it's 183 years, so no, that wasn't my point ;)
My point is that it is entirely within the realm of feasibility to offer downloads of every single movie and tv-show on IMDb from a hardware point of view. One of the complaints I've heard from the production companies is that it would be impossible to set up the hardware needed for it. Even at Sun's prices, you'd "only" need to pay 10 million dollars to store everything in both 720p and 1080p quality. Set up redundant servers in 10 different locations, 5 in the US, 5 in Europe, and you're still only out 100 million dollars.
From a cultural point of view, think of all the things that are lost when the copyright holders let these things rot away on shelves, throw it out or it's lost in some kind of calamity. And this is just movies and tv-shows. Add in music and news and I suspect you could easily get hugely redundant back-ups of it all for 1 billion dollars. Even if you had to replace the storage arrays every 3 years, it's still really really cheap. Figure twice that for maintenance, and we have an annual cost of about a billion dollars - cheap when we're saving all knowledge for our successors. That's roughly the cost of building 125 miles of rural freeway in Michigan. It'd be cheap at 10x the price. And in ten years - we will probably still be using high bit rate encoding (1080p+), but will the cost of storage still be as high? I suspect it'll slowly fall, slightly faster than inflation.
Having to reencode everything from time to time, would obviously take a huge amount of time, but that is the price we pay for progress. On the other hand, even with 1:1 encoding time, it'd only take 183 computer-years to do it.
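The storage arithmetic above can be re-run as a quick sketch. The title counts and average run times are the ones quoted in this comment; the names `CATALOG`, `total_minutes` and `storage_gb` are made up for illustration, and a pod is assumed to hold 67,000 GB (decimal terabytes). Note that with exactly 2.5 GB per 45 minutes the 1080p total comes out near 5.37 million GB, a little above the figure quoted above, which would correspond to roughly 2.475 GB per episode.

```python
import math

# (title count from the IMDb list above, assumed average run time in minutes)
CATALOG = {
    "theatrical": (452_982, 120),
    "tv_episodes": (792_565, 35),
    "tv_movies": (75_316, 90),
    "direct_to_video": (77_624, 100),
}

GB_PER_45_MIN_720P = 1.1    # the comment's x264 720p estimate
GB_PER_45_MIN_1080P = 2.5   # the comment's x264 1080p estimate
POD_CAPACITY_GB = 67_000    # assumption: 67 TB pod, decimal units


def total_minutes(catalog):
    """Sum count * average run time over every category."""
    return sum(count * minutes for count, minutes in catalog.values())


def storage_gb(minutes, gb_per_45_min):
    """Scale a per-45-minute file size up to the whole catalog."""
    return minutes / 45 * gb_per_45_min


minutes = total_minutes(CATALOG)
years_of_viewing = minutes / (60 * 24 * 365)
gb_720p = storage_gb(minutes, GB_PER_45_MIN_720P)
gb_1080p = storage_gb(minutes, GB_PER_45_MIN_1080P)
pods_for_both = math.ceil((gb_720p + gb_1080p) / POD_CAPACITY_GB)

print(f"total minutes:    {minutes:,}")
print(f"years of viewing: {years_of_viewing:.1f}")
print(f"720p storage:     {gb_720p:,.0f} GB")
print(f"1080p storage:    {gb_1080p:,.0f} GB")
print(f"67 TB pods needed for both encodes: {pods_for_both}")
```

This counts raw capacity only; it ignores RAID parity, filesystem overhead and any redundant copies.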
Imagine what it would be like if 25 years from now your kids could, at the touch of a button, gain access to every bit of entertainment and news from the last 25 years. I don't mean going to Wikipedia and looking up The Terminator but actually watch the film, read all the news about it, as it looked at the time, five years on, seven years on after Terminator 2: Judgement Day had its effect on the new franchise etc.
Imagine them not having to settle for what history books said happened in the year 2010 or about specific events in that year, but be able to pull up every single news article and tv news report on the subject and make up their own mind, de

Disk replacement? by jonpublic · 2009-09-02 02:20 · Score: 3, Insightful

How do you replace disks in the chassis? We've got 1,000 spinning disks and we've got a few failures a month. With 45 disks in each unit you are going to have to replace a few consumer grade drives.

Re:Disk replacement? by markringen · 2009-09-02 02:23 · Score: 2, Informative

slide it out on a rail, and drop in a new one. and there is no such thing as consumer grade anymore, they are often of much higher quality stability wise than server specific drives these days.
Re:Disk replacement? by maxume · 2009-09-02 02:54 · Score: 2, Informative

It sounds like they just soft-swap a whole chassis once enough of the drives in it have failed.
If their requirements are a mix of cheap, redundant and huge (with not so much focus on performance), cheap disposable systems may fit the bill.

--
Nerd rage is the funniest rage.
Re:Disk replacement? by TooMuchToDo · 2009-09-02 03:28 · Score: 2, Interesting

What kind of drives are you using? We've got 4800+ spinning drives, and we only have 1-2 failures a month.

Re:My math is a bit rusty... by Desler · 2009-09-02 02:21 · Score: 5, Informative

It's not your math that's rusty it's your reading skills.

Linux-based server using commodity parts that contains 67 terabytes of storage at a material cost of $7,867.

wtf? by pak9rabid · 2009-09-02 02:23 · Score: 5, Insightful

FTA...

But when we priced various off-the-shelf solutions, the cost was 10 times as much (or more) than the raw hard drives.

Um..and what do you plan on running these disks with? HD's don't magically store and retrieve data on their own. The HD's are cheap compared to the other parts that create a storage system. That's like saying a Ferrari is a ripoff because you can buy an engine for $3,000.

Re:wtf? by Rich0 · 2009-09-02 07:08 · Score: 2, Interesting

Yup.
You can do even better than the price quoted in this article. On Newegg I found a 1TB drive for $95 - that is only $95k/PB. What a bargain!
Except that I don't have a PB of space with my solution. I have 0.001PB of space. If I want 1PB of space then I need hundreds of drives, and some kind of system capable of talking to hundreds of drives and binding them into some kind of a useful array.
This sounds like criticizing the space shuttle as being wasteful as you can cover the same distance in a truck for 1/10000000 x the cost. Except of course for the minor detail that the truck can't fly in space, and can't do all that distance on a single load of fuel in a few hours.
Or, I can generate completely green energy at a very low price per gigawatt using a small generator and a hamster wheel. Except that I'm not generating a gigawatt - I'm generating maybe a few mW and scaling it up. Unless I bury China in rats I'm not going to be competing with the Three Gorges Dam.

Re:My math is a bit rusty... by ShadowRangerRIT · 2009-09-02 02:24 · Score: 2, Informative

You misread. It's $7,867 per 67 terabytes. So at the hard disk standard for a petabyte (base 10, not base 2), 1000 TB == 1 PB:
(1000 TB / 67 TB) * $7,867 = $117417.91
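The same calculation as a minimal sketch, in case anyone wants to plug in other prices or the base-2 definition. The function name `cost_per_petabyte` and its parameters are invented for this example; the $7,867 and 67 TB figures are the ones from the article.

```python
def cost_per_petabyte(unit_cost_usd, unit_capacity_tb, tb_per_pb=1000):
    """Cost of enough units to reach one petabyte of raw capacity.

    tb_per_pb defaults to 1000 (the hard disk convention); pass 1024
    to use the base-2 definition instead.
    """
    return (tb_per_pb / unit_capacity_tb) * unit_cost_usd


pod_cost = cost_per_petabyte(7_867, 67)
print(f"${pod_cost:,.2f} per PB")  # $117,417.91 per PB
```

This treats pods as divisible (about 14.9 of them), so buying whole pods would cost slightly more.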

--
$_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print

Re:That's great but what about all the hidden cost by CoolCash · 2009-09-02 02:24 · Score: 2, Informative

If you check out what the company does, they are an online backup company. They don't host servers on this array, just backup data from your desktop. They just need massive amounts of space which they make redundant.

Yeah, but with Amazon you get FREE SHIPPING !! by Anonymous Coward · 2009-09-02 02:27 · Score: 2, Insightful

I love free shipping, even if it costs me more !! I like FREE STUFF !!

Re:That's great but what about all the hidden cost by hodagacz · 2009-09-02 02:28 · Score: 2, Insightful

They designed and built it so they should know how to support it. If someone else builds one, just learning how to get that beast up and running is excellent hands on training.

Not that shortsighted for their purposes by Overzeetop · 2009-09-02 02:30 · Score: 5, Insightful

Yeah, this only works if you're the geeks building the hardware to begin with. The real cost is in setup and maintenance. Plus, if the shit hits the fan, the CxO is going to want to find some big butts to kick. 67TB of data is a lot to lose (though it's only about 35 disks at max cap these days).

These guys, however, happen to be the geeks, the maintainers, and the people-whose-butts-get-kicked-anyway all at once. This is not a project for a one- or two-man IT group that has to build a storage array for their 100-200 person firm. These guys are storage professionals with the hardware and software know-how to pull it off. Kudos to them for making it and sharing their project. It's a nice, compact system. It's a little bit of a shame that there isn't OTS software, but at this level you're going to be doing grunt work on it with experts anyway.

FWIW, Lime Technology (lime-technology.com) will sell you a case, drive trays, and software for a quasi-RAID system that will hold 28TB for under $1500 (not including the 15 2TB drives - another $3k on the open market). This is only one-fault tolerant, though failure is more graceful than with a traditional RAID. I don't know if they've implemented hot spares or automatic failover yet (which would put them up to 2 fault tolerant on the drives, like RAID6).

--
Is it just my observation, or are there way too many stupid people in the world?

they are missing hardware mgmt by TheGratefulNet · 2009-09-02 02:32 · Score: 5, Interesting

where's the extensive stuff that sun (I work at sun, btw; related to storage) and others have for management? voltages, fan-flow, temperature points at various places inside the chassis, an 'ok to remove' led and button for the drives, redundant power supplies that hot-swap and drives that truly hot-swap (including presence sensors in drive bays). none of that is here. and these days, sas is the preferred drive tech for mission critical apps. very few customers use sata for anything 'real' (it seems, even though I personally like sata).

this is not enterprise quality no matter what this guy says.

there's a reason you pay a lot more for enterprise vendor solutions.

personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.

--
"It is now safe to switch off your computer."

Re:they are missing hardware mgmt by N1ck0 · 2009-09-02 02:58 · Score: 4, Insightful

It's better at what they need it for. Based on the services and software they describe on their site, it looks like they store data in the classic redundant chunks distributed over multiple 'disposable' storage systems. In this situation most of the added redundancy that vendors put in their products doesn't add much value to their storage application. Thus having racks and racks of basic RAIDs on cheap disks and paying a few on-site monkeys to replace parts is more cost effective than going to a more stable/tested enterprise storage vendor.
Re:they are missing hardware mgmt by SatanicPuppy · 2009-09-02 03:04 · Score: 5, Informative

This sort of attitude is how Sun got its lunch eaten in the market in the first place.
Yes, your hardware rocks. It's so fucking sexy I need new pants when I come into contact with it.
It also costs more than a fucking Italian sports car.
Turns out that if your awesome hardware is 10 times better than commodity hardware, but also 25 times as expensive, people are just going to buy more commodity hardware.
I've got some Sun data appliances and I've got some Dell data appliances, and the only difference I've seen between them is purely one of cost. The only thing that ever breaks is drives.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Re:they are missing hardware mgmt by Anonymous Coward · 2009-09-02 03:09 · Score: 2, Informative
RTFA - they are not saying one of these is a mission critical enterprise storage system. In fact they said:

No One Sells Cheap Storage, so We Designed It
When you are talking about multiple petabyte scale paying 5x as much for 5 temperature sensors, SAS drives, LEDs etc becomes pretty stupid.
- Treat the 67TB system as an $8,000 hard drive.
- Deploy a few tens or hundreds of them with redundancy between them.
- In 2-3 years when they start to fail, replace them with larger capacity drives.
- ???
- Take your hundreds of thousands of dollars not paid to SUN, IBM, EMC, NetApp etc and PROFIT!!!
Re:they are missing hardware mgmt by swillden · 2009-09-02 03:12 · Score: 4, Insightful

personally, I have a linux box at home running jfs and raid5 with hotswap drive trays. but I don't fool myself into thinking its BETTER than sun, hp, ibm and so on.
I don't think these folks believe their solution is better -- just cheaper. MUCH cheaper. So much cheaper that you can employ a team of people to maintain the "homebrew" solution and still save money.

--
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
Re:they are missing hardware mgmt by BobMcD · 2009-09-02 08:50 · Score: 2, Funny

And speaking of sexy, sports cars, and Sun, there is one huge factor that sets apart the purchase decisions -
Sun has nothing on Ferrari for getting you laid.

cheap drives too by pikine · 2009-09-02 02:38 · Score: 2, Informative

Reliant Technology sells you a NetApp FAS 6040 for $78,500 with a maximum capacity of 840 drives, without the hard drives (source: Google Shopping). If you buy the FAS 6040 with drives, most vendors will use more expensive, lower-capacity 15k rpm drives instead of the 7200 rpm drives the Backblaze pod uses, and this makes up a lot of the price difference. The point is, you could buy NetApp and install it yourself with cheap off-the-shelf consumer drives and end up spending money of the same order of magnitude. I estimate that NetApp would cost just 1.5x the amount.

NetApp FAS 6040 at $78,500 + 840 x 1.5TB drives at $120 each = $179,300, which gives you 1.26PB. Cost per petabyte is about $142,300, only slightly more expensive than Backblaze's $117,000 from the article. The real story is that Backblaze is able to show a competitive edge of about $25,000 per petabyte, or being roughly 17% cheaper.
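The comparison can be re-run as a short sketch using only the prices in this comment and the pod figures from the article. All variable names are invented for illustration; the exact values come out close to the rounded figures quoted in the comment.

```python
# Figures quoted in this comment
NETAPP_HEAD_USD = 78_500
DRIVE_COUNT = 840
DRIVE_COST_USD = 120
DRIVE_CAPACITY_TB = 1.5

# Figures from the article: $7,867 per 67 TB pod
POD_COST_USD = 7_867
POD_CAPACITY_TB = 67

netapp_total_usd = NETAPP_HEAD_USD + DRIVE_COUNT * DRIVE_COST_USD
netapp_capacity_pb = DRIVE_COUNT * DRIVE_CAPACITY_TB / 1000
netapp_per_pb = netapp_total_usd / netapp_capacity_pb

pod_per_pb = 1000 / POD_CAPACITY_TB * POD_COST_USD

saving_usd = netapp_per_pb - pod_per_pb
saving_fraction = saving_usd / netapp_per_pb

print(f"NetApp + consumer drives: ${netapp_total_usd:,} for {netapp_capacity_pb:.2f} PB")
print(f"NetApp per PB:   ${netapp_per_pb:,.0f}")
print(f"Pods per PB:     ${pod_per_pb:,.0f}")
print(f"Difference:      ${saving_usd:,.0f} ({saving_fraction:.1%})")
```

Both sides are raw capacity and hardware cost only; parity overhead, power, and labor are left out, as they are in the comment.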

--
I once had a signature.

Or wait 5 years and buy it at newegg for $280 by dicobalt · 2009-09-02 02:39 · Score: 2, Funny

and save $2,799,720.

Lets try to be a bit more supportive here! by fake_name · 2009-09-02 02:50 · Score: 4, Insightful

If an article went up describing how a major vendor released a petabyte array for $2M the comments would be full of people saying "I could make an array with that much storage far cheaper!"

Now someone has gone and done exactly that (they even used Linux to do it) and suddenly everyone complains that it lacks support from a major vendor.

This may not be perfect for everyone's needs, but it's nice to see this sort of innovation taking place instead of blindly following the same path everyone else takes for storage.

What's all the hate? by xrayspx · 2009-09-02 02:53 · Score: 5, Insightful

These guys build their own hardware, think it might be able to be improved on or help the community, and they release the specs, for free, on the Internet. They then get jumped on by people saying "bbbb-but support!". They're not pretending to offer support, if you want support, pay the 2MM for EMC, if you can handle your own support in-house, maybe you can get away with building these out.

It's like looking at KDE and saying "But we pay Apple and Microsoft so we get support" (even though, no you don't). The company is just releasing specs, if it fits in your environment, great, if not, bummer. If you can make improvements and send them back up-stream, everyone wins. Just like software.

I seem to recall similar threads whenever anyone mentions open routers from the Cisco folks.

--
I like music

Re:What's all the hate? by sockonafish · 2009-09-02 03:05 · Score: 4, Interesting

Running on the cheapest hardware possible and engineering the software to gracefully deal with hardware failure is exactly how Google runs their datacenters, as well. As long as you've got the talent to pull it off, it's much more cost effective than buying a prefab solution.

Re:Liability insurance by devjoe · 2009-09-02 03:10 · Score: 2, Insightful

If you build a petabyte stack using 1.5TB disks you need about 800 drives including RAID overhead. With an MTBF for consumer drives of 500,000 hours, a drive will fail roughly every 10-15 days, if your design is good and you create no hotspots/vibration issues.

Rebuild times on large RAID sets are such that it is only a matter of time before they run a double drive failure and lose their customers data. The money they saved by going cheap will be spent on lawyers when they get the liability claims in.

If you RTFA, you will see that they are using RAID6 with 2 parity drives per raid, so a double drive failure can be handled, and it is only the less likely triple drive failure that will ruin them. It seems weak that they don't have hot-swappable drives in this configuration, but they have software that is managing the data across disk sets, and presumably they have redundant copies of data that keep the data accessible when one of their servers is taken down to replace a drive (if they don't, the downtimes due to replacing drives will make the service useless). This redundancy may also save them in the case that they actually lose a RAID set.
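As a rough sanity check on the failure interval quoted above, here is a minimal sketch that assumes independent drives with a constant failure rate, so the fleet-wide rate is simply the drive count divided by the MTBF. The function name is invented for illustration. With the quoted 800 drives and 500,000 hour MTBF this simple model gives about 26 days between failures, so the quoted 10-15 days would imply a lower effective MTBF than the spec sheet number.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365


def expected_days_between_failures(n_drives, mtbf_hours):
    """Mean time between failures across the whole fleet, in days.

    Assumes independent drives and a constant (exponential) failure rate.
    """
    return mtbf_hours / n_drives / HOURS_PER_DAY


def expected_failures_per_year(n_drives, mtbf_hours):
    """Expected number of drive failures per year under the same model."""
    return n_drives * HOURS_PER_YEAR / mtbf_hours


days = expected_days_between_failures(800, 500_000)
per_year = expected_failures_per_year(800, 500_000)
print(f"about {days:.1f} days between failures, {per_year:.1f} failures per year")
```

Real drives do not follow a constant failure rate, so this is a back-of-the-envelope figure and not a prediction.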

Re:They could have quite better by cowbutt · 2009-09-02 04:00 · Score: 2, Informative

they used incredibly cheep-ass HBA's for no good reason.

In their defence:

A note about SATA chipsets: Each of the port multiplier backplanes has a Silicon Image SiI3726 chip so that five drives can be attached to one SATA port. Each of the SYBA two-port PCIe SATA cards has a Silicon Image SiI3132, and the four-port PCI Addonics card has a Silicon Image SiI3124 chip. We use only three of the four available ports on the Addonics card because we have only nine backplanes. We don't use the SATA ports on the motherboard because, despite Intel's claims of port multiplier support in their ICH10 south bridge, we noticed strange results in our performance tests. Silicon Image pioneered port multiplier technology, and their chips work best together.
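The port arithmetic in that note can be laid out as a small sketch. The quoted text gives nine backplanes, five drives per backplane, two-port SYBA cards, and three of four ports used on the Addonics card; the count of three SYBA cards is an assumption inferred from those numbers, not something the note states.

```python
DRIVES_PER_BACKPLANE = 5   # one SiI3726 port multiplier per backplane
BACKPLANES = 9

# (controller, SATA ports in use). Three SYBA cards is an inferred assumption.
CONTROLLERS = [
    ("SYBA 2-port PCIe, SiI3132 (card 1)", 2),
    ("SYBA 2-port PCIe, SiI3132 (card 2)", 2),
    ("SYBA 2-port PCIe, SiI3132 (card 3)", 2),
    ("Addonics 4-port PCI, SiI3124", 3),  # fourth port left unused
]

ports_in_use = sum(ports for _, ports in CONTROLLERS)
total_drives = ports_in_use * DRIVES_PER_BACKPLANE

print(f"SATA ports in use: {ports_in_use} (backplanes: {BACKPLANES})")
print(f"drives per pod:    {total_drives}")
```

Each host port fans out to five drives through a port multiplier, which is how nine ports reach 45 drives.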

Please.... by mpapet · 2009-09-02 04:04 · Score: 2, Interesting

where I work we pay a premium for what happens when the power goes out, what happens with a drive goes bad,

Whoever spec'd your systems should have accommodated obvious failures like this. As in, paying for colo, using servers with dual power supplies that fail over, sensible RAID strategy. Giving money to EMC in this situation is not sensible.

but they also left out the people who manage these massive beasts. I mean, how many hundreds (or thousands) of drives are we talking here?
I have a couple of hundred drives going at any one time and I get an SNMP alert when a drive goes bad. I take one out of the closet and destroy the broken one. The RAID does the rest.

someone to get up in the middle of the night when their pager goes off because something just went wrong and you want 24/7 storage time.
Our storage strategy is N+1 all the way and required to be online 24/7 so failures are part of the plan. They are probably part of the plan at this startup.

We pay premiums so we can relax and concentrate on what we need to concentrate on.
I don't understand this. If your job is 89% software dev, then EMC may be the way to go. Expensive! But, it makes a little business sense. If you aren't spending most of your time writing software that adds value to your service/product, then EMC is doing your job and you are some kind of TPS generator. Do you pay a premium to blame someone else? I've had the opportunity to work in places like this and I've always passed because of the veiled contempt for IT.

Please, explain this to me.

--
http://www.maxineudall.com/2010/02/should-economists-be-sued-for-malpractice.html

are you a project manager by any chance? by leoc · 2009-09-02 04:05 · Score: 4, Insightful

I like how you dismiss a detailed real world design example based simply on a claimed feature without any further substantiation. Very classy. I'm not saying you are wrong, but would it kill you to go into a little more detail about why these folks need "luck" when they are clearly very successful with their existing design?

--
STFU about slashdot bias.

Re:are you a project manager by any chance? by pyite · 2009-09-02 04:13 · Score: 5, Informative

are you a project manager by any chance?
Of course not. A project manager would look at this and go, "wow, we saved a lot of money!" It's pretty simple. ZFS does what most other filesystems do not; it guarantees data integrity at the block level by the use of checksums. When you're dealing with this many spindles and dense, non-enterprise drives, you are virtually guaranteed to get silent corruption. The article does not once mention any of the words corrupt.*, checksum, or integrity. The server doesn't use ECC RAM. The project, while well intentioned, should scare the crap out of anyone thinking about storing data with this company.

--
"Nature doesn't care how smart you are. You can still be wrong." - Richard Feynman
Re:are you a project manager by any chance? by profplump · 2009-09-02 06:28 · Score: 3, Insightful

What failure rate are you using to "virtually guarantee" that you'll get data corruption with 45 drives?
What failure rate in your RAM, CPU, and motherboard are you using to guarantee that the ZFS checksums are not themselves corrupted? Not to mention the high possibility of bugs in a younger file system, and the different performance characteristics among FSes.
I'm not saying ZFS is a bad plan, at least if you're running enough spindles, but if you're going to "virtually guarantee" silent corruption with less than 100 drives I'd like to see some documentation for the non-detectable failure rates you're expecting.
It's also worth noting that in a lot of data, a small amount of bit-flips might not be worth protecting against at all. Or they might be better protected at the application level instead of the block level -- for example, if the data will be transmitted to another system before it is consumed, as would be typical for a disk-host like this, a single checksum of the entire file (think md5sum) could be computed at the end-use system, rather than computing a per-block checksum at the disk host and then just assuming the file makes it across the network and through the other system's I/O stack without error.
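A minimal sketch of the application-level check described above, using Python's standard `hashlib`. The helper names `file_digest` and `verify` are made up for this example, and MD5 is used only because the comment mentions md5sum; it catches accidental bit-flips but is not suitable where deliberate tampering is a concern.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a whole file in fixed-size chunks so it never has to fit in RAM."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected_hex, algorithm="md5"):
    """Return True if the file at path still matches the recorded digest."""
    return file_digest(path, algorithm) == expected_hex
```

The idea would be to record the digest on the client before upload and call `verify` on the restored copy, so any corruption along the network, disk host, or I/O stack shows up as a mismatch at the end-use system.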

Don't forget where the real value is by pedantic+bore · 2009-09-02 06:23 · Score: 2, Insightful

Forgive me; I've committed the sin of working for one of those name-brand storage companies.

The real value in a data storage system isn't in the hardware, it's in the data. And the real cost incurred in a data storage system is measured in the inability of the customer to access that data quickly, efficiently and (in the case of a disaster) at all.

If you need to crunch the data quickly, a higher-performing system is going to save you money in the end. Look at all the benchmarks: no home-grown systems are anywhere on the lists. If you want to stream through your data at several gigabytes per second, you need to pay for a fast interconnect. Putting 45 drives behind a single 1GbE just doesn't cut it.

Similarly, if you want to ensure that the data is protected (integrity, immutable storage for folks who need to preserve data and be certain it hasn't been tampered with, etc) and stored efficiently (single instance store, or dedupe, so you don't fill your petabytes of disks with a bajillion copies of the same photos of Anna Kournikova) then you need to pay for the extra goodness in that software and hardware as well.

Finally, if you want extremely high availability, then the cost of the hardware is minuscule compared to the cost of downtime. We had customers that would lose millions of dollars per service interruption. They're willing to pay a million dollars to eliminate or even reduce downtime.

These folks are essentially just building a box that makes a bunch of disks behave like a honking big tape drive. It's a viable business--that's all some folks need. But EMC et al are not going to lose any sleep over this.

--
Am I part of the core demographic for Swedish Fish?

*sigh* by upside · 2009-09-02 06:29 · Score: 4, Insightful

How about reading the section "A Backblaze Storage Pod is a Building Block".

<snip> the intelligence of where to store data and how to encrypt it, deduplicate it, and index it is all at a higher level (outside the scope of this blog post). When you run a datacenter with thousands of hard drives, CPUs, motherboards, and power supplies, you are going to have hardware failures — it's irrefutable. Backblaze Storage Pods are building blocks upon which a larger system can be organized that doesn't allow for a single point of failure. Each pod in itself is just a big chunk of raw storage for an inexpensive price; it is not a "solution" in itself.

Emphasis mine. I believe there are quite a few successful and reliable storage vendors not using ZFS. We get the point, you like it. Doesn't mean you can't succeed without it. Be more open minded.

--
I'm sorry if I haven't offended anyone
