For First Three Years, Consumer Hard Drives As Reliable As Enterprise Drives
nk497 writes "Consumer hard drives don't fail any more often than enterprise-grade hardware, despite the price difference. That's according to online storage firm Backblaze, which uses a mix of both types of drive. It studied its own hardware, finding consumer hard drives had a failure rate of 4.2%, while enterprise-grade drives failed at a rate of 4.6%. CEO Gleb Budman noted: 'It turns out that the consumer drive failure rate does go up after three years, but all three of the first three years are pretty good. We have no data on enterprise drives older than two years, so we don't know if they will also have an increase in failure rate. It could be that the vaunted reliability of enterprise drives kicks in after two years, but because we haven't seen any of that reliability in the first two years, I'm skeptical.'"
At my company all the hardware is managed by CSC. They retire servers in about 3 years...including the drives.
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
"Enterprise" drives may have longer warranty coverage, so you are essentially just buying an extended warranty that is built into the selling price. This is how water heaters are priced...a 5 year warranty water heater is often identical to a 10 year warranty unit, but the manufacturer has crunched the failure rate numbers and will just wind up replacing a percentage of 10 year models when they start to leak in 8 years.
"We make our world significant by the courage of our questions and by the depth of our answers." Carl Sagan
But are those "Enterprise SSDs"?
Appended to the end of comments you post. 120 chars.
What? There's absolutely a difference between 87 octane and 92+ octane. While many high end cars are able to compensate for this difference by sacrificing efficiency, it's certainly not wise to put the lower grade gasoline in a high performance vehicle. Not a good analogy at all.
Depends entirely on your car.
For many cars, premium/high octane gas does very little. For higher-end cars and sports cars, it can make a huge difference.
And then on the really high-end there's a reason they make racing fuel (118 octane), because it makes a huge difference for some things.
A 1996 Buick, not so much. A Porsche or something like that, I bet it makes a huge difference -- both in performance and engine longevity.
Lost at C:>. Found at C.
Sadly no, they are just Intrepid's SSDs
I am Bennett Haselton! I am Bennett Haselton!
What? There's absolutely a difference between 87 octane and 92+ octane.
For 99% of cars, there is no difference. Unless a car is specifically designed to use a higher compression ratio, there is no benefit whatsoever to a higher octane rating. Besides, you are assuming that the premium gas actually has a higher octane rating. Years ago, it actually cost more to make high octane gas. Today the octane rating can be tweaked with cheap additives. So it is common to just make it all 92, then just use one tanker truck to make the delivery and just fill all the tanks with identical gas.
Google already published many detailed reports on various issues surrounding the HDD business, proving that the money saved by buying cheaper hard drives, and using them in data 'defending' situations (replicating data on multiple drives) made far more sense than using so-called 'enterprise' class equipment in complex, expensive configurations. Once again, to the surprise of no alpha, the KISS (keep it simple, stupid) principle wins out in engineering.
The buzz wordy, mock intellectual, synthetically complex world of 'enterprise' solutions is designed to appeal to the mind of the 'beta', a class of technocrat for whom rote-learning is everything. IT people are mostly of this class, so the 'paraphernalia' and 'jargon' make such people feel 'special'. The fundamentals of Computer Science fly right over the heads of most people involved in computer decision making.
It shames people to not even understand why the capitalist society works best with mass manufactured items, and that limited run items will always have significant compromises. Make more of an item, and it gets cheaper AND more reliable through necessity of efficiency.
But only a few days back, in some forum, people were dribbling in ecstasy because some fake enterprise HDD (RED series from Seagate?) was being 'discounted' to only 40% above the cost of the cheapest quality 3TB HDD. Many people gave EXPENSE as the primary reason for buying the vastly inferior Xbox One over the PS4 (in other words they were 'big' individuals because they could afford the more expensive console).
Disclaimer: Backblaze engineer here. I don't think all "commercial storage systems" get exactly the same "hammering". Some commercial systems are used to store data quietly for a long time (let's say online backup or shutterfly storage of photos), some commercial systems are hammered constantly (google's homepage search). I reject the concept that "enterprise" or "commercial" is a thing. You MUST look at the specific application. Some consumers use their hard drives quite a bit, some don't. Some corporations are hammering away at their drives, some are not.
"Realistically the first part to fail on a PC will be the hard drive."
Only because the user isn't technically part of the PC.
Also FYI the octane requirement can be related to timing advance, where a lower-compression turbocharged engine with more advanced timing would need higher octane gas to make longer burns from each spark (higher octane gas burns longer than lower octane gas). The earlier spark sets off a longer-burn time of gas timed to the timing, needing the longer-burn ability of the 92+ octane. An old simple truck with 0 BDC timing would be happy with 87 octane, where a newer engine with 15 BDC timing advance would be better with 92+ octane.
Fuck this is way off topic from hard drives, sorry. Just needed to fill in some missing info.
As for hard drives, the more, the better. RAID is for safety now, and SSD's are for speed where we used to have RAID-0. ETC
All the newer shelves came preloaded with Coraid-approved drives. As I said, there's hundreds of drives involved here, a lot of SATA 1TB and 2TB and some SAS 600GB. I think out of the later drives, we've had two fail. Maybe three.
Asked about it, Coraid said, yes, the warranty is better on "Enterprise-class" or "RAID-class" drives, but also, the firmware is different. They claim that drives intended for the consumer / SOHO market spend a lot of time retrying marginal reads before declaring an unreadable sector and sparing it. They say that SAN-class drives limit the retry time, because the array controller handles it more efficiently, since it has the big-picture view.
They also say that the drives are optimized for close-quarters operation, all jammed together in an array, handling vibration and heat build-up slightly differently, and that they have minor differences to keep lubrication from migrating out of the spindle bearing under continuous operation. I don't know but I imagine loss of spindle bearing lube would add vibration and make any but the best reads more marginal.
I don't know for sure, but we've spent a great deal of US dollars on their products and our experience has borne out the fact that there's a definite difference in arrays.
As for corporate desktop and/or server use, well, I don't really know. Our servers that have one to four drives were mostly shipped with those drives, so we didn't choose them. I can't tell you if they are enterprise class drives, but I imagine they are, based on the replacement costs. And I know about what some of those costs are, or anyhow I know they were way more than I personally pay for drives for home desktop and server use. I know that because occasionally they fail, and I have to buy new ones.
Do you actually do Enterprise Storage? Because I know people who do.
At the really high end, the machines automatically call home and report a fault to the vendor. The vendor then dispatches someone to replace the faulty bit within the SLA.
In my experience, and from what I've been told by people who do this for a living, the Enterprise class drives come with the benefit of a warranty in which the manufacturer is contractually obligated to get you a replacement within a fixed amount of time.
Anyone doing real enterprise class storage for real mission critical things -- using commercial SATA drives is just not done unless it's cheap/bulk storage. Sure, you pay through the nose to the vendor for that kind of support, but you also have guaranteed service time and availability.
I just don't see evidence of people who do this at an enterprise scale cheaping out on disks for the important stuff.
Lost at C:>. Found at C.
SSDs have their uses, but they're nowhere near cheap enough to replace systems with massive amounts of storage or that rely on RAID.
They're getting really close for primary storage and being used in RAID arrays..
300GB 15k RPM SAS is about $180-$200. An Intel DC S3500 Series SSD (300GB) is around $390. So the price difference of the SSD vs the spinning rust is only about 2x now. And you will probably gain 25x IOPS over that spinning rust.
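A rough sketch of that arithmetic, using the prices quoted above. The $190 midpoint for the SAS drive is my assumption, and the 25x IOPS figure is the poster's estimate, not a measurement.

```python
# Prices as quoted in the comment; the $190 midpoint is an assumption.
sas_price = 190.0     # 300GB 15k RPM SAS, midpoint of $180-$200
ssd_price = 390.0     # Intel DC S3500 Series SSD, 300GB
capacity_gb = 300
iops_gain = 25        # poster's rough estimate of SSD vs. SAS IOPS

price_ratio = ssd_price / sas_price
sas_per_gb = sas_price / capacity_gb
ssd_per_gb = ssd_price / capacity_gb
# Cost per unit of IOPS relative to the SAS drive (SAS = 1.0)
ssd_relative_cost_per_iops = price_ratio / iops_gain

print(f"SSD price ratio: {price_ratio:.2f}x")
print(f"$/GB  SAS: {sas_per_gb:.2f}  SSD: {ssd_per_gb:.2f}")
print(f"SSD cost per IOPS relative to SAS: {ssd_relative_cost_per_iops:.3f}")
```

So per gigabyte the SSD is roughly twice the price, but per IOPS it comes out far cheaper if the 25x estimate holds.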
Bulk storage using 7200 RPM drives is still the domain of spinning rust and will be for a while.
Wolde you bothe eate your cake, and have your cake?
No difference between enterprise and home HDD's that I know of.
As for what "hammering and heavy use " of a drive is?
The biggest killer of HDD's is something called the CSS test cycle.
CSS = Contact Start Stop where the drive is booted up, spun up, and then shut down repetitively.
Generally, a HDD sitting there spinning away is not what kills it off,
however turning them on-off-on-off a lot is the most abusive thing that you can do.
I still think WD makes the best quality out there, but that's just my opinion.
just my 0.02 worth...
www.effectiveelectrons.com "chips that work" Analog, RF, Mixed Signal
Our Dell shelves (billing servers and store customer account info) have hot spares already spinning inside the shelves. NetApp Filers do this also. If a drive fails, the storage system begins IMMEDIATELY transitioning to the spare. So I agree with you wholeheartedly there. Backblaze uses RAID6 for the customer backup storage where we group 15 drives into a RAID group with 2 parity drives. So we can lose any 2 drives out of 15 and the data is still 100% intact. I really, REALLY cannot recommend RAID5 to anybody. Having a lone hard drive is fine for some applications (my laptop), and having RAID6 with 2 parity drives is fine for some applications. I cannot imagine why you would have RAID because you care about your uptime, but not care enough to use more than RAID5.
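To put a number on the RAID5 vs. RAID6 point, here is a minimal sketch that assumes drives fail independently, which is optimistic for drives sharing a chassis. The group size of 15 comes from the comment; the 1% per-drive failure probability is a made-up illustrative value, not Backblaze data.

```python
from math import comb

def prob_more_than(n, k, p):
    """Probability that more than k of n drives fail, assuming each
    fails independently with probability p (an optimistic assumption)."""
    at_most_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    return 1.0 - at_most_k

GROUP_SIZE = 15   # drives per RAID group, from the comment
P_FAIL = 0.01     # hypothetical per-drive failure probability in some window

usable_fraction_raid6 = (GROUP_SIZE - 2) / GROUP_SIZE
loss_raid5 = prob_more_than(GROUP_SIZE, 1, P_FAIL)  # single parity: 2+ failures lose data
loss_raid6 = prob_more_than(GROUP_SIZE, 2, P_FAIL)  # double parity: 3+ failures lose data

print(f"RAID6 usable fraction: {usable_fraction_raid6:.3f}")
print(f"P(data loss) with one parity drive:  {loss_raid5:.6f}")
print(f"P(data loss) with two parity drives: {loss_raid6:.6f}")
```

Whatever value you plug in for `P_FAIL`, the second parity drive costs 1/15 of the raw capacity and cuts the chance of losing the group considerably.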
Consumer drives have this thing called being half the price. Keep one spare; what the heck, if it breaks, go out and buy a new one in an hour, still faster than 4 hours. What kind of enterprise organization wouldn't have a few spare hard drives just in case a few failed? Send the old one back to be replaced in their own good time.
I don't see why you would have to pay a 100% markup for what is basically insurance against manufacturer's defects.
Sort of like airline tickets that you can reschedule, more than 2x the price and still subject to availability (last time my company bought one), just buy the non refundable ticket, if your plans change then buy another one, the average cost is going to be less, unless you change your plans a lot, perhaps you need better planning? You also have travel insurance for such things which is not the cost of the plane ticket, and covers other things too.
No, from TFA:
, so the comparison is indeed pointless (more accurately, it's baseless).
make imaginary.friends COUNT=100 VISIBLE=false
Are you saying that the enterprise drives last longer?
I didn't say that.
Or just that they are replaced for free when they die at the same or higher rates? If you want to save money, I think the answer is *NOT* buy the warranty (so buy consumer drives) because the warranty costs more than just replacing the failed drives?
If your company wants to do that, then do it. But I would think that is a hard sell to the IT directors who want service and replacement parts quickly. Here's the scenario:
1. HD fails
2. Log ticket with HD company and get replacement drive with little cost
or
2. Put in a purchase order for a new drive.
At some companies, buying a new drive outright is more troublesome/bureaucratic than getting a replacement drive.
Well, there's spam egg sausage and spam, that's not got much spam in it.
No it doesn't. You utterly fail to understand what the octane rating means. The engine in your saturn would in no way benefit from the higher octane rating. It could in fact run without noticing a problem with a significantly lower octane rating. Octane ratings matter in high compression engines or turbo/supercharged engines, not in econobox.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
"Enterprise" grade drives are often faster, having better processors and more cache
The cache is whatever is written on the drive, so an "Enterprise" drive with 32 MB of cache has less than a "Consumer" drive with 64 MB. I don't know what the heck you think the word "Enterprise" gets you in this case?
drive manufacturers have to listen to server and storage array manufacturers and meet their requirements
Different storage arrays have different requirements, I hate the idea that people think "Enterprise" magically got all the tradeoffs correct. For example, low power and high responsiveness are BOTH valid goals but probably are at odds. Some Enterprises (like Backblaze and Shutterfly) care deeply about their electrical power bill and the drives aren't the performance bottleneck. Should we buy enterprise drives or not?
You may want to check your environment for heat or dust, or get better power supplies. I can not remember the last drive I have had fail in the warranty period.
I totally agree that "bureaucracy affects IT decisions". In a previous company we sold spam blocking software (we were the good guys) but our customers asked us to provide the software and hardware in a bundle because they had a hard time convincing their management to purchase stand alone computer hardware. So we pre-bought a PC clone, marked it up by a FACTOR OF 4 (for our trouble), put a sticker on the front with our company name and the IT guys happily passed the price on to their managers who happily signed the P.O.
The only major company I know that uses consumer grade HDs in volume is probably Google
What qualifies as "major"? :-) This article is about Backblaze, we have 25,000 consumer hard drives, are we "major"?
Given the cheap PSU's I've seen in a lot of boxes (and the rate of failure), I'd say in many cases that it's a contest between the drives and the PSU, especially when you get to areas with flakey power.
I think you missed his point. With the money you save, buy a spare drive.
6 drives with enterprise warranty: $1800, 12 hour replacement
7 drives with consumer warranty: $1300, instant replacement
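Spelling out the comparison with the two totals above (the dictionary layout and names are just for illustration):

```python
# Totals as given in the two lines above.
enterprise = {"drives": 6, "total": 1800.0}
consumer = {"drives": 7, "total": 1300.0}   # one extra drive kept as a spare

enterprise_per_drive = enterprise["total"] / enterprise["drives"]
consumer_per_drive = consumer["total"] / consumer["drives"]
savings = enterprise["total"] - consumer["total"]

print(f"Enterprise: ${enterprise_per_drive:.2f} per drive")
print(f"Consumer:   ${consumer_per_drive:.2f} per drive")
print(f"Saved, with a spare already on the shelf: ${savings:.2f}")
```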
You can't use a consumer drive in a RAID array if that drive will spend 90 seconds trying to recover a normal read error before sparing the sector out. TLER means "give up almost immediately" on media errors.
Yes, it's a bit of a scam that you have to buy a high-end drive to get TLER, since it's just a flag in the firmware, but it's still critical to RAID.
Socialism: a lie told by totalitarians and believed by fools.
> Do you actually do Enterprise Storage? Because I know people who do.
> At the really high end, the machines automatically call home and report a fault to the vendor. The vendor then dispatches someone to replace the faulty bit within the SLA.
Yes, I deal directly with that, with Big Company and Really Big Company, and I have to say the process doesn't work very well, for many reasons that I won't enumerate here for keep-my-job reasons. In all honesty, we had better uptime and much faster response when we stocked our own spares and hired someone to walk through the machine room daily looking for yellow lights. Sorry, but that has been my experience. After outsourcing storage, the lag from warning light to replacement is significant, with many hilarious hijinks along the way. (My favorite being when they remotely updated the firmware during the same service call as disk replacement and bricked the device.) It's a great example of not getting what you pay for, except the ability to check off managerial line items.
Oliver's law of assumed responsibility: If you're seen fixing it, you will be blamed for breaking it.
I was curious, so I did some math:
320 miles / 15 gallons = 21.33 MPG
250 miles / 15 gallons = 16.67 MPG
320 - 250 = a 70 mile difference in performance.
At 16.67 MPG, 70 miles equates to about 4.2 extra gallons needed to reach 320 miles. So for that person, using premium is like having an extra 4.2 gallons in his tank.
In my state, the best prices I could find for 87 and 92 gas were:
$2.83 for 87 and $3.11 for 92
$2.83 * 19.2 = $54.34
$3.11 * 15.0 = $46.65
So for every 320 miles he drives, he is basically saving $7.69. Not earth shattering, but definitely a win.
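The same calculation as a short script, for anyone who wants to plug in their own numbers. The mileage and prices are the ones in this comment; the variable names are mine.

```python
tank_gallons = 15.0
miles_on_premium = 320.0   # per tank, as reported
miles_on_regular = 250.0   # per tank, as reported
price_regular = 2.83       # $/gallon for 87
price_premium = 3.11       # $/gallon for 92

mpg_premium = miles_on_premium / tank_gallons
mpg_regular = miles_on_regular / tank_gallons

# Gallons of regular needed to cover the distance one tank of premium gives
gallons_regular = miles_on_premium / mpg_regular

cost_regular = price_regular * gallons_regular
cost_premium = price_premium * tank_gallons
savings = cost_regular - cost_premium

print(f"MPG premium: {mpg_premium:.2f}, regular: {mpg_regular:.2f}")
print(f"Regular needed for {miles_on_premium:.0f} miles: {gallons_regular:.1f} gallons")
print(f"Cost regular: ${cost_regular:.2f}, premium: ${cost_premium:.2f}")
print(f"Savings per {miles_on_premium:.0f} miles on premium: ${savings:.2f}")
```

This only holds if the reported 320 vs. 250 miles per tank is real and caused by the fuel, which other replies dispute.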
Doesn't work like that. Knock sensor outputs are used to retard timing. Maximum advance is determined in advance for the engine with recommended fuel. Otherwise a failed knock sensor could quickly result in a destroyed engine.
Perhaps if OP's engine was suffering from excessive knock due to a fault (excessive carbon build-up, incorrect timing, etc.) then higher octane fuel would make a difference.
The real "Libtards" are the Libertarians!
If you think SSDs fail because a part "fails" you lack understanding of how they work.
SSDs have a property called "write endurance" - their data cells are rated to a specific number of writes. Every time you write, you consume some of the remaining write capacity of the drive. It works like a salt shaker: works fine until you run out of salt.
Enterprise drives can have dozens to hundreds of times the write endurance of a consumer drive. For example, the Intel SSDs we use are rated to withstand 100% of the drive's capacity in writes every 24 hours for many years on end. A consumer drive couldn't do that for more than a few weeks, perhaps a month or two.
I'd happily pay 2x or 3x the money to get 20x the write endurance.
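A back-of-the-envelope sketch of what "100% of the drive's capacity every 24 hours" adds up to. The 300 GB capacity, the 5-year span, and the 18 TB consumer rating are hypothetical numbers picked for illustration, not specs for any real drive.

```python
def total_writes_tb(capacity_gb, drive_writes_per_day, years):
    """Total data written, in TB, at a steady number of full-drive writes per day."""
    return capacity_gb * drive_writes_per_day * 365 * years / 1000

def days_until_worn(rated_tb_written, capacity_gb, drive_writes_per_day):
    """Days until a drive rated for rated_tb_written TB hits that limit."""
    return rated_tb_written * 1000 / (capacity_gb * drive_writes_per_day)

# Hypothetical 300 GB drive written once over per day for 5 years
enterprise_total = total_writes_tb(300, 1, 5)

# Hypothetical consumer drive rated for 18 TB written, same workload
consumer_days = days_until_worn(18, 300, 1)

print(f"Total written over 5 years: {enterprise_total} TB")
print(f"Hypothetical consumer drive lasts about {consumer_days:.0f} days")
```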
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Until we see some names, those studies are useless...
I've got better things to do tonight than die.
Backblaze happens to use software RAID6 - standard Debian Linux, we use the built in mdadm tool. Our current pods have 8 GBytes of RAM, so I guess they could theoretically use all of that (and swap) instead of using "crummy RAID controllers with no memory to speak of".
Yeah, no kidding. Back in my younger and less persuasive days, we were on a project where we were forced by PHBs to use consumer drives in an enterprise system (storing and retrieving syslog data in a VERY busy environment). We were literally blowing them out every three months or so until the Powers That Be finally relented and let us put in proper storage (back then that also meant shelling out for a pricy SCSI HBA). I think that the gap has closed somewhat since then, and there are also some interesting options in drives that are purpose-built for things like DVRs and low-volume RAID. Also, back then (I don't know if it's still the case today) enterprise HDDs were tested individually for quality control, whereas consumer HDDs were just randomly sampled from each batch.
For many enterprise applications, though, the difference in things like seek times and sustained data transfer rate can be substantial in a busy environment.
Help save the critically endangered Blue Iguana
I'd happily pay 2x or 3x the money to get 20x the write endurance.
That only makes sense if you are hitting the write limits. If the drive dies because the bearings wear out after 5 years of spinning regardless of the number of writes, you have just paid 3x the money and gotten exactly zero benefit.
Here in Australia, 92 is the standard fuel and 97 is the premium. I can't imagine putting 87 in my car...
Australia displays the "Research Octane Number" on the pumps, while the US displays the "Anti-Knock Index", which is:
((Research Octane Number) + (Motor Octane Number)) / 2
Since MON is often 8-10 points lower for the same fuel, this results in 4-5 points lower on the pump display in the US.
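As a quick check of that conversion, a small sketch. The MON values below are hypothetical, chosen 8 to 10 points under RON to match the range mentioned above.

```python
def anti_knock_index(ron, mon):
    """US pump rating: the average of Research and Motor Octane Numbers."""
    return (ron + mon) / 2

# Hypothetical fuels with MON 10 and 8 points below RON respectively
premium_aki = anti_knock_index(97, 87)
standard_aki = anti_knock_index(92, 84)

print(f"RON 97 / MON 87 -> AKI {premium_aki}")
print(f"RON 92 / MON 84 -> AKI {standard_aki}")
```

So an Australian 92 would show up on a US pump as something in the 87 to 88 range.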
Turbo charged cars and supercharged cars both have a higher compression ratio than naturally aspirated cars.
Wrong; compression ratio is a function of the geometry of the piston/combustion chamber only. In practice, it is actually usually the opposite - a similar engine that is turbo/super-charged will have a lower compression ratio than its NA counterpart, specifically because the forced induction requires less compression ratio to get the same power.
Even my old 4-banger (gutless) 1997 Saturn SL1 sees a difference in pickup between 87 and 89 octane fuels when at highway speeds.
You are deluding yourself. Unless your car uses high or variable compression (it doesn't), there is no benefit whatsoever to higher octane gasoline.
What the heck? The error retry and sector sparing are within the drive itself. ZFS doesn't even see this. What ZFS can see is a drive not responding for 90 seconds after a write command, and ZFS or the driver below the ZFS level does not like this. There is real danger of multiple drives being kicked out of the storage pool quickly and the whole pool failing, when proper drive behavior lets the pool continue undegraded even in the face of bad sectors on multiple disks.
There are plenty of consumer drives that can be set to the same TLER (time-limited error recovery) behavior as enterprise drives, though.
Right on the money about using ZFS, though. I will never understand losers using old fashioned expensive caching RAID controllers when ZFS on dumb SATA/SAS ports is far superior in every way. Many or most of them are Windows losers, of course.
very common for multiple drives in an array to fail within a short time window, due to shared environmental problems
Exactly. We had one interesting incident where in the middle of the night, 3 pods right next to each other in a rack all went berserk and all their RAID fell apart. That's 135 drives all at once (3 pods each with 45 hard drives). We reassembled them all, and the VERY NEXT NIGHT at the same time it happened again. We moved all three servers to different ends of the datacenter -> and finally figured out which server was causing the problems. The fan bearings on a fan were going bad, and when the fan came on it vibrated the entire cabinet. We have "nightly cleanup" jobs that run to verify data integrity and delete files we no longer want, this was enough load to cause the CPU to heat up enough to trigger the bad fan.