Analyzing Long-Term SSD Failure Rates
wintertargeter writes "It looks like Tom's Hardware has posted the first long-term study of SSD failure rates. The chart on the last page is interesting — based on numbers, it seems SSDs aren't more reliable than hard drives. "
or should we conclude 'just as reliable'?
SSDs haven't even been available "long term". The SSD in this computer hasn't been available for longer than 20% of its warranty period, for example. Extrapolating data from generation zero SSDs to today's SSDs seems foolish.
Did the poster even look at the chart he linked to? Those big lines that shoot up to the top after 1-3 years? They're the failure rates for hard disks. The ones near the bottom? They're the failure rates for SSDs. Now, some of the SSD figures are projected and look quite optimistic, but the number of hard disks failing after three years looks higher than the number of SSDs failing after three years in all of the studies. For most workloads, the SSDs fail less often, and the SSD failures only exceed HD failures very early on in their lifetimes.
I am TheRaven on Soylent News
The chart linked is not terribly useful, since the legend doesn't really explain what the curves are (three completely different curves with the same label, HDD Schroeder 2007).
http://www.geoffreylandis.com
Hasn't this always been the assumption? I've always been told by everyone in every discussion about SSD vs HDD that SSD has a lesser lifetime.
I didn't read TFA but the chart doesn't tell me that "SSDs aren't more reliable than hard drives".. the SSDs were generally 6% or under (assuming the linear progression) whereas regular HDD approached 14%+ after five years. And "Long-term" in the title? The SSD data in the chart only goes for 1 year. Not exactly long term when the chart goes from 1-5 years of use. The actual data for the SSDs is only 20% of the time span.
it seems SSDs aren't more reliable than hard drives.
I have never ever heard a single fucking person make the claim that SSDs were more reliable. Faster, yes. Less reliable, yes. But I've never heard anyone claim they were more reliable as some sort of selling point.
Whoever wrote that article might know a lot about drives. But they don't know a lot about how to write an interesting and readable article.
Never email donotemail@WeAreSpammers.com
THG did a good job separating the cause of 'failures' between SSDs and HDDs as I get asked a lot by potential customers the differences between the two.
SSD is a whole other ball field and, personally, I'm more HDD biased than SSD.
I still see SSDs as a brand new technology that has yet to normalize as much as HDDs have in the past 15 years. However, the article does make me realize that SSDs really won't normalize as much as HDDs have due to how vastly differently they work. So, I'll have to give SSDs some slack in future customer builds.
In the beginning, the issue with SSDs was the write speed, but now the focus in this sector is on the controller chip (i.e. Sandforce) more than the storage medium (SLC vs MLC) the SSD uses.
It looks like having the SSD/HDD combo in a PC is still the best way to go until the storage chips drop considerably.
Previewing comments are for sissies!
Tom's Hardware has a rather sordid history of... biased reporting.
I don't think it's really fair to say at this stage that SSDs aren't more reliable than hard drives.
For one, SSDs are still rather new. Yes, they've been around for a few years but compared to hard drives they are still at the beginning of their development cycle, and it shows: firmware issues and recalls, as stated in the article, may be a heavy contributing factor to the SSD failure rates. We can expect this to drop as manufacturers continually revise their firmware and manufacturing techniques for the better.
For another, the article also notes that the SSD failure rates, to this point, are rather constant. If this trend holds, SSDs can easily outstrip HDDs in reliability by the 3rd or 4th year.
Finally, SSDs have often been called reliable from the perspective of the average consumer. The benefits are obvious: when mishandling occurs (which happens much more often than you'd think, even on desktops), HDDs will have a far higher chance of damage.
Hence, while the results of this article are indeed interesting, perhaps a study done when the technology matures further would be more useful.
I'm sure that some of us have thought about this idea before.
This idea will come true if:
1. the speed of our telecommunication line is fast enough,
2. our telecommunication's charge is low enough.
The basis of this idea is:
There is a storage service provider that leases its storage to individual/corporate customers over the internet, so that customers do not need to bring their notebooks with their hard disks or other storage devices. They just bring an input-output device (keyboard, CPU, RAM, monitor, modem), a telecommunication device (handphone, radio link, etc), and maybe a printer. Customers access this remote storage as if they were accessing their local hard disks.
The advantages of this system are:
1. Customers will not worry about the loss of their data due to loss of or damage to their notebooks. They just buy another notebook, and then connect again to the storage service provider.
2. Customers can use any computer from anywhere at anytime to access their data. They don't need to bring their computers when they go abroad.
3. Customers can count on their storage service provider. The storage service provider must guarantee: a data backup system, freedom from viruses, security of data, the newest versions of application software, etc.
4. The price of the computer will go down.
The disadvantages are :
1. Speed will slow.
2. Telecommunication's charge will go up.
3. The security of data.
I'm sure that this idea will become a standard in the future. With hard disk failure rate not improving, is better to store all of one's data with offsite provider so is accessible everywhere and backed up on probably redundant disk array tape drive backup tape.
Better yet, holographic storage with probability matrix encompassing all permutation of your data, so is always already there haha.
WhereTF are the dotted line projections coming from? Where are the comparison HDD lines?
I have had laptop hard drives fail on me because of the hits they take in my bicycle carrier or on my lap as I bounce along in the shuttle on city roads. My SSDs have yet to fail. Never again will I get a hard drive in my laptop, because of reliability.
I have over 100 Intel 320 and X25-M drives in my organization and not had one fail yet.
Think of flash memory as acid/base chemistry: a one is stored by pH much lower than 7, a zero is stored by a pH much greater than 7. The reaction is confined to pores in a pumice stone. In order to reduce cost, pumice stone with increasingly small bubble cavities (and mineral wall thickness) has been pressed into service.
By the laws of solid state physics, this makes acid/base pumice stone inherently more reliable than magnetic domains spinning on a fluid bearing.
The bottom line here is that every SSD die shrink generation is an entirely new set of loaded dice. Due to the incredible churn rate in IC fabrication technology, no SSD product remains in the market after establishing a solid track record.
It could be that Intel SSD products are like the Staal brothers. Or not. If you're willing to average over a flock of white swans and black swans, it could well be more reliable than HDD storage.
I find the underlying variance frightening. The maturity model sucks, because as fast as they figure out one problem, the problem is immediately replaced by a harder problem, as per Douglas Adams.
The candle that burns twice as bright burns half as long, and you have burned so very, very brightly Roy...
Let me summarize:
A) Chart is worthless. I have never seen a more ambiguous, meaningless chart in my life. They might as well not bother to label things.
B) Lets do a reliability study on SSD's that they don't have any long term data on past 2 years, yet compare it to HDD that typically at least have a 3 year warranty. By that I only mean, I'll go out on a limb and guess that the average failure rate of HDD is > 3 years, if only for economic self preservation.
C) Results in either case depend highly on specific device model and configuration.
The poster probably saw the chart, as they seem to have actually read the article in addition to merely glancing at a picture on the last page. Right below that graph:
There are other numerous quotes as well about MTBF not being equivalent to reliability, correlation to vintage, etc.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Based on numbers, the study shows SSDs to be more reliable than HDDs. The best data I have seen in that article is the following:
SSDs: 1.28--2.19% over 2 years
HDDs: >=5% over 2 years
The HDD data comes from: http://media.bestofmicro.com/2/N/289103/original/google_afrtemputilization_475.png The SSD data comes from the table on Page #6.
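To put the two-year figures above on a per-year footing, here is a minimal sketch. It assumes the same fraction of surviving drives fails each year, which is an assumption of this example and not something the article establishes; the function name is made up for illustration.

```python
def annualized_rate(cumulative_fraction, years):
    """Equivalent constant yearly failure fraction for a cumulative figure.

    Assumes the same fraction of surviving drives fails each year
    (an assumption of this sketch, not a claim from the article).
    """
    return 1.0 - (1.0 - cumulative_fraction) ** (1.0 / years)


# Cumulative two-year figures quoted in the comment above.
ssd_low = annualized_rate(0.0128, 2)
ssd_high = annualized_rate(0.0219, 2)
hdd_floor = annualized_rate(0.05, 2)  # ">=5%" is only a lower bound

print(f"SSD: {ssd_low:.2%} to {ssd_high:.2%} per year")
print(f"HDD: at least {hdd_floor:.2%} per year")
```

Since HDD failures are not expected to be spread evenly over time, the HDD number here is only a rough average over the two years.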
I don't think any of this data is particularly surprising, HDDs are mechanical so the curves for failure would not be linear. The most interesting part of the article for consideration with SSDs is that SMART is going to be near useless for them. Since most failures are random occurrences in electronics which SMART isn't good at detecting, we may need better technology for detecting SSD failures.
It is also a lot easier to retrieve data from a disk than from an SSD, which most of the time goes without warning.
According to a perfectly baseless linear interpolation on several charts, SSDs have similar failure rates to HDDs. Just great... Call me back when we have 5 years of solid data, not just conjecture and inference.
Stupidity is the root of all evil.
Referring to the chart on the last page, the HDs look linear at the beginning too. I guess we don't really know yet. What if the SSDs start failing at a higher rate at three years?
A waste of time. Read some of the FAQs people make on how to get the most from an SSD with a Windows machine and keep it reliable. There is a laundry list of 5-20 things to do and maintain and monitor. All for what? Benchmarks and the cool factor on your freaking desktop? What are these people doing with their desktops that the 50% bootup time reduction and speed of SSD as the boot drive is that important and worth the hassle and money to get that "performance" edge, and was the SSD the major bottleneck that could not be overcome with other hardware at a reduced cost?
I'm not against SSD. I maintain SANs at work with tiered storage including SSD, and we have found some improvement in certain circumstances where disk IO/latency was a bottleneck; for disk cache it is an advantage for the rest of the systems. For us, the ROI was questionable compared to other routes we could have taken, and if our SAN vendor had not sweetened the deal and led us in that direction, it would not have even been close.
I want to know if SSDs are more reliable than HDDs in an environment full of cat hair. I've never had a SCSI HDD outlast its warranty.
If you read the article you'll understand that the AFR for the only two years of data they could get their hands on for SSD shows the reliability is not different from 2 years of reliability of enterprise drives. Since there was no information after two years you cannot draw a conclusion of the AFR.
Glad I'm not going bonkers; the chart did say the opposite....
I read TFA, and to me, it seems like a bunch of samples that aren't necessarily comparable nor do they necessarily agree with each other.
This is interesting as a starting point, and I applaud Tom's Hardware for the effort they have put into this article, but I think we will need a lot more data before we can get meaningful conclusions.
What did surprise me, though, are the return rates on hard disks. Multiple percent in a single year seems high to me! I'm glad I'm not in the hardware business.
Please correct me if I got my facts wrong.
We don't buy SSDs because they are more reliable (they don't seem to be in our large RAID arrays), but because they are faster than HDDs.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
That's my take. And unlike a hard drive, firmware is something which can be continuously improved. SSD manufacturers are starting to understand and deal with the failure modes.
One thing they don't mention is off-line storage. If I take a hard drive out of service and store it on a shelf for a year, it's virtually guaranteed to fail when I power it up. That is, every single HDD I've taken off the shelf will tend to work for a short while, long enough for me to get the data off of it usually, but every single one has failed within a month of being repowered.
I expect over the next few years the combination of firmware improvements and flash improvements (that is, improvements in being able to predict a flash cell failure) will result in the SSDs running away in the reliability department. Hard drives have been around for a long time and yet they still fail at a horribly high rate... too high for the higher capacities they now have. Intel has certainly already seen the light.
Several vendors are now putting high-capacity caps in their SSDs to remove one common failure mode... exploded meta-data/tables due to unexpected power downs, which is a particular problem for SSDs which use idle time for wear leveling activities.
One thing for sure, we are going to get some excellent statistics over the next decade.
-Matt
Any thread on SSD failures should include a link to Jeff Atwood's blog entry on the topic:
Full post here: http://www.codinghorror.com/blog/2011/05/the-hot-crazy-solid-state-drive-scale.html
You are using the wrong sort of cat.
Put the disk in a box with SchrÃdinger's cat .. then the drive can be both alive and dead at the same time as the cat ;)
--I thought I was wrong once, but I was mistaken.
I think a good question is how HDs fail vs how SSDs fail.
There are two distinct ways that an HD can fail, either the circuitry on the PC board goes bad or what's inside the sealed chamber goes bad.
In the former case the data SHOULD be recoverable. In the latter case there are three possible failure points: the platter motor, the head stepper motor, or the heads themselves. In the first two cases the drive could be repaired and the data salvaged, but it will take more effort and money to do so. If the heads themselves failed (with a resulting crash) the platters are likely destroyed and with them the data. Note that with the loss of the platter motor the heads will come to rest on the platter at some point, hopefully with a soft landing or after they have been retracted by the drive's protection features. If the head position servos died it's possible that some areas of the platters were damaged if the heads didn't come to a soft landing (depends on whether the drive's electronics will refuse to spin up the platter if the servos are bad).
SSD failures are mostly in the storage itself. Maybe special firmware can be downloaded to recover something, if anyone has figured out how to do this.
Oh boy, did /. make a mess of Schrodinger's with an umlaut :)
I must preview before I submit!
--I thought I was wrong once, but I was mistaken.
Wouldn't you expect to see a bathtub curve for each item in the graph? Right at the very beginning, at year 0, there should be a higher-than-zero failure rate that decreases before it begins increasing again. Or is that part of the chart obscured by manufacturer burn-in that you do not get to see because year 0 is actually after burn-in?
Could someone explain why the failure rate for HDDs is a curve and the failure rate for SSDs (so far) is a line?
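For readers unfamiliar with the term, a bathtub curve is often sketched as the sum of three hazard terms: a falling infant-mortality term, a constant random-failure term, and a rising wear-out term. The sketch below uses Weibull hazards for the falling and rising parts; every parameter value is invented for illustration and none of them comes from the article or its chart.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at age t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Illustrative bathtub hazard (failures per drive-year) at age t_years.

    All parameters below are made up for this sketch.
    """
    infant = weibull_hazard(t_years, shape=0.5, scale=20.0)   # shape < 1: falls with age
    random_failures = 0.02                                    # constant background rate
    wear_out = weibull_hazard(t_years, shape=4.0, scale=6.0)  # shape > 1: rises with age
    return infant + random_failures + wear_out


for age in (0.1, 0.5, 1.0, 2.0, 3.0, 5.0):
    print(f"age {age:>4} years: hazard {bathtub_hazard(age):.3f} per year")
```

If burn-in removes the earliest failures before drives ship, the left wall of the tub would be partly hidden in field data, which is one possible reading of the question above.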
Firmware failure seems most prevalent in SSDs if you read the forums. Every day folks are bitten by firmware from all the companies. Better to stay with slow media for a while longer.
Are like wine, & "get better with age"? I doubt it, & especially NOT if they're based on FLASH memory (trim & garbage collection types notwithstanding)... to wit:
"Lets do a reliability study on SSD's that they don't have any long term data on past 2 years, yet compare it to HDD that typically at least have a 3 year warranty. By that I only mean, I'll go out on a limb and guess that the average failure rate of HDD is > 3 years, if only for economic self preservation." - by DarthVain (724186) on Friday July 29, @10:12AM (#36920924)
I have found WD HDD's to be EXTREMELY reliable, both on the job & @ home (I still have WD 212mb -> 242mb "Caviars" from 1992-1994 iirc, that STILL RUN HERE no less in fact)...
In fact, my disks @ home tend to run a lot longer than 3++ yrs. here, easily (I've only had to "trade in" 2 WD Raptors, ever, since their inception/release in fact & I run 4 of them & have had a total of 6 of them (& when they "bit it", it was RIGHT AWAY outta the box only, not after usage (just lemons)).
WD Raptor/Velociraptors, imo & experience with many oem's models & types? Best there is, or rather, best I have ever tried!
Yes... they're good stuff, but, co$tly ( but... you DO get what you pay for!) & they absolutely "HAUL A$$" to boot!
I also defrag & turn up caching OS side, "to-the-max" as well, plus run my disks (Velociraptors with 8mb cache onboard in the form of buffering) also thru a Promise Ex-8350 PCI Express x16 slot with 128mb of ECC caching RAM on them...
Yes, caching... it's beneficial not only for performance, but also, longevity!
I feel caching data from HDD's helps to stall off excessive head movements (which I feel IS the "death of disks", or a major contributor thereof).
Fact is, I've been using Caching controllers on drives @ home ever since 1993-1994 iirc (first was a TekRam with 16mb of 30 pin FastPage RAM on it in fact, on Windows 3.11 + DOS).
I had them for machines that had ISA, VLB, & lately PCI Expresses bus lanes/circuits on them... for a long period, I could NOT find one that ran for the PCI 2.2 bus though (not even SURE they had them then, probably did, but I did not have them @ least during PCI 2.2's "reign").
Caches - They help keep "data in ram" as long as it's not "flagged dirty" along with OS side diskcache kernel mode subsystems, & alongside defrags, keep head movements down (or less)... smart move for SPEED, and imo also? Longevity too!
I also use my Gigabyte IRAM 4gb TRUE SSD (based on DDR2 RAM & SATA I/II bus) &/or CENATEK "RocketDrive" (old faithful here this one) to OFFLOAD my std. HDD's, by moving these things to them:
---
A.) Pagefile.sys placement (1/2 of 4gb IRAM in own partition)
B.) WebBrowser cache, history, & actual browser program placements
C.) Print Spooler location
D.) %Comspec% location
E.) %TEMP% and %TMP% ops for OS + Apps
F.) Operating System & Application Event Loggings & logging in general ... and more!
---
Thus, lessening work duties performed on my HDD's resulting not only in speeds of those processes/activities being faster because they are on a FASTER media, but also by lessening workloads of my 10k rpm 8mb buffered Velociraptors too (dual bonus for both speed, AND LONGEVITY)...
* Think about it!
So, anyhow/anyways:
As far as studies like these?
Well... unless I have SOLID long-term proof of SSD longevity, especially FLASH based SSD tech based ones?
Hey - I just don't see FLASH memory based products outlasting std. mechanical HDD's even, & certainly not "True SSD's" types (more on that below) not based on FLASH ram...
APK
P.S.=> Now, I actually have proof of "superior longevity" (certainly superior to FLASH units), & from a TRUE SSD (NOT on an SSD based on FLASH) in a CENATEK 2gb PC-133 SDRAM "RocketDrive" unit, which still runs here flawlessly, & has since late 2002 in fact!
(In fact? Well - I'd wager since it has NO MOVING PARTS, it will even outlast my far, Far, FAR NEWER WD "Velociraptors" in fact)...
... apk
For example:
Steadfast Networks' Karl Zimmerman, as quoted from bottom of this page (emphasis added)
We simply get significantly higher I/O [with SSDs] at a lower cost than we'd be able to get with standard drives. We've had many customers needing more I/O than what 4x 15k RPM SAS drives in RAID 10 provide, and an upgrade involves moving to a larger server chassis to support more than four drives, a larger RAID card, etc. Other configurations have needed 16+ 15k RPM drives to get the necessary I/O. Going with a single SSD (or a couple SSDs in RAID) greatly simplifies the configurations and makes them much cheaper overall.
That is then compounded by the fact that you generally use one SSD to replace 4+ standard drives on average.
You're then looking at a 20%+ AFR with hard drives and 1.6% with an SSD.
(AFR = Annual Failure Rate)
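One way to see how a multi-drive setup reaches a figure of that order is sketched below. The 5% per-drive AFR is an assumption chosen for illustration (the quote does not give a per-drive number), and the drives are assumed to fail independently. The function name is hypothetical.

```python
def chance_any_drive_fails(per_drive_afr, drive_count):
    """Probability that at least one of drive_count drives fails within a year.

    Assumes independent failures with the same annual failure rate.
    """
    return 1.0 - (1.0 - per_drive_afr) ** drive_count


# Assumed 5% per-drive AFR for hard drives (illustrative, not from the quote).
four_hdds = chance_any_drive_fails(0.05, 4)
# Single SSD at the 1.6% AFR quoted above.
one_ssd = chance_any_drive_fails(0.016, 1)

print(f"4 HDDs: {four_hdds:.1%} chance of at least one failure per year")
print(f"1 SSD:  {one_ssd:.1%} chance of a failure per year")
```

Simply adding the per-drive rates (4 x 5% = 20%) slightly overstates the result, because it counts years with two failures twice. Note also that this is the chance of having to replace a drive, not the chance of losing data, since the array in the quote is RAID 10.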
What I care about is MTTDL (Mean Time to Data Loss) of a complete system. Hard drives are unreliable as is any component in your computer. That's why you should make at the very least every individual component your data path follows double or triple redundant. This is easily accomplished these days, ZFS, RAID6 etc. Make sure you don't just rely on the mechanical parts of your system to keep your data safe.
What else you want to know about is undetected data corruption. Hard disks are very bad at keeping data. SSDs may wear out too (as in the end there is still a physical/mechanical process), but you might be able to read the cells even if you can't write to them. On average hard disks will give an uncorrected error every 12TB read.
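The "every 12TB" figure is close to one error per 10^14 bits read, which works out to 12.5 TB. As a rough sketch of what that means for a large read such as a RAID rebuild, the code below treats errors as independent rare events (a Poisson model). That model is an assumption of this example; real drives may behave better or worse, and the function name is hypothetical.

```python
import math


def chance_of_read_error(tb_read, tb_per_error=12.0):
    """Probability of at least one uncorrected error while reading tb_read TB.

    Models errors as a Poisson process averaging one per tb_per_error TB,
    which is an assumption of this sketch.
    """
    return 1.0 - math.exp(-tb_read / tb_per_error)


for size_tb in (1.0, 2.0, 6.0, 12.0):
    print(f"read {size_tb:>4} TB: {chance_of_read_error(size_tb):.1%} chance of an error")
```

Reading the full 12 TB does not make an error certain under this model; "one per 12 TB on average" leaves a sizeable chance of reading that much cleanly.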
Besides that, SSDs give way more IOPS than any hard drive available (even the 15k RPM ones). So you can keep the system's MTTDL much higher, as you have fewer parts, and for the same price you could even invest more in redundancy (double mirror an SSD) instead of a set of RAID10's.
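A textbook approximation makes the MTTDL point concrete: for a two-way mirror with independent failures, data is lost only if the second drive dies while the first is being replaced, which gives roughly MTTF^2 / (2 * MTTR). The sketch below uses that approximation. The 5% AFR, the 24-hour repair time, and the function names are all assumptions for illustration, and the model ignores read errors during rebuild and correlated failures.

```python
import math

HOURS_PER_YEAR = 8766.0  # 365.25 days


def mttf_hours_from_afr(afr):
    """Mean time to failure implied by an annual failure rate.

    Assumes a constant failure rate (exponential lifetimes).
    """
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


def mirror_mttdl_hours(mttf_hours, mttr_hours):
    """Approximate mean time to data loss for a two-way mirror.

    Uses the common MTTF^2 / (2 * MTTR) approximation, which assumes
    independent failures and MTTR much smaller than MTTF.
    """
    return mttf_hours ** 2 / (2.0 * mttr_hours)


mttf = mttf_hours_from_afr(0.05)                     # assumed 5% AFR per drive
mttdl = mirror_mttdl_hours(mttf, mttr_hours=24.0)    # assumed 24 h to replace and resync

print(f"single drive MTTF: {mttf / HOURS_PER_YEAR:.1f} years")
print(f"mirrored MTTDL:    {mttdl / HOURS_PER_YEAR:.0f} years")
```

The repair time matters as much as the drive quality in this formula: halving MTTR doubles the estimated MTTDL.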
Custom electronics and digital signage for your business: www.evcircuits.com
You might find parts in an SSD which you can use for pinning stuff to your fridge. But anyone who ever put a hard drive magnet on his fridge knows that nothing can hold your data as reliably as a hard drive magnet.
By definition anything that contains powerful magnets is cool.
(Oh, and a fast spinning BLDC motor.)
Shiny platters!
As a RAID system engineer, it's good to see that I'll still have a job after disks stop spinning.
The only data they could possibly have that is that old is data on some of the earliest mass produced consumer SSD drives. These were first generation products and logic would dictate that the drives being made today would be far more reliable. I think it's too early to try and draw any conclusions, everyone knows there were lots of problems with the first generation of drives, like most first gen products (eg - lack of TRIM, stuttering with the early Indilinx controllers, etc).
But... But... Where???
Do you keep your critically important hosts file?
I'll go out on a limb and guess it's loaded from the fake SSD to the real SSD, and kept in the cache as well?
Please Advise!
Exactly! If you just spent 3 hours installing and updating your system, what better time to make a backup image!
It would be my second, the first being after install, before updates.
It's just too easy not to!
Because of 0 ms access/seek speed, the 1st part of the File I/O Seek/Open/Read-Write/Flush/Close Cycle...
* AND then, once my HOSTS file has been operated on once?
It's then cached into RAM by the local diskcache subsystems running in kernelmode (Ring 0/RPL 0) after it's initially read the first time, for even better speed!
(I have said this before here many times troll and I've used it to "blow away" naysayer trolls before in fact. You're obviously one of them, so... Glad you remembered it)
APK
P.S.=> A pity that the best you have is your "ac trolling" though... & nothing that could ever disprove points in things in technical in computing I write about here!
... apk
As someone involved in reliability and usable lifetime of both technologies, let me start by saying: HDD vs. SSD is apples to oranges in the first place. Each has radically different fundamental physical failure mechanisms so you can't strictly compare them. The fact that the chart linked has exponential HDD and linear SSD lifetime curves is a hint of this. Fundamentally, HDDs fail proportional to spin-up time while SSDs fail proportional to data volume so you get different results. Power down and park an HDD and you get longer lifetime. Write-erase less data or fewer times on an SSD and you get longer lifetime. Not the same use model so you get different incomparable results.
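The "SSDs fail proportional to data volume" half of this can be sketched with the usual back-of-envelope endurance estimate. Every number below (capacity, program/erase cycles, write amplification, daily writes) is a hypothetical input chosen for illustration, not a figure from the article or from any specific drive, and the estimate covers wear-out only, not firmware bugs or random electronic failures.

```python
def ssd_years_to_wear_out(capacity_gb, pe_cycles, host_gb_per_day,
                          write_amplification=1.5):
    """Rough years until flash wear-out under a steady write load.

    Assumes perfect wear leveling, so total flash writes available are
    capacity * P/E cycles, reduced by write amplification.
    """
    host_writes_available_gb = capacity_gb * pe_cycles / write_amplification
    return host_writes_available_gb / host_gb_per_day / 365.0


# Hypothetical 120 GB drive rated for 3000 P/E cycles.
light_use = ssd_years_to_wear_out(120, 3000, host_gb_per_day=10)
heavy_use = ssd_years_to_wear_out(120, 3000, host_gb_per_day=20)

print(f"10 GB/day: about {light_use:.0f} years to wear-out")
print(f"20 GB/day: about {heavy_use:.0f} years to wear-out")
```

Doubling the daily write volume halves the estimate, whereas leaving the drive powered on and idle does not change it at all, which is the contrast with the HDD use model described above.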