Power-Loss-Protected SSDs Tested: Only Intel S3500 Passes
lkcl writes "After the reports on SSD reliability and after experiencing a costly 50% failure rate on over 200 remote-deployed OCZ Vertex SSDs, a degree of paranoia set in where I work. I was asked to carry out SSD analysis with some very specific criteria: budget below £100, size greater than 16Gbytes and Power-loss protection mandatory. This was almost an impossible task: after months of searching the shortlist was very short indeed. There was only one drive that survived the torturing: the Intel S3500. After more than 6,500 power-cycles over several days of heavy sustained random writes, not a single byte of data was lost. Crucial M4: failed. Toshiba THNSNH060GCS: failed. Innodisk 3MP SATA Slim: failed. OCZ: failed hard. Only the end-of-lifed Intel 320 and its newer replacement, the S3500, survived unscathed. The conclusion: if you care about data even when power could be unreliable, only buy Intel SSDs."
Relatedly, don't expect SSDs to become cheaper than HDDs any time soon.
"after experiencing a costly 50% failure rate on over 200 remote-deployed OCZ Vertex SSDs"
Stop gloating about how you got the good batch of OCZ SSDs! Some of us weren't so lucky....
AntiFA: An abbreviation for Anti First Amendment.
and get a UPS. Why blow more money on a slightly more reliable SSD when a UPS is so much cheaper?
These things are already expensive; surely spending a few more cents per unit on a capacitor to ensure power loss reliability isn't a big deal.
The cap only has to be big enough so the controller can do a controlled shutdown.
Slightly more seriously than my last post, the S3500 was the only enterprise-grade SSD tested in that batch. Frankly, I have little sympathy for you if you expected consumer-grade SSDs to perform like Enterprise-grade SSDs in a mission-critical application.
Consumer grade drives, even/especially the "high performance" ones that will often benchmark better than the "overpriced" enterprise drives, ain't designed to have perfect data retention. Of course, consumer or enterprise, any drive can fail and appropriate measures including RAID and backup* should always be in place no matter what type of drive you have.
* Yes, RAID != backup, I know, don't bother making that post.
If I were to pull the plug on a consumer grade mechanical hdd in the middle of a write, would it not lose data as well?
Does this mean the write-cache is NAND too? I do not see that in the features for the SSDs they selected.
Also, why was Samsung excluded? Their 800 series with RAID support has been tested in the past with long term writes with great results.
http://us.hardware.info/reviews/4178/10/hardwareinfo-tests-lifespan-of-samsung-ssd-840-250gb-tlc-ssd-updated-with-final-conclusion-final-update-20-6-2013
I do not mean to plug a particular brand, but the range of SSD's tested in the articles does not seem very expansive nor do they seem to fit into the criteria they specify.
If you have important data, don't store it on an SSD. I own a decent-size small company which ships lots of systems with the better drives (not Intel), with user satisfaction ratings comparable to Intel's SSDs, and they certainly aren't that terribly reliable. They are much better than the junk SSDs, but for real reliability stick with the 7200 RPM or 5400 RPM drives. Sadly the 7200 RPM drives are dead now. Nobody makes them for laptops. I guess the next best thing for speed + a little more reliability is an Intel SSD.
Original research by someone whose identity I can't look up. Hmm.
I'd trust every conclusion except the one that pretty blatantly advertises Intel. I guess that means Toshiba might be worth looking into.
There's still one 7200RPM laptop drive, I just bought a 1TB 7200RPM HGST drive recently...
That said one of the newer Seagate drives scored faster in a speed check. Not sure what to make of that.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Do it again OP with exactly the same parameters, but this time compare SSD's to platter hard drives.
Seven puppies were harmed during the making of this post.
I'm sure the reviewer tested what they had available, but I'm not sure I'd draw any conclusions from this list of drives. The drive that passes is the only current generation drive on the list. Everything else is last generation or older. In the case of the OCZ Vertex, much older. Most of the current popular drives seem to be omitted.
People who have "important data" and fail to make a backup copy - no matter which type of media they are using - deserve to lose their data. Seriously, what you said doesn't only apply to SSD's.
If you are worried about data loss during a power failure wouldn't the money be better spent ensuring there isn't a power loss?
UPS are cheap and reliable, and give you time to shut down.
It's interesting and good to know that the Intel SSD survived thousands of power cycles under load without losing a single byte of data. But my desktop SSD is on a UPS. And my laptop has a battery built into it. So a power failure affecting the SSD in the middle of an operation is pretty much unheard of.
> Isn't this why god created UPS?
When my UPS battery starts going bad, the first sign is that it just cuts the power without warning. If you have a SSD, that could be the deathblow that sends your data bye-bye.
The bigger question, though, is WHY THE FUCK can't we either disable whole-drive encryption, or at least set it to a key WE control, with some means to read the bits from even a drive that's totally nonfunctional SATA-wise (JTAG, SPI, whatever) and reconstruct it offline? That's why I despise Sandforce so much. As if it's not bad ENOUGH that Sandforce-based drives can just die from a single corrupted write, they have to go a step further and make it impossible for end users to do any kind of meaningful data recovery. There's NO REASON why a corrupted SSD should require thousands of dollars of commercial data recovery. If they'd just give us some fucking way to rip the raw bits from the drive, document the data structures, and give us control over the encryption, a fucked up SSD would just be an annoyance.
Steve Jobs created UPS technology?
You're missing the point of this advertisement. Only an Enterprise class Intel drive will save your data. All other factors of the test are irrelevant, like the other drives being consumer grade or that all the other drives were beaten with a rubber mallet for 5 minutes before each test while the Intel was handled with silk mittens attached to 7 grounding points. And you definitely don't need to pay attention to the fact that the power loss with the Intel drive was carried out via software shutdown while the other drives were done by power surging the computer until the motherboards burst into flames.
Nope, pay no attention to that irrelevant information. Just remember that only official certified and authorized Intel drives can protect your data. Now please wait while the next advertisement queues up, which will explain how the Intel drives protect your data with a computer rendering of the drive tucking your data into bed at night before turning off the lights.
I don't suffer from insanity, I enjoy every minute of it!
Correct me if I'm wrong, but from my skim through the article, it seems like he only used a single drive of each type. That makes it hard to demonstrate that the differences he saw were real, and not just random. I.e., it may be that all drives have a 75% chance of surviving the test, and that the Intel one just happened to be the lucky one. A more robust test would be to test N copies of each drive. N = 5 should give pretty good significance if this really is completely deterministic.
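To put rough numbers on the single-sample worry above, here is a minimal sketch. It assumes each drive survives the whole test independently with the hypothetical 75% probability from the comment; the function name is made up for illustration and nothing here comes from TFA.

```python
def chance_all_survive(p_survive: float, n_drives: int) -> float:
    """Probability that n independent drives all pass purely by luck.

    p_survive is a hypothetical per-drive survival probability,
    not a measured figure.
    """
    return p_survive ** n_drives


# How likely is a clean sweep by luck alone, for a few sample sizes?
for n in (1, 3, 5, 10):
    print(n, round(chance_all_survive(0.75, n), 3))
```

Under that assumption, five copies of one model all passing by luck still happens roughly a quarter of the time, so N = 5 is only convincing if failures really are close to deterministic, as the comment says.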
This is all a great theory, until the "data" in question is something like copy protection hackery that someone's high-end software puts on your SSD boot disk without necessarily telling you anything about it.
The only time I had an SSD failure, the hardware guys were great and got a replacement to me the next day, while it took literally weeks (and, in the end, a recorded letter threatening legal action) to get Adobe to let me use the software I already f**king bought on the same f**king PC it was always installed on, after I'd reinstalled everything on the replacement SSD.
If that had been an isolated occurrence, I might be willing to drop the point, but since I know of others who have also been screwed by Adobe's DRM/copy protection mess after a drive failure and I also know of other high-end software providers who play similar games, I don't think "just back everything up" is a good enough answer to unreliable drives. A drive failure typically costs some of us at least an order of magnitude more than just replacing the hardware itself once you factor in downtime, and we shouldn't have to mess around with RAID arrays of SSDs just to compensate for poorly designed products that fail unnecessarily.
If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
That it is losing data outside of the data being written.
Some SSDs are notorious for the firmware's block tables getting corrupted if they're suddenly powered off. Unlike a hard disk, what this means is they could potentially be writing under the assumption that the set of blocks they're reading/writing are meant for an entirely different set of sectors than they actually contain. IE massive data corruption because you're not getting back the data you're assuming you will. Due to the write limits of Flash, the SSDs are basically constantly shuffling the window of writable sectors in order to do 'wear levelling', which means if anything disrupts that process and they're using either old or new physical block locations with the old logical ones, your data may not be ending up as it should be.
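A toy model of the mapping problem described above, as a sketch only: the class and its fields are invented for illustration and do not reflect any real controller's firmware. It assumes the drive keeps its logical-to-physical table in RAM and persists it separately from the data pages.

```python
class ToyFTL:
    """Toy flash translation layer: logical sector -> physical page."""

    def __init__(self, num_pages: int):
        self.flash = [None] * num_pages  # physical pages (survive power loss)
        self.saved_map = {}              # mapping table as last persisted
        self.ram_map = {}                # working mapping table (lost on power cut)
        self.next_free = 0

    def write(self, lba: int, data: bytes, persist_map: bool = True) -> None:
        # Flash is not overwritten in place: new data goes to a fresh page.
        page = self.next_free
        self.next_free += 1
        self.flash[page] = data
        self.ram_map[lba] = page
        if persist_map:
            self.saved_map = dict(self.ram_map)

    def power_cut(self) -> None:
        # Only the persisted copy of the table survives.
        self.ram_map = dict(self.saved_map)

    def read(self, lba: int):
        page = self.ram_map.get(lba)
        return None if page is None else self.flash[page]


ftl = ToyFTL(8)
ftl.write(0, b"old")
ftl.write(0, b"new", persist_map=False)  # data landed, mapping update did not
ftl.power_cut()
print(ftl.read(0))  # prints b'old'
```

Here the drive merely returns stale data. In a real drive the old page may already have been erased or reused by wear levelling, which is where the "not getting back the data you're assuming" kind of corruption comes from.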
If I were to pull the plug on a consumer grade mechanical hdd in the middle of a write, would it not lose data as well?
My only guess is that they're looking at it from the point of view of file system corruption with journaling filesystems, and whether or not stuff committed with sync() is actually safely stored on the drive at that point in time or not. However, the poor way in which the author describes this (assuming it's what he's attempting to describe at all) seriously makes me wonder why I should trust that he knows what the hell he's doing.
Some years ago while discussing design of a journaling filesystem with someone in a newsgroup, we were wondering whether sector writes to a hard disk could be expected to be atomic or not. Once a drive has begun writing to a sector, there's a very tiny amount of time it has to keep power in order to finish the sector, which would seem trivial to store in a capacitor, and with some added circuitry, the momentum in the spindle could supply some power as well. Not to mention that attempting to read a half-written sector is going to cause all sorts of hell for an algorithm that assumes there was a full sector there and so it should be able to error correct it into something meaningful, and might cause it to prematurely declare the sector dead and remap it. So it seemed a bit silly to think that a hard disk wouldn't be able to check its power status between sector writes and simply avoid beginning one which it wasn't going to be able to finish reliably, and this would allow someone to utilize this fact when designing a journaling filesystem since they could at least count on any sector they read to contain valid data even if they couldn't count on whether it was current data or old data. For example, each sector of the journal might have an index number allowing old entries to be distinguished from new ones without worry that the drive died half-way while writing the sector, thus causing it to begin with a recent index number but contain older data at the end. Of course, neither of us knew if this was true of how drives worked or not, but one random person took the time to reply simply "sector writes are atomic" for whatever a random person's word is worth.
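For what it's worth, a journal can defend itself even if sector writes turn out not to be atomic. A minimal sketch, assuming 512-byte sectors and a made-up layout (index number at the start, repeated at the end together with a CRC32); none of this is taken from a real filesystem:

```python
import struct
import zlib

SECTOR = 512
HEADER = struct.Struct("<Q")    # 8-byte index (sequence) number
TRAILER = struct.Struct("<QI")  # index number again + CRC32 of header and payload
PAYLOAD = SECTOR - HEADER.size - TRAILER.size


def pack_sector(seq: int, payload: bytes) -> bytes:
    """Build one journal sector; the payload is zero-padded to fit."""
    if len(payload) > PAYLOAD:
        raise ValueError("payload too large for one sector")
    head = HEADER.pack(seq)
    body = payload.ljust(PAYLOAD, b"\0")
    return head + body + TRAILER.pack(seq, zlib.crc32(head + body))


def unpack_sector(raw: bytes):
    """Return (seq, payload) or None if the sector looks torn or corrupt."""
    if len(raw) != SECTOR:
        return None
    (seq,) = HEADER.unpack_from(raw, 0)
    seq_end, crc = TRAILER.unpack_from(raw, SECTOR - TRAILER.size)
    covered = raw[: HEADER.size + PAYLOAD]
    if seq != seq_end or crc != zlib.crc32(covered):
        return None
    return seq, covered[HEADER.size:]


old = pack_sector(1, b"old entry")
new = pack_sector(2, b"new entry")
torn = new[:256] + old[256:]  # simulate power dying halfway through the write
print(unpack_sector(torn))    # prints None
```

If the drive really does guarantee atomic sector writes, the trailer is redundant and the index number alone is enough, which is the simpler design discussed above.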
Solid state drives have a similar issue in that once they begin rewriting their data structures, if they don't finish, then the data on the drive is going to be rather fucked, particularly since they don't work on sectors like traditional hard drives, but rather, each page of flash holds many sectors, and they're not even in linear order but instead there are wear-leveling algorithms in play. So even when the OS asks the drive to sync(), in the interest of speed, since it will have to combine the sectors written with other sectors and additional wear-leveling data before committing it to flash, it's likely in its interest to lie to the OS and say "OK, it's done" when in fact it's merely committing to writing those sectors before it shuts down even if power is cut immediately. Obviously there are a lot of ways to screw up such a commitment and be unable to deliver upon it, and I assume that's what the author of the article is testing.
...but, hell if I know. It'd be nice to hear from someone who actually knows about these things.
SSDs were made to replace harddrives. what happens when you unplug power from harddrives in the middle of a write?
Hard drives do not constantly re-arrange pages into newly-erased blocks, and so do not constantly have to update the mapping of logical blocks to physical location, so with power removed will most likely just drop whatever file data is in cache, instead of dropping the mapping update like an SSD which potentially results in massive corruption.
Given that the 840 EVO only came out this summer, both those drives are still under warranty.
So why didn't you get them replaced?
Lots of people are using those drives without issue. It sucks that you got two bad ones, but it's hardly representative of the drives as a whole.
Or if you really don't want to deal with them, take them out of the 'garbage bin' and give 'em to someone who'll do the RMA themself for a free drive.
120 gig version. Randomly hangs for anywhere from 30 seconds to 2 minutes at any given time; the access light goes on and the computer becomes more or less non-responsive. I can see the mouse cursor move, but no dice for anything else. Have tried reading forum advice and disabling certain power management settings, same problem. No firmware updates, and it's slow. My daughter's WD 500 gig blue edition is damned near as fast loading levels in games. Pure waste of money, I'll never buy another SSD.
I understand that the reviewer was restricted by the ultra-low price point set by his employer, but the result is that this is a really poor selection of SSDs, many of them obsolete, and is not particularly reflective of the market today. For instance, he reviewed the Crucial M4 (release date: early 2011), but not the newer Crucial M500, which according to reviews has both RAID-style NAND redundancy and a bank of capacitors to protect against power failure. The M500 isn't even all that expensive on a per-GB basis, though it isn't available in the ultra-small sizes the reviewer apparently needed because of his very limited budget.
There are other, even more glaring, omissions. No mention of any Samsung drive? Nothing from SanDisk? These are two of the biggest SSD vendors, and both have a good reputation for reliability. Leaving out their products makes this roundup almost worthless.
The SSD market is advancing so fast that reviewing drives over 2 years old is going to give an extremely misleading impression of the current state-of-the-art.
HDDs, even the cheapest ones nowadays, allow software to enforce the order in which pending data is written to safe permanent storage, and to know that pending data has indeed been safely committed to permanent storage.
Operating systems, file systems and applications build upon this to ensure that, in case of an unexpected crash, you don't end up with a corrupted file system or data. You may lose files created in the last 5 minutes, but you won't end up with a file system so corrupted that you need to re-install your computer.
Databases use this to ensure that, once you've clicked "pay" on an e-commerce site, it will either record it properly or not at all, so you don't end up with half-way situations where you get charged and don't get the product you paid for or vice-versa.
According to reports like TFA and the article TFA was attempting to reproduce, a lot of cheap SSDs break these guarantees.
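As an illustration of how applications lean on those guarantees, here is a minimal sketch of the usual "write a temp file, fsync, rename" pattern on a POSIX system. The function name is made up; the point is that every `os.fsync` call below is only as trustworthy as the drive's handling of cache flushes.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` so a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    renamed = False
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ask the OS and drive to commit the data
        os.replace(tmp, path)      # atomic rename over the old file
        renamed = True
    finally:
        if not renamed and os.path.exists(tmp):
            os.unlink(tmp)
    # Commit the rename itself by syncing the directory (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the drive acknowledges the flush before the data is actually on stable media, a power cut right after this function returns can still lose or mangle the file, no matter how careful the software was.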
That's cute, but approximately 100% of professionals working in graphic design would disagree with you. If someone else made products anywhere near the level of Creative Suite and with better customer service than Adobe, plenty of us would use them.
I can, but you won't be able to decipher it without the magic seer stones of Itsajokelaughdamnit.
Those seer stones are also good for translating ancient gold tablets that detail how to create your own religion.
Depends on the kind of documentation you're asking for.
The behavior is described in the HDD interface standards (ATA/SATA, SCSI/SAS).
The interesting bits are the description of the desired behavior of
- write through caches
- write back caches and FLUSH CACHE EXT or SYNCHRONIZE CACHE
- write back caches and FUA or DPO
If you want documentation on how many drives support and honor this behavior, then I can't give you many pointers.
I don't think there's a SATA HDD on the market which doesn't support and try to honor FLUSH CACHE EXT. Many support FUA/DPO.
Bugs are known, but seem rare: http://forums.seagate.com/t5/Desktop-HDD-Desktop-SSHD/ST3250823AS-7200-8-ignores-FLUSH-CACHE-in-AHCI-mode/td-p/82046.
The PostgreSQL folks keep a page with some information about this issue.
http://wiki.postgresql.org/wiki/Reliable_Writes
They recommend a test for drives.
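This is not the test from that page, just a rough sanity check in the same spirit, sketched under the assumption that you are testing a spinning disk: a 7200 RPM platter turns 120 times per second, so rewriting and syncing the same spot much faster than that suggests a write cache is answering instead of the media. The function name and defaults are made up.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory: str, writes: int = 200) -> float:
    """Rewrite the same 8 bytes and fsync each time; return syncs per second."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for i in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)      # always hit the same location
            os.write(fd, i.to_bytes(8, "little"))
            os.fsync(fd)                      # request a commit to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / max(elapsed, 1e-9)


if __name__ == "__main__":
    print(f"{fsyncs_per_second('.'):.0f} fsyncs/sec")
```

The number means little on an SSD or behind a battery-backed controller, where high sync rates can be legitimate; there, the only real check is pulling the plug the way TFA did.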
Sure, of course. But all the UPSes in the world aren't going to help when a capacitor on that particular system's motherboard pops.
Karma: Terrifying (mostly affected by atrocities you've committed)
I tried to punt the details here toward the references provided, but you raise a good question: why not just use the lifetime percentage exposed at attribute 202/0xCA, "Percent Lifetime Remaining"? There are two problems with that data.
First off, that SMART attribute hasn't been consistent since the drive was released. See M500 960GB MU03 SMART Issue as one observation about the biggest firmware change. I believe that happened after the Tech Report review. The fact that Crucial changed exposing wear data over the life of the drive is itself enough to get it booted from some companies as an immature product.
But let's say you consider that ancient history now. The other side of the complaints here is that the M500 doesn't give wear data in terms of bytes written. If you have two M500 drives that show identical wear data as measured by 202/0xCA, what does that tell you about their respective workloads? Unfortunately, it doesn't tell you anything useful for that purpose without more context. And that's a critical failure for the standard way such things are rated and evaluated now.
Intel publishes white papers for the recommended drive in TFA like DC S3500 Series RAID Workload Characterization, and that gives a lot of data about how to compare production deployments against drive specifications. I did exactly that for their earlier drives in the blog article I referenced.
There's just not quite enough data available from a Crucial M500 to do a similar analysis on it. "Erase count" is really an implementation detail specific to the drive; you can't compare those across different manufacturers. The most useful standard that aims to eliminate the workload-specific aspect from lifespan ratings is JESD218. That also looks at lifetime in terms of terabytes written. There are some really fundamental details that so far seem missing on Crucial's drives. You can back out write data from some of the other statistics, but without a hard published spec for such things I don't consider that very useful.
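To show why wear data in bytes written matters, here is the arithmetic it enables, as a minimal sketch. The endurance rating and daily write volume below are hypothetical placeholders, not specs for any drive mentioned here, and the function name is made up.

```python
def years_to_rated_endurance(rated_tbw: float, host_gb_per_day: float) -> float:
    """Years until host writes reach a terabytes-written (TBW) rating.

    Uses decimal units (1 TB = 1000 GB) and assumes the production
    workload resembles the one the rating was derived from.
    """
    days = rated_tbw * 1000.0 / host_gb_per_day
    return days / 365.0


# Hypothetical drive rated for 72 TBW, hypothetical host writing 40 GB/day.
print(round(years_to_rated_endurance(72, 40), 2))
```

With only a "percent lifetime remaining" figure you can't run this calculation, or compare two deployments, because you don't know how many bytes it took to consume each percent.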