TRIM and Linux: Tread Cautiously, and Keep Backups Handy
An anonymous reader writes: Algolia is a buzzword-compliant ("Hosted Search API that delivers instant and relevant results") start-up that uses a lot of open-source software (including various strains of Linux) and a lot of solid-state disk, and as such sometimes runs into problems with each of these. Their blog this week features a fascinating look at troubles that they faced with ext4 filesystems mysteriously flipping to read-only mode: not such a good thing for machines processing a search index, not just dishing it out.
"The NGINX daemon serving all the HTTP(S) communication of our API was up and ready to serve the search queries but the indexing process crashed. Since the indexing process is guarded by supervise, crashing in a loop would have been understandable but a complete crash was not. As it turned out the filesystem was in a read-only mode. All right, let's assume it was a cosmic ray :) The filesystem got fixed, files were restored from another healthy server and everything looked fine again. The next day another server ended with filesystem in read-only, two hours after another one and then next hour another one. Something was going on. After restoring the filesystem and the files, it was time for serious analysis since this was not a one time thing."
The rest of the story explains how they isolated the problem and worked around it; it turns out that the culprit was TRIM, or rather TRIM's interaction with certain SSDs: "The system was issuing a TRIM to erase empty blocks, the command got misinterpreted by the drive and the controller erased blocks it was not supposed to. Therefore our files ended-up with 512 bytes of zeroes, files smaller than 512 bytes were completely zeroed. When we were lucky enough, the misbehaving TRIM hit the super-block of the filesystem and caused a corruption."
Since SSDs are becoming the norm outside the data center as well as within, some of the problems that their analysis exposed for one company probably would be good to test for elsewhere. One upshot: "As a result, we informed our server provider about the affected SSDs and they informed the manufacturer. Our new deployments were switched to different SSD drives and we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel."
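For anyone who wants to test for the symptom described in the quote (runs of 512 zero bytes inside files), here is a minimal sketch. It assumes the damage is aligned to 512-byte blocks, which the summary does not actually state, and the function name `find_zeroed_blocks` is made up for illustration. Files that legitimately contain zero-filled blocks will show up as false positives, so treat a hit as a reason to compare against a known-good copy, not as proof of corruption.

```python
from typing import List

BLOCK_SIZE = 512  # assumption: damage is aligned to 512-byte blocks


def find_zeroed_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return byte offsets of block-aligned chunks that consist only of zero bytes."""
    offsets = []
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        # A trailing chunk shorter than one block still counts if it is all zeroes,
        # which covers the "files smaller than 512 bytes" case.
        if chunk and chunk.count(0) == len(chunk):
            offsets.append(start)
    return offsets


if __name__ == "__main__":
    healthy = b"x" * 1536
    damaged = healthy[:512] + bytes(512) + healthy[1024:]
    print(find_zeroed_blocks(healthy))
    print(find_zeroed_blocks(damaged))
```

To scan a real file, read it in binary mode and pass the bytes in; for large files you would want to read block by block instead of loading everything at once.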
I suggest we call it SNATCH.
If you were me, you'd be good lookin'. - six string samurai
I'll Google in a moment, but I was wondering if anyone knew of any good sites that maintain lists of good/bad SSDs for Linux. With the number of vendors out there nowadays, having to scan the source seems like a poor way to track the information.
I do not fail; I succeed at finding out what does not work.
This is why Apple doesn't support TRIM in third-party SSDs...
The only TRIM use I recommend is running it on an entire partition, e.g. the swap partition, at boot, or before initializing a new filesystem. And that's it. It's an EXTREMELY dangerous command which results in non-deterministic operation. Not only do SSDs have bugs in handling TRIM, but filesystem implementations almost certainly also have ordering and concurrency bugs in handling TRIM. It's the least well-tested part of the firmware and the least well-tested part of the filesystem implementation. And due to cache effects, it's almost impossible to test it in a deterministic manner.
You can get close to the same performance and life out of your SSD without using TRIM by doing two simple things. First, use a filesystem with at least a 4KB block size so the SSD doesn't have to write-combine stuff on 512-byte boundaries. Second, simply leave a part of the SSD unused. 5% is plenty. In fact, if you have swap space configured on your SSD, that's usually enough on its own (since swap is not usually filled up during normal operation), as long as you TRIM it on boot.
-Matt
Just from reading the summary it's clear that it should be:
TRIM and SSDs: Tread Cautiously, and Keep Backups Handy
see ata_blacklist_entry
(reformatted to get past Slashdot's 'junk' filter)
static const struct ata_blacklist_entry ata_device_blacklist [] = {
/* Devices with DMA related problems under Linux */
WDC AC11000H, NULL, ATA_HORKAGE_NODMA
WDC AC22100H, NULL, ATA_HORKAGE_NODMA
WDC AC32500H, NULL, ATA_HORKAGE_NODMA
WDC AC33100H, NULL, ATA_HORKAGE_NODMA
WDC AC31600H, NULL, ATA_HORKAGE_NODMA
WDC AC32100H, 24.09P07, ATA_HORKAGE_NODMA
WDC AC23200L, 21.10N21, ATA_HORKAGE_NODMA
Compaq CRD-8241B, NULL, ATA_HORKAGE_NODMA
CRD-8400B, NULL, ATA_HORKAGE_NODMA
CRD-848[02]B, NULL, ATA_HORKAGE_NODMA
CRD-84, NULL, ATA_HORKAGE_NODMA
SanDisk SDP3B, NULL, ATA_HORKAGE_NODMA
SanDisk SDP3B-64, NULL, ATA_HORKAGE_NODMA
SANYO CD-ROM CRD, NULL, ATA_HORKAGE_NODMA
HITACHI CDR-8, NULL, ATA_HORKAGE_NODMA
HITACHI CDR-8[34]35, NULL, ATA_HORKAGE_NODMA
Toshiba CD-ROM XM-6202B, NULL, ATA_HORKAGE_NODMA
TOSHIBA CD-ROM XM-1702BC, NULL, ATA_HORKAGE_NODMA
CD-532E-A, NULL, ATA_HORKAGE_NODMA
E-IDE CD-ROM CR-840, NULL, ATA_HORKAGE_NODMA
CD-ROM Drive/F5A, NULL, ATA_HORKAGE_NODMA
WPI CDD-820, NULL, ATA_HORKAGE_NODMA
SAMSUNG CD-ROM SC-148C, NULL, ATA_HORKAGE_NODMA
SAMSUNG CD-ROM SC, NULL, ATA_HORKAGE_NODMA
ATAPI CD-ROM DRIVE 40X MAXIMUM, NULL, ATA_HORKAGE_NODMA
_NEC DV5800A, NULL, ATA_HORKAGE_NODMA
SAMSUNG CD-ROM SN-124, N001, ATA_HORKAGE_NODMA
Seagate STT20000A, NULL, ATA_HORKAGE_NODMA
2GB ATA Flash Disk, ADMA428M, ATA_HORKAGE_NODMA
/* Odd clown on sil3726/4726 PMPs */
Config Disk, NULL, ATA_HORKAGE_DISABLE
/* Weird ATAPI devices */
TORiSAN DVD-ROM DRD-N216, NULL, ATA_HORKAGE_MAX_SEC_128
QUANTUM DAT DAT72-000, NULL, ATA_HORKAGE_ATAPI_MOD16_DMA
Slimtype DVD A DS8A8SH, NULL, ATA_HORKAGE_MAX_SEC_LBA48
Slimtype DVD A DS8A9SH, NULL, ATA_HORKAGE_MAX_SEC_LBA48
/* Devices we expect to fail diagnostics */
/* Devices where NCQ should be avoided */
/* NCQ is slow */
WDC WD740ADFD-00, NULL, ATA_HORKAGE_NONCQ
WDC WD740ADFD-00NLR1, NULL, ATA_HORKAGE_NONCQ
/* http://thread.gmane.org/gmane.linux.ide/14907 */
FUJITSU MHT2060BH, NULL, ATA_HORKAGE_NONCQ
/* NCQ is broken */
Maxtor *, BANC*, ATA_HORKAGE_NONCQ
Maxtor 7V300F0, VA111630, ATA_HORKAGE_NONCQ
ST380817AS, 3.42, ATA_HORKAGE_NONCQ
ST3160023AS, 3.42, ATA_HORKAGE_NONCQ
OCZ CORE_SSD, 02.10104, ATA_HORKAGE_NONCQ
/* Seagate NCQ + FLUSH CACHE firmware bug */
ST31500341AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN
ST31000333AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN
ST3640[36]23AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN
ST3320[68]13AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN
/* Seagate Momentus SpinPoint M8 seem to have FPMDA_AA issues */
ST1000LM024 HN-M101MBB, 2AR10001, ATA_HORKAGE_BROKEN_FPDMA_AA
ST1000LM024 HN-M101MBB, 2BA30001, ATA_HORKAGE_BROKEN_FPDMA_AA
/* Blacklist entries taken from Silicon Image 3124/3132 Windows driver .inf file - also several Linux problem reports */
HTS541060G9SA00, MB3OC60D, ATA_HORKAGE_NONCQ
HTS541080G9SA00, MB4OC60D, ATA_HORKAGE_NONCQ
No we need a TRIM standard first.
by TheSpoom (715771) Uncaring Linux user here. I have nothing to add to this but please continue. *munches popcorn*
It's not the fault of TRIM... but Linux guys will code a fix for the offending hw before we can blink. Is this shady maneuvering at top levels of hardware design by competing OS parties to cause Linux to take a reliability hit? Or just an oversight bug?
Just typical crappy SSD firmware. Writing 0s to the wrong fucking block will trip up any OS!
Don't buy Samsung SSDs.
It sounds like a kind of infection. The kind you get, you know, down there
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Seriously? In 2015? SSDs are "becoming" the norm?
"we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel"
???? SERIOUSLY???
I got some bad news for you, Sunshine.
LOL@vword: berating
Or you could bookmark the source file containing the list from the Linux kernel repo?
SSDs are great technology, but it's a race to the bottom. The manufacturers will work with you if you help them redesign their circuits (mostly by adding capacitors) to fulfill the original design specs. Everyone else, who can't force them to stop producing cheaper and cheaper garbage, will just have to eat it.
It's fucking sand guys, how much cheaper can you get?
Linus is on vacation so you'll have to wait on your next SSD purchase for him to return to merge the patches....
Or if you're going to use consumer ones, vet the hell out of them.
No sir, I don't like it.
I wonder if this issue has anything to do with why Apple only supports TRIM on specific drives they OEM?
I had the same exact thing happen with the consumer micro SD card in my phone. It went to read-only for no reason. I reformatted it but it still acts kind of flakey.
The classic landing strip is a safe default.
Is it too much to ask for the actionable piece of information to be in the summary, e.g. the identity of the offending SSDs?
We have several servers with RAID'ed Samsung 840s and have had no issues for two years now. They are running on LSI RAID controllers with built in TRIM support, though.
So perhaps this is an issue with the software raid they used?
I have an 850 Pro at home and an 850 EVO at work, and haven't experienced any corruption. I know that Windows uses TRIM. Why am I not seeing any problems?
I doubt EXT4 or whatever part of Linux is issuing TRIM commands is doing it wrong, but they're clearly doing it differently, and maybe it can be worked around or at the very least reported to the manufacturer to fix broken firmware.
It takes a couple of links and searching through source code to get there. So here's the list of problematic drives, better formatted but still in regular expression format:
Micron_M500*
Crucial_CT*M500*
Micron_M5[15]0*
Crucial_CT*M550*
Crucial_CT*MX100*
Samsung SSD 8*
So, basically, all the ones I thought were the best. The list of whitelisted drives after it only includes those brands, Intel, and ST-something. So other brands may be unknowns.
(T>t && O(n)--) == sqrt(666)
From the link to the original article:
Broken SSDs:
SAMSUNG MZ7WD480HCGM-00003
SAMSUNG MZ7GE480HMHP-00003
SAMSUNG MZ7GE240HMGR-00003
Samsung SSD 840 PRO Series (recently blacklisted for 8-series blacklist)
Samsung SSD 850 PRO 512GB (recently blacklisted as 850 Pro and later in 8-series blacklist)
Working SSDs:
Intel S3500
Intel S3700
Intel S3710
See also https://github.com/torvalds/linux/blob/e64f638483a21105c7ce330d543fa1f1c35b5bc7/drivers/ata/libata-core.c#L4109-L4286
The Crucial MX100 with the latest MU02 firmware is now whitelisted by the Linux kernel, and has its TRIM ability re-enabled.
Correct title: "TRIM and Any Fucking Operating System: Don't Buy Defective SSDs"
It's not as if Windows or MacOS has any magic that makes queued TRIM work with non-compliant and poorly-coded hardware, right?
Seriously, WTF, over?
Welcome to the Panopticon. Used to be a prison, now it's your home.
ObPedant: those aren't regexes, they're globs. Otherwise (for instance), the Samsung entry would match
Samsung SSD<space>
Samsung SSD<space>8
Samsung SSD<space>88
Samsung SSD<space>888
.
.
.
ad nauseam: the "*" regex operator means "zero or more occurrences of the previous pattern", which in this case is the character "8".
At least, I hope they're not supposed to be regexes. Otherwise, the kernel blacklist itself will have some serious issues matching known-bad SSDs, because someone never learned how to create a regular expression.
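The difference is easy to see by running the same pattern through both kinds of matcher. This is only an illustration in Python: `fnmatch` stands in for a glob matcher and `re` for a regex engine, and neither is the code the kernel actually runs. The helper names are made up.

```python
import fnmatch
import re

PATTERN = "Samsung SSD 8*"  # the entry as quoted in the list above


def matches_as_glob(model: str) -> bool:
    """Glob reading: '*' means 'any run of characters'."""
    return fnmatch.fnmatchcase(model, PATTERN)


def matches_as_regex(model: str) -> bool:
    """Regex reading: '8*' means 'zero or more 8 characters'."""
    return re.fullmatch(PATTERN, model) is not None


if __name__ == "__main__":
    for model in ("Samsung SSD 840 PRO Series", "Samsung SSD 888", "Samsung SSD "):
        print(repr(model), matches_as_glob(model), matches_as_regex(model))
```

Read as a glob, the pattern catches a real model string such as "Samsung SSD 840 PRO Series". Read as an anchored regex, it would only accept "Samsung SSD " followed by nothing but 8s.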
Welcome to the Panopticon. Used to be a prison, now it's your home.
There's also an upgrade path for Micron's older SSDs - I just upgraded my Crucial M550 from MU01 to MU02 using a bootable ISO from Micron's support site:
http://www.crucial.com/usa/en/support-ssd-firmware
Not directly an answer to your question, but related: after Googling for a bit I actually cannot find any mention of the Samsung SSD 840 PRO having issues with TRIM under Windows. If it were indeed a controller problem, then it would have to happen under all OSes as long as TRIM is enabled, but all the evidence I'm finding points only towards Linux or these guys' own setup as being the culprit.
Linus is on vacation so you'll have to wait on your next SSD purchase for him to return to merge the patches....
What? "The most influential individual economic force of the past 20 years" gets to just wander off?
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
I never thought the 30 percent of men who shave below the belt would be so heavily represented on Slashdot
I'll Google in a moment, but I was wondering if anyone knew of any good sites that maintain lists of good/bad SSDs for Linux. With the number of vendors out there nowadays, having to scan the source seems like a poor way to track the information.
The source has the list the kernel needs. But what people need is a list of known good SSDs for Linux. Sure, you can avoid the bad ones. But if your drive is not listed as bad, you don't know whether it's OK or just hasn't been tested.
Linux probably tickles a bug in the SSD. This is the problem with things only being validated within Windows.
I'd be very interested if a Trim bug affects drives in a Windows install. There's a pile of machines out there with affected Samsung drives that need to be checked.
There are two easy solutions to Ext4 vs. SSD problems. The first is ReiserFS, which is still eminently usable on Gentoo. The second is UFS, which is available on the BSDs.
I assume that Windows does not submit queued trim commands, thereby avoiding this problem.
You will only find SSDs from the very best vendors there... because the crap ones don't claim to support queued TRIM in the first place.
It is interesting that the Micron M500, *which is an enterprise datacenter SSD*, is listed. Rather bad PR for Micron, that: an enterprise datacenter SSD that corrupts data and has not been fixed?!
As usual, good PR for Intel... too bad their SSDs self-destruct based on a timer, instead of trying to soldier on until things actually get really broken (and only *then* self-destruct).
Until today, nobody has heard of these jokers.
Are they owned by Dice, too?
Even when implemented correctly, TRIM slows down regular I/O that happens around the time it's done. On top of that, you are risking OS and drive bugs that can vary with every incremental revision. You may not notice corruption until all your backups are overwritten, and just think of the hassle of restoring even once. Is it really worth the potential minor performance benefits, which are often realized by the drive itself anyway?
I can think of exceptions, like building a supercomputer with a monolithic array of drives used for disposable cache. But for individual users TRIM makes no sense.
I only have experience with consumer-grade SSDs, not enterprise ones. Most of the consumer SSDs I've used or maintained caused no problems. But I recall one HP-made drive that used to crash after about a year: total data loss after a year of usage. Reformat and the drive was OK; another year passed, then another crash and more data loss. As it turned out, the disk had some faulty encryption procedures in its firmware. A firmware upgrade (hopefully) fixed it, but the update required erasing all data. I've always had decent backups, a monthly system image and daily data, so recovery was easy. But I am aware that SSD drives are much less reliable than HDDs due to controller/firmware problems. And this is, IMHO, a generally known fact.
Use BTRFS or any other COW file system (with modest overprovisioning) - and you won't need trim. That's due to absence of random writes (in case you're wondering)
The newest firmware updates for still-supported Samsung drives enabled queued TRIM support as part of SATA 3.2 enablement, but the feature is supremely buggy. Issue a queued TRIM at the wrong time and the SSD controller locks up, possibly eating some data in the process. And who knows what happens when it seemingly works. Samsung support and other public channels will likely still deny this, but a fix should be in the works according to my sources.
Luckily for Samsung, only Linux and possibly Windows 10 (but that's still beta) have started making use of this feature. Linux recently blacklisted the feature on Samsung 8*, but the change has only just started to come out in distributions' update channels, and install media also need to get updated.
For the 840 EVO this was extra bad luck; the new firmware is required to get usable read performance, so everyone is updating, getting broken queued TRIM instead.
For the 850 Pro I've got 3 shipped with broken firmware from the factory from two different suppliers a month or so ago.
Until the dust has settled stay away from Samsung SSDs.
Wait, what?
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
The Crucial/Micron SSDs have been fixed. They are all shipping with fixed firmware nowadays and an updater is available for the rest.
Wait, what?
When Intel SSDs decide they are bad, they just brick themselves instead of going into read-only-good-luck-your-data-may-be-bad-mode. This probably makes sense for Enterprise RAID, and for absolutely no other use case.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
how is f2fs now-a-days?
Most drives work just fine under Windows. The problem is that the drives are developed and tested against Windows, so running them under Linux can cause all sorts of unexpected jibbery-jabbery.
> SSD drives are much less reliable than HDDs due to controller/firmware problems
Even some of our Dell enterprise SSDs that cost $3,758 for a 350 GB drive have data loss bugs. We still use 15k SAS drives for production databases, like our customers. We host more than a hundred other businesses in our colo, and I can't remember ever seeing a database server with SSDs. Well, I've seen them installed, but they're typically replaced within a few months with real hard drives. We do use SSDs on read-only slaves for reporting, and they work great for that...until they don't. I wish SSDs were reliable enough to use for that application so we could reduce power usage, AC, and noise, but they're not there yet.
Because Windows doesn't do queued TRIM.
TRIM in Windows and Linux before now worked more like this: -DATA- -DATA- -FLUSH ALL COMMANDS TO DRIVE- -WAIT- -TRIM- -DATA- -DATA-. While a drive was doing the TRIM it could not do anything else; there could be no other in-flight commands to the drive.
This is different: -DATA- -DATA- -TRIM- -DATA- -TRIM- -DATA- -DATA- -DATA-
Queued TRIM is an NCQ command, so it sits in the SATA queue alongside the other commands. Problem is some disk manufacturers have pissed this up. It seems likely that a firmware update will be able to fix this issue.
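A toy model of the two command streams, for anyone who prefers code to diagrams. It only reproduces the ordering sketched in this comment; the function names and the "FLUSH"/"WAIT" markers are invented for illustration and say nothing about how any real driver is written.

```python
from typing import List


def issue_unqueued(commands: List[str]) -> List[str]:
    """Non-queued TRIM: drain everything in flight before each TRIM goes out."""
    stream: List[str] = []
    for cmd in commands:
        if cmd == "TRIM":
            # The queue has to be empty before the drive will take the TRIM.
            stream += ["FLUSH", "WAIT"]
        stream.append(cmd)
    return stream


def issue_queued(commands: List[str]) -> List[str]:
    """Queued TRIM: the TRIM travels in the queue like any other command."""
    return list(commands)


if __name__ == "__main__":
    workload = ["DATA", "DATA", "TRIM", "DATA", "DATA"]
    print(issue_unqueued(workload))
    print(issue_queued(workload))
```

The extra flush and wait are why non-queued TRIM is slower, and their absence is why a drive that mishandles TRIM inside the queue only misbehaves once the OS starts using the queued form.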
https://en.wikipedia.org/wiki/...
Yep freebsd is fine here with 840 pros.
I find it hysterical that the attitude on this site, so heavily biased towards Linux, is that it must be the drives.
http://saveie6.com/
The drive's media wear indicator ran out shortly after 700TB, signaling that the NAND's write tolerance had been exceeded. Intel doesn't have confidence in the drive at that point, so the 335 Series is designed to shift into read-only mode and then to brick itself when the power is cycled. Despite suffering just one reallocated sector, our sample dutifully followed the script. Data was accessible until a reboot prompted the drive to swallow its virtual cyanide pill.
09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
Well when FreeBSD gets around to supporting queued TRIM, people like yourself can thank Linux devs and users for getting the manufacturers to fix their firmware.
Some men choose to walk the paths others have created, while some venture forth and create their own.
We should use CUT. Although that might conflict with a builtin in the Gnu assisted shell.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
That's insane. Granted you're supposed to have backups, but if your machine crashes at that point, you'll lose whatever you're working on.
The article now contains an update stating that queued TRIM was not involved, only ordinary TRIM (that ordinary TRIM was to blame is also mentioned in the author's Hacker News comment). It also appears that the company's drives were enterprise Samsung PROs.
Do not buy intel SSDs.
Also their CPU chipsets are backdoored with VPro.
FTFS:
Is "SSD drive" grammatically anything like "PIN number"?
When it comes to IT, I'll gladly let the hipsters deal with the pain and the data loss for a 0.5% speedup and use whatever they came up with - and is still relevant - after 10 years. Thank you very much.
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
If you're booting from the SSD, chances are the machine will crash...
It would be much better to just stay in read-only mode and give you the chance to copy data off (and yes, I'm aware this is no substitute for a backup, but think of the use case of a travelling laptop far away from its backup server, etc.).
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Update from TFA:
Broken SSDs:
SAMSUNG MZ7WD480HCGM-00003
SAMSUNG MZ7GE480HMHP-00003
SAMSUNG MZ7GE240HMGR-00003
Samsung SSD 840 PRO Series (recently blacklisted for 8-series blacklist)
Samsung SSD 850 PRO 512GB (recently blacklisted as 850 Pro and later in 8-series blacklist)
Working SSDs:
Intel S3500
Intel S3700
Intel S3710
You've never seen Windows code, have you? =p
I could shove a dead Rat in the CPU slot and Windows would work with errors galore while Linux would actually tell me I have a dead Rat in the CPU slot. Linux exposes bad Hardware.
But if the drive was broken and someone had to write special software to fix it, how can you be sure that it was fixed correctly and completely? Can you also be sure that the "fix" works for all versions of firmware on the drive?
Because the fix is relatively simple.
To put in general terms:
- The problem is that the drive advertises a bunch of features. Linux tries to use them. But the firmware is buggy and the features don't work or aren't even implemented.
- The fix is to ignore any advanced feature even if advertised by firmware. Stick to only the small subset of features that are also used in Windows.
e.g.:
- The most frequent problem with TRIM is that the device advertises supporting TRIM with NCQ (= reordering of commands).
(The latest firmware for the Samsung 840 EVO started advertising this in addition to fixing the speed decay.)
Linux *can* issue TRIM together with NCQ. So when the drive says it supports this, Linux will start using it.
But the drive doesn't work with TRIM and NCQ combined. There's a bug in the implementation, or the firmware doesn't even support it.
(No Samsung 8?? actually supports TRIM+NCQ. It's falsely reported as present by the firmware.)
- Windows (and perhaps Mac OS X) isn't affected by this because it doesn't support it to begin with.
- The Linux fix is to simply ignore the falsely advertised TRIM+NCQ. It reverts to either using NCQ, or using TRIM in a blocking (slow) way.
And that is a permanent fix because it simply reverts to a behaviour similar to Windows. No new problem should arise in Linux because it simply mimics the behaviour of Windows. If anything was still broken, it would be affecting Windows too.
It will work against any version of the firmware (Linux isn't tricking the firmware or trying to compensate. Just plain ignoring missing features).
Of course, the best would be to use a 100% standard-compliant SSD.
But the reality is that not many *are* actually standard compliant.
So unless you're ready to shell out lots of money for an actual enterprise-grade SSD,
accepting SSD with patched support is the next best option.
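The "ignore what the firmware advertises" fix described above can be sketched in a few lines. This is an illustration of the idea only, not kernel code: the feature names and the `usable_features` helper are hypothetical, the model patterns are copied from the list quoted elsewhere in this thread, and Python's `fnmatch` stands in for the kernel's own matcher.

```python
import fnmatch
from typing import Set

# Hypothetical feature name used for illustration; not a kernel identifier.
QUEUED_TRIM = "queued_trim"

# Model patterns taken from the list quoted elsewhere in this thread.
NO_QUEUED_TRIM_MODELS = ("Micron_M500*", "Crucial_CT*M500*", "Samsung SSD 8*")


def usable_features(model: str, advertised: Set[str]) -> Set[str]:
    """Drop queued TRIM for listed models, whatever the firmware claims."""
    for pattern in NO_QUEUED_TRIM_MODELS:
        if fnmatch.fnmatchcase(model, pattern):
            return advertised - {QUEUED_TRIM}
    return set(advertised)


if __name__ == "__main__":
    claimed = {"ncq", "trim", QUEUED_TRIM}
    print(sorted(usable_features("Samsung SSD 840 EVO 250GB", claimed)))
    print(sorted(usable_features("INTEL SSDSC2BB480G4", claimed)))
```

Nothing is being worked around here. The listed drive simply gets treated as if it had never advertised the feature, which is why the approach does not depend on the firmware version.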
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
it's not trim by itself that is problematic.
it's the combined use with NCQ (= reordering of commands).
The latest firmware (the one that fixes the speed decay) has started to falsely advertise support for this combination, whereas the drive doesn't actually support it.
The drive isn't actually able to re-order TRIM commands, and the wrong bit might end up being erased due to NCQ.
So this bug will only show up if:
- your drive is a Samsung 840 EVO (the 850s aren't affected by the speed decay and didn't get the faulty upgrade)
- you have upgraded the firmware to the latest (faulty) version.
- you run an OS which is actually able to use TRIM+NCQ (Linux and BSD, basically)
- your OS also follows the standards (asks the drive what is supported and gets the false advertisement of TRIM+NCQ).
- you actually enable TRIM on the drive.
Remove any one point of this chain and the bug doesn't happen.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
I have an 850 Pro at home and an 850 EVO at work, and haven't experienced any corruption. I know that Windows uses TRIM. Why am I not seeing any problems?
You're shielded from the problem because of 2 different things:
- The Samsung 850s aren't as affected by speed decay as the Samsung 840s. Thus a firmware fixing the speed problem was only shipped for 840s, not for 850s, and it's that firmware which had the problem. Your drive simply didn't get the problematic firmware.
- That newest firmware falsely advertises that the drive supports TRIM together with NCQ. But the drive actually doesn't. Re-ordering shouldn't happen while TRIM is used.
Linux follows the standards: it asks the drive and only uses the features that are advertised as supported. Because that specific firmware on that specific drive falsely reports supporting a missing feature, corruption happens.
Windows simply doesn't support TRIM with NCQ at all. The bug isn't triggered.
The current fix in Linux is to put the drive on a blacklist and mimic Windows behavior by ignoring TRIM+NCQ.
So: the difference is that Linux uses TRIM+NCQ when instructed by the drive, whereas Windows doesn't. And your drive didn't happen to get the firmware that falsely reports this.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
What, so now we are supposed to listen and believe the gossip of some kernel?
Windows is doing it wrong {..} When Linux tries to follow the standard and do it right, it gets burned.
Indeed, that's right: Windows doesn't support TRIM+NCQ, whereas Linux does and will enable it if the drive reports it as present.
the ssd manufacture has 'modified' their firmware to work around it. {...} and the ssd manufacture has 'modified' their firmware to work around it.
In this case it's purely accidental. Samsung issued a firmware upgrade for some Samsung SSD to fix a problem causing a decay of speed as data ages on the SSD. That new firmware happens to falsely report support for TRIM+NCQ whereas it doesn't actually support it.
It's a bug left in a new firmware fix, not a tweak intentionally designed to work around quirks and bugs in windows.
The bug doesn't trigger in Windows because windows doesn't support the feature. Linux (and BSD ?) do and are affected.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Except that's irrelevant, the guys didn't use queued TRIM either. It says in the article itself that they used non-queued TRIM.
They more precisely said:
The TRIM on our drives is un-queued
Which is true.
Except that recent firmware fixes from Samsung (you know, the whole "speed decay on aging data" fiasco) had suddenly started to falsely report support for TRIM+NCQ.
So it might be possible that, unbeknownst to them, their Linux installation suddenly started to issue queued TRIMs, even though the drive doesn't actually support them, because it trusts what the firmware told it.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
We have Crucial/Micron SSDs in RAID 10 configurations. We of course buy them in batches. There's nothing like watching 16 of them all go "bad" at the same time, and not having a clue WTF is going on. Fixed via firmware, but glorious hell, it made my heart sink watching every drive just die in a 20 second window. Randomly and repeatedly over a period of a couple weeks. They all work like a charm now...
The problem with ReiserFS is you never know when it will kill your drive...
Bad joke...I know...I'll show myself out.
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
I assume from that comment you have never actually used an SSD. 0.5% is a considerable exaggeration. It is more like 90%; the drives slow to a crawl until you format them if there is no TRIM support.
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
There are two easy solutions to Ext4 vs. SSD problems. The first is ReiserFS, which is still eminently usable on Gentoo. The second is UFS, which is available on the BSDs.
If the problem is that the drive doesn't follow the spec for TRIM, I'd rather just disable TRIM than try to keep using it with a different filesystem. That seems a bit like playing Russian Roulette. Are you really that sure that ReiserFS won't have the same problem (unless it just doesn't use TRIM anyway, in which case it is no better than ext4 without TRIM).
how is f2fs now-a-days?
No idea in general, but I'd think that a log-based filesystem would be fairly immune to this kind of nonsense since it would only issue TRIMs very rarely, and then only for huge areas of the disk at a time. They don't overwrite random blocks in-place constantly.
Update for Micron MX100 Series Drives:
From their support portal, the MX100 series has a firmware version MU02 from March of 2015 that appears to handle this properly now:
http://www.crucial.com/usa/en/support-ssd
Version MU02 includes the following changes:
Improved stability, Efficiency, and Performance during power state transitions
Improved handling of environments with unstable power supplies
Improved handling of environments with SATA interface signal integrity issues
Improved response time for SMART read commands
Corrected error handling NCQ Trim Commands
Corrected reporting of SMART Attribute 5
Maybe a dead rat is all I have and I need to be able to get some work done with a dead rat in the CPU slot; Windows might allow me to do this.
why don't they use freebsd ?
What's more, windows will allow you to get the same amount of work done with a legit CPU as with a dead rat in that slot.
Sadly, a Libertarian cannot force his views on another, and freedom cannot spread as does the cancer known as religion.
The data array in question is processed by this code:
glob_match(ad->model_num, model_num)
So it is in fact a glob.
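For the curious, here is roughly how a table of (model glob, firmware glob, flags) rows can be searched. It is a sketch under assumptions, not the kernel's implementation: the `Entry` type and `lookup` function are invented, the first-match-wins behaviour and the "NULL firmware means any revision" reading are my assumptions, and Python's `fnmatch` stands in for `glob_match`. The three rows are copied from the blacklist excerpt posted earlier in this thread.

```python
import fnmatch
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str                # glob for the drive's model string
    firmware: Optional[str]   # glob for the firmware revision, None = any (assumed)
    flags: List[str]


# Rows copied from the excerpt posted earlier in this thread.
TABLE = [
    Entry("WDC AC11000H", None, ["ATA_HORKAGE_NODMA"]),
    Entry("Maxtor *", "BANC*", ["ATA_HORKAGE_NONCQ"]),
    Entry("ST31500341AS", "SD1[5-9]",
          ["ATA_HORKAGE_NONCQ", "ATA_HORKAGE_FIRMWARE_WARN"]),
]


def lookup(model: str, firmware: str) -> List[str]:
    """Return the flags of the first row whose globs match, else an empty list."""
    for entry in TABLE:
        if not fnmatch.fnmatchcase(model, entry.model):
            continue
        if entry.firmware is None or fnmatch.fnmatchcase(firmware, entry.firmware):
            return entry.flags
    return []


if __name__ == "__main__":
    print(lookup("ST31500341AS", "SD17"))
    print(lookup("ST31500341AS", "SD1A"))
```

The firmware column is what lets a vendor get a drive off the list by shipping a fixed revision: the model still matches, but the revision glob no longer does.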
This is about queued TRIM vs. just TRIM, not TRIM vs. no TRIM at all.
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6