TRIM and Linux: Tread Cautiously, and Keep Backups Handy
An anonymous reader writes: Algolia is a buzzword-compliant ("Hosted Search API that delivers instant and relevant results")
start-up that uses a lot of open-source software (including various strains of Linux) and a lot of solid-state disk, and as such sometimes runs into problems with each of these. Their blog this week features a fascinating look at troubles that they faced with ext4 filesystems mysteriously flipping to read-only mode: not such a good thing for machines processing a search index, not just dishing it out.
"The NGINX daemon serving all the HTTP(S) communication of our API was up and ready to serve the search queries but the indexing process crashed. Since the indexing process is guarded by supervise, crashing in a loop would have been understandable but a complete crash was not. As it turned out the filesystem was in a read-only mode. All right, let's assume it was a cosmic ray :) The filesystem got fixed, files were restored from another healthy server and everything looked fine again. The next day another server ended with filesystem in read-only, two hours after another one and then next hour another one. Something was going on. After restoring the filesystem and the files, it was time for serious analysis since this was not a one time thing.
The rest of the story explains how they isolated the problem and worked around it; it turns out that the culprit was TRIM, or rather TRIM's interaction with certain SSDs: "The system was issuing a TRIM to erase empty blocks, the command got misinterpreted by the drive and the controller erased blocks it was not supposed to. Therefore our files ended-up with 512 bytes of zeroes, files smaller than 512 bytes were completely zeroed. When we were lucky enough, the misbehaving TRIM hit the super-block of the filesystem and caused a corruption."
Since SSDs are becoming the norm outside the data center as well as within, some of the problems that their analysis exposed for one company probably would be good to test for elsewhere. One upshot: "As a result, we informed our server provider about the affected SSDs and they informed the manufacturer. Our new deployments were switched to different SSD drives and we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel."
The rest of the story explains how they isolated the problem and worked around it; it turns out that the culprit was TRIM, or rather TRIM's interaction with certain SSDs: "The system was issuing a TRIM to erase empty blocks, the command got misinterpreted by the drive and the controller erased blocks it was not supposed to. Therefore our files ended-up with 512 bytes of zeroes, files smaller than 512 bytes were completely zeroed. When we were lucky enough, the misbehaving TRIM hit the super-block of the filesystem and caused a corruption."
Since SSDs are becoming the norm outside the data center as well as within, some of the problems that their analysis exposed for one company probably would be good to test for elsewhere. One upshot: "As a result, we informed our server provider about the affected SSDs and they informed the manufacturer. Our new deployments were switched to different SSD drives and we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel."
The only TRIM use I recommend is running on it on an entire partition, e.g. like the swap partition, at boot, or before initializing a new filesystem. And that's it. It's an EXTREMELY dangerous command which results in non-deterministic operation. Not only do SSDs have bugs in handling TRIM, but filesystem implementations almost certainly also have ordering and concurrency bugs in handling TRIM. It's the least well-tested part of the firmware and the least well-tested part of the filesystem implementation. And due to cache effects, it's almost impossible to test it in a deterministic manner.
You can get close to the same performance and life out of your SSD without using TRIM by doing two simple things. First, use a filesystem with at least a 4KB block size so the SSD doesn't have to write-combine stuff on 512-byte boundaries. Second, simply leave a part of the SSD unused. 5% is plenty. In fact, if you have swap space configured on your SSD, that's usually enough on its own (since swap is not usually filled up during normal operation), as long as you TRIM it on boot.
-Matt
It takes a couple of links and searching through source code to get there. So here's the list of problematic drives, better formatted but still in regular expression format:
Micron_M500*
Crucial_CT*M500*
Micron_M5[15]0*
Crucial_CT*M550*
Crucial_CT*MX100*
Samsung SSD 8*
So, basically, all the ones I thought were the best. The list of whitelisted drives after it only includes those brands, Intel, and ST-something. So other brand may be unknowns.
(T>t && O(n)--) == sqrt(666)
Its a good bet.
As apple is probably quite aware, being probably the biggest seller of non-windows PCs, there is an endemic problem with a whole lot of hardware shipped claiming to be "compliant" with any given standard.
Most vendor's testing methodology pretty much comes down to "Works on windows? Ship it"
Linux has been dealing with this problem for decades. Power management implementations in laptops (and some desktop motherboards) are often outright broken and don't behave anything close to what the "standard" dictates. (Its so bad in laptops that Microsoft's power management maintains a hardware checklist with custom hacks for laptops with known bad implementations. On many systems it does not even /attempt/ to use standard calls)
Linux developers attempt to access hardware in a manner according to how documented standards state and end up tripping all sorts of bugs from mild to hardware-bricking. Flabbergasted hardware vendors often respond with "It works in windows!"
(Fortunately this shit doesn't fly in the server space where Linux is now pretty much King.. Well, at least in theory)
So yes, I'd be willing to bet that Apple found that enabling trim in any old SSD led to an unacceptable chance of filesystem corruption and decided to implement a white list. So, you know, they don't catch shit for someone else's broken hardware.
The Crucial MX100 with the latest MU02 firmware is now whitelisted by the Linux Kernel, and has it's TRIM ability re-enabled.
ObPedant: those aren't regexes, they're globs. Otherwise (for instance), the Samsung entry would match
Samsung SSD<space>
Samsung SSD<space>8
Samsung SSD<space>88
Samsung SSD<space>888
.
.
.
ad nauseam: the "*" regex operator means "zero or more occurrences of the previous pattern", which in this case is the character "8".
At least, I hope they're not supposed to be regexes. Otherwise, the kernel blacklist itself will have some serious issues known-bad SSDs because someone never learned how to create a regular expression.
Welcome to the Panopticon. Used to be a prison, now it's your home.