Samsung Finds, Fixes Bug In Linux Trim Code
New submitter Mokki writes: After many complaints that Samsung SSDs corrupted data when used with Linux, Samsung found that the bug was in the Linux kernel and submitted a patch to fix it. Kernels without the final fix can corrupt data if the system uses Linux md RAID with RAID0 or RAID10 and issues TRIM/discard commands, either via fstrim or from the filesystem itself. The drive vendor did not matter, and the earlier blacklisting of Samsung drives for broken queued TRIM support can most likely be lifted after further tests. According to this post, the bug has been around for a long time.
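For readers wondering why RAID0/RAID10 in particular: on a striped array, one discard covering a range of array sectors has to be cut at chunk boundaries, and each piece has to be translated to the right sector on the right member drive. If any piece is mapped to the wrong offset, the drive will obediently discard live data. The sketch below shows that bookkeeping for plain RAID0. It is a minimal illustration, assuming a simple striped layout with a fixed chunk size. The helper `split_discard` is hypothetical. It is not the kernel's md code and does not reproduce the actual bug.

```python
# Hypothetical illustration, NOT the Linux md raid0 code: split a discard
# (TRIM) on a RAID0 array into per-member-device pieces at chunk boundaries.
from typing import List, Tuple


def split_discard(start: int, length: int, chunk_sectors: int,
                  n_devices: int) -> List[Tuple[int, int, int]]:
    """Return (device_index, device_sector, sectors) pieces for a discard
    of `length` sectors starting at array sector `start`."""
    pieces = []
    sector, remaining = start, length
    while remaining > 0:
        chunk_no, offset = divmod(sector, chunk_sectors)
        # A piece must never cross the end of the current chunk.
        n = min(remaining, chunk_sectors - offset)
        # Chunks are dealt round-robin across the member devices...
        device = chunk_no % n_devices
        # ...so each device holds every n_devices-th chunk back to back.
        device_sector = (chunk_no // n_devices) * chunk_sectors + offset
        pieces.append((device, device_sector, n))
        sector += n
        remaining -= n
    return pieces


# 12-sector discard starting at sector 6, 8-sector chunks, 2 drives.
print(split_discard(6, 12, 8, 2))  # [(0, 6, 2), (1, 0, 8), (0, 8, 2)]
```

Getting any `device_sector` wrong in a loop like this is silent. The request still looks valid to the drive, so the damage only shows up later as corrupted files.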
Well, that's gotta be embarrassing for everyone bashing Samsung over this. I remember reading some rather strong opinions about who was at fault.
Who does that anymore? I just leave it up to someone else to [not] figure out.
BTW, the Samsung 850 Pro is breathtakingly fast.
Thank You Samsung!
While our company CAD workstations don't run Linux, all of them do run on Samsung SSDs.
Nice to see vendors working together to improve Linux.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
When Apple updated OS X to allow TRIM on non-Apple-supplied SSDs, forums were flooded with people claiming you should never use Samsung because their drives were fundamentally broken with regard to TRIM. Their "proof" was that corruption happened on Linux, and they would not be swayed by the thought that maybe the problem was with Linux.
This is just another case of the "Not My Problem" syndrome that too many techs fall into. They think their code/tools/systems/whatever must be perfect, and others are the ones fucking up. Samsung drives went on a blacklist for having these commands issued to them, all because of this bug? "WALP, LINUX IS PERFECT, MUST BE THE HARDWARE GUYS, even though their devices perform perfectly on other OSes." And now we're left with a bug in Linux that corrupts data until the patch can make its way through the distro channels and be pushed out to end users.
After many complaints that Samsung SSDs corrupted data when used with Linux
I've used Samsung SSDs for years now and until today I've never heard of a 14e07c2ea4f[NO CARRIER]
Good people go to bed earlier.
Vote with your wallet: my next SSD will be a Samsung.
Rejoice!
Why didn't other manufacturers' drives have this issue?
:thumbsup:
The first reply:
Thanks for tracking this down. Instead of explicitly coding around the issue in raid0/raid10/linear I would prefer to fix bio_split(). It seems like a deficiency in the interface that it does not handle this transparently.
Do you have a reproducible test case? If so it would be great if you could try the following patch and let us know the results.
Too bad about Samsung's undeserved reputation among neckbeard Linux aspies.
Sometimes you have to get your hands dirty to fix problems that weren't your fault anyway. Good for Samsung.
Hats off to Samsung for finding and even fixing the problem.
The problem is, you're only allowed to get trim with things you don't ever want to see again...
Any software can have unknown security weaknesses and bugs, Linux included. But open source software like Linux has both the advantage and the disadvantage that anyone can study the source and find them...
On behalf of all internet users everywhere, whether in this specific space-time continuum or not, I would like to formally apologize to Samsung for all of the totally unwarranted bashing they took over this issue. I would also like to express my gratitude to them for finding a bug, fixing it, and posting a fix. Good job.
Just cruising through this digital world at 33 1/3 rpm...
Only paid developers get shit done and done correctly.
More like an assumption that the bug was in the driver because they hadn't noticed issues on other drives.
Hardware firmware is commonly buggy. Device drivers often have to work around buggy hardware, so blacklisting devices for various functionality is not at all unusual.
If the code seems to work with other devices and breaks with a new device, then the first instinct is going to be to assume the new device is doing something wrong.
If this were Windows or OS X, you people would be shitting on it like sick hogs for weeks on end.
Instead, several of you call it a triumph of open source!? What a fantasy.
There have been at least two kernel releases since this cropped up. That means nobody even looked. *Outsiders* had to find it.
Linus must be rolling over in his grave 'cause sure as shit he's mortified to death.
I'd installed eight 1TB Samsung 850 Pros two days before seeing this bug announcement. There were many poos.
I'm still keenly waiting for the kernel fix to be released, but my sphincter is unclenching.
Bugs happen. If you've got code that seems to work and then you investigate and it doesn't work on one particular brand of drive, it would be a reasonable suspicion that there is something funny with those drives.
Given the fact that multiple Samsung drive models were failing but multiple Intel drive models were *not* failing under the same test (from the linked article), the developers could be forgiven in suspecting there was something wonky going on with the Samsung drives.
There were issues with the 840 EVO losing significant speed after it had been in use for a while. There was eventually (after much complaining from customers) a "fix" released that helped but didn't actually completely resolve the issue.
But it doesn't have to. If a drive were to implement TRIM by doing absolutely nothing (which is completely within spec) then it wouldn't show the problem, but it doesn't mean the drive is better than another or the other drive has a fault.
It's quite possible that the way Intel implements TRIM is just a little different. Perhaps they defer it for a few milliseconds or something. So the bug is occurring over and over, but it doesn't show up as corruption.
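The point in the two comments above can be made concrete with a toy model. Both drives below meet the TRIM contract as described in the thread. One applies TRIM at once, so trimmed sectors read back as zeros. The other treats it as a hint and does nothing, which, as noted above, is within spec. Both receive the same mis-targeted discard. The `Drive` class and `buggy_discard` function are hypothetical, and the wrong range stands in for any host-side bug rather than the actual md one. Whether Intel drives really defer or ignore TRIM is only the commenter's guess.

```python
# Hypothetical model, not real firmware: two drives that both satisfy the
# TRIM contract. One applies TRIM immediately (trimmed sectors read back as
# zeros); the other treats TRIM as a hint and ignores it.
class Drive:
    def __init__(self, sectors: int, applies_trim: bool) -> None:
        self.data = [0] * sectors
        self.applies_trim = applies_trim

    def write(self, sector: int, value: int) -> None:
        self.data[sector] = value

    def trim(self, start: int, length: int) -> None:
        if self.applies_trim:
            for s in range(start, start + length):
                self.data[s] = 0  # trimmed sectors now read as zeros
        # Otherwise do nothing: the old data stays readable.

    def read(self, sector: int) -> int:
        return self.data[sector]


def buggy_discard(drive: Drive) -> int:
    """Write live data at sector 10, then issue a mis-targeted discard."""
    drive.write(10, 42)
    # Suppose the host meant to discard free sectors 0..7, but a
    # (hypothetical) host-side offset bug sends sectors 8..15 instead.
    drive.trim(8, 8)
    return drive.read(10)


print(buggy_discard(Drive(16, applies_trim=True)))   # 0: live data lost
print(buggy_discard(Drive(16, applies_trim=False)))  # 42: bug stays hidden
```

The host bug is identical in both runs. Only the drive that actually acts on TRIM exposes it, which is why "it only corrupts on brand X" says little about who is at fault.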
Yes, assuming that because you can reproduce it on Samsung drives it must be a Samsung bug is confirmation bias.
http://lkml.org/lkml/2005/8/20/95
The linked article pointed out that five models of Samsung SSD were affected, three models of Intel SSD were not. So there were at least some drives that didn't seem to be affected by the bug. (Presumably just due to luck/usage-pattern/etc.)
I'm running Linux on a RAID-0 SSD array.
I guess I should turn off fstrim until there's a backport of the fix to Fedora?
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Many mouths mewling madly on mailing lists.
Passing the buck instead of fixing their fucking code.
Something doesn't add up... The fix addresses an oversight in the relatively new bio_split() routine, which was merged along with the immutable bio vector patch set in Linux kernel 3.15. Yet the Algolia blog referenced in the Samsung patch claims it was able to replicate the discard issue using kernels 3.2, 3.10, and 3.14, all of which predate that code. What gives?
While an apology is due, this sort of problem is inevitable given the nature of the technology. TRIM on NAND is a crutch for a technology that is poorly suited to data storage. Transforming NAND into a usable storage device requires heroic efforts on the part of the vendor, and it is hard to blame them for the bugs. Likewise, it is hard to blame Linux developers for their heroic efforts to work around the extensive deficiencies of NAND flash. Trusting in cheap commodity devices that don't even claim to protect against power loss is ill-advised.
Using TRIM as a band-aid for the performance woes of over-filled NAND devices is just asking for trouble. It has long been known that filling up filesystems leads to terrible performance, and the same applies to NAND drives. It is irresponsible of the vendors to provision the drives with insufficient reserved space, but one can compensate by leaving about 5% of the space as an empty, unused partition. It is much safer to disable TRIM and over-provision the drive this way. That achieves the same effect of limiting write amplification, without having to worry about bugs trimming away live data.
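The 5% suggestion above is simple arithmetic, sketched below. The helper `overprovision_split` is hypothetical. Rounding the usable partition down to a 1 MiB boundary is an assumption based on common partition alignment, not something the commenter specified. The approach also assumes the reserved area is never written. On a previously used drive, the controller can only treat that area as spare once it has been trimmed or secure-erased.

```python
# Hypothetical helper: split a drive's capacity into a usable partition and
# an untouched reserve, following the "leave ~5% empty" rule of thumb.
from typing import Tuple

MIB = 1 << 20  # assumed partition alignment (1 MiB)


def overprovision_split(capacity_bytes: int, reserve_percent: int = 5,
                        align: int = MIB) -> Tuple[int, int]:
    """Return (usable_bytes, reserved_bytes); usable is aligned down."""
    # Integer math avoids floating-point rounding surprises.
    usable = capacity_bytes - capacity_bytes * reserve_percent // 100
    usable -= usable % align  # round the partition end down to `align`
    return usable, capacity_bytes - usable


usable, reserved = overprovision_split(10**12)  # a nominal "1 TB" drive
print(usable // MIB, "MiB usable,", reserved // MIB, "MiB left unpartitioned")
```

Rounding down means the reserve ends up slightly larger than 5%, never smaller.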
The only place where TRIM really makes sense is in the context of virtualization. Recovering space in sparse virtual disk images has real benefit, and operating system vendors have much more incentive and ability to make it work properly.
Company fixes driver for their own hardware on open source operating system? (installed on most systems on the planet, in part thanks to that very same company?) Doesn't this happen all the time? I'd assume Samsung submits tons of patches to the kernel every day.
Thanks guys, I'll be buying that LED screen this year :)
https://imgur.com/XVR8khP
"So, we add some code..." //Seunguk Shin (http://www.spinics.net/lists/raid/msg49440.html)