Slashdot Mirror


Samsung Finds, Fixes Bug In Linux Trim Code

New submitter Mokki writes: After many complaints that Samsung SSDs corrupted data when used with Linux, Samsung found that the bug was in the Linux kernel and submitted a patch to fix it. It turns out that kernels without the final fix can corrupt data if the system is using linux md raid with raid0 or raid10 and issues trim/discard commands (either via fstrim or from the filesystem itself). The vendor of the drive did not matter, and the previous blacklisting of Samsung drives for broken queued trim support can most likely be lifted after further tests. According to this post the bug has been around for a long time.
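For readers wondering whether a machine uses one of the md levels named here, the following is a minimal sketch that scans `/proc/mdstat` for them. It assumes the usual `mdX : active <level> ...` line layout; the function name and the set of levels (raid0 and raid10 from the summary, plus linear from the mailing-list post quoted in the comments) are illustrative choices, not anything taken from the patch. It says nothing about whether a given kernel already carries the fix.

```python
import re

# Levels named in the summary and in the mailing-list post quoted in the comments.
AFFECTED_LEVELS = {"linear", "raid0", "raid10"}

def affected_arrays(mdstat_text):
    """Return (array, level) pairs for active md arrays using a listed level."""
    found = []
    for line in mdstat_text.splitlines():
        # Typical array line: "md0 : active raid0 sdb1[1] sda1[0]"
        m = re.match(r"^(md\S*)\s*:\s*active\s+(?:\([^)]*\)\s+)?(\S+)", line)
        if m and m.group(2) in AFFECTED_LEVELS:
            found.append((m.group(1), m.group(2)))
    return found

if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as f:
            print(affected_arrays(f.read()))
    except OSError:
        print("could not read /proc/mdstat on this system")
```

An empty list only means no active array with one of those levels was found; it is not a statement about data safety.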

184 comments

  1. awkward! by Anonymous Coward · · Score: 4, Insightful

    Well, that's gotta be embarrassing for everyone bashing Samsung over this. I remember reading some rather strong opinions about who was at fault.

    1. Re:awkward! by Anonymous Coward · · Score: 2, Interesting

      I'd be interested to see if anyone has apologized. Doing so is exceedingly rare on internet forums.

    2. Re: awkward! by Anonymous Coward · · Score: 0

      It lets people take advantage of the panic sales of Samsung SSDs that happen after data corruption stories get published.

    3. Re:awkward! by mwvdlee · · Score: 2, Insightful

      Even more so for the kernel developers that blacklisted the Samsung drives.
      These developers should probably be banned from kernel development or at least banned from making decisions regarding functionality.
      Creating code with a bug is human, not doubting your own code and blaming somebody else is stupid.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    4. Re:awkward! by Anonymous Coward · · Score: 1

      Agreed. The accusations never made sense, as the issue, which was supposedly a Samsung firmware issue, did not affect Samsung drives on Windows machines.

      No one seemed to want to hear that though.

    5. Re:awkward! by Khyber · · Score: 2, Insightful

      If the kernel devs and Linus don't apologize, they're all a bunch of self-absorbed shitlords and should be smacked off the face of this planet.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    6. Re:awkward! by Anonymous Coward · · Score: 0

      Firmwares often include bugs and non-standard APIs/ABIs... Including precisely to try to mess up OSes other than Microsoft Windows.

      Problems would also be much less likely to happen if vendors published sources more (using free licences), and contributed more to the open source environment, from which they are happy to profit when it suits them.

    7. Re:awkward! by Anonymous Coward · · Score: 5, Insightful

      The firmware bug of Samsung drives, a very severe one actually, was confirmed by Samsung. The RAID 0 issue is a totally different one, hardly affecting anyone.

      So yes, the severe issue was a bug on Samsung's side, while the very rare RAID 0 bug is a Linux kernel one.

    8. Re:awkward! by Anonymous Coward · · Score: 1, Informative

      Why would Linus apologize? He did not write the Samsung firmware, which falsely claims it can do queued TRIM. That affects quite a lot of users. This article is about another bug, which hardly affects anyone, maybe some cloud operators only.

      So you may apologize for not seeing the difference between an elephant and an ant.

    9. Re:awkward! by Anonymous Coward · · Score: 0

      Humans are stupid, regardless of which OS they develop. If you want to ban stupid developers then there will be no computers, at all.

    10. Re:awkward! by Anonymous Coward · · Score: 1

      We need a new term for people like you RTFS... either that or you're a troll, or stupid.

    11. Re:awkward! by Anonymous Coward · · Score: 0

      Hardly affecting anyone, except those using RAID10 and issuing TRIM commands. Which is probably not as uncommon as you're making it out to be.

    12. Re:awkward! by Cramer · · Score: 1

      Then explain why people NOT running md/dmraid have reported corruption. (and why Samsung themselves confirmed issues with their internal firmware)

    13. Re:awkward! by Anonymous Coward · · Score: 0

      Creating code with a bug is human, not doubting your own code and blaming somebody else is divine.

      Fixed that to tease the religious people.

    14. Re:awkward! by Zero__Kelvin · · Score: 1

      That is because most of us understand software. We know that things often "work" in Windows only, because Windows often ignores failures or they have internal workarounds based on inside knowledge of the hardware and firmware, as well as the flaws in its implementation. It is actually quite common that something "works" in Windows and fails with Linux because Linux is following the standard / functioning properly while Windows is not, and the actual fault is with the hardware and/or Windows.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    15. Re:awkward! by BradMajors · · Score: 0

      This is standard procedure for those writing code for Linux. Off-load your testing to your customers.

      The Linux kernel people never bothered to test their code on Samsung drives. Hence, if the kernel does not work on Samsung drives it is up to Samsung to fix the problem.

      The ONLY way to verify your code works on a piece of hardware is to test it on that piece of hardware!

    16. Re:awkward! by Kaenneth · · Score: 1

      Bullshit.

    17. Re:awkward! by GigaplexNZ · · Score: 5, Informative

      I've read the articles. There are two separate bugs here. One, Samsung drives advertise support for queued TRIM even though it's not properly supported, causing corruption. Two, the kernel had a TRIM bug that affected serial TRIM with mdadm RAID, which is the kernel bug Samsung found and fixed. The queued TRIM bug still exists in the Samsung firmware.

    18. Re:awkward! by GigaplexNZ · · Score: 3, Informative

      The queued TRIM blacklist on Samsung drives doesn't affect Windows because Windows doesn't support queued TRIM yet. This Linux kernel bug is a different issue, but many assumed it was the same, even though Algolia clearly stated in their blog post that they weren't using queued TRIM.

    19. Re:awkward! by poltsy · · Score: 1

      Windows can't trigger the bug because it doesn't use that feature.

    20. Re:awkward! by sjames · · Score: 2

      The AC was sorta half right. It is not uncommon for hardware to break the standard so that it works with Windows. That sort of thing is becoming less common but it's hardly unknown.

    21. Re:awkward! by GigaplexNZ · · Score: 2

      Because there are two separate bugs.

    22. Re:awkward! by Khyber · · Score: 1

      Linus needs to apologize for his devs going "Not my fucking fault!" when in fact it WAS their fault.

      https://blog.algolia.com/when-...

      Here's the company that found the actual problem and pinpointed it.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    23. Re:awkward! by GigaplexNZ · · Score: 1

      I'm aware of the company that found the actual problem - I specifically stated that I read the linked articles, and that link you provided is one of the ones already linked. Obviously the kernel devs erred when they automatically assumed there wasn't a kernel bug.

      That said, Linus never apologises for his own outright abusive comments and actions. There's no way he's going to apologise on behalf of someone else, especially when there's some truth to the kernel developers' comments - there are known bugs in the Samsung firmware. They just made the mistake of assuming that this particular one was one of those instead of one of their own. The best we can hope for is those responsible developers apologising on their own behalf.

    24. Re:awkward! by Anonymous Coward · · Score: 0

      If the kernel devs and Linus don't apologize, they're all a bunch of self-absorbed shitlords and should be smacked off the face of this planet.

      Idiot. Shouldn't you be rolling back your Windows 10 suckdate by now?

      FTA: "linux md raid with raid0 or raid10 and issues trim/discard commands"

      Anybody who set up raid would have surely looked for pre-existing problems when they set it up. If you read the article and followed the thread, you came to this within a few clicks.

      http://www.spinics.net/lists/raid/msg49452.html

      Piergiorgio> Does this mean we should disable any trimming on RAID-10
      Piergiorgio> until further notice?

      If you are using SATA SSDs, absolutely.

      This bug has been around for a long time and I'm surprised we haven't
      heard of it until now. But it's very specific to the way linear, raid0
      and raid10 interface with the block layer. I'm guessing that most SSD
      users deploy raid1 which is not affected.

      The problem is also there if you have a storage array that prefers the
      UNMAP command. But it's even less likely that you'd be using software
      RAID in that case.
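      The quoted message singles out linear, raid0 and raid10. One way to see why striped levels differ from a mirror: a mirror can forward the same sector range to each member, while a striped array has to cut a discard range into per-device pieces. The sketch below shows only that address mapping for a raid0-style layout. It is not the kernel's code and does not reproduce the bug; the function name, the sector units and the round-robin layout are assumptions made for illustration.

```python
def split_discard_raid0(start, length, chunk, ndisks):
    """Map a logical sector range onto (disk, disk_offset, length) pieces.

    Assumes a plain round-robin stripe layout with `chunk` sectors per chunk.
    """
    pieces = []
    end = start + length
    while start < end:
        chunk_no = start // chunk            # which logical chunk we are in
        within = start % chunk               # offset inside that chunk
        n = min(chunk - within, end - start) # stop at the chunk boundary
        disk = chunk_no % ndisks             # chunks rotate across the disks
        disk_offset = (chunk_no // ndisks) * chunk + within
        pieces.append((disk, disk_offset, n))
        start += n
    return pieces
```

      With 8-sector chunks on two disks, a 12-sector discard starting at sector 6 touches disk 0, then disk 1, then disk 0 again, so a single request becomes three smaller ones.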

    25. Re:awkward! by TheRaven64 · · Score: 1

      Nonsense. It is true, however, that Windows and Linux use different (overlapping) subsets of the SATA (and SCSI) command sets and, in particular, use very different sequences of commands in common use. If you test heavily with Windows and not with Linux, then you may find that there are code paths in your firmware that Linux uses a lot but which are mostly untested.

      --
      I am TheRaven on Soylent News
    26. Re:awkward! by david_thornley · · Score: 0

      The kernel devs and Linus have probably contributed significantly more to the world than you have, and your comment is impolite. If you are going to eschew all software written by rude people, exactly what are you going to run?

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    27. Re:awkward! by shentino · · Score: 1

      Accepting the patch *was* the apology.

    28. Re:awkward! by david_thornley · · Score: 1

      How many paying customers does Linux have? Massive testing is expensive. There are more likely to be issues with Linux than Microsoft Windows, because everybody tests on Windows. That doesn't mean that Microsoft itself tests better than Linux devs, and indeed we find that Microsoft puts out lots of bugs.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
    29. Re:awkward! by Anonymous Coward · · Score: 0

      Ever hear of Red Hat and SuSE? They have customers that get labeled "enterprise". That's millions to billions of dollars of paying customers Linux has. Get a clue, you fuckwit.

    30. Re:awkward! by Khyber · · Score: 1

      "The kernel devs and Linus have probably contributed significantly more to the world than you have"

      I keep people fed and develop new technologies to ensure people can remain fed. They do nothing nearly as important. The world could exist quite well without people like them. Not much of a world would exist without people like me.

      Try again when you're not so ignorantly assumptive.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    31. Re:awkward! by david_thornley · · Score: 1

      So? I referred to people who are crucial in Linux development. You are referring to people like you. I can refer to my work in enabling efficient small-scale manufacturing, but the blunt truth is that, if I were to retire tomorrow, it would have an extremely minor effect on that or the world as a whole.

      Computing in general would be a lot worse off without Linux in many ways. Even if you don't use Linux, you benefit from the general raising of the bar that happens when people try to do better than Linux. You probably wouldn't be as effective without Linus and the rest.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  2. Proper Troubleshooting by Anonymous Coward · · Score: 0

    Who does that anymore? I just leave it up to someone else to [not] figure out.
    BTW, the Samsung 850 pro is breathtakingly fast.

  3. Thank You by Anonymous Coward · · Score: 2

    Thank You Samsung!
    While our company CAD workstations don't run Linux, all of them do run on Samsung SSDs.

  4. Bravo by Virtucon · · Score: 4, Interesting

    Nice to see vendors working together to improve Linux.

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
    1. Re:Bravo by gstoddart · · Score: 4, Insightful

      After many complaints that Samsung SSDs corrupted data when used with Linux

      There was definitely some self-interest there.

      Samsung can't have people saying their SSDs corrupt data when it's not them doing it.

      --
      Lost at C:>. Found at C.
    2. Re:Bravo by Anonymous Coward · · Score: 0

      Samsung can't have people saying their SSDs corrupt data

      The one causing it is irrelevant. Only the one getting the blame matters.

    3. Re:Bravo by jones_supa · · Score: 1

      Nice to see vendors working together to improve Linux.

      Well, Samsung had some SSDs to sell. It's part of the open source philosophy: you scratch your own itch, and everyone benefits.

      Still, the problem is that we don't arrive at a well-rounded result. Fixing some things here and there is not deep QA. After stories like this I always get cold chills imagining what else is broken in there.

    4. Re:Bravo by DarkOx · · Score: 5, Interesting

      Sure there was self interest. Still I think they deserve a lot of credit here. Rather than the typical "It's not my code" response from a developer who is sure the problem is elsewhere (rightly or wrongly) they actually found and fixed the problem. That is good behavior!

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    5. Re:Bravo by Anonymous Coward · · Score: 4, Insightful

      Of course, this is only possible when the "other person's" code is Free Software. If this had been a problem in Windows/OSX that Microsoft/Apple was refusing to fix, there's little Samsung could have done about it.

    6. Re: Bravo by Anonymous Coward · · Score: 0

      Why don't you just audit the code yourself so you don't get those cold chills?

    7. Re:Bravo by gstoddart · · Score: 2

      Sure it was good behavior.

      But it was borne entirely out of the Linux people saying "OMG, teh Samsung is teh sux0r".

      I do give them a lot of credit. More than the people who apparently insisted it was the fault of Samsung in the first place.

      --
      Lost at C:>. Found at C.
    8. Re:Bravo by Anonymous Coward · · Score: 1

      there's little Samsung could have done about it
      Having dealt with MS on a few issues, I am 100% sure they could have gotten whatever attention they wanted. Samsung is not exactly some little mom and pop OEM... I am not so sure about Apple as I have not dealt with their support structure over the years.

      They even refund the amount of the support call *IF* it is their bug.

    9. Re:Bravo by Anonymous Coward · · Score: 1

      You used to have to pay the MS tax in $$$ because you couldn't avoid it.

      Now you have to pay the Linux tax in contributions because you can't avoid it.

      Shifting the burden from volunteers to those with a mutual self interest is how things are supposed to work, it's a good thing, expectations are rising.

    10. Re: Bravo by bill_mcgonigle · · Score: 4, Interesting

      Yeah, the outcome is great. I just wonder why they waited more than a year to look into it. Maybe this will set a good example for the industry that with a little bit of effort you can take care of your customers and sell more product.

      If this were the 80's and a hard drive vendor had more than two reports of data loss under, say VMS, there would have been engineers on a plane to DEC by morning to get it solved by the coming weekend.

      Now we have thousands of users with reports and millions of units sold, and a wealthy vendor, and it's all crickets, leaving some kernel hackers to half-ass a blacklist. It's not like this is BeOS - there are millions of servers running in the target market. I don't mean to absolve the bad troubleshooting by kernel devs, but want to know what drove the apathy at Samsung (and other vendors behaving poorly). It's obviously not profit motive.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    11. Re: Bravo by bill_mcgonigle · · Score: 5, Informative

      I take some of that back. It seems the real credit for digging in goes to these guys. Samsung came in a month ago after they were provided a test suite and then gets credit for finding the kernel code path that caused the problem. An Oracle engineer provided a more-correct patch.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    12. Re:Bravo by Yunzil · · Score: 1

      Rather than the typical "It's not my code" response from a developer who is sure the problem is elsewhere (rightly or wrongly)

      Except that's exactly what happened (on the Linux side).

    13. Re:Bravo by Anonymous Coward · · Score: 0

      Something tells me that Samsung probably has access to windows source code

    14. Re: Bravo by aNonnyMouseCowered · · Score: 1

      "If this were the 80's and a hard drive vendor had more than two reports of data loss under, say VMS, there would have been engineers on a plane to DEC by morning to get it solved by the coming weekend."

      Hard disks were way more expensive in the 80s, and they sold in lower numbers. So it makes economic sense to do hands-on damage control.

    15. Re:Bravo by Anonymous Coward · · Score: 0

      Sort of.

      Samsung firmware *DOES* corrupt data (broken NCQ TRIM implementation). The same way Crucial M500 also does. Everyone knows this, which has been a very very large black mark on those two vendors. In fact, nobody buys Samsung or Micron(!!!) for the datacenter anymore because of this. All such SSDs are blacklisted so as to avoid the firmware bugs.

      Now, Micron and Crucial fixed their shit (on everything but the M500). Samsung is going to get a fix out for their shit soon as well. When that happens, the blacklists are updated (they _are_ firmware version aware in Linux) so that fixed SSDs can use the better NCQ TRIM.

      But they'd still get the blame for *anything* gone wrong with TRIM, due to their firmware bugs, fixed or not. Tracking down the md TRIM bug and fixing it is a good thing for any SSD vendor, but doubly so for any SSD vendor that has a black mark due to TRIM bugs in their firmware.
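      To make the "firmware version aware" part concrete, here is a rough sketch of how such a blacklist can be keyed on both the drive model and the firmware revision, so that a fixed revision gets queued TRIM back. The table entries, vendor names and function name are made up for illustration; this is not the kernel's actual table or matching code.

```python
from fnmatch import fnmatchcase

# Hypothetical entries: (model pattern, firmware revision pattern or None).
# None means every firmware revision of that model is treated as affected.
NO_QUEUED_TRIM = [
    ("ExampleVendor SSD 9*", None),
    ("OtherVendor*M5*", "MU0[1-4]*"),
]

def queued_trim_allowed(model, firmware):
    """Return False if the drive matches an entry in the quirk table."""
    for model_pat, fw_pat in NO_QUEUED_TRIM:
        if not fnmatchcase(model, model_pat):
            continue
        if fw_pat is None or fnmatchcase(firmware, fw_pat):
            return False  # fall back to non-queued TRIM for this drive
    return True
```

      In this sketch, lifting a restriction for a repaired firmware is a matter of narrowing the revision pattern rather than dropping the model entirely.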

    16. Re: Bravo by jones_supa · · Score: 1

      It's boring.

    17. Re:Bravo by Anonymous Coward · · Score: 0

      Alignment of private and public incentives -- this is how it's supposed to work !

  5. Crying wolf by Sponge+Bath · · Score: 5, Informative

    When Apple updated OS X to allow TRIM on non-Apple supplied SSDs, forums were flooded with people claiming you should never use Samsung because they were fundamentally broken with regards to TRIM. Their "proof" was that corruption happened on Linux and they would not be swayed by the thought that maybe the problem was with Linux.

    1. Re:Crying wolf by ArcadeMan · · Score: 0

      Well of course the problem couldn't be Linux itself! It's open-source software and there's thousands of people looking at the source code every day!

      Closed source... open source... they all have bugs.

    2. Re:Crying wolf by beernutz · · Score: 4, Insightful

      The point however is that in a closed source system, Samsung could not have found and fixed the bug themselves.

      --
      (stolen from DaBum) I am dyslexia of borg - your ass will be laminated.
    3. Re:Crying wolf by Anonymous+Brave+Guy · · Score: 3, Insightful

      Is that really the point, though?

      Vendors of products affected by bugs in closed source software collaborate all the time. It's usually in their mutual interests, and it has been going on forever. Just look at the extraordinary lengths Microsoft used to go to in order to maintain compatibility of Windows with older applications.

      On the other hand, the existence of this issue in the first place, the fact that other vendors whose products may also have been affected did not act as Samsung did, and particularly the denial and active yet unjustified blacklisting of Samsung products by the people running the project with the real fault are indictments of that project, no matter how open it claims to be or how big and famous it is.

      This whole affair does not look good for Linux, and more importantly, it does not reflect well on the people currently running development of Linux.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    4. Re: Crying wolf by Anonymous Coward · · Score: 0

      I agree that they couldn't have fixed it in closed-source software but I am sure that Samsung has the tools and expertise to monitor and analyze what is happening on the SATA cable from the protocol level down to the physical signal if necessary.

    5. Re:Crying wolf by kaiser423 · · Score: 5, Informative

      What makes you think that? Samsung is one signature away (PIA -- Proprietary Information Agreement) from viewing the vendor's source code and advising them. It's pretty damn routine and uncontroversial. I don't understand why people think that just because something is not open source that no one outside of the company ever, ever, under any circumstances can see a hunk of the code. Just sign a PIA and hand over the code in a secure manner, or give them remote VPN access to the test box. Pretty damn simple and routine.

    6. Re:Crying wolf by Anonymous Coward · · Score: 0

      Is the kernel of OS X, namely Darwin, closed?

    7. Re:Crying wolf by Anonymous Coward · · Score: 0

      > The point however is that in a closed source system, Samsung could not have found and fixed the bug themselves.

      Both Apple and Microsoft have policies for allowing 3rd parties to gain access to the code. It requires signing an NDA and a fee (around $200k IIRC and is sometimes refundable).

    8. Re:Crying wolf by bhcompy · · Score: 1

      If only I had more +1 Informative's to give

    9. Re:Crying wolf by Anonymous Coward · · Score: 0

      Woohoo, calling it from the hindsight!! Way to go dude!

      Judgment calls are never 100% accurate, and at the end of the day those who are implementing the kernel development process are people. At some point someone has to declare something to be a hardware bug, when they've decided enough resources have been put into fixing something that it's just not worth chasing anymore. If you're going to chase a hardware defect as some other defect forever, you'd never get any other work done. It's not always going to be the right call, and when it isn't, you eat it like a good manager. But without those decisions to guide resources and evaluate efforts, all you're going to have is a bunch of stalled projects.

      Don't crap on these guys too much, they are cooking your dinner.

      And thank you Samsung for fixing this!

    10. Re:Crying wolf by Anonymous+Brave+Guy · · Score: 1

      Hindsight really has nothing to do with it. If they didn't know for sure what the cause was, there was no need to call it at all. You can mark an issue not reproducible in a bug tracker without actively blaming someone else for a mistake they never made.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    11. Re:Crying wolf by GigaplexNZ · · Score: 4, Informative

      That really depends on whether OS X uses serial or queued TRIM. The Samsung drives work fine with serial TRIM, but are still broken with queued TRIM. The bug that Algolia reported and Samsung fixed in the kernel was a serial TRIM issue in the Linux kernel with RAID, which is unrelated to the queued TRIM firmware issues.

    12. Re:Crying wolf by gnasher719 · · Score: 1

      The point however is that in a closed source system, Samsung could not have found and fixed the bug themselves.

      Says who? If a similar bug happened with Samsung SSD drives connected to Macintosh computers, Samsung as a highly esteemed supplier of parts would most likely be given any help needed to fix the problem. They can't just download the software, but one phone call from the right person at Samsung to the right person at Apple would fix that.

    13. Re:Crying wolf by KGIII · · Score: 1

      Shared Source Initiative. You can access Microsoft's Windows code if you want to sign an NDA.

      --
      "So long and thanks for all the fish."
    14. Re:Crying wolf by beernutz · · Score: 1

      Maybe the difference lies in permission?

      It seems like a better situation all around when you are not dependent on legal agreements and "may I look at this source please?". This also does not guarantee that your fix will be used (though it is quite likely to be).

      What if you are a smaller company than Samsung? Maybe Microsoft will just ignore or outright deny requests to see the source code.

      I think the ability to see code and make/publish changes to that code independent of permission to do so is an important right.

      --
      (stolen from DaBum) I am dyslexia of borg - your ass will be laminated.
    15. Re:Crying wolf by Anonymous Coward · · Score: 0

      If you had read the article (blasphemy, I know) you would have known that this patch doesn't fix the problem that Samsung got so much flak for. And apparently Samsung still hasn't fixed that problem.
      The patch fixes a different problem that can occur when using the drive in a rather rare RAID configuration. Most Linux users wouldn't be affected, although I'm of course glad that they fixed it.

    16. Re:Crying wolf by Anonymous Coward · · Score: 0

      Well, the actual bug in Linux had to do with TRIM on drives in a RAID. Since Macs are pretty much non-upgradable, you're never going to be able to add a second SSD, so I'd say Apple users really don't have anything to worry about.

  6. Just another case.... by darkain · · Score: 4, Insightful

    This is just another case of "Not My Problem" syndrome that too many techs get into. They think their code/tools/systems/whatever must be perfect, and others are the ones fucking up. Samsung drives went on a blacklist for issuing the commands to them due to this bug? "WALP, LINUX IS PERFECT, MUST BE THE HARDWARE GUYS, even though their devices perform perfectly on other OSes" - and instead now we're left with a bug in Linux that corrupts data until the patch can make its way through the distro channels and is pushed out to end users.

    1. Re:Just another case.... by Anonymous Coward · · Score: 0

      Devices working perfectly in other OSes is no indicator that the device is not at fault. Witness the vast amount of crap laptop hardware, whose disastrous ACPI implementations only worked because their Windows drivers were chock-full of workarounds.

    2. Re:Just another case.... by DRJlaw · · Score: 0, Flamebait

      "WALP, LINUX IS PERFECT, MUST BE THE HARDWARE GUYS, even though their devices perform perfectly on other OSes"

      It was even better. The alleged reason that the hardware didn't fail on other OSes such as Microsoft Windows was that Microsoft had conspired with Samsung to cover up its hardware bugs -- i.e., Microsoft implemented both standard-TRIM support and broken-TRIM support.

      No evidence whatsoever that this mechanism existed, but Microsoft engineers must have figured it out and then kept super-duper quiet about changes to their own filesystem-to-device-driver-to-SATA communications chain in order to keep the Linux plebes down.

    3. Re:Just another case.... by DRJlaw · · Score: 2

      Devices working perfectly in other OSes is no indicator that the device is not at fault. Witness the vast amount of crap laptop hardware, whose disastrous ACPI implementations only worked because their Windows drivers were chock-full of workarounds.

      It certainly is an indicator. I think you mean to say "is not conclusive evidence."

      But then again, disastrous ACPI implementations are not conclusive evidence that a whole different type of device is at fault.

      Your reasoning falls into the very trap GP was pointing out.

    4. Re:Just another case.... by 0123456 · · Score: 4, Interesting

      Devices working perfectly in other OSes is no indicator that the device is not at fault. Witness the vast amount of crap laptop hardware, whose disastrous ACPI implementations only worked because their Windows drivers were chock-full of workarounds.

      Back when I was writing Windows drivers for plugin cards, there were certain motherboards that we'd detect and switch the motherboard bus to the slowest possible speed, because the chipset was a heap of junk that didn't work properly at higher speeds. Anyone who said 'but it works on Windows!' clearly had no idea that it only worked because we'd intentionally turned off most of the features.

    5. Re:Just another case.... by Anonymous+Brave+Guy · · Score: 3, Interesting

      A pro-Linux bias on Slashdot is not exactly a surprise, but an equally accurate headline on another forum might have read "Critical bug in Linux corrupts data on SSDs", and the subtitle "Linux maintainers deny serious fault, blame innocent parties for data loss" would probably have been fair too.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    6. Re:Just another case.... by LWATCDR · · Score: 1

      You should take a look at the "black list" before you try to figure that question out.
      The list includes other brands of drive as well as Samsung...

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    7. Re:Just another case.... by goarilla · · Score: 1

      The list contains a lot of ATA workarounds (https://github.com/torvalds/linux/blob/e64f638483a21105c7ce330d543fa1f1c35b5bc7/drivers/ata/libata-core.c).
      Apparently it's quite normal to have software work around hw defects.

    8. Re:Just another case.... by gstoddart · · Score: 1

      A pro-Linux bias on Slashdot is

      A complete myth. At least these days.

      Slashdot has several bags of crazy, all competing with one another at various times.

      There's Windows fanbois, Linux fanbois, and Apple fanbois. Over the years the ratio of those has swung back and forth, these days I'd say on balance you'd be hard pressed to say there's a strong bias one way or another.

      At various times it's been chic to tend more to one or another, now it seems like Slashdot has grown enough that there's at least 30 different kinds of batshit crazy at any given time, all struggling to get out.

      But let's face it, the actions of the Linux people in their unwavering belief in the perfection of Linux are no less sketchy than those of the people whose unwavering defense of Microsoft defies logic.

      I'd like to say Slashdot has a bias towards rational thought. I'd like to, but if anything I'd say Slashdot has an increasing bias towards fixed positions and screeching monkeys flinging poo.

      There's always been poo flinging, but now there's less rational discourse.

      --
      Lost at C:>. Found at C.
    9. Re:Just another case.... by kaiser423 · · Score: 1

      Have you seen the Linux ATA/SATA and other code bases like Audio, Video, etc. (likely AHCI also)? They're chock full of workarounds for various chipsets, drivers and firmwares. Acting like workarounds aren't effectively industry standard is a little silly. Linux has adapted to its fair share of odd hardware that doesn't work quite as expected.

    10. Re:Just another case.... by nojayuk · · Score: 4, Interesting

      We did workarounds on the ATA bus spec for known hardware bugs in older VIA chipsets. These were silicon bugs, not chipset firmware so they couldn't be fixed afterwards with patches and there were millions of these boards out there. Declaring our devices (CD-ROM and DVD-ROM drives) wouldn't work with these boards was not going to happen for sales reasons so our code included a lockup-recovery function that was invoked when the rare bug conditions were met and the IDE bus froze. The average user never noticed these lockups and we didn't tell them about them.

      Out-of-spec bugs like this were well-known in the industry and workarounds were easy to produce as long as you had access to a few million bucks worth of test equipment and a good team of professional engineers with decades of experience, not something that's common in the Linux world.

    11. Re:Just another case.... by Anonymous Coward · · Score: 0

      I think that mostly just proves my point: when an OS has trouble interacting with a device, the cause is most likely to be the device not adhering to whatever spec it was supposed to adhere to. In a perfect world, there wouldn't have to be a mountain of workarounds for all these different devices, because they would already work like they were supposed to!

    12. Re:Just another case.... by Anonymous Coward · · Score: 0

      There's Windows fanbois, Linux fanbois, and Apple fanbois.

      Not to mention the host files fanboi.

    13. Re:Just another case.... by thegarbz · · Score: 1

      How many software engineers does it take to change a lightbulb? None, it's an electrical problem.
      How many electrical engineers does it take to change a lightbulb? None, we'll just work around it in software.

    14. Re:Just another case.... by GigaplexNZ · · Score: 1

      This is just another case of "Not My Problem" syndrome that too many techs get into.

      No, it's a case of everyone jumping to conclusions.

      Samsung drives went on a blacklist for issuing the commands to them due to this bug?

      No, they went on the queued TRIM blacklist due to a different bug. This bug was an unrelated serial TRIM bug when used in conjunction with RAID.

    15. Re:Just another case.... by GigaplexNZ · · Score: 1

      It's actually much simpler than that. Windows doesn't yet support queued TRIM, it still uses the legacy serial TRIM. The Samsung firmware bug is in the queued TRIM implementation, which is a different issue to the Linux kernel TRIM bug that Samsung found.

    16. Re:Just another case.... by Anonymous Coward · · Score: 0

      There's Windows fanbois, Linux fanbois, and Apple fanbois.

      Hey now, don't forget about us Amiga fanbois. We're not going to take that from a gstod dart. Amiga Forever!

    17. Re:Just another case.... by jones_supa · · Score: 1

      In a perfect world

      We don't live in such a world. If we want our computers to work properly today, these workarounds have to be taken into account.

    18. Re:Just another case.... by jones_supa · · Score: 1

      Windows doesn't yet support queued TRIM, it still uses the legacy serial TRIM.

      Queued TRIM is serial as well... :) Everything is serial in the SATA bus.

      With "serial TRIM" you probably mean "blocking TRIM" (it requires other operations to be halted and command queue flushed before it can be performed).

    19. Re:Just another case.... by GigaplexNZ · · Score: 1

      Yes, we're talking about the same thing.

    20. Re:Just another case.... by Anonymous Coward · · Score: 0

      Well to be completely honest, there are plenty of things to love and to hate about all three systems.

      I have been Slashdotting for a very long time. I've been an Apple fanboi, a Microsoft fanboi, a Linux fanboi. I've also been a Linux hater, Windows hater, and Apple hater.

      I can see how this could be confusing for some people so for that I apologize.

      At least we still have choices between them, and they must still compete against each other eh?
      Prove yourself: Jealousy. The OS is always greener on the other side.

    21. Re:Just another case.... by david_thornley · · Score: 1

      You're leaving out the anti-fanbois. There are people who hate Linux, Windows, and Apple, frequently displaying a lot of ignorance about what they're complaining about. Fanbois at least tend to know something when pontificating on their favorites.

      I'm wondering whether we could round up some fanbois and anti-fanbois, put them in the same room, and use the results to power the Slashdot servers.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  7. not the case in my situation by nimbius · · Score: 3, Funny

    After many complaints that Samsung SSDs corrupted data when used with Linux

    I've used Samsung SSDs for years now and until today I've never heard of a 14e07c2ea4f[NO CARRIER]

    --
    Good people go to bed earlier.
    1. Re:not the case in my situation by Anonymous Coward · · Score: 0

      Ok, this joke is getting old.

    2. Re:not the case in my situation by ArcadeMan · · Score: 1

      Enough money to afford SSDs but not enough to afford something better than dial-up.

      I bet you have a 20MHz CPU with 64GB of RAM, too.

    3. Re:not the case in my situation by edtice1559 · · Score: 3, Informative

      If you have 64GB of RAM, you can cache the entire SSD. Then you won't have to issue TRIM commands!

    4. Re:not the case in my situation by Anonymous Coward · · Score: 0

      Plus the millennials might not even know what NO CARRIER means

    5. Re:not the case in my situation by Anonymous Coward · · Score: 0

      Does +++ATH0 still work?

    6. Re:not the case in my situation by Rinikusu · · Score: 4, Funny

      "But.. what does my cell phone carrier have to do with anything?"

      --
      If you were me, you'd be good lookin'. - six string samurai
    7. Re:not the case in my situation by stooo · · Score: 1

      Easy : it's the same samsung in it !

      --
      aaaaaaa
    8. Re:not the case in my situation by Anonymous Coward · · Score: 0, Informative

      How is this marked Informative? TRIM is irrelevant if you only read data and the cache will still have to be flushed to SSD if you write data.

      Also it's been more than five years since SSDs passed the point where 128GB had the best price per gigabyte.

    9. Re:not the case in my situation by swillden · · Score: 1

      If you have 64GB of RAM, you can cache the entire SSD. Then you won't have to issue TRIM commands!

      My SSD is 1 TB. The other one is 256 GB. SSDs today are a lot larger than you seem to realize.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    10. Re:not the case in my situation by Anonymous Coward · · Score: 0

      Please kill yourself or learn what a joke is.

    11. Re:not the case in my situation by edtice1559 · · Score: 1

      Whoosh! The portion of the thread is entirely humor starting with the NO CARRIER joke. I hate it when that happens. But the good news is that if the SSDs are that big, you can make a giant RAM disk out of virtual memory.

    12. Re:not the case in my situation by Anonymous Coward · · Score: 0

      Hey guys, I haven't paid my cell phone bill, you think they'll cut m##P^%^^$$[NO CARRIER]

  8. Vote with your wallet by jwkane · · Score: 4, Interesting

    Vote with your wallet, my next SSD will be a samsung.

    1. Re:Vote with your wallet by grasshoppa · · Score: 1

      Same problem, only spun around.

      I'll buy whatever fits my job requirements. Prior to this discovery, that certainly wouldn't have been Samsung. Now? They get to be considered along with all the other vendors.

      --
      Mod me down with all of your hatred and your journey towards the dark side will be complete!
    2. Re:Vote with your wallet by mwvdlee · · Score: 1

      Just curious; what were your reasons not to consider them before? In what way didn't they fit your job requirements?

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    3. Re:Vote with your wallet by grasshoppa · · Score: 1

      The reports of their data loss.

      --
      Mod me down with all of your hatred and your journey towards the dark side will be complete!
    4. Re:Vote with your wallet by MobyDisk · · Score: 1

      grasshoppa said the data loss. I would have said the firmware issues that led to performance problems with their EVO line of SSDs.

    5. Re:Vote with your wallet by Anonymous Coward · · Score: 1

      Just because a manufacturer-independent RAID bug was fixed in Linux (by Samsung, and that is nice of them), you buy a Samsung SSD with firmware known to have a queued-TRIM-related bug. Fortunately kernel developers blacklisted Samsung SSDs for you. But it is Samsung who should do more testing, as they take the money. Still, their work seems questionable.

    6. Re:Vote with your wallet by danomac · · Score: 1

      I had a SSD fail recently (two weeks ago?) and while searching for a replacement found the Samsung TRIM issues, so I didn't buy one. I got some cheap replacement for the time being.

      When this new one inevitably fails prematurely, I will look again at Samsung models.

    7. Re:Vote with your wallet by GigaplexNZ · · Score: 1

      840 (non EVO) and 840 EVO. As far as I'm aware it doesn't affect the 850 EVO. They didn't even bother addressing the 840 non EVO model with their firmware updates.

  9. Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

    Rejoice!

    1. Re: Slashdot users are finally getting trim! by Anonymous Coward · · Score: 1

      That's odd... It's been working flawlessly for me on my Windows machines for ages.

    2. Re: Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

      whoosh

    3. Re:Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

      Yeah, but it's with their cousins. Unfortunately that's only a success story in places like Arkansas and West Virginia.

    4. Re:Slashdot users are finally getting trim! by I'm+New+Around+Here · · Score: 1

      When I worked on a military base a while back, there was a young female in the group whose last name was Trim. I never made a comment on it until the last couple of days I was going to be there, and only in response to her making a remark like "some guys snicker" when hearing her name. I told her it was one of the first thoughts in my mind months earlier, but I couldn't say anything.

      Could be worse though. In World War II, there was an Admiral Kuntz. He has a road and access gate named after him at Pearl Harbor. Imagine being his daughters.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    5. Re: Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

      whoosh

    6. Re: Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

      So you use RAID 0 on your Windows box for ages. Or just failed to read the article?

    7. Re:Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

      In World War II, there was an Admiral Kuntz. He has a road and access gate named after him at Pearl Harbor. Imagine being his daughters.

      I hope neither were named Sandy.

    8. Re:Slashdot users are finally getting trim! by Anonymous Coward · · Score: 0

      There's a Beaver Cove in Nova Scotia, the locals call it Cunt's Cove

  10. Why did it only happened on Samsung's SSDs? by hyperar · · Score: 1

    Why didn't other manufacturers' brands have this issue?

    1. Re:Why did it only happened on Samsung's SSDs? by Anonymous Coward · · Score: 5, Insightful

      Confirmation bias. It was happening with other brands, but for one reason or another, people focused in on Samsung as the culprit, and once that happened, there was no getting out of it.

    2. Re:Why did it only happened on Samsung's SSDs? by ArcadeMan · · Score: 1

      Excellent question. My first guesses would be that either the Samsung SSDs were doing something a bit out-of-specs, or the Samsung SSDs have something that's missing from other SSDs.

    3. Re:Why did it only happened on Samsung's SSDs? by wbo · · Score: 1

      Other brands of SSDs are on the Blacklist so I think there is a very good chance that they were impacted as well. I looked at the blacklist quickly and saw drives from Crucial, Micron, and Intel on the list as well as Samsung.

      People just complained about Samsung drives more.

    4. Re:Why did it only happened on Samsung's SSDs? by goarilla · · Score: 1

      Well they pump out the cheapest SSD's (TLC) around, which everyone buys. And let's not forget the fact that Samsung's reputation was already dwindling because of the many performance degradation issues.

    5. Re:Why did it only happened on Samsung's SSDs? by hyperar · · Score: 1

      OK, thanks for your answers. I thought that Samsung was doing something they weren't supposed to with their SSDs, but it turned out to be kind of a witch hunt. Thanks!

    6. Re:Why did it only happened on Samsung's SSDs? by DRJlaw · · Score: 2

      Excellent question. My first guesses would be that either the Samsung SSDs were doing something a bit out-of-specs, or the Samsung SSDs have something that's missing from other SSDs.

      From TFS: "The vendor of the drive did not matter and the previous blacklisting of Samsung drives for broken queued trim support can be most likely lifted after further tests."

      If the vendor of the drive does not matter in testing, then there is no relevant difference in specification compliance or other "somethings." It's purely a matter of which anecdotes gain what traction within a small population of users using md raid with multiple SSDs in a raid 0 or 10 configuration, and which of those users circumstantially has the best contacts within the development community.

      My first guess is the users trying that configuration were purchasing the fastest available SSDs, which tend to be Samsung drives (large market share) or boutique manufacturers (small market share).

    7. Re:Why did it only happened on Samsung's SSDs? by ArcadeMan · · Score: 1

      I'm not familiar with all the flash-related technologies currently in use, what's your opinion on the Intel SSDs?

    8. Re:Why did it only happened on Samsung's SSDs? by swb · · Score: 2

      Perhaps competitive prices coupled with perceived quality (and good experience on other platforms) led to these drives being selected by more knowledgeable or performance oriented people.

      These drives then got pushed harder or in ways more likely to expose the bugs, leading to a perception that they were unreliable under Linux.

    9. Re:Why did it only happened on Samsung's SSDs? by Anonymous Coward · · Score: 0

      WTF are you talking about? Samsung SSDs have always been considered top tier. If you want the "best" SSDs you basically buy an Intel, Samsung or Crucial, in that order (but all roughly the same, with Intel standing out a bit more than the other two).

    10. Re:Why did it only happened on Samsung's SSDs? by goarilla · · Score: 1

      They have a good overall reputation but some of them hijack your data when they fail.
      See SSD life endurance test https://techreport.com/review/....
      Anyway, SSDs fail completely differently from hard drives. Most just vanish, some corrupt massively and others go into a final one-chance read-only mode (select Intel consumer models).
      Tested backups are a necessity here.

    11. Re:Why did it only happened on Samsung's SSDs? by goarilla · · Score: 1

      WTF are you talking about? Samsung SSDs have always been considered top tier. If you want the "best" SSDs you basically buy an Intel, Samsung or Crucial, in that order (but all roughly the same, with Intel standing out a bit more than the other two).

      If you want to buy the cheapest (price/performance) consumer SSD out there then yes you buy Samsung or Crucial.
      Intel prices their consumer stuff higher because they want fatter margins.

    12. Re:Why did it only happened on Samsung's SSDs? by Crispy+Critters · · Score: 1

      The story says the bug affected all drives equally, but a linked-to article says that the bug was isolated to TRIM commands by a group that found their 5 models of Samsung drives became regularly corrupted but their 3 models of Intel drives did not. That is not confirmation bias.

    13. Re:Why did it only happened on Samsung's SSDs? by MobyDisk · · Score: 1

      Most (not all) Intel drives are higher priced because they use SLC memory. Prior to 2014, I believe all Intel drives were SLC.

    14. Re:Why did it only happened on Samsung's SSDs? by Anonymous Coward · · Score: 0

      Name one non-EoL intel drive that uses SLC.

    15. Re:Why did it only happened on Samsung's SSDs? by goarilla · · Score: 1

      While Intel is conservative in its NAND flash and controllers, which are unique selling points, they haven't used SLC for a while now. Even the venerable X25 had plenty of non-SLC variants (http://ark.intel.com/nl/products/56600/Intel-SSD-X25-M-Series-160GB-2_5in-SATA-3Gbs-34nm-MLC).

    16. Re:Why did it only happened on Samsung's SSDs? by Anonymous Coward · · Score: 0

      If the vendor of the drive does not matter in testing, then there is no relevant difference in specification compliance or other "somethings."

      Yes, because multiple SSD vendors would never use the same buggy controller would they?

      While I haven't opened up any SSDs yet, it is not uncommon to find the same Toshiba controller or whatever in CF cards from different vendors.

    17. Re:Why did it only happened on Samsung's SSDs? by thegarbz · · Score: 1

      Excellent question. My first guesses would be that either the Samsung SSDs were doing something a bit out-of-specs, or the Samsung SSDs have something that's missing from other SSDs.

      Knowing the industry the way it is it is just as likely that Samsung were the only ones who implemented the spec faithfully without some dodgy firmware workaround.

      Sometimes the "broken" device is the only one actually working properly.

    18. Re:Why did it only happened on Samsung's SSDs? by wonkey_monkey · · Score: 1

      Because there are two different bugs at issue here. There was a bug in the Linux kernel which Samsung fixed; and some of their drives have broken queued TRIM support. Summary makes a mess of it.

      --
      systemd is Roko's Basilisk.
    19. Re:Why did it only happened on Samsung's SSDs? by hyperar · · Score: 1

      If you're right, then yes, the article didn't make that clear, at all.

  11. +1 by Dishwasha · · Score: 1

    :thumbsup:

  12. Except they didn't by Anonymous Coward · · Score: 1

    The first reply:


    Thanks for tracking this down. Instead of explicitly coding around the
    issue in raid0/raid10/linear I would prefer to fix bio_split(). It seems
    like a deficiency in the interface that it does not handle this
    transparently.

    Do you have a reproducible test case? If so it would be great if you
    could try the following patch and let us know the results.

    1. Re:Except they didn't by Anonymous Coward · · Score: 0
  13. Too bad by Anonymous Coward · · Score: 0

    Too bad about Samsung's undeserved reputation from neckbeard linux aspies.
    Sometimes you have to get your hands dirty to fix problems that weren't your fault anyway. Good for Samsung.

  14. Good work by Kuruk · · Score: 2

    Hats off to Samsung for finding and even fixing the problem.

  15. We finally get trim, but... by Anonymous Coward · · Score: 0

    The problem is, you're only allowed to get trim with things you don't ever want to see again...

  16. Every software can have unknown weak security point by Anonymous Coward · · Score: 0

    Every piece of software can have unknown weak security points and errors, Linux included. But open source software like Linux has the advantage and disadvantage that everybody can study the source and find them...

  17. Apology by JustAnotherOldGuy · · Score: 2

    On behalf of all internet users everywhere, whether in this specific space-time continuum or not, I would like to formally apologize to Samsung for all of the totally unwarranted bashing they took over this issue. And I would also like to express my gratitude to them for finding a bug, fixing it, and posting a fix. Good job.

    --
    Just cruising through this digital world at 33 1/3 rpm...
  18. Proves that free opensource is bullshit by Anonymous Coward · · Score: 0, Troll

    Only paid developers get shit done and done correctly.

    1. Re:Proves that free opensource is bullshit by Anonymous Coward · · Score: 1

      You got it, dude. Your moderation proved this even more. Morons!

  19. don't remember any denial by Chirs · · Score: 1

    More like an assumption that the bug was in the driver because they hadn't noticed issues on other drives.

    1. Re:don't remember any denial by batkiwi · · Score: 1

      If you look at the SSD blacklist it's HUGE, and not just filled with Samsung drives.

  20. fairly common to blacklist devices by Chirs · · Score: 1

    hardware firmware is commonly buggy. Device drivers often have to work around buggy hardware, so blacklisting devices for various functionality is not at all unusual.

    If the code seems to work with other devices and breaks with a new device, then the first instinct is going to be to assume the new device is doing something wrong.

    1. Re:fairly common to blacklist devices by Midnight+Thunder · · Score: 4, Insightful

      hardware firmware is commonly buggy. Device drivers often have to work around buggy hardware, so blacklisting devices for various functionality is not at all unusual.

      If the code seems to work with other devices and breaks with a new device, then the first instinct is going to be to assume the new device is doing something wrong.

      Another way of seeing things is that even if the bug is in the kernel, blacklisting still prevents damage to data on said vendor's hardware. When it comes to data corruption, the first thing to do is limit damage, no matter who is at fault. Afterwards, you can work together to try to isolate the source of the problems. Having unhappy users and customers is never good, unless you are the competition.

      --
      Jumpstart the tartan drive.
    2. Re:fairly common to blacklist devices by AmiMoJo · · Score: 1, Informative

      It's the fact that they put the boot in to Samsung, claiming that their TRIM implementation was broken. They then stopped looking at their own code and had to wait for Samsung to fix their bug.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    3. Re:fairly common to blacklist devices by Anonymous Coward · · Score: 5, Informative

      Sorry, that's incorrect.

      There's a bug on MD raid0 and raid10. In Linux.

      There is a data destroyer bug in SAMSUNG NCQ TRIM firmware. Which is *blacklisted*, so that it uses the non-ncq trim.

      See? You're an idiot and everyone but you actually knew what they were complaining about. The samsung firmware is buggy crap that destroys data on NCQ TRIM, and the Linux kernel had a data destroyer bug in RAID0/RAID10 + TRIM that was fixed by a samsung engineer.

      The samsung firmware is still broken, the linux kernel has been fixed, and you're still an useless idiot.

    4. Re:fairly common to blacklist devices by Anonymous Coward · · Score: 1

      So when Samsung has a bug it's:

      The samsung firmware is buggy crap that destroys data on NCQ TRIM

      But when the kernel has a bug it's:

      the Linux kernel had a data destroyer bug in RAID0/RAID10 + TRIM

      I think that's what people like to call cognitive dissonance.
      Both had bugs, one of those bugs is now fixed. Calling the other crap to protect yourself does you no favours.

    5. Re:fairly common to blacklist devices by AmiMoJo · · Score: 1

      Wow, so much rage. You should see a doctor.

      The alleged buggy implementation of NCQ TRIM in the Samsung firmware is not a bug at all. It can be safely re-enabled now, no need to blacklist it. It works fine on other operating systems too.

      Maybe you should try to understand this issue before going full ragetard on it.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    6. Re:fairly common to blacklist devices by Anonymous Coward · · Score: 0

      The samsung firmware is still broken, the linux kernel has been fixed, and you're still an useless idiot.

      Linus, is that you?

  21. "oh." You say. " Thumbs Up!" by DogShoes · · Score: 0

    If this were Windows or OSX, you people would be shitting on it like sick Hogs for weeks on end.

    Instead, several of you call it a triumph of open source!? What a fantasy.

    There have been at least two kernel releases since this cropped up. That means nobody even looked. *Outsiders* had to find it.

    Linus must be rolling over in his grave 'cause sure as shit he's mortified to death.

  22. Good News by Anonymous Coward · · Score: 0

    I'd installed 8 1TB Samsung 850 Pros two days prior to seeing this bug announcement. There were many poos.

    I'm still keenly waiting for the Kernel fix to be released, but my sphincter is unclenching.

    1. Re:Good News by ledow · · Score: 1

      Doesn't matter much - this is why many Samsungs were mistakenly blacklisted, thinking it was a problem with the drive.

      Unless you're running RAID0 or similar, it's not going to bite you. Not at all sure why anyone runs RAID0, to be honest, and certainly not with SSD's, but there you go. RAID10 is affected, I believe, but with 8 drives I'm not sure what you'd get from RAID10 that RAID 5 wouldn't have been better for you anyway.

  23. a bit too harsh by Chirs · · Score: 1

    Bugs happen. If you've got code that seems to work and then you investigate and it doesn't work on one particular brand of drive, it would be a reasonable suspicion that there is something funny with those drives.

    Given the fact that multiple Samsung drive models were failing but multiple Intel drive models were *not* failing under the same test (from the linked article), the developers could be forgiven in suspecting there was something wonky going on with the Samsung drives.

    1. Re:a bit too harsh by Anonymous+Brave+Guy · · Score: 1

      Yes, bugs happen, and yes, sometimes diagnosing hardware compatibility issues is tricky. But if I see a potential data loss bug in software I develop, I don't start making judgements about where it comes from -- and I definitely don't start pointing the finger at other people and denying anything is wrong with my own code -- until I've identified the root cause of the problem.

      The issue here isn't really that a bug happened, even though the bug was serious. It's the way it was handled that is the greater cause for concern.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    2. Re:a bit too harsh by AthanasiusKircher · · Score: 2

      Bugs happen. If you've got code that seems to work and then you investigate and it doesn't work on one particular brand of drive, it would be a reasonable suspicion that there is something funny with those drives.

      It's hard to evaluate exactly what went on here. If you read the original report of the discovery (which I did last month and is still the first link in TFS), you see this explanation:

      Poking around in the source code of the kernel looking for the trim related code, we came to the trim blacklist. This blacklist configures a specific behavior for certain SSD drives and identifies the drives based on the regexp of the model name. Our working SSDs were explicitly allowed full operation of the TRIM but some of the SSDs of our affected manufacturer were limited. Our affected drives did not match any pattern so they were implicitly allowed full operation.

      In other words, they didn't know what was going on. Then they happened upon some code in the Linux kernel that explicitly blacklisted certain model segments from certain manufacturers. So, at some point someone made the assumption that this must be related to certain models from certain manufacturers, based on code in the Linux kernel.

      This could easily have led to confirmation bias in a situation where errors were not occurring frequently. (Note the further explanation that when they first informed Samsung, Samsung was unable to reproduce the issue until they started using a custom "much more intensive script" to increase the error rate of the problem.)

      So, I don't claim to know the full situation, but my guess is that Samsung wouldn't have been blamed for this at all if this blacklisting code hadn't already been seen in the Linux kernel.

      I'm not trying to place the blame on anyone in particular. But in this case there were various reasons they probably started thinking manufacturers were the problem other than just simple logic, and the "aha" moment apparently was based on looking at code in the Linux kernel already, not on actual prior observation that certain brands of drives were failing. (Otherwise, they would have probably suspected a hardware problem earlier... but instead the post describes a lot of time searching for software issues before they discovered the blacklist.)
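
The lookup described in the quoted passage (match a drive's model name against a list of patterns, and allow full operation when nothing matches) can be pictured with a small sketch. This is only an illustration in Python, not the kernel's C code: the pattern strings, the `no_queued_trim` flag name, and both helper functions are hypothetical, and shell-style glob matching is an assumption made here for simplicity.

```python
from fnmatch import fnmatchcase

# Hypothetical quirk table: (model-name pattern, set of restrictions).
# These entries are illustrative and are NOT copied from the kernel source.
QUIRK_TABLE = [
    ("ExampleVendor SSD 8*", {"no_queued_trim"}),
    ("OtherVendor_M5*", {"no_queued_trim"}),
]


def quirks_for(model):
    """Return the union of restrictions from every pattern matching model."""
    found = set()
    for pattern, flags in QUIRK_TABLE:
        if fnmatchcase(model, pattern):
            found |= flags
    return found


def queued_trim_allowed(model):
    """A drive that matches no pattern is implicitly allowed full operation."""
    return "no_queued_trim" not in quirks_for(model)
```

The point of the sketch is the default: a model that appears on no list is treated as fully capable, which is how the affected drives in the quoted report ended up with unrestricted TRIM.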

  24. the 840 evo speed issues... by Chirs · · Score: 1

    There were issues with the 840 EVO losing significant speed after it had been in use for a while. There was eventually (after much complaining from customers) a "fix" released that helped but didn't actually completely resolve the issue.

    1. Re:the 840 evo speed issues... by ledow · · Score: 1

      Why would you use consumer level drives in a business?

      Clients really don't need an SSD as they are mostly limited by network speed more than anything, and servers shouldn't be touching that shit.

      That said, the 840 EVOs put me off upgrading my laptop, but I just went with a 1TB 850 instead, which doesn't have any of those problems.

      Every manufacturer has problems with certain models, it's inevitable. But make sure you're using the right use case, evaluate properly, and disregard things that couldn't have affected you and you're fine.

      To be honest, a manufacturer that has the nous to say "We don't think it's us, we've stuck a programmer on it, look we found a bug, here's the patch" is something worth supporting because that's PRECISELY what I'd expect of any decent company.

    2. Re:the 840 evo speed issues... by david_thornley · · Score: 1

      The presumed difference between consumer level and enterprise level components is usually reliability. If you're running a large enough operation, stuff will be failing all the time anyway, so dealing with it has to be a normal procedure rather than exceptional. At that point, you can ask yourself if fewer failures are worth the extra cost, and the consumer level stuff may be more cost-effective.

      --
      "When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
  25. it could affect all drives equally by YesIAmAScript · · Score: 1

    But it doesn't have to. If a drive were to implement TRIM by doing absolutely nothing (which is completely within spec) then it wouldn't show the problem, but it doesn't mean the drive is better than another or the other drive has a fault.

    It's quite possible that the way IBM implements TRIM is just a little different. Perhaps they defer it for a few ms or something. So the bug is occurring over and over but it doesn't show itself with corruption.

    Yes, assuming that because you can reproduce it on Samsung drives it must be a Samsung bug is confirmation bias.
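
    To make the point concrete, here is a toy sketch (not real firmware or kernel code). Both drive classes and the off-by-two range in buggy_discard are invented for illustration. It only shows how one bad discard request can be harmless on a drive that ignores TRIM and destructive on one that acts on it right away.

```python
# Toy model only: two hypothetical drives, both behaving legally for TRIM.
class NoOpTrimDrive:
    """Treats TRIM as a hint and ignores it."""

    def __init__(self, sectors):
        self.data = ["live"] * sectors

    def trim(self, start, count):
        pass  # doing nothing is allowed


class EagerTrimDrive:
    """Discards trimmed sectors immediately."""

    def __init__(self, sectors):
        self.data = ["live"] * sectors

    def trim(self, start, count):
        for i in range(start, start + count):
            self.data[i] = None  # contents are gone


def buggy_discard(drive, start, count):
    # Stand-in for a host-side bug: the range sent is larger than
    # the range the filesystem asked for (the +2 is made up).
    drive.trim(start, count + 2)


noop = NoOpTrimDrive(8)
eager = EagerTrimDrive(8)
for drive in (noop, eager):
    buggy_discard(drive, 0, 2)  # filesystem only freed sectors 0 and 1

print(noop.data.count(None))   # sectors lost on the no-op drive
print(eager.data.count(None))  # sectors lost on the eager drive
```

    The same buggy request reaches both drives; only the eager one loses live sectors, so seeing corruption on one brand says nothing about where the bug lives.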

    --
    http://lkml.org/lkml/2005/8/20/95
  26. the article said some Intel drives not affected by Chirs · · Score: 1

    The linked article pointed out that five models of Samsung SSD were affected, three models of Intel SSD were not. So there were at least some drives that didn't seem to be affected by the bug. (Presumably just due to luck/usage-pattern/etc.)

    1. Re:the article said some Intel drives not affected by Anonymous Coward · · Score: 0

      So, to get TRIM support working, is a kernel patch needed?

  27. Oh bugger by metamatic · · Score: 1

    I'm running Linux on a RAID-0 SSD array.

    I guess I should turn off fstrim until there's a backport of the fix to Fedora?

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  28. Yes, the power of open sores. by Anonymous Coward · · Score: 0

    Many mouths mewling madly on mailing lists.

    Passing the buck instead of fixing their fucking code.

  29. How was this recreated before the bug existed? by godamntheman · · Score: 4, Insightful

    Something doesn't add up ... The fix for this was an oversight in a relatively new "bio_split()" routine that merged in with the immutable bio vector patch set for Linux kernel 3.15. The Algolia blog referenced in the Samsung patch claims it was able to replicate the discard issue using kernels 3.2, 3.10, and 3.14, before the bug existed. What gives?

    1. Re:How was this recreated before the bug existed? by Anonymous Coward · · Score: 0

      Stable backports of changes. And I *really* hate any that touch mm code, skb_ (network buffers) and bio_ (io buffers) crap, as those are optimized to _insane_ levels so there is very very little security against bugs, and the interactions are very subtle.

    2. Re:How was this recreated before the bug existed? by godamntheman · · Score: 1

      There are definitely subtle interactions, and that's why this commit was not a candidate for stable. No distro back-ported immutable bio iterators (commit: 20d0189b1012a37d2533a87fb451f7852f2418d1).

  30. Blame NAND Flash Memory... by KonoWatakushi · · Score: 1

    While an apology is due, this sort of problem is inevitable given the nature of the technology. TRIM on NAND is a crutch for a technology that is poorly suited to data storage. Transforming NAND into a usable storage device requires heroic efforts on the part of the vendor, and it is hard to blame them for the bugs. Likewise, it is hard to blame Linux developers for their heroic efforts to work around the extensive deficiencies of NAND flash. Trusting in cheap commodity devices that don't even claim to protect against power loss is ill-advised.

    Using TRIM as a band-aid for the performance woes of over-filled NAND devices is just asking for trouble. It has long been known that filling up filesystems leads to terrible performance, and the same applies to NAND drives. It is irresponsible of the vendors to provision the drives with insufficient reserved space, but one can compensate by setting aside an empty partition covering 5% of the space. It is much safer to disable TRIM and under-provision the drive, and it achieves the same effect of limiting write-amplification, without having to worry about bugs trimming away live data.
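
    As a rough sketch of the arithmetic, assuming 512-byte sectors and a 2048-sector (1 MiB) alignment, neither of which is stated above: the function name and defaults are made up for illustration. It returns how many sectors to hand to partitions so that about 5% of the drive stays unallocated.

```python
def usable_sectors(total_sectors, reserve_percent=5, align=2048):
    """Return how many sectors to partition, leaving the rest untouched.

    reserve_percent and align are illustrative defaults, not vendor figures.
    """
    # Integer math avoids floating-point rounding surprises.
    keep = total_sectors * (100 - reserve_percent) // 100
    # Round down so the partitioned area ends on an alignment boundary.
    return keep - keep % align


total = 1_000_000              # hypothetical drive size in sectors
used = usable_sectors(total)
reserved = total - used        # sectors never partitioned or written
print(used, reserved)
```

    Rounding down means the reserve ends up slightly above the requested percentage, which errs on the safe side.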

    The only place where TRIM really makes sense is in the context of virtualization. Recovering space in sparse virtual disk images has real benefit, and operating system vendors have a lot more incentive and ability to make it work properly.

    1. Re:Blame NAND Flash Memory... by Anonymous Coward · · Score: 1

      How is the OS meant to tell the drive that your partition is 'empty space' that can be 'set aside' for use as over-provisioning? It can only do that with a TRIM command....

    2. Re:Blame NAND Flash Memory... by Anonymous Coward · · Score: 0

      TRIM is only necessary to free written blocks. The entire drive starts as empty space, and remains so if unwritten. Create an empty partition on a new or erased drive and do not format it, or ever write to it.

  31. This is news? by Anonymous Coward · · Score: 0

    Company fixes driver for their own hardware on open source operating system? (installed on most systems on the planet, in part thanks to that very same company?) Doesn't this happen all the time? I'd assume Samsung submits tons of patches to the kernel every day.

  32. Awesome on Sammy by Anonymous Coward · · Score: 0

    Thanks guys, I'll be buying that LED screen this year :)

  33. So, we add some code... by Anonymous Coward · · Score: 0

    https://imgur.com/XVR8khP

    "So, we add some code..." //Seunguk Shin (http://www.spinics.net/lists/raid/msg49440.html)

    1. Re:So, we add some code... by Anonymous Coward · · Score: 0

      Add some CODES, to be precise :'D
      https://imgur.com/63k0Mci