Slashdot Mirror


TRIM and Linux: Tread Cautiously, and Keep Backups Handy

An anonymous reader writes: Algolia is a buzzword-compliant ("Hosted Search API that delivers instant and relevant results") start-up that uses a lot of open-source software (including various strains of Linux) and a lot of solid-state disk, and as such sometimes runs into problems with each of these. Their blog this week features a fascinating look at troubles that they faced with ext4 filesystems mysteriously flipping to read-only mode: not such a good thing for machines processing a search index, not just dishing it out. "The NGINX daemon serving all the HTTP(S) communication of our API was up and ready to serve the search queries but the indexing process crashed. Since the indexing process is guarded by supervise, crashing in a loop would have been understandable but a complete crash was not. As it turned out the filesystem was in a read-only mode. All right, let's assume it was a cosmic ray :) The filesystem got fixed, files were restored from another healthy server and everything looked fine again. The next day another server ended with filesystem in read-only, two hours after another one and then next hour another one. Something was going on. After restoring the filesystem and the files, it was time for serious analysis since this was not a one time thing."

The rest of the story explains how they isolated the problem and worked around it; it turns out that the culprit was TRIM, or rather TRIM's interaction with certain SSDs: "The system was issuing a TRIM to erase empty blocks, the command got misinterpreted by the drive and the controller erased blocks it was not supposed to. Therefore our files ended-up with 512 bytes of zeroes, files smaller than 512 bytes were completely zeroed. When we were lucky enough, the misbehaving TRIM hit the super-block of the filesystem and caused a corruption."

Since SSDs are becoming the norm outside the data center as well as within, the problems that this one company's analysis exposed are probably worth testing for elsewhere. One upshot: "As a result, we informed our server provider about the affected SSDs and they informed the manufacturer. Our new deployments were switched to different SSD drives and we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel."
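
The corruption pattern quoted above (512-byte runs of zeroes inside otherwise healthy files, and small files zeroed completely) is simple enough to scan for. What follows is a minimal sketch in Python, not Algolia's actual tooling: the function name `find_zeroed_blocks` and the fixed 512-byte alignment are assumptions made for illustration. Plenty of legitimate files contain all-zero blocks (sparse files, padded binary formats), so a hit is only a candidate worth comparing against a known-good copy.

```python
import sys
from typing import List

BLOCK_SIZE = 512  # block size taken from the quoted write-up; an assumption for other drives


def find_zeroed_blocks(path: str, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the byte offsets of block-aligned chunks that contain only zero bytes.

    Hypothetical helper: it flags candidates only and cannot tell a
    wrongly trimmed block from a legitimately zero-filled one.
    """
    offsets = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            # A short final chunk counts too, which covers files smaller
            # than one block that were zeroed completely.
            if chunk.count(0) == len(chunk):
                offsets.append(offset)
            offset += len(chunk)
    return offsets


if __name__ == "__main__":
    # Usage: python scan_zero_blocks.py FILE [FILE ...]
    for name in sys.argv[1:]:
        hits = find_zeroed_blocks(name)
        if hits:
            print(f"{name}: {len(hits)} all-zero block(s), first at offset {hits[0]}")
```

Running it over files that also exist on a healthy replica, then diffing the reported offsets between the two machines, is one way to separate real damage from files that simply contain zeroes.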

10 of 182 comments

  1. Apple by Anonymous Coward · · Score: 2, Insightful

    This is why Apple doesn't support TRIM in third-party SSDs...

  2. TLDR by Anonymous Coward · · Score: 2, Insightful

    Don't buy Samsung SSDs.

  3. Re:Apple TRIM Whitelist? by Anonymous Coward · · Score: 2, Insightful

    Yes. Apple also has custom firmware to support temperature sensors (instead of just using the standard SMART commands), which is why the fans in their iMacs/Macbooks will run at full speed if you swap out their drive for a 3rd party one... OS X assumes that the sensor is broken and goes into safe mode to avoid an overtemp burnout.

  4. Re:Apple TRIM Whitelist? by Anonymous Coward · · Score: 5, Insightful

    It's a good bet.

    As Apple is probably quite aware, being probably the biggest seller of non-Windows PCs, there is an endemic problem with a whole lot of hardware shipped claiming to be "compliant" with any given standard.

    Most vendors' testing methodology pretty much comes down to "Works on Windows? Ship it"

    Linux has been dealing with this problem for decades. Power management implementations in laptops (and some desktop motherboards) are often outright broken and don't behave anything close to what the "standard" dictates. (It's so bad in laptops that Microsoft's power management maintains a hardware checklist with custom hacks for laptops with known bad implementations. On many systems it does not even /attempt/ to use standard calls.)

    Linux developers attempt to access hardware in the manner the documented standards describe and end up tripping all sorts of bugs, from mild to hardware-bricking. Flabbergasted hardware vendors often respond with "It works in Windows!"

    (Fortunately this shit doesn't fly in the server space where Linux is now pretty much King.. Well, at least in theory)

    So yes, I'd be willing to bet that Apple found that enabling trim in any old SSD led to an unacceptable chance of filesystem corruption and decided to implement a white list. So, you know, they don't catch shit for someone else's broken hardware.

  5. Another Deceptive Slashdot Title by idontgno · · Score: 4, Insightful

    Correct title: "TRIM and Any Fucking Operating System: Don't Buy Defective SSDs"

    It's not as if Windows or MacOS has any magic that makes queued TRIM work with non-compliant and poorly-coded hardware, right?

    Seriously, WTF, over?

    --
    Welcome to the Panopticon. Used to be a prison, now it's your home.
    1. Re:Another Deceptive Slashdot Title by Trogre · · Score: 4, Insightful

      Dear Microsoft,

      Thank you for your generous donation to our staff social club. As promised, please find attached drivers that utilise the *real* TRIM commands for our SSDs.

      Sincerely yours,
      A. Manufacturer

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  6. Re:Is there a site maintaining a list of "bad" SSD by Anonymous Coward · · Score: 1, Insightful

    Linux probably tickles a bug in the SSD. This is the problem with things only being validated within Windows.

  7. Re:Is there a site maintaining a list of "bad" SSD by Anonymous Coward · · Score: 2, Insightful

    I assume that Windows does not submit queued trim commands, thereby avoiding this problem.

  8. Re:Is there a site maintaining a list of "bad" SSD by drinkypoo · · Score: 4, Insightful

    Wait, what?

    When Intel SSDs decide they are bad, they just brick themselves instead of going into read-only-good-luck-your-data-may-be-bad-mode. This probably makes sense for Enterprise RAID, and for absolutely no other use case.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  9. Re:Is there a site maintaining a list of "bad" SSD by Anonymous Coward · · Score: 1, Insightful

    Well when FreeBSD gets around to supporting queued TRIM, people like yourself can thank Linux devs and users for getting the manufacturers to fix their firmware.

    Some men choose to walk the paths others have created, while some venture forth and create their own.