Slashdot Mirror


TRIM and Linux: Tread Cautiously, and Keep Backups Handy

An anonymous reader writes: Algolia is a buzzword-compliant ("Hosted Search API that delivers instant and relevant results") start-up that uses a lot of open-source software (including various strains of Linux) and a lot of solid-state disk, and as such sometimes runs into problems with each of these. Their blog this week features a fascinating look at troubles that they faced with ext4 filesystems mysteriously flipping to read-only mode: not such a good thing for machines processing a search index, not just dishing it out. "The NGINX daemon serving all the HTTP(S) communication of our API was up and ready to serve the search queries but the indexing process crashed. Since the indexing process is guarded by supervise, crashing in a loop would have been understandable but a complete crash was not. As it turned out the filesystem was in a read-only mode. All right, let's assume it was a cosmic ray :) The filesystem got fixed, files were restored from another healthy server and everything looked fine again. The next day another server ended with filesystem in read-only, two hours after another one and then next hour another one. Something was going on. After restoring the filesystem and the files, it was time for serious analysis since this was not a one time thing."

The rest of the story explains how they isolated the problem and worked around it; it turns out that the culprit was TRIM, or rather TRIM's interaction with certain SSDs: "The system was issuing a TRIM to erase empty blocks, the command got misinterpreted by the drive and the controller erased blocks it was not supposed to. Therefore our files ended-up with 512 bytes of zeroes, files smaller than 512 bytes were completely zeroed. When we were lucky enough, the misbehaving TRIM hit the super-block of the filesystem and caused a corruption."
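
The symptom quoted above (512-byte runs of zeroes, with smaller files zeroed completely) can be checked for mechanically. What follows is a minimal sketch, assuming the file contents are already in memory; `count_zeroed_blocks` and `SECTOR_SIZE` are names invented here for illustration and are not taken from Algolia's write-up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512 /* block size taken from the quoted description */

/*
 * Count the 512-byte-aligned blocks of buf that contain only zero bytes.
 * A trailing partial block is checked too, because the quote says files
 * smaller than 512 bytes were zeroed completely.
 */
size_t count_zeroed_blocks(const unsigned char *buf, size_t len)
{
    static const unsigned char zeros[SECTOR_SIZE] = {0};
    size_t hits = 0;

    assert(buf != NULL || len == 0);
    for (size_t off = 0; off < len; off += SECTOR_SIZE) {
        /* Compare a full sector, or whatever is left at the end. */
        size_t n = (len - off < SECTOR_SIZE) ? len - off : SECTOR_SIZE;
        if (memcmp(buf + off, zeros, n) == 0)
            hits++;
    }
    return hits;
}
```

A non-zero count is only a hint, since all-zero blocks can be legitimate (padding, preallocated space). Comparing against a copy from a healthy server is the more conclusive check.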

Since SSDs are becoming the norm outside the data center as well as within, some of the problems that their analysis exposed for one company probably would be good to test for elsewhere. One upshot: "As a result, we informed our server provider about the affected SSDs and they informed the manufacturer. Our new deployments were switched to different SSD drives and we don't recommend anyone to use any SSD that is anyhow mentioned in a bad way by the Linux kernel."

11 of 182 comments

  1. Name and shame by Anonymous Coward · · Score: 2, Informative

    see ata_blacklist_entry

    (reformatted to get past Slashdot's 'junk' filter)

    static const struct ata_blacklist_entry ata_device_blacklist [] =
    /* Devices with DMA related problems under Linux */
    WDC AC11000H, NULL, ATA_HORKAGE_NODMA ,
    WDC AC22100H, NULL, ATA_HORKAGE_NODMA ,
    WDC AC32500H, NULL, ATA_HORKAGE_NODMA ,
    WDC AC33100H, NULL, ATA_HORKAGE_NODMA ,
    WDC AC31600H, NULL, ATA_HORKAGE_NODMA ,
    WDC AC32100H, 24.09P07, ATA_HORKAGE_NODMA ,
    WDC AC23200L, 21.10N21, ATA_HORKAGE_NODMA ,
    Compaq CRD-8241B, NULL, ATA_HORKAGE_NODMA ,
    CRD-8400B, NULL, ATA_HORKAGE_NODMA ,
    CRD-848[02]B, NULL, ATA_HORKAGE_NODMA ,
    CRD-84, NULL, ATA_HORKAGE_NODMA ,
    SanDisk SDP3B, NULL, ATA_HORKAGE_NODMA ,
    SanDisk SDP3B-64, NULL, ATA_HORKAGE_NODMA ,
    SANYO CD-ROM CRD, NULL, ATA_HORKAGE_NODMA ,
    HITACHI CDR-8, NULL, ATA_HORKAGE_NODMA ,
    HITACHI CDR-8[34]35,NULL, ATA_HORKAGE_NODMA ,
    Toshiba CD-ROM XM-6202B, NULL, ATA_HORKAGE_NODMA ,
    TOSHIBA CD-ROM XM-1702BC, NULL, ATA_HORKAGE_NODMA ,
    CD-532E-A, NULL, ATA_HORKAGE_NODMA ,
    E-IDE CD-ROM CR-840,NULL, ATA_HORKAGE_NODMA ,
    CD-ROM Drive/F5A, NULL, ATA_HORKAGE_NODMA ,
    WPI CDD-820, NULL, ATA_HORKAGE_NODMA ,
    SAMSUNG CD-ROM SC-148C, NULL, ATA_HORKAGE_NODMA ,
    SAMSUNG CD-ROM SC, NULL, ATA_HORKAGE_NODMA ,
    ATAPI CD-ROM DRIVE 40X MAXIMUM,NULL,ATA_HORKAGE_NODMA ,
    _NEC DV5800A, NULL, ATA_HORKAGE_NODMA ,
    SAMSUNG CD-ROM SN-124, N001, ATA_HORKAGE_NODMA ,
    Seagate STT20000A, NULL, ATA_HORKAGE_NODMA ,
    2GB ATA Flash Disk, ADMA428M, ATA_HORKAGE_NODMA ,
    /* Odd clown on sil3726/4726 PMPs */
    Config Disk, NULL, ATA_HORKAGE_DISABLE ,
    /* Weird ATAPI devices */
    TORiSAN DVD-ROM DRD-N216, NULL, ATA_HORKAGE_MAX_SEC_128 ,
    QUANTUM DAT DAT72-000, NULL, ATA_HORKAGE_ATAPI_MOD16_DMA ,
    Slimtype DVD A DS8A8SH, NULL, ATA_HORKAGE_MAX_SEC_LBA48 ,
    Slimtype DVD A DS8A9SH, NULL, ATA_HORKAGE_MAX_SEC_LBA48 ,
    /* Devices we expect to fail diagnostics */
    /* Devices where NCQ should be avoided */
    /* NCQ is slow */
    WDC WD740ADFD-00, NULL, ATA_HORKAGE_NONCQ ,
    WDC WD740ADFD-00NLR1, NULL, ATA_HORKAGE_NONCQ ,
    /* http://thread.gmane.org/gmane.linux.ide/14907 */
    FUJITSU MHT2060BH, NULL, ATA_HORKAGE_NONCQ ,
    /* NCQ is broken */
    Maxtor *, BANC*, ATA_HORKAGE_NONCQ ,
    Maxtor 7V300F0, VA111630, ATA_HORKAGE_NONCQ ,
    ST380817AS, 3.42, ATA_HORKAGE_NONCQ ,
    ST3160023AS, 3.42, ATA_HORKAGE_NONCQ ,
    OCZ CORE_SSD, 02.10104, ATA_HORKAGE_NONCQ ,
    /* Seagate NCQ + FLUSH CACHE firmware bug */
    ST31500341AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN ,
    ST31000333AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN ,
    ST3640[36]23AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN ,
    ST3320[68]13AS, SD1[5-9], ATA_HORKAGE_NONCQ | ATA_HORKAGE_FIRMWARE_WARN ,
    /* Seagate Momentus SpinPoint M8 seem to have FPMDA_AA issues */
    ST1000LM024 HN-M101MBB, 2AR10001, ATA_HORKAGE_BROKEN_FPDMA_AA ,
    ST1000LM024 HN-M101MBB, 2BA30001, ATA_HORKAGE_BROKEN_FPDMA_AA ,
    /* Blacklist entries taken from Silicon Image 3124/3132
       Windows driver .inf file - also several Linux problem reports */
    HTS541060G9SA00, MB3OC60D, ATA_HORKAGE_NONCQ ,
    HTS541080G9SA00, MB4OC60D, ATA_HORKAGE_NONCQ ,
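
Each row in the table above is a (model pattern, firmware revision pattern, flags) triple, and a NULL revision means any firmware. A rough userspace sketch of how such a table can be consulted is shown here. It assumes POSIX `fnmatch()` as a stand-in for the kernel's own glob matcher, and the `quirk_entry`, `QUIRK_*` and `lookup_quirks` names are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Illustrative flag values; the real ATA_HORKAGE_* constants differ. */
enum {
    QUIRK_NODMA         = 1 << 0,
    QUIRK_NONCQ         = 1 << 1,
    QUIRK_FIRMWARE_WARN = 1 << 2,
};

struct quirk_entry {
    const char *model;    /* glob matched against the drive model string */
    const char *revision; /* glob matched against firmware rev; NULL = any */
    unsigned long flags;
};

/* A few rows copied from the list above, with quotes and braces restored. */
static const struct quirk_entry quirk_table[] = {
    { "WDC AC11000H", NULL,       QUIRK_NODMA },
    { "Maxtor *",     "BANC*",    QUIRK_NONCQ },
    { "ST31500341AS", "SD1[5-9]", QUIRK_NONCQ | QUIRK_FIRMWARE_WARN },
    { NULL, NULL, 0 } /* terminator */
};

/* Return the flags of the first matching row, or 0 if nothing matches. */
unsigned long lookup_quirks(const char *model, const char *revision)
{
    assert(model != NULL && revision != NULL);
    for (const struct quirk_entry *e = quirk_table; e->model != NULL; e++) {
        if (fnmatch(e->model, model, 0) != 0)
            continue; /* model does not match this row */
        if (e->revision == NULL || fnmatch(e->revision, revision, 0) == 0)
            return e->flags;
    }
    return 0;
}
```

With this table, `lookup_quirks("ST31500341AS", "SD17")` returns both flags, while the same model with a revision outside `SD1[5-9]` returns 0. That is how a firmware update can take a drive off the list.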

  2. Re:Is there a site maintaining a list of "bad" SSD by Ken_g6 · · Score: 5, Informative

    It takes a couple of links and searching through source code to get there. So here's the list of problematic drives, better formatted but still in regular expression format:

    /* devices that don't properly handle queued TRIM commands */
    Micron_M500*
    Crucial_CT*M500*
    Micron_M5[15]0*
    Crucial_CT*M550*
    Crucial_CT*MX100*
    Samsung SSD 8*

    So, basically, all the ones I thought were the best. The list of whitelisted drives after it only includes those brands, Intel, and ST-something. So other brands may be unknowns.

    --
    (T>t && O(n)--) == sqrt(666)
  3. Re:Is there a site maintaining a list of "bad" SSD by Anonymous Coward · · Score: 5, Informative

    The Crucial MX100 with the latest MU02 firmware is now whitelisted by the Linux kernel, and has its TRIM ability re-enabled.

  4. Re:Is there a site maintaining a list of "bad" SSD by idontgno · · Score: 5, Informative

    ObPedant: those aren't regexes, they're globs. Otherwise (for instance), the Samsung entry would match

    Samsung SSD<space>
    Samsung SSD<space>8
    Samsung SSD<space>88
    Samsung SSD<space>888
    .
    .
    .

    ad nauseam: the "*" regex operator means "zero or more occurrences of the previous pattern", which in this case is the character "8".

    At least, I hope they're not supposed to be regexes. Otherwise, the kernel blacklist itself will have some serious issues identifying known-bad SSDs because someone never learned how to create a regular expression.
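
The difference between the two readings of `Samsung SSD 8*` can be tried out in userspace. This is a small sketch, assuming POSIX `fnmatch()` for the glob reading and POSIX basic regular expressions (`regcomp()`/`regexec()`) for the regex reading, and assuming the pattern must cover the whole model string. The function names are made up for illustration.

```c
#include <assert.h>
#include <fnmatch.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if text matches pattern read as a shell-style glob, else 0. */
int matches_as_glob(const char *pattern, const char *text)
{
    assert(pattern != NULL && text != NULL);
    return fnmatch(pattern, text, 0) == 0;
}

/*
 * Return 1 if the whole of text matches pattern read as a POSIX basic
 * regex, 0 if it does not, and -1 if the pattern fails to compile.
 */
int matches_as_regex(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int ok;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, 0) != 0)
        return -1;
    /* POSIX matching is leftmost-longest, so a full match, if one exists,
     * is reported as starting at offset 0 and ending at the terminator. */
    ok = regexec(&re, text, 1, &m, 0) == 0
         && m.rm_so == 0 && text[m.rm_eo] == '\0';
    regfree(&re);
    return ok;
}
```

Read as a glob, the pattern accepts "Samsung SSD 850 PRO 256GB" and rejects "Samsung SSD " (no 8). Read as a whole-string regex it does the opposite, accepting "Samsung SSD " and "Samsung SSD 888" but rejecting the 850.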

    --
    Welcome to the Panopticon. Used to be a prison, now it's your home.
  5. Re:Is there a site maintaining a list of "bad" SSD by Anonymous Coward · · Score: 4, Informative

    There's also an upgrade path for Micron's older SSDs - I just upgraded my Crucial M550 from MU01 to MU02 using a bootable ISO from Micron's support site:

          http://www.crucial.com/usa/en/support-ssd-firmware

  6. Re:TRIM -- command of mass destruction by Anonymous Coward · · Score: 5, Informative

    Man Linux users are hilarious. TRIM has worked and been safe on every other platform for ages.

    LOL.

    Do you know who you're replying to? Matt Dillon is the principal developer of DragonflyBSD, and the HAMMER filesystem.

    While he probably does use Linux from time to time, I think you're more likely to find him on a BSD system.

  7. Re:Another Deceptive Slashdot Title by Anonymous Coward · · Score: 3, Informative

    Windows and MacOS do not issue Queued TRIM in the first place. They only issue the regular TRIM command, which has to stop all data in flight and quiesce the entire submission queue (all tags, etc).

    Linux is ultra-high-IO-load optimized; queued TRIM is a must when dealing with high-performance storage (not just SSDs). Maybe it should stop trusting, by default, devices that are not attached to a SAS or FC transport when they claim to actually implement advanced features, though.

  8. Re:Btrfs? by BitZtream · · Score: 3, Informative

    COW doesn't solve the problem that TRIM solves.

    Once you write over the entire drive once, then all blocks of flash are dirty and MUST be erased before any new writes can take place. At this point, you can't even write the metadata without a sector erase, then you can write to it ... just to tell it that you've added another ref to an existing block.

    With TRIM, blocks are erased when they are no longer used, so they do not need an erase cycle before writing to them.

    I don't use BTRFS, but do use ZFS and it most certainly benefits from TRIM on an active drive, which is certainly what all your SSDs are going to be.

    --
    Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
  9. Re:Another Deceptive Slashdot Title by amiga3D · · Score: 1, Informative

    It's a buggy hardware issue. How each operating system deals with it may vary but the entire dilemma results from shitty hardware.

  10. Re:Is there a site maintaining a list of "bad" SSD by PlusFiveTroll · · Score: 4, Informative

    Because Windows doesn't do queued TRIM.

    TRIM in Windows and Linux before now worked more like this. -DATA- -DATA- -FLUSH ALL COMMANDS TO DRIVE- -WAIT- -TRIM- -DATA- -DATA- When a drive was doing the trim thing it could not do anything else; there could be no other in-flight commands to the drive.

    This is different. -DATA- -DATA- -TRIM- -DATA- -TRIM- -DATA- -DATA- -DATA-

    Queued TRIM is part of NCQ and is an operation occurring alongside other instructions in the SATA queue. Problem is some disk manufacturers have pissed this up. It seems likely that a firmware update will be able to fix this issue.

    https://en.wikipedia.org/wiki/...

  11. Re:Is there a site maintaining a list of "bad" SSD by complete+loony · · Score: 4, Informative

    From the SSD Endurance Experiment:

    The drive's media wear indicator ran out shortly after 700TB, signaling that the NAND's write tolerance had been exceeded. Intel doesn't have confidence in the drive at that point, so the 335 Series is designed to shift into read-only mode and then to brick itself when the power is cycled. Despite suffering just one reallocated sector, our sample dutifully followed the script. Data was accessible until a reboot prompted the drive to swallow its virtual cyanide pill.

    --
    09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.