Slashdot Mirror


Intel Confirms Data Corruption Bug, Halts New SSDs

CWmike writes "Intel has confirmed that its new consumer-class X25-M and X18-M solid-state disk drives (SSDs) suffer from data corruption issues and said it has pulled back shipments to resellers. The X25-M (2.5-inch) and X18-M (1.8-inch) SSDs are based on a joint venture with Micron and use that company's 34-nanometer lithography technology. That process allows for a denser, higher-capacity product that brings with it a lower price tag than Intel's previous offerings, which were based on 50-nanometer lithography technology. Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer. When that happens, the SSD becomes inoperable and the data on it is irretrievable. This is not the first time Intel's X25-M and X18-M SSDs have suffered from firmware bugs. The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time. Intel issued a firmware upgrade as a fix."

33 of 137 comments

  1. Test before you ship by alain94040 · · Score: 4, Interesting

    Maybe they should have used HW/SW co-verification (like Seagate in that study - an example of how a storage company tests their firmware).

    For you software developers out there who enjoy free debuggers, you should know that we hardware designers also have our own debuggers. Except they are a little bit more expensive (think $500,000+) and can be quite bulky. But they are the only way to really test firmware before taping out a chip.

    1. Re:Test before you ship by Anonymous Coward · · Score: 5, Informative

      As a professional FW tester, I can say 1) firmware can be tested more easily than the hardware verification the parent is talking about, and 2) the parent is confusing HW verification with firmware verification. Don't confuse HW verification with firmware testing, and don't confuse software testing with hardware verification. They are vastly different from each other, and each has its own set of tools and methods (try sitting through a STAR East or STAR West seminar as a FW tester - it is a total waste of time).

      I can (and do) test firmware on buggy hardware all day long - it's not an issue.

  2. Ugh... summary.... by blahplusplus · · Score: 3, Informative

    "The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time."

    The performance degradation in the Intel X-25 is not because of a "firmware bug". All SSDs will suffer performance degradation whether or not their writing/wear leveling algorithms have been updated via firmware.

    1. Re:Ugh... summary.... by ShadowRangerRIT · · Score: 4, Informative

      The X25-M's initial firmware was unusually bad; the degradation was more rapid and more severe than necessary. Thus, they issued a firmware update. The results were quite impressive. It not only reduced the perf degradation, but it seems to have made writes faster across the board.

      --
      $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
    2. Re:Ugh... summary.... by Krizdo4 · · Score: 4, Informative

      The performance degradation in the Intel X-25 is not because of a "firmware bug".

      Bugs can cause slowdowns, too

      Though it's highly regarded, Intel's X25-M SSD had a firmware bug that adjusted the priorities of random and sequential writes, leading to a major fragmentation problem that dropped throughput dramatically. The issue was originally uncovered by PC Perspective after two months of testing. Those tests showed that write speeds dropped from 80MB/sec. to 30MB/sec. over time, and read speeds dropped from 250MB/sec. to 60MB/sec. for some large block writes.

      https://www.techworld.com.au/article/302571/ssd_performance_--_slowdown_inevitable?pp=3

      Before firmware update

      the result suggested a write speed of 30 MB/sec.

      http://pcper.com/article.php?aid=691&type=expert&pid=3

      After firmware update

      After composing myself, I did the same file copy I had tried earlier. 76 MB/sec.

      http://pcper.com/article.php?aid=691&type=expert&pid=4

      Not a firmware bug?

    3. Re:Ugh... summary.... by blahplusplus · · Score: 2, Informative

      "Although Intel acknowledged that all of its SSDs will suffer from reduced performance because of significant fragmentation, the type of write levels needed to reproduce PC Perspective's results aren't likely for everyday users, whether they're running Windows and Apple's Mac OS X. Even so, it still released the firmware upgrade to slow fragmentation."

    4. Re:Ugh... summary.... by cecom · · Score: 2, Informative

      The X25-M's initial firmware was unusually bad; the degradation was more rapid and more severe than necessary.

      Unusually bad? More severe than necessary? Not really. Even with this supposed degradation, it was ages ahead of any and all competition. What was unusually bad was the complete lack of understanding from all the reviewers who did not grasp the basic principles and fundamental limitations of flash and yet rushed ahead with their articles. Those poor fools expected that the drive should behave like a regular HDD - they weren't prepared for the unavoidable deterioration in performance.

      I expect they will be similarly surprised when some drives stop working, because flash has a very limited number of rewrites. Wear leveling improves the situation, but it just postpones the inevitable. For example, if the drive is full to capacity and you start rewriting a single sector at full speed, you will get to the 10000 rewrite limit relatively quickly.

    5. Re:Ugh... summary.... by cecom · · Score: 2, Informative

      Don't answer with generalities unless you have really thought about it. Wear-leveling is based on heuristics; since it cannot predict the future it is always possible to construct scenarios which will hit the worst case. And if it is theoretically possible, it will happen.

      Imagine a simple case and go from there. Imagine a flash with 5 blocks total, 4 sectors per block. The logical capacity is 16 sectors; the extra block is over-provisioned for wear leveling, etc. Now, imagine that you have the 4 blocks neatly filled with occupied sectors and the 5th block is erased.

      What happens if you want to write to a random sector? The sector is written in the erased space in the 5th block and its physical position is updated in the map. If you repeat that operation 3 more times, the 5th block will get filled with 4 used sectors, and each of the other 4 blocks will have one invalid sector on average. So far so good.

      What happens if you want to rewrite a random sector now, though? Tough luck. You need to erase a whole block, pack all valid sectors in it, and write the modified sector.

      From now on you get one erase per sector write. Not only that, but you get 3 additional writes. That is called write amplification and is unavoidable in the worst case.

      Now, tell me, how will wear leveling have helped this? Wear leveling works well only when there is plenty of free space. And even then it is possible to construct artificial bad scenarios.
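
      The toy case above can be sketched in a few lines of Python. This is only an illustration of the 5-block, 4-sectors-per-block example, not how any real controller works: the class name ToyFlash, the greedy "fewest valid sectors" garbage collection and the lowest-index tie-break are all assumptions made for the sketch, and there is deliberately no wear leveling.

```python
SECTORS_PER_BLOCK = 4
NUM_BLOCKS = 5
LOGICAL_SECTORS = 16  # one block's worth of space is over-provisioned


class ToyFlash:
    """Hypothetical log-structured flash: sectors are never overwritten in place."""

    def __init__(self):
        # Each block is a list of programmed slots. A slot holds a logical
        # sector number, or None once that copy has been invalidated.
        self.blocks = [[] for _ in range(NUM_BLOCKS)]
        self.location = {}                    # logical sector -> (block, slot)
        self.active = 0                       # block currently receiving writes
        self.erase_counts = [0] * NUM_BLOCKS
        self.flash_writes = 0                 # sector programs, GC copies included
        for sector in range(LOGICAL_SECTORS):
            self._program(sector)             # fill blocks 0-3, leave block 4 erased
        self.flash_writes = 0                 # count only what happens after the fill

    @property
    def erases(self):
        return sum(self.erase_counts)

    def _valid_count(self, index):
        return sum(slot is not None for slot in self.blocks[index])

    def _pick_new_active_block(self):
        for index, block in enumerate(self.blocks):
            if not block:                     # an erased block is still available
                self.active = index
                return
        # No erased block left: erase the block with the fewest valid sectors
        # and pack its surviving sectors back into it.
        victim = min(range(NUM_BLOCKS), key=self._valid_count)
        survivors = [s for s in self.blocks[victim] if s is not None]
        self.blocks[victim] = []
        self.erase_counts[victim] += 1
        self.active = victim
        for sector in survivors:
            self._program(sector)

    def _program(self, sector):
        if len(self.blocks[self.active]) == SECTORS_PER_BLOCK:
            self._pick_new_active_block()
        block = self.blocks[self.active]
        block.append(sector)
        self.location[sector] = (self.active, len(block) - 1)
        self.flash_writes += 1

    def write(self, sector):
        """Host rewrites one logical sector."""
        block_index, slot = self.location[sector]
        self.blocks[block_index][slot] = None  # the old copy becomes garbage
        self._program(sector)


flash = ToyFlash()
for sector in (0, 4, 8, 12):   # one rewrite per full block: fills the spare block
    flash.write(sector)
print(flash.erases, flash.flash_writes)        # no erase needed yet

for _ in range(100):           # now hammer a single sector on the full device
    flash.write(0)
print(flash.erases, flash.flash_writes, flash.erase_counts)
```

      In this sketch the first four rewrites cost no erases, and every rewrite after that costs one erase plus four sector programs (three copies and the new data). Because the sketch has no wear leveling, all of those erases land on the same block; a real controller would rotate the victim, which spreads the erases out but does not reduce how many there are.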

  3. Re:I find this disturbing by jtownatpunk.net · · Score: 5, Insightful

    Future? You must be new to computers. I updated the firmware in my very first 80's printer to give it more features. Had to pop out the old chips and put in the new ones. I upgraded the firmware in modems from several different manufacturers (some more than once) to add features and fix bugs. I've updated the firmware (BIOS) on most of my motherboards. I've updated the firmware on optical drives. I've updated the firmware on a scanner. I've updated the firmware on SCSI controllers. I've updated the firmware on hard drives. I've updated the firmware on switches and routers. Hell, I've updated the firmware on keyboards.

    This is hardly a new phenomenon.

  4. Re:BIOS password on a disk? by ShadowRangerRIT · · Score: 3, Informative

    They probably meant a hard disk password. Depending on implementation, this means either full-disk encryption supported by the disk itself, or a simple firmware interlock that prevents reading through the controller without the password (but could be bypassed with forensic tools that read the disk surface directly).

    --
    $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
  5. Feature Not A Bug by mrbene · · Score: 5, Insightful

    Seriously, I'd say this is in the By Design bucket. For the security conscious - set a BIOS password. If the (feds/aliens/wife/others) remove the password, all access to the data is gone.

    Brilliant! Secure!

    Mind you, not being able to change my password once every other day might hinder my current security model.

  6. Re:Well.. by ShadowRangerRIT · · Score: 2, Insightful

    Not really. Making an educated guess from the article, it appears that this is implemented as a simple controller lockout, not actual encryption. So swapping the flash memory into another controller (common computer forensics technique) would bypass it. Most people paranoid enough to want a disk password want real encryption, so using Intel's half-measure of a password is likely a very uncommon scenario. The tests are probably very simple; glossing over this case would be an understandable, if not desirable, oversight.

    --
    $_ = "wftedskaebjgdpjgidbsmnjgcdwatb"; tr/a-z/oh, turtleneck Phrase Jar!/; print
  7. Non-destructive fw update coming + rave on G2 by owlstead · · Score: 2, Informative

    Although this bug should have been caught faster, it seems that it is possible to update the firmware without any data loss (fortunately I have put it in a laptop, so power outages are no problem). I've looked at the Intel site and the flash utility seems to be simply bootable from CD - if this is the last bug I'll be a very happy punter indeed.

    My 80 GB G2 SSD replaced a not too fast laptop drive. I'm now trying Linux, but I'll try Vista as well just for fun - I'll just write my 80 GB to an external drive using Gparted. These drives come highly recommended even if they would slow down to 50% of performance (which, it seems, they don't). I unzipped Eclipse and JavaDoc to it and I could see that the archiver that unzipped the .zip has some performance issues reading the index. It took longer than the unzipping and gunzipping and untarring (the Eclipse gunzipping/untarring took less than 2 seconds - yikes). The only thing faster is the tmpfs in RAM which I used to compile the OpenJDK in on my "workstation". Starting Eclipse now takes less time on my laptop than on my workstation, even though the laptop has half as many cycles.

  8. Re:Well.. by rickb928 · · Score: 4, Interesting

    Is this a cost issue, or a thoroughness issue?

    No, we don't catch every possible scenario here, either, but we do try very, very hard. I know one of the coders in Intel's RAID drivers group, and he goes crazy with this stuff. And he just writes Linux drivers. I do not envy him - this past year, every bug he's had to fix has been caused by someone else's code. Someone not writing Intel drivers. And he gets slammed every time for bad testing, as if he can test all the rest of the kernel team's stuff, not to mention every fly-by-night Chinese hardware outfit. They're killing him.

    I can't even say 'ext4', he just goes insane. Though he chuckles when I whisper 'ReiserFS', and opens another beer.

    I'm glad I'm not in that line of work.

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  9. Re:Typical redditor by Anonymous Coward · · Score: 3, Insightful

    Yes, they do.

    C doesn't have voltage or current leaks.

  10. Next "Ask Slashdot"... by neokushan · · Score: 3, Funny

    "How to recover lost/corrupted files from an SSD?"

    --
    +1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
  11. Re:I've seen this before by Grishnakh · · Score: 4, Insightful

    Why bother though? If someone breaks in, you'll have to fix or replace your front door, even though the motion-detecting laser robots zapped him. If you just leave your front door unlocked instead, an intruder can just walk in, and the laser-wielding robots can zap him and then automatically dispose of the body for you too. This way, the intruder won't cause any damage.

  12. Re:Typical redditor by Movi · · Score: 4, Insightful

    Because suddenly your code becomes time-based, e.g. it matters WHEN x=0 becomes x=1, and what's in between.

    Believe me, this kicks you in the balls really hard. I still remember the frustration on my Altera course, where in simulation everything worked fine, but once flashed onto an FPGA everything went to shit.

  13. Re:I find this disturbing by SBrach · · Score: 2, Interesting

    Dell has released updated firmware for my laptop's BIOS 17 times.

  14. Re:I've seen this before by NotQuiteReal · · Score: 4, Funny

    To keep out the innocent neighbor kids or the maid who comes on the wrong day. You only want to dispose of bodies that deserve it.

    You'll sleep better that way.

    --
    This issue is a bit more complicated than you think.
  15. Re:I've seen this before by Grishnakh · · Score: 3, Funny

    The maid I can understand, but if your neighbor's kids are anything like mine, they're not innocent.

  16. Re:I find this disturbing by couchslug · · Score: 2, Informative

    Aircraft (F-16 among others) flight control firmware has been updated by reprogramming UVPROMs for many years.

    --
    "This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
  17. Solid State Disk Revolution by JakFrost · · Score: 3, Insightful

    This seems very unlikely to affect most users. In my experience, personal and professional, I have yet to see anyone who even knows about BIOS passwords, much less anyone who sets a password on the drive using the ATA secure drive password feature. I am surprised that this was even caught, unless it was a complete fluke or there actually are people or companies using this type of feature for security. (I don't doubt it, but I haven't seen it.)

    I personally own the first-generation Intel X25-M 80GB MLC SSD and I have written about it extensively here on this forum. I heard rumors that the new TRIM feature support will only be made available for the second-generation drives, but I'm unsure if that is really true. I'm on the fence right now about whether I should sell my G1 drive and upgrade to the G2 for this feature and a little more performance, because I am so happy with the performance of this drive and with the current 8820 firmware that solved the fragmentation and slowdown issues.

    If you are one of those folks who is still sitting around not knowing what to do when all of this Solid State Disk news is coming out all over then you are missing the biggest paradigm shift to computing performance since the transfer from floppy disks to hard drives.

    With the re-release of this newly affordable drive, the Intel X25-M G2 80GB MLC SSD, expected around 2009-08-28 at ~$230 USD from Newegg or ZipZoomFly, you should definitely dig deep and save a little money to buy one and experience the biggest performance and responsiveness improvement to your computer that you could imagine.

    If you need a primer on the SSD revolution check out my previous post regarding the articles to read.

    Required Reading for Solid State Drives (Score 1)

  18. What took them so long to report this? by AllynM · · Score: 4, Informative

    Welcome to 2 weeks ago:

    http://www.pcper.com/comments.php?nid=7544

    Allyn Malventano
    Storage Editor, PC Perspective

    --
    this sig was brought to you by the letter /.
  19. Re:Typical redditor by atmurray · · Score: 2, Interesting

    So? It's just a set of different paradigms. It's just like using a different programming language. 99.9% of the time, if your code works during functional verification testing (which doesn't simulate the physics of hardware), it will work fine in timing/hardware verification and then also in real hardware (so long as you don't violate any timing constraints, which your synthesis tool will tell you about). That's one of the reasons why RTL synthesis tools like those from Cadence are so insanely expensive: they allow you to go from functional verification, which verifies the syntax and semantics of your code, to hardware verification, which allows you to ensure your design will work as expected in actual hardware. If you're getting "kicked in the balls really hard" then it's probably because you need to brush up on your VHDL/Verilog, just like if you're getting segfaults when writing C you're doing something wrong. It doesn't mean that the process is any less deterministic.

  20. Re:Typical redditor by NP-Incomplete · · Score: 4, Informative

    On a chip, adding 2^256-1 and 1 may not equal 2^256 when:
    1. Your destination register is 256 bits.
    2. Your destination register is in a different clock domain.
    3. Your timing constraints are wrong.
    4. Your power grid cannot support switching 256 registers.

    Functional simulations will only catch #1.
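
    Case #1 is the only one a purely functional model can show, and it fits in a few lines of Python. The helper name add_in_register and its width parameter are made up for this sketch; it simply truncates the sum to the register width, the way a fixed-width adder drops the carry out of the top bit.

```python
def add_in_register(a, b, width=256):
    """Add two unsigned values and keep only the bits that fit in the register."""
    mask = (1 << width) - 1
    return (a + b) & mask


all_ones = (1 << 256) - 1                        # 2^256 - 1

# With a 256-bit destination the carry is lost and the result wraps to zero.
print(add_in_register(all_ones, 1))

# With one more bit in the destination the sum really is 2^256.
print(add_in_register(all_ones, 1, width=257) == 1 << 256)
```

    Cases #2 to #4 depend on clocks, timing and power delivery, none of which exist in a model like this one, which is why a functional simulation cannot see them.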

  21. Re:Typical redditor by Obfuscant · · Score: 3, Insightful

    ... just like if you're getting segfaults when writing C you're doing something wrong. It doesn't mean that the process is any less deterministic.

    If you are getting segfaults in C you usually ASSUME that the processor you are running on is acting in a deterministic manner and ASSUME the problem is your code.

    The DIFFERENCE is that SOMETIMES the underlying hardware is not acting deterministically because it is a PHYSICAL system that has physical flaws or imperfections. Like leakage currents that are JUST a tiny bit too much, or depend on the state of the neighboring circuit or the temperature.

    In other words, I've written C code that had "segfaults" and it wasn't the fault of the C code, it was memory issues that resulted in problems. And I've written C code that suffered from a buggy compiler, too. I've also written code that "misread" about 1% of the characters typed in at the terminal, and it wasn't the code that was at fault, it was the UART.

    I don't know anything about the source of Intel's problem, but I will say that they can send me ALL of the "defective" SSDs and I'll give them a home where I promise never to set a password on the disk or change it after I do.

  22. Re:Too early to adopt by magarity · · Score: 3, Insightful

    What makes Intel a hard disk vendor anyway? Yes, it is still a disk
     
    It's solid state mass storage, where "solid state" = "chips". A disk is a spinning thingy which is completely different. Since Intel designs and makes chips (see: "solid state" = "chips"), it is a perfect choice for them to make solid state mass storage devices out of chips.
     
    Have I mentioned the relationship between "solid state" and "chips" and how "solid state" != "spinning thingy"?

  23. Re:I find this disturbing by Ungrounded+Lightning · · Score: 3, Interesting

    I remember updating the HARDWARE of my modem: Changing the swamping resistors to reduce the Q of the filters and broaden the passbands so the Rx side would work at 300 as well as the original 110 baud. B-)

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  24. I've seen this bug before, sorta. by Allnighterking · · Score: 2, Interesting

    I've seen this before, though I can't remember where. In that case, changing or removing the password would corrupt the password file and lock you out. The first time (no password exists yet, so the original is being set), it does the following:
    • read the password
    • hash the password
    • write the hash to the data file

    Now the problem came in that case when you wanted to change/delete the password. It would use a second subroutine to do the following:

    • read the old password
    • get the old password hash and use it to check if the user knows the correct password
    • get new password (twice and compare)
    • hash the result of the diff of the first entry and the second entry for the new password

    That last step was the killer. It seems that someone had declared a global variable and a local variable with the same name. End result: one overwrote the other's data, and one never knew exactly what the box hashed, nor could you figure out what to key into the screen to unlock the door (so to speak).
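
    A same-name global/local mix-up of that kind is easy to reproduce. The Python sketch below is a hypothetical reconstruction, not the code from the product I saw: the names entry, stored_hash, unlock and the two change_password functions are invented for illustration, SHA-256 stands in for whatever hash was really used, and the "diff" step is reduced to a plain equality check on the two new entries.

```python
import hashlib

entry = ""         # global scratch buffer shared by the routines below
stored_hash = ""   # stands in for the hash kept in the data file


def hash_entry():
    """Hash whatever is currently in the global buffer."""
    return hashlib.sha256(entry.encode()).hexdigest()


def set_password(password):
    global entry, stored_hash
    entry = password
    stored_hash = hash_entry()


def unlock(password):
    global entry
    entry = password
    return hash_entry() == stored_hash


def change_password_buggy(old, new, new_again):
    global stored_hash
    if not unlock(old) or new != new_again:
        return False
    entry = new                  # BUG: creates a local that shadows the global
    stored_hash = hash_entry()   # hashes the global, which still holds `old`
    return True


def change_password_fixed(old, new, new_again):
    global entry, stored_hash
    if not unlock(old) or new != new_again:
        return False
    entry = new                  # writes the shared buffer this time
    stored_hash = hash_entry()
    return True


set_password("alpha")
print(change_password_buggy("alpha", "beta", "beta"))  # reports success
print(unlock("beta"))                                   # but the new password fails

set_password("alpha")
print(change_password_fixed("alpha", "beta", "beta"))
print(unlock("beta"))
```

    In this sketch the stale buffer happens to hold the old password, so the old password would still open the door. In the case I saw, the buffer held something nobody could reconstruct.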

    --

    I'm sorry, I'm too tired to be witty at the moment so this message will have to do.

  25. Re:Too early to adopt by magarity · · Score: 2, Insightful

    I'm not even going to put a foot in the flamefest over whether solid state mass storage is cost effective or even reliable - I only ask you don't call some chips that just sit there a spinning disk.
     
    More than 1/4 of Intel's revenue comes from miscellaneous chips and motherboards that are not microprocessors. That's a big enough chunk it shouldn't be dismissed as not a core business.
     
    That this bug made it through means someone should be looking for employment and indicates a problem with management and internal processes, not that they shouldn't make the product in the first place.

  26. Insufficient testing? by Waccoon · · Score: 2, Interesting

    Ask anyone who bought a JMicron-based SSD about insufficient testing. How any company thought that controller was worthy for their SSDs is beyond me.

    Before I replaced mine with a Samsung SSD, my [censored] was regularly giving me stutters and pauses that lasted for 20-40 seconds at a time. It just flat-out halted everything on the computer for half a minute for no apparent reason, even while reading, not just writing. Apparently, this was predominant behavior for the controller that dominated the SSD arena until the X-25 started blowing people away.

    I think I understand now why Seagate, WD, and the other HD manufacturers are taking so long to get SSDs on the market. Since their market depends almost exclusively on storage, they can't afford to screw up their first SSDs. At least, I hope that's the reason. Even they have to understand that the hard drive market isn't going to last forever.

  27. Re:I find this disturbing by TheRaven64 · · Score: 2, Informative

    The FDIV bug wasn't fixed in firmware. There was a microcode update that worked around the problem, but it made division painfully slow. Intel's 'fix' was to recall all of the affected chips and provide replacements. It cost the company a lot of money and the story became the introduction to Andy Grove's biography.

    --
    I am TheRaven on Soylent News