Intel Confirms Data Corruption Bug, Halts New SSDs
CWmike writes "Intel has confirmed that its new consumer-class X25-M and X18-M solid-state disk drives (SSDs) suffer from data corruption issues and said it has pulled back shipments to resellers. The X25-M (2.5-inch) and X18-M (1.8-inch) SSDs are based on a joint venture with Micron and use that company's 34-nanometer lithography technology. That process allows for a denser, higher-capacity product that brings with it a lower price tag than Intel's previous offerings, which were based on 50-nanometer lithography technology. Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer. When that happens, the SSD becomes inoperable and the data on it is irretrievable. This is not the first time Intel's X25-M and X18-M SSDs have suffered from firmware bugs. The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time. Intel issued a firmware upgrade as a fix."
Maybe they should have used HW/SW co-verification (like Seagate in that study - an example of how a storage company tests their firmware).
For you software developers out there who enjoy free debuggers, you should know that we hardware designers also have our own debuggers. Except they are a little bit more expensive (think $500,000+) and can be quite bulky. But they are the only way to really test firmware before taping out a chip.
"The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time."
The performance degradation in the Intel X25-M is not because of a "firmware bug". All SSDs will suffer performance degradation whether or not their writing/wear leveling algorithms have been updated via firmware.
I find it difficult to really blame them for this. What an obscure bug. How do you QA yourself out of something like that without spending more than you did on your R&D?
What the hell is that supposed to mean? Data structures and algorithms don't suddenly work differently when they're synthesized from Verilog instead of compiled from C.
Future? You must be new to computers. I updated the firmware in my very first 80's printer to give it more features. Had to pop out the old chips and put in the new ones. I upgraded the firmware in modems from several different manufacturers (some more than once) to add features and fix bugs. I've updated the firmware (BIOS) on most of my motherboards. I've updated the firmware on optical drives. I've updated the firmware on a scanner. I've updated the firmware on SCSI controllers. I've updated the firmware on hard drives. I've updated the firmware on switches and routers. Hell, I've updated the firmware on keyboards.
This is hardly a new phenomenon.
Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer.
What does this mean? The flash drive has a password lockout? If so:
(1) a password lockout on a drive is daft, you want to encrypt the drive or not worry about it.
(2) flash drives trashing themselves irretrievably when you reboot after enabling passwords? I've seen that before, on "secure" thumb drives. I won't have anything to do with that kind of hardware lockout or encryption after that.
No, we're looking at a past like that. Lest you forget, both the 486 and the Pentium had firmware updates too (the Pentium FDIV bug being the better remembered of the two.) My first firmware update was a bugfix in a 300 baud acoustic coupler, way back in 1983 or thereabouts.
Can't imagine why you think this is anything new; even video game consoles have been doing this for ten years now.
StoneCypher is Full of BS
They probably meant a hard disk password. Depending on the implementation, this means either that the disk supports full disk encryption, or that it has a simple firmware interlock that prevents reading through the controller without the password (but that could be bypassed with forensic tools that read the disk surface directly).
Seriously, I'd say this is in the By Design bucket. For the security conscious - set a BIOS password. If the (feds/aliens/wife/others) remove the password, all access to the data is gone.
Brilliant! Secure!
Mind you, not being able to change my password once every other day might hinder my current security model.
Here, let me google that for you: http://lmgtfy.com/?q=hard+disk+password
"the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer". A password protected SSD? Can someone please explain? I must be new to computers...
Although this bug should have been caught faster it seems that it is possible to update the firmware without any data loss (fortunately I have put it in a laptop, power outages are no problem). I've looked at the Intel site and the flash utility seems to be simply bootable from CD - if this is the last bug I'll be a very happy punter indeed.
My 80 GB G2 SSD replaced a not too fast laptop drive. I'm now trying Linux, but I'll try Vista as well just for fun - I'll just write my 80 GB to an external drive using GParted. These drives come highly recommended even if they would slow down to 50% of performance (which, it seems, they don't). I unzipped Eclipse and the JavaDoc to it, and I could see that the archiver that unzipped the .zip has some performance issues reading the index. It took longer than the unzipping and gunzipping and untarring (the Eclipse gunzipping/untarring took less than 2 seconds - yikes). The only thing faster is the tmpfs in RAM which I used to compile the OpenJDK on my "workstation". Starting Eclipse now takes less time on my laptop than on my workstation even though it has half as many cycles.
Yes, they do.
C doesn't have voltage or current leaks.
"Or are you just new around here?"
I would ask the same of you, replying to an obvious troll like that :P
So how do voltage and current leaks invalidate the universal mathematical principles of computer science? I'm beginning to get a whiff of anti-intellectualism here.
"How to recover lost/corrupted files from an SSD?"
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Conservatively, 40% of Seagate's high-capacity (1TB+) drives have suffered from a firmware bug which bricked the drive. Seagate has promised free data recovery + firmware fix on affected units - not many people know this! So if your SATA or external Seagate has failed recently on boot, you may be able to recover the drive and your data free. Customer support is very sketchy but if you keep trying for the free data recovery you will succeed. http://www.engadget.com/2009/01/19/seagate-offers-fix-free-data-recovery-for-disks-affected-by-fir/2
Because suddenly your code becomes time-based, eg it matters WHEN x=0 becomes x=1, and what's in between.
Believe me, this kicks you in the balls really hard. I still remember the frustration on my Altera course, where in simulation everything worked fine, but once flashed onto a FPGA everything went to shit.
It sounds like Signetics WOM (Write Only Memory) to me! http://www.national.com/rap/Story/WOMorigin.html
From my perspective it's actually beginning to be quite common among HW manufacturers to release broken hardware. I've actually had 2 run-ins with a required firmware upgrade to gfx boards (both nvidia):
#1 8800GTX 512MB, which in its video BIOS claimed to only have 256MB. I guess the Windows drivers had their own VRAM enumeration procedure, but this made other drivers hang (OSX - yeah, I know hackintosh is bad - and nouveau). I had to get the vbios from the board, hexedit it (4 offsets), then flash it back. Thankfully all went well and now it's reporting what it should have been in the first place. Why did the card lie about this? I have no clue.
#2 9800GTX 512 - would hang on any driver reload in Windows. I spent DAYS figuring this out, first with WinXP, where finally some older drivers managed to load, then with Windows 7 - multiple builds, multiple versions (x86, x64), BIOS settings, hackery. Finally something irked me: "what if this card is lying too". Went to check for a BIOS update - huzzah, "Fixes windows reload driver hang".
On both of these occurrences, I wouldn't imagine a normal PC user doing these. So I guess releasing broken hardware which is then "fixed" is the norm. Now that I think about it.. AMD Phenom Look-Aside cache bug, countless ATi-Mac firmware, SMC and EFI updates. This is actually common, no?
Hehe.. .bis at some point.. Also remember upgrading TOS ROMs on my ST :D..
I remember updating my modem to support the
I got one to add that I'm still working on:
GTX 285 - hangs with a blue/black screen of death both at idle and in games, although far more frequently at idle. For some people it happens so early and often that an RMA is their only option; for me it happens within 3-5 days of bootup. What I think the problem is: the card is designed to throttle down when it's not being fully utilized, but I suspect the voltage regulators weren't designed to handle this. Even during full utilization, when the BIOS runs at its default profiles, you get massive voltage spikes and drops (I can only monitor the 3.3V sensor voltage for this in RivaTuner, but it appears to affect everything: fan speeds, core/memory clock rates, stability?). So I suspect that after some time the voltage regulators drop the voltage for long enough that there isn't enough to maintain everything in video RAM, which causes the card to hang. The fix, which I'm still testing because circumstances beyond my control haven't allowed me to reach a week of uptime: force the card to run in 3D performance mode in RivaTuner.
It consumes far more power and runs hotter, but I'll take both of these (it'd still be less than falling back to SLI 8800GT's) if the damn thing stays stable. No voltage craziness so far either.
"There is much pleasure to be gained from useless knowledge." - Bertrand Russell.
Dell has released updated firmware for my laptop's BIOS 17 times.
C doesn't have voltage or current leaks.
But C has a lot more loops and pointers, which makes verification a lot harder (I work on a static analysis tool for C/C++, and it's also very expensive ;) )
Aircraft (F-16 among others) flight control firmware has been updated by reprogramming UVPROMs for many years.
"This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
This really seems like a very unlikely trigger for most users. In my experience, personally and professionally, I have yet to see anyone who even knows about BIOS passwords, much less about setting a password on the drive using the ATA secure drive password feature. I am surprised that this was even caught by anyone, unless it was a complete fluke or there actually are people or companies using this type of feature for security. (I don't doubt it but haven't seen it.)
I personally own the first generation Intel X25-M 80GB MLC SSD and I have written about it extensively here on this forum. I heard rumors that the new TRIM feature support will only be made available to this second generation release of these drives, but I'm unsure if that is really true. I'm on the fence right now about whether I should sell my G1 drive and upgrade to the G2 for this feature and a little more performance, because I am so happy with the performance of this drive and with the current 8820 firmware that solved the fragmentation and slowdown issues.
If you are one of those folks who is still sitting around not knowing what to do when all of this Solid State Disk news is coming out all over then you are missing the biggest paradigm shift to computing performance since the transfer from floppy disks to hard drives.
With the upcoming re-release of this newly affordable drive, the Intel X25-M G2 80GB MLC SSD, around 2009-08-28 at ~$230 USD from Newegg or ZipZoomFly, you should definitely dig down deep and save a little money to buy one of these drives and experience the biggest performance and responsiveness improvement to your computer that you could imagine.
If you need a primer on the SSD revolution check out my previous post regarding the articles to read.
Required Reading for Solid State Drives (Score 1)
It is called the bleeding edge for a reason.
Problem is, in the future as hardware is becoming more complicated I think we're going to see more and more issues like this. It seems that it's mostly engineers that end up writing the code at this level, especially when dealing with hardware, and they just can't write software for crap. I have worked with many over the years and there is not one I would consider capable of writing something that needed to be very reliable.
ee
Welcome to 2 weeks ago:
http://www.pcper.com/comments.php?nid=7544
Allyn Malventano
Storage Editor, PC Perspective
So? It's just a set of different paradigms. It's just like using a different programming language. 99.9% of the time, if your code works during functional verification testing (which doesn't simulate the physics of hardware) it will work fine in timing/hardware verification and then also in real hardware (so long as you don't violate any timing constraints, which your synthesis tool will tell you about). That's one of the reasons why RTL synthesis tools like those from Cadence are so insanely expensive: they allow you to go from functional verification, which verifies the syntax and semantics of your code, to hardware verification, which allows you to ensure your design will work as expected in actual hardware. If you're getting "kicked in the balls really hard" then it's probably because you need to brush up on your VHDL/Verilog, just like if you're getting segfaults when writing C you're doing something wrong. It doesn't mean that the process is any less deterministic.
What makes Intel a hard disk vendor anyway? Yes, it is still a disk. Expertise which Intel doesn't have is a huge factor along with software support.
Other alternatives? There are OCZ and Samsung. What kind of software support do they give? Zero. Samsung can't even produce pages without English spelling mistakes.
Call me old fashioned, but I am waiting and will continue to wait until Seagate and Western Digital do real stuff, not "we can do it too" stuff, if you understand what I mean.
Yes, it doesn't work. If you ever tried to design something using Verilog or VHDL, and tried to generate a real-world design, either an FPGA or a real chip, you will see that things aren't so easy.
I learned it the hard way, while doing my last year of undergraduate course. The simulation worked perfectly - correct input, correct output. On the other hand, making it work on the FPGA was a horrible, horrible, horrible job. Took 2 weeks of trying this, trying that, still with no clue.
Although the problem was a small behavior/synthesis mismatch, I found out that this was going to be a horrible job, because you may have bosses thinking just like you, who ask you to complete the implementation job within a few days. The truth is that each synthesis job (equivalent to compiling) takes hours (if not days) to complete, and it is almost certain that it won't run on the first try. Believe me, there is a reason that there is a multi-billion dollar market for designing and verifying chips, where a huge portion of that is verification and debugging.
For firmware, it is a sorta similar state. You have to work around hardware bugs, e.g. you have to avoid calling some instruction that is supposed to work, and did work in simulation, because the processor screws itself up once in every million times that instruction is called. Not calling that instruction may be possible, but identifying the problem gets really dirty.
Now I write simulators and models for simulation, rather than writing HDL code that should end up inside some FPGA or ASIC. I am much happier now, since Intel and AMD did a lot of work to verify and fix their dirty bugs, and I can trust the underlying hardware.
The ones who flame us whenever we say "it is early, don't beta test storage hardware" should come up and answer for this. Especially when it is, predictably, personal memories which have no backup.
In an enterprise environment, which the X25 was originally designed for, data loss is not a huge problem. They have all kinds of backups, verification, mirroring and cool filesystems like ZFS. When it comes to the personal data of an ordinary OS X or Windows user, the problem begins. Whenever they suggest an untested technology to ordinary people, they should leave a phone number or working mail address to get called when 1000s of unreproducible personal jpegs are gone forever.
At least they're not like some companies that ignore that there is a (tiny tiny tiny) problem and just gag its customers.
Functional simulations will only catch #1.
If you are getting segfaults in C you usually ASSUME that the processor you are running on is acting in a deterministic manner and ASSUME the problem is your code.
The DIFFERENCE is that SOMETIMES the underlying hardware is not acting deterministically because it is a PHYSICAL system that has physical flaws or imperfections. Like leakage currents that are JUST a tiny bit too much, or depend on the state of the neighboring circuit or the temperature.
In other words, I've written C code that had "segfaults" and it wasn't the fault of the C code, it was memory issues that resulted in problems. And I've written C code that suffered from a buggy compiler, too. I've also written code that "misread" about 1% of the characters typed in at the terminal, and it wasn't the code that was at fault, it was the UART.
I don't know anything about the source of Intel's problem, but I will say that they can send me ALL of the "defective" SSDs and I'll give them a home where I promise never to set a password on the disk or change it after I do.
Normally, a good, supported modern device will eventually have bugs fixed with a firmware update. Companies can't really test millions of different configurations, usage patterns or a "one in a million" issue. Some companies like Apple have gone beyond that and even ship "double click in GUI" firmware updates. Of course, it is all fail safe.
I always pick hardware which *does have* firmware updates on site, with good documentation and release notes. For example, Lacie keeps updating their firewire and more advanced drives. Not because they can't be used without updating, it is because some engineers find some little issues which could be problem in rare cases or operating system issues, performance enhancements etc.
One thing, of course: always read the documentation, and apply a firmware update if it will benefit you, especially where BIOS updates are concerned.
I remember updating the HARDWARE of my modem: Changing the swamping resistors to reduce the Q of the filters and broaden the passbands so the Rx side would work at 300 as well as the original 110 baud. B-)
I guess in hardware static analysis is easier, and dynamic analysis is harder.
The link I saw seemed to indicate static analysis.
That would be why the ATA standard requires the data to be encrypted with AES, so removing the physical flash chips and attempting to read them would do no good without the encryption key and the data would only be in 512 byte blocks with some ECC code and with an unknown physical to logical mapping. Good luck on decrypting and reconstructing the contents of a 160GB drive 512 bytes at a time with an unknown and complex type of error checking code.
I remember when firmware updates meant baking your chips in UV light and then plugging them into something you soldered on a perf-board and connected to the parallel port.
If you did happen to lose all your data because of this one particular bug, then you have no one to blame but yourself. Storage fails, A LOT. Plan accordingly.
Good-bye
Now the problem came in that case when you wanted to change/delete the password. It would use a second subroutine to do so.
That last step was the killer: it seems that someone had declared a global variable and a local variable with the same name. End result: one overwrote the other's data, and one never knew exactly what the box hashed, nor could you figure out what to key in to the screen to unlock the door (so to speak).
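A rough sketch of that family of mistake, in Python rather than whatever the box actually ran. Every name here (`salt`, `stored_hash`, `change_password_buggy`) and the SHA-256 stand-in are assumptions made up for illustration, not a reconstruction of the real code. The change routine declares a local with the same name as the shared global, so it hashes with one value while the unlock routine checks with another:

```python
import hashlib

# Hypothetical shared state; the names and hashing scheme are illustrative only.
salt = "factory-salt"   # module-level value every routine is meant to share
stored_hash = ""        # what the "box" compares against at unlock time


def _digest(password: str, salt_value: str) -> str:
    # Stand-in hash; the scheme used by the real device is unknown.
    return hashlib.sha256((salt_value + password).encode()).hexdigest()


def set_password(password: str) -> None:
    global stored_hash
    stored_hash = _digest(password, salt)


def change_password_buggy(new_password: str) -> None:
    global stored_hash
    # BUG: this local has the same name as the module-level `salt` and hides it,
    # so the new hash is computed with a value that unlock() never sees.
    salt = "scratch"
    stored_hash = _digest(new_password, salt)


def unlock(password: str) -> bool:
    return stored_hash == _digest(password, salt)


set_password("old")
change_password_buggy("new")
new_works = unlock("new")
old_works = unlock("old")
print(new_works, old_works)  # False False: neither password opens the box
```

Deleting the local assignment in `change_password_buggy` (so the routine reads the module-level `salt`) makes the new password work again, which is why this kind of bug only shows up on the change/delete path.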
This news is days and days old, very old.
Anyone who cares knows about this, we've long since known! What we want to know now is when is the patch coming out, for existing owners and when will the god damned disks be going back on shelves?
There is going to be even more demand for the things, as soon as they are re-listed, prices are going to skyrocket at the retailers.
Also, on this note, it's August 4th where I am right now, and Windows 7 is available within about 72 hours internationally for certain MSDN subscribers, so Intel, where the hell is the TRIM firmware support? Why even bother releasing this new drive about 2 weeks ago, then recall it, patch the firmware for a BIOS password bug, only to re-patch the firmware to add TRIM support?
Surely it should be done, fixed, tested by now, Windows 7 is RTM and the beta has been available for near 7 months for end users, let alone you guys - get with it and get the TRIM firmware out NOW! this idiocy is hampering SSD adoption.
Just showed that your simulation was poor. Try using the AFTER command in VHDL to generate more precise simulations.
On a chip, adding 2^256-1 and 1 may not equal 2^256 when:
5. You're using an original Pentium
(cheap shot, but since it's an intel story...)
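For anyone who wants to poke at item 5, the widely circulated FDIV check fits in a few lines. This is only a sketch in Python: the operands are the well-known test pair, while the tolerance and the variable names are my own choices.

```python
# Operands from the widely circulated Pentium FDIV check.
x = 4195835.0
y = 3145727.0

# Mathematically this is zero. A correct FPU leaves at most a tiny rounding
# residue; the flawed Pentium was widely reported to produce 256 here.
residual = x - (x / y) * y

# Tolerance is an arbitrary choice, far above rounding noise and far below 256.
fdiv_ok = abs(residual) < 1.0
print("FDIV looks sane" if fdiv_ok else "FDIV looks broken")
```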
Ask anyone who bought a JMicron-based SSD about insufficient testing. How any company thought that controller was worthy for their SSDs is beyond me.
Before I replaced mine with a Samsung SSD, my [censored] was regularly giving me stutters and pauses that lasted for 20-40 seconds at a time. It just flat-out halted everything on the computer for half a minute for no apparent reason, even while reading, not just writing. Apparently, this was the predominant behavior of the controller that dominated the SSD arena until the X25 started blowing people away.
I think I understand now why Seagate, WD, and the other HD manufacturers are taking so long to get SSDs on the market. Since their market depends almost exclusively on storage, they can't afford to screw up their first SSDs. At least, I hope that's the reason. Even they have to understand that the hard drive market isn't going to last forever.
confirm the data errors in my Phison SSD, but the thing's been booting since somewhere around mid 2008.
Static analysis of concurrent code quickly hits problems with combinatorial explosion. I was recently at a seminar given by someone working on formal verification of MPI code and the number of possible states that a brute-force approach gives for trivial MPI programs is so huge that it's not feasible to reason about them. You can apply some heuristics with MPI that get this down to a manageable number, but that's generally not possible with hardware because the number of interactions isn't limited by a mediating layer.
Verifying hardware is hard. The only reason it works at all is that hardware designers go out of their way to isolate various components and that hardware, generally, has much simpler requirements than software.
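To put a number on the combinatorial explosion: a back-of-the-envelope sketch in Python, assuming fully independent processes that each execute a fixed sequence of atomic steps (a simplification chosen for illustration, not how any particular MPI verifier models things). The count of distinct interleavings is the multinomial coefficient (n*k)! / (k!)^n.

```python
from math import factorial


def interleavings(processes: int, steps: int) -> int:
    """Count the orderings of `processes` independent sequences of `steps`
    atomic steps each, where each sequence keeps its own internal order."""
    return factorial(processes * steps) // factorial(steps) ** processes


# Even tiny configurations blow up quickly.
for p, s in [(2, 3), (3, 5), (4, 10)]:
    print(f"{p} processes x {s} steps: {interleavings(p, s)} interleavings")
```

Two processes of three steps each already give 20 orderings, and the count grows far faster than exponentially from there, which is why brute-force enumeration stops being an option almost immediately.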
I am TheRaven on Soylent News
'Did you apply the 0.23b firmware update? That problem's fixed in the latest betas...'
The FDIV bug wasn't fixed in firmware. There was a microcode update that worked around the problem, but it made division painfully slow. Intel's 'fix' was to recall all of the affected chips and provide replacements. It cost the company a lot of money and the story became the introduction to Andy Grove's biography.
Doesn't Java have leaks?
Leakage currents, neighbouring circuit interference and temperature are all able to be modelled (again, this is why Cadence et al. are so expensive), plus hardware engineers worth their salt put in sensible tolerances for all these values. My point was, hardware design is about as deterministic as software engineering. Sure, if you break the silicon or run it out of spec it stops doing what you designed it to do, but so does software.
OK, so you've provided an edge case where a complex system would exhibit undocumented behaviour that the software engineers weren't aware of. What part of what I said is therefore wrong? Just because things happen that aren't documented/expected doesn't make them non-deterministic. If you want me to clarify what I said to the point of nitpicking, fine. Digital hardware designers don't generally concern themselves with the analog behaviour of the underlying technology. Why? Because their lives are hard enough as it is dealing with digital stuff which they presume to be deterministic. Digital guys try to make the algorithm or code as simple as is practicable to minimise space whilst maximising speed. Hardware just wouldn't be made if digital guys had to worry about non-deterministic effects of every latch and logic element in a design containing millions of such elements. Hell, digital hardware guys these days don't generally concern themselves with RTL, that's why we have languages like VHDL and Verilog.