Intel Confirms Data Corruption Bug, Halts New SSDs
CWmike writes "Intel has confirmed that its new consumer-class X25-M and X18-M solid-state disk drives (SSDs) suffer from data corruption issues and said it has pulled back shipments to resellers. The X25-M (2.5-inch) and X18-M (1.8-inch) SSDs are based on a joint venture with Micron and use that company's 34-nanometer lithography technology. That process allows for a denser, higher capacity product that brings with it a lower price tag than Intel's previous offerings, which were based on 50-nanometer lithography technology. Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer. When that happens, the SSD becomes inoperable and the data on it is irretrievable. This is not the first time Intel's X25-M and X18-M SSDs have suffered from firmware bugs. The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time. Intel issued a firmware upgrade as a fix."
Maybe they should have used HW/SW co-verification (like Seagate in that study - an example of how a storage company tests their firmware).
For you software developers out there who enjoy free debuggers, you should know that we, hardware designers, also have our own debuggers. Except they are a little bit more expensive (think $500,000+) and can be quite bulky. But they are the only way to really test firmware before taping-out a chip.
"The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time."
The performance degradation in the Intel X25-M is not because of a "firmware bug". All SSDs will suffer performance degradation whether or not their writing/wear leveling algorithms have been updated via firmware.
Drivers and Firmware are Intel's biggest weakness. A major possible showstopper for Larrabee. This is just another example on top of the years of historical failures (e.g., all Intel IGPs which had appalling drivers, or late drivers - up to a year to deliver promised features).
Anyway, corruption bugs on storage are a product killer in the marketplace.
To Intel's credit though, unlike Seagate, at least they are admitting there's a problem.
I find it difficult to really blame them for this. What an obscure bug. How do you QA yourself out of something like that without spending more than you did on your R&D?
What the hell is that supposed to mean? Data structures and algorithms don't suddenly work differently when they're synthesized from Verilog instead of compiled from C.
Forgive me if this is a really dumb question... But how do you BIOS password a disk?
BIOS passwords are for preventing the computer from booting or locking users out of the BIOS and have no impact on the disks in the system, no?
Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer.
What does this mean? The flash drive has a password lockout? If so:
(1) a password lockout on a drive is daft, you want to encrypt the drive or not worry about it.
(2) flash drives trashing themselves irretrievably when you reboot after enabling passwords? I've seen that before, on "secure" thumb drives. I won't have anything to do with that kind of hardware lockout or encryption after that.
This is the land of nerds--News for nerds. Stuff that matters.
Or are you just new around here?
Seriously, I'd say this is in the By Design bucket. For the security conscious - set a BIOS password. If the (feds/aliens/wife/others) remove the password, all access to the data is gone.
Brilliant! Secure!
Mind you, not being able to change my password once every other day might hinder my current security model.
"the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer". A password protected SSD? Can someone please explain? I must be new to computers...
Although this bug should have been caught faster it seems that it is possible to update the firmware without any data loss (fortunately I have put it in a laptop, power outages are no problem). I've looked at the Intel site and the flash utility seems to be simply bootable from CD - if this is the last bug I'll be a very happy punter indeed.
My 80 GB G2 SSD replaced a not too fast laptop drive. I'm now trying Linux, but I'll try Vista as well just for fun - I'll just write my 80 GB to an external drive using Gparted. These drives come highly recommended even if they would slow down to 50% of performance (which, it seems, they don't). I unzipped Eclipse and JavaDoc to it and I could see that the archiver that unzipped the .zip has some performance issues reading the index. It took longer than the unzipping and gunzipping and untarring (the Eclipse gunzipping/untarring took less than 2 seconds - yikes). The only thing faster is the tmpfs in RAM which I used to compile the OpenJDK in on my "workstation". Starting Eclipse now takes less time on my laptop than on my workstation even though it has half as many cycles.
Yes, they do.
C doesn't have voltage or current leaks.
"Or are you just new around here?"
I would ask the same of you, replying to an obvious troll like that :P
So how do voltage and current leaks invalidate the universal mathematical principles of computer science? I'm beginning to get a whiff of anti-intellectualism here.
"How to recover lost/corrupted files from an SSD?"
+1 IDisagreeSoHeMustBeATrollOrAnAstroturferOrAShill
Conservatively, 40% of Seagate's high-capacity (1TB+) drives have suffered from a firmware bug which bricked the drive. Seagate has promised free data recovery + firmware fix on affected units - not many people know this! So if your SATA or external Seagate has failed recently on boot, you may be able to recover the drive and your data free. Customer support is very sketchy but if you keep trying for the free data recovery you will succeed. http://www.engadget.com/2009/01/19/seagate-offers-fix-free-data-recovery-for-disks-affected-by-fir/2
Because suddenly your code becomes time-based, e.g. it matters WHEN x=0 becomes x=1, and what's in between.
Believe me, this kicks you in the balls really hard. I still remember the frustration on my Altera course, where in simulation everything worked fine, but once flashed onto a FPGA everything went to shit.
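To make the "it matters WHEN" point concrete, here is a minimal sketch in Python (not HDL, and not anybody's real design flow). It assumes a made-up circuit, out = a AND (NOT a), and a made-up `inverter_delay` measured in whole time steps. With zero delay the output is always 0, which is what a purely functional simulation shows; with one step of inverter delay a rising edge on `a` leaks through as a one-step glitch.

```python
# Toy discrete-time model (illustrative assumption only):
# out = a AND (NOT a). Logically this is always 0, but if the inverter
# needs one time step to react, a rising edge on `a` slips through briefly.

def simulate(a_wave, inverter_delay):
    """Return the `out` waveform for out = a AND delayed(NOT a)."""
    out = []
    for t, a in enumerate(a_wave):
        # The inverter output still reflects the input from
        # `inverter_delay` steps ago (or the initial value before that).
        src = t - inverter_delay
        a_seen_by_inverter = a_wave[src] if src >= 0 else a_wave[0]
        not_a = 1 - a_seen_by_inverter
        out.append(a & not_a)
    return out

a_wave = [0, 0, 1, 1, 1, 0, 0]  # a goes 0 -> 1 at t=2 and back to 0 at t=5

ideal = simulate(a_wave, inverter_delay=0)  # zero-delay "functional" view
timed = simulate(a_wave, inverter_delay=1)  # one step of gate delay

print("ideal:", ideal)  # [0, 0, 0, 0, 0, 0, 0]
print("timed:", timed)  # [0, 0, 1, 0, 0, 0, 0]
```

If anything downstream latches `out` during that one step, the hardware does something the zero-delay simulation never showed.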
It sounds like Signetics WOM (Write Only Memory) to me! http://www.national.com/rap/Story/WOMorigin.html
Like the beaver, it's just Dam one thing after another
I would never put a password on my drive, so no corruption for me, but I could use this to get a cheaper price, and I *really* want to put silent drives in my multimedia PC.
I can live without the password feature...
C doesn't have voltage or current leaks.
But C has a lot more loops and pointers, which makes verification a lot harder (I work on a static analysis tool for C/C++, and it's also very expensive ;) )
This seems like a very unlikely trigger for most users. In my experience, personally and professionally, I have yet to see anyone who even knows about BIOS passwords, much less about setting a password on the drive using the ATA secure drive password feature. I am surprised that this was even caught by anyone unless it was a complete fluke or there actually are people or companies using this type of feature for security. (I don't doubt it but haven't seen it.)
I personally own the first generation Intel X25-M 80GB MLC SSD and I have written about it extensively here on this forum. I heard rumors that the new TRIM feature support will only be made available to this second generation release of these drives but I'm unsure if that is really true. I'm on the fence right now about whether I should sell my G1 drive and upgrade to the G2 for this feature and a little more performance, because I am so happy with the performance of this drive and with the current 8820 firmware that solved the fragmentation and slowdown issues.
If you are one of those folks who is still sitting around not knowing what to do when all of this Solid State Disk news is coming out all over then you are missing the biggest paradigm shift to computing performance since the transfer from floppy disks to hard drives.
With the upcoming re-release of the newly affordable Intel X25-M G2 80GB MLC SSD around 2009-08-28, at ~$230 USD from Newegg or ZipZoomFly, you should definitely dig down deep and save a little money to buy one of these drives and experience the biggest performance and responsiveness improvement to your computer that you could imagine.
If you need a primer on the SSD revolution check out my previous post regarding the articles to read.
Required Reading for Solid State Drives (Score 1)
It is called the bleeding edge for a reason.
Problem is, in the future as hardware is becoming more complicated I think we're going to see more and more issues like this. It seems that it's mostly engineers that end up writing the code at this level, especially when dealing with hardware, and they just can't write software for crap. I have worked with many over the years and there is not one I would consider capable of writing something that needed to be very reliable.
Welcome to 2 weeks ago:
http://www.pcper.com/comments.php?nid=7544
Allyn Malventano
Storage Editor, PC Perspective
this sig was brought to you by the letter
So? It's just a set of different paradigms. It's just like using a different programming language. 99.9% of the time if your code works during functional verification testing (which doesn't simulate the physics of hardware) it will work fine in timing/hardware verification and then also in real hardware (so long as you don't violate any timing constraints, which your synthesis tool will tell you about). That's one of the reasons why RTL synthesis tools like Cadence are so insanely expensive, because they do allow you to go from functional verification, which verifies the syntax and semantics of your code, to hardware verification, which allows you to ensure your design will work as expected in actual hardware. If you're getting "kicked in the balls really hard" then it's probably because you need to brush up on your VHDL/Verilog, just like if you're getting segfaults when writing C you're doing something wrong. It doesn't mean that the process is any less deterministic.
What makes Intel a hard disk vendor anyway? Yes, it is still a disk. Expertise which Intel doesn't have is a huge factor along with software support.
Other alternatives? There are OCZ and Samsung. What kind of software support do they give? Zero. Samsung can't even produce pages without English spelling mistakes.
Call me old fashioned, but I am waiting and will continue to wait until Seagate and Western Digital do real stuff, not "we can do it too" stuff, if you understand what I mean.
Yes, it doesn't work. If you ever tried to design something using Verilog or VHDL, and tried to generate a real-world design, either an FPGA or a real chip, you will see that things aren't so easy.
I learned it the hard way, while doing my last year of undergraduate course. The simulation worked perfectly - correct input, correct output. On the other hand, making it work on the FPGA was a horrible, horrible, horrible job. Took 2 weeks of trying this, trying that, still with no clue.
Although the problem was a small behavior/synthesis mismatch, I found out that this was going to be a horrible job, because you may have bosses thinking just like you, who ask you to complete the implementation job within a few days. The truth is that each synthesis job (equivalent to compiling) takes hours (if not days) to complete, and it is almost certain that it won't run on the first try. Believe me, there is a reason that there is a multi-billion dollar market for designing and verifying chips, where a huge portion of that is verification and debugging.
For firmware, it is a somewhat similar state. You have to work around hardware bugs, e.g. you have to avoid calling some instruction that is supposed to work, and did work in simulation, because the processor screws itself up once in every million times that instruction is called. The problem is, not calling that instruction may be possible, but identifying the problem gets really dirty.
Now I write simulators and models for simulation, rather than writing HDL code that should end up inside some FPGA or ASIC. I am much happier now, since Intel and AMD did a lot of work to verify and fix their dirty bugs, and I can trust the underlying hardware.
Those who flame us whenever we say "it is early, don't beta test storage hardware" should come up and answer them. Especially when it is, predictably, personal memories with no backup.
In an enterprise environment, which the X25 was originally designed for, data loss is not a huge problem. They have all kinds of backups, verification, mirroring and cool filesystems like ZFS. When it comes to the personal data of an ordinary OS X or Windows user, the problem begins. Whenever they suggest an untested technology to ordinary people, they should leave a phone number or working mail address to get called when 1000s of irreplaceable personal jpegs are gone forever.
At least they're not like some companies that ignore that there is a (tiny tiny tiny) problem and just gag its customers.
Nonsense. In C/C++ I can set break points or use a debugger to see anything I want. In hardware it is often impossible to watch the signal at a part of a circuit because it is sensitive to my probes, too small, inside a chip, etc. For that reason I have to figure out which part of the hardware is faulty by diagnosing its effects on other circuits that I can get to. The process of elimination can often be hard and lengthy - especially with analogue circuitry.
Functional simulations will only catch #1.
If you are getting segfaults in C you usually ASSUME that the processor you are running on is acting in a deterministic manner and ASSUME the problem is your code.
The DIFFERENCE is that SOMETIMES the underlying hardware is not acting deterministically because it is a PHYSICAL system that has physical flaws or imperfections. Like leakage currents that are JUST a tiny bit too much, or depend on the state of the neighboring circuit or the temperature.
In other words, I've written C code that had "segfaults" and it wasn't the fault of the C code, it was memory issues that resulted in problems. And I've written C code that suffered from a buggy compiler, too. I've also written code that "misread" about 1% of the characters typed in at the terminal, and it wasn't the code that was at fault, it was the UART.
I don't know anything about the source of Intel's problem, but I will say that they can send me ALL of the "defective" SSDs and I'll give them a home where I promise never to set a password on the disk or change it after I do.
I guess in hardware static analysis is easier, and dynamic analysis is harder.
The link I saw seemed to indicate static analysis.
ooh fucksticks.. Now I need to research OCZ SSD drives (I'm completely not happy with spending 900 bucks for a 250GB drive) to see if the bug applies here?
Now the problem came in that case when you wanted to change/delete the password. It would use a second subroutine to do so.
That last step was the killer; it seems that someone had declared a global variable and a local variable with the same name. End result: one overwrote the other's data, and one never knew exactly what the box hashed, nor could you figure out what to key in to the screen to unlock the door (so to speak).
I'm sorry, I'm too tired to be witty at the moment so this message will have to do.
This news is days and days old, very old.
Anyone who cares knows about this, we've long since known! What we want to know now is when is the patch coming out, for existing owners and when will the god damned disks be going back on shelves?
There is going to be even more demand for the things, as soon as they are re-listed, prices are going to skyrocket at the retailers.
Also, on this note, it's August 4th where I am right now, Windows 7 is available within about 72 hours internationally for certain MSDN subscribers, so Intel, where the hell is the TRIM firmware support? Why even bother releasing this new drive about 2 weeks ago, then recall it, patch the firmware for a BIOS password bug, only to re-patch the firmware to add TRIM support?
Surely it should be done, fixed, tested by now. Windows 7 is RTM and the beta has been available for nearly 7 months for end users, let alone you guys - get with it and get the TRIM firmware out NOW! This idiocy is hampering SSD adoption.
Just showed that your simulation was poor. Try using the AFTER command in VHDL to generate more precise simulations.
On a chip, adding 2^256-1 and 1 may not equal 2^256 when:
5. You're using an original Pentium
(cheap shot, but since it's an intel story...)
May contain traces of nut.
Made from the freshest electrons.
Ask anyone who bought a JMicron-based SSD about insufficient testing. How any company thought that controller was worthy for their SSDs is beyond me.
Before I replaced mine with a Samsung SSD, my [censored] was regularly giving me stutters and pauses that lasted for 20-40 seconds at a time. It just flat-out halted everything on the computer for half a minute for no apparent reason, even while reading, not just writing. Apparently, this was predominant behavior for the controller that dominated the SSD arena until the X25 started blowing people away.
I think I understand now why Seagate, WD, and the other HD manufacturers are taking so long to get SSDs on the market. Since their market depends almost exclusively on storage, they can't afford to screw up their first SSDs. At least, I hope that's the reason. Even they have to understand that the hard drive market isn't going to last forever.
confirm the data errors in my Phison SSD, but the thing's been booting since somewhere around mid 2008.
Good people go to bed earlier.
Static analysis of concurrent code quickly hits problems with combinatorial explosion. I was recently at a seminar given by someone working on formal verification of MPI code and the number of possible states that a brute-force approach gives for trivial MPI programs is so huge that it's not feasible to reason about them. You can apply some heuristics with MPI that get this down to a manageable number, but that's generally not possible with hardware because the number of interactions isn't limited by a mediating layer.
Verifying hardware is hard. The only reason it works at all is that hardware designers go out of their way to isolate various components and that hardware, generally, has much simpler requirements than software.
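To put rough numbers on the combinatorial explosion, here is a back-of-the-envelope sketch in Python. It assumes a deliberately simplified model that is mine, not the seminar speaker's: `processes` independent processes, each running `steps` atomic steps in a fixed order, with any interleaving allowed. The number of distinct interleavings is then the multinomial coefficient (processes * steps)! / (steps!)^processes.

```python
from math import factorial

def interleavings(processes, steps):
    """Count the orderings of `processes` independent sequences,
    each consisting of `steps` atomic steps (simplified model)."""
    return factorial(processes * steps) // factorial(steps) ** processes

for n, k in [(2, 2), (3, 3), (4, 4), (8, 8)]:
    print(f"{n} processes x {k} steps: {interleavings(n, k)} interleavings")
```

Two processes of two steps give 6 orderings, three of three give 1,680, and four of four already give 63,063,000, all before any data-dependent behaviour is considered. Real tools prune most of these, but it shows why brute force is hopeless.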
I am TheRaven on Soylent News
Doesn't Java have leaks?
Wrong..
Was that ever an interesting bug to run down. It dates back to pre-OpenSolaris days, so what you're seeing comes from closed source origins.
To make a long story short: On Sun 15K (top-end supercomputers at the time) a hardware issue would cause mkfs to fail some times, but only if data was accessed on a non-8-byte boundary.
The fix:
"Doctor, it hurts when I do this."
"So, don't do it."
Cheaper and faster than respinning silicon.
So yes, non-determinism does crop up.
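For illustration only, the "don't do it" style of workaround usually boils down to checking or forcing alignment before touching the data. The sketch below is Python and assumes nothing about the actual Solaris fix; `is_aligned` and `align_up` are hypothetical helper names, and the bit trick in `align_up` only works for power-of-two alignments.

```python
ALIGNMENT = 8  # bytes: the boundary mentioned above

def is_aligned(offset, alignment=ALIGNMENT):
    """True if `offset` falls exactly on an `alignment`-byte boundary."""
    return offset % alignment == 0

def align_up(offset, alignment=ALIGNMENT):
    """Round `offset` up to the next boundary (power-of-two alignment only)."""
    return (offset + alignment - 1) & ~(alignment - 1)

for off in (0, 13, 16):
    print(off, is_aligned(off), align_up(off))
# 0 True 0
# 13 False 16
# 16 True 16
```

Code that only issues accesses at `align_up(offset)` never exercises the faulty path, which is the software equivalent of the doctor's advice.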
Leakage currents, neighbouring circuit interference and temperature are all able to be modelled (again, this is why Cadence et al. are so expensive), plus hardware engineers worth their salt put in sensible tolerances for all these values. My point was, hardware design is comparably deterministic to software engineering. Sure, if you break the silicon or run it out of spec it stops doing what you designed it to do, but so does software.
OK so you've provided an edge case where a complex system would exhibit undocumented behaviour that the software engineers weren't aware of. What part of what I said is therefore Wrong? Just because things happen that aren't documented/expected, doesn't make them non-deterministic. If you want me to clarify what I said to the point of nitpicking, fine. Digital hardware designers don't generally concern themselves with the analog behaviour of the underlying technology. Why? Because their lives are hard enough as it is dealing with digital stuff which they presume to be deterministic. Digital guys try and make the algorithm or code as simple as is practicable to minimise space whilst maximising speed. Hardware just wouldn't be made if digital guys had to worry about non-deterministic effects of every latch and logic element in a design containing millions of such elements. Hell, digital hardware guys these days don't generally concern themselves with RTL, that's why we have languages like VHDL and Verilog.