Slashdot Mirror


Seagate Firmware Update Bricks 500GB Barracudas

Voidsinger writes "The latest firmware updates to correct Seagate woes have created a new debacle. It seems from Seagate forums that there has yet to be a successful update of the 3500320AS models from SD15 to the new SD1A firmware. Add to that the updater updates the firmware of all drives of the same type at once, and you get a meltdown of RAID arrays, and people's backups if they were on the same type of drive. Drives are still flashable though, and Seagate has pulled the update for validation. While it would have been nice of them to validate the firmware beforehand, there is still a little hope that not everyone will lose all of their data."

24 of 559 comments

  1. Re:If You Can Reflash It, It's Not Bricked by Anonymous Coward · · Score: 3, Informative

    Except that there are cases in this incident where you can't reflash it. So bricked is correct.

  2. Re:Huh.... by sjames · · Score: 3, Informative

    Normally, they wouldn't, but these drives already had issues. Seagate recommended updating the firmware (with their 'handy' Windows-only updater). Unfortunately, that made the problem worse.

  3. Re:If You Can Reflash It, It's Not Bricked by Urza9814 · · Score: 4, Informative

    I gotta agree with the GP. I mean, the term is 'bricked' as in 'it is now worthless as anything other than a brick (paperweight, building material, etc.)'. If you can just reflash it, it's not bricked. Now of course there are a variety of levels of not being able to flash it anymore, but I would say that if you can flash it back using the same process you used to flash it in the first place... obviously you know how and are capable of doing it, therefore it should be reasonably simple for you to fix it, and therefore it is still worth more than a brick. 'Bricked' means you can't fix it, you send it in for service, and all they can do is throw it in the trash and give you a new one.

  4. Not Windows. by antdude · · Score: 5, Informative

    The firmware updater uses FreeDOS from a CD image (ISO). Users had to burn it to a CD and boot from it. Here's a screenshot from Sunday night, when I tried it under VMware to see if my CD booted (this was the first release, which crashed while upgrading but did not brick drives for me or for other people): http://img403.imageshack.us/img403/7128/screenshotsa7.gif. I didn't bother to try the second release because that one totally bricked 500 GB HDDs, which I have!

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  5. Re:I have a solution for long term data storage. by beav007 · · Score: 4, Informative

    CDs work by blistering aluminium foil with a laser.

    Wrong (at least for the vast majority of current cases). Manufactured CDs are pressed, while CD-Rs have an organic dye layer that the laser heats to change its optical properties.

  6. Re:If You Can Reflash It, It's Not Bricked by Flentil · · Score: 3, Informative

    What if you could send it to a 3rd party to get it working again, like one of those data-recovery specialists? What if it costs $800 to do that? Is it considered bricked then because it's 'totaled' like a car? See, it's a slippery slope that's easily avoided by simply accepting the currently accepted meaning of something being bricked. It's not working right now. It's not good for anything but a paperweight. It's like a brick. It's bricked. Get it fixed tomorrow and it's un-bricked. See, that's easy. If you want to talk about something being broken beyond repair, I'm sure there's some other word for that.

  7. Re:bad Seagate, bad! by seifried · · Score: 2, Informative

    "web hosting company" - lots of cheap servers with lots of disk (how else do you sell 10gig VPS servers? It's not like these machines have high IO requirements typically.

  8. I talked with A/S 10 minutes ago by digirave · · Score: 5, Informative

    A few days ago I talked with Seagate A/S, was told I needed to update my firmware, and was sent an email explaining how to update, but no firmware could be downloaded from the links in that email.

    Annoyed, I talked to Seagate A/S again today. It seems I do not need a firmware upgrade anymore: only some of the hard drives made in Taiwan within a certain date range seem to be defective, and updating the firmware on non-defective drives seems to be causing problems. Hence they removed all links to the firmware. Since they are not yet 100% sure of this, they told me they are going to update their site and call me back when things are finalized next week.

  9. Re:If Seagate keeps this up by rrohbeck · · Score: 2, Informative

    http://www.theregister.co.uk/2006/05/23/seagate_6000_job_cuts/

    http://articles.latimes.com/2006/may/23/business/fi-maxtor23

    This was practically all of Maxtor US, Longmont and Milpitas, including what was left of Quantum HDD, except Shrewsbury AFAIK.

  10. Re:If You Can Reflash It, It's Not Bricked by rrohbeck · · Score: 4, Informative

    I have to agree. The manufacturer can generally reload the firmware from scratch through a serial or diag port. After all that's what they do in manufacturing. When I worked with disk drives, we had ROMware, firmware (in flash) and Diskware. The ROM is mask programmed and has only boot code that can program the flash ROM, the flash ROM can be reloaded via the disk interface or a serial port (and can't do much more than load a track from disk), and the disk contains the actual code.
    Then we got rid of the flash ROM and things became a little more exciting because the code in ROM had to be able to read and write a few sectors reliably - for the entire lifetime of the product [line], including cost reductions.

  11. Re:If Seagate keeps this up by diamondsw · · Score: 2, Informative

    The remainder will largely be made up of Maxtor's Asia-Pacific manufacturing workers, Seagate said.

    The drives with bad firmware came out of operations in Thailand, if I recall. This could still easily be Maxtor...

    --
    I don't know what kind of crack I was on, but I suspect it was decaf.
  12. Re:Oh what a long, long fall. by Anonymous Coward · · Score: 1, Informative

    Hutchinson Technology in Sioux Falls, SD, where I live, makes components for Seagate disk drives and has been laying off massive amounts of people.

    http://www.localnews8.com/Global/story.asp?S=9665455

    I had a friend that worked there and I asked him about where the components went and the only company he named was Seagate. It would seem the future is looking bleak for Seagate, and they don't care.

  13. Re:A thank-you! (and some questions) by maxtorman · · Score: 5, Informative

    I'll answer your questions to the best of my ability, and as honestly as I can! I'm no statistician, but the chance of the 'drive becoming inaccessible at boot-up' is very slim. When you have 10 million drives in the field, though, it does happen.

    The conditions have to be just right: you have to reboot just after the drive writes the 320th log file to the firmware space of the drive. This is a log file that's written only occasionally, usually when there are bad sectors, missed writes, etc. It might happen every few days on a computer in a non-RAID home-use situation. If that log file is written even one time after the magic #320, it rolls over the oldest file kept on the drive and there's no issue. The drive will only stop responding IF it is powered up with log file #320 being the latest one written... a perfect storm situation.

    IF this is the case, then Seagate is trying to put in place a procedure where you can simply ship them the drive, and they hook it up to a serial controller and reflash it with the fixed firmware. That's all it takes to restore the drive to operation!

    As for buying new drives, that's up to you. None of the CC firmware drives were affected - only the SD firmware drives. I'd wait until later in the week, maybe next week, until they have a known working and properly proven firmware update.

    If you were to have flashed the drives with the 'bad' firmware, it would disable any read/write functions to the drive, but the drive would still be accessible in the BIOS, and there's a very good chance that flashing it back to a previous SD firmware (or up to the yet-to-be-released proven firmware) would make it all better.

    Oh, and RAID 0 scares me by its very nature... not an 'if' but a 'when' the RAID 0 craps out and all data is lost - but I'm a bit jaded from too much tech support! :)

  14. Re:If Seagate keeps this up by legallyillegal · · Score: 0, Informative

    It's coming from Maxtor's Thailand facility.

    --
    ?giS
  15. Re:Huh.... by Dibblah · · Score: 3, Informative

    It's not "moving the head to prevent wear". It's SMART data gathering. smartctl will soon sort you out. However, I would personally not recommend it.

    smartctl --smart=on --offlineauto=off /dev/sdX

  16. Re:A thank-you! (and some questions) by maxtorman · · Score: 2, Informative

    I've been a denizen of slashdot for many years - I just wish all these mod points were on my main account! :)
    But it is nice to be able to contribute knowledge and experience back to the community for once.

    Thing is, this issue -is- rare. But it manifests itself in a way that's hard to distinguish from a normal drive failure (suddenly no detection in the BIOS; spins up but is never seen by the computer - this can happen for a dozen reasons including a loose or bad cable, physical drive failure, etc.), so a whole lot of 'me toos' don't mean much. When this issue potentially affects millions of drives, a >.01 chance of failure still adds up to thousands of drives.

    I'll say this. There is no *more* chance of it dying on your next boot-up than there has been of it dying on any of your last 3000 boot-ups.

    Many people have noted that drives do tend to fail in batches. If there is some slight manufacturing error that causes a drive to fail, it tends to also exist in other drives from that same lot; the closer a drive is to the same manufacture time, the more likely it is to also fail. I tend to agree with them - it would make sense to me that if a photolithography machine got bumped or there was a slightly bad mix in the emulsion for etching the PCBs, it would affect all drives around the same build time.

    Who knows.
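
    The "rare but still thousands of drives" arithmetic in this comment can be sketched as a simple expected-value calculation. This is only an illustration: the 10 million figure is the fleet size quoted in comment 13, the per-drive probabilities are hypothetical (the comment does not state the rate clearly), and the function name is made up for the example.

```python
def expected_failures(drives_in_field: int, failure_probability: float) -> float:
    """Expected number of affected drives, treating every drive as an
    independent trial with the same (assumed) failure probability."""
    return drives_in_field * failure_probability


# Hypothetical inputs: 10 million drives and two assumed per-drive rates.
FLEET_SIZE = 10_000_000
for probability in (0.0001, 0.001):
    count = expected_failures(FLEET_SIZE, probability)
    print(f"per-drive chance {probability:.2%}: roughly {count:,.0f} drives")
```

    Even a rate that looks negligible for one owner scales to a support queue of thousands once it is multiplied by the size of the installed base.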

  17. Re:THE FACTS by maxtorman · · Score: 2, Informative

    Wait a few days - Seagate will have a procedure in place to recover drives bricked by bad firmware. Once they do, you should just be able to send them the drive and it'll be reflashed with good firmware and sent back. I can't say this for absolute certain, but that's what they're telling us now.

    If you have confidential data on the drive, you have two options:

    a) If you send it in for a reflash, there will be a tech who flashes the drive using a serial interface, and then verifies good reads/writes to the data. But he's likely unbricking a hundred drives a week, and doesn't care about what's on the drive unless he happens to notice, when he does the read/write test, a folder labeled "OMG HUGE AMOUNT OF CHILD PORNOGRAPHY". I can't even say that a person will be doing the R/W test - but there is that chance.

    or b) RMA your drive. The first thing that happens once the drive passes a visual inspection (verifying that the warranty is still valid and the drive hasn't been physically damaged by the user) is that the drive is thrown on a test machine. If the drive passes the physical tests, then its firmware is flashed and the diag machine goes through a 7-pass zero-random-zero-random cycle that destroys any and all data on the drive. This not only ensures a data wipe, but also helps diagnose any read/write errors on the drive. If you RMA the drive, it's not even hooked up to a human-accessible 'computer' (just diag equipment) until the next customer, who receives the drive as a refurb, puts it in their computer - at which point it should be so blank that not even the government could recover data from it using the most advanced tech that we know about.
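
    The 7-pass zero-random-zero-random cycle described in option b) can be sketched roughly as below. This is an illustration only, not Seagate's diag procedure: it overwrites an in-memory buffer instead of a raw device, the function name is hypothetical, and the exact pass order (zeros on odd passes, random bytes on even ones) is an assumption read off the phrase "zero-random-zero-random".

```python
import os


def wipe_passes(buffer: bytearray, passes: int = 7) -> list:
    """Overwrite `buffer` in place, alternating zero and random fills.

    Assumed ordering: odd-numbered passes write zeros, even-numbered
    passes write random bytes, so a 7-pass run starts and ends with zeros.
    Returns the list of fill types in the order they were applied.
    """
    applied = []
    for pass_number in range(1, passes + 1):
        if pass_number % 2 == 1:
            buffer[:] = bytes(len(buffer))        # all-zero fill
            applied.append("zero")
        else:
            buffer[:] = os.urandom(len(buffer))   # random fill
            applied.append("random")
    return applied


# Stand-in for drive contents; a real wipe would target the whole device.
disk = bytearray(b"confidential customer data")
order = wipe_passes(disk)
print(order)
```

    Running every pass over the full surface is also what makes the cycle useful as a diagnostic, since each pass exercises writes to every sector.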

    Call back and push your way up to a supervisor, and see what they offer you on Friday, since the agent sent you the wrong firmware.

  18. Re:A thank-you! (and some questions) by maxtorman · · Score: 4, Informative

    As far as I know, if your drive has the CC1G, CC1H, CC1J or any of the CC firmwares really, it is completely unaffected by this issue.
    However, it may need an update if you experience 'stuttering' (the drive pausing for more than a few seconds during data transfer). The CC1H and CC1J firmwares are *fine* and will absolutely not brick your drive.

    I'd still wait a little while though - support is overwhelmed and mistakes are being made as no one is used to these changes. Once everyone gets a routine down (once there -is- a routine at all), they'll be better able to help reliably.

  19. Re:THE FACTS by maxtorman · · Score: 5, Informative

    Thank you! I wish this information would have been public and I didn't have to create a new account to avoid being fired for releasing 'confidential information' - but what can you do with jerkoff lawyers tearing at your corporate heels already?

    Now, to your questions!

    1) It keeps changing because the scope of the issue keeps changing. I'm pretty sure it's a range of drives within the families noted in the KB article - but there are also some external drives, not yet in the article, that are affected because they contain an internal drive with the problem. Your best bet would be to compare your drive to the list of models, and then wait a little while. Around Friday, I *think* they should have most issues sorted out and the information accurate. But I can't promise anything.

    2) That could very well be it. I'm not privy to the nitty-gritty details, as engineering clammed up pretty quickly - I'm just geek enough to understand what I hear in passing or the few technical details I come across when I go looking for information. But the mysterious death log being a SMART self-test log would absolutely make sense, and is consistent with what I'm hearing.

    3) Unofficially, I've seen more than just the 1.5TB drives display symptoms similar to the stuttering issue, but none so blatant or as impacting as it is in the 1.5TB drives.

    As far as the firmware fixing both the stuttering issue and the unresponsive-drive issue, yes. The changes for the stuttering issue were made in the CC1H and SD1A firmwares. Any firmware equal to or more recent than those two will have the fix for both issues.

    4) I have no idea. SMART characteristics can vary from part number to part number - or even sometimes drive-to-drive; so what is 'out of tolerances' for one part number could be just fine for a different p/n (even though they are the same model number).

  20. Re:Huh.... by magarity · · Score: 2, Informative

    it starts to randomly move the head to prevent wear
     
    I have to ask what wear you think is being prevented by INDUCING activity. If the head arm DIDN'T move, that would be *preventing* wear compared to moving it. But "reducing" wear?

  21. Re:THE FACTS by maxtorman · · Score: 4, Informative

    1 word: Lawsuits. If they gave incorrect information, it could open them up to liability if people acted on that information. When a business's data could be worth millions, one slip-up could cost them dearly. The only reason this firmware isn't such an issue is the disclaimers all over the place when you flash a drive.

    Yes, the 1.5TB drives both stutter and are at risk of bricking due to the journal issue. The stuttering issue is fairly recent and mostly occurs in the 1.5TB drives - but the journal issue is older and exists across many 7200.11 drives, ES2 drives and DiamondMax drives.

    SD1A fixes both of these problems in the 1.5TB drives.

  22. A victim's point of view by jupp201 · · Score: 5, Informative

    I am one of the victims and your report confirmed all the problems which I expected to occur inside your company. I previously worked with an electronic giant and the problems are just too similar.

    The catastrophic problems which Seagate is facing now could have been prevented if there had been one single person in customer service who cared and pushed the issue, which was known for months, up to the right people. A little googling some months ago would have proven that this issue is far bigger than a "one time" incident.

    After all it doesn't happen every day that Data Recovery companies announce with joy that they are able to handle widespread 7200.11 firmware problems. Or that the two major companies which provide recovery solutions race for being the first to have a two click solution for this cash cow.

    Data recovery companies were flooded with drives. They figured out an easy way to fix the firmware and kept it secret. They made a great profit, charging prices as if it was a hardware failure.

    Seagate Datarecovery did the same by quoting up to 1800 USD for a 10 minute fix. Although I am sure that they were the only ones not aware of the easy fix.

    The problem with drives that cannot be detected in the BIOS really isn't new. Your customer service knew about it for a long time, but they are paid so little and probably have such strict procedures that they don't care about Seagate's customers, and no one dared to report the drive failures as a major incident. Everyone shut up about it, and the people who are responsible and do care only learned about it months later when (or shortly before) it got out to the press.

    Seagate had months of time to fix it. Two months ago when my drive broke, there was already plenty of information about the problem on the net. The only one who would deny any problem was Seagate.

    I warned your board moderator of the disaster which will strike Seagate months ago. I tried to show him that these were not normal failure rates but the poorly paid guy didn't care.

    Email support, which takes two weeks to respond, and the phone and live support were just as ignorant.

    There were people reporting how 4 out of 6 drives broke within weeks, and Seagate would only respond that such failure rates are normal.

    People on the Seagate boards were constantly reporting the problem, but your board moderator shut them up. Threads were getting deleted and locked, including a big thread where the community was working on a fix. The reason, according to Seagate, was that it added nothing to the community.

    The board moderator would consistently tell everyone that there is no known problem with the drive - the same message as your customer service.

    It went as far as blocking links in private messages to a posting on another board which could help the victims. So how could Seagate expect those people to believe now that the company actually cares?

    The posting on the new board had 10,000 views within a short time. That's when things started to get out of hand for Seagate.

    People were pissed off at Seagate for months. Everyone knew that the firmware was broken, but the company denied any problems. We knew that it is not that difficult to recover the data if you have the tools and know-how, but the company wouldn't give any assistance. Many would have accepted their fate if the drive had truly been broken. But not if it is inaccessible because of a firmware bug which makes every single drive a -clicking- time bomb.

    People everywhere were calling Seagate harddrives junk drives which are so unreliable that they will never buy them again.

    So I, as many others, went on to warn every single person we knew about the problem with Seagate drives. The hilarious/sad thing is that before, I would recommend Seagate to everyone I knew. If someone would ask me which drive to buy I would reply with no doubt: Seagate.

    This could have been prevented if Seagate would have acknowledged the problem much earlier. I wasted day after day,

  23. Re:Bricked Threshold by Oak1 · · Score: 2, Informative

    Um, no. A "functional electronic device" is one for which its cost of repair = $0.

  24. Seagate Official Root Cause by Anonymous Coward · · Score: 1, Informative

    Root Cause

    This condition is caused by a firmware bug that allows the drive's "event log" pointer to be set to an invalid
    location. This condition is detected by the drive during power up, and the drive goes in to failsafe mode to
    prevent inadvertent corruption to or loss of user data. As a result, once the failure has occurred user data
    becomes inaccessible.

    During power up, if the Event Log counter is at entry 320, or a multiple of (320 + x*256), and if a particular
    data fill pattern (dependent on the type of tester used during the drive manufacturing test process) had
    been present in the reserved-area system tracks when the drive's reserved-area file system was created
    during manufacturing (note this is not the Operating System's file system, but is instead an area reserved
    outside the drive's logical block address space that is used for drive operating data structures and
    storage), firmware will incorrectly allow the Event Log pointer to increment past the end of the Event Log
    data structure. This error is detected and results in an "Assert Failure", which causes the drive to hang as
    a failsafe measure. When the drive enters failsafe further updates to the counter become impossible and
    the condition will persist through all subsequent power cycles.

    The problem can only occur if a power cycle initialization occurs when the Event Log is at 320 or some
    multiple of 256 thereafter. Once a drive is in this state, an end user will not be able to resolve/recover
    existing failed drives. Recovery of failed drive requires Seagate technical intervention. However, the
    problem can be prevented by updating drive firmware to a newer version and/or by keeping the drive
    powered on until a newer firmware version is available.

    Note that in order for a drive to be susceptible to this issue, it must have both the firmware revision that
    contains the issue, have been tested through the specific manufacturing process, and be power cycled.
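
    The trigger condition described above can be sketched as a small predicate. This is an illustration under stated assumptions, not Seagate's firmware logic: the wording "320, or a multiple of (320 + x*256)" is read here as "the counter equals 320 + x*256 for some whole number x >= 0", which matches the later phrasing "320 or some multiple of 256 thereafter", and all function and parameter names are hypothetical.

```python
def counter_is_at_trigger_value(event_log_counter: int) -> bool:
    """True when the counter equals 320 + x*256 for some integer x >= 0
    (assumed reading of the root-cause text)."""
    return event_log_counter >= 320 and (event_log_counter - 320) % 256 == 0


def would_hang_on_power_up(event_log_counter: int,
                           has_affected_firmware: bool,
                           has_trigger_fill_pattern: bool) -> bool:
    """All three conditions from the note must hold at a power cycle:
    affected firmware, the manufacturing fill pattern, and a counter
    sitting exactly on a trigger value."""
    return (has_affected_firmware
            and has_trigger_fill_pattern
            and counter_is_at_trigger_value(event_log_counter))


# Print which of a few sample counter values sit on a trigger value.
for counter in (64, 319, 320, 321, 576, 832):
    print(counter, counter_is_at_trigger_value(counter))
```

    Because the counter has to sit exactly on one of these values at the moment of a power cycle, keeping the drive powered on avoids the failure until new firmware is installed, as the text recommends.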

    Corrective Action
    Seagate has implemented a containment action to ensure that all manufacturing test processes write a
    "benign" data fill pattern that does not trigger the error condition. This change is already a permanent part
    of the test process. All drives with a date of manufacture January 12, 2009 and later are not affected by
    this issue as they have been manufactured with this corrected test process. In addition, Seagate is
    releasing updated firmware that will make a drive immune to this failure regardless of the date of
    manufacture.