Seagate Firmware Update Bricks 500GB Barracudas
Voidsinger writes "The latest firmware updates to correct Seagate woes have created a new debacle. It seems from Seagate forums that there has yet to be a successful update of the 3500320AS models from SD15 to the new SD1A firmware. Add to that the updater updates the firmware of all drives of the same type at once, and you get a meltdown of RAID arrays, and people's backups if they were on the same type of drive. Drives are still flashable though, and Seagate has pulled the update for validation. While it would have been nice of them to validate the firmware beforehand, there is still a little hope that not everyone will lose all of their data."
I didn't even know that people updated the firmware on their drives.
Mirrors are not a backup, and now a raid array isn't either. I'm gonna stick to printing my porn and storing it in a cargo container.
They'll be no different from other HDD manufacturers. I recently got a Seagate external because the price and 5-year warranty were a great combo. I hear they are going to lower the warranty period and now these problems; makes me wonder where I will be able to buy reliable drives in the future.
I would like to know where the hell the firmware update IS? I have opened a ticket with Seagate for each drive. Followed the directions (which were linked to here last week) in detail, and I have heard back NOTHING.
Not even an acknowledgment that they have looked at my tickets. I got a "your ticket was created" email, and that is it.
Seagate is getting very close to losing a lot of customers.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order- Ed Howdershelt Via Tass
Although that was probably(?) meant as a joke, I'm wondering if that might not end up being the way things go. Think for a moment. CDs work by blistering aluminium foil with a laser. It's not perfect, but it works.
If you had a more stable structure and a little more oomph on the writing laser, it is quite possible that there are ceramics or metals you could etch with an information density every bit as good as a hard drive.
As you'd be altering the structure of something, rather than playing with very weak magnetic fields on a medium that doesn't hold them very well, the longevity should be every bit as good as that of the baked clay tablets found in Mesopotamia.
Not that there's anything wrong with magnetic fields. Core memory is considered good for 100+ years under normal conditions and is still used today for extreme radiation environments. The magnetic field recorded in lava flows has lasted hundreds of millions of years.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Because a brick can't be fixed to do anything better. The term as originally used is wonderfully descriptive - I think I first heard it used about the PSPBrick trojan, which really turns the PSP into a BRICK. Like, you can't do anything with it anymore. There isn't a thing in the world that you can do with your PSP, ever again, except keep a table from wobbling. I like having a term for that sort of hardware-disabling software problem, and I can't imagine there's anything as evocative as "brick" for the purpose.
ResidntGeek
Depends on the application; but probably a lot. With SATA drives natively supported by SAS controllers, and substantially larger and cheaper than SAS, they are quite attractive for anything that doesn't need very high speed.
Can we, for God's sake, just permanently ban the use of the word "brick" or "bricked" in the summaries? I have yet to see it used correctly.
Brett
I work for Seagate. I was there when the fit hit the shan, and I saw everything going on internally, as well as externally.
I really love my job, so please excuse the sock-puppet look of creating a brand new account and claiming to be an authority on the subject. But I am a geek, and I really think you all need to know the true story behind the scenes.
This whole thing started with the 1.5 Terabyte drives. They had a stuttering issue, which at first we all thought was a simple bad implementation of SATA on common chipsets. Seagate engineers promptly jumped in and worked to try to duplicate the issue and prove where the problem was. This wasn't a massive rush, as 1.5TB drives are what, 5% of the drives on the market? When it became obvious that the issue was more widespread, they buckled down and put out a couple of firmware revisions to fix it.
Now, in the 1.5TB drives, there are 2 main revisions: the product line that gets the CC* firmware, and the line that gets the SD* firmware. They came out with firmware CC1H and SD1A to fix these issues and started issuing them.
But Seagate has always been restrictive about handing out their firmware, so such updates required calling in with your serial so that the people who had access to hand out the firmware could check a) model, b) part number, and c) current firmware, just to make absolutely sure that they were giving the right firmware out. This is a procedure that has worked for YEARS, up until now.
Then the bricking issue came to their attention. It took so long because it's an issue that's hard to track down - pretty much, the journal or log space in the firmware is written to if certain events occur. IF the drive is powered down when there are 320 entries in this journal or log, then when it is powered back up, the drive errors out on init and won't boot properly - to the point that it won't even report its information to the BIOS.
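To make that failure condition concrete, here is a toy sketch of it. This is not Seagate's firmware logic, which isn't public; the names `EventLog`, `BAD_COUNT` and `boots_after_power_cycle` are made up for illustration, and the sketch assumes the trigger is exactly 320 entries at power-down, as the comment states.

```python
# Toy model only: the real firmware internals are not public, and every
# name here (EventLog, BAD_COUNT, boots_after_power_cycle) is hypothetical.
BAD_COUNT = 320  # entry count that, per the comment above, trips the bug


class EventLog:
    """Stand-in for the drive's internal journal/log area."""

    def __init__(self):
        self.entries = 0

    def record(self, n=1):
        """Log n events (the drive only writes on certain events)."""
        self.entries += n


def boots_after_power_cycle(log):
    """Return True if the modelled drive initialises after a power-down.

    Assumption: init fails only when the log holds exactly BAD_COUNT
    entries at the moment power is removed.
    """
    return log.entries != BAD_COUNT


log = EventLog()
log.record(319)
ok_at_319 = boots_after_power_cycle(log)  # one entry short: boots
log.record()
ok_at_320 = boots_after_power_cycle(log)  # exactly 320: fails init
log.record()
ok_at_321 = boots_after_power_cycle(log)  # past 320: boots again
```

If the condition really is that narrow, it would explain why the fault is both rare and hard to reproduce: the drive has to lose power while the counter sits on one particular value.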
This is a rare, but still obviously bad issue. Up until now, we all figured it was just some standard type of failure, as it was such a rare event, so we'd RMA the drives.
So, for whatever reason, mid management started freaking out (as it could be a liability for Seagate, I suspect - on top of the already potentially liable issue of the stuttering problem causing drives to fail in RAIDs). So, they pushed the release of the SD1A firmware to the general public. They took a few days to 'test', though it was mostly just including some code in the batch file that kicks off the firmware updater, to check that it is a BRINKS drive, and the proper model number. Then it was kicked out to the public.
Please understand, before now this firmware had to go through five different checks to make sure it applied to the specific conditions that qualify for sending it to a customer. 5 chances for us to go "your drive needs the other (or no) firmware update." Suddenly, it's down to ONE check, and even that was designed more as a contingency, just in case the wrong firmware was sent out.
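A rough sketch of the difference between the two gates, under loud assumptions: the comment names only three of the five support-desk checks (model, part number, current firmware) and says the public updater checked drive family and model. The `Drive` type, the `ELIGIBLE` record, and every value except the family name "BRINKS" are placeholders, not real Seagate data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drive:
    family: str       # internal product family, e.g. "BRINKS"
    model: str        # placeholder values below, not real model numbers
    part_number: str  # placeholder values below, not real part numbers
    firmware: str     # currently installed firmware revision


# Hypothetical eligibility record for one firmware image.
ELIGIBLE = {
    "family": "BRINKS",
    "models": {"MODEL-X"},
    "part_numbers": {"PN-A"},
    "from_firmware": {"FW-OLD"},
}


def support_desk_check(d: Drive) -> bool:
    """Model, part number and current firmware must all match."""
    return (
        d.model in ELIGIBLE["models"]
        and d.part_number in ELIGIBLE["part_numbers"]
        and d.firmware in ELIGIBLE["from_firmware"]
    )


def public_updater_check(d: Drive) -> bool:
    """Only family and model are checked, as described for the rushed release."""
    return d.family == ELIGIBLE["family"] and d.model in ELIGIBLE["models"]


# Same family and model, but a hardware variant the image was never meant for.
odd_one = Drive("BRINKS", "MODEL-X", "PN-B", "FW-OLD")
passes_public = public_updater_check(odd_one)
passes_desk = support_desk_check(odd_one)
```

In this sketch `odd_one` gets through the public updater's gate but would have been stopped at the support desk, which is the kind of gap the comment is describing.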
Of course, it starts bricking drives.
Right now, the engineers are crapping themselves, the firmware's been pulled, the support agents are told to say "The firmware will be released soon" and no real procedure to fix this issue is in place. Our phones are flooded so bad that it locks the system up when there are too many calls in queue, and emails are coming in at hundreds an hour.
We simply cannot keep up.
The good news is, the chance of your drive simply not spinning up one day is very low. And for those of you who flashed the wrong firmware - be patient. It's not bricked, just unable to write data to the platters properly. When they have a *GOOD* firmware out, a new flash should un-brick the drives. If not, flashing it back to SD15 should make it work again.
Seagate really pushes the idea of being open and honest as much as we can without being sued to hell. They let agents make choices and use their skills instead of scripting us to death. They worked hard to bring their support back t
This problem isn't anything to do with the drives being SATA versus anything else, and the FC lobby shouldn't get too smug. Some (with hindsight, at least) bad engineering decisions got taken in a complex product, and the result was that the product got into trouble. All disks are a mixture of electronics, mechanicals and firmware, and although this happened today on a SATA drive it could happen equally well tomorrow on an FC drive.
The answer to your question is ``anyone who wants to be power, space and money efficient''. There are products now shipping in volume --- Pillar and Sun's Fishworks boxes spring to mind --- where the performance of SATA is brought up to FC standards for many workloads (your mileage may vary, objects may be closer than they appear, etc) by the application of appropriate filesystem structures, battery-backed RAM, flash, SSD, etc, etc. There are, before anyone jumps in, workloads where nothing this side of a gazillion independent spindles of 15000rpm FC is going to work. But conversely, there are other workloads where performance isn't as much of an issue as space and power density (backup, for example) or where capacity causes the business far more issues than performance.
I've got a Pillar stuffed full of SATA. There's FC available for people who need the performance bump, but I don't: my application workload saturates on other factors long before it maxes out the NAS server, and even if that were not the case, the value to my business of small deltas of performance (and the difference between FC and SATA is a lot smaller than people make out) is less than the massive difference in price. In general terms, SATA today is where FC was five years ago, and even if you end up short stroking it it's _still_ cheaper than FC. My Pillar allows me to effectively short-stroke SATA for performance and use the residue for non-critical data, which is nice, of course.
Performance isn't everything, as otherwise we'd all be going to the supermarket in Formula 1 cars. There are other criteria, and SATA may be appropriate for your business, depending on what your business is. And slightly more controversially, I'm suspicious of admins who claim their application needs the latest bleeding edge of a component --- disks --- which is on a slow development curve for performance. The speed of disks scales, loosely, at the square root of the capacity times any increase in rotational speed, but seek times have only improved by a factor of four over the twenty years I've been running fileservers. If you're seek bound, you've got deeper problems that disk technology won't always help you with.
ian
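The square-root rule of thumb above can be put into numbers. This is a back-of-envelope sketch of one reading of it, not a measured model: it assumes capacity growth comes from areal density, that only the along-the-track share of that density (roughly its square root) helps sequential throughput, and that throughput is otherwise proportional to rotational speed. The function name `relative_throughput` is made up for the example.

```python
import math


def relative_throughput(capacity_ratio, rpm_ratio=1.0):
    """Estimated sequential speed-up for a given growth in capacity and RPM.

    Rule of thumb from the comment above: speed scales with
    sqrt(capacity) times any increase in rotational speed.
    """
    return math.sqrt(capacity_ratio) * rpm_ratio


same_rpm_4x = relative_throughput(4.0)                    # 4x capacity, same RPM
double_rpm_100x = relative_throughput(100.0, rpm_ratio=2.0)  # 100x capacity, 2x RPM
```

On this reading, a hundredfold jump in capacity with doubled spindle speed buys roughly a twentyfold gain in streaming speed, while the comment puts the seek-time gain over twenty years at only a factor of four. That gap is the point about seek-bound workloads.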
Maxtorman, I'd mod you up if I had the points. Your comments are the first ones to alleviate a very significant knot that formed in my stomach after reading this. I'm still a little concerned though, and have some questions at the bottom I hope you could answer.
I'm a little late to the party because I use these only for non-critical stuff like home office and family PCs, but the prospect of having all my drives inevitably die really scares me. I've bought 18 drives (ST31000340AS and ST3500320AS, all w/ FW SD15) in the last half of 2008 that sound like they match those reported to fail on the forum.
Funny enough I was complaining to my vendor about 4 drives that had to be replaced because they died within a month of use. Thought it was a bad batch they were pawning-off on customers, but I still trusted the Seagate brand.
So, my questions:
1) Is this definitely fixable in firmware? Should I be buying new drives right now?
2) What are the honest chances of a drive dying before a working firmware patch? My critical stuff is in RAID5 so I can always rebuild, but the gaming rig is RAID0, and off-site stuff like mom's media PC is only a single drive.
I appreciate your comments. Good to know there's a guru in the Slashdot community :)
-Matt
--- Need web hosting?
Oh they are trying. Trying to bugger up systems! Surely if they validated the firmware update before releasing it the problems would have been caught in the QA process? I'd love to have been a fly on the wall in the QA meeting after the latest fix was released.
XML is like violence. If it doesn't solve the problem, use more.
Deleted already :(
Slashdotted ?
So far, there is no indication that they even have a QA process...
Indeed.
What the summary fails to mention is that the original package Seagate posted to their web site had a flash update utility that would SEGV before doing anything.
Good thing, too. Because otherwise I would have bricked five drives.
Yes, Seagate released firmware that bricks drives with a flash update utility that SEGVs.
That's epic fail. Truly epic.
Seagate never played Whack-a-mole growing up.
The game you played with a server full of Seagate drives "growing up" is that if it was off long enough to cool down it was a virtual certainty that at least one of those drives wouldn't spin up. The odds of another disk developing stiction while you were taking the first one out of the case and whacking it with a screwdriver approached infinity as the number of disks became large enough to make the machine or enclosure heavy. Is that enough like whack-a-mole for you?
On the plus side, most of those drives had really huge filters in them. I had a 40MB RLL disk that I opened, de-stuck, and closed with nary a data error. I just did it in my bedroom (huh huh) with no dust or static control in the environment whatsoever. That drive eventually burned a trace right off of its control board, and then burned off the jumper wire I replaced it with, but that was over a year later and I'm quite sure one had nothing to do with the other.
I don't know how Seagate actually got its good name. I've used a lot of their disks, since way back when, and I just don't see it. I used to like Maxtor in spite of the noise, but WD has pretty much been my best friend all along. Today I will hardly buy anything else.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Well unfortunately you will get the occasional clunker no matter who you go with. Any product cranked out in those kinds of numbers is bound to have at least small bad batches no matter how good the QA. Now as for ExcelStor, when I said small I was talking 80-160GB range. I have built plenty of office machines with ExcelStor drives in that range and they are quite popular. They are VERY quiet, which if you are building a machine which is going to be on the desktop is a plus.
With WD I've always had those fail really quick (and thus be under warranty) or not at all. I've got a drawer filled with 40GB WD IDE drives from upgrades that I'll need to figure out what to do with. Samsung I've had good luck with on very large capacity, as well as Hitachi. But if you are wanting a 1.5TB you are pretty much stuck in Seagate land - they are pretty much the only game in town. Might I suggest either the Samsung I linked to earlier or perhaps one of these, two in a RAID 0 if you really must get above the 1TB range? Because until Seagate gets their collective shit together I would be afraid of picking up one of their drives ATM. My WDs may only be 500GB each (and that strikes me as funny as hell that I can say "only" when my first HDD was 2GB) but they are VERY quiet and give me more space than I will ever need.
ACs don't waste your time replying, your posts are never seen by me.
Can we please stop saying this?
When a router is bricked (bricked is a layperson's term mind you, not a technical term!) it can nearly always be recovered through tftp in the first few ms of boot up (if it automatically listens in an automated failsafe recovery step like some Buffalo routers) or through a JTAG port. By your logic, a router is never bricked unless the NVRAM is fried, right? Wrong. It's bricked - just like iPhone users who jailbreak and then end up with dead iPhones. Sure, they can be recovered, but for the average joe, the thing is a "brick" good for little else than a paperweight or door stop (depending on mass) even though it can be fixed.
It's not a technical term by any means. It's slang for "ZOMG! IT DONE BROKE I DUNNO IF IT CAN BE FIXED!" then you find out it can be undone through a firmware update, even if by a JTAG port, then it's "I HAS SHINY FIXED! KTHXBAI!"
Now, let's get over the word "brick" and agree that its meaning is not necessarily "permanently broken" but its meaning is "non-working shiny which may or may not be reparable."
KTHXBAI!
The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
You're close, but bricked really just means "you can't fix it, nor can the average layperson". There is such a thing as "unbricking".
For example, you might brick a motherboard by flashing it with some hacked BIOS you found on a tweak forum. If you're as dumb as the average forum troll, you're probably not clever, resourceful or brave enough to hotflash your socketed chip on a different board, but an experienced techie could do it.
There's also a pretty large market of "unbricking services", usually just some half-breed with a special cable he bought off of some other wannabe-crook on eBay. He'll reflash your PSP, cell phone or hacked FTA receiver for ten bucks, right from his ornate Honda Civic office.
There are very few cases where a "bricked" device is truly beyond repair by a skilled and equipped technician. If a gadget sells for $100, and your staff tech costs $50/hour, then as long as he can fix more than one unit every two hours (minus S&H and markdown), you fix the gadget. In practice, you end up seeing the same problems over and over, most of them very simple, so your tech might be able to fix 5+ per hour, and I'm being conservative here.
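The arithmetic behind that rule of thumb, as a small sketch. It uses only the figures given above ($100 gadget, $50/hour tech, 5 repairs per hour) and ignores the S&H and markdown the comment mentions in passing; the function names are made up for the example.

```python
def break_even_rate(unit_value, tech_cost_per_hour):
    """Repairs per hour at which labour cost equals the value recovered."""
    return tech_cost_per_hour / unit_value


def labour_cost_per_unit(tech_cost_per_hour, units_per_hour):
    """Labour spent on each repaired unit at a given repair rate."""
    return tech_cost_per_hour / units_per_hour


rate = break_even_rate(unit_value=100, tech_cost_per_hour=50)
cost_at_5 = labour_cost_per_unit(tech_cost_per_hour=50, units_per_hour=5)
margin_at_5 = 100 - cost_at_5
```

Here `rate` comes out at 0.5 repairs per hour, i.e. the one unit every two hours quoted above, and at 5 repairs per hour the labour is $10 against a $100 unit.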
Throwing it in the trash is not a good idea, because if you don't try to fix the broken ones, someone else will buy your trash and do it behind your back. Then you have a bunch of poorly-repaired devices bearing your brand name, floating around generating forum posts and hate mail all over the web. The cost of junking returns can be greater than the cost of repairing them.
-Billco, Fnarg.com
Core memory is making a comeback (sort of) - http://en.wikipedia.org/wiki/Magnetoresistive_Random_Access_Memory
retrorocket.o not found, launch anyway?
Which is all well and good.
Having said that, I first discovered that this was possible due to last weekend's /. thread on this topic, and I had a lot of fun tinkering with an expendable drive.
If it's not under NDA, are there also serial commands to issue arbitrary ATA commands? I didn't see anything immediately obvious in the online help from the drive. (Also, again prefacing the question with "if it's not under NDA", is firmware-flashing-via-serial done with things like "DownloadGenericFile", or with variations on EnterBatchFile or RunBatchFile?)
Seriously, with your posts today you've cut a lot of support costs. Knowing that the root problem is actually understood is very cool; it means it's easier to wait a few days for a fix. Knowing that any bricked drives can be unbricked with a relatively simple (even though officially unsupportable) serial port hack was a huge breakthrough, as semi-shady overseas commercial data recovery "services" were aware of this workaround for weeks, but kept it under wraps in order to gouge customers $bignum for it. I'm glad the open community has finally figured out the secret.
Now that I know the problem's with the 320th entry in the logfile, I can use DisplayLogFile to see how long the thing is. If that's the logfile referred to with the Level L commands, maybe I can use CreateLogFile/DeleteLogFile/InitLogFile to zero it out if it's too long, or lengthen it to 320 entries to consistently reproduce the original misbehavior and then try to find a fix that doesn't involve unplugging the PCB from the drive.
Any hints you can provide (within the limits of NDA, and again, I'm experimenting on a drive whose data is expendable :) would be appreciated. Even if you can't provide any further hints beyond the output of the drive's "Q"uery (DisplayAsciiCmdInfo) online help, thanks again for kicking serious ass today. You've saved the support queues several unnecessary calls, and you've also helped restore trust in the brand. "It was a bug, the bug's known (here's the bug), the patch was rushed through, no drive affected by the bad patch needs to lose data (here's the workaround), no drive affected by the firmware bug has lost any data (here's the other workaround!)" isn't something that marketing can put a positive spin on, but you've given enough details about where each of those steps went wrong that IT people can appreciate.