Slashdot Mirror


Magnetic Wobbles Cause Hard Drive Failure

An anonymous reader writes "According to this report by IT PRO, scientists working at the University of California have discovered the main reason for hard drive failure. According to researchers, some materials used in hard drives are better at damping spin precession than others. Spin precession of magnetic material effects its neighbors' polarity and this can spread and cause sections of hard drives to spontaneously change polarity and lose data. This is known as a magnetic avalanche. So next time Windows fails to start, you'll know why!"

64 of 276 comments

  1. Sigh by Anonymous Coward · · Score: 5, Insightful

    Pretty sure this will also keep Linux from starting!

    1. Re:Sigh by larry+bagina · · Score: 5, Funny

      yes, but the gpl v3 fixes this limitation.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    2. Re:Sigh by martin-boundary · · Score: 5, Funny

      Only if you're making the assumption that Linux is running from a hard disk installation. Plenty of linuxes are actually run from a cd drive, in which case the poster is correct: this is really mainly a Windows issue.

    3. Re:Sigh by Anonymous Coward · · Score: 2, Informative

      You can run WinPE from a CD too.

    4. Re:Sigh by jd · · Score: 3, Funny

      The tracks can wobble on independent threads under BeOS.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    5. Re:Sigh by Tim+C · · Score: 4, Insightful

      I'm sorry, but you're *really* clutching at straws there. I personally don't know of anyone who runs Linux from CD. I appreciate that you can, and that some people almost certainly do, but if they're anything but a tiny minority of users I'll eat my PC.

      You're also ignoring that every OS X system will be running from a hard drive, so it's as much an OS X issue. And a *BSD one, a Solaris one, and every other OS.

      Mindless Windows bashing just is not cool, and only serves to lessen the impact of genuine gripes.

    6. Re:Sigh by Tim+C · · Score: 2, Informative

      It wasn't more or less on topic, it was a direct response to the closing remark in the summary. Of course hard drive failure may prevent Windows from booting - but it'll prevent any other OS from booting too, so why the unnecessary swipe at Windows?

    7. Re:Sigh by cp.tar · · Score: 4, Interesting

      In all honesty, while on /. it may seem like an unnecessary swipe at Windows (if there can be such a thing here), the closing sentence only mirrors the fact that Windows is still on the vast majority of computers.

      None of us regularly get phone calls such as "oh, my Linux won't start, OMG, what am I gonna do?". We do get them related to Windows, though.

      So while I'm just guessing (and assuming stupidity and not malice), I'd say the OP typed Windows instead of $OS_OF_CHOICE or whatever.

      Besides, it's obvious that the issue affects any and every OS, since it's a hardware issue; so even if the swipe at Windows was intentional, it was supposed to be humorous. Yet the /. mob swarms in on obvious trivialities, thus proving that geeks are just as easily baited as the rest. Yay.

      --
      Ignore this signature. By order.
    8. Re:Sigh by eat+here_get+gas · · Score: 2, Insightful

      because this is a *nix camp, and if M$ wasn't around to swipe, what would we do?

      --
      the significance of a signature is insignificant
    9. Re:Sigh by somersault · · Score: 5, Funny

      Work?

      --
      which is totally what she said
    10. Re:Sigh by bhiestand · · Score: 5, Funny

      I've always liked you, but I have to recommend that you be permanently banned from slashdot for suggesting such a thing.

      --
      SWM seeks new sig for a brief fling
    11. Re:Sigh by LiquidCoooled · · Score: 5, Funny

      How is that any different to full windows?

      --
      liqbase :: faster than paper
    12. Re:Sigh by BigDogCH · · Score: 2, Interesting

      Just a side note... while most of my calls are still from Windows users, I am starting to get more Linux calls (3 this year, up from 0 ever). Two had burned Ubuntu disks from who-knows-where, while one had a retail version of Suse which he paid $45 for. All 3 wanted help installing it. The majority are clearly geeks/nerds; however, I just wanted to point out that this might be changing.

      Also, last July, I went to a non-nerd relative's house to help "setup his new digital camera". I was surprised to find he had Ubuntu running on his machine. He said that a friend told him to try it, and he liked it (though he was a bit frustrated by some things, he admitted he only tried it because he was a bit frustrated with Windows). I would suggest that he is an "above average" user, but he isn't a nerd/geek. He is a retired crane operator, and a good craftsman (so he has patience, and is not afraid to read/learn).

  2. I've had this problem before... by commlinx · · Score: 2, Funny

    But I'd put the wobbly boots down to being pissed.

  3. First questions to mind: by UncleTogie · · Score: 4, Interesting

    Which materials/processes dampen the "avalanche" best? Which hard drive manufacturers use those materials/processes?

    --
    Don't tell me to get a life. I'm a gamer; I have LOTS of lives!
    1. Re:First questions to mind: by TheThiefMaster · · Score: 2, Informative

      IIRC it was a specific few models of Maxtor that liked to die, particularly 80-160GB drives.

      I have an older 20GB and several newer 250GB and 300GB Maxtors and none have died (except one that the delivery man dropped and was replaced free). Before I got these I had a couple of 80GB and a couple of 160GB drives, and those have ALL died now.

      Is this the same as what you've seen?

  4. Re:Question by untaken_name · · Score: 5, Funny

    It doesn't effect Windows at all, actually. It might, however, affect Windows.

  5. As my high school music teacher always said... by RealGrouchy · · Score: 5, Funny

    "So next time Windows fails to start, you'll know why!" It's a bad carpenter who blames his tools.

    - RG>
    --
    Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
  6. Which University of California?! by tutwabee · · Score: 3, Interesting

    It's lovely how both the Slashdot post and the original article state that scientists at the "University of California" discovered this. This could mean the University of California, Berkeley, UCLA, UCSD, or others. The website link is to the University of California, Santa Cruz website so I assume that's where the scientists were located.

    1. Re:Which University of California?! by background+image · · Score: 3, Funny

      I heard that they discovered how to "fix" vampires at UC Sunnydale.

      Well I, for one, welcome our new, neutered vampire overlords...

    2. Re:Which University of California?! by kf6auf · · Score: 4, Informative

      Here is a link to the UC Santa Cruz press release and the professor is indeed there (I'm sure you can find him). A little spiel from me: I took a class on nanomagnetism this past term and definitely learned about this effect for individual spins and for domains, and it has been known for quite some time. Without reading the PRL article because I'm off campus and don't have a personal subscription ($$$ and, hey, this is /.), my guess is that the model explains the why a lot better than existing ones, and how we get from individual precessing spins to the average spin of the entire domain without brute-force computing it, which is nearly impossible. That being said, different ferromagnetic materials differ widely in the interactions between spins and orbits, and between neighboring spins and orbits, so I'm not sure without looking into it how many different ferromagnetic materials this applies to.

  7. Grammar Nazi x2 by Anonymous Coward · · Score: 2, Informative

    Spin precession of magnetic material effects its neighbors' polarity

    That would be "affects" its neighbours' polarity with an option on calling neighbours' erroneous too - depending on the precise physical phenomena that they are trying to describe.

  8. Re:Question by Speare · · Score: 5, Informative
    affect

    When 'effect' is used as a verb, it means 'to create.' The article writeup has the same primary-school error. It's not that hard, people.

    --
    [ .sig file not found ]
  9. SOME types of failures... by DTemp · · Score: 5, Insightful

    So this claims that most hard drive *failure* is caused by this. Now, I'm sure this causes isolated data loss here and there, and maybe I've had a different experience than the average person, but most of my hard drive failures in the past had loud screeching or clicking noises. I don't think this was caused by magnetic spin!

    1. Re:SOME types of failures... by IndigoParadox · · Score: 3, Informative

      It seems possible that this magnetic effect could be a cause of spontaneous damage to the hard drive's servo information.

      This would cause one of the clicking-type malfunctions which you described, as that "clicking" you hear is the noise the head assembly makes when the drive is rapidly moving it back and forth across the platter attempting to get a fix.

    2. Re:SOME types of failures... by Fweeky · · Score: 2, Funny

      No, that's the disk throwing a tantrum because it can't find the data it wants.

      You can demonstrate this yourself; open up a running hard disk and remove the platter - in pretty much all cases a rather physically violent ending will occur. That's because the disk is *upset*; you took away its data!

      It's hoped that, once we have disks whose lifetimes can be measured in decades instead of a handful of years, the devices will be mature enough to take such failures in their stride.

    3. Re:SOME types of failures... by RallyNick · · Score: 3, Informative

      Hard drives that are used 24/7 fail because their mechanical (moving) parts are built from the cheapest materials that would last for the warranty period. Most of my Western Digital drives develop a noticeable "whine" within a year or two and typically fail soon after that. The "whine" sounds somewhat like an F1 engine running at max rpm, just not as loud (you can hear it if you get your ear close to the drive), and it definitely sounds like there's metal-on-metal friction in the bearings (not good). Better bearings are slightly harder to manufacture and thus no longer used in consumer products. After all, we're supposed to get products that break and need to be replaced often to keep the manufacturers in business.

  10. Hmmm... by Edward+Teach · · Score: 5, Insightful

    So next time Windows fails to start, you'll know why!

    Pretty sure that's not the main reason. :-(

    --

    Setting his threshold to 5, Sparky eliminated most of the trolls on /.

  11. Nothing insightful to say. by geekboy642 · · Score: 2, Interesting

    Groovy. Maybe we'll get some more reliable drives based on this discovery. Sadly, every drive I've ever had fail was due to heat. When I was 12, I learned why most people use properly ventilated cases and refrain from leaving a server running in an attic closet. According to the logs, those drives hit upwards of 85C before failing. Fairly impressive, I guess.

    --
    Just another "DOJ fascist authoritarian totalitarian bootlicker" -- Zeio
  12. Re:Question by thanatos_x · · Score: 2, Funny

    Because this is Slashdot. Everyone knows open source transcended hardware ages ago. Also it cannot affect OS X systems, as Apple is never to blame for anything going wrong with their computers. That leaves Windows as the only logical choice. You like logic, don't you?

    --
    I am not an expert. If I am misled in something, please correct me.
  13. Interesting but WRONG conclusion. by Anonymous Coward · · Score: 2, Interesting

    Reading TFA, it sounds like they have found a mechanism for data being randomly lost, NOT bad sectors developing on a disk.

    I would not call this a mechanism for "hard disk failure."

    1. Re:Interesting but WRONG conclusion. by Max+Littlemore · · Score: 3, Funny

      I would not call this a mechanism for "hard disk failure."

      I sure as hell wouldn't call it a "hard disk success"

      --
      I don't therefore I'm not.
  14. Re:It "effects" it's neighbors... by Refenestrator · · Score: 2, Funny

    You mean "its" there, not "it's." Certain possessives don't have apostrophes in ou'r language.

  15. So do lots of other things by Whuffo · · Score: 5, Interesting
    When I think of hard drive failure, it's almost always due to a drive hardware failure. Bad motors, bad chips on the controller board. Another popular failure is due to flaky firmware on the drive controller causing the tracking information on the platter to become overwritten.

    Magnetic wobbles? Let me see a show of hands - how many have had their data spontaneously change due to this phenomenon. Yeah, I thought so...

    1. Re:So do lots of other things by DerekLyons · · Score: 3, Informative

      I don't see anywhere in TFA that specifies this is the cause of complete hard drive failure. It is, however, a very credible mechanism for the slow increase in bad sectors that is typical of many hard drives. (You young'uns may not have heard of this, or seen it, as the hardware/software conspires to hide it from you nowadays.) I have seen this eventually lead to failure (i.e. unusability) of a drive.
       
      Since (I would assume) a given manufacturer would tend to use the same materials across a broad span of drive models, this could also be a reasonable explanation for why some manufacturers have reps for 'bad drives'.

  16. Misleading title by Tribbin · · Score: 3, Insightful

    Should be something like:

    Magnetic Wobbles Cause Data Loss

    --
    If you mod this up, your slashdot background will turn into a beautiful sunset!
  17. Re:Buy lots of ram by ls671 · · Score: 2, Informative

    True, I would say a machine packed with RAM will wear out its drives about 10 times more slowly than a machine tight on memory. By "tight on memory" I do NOT mean a machine swapping like crazy. A lot of machines tight on memory aren't using their swap space at all.

    The basic principle is that all spare RAM is used as IO buffers and caches thus lowering the number of physical accesses to the drives needed, lowering drive wear and speeding up the machine. You can never have enough RAM, unless you have more RAM than drive space ;-)
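    A rough way to see the caching principle above is a toy simulation. This is only a sketch under loud assumptions: the cache is a plain LRU (real operating systems use more elaborate policies), `physical_reads` is a made-up helper name, and the cyclic access pattern is chosen to exaggerate the effect. It says nothing about actual drive wear, only about how many reads have to reach the disk.

```python
from collections import OrderedDict

def physical_reads(accesses, cache_blocks):
    """Count block reads that miss an LRU cache and must go to the disk."""
    cache = OrderedDict()  # block id -> True, ordered from least to most recent
    misses = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # cache hit: just refresh recency
        else:
            misses += 1                    # cache miss: a physical read
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses

# Hypothetical workload: loop over the same 8 blocks ten times.
accesses = [i % 8 for i in range(80)]
tight_on_memory = physical_reads(accesses, cache_blocks=4)
plenty_of_ram = physical_reads(accesses, cache_blocks=8)
```

    With this particular workload the small cache misses on every access, while the large one only misses on the first pass. Real workloads sit somewhere in between.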

    --
    Everything I write is lies, read between the lines.
  18. Windows won't start?? by suv4x4 · · Score: 5, Funny

    So next time Windows fails to start, you'll know why!

    Because... I didn't install it?

    1. Re:Windows won't start?? by MikeBabcock · · Score: 3, Funny

      Congrats, you've figured out the key to enjoying your new PC :-)

      --
      - Michael T. Babcock (Yes, I blog)
  19. I'm disappointed. by Khaed · · Score: 5, Funny

    This has been up at least an hour.

    So next time Windows fails to start, you'll know why!

    Where are all the jokes about this? Seriously! A bad hard drive is not the only reason Windows won't start. It's not even in the top ten. I've had Windows not start maybe once in ten years over a hard drive. I've had it not start for a variety of other reasons... well the number is greater than one, but I don't keep count (I bet twitter did, though).

    C'mon you slackers, it was a slow day, where are my +5 funny posts about the ineptitude of Microsoft?

  20. Mac OS X by Anonymous Coward · · Score: 2, Funny

    Luckily Mac OS X is safe, as it is protected by a global reality distortion field.

  21. Re:Question by nocomment · · Score: 3, Funny

    Replace the HD or hand in your geek card please.

    --
    /* oops I accidentally made a comment, sorry */
    /* http://allyourbasearebelongto.us */
  22. This could explain where my files go.... by sssssss27 · · Score: 4, Funny

    I back up all of my DVDs to my computer because I have a notorious habit of losing them. Every once in a while I'll go to watch a movie that I swear I've backed up and can't find on my computer. So at least now I can blame it on some science thing and not just my failing memory. Every day science makes one less thing your fault, lol.

  23. How about a Beowulf cluster of failed drives!... by Cafe+Alpha · · Score: 3, Funny

    No that's not it. In Soviet Russia, you fail hard drive... No. Where's that goatse link?

  24. Looks like a precession hit the article by noidentity · · Score: 2, Funny

    During that brief time, each magnetic field contributes forces that affect the precession of neighbouring fields. Each of these spins Combining all those wobbles adds up to a lot of energy that changes the polarity of neighbouring bits and spreads across the surface, causing sections of disk drive to be wiped out.

    That's what they get for using a hard drive!

  25. Re:As usual the slashdot summary is wrong by mshurpik · · Score: 3, Insightful

    >It seems to me that years ago, slashdot authors did more than dump articles into summaries

    Your memory is faulty.

  26. Not "the" but "a lesser known" by mritunjai · · Score: 4, Informative

    This phenomenon is only one of the several ways for bit rot to creep in and make you lose data.

    In bit rot, bits on HDD spontaneously change. It is generally not observable and the results are often blamed on applications and/or OS.

    It is lesser known because in the current state of technology, the applications, OS, filesystem and even RAID can't even detect the problem, much less solve it. (RAID doesn't work because it can't tell which copy is right and which is wrong. It assumes what it got from disk is what it wrote to it.)

    ZFS (Solaris/Sun filesystem) solves this problem by using end-to-end checksums. However, it is available on only a few platforms.
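    The end-to-end checksum idea can be sketched in a few lines. To be clear, this is not how ZFS is implemented; it is a minimal illustration that assumes a dict standing in for the disk, SHA-256 as the checksum, and made-up helper names (`write_block`, `read_block`, `flip_bit`). The point is only that a checksum computed by the writer and verified by the reader catches a flipped bit that the layers in between never noticed.

```python
import hashlib

def write_block(store, key, data):
    """Store the data together with a digest computed at write time."""
    store[key] = (bytes(data), hashlib.sha256(data).digest())

def read_block(store, key):
    """Return the data, or raise if it no longer matches its stored digest."""
    data, digest = store[key]
    if hashlib.sha256(data).digest() != digest:
        raise IOError(f"checksum mismatch on block {key!r}")
    return data

def flip_bit(store, key, bit_index):
    """Simulate bit rot: flip one bit of the data, leave the digest alone."""
    data, digest = store[key]
    rotted = bytearray(data)
    rotted[bit_index // 8] ^= 1 << (bit_index % 8)
    store[key] = (bytes(rotted), digest)

store = {}  # hypothetical stand-in for the disk
write_block(store, "a", b"important data")
clean_read = read_block(store, "a")   # passes verification

flip_bit(store, "a", 3)               # silent corruption on the "disk"
try:
    read_block(store, "a")
    detected = False
except IOError:
    detected = True                   # the reader notices the rot
```

    Detection alone does not repair anything; fixing the block still needs a second good copy or parity to rebuild from.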

    --
    - mritunjai
    1. Re:Not "the" but "a lesser known" by egoproxy · · Score: 2, Insightful

      Information provided by some hardware vendors (3Ware for example) says RAID-6 protects against data loss potentially caused by data rot. The reason given is that RAID-6 has a second parity set.

      I guess the likelihood of an undetected media failure when you have 2 sets of parity must be very low.

      For those on RAID-5: remember to run periodic Verify processes and make frequent backups!
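      Why a single parity set struggles with silent corruption can be shown with a toy stripe. This is a sketch, not any vendor's implementation: it assumes simple XOR parity over three tiny data blocks, and `xor_blocks` and `candidate_repairs` are made-up helper names. A verify pass can see that the stripe is inconsistent, but parity alone cannot say which block went bad.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def candidate_repairs(blocks, parity):
    """For each position, the value that block would need for parity to match."""
    repairs = []
    for i in range(len(blocks)):
        others = blocks[:i] + blocks[i + 1:]
        repairs.append(xor_blocks(others + [parity]))
    return repairs

data = [b"AAAA", b"BBBB", b"CCCC"]    # hypothetical stripe on three disks
parity = xor_blocks(data)             # stored on a fourth disk

# Case 1: disk 1 is known to be dead, so its block can be rebuilt.
rebuilt = xor_blocks([data[0], data[2], parity])

# Case 2: disk 1 silently returns wrong bytes and reports no error.
corrupted = [data[0], b"BBBX", data[2]]
verify_failed = xor_blocks(corrupted) != parity   # a verify pass notices

# Every position has a "repair" that makes the stripe consistent again,
# so XOR parity by itself cannot identify the block that actually rotted.
repairs = candidate_repairs(corrupted, parity)
```

      In case 1 the drive told us which block was missing, which is why reconstruction works. In case 2 nothing did, which is where extra information such as a second parity set or a per-block checksum comes in.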

    2. Re:Not "the" but "a lesser known" by Brane2 · · Score: 3, Insightful

      I don't see how such an error would get around ECC and checksums on each sector that the drive verifies and updates by itself.

      Once a few bits in a sector flipped, that sector would be invalid...

    3. Re:Not "the" but "a lesser known" by DaleGlass · · Score: 2, Interesting

      Most filesystems don't; what you see is an error from the drive which propagates up the chain until the OS gives you an error message.

      The hard disk has some redundant info for the sector and by using ECC can determine whether the sector is good. If it didn't read well, then it'll mark it as a "pending sector" (you can see this in SMART), and try to read it until it works or the sector is overwritten. Once it gets the correct data, it'll remap it to a spare area. That part is something the OS usually doesn't notice.

      Now if that fails, the drive has no choice but to return an error to the OS, which ends up giving you an error message.

      FAT is far too simple for anything as fancy as its own ECC checks, by the way. At most it can detect obvious corruption in its structures, such as a file that according to the FAT is located after the end of the disk, but it won't notice corruption in files at all, unless the problem is that the drive fails to read a sector. But in that case it's the drive which detects it, and FAT would let it slip through if the drive didn't detect it.

  27. Re:As usual the slashdot summary is wrong by martin_henry · · Score: 2, Funny

    Your memory is faulty.
    good case of the magnetic wobbles...
    --
    www.purevolume.com/martyd
  28. The real reason by auroran · · Score: 3, Interesting

    We all know the real reason here. It's all those perpendicular bits on the dance floor getting drunk and falling down.

    They were all fairly calm when this footage was shot but the wildness ensued soon after.
    http://www.hitachigst.com/hdd/research/recording_head/pr/PerpendicularAnimation.html

  29. Re:How timely... by Anonymous Coward · · Score: 5, Interesting

    I don't see whether this will improve Windows, though. As you mention, Windows itself tends to eat files just for the fun of it. I've lost more drives, in addition to sporadic data, to Windows than ALL the hard drives that have died on me on all my machines in the past 10 years.

    My favorite was using Windows Update "Hardware, Optional." I had a Western Digital PCI card because my motherboard BIOS didn't support large drives (>137gb or whatever) and that was the only way to do it (nowadays, clipped drives are actually read properly). Anyways, the card worked fine, I accessed the files regularly; 4 200gb drives hooked up to that card. On checking for security fixes one day as I regularly do since I was running IE6 and XP, I noticed the (0) had changed to a (1). Saw there was a driver update. Hmmm....

    Yes, I was suspicious. Yes, I know if it ain't broke, don't fix it, i.e. don't update your BIOS if everything works fine sort of philosophy. But it was OFFICIAL man. You also have to remember, this is after MS giving all that PR about WHQL or official approved drivers and software. And this was being pushed on MS's own site as an approved update. It was like Microsoft was saying, "Just do it. Your machine'll run better." It was, after all, a cleared driver coming from the main company itself. I even hated using Windows (although not as much as I hate it nowadays) and read /. and agree with the anti-MS sentiment and used to be a Mac user.

    I installed the driver. It required a reboot. I rebooted. And XP promptly went about "fixing" allocation errors etc. on all the drives...Drive Check or whatever it's now called on startup popped up to fix "corrupt" files and "allocation errors." Hmmm...I was suspicious again, was going to pull the power plug (4 drives after all, going through each one after the other), then decided, "Nah, approved update."

    I never felt stupider in front of a computer. Take the shock of losing hardware or data, and multiply by 100. I was, quite literally, ashamed, and on the edge of just giving up on computers entirely despite using them for over a decade. The update for some reason made the drives unreadable from their then current state, so drive check was set on them, which FARKED the master tables totally. The data itself is there, but without the tables, nothing corresponds. I still have the drives in the corner--partial files, file name mismatches, it's horrific. The filenames no longer correspond with the correct files, i.e. file1 now points to part of data from file3 which was 4gb but now 1.3gb.

    Shame turned to sheer and complete smoldering anger. The result? It accelerated me setting up a big NAS setup by over a year. I will not upgrade to Vista. I will not buy another XP box or MS upgrade or MS software at all. I now use Ubuntu or OpenBSD on all my new machines. I am migrating my old Win98 machines to Linux boxes. I will have a few XP machines for like web viewing and crap and since I just haven't really gotten around to figuring out what I want to do with them, but I dread the data on them such that I now backup even non-critical files, because the hassle of simply just redownloading or restoring them or reinstalling or recovering or re-encoding a large CD collection or the sheer inconvenience of it all just outweighs the cost of getting 2 drives instead of 1. (I backed up critical stuff regularly before this experience.) And as for business machines, where I usually had 1 or 2 in the set running XP simply because I felt it needed to be there, XP is now strictly out. I'd rather buy 2 500gb and mirror data periodically than spend 1 penny on Windows or MS software (and I haven't bought their hardware either despite liking MS keyboards and webcams...I half think that the keyboard is going to explode or the webcam suddenly going to have a stepper motor or something hidden in it that's going to switch on and follow me into the shower or something--I'm that paranoid, half-assed jokingly cynical about any MS product).

  30. dab oot ton by Anonymous Coward · · Score: 2, Funny

    degnahc ytiralop retfa

  31. Re:Buy lots of ram by LoztInSpace · · Score: 2, Funny

    I think you mean 640K. (Sorry if I missed a sarcasm tag there).

  32. Re:Over filling a HD by seibed · · Score: 2, Interesting

    there might, in theory, be a method to that madness. Though it would be difficult to prove.

    two things happen as a drive gets full:
    more seeks all over the surface of the drive may exaggerate wear in the bearings of the actuator, increase the likelihood of particle generation (through increased air cavitation) or the chances of the head running into one of those loosened particles or already stressed zones. (there are more seeks because as a drive fills, there is more and more fragmentation)
    The other thing that may be related would be the drivemaker playing fast and loose with their tolerances near the OD or ID. Both areas have their own unique dangers for the flying head, and both are outside the boundaries of optimal airflow (since air moves faster relative to the head at the OD). Naturally, with the exception of the fragmented files already discussed, as the drive fills up, it is forced to utilize the non-optimal areas (which will vary depending on intended usage of the drive) and therefore *may* be subject to increased error rates.

    But on the whole, as a "cause of failure", a drive filling up is pretty low. Just spinning it up for the first couple of times probably has a higher likelihood of failure, as would any number of other potential problems.

  33. bearings overheating by phatvw · · Score: 3, Informative

    Agreed. I'd bet that the mechanical components, specifically the ball-bearings in the drive motor, are more likely to overheat and fail. In addition power-regulation/power-supply components such as large power transistors and resistors on the logic board are likely to fail.

    After 5 years of solid running, a lot of hard drives begin to sound different. Guess what, that's the bearings wearing out... More interesting stuff http://storagemojo.com/?p=378

  34. Reversing the polarity by Catastrophator · · Score: 2, Funny

    I thought reversing the polarity solved problems, not caused them. Guess it only works on isolinear storage...

  35. BOFH by JudeanPeople'sFront · · Score: 2, Funny

    Magnetic wobbles? I thought it was static electricity from nylon underwear :)

  36. I don't get it - magnetic wobbling? by suv4x4 · · Score: 2, Funny

    But I'm sure it must be free energy in there somewhere! Man, I'm gonna start a company based on this.

    - Sean McCarthy, Steorn CEO

  37. Re:It "effects" it's neighbors... by ThomsonsPier · · Score: 2, Informative

    No, it isn't, nor has it ever been.

  38. Re:scapegoats by n1ckml007 · · Score: 2, Informative

    You can count on /. for polarizing articles like these.

  39. Fluid bearings by zerofoo · · Score: 2, Informative

    Most drive manufacturers have gone to fluid bearings. These bearings don't have mechanical contact; the hydroplaning action of the fluid means the bearing parts never touch.

    I haven't had a fluid bearing drive fail yet due to bearing failure.

    -ted

  40. Re:How timely... by dm0527 · · Score: 2, Insightful

    Aside from initial installation, never, NEVER let Windows Update do ANYTHING with your hardware. It's pure evil. I have NEVER had a good experience letting windows update do anything by itself, but I flat out refuse to let it update drivers. Reasoning is exactly the same problem you had - I had it trash the drivers for a hard drive running off a card meant to let the OS see all of a large drive. Since then, never. If you're running a M$ OS, do yourself a favor: get the machine to a complete installation state (updates, drivers installed, basics, etc.) and then make an image of the drive using Drive Image or something like that. Then use the box and NEVER keep your data on the same hard drive. Then you can wipe the drive and re-image it anytime you want without worrying about your data.

    --
    - dm - The two most common elements in the universe are Hydrogen and stupidity.