Slashdot Mirror


Your Hard Drive Lies to You

fenderdb writes "Brad Fitzpatrick of LiveJournal fame has written a utility and a quick article on how all hard drives, from the consumer level to the highest level 'enterprise' grade SCSI and SATA drives, do not obey the fsync() function. Manufacturers are blatantly sacrificing integrity in favor of scoring higher on 'pure speed' performance benchmarking."
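
For readers who have not used fsync() directly, here is a minimal sketch of the kind of call the summary is talking about. It is not the utility from the article; the helper name durable_write and the file name are made up for illustration.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data, then ask the OS to push it out to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # drain Python's userspace buffer into the OS
        os.fsync(f.fileno())  # ask the OS to flush its cache to the device
    # Whether the bytes are really on the platters at this point depends on
    # the driver and the drive honouring the flush, which is the complaint.


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.dat")  # hypothetical file
    durable_write(path, b"committed\n")
    with open(path, "rb") as f:
        print(f.read())
```

The application can only make the request; it has no direct way to check that the drive did what it acknowledged.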

10 of 512 comments

  1. Why do we need it? by Godman · · Score: 4, Interesting

    If we are just now figuring out that fsyncs don't work, then the question is, why do we care? Have we been using them, and they just haven't been working or something?

    If we've made it this far without it, why do we need it now?

    I'm just curious...

    --
    I have this really funny quote that I like to put here. Unfortunately, there's this really annoying thing called a char
  2. Of course it does! by grahamsz · · Score: 4, Interesting

    Having written some diagnostic tools for a smaller hard disk maker (who I'll refrain from naming), it's amazing to me that disks work at all.

    Most systems can identify and patch out bad sectors so that they aren't used. What surprised me is that the manufacturers have their own bad sector table, so when you get the disk it's fairly likely that there are already bad areas which have been mapped out.

    Secondly, the raw error rate was astoundingly high. It's been quite a few years, but it was somewhere between one error in every 10E5 to 10E6 bits. So it's not unusual to find a mistake in every megabyte read. Of course CRC picks up this error and hides that from you too.

    Granted this was a few years ago, but I wouldn't be surprised if it's as bad (or even worse) now.
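
    A quick back-of-the-envelope check of that claim, as a sketch. It assumes "10E5 to 10E6" is meant as one raw error per 10^5 to 10^6 bits and that a megabyte is 10^6 bytes; the function name is made up for illustration.

```python
BITS_PER_MEGABYTE = 8 * 10**6  # assumption: 1 MB = 10^6 bytes


def expected_errors_per_megabyte(bits_per_error: float) -> float:
    """Expected raw bit errors in one megabyte, given one error per N bits."""
    return BITS_PER_MEGABYTE / bits_per_error


if __name__ == "__main__":
    for bits_per_error in (10**5, 10**6):
        print(bits_per_error, expected_errors_per_megabyte(bits_per_error))
```

    Under those assumptions the figure works out to roughly 8 to 80 raw errors per megabyte, so "a mistake in every megabyte" is, if anything, the conservative reading.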

    1. Re:Of course it does! by pyropunk51 · · Score: 3, Interesting

      As anybody who's ever used (or had to use :-( ) SpinRite will tell you, your HDD not only lies to you, it cheats and steals as well. To wit: It makes it seem there are no bad sectors, when in fact the surface is riddled with them, only the manufacturer hides this fact from you by having a bad sector table. Also, errors are corrected on the fly by some CRC checking. You can ask the SMART for the stats, but you can do very little about the results it gives you, other than maybe buying a new disk (which most likely has a different set of problems - you just don't know what they are). And where have you ever seen a 40GB drive that is exactly 40 billion bytes big? The bottom line is: Reliability is NOT profitable. Where would hardware manufacturers be if we didn't have to buy a new disk every 2 years?

      --
      double penetration; //ouch
  3. More information by Halo1 · · Score: 5, Interesting

    There was an interesting discussion on this topic a while ago on Apple's Darwin development list.

    --
    Donate free food here
  4. Sadly unpredictable by grahamsz · · Score: 5, Interesting

    I know all disks ultimately fail, but it's frustrating that some can be really abused and run for years, when others die abruptly.

    While working at said hard disk company I had one of their smaller disks sitting on the end of a steel ruler on my desk. I spun round on my chair, as I do when I'm thinking, and hit the other end of the ruler with my elbow. This of course launched the disk across the room, slamming it against the wall.

    Given that I was in the process of writing software to diagnose failures, I was quite excited about this accident. Of course, I returned the disk to the test setup and there was nothing wrong.

    In my experience, the only surefire way to have a disk fail is to place any piece of important, but un-backed-up, work on it.

  5. Re:Which ones ? by ewhac · · Score: 4, Interesting
    Can someone explain how OSes could lie?

    Easy. The driver gets a 'sync' command from the OS. However, the driver writer believes that most other programmers call fsync() when they don't really need to, and decides to "optimize" this case. So he passes the command on to the drive, but returns immediately (allowing the drive command to complete asynchronously). This makes his driver appear faster.
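
    To make that concrete, here is a toy simulation of the scenario described above. It is only a sketch: ToyDrive, HonestDriver and OptimizingDriver are invented names, not any real driver interface, and the "drive" is just two Python lists.

```python
class ToyDrive:
    """Hypothetical drive: a volatile cache in front of durable platters."""

    def __init__(self):
        self.cache = []
        self.platters = []

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        self.platters.extend(self.cache)
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()  # anything not yet on the platters is gone


class HonestDriver:
    def __init__(self, drive):
        self.drive = drive
        self.pending_flush = False

    def write(self, block):
        self.drive.write(block)

    def sync(self):
        self.drive.flush()  # returns only after the drive has flushed


class OptimizingDriver(HonestDriver):
    def sync(self):
        self.pending_flush = True  # flush is queued, but we return right away

    def complete_pending(self):
        if self.pending_flush:
            self.drive.flush()
            self.pending_flush = False


def survives_power_cut(driver_cls):
    drive = ToyDrive()
    driver = driver_cls(drive)
    driver.write("commit record")
    driver.sync()       # the caller now believes the data is safe
    drive.power_loss()  # power fails before any queued flush has run
    return "commit record" in drive.platters


if __name__ == "__main__":
    print("honest driver:", survives_power_cut(HonestDriver))
    print("optimizing driver:", survives_power_cut(OptimizingDriver))
```

    The only difference between the two drivers is when sync() returns, and that is exactly the window in which a power cut loses the write.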

    Fortunately, most driver writers have their priorities straight about data integrity, so this kind of thinking isn't very common.

    Schwab

  6. Re:Author lied when implied that DRIVES are the is by Sinner · · Score: 3, Interesting

    Parent either doesn't know what he's talking about, or is a troll. Pity there isn't an "incoherent rant" moderation option, or we could avoid the ambiguity.

    --
    fish and pipes
  7. Examples from the World of Windows. by stereoroid · · Score: 4, Interesting
    Microsoft have had a few problems in this area - see KB281672 for example.

    Then they released Windows 2000 Service Pack 3, which fixed some previous caching bugs, as documented in KB332023. The article tells you how to set up the "Power Protected" Write Cache Option, which is your way of saying "yes, my storage has a UPS or battery-backed cache, give me the performance and let me worry about the data integrity".

    I work for a major storage hardware vendor: to cut a long story short, we knew fsync() (a.k.a. "write-through" or "synchronize cache") was working on our hardware, when the performance started sucking after customers installed W2K SP3, and we had to refer customers to the latter article.

    The same storage systems have battery-backed cache, and every write from cache to disks is made write-through (because drive cache is not battery-backed). In other words, in these and other Enterprise-class systems, the burden of honouring fsync() / write-through commands from the OS has switched to the storage controller(s); the drives might as well have no cache for all we care. But it still matters that the drives do honour the fsync() we send to them from cache, and not signal "clear" when they're not - if they lie, the cache drops that data, and no battery will get it back..!
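
    A toy model of that failure mode, as a sketch only. ToyController and ToyDisk are invented for illustration and do not correspond to any vendor's firmware; the point is simply that the controller drops its battery-backed copy as soon as the disk signals "clear".

```python
class ToyDisk:
    def __init__(self, honest):
        self.honest = honest
        self.volatile_cache = {}
        self.platters = {}

    def write_through(self, lba, data):
        """Return True to signal 'clear' (data supposedly on the platters)."""
        if self.honest:
            self.platters[lba] = data
        else:
            self.volatile_cache[lba] = data  # acknowledged, but only in RAM
        return True

    def power_loss(self):
        self.volatile_cache.clear()


class ToyController:
    def __init__(self, disk):
        self.disk = disk
        self.battery_backed_cache = {}  # survives a power loss

    def write(self, lba, data):
        self.battery_backed_cache[lba] = data

    def destage(self):
        for lba in list(self.battery_backed_cache):
            if self.disk.write_through(lba, self.battery_backed_cache[lba]):
                del self.battery_backed_cache[lba]  # drop only after "clear"

    def power_loss_and_recover(self):
        self.disk.power_loss()
        self.destage()  # replay whatever the battery preserved


def data_survives(honest_disk):
    disk = ToyDisk(honest_disk)
    controller = ToyController(disk)
    controller.write(7, "payload")
    controller.destage()
    controller.power_loss_and_recover()
    return disk.platters.get(7) == "payload"


if __name__ == "__main__":
    print("honest disk:", data_survives(True))
    print("lying disk:", data_survives(False))
```

    With the lying disk there is nothing left to replay after the power comes back, because the controller trusted the acknowledgement and already dropped its copy.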

    --
    (this is not a .sig)
  8. Re:Err... "lying" is the default setting. RTFM. by pv2b · · Score: 3, Interesting

    Right. And the author is implementing a program that sends raw commands to ATA drives... in Perl. Right. He does no such thing, at least not that I can see, by glancing at the source code of the Perl script. Granted, I'm not fluent in Perl, but it doesn't seem to do anything other than an fsync() equivalent. Please do correct me if I'm wrong.

    The truth is that he doesn't know wtf he's talking about. I'll cut him some slack though, because the FreeBSD 4 man pages at least are very misleading, and I don't know what man pages he did read.

    By the way, I sent him an e-mail. It's available on my web space. I'm not posting it in full here, because it's a little long and it would be redundant, since a lot of the surrounding posts discuss pretty much the same thing as I said.

  9. Put a capacitor on the hard drive by kublikhan · · Score: 3, Interesting

    Couldn't they just stick a large capacitor or small battery on the hard drive that is only used for flushing the write cache to the platters in the event of a power failure? It should be a simple enough matter: we only need a few seconds here, and it would solve this whole mess.