Your Hard Drive Lies to You

In other news.... by ToraUma · 2005-05-12 19:31 · Score: 5, Funny

96% of Livejournal users replied, "What's a hard drive? Is that like a modem?"

Re:Err... "lying" is the default setting. RTFM. by ewhac · 2005-05-12 19:35 · Score: 5, Informative

Yes, except there is a 'sync' command packet that is supposed to make the drive commit outstanding buffers to the platters, and not signal completion until those writes are done. It would appear, at first blush, that the drives are mis-handling this command when write-caching is enabled.

There is historical precedent for this. There were recorded incidents of drives corrupting themselves when the OS, during shutdown, tried to flush buffers to the disk just before killing power. The drive said, "I'm done," when it really wasn't, and the OS said Okay, and killed power. This was relatively common on systems with older, slower disks that had been retrofitted with faster CPUs.

However, once these incidents started occurring, the issue was supposed to have been fixed. Clearly, closer study is needed here to discover what's really going on.

Schwab

--
Editor, A1-AAA AmeriCaptions

More information by Halo1 · 2005-05-12 19:39 · Score: 5, Interesting

There was an interesting discussion on this topic a while ago on Apple's Darwin development list.

--
Donate free food here

Author lied when implied that DRIVES are the issue by Anonymous Coward · 2005-05-12 19:42 · Score: 5, Informative

The author lied when he implied that DRIVES are the issue.

ATA-IDE, SCSI, and S-ATA drives from all major manufacturers will accept commands to flush the write buffer including track cache buffer completely.

These commands are critical before cutting power and "sleeping" in machines that can perform a complete "deep sleep" (no power at all whatsoever sent to the ATA-IDE drive).

Such OSes include Apple's OS 9 on a G4 tower, and some versions of OSX on machines not supplied with certain naughty video cards.

Laptops, for example need to flush drives... AND THEY do.

All drives conform.

As for DRIVER AUTHORS not heeding the special calls sent to them.... he is correct.

Many driver writers (other than me) are loser shits that do not follow standards.

As for LSI raid cards, he is right, and other raid cards... that is because the products are defective. But the drives are not and the drivers COULD be written to honor a true flush.

As for his "discovery" of sync not working.... DUH!!!!!

the REAL sync is usually a privileged operation, sent from the OS, and not highly documented.

For example on a Mac the REAL sync in OS9 is a jhook trap and not the documented normal OS call which has a governor on it.

Mainframes such as PRIMOS and other old mainframes including even unix typically faked the sync command and ONLY allowed it if the user was at the actual physical systems console and furthermore logged in as a root or backup operator.

This cheating always sickened me. But all OSes do this because so many people that think they know what they are doing try to sync all the time for idiotic self-rolled journalling file systems and journalled databases.

But DRIVES, except a couple S-ATA seagates from 2004 with bad firmware, ALWAYS will flush.

This author should have explained that it's not the hard drives.

They perform as documented.

Admittedly Linux used to corrupt and not flush several years ago... but it was not the IDE drives. They never got the commands.

It's all a mess... but setting a DRIVE to not cache is NOT the solution! It's retarded to do so, and all the comments in this thread talking of setting the cache off are foolish.

As for caching device topics, there are many options.

1> SCSI WCE permanent option

2> ATA Seagate Set Features command 82h Disable write cache

3> ATA config commands sent over SCSI (RAID card) device using a SCSI CDB in passthrough. It uses 16 byte CDB with 8h, or 12 byte CDB with Ah for sending the tunneled command.

4> ATA ATAPI commands for WCE bit, as if it was SCSI

Fibre Channel drives of course honor SCSI commands.

As for mere flushing, a variety of low level calls all have the same desired effect and are documented in respective standards manuals.

Sadly unpredictable by grahamsz · 2005-05-12 19:46 · Score: 5, Interesting

i know all disks ultimately fail, but it's frustrating that some can be really abused and run for years, when others die abruptly.

While working at said hard disk company i had one of their smaller disks sitting on the end of a steel ruler on my desk. I spun round on my chair, as i do when i'm thinking, and hit the other end of the ruler with my elbow. This of course launched the disk across the room, slamming it against the wall.

Given that I was in the process of writing software to diagnose failures, I was quite excited about this accident. Of course i return the disk to the test setup and there's nothing wrong.

In my experience, the only sure fire way to have a disk fail is to place any piece of important, but un-backed-up, work on it.

Re:Why do we need it? by Erik+Hensema · 2005-05-12 19:50 · Score: 5, Insightful

We need it because of journalling filesystems. A JFS needs to be sure the journal has been flushed out to disk (and resides safely on the platters) before continuing to write the actual (meta)data. Afterwards, it needs to be sure the (meta)data is written properly to disk in order to start writing the journal again.

When both the journal and the data are in the write cache of the drive, the data on the platters is in an undefined state. Loss of power means filesystem corruption -- just the thing a JFS is supposed to avoid.
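The ordering described above can be sketched in a few lines. This is only an illustration, not how any real journalling filesystem is implemented: the function names, the file names, and the BEGIN/COMMIT record format are all made up for the example, and os.fsync() here stands in for the "make sure it is on the platters" step, which (as the rest of this thread points out) it may not actually guarantee when the drive's write cache is on.

```python
import os


def durable_write(fd, payload):
    """Write payload, then ask the OS to flush it before we continue."""
    os.write(fd, payload)
    # Flushes kernel buffers for fd; the drive's own cache may still hold the data.
    os.fsync(fd)


def append_durably(path, payload):
    """Append payload to path and flush before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        durable_write(fd, payload)
    finally:
        os.close(fd)


def journaled_update(journal_path, data_path, record):
    """Hypothetical three-step update: journal, then data, then commit."""
    # Step 1: the journal entry must be flushed before the data is touched.
    append_durably(journal_path, b"BEGIN " + record + b"\n")
    # Step 2: the data must be flushed before the journal is written again.
    append_durably(data_path, record + b"\n")
    # Step 3: only now is it safe to mark the journal entry as complete.
    append_durably(journal_path, b"COMMIT " + record + b"\n")
```

If the drive acknowledges step 1 while the journal entry is still sitting in its cache, steps 2 and 3 can reach the platters first, and the ordering the filesystem relies on is gone.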

Also, switching off the machine the regular way is a hazard. As an OS you simply don't know when you can safely signal the PSU to switch itself off.

--

This is your sig. There are thousands more, but this one is yours.

Re:An acceptable alternative. by Sparr0 · 2005-05-12 19:54 · Score: 5, Insightful

You have no grasp of what 'kilo', 'mega', and 'giga' mean. They have meant the same thing for 45 years, computers did not change that. There is a standard for binary powers, you simply refuse to use it.

Re:What's this? by thsths · 2005-05-12 20:10 · Score: 5, Informative

> 1,000,000,000 bytes != 1 Gigabyte

Actually, it is. The standard was updated in 1998 to avoid confusion (Standard IEC 60027-2). Giga is 10^9, and it is constant, which means it does not change just because you use it for hard disks or memory.

If you mean 2^30, then you have to say gigabinary, abbreviated as gibi or Gi. Having different names for different things can avoid an awful lot of confusion, so I would very much recommend using them.
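To put numbers on the difference, here is a quick sketch of the arithmetic. The constant and function names are just for illustration, and the 500 GB drive is an arbitrary example size.

```python
GIGA = 10**9  # SI prefix giga (G)
GIBI = 2**30  # IEC binary prefix gibi (Gi)


def gb_to_gib(gigabytes):
    """Convert a size in gigabytes (10^9 bytes) to gibibytes (2^30 bytes)."""
    return gigabytes * GIGA / GIBI


print(GIBI - GIGA)               # 73741824 bytes of difference per unit
print(round(gb_to_gib(500), 2))  # 465.66
```

So a drive sold as 500 GB shows up as roughly 465.66 GiB, which is the gap people keep arguing about.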

And now please put the following events into the correct order: America goes metric, hell freezes over, people use Gibi correctly.

Re:Hardly a new thing... by Clay+Pigeon+-TPF-VS- · 2005-05-12 20:17 · Score: 5, Funny

You must be new here. Computers always do what you tell them to do in the command line. What, you're using a gui? Well that's your fault then.

--
Viral software licensing is not freedom, it is in fact GNU/Socialism.

Re:Hardly a new thing... by pyrrhonist · 2005-05-12 20:28 · Score: 5, Funny

Computers always do what you tell them to do in the command line.

They sure do.

$ rm -rf * .o
$ ls -a
.  ..
$

FUCK!!!!!

--
Show me on the doll where his noodly appendage touched you.

Re:Err... "lying" is the default setting. RTFM. by Everleet · 2005-05-12 21:13 · Score: 5, Informative

fsync() is pretty clearly documented to cause a flush of the kernel buffers, not the disk buffers. This shouldn't come as a surprise to anyone.

From Mac OS X --

DESCRIPTION
Fsync() causes all modified data and attributes of fd to be moved to a permanent storage device. This normally results in all in-core modified copies of buffers for the associated file to be written to a disk.

Note that while fsync() will flush all data from the host to the drive (i.e. the "permanent storage device"), the drive itself may not physically write the data to the platters for quite some time and it may be written in an out-of-order sequence.

Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written. The disk drive may also re-order the data so that later writes may be present while earlier writes are not.

This is not a theoretical edge case. This scenario is easily reproduced with real world workloads and drive power failures.

For applications that require tighter guarantees about the integrity of their data, MacOS X provides the F_FULLFSYNC fcntl. The F_FULLFSYNC fcntl asks the drive to flush all buffered data to permanent storage. Applications such as databases that require a strict ordering of writes should use F_FULLFSYNC to ensure their data is written in the order they expect. Please see fcntl(2) for more detail.
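For anyone who wants to try the F_FULLFSYNC route, here is a minimal sketch in Python. The full_sync() helper is a made-up name. It assumes a Unix-like system (the fcntl module does not exist on Windows) and assumes Python only exposes fcntl.F_FULLFSYNC where the platform defines it, so everywhere else the helper falls back to a plain fsync().

```python
import fcntl
import os


def full_sync(fd):
    """Flush fd as far as the platform lets us ask; report which call was used."""
    full = getattr(fcntl, "F_FULLFSYNC", None)
    if full is not None:
        try:
            # Asks the drive itself to flush its buffers to permanent storage.
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            # The request was rejected for this descriptor; fall back below.
            pass
    # Plain fsync: host-to-drive flush only, the drive cache may still hold data.
    os.fsync(fd)
    return "fsync"
```

Call it right after the write that has to survive a power cut, e.g. full_sync(f.fileno()) after f.flush(). The return value only tells you which call was made, not whether the drive told the truth.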

From Linux --

NOTES
In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.

From FreeBSD's tuning(7) --

IDE WRITE CACHING
FreeBSD 4.3 flirted with turning off IDE write caching. This reduced write bandwidth to IDE disks but was considered necessary due to serious data consistency issues introduced by hard drive vendors. Basically the problem is that IDE drives lie about when a write completes. With IDE write caching turned on, IDE hard drives will not only write data to disk out of order, they will sometimes delay some of the blocks indefinitely under heavy disk load. A crash or power failure can result in serious file system corruption. So our default was changed to be safe.

Unfortunately, the result was such a huge loss in performance that we caved in and changed the default back to on after the release. You should check the default on your system by observing the hw.ata.wc sysctl variable. If IDE write caching is turned off, you can turn it back on by setting the hw.ata.wc loader tunable to 1. More information on tuning the ATA driver system may be found in the ata(4) man page.

There is a new experimental feature for IDE hard drives called hw.ata.tags (you also set this in the boot loader) which allows write caching to be safely turned on. This brings SCSI tagging features to IDE drives. As of this writing only IBM DPTA and DTLA drives support the feature. Warning! These drives apparently have quality control problems and I do not recommend purchasing them at this time. If you need performance, go with SCSI.

--
It's tragic. Laugh.

Re:Hardly a new thing... by pyropunk51 · 2005-05-12 21:32 · Score: 5, Funny

I really hate this damned machine!
I wish that I could sell it.
It never does quite what I want,
But only what I tell it!

--
double penetration; //ouch

Re:Why do we need it? by swmccracken · 2005-05-12 22:50 · Score: 5, Insightful

This writes to the disks write cache but I don't believe it actually issues the sync command to the drive.

Yeah - that's the point of this thing - what's supposed to happen with fsync? From memory, sometimes it will guarantee it's all the way to the platters, sometimes it will not, depending on what storage system you're using, and how easy such a guarantee is to make.

Linus in 2001 discussing this issue - it's not new. That whole thread was about comparing SCSI against IDE drives, and it seemed that the IDE drives were either breaking the laws of physics, or lying, but the SCSI drives were being honest.

From hazy memory, one problem is that without tagged-command-queuing or native-command-queuing, one process issuing a sync will cause the hard drive and related software to wait until it has fully synched for all i/o "in flight"; holding up any other i/o tasks for other processes!

That's why fsync often lies: it's not practical to let people that fsync all the time stall the whole i/o subsystem just to flush their buffers, and apparently some programs were overzealous with calling fsync when they shouldn't.

However, with TCQ, commands that are synched overlap with other commands, so it's not that big a deal (other i/o tasks are not impacted any more than they would be by other, unsynchronised, i/o). (Thus, with TCQ, fsync might go all the way to the platters, but without it it might just go to the IDE bus.) SCSI has had TCQ from day one, which is why a SCSI system is more likely to sync all the way than IDE.

If I'm wrong, somebody correct me please.

Brad's program certainly points out an issue - it should be possible for a database engine to write to disk and guarantee that it gets written; perhaps fsync() isn't good enough - be this fault in the drives, the IDE spec, IDE drivers or the OS.
