Your Hard Drive Lies to You
fenderdb writes "Brad Fitzgerald of LiveJournal fame has written a utility and a quick article on how all hard drives from the consumer level to the highest level 'enterprise' grade SCSI and SATA drives do not obey the fsync() function. Manufacturers are blatantly sacrificing integrity in favor of scoring higher on 'pure speed' performance benchmarking."
96% of Livejournal users replied, "What's a hard drive? Is that like a modem?"
There is historical precedent for this. There were recorded incidents of drives corrupting themselves when the OS, during shutdown, tried to flush buffers to the disk just before killing power. The drive said, "I'm done," when it really wasn't, and the OS said Okay, and killed power. This was relatively common on systems with older, slower disks that had been retrofitted with faster CPUs.
However, once these incidents started occurring, the issue was supposed to have been fixed. Clearly, closer study is needed here to discover what's really going on.
Schwab
Editor, A1-AAA AmeriCaptions
There was an interesting discussion on this topic a while ago on Apple's Darwin development list.
The author lied when he implied that DRIVES are the issue.
ATA-IDE, SCSI, and S-ATA drives from all major manufacturers will accept commands to flush the write buffer including track cache buffer completely.
These commands are critical before cutting power and "sleeping" in machines that can perform a complete "deep sleep" (no power at all whatsoever sent to the ATA-IDE drive).
Such OSes include Apple's OS 9 on a G4 tower, and some versions of OSX on machines not supplied with certain naughty video cards.
Laptops, for example, need to flush drives... AND THEY do.
All drives conform.
As for DRIVER AUTHORS not heeding the special calls sent to them.... he is correct.
Many driver writers (other than me) are loser shits that do not follow standards.
As for LSI raid cards, he is right, and other raid cards... that is because the products are defective. But the drives are not and the drivers COULD be written to honor a true flush.
As for his "discovery" of sync not working.... DUH!!!!!
The REAL sync is usually a privileged operation, sent from the OS, and not highly documented.
For example on a Mac the REAL sync in OS9 is a jhook trap and not the documented normal OS call which has a governor on it.
Mainframes such as PRIMOS and other old mainframes including even unix typically faked the sync command and ONLY allowed it if the user was at the actual physical systems console and furthermore logged in as a root or backup operator.
This cheating always sickened me, but all OSes do this because so many people that think they know what they are doing try to sync all the time for idiotic self-rolled journalling file systems and journalled databases.
But DRIVES, except a couple S-ATA seagates from 2004 with bad firmware, ALWAYS will flush.
This author should have explained that it's not the hard drives.
They perform as documented.
Admittedly Linux used to corrupt and not flush several years ago... but it was not the IDE drives. They never got the commands.
It's all a mess... but setting a DRIVE to not cache is NOT the solution! It's retarded to do so, and all the comments in this thread talking of setting the cache off are foolish.
As for caching device topics, there are many options.
1> SCSI WCE permanent option
2> ATA Seagate Set Features command 82h Disable write cache
3> ATA config commands sent over SCSI (RAID card) device using a SCSI CDB in passthrough. It uses 16 byte CDB with 8h, or 12 byte CDB with Ah for sending the tunneled command.
4> ATA ATAPI commands for WCE bit, as if it was SCSI
Fibre Channel drives of course honor SCSI commands.
As for mere flushing, a variety of low level calls all have the same desired effect and are documented in respective standards manuals.
I know all disks ultimately fail, but it's frustrating that some can be really abused and run for years, when others die abruptly.
While working at said hard disk company I had one of their smaller disks sitting on the end of a steel ruler on my desk. I spun round on my chair, as I do when I'm thinking, and hit the other end of the ruler with my elbow. This of course launched the disk across the room, slamming it against the wall.
Given that I was in the process of writing software to diagnose failures, I was quite excited about this accident. Of course I returned the disk to the test setup and there was nothing wrong.
In my experience, the only sure fire way to have a disk fail is to place any piece of important, but un-backed-up, work on it.
We need it because of journalling filesystems. A JFS needs to be sure the journal has been flushed out to disk (and resides safely on the platters) before continuing to write the actual (meta)data. Afterwards, it needs to be sure the (meta)data is written properly to disk in order to start writing the journal again.
When both the journal and the data are in the write cache of the drive, the data on the platters is in an undefined state. Loss of power means filesystem corruption -- just the thing a JFS is supposed to avoid.
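The ordering described above can be sketched in a few lines. This is only an illustration of the flush points, not how any real filesystem implements its journal: the function names, the record format and the single-entry journal are all made up for the example, and os.fsync() only helps if every layer below it, the drive included, actually honors the flush.

```python
import os

def durable_write(fd, data):
    """Write data, then ask the OS to flush it. Whether the drive obeys is the open question."""
    os.write(fd, data)
    os.fsync(fd)

def journalled_update(journal_path, data_path, offset, payload):
    """Hypothetical single-entry journal: log the intent, apply it, then retire the log."""
    # 1. Record the intended change in the journal and flush it
    #    BEFORE touching the real data.
    jfd = os.open(journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        record = b"%d:%d:" % (offset, len(payload)) + payload + b"\n"
        durable_write(jfd, record)
    finally:
        os.close(jfd)

    # 2. Apply the change to the data file and flush that too.
    dfd = os.open(data_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.lseek(dfd, offset, os.SEEK_SET)
        durable_write(dfd, payload)
    finally:
        os.close(dfd)

    # 3. Only now retire the journal entry, so the journal can be reused.
    jfd = os.open(journal_path, os.O_WRONLY | os.O_TRUNC)
    try:
        os.fsync(jfd)
    finally:
        os.close(jfd)
```

If the drive acknowledges the flush in step 1 while the record is still sitting in its write cache, steps 1 and 2 can reach the platters in either order, which is exactly the undefined state described above.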
Also, switching off the machine the regular way is a hazard. As an OS you simply don't know when you can safely signal the PSU to switch itself off.
You have no grasp of what 'kilo', 'mega', and 'giga' mean. They have meant the same thing for 45 years, computers did not change that. There is a standard for binary powers, you simply refuse to use it.
> 1,000,000,000 bytes != 1 Gigabyte
Actually, it is. The standard was updated in 1998 to avoid confusion (Standard IEC 60027-2). Giga is 10^9, and it is constant, which means it does not change just because you use it for hard disks or memory.
If you mean 2^30, then you have to say gigabinary, abbreviated as gibi or Gi. Having different names for different things can avoid an awful lot of confusion, so I would very much recommend using them.
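To put numbers on the difference, here is a quick sketch. The constant names and the 100 GB drive are just illustrative choices for the example.

```python
GB = 10**9    # gigabyte: SI prefix giga, always 10^9
GiB = 2**30   # gibibyte: IEC binary prefix gibi

def advertised_to_gib(advertised_gb):
    """Convert a capacity quoted in decimal gigabytes to gibibytes."""
    return advertised_gb * GB / GiB

print(GiB - GB)                          # 73741824 bytes of difference per unit
print(round(advertised_to_gib(100), 2))  # 93.13
```

So a drive sold as 100 GB shows up as roughly 93 GiB, with nothing missing. Only the unit changed.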
And now please put the following events into the correct order: America goes metric, hell freezes over, people use Gibi correctly.
You must be new here. Computers always do what you tell them to do in the command line. What, you're using a gui? Well that's your fault then.
Viral software licensing is not freedom, it is in fact GNU/Socialism.
They sure do.
FUCK!!!!!Show me on the doll where his noodly appendage touched you.
From Mac OS X --
From Linux --
From FreeBSD's tuning(7) --
It's tragic. Laugh.
I really hate this damned machine!
I wish that I could sell it.
It never does quite what I want,
But only what I tell it!
double penetration;
This writes to the disk's write cache but I don't believe it actually issues the sync command to the drive.
Yeah - that's the point of this thing - what's supposed to happen with fsync? From memory, sometimes it will guarantee it's all the way to the platters, sometimes it will not, depending on what storage system you're using, and how easy such a guarantee is to make.
Linus in 2001 discussing this issue - it's not new. That whole thread was about comparing SCSI against IDE drives, and it seemed that the IDE drives were either breaking the laws of physics, or lying, but the SCSI drives were being honest.
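The "laws of physics" argument can be checked roughly like this. It is a sketch, not the test from that thread: the function names are invented, and it assumes a single spinning disk where rewriting the same block cannot truly reach the platter more than about once per rotation.

```python
import os
import time

def rotation_ceiling(rpm):
    """Rough upper bound on same-block rewrites per second if each one hits the platter."""
    return rpm / 60.0

def measure_fsync_rate(path, count=200):
    """Rewrite the same 512-byte block `count` times, calling fsync after every write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    block = b"\0" * 512
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, block)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed  # synced writes per second
```

A 7200 RPM drive gives rotation_ceiling(7200) == 120.0. If measure_fsync_rate() reports thousands per second on such a disk, some layer (OS, driver, controller or drive cache) is acknowledging writes before they are on the platter. The comparison means nothing on SSDs or behind a battery-backed RAID cache.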
From hazy memory, one problem is that without tagged command queuing or native command queuing, one process issuing a sync will cause the hard drive and related software to wait until it has fully synched all i/o "in flight", holding up any other i/o tasks for other processes!
That's why fsync often lies: it's not practical to let people who fsync all the time to flush buffers screw around with the whole i/o subsystem, and apparently some programs were overzealous about calling fsync when they shouldn't have.
However, with TCQ, commands that are synched overlap with other commands, so it's not that big a deal (other i/o tasks are not impacted any more than they would by other, unsynchronised, i/o). (Thus, with TCQ, fsync might go all the way to the platters, but without it it might just go to the IDE bus.) SCSI has had TCQ from day one, which is why a SCSI system is more likely to sync all the way than IDE.
If I'm wrong, somebody correct me please.
Brad's program certainly points out an issue - it should be possible for a database engine to write to disk and guarantee that it gets written; perhaps fsync() isn't good enough - be this fault in the drives, the IDE spec, IDE drivers or the OS.
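For what it's worth, one platform does expose something stronger than plain fsync(): Mac OS X documents an F_FULLFSYNC fcntl that, after the normal fsync, asks the drive to flush its buffered data to permanent storage. A hedged sketch of a helper that uses it where available and falls back otherwise (full_sync is a made-up name, and even this cannot help if the drive firmware ignores the flush):

```python
import os

try:
    import fcntl  # not available on Windows
except ImportError:
    fcntl = None

def full_sync(fd):
    """Flush fd as far down the stack as this platform lets us ask for.

    Returns the name of the mechanism that was used.
    """
    full = getattr(fcntl, "F_FULLFSYNC", None) if fcntl else None
    if full is not None:
        try:
            # Mac OS X: also asks the drive to flush its own cache.
            fcntl.fcntl(fd, full)
            return "F_FULLFSYNC"
        except OSError:
            pass  # filesystem does not support it; fall back to fsync
    os.fsync(fd)
    return "fsync"
```

A database engine could call full_sync(fd) at its commit points instead of os.fsync(fd) and log which mechanism it got, so the operator at least knows how strong the guarantee is.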