Your Hard Drive Lies to You
fenderdb writes "Brad Fitzgerald of LiveJournal fame has written a utility and a quick article on how all hard drives from the consumer level to the highest level 'enterprise' grade SCSI and SATA drives do not obey the fsync() function. Manufacturers are blatantly sacrificing integrity in favor of scoring higher on 'pure speed' performance benchmarking."
Since when do computers do what you mean?
|>>?
Write caching is enabled by default on most IDE/ATA drives. Most SCSI drives don't enable it. If you don't like it, turn it off. There's no "lying", and I'm sure the fsync() function doesn't know diddly squat about the cache of your disk. Maybe the ATA/device abstraction layer does, and I'm sure there's a configurable registry/sysctl/frob you can twiddle to make it DTRT (like FreeBSD has).
Move along, nothing to see...
Hard drive manufacturers screwing over customers? Why, who would have thought?
1 billion bytes equals 1 gigabyte - since when?
Dropped MTBF right after reducing the standard warranty from 3 years to 1 year - good timing.
Now this?
Wow what a track record of consumer loving...
So, do you think someone typed "Nuclear weapons are being developed by the government of Iraq.^H^Hn." just before the power went out?
If we are just now figuring out that fsync's don't work, then the question is, why do we care? Have we been using them, and they just haven't been working or something?
If we've made it this far without it, why do we need it now?
I'm just curious...
I have this really funny quote that I like to put here. Unfortunately, there's this really annoying thing called a char
Having written some diagnostic tools for a smaller hard disk maker (which I'll refrain from naming), I find it amazing that disks work at all.
Most systems can identify and patch out bad sectors so that they aren't used. What surprised me is that the manufacturers have their own bad sector table, so when you get the disk it's fairly likely that there are already bad areas which have been mapped out.
Second, the raw error rate was astoundingly high. It's been quite a few years, but it was somewhere between one error in every 10^5 and one in every 10^6 bits. So it's not unusual to find a mistake in every megabyte read. Of course, CRC picks up these errors and hides them from you too.
Granted, this was a few years ago, but I wouldn't be surprised if it's as bad (or even worse) now.
Hitachi's working on something for that right now.
Corporate Integrity, not data integrity. I've read through the article and don't see how you can lose data integrity unless you disable all caching, from the OS to the disk itself. In this day and age, nobody does that. Sure, something's broken. But I fail to see how it's very useful these days anyway. Maybe someone with a better grasp of why you would need fsync could help out?
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
96% of Livejournal users replied, "What's a hard drive? Is that like a modem?"
... "Swear to you there's no pr0n there !!"
Mcow.
Can someone explain how OSes could lie? I mean, is it because of a buggy implementation or is it intentional? "Lie" would imply intent, and I personally don't see any reason why an OS should lie...
... and I shall strike upon thee with great vengeance, furious anger and a slightly positive karma.
There is historical precedent for this. There were recorded incidents of drives corrupting themselves when the OS, during shutdown, tried to flush buffers to the disk just before killing power. The drive said, "I'm done," when it really wasn't, and the OS said Okay, and killed power. This was relatively common on systems with older, slower disks that had been retrofitted with faster CPUs.
However, once these incidents started occurring, the issue was supposed to have been fixed. Clearly, closer study is needed here to discover what's really going on.
Schwab
Editor, A1-AAA AmeriCaptions
I can write p0rn faster to my disk, but I can't save my p0rn. Ack!
$ sync
$ sync
$ sync
Yeah. In the days when the biggest hard drive you could get was 2 gigs, you would get 147,483,648 bytes less storage than advertised, unless you read the fine print located somewhere. This is only about 140 megs less than advertised. Today, when you can get 200 gig hard drives, the difference is much larger: 14,748,364,800 bytes less storage than advertised. This means that now, you get almost FOURTEEN GIGABYTES less storage than advertised. That's bigger than any hard drive that existed in 1995. That is a big deal.
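The arithmetic above is easy to check. A minimal Python sketch, assuming "GB" on the label means 10^9 bytes while the OS reports binary gigabytes of 2^30 bytes (the names GB, GIB and shortfall_bytes are just for illustration):

```python
# Decimal gigabyte (drive label) vs binary gigabyte (what OSes of the era reported).
GB = 10**9
GIB = 2**30


def shortfall_bytes(advertised_gb: int) -> int:
    """Bytes "missing" if you expected advertised_gb binary gigabytes."""
    return advertised_gb * (GIB - GB)


for size in (2, 200):
    missing = shortfall_bytes(size)
    print(f"{size} GB label: {missing:,} bytes short "
          f"(~{missing / 2**20:,.0f} MiB, ~{missing / GIB:.1f} GiB)")
```

For the 200 GB case this works out to roughly 13.7 GiB (about 14.7 decimal GB), which matches the "almost fourteen gigabytes" above.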
I'm bringing up the size issue in a thread on fsync() because it is only one more area where hard drive manufacturers are cheating to get "better" performance numbers, instead of being honest and producing a good product. As a result, journaling filesystems and the like cannot be guaranteed to work properly.
If the hard drive mfgs really want good performance numbers, this is what they should do: Hard drives already have a small amount of memory (cache) in the drive electronics. Unfortunately, when the power goes away, the data therein is lost almost immediately. So, embed a flash chip on the hard drive electronics, along with a small rechargeable battery. If the battery is dead or the flash is fscked up, both of which can easily be tested today, the hard drive obeys every fsync() more religiously than the pope and works slightly more slowly. If the battery is alive and the flash works, then in the event of a power-off with data remaining in the cache (now backed by the battery), the drive writes that data to the flash chip. Upon the next powerup, the hard drive initializes as normal, but before it accepts any incoming read or write commands, it first copies the information from flash to the platter. This is a good enough guarantee that data will not be lost, as the reliability of flash memory exceeds that of the magnetic platter, provided the flash is not written too many times, which it won't be under this kind of design; and as I said, nothing will be written to flash if the flash doesn't work anymore.
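The flash-backed cache proposed above can be modeled in a few lines. This is a toy, hypothetical Python simulation (the class and method names are made up for illustration; no real drive firmware is implied): writes are cached only while the battery and flash both test good, a power loss dumps the cache to flash, and power-up replays flash to the platter before serving any I/O.

```python
from dataclasses import dataclass, field


@dataclass
class HypotheticalFlashBackedDrive:
    battery_ok: bool
    flash_ok: bool
    platter: dict = field(default_factory=dict)  # non-volatile magnetic media
    cache: dict = field(default_factory=dict)    # volatile DRAM write cache
    flash: dict = field(default_factory=dict)    # non-volatile spill area

    def write_back_allowed(self) -> bool:
        # Only cache writes while the safety net has tested good.
        return self.battery_ok and self.flash_ok

    def write(self, lba: int, data: bytes) -> None:
        if self.write_back_allowed():
            self.cache[lba] = data      # fast: acknowledge from cache
        else:
            self.platter[lba] = data    # write-through: slower but honest

    def power_loss(self) -> None:
        if self.write_back_allowed():
            self.flash.update(self.cache)  # battery powers the dump to flash
        self.cache.clear()                 # DRAM contents are gone either way

    def power_up(self) -> None:
        # Replay saved blocks to the platter before accepting any commands.
        self.platter.update(self.flash)
        self.flash.clear()

    def read(self, lba: int):
        return self.cache.get(lba, self.platter.get(lba))


drive = HypotheticalFlashBackedDrive(battery_ok=True, flash_ok=True)
drive.write(7, b"journal block")
drive.power_loss()
drive.power_up()
recovered = drive.read(7)
```

With a dead battery or bad flash, write() falls through to the platter directly, which is the "obeys every fsync() religiously" mode described above.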
How do I turn off write caching?
With Windows XP there is a setting available, but it no longer works with SP2; the setting is reset when you leave the dialog.
Also, on my Mac, I have two external drives in FireWire enclosures--how do I turn it off there?
In any case, this completely sucks. Are there any disks one can buy that default to a sane setting?
It shows the right file when I click on it, I don't care...
until one day the thing dies, and I lose everything.
Then I take it outside and set it on fire.
Who's fSync()ing now!!!!
There was an interesting discussion on this topic a while ago on Apple's Darwin development list.
Donate free food here
Lots of stuff relies on knowing when blocks hit the disk. Think about it... knowing that something is on the disk means you can make assertions about write ordering. What relies on ordering? Databases and filesystems (i.e. BSD softupdates) for starters. If the disk lies to the OS about when data is written, bad stuff will happen sooner or later.
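A minimal sketch of the ordering idea in Python, assuming a POSIX system (the helper names fsync_dir, durable_write and ordered_update are hypothetical, not taken from any real database or filesystem). The log record is forced out with os.fsync() before the data file is touched. The whole scheme silently breaks if the drive acknowledges the flush while the data is still in its volatile cache, which is exactly the complaint in this thread.

```python
import os
import tempfile


def fsync_dir(dirpath: str) -> None:
    """Make new directory entries durable (works on Linux/BSD/macOS, not Windows)."""
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_write(path: str, data: bytes, append: bool = False) -> None:
    """Write data, then block until the OS reports it on stable storage."""
    flags = os.O_WRONLY | os.O_CREAT | (os.O_APPEND if append else os.O_TRUNC)
    fd = os.open(path, flags, 0o644)
    try:
        view = memoryview(data)
        while view:                      # os.write may write fewer bytes than asked
            view = view[os.write(fd, view):]
        os.fsync(fd)                     # only as trustworthy as the drive's FLUSH
    finally:
        os.close(fd)
    fsync_dir(os.path.dirname(os.path.abspath(path)))


def ordered_update(log_path: str, data_path: str, record: bytes) -> None:
    # 1. The log record must be durable *before* the data it describes changes.
    durable_write(log_path, record, append=True)
    # 2. Only now touch the real data; a crash here can be repaired from the log.
    durable_write(data_path, record)


with tempfile.TemporaryDirectory() as d:
    ordered_update(os.path.join(d, "journal"), os.path.join(d, "table"), b"x=1\n")
    with open(os.path.join(d, "table"), "rb") as f:
        table_contents = f.read()
    with open(os.path.join(d, "journal"), "rb") as f:
        journal_contents = f.read()
```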
The author lied when he implied that DRIVES are the issue.
ATA-IDE, SCSI, and S-ATA drives from all major manufacturers will accept commands to flush the write buffer including track cache buffer completely.
These commands are critical before cutting power and "sleeping" in machines that can perform a complete "deep sleep" (no power at all whatsoever sent to the ATA-IDE drive).
Such OSes include Apple's OS 9 on a G4 tower, and some versions of OS X on machines not supplied with certain naughty video cards.
Laptops, for example, need to flush drives... AND THEY do.
All drives conform.
As for DRIVER AUTHORS not heeding the special calls sent to them.... he is correct.
Many driver writers (other than me) are loser shits that do not follow standards.
As for LSI RAID cards, he is right, and other RAID cards... that is because the products are defective. But the drives are not, and the drivers COULD be written to honor a true flush.
As for his "discovery" of sync not working.... DUH!!!!!
The REAL sync is usually a privileged operation, sent from the OS, and not highly documented.
For example on a Mac the REAL sync in OS9 is a jhook trap and not the documented normal OS call which has a governor on it.
Mainframes such as PRIMOS and other old mainframes including even unix typically faked the sync command and ONLY allowed it if the user was at the actual physical systems console and furthermore logged in as a root or backup operator.
This cheating always sickened me. But all OSes do this because so many people who think they know what they are doing try to sync all the time for idiotic self-rolled journalling file systems and journalled databases.
But DRIVES, except a couple S-ATA seagates from 2004 with bad firmware, ALWAYS will flush.
This author should have explained that its not the hard drives.
They perform as documented.
Admittedly Linux used to corrupt and not flush several years ago... but it was not the IDE drives. They never got the commands.
It's all a mess... but setting a DRIVE to not cache is NOT the solution! It's idiotic to do so, and all the comments in this thread talking of setting the cache off are foolish.
As for caching device topics, there are many options.
1> SCSI WCE permanent option
2> ATA Seagate Set Features command 82h Disable write cache
3> ATA config commands sent over a SCSI (RAID card) device using a SCSI CDB in passthrough. It uses a 16-byte CDB with 8h, or a 12-byte CDB with Ah, for sending the tunneled command.
4> ATA ATAPI commands for the WCE bit, as if it was SCSI
Fibre Channel drives of course honor SCSI commands.
As for mere flushing, a variety of low level calls all have the same desired effect and are documented in respective standards manuals.
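For the curious, here is roughly what the passthrough route in item 3 looks like at the byte level. This Python sketch only builds a 16-byte ATA PASS-THROUGH CDB (SCSI opcode 85h in the SCSI/ATA Translation standard) for two non-data ATA commands: SET FEATURES (EFh) with feature 82h to disable the write cache, and FLUSH CACHE (E7h). It does not send anything; actually issuing it would need something like Linux's SG_IO ioctl and root privileges. The helper and constant names are made up for illustration, and the byte offsets follow my reading of the SAT layout, so check them against the spec before relying on them.

```python
# Constants (values from the ATA/ATAPI and SAT standards as I understand them).
ATA_PASS_THROUGH_16 = 0x85      # SCSI opcode for ATA PASS-THROUGH (16)
PROTOCOL_NON_DATA = 3           # SAT "protocol" field: non-data command
ATA_SET_FEATURES = 0xEF
FEATURE_DISABLE_WRITE_CACHE = 0x82
FEATURE_ENABLE_WRITE_CACHE = 0x02
ATA_FLUSH_CACHE = 0xE7


def ata_pt16_non_data(command: int, features: int = 0) -> bytes:
    """Build (but do not send) a 16-byte ATA PASS-THROUGH CDB, 28-bit, non-data."""
    cdb = bytearray(16)
    cdb[0] = ATA_PASS_THROUGH_16
    cdb[1] = PROTOCOL_NON_DATA << 1  # protocol in bits 4..1; EXTEND (bit 0) = 0
    cdb[2] = 0x20                    # CK_COND: return ATA registers as sense data
    cdb[4] = features & 0xFF         # FEATURES (7:0)
    cdb[14] = command & 0xFF         # ATA command register
    return bytes(cdb)


disable_wc = ata_pt16_non_data(ATA_SET_FEATURES, FEATURE_DISABLE_WRITE_CACHE)
flush_cache = ata_pt16_non_data(ATA_FLUSH_CACHE)
print("disable write cache:", disable_wc.hex(" "))
print("flush cache:        ", flush_cache.hex(" "))
```

Keep in mind that on many drives the write-cache setting is only a power-on default, so a SET FEATURES like this has to be resent after every reset or wakeup.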
For example, don't think "home user losing the last porn pic", think for example "corporate databases using XA transactions".
The semantics of XA transactions say that at the end of the "prepare" step, the data is already on the disc (or whatever other medium), just not yet made visible. That is, basically everything that could possibly fail has in fact had its chance to fail. And if you got an OK, then it didn't.
Introducing a time window (likely extending not just past "prepare", but also past "commit") where the data is still in some cache and God knows when it'll actually get flushed, throws those whole semantics out the window. If, say, power fails (e.g., PSU blows a fuse) or shit otherwise hits the fan in that time window, you have fucked up the data.
The whole idea of transactions is ACID: Atomicity, Consistency, Isolation, and Durability:
- Atomicity - The entire sequence of actions must be either completed or aborted. The transaction cannot be partially successful.
- Consistency - The transaction takes the resources from one consistent state to another.
- Isolation - A transaction's effect is not visible to other transactions until the transaction is committed.
- Durability - Changes made by the committed transaction are permanent and must survive system failure.
That time window we introduced makes it at least possible to screw 3 out of 4 there. An update that involves more than one hard drive may not be Atomically executed in that case: only one change was really persisted. (E.g., if you booked a flight online, maybe the money got taken from your account, but not given to the airline.) It hasn't left the data in a Consistent state. (In the above example some money has disappeared into nowhere.) And it's all because it wasn't Durable. (An update we thought we committed hasn't, in fact, survived a system failure.)
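The "prepare" promise can be illustrated with a toy, in-memory two-phase commit in Python. Everything here (Participant, two_phase_commit, the bank/airline names) is hypothetical and much simpler than the real XA interface; in particular, a real coordinator also durably logs its commit/abort decision. The point is where the fsync belongs: a participant may only vote yes once its prepared change is on stable storage.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Participant:
    name: str
    fails_prepare: bool = False
    prepared: dict = field(default_factory=dict)  # must be durable before voting yes
    visible: dict = field(default_factory=dict)   # committed data

    def prepare(self, txid: str, change: dict) -> bool:
        if self.fails_prepare:
            return False
        # A real participant would write + fsync this record here. If the disk
        # acknowledges from volatile cache, this "yes" is a promise it can't keep.
        self.prepared[txid] = change
        return True

    def commit(self, txid: str) -> None:
        self.visible.update(self.prepared.pop(txid))

    def rollback(self, txid: str) -> None:
        self.prepared.pop(txid, None)


def two_phase_commit(txid: str, work: list) -> bool:
    """work is a list of (Participant, change) pairs."""
    votes = [p.prepare(txid, change) for p, change in work]  # phase 1: prepare
    if all(votes):
        for p, _ in work:                                      # phase 2: commit
            p.commit(txid)
        return True
    for p, _ in work:                                          # phase 2: abort
        p.rollback(txid)
    return False


bank = Participant("bank")
airline = Participant("airline", fails_prepare=True)
ok = two_phase_commit("tx1", [(bank, {"account_A": -100}), (airline, {"account_B": 100})])
```

Here the airline refuses to prepare, so the bank's debit is rolled back and no money disappears. If the bank had voted yes with its prepared record still sitting in a drive cache, and the power failed after the coordinator decided to commit, the airline's credit would survive while the bank's debit vanished after reboot.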
A polar bear is a cartesian bear after a coordinate transform.
"Granted this was a few years ago, but i wouldn't be surprised if it's as bad (or even worse) now."
Gee. Who would have guessed the world wasn't perfect. Anyone who doesn't take failure into account when designing something is an idiot.
I know all disks ultimately fail, but it's frustrating that some can be really abused and run for years, while others die abruptly.
While working at said hard disk company, I had one of their smaller disks sitting on the end of a steel ruler on my desk. I spun round on my chair, as I do when I'm thinking, and hit the other end of the ruler with my elbow. This of course launched the disk across the room, slamming it against the wall.
Given that I was in the process of writing software to diagnose failures, I was quite excited about this accident. Of course, I returned the disk to the test setup and there was nothing wrong.
In my experience, the only sure fire way to have a disk fail is to place any piece of important, but un-backed-up, work on it.
No. If you had no cache, there would be no need for a flush command. The flush command exists purely for the purpose of flushing buffers and caches on the hard disc. ATA-5 specifies the command as E7h (and as mandatory).
The command is specified in practically all storage interfaces for exactly the reason the author cited: integrity. Otherwise, you can't assure integrity without sacrificing a lot of performance.
"Between strong and weak, between rich and poor [...], it is freedom which oppresses and the law which sets free"
"If we've made it this far without it, why do we need it now?"
There would be very few slashdot comments otherwise.
using the wrong definitions to make their products seem bigger. I bought a P4 2.4GHz CPU the other day, and was shocked to find it wasn't 2,576,980,377.6Hz like it should be! Lying thieves...
It's not a lie. fsync syncs to a device. The device is a hard drive with a cache.
You'd expect an fsync to complete only when the data is physically written to disk. However, this is usually not the case: it completes as soon as the data is fully written to the cache on the physical disk.
The downside of this is that it's possible to lose data if you pull the power plug (usually not just by hitting the power switch). However, if the disks were to actually commit fully to the physical media on every fsync, you would see a very, very dramatic performance degradation. Not just a little slower so you look bad in a magazine article, but incredibly slow, especially if you are running a database or similar application that fsyncs often.
Server-class machines solve this problem by providing battery-backed cache on their controllers. This allows full-speed operation, since an fsync only has to reach the cache, while the battery keeps the data safe if power is lost.
This doesn't matter too much for the average Joe, for a number of reasons. First, when the power switch is hit, the disks tend to finish writing their caches before spinning down. In the case of a power failure, journaled file systems will usually keep you safe (but not always).
This is a big issue however if you are trying to implement an enterprise class database server on everyday hardware.
So turn off the write cache if you don't want it on but don't complain when your system starts to crawl.
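A toy Python model of the three cases above: a drive that honours FLUSH, one that acknowledges it from volatile cache, and a battery-backed cache. SimulatedDisk and its methods are hypothetical, and the battery-backed case is simplified to "destaged once power returns".

```python
class SimulatedDisk:
    """Toy drive model: a volatile cache in front of the media."""

    def __init__(self, honours_flush: bool, battery_backed: bool = False):
        self.honours_flush = honours_flush
        self.battery_backed = battery_backed
        self.cache = {}   # volatile unless battery_backed
        self.media = {}   # what survives a power cut

    def write(self, block: int, data: bytes) -> None:
        self.cache[block] = data           # acknowledged immediately (write-back)

    def flush(self) -> str:
        if self.honours_flush:
            self.media.update(self.cache)  # destage before acknowledging
            self.cache.clear()
        return "OK"                        # a lying drive says OK regardless

    def power_fail(self) -> None:
        if self.battery_backed:
            self.media.update(self.cache)  # simplified: destaged once power returns
        self.cache.clear()


def survives(disk: SimulatedDisk) -> bool:
    disk.write(1, b"commit record")
    disk.flush()                           # what fsync() ultimately asks for
    disk.power_fail()
    return disk.media.get(1) == b"commit record"


results = {
    "honest drive": survives(SimulatedDisk(honours_flush=True)),
    "lying drive": survives(SimulatedDisk(honours_flush=False)),
    "battery-backed cache": survives(SimulatedDisk(honours_flush=False, battery_backed=True)),
}
```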
I bought this PC recently. There was a C:\ and a D:\. One day both failed at the same time. I opened the case and I saw only one hard disk. Where did the other one go? Who lied to me? The dealer or the hard disk?
Simple fix... just don't power down. (It's better for the wear and tear anyway nowadays.)
Parent either doesn't know what he's talking about, or is a troll. Pity there isn't an "incoherent rant" moderation option, or we could avoid the ambiguity.
fish and pipes
I remember that MS had a fix for this (for laptops etc.)... which just made Windows wait a duration (~30s)... I call that breaking the OS... Fix the underlying problem. Cause and effect. MS should be kicking the teeth of the hard drive manufacturers on pure principle, at least. Because let's face it, if MS wants to do anything, MS will do anything they damn please. This can of course be a good thing from time to time!
fsync semantic is needed whenever you want to implement ACID transactions. This lies at the core of database systems and journaling file systems, for example. No fsync, no data integrity.
"That time window we introduced makes it at least possible to screw 3 out of 4 there. An update that involves more than one hard drive may not be Atomically executed in that case: only one change was really persisted. (E.g., if you booked a flight online, maybe the money got taken from your account, but not given to the airline.) It hasn't left the data in a Consistent state. (In the above example some money have disappeared into nowhere.) And it's all because it wasn't Durable. (An update we thought we committed hasn't, in fact, survived a system failure.)"
True. However, anyone designing such a critical system will not have a single point of failure, but will have multiple fail-safes. Remember, failure is always "just around the corner", e.g. power failure, bad media, cosmic ray, server gets slashdotted, etc.
There is _much_ to see here!
This has nothing to do with whether the write caches are enabled or not.
Unless you have an intelligent controller with battery backup, the fsync chain should issue a SYNC command to the disc and flush the caches on the disk, the disk controller and the OS. If this does not happen, then there is a bug.
I see a market for an extended "real flush", since HD makers are not going to change this practice soon (just as they count gigabytes differently from the OS).
His surname is actually Fitzpatrick and not Fitzgerald.
In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.
When an ext2 file system is mounted with the sync option, directory entries are also implicitly synced by fsync.
On kernels before 2.4, fsync on big files can be inefficient. An alternative might be to use the O_SYNC flag to open(2).
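The two approaches from the man page excerpt, sketched in Python on a POSIX system (the function names are illustrative). Note the caveat quoted above: with the drive's write cache enabled, neither one guarantees the bytes are on the platter; they only guarantee the OS has passed the data to the drive and the drive has reported it done.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    view = memoryview(data)
    while view:                       # handle short writes
        view = view[os.write(fd, view):]


def append_then_fsync(path: str, records: list) -> None:
    """Many writes, one fsync at the end of the batch."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for rec in records:
            write_all(fd, rec)
        os.fsync(fd)                  # one flush request for the whole batch
    finally:
        os.close(fd)


def append_o_sync(path: str, records: list) -> None:
    """O_SYNC: each write returns only once the data is reported stored."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_SYNC, 0o644)
    try:
        for rec in records:
            write_all(fd, rec)        # synchronous per call; no fsync needed
    finally:
        os.close(fd)


records = [b"a\n", b"b\n"]
with tempfile.TemporaryDirectory() as d:
    p1, p2 = os.path.join(d, "batched"), os.path.join(d, "osync")
    append_then_fsync(p1, records)
    append_o_sync(p2, records)
    with open(p1, "rb") as f1, open(p2, "rb") as f2:
        contents = (f1.read(), f2.read())
```

The batched version pays for one flush per group of writes, while O_SYNC pays on every write; which is cheaper depends on how often the application really needs a durability point.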
I call BS here.
When a power failure occurs, even if you are running a journalling filesystem, you _really_, _really_, _really_ care about drive write caching.
If the drive has the journalled data in its write cache and not on disk at the time of the power failure, you've just corrupted your filesystem, because the journal replay will not be correct when you next mount your filesystem. Journalled filesystems rely on the journal being on disk when the I/O completes and not some time after.
Toy (*ATA) drives use write caching for _performance_ because otherwise they suck really badly. And that results in filesystem corruption on power loss. I've seen the mess it makes far too many times....
I guess one would assume that fsync() is meant to flush all data to the disk. Yes and no. It means, if you have in-memory buffers (like all OSs do) they should be flushed to the storage system. It does NOT guarantee that the storage sub-systems themselves will be flushed. To ensure that, most OSs just force the subsystems to remain idle for a few seconds, which is sufficient for them to write-back their cache contents. That's how it's been for 30 years ... so what's new?
Hey, parent, or someone who knows something - could you maybe elaborate?
It seems that the problem, as pointed out by the parent and posts below, is not with the disks but with fsync(): it seems that fsync only promises to give all data to the disk, but nothing about whether or not it is actually written (?). So, disks could actually flush, if one only asked them nicely enough? How? Can this be tested? Implemented?
Do journaling FS just use regular fsync? Can they use some other call that actually does flush?
Which SATA Seagates from 2004 have bad firmware and what is the nature of the flaw? I have drives of that description that cause some really strange errors in my hot-swap IDE enclosure.
"I remember that MS had a fix for this (for laptops etc)... Which just made Windows wait a duration (~30s)..."
This turned into the "my computer isn't doing what I want it to do, which is turn the F off" scenario, at which point the consumer simply reached down and yanked the power cord.
Try writing a routine for this routine!
Well, it's unlikely this is going to change. The real solution is to give power long enough to the disk drive to let it complete its writes no matter what, and/or to add non-volatile or flash memory to the disk drive so that it can complete its writes after coming back up.
There is a fairly simple external solution for that: a UPS. They're good. Get one.
And even then it is not guaranteed that just because you write a block, you can read it again, because nothing can guarantee it. So, file systems need to deal, one way or another, with the possibility that this case occurs.
subj
Does no one else notice the Fitzgerald / Fitzpatrick discrepancy?
I'm not trying to be pedantic here, but as a fellow Fitzpatrick I feel for Brad. I too have been mistakenly called Fitzgerald all my life.
His name was Brad Fitzpatrick -- damnit.
From Mac OS X --
From Linux --
From FreeBSD's tuning(7) --
It's tragic. Laugh.
Isn't this something that Alan Cox is complaining about in the Linux 2.6 IDE layer? Something about fsync not always waiting for the completion of the cache flush? He tells everyone on LKML to turn the disk write-cache off on IDE disks to make fsync work properly. Or am I clueless?
If you have to try this, TURN IT OFF WITH THE WALL SWITCH. Yanking the powercord out can damage components easily if the earth comes out first, which is unfortunately very likely with PC power cords. I have several sticks of RAM that can provide supporting evidence, unfortunately.
The manufacturers DID fix it. What do you expect MS to do? They can't kick the manufacturers into replacing every affected drive on the planet, and they sure as hell can't get clueless/apathetic users to do anything about it either.
So... Release a patch for it. The patch was not required either, and was quite clearly documented when displayed in Windows Update for users of Windows 9x.
hello dear sirs my name is jamesh i are india (bihar) can u guide me install red had linux 9?
You must be talking about some locally specific power cord and generalizing it to all PC power cords?
A reasonably designed power connection standard always makes sure that earth is connected first and disconnected last. I am sure our local standard does.
Would you please elaborate on how 2> or 4> are done? I have a seagate 7200.7 disc, and the hdparm -W 0 setting does not seem to be permanent. Thanks.
I've never heard "kibibyte" or "mebibyte", or seen them actually used. Have you? I didn't even know what they were until I read your post. I have seen GiB and MiB, and I've always confused GB and GiB as to which one is which.
Now, it might be that I'm just sooo ignorant, or it might be that no one actually uses that fine standard... the latter being, I'm afraid, the more likely case. Which means that this standard is rather irrelevant.
Could you specify in what timeframe/kernel version Linux didn't pass the sync command through to the hard disks? In the last 11 years of my Linux usage I've never noticed that and have always had consistent shutdowns, but maybe I was missing a special kernel version?
As all PC power supplies are the same, then I would think that the cord heading out of the back of them would also be the same.
As for the wall socket side, you will usually be alright, but not always.
The idea is to flush all buffers in the software, and the specs are not talking about the buffers in the hardware.
That's nonsense. Applications that use fsync() do so in order to be certain that things are actually recorded in the hardware. It's by FAR the most important issue, and this is the whole purpose of fsync() --- a portable way of achieving it.
...Partition Magic lies more. Bastard.
Resident of Skara Brae since 1985
I am sure our local standard does.
He's probably from the US, where the power cord is specifically designed to disconnect power before disconnecting ground, with the ground prong a whole quarter inch longer than the other two. It usually works as intended, except that an electrical connection between the ground prong and the socket contact is necessarily flaky while the prong is being pulled out, and your equipment could fry as a result.
First everyone mods me down and THEN you ask for help?
great.
anyway the best answer for you is
"you need to send the command every wakeup from sleep or every bootup." the command is a standard "Set Features" ATA-IDE command with command byte of 82 in hex.
A systems level programmer could write a small utility that does so, or you could possibly learn to write and send the command.
For info on method "2", download a free manual from Seagate called the "Barracuda Serial ATA V Product Manual, Rev. A". It will have details relevant to your drive, despite the name.
refer to section 2.2.2 "Set Features command"
Table 9:
82h Disable write cache (features register value)
Power-on default has the read look-ahead and write caching features enabled.
method "4" will not work for that specific device, but would be permanent if it did accept it. (stored in a vendor features settings track, rather than flash ram)
BS? *NO*, get a clue, idiot. I was correct in my parent post.
First of all, I design Fibre Channel firmware, SCSI drivers, caching drivers, MO drivers, DVD and CD burning tools, ATA-IDE RAID drivers, S-ATA drivers, and USB Mass Storage Class drivers for a living.
I do not even want to take the time to elaborate on how incorrect your denial of my post is.
But get a clue, fool..... For example, even if you turn off all cache on ANY drive in the world sold today, including Fibre Channel and SCSI320 high-end drives, you will still have corruption on power failure if data is being written, or if the failure happens soon after getting data.
Why? Because it's in the TRACK CACHE, you fool, and the drives lack capacitors to feed the ASICs during a power outage, even if the rotational energy is still there. All they can usually do is pull the heads back to the landing zone.
If you "_really_, _really_, _really_" care, the OS issues a goddamned FLUSH command to the drivers, which then send it to the drives. DUH!
If you already misunderstand how cache and flush works in a drive, I doubt I could ever teach you here.
just read and learn.
you asked "Do journaling FS just use regular fsync? Can they use some other call that actually does flush?"
the answers are : NO THEY INVOKE FSYNC in special contexts and/or issue power-manager oriented commands to flush the drive as a side effect or by direct calls to drivers
Can they use some other call that actually does flush? yes they can and do on almost every OS ever sold.
the reason is people could write denial of service user apps that rob the entire OS of speed by maliciously sync-ing all the time to cause trouble.
Also, idiots that flush for no reason.
Apple in OS X, for example, has a special way of issuing a flush that they do document but wish people not to call.
other methods on many classic unix systems include running as root or as a special process or in a context designed to allow it
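On Mac OS X the documented mechanism is most likely fcntl() with F_FULLFSYNC: Apple's fcntl(2) man page describes it as asking the drive to flush its buffers to permanent storage, which plain fsync() does not do there. A Python sketch that prefers it when the platform exposes it and otherwise falls back to os.fsync() (Unix only, since it imports fcntl; full_sync is a made-up name):

```python
import fcntl   # Unix only
import os
import tempfile


def full_sync(fd: int) -> str:
    """Request the strongest flush the platform offers; return which was used."""
    full = getattr(fcntl, "F_FULLFSYNC", None)   # present on macOS builds
    if full is not None:
        try:
            fcntl.fcntl(fd, full)   # also asks the drive to empty its write cache
            return "F_FULLFSYNC"
        except OSError:
            pass                    # e.g. the filesystem doesn't support it
    os.fsync(fd)
    return "fsync"


with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"important\n")
    tmp.flush()                     # Python buffer -> kernel
    used = full_sync(tmp.fileno())
```

On other Unixes this falls back to fsync(), so whether the drive's own cache gets flushed is up to the kernel and driver, which is the whole argument of this thread.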
I wonder if the Linux kernel issues this flush command on fsync() at all, as it works on the filesystem level.
/dev/hda
We could blame HDD manufacturers if there is data loss after:
hdparm -f
Maybe using kilo to mean 1024x is wrong.
Fact of it is that *anyone* who knew enough about computers for it to matter would have known and agreed on this standard anyway, right or wrong.
They came along and messed up a standard that everyone had agreed upon and was happy with. Don't even *think* of saying that using decimal kilobytes et al had any purpose other than making drives seem bigger than they were; that trick only worked because everyone had previously agreed that a kilobyte was 1024 bytes.
If the industry was *so* damn keen to get the 'correct' meaning of the words, they wouldn't still be using the 'incorrect' versions when selling memory.
Simple fact; anyone who wants to be pedantic about it can correctly argue that the 1024 definition of kilobyte is wrong. What they can't do is give any proper justification for changing a definition that everyone knew and understood to mean 1024 bytes.
Marketing bullshit, pure and simple; in fact, I propose the phrase "marketing gigabyte", just to make it absolutely clear which definition is in use...
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Then they released Windows 2000 Service Pack 3, which fixed some previous caching bugs, as documented in KB332023. The article tells you how to set up the "Power Protected" Write Cache Option, which is your way of saying "yes, my storage has a UPS or battery-backed cache; give me the performance and let me worry about the data integrity".
I work for a major storage hardware vendor: to cut a long story short, we knew fsync() (a.k.a. "write-through" or "synchronize cache") was working on our hardware, when the performance started sucking after customers installed W2K SP3, and we had to refer customers to the latter article.
The same storage systems have battery-backed cache, and every write from cache to disks is made write-through (because drive cache is not battery-backed). In other words, in these and other Enterprise-class systems, the burden of honouring fsync() / write-through commands from the OS has switched to the storage controller(s), the drives might as well have no cache for all we care. But it still matters that the drives do honour the fsync() we send to them from cache, and not signal "clear" when they're not - if they lie, the cache drops that data, and no battery will get it back..!
(this is not a
I tried running hdparm -i /dev/sda and get the following:
/dev/sda:
HDIO_GET_IDENTITY failed: Inappropriate ioctl for device
There were many Linux defects with no track cache flush command being received by devices, but if you want one set of recent fixes for flush corruption...
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.6/2.6.6-mm2/ :
-force-ide-cache-flush-on-shutdown-flush-fix.patch
Refer to -force-ide-cache-flush-on-shutdown-flush.patch in "Changes since 2.6.6-mm1".
Why the hell my informative parent post gets modded to only a "2" just because people do not like the truth is astounding.
I was expecting this would happen to my INFORMATIVE post, and it just means I will not bother helping anyone on Slashdot again for another half-year absence from posting.
I figure... why bother... the S/N ratio is such that no low-level coders seem to ever read Slashdot anymore in recent years.
It's probably time for me to move to other sites as well.
"2"! on the only FACTUAL and informative post in the entire damned thread!
Yes, nothing by itself is enough, not even XA transactions, but it can make your life a _lot_ easier. Especially if not all records are under your control to start with.
E.g., the bank doesn't even know that the money is going to reserve a ticket on flight 705 of Elbonian United Airlines. It just knows it must transfer $100 from account A to account B.
E.g., the travel agency doesn't even have access to the bank's records to check that the money has been withdrawn from your account. And it never should.
So you propose... what? That the bank gets full access to the airline's business data, and that the airline can read all bank accounts, for those integrity checks to even work? I'm sure you can see how that wouldn't work.
Yes, if you have a single database and it's all under your control, life is damn easy. It starts getting complicated when you have to deal with 7 databases, out of which 5 are in 3 different departments, and 2 aren't even in the same company. And where not everything is a database either: e.g., where one of the things which must also happen atomically is sending messages on a queue.
_Then_ XA and ACID become a lot more useful. It becomes one helluva lot easier to _not_ send, for example, a JMS message to the other systems at all when a transaction rolls back, than to try to bring the client's database back in a consistent state with yours.
It also becomes a lot more expensive to screw up. We're talking stuff that has all the strength of a signed contract, not "oops, we'll give you a seat on the next flight".
Yes, your tools discovered that you sent the order for, say, 20 trucks in duplicate. Very good. Then what? It's as good as a signed contract the instant it was sent. It'll take many hours of some manager's time to negotiate a way out of that fuck-up. That is _if_ the other side doesn't want to play hardball and remind you that a contract is a contract.
Wouldn't it be easier to _not_ have an inconsistency to start with, than to detect it later?
Basically, yes, please do write all the integrity tests you can think of. Very good and insightful that. But don't assume that it suddenly makes XA transactions useless. _Anything_ that can reduce the probability of a failure in a distributed system is very much needed. Because it may be disproportionately more expensive to fix a screw-up, even if detected, than not to do it in the first place.
A polar bear is a cartesian bear after a coordinate transform.
Actually, it is. The standard was updated in 1998 to avoid confusion. Having different names for different things avoids an awful lot of confusion, so I would very much recommend using them.
Which is more important? The de facto standard that slightly misuses the 'kilo-' prefix, but *everyone* knows what it means; or something that was forced into place by marketing?
As I argued in more depth elsewhere, anyone who used computers *knew* what "kilobyte" and friends meant.
There was no confusion, because only the 1024-byte definition was widely used.
The 'need' to use the '1000 byte' definition was created by marketing, not computer people. THEY caused the confusion for their (short term) gain by exploiting the old meaning of 'kilobyte' to make their drives seem larger.
Marketing do not give a flying **** about correctness or clarity; if there was any problem, *they* created it. Computer people knew what kilobyte meant.
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
(OFFTOPIC warning)
You have no heart! You killed my bunny!!!
Thanks for the info, I appreciate it. Unfortunately, the 7200.7 is sitting behind a 1394 bridge, so it sounds like I'm basically screwed. (And to add to the issue, the enclosure periodically power cycles the disc for no apparent reason...)
I am no longer purchasing Seagate discs. In the past, I have tried to get this info out of their tech support, but they are completely incompetent, condescending, and plain annoying.
FWIW, disabling the write cache on IBM/Hitachi discs is permanent, and they offer a utility to do so. Unfortunately, under SP2, Windows vigorously re-enables the write cache setting regardless of what you tell it.
The fsync(2) man page does state:
fsync copies all in-core parts of a file to disk, and waits until the device reports that all parts are on stable storage.
But then it goes on to state:
NOTES
In case the hard disk has write cache enabled, the data may not really be on permanent storage when fsync/fdatasync return.
Which, as you point out, can be a BAD THING (TM) if someone opens a window. So who should change? fsync and its man page's NOTES, for devices that have a cache but are actually capable of flushing it? Or should there be a special really_fsync() call?
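For reference, here is a minimal POSIX sketch of what the man page describes. The helper name write_and_sync is made up for illustration. It writes the bytes and then calls fsync() before reporting success. Per the NOTES quoted above, a successful return only means the kernel got a completion from the device. Whether the drive's own write cache was flushed depends on the kernel and the drive.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path, then fsync() the file.
 * Returns 0 on success, -1 on error. A 0 return means the device
 * reported completion; if the drive's write cache is on and the kernel
 * sends no cache-flush command, the bytes may still be in that cache. */
int write_and_sync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {                 /* write() may be partial */
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    if (fsync(fd) != 0) {                /* flush file data and metadata */
        close(fd);
        return -1;
    }
    return close(fd);
}
```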
When posting such an article, it'd be handy to get your sources right. His name is Brad Fitzpatrick, not Fitzgerald.
Frankly, this speaks volumes about why, when you enable write caching in hdparm or Winblows and the thing crashes, you have to wait while the file system is checked, scrubbed, et al. before it comes back up.
All content in this message is copyright (c) 2008. All rights reserved. RIAA is prohibited here.
Hey man, tell me what sites you are going to and I will follow you. Bottom line: I love posts like yours, and I agree it is too bad people fuck with them.
Simple fact; anyone who wants to be pedantic about it can correctly argue that the 1024 definition of kilobyte is wrong. What they can't do is give any proper justification for changing a definition that everyone knew and understood to mean 1024 bytes.
Because it's not a simple fact. kilobyte is 1024 bytes when referring to binary addressed data (such as RAM chips) but is 1000 bytes when used in other areas, such as network bandwidth, or floppy drive space, or bus bandwidth, or what have you.
The problem is everyone does not know and understand 1024 bytes to be one kilobyte, they only presume it always does, when it quite obviously doesn't.
Since you've demonstrated confusion over the matter yourself, by making a blanket statement that 1024 bytes is one kilobyte while ignoring the times when it IS NOT one kilobyte, you demonstrate the need to reject the system that led to your own confusion.
Don't even *think* of saying that using decimal kilobytes et al had any purpose other than making drives seem bigger than they were; that trick only worked because everyone had previously agreed that a kilobyte was 1024 bytes.
Why do you say such inaccuracies? Drives, going back to the first ones ever made, used kilobyte = 1000 bytes. It has always been that way, and that is the correct way, because a hard drive is not binary-addressed data; its size is arbitrary, based on the number of bits that fit on a circle of metal. Nobody "previously agreed that a kilobyte was 1024 bytes", because that is a blanket incorrect statement.
People agreed that a kilobyte is 1024 bytes only when referring to binary-addressed data, which is not the case on a hard drive platter; its size is arbitrary, much like network speeds or bandwidth. The only time in a hard drive's life when kilobyte = 1024 is when you are talking about the MAXIMUM ADDRESSABLE DATA over the controller the drive is attached to, and that has a bit width and therefore is a power of two.
Drive capacities have always been decimal, not binary, even from when drives were research-only inventions. It is revisionism to suggest it is all marketing, and you have fallen into the trap of thinking that.
The Windows OS shows every partition as a separate drive even though they are not actually separate drives.
So what lied to you was your operating system's user interface, which claims there are two drives when in fact there are just two partitions on the same drive.
Note to nitpickers: I said USER INTERFACE. I know you can see the partitions in the administrative tools, but most users won't know that exists.
Note: I'm known as plugwash most places, but I screwed up registering that here somehow in the past and now can't register.
From the UNIX spec, vol 2:
---
NAME
fsync - synchronise changes to a file
SYNOPSIS
#include
int fsync(int fildes);
DESCRIPTION
The fsync() function can be used by an application to indicate that all data for the open file description named by fildes is to be transferred to the storage device associated with the file described by fildes in an implementation-dependent manner. The fsync() function does not return until the system has completed that action or until an error is detected.
The fsync() function forces all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronised I/O completion state. All I/O operations are completed as defined for synchronised I/O file integrity completion.
---
In short, fsync() is specifically designed to flush the data in memory to the device, as well as ensure the device writes right fucking now and stfu until the job is done. fsync() under Linux does indeed issue command E7h for ATA5, which the drive is expected to follow immediately.
If the device fails to do so, then it's operating out of spec and therefore is either faulty, or the manufacturer is falsely claiming compliance with the spec and selling something other than what was promised.
No, flush is still used to dump filesystem changes from system memory to the drive even if the drive doesn't have a cache.
However, fsync() _is_ expected to ensure that the data is committed in a way that ensures data integrity, regardless of the medium being used.
If the drive has a hardware cache, then fsync() implementations are expected to ensure that this cache is also flushed. To this end, various incarnations of Linux and BSD employ ATA commands specifically designed for this task, which are mandatory for a drive to claim ATA compliance.
If the drive manufacturers are failing to implement these commands as specified, then we have what amounts to dirty pool and most likely consumer fraud.
this is true ..
When you install a fresh Windows 98 SE on a fairly new PC and then run Windows Update, there is an update with this description:
e nts/WUCritical/q273017/Default.asp
The Windows IDE Hard Drive Cache Package provides a workaround to a recently identified issue with computers that have the combination of Integrated Drive Electronics (IDE) hard disk drives with large caches and newer/faster processors. Computers with this combination may risk losing data if the hard disk shuts down before it can preserve the data in its cache.
This update introduces a slight delay in the shutdown process. The delay of two seconds allows the hard drive's onboard cache to write any data to the hard drive.
I found it nice to see how M$ worked around it: just wait 2 seconds. How ingenious!
link to the M$ update site: http://www.microsoft.com/windows98/downloads/cont
I'll never say that nasty term. What were those retards thinking when coming up with it? "Let's make something that sounds really stupid."?
and helped the discussion a lot.
If we stop for a moment and assume the drive itself is not able to really flush all those big cache entries during a hard power fail, you'd have to ask why.
Do the power supplies being used have any excess capacitive storage? (No, they're switchers.) Does the power supply's voltage curve go down too fast after the power-fail signal? (Probably.)
Is it because they got into cache-size competition? Is buffer size truly limited only by how much time they have to physically write it out during a hard power fail? Have the manufacturers, driven by that competition, pushed the envelope to the edge?
As areal density and buffer sizes increase, the higher linear data rate raises clock rates and therefore logic requirements; lower-power logic appears, but spinning mass has probably stayed about the same, and so have RPMs.
Could there be a system constraint that is being exceeded? One that only shows during a hard power fault?
Inquiring minds want to know (but I'm too lazy, and don't have enough hardware, to find out myself).
Linux recently added write barriers. I don't know if they help, but they look related.
The right answer is for the drive not to respond to the "Sync" command with "Done" until it really is done (however long it takes), and for the OS not to continue until it sees the "Done" from the drive.
Then throw the little switch on the back of the power supply (labeled '0' and '1') to the '0' position to turn off the computer. This is usually a DPDT switch and will kill BOTH legs of the line at the same time. This will be safe.
Your hard disks lie to YOU! Oh.. wait..
A computer makes it possible to do, in half an hour, tasks which were completely unnecessary to do before.
with it as does the room design around the case (as I found out the hard way in the summer of 2001, when I moved my computer to a spot with stagnant air and fried 3 drives of different capacities and manufacturers).
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Seems you don't get it. fsync() flushes to the device, not to the physical media! The spec clearly says that all the data should be sent to the storage device; it does not say that the storage device should flush its internal cache too! Do you see the difference?
You seem to be referring to the kernel API fsync() rather than the ATA spec for fsync(). The author is talking about the ATA spec, and the fact that the drive is ignoring the command to flush cache to media.
The Linux man page (last updated 2001-04-18) states that all data should be written to stable storage. To me stable means that if power is pulled that data is still there. It does however, give a warning in the NOTES section that if write cache is enabled on the drive, "the data may not really be on permanent storage." I don't know if that warning is just there because of observed behavior, or if the various specs allow said behavior.
-matt
There is no such thing as fsync() in the ATA spec! The ATA spec only talks about command bytes and things like that, not APIs callable from Perl!! You'd better RTFA.
Right. And the author is implementing a program that sends raw commands to ATA drives... in Perl. Right. He does no such thing, at least not that I can see from glancing at the source code of the Perl script. Granted, I'm not fluent in Perl, but it doesn't seem to do anything other than an fsync() equivalent. Please do correct me if I'm wrong.
The truth is that he doesn't know wtf he's talking about. I decided to cut him some slack, though, because the FreeBSD 4 man pages, at least, are very misleading, and I don't know which man pages he read.
By the way, I sent him an e-mail. It's available on my web space. I'm not posting it in full here, because it's a little long and it would be redundant, since a lot of the surrounding posts discuss pretty much the same thing as I said.
No, it's not. You're getting confused at the way "bandwidths" tend to be expressed in *bits*, not bytes. "Kilobyte" has never been considered "1000 bytes" anywhere except a hard disk manufacturer's marketing department.
"Kilobyte" has never been considered "1000 bytes" anywhere except a hard disk manufacturer's marketing department.
No, Kilobyte has only ever meant "1024 bytes" when referring to binary addressable spaces.
So when my PC tells me I'm the greatest thing since the invention of the wheel, I should suspect something???
Nitpicking, especially since fsync is defined as getting everything to a "synchronized I/O state" (whatever that's supposed to really mean). One might say, "As far as the OS is concerned, as long as the data goes to the device, it's synchronized," except that they're forgetting something: software caches and GUIs and crap are just fancy things modern OSes do. The true point of an OS is to manage the hardware so applications don't have to. Synchronizing the I/O subsystem, from an OS perspective, ought to mean making the hardware do its job, more so than doing any fancy caching (and thus flushing) in software.
But if your point is that fsync is just operating to spec, then why aren't ATA SYNC and SCSI SYNC operating to spec? Specs only work when they're followed.
Also, traditionally, unix shutdown was: switch to single-user, remount disks read-only, sync, sync, sync, halt. sync calls fsync(). fsync() is only useful if the data is committed to disk. Who cares if the software caches are flushed, if the data still doesn't get written? If the ends justify the means, fine, but we're not even making it to the ends here.
"I swear that's not my tubgirl file. You can't trust the timestamp or the userid. Hard drives lie to you according to /."
Well, there's spam egg sausage and spam, that's not got much spam in it.
This type of 'the meaning is obvious' is what causes Mars expeditions to fail. Just because you and I and many others have been abusing the term 'Kilo' for many years doesn't make it right to continue to do so.
Marketing sucks, I'll give you that. But this time it's not their fault.
To Terminate, or not to Terminate, that's the question - SCSIROB
I think you missed the point here buddy... In the case of Linux, after sending the data, the driver explicitly issues a hardware command to tell the device to write to media and STFU until done!
Do you see the difference?
>I found it nice to see how M$ worked around it,
>just waiting 2 seconds, how ingenious !
What would you have done? Verifying all data would probably take longer than 2 seconds, and you can't trust the disk to tell you when it's written the data.
So you'd either have to figure out all the data that was in the cache, verify it against the disk surface, and only proceed when all that is done, or wait a bit. Making some assumptions about buffer size and transfer speed, then adding a safety factor, is probably where the 2 seconds came from.
Did it work? Well, it'd appear so. What's so bad about MS's fix?
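Here is a back-of-the-envelope version of that guess. The cache size C, the sustained media write rate R, and the safety factor k are illustrative assumptions, not figures from Microsoft. With C = 2 MB, R = 20 MB/s and k = 20:

$$
t_{\text{flush}} \approx \frac{C}{R} = \frac{2\ \text{MB}}{20\ \text{MB/s}} = 0.1\ \text{s},
\qquad
t_{\text{wait}} = k\,t_{\text{flush}} = 20 \times 0.1\ \text{s} = 2\ \text{s}
$$

Scattered cache contents need seeks between writes, so a real worst case can be slower than C/R. That is one argument for a generous k.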
No, I didn't kill *your* bunny.
Being viral, it's easy to get it to reproduce, so now I am a cute-bunny farmer.
I'm trying to work out if there's more profit in selling their feet, or keeping them whole, having them stuffed, and selling them as children's toys.
I love those cute bunnies (^_^)
Is this why my 200GB Seagate Barracuda SATA drive (ST3200822AS?) hangs in Linux and Windows? It happens under load and doesn't return within 30s, so I usually have to hit the reset switch.
Also, the drive is very CPU intensive, even on my MSI NEO2 M/B. So overall, it's really annoying - wished I'd bought a PATA drive instead. Damn SATA.
Well, there has been very little hard evidence (from *anyone*, including myself) here.
However, if you have any clear evidence that this was *generally* the case, I'd be interested to see it.
(I'm not saying you're wrong; I'm saying if what I believe is a fallacy/urban-myth, I'd at least like to see some evidence of it beyond one or two isolated incidents- which were probably marketing-driven anyway (^_^) )
Exactly - the author of this "test" made a bad assumption: fsync() (or rather the windows equivalent) means it's on the disk. Understandable, and once upon a time it was true in Unix. fsync() doesn't (that I know of) issue ATA sync commands, though.
I used to beta-test SCSI drives, and write SCSI and IDE drivers (for the Amiga). Write-caching is (except for very specific applications) mandatory for speed reasons.
If you want some performance and total write-safety, tagged queuing (SCSI or ATA) could provide that (with write caching turned off). You'll still give up some performance, since a single-threaded write application/FS will wait for data to be on disk before continuing. If the FS/app writes (say) 3 chunks of data that fill a track, with write caching off and tagged queuing, it's probably a minimum of 3 rotations (probably more like 4.5 or more) to write the data. With write caching, it's minimum 1, more like average 1.5 rotations. With a LOT of pain, you could break the single-threadedness of this in some cases by reporting success without waiting for tagged write completions, while marking the VM pages as copy-on-write or some equivalent so the app won't overwrite the data that you're still writing (or, you could return success to the app/FS once the data has been sent to the drive, but before the drive reports success). This (in a way) moves the write cache into the disk driver and thus gives you control over it. Perf will still be lower than letting the drive do it, perhaps a lot lower in some cases.
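The rotation counts above convert to time as rotations × (60 000 ms / RPM). Below is a tiny helper for that arithmetic. It assumes a 7200 RPM drive for the example numbers, and the function name is illustrative.

```c
/* Convert a number of platter rotations into milliseconds for a given
 * spindle speed: one rotation takes 60 000 ms / rpm. */
double rotations_to_ms(double rotations, double rpm)
{
    return rotations * 60000.0 / rpm;
}
```

At 7200 RPM one rotation takes about 8.33 ms. Writing that track's worth of data therefore costs about 37.5 ms at roughly 4.5 rotations (write cache off, tagged queuing), against about 12.5 ms at roughly 1.5 rotations (write cache on).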
If you want _real_ performance and safety, turn on write caching, and when you hit a "safety checkpoint", tell the drive to flush the write cache to disk. I don't currently believe that ATA or SCSI drives generally ignore that command; please provide links if you know differently. It's not a benchmarking advantage to subvert that unless the OS/app is using it, but maybe OSes are turning fsync()/etc. into ATA/SCSI sync commands, and the drive makers are lying.
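Below is a sketch of the "write cached, flush at checkpoints" pattern at the application level. It assumes fsync() is what reaches the drive as a cache flush, which, per other posts here, depends on the kernel. The function name and the checkpoint interval are hypothetical. To keep it short, partial writes are treated as errors.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: append NUL-terminated records to an open fd and
 * call fsync() only every `batch` records (the "safety checkpoint"),
 * plus once at the end if records remain unsynced. batch == 0 means
 * "sync only at the end". Returns the number of fsync() calls made,
 * or -1 on error (including a short write). */
int append_with_checkpoints(int fd, const char *const *records,
                            size_t count, size_t batch)
{
    int syncs = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(records[i]);
        if (write(fd, records[i], len) != (ssize_t)len)
            return -1;
        if (batch > 0 && (i + 1) % batch == 0) {
            if (fsync(fd) != 0)
                return -1;
            syncs++;
        }
    }
    /* final checkpoint for any records written since the last fsync() */
    if (count > 0 && (batch == 0 || count % batch != 0)) {
        if (fsync(fd) != 0)
            return -1;
        syncs++;
    }
    return syncs;
}
```

The trade-off is the one described in the post. Between checkpoints the drive is free to cache and reorder. Data written since the last checkpoint is the only data at risk.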
There's an IEC standard that derives "bi" prefixes from the SI prefixes for specifying binary multiples of a quantity: kibi for 1024, mebi for 1,048,576, and so on. More info is available from the Wikipedia article.
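To make the two conventions concrete, here is a small C sketch that converts a decimal "200 GB" drive capacity into binary gibibytes. The constant and function names are my own.

```c
#include <stdint.h>

/* Decimal (SI) multiple: giga = 10^9. */
#define GB_SI   1000000000ULL

/* Binary (IEC) multiples: kibi = 2^10, mebi = 2^20, gibi = 2^30. */
#define KIB_IEC (1ULL << 10)   /* 1 024 */
#define MIB_IEC (1ULL << 20)   /* 1 048 576 */
#define GIB_IEC (1ULL << 30)   /* 1 073 741 824 */

/* Express a raw byte count in gibibytes (binary units). */
double bytes_to_gib(uint64_t bytes)
{
    return (double)bytes / (double)GIB_IEC;
}
```

By this arithmetic a drive of 200 × 10^9 bytes is about 186.26 GiB. That is the gap people notice when the OS reports binary units.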
After twenty years in the industry, I can't recall anyone (outside of marketing droids and pedantic wankers), anywhere, anytime, ever using the term "kilobyte", "megabyte", "gigabyte" or "terabyte" to mean a base-10 number, whether they were talking about hard disk space, floppy disk size, network bandwidth, bus bandwidth or, indeed, anything except the advertising disclaimer on a hard disk.
Do you have any examples of the usage you are talking about?
No, Kilobyte has only ever meant "1024 bytes" when referring to binary addressable spaces.
I think you'll find the de facto definition is "1024 except for metrics being stated in bits or hertz". Or, to put it another way, for anything that would have "bytes" on the end of it, "kilo" means 1024.
It tells me to do things. Terrible things.
Now before I get modded down, I beg to remind whoever might read this that what I am saying is FACT. - bogaboga
Your hard drive uses something other than binary to address the data stored on it?
Food not Bombs is a nice platitude but it breaks down when you notice that the Bombees are usually well fed
My hard drive lied to me? I KNEW it was cheating on me with that graphics card! I should have known when I saw that red polygon on his platter!
Best death? What, die from a naked lady avalanche?
I was trying to think of how to put this and of course, after I posted, I got my thoughts straight. It's something I've worked with long enough that it's intuitive to me, but perhaps a bit tricky to explain so bear with me here.
A priori, we have no way of knowing whether a particular write will complete or not. Therefore, any data consistency scheme which relies on predicting that a particular write will complete won't work. Instead, we have to have consistency schemes that rely on knowing that a particular write has completed.
What does this mean? Say you disable the write cache on a drive. Does this mean that you can be guaranteed that every write you start will complete properly? No. There are too many things that can go wrong. The sector you're writing to might be bad, the power might turn off in the middle of the write, the controller might go belly up in the middle of the write, etc. All you can rely on (provided everything is designed and implemented properly) is that when a write completes, the data is really on disk.
Write caching is just an extension of this. Once you turn write caching on your guarantee that things are on disk is the completion of the flush command, not the completion of a write. This is very valuable because most transactions involve multiple writes. In fact, in a sophisticated protocol, like SCSI, rather than flushing the whole cache, you can tag a particular write and get the guarantee "when this write completes, all of your previous writes have completed". This gives you much better performance than constantly flushing the cache.
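Below is a minimal sketch of the "rely on completion, not prediction" rule from these paragraphs, using fsync() as the completion point. It writes the data, waits for fsync() to return, and only then writes a commit marker and calls fsync() again. The function names and the "COMMIT" marker format are hypothetical. This is a plain ordering example, not SCSI tagged queuing or any particular journal format. Its guarantee is only as good as the kernel's and the drive's handling of cache flushes.

```c
#include <stddef.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes; -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical two-step commit on an open log fd:
 *  1. write the payload and fsync(); once fsync() returns, the payload
 *     is (as far as the OS and drive report) on stable storage;
 *  2. only then write a commit marker and fsync() again.
 * A recovery pass (not shown) would ignore any payload that is not
 * followed by a complete marker, so a crash at any point never exposes
 * a half-written transaction as committed. */
int commit_transaction(int fd, const char *payload, size_t len)
{
    static const char marker[] = "COMMIT\n";
    if (write_all(fd, payload, len) != 0)
        return -1;
    if (fsync(fd) != 0)                 /* completion, not prediction */
        return -1;
    if (write_all(fd, marker, sizeof marker - 1) != 0)
        return -1;
    return fsync(fd);
}
```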
Yanking the power cord = the user reminding the computer who controls whom. :)
It's not a lie. It's the truth with lossy compression.
You need a vaguely recent 2.6.x kernel to support fsync(2) and fdatasync(2) flushing your disk's write cache. Previous 2.4.x and 2.6.x kernels would only flush the write cache upon reboot, or if you used a custom app to issue the 'flush cache' command directly to your disk.
Very recent 2.6.x kernels include write barrier support, which flushes the write cache when the ext3 journal gets flushed to disk.
If your kernel doesn't flush the write cache, then obviously there is a window where you can lose data. Welcome to the world of write-back caching, circa 1990.
If you are stuck without a kernel that issues the FLUSH CACHE (IDE) or SYNCHRONIZE CACHE (SCSI) command, it is trivial to write a userspace utility that issues the command.
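Below is a minimal sketch of such a utility for Linux IDE/ATA disks. It assumes the HDIO_DRIVE_CMD ioctl from <linux/hdreg.h> and is similar in spirit to hdparm's -F option. The opcode E7h is the ATA FLUSH CACHE command mentioned earlier in the thread. The function name is hypothetical. It needs root and is Linux-only. A success only means the drive accepted the command, not that it honored it. For SCSI disks you would send SYNCHRONIZE CACHE through the SG_IO interface instead, which is not shown.

```c
#include <fcntl.h>
#include <linux/hdreg.h>   /* HDIO_DRIVE_CMD (Linux only) */
#include <sys/ioctl.h>
#include <unistd.h>

/* ATA FLUSH CACHE opcode (E7h); defined locally rather than relying on
 * any particular header's macro name. */
#define FLUSH_CACHE_OPCODE 0xE7

/* Hypothetical helper: ask the drive behind `dev` (e.g. "/dev/hda") to
 * write its cache to the media. Returns 0 if the ioctl succeeded, -1
 * otherwise (bad path, not an ATA device, no permission, ...). */
int flush_drive_cache(const char *dev)
{
    /* First byte is the ATA command; FLUSH CACHE takes no parameters,
     * so the remaining bytes stay zero. */
    unsigned char args[4] = { FLUSH_CACHE_OPCODE, 0, 0, 0 };
    int fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, HDIO_DRIVE_CMD, args);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```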
It's called Not Keeping Info from the User(tm).
All that needs to be done is, instead of simply displaying "Windows is Shutting Down...", display what's going on, like "Flushing Disc Buffers..." and then "Awaiting Disc OK".
And people won't assume the PC has Hung and yank the cord (and if they did, they took an informed gamble and deserve the consequences.)
Sometimes I wish I was a plumber, then I'd know how to deal with other people's shit.
I dropped a disk once and thought it was fine too, but it failed a few weeks later. Apparently the impact released a drop of oil from the spindle bearing, and it crawled across the platter.
It didn't have anything important on it either so this may have contributed to it lasting as long as it did afterwards..
The command to sync all caches to disk (or its equivalent) _is_ issued for both SCSI and IDE drives (and for IDE via libata, which is itself in the SCSI layer). You can even confirm it by enabling debugging at a high enough level; then it will announce nearly every command it sends.
I know I should not respond to this, but:
Modern computers weren't designed under metric standards (8-bit bytes, not 10-bit bytes as the standard, for example).
1 bit has a binary state (1 or 0), i.e. 2 possible values
1 byte has 8 bits offering the values 0-255 (256 possible values)
If a byte (let's call it a decabyte) had 10 bits, then:
1 decabyte has 10 bits offering the values of 0-1023 (1024 possible values)
how is that metric (or even MORE metric) ?
(you will notice the number of different values 2,256,1024 are powers of two)
[decabyte is a made up term that just neatly melds deca (10) and byte (hungry)]
Heh, if I had teh funny mod points to give, you would so be getting them.
include $sig;
1;
Write barriers do help, for both IDE and SCSI (not sure about fancy RAID cards).
Some IDE drives that don't correctly implement the "cache flush" command are not supported, though.
Well if you know the size of the buffer, you could always write $buffsize zeros to disk.
perish
perish the thought
Remember, users are stupid, have the screen state:
'Making sure your data doesn't get corrupted, DON'T TURN THE COMPUTER OFF UNTIL WE SAY SO!'
Except that the drive is likely to finish writing those before it gets to everything that was already in the buffer, slowing down the synchronization process. The disk buffer is not a strict queue, because the write order is optimized for locality on the disk surface.
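To show why writing $buffsize zeros doesn't push the older data out first, here is a textbook C-SCAN (one-way elevator) ordering in C. This is a simplification under stated assumptions. Real drive firmware scheduling is proprietary and also considers rotational position, and the function name is hypothetical. The point is only that service order follows LBA position relative to the head, not arrival order.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for logical block addresses. */
static int cmp_lba(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Reorder pending write LBAs in place into C-SCAN order for a head at
 * `head`: all LBAs >= head ascending, then wrap to the lowest LBA and
 * continue ascending. Arrival order (age) plays no part. Returns 0, or
 * -1 if the scratch buffer can't be allocated (array left unchanged). */
int cscan_order(unsigned long *lbas, size_t n, unsigned long head)
{
    if (n == 0)
        return 0;
    unsigned long *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    qsort(lbas, n, sizeof *lbas, cmp_lba);

    size_t split = 0;                    /* first LBA at or past the head */
    while (split < n && lbas[split] < head)
        split++;

    size_t k = 0;
    for (size_t i = split; i < n; i++)   /* sweep upward from the head */
        tmp[k++] = lbas[i];
    for (size_t i = 0; i < split; i++)   /* wrap around to the start */
        tmp[k++] = lbas[i];
    memcpy(lbas, tmp, n * sizeof *lbas);
    free(tmp);
    return 0;
}
```

For example, suppose an old write is queued at LBA 10 and a new "zeros" write at LBA 45, with the head at 40. The new write is serviced first.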
It's tragic. Laugh.
This is a perfect example of someone who hasn't read why T&R did not have sync work that way. Probably doesn't know why someone old and crusty would type
# sync
# sync
# reboot
and not
# sync;sync;reboot
Hrmphh.
If you can't tell the difference, don't write about it!
This is why god created a flag to turn write-caching off.
It can be a severe performance penalty, depending on the technology - but it gives you data integrity in return.
I have news for you. The earth connection should not have ANYTHING going through it, unless you have a major electrical fault. Maybe you need to look at your wiring before blaming the plug...
Yes. And trust me, this isn't the only one.
66 (2/3) MHz times 8-bytes wide is 533 (1/3) MB/s. Here mega means 10^6, not 2^20. If it were megabinary, it'd be 508-something MiB/s. (*)
Look, computer usage of kilo has always sucked and been inconsistent. Always. Own up to it and fix it.
(*: I find it amusing that in order to find an example, I had to find one where they used "66 MHz" incorrectly, but no one actually writes 66.66... MHz, so forgive the irony.)
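Spelled out, the arithmetic in this post is:

$$
66\tfrac{2}{3}\ \text{MHz} \times 8\ \text{B} = \frac{200}{3} \times 10^{6}\ \text{s}^{-1} \times 8\ \text{B} = \frac{1600}{3} \times 10^{6}\ \text{B/s} = 533\tfrac{1}{3}\ \text{MB/s}
$$

$$
\frac{533\tfrac{1}{3} \times 10^{6}\ \text{B/s}}{2^{20}\ \text{B/MiB}} \approx 508.6\ \text{MiB/s}
$$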
"1024 except for metrics being stated in bits or hertz"
So a 2 GHz link that's 1 byte wide transfers data at 1.862 GB/s? This is just silly.
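The arithmetic behind that number, if "GB" here meant 2^30 bytes:

$$
2\ \text{GHz} \times 1\ \text{B} = 2 \times 10^{9}\ \text{B/s},
\qquad
\frac{2 \times 10^{9}\ \text{B/s}}{2^{30}\ \text{B}} \approx 1.8626
$$

Under the "1024 for bytes" convention, the link would have to be quoted as about 1.86 "GB/s". In SI units it is simply 2 GB/s.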
This is what you used to do for ages:
# sync
# sync
# sync
Seems there was a reason for it.
The green ground wire is a safety connection, not a current-carrying one. If disconnecting ground before the line and/or neutral fries your equipment either it is defective or your building is dangerously miswired.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Went out to his little house...$3,000 worth of cat toys...
"Reality is that which, when you stop believing in it, doesn't go away." - Philip K. Dick
propagation of myth. You have those in different camps. Spinning up does not necessarily cause more wear and tear. Drives go into low power states and spin up and down all the time. Many times you have a MPOHBF (Mean Power On Hours Between Failure) level on equipment. You do not have the same issue as with combustion engines that have an oil pan that needs to have viscous fluid distributed around moving parts to keep it from failing. There is evidence to both support and negate both the "Keep it on" and "Turn it off" camps.
The whole point though is the drive has absolutely no reason to cache the data. The OS caches for you, so you can use fsync() to say "make sure this is written" - I can see why that might be useful in many cases. If the drive is caching too, what's the point of fsync()ing anything.
Brad understands what fsync() actually does, but his point is it should match the real world needs, not just say "yeah it's written" when all it's done is moved from one bit of memory to another but not necessarily hit the disk.
Here's some fun math for you. A 9600 modem vs. a 14.4k modem.
Now, according to you, the 14.4k should mean ~'14063' bits a second.
How, then, do you explain that 14400 bits a second is exactly 1.5 times 9600? 14.4k modems are 1.5 times faster than 9600, and they transfer 14400 bits a second, and they're called 14.4k modems.
Looky, they measure modems where k=1000. They incidentally measure network speeds the same way.
In fact, they've always measured everything except memory that way. Your .5GB of memory may indeed be 512MB and 524288kB and 536870912 bytes, but it's the only thing that does that, except, oddly enough, some file and disk size measurements in the OS. Your 1Gb/s network card is exactly 10 times faster than your 100Mb/s card, not 10.24.
Which, incidentally, is damn good, because otherwise it would be hell to convert bus speeds to data transfer speeds.
And I think the fact people are arguing otherwise shows exactly why we need 'kibibyte' and whatnot, no matter how silly those names were. It's so bad it confuses us.
If corporations are people, aren't stockholders guilty of slavery?
It's called Not Keeping Info from the User(tm).
All that needs to be done is instead of simply displaying "Windows is Shutting Down..." display what's going on.. Like "Flushing Disc Buffers..." then "Awaiting Disc OK "
But don't you trust Microsoft?
Windows is shutting down, all is well. Worry not your pretty little user head, your big brother Bill is taking care of everything for you.
You can't take the sky from me...
When you are connecting two floating pieces of equipment by a cable, something has to equalize the static charges. Normally it would be the green wire ground. Lacking that, the static would have to discharge somehow. Most cables are designed so that either a ground pin or a grounded shield makes connection first, but you can't really count on that happening every time, can you?
The electronic circuits in the equipment that connect to externally accessible pins are supposed to be designed so that they can take some static discharge, providing another degree of protection.
Manufacturers of industrial equipment use a special tester (called HIPOT, after high-potential) to zap all external connections of their equipment with a 5kV charge, as a test. It takes a fairly extensive test program to certify equipment for HIPOT, one of the reasons being that static often causes latent internal damage that doesn't kill the equipment right off, but drastically reduces MTBF instead.
Can you really trust the manufacturer of every board in your homebuilt box to have done proper testing? I'd say, no...
As far as miswired buildings are concerned, every time I move I check all the outlets in the new apartment with a little 3-LED tester (you know the kind): two green LEDs are supposed to come on, and the red LED should stay off. I have yet to move into a house where every outlet was wired properly, and I move quite a lot. If I'm forced to use a power connection that doesn't have proper grounding, I always make sure that all the components of my computer system are plugged in to the same power strip, i.e. their ground pins are connected together. This way, even though the safety function of the ground connection is absent, I don't have to worry about my monitor zapping my video card as I plug in the video cable.
His contention that RAID controllers "lie" as well underscores his misunderstanding of storage hardware technology. While disks should probably flush the cache whenever requested, RAID controllers with write caching enabled should have battery backups. Forcing such a controller to flush its cache from the application level is unnecessary, unreasonable, and paranoid. An application programmer is not a "hardware guy" and should let the hardware engineers and driver programmers handle these considerations.
Gamingmuseum.com: Give your 3D accelerator a rest.
Your hard drive uses something other than binary to address the data stored on it?
Yah, LBA (logical block addressing). You ask the drive "give me block X", and it gives you block X. No binary involved. The fact that the numbers are transmitted over binary is unimportant. It could've been done over ternary, or avian squawkspeak consisting of fifteen symbols.
"Binary addressing" means "addressed by a bunch of address lines which hold values in binary". So a 1024-byte SRAM has 10 address lines. If I want to add more capacity, I have to add another address line, which gives me 2048 bytes. In other words, memory sizes are 2^N, where N is the number of address lines. Here, it does matter that the address is transmitted via binary: if each address line held 3 states, the total memory size would be 3^N, not 2^N.
Note that most modern memories multiplex the address lines into row and column addresses (having 30 address lines for a gigabyte of memory is awkward), but it's still binary addressing, and memory sizes still scale by powers of 2.
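A quick Python sketch of that arithmetic (an editorial illustration, not from the thread): with N lines that each carry k states you can select k**N locations, and splitting the address into a multiplexed row half and column half still gives a power of two. The function names and the 15/15 row/column split are hypothetical.

```python
def addressable_locations(address_lines: int, states_per_line: int = 2) -> int:
    """Number of distinct addresses selectable by the given address lines."""
    return states_per_line ** address_lines


def multiplexed_locations(row_bits: int, col_bits: int) -> int:
    """Row and column addresses sent one after the other over shared pins.

    The total is still a power of two: 2**row_bits * 2**col_bits.
    """
    return addressable_locations(row_bits) * addressable_locations(col_bits)


if __name__ == "__main__":
    print(addressable_locations(10))      # 1024: a 1024-byte SRAM
    print(addressable_locations(11))      # 2048: one more line doubles it
    print(addressable_locations(10, 3))   # 59049: hypothetical ternary lines
    # Hypothetical 15-bit row + 15-bit column: 30 address bits over 15 pins
    print(multiplexed_locations(15, 15))  # 1073741824 == 2**30
```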
That can be so fucking frustrating when you're working on a locked-up laptop...
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Couldn't they just stick a large capacitor or small battery on the hard drive that is only used for flushing the write cache to the platters in the event of a power failure? It should be a simple enough matter; we only need a few seconds here, and it would solve this whole mess.
I wish my drive would lie to my wife about who downloaded all those pictures.
I have never talked to my hard drive, but if I did, it wouldn't surprise me to catch it lying... I never did trust it...
#include bier;
Drives will NOT always flush. Consumer ATA drives are notorious for ignoring flush commands to get another half a point on some benchmark. In addition, many drives will *lie* about whether you can turn write cache on and off, too.
If you talk shit about Brad Fitz he has Fits.
I'm both drunk *and* stoned.
Should be a lot of moddin' fun today, lemme tell ya..
Oh look, someone read a man page and wrote about it. From 'man 2 fsync':
NOTES
In case the hard disk has write cache enabled, the data may not really
be on permanent storage when fsync/fdatasync return.
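For reference, here is roughly what an application does when it wants data on stable storage, sketched in Python (os.fsync wraps the fsync(2) call quoted above; durable_write is a hypothetical helper name). There are two separate steps, and neither can do anything about a drive that caches the write and reports success anyway.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to push it to the storage device.

    f.flush() moves Python's userspace buffer into the kernel;
    os.fsync() asks the kernel to write its cached pages to the device.
    Per the NOTES above, if the drive's write cache is enabled the bytes
    may still be sitting in that cache when fsync returns.
    """
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "example.dat")
        durable_write(p, b"hello")
        with open(p, "rb") as f:
            print(f.read())  # b'hello'
```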
"Those who would sacrifice essential liberty for temporary safety deserve neither liberty nor safety."
Real soon now, disks will be listed in TB, and you will have to relive all your years of anguish. Sorry, bub.
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"
You clearly don't understand the purpose of documentation or the actual operation of the (e.g.) Linux drivers.
The drivers not only flush the data to disk, but also issue the appropriate ATA commands to flush data from the device to stable storage (E7h). So, the correct behavior of fsync() is to flush any data in either the OS or the drive cache to storage.
With regards to documentation: even though the purpose of the syscall is to flush data to storage, it's clearly quite common for drives to ignore the command. It's important that developers know about this problem, so it's in the documentation as a "NOTE" at the bottom.
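The two layers of caching being argued about can be modelled in a few lines. This is a toy Python sketch under obvious assumptions (the Drive and Host classes are hypothetical, there is no real I/O, and a "power loss" simply discards the drive's volatile cache): fsync pushes the OS's dirty buffers to the drive and then sends a cache flush, and the data survives only if the drive actually honors that flush.

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    """Toy disk: an on-board write cache in front of the platters.

    honors_flush=False models a drive that acknowledges a cache flush
    (e.g. ATA E7h) without actually writing its cache out.
    """
    honors_flush: bool = True
    cache: dict = field(default_factory=dict)
    platters: dict = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.cache[lba] = data  # write-back: acknowledged immediately

    def flush_cache(self) -> None:
        if self.honors_flush:
            self.platters.update(self.cache)
            self.cache.clear()
        # A lying drive simply returns "success" here.

    def power_loss(self) -> None:
        self.cache.clear()  # volatile cache contents are gone


@dataclass
class Host:
    """Toy OS: dirty buffers in RAM, written out and flushed on fsync."""
    drive: Drive
    dirty: dict = field(default_factory=dict)

    def write(self, lba: int, data: bytes) -> None:
        self.dirty[lba] = data

    def fsync(self) -> None:
        for lba, data in self.dirty.items():
            self.drive.write(lba, data)  # step 1: OS buffers -> drive
        self.dirty.clear()
        self.drive.flush_cache()         # step 2: drive cache -> media


def survives_power_loss(honors_flush: bool) -> bool:
    drive = Drive(honors_flush=honors_flush)
    host = Host(drive)
    host.write(0, b"commit record")
    host.fsync()
    drive.power_loss()
    return drive.platters.get(0) == b"commit record"


if __name__ == "__main__":
    print(survives_power_loss(True))   # True
    print(survives_power_loss(False))  # False
```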
Write-Cache Enabled?
Wrong. The drive has a very good reason to cache data: only the drive knows which sectors are nearest to where [why? a) hotfix sectors; b) all drives now use LBA. CHS has been a hack since 500MB hard drives. The actual geometry is hidden from the OS, so the OS can only do very limited reordering of reads/writes.].
That is the point of TCQ, and why drives should buffer reads and writes and execute out of order.
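As an illustration of why reordering pays off, here is a simplified elevator (SCAN) ordering in Python. It is a sketch, not what any drive firmware actually does: it treats LBA distance as a proxy for seek cost, which is exactly the approximation the comment says only the drive can avoid, since only the drive knows the real geometry. The function names are hypothetical.

```python
from typing import List


def elevator_order(head: int, pending: List[int]) -> List[int]:
    """Simplified SCAN: sweep upward from the head, then service the rest downward."""
    up = sorted(lba for lba in pending if lba >= head)
    down = sorted((lba for lba in pending if lba < head), reverse=True)
    return up + down


def total_seek(head: int, order: List[int]) -> int:
    """Total distance travelled, using LBA distance as a stand-in for seek cost."""
    dist = 0
    for lba in order:
        dist += abs(lba - head)
        head = lba
    return dist


if __name__ == "__main__":
    queue = [90, 10, 60, 20, 95]
    print(total_seek(50, queue))                      # 285 in arrival order
    print(elevator_order(50, queue))                  # [60, 90, 95, 20, 10]
    print(total_seek(50, elevator_order(50, queue)))  # 130
```

With the head at LBA 50, serving the five requests in arrival order travels 285 blocks; the elevator order travels 130.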
FSYNC(2) Linux Programmer's Manual FSYNC(2)
NAME
fsync, fdatasync - synchronize a file's complete in-core state with
that on disk
[...]
NOTES
In case the hard disk has write cache enabled, the data may not really
be on permanent storage when fsync/fdatasync return.
No, it doesn't. Or at least the documentation doesn't seem to think so.
I vaguely seem to recall some older IBM machines that were BCD to the point of only having BCD addresses for memory--which would indeed give such numbers.
hawk
You're wrong. He's not.
He explains that he got the same results using the rawmedia interface. It was just a bitch to do the aligned writes in perl.
fsync on Linux happens to issue the underlying raw flush command.
The issue is NOT that fsync isn't guaranteed to do a physical flush to media! The issue is that even though on Linux it happens to issue that command, the underlying physical media controller lies about having done so.
If you think his understanding has anything to do with fsync itself, you need to go back and re-read this article.
You owe him a public apology. I doubt you'll have the balls to do it, though.
Of course their bottom line is more important than your data. Unless you force them to re-evaluate their bottom line by not buying their products.
Heh, been there, pulled the battery too.
-- Alastair
I'd bet that the AC comment you are referring to is Andre Hedrick, once known as the Linux IDE guy.
Here's a previous Slashdot interview featuring him.
I think everyone who knows much about hard drives has known about this issue for a long time. It would be a good introduction for the newbies if not for how inflammatory this story is: "OMG They're lying! You'll lose all your data!"
For a long time, absolutely any documentation about any filesystem (Linux journaling FSes, BSD softupdates, etc.) has explained that you need to disable the drive's cache to ensure consistency.
What surprises me, after all these years, they still haven't added battery-backed cache to medium/low-end systems. They could either add a battery to the hard drive's PCB, or integrate that with the motherboard's ATA controller. Either would be a big improvement, but I think having it on the controller/mobo would make it faster, cheaper, etc. In better systems, they could have an extra slot for HDD cache (EDO/SODIMM) that would be battery backed. In low-end systems, they could dedicate a portion of main RAM to the cache, as they do with onboard graphics. If they had done that, you probably wouldn't see journaling filesystems today, as the disk would never be inconsistent, and probably never need an fsck.
Yes, you can buy (expensive) ATA controllers with battery-backed onboard-cache, but the fact it isn't found in every controller shows a great deal of apathy.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
If they use 10^6 for MB/s, then that's different than how everyone else uses it, isn't it? It's just an excuse to make it look larger than it really is.
How hard is it to say 1.8 GB/s?
Actually, it's a flaw in the ATA specification: ATA drives can do a disconnected read, but there is no way to do a disconnected write.
Because of this, you can have a tagged command queue for read operations, but there is no way to provide a corresponding one for write operations.
SCSI does not have this limitation, but the bus implementation is much more heavyweight, and therefore more expensive.
The problem is exacerbated in that ATA does not permit new disconnected read requests to be issued while the non-disconnected write request is outstanding. Therefore, any write acts as a read-stall barrier.
In order to compete with SCSI on both write performance, and interleaved read/write operation performance, manufacturers added write caching by default, breaking the historical contract about when a write completes to stable storage vs. the write operation not returning until it did.
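A deliberately crude Python model of the stall described above, assuming every command is strictly serialized (no disconnect) and using made-up timings in microseconds: 8 ms to write to the media, 5 ms per read, and 0.1 ms for a drive to acknowledge a write into its cache. It shows why write caching made ATA look so much better, and what that costs: the cached writes are only in volatile memory when the drive reports them done, and still have to reach the media later.

```python
from typing import List, Tuple


def read_completion_times(ops: List[Tuple[str, int]], write_cache: bool,
                          cache_ack_us: int = 100) -> List[int]:
    """Completion time (microseconds) of each read in a strictly serialized queue.

    ops is a list of ("R" or "W", media_time_us). Every command occupies
    the channel until it returns, so a write blocks all later reads.
    With write_cache=True a write returns after cache_ack_us instead of
    after its media time (the data is only in the drive's volatile cache).
    """
    clock = 0
    reads = []
    for kind, media_us in ops:
        if kind == "W":
            clock += cache_ack_us if write_cache else media_us
        else:
            clock += media_us
            reads.append(clock)
    return reads


if __name__ == "__main__":
    ops = [("W", 8000), ("R", 5000), ("W", 8000), ("R", 5000)]
    print(read_completion_times(ops, write_cache=False))  # [13000, 26000]
    print(read_completion_times(ops, write_cache=True))   # [5100, 10200]
```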
Today, there are still a number of disks that *actually* lie, and there are a number of firewire/ATA bridge chipsets that do not propagate the FW sync into an ATA sync, even if you didn't end up with a disk that lied.
So you can be screwed if:
1) The disk lies about honoring the cache flush request (there was one series of Quantum ATA disks that did this, for which Quantum promptly provided a firmware update. I really like Quantum for this, and you can find the discussion on the FreeBSD-hackers mailing list archives).
2) The controller or bridge chipset responds to the flush request, but does not propagate it to the actual devices (there is one popular bridge chip that does this; since it was not recalled by the manufacturer, and there is no firmware update fix possible, in the interests of not being sued, I'm going to avoid naming names here).
3) The OS may not issue the command for user-perceived performance reasons relative to the competition (this is why, before the cache flush command existed in the ATA specification, FreeBSD turned the write cache back on by default, even though everyone knew that data integrity guarantees *would* go out the window).
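The three failure cases share one structure: a flush is a request that has to pass through every layer between the application and the media, and any layer that acknowledges it without passing it on voids the guarantee. A minimal Python sketch of that check (the function name and the layer names are illustrative placeholders, not real products):

```python
from typing import List, Optional, Tuple


def first_broken_layer(layers: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the first layer that swallows a cache-flush request, or None.

    layers is ordered from the OS down to the media, each entry being
    (name, propagates_flush). For the disk itself, "propagates" means it
    really writes its cache to the media before acknowledging.
    """
    for name, propagates in layers:
        if not propagates:
            return name
    return None


if __name__ == "__main__":
    stack = [("OS", True), ("FireWire/ATA bridge", False), ("disk", True)]
    print(first_broken_layer(stack))  # FireWire/ATA bridge
```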
Unfortunately, I can no longer just say "ATA sucks; use SCSI", because a number of SCSI disk manufacturers have started doing the same pig tricks with their SCSI disks (again, not naming names), and ignore the SCSI cache flush command, or ignore the mode page setting for synchronous I/O completion with tagged write commands (writing is slow, especially if you have to read an entire track to write a block).
Hopefully, this Slashdot article will cause the mainstream press to put enough light on this issue to shame the drive manufacturers into at least labelling actually compliant drives.
-- Terry
Good God Slashdot. I can't stand it. Here are the correct IEC S.I. prefixes. Get used to them.
kilo = 1000
kibi = 1024
kiki = 1066
acrin = 6666
kinki = 6969
mega = 1000000
mebi = 1048551 = 1024*1024
mixi = 1474569.3 = 1.44 * 1024 *1000
mipi = 3141593
mumbo = 1111111
mjumbo = 9999999
giga = 1000000001
gibi = 1368572279 = 1024 * 1024 * 1024
garbagi = 1254768991 = 1024 * 1024 * 1024 -1
giganti = 9999999999
If people would just commit these to memory I believe that life would be a lot easier.
Mainly caused by people who can't bother to wait to have their computers shut down correctly?
Maybe I should rephrase that. Since HDD makers are too lazy to guarantee data protection/integrity on their drives, could this be the reason we see OS degradation? It seems to fit perfectly, especially with the comment about faster systems shutting down before the HDD can write what it needs to the platters.
If this were the case, just out of STUPID curiosity, if every system shut down properly, could the security problems be lessened, since the OS would have less of a chance of degrading and developing more vulnerabilities?
I know this is considered an old story, but, still, I gotta ask this question and get some answers/opinions. To some point, this could explain (Even though I hate them, but rely upon them) at least SOME of |\/|$'s security problems that their patches just don't seem to fix?
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Note that while fsync() will flush all data from the host to the drive
(i.e. the "permanent storage device"),
Well, then quite frankly, that is retarded wording. If it doesn't hit the platters, then there is nothing permanent about it.
Besides,
In UNIX a block device doesn't necessarily have to be permanent storage device, it could be a RAM disk. So calling it a "permanent storage device" is wrong in general.
They should just fucking say that it flushes it to the underlying block device and be done with it. Fuckwits.
You're a fucking idiot.
Tell us why we should cut you some slack if you're too fucking retarded to read the source code for fsync.
Why do you say such inaccurate things? Drives going back to the first drives ever made used kilobyte = 1000 bytes. It has always been that way, and that is the correct way, because a hard drive is not binary-addressed; rather, its capacity is arbitrary, based on the number of bits that fit on a circle of metal. Nobody "previously agreed that a kilobyte was 1024 bytes", because that is a blanket incorrect statement.
Actually, way back in the day, a harddrive kilobyte was 1024 bytes. It was sometime around when a 20-40MB drive was all the rage that marketing-speak took over and all the confusion started. The whole bait-and-switch thing is what people are pissed off about.
CHS addressing wasn't quite binary--1024 cylinders, 16 heads, 63 sectors max.
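Working out the maximum from those limits, assuming the traditional 512-byte sector, shows the point: the 63 breaks the power-of-two pattern, and the product is the familiar ~528 MB (504 MiB) limit. A short Python check:

```python
CYLINDERS = 1024        # 10-bit cylinder number
HEADS = 16              # 4-bit head number
SECTORS_PER_TRACK = 63  # sectors are numbered 1..63, so not a power of two
BYTES_PER_SECTOR = 512  # assumed: the traditional sector size


def chs_capacity_bytes(cylinders: int = CYLINDERS, heads: int = HEADS,
                       sectors: int = SECTORS_PER_TRACK,
                       sector_size: int = BYTES_PER_SECTOR) -> int:
    """Total bytes addressable with the given CHS limits."""
    return cylinders * heads * sectors * sector_size


if __name__ == "__main__":
    total = chs_capacity_bytes()
    print(total)          # 528482304
    print(total / 10**6)  # 528.482304 (decimal MB)
    print(total / 2**20)  # 504.0 (MiB)
```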
Please see this link for a more complete description of what's going on:
http://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html
That came out of a discussion where someone claimed that fsync() on MacOS X was deficient (which is not true). I hope that helps clarify the issues.
Where? I found no reference to rawmedia in the linked article. If it was in another article I'm not aware of, please tell me. If I write something incorrect in one article, but I have 10 other articles where I'm correct, does that make a person wrong for pointing out a mistake in any of the articles somebody has written?
Who mentioned Linux? I certainly only mentioned it as one of the sources I pulled man pages for, for a standard library function available on pretty much all operating systems today. If you want to go platform-specific, fine. That's not what I was doing. The only place he mentions any specific operating system is in his later clarification in the top of his post.
I understood this after he wrote his clarification. The thing is, I didn't intend to criticise the person or his knowledge. I can't do that. I was discussing the content of the article, where he specifies that the fsync() call IS guaranteed to flush to disk (without specifying which operating system) and then blames hard drive manufacturers for fsync() not delivering the guarantees it's specifically documented not to guarantee.
I'm willing to admit, publicly, that he knows his shit, and that if I somehow doubted it, it's because he wrote a misleading article. I won't apologise for criticising a poorly written article though. I'm not a mind reader; I can't assess knowledge that's in somebody's head but not in writing.
He does have a point: enabling write caching on high-end drives by default is brain dead. If that were the point he was making, I'd agree. Instead, he went on about drives not obeying fsync(). Without knowing which operating system's fsync() he was talking about, I had no choice but to go by what the library standard says.
For full disclosure, I did reply to his reply to me (he probably sent the same reply to many people) acknowledging that I'm somewhat happy with the way his article reads now that he's added his clarification at the top.
Either way, I don't think there's anything left to discuss here. The author of the article has already updated his blog with the relevant information, and I have no beef with him, and I hope he has no beef with me.
For full disclosure: a copy of the followup he sent and my final reply is now up at my web space.
There. I had the balls to admit he knows his shit after all. Now, the ball's* in your court. Unless you have the balls to identify yourself when you reply, don't bother replying at all. I have nothing more to say to an Anonymous Coward.
pv2b
* Man. That was a bad pun. Please hit me in the head with a large anti-pun readjustion device or whatever.
No, it shouldn't. When dealing with metrics being measured in bits, kilo has had the SI definition.
In fact, they've always measured everything except memory that way.
Actually they've measured just about anything that would be measured as xxxx-bytes "that way".
Your .5GB of memory may indeed be 512MB and 524288kB and 536870912 bytes, but it's the only thing that does that, except, oddly enough, some file and disk size measurements in the OS.
Are we seeing the pattern here yet ? Like, maybe, that when data is being stored and referred to in *bytes* that kilo means 1024 ?
Your 1Gb/s network card is exactly 10 times faster than your 100Mb/s card, not 10.24.
That's because it's a measurement being made in bits, not bytes.
And I think the fact people are arguing otherwise shows exactly why we need 'kibibyte' and whatnot, no matter how silly those names were. It's so bad it confuses us.
I've never met anyone for whom it mattered who was confused about when kilo meant 1024 or 1000.
That's the stupidest fucking thing I've ever heard of, and not the least bit true.
Data transfer rates are always using 1000, regardless of whether they're in bits or bytes. It's just that almost no one writes data transfer in bytes, just like absolutely no one writes storage in bits.
But google for '60 megabytes USB' and see how many people assert that USB 2.0's 480 megabits a second is 60 megabytes a second, whereas in your universe it's apparently 62.9 megabytes a second. Google for '63 megabytes USB' and '62.9 megabytes USB' and see how far that gets you.
But, luckily, there is one place where data transfer is in bytes: IDE bus speeds, because those transfer whole bytes at a time. If you do the math, you'll see that when they're talking about, for example, ATA-100 drives, they are talking about them operating at 100 MB/s, not 100 MiB/s. They can't be talking about the latter; the damn bus is only 100 MHz. You can't transfer 104857600 bytes in 100000000 cycles.
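The arithmetic in the last two paragraphs, spelled out in Python. This checks the poster's numbers under the poster's own premise of one byte per clock at 100 MHz; real parallel ATA moves 16 bits per transfer with its own timing, so treat the cycle check as illustrating the argument rather than describing the bus. The names below are hypothetical.

```python
MB = 10**6    # decimal megabyte
MIB = 2**20   # binary megabyte (mebibyte)

# USB 2.0: 480 megabits per second on the wire, 8 bits per byte.
usb2_bytes_per_s = 480 * 10**6 // 8   # 60,000,000 bytes/s

# ATA-100 read both ways.
ata100_decimal = 100 * MB   # 100,000,000 bytes/s
ata100_binary = 100 * MIB   # 104,857,600 bytes/s


def fits_in_cycles(bytes_needed: int, cycles: int, bytes_per_cycle: int = 1) -> bool:
    """Can `cycles` clock cycles move `bytes_needed` bytes at this width?"""
    return bytes_needed <= cycles * bytes_per_cycle


if __name__ == "__main__":
    print(usb2_bytes_per_s // MB)                       # 60
    print(fits_in_cycles(ata100_decimal, 100 * 10**6))  # True
    print(fits_in_cycles(ata100_binary, 100 * 10**6))   # False
```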
If corporations are people, aren't stockholders guilty of slavery?
Marketing bullshit, pure and simple; in fact, I propose the phrase "marketing gigabyte", just to make it absolutely clear which definition is in use...
Personally I prefer 'weaselbyte' and 'rabidweaselbyte'. Marketing screwed the whole industry over just to make their numbers look good; I propose we use these terms to make them absolutely truthful again.
Clarification:
The drives may need to cache the write, but the issue here is whether they also fail to flush their cache when a sync command is sent over the ide bus.
There: Something at a specific location.
Their: Owned by someone.
Please make sure your english compiles.
How the hell did this get modded "Informative" anyway?
:P
Slashdot sense-o-humor meter:
E[\..........]F
How hard is it to say 1.8 GB/s?
Let me repeat this again. A 2 GHz link that transfers 1 byte on every clock cycle.
Two giga-transfers of 1 byte per second.
Giga is G. Transfers are unitless. Byte is B. Per is /. Second is s.
That's 2 GB/s.
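The same unit bookkeeping as a small Python calculation. The 1.8 figure quoted above is presumably what you get by dividing by 2**30 instead of 10**9 (about 1.86) and truncating; that reading is an assumption, not something the earlier poster stated.

```python
transfers_per_s = 2 * 10**9   # 2 GHz link, one transfer per clock
bytes_per_transfer = 1

rate_bytes_per_s = transfers_per_s * bytes_per_transfer

gb_per_s = rate_bytes_per_s / 10**9    # SI giga: "2 GB/s"
gib_per_s = rate_bytes_per_s / 2**30   # binary giga (GiB/s)

if __name__ == "__main__":
    print(gb_per_s)             # 2.0
    print(round(gib_per_s, 2))  # 1.86
```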
Units are standards. Yes, we've been screwing around with them for a while. It's time for us to grow up and act like adults.
This isn't different than everyone else. This is the same as everyone else.
From The Checkpoint Mechanism in KeyKOS by Charles R. Landau:
From EROS: A Novel Combination by Jonathan Shapiro. In other words, yes, some people actually have written a routine for yanking the power cord. I'm guessing that's why you're only a "2".
fish and pipes
Repeatedly used 10^9 instead of 10^6. Obvious brain crash there folks.
Bitter and proud of it.