Replacing Atime With Relatime in the Kernel
eldavojohn writes "Our friend Jeremy at the KernelTrap has dug up some interesting criticism of atime from Linus Torvalds. As Linus submitted patches to improve relatime he noted: 'I cannot over-emphasize how much of a deal it is in practice. Atime updates are by far the biggest IO performance deficiency that Linux has today. Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the past 10 years, _combined_.' And later severely beat atime about the head with a pointed stick: 'It's also perhaps the most stupid Unix design idea of all times. Unix is really nice and well done, but think about this a bit: 'For every file that is read from the disk, lets do a ... write to the disk! And, for every file that is already cached and which we read from the cache ... do a write to the disk!'" Well, I guess I can expect my Linux machine to become a little bit faster!"
After I mounted my system with nodiratime and noatime, I did not 'feel' any actual speed increase. I didn't do any hard testing, of course.
Disclaimer: Disregard the above post.
Since I have no idea what atime or relatime actually are, could someone just tell me which kernel settings should be changed re: this story for an ideal desktop system?
Thanks.
Seriously. For many years now, people have recommended mounting filesystems with the "noatime" parameter if you don't need to know atime.
In the various BSD flavors you can mount volumes "noatime", which is generally safe and does a pretty good job of keeping things moving. If you really need atime updates you can always remount the volume, but frankly not many people use it from what I've seen (maybe tail -f?).
I read the internet for the articles.
man mount says:
Is there something I'm missing?
Friends don't help friends install M$ junk.
And, for every file that is already cached and which we read from the cache ... do a write to the disk!'
Uh, why? Or rather, why not cache the atime update until disk usage is clear, THEN do the write, at least for already cached files?
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
If the poster had read the article, they would have noticed that Linus did not say those things that are quoted - Ingo Molnar did.
Spot the citation/reference error!
Amazingly, standard Unix filesystems keep time of last access (atime), change of status (ctime), and file modification (mtime) but do not remember when the file was first created, which is something I have frequently wished for.
Intron: the portion of DNA which expresses nothing useful.
WTF? I think I get it now, but I didn't until I started reading the comments... But how about someone give a synopsis for the non-kernel hackers here.
Do not meddle in the affairs of sysadmins, for they are subtle, and quick to anger.
And then I read the article. Very interesting, but the noatime trick will work till they get realtime sorted out. (unless you use mutt)
I see no reason why atime updates can't be postponed until some moment other metadata has to be flushed, or once a minute, whatever comes first.
The exactness of atime might suffer, but nobody will notice.
That said, I agree the noatime mount option covers most needs.
Aphorisms don't fix code. (Bart Smaalders)
Why should atime updates have to be written out to disk immediately? It probably isn't the end of the world if a few get lost when a filesystem isn't unmounted cleanly, and atime probably changes a *lot* more frequently than anything else in the inode, so why not have the filesystem keep the atimes separately from the rest of the metadata? It would take only a little space to hold all the atimes on most filesystems: 4 bytes per atime times, say, 250,000 files, plus 5% for indexing overhead (you'd have to map inode numbers to indices into the array of atimes), is just a little over a megabyte. If you set that aside somewhere, cached a copy in memory, and wrote out updates whenever there was some free bandwidth to the disk, you could merge updates for many different files together. As it is, you have to write out an entire block for every atime update, and you have to write it out immediately because it counts as an inode update and it'd be bad to let those fall out of sync.
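Here is a minimal sketch of the side-table idea in Python. `AtimeTable` is a hypothetical class, and plain dicts stand in for the in-memory cache and the reserved on-disk area. A real filesystem would do this at the block layer, so the sketch only shows how many per-file atime updates can collapse into one batched write. The sizing above works out to 4 * 250,000 * 1.05 = 1,050,000 bytes.

```python
import time
from typing import Callable, Dict, Optional


class AtimeTable:
    """Hypothetical in-memory atime table that is flushed in batches.

    Reads only touch the in-memory dict; the simulated on-disk copy is
    updated when flush() runs, so many accesses cost one batched write.
    """

    def __init__(self, flush_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_interval = flush_interval
        self.clock = clock
        self.pending: Dict[int, int] = {}   # inode number -> atime (seconds)
        self.on_disk: Dict[int, int] = {}   # stands in for the reserved disk area
        self.disk_writes = 0                # number of batched writes issued
        self._last_flush = clock()

    def record_access(self, inode: int, atime: int) -> None:
        # Only the newest atime per inode matters, so repeated reads of
        # the same file simply overwrite one another in memory.
        self.pending[inode] = atime
        if self.clock() - self._last_flush >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        # Write every pending update in one go.
        if self.pending:
            self.on_disk.update(self.pending)
            self.pending.clear()
            self.disk_writes += 1
        self._last_flush = self.clock()

    def atime(self, inode: int) -> Optional[int]:
        # The in-memory value wins; otherwise fall back to the flushed copy.
        return self.pending.get(inode, self.on_disk.get(inode))
```

The `clock` argument keeps the flush policy testable. The trade-off is the one raised further down the thread: anything still in `pending` is lost if the machine crashes before `flush()` runs.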
I don't see a story here, unless he's suggesting that the default be changed, and even then it's a pretty minor story. Damn, if only he had the source code, he could change it.
Note that this story has nothing to do with real-time systems, contrary to what I thought when I read the title.
If you read carefully, you will see it says rELAtime, not rEALtime.
Please correct me if I got my facts wrong.
I found that mounting using the noatime option on Solaris 8 and later was good for performance on filesystems which I mirrored via rsync. For example, we had a cvs server that was struggling a little to keep up because rsyncing the files out to the mirrors was expensive. Changing its filesystem options to include noatime meant that a lot of the unnecessary I/O disappeared (previously, I guess, the very act of checking if a file had changed would have incurred a disk write, and we were syncing all the files many times an hour). Other operations (e.g. cvs check-in, check-out) would normally operate on a smaller range of files; rsync was the worst case.
"You fell victim to one of the classic blunders! The most famous is never get involved in a flame war on Kerneltrap..."
"Can of worms? The can is open... the worms are everywhere."
It should tell you something, noticing that even nerds have no patience for anime.
Slashdot - where whining about luck is the new way to make the world you want.
If you lose power, you're in trouble, because now you have lost all those pending (how many?) atimes. And you don't know how many, or for which files, so if you rely on atime for something, you suddenly can't for any of those files (but which ones? You have no way of knowing).
If ever there were a time to bring it back...
Tweet, tweet.
The update part of the "locate" utility bugs me. First time that thing went off when I was at the PC, causing all kinds of mysterious disk activity, I thought I'd just been hit by a virus. Now I make sure to remove "locate" entirely from all my systems. Wonder how much "noatime" would help? I'd probably still keep "locate" off my systems.
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
So every time a file is used, it has to write an entry on the HDD, slowing performance,... so I have the noatime line in my fstab.
I have noatime set on my laptop.
Writing atime to the disk every time data is read from the cache keeps the disk spinning, burning battery power. "noatime" means the disk can shut down much of the time, spinning up only when writes must be synced or data not yet in cache must be read.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
I can't believe this is even an issue. I've been mounting everything noatime on my BSD systems for ages. The only thing I can think that it might possibly be used for are scripts that purge /tmp of files that haven't been accessed in a while. Even for that, ctime or mtime isn't a bad compromise.
Bit of trivia -- NTFS does the exact same thing with regards to access times. There's a registry entry to disable it buried somewhere deep that I don't remember at the moment, but if you're stuck on Windows it might be worth looking up to improve I/O performance.
I thought I'd tack this onto the FP, but the OP is severely mistaken. First of all, Ingo was the one who finally submitted the patch, and it was also Ingo who made the 'faster than all the improvements over the last 10 years combined' statement.
If you don't believe me RTFA. I just wish the OP had actually read the article before submitting it.
+5, Truth
Why don't they just implement something like SOFTUPDATES a la BSD?
Remove the 4-byte "access time" timestamp from the inode of every file.
Only the 4-byte "modification time" timestamp is needed.
Reading a file implies updating its access time, so the timestamp gets written to disk even if you never write anything to the file!!!
Without the "access time" timestamp, the filesystem will go faster, with fewer writes!!!
It's only valid up to 2038 because of the 32-bit timestamp.
It's better to use a 64-bit timestamp in microseconds (us) instead of seconds (s).
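The arithmetic in the last two lines can be checked with a short Python sketch. A signed 32-bit count of seconds since 1970 runs out in January 2038. A signed 64-bit count of microseconds covers roughly 292,000 years on either side of 1970. The sketch assumes the mean Gregorian year for that conversion, which only affects the rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# A signed 32-bit count of seconds tops out at 2**31 - 1.
MAX_SIGNED_32 = 2**31 - 1
rollover = EPOCH + timedelta(seconds=MAX_SIGNED_32)

# A signed 64-bit count of microseconds lasts far longer.
MAX_SIGNED_64 = 2**63 - 1
SECONDS_PER_YEAR = 365.2425 * 24 * 3600   # mean Gregorian year (assumption)
years_64_us = MAX_SIGNED_64 / 1_000_000 / SECONDS_PER_YEAR

print(rollover.isoformat())   # 2038-01-19T03:14:07+00:00
print(round(years_64_us))     # 292277
```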
Hey, Slashdot posted an article about me! [ They also renamed me to Linus - what more can a geek ask for? ;-) ]
In any case, the latest version of the better-relatime patch can be picked up from:
http://redhat.com/~mingo/relatime-patches/
Apply it, build it, reboot into the new kernel and enjoy a faster (and lower latency) desktop. (no fstab twiddling needed)
If all you keep track of is attributes on an entire file basis, once a file is modified the create date no longer has any meaning.
Because if you do that, you can't trust the access times in case of a crash. ... One solution to the problem might be to write the atimes to the journal, and then just update them once in a while, or perhaps even link the atimes to some separate part of the disk, so the atimes can be written in one go somehow.
Another might be a "weakatime" option, to update the atimes in the inode cache but mark them as something weaker than "dirty" ("smudged"?) when the data is re-read from cache. Opening, last closing, or reading more data into the cache would still dirty the inode. "Smudged" inodes would be written:
- If something else really dirtied the inode (i.e. updating wtime or last close of the file).
- The next time there is a sync that is the result of a command or has to write an actually dirty sector (so you won't spin up a laptop disk at periodic sync time just to update atimes)
- Just before the file system is unmounted.
A weaker version yet would be "wimpyatime", in which additional reads to cache after the open would "smudge" rather than "dirty" the inode. (This would be equivalent to "weakatime" with a file system that cached the entire file on open.)
But caching the atimes in memory is a recipe for disaster and inconsistency!
Not completely. They'd be updated on disk when the file is opened and (except for "wimpyatime") when new sectors are read into cache. You'd just delay the continual updating on the hard disk when an already open file is examined further. There'd be more inconsistency if the system crashed rather than shutting down gracefully, but you'd still be able to see that the file had been open.
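The proposed states can be modelled as a tiny state machine. This is a toy Python sketch, not kernel code. `Inode`, `InodeState.SMUDGED` and the `sync_hits_disk` flag are hypothetical names for the "weakatime" rules listed above. The "wimpyatime" variant would simply treat every read after open as `from_cache=True`.

```python
from enum import IntEnum


class InodeState(IntEnum):
    CLEAN = 0
    SMUDGED = 1   # only an atime refresh is pending (hypothetical state)
    DIRTY = 2     # must be written at the next writeback


class Inode:
    """Toy model of the proposed (hypothetical) "weakatime" option."""

    def __init__(self) -> None:
        self.state = InodeState.CLEAN
        self.atime = 0

    def _mark(self, state: InodeState) -> None:
        # Never downgrade: a dirty inode stays dirty until it is written.
        self.state = max(self.state, state)

    def open_or_last_close(self, now: int) -> None:
        self.atime = now
        self._mark(InodeState.DIRTY)

    def read(self, now: int, from_cache: bool) -> None:
        self.atime = now
        # Pulling new data in from disk dirties; re-reading cached data only smudges.
        self._mark(InodeState.SMUDGED if from_cache else InodeState.DIRTY)

    def needs_write(self, sync_hits_disk: bool, unmounting: bool) -> bool:
        if self.state == InodeState.DIRTY:
            return True
        if self.state == InodeState.SMUDGED:
            # Smudged inodes ride along with an explicit sync or one that has
            # to write real dirty data anyway, or get written before unmount.
            return sync_hits_disk or unmounting
        return False

    def written(self) -> None:
        self.state = InodeState.CLEAN
```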
How does Mac OS X handle atime? Is there a noatime option?
The Christian religion has been and still is the principal enemy of moral progress in the world. -- Bertrand Russell
One of my gripes with Linux is that one cannot spin down the disks to lessen their wear and tear.
I've been told that the kernel constantly needs to access the disk...
Is this the reason, or does something else prevent the disks from spinning down?
If you're using a desktop system with a hard disk you'll hardly notice any difference unless you hammer the system really hard.
Remember though that most Linux systems are either embedded (using mainly flash) or servers. In both these cases atime updates can be very damaging to performance and should be avoided unless there's a very good reason to turn it on.
Engineering is the art of compromise.
Problem solved. Many older/slower machines [laptops] can be sped up considerably by this step.
You can disable it in a few different ways. Once this is done, the Last Access Time attribute for newly created files will simply be their File Creation Time. Disabling Last Access Time may affect the operation of backup programs that use the Remote Storage service. YMMV
Or there's a registry key, which requires a reboot:
Cool! Amazing Toys.
Not all file systems support atime, and they can tell the kernel that. Result: that fs does not take the atime performance hit.
I did a document clustering calculation with 1.3 million small documents (average 2903 bytes each) stored on a Reiser3 partition. Mounting with noatime cut the time to read the documents from disk by 27%, and cut the time to read from cache by 52%.
(mount ext3 filesystem with noatime flag)
$ time for i in `seq 1 10000`; do touch file1.dat; done
real 0m15.231s
user 0m3.075s
sys 0m11.970s
$ time for i in `seq 1 10000`; do cat file1.dat >>/dev/null; done
real 0m14.326s
user 0m2.928s
sys 0m11.172s
(remount without noatime flag)
$ time for i in `seq 1 10000`; do touch file1.dat; done
real 0m12.629s
user 0m2.687s
sys 0m9.772s
$ time for i in `seq 1 10000`; do cat file1.dat >>/dev/null; done
real 0m12.401s
user 0m2.624s
sys 0m9.624s
Yes I think I'll stick with atime for now, thanks Linus.
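There are two caveats about the loops above. First, each iteration starts a new `touch` or `cat` process, so the timings are likely dominated by process start-up. Second, `touch` sets atime explicitly, and noatime does not suppress explicit updates. A more direct check is to backdate a file's atime, read the file, and see whether the kernel moved atime forward. The Python sketch below does that. The function name is hypothetical, and the result depends on the mount options of the filesystem holding `directory`.

```python
import os
import tempfile


def read_updates_atime(directory: str = ".") -> bool:
    """Return True if reading a file on `directory`'s filesystem moves its atime.

    atime is backdated to a day *before* mtime first, so strictatime and
    relatime mounts should both report True; noatime should report False.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"some data\n")
        st = os.stat(path)
        # Whole-second value, so filesystems with coarse timestamps store it exactly.
        old_atime_ns = (st.st_mtime_ns // 10**9 - 86_400) * 10**9
        os.utime(path, ns=(old_atime_ns, st.st_mtime_ns))
        before = os.stat(path).st_atime_ns   # re-read in case the fs rounded it
        with open(path, "rb") as f:
            f.read()                         # an ordinary read, served from cache
        return os.stat(path).st_atime_ns != before
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(read_updates_atime("."))
```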
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
POSIX compliant? POSIX's BUG compliant?
If the OS writes 10,000 files per second to the ramdisk/diskcache/disk sequentially, one after another, then
WHY DO ALL 10,000 FILES HAVE THE SAME 32-bit TIMESTAMP IF THEY ARE WRITTEN SEQUENTIALLY???
YOU DON'T KNOW WHICH FILE WAS WRITTEN AFTER WHICH, AND IT CAN BE A PROBLEM.
But sometimes you need it... whether to project your savings or to figure out if a particular file was read within the last year.
My problem with atime is that it is not universal enough. For example, reading a file via mmap() or sending it directly to a socket via sendfile() (both methods widely used by web-servers) will not update its atime. The access-timestamp should be updated every time a file is opened for reading, rather than when a read() is issued on it...
So, when I wanted to report, when my little piece of software was last downloaded (via HTTP), I could not, unfortunately, rely on the file's atime...
In Soviet Washington the swamp drains you.
...updating atime on close(2). It's been a long time, but I think that's what the best operating system of all time (TOPS-20) did.
A few comments up our venerable friend Twitter fell into the same trap:
http://linux.slashdot.org/comments.pl?sid=264507&cid=20162589
Considering how much editorial work the 'editors' around here actually do I wouldn't be surprised at people habitually auto-correcting the spelling in an article title and summary. Still, in the context of atime, relatime makes a little bit more sense than realtime.
I may make you feel, but I can't make you think.
correspondence. You may want to know when you first began writing a letter.
or you may want to know when you first installed a driver/application.
etc etc etc
www.purevolume.com/martyd
Set up a web site and take donations via Pay Pal for a new keyboard.
If you mod me down, I shall become more powerful than you could possibly imagine.
OK, other than the handful of possible problems I can think of: if you have a file open, why isn't that part of the file system data cached also? I apologize for the lack of deep analysis here, but if the argument is that storing the access time of a file that is in cache means hitting the platter to update that time, and that you avoid that non-cache hit by not storing AT, then why not have that block of the file system cached when you open the file? When you close the file, you also do a final flush on that file system block; otherwise the cache gets flushed on the normal cache flush criteria?
...and yes, it DOES matter.
Well, only if you want someone to actually _find_ what you wrote later.
I enabled this on my Linux box, and it completely flies! Vista can't hold a candle to the speed I'm experiencing now! Just another great reason to use Linux!
What, you want to shoot yourself in the foot?
Let my sell you the gun.
I'll even put special features on it for you, such as laser sighting and recoil suppression and a silencer. (The silencer is so you can hear yourself scream.)
If you report a creation time, you should report a creation time.
That would mean your file system would be able to tell the difference between a file opened and saved under a different name and a file opened and created from scratch. And, since the concept of template is somewhere between ex-nihilo creation and simple editing, the file system should be able to tell the difference between ex-nihilo and templated creation, and saving under a different name.
In the email discussion linked in the article, kernel devs were wondering the same thing... why doesn't mutt/bash just use inotify on the mailbox/maildir if available at config/build time? (Linux does have an event system in epoll and inotify -- and we almost got that unified under kevent, and it's threatening to happen again, pulling in AIO too)
They were also concerned about breakage of other systems that are not open source (HSMs) which use atime for usage tracking. Noting that HSMs and atime-sensitive maintenance programs usually don't exhibit time resolution in policies more fine than a day, they came up with the relatime = update at least once a day compromise.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
The inode representing the file and the file data itself are located at two different places on the disk. Both get read and cached when the file is opened and read.
Op1) Read inode into memory (cached)
--- "Open" file for reading (inode gets atime updated, cached inode is now "dirty")
Op2) Read requested blocks of file into memory (cached)
... Some stuff happens with the file, and other files, or maybe nothing...
OpN) bdflush getting impatient or a filesystem sync prompts the dirty inode to be written back out to disk and the memory reclaimed
So for every file read, there are three distinct operations, one at location A on the disk, another at location B, then a third at A again.
This is with normal atime behavior. Note that the operation at "N" is not necessarily done in a slow way; it might get batched with a bunch of delayed writes to that region of the disk. But if your system isn't doing a bunch of reading and writing all over the disk, then this is a very noticeable seek in the grand scheme of things, especially if it's part of a linear search through files (like via grep or something).
This relatime patch changes things so that atime is simply not updated, and the inode is not marked "dirty" (a candidate for writing back), unless the PREVIOUS value of atime was at least a day old or was not newer than the file's mtime or ctime.
If the file is written to, the inode gets dirtied anyway for the new mtime, so any pending atime change goes out with it; that always happens, as you need to update the inode anyway when the length or data blocks change.
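Written out as a pure function, the check made on each read looks roughly like this. This is a Python sketch that assumes timestamps in whole seconds; `relatime_should_update` is a hypothetical name, and the kernel compares full timestamps rather than seconds.

```python
DAY = 24 * 60 * 60  # seconds


def relatime_should_update(atime: int, mtime: int, ctime: int, now: int,
                           interval: int = DAY) -> bool:
    """Sketch of the relatime decision (all timestamps in seconds)."""
    # File changed since it was last read: update, so that "new since
    # last read?" checks (mutt and friends) keep working.
    if atime <= mtime or atime <= ctime:
        return True
    # Otherwise refresh at most once per interval (one day by default),
    # which is enough resolution for HSMs and "unused file" cleanups.
    return now - atime >= interval
```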
I can't believe that I haven't seen more people coming down on the side of atime. I use it to troubleshoot very frequently. For example, if I try to start something up and it fails, I can easily see if it got as far as reading its config file by checking the atime on it. When I'm looking through a machine to find out what files are relevant to its current config, I can do an ls -lurt and see what files were read recently and which haven't been touched for years. Yes, there are machines out there that have been cranking away doing production work for years. To do something, and then see what that something touched, is very handy.
Also, you can kind of see what an intruder did on a machine by looking at the atimes on the shared libraries, header files, etc. on a machine that you think may have been compromised, especially if that machine normally just grinds away doing a couple different things most of the time.
Seriously, that's what the noatime flag is for if you want it. But I would never use it unless I had a damn good reason.
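The "do something, then see what it touched" trick can be scripted. The Python sketch below uses hypothetical helper names. It snapshots atimes under a directory, runs an action, and lists the files whose atime changed. It only works as well as the mount options allow. Under relatime, a file re-read within a day whose atime is already newer than its mtime will not show up. Under noatime, nothing will.

```python
import os
from typing import Callable, Dict, List


def atime_snapshot(root: str) -> Dict[str, int]:
    """Map every file under `root` to its atime in nanoseconds."""
    snap: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # stat() itself does not update atime.
                snap[path] = os.stat(path, follow_symlinks=False).st_atime_ns
            except OSError:
                pass  # vanished or unreadable; skip it
    return snap


def files_touched_by(root: str, action: Callable[[], None]) -> List[str]:
    """Run `action`, then list files under `root` whose atime changed (new files included)."""
    before = atime_snapshot(root)
    action()
    after = atime_snapshot(root)
    return sorted(p for p, t in after.items() if before.get(p) != t)
```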
...on incremental or differential system backups? It would seem to have no relevance to them, but one must be cautious with such things.
Never go in against a Hungarian when death is on the line!
Aha-ha-ha! Haaa-ha-ha!! Ha--...
[thud]
Do not attribute to malice that which can be easily explained by incompetence.
This sounds like a job for hybrid hard drives. atime updates can be cached in the drive's flash RAM, and the drive only needs to spin up when the cache nears full.
1011 1010 1101 1100 0000 1111 1111 1110 1110
Here are my results for each option, doing a find and a recursive ls:
(Results table, values not preserved: atime and diratime enabled / noatime / noatime and nodiratime.)
Weird, for me it was faster with just noatime. This is on ext3. Maybe you need a large number of directories to see a boost with nodiratime.
These are just simple tests, and are inconclusive. Your individual results may vary. All natural, no prescription needed! Call now! Oops, got carried away again.
Shameless plug alert: Game server control panel
I don't care about debugging and stuff, but it's really useful for my porn collection. If you have a big porn folder, just sort it by atime and you can see something "fresh" every day.
for tracking which files are no longer used. That is why backup utilities such as tar have an --atime-preserve option to avoid updating atime during a backup. The proposed relatime option preserves the function of finding files no one uses anymore - for that purpose, 1 day resolution is fine. HSM systems depend on atime to know what to archive. Again, 1 day resolution is fine for that purpose.
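Here is a Python sketch of the "which files does nobody use any more" query. It is similar in spirit to `find ROOT -type f -atime +N`, though find rounds to whole days a little differently. `not_accessed_for` is a hypothetical helper. On a noatime mount every file eventually looks unused, which is the breakage that relatime's daily refresh avoids.

```python
import os
import time
from typing import Iterator, Optional

DAY = 24 * 60 * 60  # seconds


def not_accessed_for(root: str, days: int,
                     now: Optional[float] = None) -> Iterator[str]:
    """Yield files under `root` whose atime is at least `days` days old.

    Day granularity is all this needs, so a once-a-day atime refresh
    is good enough for it.
    """
    now = time.time() if now is None else now
    cutoff = now - days * DAY
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path, follow_symlinks=False).st_atime <= cutoff:
                    yield path
            except OSError:
                continue  # vanished or unreadable; skip it
```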
I don't know about Macs, but it's easy to disable this feature (last access time) on Windows:
"FSUTIL BEHAVIOR SET DISABLELASTACCESS 1"
This tends to speed up removable drives quite a bit. It's still a speedup, but less of one, on non-removable drives.
If you want a bigger speedup on removable drives, you can also enable write caching on a drive by performing the following steps: go to explorer "My Computer" and select the drive letter (e.g. "E:") -> select Properties (right-mouse menu) -> click Hardware (tab) -> select the drive hardware device name (in list) -> click Properties (button) -> click Policies (tab) -> select "Optimize for Performance" (radio button) -> click OK (button). Note that if you do this, you have to use "Safely Remove Hardware" in the task tray to remove the drive, or you can get write errors.
Is this something that's limited just to Linux and the ext3 filesystem?
I'm particularly curious as to whether it's an issue on Mac OS X with the HFS filesystem also, and whether it would be possible / advisable to force Mac OS X to mount the root HFS partition as noatime/nodiratime.
OS X doesn't use a traditional UNIX-style fstab, so it's not immediately clear to me how you'd change the mount options (last time I checked, disk mounting was all just in /etc/rc, but perhaps it's been moved into that new SystemStarter business since I last checked). It seems like the same things ought to apply to HFS -- it has an attribute that's functionally identical (at least, I think it is -- feel free to correct me on this) to atime, stored in the catalog file -- but I'm not familiar enough with the workings of the filesystems to know if that's actually the case.
If this doesn't occur in other OSes (I picked OS X because it's the other OS that I use frequently, and it uses a default filesystem that's pretty different in design from ext2/3), it seems like it might be worthwhile to look at why that is, and what tradeoffs other OSes have made to avoid the same issue.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
shocking. simply shocking.
It shows how specialized they are and how detached from the real world they are (no surprise it took them ages to accept that something like CK's stuff would be helpful). Linux devs should find out how their operating system is used some time. I suppose that's why most people use distros.
It should pretty much be standard practice to mount filesystems noatime (and nodiratime,barrier=1 on ext3). Most sane distros have the noatime option even in their install GUI/UIs; it's not like some "new fangled" stuff like barrier=1 (which tries to make sure that data is actually flushed to nonvolatile storage on syncs, and not just the drive's volatile buffers).
Heh, and to quote Ingo: "So for most file workloads we give Windows a 20%-30% performance edge, for almost nothing".
Ingo even thinks they give Windows a performance advantage because of atime, when Windows actually does something like atime. The few people who know how to set up Windows systems (call them oxymorons if you want ;) ) disable atime on Windows too:
See: http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/regentry/46656.mspx?mfr=true
Search for NtfsDisableLastAccessUpdate for more.
That said, relatime sounds vaguely interesting from an academic POV, but there are bigger fish to fry. Most people who care about performance should just disable atime.
Most apps, including email, should NOT use atime - at _worst_ they should use "modified time" (and even then use something else if possible). Atime is more for forensics, not for apps (after all, the app can't tell if it's some random user/app who tried to access the file, so what's the point?), and nowadays if someone's got unauthorized access to your system, whether you have atime on or not doesn't make a big difference at all.
The atime problem occurs when you use mutt with mbox-formatted stores, since it defaults (according to the Mutt wiki, anyway) to using atime in order to show whether there is new mail in a particular "folder." (A 'folder' in mbox format being, of course, a single flat file.) It doesn't use atime for message-specific status, only for the status of the overall mbox folder.
There is apparently a compile-time option ("--buffy-size") that tells mutt to use the mbox's file size in lieu of atime, which is another alternative for noatime systems, although it seems like a bit of a hack to me.
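Reduced to its core, the atime-based test asks "was the mailbox modified after it was last read?". The Python sketch below is a simplification, not mutt's code. Both function names are hypothetical, and the size-based variant only mirrors the idea behind the buffy-size option.

```python
import os


def mbox_has_new_mail(path: str) -> bool:
    """New mail if the mailbox was modified after it was last read.

    On a noatime mount reads never advance atime, so once mail arrives
    this keeps answering True; relatime still refreshes atime when
    atime <= mtime, which keeps the check working.
    """
    st = os.stat(path)
    return st.st_mtime_ns > st.st_atime_ns


def mbox_grew(path: str, last_seen_size: int) -> bool:
    """Size-based alternative: new mail if the file grew since we last looked."""
    return os.stat(path).st_size > last_seen_size
```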
atime?
That's not enough. In addition to atime and mtime, I think we need a real creation time stamp, as well. And a flag to show whether the current file was created ex-nihilo, by copying an existing file, or from an intentional template. Or by overwriting with a new version not immediately derived from the current one on the system.
To help keep this making sense, the OS would, by default, want to implement edit-by-copy-and-rename instead of edit-in-place.
A count of edits-from-creation could be interesting. It might be used to flag the difference between creation ex-nihilo and creation-by-copy, but then you would be back to not being able to tell the difference between an edit and a branch.
Yeah, I think version control should be built into the OS for general use computers. It would sure help me keep all the cruft out of my backups.
Metadata in the directory record might help reduce the overhead of tracking all that, but versioning definitely slows a system down.
Of course, as many have already noted, many parts of the directory tree don't need to track atime, which is the worst burden. But it would sure be nice to be able to look at the metadata and have an idea, modulo deliberate or accidental manipulation of metadata, which files in the system directories were unaltered from the install of that version of the OS.
Of course, while I can sit here and daydream, I can't be bothered to actually implement any of this.
joudanzuki
Well, not exactly, but I got a used notebook drive. A year after I started using that notebook as my home web server, heat and centrifugal force and such seems to have spun the grease away from the contact surfaces of the bearings. Or, the disks may have been sleeping and restarting every ten minutes to run the dynamic dns client. (I really need to figure out how to keep that script and the perl it uses in a RAM disk or something.)
Anyway, it started humming from the re-seeks caused by disks that couldn't maintain speed, and then eventually the disk froze.
Thought I had lost the data.
But I thought twice about it. If I just trashed the drive, the data was gone. I couldn't afford to send it in to a professional recovery service, and I did have backups that were sort of recent, anyway. And I wanted to show the insides of the drive to my son.
So I opened the enclosure, showed it to my son but didn't let him touch it, rotated the disks by hand (very carefully avoiding touching or letting dust fall on the disk surfaces), closed it up, plugged it into a USB shirt-pocket enclosure, and pulled off my data.
Turns out I can still use the drive to carry unimportant data around. (Very light use.) I don't trust critical data on it, of course.
I would have assumed mtime (file modification time) would be more useful for this purpose. There's no sense updating the backup of my entire music collection if the only "change" I've made is to listen to it.
Of course, rsync is incredibly useful in this case, since it can transfer only the parts of my collection that have changed (ie, nothing). The man page is thin on technical details, though; specifically, whether it relies on atime or mtime or something else to do its work.
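For what it's worth, rsync's man page describes its default "quick check" as comparing file size and last-modification time, so atime plays no part. Below is a Python sketch of that comparison, using a hypothetical `quick_check_differs` helper. Comparing whole seconds is an assumption, meant to tolerate timestamps that lose precision when copied.

```python
import os


def quick_check_differs(src: str, dst: str) -> bool:
    """Size-plus-mtime comparison, in the spirit of rsync's default quick check.

    atime is ignored: listening to (reading) a file leaves its size and
    mtime alone, so it would not be re-sent.
    """
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return True  # nothing at the destination yet
    s = os.stat(src)
    # Whole seconds, since copied mtimes may not keep full precision.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```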
SIG: 11
All rites reversed 2010
At least according to the POSIX spec. Because this is (as you can easily see) a bloody stupid thing to do, some systems don't do this and only update atime on open of a file.
"When did I do a ./configure"?
"When did I receive this file"?
Seriously, atime is a performance hit and quite pointless except for a few legacy applications like mutt. It's amazingly bad for mail and news and web and build servers.
Is there any reason not to simply turn it off for /, besides mutt?
And to the people still thrilled by mutt and its use of easily parseable plain text files, seriously consider switching to a Maildir based system that provides the same unmodified email power, and doesn't have the historical limitations on maximum number of messages that mutt used to have and still may have.
Although from the thread they agree, Ingo != Linus
I can get to my mail folders from a variety of places using a common standard...
My Journal
I don't get it, wouldn't you POKE or STAB someone with a pointed stick and BEAT them about the head with a (fish | club | stick | cat) etc?
Self awareness - try it!
Can be done with a command line too:
fsutil behavior set disablelastaccess 1
FYI, Ingo said those words, not Linus.
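If you have to update a file that could be used by other processes, one easy way to do that safely is to create a new file in the same directory with a temporary name, write the new contents to it, and rename it over the original name. rename(2) replaces the old file atomically, so there is no need to unlink the original first (doing so would briefly leave no file at all).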
If you don't do something like that, there's a possibility that a process could try to access the file as you're updating it and get inconsistent results.
Just try and come up with a scheme for updating a simple file in an environment without mandatory file locking that doesn't involve something like that.
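Here is a Python sketch of the temp-file-and-rename pattern, using a hypothetical `atomic_replace` helper; `os.replace` maps to rename(2). Note that `tempfile.mkstemp` creates the new file with mode 0600. A real tool would therefore also have to restore the original's permissions, which is exactly where the chmod() point below comes in.

```python
import os
import tempfile


def atomic_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`; readers see the old file or the new one, never a mix.

    The temp file lives in the same directory because rename() is only
    atomic within a single filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make the new contents durable first
        # rename(2) swaps the name atomically; no separate unlink needed.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```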
If you don't chmod() a file after creating it, you can't guarantee the permissions on the file will be exactly what you want unless you've cleared the umask. But since the umask is a process-wide attribute, if you muck with it you run the risk of affecting something else.
In other words, it's easier and safer to write:
than it is to write:
The latter can impact other threads running in the same process. And that open() call could block for a long time - imagine some HSM system where the file has to be recalled from tape before you can truncate it.
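The two snippets being compared did not survive. What follows is only a sketch of the safer pattern the comment describes, written in Python with a hypothetical `create_with_mode` helper. It opens the file normally, so the umask still applies, and then fchmod()s the descriptor, so no process-wide state is touched. Truncation is left out; `os.ftruncate(fd, 0)` could be added if the file should start empty.

```python
import os


def create_with_mode(path: str, mode: int) -> int:
    """Create (or open) `path` and force its permission bits to exactly `mode`.

    Instead of clearing the process-wide umask around open(), which other
    threads would also see, open normally and then fchmod() the descriptor,
    which affects only this file. Returns the open file descriptor.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, mode)  # umask still applies here
    try:
        os.fchmod(fd, mode)  # now set the exact bits, regardless of umask
    except BaseException:
        os.close(fd)
        raise
    return fd
```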
Not trying to flame here, just a bit baffled by the problem ...
Back when I first started learning unix, a few decades ago, one of the standard examples of its cleverness was the way that it did in-memory buffering of all files. This meant that changes to a file (including the inode stuff like time stamps) never required a disk write. The block's "true" value is in the buffer, not on disk, and all accesses to the block by any process will read the buffer, and thus see the true value. Writes are only needed when buffer space is full and some blocks need to be flushed to disk. And especially clever were the "disk strategy" routines that did read-ahead, making most reads instantaneous because the data was already sitting in a kernel buffer.
The (f)sync system calls are there explicitly to handle the obvious problem: In case of a power failure or other system crash, everything in the in-memory buffers is lost. Competent programmers are aware of this, and don't trust whatever heuristics the kernel might have for flushing the buffers to disk; you call fsync occasionally for data that you don't want lost in a power failure. Otherwise, you assume that the info is kept in memory, and the info on disk may be out of date. Having files buffered for hours or days is an old unix "problem", often solved by running a background task that calls sync once an hour.
Does this atime problem mean that linux systems actually flush every buffer after every change? If so, that's a blatant violation of the original unix design. Yeah, I know it's safer when there are system crashes. But the buffer design was put there for a very good reason: In a lot of situations, speed is more important than providing survival past a power failure. And for many files (e.g., those in /tmp), there's no need at all for the data to survive the process's death; flushing its data to disk is a total waste of time, to be avoided when possible.
On almost all systems, a simple strategy can be used. For example, the kernel can flush buffers slowly on a cyclical basis, partly dependent on cpu load. You probably also want to flush any buffers for a file that has just been closed by a writer. There are lots of strategies that don't impose a measurable load on the system.
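A toy Python model of the write-back behaviour described in this post. Writes only mark a buffered block dirty. A periodic tick flushes whatever is dirty, and an explicit fsync() flushes on demand. The class, its fields and the 30-second default interval are all hypothetical. This is not how any particular kernel implements its page cache.

```python
from dataclasses import dataclass, field

@dataclass
class WriteBackCache:
    """Toy buffer cache: writes land in memory; disk writes happen later, in batches."""
    flush_interval: float = 30.0                  # hypothetical seconds between background flushes
    buffers: dict = field(default_factory=dict)   # block number -> current ("true") contents
    dirty: set = field(default_factory=set)       # blocks whose buffered copy differs from disk
    disk: dict = field(default_factory=dict)      # stand-in for the on-disk copy
    last_flush: float = 0.0
    disk_writes: int = 0

    def read(self, block: int) -> bytes:
        # Every reader sees the buffered value, even when the disk copy is stale.
        if block not in self.buffers:
            self.buffers[block] = self.disk.get(block, b"")
        return self.buffers[block]

    def write(self, block: int, data: bytes) -> None:
        self.buffers[block] = data
        self.dirty.add(block)  # no disk I/O yet

    def _flush(self, blocks: set) -> None:
        for b in sorted(blocks):
            self.disk[b] = self.buffers[b]
            self.disk_writes += 1
        self.dirty.difference_update(blocks)

    def tick(self, now: float) -> None:
        # Periodic background flush, like an hourly sync job or a writeback timer.
        if now - self.last_flush >= self.flush_interval:
            self._flush(set(self.dirty))
            self.last_flush = now

    def fsync(self, blocks) -> None:
        # Explicit request from a program that can't afford to lose these blocks.
        # A "flush on close by a writer" policy could call this too.
        self._flush(set(blocks) & self.dirty)
```

Writing the same block 100 times and then letting one flush interval pass costs one disk write, not 100.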
But flushing the inode after every read is just dumb. Why would anyone ever configure a kernel that way? What were they thinking?
Really, I'm not being insulting; I'm baffled by why anyone would do this. Is there some situation where you want this done? Or was it just overkill of a hypothetical problem that in practice isn't a problem?
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
ADDENDUM:
To my last post on this subject, here -> http://linux.slashdot.org/comments.pl?sid=264507&cid=20169817
(The post above this one, it's parent, etc.)
One OTHER place that MIGHT be affected, POSSIBLY (& I WOULD LIKE FEEDBACK ON THIS IF POSSIBLE):
Modern defraggers, & also the most modern versions of Windows' "prefetch" features!
(IIRC, these have been around since WinXP, through Server 2003 & VISTA. I'm not SURE about 2000; IIRC it does NOT have this, but it's been a LONG TIME since I ran 2000, which is why I am no longer sure.) In THEORY, this helps performance!
Thus - & THIS, I have always wondered about:
QUESTION:
NOW, IF the file access timestamp is NOT updated on NTFS filesystems, do the modern features mentioned above (the ones that use last file access time) still get the data they need? Can Windows "prefetch" & modern defraggers like Raxco PerfectDisk, Executive Software's Diskeeper, & YES, even the NATIVE defragger in Windows XP/Server 2003 & VISTA, still GAIN PERFORMANCE BY PLACING the MOST RECENTLY or, rather, MOST USED files closer to the outermost, fastest tracks of the HDD?
Thanks for the answer here, & YOUR TAKE ON IT!
A particular file having been accessed recently is NOT enough; what matters more is how OFTEN it gets accessed. (Daily, every day, & a lot? Move it up to the faster areas of the disk, especially executables more than data files, though doing it to data files might not hurt either, especially "init" files like the registry.)
Thus, what about turning this ON (i.e., stopping NTFS from updating a file's last-access timestamp in the $MFT)? I think it does NOT hurt defraggers... it's more the amount of access over a LONG PERIOD that matters here...
E.G. #1 -> I access a file TODAY ONLY & reboot... but I never access that file again for WEEKS/MONTHS/YEARS. What would have been the point of moving it up to a faster area of the disk drive?
E.G. #2 -> HOWEVER, conversely, I access a file TODAY AND ALL THIS YEAR & reboot... Moving it forward HAS benefits, since the outermost tracks of a drive ARE FASTER!
(In other words, I believe the last-access datestamp alone is NOT enough, & is NOT ALL that the "Prefetch" features of XP/Server 2003/VISTA use; they also use how OFTEN you access a file over a LONG period!)
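A small Python sketch of the recency-versus-frequency distinction drawn above, using a made-up access log of (file name, day) pairs. The two ranking functions are hypothetical. They say nothing about what Windows Prefetch or any defragmenter actually records or uses.

```python
from collections import Counter

def rank_by_recency(accesses):
    """Order files by most recent access only: all a last-access timestamp tells you."""
    last = {}
    for name, day in accesses:
        last[name] = max(day, last.get(name, day))
    return sorted(last, key=lambda n: last[n], reverse=True)

def rank_by_frequency(accesses):
    """Order files by the number of distinct days they were accessed in the log."""
    days_seen = Counter(name for name, _ in set(accesses))  # set() collapses same-day repeats
    return sorted(days_seen, key=lambda n: days_seen[n], reverse=True)
```

Take a file used every day of the year and a file opened once on the last day. Recency alone puts the one-off file first (E.G. #1). Frequency puts the daily file first (E.G. #2).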
APK
P.S.=> IIRC, the idea of "placing the most used files near the outermost, fastest tracks of a hard disk drive (HDD)" actually came from INTEL, which pioneered it... then again, you have the theory that placement of many files, mid-disk, allows binary searches to work faster (as they always 'cut' the entire dataset in 1/2, & search midpoint lesserthan OR greaterthan etc. for a sought item (mathematically, that is))... apk
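For readers unfamiliar with the halving the P.S. describes, here is a plain binary search over a sorted Python list. It only illustrates the "cut the dataset in half, compare against the midpoint" idea. It makes no claim about how NTFS or any defragmenter lays files out on disk.

```python
def binary_search(sorted_items, target):
    """Return the index of `target` in `sorted_items`, or -1 if it is absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2          # "cut" the remaining range in half
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1              # target can only be in the upper half
        else:
            hi = mid - 1              # target can only be in the lower half
    return -1
```

Each comparison discards half of the remaining items, so a sorted list of n items needs at most about log2(n) + 1 comparisons.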
ADDENDUM/CORRECTION (to my first addendum) in my last post, which was an addendum to my first one (lol, sorry), here:
http://linux.slashdot.org/comments.pl?sid=264507&cid=20170011
" then again, you have the theory that placement of many files, mid-disk, allows binary searches to work faster (as they always 'cut' the entire dataset in 1/2, & search midpoint lesserthan OR greaterthan etc. for a sought item (mathematically, that is))... apk"
Change that "mid-disk" to "midpoint of the file data on disk".
AGAIN: NOT MID-DISK, but the middle of the files placed ON THE DISK (the actual searched area, the file mass, NOT the 'empty space').
That's what lets the binary search method that NTFS actually uses during seeks work for better performance!
APK
P.S.=> "The devil really IS in the details" on this one, & I don't wish to misinform anyone, OR fail to get my point across with correct details backing it... apk
Are you Steve Gibson? You write sort of like he does.
Hail Eris, full of mischief...
E pluribus sanguinem
No, sorry... I am not he, though I know who he is.
APK
The whole discussion above clearly shows a domain where modern operating systems have failed: information management. Leaving the definition of information to the applications creates the kinds of problems discussed above.
The solution is the following: operating systems should provide a mechanism for defining information types, including conceptual and physical details.
In other words, O/Ses should provide, by default, not a filesystem but a database. If Unix came with a database by default, problems like the ones above would not exist.
We used the noatime option for mounting our disks many years ago, not so much for the performance gain (which IS measurable in an I/O-heavy system), but for stability. Not having the disk written to every time a read-only query runs against your database of millions of rows means, well, no disk is being written to most of the time. Hence, a power failure that lasts longer than your UPS isn't likely to cause any nasty database corruption.
Those who like access times for security reasons probably really want a full audit log that shows every instance of the access AND which UID did it.
It ought to be easy to check whether the current atime is for today and limit the atime updates to one write per day per file. That would provide most of the info provided by atime and eliminate most of the overhead, no?
Posted from my Android phone. Oh, I can change this? There, that's better...
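A minimal Python sketch of the once-per-day rule proposed above. It compares calendar dates of naive datetimes, so it assumes a single local timezone. Both function names are hypothetical, and neither is a kernel interface.

```python
from datetime import datetime

def should_update_atime_daily(current_atime: datetime, now: datetime) -> bool:
    """Write a new atime only if the stored one isn't already from today."""
    return current_atime.date() != now.date()

def count_atime_writes(initial_atime: datetime, read_times) -> int:
    """Simulate a series of reads and count how many atime writes the rule causes."""
    atime = initial_atime
    writes = 0
    for t in read_times:
        if should_update_atime_daily(atime, t):
            atime = t
            writes += 1
    return writes
```

Under this rule, a file read every hour for two days costs two atime writes. Traditional atime would cost 48.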
(equivalent) in Windows? The only thing I have found checking the access time on a file is as an alternative to the Now() function.
I actually _use_ atime values directly... because sometimes I really do want to know when something was last accessed. This is as a human, not a program, so saying "well, fix the program" isn't helpful. And it's been built into Unix from the beginning, so it's not like it's some amazing new capability. But I agree that I typically don't need fine-grained knowledge of when something was last used; I'm usually looking to see what is old cruft, etc. Some programs only need to know "was it accessed after modification?", and there's certainly a performance hit for atime. So this "relatime" looks great to me... I get my "cruft" information, and existing programs work, without the performance hit of traditional atime. As long as there's a WAY to force traditional atime (via some mount-time option), I think relatime (even as a default) is a reasonable compromise. Relatime gives higher performance in exchange for coarser atime granularity, and for many, that's probably a worthy trade.
I like the idea of writing the "more accurate" atime when it's "for free". I don't see the big advantage of ALWAYS being a day late... I think it's better to say "at worst, it's a day late, but it MIGHT be better".
- David A. Wheeler (see my Secure Programming HOWTO)
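A sketch of the relatime decision as the mount(8) man page describes it. The access time is updated if the previous atime is earlier than the file's modify or change time. Since Linux 2.6.30, it is also updated if the previous atime is more than a day old. This is a hypothetical Python model, not kernel code, and the function names are illustrative. The comparisons treat equal timestamps as "update", which I believe matches the kernel's check, but treat that detail as an assumption. Timestamps are plain seconds.

```python
DAY = 24 * 60 * 60  # seconds

def relatime_needs_update(atime: int, mtime: int, ctime: int, now: int) -> bool:
    """Model of relatime: update only if atime is stale relative to mtime/ctime, or over a day old."""
    if mtime >= atime:          # modified since (or at) the last recorded access
        return True
    if ctime >= atime:          # inode changed since (or at) the last recorded access
        return True
    return now - atime >= DAY   # periodic refresh so atime is never more than a day behind

def atime_needs_update(mode: str, atime: int, mtime: int, ctime: int, now: int) -> bool:
    """Dispatch on a mount-option-like mode string (names taken from mount(8))."""
    if mode == "noatime":
        return False
    if mode == "strictatime":   # traditional semantics: write atime on every access
        return True
    if mode == "relatime":
        return relatime_needs_update(atime, mtime, ctime, now)
    raise ValueError(f"unknown mode: {mode}")
```

This preserves the "was it accessed after modification?" answer that some programs need, and still gives a day-granularity "cruft" signal, while skipping most writes.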
I agree the whole atime thing is bollocks, but you can improve efficiency while keeping the semantics - cache atime updates in one place.
The issue is the need to keep in memory a dirty copy of every inode block with updated atimes (there are lots of inode blocks, and you have a few files open from all over the disk, so in the worst case you're maintaining one block per open file), and to write them out every 30s or however often.
If, instead, you kept an atime cache of (inode number, atime) pairs which, when an entry is present, overrides the atime in the inode itself, then you could just update this cache and write its 2 or 3 blocks out to disk. You can propagate the changes to the inode atime once the file's been closed for a while (and then reuse the cache entry, keeping its size down). You might also decide that writing the atime cache is very low urgency and only do it when other disk changes are made.
A quick off-the-cuff calculation suggests that a 4k block would hold 512 files' worth of this information. I don't know how many files are open on a machine at once, but on a desktop machine I suspect it's fewer than a couple of thousand most of the time.
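The 512-entries-per-block figure works out if each cache entry takes 8 bytes. The split below, a 32-bit inode number plus a 32-bit atime in seconds, is an assumption; the post doesn't specify an entry layout. At that size, the "2 or 3 blocks" mentioned earlier cover 1,024 to 1,536 open files.

```latex
\frac{4096\ \text{bytes/block}}{(4 + 4)\ \text{bytes/entry}} = 512\ \text{entries/block},
\qquad 2 \times 512 = 1024, \qquad 3 \times 512 = 1536
```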
The risk is the effect it might have on fast path code for inode lookups. You'd be adding a hash table lookup for every inode fetch.
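A minimal Python sketch of the override-cache idea, with in-memory dicts standing in for on-disk inodes. The classes Inode and AtimeCache and their methods are hypothetical illustrations, not any filesystem's actual design. Reads touch only the small side table. The inode itself is written once, when the cached atime is propagated.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    number: int
    atime: int  # seconds; stand-in for the on-disk atime field

@dataclass
class AtimeCache:
    """Side table of inode number -> atime that overrides the atime stored in the inode."""
    inodes: dict                                   # inode number -> Inode
    overrides: dict = field(default_factory=dict)  # inode number -> newer atime
    inode_writes: int = 0

    def touch(self, ino: int, now: int) -> None:
        # A read updates only the small cache, not the inode block.
        self.overrides[ino] = now

    def atime(self, ino: int) -> int:
        # The extra lookup on every inode fetch: the fast-path cost the post worries about.
        return self.overrides.get(ino, self.inodes[ino].atime)

    def propagate(self, ino: int) -> None:
        # Once the file has been closed for a while, fold the cached atime into the inode
        # and free the cache entry so the table stays small.
        if ino in self.overrides:
            self.inodes[ino].atime = self.overrides.pop(ino)
            self.inode_writes += 1
```

A thousand reads of the same file cost a thousand cheap dictionary updates and a single inode write.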
It's a heck of a lot easier than trying to partition the drive when the menu only gives you the option to either wipe the entire disk or manually install. It's also a lot easier to understand than stuff like getting my SD card reader to work.
A standard Gentoo install turns off atime by default. It's been that way for at least five years, perhaps longer.
"I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.