Auditing Large Unix File Systems?
jstockdale asks: "The recent article on perpendicular recording hard drive technology brought me, as a unix(tm) admin, to reflect on the management of data systems and file servers of capacities >1TB (which exist today and tomorrow will become commonplace). Since Google for once seems useless, what suggestions does the Slashdot crowd have on methods and software to audit changes, visualize file system usage, and in general to determine the qualitative and quantitative nature of the content of large unix file systems?"
My guess: storage is growing fast but read speeds are not, so perhaps his drives are filling up faster than he can run "du -s -m | sort -n" to figure out who's filling them up?
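If the question really is "who's filling the disk", one walk that totals by owner at least avoids re-running du once per user. A rough sketch in Python, nothing more: the names `usage_by_owner` and `report` are made up for illustration, it counts apparent file size (`st_size`) rather than blocks on disk, and it will still be slow on a tree with millions of files.

```python
import os
from collections import defaultdict


def usage_by_owner(root):
    """Walk root once and return {uid: [file_count, total_bytes]}."""
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat so symlinks are counted as links, not followed
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[st.st_uid][0] += 1
            totals[st.st_uid][1] += st.st_size  # apparent size, not disk blocks
    return dict(totals)


def report(totals):
    """Print one line per uid, biggest consumer last (like 'sort -n')."""
    for uid, (count, size) in sorted(totals.items(), key=lambda kv: kv[1][1]):
        print(f"{size / 2**20:12.1f} MB  {count:10d} files  uid {uid}")
```

Something like `report(usage_by_owner("/home"))` would print the per-uid totals; the path is only an example.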
Also, who's to say file size increases with storage capacity? Perhaps at his site the number of files increases with storage capacity, while the file size stays statistically constant. MB for MB, traversing lots of little files is harder than traversing a few big files.
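To check which case you're in, you could compare file count against mean file size for each top-level directory. A minimal sketch, again in Python and again only an assumption about how one might do it: `profile_subdirs` is a hypothetical name, and sizes are apparent sizes from `st_size`.

```python
import os


def profile_subdirs(root):
    """Return {subdir: (file_count, total_bytes, mean_bytes)} for each
    directory directly under root."""
    with os.scandir(root) as entries:
        subdirs = [e.path for e in entries if e.is_dir(follow_symlinks=False)]

    result = {}
    for sub in subdirs:
        count = total = 0
        for dirpath, _dirnames, filenames in os.walk(sub):
            for name in filenames:
                try:
                    size = os.lstat(os.path.join(dirpath, name)).st_size
                except OSError:
                    continue  # skip files that disappear mid-walk
                total += size
                count += 1
        mean = total / count if count else 0.0
        result[os.path.basename(sub)] = (count, total, mean)
    return result
```

A directory with a huge count and a tiny mean is the "lots of little files" case, where the traversal itself is the expensive part.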
Don't get me wrong here -- if you have a multi-TB system, keeping track of usage with 'du' really just isn't practical, but do you really even have to ASK the box where the data is?
Under /oradata there's the database. It gets its own space. When that gets full he doesn't do an 'rm' to clean things up -- he has to use Oracle tools to do that.
Under /filestores there's a giant mess of crap that nobody can make heads or tails of. They're hashed filenames from a PDM system. He can't do anything to clean that out without -- you guessed it -- using the PDM system. The filesystem itself has NO idea what's going on in either of its major usage sections. It's just "stuff", and to rm -rf a directory because you're running out of space would be foolish.
Our backend storage system for my project is 1TB, or at least very close to it. I don't manage the box, but I do work with it. It holds three things:
1) Its OS (small)
2) Its Oracle database files (300GB on disk, about 200GB used now)
3) Files. Word documents, CAD drawings, TIF, GIF, etc. A whole slew of them.
The admin knows what's using what.
When filesystems can actually hold metadata regarding their contents, then I'd give this question some thought. We could have a whole new set of Unix tools to modify our everything-is-a-file-with-badass-meta-data system. Until then I don't see any way for filesystem maintenance to be a huge issue on these multi-TB systems. All you can really do with the FS is determine which system needs more space and order more disks. You can't trim or manage it with the FS.
I'm wrong a lot though, but that's my take on the "issue".
This could be especially wrong if he's running things like a news server and, as more disk space becomes available, he ups the retention time. Archival storage of email (in a one-file-per-message system, like Maildir) would have the same problem.