Ask Slashdot: Smarter Disk Space Monitoring In the Age of Cheap Storage?
relliker writes In the olden days, when monitoring a file system of a few 100 MB, we would be alerted when it topped 90% or more, with 95% a lot of times considered quite critical. Today, however, with a lot of file systems in the Terabyte range, a 90-95% full file system can still have a considerable amount of free space but we still mostly get bugged by the same alerts as in the days of yore when there really isn't a cause for immediate concern. Apart from increasing thresholds and/or starting to monitor actual free space left instead of a percentage, should it be time for monitoring systems to become a bit more intelligent by taking space usage trends and heuristics into account too and only warn about critical usage when projected thresholds are exceeded? I'd like my system to warn me with something like, 'Hey!, you'll be running out of space in a couple of months if you go on like this!' Or is this already the norm and I'm still living in a digital cave? What do you use, on what operating system?
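As a rough illustration of the "you'll be running out of space in a couple of months" idea in the submission, here is a minimal sketch that fits a straight line to past usage samples and projects when the volume fills up. The function names, the sample format (day number, bytes used) and the 60-day horizon are all assumptions made for this example, not part of any particular monitoring product.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]],
                    capacity_bytes: float) -> Optional[float]:
    """Project days remaining from a least-squares linear fit.

    samples: (day, used_bytes) pairs, oldest first (hypothetical format).
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples taken at the same time
    # Slope of the fitted line, in bytes per day.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage: no projected fill date
    _, last_used = samples[-1]
    return max(0.0, (capacity_bytes - last_used) / slope)


def should_warn(samples: Sequence[Tuple[float, float]],
                capacity_bytes: float,
                horizon_days: float = 60.0) -> bool:
    """Warn only if the projected fill date is inside the horizon."""
    remaining = days_until_full(samples, capacity_bytes)
    return remaining is not None and remaining <= horizon_days
```

A straight-line fit is the simplest possible choice; bursty workloads would probably need a shorter sampling window or a more careful model. On a live system the samples could be collected with `shutil.disk_usage(path).used`.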
I never run out of disk space.
How does performance change as the big disks approach full? That was always one reason for the rule of thumb about keeping at least 10% free space on UNIX.
"Almost every wise saying has an opposite one, no less wise, to balance it." - George Santayana
Today, however, with a lot of file systems in the Terabyte range, a 90-95% full file system can still have a considerable amount of free space but we still mostly get bugged by the same alerts as in the days of yore when there really isn't a cause for immediate concern.
When we had drives in the 100s of MB range, we used a few MB at a time. Now that we have drives in the multi-TB range, we tend to use tens of GB at a time. In my experience, a 90 percent full drive has as much time left before running out as it did a decade ago.
Perhaps more importantly, running at 90% of capacity kills your performance if you still use spinning glass platters as your primary storage medium (not so much when talking about a SAN of SSDs). In general, when you hit 90% full, you have problems other than just how long you can last before reaching 100%.
I install the shareware version of Hard Disk Sentinel on all my Windows systems. It not only will warn you about hard drive usage (%); it will also warn you about errors on the drive -- and in my case I was able to predict that two drives were going to fail (saving data) before they actually failed.
Their support has been very responsive and courteous, and their product can work through (see drives behind) most RAID controllers.
And no, I don't have any affiliation with HDS.
...when there really isn't a cause for immediate concern.
It all depends what one is concerned about. Is maximizing disk space down to the last possible byte important to you? Or is performance in accessing random data important to you? Or is wanting to keep artificial limits imposed by monitoring systems important to you?
Once you determine what is actually important to you, then you monitor for that parameter.
Whatever is measured is optimized.
It's a configuration option when you newfs a file system. See the man page for newfs or mkfs.
[John]
Shit better not happen!
You insensitive clod! In the age of MBs, we were producing KBs of data. In the age of GBs we were producing MBs of data. And in the age of TBs we are producing GBs of data. And so on. Thus a 90% full filesystem is as bad as 10 years ago. Unless you are still producing KBs of data.
Sent as ripples into the electromagnetic field. No single photon has been harmed in the process.
I don't know; the default 5% might be excessive for really big volumes, but keeping at least 1% free seems 'smart' pretty much no matter how many orders of magnitude the typical volume grows to be. The typical file size has grown with volume size. We now keep all kinds of large media files on online storage that previously would have been moved off to some other sort of media in short order.
The entire point of the reservation is that in the event of calamity the super user retains a little free space to work in; if (s)he is going to be able to shuffle things about, they might well need what we nominally think of as quite a bit of space. Those things today might be a 100GB VM image or something on a 20TB SAN volume, for example.
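To put numbers on that, here is a small sketch of how much space a percentage reservation sets aside at different volume sizes, using the 5% and 1% figures and the 100GB image on a 20TB volume mentioned above. The helper name and the decimal unit constants are just assumptions for the example.

```python
MB = 10 ** 6
GB = 10 ** 9
TB = 10 ** 12


def reserved_bytes(volume_bytes: int, reserve_percent: float) -> int:
    """Bytes held back by a percentage-based reservation (hypothetical helper)."""
    return round(volume_bytes * reserve_percent / 100)


# A 5% reserve on a 20TB volume holds back a full terabyte,
# while 1% still leaves room to move a 100GB VM image around.
five_percent = reserved_bytes(20 * TB, 5)
one_percent = reserved_bytes(20 * TB, 1)
vm_image_fits = one_percent >= 100 * GB

# The same 5% on an old 500MB volume is only 25MB.
old_reserve = reserved_bytes(500 * MB, 5)
```

So the percentage that made sense on small disks scales to a very large absolute number, but the 1% figure is not obviously wasteful once the files being shuffled are themselves in the 100GB range.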
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
You're living in a digital cave IMHO.
:).
Don't worry, I was too until recently...
Always mucked with fast external storage as the "main" solution -- firewire, thunderbolt, etc. This system is the main and had a few externals hooked up, that system had another, another over there for something else. It was a mess all around. How to back it all up??
Gave them all away -- bought a Synology
Then bought another (to back it up).
180-200M/sec throughput is the norm. On the network. Beats out most external drives I've ever come across. Everything ties into / backs up to the array. Home and work now too.
I use everything but Microsoft products. They're shit.
My filesystem is 60T w/ under 10T used today. I'll consider plugging in more drives or changing them out in the Synology somewhere between 2017 and 2020...
We switched to Check_MK for monitoring. It's basically a collection of software that sits on top of Nagios.
The default disk monitoring allows alerting based on trends (full in 24 hours, etc.) or thresholds based on a "magic factor." Basically it scales the thresholds so that larger disks alert at a higher percentage, adjustable in quite a few different ways to suit your tastes.
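For anyone wondering what scaling a threshold by disk size could look like, here is a rough sketch. To be clear, this is not taken from Check_MK's code: the formula, the 0.8 exponent and the 20 GB reference size are assumptions chosen only to show the general idea of larger disks alerting at a higher percentage.

```python
def scaled_level(base_level_percent: float,
                 size_gb: float,
                 magic: float = 0.8,
                 norm_size_gb: float = 20.0) -> float:
    """Scale a percent-used threshold by volume size (illustrative formula).

    At norm_size_gb the threshold is unchanged. Larger volumes get a
    higher threshold, smaller ones a lower one. magic=1.0 disables scaling.
    """
    relative = size_gb / norm_size_gb   # size relative to the reference
    felt = relative ** magic            # dampened "felt" size
    scale = felt / relative             # below 1 for big disks
    level = 100.0 - (100.0 - base_level_percent) * scale
    return min(100.0, max(0.0, level))  # keep the result a valid percentage
```

With these assumed defaults, an 80% warning level stays at 80% on a 20 GB volume and moves to roughly 92% on a 2 TB volume, which leaves about 160 GB free at the point of alerting instead of 400 GB.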
Create a large file, that the super user then deletes when the super user needs to fix issues.
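A minimal sketch of that ballast-file trick, assuming a plain file written ahead of time and removed in an emergency. The function names are made up for the example. It writes random bytes rather than creating a sparse file, since a sparse file (or zeros on a compressing filesystem) may not actually occupy the space.

```python
import os


def create_ballast(path: str, size_bytes: int,
                   chunk_size: int = 1024 * 1024) -> None:
    """Write size_bytes of random data so the blocks are really allocated."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data reaches the disk


def release_ballast(path: str) -> int:
    """Delete the ballast file and return how many bytes were freed."""
    size = os.path.getsize(path)
    os.remove(path)
    return size
```

Unlike the filesystem's own reserve, nothing stops an ordinary process from filling the disk after the ballast is released, so this only buys time to fix the underlying problem.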
ZFS raidz2 is pretty well RAID6 with an awareness of what is going on with the files in the array giving a variety of improvements (eg. resilver time normally being vastly shorter than a RAID6 rebuild time). A few years of seeing RAID6 in action was ultimately what drove me to ZFS on hardware that's perfectly capable of doing RAID6.
Anyway, the "raid only has five more years" article keeps on getting warmed up, and keeps getting disproved by the very reasons given for the RAID use-by date. Increasing capacity has only been possible by increasing the data density on the disks, which means the heads pick up more information, and thus faster read and write speeds. Better controllers also made a massive difference. Now dedicating lots of cycles on many cores of fast CPUs (instead of the processors in the controllers) is once again making a massive difference. It's only three hours to do a scrub on a 12 x 1TB 7200rpm drive system here with an i5 CPU, and it would take close to the same to resilver a new drive. That is six mirrors, so faster than raidz or raidz2, but still, it's not a huge amount of time to replace drives now, even though that's bigger than the 500GB or so that was supposed to take forever to rebuild.