Data Storage Capacity Mostly Wasted In Data Center

Posted by CmdrTaco on Wednesday July 28, 2010 @05:15AM from the and-they-never-turn-the-lights-off dept.

Lucas123 writes "Even after the introduction of technologies such as thin provisioning, capacity reclamation, and storage monitoring and reporting software, 60% to 70% of data capacity remains unused in data centers due to overprovisioning for applications and misconfigured data storage systems. While the price of storage resource management software can be high, the cost of wasted storage is even higher, with 100TB equalling $1 million when human resources, floor space, and electricity are figured in. 'It's a bit of a paradox. Users don't seem to be willing to spend the money to see what they have,' said Andrew Reichman, an analyst at Forrester Research."

Intentional? by Anonymous Coward · 2010-07-28 05:18 · Score: 5, Insightful

I don't know about your data center, but ours keeps drives well below full capacity intentionally.
The more disk arms you spread the operations over, the faster the operations get, and smaller drives are often more expensive than larger ones.
Plus, drives that are running close to full can't manage fragmentation nearly as well.
1. Re:Intentional? by TrisexualPuppy · 2010-07-28 05:33 · Score: 5, Insightful
  
  Yep, that's how we run things at my company. Drives and controllers have fewer files to deal with, and all else assumed equal, you get better performance this way.
  
  You also have to think of the obvious spare capacity. In 2005, my company invested in a huge (at the time) 10TB array. The boss rightfully freaked out when we were hitting more than 30% usage in 2007. After slow, quasi-linear growth of files for the previous couple of years, the usage jumped to 50% in a matter of months. It turned out that our CAD users had switched to a newer version of the software without telling us (the CAD group managed their own software). The unexpected *DOES* happen, and it would have been incredibly stupid to have been running closer to capacity.
  
  Accounting would have probably had half of us fired if they hadn't been able to do their document imaging which tends to take up a lot of space on the SAN.
  
  Yet another sad FUD or FUD-esque article based on Forrester's findings.
2. Re:Intentional? by Nerdfest · 2010-07-28 05:46 · Score: 4, Insightful
  
  Simply put, over-provisioning is relatively harmless while under-provisioning is very bad.
3. Re:Intentional? by hardburn · 2010-07-28 05:48 · Score: 4, Insightful
  
  FTA:
  
  Rick Clark, CEO of Aptare Inc., said most companies can reclaim large chunks of data center storage capacity because it was never used by applications in the first place. . . . Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises.
  In other words, the whole thing is an attempt to get companies to spend tens of thousands of dollars for something that could be done by a well-written shell script.
  
  --
  Not a typewriter
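
As a rough illustration of the "script instead of a product" idea, here is a minimal Python sketch that reports utilization for a list of mount points. It only sees filesystems mounted on the local host, not unallocated capacity inside a SAN array, and it is not a description of what StorageConsole does. The function names and the 30% threshold are hypothetical, chosen only for the example.

```python
import shutil


def utilization_report(mount_points):
    """Return (path, total_gb, used_gb, percent_used) for each mount point."""
    rows = []
    for path in mount_points:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        # Guard against pseudo-filesystems that report a total of zero.
        percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        rows.append((path, usage.total / 1e9, usage.used / 1e9, percent))
    return rows


def underused(rows, threshold_percent=30.0):
    """Keep only the rows whose utilization is below the (assumed) threshold."""
    return [row for row in rows if row[3] < threshold_percent]


if __name__ == "__main__":
    # "/" is a placeholder; list the mount points you actually care about.
    for path, total_gb, used_gb, percent in utilization_report(["/"]):
        print(f"{path}: {used_gb:.1f} of {total_gb:.1f} GB used ({percent:.1f}%)")
```

Run from cron across a fleet and collected centrally, something like this answers "what do we have and how full is it" for host-visible storage; it says nothing about whether the free space is there on purpose.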
4. Re:Intentional? by Score+Whore · 2010-07-28 06:41 · Score: 4, Insightful
  
  Not to mention the fact that over the last few years drive capacities have skyrocketed while drive performance has remained the same. That is, your average drive / spindle has grown from 36 GB to 72 GB to 146 GB to 300 GB to 400 GB to 600 GB, etc. while delivering a non-growing 150 IOPS per spindle.
  If you have an application that has particular data accessibility requirements, you end up buying IOPS and not capacity. A recent deployment was for a database that needed 5000 IOPS with service times to remain less than 10 ms. The database is two terabytes. A simple capacity analysis would call for a handful of drives, perhaps sixteen 300 GB drives mirrored for a usable capacity of 2.4 TB. Unfortunately those sixteen drives will only be able to deliver around 800 IOPS at 10 ms each. Instead we had to configure one hundred and thirty 300 GB drives, ending up with over 21 TB of storage capacity that is about ten percent utilized.
  These days anytime an analyst or storage vendor starts talking to me about thin provisioning, zero page reclaim, etc. I have to take a minute and explain to them my actual needs and that they have very little to do with gigabytes or terabytes. Usually I have to do this multiple times.
  In the near future we will be moving to SSD based storage once more enterprise vendors have worked through the quirks and gained some experience.
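
The sizing argument in the comment above can be sketched in a few lines of Python. This is a simplified model, not the poster's actual method: it assumes every spindle contributes the same IOPS at the target latency (800 / 16 = 50 per drive, taken from the comment's figures), that mirroring halves usable capacity, and nothing else. The function name and parameters are hypothetical. The poster's real deployment used 130 drives, so it evidently included headroom or other factors the comment does not spell out.

```python
import math


def drives_needed(required_iops, iops_per_drive, data_gb, drive_gb, mirrored=True):
    """Size an array by whichever constraint is larger: capacity or IOPS.

    Returns (drive_count, usable_gb, utilization).
    Assumes each spindle delivers iops_per_drive at the target latency.
    """
    copies = 2 if mirrored else 1
    for_capacity = math.ceil(data_gb / drive_gb) * copies
    for_iops = math.ceil(required_iops / iops_per_drive)
    count = max(for_capacity, for_iops)
    if mirrored and count % 2:
        count += 1  # mirrored drives come in pairs
    usable_gb = count * drive_gb / copies
    return count, usable_gb, data_gb / usable_gb


# Figures from the comment: 5000 IOPS, 2 TB database, 300 GB mirrored drives.
count, usable_gb, utilization = drives_needed(5000, 50, 2000, 300)
print(f"{count} drives, {usable_gb / 1000:.1f} TB usable, {utilization:.0%} utilized")
```

Capacity alone would ask for 14 drives here; the IOPS requirement asks for 100. The low utilization is a consequence of the performance target, not of careless provisioning.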
Let's play the odds: by fuzzyfuzzyfungus · 2010-07-28 05:20 · Score: 5, Insightful

Likelihood that I get fired because something important runs out of storage and falls over (and, naturally, it'll be most likely to run out of storage under heavy use, which is when we most need it up...): Relatively high...

Likelihood that I get fired because I buy a few hundred gigs too much, which sit in a dusty corner somewhere, barely even noticed except in passing because there is nobody with a clear handle on the overall picture (and, if there is, he is looking at things from the sort of bird's eye view where a few hundred gigs looks like a speck on the map): Relatively insignificant...
Slashvertisement by hcdejong · 2010-07-28 05:21 · Score: 5, Insightful

for a storage monitoring system.
Disk space is free by amorsen · 2010-07-28 05:27 · Score: 5, Interesting

Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.
I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.

--
Finally! A year of moderation! Ready for 2019?
Re:Overprovisioning by Maarx · 2010-07-28 05:28 · Score: 5, Insightful

That mother is terrible.
CYA Approach by MBGMorden · 2010-07-28 05:40 · Score: 4, Informative

This is the CYA approach, and I don't see it getting any better. When configuring a server, it's usually better to pay the marginally higher cost for 3-4x as much disk space as you think you'll need, rather than risk the possibility of returning to your boss asking to buy MORE space later.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
IO/second count matters, too by natoochtoniket · 2010-07-28 07:03 · Score: 4, Insightful

There are two numbers that matter for storage systems. One is the raw number of gigabytes that can be stored. The other is the number of IO's that can be performed in a second. The first limits the size of the collected data. The second limits how many new transactions can be processed per time period. That, in turn, determines how many pennies we can accept from our customers during a busy hour.
We size our systems to hit performance targets that are set in terms of transactions per second, not just gigabytes. Using round numbers, if a disk model can do 1000 IO/second, and we need 10,000 IO/second for a particular table, then we need at least 10 disks for that table (not counting mirrors). We often use the smallest disks we can buy, because we don't need the extra gigs. If the data volume doesn't ever fill up the gigabyte capacity of the disks, that's ok. Whenever the system uses all of the available IO's-per-second, we think about adding more disks.
Occasionally a new SA doesn't understand this, sees a bunch of "empty" space in a subsystem, and configures something to use that space. When that happens, we then have to scramble, as the problem is not usually discovered until the next busy day.
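
A minimal Python sketch of the two checks described above: how many disks a table needs for its IO target, and whether a new workload may be placed on a subsystem that looks "empty". The function names, parameters, and example numbers other than the 1000 and 10,000 IO/second round figures are hypothetical, and the model ignores mirrors, caching, and read/write mix.

```python
import math


def disks_for_iops(required_iops, iops_per_disk):
    """Minimum disk count to meet an IO/second target (not counting mirrors)."""
    return math.ceil(required_iops / iops_per_disk)


def can_place(workload_iops, workload_gb, free_gb, capacity_iops, peak_iops_in_use):
    """A new workload fits only if BOTH free space and IO headroom are sufficient."""
    has_space = workload_gb <= free_gb
    has_io = workload_iops <= capacity_iops - peak_iops_in_use
    return has_space and has_io


# Round numbers from the comment: 1000 IO/second per disk, 10,000 needed.
print(disks_for_iops(10_000, 1000))

# Plenty of free gigabytes, but the busy-hour IO budget is nearly spent.
print(can_place(workload_iops=500, workload_gb=100,
                free_gb=5000, capacity_iops=10_000, peak_iops_in_use=9800))
```

The second call is the new-SA scenario: the space check passes, the IO check fails, and the answer should be measured against peak usage, since the problem otherwise only shows up on the next busy day.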