Data Storage Capacity Mostly Wasted In Data Center

Intentional? by Anonymous Coward · 2010-07-28 05:18 · Score: 5, Insightful

I don't know about your data center, but ours keeps drives well below full capacity intentionally.

The more disk arms you spread the operations over, the faster the operations get, and smaller drives are often more expensive than larger ones.

Plus, drives that are running close to full can't manage fragmentation nearly as well.

Re:Intentional? by TrisexualPuppy · 2010-07-28 05:33 · Score: 5, Insightful

Yep, that's how we run things at my company. Drives and controllers have fewer files to deal with, and all else assumed equal, you get better performance this way.

You also have to think of the obvious spare capacity. In 2005, my company invested in a huge (at the time) 10TB array. The boss rightfully freaked out when we were hitting more than 30% usage in 2007. After having a slow, quasi-linear growth of files for the previous couple of years, the usage jumped to 50% in a matter of months. It turned out that our CAD users had switched to a newer version of the software without telling us (the CAD group managed their own software). The unexpected *DOES* happen, and it would have been incredibly stupid to have been running closer to capacity.

Accounting would have probably had half of us fired if they hadn't been able to do their document imaging which tends to take up a lot of space on the SAN.

Yet another sad FUD or FUD-esque article based on Forrester's findings.
Re:Intentional? by Nerdfest · 2010-07-28 05:46 · Score: 4, Insightful

Simply put, over-provisioning is relatively harmless while under-provisioning is very bad.
Re:Intentional? by hardburn · 2010-07-28 05:48 · Score: 4, Insightful

FTA:

Rick Clark, CEO of Aptare Inc., said most companies can reclaim large chunks of data center storage capacity because it was never used by applications in the first place. . . . Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises.
In other words, the whole thing is an attempt to get companies to spend tens of thousands of dollars for something that could be done by well-written shell script.
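As a rough illustration of how little code a basic utilization report needs, here is a minimal sketch in Python rather than shell. It uses only the standard library's `shutil.disk_usage`; the function name `usage_report` and the choice of mount points are assumptions for illustration, and it makes no claim to match what StorageConsole actually reports.

```python
import shutil

GIB = 1024 ** 3  # bytes per GiB


def usage_report(paths):
    """Return (path, total_gib, used_gib, percent_used) for each path given."""
    rows = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        rows.append((path, usage.total / GIB, usage.used / GIB, percent))
    return rows


if __name__ == "__main__":
    # "." is a placeholder; a real report would list the mount points of interest.
    for path, total, used, pct in usage_report(["."]):
        print(f"{path}: {used:.1f} / {total:.1f} GiB used ({pct:.1f}%)")
```

Run from cron across a list of mount points, something like this would give the "allocated but never used" numbers for local filesystems; SAN-level allocation would need the array vendor's own tooling.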

--
Not a typewriter
Re:Intentional? by dmgxmichael · 2010-07-28 06:32 · Score: 2, Insightful

When I see services advertised at those kinds of rates I can't help but remember P.T. Barnum's slogan: "There's a sucker born every minute."
Re:Intentional? by Score+Whore · 2010-07-28 06:41 · Score: 4, Insightful

Not to mention the fact that over the last few years drive capacities have skyrocketed while drive performance has remained the same. That is, your average drive / spindle has grown from 36 GB to 72 GB to 146 GB to 300 GB to 400 GB to 600 GB, etc. while delivering a non-growing 150 IOPS per spindle.
If you have an application that has particular data accessibility requirements, you end up buying IOPS and not capacity. A recent deployment was for a database that needed 5000 IOPS with service times to remain less than 10 ms. The database is two terabytes. A simple capacity analysis would call for a handful of drives, perhaps sixteen 300 GB drives mirrored for a usable capacity of 2.4 TB. Unfortunately those sixteen drives will only be able to deliver around 800 IOPS at 10 ms per. Instead we had to configure one hundred and thirty 300 GB drives, ending up with over 21 TB of storage capacity that is about ten percent utilized.
These days anytime an analyst or storage vendor starts talking to me about thin provisioning, zero page reclaim, etc. I have to take a minute and explain to them my actual needs and that they have very little to do with gigabytes or terabytes. Usually I have to do this multiple times.
In the near future we will be moving to SSD based storage once more enterprise vendors have worked through the quirks and gained some experience.
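The arithmetic behind that deployment can be sketched in a few lines. This only re-derives ratios from the figures quoted in the comment (16 drives, 800 IOPS, 130 drives, 5000 IOPS, 2 TB on 21 TB); the helper names are made up for illustration, and 1 TB is assumed to be 1000 GB.

```python
def usable_tb(drives, drive_gb, mirrored=True):
    """Usable capacity in TB (assuming 1 TB = 1000 GB); mirroring halves it."""
    data_drives = drives // 2 if mirrored else drives
    return data_drives * drive_gb / 1000


def iops_per_drive(total_iops, drives):
    """IOPS each drive contributes at the stated service time."""
    return total_iops / drives


def utilization(used_tb, capacity_tb):
    """Fraction of the configured capacity that holds data."""
    return used_tb / capacity_tb


# Capacity-only sizing: sixteen mirrored 300 GB drives.
print(usable_tb(16, 300))
# Per-drive IOPS implied by "sixteen drives deliver around 800 IOPS".
print(iops_per_drive(800, 16))
# Per-drive IOPS implied by the 130-drive configuration serving 5000 IOPS.
print(round(iops_per_drive(5000, 130), 1))
# A 2 TB database on roughly 21 TB of configured capacity.
print(round(utilization(2, 21), 3))
```

Both implied per-drive figures sit well below the nominal 150 IOPS per spindle, which is consistent with the point that the 10 ms service-time target, not the gigabytes, drives the spindle count.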
Re:Intentional? by KernelMuncher · 2010-07-28 06:52 · Score: 3, Insightful

I think the above example is a great reason why you should always over-engineer your storage capability somewhat. Demand for space can come up unexpectedly and stop the whole show if it's not there. Also if you don't use the storage today, you will definitely make use of it tomorrow. Data usage always goes up, not down. So there's ROI for the next fiscal year when you can make use of the extra capacity.

100TB = $1 million by maxwell+demon · 2010-07-28 05:19 · Score: 2, Insightful

I didn't know that I've got $25000 dollars worth of storage at home :-)

--
The Tao of math: The numbers you can count are not the real numbers.

Re:100TB = $1 million by phantomcircuit · 2010-07-28 05:23 · Score: 2, Informative

I didn't know that I've got $25000 dollars worth of storage at home :-)
It's not worth that much in your home, unless you happen to have redundant power supplies and redundant uplinks.
Re:100TB = $1 million by Luyseyal · 2010-07-28 05:30 · Score: 2, Funny

And "human resources".
-l

--
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
Re:100TB = $1 million by aliquis · 2010-07-28 05:40 · Score: 3, Funny

And "human resources"
"I'll go build my own data center, with blackjack and hookers!"?
Re:100TB = $1 million by Luyseyal · 2010-07-28 05:50 · Score: 2, Funny

In fact, forget the data center and blackjack!
-l

--
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!

Let's play the odds: by fuzzyfuzzyfungus · 2010-07-28 05:20 · Score: 5, Insightful

Likelihood that I get fired because something important runs out of storage and falls over (and, naturally, it'll be most likely to run out of storage under heavy use, which is when we most need it up...): Relatively high...

Likelihood that I get fired because I buy a few hundred gigs too much, that sit in a dusty corner somewhere, barely even noticed except in passing because there is nobody with a clear handle on the overall picture (and, if there is, he is looking at things from the sort of bird's eye view where a few hundred gigs looks like a speck on the map): Relatively insignificant...

Re:Let's play the odds: by qbzzt · 2010-07-28 05:29 · Score: 3, Insightful

Exactly, and that's the way it should be. Your CTO wants you to suggest spending a few hundred extra dollars on storage to avoid downtime.

--
-- Support a free market in the field of government
Re:Let's play the odds: by _damnit_ · 2010-07-28 06:29 · Score: 2, Insightful

Of course this is the case. This study is as exciting as news that George Michael is gay. There have been plenty of studies to this effect. My company makes tons of money consulting on better storage utilization. [Some Fortune 500 companies I've visited run below 40% utilization.] EMC, IBM, HDS, NetApp and the rest have no real interest in selling you fewer drives. They all make vague, glossy statements about saving storage money but in reality you need to be wasteful if you want to protect your ass. Think of the things we spend $ on just to get another 9 on the uptime digits: UPS, generators, clustering, DR systems/networks that sit idle, dark fibre between datacenters, RAID 1(+0), RAID 6, tapes, VTLs, Storage Arrays, redundant Fibre Channel SANs, . . .
From a human perspective, fuzzyfungus is right. Over-engineering is less likely to cost your job than failure. Plus, over-engineering is easy to justify.
Some things are just known to cost money if you MUST ensure that business is not subject to fallibility in hw and sw. The fact that there are 50 TBs unused out of your 200 TB of usable storage really might not mean too much. [Some of the numbers quoted could point to the mirrored side of RAID 1 stripes as wasted. It's a cheap gimmick to make the numbers look worse but still true to a certain extent if the performance difference between R5 and R1 is not needed.] Of course, there is usually low-hanging fruit that can be attacked to save real money and prevent cascading costs on the other cost centers mentioned above but there will always be waste. It's the cost of five 9's.

--

_damnit_

It's my job to freeze you. -- Logan's Run
Re:Let's play the odds: by wagnerrp · 2010-07-28 06:51 · Score: 3, Insightful

They're not buying the $100 2TB bargain special, they're buying the $300 300GB 15K SAS drive. They don't care how much storage they have, they just want the IOPS.

Slashvertisement by hcdejong · 2010-07-28 05:21 · Score: 5, Insightful

for a storage monitoring system.

Overprovisioning by shoppa · 2010-07-28 05:23 · Score: 3, Interesting

It's so easy to over-provision. Hardware is cheap and if you don't ask for more than you think you need, you may end up (especially after the app becomes popular, gasp!) needing more than you thought at first.

It's like two kids fighting over a pie. Mom comes in, and kid #1 says "I think we should split it equally". Kid #2 says "I want it all". Mom listens to both sides and the kid who wanted his fair share only gets one quarter of the pie, while the kid who wanted it all gets three quarters. That's why you have to ask for more than you fairly need. It happens not just at the hardware purchase end but all the way up the pole. And you better spend the money you asked for or you're gonna lose it, too.

Re:Overprovisioning by Maarx · 2010-07-28 05:28 · Score: 5, Insightful

That mother is terrible.
Re:Overprovisioning by Archangel+Michael · 2010-07-28 05:44 · Score: 3, Insightful

Dad here. Had that fight (or similar). I asked a simple question to the kid who wanted it all. I asked him "all or nothing?" and again he said "all", to which I said "nothing".
Of course he rightly cried "Not Fair!!!", and I said, you set the rules, you wanted it all, setting the rule up that you didn't want to be fair, I'm just playing by your rules.
Never had that problem again. EVER.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Overprovisioning by Archangel+Michael · 2010-07-28 06:13 · Score: 2, Insightful

Nope, he wasn't screwed, because it wasn't the only option; it was a false dichotomy. I gave him a chance to offer another choice, it was just veiled. Kobayashi Maru. He could have thought about it and said "half" even though that wasn't an obvious choice.
I often give my kids tests to break them out of self imposed boxes (false dichotomy). Pick a number between 1 and 10 .... 1 - no, 2 - no, 3 - no, 4 - no .... 9 - no, 10 - no ... THAT'S IMPOSSIBLE, DAD!!
No it isn't. The number I had in mind was Pi.
Raising kids to think for themselves, and outside the "boxes" society tends to put on things makes them able to deal better with things that don't appear to make sense.
You can dumb down your kids by not challenging them, or you can challenge them every step of the way, in ways that force them to learn more than they know.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Overprovisioning by Archangel+Michael · 2010-07-28 06:22 · Score: 2, Funny

I wish I would have hit you now - Dad

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Overprovisioning by Culture20 · 2010-07-28 07:50 · Score: 2, Insightful

At that point he was screwed. If he said "nothing", he could reasonably expect to get nothing. His only option was to say "all" if he wanted to get a chance at something.
If my son (nobly or stubbornly) said "nothing", I'd offer him half or nothing. Parents are allowed to alter the deals. Pray that they alter them further.
Re:Overprovisioning by sheph · 2010-07-28 08:51 · Score: 2, Insightful

Well done, man!! See, some folks just don't know what tough love is, and the positive impact it can have. You wanna run for office in 2012? We could use someone like you after the current round of buffoons!

--
I don't believe in karma, I just call it like I see it.

Disk space is free by amorsen · 2010-07-28 05:27 · Score: 5, Interesting

Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.

I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.

--
Finally! A year of moderation! Ready for 2019?

Re:Disk space is free by eldavojohn · 2010-07-28 05:44 · Score: 2, Interesting

Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.
I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.
I don't know too much about what you just said but I do know that the Linux images I get at work are virtual machines of a free distribution of Linux. I can request any size I want. But my databases often grow. And then the next thing is that a resizing of a partition is very expensive from our provisioner. So what do we do? We estimate how much space our web apps take up a month and then we request space for 10 years out. Because a resize of the partition is so damned expensive. And those sizes are usually pretty small anyway if you're building databases. Then we occasionally notify our managers when space is getting low by using the provisioner's dashboard tool and we re-assess the application. Is it getting unexpectedly popular or was it bad estimation from the beginning?
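The "estimate monthly growth, then request ten years' worth" approach amounts to a linear projection. A minimal sketch, assuming growth really is linear; the function names and the example figures (5 GB today, 0.5 GB per month) are hypothetical and not taken from the comment:

```python
import math


def projected_gb(current_gb, growth_gb_per_month, years):
    """Size to request so that linear growth fits for the given number of years."""
    return current_gb + growth_gb_per_month * 12 * years


def months_until_full(allocated_gb, current_gb, growth_gb_per_month):
    """Months of linear growth left before the allocation is exhausted."""
    if growth_gb_per_month <= 0:
        return math.inf  # no growth: the allocation never fills
    remaining = max(allocated_gb - current_gb, 0)
    return remaining / growth_gb_per_month


# Hypothetical app: 5 GB today, growing 0.5 GB per month, sized for 10 years.
request = projected_gb(5, 0.5, 10)
print(request)                                # 65.0
print(months_until_full(request, 5, 0.5))     # 120.0
# If growth doubles unexpectedly, the same allocation lasts half as long.
print(months_until_full(request, 5, 1.0))     # 60.0
```

Re-running `months_until_full` with the measured growth rate is one way to tell an unexpectedly popular application from a bad initial estimate.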

I don't know if I should be bothering with the hardware level of things. I sure do like it this way even though it is a really expensive price for the project but the payment remains inside our company anyway. It's internal to the company so we're all using some nebulous group of actual machines and RAIDs to produce a massive cloud of smaller servers as images. There are some downsides and a bit of overhead to pay for virtualization but I thought everyone had moved to this model ...

--
My work here is dung.
Re:Disk space is free by bobcat7677 · 2010-07-28 05:48 · Score: 2, Interesting

Parent has an excellent point. Utilization is not always about how full the disk is...especially in a data center where there are frequently large database operations requiring extreme amounts of IOPS. In the past, the answer was to throw "more spindles" at it. At which point you could theoretically end up with a 20GB database spread across 40 SAS disks making available ~1.5TB of space using the typical 73GB size disks just to reach the IOPS capacity needed to handle heavy update/insert/read operations. Huge waste of space, but the only way to do it with spinning disks. SSDs of course can solve the problem, but most SAN vendors are still charging insane prices for what meager SSD options they offer, with some vendors not even offering SSD options yet. And then you can end up on the other end of the scale, with having to buy more IOPS capacity than you need just to get enough SSD space for your data. Adaptec has some cool technology for "hybrid" arrays consisting of both SSDs and spindle disks in the same array (I have heard the latest versions of Solaris can do this with ZFS too). But the applications for Hybrid arrays are somewhat limited because write performance still sucks once any available write cache is saturated (and especially if the controller/software array has no cache).

Or IT is provisioning for peak usage by Todd+Knarr · 2010-07-28 05:37 · Score: 3, Informative

Having too much storage is an easy problem. Sure it cost a bit more, but not prohibitively so or you'd never have gotten approval to spend the money. Not having enough storage, OTOH, is a hard problem. Running out of space in the middle of a job means a crashed job and downtime to add more storage. That probably just cost more than having too much would've, and then you pile the political problems on top of that. So common sense says you don't provision for the storage you're going to normally need, you provision for the maximum storage you expect to need at any time plus a bit of padding just in case.

AT&T discovered this back in the days when telephone operators actually got a lot of work. They found that phone calls tend to come in in clumps, they weren't evenly distributed, so when they staffed for the average call rate they ended up failing to meet their answer times on a very large fraction of their calls. They had to change to staffing for the peak number of simultaneous calls, and accept the idle operators as a cost of being able to meet those peaks.
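The operator-staffing point can be illustrated with the Erlang B formula from classical telephony. The comment does not name this formula, so treat it as a swapped-in, standard model: it gives the probability that a call arriving at random finds every line busy, which is a related but not identical measure to the answer times mentioned above. The 10-erlang load is a made-up example.

```python
def erlang_b(offered_load, servers):
    """Probability that an arriving call finds all `servers` busy.

    `offered_load` is the average number of simultaneous calls (erlangs).
    Uses the standard recurrence B(0) = 1, B(m) = E*B(m-1) / (m + E*B(m-1)).
    """
    blocking = 1.0
    for m in range(1, servers + 1):
        blocking = offered_load * blocking / (m + offered_load * blocking)
    return blocking


# Hypothetical load: on average 10 calls in progress at once.
for operators in (10, 15, 20):
    print(operators, round(erlang_b(10, operators), 3))
```

Staffing exactly for the average (10 operators for 10 erlangs) still turns away roughly a fifth of calls under this model; the blocking probability only becomes small once there are noticeably more operators than the average load, which is the same reasoning as provisioning storage for peak rather than typical use.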

CYA Approach by MBGMorden · 2010-07-28 05:40 · Score: 4, Informative

This is the CYA approach, and I don't see it getting any better. When configuring a server, it's usually better to pay the marginally higher cost for 3-4x as much disk space as you think you'll need, rather than risk the possibility of returning to your boss asking to buy MORE space later.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain

Re:Mod parent up by TrisexualPuppy · 2010-07-28 05:45 · Score: 2, Insightful

Interesting. Was the culprit all cad files out of the new rev?

Yes, for the most part. Because of a bad config, they were going from drawings around 1-10MB to drawings over 100MB. That's what happens when you get management to take the IT department out of the software management and configuration equation. We were, of course, still left to sweep up the pieces.

sounds like the consultants are having a slow year by alen · 2010-07-28 05:45 · Score: 2, Interesting

time to go and buy up all kinds of expensive software to tell us something or other

it's almost like the DR consultants who say we need to spend a fortune on a DR site in case a nuclear bomb goes off and we need to run the business from 100 miles away. i'll be 2000 miles away living with mom again in the middle of no where and making sure my family is safe. not going to some DR site that is going to close because half of NYC is going to go bankrupt in the depression after a WMD attack

ISPs & hosting services by shmlco · 2010-07-28 05:47 · Score: 2, Insightful

This isn't like an ISP overbooking a line and hoping that everyone doesn't decide to download a movie at the same time. If a hosting service says your account can have 10GB of storage, contractually they need to make sure 10GB of storage exists.

Even though most accounts don't need it.

One client of mine dramatically over-provisioned his database server. But then again, he expects at some point to break past his current customer plateau and hit the big time. Will he do so? Who can say?

It may be a bit wasteful to over-provision a server, but I can guarantee you that continually ripping out "just big enough" servers and installing larger ones is even more wasteful.

Your pick.

--
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.

No... by rickb928 · 2010-07-28 06:11 · Score: 2, Interesting

"It's a bit of a paradox. Users don't seem to be willing to spend the money to see what they have,"

I think he meant users don't seem willing to spend the money to MANAGE what they have.

As many have pointed out, you need 'excess' capacity to avoid failing for unusual or unexpected processes. How often has the DBA team asked for a copy of a database? And when that file is a substantial portion of storage on a volume, woopsie, out of space messages can happen. Of course they should be copying it to a non-production volume. Mistakes happen. Having a spare TB of space means never having to say 'you're sorry'.

Aside from the obvious problems of keeping volumes too low on free space, there was a time when you could recover deleted files. Too little free space pretty much guarantees you won't be recovering deleted files much older than, sometimes, 15 minutes ago. In the old days, NetWare servers would let you recover anything not overwritten. I saved users from file deletions over the span of YEARS, in those halcyon days when storage became relatively cheap and a small office server could never fill a 120MB array. Those days are gone, but without free space, recovery is futile, even over the span of a week. Windows servers, of course, present greater challenges.

'Online' backups rely on delta files or some other scheme that involves either duplicating a file so it can be written intact, or saving changes so they can be rolled in after the process. More free space here means you actually get the backup to complete. Not wasted space at all.

Many of the SANs I've had the pleasure of working with had largely poor management implementations. Trying to manage dynamic volumes and overcommits had to wait for Microsoft to get its act together. Linux had a small lead in this, but unless your SAN lets you do automatic allocation and volume expansion, you might as well instrument the server and use SNMP to warn you of volume space, and be prepared for the nighttime alerts. Does your SAN allow you to let it increase volume space based on low free space, and then reclaim it later when the free space exceeds threshold? Do you get this for less than six figures? Seven? I don't know, I've been blessed with not having to do SAN management for about 5 years. I sleep much better, thanks.

Free space is precisely like empty parking lots. When business picks up, the lot is full. This is good.

--
deleting the extra space after periods so i can stay relevant, yeah.

IOs/second count matters, too by natoochtoniket · 2010-07-28 07:03 · Score: 4, Insightful

There are two numbers that matter for storage systems. One is the raw number of gigabytes that can be stored. The other is the number of IO's that can be performed in a second. The first limits the size of the collected data. The second limits how many new transactions can be processed per time period. That, in turn, determines how many pennies we can accept from our customers during a busy hour.

We size our systems to hit performance targets that are set in terms of transactions per second, not just gigabytes. Using round numbers, if a disk model can do 1000 IO/second, and we need 10,000 IO/second for a particular table, then we need at least 10 disks for that table (not counting mirrors). We often use the smallest disks we can buy, because we don't need the extra gigs. If the data volume doesn't ever fill up the gigabyte capacity of the disks, that's ok. Whenever the system uses all of the available IO's-per-second, we think about adding more disks.

Occasionally a new SA doesn't understand this, sees a bunch of "empty" space in a subsystem, and configures something to use that space. When that happens, we then have to scramble, as the problem is not usually discovered until the next busy day.
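The round-number sizing above, plus the "are we out of IOs yet?" check, could be sketched like this. The function names are hypothetical, and the mirroring model is deliberately simple (it just doubles the disk count and ignores any read benefit from mirrors):

```python
import math


def disks_for_iops(required_iops, iops_per_disk, mirrored=False):
    """Disks needed to meet an IO/second target, optionally counting mirrors."""
    disks = math.ceil(required_iops / iops_per_disk)
    return disks * 2 if mirrored else disks


def iops_headroom(measured_peak_iops, data_disks, iops_per_disk):
    """Fraction of IO capacity still unused at peak (0.0 means saturated)."""
    available = data_disks * iops_per_disk
    return max(0.0, 1.0 - measured_peak_iops / available)


# Round numbers from the comment: 1000 IO/second per disk, 10,000 needed.
print(disks_for_iops(10_000, 1000))                 # 10
print(disks_for_iops(10_000, 1000, mirrored=True))  # 20
# A table already at its IO limit has no headroom, however empty the disks look.
print(iops_headroom(10_000, 10, 1000))              # 0.0
```

The last line is the trap for the new SA: free gigabytes on those disks say nothing about whether any IOs per second are left to hand out.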

Re:100 TB for $1,000,000? No way! by spazimodo · 2010-07-28 07:19 · Score: 2, Insightful

I'm not sure if you're trolling or not, but if you're serious did you happen to manage the storage for Microsoft's Sidekick servers?

A couple things wrong with your assumptions:
1) 1TB drives might be great for storing your goat porn collection, but on a server with actual load, how many of those drives do you need to get adequate IOPS? Also exactly 100 of them means no RAID, but that's OK because drives from Newegg never fail so your 100TB of data should be fine.
2) You seem to have left controllers out of your list. Anyone who's ever had a RAID controller start barfing garbage all over a LUN, or take out a second drive after a drive failure will tell you the controller is the really critical bit (and is usually a single point of failure in systems with DAS.)
3) Where's your backup hardware? Where's space for snapshots? Where's space for replication?
4) Ever time a RAID5 rebuild on say a 9 drive LUN with 1TB SATA disks?

Storage is expensive because the data on it has value and making sure that data is available and isn't lost or corrupted costs money. Cheap storage solutions don't end up that way when the drives have to go to OnTrack for recovery and the company's down for a week, or valuable data is lost.

--

Fsck the millennium, we want it now.
Millennium Crisis Line: 0890 900 2000 [calls cost 50p/min]

Re:100 TB for $1,000,000? No way! by Domint · 2010-07-28 08:12 · Score: 3, Insightful

Most SAN administrators wouldn't be caught dead using your $130 1TB drives. Rerun your calculations with 15K 450GB SAS drives (around $300 each) and you're spending quite a bit more: 228 drives will give you 100TB, sure, but we'd want some redundancy . . . say RAID 5 (not the best approach for SAN design, but let's keep it simple) which pushes the drive count up to 304 with a total cost of $91,200, just for disks. To get a real, enterprise enclosure (or rather, cluster of enclosures considering the drive count) that offers things like Fibre Channel, 10Gb iSCSI, or InfiniBand uplinks, and features such as SAN to SAN replication, deduplication, and other enterprise-level utilities/features, I'd say you're looking at $500,000 (ballpark guess) just to have something to stick the drives into. We're at ~$600,000 without even taking into account the physical costs of operation, datacenter architecture, or labor costs to maintain such a SAN.

Suddenly, that $1 million isn't so far fetched, eh?

Re:100 TB for $1,000,000? No way! by Anarke_Incarnate · 2010-07-28 09:01 · Score: 2, Insightful

You lost something along the way. When you are doing RAID 5 on an enterprise array, you are likely using 5+1 sets. Your 304 drives do not take into account losing two drives' worth of capacity in every six. You can get away with global hot sparing, but that doesn't cover your ass as much. You would need 342 drives.
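The drive counts in this sub-thread can be reproduced with a little arithmetic. The grouping assumptions are inferred, not stated by the posters: 228 matches 100 TB at 1024 GB per TB on 450 GB drives, 304 matches one drive of overhead per three data drives, and 342 matches four usable drives out of every six. The function names are hypothetical.

```python
import math


def drives_for_capacity(target_tb, drive_gb, gb_per_tb=1024):
    """Data drives needed to hold target_tb (assumes 1 TB = 1024 GB by default)."""
    return math.ceil(target_tb * gb_per_tb / drive_gb)


def drives_with_overhead(data_drives, data_per_group, group_size):
    """Total drives when each group of `group_size` holds `data_per_group` of data."""
    groups = math.ceil(data_drives / data_per_group)
    return groups * group_size


data = drives_for_capacity(100, 450)
print(data)                                # 228

parent_total = drives_with_overhead(data, 3, 4)
print(parent_total, parent_total * 300)    # 304 91200

reply_total = drives_with_overhead(data, 4, 6)
print(reply_total)                         # 342
```

Under these assumptions the disk bill moves from $91,200 to $102,600 at $300 per drive, which barely changes the overall picture next to the enclosure estimate.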

Man hours more expensive than hardware. by MikeFM · 2010-07-28 09:06 · Score: 2, Insightful

We do use thin provisioning, and virtualization in general, but I agree that there is benefit to keeping utilization low. We try to keep more space than we could possibly need both because it can sometimes surprise you by growing quickly and because the drives are faster if the data is spread across multiple drives. Also SSD drives sometimes live longer if not fully utilized, because they can distribute the wear and tear, so we usually leave 20% unformatted.

Downtime and slow systems are much more expensive than wasted drive space.

--
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.

Re:Mod parent up by minorproblem · 2010-07-28 16:08 · Score: 2, Interesting

i've seen worse. At my company they moved the CAD software management to drafters and then they broke up the drafting department and just assigned each drafter to a team. I am an engineer and i sit near the IT department. I feel sorry for the poor buggers: now not only do they have to run around like headless chooks, but so do the CAD drafters, because before, the load levelling was done by a head drafter allocating work. now it's managers running around asking other managers whether they can "borrow" their drafter, and we have different people running different versions, and to sum it up it's hell to watch.

And the only reason they implemented such a scheme was that accounting told them it would save money... So instead of having 8 drafters for the whole company we now have 12 (one for each project). Sometimes the world doesn't work with just numbers!

Slashdot Mirror
