Data Storage Capacity Mostly Wasted In Data Center

Intentional? by Anonymous Coward · 2010-07-28 05:18 · Score: 5, Insightful

I don't know about your data center, but ours keeps drives well below full capacity intentionally.

The more disk arms you spread the operations over, the faster the operations get, and smaller drives are often more expensive than larger ones.

Plus, drives that are running close to full can't manage fragmentation nearly as well.

Re:Intentional? by TrisexualPuppy · 2010-07-28 05:33 · Score: 5, Insightful

Yep, that's how we run things at my company. Drives and controllers have fewer files to deal with, and all else assumed equal, you get better performance this way.

You also have to think of the obvious spare capacity. In 2005, my company invested in a huge (at the time) 10TB array. The boss rightfully freaked out when we were hitting more than 30% usage in 2007. After having a slow, quasi-linear growth of files for the previous couple of years, the usage jumped to 50% in a matter of months. It ended up that our CAD users switched to a newer version of the software without our knowledge (CAD group managed their own software) and didn't tell us. The unexpected *DOES* happen, and it would have been incredibly stupid to have been running closer to capacity.
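The headroom argument can be made concrete with a rough time-to-full estimate. This is a minimal Python sketch, assuming linear growth: the 10 TB capacity and the 30% and 50% usage figures come from the paragraph above, while the four-month interval is a hypothetical stand-in for "a matter of months" and `months_until_full` is an illustrative name only.

```python
def months_until_full(capacity_tb, used_start_tb, used_end_tb, interval_months):
    """Linear extrapolation: months left before the array fills at the observed growth rate."""
    growth_per_month = (used_end_tb - used_start_tb) / interval_months
    if growth_per_month <= 0:
        return float("inf")  # not growing, so it never fills at this rate
    return (capacity_tb - used_end_tb) / growth_per_month


# 10 TB array going from 30% to 50% used; the 4-month interval is an assumption.
remaining = months_until_full(10.0, 3.0, 5.0, 4)
print(f"{remaining:.1f} months until full")
```

At that assumed rate the remaining half of the array lasts less than a year, which is roughly the lead time a purchase cycle can eat up.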

Accounting would have probably had half of us fired if they hadn't been able to do their document imaging which tends to take up a lot of space on the SAN.

Yet another sad FUD or FUD-esque article based on Forrester's findings.
Re:Intentional? by Nerdfest · 2010-07-28 05:46 · Score: 4, Insightful

Simply put, over-provisioning is relatively harmless while under-provisioning is very bad.
Re:Intentional? by hardburn · 2010-07-28 05:48 · Score: 4, Insightful

FTA:

Rick Clark, CEO of Aptare Inc., said most companies can reclaim large chunks of data center storage capacity because it was never used by applications in the first place. . . . Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises.
In other words, the whole thing is an attempt to get companies to spend tens of thousands of dollars for something that could be done by well-written shell script.
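As a rough illustration of the "script" claim, here is a minimal sketch in Python rather than shell. It only reports utilization for locally mounted filesystems through the standard library's `shutil.disk_usage`; it does not talk to any array or SAN, and it makes no claim about what StorageConsole actually does. The `report` and `utilization` names and the 85% warning threshold are assumptions made for the example.

```python
import shutil


def utilization(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total if usage.total else 0.0


def report(paths, warn_at=0.85):
    """Build one line per path, flagging filesystems at or above warn_at."""
    lines = []
    for path in paths:
        frac = utilization(path)
        flag = "  <-- nearly full" if frac >= warn_at else ""
        lines.append(f"{path}: {frac:.0%} used{flag}")
    return lines


if __name__ == "__main__":
    # "." is a placeholder; substitute the real mount points.
    print("\n".join(report(["."])))
```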

--
Not a typewriter
Re:Intentional? by nobodylocalhost · 2010-07-28 06:28 · Score: 1

Agreed. People keep forgetting that it's not just storage; IOPS matter too. When you are running a cluster with hundreds of VMs, you need to size storage based on how many IOPS you can get out of the disks rather than how much capacity you can give them. Even if you plan just enough space for each and every application, if disk IOPS can't keep up at a useful speed, you will get applications that crash, stall, or generally perform horribly.

--
Where is the "Ignorant" mod tag?
Re:Intentional? by dmgxmichael · 2010-07-28 06:32 · Score: 2, Insightful

When I see services advertised at those kinds of rates I can't help but remember P.T. Barnum's slogan: "There's a sucker born every minute."
Re:Intentional? by Score+Whore · 2010-07-28 06:41 · Score: 4, Insightful

Not to mention the fact that over the last few years drive capacities have skyrocketed while drive performance has remained the same. That is, your average drive / spindle has grown from 36 GB to 72 GB to 146 GB to 300 GB to 400 GB to 600 GB, etc. while delivering a non-growing 150 IOPS per spindle.
If you have an application that has particular data accessibility requirements, you end up buying IOPS and not capacity. A recent deployment was for a database that needed 5000 IOPS with service times to remain less than 10 ms. The database is two terabytes. A simple capacity analysis would call for a handful of drives, perhaps sixteen 300 GB drives mirrored for a usable capacity of 2.4 TB. Unfortunately those sixteen drives will only be able to deliver around 800 IOPS at 10 ms per. Instead we had to configure one hundred and thirty 300 GB drives, ending up with over 21 TB of storage capacity that is about ten percent utilized.
These days anytime an analyst or storage vendor starts talking to me about thin provisioning, zero page reclaim, etc. I have to take a minute and explain to them my actual needs and that they have very little to do with gigabytes or terabytes. Usually I have to do this multiple times.
In the near future we will be moving to SSD based storage once more enterprise vendors have worked through the quirks and gained some experience.
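The "buying IOPS, not capacity" sizing can be sketched as taking the larger of two drive counts. This is a minimal Python sketch under stated assumptions: the 50 IOPS per drive figure is inferred from "sixteen drives ... around 800 IOPS" above, mirroring is modeled simply as doubling the capacity requirement, and `drives_needed` is a hypothetical helper. With those inputs it lands on 100 drives rather than the 130 quoted, so the real configuration presumably accounted for factors not spelled out here.

```python
import math


def drives_needed(capacity_tb, required_iops, drive_tb, iops_per_drive, mirrored=True):
    """Return (drives for capacity, drives for IOPS, drives to buy)."""
    copies = 2 if mirrored else 1
    for_capacity = math.ceil(capacity_tb * copies / drive_tb)
    for_iops = math.ceil(required_iops / iops_per_drive)
    return for_capacity, for_iops, max(for_capacity, for_iops)


# 2 TB database, 5000 IOPS, 300 GB drives, mirrored.
# 50 IOPS per drive at 10 ms is an inferred figure, not a measured one.
cap, iops, buy = drives_needed(2.0, 5000, 0.3, 50)
print(f"capacity needs {cap} drives, IOPS needs {iops}, so buy {buy}")
```

Whenever the IOPS count dominates, most of the purchased capacity sits unused by design, which is the low utilization the article calls waste.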
Re:Intentional? by Anonymous Coward · 2010-07-28 06:44 · Score: 0, Insightful

Clearly the biggest waste is listening to analysts from Forrester Research (or other useless research company).

If your CxOs were hoodwinked by some con-slutant into buying super expensive storage "solutions" (and stuff like "blades"), then you'd probably also "need" expensive stuff like this to figure out how to allocate or reallocate the _overpriced_ space.

But if you got cheaper storage in the first place, it's better to just buy more storage if you run low on space than to spend lots of money on "solutions looking for problems".

Google, ebay, yahoo etc don't use such stuff. I doubt most companies should either.
Re:Intentional? by KernelMuncher · 2010-07-28 06:52 · Score: 3, Insightful

I think the above example is a great reason why you should always over-engineer your storage capability somewhat. Demand for space can come up unexpectedly and stop the whole show if it's not there. Also if you don't use the storage today, you will definitely make use of it tomorrow. Data usage always goes up, not down. So there's ROI for the next fiscal year when you can make use of the extra capacity.
Re:Intentional? by Dogers · 2010-07-28 07:00 · Score: 1

You might like to speak to 3PAR - when we got them in, they didn't only ask how much storage we wanted, they wanted to know how many IOPS we needed. Their stuff works on the basis that not all the data is needed all of the time. Put hot data on SSD, recent data on SAS/fibre drives and stuff that's not been touched for a while on SATA

--
I am a viral sig. Please copy me and help me spread. Thank you.
Re:Intentional? by HungryHobo · 2010-07-28 07:21 · Score: 1

I don't know about the workplace of the writer of TFA but when I worked in a big factory the price of downtime or a failure due to an application or a number of applications running out of disk space could potentially cause a million worth of damage in lost productivity or damaged product (say it gets stuck in a time sensitive step) in less than half an hour.
I heard claims that a full fab down could cost a million in 10 minutes though that could have been a slight exaggeration.
A million worth of extra disk space to significantly cut down on the chances of that happening, or to allow apps to have their own disk or partition (so, say, one buggy app doesn't bring down 10 more), would barely have made the managers blink.
Re:Intentional? by marcosdumay · 2010-07-28 07:44 · Score: 1

That's not even talking about the fragmentation that inexorably follows a project that has the exact needed size, and all the costs of managing it.
If you are now using 60% of your storage capability, you are in trouble, since that can increase quite fast, not giving you time to buy adequate hardware. What follows is a hell of problems: partitioning storage servers, reallocating disks, reconfiguring workstations and so on.

--
Rethinking email
Re:Intentional? by mikeytag · 2010-07-28 08:19 · Score: 1

Well put. Ditto for us over here.
Re:Intentional? by mlts · 2010-07-28 08:46 · Score: 1

Don't forget filesystems. UNIX filesystem performance goes into the toilet as soon as drives get over 85-90% full, because the filesystem can't locate contiguous space for things, nor can it pre-allocate space around a file easily, so fragmentation results.
Re:Intentional? by Score+Whore · 2010-07-28 08:48 · Score: 1

We did a POC with an array based upon a caching architecture. It worked well as long as the cache happened to match the working set of active transactions. Unfortunately, a large enough percentage of the workload led to cache misses, which killed the per-transaction performance for an equivalent percentage of transactions, which had cascading effects on DB threads, app server threads, connection pools, etc.
(HSM/Tiered storage == sophisticated caching strategy. Same effects apply.)
At the end of the day caching strategies will improve performance, but if you need guarantees you can't rely on cache.
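The gap between average and guaranteed performance can be shown with a two-line model. This is a minimal Python sketch in which every number is hypothetical (90% hit rate, 0.5 ms cache latency, 10 ms disk latency) and `latency_profile` is an illustrative name; it is not a measurement from the POC described above.

```python
def latency_profile(hit_rate, cache_ms, disk_ms):
    """Return (mean latency in ms, fraction of requests that pay full disk latency)."""
    miss_rate = 1.0 - hit_rate
    mean_ms = hit_rate * cache_ms + miss_rate * disk_ms
    return mean_ms, miss_rate


# Hypothetical figures: 90% hits at 0.5 ms, misses at 10 ms.
mean_ms, slow_fraction = latency_profile(0.90, 0.5, 10.0)
print(f"mean {mean_ms:.2f} ms, but {slow_fraction:.0%} of requests take 10.0 ms")
```

The mean looks healthy, yet one request in ten still sees the full disk latency, and those are the transactions that hold threads and connections.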
Re:Intentional? by Anonymous Coward · 2010-07-28 09:25 · Score: 0

If you were running a mainframe (yep z/OS) you could run at 70% utilization without doing much special with the disk. But we spend time measuring what the hardware is doing and providing for the exceptions. And yeah we spend millions on the hardware. Dead or unused data is culled to tape, so this is 70% usage by active data.
Take that Windows and Linux folks. Though I do lust after ZFS!
Re:Intentional? by Gilmoure · 2010-07-28 09:51 · Score: 1

What about This Way To The Egress?

--
I drank what? -- Socrates
Re:Intentional? by uncleFester · 2010-07-28 11:30 · Score: 1

What about the fact* that if something runs amok in a thin-provisioned client and pins a LUN at 100%, the underlying allocation doesn't scale back DOWN after cleanup of such an event.. ending up with the wasted space anyway?
(or the rumour that our OS of choice doesn't really like the magic happening under the covers if you thin-provision, so we're better off avoiding it anyway)
-r
* .. our arrays being EMC and this is what the storage folk tell me.. what do i know, i'm the unix guy.

--
-'fester
Re:Intentional? by Anonymous Coward · 2010-07-28 18:21 · Score: 0

These days anytime an analyst or storage vendor starts talking to me about thin provisioning, zero page reclaim, etc. I have to take a minute and explain to them my actual needs and that they have very little to do with gigabytes or terabytes. Usually I have to do this multiple times.
How many times does it take until you realise that they are often sales hacks with a little bit of experience, a few buzz words, and no frigging idea WTF you are trying to tell them?
Re:Intentional? by randyleepublic · 2010-07-28 22:52 · Score: 0

From an Aptare press release: "APTARE StorageConsole 8 allows IT administrators to proactively monitor their entire storage environment through one centralized Web 2.0 console." Huh? What the fuck is a "Web 2.0 console"??? Did I miss an RFC or something? Sounds like middle management tomfoolery to me. EXPENSIVE middle management tomfoolery.

--
Social Credit would solve everything...
Re:Intentional? by bat21 · 2010-07-29 04:07 · Score: 1

They may not use such software, but I guarantee you they're using "super expensive storage 'solutions'". What do you think they do, plug 10,000 $1000 consumer grade storage arrays into a 1Gb iSCSI san? Massive corporations can afford the very best (in fact, they probably need it). Controllers with 8Gb redundant connections capable of servicing 20 or 30 drive trays of 10TB each...
Re:Intentional? by Chimel31 · 2010-07-29 08:37 · Score: 1

Not really. Google designs and builds their own servers.
The "super expensive storage solutions" are for suckers.
http://news.cnet.com/8301-1001_3-10209580-92.html
These expensive solutions are probably the reason why the analyst mentions saving $1M for each 100TB removed.
With 4U enclosures like backblaze's, you get 90TB for $11K of hardware and $6K (45 disks @ 8WH) of power usage per year.
An IT operator can control dozens of such enclosures, let's say a conservative 2 dozen. So $160K salary / 24 enclosures is about $7K.
Add $7K for a full time dev and custom storage management software, and $14K for management (still for 24 enclosures).
That's still about $45K for 90TB all included, exactly 20 times less than the mentioned $1M for 100TB.
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
I replaced the 1.5TB disks with Seagate Barracuda XT SATA 6Gb/s 2TB disks at $200 on newegg in this computation.
Seagate's other models built in China have lots of problems that the XT doesn't seem to have.
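The arithmetic above can be checked in a few lines. This is a small Python sketch that uses the rounded figures exactly as given in this comment; note that it mixes one-time hardware cost with yearly running costs, as the comment does, and the dictionary keys are labels chosen for the example.

```python
# Cost per 90 TB enclosure, in dollars, using the rounded figures from the comment.
costs = {
    "hardware": 11_000,
    "power": 6_000,
    "operator": 7_000,          # $160K salary spread over 24 enclosures, rounded up
    "dev_and_software": 7_000,
    "management": 14_000,
}
capacity_tb = 45 * 2  # 45 disks of 2 TB each

total = sum(costs.values())
per_tb = total / capacity_tb
analyst_per_tb = 1_000_000 / 100  # the article's $1M per 100 TB

print(f"total ${total}, ${per_tb:.0f}/TB, article figure is {analyst_per_tb / per_tb:.0f}x higher")
```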
Re:Intentional? by Chimel31 · 2010-07-29 08:48 · Score: 1

Well 10TB is rather small and can easily be filled up, so you need the low usage rate, but for datacenters with petabytes of storage space, which are what the article is mostly about, you can aggregate the unused space, remove some, and increase the global usage rate quite a lot, while still allowing for huge increase of storage usage from many of your customers or departments. Compared to the whole capacity of the datacenter, these increases are rather diluted to small percentages.
Re:Intentional? by jesset77 · 2010-07-30 11:35 · Score: 1

I agree with HungryHobo, but I also doubt the opposite side of that equation. How in hell do you estimate 100TB of storage costing $1 million usd? And over what time period? Per quarter, per annum? That works out to $10,000 per terabyte. I can pick up a 2TB SATA drive at frys for $180 (maybe less now). I have a Netgear SAN with 4 such drives at my house, running the equivalent of RAID 5 I've got 6TB in that alone. During the summer, my power bill is $40/mo, and I guarantee the SAN is a drop in that bucket compared to the other crazy gear running at my house.
No, what I think they did is tabbed up the total operating expense of some outfit somewhere and divided that into the size of their datacenter. Every accountant knows that Hookers and Blow should not be tabbed against disk capacity.
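For reference, the per-terabyte comparison works out as follows. This is a minimal Python sketch counting drive cost only (no enclosure, power, or labor); `raid5_usable_tb` is a hypothetical helper using the usual rule that RAID 5 gives up one drive's worth of capacity to parity.

```python
def raid5_usable_tb(drive_count, drive_tb):
    """RAID 5 gives up one drive's worth of capacity to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_tb


usable = raid5_usable_tb(4, 2)                # four 2 TB drives
drive_cost_per_usable_tb = 4 * 180 / usable   # drives only, at $180 each
article_cost_per_tb = 1_000_000 / 100

print(usable, drive_cost_per_usable_tb, article_cost_per_tb)
```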

--
People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.
Re:Intentional? by jesset77 · 2010-07-30 11:40 · Score: 1

In other words, the whole thing is an attempt to get companies to spend tens of thousands of dollars for something that could be done by well-written shell script.
To be fair, "well-written shell script" is only an inexpensive solution for the author of that shell script. When you purchase expensive product A, you aren't spending money on the solution; you are spending money on anchoring your liability in case of catastrophe (software bug misreports disk usage, leading to slapstick) against the provider of the solution.
Put simply, shell scripts are the thongs of the IT world. They are too skimpy to Cover Your Ass with. 8I

--
People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.
Re:Intentional? by jesset77 · 2010-07-30 11:44 · Score: 1

At the end of the day caching strategies will improve performance, but if you need guarantees you can't rely on cache.
Words of wisdom, friend. Average is only a shorthand for bulk volume, it's peaks which challenge your bottleneck.

--
People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.
Re:Intentional? by Tacticus.v1 · 2010-08-01 15:01 · Score: 1

You're comparing 2TB 7200rpm SATA drives running in a NAS with 15 and 10 krpm SAS disks running in a SAN ranging from 150USD for 146GB to 999USD for 600GB that's not counting the significantly more expensive card prices and everything else for running it
Re:Intentional? by jesset77 · 2010-08-01 15:47 · Score: 1

You're comparing 2TB 7200rpm SATA drives running in a NAS with 15 and 10 krpm SAS disks running in a SAN ranging from 150USD for 146GB to 999USD for 600GB that's not counting the significantly more expensive card prices and everything else for running it
You're correct, I picked up the extra concern of IOPS/access speed later in this thread. That pretty much answers my question about "How in hell" they arrive at that estimate.
But that also puts us in the position where TFA measures "dollars per hundred terabytes" without clarifying over what time interval, or the disk access speeds required. So long as gigabytes aren't the scarce resource, and people are purchasing whatever sized disks to use tiny portions in spanned arrays for speed, this is as useless as F-1 racers measuring speed in gallons per hour.

--
People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.

But related to the cost of too little storage. by Z00L00K · 2010-07-28 05:18 · Score: 1

The cost of too much storage isn't bad.

Of course - you may say that it's necessary to delete old data, but in some cases you can't know which old data may be needed again.

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.

Shhhhh by Anonymous Coward · 2010-07-28 05:18 · Score: 0

I want more toys.

100TB = $1 million by maxwell+demon · 2010-07-28 05:19 · Score: 2, Insightful

I didn't know that I've got $25000 dollars worth of storage at home :-)

--
The Tao of math: The numbers you can count are not the real numbers.

Re:100TB = $1 million by phantomcircuit · 2010-07-28 05:23 · Score: 2, Informative

I didn't know that I've got $25000 dollars worth of storage at home :-)
It's not worth that much in your home, unless you happen to have redundant power supplies and redundant uplinks.
Re:100TB = $1 million by Luyseyal · 2010-07-28 05:30 · Score: 2, Funny

And "human resources".
-l

--
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
Re:100TB = $1 million by aliquis · 2010-07-28 05:40 · Score: 3, Funny

And "human resources"
"I'll go build my own data center, with blackjack and hookers!"?
Re:100TB = $1 million by Luyseyal · 2010-07-28 05:50 · Score: 2, Funny

In fact, forget the data center and blackjack!
-l

--
Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
Re:100TB = $1 million by hansamurai · 2010-07-28 05:55 · Score: 1

That's actually cheap compared to the prices I heard quoted at my company the other day. So sad.

--
Reviewing just the first hour of video games.
Re:100TB = $1 million by Anonymous Coward · 2010-07-28 05:56 · Score: 0

And the cost of the SAN infrastructure...DERP
Re:100TB = $1 million by Anonymous Coward · 2010-07-28 06:42 · Score: 1, Informative

Just get married. Of course that will cost you more in the long run - hookers are bounded by the hour.
Re:100TB = $1 million by Conception · 2010-07-28 10:29 · Score: 1

And 15K Fiber Channel drives.
Re:100TB = $1 million by Anonymous Coward · 2010-07-28 17:52 · Score: 0

Ah screw it just forget the whole thing.

Let's play the odds: by fuzzyfuzzyfungus · 2010-07-28 05:20 · Score: 5, Insightful

Likelihood that I get fired because something important runs out of storage and falls over(and, naturally, it'll be most likely to run out of storage under heavy use, which is when we most need it up...): Relatively high...

Likelihood that I get fired because I buy a few hundred gigs too much, that sit in a dusty corner somewhere, barely even noticed except in passing because there is nobody with a clear handle on the overall picture (and, if there is, he is looking at things from the sort of bird's eye view where a few hundred gigs looks like a speck on the map): Relatively insignificant...

Re:Let's play the odds: by qbzzt · 2010-07-28 05:29 · Score: 3, Insightful

Exactly, and that's the way it should be. Your CTO wants you to suggest spending a few extra hundreds of dollars on storage to avoid downtime.

--
-- Support a free market in the field of government
Re:Let's play the odds: by ultranova · 2010-07-28 06:08 · Score: 1

Your CTO wants you to suggest spending a few extra hundreds of dollars on storage to avoid downtime.

A few hundred dollars gets you a few terabytes (it's around 163 dollars for a 2 terabyte drive in the first netstore I checked), not a few hundred gigabytes. Or are these "enterprise harddrives" ?-)

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Let's play the odds: by _damnit_ · 2010-07-28 06:29 · Score: 2, Insightful

Of course this is the case. This study is as exciting as news that George Michael is gay. There have been plenty of studies to this effect. My company makes tons of money consulting on better storage utilization. [Some Fortune 500 companies I've visited run below 40% utilization.] EMC, IBM, HDS, NetApp and the rest have no real interest in selling you less drives. They all make vague, glossy statements about saving storage money but in reality you need to be wasteful if you want to protect your ass. Think of the things we spend $ on just to get another 9 on the uptime digits: UPS, generators, clustering, DR systems/networks that sit idle, dark fibre between datacenters, RAID 1(+0), RAID 6, tapes, VTLs, Storage Arrays, redundant Fibre Channel SANs, . . .
From a human perspective, fuzzyfungus is right. Over-engineering is less likely to cost your job than failure. Plus, over-engineering is easy to justify.
Some things are just known to cost money if you MUST ensure that business is not subject to fallibility in hw and sw. The fact that there are 50 TBs unused out of your 200 TB of usable storage really might not mean too much. [Some of the numbers quoted could point to the mirrored side of RAID 1 stripes as wasted. It's a cheap gimmick to make the numbers look worse but still true to a certain extent if the performance difference between R5 and R1 is not needed.] Of course, there are usually low hanging fruit that can be attacked to save real money and prevent cascading costs on the other cost centers mentioned above but there will always be waste. It's the cost of five 9's.
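What each extra 9 buys is easy to compute. This is a minimal Python sketch of allowed downtime per year for a given number of nines; `downtime_minutes_per_year` is an illustrative name, and the calculation ignores leap years and any distinction between planned and unplanned outages.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignoring leap years


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes/year")
```

Each additional 9 cuts the downtime budget by a factor of ten, which is why the redundant gear listed above piles up so quickly.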

--

_damnit_

It's my job to freeze you. -- Logan's Run
Re:Let's play the odds: by wagnerrp · 2010-07-28 06:51 · Score: 3, Insightful

They're not buying the $100 2TB bargain special, they're buying the $300 300GB 15K SAS drive. They don't care how much storage they have, they just want the IOPS.
Re:Let's play the odds: by TubeSteak · 2010-07-28 06:54 · Score: 1

Exactly, and that's the way it should be. Your CTO wants you to suggest spending a few extra hundreds of dollars on storage to avoid downtime.
The way we build servers and do storage has changed *massively* over the last 10 years.
Why is it so hard to imagine that storage is going to evolve again?
FTFA

Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises.
"Our customers can see a return on the price of the software typically in about six months through better utilization rates and preventing the unnecessary purchase of storage," Clark said.
A minimum of $5,000 per month strikes me as a touch more than "spending a few extra hundreds of dollars on storage."

--
[Fuck Beta]
o0t!
Re:Let's play the odds: by Shotgun · 2010-07-28 07:29 · Score: 1

NetApp and the rest have no real interest in selling you less drives.
Then why is about half of their feature set aimed at helping their customers reduce storage usage (wafl file system, dedupe, etc)?
Why have they instituted a systems group to do nothing BUT coach customers in how to reduce disk usage?
There is a LOT of competitive advantage in selling less drives.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
Re:Let's play the odds: by BlackSnake112 · 2010-07-28 08:21 · Score: 1

Most CIOs would not risk their job on non enterprise hard drives. The regular drives may be cheaper, but they may also fail sooner. Data centers and the like are most likely using enterprise level drives.
That being said. Many of us have had enterprise drives fail in under a month and have consumer level drives that are still going strong after 10+ years.
Re:Let's play the odds: by Cramer · 2010-07-28 08:48 · Score: 1

My quick back-of-napkin math... I can build a 100TB storage system in one 42U rack for ~$150k -- and that's with "enterprise" 450GB 15k RPM SAS drives. It'll draw around 5 kW, meaning it'll cost under $10k per year to run. (Cooling requirements not included, but assuming 5 tons would do: $30k for the system, $10k/yr to run it.)
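The napkin math can be reproduced roughly as follows. This is a minimal Python sketch: the drive count is for raw capacity only, with no RAID overhead or spares, and the $0.10/kWh electricity rate is a hypothetical placeholder, not a figure from this comment. Any rate under roughly $0.23/kWh keeps a constant 5 kW load below $10k per year.

```python
import math

HOURS_PER_YEAR = 24 * 365


def drive_count(target_tb, drive_gb):
    """Drives needed for raw capacity (no RAID or spares), using 1 TB = 1000 GB."""
    return math.ceil(target_tb * 1000 / drive_gb)


def yearly_power_cost(kilowatts, dollars_per_kwh):
    """Electricity cost of a constant load over one year."""
    return kilowatts * HOURS_PER_YEAR * dollars_per_kwh


print(drive_count(100, 450))                 # raw drive count for 100 TB
print(round(yearly_power_cost(5, 0.10)))     # $0.10/kWh is a hypothetical rate
```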
Re:Let's play the odds: by 7213 · 2010-07-28 09:24 · Score: 1

I'm a storage administrator, and I'll be the first to tell you the application knowledge rarely falls down to my level. When it does, half the time it's pure crap. The other half, we can architect something intelligent & I go home feeling good.
That being said:
Being on the other side of that wall, I do get fed up with the "I can just buy a bunch of disk, slap it in a server & call it a disk array" game. The software for redundancy on the level of quality of the crappiest CLARiiON array with dual SPs is just not there (it will be soon, I think). Also, at a large enough scale, putting the slow IO bulk storage on the 'big expensive SAN disk' does make sense, in that you can use capacity on disks that may be near the IO limit but barely touched on capacity. This -CAN- cost less than buying/supporting another subsystem (like NetApp). Most large companies have no way to charge you less for the lower IO profile though, as they just kick back the purchase price by GB :-( so you are subsidizing the IO of some IO hog; the company, however, isn't spending more money to do this.
The real problems are
1) bean counters who are totally lost, but think they aren't.
2) application & server people who don't understand storage*, but think they do.
3) storage people who don't understand the app, but think they do.
you find the solution to THOSE 3 issues & you can write your own ticket!
(* most times the 'app people', don't even understand the shrink wrapped app they just purchased & the vendor IO profile is a crazy over estimated CYA)
Re:Let's play the odds: by MarkTina · 2010-07-28 11:05 · Score: 1

And the support and software layers ?
The tin is cheap, supporting it and providing useful features are what rack up the pennies.
Re:Let's play the odds: by dave562 · 2010-07-28 12:33 · Score: 1

From a human perspective, fuzzyfungus is right. Over-engineering is less likely to cost your job than failure. Plus, over-engineering is easy to justify.
Exactly. I'm working for a company that provides a software based service to law firms. The bill to a single client can eclipse $150,000 PER MONTH. With that kind of money being thrown around, the expectation is that the application will be up and running, ALL THE TIME. As odd as it is, there are people connected into the system at 4am sometimes (with a 15 minute idle timer, you can bet they really are connected). The storage requirements of the system are pretty intense and only getting more so. Often cases will sit in the system for years "just in case". Quite frequently, the "just in case" turns into "wow, we really did need that data again". It is worth it to us to hold onto the data for the clients at a rate of $50 a gigabyte. They aren't just paying for the raw disk space, they're paying for the accessibility.
We're using a mixture of drives. 15K SAS for the transaction logs and database servers. 7200RPM SATA for the file share stuff (scanned documents, etc).
Re:Let's play the odds: by drsmithy · 2010-07-28 20:41 · Score: 1

My quick back-of-napkin math... I can build a 100TB storage system in one 42U rack for ~150k -- and that's with "enterprise" 450G 15k RPM SAS drives.
What's managing it and how are you connecting to it ?
Re:Let's play the odds: by jesset77 · 2010-07-30 11:59 · Score: 1

I read years back that Google's data centers use largely commodity servers and drives, but their operations assume so much data redundancy that no one drive failure hurts them. They pull the whole server whenever it suits them, plug a spare in and send it to the coroner.
It really just makes my ears bleed to hear that seven years after I read this, most organizations are still futzing over the reliability or IOPS of single drives. Why cannot the reliability and access speed be spread over a larger number of hardware instances, thus taking each one individually off the hook?
Many hands make lighter work, they say.

--
People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.

Slashvertisement by hcdejong · 2010-07-28 05:21 · Score: 5, Insightful

for a storage monitoring system.

You know if they were under provisioning by Anonymous Coward · 2010-07-28 05:22 · Score: 1, Interesting

The story would be generating much gnashing of teeth about the evil corporations and the corner cutting that was bringing down our pink unicorns.

Can't win for losing around here.

Overprovisioning by shoppa · 2010-07-28 05:23 · Score: 3, Interesting

It's so easy to over-provision. Hardware is cheap and if you don't ask for more than you think you need, you may end up (especially after the app becomes popular, gasp!) needing more than you thought at first.

It's like two kids fighting over a pie. Mom comes in, and kid #1 says "I think we should split it equally". Kid #2 says "I want it all". Mom listens to both sides and the kid who wanted his fair share only gets one quarter of the pie, while the kid who wanted it all gets three quarters. That's why you have to ask for more than you fairly need. It happens not just at the hardware purchase end but all the way up the pole. And you better spend the money you asked for or you're gonna lose it, too.

Re:Overprovisioning by Maarx · 2010-07-28 05:28 · Score: 5, Insightful

That mother is terrible.
Re:Overprovisioning by Anonymous Coward · 2010-07-28 05:28 · Score: 0

Your siblings must have had a horrible upbringing, with you always taking more than your fair share. Though I'm sure it worked out for you nicely. also your mother is a whore.
Re:Overprovisioning by Anonymous Coward · 2010-07-28 05:39 · Score: 0

He's American so gobbling down 3/4ths of a pie is just a bite-size snack to his oversized gullet.
Re:Overprovisioning by Archangel+Michael · 2010-07-28 05:44 · Score: 3, Insightful

Dad here. Had that fight (or similar). I asked a simple question to the kid who wanted it all. I asked him "all or nothing?" and again he said "all", to which I said "nothing".
Of course he rightly cried "Not Fair!!!", and I said, you set the rules, you wanted it all, setting the rule up that you didn't want to be fair, I'm just playing by your rules.
Never had that problem again. EVER.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Overprovisioning by Zerth · 2010-07-28 05:47 · Score: 1

And works in the budgeting dept of a company I'm glad I'm no longer at.
Re:Overprovisioning by MagicM · 2010-07-28 05:55 · Score: 1

I asked him "all or nothing?"
At that point he was screwed. If he said "nothing", he could reasonably expect to get nothing. His only option was to say "all" if he wanted to get a chance at something.
Re:Overprovisioning by hoggoth · 2010-07-28 05:56 · Score: 1

My dad used to try that fucking psychology on me. I wish he had just hit me and gotten it over with.

--
- For the complete works of Shakespeare: cat /dev/random (may take some time)
Re:Overprovisioning by Anonymous Coward · 2010-07-28 05:58 · Score: 0

That mother is terrible.
Terrible? That's a *nice* word for the woman that raised Glenn Beck
Re:Overprovisioning by Anonymous Coward · 2010-07-28 05:59 · Score: 0

His fatass porker son didn't need any pie to begin with. The fatty should have been on the treadmill instead of huffing and puffing trying to scam more pie.
Re:Overprovisioning by Lunix+Nutcase · 2010-07-28 06:02 · Score: 1, Insightful

There's a reason his mom killed herself. Would you want to be known as the one who gave birth to that festering, pustule of fat?
Re:Overprovisioning by Anonymous Coward · 2010-07-28 06:07 · Score: 0

Worst analogy ever!
Re:Overprovisioning by Archangel+Michael · 2010-07-28 06:13 · Score: 2, Insightful

Nope, he wasn't screwed, because it wasn't the only option; it was a false dichotomy. I gave him a chance to offer another choice, it was just veiled. Kobayashi Maru. He could have thought about it and said "half" even though that wasn't an obvious choice.
I often give my kids tests to break them out of self imposed boxes (false dichotomy). Pick a number between 1 and 10 .... 1 - no, 2 - no, 3 - no, 4 - no .... 9 - no, 10 - no ... THAT'S IMPOSSIBLE DAD!!
No it isn't. The number I had in mind was Pi.
Raising kids to think for themselves, and outside the "boxes" society tends to put on things makes them able to deal better with things that don't appear to make sense.
You can dumb down your kids by not challenging them, or you can challenge them every step of the way, in ways that force them to learn more than they know.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Overprovisioning by Archangel+Michael · 2010-07-28 06:22 · Score: 2, Funny

I wish I would have hit you now - Dad

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:Overprovisioning by omglolbah · 2010-07-28 06:25 · Score: 1

Oh my mom was much more devious.
She would let one of us cut the pie, and the other pick the first piece....
Now imagine a 14 and 11 year old using NASA-style tools to divide a piece of pie ;)
Re:Overprovisioning by Culture20 · 2010-07-28 07:50 · Score: 2, Insightful

At that point he was screwed. If he said "nothing", he could reasonably expect to get nothing. His only option was to say "all" if he wanted to get a chance at something.
If my son (nobly or stubbornly) said "nothing", I'd offer him half or nothing. Parents are allowed to alter the deals. Pray that they alter them further.
Re:Overprovisioning by Anonymous Coward · 2010-07-28 08:00 · Score: 1, Insightful

Oh my mom was much more devious.
She would let one of us cut the pie, and the other pick the first piece....
That's not devious - all moms with even a lick of sense do it that way.
Re:Overprovisioning by Yogs · 2010-07-28 08:20 · Score: 1

As are most managers.
Re:Overprovisioning by sheph · 2010-07-28 08:51 · Score: 2, Insightful

Well done, man!! See, some folks just don't know what tough love is, and the positive impact it can have. You wanna run for office in 2012? We could use someone like you after the current round of buffoons!

--
I don't believe in karma, I just call it like I see it.
Re:Overprovisioning by Jaime2 · 2010-07-28 10:31 · Score: 1

Another factor is that it is way too expensive to re-provision. Where I work, you might as well ask for 5 times what you need, because if you go back and ask for an increase, the labor to do it costs more than the storage. It really shouldn't take 20 hours of someone's time to make my LUN bigger, but that's what the storage team will bill me for. If re-provisioning only required that you pay for the additional storage, I wouldn't worry about it.

Good storage virtualization fixes most of these problems, but it seems like nobody wants to invest in it.
Re:Overprovisioning by Anonymous Coward · 2010-07-28 19:56 · Score: 1, Insightful

Not exactly. All parties can have "nothing", but only one party can have "all". Therefore, those who say "nothing" do so in the interest of fairness, and that fairness is rewarded. Assuming that the kid is old enough to understand this, it can be a great lesson.
Captcha: maturely
Re:Overprovisioning by Archangel+Michael · 2010-07-29 02:23 · Score: 1

I'd love to. However, I wouldn't even vote for me.
In the words of Groucho Marx "I don't care to belong to a club that accepts people like me as members"

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.

Disk space is free by amorsen · 2010-07-28 05:27 · Score: 5, Interesting

Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.

I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.

--
Finally! A year of moderation! Ready for 2019?

Re:Disk space is free by eldavojohn · 2010-07-28 05:44 · Score: 2, Interesting

Who cares if you leave disks 10% full? To get rid of the minimum of 2 disks per server you need to boot from SAN, and disk space in the SAN is often 10x the cost of standard SAS disks. Especially if the server could make do with the two built-in disks and save the cost of an FC card + FC switch port.
I/O's per second on the other hand cost real money, so it is a waste to leave 15k and SSD disks idle. A quarter full does not matter if they are I/O saturated; the rest of the capacity is just wasted, but again you often cannot buy a disk a quarter of the size with the same I/O's per second.
I don't know too much about what you just said but I do know that the Linux images I get at work are virtual machines of a free distribution of Linux. I can request any size I want. But my databases often grow. And then the next thing is that a resizing of a partition is very expensive from our provisioner. So what do we do? We estimate how much space our web apps take up a month and then we request space for 10 years out. Because a resize of the partition is so damned expensive. And those sizes are usually pretty small anyway if you're building databases. Then we occasionally notify our managers when space is getting low by using the provisioner's dashboard tool and we re-assess the application. Is it getting unexpectedly popular or was it bad estimation from the beginning?
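The "estimate monthly growth, then request space for 10 years out" approach can be sketched in a few lines. This is only an illustration assuming linear growth; the function names and the sample figures (5 GB today, 0.5 GB a month) are made up, not anything from a real provisioning tool.

```python
import math


def projected_size_gb(current_gb, growth_gb_per_month, years):
    """Linear projection of the database size after `years` years."""
    return current_gb + growth_gb_per_month * 12 * years


def months_until_full(current_gb, growth_gb_per_month, allocated_gb):
    """Whole months of headroom left before the allocation is exhausted."""
    if growth_gb_per_month <= 0:
        return math.inf  # not growing, so the allocation never fills up
    headroom_gb = max(allocated_gb - current_gb, 0)
    return math.floor(headroom_gb / growth_gb_per_month)


if __name__ == "__main__":
    # Hypothetical figures: 5 GB today, growing 0.5 GB a month.
    request_gb = projected_size_gb(5, 0.5, 10)
    print(f"request for 10 years out: {request_gb} GB")
    print(f"months of headroom: {months_until_full(5, 0.5, request_gb)}")
```

If the headroom shrinks much faster than the estimate said it would, that is the cue to ask whether the app got popular or the estimate was bad from the start.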

I don't know if I should be bothering with the hardware level of things. I sure do like it this way even though it is a really expensive price for the project but the payment remains inside our company anyway. It's internal to the company so we're all using some nebulous group of actual machines and RAIDs to produce a massive cloud of smaller servers as images. There are some downsides and a bit of overhead to pay for virtualization but I thought everyone had moved to this model ...

--
My work here is dung.
Re:Disk space is free by bobcat7677 · 2010-07-28 05:48 · Score: 2, Interesting

Parent has an excellent point. Utilization is not always about how full the disk is...especially in a data center where there are frequently large database operations requiring extreme amounts of IOPS. In the past, the answer was to throw "more spindles" at it. At which point you could theoretically end up with a 20GB database spread across 40 SAS disks making available ~1.5TB of space using the typical 73GB size disks just to reach the IOPS capacity needed to handle heavy update/insert/read operations. Huge waste of space, but the only way to do it with spinning disks. SSDs of course can solve the problem, but most SAN vendors are still charging insane prices for what meager SSD options they offer, with some vendors not even offering SSD options yet. And then you can end up on the other end of the scale, having to buy more IOPS capacity than you need just to get enough SSD space for your data. Adaptec has some cool technology for "hybrid" arrays consisting of both SSDs and spindle disks in the same array (I have heard the latest versions of Solaris can do this with ZFS too). But the applications for hybrid arrays are somewhat limited because write performance still sucks once any available write cache is saturated (and especially if the controller/software array has no cache).
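To put rough numbers on the "more spindles" trade-off, here is a small sketch. The 175 IOPS per 15k disk figure and the mirrored (RAID 10) layout are assumptions for illustration; the post only gives the 40 disks, 73GB each, ~1.5TB and 20GB figures. Function names are hypothetical.

```python
import math


def spindles_needed(target_iops, iops_per_disk):
    """Disks required to reach an IOPS target, ignoring cache and RAID write penalties."""
    return math.ceil(target_iops / iops_per_disk)


def raid10_usable_gb(disks, disk_gb):
    """Usable capacity when every disk is mirrored (half the raw space)."""
    return (disks // 2) * disk_gb


def space_utilization(data_gb, usable_gb):
    """Fraction of usable capacity actually holding data."""
    return data_gb / usable_gb


if __name__ == "__main__":
    disks = spindles_needed(7000, 175)   # assumed workload and per-disk IOPS
    usable = raid10_usable_gb(disks, 73)
    print(disks, "disks,", usable, "GB usable")
    print(f"space used by a 20GB database: {space_utilization(20, usable):.1%}")
```

Sized this way the array is bought for its I/O, so the percent-full number says very little about whether it is "wasted".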
Re:Disk space is free by joe_frisch · 2010-07-28 06:09 · Score: 1

The $1million / 100TB might be real, though it seems high, but the great majority of that is NOT hardware costs. In fact having larger disks than you need may reduce the management costs - less chance a particular disk set will become full, extra space to move data from failing disks, etc.
Re:Disk space is free by marcosdumay · 2010-07-28 07:56 · Score: 1

Those virtual machines are stored on a real SAN somewhere. The SAN administrator deals with all the things the GP said, that is why you don't need to understand it. Anyway, he'd better have some spare capacity and plan based on I/O, and not storage size (he probably did), otherwise, you'll have big unknown risks.

--
Rethinking email
Re:Disk space is free by dave562 · 2010-07-28 12:42 · Score: 1

There are some downsides and a bit of overhead to pay for virtualization but I thought everyone had moved to this model ...
And virtualization isn't always the way to go. It is great for a lot of environments, but sometimes you have an application that really does need all of the cores and all of the RAM a box might have.

Mod parent up by Anonymous Coward · 2010-07-28 05:36 · Score: 0

Interesting. Was the culprit all cad files out of the new rev?

Re:Mod parent up by TrisexualPuppy · 2010-07-28 05:45 · Score: 2, Insightful

Interesting. Was the culprit all cad files out of the new rev?
Yes, for the most part. Because of a bad config, they were going from drawings around 1-10MB to drawings over 100MB. That's what happens when you get management to take the IT department out of the software management and configuration equation. We were, of course, still left to sweep up the pieces.
Re:Mod parent up by minorproblem · 2010-07-28 16:08 · Score: 2, Interesting

I've seen worse. At my company they moved the CAD software management to the drafters, and then they broke up the drafting department and just assigned each drafter to a team. I am an engineer and I sit near the IT department. I feel sorry for the poor buggers: now not only do they have to run around like headless chooks, but so do the CAD drafters, because before, the load leveling was done by a head drafter allocating work. Now it's managers running around asking other managers if they can "borrow" their drafter, and we have different people running different versions, and to sum it up it's hell to watch.
And the only reason they implemented such a scheme was that accounting told them it would save money... So instead of having 8 drafters for the whole company we now have 12 (one for each project). Sometimes the world doesn't work with just numbers!

Or IT is provisioning for peak usage by Todd+Knarr · 2010-07-28 05:37 · Score: 3, Informative

Having too much storage is an easy problem. Sure, it costs a bit more, but not prohibitively so or you'd never have gotten approval to spend the money. Not having enough storage, OTOH, is a hard problem. Running out of space in the middle of a job means a crashed job and downtime to add more storage. That probably just cost more than having too much would've, and then you pile the political problems on top of that. So common sense says you don't provision for the storage you're going to normally need, you provision for the maximum storage you expect to need at any time plus a bit of padding just in case.

AT&T discovered this back in the days when telephone operators actually got a lot of work. They found that phone calls tend to come in in clumps, they weren't evenly distributed, so when they staffed for the average call rate they ended up failing to meet their answer times on a very large fraction of their calls. They had to change to staffing for the peak number of simultaneous calls, and accept the idle operators as a cost of being able to meet those peaks.

Re:Or IT is provisioning for peak usage by kirillian · 2010-07-28 06:41 · Score: 1

Queue theory...one of the oddest choices for a topic to cover in operating systems class in college, but the most intriguing and useful thing I ever got out of all of my classes - honestly probably the only thing that I use day to day that I learned in class and not from teaching myself. The concept of analyzing a process that can be described with a queue (such as a datacenter or the telephone operators) and then finding an efficient means of handling the queue, including managing desirable wait times and total time in queue is incredibly applicable in corporate environments. Personally, I think queue theory would probably be more useful to business people than most of the other things that they teach them.
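For the telephone-operator case, the textbook model is Erlang C (an M/M/c queue). The sketch below is one standard way to compute it and is not anything AT&T specifically used; it assumes random (Poisson) arrivals and exponentially distributed call lengths, and the function names are made up.

```python
def erlang_b(load, servers):
    """Blocking probability for `load` erlangs offered to `servers` lines (recurrence form)."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b


def erlang_c(load, servers):
    """Probability that an arriving call has to wait for a free operator."""
    if load >= servers:
        return 1.0  # staffed at or below the average load: the queue never drains
    b = erlang_b(load, servers)
    return servers * b / (servers - load * (1.0 - b))


def min_servers(load, max_wait_probability):
    """Smallest operator count that keeps the wait probability at or under the target."""
    n = max(1, int(load))
    while erlang_c(load, n) > max_wait_probability:
        n += 1
    return n


if __name__ == "__main__":
    load = 10.0  # hypothetical: 10 erlangs of average traffic
    print("wait probability with 10 operators:", erlang_c(load, 10))
    print("operators needed for <= 20% waiting:", min_servers(load, 0.2))
```

Staffing exactly for the average load makes essentially every caller wait, which is the same argument as provisioning storage for the peak rather than the mean.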
Re:Or IT is provisioning for peak usage by Rene+S.+Hollan · 2010-07-28 10:06 · Score: 1

Actually, telcos ramp their operator staff up and down in response to expected call volume based on historical time of day and day of week trends.
This was quite important when I worked on an automated 411 service that front-ended operators with computers doing speech recognition. Only if the system could not help the caller would it get routed to a human.
Regulatory requirements placed limits on the distribution of time to queue for an operator but not as stringent ones on returning a busy signal when staff was unavailable.
To avoid the scenario where some media announcement precipitated large numbers of 411 calls (usually for the phone number for information for some upcoming event that was missed in the radio ad), that would then queue for live operators, we adjusted the number of simultaneous calls our equipment would accept to follow the operator staffing levels.
During staffing ramp ups, I learned the hard way that the DMS100 was not too forgiving of a bunch of T1 trunks becoming active all at once: the digital trunk cards would mess up the wink start timing until things settled down. We actually had to bring the automated service lines up slowly so as not to overwhelm the trunk cards.

--
In Liberty, Rene

Need to read this one carefully by Anonymous Coward · 2010-07-28 05:38 · Score: 1, Interesting

If you RTFA (and admittedly, this is not very clear), the article tries to make the point that you don't need all of this storage capacity to be live. However, you've got a bunch of storage pools or machines just running idle as opposed to actually doing something. What the article is trying to say is that using provisioning tools that will spin up storage pools or servers as they are needed (as demand increases) is a much better solution than just leaving them running. Obviously peak load will cause issues, but you configure your provisioning tools to be smarter and start bringing up capacity at lighter loads or specific times of day. The point still stands that most data centers just have idling machines that could just as easily be shut off most of the time and automatically brought up when needed; it's just that most do not use these tools despite the savings in electricity, wear, and cooling costs.

The article confounds the issue by starting to talk about the lack of monitoring tools that leads to overprovisioning, and ends with a discussion as to how to make the storage problem more efficient (thin provisioning). Thing is, thin provisioning only works when you have the extra capacity, but it's not live until you need it. You still need to overprovision, but you won't be running all those resources idle at once just in case.

it's cheaper to waste space by alen · 2010-07-28 05:38 · Score: 1

2 146GB drives from HP are less than $500 for the SAS drives. you can put the same storage on an EMC SAN and provision less for the system drive for a Windows server, but by the time you pay their crack dealer prices for hard drives, along with the drives for the BCV volumes, and pay for the fiber switches and GBICs and HBAs and everything else, it's cheaper to waste space on regular hard drives

Re:it's cheaper to waste space by Anonymous Coward · 2010-07-28 05:44 · Score: 0

GBICs and 146GB drives are cheap even from EMC these days. The 500GB drives and SFPs are pretty spendy though.
Re:it's cheaper to waste space by Anonymous Coward · 2010-07-28 05:59 · Score: 0

2TB SAS drives for $500? Where can I buy?

CYA Approach by MBGMorden · 2010-07-28 05:40 · Score: 4, Informative

This is the CYA approach, and I don't see it getting any better. When configuring a server, it's usually better to pay the marginally higher cost for 3-4x as much disk space as you think you'll need, rather than risk the possibility of returning to your boss asking to buy MORE space later.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain

Re:CYA Approach by petermgreen · 2010-07-28 06:35 · Score: 1

And it may well make economic sense too at least if you are talking about a low end server with a pair of SATA drives (though it depends how much your server vendor rips you off on hard drives).
Upgrading drives later has a lot of costs on top of the raw cost of getting the extra drives.
How much does it cost to get hold of those extra drives? (at uni recently someone told me that the total cost of processing a purchase order worked out to about £40 now admittedly some of that is fixed costs but still it makes you think about how you order stuff)
How much does it cost for the server monkey's time to add extra drives?
How much does it cost for the sysadmin time to reconfigure the box to use those new drives?

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register

100 TB for $1,000,000? No way! by Anonymous Coward · 2010-07-28 05:42 · Score: 1, Informative

OK, bare 1TB enterprise class drives cost about $130 at Newegg retail. (half that price if you go for standard grade disks)
A hundred such disk drives will set you back $13,000.
Figure another $10,000 for mounting, power supplies, connectors, and other obvious hardware.
Another $2,000 for four racks.

Floorspace? Racking them loosely gives you 25 per rack, or 4 racks. Each rack is about 10 square feet, or 40 square feet in total.
At $10/square foot, that's maybe $400 or $500 a month or around $5,000 per year
Electricity? 100 drives at 8 watts per drive yields full time load of 800 watts;
at a nominal $0.15 per kWh, that's around $1100 per year in electric bills.
Air Conditioning ... roughly the same as the power cost ... another $1100 per year.
Replacement at 2 percent failure rate is perhaps $200 per year.

Human costs? The cost of labor to support a 50 TB disk farm can't be much different from that of a 100TB farm.
Indeed, it's probably less labor (and software) intensive to have a system with great overcapacity than one that needs squeezing.
In either case, at most, a 100TB disk farm might need 2 full time staffers. Generously, that's $150,000 per year.

So the hardware cost of a 100TB system is around $25,000.
And annual operating costs (floorspace, power, cooling, replacements) of around $7,400 per year.
Labor costs of $150,000 per year.

Where do they get the $1,000,000 per year?
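For anyone who wants to rerun the tally with their own numbers, here is a rough sketch. The dollar figures are just the ones quoted in this post; the dictionary keys and function names are made up for illustration.

```python
# One-time hardware figures quoted in the post (USD).
HARDWARE = {"drives": 13_000, "mounting_and_power": 10_000, "racks": 2_000}

# Yearly running costs quoted in the post (USD per year).
ANNUAL_OPEX = {
    "floorspace": 5_000,
    "electricity": 1_100,
    "cooling": 1_100,
    "replacements": 200,
}

ANNUAL_LABOR = 150_000  # two full-time staffers, per the post


def first_year_total(hardware, opex, labor):
    """Hardware purchase plus one year of running costs and labor."""
    return sum(hardware.values()) + sum(opex.values()) + labor


def cost_per_tb(total, capacity_tb=100):
    """Spread a total cost over the farm's capacity."""
    return total / capacity_tb


if __name__ == "__main__":
    total = first_year_total(HARDWARE, ANNUAL_OPEX, ANNUAL_LABOR)
    print(f"first-year total: ${total:,}")
    print(f"per TB: ${cost_per_tb(total):,.0f}")
```

Labor dominates whichever way the hardware lines are adjusted, so the interesting knob is staffing, not disk count.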

Huh? by Anonymous Coward · 2010-07-28 05:42 · Score: 0

Shitloads of unused disk space is what I *want*.

sounds like the consultants are having a slow year by alen · 2010-07-28 05:45 · Score: 2, Interesting

time to go and buy up all kinds of expensive software to tell us something or other

it's almost like the DR consultants who say we need to spend a fortune on a DR site in case a nuclear bomb goes off and we need to run the business from 100 miles away. i'll be 2000 miles away living with mom again in the middle of nowhere and making sure my family is safe. not going to some DR site that is going to close because half of NYC is going to go bankrupt in the depression after a WMD attack

ISPs & hosting services by shmlco · 2010-07-28 05:47 · Score: 2, Insightful

This isn't like an ISP overbooking a line and hoping that everyone doesn't decide to download a movie at the same time. If a hosting service says your account can have 10GB of storage, contractually they need to make sure 10GB of storage exists.

Even though most accounts don't need it.

One client of mine dramatically over-provisioned his database server. But then again, he expects at some point to break past his current customer plateau and hit the big time. Will he do so? Who can say?

It may be a bit wasteful to over-provision a server, but I can guarantee you that continually ripping out "just big enough" servers and installing larger ones is even more wasteful.

Your pick.

--
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.

Re:ISPs & hosting services by shentino · 2010-07-28 06:07 · Score: 1

If you need it, is it really a waste?
Re:ISPs & hosting services by Chimel31 · 2010-07-29 10:23 · Score: 1

Adding more servers is wasteful only if you have poor storage management.
If you need more space, you should be able to just add a new server and allocate that extra space to the customers who need it, or not allocate anything at all and bill the customers for the actual space/transfer used or for the extra TBs above their quota. You probably don't even need quotas at all with a smart storage management software.

This isn't a new problem... by Mysticalfruit · 2010-07-28 05:49 · Score: 1

This is one of the arguments that's made for using a SAN. Consolidate to make better use of the disk space. Smaller footprint, less power, etc.

--
Yes Francis, the world has gone crazy.

Re:This isn't a new problem... by petermgreen · 2010-07-28 06:46 · Score: 1

However SANs have issues of their own
1: they are EXPENSIVE, figure you will be paying many times the cost per gigabyte of ordinary drives. Particularly if you buy the SAN vendor's drives so you get support. This tends to cancel out more efficient use of space.
2: Even a 1U server has space for a few drives inside, so if you use a SAN with 1U servers it will probably take up more space than just putting the drives in the servers. Blades would reduce this issue but come with issues of their own (e.g. vendor lock-in)
3: if something does go wrong with a SAN it means everything has problems at once. This can leave all sorts of IT services down for days as IT scramble to first fix the SAN and then fix everything that depends on the SAN (seen this happen at the uni I go to).
I have my doubts on the power consumption front too. Afaict drives are a negligible part of a modern computer's power consumption anyway.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register

On thin provisioning by JasonM314 · 2010-07-28 05:56 · Score: 1

Thin provisioning doesn't fix this problem. At least not today.

The only way thin provisioning fixes this problem is if you over-commit the thin pool. That's all well and good, but currently, any given storage chunk that is allocated to a server is stuck being allocated to that server. So, if I were a server admin who found out he'd been given thin LUNs in an over-committed pool, I know that if my neighboring admins don't keep track of their storage use, then my server could wind up crashing because they took up all the storage. So instead, I'm going to write a script first thing when I get the storage to write a text file clear across the drive. There. Now my disk is fully provisioned, and my neighbors can use all the pool they want, it won't affect me. 'course, not everyone can do that, or the pool will fill up lickety split.

Now, someday, the servers will be smart enough to tell the storage array when they're done with a chunk of storage. At which point, that part of the pool can be freed up. When that happens (and it will, but it's going to take some time), thin pools will be ideal. Everyone will have all the storage they need almost all of the time.

However, that day isn't here yet. In the meantime, there are interesting performance reasons to use thin provisioning, but not space-related ones.

Re:On thin provisioning by Guido+von+Guido · 2010-07-28 06:54 · Score: 1

The only way thin provisioning fixes this problem is if you over-commit the thin pool. That's all well and good, but currently, any given storage chunk that is allocated to a server is stuck being allocated to that server. So, if I were a server admin who found out he'd been given thin LUNs in an over-committed pool, I know that if my neighboring admins don't keep track of their storage use, then my server could wind up crashing because they took up all the storage. So instead, I'm going to write a script first thing when I get the storage to write a text file clear across the drive. There. Now my disk is fully provisioned, and my neighbors can use all the pool they want, it won't affect me. 'course, not everyone can do that, or the pool will fill up lickety split.
How exactly is using up all of your thinly provisioned disk on purpose all at once any different from your peers not watching their disk use? Answer: they might cause a problem, and you have.
As the storage admin, I'd walk over to your desk and smack you. I'm the one who's watching the size of the pool, and I'm the one who will order new disk when it's necessary. I'm the one who will make other arrangements if management doesn't fork up the money for the disks.
Depending on the technology in use, "other arrangements" could mean the migration of LUNs to other storage arrays behind the scenes (i.e., no downtime), moving virtual machines with storage vmotion, or other, usually uglier methods of dealing with it (i.e., stop the application, migrate the data manually somewhere else, bring up the application).
Re:On thin provisioning by mysidia · 2010-07-28 07:22 · Score: 1

Now, someday, the servers will be smart enough to tell the storage array when they're done with a chunk of storage.
Servers are that smart... most commonly this is needed for using SSD drives -- SCSI PUNCH, SATA TRIM, or writing a block of all zeros to a sector.... there are OS configurations that support this, and if you don't have an OS that can handle it -- a simple piece of software can take care of this, but most SANs do not understand/take advantage of the server sending those commands.
The servers aren't dumb, the uber-proprietary ultra-expensive SANs are. And when the SAN vendors eventually get the feature to understand the servers' SCSI PUNCH commands, it will probably require a few more million in additional licensing costs, in addition to having current support/upgrades agreements in place.
Writing an all zeros sector is probably the most supported. Some SANs have dedup functionality, and an all-zeros sector is easy to dedup: it just requires special software running on the server.
"Writing to all sectors" in an overcommitted pool doesn't guarantee squat, when the SAN is operating with special features such as snapshots, clones, copy on write foundation, etc.
In some environments, your ability to write or change sectors on your disk may depend on there being free additional space available in the pool, even if you've already written to that sector.
If the pool runs out, which you would in fact be making more likely with server admins pulling such stupid shenanigans, the I/O could easily still get blocked, even though the server had "written to all sectors" previously.
As for a server suddenly trying to use all the thin-provisioned disk space, there's a fix for that too: quotas.
Or, restricting the rate at which a server can consume additional thin-provisioned storage before setting off alarm bells and throttling the server's I/O limit down to forcibly reduce the rate of additional consumption.
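The quota-plus-rate-limit idea can be sketched as a simple policy check. This is purely illustrative: no real array exposes exactly this interface, and the function names, sample format, and thresholds are all assumptions.

```python
def growth_rate_gb_per_hour(samples):
    """Average consumption rate from (unix_seconds, used_gb) samples, oldest first."""
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    hours = (t1 - t0) / 3600
    if hours <= 0:
        raise ValueError("need at least two samples spanning some time")
    return (used1 - used0) / hours


def action_for(samples, quota_gb, max_gb_per_hour):
    """Decide what to do with a server's thin LUN: 'deny', 'throttle', or 'ok'."""
    used_now = samples[-1][1]
    if used_now >= quota_gb:
        return "deny"      # hard quota reached: no further allocation
    if growth_rate_gb_per_hour(samples) > max_gb_per_hour:
        return "throttle"  # growing too fast: raise an alarm and slow its I/O
    return "ok"


if __name__ == "__main__":
    # Hypothetical server that wrote 400 GB in two hours (the "fill my LUN" script).
    greedy = [(0, 10.0), (7200, 410.0)]
    print(action_for(greedy, quota_gb=500, max_gb_per_hour=50))
```

A check like this would catch the deliberate fill-the-LUN script long before it drains the pool.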

Do the math by Anonymous Coward · 2010-07-28 06:02 · Score: 1, Insightful

70% used space with the 100TB mentioned in the article, leaving us with 70TB.

Think of how much porn 70TB is!

Re:Do the math by Score+Whore · 2010-07-28 07:03 · Score: 1

Think of how much porn 70TB is!
hottiehost$ find . -type f | wc -l
8433275
hottiehost$ bc
scale=3
8433275*(70/83)
7109250.825
It's just over seven million images...

Looking at the numbers.... by paulsnx2 · 2010-07-28 06:02 · Score: 1

So ... $1 million / 100 TB ==> $10,000 per TB.

A 1 TB drive is 60-100 dollars.
The kWh required to run a 60 watt drive 24/7 = 60/1000 kW x 24 hours x 365.25 days = 526 kWh per year.
At $0.12 per kWh, that's $63.12 per year.
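The same arithmetic as a couple of functions, so the wattage and the electricity rate can be swapped out. The 60 W and 12 cents per kWh defaults come from this post; the function names are just illustrative.

```python
HOURS_PER_YEAR = 24 * 365.25


def annual_kwh(watts):
    """Energy used by a device drawing `watts` continuously for a year."""
    return watts / 1000 * HOURS_PER_YEAR


def annual_power_cost(watts, dollars_per_kwh=0.12):
    """Yearly electricity cost for a constant load at a flat rate."""
    return annual_kwh(watts) * dollars_per_kwh


if __name__ == "__main__":
    for watts in (60, 8):  # the 8 W case is a lower, per-drive figure to compare against
        print(f"{watts} W: {annual_kwh(watts):.0f} kWh, ${annual_power_cost(watts):.2f} per year")
```

Either way, the power bill per drive is small next to the $10,000 per TB figure.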

Even if we double or triple the hardware costs, they will only make up a few percent of the 10 grand per TB cited here.

The labor to maintain 100 or 200 or 400 drives is going to be relatively constant. In fact, with a little more reasonable monitoring software (just reporting drive failures in a raid system, so the labor just has to pull bad drives and replace with good drives), I don't think the capacity of a data center is all that related to the labor costs.

End result, it is just cheaper and easier to throw hardware at problems to reduce labor costs than to pay for expensive software to monitor capacity and be more efficient in the use of capacity.

Re:Looking at the numbers.... by Chimel31 · 2010-07-29 10:44 · Score: 1

60 watts? A 2TB performance SATA III drive (not the "green" low power drives) is about 8W (6 on idle, 9 on load).
So that's 8 W x 24 x 365 = about 70 kWh a year, more like $8.
http://www.seagate.com/www/en-us/products/internal-storage/barracuda-xt-kit/#tTabContentSpecifications
Anyway, the cost of the hardware is almost zero compared to the other costs, except if you use SAS or SSD drives.

Mini ITX? by Midnight+Thunder · 2010-07-28 06:03 · Score: 1

Instead of a medium number of large systems, I wonder whether it would make more sense to have a larger number of mini-itx type units that could be:
- easily replaced
- put in stand-by when no access - smart load balancer would decide when to wake up sleeping units.
- simplified cooling?

It would also be nice for a universal back-plane design to support plugging in boards from any company, with minimal or zero cabling.

--
Jumpstart the tartan drive.

Fire Extinguishers by Ukab+the+Great · 2010-07-28 06:03 · Score: 1

Billions of dollars are also wasted every year in the manufacturing and transporting of fire extinguishers, 99% of which will probably never be used.

the truth doesn't take up much space by Anonymous Coward · 2010-07-28 06:04 · Score: 0

explosion(s) in the straits of hormuz. anybody who's read either the bible, or playboy should be able to cipher out what that means. there was even a sci-fi book/movie about it.

No... by rickb928 · 2010-07-28 06:11 · Score: 2, Interesting

"It's a bit of a paradox. Users don't seem to be willing to spend the money to see what they have,"

I think he meant users don't seem willing to spend the money to MANAGE what they have.

As many have pointed out, you need 'excess' capacity to avoid failing for unusual or unexpected processes. How often has the DBA team asked for a copy of a database? And when that file is a substantial portion of storage on a volume, woopsie, out of space messages can happen. Of course they should be copying it to a non-production volume. Mistakes happen. Having a spare TB of space means never having to say 'you're sorry'.

Aside from the obvious problems of keeping volumes too low on free space, there was a time when you could recover deleted files. Too little free space pretty much guarantees you won't be recovering deleted files much older than, sometimes, 15 minutes ago. In the old days, NetWare servers would let you recover anything not overwritten. I saved users from file deletions over the span of YEARS, in those halcyon days when storage became relatively cheap and a small office server could never fill a 120MB array. Those days are gone, but without free space, recovery is futile, even over the span of a week. Windows servers, of course, present greater challenges.

'Online' backups rely on delta files or some other scheme that involves either duplicating a file so it can be written intact, or saving changes so they can be rolled in after the process. More free space here means you actually get the backup to complete. Not wasted space at all.

Many of the SANs I've had the pleasure of working with had largely poor management implementations. Trying to manage dynamic volumes and overcommits had to wait for Microsoft to get its act together. Linux had a small lead in this, but unless your SAN lets you do automatic allocation and volume expansion, you might as well instrument the server and use SNMP to warn you of volume space, and be prepared for the nighttime alerts. Does your SAN allow you to let it increase volume space based on low free space, and then reclaim it later when the free space exceeds threshold? Do you get this for less than six figures? Seven? I don't know, I've been blessed with not having to do SAN management for about 5 years. I sleep much better, thanks.
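If you do end up instrumenting the server yourself, the check can be as small as this. It uses Python's standard `shutil.disk_usage` rather than SNMP, and the 10% threshold and function names are arbitrary choices for the example.

```python
import shutil


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def is_low(free_bytes, total_bytes, min_free_fraction=0.10):
    """True when free space has dropped below the alert threshold."""
    return free_bytes / total_bytes < min_free_fraction


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    if is_low(usage.free, usage.total):
        print("WARNING: volume is low on free space")  # hook the pager or trap here
    else:
        print(f"{free_fraction('.'):.0%} free, nothing to do")
```

Run it from cron or a monitoring agent; it only warns, so growing the volume is still on you (or on the SAN, if it can).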

Free space is precisely like empty parking lots. When business picks up, the lot is full. This is good.

--
deleting the extra space after periods so i can stay relevant, yeah.

Re:No... by slinches · 2010-07-28 06:57 · Score: 1

If

Having a spare TB of space means never having to say 'you're sorry'.
and
"Love means never having to say you're sorry"
Then
Love means having a spare TB of space?

--
Knowledge Brings Fear
Re:No... by Wolfraider · 2010-07-28 12:55 · Score: 0

Hitachi actually has this. You can create a large thin provisioned volume on a dynamic pool and Hitachi can grow and shrink it as needed. It's true that the OS will only see the large volume but it's a start in the right direction to reclaim extra space when it's not used.
Also, we just purchased a 50TB Hitachi SAN with Fibrechannel and iSCSI for only $110,000. Enterprise can be had for less but that is educational cost also.
Re:No... by Chimel31 · 2010-07-29 10:54 · Score: 1

You can't rely on the OS and extra storage space to fully restore deleted files, the OS can reallocate that space at any time.
It's just pure luck each time if you can, although I agree extra space increases your chances.
You should rely on your backups and maybe custom scripts to trap all file delete requests at low level instead, but I don't even know if that's feasible. That would be totally rad!

How much does under capacity cost? by houghi · 2010-07-28 06:18 · Score: 1

What is the cost if you have a 1% shortage on your capacity? I am sure it will be more than what you pay for over capacity.

--
Don't fight for your country, if your country does not fight for you.

Re:How much does under capacity cost? by Chimel31 · 2010-07-29 10:58 · Score: 1

The article's focus is more about datacenters reducing their unused space from say 40% to 20% for petabytes of storage.
You're dead if you let unused space down to 5%, let alone 1%. Even home users get red flags when their disks go below 10% of free space.

HD Size by Anonymous Coward · 2010-07-28 06:29 · Score: 0

What trout. I suspect a large amount of this 'wastage' is due to the fact that the smallest HD's available are into the 100's of GB *.

Many dedicated server users do not waste the space but simply never needed it in the first place. Applications that need a dedicated server do not necessarily need the storage that comes with it.

* Currently 160GB on an entry level Dell Server

Re:HD Size by bat21 · 2010-07-29 05:23 · Score: 1

Perhaps for cheap sata drives. SAS drives are routinely available in 73GB capacities.

Re:100 TB for $1,000,000? No way! by gencha · 2010-07-28 06:35 · Score: 1

You forgot to pay the executive.

Turn it up to 11 by Tisha_AH · 2010-07-28 06:38 · Score: 1

Unlike in the movie "This is Spinal Tap" there is not an 11 on the volume control for storage capacity in a data center. We will not see proud proclamations from boards of directors "today we are running our data storage at 115% of capacity!"

Having been in the predicament many times of frantically trying to ration out disk storage space for some critical application at 3 AM Sunday morning I think that running data centers at 80-90% is being conservative and may save your ass the next time you cannot get into your data center due to some sort of natural disaster like a hurricane (remember the data center in New Orleans a few years ago?)

Storage space does cost money, when we are looking at terabytes (petabytes anyone?) of storage there does need to be some cost factor calculations. In the telco world we do a similar exercise with Erlang calculations and blocking probability for data circuits. I would rather that the cut-off point between enough or too much storage capacity be made by well informed engineers rather than some clueless MBA looking for a feather in their hat.
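
For readers who have not met the telco exercise mentioned above: the classic Erlang B formula gives the probability that a request is blocked when a given amount of traffic (in erlangs) is offered to a fixed number of circuits. The sketch below uses the standard recurrence B(0) = 1, B(k) = A·B(k-1) / (k + A·B(k-1)). Applying it to storage headroom is only the analogy drawn in this post; the function names are made up for illustration.

```python
def erlang_b(offered_erlangs, circuits):
    """Blocking probability for `offered_erlangs` of traffic on `circuits` lines."""
    if offered_erlangs < 0 or circuits < 0:
        raise ValueError("traffic and circuit count must be non-negative")
    blocking = 1.0  # with zero circuits every request is blocked
    for k in range(1, circuits + 1):
        blocking = (offered_erlangs * blocking) / (k + offered_erlangs * blocking)
    return blocking


def circuits_needed(offered_erlangs, max_blocking):
    """Smallest circuit count whose blocking probability is <= max_blocking."""
    if max_blocking <= 0:
        raise ValueError("max_blocking must be positive")
    n = 0
    while erlang_b(offered_erlangs, n) > max_blocking:
        n += 1
    return n
```

The point of the analogy is the shape of the curve: the last few percent of blocking probability cost disproportionately more circuits, and an engineer picks the cut-off from a target rather than from a gut feeling.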

--
Tisha Hayes

Re:100 TB for $1,000,000? No way! by epiphani · 2010-07-28 06:38 · Score: 1

100TB for a million dollars is about right when you start looking at enterprise storage solutions, such as Netapp or EMC.

--
.

takeaswag by Anonymous Coward · 2010-07-28 06:48 · Score: 0

Alright, I'm not running a bunch of petabytes in a big datacenter right now, but I've been doing this for a really really long time. Hasn't the rule of thumb always been to have a MINIMUM of 25% capacity free? Everyone has always been much more comfortable with 50% free. This old school rule of thumb applies at any scale, megabyte to petabyte, doesn't it?

Re:100 TB for $1,000,000? No way! by Anonymous Coward · 2010-07-28 06:59 · Score: 0

$130 is for a TB of fast SATA disks. I just had to price out a 6TB Symmetrix VMax SAN w/ EMC (4TB 15K RPM Fibre Channel, 2TB SSD), and the price was $340,000

Mind you, the above cost includes the storage controllers, PDU's, rack cabinets, etc. Additional TB's will run us in the neighborhood of $8K-12K (depending on whether we go 15K RPM or SSD)

Spending a million on 100TB of storage for the enterprise is very easily doable, even if you go with slower 10K RPM Fibre Channel disks.

IO'/second count matters, too by natoochtoniket · 2010-07-28 07:03 · Score: 4, Insightful

There are two numbers that matter for storage systems. One is the raw number of gigabytes that can be stored. The other is the number of IO's that can be performed in a second. The first limits the size of the collected data. The second limits how many new transactions can be processed per time period. That, in turn, determines how many pennies we can accept from our customers during a busy hour.

We size our systems to hit performance targets that are set in terms of transactions per second, not just gigabytes. Using round numbers, if a disk model can do 1000 IO/second, and we need 10,000 IO/second for a particular table, then we need at least 10 disks for that table (not counting mirrors). We often use the smallest disks we can buy, because we don't need the extra gigs. If the data volume doesn't ever fill up the gigabyte capacity of the disks, that's ok. Whenever the system uses all of the available IO's-per-second, we think about adding more disks.

Occasionally a new SA doesn't understand this, sees a bunch of "empty" space in a subsystem, and configures something to use that space. When that happens, we then have to scramble, as the problem is not usually discovered until the next busy day.
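
The sizing rule described above amounts to taking whichever is larger, the disk count needed for IOPS or the disk count needed for capacity. Here is a minimal sketch using the round numbers from the post (1000 IO/second per disk, 10,000 IO/second for the table). The 500 GB table size and 146 GB disk size are made-up illustrative values, and mirrors are not counted.

```python
import math


def disks_needed(required_iops, required_gb, disk_iops, disk_gb):
    """Disks needed to satisfy both the IOPS target and the capacity target."""
    for_iops = math.ceil(required_iops / disk_iops)
    for_capacity = math.ceil(required_gb / disk_gb)
    return max(for_iops, for_capacity)


def capacity_utilization(required_gb, disk_count, disk_gb):
    """Fraction of the raw gigabytes that the data actually occupies."""
    return required_gb / (disk_count * disk_gb)


# Round numbers from the post; table size and disk size are assumptions.
count = disks_needed(required_iops=10_000, required_gb=500,
                     disk_iops=1_000, disk_gb=146)
fill = capacity_utilization(500, count, 146)
print(f"{count} disks, {fill:.0%} of the gigabytes in use")
```

With these inputs the IOPS target asks for ten disks while capacity alone would need four, so the array ends up about a third full by design. That "empty" two thirds is exactly what a new SA is tempted to fill.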

Re:IO'/second count matters, too by hibiki_r · 2010-07-28 07:41 · Score: 1

And that's not even the whole picture: When dealing with databases, not all IO operations are equal. Reading a million records on a sequential scan in a certain part of the disk is different than reading them on a different part of the disk, or reading said records in a random order.
Large amounts of empty space are just the nature of data warehousing, and there's no way to go around that. In some cases, the RAM expense is even higher than the expense on disk, because for cases where a lot of throughput is needed, sometimes you are better off giving up on the disk array and relying on RAM to make your logical IOs faster.
Re:IO'/second count matters, too by Chimel31 · 2010-07-29 11:11 · Score: 1

Hopefully SSD disks will relieve RAM, once the market becomes mature and cheaper at even higher densities.
$3000 for a 1TB SATA II drive is a little bit excessive... ^-^

Re:100 TB for $1,000,000? No way! by spazimodo · 2010-07-28 07:19 · Score: 2, Insightful

I'm not sure if you're trolling or not, but if you're serious did you happen to manage the storage for Microsoft's Sidekick servers?

A couple things wrong with your assumptions:
1) 1TB drives might be great for storing your goat porn collection, but on a server with actual load, how many of those drives do you need to get adequate IOPS? Also exactly 100 of them means no RAID, but that's OK because drives from Newegg never fail so your 100TB of data should be fine.
2) You seem to have left controllers out of your list. Anyone who's ever had a RAID controller start barfing garbage all over a LUN, or take out a second drive after a drive failure will tell you the controller is the really critical bit (and is usually a single point of failure in systems with DAS.)
3) Where's your backup hardware? Where's space for snapshots? Where's space for replication?
4) Ever time a RAID5 rebuild on say a 9 drive LUN with 1TB SATA disks?

Storage is expensive because the data on it has value and making sure that data is available and isn't lost or corrupted costs money. Cheap storage solutions don't end up that way when the drives have to go to OnTrack for recovery and the company's down for a week, or valuable data is lost.

--

Fsck the millennium, we want it now.
Millennium Crisis Line: 0890 900 2000 [calls cost 50p/min]

Re:100 TB for $12,000? Backblaze pod! by Anonymous Coward · 2010-07-28 07:31 · Score: 0

If you don't mind a bit of DIY there is the Backblaze pod:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

It is a 4U disk server that holds 45 disks. They have made the chassis design available from Protocase.

I suppose you could call this a SAN server but it really is just a bunch of cheap storage. As has been commented earlier, in a data center multiple disks are often bought for performance not space. You gain performance by having multiple sets of heads moving at the same time. RAID cache helps this but does not eliminate it.

Not CYA, but optimal cost/benefit by marcosdumay · 2010-07-28 08:03 · Score: 1

Did you factor in how expensive it is to change storage size, and the costs of failing to change it? Also, there is the cost of adding some storage that isn't compatible with the first chunk. The amount you pay for oversized storage normally isn't even on the same order of magnitude as all of those.

--
Rethinking email

anyone pay close attention? by nimbius · 2010-07-28 08:10 · Score: 1

Aptare's latest version of reporting software, StorageConsole 8, costs about $30,000 to $40,000 for small companies, $75,000 to $80,000 for midsize firms, and just over $250,000 for large enterprises. "Our customers can see a return on the price of the software typically in about six months through better utilization rates and preventing the unnecessary purchase of storage," Clark said.

just another industry slashvertisement. nothing to see here that we didn't know about already. please move along.

--
Good people go to bed earlier.

Re:100 TB for $1,000,000? No way! by Domint · 2010-07-28 08:12 · Score: 3, Insightful

Most SAN administrators wouldn't be caught dead using your $130 1TB drives. Rerun your calculations with 15K 450GB SAS drives (around $300 each), and you're spending quite a bit more: 228 drives will give you 100TB, sure, but we'd want some redundancy . . . say RAID 5 (not the best approach for SAN design, but let's keep it simple) which pushes the drive count up to 304 with a total cost of $91,200, just for disks. To get a real, enterprise enclosure (or rather, cluster of enclosures considering the drive count) that offers things like Fibre Channel, 10Gb iSCSI, or InfiniBand uplinks, and features such as SAN to SAN replication, deduplication, and other enterprise-level utilities/features, I'd say you're looking at $500,000 (ballpark guess) just to have something to stick the drives into. We're at ~$600,000 without even taking into account the physical costs of operation, datacenter architecture, or labor costs to maintain such a SAN.

Suddenly, that $1 million isn't so far fetched, eh?

Cost/Delay of "Precise" Study vs. Cost of Hardware by billstewart · 2010-07-28 08:25 · Score: 1

I've got "Precise" in quotes because I'm skeptical that you can ever get really good predictions about the future, or even the present, especially from users. But if you try, it's going to take you a while, and you'll be spending loaded-engineer-salary time and harassed-user time trying to get predictions that the users will tell you are guesses. Meanwhile, you do need to get some disks online, and it's going to take you a while to accumulate tracking data. I'm in that kind of situation now - until there's enough disk and users on the system to get a really good model of users, we won't really know, so we're aiming high.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:sounds like the consultants are having a slow y by evilviper · 2010-07-28 08:29 · Score: 1

it's almost like the DR consultants who say we need to spend a fortune on a DR site in case a nuclear bomb goes off and we need to run the business from 100 miles away.

Flood, earthquake, hurricane (yes, possible even in New York), sink hole, etc.

Are you really going to go primeval when any one of those things happens?

First thing, of course you're going to find out if your family is fine. Assuming so, then what? Not only has their home been destroyed, but your job is gone too, so you'll now be dependent on insurance (notoriously unwilling or unable to pay after disasters) and handouts.

Not that you should be spending billions on off-site data storage and redundant systems, but a large company being completely unable to survive the loss of a single building/office is quite short-sighted, even if it happens to cost some money up-front.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

SAN vs. NAS disk performance, operations by billstewart · 2010-07-28 08:39 · Score: 1

Unfortunately, my company is sufficiently large and bureaucratic that equipment standards are often made by people who don't know the applications :-) The bureaucrats like SAN arrays because they're blazingly fast, and because they're easy to administer, back up, plan for storage growth, etc. And $8000/TB is really just fine if your idea of "huge" storage is a TB or two.

I've got an application that needs to do a bit of fast-IOPS logging (so the overpriced SAS drives and SAN array are fine), but needs lots of bulk storage that doesn't need blazing fast access, but does need to be on disks as opposed to tape. I probably need 10TB to start with, growing to 20+ as we get more customers.

It's obviously a job for NAS - Network Attached Storage, the kind of stuff Netapp used to sell (presumably still does), which wraps a certain amount of framework around the big cheap drives, so you still get the operational benefits and manageability, at a cost that's maybe twice what the cheap raw drives cost. Lacking that, I'd been thinking about stuffing a server full of 2TB drives, but our IT folks not only don't like that because they don't have the manageability and easy upgrades, but they only allow little 300GB SAS drives (not even the new 600GB), and only support 2.5" drives, not 3.5", because that's more cost-effective or something. Sigh.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:sounds like the consultants are having a slow y by asc99c · 2010-07-28 08:41 · Score: 1

> in case a nuclear bomb goes off

Or even more far-fetched, someone brings in a fan heater from home, forgets to switch it off one evening, some paper blows into the elements and sets on fire, and it burns down the building.

Keeping an off-site backup is not a ridiculous idea in itself. Could the business survive if the office burned down and all servers and data were lost? Maybe if employees are allowed to take data home, most stuff could be pieced back together, but even then it would be a substantial amount of work. But as with TFA, it's not something to spend a massive amount of money on. Where I work, all projects should have a daily backup to a central server (just simple batch script / shell script / version control system), and that has an off-site backup, which as far as I'm aware just means one of the admins swaps the hotswap bays and takes the discs home on a weekly basis. Total cost is about 5 minutes a week to swap the discs; the hotswap hardware itself, plus a few extra discs, is well under £1000. Everything else is no different from what we'd need to do for regular backups anyway.

Even for my own data, e.g. holiday photos, every so often I make sure to put it all on a removable hard disc and copy it onto my work PC. I'd certainly consider it worthy of a disaster recovery solution, given that it's so very easy to do.

Re:100 TB for $1,000,000? No way! by djdanlib · 2010-07-28 08:44 · Score: 1

Nice job! I applaud this effort to put wild claims into perspective.

Corporate accounting == magic, for some perverse value of magic. Every organization that touches that datacenter wants to claim some cost or another on their balance sheet, so they can be seen spending money in such an unarguably useful (and permanently reoccurring) way and ensure they get more next year. The money should only get spent once, but money circulates around from one department to another and might actually get "spent" several times before it moves to some other part of the corporate bloodstream. Meanwhile, the datacenter probably has no budget of its own.

This happens more often than not in corporations large enough to set up a 100 TB farm. I'd love to work at that fictional company, because $75k is a pretty nice salary... more than I make right now doing similar work! I don't have that sort of farm where I work, though.

Your biggest datacenter costs will probably come from networking hardware, software licensing and support contracts. Database servers are particularly odious in this respect. $25k per socket for MS SQL Server, for one - if you have two servers with dual CPUs, that's $100k right away just for the database software. Probably more for Oracle. Usually it's some proprietary software dependency that locks you into a specific DB vendor. Then there's the OS licensing. You can spend $2k or more on Windows Server, and adding seats for Terminal Services gets expensive fast. What if you want Exchange, which is a big reason for having lots of spindles? Blackberry? Nice Cisco routers? Telecom equipment and vendor support from the telco? Ka-ching!!! Then there's the applications your datacenter supports. Proprietary software from other vendors will have large support contract bills which can rise to the tune of $1M or more. When you have vendor lock-in because all the organizations who touch your datacenter use a certain brand of product, you're kinda stuck with that bill.

So, if you're running an all-Linux, all-open-source datacenter, there's no licensing/support fees. BUT, good luck selling that idea! Management wants a finger to point and someone at the other end of that finger to say "yes sir, we're sorry, we're fixing it right now and it will be done in the terms of our SLA", and the datacenter employees want that to be someone other than them!

Don't forget about performance... by jonxor · 2010-07-28 08:55 · Score: 1

I would look more to the issue of drive performance as the main cause of this. What happens to a spinning platter hard drive, when it has to read data from half-way in the disc, rather than from the outside? Performance drops anywhere from 45 to 70% once you have your hard drives filled half-way. Naturally, in order to keep high performance on real-time critical data, you have to get higher density platters, and only use the outside of them. Unfortunate as it is, I think this is more of a cause for un-used storage space than under-prescribing.

Re:Don't forget about performance... by ryty · 2010-07-28 11:23 · Score: 1

Agreed.

--
if you were me, you'd think the same way

Storage and Planning - What a Concept! by jrouleau · 2010-07-28 08:56 · Score: 1

I have read the article (gasp!) and a lot of the comments here that get into the minutiae of I/Os, reads/writes, CYA, etc. Bottom line: if a company is going to invest in any storage solution and no one has a baseline that allows for properly sizing it, it all becomes a crapshoot. What I mean by planning is: knowing what you currently have (total space, used and free); having a growth curve or trend that shows how much actual DATA growth you experience year over year (40%, 50%, etc.); knowing what the requirements of existing systems are (since MOST new operating systems and programs do not significantly expand their initial disk usage requirements over time); and putting that together with the thought of an actual DR (ability to recover). If a storage solution is properly planned, you can estimate the cost of what you need for as long as you believe you can live with said equipment (e.g. a solution that needs to last 8 years requires x space with x backup solution). If not, you deserve to be fired. Management and administration at that point become less relevant, because you know you're prepared for everything you tried to take into account before the solution even showed up. In addition, most companies have a pretty well-defined structure for how and where the data lives, and most good netadmins are watching it like a hawk anyway (with scripts or some existing monitoring solution). Lastly, the article does not account for the fact that some free or "wasted" space is REQUIRED: almost all OSes like to have free space available for swap and/or other hidden system functions, and when you do not have adequate free (aka wasted) space, those programs and OSes tend to run like crap (no matter the hardware).
So everything I can gather from this article, which does not take into account that free space is usually required (if you disagree, get your free space below 20% on Windows and watch the performance go downhill as that number moves lower), is really a "move along, nothing to see here" except someone trying to sell me an overpriced solution I don't need or want at this time. Nothing more than some FUD.
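
To make the planning step concrete, here is a rough sketch of turning a baseline plus a growth trend into a capacity figure. It assumes simple compound growth and a fixed free-space reserve; the 40% growth rate, the 8-year lifetime and the 20% reserve echo numbers mentioned in this post, while the 10 TB starting point and the function name are made up for illustration.

```python
def capacity_required(used_tb, annual_growth, years, min_free=0.0):
    """Raw capacity needed so that, after `years` of compound growth,
    at least `min_free` of the space is still free."""
    if not 0 <= min_free < 1:
        raise ValueError("min_free must be in [0, 1)")
    projected_used = used_tb * (1 + annual_growth) ** years
    return projected_used / (1 - min_free)


# Hypothetical baseline: 10 TB used today, 40% growth per year, 8-year lifetime.
data_only = capacity_required(10, 0.40, 8)
with_reserve = capacity_required(10, 0.40, 8, min_free=0.20)
print(f"data alone: {data_only:.1f} TB, with 20% reserve: {with_reserve:.1f} TB")
```

Compounding does most of the damage here: eight years at 40% multiplies the data by almost fifteen, which is why an array sized for end-of-life looks mostly empty on day one.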

Re:Storage and Planning - What a Concept! by Chimel31 · 2010-07-29 11:19 · Score: 1

That's not how I read the article. It describes datacenters with 40-60% of unused space, and I think the goal is to reduce this to something like 20%.
If you manage 10PB at 50% usage, you can reduce it to roughly 6PB with 20% unused, thus saving about $40M (at $1M/100TB.)
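
The estimate above is easy to check. This sketch assumes the amount of stored data stays fixed and only the unused share shrinks, and it takes the thread's $1M per 100TB (that is, $10,000 per TB) as the price. The function names are made up for illustration.

```python
def capacity_for(used_tb, unused_fraction):
    """Total capacity that leaves `unused_fraction` of it free."""
    return used_tb / (1 - unused_fraction)


def savings_usd(total_tb, usage, target_unused, cost_per_tb):
    """Money saved by shrinking the array until only target_unused is free."""
    used_tb = total_tb * usage
    new_total_tb = capacity_for(used_tb, target_unused)
    return (total_tb - new_total_tb) * cost_per_tb


# 10 PB = 10,000 TB at 50% usage, shrunk to 20% unused, at $10,000 per TB.
print(f"new size: {capacity_for(5_000, 0.20):,.0f} TB")
print(f"saving:   ${savings_usd(10_000, 0.50, 0.20, 10_000):,.0f}")
```

Worked exactly, the array shrinks to 6,250 TB and the saving is $37.5M, so the 6PB and $40M figures are reasonable roundings.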

Re:100 TB for $1,000,000? No way! by Anarke_Incarnate · 2010-07-28 09:01 · Score: 2, Insightful

You lost something along the way. When you are doing RAID 5 on an enterprise array, you are likely using 5+1 sets. Your 304 drives does not take into account losing 2 drive capacity every 6 drives. You can get away with global hot sparing, but that doesn't cover your ass as much. You would need 342 drives.
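
The drive counts in this sub-thread can be reproduced with a little arithmetic. The sketch assumes 1 TB = 1024 GB (the reading under which 100 TB of 450 GB drives comes out at 228) and it infers the group layouts from the numbers: 3 data + 1 parity matches the parent's 304, and 4 usable drives out of every 6 matches the 342 here. Neither poster spelled those layouts out, so treat them as assumptions.

```python
import math


def data_drives(target_tb, drive_gb, gb_per_tb=1024):
    """Drives needed to hold target_tb of data with no redundancy."""
    return math.ceil(target_tb * gb_per_tb / drive_gb)


def total_drives(data_count, data_per_group, overhead_per_group):
    """Total drives once every group carries its parity/spare overhead."""
    groups = math.ceil(data_count / data_per_group)
    return groups * (data_per_group + overhead_per_group)


base = data_drives(100, 450)         # drives for 100 TB of 450 GB disks
three_plus_one = total_drives(base, 3, 1)
four_of_six = total_drives(base, 4, 2)
disk_cost = three_plus_one * 300     # $300 per drive, as in the parent post
print(base, three_plus_one, four_of_six, disk_cost)
```

For comparison, a plain 5+1 layout with no spares gives `total_drives(228, 5, 1)`, which is 276 drives, so the 342 figure only follows if two drives in every six are lost to parity or sparing.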

Man hours more expensive than hardware. by MikeFM · 2010-07-28 09:06 · Score: 2, Insightful

We do use thin provisioning, and virtualization in general, but I agree that there is benefit to keeping utilization low. We try to keep more space than we could possibly need both because it can sometimes surprise you by growing quickly and because the drives are faster if the data is spread across multiple drives. Also SSD drives sometimes live longer if not fully utilized, because they can distribute the wear and tear, so we usually leave 20% unformatted.

Downtime and slow systems are much more expensive than wasted drive space.

--
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.

Re:100 TB for $1,000,000? No way! by Anonymous Coward · 2010-07-28 09:24 · Score: 0

Working for a storage vendor, I would say even your numbers are low. Fat chance getting a 1TB drive for less than $800! Why so much? Beats me. Maybe it's the really nice sleds they are mounted in? They are high grade plastic after all...

And storage vendors don't always tell the truth by swb · 2010-07-28 09:32 · Score: 1

Or at least the whole truth. Quite often you find that replicating your 8 TB volume really requires you to buy a SAN with 16 TB capacity on one end and 16 TB on the other, with the "unused" space going to replication overhead or whatever fancy SAN tricks you want to play.

So while you wanted 16 TB of capacity, you actually buy 32+, much of which appears to be uncommitted.

Mismanagement or Ignorance by rnturn · 2010-07-28 09:49 · Score: 1

I can't be sure which one. On second thought, make that second one ``stupidity''. I still can't decide which one's really at work.

I attend a daily conference call where I hear case after case where people are waiting for SAN space to be reclaimed so that it can be reassigned to other systems. People are either being told their projects cannot proceed because there's not enough disk space or other projects have been held up while the space they've been allocated is scaled back to allow others to work.

I'm not sure what the storage team or (as I suspect) the clueless architects are doing when someone implementing a new application asks for space on the SAN but the procedure seems to be ``whatever the project says it needs, multiply that by ten (or more)''. During one conference call I heard that the project had been placed on hold while some of the disk space they'd been allocated was reclaimed for use on other projects. When I asked how much they were having to give up, I was told that to migrate an existing database from an environment where they were using less than 1TB of disk space they had been given 20TB on the new hardware environment. (Heh... Did I say some of the disk space? I should have said most.)

--
CUR ALLOC 20195.....5804M

Re:Mismanagement or Ignorance by Skal+Tura · 2010-07-28 10:15 · Score: 1

The stupid mistake starts with the idea of using DEDICATED RESOURCES on a SAN. One of a SAN's few perks is central storage that minimizes waste and therefore maximizes return on the investment.
Then again, all the SAN implementations I've seen in production so far have been idiotic, moronic and stupid (yes, I know ....)
Some of the implementations I've seen have only caused headaches, downtime, poor performance etc. at an insanely huge investment. Tell me again, what's the benefit of having a SAN when it makes the end result worse and harder to maintain? Redundancy? Well, it's actually less redundant, if you can live with the minimum of 2 HDDs per server.

--
Pulsed Media Seedboxes

This is a good thing! by ixl · 2010-07-28 09:55 · Score: 1

One of the subtle benefits of the computer revolution is that it gives society the ability to be wasteful. Desktop computers that often aren't at 100% cpu utilization, and (local) networks that rarely see peak usage are also good signs, for the same reason. Hard drive capacities obviously aren't quite there yet, but they're getting closer. This might be yet another sign of the singularity.

100tb costing 1$ million? Yeah right... by Skal+Tura · 2010-07-28 10:11 · Score: 1

We happily manage about 75-90tb total storage across our servers (really, i've forgotten how much exactly is total across all different server types), with less than 1$ million. Way less than 1$ million.

Then again, when you go the expensive route (say EMC), it's certainly expensive. But if you keep your wits about you and acquire only the features you require, storage doesn't need to be expensive. There are plenty of ways of running servers with storage costing almost as little as consumer storage, when some thought has been put into cost savings, while not sacrificing redundancy. That being said, expensive storage systems DO have their perks, but only at the enterprise level.

--
Pulsed Media Seedboxes

I wish I had that problem by dave562 · 2010-07-28 12:17 · Score: 1

We're growing so fast that we can barely keep up with the demand. Maybe I can run a few cross connects into my neighbor's cage and borrow some of their unused space?

Re:100 TB for $1,000,000? No way! by Score+Whore · 2010-07-28 18:04 · Score: 1

A hundred such disk drives will set you back $13,000.

You calculated a lot of costs there, but you forgot the most important one:

The cost of your business imploding when a critical app falls over because your one hundred, 7200 RPM SATA spindles will only deliver, on a good day with the wind in their sails, 6,000 IOPS at less than 15 ms per op. Or a bad day when you experience multiple drive failures and your shoes are all squishy due to the piss that ran down your leg because you just noticed that your IOPS just dropped to 1,000/sec, your latency just went up to 100 ms and it's going to take over twenty-four hours to complete just one rebuild.

In the data center a few floors below my desk, we have a dozen database servers that average 6,000 IOPS all day long with peak times requiring 10,000 IOPS for several hours every day. Per server.

That's why it's not a hundred drives. It's a couple thousand. And they're not 1TB+ 7200 RPM SATA. They're 146 GB/300GB/400GB 15,000 RPM FC and SAS. And there's over a hundred GB of NVRAM. And everything is connected via multiple paths through two or more separate fabrics (i.e. at least two of these. Each of which will run you over $100k just to get on the floor. Plus annual support and maintenance costs.)

Regardless, it's not just moving bytes into and out of a hard drive. It's being able to replace failed components... as in every component, things like controllers, front end controllers & ports, back end controllers & ports, batteries, drive back planes, and hard drives. Being able to add physical capacity, new drives, ports, controllers, RAM, etc. It's having off site replication. Having defined RPO & RTO. Off host backups. All without service interruption or degradation.

The good news is that you might spend $1,000,000 on keeping all that going, but you won't be paying out tens of thousands of dollars a minute in salary and hundreds of thousands of dollars in lost business when the home brew solution explodes. And if, heaven forbid, the worst happens and your data center happens to burn to the ground, your DR site will be able to come online in ten minutes and every committed transaction will be sitting there on the drives ready for your apps, customers and employees.

Data center shared storage? by hab136 · 2010-07-28 20:23 · Score: 1

Data centers already provide power and network - why don't they also provide shared storage?

The data center can buy several big mean SANs, and then provision storage sets to customers. Customers then only have to pay for what they use, can add space without installing new hardware, and get speed and reliability.

Re:100 TB for $1,000,000? No way! by Anonymous Coward · 2010-07-28 23:28 · Score: 0

What SAN vendor is selling 15K 450GB SAS drives for $300? We pay roughly $2,000 per drive from EMC. We need to switch to whoever you guys are using!

Money Cycle not = Service Cycle by Anonymous Coward · 2010-07-29 03:58 · Score: 0

"Modern" monied business is financially and "book" driven. Real service, real results are happenstance, a fortuitous coincidence. Only the serfs care about them as if they were real. The rest is loans, leverages, collaterals - and bonuses, lots of bonuses. When something goes "poof!", it's on to the next one. It's a 3rd world inflationary spiral artificial economy thing.

Re:100 TB for $1,000,000? No way! by bat21 · 2010-07-29 04:21 · Score: 1

What about controllers, trays, etc? What are these drives going into, servers with software raid? You can't just buy a bunch of 4U servers, stuff them full of drives and share them through iSCSI. That's how you run a porn site, not a data center. How do you handle drive failures, hot spares, etc?

Re:100 TB for $1,000,000? No way! by Anonymous Coward · 2010-07-29 05:48 · Score: 0

What about the services from the vendor, licenses for enabling storage features like replication (usually charged per TB of data replicated), licenses for the management software, etc, etc, etc... I would add $100,000 to your ballpark guess :-)

Re:100 TB for $1,000,000? No way! by Chimel31 · 2010-07-29 10:08 · Score: 1

I used that backblaze analogy too, but after yours, @ano:
http://hardware.slashdot.org/comments.pl?sid=1735418&cid=33075574

It comes to $45K/90TB tops, in just one 4U enclosure, management overhead and real estate included. ($50K/100TB)
To answer @spazimodo, the Taiwan-built Barracuda XT disks have low failure rates compared to the Seagate disks built in China, and you can combine them in RAID 6 or 60 with Adaptec controllers for $1-2K more. Why would you ever use RAID 5 anyway? That's insane. We're talking raw storage here, so backup hardware, snapshots, and replication are exactly the same as in the $1M/100TB raw space estimate. If you want real usable space with all costs included, get 2 such servers, let's say at $50K each with RAID 6 controllers and a fiber optic cluster connection. That's 74TB of usable space, plus another 74TB on the backup clustered server, or $135K/100TB usable space.
Even a backblaze enclosure composed exclusively of SSD disks for performance would cost only $72K per enclosure, hardware and 1st year costs included, about $640K/100TB. But I assume this level of performance is far above the $1M figure, which is presumably for SAS disks and which also probably spreads the hardware cost over 3 years, so it's more like $0.5M for pure SSD 100TB. It would really help if the article detailed that $1M cost.

SSD costs assuming 256GB disks @ $700 each. 1 backblaze 4U server would provide only 11TB of SSD raw space, 9TB of usable RAID 6 space.
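
The per-100TB figures above are just linear scaling, so a short Python sketch makes them easy to re-run with other prices. The inputs are the numbers quoted in this comment; the function name `cost_per_100tb` is made up for illustration, and the SSD line uses the rounded 11TB raw figure, so it comes out a little above the "about $640K" quoted.

```python
def cost_per_100tb(total_cost_usd, capacity_tb):
    """Scale a (cost, capacity) pair linearly to dollars per 100 TB."""
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return total_cost_usd * 100.0 / capacity_tb


# One 4U enclosure of 2TB disks: $45K for 90TB raw.
raw_hdd = cost_per_100tb(45_000, 90)

# Two $50K servers (primary + backup cluster node), 74TB usable after RAID 6.
usable_mirrored = cost_per_100tb(2 * 50_000, 74)

# All-SSD enclosure: $72K for roughly 11TB raw (256GB disks, rounded).
ssd_raw = cost_per_100tb(72_000, 11)

for label, value in [
    ("HDD raw", raw_hdd),
    ("HDD usable, mirrored", usable_mirrored),
    ("SSD raw", ssd_raw),
]:
    print(f"{label}: ${value / 1000:.0f}K per 100TB")
```

None of this includes vendor licenses or support contracts, which other comments point out can dominate the bill.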

Magic $1M formula by Chimel31 · 2010-07-29 11:40 · Score: 1

It's quite bad that the $1M / 100TB cost is not detailed in any way in this article, which makes any attempt at comparison futile.
Thankfully several commenters provided their input, some even mentioning billing $5M per 100TB to their lawyer customers.

My guess is that this $1M partially represents the costs of the hardware/software configurations in existing datacenters, with most of these possibly purchased 2 years ago.
2 years is a pretty long time in IT, with many technological and financial changes, such as SATA III, 2TB disks and SSDs, the latter still being rather immature and expensive.

Given the relatively low cost of the hardware, it does make sense to implement a better aggregation/allocation infrastructure and disconnect the unused servers, but keep them on hold to add extra storage at a moment's notice, or to scavenge spare hard disks. They won't cost a dime if they are plugged in but powered off; it's the online storage and maintenance that costs $1M/100TB (if that much).

Vmware by bsercombe72 · 2010-07-29 22:24 · Score: 1

Fault Tolerance, at least in the market-leading VMware world, requires thick provisioning.

Is it wasted space or CYA space? by Chili-71 · 2010-07-30 01:09 · Score: 1

We have a few 100 TBs of SAN storage in our hospital. Some people would consider having the storage laid out as RAID 10 on each array and the production database software mirrored across 3 arrays to be wasteful. Not really. Recently we had one of the data centers shut down when maintenance was doing a generator test and forgot to put the generator in bypass. The entire data center went dark, but the Epic application continued to run because of the mirroring done between data centers.

It took some time to clean up the mess that was caused, but the database never went down.

As to the third mirror, we use that for nightly backups. The database is frozen in time, the mirror snapped off and moved to a different server. We back up the database to tape, refresh a SUP (support) instance, and run integrity checks on the mirror. Once everything has completed, we bring the mirror (storage) back over to the production server and resync the volumes.

All in all, I'd say we have an HA solution that is pretty much bulletproof. It takes a lot of storage, but it gets the job done. As to waste, very little is actually wasted.
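
As a rough illustration of why a layout like this eats raw capacity: assuming RAID 10 halves each array and the database is mirrored across 3 arrays, every usable TB takes about 6 TB of raw disk. The sketch below is only that arithmetic; the function name and the 100 TB example are hypothetical and not taken from the hospital's actual configuration.

```python
def raw_tb_required(usable_tb, mirror_copies, raid10_factor=2):
    """Raw capacity needed when RAID 10 (2x) is stacked under N mirror copies."""
    if usable_tb < 0 or mirror_copies < 1:
        raise ValueError("usable_tb must be >= 0 and mirror_copies >= 1")
    return usable_tb * raid10_factor * mirror_copies


# Hypothetical example: 100 TB of database, RAID 10 arrays, 3 mirrored copies.
raw = raw_tb_required(100, mirror_copies=3)
print(f"{raw} TB raw for 100 TB usable ({100 / raw:.0%} of raw is unique data)")
```

A capacity report would show that as low utilization, even though every copy is doing a job.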

Re:Is it wasted space or CYA space? by Chimel31 · 2010-07-30 07:13 · Score: 1

Mirror and backup is the way to go, not a waste.
The article is about the waste you would have if each of your 100TB arrays held only 20-50TB and wasn't expected to reach the full 100TB for, say, 5 years.

IOps first Capacity second by binaryspiral · 2010-07-31 00:43 · Score: 1

In the world of storage area networks you must design to support the IO load first, and capacity will typically never be an issue in tier one and two storage.

With cache cards and SSDs becoming cheaper, this rule is changing, but many SANs have wasted space only because they needed more spindles to support the load.
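
A minimal sketch of that sizing rule, assuming a simple model where each disk contributes a fixed number of IOPS and a fixed capacity: the array needs enough spindles to satisfy whichever requirement is larger. The workload and per-disk numbers below are illustrative assumptions, not measurements, and the function name is made up.

```python
import math


def spindles_needed(required_iops, required_tb, iops_per_disk, tb_per_disk):
    """Return (disks to buy, disks needed for IOPS, disks needed for capacity)."""
    for_iops = math.ceil(required_iops / iops_per_disk)
    for_capacity = math.ceil(required_tb / tb_per_disk)
    return max(for_iops, for_capacity), for_iops, for_capacity


# Illustrative workload: 20,000 IOPS and 10 TB of data,
# on disks assumed to deliver 180 IOPS and hold 0.3 TB each.
disks, for_iops, for_capacity = spindles_needed(20_000, 10, 180, 0.3)
provisioned_tb = disks * 0.3
utilization = 10 / provisioned_tb

print(f"IOPS needs {for_iops} disks, capacity needs {for_capacity}")
print(f"Buy {disks} disks: {provisioned_tb:.1f} TB provisioned, {utilization:.0%} used")
```

With these made-up inputs the IO load, not the data size, sets the disk count, so most of the provisioned space sits empty by design.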
