Costs Associated with the Storage of Terabytes?
NetworkAttached asks: "I know of a company that has large online storage requirements - on the order of 50TB - for a new data-warehousing oriented application they are developing. I was astonished to hear that the pricing for this storage (disk, frames, management software, etc.) was nearly $20 million. I've tried to research the actual costs myself, but that information seems all but impossible to find online. For those of you out there with real world experience in this area, is $20 million really accurate? What is the set of viable alternatives out there for storage requirements of this size?"
Why is it that 90% of "Ask Slashdot" pieces seem to boil down to "I have no real world experience, and I'm just wondering how I can solve problem X for Y dollars when twenty different vendors all sell solutions for 100 * Y dollars?"?
--
Twoflower
Take the new 320 gigabyte hard drives previously mentioned and divide 50,000 gigs (50TB) by 320. You get an approximate cost for 50TB by multiplying that by $350, the approximate cost of the drive. However, with that much data a RAID is certainly in order, so multiply the number of drives by 1.5 or 1.75 to get the number of drives needed for a RAID, then multiply that by $350. This comes out to a little over $80,000. The only cost left is the cost of all the RAID controllers (expensive) and networking all the drives together. So the raw storage for 50 terabytes costs about $80,000. If you were to buy ultrafast SCSI drives instead of the 320GB drives, the price would be multiplied by about 3, since a 100GB super fast SCSI drive is also about $300 with 1/3 of the space. That brings it to $240,000. Add to that the cost of labor and all the other hardware, and I don't see how it could come out to more than 1 million dollars. I'm not an expert, but just doing the math, it seems that more than that is too much.
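For anyone who wants to check the back-of-the-envelope numbers, here is a rough Python sketch of the same arithmetic. The function name `raw_drive_cost` is made up for illustration, and the figures (320GB drives at $350, a RAID multiplier of 1.5 or 1.75, SCSI at roughly 3x the cost) are just the poster's guesses, not real quotes. It counts drives only: no controllers, networking, or labor.

```python
import math

def raw_drive_cost(capacity_gb, drive_gb, drive_price, raid_factor):
    """Return (drive_count, total_cost) for bare drives plus RAID overhead.

    All inputs are the poster's rough assumptions, not vendor pricing.
    """
    base_drives = capacity_gb / drive_gb          # drives with no redundancy
    drives = math.ceil(base_drives * raid_factor) # round up to whole drives
    return drives, drives * drive_price

if __name__ == "__main__":
    for factor in (1.5, 1.75):
        drives, cost = raw_drive_cost(50_000, 320, 350, factor)
        print(f"RAID factor {factor}: {drives} drives, ${cost:,}")

    # The post triples the IDE figure to approximate fast SCSI drives.
    _, ide_cost = raw_drive_cost(50_000, 320, 350, 1.5)
    print(f"SCSI estimate (3x IDE): ${ide_cost * 3:,}")
```

With the 1.5 multiplier this lands a little over $80,000, which matches the figure in the post; the 1.75 multiplier pushes it closer to $96,000.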
The GeekNights podcast is going strong. Listen!
sorry for sounding a bit trollish, but the current replies here seem to follow the formula of checking the biggest ide drive on pricewatch and multiplying that out to give you a number.
:)
forget all that.
if all you wanted was a pile of ide hard drives, maybe this would be ok, but anybody looking for 50TB of storage is not just looking for some disk to hold the pr0n they downloaded last week. large scale storage systems need to manage multiple host access to high speed (15krpm U3SCSI) drives in flexible raid configurations with maximum redundancy, high speed caching (with GBs of RAM to do it), fiber channel switching, cross platform capability, high end management and monitoring, HSM backup and data migration, offsite vaulting of disaster recovery data, power and air conditioning, and a fat service contract from the vendor. none of the above are going to be found at pricewatch.com.
your best bet is to talk to multiple storage vendors about your needs. call up EMC, Hitachi, IBM, and Fujitsu to start, then let them see each other's numbers. With the amount of money that you are going to spend (and it almost certainly will exceed $10 mil - but maybe not $20), each of these vendors will do backflips to get your business (and EMC is particularly good at junkets - take them for all they're worth).
Get a clue man.
Where is your failover?
How are you going to connect these disks together? NFS? Samba? That kind of speed (or lack of) is not an enterprise storage solution.
How do you replace disks as they fail without taking stuff offline?
Conformity is the jailer of freedom and enemy of growth. -JFK
From experience (with EMC - Sun) your price tag sounds a bit on the high side, but not by very much. Consider that EMC storage (after all, mission critical data should be stored on EMC/Hitachi/StorageTek, NOT on consumer IDE) costs much more than consumer IDE/SCSI (25 - 75x), and that's only the disks.
If you're going with EMC, you'll need to put those disks in something, like a frame (cabinet), and for your size, more like 5 cabinets. With that many cabinets, you'll need some sort of SAN switch and associated fibre cables (not cheap). That gets your disks into cabinets and all hooked together.
You wanted to access the data? Then you'll need EMC fibre channel cards ($15k a pop for the Sun 64bit PCI high end jobs). But you'll more than likely be serving data from a cluster of machines, so count on buying three ($45k) per machine (so each card is on a different I/O board hitting the SAN switch, for redundancy).
Who's going to set this up? For that kind of coin, EMC (or whomever you go with) will more than likely set the thing up and burn it in for you on site. The price probably also includes some kind of maintenance contract with turn around time fitting the criticality of the system.
Yes, my 'big ass storage' experience may be limited, but I think that $20 million for 50TB installed/supported/tested by a big storage vendor is in the ballpark.
Good luck.
Floppies. Lots and lots of floppies. They are so cheap right now! And they come in pretty colors too.
That 100k was a joke, right? We have 4 2TB SANs where I am and I can tell you that any 2 of them would eclipse your guess. Let's not get into the shelf disks, the extra fabrics, the RAID eating some of your space. Oops, did I forget the support contract, the UPS the size of a cubicle, and a Liebert air conditioner to cool this room full of spinning drives? Wait a minute, you're going to need full redundant backups for all this shit, the GBIC switches to control access, the rack space, and all the fiber HBA cards for the servers (unless you go copper).
Then you want to back this up? Break out your checkbook again for a Compaq minilibrary if you're lucky; that is only 10 tapes x 80 gig a tape... 800 gig... and that is if you're really doing well. So put that on top of it all: 10 x 10 x 80 gives you 8 TB of backup at around 30k each for the minilibs. The price just keeps on jumpin!
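To put rough numbers on the tape math, here is a small Python sketch. The constants are the figures quoted in the post (10 tapes per minilibrary, 80 gig per tape, about $30k per unit); the helper names are hypothetical, and scaling the same units up to a full 50TB copy is my own extrapolation, not something the poster priced out.

```python
import math

# Figures quoted in the post; treat them as rough, not vendor pricing.
TAPES_PER_LIBRARY = 10
TAPE_GB = 80
LIBRARY_PRICE = 30_000

def libraries_needed(data_gb):
    """Number of minilibraries required to hold data_gb on tape."""
    per_library_gb = TAPES_PER_LIBRARY * TAPE_GB  # 800 GB per unit
    return math.ceil(data_gb / per_library_gb)

def backup_cost(data_gb):
    """Hardware cost of enough minilibraries for one copy of data_gb."""
    return libraries_needed(data_gb) * LIBRARY_PRICE

if __name__ == "__main__":
    for label, gb in (("8 TB", 8_000), ("50 TB", 50_000)):
        print(f"{label}: {libraries_needed(gb)} minilibs, ${backup_cost(gb):,}")
```

Ten units cover the 8 TB in the post for about $300k. A single full copy of 50TB would need 63 of them under the same assumptions, and that is before tapes for rotation or offsite copies.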
No way, no how, not today or tomorrow. 100k will get you a floor full of 120gig maxtor drives and that is about it.
Neck_of_the_Woods
#/usr/local/surf/glassy/overhead
...I realize that accepted pricing is well above the price I mentioned. And yes, obviously I left out the maintenance.
The problem is that I find that corporate spending on IT purchases has gotten ridiculous. Let's buy a TEMPEST array! Let's buy something with a Sun nametag because the name sounds good! Let's buy a $2k piece of software for each workstation even though there's a free alternative!
I'm not saying that anyone *provides* something in the price range I was talking about. No one is crazy enough to do so, if companies are willing to pay much, much more. I'm saying that, if you're asking whether it's possible to *build* something like this for the price range I mentioned, off the cuff it doesn't sound so unreasonable.
Yes, a seasoned IT person who works with high-end systems like this will laugh. Why? Because they're used to paying huge amounts of money. Because it's an accepted part of the culture to throw down this much cash. What I want to know is -- how often do people question these basics? How often has someone said "Wait a minute...this is wrong."
Are you telling me that if you were in a third world country without the exorbitant amount of funding that we USians enjoy, and someone asked you to put together a 50TB storage system for under $1M, you'd simply say "It can't be done"? No consideration, nothing?
I mean, when I look at the fact that the *case* on, say, a Sun high end system costs more than a whole cluster of workstations, I start to wonder just how much excess is going on here.
Say we take the bare-metal, dirt cheap approach. Grab a bunch of Linux boxes. Throw RAID on them configured so that 1/3 of your data is overhead for reliability, and a 100Mbps Ethernet card in each. The figure used earlier was $1 per gig. Put 6 200 GB drives in each. Throw down $250 for the non-drive cost of each system. You have 800GB of data on each system, 400GB of overhead. That's 63 systems. $16K for the systems, $75K for the drives, and we come in to $91K. I left out switches -- you'd need a couple, but certainly not $9K worth.
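Here is the same estimate as a quick Python sketch, so the numbers can be tweaked. The function `cluster_estimate` and its parameter names are made up for illustration; the defaults are the figures from the paragraph above ($1 per gig, six 200GB drives per box, one third of raw space as RAID overhead, $250 per box excluding drives). Switches, software work, and labor are left out, as they are in the post.

```python
import math

def cluster_estimate(target_gb=50_000, drives_per_box=6, drive_gb=200,
                     overhead_num=1, overhead_den=3,
                     price_per_gb=1, box_price=250):
    """Rough cost of a cheap Linux-box storage cluster.

    The overhead fraction (overhead_num / overhead_den) is the share of
    raw space given up to RAID redundancy. All defaults are the post's
    guesses, not real prices.
    """
    raw_gb = drives_per_box * drive_gb                                # 1200 GB raw
    usable_gb = raw_gb * (overhead_den - overhead_num) // overhead_den  # 800 GB usable
    boxes = math.ceil(target_gb / usable_gb)
    systems_cost = boxes * box_price
    drives_cost = boxes * raw_gb * price_per_gb
    return {
        "boxes": boxes,
        "systems_cost": systems_cost,
        "drives_cost": drives_cost,
        "total": systems_cost + drives_cost,
    }

if __name__ == "__main__":
    est = cluster_estimate()
    print(f"{est['boxes']} boxes: ${est['systems_cost']:,} systems "
          f"+ ${est['drives_cost']:,} drives = ${est['total']:,}")
```

The defaults give 63 boxes and a total of roughly $91K, in line with the figures above. Raising `box_price` or `price_per_gb` shows how quickly the total moves if those guesses are optimistic.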
You'd need some software work done -- an efficient, hierarchical distributed filesystem. I didn't factor this in, which you could consider not fair, but there may be something like this already, and if not, it's a one time cost for the whole world.
Maybe another few systems up near the head of the array to do caching and speed things up, and you still aren't even up to $150K, and you have failover (at least for each one-drive-in-three group).
I haven't looked at this -- it might be smarter, since you'd want to do this hierarchically, to have caches existing within the hierarchy, or maybe Gbit Ethernet at the top level of the hierarchy. And obviously, this may not meet your needs. But as for whether it's possible to build something like this for that much money? Sure, I'd say so.
Finally, existing SANs or any sort of network-attached storage are overpriced, no two ways about it. Very, very healthy profit margins there. Sooner or later, someone is going to start underselling the big IT "corporate solution providers" and is going to kill them unless they trim margins by quite a bit.
May we never see th