Costs Associated with the Storage of Terabytes?
NetworkAttached asks: "I know of a company that has large online storage requirements - on the order of 50TB - for a new data-warehousing oriented application they are developing. I was astonished to hear that the pricing for this storage (disk, frames, management software, etc.) was nearly $20 million. I've tried to research the actual costs myself, but that information seems all but impossible to find online. For those of you out there with real-world experience in this area, is $20 million really accurate? What is the set of viable alternatives out there for storage requirements of this size?"
Why is it that 90% of "Ask Slashdot" pieces seem to boil down to "I have no real world experience, and I'm just wondering how I can solve problem X for Y dollars when twenty different vendors all sell solutions for 100 * Y dollars?"?
--
Twoflower
It's more involved than how many bytes you need to store, of course. How fast do they come in and go out? How often do the bits turn over? How reliable does the data need to be, and how fresh the reliability (do you need to mirror it real-time at a remote, hardened site, or back it up once a month)? What systems does the data need to feed and be fed from? What are your labor costs (tape changers, administrators, etc.)? How much wood do you need to buy for office furniture?
sorry for sounding a bit trollish, but the current replies here seem to follow the formula of checking the biggest ide drive on pricewatch and multiplying that out to give you a number.
:)
forget all that.
if all you wanted was a pile of ide hard drives, maybe this would be ok, but anybody looking for 50TB of storage is not just looking for some disk to hold the pr0n they downloaded last week. large scale storage systems need to manage multiple host access to high speed (15krpm U3SCSI) drives in flexible raid configurations with maximum redundancy, high speed caching (with GBs of RAM to do it), fiber channel switching, cross platform capability, high end management and monitoring, HSM backup and data migration, offsite vaulting of disaster recovery data, power and air conditioning, and a fat service contract from the vendor. none of the above are going to be found at pricewatch.com.
your best bet is to talk to multiple storage vendors about your needs. call up EMC, Hitachi, IBM, and Fujitsu to start, then let them see each other's numbers. With the amount of money that you are going to spend (and it almost certainly will exceed $10 mil - but maybe not $20), each of these vendors will do backflips to get your business (and EMC is particularly good at junkets - take them for all they're worth).
Get a clue man.
Where is your failover?
How are you going to connect these disks together? NFS? Samba? That kind of speed (or lack of it) is not an enterprise storage solution.
How do you replace disks as they fail without taking stuff offline?
Conformity is the jailer of freedom and enemy of growth. -JFK
i don't know nearly enough to put such a thing together, but i do know enough to know that every real-world project probably costs 50x what a geek-fantasy basement equivalent would cost.
What's this have to do with managers? Why don't you sell these systems? I don't, because I don't know what it takes to build them.
How do you even strap 50 TB together? Is it one huge array, or arrays of arrays?
What do you use at the head end that can handle this sort of throughput? How do you back it up? How do you search it?
What filesystems do you use that support 50TB?
How do you manage the hot swap aspects?
There are so many questions that you leave unanswered, that you might spend $19 mil to answer before you spend $1 mil on hardware.
Joe
Joe Batt Solid Design
I'm not sure what the prices are running these days, but back in 1999 I put together a 6TB system running RAID 5 on an all fibre-channel setup using FC hubs (at the time, switch fabric was too immature) and StorageTek (aka Clariion) arrays for right around $2.5M.
Keep in mind, that's just for the disks, array controllers/cabinets, hubs, and Sun FC cards. No servers are included in that price.
There are so many variables that you didn't go into that it's hard to give you an educated answer to your question, but it seems feasible to get to around 50TB today for that kind of money taking into account the increased storage density that we've gotten in the last couple of years.
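For a rough sense of scale, here is a back-of-the-envelope sketch of the numbers above. It assumes cost scales linearly with capacity, which the post does not claim, and the function names are made up for illustration.

```python
def cost_per_tb(total_cost_usd, capacity_tb):
    """Dollars per terabyte for a system of the given size and price."""
    return total_cost_usd / capacity_tb


def scaled_cost(total_cost_usd, capacity_tb, target_tb):
    """Price of target_tb at the same dollars-per-TB rate (linear scaling is an assumption)."""
    return cost_per_tb(total_cost_usd, capacity_tb) * target_tb


def required_price_drop(capacity_tb, target_tb):
    """Factor by which dollars-per-TB must fall to buy target_tb for the original budget."""
    return target_tb / capacity_tb


# Figures from the 1999 build described above: 6TB for about $2.5M.
rate_1999 = cost_per_tb(2_500_000, 6)
at_1999_prices = scaled_cost(2_500_000, 6, 50)
drop_needed = required_price_drop(6, 50)

print(f"1999 rate: ${rate_1999:,.0f} per TB")
print(f"50TB at 1999 rates: ${at_1999_prices:,.0f}")
print(f"Price drop needed to fit 50TB into $2.5M: {drop_needed:.1f}x")
```

At 1999 rates, 50TB lands near the $20M quote in the original question, and fitting it into the same $2.5M would take roughly an 8x drop in cost per terabyte.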
Yep, you're a "--Turkey" all right. You've got about the same size brain if you think that's a viable solution...
The EMC boxes (or anyone else's, for that matter) have a significant amount of configuration associated with connecting the drives. You can't just open the box up and start sticking in drives and expect it to work. For that matter, in many cases, if the drives are not the ones rated for use in the box, you can destroy the backplane of the machine, the power supplies, the drives themselves, etc. Power and heat are huge issues in these boxes. Think of the heat the average hard drive throws off, now put 100+ in a box the size of the average home refrigerator...
Then there are configuration issues: you need the software and the technical know-how to write the configuration files these machines use to tell the multiple drives to act as one or many logical drives.
Then how do you connect the system(s) that will use the box? These are all delicate issues.
If you buy a box off eBay you will absolutely need someone working for you who knows the product inside and out (or at least on a retainer contract with 24x7 support clauses), and you should immediately make a phone call to the proper support number to get the thing on a support contract. Trained EMC professionals don't come cheap, but they are worth every penny. I would assume that with other companies it's the same story, but I only use EMC so I don't know...
Buy EMC; it's really the only long term option. I have seen one of these boxes get knocked over on its side (no small task) while it was running, and just keep going without a hitch. That's a well engineered product.
Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
You've made a number of assumptions, none of them good. One assumption is that the performance of a 5400 rpm IDE drive (that's all the 320GB drives are) would be acceptable for an application like this. It won't. You'd want 15000 rpm SCSI-3 drives at a minimum, and you'd want them hot-swappable. Figure a grand each for 140GB drives, in bulk. Then there are a large number of other factors mentioned by others here: RAID controllers, servers to house it all, switching, cabling, racks, etc.
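To put a number on the drives alone, here is a quick sketch using the figures above ($1,000 per 140GB drive, 50TB taken as 50,000GB). The `usable_fraction` parameter is an illustrative assumption for redundancy overhead such as mirroring; it is not a figure from this post.

```python
import math


def drives_needed(capacity_gb, drive_gb, usable_fraction=1.0):
    """Whole drives required to reach capacity_gb of usable space.

    usable_fraction is the share of each drive left after redundancy
    (1.0 = no redundancy, 0.5 = mirroring). Hypothetical knob.
    """
    return math.ceil(capacity_gb / (drive_gb * usable_fraction))


def drive_cost(capacity_gb, drive_gb, price_per_drive, usable_fraction=1.0):
    """Total spend on drives only; no controllers, racks, power, or support."""
    return drives_needed(capacity_gb, drive_gb, usable_fraction) * price_per_drive


raw = drives_needed(50_000, 140)
mirrored = drives_needed(50_000, 140, usable_fraction=0.5)

print(f"No redundancy: {raw} drives, ${drive_cost(50_000, 140, 1000):,}")
print(f"Mirrored:      {mirrored} drives, ${drive_cost(50_000, 140, 1000, 0.5):,}")
```

That works out to 358 drives (about $358K) with no redundancy at all, and 715 drives if everything is mirrored, before any of the other factors are counted.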
What about power? And cooling? Ever cost out one of those huge Liebert internal cooling systems? Don't forget you need 2 of them. What about the power? You'll need huge UPSes for something like this.
How about backups? You'll need to be able to back this all up, and transport the data offsite in a timely manner. That's A LOT of DLT tapes, not to mention the costs of the tape libraries, drives, off-site storage facilities (perhaps you'd like to keep all of those tapes in a locker at the space place?), etc. involved.
Now.. how are you going to access this? with 500 partitions? or perhaps you want some more sophisticated storage management software?
What about support? Are you going to accept responsibility for maintaining this thing? Or are you, like most businesses, going to want 24x7x4 support? Since support on products like this often involves flying an engineer in from out of state on almost no notice, it's not cheap.
The reality of this is that for that kind of storage you need a SAN, and that means big dollars. The two most common SANs are EMC (which I'd bet was what this estimate was for) and Compaq StorageWorks. EMC is the more mature solution, but also MUCH more expensive. Their prices often exceed those of Compaq and the other vendors who make similar products by 300% or more.
Is $20M too much? Probably. Is any solution involving a room full of servers loaded with commodity IDE drives acceptable? Absolutely not.
Better to shop other EMC vendors, and other SAN solutions and make the best deal on the right product.
If privacy had a tombstone it would read "We did it for your own good" . -- John Twelve Hawks
...I realize that accepted pricing is well above the price I mentioned. And yes, obviously I left out the maintenance.
The problem is that I find that corporate spending on IT purchases has gotten ridiculous. Let's buy a TEMPEST array! Let's buy something with a Sun nametag because the name sounds good! Let's buy a $2k piece of software for each workstation even though there's a free alternative!
I'm not saying that anyone *provides* something in the price range I was talking about. No one is crazy enough to do so, if companies are willing to pay much, much more. I'm saying that, if you're asking whether it's possible to *build* something like this for the price range I mentioned, off the cuff it doesn't sound so unreasonable.
Yes, a seasoned IT person who works with high-end systems like this will laugh. Why? Because they're used to paying huge amounts of money. Because it's an accepted part of the culture to throw down this much cash. What I want to know is -- how often do people question these basics? How often has someone said "Wait a minute...this is wrong."
Are you telling me that if you were in a third world country without the exorbitant amount of funding that we USians enjoy, and someone asked you to put together a 50TB storage system for under $1M, you'd simply say "It can't be done"? No consideration, nothing?
I mean, when I look at the fact that the *case* on, say, a Sun high end system costs more than a whole cluster of workstations, I start to wonder just how much excess is going on here.
Say we take the bare-metal, dirt cheap approach. Grab a bunch of Linux boxes. Throw RAID on them configured so that 1/3 of your storage is overhead for reliability, and a 100Mbps Ethernet card in each. The figure used earlier was $1 per gig. Put six 200GB drives in each. Throw down $250 for the non-drive cost of each system. You have 800GB of data on each system, 400GB of overhead. That's 63 systems. $16K for the systems, $75K for the drives, and we come in at $91K. I left out switches -- you'd need a couple, but certainly not $9K worth.
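The arithmetic above can be checked with a short sketch. All figures come from the paragraph itself; the function name and parameter names are made up for illustration, and 50TB is taken as 50,000GB.

```python
import math
from fractions import Fraction


def commodity_estimate(target_gb=50_000, drives_per_box=6, drive_gb=200,
                       overhead=Fraction(1, 3), box_cost=250, cost_per_gb=1):
    """Cost of a pile of commodity boxes, hardware only.

    Ignores switches, software, power, cooling, and labor, just as the
    paragraph above does.
    """
    raw_per_box = drives_per_box * drive_gb          # 1200GB raw per box
    usable_per_box = raw_per_box * (1 - overhead)    # 800GB usable per box
    systems = math.ceil(Fraction(target_gb) / usable_per_box)
    system_cost = systems * box_cost
    drive_cost = systems * raw_per_box * cost_per_gb
    return {
        "systems": systems,
        "system_cost": system_cost,
        "drive_cost": drive_cost,
        "total": system_cost + drive_cost,
    }


est = commodity_estimate()
print(est)
```

This gives 63 systems, $15,750 for the boxes, and $75,600 for the drives, for a total of $91,350, which matches the rounded figures in the post. Using `Fraction` for the one-third overhead keeps the per-box capacity at exactly 800GB instead of a floating-point approximation.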
You'd need some software work done -- an efficient, hierarchical distributed filesystem. I didn't factor this in, which you could consider not fair, but there may be something like this already, and if not, it's a one time cost for the whole world.
Maybe another few systems up near the head of the array to do caching and speed things up, and you still aren't even up to $150K, and you have failover (at least for each one-drive-in-three group).
I haven't looked at this -- it might be smarter, since you'd want to do this hierarchically, to have caches existing within the hierarchy, or maybe Gbit Ethernet at the top level of the hierarchy. And obviously, this may not meet your needs. But as for whether it's possible to build something like this for that much money? Sure, I'd say so.
Finally, existing SANs or any sort of network-attached storage are overpriced, no two ways about it. Very, very healthy profit margins there. Sooner or later, someone is going to start underselling the big IT "corporate solution providers" and is going to kill them unless they trim margins by quite a bit.
May we never see th