Costs Associated with the Storage of Terabytes?
NetworkAttached asks: "I know of a company that has large online storage requirements - on the order of 50TB - for a new data-warehousing-oriented application they are developing. I was astonished to hear that the pricing for this storage (disk, frames, management software, etc...) was nearly $20 million. I've tried to research the actual costs myself, but that information seems all but impossible to find online. For those of you out there with real-world experience in this area, is $20 million really accurate? What is the set of viable alternatives out there for storage requirements of this size?"
From experience (with EMC - Sun) your price tag sounds a bit on the high side, but not by very much. Consider that EMC storage (after all, mission-critical data should be stored on EMC/Hitachi/StorageTek, NOT on consumer IDE) costs much more than consumer IDE/SCSI (25 - 75x), and that's only the disks.
If you're going with EMC, you'll need to put those disks in something, like a frame (cabinet), and for your size, more like 5 cabinets. With that many cabinets, you'll need some sort of SAN switch and associated fibre cables (not cheap). That gets your disks into cabinets and all hooked together.
You wanted to access the data? Then you'll need EMC fibre channel cards ($15k a pop for the Sun 64-bit PCI high-end jobs). But you'll more than likely be serving data from a cluster of machines, so count on buying three ($45k) per machine, so that each card sits on a different I/O board hitting the SAN switch, for redundancy.
Who's going to set this up? For that kind of coin, EMC (or whoever you go with) will more than likely set the thing up and burn it in for you on site. The price probably also includes some kind of maintenance contract with a turnaround time fitting the criticality of the system.
Yes, my 'big ass storage' experience may be limited, but I think that $20 million for 50TB installed/supported/tested by a big storage vendor is in the ballpark.
Good luck.
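For a sense of scale, here is a rough sketch of the arithmetic behind the numbers quoted so far: $20 million for 50TB, and three $15k fibre channel cards per machine. The function names and the four-machine cluster in the usage note are hypothetical, and 1 TB is assumed to be 1000 GB; nothing here is a vendor quote.

```python
# Back-of-the-envelope figures taken from the question and the comment above.
TOTAL_PRICE_USD = 20_000_000   # quoted price for the whole installation
CAPACITY_TB = 50               # required online storage
HBA_PRICE_USD = 15_000         # per fibre channel card, as quoted above
HBAS_PER_MACHINE = 3           # one card per I/O board, for redundancy


def price_per_tb(total_usd=TOTAL_PRICE_USD, capacity_tb=CAPACITY_TB):
    """Installed price per terabyte."""
    return total_usd / capacity_tb


def price_per_gb(total_usd=TOTAL_PRICE_USD, capacity_tb=CAPACITY_TB,
                 gb_per_tb=1000):
    """Installed price per gigabyte (assumes decimal units: 1 TB = 1000 GB)."""
    return total_usd / (capacity_tb * gb_per_tb)


def hba_cost(machines, cards=HBAS_PER_MACHINE, card_price=HBA_PRICE_USD):
    """Fibre channel card cost for a cluster of `machines` hosts."""
    return machines * cards * card_price


if __name__ == "__main__":
    print(f"per TB: ${price_per_tb():,.0f}")
    print(f"per GB: ${price_per_gb():,.0f}")
    # The cluster size of 4 is an assumption for illustration only.
    print(f"HBAs for 4 machines: ${hba_cost(4):,}")
```

So the quote works out to about $400,000 per terabyte, or $400 per gigabyte, and the host-side cards are a small fraction of the total even for a modest cluster.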
The trend is to use iSCSI on the network side and IDE on the hardware side. Since a network file server only has FS daemons doing I/O, and the drives are always hot, there is no SCSI advantage as there is in a multitasking workstation environment.
-I like my women like I like my tea: green-
That 100k was a joke, right? We have four 2TB SANs where I am, and I can tell you that any two of them would eclipse your guess. Let's not get into the shelf disks, the extra fabrics, the RAID eating some of your space. Oops, did I forget the support contract, the UPS the size of a cubicle, and a Liebert air conditioner to cool this room full of spinning drives? Wait a minute, you're going to need full redundant backups for all this shit, the GBIC switches to control access, the rack space, and all the fiber HBA cards for the servers (unless you go copper).
Then you want to back this up? Break out your checkbook again for a Compaq minilibrary if you're lucky; that is only 10 tapes x 80 GB a tape = 800 GB, and that is if you're really doing well. So put that on top of it all: 10 x 10 x 80 gives you 8 TB of backup at around 30k each for the minilibs. The price just keeps on jumpin'!
No way, no how, not today or tomorrow. 100k will get you a floor full of 120 GB Maxtor drives and that is about it.
Neck_of_the_Woods
#/usr/local/surf/glassy/overhead
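The tape arithmetic in the comment above (10 tapes x 80 GB per minilibrary, roughly $30k per library) can be extended to the full 50TB as a rough sketch. The function names are made up for illustration, and the assumptions are loud ones: decimal units (1 TB = 1000 GB), a single full copy, no compression, and no allowance for tape rotation.

```python
import math

# Figures quoted in the comment above.
TAPES_PER_LIBRARY = 10
TAPE_CAPACITY_GB = 80
LIBRARY_PRICE_USD = 30_000


def library_capacity_gb():
    """Native capacity of one minilibrary: 10 tapes x 80 GB."""
    return TAPES_PER_LIBRARY * TAPE_CAPACITY_GB


def libraries_needed(data_tb, gb_per_tb=1000):
    """Minilibraries needed to hold one full copy of `data_tb` terabytes."""
    return math.ceil(data_tb * gb_per_tb / library_capacity_gb())


def backup_cost(data_tb):
    """Hardware cost of those libraries, ignoring media, drives and labor."""
    return libraries_needed(data_tb) * LIBRARY_PRICE_USD


if __name__ == "__main__":
    for tb in (8, 50):
        print(f"{tb} TB: {libraries_needed(tb)} libraries, "
              f"${backup_cost(tb):,}")
```

Under those assumptions, the 8 TB in the comment takes 10 libraries ($300,000), and one copy of the full 50TB takes 63 of them, or about $1.9 million in library hardware alone.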
It seems that this question is extremely dependent upon the kind of application.
Are you mostly reading, or also frequently writing this data? Are you searching or doing indexed lookups? Is this a nasty bandwidth hog or a trickle? Is this a zillion parallel transactions or only a few users? What kind of latencies are expected? What reliability is required? What access is needed to historical data?
Consider some concrete examples that are *very* different from each other yet could each total 50TB and would have very different solutions:
- Video-on-demand system for a Hollywood studio deciding that peer-to-peer pirate systems can only be beaten by a legitimate system that is better.
- Online credit card transaction system for, say, Visa.
- SETI data that needs to be collected and searched for messages from extraterrestrials.
- Particle accelerator data that needs to be collected at truly horrendous rates.
- Lexis/Nexis database.
- Google database.
- Echelon data.
- IRS data.
- "Dictionary attack" database for a lone cryptanalyst.
The possibilities go on and on. At the minimum a 50 TB database might be a small number of equipment racks with a single computer attached to them, all totaling maybe $100,000.
And on the other end, I can easily imagine a system where $200,000 of a much larger total might be spent for, say, a terabyte of DRAM.
I can easily imagine a system with less than $5,000 of battery-backed power supplies, and I can imagine a system with hundreds of thousands in generators.
This question has enormous dynamic range.
-kb, the Kent who would enjoy working out solutions for specific instances of this question.
You know that I am talking about commonly available multi-CPU systems, and not exotic (and insanely expensive) systems with redundant CPUs and memory.
What are you smoking, and where can I get some?
Do you seriously believe that an E6500 or similar system will not crash if there is a faulty CPU? Despite your impressively low Slashdot UID, if you believe this, you have virtually no experience with such systems.