Slashdot Mirror


Costs Associated with the Storage of Terabytes?

NetworkAttached asks: "I know of a company that has large online storage requirements - on the order of 50TB - for a new data-warehousing oriented application they are developing. I was astonished to hear that the pricing for this storage (disk, frames, management software, etc...) was nearly $20 million. I've tried to research the actual costs myself, but that information seems all but impossible to find online. For those of you out there with real world experience in this area, is $20 million really accurate? What is the set of viable alternatives out there for storage requirements of this size?"

21 of 161 comments

  1. Metacomment by twoflower · · Score: 4, Insightful

    Why is it that 90% of "Ask Slashdot" pieces seem to boil down to "I have no real world experience, and I'm just wondering how I can solve problem X for Y dollars when twenty different vendors all sell solutions for 100 * Y dollars?"?

    --
    Twoflower
  2. If you take by Apreche · · Score: 4, Funny

    the new 320 gigabyte hard drives previously mentioned, and you divide 50,000 gigs (50TB) by 320, you get the number of drives; multiplying that by $350, the approximate cost of each drive, gives an approximate cost for 50TB. However, with that much data a RAID is certainly in order. So multiply the number of drives by 1.5 or 1.75 to get the number of drives needed for a RAID. Then multiply that by $350. This comes out to a little over $80,000. The only cost left is the cost of all the RAID controllers (expensive) and networking all the drives together. So the raw storage for 50 terabytes costs about $80,000. If you were to buy ultrafast SCSI drives instead of the 320GB drives, the price would be multiplied by about 3, since a 100GB super fast SCSI drive is also about $300, with 1/3 of the space. So that brings it to $240,000. Add to that the cost of labor and all the other hardware and I don't see how it could come out to more than 1 million dollars. I'm not an expert, but just doing the math, anything more than that seems like too much.
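
    The arithmetic above can be written out as a short sketch. The function name is made up for illustration, and 1TB is assumed to be 1000GB; the drive size, drive price, and RAID multipliers are the ones quoted in the post. Drive counts are left fractional, as in the post.

```python
def raw_drive_cost(capacity_gb, drive_gb, drive_price, raid_factor):
    """Cost of enough drives for capacity_gb, padded by raid_factor for redundancy.

    Hypothetical helper: no rounding to whole drives, no controllers,
    no networking, no labor. Drives only.
    """
    drives = capacity_gb / drive_gb * raid_factor
    return drives * drive_price


# 50TB on 320GB drives at $350 each, with the two RAID multipliers from the post
low = raw_drive_cost(50_000, 320, 350, 1.5)
high = raw_drive_cost(50_000, 320, 350, 1.75)
print(f"IDE drives only: ${low:,.0f} to ${high:,.0f}")
```

    With the 1.5 multiplier this lands at roughly $82,000, which is where the "little over $80,000" figure comes from; the 1.75 multiplier pushes it to roughly $96,000.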

    --
    The GeekNights podcast is going strong. Listen!
    1. Re:If you take by battjt · · Score: 3, Insightful

      What's this have to do with managers? Why don't you sell these systems? I don't, because I don't know what it takes to build them.

      How do you even strap 50 TB together? Is it one huge array, or arrays of arrays?

      What do you use at the head end that can handle this sort of throughput? How do you back it up? How do you search it?

      What filesystems do you use that support 50TB?

      How do you manage the hot swap aspects?

      There are so many questions that you leave unanswered, that you might spend $19 mil to answer before you spend $1 mil on hardware.

      Joe

      --
      Joe Batt Solid Design
    2. Re:If you take by secret_squirrel_99 · · Score: 3, Insightful

      You've made a number of assumptions, none of them good. One assumption is that the performance of a 5400 rpm IDE drive (that's all the 320GB drives are) would be acceptable for an application like this. It won't. You'd want 15000 rpm SCSI-3 drives at a minimum, and you'd want them hot-swappable. Figure a grand each for 140GB drives, in bulk. Then there are a large number of other factors mentioned by others here: RAID controllers, servers to house it all, switching, cabling, racks, etc.

      What about power? And cooling? Ever cost out one of those huge Liebert internal cooling systems? Don't forget you need 2 of them. What about the power? You'll need huge UPSes for something like this.

      How about backups? You'll need to be able to back this all up, and transport the data offsite in a timely manner. That's A LOT of DLT tapes, not to mention the costs of the tape libraries, drives, off-site storage facilities (perhaps you'd like to keep all of those tapes in a locker at the space place?), etc. involved.

      Now, how are you going to access this? With 500 partitions? Or perhaps you want some more sophisticated storage management software?

      What about support? Are you going to accept responsibility for maintaining this thing? Or are you, like most businesses, going to want 24x7x4 support? Since support on products like this often involves flying an engineer in from out of state, on almost no notice, it's not cheap.

      The reality of this is that for that kind of storage you need a SAN, and that means big dollars. The 2 most common SANs are EMC (which I'd bet was what this estimate was for) and Compaq StorageWorks. EMC is the more mature solution, but also MUCH more expensive. They often outpace Compaq and the other vendors who make similar products by 300% or more.

      Is $20M too much? Probably. Is any solution involving a room full of servers loaded with commodity IDE drives acceptable? Absolutely not.
      Better to shop other EMC vendors, and other SAN solutions, and make the best deal on the right product.

      --
      If privacy had a tombstone it would read "We did it for your own good" . -- John Twelve Hawks
  3. Hmmmm... by jo42 · · Score: 3, Funny

    Imagine how long FORMAT C: would take...

  4. more input needed by tchdab1 · · Score: 3, Insightful

    It's more involved than how many bytes you need to store, of course. How fast do they come in and go out? How often do the bits turn over? How reliable does the data need to be, and how fresh the reliability (do you need to mirror it real-time at a remote, hardened site, or back it up once a month)? What systems does the data need to feed and be fed from? What are your labor costs (tape changers, administrators, etc.)? How much wood do you need to buy for office furniture?

  5. forget what you know about ide hard drives by aderusha · · Score: 5, Insightful

    sorry for sounding a bit trollish, but the current replies here seem to follow the formula of checking the biggest ide drive on pricewatch and multiplying that out to give you a number.

    forget all that.

    if all you wanted was a pile of ide hard drives, maybe this would be ok, but anybody looking for 50TB of storage is not just looking for some disk to hold the pr0n they downloaded last week. large scale storage systems need to manage multiple host access to high speed (15krpm U3SCSI) drives in flexible raid configurations with maximum redundancy, high speed caching (with GBs of RAM to do it), fiber channel switching, cross platform capability, high end management and monitoring, HSM backup and data migration, offsite vaulting of disaster recovery data, power and air conditioning, and a fat service contract from the vendor. none of the above are going to be found at pricewatch.com.

    your best bet is to talk to multiple storage vendors about your needs. call up EMC, Hitachi, IBM, and Fujitsu to start, then let them see each other's numbers. With the amount of money that you are going to spend (and it almost certainly will exceed $10 mil - but maybe not $20), each of these vendors will do backflips to get your business (and EMC is particularly good at junkets - take them for all they're worth :)

  6. Google is your friend by Twylite · · Score: 3, Interesting

    I am not an expert in this field, but Google was willing to tell me lots.

    RaidWeb sells rack mountable RAID units that take IDE drives and have SCSI or fibre connectivity. A 12-bay 4U SCSI (with 12x 120GB IDE drives) system comes in at just under $8000, giving over 1TB fault tolerant storage. There are several other companies that have units like this.

    Rackmount Solutions sells rackmount cabinets. A 44U cabinet with fans, doors, etc. will come in at around $3000.

    In theory, a single cabinet could house 11TB of data, and cost around $91000. This still doesn't consider cabling, cooling, power distribution, networking, a proper server room (air con, false floor for cables, access control), and in all likelihood one or more controlling servers.

    More practically, depending on how they are going to make this data accessible, you could be looking at 9 RAID units per cabinet plus 3 2U servers and a switch in the remaining space. Each server can support multiple SCSI cards and gigabit networking. Such rackmount computers will set you back in the region of $6000 each (incl. network and SCSI adapters, excl. software).

    So you can call it $100,000 for 9TB storage ... $600,000 for 54TB. That doesn't answer the management software question, and may not be a suitable solution. But it sure is a lot cheaper than $20 mil ;)
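
    The cabinet math above can be checked with a small sketch. The constant names are made up for illustration; the prices and sizes are the ones quoted in the post, and each RAID unit is counted as 1TB of usable space, as the post does.

```python
import math

UNIT_PRICE = 8_000        # 12-bay 4U RAID unit with 12x 120GB IDE drives
UNIT_HEIGHT_U = 4
UNIT_USABLE_TB = 1        # the post counts each unit as roughly 1TB fault tolerant
CABINET_PRICE = 3_000     # 44U cabinet with fans, doors, etc.
CABINET_HEIGHT_U = 44
SERVER_PRICE = 6_000      # 2U server incl. network and SCSI adapters
SERVER_HEIGHT_U = 2

# Theoretical case: fill the whole cabinet with RAID units
max_units = CABINET_HEIGHT_U // UNIT_HEIGHT_U
theoretical_cost = max_units * UNIT_PRICE + CABINET_PRICE

# Practical case: 9 RAID units plus 3 servers, leaving space for a switch
units, servers = 9, 3
space_left_u = CABINET_HEIGHT_U - units * UNIT_HEIGHT_U - servers * SERVER_HEIGHT_U
practical_cost = units * UNIT_PRICE + servers * SERVER_PRICE + CABINET_PRICE

# Cabinets needed for 50TB at 9TB per cabinet
cabinets = math.ceil(50 / (units * UNIT_USABLE_TB))

print(max_units, theoretical_cost)      # RAID units per cabinet, cost if full
print(space_left_u, practical_cost)     # rack units left over, cost before switch
print(cabinets, cabinets * units)       # cabinets needed, total TB
```

    The practical cabinet comes to $93,000 before the switch and cabling, which the post rounds up to $100,000; six such cabinets give the 54TB and $600,000 figures.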

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
  7. Re:Sounds reasonable by duffbeer703 · · Score: 5, Insightful

    Get a clue man.

    Where is your failover?

    How are you going to connect these disks together? NFS? Samba? That kind of speed (or lack of) is not an enterprise storage solution.

    How do you replace disks as they fail without taking stuff offline?

    --
    Conformity is the jailer of freedom and enemy of growth. -JFK
  8. Re:320G Maxtor Drives? by highcaffeine · · Score: 3, Interesting

    In raw disk storage, maybe. But you're forgetting actually putting those drives into a useable state with disaster recovery plans.

    In other words, someone dealing with 50TB and who wants backups of that data will be spending many, many times the amount it would cost to just purchase enough hard drives to get the bragging rights of 50TB. And a backup located in the same room/floor/rackspace/whatever as the source data will be pointless in the event of fire, floods, nuclear fallout, etc. So, they would also need a way to transfer all that data to offsite backups in a timely manner (waiting five weeks for a full backup to transfer over a 100Mb/s pipe would probably not be acceptable).
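
    To put a rough number on the offsite transfer time, here is a minimal sketch. The function name and the `efficiency` parameter are made up for illustration, and 1TB is assumed to be 10^12 bytes; real links never sustain full line rate, so treat the result as a lower bound.

```python
def transfer_days(size_tb, link_mbps, efficiency=1.0):
    """Days needed to push size_tb terabytes over a link_mbps link.

    efficiency is a hypothetical fudge factor (1.0 = perfect line rate).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 86_400


days_100mbit = transfer_days(50, 100)
days_gigabit = transfer_days(50, 1000)
print(f"100Mb/s: {days_100mbit:.1f} days, 1Gb/s: {days_gigabit:.1f} days")
```

    At perfect line rate a full 50TB copy over 100Mb/s takes about 46 days, so the five weeks mentioned above is, if anything, optimistic.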

    Aside from backups, how would the drives be accessible? Even as JBOD, you're talking 40 IDE/ATA controllers (assuming 320GB drives and 4 ports per controller), or 20 SCSI channels (assuming 160GB per drive and 15 non-host devices per channel) to support that many disks. You could also use Fibre Channel and get away with only a couple arbitrated loops. Physically, you're talking about hundreds of disks that need to be mounted somewhere, so you would also need dozens of chassis to hold the drives.
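
    The JBOD controller counts can be reproduced with a quick sketch. The function name is made up for illustration, and "4 ports per controller" is read here as 4 drives per controller; drive sizes and per-channel limits are the ones assumed in the paragraph above.

```python
import math


def controllers_needed(capacity_gb, drive_gb, drives_per_controller):
    """Return (drive count, controller count), both rounded up to whole devices."""
    drives = math.ceil(capacity_gb / drive_gb)
    return drives, math.ceil(drives / drives_per_controller)


ide = controllers_needed(50_000, 320, 4)     # 320GB IDE drives, 4 per controller
scsi = controllers_needed(50_000, 160, 15)   # 160GB SCSI drives, 15 per channel
print("IDE drives/controllers:", ide)
print("SCSI drives/channels:", scsi)
```

    Rounding up to whole devices gives 157 IDE drives on 40 controllers, or 313 SCSI drives on 21 channels, in line with the rough figures of 40 and 20 quoted above.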

    But, hundreds of disks in a JBOD configuration means you'll have hundreds of partitions, each separate from the others. Hell, if the clients are Windows machines, they won't even be able to access more than a couple dozen at a time. And even for operating systems with better partition/mount-point addressing, it would be unmanageable.

    So, now you get into needing a RAID solution that can tie hundreds of disks together. If you're talking about hooking these up to standard servers through PCI RAID cards, you'll need several of those machines to be able to host all the controllers necessary (especially if all the disks are not 160GB or larger each).

    The only realistic solution for this much storage, at least until we have 5TB hard drives, is a SAN-like setup. Specialized hardware designed to house hundreds of disks in stand-alone cabinets and provide advanced RAID and partitioning features. SANs don't come cheap.

    Add to the SAN the various service plans, installation, freight, configuration, management and the occasional drive swapping as individual disks fail and you've already multiplied that $50K several times, as a bare minimum (and you still haven't priced out the backup solution).

    There's a lot more to it than just having a pile of hard drives on the floor. I wouldn't even be surprised if the drives are the cheapest component.

  9. Pricing sounds a little high by speedy1161 · · Score: 5, Informative

    From experience (with EMC - Sun) your price tag sounds a bit on the high side, but not by very much. Consider that EMC storage (after all, mission critical data should be stored on EMC/Hitachi/StorageTek, NOT on consumer IDE) costs much more than consumer IDE/SCSI (25 - 75x), and that's only the disks.

    If you're going with EMC, you'll need to put those disks in something, like a frame (cabinet), and for your size, more like 5 cabinets. With that many cabinets, you'll need some sort of SAN switch and associated fibre cables (not cheap). That gets your disks into cabinets and all hooked together.

    You wanted to access the data? Then you'll need EMC fibre channel cards ($15k a pop for the Sun 64bit PCI high end jobs). But you'll more than likely be serving data from a cluster of machines, so count on buying three ($45k) per machine (so each card is on a different I/O board hitting the SAN switch, redundancy)

    Who's going to set this up? For that kind of coin, EMC (or whoever you go with) will more than likely set the thing up and burn it in for you on site. The price probably also includes some kind of maintenance contract with turnaround time fitting the criticality of the system.

    Yes, my 'big ass storage' experience may be limited, but I think that $20 million for 50TB installed/supported/tested by a big storage vendor is in the ballpark.

    Good luck.

    1. Re:Pricing sounds a little high by Wanker · · Score: 4, Informative

      My "big-ass storage" experience is not so limited, and speedy1161 has hit the nail right on the head.

      For enterprise-class storage (i.e. this is NOT just a pile of Maxtor IDE drives duct-taped together) paying 20M for 50TB is on the high side, but not by much. (I would have given a range of 10M-20M for the whole thing depending on the exact trade-offs made.)

      3 HBAs per host is overkill for most applications (but certainly not all). I've found that two is generally sufficient. Never rely on just one, even for a non-critical system. I'm often amazed at just how critical non-critical servers become when down for several hours in the middle of a busy day.

      Don't discount the significant setup and debugging costs at the beginning. This will cost not only in hardware/software/consulting but in time lost for your own admins to spend working with the vendor, going to classes, learning new methods of adding storage, accidentally messing up the systems, cleaning up those messes, etc.

      Get the best monitoring/management software you can. EMC is famous for gouging people on software costs so you'll need to use your best judgement. (HINT: PowerPath == Veritas DMP at up to 20x the cost. SRDF == Veritas Volume Replicator at up to 20x the price. TimeFinder == Mirroring at up to an infinite multiple of the price. You get the idea-- just use your best judgement and be cautious.) Under extreme single-host disk loads the otherwise minor performance hit for host volume management can become a problem, making that 20x price worth it. Maybe.

      If possible, press them for management software that makes adding/removing/changing filesystems a one-step operation, complete with error checking. It really sucks to put that new database on the same disks as another host's old database and software can be really good at checking for stupid human mistakes.

  10. You pay for support. by molo · · Score: 3, Interesting

    When you get a Symmetrix frame from EMC, you also get a support contract. EMC will send multiple people to your installation for maintenance. EMC will remotely monitor your Symm via modem. They will help you plan your storage needs (including what kind of backup and reliability you need). EMC will provide 24x7 support for everything you need. Then there's management software, etc.

    Don't forget that the hardware isn't cheap: Frame, multiple redundant hot swappable power supplies (requires specialty power connection), dozens of scsi drives, dozens of scsi controllers, 10-20 fibre channel connections, an interconnection network between FC and SCSI controllers that includes fiber and copper ethernet, hubs, etc., and a management x86 laptop integrated into the frame.

    $20 mil for this is a fair price in my opinion. Anyone who rolls their own is just insane. There are hundreds of engineers behind each of these boxes, and it shows.

    No, I don't work for EMC.

    --
    Using your sig line to advertise for friends is lame.
  11. I know how. by one9nine · · Score: 4, Funny


    Floppies. Lots and lots of floppies. They are so cheap right now! And they come in pretty colors too.

  12. CDs - Obvious choice by getagrip · · Score: 3, Funny

    Ok, let's see. 50 Terabytes divided by 600 megs per CD means you will need 83334 CDs (rounded up). At about 20 cents each (retail) that should only set you back about $17k. Add in $100 for some of those heavy duty shelving units from Home Depot and a wintel box to read and write them, and you are looking at well under 20k for total hardware cost. At this point, just go hire someone away from their McJob for a reasonable amount to swap the CDs and you are in business.

  13. Re:Try EMC on eBay by haplo21112 · · Score: 3, Insightful

    Yep, you're a "--Turkey" all right, got about the same size brain if you think that's a viable solution...

    The EMC boxes (or anyone else's, for that matter) have a significant amount of configuration associated with connecting the drives. You can't just open the box up and start sticking in drives and expect it to work. For that matter, in many cases if the drives are not the ones rated for use in the box you can destroy the backplane of the machine, the power supplies, the drives themselves, etc... Power and heat are huge issues in these boxes... think of the heat the average hard drive throws off, now put 100+ in a box the size of the average home refrigerator...

    Then there are configuration issues: you need the software and the technical know-how to write the configuration files these machines use to tell the multiple drives to act as one or many logical drives.
    Then how do you connect up the system(s) that will use the box? These are all delicate issues.

    If you buy a box off eBay you will absolutely need someone working for you who knows the product inside and out (or at least on a retainer contract with 24x7 support clauses)... and you should immediately make a phone call to the proper support phone number to get the thing on a support contract... Trained EMC professionals don't come cheap, but they are worth every penny. I would assume that with other companies it's the same story, but I only use EMC so I don't know...

    Buy EMC, it's really the only long term option. I have seen one of these boxes get knocked over on its side (no small task) while it was running, and just keep going without a hitch... that's a well engineered product...

    --
    Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
  14. Re:Sounds reasonable by aminorex · · Score: 3, Informative

    The trend is to use iSCSI on the network side and IDE on the hardware side. Since a network file server only has FS daemons doing I/O, and the drives are always hot, there is no SCSI advantage as there is in a multitasking workstation environment.

    --
    -I like my women like I like my tea: green-
  15. Re:Look at the quantities by Neck_of_the_Woods · · Score: 4, Informative

    That 100k was a joke, right? We have 4 2TB SANs where I am and I can tell you that any 2 of them would eclipse your guess. Let's not get into the shelf disks, the extra fabrics, the RAID eating some of your space. Oops, did I forget the support contract, the UPS the size of a cubicle, and a Liebert air conditioner to cool this room full of spinning drives? Wait a minute, you're going to need full redundant backups for all this shit, the GBIC switches to control access, the rack space, and all the fiber HBA cards for the servers (unless you go copper).

    Then you want to back this up? Break out your checkbook again for a Compaq minilibrary if you're lucky; that is only 10 tapes x 80gig a tape... 800gig... and that is if you're really doing well. So put that on top of it all: 10x10x80 gives you 8TB of backup at around 30k each for the minilibs. The price just keeps on jumpin!

    No way, no how, not today or tomorrow. 100k will get you a floor full of 120gig Maxtor drives and that is about it.
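
    The tape math can be spelled out in a short sketch. The constant names are made up for illustration; tape count, tape size, and library price are the figures quoted above. The last step, scaling to one full copy of 50TB, is an extrapolation that the post does not make itself.

```python
import math

TAPES_PER_LIBRARY = 10
TAPE_GB = 80
LIBRARY_PRICE = 30_000   # "around 30k each"

library_gb = TAPES_PER_LIBRARY * TAPE_GB          # capacity of one minilibrary
ten_libraries_tb = 10 * library_gb / 1000         # the 10x10x80 case
ten_libraries_cost = 10 * LIBRARY_PRICE

# Extrapolation (an assumption, not from the post): one full copy of 50TB
libraries_for_50tb = math.ceil(50_000 / library_gb)
cost_for_50tb = libraries_for_50tb * LIBRARY_PRICE

print(library_gb, ten_libraries_tb, ten_libraries_cost)
print(libraries_for_50tb, cost_for_50tb)
```

    Ten minilibraries hold 8TB for about $300,000; on the same numbers, a single full copy of 50TB would need 63 of them, close to $1.9 million in libraries alone.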

    --
    Neck_of_the_Woods
    #/usr/local/surf/glassy/overhead
  16. Re:Depends by wfrp01 · · Score: 3, Funny

    If you're using Depends, you should always opt for the high speed reliable storage over the low speed crap storage.

    --

    --Lawrence Lessig for Congress!
  17. I'm not saying that this is the standard... by 0x0d0a · · Score: 4, Insightful

    ...I realize that accepted pricing is well above the price I mentioned. And yes, obviously I left out the maintenance.

    The problem is that I find that corporate spending on IT purchases has gotten ridiculous. Let's buy a TEMPEST array! Let's buy something with a Sun nametag because the name sounds good! Let's buy a $2k piece of software for each workstation even though there's a free alternative!

    I'm not saying that anyone *provides* something in the price range I was talking about. No one is crazy enough to do so, if companies are willing to pay much, much more. I'm saying that, if you're asking whether it's possible to *build* something like this for the price range I mentioned, off the cuff it doesn't sound so unreasonable.

    Yes, a seasoned IT person who works with high-end systems like this will laugh. Why? Because they're used to paying huge amounts of money. Because it's an accepted part of the culture to throw down this much cash. What I want to know is -- how often do people question these basics? How often has someone said "Wait a minute...this is wrong."

    Are you telling me that if you were in a third world country without the exorbitant amount of funding that we USians enjoy, and someone asked you to put together a 50TB storage system for under $1M, you'd simply say "It can't be done"? No consideration, nothing?

    I mean, when I look at the fact that the *case* on, say, a Sun high end system costs more than a whole cluster of workstations, I start to wonder just how much excess is going on here.

    Say we take the bare-metal, dirt cheap approach. Grab a bunch of Linux boxes. Throw RAID on them configured so that 1/3 of your data is overhead for reliability, and a 100Mbps Ethernet card in each. The figure used earlier was $1 per gig. Put 6 200 GB drives in each. Throw down $250 for the non-drive cost of each system. You have 800GB of data on each system, 400GB of overhead. That's 63 systems. $16K for the systems, $75K for the drives, and we come in to $91K. I left out switches -- you'd need a couple, but certainly not $9K worth.
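
    The numbers above can be reproduced with a quick sketch. The constant names are made up for illustration; drive size, drives per box, price per gig, per-box cost, and the one-third overhead are the figures from the paragraph above. Switches, software, and labor are left out, as in the post.

```python
import math

TARGET_GB = 50_000
DRIVES_PER_BOX = 6
DRIVE_GB = 200
PRICE_PER_GB = 1          # dollars per gig, the figure used earlier in the thread
BOX_BASE_PRICE = 250      # non-drive cost of each Linux box

raw_per_box_gb = DRIVES_PER_BOX * DRIVE_GB        # raw capacity per box
usable_per_box_gb = raw_per_box_gb * 2 // 3       # 1/3 of the raw space is RAID overhead
boxes = math.ceil(TARGET_GB / usable_per_box_gb)  # whole boxes needed for 50TB usable

systems_cost = boxes * BOX_BASE_PRICE
drives_cost = boxes * raw_per_box_gb * PRICE_PER_GB
total = systems_cost + drives_cost

print(boxes, systems_cost, drives_cost, total)
```

    This gives 63 boxes, $15,750 for the systems and $75,600 for the drives, or $91,350 in total, which matches the rounded $16K, $75K, and $91K figures.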

    You'd need some software work done -- an efficient, hierarchical distributed filesystem. I didn't factor this in, which you could consider not fair, but there may be something like this already, and if not, it's a one time cost for the whole world.

    Maybe another few systems up near the head of the array to do caching and speed things up, and you still aren't even up to $150K, and you have failover (at least for each one-drive-in-three group).

    I haven't looked at this -- it might be smarter, since you'd want to do this hierarchically, to have caches existing within the hierarchy, or maybe Gbit Ethernet at the top level of the hierarchy. And obviously, this may not meet your needs. But as for whether it's possible to build something like this for that much money? Sure, I'd say so.

    Finally, existing SANs or any sort of network-attached storage are overpriced, no two ways about it. Very, very healthy profit margins there. Sooner or later, someone is going to start underselling the big IT "corporate solution providers" and is going to kill them unless they trim margins by quite a bit.

  18. Re:You're joking, right? Moderators: you too, righ by treat · · Score: 3, Informative
    You obviously haven't heard about things like ChipKill and ELIZA fault-tolerance initiatives.

    You know that I am talking about commonly available multi-CPU systems, and not exotic (and insanely expensive) systems with redundant CPUs and memory.

    What are you smoking, and where can I get some?

    Do you seriously believe that an E6500 or similar system will not crash if there is a faulty CPU? Despite your impressively low slashdot UID, if you believe this, you have virtually no experience with such systems.