Slashdot Mirror


Bulk Data Storage For The Common Man?

Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and most importantly does not cost the price of an eye. Right now, I have a 4 node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage roughly costs about $0.60/GB (excluding the cost of PC hardware). But long term storage is painful -- DVDs cost about $0.10-$0.15/GB but takes too much human time and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."

36 of 483 comments

  1. Finally a use for my 1GB Gmail invites... by anakin357 · · Score: 4, Funny

    I'll send you a couple.

    --
    http://www.fsckin.com/
    1. Re:Finally a use for my 1GB Gmail invites... by EvilTwinSkippy · · Score: 4, Funny

      Nah, just tarball your backup into 1 or 2 GB file sizes, name it "PR0N XXX TEEN SEX DONKEY LOVE - MILITANT ISLAMIC BUKAKKE KITTEN.MPG.AVI.WMV" and share it on Gnutella.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
  2. Hard disks by ConsumedByTV · · Score: 5, Informative

    You're always going to get a better rate with Hard drives but you're going to be prone to failure.

    If you buy them in bulk you can save.

    Burning DVDs is going to take you forever and drive you nuts.

    Find a hotswappable set of drives and use that for your offline backups. Use a raid for your current backups.

    --


    "Not my manner of thinking but the manner of thinking of others has been the source of my unhappiness." - M
    1. Re:Hard disks by littlerubberfeet · · Score: 4, Informative

      hard disks are good.

      If you want one of those nifty things with robotic arms and whatnot, plan on spending upwards of $3500. The AIT Automated Tape Library goes for that much and holds only 15 tapes. Plan on spending tens of thousands for something like Ampex's DIS 914 for 30 Terabytes.

      Your friend is right: tapes are cheap. The equipment needed to support them is expensive, slow and error prone. It gets cost effective once you have enough money for a new Porsche though...

      --
      Sig (appended to the end of comments you post, 120 chars)
    2. Re:Hard disks by DetrimentalFiend · · Score: 5, Interesting

      We're dealing with storage issues right now at work, and what we're doing is buying a server with 8x250 GB SATA drives. We then run the drives in raid 5, so we have 1.75TB of storage space (unformatted). Including computer costs, it's running us about $2.50 per GB, but it's a very beefy 3u server. For backup, we're currently backing up to tape. That costs us under $0.50 per GB with ultrium tapes. For some of our data, we've been backing up to DVD's, but we've pretty much given up on that. In the long run, it's not worth it.
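
The RAID 5 figure above follows from the usual rule that one drive's worth of space goes to parity. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and a hypothetical helper name:

```python
def raid5_usable_gb(num_drives: int, drive_gb: float) -> float:
    """Usable capacity of a RAID 5 array: one drive's worth of space holds parity."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb


# The 8 x 250 GB array described in the comment.
usable = raid5_usable_gb(8, 250)
print(f"{usable / 1000:.2f} TB usable")  # 1.75 TB usable
```

The same function can be used to compare other drive counts and sizes before buying.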

  3. good luck by Madcapjack · · Score: 4, Funny

    PRINTSCREEN should do the trick.

  4. Wirewire drives? by NanoGator · · Score: 5, Interesting

    For long term storage, how do you feel about firewire drives? Maybe not as cheap as you'd like, but you can get them in >160 gig flavors, plus you can hook them up to just about anything. Once you do the backup, which'd be a simple copy and paste, you can just unplug the drive and store it in a safe or something.

    Again, I'm not sure if that's as cheap as you'd want, but that's a solution I came up with for a similar problem. My company's going to be 3D rendering some stuff that could end up eating 50 megabytes a frame. (Extra data is stored for future refinement... I can go into detail if I've piqued anybody's curiosity.) We can't afford to lose this data, so the Firewire drive approach is what we're considering right now.

    --
    "Derp de derp."
    1. Re:Wirewire drives? by littlerubberfeet · · Score: 5, Informative

      Lemme address the firewire thing: I work in a sound studio, and we generate about 5-8 gigs of data a month, mostly music for TV. This isn't a huge amount, but we rely on multiple sets of Firewire drives for backup and then internal hard drives for current projects. This means we have all 400 or so projects at our fingertips. Given how fast we do things, this is important.

      Lacie makes their 1 terabyte firewire (943 gigabyte formatted) drive. We get them for $1,080 a drive (Macmall matched Provantage's price). This is more than the article author spends now per gig, but these drives have done quite well in the studio. You can find cheaper firewire though.

      We are at the point where hard drives give the best bang for the buck. The only fault of firewire is that my bosses have burned several bridges. Ground yourself before unplugging the drives. The bridges were cheap though. In any case, hard drives are probably the most failsafe and cost effective solution, with firewire being the easiest interface to use those drives with.

      --
      Sig (appended to the end of comments you post, 120 chars)
    2. Re:Wirewire drives? by SlamMan · · Score: 4, Informative

      USB makes the computer actually do work, while firewire ports handle it themselves. For a normal user, not much of an issue, but over a couple drives, you'd notice.

      --
      Mod point free since 2001
  5. Personally I prefer something in a blonde by kfg · · Score: 4, Insightful

    I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."

    Although the good ones don't come cheap. I guess this is another case of "pick any two."

    KFG

  6. 1TB a month?!? by stinkydog · · Score: 5, Funny

    Short of launching his own space probe, the only way for this guy to consume a TB a month of storage is a serious porn habit. Just post your 'content' on Edonkey and it will be available when you 'need' it. You likely only watch them once anyway.

    SD

    --
    “Who knew something as harmless as willful ignorance could end up having real consequences?”
    1. Re:1TB a month?!? by grasshoppa · · Score: 5, Funny

      Using this method, I have achieved my life long dream of tapeless ( well, everything-less ) backups.

      I simply make a tar.bz2 file with all my important files, filter it through gpg, then post it on edonkey, usually titled, "Olsen twins getting it on", and then usually the date.

      Voilà, instant backup that is available to me wherever I may go.

      --
      Mod me down with all of your hatred and your journey towards the dark side will be complete!
    2. Re:1TB a month?!? by gl4ss · · Score: 5, Funny

      ** You have obviously never heard of fMRI studies, have you?**

      oh shit! I totally missed the part of the history where FMRI scanners became commonplace for men.

      oh wait the whole ask slashdot blurb is twisted, the headline implies asking for datastorage possibilities for the common man - yet one of the first things mentioned that he needs it for his special job that generates tb's of data per month. by that definition he is not a common man, except that he hopes to have a miracle solution - that is quite common.

      still, a common man would choose whatever possibility gave the cheapest price per gb(probably harddrive). with dvd-r's he would end up burning multiple dvd-r's per day and it's kind of implied that the data would need to be retrievable so he would have to burn the same disc multiple times, even then it wouldn't be a sure thing.

      his needs are quite big though still, big enough to warrant professional help since he's likely going to be spending quite a bit of money on the thing.

      --
      world was created 5 seconds before this post as it is.
    3. Re:1TB a month?!? by Zone-MR · · Score: 5, Funny

      Oh, so that explains why that "Olsen Twins Getting it on - 12 Mar 2003.avi" file I downloaded last week contained a zipped tar archive full of boring spreadsheets and a lot of donkey porn.

  7. Cheap solution by codeguru73 · · Score: 4, Insightful

    Buy some inexpensive IDE drives with high storage capacity and use a software raid solution. What kind of budget do you have anyway?

  8. age old problem... by Lumpy · · Score: 5, Insightful

    Ahh the large amount of data that has X value versus a storage solution...

    If your data is worth $20,000.00 then a $2000.00 solution is dirt cheap.

    what is your data worth? that is where you need to start and then look at the 10-30% of the data's value to start looking at how much to spend on its storage.

    If 1 month's data was lost forever, how much money would it cost the company? that is your actual $ amount that you should be shopping at.

    and that is how I got the company to buy a $20,000.00 1000 tape DLT jukebox.

    my data is worth over $100,000 a month and is much lower than yours in size.

    That is where you need to start. Justify your storage costs by figuring out what it is worth to begin with.
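
The rule of thumb in this comment (spend roughly 10-30% of what the data is worth) is easy to turn into a quick calculation. A small sketch, where the function name and the idea of passing the percentages as parameters are illustrative assumptions:

```python
def storage_budget_range(data_value, low_pct=10, high_pct=30):
    """Return the (low, high) storage budget as a percentage band of the data's value."""
    return data_value * low_pct / 100, data_value * high_pct / 100


# The $20,000 example from the comment.
low, high = storage_budget_range(20_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $2,000 to $6,000
```

Feeding in the cost of losing one month's data instead of a flat valuation gives the figure the comment suggests shopping at.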

    --
    Do not look at laser with remaining good eye.
    1. Re:age old problem... by D-Cypell · · Score: 4, Funny

      my data is worth over $100,000 a month

      This 'data' doesn't happen to be a large collection of email addresses does it?

  9. Buy an older tape drive by Apparition-X · · Score: 5, Informative

    Look for an LTO gen 1 or SDLT220/320 on ebay, with a SCSI connection (some of them are fibre, and I assume you don't want to go there!). Don't forget to pick up some tapes. In general, this sounds like it would work if you plan on doing this for a while, and can leverage the initial investment over months or years.

    Capacities are (for the cost of a sub $50 tape):
    - LTO1: 100 GB uncompressed
    - LTO2: 200 GB uncompressed
    - SDLT220: 110 GB uncompressed
    - SDLT320: 160 GB uncompressed

    If your data is particularly amenable to compression (i.e. database data) you could easily get 3 or 4 to 1 compression with these drives without sinking your CPU utilization.
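
To see what those native capacities mean for the asker's 1 TB per month, here is a rough sketch of the tape count per format. It assumes 1 TB = 1000 GB and treats the compression ratio as a guess you supply; the helper name is made up for illustration.

```python
import math

# Native (uncompressed) capacities in GB, as listed in the comment.
NATIVE_GB = {"LTO1": 100, "LTO2": 200, "SDLT220": 110, "SDLT320": 160}


def tapes_needed(data_gb, native_gb, compression=1.0):
    """Number of whole tapes needed for data_gb at a given compression ratio."""
    return math.ceil(data_gb / (native_gb * compression))


for name, native in NATIVE_GB.items():
    plain = tapes_needed(1000, native)
    squeezed = tapes_needed(1000, native, compression=3.0)
    print(f"{name}: {plain} tapes uncompressed, {squeezed} at 3:1")
```

Real compression ratios vary a lot with the data, so the uncompressed column is the safer planning number.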

  10. Re:!RAID by ecalkin · · Score: 4, Insightful

    because it protects against device failure, not *user* error. if you delete a file from a raid array, it's gone. that's part of what offline is all about.

    eric

  11. spongedrive is best by cubyrop · · Score: 5, Funny

    i am responsible for providing storage solutions for a mid-sized content creation company which, through version archiving, accumulates near 1-200 GB per day. they require access to their media backups on a rolling basis, so tapes are not an option.

    i have found that a Teutonium cluster of 6.5 TB Spongedrives (either Cray or SecreTech are fine) fits the bill nicely. housed in a 15-unit rack server, the amoeba-shaped drives utilize BioLas technology to store data on 6-dimensional Moebius Cilia for a slick seek time of 0.00 ms.

    a cluster costs about $45,000 USD but the price should come down in 2004 Q4 when SecreTech launches their new 40-platter blackholium SCSI's.

    --
    If I could make this sig kill you, I would.
  12. Drawbacks, what are you willing to put up with? by Anonymous Coward · · Score: 5, Informative

    All forms of media/backups have their own drawbacks... but some aren't as bad as others, and the others often are more accessible.

    Tape: Tapes break, they wear, they have dropouts, take a while to back everything up, can't always access files if you just want to restore something (Different methods vary, folks)... but ultimately, it's cheap when you use DAT because they're a common media. Swap the tapes twice as often (and throw old ones out) if you're paranoid about tape related failures.

    Hard Drive: Most common form of backup I see now, mainly for the 1:1 size factor. Yeah, drives fail, too. Sometimes you have a pretty good warning when this is going to happen, sometimes you don't. (My 13GB Maxtor and 40GB IBM Deathstar drives both went *pfft* on reboot.) Get enough of them at once, you could swap out the logic boards if one does fry out. Ultimately, RAID or just simple 1:1 mirroring is probably the most efficient and easy method. Accessing bits and pieces is also easiest under this method. I personally just use an external USB2 case with a 120GB drive in it. Everything I want to back up goes on that drive, and then eventually... DVDRs. I turn off the drive when I don't need it... hopefully prolonging the life of it when I need it most.

    DVDR: Not anymore. If we had these new-fangled DVDR discs (+ or -) say... when 2 to 6GB drives were common.... sure... But in addition to hard drives, recovering selective files is easy under this method too... Unless you use a backup program that crunches everything together on the disc in some spanning format. Burn times can be tedious... but it's not bad if you consider the overall amount of data you're putting on the disc. Cheaper than quality-brand name CDRs, though, in terms of price per mega/gigabyte. Only an idiot would trust $0.01-per-disc spindles for long-term backups. Even the longevity of DVDR has yet to be seen...

    CDR: I'm not going to bother.

    Network: Well, still relies on hard drives and other components... but good if you don't want to saddle one room with a ton of boxes. Simply for space and efficiency... external drive is probably better anyway.

    Old fashioned method: Print everything out and keep it in a filing cabinet somewhere. You could always OCRA the stuff later. ;-)

  13. LTO Ultrium 2 Tape Drive by jeffgeno · · Score: 4, Informative

    The drive will run about $4000, but the tapes are only around $0.20/GB assuming a 1.5:1 compression ratio. Keeping that assumption, each 200 GB native tape holds about 300 GB, so 1 TB of data should take only 3 to 4 tapes per month, and swapping wouldn't be so bad with the single tape drive. An autoloading library would be significantly more expensive, but if you really need automation, that's the way to go.

  14. Easy by Pedrito · · Score: 4, Funny

    I use bioneural gel packs at a cost of $0.04 per teraquad. What is this hard drive of which you speak?

  15. Hijack Cassini by Anonymous Coward · · Score: 5, Funny

    ... and program it as a repeater.

    It's about 90 minutes away, so at 250 Kbps that's over one gigabit in storage on the way out there, and another gigabit on the way back.

    Worst-case access latency is about three hours, though. Maybe the hard disks are a better idea.

    If you send your probe^H^H^H^H^H repeater to Alpha Centauri, you'll get more than 20,000 times the storage capacity.
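
For anyone who wants to check the delay-line math: the data "stored" in transit is just link rate times one-way delay. A quick sketch using the comment's 90 minutes and 250 Kbps; the 4.37 light-year distance to Alpha Centauri is a figure added here, not from the comment.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, used for light-years


def bits_in_flight(one_way_delay_s, rate_bps):
    """Bits 'stored' in transit on a link: rate multiplied by one-way delay."""
    return one_way_delay_s * rate_bps


cassini_delay_s = 90 * 60                      # 90 light-minutes, per the comment
alpha_cen_delay_s = 4.37 * SECONDS_PER_YEAR    # assumed distance: 4.37 light-years

cassini_bits = bits_in_flight(cassini_delay_s, 250_000)
ratio = alpha_cen_delay_s / cassini_delay_s

print(f"Cassini leg: {cassini_bits / 1e9:.2f} Gbit each way")
print(f"Alpha Centauri holds about {ratio:,.0f} times as much")
```

The capacity scales linearly with distance, and so does the access latency.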

  16. Do what Google does by glinden · · Score: 5, Insightful

    Build yourself a cluster of cheap boxes with cheap IDE disks and replicate your data across them. Because the data is replicated across your cluster, no need for backups or RAID.
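
One way to picture "replicate your data across them" is to pick several distinct boxes for each file in a repeatable way. The sketch below uses a simple hash ranking (rendezvous-style hashing) purely as an illustration; it is not a description of how Google does it, and the node and file names are made up.

```python
import hashlib


def pick_replicas(key, nodes, copies=3):
    """Pick `copies` distinct nodes for `key` by ranking nodes on a per-key hash."""
    if copies > len(nodes):
        raise ValueError("not enough nodes for the requested number of copies")

    def score(node):
        return hashlib.sha256(f"{node}:{key}".encode()).hexdigest()

    return sorted(nodes, key=score)[:copies]


nodes = ["node1", "node2", "node3", "node4"]     # e.g. a 4 node cluster
before = pick_replicas("run-0423.dat", nodes)

# Simulate losing the first replica holder and choosing again.
survivors = [n for n in nodes if n != before[0]]
after = pick_replicas("run-0423.dat", survivors)
print(before, "->", after)
```

Because the ranking of the surviving nodes does not change, the two remaining copies stay where they are and only one new copy has to be made.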

  17. options options, what is your time and data worth? by segfaultcoredump · · Score: 5, Insightful
    Lets see.... hard Drives are running about $0.50 per GB, DVD's are running about $0.06 per GB (100 pack, "house brand", not something I'd put my data on but this is slashdot, and there are idiots out there who think that it is a good idea), and tapes are also running about $0.20 -> 0.50 per GB (for the DLT/AIT/LTO type, the ones that have enough capacity to not drive you nuts)

    So, you can put your data on 4-5 HD's, 10 tapes or 232 DVD's per month. The Cost of doing so will be about $500 per month for the tapes or HD's and $50 for the DVD's (assuming your time cost $0)
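
The monthly figures can be reproduced from the per-GB prices quoted above. A rough sketch, assuming 1 TB = 1000 GB and ignoring labor, drives, and enclosures just as the comment does:

```python
# Per-GB media prices quoted in the comment (US dollars).
PRICE_PER_GB = {
    "hard disk": 0.50,
    "tape (low)": 0.20,
    "tape (high)": 0.50,
    "DVD (house brand)": 0.06,
}


def monthly_media_cost(data_gb, price_per_gb):
    """Media cost for one month's data; does not count anyone's time."""
    return data_gb * price_per_gb


costs = {name: monthly_media_cost(1000, price) for name, price in PRICE_PER_GB.items()}
for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f} per month")
```

Adding an hourly rate for whoever swaps the discs changes the DVD line quickly.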

    At work, we had a need to keep a few TB of data online permanently, so we purchased a few NexSAN ATABeast's. At $50,000 for 10TB of usable storage ($5/GB), they may be a bit out of your price range. The advantage is that you can hold almost a years worth of data and it is protected by RAID5. It also makes management a lot easier, since it is very difficult to mount 42 300G drives in a single chassis (and it takes only 4U of rack space).

    On the low end, NexSAN has the ATABoy2 or ATABaby (2TB or 1TB) for the $8-$15K range. This will let you hold a months worth of data

    On the high end, you have EMC disk arrays (think upwards of $20+/GB for the 'cheap' stuff from them).

    Overall, if you have 1TB per month, you need to either a) get a grant to fund your work, b) hire somebody to swap DVD's for you or c) seriously rethink your data generation.

    Any of the "cheap" storage methods have serious drawbacks, and the low cost ones are, well, not so low cost if $15,000 sounds like a lot of money to you.

    otherwise, good luck

  18. If its volume you want by TheUncleBob · · Score: 5, Informative

    If you are more interested in volume than speed, then the emphasis should be on the 'ID' part of RAID. Inexpensive Disks. If you used 160GB Drives, which appear to have the best bang for your buck at the moment, and put 6 (yes 6!) in a pc. Just use any old cheap pc (I use 200-400Mhz PII)

    Run the disks RAID 5 and you will get about 800GB of storage for $600 . Now get two cheap ata100 cards so you have a total of 6 channels, and mount each drive as a master on each channel. Build a 2gb root partition on the first disk (mirror it if you want) and then set the rest of the space up as a huge raid 5 array.

    Et Voila cheap, big server. To archive data, turn off pc, and throw into attic :-)

  19. Re:Give Up Now by Zone-MR · · Score: 5, Informative

    No figures, but I think the opposite. I've had several DVD-R disks which I've written backups to only to discover that they are unreadable a year later. My personal experience has been that HD's are unreliable, but less unreliable than writable DVDs.

    Of course higher quality media might be better, but then you can no longer quote the $0.10/GB figure.

  20. 4*400Go Sata on Raid 5 by da5idnetlimit.com · · Score: 5, Informative

    depending on the value of your data, you should try having a nice 4*400Go SATA in raid 5 *2, possibly using a distributed file system for redundancy...

    Not the cheapest, but fast, simple and saves you the unholy pleasure of having 2-3 DLT boxes to archive/cycle each month...

    You already have a linux cluster, so you could implement a distributed file system, or even simply run a nightly incremental mirror to the target server if you can afford to lose one day of work/computation...
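
A nightly incremental mirror can be as simple as copying only the files that are new or changed since the last run. Below is a minimal sketch of that idea using only the Python standard library; the function name and the size/mtime comparison are assumptions for illustration, and in practice a tool like rsync over the network would do this job.

```python
import os
import shutil
import tempfile


def incremental_mirror(src, dst):
    """Copy files from src to dst if missing or changed (size or mtime); return copied paths."""
    copied = []
    for root, _dirs, files in os.walk(src):
        rel_root = os.path.relpath(root, src)
        target_root = os.path.normpath(os.path.join(dst, rel_root))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            s, d = os.path.join(root, name), os.path.join(target_root, name)
            s_stat = os.stat(s)
            if os.path.exists(d):
                d_stat = os.stat(d)
                unchanged = (d_stat.st_size == s_stat.st_size
                             and int(d_stat.st_mtime) >= int(s_stat.st_mtime))
                if unchanged:
                    continue
            shutil.copy2(s, d)  # copy2 keeps the modification time
            copied.append(os.path.normpath(os.path.join(rel_root, name)))
    return copied


# Demo on throwaway directories: the second pass finds nothing to copy.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.makedirs(os.path.join(src, "run1"))
    with open(os.path.join(src, "run1", "data.bin"), "wb") as fh:
        fh.write(b"\x00" * 1024)
    first_run = incremental_mirror(src, dst)
    second_run = incremental_mirror(src, dst)

print(first_run, second_run)
```

This sketch never deletes anything from the mirror, which also gives a little protection against accidental deletions on the source.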

    It would help if you told us what sort of data you work with... from databases to automated telescope tracking systems, both need large amounts of storage, but you won't need the same system array for each...

    I seem to remember a /. story on a rackable Petabyte storage system

    You don't need to go to the Petabyte capacity but you will find some interesting comments on filesystems, disk virtualisation, 1U rack providers and so on....so a 1 Terabyte rack server is definitely possible...

    Good luck...

    --
    It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
  21. Only on slashdot by gexen · · Score: 5, Funny

    Only on Slashdot would they start talking about huge storage arrays and title it "for the common man"

  22. Re:Give Up Now by ePhil_One · · Score: 4, Insightful

    Now call me crazy, but have folks completely forgotten the age old solution, TAPE? A SDLT tape goes for about $50 and holds about 320GB, LTO holds even more, and I believe Quantum has just released the latest generation of SDLT. While it's not "cheap", an autoloader can be had for about $15,000 that can back up many TB hands off. Might be a bit much initially, but it's the best solution long term.

    --
    You are in a maze of twisted little posts, all alike.
  23. Bad idea by Anonymous Coward · · Score: 5, Informative

    Google stores data for fast access, not for reliable storage. They don't care if they lose a few hundred gigs when a handful of disks die, they'll just re-spider it in a few days when the Googlebot hits the sites which were lost. Their solution is NOT optimized for reliable storage and it's not suited in the slightest to this guy's problem.

    1. Re:Bad idea by Anonymous Coward · · Score: 4, Informative

      this is incorrect. GFS (Google File System) has many systems with the same data on each node. These nodes have 3 copies of each data slice. If one server fails then the other two mirrors re-copy the data.. If two fail then the server mirrors the data to ensure it is never lost.

      google does not want ANY data to be lost. They have many mirrors of all data.

  24. Re:Use those HDDs! by Cecil · · Score: 4, Funny

    I'm just curious, do you have any idea how much data 1 Terabyte is? Are you suggesting that he PRINT it?

    Let's say for the sake of argument that all 256 bytes can be printed as a visibly distinguishable character, or that he's got 1TB of plaintext. Also assume you can fit 10,000 characters on a 8 1/2 by 11 page.

    You can fit 10^4 bytes per page, and you need to print 10^12 bytes (I know, it's actually 2^40, but that needlessly complicates the math, so shush)

    That means you will need 10^12 bytes / 10^4 bytes/page = 10^8 pages.

    One hundred million pages. Assuming he has a good laser printer with infinite toner, let's say he can print 60 ppm or one page per second. It would take one hundred million seconds to print the data, which is 1157 days, or a little over 3 years.

    Given that he generates 1TB per month, I think this backup plan would probably become the top agenda item of most of the anti-deforestation groups out there.
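
The back-of-the-envelope numbers above check out. A short sketch that reruns them with the same assumptions (10^12 bytes, 10,000 characters per page, 60 pages per minute); the function name is made up for illustration:

```python
def print_job(data_bytes, bytes_per_page, pages_per_minute):
    """Return (pages, days) needed to print data_bytes at the given density and speed."""
    pages = data_bytes // bytes_per_page
    seconds = pages * 60 / pages_per_minute
    return pages, seconds / 86_400


pages, days = print_job(10**12, 10**4, 60)
print(f"{pages:,} pages, {days:,.0f} days, {days / 365.25:.1f} years")
```

At 1 TB generated per month, the printer falls further behind every month it runs.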

  25. Re:Give Up Now by tchuladdiass · · Score: 4, Informative

    Come on, this is Slashdot. A tape changer doesn't have to cost that much money if it's made of Lego (shamelessly pulled from an earlier slashdot story which I can't find at the moment).

  26. What are your near- and long-term requirements by TBone · · Score: 4, Informative

    I looked through some of the answers here, and as near as I can tell, you've got a bunch of home hobbyists telling you how to back up your home computers. Perhaps all your needs entail is a computer with an external IDE drive array and 4-10 200G SATA drives in it. But from your initial post, it's not clear what you need your offline storage _for_.

    First of all, you mention that you generate and use 1 TB of data a month. What happens at the end of that month? Does all of the data become useless? Is some of it carried through? Is it useful for historical processing for some time after it's not "live" any more? The disposition of that offline data is important; you can't determine how you can most effectively back up your data until you know what you need to do with that data once it's backed up.

    Since no one cares about backing up old data that they never use any more, I'm going to assume you need this data in some form in the future. I'm also assuming that your data ages out completely every month.

    Realistically, you have two options: Large redundant disk arrays, or tape. Various factors give credence to one or the other.

    First of all, get off of the SATA hacks, and realize you're going to need to go to SCSI, whether you end up with disk or tape. You're backing up data, you're going to want it to be reliably written out, and SCSI is the de facto standard for backup architecture. Yes, you pay more for it, but there's a reason for it: the SCSI equipment I manage at work fails a fraction as often as the various IDE/ATA systems do. SATA is marketed as a consumer technology, and it will never meet the rigors of being a reliable backup methodology.

    • Media Cost: Tape wins over disk here. LTO tape is running, at a quick check, for about $75 retail for 200/100G tapes. Even assuming only reasonable compression, you're looking at 150G for $75 bucks. And that is single-cart pricing; tape pricing quickly drops if you're ordering in bulk (typically in packs of 10, then at the 3-packs level, then more, check with your preferred media vendor)
    • Hardware Cost: Disk wins, but it's a double-edged sword - every disk you own has electrical and mechanical failure chances. The more disks you have, the more likely you are to lose one of them. The more you're storing on disk, the more you open yourself to a catastrophic failure of those disks themselves. High-end fast tape drives and libraries are expensive, but they just _work_. You plug them in, load your preferred tape management software (hell, run mtx for that matter), and start backing stuff up. No formatting, settings up arrays, hot-swap schedules, anything like that. But you pay through the nose for it - expect to spend into the $10K range for a large-scale tape storage solution that you could match (in short-term storage duration) for a couple of thousand dollars for a disk-based solution.
    • Hosting Space: Try to store 10TB of disk, and you'll need an air conditioner in that room just to cool down the disk cabinet and controllers. 10TB of tape just sits there though; you can store 4TB of tape online in a small 3U (about 6 inches) tape library - that's 24 tapes, and such libraries typically also support two drives. Go to 5-6U, and you can get 4 drives and over 50 tapes. If those were 200GB LTO tapes, you'd be looking at up to 10TB of storage available online, or easily offline and off-siteable. In addition, tape is easily expandable. Need more storage space? Buy another tape. No new hardware needed, no power concerns, just drop it in the drive or library and go.
    • Speed: Disk definitely has an edge. Set up a decent SCSI RAID5 array (real hardware raid across multiple disks on separate physical controllers, not this playtime software 0+1 homebrew IDE raid crap) and watch your write speeds triple. If you need to back up that 1 TB overnight, you don't have much of a choice but to go to disk in some form. But again, you pay a price for it. The speed you save in the
    --

    This space for rent. Call 1-800-STEAK4U