Bulk Data Storage For The Common Man?
Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and, most importantly, does not cost the price of an eye. Right now, I have a 4 node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage costs roughly $0.60/GB (excluding the cost of PC hardware). But long term storage is painful -- DVDs cost about $0.10-$0.15/GB but take too much human time, and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."
Although the good ones don't come cheap. I guess this is another case of "pick any two."
KFG
Buy some inexpensive IDE drives with high storage capacity and use a software RAID solution. What kind of budget do you have anyway?
Ahh the large amount of data that has X value versus a storage solution...
If your data is worth $20,000.00 then a $2000.00 solution is dirt cheap.
What is your data worth? That is where you need to start; then look at 10-30% of the data's value to decide how much to spend on its storage.
If one month's data were lost forever, how much money would it cost the company? That is the actual $ amount you should be shopping at.
And that is how I got the company to buy a $20,000.00 1000 tape DLT jukebox.
My data is worth over $100,000 a month and is much smaller than yours in size.
That is where you need to start. Justify your storage costs by figuring out what the data is worth to begin with.
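The 10-30% rule of thumb above can be sketched in a few lines of Python. The function name and the percentage defaults are illustrative assumptions taken from this comment, not an established formula.

```python
def storage_budget_range(data_value, low_pct=10, high_pct=30):
    """Return (low, high) storage spend as a share of the data's value.

    The 10% and 30% defaults come from the rule of thumb in the comment
    above; they are a starting point for discussion, not a standard.
    """
    if data_value < 0:
        raise ValueError("data_value must be non-negative")
    return data_value * low_pct / 100, data_value * high_pct / 100


# Example from the comment: data worth $20,000.
low, high = storage_budget_range(20_000)
print(f"Reasonable storage spend: ${low:,.0f} to ${high:,.0f}")
```

By this estimate the $2000 solution mentioned above sits at the bottom of the range for $20,000 worth of data.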
Do not look at laser with remaining good eye.
Because it protects against device failure, not *user* error. If you delete a file from a RAID array, it's gone. That's part of what offline is all about.
eric
Build yourself a cluster of cheap boxes with cheap IDE disks and replicate your data across them. Because the data is replicated across your cluster, no need for backups or RAID.
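A minimal sketch of the replicate-across-boxes idea, assuming each node's disk is reachable as a local or mounted directory (the function names and paths are hypothetical). It copies a file to every replica directory and verifies each copy by checksum. Note that it only guards against a disk or node dying; a file deleted or overwritten by mistake is not recoverable this way.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, replica_dirs: list[Path]) -> list[Path]:
    """Copy source into every replica directory and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies data plus timestamps
        if sha256_of(target) != expected:
            raise OSError(f"checksum mismatch on {target}")
        copies.append(target)
    return copies
```

Usage would look like `replicate(Path("run42.dat"), [Path("/mnt/node1"), Path("/mnt/node2")])`, where the mount points stand in for whatever the cluster actually exports.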
How many months at 1TB/month do you require access to online? After you are done with the data, can you discard it or do you need it archived? What is the cost of losing your data set at any given time? In what manner do you expect to access it (read/write mixture and sizes, plus aggregate throughput and number of client connections)? The answers to these questions could cause the cost of a solution to vary by a couple of orders of magnitude.
So, you can put your data on 4-5 HDs, 10 tapes or 232 DVDs per month. The cost of doing so will be about $500 per month for the tapes or HDs and $50 for the DVDs (assuming your time costs $0).
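The media counts above are simple division. Here is a rough sketch, where the per-unit capacities are assumptions chosen for illustration (250 GB disks, 100 GB tapes, 4.7 GB DVDs) and 1 TB is taken as 1000 GB. The exact DVD count shifts depending on whether you count in decimal or binary gigabytes.

```python
import math

# Assumed per-unit capacities in GB; adjust to the media you actually buy.
ASSUMED_CAPACITY_GB = {"hard disk": 250, "tape": 100, "DVD": 4.7}


def media_needed(monthly_gb, capacity_gb):
    """Number of whole units needed to hold one month of data."""
    return math.ceil(monthly_gb / capacity_gb)


monthly_gb = 1000  # 1 TB per month, as in the original question
for name, capacity in ASSUMED_CAPACITY_GB.items():
    print(f"{name}: {media_needed(monthly_gb, capacity)} per month")
```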
At work, we had a need to keep a few TB of data online permanently, so we purchased a few NexSAN ATABeasts. At $50,000 for 10TB of usable storage ($5/GB), they may be a bit out of your price range. The advantage is that you can hold almost a year's worth of data and it is protected by RAID5. It also makes management a lot easier, since it is very difficult to mount 42 300G drives in a single chassis (and it takes only 4U of rack space).
On the low end, NexSAN has the ATABoy2 or ATABaby (2TB or 1TB) for the $8-$15K range. This will let you hold a month's worth of data.
On the high end, you have EMC disk arrays (think upwards of $20+/GB for the 'cheap' stuff from them).
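To compare the price tiers above on equal terms, here is a small sketch that turns a price and usable capacity into cost per GB and months of retention. The function names are illustrative, and 1 TB is assumed to be 1000 GB.

```python
def cost_per_gb(price_usd, usable_tb, gb_per_tb=1000):
    """Dollars per GB for an array with the given usable capacity."""
    return price_usd / (usable_tb * gb_per_tb)


def months_of_retention(usable_tb, tb_per_month):
    """How many months of data fit before the array is full."""
    return usable_tb / tb_per_month


# The ATABeast figures quoted above: $50,000 for 10 TB usable.
print(cost_per_gb(50_000, 10))      # dollars per GB
print(months_of_retention(10, 1))   # months at 1 TB per month
```

At 1 TB per month, 10 TB usable works out to ten months online, which matches the "almost a year" estimate.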
Overall, if you have 1TB per month, you need to either a) get a grant to fund your work, b) hire somebody to swap DVDs for you or c) seriously rethink your data generation.
Any of the "cheap" storage methods have serious drawbacks, and the low cost ones are, well, not so low cost if $15,000 sounds like a lot of money to you.
otherwise, good luck
Explain this to me: I can buy a 200 disc CD changer for 100 bucks, but the same thing with a burner (CD/DVD) runs thousands of dollars. Isn't there any company out there that can do it cheaper?
Heck, I remember a slashdot article about a guy who built one out of WOOD!
This would be a great solution for short term recovery storage. Just keep a stack of CD's or DVD's ready, and it will load them in and burn them all automatically.
On a side note, it would be great for converting a 400 disc CD collection into MP3s.
Now call me crazy, but have folks completely forgotten the age old solution, TAPE? An SDLT tape goes for about $50 and holds about 320GB, LTO holds even more, and I believe Quantum has just released the latest generation of SDLT. While it's not "cheap", an autoloader can be had for about $15,000 that can back up many TB hands off. Might be a bit much initially, but it is the best solution long term.
You are in a maze of twisted little posts, all alike.
If your "work" (as in food, housing and income) requires this kind of storage, you should be charging the kind of money that can make the economics of such data storage actually viable. I'm assuming that some of the really high-end storage devices from EMC, Hitachi, et al could handle your data generation/replication/backup needs effortlessly.
If that's too expensive (and it usually is), you can kludge your own system using low-end stuff from Hpaq/IBM/Dell's x86-server-oriented product lines. LTO1 drives are pretty cheap and we've found them to be very reliable over the past 3+ years, as well as offering 100 gig native per tape.
If even that's too expensive, then I seriously think you need to re-think the economics of your work situation. If your work doesn't cover your capital costs, you're not charging enough. If the work and data are business valuable enough, cutting your storage bill to the bone by building Linux clusters crammed with IDE HDDs is just a bad business decision.
If this is just your hobby-type work, then you need a cheaper hobby, like heroin addiction or something affordable. Physical space and electricity aren't cheap enough in a metropolitan area to burn through 1TB of storage per month, let alone reliable data storage.
Not enough feedback or information!
OK, 1TB/month; that doesn't say much.
Always look at different levels of case scenarios and work from there. I usually start with loss of building by fire and work down through limited hardware failure or data corruption.
There are several factors that determine how often you should back up. Here are just a couple of questions to answer.
How much is the data worth?
How much is your time worth if you lost a day or a week of processing time?
Is your work time dependent? (deadlines)
If you lost the data, would you lose it completely, or just lose processing/analyzing time on data that you can get from your clients again?
How long do you have to store the data, and have it retrievable? One month compared to several years really changes your options.
How financially responsible are you for the data?
Multiple backups (daily, weekly, monthly; full and incremental) in multiple locations are key to successful backups.
RAID is for redundancy or performance, not backups.
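The daily/weekly/monthly mix of full and incremental backups mentioned above could be scheduled with something like the following. The particular rotation (full on the 1st of the month, full on Sundays, incremental otherwise) is an assumption made for illustration; the comment does not prescribe one.

```python
from datetime import date


def backup_kind(day: date) -> str:
    """Pick a backup type for a given day under an assumed rotation.

    Assumed policy, not from the thread: monthly full on the 1st,
    weekly full on Sundays, incremental on every other day.
    """
    if day.day == 1:
        return "monthly-full"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "weekly-full"
    return "daily-incremental"


print(backup_kind(date.today()))
```

Each kind would then go to a different destination or media set, so that a fire or a bad tape does not take out every copy at once.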