Bulk Data Storage For The Common Man?
Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and, most importantly, does not cost an arm and a leg. Right now, I have a 4-node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage costs roughly $0.60/GB (excluding the cost of PC hardware). But long-term storage is painful -- DVDs cost about $0.10-$0.15/GB but take too much human time, and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if Slashdot readers have any recommendations for a cheap, automated way to store and retrieve data."
You're always going to get a better rate with hard drives, but they're prone to failure.
If you buy them in bulk you can save.
Burning DVDs is going to take you forever and drive you nuts.
Find a hot-swappable set of drives and use that for your offline backups. Use a RAID for your current backups.
Look for an LTO gen 1 or SDLT220/320 drive on eBay with a SCSI connection (some of them are Fibre Channel, and I assume you don't want to go there!). Don't forget to pick up some tapes. In general, this should work if you plan on doing this for a while and can spread the initial investment over months or years.
Capacities are (for the cost of a sub $50 tape):
- LTO1: 100 GB uncompressed
- LTO2: 200 GB uncompressed
- SDLT220: 110 GB uncompressed
- SDLT320: 160 GB uncompressed
If your data is particularly amenable to compression (e.g. database data), you could easily get 3 or 4 to 1 compression with these drives without sinking your CPU utilization, since the drive does the compression in hardware.
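To put the capacities above against the roughly 1 TB a month from the question, here is a minimal Python sketch of the tape-count arithmetic. The compression ratios it tries (none, 2:1, 3:1) are illustrative assumptions only; the ratio you actually get depends entirely on your data.

```python
import math

# Native (uncompressed) capacities from the list above, in GB.
NATIVE_GB = {
    "LTO1": 100,
    "LTO2": 200,
    "SDLT220": 110,
    "SDLT320": 160,
}


def tapes_per_month(data_gb: float, native_gb: float, ratio: float = 1.0) -> int:
    """Whole tapes needed for data_gb if the drive achieves ratio:1 compression."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return math.ceil(data_gb / (native_gb * ratio))


if __name__ == "__main__":
    monthly_gb = 1000  # roughly 1 TB per month, as in the question
    for name, native in NATIVE_GB.items():
        # Assumed ratios: none, 2:1, 3:1 (real ratios depend on the data).
        counts = [tapes_per_month(monthly_gb, native, r) for r in (1.0, 2.0, 3.0)]
        print(name, counts)
```

For LTO1, for example, that works out to 10 tapes a month uncompressed and 4 at 3:1.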
All forms of media/backups have their own drawbacks... but some aren't as bad as others, and the others often are more accessible.
;-)
Tape: Tapes break, they wear, they have dropouts, they take a while to back everything up, and you can't always get at individual files if you just want to restore something (methods vary, folks)... but ultimately, DAT is cheap because it's a common medium. Swap the tapes twice as often (and throw old ones out) if you're paranoid about tape-related failures.
Hard Drive: The most common form of backup I see now, mainly because of the 1:1 size match. Yeah, drives fail, too. Sometimes you get a pretty good warning that it's going to happen, sometimes you don't. (My 13GB Maxtor and 40GB IBM Deathstar drives both went *pfft* on reboot.) Buy enough of the same model at once and you can swap out the logic board if one fries. Ultimately, RAID or just simple 1:1 mirroring is probably the most efficient and easy method. Accessing bits and pieces is also easiest this way. I personally just use an external USB2 case with a 120GB drive in it. Everything I want to back up goes on that drive, and then eventually onto DVD-Rs. I turn off the drive when I don't need it... hopefully prolonging its life for when I need it most.
DVD-R: Not anymore. If we'd had these new-fangled DVD-R discs (+ or -) back when 2 to 6GB drives were common... sure. But as with hard drives, recovering selected files is easy with this method too... unless you use a backup program that crunches everything together on the disc in some spanning format. Burn times can be tedious, but it's not bad considering how much data you're putting on each disc. It's also cheaper than quality brand-name CD-Rs in terms of price per megabyte/gigabyte. Only an idiot would trust $0.01-per-disc spindles for long-term backups. Even the longevity of DVD-R has yet to be seen...
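One way to keep DVD-R backups restorable file by file, without a spanning format, is to group whole files into disc-sized sets before burning. The sketch below uses first-fit decreasing bin packing, a standard heuristic swapped in here rather than anything the poster describes. The 4.7 GB capacity constant is an assumption (the nominal single-layer figure), and the example file names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed usable capacity of one single-layer DVD-R, in bytes (nominal 4.7 GB).
DVD_BYTES = 4_700_000_000


def pack_files(sizes: Dict[str, int], capacity: int = DVD_BYTES) -> List[List[str]]:
    """Group whole files onto discs with first-fit decreasing bin packing.

    No file is split across discs, so any single file can be restored from
    one disc. Raises ValueError if a file is larger than one disc.
    """
    discs: List[Tuple[int, List[str]]] = []  # (bytes used, file names) per disc
    # Placing the largest files first tends to leave less wasted space.
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) does not fit on one disc")
        for i, (used, names) in enumerate(discs):
            if used + size <= capacity:
                discs[i] = (used + size, names + [name])
                break
        else:
            # No existing disc has room: start a new one.
            discs.append((size, [name]))
    return [names for _, names in discs]


if __name__ == "__main__":
    # Hypothetical file sizes in bytes.
    example = {"run01.dat": 3_000_000_000, "run02.dat": 2_500_000_000,
               "run03.dat": 1_500_000_000, "notes.txt": 10_000}
    for n, disc in enumerate(pack_files(example), start=1):
        print(f"disc {n}: {disc}")
```

Even perfectly packed, about 1 TB a month is still at least 213 discs (1000 / 4.7, rounded up), which is the human-time problem from the question.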
CD-R: I'm not going to bother.
Network: Well, it still relies on hard drives and other components... but it's good if you don't want to saddle one room with a ton of boxes. Simply for space and efficiency, though, an external drive is probably better anyway.
Old-fashioned method: Print everything out and keep it in a filing cabinet somewhere. You could always OCR the stuff later.
The drive will run about $4000, but the tapes are only around $0.20/GB assuming a 1.5:1 compression ratio. Keeping that assumption, each 200 GB native tape holds about 300 GB, so 1 TB of data a month takes only four tapes (1000 / 300 is about 3.3), and swapping wouldn't be so bad with a single tape drive. An autoloading library would be significantly more expensive, but if you really need automation, that's the way to go.
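The tradeoff here is a large upfront cost against cheaper media. Below is a minimal break-even sketch using the thread's own figures: a $4000 drive, about $0.20/GB for tape (at the assumed 1.5:1 ratio), about $0.60/GB for disk (the question's figure, which excludes PC hardware), and about 1000 GB a month. It works in integer cents to avoid floating-point rounding, and it deliberately ignores tape reuse, power, and the cost of the PC itself.

```python
from typing import Optional


def total_cost_cents(months: int, gb_per_month: int, cents_per_gb: int,
                     upfront_cents: int = 0) -> int:
    """Upfront hardware plus media for `months` months, in integer cents."""
    return upfront_cents + months * gb_per_month * cents_per_gb


def breakeven_month(drive_cents: int, tape_cents_per_gb: int,
                    disk_cents_per_gb: int, gb_per_month: int,
                    max_months: int = 120) -> Optional[int]:
    """First month in which drive + tapes costs no more than disks alone.

    Returns None if tape never catches up within max_months.
    """
    for month in range(1, max_months + 1):
        tape = total_cost_cents(month, gb_per_month, tape_cents_per_gb, drive_cents)
        disk = total_cost_cents(month, gb_per_month, disk_cents_per_gb)
        if tape <= disk:
            return month
    return None


if __name__ == "__main__":
    # $4000 drive, 20 cents/GB tape, 60 cents/GB disk, 1000 GB per month.
    print(breakeven_month(400_000, 20, 60, 1000))
```

With those figures, tape saves $400 a month over disk, so the drive pays for itself in month 10.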
Lemme address the FireWire thing: I work in a sound studio, and we generate about 5-8 gigs of data a month, mostly music for TV. This isn't a huge amount, but we rely on multiple sets of FireWire drives for backup and internal hard drives for current projects. This means we have all 400 or so projects at our fingertips, which matters given how fast we work.
LaCie makes a 1-terabyte FireWire drive (943 GB formatted). We get them for $1,080 a drive (MacMall matched Provantage's price). That works out to about $1.15/GB, which is more than the question's author spends per gig now, but these drives have done quite well in the studio. You can find cheaper FireWire drives, though.
We are at the point where hard drives give the best bang for the buck. The only problem we've had with FireWire is that my bosses have burned out several of the enclosures' bridge boards, so ground yourself before unplugging the drives. The bridges were cheap, though. In any case, hard drives are probably the most failsafe and cost-effective solution, and FireWire is the easiest interface to use them with.
If you are more interested in volume than speed, then the emphasis should be on the 'ID' part of RAID: Inexpensive Disks. Use 160GB drives, which appear to offer the best bang for the buck at the moment, and put six (yes, six!) in a PC. Any old cheap PC will do (I use 200-400MHz PIIs)
:-)
Run the disks as RAID 5 and you will get about 800GB of storage for $600. Get two cheap ATA100 cards so you have a total of six IDE channels (two onboard plus two per card), and attach each drive as the master on its own channel. Build a 2GB root partition on the first disk (mirror it if you want), then set the rest of the space up as one huge RAID 5 array.
Et voilà: a cheap, big server. To archive data, turn off the PC and throw it in the attic.
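The 800GB figure follows from RAID 5 spending one disk's worth of space on parity: (6 - 1) x 160GB. The sketch below shows that arithmetic and the XOR parity idea that lets the array survive one failed disk. It is a toy illustration under simplifying assumptions: real RAID 5 (for example Linux software RAID) rotates parity across all disks and works on fixed-size stripes, none of which is modeled here.

```python
from functools import reduce
from typing import List


def raid5_usable_gb(n_disks: int, disk_gb: int) -> int:
    """Usable capacity: one disk's worth of space goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (n_disks - 1) * disk_gb


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (this is the parity block)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes]) -> bytes:
    """Recover the one lost block of a stripe by XORing all surviving blocks,
    parity included."""
    return xor_blocks(surviving)


if __name__ == "__main__":
    print(raid5_usable_gb(6, 160))  # six 160GB drives, as in the comment
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00", b"\x0a\x0b", b"\x33\x44"]
    parity = xor_blocks(data)
    lost = data[2]  # pretend the disk holding this block died
    recovered = rebuild_missing(data[:2] + data[3:] + [parity])
    print(recovered == lost)
```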
No figures, but I think the opposite. I've had several DVD-R discs I'd written backups to turn out unreadable a year later. My personal experience has been that HDs are unreliable, but less unreliable than writable DVDs.
Of course higher quality media might be better, but then you can no longer quote the $0.10/GB figure.
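Whichever medium you pick, unreadable or silently changed backups are easier to catch early if you record a checksum for every file when you write it and re-check periodically. Here is a minimal Python sketch using SHA-256 from the standard library; the function names are hypothetical, and how often you re-check is up to you.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return files that are missing, unreadable, or whose contents changed."""
    bad = []
    for rel, digest in manifest.items():
        try:
            if sha256_file(root / rel) != digest:
                bad.append(rel)
        except OSError:  # missing file or read error on failing media
            bad.append(rel)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "run01.dat").write_bytes(b"measurement data")
        manifest = build_manifest(root)
        print(verify(root, manifest))  # nothing changed yet
        (root / "run01.dat").write_bytes(b"bit rot")
        print(verify(root, manifest))  # the changed file is reported
```

Store the manifest somewhere other than the backup itself, so a failing disc or drive can't take the checksums down with it.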
Depending on the value of your data, you could try two sets of four 400GB SATA drives in RAID 5, possibly with a distributed file system for redundancy...
Not the cheapest, but fast, simple, and it spares you the unholy pleasure of having 2-3 DLT boxes to archive and cycle each month...
You already have a Linux cluster, so you could implement a distributed file system, or even simply a nightly incremental mirror to the target server if you can afford to lose one day's work/computation...
It would help if you told us what sort of data you work with... databases and automated telescope tracking systems both need large amounts of storage, but you won't need the same storage array for each...
I seem to remember a /. story on a rackable petabyte storage system. You don't need to go to petabyte capacity, but you will find some interesting comments there on filesystems, disk virtualisation, 1U rack providers and so on... so a 1 terabyte rack server is definitely possible...
Good luck...
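The "nightly incremental mirror" idea could look something like this minimal Python sketch, which copies only files that are new or whose size or modification time changed. It is an illustration under those assumptions: it does not propagate deletions and would miss an edit that keeps the same size within the same second. In practice rsync is the usual tool for doing this robustly, including over a network.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def needs_copy(src: Path, dst: Path) -> bool:
    """Copy when the mirror copy is missing or differs in size or mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Compare whole seconds to tolerate sub-second timestamp differences.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def incremental_mirror(src_root: Path, dst_root: Path) -> List[str]:
    """Copy new or changed files from src_root into dst_root.

    Returns the relative paths copied. Deletions are NOT propagated, so files
    removed from src_root stay in the mirror.
    """
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps mtime for the next comparison
            copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src, dst = Path(a), Path(b)
        (src / "day1.dat").write_bytes(b"results")
        print(incremental_mirror(src, dst))  # first run copies everything
        print(incremental_mirror(src, dst))  # second run copies nothing
```

Run it from cron on the cluster each night, pointed at the target server's mounted volume.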
Google stores data for fast access, not for reliable storage. They don't care if they lose a few hundred gigs when a handful of disks die, they'll just re-spider it in a few days when the Googlebot hits the sites which were lost. Their solution is NOT optimized for reliable storage and it's not suited in the slightest to this guy's problem.
USB makes the host computer do the work, while FireWire controllers handle transfers themselves. For a normal user it's not much of an issue, but with a couple of drives you'd notice.
Come on, this is Slashdot. A tape changer doesn't have to cost that much money if it's made of Lego (shamelessly pulled from an earlier Slashdot story which I can't find at the moment).
I looked through some of the answers here, and as near as I can tell, you've got a bunch of home hobbyists telling you how to back up your home computers. Perhaps all you need is a computer with an external IDE drive array and 4-10 200GB SATA drives in it. But from your initial post, it's not clear what you need your offline storage _for_.
First of all, you mention that you generate and use 1 TB of data a month. What happens at the end of that month? Does all of the data become useless? Is some of it carried forward? Is it useful for historical processing for some time after it's no longer "live"? The disposition of that offline data is important; you can't determine how to back up your data most effectively until you know what you need to do with it once it's backed up.
Since no one cares about backing up old data that they never use any more, I'm going to assume you need this data in some form in the future. I'm also assuming that your data ages out completely every month.
Realistically, you have two options: large redundant disk arrays, or tape. Various factors favor one or the other.
First of all, get off the SATA hacks and realize you're going to need SCSI, whether you end up with disk or tape. You're backing up data, you're going to want it written out reliably, and SCSI is the de facto standard for backup architecture. Yes, you pay more for it, but there's a reason: the SCSI equipment I manage at work fails a fraction as often as the various IDE/ATA systems do. While SATA is marketed as a consumer technology, it will never meet the rigors of being a reliable backup methodology.