Hard Drives as Backup Media?
rootus-rootus asks: "A funny thought struck me as I was going over the life expectancy for tape media for backups... Since the size of 3.5" hard disks is surpassing 100GB in a reasonably inexpensive package, has anyone thought of using them as backup media, as in a jukebox or autoloader? The access times and data transfer rate for data stored on them would make backing up databases, etc. MUCH more palatable (200+GB takes a LONG time to dump to tape for a full backup). Any thoughts on the matter?" Bet you've thought about this question before, haven't you? Has anyone done anything like this? If so, how well did it work?
It's been thought of, and rejected. The reason is that the data storage and mechanical parts are contained in one unit, and failure of either makes the other useless. This means that if your drive stops spinning but your data is fine, you can't get to it. This wouldn't be a problem with removable media, because you can change the read/write device.
SIG: HUP
I purchased a couple of 80GB FireWire drives for my backup needs. They ran about $275 each after shipping, though I'm sure it's cheaper now. Every day I bring one to the office with me and swap it for the one that was plugged in the previous day. This allows me to do full backups every night, and data recovery takes almost no time at all.
On the other hand, this isn't a perfect solution for most companies. First, it would be easy for me to bang the hard drives and have them not spin up. They also are a lot bigger than a tape cartridge. But they do save me lots of time--and that means a lot. I really don't expect these drives to last forever with the "trashing" that gets done to them every night; but since they aren't terribly expensive (for my company) I don't really care if I have to buy another.
Long, cute, or funny Sigs are just another form of over compensation, used by geeks, nerdz, etc.
I've done a reasonable quantity of backup-solution deployments, from the simple "tape drive in a server" to multi-element DLT libraries. I've had customers "invent" a version of this idea on many occasions. Typically, the customer's "invention" takes the form of one of several similar ideas.
What it comes down to, though, is that the idea behind having multiple media, stored _away_ from the production copy of the data, is a good thing. Until recently, this has only been really convenient with tape media. With the advent of very convenient hot-swappable hard drive carriages and support for hot swapping of hard disk media in nearly every commonly used operating system, I don't see why hard drives could not be used-- but they would need to be treated with a little more physical care than tapes.
The "problem" seems to come when the (typically small-business) customer "invents" this idea, buys one of those cruddy "centronics connector on the back" sub-consumer-grade plastic "drive bays", slaps a hard drive in it, and starts doing backups to one hard drive from another. The cycle is something like: (1) insert 2nd hard drive, (2) wipe 2nd hard drive, (3) copy contents of production hard drive(s) to 2nd hard drive, (4) remove 2nd hard drive. They don't think about what would happen if, say, between steps 2 and 3 the production hard drive(s) failed.
If you're going to use hard disks as "tapes", I don't think there's anything fundamentally wrong-- but buy the same number of hard disks as you'd buy tapes-- and rotate them in the same manner. Treat them as large, mechanical tapes. Keep them away from the production data except when in use.
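To make the rotation idea concrete, here is a minimal sketch in Python, not anybody's actual backup tool. It assumes you track the date of the last successful backup for each disk; the disk labels, the dates, and the `pick_disk` helper are all made up for illustration. The point is that tonight's backup always overwrites the oldest copy, so the newest good copy is never the one being wiped.

```python
from datetime import date

def pick_disk(disks, last_written):
    """Return the disk to overwrite tonight: the one holding the oldest backup.

    disks: list of disk labels (hypothetical names).
    last_written: dict mapping label -> date of last successful backup,
                  or None if the disk has never been used.
    """
    if len(disks) < 2:
        # With a single disk, wiping it leaves no backup at all.
        raise ValueError("need at least two disks so one good copy always exists")
    # Never-used disks sort first; otherwise the oldest backup goes first.
    return min(disks, key=lambda d: last_written.get(d) or date.min)

# Illustrative data only.
disks = ["disk-A", "disk-B", "disk-C"]
last_written = {
    "disk-A": date(2002, 3, 4),
    "disk-B": date(2002, 3, 5),
    "disk-C": None,
}
tonight = pick_disk(disks, last_written)

# After a successful run, record it so the next pick moves on.
last_written[tonight] = date(2002, 3, 6)
next_night = pick_disk(disks, last_written)
```

With only two disks this still works, but a failed backup night leaves you with exactly one good copy, which is why buying as many disks as you would tapes matters.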
The Attitude Adjuster, I hate me, you can too.
A few notes on your idea:
1. There is no need to build a mechanical autoloader. IDE controllers and removable drive bays are cheap, less than $25 per drive, making them much cheaper than a robotic loader, with greater reliability and response time to boot. IDE drives can be spun down when they've been idle for a while, so electricity consumption should be similar.
2. I believe that Linux IDE does not currently support hot swapping of drives, although the PCMCIA drivers do support removal of an entire IDE controller, which is what happens when you remove a CompactFlash card.
3. My understanding is that hard drives are not hermetically sealed but rather have air filters made of material similar to a cigarette filter. I believe that when hard drives are not in use, they can accumulate dust internally and are more likely to have problems. You may also have problems with their greater sensitivity to being dropped and to static electricity. So, you may want to store them in sealed conductive bags.
4. In my humble opinion, you have a good idea. I believe that disk-based backups are much more valuable to an organization because they're easy enough to use that people will save time by doing minor recovery tasks. In comparison, with tape backups, the effort of doing a restore can be so much that people will often opt to spend an hour regenerating their previous work from scratch instead.
I'm just "rsync / backup@remote:/backup/$HOSTNAME"ing every night to a box offsite that rotates the drive mounted on /backup every day when a backup's not running. It runs overnight when the network's not real busy, and works fairly well. I back up the really important/dynamic stuff on site on a daily basis with a 7-disk DVD-RAM rotation. It's the right balance of price/simplicity vs. data safety for my organization, and is pretty idiot-proof.
:)
The drives in the remote backup server (which could easily be co-located at your nearest ISP) aren't "removable", but they're certainly not permanent either.
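For anyone who wants to script the nightly rsync described above, here is a rough Python sketch that just builds the command line. The `backup@remote` target and the `/backup/$HOSTNAME` layout come from the post; the `-a` (archive) flag and the `nightly_rsync_argv` helper are my own assumptions, added because rsync skips directories unless told to recurse.

```python
import socket

def nightly_rsync_argv(hostname=None, source="/", remote="backup@remote"):
    """Build the argument list for the nightly rsync run.

    hostname defaults to this machine's name, mirroring $HOSTNAME in the shell.
    The -a flag is an assumption; the original command shows no flags.
    """
    host = hostname or socket.gethostname()
    return ["rsync", "-a", source, f"{remote}:/backup/{host}"]

argv = nightly_rsync_argv("web1")  # "web1" is a made-up host name
# To actually run it (for example from cron), one could do:
#   import subprocess
#   subprocess.run(argv, check=True)
```

Building the argument list separately from running it makes it easy to print and check the command before letting cron loose on it.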
Ever drop a tape while taking it out of the bay and stuffing it into the tape store? I have. The tape was fine.
Even ruggedized drives, when dropped from arm's length, are not going to hold up too well. Cheap drives will definitely not hold up.
Gentoo Sucks
Okay, this is sort of an off-topic rant, but can anybody tell me what's up with ATX tower cases that have four 5.25-inch drive bays, of which only the upper two are usable for anything as long as a CD or 1.2MB floppy drive, because the standard ATX motherboard is in the way? In other words, the case is high enough and wide enough, but not deep enough. Anybody else fighting this particular frustration factory?
I see even classic Slashdot is now pretty much unusable on dial up anymore.
You're saying which one is overpriced?
400GB added to a PC - $1,200
460GB RaidZone - $10,000
Many people use this for backing up to tape. You break the mirror logically, then stream it to tape, then add it back into the mirror and resync. It speeds backups because you back up from a disk which isn't otherwise busy with head seeks to other parts of the disk, and if you're doing it with software RAID, likely off a completely different SCSI controller.
I'm reading some of the replies and thinking to myself that the /. readers don't understand what a backup system is.
A backup system is not simply redundancy (i.e. RAID). A backup system for files typically can recreate any version of a file requested by the user (as backed up according to the backup regimen). Thus, if you have nightly backups, you might keep every night for the past month, every month end, and every year end for a given document. RAID won't give you this.
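As a rough illustration of that kind of regimen, here is a small Python sketch that decides whether a given night's backup is worth keeping. The 30-day window stands in for "the past month", and the `retention_reason` helper and the sample dates are hypothetical; a real product will have its own policy language.

```python
from datetime import date, timedelta

def retention_reason(day, today, nightly_days=30):
    """Return why the backup taken on `day` is kept, or None if it can go.

    Rules, following the regimen described above:
      - year end: 31 December of any year
      - month end: last day of any month
      - nightly: anything from the last `nightly_days` days (assumed 30)
    """
    age = (today - day).days
    if age < 0:
        return None  # backups dated in the future are not valid
    if day.month == 12 and day.day == 31:
        return "year-end"
    if (day + timedelta(days=1)).month != day.month:
        return "month-end"
    if age <= nightly_days:
        return "nightly"
    return None

today = date(2002, 3, 15)  # illustrative "now"
recent = retention_reason(date(2002, 3, 10), today)
jan_end = retention_reason(date(2002, 1, 31), today)
year_end = retention_reason(date(2001, 12, 31), today)
stale = retention_reason(date(2002, 1, 15), today)
```

A year end is also a month end, so the separate label only matters if you later decide to expire month-end copies sooner than year-end ones.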
I'm familiar with some IBM products that do this. However, they're expensive. Basically, ADSM (ADSTAR Distributed Storage Manager) is a product that allows regular backups of documents, and access to every incremental version of those documents. On the backend, it can be hooked up to a huge disk cache and a robotic tape library. The end result is terabytes of near-online access data, with automatic versioning. Pretty nice. And if your disk cache was large enough, it would never hit the tapes. It seems to me that this could be modified to remove the tapes and present what the user requires.
I'm not aware of anything open source or free (as in beer) that does this. It would be really nice, though.
Hell, I've always dreamed about an automatic versioning filesystem. Documents would be automatically versioned. You could use CVS to handle this. Perhaps you could do something as simple as have some code executed upon every file close for files that are opened with write access. When these files are closed, they are added as new versions of the document within CVS.
When the disk reaches some capacity watermark, a disk cleanup agent would be invoked. Its goal would be to remove redundant versions of old binary files from CVS. Rules could be attached to the agent to perform tasks such as retaining specific versions of binary files (i.e. retaining the first version, the latest version, and all versions from the last named version).
Users could tag specific versions of files. These versions would always be retained.
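The cleanup rules could be prototyped in a few lines of Python before touching CVS at all. This is only a sketch of one reading of the rules: keep the first version, the latest version, every tagged version, and everything from the most recent tagged version onward. The `versions_to_keep` helper and the version numbers are invented for the example.

```python
def versions_to_keep(versions, tagged):
    """Pick which versions of one file survive a cleanup pass.

    versions: list of version ids, oldest first.
    tagged: set of version ids that users have tagged (named).
    """
    if not versions:
        return []
    # Always keep the first and the latest version.
    keep = {versions[0], versions[-1]}
    # Tagged versions are always retained.
    keep.update(v for v in versions if v in tagged)
    # Keep everything from the most recent tagged version onward.
    tagged_positions = [i for i, v in enumerate(versions) if v in tagged]
    if tagged_positions:
        keep.update(versions[tagged_positions[-1]:])
    # Preserve the original ordering in the result.
    return [v for v in versions if v in keep]

history = ["1.1", "1.2", "1.3", "1.4", "1.5", "1.6"]
kept = versions_to_keep(history, tagged={"1.2", "1.4"})
kept_untagged = versions_to_keep(history, tagged=set())
```

Everything not in the returned list is a candidate for removal once the disk hits its watermark.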
I know this would incur a significant performance hit for disk access. Perhaps I could limit such disk access to specific directories or mount points. In this manner, I could have a mount point for documents, all of which would be automatically versioned.
Plugins for Explorer could be built to allow users to tag versions of documents and retrieve specific old versions of files. I'm thinking something like TortoiseCVS, a beautiful piece of software. In fact, for prototyping, TortoiseCVS would be enough.
Now, is anything like that available? No? Perhaps I should do something about that.
Cheers.
--Be human.
Some grad student from China came here with a 6 gig IDE disk with all his data on it. I thought it was kind of weird myself, but I guess it worked out OK...
Drives will often die if left to their own devices [i.e., off], though (we say they get lonely and kill themselves). Which would really suck if that was your backup, wouldn't it?