Hard Drives as Backup Media?
rootus-rootus asks: "A funny thought struck me as I was going over the life expectancy for tape media for backups... Since the size of 3.5" hard disks is surpassing 100GB in a reasonably inexpensive package, has anyone thought of using them as backup media, as in a jukebox or autoloader? The access times and data transfer rate for data stored on them would make backing up databases, etc. MUCH more palatable (200+GB takes a LONG time to dump to tape for a full backup). Any thoughts on the matter?" Bet you've thought about this question before, haven't you? Has anyone done anything like this? If so, how well did it work?
It's been thought of, and rejected. The reason for this is that the data storage and mechanical parts are contained in one unit, and failure of either makes the other useless. This means that if your drive stops spinning but your data is fine, you can't get to it. This wouldn't be a problem with removable media because you can change the read/write device.
SIG: HUP
I purchased a couple of 80GB firewire drives for my backup needs. They ran about $275 each after shipping, though I'm sure it's cheaper now. Every day I bring one to the office with me and replace it with the one that was plugged in the previous day. This allows me to do full backups every night and data recovery takes almost no time at all.
On the other hand, this isn't a perfect solution for most companies. First, it would be easy for me to bang the hard drives and have them not spin up. They also are a lot bigger than a tape cartridge. But they do save me lots of time--and that means a lot. I really don't expect these drives to last forever with the "thrashing" that gets done to them every night; but since they aren't terribly expensive (for my company) I don't really care if I have to buy another.
Long, cute, or funny Sigs are just another form of over compensation, used by geeks, nerdz, etc.
I've done a reasonable quantity of backup-solution deployments, from the simple "tape drive in a server" to multi-element DLT libraries. I've had customers "invent" a version of this idea on many occasions. Typically, the customer's "invention" takes the form of one of several similar ideas.
What it comes down to, though, is that the idea behind having multiple media, stored _away_ from the production copy of the data, is a good thing. Until recently, this has only been really convenient with tape media. With the advent of very convenient hot-swappable hard drive carriages and support for hot swapping of hard disk media in nearly every commonly used operating system, I don't see why hard drives could not be used-- but they would need to be treated with a little more physical care than tapes.
The "problem" seems to come when the (typically small-business) customer "invents" this idea, buys one of those cruddy "centronics connector on the back" sub-consumer-grade plastic "drive bays", slaps a hard drive in it, and starts doing backups to one hard drive from another. The cycle is something like: (1) insert 2nd hard drive, (2) wipe 2nd hard drive, (3) copy contents of production hard drive(s) to 2nd hard drive, (4) remove 2nd hard drive. They don't think about what would happen if, say, between steps 2 and 3 the production hard drive(s) failed.
If you're going to use hard disks as "tapes", I don't think there's anything fundamentally wrong-- but buy the same number of hard disks as you'd buy tapes-- and rotate them in the same manner. Treat them as large, mechanical tapes. Keep them away from the production data except when in use.
The Attitude Adjuster, I hate me, you can too.
In a mission-critical environment, it is possible to achieve a very high degree of system-redundancy by using hard drives as a backup solution by "breaking mirrors". (Of course if the system is mission-critical, I'd rather have a geographically distributed set of systems, but that's not always possible...)
The idea is that you set up hot-swappable disks in a three-way mirror, that is, three drives containing the exact same data. When you want/need to take a backup, you simply pull the third drive out of the array. The system still has two drives with the full data set, so you don't lose your redundancy. You insert another drive to replace the one you pulled, and in a matter of hours it will be in sync with the rest of the array. That sync time will depend on the size of the hard drives you use, but should be well under 24 hours for up to at least 40 gigs.
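As a rough check on that estimate, the resync time is just the disk size divided by the sustained rebuild rate. The Python sketch below works it out; `resync_hours` is a hypothetical helper name, and the rates are illustrative assumptions, not measurements from any particular RAID controller.

```python
def resync_hours(disk_gb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to copy a whole disk at a sustained rebuild rate."""
    total_mb = disk_gb * 1000  # decimal gigabytes, as drive vendors count them
    return total_mb / rebuild_mb_per_s / 3600


# Assumed rates only: a heavily throttled rebuild up to a fairly quick one.
for rate in (1, 5, 20):
    print(f"40 GB at {rate} MB/s: {resync_hours(40, rate):.1f} h")
```

Even at a sustained 1 MB/s, a 40 GB disk comes out at roughly 11 hours, which is consistent with the "well under 24 hours" figure.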
Of course, you need hot-swappable disk drives, and ideally a hardware RAID controller, and you might even want to consider having a disk for every day of the week, so that solution isn't exactly cheap. However, it gives you an instant snapshot of the system when you take your backup, and restoring is simply a matter of putting the backup drive in the machine (or another similar machine, if the first was damaged/lost). If you're paranoid (not to mention incredibly rich), you might even consider a four-way mirror to have two backups, and already have instant redundancy when you restore the system!
Another interesting thing to do is to use disk-level encryption in such a scenario. Since all your drives are encrypted, so are your backups. The problem with that is that you need to provide the key (passphrase or hardware token, or combination) at boot time, which means additional downtime if there's an unscheduled reboot and you're not around with the key...
Also, you might want to do things such as checkpointing your databases right before pulling the backup drive out in order to minimize the chance of data loss. But in the end, it can't be worse than an unexpected power down, and well-written applications and OSes deal with that pretty well. A logging filesystem will definitely help.
"Words have meaning, and names have power." -- Lorien
Any opinions? Thanks.
Hard drives are used for backup all the time. It's called RAID. There is little point in using them for offline storage since tape is still so much cheaper per meg. The only time speed becomes a true issue is when your daily backups start taking more than a day to complete.
A standard backup procedure is to make a T-0 copy of the data (LVM and IBM's Shark support this) and then back up the copy to tape for long term storage. That way you have until the next scheduled backup to complete the dump to tape.
A few notes on your idea:
1. There is no need to build a mechanical autoloader. IDE controllers and removable drive bays are cheap, less than $25 per drive, making them much cheaper than a robotic loader, with greater reliability and response time to boot. IDE drives can be spun down when they've been idle for a while, so electricity consumption should be similar.
2. I believe that Linux IDE does not currently support hot swapping of drives, although the PCMCIA drives do support removal of an entire IDE controller, which is what happens when you remove a CompactFlash card.
3. My understanding is that hard drives are not hermetically sealed but rather have air filters made of material similar to a cigarette filter. I believe that when hard drives are not in use, they can accumulate dust internally and are more likely to have problems. You may also have problems with their greater sensitivity to being dropped and to static electricity. So, you may want to store them in sealed conductive bags.
4. In my humble opinion, I think you have a good idea. I believe that disk-based backups are much more valuable to an organization because they're easy enough to use that people will save time by doing minor recovery tasks. In comparison, with tape backups, the effort of doing a restore can be so much that people will often opt to spend an hour regenerating their previous work from scratch instead.
We use a separate sync server with lots of
disk space and then do nightly dumps over nfs
to this box. This server is located in a
separate building with ethernet between the
systems.
Every night on our servers, a script runs and
dumps the local filesystems at an appropriate
level. We then gzip the dump and store it on
the dump server. Since each file is uniquely
named, we can store old dumps as long as disk
space permits. In our case this is about 1 week
of old dumps.
The scripts are trivial to write (think dump |
gzip >> /nfs/mount/on/remote/server). For a
while we used to dump between remote locations
and move the data via rsync but it takes forever.
Local ethernet is a huge plus for moving gigabytes
of data nightly.
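For anyone who would rather not do this in shell, here is a minimal Python sketch of the "uniquely named, gzipped dump on the dump server" step. The naming pattern, the host name `web1`, and the helper names `dump_name` and `store_dump` are all assumptions made up for illustration; the poster's actual scripts are not shown in the thread.

```python
import gzip
import shutil
import time
from pathlib import Path
from typing import BinaryIO


def dump_name(host: str, filesystem: str, level: int, when: time.struct_time) -> str:
    """Build a unique file name such as 'web1_home_L0_20020314.dump.gz'."""
    # Turn '/var/log' into 'var-log'; the root filesystem becomes 'root'.
    fs = filesystem.strip("/").replace("/", "-") or "root"
    stamp = time.strftime("%Y%m%d", when)
    return f"{host}_{fs}_L{level}_{stamp}.dump.gz"


def store_dump(stream: BinaryIO, dest_dir: Path, name: str) -> Path:
    """Gzip a dump stream into dest_dir (for example an NFS mount point)."""
    target = dest_dir / name
    with gzip.open(target, "wb") as out:
        shutil.copyfileobj(stream, out)  # copies in chunks, not all in memory
    return target
```

In practice `stream` would be the standard output of whatever dump program you run, and `dest_dir` the directory where the dump server is mounted. Putting the date in the name is what lets a week of old dumps sit side by side.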
For more important data like our CVS repository,
we snapshot it hourly, daily, weekly, and monthly
as well as tarring it up hourly and daily. This
means we have like 15 entire copies of our CVS
tree. It's probably overkill but it helps a lot
come panic time.
You can easily add 400GB of disk space to a
regular pc for about $1200. In our case we do
it all in less than 100. The other nice thing
about doing this is that we have instantaneous
access to our dumps and can access them much
quicker than tape.
In a perfect world I'd also like to back up
the data to tape as well, but haven't yet done
so. I suppose if we wanted to be extra safe we
could also mirror the drives on the sync server
or rotate the data between physical disks so that
it would take multiple failures to lose the
backup data.
--chuck
Ever drop a tape while taking it out of the bay and stuffing it into the tape store? I have. The tape was fine.
Even ruggedized drives, when dropped from arm's length, are not going to hold up too well. Cheap drives will definitely not hold up.
Gentoo Sucks
For myself, working at home on a cable modem box, I don't need a lot of backup space. What I do want is the backup to be at a remote location. And I do want it to be as automated as possible because if it requires me to physically do something on a regular basis I'll just stop after a couple of months, as soon as my schedule gets tight or whatever.
My solution is to find another person on a fast connection who has the same needs, and arrange to let him ssh into my box and have a few gigs worth of space, and give me the same.
Right now the only scriptified part is creating the backup files. I encrypt them and scp them to his box by hand. I will eventually have it all automated, including deleting the oldest backup if space is getting tight.
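The "delete the oldest backup if space is getting tight" step could look something like the Python sketch below. It is only a sketch under assumptions: `prune_oldest` is a hypothetical name, the byte budget stands in for whatever quota the two parties agree on, and file modification time is used as a stand-in for backup age.

```python
from pathlib import Path
from typing import List


def prune_oldest(backup_dir: Path, max_bytes: int, keep_at_least: int = 1) -> List[Path]:
    """Delete the oldest files in backup_dir until the total size fits max_bytes.

    Never deletes the newest keep_at_least files, even if they alone exceed
    the budget. Returns the paths that were removed.
    """
    files = sorted(
        (p for p in backup_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,  # oldest first
    )
    total = sum(p.stat().st_size for p in files)
    deleted: List[Path] = []
    while total > max_bytes and len(files) > keep_at_least:
        victim = files.pop(0)
        total -= victim.stat().st_size
        victim.unlink()
        deleted.append(victim)
    return deleted
```

The `keep_at_least` guard is there so that an overly small quota can never wipe out the last remaining backup.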
This probably isn't an option for a "real" backup solution, such as for a business or a network with a number of users. But all I want is my home directory, mail, etc. Hell, my bookmarks file and mail are probably most of what I want, and the rest is mainly small latex docs.
I think there is probably a way to use Freenet for this, but I didn't think that through all the way. If I inserted my backups into Freenet, and a fire burned down my house, how would I know what keys to use to get my backups back out of Freenet?
*cough* RAID *cough*
I know of one web site that has to store a large amount of user files for free and is using "disposable" Linux RAID5 boxes without any regular tape backup. They are getting the cost down into the $5 or $6 per GB range, which is only about 4x the cost of blank tape. It's only a matter of time....
Okay, this is sort of an off-topic rant, but can anybody tell me what's up with ATX tower cases that have four 5.25-inch drive bays, of which only the upper two are useable for anything as long as a CD or 1.2MB floppy drive because the standard ATX motherboard is in the way? In other words, the case is high enough and wide enough, but not deep enough. Anybody else fighting this particular frustration factory?
I see even classic Slashdot is now pretty much unusable on dial up anymore.
The company I work for does this occasionally, usually where we have an SLA in place that requires us to perform a backup or restore within a certain timeframe.
Typically a backup to disk is made in order to get the backup done as fast as possible, then that backup is dumped to tape. Simple restores are quick and relatively easy because the most recent backup is always online and if we have a more serious failure, we can still restore from tape.
Fast, cheap & reliable. Pick two.
I'm reading some of the replies and thinking to myself that the /. readers don't understand what a backup system is.
A backup system is not simply redundancy (i.e. RAID). A backup system for files typically can recreate any version of a file requested by the user (as backed up according to the backup regimen). Thus, if you have nightly backups, you might keep every night for the past month, every month end, and every year end for a given document. RAID won't give you this.
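To make that regimen concrete, here is a small Python sketch that decides whether a given nightly backup should still be kept. The comment above does not say how long month-end copies are retained, so the one-year window (`monthly_days`) is purely an assumption, as are the function names; year-end copies are kept indefinitely.

```python
import calendar
from datetime import date


def is_month_end(d: date) -> bool:
    """True if d is the last day of its month (leap years included)."""
    return d.day == calendar.monthrange(d.year, d.month)[1]


def keep_backup(taken: date, today: date,
                nightly_days: int = 31, monthly_days: int = 366) -> bool:
    """Retention rule: recent nightlies, month ends for a year, year ends forever."""
    age = (today - taken).days
    if age < 0:
        raise ValueError("backup is dated in the future")
    if age < nightly_days:
        return True                      # every night for the past month
    if taken.month == 12 and taken.day == 31:
        return True                      # every year end
    return is_month_end(taken) and age < monthly_days  # assumed one-year window
```

A cleanup job would run `keep_backup` over every stored backup date and delete the ones that return False. RAID alone has nothing comparable: it only ever holds the current state.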
I'm familiar with some expensive IBM products that do this. However, they're expensive. Basically, ADSM (ADSTAR Data Storage Manager, or something) is a product that allows regular backups of products, and access to every incremental version of the documents. On the backend, it can be hooked up to a huge disk cache and a robotic tape library. The end result is terabytes of near-online access data, with automatic versioning. Pretty nice. And if your disk cache was large enough, it would never hit the tapes. It seems to me that this could be modified to remove the tapes and present what the user requires.
I'm not aware of anything open source or free (as in beer) that does this. It would be really nice, though.
Hell, I've always dreamed about an automatic versioning filesystem. Documents would be automatically versioned. You could use CVS to handle this. Perhaps you could do something as simple as have some code executed upon every file close for files that are opened with write access. When these files are closed, they are added as new versions of the document within CVS.
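A toy version of the "run some code on every close of a file opened for writing" idea can be sketched in Python. To be clear about what is swapped in: this does not talk to CVS at all, it just copies each saved file into a hidden `.versions` directory as a numbered snapshot. The names `versioned_open` and `.versions` are made up for illustration.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def versioned_open(path, mode="w", store=".versions"):
    """Open a file; if it was opened for writing, snapshot it when it closes."""
    path = Path(path)
    handle = open(path, mode)
    try:
        yield handle
    finally:
        handle.close()
        if any(flag in mode for flag in "wax+"):   # any mode that can write
            # One subdirectory per document, snapshots named 1, 2, 3, ...
            vdir = path.parent / store / path.name
            vdir.mkdir(parents=True, exist_ok=True)
            number = len(list(vdir.iterdir())) + 1
            shutil.copy2(path, vdir / str(number))
```

Usage is the same as a normal `with open(...)` block, for example `with versioned_open("notes.txt", "w") as f: f.write("draft")`. A real implementation would hook file close at the filesystem level rather than rely on applications calling a wrapper.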
When the disk reaches some capacity watermark, a disk cleanup agent would be invoked. Its goal would be to remove redundant versions of old binary files from CVS. Rules could be attached to the agent to perform tasks such as retaining specific versions of binary files (i.e. retaining the first version, the latest version, and all versions from the last named version).
Users could tag specific versions of files. These versions would always be retained.
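The retention rule for the cleanup agent might be sketched like this in Python. It assumes versions are plain increasing integers and reads "all versions from the last named version" as "everything at or after the most recently tagged version"; both are my interpretation, and `versions_to_keep` is a hypothetical name.

```python
from typing import Dict, List, Set


def versions_to_keep(versions: List[int], tags: Dict[str, int]) -> Set[int]:
    """Pick the versions a cleanup agent must not delete.

    versions: all stored version numbers, in ascending order.
    tags: user-assigned tag name -> version number.
    """
    if not versions:
        return set()
    existing = set(versions)
    keep = {versions[0], versions[-1]}            # first and latest version
    tagged = {v for v in tags.values() if v in existing}
    keep |= tagged                                # tagged versions always stay
    if tagged:
        last_tagged = max(tagged)
        keep |= {v for v in versions if v >= last_tagged}
    return keep
```

For ten versions with tags on 3 and 7, this keeps 1, 3, and 7 through 10, and marks 2, 4, 5, and 6 as removable once the watermark is hit.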
I know this would incur a significant performance hit for disk access. Perhaps I could limit such disk access to specific directories or mount points. In this manner, I could have a mount point for documents, all of which would be automatically versioned.
Plugins for Explorer could be built to allow users to tag versions of documents and retrieve specific old versions of files. I'm thinking something like TortoiseCVS, a beautiful piece of software. In fact, for prototyping, TortoiseCVS would be enough.
Now, is anything like that available? No? Perhaps I should do something about that.
Cheers.
--Be human.
Some grad student from China came here with a 6 gig IDE disk with all his data on it. I thought it was kind of weird myself, but I guess it worked out OK...
Though drives will often die if left to their own devices [ie, off] (we say they get lonely and kill themselves). Which would really suck if that was your backup, wouldn't it?
Yes, RAID is for redundancy, not back-up, but the difference is really about how you configure it and if you need offsite storage. Let me explain our systems.
We build large (600GB-1TB) systems, either on W2k or RHL (honestly, our customers almost never prefer RHL, and I'm waiting for WINE to get to 1.0 to be able to convince them to switch, but that's a tangent...) for digital storage of security video. We have specialized hardware to capture and record onto a PC's EIDE disk drives, and use 3ware cards to expand the EIDE array. Tape back-up of such large systems is useless, not only because of the time to write the data, but because the recovery speed is not viable for video. It takes hours to review the damn tapes. Might as well spend less and stick to VHS! So our method is as follows.
Normally in security video data, the cost to mirror or back up is beyond the means of the customer, but when they ask for it we set up a three-way back-up. We dump the data to one of three disk array sets, either locally or across a high-speed line. This way we only erase the oldest version before laying down the new copy. This prevents catastrophic loss in between erasure and new back-up (mentioned above). We do have customers using pull-out drives, and we have very little trouble with them. We use the expensive trays (not the plastic crap), and they hold up fine. And the drives are pretty hardy also. Don't drop them, that's true, but don't freak either.
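The "only erase the oldest version" rule is easy to get wrong in a script, so here is a minimal Python sketch of picking the next set to overwrite. The set names and the `pick_target` helper are hypothetical; the timestamps are assumed to record when each set last finished a complete backup, with `None` for a set that is empty or was interrupted mid-copy.

```python
from typing import Dict, Optional


def pick_target(completed_at: Dict[str, Optional[float]]) -> str:
    """Choose which backup set to overwrite next.

    An empty or incomplete set (None) is used first; otherwise the set with
    the oldest completed backup is chosen, so newer copies are never erased.
    """
    if len(completed_at) < 2:
        raise ValueError("need at least two sets so one copy survives the overwrite")
    unfinished = sorted(name for name, t in completed_at.items() if t is None)
    if unfinished:
        return unfinished[0]
    return min(completed_at, key=lambda name: completed_at[name])
```

With three sets, two complete copies are still intact while the third is being rewritten, which is the point of the scheme: a failure of the production disks mid-backup costs you nothing but the copy in progress.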
We normally don't mirror: you don't get 2 versions (2 months of recorded video), and it is generally a very expensive way to "back up" (TCO and maintenance). And I have found that the best way to keep a hard drive back-up is not to turn the drives off. Keep the system spinning and the drives last longer.
Another point is that if the data is not hyper-critical, then the hard disk MTBF rate should be sufficient. Keep in mind that there are "no quibble" warranties for most drives. They'll ship you a new one before you send them the kaput one. That improves uptime, and generally you don't offer a guarantee you expect to honor ;)
Overall, ignore the naysayers. Hard disks work for back-up, they're cheap, and you can't get the low latency/record/review time with any other low-cost back-up.
It's worth the extra care!
btw, here's a shameless plug of our site, and we have bought up the remaining stock of 3ware cards to build systems. drop a line if you want one for a deal.
We are just moving from tapes to hard drives. The basic reasons are:
It takes ages to retrieve a file from tape. Our db data are stored on a RAID and also as copies on other (production) machines, basically to reduce load, but they also hold a complete copy that way.
Accidentally deleted files on a production machine are a real problem (a RAID doesn't help).
The most important point is to *learn to recover*. Technically we copy the whole system to tape, but we can boot from tape.
Storing on a disk requires a netboot, or at least a boot disk that allows you to write the data back onto an empty disk. But beware: most utilities (like tar) can't write something like a partition table. And dump and dd are not always an option.
A clean installation helps a lot !
I have a mobile rack in my computer and use it to back up/store everything that doesn't belong on my permanent drives. I buy 75 gig hard drives and use them as the "media". They are fast, reliable, and have the cheapest storage to cost ratio. Hard drives are excellent for this kind of application.
Thanks,
Travis
forkspoon@hotmail.com