Affordable Home Backups for 10-100G Systems?
MichaelJames asks: "Ok, I have my MP3's streaming, all our digital pictures up, and a file server running on one machine in the basement. What would be the best way to do simple backups of the system and data? Get a tape drive? Get a CDRW or DVDRW to back up the MP3s and pics, but use the old Zip drive for the file server data?" With drives in the 10-20 gig range only getting smaller and less expensive, what are we to do for backups, which have yet to scale well in the same range? For home systems with up to 100G of storage, what do you use to back up that much data, with a solution that's affordable to the average computer user? Have DVD writers become cheap enough for serious consideration as a backup medium?
CD-R/CD-RW are too slow and too small; plan on spending a day or so swapping disks. You can always mirror to another hard drive, get a basic RAID card or just use a Ghost-like program to do manual backups. But tape is still cheaper per megabyte and more reliable. Sure, you can damage a tape, but it's harder to do than with a hard drive. SCSI tape drives are more expensive than another drive, but they are fast enough and let you keep multiple versions or copies of your backup. Try that with hard drives and you need arrays. Tape starts looking REAL cheap then.
Ignorance is the root of all evil.
First, on a typical system, not all data is really worth backing up; the OS and all applications can be reinstalled in the event of a crash (for Linux, it might even be slightly beneficial, as you'll reinstall newer versions and get rid of various cruft you've forgotten you have). Some data has been saved just because it's convenient or simply less bother than having to actively remove it (for me, I tend to collect old logs, various mails I never will look at again and documentation that's several revisions old). A lot of mp3:s and movies may already be burned onto CD:s. That filled 40Gb drive may actually 'only' contain 4-5 Gb of data that actually needs to be backed up.
The data I actually need to back up I manage by keeping the important stuff in specified directories, then mirroring them over the net to my machine at work. By doing it incrementally, there is little time or bandwidth wasted.
/Janne
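For anyone who wants to try this kind of incremental mirror, here is a minimal sketch using rsync over ssh. It is not Janne's actual setup: the remote host `work.example.org`, the destination path and the example directories are made-up placeholders, and the `RSYNC` variable exists only so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch: mirror a few important directories to a remote machine.
# Host, destination path and directory names are hypothetical placeholders.
RSYNC=${RSYNC:-rsync}
REMOTE=${REMOTE:-work.example.org:/backup/home-mirror}

mirror_dirs() {
    # --archive keeps permissions and timestamps, --relative recreates the
    # full source path under the destination, --compress saves bandwidth.
    # rsync only transfers files that changed since the previous run.
    for dir in "$@"; do
        "$RSYNC" --archive --relative --compress --rsh=ssh "$dir" "$REMOTE/" || return 1
    done
}

# Example call (placeholder paths):
#   mirror_dirs /home/janne/documents /home/janne/src
```

Run from cron, something like this should only move what changed since the last run, which is where the time and bandwidth savings come from.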
Trust the Computer. The Computer is your friend.
I just hate it when somebody posts exactly what I was going to post. Had to erase a good two minutes of typing because of it. :)
Anyway, that is exactly what I am doing. I have a 13Gb hdd for the system, and a 40Gb for storage (mp3s, movies, etc). Also I have a burner and one of the removable racks you mention. And in it, there's an identical 40Gb hdd used solely for backups. I keep it safe, and I plug it in every few days to copy the new stuff to it, remove the old, etc. I know that ideally I should have more backup space than hdds I'm using, but I never really run out of space. I am always writing the very important files to CDs, sometimes in duplicate. Call me paranoid, but after losing 3 years of data because of a hdd crash and a cheap CD which refused to be read, I'm not taking any chances. Also, all the stuff I don't need often (less than once a month) goes to CD.
One very, very important thing though. Don't cheap out on the removable racks. Make sure that at least the lid on the one you get is metal, and there's at least a fan in the hdd tray. All racks have one fan on the rack itself (the part that gets mounted in the case). But make sure you have another one in the tray.
I used to have a rack made completely out of plastic, and with only one fan. My Maxtor 7200rpm drive was getting HOT. And I do mean hot! Then one day I ripped the IDE cable from its mount, and I had to buy another rack. This one has a metallic lid and 2 fans inside. Now the hdd doesn't even get warm. And the difference in price between the racks was $10 Canadian (about US$7.50).
Those few extra bucks are probably going to prolong your hdd life by quite a bit.
I'm sharing my cable modem via 802.11 with all the neighbors and since I am the local "neighborhood helpdesk technician", they often come to me for advice. Recently, one of them wanted to know how to go about backing things up properly. It dawned on me that hard drive space is abundant and most people are buying much more than they need (the person in question has an 80 gig at about 20% capacity). So I worked out a deal so that everyone is backing up to each other's PC at night on a weekly basis. The 802.11b connection keeps drive thrashing to a minimum yet provides enough speed for complete backup on an overnight basis.
I should start charging for these ideas... Can't wait for the proliferation of freenet!
Life is the leading cause of death in America.
Bringing up the system is less of a problem with newer OSes, since you can usually, at minimum, get to your data. Configuring the database, webserver, and firewalling depends on how good you are with the OS. However, when I worked at a former company there was no real plan to get a working system back in place. We were using Novell with Arcserve -- unfortunately, you couldn't get to the data without a working system.
Next I usually try to segregate rapidly changing stuff from things that are pretty much static. E.g., my mp3 collection is relatively static. I occasionally buy a fresh CD and rip it, but I'm pretty much satisfied with my collection as it is. I put these on CDROM. It takes a while to create them, but it's cheap and safe. If you want to keep everything up to date, you can run a script to save only files not included on the CDROM.
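A script like that could look roughly like the sketch below. It assumes you keep a manifest of what has already been burned, one relative path per line; that manifest format, the function name and the example paths are my own invention, not something the poster described. Filenames containing newlines would break it.

```shell
#!/bin/sh
# Sketch: list files under a directory that are NOT in a manifest of
# files already burned to CD-ROM. The manifest format (one relative
# path per line) is an assumption made for this example.
files_not_on_cd() {
    dir=$1
    manifest=$2
    tmp=$(mktemp) || return 1
    # comm needs both inputs sorted the same way, so force the C locale.
    LC_ALL=C sort "$manifest" > "$tmp"
    # comm -23 prints lines that appear only in the first input,
    # i.e. files on disk that the manifest does not mention.
    (cd "$dir" && find . -type f | sed 's|^\./||' | LC_ALL=C sort) |
        LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}

# Example call (placeholder paths):
#   files_not_on_cd /pub/mp3 /root/burned.list > to-save.list
```

The resulting list can then be fed to whatever does the actual saving (tar, rsync with `--files-from`, another CD session), and appended to the manifest afterwards.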
Finally, I back up my constantly changing stuff such as CVS, MySQL databases, etc. to 4mm tape. It's cheap (hardware and tape) and most drives are pretty well supported.
I'm classifiable as an audio addict, having taken my entire personal
collection of CD's and ripped them to MP3's at 320 kbit/s, and wanted to
have them stored in a central place, accessible from any machine in my
home. Currently this collection is at approximately 620 full CD's of
music, and I'm pushing right at, or just above the 80 gigabyte limit.
Now when you factor in personal files, financial records, games,
downloaded material, installation software you don't want to lose,
etc...etc... Well, see for yourself. Here's my space breakdown for the
partitions on my main file server Fumus (Smoke, in Latin):
fumus:/pub/mp3 # df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda3   3.0G  2.1G  804M   72%   /
/dev/hda1   129M  6.8M  115M   6%    /boot
/dev/hda5   9.8G  1.8M  9.3G   1%    /home
/dev/hda6   20G   13G   6.3G   67%   /pchome
/dev/hda8   40G   22G   17G    57%   /pub
/dev/hdb1   75G   38G   33G    53%   /pub/mp3
/dev/hda7   1.9G  20k   1.8G   1%    /scratch
/dev/hdc1   74G   34G   40G    46%   /pub/mp3_2
/dev/hdd1   74G   36G   37G    49%   /pub/software
So, here's what I looked at:
Tape: For the size I'd need: Way WAY too expensive. When I brought
the media down into the range I'd afford, I'd be swapping tapes all week
to get a backup done. Not time effective.
CD-R: Faster, yes, but at 650 megabytes per media, same problem as
tape, only you've traded magnetic for optical.
Extra hard drives in the same machine: Originally, this is exactly what
I had done with a single file server running Reiser file systems in the
more experimental days. I got the scare (and lesson) of my life when
Reiser went a bit nuts, and started corrupting some of my data. I only
lost about one percent, but I vowed, never never NEVER again would I
backup data on a critical machine on live media in the same machine.
Okay, so here's what I finally DID select as my solution: A second
machine called Ignis (Fire in Latin) that uses the absolutely identical
configuration, right down to the types and number of drives, partition
sizes, everything. They both connect into my 100Mb network switch, and
Ignis rsync's from Fumus every hour on the hour thanks to scripts in
/etc/cron.hourly
In fact, here's Ignis' /etc/cron.hourly/rsync_with_fumus script:
rsync -arul --one-file-system --quiet fumus:/pub/mp3_2 /pub
rsync -arul --one-file-system --quiet fumus:/pub/mp3 /pub
rsync -azrul --one-file-system --quiet --delete --force fumus:/pub/software /pub
rsync -azrul --one-file-system --quiet --delete --force fumus:/pub /
rsync -azrul --one-file-system --quiet --delete --force fumus:/pchome /
Is this a bit extreme? Yes. But... if, gods forbid, Fumus really does
let out its magic smoke, or Ignis does catch on fire, and the physical
media were actually damaged, hopefully the damage would be limited to
*one* case, and wouldn't end up taking both machines out. Then I really
would be crying the blues.
Oh yes, and each machine is on its own 900VA UPS. I'm not playing
THAT game. :-)
Digital movies can be hard to replace. Is replacing 100's of megs of .mp3s really trivial? Certainly not impossible, but still a pain.
Obsoleted OS's. If someone is using win95, and it's got all the patches, you may want to back up everything, due to lack of support.
The ability to get everything back w/o reinstalling and downloading service packs and patches is a huge plus to most people.
The Kruger Dunning explains most post on
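# rsync.list holds one "hostname:/directory" entry per line; skip comment lines
for i in `grep -v "^#" rsync.list`
do
    HOSTNAME=`echo $i | awk -F: '{print $1;}'`
    DIRECTORY=`echo $i | awk -F: '{print $2;}'`
    DATE=`date +%A`
    # one directory per host per weekday, e.g. /vol/backup/<host>/Monday
    install -d /vol/backup/$HOSTNAME/$DATE
    # only files that differ from the 'current' tree get stored
    rsync --numeric-ids --compress --rsh=/usr/bin/ssh --recursive --archive --relative --sparse --one-file-system --compare-dest=/vol/backup/$HOSTNAME/current $HOSTNAME:$DIRECTORY /vol/backup/$HOSTNAME/$DATE
done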
Then once a week we run a similar script that updates the 'current' directories and uses --delete
(rsync.list contains entries like "hostname:/some/mounted/partition")
To backup a 100GB drive, you require...
- 6 DVD+RW (18 GB) discs, or
- 20 DVD-RAM (5.2 GB) discs, or
- 158 CD-R discs, or
- 72,818 HD 3.5" floppy discs
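For the curious, those counts can be reproduced with a little shell arithmetic. This is just a sketch of the rounding-up division. The post doesn't say which units it used, so binary units (1 GB = 1024 MB) and a 1440 KiB floppy are my assumptions; they happen to reproduce the figures above.

```shell
#!/bin/sh
# Sketch: how many discs of a given size are needed to hold 100 GB.
# All sizes are in KiB so plain integer arithmetic is enough.
# Binary units (1 GB = 1024 * 1024 KiB) are an assumption.
discs_needed() {
    total_kib=$1
    disc_kib=$2
    # Integer ceiling division: a partly filled disc still counts as one.
    echo $(( (total_kib + disc_kib - 1) / disc_kib ))
}

TOTAL=$((100 * 1024 * 1024))                  # 100 GB in KiB

discs_needed "$TOTAL" $((18 * 1024 * 1024))   # 18 GB DVD+RW   -> 6
discs_needed "$TOTAL" 5452595                 # 5.2 GB DVD-RAM -> 20
discs_needed "$TOTAL" $((650 * 1024))         # 650 MB CD-R    -> 158
discs_needed "$TOTAL" 1440                    # 1440 KiB floppy -> 72818
```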
My car gets 40 rods to the hogshead, and that's the way I likes it!
But I can't see any better solution affordable for the casual home user. CD-R's are even shorter-term media - I've already had 5-year-old CD-R's become unreliable to read, while my 8-year-old hard drive (it's not even an expensive one - some cheap Conner Peripherals thing that came with my Packard Bell) is working 100% perfectly.
I'd call hard drives semi-permanent media that can be taken off-site easily, especially if they are mounted in a removable rack, as suggested. If a hard drive is used solely for backup (say, once a week?), MTBF should not be less than 10 years, even for a Maxtor.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Being one of the maintainers of Amanda (www.amanda.org), I'd always been of the opinion that tape backups were the only way to do backups seriously.
The recent explosion in disk capacities and decrease in prices got me to rethink this, just when it came the time for me to set up a home office. When I compared the cost of a reasonably-good tape drive and a number of tapes large enough for me to get at least a month of backups in rotation, and computed how many 60GB disks I could buy with that money, the solution was clear.
I ended up setting up 3 machines with 4x60GB each. They're all on RAID 5, such that if any single disk fails, the machine keeps running (actually, I have /boot on RAID 1 over the 4 disks and / on RAID 1 over 2 of the disks and an alternate root to test upgrades over the other 2, but you get the point). This got me blazingly fast disk access, that tapes would never help me get :-)
I get all my backup-worthy data rsynced over to the other machines daily or so. I plan to start playing with InterMezzo soon, so that I don't have to remember to run these backups, and so that I don't run these backups in the wrong direction.
But that's not all. With the mind-boggling amount of disk space I could afford, I could (actually, I will, but you get the idea) set up Amanda to backup interesting portions of my home directory to disk, and also replicate this to at least another of my local machines. Such backups can use software compression, such that they don't take as much space as live data. Also, I intend to use another form of compression: instead of backing up CVS trees (I've got loads of check outs), I'm going to back up only local changes to files, so that, in case of disaster, I can still download the original CVS tree and re-apply patches. But this is still a plan, not something I've got running.
Finally, I've got yet another disk on a remote site, to which I rsync not only the interesting portions of my data, but also my backups. I could convince someone else to run this remote backup site for me by offering this person the speed up of RAID 0 over two disks (one of those mine). As for keeping the secrecy of the data on this remote backup site, I'd just get the backup files encrypted, no big deal.
I can strongly recommend this solution: I got pretty much as much data safety as could be expected from a tape-based backup, without any of the hassle of having to switch tapes and move them off-site and back on-site, and with the bonus of very fast access to local data, unlikely downtime and fast recovery except in case of total disaster (i.e., having all of my local machines fail, in which case I'd have to either download my backups from the remote site over the net or, more likely, take a replacement machine over to the remote backup site and copy files over a fast local network connection, or from disk to disk).
As for getting 4 IDE disks into a single machine, don't even think of using only the 2 IDE controllers that come on most motherboards these days (for RAID set-ups, you really want one IDE disk per controller). There are a few good motherboards that come with 4 IDE controllers, so that you can even have a CD-ROM and/or a CD-RW in addition to the 4 disks. If you can't find such a motherboard that suits your needs, you can always get one of those PCI cards that adds 2 IDE controllers to your machine.
As for the problem of fitting so many disks in a standard ATX chassis, it can be done. Cooling may be a problem, but a good cooler has been good enough.
All in all, I'm very happy with this arrangement. It was not cheap, but it was not as expensive as a tape-based solution, and it's far more flexible, way faster and it doesn't require any baby-sitting after you get it going. And I can keep far more backup history than I thought was going to be possible.
The chances aren't as bad as you make them sound...
Let's just say you have an internal RAID system with, oooo, 4 drives, along with a removable drive to back up everything. The problem lies in the whole trusted/shared medium concept. If there is a surge passed along the case, the SCSI/IDE cable, or through the power-supply cabling, not only will ALL of your drives get toasted, but if you have the backup hard drive connected to the system (actively archiving your data, finished archiving and waiting for you to remove it, or just because of a BAD practice of never actually removing the removable hard drive) you will lose your backup hard drive as well.
While RAID is a good thing, multiple hard drives are still at the mercy of everything they are connected to. Using such a system as your only backup is a bad idea that happens too often. Having a removable hard drive is an option, but the work involved really makes other solutions much more viable, especially on a large scale (and on a small scale, people are lazy!).
I propose a network-based automatic backup system for most people. You simply have your main system automatically back up its data over the network to another system (systems with low number-crunching capabilities can be put back to work here). Of course, you would want to maintain at least 2 concurrent backups in case the main system dies during the said backup. The benefits of network backup are that human intervention is not required (the user and administrator don't need to do much of anything after initial setup) and off-site backups can happen transparently (just send the data to the other office down the street, across town, whatever).
Speed of the network may appear to be a problem, but 100 GigaBytes (UNCOMPRESSED) can be transferred in 2.22... hours over 100Base-TX. First of all, it's likely you'll be compressing that data, which on average halves the size, and so the time is halved as well. Secondly, Gigabit over Cat-5 is available at $45 per NIC, making backups take one-tenth that time. And finally, an encrypted SSH, IPSec, PPTP, (etc) tunnel could be established that would ensure the data is kept private. Data security is much more difficult when you have multiple copies of it unencrypted in a conveniently sized package. You are just saying 'steal me'.
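The 2.22 hour figure is easy to check. A rough sketch of the arithmetic, assuming decimal gigabytes and a link running at full line rate with no protocol or disk overhead (an idealisation, so real backups would take longer):

```shell
#!/bin/sh
# Sketch: raw transfer time at full line rate, ignoring protocol
# overhead and disk speed (an idealisation).
transfer_seconds() {
    gigabytes=$1      # decimal gigabytes (10^9 bytes)
    megabits=$2       # link speed in Mbit/s
    # 1 GB = 8000 megabits; divide by the link speed to get seconds.
    echo $(( gigabytes * 8000 / megabits ))
}

transfer_seconds 100 100     # 100 GB over 100Base-TX  -> 8000 s, about 2.22 hours
transfer_seconds 50 100      # compressed to half size -> 4000 s
transfer_seconds 100 1000    # 100 GB over gigabit     -> 800 s
```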
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant