Best Way to Back Up Photos and Video?
jsalbre writes "I do a lot of digital video work, and my wife is a professional photographer. With raw DV from the video camera using up 11GB/hr, and raw images from the digital SLR using 7MB each, I'm quickly using up a lot of space. I currently back up all my important files each night from one hard drive to another, but I now have over 200GB of irreplaceable data (more than just DV and photos, but those make up the largest chunk) and I'm having to exclude the "less important" irreplaceable files as my backups have started failing. Several people have suggested backing up vital unchanging files (video, images) to DVD and continuing to back up frequently accessed files to hard drive, but with recent studies showing that optical media doesn't last very long, I don't want to come back in a few years and find that all my backups are useless. Not to mention that some of my DV files are larger than even a dual-layer DVD, and it would be nearly impossible to automate backup to DVD. How do other Slashdotters back up their important data? I'd appreciate a distinction between methods for frequently accessed files and for infrequently accessed files. Any suggestions will be highly appreciated!"
Since when is tape archival quality? It's barely backup quality. I've had way more properly stored tapes fail than I have properly stored optical media.
Treat optical media like magnetic media (store in a cool, dry place) and use high-quality media and you'll get far better results than tape.
Add in the speed at which tape drives become obsolete and tapes become hard to obtain, while CDs are still readable, and I've found optical to be a superior archive medium.
If you examine the study cited, you'll notice that the study is for optical media in harsh conditions. Additionally, they specifically state "It is demonstrated here that CD-R and DVD-R media can be very stable (sample S4 for CD-R and sample D2 for DVD-R). Results suggest that these media types will ensure data is available for several tens of years and therefore may be suitable for archival uses."
While I love RAID, RAID is not a backup - RAID is about availability and consistency. So if you delete an item on a RAID array, it is SUPPOSED to be gone from the entire array.
In everything I've read, the moral definitely seems to be hard drives, lots of hard drives, for price/performance. I'm assuming you have a reasonable LAN or can set one up.
Here's the setup I haven't finished implementing yet. PLEASE give me any comments that would help me improve it.
1. Set up a file server using at least one big, inexpensive disk. (This can also be a desktop as long as it can reasonably serve files.) This is your "USE" server.
2. Separate your files (on a per-directory basis) into categories based on how frequently they change. The important consideration is: 'If a file is changed on or deleted from USE, how long should I wait before deleting it from the backup?' Personally, I only need two categories: "current" = a month or so, depending on disk space, and "archive" = never (family pics, videos, etc.).
That means that if I delete something in my "current" tree _AND_ I don't notice for a month, my backups will delete it and it's gone forever.
3. Set up a 'backup server' using at least one inexpensive hard disk. Set your backup server to log in to your USE server and sync your files.
It should be able to do both "full" (copy everything) and "incremental versioning" = "IV" (if something is changed, keep BOTH copies, marking them appropriately) backups. Neither of these kinds of backups should ever eliminate any information automatically - they should just add information.
4. For me, I'd run:
1) An IV backup of "archive" every night.
2) A full backup of "current" every week.
3) An IV backup of "current" every night.
4) A job that deletes the oldest backups of "current" every week.
Notice that I'm _never_ running a full backup of "archive" but I'm also _never_ deleting the backup.
Notes:
rsync or rsync over ssh is my preference for doing this kind of backup. It works very nicely, but I'm too tired to get it right just this minute, so I'm leaving the IV/full backup commands as exercises for other /. readers, but it's two 1-line scripts and I've seen them on here before :)
cron is fine for setting it up automatically.
wget has similar functionality to rsync for a website and you don't need any privileges.
I think most of /etc belongs in "current".
Do make sure you log the output of your syncing software. Also make sure you monitor disk usage. If you want to be fancy, it could keep all of the full-backups of "current" until space is short (with a reasonable margin) and then always delete as many of the oldest ones as it needs to to make enough room. This means your number of snapshots will vary with disk space - some people think that's evil.
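The "delete the oldest ones until there's enough room" idea might look roughly like the Python sketch below. This is only an illustration under assumptions that are not in the post: the function name prune_oldest, its parameters, and the convention that each full backup of "current" lives in its own directory named by date (so that sorting by name sorts by age) are all made up for the example.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def prune_oldest(snapshot_root: Path,
                 min_free_bytes: int,
                 keep_at_least: int = 1,
                 free_bytes: Callable[[Path], int] = lambda p: shutil.disk_usage(p).free,
                 ) -> List[str]:
    """Delete the oldest snapshot directories until enough space is free.

    Assumption: snapshot directories are named so that sorting by name
    sorts by age (for example 2005-06-01, 2005-06-08, ...).
    Returns the names of the snapshots that were deleted, oldest first.
    """
    snapshots = sorted(p for p in snapshot_root.iterdir() if p.is_dir())
    deleted: List[str] = []
    # Re-check free space after every deletion, and never go below
    # keep_at_least snapshots no matter how full the disk is.
    while free_bytes(snapshot_root) < min_free_bytes and len(snapshots) > keep_at_least:
        oldest = snapshots.pop(0)
        shutil.rmtree(oldest)
        deleted.append(oldest.name)
        print(f"pruned snapshot {oldest.name}")  # goes to the job's log
    return deleted
```

The free_bytes parameter exists only so the logic can be tried without filling a disk; in normal use the default (shutil.disk_usage) is what you'd want. The keep_at_least floor is there so a full disk can never cause the job to delete the last remaining backup.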
This system scales reasonably well - for more size add more harddrives per server and/or more servers. For redundancy add more backups per live copy. As long as you can keep it organized and your network handles it, there's also no reason a USE server can't be served by two backup servers or a backup server can't also serve several smaller workstations - or any combination thereof.
Do not add multiple hard drives to a backup server for redundancy. These servers are essentially free, and you get much more redundancy (and some scalability) if you use two backup servers. With a setup like this, any server should only have one copy (excepting multiple versions of the same tree).
You could just do a full backup of "current" every night or whatever, and you could have many other, possibly more complicated, "current" backup schemes. But for me the total size of "current" is massively smaller than "archive", so it's really not important. Remember, having more of these isn't more redundant - they're all on the same drive.
This backup server should generally run no services except possibly ssh and certainly shouldn'
Rsync (http://rsync.samba.org/) is really great for backup of Unix-like systems. The ability to hardlink identical files allows me to store hundreds of daily full images of 100GB of sources on a single target 250GB hard disk. Rsync is very smart about moving only changed data over the network, resulting in speedups of 10x to 100x. This allows me to do a full backup to my offsite colo without using a lot of bandwidth. Note that Rsync is great for Mac/Unix/Linux, but it does sometimes have problems with windoze clients. But then, so do I ...
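For anyone who hasn't seen the hardlink trick, here is a minimal Python sketch that just assembles the rsync command line for one dated snapshot. It relies on rsync's --link-dest option, which hardlinks files that are unchanged since the previous snapshot instead of copying them again. The function name, the host name, the paths, and the one-directory-per-day layout are all assumptions made up for the example; this is not how Dirvish itself is implemented.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def rsync_snapshot_command(source: str,
                           snapshot_root: PurePosixPath,
                           today: str,
                           previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that writes a new dated snapshot.

    source        -- local path or host:path to copy from (hypothetical)
    snapshot_root -- absolute directory holding one subdirectory per day
    today         -- name of the new snapshot directory, e.g. "2005-06-02"
    previous      -- name of the last snapshot, or None for the first run
    """
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps perms/times)
    if previous is not None:
        # Files identical to the previous snapshot become hardlinks,
        # so each "full image" only costs the space of what changed.
        cmd.append(f"--link-dest={snapshot_root / previous}")
    # The trailing slash on the source means "copy the contents of".
    cmd += [source.rstrip("/") + "/", f"{snapshot_root / today}/"]
    return cmd


if __name__ == "__main__":
    cmd = rsync_snapshot_command("use-server:/data/archive",
                                 PurePosixPath("/backup/archive"),
                                 "2005-06-02", previous="2005-06-01")
    print(" ".join(cmd))
```

To actually run it you could pass the list to subprocess.run(cmd, check=True) from a cron job on the backup server, which keeps the backup server-pull. Using an absolute snapshot_root matters because rsync resolves a relative --link-dest path against the destination directory.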
Dirvish (originally written by jw schultz) is a Perl wrapper around Rsync. It facilitates the scheduling and management of Rsync based backups. We have a fairly active mailing list and contributions from around the world (open source is so cool!).
Backups should be safe against:
Backups should be automatic (or they will not get done) and cheap (hard disks are cheaper than tape, and much cheaper when you use hard linking). Rsync stores the data in a file system closely approximating the original, which facilitates restores.
If a cheap electrolytic filter capacitor dries out in your power supply, and the 5V output decides to start making a 15V squarewave instead, everything in your computer case will get fried. Including every one of the RAID disks. External USB enclosures (or airgaps!) protect against host and power supply failure.
If I was really paranoid about protecting my data, I would run a long ethernet cable to a nerdly neighbor a few houses away, and put a second dirvish server there. While I do rotate my drives into ziplok bags in a fire-resistant safe, the maximum credible accident (a furnace explosion) would tear open the firesafe. If I was paranoid and rich, I would use a high bandwidth VPN connection to a big disk in a colo machine in a different city.
The best backup is server-pull, frequent, automated backup onto multiple R/W media in multiple places, and frequent checking of that data. The closer you can approximate this, the more secure your data will be.
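As one possible way to do the "frequent checking" part, the Python sketch below compares SHA-256 hashes of every file under a source tree against the copy in a backup tree and reports anything missing or different. The function names and the assumption that the backup mirrors the source's directory layout are made up for the example; it is a sketch, not a replacement for actually test-restoring files.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte DV files don't fill RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_root: Path, backup_root: Path) -> List[str]:
    """Return a list of problems; an empty list means every file matched.

    Assumption: the backup tree mirrors the source tree's layout.
    Only files present in the source are checked.
    """
    problems: List[str] = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_root)
        copy = backup_root / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

A file that was legitimately edited since the last backup will show up as "differs", so this check makes the most sense right after a sync, or against the "archive" tree where nothing is supposed to change.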
Keith
Keith Lofstrom server-sky.com