Best Backup Server Option For University TV Station?
idk07002 writes 'I have been tasked with building an offsite backup server for my university's television station to back up our Final Cut Pro Server and our in-office file server (a Drobo), in case the studio spontaneously combusts. Total capacity between these two systems is ~12TB. Not at all full yet, but we would like the system to have the same capacity so that we can get maximum life out of it. It looks like it would be possible to get rack space somewhere on campus with Gigabit Ethernet and possibly fiber coming into our office. Would a Linux box with rsync work? What is the sweet spot between value and longevity? What solution would you use?'
Holy crap we're approaching the need for an Ask Slashdot FAQ. I feel old.
Try one of these babies on for size. 67TB for about $8,000.
There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.
Talk to a mechanical engineering student on campus, they can probably help with that.
A couple of details you'd need to fill in before people could give legitimate advice.
What's the rate of change of that 12TB? Is it mostly static or mostly dynamic? I would assume it's mostly write-once, read-rarely video, but maybe not.
Do you have a budget? As cheap as practical, or is there leeway for bells and whistles?
Is this just disaster recovery? You say that if the station gets slagged you want a backup. How quickly do you want to restore: minutes, hours, next day?
Do you need historical dumps? Will anybody want data as it existed last month?
Is it just data you're dumping, or some Windows app complete with Windows registry junk that needs to be restored? (I don't know anything about Final Cut Pro.)
If you just want to dump data and restore speed isn't critical, as long as you can do it in some reasonable time frame, then sure, rsync'ing to some striped 6 (or 12) TB SATA array is plenty good.
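For a feel of what "some time frame" means over the Gigabit link mentioned in the question, here is a back-of-the-envelope sketch in Python. The 70% link efficiency and the 100 GB nightly delta are assumptions picked for illustration, not numbers from the poster.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move size_tb (decimal terabytes) over a link_gbps link.

    efficiency is an assumed fraction of line rate actually achieved
    (protocol overhead, disk speed, other traffic); 0.7 is a guess.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


# Full restore of the whole 12 TB versus a hypothetical 100 GB nightly delta.
for label, size_tb in [("full 12 TB", 12.0), ("100 GB nightly delta", 0.1)]:
    print(f"{label}: about {transfer_hours(size_tb, 1.0):.1f} h over Gigabit Ethernet")
```

Even at full line rate a complete 12 TB restore over Gigabit takes more than a day, so "minutes" is only realistic for partial restores.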
That's all you need. We even use a script to create versioned backups going back six months using perl as a wrapper.
Assuming the same paths, edit to your liking. I've made the scripts available at http://www.secure-computing.net/rsync/ if you're interested. It requires that the system running the script have root ssh access to the boxes it's backing up. We use password-less ssh keys for authentication.
The README file has the line I use in my crontab. I didn't write the script, but I've made a few modifications to it over the years.
Does your university have a backup solution you can make use of? The one I work at lets researchers onto their Tivoli system for the cost of the tapes. I think I've got somewhere in the neighborhood of 100TB on the system and ended up being the driving force behind a migration from LTO-2 to LTO-4 this summer. If you are going to roll your own and use disks, I'd recommend something with ZFS: you can make a snapshot after every backup so you can do point-in-time restores.
Also, I'd recommend more capacity on backup than you have now to allow versioning. I was the admin for a university film production recently (currently off at I believe Technicolor being put to IMAX) and I've lost track of the number of times I had to dig yesterday's or last week's version off of tape because someone made a mistake that was uncorrectable.
What solution would you use?
First of all, I love linux. Use it for my own file servers, and media machines, and routers, and pretty much everything except desktops.
That said...
For your task, I would probably just build an exact duplicate of the "real" machine and sync them nightly. Always keep in mind that if you have no way to quickly recover from a disaster, you don't actually have a backup.
That said, and if possible, I would also build the "backup" machine with more storage than the "real" machine. As someone else pointed out, you'll probably discover within a few days that your food-chain-superiors have no concept of "redundancy" vs "backup" vs "I can arbitrarily roll my files back to any second in the past 28 years". Having at least nightly snapshotting, unless your entire dataset changes rapidly, won't eat much extra disk space but will make you sleep ever so much better.
You may want to check out rdiff-backup also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states. However, because it keeps these diffs in addition to the mirror, it's better if you have more space on the backup side.
There are a few different frontends/guis to it but I don't have experience with them.
Why not a complete duplicate of all of the hardware? If the studio combusts you have an exact copy of everything.. hardware and all. If you use any kind of disk imaging software, you can simply recover to the server with the latest image and lose very little data.
I recommend losing the Drobo as fast as you can. I know 4 people who bought these and all 4 lost data in the first year.
Everything your TV station broadcasts will automatically be backed up here.
What I use is BackupPC. It's a very nice web front end to tar over ssh.
For Linux, all the remote servers need is sshd listening somewhere, with the BackupPC server's public key in an authorized_keys file. It will pipe tar streams over an ssh connection.
For windows, it can use samba to backup over SMB
I run a copy on my home file server, which backs up all the machines in the house, plus the couple servers I have out in colo.
When it performs an incremental backup, after it is done it will populate its timestamped folder with hardlinks to the last full backup for duplicate files, so restoring from any incremental will still get the full version of a file no matter when it was last backed up.
Also, after each backup, it hashes every file and compares it against the previous backup. If the files match, it deletes the second copy and hardlinks it to the first copy of the file.
I have nearly 3 months worth of backup retention, backups every 3 days (every day on a couple), but for the base system and files that rarely change, each 'copy' does not take up the same amount of disk space.
It is very good at saving disk space.
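The hardlink trick is easy to try on a small scale. What follows is a minimal Python sketch of the general idea (hash every file, replace duplicates with hard links); it is not BackupPC's actual pooling code, and the function names `file_digest` and `dedupe` are made up for illustration. It assumes a filesystem that supports hard links.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe(root: Path) -> int:
    """Replace duplicate files under root with hard links.

    Returns the number of files that were relinked.
    """
    first_seen: dict[str, Path] = {}
    relinked = 0
    # Build the file list up front so temp files created below are not visited.
    files = sorted(p for p in root.rglob("*") if p.is_file() and not p.is_symlink())
    for path in files:
        keeper = first_seen.setdefault(file_digest(path), path)
        if keeper is path or os.path.samefile(keeper, path):
            continue  # first copy, or already a hard link to it
        tmp = path.with_name(path.name + ".dedupe-tmp")
        os.link(keeper, tmp)   # new name for the keeper's data
        os.replace(tmp, path)  # swap it in place of the duplicate
        relinked += 1
    return relinked
```

After a run, every duplicate name still exists and still reads back the same bytes, but the data is stored once.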
Here are some stats from its main page as an example:
There are 7 hosts that have been backed up, for a total of:
* 26 full backups of total size 38.34GB (prior to pooling and compression),
* 43 incr backups of total size 0.63GB (prior to pooling and compression).
Pool is 10.11GB comprising 108499 files and 4369 directories (as of 9/16 01:00).
Restoring gives you a file browser with checkboxes. After you tell it what you want, it can send you a tar(.gz) or .zip file, or it can directly restore the files via tar over ssh back to the machine they came from, by default in the original location, but that can be changed easily too.
The main downside is the learning curve. But once you get things down, you end up just copying other systems as templates, updating the host/port/keyfile/etc settings.
Also, with all those hard links, it makes it a pain to do any file/folder manipulation on its data dir.
Most programs won't recognize the hard link and just copy the file, easily taking up the full amount of storage.
But it works just as well with only itself and one remote server.
Schedule it to start at night and stop in the morning, set your frequency and how much space to use before it deletes old backups, and let it run.
Don't use rsync to make backups. You don't just want to back up against spontaneous combustion; inevitably, there will be accidental deletions and the like occurring in your studio. If you use rsync (with --delete, as any sane person would, otherwise your backup server will fill up in days, not years), then when some n00b runs `rm -rf ~/ReallyImportantVideos`, they'll be deleted from the backup too.
Remember that pro photography website that went down, because their "backup" was a mirroring RAID setup? Yep, they lost all their data in one fell swoop when somebody accidentally deleted the whole lot. Don't make the same mistake.
Use an incremental backup tool. Three that come to mind are rdiff-backup, Dirvish, and BackupPC.
I would think that rdiff-backup would suit your needs best. I currently use BackupPC at home, which is great for home backups, but I think that it's overkill (and possibly a bit limited) for what you want.
Hope this helps!
Why do anything when you can pay someone else twice as much? 12TB from Amazon will be an order of magnitude more expensive than just running a storage server, and you have to pay for internet bandwidth instead of just running a wire.
I love rdiff backup but I'd never use it on any large datasets. I attempted to use it on ~ 600 GB of data once with about 20GB of additions every month and it ran dog slow. As in taking 6+ hours to run every day (there were a lot of small files, dunno if that was the killer).
For larger datasets, like what the poster has, I'd go with a more comprehensive backup system, like bacula. I use that to backup about 12TB and it's rock solid and fast. There's a bit of a learning curve, but the documentation is very good.
If Bacula is too intimidating rsnapshot would be a viable route, it's similar to rdiff-backup, but simpler (pretty much just rsync + cp using hard links), faster, and easier to use. It's not as space efficient, but diffing video data is probably a waste of time anyway.
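To make the "rsync + cp using hard links" idea concrete, here is a rough Python sketch of a hard-link snapshot. It is not rsnapshot's code: the `snapshot` function and the `day1`/`day2` naming are hypothetical, it ignores symlinks and permissions, and it compares file contents byte by byte where the real tools typically look at size and modification time.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, dest_root: Path, name: str,
             previous: Optional[str] = None) -> Path:
    """Create dest_root/name as a complete-looking copy of source.

    Files identical to the same path in the previous snapshot are
    hard-linked to it (no extra space); new or changed files are copied.
    Files deleted from source simply do not appear in the new snapshot,
    while older snapshots keep them.
    """
    target = dest_root / name
    target.mkdir(parents=True, exist_ok=True)
    prev = dest_root / previous if previous else None
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = prev / rel if prev else None
        if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)       # unchanged: share the data with the old snapshot
        else:
            shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return target
```

Calling `snapshot(src, dest, "day1")` and later `snapshot(src, dest, "day2", previous="day1")` leaves two browsable trees; an accidental `rm` between the two runs only affects `day2`.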
Photos.
While our storage needs are nowhere near that size, I can attest to the greatness of Bacula. The hardware part is probably up to you, but as far as software goes, I cannot preach this software enough. It's completely cross-platform in terms of systems you can pull data from. The Director and Storage Daemon run flawlessly on every distro of Linux I've tried them on (Slackware, Debian, and Fedora)... and the restores are easy as pie with some of the available interfaces. Configuration is a pain and can take a while, but once it's set, you're done. We have 5 servers, two of which are hosted outside the company and we don't even have physical access to... I was able to set these up to work with the same backup solution as if they were local, with ease. Other internal servers range from Windows 2000 to Mac OS X... all back up without issue: daily incrementals, weekly diffs, and once-a-month fulls.
...
My university is developing a local backup and co-location data center, and I have been one of the major forces in deciding what software we go with. If you are looking for Linux-style freedom, as mentioned before, rsync is all you need. If you happen to be looking for something more professionally supported, there are many options, but I will tell you some of what I have seen.
At significant cost, the primary system I run into is EVault, which works OK, is very stable, and doesn't have too many crazy features. Offsetting that is the horrible, and I mean horrible, cost. Acronis just (as in less than a month ago) came out with their new backup product, which they even give a free trial for. It does bare metal restore among other things, and I was very impressed with it, but it didn't meet some of my requirements and I didn't get to play with it much more. On the cheaper, more janky side of things, I have tried NovaStor backup products with overall horrendous results; stay away from them completely. (things like being able to export data directly to a removable drive for first time transfer is ridiculous!) I am very impressed with a completely off-the-wall solution called RBackup. It seems at first very "made in india" but it has tons of features that are easy to understand (being brandable is a big plus) and generally can be set up quickly or very granularly. If you're using a Windows system you should check it out.
I have also looked at Symantec's and other things, but these so far are a few of the major players in the "I want to remote backup my own data to my own servers" category (which excludes lots of stuff). Since I am still in the review process, I am also curious to see what other people say. I can also tell you that I have set up almost 4 Drobos now and they really rock, so you're doing good on that front!
... BitTorrent pirates. You'll always find last night's shows backed-up on TPB the next morning. Yaaarrr!
Have each student create their "own TV station" as part of their degree requirement - no matter the area of study. Similar to research essays, you'll get the following results: 1) students who completed the assignment with no outside assistance 2) students that copied certain small portions of the data you are backing up and presenting it as their own 3) students that plagiarize everything - yes some students will debate that the same content the TV station has accumulated over the years - all 12 TB - is actually their original work.
As this data appears on the University network, the entire TV station will be backed-up in a local "Cloud". And if these types of assignment become popular at other universities, you can expect to find redundant off-site backups. By this point, the 12 TB will appear on BitTorrent (and probably on Newsgroups and IRC for the dedicated plagiarists). A full restore will only take a few days - as long as the full 12 TB is seeded.
Do you know what -l does?
Anyway, I have a Fedora box with a RAID 5 made of four 1 TB disks. There is a partition on the RAID called /backup0. That's not really a backup, but more meant as a convenience. I back up all my data to /backup0, then right away use rsync to copy the new data to an external drive that is either /backup1 or /backup2.
I have a safe deposit box at my bank. Every week or two I swap the external drive on my desk with the external drive in the safe deposit box.
So the reason I have that /backup0 filesystem is so that I don't have to sync the two external drives to each other - otherwise I would have to make twice as many trips to the bank, and there would be some exposure were my house to burn down while I had both external drives at home.
My suggestion for you is to find two other University facilities that are both far away, and offer to trade offsite backup services with them.
You would have two backup servers in your TV station - one for each of your partners - and they would also each have two, one each for you, as well as for each other.
That way only a hit by a large asteroid would lose all your data.
I got religion about backing up thoroughly after losing my third hard drive in twenty years as a software engineer. Fortunately I was able to recover most of that last one, but one of the other failures was a total loss, with very little of its data being backed up.
Request your free CD of my piano music.
Backups for UNIX, backups for Windows, and backups all across the board almost require different solutions.
For an enterprise "catch all" solution, I'd go with TSM, Backup Exec, or Networker. These programs can pretty much back up anything that has a CPU, although you will be paying for that privilege.
If I were in an AIX environment, I'd use sysback for local machine backups and backups to a remote server.
If I were in a general UNIX environment, I'd use bru (it used to be licensed with IRIX, and has been around so long, it works without issue with any UNIX variant.) Of course, there are other solutions that work just as well, both freeware, and commercial.
If I were in a solidly Windows environment, I'd use Retrospect, or Backup Exec. Both are good utilities and support synthetic full backups so you don't need to worry about a full/differential/incremental schedule.
If I were in a completely mixed environment, I'd consider Retrospect (it can back up a few UNIX variants as well as Macs), Backup Exec, or an enterprise level utility that can back up virtually anything.
Please note, these are all commercial solutions. Bacula, Amanda, tar over ssh, rsync, and many others can work just as well, and likely will be a lot lighter on the pocketbook. However, for a business, some enterprise features like copying media sets, or backing up a database while it is online to tape or other media for offsite storage may be something to consider for maximum protection.
The key is figuring out what you need for restores. A backup system that is ideal for a bare metal restore may be a bit clunky if you have a machine with a stock Ubuntu config and just a few documents in your home directory. However, having 12 terabytes on Mozy, and needing to reinstall box from scratch that has custom apps with funky license keys would be a hair puller. Best thing is to use some method of backups for "oh crap" bare metal stuff, then an offsite service just in case you lose your backups at that location.
Figure out your scenario too. Are multiple Drobos good enough, or do you need offsite storage in case the facility is flooded? Is tape an option? Tape is notoriously expensive per drive, but is very economical once you start using multiple cartridges. Can you get away with plugging in external USB/SATA/IEEE 1394 hard disks, backing to them, then plopping them in the Iron Mountain tub?
The hard drives are desktop class, not designed for 24x7 operation, and not designed for the massive write traffic that server backups generate.
Latent defects on disks are a real concern.
You write your data to a disk, but there's a bad sector, or miswrite, and when you go back later (perhaps when you need the backup), there are errors on the data you are reading from the disk.
Moreover, you have no way of detecting it, or deciding which array has recorded the "right value" for that bit...
That is, unless every bit has been copied to 3 arrays,
and every time you read data, you compare all 3 (or you have two copies and a checksum).
Well, the complexity of this redundancy reduces the reliability overall, and it has a cost.
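Both schemes fit in a few lines of Python. This is only a sketch of the idea, with made-up function names, operating on in-memory byte strings rather than real arrays: either keep a checksum recorded at write time and accept whichever copy still matches it, or take a byte-wise 2-of-3 vote across three copies.

```python
import hashlib
from typing import Optional, Sequence


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def pick_good_copy(copies: Sequence[bytes], expected_digest: str) -> Optional[bytes]:
    """Two copies plus a checksum: return the first copy that still matches."""
    for data in copies:
        if sha256(data) == expected_digest:
            return data
    return None  # every copy is damaged


def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Three copies, no checksum: byte-wise 2-of-3 vote."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies differ in length")
    out = bytearray()
    for x, y, z in zip(a, b, c):
        if x == y or x == z:
            out.append(x)
        elif y == z:
            out.append(y)
        else:
            raise ValueError("all three copies disagree at some byte")
    return bytes(out)
```

With only two copies and no checksum, a mismatch tells you something is wrong but not which side to trust, which is the point made above.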
Since we're talking about Final Cut data, it's safe to assume that it's all coming from Macs. The version of cp on Mac OS doesn't take either of those options, so it's a moot point.
Time Machine is probably the way to go. It's integrated into Mac OS, and it's ridiculously easy to set up. I don't know how it scales up, but I'd be very surprised if it couldn't handle 12TB.
rsnapshot + mdadm raid6. Agreed 100%. That's what I'm currently using. Works like a charm for over 2 years now (and single HDD failure in meantime).
I also use rsync and OpenSolaris/ZFS to keep daily backups. BUT - important: If the content is made of big files that change slightly each day (e.g. VMWARE/VirtualBox disk images), make sure you also use "--inplace" when you do the rsync, so that you take advantage of the copy-on-write semantics of ZFS. For example, I am using rsync to back up a VMWARE server to an OpenSolaris/ZFS fileserver, where the virtual disks are huge "vmdk" files - in the order of 10GB each. These huge files change only a little each day (less than 1%) - rsync would indeed realize this and only copy over the network the parts that changed, but it would store completely new copies in the backup server for each day! (I am assuming here that you would ZFS-snapshot each day). If instead you use the --inplace option of rsync, rsync will not only send the blocks that changed, but it will also only write the blocks that changed - thus, your ZFS will be able to host many years' worth of daily snapshots of these "vmdk", a truly marvelous thing, if you think about it...
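A quick way to see why this matters is to count how many blocks actually differ between two versions of a file, and to compare the snapshot growth in the two cases. The Python sketch below is a rough model only: the 128 KiB block size is assumed to match the filesystem's record size, the helper names are invented, and it treats the daily change as touching a matching fraction of blocks, which understates the cost when small edits are scattered across many blocks.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 128 * 1024) -> int:
    """Count fixed-size blocks that differ between two versions of a file."""
    length = max(len(old), len(new))
    changed = 0
    for off in range(0, length, block_size):
        if old[off:off + block_size] != new[off:off + block_size]:
            changed += 1
    return changed


def snapshot_growth(file_bytes: float, daily_change_fraction: float,
                    days: int, inplace: bool) -> float:
    """Approximate extra bytes held by daily snapshots of one file.

    Rewritten in place, only the changed part is new to the filesystem;
    written as a fresh file each day, the whole file is new every day.
    """
    per_day = file_bytes * daily_change_fraction if inplace else file_bytes
    return per_day * days


# The figures from the comment above: a 10 GB vmdk changing about 1% per day.
for inplace in (True, False):
    gb = snapshot_growth(10e9, 0.01, 365, inplace) / 1e9
    print(f"inplace={inplace}: roughly {gb:.1f} GB of snapshots after a year")
```

Under these assumptions, a year of daily snapshots costs tens of gigabytes with in-place writes versus several terabytes without.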
Why bother?
OpenSolaris will run rsync just fine and it is also free.
There are a lot of good solutions out there so I wouldn't limit myself to just Linux.
You have OpenFiler running on Linux.
You have FreeNAS on BSD.
And you could roll your own on OpenSolaris and use ZFS with fancy gui tools if you really want to.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Even though I'm writing this from a linux box, if you're going to be storing that much data and you want to do it cheaply, you should really look at ZFS as the filesystem of choice for the backend.
As for moving the data over there, sure use rsync and then use zfs's snapshot features so you have some rollback capability.
Why ZFS? So I'm envisioning that you're going to need a mid-range machine (dual power supplies), and hanging off that you're going to have a whole pile of JBOD. You could spend the money on something that does hardware-based RAID, but if you're cost-conscious, your best route is to buy a JBOD box and fill it with 1.5TB disks. You could try to manage all of this with LVM and possibly XFS, but it would be a nightmare. ZFS basically rolls RAID/LVM/FS into a single layer, so adding disks to your array becomes trivial. Also, I would recommend that each user/application get its own sub-filesystem on the array; that way you'll have much finer granularity for snapshots, quotas, etc.
I didn't intend this post to be an advertisement for ZFS, but I have such a setup with ~14TB of disk on it right now and it works great. As for the OS on top, you could go with OpenSolaris, or Nexenta (which is just a Debian-style userland rolled on top of the OpenSolaris kernel).
Yes Francis, the world has gone crazy.
VMWare Snapshots
Are you backing up just data, or configurations or what? Backup Solutions are nice and all, but you're still missing something .... all the crap^H^H^H^H configurations that you've collected over the years of using that particular setup.
And once you go to VMWARE (or other VM product) you'll quickly realize that the abstraction away from specific Hardware is very nice indeed.
However, if one is REALLY concerned about backups, a duplicate hardware setup in a separate location sitting idle (or cold) is a necessity. And having a VMware snapshot ready to load on backup hardware is just tits when things REALLY go south. You end up looking like a genius, and get to play Scotty (over-engineered everything).
The difference between amateurs and professionals shows not when things are going well, but when the shit hits the fan. A weekend geek can build the $8,000 backup server with however much storage, but once the drives start to fail (and they will) that solution starts to REALLY suck because you can't get to the freaking drives easily (and I doubt it will even tell you that a drive failed).
Let me just say it this way, if you can't afford "over engineered" equipment, you can't afford to do it right.
So, VMware, snapshots and spare hardware offsite are the way to go. Anything less these days is simply weekend geek pride.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Why bother?
See GP. If the hardware I want isn't supported by Solaris, but is supported by Linux, I'll want to use that.
OpenSolaris will run rsync just fine
It'll also run NFS, so if the hardware will support it, you do have a point -- even if I "needed" Linux for some reason, I could still use Solaris for the physical storage.
Don't thank God, thank a doctor!
For a multi-vendor environment, take a look at Unitrends. I use them and they are really sweet: disk to disk, any OS, bare-metal Windows (and Linux), hot-swappable off-site drive or off-site vaulting. Plus, there is no charge for clients if you want to back up a database or an Exchange server. It's all inclusive, even the open file client.
In my experience, getting open files backed up is the hardest thing in a 24/7 environment.
Cheap storage VM.