Best Backup Server Option For University TV Station?
idk07002 writes 'I have been tasked with building an offsite backup server for my university's television station to back up our Final Cut Pro Server and our in-office file server (a Drobo), in case the studio spontaneously combusts. Total capacity between these two systems is ~12TB. Not at all full yet, but we would like the system to have the same capacity so that we can get maximum life out of it. It looks like it would be possible to get rack space somewhere on campus with Gigabit Ethernet and possibly fiber coming into our office. Would a Linux box with rsync work? What is the sweet spot between value and longevity? What solution would you use?'
Holy crap we're approaching the need for an Ask Slashdot FAQ. I feel old.
Use Final Cut Server.
Try one of these babies on for size. 67TB for about $8,000.
There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.
Talk to a mechanical engineering student on campus, they can probably help with that.
A couple of details you'd need to fill in before people could give legitimate advice.
What's the rate of change of that 12TB? Is it mostly static or mostly dynamic? I would assume it's mostly write-once, read-rarely video, but maybe not.
Do you have a budget? As cheap as practical, or is there leeway for bells and whistles?
Is this just disaster recovery? You say that if the station gets slagged you want a backup. How quickly do you want to restore: minutes, hours, next day?
Do you need historical dumps? Will anybody want data as it existed last month?
Is it just data you're dumping, or some Windows app complete with Windows registry junk that needs to be restored? (I don't know anything about Final Cut Pro.)
If you just want to dump data and restore speed isn't critical, as long as you can do it within some time frame, then sure, rsyncing to some striped 6 (or 12) TB SATA array is plenty good.
One of the most reliable backup solutions I have put in place for most of my clients is Acronis. It does a great job backing up across a network; just schedule it to run during the night, as it will take some bandwidth. I deal with EMS/911 servers, and backups are one of the most important things I recommend to anyone. My setup for one of my biggest clients is a dedicated server running Acronis with 1 TB of hard drive space, backing up 3 mid-size servers every night.
Why build and maintain a server? Just push it to Amazon.
That's all you need. We even use a script to create versioned backups going back six months using perl as a wrapper.
Assuming the same paths, edit to your liking. I've made the scripts available at http://www.secure-computing.net/rsync/ if you're interested. It requires that the system running the script have root ssh access to the boxes it's backing up. We use password-less ssh keys for authentication.
The README file has the line I use in my crontab. I didn't write the script, but I've made a few modifications to it over the years.
Does your university have a backup solution you can make use of? The one I work at lets researchers onto their Tivoli system for the cost of the tapes. I think I've got somewhere in the neighborhood of 100TB on the system and ended up being the driving force behind a migration from LTO-2 to LTO-4 this summer. If you are going to roll your own and use disks, I'd recommend something with ZFS: you can make a snapshot after every backup, so you can do point-in-time restores.
Also, I'd recommend more capacity on backup than you have now to allow versioning. I was the admin for a university film production recently (currently off at I believe Technicolor being put to IMAX) and I've lost track of the number of times I had to dig yesterday's or last week's version off of tape because someone made a mistake that was uncorrectable.
What solution would you use?
First of all, I love linux. Use it for my own file servers, and media machines, and routers, and pretty much everything except desktops.
That said...
For your task, I would probably just build an exact duplicate of the "real" machine and sync them nightly. Always keep in mind that if you have no way to quickly recover from a disaster, you don't actually have a backup.
That said, and if possible, I would also build the "backup" machine with more storage than the "real" machine. As someone else pointed out, you'll probably discover within a few days that your food-chain-superiors have no concept of "redundancy" vs "backup" vs "I can arbitrarily roll my files back to any second in the past 28 years". Having at least nightly snapshotting, unless your entire dataset changes rapidly, won't eat much extra disk space but will make you sleep ever so much better.
You may want to check out rdiff-backup also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states. However, because it keeps these diffs in addition to the mirror, it's better if you have more space on the backup side.
There are a few different frontends/guis to it but I don't have experience with them.
Did you check into CDs?
Why not a complete duplicate of all of the hardware? If the studio combusts you have an exact copy of everything.. hardware and all. If you use any kind of disk imaging software, you can simply recover to the server with the latest image and lose very little data.
I recommend losing the Drobo as fast as you can. I know 4 people who bought these, and all 4 lost data in the first year.
Everything your TV station broadcasts will automatically be backed up here.
What I use is BackupPC. It's a very nice web front end to tar over ssh.
For Linux, all the remote servers need is sshd listening somewhere, with the BackupPC server's public key in an authorized_keys file. It will pipe tar streams over an ssh connection.
For Windows, it can use Samba to back up over SMB.
I run a copy on my home file server, which backs up all the machines in the house, plus the couple servers I have out in colo.
When it performs an incremental backup, once it is done it populates its timestamped folder with hard links to the last full backup for files that did not change, so restoring from any incremental will still get the full version no matter when it was last backed up.
Also, after each backup, it hashes every new file and compares it with the copy from the previous backup. If the files match, it deletes the second copy and hard-links it to the first copy of the file.
I have nearly 3 months worth of backup retention, backups every 3 days (every day on a couple), but for the base system and files that rarely change, each 'copy' does not take up the same amount of disk space.
It is very good at saving disk space.
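To make the pooling idea concrete, here is a minimal Python sketch of content-hash deduplication with hard links. It is not BackupPC's actual implementation; the function names (`file_digest`, `pool_duplicates`) and the choice of SHA-256 are assumptions for illustration, and it only works within a single filesystem that supports hard links.

```python
import hashlib
import os


def file_digest(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def pool_duplicates(root):
    """Replace byte-identical files under root with hard links to one copy.

    Returns the number of bytes freed. Per-file metadata (mtime, permissions)
    of the replaced duplicates is lost, which a real tool would have to handle.
    """
    first_seen = {}  # digest -> path of the first file seen with that content
    saved = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            keeper = first_seen.setdefault(file_digest(path), path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first copy, or already linked to the first copy
            size = os.path.getsize(path)
            tmp = path + ".pooltmp"  # hypothetical temporary name
            os.link(keeper, tmp)     # new directory entry for the kept inode
            os.replace(tmp, path)    # atomically swap it in for the duplicate
            saved += size
    return saved
```

Calling `pool_duplicates("/path/to/backups")` (a placeholder path) would leave every file readable at its original location while storing each distinct content only once.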
Here are some stats from its main page as an example:
There are 7 hosts that have been backed up, for a total of:
* 26 full backups of total size 38.34GB (prior to pooling and compression),
* 43 incr backups of total size 0.63GB (prior to pooling and compression).
Pool is 10.11GB comprising 108499 files and 4369 directories (as of 9/16 01:00),
Restoring gives you a file browser with checkboxes. After you tell it what you want, it can send you a tar(.gz) or .zip file, or it can restore the files directly via tar over ssh back to the machine they came from, by default in the original location, but that can be changed easily too.
The main downside is the learning curve. But once you get things down, you end up just copying other systems as templates, updating the host/port/keyfile/etc settings.
Also, with all those hard links, it makes it a pain to do any file/folder manipulation on its data dir.
Most programs won't recognize the hard link and just copy the file, easily taking up the full amount of storage.
But works just as well with only itself and one remote server.
Schedule it to start at night and stop in the morning, set your frequency and how much space to use before it deletes old backups, and let it run.
Don't use rsync to make backups. You don't just want to back up against spontaneous combustion: inevitably, there will be accidental deletions and the like occurring in your studio. If you use rsync (with --delete, as any sane person would, otherwise your backup server will fill up in days, not years), then when some n00b runs `rm -rf ~/ReallyImportantVideos`, they'll be deleted from the backup too.
Remember that pro photography website that went down because their "backup" was a mirroring RAID setup? Yep, they lost all their data in one fell swoop when somebody accidentally deleted the whole lot. Don't make the same mistake.
Use an incremental backup tool. Three that come to mind are rdiff-backup, Dirvish, and BackupPC.
I would think that rdiff-backup would suit your needs best. I currently use BackupPC at home, which is great for home backups, but I think that it's overkill (and possibly a bit limited) for what you want.
Hope this helps!
If you are willing to try something a little different, the ZFS file system is ideal for this.
while 1:
    rsync to the zfs filesystem;
    snapshot the zfs filesystem;
    delete snapshots more than 1 week old;
We've found that, for data that doesn't change often, you can use this mirroring technique to "backup" three or four TB in ten minutes.
You could also turn on ZFS on-the-fly compression, but it would probably not help here since your source data is likely to be already compressed.
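The rsync/snapshot/prune loop in this comment could be sketched in Python roughly as follows. This is a sketch under assumptions, not the poster's script: the snapshot naming scheme (`backup-YYYYMMDD-HHMMSS`), the function names, and the rsync flags are all made up for illustration. It assumes the `rsync` and `zfs` command-line tools are installed and that the script has permission to snapshot and destroy on the dataset.

```python
import subprocess
from datetime import datetime, timedelta

SNAP_PREFIX = "backup-"          # hypothetical naming scheme
SNAP_FORMAT = "%Y%m%d-%H%M%S"


def snapshot_name(now):
    """Build a snapshot tag that encodes the time it was taken."""
    return SNAP_PREFIX + now.strftime(SNAP_FORMAT)


def expired(snapshots, dataset, now, keep=timedelta(days=7)):
    """Return the snapshots of `dataset` made by this script and older than `keep`."""
    old = []
    for full in snapshots:
        name, _, tag = full.partition("@")
        if name != dataset or not tag.startswith(SNAP_PREFIX):
            continue  # another dataset, or a snapshot we did not create
        try:
            taken = datetime.strptime(tag[len(SNAP_PREFIX):], SNAP_FORMAT)
        except ValueError:
            continue  # tag does not follow our scheme; leave it alone
        if now - taken > keep:
            old.append(full)
    return old


def run_once(source, target_dir, dataset, now=None):
    """One pass of the loop: mirror with rsync, snapshot, prune old snapshots."""
    now = now or datetime.now()
    subprocess.run(["rsync", "-a", "--delete", source, target_dir], check=True)
    subprocess.run(["zfs", "snapshot", f"{dataset}@{snapshot_name(now)}"], check=True)
    listing = subprocess.run(
        ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-r", dataset],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    for snap in expired(listing, dataset, now):
        subprocess.run(["zfs", "destroy", snap], check=True)
```

Because the snapshots keep the old blocks, running rsync with `--delete` against the live filesystem does not destroy last week's files; they stay reachable through the snapshots until those are pruned. `run_once` would typically be driven from cron rather than a literal infinite loop.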
My company is developing a local backup and co-location data center, and I have been one of the major forces in deciding what software we go with. If you are looking for Linux-style freedom, as mentioned before, rsync is all you need. If you happen to be looking for something more professionally supported, there are many options, but I will tell you some of what I have seen. At significant cost, the primary system I run into is EVault, which works OK, is very stable, and doesn't have too many crazy features. Offsetting that is the horrible, and I mean horrible, cost. Acronis just (as in less than a month ago) came out with their new backup product, which they even give a free trial for. It does bare metal restore among other things, and I was very impressed with it, but it didn't meet some of my requirements and I didn't get to play with it much more. On the cheaper, more janky side of things, I have tried NovaStor backup products with overall horrendous results; stay away from them completely. (Things like being able to export data directly to a removable drive for first time transfer is ridiculous!) I am very impressed with a completely off-the-wall solution called RBackup. It seems at first very "made in india" but it has tons of features that are easy to understand (being brandable is a big plus) and generally can be set up quickly or very granularly. If you're using a Windows system you should check it out. I have also looked at Symantec's products and other things, but these so far are a few of the major players in the "I want to remote backup my own data to my own servers" category (which excludes lots of stuff). Since I am still in the review process, I am also curious to see what other people say. I can also tell you that I have set up almost 4 Drobos now and they really rock, so you're doing well on that front!
If you're considering doing incremental or archival backups I would look into using dar. It's sort of like tar on steroids, and is a great little utility. It's also nothing like bleeding edge, runs on both Linux and BSD platforms, and has a Windows port (that I've never used). Combining dar with ssh and some simple shell scripts might be the sort of solution you're looking for.
The Backblaze hardware setup looks impressive and might be worth a look. As for software how about something like openfiler http://www.openfiler.com/ If those 2 could be combined it would make one impressive setup.
I love rdiff backup but I'd never use it on any large datasets. I attempted to use it on ~ 600 GB of data once with about 20GB of additions every month and it ran dog slow. As in taking 6+ hours to run every day (there were a lot of small files, dunno if that was the killer).
For larger datasets, like what the poster has, I'd go with a more comprehensive backup system, like bacula. I use that to backup about 12TB and it's rock solid and fast. There's a bit of a learning curve, but the documentation is very good.
If Bacula is too intimidating, rsnapshot would be a viable route. It's similar to rdiff-backup, but simpler (pretty much just rsync + cp using hard links), faster, and easier to use. It's not as space efficient, but diffing video data is probably a waste of time anyway.
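For readers unfamiliar with the "rsync + cp using hard links" trick, here is a minimal Python sketch of the idea. It is not rsnapshot's code; the `snapshot` function and its parameters are hypothetical, and it assumes source files are regular files and that all snapshots live on one filesystem that supports hard links.

```python
import filecmp
import os
import shutil


def snapshot(source, dest, previous=None):
    """Copy the tree at `source` into `dest`.

    Files that are unchanged relative to the `previous` snapshot are
    hard-linked to the old copy instead of being copied again, so each
    snapshot looks complete but only changed files use new disk space.
    """
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.normpath(os.path.join(dest, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            # shallow=True trusts matching size and mtime, and falls back to a
            # byte comparison when the sizes match but the timestamps do not.
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=True):
                os.link(old, dst)        # unchanged: share the existing data
            else:
                shutil.copy2(src, dst)   # new or changed: store a fresh copy
```

A nightly job would call something like `snapshot("/data", "/backup/2009-09-16", previous="/backup/2009-09-15")` (placeholder paths). Deleting an old snapshot directory only frees the files that no newer snapshot still links to. As the comment says, a changed video file is stored again in full, since nothing here diffs file contents.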
Photos.
ZFS replication and snapshots. Of course, you'd need something which groks ZFS on both sides of the link.
While our storage needs are nowhere near that size, I can attest to the greatness of Bacula. The hardware part is probably up to you, but as far as software goes, I cannot praise it enough. It's completely cross-platform in terms of the systems you can pull data from. The Director and Storage Daemon run flawlessly on every distro of Linux I've tried them on (Slackware, Debian, and Fedora), and restores are easy as pie with some of the available interfaces. Configuration is a pain and can take a while, but once it's set, you're done. We have 5 servers, two of which are hosted outside the company and which we don't even have physical access to. I was able to set these up to work with the same backup solution as if they were local, with ease. Our other internal servers range from Windows 2000 to Mac OS X, and all back up without issue: daily incrementals, weekly diffs, and once-a-month fulls.
...
You may want to check out rdiff-backup also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states.
Seriously people, learn the tools you have available on any stock Linux system.
Even assuming you run a much older system with an FS that doesn't support online snapshotting... "cp -al <source> <destination>". Period.
iSCSI rocks... and these things have everything built in. Seriously cool units. Costly though - but you know where that money goes when you use it - or should I say, spend 10 minutes setting it up and then job done.
... BitTorrent pirates. You'll always find last night's shows backed-up on TPB the next morning. Yaaarrr!
We backup 15TB nightly (using tar over NFS) with BackupPC running on two servers each with 10TB of storage pulling data from a high performance NAS (BlueArc). We retain 30 days of incremental backups and do a full for the various home directories every 30 days.
Just to make the /.'ers happy, I think using RAID is the best backup solution possible. In the event of hard drive failure your data is still safe!
But rsnapshot works even better. When I worked for the RI Sec State's office we found tape backup wasn't cutting it for us. We picked up a cheapie HP server loaded it up with storage and bought a bunch of terabyte capacity external drives for off sites.
You don't know what a relief it was to be able to go to a web interface and restore files from there. Worked great with linux boxes, but you had to jump through a few hoops to deal with the Windows servers we had.
Cred: Some years ago I 'engineered' and essentially built a community radio station.
Will you ever need to stream direct from the backup to air? (Go and ask the management and the other techos: "ever".)
Why? This will answer what speed you need to transfer data both to and FROM the backup, and whether you need to take any special measures to ensure that there are no bottlenecks and single points of failure in the path. And, you'll find out whether the Master Control/studio needs to 'control' this path and so what you'll need to build in at Control.
What does the Production department need for editing?
Why? Someone else has discussed versions, and from my experience there is at least a several-to-one requirement for digital space during editing. It also answers why the editors, at their suite(s), may need similar 'control' as for Master Control.
Is the station using a control computer to put content to air?
Why? Almost certainly, is the answer. You'll have to not only give Master Control a 'manual' system, but provide some way for the control computer to stream to air, and they'll be subtly different so "get over it" and plan that way.
What happens when a drive/video coder/etc blows in some system? Can you be off air? What's the time for a fix?
Why? If your station can be off air then you fix at the next available opportunity; but, if you must be on air (like a 'commercial' station) then you have to plan and execute a solution like the commercial one, only cheaper.
Looking at space, radio, science and computing from a 'down-under' amateur enthusiast perspective.
Have each student create their "own TV station" as part of their degree requirement, no matter the area of study. Similar to research essays, you'll get the following results: 1) students who complete the assignment with no outside assistance, 2) students who copy certain small portions of the data you are backing up and present it as their own, 3) students who plagiarize everything. Yes, some students will argue that the same content the TV station has accumulated over the years, all 12 TB, is actually their original work.
As this data appears on the University network, the entire TV station will be backed-up in a local "Cloud". And if these types of assignment become popular at other universities, you can expect to find redundant off-site backups. By this point, the 12 TB will appear on BitTorrent (and probably on Newsgroups and IRC for the dedicated plagiarists). A full restore will only take a few days - as long as the full 12 TB is seeded.
Do you know what -l does?
get an iSCSI device:
http://www.promise.com/product/product_detail_eng.asp?segment=undefined&product_id=226 The Promise VessRAID series is currently available through distribution. Pricing starts at $1,899 for an 8 bay system and ranges to $3,099 for a 16 bay system. A fully populated 16 bay subsystem costs less than 26 cents per gigabyte, using enterprise-class 7200 RPM 2TB hard disk drives.
so basically, $2.6k for a unit @CDW, 16*$300 for 2TB hard drives (newegg)
total about $7.4k ($2.6k + $4.8k) for 32TB raw.
You don't have backup needs, you have recovery needs. Backup enables you to fulfill those needs.
As has been mentioned many times above, there's no one fit answer - but I don't think you're even asking the right questions.
Under what circumstances will you be recovering data? There are two main types of recovery:
day to day recoveries where users want older versions of files or to replace a corrupt or deleted file; and
disaster recovery in case of hardware, system or site failure.
Will you support both recovery needs? If so, then for day-to-day recoveries you need backups every day, kept for whatever length of time is deemed appropriate. Proper tape-based backup is still the industry standard here, just based on the volume. 12TB at 75% used, running full backups every week kept for 4 weeks, and daily cumulative incremental backups with 5% changes every day kept for 10 days, means 51.3TB of data. Plus, you don't want all your copies on a single medium; imagine if that thing failed.
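The 51.3TB figure can be reproduced with a few lines of Python under one reading of the comment, which is an assumption on my part: each cumulative incremental holds 5% of the used data per day since the last full, the day count resets every 7 days, and the 10 retained incrementals cover days 1 to 7 of one cycle plus days 1 to 3 of the next. The function name and parameters are hypothetical.

```python
def retained_backup_tb(capacity_tb=12.0, used_fraction=0.75, fulls_kept=4,
                       daily_change=0.05, incrementals_kept=10, cycle_days=7):
    """Estimate total retained backup volume in TB.

    Assumes every full backup is the size of the used data, and that a
    cumulative incremental taken n days after a full holds n * daily_change
    of the used data.
    """
    used = capacity_tb * used_fraction                  # 9 TB in the example
    fulls = fulls_kept * used                           # 4 * 9 = 36 TB
    incrementals = sum(
        ((day % cycle_days) + 1) * daily_change * used  # 1,2,...,7,1,2,3 units
        for day in range(incrementals_kept)
    )                                                   # 34 * 0.45 = 15.3 TB
    return fulls + incrementals
```

With the defaults, `round(retained_backup_tb(), 1)` gives 51.3. A different 10-day window within the cycle gives a different total, so treat the result as an order-of-magnitude planning number: retained backups here are roughly four to five times the live capacity.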
For disaster recovery you need to know your RPO and RTO. Your Recovery Point Objective is basically how much data you can stand to lose, while your Recovery Time Objective is how long after the disaster you can take to get back up and running. Answering these will tell you how often you need to run a backup and what storage technologies and methods are appropriate, or at least which ones are inappropriate. How are you going to protect your data from the disaster, and how far away is far enough? I wouldn't consider the same campus far enough away.
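As a rough sanity check on RTO, the original question mentions about 12TB and a Gigabit Ethernet link. The Python sketch below estimates a full restore time from those two numbers. The function name and the `efficiency` parameter (the fraction of line rate actually achieved) are assumptions for illustration, and TB is taken as 10^12 bytes.

```python
def restore_hours(size_tb, link_gbps, efficiency=1.0):
    """Hours needed to move size_tb terabytes over a link of link_gbps Gbit/s.

    `efficiency` is the fraction of the nominal line rate actually achieved;
    1.0 is a best case that real disks and protocols will not reach.
    """
    bits = size_tb * 1e12 * 8                      # decimal terabytes to bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600
```

`restore_hours(12, 1.0)` comes out at a little under 27 hours even at full line rate, and `restore_hours(12, 1.0, efficiency=0.5)` doubles that. If the acceptable RTO is shorter than a day, restoring the whole set over a single gigabit link will not meet it, whichever backup software is chosen.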
There are a number of products out there. I personally work with NetBackup from Symantec and it's pretty much an industry standard, but that's my employer's choice. I've looked at amanda (http://www.zmanda.com/) a few times, but haven't done any real testing with it. There's data protector, BackupExec and many listed at http://en.wikipedia.org/wiki/List_of_backup_software
Recommending a backup solution where if one power supply dies you immediately corrupt the entire array? Yeah, that's JUST what he needs...
Please help metamoderate.
A single backup using rsync isn't going to cut it. Imagine backing up corrupted data, overwriting other stuff. Also, having all backups on the same network is a bad idea if malware ever gets in. Your second level of backup should probably be tape, making a monthly and a yearly backup. Then store the tapes in a concrete and steel fire safe. Tape has longevity that your other options don't.
Backing up Final Cut Pro projects and media files seems like a simple enough problem: just copy the files to a tape archive or drive array and be done with it. However, there is more than one reason why archive storage might be required: disaster recovery and long-term storage of project files.
The long-term case is more interesting - as local storage runs low, projects are archived for later retrieval. How do you remember what each archived project contained? How can you be sure that the item that you're retrieving will provide what you're looking for? Restoring projects from any archive is a slow process - especially when using HD formats - so why do this when all you want to do is to use a short segment from a programme?
Most broadcasters employ some form of Media Management to manage this process, allowing editors and producers to browse a permanently available low-resolution version of the archive content, and to restore smaller segments from it. Partial restore using browse-based shot selection dramatically reduces the amount of data transfer and helps to speed up busy editing operations. Employing this is probably overkill in this case, but a different angle to consider on what appears to be a simple problem.
I went down the current list of comments, and for all the people who write their own rsync tools: please go review 'rsnapshot'. It's quite efficient. Its major flaw is that it lists snapshots as 'hostname.1', 'hostname.2', etc., instead of 'hostname.YYYYMMDD', which would make things easier for users grabbing their own old files from online.
Anyway, I have a Fedora box with a RAID 5 made of four 1 TB disks. There is a partition on the RAID called /backup0. That's not really a backup, but more meant as a convenience. I back up all my data to /backup0, then right away use rsync to copy the new data to an external drive that is either /backup1 or /backup2.
I have a safe deposit box at my bank. Every week or two I swap the external drive on my desk with the external drive in the safe deposit box.
So the reason I have that /backup0 filesystem is so that I don't have to sync the two external drives to each other - otherwise I would have to make twice as many trips to the bank, and there would be some exposure were my house to burn down while I had both external drives at home.
My suggestion for you is to find two other University facilities that are both far away, and offer to trade offsite backup services with them.
You would have two backup servers in your TV station - one for each of your partners - and they would also each have two, one each for you, as well as for each other.
That way only a hit by a large asteroid would lose all your data.
I got religion about backing up thoroughly after losing my third hard drive in twenty years as a software engineer. Fortunately I was able to recover most of that last one, but one of the other failures was a total loss, with very little of its data being backed up.
Request your free CD of my piano music.
You've got low latency and high bandwidth. Make your storage iSCSI OpenFiler configured in cluster mode with block replication. Do use a pair of the BackBlaze boxes somebody else mentioned. Configure with RAID 6. Get enterprise support here. You're in and done at $16K capital cost, $2k labor, and annual support (24/7 4 hour response) at $6200/yr for 67TB of raw storage (~48TB net) plus whatever the network, rackspace and power costs, and it scales in volume storage at linear cost when your needs do and the more volume you have, the better performance gets. As a bonus it fits in two 4U slots.
If you want to skimp you don't have to fully populate the boxes until you need the room and can save $8K in capital costs up front. Every couple of months you have to hot-swap out some cheapo consumer grade drives so buy a few spares and configure them as hot spares and a few more for cold spares. If you have some extra Franklins, splurge on the 10G Ethernet connection from the BackBlaze box to the local network - the remote can stay on Gig-E because it's only used for writes or HA. With a little mental gymnastics and PSU field modifications you can use one BackBlaze master to control up to three BackBlaze slaves with passthrough connections only - no internal server needed. Just get the cards with some external eSATA or external SAS ports, depending on your preference. You might need to upgrade the motherboard spec on the master BackBlaze box, but it's worth the extra money. Since Openfiler support is unlimited CPU you may as well get the dual quad core Nehalem motherboard with 72GB RAM and 8 PCIe slots, or whatever's in the sweet spot this week. I do like the X5550, but if you can get a quad core for under $100 it's hard to pass up, especially combined with one of these cheap motherboards that use up to 32GB of cheap DDR2 RAM. Be careful with your PCIe slot counts when choosing motherboards.
Configure whatever machine you're using to do a backup periodically from one iSCSI LUN on the local machine to another LUN. This gives you protection against 90% of backup needs (oops! I accidentally all all my presentations!) and will be transparently replicated to the HA site at block level without user intervention. Somewhere in here you should educate users that backup systems are not an alternative method of version control.
You could probably upgrade this with a few TB of PCIe-attached SSD cache for the million-plus IOPS and guaranteed saturation of multiple 10Gbps network ports, for an additional $40k, if you knew how, or why, or needed to.
Or you can go cheap with Linux and BSD and some scripts. You won't save any money and you won't have support. Buy the support. It's worth the money. Disclosure: I don't work for any of these folks. For the company I work for I can quote you a FC SAN. Trust me, you don't want to know what that costs for 67TB with block replication to a DR site and 24/7 4 hour support, let alone the scalable solution I've proposed here. Just assume it's "a lot".
Help stamp out iliturcy.
Well, if you choose to backup on OS X native, which your post doesn't state since rsync is on OS X as well, there's BRU Producer's Edition. Time Machine can be a bit resource hungry in my experience, so that may not be the best option for you. On the Linux front, there are a few tools to do the trick. Again, TOLIS Group has BRU Server for Linux native, but that's a higher price than BRU PE is going to be. However, if you're looking for a free product, rsync may not cut it due to the limitations that many others have already mentioned. There's MondoRescue, but again, I don't think that will work to the needs that you require. Though the user 'mlheur' hit the nail on the head in my opinion. You need to focus on your restore needs and then choose a backup application that fits those needs!
It works. It's iSCSI + CIFS / Windows share. It has clustering and block replication. It's open source and support is available. Support is per server - unlimited sockets and storage - so you could really work them with a few hundred PB on a pair of 8 socket/32 core servers. I don't work for them, but they rock!
They're geeks. If you bribe them properly they might come up with a proprietary block level dedupe solution for you.
http://forums.freebsd.org/showthread.php?t=3689&highlight=zfs+remote+backup
simple cheap and easy
I'm very concerned about just being able to find the particular file that I need, so I have my backups organized by topic - on each of my backup filesystems, there is a directory for my financial data, for my source code, for each of my websites and so on.
In each directory I put a bzip2ed tarball named for the date - for example "OggFrog_SVN_2009-09-16.tar.bz2". Most of my files compress quite a bit so I don't need to worry yet about running out of space.
The stuff that doesn't compress well mainly consists of media that is already compressed - audio files, my digital photos and so on. I tend not to keep infinite backups of that stuff, but just the latest copy.
It was quite a chore to get it all organized, as to make it work I had to organized the file structure that the backups came from, so that it would be easy to create each topic backup. But now that I have it all organized it is quite easy to deal with - and it's easy to find old files on my backups.
nothing, on cp: illegal option -- l
Since we're talking about Final Cut data, it's safe to assume that it's all coming from Macs. The version of cp on Mac OS doesn't take either of those options, so it's a moot point.
Time Machine is probably the way to go. It's integrated into Mac OS, and it's ridiculously easy to set up. I don't know how it scales up, but I'd be very surprised if it couldn't handle 12TB.
Although it appears they got bought by EMC.. hrm.
Deduplication can help you reduce the size requirements on the backup server.
If buying new capacity, you should probably think about buying a backup server that can be expanded to have more capacity than your existing server, depending on current server usage.
Plan for a few years down the road, when it becomes necessary to expand capacity of the main server, backup more servers... or more likely: store multiple old versions of files that changed over time.
Normally, if you have a 500 MB video file and someone edits it and re-saves it, there will be two 'files' in the backup repository for a time: the old version and the new version with the edits (twice the space usage).
So storage requirements on the backup server can actually be much more than storage requirements on the server being backed up.
If online backup is an option, why not try http://www.wuala.com/ ?
Your analysis may not work in this case. This is not a backup system for a large number of business/educational users. It's for a relatively small number of video editing stations. One new video project can easily generate hundreds of gigabytes of new data that needs to be backed up. The average daily churn rate may be comparable, but the peak churn could well be many times that.
Digitized video is not usually backed up the same way as conventional files or databases. Raw digitized video files do not change, and get archived once. Completed projects can go through a clip trimming process whereby the unused portions of the clips are trimmed away, making an archive of the entire project more space-efficient. Then, the raw digitized video files can be deleted. After all, the backups for the video are the original tapes themselves, not a computer-digitized version. The backup rules of a general-purpose office system are very different, and much less efficient.
That seems to be working for Google, MSN and Yahoo.
Maybe they're doing something wrong. You should school them up.
Help stamp out iliturcy.
I'm not the asker but I also work in the campus TV station and can provide additional details.
The primary method of storage right now is a RAID 5 based array containing around 6 TB of data. We'll be adding a Drobo Pro in the next few days with an additional 6TB of storage. Together this will serve the 12TB of data on the server located in the studio. The system storing the data now is running CentOS 5.
However, this is not a good method of backup. It just provides redundancy in case a hard drive fails. What we want is an offsite server which will serve as a backup system. The system will be located in a separate building but on the campus network (transfer speeds not an issue).
We want the backup system to be able to store the original 12TB. HOWEVER, it needs to be expandable or at least have enough space to accommodate additional data over the years. So I'm thinking the original setup could have around 16TB of storage. However this needs to be expandable up to 24TB or 32TB without too much extra work involved. With a transition to HD video we plan on having approximately 1TB of new data per year and this will increase over time.
Because we want the system to be expandable we don't think RAID would be ideal. The idea of having to use identical drives feels very limiting. Hence the reason a Drobo Pro is very appealing. However, it just doesn't support the capacity we require beyond the initial studio server. We want to have a version control system which will require additional storage as well. We don't need daily complete backups. Just something like subversion or CVS which will log when changes are made and save them. That way if someone decides to delete all the directories a history will be stored. The snapshots of the versions don't have to be in real time - they can be done daily. If there are no changes in one day then no snapshot will be required. Typically we do a dump of all the data to the studio server once a month from each editing machine. So snapshots would occur approximately once per month. This data is rarely read - maybe once/twice per year and not all of it.
Restoration time is not that important. As long as it takes less than a few days. No application data is being stored. It's just raw project files and video files that are in directories.
We'd like this to cost between $3000 and $4000 for everything. Obviously, cheaper is preferred.
Not being pedantic, but it ain't a lot to back up. Just get a pair of MSA2000 with 1TB SATA disks. Total cost £20,000 inc tax. MSA2000fc if you can do fibre. Then just get an LTO-4 tape robot (HP, Overland or similar) and do a disk-to-disk-to-tape backup setup. That way you not only have tape backups for if the entire place burns down, you also have disk-based backup for a quick restore when someone accidentally deletes a file. Also, with disk-to-disk-to-tape the throughput will be high enough to complete the backups in a small time window overnight.
http://www.writeitfor.us - Writing IT for the IT generation.
Your backup system must support the intricacies of HFS+ (the format of Mac hard disks) - otherwise you might lose important data in resource forks and extended attributes.
Rsync in all versions up to Mac OS X 10.5 doesn't properly back up resource forks and extended attributes. I've heard there are changes in 10.6 but I've not investigated or tested them.
Use the application 'Superduper' to store your Mac files in a 'Sparsebundle'.
A 'Sparsebundle' is a disk image (stored as a bundle of band files) that holds an HFS+ volume and can live on other file systems (such as your uni's servers).
They used it for the Black Box from the first space shuttle explosion. There is a sort of paint consisting of microscopic magnetic spheres that are black on one side and white on the other. When you paint a tape with this, you can see the magnetic patterns on it. You would then take high-res digital photos of it, and recover the data from the photos. It worked for the space shuttle tapes, but as you can imagine it is very expensive to do.
Use rdiff-backup, it makes a mirror copy of the latest version and also stores older backups as increments. It's very fast and stable.
The newest FreeNAS RC has support for ZFS and is ideal for backup purposes. The checksumming facility of ZFS also makes you sleep well at night, knowing that silent bit corruption doesn't eat your data. And it has built-in support for rsync.
Robin Smidsrod Certified Linux Administrator
Just put a NAS like the QNAP TS-809 (8-drive) at a remote location. It talks rsync, and that's all you need. It's available as standalone, rackmount and redundant PSU, and is more affordable than a big RAID box.
;)
Do the first rsync with the NAS next to your server, move it offsite afterwards
These are all built on top of rsync and turn it into a real backup tool by storing multiple versions of your files. The challenge will be the very large video files, but if you only write to these once, they are a good option.
Rsnapshot uses hard links combined with rsync --delete - rather than actually delete an old copy of a file, it unlinks it, and when there are no changes in a file, it simply creates a link to it under the current snapshot. It's not as space efficient as DAR but your big files are probably already compressed, and perhaps don't change day to day, so it may be a good option here. There are many other rsync-based tools that are equivalent, but this is easy to configure and has an active community. It only does 'pull backups' i.e. the backup server logs into the system to back up, and if you are doing full backups of the whole system, it needs a root login.
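To make the hard-link idea concrete, here is a rough Python sketch of one snapshot pass. It is not rsnapshot's code: the function name is made up, and it compares file contents where the real tools normally compare size and timestamps. Both snapshot directories must live on the same filesystem for the hard links to work.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def snapshot(source: Path, previous: Optional[Path], current: Path) -> None:
    """Create `current` as a snapshot of `source`.

    Files identical to their copy in `previous` are hard-linked instead of
    copied, so they take no extra data space. Files that vanished from
    `source` simply do not appear in `current` (the --delete behaviour).
    """
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dest = current / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, dest)            # unchanged: share the existing data
        else:
            shutil.copy2(src_file, dest)  # new or changed: store a fresh copy
```

Deleting an old snapshot directory afterwards only drops links; the data stays on disk as long as any newer snapshot still links to it.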
rdiff-backup stores block-level incremental diffs within its repository - it apparently has issues dealing with extremely large files so I'd be sure to test this. However it doesn't need a root login on the system you are backing up, and is more space efficient than rsnapshot. It has some level of checksumming to help detect corrupt backup files.
duplicity is a bit like rdiff-backup, but apparently does encryption as well, if that's important.
BackupPC is more work to set up, and really designed to back up a large number of client PCs, but it does provide more features, and has rsync as an option on network level. It does de-duplication across the client PCs which is good for full system backups. However, Windows backups mean you have to mess about with volume shadow copies (VSS), as for all these tools.
If you don't want rsync for some reason (despite it being insanely fast and network efficient) you could use DAR - like tar only much better for disk to disk backups, as you can extract a few files efficiently without reading the entire archive from start to finish (as tar requires). Also it does more granular compression and checksums, so if you lose some blocks due to disk corruption in the middle of the archive, most of the files can still be recovered. The rsync tools have the same granularity to some degree, but don't normally do any compression.
The world really, really needs a guide to the major categories of backup tools, pointing people to the right type of tool based on their requirements...
Most people seem to be concentrating on describing how to duplicate the server. This is great for availability but does not protect against mistakes/sabotage. Corruption on the main server spreads too easily into the clone.
Use a real backup tool (backup2l is great) and THEN use rsync to copy (not mirror/delete) the backup to a secondary server.
The secondary server should be locked down with no public services running to minimize the risk of somebody hacking both machines and sabotaging your backup.
Note to self: Make a sig
The backups on the archive server appear as complete copies of directories of the backed up machines. There will appear to be one complete backup for each day - this lets you find/restore a consistent set of files from a particular day.
The script cleverly avoids copying files that have not changed. It economises on disk use by only keeping one copy of each file - but makes that one copy appear in the various daily archives.
The idea is that one central archive server initiates backups on several other machines.
This script works well where you have many files that do not change from day to day, eg word processing documents. It is not so good where most of your files change frequently - but will still work.
GPLed, get it from: http://www.phcomp.co.uk/Packages/RsyncBackup.html
ZFS snapshots & "zfs send." Nothing is easier and cheaper than OpenSolaris' ZFS for this kind of thing.
You don't give us a hint about your campus, but I'm sure that they have some form of backup system. Tivoli Storage Manager is big, bulky, and contrary, but in the hands of a paranoid pessimist (like our campus's TSM admins), it handles huge amounts of data and handles multiple copies all over creation. The systems I admin (AIX, SAP, DB2) regularly push 2T a day directly to an 8 drive LTO-3 library, while others are backing up to the 12 drive IBM "Jaguar" library in a different part of campus.
Check out main campus IT. At worst, you might have to buy them some LTO tapes or pay a per-meg fee, but you'll probably find a well-designed system that you don't have to maintain.
(If you do use TSM with Macs, go to the 6.1 client. it's a LOT better than 5.3 on the Mac. Also, run the first backup by hand. The client has memory consumption issues sometimes on the first backup.)
Yes, it makes copying much faster
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
Oh yeah, and I forgot to mention that most of my clients' servers with 200GB or more of company data take on average 30 minutes to perform a full backup to a SATA-based RD1000. The image comes out nice and compressed and can be restored at about the same speed. If you're backing up to a BDR appliance with virtualization failover, then you're up in a few minutes. If you're restoring an image to a different server, then give it about 30-40 minutes and you are back in business. Time is money, so avoiding downtime is crucial. Maybe not for a college TV station though :-)
*plays the Apogee theme song music*
http://www.tedial.com/en/productos_en.html
Multiple times I have lost my entire rdiff-backup backup when the client didn't have exactly the same version of rdiff-backup as the server.
My advice is to just do a 3 way rsync.
That way you can find and restore your backed up files without any special
tools, reducing the risk of the backup tool either trashing your data or the
backups
Hard links used in a 3 way rsync (the 3rd way being a reference to the last
non-incremental backup) mean you save space for unchanged files.
You won't get the space saving when large files change slightly, but that is
more than made up for by not getting the space back when rdiff-backup deletes
all your data.
blog.sam.liddicott.com
Completely free solution. I've been carefully looking at this for about a year. In the end, we went with:
Server 1: Xen Host running Ubuntu-64 Server with Ubuntu Xen clients
- Virtualization - this is key. We use Xen, happily, thank you.
- 6 production VM servers on a single physical, relatively small, host.
Server 2: Local Backup Server running OpenSolaris
- ZFS file system in a RAIDz2 config; this is critical
- receives rdiff-backups of Xen images, opened by the host during rdiff-backups. The rdiff-backup is performed during nightly maintenance windows in just a few minutes for each VM.
Server 3: offsite, DR storage running OpenSolaris
- receives rsync images of srv2 rdiff-backup folders.
- ZFS file system in RAIDz2 config.
- May switch to zfs send mirrors in the future, but we are happy with rsync.
Our CRM system takes less than 3 minutes to back up. Our email system takes about 4 minutes of downtime to back up. These are complete OS, Data, DB, and application rdiff-backups.
We retain 30 days of incrementals in the rdiff-backup local storage for each server. Complete recovery has been used 3 times this year. Flawless. Under 20 minutes from decision to restore to the apps being available.
rdiff-backup lets a 6GB image only be 7GB of storage containing 30 days of diffs. This rocks.
Virtualization is critical so the images can be restored anywhere that Xen on 64-bit X86 is available.
We also rdiff-backup the xen-serverX.cfg files and all the custom scripts used with this solution. These are retained for 90 days.
Another vote against rdiff-backup. I have had it die on very large directories with many small files with perl overflow errors. Google for rdiff-backup errors and you will find a wealth of information.
Between Ubuntu 6.06 and 8.04 the rdiff-backup protocol changed and there was no way to get the new rdiff-backup talking to the old one. No switch to change protocol etc.
Bacula is definitely superior, but nothing beats a commercial solution if you have the money and need disaster recovery bare metal restore.
Since you're running Final Cut, the simplest, most straightforward way of doing backup (especially if you have gigabit ethernet to the remote site - holy cow) is to get another Mac. Two suggestions: 1. If you have the funds, buy an XServe and attach an XRAID es disk array from Active Storage Inc. to it. You can configure that anywhere from 4 - 16TB. 2. Since it's just remote backup and you don't really need a performance system for that, you could do it on the cheap by buying a Mac mini, attaching one or two LaCie 4big Quadra cubes (also 4-16TB) via FireWire 800 to the mini, and using OSX's TimeMachine backup software (if you can configure it to back up a remote volume) or Carbon Copy Cloner as a backup tool. Relatively inexpensive, simple, no administrative headaches, done.
Using rsync with hard links lets you version your backups with good space efficiency and a simple structure.
http://www.mikerubel.org/computers/rsync_snapshots/
Say you want a snapshot for each of 30 days. You'll end up with a directory for each day. If you started with 12TB and 1TB changed, your backups for 30 days combined will be 13TB. Plus there are no funky metadata formats.
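The arithmetic behind that, as a tiny Python sketch. The function names are mine, and it assumes the 1TB is the total amount of changed data over the whole 30 days, which is how I read the post.

```python
def hardlink_snapshot_total_tb(initial_tb: float, changed_tb: float) -> float:
    """Space used when unchanged files are shared between snapshots via hard links."""
    return initial_tb + changed_tb


def full_copy_total_tb(initial_tb: float, days: int) -> float:
    """Space used if every daily snapshot were a complete independent copy."""
    return initial_tb * days


print(hardlink_snapshot_total_tb(12, 1))  # hard-linked snapshots
print(full_copy_total_tb(12, 30))         # naive full copies, for comparison
```

So the hard-linked layout needs 13TB where thirty naive full copies would need 360TB.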
I also use rsync and OpenSolaris/ZFS to keep daily backups. BUT - important: If the content is made of big files that change slightly each day (e.g. VMWARE/VirtualBox disk images), make sure you also use "--inplace" when you do the rsync, so that you take advantage of the copy-on-write semantics of ZFS. For example, I am using rsync to back up a VMWARE server to an OpenSolaris/ZFS fileserver, where the virtual disks are huge "vmdk" files - in the order of 10GB each. These huge files change only a little each day (less than 1%) - rsync would indeed realize this and only copy over the network the parts that changed, but it would store completely new copies in the backup server for each day! (I am assuming here that you would ZFS-snapshot each day). If instead you use the --inplace option of rsync, rsync will not only send the blocks that changed, but it will also only write the blocks that changed - thus, your ZFS will be able to host many years' worth of daily snapshots of these "vmdk", a truly marvelous thing, if you think about it...
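To illustrate why in-place writes matter on a copy-on-write filesystem, here is a toy Python sketch that rewrites only the blocks of a file that differ. It is not how rsync itself finds the changes (rsync uses its own delta algorithm); the function name and the 4096-byte block size are assumptions for illustration.

```python
def update_in_place(path: str, new_data: bytes, block_size: int = 4096) -> int:
    """Rewrite only the blocks of `path` that differ from `new_data`.

    Returns the number of blocks actually written. On a copy-on-write
    filesystem, untouched blocks stay shared with earlier snapshots.
    """
    written = 0
    with open(path, "r+b") as f:
        for offset in range(0, len(new_data), block_size):
            new_block = new_data[offset:offset + block_size]
            f.seek(offset)
            old_block = f.read(len(new_block))
            if old_block != new_block:
                f.seek(offset)
                f.write(new_block)
                written += 1
        f.truncate(len(new_data))  # drop any stale tail if the file shrank
    return written
```

With a 10GB image where under 1% changes per day, only that small fraction of blocks gets rewritten, so each daily snapshot costs roughly the size of the changes rather than the size of the file.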
Hi, We have a plugin for FCSvr that allows you to archive entire productions, or single assets for that matter, in a single click to The MatrixStore solution which is a redundant disk based archive... Let me know if you want to know more. N
Before reading please note:
1. I've been up for over 24 hours, my brain may not be operating at its best.
2. I personally have not attempted anything like this, but I think I know enough that it should be do-able.
If I make any glaring mistakes please feel free to point them out and make fun of them wholeheartedly.
I'm going to assume the following:
1. Recovery time isn't a huge concern.
2. You or someone that works for you is willing and capable to build it.
3. You want, or would like, point-in-time recovery abilities.
4. You don't have a lot of money to spend.
Buy a case that can fit as many hard drives as possible. For example, this case can take up to twelve 3.5" drives (I do not work for Newegg):
http://www.newegg.com/Product/Product.aspx?Item=N82E16811103029
Get a lot of large hard drives, preferably SATA. If you get a case that can take ten to twelve drives, get 1.5TB (~14TB usable space) or 2TB drives (~18TB usable space).
If you have to use a smaller case you'll need to build more than one system.
Get everything else to fill up your case: (motherboard, CPU(s), SATA cards, lots of RAM, gig-e network card, and a power supply).
Install Solaris and give all of the disks to ZFS.
Use rsync to copy the data to your newly built box to create your initial back up, then create a snapshot using ZFS.
For each subsequent back up use the --delete option when running rsync then create a snapshot using ZFS. (ta-da, you have point-in-time recovery capability!)
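The rsync-then-snapshot cycle could be driven by something like the Python sketch below. The host, paths and dataset name in the usage note are placeholders I made up; the only real commands used are rsync with -a and --delete, and zfs snapshot.

```python
import subprocess
from datetime import date
from typing import List


def backup_commands(source: str, dest_dir: str, dataset: str, day: date) -> List[List[str]]:
    """Return the two commands for one backup cycle: sync, then snapshot."""
    return [
        # Mirror the source, removing files that were deleted upstream.
        ["rsync", "-a", "--delete", source, dest_dir],
        # Freeze the result under a dated snapshot name.
        ["zfs", "snapshot", f"{dataset}@{day.isoformat()}"],
    ]


def run_backup(source: str, dest_dir: str, dataset: str) -> None:
    """Run the cycle for today, stopping at the first command that fails."""
    for cmd in backup_commands(source, dest_dir, dataset, date.today()):
        subprocess.run(cmd, check=True)
```

For example, run_backup("studio:/video/", "/tank/backup/", "tank/backup") from a nightly cron job (all three values hypothetical). Because check=True raises on failure, a failed rsync never gets snapshotted as if it were a good backup.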
Depending on how thrifty you can be, and not considering the labour to build and test it, this setup could cost you as little as $4k USD at current prices.
If Solaris x86 supports it, I recommend getting a motherboard or SATA cards that support hot swapping and a case with front loading bays. Being able to replace failed drives (which will happen) is a nice thing.
Beyond this, when your storage requirements go beyond this first build you can just build another box or throw in some eSATA cards and connect some external drives to expand your ZFS pool(s).
Even though I'm writing this from a linux box, if you're going to be storing that much data and you want to do it cheaply, you should really look at ZFS as the filesystem of choice for the backend.
As for moving the data over there, sure use rsync and then use zfs's snapshot features so you have some rollback capability.
Why ZFS? So I'm envisioning that you're going to need a mid range machine (duel power supplies) and hanging off that you're going to have a whole pile of JBOD. You could spend the money on something that does hardware-based RAID, but if you're cost-conscious, your best route is to buy a JBOD box and fill it with 1.5TB disks. You could try to manage all of this with LVM and possibly XFS, but it would be a nightmare. ZFS basically rolls RAID/LVM/FS into a single layer. Thus adding disks to your array becomes trivial. Also, I would recommend that each user/application get its own sub filesystem on the array; that way you'll have much finer granularity for snapshots/quotas/etc.
I didn't intend this post to be an advertisement for ZFS but I have such a setup with ~14TB of disk on it right now and it works great. As for the OS on top, you could go with OpenSolaris, or Nexenta (which is just Debian rolled on top of the OpenSolaris kernel).
Yes Francis, the world has gone crazy.
http://www.netgear.com/Products/Storage/ReadyNAS3200/RN12P0610.aspx It's a 2U, 12 SATA-disk server. You could load it with 1TB drives for 12TB. The software's pretty good (based on Linux) and constantly being updated.
Gan Family Homepage
Just run a fiber link to another office on campus, put a NAS device there, send all archived data there. Pull a drive when it's full, drop a new one in, seal removed drive up in a safe room in the Dean's office.
That took all of 15 seconds of thought.
Does it store diffs of large files with small changes, instead of storing the whole file? If you have a 2 TB file with a small 1K metadata change in it, your solution will take 4TB, and rdiff-backup will take 2TB + 1K + a few more K for dir overhead.
rdiff-backup is a huge win if you have large files with small changes, such is often the case with virtual machines.
Otherwise, BackupPC or Bacula or other simple link-based replication deduping would be better.
Blessed are the pessimists, for they have made backups.
Do you know what -l does?
Yeah - It makes a hardlink to the file in question rather than actually copying it.
This takes advantage of the normal behavior of rsync (unless you explicitly tell it otherwise), where it writes to a temporary file before moving that file in place of the original - Which in the case of a hardlink, breaks the link rather than overwriting the original file.
So you effectively end up with a "snapshot" of any files that did change, and no wasted space (beyond the inode entry) for those that didn't (you can prove this to yourself fairly quickly, if you have doubts).
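If you do have doubts, here is a small Python sketch of that proof. The file names are made up; os.link stands in for what cp -al does per file, and the write-to-temp-then-rename step mimics rsync's default behaviour as described above.

```python
import os
import tempfile
from pathlib import Path
from typing import Tuple


def replace_via_temp(target: Path, data: bytes) -> None:
    """Write data to a temp file, then rename it over target."""
    fd, tmp = tempfile.mkstemp(dir=str(target.parent))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    os.replace(tmp, str(target))  # target now points at a brand new inode


def demo(workdir: Path) -> Tuple[bytes, bytes]:
    """Return (live contents, snapshot contents) after updating the live file."""
    live = workdir / "live.dat"
    snap = workdir / "snapshot.dat"
    live.write_bytes(b"old")
    os.link(str(live), str(snap))   # hard link: both names share one inode
    replace_via_temp(live, b"new")  # breaks the link instead of overwriting
    return live.read_bytes(), snap.read_bytes()
```

Running demo() in a scratch directory returns (b"new", b"old"): the snapshot name still holds the old data. Had the live file been opened and overwritten in place instead, both names would show the new contents.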
Incidentally, I agree that using FS-level differential snapshotting provides a much more elegant solution... But personally, I've had problems with LVM, and ZFS doesn't come stock on any older Linux distros (and that I know of, none of the rest that do come standard support snapshotting). EXT2 has supported hardlinks back into the days of antiquity, however, so the "cp -al" trick will work on just about any Linux box you touch.
If you decide on a commercial solution, Storix is a good choice. ;-) www.storix.com
Flexible bare-metal recovery for Linux/UNIX
VMWare Snapshots
Are you backing up just data, or configurations or what? Backup Solutions are nice and all, but you're still missing something .... all the crap^H^H^H^H configurations that you've collected over the years of using that particular setup.
And once you go to VMWARE (or other VM product) you'll quickly realize that the abstraction away from specific Hardware is very nice indeed.
However, if one is REALLY concerned about backups, a duplicate Hardware setup in a separate location sitting idle (or cold) is a necessity. And having a VMWare snapshot ready to load on backup hardware is just tits when things REALLY go south. You end up looking like a genius, and get to play Scotty (over engineered everything).
The difference between amateurs and professionals shows not when things are going well, but when the shit hits the fan. A weekend Geek can build the $8000 backup server or whatever of storage, but once the drives start to fail (and they will) that solution starts to REALLY suck because you can't get to the freaking drives easily (and I doubt it will even tell you that a drive failed).
Let me just say it this way, if you can't afford "over engineered" equipment, you can't afford to do it right.
So, VMware, snapshots and spare hardware offsite are the way to go. Anything less these days is simply weekend geek pride.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
And from what I have heard, Time Machine can lead to data loss; ask someone who has tried to recover a backup from Time Machine.
We use R1Soft for backups. It does block-based backups to disk and we have found it much faster, with less CPU overhead, than rsync. If you have a lot of files that are continuously changing then R1Soft is perfect because it only backs up blocks that have changed since the previous backup. If you have a 5 GB file and you make a small change to it, you only back up that small change, with no need to back up the entire 5 GB file again! It really simplifies backup and recovery, and also has Bare Metal Recovery features that can save your ass when a server goes down. It can also be used on both Windows and Linux servers, giving you one solution for both operating systems. R1Soft has saved us so much time handling our backups; it's definitely worth a look for anyone serious about backing up a server.
I'd go with a virtual tape library (VTL).
I made a whole office of IT manager types blanch one day after carrying tapes to the admin building by asking,
"say, you know those jets that fly over every 20 minutes? do you think that we'd have any backups left if one crashed here on the library and skidded forward?"
not that they did anything about it, mind you, but I'll bet they all updated their resumes.
and it's a good plan to prepare offsite video servers/storage. KREX TV burned out early this Spring in Colorado. while it's one way to update your plant quickly, it sucks for business continuity.
if this is supposed to be a new economy, how come they still want my old fashioned money?
As tons of posts have already suggested, rsync is a great utility for backing up files. It's extremely efficient, and makes synchronizing large files a breeze. However, it's command line based, so you would have to set up a script to run it. Another alternative that I have found very useful is a product called Bacula. It's an enterprise grade application, complete with a server component, workstation component, and an administration console. It basically stands up to commercial enterprise products like Backup Exec. However, it is somewhat challenging to set up, as there are a LOT of options. And its console is text based. There is a Gnome front-end, but it does not access all of the features that Bacula has. But the upside is that it can be configured to do pretty much whatever you want, can be administered remotely, and is very powerful. It is even used for complete disaster recovery.
I haven't seen anyone in this thread mention SymForm http://www.symform.com/, which may well be an ideal solution for your situation. This is a fairly new startup operation founded by former Microsoft and Amazon engineers that manages a cooperative cloud backup platform. You'll need to do some reading of the whitepapers on their website to wrap your brain around the concept, but the gist of the idea is that you configure your spare storage device (like your Drobo box) to form a node that connects to the cooperative cloud, which is comprised of free disk space on the spare storage devices (SAN, NAS, external SATA drives, etc.) of the other members of the cloud. With 5,000-10,000 other nodes sharing exabytes of free disk space, there is plenty of capacity for all the members of the cooperative, and as the cloud is distributed worldwide, there is no single point of failure to worry about. The data is fragmented in such a way that it is distributed randomly across multiple nodes (in a system they call RAID-96) so that no single node in the network contains a complete copy of your data. You pay a flat monthly fee to join the cloud, and your data is encrypted by your node and backed up incrementally over your network connection. It may take a while to get your first full backup transmitted, but after that, the bandwidth is used only for deltas. It's kind of a brilliant idea that blew me away the first time I heard about it.
My institution uses TSM (Tivoli Storage Manager) with many huge tape libraries and racks of disk storage. It seems to work extremely well. I imagine it is very expensive.
Well, the complexity of this redundancy reduces the reliability overall, and it has a cost.
sort of reminds me of the joke by Mitch Hedberg, the "an escalator can never break, it can only become stairs."
IE if PC redundancy is done right, then yes (for example) it might have 6 smaller drives instead of 2 drives, and 2 controllers instead of 1, it will have some hardware failures more often than the simpler system. However after the first hardware failure, it essentially becomes the simpler system without the redundancy, until you fix it.
IE if the main failure component is hard disks, with redundancy you may have 3* the number of (smaller) drives and you are then (roughly) 3* more likely to have some drive failure than the single drive system (say over a 3 year span). So while you may have a 10% chance of failure in 3 years with the single drive system, you may have a 1-(0.9*0.9*0.9) = 27% chance of a single drive failure in the redundant system. But you have only a 5.4% (0.1*0.27+0.1*0.27) chance of having 2 hard drive failures in the 3 drive system in 3 years, instead of the 10% chance in the single drive non redundant case (add a hot spare and you're under a 1% redundant failure rate).
So even if you weren't allowed to fix the redundant system, the likelihood of a disk failure downing the system would still be only about 1/2 as likely as in the non redundant system. A 6 drive redundancy (ie with a hot spare) instead of 2 drives non redundant works out much better, ie 26% chance of non-redundant failure vs 1% chance of a triple failure (or double in the same array) of the redundant system...
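For anyone who wants to check figures like these, here is a small Python sketch using the binomial formula. It assumes every drive fails independently with the same 10% chance over the period, which is a simplification (real drives from one batch tend to fail together), and the function name is mine.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n drives fail, each independently with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


single_drive = prob_at_least(1, 1, 0.10)  # non redundant system lost
any_of_three = prob_at_least(1, 3, 0.10)  # some drive fails in a 3 drive set
two_of_three = prob_at_least(2, 3, 0.10)  # 3 drive set loses two or more drives
print(single_drive, any_of_three, two_of_three)
```

Under those assumptions the chance of at least one failure among three drives is about 27%, and the chance of two or more is about 2.8%, which is even lower than the rough 5.4% estimate, so the conclusion holds either way.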
Cost is a valid issue, increased reliability has a hardware cost, which if it doesn't outweigh the cost of a system crash, then yes you don't need it.
I use rsync to a zfs file system. A couple of cron jobs to fire off rsync and do zfs snapshots makes for a nice TimeMachine-like solution without TimeMachine.
I can see those two power-supplies dueling each other "I was here first! ZAP!" "No I was ZAP ZAP"
s/Duel/Dual/g
.. is a Solaris system (for ZFS) and rsync. After each rsync a snapshot is created, for 45 days of retention (each snapshot is fairly small for us, your data sets may vary). It's extremely fast and not difficult at all to figure out, just make sure you turn off all the unneeded Solaris services (essentially everything but ssh).
I'd love to be doing this with Linux but btrfs is not yet stable enough for a production environment.
I do *not* recommend trying to use hard links for incremental backups; you'll find that unless your files are large (instead of numerous), most of your processing time is spent expiring old snapshots.
Just because you disagree doesn't make it offtopic or flamebait.
"Real" television stations use LTO tape for video backup, along with a robot tape library like the Quantum Scalar series or the Sun StorageTek system. This is generally operated by a broadcast archive management system such as MassTech MassStore, Gorilla, or Front Porch Digital.
The broadcast archive management system is connected with your television station automation system, so when your automation system needs a certain video file to play back from your server, the archive system begins a transfer from the tape library ahead of time, so the file is on your video play back server before play back begins.
sort of reminds me of the joke by Mitch Hedberg, the "an escalator can never break, it can only become stairs."
If you don't believe in Byzantine failures, then sure. One way an Escalator can break is it suddenly starts running in the opposite direction, or it accelerates to a wild speed. Only manual intervention can stop it, and by the time you do so, someone might have gotten hurt.
IE if PC redundancy is done right, then yes (for example) it might have 6 smaller drives instead of 2 drives, and 2 controllers instead of 1, it will have some hardware failures more often than the simpler system.
The problem is not when 1 controller goes out perfectly. The problem is when 1 controller is disrupted in a way that breaks the other controllers, or breaks in a way that causes corrupt data to be written.
Single redundancy isn't enough.. You need either at least 3 copies of your data, or 2 copies on different systems, some very good checksums, and a reliable procedure for validating them.
Using Linux software RAID and RSYNC doesn't do that, even with 2 boxes.
I think what you're saying is: don't use software RAID 5 to reduce your chances of a major failure from 10% to under 1%, because you can't use just that one solution to completely eliminate the chance of all failures? It is one thing to point out that this is not, in itself, a perfect solution for all problems. It is another to say, give up unless you can cover every possible failure at no additional cost.
You need either at least 3 copies of your data, or 2 copies on different systems, some very good checksums, and a reliable procedure for validating them.
Using Linux software RAID and RSYNC doesn't do that, even with 2 boxes.
Huh? Either you are really paranoid, or you're not aware of all the features of these tools, because they can do exactly what you claim to need. I.e., you have s/w RAID-5 on the backup server, and do rsync, having it compare CRC and date of each file. Then it does exactly what you say you need. IE the backup raid will compare the stripe each time you calculate the rsync hash, rsync compares CRC as well, so you have essentially 3 verified copies of everything (2 on the backup RAID, 1 on the main system). In the (very unlikely) event of a stripe failure, the RAID may not know how to recover the file, but rsync will fix it (maybe with some manual intervention) on the next pass.
I guess the very unlikely event of an undetected write error to a drive could happen on a file, followed by the unlikely event of a failure of the main system within the same backup period. This would leave you with a very small window of a single byte screwed up (but detected).
I guess to get to your coverage level, both systems need RAID-5 (or at least a better file system on the main PC), so that the main system doesn't get an undetected corruption that then gets backed up.
None of that yet explains why you posted that RAID would make it less reliable. I guess unless you've got some really crappy disk controllers that fail more than anything else, and also fail in a "disrupting manner" most of the time. I would agree that, after that happens, you would have been better off with a different system. That would be the same as telling a lottery winner that playing the lottery is a losing bet (i.e. the lotto is only the right solution for players that fall into a 1 in a million situation).
Latent disk errors are not one in a million events like winning the lottery. They are very common: the more storage you have, the more likely you will have one. CERN did some studies on Silent data corruption, because it's a real issue in scientific data collection.
They found a 10^-14 bit error rate on desktop hard drives (10^-15 on enterprise disks), so you can expect 1 bit error for approximately every 11.3 terabytes. And this is assuming good hardware that you've qualified and verified clean; if you had a bad sector somewhere, it's a totally different story. This also does not include other sources of errors, such as RAM errors (the Backblaze chassis doesn't use ECC memory), errors introduced by vibration due to the custom construction, or controller problems.
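The arithmetic behind that figure is easy to check. Here is a rough sketch in Python, assuming the quoted rates are errors per bit and that errors are independent (so a Poisson approximation applies); the function names are just for illustration. One error per 10^14 bits works out to 12.5 decimal terabytes, or about 11.37 binary terabytes (TiB), which matches the 11.3 figure.

```python
import math

BITS_PER_BYTE = 8
TB = 10**12    # decimal terabyte
TIB = 2**40    # binary terabyte (tebibyte)


def bytes_per_expected_error(bit_error_rate: float) -> float:
    """Average amount of data (in bytes) per single expected bit error."""
    return 1 / bit_error_rate / BITS_PER_BYTE


def expected_errors(data_bytes: float, bit_error_rate: float) -> float:
    """Expected number of bit errors across data_bytes of data."""
    return data_bytes * BITS_PER_BYTE * bit_error_rate


def chance_of_any_error(data_bytes: float, bit_error_rate: float) -> float:
    """Probability of at least one bit error, assuming independent errors (Poisson)."""
    return -math.expm1(-expected_errors(data_bytes, bit_error_rate))


if __name__ == "__main__":
    desktop, enterprise = 1e-14, 1e-15
    print(f"{bytes_per_expected_error(desktop) / TIB:.2f} TiB per expected error")
    for size_tb in (12, 63):
        for label, rate in (("desktop", desktop), ("enterprise", enterprise)):
            print(f"{size_tb} TB on {label} disks: "
                  f"{expected_errors(size_tb * TB, rate):.2f} expected errors, "
                  f"{chance_of_any_error(size_tb * TB, rate):.0%} chance of at least one")
```

So at the desktop rate, a 12TB data set is already at roughly one expected bit error per full pass, and 63TB is at about five.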
IF you are storing 63TB of data in RAID5
Simple. I'm saying software RAID5 on cheap disks is not a replacement for using high quality storage. When it comes to important data, all failures are major failures, even if you don't notice the failure.
Using two of these things is not nearly as reliable as using one good storage array, with proper disks and checksumming of data.
Reliability includes expected downtime. Downtime of your secondary servers can be costly too. All servers have downtime, the question is just.. how much of that is there on average, per year.
Then it does exactly what you say you need. IE the backup raid will compare the stripe each time you calculate the rsync hash, rsync compares CRC as well
From the storage layer's point of view, RSYNC'ing to a destination on a local file system is no different from ordinarily copying to a new file on the array. The destination will most likely be in page cache, so when RSYNC reads back bits to verify the content checksum, some of those bits will be read back from cache (not by having each physical disk read back all those bits).
RSYNC does not use raw disk I/O, it is unable to check what is stored on each stripe and actually do any RAID verification.
RSYNC is also unable to examine metadata. If the ext4/ext2/ext3/JFS/XFS/Reiser/FAT metadata for the file or directory has latent errors, it may not cause issues until well into the future.
Latent errors do not consist of only a failed write. They may also be created by stray writes or stray reads. Just because a sector was good 15 seconds after you wrote to it does not mean it will still contain good bits in 24 hours.
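One way to get the "very good checksums, and a reliable procedure for validating them" being argued about here is to keep a manifest of file hashes and re-check it on a schedule. A minimal sketch in Python follows; the function names are hypothetical, SHA-256 is my choice rather than anything from the thread, and it only covers file contents, not file system metadata.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large video files never sit in memory whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def verify(root: str, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose contents no longer match."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            bad.append(rel)
    return bad
```

Build the manifest on the source machine, store it somewhere other than the backup array, and run `verify` against the backup well after the copy finished. Per the page cache point above, a check run seconds after the write may never touch the disks at all.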
Multi-part answer:
1. If it's actually Final Cut Server, as opposed to a file server that Final Cut Pro machines dump files to: FCS is extensively extensible and scriptable, and as far as media goes you can have it set whatever backup or archive policy you like. It has commercial integration with several COTS backup and archive products, including PresStore, BakBone NetVault, and Atempo Time Navigator and Digital Archive. It's also very agnostic as to what it archives to, so if you can get an FTP or NFS share from elsewhere, you are good to go. If it is FCS, I'd suggest an Atempo-based solution, as TiNa can handle the general-purpose data as well, but PresStore is a very good alternative.
2. If it's just a file server, there is no reason why you can't script up rsync on it to push data elsewhere that makes sense.
3. Drobos are flaky. Dark Star talking-sentient-bomb crazy flaky. Back this up urgently.
It really hinges on what else you can get access to as storage across the network.
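For the "script up rsync" case, something like the following Python wrapper would do as a starting point. This is a sketch only: the host name, user, and paths in the usage example are made up, and the flag choices are mine.

```python
import subprocess


def rsync_push_command(source_dir: str, remote: str,
                       checksum: bool = False, delete: bool = False) -> list[str]:
    """Build the rsync argument list for pushing source_dir to remote over ssh."""
    cmd = ["rsync", "--archive", "--partial"]
    if checksum:
        # Compare file contents rather than size and mtime; much slower on 12TB.
        cmd.append("--checksum")
    if delete:
        # Mirrors deletions too, so only use with snapshots on the far end.
        cmd.append("--delete")
    # The trailing slash makes rsync copy the directory's contents, not the directory itself.
    cmd += ["-e", "ssh", source_dir.rstrip("/") + "/", remote]
    return cmd


def push(source_dir: str, remote: str, **options: bool) -> int:
    """Run the transfer and return rsync's exit status (0 means success)."""
    return subprocess.run(rsync_push_command(source_dir, remote, **options)).returncode


if __name__ == "__main__":
    # Hypothetical source volume and offsite host; substitute your own.
    print(" ".join(rsync_push_command("/Volumes/Media",
                                      "backup@offsite.example.edu:/backup/media")))
```

Schedule `push` from cron or launchd during the night, and alert on any non-zero return value.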
If I have spare time and someone I can hand it off to, and if my user can't afford to spend that much, and the user has a really high quality WAN with a ton of bandwidth, I tend to use OpenSolaris, ZFS, rsync, and other open source stuff. As mentioned before, ZFS is a killer file system (absolute best out there), and you can put together a cheap app server and storage server and wire this stuff together. It does require handing off to someone who knows what they're doing; new releases and bugs and stuff can really suck time from you. Most of the time I use Unitrends. It's an integrated appliance that does disk-to-disk backup and has replication and all that stuff built in, with a killer user interface and support I can point someone to instead of having to do it myself. I like the disk archive stuff because most of the time WAN bandwidth is an issue, and I like whatever they've done in replication because it seems faster than rsync on what I do. Plus, because they're unknown, I like the fact I can get this priced below the pure software stuff from Symantec (which has the worst support in the universe) and CommVault (expensive as hell). I think the user interface they have kicks ass too, particularly when I have to hand off to someone who isn't as technical as I am.
... for things like databases and other highly transactional applications.
IANAL but write like a drunk one.
It is highly dispiriting that, after reading most comments on this thread, only one poster mentioned LTO tapes (or any tapes or long-term archival means, for that matter).
By copying your data to another machine (the underlying file system is irrelevant) you are only creating a set of data that is highly available, but not one that is properly backed up.
Backups refer to archival and retention of data for long periods of time (months or years). Putting your data in another machine simply does not fulfil this requirement.
Disks were not designed as long-term archival means; you will find this out the hard way.
All the well intentioned comments on this thread are describing how to make your data more available, but the handling of backups implies much more than quick access to a recent set of data.
Although ZFS could be part of the chain of your backup strategy, if the data does not go ultimately to archival tape which is registered and stored safely, then you are deluding yourself if you think you have got the backup problem cracked.
IANAL but write like a drunk one.