Best Backup Server Option For University TV Station?

Done to death. by Zlurg · 2009-09-16 14:43 · Score: 5, Funny

Holy crap we're approaching the need for an Ask Slashdot FAQ. I feel old.

Re:Done to death. by Magic5Ball · 2009-09-16 14:54 · Score: 5, Informative

Cue the usual discussion about defining the problem correctly, choosing the right tool for the job, etc.
Specifically:
"Would a Linux box with rsync work?" - It depends on the objective business requirements you've defined or been given. If those requirements include "has to be implemented on Foo operating system", then those requirements are not just for a backup solution.
"What is the sweet spot between value and longevity?" - Simple: Graph accumulated TCO/time based on quotes from internal and external service providers. Throw in some risk/mitigation. Find the plot which best meets your cost/time/business requirements.
"What solution would you use?" - Almost certainly not the solution you would use, because my needs are different. What is your backup strategy? What are your versioning requirements? What are your retention requirements? (How) do you validate data? Who should have access? What is an acceptable speed for access to archived data? What's an acceptable recovery scenario/timeline, etc.
If you do not already know the answers to those questions, or how to find reasonable answers, ask neighboring university TV stations until you find one which has implemented a backup solution with business requirements similar to yours, and copy and paste the appropriate bits. You'll likely get better answers from people who have solved your exact problem before if you search Google for the appropriate group/mailing list for your organization's level of operating complexity and ask there, instead of asking generalists on Slashdot and hoping that someone from your specialist demographic is also here.
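The "graph accumulated TCO over time" step can be sketched as a tiny cost model. This is a minimal illustration, assuming each option reduces to an upfront cost plus a flat monthly cost; `Option`, `cheapest_at`, `crossover_month` and every figure used with them are hypothetical, and the risk/mitigation part is not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Option:
    """One candidate backup approach, reduced to a simple cost model."""
    name: str
    upfront: float  # hardware purchase, installation, initial setup labour
    monthly: float  # power, support contract, subscription, staff time

    def cost_at(self, months: int) -> float:
        """Accumulated total cost of ownership after `months` months."""
        return self.upfront + self.monthly * months


def cheapest_at(options: Sequence[Option], months: int) -> Option:
    """The option with the lowest accumulated cost at a given point in time."""
    return min(options, key=lambda o: o.cost_at(months))


def crossover_month(a: Option, b: Option, horizon: int = 120) -> Optional[int]:
    """First month at which `a` has cost no more than `b` in total, or None within the horizon."""
    for month in range(horizon + 1):
        if a.cost_at(month) <= b.cost_at(month):
            return month
    return None
```

With made-up numbers, a $8,000 DIY box costing $150/month and a $500/month hosted service break even at month 23: the hosted service is cheaper at 12 months, the DIY box at 36. Plotting `cost_at` for each quote over the expected service life gives the graph described above.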

--
There are 1.1... kinds of people.
Re:Done to death. by neiras · 2009-09-16 14:55 · Score: 5, Funny

I feel old.
Well, your UID makes you older than me.
I SAID, YOUR UID MAKES YOU OLDER THAN ME.
Also, my name is NOT "sonny boy", and this is my lawn, not yours. Where do you think you are, old timer?
Re:Done to death. by Barny · 2009-09-16 15:04 · Score: 4, Funny

/me loads his shotgun and squints
Just walk away kid, real slow, and keep your hands where I can see em

--
...
/me sighs
Re:Done to death. by DNS-and-BIND · 2009-09-16 15:12 · Score: 2

Especially since this isn't even an "Ask Slashdot", it's in the "Linux" category. It's just the editors not reading their own site. "Throw this out there, this should be some red meat for the troops," that sort of thing.

--
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
Re:Done to death. by ndege · 2009-09-16 15:38 · Score: 4, Funny

What? Can't hear ya...you gotta speak up!

--
Sig Return: 204 No Content
Re:Done to death. by Quentusrex · 2009-09-16 15:47 · Score: 1

Now I feel young...
Re:Done to death. by ge · 2009-09-16 16:01 · Score: 2, Funny

Calm down, grandpa....
Re:Done to death. by SanityInAnarchy · 2009-09-16 16:35 · Score: 1

So, some quick answers here:

"Would a Linux box with rsync work?" - It depends on the objective business requirements you've defined or been given. If those requirements include "has to be implemented on Foo operating system", then those requirements are not just for a backup solution.
However, the fact that it's been suggested means it probably would work. A better solution (also old enough to be in the FAQ) is rdiff-backup.

"What solution would you use?" - Almost certainly not the solution you would use, because my needs are different.
True, you often need a custom solution. Just as often, a generic solution works. For much of the population, if they're on OS X, I'd say use Time Machine. If they like Internet backup, I'd say use Jungle Disk. And so on.
In this case, yes, there are questions that need to be asked regarding the volume of data. But the differences between various backup schemes really aren't that big -- in this case, a Linux server and rsync (or rdiff-backup) sounds like it'd work, and since it's a backup server, any RAID level other than 0 should be sufficient.
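Plain rsync keeps only one copy. A common way to get dated versions out of it is rsync's `--link-dest` option, which hard-links unchanged files against the previous snapshot. That is a different technique from rdiff-backup (which stores reverse diffs). The sketch below only builds and runs the command; the paths and the one-directory-per-day layout are assumptions for illustration.

```python
from __future__ import annotations

import datetime
import posixpath
import subprocess


def snapshot_command(source: str, backup_root: str, day: datetime.date,
                     previous: str | None = None) -> list[str]:
    """Build an rsync command that writes a dated snapshot directory.

    When `previous` (an absolute path to the last snapshot) is given, rsync's
    --link-dest hard-links unchanged files against it, so every snapshot looks
    like a full copy while only changed files consume new space.
    """
    dest = posixpath.join(backup_root, day.isoformat())
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        cmd.append(f"--link-dest={previous}")
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd


def take_snapshot(source: str, backup_root: str, previous: str | None = None) -> None:
    """Run today's snapshot; raises CalledProcessError if rsync fails."""
    cmd = snapshot_command(source, backup_root, datetime.date.today(), previous)
    subprocess.run(cmd, check=True)
```

Expiring old versions is then just deleting old dated directories; a file's data is freed once no remaining snapshot links to it.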

--
Don't thank God, thank a doctor!
Re:Done to death. by symbolset · 2009-09-16 17:00 · Score: 1

The correct answers change every week.

--
Help stamp out iliturcy.
Re:Done to death. by dgatwood · 2009-09-16 17:30 · Score: 1

In this case, for the Mac OS X installation, my answer would be the same as any other user (or at least the client side portion is the same):
Time Machine to an Xserve attached to a giant hardware RAID 5 array via Fibre Channel. In other words, the same way I back up my laptop except with a serious server providing the disk instead of an ABS....
You should be able to back up the office file server to a Mac OS X Server box just as easily as you could back up to a Linux box, but the reverse isn't true. Backing up a Mac OS X installation with resource forks, extended attributes, etc. to a Linux box is nontrivial at best.... Been there, tried that, never got it working reliably.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Done to death. by Leto-II · 2009-09-16 17:34 · Score: 3, Funny

Young kids these days and their rock music... Deaf at such an early age.

--
Do not anger the worm.
Re:Done to death. by SanityInAnarchy · 2009-09-16 17:45 · Score: 1

Backing up a Mac OS X installation with resource forks, extended attributes, etc. to a Linux box is nontrivial at best
Depends what you need. If it's just someone's laptop, a raw disk image is still useful. If it's an external drive or a network share, you can format the drive such that it can be plugged directly into a backup server, and you can use a Linux fileserver.
My choice would be a Mac laptop, a disk image, and a Linux fileserver for anything that won't fit on internal storage.

--
Don't thank God, thank a doctor!
Re:Done to death. by adolf · 2009-09-16 18:34 · Score: 1

What?

--
Kid-proof tablet..
Re:Done to death. by rikkards · 2009-09-16 22:01 · Score: 0, Troll

Old people and their sense of entitlement... until they get put into a home and forgotten about.
Re:Done to death. by herbman · 2009-09-16 22:56 · Score: 2, Funny

Young kids these days and their rock music... Deaf at such an early age.
excuse me?

--
your mom!
Re:Done to death. by Anonymous Coward · 2009-09-16 23:53 · Score: 1, Funny

Young kids these days and their funny notions them there colored folk are the same as us
The thread about gene therapy for color blind monkeys is in another posting, not this one.
Re:Done to death. by alop · 2009-09-17 01:26 · Score: 1

Holy crap... I just realized my UID is only 5 digits... Where did the past 10 years go?

--
--alop
Re:Done to death. by krewemaynard · 2009-09-17 01:47 · Score: 1

UID FIGHT!!!!

--
I saw it on Slashdot, it must be true!
Re:Done to death. by everett · 2009-09-17 01:56 · Score: 1

I first read that as "/me loads his shotgun and squirts" and was gonna suggest you check out the Depends aisle at your grocery store.

--
Sig withheld to protect the innocent.
Re:Done to death. by Anonymous Coward · 2009-09-17 02:13 · Score: 0

No kidding. Ever seen a 3-digit UID before?
Re:Done to death. by IICV · 2009-09-17 02:21 · Score: 1

He said "DEAF AT AN EARLY AGE". Not that you'd know anything about that any more, huh?
Re:Done to death. by Lumpy · 2009-09-17 02:25 · Score: 1

And why try and do it the most expensive way possible?
Get a network SDAT carousel drive, have someone take the tapes off-site nightly, and it's all done.

--
Do not look at laser with remaining good eye.
Re:Done to death. by Lumpy · 2009-09-17 02:27 · Score: 1

That's funny, Jeb, you old coot. Don't worry, kid. He can't see, and he hasn't had the strength to squeeze the trigger for years now.
Hell, we replaced his two shotgun shells with flares years ago, and he still hasn't noticed.

--
Do not look at laser with remaining good eye.
Re:Done to death. by cbreaker · 2009-09-17 03:39 · Score: 1

The only difference between a four and six digit UID was about 8 months.

--
- It's not the Macs I hate. It's Digg users. -
Re:Done to death. by stokessd · 2009-09-17 03:55 · Score: 1

I bow to your 3 digit UID
Re:Done to death. by rinoid · 2009-09-17 04:03 · Score: 1

... and the difference between a 3 digit and a 5 digit is best answered by a sloth.
Re:Done to death. by Spectre · 2009-09-17 04:06 · Score: 1

Just plant some flowers on my grave on your way offa my lawn, er, plot.

--
"Flame away, I wear asbestos underwear"
Re:Done to death. by Anonymous Coward · 2009-09-17 04:12 · Score: 0

And it's still worth asking from time to time, given the rapid advancement in storage and the not-so-rapid advancement in open-source continuous data replication software.
Re:Done to death. by Anonymous Coward · 2009-09-17 06:30 · Score: 0

There's a backup company called Arkeia that is VERY strong in the Linux community. They have features comparable to what you would get with products like Backup Exec, TSM, EMC NetWorker, etc. I would recommend taking a look at them. I think they also have an appliance. I've worked with their product before; they even offer full support during the 30-day trial.
The main drawback of rsync is that you don't get the versioning and catalog/indexing that you would normally get with a real backup application. But yeah, there are so many other things to consider. Will you be backing up to disk only? To tape? Or some combination of the two? Remember, depending on your retention requirements and growth rate, 12TB of data could require at least 24TB of backup capacity and probably much more...
Do you have off-site requirements? What about bare-metal recovery?
Just think it through before you start implementing, regardless of whether or not you are going with something open-source.
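The "12TB could need at least 24TB" point follows from simple arithmetic on retention and growth. A rough sketch, assuming full copies plus incrementals that each capture a fixed fraction of the data; the function name and all inputs are hypothetical, and compression/deduplication are ignored.

```python
def backup_capacity_tb(data_tb: float, annual_growth: float, years: float,
                       fulls_retained: int, incrementals_retained: int = 0,
                       change_fraction: float = 0.0) -> float:
    """Rough raw backup capacity needed at the end of the planning horizon.

    data_tb:               primary data today
    annual_growth:         e.g. 0.5 for 50% growth per year
    fulls_retained:        complete copies kept on the backup target
    incrementals_retained: incremental sets kept alongside them
    change_fraction:       share of the data set that changes per incremental
    """
    projected = data_tb * (1 + annual_growth) ** years
    return projected * (fulls_retained + incrementals_retained * change_fraction)
```

Keeping two fulls of 12 TB with no growth gives 24 TB. With an assumed 50% growth over one year, two fulls, and 30 daily incrementals at 2% change each, the estimate rises to about 46.8 TB.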
Re:Done to death. by minion · 2009-09-17 07:12 · Score: 1

I feel old.
Well, your UID makes you older than me.
I SAID, YOUR UID MAKES YOU OLDER THAN ME.
Also, my name is NOT "sonny boy", and this is my lawn, not yours. Where do you think you are, old timer?
Well, my UID makes me older than both of you. Now get off my lawn. :)

--

-- If we don't stand up for our rights, now, there will be no right to stand up for them later.
Re:Done to death. by Wolfrider · 2009-09-17 07:20 · Score: 1

( Grumblez about the rising costs of $EVERYTHING )

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
Re:Done to death. by eleuthero · 2009-09-17 09:11 · Score: 1

Young kids these days and their rock music... Deaf at such an early age.
Speaking of, can I get the ticker symbols for a few good hearing aid manufacturers?

Related to one of the parent posts above, why should we use a real Time Machine when any other NAS running Linux can be set up to act as a Time Machine backup target?
Re:Done to death. by dgatwood · 2009-09-17 13:25 · Score: 1

I like Time Machine because you don't have to think about it much. When you're talking about backing up a non-server machine, being able to pull down a menu and choose "Back Up Now" is a real advantage. Imaging a drive every so often certainly isn't a bad idea, of course, but a disk image is to Time Machine as a hydrogen bomb is to a flyswatter....

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Done to death. by Red+Storm · 2009-09-17 15:56 · Score: 1

Rock music.... isn't that what we listened to last century? I've heard these whipper snappers today listen to something called techno....

--
---- Fight to protect your right to keep and arm bears! ummmm... ya I think that's right....

Final Cut has a solution by Anonymous Coward · 2009-09-16 14:44 · Score: 0

Use Final Cut Server.

Build a Backblaze Storage Pod. by neiras · 2009-09-16 14:50 · Score: 4, Interesting

Try one of these babies on for size. 67TB for about $8,000.

There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.

Talk to a mechanical engineering student on campus, they can probably help with that.

Re:Build a Backblaze Storage Pod. by Anonymous Coward · 2009-09-16 15:02 · Score: 4, Informative

You might have mentioned the Slashdot article on these from two weeks ago.
Re:Build a Backblaze Storage Pod. by illumin8 · 2009-09-16 16:03 · Score: 4, Interesting

Try one of these babies on for size. 67TB for about $8,000.
There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.
Talk to a mechanical engineering student on campus, they can probably help with that.
Better yet, just subscribe to Backblaze and pay $5 a month for your server. Problem solved.

--
"When the president does it, that means it's not illegal." - Richard M. Nixon
Re:Build a Backblaze Storage Pod. by Firehed · 2009-09-16 16:16 · Score: 1

For that much data, that's only a practical solution if you've got a dedicated 100Mbit or faster (1Gbit?) line just to upload. And downloading the data back is going to take quite some time as well.
Plus I think the $5/mo is only for home/personal use - that tends to be the case with most of their competition at least.
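The upload-time concern is easy to quantify. A minimal sketch; the 0.8 default efficiency (the share of line rate actually achieved) is an assumption, not a measurement.

```python
def transfer_hours(data_tb: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Hours to move `data_tb` (decimal terabytes) over a link of `link_mbit_s` megabits/s.

    `efficiency` is the fraction of line rate actually achieved after protocol
    overhead and competing traffic; 0.8 is an assumption, not a measurement.
    """
    bits = data_tb * 1e12 * 8
    return bits / (link_mbit_s * 1e6 * efficiency) / 3600
```

Even at a perfect 100 Mbit/s, 12 TB takes about 267 hours (roughly 11 days) each way; at 1 Gbit/s it is still about 27 hours. The same arithmetic applies to a full restore.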

--
How are sites slashdotted when nobody reads TFAs?
Re:Build a Backblaze Storage Pod. by mlts · 2009-09-16 16:31 · Score: 3, Insightful

Remote storage at a provider like Backblaze, Mozy, or Carbonite is a good tertiary level of backup, just in case your site goes down, but you are limited by your Internet pipe. A full restore of terabytes of video through a typical business Internet connection will take a long time, perhaps days. Of course, you could order a hard disk or several from the backup company, but then you are stuck waiting for the data to physically arrive.
Remote storage is one solution, but before that, you have to have local backups in place for faster recovery should a disaster happen. The first line of defense against hard disk failure is RAID. The second line of defense would be a decent tape drive, a tape rotation, and offsite capabilities. This way, if you lose everything on your RAID (malware or a blackhat formats the volume), you can stuff in a tape, sit on pins and needles for a couple of hours, and get your stuff back, perhaps as of a day or two earlier.
For a number of machines, the best thing to have would be a backup server with a large array and D2D2T (disk-to-disk-to-tape) capabilities, so you can do fast backups through the network (or perhaps through a dedicated backup fabric), then, when you can, copy them to tape for offline storage and send the tub off to Iron Mountain.
Of course, virtually all distributed backup utilities support encryption. Use it. Even if it is just movies.
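The post mentions "a tape rotation" without naming one. Grandfather-father-son is one common scheme; the exact policy below (backup days, label names, which tapes go off-site) is an assumption for illustration, not something the poster specified.

```python
from __future__ import annotations

import calendar
import datetime


def is_last_friday(day: datetime.date) -> bool:
    """True if `day` is a Friday and the following Friday falls in the next month."""
    return (day.weekday() == calendar.FRIDAY
            and (day + datetime.timedelta(days=7)).month != day.month)


def tape_for(day: datetime.date) -> str | None:
    """Label of the tape to load on `day` under a simple grandfather-father-son rotation.

    Mon-Thu: four "son" tapes, overwritten every week.
    Friday:  a "father" tape per week of the month, overwritten every month,
             except the last Friday, which gets a "grandfather" tape that is
             retired and sent off-site.
    Weekends: no backup.
    """
    weekday = day.weekday()
    if weekday >= calendar.SATURDAY:
        return None
    if weekday == calendar.FRIDAY:
        if is_last_friday(day):
            return f"MONTHLY-{day:%Y-%m}"
        return f"WEEKLY-{(day.day - 1) // 7 + 1}"
    return f"DAILY-{calendar.day_name[weekday].upper()}"
```

For September 2009 this yields `DAILY-WEDNESDAY` on the 16th, `WEEKLY-3` on Friday the 18th and `MONTHLY-2009-09` on Friday the 25th, so roughly a dozen tapes cover a month while monthly tapes accumulate off-site.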
Re:Build a Backblaze Storage Pod. by SanityInAnarchy · 2009-09-16 16:38 · Score: 1

Problem: Windows/Mac only.

--
Don't thank God, thank a doctor!
Re:Build a Backblaze Storage Pod. by coffee_bouzu · 2009-09-16 16:44 · Score: 1

Try one of these babies on for size. 67TB for about $8,000.
That could only be a good idea for large installations like Backblaze's. You need to have lots of spares of everything, extra capacity for failures, and someone on call to fix the thing when it breaks. There is almost no redundancy, and they use consumer-grade hardware, which means that there will be frequent hardware failures. If you have a ton of the things, this isn't so much of an issue and it probably does end up being cheaper. But using just a couple of those things, let alone one, would be an exercise in sheer stupidity.

I think that it all depends on what your budget is and what you have access to. You don't need fibre unless it's already there and you have the hardware and knowledge to use it. Those cards ain't cheap and would add a lot of complexity. Unless you have huge amounts of data (on the scale of several hundred GB or more) changing on a daily basis, or can't afford to lose anything at all in case of catastrophic failure, just use the GigE or 100M that is already in place. If that isn't enough, you should really be looking at systems that are designed for remote replication, and that gets really expensive, really fast.

Are you looking for an off-site mirror or backups? Those are not the same thing and you need to make sure that you know which you really need. I sincerely doubt that you need to be worrying about recovery time if the building burns down. Just worry about reducing the risk of data loss in the event of failure.

For backups, KISS is the most important thing. If something in the university is already in place, use that. Backup administration is a PITA and you're going to have to hand it off to someone once you leave (assuming you're a student) who may know almost nothing about computers. The simpler the system is to use, the easier the handoff is going to be and the less the people after you will hate you. Someone else mentioned asking the IT department if they offer any backup programs. That would be the best solution, I think. More expensive on paper? Possibly, but not likely. More expensive overall? I doubt it.

If you end up building a backup server, take into account hardware failures and how much time people can spend babysitting the thing. ZFS has some awesome features, but I don't think that I would use it with anything other than Solaris or maybe BSD. There is no way that I'm going to trust backups to ZFS on FUSE under Linux. Then again, I don't think that ext3 or ext4 would be my first choices, either. Personally, I would probably go with JFS on Linux if you have 12TB and growing. Look into external storage arrays. I'm not so familiar with this price range, but HP's MSA 2000 or something comparable might be a good choice if you have the $$$. Just remember to budget for replacement drives if you go the hard-drive-based route. I'm using a hard-drive-backed backup solution at work, and BackupPC is what I have been using for software. I have been pretty happy with it so far. It's free, it works, and it has some really good features (like intelligent backup, so that it doesn't just blindly store 20 copies of the same file), but it has a bit of a learning curve. It's nothing like Zmanda's MySQL backup, but it needs a little more than a few clicks. I also wish it didn't hit the backup targets so hard, but that may not be an issue for you.
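The "doesn't blindly store 20 copies of the same file" feature is a form of pooling: identical file contents are stored once and referenced from every backup. The toy store below illustrates the idea with SHA-256 digests held in memory; it is a simplified sketch, not BackupPC's actual on-disk format (which uses its own pool layout and hard links), and `PooledStore` is a hypothetical name.

```python
from __future__ import annotations

import hashlib


class PooledStore:
    """Toy content-addressed backup store: each distinct file body is kept once.

    A simplified illustration of pooling, not BackupPC's on-disk format.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}             # digest -> file contents
        self.backups: dict[str, dict[str, str]] = {}  # backup name -> {path: digest}

    def add_backup(self, name: str, files: dict[str, bytes]) -> None:
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # only the first copy is stored
            manifest[path] = digest
        self.backups[name] = manifest

    def restore(self, name: str, path: str) -> bytes:
        return self.blobs[self.backups[name][path]]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())
```

Twenty identical 1,000-byte clips backed up on two nights, plus one new 10-byte file, occupy 1,010 bytes in the pool rather than 40,010.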

I know that there is the temptation to do something really cool and roll your own. I get that temptation a lot, too, but you need to ask yourself if you're doing it for fun or to get the job done. If it's the former, good for you and I'm jealous, but I suspect it's the latter. Do the minimum to satisfy the requirements with the least amount of required maintenance and the least cost. Let IT worry about backup systems if you can. That way you can worry about making television programs instead of checking up on the backup server whenever something hiccups or WHEN (not if) hardware fails.

I hope that my rambling helped a little. Good luck in figuring this out.
Re:Build a Backblaze Storage Pod. by mysidia · 2009-09-16 17:20 · Score: 1

Maybe, but you will need at least 3 of them and some pretty smart software to have reasonable reliability. The cheapness of backblaze's pods comes from cutting a lot of corners in the hardware.
If you look at the design of Backblaze, this is not server grade equipment: they have two power supplies, but they are not redundant, and these are desktop power supplies not designed to be operated 24/7/365.
If PSU B goes out, a large number of your disk drives and fans go offline; so you lose data and maybe burn up.
And PSU A powers the mainboard...
The disks used by Backblaze are Seagate 7200.11.
These are desktop drives, not designed for 24x7 operation like in a backup server, MTBF stats are based on 1200 power-on hours per year.
Hello... silent data corruption. The last thing you need is for the time to come when you want to restore your backup, and you find that latent defects on some of your disks have introduced errors into the bits that got saved.
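One way to catch latent corruption before restore day is to record a checksum for every file at backup time and periodically re-read and compare. A minimal sketch using SHA-256; where the manifest is kept (ideally somewhere other than the backup disks) is left as an assumption.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large video files don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root) -> dict:
    """Map each file's path (relative to `root`) to its SHA-256, taken at backup time."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): file_sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root, manifest: dict) -> list:
    """Return paths that are missing or whose contents no longer match the manifest."""
    root = Path(root)
    return [rel for rel, expected in manifest.items()
            if not (root / rel).is_file() or file_sha256(root / rel) != expected]
```

Running `verify` on a schedule turns "we find out during the restore" into "we find out while the original still exists."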
Re:Build a Backblaze Storage Pod. by drsmithy · 2009-09-16 20:56 · Score: 1

Try one of these babies on for size. 67TB for about $8,000.
Although, if you want a solution that's fast and reliable, you probably shouldn't.
Re:Build a Backblaze Storage Pod. by c6gunner · 2009-09-16 23:04 · Score: 1

The disk access speed on those must be truly pathetic. Putting 15 drives on a single PCI bus is rather like trying to suck a bowling ball through a garden hose.
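The bus concern can be put in numbers: a sustained stream runs no faster than its slowest stage. The figures used below are assumptions: about 133 MB/s theoretical peak for classic 32-bit/33 MHz PCI (shared by everything on the bus), 125 MB/s raw for gigabit Ethernet, and a hypothetical 100 MB/s per drive.

```python
def effective_mb_s(per_disk_mb_s: float, n_disks: int,
                   bus_mb_s: float, network_mb_s: float) -> float:
    """Sustained rate for one big backup stream: the slowest stage wins."""
    return min(per_disk_mb_s * n_disks, bus_mb_s, network_mb_s)


def per_disk_share(bus_mb_s: float, n_disks: int) -> float:
    """Bandwidth each disk gets when all of them stream through one shared bus."""
    return bus_mb_s / n_disks
```

Under those assumptions, 15 drives sharing one PCI bus get under 9 MB/s each when all are busy, yet a single stream over gigabit Ethernet is still capped by the network (125 MB/s) rather than the bus (133 MB/s). A faster network would expose the bus as the limit.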
Re:Build a Backblaze Storage Pod. by petermgreen · 2009-09-17 00:10 · Score: 1

Try one of these babies on for size. 67TB for about $8,000.
Unfortunately there are a couple of custom parts in there (IIRC the main ones are the case and the port multiplier backplanes), plus you probably won't get the same bulk discounts on the commodity parts, so building one may cost you quite a bit more than it costs Backblaze.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re:Build a Backblaze Storage Pod. by Nutria · 2009-09-17 00:44 · Score: 1

Remote storage at a provider like Backblaze, Mozy, or Carbonite is a good tertiary level backup
For 12TB data?

--
"I don't know, therefore Aliens" Wafflebox1
Re:Build a Backblaze Storage Pod. by Culture20 · 2009-09-17 01:32 · Score: 1

The disk access speed on those must be truly pathetic. Putting 15 drives on a single PCI bus is rather like trying to suck a bowling ball through a garden hose.
So just go down to your local sleazy street corner and hire a professional to set the system up. Experience is everything.
Re:Build a Backblaze Storage Pod. by LWATCDR · 2009-09-17 02:01 · Score: 1

But it is a backup server not a database server. As long as they can keep up with the network it will be good enough.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Build a Backblaze Storage Pod. by c6gunner · 2009-09-17 03:37 · Score: 1

They've got 12 terabytes of data. A generous estimate tells me it would take about 100 hours for the initial backup. If they only generate 1 terabyte of new data per day, they'd be looking at something like a 10 hour back-up time every night. I suppose it's usable in a school environment, but it would still suck.
They did mention they wanted to set it up on a gigabit network, or possibly fiber, which makes me think they'd be transferring a hell of a lot of data to it. With such slow disk speeds, they definitely wouldn't need fiber.
The best (cheap) solution would be to look for a motherboard that supports 8 SATA devices, and then throw in a couple of cheap 4-port PCIe SATA cards. That gives 'em the ability to toss in up to 16 drives, which would provide plenty of capacity even in a RAID6 or RAIDZ2 configuration. And the write speed would be massively improved.
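Whether 16 drives give "plenty of capacity" under double parity is simple arithmetic: each RAID 6 or RAID-Z2 set spends two drives' worth of space on parity. A sketch; the 1.5 TB drive size is a hypothetical example, and the four-drive minimum mirrors Linux md's rule for RAID 6.

```python
def raid6_usable_tb(n_drives: int, drive_tb: float, sets: int = 1) -> float:
    """Usable space when `n_drives` are split into equal RAID 6 / RAID-Z2 sets.

    Each set spends two drives' worth of space on parity, so it survives any
    two drive failures within that set.
    """
    if n_drives % sets:
        raise ValueError("drives must divide evenly into sets")
    per_set = n_drives // sets
    if per_set < 4:
        raise ValueError("this sketch requires at least 4 drives per set")
    return sets * (per_set - 2) * drive_tb
```

Sixteen 1.5 TB drives in one set give 21 TB usable; splitting them into two 8-drive sets costs another two drives (18 TB) but limits how many disks a single rebuild has to read.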
Re:Build a Backblaze Storage Pod. by cbreaker · 2009-09-17 04:08 · Score: 1

Woah woah woah. Hold on there. What do you mean, frequent failures because of "consumer grade" hardware?

That's utter bullshit and you should know that. Desktop hard drives have not been proven to be any less or more reliable than "enterprise" disks. Ever. At my last job, we used a shit ton of desktop hard drives in our data center and the failure rate was no different. Sure, a few hundred disks isn't a great sampling but nobody else has proven it either.

Besides, the Backblaze pod is set up as four sets of RAID 6 on ZFS. A lot of disks would have to fail here. And the redundancy is NO different from any server you'd find from ANY manufacturer of commodity PCs.

He's not asking for mainframe-class hot-site backups here (although we're not exactly sure what he's asking for, and that would have been something to mention); he's apparently looking for a cheap place to store a bunch of terabytes in case the building burns down.

--
- It's not the Macs I hate. It's Digg users. -
Re:Build a Backblaze Storage Pod. by cbreaker · 2009-09-17 04:16 · Score: 1

Okay, there's many problems with your post.

What IS "Server-Grade" equipment, anyways? Because for the thousands of servers I've touched in the last decade, the only differences seem to be multiple power supplies and RAID cards. Many servers have dual-CPU sockets, sometimes four.

So really, the only real difference here is that they didn't put in *redundant* power supplies. Big whoop. Last time I had a PSU on a server go bad was 2001.

PSU A powers the mainboard. And? You do realize that these hard drives don't actually use much power and they've used 750W supplies?

And again with the unsubstantiated claim that desktop hard drives are not as reliable as "enterprise" hard drives, and you even used MTBF. MTBF is more about warranty than anything else. The manufacturer wants to sell you the much more expensive "server" drives. They're made on the same assembly lines with most of the same parts. They weigh reliability against performance - Desktop drives usually have more space and slower rotational speeds, server drives spin faster with less space.
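Part of the MTBF disagreement is about duty cycle. One way to compare spec-sheet figures is to convert MTBF to an annualized failure rate (AFR) for a given number of power-on hours, assuming a constant failure rate (the exponential model). The 750,000-hour MTBF used below is a hypothetical example, not a figure for any drive mentioned here.

```python
import math


def afr_from_mtbf(mtbf_hours: float, power_on_hours_per_year: float = 8760) -> float:
    """Annualized failure rate implied by an MTBF, assuming a constant failure rate.

    Under the exponential model, P(fail within t hours) = 1 - exp(-t / MTBF).
    """
    return 1 - math.exp(-power_on_hours_per_year / mtbf_hours)
```

The same 750,000-hour figure works out to about 1.16% per year when run 24x7 (8,760 hours), but only about 0.32% per year at 2,400 power-on hours. So a rating quoted for a light duty cycle says little about a backup server until it is rescaled.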

Silent data corruption isn't a problem with Linux RAID 6. It will routinely "scrub" the disk data to make sure there's no parity errors. This effectively prevents this problem. All decent RAID cards do this, and so does Linux RAID. If parity errors are found the disk goes offline and you're alerted. Crisis averted.
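On Linux md, a scrub is driven through sysfs: writing `check` to `/sys/block/<dev>/md/sync_action` starts one, and `mismatch_cnt` in the same directory reports how many inconsistent sectors the last check found. A minimal sketch; `md0` is a placeholder device name, and the `sysfs_root` parameter exists only so the code can be exercised without a real array.

```python
from pathlib import Path


def start_md_check(device: str = "md0", sysfs_root: str = "/sys/block") -> None:
    """Ask the md driver to read every stripe and compare data with parity (needs root)."""
    Path(sysfs_root, device, "md", "sync_action").write_text("check\n")


def md_check_status(device: str = "md0", sysfs_root: str = "/sys/block") -> tuple:
    """Return (current sync action, mismatch count) for an md array."""
    md_dir = Path(sysfs_root, device, "md")
    action = (md_dir / "sync_action").read_text().strip()
    mismatches = int((md_dir / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

A cron job could start the check, poll until the action returns to `idle`, and alert on a nonzero mismatch count. Writing `repair` instead of `check` asks md to rewrite inconsistent parity.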

Please, stop reading from a text book and get some experience in the field.

--
- It's not the Macs I hate. It's Digg users. -
Re:Build a Backblaze Storage Pod. by cbreaker · 2009-09-17 04:23 · Score: 1

Actually, it's probably not too bad.

PCI isn't that slow, and it seems as though their main concern is more about access time versus transfer rates. In fact, most enterprise computing looks at disks from an IOPS perspective and you don't even need to worry about transfer.

So, your average PCI bus won't be a bottleneck here, the disks will, and even if we were using PCIe x8 instead of PCI, it's unlikely the IOPS performance would be any different.

They put 15 disks in each array. Having that many spindles in the array will increase IOPS performance greatly. And, you underestimate Linux RAID performance.
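The spindle-count argument can be estimated with the usual back-of-the-envelope model: each random I/O on one disk costs an average seek plus half a rotation, reads spread across all spindles, and a small RAID 6 write costs about six back-end I/Os. The 8.5 ms seek time is a hypothetical desktop-drive figure.

```python
def disk_iops(rpm: float, avg_seek_ms: float) -> float:
    """Random IOPS of one spindle: each I/O pays an average seek plus half a rotation."""
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000 / (avg_seek_ms + rotational_latency_ms)


def array_iops(n_disks: int, per_disk_iops: float,
               write_fraction: float = 0.0, write_penalty: int = 6) -> float:
    """Approximate front-end random IOPS of a striped parity array.

    Reads spread across all spindles; each small write costs `write_penalty`
    back-end I/Os (6 is the usual rule of thumb for RAID 6: read data, P and Q,
    then write all three back).
    """
    backend = n_disks * per_disk_iops
    return backend / ((1 - write_fraction) + write_fraction * write_penalty)
```

A 7200 rpm drive with an 8.5 ms seek manages roughly 79 random IOPS, so 15 of them give on the order of 1,200 random reads per second, far less than even PCI can carry. Random small writes drop to about a sixth of that.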

--
- It's not the Macs I hate. It's Digg users. -
Re:Build a Backblaze Storage Pod. by cartman94501 · 2009-09-17 05:52 · Score: 1

The people who build Backblaze's cases can also build one just for you, which is super-easy, but in quantity 1 it will cost a fair bit more (say 20%) than the price listed.
Re:Build a Backblaze Storage Pod. by router · 2009-09-17 06:32 · Score: 1

I've had power supplies die. In Cisco routers. Several times. Might have been something in the input or load that was killing them. They were redundant, so no downtime (except the time both of them failed, one several months (!) before the second...). So it does happen, rarely, far less often than drives or CPUs or memory or mainboards. But if you're going redundant and have the budget, you might as well; there's something intrinsically satisfying about having a part go bad and not take the server down. Satisfaction that is rare indeed in the System Admin profession....
andy
Re:Build a Backblaze Storage Pod. by LWATCDR · 2009-09-17 09:03 · Score: 1

I agree with you there but it is still workable if a little slow.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:Build a Backblaze Storage Pod. by cbreaker · 2009-09-17 10:01 · Score: 1

Certainly; it's a great idea to have redundant PSUs on critical hardware if available, but I wouldn't call your average beige-box server unreliable because of that single omission.

But really, that's the biggest difference between "server" computers and "desktop" ones when it comes to fault tolerance. That, and the aforementioned RAID controller of some sort. Those aren't even present on low end servers..

Sure, a decent server from HP will probably be more reliable, but it's usually because they chose components conservatively. RAM is never clocked to the extreme, it's always ECC, and they use very well tested chipsets. However, you can make a good selection of a commodity board and get all of these things too; you just have to be choosy.

The #1 reason for server downtime in my experience has been RAM or a bad component on a main board. #2 is RAID cards, and a far distant #3 is disk failure.

There are some excellent servers out there that provide very good levels of reliability utilizing things like mirrored/hot-plug RAM, hot-plug PCI cards, etc. Of course, these systems are extremely expensive; that's why most SAN systems simply use two "servers" (like a Clariion; it's just two Linux boxes) and customized software/hardware to facilitate seamless failover.

So no, a Backblaze type of system isn't ideal for mission critical storage but it provides a similar level of redundancy to most servers while being orders of magnitude cheaper.

It irks the shit out of me when I see people over and over saying desktop hard drives are crap and servers are somehow better. These people don't realize that it's all the same stuff. It's like a Macintosh versus a normal PC. The only difference is a 512Kbit BIOS update.

--
- It's not the Macs I hate. It's Digg users. -
Re:Build a Backblaze Storage Pod. by mysidia · 2009-09-17 10:55 · Score: 1

A 512kbit BIOS difference, and how rigorously it is tested, how mature it is, and your ability to update that BIOS with field upgrades later, can be the difference between having all your data and having none of it.
Out of some 50 servers, I've only once or twice seen a RAM DIMM go bad in 5 years, and never had a RAID card fail. The place is well-cooled. Hard drives, fans, power supplies, and batteries need replacement more often than any other component. There are approximately 300 hard drives in the environment; only about 30 of the drives in servers are 7200.11 drives, and yet in the past year 7 of the 7200.11 drives failed and TWO enterprise-class drives failed. 78% of the drive failures were desktop-class drives being used in servers.
If I had some mission critical data, with a server workload, I do believe I wouldn't want to stake it on 7200.11s. Maybe not all desktop drives fail so often, or maybe through some strange coincidence I got a bad batch, but that's pretty unlikely.
You think desktop hardware and server hardware is the same but they are really not.
Rotational speed is not the only difference. That is wishful thinking.
I can count on one hand the number of times i've seen a proper server PSU fail, among some 200 servers, all with dual PSU. If I wanted to count the number of times i've seen a Desktop PSU fail, even UPS-protected desktops, I would need a few dozen more hands.
Re:Build a Backblaze Storage Pod. by cbreaker · 2009-09-18 01:20 · Score: 1

You really misunderstood the point about the Macintosh. The difference between a PC and a Mac these days is a different BIOS, but obviously I'm not talking about a Hackintosh. Apple takes off-the-shelf components, customizes an EFI BIOS for them, slaps on an Apple sticker, and off they go. Sure, they sometimes have custom board designs made to fit their cases, but the components are no different.

If you've never had a RAID card fail then you're lucky, but 50 servers isn't exactly a good sample. I've worked in data centers with a thousand servers, and I've worked in a few of them. These days, with VMware stressing servers in a way that's probably never been done on commodity hardware, you see a lot more RAM and board failures. Well, the failure rate is probably the same, but these problems might never surface on your average 2%-utilization server.

My experience with drives hasn't mirrored yours. Maybe 7200.11s are bad-quality drives, or maybe you got a bad batch, but I've had just as many SCSI and Fibre Channel disks go bad as SATA disks.

In terms of PSUs, well, sure, many average desktops from Dell or whoever will probably use cheaper components. That doesn't mean all ATX power supplies are bad. The form factor doesn't mean anything, you know. There are some really excellent PSUs available, and Backblaze used two of them in their storage machine.

Re:Build a Backblaze Storage Pod. by jon3k · 2009-09-18 02:22 · Score: 1

Both of which are relative terms. To stick in my closet at my house, that's both fantastically reliable and exceptionally fast.

Need more information by belthize · 2009-09-16 14:51 · Score: 5, Insightful

A couple of details you'd need to fill in before people could give legitimate advice.

What's the rate of change of that 12TB? Is it mostly static or mostly dynamic? I would assume it's mostly write-once, read-rarely video, but maybe not.

Do you have a budget? As cheap as practical, or is there leeway for bells/whistles?

Is this just disaster recovery? You say if the station gets slagged you want a backup. How quickly do you want to restore: minutes, hours, next day?

Do you need historical dumps? Will anybody want data as it existed last month?

Is it just data you're dumping, or some Windows app complete with Windows registry junk that needs to be restored? (I don't know anything about Final Cut Pro.)

If you just want to dump data and a fast restore isn't critical (you just need to be able to do it in some time frame), then sure, rsyncing to some striped 6 (or 12) TB SATA array is plenty good.

Re:Need more information by Krondor · 2009-09-16 15:25 · Score: 5, Informative

The parent is absolutely right. We don't have enough details to really make a recommendation, but if the question is "Can rsync reliably replicate 12 TB with an average rate of churn over a 1 Gbps link?", the answer is an emphatic and resounding YES!
I used to maintain an rsync disaster recovery clone that backed up multiple NetWare, Linux, Unix, and Windows servers to a central repository of over 20 TB, primarily over 100 Mbps links. We found that our average rate of churn was 1% per day, which was easily handled. It was all scripted in Perl and would send a notification each night with job status or failures. Very easy to slap together, and rock solid for the limited scope we defined.
When you get into specifics like HA, DR turnaround times, preserving permissions, databases and in-use files, versioning, etc., things can get significantly more complicated.
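For anyone wanting to copy that kind of setup, the core of such a wrapper is small. Here's a minimal Python sketch (not the Perl scripts described above, which weren't posted): run one rsync per server, keep going when one fails, and build a status report you can mail out however you like. The plain `rsync -a` flags are an assumption to adjust for your own needs.

```python
import subprocess
from typing import Callable, Dict, List

Command = List[str]

def rsync_job(src: str, dest: str) -> Command:
    """Build a simple archive-mode rsync command (flags are an assumption)."""
    return ["rsync", "-a", src, dest]

def run_with_subprocess(cmd: Command) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode

def run_nightly(jobs: Dict[str, Command],
                runner: Callable[[Command], int] = run_with_subprocess) -> str:
    """Run every job even if earlier ones fail; return a plain-text report."""
    lines = []
    failures = 0
    for name, cmd in jobs.items():
        code = runner(cmd)
        if code == 0:
            status = "OK"
        else:
            failures += 1
            status = f"FAILED (exit {code})"
        lines.append(f"{name}: {status}")
    header = "all jobs OK" if failures == 0 else f"{failures} job(s) FAILED"
    return "\n".join([header] + lines)
```

Usage would be something like `print(run_nightly({"fileserver01": rsync_job("backup@fileserver01:/data/", "/dr/fileserver01/")}))`, where the host and paths are placeholders. Passing your own `runner` lets you dry-run or test the report without touching real servers.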
Re:Need more information by Anonymous Coward · 2009-09-16 15:42 · Score: 4, Insightful

> Is it just data you're dumping or some windows App complete with Windows registry junk that needs to be restored (don't know anything about Final cut pro)
If you think Windows registry junk could possibly be involved with Apple's pro video software, you are quite right, you don't know anything about it.
Re:Need more information by Anonymous Coward · 2009-09-16 16:26 · Score: 0

You also forgot: "In the cosmic scheme of things, does this data really matter?"

Typically, when one comes to a digital existential crisis, it pales in comparison to the real thing.
Re:Need more information by mcrbids · 2009-09-16 17:24 · Score: 2, Informative

I second that motion....
We do something similar with rsync, backing up about 6-8 TB of data. We have php scripts that manage it all and version the backups, keeping them as long as disk space allows. Heck, you can even have a copy of our scripts free of charge!
With these scripts, a cheap-o tower computer with a huge power supply, and mondo-cheap SATA drives, we manage to reliably back up a half-dozen busy servers off-site, off-network, to a different city over the Internet automagically every night.
Yes, more information is needed, blah blah blah. But it's definitely a feasible idea.
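The "keep versions as long as disk space allows" policy boils down to deleting the oldest snapshot until usage drops under a threshold. A minimal Python sketch of that idea (not the PHP scripts mentioned above), assuming one directory per snapshot, named so it sorts by date; the 90% threshold is an arbitrary example:

```python
import os
import shutil
from typing import Callable, List

def used_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def prune_oldest(backup_root: str, max_used: float = 0.90, keep_min: int = 1,
                 measure: Callable[[str], float] = used_fraction) -> List[str]:
    """Delete the oldest snapshot directories under backup_root until the disk is
    below max_used full, never dropping below keep_min snapshots.
    Assumes snapshot directory names sort chronologically (e.g. 2009-09-16)."""
    snapshots = sorted(d for d in os.listdir(backup_root)
                       if os.path.isdir(os.path.join(backup_root, d)))
    removed = []
    while len(snapshots) > keep_min and measure(backup_root) >= max_used:
        oldest = snapshots.pop(0)
        shutil.rmtree(os.path.join(backup_root, oldest))
        removed.append(oldest)
    return removed
```

If snapshots share files via hard links, deleting the oldest one may free much less than its apparent size, which is why the loop re-measures after every deletion instead of adding up directory sizes.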

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:Need more information by mysidia · 2009-09-16 17:34 · Score: 1

If it's static, you could spend a few hundred $$ on 100 Blu-ray recordable discs, burn the 12 TB in sets of 50, and vault them at an off-site location.
Get a small hosted account somewhere that you can continuously rsync the 30 GB or so between backup sets to.
When enough new video has been added to fill a BD-R, burn two copies of it, move the copies on the rsync server to a 'to be deleted' directory, vault the BD-Rs off site, and then delete the files off the rsync server.
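It's worth checking the disc count before ordering. Assuming standard BD-R capacities of 25 GB (single layer) and 50 GB (dual layer), decimal units, and no filesystem overhead, a quick Python check suggests 12 TB takes a good deal more than 100 discs:

```python
import math

def discs_needed(data_gb: float, disc_gb: float, copies: int = 1) -> int:
    """Discs required to hold data_gb, ignoring filesystem overhead."""
    return math.ceil(data_gb / disc_gb) * copies

for capacity_gb in (25, 50):
    one_set = discs_needed(12_000, capacity_gb)
    two_sets = discs_needed(12_000, capacity_gb, copies=2)
    print(f"{capacity_gb} GB discs: {one_set} per copy, {two_sets} for two copies")
```

That works out to 480 single-layer or 240 dual-layer discs per copy. The 30 GB incremental sets, on the other hand, fit comfortably on a single dual-layer disc.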
Re:Need more information by Anonymous Coward · 2009-09-17 09:33 · Score: 0

Ho hum, well, that says it all. Apple? No need to back it up; just do yourself a big favour and can the Apple crap in its entirety.
Never have, never will, never gonna like Apple. It's all crap!

backup solutions... by Anonymous Coward · 2009-09-16 14:51 · Score: 0

One of the most reliable backup solutions I have put in place for most of my clients is Acronis. It does a great job backing up across a network; just schedule it overnight, as it will take some bandwidth. I deal with EMS/911 servers, and backups are one of the most important things I recommend to anyone. My setup for one of my biggest clients is a dedicated server running Acronis with 1 TB of disk space, backing up 3 mid-size servers every night.

Amazon S3 by theNetImp · 2009-09-16 14:52 · Score: 1

Why build and maintain a server? Just push it to Amazon.

Re:Amazon S3 by Brian+Gordon · 2009-09-16 15:22 · Score: 3, Informative

Why do anything when you can pay someone else twice as much? 12TB from Amazon will be an order of magnitude more expensive than just running a storage server, and you have to pay for internet bandwidth instead of just running a wire.
Re:Amazon S3 by Iguanadon · 2009-09-16 15:27 · Score: 1

Why build and maintain a server, just push it to amazon.
12000GB * $0.15/GB = $1800 a month. That's $21,600 a year. S3 is great for some things, terabytes upon terabytes of archival storage isn't it. That's not including the time (or bandwidth cost) it will take to upload all the data to the server...
Re:Amazon S3 by mlts · 2009-09-16 16:59 · Score: 1

I almost see a market niche for an archival cloud provider with a lot less of a price premium than Amazon's cloud. The caveat would be that the customer is not getting instant access to data that is archived off. The archives are restorable in nowhere near real time (as the server has to retrieve the customer's info from tape and copy it to a drive array for transfer back).
This would be similar to Mozy or Carbonite, but the data would persist indefinitely once copied, instead of having old versions of it evaporate in a number of days.
This way, someone can pay a cloud provider "X" amount to store some data permanently, copy it up, and essentially forget about it until it's needed for an audit. Of course, pricing will need to be done right, because there wouldn't be a constant income stream coming in. Perhaps a yearly maintenance fee would cover this. Pricing for volume of data would have to be included too, so people don't just shove the contents of every single computer up every several weeks for a constant price.
Re:Amazon S3 by poopdeville · 2009-09-17 00:57 · Score: 1

I remember robotic tape archivers in the 90s... All they need is a fat pipe.

--
After all, I am strangely colored.
Re:Amazon S3 by Anonymous Coward · 2009-09-17 02:10 · Score: 0

"running a wire" in this case would be running a wire to some other building as this is in case of a fire or other disaster...
Not as simple as you make it out to be.
Re:Amazon S3 by dave562 · 2009-09-17 06:34 · Score: 1

Just plug in something like a Quantum ATL P3000 series and he's good to go. Of course it's completely overkill and completely overpriced for what needs to be done. On the other hand, there are definite geek points to be earned by having a robotic tape library.
Re:Amazon S3 by jon3k · 2009-09-18 02:28 · Score: 1

Unfortunately, as with most cloud services, healthcare, financial and legal have some serious regulatory roadblocks in the way. I know there isn't much data that I can back up from $work into $cloud without violating several laws.

FreeBSD/Linux + Rsync by mnslinky · 2009-09-16 14:52 · Score: 3, Insightful

That's all you need. We even use a script, with Perl as a wrapper, to create versioned backups going back six months.

Edit the paths to your liking. I've made the scripts available at http://www.secure-computing.net/rsync/ if you're interested. It requires that the system running the script have root SSH access to the boxes it's backing up. We use passwordless SSH keys for authentication.

The README file has the line I use in my crontab. I didn't write the script, but I've made a few modifications to it over the years.
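I haven't read through those scripts, so this isn't necessarily how they do it, but the usual trick behind versioned rsync backups (rsnapshot uses the same idea) is rsync's `--link-dest` option: each night goes into a new dated directory, and files unchanged since the previous snapshot are hard-linked to it instead of copied. A minimal Python sketch that builds that command, assuming dated snapshot directories under an absolute backup root:

```python
import datetime
import os
from typing import Iterable, List

def snapshot_cmd(source: str, backup_root: str,
                 existing: Iterable[str], today: datetime.date) -> List[str]:
    """rsync command for a dated snapshot of `source` under `backup_root`.
    Unchanged files are hard-linked to the newest earlier snapshot via
    --link-dest, so every snapshot looks complete but only changed files
    take new space. backup_root should be absolute: a relative --link-dest
    path is resolved relative to the destination directory."""
    name = today.isoformat()
    older = sorted(s for s in existing if s < name)
    cmd = ["rsync", "-a"]
    if older:
        cmd.append("--link-dest=" + os.path.join(backup_root, older[-1]))
    cmd += [source, os.path.join(backup_root, name) + "/"]
    return cmd
```

From cron you'd run the result with `subprocess.run(..., check=True)`, passing `os.listdir()` of the backup root as `existing`. The first run has nothing to link against, so it is a full copy.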

Re:FreeBSD/Linux + Rsync by moosesocks · 2009-09-16 14:57 · Score: 4, Informative

Actually, I'd suggest using OpenSolaris so that you can take advantage of ZFS. Managing large filesystems and pools of disks is *stupidly* easy with ZFS.
You could also do it with Linux, but that would require you to use FUSE, which has a considerable performance penalty. I'm not sure about the state of ZFS on FreeBSD, although I imagine that the Solaris implementation is going to be the most stable and complete. (For what it's worth, I've been doing backups via ZFS/FUSE on Ubuntu for about a year without any major problems)

--
-- If you try to fail and succeed, which have you done? - Uli's moose
Re:FreeBSD/Linux + Rsync by Anonymous Coward · 2009-09-16 15:33 · Score: 2, Interesting

I'm not sure about the state of ZFS on FreeBSD, although I imagine that the Solaris implementation is going to be the most stable and complete.
The FreeBSD port of ZFS actually works pretty damn nicely. I'm using a RAID Z configuration on my FreeBSD 7.2 server and it works great!
Re:FreeBSD/Linux + Rsync by Anonymous Coward · 2009-09-16 16:22 · Score: 1, Informative

I just started using Nexenta -- it's a Debian userland on top of an OpenSolaris kernel.
So far, it works very well -- the advantages of Debian's packaging system with advantages of OpenSolaris (e.g. ZFS)

It's a good stepping stone for those used to Linux (you just have to relearn some system command usage, system config, service management, etc.).

http://www.nexenta.org
Re:FreeBSD/Linux + Rsync by fnj · 2009-09-16 16:42 · Score: 1

We even use a script to create versioned backups going back six months using perl as a wrapper.
Kudos for publishing the code! Can you comment on your script vs rsnapshot, which is an established incremental rsync based solution which also uses hard links to factor out unchanging files? Rsnapshot is also a perl script, by the way.
Re:FreeBSD/Linux + Rsync by Anonymous Coward · 2009-09-16 22:39 · Score: 0

You are confusing private and business needs. Two pairs of shoes.
Re:FreeBSD/Linux + Rsync by operator_error · 2009-09-16 22:42 · Score: 1

Well, I can see that mnslinky's script can be used for offline backups, but I don't think rsnapshot does this (though I haven't studied either in depth).
My host uses rsnapshot, and it is very convenient to use.
Re:FreeBSD/Linux + Rsync by mnslinky · 2009-09-16 23:43 · Score: 1

I have never used rsnapshot. A friend of the last admin where I work originally wrote the rsync-backup script. The former admin and I wrote the 'backup' script as a wrapper for it, to produce a report, etc. The script here also uses hard links to factor out files that do not change, but only at the host level (it won't notice that two files are the same across hosts and back up only one of them).
We have explored switching to BackupPC for easier file restores, but we need to restore anything so rarely that it's not an issue. As a side note, we're using these scripts to back up nearly 3.6TB of data for the company, and I use them for about 500GB of my own.
Last, something I didn't mention above is that the scripts support creating an off-site copy. As currently implemented, we rotate USB drives weekly for the offline copy. These copies do not have the versioned data, only last night's complete copy, which is (just) under 1TB. It would be fairly trivial to add code to send it to a remote data center, your largest limiting factor being bandwidth.
Re:FreeBSD/Linux + Rsync by Anonymous Coward · 2009-09-17 01:13 · Score: 0

I'd agree, check into other unices for this.
If the servers are relatively low power (and low speed) and have UPS backup, you might consider FreeBSD for resource consumption and filesystem performance.
Linux is great for the desktop, but for server level functions, it appears to be lagging behind.
Re:FreeBSD/Linux + Rsync by Anonymous Coward · 2009-09-17 04:42 · Score: 0

That's all you need
Well, you might want to test that you can restore stuff as well.

Check with the university by darkjedi521 · 2009-09-16 14:55 · Score: 5, Insightful

Does your university have a backup solution you can make use of? The one I work at lets researchers onto its Tivoli system for the cost of the tapes. I think I've got somewhere in the neighborhood of 100TB on the system, and I ended up being the driving force behind a migration from LTO-2 to LTO-4 this summer. If you are going to roll your own and use disks, I'd recommend something with ZFS: you can make a snapshot after every backup so you can do point-in-time restores.
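The backup-then-snapshot pattern is only a couple of commands. A minimal Python sketch, assuming a ZFS dataset named `tank/backup` (a placeholder) holds the backup copy and that the rsync command is built elsewhere:

```python
import datetime
import subprocess
from typing import Callable, List

def zfs_snapshot_cmd(dataset: str, when: datetime.datetime) -> List[str]:
    """`zfs snapshot` command naming the snapshot after the backup time."""
    return ["zfs", "snapshot", f"{dataset}@backup-{when:%Y%m%d-%H%M}"]

def backup_then_snapshot(rsync_cmd: List[str], dataset: str,
                         run: Callable = subprocess.run) -> None:
    """Run the backup, then snapshot the dataset. With check=True,
    subprocess.run raises CalledProcessError if rsync fails, so a failed
    backup never gets a snapshot of its own."""
    run(rsync_cmd, check=True)
    run(zfs_snapshot_cmd(dataset, datetime.datetime.now()), check=True)
```

Point-in-time restores then come from the read-only copies under the dataset's `.zfs/snapshot/` directory, and `zfs list -t snapshot` shows what you have.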

Also, I'd recommend more capacity on the backup side than you have now, to allow versioning. I was recently the admin for a university film production (currently off at, I believe, Technicolor being put to IMAX), and I've lost track of the number of times I had to dig yesterday's or last week's version off tape because someone made a mistake that was uncorrectable.

Re:Check with the university by Cato · 2009-09-16 20:04 · Score: 1

From what I've seen of ZFS, it really needs x64 hardware with plenty of RAM (2GB plus), and Solaris has by far the best implementation. FreeBSD 7.x is next, followed by Linux's ZFS/FUSE. All IMHO, but from the reports I've seen it's a bit early to trust it on a non-Solaris platform, and even on Solaris there are some bugs. (That said, there are production users on FreeBSD who are happy with it.)
LVM on Linux lets you do snapshots, but after losing thousands of files and several LVM logical volumes, including the backup filesystem on a separate disk (probably due to write caching being enabled on the hard disks themselves, as is the default), I'm trying to stay away from it.
For a backup server, if Solaris is not an option, I would use ext3 or possibly XFS (for faster deletes) and disable write caching on all drives (hdparm -W0 /dev/sdX in /etc/rc.local). You will lose some performance, but the risk of losing large amounts of data due to a power outage or system crash is far lower.
The nice thing about ZFS is the block level checksumming which can detect disk and memory problems, and better snapshots than LVM. However, using ext3 with DAR or rsnapshot, plus par2 (Reed-Solomon checksums for error recovery), gives you some of the same error recovery and a more proven underlying filesystem.

Just build a clone by pla · 2009-09-16 15:03 · Score: 3, Insightful

What solution would you use?

First of all, I love linux. Use it for my own file servers, and media machines, and routers, and pretty much everything except desktops.

That said...

For your task, I would probably just build an exact duplicate of the "real" machine and sync them nightly. Always keep in mind that if you have no way to quickly recover from a disaster, you don't actually have a backup.

That said, and if possible, I would also build the "backup" machine with more storage than the "real" machine. As someone else pointed out, you'll probably discover within a few days that your food-chain-superiors have no concept of "redundancy" vs "backup" vs "I can arbitrarily roll my files back to any second in the past 28 years". Having at least nightly snapshotting, unless your entire dataset changes rapidly, won't eat much extra disk space but will make you sleep ever so much better.
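Nightly snapshots also need a retention rule, or the backup box eventually fills up. One common scheme is grandfather-father-son: keep the last few dailies, one per week for a while, and one per month beyond that. A minimal Python sketch of deciding which snapshots to keep; the 7/4/6 counts are arbitrary examples, not a recommendation:

```python
import datetime
from typing import Iterable, Set

def snapshots_to_keep(dates: Iterable[datetime.date], daily: int = 7,
                      weekly: int = 4, monthly: int = 6) -> Set[datetime.date]:
    """Grandfather-father-son retention: keep the newest `daily` snapshots, the
    newest snapshot in each of the `weekly` most recent ISO weeks, and the newest
    snapshot in each of the `monthly` most recent calendar months."""
    newest_first = sorted(set(dates), reverse=True)
    keep = set(newest_first[:daily])
    weeks, months = set(), set()
    for d in newest_first:
        week = tuple(d.isocalendar())[:2]   # (ISO year, ISO week number)
        if week not in weeks and len(weeks) < weekly:
            weeks.add(week)
            keep.add(d)                     # first hit is the newest in that week
        month = (d.year, d.month)
        if month not in months and len(months) < monthly:
            months.add(month)
            keep.add(d)                     # first hit is the newest in that month
    return keep
```

Anything not in the returned set is a candidate for deletion, so 120 nightly snapshots shrink to about a dozen while still reaching back four months.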

Re:Just build a clone by nine-times · 2009-09-16 15:29 · Score: 0, Flamebait

I'm glad someone brought up the difference between redundancy/failover and backup. If you really care about your data, mirroring to another server isn't a much better backup solution than using RAID with mirroring. It protects you against hardware damage, but not necessarily against data corruption or data loss. If you're going to rsync to another server, you should look into having it keep backups instead of overwriting when something changes.
As you mentioned, nightly snapshots are a great way of handling it, too. However, I still like the idea of writing to tape periodically. Writing to tape provides a real backup instead of just a mirror, it's easy to send them offsite, in some ways they're less fragile than hard drives and supposedly easier to recover if they do break. Also, depending on how much is changing how often, rsyncing might use up a crap-ton of bandwidth (though apparently that's not as big a deal in the submitter's case, since he has GigE).
On the other hand, your suggestion of building a duplicate of the "real" machine has a benefit that other backup solutions don't: in the case of a real disaster, you not only have your data saved on another machine, but you can use the backup while you recover the original server. That said, I don't know how much to trust a Drobo device, so in this particular case I might suggest getting some heavier equipment. (Or are Drobos actually good? I've never used one.)
Of course, there's another issue that I haven't seen anyone bring up, which is: is all of this data vital? You have 12 TB of storage, but is all of that completely irreplaceable, necessary data? Are some of those temporary files, scratch files, working files, or cache? Is any of it just dumb crap that you don't care about? It may seem like a dumb question, but in every company I've worked for, if you give the employees free access to any amount of space, they'll fill it up. They'll have 20 copies of the same file, and someone will have put their 100GB MP3 collection on the server if you don't keep them from doing it. If you can organize the files and sort the necessary files from the crap, you might be able to cut down on the amount you need to back up.
Then again, storage is so cheap, maybe you don't care.
Re:Just build a clone by SheeEttin · 2009-09-16 15:38 · Score: 1

For your task, I would probably just build an exact duplicate of the "real" machine and sync them nightly. Always keep in mind that if you have no way to quickly recover from a disaster, you don't actually have a backup.
Of course, the only problem with that is if you have a hardware failure on-site, the backup, being built of the same thing, is probably going to fail about the same time.
Re:Just build a clone by camperdave · 2009-09-16 16:08 · Score: 1

First of all, I love linux. Use it for my own file servers, and media machines, and routers, and pretty much everything except desktops.

Why wouldn't you use it for your desktops?

--
When our name is on the back of your car, we're behind you all the way!
Re:Just build a clone by Anonymous Coward · 2009-09-16 18:07 · Score: 0

In all of that, you forgot that he was looking for an *offsite* backup. The point of this backup server is in case something happens to the local copies.
Re:Just build a clone by petrus4 · 2009-09-16 18:22 · Score: 2, Insightful

Why wouldn't you use it for your desktops?
Linux still doesn't have the "interface complexity vs implementation complexity" problem completely balanced on the desktop just yet, although, to be fair, neither does anyone else. (Except maybe Apple, and that's a maybe.)
Ubuntu can make a very pretty looking desktop, but updates will often hose the entire system, and in my experience, it can also crash if you give it a hard look.
On the other hand, you can use LFS, Slack, or Arch to make yourself something extremely hardware efficient and robust...but that isn't also going to please anyone who wants the eye candy.
Re:Just build a clone by atarashi · 2009-09-16 20:20 · Score: 2, Insightful

Well, first you would need to define goals.
What do I want to backup? (only Data, or OS + Apps + Data)
Is my Data rather static or does it change a lot?
How fast does it change?
Do I have enough bandwidth to cope with the backup? (12TB is a lot! It would take more than a day to copy it over a GBit link... so, how much of the data changes over a day?)
Do I need daily backups? Or even hourly?
How fast do I need to restore everything?
Do I need different versions? (Then the needed storage might be much higher than 12TB, ouch.)
Who needs to restore files? (Only the admins, or the users themselves?)
So it all boils down to: how much money do I have ;-)
Brgds
Michael
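The "more than a day" figure checks out. A quick Python sketch, assuming decimal terabytes and a link running flat out; real rsync over a real network will do worse, so the efficiency factor is a guess you'd tune:

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb terabytes (10**12 bytes) over a link_gbps link
    running at `efficiency` of line rate."""
    bits = data_tb * 10**12 * 8
    return bits / (link_gbps * 10**9 * efficiency) / 3600

print(f"{transfer_hours(12, 1):.1f} h")                  # 26.7 h at full line rate
print(f"{transfer_hours(12, 1, efficiency=0.7):.1f} h")  # 38.1 h at 70%
```

Only the initial seed is that slow; after that, only the changes have to cross the wire, which is why "how fast does it change" matters so much.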
Re:Just build a clone by Linker3000 · 2009-09-16 22:34 · Score: 1

Good point - that's why all the disks in a RAID array should come from different manufacturers or at least different batches/manufacturing plants + your 'spare' server should be a different brand or made from different components.
In the mid 90's I was working for a training company in London and they hosted all their training data, courseware, disk images etc. on a big RAID 5 array with 5 disks. One day, the tech guy arrived at work to discover the drive bearings had seized on 3 disks!

--
AT&ROFLMAO
Re:Just build a clone by machine321 · 2009-09-16 23:07 · Score: 1

The year of the Linux desktop was 1999, silly.

rdiff-backup: like rsync with versioning by Z8 · 2009-09-16 15:05 · Score: 5, Interesting

You may want to check out rdiff-backup also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states. However, because it keeps these diffs in addition to the mirror, it's better if you have more space on the backup side.
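The reverse-diff idea is easy to see in miniature: keep the newest version whole, and store only what you'd need to rebuild the older one from it. This is a toy Python sketch using difflib from the standard library, not rdiff-backup's actual format (it works on binary files with librsync deltas); it just shows why the current mirror stays a plain copy while history costs only the differences:

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

Seq = Union[str, bytes]

def make_reverse_delta(old: Seq, new: Seq) -> List[Tuple]:
    """Instructions that rebuild `old` from `new`."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", j1, j2))      # reuse a slice of the new version
        elif tag in ("replace", "delete"):
            delta.append(("data", old[i1:i2]))  # content only the old version had
        # "insert": content only in the new version, nothing to store
    return delta

def apply_reverse_delta(new: Seq, delta: List[Tuple]) -> Seq:
    """Rebuild the old version from the new one plus its reverse delta."""
    pieces = [new[op[1]:op[2]] if op[0] == "copy" else op[1] for op in delta]
    return new[:0].join(pieces)   # "" or b"" depending on the input type
```

Chaining these deltas newest-to-oldest is what lets you step back through previous states without keeping full copies of each.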

There are a few different front ends/GUIs for it, but I don't have experience with them.

An oldie but a goodie by Anonymous Coward · 2009-09-16 15:06 · Score: 0

Did you check into CDs?

Why a backup server? by cyberjock1980 · 2009-09-16 15:08 · Score: 2, Interesting

Why not a complete duplicate of all of the hardware? If the studio combusts, you have an exact copy of everything, hardware and all. If you use any kind of disk imaging software, you can simply recover to the server with the latest image and lose very little data.

lose the drobo by AnonymouseClown · 2009-09-16 15:09 · Score: 2, Informative

I recommend losing the Drobo as fast as you can. I know 4 people who bought these, and all 4 lost data in the first year.

Re:lose the drobo by Anonymous Coward · 2009-09-16 15:46 · Score: 0

Correct. The Netgear ReadyNAS series is very good.
Re:lose the drobo by mlts · 2009-09-16 17:06 · Score: 2, Interesting

I have not heard of any catastrophic data losses firsthand, but I don't like my data stored in a vendor specific format I couldn't dig out by plugging the component drives into another machine.
If you are a homebrew type, you might consider your favorite OS of choice [1] that can do software RAID, building yourself a generic server level PC, and use that for your backups. This way, when you need more drives, you can go to external SATA frames.
[1]: Almost all UNIX variants support RAID 5, Linux supports RAID 6 (two drives as parity), and of course, some BSDs and Solaris support ZFS for RAID-Z. Windows Server 2000, Windows Server 2003, Windows Server 2008, and Windows Server 2008R2 support RAID 5.
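For anyone wondering how RAID 5 survives a dead drive: the parity block is just the byte-wise XOR of the data blocks in a stripe, so any one missing block can be rebuilt by XORing everything that's left. A toy Python sketch (real implementations rotate parity across the disks, and RAID 6's second parity block uses Reed-Solomon-style math that isn't shown here):

```python
from functools import reduce
from typing import List

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripe = [b"ABCD", b"EFGH", b"IJKL"]   # one stripe across three data disks
parity = xor_blocks(stripe)            # written to a fourth disk

# Disk 1 dies: XOR the surviving data blocks with the parity block.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
print(rebuilt)  # b'EFGH'
```

This is also why a second failure during a RAID 5 rebuild is fatal: with two blocks missing from a stripe, one XOR equation can't recover both.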

Automatic internet backup by Len · 2009-09-16 15:10 · Score: 4, Funny

Everything your TV station broadcasts will automatically be backed up here.

Re:Automatic internet backup by Kjella · 2009-09-16 17:10 · Score: 5, Funny

[3 months later]:
<admin@uni> OMG we lost the server... 0 seeds!? somebody seed plz!

--
Live today, because you never know what tomorrow brings
Re:Automatic internet backup by Anonymous Coward · 2009-09-16 17:42 · Score: 0

haha, good call !!
Re:Automatic internet backup by P1erce · 2009-09-16 22:28 · Score: 1

Only the best bits will be backed up..
Re:Automatic internet backup by BeardedChimp · 2009-09-17 01:02 · Score: 1

That's what happens when you don't name your torrent "Worlds largest xxx porn collection(BritneySpears)Lohan-dogs-horses teen.avi.torrent"

BackupPC by dissy · 2009-09-16 15:15 · Score: 3, Informative

What I use is BackupPC. It's a very nice web front end to tar over ssh.

For Linux, all the remote servers need is sshd listening somewhere, with the BackupPC server's public key in an authorized_keys file. It will pipe tar streams over an SSH connection.
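Under the hood, "tar over ssh" just means running tar on the remote box with its archive going to stdout, and capturing that stream locally. A minimal Python sketch of the idea (not BackupPC's own code or config; the host and paths in the usage note are placeholders):

```python
import shlex
import subprocess
from typing import List

def tar_over_ssh_cmd(host: str, path: str) -> List[str]:
    """ssh command that streams a tar archive of `path` on `host` to stdout.
    The remote shell parses the last argument, so the path is quoted."""
    return ["ssh", host, "tar -C / -cf - " + shlex.quote(path.lstrip("/"))]

def pull_archive(host: str, path: str, out_file: str) -> None:
    """Save the streamed archive locally; raises if ssh or tar fails."""
    with open(out_file, "wb") as out:
        subprocess.run(tar_over_ssh_cmd(host, path), stdout=out, check=True)
```

For example, `pull_archive("backup@web01", "/var/www", "web01-www.tar")`. Using `-C /` with a relative path keeps member names relative, so extracting with `tar -C / -xf` on the original box puts files back where they came from.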

For Windows, it can use Samba to back up over SMB.

I run a copy on my home file server, which backs up all the machines in the house, plus the couple of servers I have out in colo.

When it performs an incremental backup, it then populates its timestamped folder with hard links to the last full backup for unchanged files, so restoring from any incremental will still get the full version, no matter when it was last backed up.

Also, after each backup, it hashes every file and compares it against the previous backup. If the files match, it deletes the second copy and hard links it to the first copy of the file.
I have nearly 3 months' worth of backup retention, with backups every 3 days (every day on a couple of hosts), but for the base system and files that rarely change, each 'copy' doesn't take up its full size in disk space.
It is very good at saving disk space.
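The pooling trick is roughly this: name each stored file by a hash of its contents, and hard-link every identical copy to that one pool entry. A minimal Python sketch of the idea; BackupPC's real pool uses its own digest and naming scheme, and SHA-256 here is my own choice. The pool directory has to live on the same filesystem as the backups (hard links can't cross filesystems) and outside the tree being pooled:

```python
import hashlib
import os

def file_digest(path: str, chunk: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large video files don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def pool_tree(root: str, pool_dir: str) -> int:
    """Hard-link every file under root into a content-addressed pool.
    Returns how many files were replaced by a link to an existing pool entry."""
    os.makedirs(pool_dir, exist_ok=True)
    deduped = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            pooled = os.path.join(pool_dir, file_digest(path))
            if not os.path.exists(pooled):
                os.link(path, pooled)            # first copy seeds the pool
            elif not os.path.samefile(path, pooled):
                tmp = path + ".pooltmp"
                os.link(pooled, tmp)             # link to the pooled copy...
                os.replace(tmp, path)            # ...then swap it in atomically
                deduped += 1
    return deduped
```

This also explains the downside mentioned below: a tool that doesn't preserve hard links will copy every pooled file separately.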

Here are some stats from its main page as an example:

There are 7 hosts that have been backed up, for a total of:
* 26 full backups of total size 38.34GB (prior to pooling and compression),
* 43 incr backups of total size 0.63GB (prior to pooling and compression).

Pool is 10.11GB comprising 108499 files and 4369 directories (as of 9/16 01:00),

Restoring gives you a file browser with checkboxes. After you tell it what you want, it can send you a tar(.gz) or .zip file, OR it can directly restore the file via tar over ssh back to the machine it was on, by default in the original location, though that can easily be changed too.

The main downside is the learning curve. But once you get things down, you end up just copying other systems as templates, updating the host/port/keyfile/etc settings.
Also, with all those hard links, it makes it a pain to do any file/folder manipulation on its data dir.
Most programs won't recognize the hard link and just copy the file, easily taking up the full amount of storage.

But it works just as well with only itself and one remote server.
Schedule it to start at night and stop in the morning, set your frequency and how much space to use before it deletes old backups, and let it run.

Re:BackupPC by Anonymous Coward · 2009-09-16 15:52 · Score: 0

I agree: BackupPC is one of the better front ends for managing rsync on Linux.
Use it, Live it, Love it.
Re:BackupPC by IceCreamGuy · 2009-09-16 16:35 · Score: 3, Informative

I couldn't agree more; BackupPC is really great. Not only does it support Tar over SSH and SMB, but it also supports rsync over SSH, rsyncd and now, in the new beta, FTP. I back up everything to a NAS and then rsync that every weekend to another DR disk (you have to be careful about hardlinks when copying the pool, since it uses them in the de-duplication process). There are several variants of scripts available on the wiki and other sites for initiating shadow copies on Windows boxes, and with a little tinkering you can even get that working on Server 2008, though of course it really shines with *nix boxes. Highly recommended - the only drawbacks are that, as the parent mentioned, the learning curve can be intimidating at first, and the project has been pretty quiet the past few years since the original developer stopped working on it. Amanda (the MySQL backup company) seems to have picked it back up and they are the ones who released the most recent beta. Did I mention it has a really convenient web interface, emails about problems, auto-retries failed backups (while it's not in a blackout period), and somebody wrote a great Nagios plugin for it? I'm pretty sure I did, oh yes definitely.
Re:BackupPC by Anonymous Coward · 2009-09-16 16:39 · Score: 1

+1 for BackupPC.
The learning curve to set it up really isn't that bad at all, particularly for Linux clients; it'll take a couple of hours if you follow the examples. For Windows machines, try the cygwin-rsyncd available on BackupPC's SF site... much better and faster than the Samba solution, it preserves rights and uses rsync magic to only move changed data.
Re:BackupPC by JayAEU · 2009-09-16 17:25 · Score: 1

Very true indeed, BackupPC really is a one-stop solution for doing sensible backups of any number of hosts (local or remote) over a long time. The learning curve isn't as steep anymore, since they introduced a more capable web interface.
I also have yet to see another program that does what BackupPC does any faster.
Re:BackupPC by j_sp_r · 2009-09-16 20:58 · Score: 1

I switched away from BackupPC because the archive size started exploding. Lots of large files getting changed a little was the main reason, I think. Also, copying the backup pool takes ages.
Re:BackupPC by miffo.swe · 2009-09-16 21:22 · Score: 2, Interesting

I love BackupPC more today than ever. I had a run with some of the more often used commercial offerings and the grass is NOT greener on the other side. Despite fancy wizards and support BackupPC beats any one of them anytime.
I backup about 230 GB of user data each night and still the pool is only 241 GB after many months of use.
"There are 6 hosts that have been backed up, for a total of:
* 51 full backups of total size 1895.95GB (prior to pooling and compression),
* 36 incr backups of total size 62.33GB (prior to pooling and compression). "
The pooling works really well and saves oodles of space. Best of all, it's very easy to set up and restore files through a web GUI, and it demands no tinkering at all once it's installed. I don't think the learning curve is worse than for anything else. Even if you can easily install a commercial system with its defaults, it takes a lot of learning, tinkering and work before you can let it go.

--
HTTP/1.1 400
Re:BackupPC by NumericalDriftwood · 2009-09-18 09:13 · Score: 1

I concur with the ease of use for BackupPC. For my home backup system, ease of setup and administration was the primary criterion. I am running a mixed environment, and as long as your directories are not too deeply nested on Windows (there is a pathname length limitation on cygwin's rsync that I ran into), it is fire and forget.
However, I had to do a full restore of my laptop. It was not pretty. I was only restoring 20 GB and the connection would time out. I ended up having to restore single subtrees at a time. It is quite possible that the lag was because my backup server is VERY underpowered (64 MB PII 233 running a command-line only stripped-down Ubuntu Hardy), but you should do a test restore before you even think about depending on this or any other backup system.

Don't use rsync - at least, not vanilla by Jeremy+Visser · 2009-09-16 15:15 · Score: 4, Informative

Don't use rsync to make backups, because you don't just want to back up against spontaneous combustion: inevitably, there will be accidental deletions and the like occurring in your studio. If you use rsync (with --delete, as any sane person would, otherwise your backup server will fill up in days, not years), then when some n00b runs `rm -rf ~/ReallyImportantVideos`, they'll be deleted from the backup too.

Remember that pro photography website that went down because their "backup" was a mirroring RAID setup? Yep. They lost all their data in one fell swoop when somebody accidentally deleted the whole lot. Don't make the same mistake.

Use an incremental backup tool. Three that come to mind are rdiff-backup, Dirvish, and BackupPC.

I would think that rdiff-backup would suit your needs best. I currently use BackupPC at home, which is great for home backups, but I think that it's overkill (and possibly a bit limited) for what you want.

Hope this helps!

Re:Don't use rsync - at least, not vanilla by Jeremy+Visser · 2009-09-16 15:17 · Score: 1

Oh dear... when will Slashdot learn to escape stuff as UTF-8? In PHP, it's easy: htmlentities($unsafe, ENT_COMPAT, 'utf-8') will do the trick. Not sure what Perl needs.
Re:Don't use rsync - at least, not vanilla by pla · 2009-09-16 15:26 · Score: 3, Informative

Don't use rsync to make backups. Because you don't just want to backup against spontaneous combustion - inevitably, there will be accidental deletions and the like occurring in your studio.

rsync actually includes an option to make hardlinked snapshots as part of the syncing process, nowadays.

Personally, I don't trust it and always do that part manually, then let rsync do what it does best... But yeah, even "vanilla" rsync contains exactly the functionality you mention.
Re:Don't use rsync - at least, not vanilla by adolf · 2009-09-16 18:45 · Score: 2, Informative

*nod*, at least for various definitions of "manually."
I have a script which makes a hard-linked clone of the latest backup, and then rsyncs to that (with some manner of special commandline switch which is made for this scenario and that I can't be bothered to look up right now). It's easy, and it lets me have layered backups not totally unlike (though nowhere near as slick as) Netapp's snapshots.
I have done bare-metal restores of Linux boxen from backups made like this. Works just fine, with an iota of bootstrap knowledge.

--
Kid-proof tablet..
Re:Don't use rsync - at least, not vanilla by MrNemesis · 2009-09-16 23:20 · Score: 3, Informative

rsync makes it pretty easy to implement a bargain-basement backup system if you're willing to do a bit of hacking around with scripts and soft/hard links. Make your backups into e.g. /backups/2009/09/17/* and update the symlink /backups/latest to point to that dir; when the next backup comes along, use --link-dest=/backups/2009/09/17/ to hardlink all files that have stayed the same, but copy over the newer versions into your new backup dir. This way you get a) the absolute minimum space taken up without resorting to snapshots and b) an easy way of looking at and restoring individual files or the whole tree from a given date/time. For bonus points, set up a vacuum script that automagically deletes the oldest backups whenever your backup partition gets to 90% full or whatever. Run your set of scripts every hour or so (but don't forget to include lock files/semaphores so you don't end up running nine instances of the script simultaneously).
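A minimal Python sketch of the dated-snapshot scheme above. It only builds the rsync command line; the source path, the backup root, and adding an hour/minute level to the directory name (so hourly runs don't collide) are assumptions for illustration. The -a, --delete, and --link-dest options are standard rsync flags; the rsync manual notes that a relative --link-dest is interpreted relative to the destination directory, hence the absolute path.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def snapshot_command(source: str, root: str, previous: Optional[str], now: datetime):
    """Build the rsync argv for a new dated snapshot under root.

    Files unchanged since `previous` are hard-linked to it via --link-dest
    instead of being copied again. Run it with subprocess.run(argv, check=True).
    """
    # Assumption: an hour/minute level so hourly runs get separate directories.
    dest = Path(root) / now.strftime("%Y/%m/%d/%H%M")
    argv = ["rsync", "-a", "--delete"]
    if previous is not None:
        # A relative --link-dest would be resolved against the destination dir.
        argv.append(f"--link-dest={os.path.abspath(previous)}")
    argv += [source.rstrip("/") + "/", f"{dest}/"]
    return argv, dest


def update_latest(root: str, dest: Path) -> None:
    """Repoint root/latest at dest; rename() makes the swap atomic on POSIX."""
    tmp = Path(root) / "latest.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(dest)
    os.replace(tmp, Path(root) / "latest")


def acquire_lock(path: str) -> None:
    """Create the lock file exclusively; raises FileExistsError if a run is active.

    Stale locks left behind by a crashed run are not handled in this sketch.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)


argv, dest = snapshot_command("/srv/video", "/backups",
                              "/backups/2009/09/16/2300", datetime(2009, 9, 17, 0, 0))
print(" ".join(argv))
# rsync -a --delete --link-dest=/backups/2009/09/16/2300 /srv/video/ /backups/2009/09/17/0000/
```

A wrapper script would call acquire_lock, run the command, call update_latest on success, and remove the lock file at the end.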
As far as syncing large amounts of data, firstly use rsync 3 if you can - it's a hojillion times faster with large numbers of files and much easier on your memory. If you're going over the internet, tunnel through SSH using inline compression (if your data is easily compressible that is) - heck, tunnel through SSH on your private network, rsync makes it ridiculously easy. Using this technique I managed to keep a mirror of a 2TB file server over a 2Mbps SDSL link no more than an hour or two out of date.
That's how I remember it working anyway - don't have a box I can try it out on here, but in all honesty rsync and a bit of bash/python/whatever is capable of reproducing all sorts of "enterprisey" backup features for zero cost and almost zero effort (and, I'll almost certainly say, zero approval from your boss). IMHO it's one of the killer apps of UNIX.
Disclaimer: I am not an employee of Rsync Overlord Corp, just a satisfied customer ;)

--
Moderation Total: -1 Troll, +3 Goat
Re:Don't use rsync - at least, not vanilla by Anonymous Coward · 2009-09-17 01:25 · Score: 0

I've never tried this (I don't keep diffs around) but..
FreeBSD (UFS2) has "snapshots"; basically, you'd have a script rotate the snapshots before each rsync.
Then if you want a file X days old, you'd use that snapshot.
It'd be pretty easy to do. (I generally use snapshots just prior to doing something semi-dangerous.)
There are various how-to documents out there explaining how to use snapshots to take hot backups of a live FreeBSD machine (MySQL in particular); one would think the reverse would also work.
Ha! My captcha code is "commits", kind of appropriate. :-)
Re:Don't use rsync - at least, not vanilla by jaxom · 2009-09-17 06:06 · Score: 1

This is basically how Time Machine works. Don't forget that the poster is using Final Cut, which means that he is using Mac hardware. Even though there is a Drobo in the backend, depending on how that is connected into the environment, a Time Machine backup will do exactly what you've laid out here. For online replication over distance, just use rsync over ssh, making sure that you preserve all the hard links. Mac OS X already has done the heavy lifting working out what needs to be saved, so you can just copy the backup volume.
Recovery off that volume couldn't be easier, either. Just rebuild a server at the remote location, connect up the Time Machine remote backup, and copy the data off, knowing that you have the latest version of your environment from roughly two hours before it got hosed. In fact, the worst case has to take into account the lag between new data being written and its being copied over to the Time Machine volume (Time Machine runs every hour, and I'm assuming about an hour for moving large new video files over to the backup volume), plus however long it takes to copy the data across your pipe. My SWAG is that you'd be okay up to about two to three hours out of sync, although you really need to test the hell out of this setup to guarantee recovery.
Finally, you may want to consider making archival copies of your data and purging them from your main online repository. Once something has been broadcast, "freeze" it and dump it to a nearline or offline archive. For something more user-friendly, although more complex to set up, you should have something like everything broadcast in the past seven days "online", the past month "nearline", and the rest on tape archived somewhere. Of course, depending on the volume of data, the type of requests for older material, the frequency of requests, etc., you should change these values as appropriate.
This sounds like a fun project!
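The seven-days/one-month/tape split above boils down to a simple age rule. Here is a minimal Python sketch, assuming items are tracked by broadcast date and that "the past month" means 31 days; both the tier names and the thresholds are meant to be tuned, as the comment says.

```python
from datetime import date


def storage_tier(broadcast_day: date, today: date) -> str:
    """Pick a storage tier by days since broadcast.

    Thresholds follow the comment above; treating "the past month" as
    31 days is an assumption.
    """
    age_days = (today - broadcast_day).days
    if age_days <= 7:
        return "online"
    if age_days <= 31:
        return "nearline"
    return "tape"


today = date(2009, 9, 17)
for aired in (date(2009, 9, 15), date(2009, 9, 1), date(2009, 6, 1)):
    print(aired, storage_tier(aired, today))
```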

Solaris and ZFS by Anonymous Coward · 2009-09-16 15:17 · Score: 0

If you are willing to try something a little different, the ZFS filesystem is ideal for this.

while true:
    rsync the source to the ZFS filesystem
    snapshot the ZFS filesystem
    delete snapshots more than 1 week old

We've found that, for data that doesn't change often, you can use this mirroring technique to "backup" three or four TB in ten minutes.

You could also turn on ZFS on-the-fly compression, but it probably wouldn't help here, since your source data is likely to be already compressed.
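The "delete snapshots more than 1 week old" step is the only part of that loop needing real logic. A minimal Python sketch, assuming the loop names its snapshots like tank/backup@2009-09-16T1500 (the dataset name and timestamp format are hypothetical). The snapshot names could come from `zfs list -H -t snapshot -o name`, and each expired one would then be passed to `zfs destroy`.

```python
from datetime import datetime, timedelta

# Hypothetical naming convention used by the rsync/snapshot loop,
# e.g. tank/backup@2009-09-16T1500.
STAMP_FORMAT = "%Y-%m-%dT%H%M"


def expired_snapshots(names: list[str], now: datetime,
                      keep: timedelta = timedelta(days=7)) -> list[str]:
    """Return the snapshot names (dataset@stamp) older than `keep`."""
    expired = []
    for name in names:
        stamp = name.partition("@")[2]
        try:
            taken = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # not created by the loop (e.g. a manual snapshot); leave it
        if now - taken > keep:
            expired.append(name)
    return expired


names = ["tank/backup@2009-09-01T0000", "tank/backup@2009-09-15T1200",
         "tank/backup@before-upgrade"]
print(expired_snapshots(names, datetime(2009, 9, 16, 12, 0)))
# ['tank/backup@2009-09-01T0000']
```

Skipping names that don't parse keeps hand-made snapshots out of the rotation.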

Different Solutions by Anonymous Coward · 2009-09-16 15:19 · Score: 1, Informative

My company is developing a local backup and co-location data center, and I have been one of the major forces in deciding what software we go with. If you are looking for Linux-style freedom, as mentioned before, rsync is all you need. If you happen to be looking for something more professionally supported, there are many options, but I will tell you some of what I have seen. At significant cost, the primary system I run into is EVault, which works OK, is very stable, and doesn't have too many crazy features. Offsetting that is the horrible, and I mean horrible, cost. Acronis just (as in less than a month ago) came out with their new backup product, which they even give a free trial for. It does bare-metal restore among other things, and I was very impressed with it, but it didn't meet some of my requirements and I didn't get to play with it much more. On the cheaper, jankier side of things, I have tried NovaStor backup products with overall horrendous results; stay away from them completely. (Things like being able to export data directly to a removable drive for the first-time transfer are ridiculous!) I am very impressed with a completely off-the-wall solution called RBackup. At first it seems very "made in India", but it has tons of features that are easy to understand (being brandable is a big plus), and it can generally be set up quickly or very granularly. If you're using a Windows system, you should check it out. I have also looked at Symantec's products and other things, but these are a few of the major players in the "I want to remote backup my own data to my own servers" category (which excludes lots of stuff). Since I am still in the review process, I am also curious to see what other people say. I can also tell you that I have set up almost 4 Drobos now and they really rock, so you're doing well on that front!

Re:Different Solutions by mlts · 2009-09-16 16:54 · Score: 4, Interesting

Backups for UNIX, backups for Windows, and backups all across the board almost require different solutions.
For an enterprise "catch all" solution, I'd go with TSM, Backup Exec, or Networker. These programs can pretty much back up anything that has a CPU, although you will be paying for that privilege.
If I were in an AIX environment, I'd use sysback for local machine backups and backups to a remote server.
If I were in a general UNIX environment, I'd use bru (it used to be licensed with IRIX, and has been around so long, it works without issue with any UNIX variant.) Of course, there are other solutions that work just as well, both freeware, and commercial.
If I were in a solidly Windows environment, I'd use Retrospect, or Backup Exec. Both are good utilities and support synthetic full backups so you don't need to worry about a full/differential/incremental schedule.
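A synthetic full backup is assembled on the backup side by merging the last full with the incrementals taken since, so the client never has to send a fresh full. Here is a minimal Python sketch of that merge, assuming each backup is modeled as a path-to-content-id map and that incrementals record deletions as None; this is a conceptual model, not how Retrospect or Backup Exec store their media.

```python
from typing import Optional


def synthetic_full(full: dict[str, str],
                   incrementals: list[dict[str, Optional[str]]]) -> dict[str, str]:
    """Merge a full backup with later incrementals (oldest first) into a new full.

    Each backup maps path -> content id; an incremental maps a deleted path to None.
    """
    merged = dict(full)
    for inc in incrementals:
        for path, content in inc.items():
            if content is None:
                merged.pop(path, None)   # file was deleted after the full
            else:
                merged[path] = content   # new or changed file
    return merged


monday_full = {"promo.mov": "v1", "news.mov": "v1"}
week = [{"promo.mov": "v2"}, {"news.mov": None, "sports.mov": "v1"}]
print(synthetic_full(monday_full, week))  # {'promo.mov': 'v2', 'sports.mov': 'v1'}
```

Order matters: incrementals must be applied oldest first, or a stale version could overwrite a newer one.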
If I were in a completely mixed environment, I'd consider Retrospect (it can back up a few UNIX variants as well as Macs), Backup Exec, or an enterprise level utility that can back up virtually anything.
Please note, these are all commercial solutions. Bacula, Amanda, tar over ssh, rsync, and many others can work just as well, and likely will be a lot lighter on the pocketbook. However, for a business, some enterprise features like copying media sets, or backing up a database while it is online to tape or other media for offsite storage may be something to consider for maximum protection.
The key is figuring out what you need for restores. A backup system that is ideal for a bare-metal restore may be a bit clunky if you have a machine with a stock Ubuntu config and just a few documents in your home directory. However, having 12 terabytes on Mozy and needing to reinstall a box from scratch that has custom apps with funky license keys would be a hair-puller. The best approach is to use some method of backup for the "oh crap" bare-metal stuff, then an offsite service just in case you lose your backups at that location.
Figure out your scenario too. Are multiple Drobos good enough, or do you need offsite storage in case the facility is flooded? Is tape an option? Tape is notoriously expensive per drive, but is very economical once you start using multiple cartridges. Can you get away with plugging in external USB/SATA/IEEE 1394 hard disks, backing to them, then plopping them in the Iron Mountain tub?
Re:Different Solutions by cblack · 2009-09-16 18:54 · Score: 1

Do not consider Backup Exec in a partially Linux/UNIX environment or one with large numbers of data files.
That is all.
Re:Different Solutions by Anonymous Coward · 2009-09-17 01:27 · Score: 0

Please note, these are all commercial solutions.
The kid is a student. He wants to learn, not simply follow instructions. An open source solution would be ideal for this.
Re:Different Solutions by pnutjam · 2009-09-17 04:14 · Score: 2, Informative

For a multi-vendor environment, take a look at Unitrends. I use them and they are really sweet: disk-to-disk, any OS, bare-metal Windows (and Linux) restores, hot-swappable off-site drives or off-site vaulting. Plus, there is no extra charge for clients if you want to back up a database or an Exchange server. It's all-inclusive, even the open-file client.
In my experience, getting open files backed up is the hardest thing in a 24/7 environment.

--
Cheap storage VM.
Re:Different Solutions by CryptoJones · 2009-09-20 03:25 · Score: 1

If you can follow directions, you can get RALUS working. If you can't, you prolly should not be playing with Linux boxen in the first place.

--
"Chance favors the prepared mind." ~Me

DAR - disk archive by pseudonomous · 2009-09-16 15:23 · Score: 1

If you're considering doing incremental or archival backups, I would look into using dar. It's sort of like tar on steroids, and is a great little utility. It's also nothing like bleeding edge, runs on both Linux and BSD platforms, and has a Windows port (which I've never used). Combining dar with ssh and some simple shell scripts might be the sort of solution you're looking for.

another thought by traveler359 · 2009-09-16 15:24 · Score: 1

The Backblaze hardware setup looks impressive and might be worth a look. As for software, how about something like Openfiler (http://www.openfiler.com/)? If those two could be combined, it would make one impressive setup.

Re:another thought by afidel · 2009-09-16 15:33 · Score: 1

The Backblaze hardware is trash; if you care at all about data integrity, you would run away from it very fast. For not much more, get a system with ECC and an ECC bus, and use ZFS.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:another thought by symbolset · 2009-09-16 17:18 · Score: 1

We're currently having a war between people who think that reliable end-to-end infrastructure is worth unlimited amounts of money to pay for reliable engineering, and people who think that reliable end-to-end infrastructure can be achieved by hardware and software redundancy with off-the-shelf components. Since the former require contact information so you can do an ROI survey with your marketing specialist, and the latter have prices on the website you can click on, the latter are winning.
Your guidance on how we can turn this around would be helpful. The Rubes are getting away! How do we get them back?

--
Help stamp out iliturcy.
Re:another thought by afidel · 2009-09-16 17:41 · Score: 1

Redundancy is NOT data integrity unless you are doing some kind of higher level checks between nodes. Yes that can be accomplished but I am not aware of any off the shelf solution that accomplishes it. For only a couple % more per node you could get a real server class board with ECC and when combined with running ZFS be relatively assured of data integrity. Running without ECC and using SATA drives in that quantity without a check-summed file system is a guaranteed recipe for silent data corruption.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Re:rdiff-backup: like rsync with versioning by metalhed77 · 2009-09-16 15:28 · Score: 2, Interesting

I love rdiff-backup, but I'd never use it on any large datasets. I attempted to use it on ~600 GB of data once, with about 20GB of additions every month, and it ran dog slow, as in taking 6+ hours to run every day (there were a lot of small files; dunno if that was the killer).

For larger datasets, like what the poster has, I'd go with a more comprehensive backup system, like Bacula. I use that to back up about 12TB, and it's rock solid and fast. There's a bit of a learning curve, but the documentation is very good.

If Bacula is too intimidating, rsnapshot would be a viable route. It's similar to rdiff-backup, but simpler (pretty much just rsync + cp using hard links), faster, and easier to use. It's not as space-efficient, but diffing video data is probably a waste of time anyway.

--
Photos.

ZFS replication by Anonymous Coward · 2009-09-16 15:29 · Score: 0

ZFS replication and snapshots. Of course, you'd need something which groks ZFS on both sides of the link.

Bacula by TD_3G · 2009-09-16 15:29 · Score: 2

While our storage needs are nowhere near that size, I can attest to the greatness of Bacula. The hardware part is probably up to you, but as far as software goes, I can't recommend it enough. It's completely cross-platform in terms of the systems you can pull data from. The Director and Storage Daemon run flawlessly on every Linux distro I've tried them on (Slackware, Debian, and Fedora), and restores are easy as pie with some of the available interfaces. Configuration is a pain and can take a while, but once it's set, you're done. We have 5 servers, two of which are hosted outside the company and we don't even have physical access to; I was able to set these up to work with the same backup solution, as if they were local, with ease. Other internal servers range from Windows 2000 to Mac OS X, and all back up without issue: daily incrementals, weekly differentials, and monthly fulls.
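The daily/weekly/monthly rotation described above can be expressed as a small calendar rule. Here is a minimal Python sketch that models only the date logic, assuming Sunday is the day for the full and the differentials (that choice is an assumption). In Bacula itself, this kind of rotation is written as Run directives in a Schedule resource rather than in code.

```python
from datetime import date


def backup_level(day: date) -> str:
    """Full on the first Sunday of the month, Differential on the other
    Sundays, Incremental on every other day."""
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "Full" if day.day <= 7 else "Differential"
    return "Incremental"


for d in (date(2009, 9, 6), date(2009, 9, 13), date(2009, 9, 16)):
    print(d, backup_level(d))
```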

--
...

Re:Bacula by tokul · 2009-09-16 20:14 · Score: 1

Biggest problem: a 12TB full backup. The setup depends on budget, and you won't be able to do a full backup overnight on slower links.
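To put numbers on the overnight problem, here is a rough line-rate calculation. Decimal terabytes, the example link speeds, and the efficiency factor are assumptions, and real throughput will be lower once protocol overhead and disk speed come into play.

```python
def transfer_hours(terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours to move `terabytes` (decimal TB) over a link of `link_mbps` megabits/s.

    `efficiency` is an assumed fraction of line rate actually achieved.
    """
    bits = terabytes * 1e12 * 8
    return bits / (link_mbps * 1e6 * efficiency) / 3600


for mbps in (2, 100, 1000):
    print(f"{mbps:>5} Mbps: {transfer_hours(12, mbps):,.1f} h")
```

Even at full gigabit line rate, 12TB takes about 27 hours, which is why an initial full is usually seeded locally and only incrementals travel over the link.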

Re:rdiff-backup: like rsync with versioning by pla · 2009-09-16 15:32 · Score: 1

You may want to check out rdiff-backup also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states.

Seriously people, learn the tools you have available on any stock Linux system.

Even assuming you run a much older system with an FS that doesn't support online snapshotting... "cp -al <source> <destination>". Period.
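For readers who want to see what `cp -al` actually does, here is a minimal Python sketch: it recreates the directory tree and hard-links each regular file instead of copying its data. Unlike cp -a, it skips symlinks and does not copy directory ownership, permissions, or timestamps. The trick pairs well with rsync because, by default, rsync writes a changed file to a temporary file and renames it into place, so the older snapshot's hard-linked copy is left untouched.

```python
import os
import tempfile
from pathlib import Path


def hardlink_clone(source, destination) -> None:
    """Recreate source's directory tree under destination, hard-linking each
    regular file instead of copying its data (the idea behind `cp -al`)."""
    source, destination = Path(source), Path(destination)
    for dirpath, _dirnames, filenames in os.walk(source):
        target_dir = destination / Path(dirpath).relative_to(source)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in filenames:
            src_file = Path(dirpath) / name
            if src_file.is_symlink():
                continue  # symlinks are skipped in this sketch
            os.link(src_file, target_dir / name)


# Demo: the clone shares inodes with the original, so it uses no extra data blocks.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "src")
    (src / "sub").mkdir(parents=True)
    (src / "sub" / "clip.mov").write_bytes(b"frames")
    hardlink_clone(src, Path(tmp, "snapshot"))
    original = os.stat(src / "sub" / "clip.mov")
    clone = os.stat(Path(tmp, "snapshot", "sub", "clip.mov"))
    same_inode = original.st_ino == clone.st_ino
    link_count = original.st_nlink

print(same_inode, link_count)  # True 2
```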

equallogic PS6000 by Rooked_One · 2009-09-16 15:33 · Score: 1

iSCSI rocks... and these things have everything built in. Seriously cool units. Costly, though, but you know where that money goes when you use it; or should I say, when you spend 10 minutes setting it up and the job's done.

Re:equallogic PS6000 by Rooked_One · 2009-09-16 15:34 · Score: 1

Oh yeah, it runs on NetBSD. I was too quick on the reply trigger.
Re:equallogic PS6000 by symbolset · 2009-09-16 17:27 · Score: 1

The price on these is "Please call to order". To me that says, "If you have to ask, you can't afford it.".
How does this stack up against the BackBlaze box at $120/TB, hardware and software included?

--
Help stamp out iliturcy.

Different Solutions by Anonymous Coward · 2009-09-16 15:35 · Score: 3, Informative

My university is developing a local backup and co-location data center, and I have been one of the major forces in deciding what software we go with. If you are looking for Linux-style freedom, as mentioned before, rsync is all you need. If you happen to be looking for something more professionally supported, there are many options, but I will tell you some of what I have seen. At significant cost, the primary system I run into is EVault, which works OK, is very stable, and doesn't have too many crazy features. Offsetting that is the horrible, and I mean horrible, cost. Acronis just (as in less than a month ago) came out with their new backup product, which they even give a free trial for. It does bare-metal restore among other things, and I was very impressed with it, but it didn't meet some of my requirements and I didn't get to play with it much more. On the cheaper, jankier side of things, I have tried NovaStor backup products with overall horrendous results; stay away from them completely. (Things like being able to export data directly to a removable drive for the first-time transfer are ridiculous!) I am very impressed with a completely off-the-wall solution called RBackup. At first it seems very "made in India", but it has tons of features that are easy to understand (being brandable is a big plus), and it can generally be set up quickly or very granularly. If you're using a Windows system, you should check it out. I have also looked at Symantec's products and other things, but these are a few of the major players in the "I want to remote backup my own data to my own servers" category (which excludes lots of stuff). Since I am still in the review process, I am also curious to see what other people say. I can also tell you that I have set up almost 4 Drobos now and they really rock, so you're doing well on that front!

Best solution for backup hands-down is... by macraig · 2009-09-16 15:36 · Score: 3, Funny

... BitTorrent pirates. You'll always find last night's shows backed-up on TPB the next morning. Yaaarrr!

Re:BackupPC is enterprise grade by t35t0r · 2009-09-16 15:37 · Score: 1

We backup 15TB nightly (using tar over NFS) with BackupPC running on two servers each with 10TB of storage pulling data from a high performance NAS (BlueArc). We retain 30 days of incremental backups and do a full for the various home directories every 30 days.

RAID by Anonymous Coward · 2009-09-16 15:40 · Score: 0

Just to make the /.'ers happy, I think using RAID is the best backup solution possible. In the event of hard drive failure your data is still safe!

Rsync is ok by kilodelta · 2009-09-16 15:40 · Score: 1

But rsnapshot works even better. When I worked for the RI Secretary of State's office, we found tape backup wasn't cutting it for us. We picked up a cheapie HP server, loaded it up with storage, and bought a bunch of terabyte-capacity external drives for offsite copies.

You don't know what a relief it was to be able to go to a web interface and restore files from there. It worked great with Linux boxes, but you had to jump through a few hoops to deal with the Windows servers we had.

Re:Rsync is ok by skogs · 2009-09-16 16:04 · Score: 1

mod parent up. rsnapshot is painless and elegant.

--
Who is this that even the wind and the waves obey Him? Surely this computer must submit also!
Re:Rsync is ok by symbolset · 2009-09-16 17:37 · Score: 1

Tape? Are we really still talking about tape? Please tell me we're not still talking about tape.
Do I have to draw you a diagram? Random access time is measured in fractions of an hour.
Could y'all please quit mentioning the freaking tape solutions of yesteryear? Tape is dead. It's really, really dead. The Andromeda Strain could not bring this crap back to life. I would prefer you try to back up to floppy. The very word "tape" brings to mind the 18" reels of the '60s. I'm old, and even when I was born we were working on disk.
Could all you tape farkers please just turn in your geek card? You're all fired.

--
Help stamp out iliturcy.
Re:Rsync is ok by kilodelta · 2009-09-16 17:50 · Score: 1

Well it was 4mm DAT but yes, very limited in capacity and access. Not to mention we'd never done a full restore from tape.

You'd actually be surprised how much tape is still used in government circles, the FBI included.
Re:Rsync is ok by symbolset · 2009-09-16 18:17 · Score: 1

I most definitely would not be surprised. I had to answer a tape question today WRT the purchase of a NEW tape system. And I answered it with the desired solution without asking, "WTF are they thinking?"
Maybe I'm part of the problem.

--
Help stamp out iliturcy.
Re:Rsync is ok by kilodelta · 2009-09-16 18:24 · Score: 1

That's the thing. We knew that tape is a serial system, in other words if the data you want is towards the end you have to spool through the thing to the end.

Hard drives are random access devices. Point, click, restore. Love it.
Re:Rsync is ok by symbolset · 2009-09-16 18:45 · Score: 1

We're going to have this talk again in a decade. Memory has to heat up a solid state channel. Disk has to wait until a physical block swings around to the R/W head to read and be interpreted by the firmware. Tape has to rewind that (stuff) for 20 minutes before the desired data is under the R/W head. That's the difference between mechanical storage and solid state. This one's over and tape lost. I can't believe we're still talking about tape. Tape was cool in 1980, when I could store 120K on one tape. But not since. Now: not so much.

--
Help stamp out iliturcy.
Re:Rsync is ok by Anonymous Coward · 2009-09-18 02:39 · Score: 0

Tape doesn't necessarily have to be the ONLY place your backups go. Where I work, our (roughly 1TB/wk) backups go to a local 4TB disk storage array, and then get cloned to a single LTO4 (800GB/1.6TB) tape for storage in a safe deposit box. If I need to restore something from the last month or so, it is immediate. If I have to restore something from two months ago or the year-ends (on archival quality WORM tapes), it has never taken more than 30 minutes or so. Modern tape technology is quite advanced from 18" reel-to-reel tapes, and is fine for long-term backups. Recommending floppies? I know that's hyperbole, but really? What was the capacity and speed of the last tape you used? Yes, I do randomly test restores from old tapes, and they're fine. Yes, I do have an LTO4 drive set up on a server at another location specifically for disaster recovery purposes. Tape is fine. It is not dead. It just depends on how you use it. I suppose I could hack something together using external drives instead of tapes...but why? What I have works fine.

Questions, and some answers by brindafella · 2009-09-16 15:41 · Score: 1

Cred: Some years ago I 'engineered' and essentially built a community radio station.

Will you ever need to stream direct from the backup to air? (Go and ask the management and the other techos: "ever".)

Why? This will answer what speed you need to transfer data both to and FROM the backup, and whether you need to take any special measures to ensure that there are no bottlenecks and single points of failure in the path. And, you'll find out whether the Master Control/studio needs to 'control' this path and so what you'll need to build in at Control.

What does the Production department need for editing?

Why? Someone else has discussed versions, and from my experience there is at least a several-to-one requirement for digital space during editing. It also answers why the editors, at their suite(s), may need similar 'control' as for Master Control.

Is the station using a control computer to put content to air?

Why? Almost certainly, is the answer. You'll have to not only give Master Control a 'manual' system, but provide some way for the control computer to stream to air, and they'll be subtly different so "get over it" and plan that way.

What happens when a drive/video coder/etc blows in some system? Can you be off air? What's the time for a fix?

Why? If your station can be off air then you fix at the next available opportunity; but, if you must be on air (like a 'commercial' station) then you have to plan and execute a solution like the commercial one, only cheaper.

--
Looking at space, radio, science and computing from a 'down-under' amateur enthusiast perspective.

Here's an effective cost-reduction strategy. by failedlogic · 2009-09-16 15:43 · Score: 4, Funny

Have each student create their "own TV station" as part of their degree requirement, no matter the area of study. Similar to research essays, you'll get the following results: 1) students who complete the assignment with no outside assistance; 2) students who copy certain small portions of the data you are backing up and present it as their own; 3) students who plagiarize everything (yes, some students will argue that the content the TV station has accumulated over the years, all 12 TB, is actually their original work).

As this data appears on the University network, the entire TV station will be backed-up in a local "Cloud". And if these types of assignment become popular at other universities, you can expect to find redundant off-site backups. By this point, the 12 TB will appear on BitTorrent (and probably on Newsgroups and IRC for the dedicated plagiarists). A full restore will only take a few days - as long as the full 12 TB is seeded.

Re:Here's an effective cost-reduction strategy. by Shinobi · 2009-09-17 01:06 · Score: 2, Funny

"3) students that plagiarize everything - yes some students will debate that the same content the TV station has accumulated over the years - all 12 TB - is actually their original work."
And then you can also flag future FSF cultists. Win-win. ;)

Re:rdiff-backup: like rsync with versioning by Alien+Being · 2009-09-16 15:51 · Score: 2, Insightful

Do you know what -l does?

iSCSI by mdaitc · 2009-09-16 16:00 · Score: 1

get an iSCSI device:
http://www.promise.com/product/product_detail_eng.asp?segment=undefined&product_id=226 The Promise VessRAID series is currently available through distribution. Pricing starts at $1,899 for an 8 bay system and ranges to $3,099 for a 16 bay system. A fully populated 16 bay subsystem costs less than 26 cents per gigabyte, using enterprise-class 7200 RPM 2TB hard disk drives.
so basically, $2.6k for a unit @CDW, 16*$300 for 2TB hard drives (newegg)
total $5k for 32TB raw.

Re:iSCSI by mdaitc · 2009-09-16 16:02 · Score: 1

Arrg, my quick math. I mean:
total a little over $7k for 32TB (or it's cheaper if you *only* get 16*1TB drives..)
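The corrected figure, worked out with the prices quoted above. The RAID 6 usable-capacity line is an added assumption about how the array would be configured, since raw capacity ignores parity overhead.

```python
enclosure = 2_600         # VessRAID 16-bay unit, price quoted above
drive_price = 300         # per 2TB drive, price quoted above
drives, drive_tb = 16, 2

total = enclosure + drives * drive_price      # 2600 + 4800
raw_tb = drives * drive_tb
usable_tb_raid6 = (drives - 2) * drive_tb     # assumption: RAID 6, two drives' worth of parity

print(f"${total:,} for {raw_tb} TB raw, ${total / (raw_tb * 1000):.3f}/GB")
print(f"{usable_tb_raid6} TB usable if configured as RAID 6")
```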

Don't focus on backup, focus on recovery by mlheur · 2009-09-16 16:04 · Score: 1

You don't have backup needs, you have recovery needs. Backup enables you to fulfill those needs.
As has been mentioned many times above, there's no one fit answer - but I don't think you're even asking the right questions.
Under what circumstances will you be recovering data? There are two main types of recovery:
day to day recoveries where users want older versions of files or to replace a corrupt or deleted file; and
disaster recovery in case of hardware, system or site failure.

Will you support both recovery needs? If so, then for day-to-day recoveries you need backups every day, kept for whatever length of time is deemed appropriate. Proper tape-based backup is still the industry standard here, just based on the volume. 12TB at 75% used, running full backups every week kept for 4 weeks, and daily cumulative incremental backups with 5% changes every day kept for 10 days, means 51.3TB of data. Plus, you don't want all your copies on a single medium; imagine if that thing failed.
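One way to reproduce the 51.3TB figure, as a worked example. The comment doesn't spell out its assumptions, so the retention window here is a guess that happens to match: a cumulative incremental taken k days after the last full holds k days of change, and the 10 retained incrementals span one 7-day cycle plus 3 days of the next.

```python
capacity_tb, used = 12, 0.75
data_tb = capacity_tb * used              # 9 TB actually in use
fulls = 4 * data_tb                       # weekly fulls, kept 4 weeks
daily_change = 0.05 * data_tb             # 5% of the data changes per day

# Assumed window: cumulative incrementals of 1..7 days of change,
# then 1..3 days after the next full (10 retained in total).
incremental_days = list(range(1, 8)) + list(range(1, 4))
incrementals = daily_change * sum(incremental_days)

total = fulls + incrementals
print(round(total, 1))  # 51.3
```

Most of the total (36TB) is the four retained fulls, so the retention count for fulls is the first knob to look at when sizing.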

For disaster recovery, you need to know your RPO and RTO. Your Recovery Point Objective is basically how much data you can stand to lose, while your Recovery Time Objective is how long after the disaster you can take to get back up and running. Answering these will tell you how often you need to run a backup and what storage technologies and methods are appropriate, or at least which ones are inappropriate. How are you going to protect your data from the disaster, and how far away is far enough? I wouldn't consider the same campus far enough away.
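A quick way to check a schedule against an RPO: in the worst case, disaster strikes just before a run finishes (and before its offsite copy lands), so the newest safe copy is the previous run, which captured data as of its start. This is a minimal Python sketch of that bound, with example numbers (hourly runs, an hour to copy, an hour offsite) chosen for illustration.

```python
from datetime import timedelta


def worst_case_loss(interval: timedelta, duration: timedelta,
                    offsite_copy: timedelta = timedelta(0)) -> timedelta:
    """Worst-case window of lost work for a periodic backup.

    If disaster strikes just before a run finishes (and its offsite copy lands),
    the newest safe copy is the previous run, which captured data as of its start.
    """
    return interval + duration + offsite_copy


def meets_rpo(rpo: timedelta, interval: timedelta, duration: timedelta,
              offsite_copy: timedelta = timedelta(0)) -> bool:
    """True if the schedule's worst-case loss fits within the RPO."""
    return worst_case_loss(interval, duration, offsite_copy) <= rpo


hour = timedelta(hours=1)
print(worst_case_loss(hour, hour, hour))  # 3:00:00
```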

There are a number of products out there. I personally work with NetBackup from Symantec, and it's pretty much an industry standard, but that's my employer's choice. I've looked at Amanda (http://www.zmanda.com/) a few times, but haven't done any real testing with it. There's also Data Protector, Backup Exec, and many more listed at http://en.wikipedia.org/wiki/List_of_backup_software

redundancy, anyone? by SuperBanana · 2009-09-16 16:09 · Score: 1

Recommending a backup solution where if one power supply dies you immediately corrupt the entire array? Yeah, that's JUST what he needs...

--
Please help metamoderate.

Re:redundancy, anyone? by SanityInAnarchy · 2009-09-16 16:39 · Score: 1

So build two.
A backup server doesn't need redundancy if it's a backup server.

--
Don't thank God, thank a doctor!
Re:redundancy, anyone? by mysidia · 2009-09-16 17:27 · Score: 4, Interesting

The hard drives are desktop class, not designed for 24x7 operation. Nor are they designed for the heavy write traffic that server backups generate.
Latent defects on disks are a real concern.
You write your data to a disk, but there's a bad sector, or miswrite, and when you go back later (perhaps when you need the backup), there are errors on the data you are reading from the disk.
Moreover, you have no way of detecting it, or deciding which array has recorded the "right value" for that bit...
That is, unless every bit has been copied to 3 arrays.
And every time you read data, you compare all 3 (or you keep two copies and a checksum).
Well, the complexity of this redundancy reduces the reliability overall, and it has a cost.
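The two ways of telling which copy is right that are mentioned above can each be sketched in a few lines of Python. The first is three copies and a majority vote. The second is two copies plus a stored checksum. This is a toy illustration on in-memory byte strings with hypothetical function names, not how any particular RAID or filesystem implements it.

```python
import hashlib
from collections import Counter


def majority_read(copies):
    """Three (or more) copies: return the value most copies agree on."""
    value, votes = Counter(copies).most_common(1)[0]
    if votes * 2 <= len(copies):
        raise ValueError("no majority: cannot tell which copy is right")
    return value


def checksum_read(copies, expected_sha256):
    """Two copies plus a stored checksum: return the first copy that verifies."""
    for copy in copies:
        if hashlib.sha256(copy).hexdigest() == expected_sha256:
            return copy
    raise ValueError("every copy failed its checksum")


good = b"frame data"
bad = b"frame dbta"  # the same block with one byte silently corrupted
print(majority_read([good, bad, good]))                              # b'frame data'
print(checksum_read([bad, good], hashlib.sha256(good).hexdigest()))  # b'frame data'
```

With only two copies and no checksum, `majority_read` raises, which is the situation mysidia describes.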
Re:redundancy, anyone? by SanityInAnarchy · 2009-09-16 17:38 · Score: 1

That is, unless every bit has been copied to 3 arrays.
3 arrays? Why? Do it in software, do RAID 5. How likely is it that you'll have a bad sector, or a miswrite, that hits both the stripes and the parity?

you have no way of detecting it, or deciding which array has recorded the "right value" for that bit...
Or use ZFS. It'll checksum everything, so yes, it'll know which array has the right value.

--
Don't thank God, thank a doctor!
Re:redundancy, anyone? by mysidia · 2009-09-16 18:21 · Score: 1

3 arrays? Why? Do it in software, do RAID 5. How likely is it that you'll have a bad sector, or a miswrite, that hits both the stripes and the parity?
It doesn't matter. Just one of them has to have been miswritten. With RAID5, you have no way of telling which one is correct and which one is wrong.
Parity = STRIPE D1 (XOR) STRIPE D2
You can customize your RAID implementation to read all 3 disks; it will find any bit where the bit in the parity stripe is not equal to STRIPE1_BITS (XOR) STRIPE2_BITS. It can know that the bit read is inconsistent, so there must be an error there. The problem is that RAID5 won't know which disk has the error.
How do you know which stripe is right and which stripe is wrong?
The answer is, you don't know unless you have double parity (RAID6), which would further impact performance, and reduce the amount of storage you get per $$$.
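The ambiguity can be shown with a toy stripe made of two data blocks and one parity block. The byte values are made up. Each hypothesis, "this one block is the bad one", can be repaired into a self-consistent stripe, and each repair gives different data. So a parity mismatch tells you *that* something is wrong but not *where*.

```python
def xor_bytes(a, b):
    """Bytewise XOR, the parity operation RAID5 uses."""
    return bytes(x ^ y for x, y in zip(a, b))


def hypotheses(d1, d2, p):
    """Assume each block in turn is the bad one and rebuild it from the other two."""
    return {
        "d1 bad": (xor_bytes(d2, p), d2, p),
        "d2 bad": (d1, xor_bytes(d1, p), p),
        "parity bad": (d1, d2, xor_bytes(d1, d2)),
    }


d1, d2 = b"\x0f\xf0", b"\x33\x55"
parity = xor_bytes(d1, d2)  # written when the stripe was stored

d2_read = b"\x33\x54"       # one bit later flips silently on the disk holding d2
print(xor_bytes(d1, d2_read) == parity)  # False: a scrub can see a mismatch...

for name, (a, b, p) in hypotheses(d1, d2_read, parity).items():
    assert xor_bytes(a, b) == p    # ...every repair is self-consistent...
    print(name, a.hex(), b.hex())  # ...but each one yields different data
# d1 bad 0ff1 3354
# d2 bad 0ff0 3355
# parity bad 0ff0 3354
```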
Using software RAID5/6 in such a scenario is even worse than RAID1 in a 3-way mirror config: due to the RAID write hole it introduces, you lose terribly if, say, the power fails. Using PCIe controllers with hardware RAID would massively increase the cost of the box.
The same bit error is not likely to occur twice, but RAID5 cannot correct these errors for you at all, and RAID6 cannot correct them once a single drive has failed. When using consumer drives, the probability of an eventual drive failure in your RAID5/6 array is high, and chances are it happens before you need your data. Different drives can have different bits in error. And when one of those drives fails (before you can detect any error), the redundancy is no longer there to even try. When you rebuild the array, the bits will still be in error.
That's why it's called silent data corruption: RAID5/6 implementations do not read bits from all 3/4 stripes when a read is requested (that would be slow, even slower than it is due to the XOR computation against 2 stripes); they distribute their reads.
And look how heavily the Backblaze unit oversubscribes its SATA channels with all these port multipliers; software RAID rebuild performance will be considerably worse than in more common server configurations.
To be able to verify in software, you'd have to break the RAID abstraction and somehow read bits from each underlying disk, perform both XORs, compare them, alert if they are different. And somehow pull those bits from the other box that doesn't detect an error.
Well, Linux software RAID doesn't offer a way to do this, and since it is a layering violation, kernel developers are not likely to add such a feature.
Or use ZFS. It'll checksum everything, so yes, it'll know which array has the right value.
Yes, a very good answer to all silent data corruption issues. Unfortunately, their hardware uses port multipliers, which don't exist in a non-development version of OpenSolaris yet. Maybe in 2010...
Re:redundancy, anyone? by drsmithy · 2009-09-16 22:27 · Score: 1

Using software RAID5/6 in such a scenario is even worse than RAID1 in a 3-way mirror config: due to the RAID write hole it introduces, you lose terribly if, say, the power fails. Using PCIe controllers with hardware RAID would massively increase the cost of the box.
Hardware RAID controllers have exactly the same issue. They won't save you from the problem you're describing.
That's why it's called silent data corruption: RAID5/6 implementations do not read bits from all 3/4 stripes when a read is requested (that would be slow, even slower than it is due to the XOR computation against 2 stripes); they distribute their reads.
Actually it wouldn't make any meaningful difference at all. Parity calculations on any remotely modern CPU are so fast (gigabytes/sec) that dealing with the paltry few 10s - or maybe 100s in rare circumstances - of megabytes per second of data coming off the disks is insignificant.
To be able to verify in software, you'd have to break the RAID abstraction and somehow read bits from each underlying disk, perform both XORs, compare them, alert if they are different. And somehow pull those bits from the other box that doesn't detect an error.
Or you could just keep checksums at the file (rather than stripe) level on several machines and compare them (and play the probabilities game that fewer are corrupted). Which I *hope* is what Backblaze are doing, because otherwise their system is doomed to catastrophic and dramatic failure.
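drsmithy's file-level alternative can be sketched as follows: build a checksum manifest of each replica, then flag the replicas that disagree with the most common digest for a file. This is a minimal sketch that assumes each replica is reachable as a local directory (for example over NFS). `manifest` and `suspect_files` are hypothetical names.

```python
import hashlib
from collections import Counter
from pathlib import Path


def manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB reads
                    h.update(chunk)
            result[path.relative_to(root).as_posix()] = h.hexdigest()
    return result


def suspect_files(manifests):
    """For each file the replicas disagree on, list the indexes of the replicas
    whose digest (or missing file) differs from the most common one."""
    report = {}
    for rel in sorted(set().union(*manifests)):
        digests = [m.get(rel) for m in manifests]  # None if a replica lacks the file
        majority, _ = Counter(digests).most_common(1)[0]
        outliers = [i for i, d in enumerate(digests) if d != majority]
        if outliers:
            report[rel] = outliers
    return report


# Hypothetical usage across three machines' copies of the archive:
# suspect_files([manifest("/mnt/a/archive"), manifest("/mnt/b/archive"), manifest("/mnt/c/archive")])
```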
Re:redundancy, anyone? by mysidia · 2009-09-17 01:16 · Score: 1

Hardware RAID controllers have exactly the same issue. They won't save you from the problem you're describing
The RAID5/6 write hole is closed on hardware controllers. Usually by a battery-backed cache module.
It does nothing for silent data corruption, however.
Re:redundancy, anyone? by SanityInAnarchy · 2009-09-17 01:48 · Score: 1

unfortunately, their hardware uses port multipliers, which don't exist in a non-development version of OpenSolaris yet.
I wonder how stable ZFS is in FUSE.

--
Don't thank God, thank a doctor!
Re:redundancy, anyone? by LWATCDR · 2009-09-17 01:59 · Score: 2, Informative

Why bother?
OpenSolaris will run rsync just fine and it is also free.
There are a lot of good solutions out there so I wouldn't limit myself to just Linux.
You have OpenFiler running on Linux.
You have FreeNAS on BSD.
And you could roll your own on OpenSolaris and use ZFS with fancy gui tools if you really want to.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:redundancy, anyone? by drsmithy · 2009-09-17 02:03 · Score: 1

The RAID5/6 write hole is closed on hardware controllers. Usually by a battery-backed cache module.
No, the probability is somewhat reduced. The problem itself is fundamental and inescapable.
Re:redundancy, anyone? by Lumpy · 2009-09-17 02:30 · Score: 1

How about the fact that RAID is not a backup solution? It's a high availability solution.
No sane IT expert would use RAID for a backup.

--
Do not look at laser with remaining good eye.
Re:redundancy, anyone? by Dare+nMc · 2009-09-17 02:54 · Score: 1

No sane IT expert would use RAID for a backup
Many (most?) seem to use RAID for a backup (server), just not as the backup. That is, it takes more time to rebuild a backup than to rebuild a RAID array, and if the backup goes down, it is often as big an issue as if the main server goes down. (If the main server goes down, you have a backup, so no work is lost; users do what work they can offline until it is back up.) The backup setups I worked on had RAID on both the main server and the backup server; why not?
Re:redundancy, anyone? by Anonymous Coward · 2009-09-17 03:05 · Score: 0

What you just said echoes what Lumpy said.
A "backup server" is a high availability server with a copy of the data. It is NOT a backup.
It seems that you don't know what the word backup means in the world of IT.
Re:redundancy, anyone? by SanityInAnarchy · 2009-09-17 03:12 · Score: 2, Interesting

Why bother?
See GP. If the hardware I want isn't supported by Solaris, but is supported by Linux, I'll want to use that.

OpenSolaris will run rsync just fine
It'll also run NFS, so if the hardware will support it, you do have a point -- even if I "needed" Linux for some reason, I could still use Solaris for the physical storage.

--
Don't thank God, thank a doctor!
Re:redundancy, anyone? by Archangel+Michael · 2009-09-17 03:21 · Score: 1

The hard drives are desktop class, not designed for 24x7 operation.
I've had better luck with desktop class drives than I have with enterprise class drives. I suspect that enterprise class drives just spin faster (10K/15K vs 7.2K/5K) and thus burn out faster. Yeah, the bearings might be higher class, but I suspect everything else comes off the same production lines. I could be wrong; that is just my "gut" opinion.
AND my guess is that magnetic drives will be mostly gone in 5 years, and we'll be using SSDs for just about everything by then. So we're probably arguing over the proper length of a buggy whip while Ford is just now building the Model A.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Re:redundancy, anyone? by cbreaker · 2009-09-17 04:01 · Score: 1

No, you're not wrong. And I don't even think the bearings are any different. Enterprise class hard drives generally spin faster, but they are almost always lower capacity. So, that's the trade-off - space or speed. It's always been that way. Because manufacturing hard drives has improved so much in recent years, there's less of a gap, but you still can't get a 10K 1.5TB disk. I think Hitachi is the only company making a 10K 1TB disk?

The MTBF is more about warranty than quality of the product. I guess some people think desktop hard drives come from a "less clean" room assembly plant?

--
- It's not the Macs I hate. It's Digg users. -
Re:redundancy, anyone? by generica1 · 2009-09-17 05:21 · Score: 1

What he is (I think) saying is that nobody sane would use RAID on the host machine as their only backup, and feel safe. Integrating RAID with a backup strategy whereby the RAID is not the only copy of the data being backed up, i.e. if RAID is on the backup server, and not just on the main machine being backed up, then you essentially have a combination of backup (the second box) and high availability (the RAID on the second box). Which is a Good Thing.
When implementing RAID I like to use RAID + LVM + hot swappable SATA discs. That's a nice high availability option.

--
JUMP JUMP JUMP JUMP JUMP JUMP JUMP JUMP IRRIGATE
Re:redundancy, anyone? by Anonymous Coward · 2009-09-17 07:26 · Score: 0

ZFS mirror or copies>1 should take care of that.
Re:redundancy, anyone? by Dare+nMc · 2009-09-17 08:33 · Score: 1

I guess my post might have just been a grammar troll, but with a post a few levels up claiming RAID's complexity could increase failures, I thought it was worth correcting Lumpy's grammar anyway (that was likely his meaning). Most backup servers I have dealt with must store more data (history) than the servers/PCs they back up. Because of that extra volume, the data is more difficult to handle, and thus a failure could be even more catastrophic: losing 2 years of history could be worse than losing this week's working copies. So using RAID on a file server's backup server could be more valuable than using RAID on the main file server.
Re:redundancy, anyone? by LWATCDR · 2009-09-17 08:43 · Score: 1

I would look at some different hardware.
Honestly things like motherboards and raid controllers are cheap compared to data.
ZFS just seems too good not to use, and Solaris actually looks very interesting. If that really isn't an option, I would look at OpenFiler. It runs on CentOS and makes a great NAS.
I would also stick with a server Linux. Fedora updates too much for my comfort.

--
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Re:redundancy, anyone? by Anonymous Coward · 2009-09-17 10:35 · Score: 0

3 arrays? Google: ZFS
Re:redundancy, anyone? by mysidia · 2009-09-17 11:40 · Score: 1

When you use a controller with battery backed NVRAM, the in-flight stripe updates are held in memory until the disks report that they are successfully written.
The probability is reduced to 0, except if your controller itself fails, or the cache data is lost due to failure of the battery or NVRAM before your server and drives are powered back up.
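The write hole argued about in this subthread can be shown with a toy stripe. The byte values are made up. The `parity_written` flag stands in for whether the parity update completed, or was replayed from a battery-backed cache once power returned. The sketch does not model the remaining failure cases mysidia lists (failure of the controller, the battery, or the NVRAM).

```python
def xor_bytes(a, b):
    """Bytewise XOR, the RAID5 parity operation."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_after_crash(parity_written):
    """Update one data block, lose power, then lose the *other* data disk
    and rebuild it from the surviving block plus parity."""
    d1, d2 = b"\x11\x22", b"\x33\x44"
    parity = xor_bytes(d1, d2)          # consistent stripe on disk
    d1 = b"\xaa\xbb"                    # the new data block reaches the disk...
    if parity_written:                  # ...but parity only if that write completed
        parity = xor_bytes(d1, d2)      # (or was replayed from battery-backed cache)
    rebuilt_d2 = xor_bytes(d1, parity)  # d2's disk has since died
    return rebuilt_d2 == d2


print(rebuild_after_crash(parity_written=False))  # False: untouched d2 comes back corrupt
print(rebuild_after_crash(parity_written=True))   # True
```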
Re:redundancy, anyone? by Anonymous Coward · 2009-09-19 00:13 · Score: 0

ZFS kiddo.... ZFS.
Re:redundancy, anyone? by Anonymous Coward · 2009-09-19 00:34 · Score: 0

If the hardware is supported by Linux or Windows but not OpenSolaris, you can use VirtualBox and run OpenSolaris inside it, just to have it export the volumes you need.

Remember you need two backups by xavi62028 · 2009-09-16 16:11 · Score: 1

A single backup using rsync isn't going to cut it. Imagine backing up corrupted data, overwriting other stuff. Also, having all backups on the same network is a bad idea if malware ever gets in. Your second level of backup should probably be tape, making a monthly and a yearly backup. Then store the tapes in a concrete and steel fire safe. Tape has longevity that your other options don't.

Re:Remember you need two backups by c6gunner · 2009-09-16 23:14 · Score: 1

A single backup using rsync isn't going to cut it. Imagine backing up corrupted data, overwriting other stuff.
That's what snapshots are for.

Archive Management by Anonymous Coward · 2009-09-16 16:19 · Score: 0

Backing up Final Cut Pro projects and media files seems like a simple enough problem: just copy the files to a tape archive or drive array and be done with it. However, there is more than one reason why archive storage might be required: disaster recovery and long-term storage of project files.

The long-term case is more interesting - as local storage runs low, projects are archived for later retrieval. How do you remember what each archived project contained? How can you be sure that the item that you're retrieving will provide what you're looking for? Restoring projects from any archive is a slow process - especially when using HD formats - so why do this when all you want to do is to use a short segment from a programme?

Most broadcasters employ some form of Media Management to manage this process, allowing editors and producers to browse a permanently available low-resolution version of the archive content, and to restore smaller segments from it. Partial restore using browse-based shot selection dramatically reduces the amount of data transfer and helps to speed up busy editing operations. Employing this is probably overkill in this case, but a different angle to consider on what appears to be a simple problem.

The rsync tool is called "rsnapshot" by Antique+Geekmeister · 2009-09-16 16:28 · Score: 1

I went down the current list of comments, and to all the people who write their own rsync tools: please go review 'rsnapshot'. It's quite efficient; its major flaw is that it lists snapshots as 'hostname.1', 'hostname.2', etc., instead of 'hostname.YYYYMMDD', which would make things easier for users grabbing their own old files online.
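rsnapshot relies on hard links: unchanged files in a new snapshot are links to the previous snapshot's copy, so every snapshot looks complete but only changed files use new space. Below is a minimal Python sketch of that idea, not rsnapshot's code, using the dated naming the comment asks for. The change test (same size and mtime) mirrors rsync's default quick check. The function name and paths are hypothetical, and symlinks and special files are not handled.

```python
import os
import shutil
from datetime import date
from pathlib import Path


def snapshot(source, dest_root, label, prev=None, today=None):
    """Create dest_root/<label>.YYYYMMDD as a complete-looking copy of source.

    Files whose size and mtime match the copy in the previous snapshot `prev`
    are hard-linked to it instead of copied, so unchanged files cost no space.
    """
    today = today or date.today()
    source = Path(source)
    target = Path(dest_root) / f"{label}.{today:%Y%m%d}"
    target.mkdir(parents=True)  # fail loudly if today's snapshot already exists
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = Path(prev) / rel if prev else None
        if old is not None and old.is_file():
            s, o = src.stat(), old.stat()
            if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                os.link(old, dst)  # unchanged: share the previous snapshot's copy
                continue
        shutil.copy2(src, dst)  # new or changed: copy contents and timestamps
    return target


# Hypothetical nightly use: the first run copies everything, later runs link.
# first = snapshot("/srv/projects", "/backup", "tvserver")
# second = snapshot("/srv/projects", "/backup", "tvserver", prev=first)
```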

Re:The rsync tool is called "rsnapshot" by Janek+Kozicki · 2009-09-16 19:20 · Score: 2, Insightful

rsnapshot + mdadm RAID 6. Agreed 100%. That's what I'm currently using. It has worked like a charm for over 2 years now (including a single HDD failure in the meantime).

--
# #\ @ ? Colonize Mars #

Here's what I do by MichaelCrawford · 2009-09-16 16:36 · Score: 2, Interesting

First let me point out that there are natural disasters that could potentially take out your backup, if it's on the same campus as your TV station - think of Hurricane Katrina. And for sure you want your Final Cut projects to survive a direct nuclear hit.

Anyway, I have a Fedora box with a RAID 5 made of four 1 TB disks. There is a partition on the RAID called /backup0. That's not really a backup, but more meant as a convenience. I back up all my data to /backup0, then right away use rsync to copy the new data to an external drive that is either /backup1 or /backup2.

I have a safe deposit box at my bank. Every week or two I swap the external drive on my desk with the external drive in the safe deposit box.

So the reason I have that /backup0 filesystem is so that I don't have to sync the two external drives to each other - otherwise I would have to make twice as many trips to the bank, and there would be some exposure were my house to burn down while I had both external drives at home.

My suggestion for you is to find two other University facilities that are both far away, and offer to trade offsite backup services with them.

You would have two backup servers in your TV station - one for each of your partners - and they would also each have two, one each for you, as well as for each other.

That way only a hit by a large asteroid would lose all your data.

I got religion about backing up thoroughly after losing my third hard drive in twenty years as a software engineer. Fortunately I was able to recover most of that last one, but one of the other failures was a total loss, with very little of its data being backed up.

--
Request your free CD of my piano music.

Re:Here's what I do by Anonymous Coward · 2009-09-16 17:28 · Score: 0

The only thing I'd consider adding to that would be tertiary storage like Mozy. The reason I say this is a scenario I encountered at one business, where I was called in to do the recovery:
Production machine's hard disks were dead due to a mass rm -rf on /home.
The mirror drives were also dead because they propagated the changes.
The tape drive was dead -- burned out motor.
I borrowed another tape drive of the same type. Apparently the tapes that were dutifully made and stored were miscalibrated, and only worked with the (now defunct) tape drive that was at that company.
The person who did the backups dutifully did tests and test restores. However, because that tape drive wrote in a format only it could read, when it died, the well made backup structure was all for naught.
Luckily (and the previous admin had forgotten about this), fairly recent versions of the files needed were stored in a tarball on another filesystem, courtesy of a long-forgotten cron job that backed up /home nightly and deleted the fourth-oldest archive. Had this not been done, that company would have gone out of business.
This is why I like multiple mechanisms of backup, and never depending on one single program.
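The long-forgotten cron job described above can be reproduced easily. It archived /home nightly and kept only the most recent few archives. Here is a sketch of just the pruning step. It assumes date-stamped names, so that sorting by name matches sorting by age. The file name pattern and function name are hypothetical.

```python
from pathlib import Path


def prune_archives(directory, pattern="home-*.tar.gz", keep=3):
    """Delete all but the `keep` newest archives matching `pattern`.

    Assumes date-stamped names such as home-2009-09-16.tar.gz, so that
    sorting by name is the same as sorting by age.
    """
    archives = sorted(Path(directory).glob(pattern))
    doomed = archives[:-keep] if keep > 0 else archives  # [:-0] would keep everything
    for old in doomed:
        old.unlink()
    return doomed
```

Run after each nightly archive, this keeps three generations on disk and drops the fourth-oldest, matching the behaviour described in the comment.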

Openfiler? by symbolset · 2009-09-16 16:55 · Score: 1

You've got low latency and high bandwidth. Make your storage iSCSI OpenFiler configured in cluster mode with block replication. Do use a pair of the BackBlaze boxes somebody else mentioned. Configure them with RAID 6. Get enterprise support. You're in and done at $16K capital cost, $2K labor, and annual support (24/7, 4-hour response) at $6,200/yr for 67TB of raw storage (~48TB net), plus whatever the network, rack space and power cost. It scales in storage volume at linear cost as your needs do, and the more volume you have, the better performance gets. As a bonus, it fits in two 4U slots.

If you want to skimp, you don't have to fully populate the boxes until you need the room, and can save $8K in capital costs up front. Every couple of months you'll have to hot-swap out some cheapo consumer-grade drives, so buy a few spares and configure them as hot spares, plus a few more as cold spares. If you have some extra Franklins, splurge on the 10G Ethernet connection from the BackBlaze box to the local network; the remote can stay on Gig-E because it's only used for writes or HA. With a little mental gymnastics and PSU field modifications you can use one BackBlaze master to control up to three BackBlaze slaves with passthrough connections only, no internal server needed. Just get the cards with some external eSATA or external SAS ports, depending on your preference. You might need to upgrade the motherboard spec on the master BackBlaze box, but it's worth the extra money. Since Openfiler support covers unlimited CPUs, you may as well get the dual quad-core Nehalem motherboard with 72GB RAM and 8 PCIe slots, or whatever's in the sweet spot this week. I do like the X5550, but if you can get a quad core for under $100 it's hard to pass up, especially combined with one of these cheap motherboards that take up to 32GB of cheap DDR2 RAM. Be careful with your PCIe slot counts when choosing motherboards.

Configure whatever machine you're using to do a periodic backup from one iSCSI LUN on the local machine to another LUN. This gives you protection against 90% of backup needs (oops! I accidentally all my presentations!) and will be transparently replicated to the HA site at block level without user intervention. Somewhere in here you should educate users that backup systems are not an alternative method of version control.

You could probably upgrade this with a few TB of PCIe-attached SSD cache for million-plus IOPS and guaranteed saturation of multiple 10Gbps network ports, for an additional $40k, if you knew how, or why, or needed to.

Or you can go cheap with Linux and BSD and some scripts. You won't save any money and you won't have support. Buy the support. It's worth the money. Disclosure: I don't work for any of these folks. For the company I work for I can quote you a FC SAN. Trust me, you don't want to know what that costs for 67TB with block replication to a DR site and 24/7 4 hour support, let alone the scalable solution I've proposed here. Just assume it's "a lot".

--
Help stamp out iliturcy.

Mac OS X Native... by Hummdis · 2009-09-16 17:07 · Score: 1

Well, if you choose to back up natively on OS X (your post doesn't say, since rsync is on OS X as well), there's BRU Producer's Edition. Time Machine can be a bit resource hungry in my experience, so that may not be the best option for you. On the Linux front, there are a few tools to do the trick. Again, TOLIS Group has BRU Server for Linux, but that's a higher price than BRU PE is going to be. However, if you're looking for a free product, rsync may not cut it due to the limitations that many others have already mentioned. There's MondoRescue, but again, I don't think that will meet your needs. Still, user 'mlheur' hit the nail on the head in my opinion: you need to focus on your restore needs and then choose a backup application that fits those needs!

Try Openfiler by symbolset · 2009-09-16 17:09 · Score: 1

It works. It's iSCSI + CIFS / Windows share. It has clustering and block replication. It's open source and support is available. Support is per server - unlimited sockets and storage - so you could really work them with a few hundred PB on a pair of 8 socket/32 core servers. I don't work for them, but they rock!

They're geeks. If you bribe them properly they might come up with a proprietary block level dedupe solution for you.

--
Help stamp out iliturcy.

FreeBSD, ZFS, Rsync --- matches made in heaven by phoenix_rizzen · 2009-09-16 17:13 · Score: 1

http://forums.freebsd.org/showthread.php?t=3689&highlight=zfs+remote+backup

another drobo and a file safe by TRRosen · 2009-09-16 17:20 · Score: 1

simple cheap and easy

Re:another drobo and a file safe by Anonymous Coward · 2009-09-16 17:47 · Score: 0

I like the idea of a dedicated PC that has a lot of hard disks in it. It's nowhere near as elegant looking as a Drobo, but it can be made a lot more secure and a lot more expandable. On the Windows end, you can buy a motherboard with a TPM, a couple RAID cards, and Windows Server 2008 R2, enable BitLocker, set your shares, and pretty much forget about the box. If running Linux, you can use most RAID cards, or LVM with RAID 6 enabled, then run loopback file encryption (volume based, or file based like EncFS). This way, if someone steals your servers, they won't be getting your data as a bonus prize.
What this gives me over a Drobo is the fact that I can drop in a SATA card, add an external drive (or a frame), and expand that way. Also, some PC cases support 12 or more drive bays, so if I went this route, I'd buy 12 2TB drives, mirror the OS volumes, use RAID 6 on the remaining drives, and have about 16TB as an exportable filesystem to play with. Total cost would be about $2700-$3000, but it would last a while for most tasks. Need more space? Buy another SATA frame, add a controller card and more disks.
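The capacity figures in this thread follow the usual rules of thumb. RAID 5 gives up one drive's worth of space to parity and RAID 6 gives up two. This is a small sketch assuming a single array of equal-sized drives and ignoring filesystem overhead. It reproduces the 16TB figure above, and the function name is hypothetical.

```python
# Minimum drive counts for each level (single array, equal-sized drives).
MIN_DRIVES = {"raid0": 1, "raid1": 2, "raid5": 3, "raid6": 4}


def usable_tb(drives, drive_tb, level):
    """Usable capacity of one array, ignoring filesystem and metadata overhead."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unknown RAID level {level!r}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "raid0":
        return drives * drive_tb
    if level == "raid1":
        return drive_tb  # every drive holds the same data
    parity_drives = 1 if level == "raid5" else 2
    return (drives - parity_drives) * drive_tb


# 12 bays of 2TB: two drives mirrored for the OS, the other ten in RAID 6.
print(usable_tb(2, 2, "raid1"), usable_tb(10, 2, "raid6"))  # 2 16
```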

My backups are very organized as well by MichaelCrawford · 2009-09-16 17:34 · Score: 1

Quite commonly backups are done by copying an entire filesystem, and then doing incremental backups of just the files that have changed.

I'm very concerned about just being able to find the particular file that I need, so I have my backups organized by topic - on each of my backup filesystems, there is a directory for my financial data, for my source code, for each of my websites and so on.

In each directory I put a bzip2ed tarball named for the date - for example "OggFrog_SVN_2009-09-16.tar.bz2". Most of my files compress quite a bit so I don't need to worry yet about running out of space.

The stuff that doesn't compress well mainly consists of media that is already compressed - audio files, my digital photos and so on. I tend not to keep infinite backups of that stuff, but just the latest copy.

It was quite a chore to get it all organized, as to make it work I had to reorganize the file structure that the backups came from, so that it would be easy to create each topic backup. But now that I have it all organized it is quite easy to deal with, and it's easy to find old files on my backups.
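The per-topic, date-named bzip2 tarballs described above can be produced with Python's standard `tarfile` module. This is a minimal sketch that assumes one source directory per topic. The function name and the example paths are hypothetical.

```python
import tarfile
from datetime import date
from pathlib import Path


def topic_backup(source_dir, backup_root, topic, today=None):
    """Write backup_root/<topic>/<topic>_YYYY-MM-DD.tar.bz2 containing source_dir."""
    today = today or date.today()
    out_dir = Path(backup_root) / topic
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"{topic}_{today.isoformat()}.tar.bz2"
    with tarfile.open(str(archive), "w:bz2") as tar:
        # Store the tree under its own directory name rather than its full path.
        tar.add(str(source_dir), arcname=Path(source_dir).name)
    return archive


# Hypothetical usage, mirroring the naming in the comment above:
# topic_backup("/home/me/svn/oggfrog", "/backup1", "OggFrog_SVN")
```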

--
Request your free CD of my piano music.

Re:rdiff-backup: like rsync with versioning by Anonymous Coward · 2009-09-16 17:37 · Score: 0

nothing, on cp: illegal option -- l

Re:rdiff-backup: like rsync with versioning by M.+Baranczak · 2009-09-16 17:38 · Score: 3, Insightful

Since we're talking about Final Cut data, it's safe to assume that it's all coming from Macs. The version of cp on Mac OS doesn't take either of those options, so it's a moot point.

Time Machine is probably the way to go. It's integrated into Mac OS, and it's ridiculously easy to set up. I don't know how it scales up, but I'd be very surprised if it couldn't handle 12TB.

Avamar by mysidia · 2009-09-16 17:41 · Score: 1

Although it appears they were bought by EMC... hrm.

Deduplication can help you reduce the size requirements on the backup server.

If buying new capacity, you should probably think about buying a backup server that can be expanded to have more capacity than your existing server, depending on current server usage.

Plan for a few years down the road, when it becomes necessary to expand capacity of the main server, backup more servers... or more likely: store multiple old versions of files that changed over time.

Normally, if you have a 500 MB video file and someone makes some edits to it and re-saves it, there are now going to be two 'files' in the backup repository for a time: the old version and the new version with the edits (twice the space usage).

So storage requirements on the backup server can actually be much more than storage requirements on the server being backed up.
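Deduplication avoids the "twice the space" case by storing repeated pieces of data only once. This is a toy fixed-size-chunk sketch, not Avamar's method (commercial products use their own, more sophisticated chunking). Fixed-size chunks only help when an edit doesn't shift the rest of the file. The class name and the filler data are hypothetical.

```python
import hashlib


class ChunkStore:
    """Toy deduplicating store: data is cut into fixed-size chunks and each
    distinct chunk is kept once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, stored once
        self.files = {}   # (name, version) -> ordered list of chunk digests

    def put(self, name, version, data):
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            piece = data[start:start + self.chunk_size]
            digest = hashlib.sha256(piece).hexdigest()
            self.chunks.setdefault(digest, piece)  # skip chunks already stored
            recipe.append(digest)
        self.files[(name, version)] = recipe

    def get(self, name, version):
        return b"".join(self.chunks[d] for d in self.files[(name, version)])

    def stored_bytes(self):
        return sum(len(c) for c in self.chunks.values())


def filler(n_blocks):
    """Deterministic stand-in for video data: concatenated distinct SHA-256 digests."""
    return b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(n_blocks))


v1 = filler(32768)                       # 1 MiB original
v2 = v1[:5000] + b"X" * 100 + v1[5100:]  # 100 bytes edited in place
store = ChunkStore()
store.put("clip.mov", 1, v1)
store.put("clip.mov", 2, v2)
print(store.stored_bytes())  # 1052672: 1 MiB plus one changed 4 KiB chunk
```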

Online backup? by JayAEU · 2009-09-16 17:50 · Score: 1

If online backup is an option, why not try http://www.wuala.com/ ?

Video is different by S-100 · 2009-09-16 18:10 · Score: 1

Your analysis may not work in this case. This is not a backup system for a large number of business/educational users. It's for a relatively small number of video editing stations. One new video project can easily generate hundreds of gigabytes of new data that needs to be backed up. The average daily churn rate may be comparable, but the peak churn could well be many times that.

Digitized video is not usually backed up the same way as conventional files or databases. Raw digitized video files do not change, and get archived once. Completed projects can go through a clip trimming process whereby the unused portions of the clips are trimmed away, making an archive of the entire project more space-efficient. Then, the raw digitized video files can be deleted. After all, the backup for the video are the original tapes themselves, not a computer-digitized version. The backup rules of a general-purpose office system are very different, and much less efficient.

oh by symbolset · 2009-09-16 18:11 · Score: 1

That seems to be working for Google, MSN and Yahoo.

Maybe they're doing something wrong. You should school them up.

--
Help stamp out iliturcy.

Re:oh by afidel · 2009-09-16 18:39 · Score: 1

I guess you have a reading comprehension disorder:
unless you are doing some kind of higher level checks between nodes...I am not aware of any off the shelf solution that accomplishes it.
Yes, if you are Google and have hundreds of millions of dollars a year in really smart people working for you, it is possible to use cheap commodity parts and still have data integrity, but the vast, vast majority of IT shops do NOT have that. If someone is asking Slashdot how to do backups, they do NOT have that ability.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Additional details from tv station by Stormwave0 · 2009-09-16 18:13 · Score: 1

I'm not the asker but I also work in the campus TV station and can provide additional details.

The primary method of storage right now is a RAID 5 based array containing around 6 TB of data. We'll be adding a Drobo Pro in the next few days with an additional 6TB of storage. Together this will serve the 12TB of data on the server located in the studio. The system storing the data now is running CentOS 5.

However, this is not a good method of backup. It just provides redundancy in case a hard drive fails. What we want is an offsite server which will serve as a backup system. The system will be located in a separate building but on the campus network (transfer speeds not an issue).

We want the backup system to be able to store the original 12TB. HOWEVER, it needs to be expandable, or at least have enough space to accommodate additional data over the years. So I'm thinking the original setup could have around 16TB of storage, expandable to 24TB or 32TB without too much extra work. With the transition to HD video we expect approximately 1TB of new data per year, and this will increase over time.

Because we want the system to be expandable we don't think RAID would be ideal. The idea of having to use identical drives feels very limiting. Hence the reason a Drobo Pro is very appealing. However, it just doesn't support the capacity we require beyond the initial studio server. We want to have a version control system which will require additional storage as well. We don't need daily complete backups. Just something like subversion or CVS which will log when changes are made and save them. That way if someone decides to delete all the directories a history will be stored. The snapshots of the versions don't have to be in real time - they can be done daily. If there are no changes in one day then no snapshot will be required. Typically we do a dump of all the data to the studio server once a month from each editing machine. So snapshots would occur approximately once per month. This data is rarely read - maybe once/twice per year and not all of it.

Restoration time is not that important, as long as it takes less than a few days. No application data is being stored; it's just raw project files and video files in directories.

We'd like this to cost between $3000 and $4000 for everything. Obviously, cheaper is preferred.

It's quite easy by Stu101 · 2009-09-16 18:22 · Score: 1

Not being pedantic, but it ain't a lot to back up. Just get a pair of MSA2000s with 1TB SATA disks. Total cost £20,000 inc. tax. Get the MSA2000fc if you can do fibre. Then just get an LTO-4 tape robot (HP, Overland or similar) and do a disk-to-disk-to-tape backup setup. That way you not only have tape backups in case the entire place burns down, but also disk-based backups for a quick restore when someone accidentally deletes a file. Also, with disk-to-disk-to-tape the throughput will be high enough to complete the backups in a short window overnight.

--
http://www.writeitfor.us - Writing IT for the IT generation.

Your backup system MUST support HFS+ by agentofchange · 2009-09-16 18:49 · Score: 1

Your backup system must support the intricacies of HFS+ (the format of Mac hard disks); otherwise you might lose important data in resource forks and extended attributes.
The rsync shipped with Mac OS X up to and including 10.5 doesn't properly back up resource forks and extended attributes. I've heard there are changes in 10.6, but I've not investigated or tested them.
Use the application 'Superduper' to store your Mac files in a 'Sparsebundle'.
A 'Sparsebundle' is a disk image that keeps the HFS+ metadata intact and can live on other file systems (such as your uni's servers). Finder shows it as a single file, although on disk it is a folder of band files.

A very expensive way to recover those tapes by MichaelCrawford · 2009-09-16 18:56 · Score: 1

They used it for the Black Box from the first space shuttle explosion. There is a sort of paint consisting of microscopic magnetic spheres that are black on one side and white on the other. When you paint a tape with this, you can see the magnetic patterns on it. You would then take high-res digital photos of it, and recover the data from the photos. It worked for the space shuttle tapes, but as you can imagine it is very expensive to do.

--
Request your free CD of my piano music.

Re:A very expensive way to recover those tapes by petermgreen · 2009-09-17 01:28 · Score: 1

And it probably gets a lot harder with increasing storage density.
Though the GP doesn't seem to have tried that hard to recover the tapes (maybe because he found the cronjob backups). A motor swap and/or trying to recalibrate a good drive to match the bad drive would seem like the obvious steps to take.

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register

gshegosh by gshegosh · 2009-09-16 18:59 · Score: 1

Use rdiff-backup. It makes a mirror copy of the latest version and also stores older backups as increments. It's very fast and stable.

FreeNAS by dr_dex · 2009-09-16 19:00 · Score: 1

The newest FreeNAS RC has support for ZFS and is ideal for backup purposes. The checksumming in ZFS also lets you sleep well at night, knowing that silent bit corruption isn't eating your data. And it has built-in support for rsync.
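ZFS keeps these checksums per block inside the filesystem and verifies them on every read and during a scrub. On a filesystem without that, a rough userspace stand-in is a manifest of file hashes that you re-check against the backup copy now and then. Here is a minimal Python sketch of that idea. The function names are hypothetical, and it is not how ZFS itself works:

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large video files never sit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def find_corrupted(root, manifest):
    """Return files that are missing or whose digest no longer matches."""
    bad = []
    for rel, digest in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.exists(full) or sha256_of(full) != digest:
            bad.append(rel)
    return sorted(bad)
```

Unlike ZFS, this only detects damage. You still need a second good copy to repair it.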

--
Robin Smidsrod Certified Linux Administrator

QNAP TS-809 by Ptur · 2009-09-16 19:31 · Score: 1

Just put a NAS like the QNAP TS-809 (8 drives) at a remote location. It talks rsync, and that's all you need. It's available as a standalone unit, as a rackmount, and with redundant PSUs, and is more affordable than a big RAID box.

Do the first rsync with the NAS next to your server, move it offsite afterwards ;)

rsnapshot, rdiff-backup, duplicity by Cato · 2009-09-16 19:54 · Score: 1

These are all built on rsync or its delta algorithm (rdiff-backup and duplicity use the librsync library), and they turn it into a real backup tool by storing multiple versions of your files. The challenge will be the very large video files, but if you only write these once, they are a good option.

Rsnapshot uses hard links combined with rsync --delete - rather than actually delete an old copy of a file, it unlinks it, and when there are no changes in a file, it simply creates a link to it under the current snapshot. It's not as space efficient as DAR but your big files are probably already compressed, and perhaps don't change day to day, so it may be a good option here. There are many other rsync-based tools that are equivalent, but this is easy to configure and has an active community. It only does 'pull backups' i.e. the backup server logs into the system to back up, and if you are doing full backups of the whole system, it needs a root login.
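rsnapshot gets this effect by combining rsync with hard links. The Python sketch below only shows the resulting snapshot layout. It compares file contents directly rather than using rsync's size-and-mtime check, and the function and directory names are illustrative:

```python
import filecmp
import os
import shutil

def snapshot(source, previous, target):
    """Make target look like a full copy of source.

    Files identical to those in the previous snapshot are hard-linked to
    it (costing only a directory entry); new or changed files are copied.
    Files deleted from source simply don't appear in the new snapshot.
    """
    for dirpath, _dirs, files in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: share the existing inode
            else:
                shutil.copy2(src, dst)   # new or modified: store a real copy
```

Each snapshot directory can be browsed with ordinary tools. Deleting the oldest one only frees the files no newer snapshot still links to.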

rdiff-backup stores block-level incremental diffs within its repository - it apparently has issues dealing with extremely large files so I'd be sure to test this. However it doesn't need a root login on the system you are backing up, and is more space efficient than rsnapshot. It has some level of checksumming to help detect corrupt backup files.
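Real rdiff-backup computes librsync deltas and keeps them under its rdiff-backup-data directory. The toy Python class below only illustrates the underlying layout: a full mirror of the newest version plus reverse increments for stepping back to older ones. It uses fixed-size blocks, and the class and method names are made up:

```python
BLOCK = 4  # tiny block size so the example is easy to follow

def blocks(data, size=BLOCK):
    return [data[i:i + size] for i in range(0, len(data), size)]

class ReverseIncrementStore:
    """Keep the newest version whole, plus reverse deltas to older ones."""

    def __init__(self, first_version: bytes):
        self.mirror = first_version
        self.increments = []  # newest first: (old_length, {index: old_block})

    def backup(self, new_version: bytes):
        old, new = blocks(self.mirror), blocks(new_version)
        # Save only the old blocks that differ, so we can step back later.
        delta = {i: blk for i, blk in enumerate(old)
                 if i >= len(new) or new[i] != blk}
        self.increments.insert(0, (len(self.mirror), delta))
        self.mirror = new_version

    def restore(self, steps_back: int) -> bytes:
        """Rebuild the version from `steps_back` backups ago."""
        data = self.mirror
        for old_len, delta in self.increments[:steps_back]:
            cur = blocks(data)
            n_blocks = (old_len + BLOCK - 1) // BLOCK
            data = b"".join(delta[i] if i in delta else cur[i]
                            for i in range(n_blocks))[:old_len]
        return data
```

Restoring the latest version is free. Each step further back applies one more increment, which is why old restores get slower as history grows.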

duplicity is a bit like rdiff-backup, but it also encrypts the backup archives (using GnuPG), if that's important.

BackupPC is more work to set up and is really designed to back up a large number of client PCs, but it does provide more features and can use rsync as its network transport. It does de-duplication across the client PCs, which is good for full system backups. However, as with all these tools, Windows backups mean you have to mess about with volume shadow copies (VSS).

If you don't want rsync for some reason (despite it being insanely fast and network efficient) you could use DAR - like tar only much better for disk to disk backups, as you can extract a few files efficiently without reading the entire archive from start to finish (as tar requires). Also it does more granular compression and checksums, so if you lose some blocks due to disk corruption in the middle of the archive, most of the files can still be recovered. The rsync tools have the same granularity to some degree, but don't normally do any compression.

The world really, really needs a guide to the major categories of backup tools, pointing people to the right type of tool based on their requirements...

Warning! Cloning/mirroring is not backup by Kotten · 2009-09-16 20:02 · Score: 1

Most people seem to be concentrating on describing how to duplicate the server. This is great for availability but does not protect against mistakes or sabotage: corruption on the main server spreads too easily into the clone.

Use a real backup tool (backup2l is great) and THEN use rsync to copy (not mirror/delete) the backup to a secondary server.
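Plain rsync already behaves this way as long as you leave out --delete. To make the distinction concrete, here is a Python sketch of a copy that only adds or updates files on the secondary and never removes anything. It protects against deletions on the primary, not against a corrupted file overwriting a good copy. That is why the comment puts a versioning backup tool in front of it. The function name is hypothetical:

```python
import os
import shutil

def copy_without_delete(src_root, dst_root):
    """Copy new or changed files from src_root to dst_root; never delete.

    Unlike a mirror (e.g. rsync --delete), a file removed on the primary
    stays on the secondary. Returns the relative paths that were copied.
    """
    copied = []
    for dirpath, _dirs, files in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        os.makedirs(os.path.join(dst_root, rel_dir), exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, rel_dir, name)
            st = os.stat(src)
            if os.path.exists(dst):
                dt = os.stat(dst)
                # Same size and mtime: assume unchanged (rsync's default check).
                if dt.st_size == st.st_size and int(dt.st_mtime) == int(st.st_mtime):
                    continue
            shutil.copy2(src, dst)  # copy2 keeps mtime for the next comparison
            copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return copied
```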

The secondary server should be locked down with no public services running to minimize the risk of somebody hacking both machines and sabotaging your backup.

--
Note to self: Make a sig

RsyncBackup by Alain+Williams · 2009-09-16 21:06 · Score: 1

This is something that I wrote and use myself and for my customers. It is easy to set up and use.

The backups on the archive server appear as complete copies of directories of the backed up machines. There will appear to be one complete backup for each day - this lets you find/restore a consistent set of files from a particular day.

The script cleverly avoids copying files that have not changed. It economises on disk use by only keeping one copy of each file - but makes that one copy appear in the various daily archives.

The idea is that one central archive server initiates backups on several other machines.

This script works well where you have many files that do not change from day to day, e.g. word processing documents. It is not so good where most of your files change frequently, but it will still work.

GPLed, get it from: http://www.phcomp.co.uk/Packages/RsyncBackup.html

OpenSolaris & ZFS by Anonymous Coward · 2009-09-16 21:28 · Score: 0

ZFS snapshots & "zfs send." Nothing is easier and cheaper than OpenSolaris' ZFS for this kind of thing.

What does the campus use to back up? by KYPackrat · 2009-09-16 22:22 · Score: 1

You don't give us a hint about your campus, but I'm sure that they have some form of backup system. Tivoli Storage Manager is big, bulky, and contrary, but in the hands of a paranoid pessimist (like our campus's TSM admins), it handles huge amounts of data and handles multiple copies all over creation. The systems I admin (AIX, SAP, DB2) regularly push 2T a day directly to an 8 drive LTO-3 library, while others are backing up to the 12 drive IBM "Jaguar" library in a different part of campus.

Check out main campus IT. At worst, you might have to buy them some LTO tapes or pay a per-meg fee, but you'll probably find a well-designed system that you don't have to maintain.

(If you do use TSM with Macs, go to the 6.1 client; it's a LOT better than 5.3 on the Mac. Also, run the first backup by hand, since the client sometimes has memory consumption issues on the first backup.)

Re:rdiff-backup: like rsync with versioning by buchner.johannes · 2009-09-16 22:41 · Score: 1

Yes, it makes copying much faster

--
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.

thing of the past by Danzigism · 2009-09-16 23:28 · Score: 1

I thoroughly enjoyed my days as an rsync and Unison user. However, image-based backups make the most sense nowadays. The only thing that sucks is that I've had a pretty hard time finding anything image-based for Linux besides Acronis. If anyone is familiar with StorageCraft's ShadowProtect, this software is making huge strides in the enterprise. The server edition works wonders, as does the desktop edition. If your server dies, you have full and incremental backups that can be restored to a machine with completely different hardware, without any bluescreens, thanks to the hardware-independent restore feature. The server edition automates all of this, as some of you may have seen with Zenith's BDR appliance, which virtualizes your entire server if it ever goes down; it too runs on ShadowProtect. I'm sorry to say that this software isn't available for Linux yet, but I'd like to see how Acronis works. If you can restore images to a virtualized environment, with a backup and disaster recovery appliance that not only backs up in-house but also uploads incremental images to a data center somewhere every hour/day/week, then you're talking some serious stuff, worth the money you pay for it if you need fast recovery times. I love rsync and Unison, and I'm sure you could still use a similar method, but I'd rather see it used with image-based backups than with .gz-compressed backups.

Oh yeah, and I forgot to mention that most of my clients' servers, with 200GB or more of company data, take on average 30 minutes to perform a full backup to a SATA-based RD1000. The image is nicely compressed and can be restored at about the same speed. If you're backing up to a BDR appliance with virtualization failover, then you're up in a few minutes. If you're restoring an image to a different server, give it about 30-40 minutes and you're back in business. Time is money, and avoiding downtime is crucial. Maybe not for a college TV station, though :-)

--
*plays the Apogee theme song music*

Try this by Anonymous Coward · 2009-09-17 00:01 · Score: 0

http://www.tedial.com/en/productos_en.html

Re:rdiff-backup: like rsync with versioning by samjam · 2009-09-17 00:35 · Score: 1

Multiple times I have lost my entire rdiff-backup backup when the client didn't have exactly the same version of rdiff-backup as the server.

My advice is to just do a 3-way rsync.

That way you can find and restore your backed-up files without any special tools, reducing the risk of the backup tool trashing either your data or the backups.

Hard links used in a 3-way rsync (the 3rd way being a reference to the last non-incremental backup) mean you save space for unchanged files.

You won't get the space saving when large files change slightly, but that is more than made up for by not getting the space back when rdiff-backup deletes all your data.
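I'm reading "3-way rsync" as rsync's --link-dest option: the source, the new snapshot directory, and the previous snapshot as the reference for hard links. Here is a small Python sketch that builds such a command. The paths and the date-named directory layout are hypothetical:

```python
from datetime import date

def rsync_snapshot_cmd(source, backup_root, today, previous=None):
    """Build an rsync argv that writes today's snapshot into a dated directory.

    With --link-dest, files identical to those in the previous snapshot are
    hard-linked rather than copied, so every snapshot is a plain directory
    tree you can browse and restore from with ordinary tools.
    """
    cmd = ["rsync", "-a"]
    if previous is not None:
        cmd.append(f"--link-dest={backup_root}/{previous.isoformat()}")
    cmd += [source.rstrip("/") + "/", f"{backup_root}/{today.isoformat()}/"]
    return cmd

cmd = rsync_snapshot_cmd("/srv/studio", "/backup", date(2009, 9, 17), date(2009, 9, 16))
print(" ".join(cmd))
# rsync -a --link-dest=/backup/2009-09-16 /srv/studio/ /backup/2009-09-17/
# To actually run it: subprocess.run(cmd, check=True)
```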

--
blog.sam.liddicott.com

Local Backups and Remote DR ... by Anonymous Coward · 2009-09-17 00:43 · Score: 0

Completely free solution. I've been carefully looking at this for about a year. In the end, we went with:

Server 1: Xen Host running Ubuntu-64 Server with Ubuntu Xen clients
- Virtualization - this is key. We use Xen, happily, thank you.
- 6 production VM servers on a single physical, relatively small, host.

Server 2: Local Backup Server running OpenSolaris
- ZFS file system in a RAIDz2 config; this is critical
- receives rdiff-backups of Xen images, opened by the host during rdiff-backups. The rdiff-backup is performed during nightly maintenance windows in just a few minutes for each VM.

Server 3: offsite, DR storage running OpenSolaris
- receives rsync images of srv2 rdiff-backup folders.
- ZFS file system in RAIDz2 config.
- May switch to zfs send mirrors in the future, but we are happy with rsync.

Our CRM system takes less than 3 minutes to backup. Our email system takes about 4 minutes of downtime to backup. These are complete OS, Data, DB, and application rdiff-backups.

We retain 30 days of incrementals in the rdiff-backup local storage for each server. Complete recovery has been used 3 times this year. Flawless. Under 20 minutes from decision to restore to the apps being available.

With rdiff-backup, a 6GB image plus 30 days of diffs takes only 7GB of storage. This rocks.

Virtualization is critical so the images can be restored anywhere that Xen on 64-bit X86 is available.

We also rdiff-backup the xen-serverX.cfg files and all the custom scripts used with this solution. These are retained for 90 days.

Re:rdiff-backup: like rsync with versioning by tayhimself · 2009-09-17 00:54 · Score: 1

Another vote against rdiff-backup. I have had it die on very large directories with many small files with perl overflow errors. Google for rdiff-backup errors and you will find a wealth of information.
Between Ubuntu 6.06 and 8.04 the rdiff-backup protocol changed and there was no way to get the new rdiff-backup talking to the old one. No switch to change protocol etc.
Bacula is definitely superior, but nothing beats a commercial solution if you have the money and need disaster recovery bare metal restore.

simple, straight forward by Anonymous Coward · 2009-09-17 00:59 · Score: 0

Since you're running Final Cut, the simplest, most straightforward way of doing backup (especially if you have gigabit Ethernet to the remote site, holy cow) is to get another Mac. Two suggestions:
1. If you have the funds, buy an Xserve and attach an XRAID es disk array from Active Storage Inc. to it. You can configure that anywhere from 4 to 16TB.
2. Since it's just remote backup and you don't really need a performance system for that, you could do it on the cheap by buying a Mac mini, attaching one or two LaCie 4big Quadra cubes (also 4-16TB) to it via FireWire 800, and using OS X's Time Machine (if you can configure it to back up a remote volume) or Carbon Copy Cloner as the backup tool.
Relatively inexpensive, simple, no administrative headaches, done.

rsync + hard links for versioning by randallman · 2009-09-17 01:18 · Score: 1

Using rsync with hard links lets you version your backups with good space efficiency and a simple structure.

http://www.mikerubel.org/computers/rsync_snapshots/

Say you want a snapshot for each of 30 days. You'll end up with a directory for each day. If you started with 12TB and 1TB changed, your backups for 30 days combined will be 13TB. Plus there are no funky metadata formats.

If you use ZFS, don't forget --inplace by ttsiod · 2009-09-17 01:37 · Score: 2, Insightful

I also use rsync and OpenSolaris/ZFS to keep daily backups. BUT, and this is important: if the content is made of big files that change slightly each day (e.g. VMware/VirtualBox disk images), make sure you also use "--inplace" when you run rsync, so that you take advantage of the copy-on-write semantics of ZFS.

For example, I am using rsync to back up a VMware server to an OpenSolaris/ZFS fileserver, where the virtual disks are huge "vmdk" files, on the order of 10GB each. These files change only a little each day (less than 1%). rsync would indeed realize this and only send the changed parts over the network, but it would still store a completely new copy of each file on the backup server every day (I am assuming here that you ZFS-snapshot each day). If instead you use rsync's --inplace option, rsync will not only send just the blocks that changed, it will also write just those blocks. Thus ZFS will be able to host many years' worth of daily snapshots of these "vmdk" files, a truly marvelous thing, if you think about it...
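This is not rsync's code. It is a Python sketch of why writing in place matters on a copy-on-write filesystem: it compares the destination block by block and rewrites only the blocks that differ. The 128 KiB default block size matches ZFS's default recordsize, and the function name is made up:

```python
BLOCK_SIZE = 128 * 1024  # ZFS's default recordsize is 128 KiB

def update_in_place(dest_path, new_data, block_size=BLOCK_SIZE):
    """Overwrite only the blocks of dest_path that differ from new_data.

    Returns the number of blocks written. Writing in place, rather than
    building a new temporary file and renaming it (rsync's default), leaves
    unchanged blocks untouched, so a copy-on-write filesystem can keep
    sharing them with earlier snapshots.
    """
    written = 0
    with open(dest_path, "r+b") as f:
        for offset in range(0, len(new_data), block_size):
            new_block = new_data[offset:offset + block_size]
            f.seek(offset)
            if f.read(len(new_block)) != new_block:
                f.seek(offset)
                f.write(new_block)
                written += 1
        f.truncate(len(new_data))  # handle files that shrank
    return written
```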

Final Cut Server Archive Device by Nick+P-T · 2009-09-17 01:43 · Score: 1

Hi, we have a plugin for FCSvr that allows you to archive entire productions (or single assets, for that matter) in a single click to the MatrixStore solution, which is a redundant disk-based archive... Let me know if you want to know more. N

Solaris + ZFS + rsync + a bunch of hard drives by gh5046 · 2009-09-17 02:17 · Score: 0

Before reading please note:

1. I've been up for over 24 hours, my brain may not be operating at its best.
2. I personally have not attempted anything like this, but I think I know enough that it should be do-able.

If I make any glaring mistakes, please feel free to point them out and make fun of them wholeheartedly.

I'm going to assume the following:

1. Recovery time isn't a huge concern.
2. You or someone that works for you is willing and capable to build it.
3. You want, or would like, point-in-time recovery abilities.
4. You don't have a lot of money to spend.

Buy a case that can fit as many hard drives as possible. For example, this case can take up to twelve 3.5" drives (I do not work for Newegg):

http://www.newegg.com/Product/Product.aspx?Item=N82E16811103029

Get a lot of large hard drives, preferably SATA. If you get a case that can take ten to twelve drives, get 1.5TB drives (~14TB usable space) or 2TB drives (~18TB usable space).
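The post doesn't state the redundancy layout, but the ~14TB and ~18TB figures fit twelve drives in a double-parity RAID-Z2 pool, reported in binary units. Here is a quick check in Python under those assumptions:

```python
def usable_tib(drives, drive_tb, parity=2):
    """Approximate usable space of a single RAID-Z vdev, in TiB.

    Drive makers quote decimal terabytes (10**12 bytes), while most tools
    report binary TiB (2**40 bytes). ZFS metadata overhead is ignored.
    """
    data_bytes = (drives - parity) * drive_tb * 10**12
    return data_bytes / 2**40

print(round(usable_tib(12, 1.5), 1))  # 13.6
print(round(usable_tib(12, 2.0), 1))  # 18.2
```

With single parity (parity=1) you would gain one drive's worth of space but survive only one disk failure.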

If you have to use a smaller case you'll need to build more than one system.

Get everything else to fill up your case: (motherboard, CPU(s), SATA cards, lots of RAM, gig-e network card, and a power supply).

Install Solaris and give all of the disks to ZFS.

Use rsync to copy the data to your newly built box to create your initial backup, then create a snapshot using ZFS.

For each subsequent backup, use the --delete option when running rsync, then create a snapshot using ZFS. (Ta-da, you have point-in-time recovery capability!)
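The rsync-then-snapshot cycle could be driven from cron by something like this Python sketch. The source path, the dataset name, its mount point, and the date-based snapshot names are hypothetical; `zfs snapshot dataset@name` is the standard command:

```python
import subprocess
from datetime import datetime

def nightly_backup(source, dataset, mountpoint, run=subprocess.run, now=None):
    """Mirror source into a ZFS dataset, then snapshot that night's state.

    rsync --delete keeps the dataset an exact mirror of the source; the
    snapshot preserves each night's state, so files deleted or overwritten
    later can still be recovered from an older snapshot.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d")
    commands = [
        ["rsync", "-a", "--delete", source.rstrip("/") + "/", mountpoint.rstrip("/") + "/"],
        ["zfs", "snapshot", f"{dataset}@{stamp}"],
    ]
    for cmd in commands:
        run(cmd, check=True)  # raises on failure, so a failed rsync is never snapshotted
    return commands
```

Usage would be something like `nightly_backup("/srv/studio", "tank/backup", "/tank/backup")`. The `run` parameter exists only so the commands can be inspected without executing them.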

Depending on how thrifty you can be, and not considering the labour to build and test it, this setup could cost you as little as $4k USD at current prices.

If Solaris x86 supports it, I recommend getting a motherboard or SATA cards that support hot swapping and a case with front loading bays. Being able to replace failed drives (which will happen) is a nice thing.

Beyond this, when your storage requirements go beyond this first build you can just build another box or throw in some eSATA cards and connect some external drives to expand your ZFS pool(s).

What you want is ZFS by Mysticalfruit · 2009-09-17 02:28 · Score: 2, Insightful

Even though I'm writing this from a linux box, if you're going to be storing that much data and you want to do it cheaply, you should really look at ZFS as the filesystem of choice for the backend.

As for moving the data over there, sure use rsync and then use zfs's snapshot features so you have some rollback capability.

Why ZFS? I'm envisioning that you're going to need a mid-range machine (duel power supplies) with a whole pile of JBOD hanging off it. You could spend the money on something that does hardware-based RAID, but if you're cost-conscious, your best route is to buy a JBOD box and fill it with 1.5TB disks. You could try to manage all of this with LVM and possibly XFS, but it would be a nightmare. ZFS basically rolls RAID/LVM/FS into a single layer, so adding disks to your array becomes trivial. Also, I would recommend that each user/application get its own sub-filesystem on the array; that way you'll have much finer granularity for snapshots/quotas/etc.

I didn't intend this post to be an advertisement for ZFS, but I have such a setup with ~14TB of disk on it right now and it works great. As for the OS on top, you could go with OpenSolaris, or netezza (which is just Debian rolled on top of the OpenSolaris kernel).

--
Yes Francis, the world has gone crazy.

Re:What you want is ZFS by evilviper · 2009-09-17 16:54 · Score: 1

As for the OS on top, you could go with opensolaris, or netezza (which is just debian rolled ontop of the opensolaris kernel.
FreeBSD has full, stable ZFS support as well.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:What you want is ZFS by Anonymous Coward · 2009-09-20 11:12 · Score: 0

Isn't it Nexenta, not Netezza?

NETGEAR ReadyNAS 3200 by perlionex · 2009-09-17 02:32 · Score: 1

http://www.netgear.com/Products/Storage/ReadyNAS3200/RN12P0610.aspx It's a 2U, 12 SATA-disk server. You could load it with 1TB drives for 12TB. The software's pretty good (based on Linux) and constantly being updated.

--
Gan Family Homepage

This is too easy. by Khyber · 2009-09-17 02:39 · Score: 1

Just run a fiber link to another office on campus, put a NAS device there, send all archived data there. Pull a drive when it's full, drop a new one in, seal removed drive up in a safe room in the Dean's office.

That took all of 15 seconds of thought.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

Re:This is too easy. by Bourbonium · 2009-09-17 04:38 · Score: 1

That's not really offsite backup. What if your campus is Tulane University in New Orleans?
Give this concept a little bit more than 15 seconds of thought, mmmm-kay?
Re:This is too easy. by Khyber · 2009-09-18 19:36 · Score: 1

Learn what a black box is. It's a sealed drive that can withstand a full-out plane crash. It ain't hard to put a hard drive in a black box designed for storing hard drives. It doesn't need to be offsite if you have the proper container. It'll take worse than Katrina and every gas main in New Orleans exploding to fuck the drive up.

Re:rdiff-backup: like rsync with versioning by spinkham · 2009-09-17 02:45 · Score: 1

Does it store diffs of large files with small changes, instead of storing the whole file? If you have a 2 TB file with a small 1K metadata change in it, your solution will take 4TB, and rdiff-backup will take 2TB + 1K + a few more K for dir overhead.

rdiff-backup is a huge win if you have large files with small changes, as is often the case with virtual machines.
Otherwise, BackupPC, Bacula, or other simple link-based replication and de-duplication would be better.

--
Blessed are the pessimists, for they have made backups.

Re:rdiff-backup: like rsync with versioning by pla · 2009-09-17 02:52 · Score: 1

Do you know what -l does?

Yeah - It makes a hardlink to the file in question rather than actually copying it.

This takes advantage of the normal behavior of rsync (unless you explicitly tell it otherwise), where it writes to a temporary file before moving that file in place of the original, which, in the case of a hardlink, breaks the link rather than overwriting the original file.

So you effectively end up with a "snapshot" of any files that did change, and no wasted space (beyond the inode entry) for those that didn't (you can prove this to yourself fairly quickly, if you have doubts).

Incidentally, I agree that using FS-level differential snapshotting provides a much more elegant solution... But personally, I've had problems with LVM, and ZFS doesn't come stock on any older Linux distros (and, as far as I know, none of the filesystems that do come standard support snapshotting). ext2 has supported hardlinks back into the days of antiquity, however, so the "cp -al" trick will work on just about any Linux box you touch.
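The link-breaking behavior is easy to demonstrate. This Python sketch creates a hard link (the `cp -al` step), then replaces the live file the way rsync does by default: it writes a temporary file and renames it over the original. The temporary file name is illustrative, not rsync's exact naming:

```python
import os
import tempfile

def demo_link_breaking():
    """Show that replace-by-rename breaks a hard link, leaving the
    snapshot's copy of the old contents untouched."""
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "live.txt")
        snap = os.path.join(d, "snapshot.txt")
        with open(live, "w") as f:
            f.write("version 1")
        os.link(live, snap)                     # the "cp -al" step
        shared = os.stat(live).st_nlink         # 2: one inode, two names

        tmp = os.path.join(d, ".live.txt.tmp")  # rsync-style temporary file
        with open(tmp, "w") as f:
            f.write("version 2")
        os.replace(tmp, live)                   # rename over the old name

        with open(snap) as f:
            old_text = f.read()
        return shared, os.stat(live).st_nlink, old_text

print(demo_link_breaking())  # (2, 1, 'version 1')
```

Had the new contents been written into `live.txt` directly (as rsync's --inplace does), the snapshot would have changed too, because both names point at the same inode.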

Try Storix by bigredradio · 2009-09-17 03:00 · Score: 1

If you decide on a commercial solution, Storix is a good choice. ;-) www.storix.com

--
Flexible bare-metal recovery for Linux/UNIX

One solution total backup by Archangel+Michael · 2009-09-17 03:12 · Score: 2, Insightful

VMWare Snapshots

Are you backing up just data, or configurations or what? Backup Solutions are nice and all, but you're still missing something .... all the crap^H^H^H^H configurations that you've collected over the years of using that particular setup.

And once you go to VMWARE (or other VM product) you'll quickly realize that the abstraction away from specific Hardware is very nice indeed.

However, if one is REALLY concerned about backups, a duplicate hardware setup in a separate location sitting idle (or cold) is a necessity. And having a VMware snapshot ready to load on backup hardware is just tits when things REALLY go south. You end up looking like a genius, and get to play Scotty (over-engineered everything).

The difference between amateurs and professionals shows not when things are going well, but when the shit hits the fan. A weekend geek can build the $8000 backup server (or whatever amount of storage), but once the drives start to fail (and they will), that solution starts to REALLY suck, because you can't get to the freaking drives easily (and I doubt it will even tell you that a drive failed).

Let me just say it this way, if you can't afford "over engineered" equipment, you can't afford to do it right.

So, VMware, snapshots and spare hardware offsite are the way to go. Anything less these days is simply weekend geek pride.

--
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.

Re:rdiff-backup: like rsync with versioning by emj · 2009-09-17 03:34 · Score: 1

And from what I have heard, Time Machine has led to data loss; ask someone who has tried to recover a backup from Time Machine.

R1Soft by Jradw · 2009-09-17 03:35 · Score: 1

We use R1Soft for backups. It does block-based backups to disk, and we have found it much faster and lighter on CPU than rsync. If you have a lot of files that are continuously changing, then R1Soft is perfect, because it only backs up blocks that have changed since the previous backup. If you have a 5GB file and you make a small change to it, you only back up that small change; there's no need to back up the entire 5GB file again! It really simplifies backup and recovery, and also has bare-metal recovery features that can save your ass when a server goes down. It can also be used on both Windows and Linux servers, giving you one solution for both operating systems. R1Soft has saved us so much time handling our backups; it's definitely worth a look for anyone serious about backing up a server.

VTL by Anonymous Coward · 2009-09-17 03:39 · Score: 0

I'd go with a virtual tape library (VTL).

consider survivability by swschrad · 2009-09-17 03:50 · Score: 1

I made a whole office of IT manager types blanch one day after carrying tapes to the admin building by asking,

"say, you know those jets that fly over every 20 minutes? do you think that we'd have any backups left if one crashed here on the library and skidded forward?"

not that they did anything about it, mind you, but I'll bet they all updated their resumes.

and it's a good plan to prepare offsite video servers/storage. KREX TV burned out early this Spring in Colorado. while it's one way to update your plant quickly, it sucks for business continuity.

--
if this is supposed to be a new economy, how come they still want my old fashioned money?

Don't forget Bacula (enterprise grade backups) by apexwm · 2009-09-17 04:29 · Score: 1

As many have already suggested, rsync is a great utility for backing up files. It's extremely efficient and makes synchronizing large files a breeze. However, it's command-line based, so you would have to set up a script to run it. Another alternative that I have found very useful is a product called Bacula. It's an enterprise-grade application, complete with a server component, a workstation component, and an administration console. It basically stands up to commercial enterprise products like Backup Exec. However, it is somewhat challenging to set up, as there are a LOT of options, and its console is text-based. There is a GNOME front-end, but it does not expose all of Bacula's features. The upside is that it can be configured to do pretty much whatever you want, can be administered remotely, and is very powerful. It can even be used for complete disaster recovery.

Enterprise Backup at SMB costs by Bourbonium · 2009-09-17 04:32 · Score: 1

I haven't seen anyone in this thread mention SymForm http://www.symform.com/, which may well be an ideal solution for your situation. This is a fairly new startup operation founded by former Microsoft and Amazon engineers that manages a cooperative cloud backup platform. You'll need to do some reading of the whitepapers on their website to wrap your brain around the concept, but the gist of the idea is that you configure your spare storage device (like your Drobo box) to form a node that connects to the cooperative cloud, which is comprised of free disk space on the spare storage devices (SAN, NAS, external SATA drives, etc.) of the other members of the cloud. With 5,000-10,000 other nodes sharing exabytes of free disk space, there is plenty of capacity for all the members of the cooperative, and as the cloud is distributed worldwide, there is no single point of failure to worry about. The data is fragmented in such a way that it is distributed randomly across multiple nodes (in a system they call RAID-96) so that no single node in the network contains a complete copy of your data. You pay a flat monthly fee to join the cloud, and your data is encrypted by your node and backed up incrementally over your network connection. It may take a while to get your first full backup transmitted, but after that, the bandwidth is used only for deltas. It's kind of a brilliant idea that blew me away the first time I heard about it.

TSM ... by Anonymous Coward · 2009-09-17 05:30 · Score: 0

My institution uses TSM (Tivoli Storage Manager) with many huge tape libraries and racks of disk storage. It seems to work extremely well. I imagine it is very expensive.

Re:redundancy reduces the reliability? by Dare+nMc · 2009-09-17 06:25 · Score: 1

Well, the complexity of this redundancy reduces the reliability overall, and it has a cost.

sort of reminds me of the joke by Mitch Hedberg, the "an escalator can never break, it can only become stairs."
I.e., if PC redundancy is done right (for example, 6 smaller drives instead of 2 drives, and 2 controllers instead of 1), it will have some hardware failures more often than the simpler system. However, after the first hardware failure it essentially becomes the simpler system without the redundancy, until you fix it.
I.e., if the main failure component is hard disks, with redundancy you may have 3× the number of (smaller) drives, and you are then roughly 3× more likely to have some drive failure than the single-drive system (say, over a 3-year span). So while you may have a 10% chance of failure in 3 years with the single-drive system, you may have a 1-(0.9*0.9*0.9) = 27% chance of a single drive failure in the redundant system. But you have only a 5.4% (0.1*0.27+0.1*0.27) chance of having 2 hard drive failures in the 3-drive system in 3 years, instead of the 10% chance in the single-drive non-redundant case (add a hot spare and you're under a 1% redundant failure rate).
So even if you weren't allowed to fix the redundant system, the likelihood of a disk failure downing the system would still be cut at least in half compared with the non-redundant system. A 6-drive redundant setup (i.e. with a hot spare) instead of 2 non-redundant drives works out much better: a 26% chance of non-redundant failure vs. a 1% chance of a triple failure (or a double failure in the same array) in the redundant system...
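Treating drive failures as independent (a simplifying assumption, since drives from the same batch often fail close together), the binomial formula gives exact figures for the 3-drive case. It confirms the 27% above and puts the chance of two or more failures at about 2.8%. That is below the rough 5.4% estimate, which only strengthens the argument for redundancy:

```python
from math import comb

def p_at_least(k, n, p):
    """Probability that at least k of n drives fail, each independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p = 0.10  # the 10%-per-drive-over-3-years figure used above
print(round(p_at_least(1, 3, p), 3))  # 0.271: some drive fails
print(round(p_at_least(2, 3, p), 3))  # 0.028: two or more fail, killing a single-parity array
```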

Cost is a valid issue: increased reliability has a hardware cost, and if the cost of a system crash doesn't outweigh it, then yes, you don't need it.

rsync+zfs by xbytor · 2009-09-17 08:37 · Score: 1

I use rsync to a ZFS file system. A couple of cron jobs to fire off rsync and take ZFS snapshots make for a nice Time Machine-like solution without Time Machine.
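The post doesn't show its cron jobs, so here is a minimal Python sketch of one way to wire the same idea together. It runs rsync into a directory on a ZFS dataset and then takes a timestamped snapshot of that dataset. The paths, the dataset name `tank/backup`, and the `backup-` snapshot prefix are all hypothetical.

```python
from __future__ import annotations

import subprocess
from datetime import datetime


def backup_commands(source: str, dest: str, dataset: str, when: datetime) -> list[list[str]]:
    """Build the rsync and 'zfs snapshot' commands for one backup run."""
    snapshot = f"{dataset}@backup-{when:%Y-%m-%d-%H%M}"
    return [
        # -a keeps permissions and timestamps; --delete mirrors deletions into dest,
        # which is tolerable because earlier states survive in earlier snapshots.
        ["rsync", "-a", "--delete", source, dest],
        ["zfs", "snapshot", snapshot],
    ]


def run_backup(source: str, dest: str, dataset: str, dry_run: bool = True) -> None:
    for cmd in backup_commands(source, dest, dataset, datetime.now()):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises on failure, so a failed rsync never gets snapshotted.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; dest must live inside the ZFS dataset being snapshotted.
    run_backup("/media/projects/", "/tank/backup/projects/", "tank/backup", dry_run=True)
```

Once the printed commands look right, you would switch dry_run to False and schedule the script from cron (for example, nightly). Taking the snapshot only after rsync succeeds keeps half-finished copies out of the snapshot history.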

Duel power supplies? by Anonymous Coward · 2009-09-17 10:58 · Score: 0

I can see those two power-supplies dueling each other "I was here first! ZAP!" "No I was ZAP ZAP"

s/Duel/Dual/g

What we use at work.. by poptix_work · 2009-09-17 11:12 · Score: 1

.. is a Solaris system (for ZFS) and rsync. After each rsync a snapshot is created, giving 45 days of retention (each snapshot is fairly small for us; your data sets may vary). It's extremely fast and not difficult at all to figure out; just make sure you turn off all the unneeded Solaris services (essentially everything but ssh).
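The 45-day retention described above means old snapshots have to be expired on a schedule. The following is a minimal Python sketch of choosing which ones to drop. It assumes a hypothetical naming scheme, `pool/dataset@backup-YYYY-MM-DD...`, and it deliberately skips any name it cannot date instead of deleting it.

```python
from __future__ import annotations

from datetime import date, timedelta

RETENTION_DAYS = 45


def snapshots_to_expire(names: list[str], today: date,
                        keep_days: int = RETENTION_DAYS) -> list[str]:
    """Return the snapshot names taken before the retention cutoff."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        _, _, tag = name.partition("@backup-")
        try:
            taken = date.fromisoformat(tag[:10])  # the YYYY-MM-DD part of the tag
        except ValueError:
            continue  # unknown format: never delete what we can't date
        if taken < cutoff:
            expired.append(name)
    return expired
```

To put this into practice, you would feed it the output of `zfs list -H -t snapshot -o name` and pass each returned name to `zfs destroy`.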

I'd love to be doing this with Linux, but btrfs is not yet stable enough for a production environment.

I do *not* recommend trying to use hard links for incremental backups: unless your files are large (rather than numerous), you'll find that most of your processing time is spent expiring old snapshots.

--
Just because you disagree doesn't make it offtopic or flamebait.

What "real" television stations use by TheSync · 2009-09-17 11:22 · Score: 1

"Real" television stations use LTO tape for video backup, along with a robot tape library like the Quantum Scalar series or the Sun StorageTek system. This is generally operated by a broadcast archive management system such as MassTech MassStore, Gorilla, or Front Porch Digital.

The broadcast archive management system is connected to your television station automation system, so when your automation system needs a certain video file to play back from your server, the archive system begins a transfer from the tape library ahead of time, and the file is on your video playback server before playback begins.
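The systems named above don't publish their scheduling logic here. The following Python sketch is therefore hypothetical and only illustrates the "ahead of time" idea: for each playlist item that isn't already on the playback server, it works back from the air time to the latest moment a tape restore must start. It assumes a fixed restore throughput and a safety margin, and it ignores tape mount/seek time and drive contention.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PlaylistItem:
    clip_id: str
    air_time: datetime
    size_gb: float
    on_server: bool  # already on the playback server?


def restore_deadlines(items: list[PlaylistItem], throughput_gb_per_min: float,
                      margin: timedelta) -> dict[str, datetime]:
    """Latest time each missing clip's tape restore must start to land before air time.

    Assumes a fixed restore throughput and ignores tape mount/seek time and
    contention between drives (both simplifications).
    """
    deadlines = {}
    for item in items:
        if item.on_server:
            continue  # nothing to fetch
        transfer = timedelta(minutes=item.size_gb / throughput_gb_per_min)
        deadlines[item.clip_id] = item.air_time - transfer - margin
    return deadlines
```

An automation system would poll these deadlines and queue restores in order. If it also modeled how many tape drives are free, it could serialize the restores instead of treating each one independently.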

Re:redundancy reduces the reliability? by mysidia · 2009-09-17 11:55 · Score: 1

sort of reminds me of the joke by Mitch Hedberg, the "an escalator can never break, it can only become stairs."

If you don't believe in Byzantine failures, then sure. One way an escalator can break is that it suddenly starts running in the opposite direction, or accelerates to a wild speed. Only manual intervention can stop it, and by the time you do so, someone might have gotten hurt.

IE if PC redundancy is done right, then yes (for example) it might have 6 smaller drives instead of 2 drives, and 2 controllers instead of 1, it will have some hardware failures more often than the simpler system.

The problem is not when 1 controller goes out perfectly. The problem is when 1 controller is disrupted in a way that breaks the other controllers, or breaks in a way that causes corrupt data to be written.

Single redundancy isn't enough. You need either at least 3 copies of your data, or 2 copies on different systems, some very good checksums, and a reliable procedure for validating them.
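One simple form that "good checksums and a reliable validation procedure" could take is a SHA-256 manifest of each copy, compared file by file. The Python sketch below is only an illustration, with hypothetical directory names. Keeping dated manifests would also let you re-check a copy later against its own earlier state. Note that reads of recently written files may be served from the page cache, a limitation discussed further down the thread.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large video files don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): file_sha256(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def compare_copies(primary: Path, replica: Path) -> dict[str, list[str]]:
    """Report files missing from, extra in, or differing in the replica."""
    a, b = manifest(primary), manifest(replica)
    return {
        "missing_in_replica": sorted(a.keys() - b.keys()),
        "extra_in_replica": sorted(b.keys() - a.keys()),
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        primary, replica = Path(d, "primary"), Path(d, "replica")
        for root in (primary, replica):
            (root / "clips").mkdir(parents=True)
            (root / "clips" / "a.mov").write_bytes(b"frame data")
        (replica / "clips" / "a.mov").write_bytes(b"frame dat4")  # simulated bit rot
        print(compare_copies(primary, replica))
```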

Using Linux software RAID and RSYNC doesn't do that, even with 2 boxes.

Re:redundancy reduces the reliability? by Dare+nMc · 2009-09-17 16:27 · Score: 1

I think what you're saying is: don't use software RAID 5 to reduce your chances of a major failure from 10% to under 1%, because you can't use that one solution to completely eliminate the chance of all failures? It is one thing to point out that this is not a perfect solution in itself for all problems; it is another to say give up unless you can cover every possible failure at no additional cost.

You need either at least 3 copies of your data, or 2 copies on different systems, some very good checksums, and a reliable procedure for validating them.

Using Linux software RAID and RSYNC doesn't do that, even with 2 boxes.

Huh? Either you are really paranoid, or you're not aware of all the features of these tools, because they can do exactly what you claim to need. Suppose you have software RAID-5 on the backup server and run rsync, having it compare the CRC and date of each file. Then it does exactly what you say you need. IE the backup raid will compare the stripe each time you calculate the rsync hash, rsync compares CRC as well, so you have essentially 3 verified copies of everything (2 on the backup RAID, 1 on the main system). In the (very unlikely) event of a stripe failure, the RAID may not know how to recover the file, but rsync will fix it on the next pass (maybe with some manual intervention).
I guess if a very unlikely undetected write error hits a file, and then the main system also fails within the same backup period, you would be left with a very small window: a single byte screwed up (but detected).
I guess to get to your coverage level, both systems need RAID-5 (or at least a better file system on the main PC), so that the main system doesn't get an undetected corruption that then gets backed up.
None of that yet explains why you posted that RAID would make it less reliable, unless you have some really crappy disk controllers that fail more than anything else, and also fail in a "disrupting manner" most of the time. I would agree that after that happens, you would have been better off with a different system. That would be the same as telling a lottery winner that playing the lottery is a losing bet (i.e., the lotto is only the right solution for players who fall into a 1-in-a-million situation).

Re:redundancy reduces the reliability? by mysidia · 2009-09-17 17:23 · Score: 1

Latent disk errors are not one-in-a-million events like winning the lottery. They are very common: the more storage you have, the more likely you are to have one. CERN did some studies on silent data corruption, because it's a real issue in scientific data collection.

They found a 10^-14 error rate on desktop hard drives (10^-15 on enterprise disks), so you expect 1 bit error for roughly every 11.4 TiB (12.5 TB) read. And that assumes good hardware that you've qualified and verified clean; if you had a bad sector somewhere, it's a totally different story. It also doesn't include other sources of errors, such as RAM errors (the Backblaze chassis doesn't use ECC memory), errors introduced by vibration due to the custom construction, or controller problems.
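The arithmetic behind that figure is short enough to check directly. A bit error rate of 10^-14 means one expected error per 10^14 bits read, which is 1.25 × 10^13 bytes, or about 11.4 TiB. The same rate can be applied to any read volume. A full read of a large array, such as a RAID-5 rebuild, is where this starts to matter. The 63 TB figure below is taken from the line that follows in this comment.

```python
TIB = 2**40  # bytes in a tebibyte


def bytes_per_error(bit_error_rate: float) -> float:
    """Average number of bytes read per unrecoverable bit error."""
    return 1 / bit_error_rate / 8


def expected_errors(bytes_read: float, bit_error_rate: float) -> float:
    """Expected number of bit errors when reading bytes_read bytes."""
    return bytes_read * 8 * bit_error_rate


desktop = bytes_per_error(1e-14)     # about 1.25e13 bytes
enterprise = bytes_per_error(1e-15)  # about 1.25e14 bytes
print(f"desktop: one error per {desktop / TIB:.1f} TiB")
print(f"enterprise: one error per {enterprise / TIB:.1f} TiB")
print(f"reading 63 TB once at 1e-14: {expected_errors(63e12, 1e-14):.2f} expected errors")
```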

IF you are storing 63TB of data in RAID5

Simple. I'm saying software RAID5 on cheap disks is not a replacement for using high quality storage. When it comes to important data, all failures are major failures, even if you don't notice the failure.

Using two of these things is not nearly as reliable as using one good storage array, with proper disks and checksumming of data.

Reliability includes expected downtime. Downtime of your secondary servers can be costly too. All servers have downtime; the question is just how much of it there is on average, per year.
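To put "how much downtime per year" in numbers, an availability fraction converts directly to hours down per year. When a job needs several boxes up at once (say, both the primary and the backup server for a backup run to complete), their availabilities multiply, assuming independent failures. The figures below are illustrative, not measurements of any particular hardware.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


def combined_availability(*parts: float) -> float:
    """Availability of a chain that is down whenever any one part is down.

    Assumes the parts fail independently.
    """
    result = 1.0
    for a in parts:
        result *= a
    return result


for a in (0.99, 0.999, 0.9999):
    print(f"{a:.2%} available -> {downtime_hours_per_year(a):.1f} h down per year")
```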

Then it does exactly what you say you need. IE the backup raid will compare the stripe each time you calculate the rsync hash, rsync compares CRC as well

From the storage layer's point of view, rsyncing to a destination on a local file system is no different from ordinarily copying to a new file on the array. The destination will most likely be in the page cache, so when rsync reads back bits to verify the content checksum, some of those bits will be read back from cache, not by having each physical disk read back all those bits.

rsync does not use raw disk I/O; it is unable to check what is stored on each stripe, so it cannot do any actual RAID verification.
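A partial workaround for the page-cache problem, sketched here in Python, is to flush the file, advise the kernel to drop its cached pages, and only then re-read and hash it. `POSIX_FADV_DONTNEED` is only a hint and isn't available on every platform, so this reduces, but does not eliminate, the chance of verifying against RAM. It also still reads through the RAID layer and cannot check individual member disks. For Linux md arrays, writing `check` to `/sys/block/md0/md/sync_action` (device name hypothetical) asks md itself to compare stripes, and the result is reported in `mismatch_cnt`.

```python
import hashlib
import os
import tempfile


def verify_from_disk(path: str, expected_sha256: str, chunk_size: int = 1 << 20) -> bool:
    """Re-read a file and compare its SHA-256, after asking the kernel to drop cached pages."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)  # dirty pages can't be dropped, so flush them first
        if hasattr(os, "posix_fadvise"):
            # Hint only: ask the kernel to evict this file's pages from the page cache.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        h = hashlib.sha256()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            h.update(chunk)
        return h.hexdigest() == expected_sha256
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"clip bytes")
        name = f.name
    try:
        print(verify_from_disk(name, hashlib.sha256(b"clip bytes").hexdigest()))
    finally:
        os.remove(name)
```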

rsync is also unable to examine metadata. If the ext4/ext2/ext3/JFS/XFS/Reiser/FAT metadata for the file or directory has latent errors, they may not cause issues until well into the future.

Latent errors do not consist only of failed writes. They may also be created by stray writes or stray reads. Just because a sector was good 15 seconds after you wrote to it does not mean it will still contain good bits in 24 hours.

Final Cut Pro Server or Final Cut Server ? by Anonymous Coward · 2009-09-17 23:53 · Score: 0

Multi-part answer:

1. If it's actually Final Cut Server, rather than a file server that Final Cut Pro machines dump files to: FCS is very extensible and scriptable, and as far as media goes, you can set whatever backup or archive policy you like. It has commercial integration with several COTS backup and archive products, including PresStore, BakBone NetVault, and Atempo Time Navigator and Digital Archive. It's also very agnostic about what it archives to, so if you can get an FTP or NFS share from elsewhere, you are good to go. If it is FCS, I'd suggest an Atempo-based solution, as TiNa can handle the general-purpose data as well, but PresStore is a very good alternative.

2. If it's just a file server, there is no reason why you can't script up rsync on it to push data elsewhere, wherever that makes sense.

3. Drobos are flaky. Dark Star talking-sentient-bomb crazy flaky. Back this up urgently.

It really hinges on what else you can get access to as storage across the network.

OpenSolaris/ZFS or Unitrends by deuxexmachina · 2009-09-19 23:21 · Score: 1

If I have spare time and someone I can hand it off to, and if my user can't afford to spend that much, and the user has a really high-quality WAN with a ton of bandwidth, I tend to use OpenSolaris, ZFS, rsync, and other open source stuff. As mentioned before, ZFS is a killer file system (the absolute best out there), and you can put together a cheap app server and storage server and wire this stuff together. It does require handing off to someone who knows what they're doing; new releases and bugs and stuff can really suck time from you.

Most of the time I use Unitrends. It's an integrated appliance that does disk-to-disk backup and has replication and all that stuff built in, with a killer user interface and support I can point someone to instead of having to do it myself. I like the disk archive stuff because most of the time WAN bandwidth is an issue, and I like whatever they've done in replication because it seems faster than rsync for what I do. Plus, because they're unknown, I like the fact that I can get this priced below pure software stuff from Symantec (which has the worst support in the universe) and CommVault (expensive as hell). I think their user interface kicks ass too, particularly when I have to hand off to someone who isn't as technical as I am.

VMware is most likely unsuitable.... by jotaeleemeese · 2009-09-22 22:51 · Score: 1

... for things like databases and other highly transactional applications.

--
IANAL but write like a drunk one.

Most comments don't relate to backups. by jotaeleemeese · 2009-09-23 00:47 · Score: 1

It is highly dispiriting that, after reading most comments on this thread, only one poster mentioned LTO tapes (or any tapes or long-term archival media, for that matter).

By copying your data to another machine (the underlying file system is irrelevant) you are only creating a set of data that is highly available, but not one that is properly backed up.

Backups refer to the archival and retention of data for long periods of time (months or years). Putting your data on another machine simply does not fulfil this requirement.

Disks were not designed as a long-term archival medium; you will find this out the hard way.

All the well intentioned comments on this thread are describing how to make your data more available, but the handling of backups implies much more than quick access to a recent set of data.

Although ZFS could be part of the chain of your backup strategy, if the data does not go ultimately to archival tape which is registered and stored safely, then you are deluding yourself if you think you have got the backup problem cracked.
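The thread doesn't specify a tape schedule, but one common way to turn "months or years" of retention into concrete tape rotation is a grandfather-father-son (GFS) scheme. In the sketch below, everything is an assumption for illustration and not a recommendation from the post: daily "son" tapes, Friday "father" tapes, first-of-month "grandfather" tapes, plus a yearly set, along with the retention periods in `RETENTION_DAYS`.

```python
from datetime import date, timedelta

# Hypothetical retention policy, in days, for each tier of tapes.
RETENTION_DAYS = {"daily": 7, "weekly": 35, "monthly": 365, "yearly": 7 * 365}


def gfs_tier(day: date) -> str:
    """Classify a backup date into a GFS tier (plus a yearly archive tier)."""
    if day.month == 1 and day.day == 1:
        return "yearly"   # first backup of the year, kept longest
    if day.day == 1:
        return "monthly"  # grandfather: first of each month
    if day.weekday() == 4:
        return "weekly"   # father: Friday backup
    return "daily"        # son: every other day


def expiry(day: date) -> date:
    """Date after which that backup's tape may be reused."""
    return day + timedelta(days=RETENTION_DAYS[gfs_tier(day)])
```

A real rotation would also have to handle cases like the first of the month falling on a weekend, and would record each tape's tier and expiry in the tape registry the post mentions.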

--
IANAL but write like a drunk one.

Slashdot Mirror

272 comments