Domain: mikerubel.org
Stories and comments across the archive that link to mikerubel.org.
Comments · 99
-
Re:Don't know about LibreOffice
From now on, it'll also be your fault for not having backups:
Windows 7
Windows Vista
Windows XP
OS X (Leopard, Snow Leopard, Lion*)
Linux
Linux
Linux... (some of the Linux methods will also work on OS X and Windows)
*Lion doesn't even require a separate partition or disk. Of course, it will not protect you against disk failure in that case.
-
Re:Rotational media
Your scenario would not happen in any disk based backup I configured. You should have at least 3 to 4 copies of your data:
1. live data
2. online backup
3. offline backup 1
4. offline backup 2
Due to rotation, you need two sets of offline disks. All my disks use rsync with hard links. This has several benefits. First, it saves space: if I am backing up 1 TB, I only need about 1.5 TB to maintain 30 backups. Second, each of those backups is a full backup; there is no "load the full backup, then run 3 days of incrementals". Each backup is "full", yet it only takes up the space needed for the files that changed since the last backup (try doing that with tape). My online backup will generally maintain a 30-day history. My offline backups will probably contain a weekly, a monthly, and an annual. Since I am reusing space, I can probably go back 5 years without storing 5 times my online data.
-
Re:Lack of polish
Rsync is basically just doing a copy.
And it's not hard to tell rsync to create a copy before it does the sync (Easy Automated Snapshot-Style Backups with Linux and Rsync).
Although personally, I prefer rdiff-backup.
Apple just takes the concept, takes a lot of the black magic out of the system, then puts a usable UI on it.
-
Re:Lack of polish
Actually, it's pretty novel. No other *NIX systems that I'm aware of permitted hard linking directories. Doing this with Time Machine was a pretty neat trick. Any directories that haven't been modified are just hard links to the previous version. Directories that have been modified contain hard links to files in the previous version.
Naw, I disagree. We've been doing all that at work since late 2005. I keep planning to set up a WRT54G implementation with a big USB disk at home, but somehow I never get around to it... the beer won't drink itself, you know!
Time machine is a trivial elaboration on Rubel & Schulz, and nothing special to linux geeks, although it is obviously da bomb for mac users who haven't yet transcended "point and grunt".
My own implementation requires adding a line to a flat ascii config file in order to add a new host, so GUI users are unlikely to find it attractive.
-
Re:Isn't leaving things out fun?
I have glumly come to the conclusion that if I want something equivalent to or better than MacOS's Time Machine on Linux for doing time-based incremental backups, I'm going to have to write it myself, and it's going to have to rely on LVM's snapshotting mechanism to do a consistent backup until BTRFS is ready.
You might take a look at using rsync for incremental backups. I've been doing this and it works great.
-
Re:Supported Blu-Ray
Using USB disks for backup seems pretty interesting at current prices.
If you run some kind of unix you can make backup-like copies using rsync so you'll only need to buy a new disk when you want an off-line archive backup.
-
Wow, where to start...
Knowing where to start on this is a bit of a miffing point.
First: upgrade your shit. 2.4 kernel systems? Are you running Red Hat 6? You know, from the turn of the millennium.
Second: upgrade your shit. Really.
Third: if your kernels are that old and you're using these machines for file storage/backup, chances are the hardware needs to be replaced before you even consider considering messing with them. Seriously: this stuff is ancient. Even Debian hasn't had a 2.4 kernel in 5+ years, I think.
Fourth: you can do what you're trying to do with rsync 'snapshots'. It works very well, failing filesystem-level support. If you're sharing data over Samba, this makes it easy: just put a '.snapshot' dir for these 'temporary' backups in their $HOME and hide dotfiles. Then make sure rsync ignores .snapshot. (Of course, there are other ways to do this.) rsync snapshots (and here).
There are other sources of information out there on rsync snapshots. There's also rsnapshot.
Chances are you'll have to upgrade before this stuff even works for you, though.
-
rsnapshot
Have you looked at rsnapshot?
It's based on this article:
http://www.mikerubel.org/computers/rsync_snapshots/
-
Re:This just gave me a good idea!
Two things to look into:
-
Re:This just gave me a good idea!
Try this:
mv backup.0 backup.1
rsync -a --delete --link-dest=../backup.1 source_directory/ backup.0/
see this
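For anyone who wants more than one old copy, here is a rough sketch of the same two commands wrapped in a rotation loop. The function name `snapshot`, its argument order, and the default of four kept snapshots are all made up for illustration; it assumes rsync is installed and that the destination lives on a filesystem with hard links.

```shell
#!/bin/sh
# Sketch only: rotate backup.N directories, then sync with --link-dest.
# snapshot SRC DEST_ROOT [KEEP]  (hypothetical name and arguments)
snapshot() {
    src=$1 root=$2 keep=${3:-4}
    mkdir -p "$root" || return 1

    # Drop the oldest snapshot, then shift the others up by one.
    rm -rf "$root/backup.$((keep - 1))"
    i=$((keep - 1))
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -d "$root/backup.$prev" ]; then
            mv "$root/backup.$prev" "$root/backup.$i" || return 1
        fi
        i=$prev
    done

    if [ -d "$root/backup.1" ]; then
        # A relative --link-dest is resolved against the destination directory.
        # Unchanged files become hard links into backup.1; changed files are copied.
        rsync -a --delete --link-dest=../backup.1 "$src/" "$root/backup.0/"
    else
        # First run: nothing to link against yet, so this is a plain full copy.
        rsync -a --delete "$src/" "$root/backup.0/"
    fi
}
```

Calling something like `snapshot /home/me /mnt/backupdisk 30` once a night (both paths are placeholders) would keep roughly a month of snapshots.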
-
FTP would be dead
FTP would be dead if Microsoft would adopt the SSH suite, since SSH has the exact same capabilities as FTP. SSH is the swiss army knife of encrypted networking. Port tunneling is very useful. Less known, but also very nice is the ability to use pipes like this:
echo "hello" | ssh remote_host "cat > hello.txt"
You could use it to make a large backup without consuming disk space on the local machine.
tar -zc directory_to_backup | ssh remote_host "cat > backup.tar.gz"
It also works very well with rsync. Combine with hard links for a great backup strategy.
I like to see the surprise from Microsoft-centric developers when they discover what SSH can do. They all seem to have this false assumption that it's just for getting a shell on a remote UNIX system.
Though I haven't kept up with SSH development on Windows, two applications I've used on Windows are WinSCP and PuTTY. sshwindows also looks interesting, as I use Cygwin + SSH.
-
rsync + hard links for versioning
Using rsync with hard links lets you version your backups with good space efficiency and a simple structure.
http://www.mikerubel.org/computers/rsync_snapshots/
Say you want a snapshot for each of 30 days. You'll end up with a directory for each day. If you started with 12TB and 1TB changed over that period, your backups for all 30 days combined will take about 13TB. Plus there are no funky metadata formats.
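To see where that "full size plus changes" figure comes from, here is a small sketch that measures one snapshot against two hard-linked snapshots with du. It assumes GNU coreutils (`cp -al`, `du --apparent-size`); the directory and file names are invented for the demo, and a 1 MiB file stands in for the real data.

```shell
#!/bin/sh
# Sketch: du counts a hard-linked file once, so a second snapshot is nearly free.
# Assumes GNU coreutils; all names are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir backup.0
head -c 1048576 /dev/zero > backup.0/big.bin   # 1 MiB of stand-in data

cp -al backup.0 backup.1                        # second "full" snapshot via hard links

single=$(du -sk --apparent-size backup.0 | cut -f1)
combined=$(du -sck --apparent-size backup.0 backup.1 | tail -n 1 | cut -f1)
echo "one snapshot: ${single}K, two snapshots together: ${combined}K"
```

The combined figure should come out only slightly larger than the single one, the difference being the extra directory entry rather than a second copy of the data.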
-
Re:ask mom
Better yet, ask a friend who already has a system running 24/7.
Beware, though: when run with --delete, rsync will gladly delete files on the target system if you accidentally delete the originals on the local one. Perhaps use --link-dest and the scheme described at http://www.mikerubel.org/computers/rsync_snapshots/
-
Re:Moving parts are the main problem
My full solution would be a fanless rig, with RAID 1 for full redundancy of disks so if a hard disk fails, it doesn't take your data with it, and weekly backups to DAT tape stored off-site. Then I'd use a pair of power supplies, using a diode to prevent power from one from getting into the other, and a zener diode or 78 series linear regulators to ensure a failing supply can't overpower any one line. Then, from my little power circuit, the two power supplies would feed the one motherboard, which would be underclocked at reduced voltage. It would have the highest possible amount of RAM in it, because that would reduce the writes to the hard drives.
On the software side, I would consider hosting the DOS app on linux using an emulator such as dosemu or dosbox. The OP's dad would have an environment very similar to what he's using now. I would probably use Debian stable for both boxes, which has very long release cycles and is very stable.
With linux comes the option to replace the DAT tapes with an off-site rsync over ssh. If the main box dies, you'd be able to just swap in the backup box in a couple of minutes. If the data set isn't very large the mirror will complete in a couple of seconds. It's very easy to do:
Create an RSA public/private key pair: ssh-keygen -t rsa, press enter at the passphrase prompts.
Copy the public key to the remote box: ssh-copy-id -i ~/.ssh/id_rsa.pub remotebox.
Have a nightly cron job to push the files: rsync -ave ssh --delete /localfiles/ remotebox:/localfiles.
For bonus points you could even throw in snapshots.
I'm backing up hundreds of partitions this way at work, each with snapshots going back a month. Tapes are slow, unreliable and expensive. I would not use them for any purpose.
-
Re:Carefully protected?
Run rsync -F --link-dest every night, push the batches out to the remote servers, and apply the batches. Use RAID10 or better on the storage end. Publish the backup archive to the end-users with Samba for self-service restore capabilities (if you get the switches right on the rsync job, file protection and permissions management is done for you, since files will retain their original attributes). Use passwordless ssh (with large keys) for the scp and rsync base transports. Put a little bit of filtering on the client end to quiesce or ascii-dump any live databases and to prevent the keys being abused for other purposes.
http://www.mikerubel.org/computers/rsync_snapshots/
If you're using MS-windows on the server end, it gets exponentially harder, so you want to avoid that if you can.
-
Re:yeah, use rsync.
Hard links use up a single inode and no more. Why mess about with diffs when you can change into a directory and see yesterday's backup? Don't like yesterday's? Change into another and see the day before that. The depth is limited by the number of inodes you have in your filesystem / number of inodes used in the filesystem being backed up.
Agree completely. Disk space is cheap as dirt anyway.
I'm using rsync to mirror about 300 GB on 300+ remote partitions, with snapshots going back up to a month, depending. I have 8 fairly low-end boxes doing about 40 partitions each. Normally all boxes finish in 90-100 minutes.
Total cost for this project was less than 20k. Bids from commercial vendors for similar functionality were much, much higher.
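If the "single inode" point above sounds abstract, this short sketch shows two directory entries sharing one inode. It assumes GNU `stat` (for `-c`); the file and directory names are made up.

```shell
#!/bin/sh
# Sketch: two names, one inode. Assumes GNU stat; all paths are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/backup.0" "$work/backup.1"
echo "report v1" > "$work/backup.1/report.txt"

# Hard-link yesterday's copy into today's snapshot instead of copying the data.
ln "$work/backup.1/report.txt" "$work/backup.0/report.txt"

# Both entries point at the same inode, whose link count is now 2.
inode_old=$(stat -c %i "$work/backup.1/report.txt")
inode_new=$(stat -c %i "$work/backup.0/report.txt")
links=$(stat -c %h "$work/backup.0/report.txt")
echo "inodes: $inode_old $inode_new, link count: $links"

# Removing one name leaves the data reachable through the other.
rm "$work/backup.1/report.txt"
remaining=$(cat "$work/backup.0/report.txt")
echo "still readable: $remaining"
```

That last step is why deleting an old snapshot directory is safe: the data only goes away once no snapshot refers to it any more.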
-
Re:yeah, use rsync.
Using hard links, you can make multiple trees using only the storage space of the changed files. Here's one example: http://www.mikerubel.org/computers/rsync_snapshots/
-
rsync - it's in the tag
rsync to get the data, cp -al to keep snapshots. I've been using this for years to manage TB of data over relatively low-speed links. You'll take a hit the first time, so kick it off at night, kill it in the morning, and the next night just execute the same command; it'll eventually catch up. Then cp -al it, then lather, rinse, repeat. This page: http://www.mikerubel.org/computers/rsync_snapshots/ has been about for years. Use it!
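Here is a sketch of why the cp -al step keeps the old version around. To keep it runnable without a remote host, the rsync transfer is replaced by a write-to-temp-file-then-rename, which is how rsync updates a changed file by default (that is, without --inplace). It assumes GNU `cp` and `stat`; the file names are invented.

```shell
#!/bin/sh
# Sketch: cp -al snapshot, then an rsync-style update of one file.
# The sync itself is simulated with a temp file plus rename; names are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir backup.0
echo "old contents" > backup.0/notes.txt
echo "never changes" > backup.0/static.txt

# Snapshot: recreate the directory tree, hard-linking every file.
cp -al backup.0 backup.1

# Simulated sync of one changed file: new inode, renamed into place.
echo "new contents" > backup.0/.notes.txt.tmp
mv backup.0/.notes.txt.tmp backup.0/notes.txt

current=$(cat backup.0/notes.txt)
previous=$(cat backup.1/notes.txt)
static_links=$(stat -c %h backup.0/static.txt)
echo "backup.0: $current | backup.1: $previous | static.txt links: $static_links"
```

Note that editing a file in place inside backup.0 by hand would change backup.1 too, since both names share the same data until the file is replaced.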
-
Re:Simple, switch to VMS!
What's wrong with UNIX file semantics?
It's not that there is anything wrong with it, it's just that it's so ridiculously primitive. I mean, rwxrwxrwx is such an incredibly limiting mindset... if you've never developed a large complex system on a more advanced filesystem (like VMS or Novell's Netware file system) you probably aren't aware of what you are missing.
Limiting files to only one group membership, and only five possible file manipulation properties (rwxts) is really lame, it's a 30 year old paradigm that several other OSes surpassed 20 years ago. Novell even used to have "rename inhibit" as a filesystem attribute!
Stacking ACLs on top of other architectures has always been a way to create system maintenance nightmares, ACLs enable as many problems as they solve in unskilled hands (at least you can back them up with the files they apply to these days, though, that's a recent improvement in the *nix world).
These days, good versions of Unix have filesystems with versioning, such as: http://wayback.sourceforge.net/. In this case, the versioning is implemented using a very general mechanism, instead of being built in. This allows, as a result, many more things than just versioning filesystems. You should look into FUSE.
Thanks for the link; I'm already familiar with FUSE. Wayback looks nice, especially for anyone who doesn't already have something based on Mike Rubel's paper set up, but it's not solving the same problem as a file system that cleanly implements version numbering.
There's nothing wrong with a sharp rock. But I'd rather have a nice steel axe when I need to chop down trees. Once you've used a better toolset it's hard to go back to the stone age.
-
Re:Download caps
Ever hear of hard links? Apple uses hard links for their Time Machine backup system, but anyone can implement it. By linking multiple "files" to the same data, every incremental backup can contain the entire file structure without wasting space. You get the performance and efficiency of incremental backups with the simplicity of a filesystem image.
Another plus is that each version of a file, no matter how many times it was "backed up", is stored exactly once on the backup media. So if you have, say, a quad-mirrored backup system, you can be sure that each and every version of every file has exactly four backups. With conventional full image backups, old files will be duplicated hundreds of times while frequently modified ones will only have a single backup.
The only downside to hard-link backups is an inability to span filesystems; if you can't fit a full backup onto one device, you'll have to split the backup up or RAID multiple drives for storage. Hopefully ZFS will simplify this, but it's still a small price to pay for fast, efficient, easily recoverable backups IMHO.
-
Re:Symantec
We just moved from Backup Exec 9.1 to Backup Exec 11d (we started using it when it was Veritas), mainly for tape encryption capabilities. Of course, it is working fairly well, unless I do something crazy like try to encrypt our backups to tape. I sat on hold for 45 minutes yesterday, and gave up. They just bought Altiris, which is who we were looking at to switch to from Ghost. GRRR... They just buy companies, and then raise prices.
You know, with the price of disk space what it is today I find it hard to come up with any reason to use tapes for backup anymore. 2 backup servers, one offsite over VPN or ssh, with encrypted RAID hard drives on LVM, rsync with hardlinks and compressed dump for archiving is much cheaper and more reliable than tapes, especially with offsite storage. This can even allow automated background backup of laptops when they're connected. What am I missing? What do tapes add that would justify the added expense and pain?
-
Re:Here's my solution
Secondly, here's my solution: [...] I made a script that runs every night and copies a "current" folder to one named by date and then rsyncs the new stuff onto the "current" one. That way, I have a history of all the files for every day I ran the script, and I only store the duplicates because I hardlink it. This is the script (public domain, since it took me 3 minutes to write), feel free to clean it up a bit since I didn't really feel like coding at 5 am:
Way too little error checking for a backup script, I think.
Easy Automated Snapshot-Style Backups with Linux and Rsync is the standard resource for this kind of thing. He points to various scripts much like yours (but I wrote my own anyway).
-
Re:OpenFiler
Why add the additional point of failure? Or was I supposed to buy 2 identical RAID cards for when one failed and it turned out the array it built isn't compatible with anything except the exact same device with the exact same firmware revision?
In fact, I just had a RAID controller die. Fortunately it would still let me mount the disks read-only and recover the data. That pretty much convinced me that RAID is not what I want for home.
To replace the RAID (and because I needed more storage anyway) I went out and bought two 500GB drives. I have them mounted as two plain ol' ext3 drives -- not RAID, not even software RAID. Just two drives. I have a cron job that rsync's one to the other every night. I took a cue from this page and keep a week's worth of backups as hard links. This gives me seven days to recover anything I accidentally deleted before it's gone for good, but doesn't take up much more backup space than just a single copy. My data is mostly unchanging files like CDROM ISOs and MP3s, so after the initial 5-hour mass copy was done the nightlies only take a few minutes.
Now if either drive craps out I can mount the other in any Linux box and recover the data. If anything in that box craps out, including the controller, I can take the drives and recover the data. Yeah, it's possible that the controller could fubar both drives if something dire happens. A RAID controller could do the same. If I had 500GB of storage off-site I'd rsync to there instead.
-
Time Machine has been around for a while...
At least since 2004. Just bear with me and take a look at the following HOWTO.
http://www.mikerubel.org/computers/rsync_snapshots/
I've set up a system exactly like the one described above, and had it running on Panther and now Tiger. The same setup can be used in Linux, or actually on any POSIX file system that allows hard links.
I've been enjoying beautiful backups, with each subdirectory being a perfect image of my home directory at any given date, for about three years now. What does Time Machine do exactly that's different from this, other than fancy graphics?
-
I see no reason for a geek to upgrade
There is nothing new in Leopard that would interest most geeks.
Time Machine? I have had something very similar to it set up since the Panther days (via rsync).
3D interface? According to the ars review, it's not so hot.
I was so hopeful that ZFS would make it to Leopard. It has, but only with read access AFAIK, and certainly not in time machine---ummm, not very useful.
So, lots of eye candy for the casual user. Anyone care to chime in why a geek might want to upgrade?
-
Re:Alternative to backup
-
Re:Why even bother with compression anymore?
I'm using snapshot-style rsync backups, so gzip is not an option.
http://www.mikerubel.org/computers/rsync_snapshots/
We can combine rsync and cp -al to create what appear to be multiple full backups of a filesystem without taking multiple disks' worth of space. Here's how, in a nutshell:
rm -rf backup.3
mv backup.2 backup.3
mv backup.1 backup.2
cp -al backup.0 backup.1
rsync -a --delete source_directory/ backup.0/
If the above commands are run once every day, then backup.0, backup.1, backup.2, and backup.3 will appear to each be a full backup of source_directory/ as it appeared today, yesterday, two days ago, and three days ago, respectively--complete, except that permissions and ownerships in old snapshots will get their most recent values. In reality, the extra storage will be equal to the current size of source_directory/ plus the total size of the changes over the last three days--exactly the same space that a full plus daily incremental backup with dump or tar would have taken.
-
Re:AOE is better than any of that crap
I spent about $8000 for a complete rig, including fifteen 500GB disks and a couple of cat6 crossover cables for the links. I'm not currently doing multi-host simultaneous access so I don't need a fancy file system, it's just a regular block device to the OS.
I'm using it to back up a couple of terabytes nightly with rsync --link-dest. (See Mike Rubel's site if you're not familiar with that trick).
Performance feels about the same as the $200,000 (US dollars) Fibre Channel SAN array sitting next to it, but I haven't actually measured.
-
Re:You say poe-tay-toe, I say poe-tah-toe ...
Hardware neuroses, Windoze malware, PeeCees and "pro Macs". THAT explains it. You're one of those old-school Macintosh persecution-complex cultists. All becomes clear now. I do happen to have a pair of white-box Windows machines for gaming, though the rest of my gear is low to mid-grade server-class x86 hardware running Linux or FreeBSD (Tyan and Supermicro stuff). I also happen to have some Mac hardware (although you'd probably sneer at my Powerbook for not being "pro" enough).
I spend approximately 10 to 15 seconds unplugging and replugging the USB drives on the days that I swap them. How long does it take you to burn a DVD? 20 minutes? 30 minutes? Amortized over a three or four week period (your claimed backup window), I still win on time "wasted". That's why I love the way the rsync method works - I don't have to DO anything, the snapshots happen automatically as long as the target drive is attached.
Since you use Apple hardware, using rsync for backups is even easier, provided you're running OSX, since Apple bundles it with the OS. Here is a good resource for setting it up if you ever decide to explore other, more reliable options for data security.
Alas though, judging from your tone, you don't seem willing to engage in rational discourse. C'est la vie.
-
Rsync...
So far the best "backup" software I've used is rsync.
I used to work at one of the world's best-known web hosting companies, where among other things I ran the backup system. It started out with Arkeia and a 120-tape library with 6 AIT-3 drives. Arkeia was crap though (this was 3 years ago): it was a pain to set up, and restoring ANY amount of data would literally take days just to scan its local database. Restoring just one file would take 6 hours just for it to scan its local database... on a dual-processor box with SCSI drives and 1 GB of RAM.
We moved to Veritas NetBackup, which was a dream to work with compared to Arkeia, but it too had issues (besides its cost). You could tell the software had been around for ages; it was far from easy to use, or even efficient. It would start 10+ processes, and every now and then one would die, causing everything to silently stop working, and you would have to restart them all. This usually caused at least one day's worth of backups to fail. When you had 1 TB of data to get in an 8-hour window from a few hundred machines, it didn't take much to miss your window.
Tape backups are just a pain to use. They are slow to back up, and even slower to restore from. They need constant cleaning, and in my experience the drives fail more often than hard disks do. It seemed we were replacing about two tape drives a year. They aren't cheap either. Ouch!
The best backup system I've used so far is rsync with its nifty snapshot ability.
I set up the backup system for a company with locations in 10 different cities (connected with broadband), where each location has its own Linux server, and a central backup server at the main branch. Each employee has an H: drive which maps to a Samba share on their city's local Linux server. They save all their data to this location, and twice a day the main backup server rsyncs the data back to the head office. Since this process happens twice a day, the amount of data that changes is quite minimal, a few hundred megs or less across the entire company, so it only takes a couple of hours at most. The main backup server keeps these twice-daily snapshots for about two months, and each week the main backup server itself is backed up to tapes. Luckily we have never had the need to restore from the tapes...
So basically all data is stored locally on a RAID'd server, then remotely on a RAID'd server at head office (twice daily), then offsite on tapes (weekly). The main benefit though is that executives, or the technical staff can pull data off the main backup server from any date in the last two months immediately, just by using Windows Explorer. No need to restore from tapes, and all the data is redundant in 3 locations. To do a restore, we basically just reverse the rsync script, and push the selected data from the main backup server back out to the local server.
Works like a charm, and it's free!
-
Why not use rsync
Quick question
... you mentioned using OpenVPN to do the remote-to-central backups. Why not just use rsync? Seems like it would be easier than opening a VPN connection, mounting or otherwise connecting to the server, and then synchronizing the files to be backed up (which you'd need to use other utilities for anyway). With rsync, it's all done for you and the security is still there, since it's done over SSH. Keeping a remote mirror is as easy as one line in crontab (plus setting up the required certificates), and snapshots aren't much harder.
After SSH itself, rsync is one of the most useful little utilities that I couldn't live without. It just works. About the only thing it doesn't do is true bidirectional synchronization, but this isn't as much of a limitation for making backups as it is for situations where people are going to be changing things on both ends.
Anyway, I thought the rest of your post was right on, I just thought the SSL VPN thing was the hard way.
-
Re:slimmer alternatives ?
Hmmm...by your description, I think you really just want rsync. Check out, for example, this nice tutorial.
-
Neat.
This is definitely the way to go. With huge hard disks that offer capacities beyond tape drives, it is less and less feasible to use traditional tape-based backup systems in many organizations, if only because of the time taken by the frigging tape drive...
Here is the idea behind the setup I am currently using: Easy Automated Snapshot-Style Backups with Linux and Rsync.
-
Re:Write-once backups
A tangential question: What do people use to backup nowadays? Everyone says to backup early and often, but what do ordinary everyday people actually use?
I think the answer is 'ordinary' people don't make backups. They make occasional copies of their data and hope for the best.
And I don't blame them... it's very, very easy to run up 20 GB of archives and there are few consumer devices that make backing it up easy (i.e. easy as in you stick in a cartridge last thing at night and it's finished by morning).
Personally I have an archive server and an external hard disk. The server uses rsync to mirror copies of the archive tree on two internal and one external hard disk, and uses the --backup option to keep a version history of altered files. Here is a rather good page on the subject. I also do regular full and incremental DVD backups.
I rather liked the look of the new Iomega REV drive, a 35 GB removable disk system. However, I understand that the cassettes are basically little hard disk drives complete with motor and read heads. This is great in that it keeps the dust out, but I may as well use a USB hard disk, as the cassette has all the same weaknesses. I'm currently pinning my hopes on Blu-ray. The burned disks may only be stable for a few years, but I'll be doing a full backup every month or so anyway and the backups only exist to recover from total disk failure.
-
Use hard links, rsync, big redundant disk array.
I keep a lot more than 50 days worth on line. And I get effectively more than 90% compression. And individual users can do their own restores from their own desktops.
Look at how dirvish works. Or rsnap, or rsync-incr, or rsnapshot, or ribs-backup, or indeed any tool based on Mike Rubel's basic idea.
I use a homebrew variation that is suited to my employer's unique needs and infrastructure. You may find it expedient to do the same. I don't save any metadata other than the snapshot date for each tree, and I use data mining techniques (well, actually I use find and gawk from command line) if I want to determine what's going on or how the system is doing.
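As an illustration of the find-and-awk approach (not the poster's actual commands, which aren't shown), here is one possible sketch: files whose link count is 1 are referenced by no other snapshot, so adding up their sizes gives a rough idea of what a given snapshot costs. The function name `unique_bytes` is invented, and GNU find (`-printf`) plus a hard-link snapshot layout are assumed.

```shell
#!/bin/sh
# Sketch: bytes held by a snapshot that no other snapshot shares.
# Assumes GNU find and cp; function and directory names are hypothetical.
unique_bytes() {
    # Link count 1 means no other directory entry references the file's data.
    find "$1" -type f -links 1 -printf '%s\n' | awk '{ total += $1 } END { print total + 0 }'
}

work=$(mktemp -d)
mkdir "$work/backup.1"
printf '12345' > "$work/backup.1/shared.txt"
cp -al "$work/backup.1" "$work/backup.0"
printf '1234567890' > "$work/backup.0/new.txt"   # exists only in the newest snapshot

newest=$(unique_bytes "$work/backup.0")
older=$(unique_bytes "$work/backup.1")
echo "backup.0 unique bytes: $newest, backup.1 unique bytes: $older"
```

Run against a real snapshot tree, the same function gives a quick per-day view of how much actually changed.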
It has run for years with no maintenance other than periodic OS patches. It is not our primary backup system because it does not support off-site archival, but it's well worth the investment for rapid restore of user-deleted files. I'll consider this array (I'm currently using Linux soft RAID 1+0 on two physically separate busses) when I need more disk eventually.
-
Re:Independent RAID 5 solution
Thanks for your reply. You are quite correct to note that RAID increases the likelihood that a drive will fail (as there are more drives that can fail). I agree that 'availability' is a better term than 'reliability'. I also agree that the majority of home users and even the majority of /. users do not have the data requirements that justify a few terabytes of RAID 5. I know professional photographers and videographers that do. I also know some home theater applications built around RAID systems.
I did consider an rsync operation such as detailed here. I do believe that this is a good solution for many people and is certainly cheaper. I do also appreciate your distrust of an intervening black-box component of uncertain quality. Nonetheless, I selected against this option for three good reasons and one silly one.
- In the event of primary drive failure, you will suffer some data loss.
- I like having an audible alarm on the hardware RAID unit.
- A dedicated hardware RAID solution is easier to install, use, and maintain.
- My server room always needs more blinky lights.
-
What are you trying to protect against?
What exactly are you worrying about - and will RAID protect it all? I think maybe not. Some things RAID will *not* help with:
1)Theft of the machine
2)PSU failure in the machine (this happened to me, and fried every single drive with 240V on the 12V rail!)
3)Lightning (could kill every machine in your house)
4)Fire.
5)HDD failure.
6)Catastrophic OS failure (filesystem corruption, conveniently mirrored), or a worm/trojan/virus.
RAID does give you convenience, slightly better performance, and ease of repairing the most common fault. But it sounds to me as though reliability and backup safety matter more to you than a few hours of downtime. My suggestion:
1)Don't use RAID; use separate machines. [Maybe mini-itx ?]
2)If possible, put one of them somewhere else.
3)Rsync + SSH - see here: http://www.mikerubel.org/computers/rsync_snapshots/
4)Offline Backup (CD ROM, or external HDD) in safe place (eg bank) for really important documents.
5)If you have remote backups, make sure you encrypt them. If you dispose of old CD-Rs, destroy them first. Likewise old HDDs.
P.S. I've had a lot of HDD failures over the last 5 years (mainly IBM Deskstars). 2 years ago, I switched to the Seagate Barracudas en masse. So far, so good :-) At least Seagate give a 5-year warranty, which suggests you might reasonably get a 5 year MTBF.
-
DIY?
Here's an easily modifiable script that uses hard links and rsync. I used this as our office's starting point and now have a system that:
- creates a local snapshot every night and stores it on a separate drive;
- archives a copy from the night before for only the storage cost of the changes;
- writes off to an external drive every weekend.
The nightly back-ups mainly account for users accidentally deleting files or saving changes they wish they hadn't, rather than hardware failure. Since it's all just stored as a copy, I can mount it over the network if necessary or archive the snapshot to external media anytime I want. It doesn't require any downtime to back up or restore on our setup, but if you were dealing with some more complex services you might need to make some allowances.
-
Re:Seems to work for me
-
Re:full article mirror & comment
http://www.mikerubel.org/computers/rsync_snapshots/
I've implemented a system like this where I work and it's quite nice. However, a nicer option exists in Dirvish, which does all the rotating for you.
-
Re:full article mirror & comment
http://www.mikerubel.org/computers/rsync_snapshots/
for one decent incremental backup solution.
I find that having 1 drive live and one as backup works fine as long as the live drive isn't over 95% full, but most of my large content is pretty static--for me, there's a lot of churn (and backup size) in email/source code/etc, not much in music/videos/images, and the majority of the disk space is used by the latter.
-
Offset backups first, then RAID
Personally, I prefer daily backups to another HDD (use rsync, it's great), that way, if I make a major *oops* during the day I know I have a very recent backup immediately available, this is something that RAID cannot protect you from (the human failure).
If then I've still got money to spare, I'll look at mirroring.
http://www.mikerubel.org/computers/rsync_snapshots/ is a great page to learn about using rsync to make easy backups.
-
Re:Online backup? - Capacity
A semi-modern PC has a minimum 40GB sized hard drive. And it only goes up from there. I've been online for quite sometime and while things have gotten MUCH better, with respect to bandwidth, it still takes a LONG, LONG, LONG time to transfer huge amounts of data. Note, I am not talking about your 4.5gig ISO image. I'm talking 20 of them. In a row.
Most businesses don't care about backing up all of your pr0n and music. For a lot of places, if you back up documents, email, and source code, you've got the core business stuff--and that's often fairly small. You do a full local backup of the servers, have a standard image of the desktops, then do web backups of a few directories nightly (e.g. all files on some samba share, a source repository, email). The web backups are rsync'd (or equivalent) so only the day's changes are transferred.
It's not ideal, but for a lot of places it works. Of course, they often find out after a crash that employees _weren't_ storing everything in "Work Documents" folder like they're supposed to.
For home use I usually just do hourly snapshots to another machine at home (I keep every hour for the last week, weekly for the 4 previous weeks, monthly for 6 months, and then just yearly) with something like:
http://www.mikerubel.org/computers/rsync_snapshots/
With nothing automated for off-site backups (though I do keep a handful of critical documents off-site by hand).
I cheat and do the initial rsync on local disk, only incremental stuff goes over the network. -
Re:Ridiculous
Bottom line, you need incremental backups for data reliability. Doesn't matter how you do it, you can do it on top of RAID 5 to give you more peace of mind if you want, but it's not really necessary. Instead, at a bare minimum, you must be able to go back to several points in time to recover as recent of data as possible.
See, for instance:
http://www.mikerubel.org/computers/rsync_snapshots/
This document describes a method for generating automatic rotating "snapshot"-style backups on a Unix-based system, with specific examples drawn from the author's GNU/Linux experience. Snapshot backups are a feature of some high-end industrial file servers; they create the illusion of multiple, full backups per day without the space or processing overhead. All of the snapshots are read-only, and are accessible directly by users as special system directories. It is often possible to store several hours, days, and even weeks' worth of snapshots with slightly more than 2x storage. This method, while not as space-efficient as some of the proprietary technologies (which, using special copy-on-write filesystems, can operate on slightly more than 1x storage), makes use of only standard file utilities and the common rsync program, which is installed by default on most Linux distributions. Properly configured, the method can also protect against hard disk failure, root compromises, or even back up a network of heterogeneous desktops automatically. -
Re:USB/Firewire drives
I plug in and mount my USB disk once a day and run this. I could probably use a for to make it shorter, but... meh.
more information
#!/bin/sh
#SRCDIR requires a trailing /
SRCDIR=/home/acoward/
DSTDIR=/media/usb/acoward/rsync/
# Bail out if the USB disk is not mounted, so the rm -rf below cannot run elsewhere.
cd "$DSTDIR" || exit 1
if [ -d daily.7 ]; then
rm -rf daily.7
fi
if [ -d daily.6 ]; then
mv daily.6 daily.7
fi
if [ -d daily.5 ]; then
mv daily.5 daily.6
fi
if [ -d daily.4 ]; then
mv daily.4 daily.5
fi
if [ -d daily.3 ]; then
mv daily.3 daily.4
fi
if [ -d daily.2 ]; then
mv daily.2 daily.3
fi
if [ -d daily.1 ]; then
mv daily.1 daily.2
fi
if [ -d daily.0 ]; then
mv daily.0 daily.1
fi
mkdir daily.0
rsync -a --progress --stats --delete --link-dest=../daily.1 \
$SRCDIR daily.0/
touch daily.0 -
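The poster mentions that a for loop would shorten the script above. A sketch of that, assuming the same daily.0 through daily.7 layout; the function name rotate_daily is made up, and the rsync call is not part of it.

```shell
#!/bin/sh
# Hypothetical helper: rotate_daily DIR
# Shifts daily.0..daily.6 up by one inside DIR and drops daily.7.
rotate_daily() {
    dir=$1
    # The oldest snapshot falls off the end.
    rm -rf "$dir/daily.7"
    # Walk from oldest to newest so no directory is overwritten.
    for i in 7 6 5 4 3 2 1; do
        prev=$((i - 1))
        if [ -d "$dir/daily.$prev" ]; then
            mv "$dir/daily.$prev" "$dir/daily.$i"
        fi
    done
    # Fresh, empty target for the next rsync run.
    mkdir "$dir/daily.0"
}
```

After `rotate_daily "$DSTDIR"`, the rsync and touch lines from the original script would run unchanged. They still need to run from inside the destination directory, because daily.0/ is given relative to the current directory.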
Re:On retention, storage, backup, archiving...
With regards to the mailbox not shrinking when you delete the spams, Mozilla requires you to "Compact this folder".
The reason for this is performance: if Mozilla had to rewrite a multi-hundred MB file every time a message were deleted, it would be extremely slow (likely, as long as it takes to run "Compact this folder"). Instead Mozilla uses indexes to keep track of the messages.
Regarding 700MB per CD, you can upgrade to DVD and get more than 4GB. DVD writers can be had for under $100 these days, and media is cheap. A 16x DVD writer can write a full disc in under 10 minutes.
For backups larger than 4GB, your best bet is probably a hard drive, preferably on another machine in another geographic location. If you want to get fancy, there is a very cool incremental backup solution for UNIX/Linux that can be set up using "rsync":
-
rsync for incremental backups
rsync is the perfect tool for figuring out "which of the 8,000 files changed." If you give it two directories it will copy the changes from one to the other. There are even ports for those of you running non-Unix OS'es. You can automate it, sync to remote machines, etc. Here's a tutorial on creating backup snapshots under Linux:
Maybe it isn't a concern for whatever personal data you are accumulating and backing up once a week, but for me, losing any of the photos I shot with my digital camera is usually an irrecoverable loss. That's why I back them up on a second hard drive on another computer after every dump from my camera's memory card. Now that I think about it though, I've been lax in off-site backups. Time to warm up that CDR drive.
-
Re:no incremental
Actually we use rsync for incremental backups and it works quite well. It's a simple modification or scripting of rsync commands, and it can all be scripted away pretty easily.
b-loo -
Re:omg
You can save so much storage space by doing byte-level backups... and guess what? Rsync has that ability!
So don't be so fucking st00pid! If you don't know what the fuck you are doing... GET HELP! Quit fucking bitching that you are st00pid.
God, I hate all of you stupid fucks!
We should all use Winblows so we can know nothing about the underlining OS and use fucking pretty windows and have our fucking GHEY themes and shit... cause we're all too damn stupid to use the tools that are installed on most modern Linux/BSD/UNIX distros!!!
eat me ass! -
Rsync or mkzftree for backups
The best way to create differential backups under Unix is with hardlinked snapshots. Easy Automated Snapshot-Style Backups with Rsync has a good explanation of how to do this. The best part is that restoring is as simple as copying a file. Each snapshot is a folder hierarchy on disk, and you can browse through any snapshot and find files you want.
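To make the "restoring is as simple as copying a file" point concrete, here is a small sketch. The function name and the paths in the usage note are hypothetical, and it assumes snapshots are plain directory trees as described.

```shell
#!/bin/sh
# Hypothetical helper: restore_from_snapshot SNAPSHOT_DIR REL_PATH LIVE_ROOT
# Copies one file out of a snapshot tree back into the live tree.
restore_from_snapshot() {
    snap=$1
    rel=$2
    live=$3
    # Recreate the parent directory in case it was deleted too.
    mkdir -p "$(dirname "$live/$rel")"
    # -p keeps the mode and timestamps recorded in the snapshot.
    cp -p "$snap/$rel" "$live/$rel"
}
```

For example, `restore_from_snapshot /backup/daily.3 docs/report.txt /home/me` (placeholder paths). Copy out of the snapshot rather than editing inside it, since a file there may be hard-linked into other snapshots.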
One small improvement over rsync (IMO) is to use mkzftree from the zisofs-tools package. It's designed to create compressed ISO filesystems which will be transparently uncompressed when mounted under Linux (and other supporting operating systems; it's a documented ISO extension). mkzftree supports an option for creating a hardlinked forest (like cp -al and rsync), with the advantage that the files are compressed, thus saving space. ISO isn't quite as flexible as ext2 for things like hardlinks, so what I do is have DVD-sized disk images formatted as ext2 to store the snapshots. I burn the disk images directly to DVD; each one can hold ten or twenty compressed snapshots (of my data anyway). The disadvantage is that I can't read the files directly (because they're compressed, and the transparent decompression only works with ISO) but it's easy to decompress a file or folder to /tmp using mkzftree if I need to restore something.
It shouldn't be hard to make the transparent decompression code work with other filesystems than ISO, as long as they're mounted read-only. The files are just gzipped with a header block indicating they are compressed.