Slashdot Mirror


NetBSD - Live Network Backup

dvl writes "It is possible but inconvenient to manually clone a hard disk drive remotely, using dd and netcat. der Mouse, a Montreal-based NetBSD developer, has developed tools that allow automated, remote partition-level cloning to occur on an opportunistic basis. A high-level description of the system has been posted at KernelTrap. This facility can be used to maintain complete duplicates of remote client laptop drives on a server system. This network mirroring facility will be presented at BSDCan 2005 in Ottawa, ON on May 13-15."

30 of 156 comments

  1. Pros and Cons by teiresias · · Score: 4, Insightful

    This would be an extremely sensitive server system. With everyone's hard drive image just waiting to be blasted to a blank hard drive, the potential for misdeeds is staggering. Even in an official capacity, I would feel really uneasy if my boss were able to take a copy of my hard drive image and see what I've been working on. Admittedly, yes, it should all be work, but here we are allowed a certain amount of freedom with our laptops and I wouldn't want to have that data at my boss's fingertips.

    On the flip side, this would be a boon to company network admins, especially those with employees at remote sites who have a hard crash.

    Another reason to build a high-speed backbone. Getting my 80GB hard drive image from Seattle while I'm in Norfolk would mean a lot of downtime.

    --
    -Teiresias
  2. Perfect for those moments... by LegendOfLink · · Score: 3, Interesting

    ...when you get that idiot (and EVERY company has at least 1 of these guys) who calls you up asking if it's OK to defrag their hard-drive after downloading a virus or installing spyware. Then, when you tell them "NO", they just tell you that they did it anyways.

    Now we can just hit a button and restore everything, a few thousand miles away.

    The only thing left is to write code to block stupid people from reproducing.

    1. Re:Perfect for those moments... by SecurityGuy · · Score: 3, Funny
      The only thing left is to write code to block stupid people from reproducing.


      Unfortunately, the relevant hardware has a very intuitive point-and-shoot interface.

  3. How long before this becomes a hack? by Bret Tobey · · Score: 4, Insightful

    Assuming you can get around bandwidth monitoring, how long before this becomes incorporated into hacking tools? Add this to a little spyware and a zombie network and things get very interesting for poorly secured networks & computers.

  4. Done this for years by OutOfMemory · · Score: 5, Funny

    I've been using der Mouse to copy files for years. First I use der Mouse to click on the file, then I use der Mouse to drag it to a new location!

  5. Maybe setup is inconvenient. by hal2814 · · Score: 2, Informative

    Maybe setup is inconvenient. Remote backups using dd and ssh (our method) were a bit of a bear to set up initially, but thanks to shell scripting, cron, and key agents, they haven't given us any problems. I've seen a few guides with pretty straightforward and mostly universal instructions for this type of thing. That being said, I do hope this software will at least get people to start looking seriously at this type of backup, since it lets you store a copy off-site.

  6. Re:use rsync by FreeLinux · · Score: 4, Informative

    This is a block-level operation, whereas rsync is file-level. With this system you can restore the disk image including partitions. Restoring from rsync would require you to create the partition, format the partition and then restore the files. Also, if you need the MBR...

    As the article says, this is drive imaging whereas rsync is file copying.

  7. What is the origin of "der" in "der Mouse" by benhocking · · Score: 2, Interesting

    I, too, immediately thought of German when I saw "der Mouse" (although in German it would be "die Maus", since Maus is feminine). Since they're located in Montreal, however, it seems unlikely that they'd be inclined to use German, and would be more likely to go for a French reference. So I ask, where does the "der" come from?

    --
    Ben Hocking
    Need a professional organizer?
  8. Re:use rsync by x8 · · Score: 2, Insightful

    What's the fastest way to get a server running again after a disk crash? With rsync, if I back up /home and /etc, I still have to install and configure the OS and other software. That could take a significant amount of time (possibly days). Not to mention the time spent answering the phone (is the server down? when will it be back up?)

    But if I have a drive image, I could just put it on a spare server and be back up and running almost immediately. That would require an identical spare server though.

    What do the big enterprises who can't afford downtime do to handle this?

  9. Re:Mac OS X by Anonymous Coward · · Score: 3, Informative

    If you want something for OSX
    I'd suggest either
    CCC (Carbon Copy Cloner)
    ASR (Apple System Restore)
    Rsync
    Radmind

    Have fun on version tracker....

  10. Right solution, wrong problem by RealProgrammer · · Score: 2, Interesting

    While this is cool, as I thought when I saw it on KernelTrap, disk mirroring is useful in situations where the hardware is less reliable than the transaction. If you have e.g., an application-level way to back out of a write (an "undo" feature), then disk mirroring is your huckleberry.

    Most (all) of my quick restore needs result from users deleting or overwriting files - the hardware is more reliable than the transaction. I do have on-disk backups of the most important stuff, but sometimes they surprise me.

    I'd like a system library that would modify the rename(2), truncate(2), unlink(2), and write(2) calls to move the deleted stuff to some private directory (/.Trash, /.Recycler, whatever). Obviously the underlying routine would have to do its own garbage collection, deleting trash files by some FIFO or largest-older-first algorithm.

    Just a thought.
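
    To make the idea concrete, here is a minimal user-space sketch in Python, not a real system library. It only covers the unlink case with FIFO eviction. The names `Trash`, `max_bytes` and the sequence-number file naming are invented for illustration; an actual implementation would have to interpose on the libc calls themselves.

```python
import os
import shutil
import tempfile


class Trash:
    """Move 'deleted' files into a private directory, evicting oldest first."""

    def __init__(self, trash_dir, max_bytes):
        self.trash_dir = trash_dir
        self.max_bytes = max_bytes   # hypothetical size budget for the trash
        self._counter = 0            # deletion order (not persisted in this sketch)
        os.makedirs(trash_dir, exist_ok=True)

    def unlink(self, path):
        """Stand-in for unlink(2): move the file aside instead of removing it."""
        self._counter += 1
        # A zero-padded sequence prefix keeps names unique and sortable by age.
        name = f"{self._counter:012d}-{os.path.basename(path)}"
        shutil.move(path, os.path.join(self.trash_dir, name))
        self._collect_garbage()

    def _collect_garbage(self):
        """FIFO eviction: drop the oldest trashed files until we fit the budget."""
        entries = sorted(os.listdir(self.trash_dir))  # oldest first
        sizes = {e: os.path.getsize(os.path.join(self.trash_dir, e)) for e in entries}
        total = sum(sizes.values())
        for entry in entries:
            if total <= self.max_bytes:
                break
            os.remove(os.path.join(self.trash_dir, entry))
            total -= sizes[entry]


def demo():
    """Trash three 6-byte files with a 10-byte budget; only the newest survives."""
    with tempfile.TemporaryDirectory() as root:
        trash = Trash(os.path.join(root, ".Trash"), max_bytes=10)
        for name in ("a", "b", "c"):
            path = os.path.join(root, name)
            with open(path, "wb") as f:
                f.write(b"x" * 6)
            trash.unlink(path)
        return sorted(os.listdir(trash.trash_dir))
```

    A largest-older-first policy would only change the sort key in `_collect_garbage`.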

    --
    sigs, as if you care.
    1. Re:Right solution, wrong problem by gordon_schumway · · Score: 5, Informative

      I'd like a system library that would modify the rename(2), truncate(2), unlink(2), and write(2) calls to move the deleted stuff to some private directory (/.Trash, /.Recycler, whatever). Obviously the underlying routine would have to do its own garbage collection, deleting trash files by some FIFO or largest-older-first algorithm.

      Done.

      --

      Ha! I kill me!

  11. nothing new by Afroplex · · Score: 2, Interesting

    Novell Zenworks has had this capability for some time in production environments. It also integrates with their management tools so it is easy to use on an entire network. To say this technology is newly discovered is a far cry from the truth. They also use Linux on the back end of the client to move the data to the server.

    It is nice, though, to have something like this in the open source world. Competition is good.

  12. Wacky idea by JediTrainer · · Score: 2, Insightful

    Maybe I should patent this. Ah well, I figure if I mention it now it should prevent someone else from doing so...

    I was thinking - I know how Ghost supports multicasting and such. I was thinking about how to take that to the next level. Something like Ghost meets BitTorrent.

    Wouldn't it be great to be able to image a drive, use multicast to get the data to as many machines as possible, but then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

    I think that would really rock if someone wanted to image hundreds of machines quickly and reliably.

    I'm thinking it'd be pretty cool to have that server set up, and find a way to cram the client onto a floppy or some sort of custom Knoppix. Find server, choose image, and now you're part of both the multicast AND the torrent. That should take care of error checking too, I guess.

    Anybody care to take this further and/or shoot down the idea? :)

    --

    You can accomplish anything you set your mind to. The impossible just takes a little longer.
    1. Re:Wacky idea by evilviper · · Score: 2, Insightful
      I must shoot down your idea. I have lots of experience with this sort of thing.

      then use BitTorrent to get pieces to any machines that weren't able to listen to the multicast (ie it's on another subnet or something) and to pick up any pieces that were missed in the broadcast, or get the rest of the disk image if that particular machine joined in the session a little late and missed the first part?

      BitTorrent poses NO advantage for this sort of thing. Why not just a regular network service, unicasting the extra data to hosts that require it? BitTorrent has lots of features that make it more useful for internet downloads, but NONE that would help on a LAN. If a node on a 100Mbps LAN is missing 1GB of an image, it can just request it from a single machine that already has it, and it will get it at 100Mbps. Requesting pieces from two or more different machines will not speed things up. BitTorrent's anti-leech technology would be useless on a LAN, as would extra hashing, as would randomized chunks, as would everything else BitTorrent does.

      The only place I think you have a real point is dealing with systems on other broadcast domains... I haven't yet seen any multicast systems that do what I needed in that case, to unicast the drive image to a machine on a different network, then have that machine multicast it to all the local machines on that network... Instead, you have to manually do that yourself, in a 2-step process, which makes the process take at least twice as long.
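
      A rough in-memory sketch in Python of the "just unicast the missing bits" approach, with no real networking involved. `Receiver`, `split_chunks` and the 4-byte chunk size are made up for the example; the point is only that a host tracks which chunks it missed and pulls them from one peer that already has the whole image.

```python
CHUNK_SIZE = 4  # bytes per chunk; tiny so the example is easy to follow


def split_chunks(image, size=CHUNK_SIZE):
    """Cut a disk image into fixed-size chunks, as the multicast sender would."""
    return [image[i:i + size] for i in range(0, len(image), size)]


class Receiver:
    """A host listening to the multicast, which may miss some chunks."""

    def __init__(self, total_chunks):
        self.chunks = [None] * total_chunks

    def on_multicast(self, index, data):
        self.chunks[index] = data

    def missing(self):
        """Indices of chunks that never arrived (joined late, dropped packets)."""
        return [i for i, chunk in enumerate(self.chunks) if chunk is None]

    def repair_from(self, peer_chunks):
        """Fetch every missing chunk from ONE peer that holds the full image."""
        for index in self.missing():
            self.chunks[index] = peer_chunks[index]

    def image(self):
        return b"".join(self.chunks)


def demo():
    image = b"0123456789abcdef"
    chunks = split_chunks(image)          # four 4-byte chunks
    late = Receiver(len(chunks))
    # This host joined late (missed chunk 0) and dropped chunk 2.
    for index in (1, 3):
        late.on_multicast(index, chunks[index])
    wanted = late.missing()
    late.repair_from(chunks)              # a single peer with the full image
    return wanted, late.image() == image
```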
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  13. Re:use rsync by dtfinch · · Score: 2, Informative

    Just make sure the backup server is properly configured (or very nearly so) I guess.

    Our nightly rsync backups have saved us many times from user mistakes (oops, I deleted this 3 months ago and I need it now), but we haven't had a chance to test our backup server in the event of losing one of our main servers. We figure we could have it up and running in a couple hours or less, since it's configured very closely to our other servers, but we won't know until we need it.

  14. Re:use rsync by Skapare · · Score: 3, Insightful

    In most cases, file backups are better. Imaging a drive that is currently mounted writable and actively updated can produce a corrupt image on the backup. This is worse than what can happen when a machine is powered off and restarted. Because the sectors are read from the partition over a span of time, things can be extremely inconsistent. Drive imaging is safest only when the partition being copied is unmounted.

    The way I make backups is to run duplicate servers. Then I let rsync keep the data files in sync on the backups. If the primary machine has any problems, the secondary can take over. There are other things that need to be done for this, like separate IP addresses for administrative access, and the network services being provided (so that the service addresses can be moved between machines as needed while the administrator can still SSH in to each one individually).

    --
    now we need to go OSS in diesel cars
  15. ghost 4 unix by che.kai-jei · · Score: 3, Interesting
  16. Re:Automatic Backup for Paranoids? by cloudmaster · · Score: 2, Interesting

    Use rsync and hardlinked snapshots. There are lots of examples out there. I rolled my own a while back, but if you want something relatively nicely polished and based on that idea, check out dirvish (I didn't find that until after I already had my system set up).

    I really like having several months' worth of nightly snapshots, all conveniently accessible just like any other filesystem, and just taking up slightly more than the space of the changed files.
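
    For anyone wondering how the hardlink trick works, here is a small Python sketch of the idea. It is not rsync and not dirvish. The function name `snapshot` and the directory names are invented for the example, and it ignores symlinks, permissions on directories and deletions of whole trees.

```python
import filecmp
import os
import shutil
import tempfile


def snapshot(source, previous, target):
    """Copy `source` into `target`, hard-linking files unchanged since `previous`."""
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dest_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the same inode, no extra space
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy


def demo():
    """Two nightly snapshots; only the file that changed takes new space."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "home")
        os.makedirs(src)
        for name, text in (("a.txt", "same"), ("b.txt", "old")):
            with open(os.path.join(src, name), "w") as f:
                f.write(text)
        night1 = os.path.join(root, "snap.1")
        night2 = os.path.join(root, "snap.2")
        snapshot(src, None, night1)
        with open(os.path.join(src, "b.txt"), "w") as f:
            f.write("new")
        snapshot(src, night1, night2)
        shared_a = os.path.samefile(os.path.join(night1, "a.txt"),
                                    os.path.join(night2, "a.txt"))
        shared_b = os.path.samefile(os.path.join(night1, "b.txt"),
                                    os.path.join(night2, "b.txt"))
        return shared_a, shared_b
```

    Each snapshot directory looks like a full copy, but unchanged files are the same inode in every night's tree.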

  17. WTF by multipartmixed · · Score: 4, Informative

    Why on earth are people always so insistent on doing raw-level dupes of disks?

    First of all, it means backing up a 40GB disk with 2GB of data may actually take 40GB of bandwidth.

    Second of all, it means the disk geometries have to be compatible.

    Then, I have to wonder if there will be any wackiness with things like journals if you're only restoring a data drive and the kernel versions are different...

    I have been using ufsdump / ufsrestore on UNIX for ...decades! It works great, and it's trivial to pump over ssh:

    # ssh user@machine ufsdump 0f - /dev/rdsk/c0t0d0s0 | (cd /newdisk && ufsrestore rf -)

    or

    # ufsdump 0f - /dev/rdsk/c0t0d0s0 | ssh user@machine 'cd /newdisk && ufsrestore rf -'

    It even supports incremental dumps (see: "dump level"), which is the main reason to use it over tar (tar can do incrementals with find . -newer X | tar -cf filename -T -, but it won't handle deletes).

    So -- WHY are you people so keen on bit-level dumps? Forensics? That doesn't seem to be what the folks above are commenting on.

    Is it just that open source UNIX derivatives and clones don't have dump/restore utilities?

    --

    Do daemons dream of electric sleep()?
    1. Re:WTF by JonMartin · · Score: 2, Interesting

      I hear ya. We've been cloning our labs with dump/restore over the net for years. Works on everything: Solaris, *BSD, Linux. Wrapper scripts make it a one line command.

      I know some Linux distros don't come with dump/restore. Maybe that's why more people don't use it.

      --
      Serve Gonk.
    2. Re:WTF by evilviper · · Score: 2, Interesting
      Why on earth are people always so insistent on doing raw-level dupes of disks?

      I can think of a few reasons. It makes time-consuming partitioning/formatting unnecessary. It does not require as much work to restore the bootable partition (i.e. no need to bootstrap to run "lilo", "installboot" or whatnot). But mainly, because there are just no good backup tools...

      I have been using ufsdump / ufsrestore on UNIX for ...decades! It works great, and it's trivial to pump over ssh:

      Full dumps work fine, despite the above limitations, and I've piped dumps over the network many times. However, I've had incrementals fail to restore a few times, so I can't trust them to work, and full dumps take much too long to do regularly. So ufsdump is a lousy option, in my experience.

      First of all, it means backing up a 40GB disk with 2GB of data may actually take 40GB of bandwidth.

      Actually, you can pretty easily solve this, though it takes quite a chunk of time.

      On any unix system, just do "dd if=/dev/zero of=zerofile". After it fills your 38GBs of unused disk space, delete the zerofile. Then, your 2GBs of data, and 38GBs of zeros will compress down to a little more than 2GBs. Writing the zerofile to disk takes forever though, but it's well worth it, especially if you will be sending the image back out to multiple machines.
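
      To see why the zerofile pass matters, here is a scaled-down Python sketch. It is purely illustrative: random bytes stand in for both the real data and the stale contents of deleted files, 4KB blocks stand in for gigabytes, and zlib stands in for whatever compressor you pipe the image through.

```python
import os
import zlib

BLOCK = 4096  # one block here plays the role of one gigabyte in the text


def compressed_size(image):
    """Size of the image after a general-purpose compressor has had a go at it."""
    return len(zlib.compress(image, 6))


def demo():
    data = os.urandom(2 * BLOCK)     # the "2GB" of live files (incompressible here)
    stale = os.urandom(38 * BLOCK)   # free space still holding old deleted data
    zeros = bytes(38 * BLOCK)        # the same free space after the zerofile pass
    dirty_image = data + stale
    zeroed_image = data + zeros
    return compressed_size(dirty_image), compressed_size(zeroed_image)
```

      Without zeroing, the compressor has to carry all the junk in the free space; with it, the image shrinks to roughly the size of the live data.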

      I used to do this to clone Windows machines. I wrote a simple C program to do the zerofile thing, then I'd multicast the compressed drive image to about 100 similar machines simultaneously. It was incredibly fast, and made Ghost look like a joke...
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    3. Re:WTF by setagllib · · Score: 2, Interesting

      You missed the point. Here you only need to copy the image once and then all subsequent writes are done on both images at once (the on-disk and the network one). That means that everything after the initial copy (assuming you begin doing this on an existing fs) is as efficient and real-time as possible, requiring no polling for changes or any scheduling. It is essentially RAID1 over a network. Although it doesn't do much against system crashes (since neither side will have the final syncs and umount writes) it does work very well against hard disk crashes, and it is also good to know that the same data is on another machine - so you can just boot into that system and get your server up, without needing to migrate disks over or reconfigure some things. Well, I don't know how close usual RAID1 is to that.
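
      A toy model in Python of what I mean, with lists standing in for the disk and the network. This is not der Mouse's code; `NetworkMirror` and its methods are names I made up. It just shows that after the one-time copy, each write costs one block on the wire instead of another full image.

```python
class NetworkMirror:
    """Toy RAID1-over-network: one full copy, then every write goes to both sides."""

    def __init__(self, local_blocks):
        self.local = list(local_blocks)
        self.remote = [None] * len(self.local)
        self.blocks_sent = 0  # how many blocks have crossed the (pretend) network

    def _send(self, block_no):
        self.remote[block_no] = self.local[block_no]
        self.blocks_sent += 1

    def initial_copy(self):
        """One-time full image of an existing filesystem."""
        for block_no in range(len(self.local)):
            self._send(block_no)

    def write(self, block_no, data):
        """Apply a write locally and forward just that block to the mirror."""
        self.local[block_no] = data
        self._send(block_no)


def demo():
    mirror = NetworkMirror([b"boot", b"etc", b"home", b"var"])
    mirror.initial_copy()           # 4 blocks sent
    mirror.write(2, b"home-v2")     # 1 more block sent, not 4
    return mirror.remote == mirror.local, mirror.blocks_sent
```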

      --
      Sam ty sig.
  18. Re:use rsync by spun · · Score: 2, Interesting

    From the article, it sounds like they are using a custom kernel module to intercept all output to the drive. This would keep things from getting corrupted, yes?

    --
    - None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
  19. The Dark Side of Image Backups by RonBurk · · Score: 4, Informative
    Image backups have great attraction. Restoring is done in one big whack, without having to deal with individual applications. Absolutely everything is backed up, so no worries about missing an individual file. etc. So why haven't image backups replaced all other forms of backup? The reason is the long list of drawbacks.

    • All your eggs are in one basket. If a single bit of your backup is wrong, then the restore could be screwed -- perhaps in subtle ways that you won't notice until it's too late to undo the damage.
    • Absolutely everything is backed up. If you've been root kitted, then that's backed up too. If you just destroyed a crucial file prior to the image backup, then that will be missing in the restore.
    • You really need the partition to be "dead" (unmounted) while it's being backed up. Beware solutions that claim to do "hot" image backups! It is not possible, in the general case, for a backup utility to handle the problem of data consistency. E.g., your application stores some configuration information on disk that happens to require two disk writes. The "hot" image backup software happens to back up the state of the disk after the first write, but before the second. If you then do a restore, the disk is corrupted as far as that application is concerned. How many of your applications are paranoid enough to survive arbitrary disk corruption gracefully?
    • Size versus speed. Look at the curve of how fast disks are getting bigger. Then look at the curve of how fast disk transfer speeds are getting faster. As Jim Gray says, disks are starting to behave more like serial devices. If you've got a 200GB disk to image and you want to keep your backup window down to an hour, you're out of luck.
    • Lack of versioning. Most disk image backups don't offer versioning, certainly not at the file level. Yet that is perhaps the most common need for a backup -- I just messed up this file and would like to get yesterday's version back, preferably in a few seconds by just pointing and clicking.
    • Decreased testing. If you're using a versioned form of file backup, you probably get to test it on a fairly regular basis, as people restore accidental file deletions and the like. How often will you get to test your image backup this month? Then how much confidence can you have that the restore process will work when you really need it?
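
    The two-write problem from the "hot" backup bullet can be shown with a tiny Python simulation. Everything here is hypothetical: a dict stands in for the disk, and the toy application stores a payload plus a CRC of it in two separate writes.

```python
import zlib


def is_valid(disk):
    """The toy application trusts its config only if the checksum matches."""
    return disk["checksum"] == zlib.crc32(disk["payload"])


def demo():
    disk = {"payload": b"mode=1", "checksum": zlib.crc32(b"mode=1")}

    disk["payload"] = b"mode=2"                 # write 1 of 2 reaches the disk
    hot_image = dict(disk)                      # imaging tool copies the disk right now
    disk["checksum"] = zlib.crc32(b"mode=2")    # write 2 of 2 reaches the disk

    # The live disk ends up fine; the image froze a state the app never intended.
    return is_valid(disk), is_valid(hot_image)
```

    Restoring `hot_image` hands the application a payload and checksum that never belonged together, which is exactly the corruption described above.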

    Image backups certainly have their place for people who can understand their limitations. However, a good, automatic, versioning file backup is almost certainly a higher priority for most computer users. And under some circumstances, they might also want to go with RAID for home computers.

    1. Re:The Dark Side of Image Backups by adolf · · Score: 2, Interesting

      Image backups certainly have their place for people who can understand their limitations. However, a good, automatic, versioning file backup is almost certainly a higher priority for most computer users.

      Great. Now, could you please enlighten us as to what a good, automatic, versioning file-based backup system might consist of?

      AFAICT, this doesn't seem to exist. It doesn't matter how much sense it makes, or how perfect the idea is. It is simply unavailable.

      In fact, the glaring lack of such a capable system almost seems to indicate that it is a victim of the "pick any two" rule.

      So where is it?

      (And, no. A few programs tied together with a ream of Perl or shell script that needs modified in order to function does not constitute a working system, and nor does a HOWTO with instructions on coding one.

      Non-programmers, believe it or not, often have important data to back up, too, and being able to code should not be a prerequisite for keeping important stuff backed up. That is, unless you programmers really do think that it'd be no big deal if your loan officer lost your mortgage just hours before closing, or when the accountant's machine trashes your financials.)

    2. Re:The Dark Side of Image Backups by Kent Recal · · Score: 2, Informative

      Ummm. Well, there's DAR and there's kdar. I think there's even a win32 version for the clueless.

      It doesn't get much easier than this. You can have a sane, incremental backup setup in a single line cronjob or even point and click one up.

      If that's not simple enough for you then you have no business storing or working with sensitive data.

    3. Re:The Dark Side of Image Backups by mrbooze · · Score: 2, Insightful

      It's not that complicated. Disk image backups and file-level backups are not intended to serve the same purpose.

      Disk image backups are pure disaster recovery or deployment. Something is down and needs to be back up ASAP, where even the few minutes of recreating partitions and MBRs is unwanted. Or it's about deploying dozens or hundreds of client systems as quickly as possible with as few staff as possible.

      File level backups are insurance for users. Someone deletes/edits/breaks something important and needs it back or an old version back, etc.

      Sometimes, separating those two business needs (DR from user restoration) is the most sensible thing to do.

  20. Not scalable. by SanityInAnarchy · · Score: 2, Interesting

    rsync is not scalable to large numbers of files. We set up a backuppc machine a while ago, tried to rsync the entire backup set over to another machine... It was a miserable failure. Even if we didn't check for hardlinks (which we have to, backuppc uses tons of hardlinks), the rsync process completely saturated a gig of RAM before it even started syncing.

    Now, rsync would have been fine if we'd unmounted the filesystem and done it on the raw partition. But there's a couple of problems with that:

    It's not live. Not a big deal for us, since it's a backup machine to begin with, but still...

    rsync doesn't do that. A couple of people have submitted patches to allow a flag for rsync to copy block devices as if they were files. They were tiny patches, but they were rejected out of a fear of users doing stupid things with them. I guess the usual Rsync Way is to duplicate the filesystem, so that devices are copied with mknod, not dd.

    --
    Don't thank God, thank a doctor!
  21. Re:DOS of the backup server by setagllib · · Score: 2, Insightful

    RTFA: It responds to heavy load by making a log (journal?) of the blocks that need backing up, and then does them when the load is lighter. If you do it on swap, then you're insane and deserve whatever you get :)
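
    My reading of that, as a Python sketch. This is a guess at the general shape, not the actual NetBSD implementation; `DeferredMirror`, `link_busy` and `drain` are invented names. The assumption is that the journal records block numbers rather than data, so a block rewritten several times under load is only shipped once.

```python
class DeferredMirror:
    """Mirror writes to a server, deferring transfers while the link is busy."""

    def __init__(self, n_blocks):
        self.local = [b""] * n_blocks
        self.remote = [b""] * n_blocks
        self.pending = set()    # block numbers that still need to reach the server
        self.blocks_sent = 0

    def write(self, block_no, data, link_busy=False):
        """Always write locally; send now if possible, otherwise just log the block."""
        self.local[block_no] = data
        self.pending.add(block_no)
        if not link_busy:
            self.drain()

    def drain(self):
        """Ship the current contents of every logged block, then clear the log."""
        for block_no in sorted(self.pending):
            self.remote[block_no] = self.local[block_no]
            self.blocks_sent += 1
        self.pending.clear()


def demo():
    mirror = DeferredMirror(4)
    mirror.write(1, b"a", link_busy=True)
    mirror.write(1, b"b", link_busy=True)   # same block again: still one log entry
    mirror.write(3, b"c", link_busy=True)
    backlog = sorted(mirror.pending)
    mirror.drain()                          # load has dropped, catch up
    return backlog, mirror.blocks_sent, mirror.remote == mirror.local
```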

    This is a good idea, even if its niche is small, but I'm interested in how it handles the encryption. If it doesn't allow key re-generation on the fly, HMACs, certificates (or at least PSKs) and other things we expect from modern (SSH, IPSec/IKE, etc) systems then it's not going to be very useful. And unless I missed something it's going to be difficult to tunnel through a system that does do these things.

    Personally I use SSH to tunnel everything possible, especially from Windows where IPSec is a joke, and the thought of sending all of my disk writes over a security system that is any less secure is a worry. Just imagine the problems if a man in the middle (or just a sniffer) catches plaintext: they know what you're doing, they know the contents of what you're doing, and very likely they know what to do to exploit what you're doing. It's a very good thing that system entropy under *nix is stored in the kernel, not on disk :)

    --
    Sam ty sig.