Slashdot Mirror


NetBSD's Real-Time Network Backup

jschauma writes "One of NetBSD's developers, der Mouse, was interviewed by DaemonNews about his real-time network backup system (originally presented at BSDCan 2005), where changes to your local filesystem are automatically propagated to a backup server. In his interview der Mouse tells about his idea, how it works, and of course, how cool it is."

38 of 166 comments

  1. Correct me if I'm wrong by thedletterman · · Score: 5, Interesting

    But hasn't Sun been doing this with Solaris for at least 3 years?

    --
    Any fool can criticise, condemn, and complain, and most fools do. - Benjamin Franklin
    1. Re:Correct me if I'm wrong by operagost · · Score: 4, Interesting

      OpenVMS has been doing this for even longer using volume shadowing.

      --

      Gamingmuseum.com: Give your 3D accelerator a rest.
    2. Re:Correct me if I'm wrong by vertinox · · Score: 3, Funny

      But hasn't Sun been doing this with Solaris for at least 3 years?

      Yes, but do you want to sell your children and a kidney for a Solaris server?

      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
  2. Neat. by Pig+Hogger · · Score: 3, Interesting
    This is definitely the way to go. With huge hard disks that offer capacities beyond tape drives, it is less and less feasible to use traditional tape-based backup systems in many organizations, if only because of the time taken by the frigging tape drive...

    Here is the idea behind the setup I am currently using: Easy Automated Snapshot-Style Backups with Linux and Rsync.
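
    A minimal sketch of the hard-link idea that snapshot-style backups of this kind rely on, written in plain Python rather than with rsync itself. The function name `snapshot` and the directory layout are assumptions made for illustration; they are not taken from the linked article.

```python
import filecmp
import os
import shutil


def snapshot(source, backup_root, name):
    """Create backup_root/name as a snapshot of source.

    Files unchanged since the most recent earlier snapshot are hard-linked
    to it instead of copied, so every snapshot looks complete on disk while
    only changed files consume new space.
    """
    os.makedirs(backup_root, exist_ok=True)
    earlier = sorted(os.listdir(backup_root))
    previous = os.path.join(backup_root, earlier[-1]) if earlier else None
    target = os.path.join(backup_root, name)

    for dirpath, _dirnames, filenames in os.walk(source):
        rel = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for filename in filenames:
            src = os.path.join(dirpath, filename)
            dst = os.path.normpath(os.path.join(target, rel, filename))
            old = None
            if previous:
                old = os.path.normpath(os.path.join(previous, rel, filename))
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)       # unchanged: share the existing copy
            else:
                shutil.copy2(src, dst)  # new or modified: store a fresh copy
    return target
```

    Snapshot names are assumed to sort chronologically (for example `snap-001`, `snap-002`), since the latest existing directory is used as the link source.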

    1. Re:Neat. by Lord+of+Ironhand · · Score: 4, Interesting

      I prefer Dirvish, and I highly recommend that people looking for a good harddisk-based backup system take a look at it. I've looked long and hard for a good backup system and this is the first that seems to fit the bill for me.

  3. Re:B.S. D? by ThePiMan2003 · · Score: 5, Insightful

    I think the point is that it could be used for an off site backup. Raid does not protect you from Hurricanes, or even fires.

  4. Good idea, but there has to be a better way by TheFlyingGoat · · Score: 3, Interesting

    This idea is really cool, but implementing it by putting hooks into each device driver seems overly complicated. It also doesn't sound like there's any sort of priority setting for this or any type of data filtering.

    Personally I'd like to see something like the MS filesystem in development that allows SQL calls to be run against it (not sure if there's any other filesystems that are similar). Query every 5 minutes for changed data that fits the backup parameters (within the system dir, the user's home dir, certain filetypes) and then transfer the data as the network isn't being used.

    That would achieve the same thing, but more flexibly and without affecting normal use.
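
    The polling approach described above could be sketched roughly as follows. This uses an ordinary directory walk and modification times in place of a queryable filesystem; the function name `changed_since` and its parameters are hypothetical.

```python
import fnmatch
import os


def changed_since(roots, patterns, last_run):
    """Return sorted paths under `roots` whose names match any glob in
    `patterns` and whose modification time is later than `last_run`
    (seconds since the epoch)."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                if not any(fnmatch.fnmatch(filename, p) for p in patterns):
                    continue  # filtered out by the backup parameters
                path = os.path.join(dirpath, filename)
                try:
                    if os.stat(path).st_mtime > last_run:
                        hits.append(path)
                except FileNotFoundError:
                    pass  # file vanished between listing and stat
    return sorted(hits)
```

    A scheduler would call this every 300 seconds with the time of the previous run and hand the result to whatever does the transfer. Anything written and removed between two polls is never seen, which is the price of not hooking writes as they happen.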

    --
    You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
    1. Re:Good idea, but there has to be a better way by jcgf · · Score: 3, Interesting
      A hook into each driver does seem like a strange way to do this; you would think that it could be done once at a higher level.

      Query every 5 minutes for changed data that fits the backup parameters (within the system dir, the user's home dir, certain filetypes) and then transfer the data as the network isn't being used.

      Then you loose the realtimeness.

    2. Re:Good idea, but there has to be a better way by ivoras · · Score: 4, Interesting
      This idea is really cool, but implementing it by putting hooks into each device driver seems overly complicated.
      FreeBSD's GEOM is solving that: http://www.bsdcan.org/2004/papers/geom.pdf

      Also, there's "GEOM gate" on FreeBSD: http://garage.freebsd.pl/GEOM_Gate.pdf
      For other cool stuff with GEOM see here and here. See also this discussion thread about ggate's limits.

      --
      -- Sig down
    3. Re:Good idea, but there has to be a better way by disappear · · Score: 2, Funny

      Actually, he's letting the realtimeness (whatever in the heck THAT is) out of its cage, letting it loose.

      Then it runs away and hides where nobody can find it. So he also does lose it, but only because he loosed it.

    4. Re:Good idea, but there has to be a better way by Jack9 · · Score: 2, Interesting

      Device drivers would be the best solution for me. I want an exact copy of what I wrote to a physical drive. Hook, encrypt, send to another HD to repeat. Realtime, low-level. This allows it to be relatively fast (as opposed to having to process through layers of abstraction), accurate (as opposed to something an abstraction might do to it), and realtime...

      I want dual transactions. 1 for onsite and 1 for offsite. I'm not even interested in encrypting the data. I need to be able to kill my onsite immediately and failover to the offsite with a simple endroute change. I need to be as realtime as possible...Why would I want a 5 min backup? I can get near-realtime NOW with many of the systems in this thread; I just want nearer.
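
      As a toy illustration of the "hook every write and ship it elsewhere" idea, here is a sketch of an in-memory block device that queues each write for a replica. It is not how der Mouse's NetBSD code works; the class `MirroredDisk`, the 512-byte block size, and the use of a `bytearray` as the replica are all assumptions, and encryption and networking are left out.

```python
from collections import deque

BLOCK_SIZE = 512  # assumed block size for the illustration


class MirroredDisk:
    """Toy block device that records every write for a replica."""

    def __init__(self, blocks):
        self.data = bytearray(blocks * BLOCK_SIZE)
        self.pending = deque()  # (block_number, payload) not yet shipped

    def write_block(self, number, payload):
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        start = number * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = payload
        # The "hook": remember the write so it can be sent to the replica.
        self.pending.append((number, bytes(payload)))

    def drain_to(self, replica):
        """Apply queued writes to the replica in their original order
        and return how many were shipped."""
        shipped = 0
        while self.pending:
            number, payload = self.pending.popleft()
            start = number * BLOCK_SIZE
            replica[start:start + BLOCK_SIZE] = payload
            shipped += 1
        return shipped
```

      Because writes are replayed in order, the replica matches the primary once the queue is empty. How stale it is at any moment depends on how long writes sit in `pending`.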

      --

      Often wrong but never in doubt.
      I am Jack9.
      Everyone knows me.
    5. Re:Good idea, but there has to be a better way by MonkeyOfRage · · Score: 2, Insightful

      No political party has a monopoly on wisdom or ignorance

      No, but when it isn't frustrating, it's hysterical watching them try to corner the market.

    6. Re:Good idea, but there has to be a better way by ivoras · · Score: 2, Informative

      Sadly, no, not even to DragonflyBSD. Don't know why, but maybe it's because it uses kernel threads internally...

      --
      -- Sig down
  5. DoubleTake by ROOK*CA · · Score: 2, Insightful

    Sounds like it's essentially a DoubleTake daemon for BSD, cool, I wonder how well it scales? Say if you wanted to fully mesh 10 or more servers or something. Sounds like it might come in handy for keeping the content in web farms in synch as well....

  6. accidental deletion? by autopr0n · · Score: 2, Insightful

    Obviously, RAID servers don't help you in the case of accidental deletion. And they certainly don't help if your whole computer gets blown up.

    Still, you'd want to be careful with this, it would suck to back up all the temp files generated by random processes.

    --
    autopr0n is like, down and stuff.
    1. Re:accidental deletion? by Umrick · · Score: 2, Interesting

      What I want is something like Plan9's Fossil+Venti file system. Versioned, with permanent copies offloaded to archive media. It's a rather nice, though not blazingly fast, complete view of data. Not the rather ephemeral view most of us take. Restore to any point in time since inception.

      Failing that, something like OpenAFS with mirrored globally addressable volumes that can work at the system level rather than user level. Sure you can use IP security for OpenAFS, and a few brave folks have even gotten network booting to OpenAFS working... Again, snapshots are an option.

      The world view of data should really be 1. Create 2. Version 3. Archive. Instead we have as-it-stands-now and arbitrary-backup-in-case-of-failure-or-user-stupidity. Sure some people do put /etc under SVN/CVS control, but not many.

  7. Cool, but not new by BlankStare · · Score: 2, Informative

    This concept has been in play for years as a commercial product for Disaster Recovery, Veritas Volume Replicator (VVR).

  8. Re:B.S. D? by Amouth · · Score: 4, Insightful

    yes you are missing the point..

    take 10 small servers that do the front end grunt work with 2-3 backup servers that keep complete working images of the servers and have access to their data..

    if a front end server dies, service can roll over to a backend until the front end is replaced and quickly made just like the original. if a backend dies you have a second, and if all the backups die you still have the front ends to recreate the backups..

    you don't normally consider the bandwidth costs as they are typically on a high-speed network between them, and it offers you the option of replication over different connections and areas..

    all redundant disks help with is if a disk dies, not if ram or cpu fails

    some people have gotten too attached to their physical backups and tapes - personally, a backup is worthless if i can't have live access to it in a few min even if i am not physically at the point of failure..

    this isn't particularly useful for small setups but is great for mid to large scale setups and offers plenty of room to grow.

    --
    '...if only "Jumping to a Conclusion" was an event in the Olympics.'
  9. Re:Der Mouse? by Red+Flayer · · Score: 2, Informative

    "Those crazy Germans."

    No, that would be "der Maus"

    You crazy Americans -- Hier ist der Maus!

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  10. Re:Der Mouse? by isecore · · Score: 2, Informative

    Those crazy Germans.

    According to the article, he's Canadian.

    --
    I enjoy large posteriors and I cannot prevaricate.
  11. NBD? by mikeee · · Score: 3, Informative

    How does this compare with Linux Network Block Device? Sounds very similar.

    There are pretty mature commercial tools for this stuff, as well - Veritas' VVR replication comes to mind.

    1. Re:NBD? by tpgp · · Score: 3, Insightful

      How does this compare with Linux Network Block Device? Sounds very similar

      It doesn't compare at all.

      From my (quick) scan of the article - think of NBD as a replacement for NFS (well, sorta) & this as a sort of network RAID (kinda, not realtime).

      They're not really alike - for Linux, DRBD is probably closer.

      --
      My pics.
    2. Re:NBD? by SmallSpot · · Score: 2, Informative

      Not the same as NBD, but it is very similar to DRBD (http://www.drbd.org/). I've used DRBD before, and it works quite nicely.

  12. Re:B.S. D? by dpilot · · Score: 2, Interesting

    I don't actually run RAID, but I've gotten some interesting stories from some (more than 1) people who do.

    In a RAID cabinet, you have a bunch of identical drives, most likely purchased together, too. Then you submit them to an essentially identical environment and operating history. Barring a defect, and assuming wearout-type phenomena, something bad may well happen.

    The weakest drive fails first. Power down the RAID box to replace the bad drive, so you can bring it back up and restore the data. The stress of the power-down and restart is enough to kill the second-weakest drive. Now you have to go back to tape, and RAID didn't do squat. This doesn't happen all the time, but it's surprisingly more likely than you'd think - enough so that they've quit using RAID as "backup".

    Another alternative would be using different drive models, or finding some other way to change the vintage/history issue. Hotplugging drives while leaving the cabinet up would be another good idea.

    --
    The living have better things to do than to continue hating the dead.
  13. Re:B.S. D? by topical_surfactant · · Score: 5, Funny
    Raid does not protect you from Hurricanes, or even fires.

    Termites, on the other hand...

  14. Re:B.S. D? by djdavetrouble · · Score: 2, Insightful

    The weakest drive fails first. Power down the RAID box to replace the bad drive, so you can bring it back up and restore the data.

    well, no. enterprise level raid has spinning spares and hotswappable everything. you can lose two drives and still be running as long as you get those replacements in there before number 3 goes. been there, and yes, it happened when we shut down for maintenance. In the real world catastrophic failure happens. Raid is not used as a backup usually, it is used to keep data available in the event of a hard drive failure. That is why you have a tape backup every night of the raid, and an extra set offsite somewhere. We have all heard the phrase, "a backup of the backup".

    --
    music lover since 1969
  15. protection by NynexNinja · · Score: 2, Insightful

    How does this protect against an rm -rf against the filesystem... I guess it would trash the backup on the other side.

  16. Re:B.S. D? by PartialInfinity · · Score: 5, Insightful

    Why do you have to settle for one or the other? A proper backup strategy, like any security strategy, should involve more than one technology.

    Hotswappable RAID has saved my servers on more than one occasion. Likewise, the servers have also been saved by tape backups. RAID5, tape backups, and data replication all have different pros and cons.

    I think it is incorrect to say RAID5 is not acceptable in any backup strategy. The more chances you get at data redundancy, recovery, and failover, the better off your organization.

  17. Re:B.S. D? by Desert+Raven · · Score: 5, Insightful

    I don't actually run RAID, but I've gotten some interesting stories from some (more than 1) people who do.

    I'll comment on this later...

    The weakest drive fails first. Power down the RAID box to replace the bad drive...

    OK, this is where I start getting dizzy. If their data is valuable enough to have RAID, why were they such cheap bastards that they didn't get hot-swap drives? I've worked in a LOT of places that have RAID systems, and three of my own servers have RAID, yet to date, none of them were anything but hot-swap. Additionally, with a small amount of intelligence and a few extra dollars, the administrator always puts in a hot-standby drive that will automatically take over if a drive fails, allowing for the failed drive to be replaced at a more convenient time than 1:30am without sacrificing the redundancy. Sysadmins running really critical systems will often have multiple hot-standby drives.

    The stress of the power-down and restart is enough to kill the second-weakest drive.

    Now, see, here's the funny part. When you spend the bucks for SCA hot-swap drives, you actually get drives of decent enough quality that this is very rarely a problem. Even if you did have to shut the array down, which you won't because you bought proper hardware.

    enough so that they've quit using RAID as "backup"

    Further evidence of idiocy. RAID is not a backup. RAID allows you to keep running in the event of a specific type of hardware failure. But that is all it protects you from. Backups are still just as critical as they were before you had RAID. Anyone who uses a RAID array instead of proper backups deserves to have their data sacrificed to the gods of entropy, shortly followed by their own careers.

    As for my delayed comment on the first sentence... Well, I suggest you get smarter friends.

  18. Real-time accidental deletion, too. by evilviper · · Score: 3, Insightful

    This is basically RAID over the network. Personally, I can't see a lot of use for it... Just put the second drive in the machine, and use software RAID, rather than putting the second drive in a network server. Less network slowdown and congestion that way, not to mention CPU-time wasted packetizing, encrypting, etc.

    As always, RAID (and now this) is not a backup solution.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  19. Re:How is this different from Windows VSS? by Anonymous+Struct · · Score: 3, Informative

    As of Windows 2003 R2, there is a capability to do a VSS type of thing over the network to a remote server.

    I'm a little ashamed that I know that, but it's true.

  20. Re:B.S. D? by MonkeyOfRage · · Score: 2, Interesting

    I also don't see how this solution is effectively any better than RAID... If anything, a backup server is more expensive than a second hard drive for a RAID system (though it may pay off eventually). I'd think the backup server would need to be maintained as well... and if your backup ever fails, it seems like it would require a lot to set up another.

    I only skimmed TFA and it's not clear to me how like or unlike Windows' Distributed File System it is, but I'll give you a quick picture of what DFS does for us here to give you a better idea how NetBSD's backup could be handy. We've got a primary and secondary server, each with its own RAID array, and DFS isn't a replacement for it - it's a supplement to it. I'd consider this to be the same.

    For starters, when your server fails your RAID array goes with it. The data's fine of course (knock on wood), it's just not available until you either fix the server or shuffle the array into another system. Compound that with the fact that I only drop by here a couple times a week, and I'm the only person who could do this work (we're a small office). When that failure happens, the data would probably be offline for hours at minimum, and that would be a hardship in this environment. Having our data perpetually backed up on another working system that's just waiting to take over is easily worth the trouble and expense of a second system.

    In addition, DFS doesn't actually record a duplicate copy of the whole disk's file system (one-way to the backup server), nor does it work in the transactional manner that I picture this working, but it replicates files within a special share both ways. You create this share, and it isn't actually on either server - it's on BOTH servers. DFS decides which one to use and keeps the copies synchronized. If the primary server catches on fire, gets stolen, explodes etc., users would hardly notice. There's a little lag in replication sometimes, so something very recently saved in the primary copy of the share might not actually be in the secondary yet. Aside from that, almost everything else just keeps working.

    The bandwidth could be an issue in another environment, but this particular server only gets a mild-to-moderate workout, and DFS is able to keep up. There are a couple database applications that I only allow to replicate one-way because initially DFS started to choke trying to keep it synchronized both ways. For those, someone would have to switch the clients manually from using one server to using the other. Aside from those two, I can reboot either server at will without ever disturbing a user. I think that in the worst case, this is what you'd need to do with NetBSD's backup.

  21. but Multics had this in 1970 or so.... by apl73 · · Score: 2, Insightful

    It's almost a troll to even mention it, since there are so many things pioneered by Multics....

  22. oops! by krismon · · Score: 4, Funny

    Oh no! the rootkit got replicated to the backup server!

  23. DragonFlyBSD has the better way... by KonoWatakushi · · Score: 2, Informative
    There is actually a better way, and it is being implemented in DragonFlyBSD. Instead of duplicating writes at the device level, VFS operations are logged to a journal descriptor, which may be a file or network pipe. As this is performed in a VFS layer, it is possible to use with any filesystem. However, it is not limited to remotely (or locally) mirroring a filesystem; with the journal available, it will also be possible to rewind the state of the filesystem to any point in time, subject to the journal size.

    So what you have is not only a real-time backup, but also the ability to unwind any possible damage after a break-in or other event. If your backup server is only running a journalling client, it can be made extremely secure, and also provide an excellent auditing tool.

    It would also be possible to delay the streams and have arbitrarily old filesystems available. Or to use a local journal as a buffer to smooth out the IO load as it is piped off site. It could also be used to augment a non-journalling filesystem, for crash recovery purposes, assuming your filesystem provides at least some consistency guarantees. In fact NetApp does something similar by logging FS operations to NVRAM, while the filesystem only writes consistency points periodically.

    Although I haven't read about the NetBSD work, I am sceptical that they could get the error handling to work correctly at that level. With the DragonFly journalling, there is support for transactional consistency, as well as recovering from interrupted network connections. While it is not complete, much of it is in place and functional.

    See the mountctl manual page for attaching a journal to a mount, and the jscan manual page for processing the journal.
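
    To make the "rewind to any point in time" idea concrete, here is a small sketch of an operation journal that can be replayed up to a chosen timestamp. It is only an illustration of the concept, not DragonFly's journal format or API; the `Journal` class and the two operation names `write` and `remove` are invented for the example.

```python
class Journal:
    """Append-only log of (timestamp, operation, path, data) records."""

    def __init__(self):
        self.records = []

    def log(self, timestamp, op, path, data=None):
        # Records are assumed to be appended in non-decreasing time order.
        self.records.append((timestamp, op, path, data))

    def replay(self, until=None):
        """Rebuild {path: contents} from all records whose timestamp is
        at most `until`; with until=None, replay the whole journal."""
        state = {}
        for timestamp, op, path, data in self.records:
            if until is not None and timestamp > until:
                break  # everything after this point is "the future"
            if op == "write":
                state[path] = data
            elif op == "remove":
                state.pop(path, None)
        return state
```

    Replaying the full journal gives the mirror; replaying with an earlier `until` gives the filesystem as it was before, say, an accidental removal. How far back you can go is bounded by how much of the journal is kept.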

  24. Re:B.S. D? by mattyrobinson69 · · Score: 2, Informative

    In Linux a RAID array can contain any block devices, including network block devices, ramdisks, whatever.

    (I read this in a linux software RAID tutorial once)

  25. Network Appliance by SignalX · · Score: 2, Informative

    NetApp has been doing something similar for a very long time. A lot of people use the Sun boxes on the frontend to boot or attach to the storage appliance and let it do the backups. It saves space and saves the server from having to do it.

  26. Silly OBSD Troll (Re:... mock Theo.) by kjs3 · · Score: 2, Informative

    Actually, he's been around as Der Mouse since I was in college (circa 1985). I ran the xterm replacement he wrote back then. Long, long before Theo had his hissy fit and forked off OBSD. Of course, a trivial Google would have shown that, but hey, an AC wouldn't want to miss out on an ad hominem flame...