Distributed Internet Backup System
deadfx writes "Since disk drives are cheap, backup should be cheap too. Of course it does not help to mirror your data by adding more disks to your own computer because a fire, flood, power surge, etc. could still wipe out your local data center. Instead, you should give your files to peers (and in return store their files) so that if a catastrophe strikes your area, you can recover data from surviving peers. The Distributed Internet Backup System (DIBS) is designed to implement this vision."
This is just the next evolutionary change in P2P: encrypt the data and exchange the encryption key, so that only those "in the know" can exchange files and the *AA groups don't know what you are trading.
In the "Perfect Example of Talking Out of Both Sides Of Your Mouth" Department:
This is posted on the home page:
Note that DIBS is a backup system not a file sharing system like Napster, Gnutella, Kazaa, etc. In fact, DIBS encrypts all data transmissions so that the peers you trade files with can not access your data.[emphasis mine]
This is posted on the documentation page:
Make sure you give your gpg public key to any peers you want to trade files with.[emphasis mine]
Some nice folks at Stanford are also creating a different flavor of network backup called rdiff-backup. I'll just plagiarize the description from the homepage:
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership (if it is running as root), and modification times. Finally, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted.
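To make the "mirror plus reverse diffs" idea concrete, here is a minimal sketch in Python. It is not rdiff-backup's actual algorithm or storage format; the function names `make_reverse_delta` and `apply_reverse_delta` are hypothetical, and it works on lists of text lines using only the standard library. The point is that the backup keeps the newest copy in full, plus a small delta that can turn the newest copy back into the previous one.

```python
from difflib import SequenceMatcher


def make_reverse_delta(new_lines, old_lines):
    """Record only what is needed to turn new_lines back into old_lines.

    Hypothetical helper for illustration; not rdiff-backup's format.
    """
    matcher = SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged span: store indices into the new copy, not the text.
            delta.append(("keep", i1, i2))
        else:
            # Changed span: store the old lines (empty if they were added later).
            delta.append(("put", old_lines[j1:j2]))
    return delta


def apply_reverse_delta(new_lines, delta):
    """Rebuild the older version from the current mirror plus the delta."""
    restored = []
    for op in delta:
        if op[0] == "keep":
            restored.extend(new_lines[op[1]:op[2]])
        else:
            restored.extend(op[1])
    return restored


old_version = ["alpha", "beta", "gamma"]
new_version = ["alpha", "BETA", "gamma", "delta"]

# The mirror holds new_version in full; only the delta is kept for history.
delta = make_reverse_delta(new_version, old_version)
recovered = apply_reverse_delta(new_version, delta)
```

Because unchanged spans are stored as index ranges, the delta stays small when little has changed between backups.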
The homepage also links to a project called duplicity, which operates on a similar principle, but uses GnuPG to encrypt data to prevent spying/modification.
--Lawrence Lessig for Congress!
I haven't had the chance to read the article yet. Just skimmed the site. How fault tolerant is this? What happens if I need my data and a chunk is on a member that is offline? Is the data stored redundantly?
Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
OpenAFS was designed for use in low-bandwidth environments. Not only do you get the benefit of a distributed backup system, but you get inherent fault-tolerance, load-balancing, etc. Yes, over a low-bandwidth connection a file still takes a long time to copy, but OpenAFS is designed to accommodate this (not going into detail here, go to the OpenAFS site if you're curious). I am a fanatic OpenAFS user so I am somewhat biased. We have however implemented OpenAFS on a 1.4TB datastore at one of our customer sites (medical market) that has key data (a couple hundred Gig) distributed to 3 slave RO cells (again, read up on OpenAFS for answers). Rock solid reliability is an understatement.
Bacula, a similar product, does much the same job.
People have pointed out some other issues which I'd like to respond to:
* Why not just use some rsync/gpg/perl scripts to backup files from one machine onto another?
-- What if you only have one machine? Using DIBS, you can trade (automatically encrypted) files with peers without having a user account on anyone else's machine or owning multiple machines.
* What if you are behind a firewall?
-- As stated in the manual, DIBS has a number of different communication modes to allow things to work if one peer is on the Internet and another is behind a firewall. There is even a way to make things work if both peers are behind firewalls, but this is kind of kludgy at the moment.
* Why not use Gnutella/Napster/Kazaa?
-- While these file-sharing networks are terrific, they are not backup networks. Who is going to want to store the pictures from your summer vacation? Probably nobody if you are talking about Gnutella/Napster/etc. On the other hand, you can store your stuff on my machine if you let me store stuff on your machine.
* What if your peers have varying connectivity to the network?
-- The idea is that DIBS will store your data on multiple peers so if some of them are unavailable, then you get your data from somewhere else. Also, the plan for later versions of DIBS is to have your client periodically probe peers to measure how responsive they are. Once you have this information, your client will automatically prefer trading with more reliable peers.
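As a rough illustration of the last answer, the probe-and-prefer idea could look something like the Python sketch below. This is not DIBS code: `PeerStats`, `record`, `reliability`, and `choose_peers` are hypothetical names, and the smoothed success rate is just one plausible way to score peers. Each probe result is recorded per peer, and copies are then placed on the peers with the best track record.

```python
from dataclasses import dataclass


@dataclass
class PeerStats:
    """Probe history for one peer (hypothetical structure, not from DIBS)."""
    name: str
    probes: int = 0
    successes: int = 0

    def record(self, responded: bool) -> None:
        """Record the outcome of one periodic probe."""
        self.probes += 1
        if responded:
            self.successes += 1

    @property
    def reliability(self) -> float:
        # Smoothed success rate: a peer that was never probed scores 0.5
        # instead of 0 or 1, so new peers are neither trusted nor shunned.
        return (self.successes + 1) / (self.probes + 2)


def choose_peers(peers, copies):
    """Pick the `copies` most reliable peers to hold redundant copies."""
    ranked = sorted(peers, key=lambda p: p.reliability, reverse=True)
    return [p.name for p in ranked[:copies]]


steady = PeerStats("steady")
flaky = PeerStats("flaky")
newcomer = PeerStats("newcomer")  # never probed

for i in range(10):
    steady.record(i != 0)   # answered 9 of 10 probes
    flaky.record(i < 2)     # answered 2 of 10 probes

targets = choose_peers([steady, flaky, newcomer], copies=2)
```

Storing each file on more than one of the chosen peers is what lets a restore succeed when some of them are offline.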
Once again, I don't mean to imply that there are absolutely no issues with distributed backup and I welcome more comments on potential problems. However, I think these problems can be solved and that is why I'm working on DIBS.
Thanks again for all the feedback,
-Emin Martinian