Cringely's P2P Backup Idea
gewg_ writes "If Napster and Bit Torrent had a baby, would it be Baxter?
As a follow-on to Cringely's
last column where he talked about having a backup strategy in the
wake of Hurricane Frances, this week he proposes a distributed RAID notion as a solution."
Baxter is, of course, the famous IRC client for BeOS. (Hi, Seth!)
Get off my launchpad!
In case they missed it.
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it."
Cringely's not the first with this kind of idea. In fact, the Freenet Project already implements something to this effect. Although not specifically designed for reliable backups, its distributed caching algorithms essentially replicate data toward where it's most often needed, helping to improve network performance and creating copies of important data along the way so that it won't be destroyed if a central server fails. Obviously not a commercial solution, but very interesting.
As a bonus, you can use it to transport data (e.g. your mp3 collection) between places, or even use it to boot Linux anywhere with much more space and document storage capability than Knoppix.
http://www.csua.berkeley.edu/~emin/source_code/dibs/
which is open source and also
http://www.hivecache.com/ which will be commercial 'real soon now'
Foldershare
We use foldershare for peer-to-peer backup, but the catch is that you invite people that you trust to your libraries.
For backup purposes, I only invite myself and just connect another computer to the account.
Thank you Mario! But our princess is in another castle!
I just went through Hurricane Ivan in Grenada. If you have been watching the coverage you should know that our island was completely destroyed. There is no water, no electricity, and no security. The university I attend (St. George's) lied to the students' parents about our situation. There were looters with guns and machetes threatening students. The first two nights we fended for ourselves with a large bonfire and homemade weapons, knives, pipes, etc. The third night we had 10 minutes to pack up and leave since we could see the looters setting fire to apartment buildings on the road we were on. I quickly took the hard drives out of my two laptops (and the external drive I have), picked up a GSM roaming phone, any cash I had, a passport and two changes of clothes. We ran to campus. Campus had about 200 male students lighting bonfires and running security teams to monitor the area. We chartered our own jet out of Grenada yesterday to Barbados, which is where I am writing this from. My point is this: no one cares about data in this situation. No one wants to know about RAID or tape backups. If it came down to it, I would have run with only a passport, a phone, and cash. We were worried for our lives and whether we had water or not; data was not our concern. People need a reality check. How many of you can claim that you went through a Category III or IV hurricane on an isolated island, fending for your lives? Not many, so quite frankly Cringely can go to hell.
That depends upon what you consider 'better'.
Large businesses have a scheduling process and hire people to swap tapes, move tapes in and out of the various facilities, rotate tapes, and replace tapes that are no longer reliable. This process is done on a 24x7x365 (plus leap days) basis. Most of the data is actually being backed up via tape silos, with 'robots' handling the actual tapes while the various backups are happening, but it is still a significant investment in people.
A small business may be able to get away with burning a CD-R or CD-RW every night with that day's transactions, and a small stack of CD-Rs (or RWs) every weekend which they take home and store in a CD spindle in their freezer, or something. Though I think you would be hard pressed to find a small business that actually does that. (I am sure there are some that do.) Monthly or quarterly they should be taking a spindle of archived data to a remote relative's place to provide further archival of data.
Mid-sized businesses are in a bit of a quandary. The number of tapes needed for a good backup is more than anyone really wants to haul around, handle and store at home, but they are not sure it is worth the expense of using a commercial off-site backup service either.
A project like this may be just what they are looking for. No tapes or disks to try to keep track of. Everything compressed and encrypted, so it is reasonably secure. Retrieval can start as soon as the replacement system is ready to start retrieving it.
I personally think it should be trialed only as a supplement to some other backup strategy, but even then, someone would decide it was either too much of a hassle, or not reliable enough.
There are even people here who think it is 'reasonable' to haul around 160 or 250 Gig hard drives to back up their critical data.
-Rusty
You never know...
This idea is poorly thought out. It has a couple of *major* flaws, imo.
#1) It doesn't recognize the reality of the complexity of backup software. Kinda easy to gloss over 'automated' backups without ever describing it. Pretty hard to imagine some piece of software that can universally back stuff up on everyone's hard drive and at the same time be very easy to use. Imagining mom/dad trying to use software with similar capabilities to Veritas BackupExec isn't easy. And.. imagine the wide variety of live files and databases that it would have to handle.
#2) Data integrity. He suggests a 1:1 ratio for backup space. Not hardly. How is he going to have any kind of redundancy with that? Crashes and people unsubscribing will happen all the time. The data would have to have a *lot* of tolerance to that.
A parity solution wouldn't be nearly enough. That assumes that only 1 failure at a time happens (using RAID 5 as my basis here). It would be easy to imagine that one person unsubscribed with part of your data and another had a crash or corruption problem.
So.. complete mirroring would be necessary. Again, it's easy to imagine 2 people's systems going offline at the same time.. so, you'd probably need more than 2x Mirror. At this point... how much is enough to ensure reliability? 3x 4x 5x ? ? ? How much do you trust your average netizen?
So.. pick your number and then divide your backup space by it. Like 5x? Add 10GB and you have 2GB usable storage. Not very good.
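To put rough numbers on that trade-off, here is a quick Python sketch. The function names are made up for illustration, and the loss estimate assumes each peer is unavailable independently with the same probability, which is a big simplification (real peers fail in correlated ways).

```python
def usable_storage_gb(contributed_gb, mirror_factor):
    """Space you can actually back up when every block is mirrored mirror_factor times."""
    return contributed_gb / mirror_factor


def loss_probability(peer_down_probability, mirror_factor):
    """Chance that every mirror of a block is gone at once.

    Assumption (not from the post): peers fail independently,
    each with the same probability.
    """
    return peer_down_probability ** mirror_factor


# The example from the post: 10GB contributed at 5x mirroring.
print(usable_storage_gb(10, 5))  # 2.0
```

Under that independence assumption, going from 3x to 5x with peers that are down 10% of the time drops the chance of losing a block from about 1 in 1,000 to about 1 in 100,000, at the cost of the usable space shown above.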
I'll just skip over the 'auto backup' of people's 40GB storage over a 128K up line for now.. already typed too much...
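For anyone who wants the back-of-the-envelope figure, a small Python sketch follows. It assumes "128K" means 128 kilobits per second, decimal units throughout, a fully saturated link, and no protocol overhead or compression; the function name is hypothetical.

```python
def upload_days(size_gb, uplink_kbps):
    """Days needed to push size_gb gigabytes through an uplink of uplink_kbps kilobits/s.

    Assumes decimal units (1 GB = 10**9 bytes, 1 kbit = 1000 bits)
    and a link that is saturated the whole time.
    """
    bits = size_gb * 10**9 * 8
    seconds = bits / (uplink_kbps * 1000)
    return seconds / 86400


print(round(upload_days(40, 128), 1))  # 28.9
```

So the first full backup alone would tie up the line for about a month, before counting any mirroring.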
Cringely is adding nothing new here. We've all already seen this on Slashdot. Hell, the website even mentions how it's like P2P but not.
I would have moderated you into oblivion given the chance.
I genuinely feel for you and your struggle for safety given the recent events, and you have my deepest sincere sympathy...
But that is not what this article is about. And how about this, given the chance to either leave my data behind or fend for myself given those circumstances...I'd stay with my data.
Perhaps your data isn't a life or death matter to you, but my stacks of CDs, DVDs and hard drives with the past 15 years of my writing, graphics, and (most importantly) my recording sessions....over 500GB by now probably...it is indeed worth it for me to ensure it is safe. Even under such circumstances. The very thought of that data no longer existing is sickening to me...
Not to undervalue your experiences at all. I mean that genuinely. But this article was about data backup--a form of backup that would have saved you even more time in your race to protect your neck.
I fail to see how this is informative to the topic at hand when all I see is someone pooh-poohing a genuine concern with a slightly related story.
I'm willing to bet far more slashdotters than just myself value their data as much, if not more...risk life and limb for it? I probably would...it is just that important to me....which is why I would want to back it up in the first place.
Plan 9's "Venti" works similarly, and is freely available.
There are also several commercial systems that do this sort of thing. It's only out of your control if you store your data blocks on someone else's machines. Doing this across several widely-distributed SANs in a large enterprise is a reasonable backup strategy.
When I was working at a factory last year, I was part of an IT team supporting 1000+ PCs. An idea I thought of, but haven't had much time or chance to flesh out, was a "peer-redundant file system," whereby all those computers could run background hosts serving up a specified amount of space for use by anyone on the same network. The space would be treated like a block of sectors on a network-based drive, allocated by a master server, and made redundant through a desired number of hosts (anytime data gets posted, it should go to at least one random host, plus any more needed for redundancy). As people leave systems on, or turn them off, their shares could be updated by peers or the master server, and the system should be able to sustain the desired space with as few as 1/3 of the hosts online. Using the space would be easy: all client systems would have the same mount or drive letter, with the background software managing the behavior of the drive.
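A toy Python sketch of the master-server bookkeeping described above, just to make the idea concrete. Everything here (the `Master` class, its method names, the `copies` parameter) is hypothetical; it only tracks which hosts hold which block and moves no actual data.

```python
import random


class Master:
    """Tracks which online hosts hold a copy of each block (illustrative only)."""

    def __init__(self, hosts, copies=3, seed=None):
        self.online = set(hosts)
        self.copies = copies            # desired number of hosts per block
        self.placement = {}             # block id -> set of hosts holding it
        self.rng = random.Random(seed)

    def store(self, block_id):
        """Place a new block on `copies` randomly chosen online hosts."""
        if len(self.online) < self.copies:
            raise RuntimeError("not enough hosts online for the desired redundancy")
        chosen = self.rng.sample(sorted(self.online), self.copies)
        self.placement[block_id] = set(chosen)

    def host_offline(self, host):
        """A PC was switched off: re-replicate its blocks onto another online host."""
        self.online.discard(host)
        for holders in self.placement.values():
            if host not in holders:
                continue
            holders.discard(host)
            candidates = sorted(self.online - holders)
            # A surviving copy is needed to copy from, and a free host to copy to.
            if holders and candidates:
                holders.add(self.rng.choice(candidates))


master = Master(["pc1", "pc2", "pc3", "pc4", "pc5"], copies=3, seed=1)
master.store("block-0")
master.host_offline(sorted(master.placement["block-0"])[0])
print(len(master.placement["block-0"]))  # 3
```

A real version would also have to handle hosts coming back with stale shares, and the master itself becoming a single point of failure.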
This setup solves two problems: one, having a network file share run out of space; two, a need for redundant backup. I suspect it could be done using existing peer-sharing software as a core.
Life is irony, and nothing ever goes as planned.
Nothing new here. Check out Berkeley's OceanStore project for an idea of a global storage solution impervious to local disasters.
Looks like he might like Pastiche.
Error correction gets a lot more sophisticated than checksums, you know. You can make a Reed-Solomon codec for 8-bit code words with 255-byte encoded blocks having any even number of parity bytes, and the way optimal RS codes work is that you can recover the original data as long as the number of missing code words plus twice the number of corrupted code words is no more than the number of parity code words you chose.
So, you divide your data into chunks 225 bytes long. Each byte in a chunk goes to a different peer, and each of the 30 parity bytes also goes to a different peer. Then, even if a dozen peers have simultaneously unsubscribed or crashed and their shares haven't been replicated on new peers yet, you can still recover all your data from the shares that remain.
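Here is a minimal Python sketch of the erasure-recovery half of that scheme. To keep it short it swaps in the prime field GF(257) and plain Lagrange interpolation instead of the usual GF(2^8) codec, so parity symbols can take the value 256 and would not fit in a byte; it also handles only missing shares, not corrupted ones. The function names are made up for illustration, and `pow(x, -1, P)` needs Python 3.8 or later.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi), ...] mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n_parity):
    """Return one share per peer: the data bytes followed by n_parity parity symbols."""
    k = len(data)
    points = list(enumerate(data))
    parity = [_interpolate(points, x) for x in range(k, k + n_parity)]
    return list(data) + parity


def recover(shares, k):
    """Rebuild the k data bytes from any k surviving shares {position: value}."""
    if len(shares) < k:
        raise ValueError("too many shares lost to recover the chunk")
    points = list(shares.items())[:k]
    return bytes(_interpolate(points, x) for x in range(k))


shares = encode(b"hello", 3)                       # 5 data + 3 parity = 8 peers
surviving = {i: v for i, v in enumerate(shares) if i not in (0, 2, 6)}
print(recover(surviving, 5))                       # b'hello'
```

With the numbers from the post you would call `encode(chunk, 30)` on a 225-byte chunk and hand each of the 255 resulting symbols to a different peer; any 225 of them are enough for `recover`.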