Cringely's P2P Backup Idea
gewg_ writes "If Napster and Bit Torrent had a baby, would it be Baxter?
As a follow-on to Cringely's
last column where he talked about having a backup strategy in the
wake of Hurricane Frances, this week he proposes a distributed RAID notion as a solution."
Baxter is, of course, the famous IRC client for BeOS. (Hi, Seth!)
Get off my launchpad!
Depending on exactly what you have stored, millions of people may want to help you back up as soon as possible.
The coolest voice ever.
I think this is old news. Some people have been backing up the source code for viruses that they wrote on Kazaa for months now.
Well, we leave the data where it belongs: in the proxy network where the processes live too. Still a bit incomplete, but maturing WebDAV and mountable slices forthcoming...
Just insert a bunch of data into the network.. record the keys and retrieve once a week then delete. That should keep the data retrievable from the network for a good while. Using two nodes would help. Plus everything is encrypted with some heavy shit.
:(
Or, just make a local-freenet on the company lan.. everything is encrypted and unretrievable without the proper keys, so it's very secure and it's distributed.. + FEC encoding.
That assumes Freenet works; AFAIK it's still fucking broken. Ian Clarke is playing too much politics with the project, and the only coder who really understands Freenet (Matthew Toseland) is swamped with ideas, day after day.. it just gets worse and worse... The donations seemed like a good idea, but after watching the DEV list for the last 18 months, I realize it's a failed project.
In case they missed it.
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it."
But on the serious side, the claim of using encryption to store data on someone's hard drive worries me. Let's say the encryption gets broken. Now you might get Aunt Nedda's cookie recipes, but then again, you might get BobCo's strategic investment plan for the next 6 months as well. I can see people signing up just for the chance to hunt through people's data.
Cringely's not the first with this kind of idea. In fact, the Freenet Project already implements something to this effect. Although not specifically designed for reliable backups, the distributed caching algorithms essentially replicate data towards where it's most often needed, helping to improve network performance and creating copies of important data along the way so that it won't be destroyed if a central server fails. Obviously not a commercial solution, but very interesting.
Ideas like Cringely's will be impossible if the INDUCE Act passes.
Save Betamax is a national Congress call-in day this tuesday to oppose the INDUCE Act. It might be our last chance to stop this bill.
I had this idea in about '97 or '98. I looked around to see if anyone else had done anything like this (remember, this is kinda pre-mass-P2P) and found that someone had done so, but as a business-scale solution. I think it was called Mango, and is still in production today. It essentially made a portion of your drive available as a drive letter, then whatever was copied onto it could be seen by all. The data was stored in at least 2 places, so if one went down, there was still one copy, and the remaining copy would duplicate, so that there were always at least 2 copies. In the end, I think nobody went for it because it was too expensive... But this is EXACTLY what a lot of small-to-medium businesses need atm. Bring on the Mangos!
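The "always at least 2 copies" behaviour described above can be sketched roughly like this. This is only an illustration of the idea, not how Mango actually worked; the function name `repair`, the node names, and the `min_copies` parameter are all made up for the example.

```python
def repair(placement, live_nodes, min_copies=2):
    """Return a new placement where every item that still has a live
    copy is re-duplicated onto spare live nodes until it has
    min_copies holders (or the spares run out).

    placement: dict mapping item name -> set of node names holding it
    live_nodes: set of node names currently reachable
    """
    live = set(live_nodes)
    repaired = {}
    for item, holders in placement.items():
        alive = set(holders) & live
        # Live nodes that do not yet hold this item, sorted so the
        # choice of new holder is deterministic.
        spares = sorted(live - alive)
        # An item with no surviving copy cannot be re-duplicated.
        while alive and len(alive) < min_copies and spares:
            alive.add(spares.pop(0))
        repaired[item] = alive
    return repaired
```

For example, if "payroll.db" was on nodes a and b and node b disappears, the surviving copy on a gets duplicated onto another live node. If both holders vanish at once, the item ends up with an empty set of holders, which is exactly the failure case several posters here are worried about.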
As a bonus, you can use it to transport data (eg. your mp3 collection) between places, or even use it to boot linux anywhere with much more space and document storage capability than Knoppix.
It's a neat idea. In a nutshell, he suggests a Peer to Peer encrypted storage network. You get exactly as much storage room as you are willing to offer yourself for others to use. When you store anything, it's encrypted and automatically spread to other systems.
It doesn't make for a very safe backup, though: What happens if somebody decides to stop the service and just deletes his local storage? You've got no more backup, at least for a while, and you might not even know it. And of course, other people have head crashes, too, which would also obliterate your backup, at least for the time it takes to recreate it from your own data. Of course, by that time, you might have deleted it yourself, either by accident or knowingly, since you have a backup after all. A viable solution would be to store every file multiple times on different remote servers, although that'd lower the storage capacity you get. It's still the right step, though.
The crucial problem is that the service provider can't really give any guarantees that you will be able to regain your lost data. With three or more independent copies in different locations, it's very unlikely that the backup won't work for some reason, but a backup that's not 100% is not a very useful one, especially in those situations where backups are really crucial.
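To put a rough number on "very unlikely but not 100%": if each remote copy is lost with some probability p over a given period, and the losses are independent (a big assumption, since disasters and service shutdowns tend to be correlated), the chance of losing every copy is p raised to the number of copies. The function name and the example value of p below are purely illustrative.

```python
def loss_probability(p_single, copies):
    """Probability that all copies are lost, assuming each copy is
    lost independently with probability p_single."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_single ** copies

# Hypothetical figure: a 10% chance that any one peer loses your data.
for n in (1, 2, 3, 5):
    print(n, "copies ->", loss_probability(0.10, n))
```

More copies shrink the risk quickly, but it never reaches zero, which is the point being made above.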
It's still a neat idea, and to my knowledge has not been done to that degree of sophistication. Of course, as others suggest, nobody is stopping you from inserting encrypted data into Freenet, but that's nowhere near as fast and secure as this could be. And while it's not a true backup, it's better than no backup at all, and most likely enough security for many persons.
Peer Pressure
If your character data was stored on everyone else's computer, it would act like a virtual server, where if a few data sets get hacked, they'd be corrected by the whole.
P2P can work in wild ways we haven't even tapped.
too bad orrin hatch is trying to outlaw p2p:
www.geocities.com/James_Sager_PA
God spoke to me.
I just went through Hurricane Ivan in Grenada. If you have been watching the coverage you should know that our island was completely destroyed. There is no water, no electricity, and no security. The university I attend (St. George's) lied to the students' parents about our situation. There were looters with guns and machetes threatening students. The first two nights we fended for ourselves with a large bonfire and homemade weapons, knives, pipes, etc. The third night we had 10 minutes to pack up and leave since we could see the looters setting fire to apartment buildings on the road we were on. I quickly took the hard drives out of my two laptops (and the external drive I have), picked up a GSM roaming phone, any cash I had, a passport and two changes of clothes. We ran to campus. Campus had about 200 male students lighting bonfires and running security teams to monitor the area. We chartered our own jet out of Grenada yesterday to Barbados, which is where I am writing this from. My point is this: no one cares about data in this situation. No one wants to know about RAID or tape backups. If it came down to it, I would have run with only a passport, a phone, and cash. We were worried for our lives and whether we had water or not; data was not our concern. People need a reality check. How many of you can claim that you went through a Category III or IV hurricane on an isolated island fending for your lives? Not many, so quite frankly Cringely can go to hell.
That depends upon what you consider 'better'.
Large businesses have a scheduling process and hire people to swap tapes, move tapes in and out of the various facilities, rotate tapes, and replace tapes that are no longer reliable. This process is done on a 24x7x365 (plus leap days) basis. Most of the data is actually being backed up via tape silos and 'robots' that handle the actual tapes while the various backups are happening, but it is still a significant investment in people.
A small business may be able to get away with burning a CD-R or CD-RW every night with that day's transactions, and a small stack of CD-R (or RW) every weekend which they take home and store in a CD spindle in their freezer, or something. Though I think you would be hard pressed to find a small business that actually does that. (I am sure there are some that do.) Monthly or quarterly they should be taking a spindle of archived data to a remote relative's place to provide further archival of data.
Mid-sized businesses are in a bit of a quandary. The number of tapes needed for a good backup is more than anyone really wants to haul around, handle and store at home, but they are not sure it is worth the expense of using a commercial off-site backup service either.
A project like this may be just what they are looking for. No tapes or disks to try to keep track of. Everything compressed and encrypted, so it is reasonably secure. Retrieval can start as soon as the replacement system is ready to start retrieving it.
I personally think it should be trialed only as a supplement to some other backup strategy, but even then, someone would decide it was either too much of a hassle, or not reliable enough.
There are even people here who think it is 'reasonable' to haul around 160 or 250 Gig hard drives to backup their critical data.
-Rusty
You never know...
This idea is poorly thought out. It has a couple of *major* flaws, imo.
#1) It doesn't recognize the reality of the complexity of backup software. Kinda easy to gloss over 'automated' backups without ever describing it. Pretty hard to imagine some piece of software that can universally back stuff up on everyone's hard drive and at the same time be very easy to use. Imagining mom/dad trying to use software with capabilities similar to Veritas BackupExec isn't easy. And.. imagine the wide variety of live files and databases that it would have to handle.
#2) Data integrity. He suggests a 1:1 ratio for backup space. Not hardly. How is he going to have any kind of redundancy with that? Crashes and people unsubscribing will happen all the time. The data would have to have a *lot* of tolerance to that.
A parity solution wouldn't be nearly enough. That assumes that only 1 failure at a time happens (using RAID 5 as my basis here). It would be easy to imagine that one person unsubscribed with part of your data and another had a crash or corruption problem.
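For anyone who hasn't looked at how single parity works, here is a minimal XOR sketch of the RAID 5 idea referred to above, using equal-length byte blocks. The function name `xor_blocks` and the sample data are made up for illustration; a real system would handle striping, padding and much larger blocks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks stored on three different peers, plus one parity block.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)

# One peer (holding data[1]) disappears: XOR of the rest rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Two peers disappear: what is left is only the XOR of the two missing
# blocks, which is not enough to get either one back.
leftover = xor_blocks([data[0], parity])
print(leftover == data[1])
```

With one block missing the rebuild works; with two missing it does not, which is why single parity is not enough when peers come and go independently.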
So.. complete mirroring would be necessary. Again, it's easy to imagine 2 people's systems going offline at the same time.. so, you'd probably need more than a 2x mirror. At this point... how much is enough to ensure reliability? 3x 4x 5x ? ? ? How much do you trust your average netizen?
So.. pick your number and then divide your backup space by it. Like 5x? Add 10GB and you have 2GB usable storage. Not very good.
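The same arithmetic as a tiny helper, assuming you get back exactly the space you offer divided by the mirroring factor (the function name is made up, and the model ignores metadata and encryption overhead):

```python
def usable_storage_gb(offered_gb, mirror_factor):
    """Usable backup space when every byte is stored mirror_factor times
    and the total space you consume equals the space you offer."""
    if mirror_factor < 1:
        raise ValueError("mirror_factor must be at least 1")
    return offered_gb / mirror_factor

for factor in (1, 2, 3, 5):
    print(f"{factor}x mirror: offer 10 GB, get {usable_storage_gb(10, factor):.2f} GB")
```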
I'll just skip over the 'auto backup' of people's 40GB storage over a 128K up line for now.. already typed too much...
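For the record, the upload time is easy to estimate. The sketch below assumes "128K" means 128 kilobits per second, that 1 GB is 10^9 bytes, that the line is fully saturated the whole time, and that there is no protocol overhead, compression, or replication; all of those are assumptions, not anything from the column.

```python
def upload_days(size_gb, uplink_kbit_per_s):
    """Days needed to push size_gb gigabytes through an uplink of
    uplink_kbit_per_s kilobits per second, with no overhead."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (uplink_kbit_per_s * 1e3)
    return seconds / 86400  # seconds per day

print(f"{upload_days(40, 128):.1f} days for one copy of 40 GB")
```

That is roughly a month of continuous uploading for a single copy, before multiplying by whatever mirroring factor you picked above.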
A company called 312, Inc. already has a commercial product for P2P backups called Lean On Me.
I don't work for them, etc.
Cringely is adding nothing new here. We've all already seen this on Slashdot. Hell, the website even mentions how it's like P2P but not.
For larger, business-driven uses, you probably want something like DataSafe. They will keep media for you in a very safe place. Or better yet, keep your whole business disaster-protected: have more than one live site for IT operations.
I would have moderated you into oblivion given the chance.
I genuinely feel for you and your struggle for safety given the recent events, and you have my deepest sincere sympathy...
But that is not what this article is about. And how about this, given the chance to either leave my data behind or fend for myself given those circumstances...I'd stay with my data.
Perhaps your data isn't a life or death matter to you, but my stacks of CDs, DVDs and hard drives with the past 15 years of my writing, graphics, and (most importantly) my recording sessions....over 500 GB by now probably...it is indeed worth it for me to ensure it is safe. Even under such circumstances. The very thought of that data no longer existing is sickening to me...
Not to undervalue your experiences at all. I mean that genuinely. But this article was about data backup--a form of backup that would have saved you even more time in your race to protect your neck.
I fail to see how this is informative to the topic at hand when all I see is someone poo-pooing a genuine concern with a slightly related story.
I'm willing to bet far more slashdotters than just myself value their data as much, if not more...risk life and limb for it? I probably would...it is just that important to me....which is why I would want to back it up in the first place.
There are several research groups doing work on distributed P2P backup systems. I know there's a group at MS doing this, as well as a group at MIT (http://catfish.csail.mit.edu/~kbarr/pstore/), and several others that don't come to mind offhand. I did a project on this in grad school, so I'm familiar with the research.
:)
There are a lot of issues here, mostly centering around the fact that you can't trust people in an open P2P network.
1) They might look at your data.
2) They might not be online when you want your data.
3) They might delete your data, or do other malicious things to it (insert viruses, etc.).
4) They might freeload by using space on other hosts and then deleting all the data they receive.
5) If a host leaves the system permanently, you need to detect that and replicate its data somewhere else. Also, how do you know whether it's leaving permanently or just logging off for a while?
#1 is easy, just encrypt the data. #2, #3, #4, and #5 are hard because data integrity is really important in a backup solution. You end up having to replicate the data all over the place to "ensure" that it'll be available when you need it, but then you've got the problem of having to donate more space than you receive to use the system. Plus, it's still not certain that your data will be available when you need it.
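One common way to attack #3 and #4 is a spot check: before uploading, the owner precomputes a few salted hashes of the (already encrypted) data, and later asks the peer to prove it still holds the bytes by hashing them with a nonce it has never seen. The sketch below is only an illustration of that idea, not a description of any of the research systems mentioned; the function names are made up, and it uses Python's standard `hashlib`, `hmac` and `os` modules.

```python
import hashlib
import hmac
import os

def make_challenges(data, count=3):
    """Owner side, run before upload: build (nonce, expected_digest)
    pairs. Each pair is good for one check, since a peer could cache
    the answer to a nonce it has already seen."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + data).digest()
        challenges.append((nonce, expected))
    return challenges

def respond(nonce, stored_data):
    """Peer side: prove possession by hashing nonce + stored bytes."""
    return hashlib.sha256(nonce + stored_data).digest()

def verify(challenge, response):
    """Owner side: compare the peer's answer with the stored digest."""
    _nonce, expected = challenge
    return hmac.compare_digest(expected, response)
```

The owner only has to keep the small list of challenges, not the data itself. An honest peer answers correctly; one that deleted or altered the data cannot. This does nothing for #2 or #5, and a peer could still pass a check and then delete the data, so it reduces the problem rather than solving it.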
Basically what I'm trying to say is that it's a hard problem.
Nothing new here. Check out Berkeley's OceanStore project for an idea of a global storage solution impervious to local disasters.