Distributed Internet Backup System
deadfx writes "Since disk drives are cheap, backup should be cheap too. Of course it does not help to mirror your data by adding more disks to your own computer because a fire, flood, power surge, etc. could still wipe out your local data center. Instead, you should give your files to peers (and in return store their files) so that if a catastrophe strikes your area, you can recover data from surviving peers. The Distributed Internet Backup System (DIBS) is designed to implement this vision."
The main problem with this approach (and for that matter Freenet) is that it is slow for all but the smallest files.
Bandwidth is still the most precious commodity in computing. Once we get fibre to every house, then distributed storage will make sense.
Get your own free personal location tracker
I guess we won't be able to slashdot this server, then.. there goes that idea.
I've got my terabyte array set up. Your "Worlds of Warcraft" data will be completely secure on my backup node.
Go ahead, send it.
I'm waiting....
It's like what Linus said:
"Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
-- Linus Torvalds, after his hard disk failed
M.
I'm not worried. %-)
A feeling of having made the same mistake before: Deja Foobar
We do this with neighboring school districts. We also back up all buildings, over the WAN and at night, to a file on the hard drive of another building. We do this in two places, so backups criss-cross. Because of the size and the time it takes, this can only happen at night and only one building per night, so there is a downside. But if a building goes down, I know I have a secondary (besides the tape in that building) to fall back on.
File sharing engines could take a lesson from this tool, in expanding the collective database. If people were forced to have data on their computer vs just having what they download, it would improve efficiency of the network and increase the number of searchable files.
to store my data. why should you trust me?
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it."
-Linus Torvalds.
...just share everything on a P2P network. Then, after a crash, just fire up your favorite client and go get your invaluable porn^H^H^H^H data files!
Geek used to be a four letter word. Now it's a six-figure one.
There are lots of
Trolling is a art,
What's with all of the "cut and paste" stories lately?
One of the things I like about Slashdot is the different takes on existing news presented by user submissions. Lately, though, many stories seem to be just copied directly from the link's website.
Why in the world would I ever put my data on someone else's machine? I spend my life keeping people out of my network.....
Distributed Internet Backup System = Gnutella
Bowie J. Poag
For all the fun files anyway.
Blaze a trail to the New World
What if it is sensitive data? Do you think even with all that cryptography and secure computing blabla people will trust storing their important files on other people's computers? think not. There are companies who put their backups into safes ... ask *them* to put it online on a slashdot reader's PC. See what they answer.
Freenet and similar networks are only good for general [public] domain data
Hello extreme programming fans? Please leave the building.
I hate liberals. If you are a liberal, do not reply.
How would this work? We all can't be assigned the small part of the internet, and be told to download it. Also, if the internet is down, it is most likely that we won't be able to access the internet either.
As has been mentioned already, [no this is not redundant, because I am writing this myself] the potential for data being stolen is too great an issue to overlook. This is not a viable option because the potential for theft is too great, and no amount of encryption will make a difference. Encryption will always be broken.
Saskboy's blog is good. 9 out of 10 dentists agree.
What is to say that the FBI/RIAA won't come to your house, claiming you have terrorist information/stolen music stored on your hard drive? And assuming it was true, would you be legally/criminally liable for it? This gives a whole new meaning to the excuse "well I was just holding it for a friend".
Otherwise, except for the lack of an available, easily accessed FAQ and documentation, this sounds like an awesome idea. How many others have this same opinion?
"This food is problematic."
I have a shell script that sends the contents of a directory on my home systems to a machine of mine at a hosting company in another state, and vice versa. Cron runs it on a nightly basis.
I always figured it was a fairly common thing for "data-conscious geeks" to do.
Of course this is aimed at users who don't have their own off-site servers.
-Pete
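A rough sketch of that kind of nightly off-site push, for anyone without a script of their own. The comment above does not say which tool the script uses, so rsync over ssh is an assumption here, and the paths, the host name and the 03:00 schedule are made up for illustration.

```python
import shlex


def build_rsync_command(source_dir, remote_host, remote_dir):
    """Return the argv list for a one-way rsync push over ssh."""
    # -a keeps permissions and timestamps, -z compresses in transit,
    # --delete removes remote files that no longer exist locally.
    source = source_dir.rstrip("/") + "/"  # trailing slash: copy contents, not the dir itself
    return ["rsync", "-az", "--delete", "-e", "ssh", source, f"{remote_host}:{remote_dir}"]


def crontab_line(command, hour=3, minute=0):
    """Format a crontab entry that runs the command once a night."""
    return f"{minute} {hour} * * * {shlex.join(command)}"


if __name__ == "__main__":
    # Hypothetical directory and host; substitute your own.
    cmd = build_rsync_command("/home/me/projects", "backup.example.net", "/srv/backup/home")
    print(crontab_line(cmd))
```

The printed line can be pasted into `crontab -e`. Running the mirror-image job on the remote machine gives the "vice versa" half.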
Soccer Goal Plans
reminds me of pre-napster warez. just "back your stuff up online" and forget to set directory permissions ...
Will you be liable if someone stores kiddie porn on your drive? Maybe not ultimately, but that won't keep you from being arrested, your computer confiscated and your name trashed in the meantime.
"As God is my witness, I thought turkeys could fly." A. Carlson
With this system all other P2P networks will go bye-bye
Why bother searching for files when I have my friend's 200GB movie and mp3 collection backed up on my machine!
It's not copying, it's a back-up! 8)
__Syo
There goes the new slashdot server I guess.
Q...
The whole point is it will be distributed. So if one jerk-off loses your encrypted data (in case that's what you mean by "trust") then there are 10-10000 other competent people to fall back on.
I went to battle MC Escher but drew a blank
It's not so much that I wouldn't trust someone not to break the encryption, but what if the person who's holding your backup copies gets tired of giving up disk storage and just deletes the software from his/her computer? Or what if their computer happens to be off when you want to retrieve the backup?
the court battles, the claims that "it was meant for legal backup purposes but we can't control what our users do"... Why not just go ahead and tape a "DCMA Me" sign on its back?
It's not a bad idea. The website talks more about security (PGP) and such, which would be my primary concern. (My porn, not theirs...)
Seriously, though... Just as with P2P networks, it depends on a strong, diverse, and reliable mesh. Any natural disaster, bandwidth failure, or even power failure could wipe out most, if not all, of your peer backups. Tried and true remains for me.
jrbd
affordable jukeboxes.
People should be able to burn DVDs and have a keg-refrigerator sized juke box with a few hundred of these in it hooked up as a near-line SCSI device.
You CAN get these but the cheap ones are 25 grand.
Anyone know why they're so expensive? I'd love a non-volatile terabyte or two.
It's Christmas everyday with BitTorrent.
I certainly know my company would never give its confidential data to others to back up ... and isn't that the most important type of data?
... how long will the encryption of today last? If I have plans for a product that will last 15 years, I don't want the plans out there to be decrypted in 10. Also... where do I store my decryption key? If that gets lost, I might as well have no backup at all.
The obvious solution is to encrypt. BUT
I grant that personal backup is time consuming and it is tough to find a good method without resorting to expensive tape or hundreds of CDs. But as intriguing as this approach is, there seem to be a lot of problems with it.
What if the reason you need to do a recovery is because your system with internet access is toast? How long does it take to restore several hundred thousand files? What about peers that drop off the network, or that are only on sporadically (no, that never happens in peer to peer filesharing networks!).
Even aside from the issues of speed of restoration, I can't imagine too many circumstances in which you want to rely on an internet connection as a prerequisite for a successful restore... Although perhaps as a way of complementing existing backup methodologies (i.e. backup root and critical config information to tape or CD, and the rest of your schiznit to DIBS) this might have a place.
Okay so you have your data on the remote machine encrypted with your PGP key. Which you kept on your local machine, or maybe you kept it on a floppy or usb keyring by your local machine.
Disaster strikes. Byebye local machine, byebye PGP key. How exactly do you recover now?
Slashdot Patriotism: We Support our Dupes!
You have a bot that alerts you anytime a new Slashdot story contains the word "distributed" ...
I think the idea behind DIBS is sound, and it's something akin to what I have done with my own networks with PCAnywhere and VNC to access remote computers to create backup copies of sensitive data off-site.
However, the problem that I have seen with this method is bandwidth. Even standard DSL/Cable broadband (what most businesses that I am involved with use for Internet connectivity) doesn't have enough bandwidth to transfer multi-gigabyte backups in a reasonable amount of time (not to mention in the era of bandwidth caps and overuse surcharges, I am not sure if it's even worthwhile). With dial-up Internet access, it would be even worse.
Plus, in the end, taking a copy of the data off-site on a regular basis isn't that terribly hard to do, is it? It's cheap insurance.
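The bandwidth complaint is easy to put numbers on. Below is a back-of-the-envelope calculator; the 80% link efficiency and the example sizes and speeds are assumptions for illustration, not measurements from anyone's network.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Hours needed to move size_gb gigabytes over a link_mbps link.

    efficiency is an assumed fraction of the nominal rate that is
    actually usable after protocol overhead and contention.
    """
    bits = size_gb * 8 * 1000 ** 3                     # decimal gigabytes to bits
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical upstream speeds for DSL, cable and dial-up.
    for label, mbps in [("DSL upstream", 0.256), ("cable upstream", 0.384), ("dial-up", 0.0336)]:
        print(f"10 GB over {label}: {transfer_hours(10, mbps):.0f} hours")
```

Note that it is the upstream rate that matters for making the backup, and on consumer lines that is usually much lower than the advertised downstream figure.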
These are the good old days you'll be telling your children about. Make them worthwhile.
Additionally, I extend a warm hand of support to Microsoft. I will accept any request by chairman Bill Gates to store sensitive files.
This raises all sorts of interesting questions. Unfortunately the answer to all of these questions is most likely "we won't know until it goes to court and there is a ruling to establish precedent."
"I don't know half of you half as well as I should like, and I like less than half of you half as well as you deserve."
So THAT is what happened to Duke Nukem Forever!
Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
Let's see, DCMA could mean...
1. Dot-Com Managers Attack (me)
2. Donkeys Can Marry Anyone
3. Disney Cuts My Ass
4. Dork Courier Management Act [surprisingly, that could actually be on-topic since couriers are now being overpaid to take tape backups to offsites]
5. Don't Comment, Mod Away
I'm not feeling very witty this morning.
By the way, I'm not going to use the preview button before posting this so if my list is f'ed up, it's your fault for promoting the non-use of the preview button.
I hate liberals. If you are a liberal, do not reply.
Well, maybe not since those holding it wouldn't be able to identify it. But it still could be used by the peds to keep their machines free of the material.
Now, I haven't looked at the code yet, but I'm assuming that the python script is simply a wrapper around an HTTP server, a MD5 or SHA1 hashing algorithm (for the filename), tar, bzip2, and some meta database to keep an index of files backed up.
That being said, it's not a bad concept. If you have a trusted friend that will allow you to back up files on his machine, this wrapper should operate nicely. Otherwise, you can always do it by hand.
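For what it's worth, the guessed-at core (tar, bzip2, a hash for the filename and a small index) fits in a few lines of Python's standard library. This is only a sketch of that guess; it is not taken from the DIBS source, the function and file names are invented, and it leaves out the HTTP transfer and the encryption entirely.

```python
import hashlib
import json
import os
import tarfile


def archive_directory(source_dir, store_dir, index_path):
    """Pack source_dir into a bzip2 tarball named after its SHA-1 digest."""
    os.makedirs(store_dir, exist_ok=True)
    tmp_path = os.path.join(store_dir, "incoming.tar.bz2")
    top_name = os.path.basename(os.path.abspath(source_dir))
    with tarfile.open(tmp_path, "w:bz2") as tar:
        tar.add(source_dir, arcname=top_name)

    # Hash the finished archive in blocks so large backups need little memory.
    sha1 = hashlib.sha1()
    with open(tmp_path, "rb") as archive:
        for block in iter(lambda: archive.read(65536), b""):
            sha1.update(block)
    digest = sha1.hexdigest()

    final_path = os.path.join(store_dir, digest + ".tar.bz2")
    os.replace(tmp_path, final_path)

    # The "meta database" here is just a JSON file mapping source path to digest.
    index = {}
    if os.path.exists(index_path):
        with open(index_path) as handle:
            index = json.load(handle)
    index[os.path.abspath(source_dir)] = digest
    with open(index_path, "w") as handle:
        json.dump(index, handle, indent=2)
    return final_path
```

Naming the archive after its hash means the peer storing it learns nothing from the filename, and the owner can verify a restored file by hashing it again.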
assert(expired(knowledge));
"Note that DIBS is a backup system not a file sharing system like Napster, Gnutella, Kazaa, etc. In fact, DIBS encrypts all data transmissions so that the peers you trade files with can not access your data."
as much as the page says it isn't a file sharing system, it essentially is - a special-purpose, secure file-sharing system. as a p2p developer, i know that this system could be built off gnutella and benefit from some of the innovations occurring in gnutella land.
smd4985
It's called Freenet
This is just the next evolutionary change in P2P. Encrypting data and exchanging the encryption key so that only those "in the know" can exchange files and the *AA groups don't know what you are trading.
In the "Perfect Example of Talking Out of Both Sides Of Your Mouth" Department:
This is posted on the home page:
Note that DIBS is a backup system not a file sharing system like Napster, Gnutella, Kazaa, etc. In fact, DIBS encrypts all data transmissions so that the peers you trade files with can not access your data.[emphasis mine]
This is posted on the documentation page:
Make sure you give your gpg public key to any peers you want to trade files with.[emphasis mine]
"Backups are for wimps. Real men upload their data to an FTP site and have everyone else mirror it." - Linus Torvalds
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Some nice folks at Stanford are also creating a different flavor of network backup called rdiff-backup. I'll just plagiarize the description from the homepage:
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership (if it is running as root), and modification times. Finally, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync. Thus you can use rdiff-backup and ssh to securely back a hard drive up to a remote location, and only the differences will be transmitted.
The homepage also links to a project called duplicity, which operates on a similar principle, but uses GnuPG to encrypt data to prevent spying/modification.
--Lawrence Lessig for Congress!
Of course the record companies wouldn't like you to back up your legitimately bought mp3's on a p2p network.
Logic, macros, and more
It's been discussed (and even tried) before. The problems were many, namely security, speed, and availability. One cannot guarantee any of those three very important variables. As a result it (the idea) died a horrible death--let's hope it dies again.
Unfortunately I think it would be bad *either* way. Now since "stolen music" is somewhat debatable here on /., and most people aren't too worried about being charged with terrorism, I'll try something more clear cut: Kiddie pron.
Ruling 1: You are responsible for what is on your HD
Result: Someone backs up their illegal pics to your harddrive (you don't know this because it's encrypted), you (innocent) get charged for it and sent to jail.
Ruling 2: You are not responsible for encrypted content that appears to have been generated by this netbackup program.
Result: Every pedophile's dream has come true. They simply encrypt their stuff and spoof it to look like someone else's backup file. They are now immune from prosecution because "it's someone else's". Same applies to anyone else that wants to store something illegal on a computer system.
Obviously there needs to be a way to positively identify who "owns" what content on your hard drive before a system like this could become [legally] safe.
I haven't had the chance to read the article yet. Just skimmed the site. How fault tolerant is this? What happens if I need my data and a chunk is on a member that is offline? Is the data stored redundantly?
Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
This sounds good, except that mirrors of my massive pr0n collection could threaten the stability of the internet...nevermind the threat of uploading mine and the millions of other pervs out there!
This should work a little differently.
Why not stripe your data across many hosts, with parity data being stored on several? A central server would maintain a list of servers containing your data. In the event of a failure, you would simply fire up the client, which would contact this server for a list of your backup "devices" and start pulling in, reconstructing and decrypting the data.
This would have a couple bonuses...
1) You could stripe it across 100 machines, and have another 100 with parity data so that any 50% of the machines can be unavailable and you can still get your data back.
2) Security - Rather than having a full copy of your data on their machine, each node only has a small subset of your data, and does not know where to find the rest of the data making reconstruction nearly impossible for the storage node. GPG would be used on top of this.
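Here is the simplest form of the striping idea, as a sketch: split the data into equal blocks and keep one XOR parity block, RAID-4 style. Be aware that this only survives the loss of a single block. Tolerating any 50% of 200 machines, as proposed above, would need a real erasure code such as Reed-Solomon, which is not shown here. The function names are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, n_stripes):
    """Split data into n_stripes equal blocks plus one XOR parity block."""
    block_len = -(-len(data) // n_stripes)             # ceiling division
    padded = data.ljust(block_len * n_stripes, b"\0")  # zero-pad the last block
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(n_stripes)]
    return blocks, xor_blocks(blocks)


def rebuild_missing(blocks, parity):
    """Recover the single block marked None from the survivors and the parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    restored = list(blocks)
    restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


if __name__ == "__main__":
    original = b"my tax returns and other irreplaceable files"
    stripes, parity = stripe_with_parity(original, 4)
    stripes[1] = None                                  # one storage node vanished
    recovered = b"".join(rebuild_missing(stripes, parity))[:len(original)]
    print(recovered == original)
```

The owner has to remember the original length to strip the padding, and each block would still be encrypted before it leaves the machine.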
In Soviet Rush, today's Tom Sawyer gets high on you.
...remote/off-site data back-up/storage with regular data integrity verification (wink-wink-nod-nod)
Your programs will be run, your emails will be read and your pr0n will be viewed often to make sure there is no bitrot. Send me new warez and pr0n - er, ah, um - I, uh, mean, daily back-ups to make sure your data is safe
If you're not on somebody's shit list, you're not doing anything worthwhile.....
And I don't want anyone else to have mine.
What if you back up something illegal?
I can keep all my files on CD-R's, CD-RW's, or DVD-R's.
(not including MP3's movies etc stuff I can always get again)
Hell I could keep them on Zip's if it weren't for some graphics I want to save.
Just back up your data, you can reinstall your programs and OS later. Tarball your project files and burn them to a CD. Most projects will fit on a CD assuming you're not a photographer.
... to an enterprise with multiple locations.
Suppose you have corporate offices, an office on the other coast, and locations in 5 Colo's.
With this, you could set up a distributed backup so that important files are distributed over all 7 sites. Since all these sites are yours, security is not such an issue.
The biggest problem I see is that you have to put files in a specific directory to back them up. You'd have to write scripts to, say, back up a rarely changing database stored on a 15 disk RAID 10.
QUESTIONS :
1.) What level of RAID equivalent is this ?
I.e., how many sites can die and still enable you to get your data back ? (This had better be more than one _in addition to the data source_ for this to be worthwhile.)
2.) Can this be used to _mirror_ data - i.e., can I do a distributed backup and mirror the data seamlessly on another site?
3.) Does all of the bandwidth for my files come from me, or is that distributed too in a peer to peer fashion ?
...the internet ate my homework.
I wanted to turn in that report but in was going for the night and his/her computer crashed!
Granted this is only for the backup, but I cannot see this being a worthwhile effort without having MASSIVE amounts of bandwidth to toss around.
g
But I am not going to be backing up to the internet. I don't want anyone else getting my pr0n. I have too much valuable data to be backing up online, stuff I don't want anyone else to see. I mean, this is a huge security hole: say you're Bill Gates and you're backing up the source code for Windows 2009 to the internet, and someone intercepts it... that would suck for you. Or what about the guy who does his bills and accounting on his home computer via Quicken or something? Watch his credit card get hacked, and he can't figure out where all the pay-per-view charges for a dish system he doesn't own are coming from. I don't know, I don't like it unless they can promise me some sort of new encryption that I have never seen before.
---
Magnetic disk is always 10-20 times more expensive than archival tape or CD. The former is $1 a gigabyte (new 200GB disks) and tape is about 7 cents a GB. Both are decreasing in price in concert.
An hour of video media is about $2 on disk and 10 cents on analog video tape.
There's a diff between gnutella and DIBS. Gnutella is pull whereas your backup system would need to push. If I'm on Gnutella, and I want my mp3's backed up, I can't guarantee that all of them will be. How many people are gonna like all the music that I like?
And no, I don't like N'sync or Britney Spears.
-
ping -f 255.255.255.255 # if only
It was designed for use in low-bandwidth environments. Not only do you get the benefit of a distributed backup system, but you get inherent fault-tolerance, load-balancing, etc. Yes, over a low-bandwidth connection a file still takes a long time to copy, but OpenAFS is designed to accommodate this (not going into detail here, go to the OpenAFS site if you're curious). I am a fanatic OpenAFS user so I am somewhat biased. We have however implemented OpenAFS on a 1.4TB datastore at one of our customer sites (medical market) that has key data (a couple hundred Gig) distributed to 3 slave RO cells (again, read up on OpenAFS for answers). Rock solid reliability is an understatement.
Ah, if only this were true. (Actually, it begs the question. =) Every time I hear "disk is cheap" I try to correct the speaker - "disk drives are cheap".
Long term storage, and subsequent retrieval, which implies administration and a reasonable expectation of longevity on the backup medium, can be very expensive.
I don't think I'd trust anything valuable and volatile to a bunch of mirrors that I don't have service agreements with. Maintaining lots of data is costly, and I don't expect Joe Mirror to pay for it for me.
If you use IBM GXP hard drives to store your data, fire, flood and pestilence may be the least of your problems
This requires a lot of trust, which is OK because I'm the sysadmin at both places.
Without trust, you need DIBS-like encryption, which (probably) means no rsync-like differential backups, and you need a "safe" way to find partners.
How about "DIBS-raid" where your data is spread over many peers? If a peer blows up, you can still recover, and no one peer should have a recognizable piece of your data.
-Martin
This .sig donated to Poets Against the War.
Fiat Lux.
While this might not work so well in the public domain, I can see where it could be feasible in an enterprise backup scheme.
Basically, your client can take advantage of peers to discover places to backup your data. Peers can be local (onsite backup) and remote (offsite backup), and when peers come offline can redistribute their data accordingly.
Eric Sarjeant
eric[@]sarjeant.com
I don't see companies using this to back up valuable/private information on the greater internet. But what about those hundreds of work stations with large hard drives that your peons are using? Use the DIBS system to back up all your shared company data. It's still all on systems you own, behind your own firewalls, etc., but it gives you untold gigabytes of backup space that is at least as fast as a decent tape backup system, but inherently cheaper.
the IT department could distribute the daemon to all work stations, and the users of the systems aren't even required to be aware of it.
Sounds great to me!
-- Obligatory Blog descramble to e-mail.
Redundant Internet Archival Administration (RIAA)
Multiple Peer Access Archive (MPAA)
Duplicate Media Copy Archive (DMCA)
People, people, people, realize that if there is a fire in your house that takes out your local copy of "The Sims Hot Date", then it is also going to burn up your serial number. Be sure when you send me your iso's that you include a text file with your serial numbers...for archival purposes.
Mordor...a magical, mythical land where women are more rare than dragons--but where every man would rather find a dragon
Not just the US too ... suppose someone from Upper Ruratania stores something on your PC that is illegal in Upper Ruratania... will you be extradited?
Security aside, I fear that we would see a similar situation to the one we encounter all too frequently on the P2P networks. Users set their download speed to the maximum possible, yet throttle back outgoing data to the absolute minimum, rendering them useless to others. I would hope that this won't happen, but I'm becoming cynical in my old age.
Modest doubt is called the beacon of the wise. - William Shakespeare
I have about half a terabyte of sensitive, important data that needs to be backed up and stored securely offsite every day (This data is just the important stuff. No OS files, etc.) and archives of records stored on several CD-Rs that also need to be stored offsite. The only dependable(?) solution we can commit to is tape backup. We use an Exabyte EZ17 autoloader and Veritas Backup Exec.
You guys wouldn't believe the nightmares I've gone through to get it running smoothly and keeping it there. 5 or so replaced EZ17s, 50 $80 tapes replaced, hours upon hours spent on the phone with Veritas because their software is buggy as hell and their open file option is a piece of shit written by another company (Veritas support was the one to tell me that!). My boss seems to think that we're the only ones that have issues with backups (He's the type that has no opinions. He KNOWS everything.), but I've talked with other administrators with a lot of servers and data using a plethora (Three Amigos vocabulary) of various backup products. We all agreed that backups are a pain in the ass.
How long will it take another distributed computing project to crack a GPG key from this new DIBS file on my hard drive? ;)
Fortunately, I don't keep anything critical on a computer connected to the internet, but there is definitely stuff on it I wouldn't want someone poking around to get.
Heaven forbid they steal all my pr0n!
Manipulate the moderator system! Mod someone as "overrated" today.
If you had read the DIBS introduction on the linked page, you would have seen the following:
Note that DIBS is a backup system not a file sharing system like Napster, Gnutella, Kazaa, etc. In fact, DIBS encrypts all data transmissions so that the peers you trade files with can not access your data.
Or if your anonymous backup partner turns out to be the target of a long-running international paedophile investigation, and your machine is seized as evidence. Even if you can claim you had no way of decrypting the data, the FBI still have your hard disk.
I think that people who worry about "putting their files on other people's machines" should go over the docs once more.
There are no trolls. There are no trees out here.
So what if your entire drive is backed up across a huge distributed network? And let's say Joe User had backed up cache files, etc. that contained personal info (credit numbers, child pr0n, etc). Joe User could become one screwed individual. It's a huge risk that the average user might be taking unknowingly...
If water were beans, I'd be 70% beans.
The main problem with this approach (and for that matter Freenet) is that it is slow for all but the smallest files.
How much data can you *really* produce in a day? Compared with downloaded files and installs, which could be kept as cached images from remote sites, most people produce mostly keystrokes and mouse moves. Last time I checked, the Internet had no problem keeping up with me. Granted, some people create multimedia content.
Cutting off external mass-input data sources, Would it be possible to have a computer on the other side of the planet keep up with all my keyboard- and mouse actions, basically allowing it to create a twin of my system?
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
What the system needs is the concept of a heartbeat-based contract; i.e., a line in the partner data file which says that both machines will attempt to ping each other so often (every hour perhaps, or more often if they're both always online) and that if you don't hear from each other for a certain period (say, 48 hours, a week, a month, depending on circumstances and urgency), you can assume that they're gone and nuke their data (and vice versa).
Ideally, the ping mechanism should have some sort of cryptographic handshaking so that the other party can't falsely claim that you were offline if they prematurely delete your data. (If the data is lost, there should be a mechanism for signalling this back to the data's owner so it can be replaced or the contract ended. Perhaps a reputation-based mechanism for dealing with cheats could also be useful.)
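A minimal sketch of what such a heartbeat contract might look like on one side of a partnership. The class and method names are invented, and using an HMAC over a pre-shared key is my assumption. It stops a stranger from forging pings, but since both partners hold the same key it cannot prove to a third party who sent what; that would take public-key signatures.

```python
import hashlib
import hmac


class HeartbeatContract:
    """Tracks authenticated pings from one backup partner."""

    def __init__(self, shared_key, timeout_seconds, started_at):
        self.shared_key = shared_key
        self.timeout_seconds = timeout_seconds
        self.last_seen = started_at        # the contract start counts as first contact

    def sign_ping(self, timestamp):
        """Produce the tag a partner sends along with its ping."""
        message = str(int(timestamp)).encode()
        return hmac.new(self.shared_key, message, hashlib.sha256).hexdigest()

    def receive_ping(self, timestamp, signature):
        """Accept a ping only if its tag matches; return True if accepted."""
        expected = self.sign_ping(timestamp)
        if not hmac.compare_digest(expected, signature):
            return False
        self.last_seen = max(self.last_seen, timestamp)
        return True

    def partner_expired(self, now):
        """True once the partner has been silent longer than the agreed timeout."""
        return now - self.last_seen > self.timeout_seconds


if __name__ == "__main__":
    contract = HeartbeatContract(b"pre-shared secret", timeout_seconds=48 * 3600, started_at=0)
    contract.receive_ping(3600, contract.sign_ping(3600))
    print(contract.partner_expired(now=3600 + 49 * 3600))
```

Only when `partner_expired` turns true would a node be entitled to delete the partner's data. A real system would also want replay protection and a signed log of pings to settle disputes.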
Yes, I know, fire, flood etc. are the common reasons for not keeping the backups at the same location. But have you considered this one ?
You never know what can enter your server room =)
This sounds good, but it's not exactly original. I'm doing this right now (DFS and file replication) between our servers to replace our offsite backup service. And, I can tell you firsthand that it's as easy as 1-2-3 in Windows Server 2003 (no more ".Net" in the name).
I will probably get modded as a Troll, but I have to honestly say that it has never been easy to accomplish this in Linux or even in Windows 2000. I hope Linux better supports this in the future -- it simply lost a place on five of our servers because of the pitiful support for DFS or DFS-like file replication. And I'm not talking about some custom server solution package, IT people should be able to add it easily to an existing server.
*cough*Porn*cough*
Backups are for wimps. Real men upload their data to an FTP
site and have everyone else mirror it. -- Linus Torvalds
Compress it, encrypt it, split it into 1K chunks, and interleave it among backup servers indexed by hash value. Cracking the encryption and getting anything useful out of it will depend on knowing where each chunk belongs. The low-redundancy compressed plaintext will also help to make cryptanalysis difficult.
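Roughly what that pipeline could look like, as a sketch. The encryption step is deliberately left as a comment because Python's standard library has no cipher, so in practice you would run the compressed data through GPG or similar before chunking. The in-memory "servers" and the function names are stand-ins for illustration.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024  # the "1K chunks" from the comment above


def split_into_chunks(payload, chunk_size=CHUNK_SIZE):
    """Cut payload into consecutive pieces of at most chunk_size bytes."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


def place_chunks(data, n_servers):
    """Compress, chunk and scatter data; return (manifest, servers)."""
    compressed = zlib.compress(data)
    # An encryption step would go here, before chunking (not shown).
    servers = [dict() for _ in range(n_servers)]   # each dict stands in for a remote host
    manifest = []                                  # ordered chunk digests, kept by the owner
    for chunk in split_into_chunks(compressed):
        digest = hashlib.sha256(chunk).hexdigest()
        servers[int(digest, 16) % n_servers][digest] = chunk
        manifest.append(digest)
    return manifest, servers


def reassemble(manifest, servers):
    """Fetch chunks in manifest order and undo the compression."""
    n_servers = len(servers)
    compressed = b"".join(servers[int(d, 16) % n_servers][d] for d in manifest)
    return zlib.decompress(compressed)
```

The manifest is the secret that says where each chunk belongs, so it has to be backed up somewhere safe too, along with the key.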
..this is exactly one of the tenets of a good personal survival/preparedness plan. You exchange with a friend or relative in another geographical area a set of "basics". Basics as in long term stored food, extra clothing, various gear, copies of important legal documents, etc, etc. whatever you consider to be important, and that is a personal variable. Then in case one of the two homes is destroyed in some manner,or you are forced to evacuate, you still have something to start over with and live on rather than losing ALL your day to day tangible wealth.
Makes sense to do it with data as well. On a personal level with computing, it could be as simple as snail mailing burned cd's to each other, along with sending it over the net, but you can't beat that snail mail price and effectiveness for mass quantities, especially if all you have is dialup speed access. The important part is it should be "more" than just one building over, it really needs to be at least in another city as a minimum distance.
First of all, this seems like a reasonable thing to do in addition to other backup methods. What is it going to hurt you? lose a little bandwidth at night when no one is using it anyway? Whatever.
I do have visions of some poor soul generating a public and private key for this system and only storing the private key data on the machine being backed up:
"well, the fire took out everything, but not to worry, I've been using a distributed backup service for months now. We can just get online, download the data and decry...*dammit*"
that Napster was just an implementation of this idea.
People are just helping others archive their legitimately purchased CDs.
Scott
How do you guarantee they all get backed up?
:)
rename everything so it has a prefix of "porn-" on the front of the file.
Bowie J. Poag
- Create an NFS connection between PC's and the backup host. Directly tar or copy files to the host via a simple backup script (same as a tape script, but pointing to a file on the host)
- Tar files, then securecopy (SCP) them to a remote host - or even do so directly
- You could even (in a pinch) use samba (smbd, smbclient) to connect two PC's, and run a backup script
Just wondering... I'm actually looking at implementing some of these so it would be nice to know why this project is better.
This is impossible for enterprises that need privacy (all of them, for the most part).
For some (any that must be HIPAA compliant), it is probably illegal.
Iron Mountain specializes in this field, and has been doing it forever. This may be a nice intellectual pursuit for an undergrad student, but it really has very little practical value. Shared directories on VPNs are pretty much functionally equivalent and easier to manage.
A: None. The Universe spins the bulb, and the Zen master merely stays out of the way.
Seriously, what would be the legal ramifications if illegal data was stored on someone else's computer?
Would this backup system be an easy way to hide illegal content?
What if the RIAA went after someone for keeping a bunch of legal MP3s?
Too many cans... Too many worms...
yes, you are.
keep up the good work.
do we mind them backing up our mp3s on high quality compact discs, available for retrieval at a music store near you?
Nah! Real men use duct tape.
Silly putty is good too. Press it on your data and it picks it right up!
Slashdot, home of supporters of free software, free music, and free speech. Except for Moderators that disagree with you.
And when my hard drive dies, who has the key to decrypt my backups? If someone else stores that information, then it's insecure. If I write it on a piece of paper, it's bound to go through the washing machine. This idea just seems flawed to me.
+1 Insightful.
Wish I had mod points today.
*sigh* back to work...
Sure it wouldn't save me if my house burned down, but I'd like to find a tool that would do this easily and efficiently between machines in my house, keeping track of the free space available on each machine and deciding where to put the backup copies for me.
I have plenty of storage to keep two copies of everything that matters, but it won't all fit in one place and it's a pain to try to figure out where I can back everything up, and to rearrange it when disk space gets too low on one machine. I'm imagining a program that would run on each machine, watching the space available and the list of "local" files that have been designated as important enough to back up. Each machine could then "negotiate" with the others to make sure that everything exists on at least two hard drives, and could notify me via e-mail that I need to buy more disk whenever there's not enough room for all of the backups. The database showing what files are where would need to be on all of the machines.
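A minimal sketch of the placement logic described above, assuming each machine can report its free space and the set of important files it already holds. Machine, plan_copies, and the example names are hypothetical; a real tool would also need the network negotiation and the e-mail notification.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    free_bytes: int
    files: set = field(default_factory=set)


def plan_copies(machines, file_sizes, copies=2):
    """Decide where extra copies go so every file lives on `copies` machines.

    Returns (placements, unplaced): placements is a list of (file, machine)
    pairs; unplaced lists files for which no machine had room, which is the
    cue to tell the owner to buy more disk.
    """
    placements, unplaced = [], []
    # Place the largest files first so they are not squeezed out by small ones.
    for fname, size in sorted(file_sizes.items(), key=lambda kv: -kv[1]):
        missing = copies - sum(fname in m.files for m in machines)
        while missing > 0:
            candidates = [m for m in machines
                          if fname not in m.files and m.free_bytes >= size]
            if not candidates:
                unplaced.append(fname)
                break
            target = max(candidates, key=lambda m: m.free_bytes)
            target.files.add(fname)
            target.free_bytes -= size
            placements.append((fname, target.name))
            missing -= 1
    return placements, unplaced


# Example: sizes are in arbitrary units, names are invented.
machines = [
    Machine("desktop", 100, {"music"}),
    Machine("laptop", 50, {"docs"}),
    Machine("fileserver", 500),
]
placements, unplaced = plan_copies(machines, {"music": 200, "docs": 10})
```

The greedy "most free space wins" rule is only one possible policy; it keeps the example short but does nothing clever about rebalancing when a disk fills up.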
Of course, this wouldn't eliminate the need for *real* backups of the important stuff (e.g. finances), but that stuff tends to be small enough that I can burn it on a CD and put it in my safe deposit box. I have plenty of other stuff that is too big for CD, not quite important enough for off-site storage, but would be a real pain to lose just because a drive went down. For example, I recently thought I might have lost my MP3/Ogg collection, and it took me a long time to rip and encode that 25GB of music. As it turned out, the music was on a partition on the second HDD on my fileserver, not the first HDD, which was toast.
It seems like this might be of significant use for small offices as well.
Does anything like this exist?
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
I can't see why anyone would entrust potentially crucial backups to entities they have no control over. The reasons for not doing this have all been brought up in the comments already.
Off-site backup is key, yes, but the only way it works RELIABLY is if you have complete ownership of your backup data. If I had mission-critical data I needed to protect, I would RAID the disk locally, and then do incremental backups to tape (stored off-site) PLUS mirror the data via the net onto another disk (or RAID), located in another one of my company's datacenters or colo facilities.
After a few months of working on my thesis, I started to think [I know, I should have started to think before...]
So IIt relieved a little of the anxiety. [OTOH, if any of your data causes you that much worry, a redundant backup will still not reduce your anxiety to zero.]
"Provided by the management for your protection."
Most PCs come with a recovery CD. Only back up across the net stuff that isn't on the recovery CD. (Globally attrib everything as backed up when the PC is installed, and do incremental backups.)
An alternative for home built PCs: burn two CD-RW backup sets on alternate weeks, storing the previous week's collection at a buddy's home, or in a safe deposit box, or some other secure location, and do daily incremental backups online, with a discard option for any backup over two weeks old.
One option with the collection of CD-RWs would be, if you keep them with whoever provides your storage online, the CD-RWs could be put online to download across a broadband connection. This would be faster than overnight delivery, but not as fast as a courier across town.
Just some ideas.
-Rusty
You never know...
: Abe simpson: "i knew jfk's dark secrete." flashback: jfk: "ich bein ein berliner." abe: "he's a nazi! Get em."
This project is doomed...
"a fire, flood, power surge, etc. could still wipe out your local data center. Instead, you should give your files to peers"
Well, this sounds like how insurance got started. A bunch of businesses decided to divide their stuff onto different ships just in case of fire, a storm, or other disasters. So if this idea goes on, do you think that there will end up being a 'Cyber Geico' for all of your backup needs?
DIBS is a great idea, but it seems to me that a simpler solution would be just to cook up some shell / perl scripts that use gpg and rsync.
However, if DIBS could imitate a network version of something like RAID striping so that you could recover entire files from various portions stored on multiple hosts, and thereby increase the probability of getting all of your files back whenever you wanted them regardless of who happens to be online / accessible at the time - that would be cool! Although it seems to me that such a situation would require several times more disk space on the part of other computers, in order to store redundant copies, than the files require themselves - maybe such a system would require that you "donate" to the network 3 times more disk space than you want to use.
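To put rough numbers on the trade-off above, here is a small sketch comparing three plain copies with striping where any 4 of 12 pieces rebuild the file. Both cost 3x the disk space. The figures are assumptions for illustration only: peers are treated as independent, and the 50% chance of a peer being online is invented.

```python
from math import comb


def availability(n, k, p):
    """Chance that at least k of n peers are reachable.

    Assumes each peer is online independently with probability p,
    which is a simplification.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overhead(n, k):
    """Disk space used on the network per byte of your own data."""
    return n / k


P_ONLINE = 0.5  # invented figure for the example

three_copies = availability(3, 1, P_ONLINE)      # any 1 of 3 full copies
striped_4_of_12 = availability(12, 4, P_ONLINE)  # any 4 of 12 pieces

print(overhead(3, 1), round(three_copies, 3))
print(overhead(12, 4), round(striped_4_of_12, 3))
```

Under these assumptions the striped layout is more likely to be recoverable for the same 3x donation, which is the usual argument for spreading smaller pieces over more hosts.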
There are a thousand forms of subversion, but few can equal the convenience and immediacy of a cream pie -Noel Godin
In order to recreate your data wouldn't the same people who you stored your data with have to be online? This might work amongst a group of friends but it wouldn't just allow anyone to connect to this network and magically upload a backup of your data to someone you don't know. That is unless you weren't that concerned about being able to restore it.
Are you a VF grad? Check out the VFMA Alumni Forums
The central server only knows where the bits 'n pieces are stored of your encrypted data, but it does not ever get the key to decrypt it. The worst that could happen when the server is compromised is that somebody else could get the full encrypted datastream, which is only a bit more useful than polling
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]
Rich.
libguestfs - tools for accessing and modifying virtual machine disk images
Sounds a lot like iFolder from Novell.
The client or "agent" normally runs on win32, but can also be run as a java plugin from a web browser.
iFolder
I had this idea back in the days of uucp. You don't bother with local copies of files. Instead, you just continuously uucp your files around a large bang-path back to you. When you call open(), it just waits for the file to show up. With a bit of luck the latency on open shouldn't be more than a day or two. :)
Some shameless self-promotion that addresses some of the questions that have appeared so far. We investigated peer-to-peer backup in a class project and wrote up the results in a technical memo. The abstract:
In an effort to combine research in peer-to-peer systems with techniques for incremental backup systems, we propose pStore: a secure distributed backup system based on an adaptive peer-to-peer network. pStore exploits unused personal hard drive space attached to the Internet to provide the distributed redundancy needed for reliable and effective data backup. Experiments on a 30 node network show that 95% of the files in a 13 MB dataset can be retrieved even when 7 of the nodes have failed. On top of this reliability, pStore includes support for file encryption, replication, versioning, and sharing. Its custom versioning system permits arbitrary version retrieval similar to CVS. pStore provides this functionality at less than 10% of the network bandwidth and requires 85% less storage capacity than simpler local tape backup schemes for a representative workload.
http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TM-632.pdf
@techreport{pstore:2002,
author = {Christopher Batten and Kenneth Barr and Arvind Saraf and Stanley Trepetin},
title = {{pStore}: A Secure Peer-to-Peer Backup System},
institution = {Massachusetts Institute of Technology Laboratory for Computer Science},
year = 2002,
month = {October},
type = {Technical Memo},
number = {MIT-LCS-TM-632},
}
..when I was working in a cash-strapped code shop. Doing backups required either your own personal supply of floppy disks, or tracking down the company DAT drive. Floppies soon became far too small. Using the DAT drive was time consuming, and - as was found when someone actually tried to restore a backup - was in fact broken. One of the programmers had a fairly new pc with a huuuuge 2Gb hard drive, however, and I found it easier to use our string and plastic cup network to copy all my stuff into a subdirectory in the depths of his c:\windows directory. It was six months before he discovered where his HD space was disappearing to...
Break the data you want to back up into "stripes" like a RAID array. Encrypt these stripes. Swap stripes with other users. Host these files at any URL you have control over -- your PC, the free web space your ISP gives you, an FTP site, whatever.
Give the user the option of only backing up data stripes with a select group of users (people they know and trust) or with any random user. Let the user control the ratio per user (this guy I trade with one for one, but a stranger must host 3 of my files for every one I host for him).
You send the encrypted data, you get a received confirmation with the URL, you check the URL to make sure it's there. The confirmation has a leased-until date just like your IP address from a DHCP server. The program either renews the lease when it is almost up or finds a new home for your data.
Whenever you need your data, you hit those URLs and reassemble the data from the encrypted stripes.
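The lease bookkeeping in the scheme above could look something like the following sketch. StripeLease, triage, the two-day renewal window, and the example.org URLs are all invented for illustration; nothing here talks to a network.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StripeLease:
    stripe_id: str
    url: str           # where the peer says the encrypted stripe is hosted
    expires: datetime  # the "leased until" date from the confirmation


def triage(leases, now, renew_window=timedelta(days=2)):
    """Sort leases into (ok, renew, rehome).

    renew  -> lease is about to run out, ask the same peer to extend it
    rehome -> lease already lapsed, find a new host for that stripe
    """
    ok, renew, rehome = [], [], []
    for lease in leases:
        if lease.expires <= now:
            rehome.append(lease)
        elif lease.expires - now <= renew_window:
            renew.append(lease)
        else:
            ok.append(lease)
    return ok, renew, rehome


now = datetime(2003, 1, 10)
leases = [
    StripeLease("s1", "http://example.org/~alice/s1", datetime(2003, 1, 20)),
    StripeLease("s2", "http://example.org/~bob/s2", datetime(2003, 1, 11)),
    StripeLease("s3", "http://example.org/~carol/s3", datetime(2003, 1, 9)),
]
ok, renew, rehome = triage(leases, now)
```

Running this once a day would cover the "renews the lease when it is almost up or finds a new home" step; checking that the URL still serves the right bytes is a separate job.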
Or musician, or film maker, or sound designer, or graphic designer, or 3D animator...
Security? Goodness, it's because of plans like this that we need backup in the first place. ;) If you put your sensitive files on someone else's drive, bad things will happen.
My backup solution right now might be a bit involved, but it works. The public server can be restored at the drop of a hat with a kickstart CD, so that one's taken care of. The Macs are backed up via Retrospect Workgroup to one central Mac, and the contents are dumped to DDS4 tape. The internal Linux server's backed up to DDS4 tape. The Mac and Linux full backups are done twice. One copy stays here, the other goes into a safe deposit box at a bank a ways from here. Incrementals are done automatically every night and stashed in the safe -- more for fire protection than thief protection.
It's not foolproof (there are no foolproof methods with computing), but I've used the same method for about three years now, and it's gotten me through some disasters with little or no data loss. Now, if the house safe and bank's vault are both destroyed by fire, earthquake, or nuclear explosion, then I'm screwed. 'Course, I'd probably have bigger concerns on my mind at the time...
If I want to back up 50 gigs of stuff, then with a conventional system, I just back it up onto a tape and I'm done.
With a distributed internet job, I'd back it up onto several machines on the internet, but in return I'd have to take someone else's backup.
So, at the most I'd be expected to have 50 gigs' worth of other people's data on my hard drive.
If I have a small hard drive, this might not be possible. How does the space used get limited? If users can limit it, what's to stop them setting it at 0 (or some other equally low number) and effectively "leeching" off others?
Avantslash - View Slashdot cleanly on your mobile phone.
You mean you've got DIBS on my data? How fscking (in)secure is this? Yeah, I'll store my personal stuff on some box I've never seen....
I want to delete my account but Slashdot doesn't allow it.
can slashdot stop featuring vanity-projects from .. is that too fucking much to ask? ..
idiots?
please?
I guess "security" and "limited bandwidth" don't exist in their world. Wonder what would happen if I tried to back up my terabyte database....
Since only you should have access to your data, you wouldn't need asymmetric encryption for this (i.e. not RSA, PGP, etc). You could just stuff your files through some traditional encryption scheme, and you'd have tonnes of security. A 1024 bit key, for instance, would stop anything short of a quantum computer for the next few decades, at the very least.
I wonder if running a back-up fileserver inside one of those fire-proof safes would work for situations like these. A company could probably afford to have it modified with a couple of holes for cat5, power cables, and air flow. Then maybe a sensor that detects extreme heat and closes the holes (destroying the cables, but a patch and power cord aren't expensive).
Ansi's and stupid tricks!
by the time you broke the encryption on my database of credit cards, they'd all be expired (assuming i used a tried-and-true algorithm and a long enough key). encryption isn't perfect, but it doesn't have to be. just good enough.
I ditched using tape a couple of years ago after comparing tape libraries, tape drives, and hard drive costs.
Setting up the systems to all dump to a remote ftp client is trivial -- and most actual backup programs such as Retrospect do just that. I just happen to prefer a 3DES/Blowfish tar gzip'd file myself (maybe not in that order :).
The only odd ball util I'm using is the encryption program which is located across all backup systems. Easily compiles under the Linux's, BSD, and OS X [today]. Other needed tools such as gzip, tar, and ftp are readily available...
Fortunately I don't see the need to backup entire computers -- I'm just after the data. Rebuilding a Linux box from scratch _with_ all the configuration files is trivial -- heck, I just did it to rebuild a needed Netware 3.12 server (!) [so I *know* my backups _are_ working :-]
Stuffing a couple of 120G (or bigger) IDE into any old whatever computer is trivial. This I've done locally at the office. In transit is the portable firewire Lacie drive (30G). In my basement is another RAID-1 system matching the one at the office which is where the transit info gets transferred. I've even added ANOTHER remote system for just another copy of a copy of a copy. Because I can.
Delete the oldest day's backup. Daily backup. Repeat.
For everything I'm at about 4G for data. Replacing the 120G drives will happen when I see +250G versions or as needed. Of course this entails a little data management as well.
Larger file collections which quickly go stale are offloaded to CD/DVD (x2). Pictures for some job in 1998 for example. File and catalog CD's as required. Sure I also personally have gigs' worth of movies, sounds, etc that I need to backup @ home. Ok, reverse the process...
Movies are the worst for size. Offload to DVD. I do have a 45G "temp" partition for a reason... It is also TEMPORARY. Songs are my worst next enemy and keeping/backing 10G is trivial. I know people that have 80G worth of stolen MP3's (the 10G comes from CD's I *own* :) and are scared to lose it, can't back it up, yet don't even know/listen to half the garbage anyway. But I digress...
I only need to keep a month's worth of live backups. I used to do a 20 tape library rotation covering the last four weeks. Tapes aren't cheap. Add to that a +$3K tape drive (x2 -- one onsite and one off) and maintenance costs it gets expensive. For even my trivial needs DLT would be the right option with DAT's potentially requiring 2 tapes daily.
I can copy sustained ~10M/sec using RAID-1 drives which is all the network can do anyway (for me right now :). The cost is a couple to few hundred dollars per drive (x4 or 6).
Of course this is all to really backup SCSI systems (ranging from 2 to 3 to 160) with RAID-5 and redundant everything (CPU, NIC, power, fans, memory, etc :). The one that did finally recently die off was a decade old (with uptime to match shy of 3 days).
Nobody I know saves their data locally to their Windows box anymore. On a smaller scale I'm happy to take anybody's old PC which can't run Windows anymore and stuff a couple hard drives in it and setup a quick/easy home network. They all are amazed when their email just appears to flow rather _instantly_. Broadband users really enjoy such a setup actually -- even if _they_ feel they must use Windows for whatever reason. I do this setup with many Mac users as well with no problem (myself for example :).
Bill Gates *should* be very worried actually... No licensing costs required and I can be in and out for usually under a couple of hundred dollars. Even recently I'm having people calling for help with their new Lindows box and trying to learn it. It's becoming a Unix'y world, eh?
You can partly get around this by using your own computer at another location.
For example, get an account on some trusted friend's box, or at freeshell.org. It's not perfect, but better than storing on a stranger's box.
I encrypt all my important save files, then rename them into modern pop culture hit songs.
I distribute them onto Kazaa... my taxes are floating around as "Oops, I did it again - Britney Spears - 3:28", that big report I did for work, floating around as "Lose Yourself - Eminem - 8 Mile Soundtrack - 01"... It's not hard to recover them, as people all over the place have copies of my "songs" up there on their servers right next to the RIAA hacked version of Limp Bizkit's latest...
Then I just keep a sheet about what name equals what file/version...
Snooze and you lose your sushi.
I would suggest that in reality it would be more like Distributed Internet Security Hole unless they're very careful in the design/implementation.
Is the archival process going to become a big thing with the type of bandwagon internet nerds that made SETI and distributed.net such impressive projects?
I hope so. I can't wait to see it turn into a contest to see who gets the quickest archive of a piece of submitted data.
"First DIBS!"
(it took me SO long to set up that pun.)
Hey freaks: now you're ju
OceanStore is the UC Berkeley project to do something like this, except a little more generalized. I run a Freenet node and it isn't THAT slow. After the index built (had to leave it up for three days straight), accesses are much quicker (prolly most of the data is local, now...ha). The slowdowns with Freenet are in the Onion Routing and the encryption. Also, GNU has a project called GNUnet that has aims similar to Freenet's.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
If you want to see what a system like this looks like when it is applied to the proper environment (like across PCs within the enterprise) then check out HiveCache.
as in "I got dibs on your data!" ?
I make these: http://beatseqr.com
I like the idea of this for small amounts of data (i.e. your personal home PC). However, datacenters usually are associated with large amounts of data. Many people make the mistake of focusing on the backup, but not the restore. In a disaster (and I have been through one) it is the RESTORE TIME that is key. Nothing beats a modern, high-performance tape drive, especially when you have a systematic vaulting system for your tapes. Tape drives might be for wimps, but at least this wimp is still employed.
The system you describe already exists. Check out HiveCache for a system that does what you describe and adds nice features like strong encryption to stored data, error-correction to create a distributed RAID across the PCs on the LAN, and efficient storage by only keeping enough copies of redundant files (e.g. word.exe, windows DLLs, etc.) to ensure reliable recovery.
A lot of people have pointed out issues related to security, bandwidth, efficiency, etc. My vision is that DIBS will be designed to take these things into account.
For example, DIBS uses GPG to encrypt and sign all communications so that peers can't read the data they are storing for you and so that other people can't pretend to be you and store their files with your peers.
Also, my vision is to include state-of-the-art erasure correction codes so DIBS uses redundancy efficiently. (Erasure correction codes are a generalization of the parity checks used by RAID.) In fact, I have already written a python implementation of Reed-Solomon codes available at www.csua.berkeley.edu/~emin/source_code/py_ecc. I haven't had time to put this into DIBS yet since I'm currently working on my PhD at MIT and that keeps me pretty busy.
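For readers who have not met erasure codes, the RAID-style parity check mentioned above is the simplest case: split the data into k blocks, add one XOR parity block, and any single lost block can be rebuilt from the rest. The sketch below shows only that single-parity case. It is not Reed-Solomon and does not use the py_ecc code; the function names are made up for the example.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data, k):
    """Split data into k equal blocks (zero-padded) plus one parity block."""
    block_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_len * k, b"\0")
    blocks = [padded[i * block_len:(i + 1) * block_len] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(shares):
    """Rebuild at most one missing share (marked None) from the others."""
    missing = [i for i, s in enumerate(shares) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost share")
    shares = list(shares)
    if missing:
        shares[missing[0]] = xor_blocks([s for s in shares if s is not None])
    return shares


data = b"my thesis draft"           # 15 bytes, so three 5-byte blocks
shares = encode(data, 3)            # 3 data blocks + 1 parity, one per peer
damaged = shares[:1] + [None] + shares[2:]   # the peer holding block 1 vanished
restored = recover(damaged)
```

A Reed-Solomon code extends the same idea so that several lost shares can be repaired instead of just one, at the cost of more arithmetic.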
Incremental backup is another feature I'm planning to add. There are some issues with how incremental backup interacts with encryption and erasure correction. I think resolving these issues may take a little more thought so I might have to wait until I graduate, become a professor and get some grad students of my own to help me.
A Slashdot post isn't the place to go into all the arguments for or against DIBS. However, I think distributed backup is a viable idea. While there are some serious issues, I believe that through clever engineering, we can solve them and create a cheap, simple, efficient, and secure backup system usable by anyone with a network connection.
I decided to start writing a distributed backup prototype like DIBS in order to find out what the major issues are and how to address them. Sure, currently DIBS has some flaws, but it is a prototype written by a grad student. With more feedback from the community and some more development effort I believe DIBS can become a valuable tool. If you agree, I invite you to join the development effort, or try it out and tell me how you think it could be improved, or even take whatever parts you find useful and make something better. The project page is at sourceforge.
Most good backup software does byte-level backups, which is great for off-site backing up.
A similar product, Bacula, performs a similar function.
Imagine universities or dorms or apartment complexes or even neighborhoods all with this wireless short to medium range peer to peer connectivity.
I'm not a networking expert or anything, so perhaps this has been done or is possible with current technology.
I'm also not fully knowledgeable in any laws that might come into play, but still I think it would revolutionize everything.
But, you will never make backups "cheap" - because you can not get away from a tape backup.
And tape backups will never become cheap because they don't have to make tape drives, libraries, and media cheap. Period.
This would be neat for a "hot" backup - as many of us already do. But you go tell the boss that you are shutting down the backup server (the one with the AIT3 - 180 slot - 6 drive library attached to it) because you set up a server at your house and one at your friend's and will be doing backup to those now... They won't think of security, location, or how slow it would be - right away. The boss will think MY GOD you can't just quit making backup tapes - are you crazy - what happens if the server crashes??? That's what they will think :)
It would be hard to replace a backup tape system that keeps 3+ months of history, lasts a very long time, is fairly fast and can be stored off site. That's why backups will never be cheap.
Duke
FreeBSD: Nothing runs like a daemon with a pitch fork.
IMHO, if a fire, flood, or other disaster wipes out my data center and any/all backups, I pretty much think thats God's way of saying I didn't need all of that shit.
Spread the RC luvin'
Security is no problem, current encryption is strong enough.
But reliability? Other people could just delete my stuff, remove the program, or their computer might crash or whatnot, or they're running w98 and need a reinstall ever so often.
And if I need to have security in numbers, that means I'll need to give up say 5mb of my disk to store 1mb of private data. That also goes for bandwidth speed to upload it to others.
Seriously, I'd much rather just send those encrypted files to a nearby friend (or rather just walk over with a cd or two). Then I can collect them (in real world too) should I need it. And run a ftp or whatever to keep the backup "in synch".
Kjella
Live today, because you never know what tomorrow brings
1) Burn CDs/DVDs of important data.
2) Put the CDs/DVDs in a plastic ziplock bag.
3) Put that in another plastic ziplock bag.
4) Put it in the freezer.
5) Data is safe.
Better than 95% of the time, the contents of your standard home freezer are undamaged after a fire. Data backup isn't that hard; besides, most /.ers could always just re-download their porn later :)
Someone help me out here.
I join the network and accept incoming data to be backed up on my machine. For the sake of argument let's just say that I never take advantage of backing up any of my data over the network, I am just allowing my machine to act as a backup server cause I am some sort of a green-tree-hugging-long-haired-hippy type.
I do this for 6 months and I have an always on, steady reliable connection. My machine is constantly and heavily used.
Then one day I decide to format all of my hard drives.
Exactly how f*!ked are all the users that had data on my machine?
However, if you pay for your bandwidth, this could be quite expensive. As a sysadmin for a small company in Europe, with two offices, we have about 500 GB online that need backup. Let's assume a daily change rate of about 50 MB, one full backup per week, and the necessity to have at least two backups (in case one of the peers go down), we're looking at something like 4 TB volume a month. This is assuming a "classic" backup schedule, and would not only require above-average Internet connectivity, but also a lot of money.
Alternatively, let's assume the system allows us to eliminate the need for a recurring full backup, by being able to store all files individually in this distributed system, so we only need to update the backup for files that have changed. That still leaves us with at least 2 GB per month (50 MB * 20 days * 2 destinations); we pay 20 EUR per gig, and we only have a 2Mbit/s line.
40 EUR per month is not that expensive, but if there are massive changes (we add a new system), the volume increases steeply.
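The arithmetic above can be checked with a few lines. The inputs are the figures quoted in this comment; the only additions are assumptions made for the sketch: a month is taken as 52/12 weeks, 1 GB is counted as 1000 MB, and the line is assumed to run flat out with no protocol overhead.

```python
ONLINE_GB = 500        # data that needs backup
DAILY_CHANGE_MB = 50   # daily change rate
COPIES = 2             # two peers, in case one goes down
WORKDAYS = 20          # working days per month
EUR_PER_GB = 20        # price of traffic
LINE_MBIT = 2          # 2 Mbit/s line
WEEKS_PER_MONTH = 52 / 12   # assumption: average month length

# "Classic" schedule: one full backup per week to each peer.
classic_gb_per_month = ONLINE_GB * WEEKS_PER_MONTH * COPIES

# Changed-files-only schedule.
incremental_gb_per_month = DAILY_CHANGE_MB * WORKDAYS * COPIES / 1000
incremental_eur_per_month = incremental_gb_per_month * EUR_PER_GB

# Time to push a single full copy through the line (8000 Mbit per GB).
full_copy_days = ONLINE_GB * 8000 / LINE_MBIT / 3600 / 24

print(f"classic:     {classic_gb_per_month:.0f} GB/month")
print(f"incremental: {incremental_gb_per_month:.1f} GB/month, "
      f"{incremental_eur_per_month:.0f} EUR/month")
print(f"one full copy over the line: {full_copy_days:.0f} days")
```

The last figure is worth noting: under these assumptions even the initial upload of one full copy would tie up the 2 Mbit/s line for roughly three weeks, so a weekly full backup would not fit at all.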
Also, one very important feature is not available: easy archive copies. For various reasons, we need to archive old projects, email, and financial data. With a tape backup, you just retire a tape set offsite.
Ideally, you should be able to make your computer fail *COMPLETELY* and still be able to recover completely. The distributed backup plan seems to have different specific advantages for two specific groups of home users, but has the same overall beneficial results.
For the average Joe with only one computer running that ancient copy of Windows98 on a P133, the massive amount of data-cruft is bound to be the weakest point of upgrading or even backing up. I've found that most families only have that one computer, and only have the option of backing up onto floppies. Usually their data can fit on one or two CDR/CDRW discs, but their system is also usually too old to get a cd burner to work reliably. In addition, they're just too stingy with the purse-strings to shell out the $100 or so for a decent, middle-of-the-pack drive, anyway. Sending critical data over the internet might be a better option, if a bit more time-consuming (no broadband, only 56k modem). Frequent backups like this have the potential to be substantially more reliable, not to mention scores easier, than a pile of floppies as you're ideally only sending the new data. I can't tell you how often I wished for something like this when working on a friend's/family's system across town and away from my own network.
And that brings me to my second group that can really take advantage of something like this: Power-users with a small network running at home. My network has a file-server that stores *EVERYTHING* on it for backup purposes. It's got ISO's of all my software and OS's, drivers, stand-alone programs, documents, and media files. Currently, there's about 80GB of data on there. Backing up that data is a Travan-5 drive (10GB/tape, native) and 9 cartridges. At about 3 hours per tape, backing up to 9 TR-5 tapes takes days, not hours. There's two additional tapes for backup of the server's OS and configuration and it easily fits on one tape. But if there are any significant changes to the system, I rotate the tape so that there's always a working copy in case things go terribly wrong. That's a total of 11 tapes. They're not exactly cheap, but it's probably the least expensive backup I can find right now without going to removable HDs (I'm avoiding that solution as HDs are, in my opinion, less reliable and durable than tapes). Using this distributed backup plan would allow me to recover my server's OS from the single tape and retrieve the data from the network when I have time.
The 2 desktops and 2 laptops can be fully recovered with an OS or system recovery cd and the rest is available on the server. In fact, I usually have one of each type of computer down at any given time for something-or-other. Having the data on the server allows me to blow away any of the systems I run at any time and completely recover the system to a working state in just over an hour.
Actually, I had been setting up a distributed backup plan for my own server with some of my friends so we'd all have each others' server's backup. More accurately, the plan was to merge the changes between all the servers' data and share it between all of us in a manner similar to CVS. There's only 3 of us, but we're located all over the state and we all have broadband. 80GB of data is a large amount to initially transfer. Really, though, all we'd be transmitting is the changes we've made which would limit the total bandwidth used. We'd probably only set it up for once per week in automatic mode to further decrease the load with an option to manually update. In the event of a complete failure of one of the systems, there should be a copy from one of the other two servers that's no older than 1 week. As the storage requirements grow, each server can be updated with additional storage in sequence so that it recovers in a manner similar to how a RAID5 array rebuilds the data on a replaced drive.
Unfortunately, neither of my two friends in question have the resources to afford the hardware and set up their own server to the reliability standards that I'm requiring, so it kind of fell through for now. I'm working with them on how to get everything running, and I may just maintain it for them from a remote console. They'll still host the server on their network and have access to it, of course. But the responsibility of maintaining the system may just have to lie with me.
In short, it's not terribly difficult to implement a solution like this, but there are serious bandwidth concerns. If you're only doing this amongst your friends/peers, it's possible to mitigate the bandwidth issue by using a single removable hard disk to sneakernet the data to a fresh server. This allows for a much more reliable home network for power-users, and gives some peace-of-mind to the average user (and their power-user friends who fix their computer for them)
My sources are unreliable, but their information is fascinating. -- Ashleigh Brilliant
Distributed backup... That's an interesting idea... I know some guys in Singapore who would be willing to back up your credit card information.
The race isn't always to the swift... but that's the way to bet!
- Little Tommy's assignments, ASS1.doc & ASS2.doc
- The fire chief's safety plan, PROJ_BACKDOOR.ppt
- The Aussie's autobiography, BUSH_COUNTRY.doc
I don't think storing my "important" files on other people's computers is such a good idea.
[figz@figz figz]$ kill -9 `ps -ef | awk '$1=="figz" { print $2 }'`
Hivecache is a P2P distributed backup system that grew out of Mojo Nation. Files are encrypted and shredded into multiple RAID-like pieces, so no individual piece can be used to reconstruct the original data. You don't know what's on there, and you can't find out, because you don't have the information to do it, which provides you some protection as well as providing protection to the people whose data you're storing.
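As a rough illustration of why a single piece can be useless on its own, here is a Python sketch of plain XOR secret splitting. This is not Hivecache's actual scheme (which isn't described here), and unlike RAID-like pieces it has no redundancy: every piece is needed to get the data back. The function names `split` and `combine` are hypothetical.

```python
import secrets


def split(data: bytes, n: int) -> list[bytes]:
    """Split data into n pieces; any n-1 of them look like random noise."""
    if n < 2:
        raise ValueError("need at least two pieces")
    # n-1 pieces are pure random pads of the same length as the data.
    pads = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The final piece is the data XORed with every pad.
    last = data
    for pad in pads:
        last = bytes(a ^ b for a, b in zip(last, pad))
    return pads + [last]


def combine(pieces: list[bytes]) -> bytes:
    """XOR all pieces together to recover the original data."""
    out = bytes(len(pieces[0]))
    for piece in pieces:
        out = bytes(a ^ b for a, b in zip(out, piece))
    return out
```

Whoever stores one piece holds bytes that are statistically indistinguishable from random, which is the property that protects both the owner and the host.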
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Hivecache is an outgrowth of the Mojo Nation P2P project. Mojo was mainly a file sharing environment; Hivecache is pointed towards business data backup environments (partly because Mojo didn't reach the ...5 Profit!!! stage...)
Bill Stewart
Sounds like rsync to me. Why reinvent the wheel?
-- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
http://oceanstore.CS.Berkeley.EDU/ This project from UC Berkeley CS has been tackling the issues many of you raise about this sort of global storage system, including data integrity, data locality, and data availability. They have spent a number of years working out how to deal with some of the nastier complications in such a system.
As many others have clearly pointed out, there are significant problems with this system as an anonymous p2p internet backup. Primarily, your data is scattered out there, there's no way to know if you can get it back, and you can be sure that it will take a long-ass time to download 200GB when you need to restore. Also consider that someone will have to provide enough space. This would be very useful for smaller quantities of critical data (say, 100MB). Reserve 400MB on your drive for other people's stuff, and store yours on four people's drives. You'd have added insurance that you'd be able to get it back, and with the smaller investment of space, potentially more users. The storage algorithm would be fairly complex, involving a mad amount of handshaking with the remote storage computers to be sure that your stuff is still out there, but I think it could be done.
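The handshaking could look something like the following Python sketch: a challenge-response check where the owner sends a fresh random nonce and the peer must hash it together with the stored copy. It assumes the owner still has the local data to compute the expected answer, and it simulates the remote peers as an in-memory dict; `prove` and `audit` are invented names, not part of DIBS.

```python
import hashlib
import secrets


def prove(nonce: bytes, stored: bytes) -> str:
    """Run by the peer: hash the fresh nonce together with the stored copy."""
    return hashlib.sha256(nonce + stored).hexdigest()


def audit(local: bytes, peers: dict[str, bytes]) -> dict[str, bool]:
    """Challenge every peer and report which ones still hold an intact copy.

    `peers` stands in for the network: peer name -> bytes that peer holds.
    """
    results = {}
    for name, stored in peers.items():
        # A new nonce per challenge means an old answer cannot be replayed.
        nonce = secrets.token_bytes(16)
        expected = hashlib.sha256(nonce + local).hexdigest()
        results[name] = prove(nonce, stored) == expected
    return results
```

Because the nonce changes every time, a peer cannot keep just a precomputed hash and throw the data away; it needs the full copy to answer.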
Suppose the bandwidth problem is solved. Suppose that the data only leaves your computer encrypted. Suppose a worm wipes out all the servers with your data. Oops...
I will not give up offline data backup anytime soon. A backup tape carried away from your site is still the best protection for some time to come.
Heck, just post bits of your file system encrypted with a public key, and keep posting them. That way altavista can become your backup system.
I'm sure this has been said, but I didn't feel like reading through all the posts to make sure..
but, this is simply the next iteration of p2p filesharing. The encryption is there to try to keep RIAA/MPAA off their backs, but I'll bet that this will be bastardized into a secure encrypted p2p network where snooping eyes will not be able to gather proof-of-crime...
I live in Indiana. My mother lives in Georgia. My father lives in Arizona. My grandmother lives in Quebec. My aunt lives in Brazil. My brother lives in France. I have put together a datacenter in a closet in each of their houses. Each datacenter consists of two OpenBSD boxes serving as a multihost firewall and six FreeBSD boxes running the services I require. All of my data is mirrored daily to all of these centers. Most of my files are managed with CVS, too. Thus, I am confident that even in a disaster of biblical proportions, such as my toilet overflowing and damaging the hard drive, my data will be safe.
Scenario 1: "Yes, let everyone else have my backup data, they certainly wouldn't tamper with that!"
Scenario 2: "Look, someone invented a worm that infests everyone's remote backup files. This should be really easy to remove!"
Scenario 3: "No one can get into my distributed backup files. It's 128-bit encrypted!"
This is just a terrible idea all around. The reason why I don't hand out my personal or config files is the same reason I don't give my wallet and turn my back to the cashier every time I pay for something.
Until somebody comes up with never-breakable encryption, this is a dream.
You need trusted backup sources, otherwise the temptation is too great to see someone else's data (and maybe even with trusted sources).
Imagine there's a Slashdot article saying "2048 bit encryption broken in 48 hours" and you have your data spread throughout the world... imagine the horror! scrambling around trying to delete everything and hoping nobody took separate backups that you don't have access to.
That actually brings up another point that there's nothing stopping someone from copying your encrypted data and throwing several supercomputers or large P2P compute farms against it.
See also www.m-o-o-t.org
<quote> m-o-o-t will consist of one CD which will boot on as many computers as possible. There will be a suite of email, w/p, spreadsheet, graphics etc programs on the CD. Access to local storage (hard drives etc.) will be disabled, and the system will shut down if the CD is removed. Data and mail will be transmitted and stored in encrypted form split between off-shore data havens. </quote>
all your data are belong to us!
One idea may be to use distributed hash tables, where there is no central server but one or more machines have stewardship of each area of space in a hash table; when machines drop out, their hashes are reassigned.
One system which works like this is The Circle, though it doesn't split files into chunks or encrypt them. It's intended as a file-sharing/messaging system rather than a secure redundant backup system, but something like this could be built on top of it.
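For the curious, the stewardship idea can be sketched in Python as a consistent-hashing ring. This is a toy under my own assumptions (one point per machine, SHA-256 for placement), not how The Circle actually does it; the `Ring` class and its methods are made-up names.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Place a key or machine name on the ring using 64 bits of SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: list[str]):
        # Each machine owns the arc of hash space that ends at its position.
        self._points = sorted((_position(n), n) for n in nodes)

    def remove(self, node: str) -> None:
        """A machine drops out; its arc falls to the next machine on the ring."""
        self._points = [(p, n) for p, n in self._points if n != node]

    def steward(self, key: str) -> str:
        """Return the machine responsible for this key."""
        if not self._points:
            raise LookupError("no machines left in the ring")
        positions = [p for p, _ in self._points]
        # First machine at or after the key's position, wrapping around.
        i = bisect.bisect_left(positions, _position(key)) % len(self._points)
        return self._points[i][1]
```

The nice property is that when a machine drops out, only the hashes it was looking after get a new steward; everything else stays where it was.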
I do hate sums. There is no greater mistake than to call arithmetic an
exact science. There are permutations and aberrations discernible to minds
entirely noble like mine; subtle variations which ordinary accountants fail
to discover; hidden laws of number which it requires a mind like mine to
perceive. For instance, if you add a sum from the bottom up, and then again
from the top down, the result is always different.
-- Mrs. La Touche
- this post brought to you by the Automated Last Post Generator...