Domain: allmydata.org
Stories and comments across the archive that link to allmydata.org.
Comments · 20
-
Re:Or you could get an MSCE
I'm not sure if you're aware of it, but the Transitive Grace Period Public License requires releasing of source code after 12 months. It's an interesting idea that I've seen around a few places.
-
Re:How Precisely Could P2P Solve This?
The difference there is that your relatively small key holds the potential for everything on your page.
Yes, that's intentional. In cryptography it's known as Kerckhoffs's principle: only the key should be secret, everything else (the encrypted data, the system design, the source code) should be assumed to be known to an attacker. That approach leads to strong designs because the designers can't rely on handwavy arguments like "Oh, nobody's likely to hack the Facebook servers" and "Facebook's thousands of employees are all trustworthy".
And how long before key collecting viruses run rampant and phone home to a black market provider's server where all Diaspora data is cached?
The same argument applies to Facebook passwords, except that with Facebook, the black market provider doesn't even need a server. Viruses are a problem, but they're just as relevant to client-server systems as P2P systems.
I understand how asymmetric key encryption works in PGP but that requires that you have a single person you are sending the message to
... do you need to build a PGP public/private key for each of your friends?
No; you only need to generate one public/private keypair, regardless of how many people you want to communicate with. But PGP's probably not the best model for a P2P social network - something like Tahoe is a lot closer (I hope the Diaspora guys have the sense to use it rather than reinventing it).
Then I guess my next question is where does this decryption take place? Obviously it has to take place on your friend's box otherwise the people in the middle would have your key and your unencrypted data. So your friend logs on to check out your picture on Facebook
... but he's on his netbook so he has to wait to get the encrypted data then decrypt the data on a possibly low CPU intensive device.
Encryption is cheap. Seriously, it's cheaper than water. Once you've established a shared key with your friend, which only has to happen once when you first friend each other, all the rest of the encryption is symmetric. Again, PGP's not the best model here because it does asymmetric crypto for every message. Think about HTTPS web browsing or a GSM phone call instead; mobile devices have no trouble handling those.
And then when people start posting unlicensed songs and movies to their pages you'll have the MPAA and RIAA trying to sue the crap out of everyone ever connected to it and then they'll start caching as a Diaspora node
... and wait for legal action to get a potential file sharer's key by court order ...
That's still a lot more secure than Facebook, where copyright holders can get stuff pulled from your page by sending a DMCA takedown email with no court oversight at all, and you're subject to arbitrary censorship by Facebook itself.
People seed on bittorrent because they can use the files that they're seeding but they're not going to be able to use my encrypted files that people might want when I'm offline nor will I be able with a netbook to help them out with hosting their files.
Yup, downtime and mobility are major challenges for P2P networks. The most likely solution I see is a little fanless Linux device that sits beside your cable or ADSL modem and participates in the P2P network 24/7, trading some of its storage with other devices so your data stays available during its occasional periods of downtime. Another possibility is that if you can't run a node yourself, you rent or borrow a share of someone else's node, just like you do with email servers. That's more like a federation than true P2P, but, crucially, like email and unlike Facebook, there's no single party providing accounts to everyone, and you're always free to change providers.
-
Re:Forecast: Cloudy forever
Somewhat exists already:
-
Re:Next Gen File Systems/Storage Management Soluti
-
Tahoe-LAFS
The RAID concept can be extended to multiple PCs forming a storage grid. One open-source implementation is Tahoe LAFS.
-
Re:Levels of importance
You might consider adding the ability to distinguish nearline storage. That way you could add computers around the home as devices that could be backed up to and restored from quickly, but there would still be remote computers with full copies of the backup.
That is part of my longer-term plans, though not exactly for the reasons you suggest. My main reason for adding a nearline storage layer is to address the fact that most people use laptops these days, and although my solution does a very good job of allowing backups to be interruptible, backing laptops up directly to the grid would require leaving them turned on and connected for large amounts of time.
Also, fairness requires that you provide as much storage to the grid as you're consuming from it. But you can't use an intermittently connected laptop as a storage node. Or, at least, it would be a pretty unreliable storage node.
My solution to that is to back laptops up to nearline, always-on storage at LAN/WLAN speeds and then have that trickle the data up to the grid at ADSL/Cable speeds. In particular I want to take OpenWrt or similar and add a Tahoe client node, so that you can attach, say, a 1 TB USB drive to your wireless router and use it as nearline backup plus give it responsibility for getting the data pushed out to the grid. And it will also act as a storage node in the grid.
Using the default parameters, a 1 TB drive would give you about 230 GB of nearline backup, plus would provide your contribution to the grid for online backup of that much data (storing 770 GB of others' encoded data). So for the default encoding parameters, you'd need 4.33x as much nearline storage as what you're going to back up and that would effectively give you 4.33 backups of every file, one of them nearline and the others offsite. Less conservative parameters would reduce that amount.
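The arithmetic behind those figures can be sketched as follows. This is only an illustration: it assumes the default encoding is 3-of-10 (any 3 of 10 shares recover a file), which is consistent with the 4.33x figure above, and the function name split_drive is made up for this example.

```python
def split_drive(drive_gb: float, k: int = 3, n: int = 10) -> tuple[float, float, float]:
    """Split a drive between nearline backup and grid contribution.

    Assumption (not confirmed by the comment): k-of-n erasure coding with
    k=3, n=10, so each backed-up byte occupies n/k bytes on the grid, and
    fairness means storing that many bytes of other people's shares.
    """
    expansion = n / k                        # grid bytes per byte backed up (3.33)
    overhead = 1 + expansion                 # one nearline copy plus the grid share (4.33)
    nearline_gb = drive_gb / overhead        # space left for your own nearline backup
    grid_gb = drive_gb - nearline_gb         # space holding others' encoded data
    return nearline_gb, grid_gb, overhead


nearline, grid, overhead = split_drive(1000)
print(f"nearline: {nearline:.0f} GB, grid: {grid:.0f} GB, overhead: {overhead:.2f}x")
```

Lowering n or raising k shrinks the overhead, which is what "less conservative parameters" amounts to here.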
I'm curious how you are doing rsync diffs for encrypted Reed-Solomon fragments.
I do the diffs on the files before encryption and FEC (Forward Error Correction) encoding. Because the storage servers cannot decrypt the files, there's obviously no way to "patch" the backed-up files, so the diffs have to be stored as a chain of forward deltas. Forward deltas suck for many reasons, but particularly they damage reliability, since the loss of any delta in the chain means that all subsequent versions are lost as well. I'm currently addressing this by limiting the length of the delta chain before uploading a new full version, but my plan eventually is to make those decisions statistically.
I have written an incomplete paper describing how to calculate loss probabilities in a distributed, FEC-based grid, and I eventually plan to offer GridBackup users the option to set their desired reliability level, with some description of the impact it will have on upload times, storage used, etc. I'll then use that reliability figure to automatically select FEC encoding parameters, and to choose when to break delta chains, so that the user's reliability requirement is met.
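To make the reliability cost of a delta chain concrete, here is a minimal sketch. It is not the model from the paper mentioned above; it simply assumes that every stored object (the full version and each delta) survives independently with the same probability, and the function names are hypothetical.

```python
def version_survival(p_object: float, deltas_in_chain: int) -> float:
    """Probability that a version is recoverable when it depends on one
    full version plus `deltas_in_chain` forward deltas, all of which
    must survive (independence is an assumption of this sketch)."""
    return p_object ** (deltas_in_chain + 1)


def max_chain_length(p_object: float, target: float) -> int:
    """Largest number of deltas for which the newest version still meets
    the user's target reliability."""
    if not 0.0 < p_object < 1.0:
        raise ValueError("p_object must be strictly between 0 and 1")
    if not 0.0 < target <= p_object:
        raise ValueError("target must be positive and no higher than p_object")
    length = 0
    # Keep extending the chain while one more delta still meets the target.
    while version_survival(p_object, length + 1) >= target:
        length += 1
    return length


# Example: objects survive with probability 0.999, user wants 0.99.
print(max_chain_length(0.999, 0.99))
```

Under these assumptions, a stricter target or a less reliable grid forces a new full upload sooner.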
The other major downside of forward deltas is that if the chain is long enough, it can take much more time to download the chain than it would to download a final version. Because of the ADSL/Cable asymmetry, my design focuses on optimizing upload performance, but when I move to using statistical criteria for chain-breaking decisions, I'll also do some size calculations. The Mercurial developers (Mercurial also uses forward deltas) have found that a good rule of thumb is that when the size of the delta chain exceeds twice the size of the full version, it's time to break the chain.
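That rule of thumb is easy to express in code. The sketch below is an assumption-laden illustration: it takes "size of the delta chain" to mean the sum of the delta sizes, not counting the full version, and should_break_chain is a made-up name, not part of Mercurial or GridBackup.

```python
def should_break_chain(full_size: int, delta_sizes: list[int], factor: float = 2.0) -> bool:
    """Return True when the accumulated deltas outweigh `factor` times the
    full version, i.e. when uploading a fresh full version is due."""
    return sum(delta_sizes) > factor * full_size


# A 100 MB file with three deltas totalling 190 MB: keep the chain.
print(should_break_chain(100, [50, 60, 80]))
# One more 20 MB delta pushes the chain past 200 MB: break it.
print(should_break_chain(100, [50, 60, 80, 20]))
```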
Oh, you may also have been asking about how I create the diffs efficiently. I store local copies of the rsync file "signatures" for files that have been backed up into t
-
Re:Privacy oriented paranoia
Something like this? http://allmydata.org/trac/tahoe
Tahoe, the Least-Authority Filesystem. This is a secure, decentralized, fault-tolerant filesystem. All of the source code is available under a choice of two Free Software, Open Source licences.
This filesystem is encrypted and spread over multiple peers in such a way that it continues to function even when some of the peers are unavailable, malfunctioning, or malicious.
-
Re:Interesting...BBB report...
For fun I've been working on a statistical model of loss probabilities
Probably no one cares, but I just noticed the file at that link is out of date. The current version of the document is still pretty rough, but it's better than that one. Here is a better link.
-
Re:Interesting...BBB report...
So now I have to make a decision to continue using a service that has shown itself to be less than scrupulous or drop a company that I have been very happy with.
If you end up in a hunt for a replacement, I know of two: Mozy and Allmydata. FWIW, I recommend Allmydata. Their software is open source, which provides some stronger assurances on the security front (assuming it gets reviewed, of course), and it doesn't delete your files 30 days after you do. Allmydata keeps everything forever and gives you a web-based "time machine" sort of view for looking at old stuff. Also, they use some cool technology to minimize the chance that they'll EVER lose your data. For fun I've been working on a statistical model of loss probabilities of files stored using their approach, and assuming some reasonable numbers for professionally-managed server uptimes, the probability of losing your data is on the order of 1 in 10^9.
Of course, their service does cost $10 per month, not $5. Mainly because they keep a full backup history, so you consume more of their storage than if they deleted old stuff.
Disclosure: My only affiliation with allmydata is that I'm hacking a P2P backup system based on their code. In the process, I've had some personal interaction with their developers via IRC and e-mail, and that interaction has given me a very positive impression of both their technology and their organization.
-
Re:I'm not surprised...
Have you heard their ads? They sound like a scam just from that.
Not just their ads. There are all kinds of small claims on their web site that smell like snake oil. Stuff like: "We encrypt your files twice before backing them up securely offsite, using the same encryption techniques that banks use."
Twice? Really. I guess if once is good, then twice is better?
If you're in the market for something like this, I suggest taking a look at allmydata.com. It costs more ($10 per month for unlimited storage, rather than $5), but unlike Carbonite and Mozy it keeps all of your data forever, rather than deleting files 30 days after you delete them from your computer. It gives you a web-based "Time Machine" view that lets you see your data as it was on any given date.
Allmydata is also very geek-chic, since it's all open source, and uses erasure coding on your files to ensure reliability even if a server (or seven!) dies. I believe they use a 3-of-10 scheme, where 10 shares of each file are distributed across 10 different servers, and any three of them are enough to recover the file.
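For a feel of what a 3-of-10 scheme buys you, here is a rough sketch of the loss probability. It assumes each of the 10 servers holds exactly one share and is available independently with the same probability; that is a simplification chosen for this example, not a description of how Allmydata models it, and loss_probability is a hypothetical name.

```python
from math import comb


def loss_probability(p_up: float, k: int = 3, n: int = 10) -> float:
    """Probability that fewer than k of n shares are available, so the
    file cannot be recovered. Servers are assumed independent."""
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)   # exactly i servers up
        for i in range(k)                              # i = 0 .. k-1 is a loss
    )


# Even with servers that are each up only 90% of the time,
# losing 8 or more of the 10 at once is very unlikely.
print(f"{loss_probability(0.9):.3e}")
```

With k=3 and n=10 the file survives any 7 simultaneous server failures, which is where "or seven!" comes from.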
Disclaimer: I don't work for allmydata.com, but I am using their distributed file system software to build my own P2P backup solution, to make it easy for groups of friends and family to set up their own backup systems by sharing parts of their drives over the net. The allmydata developers are very supportive of this effort, even though it's a potential competitor with their commercial offering.
-
Tahoe
Sounds like tahoe could be what you want. However, it's pretty young, so I wouldn't necessarily rely on it yet.
-
Allmydata Tahoe
I'm not sure if it's precisely what you were looking for, but Allmydata Tahoe looks like an interesting possibility. It sounds like it meets most of your criteria, except versioning, but I think it exposes a FUSE interface, so you could probably just run a versioning filesystem like Git atop it.
-
Re:Tahoe - an open source alternative
*** Potential bias alert: I'm involved in both the Tahoe project at allmydata.org and the commercial online storage service at Allmydata.com ***
I wanted to add to the above comment that we (Allmydata.com) also tried out a business model where the software agent on each machine was a peer storage node on the storage grid. For many of the reasons that have already been mentioned, this model did not gain acceptance. Technically, peer node churn is logistically complex to efficiently manage, and socially it is difficult for people to accept storage of strangers' data (encrypted, encoded, or not). We now use the p2p storage grid only on servers that we manage and the clients are effectively encryption and transfer agents. This gives us a cost-advantage on the server side (easy to manage, cheap hardware can be used), but doesn't expose us to some of the other marketing and technical issues.
-
For Free Software Zealots Like Me...
Looks fun and all, but it's proprietary, so what's the use? It's probably full of FBI back doors, or at the very least, marketing dept. back doors. Even without evilness on their part, do you really trust their precompiled java binary to encrypt your data in a way you can't inspect, or can't verify with people who know more about crypto? If you really need something like this, go with what others have posted, and try http://allmydata.org/trac/tahoe
-
Tahoe - an open source alternative
I would recommend taking a good look at Tahoe, from allmydata.org. This is an open source project that uses a conceptually similar file dispersal system for backup, but it has been designed and reviewed by expert cryptographers. There is also a commercial version available at allmydata.com which has generously sponsored the open source project. Tahoe is working on Windows, Mac, Linux and other Unix style systems.
Tahoe does have a minimal dependency on a central server to first learn about the peer nodes that hold data, but only for the initial callup - once the client is running, it remembers all the peers it is using. And they are working towards eliminating even this dependency with "gossip" introductions, so if you can connect to any peer you can learn of all the others. Everything is cryptographically protected with encryption and signatures to make it effectively impossible for anyone to see the contents of your files without your permission.
-
How is this different to Tahoe?
Doesn't Tahoe already do this?
-
This may fill the bill...
Take a look at Allmydata Tahoe. I think it will do what you're looking to do. It also sounds robust. Hope this helps.
-
Allmydata "Tahoe"
I do some work for Allmydata, which is an online storage provider. Their next-gen storage technology is open source and nearly perfect for this application. It's a bit green at this point, but coming along nicely. http://www.allmydata.org/
-
tahoe?
It's probably not there yet for you, but you might want to keep an eye on AllMyData's Tahoe project.