MojoNation ... Corporate Backup Tool?
zebziggle writes "I've been watching the Mojo Nation project off and on over the last couple of years. Very cool concept. While taking a look at the site recently, I saw that they've morphed into Hive Cache, a P2P corporate backup solution. Actually, it sounds like a great way to use those spare gigs on the HD."
P2P falls into two categories nowadays: file sharing (FastTrack/Kazaa, Gnutella/Gnucleus-Shareaza-Limewire-Bearshare, eDonkey2000) or publishing (Freenet and Mnet/Mojonation). Like Freenet, Mojonation was more of a publishing network: users publish data, it gets broken into little chunks, encrypted, and then sent out to other computers, and you receive other people's encrypted chunks on your computer, making you a "block server". Content trackers and publication trackers kept track of the metadata and where the blocks were, and metatrackers kept track of where the trackers (also called brokers) were. I chatted with zooko, one of the developers, on IRC; he was cool and the ideas were very interesting. Like many dot-com stories, it was ahead of its time in many ways. They converted Mojonation to the open source MNet, whose CVS tree you can peruse. A lot of it is in Python, a language I do not know.
The wasted disk space on workstations (and servers) is something thought about by many, especially in large organizations with large networks. My last company began implementing SANs, so that less disk space would be wasted, and the centralization of disk space allowed for greater redundancy and easier backup. They also ran low priority (nice'd) distributed.net processes across the whole network on non-production machines. You can take a guess about how large the network is by seeing that they're still ranked #22 without submitting any keys for a year.
For security reasons we absolutely want to encrypt and sign everything stored on the other computers. There is nothing tricky about this part; the usual cryptography can be used without modification. This is not going to waste any significant amount of storage space or network bandwidth, but it will require some CPU cycles.
The other, not so trivial, part of such a system is the redundancy. Reed-Solomon would be one type of redundant coding suitable for the purpose. Parchive also uses this coding.
I know some implementations are limited to at most 255 shares, but for performance reasons it is probably not feasible to use a lot more than that anyway. I expect the Reed-Solomon code to be the most CPU-hungry part of such a system.
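To make the idea of "any k of n shares" concrete, here is a minimal sketch of a Reed-Solomon style erasure code. It is only an illustration under several assumptions of mine: it works over the prime field GF(257) rather than the GF(256) arithmetic that real implementations such as Parchive use, it encodes one tiny block at a time, and the names encode, decode and _interpolate_at are made up for this example. It needs Python 3.8 or later for the modular inverse via pow.

```python
P = 257  # assumption: smallest prime above 255, so every byte value is a field element


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through all (xi, yi) in points, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Turn k data bytes into n shares (index, value).
    The first k shares carry the data itself; the rest are redundancy."""
    k = len(data)
    assert k <= n <= P
    points = list(enumerate(data))
    return [(x, _interpolate_at(points, x)) for x in range(n)]


def decode(shares, k):
    """Recover the k data bytes from any k distinct shares."""
    assert len(shares) >= k
    points = shares[:k]
    return bytes(_interpolate_at(points, x) for x in range(k))


if __name__ == "__main__":
    block = b"mojo"                      # k = 4 data bytes
    shares = encode(block, 12)           # n = 12 shares, factor three redundancy
    survivors = [shares[5], shares[11], shares[2], shares[8]]
    print(decode(survivors, 4))          # any 4 shares are enough
```

Redundancy shares in this sketch can take the value 256, which does not fit in a byte; that is one reason practical codes use GF(256) instead, and it is also where the 255-share limit comes from. The interpolation is quadratic in k per recovered symbol, which hints at why this step is likely to dominate CPU usage.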
We need to choose a threshold for the system, and I see no reason why the individual users cannot choose their own threshold. If one user wants to be able to reconstruct data from any 85 of the 255 shares, there needs to be three times as much backing storage as the data being backed up.
The first approach to storage space would obviously be that each user can consume as much as he himself makes available to the system. I'd happily spend the 10GB of hard disk space needed for two backups of my 1.5GB of important data with a factor three of redundancy. This would, if done correctly, give a lot better security than most other backup solutions.
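The arithmetic behind those numbers can be written down in a few lines. This is just a sketch; the function name backing_storage_gb and its parameters are mine and not part of any existing tool.

```python
def backing_storage_gb(data_gb, total_shares, threshold, backups=1):
    """Disk space consumed on other machines for a k-of-n coded backup.

    Each backup is expanded by total_shares / threshold, because any
    `threshold` shares must together hold as much as the original data.
    """
    expansion = total_shares / threshold
    return data_gb * expansion * backups


if __name__ == "__main__":
    # 1.5GB of data, 85-of-255 coding (factor three), two backups kept
    print(backing_storage_gb(1.5, 255, 85, backups=2))
```

For the example in the comment this comes out at 9GB, which fits within the 10GB the poster is willing to contribute.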
One important aspect you must never forget in such a system is the ability to verify the integrity of backups; I guess this is the trickiest part of the design. Verifying with 100% security that my backup is still intact would require downloading enough data to reconstruct my backup. However, verifying with 99.9999999% security could require significantly fewer samples. Unfortunately the 255 shares can be a major limitation here: the larger the number of shares gets, the smaller the percentage of data we need to sample. I don't wanna do the exact computations right now, but if 18 shares picked at random from the 255 are all intact, we have approximately nine nines of security that there are indeed 85 intact shares among the 255. So we have limited the network usage by almost a factor of five (18 shares fetched instead of 85).
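The "exact computations" can be sketched roughly as follows. The assumption here, which is mine, is the worst case for the verifier: only 84 of the 255 shares are intact (one short of the threshold), and we ask how likely it is that a random sample without replacement still happens to hit only intact shares. The function names are made up for this example.

```python
from math import comb


def false_pass_probability(total, threshold, sampled):
    """Chance that `sampled` randomly chosen shares are all intact even
    though only threshold - 1 of the `total` shares are intact, i.e. the
    backup is already unrecoverable (hypergeometric, no replacement)."""
    intact = threshold - 1
    if sampled > intact:
        return 0.0
    return comb(intact, sampled) / comb(total, sampled)


def min_samples(total, threshold, max_false_pass):
    """Smallest sample size whose worst-case false-pass chance is below
    max_false_pass."""
    for sampled in range(1, total + 1):
        if false_pass_probability(total, threshold, sampled) < max_false_pass:
            return sampled
    return total


if __name__ == "__main__":
    print(false_pass_probability(255, 85, 18))
    print(min_samples(255, 85, 1e-9))
```

Under that assumption, 18 samples push the chance of a false pass below one in a billion, which agrees with the estimate above. Note that this is the probability of being fooled by a broken backup, not a full statement about how likely the backup is to be intact.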
If we want:
- Higher security
- Less network usage for verifications
- Good network performance even in case of a few percent of lost shares
we need more than 255 shares of data. There is no theoretical limit to the number of shares, but the CPU usage increases.
What the system also needs is migration of data as users join and leave the system, and a reliable way to detect users responsible for large amounts of lost shares. Creating public key pairs for each user is probably necessary for this. I think this can be done without the need for a PKI; a user can just create his key pair and then start building a reputation.