Slashdot Mirror


MojoNation ... Corporate Backup Tool?

zebziggle writes "I've been watching the Mojo Nation project off and on over the last couple of years. Very cool concept. While taking a look at the site recently, I noticed they've morphed into HiveCache, a P2P corporate backup solution. Actually, it sounds like a great way to use those spare gigs on the hd."

10 of 122 comments

  1. Interesting, but ... by King+Of+Chat · · Score: 3, Interesting

    I'd like to know how this fits in with Data Protection legislation (eg UK DPA).

    --
    This sig made only from recycled ASCII
    1. Re:Interesting, but ... by Albanach · · Score: 4, Interesting
      Am I missing something? This is just a backup - why would that present a problem under the DPA? You update the database and the backup is corrected as it is updated, just as would happen in any enterprise situation.

      Do you mean unrestricted access? I don't think this is talking about using Joe Foo's Kazaa shared folder to store your company's backup data - it's using unused disk space on the company network, and the web site states that the backup mesh is encrypted, so unauthorised users may have the file on their disk, but they can't access it.

      Looks to me like all the criteria of the DPA are covered.

  2. Fundamentally flawed by pouwelse · · Score: 3, Informative
    It is always great to see applications of P2P technology. But let's do the math on this one.

    50 PCs in your intranet, each with a 20 GByte disk. Thus your backup need is a cool 1000 GByte, if the disks are all completely full and fully backed up...

    For this concept to work you can see that you need to exclude every copy of Dos95/Office from being backed up. The basis of P2P is that the service users are also the service providers, thus every participating node needs free HD space. Depending on the crypto overhead and your non-backup portion, you still need a lot of free space for this concept. What is the added value above a redundant RAID server? Is the total cost of ownership really lower?

    MojoNation proposed an awesome concept with their virtual P2P credits. However, this idea seems to suggest that P2P technology increases your HD size; it does not!

    Just my 5 EuroCents,
    J.

    1. Re:Fundamentally flawed by twoshortplanks · · Score: 3, Interesting
      From the website
      As an enterprise IT network grows the amount of redundant data stored on PCs increases significantly. Most PCs have the same applications and OS files, HiveCache takes advantage of this fact to decrease the amount of distributed storage needed for the backup network. By not requiring hundreds of redundant copies of the same MS Word executable and Windows DLL files found on each enterprise PC to be stored multiple times, the HiveCache backup mesh requires only a fraction of the data storage space that would be required by an equivalent tape backup solution.
      In other words, in a corporate environment it only backs up each file once - meaning that after the first machine is backed up, only the files that differ between machines - i.e. the data files and configuration files - need be backed up.
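
      A minimal sketch of that single-instance idea, assuming files are identified by a content hash (the site does not say how HiveCache actually detects duplicates). `backup`, `store` and `manifests` are illustrative names, not anything from the product.

```python
import hashlib


def backup(machines):
    """Back up every file once, keyed by the SHA-256 of its content.

    machines maps a machine name to a dict of {path: file bytes}.
    Returns (store, manifests): store holds each distinct content once,
    manifests record which digest each machine's path points at.
    """
    store = {}
    manifests = {}
    for name, files in machines.items():
        manifest = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # Identical content on another machine is already stored.
            store.setdefault(digest, content)
            manifest[path] = digest
        manifests[name] = manifest
    return store, manifests


def restore(store, manifest):
    """Rebuild one machine's files from its manifest."""
    return {path: store[digest] for path, digest in manifest.items()}
```

      With two PCs that carry the same Word executable but different documents, the store ends up with three entries instead of four; the saving grows with every additional machine that shares the same OS and application files.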
      --
      -- Sorry, I can't think of anything funny to say here.
  3. GPL? by Albanach · · Score: 4, Interesting

    Looks to me like they've also morphed from being a GPL package to a commercial one, with no mention of source code, but several mentions of patents on the web page.

  4. Mojonation and backups by br00tus · · Score: 5, Interesting
    One of the exciting things about p2p is the innovations people come up with. Of course, some innovations are brain-dead simple, yet take a while to implement - someone suggested hashing for p2p (Gnutella) in the first thread in which Gnutella was discussed on Slashdot, but it's taken over two years for the major Gnutella developers to implement it.

    P2P falls into two categories nowadays, file sharing (FastTrack/Kazaa, Gnutella/Gnucleus-Shareaza-Limewire-Bearshare, Edonkey2000) or publishing (Freenet and Mnet/Mojonation). Like Freenet, Mojonation was more of a publishing network - users publish data, it gets broken into little chunks, encrypted, and then sent out to other computers, and you receive other people's encrypted chunks on your computer, making you a "block server". Content trackers and publication trackers kept track of the meta-data and where the blocks were, and metatrackers kept track of where the trackers (also called brokers) were. I chatted with zooko, one of the developers, on IRC; he was cool and the ideas were very interesting. Like many dot-com stories, it was ahead of its time in many ways. They converted Mojonation to the open source MNet, whose CVS tree you can peruse. A lot of it is in Python, a language I do not know.

    The wasted disk space on workstations (and servers) is something thought about by many, especially in large organizations with large networks. My last company began implementing SANs, so that less disk space would be wasted, and the centralization of disk space allowed for greater redundancy and easier backup. They also ran low priority (nice'd) distributed.net processes across the whole network on non-production machines. You can take a guess about how large the network is by seeing that they're still ranked #22 without submitting any keys for a year.

  5. Technical Information by kasperd · · Score: 5, Informative
    Since the technical information is missing I cannot explain how this particular product works, but I can explain how it could be possible to do this.

    For security reasons we absolutely want to encrypt and sign everything stored on the other computers. There is nothing tricky about this part, the usual cryptography can be used without modifications. This is not going to waste any significant amount of storage space or network bandwidth. But it will require some CPU cycles.

    The other not so trivial part of such a system is the redundancy. Reed-Solomon would be one type of redundant coding suitable for the purpose. Parchive also uses this coding.

    I know some implementations are limited to at most 255 shares, but for performance reasons, it is probably not feasible to use a lot more than that anyway. I expect the Reed-Solomon code to be the most CPU hungry part of such a system.

    We need to choose a threshold for the system; I see no reason why the individual users cannot choose their own threshold. If one user wants to be able to reconstruct data from any 85 of the 255 shares, there needs to be three times as much backing storage as the data being backed up.
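
    A toy sketch of the threshold idea, not the product's actual coding (which isn't documented here). It uses polynomial interpolation over the prime field GF(257), the same principle behind Reed-Solomon, whereas real implementations typically work in GF(256). The names `encode`, `decode` and `P` are made up for illustration, and it needs Python 3.8+ for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Turn k = len(data) bytes into n shares (x, y), with k <= n <= 256.
    Shares 1..k carry the data itself; the rest are redundancy."""
    points = [(i + 1, byte) for i, byte in enumerate(data)]
    return [(x, _interpolate(points, x)) for x in range(1, n + 1)]


def decode(shares, k):
    """Rebuild the k original bytes from any k distinct shares."""
    points = shares[:k]
    return bytes(_interpolate(points, x) for x in range(1, k + 1))


# 6 data bytes spread over 18 shares: any 6 shares suffice,
# and the storage overhead is n / k = 3, as with 85 of 255.
shares = encode(b"backup", 18)
survivors = [shares[i] for i in (2, 6, 9, 11, 14, 17)]
recovered = decode(survivors, 6)
```

    The overhead is simply n/k, so each user picking a threshold k is picking how much extra storage to pay for how many lost shares they can tolerate.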

    The first approach to storage space would obviously be that each user can consume as much as he himself makes available to the system. I'd happily spend the 10GB of hard disk space needed for two backups of my 1.5GB of important data with a factor three of redundancy. This would, if done correctly, give a lot better security than most other backup solutions.

    One important aspect you must never forget in such a system is the ability to verify the integrity of backups; I guess this is the most tricky part of the design. Verifying with 100% security that my backup is still intact would require downloading enough data to reconstruct my backup. However, verifying with 99.9999999% security could require significantly fewer samples. Unfortunately, here the 255 shares can be a major limitation: the larger the number of shares gets, the smaller the percentage of data we need to sample gets. I don't wanna do the exact computations right now, but if 18 randomly picked from the 255 shares are all intact, we have approximately the 9 nines of security that there are indeed 85 intact shares of the 255. So we have indeed limited the network usage by almost a factor of five.
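
    The exact computation skipped above can be sketched as follows, assuming the samples are drawn uniformly without replacement and taking the worst case in which only 84 shares (one short of the threshold) are still intact. `false_pass_probability` is an illustrative name.

```python
from math import comb


def false_pass_probability(total, threshold, sampled):
    """Chance that `sampled` randomly chosen shares are all intact
    even though fewer than `threshold` of the `total` shares survive.

    The worst case is threshold - 1 intact shares; the result is the
    hypergeometric probability of drawing only intact ones.
    """
    intact = threshold - 1
    if sampled > intact:
        return 0.0  # impossible to draw that many intact shares
    return comb(intact, sampled) / comb(total, sampled)


p = false_pass_probability(total=255, threshold=85, sampled=18)
print(f"chance of a false all-clear: {p:.2e}")
print(f"network saving versus fetching 85 shares: {85 / 18:.1f}x")
```

    Under these assumptions the false all-clear probability comes out below one in a billion, which matches the nine nines, and sampling 18 shares instead of fetching 85 is the "almost a factor of five" saving.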

    If we want:
    • Higher security
    • Less network usage for verifications
    • Good network performance even in case of a few percent of lost shares
    we need more than 255 shares of data. There is no theoretical limit to the number of shares, but the CPU usage increases.

    What the system also needs is migration of data as users join and leave the system, and a reliable way to detect users responsible for large amounts of lost shares. Creating public key pairs for each user is probably necessary for this. I think this can be done without the need of a PKI, a user can just create his key pair and then start building a reputation.
    --

    Do you care about the security of your wireless mouse?
  6. Disaster Recovery by MicroBerto · · Score: 3
    While this is a great idea, most corporations will not want to do this for one big reason -- they should be doing off-site backups anyway, in case a disaster strikes the Corp building.

    Say that one of the companies in the WTC had done this. Sure, they woulda had backups when a server blew up, but after the entire building was destroyed, they would have had nothing.

    You never want to put ALL of your marbles into local backups.

    --
    Berto
  7. Re:Mango Medley 97 did this 5 years ago. (before p by harshaw · · Score: 4, Interesting
    Yup, I worked at Mango several years ago on the Medley product.
    The basic premise behind the product was that when someone copied a file into the Medley drive the data pages were instantly "duplexed", meaning that a second copy of a page was made elsewhere in the network. If a node in the network went down, leaving only one other computer with a copy of the page, Medley would automatically reduplex, causing the single copy of the page to be propagated to another node in the network. The basic promise of Medley was availability and fault tolerance on a P2P level.
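
    A rough sketch of the reduplexing step as described, not Medley's real logic: `placement` maps each page to the set of nodes holding it, and the choice of the alphabetically first spare node is an arbitrary assumption (a real system would presumably balance load and free space).

```python
def reduplex(placement, live_nodes, copies=2):
    """Return a new placement in which every page that still has at
    least one surviving copy is brought back up to `copies` replicas.

    placement: dict mapping page id -> set of node names holding it.
    live_nodes: set of node names that are currently reachable.
    """
    repaired = {}
    for page, holders in placement.items():
        survivors = set(holders) & live_nodes
        # Nodes that are up and do not hold this page yet.
        candidates = sorted(live_nodes - survivors)
        while survivors and len(survivors) < copies and candidates:
            survivors.add(candidates.pop(0))
        # A page with no survivors is lost and stays empty.
        repaired[page] = survivors
    return repaired
```

    For example, with pages on {a, b} and {b, c}, losing node b leaves one copy of each page, and a reduplex pass over the live nodes {a, c, d} restores both pages to two copies.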
    Very cool concept but the product had a number of severe flaws that are probably obvious to the average slashdot reader.
    • The product ran as a driver in Windows 95/98 and NT. Debugging Medley was an absolute atrocity; I remember at one point having as many as 8 windbg windows open attempting to debug some network wide crash problem.
    • Another problem was that in rare cases a crash on a single machine could bring down other machines in the network. Doh!
    • Servers are cheap, disk is cheap. Once the IT administrator realizes that there is significant complexity in maintaining a Medley network, he or she would realize that it is probably cheaper to buy a RAID enabled server with a ton of disk space.
    • Marketing. Mango couldn't market this product to save themselves. At one point we used a rather rotund male model called "Waldo" to push the product. Very bizarre.... the product was so complex that it took a while for technical people to understand, let alone the average user.
    • The 1.0 product was pushed out the door before it was ready.
    • The company couldn't figure out the appropriate distribution channel: VARs, direct retail, direct sales, etc. Nothing seemed to work.
    • Finally, the product was over-engineered (sorry guys!)

    The best thing I can say about working on Medley was that it was an opportunity right out of college to work with a number of excellent engineers on a complex and very interesting problem. Unfortunately, the idea was probably 5 to 10 years ahead of its time.

  8. the family tree of Mojo Nation by Zooko · · Score: 4, Informative

    Mojo Nation was conceived by Jim McCoy and Doug Barnes in the 90's. At the end of the 90's they hired hackers and lawyers and started implementing.

    Their company, Evil Geniuses For A Better Tomorrow, Inc., opened the source code for the basic Mojo Nation node (called a "Mojo Nation Broker") under the LGPL.

    During the long economic winter of 2001, Evil Geniuses ran short of money and laid off the hackers (the lawyers had already served their purpose and were gone).

    One of the hackers, me, Zooko, and a bunch of open source hackers from around the world who had never been Evil Geniuses employees, forked the LGPL code base and produced Mnet.

    Now there is a new commercial company, HiveCache. HiveCache has been founded by Jim McCoy.

    BTW, if you try to use Mnet, be prepared for it not to work. Actually the CVS version works a lot better than the old packaged versions. We would really appreciate some people compiling and testing the CVS version (it is very easy to do, at least on Unix).

    It would be really good if someone would compile the win32 build. We do have one hacker who builds on win32, but we need more.