Slashdot Mirror


Using P2P for Legitimate Applications?

scum-o asks: "Where I work, we move a lot of large weather data around and there's always a question of whether someone's already found the data that I need to use (many projects use the same data, but it needs to get refreshed several times a day). My brilliant idea was to use a P2P-like network to search for already-existing data and use that in my app (and if none found, go to the original source). My company has a fast network and I'd much rather have my app suck the data from someone else in my company who's already grabbed the data as opposed to pounding on the public ftp server (which is slow and horribly abused each day). Has anyone found any way to use the P2P-network for legitimate reasons other than just file swapping/sharing and stuff? Also, how would I go about this, can I just grab a gnutella API and start searching?"
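The lookup order described in the question (try colleagues first, fall back to the original source) can be sketched roughly as below. This is only an illustration, not tied to any particular P2P library: `fetch_dataset`, the `Fetcher` type, and the "peer-N"/"origin" labels are hypothetical names, and each fetcher stands in for whatever transport (HTTP, FTP, a Gnutella query) is actually used.

```python
from typing import Callable, Iterable, Optional, Tuple

# A fetcher takes a dataset name and returns its bytes,
# or None if that source has no copy. (Hypothetical interface.)
Fetcher = Callable[[str], Optional[bytes]]


def fetch_dataset(name: str, peers: Iterable[Fetcher], origin: Fetcher) -> Tuple[bytes, str]:
    """Return (data, source), preferring any in-house peer over the origin server."""
    for index, peer in enumerate(peers):
        try:
            data = peer(name)
        except OSError:
            # An unreachable peer is treated like a peer without the file.
            continue
        if data is not None:
            return data, f"peer-{index}"
    # No peer had the data, so hit the original source once.
    data = origin(name)
    if data is None:
        raise FileNotFoundError(name)
    return data, "origin"
```

The origin server is contacted only when every peer comes up empty, which is the behaviour the question is after.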

11 of 50 comments

  1. Gnucleus by Bistronaut · · Score: 2, Informative

    Gnucleus has a LAN-only mode.

  2. BitTorrent by arcadum · · Score: 4, Informative

    You could make an intranet repository for all data downloaded and have a torrent posted for each.

  3. Bittorrent by FrenZon · · Score: 4, Informative

    While I'm sure many others are typing the same thing as I write this, why not set up BitTorrent? It's nearly perfect for your application: if there are peers who already have the data, it'll grab it from them; otherwise it'll grab it from the server.

    The source is available, too.

  4. Bittorrent by tzanger · · Score: 4, Informative

    I've grabbed numerous CD images (OpenGroupware, Knoppix, hell even Slackware) through BitTorrent. I imagine news broadcasts and whatnot, if popular, would also do well through 'torrent.

  5. Semi Relevant Article by szyzyg · · Score: 3, Informative

    Check out this story at vulns.com on how P2P apps bring more than just legal threats. OK... I wrote it in my spare time....

    Anyway, it's entirely possible to legally distribute content via P2P; I'm amazed that nobody has really turned it into a fully fledged service.

  6. Grendel by babbage · · Score: 3, Informative

    All you need is to imagine a Beowulf cluster of...

    Waitaminute!

    You actually could think of this as a Beowulf cluster! The main twist is that each node in the network is being used interactively, rather than just acting as a slave that churns away on data chunks autonomously.

    You don't state what kind of systems your colleagues are using, but if you're using Macs, then Rendezvous mDNS networking can take care of the "plumbing" part of the problem for you -- everyone can instantly start publishing their shared resources, and the trick then is to just figure out a way to search who has what content.

    The search function could be done from a machine set up to automatically spider everyone's content & basically set up a little in-house search engine, with links back to each user's version of "http://johndoe.local/weather/data/2003/08/21/1530_nws" or whatever.

    If you're not running Macs, well that's a problem on several levels :), but the mDNS spec is an open standard, and it is IIRC available as an Apache module. There's mod_rendezvous, but it seems to have stalled with an OSX version only -- porting to Linux shouldn't be bad but is left as an exercise for the reader. There also seem to be the Net::MDNS::Server & Net::MDNS::Client Perl modules on CPAN, but they seem to have been born & stalled in the same week back in June. Not sure what that means.

    In any case, if you can set up a spontaneous mDNS network, then that would solve the problem of getting every node on your network to be able to advertise what resources are available to other nodes on the network. The step after that is to set up a search interface, and that's really a solved problem -- any Perl hacker comfortable with LWP should be able to whip up a reasonably good search mechanism using &/or extending existing tools.

    If you manage to get this to work, it would be interesting to read a writeup of how the lego parts end up being assembled :-)
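
    The in-house search engine idea above could look something like the following sketch. It assumes the spider has already collected a listing of paths per host; `build_index`, `search`, and the host `janeroe.local` are made-up names for illustration, and the actual discovery (mDNS or otherwise) is left out entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_index(listings: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each published path to the URLs of every host that serves it."""
    index = defaultdict(list)
    for host in sorted(listings):  # sorted so results are deterministic
        for path in listings[host]:
            index[path].append(f"http://{host}{path}")
    return dict(index)


def search(index: Dict[str, List[str]], term: str) -> List[str]:
    """Return all URLs whose path contains the search term."""
    return sorted(url for path, urls in index.items() if term in path for url in urls)


# Example listings as a spider might have gathered them (hypothetical data).
listings = {
    "johndoe.local": ["/weather/data/2003/08/21/1530_nws"],
    "janeroe.local": [
        "/weather/data/2003/08/21/1530_nws",
        "/weather/data/2003/08/21/1200_nws",
    ],
}
index = build_index(listings)
```

    A query like `search(index, "1530_nws")` would then list every colleague who already holds that dataset.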

  7. Waste or Konspire by bhima · · Score: 2, Informative

    Use Waste http://waste.sourceforge.net/

    or Konspire

    http://konspire.sourceforge.net/

    --
    Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.

  8. Try these P2P protocols by shodson · · Score: 2, Informative

    Try these protocols/apps and run a P2P network in-house:

    JXTA
    BitTorrent

    Or, you can create your own Gnutella client using an open-source Gnutella package, like JTella.

  9. no, not BitTorrent by oobar · · Score: 3, Informative

    Why is everyone so quick to mention BitTorrent? It's totally inappropriate for this. Here's why: in order to make something available for download with the BitTorrent protocol, you have to create the .torrent metadata file, which contains SHA1 hashes of all of the segments of the file. In other words, you need to download the dataset from the source server first before you can share it w/BT. Now, the whole point here is that within the organization BW is plentiful and abundant -- the part that he wants to avoid is hitting the origin server unless it's really necessary. So if he were to use BT he'd have to set up a cron job or something to automatically fetch the data and run it through btmakemetafile.py. This means that the data will necessarily already be available inside his organization, where BW is basically "free", so what's the point of using BT on the internal network? You might as well just ftp/scp the file from one server to another, no need to go to all the trouble of running a BT server and making a new metafile every time the data file changes.

    Additionally, using BT would turn out to be -more- wasteful, for two reasons. First, to make the data available you'd have to automatically retrieve it from the origin server, regardless of whether there is any demand for it. Second, BT is still somewhat of a niche protocol, so there's a good chance that there would be people who say "screw this, I don't want to install Python and wxWindows just to get this file that I can download with Mozilla in about 3 seconds." ...And these people would just download it directly from the source, increasing the load on the origin server and completely missing the point.

    I'm with the person who said to set up a caching proxy server. Squid will do this perfectly, and it doesn't involve making the users change anything -- it's all behind the scenes if you set it up as a transparent proxy. There will be no wasteful cron-job downloading, since Squid will simply cache whatever the users are requesting -- if no one needs data for some period there's no point in wasting the weather site's bandwidth on some cron job ftp thing.

    Please don't be so fast to suggest something like BitTorrent just because it's trendy.
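
    To illustrate the point about the metadata file: the per-segment SHA1 hashes can only be computed by reading the whole file, so somebody must already have fetched it. The sketch below shows just that hashing step and is not btmakemetafile.py itself; the function name `piece_hashes` and the 256 KiB default piece length are assumptions made for the example.

```python
import hashlib
import io
from typing import BinaryIO, List


def piece_hashes(stream: BinaryIO, piece_length: int = 262144) -> List[bytes]:
    """Return the SHA1 digest of each fixed-size piece of the stream.

    Assumes a regular file or in-memory stream, where read() returns a
    full piece except possibly at the end of the data.
    """
    hashes = []
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        hashes.append(hashlib.sha1(piece).digest())
    return hashes


# Ten bytes split into 4-byte pieces gives pieces of 4, 4 and 2 bytes.
example = piece_hashes(io.BytesIO(b"a" * 10), piece_length=4)
```

    Every byte passes through `stream.read` before a single hash exists, which is why the dataset has to be downloaded before it can be shared this way.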

  10. Re:Why not cron FTP? by Kiaser Zohsay · · Score: 2, Informative

    ... a cron job to check the timestamp of the remote file and if it's newer than the local copy, download it...

    rsync, anyone?

    A local mirror of the files from the public ftp server definitely sounds like the way to go. Just make sure everyone in your office knows to grab stuff from the local (ftp|nfs|samba) server. This will provide faster access for everyone in your office, and reduce load on the public server.

    P2P is a solution for when you don't have control over the infrastructure to set up something centralized (i.e., individual users all over the 'net). Performance is the main concern in this case, rather than decentralization.
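
    For anyone who wants the cron-job variant rather than rsync, the timestamp check might be sketched as follows. It assumes the public server answers the FTP MDTM command (RFC 3659) and allows anonymous login; `parse_mdtm`, `needs_download`, and `refresh` are names invented for this example, and the host and paths are whatever your setup uses.

```python
import os
from datetime import datetime, timezone
from ftplib import FTP


def parse_mdtm(reply: str) -> datetime:
    """Parse an MDTM reply such as '213 20030821153000' (timestamps are UTC)."""
    code, stamp = reply.split(None, 1)
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {reply!r}")
    stamp = stamp.strip().split(".")[0]  # drop optional fractional seconds
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def needs_download(remote_mtime: datetime, local_path: str) -> bool:
    """True if there is no local copy or the remote file is newer."""
    if not os.path.exists(local_path):
        return True
    local_mtime = datetime.fromtimestamp(os.path.getmtime(local_path), tz=timezone.utc)
    return remote_mtime > local_mtime


def refresh(host: str, remote_path: str, local_path: str) -> bool:
    """Download remote_path only if it is newer; return True if fetched."""
    with FTP(host) as ftp:
        ftp.login()  # anonymous login (assumption about the server)
        remote_mtime = parse_mdtm(ftp.sendcmd("MDTM " + remote_path))
        if not needs_download(remote_mtime, local_path):
            return False
        tmp_path = local_path + ".part"
        with open(tmp_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_path, out.write)
    os.replace(tmp_path, local_path)
    # Stamp the local copy with the remote time so the next check compares cleanly.
    stamp = remote_mtime.timestamp()
    os.utime(local_path, (stamp, stamp))
    return True
```

    Run from cron on the local mirror, `refresh` touches the public server with one small MDTM request per check and transfers the file only when it has actually changed.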

    --
    I am not your blowing wind, I am the lightning.
  11. example of legitimate P2P platform and apps... by BigGerman · · Score: 2, Informative