Using P2P for Legitimate Applications?
scum-o asks: "Where I work, we move a lot of large weather data around and there's always a question of whether someone's already found the data that I need to use (many projects use the same data, but it needs to get refreshed several times a day). My brilliant idea was to use a P2P-like network to search for already-existing data and use that in my app (and if none found, go to the original source). My company has a fast network and I'd much rather have my app suck the data from someone else in my company who's already grabbed the data as opposed to pounding on the public ftp server (which is slow and horribly abused each day). Has anyone found any way to use the P2P-network for legitimate reasons other than just file swapping/sharing and stuff? Also, how would I go about this, can I just grab a gnutella API and start searching?"
Gnucleus has a LAN-only mode.
You can always use the LimeWire core and build something on top of it that automates exactly what you want it to do.
Check out:
limewire.org and the javadocs.
You can use the Agent API to do network crawls like what you are talking about.
But why isn't everyone uploading their data to a central server anyway?
You could make an intranet repository for all data downloaded and post a BitTorrent file for each.
... but when it was time for me to download Mandrake, the first place I looked was on Kazaa. I wasn't worried about it taking all night, but I didn't want to cause extra expense to the people providing this for free. I was disappointed I couldn't find it this way.
I also became a Robotech fan, resulting in the purchase of a number of DVDs. Technically that's not legitimate, but I would argue that because I couldn't find it on TV, finding a handful of apps on P2P wasn't necessarily that bad. After all, I was evaluating the series. Seems to me that for a series like this being released on DVD, having a few free sample episodes floating around the web would be an awesome marketing tool. There are lots of shows out there I'm curious about, but I'm not about to spend $20 on a show I may or may not like.
...and stop reinventing the wheel.
It won't be a solution, because after a while people will get used to having the network and being able to find the documents on other people's machines, and they will ask for text searching, version control, and so on. Not sure what Microsoft charges for SharePoint or what the Linux alternatives are, but it seems to be worth it in your case.
Why not set up a cron job to check the timestamp of the remote file and, if it's newer than the local copy, download it to a machine in your office and share it from there? No need for P2P, really. I would think such a script could be written in a few minutes, and the file could be shared with Samba, and then everybody would have the latest version. Run the script every hour or whatever.
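A rough sketch of what such a script might look like, in Python. It assumes the public server allows anonymous FTP and answers the MDTM command (not every server does); the host, file name, and local path passed to refresh() are placeholders, not anything from the original post.

```python
import os
from datetime import datetime, timezone
from ftplib import FTP


def parse_mdtm(reply: str) -> float:
    """Turn an FTP MDTM reply such as '213 20030821153000' into a UTC epoch time."""
    stamp = reply.split()[1][:14]  # drop the status code and any fractional seconds
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def is_stale(local_path: str, remote_mtime: float) -> bool:
    """True when there is no local copy or the remote file is newer."""
    if not os.path.exists(local_path):
        return True
    return os.path.getmtime(local_path) < remote_mtime


def refresh(host: str, remote_name: str, local_path: str) -> bool:
    """Download remote_name only if it is newer than local_path.

    Returns True if a download happened. All three arguments are placeholders.
    """
    with FTP(host) as ftp:
        ftp.login()  # anonymous login (assumption)
        remote_mtime = parse_mdtm(ftp.sendcmd("MDTM " + remote_name))
        if not is_stale(local_path, remote_mtime):
            return False
        tmp_path = local_path + ".part"
        with open(tmp_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_name, out.write)
    os.replace(tmp_path, local_path)  # swap in the finished file in one step
    os.utime(local_path, (remote_mtime, remote_mtime))  # keep the server's timestamp
    return True
```

Point local_path at the directory Samba exports and run refresh() from cron at whatever interval suits the data.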
What's wrong with the simple solution of just putting a proxy server in to cache the data from the original site?
Isn't that what it was designed for in the first place? Peer-to-peer file sharing in a trusted environment?
While I'm sure many others are typing the same thing as I type this, why not set up BitTorrent? It's nearly perfect for your application: if there are peers who already have the data, then it'll grab it off them, otherwise it'll grab it from the server.
The source is available, too.
I've grabbed numerous CD images (OpenGroupware, Knoppix, hell even Slackware) through BitTorrent. I imagine news broadcasts and whatnot, if popular, would also do well through 'torrent.
File sharing is "illegitimate" because of the files, not the sharing.
:)
If you're a small garage band trying to advertise yourself, there's nothing wrong with throwing mp3s of your performances on Kazaa. Anything else that you created yourself is legitimate, too. Same with uncopyrighted works (like the complete works of Shakespeare, for example).
The only real problem with file sharing is that nobody wants that stuff; they all want the copyrighted stuff.
Oh, and I downloaded Mandrake, RedHat, and Knoppix ISOs from BitTorrent. Those were totally legit uses.
Check out this story at vulns.com on how P2P apps bring more than just legal threats. OK... I wrote it in my spare time....
Anyway, it's entirely possible to legally distribute content via P2P; I'm amazed that nobody has really turned it into a fully fledged service.
After years of working on MODs, I've come to realize that getting custom content for any game is a pain because of the domination of swamped download sites and the alternative of paying for downloads. So with some help from a few other MOD developers, we've been working on making a MOD P2P network application that we hope to launch by the end of the year (grain of salt, we're MOD developers, we love to delay shit). Originally we were just going to make it for Half-Life since it has the biggest community, but there was enough desire for it to support other games that we scrapped our original structure and have decided to make it more scalable. This way we can add the capability for new games as they come out (see Gamespy Arcade for the model we're working off of). Right now our main problem is working out how to keep this legal. Since a lot of game content is IP, we have to be careful not to allow those files to be shared. So once we find a lock-out scheme we like that doesn't make the program useless, we'll be a lot happier.
Hopefully this thing works well and will spur game developers to support the concept (winkwink). I fully anticipate games to ship with in-game P2P content delivery systems in the future. Integrating that with chatrooms and game lobbies is the next logical progression. Share some levels or models while you chat it up. Release new mods through P2P and stage a chat release party, all within the game architecture. Plus, the user base is already trained in the software.
It's not stupid. It's advanced.
All you need is to imagine a Beowulf cluster of...
Waitaminute!
You actually could think of this as a Beowulf cluster! The main twist is that each node in the network is being used interactively, rather than just acting as a slave that churns away on data chunks autonomously.
You don't state what kind of systems your colleagues are using, but if you're using Macs, then Rendezvous mDNS networking can take care of the "plumbing" part of the problem for you -- everyone can instantly start publishing their shared resources, and the trick then is to just figure out a way to search who has what content.
The search function could be done from a machine set up to automatically spider everyone's content & basically set up a little in-house search engine, with links back to each user's version of "http://johndoe.local/weather/data/2003/08/21/1530_nws" or whatever.
If you're not running Macs, well that's a problem on several levels :), but the mDNS spec is an open standard, and it is IIRC available as an Apache module. There's mod_rendezvous, but it seems to have stalled with an OS X version only -- porting to Linux shouldn't be bad but is left as an exercise for the reader. There also seem to be the Net::MDNS::Server & Net::MDNS::Client Perl modules on CPAN, but they seem to have been born & stalled in the same week back in June. Not sure what that means.
In any case, if you can set up a spontaneous mDNS network, then that would solve the problem of getting every node on your network to be able to advertise what resources are available to other nodes on the network. The step after that is to set up a search interface, and that's really a solved problem -- any Perl hacker comfortable with LWP should be able to whip up a reasonably good search mechanism using &/or extending existing tools.
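To make the search half concrete, here is a minimal sketch of the index such a spider might fill in. It is written in Python rather than the Perl/LWP suggested above, it leaves out mDNS discovery and the crawl itself, and the class name and the janedoe.local host are made up for illustration.

```python
from collections import defaultdict


class DatasetIndex:
    """In-memory map from dataset path to the hosts currently advertising it."""

    def __init__(self):
        self._hosts = defaultdict(set)  # dataset path -> set of host names

    def record(self, host, paths):
        """Store what a crawl found on one host, replacing that host's old entries."""
        for hosts in self._hosts.values():
            hosts.discard(host)
        for path in paths:
            self._hosts[path].add(host)

    def search(self, term):
        """Return {path: sorted hosts} for every path that contains term."""
        return {
            path: sorted(hosts)
            for path, hosts in self._hosts.items()
            if term in path and hosts
        }


index = DatasetIndex()
index.record("johndoe.local", ["/weather/data/2003/08/21/1530_nws"])
index.record("janedoe.local", ["/weather/data/2003/08/21/1530_nws"])
print(index.search("1530_nws"))
```

An empty search result would be the cue to fall back to the original source.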
If you manage to get this to work, it would be interesting to read a writeup of how the lego parts end up being assembled :-)
Sun Micro is working on just this thing for enterprise-level p2p within a corporate network.
or Konspire
http://konspire.sourceforge.net/
Nothing in the world is more dangerous than sincere ignorance and conscientious stupidity.
Justablip is an indie electronic music label started by Kris Weston from the Orb. We release music under an 'open source'-like license & encourage people to share the music via P2P (or however they wish)... we also use Ogg, FLAC, & BitTorrent.
Here's a link to download their first release, WTF! The Madonna Remix comp in Ogg, MP3, or AAC...
also check out the ARTICLES on the site, I think people on here would be into it.
ant
--
))
((
c[_] bLiP
www.justablip.co.uk
Try these protocols/apps and run a P2P network in-house:
JXTA
BitTorrent
Or, you can create your own Gnutella client, using an open-source Gnutella package like JTella.
Why is everyone so quick to mention BitTorrent? It's totally inappropriate for this. Here's why: in order to make something available for download with the BitTorrent protocol, you have to create the .torrent metadata file, which contains SHA1 hashes of all of the segments of the file. In other words, you need to download the dataset from the source server before you can share it with BT. Now, the whole point here is that within the organization bandwidth is plentiful -- the part that he wants to avoid is hitting the origin server unless it's really necessary. So if he were to use BT he'd have to set up a cron job or something to automatically fetch the data and run it through btmakemetafile.py. This means that the data will necessarily already be available inside his organization, where bandwidth is basically "free", so what's the point of using BT on the internal network? You might as well just ftp/scp the file from one server to another; there's no need to go to all the trouble of running a BT server and making a new metafile every time the data file changes.
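To see why the whole file has to be on hand first, here is a small Python sketch of the hashing step only. It is not btmakemetafile.py and it does not write a .torrent (no bencoding, no tracker URL); the function name and the 256 KiB default piece size are assumptions for illustration.

```python
import hashlib
import io


def piece_hashes(stream, piece_length=262144):
    """Return the SHA-1 digest of each fixed-size piece read from a binary stream."""
    digests = []
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break  # end of data; the last piece may have been shorter
        digests.append(hashlib.sha1(piece).digest())
    return digests


# Every byte has to be read to produce the list, so the file must already be local.
sample = io.BytesIO(b"x" * 600000)
print(len(piece_hashes(sample)))
```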
Additionally, using BT would turn out to be -more- wasteful, for two reasons. First, to make the data available you'd have to automatically retrieve it from the origin server, regardless of whether there is any demand for it. Second, BT is still somewhat of a niche protocol, so there's a good chance that some people would say "screw this, I don't want to install Python and wxWindows just to get this file that I can download with Mozilla in about 3 seconds."
...And these people would just download it directly from the source, increasing the load on the origin server and completely missing the point.
I'm with the person who said to set up a caching proxy server. Squid will do this perfectly, and it doesn't involve making the users change at all -- it's all behind the scenes if you set it up as a transparent proxy. There will be no wasteful cron-job downloading since Squid will simply cache whatever the users are requesting -- if no one needs data for some period there's no point in wasting the weather site's bandwidth on some cron job ftp thing.
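The on-demand behaviour is the whole argument, so here is a toy Python sketch of it. This is not how Squid is implemented or configured; the class, the fetch callable, and the max-age parameter are all hypothetical and only illustrate "fetch on first request, serve everyone else locally".

```python
import time


class PullThroughCache:
    """Fetch from the origin only on a miss or once the cached copy is too old."""

    def __init__(self, fetch_from_origin, max_age_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin  # hypothetical callable: url -> bytes
        self._max_age = max_age_seconds
        self._clock = clock
        self._store = {}  # url -> (time fetched, payload)
        self.origin_hits = 0

    def get(self, url):
        now = self._clock()
        cached = self._store.get(url)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]  # served locally, origin untouched
        payload = self._fetch(url)  # first request, or the copy expired
        self.origin_hits += 1
        self._store[url] = (now, payload)
        return payload
```

If nobody asks for a file, the origin never gets hit for it, which is exactly the difference from the cron approach.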
Please don't be so fast to suggest something like BitTorrent just because it's trendy.
Let's say Site A provides the data freely to users on a non-commercial basis but charges for commercial use. They require you to click through an agreement in order to get this data for personal use, or else sign up and get a license for commercial use... Site B can't just go and get this data from Site A and proceed to share it out, because those who got it from Site B would not have clicked through the agreement required on Site A for the download and use of Site A's material.
This P2P sharing of data could open up a legal can-o'-worms. You really do need to check that the data you want to share around is freely distributable and that you really have permission to do so.
Arliweb
WASTE would have done just this - create a private, cryptographically secure P2P file sharing network. Alas, AOL decided to kill yet another cool Nullsoft invention, apparently breaking the camel's back and resulting in the resignation of Mr. Frankel.
Well, yeah. You could do that. It might even work. But wouldn't it be a lot quicker, simpler, easier, and cheaper to just add a transparent caching proxy server to your network?
Of course you could use Squid for this, but there are also several commercial products that do very well.
The most important thing to remember is to make sure the server is big and powerful. The server must have at least one fast processor, oodles of memory (1 gig or more), and a very fast disk subsystem. I recommend hardware RAID1 with 4 or more of the fastest SCSI disks you can find. Also, the RAID controller should have as much onboard cache as you can get.
This saves bandwidth and makes for a blessed user experience provided they are not the first to get that particular data.
I thought all p2p was legitimate...=)
Just hang a Squid proxy off your router with a huge cache... problem solved.
Then go get a drink for not having to reinvent the wheel or cost your company much (if any) money.
When I reread the article, I guessed that you might be asking about saving load on the public FTP server on a larger scale, i.e. giving all other users of the same service an opportunity to help reduce the load. In this case, the best bet is to talk to the service provider and use whatever they are willing to use!
Ask the public FTP provider to put up Torrent files, or equivalent, right on the FTP site, or in a README. Alternatively, if they aren't interested or are too busy to set it up themselves, ask them to put a description and link to a webpage that YOU control in a readme file. Something along the lines of, "Our servers are busy. A good samaritan maintains an alternate download method at this URL. You may get faster results from there." Perhaps toss in some MD5 hashes so that people can verify data if they so desire, and then hope for the best. With any luck, your company wins, other groups win, and the provider wins.
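Checking a download against a published MD5 hash takes only a few lines. A minimal Python sketch, assuming the provider publishes the hash as a plain hex string; the function names are illustrative.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a local file against the hex digest listed by the provider."""
    return md5_of_file(path) == published_hex.strip().lower()
```

MD5 is enough to catch a corrupted or truncated transfer, which is the use suggested here; it would not stop someone from deliberately tampering with the mirror.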
Pretty much any P2P network would work, but you would probably have more support from IT groups for BitTorrent than eMule/eDonkey or Gnucleus.
If you're a small garage band trying to advertise yourself, there's nothing wrong with throwing mp3s of your performances on Kazaa.
What about the copyrights owned by the songwriters?
Another great thing to use P2P for would be spam filtering. If I get another penis enlargement mail and delete it, it will be deleted on my neighbour's computer too, and on all other computers that are running the P2P filter client. This technique is being used by Cloudmark :-)
Most of the P2P research papers that I've read seem to be nothing more than a rehash of the parallel computing papers of the eighties.
Granted, back then they were thinking of upwards of tens of nodes, whereas now these speak of thousands of nodes. Surprise! The math works out the same regardless of whether x=25 or x=2500. Wow. Give the man a doctorate.
Yes there is a legit use for a lot of P2P. I work at Verizon. I have also worked for Boeing and have a couple open source and one private venture. All use P2P in 100% legitimate applications and only one is sharing files.
Use JXTA. It is mature and far better than Gnutella because of the level of control and diversity. Gnutella just shares files and can't really improve the efficiency of distribution.
Head to jxta.org to start.