Slashdot Mirror


Next Generation of Gnutella

ResearchBuzz writes "Wired is reporting the announcement of gPulp (general Purpose Location Protocol), an open source technology for search engines. From the article: "It is based on the Gnutella structure, an open source application originally created by Nullsoft....Using the basic protocols originally developed for Gnutella, gPulp will search for information across a network in real time.""

29 of 70 comments

  1. Re:A bit exaggerated by snookums · · Score: 2

    The only way to solve the bandwidth issue is by aggressive caching of search results.

    I had an idea the other day, after reading the "Gnutella is dead" article -- why don't we build a multi-rooted tree system like the DNS, with information classified by content type and topic?
    Each category and sub-category could have authoritative servers that keep lists of filename/type/description -> URI mappings and pointers to authorities on more specific topics. All servers on the way down (right down to the one running on your PC) can cache results until the TTL expires.

    Lists of "root" servers must, of course, be reasonably large and (geopolitically) diverse, and it must be relatively simple for anyone to express their intention to become an authoritative server for a given topic, otherwise the system becomes susceptible to censorship and regulation (as the DNS is at the moment).

    Anyone have any thoughts on this? It's only a rough idea and there are technical details that I haven't taken the time to think through (e.g. what would be the mechanism for registering your content with the authoritative servers? How would the system defend against "pollution"? etc.)
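
    A minimal sketch of the lookup side of this idea, in Python. Everything named here is hypothetical: the `Authority` and `CachingResolver` classes, the `audio`/`rock` category path, and the example URI are made up for illustration. Registration and defence against pollution, the open questions above, are not addressed.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Authority:
    """Hypothetical authoritative server for one category."""
    records: dict = field(default_factory=dict)    # filename -> URI
    children: dict = field(default_factory=dict)   # sub-category -> Authority
    ttl: float = 300.0                             # seconds a result may be cached

    def resolve(self, path, filename):
        """Walk down the category path, then look up the filename."""
        node = self
        for part in path:
            node = node.children.get(part)
            if node is None:
                return None, self.ttl
        return node.records.get(filename), node.ttl


class CachingResolver:
    """Any server on the way down: answers from cache until the TTL expires."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream
        self.clock = clock
        self.cache = {}          # (path, filename) -> (uri, expiry time)
        self.upstream_hits = 0   # how often the authority was actually asked

    def lookup(self, path, filename):
        key = (tuple(path), filename)
        cached = self.cache.get(key)
        if cached is not None and cached[1] > self.clock():
            return cached[0]
        self.upstream_hits += 1
        uri, ttl = self.upstream.resolve(path, filename)
        self.cache[key] = (uri, self.clock() + ttl)
        return uri


# Example tree: root -> audio -> rock (all names are illustrative).
rock = Authority(records={"song.mp3": "http://example.org/song.mp3"}, ttl=60.0)
root = Authority(children={"audio": Authority(children={"rock": rock})})
resolver = CachingResolver(root)
print(resolver.lookup(["audio", "rock"], "song.mp3"))
```

    Repeated lookups within the TTL never reach the authority, which is where the bandwidth saving would come from. Note that this sketch also caches misses, which is a design choice of the example and not part of the proposal.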


    --
    Be careful. People in masks cannot be trusted.
  2. Re:A bit exaggerated by grahamsz · · Score: 2

    Caching of results (and even files) is certainly a reasonable idea in such Peer to Peer systems.

    It does however start to create problems of its own. One feeling is that gnutella needs to be more anonymous. This would most likely be done by propagating a public key along with each search query and then accepting the results in an encrypted form - hence no intermediary could cache results.

    The other difficulty with complex caching mechanisms is the potential for them to be badly implemented. Unfortunately these peer to peer systems rely on clients being properly written and properly configured. Gnutella still finds about a third of its network traffic is consumed by a bug in the Pong response that caused any pong message to propagate to the entire network. If we start on with complex demand vs availability style caching then problems will no doubt emerge.

    The other problem with things like this is the possibility for abuse. Consider just what our DNS system would look like if anyone could add information to it - you'd end up with more entries like (sisgo.net - nslookup it - it's weird).

    Gnutella is already struggling to fight the script kiddies :(

    The only solution I can envisage would be some kind of star shaped topology. Take the basic gnutella network style and use it with people on t1+ connections to build a network backbone.

    Then onto them you can start hanging the cable modem and DSL users, and then each CM user can hold a couple of modem users. That MIGHT make it work.

    Then it would be the responsibility of the faster users whether or not to forward search queries... that way a search for something common like 'metallica' would never go further than the local group, but when you search for something bizarre it would go all over the network.

    The other way this could be extended is by having the slower users upload the lists of files they are sharing to the faster users where they could be cached.

    So we have a set of fast servers which are capable of answering on behalf of anyone else.... hang on that's napster.

    D'oh
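
    The forwarding rule described above can be sketched roughly as follows. The `BackboneNode` class, the `enough_hits` cut-off and the hop limit are all assumptions made for illustration; nothing here comes from an actual Gnutella client.

```python
class BackboneNode:
    """Hypothetical fast (T1+) node that indexes the files of slower users."""

    def __init__(self, name, enough_hits=3):
        self.name = name
        self.enough_hits = enough_hits   # assumed cut-off for a "common" query
        self.index = {}                  # leaf user -> list of shared filenames
        self.peers = []                  # other backbone nodes

    def register_leaf(self, user, filenames):
        """A slower user uploads the list of files it shares."""
        self.index[user] = list(filenames)

    def local_hits(self, term):
        term = term.lower()
        return [(user, f) for user, files in self.index.items()
                for f in files if term in f.lower()]

    def search(self, term, ttl=3, seen=None):
        """Answer locally if the term is common here; otherwise ask the backbone."""
        seen = set() if seen is None else seen
        seen.add(self.name)
        hits = self.local_hits(term)
        if len(hits) >= self.enough_hits or ttl == 0:
            return hits                  # common query: never leaves the group
        for peer in self.peers:
            if peer.name not in seen:
                hits += peer.search(term, ttl - 1, seen)
        return hits
```

    A common term stops at the first node that has enough local matches, while a rare one spreads across the backbone until the hop limit runs out. As noted above, the uploaded file lists make each backbone node look a lot like a small Napster server.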

  3. How long before... by Vladinator · · Score: 3

    ... search engines are also able to use this? gPulp/Gnutella clients alone will suck bandwidth, but what happens when Yahoo/Lycos/AskJeeves, etc. see this as cutting into their market share, and decide to do this searching for their users? One plus to this might be that they could cache such hits, and could allow searching that didn't suck bandwidth, but with the impermanent nature of such connections, I bet they'd do a scan EVERY time someone looked for "Eminem*.mp3" at ftpsearch.lycos.com or some such.


    Fawking Trolls!

    --

    "Going to war without France is like going deer hunting without your accordion." - Jed Babbin

  4. Re:However by ashpool7 · · Score: 2

    Are people really prepared to sit there for that long (probably a *lot* longer) in front of their browser for the results? Most research indicates that people who have to wait more than 5 seconds for a webpage to download get bored and go elsewhere.

    Napster searches routinely take 8-15 seconds for me (campus ethernet) and everybody else on my hall and nobody cares. When you're pretty assured you'll find what you want by putting in the right searches (as opposed to sorting links) people are willing to wait. The "web user getting bored" premise doesn't work here, mainly, because this isn't the web. ;)

  5. Re:P2P Network Searching by MrShiny · · Score: 2
    Not really. A search technology like this would practically eliminate the need for crawling robots since all the crawling happens intrinsically as part of the protocol.

    The impression I get from the article is that this could be used as a generic search technology.. one network that searches any type of resource i.e. web pages, file sharing, phone directories etc. It would just be a matter of writing a gPulp node server for whatever type of resource you wanted to add to the network.

    We will also need a standard result format so that the search client can figure out what to do with them automagically. We could use URLs, but types would have to be added for every kind of resource out there.

  6. Re:Can anybody answer a question? by bockman · · Score: 2
    Only the French? I don't think so.

    And here we adore that stuff (IANAF).
    Please respect our religion.

    --
    Ciao

    ----

    FB

  7. The Gnutella protocol has big problems by WolfWithoutAClause · · Score: 3

    The problem with the GNUTELLA protocol is that it is quite inefficient, and it collapses completely at heavy load.

    The GNUTELLA protocol sends one message per search through the entire network up to the horizon.

    Eventually when enough people are in the network the individual links collapse under the load and the network falls apart.

    Nothing can completely solve this problem, but graceful degradation can certainly be designed for, and the way that the GNUTELLA protocol uses bandwidth can be very much improved, allowing for many more users, but at reduced horizon sizes.

    The current protocol wastes bandwidth in at least one BIG way: it sends many short messages rather than one big one containing the requests.

    The reason that is a waste is that each short message carries a fixed-size overhead at the TCP level. This means the useful percentage of bandwidth is significantly smaller than it would be if large messages were sent.
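
    A rough worked example of that overhead. The assumptions are mine, not measurements: every query travels in its own TCP segment when unbatched, IPv4 and TCP headers are the minimum 20 bytes each (no options), a query is the 23-byte Gnutella 0.4 descriptor header plus a 2-byte minimum speed and a null-terminated search string, and link-layer framing and ACKs are ignored.

```python
IP_HEADER = 20        # bytes, IPv4 without options
TCP_HEADER = 20       # bytes, TCP without options
GNUTELLA_HEADER = 23  # bytes, Gnutella 0.4 descriptor header


def query_size(search):
    """Descriptor header + 2-byte minimum speed + null-terminated string."""
    return GNUTELLA_HEADER + 2 + len(search.encode("ascii")) + 1


def wire_bytes(searches, per_segment):
    """Bytes on the wire when `per_segment` queries share one TCP segment."""
    segments = -(-len(searches) // per_segment)      # ceiling division
    payload = sum(query_size(s) for s in searches)
    return payload + segments * (IP_HEADER + TCP_HEADER)


searches = ["metallica"] * 20
unbatched = wire_bytes(searches, per_segment=1)
batched = wire_bytes(searches, per_segment=20)
print(unbatched, batched, f"{1 - batched / unbatched:.0%} saved")
# prints: 1500 740 51% saved
```

    Under these assumptions, twenty short queries sent one per segment cost about twice the bytes of the same queries sent together.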

    Therefore it pays to hold each search request for maybe 1 second before passing all the queued requests on to the neighbours in one message. The search will be maybe 10 seconds slower due to the artificial delay (compared to 2 minutes; the reduction in bandwidth also offsets this), but possibly 50-100% more users can be catered for.

    Secondly the behaviour at collapse can be much improved. To implement the above behaviour, each client should keep a list where it keeps requests/replies before forwarding them off to neighbours.

    If a request is held for too long without a chance to pass it on, it should get thrown away (giving preference to low-hop messages). That means the horizon self-tunes, avoiding collapse and giving better search performance at small network sizes.
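
    A minimal sketch of such a holding list, assuming one queue per neighbour that is flushed about once a second. The `HoldingQueue` name, the 5-second age limit and the batch size are illustrative values, not part of any Gnutella specification.

```python
import heapq
import itertools


class HoldingQueue:
    """Hypothetical per-neighbour queue: batch messages, expire stale ones."""

    def __init__(self, max_age=5.0, batch_limit=50):
        self.max_age = max_age          # seconds before a message is dropped
        self.batch_limit = batch_limit  # messages sent per flush
        self._heap = []                 # ordered by (hops, arrival order)
        self._order = itertools.count()

    def add(self, message, hops, now):
        heapq.heappush(self._heap, (hops, next(self._order), now, message))

    def flush(self, now):
        """Called about once a second; returns one batch for the neighbour."""
        # Throw away anything that has waited too long.
        fresh = [e for e in self._heap if now - e[2] <= self.max_age]
        dropped = len(self._heap) - len(fresh)
        heapq.heapify(fresh)
        # Low-hop messages leave first, so high-hop ones age out under load.
        batch = [heapq.heappop(fresh)[3]
                 for _ in range(min(self.batch_limit, len(fresh)))]
        self._heap = fresh
        return batch, dropped
```

    When the link keeps up, nothing is dropped and the horizon stays wide. When it cannot, the messages that have travelled furthest are the ones that expire, which shrinks the effective horizon without any explicit coordination.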

    Otherwise death of the GNUTELLA net is predicted...

    --

    -WolfWithoutAClause

    "Gravity is only a theory, not a fact!"
  8. Re:GUIDS by garnier · · Score: 2

    test

  9. Re:GUIDS by garnier · · Score: 2

    test2

  10. Re:GUIDS by garnier · · Score: 2

    test3

  11. Re:GUIDS by garnier · · Score: 2

    test4

  12. Re:GUIDS by garnier · · Score: 2

    test5

  13. gPulp location by Frijoles · · Score: 3

    From the article, you can find gPulp at http://gnutellang.wego.com/.

    --
    -Frijoles-
    1. Re:gPulp location by warmcat · · Score: 3

      There are some very interesting proposals for the next version listed on the gPulp site.

      When I first saw it was ''talk before code'' my heart sank, but some of the proposals are actually very good.

      Also, I noticed at the Gnotella page that the author is pointing out that Gnutella has been going downhill - no wonder, I find it much harder to get anything useful from it than with Napster.

  14. P2P Network Searching by zpengo · · Score: 3
    Pretty soon every C: drive will have to have a robots.txt file.

    --


    Got Rhinos?
    1. Re:P2P Network Searching by The-Pheon · · Score: 2
      Pretty soon every C: drive will have to have a robots.txt file

      Too bad they can just ignore them, *cough* *cough*

  15. Can anybody answer a question? by Jon+Erikson · · Score: 2

    Can somebody tell me why the FSF named a piece of their software after a truly grim piece of food that only the French enjoy? Is this to be followed by GnuBovril, GnuMarmite and GnuGrits?

    ---
    Jon E. Erikson

    --

    Jon Erikson, IT guru

    1. Re:Can anybody answer a question? by seanmeister · · Score: 2
  16. Use Agents to Reduce Bandwidth? by oni · · Score: 2

    Whatever happened to the idea of sending a small program to a site to sift through its data and make intelligent choices about what to send back to the client?

    That is as opposed to the Gnutella, Napster, and even HTTP protocols, in which every request results in a huge number of hits and lots of extraneous file transfers.

  17. Questions by Luminous · · Score: 3
    When I read this story on Wired this morning, I was first very excited. I was thinking this had to be the groundbreaking p2p app. But when I saw it was based on the Gnutella structure, I recalled a previous discussion on Slashdot where it was said Gnutella had maxed.

    I am assuming when they say based on Gnutella, they have 'fixed' some of the problems. Or, maybe it really isn't searching the entire network, but key segments of the network. And maybe it just ignores 56k connections altogether.

    In your honest opinions, what is the viability of gPulp? Is this a bandwagon that deserves support or should we (okay, you real programmers) continue to develop a more robust and 'intuitive' P2P system? I do believe a well-built, user friendly P2P app will take the internet by storm. We've only scratched the surface. What I am afraid of is that instead of looking down the road and seeing what the requirements and capacity of the P2P app will be, the development community will continue to add to and tweak the current flawed or underpowered systems.

    --
    This is not the way to build a lasting empire.
  18. Talk about net congestion... by psychofox · · Score: 2

    I'll be interested to see how they deal with the problem of congestion. I know that I hit search engines around fifty times a day. If each one of my (and everyone else's) queries were distributed a la Gnutella, then surely a significant proportion of total bandwidth would be used by those queries alone. Presently, each one of my queries creates a tiny amount of traffic. This is an efficient system. I realise the cost involved in hosting something like google, but google seems quite happy to bear it... I can only see gPulp being a success if it offered a quality of result not available with current systems. A system like gPulp should be fine for specialised searching (i.e. for phone numbers or mp3s) but I doubt it would be able to outperform the brute force of google in a more general case.

  19. However by Mr_Silver · · Score: 3
    Let's hope that it doesn't suffer from the same problems as Gnutella.

    The technology of the Gnutella system is limited by the version that the majority use. In other words, if version 2.0 is out and has some cool new features, it'll be useless if the majority of people are using 1.0 because they won't recognise the new stuff.

    For example, say my version of the gnutella client can do regular expressions. Throw a regular expression at another (non RE supporting) client and it'll either think it's a normal search string (in which case you'll probably get nothing back) or it'll throw it away (so you'll also get nothing back). You can't win.

    You can't make people update, and if you made v2 not backwards compatible then you'd fragment the network. Napster may be peer to peer, but if they release a new client with new features then the people who download it get those features immediately (unless the features require someone else to have a newer client too, of course).

    My other concern is the speed of searches. Again the network is limited by the majority. With speed it's going to be all those people on modems. Have they done any testing on a very large scale? Does anyone know how much faster the network would be without all the people on modems?

    In the end, either this will work and be groovy or it'll flop or it'll work for a while, overload and die.

    One final thought: you have to wait at least 20 seconds to get a decent number of results back. Are people really prepared to sit there for that long (probably a *lot* longer) in front of their browser for the results? Most research indicates that people who have to wait more than 5 seconds for a webpage to download get bored and go elsewhere.

    --

    --
    Avantslash - View Slashdot cleanly on your mobile phone.
  20. A bit exaggerated by grahamsz · · Score: 5

    Certainly in the context of using it on a Lan to locate things this could be a very powerful tool. However how many sysadmins would like to have to secure every workstation from hackers instead of just every key server... fun stuff.

    However on the internet it's doomed to failure. I followed the work on GnutellaNG for a little while and it seemed at that point to be involved in attempting to reduce the bandwidth requirements of gnutella by slimming down the protocol, whilst simultaneously increasing the functions and hence bandwidth requirements.

    Ultimately any P2P system is limited by the outbound bandwidth that each user has. At the moment, with about 3000 hosts on Gnutella, you are using about 1.5-2 kbytes/s for each connection you have open (most people have 2-4), and that doesn't include bandwidth left to upload or download.

    Curiously though, this would be the optimal way of doing things (not gnutella in particular, but p2p) if it weren't for the fact that we have so little bandwidth at the end user.

    Even cable modem users typically have only 128kbit upstream, which will only take gnutella to about the 10,000 user mark before it starts to fall over again. The same has to be true of any raw peer to peer system.
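
    A back-of-envelope check of that figure, using only the numbers quoted above. The one added assumption is that search traffic per connection grows linearly with the number of hosts, which is what you would expect if every query reaches every host; the function names are made up for the example.

```python
BASE_HOSTS = 3000
UPSTREAM_KBYTES = 128 / 8   # 128 kbit/s cable upstream = 16 kbytes/s


def upstream_needed(hosts, per_conn_kbytes, connections):
    """kbytes/s of search traffic, assuming it scales linearly with hosts."""
    return per_conn_kbytes * connections * hosts / BASE_HOSTS


def max_hosts(per_conn_kbytes, connections):
    """Host count at which search traffic alone fills the upstream link."""
    return BASE_HOSTS * UPSTREAM_KBYTES / (per_conn_kbytes * connections)


for per_conn, conns in [(1.5, 2), (1.75, 3), (2.0, 4)]:
    print(per_conn, conns, round(max_hosts(per_conn, conns)))
# prints:
# 1.5 2 16000
# 1.75 3 9143
# 2.0 4 6000
```

    Depending on which end of the quoted ranges you pick, the ceiling lands between roughly 6,000 and 16,000 hosts, with the middle case close to the 10,000 mark, and that is before any bandwidth is left over for actual transfers.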

    No amount of optimisation will reduce the bandwidth requirements of any search having to be executed on any host.

    Freenet on the other hand is a lot smarter than that and does actually move information about in a streamlined manner. Unfortunately I fear that freenet would fall over and die right now if it were holding the terabytes of files that gnutella does - so it appears not to be the best solution either.

    We need more bandwidth at end users and less at big corporations, except that would count as empowering the people and be morally repulsive to most politicians.

  21. Wishes by relinquish · · Score: 2
    A few things I'd find interesting from gPulp, Freenet or what have you.

    IP tunneling - recreate a whole new web, allocate domain names freely and to first bidder - corporates will never have their say

    Interest sheets - by filling out an interest sheet for your node, the software tends to connect you to nodes sharing your interests, thereby enhancing search retrieval speed on your topics of interest.

    Automatic load balancing - if I'm making heavy use of non-freenet services, reduce the bandwidth allocated to freenet. If I'm doing something else (or not using my computer), give full bandwidth to freenet, help with indexing/caching other nodes, etc.

    On a side note, ideally, I think freenet should concentrate on making a secure encrypted overlay net protocol (IP tunneling, means of logging/using freenet anonymously) while having gPulp and the like concentrate on the file sharing.

    Please note, I'm not pretending to have a full grasp of the issues - these are just a few ideas.

    --
    Relinquish
  22. Distributed Operating System by roman_mir · · Score: 2

    Currently there are no real distributed operating systems that comply with all the requirements of such a system. Many have attempted to use Solaris as such a system, however no Unix based OS can be called truly distributed. GNUtella, gPulp, FreeNet, HotLine etc. are only file searching and delivery systems, but it seems that the logical step from their approaches should be the creation of a real distributed operating system, where all computers are resources of this OS and all users are logged into one huge computer. If future versions of Linux could achieve this goal, Linux would become the final frontier of OSs.

  23. Simple answer: by Enoch+Root · · Score: 2

    They're running out of stupid puns starting in 'GNU'.

  24. This will never fly... by sitcoman · · Score: 2
    ...unless these guys come up with an entirely new and better protocol.

    Ever notice how Gnutella just loves to suck all your bandwidth up, as twenty billion people search for "britney spears cumshot"? Well just imagine the joy of having the entire Internet glutted with that kind of crap.

    From their site, it looks as though they're still asking for proposals. So let's hope someone has better ideas than "Gnutella to the Stars!!!" if they hope anyone will run it on their servers.

    -

    --

    -
    me doesn't live for do [DEPRECATED]

  25. Instant Messaging by Dr.+Evil · · Score: 2

    Does anybody see an IM application using Gnutella-like protocols? Register yourself with your friends and maybe one of thousands of semi-static servers and totally negate the need for AOL/Netscape/Mirabilis and... who's left... Microsoft and Yahoo/Geocities?

  26. I love the idea behind GNUTELLA by kriegman · · Score: 2

    but I haven't heard anybody address the fundamental problem with it: its vulnerability to attack. The best example is advertisers responding to every query with html garbage. But it seems to me that anyone who dislikes the service will do the same.