Slashdot Mirror


Gnutella Technology Powers New Search Engine

Matrium writes: "News.com (owned by CNet) is running an article on how the makers of Gnutella have turned their decentralized model of information swapping away from music and porn, and are now looking at search engines. InfraSearch is still in beta, but it does offer an interesting look at the evolution of the Internet." InfraSearch presently paws through only a few search sites, but as a concept it really intrigues me. For one thing, it introduces the long-overdue concept of "how long to search" right into the query dialogue.

29 of 67 comments

  1. Only collaborative filtering will prevent this. by Paul+Crowley · · Score: 2

    This, and lots of other sorts of spamming, admit only one really good solution: collaborative filtering. You can find out more about this from Berkeley's link farm.

  2. This looks BAD by thomasd · · Score: 2
    Right now, the Internet could really do with some tools which empower the user. This looks like another way for big content providers to herd users where they want them. Traditional search engines have always been a bit of a battle, with content providers trying to find new ways to `stuff' the search results and make sure that their pages come out on top. Now you don't need to do that any more -- just pay these guys whatever they're asking, and you can display what you want. Including images, by the look of things, which sounds cool, but really just lets providers grab your attention.

    Of course, if this were heavily user-moderated, I guess it might just work. But don't hold your breath. I'll be sticking to Google...

  3. old idea: federated search engines by jetson123 · · Score: 2

    Federated search engines aren't exactly a new idea. The reasons they haven't been used much on the web are issues related to quality of service, reliability, spamming, and revenue sharing. Those could be worked out, but web search engines seem to have found it easier to just centralize resources. Federated search has been more popular within particular communities (e.g., scientific literature, intranet sharing), and for integrating a few top search engines (MetaCrawler etc.). For more information, type "federated search" into Google.

  4. Re:Interesting idea with problems by Kaa · · Score: 2

    [Meta tags] They're still there, and still used correctly by some sites.

    Sure, but does anybody care? Meta tags are now 'tainted' and no search engine even spares as much as a glance in their direction.

    Perhaps collaborative trust-based filtering is the way to go.

    I don't understand how this could work. It's doable for a tree-like reference site (Yahoo), but seems impossible for a pure search engine (Google). Let's say I type "support vector machines Vapnik" into the engine -- who and how is going to filter the results so that I don't get "naked and petrified young girls" matches? The only feasible thing seems to be locking out of IP addresses which supply bogus data, but this will get extremely messy extremely fast.

    You MUST provide an ALT text alternative to any images, otherwise you will drive away viewers

    So instead of a banner I will see "Come to our site for hot chicks, naked and petrified...". I don't think this is going to help much.

    I wasn't really concerned with bandwidth. I was concerned with the fact that a search becomes a request to send targeted advertising to you, and nothing more.

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  5. Re:An important step by Wah · · Score: 2

    I'd like to see a distributed version of consumer reports.

    www.deja.com

    But organization helps, so for now, stick with consumer reports.
    --
    +&x
  6. What is the difference to harvest? by gpoul · · Score: 2

    By reading this article I don't really get the difference between this search method used by gnutella and the harvest web indexer. I have to admit that I don't know much about either of them, but to me they look nearly the same.

    1. Re:What is the difference to harvest? by cwhicks · · Score: 2

      Harvest still has a central Broker server to which all the Gatherers send the info they have collected, whereas Gnutella is truly decentralized: your searches all go to what would be the Gatherers in Harvest, with no centralization.
      The reason for the Broker server may be because of problems of spam etc. as many have been discussing above.

      --
      - I like pudding.
    2. Re:What is the difference to harvest? by Spankophile · · Score: 2

      o Gnutella gets attention for being a haven for pirates.
      o Nullsoft creates a search engine based on the technology to legitimize it.
      o InfraSearch gets media attention, much fanfare etc.

      . Harvest creates a distributed search engine.
      . Had you heard of it before now?

      It's the same old story: to be heard you have to be controversial, or rich.

      Gotta love the 'net - created for war, popularized by porn and piracy.

  7. Re:Open Source but not by Junks+Jerzey · · Score: 2
    I really, really hate projects which are "open source" but who refuse to release the source until it's "done." Too many projects these days seem to be following that path, and it's a dangerous one to take. Because what if the code is never truly "finished", as no project ever really is? It's sad.

    It is not sad. You're simply parroting Linus Torvalds's "release early and often" advice. This works for an OS, because compatibility problems are the issue of the day. There are also reasons not to do so:

    Most people don't take the time to upgrade, so they'll miss out on major features that are added later.

    If you release something that is incomplete, people will try it, see that it is incomplete, and have a poor impression from that point on.

    If promising software is released at an early stage, there is likely to be much more "cloning activity," channeling effort into doing things again--the right way--instead of tweaking something that's already most of the way there.

  8. dns rehash by shomon2 · · Score: 2

    First time I've seen how this stuff really works, and the implications are really amazing:

    All these technologies are just DNS all over again. DNS was created to make host information available all over the internet. Here's the difference:

    When DNS was set up, it was probably just as easy to try and pirate stuff, and who knows? Maybe people did use the early internet for illicit purposes, but only the few people who were in the know could actually do so. And not much was available on the net anyway. But DNS was created for the exact same purpose as napster and freenet: to make it easy to share information.

    Nowadays, the internet is so big that there are lots of people into it only to make money. The possibility of a scam makes people run to see how they can get their share of it, and a technology like this, however innocent, will make the headlines when everyone rushes over to see what scam (and related lawsuit) they can pull off.

    All these technologies: freenet, napster-likes, all sorts of things, are incredibly valuable extensions of what already provides structure for the wired world. If someone had thought of them in 1980, we would have a much tighter, distributed internet today.

    Well, we've thought of them now. I hope they are allowed to flourish, and that people don't keep just thinking about the negative implications of them. This is the first time I've seen a concrete example of putting it to good use.

    I think we should have the right and the possibility to choose to share what we want to:

    Imagine all the information that our governments gather from us, à la Enemy of the State for example: with this kind of peer-to-peer network idea, we could all be gathering and sharing that information already, and maybe even doing something positive with it!

  9. Developer's mailing list by CMU_Nort · · Score: 2

    This has been an ongoing discussion on the developers list. Some people are for it, others against. The ones who seem to be against it are the ones who want to maintain the simplicity of gnutella. Others are suffering from kitchen-sink syndrome. You have to admit that the general technology behind gnutella could be adapted to a really great real-time web search engine. That is, if they ever get around to releasing the source to it.

    --
    --------- Beware the dragon, for you are crunchy and good with ketchup.
  10. Starting my IPO by mat+catastrophe · · Score: 2
    I think I will go public next week, or maybe tomorrow. I am starting a small firm that will search the search results that were searched by a search engine that searches search engines.

    We're called Search.

    --
    sig not found
  11. Gnutellanet by Dungeon+Dweller · · Score: 2

    Gnutella brings an interesting thought to our Internet, and it is an old one. It is an ever-present, self-expanding, responsive, searchable file system. You don't register with the search engine, you become a part of it. With the advent of major web page traffic, we got away from this very important concept. A large on-the-fly networked filesystem and expanding network connections are what the world really needs in order to move forward. As for search time, that is necessary in such a paradigm, since the responsiveness of searches is not a function of the algorithm running a localized database, but of the responsiveness of network nodes. I.e., this is the good stuff that we left long ago in search of "user friendliness." It turns out that it is more useful, quicker, and friendlier! This is what we haven't been waiting for, but put on the back burner for a bit, in order to turn a profit. If you think about it though, with the right protocols to initiate such a turn in industry, this could be even more profitable than the web. ...And maybe we could get away from identifying ourselves by someone else's product (URLs).

    --
    Eh...
  12. Think of the children!!! by Glowing+Fish · · Score: 2
    The children! We mustn't harm the children! Think of the children!

    Please Slashdotters, Think of the Children! and stop stealing the food of the innocent children with this Gnutella technology!

    --
    Hopefully I didn't put any [] around my words.
  13. Re:The "system" has brought this on itself. by Whoozit · · Score: 2

    I just thought of something - remember way back when the internet first became somewhat popular? Everyone went bonkers saying it would be a haven for porn, piracy, hate literature, etc., and there were debates (maybe not as public as the current ones) on whether the internet should be censored or not.

    At the time, as I remember, one major argument against censoring the 'net was that it was nearly impossible to do - "anyone" can post "anything" on the net, and because it's so international, no one nation had control.

    Oh where have those days of freedom gone? How did the censors get past those barriers? Easily enough, it seems - they have the money and the will to spend it on their self-interest that makes anything possible. In retrospect, our claims of immunity from censorship were naive.

    I believe that once systems like Gnutella become popular, they will move (like the original web) from being a geek's haven to a corporate tool, and be appropriately restricted by corporate needs. Maybe less so than the web today, but order will be enforced. How? I don't know, but did we foresee the DMCA and other tactics corps are using to censor the 'net?

    Don't worry, by then I'm sure we'll think of something else. :)

  14. Re:Faking results? by GeZ117 · · Score: 2

    A sort of learning search engine that receives feedback from users ("this site isn't about what it claims to be, don't show it as a possible answer", "this site is excellent and very complete", "this site is nice but unreadable without browser so-and-so", etc). That could be the future. Of course it would need a lot of negative feedback unbalanced by positives, plus some other checks, to definitely dismiss a site; otherwise we can imagine a way of making "softwar": imagine some corporation writing a script that sends bad feedback to bury a competitor's website...

    Sidenote: IIRC, "Softwar" is the title of a novel, so this wordplay is not mine.

    --
    sigmentation fault
  15. Moderation/site ranking system by jesterzog · · Score: 2

    I'm definitely not an expert, but collaborative moderation doesn't seem like a bad idea. You could maybe have a separate, probably more centralised moderation server. (Or lots of them if people start running their own.) Users could rank the results they get from any given site, and when others run a search, the reliable replying sites come up first.

    There are still lots of problems though, like how to stop anyone from just moderating their site to the top, and how to make sure the responding site is exactly who they say they are.. which is one of the major problems with spam these days anyway. It could also be really tedious working out how to distinguish a good result from a bad result.
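
    A rough sketch of the ranking idea above, in Python. Everything in it is hypothetical: the `ModerationServer` class, the useful/not-useful vote, and the smoothing prior are assumptions for illustration, not anything InfraSearch or Gnutella is known to do. It only shows how user ratings could reorder replying sites, and it does nothing about self-moderation or verifying who a responding site really is.

```python
from collections import defaultdict


class ModerationServer:
    """Hypothetical central store of user ratings for replying sites."""

    def __init__(self, prior_votes=5, prior_score=0.5):
        # Smoothing prior (an assumption): a site with few votes stays
        # close to prior_score until enough real ratings come in.
        self.prior_votes = prior_votes
        self.prior_score = prior_score
        self.good = defaultdict(int)
        self.total = defaultdict(int)

    def rate(self, site, was_useful):
        """Record one user's verdict on a result returned by `site`."""
        self.total[site] += 1
        if was_useful:
            self.good[site] += 1

    def score(self, site):
        """Smoothed fraction of useful results, between 0 and 1."""
        num = self.good[site] + self.prior_votes * self.prior_score
        den = self.total[site] + self.prior_votes
        return num / den

    def rank(self, replying_sites):
        """Order replying sites so the most reliable come first."""
        return sorted(replying_sites, key=self.score, reverse=True)


if __name__ == "__main__":
    server = ModerationServer()
    for _ in range(20):
        server.rate("honest.example", True)
        server.rate("spammy.example", False)
    print(server.rank(["spammy.example", "new.example", "honest.example"]))
```

    With these made-up votes, the well-rated site sorts first, an unrated site lands in the middle, and the badly rated one drops to the bottom.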

  16. Faking results? by Ed+Avis · · Score: 3

    What's to stop people 'spamming the index'? When your site gets a query, you could respond with 'very strong match' in the hope of getting more hits.

    Who is enforcing that sites won't just lie? Maybe some sort of collaborative moderation a la Slashdot would be needed?

    --
    -- Ed Avis ed@membled.com
  17. Open Source but not by CMU_Nort · · Score: 3

    I really, really hate projects which are "open source" but who refuse to release the source until it's "done." Too many projects these days seem to be following that path, and it's a dangerous one to take. Because what if the code is never truly "finished", as no project ever really is? It's sad.

    --
    --------- Beware the dragon, for you are crunchy and good with ketchup.
  18. More than web data? by Spankophile · · Score: 3

    Since people can define their own content, would this mean that people running the server-end could still be distributing their MP3's, pr0n etc, but through a web interface? It's not just limited to html-page searching.

    This makes pira^H^H^H^H trading files even easier - people no longer need to install a client, there's a nice web-search interface, with direct dload URLs. Web searching for files with no broken links. Nice.

  19. gnutella sucks right now by ArchieBunker · · Score: 3

    Go ahead and do a search for something. Within 5 minutes whatever connection you have (dsl t3 etc) will be saturated. Has anyone ever had a complete download? Getting 100bytes/sec on a 5 meg file is insane. Maybe if it reported their connection speed truthfully and people set realistic download/upload limits.

    --
    Only the State obtains its revenue by coercion. - Murray Rothbard
  20. Will this work? by dayL8 · · Score: 3

    Doesn't the model imply that every search will be processed by every available server - effectively turning a single query into n queries and responses?
    Just think - you're dialled in to an ISP and want to search for something. Eventually you start getting responses, first from hosts logically closer to you then those further away (we can only hope that there's no negative response in the protocol). You may have to wait for it all to come down the line before you get a useful result. And you'll still have to wade through mountains of useless junk (since responders get to define what content they have) just that now you'll have to actually visit the site to see that it's just another boring article on internet protocols instead of the "fix your credit record" guys you were looking for. Eventually, you'll learn which hosts not to accept responses from and which ones respond better to what types of queries (just like today).

    Big search engines will still dominate the field by being able to get it right most of the time. I don't see any real advance.

    ---

    --
    The real problem is entropy.
  21. An important step by GrayMouser_the_MCSE · · Score: 3

    The article mentions this, but not strongly enough. Without "legitimate" applications, technologies like these will be viewed as simply tools for piracy or other illegal use. FTP, as an example, could be used for those purposes, but the mainstream uses came first. We need to develop as many mainstream uses for mp3 and gnutella as can be done, so the focus of the technology critics can be drawn away from the music/copyright questions and on to the other uses. As of now, they can claim that other uses are simply "vaporware". Sure, they're possible, but no one is actually doing anything with them. Once the applications come, the technology will gain the acceptance it deserves.

    --
    Of course I use Microsoft. Setting up a stable unix network is no challenge ;p
  22. Re:Simple pirate tools? by MindStalker · · Score: 4

    Yes, but the idea of letting clients search each other and share files, the Napster or Gnutella way, is a very good idea. It has many legitimate possibilities; it's just that it started out being used for piracy, and saying that its only use is piracy is a bit short-sighted. Though honestly, it can be easy to see things that way. I used to believe the only good use for CD-R technology was copying games and music. But then I became a network administrator and realized its benefits for cheap backups. Anyway, my point is that you should never abandon new ideas just because their first uses are bad (take nuclear power, for instance :)

  23. The "system" has brought this on itself. by JamesSharman · · Score: 4

    The internet has traditionally been free, but in recent years and months we have seen an increase in attempts to control it via legislation, patents and lawsuits. The problem is that whilst the internet has seen a large influx of everyday Joes and suits, the real power behind the net is, as always, the people who write the software. Gnutella and software systems like it are part of the fight back. Previously, online systems have been centralized due to simplicity and the lack of any reason to build them differently. Since we are now entering a time when the freedom we used to take for granted online is under threat, new software systems that are nearly impossible to regulate are inevitable. If the various governments and organizations had paid attention to the cherished principles of the net, perhaps we could have found a way to limit the pedophiles and professional pirates that they seem so paranoid about without compromising the net's principles too much. Instead the MPAA, the RIAA and all the other control freaks decided they wanted to make a war out of it, and a war they will get.

  24. Distributed searching. by ClayJar · · Score: 4

    What's to stop people from spamming the index?

    I suppose they could build in a little technology to actually check the page. On the other hand, anything you do can be circumvented.

    I suppose this is the classic downside to the entire Internet "thing". You can't enforce absolute control in a medium specifically designed against it. Of course, there are a few things you could do to help the situation.

    With a Gnutella-style model for distributed searches, any host that is consistently returning false positives could be cut off by the adjacent node(s), right? If you have tons of traffic coming through your node from a spam site, couldn't you just stop forwarding requests to them?

    Of course, this wouldn't stop all spamming on the index, but it should allow any one node to cut off a spam node "below" itself. On the other hand, since not everyone will be eternally vigilant, this much freedom could be damaging.

    You could always have something like the MAPS RBL for search nodes. Just have someone paying attention that can keep a database of hosts to ignore requests from. If anybody can create a blackhole list, it wouldn't necessarily be centralized, so it wouldn't impinge on freedom of the search. It may still have an "open relay" problem, like SMTP does now, but that doesn't necessarily make it not worthwhile.
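
    To make the two ideas above concrete, here is a minimal Python sketch of a node that stops forwarding to a neighbour once too many of its answers turn out to be false positives, and that also honours a shared ignore list in the spirit of the MAPS RBL. The `SearchNode` class, the thresholds, and the assumption that something (a user, or a page checker) reports false positives back to the node are all hypothetical; this is not part of the Gnutella protocol as far as I know.

```python
class SearchNode:
    """Hypothetical node that stops forwarding to unreliable neighbours."""

    def __init__(self, blackhole=None, max_bad_ratio=0.8, min_samples=10):
        # Shared ignore list, RBL-style; anybody could publish one.
        self.blackhole = set(blackhole or [])
        # Thresholds are arbitrary illustration values.
        self.max_bad_ratio = max_bad_ratio
        self.min_samples = min_samples
        self.stats = {}  # neighbour -> [false positives, total answers]

    def report(self, neighbour, was_false_positive):
        """Record whether an answer relayed by `neighbour` was bogus."""
        counts = self.stats.setdefault(neighbour, [0, 0])
        counts[1] += 1
        if was_false_positive:
            counts[0] += 1

    def should_forward(self, neighbour):
        """Decide whether to keep sending queries to `neighbour`."""
        if neighbour in self.blackhole:
            return False
        bad, total = self.stats.get(neighbour, (0, 0))
        if total < self.min_samples:
            return True  # not enough evidence to cut anyone off yet
        return bad / total <= self.max_bad_ratio


if __name__ == "__main__":
    node = SearchNode(blackhole={"listed.example"})
    for _ in range(10):
        node.report("spam.example", True)
        node.report("ok.example", False)
    for name in ("spam.example", "ok.example", "listed.example"):
        print(name, node.should_forward(name))
```

    The `min_samples` floor is there so that one bad answer does not get a neighbour cut off, which is the "not everyone will be eternally vigilant" trade-off in reverse.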

  25. == Do it yourself DDoS? by Pooh22 · · Score: 4
    One of the problems I see with this gnutella method is the broadcasting of the searches.

    Example: If you get the results of this kind of broadcast search back from a bad search ("sex nude pictures jpg"), you'll trash your own internet connection and probably that of others (or the search-interface's if you use a web-interface).

    Imagine a network of a million hosts (a small subset of all webservers). Each of these is running a gnutella-based search-engine. On one of the servers is an interface to search the network for some information. The query is forwarded onto the overlay network, to say 10 nodes at each node, assuming some mechanism is in place to avoid loops. If the network is well interconnected, it will take about 5-6 hops to reach an edge of the cloud (probably a couple of times more to reach all the nodes). As soon as the first nodes get the search-request, they send back results, say limited to the first 5-10 most significant hits. Each reply has a number of tuples, each consisting of a URL, a description, an indication of how close the match is, a timestamp, and probably some more: maybe 1-2 kB per reply. Say 10% of servers have a match, then 100,000 hosts will at some point send back results.

    I calculate that roughly 100 MB of results will be arriving at the searching node within a few minutes, if it can process the dataflow.
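
    The back-of-the-envelope numbers above can be reproduced with a few lines of Python. The function names are made up for this sketch, the flood is idealised (every forwarded query reaches a new node), and 1 MB is taken as 1000 kB.

```python
def hops_to_cover(hosts, fanout):
    """Hops needed for an ideal flood (no overlap) to reach `hosts` nodes."""
    covered, frontier, hops = 1, 1, 0
    while covered < hosts:
        frontier *= fanout      # each frontier node forwards to `fanout` new nodes
        covered += frontier
        hops += 1
    return hops


def reply_volume_mb(hosts, match_percent, kb_per_reply):
    """Number of replying hosts and total reply traffic in MB."""
    replying = hosts * match_percent // 100
    return replying, replying * kb_per_reply / 1000


if __name__ == "__main__":
    print(hops_to_cover(1_000_000, 10))        # hops for a fanout of 10
    print(reply_volume_mb(1_000_000, 10, 1))   # 1 kB per reply
    print(reply_volume_mb(1_000_000, 10, 2))   # 2 kB per reply
```

    Under these assumptions the flood needs 6 hops, and 100,000 replying hosts send between 100 MB (at 1 kB each) and 200 MB (at 2 kB each) back to the searching node.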

    This is only one search, both the searching nodes and the servers will have to deal with a lot of searches if you look at other search-engines as a comparison.

    Centralised search-engines are a good way to limit the bandwidth-usage, but they are slow to get changes on the web.

    idea: It would be good to have a webserver keep track of an index for its own document-space and, when that changes, push that change to a central search-engine where it can be searched. Distributing the searches is a waste of resources; IMHO you should distribute the indexing mechanism and centralise the searching.

    And considering that for this thing to work you need an index-engine on each server anyway, it's a small step to do it like this, isn't it?
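
    A minimal sketch of that "distribute the indexing, centralise the searching" idea, in Python. The `CentralIndex` class and its `push`/`search` methods are invented for illustration, tokenising by whitespace is a deliberate oversimplification, and a real system would also have to authenticate whoever is pushing.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central search engine that accepts pushed updates."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs
        self.words_by_url = {}            # URL -> words from the last push

    def push(self, url, text):
        """Called by a webserver whenever one of its documents changes."""
        new_words = set(text.lower().split())
        # Drop postings for words that are no longer in the document.
        for word in self.words_by_url.get(url, set()) - new_words:
            self.postings[word].discard(url)
        for word in new_words:
            self.postings[word].add(url)
        self.words_by_url[url] = new_words

    def search(self, query):
        """Return the URLs that contain every word of the query."""
        words = query.lower().split()
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


if __name__ == "__main__":
    index = CentralIndex()
    index.push("http://a.example/x", "support vector machines")
    index.push("http://b.example/y", "vector graphics")
    print(index.search("support vector"))
```

    A search here is one lookup against one server, and a changed page shows up as soon as its webserver pushes it, instead of waiting for a crawler to come round again.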

  26. Interesting idea with problems by Kaa · · Score: 5

    The idea is interesting, no doubt. However, there are three major problems (from my POV) with it:

    (1) An obvious point: if a site itself decides which queries to respond to, there'll be a lot of spamming the index. Doesn't anybody remember the fate of the [meta] tags?

    (2) This search technology essentially turns a search into an advertising stream. Since the site decides what to return, it'll return a blurb instead of a context around the match. And if the site can return graphics and not just text strings... oh, my! Advertising banners as search results! Joy.

    (3) The results are going to be dependent on the location of the query. Same question asked from a machine in California is likely to return different results if asked from a machine in Germany (especially with low timeouts). This isn't horrible, but not all that good. In particular, it means that I cannot tell other people "Search for 'foo', you'll find the site I am talking about on the first page".

    Out of the three, the first is so obvious, something will be done about it. I don't know what, though. It's the second that worries me most of all. Besides more advertising, there is a basic problem here -- I want to see what the site has, not necessarily what they prefer to show me. To give a trivial example, a company could have a recalls/warnings/manufacturing defects page somewhere on its site to satisfy disclosure requirements, but never return this page to any search.

    All in all, I'll stick with Google for the time being, thank you very much.

    Kaa

    --

    Kaa
    Kaa's Law: In any sufficiently large group of people most are idiots.
  27. Illicit network by Hard_Code · · Score: 5

    "Unlike Napster, however, it allows people to search for any kind of files; a random sampling of the search terms being used at any given time ranges from MP3s to blockbuster movies to pornography."

    "The Department of Transportation released a shocking report this morning, in which it was discovered that the federal highway system, unlike rural routes, allows transportation of any kind of material. A random sampling of items being transported at any given time ranges from pirated music to pirated blockbuster movies, to pornography."

    --

    It's 10 PM. Do you know if you're un-American?