Slashdot Mirror


Interesting Concepts in Search Engines

TheMatt writes "A new type of search algorithm is described at NSU. In a way, it is the next generation over Google. It works off the principle that most web pages link to pages that concern the same topic, forming communities of pages. Thus, for academics, this would be great as the engine could find the community of pages related to a certain subject. The article also points out this would be good as an actually useful content filter, compared to today's text-based ones."

8 of 230 comments

  1. Some issues on linking. by Restil · · Score: 5, Informative

    Google pioneered the use of links to deduce pages' relevance. Its PageRank technology counts a link from site A to site B as a vote for B from A. But it does not take account of all the other sites to which A has links, as NEC's new technique does.
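    For reference, here is a minimal sketch of the PageRank iteration as it was publicly described, not Google's production system. The function name, the damping value of 0.85, the iteration count, and the toy link graph are all assumptions made for illustration. Note that in this formulation a page's vote is split evenly among its outgoing links, so the other sites A links to do affect how much weight B receives.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration sketch. links maps each page to the pages it links to."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page's vote is divided among all of its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical three-page graph: A and C both link to B, and B links to A.
example = {"A": ["B"], "C": ["B"], "B": ["A"]}
ranks = pagerank(example)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    The scores are computed once for the whole graph, with no search term involved, which is why they can be stored ahead of time.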

    I won't pretend to know all the inner workings of Google's search engine technology. But I believe that Google DOES care about other links from site A. This falls into the hub and authority model, which is defined recursively. A hub is a site that links to a lot of authority sites. An authority site is a site that is linked to by a lot of hubs. Basically, authorities provide the content, and hubs provide links to the content. In this example, B is an authority site, and A is a hub.

    The way the ranking works is that if B is linked to by a large number of quality hub sites, then it gets a correspondingly high quality rating. Likewise, if a hub links to a large number of high-quality authority sites, then its quality will also be ranked highly as a result.
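    The recursive hub and authority definition above can be sketched as a simple iteration. This is only an illustration of the general model (usually known as HITS), not a description of how Google actually ranks pages; the function names, the iteration count, and the toy graph are assumptions.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit length so they do not grow without bound."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """links maps each page to the pages it links to. Returns (hub, authority)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub score: sum of the authority scores of the pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical graph: A links to B and C, and D links to B.
hub, auth = hits({"A": ["B", "C"], "D": ["B"]})
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

    In this toy graph B ends up as the strongest authority because both hubs point at it, and A ends up as the strongest hub because it points at more of the authorities.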

    This also allows Google to provide links to sites even if the search terms don't match the content of that site. A hub that links to a lot of sites about cars will relate cars to ALL the links, regardless of whether the word "car" appears on the linked site.

    Of course, I'm not THAT familiar with Google. It's possible I'm full of bunk. But I'm pretty sure it works this way to some extent and that Google does pay attention to the hub-based links.

    -Restil

    --
    Play with my webcams and lights here
  2. Clustering by harmonica · · Score: 5, Informative

    Clustering pages is what other search engines like Teoma are doing already.

    In a recent interview in c't magazine, a Google employee (Urs Hölzle) said, when asked about clustering, that they had tried that a long time ago, but they never got it to work successfully. He mentioned two problems:
    - the algorithms they came up with delivered about 20 percent junk links for almost all topics
    - it's hard to find the right categories and give them correct names, esp. for very generic queries

    Of course, just because Google didn't get it to work properly doesn't mean nobody else can. But it's harder than it looks, and it's been known for quite a while.

  3. Re:Sparse on details and a working demo by jsprat · · Score: 3, Informative
    His homepage

    A postscript document detailing his research.


    Also, if you're a member of IEEE Computing, you can see his publication.

  4. Re:Exploiting search engines that rank popularity by tiltowait · · Score: 5, Informative

    Did you read the update on the page, or are you just parroting the previous +5 post on this?

    Since this was first brought up a few days ago, the Scientology volunteer editor at the Open Directory Project, an upstream content provider for Google, was fired.

  5. Explanation of the joke by Wire+Tap · · Score: 3, Informative

    For anyone out there who doesn't quite know why this is +5 worthy, here is the joke:

    On Super Bowl Sunday a commercial aired, featuring none other than Kevin Bacon at a retail store, trying to use a check to pay for his goods. The man behind the counter asked to see ID, but Bacon didn't have any on him. What now? Bacon runs around town gathering people (an extra he played in a movie with, a doctor, a priest, an attractive girl, and maybe one other guy?), who all had some ties to one another, through the other 6 in the group. The attractive girl once dated the sales clerk in the store, so Kevin explains that they are "practically brothers," hence putting to good use the principle of 7 degrees of separation.

    Therefore, the humor lies within. :) This is, of course, a very pop-culture oriented joke that will probably fade even more quickly than AYB did after its behemoth prime of last year and the December before. Long live the meme.

    --

    Man is born free; and everywhere he is in chains.

  6. This is not a new idea by John+Harrison · · Score: 3, Informative
    I will refer you to the Clever project at IBM. I first read about this years ago when Google was still a project at google.stanford.edu.

    Clever does Google one better by separating the results of searches into "hubs" and content. Hubs are sites with lots of links on a particular subject. Content sites are the highly rated sites linked to by the hubs.

    I thought it was a very interesting concept and I am surprised that it was not commercialized. Of course, IBM is in the business of buying banner ads rather than selling them. They could always do like /. and OSDN and mostly run ads for their own stuff though....

    1. Re:This is not a new idea by John+Harrison · · Score: 3, Informative
      How do you know this is not how Google creates its search results? What you've described sounds exactly like how Google describes their technology:

      I know because I have read about both technologies. I discussed the merits of Clever v. Google a few years ago with classmates that were taking the class at Stanford that spawned Google. That is how I know.

      End of Rant

      There is an excellent article on Clever that appeared in Scientific American a few years ago. It was linked to from the page I originally posted. You should check it out. Clever returns results divided into the categories of "hubs" and "authorities". I have never noticed Google doing that.

      Here is an excellent summary from the article on the differences between Clever and Google:

      Google and Clever have two main differences. First, the former assigns initial rankings and retains them independently of any queries, whereas the latter assembles a different root set for each search term and then prioritizes those pages in the context of that particular query. Consequently, Google's approach enables faster response. Second, Google's basic philosophy is to look only in the forward direction, from link to link. In contrast, Clever also looks backward from an authoritative page to see what locations are pointing there. In this sense, Clever takes advantage of the sociological phenomenon that humans are innately motivated to create hublike content expressing their expertise on specific topics.
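      The "look backward" step in that summary can be sketched as follows. This is a rough illustration under stated assumptions, not IBM's code: the function name `expand_root_set`, the page names, and the idea that the root set arrives as a plain set of matching pages are all hypothetical.

```python
def expand_root_set(root, links):
    """Grow a query's root set by following links in both directions.

    root  -- set of pages that matched the query (assumed to be given)
    links -- dict mapping each page to the pages it links to
    """
    expanded = set(root)
    for src, targets in links.items():
        for dst in targets:
            if src in root:
                # Forward direction: pages that a root page links to.
                expanded.add(dst)
            if dst in root:
                # Backward direction: pages that point at a root page.
                expanded.add(src)
    return expanded


# Hypothetical link graph and a one-page root set for a query about cars.
links = {
    "car-hub": ["car-site"],
    "car-site": ["dealer"],
    "unrelated": ["other"],
}
print(sorted(expand_root_set({"car-site"}, links)))
```

      Because the expanded set depends on the query, any hub and authority scores computed over it have to be worked out per search, which matches the article's point that query-independent rankings respond faster.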

      Of course, Google has tweaked their method since this article was written; however, it has not become Clever.

  7. Re:Oracle of Bacon by swillden · · Score: 3, Informative

    Here are the top 1000. Number 1 is Christopher Lee (Saruman in FotR), probably largely because he's been in 228 films.

    --
    Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.