Slashdot Mirror


Google Patents Search Algorithm

blastedtokyo writes "Google gets the first web search patent. According to this News.com.com article, Google was able to patent how they crawl and rank web pages. They claim "an improved search engine that refines a document's relevance score based on interconnectivity of the document within a set of relevant documents.""

8 of 362 comments

  1. Mis-title by Amsterdam+Vallon · · Score: 4, Informative

    It's not really their Search algorithm, it's their method of comprehensive PageRanking.

    They basically measure Web pages as either 1) portals, or 2) authorities.

    Sites like Kuro5hin and *nix have a lot of "Google juice" (i.e. weight in their ranking system) because they have so many links to other sites, while also garnering a slew of links to their main page.

    --

    Reply or e-mail; don't vaguely moderate. Ex-O'Reilly/MIT employee, now a full-time Google employee.
    1. Re:Mis-title by MilTan · · Score: 5, Informative

      PageRank doesn't actually distinguish between "portals" and "authorities." It "only" does a link analysis of the web, essentially multiplying a ranking vector by a matrix that represents the links in the web, with a random jump to another location taking place with a certain probability, to produce a new ranking vector. This is repeated until the vector converges, and the result is the "PageRank."
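
      A minimal sketch of that iteration, assuming a damping factor of 0.85 (the chance of following a link rather than jumping) and that pages with no outlinks spread their rank evenly. Neither detail comes from this comment or the patent, and `pagerank` and `links` are illustrative names only:

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """links maps each page to the list of pages it links to.

    damping=0.85 and the dangling-page handling are assumptions
    made for this sketch, not details taken from the patent.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform ranking vector

    for _ in range(max_iter):
        # Random-jump share: every page gets an equal slice.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links.get(p, [])
            if out:
                # Pass this page's rank along its outgoing links.
                share = damping * rank[p] / len(out)
                for t in out:
                    new[t] += share
            else:
                # Assumed rule: a page with no outlinks spreads rank evenly.
                share = damping * rank[p] / n
                for t in pages:
                    new[t] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # the vector has stopped changing
            break
    return rank


if __name__ == "__main__":
    web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for page, score in sorted(pagerank(web).items()):
        print(page, round(score, 4))
```

      Note that the query never appears anywhere in the computation; the scores depend only on the link structure.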

      PageRank scores are calculated completely independently of the search query. You are probably thinking of Kleinberg's HITS (or Hubs and Authorities) algorithm, which uses an initial search query to prune the search space and then identifies hubs and authorities in the web. In contrast to PageRank, which only uses forward links to calculate its rankings, HITS uses both forward and "backward" links to figure out its ratings. Furthermore, unlike PageRank, HITS produces different scores for different queries.
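
      For comparison, a rough sketch of the hub/authority update in HITS. It assumes the query-based pruning has already happened, so `links` stands for the already-pruned subgraph; the function name, the fixed iteration count, and the Euclidean normalization are choices made for this sketch:

```python
import math


def hits(links, iterations=50):
    """links maps each page to the pages it links to (the pruned subgraph)."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

      Each page ends up with two scores here, and a different query would mean a different subgraph and therefore different scores.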

      The above tells us the following: Kuro5hin and Slashdot have high PageRanks not because of their excessive numbers of outlinks, but because many people point to their front pages. Similarly, these high PageRanks mean that the sites Slashdot or Kuro5hin point to get higher scores as well.

  2. Oh Please - Eugene Garfield did this in 1961 by tiltowait · · Score: 5, Informative

    Google didn't invent the concept behind PageRank, just its name. See my E2 writeup on citation analysis for more.

    1. Re:Oh Please - Eugene Garfield did this in 1961 by zmahk31 · · Score: 5, Informative

      In fact, the algorithm as a computational method goes back to Jacobi (1804-1851), and is essentially an iterative solver for large systems of linear equations.

      Of course, it's still a significant contribution to see the application of the Jacobi method to ranking web pages, and I assume that they have done some clever and many more dirty tricks to get more realistic results, weed out duplicate pages, etc., which may or may not be part of the patent.

      In any case, the basic PageRank algorithm is quite intuitive to anyone who has worked with iterative numerical methods, and is in fact a very nice illustration of the power of such methods.
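
      For anyone who hasn't seen it, a minimal sketch of a plain Jacobi iteration for solving Ax = b. It is a textbook version on a tiny dense system, not Google's code, and it assumes A is strictly diagonally dominant so that the iteration converges; the name `jacobi` and the tolerance are illustrative:

```python
def jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b iteratively; assumes A is strictly diagonally dominant."""
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(max_iter):
        # Each unknown is recomputed from the previous iterate only.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    # 4x + y = 6 and x + 3y = 7 have the exact solution x = 1, y = 2.
    print(jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0]))
```

      Every step only needs the previous vector and one pass over the matrix, which is the property that makes this style of method practical for very large sparse systems.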

  3. Software patents by killmenow · · Score: 4, Informative

    I find it interesting that because it's Google, some /.-ers are saying essentially "good for them!" But at the heart of it, it makes no difference who it is or what their intention is.

    Kids, software patents are bad, mm-kay...

  4. Not necessarily... by TopShelf · · Score: 4, Informative
    Patents are also widely used as a means of rewarding an inventor by giving them an avenue to license their technology to one or many users who can then implement it into commercial products. In that way you don't get a monopoly, nor does the inventor have to provide the capital required to bring something to market. You only get a monopoly if the patent holder refuses to sell licenses, or sells it to a single user.

    Think fuel injectors, for example, which are made by several suppliers, but have a patent holder who gets license revenue.

    --
    Stop by my site where I write about ERP systems & more
  5. Patent # 6,526,440 by esme · · Score: 4, Informative
  6. Re:OMG MORE PATENTS!!! by iocat · · Score: 4, Informative
    Now that they've been awarded a patent for page-rank, it's required for them to make it public so that people can license it. You can't patent a trade secret and still have it be secret. People now have the opportunity to build new methods and innovate with Pagerank as a basis for that innovation. (Real innovation, not MS innovation.)
    Actually, they are required to disclose it, but not to license it. The patent gives them a 17-year legal monopoly to do whatever they want with it (use it, license it, bury it, etc.). As an example, Capri Sun never licensed their patented "juice bag" technology, forcing others to use inferior "drink boxes" to deliver product. Now that the patent has expired, other "drink bags" are on the market.

    More worrying is that software patents are sometimes granted using such general language that the entity getting the patent *doesn't* really have to disclose anything, enabling them to get protection while keeping their invention secret, which is exactly the opposite of what patents were intended for -- to get duplicable knowledge into the public domain after a period of protection for the original inventor.

    --

    Dude, I think I can see my house from here.