Slashdot Mirror


Google Patents Search Algorithm

blastedtokyo writes "Google gets the first web search patent. According to this News.com.com article, Google was able to patent how they crawl and rank web pages. They claim "an improved search engine that refines a document's relevance score based on interconnectivity of the document within a set of relevant documents.""

18 of 362 comments

  1. Good for them... by DCowern · · Score: 5, Interesting

    They thought of a way to improve upon an existing invention. They were the first to do it. They want to make money from their idea. It's only logical for them to seek a patent. I guess congratulations are in order!

    1. Re:Good for them... by Ed Avis · · Score: 5, Insightful

      It's not a question of whether Google is 'good' or 'evil'. It's a question of whether the patent office was right to grant this patent, and whether a patent system that includes software is of greater economic benefit to society than one that does not.

      You can ask: if patents on computer programs were not available, would Google have developed their idea anyway?

      --
      -- Ed Avis ed@membled.com
    2. Re:Good for them... by Qzukk · · Score: 5, Interesting

      The problem with the patent system today (in my view) isn't whether or not software should be patented, it is the complete crap that comes out of the PTO. They have no incentive to do things right, no disincentive against doing things wrong, so they just grant patents on crap.

      What needs to be done is first: fix the backlog so that it no longer takes years to grant a patent. Open new patent courts, hire more people, charge more for the patents, whatever it takes.

      Second: The USPTO needs to be responsible for its output. If a patent is overturned as invalid, court fees and the original cost of filing the patent should come out of its pocket.

      Third: Mandatory implementation. For a patent to be valid it must be implemented by someone, somewhere. If the holder of the patent is unable to implement the patent, it must accept at least one request to license the patent for implementation under generally accepted RAND practices. This prevents people from patenting something with no intention of ever producing that invention, for the purpose of preventing that invention from being produced.

      Fourth: establish a class of patents specifically for software. Things not in this class of patents cannot be used to claim software infringes. Turnaround must be quick, and part of the patent process should be "does someone else sell this product already" which should be relatively easy, compared to looking through millions of patents.

      Software patents will expire after two years, renewable once (at the price of the original patent) for another year. This better matches the reality of software development. It provides people a head start without granting them an essentially perpetual monopoly.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
  2. Patents creating artificial monopolies by taumeson · · Score: 5, Insightful

    Patents are a tool for creating temporary, artificial monopolies.

    With that said, aren't you glad Google might be able to stay on top and profitable, instead of having to resort to banner ad revenue, etc?

  3. Oh Please - Eugene Garfield did this in 1961 by tiltowait · · Score: 5, Informative

    Google didn't invent the concept behind PageRank, just its name. See my E2 writeup on citation analysis for more.

    1. Re:Oh Please - Eugene Garfield did this in 1961 by zmahk31 · · Score: 5, Informative

      In fact, the algorithm as a computational method goes back to Jacobi (1804-1851), and is essentially an iterative solver for large systems of linear equations.

      Of course, it's still a significant contribution to see the application of the Jacobi method to ranking web pages, and I assume that they have done some clever and many more dirty tricks to get more realistic results, weed out duplicate pages, etc., which may or may not be part of the patent.

      In any case, the basic PageRank algorithm is quite intuitive to anyone who has worked with iterative numerical methods, and is in fact a very nice illustration of the power of such methods.
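
For readers who have not met the iterative solver mentioned in the comment above, here is a minimal sketch of Jacobi iteration on a tiny linear system. The function name `jacobi`, the tolerance, and the 2x2 example are assumptions chosen for illustration; nothing here is taken from Google's patent or code.

```python
def jacobi(a, b, iterations=100, tol=1e-10):
    """Approximate the solution of a x = b by Jacobi iteration.

    Convergence is guaranteed for strictly diagonally dominant matrices,
    which the example below is; other matrices may diverge.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        new_x = []
        for i in range(n):
            # Use only values from the previous sweep (this is what
            # distinguishes Jacobi from Gauss-Seidel).
            off_diag = sum(a[i][j] * x[j] for j in range(n) if j != i)
            new_x.append((b[i] - off_diag) / a[i][i])
        if max(abs(new_x[i] - x[i]) for i in range(n)) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical system: 4x + y = 9 and 2x + 5y = 9, whose exact solution is x = 2, y = 1.
a = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 9.0]
solution = jacobi(a, b)
```

Each sweep recomputes every unknown from the previous estimate of the others, which is why the method scales to very large, sparse systems.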

  4. Not that outrageous by Nikk Name · · Score: 5, Insightful
    Compared to other "patent the fork" icon'ed items, this one is not that outrageous.

    Google's way of doing things was certainly not the first way to search, it is not the most obvious way to search, it is not the only way to search, and it might not be the best way to search (something better likely will come along). In other words, I don't think this patent will harass many others at all.

    This is nothing near as bad as Amazon patenting message boards attached to sale items, or "one-click shopping" being patented.

  5. Hmm.. by Astin · · Score: 5, Insightful

    Wow.. an internet patent that might actually make sense. It's not "A method to search through an index of web pages for relevant links to a user request for specific information." But the improvement on it. And it's generally accepted that Google DID improve web searching tremendously and have a unique method of doing it. Of course, this means it will be struck down immediately by some small company that gets a broader patent (see above) and sues them.

    --
    - In hell, treason is the work of angels.
  6. Re:watch out by DCowern · · Score: 5, Insightful

    What's wrong with what Google is doing? They're simply trying to keep an "edge" on the market. The reason why they're the best search engine out there is because they figured out how to make a better way to rank pages. They deserve to reap the benefits of that invention without anyone else cutting in on their business.

    As for the "googling" incident, I just think they're attempting to defend their trademark. If you don't do that kind of stuff, you can lose your trademark. That's why Kleenex and Xerox fight this kind of thing (everyone says "may I have a kleenex?" or "could you xerox this?", and once a name becomes colloquial it can stop being a trademark).

    All Google is trying to do is cover their ass. If they decide one day to try to patent the search engine, then there'll be reason to get up in arms.

  7. Good for them... by theGreater · · Score: 5, Insightful

    ...because they're Google. But if it were Microsoft patenting "an improved method for giving help to users", say maybe the help files vs. man pages, people would flame about prior art, talk endlessly out of their anuses about how Bill Gates is trying to wrest control of the tinfoil hat co-op from Mac users, and generally be nuisances.

    I love /.ing while in class, but honestly, people. Google gives a C&D letter, we all golf clap and say "way to defend your IP!" Someone else does it, and we all run to chillingeffects to boycott / whine / gripe / whatever.

    Here's a thought... get off your hobbyhorse, and start evaluating things based on FACTS, not the general feeling of techno-elitism you get from pretending you're cool because you get jokes written in PERL.

    And mod me -5 Troll, if you want. But it's the damned truth, and you know it.

    -theGreater.

  8. Re:Mis-title by MilTan · · Score: 5, Informative

    PageRank doesn't actually distinguish between "portals" and "authorities." It "only" does a link-analysis of the web by essentially multiplying some ranking vector by a matrix representing the links in the web, with a random jump to another location taking place with a certain probability to create a new ranking vector. Once this converges, you have the new "PageRank."

    PageRank scores are calculated completely independently of the search query. You are probably thinking of Kleinberg's HITS (or Hubs and Authorities) algorithm, which uses an initial search query to prune the search space, and then identifies hubs and authorities in the web. In contrast to PageRank, which only uses forward links to calculate its rankings, HITS uses both forward and "backward" links to figure out its ratings. Furthermore, unlike PageRank, HITS produces different scores for different queries.

    The above tells us the following: That Kuro5hin and Slashdot have high pageranks not because of their excessive numbers of outlinks, but because many people point to their frontpages. Similarly, these high PageRanks mean that people that Slashdot or Kuro5hin point to get higher scores as well.
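
The rank-vector-times-link-matrix description in the comment above can be sketched as a short power iteration. This is an illustrative toy under stated assumptions, not Google's implementation: the function name `pagerank`, the damping value 0.85, the handling of pages without outlinks, and the page names in the example graph are all made up for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Toy PageRank by power iteration.

    links maps each page to the list of pages it links to. With probability
    (1 - damping) the random surfer jumps to a uniformly chosen page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {page: (1.0 - damping) / n + damping * dangling / n for page in pages}
        for page, targets in links.items():
            for target in targets:
                # Each outlink passes on an equal share of the linking page's rank.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: three blogs link to a front page, which links to one story.
links = {
    "link_list": ["blog_a", "blog_b", "blog_c"],
    "blog_a": ["frontpage"],
    "blog_b": ["frontpage"],
    "blog_c": ["frontpage"],
    "frontpage": ["featured_story"],
    "featured_story": ["frontpage"],
}
ranks = pagerank(links)
```

In this toy graph `link_list` has the most outlinks but the lowest score, while `featured_story` scores well only because the heavily linked `frontpage` points to it, which is the effect described for Slashdot and Kuro5hin.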

  9. Prevent the spread of a deficient technology. by expro · · Score: 5, Interesting

    So, the bright side of this patent is that perhaps it will keep others from focusing on Google's obsession -- the reference popularity contest. But like any patent, it is subject to abuse, not that we know at all how Google intends to enforce it.


    I have requested improvements to Google's algorithms for years to make it more possible to search for a specific thing, rather than just a popular thing, but they don't have engineers, apparently, who understand these basic needs.


    AltaVista lets you wildcard, search for one word NEAR another word, use common words as part of a phrase, and construct a variety of very useful filters that are impossible with Google's popularity engine.


    AltaVista used to be the best out there, but compromised their own usefulness. If AV indexed more pages and had not dropped their usenet coverage, it would still be the most useful engine by far to an advanced searcher -- one looking for very specific things. I still go there often. Just because the masses use Google does not make it quality or best for advanced users. They have stagnated for years now. The masses use a lot of things produced by monopolists who are no longer required to innovate or even improve to the level of the competition.


  10. Re:OMG MORE PATENTS!!! by Bonker · · Score: 5, Insightful

    While I personally think that patents are repugnant, Google has fallen down on the 'just' side of using the patent laws the way they were intended to be used. They're not trying to bilk people out of vast sums of money à la British Telecom's hyperlink patent or Amazon's 1-click buy patent. They have a unique process that they've carefully guarded and have built a business around.

    Now that they've been awarded a patent for page-rank, it's required for them to make it public so that people can license it. You can't patent a trade secret and still have it be secret. People now have the opportunity to build new methods and innovate with Pagerank as a basis for that innovation. (Real innovation, not MS innovation.)

    Again, I think that patents are a misstep. I think they allow too many Amazon and BT events to happen. Despite the fact that the patent system is horribly broken, Google is using patent laws responsibly here. Wait until they announce a patent on 'all search technology that lists search results on a web page' or something like that. *Then* you can start complaining about how broken the patent system is.

    --
    The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!
  11. This patent is for NewRank/LocalRank, not PageRank by registro · · Score: 5, Insightful

    That is not the patent for PageRank.
    PageRank had already been patented by Stanford University, just before Google was created, when it was a community effort.
    This new patent is a patent over an improvement of PageRank, what they now call "LocalRank" and "NewRank". It is designed to stop competitors from developing PageRank-like technologies. Armed with that kind of patent, they can stop open-source Aspseek, Teoma and others from developing similar technologies.
    What they are trying to do is extend patents over citation ranking and peer review, something that has been around since the creation of the first libraries. This is NOT good.

    Basically, this means no more money from the suits for any citation-ranking related effort in any start-up, for fear of litigation. It could also mean no more installations of open-source Aspseek (Google Appliance's competitor) in corporate environments, because of fear of litigation.

    This is sad.

  12. Re:Mis-title by johnnyb · · Score: 5, Interesting

    What I found particularly cool about their algorithm was that they can return results for pages that google has not read. If there is a link to a page google has not read on a page that google is reading, it can still return results to the unread page based on the context of the link, and the popularity of the link on other pages. Really nifty stuff.

  13. Re:OMG MORE PATENTS!!! by ergo98 · · Score: 5, Insightful

    Now that they've been awarded a patent for page-rank, it's required for them to make it public so that people can license it

    I had made this mistake before, confusing trade groups with patents. AFAIK patents do not force you to license it whatsoever. Instead they can be used to hunt down anyone who intrudes into your patent and sue them out of existence.

    In any case this isn't about PageRank, but is about a revised search technique: In a nutshell it is PageRank by resultset -> i.e. Say you searched for "Scooby Doo" : It gets the result set of Scooby Doo hits, and THEN it derives a pagerank within that set of Scooby Doo hits (versus the basic PageRank which derives the ranking for the whole net). It's funny because I had actually investigated the initial steps of a patent several years ago for something which I called a "combined corpus" (which in a similar light groups items by topic of discussion-i.e. a page on Crickets would get a good score for cricket searches by being referenced by lots of Cricket pages : It wouldn't benefit them to put a nude picture of Britney Spears to get a lot of links and boost their generic pagerank) because of the general ridiculousness of something like the basic PageRank, but I knew that against a giant machine like Google I wouldn't stand a chance so I just forgot about it (which is the problem with patents: How many people think of a great idea but then let it rot because of the almost certain patent overlaps). I've had that same thought process with a lot of, in my opinion, great ideas.

    People now have the opportunity to build new methods and innovate with Pagerank as a basis for that innovation. (Real innovation, not MS innovation.)

    Ooooh, nice use of the obligatory MS slam for mod points (ignore the fact that MS has been a fantastic patent citizen and has never, to my knowledge, enforced dubious patents). In any case how is it "innovation" for others to now use something existing (if Google allowed it)? Sounds like counter-innovation because everyone who might possibly overlap with this patent will now just dump the project lest they cross paths with Google.
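
The "PageRank by result set" idea described in the comment above (rank the hits for a query only by links coming from other hits) can be sketched roughly as follows. To stay short, this sketch swaps in a plain count of in-set inbound links for a full PageRank computation; the function name `local_rerank`, the substring matching, and the sample pages are assumptions for illustration and are not taken from the patent.

```python
def local_rerank(query, documents, links):
    """Order the hits for a query by how many *other hits* link to them."""
    term = query.lower()
    # Step 1: the result set is every document whose text mentions the query.
    hits = {doc for doc, text in documents.items() if term in text.lower()}
    scores = {doc: 0 for doc in hits}
    # Step 2: count only links whose source and target are both in the result set.
    for source, targets in links.items():
        if source not in hits:
            continue  # links from off-topic pages do not count
        for target in set(targets):
            if target in hits and target != source:
                scores[target] += 1
    # Highest local score first; ties are broken alphabetically.
    return sorted(hits, key=lambda doc: (-scores[doc], doc))


# Hypothetical corpus: one cricket page is popular only with off-topic pages.
documents = {
    "cricket_rules": "The laws of cricket explained",
    "cricket_club": "Our local cricket club",
    "cricket_news": "Cricket news and scores",
    "cricket_spam": "Cricket! Also celebrity photos",
    "celebrity_a": "Celebrity photos daily",
    "celebrity_b": "More celebrity gossip",
}
links = {
    "cricket_club": ["cricket_rules"],
    "cricket_news": ["cricket_rules", "cricket_club"],
    "celebrity_a": ["cricket_spam"],
    "celebrity_b": ["cricket_spam"],
}
ordering = local_rerank("cricket", documents, links)
```

Here `cricket_spam` has as many inbound links as `cricket_rules` overall, but none of them come from other cricket pages, so it gains nothing in the per-query ranking.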

  14. Re:OMG MORE PATENTS!!! by Com2Kid · · Score: 5, Insightful
    • What G**gle is doing is basically quantifying word of mouth.


    Which is a pretty impressive process. Making a set of mathematical formulas out of an otherwise very much fluid and ethereal concept. Not half bad.

    • Everyone knows the best restaurant in town is the one that everyone talks about.


    Oh? I think it is the one that everybody with a good sense of taste talks about? What is a good sense of taste? Welllll, now we are getting down to the nitty gritty. What defines a "trustable" website?

    • We "link" to the restaurant of our choice. Someone new to the office, and town, is looking to go to the best restaurant there is. They ask around. 5 people say it is the Puerto Vallarta Mexican restaurant, 2 say it is White River Landing [muncienews.com], and 8 others say it is Vince's [muncienews.com]. Vince's it is.


    You must define the weight: if persons A and B say Vince's but they both say that person C has "better taste" in Mexican food, and person C says Puerto Vallarta, and enough of that goes on, then the decision based upon the results can be changed significantly.

    • Simple word-of-mouth ranking right there.


    Yes, sounds like a good alpha-level project. :)

    • If the new person throws in variables like, "I don't eat Mexican." That skews the results. Nothing fancy about this.


    True, but what if restaurant X has a style of Mexican that is mixed with, say, Soul Food, and the person REALLY loves Soul Food. Then what? Life gets confusing. :)


    • In fact, I bet a few hours of research into Sociology, Psychology, and Linguistics papers will turn up generic proofs and observations of the very same things that page rank takes care of in a different context. A context shift shouldn't be patentable.


    Oh? If it is so obvious, why did search engines for so long, well, heh, suck? I remember using insanely complicated regexps with those "other" search engines to do what are now trivial searches on Google.
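
The difference between the plain head count and the weighted version argued over in the comment above can be shown with a small sketch. The vote counts (5, 2 and 8) come from the thread; the voter names, the endorsement rule (one extra unit of weight per endorsement received), and the function names are assumptions invented for the example.

```python
from collections import Counter


def trust_weights(votes, endorsements):
    """Give every voter weight 1, plus 1 for each other voter who trusts their taste."""
    weights = {person: 1.0 for person in votes}
    for endorser, trusted in endorsements.items():
        if trusted in weights and trusted != endorser:
            weights[trusted] += 1.0
    return weights


def tally(votes, weights=None):
    """Return the winning choice; without weights every vote counts as 1."""
    totals = Counter()
    for person, choice in votes.items():
        totals[choice] += 1.0 if weights is None else weights.get(person, 1.0)
    return totals.most_common(1)[0][0]


# Vote counts from the thread: 5 for Puerto Vallarta, 2 for White River Landing, 8 for Vince's.
votes = {}
for i in range(5):
    votes[f"pv_fan_{i}"] = "Puerto Vallarta"
for i in range(2):
    votes[f"wrl_fan_{i}"] = "White River Landing"
for i in range(8):
    votes[f"vince_fan_{i}"] = "Vince's"

# Hypothetical: four Vince's fans say pv_fan_0 has better taste than they do.
endorsements = {f"vince_fan_{i}": "pv_fan_0" for i in range(4)}

plain_winner = tally(votes)
weighted_winner = tally(votes, trust_weights(votes, endorsements))
```

With a plain count Vince's wins 8 to 5, but once the four endorsements raise one Puerto Vallarta voter's weight to 5, Puerto Vallarta wins 9 to 8. PageRank goes a step further by making each endorsement worth more when the endorser is itself well regarded.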

  15. Re:OMG MORE PATENTS!!! by Thavius · · Score: 5, Insightful

    There's two ways patents can be used: as a sword, and as a shield.

    IBM holds many interesting patents. One that caused a former employer of mine to take notice is one that covered anything that used templates to generate HTML files. This patent basically covers almost all WYSIWYG HTML creation tools (we were in the middle of creating one when it was issued). I haven't seen any breaking stories on how IBM is beating down small companies with it, and our company didn't get served a C&D order because of it.

    It appears that IBM is using the patent as a shield, to protect themselves against another company saying, "I invented that, give me money." It will protect them from being the target of an infringement suit.

    Other companies, such as BT, and Amazon, and others, are using their patents as a sword to extort money out of companies. This is what I disagree with, because most often they target small companies first. They never seem to go after companies with resources, because they know their sword is not as sharp or strong as it could be.

    I'm not against patents as an idea, but patents of some tech innovations have been abused. The side-swinging patent, that guy will never try to enforce his patent, because it was for fun. But just like anything else, patents can be abused to the detriment of everyone.

    Google's patent can be used in two ways. Let's see how they use it.