Google Patents Search Algorithm
blastedtokyo writes "Google gets the first web search patent. According to this News.com.com article, Google was able to patent how they crawl and rank web pages. They claim "an improved search engine that refines a document's relevance score based on interconnectivity of the document within a set of relevant documents.""
They thought of a way to improve upon an existing invention. They were the first to do it. They want to make money from their idea. It's only logical for them to seek a patent. I guess congratulations are in order!
Patents are a tool for creating temporary, artificial monopolies.
With that said, aren't you glad Google might be able to stay on top and profitable, instead of having to resort to banner ad revenue, etc?
Google didn't invent the concept behind PageRank, just its name. See my E2 writeup on citation analysis for more.
Google's way of doing things was certainly not the first way to search, it is not the most obvious way to search, it is not the only way to search, and it might not be the best way to search (something better likely will come along). In other words, I don't think this patent will harass many others at all.
This is nothing near as bad as Amazon patenting message boards attached to sale items, or "one-click shopping" being patented.
Wow.. an internet patent that might actually make sense. It's not "A method to search through an index of web pages for relevant links to a user request for specific information." But the improvement on it. And it's generally accepted that Google DID improve web searching tremendously and have a unique method of doing it. Of course, this means it will be struck down immediately by some small company that gets a broader patent (see above) and sues them.
What's wrong with what Google is doing? They're simply trying to keep an "edge" on the market. The reason why they're the best search engine out there is because they figured out how to make a better way to rank pages. They deserve to reap the benefits of that invention without anyone else cutting in on their business.
As for the "googling" incident, I just think they're attempting to defend their trademark. If you don't do that kind of stuff, you lose your trademark. Kinda like how Kleenex and Xerox lost theirs (everyone says "may I have a kleenex?" or "could you xerox this?" and so it became colloquial and no longer a trademark).
All Google is trying to do is cover their ass. If they decide one day to try to patent the search engine, then there'll be reason to get up in arms.
...because they're Google. But if it were Microsoft patenting "an improved method for giving help to users", say maybe the help files vs. man pages, people would flame about prior art, talk endlessly out of their anuses about how Bill Gates is trying to wrest control of the tinfoil hat co-op from Mac users, and generally be nuisances.
/.ing while in class, but honestly, people. Google gives a C&D letter, we all golf clap and say "way to defend your IP!" Someone else does it, and we all run to chillingeffects to boycott / whine / gripe / whatever.
I love
Here's a thought... get off your hobbyhorse, and start evaluating things based on FACTS, not the general feeling of techno-elitism you get from pretending you're cool because you get jokes written in PERL.
And mod me -5 Troll, if you want. But it's the damned truth, and you know it.
-theGreater.
PageRank doesn't actually distinguish between "portals" and "authorities." It "only" does a link analysis of the web by essentially multiplying some ranking vector by a matrix representing the links in the web, with a random jump to another location taking place with a certain probability, to create a new ranking vector. Once this converges, you have the new "PageRank."
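For anyone who wants to see the iteration described above, here is a minimal sketch in Python. It is not Google's implementation; the function name `pagerank`, the damping value of 0.85, and the choice to spread the rank of pages with no outlinks evenly over all pages are my own assumptions for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration over a link graph.

    links: dict mapping each page to the list of pages it links to.
    damping: probability of following a link instead of jumping at random
             (0.85 is an assumed, commonly quoted value).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform ranking vector

    for _ in range(max_iter):
        # Assumption: rank sitting on pages with no outlinks is shared by everyone.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump share plus the dangling share, identical for every page.
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank in equal parts along its outgoing links.
        for src, targets in links.items():
            for t in targets:
                new[t] += damping * rank[src] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # the ranking vector has stopped changing
            break
    return rank


toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Note that no search query appears anywhere in the function: the scores depend only on the link structure.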
PageRank scores are calculated completely independently of the search query. You are probably thinking of Kleinberg's HITS (or Hubs and Authorities) algorithm, which uses an initial search query to prune the search space, and then identifies hubs and authorities in the web. In contrast to PageRank, which only uses forward links to calculate its rankings, HITS uses both forward and "backward" links to figure out its ratings. Furthermore, unlike PageRank, HITS produces different scores for different queries.
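A rough sketch of the hubs-and-authorities iteration, for comparison. This assumes the graph passed in has already been pruned to pages related to the query (that pruning step is not shown), and the function name `hits`, the fixed iteration count, and the normalization are my own choices, not taken from Kleinberg's paper verbatim.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a query-specific link graph.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update uses "backward" links: who points at this page?
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub update uses forward links: how good are the pages this one points to?
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Rescale both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


query_graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(query_graph)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

Because the input graph is built per query, running this on a different query's graph gives different scores, which is the contrast with PageRank drawn above.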
The above tells us the following: That Kuro5hin and Slashdot have high pageranks not because of their excessive numbers of outlinks, but because many people point to their frontpages. Similarly, these high PageRanks mean that people that Slashdot or Kuro5hin point to get higher scores as well.
So, the bright side of this patent is that perhaps it will keep others from focusing on Google's obsession -- the reference popularity contest. But like any patent, it is subject to abuse, not that we know at all how Google intends to enforce it.
I have requested improvements to Google's algorithms for years to make it more possible to search for a specific thing, rather than just a popular thing, but they don't have engineers, apparently, who understand these basic needs.
AltaVista lets you wildcard, search for one word NEAR another word, use common words as part of a phrase, and construct a variety of very useful filters that are impossible with Google's popularity engine.
AltaVista used to be the best out there, but compromised their own usefulness. If AV indexed more pages and had not dropped their usenet coverage, it would still be the most useful engine by far to an advanced searcher -- one looking for very specific things. I still go there often. Just because the masses use Google does not make it quality or best for advanced users. They have stagnated for years now. The masses use a lot of things produced by monopolists who are no longer required to innovate or even improve to the level of the competition.
While I personally think that patents are repugnant, Google has fallen down on the 'just' side of using the patent laws the way they were intended to be used. They're not trying to bilk people out of vast sums of money à la British Telecom's hyperlink patent or Amazon's 1-click buy patent. They have a unique process that they've carefully guarded and have built a business around.
Now that they've been awarded a patent for page-rank, it's required for them to make it public so that people can license it. You can't patent a trade secret and still have it be secret. People now have the opportunity to build new methods and innovate with Pagerank as a basis for that innovation. (Real innovation, not MS innovation.)
Again, I think that patents are a misstep. I think they allow too many Amazon and BT events to happen. Despite the fact that the patent system is horribly broken, Google is using patent laws responsibly here. Wait until they announce a patent on 'all search technology that lists search results on a web page' or something like that. *Then* you can start complaining about how broken the patent system is.
That is not the patent for PageRank.
PageRank had already been patented by Stanford University, just before Google was created, when it was a community effort.
This new patent is a patent over an improvement of PageRank, what they now call "LocalRank" and "NewRank". It is designed to stop competitors from developing PageRank-like technologies. Armed with that kind of patent, they can stop open-source Aspseek, Teoma and others from developing similar technologies.
What they are trying to do is extend patents over citation ranking and peer review, something that has been around since the creation of the first libraries. This is NOT good.
Basically, this means no more money from the suits for any citation-ranking related effort in any start-up, for fear of litigation. It could also mean no more installations of open-source Aspseek (the Google Appliance's competitor) in corporate environments, because of fear of litigation.
This is sad.
What I found particularly cool about their algorithm was that they can return results for pages that google has not read. If there is a link to a page google has not read on a page that google is reading, it can still return results to the unread page based on the context of the link, and the popularity of the link on other pages. Really nifty stuff.
Now that they've been awarded a patent for page-rank, it's required for them to make it public so that people can license it
I had made this mistake before, confusing trade groups with patents. AFAIK patents do not force you to license anything whatsoever. Instead they can be used to hunt down anyone who intrudes on your patent and sue them out of existence.
In any case this isn't about PageRank, but is about a revised search technique: In a nutshell it is PageRank by resultset -> i.e. Say you searched for "Scooby Doo" : It gets the result set of Scooby Doo hits, and THEN it derives a pagerank within that set of Scooby Doo hits (versus the basic PageRank which derives the ranking for the whole net). It's funny because I had actually investigated the initial steps of a patent several years ago for something which I called a "combined corpus" (which in a similar light groups items by topic of discussion-i.e. a page on Crickets would get a good score for cricket searches by being referenced by lots of Cricket pages : It wouldn't benefit them to put a nude picture of Britney Spears to get a lot of links and boost their generic pagerank) because of the general ridiculousness of something like the basic PageRank, but I knew that against a giant machine like Google I wouldn't stand a chance so I just forgot about it (which is the problem with patents: How many people think of a great idea but then let it rot because of the almost certain patent overlaps). I've had that same thought process with a lot of, in my opinion, great ideas.
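To make the "PageRank by result set" idea concrete, here is a small sketch. It follows the description in this comment, not the claims in the patent text, which I have not reproduced here. The function names `restrict_to_results` and `rank_within_results`, the damping value, and the toy page names are all made up for illustration.

```python
def restrict_to_results(links, results):
    """Keep only pages in the result set and the links between them."""
    keep = set(results)
    return {src: [t for t in links.get(src, []) if t in keep] for src in keep}


def rank_within_results(links, results, damping=0.85, iterations=100):
    """Order the result set by a PageRank-style score computed inside it."""
    sub = restrict_to_results(links, results)
    pages = sorted(sub)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Assumption: pages with no in-set outlinks share their rank evenly.
        dangling = sum(rank[p] for p in pages if not sub[p])
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for src in pages:
            for t in sub[src]:
                new[t] += damping * rank[src] / len(sub[src])
        rank = new
    return sorted(pages, key=rank.get, reverse=True)


web = {
    "cricket_a": ["cricket_hub"],
    "cricket_b": ["cricket_hub"],
    "cricket_hub": ["cricket_a"],
    "celebrity_1": ["spam"],
    "celebrity_2": ["spam"],
    "celebrity_3": ["spam"],
    "spam": [],
}
# Pretend these four pages matched a search for "cricket".
hits_for_query = ["cricket_a", "cricket_b", "cricket_hub", "spam"]
print(rank_within_results(web, hits_for_query))
```

In the toy data the "spam" page has the most inbound links on the whole web, but none of them come from pages in the result set, so they are dropped before ranking and it no longer floats to the top.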
People now have the opportunity to build new methods and innovate with Pagerank as a basis for that innovation. (Real innovation, not MS innovation.)
Ooooh, nice use of the obligatory MS slam for mod points (ignore the fact that MS has been a fantastic patent citizen and has never, to my knowledge, enforced dubious patents). In any case how is it "innovation" for others to now use something existing (if Google allowed it)? Sounds like counter-innovation because everyone who might possibly overlap with this patent will now just dump the project lest they cross paths with Google.
Which is a pretty impressive process. Making a set of mathematical formulas out of an otherwise very much fluid and ethereal concept. Not half bad.
Oh? I think it is the one that everybody with a good sense of taste talks about? What is a good sense of taste? Welllll, now we are getting down to the nitty gritty. What defines a "trustable" website?
You must define the weight. If persons A and B say Vinces, but they both say that person C has "better taste" in Mexican food, and person C says Puerto Vallarta, and enough of that goes on, then the decision based upon the results can be changed significantly.
Yes, sounds like a good alpha-level project.
True, but what if restaurant X has a style of Mexican that is mixed with, say, Soul Food, and the person REALLY loves Soul Food. Then what? Life gets confusing.
In fact, I bet a few hours of research into Sociology, Psychology, and Linguistics papers will turn up generic proofs and observations of the very same things that PageRank takes care of in a different context. A context shift shouldn't be patentable.
Oh? If it is so obvious, why did search engines for so long, well, heh, suck? I remember using insanely complicated regexps with those "other" search engines to do what are now trivial searches on Google.
There's two ways patents can be used: as a sword, and as a shield.
IBM holds many interesting patents. One that caused a former employer of mine to take notice is one that covered anything that used templates to generate HTML files. This patent basically covers almost all WYSIWYG HTML creation tools (we were in the middle of creating one when it was issued). I haven't seen any breaking stories on how IBM is beating down small companies with it, and our company didn't get served a C&D order because of it.
It appears that IBM is using the patent as a shield, to protect themselves against another company saying, "I invented that, give me money." It will protect them from being the target of an infringement suit.
Other companies, such as BT, Amazon, and others, are using their patents as a sword to extort money out of companies. This is what I disagree with, because most often they target small companies first. They never seem to go after companies with resources, because they know their sword is not as sharp or strong as it could be.
I'm not against patents as an idea, but patents on some tech innovations have been abused. Take the side-swinging patent: that guy will never try to enforce his patent, because it was for fun. But just like anything else, patents can be abused to the detriment of everyone.
Google's patent can be used in two ways. Let's see how they use it.