Google Patents Search Algorithm
blastedtokyo writes "Google gets the first web search patent. According to this News.com.com article, Google was able to patent how they crawl and rank web pages. They claim "an improved search engine that refines a document's relevance score based on interconnectivity of the document within a set of relevant documents.""
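For anyone wondering what "refines a document's relevance score based on interconnectivity" could look like in practice, here is a minimal sketch. It is only one reading of the quoted claim, not the method in the patent: the function name `rerank`, the `weight` parameter, and the linear blend of the two scores are all assumptions made for illustration.

```python
def rerank(initial_scores, links, weight=0.5):
    """Re-rank an already relevant set of documents by how they link to each other.

    initial_scores: dict mapping document id -> initial relevance score (0..1).
    links: iterable of (source, target) pairs.
    weight: hypothetical blend factor between the text score and the link score.
    """
    relevant = set(initial_scores)

    # Collect, for each document, the distinct documents *inside the set*
    # that link to it. Self-links and links from outside the set are ignored.
    inbound = {doc: set() for doc in relevant}
    for source, target in links:
        if source in relevant and target in relevant and source != target:
            inbound[target].add(source)

    # Normalise the inbound counts to 0..1 (avoid dividing by zero).
    max_in = max((len(s) for s in inbound.values()), default=0) or 1

    refined = {
        doc: (1 - weight) * score + weight * len(inbound[doc]) / max_in
        for doc, score in initial_scores.items()
    }
    # Highest refined score first.
    return sorted(refined.items(), key=lambda item: item[1], reverse=True)
```

With scores `{"a": 0.9, "b": 0.8, "c": 0.7}` and links `a -> c` and `b -> c`, document `c` moves to the top because the other relevant documents point at it.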
They thought of a way to improve upon an existing invention. They were the first to do it. They want to make money from their idea. It's only logical for them to seek a patent. I guess congratulations are in order!
Does this mean that with their algorithm now publicly available, we're going to find more "googlebuster" sites finding ways to improve their rankings?
Now that they've patented their technology, surely that means that it's open to public scrutiny and therefore abuse as people exploit its shortcomings.
Like tinyurl, but one letter less! http://qurl.co.uk/
Do you realise that the Google search you link to shows your comment as the top result? It's a Google loop!
There's a reason I only ever use their search engine now. Well, two reasons. One is that about half the time I run searches there, what I'm looking for is the first thing on the list. The second is that they are very not obnoxious about their advertising. And I've probably clicked through more Google ads than any other banner ads on the net. That's right, I'm much more likely to follow information that looks like it pertains to what I'm looking for right now over some obnoxious JavaScript ad (which usually makes me turn JavaScript off and reload).
Very similar to Google's method, I've seen CNN and USA Today run ads disguised as news stories in their tech sections. Unlike Google, which clearly marks the ads, CNN and USA Today are simply compromising their journalistic integrity. As if those two words have been put together in a single sentence since Cronkite left the industry.
Where was I? Oh yes. In principle, software patents offend me. Well... and most of the rest of the Slashdot population apparently. Being able to patent something that doesn't have a physical presence (be it programs or math or business processes) is counter-productive. Especially since the patent office seems to rubber stamp every application that hits their desk. Hey. If you don't like it, write a civil nastygram to your congresscritter. Do NOT use the word "Fuck." That tends to turn them off. And in extreme cases, get you visits from very grumpy people who seem to have something against doors. We're starting to see some technologically clueful folks in office, so the more people who write, the higher the chance that someone in the know might get the message.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
So, the bright side of this patent is that perhaps it will keep others from focusing on Google's obsession -- the reference popularity contest. But like any patent, it is subject to abuse, not that we know at all how Google intends to enforce it.
I have requested improvements to Google's algorithms for years to make it more possible to search for a specific thing, rather than just a popular thing, but they don't have engineers, apparently, who understand these basic needs.
AltaVista lets you wildcard, search for one word NEAR another word, use common words as part of a phrase, and construct a variety of very useful filters that are impossible with Google's popularity engine.
AltaVista used to be the best out there, but compromised their own usefulness. If AV indexed more pages and had not dropped their usenet coverage, it would still be the most useful engine by far to an advanced searcher -- one looking for very specific things. I still go there often. Just because the masses use Google does not make it quality or best for advanced users. They have stagnated for years now. The masses use a lot of things produced by monopolists who are no longer required to innovate or even improve to the level of the competition.
What I found particularly cool about their algorithm was that they can return results for pages that Google has not read. If there is a link to a page Google has not read on a page that Google is reading, it can still return results for the unread page based on the context of the link, and the popularity of the link on other pages. Really nifty stuff.
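A rough sketch of how that can work: index the words in the link text under the URL being linked to, so a page can show up in results even though it was never fetched. This is an illustration under assumptions, not Google's implementation; `build_anchor_index`, `search`, and the simple count-based scoring are hypothetical names and choices.

```python
from collections import defaultdict


def build_anchor_index(crawled_pages):
    """Index link text under the *target* URL of each link.

    crawled_pages: dict mapping a crawled URL -> list of
                   (target_url, anchor_text) pairs found on that page.
    Returns: word -> {target_url: number of links using that word}.
    """
    index = defaultdict(lambda: defaultdict(int))
    for source_url, out_links in crawled_pages.items():
        for target_url, anchor_text in out_links:
            for word in anchor_text.lower().split():
                index[word][target_url] += 1
    return index


def search(index, query):
    """Return target URLs ordered by how many matching anchor words point at them."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Most-linked first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

Here a target URL only needs to appear in somebody else's links; it never has to be a key of `crawled_pages`, which is the "unread page" case described above.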
Engineering and the Ultimate
There seems to be a lack of understanding about the original purpose of the patent system. In the distant past, knowledge was transferred from artisan to apprentice and through guilds. Back then, as now, people were very protective of their intellectual property, as it was their livelihood, so it would not be stored anywhere. If the person were to die without passing on the knowledge, it would be lost forever (like Damascus steel).
To try and stop knowledge from being lost, governments introduced a patent system (the first English patent was recorded in 1449) so that the creator could disclose the knowledge and still get a fair financial reward for the item.
IMHO there are 2 problems with the existing patent system implementations.
1) As the technology becomes more complicated, those who verify patents are not skilled enough to accurately judge their validity.
2) The time limit of patents is too inflexible. Many technology patents should have valid lengths of 5-10 years.
My initial reaction was "it's ok, they are patenting a truly new idea". Then I read the patent. The patent specifies an algorithm. In one sense that's good: the patent is not wide open. It's not like they are patenting "web searches" or "database indexing"; they are patenting a very specific way of building indices and extracting information out of the database. On the other hand, this is bad. They are patenting an /algorithm/. They are not patenting a specific rendition of an idea, they are patenting the idea itself. This stops research. You can't do basic research that improves on the idea, because you run the risk of your modification not being /different enough/ (there are plenty of examples of algorithms where you add a little detail to make them work better on specific cases, or add details to make them more general, or variations like that). If you modify the algorithm like this and you publish your results, you /would be/ infringing on Google Inc.'s patent.
This is the reason why patents on algorithms are bad. Unlike patents on physical devices (think the common mouse trap, a cage with bait inside vs. a spring trap), with algorithms it's extremely difficult to draw the line. What would happen if someone tried to patent a mouse trap? The idea of baiting a mouse and then [???] and thereby trapping the mouse? It would be rejected. You can't just patent the intention of trapping a mouse. You have to be more specific than that. With algorithms, it might seem that you are being specific enough, when in fact you aren't.
Interestingly enough, Google's patent applies only to searches over the network (that said, I really can't read the language used in patent claims).