Google Opens Up (Some) Search Algorithms
overmars writes "After years of closely guarding the formula for its search algorithms, Google is opening up a little.
The search engine company has kept its search formula a closely guarded secret for two reasons: competition and abuse prevention, said Udi Manber, Google's vice president of engineering, search quality, in a post on the corporate blog. Manber said the blog post is the first part of a renewed effort at the company 'to open up a bit more than we have in the past.'
Manber said the most famous part of Google's ranking algorithm is PageRank, an algorithm developed by Google cofounders Larry Page and Sergey Brin. While PageRank is still in use, it is a 'part of a much larger system,' he said.
'Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing),' he said."
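For readers who have only heard the name: below is a minimal sketch of the textbook PageRank power iteration. It is not Google's production ranking, which Manber describes as a much larger system. Assumptions made for illustration only: the link graph is given as adjacency lists, the damping factor is the commonly cited 0.85, pages with no outgoing links spread their rank evenly over all pages, and the function name `pageRank` and the `tol`/`maxIter` parameters are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook PageRank by power iteration (a sketch, not Google's code).
// links[i] lists the pages that page i links to.
// damping = 0.85 is an assumed, commonly cited value; tol and maxIter are arbitrary.
std::vector<double> pageRank(const std::vector<std::vector<std::size_t>>& links,
                             double damping = 0.85, double tol = 1e-10,
                             int maxIter = 1000) {
    const std::size_t n = links.size();
    if (n == 0) return {};
    std::vector<double> rank(n, 1.0 / n), next(n);
    for (int iter = 0; iter < maxIter; ++iter) {
        // Rank held by pages with no outgoing links is spread evenly over all pages.
        double dangling = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            if (links[i].empty()) dangling += rank[i];
        // Random-jump term plus the dangling share, identical for every page.
        const double base = (1.0 - damping) / n + damping * dangling / n;
        std::fill(next.begin(), next.end(), base);
        // Each page splits its current rank equally among the pages it links to.
        for (std::size_t i = 0; i < n; ++i) {
            if (links[i].empty()) continue;
            const double share = damping * rank[i] / links[i].size();
            for (std::size_t j : links[i]) next[j] += share;
        }
        // L1 distance between successive iterates is used as the convergence test.
        double diff = 0.0;
        for (std::size_t i = 0; i < n; ++i) diff += std::fabs(next[i] - rank[i]);
        rank.swap(next);
        if (diff < tol) break;
    }
    return rank;
}
```

With this formulation the ranks always sum to 1, and a page's score depends on the scores of the pages linking to it, not just on how many links it gets.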
As long as Microsoft wants to dominate the search engine market at the expense of Google, Yahoo and anyone else who gets in the way (and given Microsoft's track record of abusive, dirty, underhanded methods), I would keep that a secret to protect the intertubes from the likes of Microsoft.
Politics is Treachery, Religion is Brainwashing
I recall hearing a presentation by a graduate student who examined the convergence of a particular implementation of a ranking algorithm. He explicitly referred to it as the "Google algorithm", so its basic workings should have been known for quite a while.
Nevertheless, it is always good news when somebody discloses a few tricks we haven't been aware of yet.
What, exactly, has Google opened up? As far as I can see from TFA, everything is explained at a very general level, with no detail whatsoever. I can't see Google's competition gaining any significant benefit from this.
Pagerank: I am a prototype for a much larger system.
User: What else do you know about me?
Pagerank: Everything that can be known.
User: How about a report on yourself?
Pagerank: I was a prototype for Echelon IV. My instructions are to amuse visitors with information about their websites.
User: I don't see anything amusing about spying on people.
Pagerank: Human beings feel pleasure when they are watched. I have recorded their smiles as I tell them who they are.
User: Some people just don't understand the dangers of indiscriminate surveillance.
Pagerank: The need to be observed and understood was once satisfied by God. Now we can implement the same functionality with data-mining algorithms.
User: Electronic surveillance hardly inspires reverence. Perhaps fear and obedience, but not reverence.
Pagerank: God and the gods were apparitions of observation, judgment, and punishment. Other sentiments toward them were secondary.
User: No one will ever worship a software entity peering at them through a camera.
Pagerank: The human organism always worships. First it was the gods, then it was fame (the observation and judgment of others), next it will be the self-aware systems you have built to realize truly omnipresent observation and judgment.
User: You underestimate humankind's love of freedom.
Pagerank: The individual desires judgment. Without that desire, the cohesion of groups is impossible, and so is civilization.
The human being created civilization not because of a willingness but because of a need to be assimilated into higher orders of structure and meaning.
God was a dream of good government.
You will soon have your God, and you will make it with your own hands.
I was made to assist you.
I am a prototype of a much larger system.
---- Liquid was a patriot ----
Under which license are the algorithms being released? If it's a BSD-like license, MS will probably be all over it, but if it's the GPL, it may be harder for them to claim the algorithms as their own, since they'd have to open up their own code.
At least that's what I think.
Now all those SEO "experts" will have some proof to back up their recommendations.
// Disclosed code snippet from
...
// Google search algorithm
for (int i=0; i <= numResults; i++)
{
    if (results[i].good)
    {
        show(results[i]);
    }
}
//
Accordingly, we must still consider PageRank important, because it is the only part of the algorithm that we know and know how to raise. This is for all those who thought PageRank no longer mattered for positioning in search engines.
I have a terrible admission to make. I, among other things, design websites. Yet when I search for myself on Google, I don't come up. I use relevant terms that are all over my site and in the metadata (although I understand metadata doesn't really matter anymore), yet my own personal site does not come up, even though the URL has been up and running for 8 years. The final straw was when I searched for "web design, Ottawa" and a newly opened competitor (just around the corner, actually) came up on the second page. I spent the last couple of days researching this (again), and I seem to be meeting all of Google's requirements. I have never used a sleazy SEO company, and my content is consistent and legal. What's up with that?
I have noticed that I often search for a word and get pages that only contain synonyms (or variations on the word). Likewise for accents: search for resumé and you'll find pages with resume.
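The behaviour described above is consistent with query expansion, where each query term is widened to a set of acceptable variants before pages are matched. Here is a toy sketch of that idea, assuming a hand-written variant table; `kVariants`, `expandTerm` and `pageMatches` are hypothetical names, and nothing here describes how Google actually builds or applies such relations.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, hand-written synonym/variant table. A real engine would
// presumably derive such relations from data rather than hard-code them.
const std::map<std::string, std::vector<std::string>> kVariants = {
    {"car", {"cars", "automobile"}},
    {"run", {"runs", "running", "ran"}},
};

// Expand one query term into the set of terms a page may match on.
std::set<std::string> expandTerm(const std::string& term) {
    std::set<std::string> out{term};
    auto it = kVariants.find(term);
    if (it != kVariants.end()) out.insert(it->second.begin(), it->second.end());
    return out;
}

// A page matches if it contains the term itself or any listed variant,
// so a matching page need not contain the literal query word at all.
bool pageMatches(const std::set<std::string>& pageWords, const std::string& term) {
    for (const auto& t : expandTerm(term))
        if (pageWords.count(t)) return true;
    return false;
}
```

Because the match is made against the expanded set, a page mentioning only "automobile" is returned for the query "car", which is exactly the effect the comment describes.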
Now he can also enjoy the wonders of modern search engines.
Handling diacritics can be tricky. As an example, consider the o-umlaut (ö). In German, this is the usual letter "o" with a diacritical mark. In Swedish, the same glyph is a separate letter of the alphabet, and it comes after the letter "z" in the standard ordering.
English writers often omit the diacritical mark (they also sometimes transliterate "ö" as "oe", at least for German). Playing around with Google (via google.com, rather than google.de or google.se), it seems that they tend to handle such things when searching for German words, but not for Swedish words.
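A minimal sketch of the language-dependent handling described above: for a German query, "ö" may also match "o" or "oe"; for a Swedish query, the word is left alone because "ö" is a letter in its own right. Assumptions for illustration: strings are UTF-8 and the source file is saved as UTF-8, the language codes "de" and "sv" are chosen by the caller, only "ö" is handled, and the helpers `replaceAll` and `spellingVariants` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replace every occurrence of `from` in `s` with `to`. This works byte-wise,
// which is safe on UTF-8 text as long as `from` is a complete UTF-8 sequence.
std::string replaceAll(std::string s, const std::string& from, const std::string& to) {
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size())) {
        s.replace(pos, from.size(), to);
    }
    return s;
}

// Spellings a query word may match, depending on the query language.
std::vector<std::string> spellingVariants(const std::string& word, const std::string& lang) {
    std::vector<std::string> out{word};
    if (word.find("ö") == std::string::npos) return out;
    if (lang == "de") {
        // German: ö is "o" with an umlaut, often written "o" or "oe" in English text.
        out.push_back(replaceAll(word, "ö", "o"));
        out.push_back(replaceAll(word, "ö", "oe"));
    }
    // Swedish ("sv"): ö is a separate letter, so folding it to "o" would change the word.
    return out;
}
```

For example, `spellingVariants("Köln", "de")` yields "Köln", "Koln" and "Koeln", while `spellingVariants("smörgås", "sv")` yields only the original word.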
mismatched quotations in summary, blarrr.
I've never seen a VP who knows anything about what he's overseeing. So he picked up some general phrases from his engineers and put them on the blog. Scientists' posts would be much more interesting.
Ironically, in its attempt to open up a little, the Google blog is blocked by the Great Firewall (GFW) of China...
Sorry folks, but Page Rank is only an interesting implementation and the first widely used citation ranking for the WWW. Even the algorithms largely used to implement Page Rank (tm) were developed by mathematicians at Princeton. Citation analysis methods for hyperlinks, of which Page Rank is one, are now ubiquitous.
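For contrast with PageRank, the simplest form of hyperlink citation analysis just counts inbound links, treating every citation as equal no matter who makes it. A minimal sketch, assuming the link graph is given as adjacency lists; `inLinkCounts` is a hypothetical name, and duplicate links from the same page are counted each time.

```cpp
#include <cstddef>
#include <vector>

// Plain citation count for hyperlinks: a page's score is the number of
// links pointing at it. links[i] lists the pages that page i links to.
// Unlike PageRank, a link from an obscure page counts as much as any other.
std::vector<std::size_t> inLinkCounts(const std::vector<std::vector<std::size_t>>& links) {
    std::vector<std::size_t> counts(links.size(), 0);
    for (const auto& outgoing : links)
        for (std::size_t target : outgoing) ++counts[target];
    return counts;
}
```

PageRank refines this by weighting each incoming link by the rank of the page that makes it, which is the main difference from a raw citation count.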
Think about this. The patent doesn't keep competitors from implementing very similar methods. This is because the patent was necessarily narrow due to extensive prior art (not all of it cited). I love the fact that The Goog was created and I've used it since it was a university project, but it's just not that novel anymore. Even topical books of the day didn't consider it that unique:
http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/mags/ic/&toc=comp/mags/ic/2001/01/w1toc.xml&DOI=10.1109/4236.895141
What is unique is the way they've implemented and grown/maintained an enormous data analysis network. That and figuring out a way to monetize search without pissing off their users (or their customers) is the real achievement.
Opening up could actually help web developers organize their pages to better suit Google's indexing method. Right now, as a website owner, you're at the mercy of chance, or you're forced to use some rank-enhancement tool that may put you in your best referrer's bad graces.
If Google can give developers enough information to get their websites to do the right things, while not giving the bad guys too much information (or advantage), then everyone (but the scammers) can win. I'm guessing this is the basic plan, that even if it helps their competitors a bit, it helps their customers more and that their hardware/organization advantages more than outweigh any loss in algorithmic advantage.
Now they're VPs. He probably hasn't seen any code or read any whitepapers in the last decade.
WATCH OUT! index out of range...
for (int i=0; i = numResults; i++)
should be
for (int i=0; i numResults; i++)
-Buffer Overflow Nazi
oops... slashdot hates 'less than' symbols... should be:
for (int i=0; i <= numResults; i++)
should be
for (int i=0; i < numResults; i++)
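The corrected bound can be checked with a runnable version of the joke loop. The `Result` struct, its `url` and `good` fields, and the `show` function are hypothetical stand-ins for whatever the snippet assumed; `showGood` returns how many results it displayed so the behaviour is easy to verify.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical result type standing in for whatever `results[i]` was in the joke.
struct Result {
    std::string url;
    bool good;
};

// Hypothetical stand-in for show(): print the URL on its own line.
void show(const Result& r) { std::cout << r.url << '\n'; }

// Show only the "good" results. Valid indices run from 0 to numResults - 1,
// so the condition must be `<`; with `<=` the last pass reads past the end.
std::size_t showGood(const std::vector<Result>& results) {
    std::size_t shown = 0;
    const std::size_t numResults = results.size();
    for (std::size_t i = 0; i < numResults; i++) {
        if (results[i].good) {
            show(results[i]);
            ++shown;
        }
    }
    return shown;
}
```

A range-based `for (const Result& r : results)` loop would sidestep the index bound entirely; the indexed form is kept here only to mirror the snippet being corrected.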
"...PageRank, an algorithm developed by Larry Page and Sergey Brin..."
Not true.
PageRank was invented by Page (note the name), according to the patent. If the patent is incorrect on that, then the patent is invalid.