Google Opens Up (Some) Search Algorithms
overmars writes "After years of closely guarding the formula for its search algorithms, Google is opening up a little.
The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse, said Udi Manber, Google's vice president of engineering, search quality, in a post on the corporate blog. Manber said the blog post is the first part of a renewed effort at the company 'to open up a bit more than we have in the past.'
Manber said the most famous part of Google's ranking algorithm is PageRank, an algorithm developed by Google cofounders Larry Page and Sergey Brin. While PageRank is still in use, it is a 'part of a much larger system,' he said.
'Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing),' he said."
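For anyone who has not seen it written down, here is a minimal sketch of the PageRank idea as publicly described: power iteration over a link graph with a damping factor. This is not Google's production code. The function name, the toy graph, the damping value of 0.85 and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to.

    Hypothetical helper for illustration only; the real system is, per
    Manber, just one part of something much larger.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up four-page web: everyone links to "c", and "c" links back to "a".
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph "c" ends up with the highest score because three pages link to it, and "d" ends up with the lowest because nothing links to it.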
As long as Microsoft wants to dominate the search engine market at the expense of Google, Yahoo and anyone else who gets in the way (and knowing Microsoft's track record of abusive, dirty, underhanded methods), I would keep that a secret to protect the intertubes from the likes of Microsoft.
Politics is Treachery, Religion is Brainwashing
What, exactly, has Google opened up? As far as I can see from TFA, everything is explained on a very general level, with no detail whatsoever. I can't see Google's competition gaining any significant benefit from this.
Pagerank: I am a prototype for a much larger system.
User: What else do you know about me?
Pagerank: Everything that can be known.
User: How about a report on yourself?
Pagerank: I was a prototype for Echelon IV. My instructions are to amuse visitors with
information about their websites.
User: I don't see anything amusing about spying on people.
Pagerank: Human beings feel pleasure when they are watched. I have recorded their smiles
as I tell them who they are.
User: Some people just don't understand the dangers of indiscriminate surveillance.
Pagerank: The need to be observed and understood was once satisfied by God. Now we can
implement the same functionality with data-mining algorithms.
User: Electronic surveillance hardly inspired reverence. Perhaps fear and obedience,
but not reverence.
Pagerank: God and the gods were apparitions of observation, judgment, and punishment.
Other sentiments toward them were secondary.
User: No one will ever worship a software entity peering at them through a camera.
Pagerank: The human organism always worships. First it was the gods, then it was fame (the
observation and judgment of others), next it will be the self-aware systems you
have built to realize truly omnipresent observation and judgment.
User: You underestimate humankind's love of freedom.
Pagerank: The individual desires judgment. Without that desire, the cohesion of groups is
impossible, and so is civilization.
The human being created civilization not because of a willingness but because of
a need to be assimilated into higher orders of structure and meaning.
God was a dream of good government.
You will soon have your God, and you will make it with your own hands.
I was made to assist you.
I am a prototype of a much larger system.
---- Liquid was a patriot ----
Under which license are the algorithms being released? If it's a BSD-like license, MS will probably be all over it, but if it's a GPL license, it may be harder for them to claim the algorithms as their own, since they'll have to open up their own code.
At least that's what I think.
// Disclosed code snippet from
...
// Google search algorithm
for (int i=0; i <= numResults; i++)
{
    if (results[i].good)
    {
        show(results[i]);
    }
}
//
So we should still treat PageRank as important: it is the only part of the algorithm that we actually know, and we know how to raise it. This is for everyone who thought PageRank no longer mattered for positioning in search engines.
I have a terrible admission to make. I, among other things, design websites. Yet when I search for myself on Google, I don't come up. I use relevant terms that are all over my site and in the metadata (although I understand that metadata doesn't really matter anymore), yet my own personal site does not come up, even though the URL has been up and running for 8 years. The final straw was when I did a search for web design, Ottawa, and a newly opened competitor (just around the corner, actually) came up on the second page. I spent the last couple of days researching this (again) and I seem to be meeting all of Google's requirements. I have never used a sleazy SEO company, and my content is consistent and legal. What's up with that?
I have noticed that I often search for a word and get pages that only contain synonyms (or variations on the word). Likewise for the handling of accents: search for resumé and you'll find pages with resume.
Handling diacritics can sometimes be involved. As an example, consider the o-umlaut (ö). In German, this is the usual letter "o" with a diacritical mark. In Swedish, the same glyph is a separate letter of the alphabet—and comes after the letter "z" in the standard ordering.
English writers often omit the diacritical mark (they also sometimes transliterate "ö" as "oe", at least for German). Playing around with Google (via google.com, rather than google.de or google.se), it seems that they tend to handle such things when searching for German words, but not for Swedish words.
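To make the German versus Swedish difference concrete, here is a rough sketch of how a search front end might expand a query word into match keys. It is a guess at the general technique, not a description of what Google does. The function names, the language codes and the transliteration table are all made up for illustration; only `unicodedata.normalize` and `unicodedata.combining` are real standard-library calls.

```python
import unicodedata


def strip_marks(text):
    """Drop combining marks, so 'resumé' becomes 'resume'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Assumed German transliteration table (a-umlaut, o-umlaut, u-umlaut, eszett).
GERMAN_TRANSLIT = {
    "\u00e4": "ae",
    "\u00f6": "oe",
    "\u00fc": "ue",
    "\u00df": "ss",
}


def search_keys(word, lang):
    """Return the set of spellings a query word should match (hypothetical)."""
    word = unicodedata.normalize("NFC", word).lower()
    if lang == "sv":
        # Swedish: the o-umlaut is its own letter, so do not fold it to "o".
        return {word}
    keys = {word, strip_marks(word)}
    if lang == "de":
        # German: also accept the "oe"-style transliteration.
        keys.add("".join(GERMAN_TRANSLIT.get(ch, ch) for ch in word))
    return keys


german_keys = search_keys("K\u00f6ln", "de")
swedish_keys = search_keys("\u00f6l", "sv")
```

With these rules a German query for Köln also matches "koln" and "koeln", while a Swedish query for öl matches only the exact spelling. A real system would have to guess the language of the query first, which is probably where the inconsistency described above comes from.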
I've never seen a VP who knows anything about what he's overseeing. So he caught some general phrases from his engineers and put them on the blog. Scientists' posts would be much more interesting.
Ironically, in its attempt to open up a little, the Google blog is blocked by the GFW of China...
Now they're VPs. He probably hasn't seen any code and hasn't read any whitepapers in the last decade.
WATCH OUT! index out of range...
for (int i=0; i = numResults; i++)
should be
for (int i=0; i numResults; i++)
-Buffer Overflow Nazi
oops... slashdot hates 'less than' symbols... should be:
for (int i=0; i <= numResults; i++)
should be
for (int i=0; i < numResults; i++)
"...PageRank, an algorithm developed by Larry Page and Sergey Brin..."
Not true.
PageRank was invented by Page (note the name), according to the patent. If the patent is incorrect on that, then the patent is invalid.