Slashdot Mirror


Google Opens Up (Some) Search Algorithms

overmars writes "After years of closely guarding the formula for its search algorithms, Google is opening up a little. The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse, said Udi Manber, Google's vice president of engineering, search quality, in a post on the corporate blog. Manber said the blog post is the first part of a renewed effort at the company 'to open up a bit more than we have in the past.' Manber said the most famous part of Google's ranking algorithm is PageRank, an algorithm developed by Google cofounders Larry Page and Sergey Brin. While PageRank is still in use, it is a 'part of a much larger system,' he said. 'Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing),' he said."

52 of 86 comments

  1. Dont do it Google! by FudRucker · · Score: 1, Interesting

    As long as Microsoft wants to dominate the search engine market at the expense of Google, Yahoo and anyone else that gets in the way (knowing Microsoft's track record of abusive & dirty underhanded methods), I would keep that a secret to protect the intertubes from the likes of Microsoft.

    --
    Politics is Treachery, Religion is Brainwashing
    1. Re:Dont do it Google! by spidr_mnky · · Score: 3, Insightful

      As long as we're twisting the lion's tail, you might instead say that the more people and companies who desire common progress share their work, the more people and companies who want to isolate themselves and their work for the sake of competition will be unable to keep up. Therefore, the more Google publishes, the harder they will be able to fight (our antagonist) MS.

      In reality, I'm sure Google's leadership has done some heavy analysis on exactly how much openness benefits them.

    2. Re:Dont do it Google! by risk+one · · Score: 5, Insightful

      I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this. Ideas like PageRank gave Google the edge in the early days, but now, their advantage lies in other areas. They have a stunning amount of capital tied up in hardware, giving them amazing speed, and amazing amounts of data. They have code optimized to handle those amounts of data in reasonable time. They have the experience to take simple probability models like the ones described in the article, and make them work with those amounts of data.

      This is why it's impossible to beat Google at search and other data-based markets. It's not one simple patented idea anymore. If it was just that, Google would've disappeared years ago. The only way to beat the points described above, is to have the capital to buy the hardware, and knowledge to match Google. Microsoft can do that, but Google has one other thing that Microsoft doesn't. They understand their developers. They understand that if you give these kinds of scientist/developers an interesting problem, a fantastic dataset and the freedom to attack it in their own way, you barely even have to pay them anymore. The interest will take over and completely fuel the project. They will work overtime, and come in on the weekends, without being asked.

      That will bring energy to a project and a company, that you can never get through any tactic that Microsoft is likely to employ. I admit I don't precisely know what Microsoft is like on the inside, but I simply cannot conceive of them as a company that understands the joy of programming, or the joy of science (which is a huge part of information retrieval). In any case, one blog post with some sketchy details isn't going to tell Microsoft anything they don't know already.

    3. Re:Dont do it Google! by Daengbo · · Score: 2, Interesting

      In reality, I'm sure Google's leadership has done some heavy analysis on exactly how much openness benefits them.
      and
      The search engine company has kept its search formula a closely guarded secret for two reasons: competition and to prevent abuse

      Security through obscurity isn't a good plan, and Google knows that.

    4. Re:Dont do it Google! by Daengbo · · Score: 1

      Your comment contradicts many MS bloggers.

    5. Re:Dont do it Google! by Daengbo · · Score: 1

      Oh, you mean the ones that have been fired over the contents of their blogs? Not him, but I can see the article you linked as supporting evidence for my point. ;)

      Anyway, here's his response. /certainly not angry,
    6. Re:Dont do it Google! by Anonymous Coward · · Score: 2, Interesting

      I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this.

      Coming up with things is easy; implementing them is hard. Any average Joe Sixpack can come up with the idea of a flying car in five seconds, but to actually build one is another matter entirely - and doing so in a commercially viable way is yet another matter.

      Remember what Edison said about inspiration and perspiration?
    7. Re:Dont do it Google! by kestasjk · · Score: 3, Funny

      I took one course in Information Retrieval, and I could come up with most of these things with an evening or two of brainstorming, at least on a general level like this.

      Of course you could. ;-) You took a course in Information Retrieval, after all.
      --
      // MD_Update(&m,buf,j);
    8. Re:Dont do it Google! by dreamsofcaffeine · · Score: 1

      Coming up with something is a little less abstract than you'd like it to be. Coming up with these ideas includes some rough thought of how to implement them; only having an idea about something is definitely not ``coming up with something''.

    9. Re:Dont do it Google! by SnprBoB86 · · Score: 1

      I've worked at both, they are more similar than not...

      I'll be joining Microsoft full-time this summer; if that says anything.

      --
      http://brandonbloom.name
  2. What exactly is open? by k33l0r · · Score: 5, Insightful

    What, exactly, has Google opened up? As far as I can see from TFA, all that is explained is on a very general level, with no detail whatsoever. I can't see Google's competition gaining any significant benefit from this.


    1. Re:What exactly is open? by Anonymous Coward · · Score: 2, Interesting

      Right. And the competitors already know pieces of what Google has, as a result of the inevitable stream of engineers leaving to take new jobs. Particularly at SV startups founded by ex-Googlers.

      While Rob Enderle puts the matter trollishly, I agree with the thrust of what he says. Google has been given a free pass on this. Their main product/service is definitely not open source, or free software, and in fact is less open than most of Microsoft's products (for example). At least with Windows and .Net, we can obtain detailed documentation on APIs, tools, and (often) internal architecture. Sponsoring "summer of code" is a tiny contribution compared with the size of their revenues and profits, comparable to the PR-wise philanthropic programs of your typical Fortune 500 company.

    2. Re:What exactly is open? by anomalous+cohort · · Score: 1

      From TFA...

      Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing).

      Obviously, this usage of the word "open" is not related to open source software. It's more like he is willing to talk about it at all.

  3. Deus Ex, anyone? by Ortega-Starfire · · Score: 1

    Pagerank: I am a prototype for a much larger system.
    User: What else do you know about me?
    Pagerank: Everything that can be known.
    User: How about a report on yourself?
    Pagerank: I was a prototype for Echelon IV. My instructions are to amuse visitors with
    information about their websites.
    User: I don't see anything amusing about spying on people.
    Pagerank: Human beings feel pleasure when they are watched. I have recorded their smiles
    as I tell them who they are.
    User: Some people just don't understand the dangers of indiscriminate surveillance.
    Pagerank: The need to be observed and understood was once satisfied by God. Now we can
    implement the same functionality with data-mining algorithms.
    User: Electronic surveillance hardly inspired reverence. Perhaps fear and obedience,
    but not reverence.
    Pagerank: God and the gods were apparitions of observation, judgment, and punishment.
    Other sentiments toward them were secondary.
    User: No one will ever worship a software entity peering at them through a camera.
    Pagerank: The human organism always worships. First it was the gods, then it was fame (the
    observation and judgment of others), next it will be the self-aware systems you
    have built to realize truly omnipresent observation and judgment.
    User: You underestimate humankind's love of freedom.
    Pagerank: The individual desires judgment. Without that desire, the cohesion of groups is
    impossible, and so is civilization.
    The human being created civilization not because of a willingness but because of
    a need to be assimilated into higher orders of structure and meaning.
    God was a dream of good government.
    You will soon have your God, and you will make it with your own hands.
    I was made to assist you.
    I am a prototype of a much larger system.

    --
    ---- Liquid was a patriot ----
  4. License? by Bootarn · · Score: 2, Interesting

    Under which license are the algorithms being released? If it's a BSD-like license, MS will probably be all over it, but if it's a GPL license, it may be harder for them to claim the algorithms as their own, since they'll have to open up their own code.
    At least that's what I think.

    1. Re:License? by Anonymous Coward · · Score: 2, Informative

      I don't think algorithms are typically licensed. Source code is licensed, algorithms are patented.

    2. Re:License? by vertigoCiel · · Score: 2, Informative

      There's no indication in the article that any code or algorithms will be released. They're just talking about it on a very broad, conceptual level. The headline and summary are quite misleading.

  5. The secret ingredient... by nweis · · Score: 5, Funny

    // Disclosed code snippet from
    // Google search algorithm

    for (int i=0; i <= numResults; i++)
    {
        if (results[i].good)
        {
            show(results[i]);
        }
    }

    // ...

    1. Re:The secret ingredient... by nfk · · Score: 2, Funny

      You forgot to sort the results by goodness. Do you work for Microsoft?

    2. Re:The secret ingredient... by Vexorian · · Score: 1

      You forgot to sort the results by goodness.
      That's google's secret
      --

      Copyright infringement is "piracy" in the same way DRM is "consumer rape"
    3. Re:The secret ingredient... by Memroid · · Score: 1

      cool. So a search for "the", which has about 16,570,000,000 results, only needs to loop 16 billion times.

    4. Re:The secret ingredient... by nweis · · Score: 1

      Actually, Google doesn't return more than 1,000 results for any search query. Try it: http://www.google.com/search?q=the&start=991

  6. consider the Pagerank important by voodoosws · · Score: 2, Interesting

    Accordingly, we must still consider the Pagerank important, because it is the only part of the algorithm that we know and that we know how to raise. This is for all those who thought the Pagerank no longer mattered for positioning in search engines.
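
    For reference, the published form of PageRank fits in a few lines. The sketch below is the textbook power-iteration version, not Google's production code. The toy link graph and the function name are made up for illustration, and the 0.85 damping factor is the value suggested in the original paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict {page: [outlinks]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every page passes an equal share of its rank to each page it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "home", which links to "about".
toy_web = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
    "contact": ["home"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

    In this toy graph "home" comes out on top because it collects links from every other page, which is the "raise it by getting linked" effect in miniature.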

  7. Mystified by 'the google" by gary_7vn · · Score: 4, Interesting

    I have a terrible admission to make. I, among other things, design websites. Yet, when I search for me on the google, I don't come up. I use relevant terms that are all over my site, and in the metadata (although I understand they don't really matter anymore), yet my own personal site does not come up, even though the url has been up and running for 8 years. The final straw was when I did a search for web design, Ottawa, and a newly opened competitor (just around the corner actually) came up on the second page. I spent the last couple of days researching this (again) and I seem to be meeting all of Google's requirements. I have never used a sleazy SEO company, my content is consistent and legal. What's up with that?

    1. Re:Mystified by 'the google" by Nirvelli · · Score: 1

      Do you host your own site?
      If not, maybe your host has their robots.txt set to block searching?
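
      One way to test that theory, as a rough sketch: Python's standard library ships a robots.txt parser. The rules below are a made-up example of a file that blocks everything, and `is_crawlable` is just a helper name for illustration. To check a real site you would paste in whatever its /robots.txt actually contains.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts every crawler out of the whole site.
blocking_rules = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A blanket "Disallow: /" blocks the front page.
print(is_crawlable(blocking_rules, "http://example.com/index.html"))

# Blocking only /private/ leaves the front page crawlable.
partial_rules = "User-agent: *\nDisallow: /private/\n"
print(is_crawlable(partial_rules, "http://example.com/index.html"))
```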

    2. Re:Mystified by 'the google" by OMGZombies · · Score: 1

      The first two results for +ottawa +web +design are:
      -Atomic Motion - Ottawa Web Design and Development
      -Envision Online - Ottawa Web Designers Ottawa Web Site Designs

      Your site appears to be eyestir:
      -EyeStir Visual Communications, Digital Signage, PowerPoint

      Searching for +Visual +Communications +Ottawa, your site doesn't show up until the 12th page, but searching for +digital +signage +canada, it's the third result. I'd guess google is ranking these results according to the page title. Try adding web design and ottawa to your page title and see what happens.

    3. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      Thanks, I think that is the key, I have to load up on the correct search terms. I just "localized" on Google Web last night, so adding Canada really helps. Prior to that, the results were even worse. It's funny but I have a picture of myself and my cat Fritz with a ufo in the background, Fritz sees it because I am looking at the camera. The text on the picture reads "I want to believe". The jpeg file has that name too, and because of the new X files movie, I am getting lots of hits from that string. The other thing Google really likes, and they tell you this quite explicitly, is links from important sites. But of course that can be problematical.

    4. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      Yes, thanks, I am using webmaster, very powerful and useful.

    5. Re:Mystified by 'the google" by popra · · Score: 1

      1. Make sure that at some point google didn't label you as a "spam" site, http://www.google.com/webmasters/ is a good starting point for learning google's view of your site's health

      2. Make sure that navigation in your site makes sense from google's bot perspective. Map categories/subcategories in your site to folders in the URL of your site. URLs of your site should be pretty, contain relevant words and be relatively short, i.e. http://example.com/webdesign/logo/price-quote-for-logo-design.html rather than http://example.com/siteengine.php?id_category=12&subcategory=93&articleid=112&lang=en&....

      3. Don't use FLASH, JAVASCRIPT or (I)FRAMES for navigation/menus

      4. Navigation in your site should be easy, use: menus with main/sub categories, breadcrumbs, related pages, etc.

      5. Make sure that there are links on the internet that link to your site (to the front page, but also very important to sections inside your web site). Take time to build links: e.g. when posting in forums make a habit of linking back to your site, especially if there's something on your site that is relevant to the discussion. When you do a website for a client, if possible add a link on the website, pointing to your website. Something like: "Web design by Your Company, Ottawa". Make sure to add the proper "title" attribute to links to your site and the links inside your site.

      6. Change your hosting company and get your own IP to host your website. (the shared IP on which your website might be running, could be marked as "spammy" especially if your site is sharing it with other shady sites)
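
      The folder-style URLs from point 2 can be generated from category and page titles. A minimal sketch, assuming titles are plain ASCII text; `slugify` and `pretty_url` are hypothetical helper names, not part of any particular CMS.

```python
import re


def slugify(title):
    """Lowercase a title and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def pretty_url(base, *segments):
    """Build base/category/.../page.html from human-readable names."""
    path = "/".join(slugify(segment) for segment in segments)
    return f"{base.rstrip('/')}/{path}.html"


# Category, subcategory and article title map to folders and a file name.
print(pretty_url("http://example.com", "Webdesign", "Logo", "Price quote for logo design"))
```

      With these inputs the result is the short URL used as the example in point 2.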

    6. Re:Mystified by 'the google" by The+MESMERIC · · Score: 1

      1. Content does NOT necessarily have to be interesting.

      But FOCUSED and INFORMATIVE.

      All my websites are borderline boring - but nevertheless focused and informative :)

    7. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      www.eyestir.com I have submitted to google.

    8. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      Google webmaster informs me that there are 789 links to my site. What does it take?

    9. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      No, my site is hosted by blacksun.ca, other sites that I have done come up on a keyword search just fine, that is part of the reason why I am mystified. Or googled or something like that. Thanks for the suggestion!

    10. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      Yes, I know that, but why doesn't my site come up on Google? Oh, sorry I thought you said "incontinent".

    11. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      Thank you very much and to all the others that offered help, it really is a serious problem for me. Most of my work is still obtained the old fashioned way, word of mouth, cold calls, et al, but it would be nice if once in a while I got some calls from people who had found the site. www.eyestir.com

    12. Re:Mystified by 'the google" by gary_7vn · · Score: 1

      Thank you. Excellent advice. I will look at this for sure. I am already doing some of the things you suggest, will try the rest.

    13. Re:Mystified by 'the google" by The+MESMERIC · · Score: 1

      sweat and experience.

      Like for example I am trying to optimize this website now: Farmhouses in Tuscany

      When I first saw the original - I had to completely rewrite it from scratch - it took me over 3 months of research to come up with better text and structure - and the site is still not 100% finished.

      It is not going to be easy .. it never is.

      Lots of energy goes into SEO.

      The most important thing to remember is play by the rules - but play very hard.

    14. Re:Mystified by 'the google" by The+MESMERIC · · Score: 1

      I tell you what it doesn't take

      FRONTPAGE

      It kills any chances of your site doing well

      Hand-code your site

  8. Much can be determined by using google by hey · · Score: 1

    I have noticed that I often search for a word and get pages that only contain synonyms (or variations on the word). Likewise for the handling of accents: search for resumé and you'll find pages with resume.

    1. Re:Much can be determined by using google by hey! · · Score: 1

      Well, they're probably using some kind of hash based document fingerprinting anyway. Ignoring low entropy characteristics of a word when calculating the fingerprint makes sense, because you can always go back and take it into account once you've eliminated 99.999999999% of the documents on the Internet.

      Nice nick, by the way.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  9. Making Open Source proprietary? by FrostDust · · Score: 1

    "In many ways, Google is much more proprietary than Microsoft is, and they actually used open source software to get there. So unlike Microsoft, which started off proprietary and has gradually been opening its stuff up, Google starts off getting other people's open stuff, turns it proprietary and then makes money off it. It kind of redefines 'pirate.' I think Google is feeling a little bit of the heat because people are starting to focus on that a bit."

    While I'm pretty sure Google wouldn't be so ignorant as to violate open source licenses for the code they utilize, is there any claim to his "pirate" label, or is he just trying to be inflammatory?
    1. Re:Making Open Source proprietary? by flooey · · Score: 1

      While I'm pretty sure Google wouldn't be so ignorant as to violate open source licenses for the code they utilize, is there any claim to his "pirate" label, or is he just trying to be inflammatory?

      I think what he's saying is that he thinks Google violates the spirit of licenses (particularly the GPL), even though they follow all the requirements of them. Some people get upset that the Internet makes it so that you can separate the using of software from the running of it (whereas in non-networked environments, those are equivalent), and all the obligations in the licenses are stated in terms of people who run the software, so companies like Google can modify software to their heart's content and never have to release their modifications, because they're not letting anyone else run the software.

      Personally, I can't imagine that Stallman and others were ignorant of the idea of accessing software over a network, and they didn't make any effort to change the rules in GPL v3 to eliminate that use case, so I think he's somewhat on his own in that respect.
  10. Diacritics and language by D.+J.+Keenan · · Score: 1

    Handling diacritics can sometimes be involved. As an example, consider the o-umlaut (ö). In German, this is the usual letter "o" with a diacritical mark. In Swedish, the same glyph is a separate letter of the alphabet—and comes after the letter "z" in the standard ordering.

    English writers often omit the diacritical mark (they also sometimes transliterate "ö" as "oe", at least for German). Playing around with Google (via google.com, rather than google.de or google.se), it seems that they tend to handle such things when searching for German words, but not for Swedish words.
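
    To make the difference concrete, here is a rough sketch of accent folding with Python's standard `unicodedata` module. It is not how Google does it. The `search_keys` helper, the `german` flag and the transliteration table are assumptions for illustration, and a real engine would need per-language rules (for Swedish you might not fold "ö" at all).

```python
import unicodedata

# Assumed German-style transliterations; applied only when asked for.
GERMAN_TRANSLITERATIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_keys(word, german=False):
    """Return the set of folded forms under which a word could be indexed."""
    lowered = unicodedata.normalize("NFC", word.lower())
    keys = {lowered, strip_diacritics(lowered)}
    if german:
        keys.add("".join(GERMAN_TRANSLITERATIONS.get(ch, ch) for ch in lowered))
    return keys


print(sorted(search_keys("resumé")))             # exact form plus accent-free form
print(sorted(search_keys("Köln", german=True)))  # also adds the "oe" spelling
```

    Indexing a word under every key in the set lets a query typed without the diacritic still match, while the exact form stays available for ranking.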

  11. To be fair, he's a VP by melted · · Score: 1

    I've never seen a VP who knows anything about what he's overseeing. So he caught some general phrases from his engineers and put them on the blog. Scientists' posts would be much more interesting.

    1. Re:To be fair, he's a VP by Temporal · · Score: 2, Informative

      The engineering VPs at Google are all engineers themselves. Udi himself was hired for his extensive background in web search, at Yahoo and Amazon. He knows a great deal about what he oversees.

  12. oh the irony... by mutantcamel · · Score: 1

    Ironically, in its attempt to open up a little, the Google blog is blocked by the GFW of China...

  13. No, they _used to be_ engineers by melted · · Score: 1

    Now they're VPs. He probably hasn't seen any code and hasn't read any whitepapers in the last decade.

  14. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  15. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  16. Re:Your indexing is wrong by fodi · · Score: 1

    WATCH OUT! index out of range...

    for (int i=0; i = numResults; i++)

    should be

    for (int i=0; i numResults; i++)

    -Buffer Overflow Nazi

  17. Re:Your indexing is wrong by fodi · · Score: 1

    oops... slashdot hates 'less than' symbols... should be:

    for (int i=0; i <= numResults; i++)

    should be

    for (int i=0; i < numResults; i++)
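
    The same off-by-one can be reproduced in a few lines of Python, as a sketch with a made-up three-item result list. Python raises IndexError on the extra iteration, whereas a C-style array would silently read past the end.

```python
# Hypothetical result list standing in for results[] in the joke snippet.
results = [
    {"url": "a", "good": True},
    {"url": "b", "good": False},
    {"url": "c", "good": True},
]
num_results = len(results)


def show_good(results, upper_bound):
    """Collect the good results for indices 0 .. upper_bound - 1."""
    shown = []
    for i in range(upper_bound):
        if results[i]["good"]:
            shown.append(results[i]["url"])
    return shown


# Correct bound, equivalent to i < numResults.
print(show_good(results, num_results))

# Off-by-one bound, equivalent to i <= numResults: visits one index too many.
try:
    show_good(results, num_results + 1)
except IndexError as error:
    print("index out of range:", error)
```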

  18. Not true... by simplerThanPossible · · Score: 1

    "...PageRank, an algorithm developed by Larry Page and Sergey Brin..."

    Not true.

    PageRank was invented by Page (note the name), according to the patent. If the patent is incorrect on that, then the patent is invalid.