Slashdot Mirror


Cracking the Google Code... Under the GoogleScope

jglazer75 writes "From the analysis of the code behind Google's patents: "Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again. ... In addition to evaluating and scoring web page content, the ranking of web pages are admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.""

11 of 335 comments

  1. On the minds of all slashdotters, by uberjoe · · Score: 5, Funny

    So will this make it easier or harder to find porn?

    --

    The days of the digital watch are numbered.

    1. Re:On the minds of all slashdotters, by uberjoe · · Score: 5, Funny

      Yes, there is a shortage of *quality* porn on the web. When are these people going to learn that pigtails don't necessarily make you look young.

      --

      The days of the digital watch are numbered.

  2. Google what is best in life by kensai · · Score: 5, Funny

    To crush artificial link inflation and hear the lamentations of search engine spam

  3. in case of slashdotting, article text by Anonymous Coward · · Score: 5, Informative

    Cracking the Google Code... Under the GoogleScope
    Google's US Patent confirms information retrieval is based on historical data.

    Publication Date: 5/8/2005 9:51:18 PM

    Author Name: Lawrence Deon

    An Introduction: ...if you thought you cracked the Google Code and had Google all figured out ... guess again.

    Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again.

    Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of a United States Patent Application 20050071741 on March 31, 2005.

    The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.

    What exactly do these changes mean to you?
    Your credibility and reputation on-line are going under the Googlescope! Google has defined their patent abstract as follows:

    "A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."

    Google's patent specification reveals a significant amount of information both old and new about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.

    Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.

    Here's how Google scores your web pages.

    In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates.
    What's new and interesting is what Google takes into account in determining the freshness of a web page.

    For example, if a stale page continues to procure incoming links, it will still be considered fresh, even if its Last-Modified header (which reports when the file was most recently modified) hasn't changed and the content itself has not been updated.

    According to their patent filing, Google records and scores the following web page changes to determine freshness:
    • The frequency of all web page changes
    • The actual amount of the change itself... whether it is a substantial change or a redundant or superfluous one
    • Changes in keyword distribution or density
    • The actual number of new web pages that link to a web page
    • The change or update of anchor text (the text that is used to link to a web page)
    • The number of new links to low trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page).
    Although there is no specific number of links indicated in the patent, it might be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.
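
    As a rough illustration of how the signals listed above could be combined into one freshness number, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the weights, and even the signs are invented, since the filing neither prioritizes the factors nor gives any formula. Keyword-density changes are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class PageHistory:
    """Hypothetical per-page history signals, loosely named after the list above."""
    change_frequency: float     # content changes per month
    change_fraction: float      # share of the page that actually changed, 0..1
    new_inbound_links: int      # new pages linking to this page
    anchor_text_changes: int    # inbound links whose anchor text was updated
    new_low_trust_links: int    # new links to low-trust sites


def freshness_score(h: PageHistory) -> float:
    """Combine the signals with made-up weights (not taken from the patent)."""
    # Scale change frequency by how substantial the changes are, so that
    # frequent but superficial edits contribute very little.
    content_signal = h.change_frequency * h.change_fraction
    return (
        1.0 * content_signal
        + 0.5 * h.new_inbound_links
        + 0.2 * h.anchor_text_changes
        - 1.0 * h.new_low_trust_links   # assumed to count against the page
    )


# A page whose content never changes but which keeps earning links...
stale_but_linked = PageHistory(0.0, 0.0, 10, 0, 0)
# ...versus a page that is touched daily with trivial edits and earns none.
daily_trivial_edits = PageHistory(30.0, 0.01, 0, 0, 0)

print(freshness_score(stale_but_linked) > freshness_score(daily_trivial_edits))
```

    Under these invented weights the unchanged page that keeps attracting links scores higher than the page with daily trivial edits, which matches the "stale page continues to procure incoming links" example earlier in the article.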

    Developing your web page augments for page freshness.

    Now I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh regularly and that may not necessarily mean a content change.

    Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.

    How do you unravel that statement and differentiate between the two types of content?

    An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.

    A page related to winter clothin

  4. Unintended side effects of the Google arms race by 14erCleaner · · Score: 5, Interesting

    It just occurred to me that, as Google changes its algorithms, it'll just create more business for the Search Engine Optimization consultant. When web sites drop in the Google rankings, they'll want to make changes to move back up, and will hire the SEO again to do so.

    --
    Have you read my blog lately?
    1. Re:Unintended side effects of the Google arms race by AKAImBatman · · Score: 5, Interesting
      Here's a thought: How about companies try to offer useful services rather than "optimize" their search engine results? I've gotten several top hits on Google by the complete accident of providing useful services or information in the past. Traditional advertising such as adclicks and dmoz listings also help. Not once have I wasted my time trying to game the system.

      Companies need to start realizing that making money is about providing what customers want. Advertising is a great way of getting your name out, but only a good product or service will actually carry through. So in that frame of thinking, I highly recommend that companies:

      • Stop looking at "cost cutting" by reduction, and start looking at "using existing resources to provide relevant products"
      • Start hiring employees who know what they're doing and listen to them
      • Stop wasting your money on search engine optimizations.
      • Be good to the customer, and the customer will be good to you. If you don't know why people are upset or unhappy, grab a couple off the street and ask.
    2. Re:Unintended side effects of the Google arms race by MrNiceguy_KS · · Score: 5, Funny
      If you don't know why people are upset or unhappy, grab a couple off the street and ask.

      I'm unhappy because I was grabbed off the street. May I go now?

      Please?

      --
      Redundancy is good And also good.
  5. Yes by Anonymous Coward · · Score: 5, Funny

    But when I search on Tiger, a mail-order company's site still comes up above Apple's. Is anyone at Google listening?

  6. Take the article with a grain of salt... by nganju · · Score: 5, Insightful


    The article is not written by a Google employee, nor did the author speak with anyone at Google. It's simply his analysis of the patent document filed by Google.

    Also, at the bottom of the article after the author's name, there's a link to some search optimization service's website.

    --
    There are 2 kinds of people in this world. Those that can keep their train of thought,
  7. effect on search engine optimizers by nemexi · · Score: 5, Informative

    One of the most interesting (and obvious) effects of Google's changes: The company which once ranked first for the phrase "search engine optimization", SEOinc, is now nowhere to be found -- even a search for the company's name doesn't bring up the company's website. SEOinc's response has been a -- somewhat ineffective -- attempt to make those reporting on its fall "cease and desist".

  8. Two Keys: Data Mining and Delay by RonBurk · · Score: 5, Interesting
    The first big mistake webmasters make when trying to understand how Google ranks search results is failing to grasp the idea of data mining. The Google folks come from a data mining background and they constantly write about data mining algorithms, so it would be highly surprising if the bulk of the Google algorithm were not constructed via data mining.

    What does that mean? At the highest level, it means that most of the Google algorithm is constructed by a machine. You give the machine human-constructed examples of how to rank a sample set of pages (notice those want ads where Google is hiring people who can inspect and assess the quality of web pages?) and it then uses essentially brute-force techniques to test every possible combination of your ranking variables to find the simplest formula that ranks pages the same way the human did.

    There is no human at Google "twisting dials" to alter individual parameters of a formula. The machine constructs the algorithm, and it can therefore easily be so complex that no human can understand it. Tweaking the algorithm becomes a process of changing or adding to your "training set" of human-ranked pages, and letting the data mining process come up with a revised algorithm.
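
    To make the idea concrete, here is a toy Python sketch of that kind of search: given a few human-ranked pages, try formulas with the fewest variables first and stop at the first one that reproduces the human ordering. The pages, feature names, and candidate weights are all invented for illustration, and real data mining systems use far more scalable methods than literal enumeration. Nothing here is known to be what Google does.

```python
from itertools import combinations, product

# Hypothetical training set: a few pages with made-up feature values and a
# human-assigned rank (1 = best).
PAGES = [
    {"inbound_links": 100, "updates_per_month": 1, "word_count": 500, "human_rank": 1},
    {"inbound_links": 50, "updates_per_month": 8, "word_count": 900, "human_rank": 2},
    {"inbound_links": 60, "updates_per_month": 1, "word_count": 700, "human_rank": 3},
    {"inbound_links": 10, "updates_per_month": 9, "word_count": 300, "human_rank": 4},
]
FEATURES = ["inbound_links", "updates_per_month", "word_count"]
WEIGHT_CHOICES = [1, 2, 5]  # arbitrary candidate weights


def machine_ranking(pages, formula):
    """Return the human ranks in the order the formula would list the pages."""
    def score(page):
        return sum(weight * page[name] for name, weight in formula.items())
    return [p["human_rank"] for p in sorted(pages, key=score, reverse=True)]


def find_simplest_formula(pages, features, weight_choices):
    """Brute-force search, fewest variables first, for a formula that
    orders the pages exactly as the human did. Returns None if none fits."""
    target = sorted(p["human_rank"] for p in pages)
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            for weights in product(weight_choices, repeat=size):
                formula = dict(zip(subset, weights))
                if machine_ranking(pages, formula) == target:
                    return formula
    return None


print(find_simplest_formula(PAGES, FEATURES, WEIGHT_CHOICES))
# prints {'inbound_links': 1, 'updates_per_month': 2}
```

    In this made-up data no single variable reproduces the human ordering, so the search settles on a two-variable formula. Changing the human ranks in PAGES and rerunning is the toy equivalent of revising the training set instead of twisting a dial.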

    For example, Google could invent a new variable called "category", and identify each page as belonging to category Astronomy, Botulism, Country, [...] and Other. Once that variable is thrown into the mix, then the Google "algorithm" is essentially free to vary wildly from one type of subject matter to the next. For example, you might see someone with a Real Estate site swearing up and down that inbound links are no longer as important, while someone with an Astronomy site might swear that, no, inbound links are more important than ever. You can see exactly this kind of bickering in most of the forums that people who hope to do Search Engine Optimization frequent.

    The other big mistake people make in trying to see how to game the Google algorithm is "delay". In studying how people manage (or fail to manage) complex systems, psychologists learned that people generally would fail if a delay was introduced between their actions and the results of their actions.

    In one very simple test, people were charged with trying to stabilize the temperature in a virtual refrigerator. They had one dial, and there was exactly one piece of feedback: the current temperature in the fridge. However, they were not explicitly told that there was a delay between moving the dial and when the results of that action would stabilize.

    The responses of those test subjects were eerily similar to what we see in Google-gaming webmasters these days. Some people swore up and down that some human behind the scenes was directly tweaking the results to thwart whatever they did. Others became frustrated and decided that nothing they did really mattered, so they would just swing the dial back and forth between its minimum and maximum settings.

    What does this have to do with Google? These days, Google can change their algorithm relatively frequently, and the algorithm can vary by the relative date of various things. The net sum is, there's a delay between when your page is first ranked and when it is likely to arrive at a relatively stable ranking. This can drive webmasters nuts as they think they've done something clever to rank their page high, but then it drops a week later. Although it doesn't occur to them, the important question is: did the change cause the high ranking or did it cause the sudden decline?

    The few people who did master the simple refrigerator system? Well, they sounded more like some of the people who are more successful at gaming Google. Those folks tend to say things like: "just make one change and then leave it alone for a while to see what happens."
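
    The effect of the delay is easy to reproduce in a few lines of Python. This is a toy model, not the psychologists' actual experiment: the temperatures, the dial range, and the three-step delay are all assumed values. The same correction rule is applied either at every step or only after the previous change has had time to show up.

```python
AMBIENT = 20.0            # temperature with the dial at 0 (assumed)
COOLING_PER_NOTCH = 2.0   # degrees of cooling per dial notch (assumed)
TARGET = 4.0              # temperature we are trying to hold
DELAY = 3                 # steps before a dial change becomes visible
DIAL_MIN, DIAL_MAX = 0.0, 10.0


def simulate(wait_between_changes: int, steps: int = 40) -> list:
    """Return the temperature observed at each step.

    wait_between_changes=1 adjusts the dial on every step;
    wait_between_changes=DELAY leaves each change alone until its
    effect can actually be seen.
    """
    dial_history = [0.0] * DELAY   # settings whose effect is still pending
    dial = 0.0
    temps = []
    for t in range(steps):
        # The fridge reflects the dial setting from DELAY steps ago.
        effective_dial = dial_history[-DELAY]
        temp = AMBIENT - COOLING_PER_NOTCH * effective_dial
        temps.append(temp)
        if t % wait_between_changes == 0:
            # Apply the correction that would be exactly right with no delay.
            correction = (temp - TARGET) / COOLING_PER_NOTCH
            dial = min(DIAL_MAX, max(DIAL_MIN, dial + correction))
        dial_history.append(dial)
    return temps


impatient = simulate(wait_between_changes=1)
patient = simulate(wait_between_changes=DELAY)
print("impatient, last 8 temps:", impatient[-8:])
print("patient,   last 8 temps:", patient[-8:])
```

    In this model the impatient controller keeps overshooting and never settles, while the patient one reaches the target after a single change and stays there. That is the "make one change and then leave it alone" advice in miniature.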

    Can you still game the Google algorithm? Undoubtedly in specific cases. But it's getting harder. The Google algorithm was always complex, but what's changing is that the days when a few variables (such as inbound link count) generally swamped the effects of all the others are drawing to a close. We are approaching the day when the best technique to rank highly with Google will be: sit down at your keyboard and make more good content every day.