Cracking the Google Code... Under the GoogleScope
jglazer75 writes "From the analysis of the code behind Google's patents: "Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again. ... In addition to evaluating and scoring web page content, the ranking of web pages are admittedly still influenced by the frequency of page or site updates. What's new and interesting is what Google takes into account in determining the freshness of a web page.""
...if you thought you cracked the Google Code and had Google all figured out ... guess again.
Google's US Patent confirms information retrieval is based on historical data.
Publication Date: 5/8/2005 9:51:18 PM
Author Name: Lawrence Deon
An Introduction:
Google's sweeping changes confirm the search giant has launched a full-out assault against artificial link inflation and declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again.
Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of a United States Patent Application 20050071741 on March 31, 2005.
The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.
What exactly do these changes mean to you?
Your credibility and reputation on-line are going under the Googlescope! Google has defined their patent abstract as follows:
"A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."
Google's patent specification reveals a significant amount of information both old and new about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.
Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.
Here's how Google scores your web pages.
In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates.
What's new and interesting is what Google takes into account in determining the freshness of a web page.
For example, if a stale page continues to attract incoming links, it will still be considered fresh, even if its Last-Modified header (which reports when the file was most recently modified) hasn't changed and its content has not been updated.
According to their patent filing, Google records and scores the following web page changes to determine freshness:
The frequency of all web page changes
The actual amount of the change itself... whether it is a substantial change or a redundant or superfluous one
Changes in keyword distribution or density
The actual number of new web pages that link to a web page
The change or update of anchor text (the text that is used to link to a web page)
The number of new links to low trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page).
Although there is no specific number of links indicated in the patent it might be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.
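The signals listed above can be pictured as inputs to a single score. What follows is only a rough sketch under loud assumptions: the patent filing gives no weights, no formula, and no indication of which signals count for or against a page, so every field name, weight, and sign below is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageHistory:
    """Hypothetical change signals for one page over some observation window."""
    change_frequency: float       # content changes per month
    change_fraction: float        # share of the page that changed, 0.0 to 1.0
    keyword_density_shift: float  # absolute shift in keyword density, 0.0 to 1.0
    new_inbound_links: int        # new pages that link to this page
    anchor_text_changes: int      # inbound links whose anchor text changed
    new_low_trust_links: int      # new links pointing at low trust sites


# Illustrative weights only; none of these numbers come from the patent.
W_FREQ = 1.0
W_AMOUNT = 2.0
W_KEYWORD = 0.5
W_LINKS = 0.3
W_ANCHOR = 0.1
W_LOW_TRUST = 0.5


def freshness_score(h: PageHistory) -> float:
    """Combine the signals into one number (assumed linear, assumed signs)."""
    positive = (
        W_FREQ * h.change_frequency
        + W_AMOUNT * h.change_fraction
        + W_KEYWORD * h.keyword_density_shift
        + W_LINKS * h.new_inbound_links
        + W_ANCHOR * h.anchor_text_changes
    )
    penalty = W_LOW_TRUST * h.new_low_trust_links
    return positive - penalty


# A page whose content never changes but which keeps gaining links.
stale_but_linked = PageHistory(0.0, 0.0, 0.0, 5, 0, 0)
# The same page with no new links at all.
stale_and_ignored = PageHistory(0.0, 0.0, 0.0, 0, 0, 0)

print(freshness_score(stale_but_linked) > freshness_score(stale_and_ignored))
```

Under these made-up weights, the unchanged page that keeps gaining inbound links outscores the unchanged page that does not, which mirrors the "stale page that is still considered fresh" case described in the article.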
Developing your web page augments for page freshness.
Now I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh regularly and that may not necessarily mean a content change.
Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.
How do you unravel that statement and differentiate between the two types of content?
An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.
A page related to winter clothin
I use google quite a bit to check on recent spyware/malware (used it this morning) and with all due respect, the first few links typically are for spyware products that don't work, domain parking sites (search engines themselves), requiring some amount of diligence to get to the "real" sites that have information.
If this claim is true, I guess we'll have to wait the typical "four to six weeks for delivery."
One of the most interesting (and obvious) effects of Google's changes: the company which once ranked first for the phrase "search engine optimization", SEOinc, is now nowhere to be found -- even a search for the company's name doesn't bring up the company's website. SEOinc's response has been a -- somewhat ineffective -- attempt to get those reporting on its fall to "cease and desist".
Google United - Google Patent Examined
Publication Date: 4/7/2005 7:41:24 AM
By Jim Hedger, StepForth News Editor, StepForth Placement Inc.
Thoughts on Google's patent... "Information retrieval based on historical data."
Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under their control. Some of the ideas and concepts covered in the document are almost certainly worked into the current algorithm running Google. Some are being worked in as this article is being written. Some may never see the blue-light of electrons but are pretty good ideas so it might have been considered wise to patent them. Google's not saying which is which. While not exactly War and Peace, it's a pretty complex document that gives readers a glimpse inside the minds of Google engineers. What it doesn't give is a 100% clear overview of how Google operates now and how the various ideas covered in the patent application will be integrated into Google's algorithms. One interesting section seems to confirm what SEOs have been saying for almost a year: Google does have a "sandbox" where it stores new links or sites for about a month before evaluation.
Google is in the midst of sweeping changes to the way it operates as a search engine. As a matter of fact, it isn't really a search engine in the fine sense of the word anymore. It isn't really a portal either. It is more of an institution, the ultimate private-public partnership. Calling itself a media-company, Google is now a multi-faceted information and multi-media delivery system that is accessed primarily through its well-known interface found at www.google.com.
Google is known for its from-the-hip style of innovation. While the face is familiar, the brains behind it are growing and changing rapidly. Four major factors (technology, revenue, user demand and competition) influence and drive these changes. Where Microsoft dithers and .dll's over its software for years before introduction, Google encourages its staff to spend up to 20% of their time tripping their way up the stairs of invention. Sometimes they produce ideas that didn't work out as they expected, as was the case with Orkut, and sometimes they produce spectacular results as with Google News. The sum total of what works and what doesn't work has served to inform Google what its users want in a search engine. After all, where the users go, the advertising dollars must follow. Such is the way of the Internet.
In its recent SEC filing, the first it has produced since going public in August 2004, Google said it was going to spend a lot of money to continue outpacing its rivals. This year they figure they will spend about $500 million to develop or enhance newer technologies. In 2004 and 2003, Google spent $319 million and $177 million respectively. The increase in innovation-spending corresponds with a doubling of Google's staff headcount which has jumped from 1628 employees in 2003 to 3021 by the end of 2004.
Over the past five years Google has produced a number of features that have proven popular enough to be included among its public-search offerings. On their front page, these features include Image Search, Google Groups, Google News, Froogle, Google Local, and Google Desktop. There are dozens of other features which can be accessed by cli
sigs, as if you care.
From the article: GOOGLE has plans that will dramatically improve the results of internet news searches, by ranking them according to quality rather than simply by their date and relevance to search terms. The ambitious system is revealed by patents filed in the US and around the world (WO 2005/029368) by researchers based at the company's headquarters in Mountain View, California.
Sometimes search engine optimization isn't about making a hack site rank well. Sometimes it is about getting the traffic that a really nifty site deserves.
In fact, I wish all the legit sites did everything they should morally do in terms of SEO. Then the spam sites wouldn't have such an easy time pushing them out of the way.
From a business perspective, money spent on making non-spammy search engine optimizations can be much more effective than money spent on marketing or public relations.
--
Scientific calculator with hex, octal, decimal, and binary
There's a whole range. Some will tell you how to rewrite your web page so that search engines will classify it better. That seems legit. Others will try to sell you on "link farms" and other hacks to improve your ratings - not so legit. I've also seen spamming websites that have google-accessible logs with fake referrers, or spamming blogs like /. with links in your sig [place link here].
Intron: the portion of DNA which expresses nothing useful.
There seems to be a lot of weight put on web page freshness. I host a friend's site containing the collection of poems by Ella Wheeler Wilcox. She lived in the 1800s so one cannot expect to see any new material from her.
The site is mostly static but is rich with cultural value. It's currently the number one hit on Google. I'm hoping that Google's emphasis on "freshness" won't make his site fall in ranking.
The race isn't always to the swift... but that's the way to bet!
Notice that the author of the article is himself from an SEO firm: Rank your way to the bank. Clearly there is no conflict of interest here: he has no interest in making sites think they need to hire a new SEO to get around these "new" techniques... right... (the patent was filed in late 2003, IIRC)
Posting as AC to avoid the inevitable karma hit so here goes...
... just a good, search engine friendly site.
I'm a former SEO guy...I've worked with many companies, large and small, to optimize their websites. I've done everything from online pharmaceuticals to Christian mission trips. I've tried every trick in the book over a number of years...and I can tell you that as long as search engines exist, I seriously doubt the SEO companies will disappear.
Why? Simply put, people (and companies) do NOT understand how to present their content in a way that an automated bot can read and rank them.
Towards the end of my SEO consulting days, my advice was over and over again: Content is king. Build a good website, with good content, and make sure to include all the necessary elements to identify it as a good website.
Usually, that meant that I would go through their site, fill in missing pieces and recommend additional content. No schemes, no crazy link deals
So, keep in mind not ALL SEO guys are bad...just most of them...but companies will always need SEO guys to come in and fill in their site's holes.
In case you hadn't noticed, Google links are direct.
/ slashdot.org/~eluusive&ei=A_-AQubaOq2gYNujqccO
You sure about that? Try copying and pasting a Google results link.
For example, let's search Google for "elluusive". The first result was your slashdot "homepage", at http://slashdot.org/~eluusive, which at first glance seems to be a direct link. But if you right-click on the link, copy it, and paste it somewhere, you'll find something along these lines:
http://www.google.com/url?sa=U&start=1&q=http%3A/
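To make the redirect idea concrete, here is a small sketch of how a destination can be tucked into a query parameter and pulled back out. It assumes the destination travels in `q`, as the pasted link suggests; the `sa` and `start` values are simply copied from that link, and the helper names `wrap_in_redirect` and `extract_target` are made up for this example. It is not a description of how Google actually builds or logs these links.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wrap_in_redirect(target: str) -> str:
    """Build a redirect-style URL carrying the real destination in `q`.

    The parameter names mirror the pasted link; their meaning is assumed.
    """
    query = urlencode({"sa": "U", "start": "1", "q": target})
    return "http://www.google.com/url?" + query


def extract_target(redirect_url: str) -> str:
    """Recover the destination from the `q` parameter of a redirect URL."""
    params = parse_qs(urlsplit(redirect_url).query)
    return params["q"][0]


wrapped = wrap_in_redirect("http://slashdot.org/~eluusive")
print(wrapped)
print(extract_target(wrapped))
```

The point is only that the page you land on and the URL you actually clicked can differ, which gives the server in the middle a chance to record the click before forwarding you.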
I watched C-beams glitter in the dark near the Tannhauser gate.
Each link in the search results on Google has an onmousedown event attached.
If you have JavaScript enabled and click on it, then your browser will also execute the JavaScript, which sends a GET request to Google. They do log each link you click on.
Check the source of any Google search page.
The function that gets called for each onmousedown is called clk():
Puh-leeeeze! That trick became ineffective last century. It's very easy for the search engine to check background colors and FONT tags and penalize the page that uses text that is too close to the background color.
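For what it's worth, the check really is simple. Below is a minimal sketch, assuming colors arrive as six-digit hex strings and that "too close" means a small Euclidean distance in RGB space. The threshold of 30 and the function names are invented for illustration; no search engine's actual rule is known here.

```python
def hex_to_rgb(color: str) -> tuple:
    """Convert a six-digit hex color such as '#FFFFFF' to an (r, g, b) tuple."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colors in RGB space."""
    ra, ga, ba = hex_to_rgb(a)
    rb, gb, bb = hex_to_rgb(b)
    return ((ra - rb) ** 2 + (ga - gb) ** 2 + (ba - bb) ** 2) ** 0.5


def looks_hidden(text_color: str, background_color: str,
                 threshold: float = 30.0) -> bool:
    """Flag text whose color is nearly the same as the background.

    The threshold is an arbitrary illustrative value.
    """
    return color_distance(text_color, background_color) < threshold


print(looks_hidden("#FEFEFE", "#FFFFFF"))  # near-white text on white
print(looks_hidden("#000000", "#FFFFFF"))  # black text on white
```

A real crawler would also have to deal with named colors, stylesheets, and background images, none of which this sketch attempts.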
Doesn't work in slashdot because:
Avantslash: low-bandwidth mobile slashdot.
Yes, there is a shortage of *quality* porn on the web. When are these people going to learn that pigtails don't necessarily make you look young.
Have you already seen DOMAI? (NSFW)
reason defies logic