Google's Site Ranking Secrets
vivin writes "Ever wonder how Google's site ranking works? Wonder no more. Google recently filed United States Patent Application 20050071741 on March 31, 2005. This patent reveals a great deal of information about Google's site ranking algorithm and makes very good reading. For example, one of the criteria that they use is the number of years that your site has been registered. If your site has been registered for less than a year, then it counts against you. A site registered for a longer period of time means that the owner is probably serious about the site, and the site is probably legitimate. Google's Site Ranking algorithms reveal how hard they are making it for spam sites to get listed (on Google). This information will also make it easier for you to make sure that you get listed well in Google."
Note that there is no guarantee that Google uses everything in the patent or that they don't use other methods not described in any of their other patents.
or conversely how spam website can get higher :)
I prefer the official Google explanation:
http://www.google.com/technology/pigeonrank.html
Something is happening here but you don't know what it is, do you, Mr Jones.
'Google record the discovery of a link and link changes over time. The speed at which a site gains links and the link life span.' I fail to see how this would be helpful--if something's new and briefly popular, you only want to give it a high rank for a brief period and forget it once people stop linking. But if something's new and popular for a duration, you want to keep it well ranked.
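For what it's worth, the two measurements in that quote can be sketched in a few lines. This is only an illustration: the record layout, the function names, and the sample dates are all made up here, and nothing in it is taken from the patent.

```python
from datetime import date

# Hypothetical records: (linking_page, first_seen, last_seen).
# last_seen is None while the link is still live.

def links_gained_per_day(records, start, end):
    """Average number of new inbound links discovered per day in [start, end]."""
    days = (end - start).days + 1
    gained = sum(1 for _, first, _ in records if start <= first <= end)
    return gained / days

def mean_lifespan_days(records, today):
    """Average age of each link in days; live links are measured up to today."""
    if not records:
        return 0.0
    spans = [((last or today) - first).days for _, first, last in records]
    return sum(spans) / len(spans)

# Made-up sample data for illustration only.
sample = [
    ("a.example/page", date(2005, 1, 1), date(2005, 1, 11)),  # lived 10 days
    ("b.example/page", date(2005, 1, 2), None),               # still live
    ("c.example/page", date(2005, 1, 3), date(2005, 1, 8)),   # lived 5 days
]
```

A fad would show a high gain rate with short life spans, while something durably popular would show links that stay up. That may be the distinction the quote is after.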
Could someone explain how other crap search engines are getting high rankings in Google search?
Sometimes when I search for something specific, I get a bunch of useless links that have results of other "search engines" that invariably show something similar to "0 results for your search terms 'sheep+barn+slashbot+erotica'"
How do these sites get on the first page of Google results?
They've thrown every technique they could have thought of into the patent purely as a defensive mechanism to prevent other major engines from patenting them. Some of the techniques are thrown in as defensive FUD to prevent newbies from using them.
.. what do I know ..
Some of these techniques are just plain old bizarre and might be way too difficult to approach algorithmically.
Oh well
Step 2: Go 5 years into past, buy domain names, set up sites with lots of soft porn images
Step 3: Return to present, stopping off each year on the way to renew domains. Step 4: Sell to spammers etc.
Step 5: Profit.
I'm open to venture capitalists for investment in this one.
Panurge has posted for the last time. Thanks for the positive moderations.
Argh... quit trying to game the system! If you read the article, it's entirely from the perspective of someone trying to corrupt the rankings for financial gain. Here's an idea: make good, useful web pages, rather than spending all your time and energy creating these BS link farms. The SEO world is the modern-day equivalent of snake-oil salesmen.
I want to change. Please help me--I don't think I can do it on my own.
I always suspected this... When we started our business, we used the domain www.interakt.ro (we're from Romania). However, because we sell software tools mostly to the USA and Western Europe, we decided to move to www.interaktonline.com.
:D
Instantly, our ranking went from number one (for "Dreamweaver Php", for example, we were number one for a long time, ahead of Macromedia itself) to page 10.
Now we're working hard to promote our site and we have links all over the place, but our site still doesn't get back up to page 1 (search for "dreamweaver extensions" - we have to pay to get our site in the first position). I even thought that they do this on purpose so that we continue to pay for Google Ads.
They probably say it in the patent too, but the best ranking tool is to use the right "title" tag in your pages. It's remarkable how well this scores compared to the page content.
Alexandru
Nothing in the patent nullifies my pagerank defeating technique - put lots of links to my homepage in slashdot posts modded to +5 funny!
sheep.horse - does not contain information on sheep or horses.
The article dedicates only a couple of paragraphs to PageRank, the main algorithm that Google uses, and about 2.5 pages to the rest. If anyone wants to know more about PageRank, here's Page and Brin's original paper: http://www-db.stanford.edu/~backrub/google.html
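For anyone who doesn't want to read the whole paper, here is a minimal sketch of the formula it gives, PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), computed by simple iteration. The function name, the toy three-page graph, and the iteration count are my own assumptions. This is not Google's production code.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate the formula from the original paper.

    links maps each page to the list of pages it links to.
    d is the damping factor (the paper suggests 0.85).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_rank = {}
        for p in pages:
            # Each page q that links to p passes on rank[q] / C(q),
            # where C(q) is the number of outbound links on q.
            incoming = sum(
                rank[q] / len(links[q]) for q in links if p in links[q]
            )
            new_rank[p] = (1 - d) + d * incoming
        rank = new_rank
    return rank

# Toy graph, invented for illustration: A links to B and C, B to C, C to A.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy)
```

In the toy graph, C ends up ranked highest because both other pages link to it, and B lowest because it only gets half of A's vote.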
I hate the one hundred and twenty character limit for signatures with an all-enveloping, all-destroying, incredible pass
I really don't think proof-reading would have helped. The problem is much simpler--the author is an idiot.
Finance tutorials and more! Understandfinance
So that explains a lot. What a crappy article, I wonder if the submitter is the same as the Author?
This space is intentionally staring blankly at you
Umm, you spelled 'genius' wrong, genius.
Each claim in the patent can be invalidated independently. Most patents start off with an all-encompassing claim 1 that would almost certainly get invalidated if it went to court, and define subsequent claims more narrowly, often in terms of the preceding claims.
one of the criteria that they use is the number of years that your site has been registered
is not the same thing as (from the article):
How many years did you register your domain name for?
Though the summary suggests that older sites do better, the article is stating that, in order to improve one's Google ranking, domain owners should purchase longer domain registrations.
And another small note... Initially, we used an HTTP 301 (Moved Permanently) redirect from interakt.ro to interaktonline.com. This caused a MASSIVE degradation of our position, so right now we just do a transparent redirect from interakt.ro to interaktonline.com, without the Moved Permanently headers (and this is how we've reached page 2...)
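For reference, the status code for a permanent move is 301 (Moved Permanently). A minimal sketch of such a redirect using only Python's standard library looks like this. The handler name, the helper, and the port are assumptions for illustration; the post doesn't say what server software was actually used.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

NEW_ORIGIN = "http://www.interaktonline.com"  # new domain named in the post

def redirect_target(path):
    """Build the new-domain URL for a path requested on the old domain."""
    return NEW_ORIGIN + path

class PermanentRedirect(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a 301 pointing at the new domain."""

    def do_GET(self):
        # 301 tells browsers and crawlers that the move is permanent.
        self.send_response(HTTPStatus.MOVED_PERMANENTLY)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_HEAD = do_GET

def serve(port=8080):
    """Run the redirect server; port 8080 is an arbitrary choice."""
    HTTPServer(("", port), PermanentRedirect).serve_forever()
```

Calling `serve()` on the old host would send every request to the same path on the new domain.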
Alexandru
Some of the tactics detailed in the article require spyware (Google Toolbar?). Without it, it is not possible for Google to know when you came back to the search engine from your site or from another one (unless you have a link on your site to Google). It is also impossible for Google to know if you have a bookmark.
Google does have a click-through engine attached to the results, but many people already find this abusive, in addition to the single identifier cookie that Google pushes onto you.
We all think Google is doing a good job, and it did manage to incorporate ads and an ad service that is well accepted by people. (I wonder why people still think it is a good idea to make blinking and noisy Flash ads?) The point is: how much do we trust Google? I personally don't mind the click-through very much, but I do not accept the cookie and will not install a toolbar.
[]'s Victor Bogado da Silva Lins
^[:wq
The story is so old I can't believe it made it to slashdot.
Some more info on the subject:
1. U.S. Patent Application - it's best to read what's exactly been patented.
2. interesting discussion on webmasterworld
Personally I think that while some of the stuff is interesting, most of it is made up to confuse SEOs (Google doesn't quite like them, you know that, right?). Before that, they had a couple of factors to think about and work on. Now there's a shitload of stuff that just makes their work harder. Also, more factors influencing SERPs means it's much, much harder to do trial-and-error research on what works well and what doesn't.
Won't this information now make it easier for spam sites to get listed?
Their pagerank algorithm was one of the keys to their success. Keeping it secret was one of the things that made Google work and it was a good secret - nobody completely knew how it worked. So why patent it? What's the point?
This type of spam (showing a page to the crawler and another to the user) is called cloaking. Cloakers have anticipated this sort of move and can detect a search engine's crawler by not just the user agent but also the IP address range it comes from and other heuristics. In order to beat them, search engines would have to crawl from unpredictable IP addresses and behave like regular users.
A while back I proposed a distributed approach like this in the Nutch mailing list. The problem is that it would be hard to implement and it may not be worth the effort, since there are cheaper ways to fight spam.
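To illustrate the check itself (not the distributed crawling part), here is a rough sketch that compares the text served to a crawler with the text served to an ordinary visitor. The similarity measure, the 0.5 threshold, and the function names are assumptions of mine. Fetching the two versions from different IP addresses is left out entirely.

```python
def word_overlap(a, b):
    """Jaccard similarity of the word sets of two page texts (1.0 = same words)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def looks_cloaked(crawler_view, user_view, threshold=0.5):
    """Flag a page when the two versions share too few words."""
    return word_overlap(crawler_view, user_view) < threshold

# Made-up page texts for illustration.
seen_by_crawler = "garden tools lawnmowers hedge trimmers"
seen_by_user = "online poker casino bonus"
```

A real check would need to tolerate legitimate differences such as rotating ads or personalised content, so the threshold would have to be tuned.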
See charts for twitter trends on Trendistic
Just look at the patent application yourself.
I haven't read the whole thing, but just having taken a quick look at it, I have to agree with the posters who said that Google purposefully tried to cover any conceivable technique to index and rank pages. The application discusses multiple implementations of the various techniques that could be used to rank a page. Therefore analysis of the patent application is probably of limited utility for those trying to game PageRank (which was certainly a factor that Google's very competent IP lawyers considered before prosecuting the patent).
For those who are worried that Google is doing evil with this patent application, given the breadth of the patent and the fact that it discusses a plethora of techniques which Google may or may not be using, I will be surprised to see Google try to use this patent (or be able to use this patent) to push another search engine out of the market. More likely, I think, is that this will constitute prior art to enable Google to withstand challenges from other patent applicants for infringement. Of course, if you know anything about PageRank, you know that it was getting published in Scientific American long before Google was the dominant search engine. So this patent application is probably more to prevent allegations that Google infringed by adding on all the other checks and balances to the original PageRank technology to discourage spam sites.
Moiche
At the moment, the system is horribly abused, but the basic principle is a good one. I would be completely in favour of software patents if:
I am TheRaven on Soylent News
No, it's not that simple. Let's say I have a small business selling garden tools, lawnmowers, etc., in a certain region. And yet when I do a search on Google for garden tools + region, I am nowhere to be found. What do I do? I optimise the hell out of my site, caking it with region name + garden tools information, and I set up a links exchange program, getting in links left, right and centre from related sites. This is SEO, and it will only affect people that enter a search for "garden" "tools" "my region". In other words, those that actually want to find my site.
There's a distinction between SEO and spamming; if I were to optimise for a garden tools site and set up a poker site there, that would be spamming.
What he can't kill, he has sex on. Trent.
If any of you have worked in a small online shop you know what a fucking holy war this is between marketing and pretty much everyone else. I specifically remember saying at one point, "Do we have to make ALL of the money RIGHT NOW?"
Good for Google for coming forward and telling people they won't be a part of that slimy shit.
Bad for Google for saying all of this to drive up prices on their AdWord sales.
s'wut i sed.
Remember, this is a patent which requires no working model. In other words, this could be how Google *envisions* their search working as much as it identifies any of the things it does do.
I guess it will not help, since links from Slashdot have rel="nofollow", which makes them not count for ranking. This helps minimize the comment spam bots that run around the net. My site was hit by one of those two or three times.
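As a sketch of what that means for a crawler, the snippet below collects only the links that do not carry rel="nofollow", using Python's standard html.parser. The class name, the helper, and the sample markup are made up for illustration. How any real search engine treats these links internally is not something I can show.

```python
from html.parser import HTMLParser

class FollowedLinks(HTMLParser):
    """Collect href values from <a> tags that do not carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel can hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if attr.get("href") and "nofollow" not in rel_tokens:
            self.links.append(attr["href"])

def followed_links(html):
    """Return the hrefs a ranking crawler would still count."""
    parser = FollowedLinks()
    parser.feed(html)
    return parser.links

# Made-up comment markup for illustration.
comment = (
    '<a href="http://example.com/spam" rel="nofollow">cheap pills</a> '
    '<a href="http://example.org/about">about</a>'
)
```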
[]'s Victor Bogado da Silva Lins
^[:wq
OK, so there aren't that many sites like mine, let alone sites that update daily over a period of years and include their entire archive on the site that grows daily. On the other hand, to my knowledge from doing searches on Google, I have very few sites that link to mine, and I thought that counted highly with Google. So basically without trying to game the system, let alone advertise my site (other than incidentally in comments like this), I've been treated really well by Google.
In my case, it must be the longevity issue coupled with the scarcity of sites like mine. It sure ain't the links to my site.
...that whole pigeon thing was a joke? I can't believe it. Maybe this filing is just a way to divert our attention?
Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
I'm evil and want my small business competitor to drop in the rankings.
I set up a link-exchange farm and make sure he's listed prominently.
POOF he's branded a spammer.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Alrighty folks, you know the drill! Google filed a patent, ready pitchforks!!
"Derp de derp."
I posted a binary for mod_proxy_html at my site along with a how to on compiling it and was listed on Google's front page (currently number 5) within a week. It was a small project that a major aerospace company needed. They actually found my page through Google before we notified them it was there by e-mail.
Submitting the site to Google is a negative in their algorithm. Back when I had therabbithole.redback.inficad.com for my domain name Google found my site within a month.
You can't be successful in a vacuum. If you can't afford advertising and actually have a good site, then you join newsgroups and forums related to your site and become an active productive member. That's how my site got big initially. I linked to it in my sig on a major forum that I was active on.
Five years later I have a very large very diverse web-site and anything I post on it gets indexed (sometimes very highly) within a week. I'm currently one of the top results for Numa Numa Lyrics and Saaya Irie. It took less than a week for even Yahoo to put it at the number one result for the latter. It's since dropped a notch.
I think if you actually ran a site, you'd have a much better outlook on how Google and other major search engines operate. You don't have to spam anybody to get hits. You have to be proactive and useful. Oh yes, and patient.
Work Safe Porn
For example, let's search Google for "london hotels", a common search phrase. The first return is LondonNights.com. "Whois" returns "Worldview Ltd, 16 Marine Road West, Morecambe, LA3 1BS, Lancs, GREAT BRITAIN (UK)."
That's a UK company, so we look it up at Companies House, where we find "WORLDVIEW LIMITED, 16 MARINE ROAD WEST, MORECAMBE, LANCASHIRE LA3 1BS, Company No. 04588973". So we have a match on a registered company.
We check further with Dun and Bradstreet, which has a worldwide database of companies. We find "WORLDVIEW LTD 16 MARINE RD WEST MORECAMBE , UK Type of Location: single"
So they pass company validation, and we can get financial information about them.
Now let's try a domain that just appeared in a spam: "fleagroups.com". "Whois" gives us "Flea Market Groups. 126 73rd Ave N., Coral Springs, Florida 34992. US" So we go to Sunbiz, the Florida State Division of Corporations, and search. No "Flea Market Groups" under fictitious names. No match on address under anything beginning with "Flea". No "Flea Market Groups" under corporations, and no "Flea Market *" address matches.
Looking in Dun and Bradstreet, there are "Flea Market *" hits, but no exact match and no address match.
So they fail company validation. Add to probable spammer list, drop search engine ranking.
This is a reasonable test for any site that appears to be selling something.
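The walkthrough above boils down to: take the registrant from whois, look for the same company at the same address in an official registry, and pass or fail on the match. Here is a minimal sketch of that matching step. The in-memory registry list stands in for Companies House, Sunbiz, or Dun and Bradstreet lookups, and the function names, the postcode-only address match, and the "limited"/"ltd" alias are my own simplifying assumptions.

```python
# Treat common company-suffix spellings as equivalent (assumed list).
ALIASES = {"limited": "ltd"}

def normalize(text):
    """Lowercase, drop punctuation, and unify suffix spellings."""
    words = "".join(c if c.isalnum() else " " for c in text.lower()).split()
    return " ".join(ALIASES.get(w, w) for w in words)

def passes_company_validation(whois_name, whois_postcode, registry):
    """True if the registry lists the same company name at the same postcode."""
    wanted = (normalize(whois_name), normalize(whois_postcode))
    return any(
        (normalize(name), normalize(postcode)) == wanted
        for name, postcode in registry
    )

# Stand-in registry holding only the record quoted in the post.
registry = [("WORLDVIEW LIMITED", "LA3 1BS")]

good = passes_company_validation("Worldview Ltd", "LA3 1BS", registry)
bad = passes_company_validation("Flea Market Groups", "34992", registry)
```

With the post's data, `good` is True and `bad` is False. A real version would need fuzzier address matching and a lookup per jurisdiction.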
Kind of reminds me of a science fiction story I read as a kid... this engineer is walking down the road when he sees a guy peddling toy saucers based on anti-gravity devices. After watching the demonstration, he buys one and is taught the trick: a piece of black thread unobtrusively linked to a pulley, and the switch just powers some lights and sounds on the saucer. The engineer smiles and says it will make for a fun trick for the kids. The narrative then follows the vendor home, where he tells a man at a workbench that he sold 15 units that day and asks why the hell they are selling these saucers for $5 each when they cost $100 to make. The man at the workbench smiles and explains that somewhere out there, some bright individual is going to notice that operating the saucer without flipping the switch results in a broken thread. The inventor has never been able to get his device to output more than a small fraction of anti-gravity, but one day someone will figure out how to improve the process, whereupon he can leverage the patent he's got filed... ^_^ It was an amusing twist in the story to me.
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
That's not exactly what the article said. It said "How many years did you register your domain name for?" This is not a measure of how long the domain has been in existence, it is a measure of how many years you plan for it to exist. A spammer might register for just a year and then move on, but a serious business planning to build a reputation might register for 5 years or more. They are rewarding websites which are more committed to staying around, it has nothing to do with them being new.
The AC is saying they don't appreciate the line of work you are in. He/she believes it to have an overall negative impact on search engine sites. Now you can try and justify that you're not doing anything wrong by just providing a service to a paying client, but... that's not going to negate the accurate point AC is making. The clear point (that you still won't get or accept) is that SEO, spam, porn, etc. It's all gutter stuff. Leeches on society. You've chosen to be part of that.
So I get the following:
17779 eligible voters in a district, 17779 'vote' as one. This is Russia.
Google's Site Ranking algorithms reveal how hard they are making it for spam sites to get listed (on Google).
And provides a list of techniques for spam sites to use that guarantee them positions on every search engine but Google (in fact, if you use these techniques it's illegal for other search engines to penalize you for them).
This could be an especially evil technique for spammers.