Cracking the Google Code... Under the GoogleScope

On the minds of all slashdotters, by uberjoe · 2005-05-10 04:27 · Score: 5, Funny

So will this make it easier or harder to find porn?

--

The days of the digital watch are numbered.

Re:On the minds of all slashdotters, by rrkap · 2005-05-10 04:58 · Score: 4, Funny

So will this make it easier or harder to find porn?

Because there's a shortage of porn on the web?

--
I like my beverages with warning labels!
Re:On the minds of all slashdotters, by Fear the Clam · 2005-05-10 04:59 · Score: 2, Funny

Everything to do with porn.
eBay.com
Re:On the minds of all slashdotters, by uberjoe · 2005-05-10 05:33 · Score: 5, Funny

Yes, there is a shortage of *quality* porn on the web. When are these people going to learn that pigtails don't necessarily make you look young.

--
The days of the digital watch are numbered.
Re:On the minds of all slashdotters, by WormholeFiend · 2005-05-10 05:56 · Score: 3, Funny

Yes, there is a shortage of *quality* porn on the web. When are these people going to learn that pigtails don't necessarily make you look young

bingo. therefore, the question he should've asked is: will the pron I find make me harder?
Re:On the minds of all slashdotters, by Gamasta · 2005-05-10 10:51 · Score: 3, Informative

Yes, there is a shortage of *quality* porn on the web. When are these people going to learn that pigtails don't necessarily make you look young.
Have you already seen DOMAI? (NSFW)

--
reason defies logic

Great by future assassin · 2005-05-10 04:27 · Score: 2, Interesting

Now I'll see more "Get ranked #1 in search engines" spam.

http://www.anologger.com/

--
by TheSpoom (715771) Uncaring Linux user here. I have nothing to add to this but please continue. *munches popcorn*

Re:Great by chennes · 2005-05-10 05:55 · Score: 2, Informative

Notice that the author of the article is from an SEO himself: Rank your way to the bank. Clearly there is no conflict of interest here: he has no interest in making sites think they need to hire a new SEO to get around these "new" techniques... right... (the patent was filed in late 2003, IIRC)

Google what is best in life by kensai · 2005-05-10 04:27 · Score: 5, Funny

To crush artificial link inflation and hear the lamentations of search engine spam

it's a war by roman_mir · 2005-05-10 04:28 · Score: 4, Funny

The linked article is slashgoogled. It's a googlewar. Googlers are all googling.

--
You can't handle the truth.

Re:it's a war by baggins2002 · 2005-05-10 04:37 · Score: 2, Funny

If you check, you find they are running an IIS 6 server. I guess the 2GB of memory and quad CPU on MS platform couldn't handle the 15 of us that went to look at this article.
Maybe we could find the cache of the article on Google.

in case of slashdotting, article text by Anonymous Coward · 2005-05-10 04:28 · Score: 5, Informative

Cracking the Google Code... Under the GoogleScope
Google's US Patent confirms information retrieval is based on historical data.

Publication Date: 5/8/2005 9:51:18 PM

Author Name: Lawrence Deon

An Introduction: ...if you thought you cracked the Google Code and had Google all figured out ... guess again.

Google's sweeping changes confirm the search giant has launched a full out assault against artificial link inflation & declared war against search engine spam in a continuing effort to provide the best search service in the world... and if you thought you cracked the Google Code and had Google all figured out ... guess again.

Google has raised the bar against search engine spam and artificial link inflation to unrivaled heights with the filing of a United States Patent Application 20050071741 on March 31, 2005.

The filing unquestionably provides SEOs with valuable insight into Google's tightly guarded search intelligence and confirms that Google's information retrieval is based on historical data.

What exactly do these changes mean to you?
Your credibility and reputation on-line are going under the Googlescope! Google has defined their patent abstract as follows:

"A system identifies a document and obtains one or more types of history data associated with the document. The system may generate a score for the document based, at least in part, on the one or more types of history data."

Google's patent specification reveals a significant amount of information both old and new about the possible ways Google can (and likely does) use your web page updates to determine the ranking of your site in the SERPs.

Unfortunately, the patent filing does not prioritize or conclusively confirm any specific method one way or the other.

Here's how Google scores your web pages.

In addition to evaluating and scoring web page content, the ranking of web pages is admittedly still influenced by the frequency of page or site updates.
What's new and interesting is what Google takes into account in determining the freshness of a web page.

For example, if a stale page continues to procure incoming links, it will still be considered fresh, even if the page header (Last-Modified: tells when the file was most recently modified) hasn't changed and the content is not updated or 'stale'.

According to their patent filing Google records and scores the following web page changes to determine freshness.
- The frequency of all web page changes
- The actual amount of the change itself... whether it is a substantial change or a redundant or superfluous one
- Changes in keyword distribution or density
- The actual number of new web pages that link to a web page
- The change or update of anchor text (the text that is used to link to a web page)
- The number of new links to low trust web sites (for example, a domain may be considered low trust for having too many affiliate links on one web page).
Although there is no specific number of links indicated in the patent it might be advisable to limit affiliate links on new web pages. Caution should also be used in linking to pages with multiple affiliate links.
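
To make the list of signals above concrete, here is a toy sketch of how such history data could be folded into one number. Everything in it is an assumption for illustration: the field names, the weights, and the formula are invented, and the patent filing does not disclose or prioritize any of them.

```python
from dataclasses import dataclass


@dataclass
class PageHistory:
    """Hypothetical per-page history signals, loosely named after the list above."""
    change_frequency: float   # content changes per month
    change_magnitude: float   # 0.0 (superfluous) to 1.0 (substantial)
    new_inbound_links: int    # new pages linking to this page
    new_low_trust_links: int  # new links pointing at low trust sites


def freshness_score(h: PageHistory) -> float:
    """Combine the signals with made-up weights; higher means fresher."""
    score = min(h.change_frequency, 10.0) * h.change_magnitude
    score += 0.5 * h.new_inbound_links      # a stale page that keeps gaining links stays fresh
    score -= 2.0 * h.new_low_trust_links    # assumed penalty, not taken from the patent
    return max(score, 0.0)


stale_but_linked = PageHistory(0.0, 0.0, 12, 0)
churning_affiliate = PageHistory(8.0, 0.1, 1, 3)
print(freshness_score(stale_but_linked))
print(freshness_score(churning_affiliate))
```

With these invented weights, the unchanged page that keeps attracting links scores higher than the page that churns trivially while adding low trust links, which matches the behaviour the article describes.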

Developing your web page augments for page freshness.

Now I'm not suggesting that it's always beneficial or advisable to change the content of your web pages regularly, but it is very important to keep your pages fresh regularly and that may not necessarily mean a content change.

Google states that decayed or stale results might be desirable for information that doesn't necessarily need updating, while fresh content is good for results that require it.

How do you unravel that statement and differentiate between the two types of content?

An excellent example of this methodology is the roller coaster ride seasonal results might experience in Google's SERPs based on the actual season of the year.

A page related to winter clothin

Unintended side effects of the Google arms race by 14erCleaner · 2005-05-10 04:28 · Score: 5, Interesting

It just occurred to me that, as Google changes its algorithms, it'll just create more business for the Search Engine Optimization consultant. When web sites drop in the Google rankings, they'll want to make changes to move back up, and will hire the SEO again to do so.

--
Have you read my blog lately?

Re:Unintended side effects of the Google arms race by rm999 · 2005-05-10 04:37 · Score: 4, Insightful

Perhaps, or perhaps if Google changes its rankings enough, the SEOs' credibility will be destroyed (they will be seen as temporary and overpriced fixes)
Re:Unintended side effects of the Google arms race by AKAImBatman · 2005-05-10 04:38 · Score: 5, Interesting
Here's a thought: How about companies try to offer useful services rather than "optimize" their search engine results? I've gotten several top hits on Google by the complete accident of providing useful services or information in the past. Traditional advertising such as adclicks and dmoz listings also help. Not once have I wasted my time trying to game the system.

Companies need to start realizing that making money is about providing what customers want. Advertising is a great way of getting your name out, but only a good product or service will actually carry through. So in that frame of thinking, I highly recommend that companies:
- Stop looking at "cost cutting" by reduction, and start looking at "using existing resources to provide relevant products"
- Start hiring employees who know what they're doing and listen to them
- Stop wasting your money on search engine optimizations.
- Be good to the customer, and the customer will be good to you. If you don't know why people are upset or unhappy, grab a couple off the street and ask.
--
Javascript + Nintendo DSi = DSiCade
Re:Unintended side effects of the Google arms race by Anonymous Coward · 2005-05-10 04:41 · Score: 2, Insightful

There will always be >=11 sites wanting to be in the Top10
Re:Unintended side effects of the Google arms race by 14erCleaner · 2005-05-10 05:04 · Score: 3, Interesting

Perhaps, or perhaps if Google changes its rankings enough, the SEOs' credibilities will be destroyed
That would be great. Now that I've read TFA, it looks like Google's techniques go a long way toward eliminating the fakery done by SEOs currently.

As an aside, the article looks like it was written by an SEO consultant, as it contains a lot of advice about how to get good rankings under Google's patented approach. Interestingly, the recommended actions are mostly legitimate (offer interesting content, update regularly, don't try to create fake links to your site), but there are also some less-upfront techniques (make link-exchange deals with other sites and encourage bookmarking, for example).

--
Have you read my blog lately?
Re:Unintended side effects of the Google arms race by MrNiceguy_KS · 2005-05-10 05:05 · Score: 5, Funny

If you don't know why people are upset or unhappy, grab a couple off the street and ask.
I'm unhappy because I was grabbed off the street. May I go now?

Please?

--
Redundancy is good And also good.
Re:Unintended side effects of the Google arms race by baggins2002 · 2005-05-10 05:10 · Score: 2, Insightful

Companies need to start realizing that making money is about providing what customers want. Advertising is a great way of getting your name out, but only a good product or service will actually carry through. So in that frame of thinking, I highly recommend that companies:

Uhh, which world are you living in? Most companies have found that bigger profits can be made by convincing people that they want what the company has. And most customers find it easier to buy what they are told to buy.
I like your world, but it's not the one I've been living in.
Re:Unintended side effects of the Google arms race by DeadSea · 2005-05-10 05:11 · Score: 2, Informative
How about sites that already provide a useful service and want to get as much exposure as possible? I can't count the number of useful sites that I've visited that are not ranked as well on Google as I would like (so I can find them more easily) because they do non-Google-friendly things like:
- Session IDs in urls
- Doorway pages
- Content that expires or changes urls
- Javascript navigation
Sometimes search engine optimization isn't about making a hack site rank well. Sometimes it is about getting the traffic that a really nifty site deserves.
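
As an illustration of the first item in that list, here is a minimal sketch of stripping session IDs from a URL so that every visitor (and crawler) sees the same address for the same page. The parameter names in SESSION_PARAMS and the helper name canonical_url are assumptions made for the example; real sites use many variants.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names, chosen for illustration only.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def canonical_url(url: str) -> str:
    """Return the URL with session-like query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(canonical_url("http://example.com/item?id=42&PHPSESSID=abc123"))
# prints http://example.com/item?id=42
```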

In fact, I wish all the legit sites did everything they should morally do in terms of SEO. Then the spam sites wouldn't have such an easy time pushing them out of the way.

From a business perspective, money spent on making non-spammy search engine optimizations can be much more effective than money spent on marketing or public relations.

--
Scientific calculator with hex, octal, decimal, and binary
Re:Unintended side effects of the Google arms race by AKAImBatman · 2005-05-10 05:18 · Score: 4, Insightful

Sometimes search engine optimization isn't about making a hack site rank well. Sometimes it is about getting the traffic that a really nifty site deserves.

Actually, pretty much everything you list falls under the issue of usability. Many of those options have lower usability for the user, and thus the search engine by extension.

These companies don't need an SEO, they need to find a web designer that doesn't use Macromedia "tools".

--
Javascript + Nintendo DSi = DSiCade
Re:Unintended side effects of the Google arms race by Doctor O · 2005-05-10 09:47 · Score: 3, Insightful

Being a professional webworker for more than 8 years now, I agree with you from experience, but actually I don't think you can blame Macromedia.

I will not say anything at all about Flash because two camps who BOTH don't get it will start the usual pointless discussion. Flash is rarely used for what it's great at, visualizing data, and plagues us with wildly unnecessary and annoying l33t-masturbation stuff instead.

Dreamweaver itself is indeed a powerful timesaver in the hands of an experienced XHTML/CSS guy. If you look at it closely, you'll find that it is a very nice graphical frontend to HTML itself, with a great set of shortcuts so that you almost don't have to touch the mouse at all. The palettes just provide access to the most commonly needed attributes of the element you're working on. If you leave all those nasty "behaviours", "timelines" and whatnot alone, it produces nicely readable and well-formed code. I've been using Dreamweaver since the early betas, and even back then this was the case. I tend to think that this was an initial design goal behind DW.

The bad comes from the 'designers' who are taught print design at the universities and apply it to the Web, using all the nutty clicky-pointy tools that produce a JS-laden horror cabinet of non-standards-compliance they dare to call "HTML". It's a classic PEBKAC. Look at it this way - if DW didn't have those features, GoLive would've taken over long ago and we don't want THIS to happen. IMNSHO the only thing worse would be Frontpage. At least the guys at Macromedia didn't invent bogus HTML extensions because they were incapable of providing a proper metadata infrastructure, like Adobe did.

(I'm not a fanboy though, I just use what works best at the moment for the things I do. If someone shows me how to reproduce this "Apply Source Formatting" feature from DW in Kate/KDevelop and how to synchronize sites like in DW, I'm switching my machine at work from Win2K with DW to KDevelop/nvu on FreeBSD tomorrow, because it better fits the things I do nowadays. It will then match my setup at home.)

While we're at it, SEO is, was and always will be BS, just like the whole Internet Advertising Myth which after nearly a decade of documented failure still isn't debunked. Duh.

--
Who is General Failure and why is he reading my hard disk?

After link analysis by Ars-Fartsica · 2005-05-10 04:30 · Score: 4, Interesting

It's obvious Google and Yahoo are moving on to trust-based (or perceived trust) ranking for sites based on what they see users clicking on through the web accelerator, Yahoo's MyWeb, etc. Hopefully this will help grade down the obvious spam...although you only find out it's spam by going to the page...we'll see.

Re:After link analysis by JVert · 2005-05-10 04:34 · Score: 3, Insightful

Doesn't seem like the best solution. This would work if you started from a clean slate but spam pages are still out there and are being clicked on. Not much you can do about that, I just hope it's not something silly like how much time you spend on a page. If I find a page that quickly answers my question or at least answers part of my question and I click back for other links I'd hate to think that that site would be marked as "spam".

Yes by Anonymous Coward · 2005-05-10 04:30 · Score: 5, Funny

But when I search on Tiger, a mail-order company's site still comes up above Apple's. Is anyone at Google listening?

Re:Yes by AKAImBatman · 2005-05-10 04:41 · Score: 4, Insightful

Truthfully? The top results for "Tiger" should be furry creatures that eat meat and perform in Las Vegas.

--
Javascript + Nintendo DSi = DSiCade
Re:Yes by 99BottlesOfBeerInMyF · 2005-05-10 05:06 · Score: 2, Interesting

Interestingly enough, the top results for "tiger" are a page about tigers, tiger direct, and the Apple page. These seem pretty reasonable to me. The OS is obviously something a lot of people are going to be looking for, but I'd still find it weird if real tigers were not the first link. For "panther" the results are Apple's page, then some pages on real panthers. For "jaguar" you get the car manufacturer, Apple, then real panthers. I wonder what will happen if you do a search on "tiger" a year from now.

resistance is futile by roman_mir · 2005-05-10 04:30 · Score: 4, Funny

you will be googleated. Or googleaten. Whichever.
Borgle.

--
You can't handle the truth.

Re:resistance is futile by WWWWolf · 2005-05-10 04:48 · Score: 3, Funny

you will be googleated. Or googleaten. Whichever. Borgle.

So that's what it sounds like when the Borg have problems digesting the food... and I always thought they just recharged or whatever... well, I guess they can't always adapt fast enough.

("Captain, may I suggest remodulating the food replicator inputs?")

Re:This is under YRO? by DrEldarion · 2005-05-10 04:31 · Score: 3, Insightful

It's about your right to not see search results filled with complete crap.

Is it the general opinion of the public... by wcitech · 2005-05-10 04:31 · Score: 3, Interesting

...that google is still a "not evil" company? This proxy "web-accelerator" thing really still has me freaked out. Am I just paranoid or is there legitimate reason for concern?

Re:Is it the general opinion of the public... by DrinkingIllini · 2005-05-10 04:35 · Score: 3, Insightful

Of course there is reason for concern, any company gets too big and powerful they become evil. Wal-mart, Microsoft, Disney, Intel, Lucasfilm, they're all evil, and I'm sure they didn't set out to become that way, it's just the power of the dark side. Power corrupts, it's the nature of the beast.

Take the article with a grain of salt... by nganju · 2005-05-10 04:31 · Score: 5, Insightful

The article is not written by a Google employee, nor did the author speak with anyone at Google. It's simply his analysis of the patent document filed by Google.

Also, at the bottom of the article after the author's name, there's a link to some search optimization service's website.

--
There are 2 kinds of people in this world. Those that can keep their train of thought,

Six weeks to fix? by Anonymous Coward · 2005-05-10 04:35 · Score: 2, Informative

I use google quite a bit to check on recent spyware/malware (used it this morning) and with all due respect, the first few links typically are for spyware products that don't work, domain parking sites (search engines themselves), requiring some amount of diligence to get to the "real" sites that have information.

If this claim is true, I guess we'll have to wait the typical "four to six weeks for delivery."

GoogleBombs Away by Doc Ruby · 2005-05-10 04:35 · Score: 4, Funny

The "war" metaphor really is cute. Geeky competition in search relevance is really a lot like bombing cities, shooting ranks of soldiers, and destroying bridges and railways. Burnt, bloody bodies everywhere! And clean datacenters with mathematical algorithms.

--
make install -not war

effect on search engine optimizers by nemexi · 2005-05-10 04:36 · Score: 5, Informative

One of the most interesting (and obvious) effects of Google's changes: The company which once ranked first for the phrase "search engine optimization", SEOinc, is now nowhere to be found -- even a search for the company's name doesn't bring up the company's website. SEOinc's response has been a -- somewhat ineffective -- attempt to get those reporting on its fall to "cease and desist".

Article text and Google cache link by RealProgrammer · 2005-05-10 04:37 · Score: 3, Informative

I think this is the same article: google:www.coder.com
Google United - Google Patent Examined

Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under their control.

Publication Date: 4/7/2005 7:41:24 AM

By Jim Hedger, StepForth News Editor, StepForth Placement Inc.

Thoughts on Google's patent... "Information retrieval based on historical data."

Google's newest patent application is lengthy. It is interesting in some places and enigmatic in others. Less colourful than most end user license agreements, the patent covers an enormous range of ranking analysis techniques Google wants to ensure are kept under their control. Some of the ideas and concepts covered in the document are almost certainly worked into the current algorithm running Google. Some are being worked in as this article is being written. Some may never see the blue-light of electrons but are pretty good ideas so it might have been considered wise to patent them. Google's not saying which is which. While not exactly War and Peace, it's a pretty complex document that gives readers a glimpse inside the minds of Google engineers. What it doesn't give is a 100% clear overview of how Google operates now and how the various ideas covered in the patent application will be integrated into Google's algorithms. One interesting section seems to confirm what SEOs have been saying for almost a year, Google does have a "sandbox" where it stores new links or sites for about a month before evaluation.

Google is in the midst of sweeping changes to the way it operates as a search engine. As a matter of fact, it isn't really a search engine in the fine sense of the word anymore. It isn't really a portal either. It is more of an institution, the ultimate private-public partnership. Calling itself a media-company, Google is now a multi-faceted information and multi-media delivery system that is accessed primarily through its well-known interface found at www.google.com.

Google is known for its from-the-hip style of innovation. While the face is familiar, the brains behind it are growing and changing rapidly. Four major factors (technology, revenue, user demand and competition) influence and drive these changes. Where Microsoft dithers and .dll's over its software for years before introduction, Google encourages its staff to spend up to 20% of their time tripping their way up the stairs of invention. Sometimes they produce ideas that didn't work out as they expected, as was the case with Orkut, and sometimes they produce spectacular results as with Google News. The sum total of what works and what doesn't work has served to inform Google what its users want in a search engine. After all, where the users go, the advertising dollars must follow. Such is the way of the Internet.

In its recent SEC filing, the first it has produced since going public in August 2004, Google said it was going to spend a lot of money to continue outpacing its rivals. This year they figure they will spend about $500 million to develop or enhance newer technologies. In 2004 and 2003, Google spent $319 million and $177 million respectively. The increase in innovation-spending corresponds with a doubling of Google's staff headcount which has jumped from 1628 employees in 2003 to 3021 by the end of 2004.

Over the past five years Google has produced a number of features that have proven popular enough to be included among its public-search offerings. On their front page, these features include Image Search, Google Groups, Google News, Froogle, Google Local, and Google Desktop. There are dozens of other features which can be accessed by cli

--
sigs, as if you care.

SEO by Anonymous Coward · 2005-05-10 04:37 · Score: 2, Interesting

What do those guys actually *do* in any case? I mean, legitimately. I guess you can tweak things a bit, but... how much does that actually get you if you simply aren't a popular site?

Re:SEO by Intron · 2005-05-10 05:17 · Score: 3, Informative

There's a whole range. Some will tell you how to rewrite your web page so that search engines will classify it better. That seems legit. Others will try to sell you on "link farms" and other hacks to improve your ratings - not so legit. I've also seen spamming websites that have google-accessible logs with fake referrers, or spamming blogs like /. with links in your sig [place link here].

--
Intron: the portion of DNA which expresses nothing useful.
Re:SEO by Anonymous Coward · 2005-05-10 05:49 · Score: 2, Insightful

There is an art to SEO. Some of us employ spamming techniques that will force a website to the top of the list for a short period of time, and then become banned. To some people, this is desirable - such as when you know your product has a short lifespan.

Others like myself try to help businesses retool their websites to be search engine friendly. A lot of smaller businesses out there have websites that have every bit of info on everything they do on every page; that's bad. We show them how to break it into logical pieces, present it to the end user in a manner they will respond favorably to, AND build the site in a manner which will get crawled efficiently.

True SEO has two sides to it, the Optimization side and the Search side. You have to understand how your demographic searches for things. If you are selling women's jewelry online, you will build the site (SEO wise) differently than you would a site that sells lab equipment. There are cultural differences in how these demographics search for things, and differences in the lifecycle of a sale to them. Some web developers can create sites that are easy to navigate and look great, yet they forget who they are targeting. Their content may be relevant, but it wouldn't spur Google to refer to it by the terms that the target would search with. Good SEO is about building a website with relevant content in the context that the target uses, not your perception of how it should be used.

I don't try to manipulate Google into thinking my websites are the authority on any subject. I try to build my sites to speak to the target demographic. When done properly, this drives traffic to your site because you knew what your target audience was going to put into the search box. Which, BTW, is what Google WANTS. They don't want page spam that will artificially inflate a page's ranking and dilute the accuracy of their product. They want to be able to determine what is relevant by the traffic a site gets, how many people link to it, and how often it is updated. For heavily used terms, there are some technical tricks that are employed to increase your ranking, but nothing outside what a good designer should want to do to bring attention to the most relevant content on a page.

The truth of the matter is, nothing on the internet is unpopular. If Furries exist, everything has a place. But speaking a demographic's language on the web is difficult, and quite often outside the scope of a web developer/Copy writer.
Re:SEO by hankwang · 2005-05-10 09:38 · Score: 3, Informative
spamming blogs like /. with links in your sig [place link here].
Doesn't work in slashdot because:
- Sigs are only visible for logged-in users (i.e. not for robots)
- Posts without a karma bonus have the REL=NOFOLLOW attribute in the links, so that they don't count for Google.
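
To illustrate the second point, here is a rough sketch of how a crawler could skip links marked rel="nofollow" when collecting links to count. The class and function names are made up for the example, and this is only an assumption about how such filtering might look, not a description of what Google's crawler actually does.

```python
from html.parser import HTMLParser


class FollowedLinks(HTMLParser):
    """Collect hrefs from <a> tags that do not carry rel="nofollow"."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens; compare case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if attr.get("href") and "nofollow" not in rel_tokens:
            self.links.append(attr["href"])


def followed_links(html: str) -> list:
    """Return the link targets a nofollow-respecting crawler would count."""
    parser = FollowedLinks()
    parser.feed(html)
    parser.close()
    return parser.links


page = ('<a href="http://a.example/">counted</a> '
        '<a REL=NOFOLLOW href="http://b.example/">ignored</a>')
print(followed_links(page))
# prints ['http://a.example/']
```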
--
Avantslash: low-bandwidth mobile slashdot.

Frequency of changes by Veinor · 2005-05-10 04:41 · Score: 4, Insightful

Almost any algorithm can be spoofed fairly easily: inserting very small text that's the same color as the background. Then whenever they want Google to think they've updated, they change the text. The viewer doesn't tell the difference, but the source code changes. Or they could just use comments in Javascript, or just create Javascript that never gets used.

Also, a page with frames might get penalized since its content doesn't change, although the content of the frames may change frequently.

Re:Frequency of changes by killtherat · 2005-05-10 05:29 · Score: 2, Interesting

What if Google starts to use a filter designed to eliminate the effect of text that is deemed 'unviewable'? Just check to see if the text color is the same as the background; if it is, ignore it.

I thought of that in less than 30 seconds, what are the odds Google has already thought about it?
Re:Frequency of changes by Tsu Dho Nimh · 2005-05-10 07:20 · Score: 2, Informative

"inserting very small text that's the same color as the background"
Puh-leeeeze! That trick became ineffective last century. It's very easy for the search engine to check background colors and FONT tags and penalize the page that uses text that is too close to the background color.
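
The color check described in this subthread can be sketched in a few lines. This is a simplified illustration, not how any search engine is known to do it: the per-channel distance measure, the threshold of 48, and the function names are all assumptions, and a real filter would also have to deal with CSS, images, and tiny font sizes.

```python
def parse_hex(color: str) -> tuple:
    """Parse a '#rrggbb' color string into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def looks_hidden(text_color: str, background: str, threshold: int = 48) -> bool:
    """Flag text whose color is too close to the background color.

    Distance is the sum of absolute per-channel differences; the
    threshold is an arbitrary value picked for this sketch.
    """
    distance = sum(
        abs(a - b) for a, b in zip(parse_hex(text_color), parse_hex(background))
    )
    return distance < threshold


print(looks_hidden("#ffffff", "#fefefe"))  # near-white on white
print(looks_hidden("#000000", "#ffffff"))  # black on white
```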

Coral cache link by dagnabit · 2005-05-10 04:43 · Score: 2, Funny

Original server is /.ed. Coral cache link here.

Re:This is under YRO? by null etc. · 2005-05-10 04:45 · Score: 2, Funny

It's about your right to not see search results filled with complete crap.

That's an interesting interpretation. Here's a review of today's submissions, translated to your perspective:

Broadway Awards Spam is about your rights to watch Spamalot, nominated for 14 Tony awards.

IT: More on Last Years Cisco Source Code Theft is about your rights to read about a theft of proprietary source code.

IT: What Does a Spreading Worm Look Like? is about your rights to visualize what a spreading worm looks like.

Games: Gameboy Emulator Released for PSP is about your rights to play Gameboy games on a PSP.

Newest Star Wars Reviews Suprisingly Positive is about your rights to be surprised that Lucas Finally Got It Right(tm).

Wow, with your new scheme, we can get rid of all other topics!

Re:This is under YRO? by ShaniaTwain · 2005-05-10 04:46 · Score: 4, Funny

"You have the right to search in silence. If you give up the right to search in silence, anything you say can and will be modded down in a court of public opinion. You have the right to be listed on google. If you desire a seo and cannot afford one, you will spare us all a lot of unwanted search engine spam, and a metamod will be obtained for you before final moderation."

--
Starsucks

Re:Is it the case.. by CynicalGuy · 2005-05-10 04:52 · Score: 2, Insightful

Is it the case that Google's search dominance is a direct result of it clinging onto a patent for PageRank ?

Their search dominance is a direct result of PageRank. That they have a patent on it prevents other companies from copying the idea or hiring their employees away (Microsoft is notorious for doing both these things). So yes, the patent is important.

Sorry kids, but patents and "Do no evil" are mutually incompatible concepts.

You're retarded if you think that.

Google's crackdown is coming by Animats · 2005-05-10 04:53 · Score: 3, Insightful

It's clear that Google is gearing up for a crackdown on search engine spamming. They've already started to kill off "link farms". They're checking spam blacklists. And they're not stopping there.

Note that Google is now looking at domain ownership information. This may result in a much lower level of bogus information in domain registrations. It's probably a good idea to make sure that your domain registration information, business license, D&B rating, on-site contact info, and SSL certificates all match.

"Domain cloaking" will probably mean that you don't appear anywhere near the top in Google. So that's on the way out.

Re:A reason why *not* to use .NET? by Valar · 2005-05-10 05:00 · Score: 2, Insightful

Well, considering that it isn't .NET's fault that they didn't properly implement exception handling I would say no. Also, combine this with the fact that that exception is caused simply by a server overload and you get a total nonissue.

--

====
Crudely Drawn Games

Google's Click History Asset by 4of12 · 2005-05-10 05:07 · Score: 4, Insightful

Google has millions upon millions of click-history records on their search results that say what people are really looking for, as well as which results looked like good fodder for a first click.

No one else has such a large database of what humans have actually picked.

Such a click history and search term history asset is worth even more if it gets correlated with Evil Direct Marketing information from the cookie traders.

Although, it seems possible that large ISPs could also grab and analyze their members' Google interactions to figure out people's tastes, assuming such interactions remain unencrypted.

I have to wonder how many companies with static IP addresses have, unbeknownst to them, built up extensive history logs at Google showing their search term preferences and click selections. If I were a technology startup with a hot idea to research I'd be a little more paranoid about something like that.

--
"Provided by the management for your protection."

Re:Google's Click History Asset by eluusive · 2005-05-10 05:26 · Score: 4, Interesting

Click history? In case you hadn't noticed, Google links are direct. There's no link to a Google page that redirects. So, then, by what method do they obtain this mystical click information on me?
Re:Google's Click History Asset by daeley · 2005-05-10 06:39 · Score: 3, Informative

In case you hadn't noticed, Google links are direct.

You sure about that? Try copying and pasting a Google results link.

For example, let's search Google for "elluusive". The first result was your slashdot "homepage", at http://slashdot.org/~eluusive, which at first glance seems to be a direct link. But if you right-click on the link, copy it, and paste it somewhere, you'll find something along these lines:

http://www.google.com/url?sa=U&start=1&q=http%3A//slashdot.org/~eluusive&ei=A_-AQubaOq2gYNujqccO
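
For anyone who wants to see what such a redirect link carries, here is a minimal Python sketch that pulls the destination back out of the q parameter. The function name target_of is made up for illustration, and the sample link is modeled on the one quoted above; none of this is Google's own code.

```python
from urllib.parse import urlsplit, parse_qs


def target_of(redirect_url: str) -> str:
    """Return the destination stored in the `q` parameter of a /url link.

    `target_of` is a hypothetical helper name, not part of any Google API.
    """
    query = urlsplit(redirect_url).query
    params = parse_qs(query)  # also undoes the %3A style percent-encoding
    return params["q"][0]


# Sample link modeled on the one in the comment above.
link = (
    "http://www.google.com/url?sa=U&start=1"
    "&q=http%3A//slashdot.org/~eluusive&ei=A_-AQubaOq2gYNujqccO"
)
print(target_of(link))
```

The other parameters (sa, start, ei) are left untouched here because the thread does not say what they mean.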

--
I watched C-beams glitter in the dark near the Tannhauser gate.
Re:Google's Click History Asset by Fëanáro · 2005-05-10 06:59 · Score: 4, Informative

With onmousedown events.

Each link in the search results on Google has an onmousedown event attached.

If you have JavaScript enabled and click on it, your browser will also execute the JavaScript, which sends a GET request to Google. They do log each link you click on.

Check the source of any Google search page.
The function that gets called for each onmousedown is called clk():
function clk(el,ct,cd){if(document.images){(new Image()).src="/url?sa=T&ct="+escape(ct)+"&cd="+escape(cd)+"&url="+escape(el.href).replace(/\+/g,"%2B")+"&ei=gwKBQoX7GJKmQcONmN4B";}return true;}
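
To make the shape of that request easier to read, here is a rough Python rendering of the URL that clk() asks the browser to fetch. It is only an approximation: click_beacon and JS_ESCAPE_SAFE are hypothetical names, the ei value is a placeholder, and Python's quote() does not match JavaScript's escape() exactly (for instance, they differ on "~" and on non-ASCII characters).

```python
from urllib.parse import quote

# Characters that JavaScript's escape() leaves alone besides letters and digits.
# Assumption: this is close enough for plain ASCII URLs like the ones shown here.
JS_ESCAPE_SAFE = "@*_+-./"


def click_beacon(href, ct, cd, ei="PLACEHOLDER"):
    """Build the path that the clk() snippet above would request (approximate)."""

    def esc(value):
        return quote(str(value), safe=JS_ESCAPE_SAFE)

    # clk() additionally rewrites any literal "+" in the link as %2B.
    url = esc(href).replace("+", "%2B")
    return f"/url?sa=T&ct={esc(ct)}&cd={esc(cd)}&url={url}&ei={ei}"


print(click_beacon("http://www.ets.org/toefl/", "res", 1))
```

The point is just that the clicked link, the result type and the result position all travel back to the server as query parameters of an image request.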
Re:Google's Click History Asset by Juergen+Kreileder · 2005-05-10 07:12 · Score: 2, Insightful

Exactly. I just found a whole page of this by searching 'web proxy' without the quotes and going down the search results to about page 6 or so. Interestingly, when I reloaded the page all of that /url?sa= stuff was gone and the links were direct again.
I guess it's a Google feature. They use the click-tracking URLs very sparingly. That makes it harder for SEOs to manipulate rankings that way.
Re:Google's Click History Asset by MikeBabcock · 2005-05-10 09:03 · Score: 2, Insightful

It's also fairly simple to note that someone clicked a link then immediately returned to the results list by noting the "If-Modified-Since" request from the user's browser.

A quick return would indicate that the page was not in fact what the user had requested.
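
A rough sketch of that quick-return heuristic, assuming a hypothetical log of (seconds, session, kind, url) events and an arbitrary 10-second threshold. Neither the log format nor the threshold comes from Google; they are made up to show the idea.

```python
def quick_returns(events, threshold=10.0):
    """Return URLs whose click was followed by a fast return to the results.

    `events` is a time-sorted list of (seconds, session, kind, url) tuples,
    where kind is "click" or "return". This format is an assumption.
    """
    last_click = {}  # session -> (time of click, clicked url)
    flagged = []
    for seconds, session, kind, url in events:
        if kind == "click":
            last_click[session] = (seconds, url)
        elif kind == "return" and session in last_click:
            clicked_at, clicked_url = last_click.pop(session)
            if seconds - clicked_at <= threshold:
                flagged.append(clicked_url)
    return flagged


sample_events = [
    (0.0, "s1", "click", "http://a.example/"),
    (3.0, "s1", "return", ""),   # back after 3 seconds: probably a miss
    (5.0, "s1", "click", "http://b.example/"),
    (95.0, "s1", "return", ""),  # back after 90 seconds: probably useful
]
print(quick_returns(sample_events))
```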

--
- Michael T. Babcock (Yes, I blog)
Re:Google's Click History Asset by Fëanáro · 2005-05-10 09:50 · Score: 2, Interesting

Ok, this is getting weird.

I look at this page:
http://www.google.com/search?q=test

The first result link looks like this:
<a href=http://www.ets.org/toefl/ onmousedown="return clk(this,'res',1)">Welcome to TOEFL: The <b>Test</b> of English as a Foreign Language</a>
at least in IE.
In Opera the JavaScript is missing.

Try this:
wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" "http://www.google.de/search?q=test"
compared to this:
wget --user-agent="." "http://www.google.de/search?q=test"
the first page contains the javascript but the second does not, at least on my system.

Can anyone confirm that?
Re:Google's Click History Asset by epee1221 · 2005-05-10 10:28 · Score: 2, Interesting

I have to wonder about the effectiveness of using click histories. It seems to me that the only way any site is going to get a lot of clicks from google is if they're already near the top to begin with. A site that is good but new will be buried so far down that nobody will actually get to it. Is there any way around this?

--
"The use-mention distinction" is not "enforced here."

Re:Is it the case.. by AKAImBatman · 2005-05-10 05:14 · Score: 2, Insightful

Yes, patents ARE a violation of google's do no evil policy, as it gives them a monopoly on the good search engine algorithms.

So they have a monopoly. What's your point?

When did a monopoly by google become ok?

Sometime around the 1790's when the patent system was created in the US to give inventors a temporary and artificial monopoly on their inventions so as to encourage them to innovate. Google has not violated their policy of "do no evil" by properly utilizing the patent system, and it has had the intended side effect of preventing Microsoft from using their corporate muscle to crush Google.

but why support one company's attempts to cripple their opponents through legislation instead of competition?

Why should a company with more money have a right to crush me with my own invention?

The primary reason why the patent system sucks is that "invention" is far too loosely defined. Many patents get granted in cases where the patent office's own rules state that they should throw them out.

--
Javascript + Nintendo DSi = DSiCade

Two Keys: Data Mining and Delay by RonBurk · 2005-05-10 05:17 · Score: 5, Interesting

The first big mistake webmasters make when trying to understand how Google ranks search results is failing to grasp the idea of data mining. The Google folks come from a data mining background and constantly write about data mining algorithms; it would be highly surprising if the bulk of the Google algorithm was not constructed via data mining.

What does that mean? At the highest level, it means that most of the Google algorithm is constructed by a machine. You give the machine human-constructed examples of how to rank a sample set of pages (notice those want ads where Google is hiring people who can inspect and assess the quality of web pages?) and it then uses essentially brute-force techniques to test every possible combination of your ranking variables to find the simplest formula that ranks pages the same way the human did.
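
As a toy illustration of the procedure this paragraph describes (and only that; a reply further down disputes that Google works this way), the sketch below brute-forces a small grid of weights until the resulting order matches a human-supplied one. The pages, the two feature values per page, the weight grid and the function names are all invented for the example.

```python
from itertools import product

# Hypothetical pages with two made-up integer features each.
pages = {
    "a": (9, 2),
    "b": (5, 8),
    "c": (1, 3),
}

# The order a human rater is assumed to have chosen.
human_order = ["b", "a", "c"]


def order_for(weights):
    """Rank pages by a weighted sum of their features, best first."""

    def score(name):
        return sum(w * f for w, f in zip(weights, pages[name]))

    return sorted(pages, key=score, reverse=True)


def fit(grid=(0, 1, 2)):
    """Try every weight combination; return the first that matches the human."""
    for weights in product(grid, repeat=2):
        if order_for(weights) == human_order:
            return weights
    return None


print(fit())
```

With real data the search space is far too large for this kind of exhaustive loop, which is one reason practical learners use smarter fitting methods.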

There is no human at Google "twisting dials" to alter individual parameters of a formula. The machine constructs the algorithm, and it can therefore easily be so complex that no human can understand it. Tweaking the algorithm becomes a process of changing or adding to your "training set" of human-ranked pages, and letting the data mining process come up with a revised algorithm.

For example, Google could invent a new variable called "category", and identify each page as belonging to category Astronomy, Botulism, Country, [...] and Other. Once that variable is thrown into the mix, then the Google "algorithm" is essentially free to vary wildly from one type of subject matter to the next. For example, you might see someone with a Real Estate site swearing up and down that inbound links are no longer as important, while someone with an Astronomy site might swear that, no, inbound links are more important than ever. You can see exactly this kind of bickering in most of the forums that people who hope to do Search Engine Optimization frequent.

The other big mistake people make in trying to see how to game the Google algorithm is "delay". In studying how people manage (or fail to manage) complex systems, psychologists learned that people generally would fail if a delay was introduced between their actions and the results of their actions.

In one very simple test, people were charged with trying to stabilize the temperature in a virtual refrigerator. They had one dial, and there was exactly one piece of feedback: the current temperature in the fridge. However, they were not explicitly told that there was a delay between moving the dial and when the results of that action would stabilize.

The responses of those test subjects were eerily similar to what we see in Google-gaming webmasters these days. Some people swore up and down that some human behind the scenes was directly tweaking the results to thwart whatever they did. Others became frustrated and decided that nothing they did really mattered, so they would just swing the dial back and forth between its minimum and maximum settings.

What does this have to do with Google? These days, Google can change their algorithm relatively frequently, and the algorithm can vary by the relative date of various things. The net sum is, there's a delay between when your page is first ranked and when it is likely to arrive at a relatively stable ranking. This can drive webmasters nuts as they think they've done something clever to rank their page high, but then it drops a week later. Although it doesn't occur to them, the important question is: did the change cause the high ranking or did it cause the sudden decline?

The few people who did master the simple refrigerator system? Well, they sounded more like some of the people who are more successful at gaming Google. Those folks tend to say things like: "just make one change and then leave it alone for a while to see what happens."
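
The effect of the delay is easy to reproduce in a few lines. The sketch below is an invented model, not the psychologists' actual experiment: the fridge only feels a dial setting three steps after it was made, one policy keeps correcting on every reading, and the other sets the dial once and waits. All numbers and names are assumptions.

```python
def simulate(policy, steps=40, delay=3, target=4.0, start=10.0):
    """Run a toy fridge whose temperature reacts to old dial settings."""
    temp = start
    dial = start
    pending = [start] * delay  # dial settings the fridge has not felt yet
    history = []
    for _ in range(steps):
        dial = policy(dial, temp, target)
        pending.append(dial)
        effective = pending.pop(0)         # setting made `delay` steps ago
        temp += 0.5 * (effective - temp)   # move halfway toward that setting
        history.append(temp)
    return history


def impatient(dial, temp, target):
    """Keep nudging the dial by the current error on every single step."""
    return dial + (target - temp)


def patient(dial, temp, target):
    """Set the dial to the target once and leave it alone."""
    return target


impatient_run = simulate(impatient)
patient_run = simulate(patient)
print(min(impatient_run), patient_run[-1])
```

In this model the impatient policy drives the temperature below the target because it keeps turning the dial before any earlier change has shown up, while the patient policy settles on the target without overshooting.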

Can you still game the Google algorithm? Undoubtedly in specific cases. But it's getting harder. The Google algorithm was always complex, but what's changing is that the days when a few variables (such as inbound link count) generally swamped the effects of all the others are drawing to a close. We are approaching the day when the best technique to rank highly with Google will be: sit down at your keyboard and make more good content every day.

Re:Two Keys: Data Mining and Delay by jrtom · 2005-05-10 13:28 · Score: 2, Interesting

The parent post is largely composed of misinformation, ignorance and irrelevance. I'd suggest to its author that it might be a good idea to do some basic research before posting on a subject which is, I suspect, outside his area of expertise.

(1) What you have described as Google's "algorithm" is a distortion of one particular technique used in data mining (actually machine learning, but we'll let the vocabulary slide); furthermore, no one other than a first-year AI/machine learning student would use exhaustive search in parameter space ("brute force") to come up with a solution. In fact, a very brief search on your favorite search engine (for, say, "PageRank algorithm") would reveal that the basic algorithm is actually very simple, and does not in fact involve learning from labeled examples, as you suggest. (More recent versions of the Google ranking mechanism may safely be assumed to be more sophisticated, but I'd bet serious cash that they're nothing like what you describe.)

(2) PageRank--the basic algorithm, that is--is not, and never has been, based, even in part, on inbound link count. This can also be easily verified by a few minutes' research as above.
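
To illustrate the difference between raw link count and PageRank, here is a simplified power-iteration sketch in Python. It follows the commonly published description of the basic algorithm; the 0.85 damping factor, the even spreading of rank from pages without outbound links, and the tiny example graph are textbook-style assumptions, not anything confirmed about Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over a dict of page -> outbound links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new = {page: (1.0 - damping) / n for page in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Page with no outbound links: spread its rank over everyone.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank


# Made-up graph: "x" and "y" each have exactly one inbound link,
# but x's link comes from a well-linked hub and y's from an unlinked page.
links = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["x"],
    "lonely": ["y"],
    "x": [],
    "y": [],
}
ranks = pagerank(links)
print(ranks["x"] > ranks["y"])
```

Both x and y have an inbound link count of one, yet x ends up ranked higher because the page linking to it is itself better linked.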

(3) Your refrigerator example doesn't actually support your point. If Google's ranking algorithm is continually changing, as you suggest, then you can never know whether any change you made had any effect on your ranking. (And "algorithm can vary by the relative date of various things"? Say what?)

Web page "freshness?" A good thing... by Eric+Damron · 2005-05-10 05:23 · Score: 2, Informative

There seems to be a lot of weight put on web page freshness. I host a friend's site containing the collection of poems by Ella Wheeler Wilcox. She lived in the 1800s so one cannot expect to see any new material from her.

The site is mostly static but is rich with cultural value. It's currently the number one hit on Google. I'm hoping that Google's emphasis on "freshness" won't make his site fall in ranking.

--
The race isn't always to the swift... but that's the way to bet!

Re:FAQs by eluusive · 2005-05-10 05:33 · Score: 2, Insightful

I say F-A-Q not FAQ. I pronounce IRC I-R-C not Irck. It makes me go irck when somebody says erck for IRC. I pronounce MySQL as My-S-Q-L not My Sequel. #$*#@$%&)(@#&%()*#@&%)(*#@% However, I do pronounce LASER as laser the word. Laser is no longer just an acronym.

Wait a second. by ThePromenader · 2005-05-10 05:38 · Score: 2, Insightful

Isn't this "page update frequency" hullabaloo a bit premature? If Google wants relevant results, I can only see update frequency being a minor factor in any page rank determination algorithm. For example: information sites (historical information, dictionaries, encyclopedias, collections, etc...) are often at once the most relevant (if info is what you're looking for) and the least updated sites. I can't really imagine the Oxford Faculty meeting every week to decide new words for their dictionary to retain their www.oed.com pagerank. Just imagine what it would do to the English language : )

Seriously, this little article is going to get Webmasters thinking a little more but I don't see anything to panic about. Not yet, anyways.

--

No, no sig. Really.

ThePromenader

What about harmful link spam? by mejesster · 2005-05-10 05:55 · Score: 3, Insightful

It seems nobody has asked the question: what if a spammer wants to lower the rank of more reputable companies? If a spammer link spams a site that is already fairly popular, couldn't it harm the page rank of a company that has nothing to do with the spam?

--
MacroHard - Boning you in a big way! (TM)

SEQUEL by naph · 2005-05-10 07:08 · Score: 2, Insightful

"The history of SQL and relational databases traces back to E.F. Codd, an IBM researcher who first published an article on the relational database idea in June 1970. Codd's article started a flurry of research, including a major project at IBM. Part of this project was a database query language named SEQUEL, an acronym for Structured English Query Language. The name was later changed to SQL for legal reasons, but many people still pronounce it SEQUEL to this day."

http://www.provue.com/proVUE/Fact_SQLServer.html

just a bit of history.

--
"if i'd known it was harmless, i'd have killed it myself"

Re:FAQs by warpup · 2005-05-10 07:23 · Score: 2, Funny

I like the pronunciation Fah - cue.
