Google Fires Back About Search Engine Spam
coondoggie writes "The folks at Google are taking issue over spam and the quality of Google searches, which some claim has gone down in recent months. Today on Google's official blog, Principal Engineer Matt Cutts said, 'January brought a spate of stories about Google’s search quality. Reading through some of these recent articles, you might ask whether our search quality has gotten worse. The short answer is that according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness. Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English.' Cutts also explained that the company has made a few significant changes to their method of indexing."
According to our own tests we are 100% awesome. We have tested you and you are not :( --Elgoog
"spam in most other languages is even lower than in English."
This is definitely not true for Spanish. There has always been a higher level of spam results for Spanish.
the evaluation metrics we've refined over the past decade
In other words, as long as they keep changing the evaluation criteria, they always pass them!
I've seen more parked domains in Google results than I have actual content recently.
" according to the evaluation metrics that we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness and comprehensiveness. "
And thus begins the downfall of Google. Once you start drinking your own lemonade and stop listening to the people who use your product, you're on a greased downhill slope.
And the worms ate into his brain.
It would, if Google included an option to filter out entire domains from search results. Google could then simply monitor these domains and try to figure out why people take the trouble of filtering them out.
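A minimal sketch of what such a per-user domain filter could look like, along with the aggregate count that would tell the search engine which domains people bother to block. The function names and the example domains are made up for illustration; nothing here reflects how Google actually works.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host part of a URL."""
    return (urlparse(url).hostname or "").lower()


def filter_results(urls, blocked_domains):
    """Drop results whose host is a blocked domain or a subdomain of one."""
    kept = []
    for url in urls:
        host = domain_of(url)
        if any(host == d or host.endswith("." + d) for d in blocked_domains):
            continue
        kept.append(url)
    return kept


def most_blocked(blocklists, top_n=3):
    """Count how many users block each domain; the top entries deserve a look."""
    counts = Counter(d for blocklist in blocklists for d in set(blocklist))
    return counts.most_common(top_n)
```

Feeding every user's blocklist into `most_blocked` gives a crude ranking of the domains users dislike most, which is the monitoring signal described above.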
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Perception is reality.
Anyway, I think the argument is: The spammers are gaming your Metrics. It's not that there's 50% less spam in your search results, it's that you're detecting 50% less spam in the first place.
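To put numbers on that argument, here is a small sketch under one simplifying assumption: the metric only counts spam that the detector recognises, so measured spam is true spam times detection rate. All the figures are hypothetical.

```python
def measured_spam(true_spam_rate, detection_rate):
    """Spam share the metric reports: only recognised spam gets counted."""
    return true_spam_rate * detection_rate


# Hypothetical figures: the true spam share never changes, but spammers
# learn to evade the detector, so it catches half as much as it used to.
five_years_ago = measured_spam(true_spam_rate=0.10, detection_rate=0.80)
today = measured_spam(true_spam_rate=0.10, detection_rate=0.40)

# The metric reports roughly half the spam, with no real improvement.
print(f"measured then: {five_years_ago:.2%}, measured now: {today:.2%}")
```

A metric built this way cannot tell "half as much spam" apart from "half as much spam detected" without an independent check, such as hand-labelled samples.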
Every time I search for something these days I get some ridiculous set of non-results due to the fuzzy matching. I search for "TIPC layer3" and Google nicely finds me results about TCP Layer3, because Google thinks I must have typo'd something. This happens constantly with searches that are one or two letters off, where the results I get are adjusted because the alternative ranks higher.
Google's search is not getting better, it's getting more and more 'Clippy' every year.
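A rough sketch of the kind of rewriting being complained about, assuming the engine swaps a term for any more popular term within a small edit distance. The popularity counts and the `rewrite_term` helper are invented for illustration; the real system is certainly more involved.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rewrite_term(term, popularity, max_distance=2):
    """Swap a term for a more popular near-match, if one exists."""
    best = term
    for candidate, count in popularity.items():
        close = edit_distance(term.lower(), candidate.lower()) <= max_distance
        if close and count > popularity.get(best, 0):
            best = candidate
    return best


# Hypothetical query counts: the rare protocol loses to the common one.
popularity = {"TCP": 1_000_000, "TIPC": 500}
print(rewrite_term("TIPC", popularity))  # the rare term gets replaced
```

Setting `max_distance=0` leaves the query untouched, which is the behaviour you want when the rare term really is what you typed.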
Google has a dilemma. If their search engine takes you directly to the place you want to go, they don't make any money. For a good analysis of this, see "Google Sucks All the Way to the Bank", by Jill Whalen. She is, unfortunately, right. It's essential for Google's success that some of their own ads be more relevant than their search results. Part of their revenue comes from sending users on a side-trip to AdWords-heavy pages. We've measured this, using a browser plug-in which reports AdWords appearances to us. About 36% of domains with AdWords (counting domain names, not traffic) are what we consider "bottom feeders", junk sites with a commercial purpose but no identifiable business behind them.
On the local search front, spam in Google Places is even worse than in their main search results. This, though, appears to be due to ineptitude, not malice. Google added a business search system to Google Maps a year or two ago; that's what Google Places really is. You've been able to go to a Google Maps page and search for businesses for some time now. Few people knew this.
Then, in October 2010, Google merged the map search results into their main search results. "Places" results suddenly got top billing in Google. The "search engine optimization" (SEO) industry swung into action, and began spamming Google Places on a massive scale. (We have a paper on this, which has been mentioned by Techdirt, the New York Observer, etc. It's an amusing read.) Recommendation spamming, which had been going on for a while at a low level, grew substantially once recommendations started affecting Google search results.
This, incidentally, is why Blekko won't work. If they get enough market share to matter, techniques will be developed to spam them into meaninglessness.
Stopping web spam is technically quite possible. We do it by finding the business behind the web site, and doing some automated due diligence. We check business records, SEC filings, BBB ratings, and Dun and Bradstreet to verify business legitimacy. We down-rate most of the junk. We try to err in the down-rating direction, taking the position that it's the job of a company to demonstrate their legitimacy by using their real name and address on their web site, which has to match real-world business records. Our demo site for this shows what search is like if you take a hard line on spam.
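As an illustration only, the due-diligence idea can be sketched as a scoring function. The field names, the weights, and the `legitimacy_score` helper are all hypothetical; the actual rating rules are not given here.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Hypothetical evidence gathered about the business behind a site."""
    name_and_address_on_site: bool
    matches_business_records: bool
    has_sec_filings: bool
    has_bbb_rating: bool
    has_dnb_listing: bool


def legitimacy_score(ev):
    """Return a score in [0, 1]; unidentified sites are down-rated to 0."""
    # Hard line: no real name and address matching real-world records, no credit.
    if not (ev.name_and_address_on_site and ev.matches_business_records):
        return 0.0
    # Base credit for being identifiable, plus one step per independent source.
    corroboration = sum([ev.has_sec_filings, ev.has_bbb_rating, ev.has_dnb_listing])
    return (2 + corroboration) / 5
```

Erring in the down-rating direction shows up as the early return: a site that cannot be tied to a real business scores zero no matter what else is known about it.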
Our approach requires more of a hard-ass attitude than Google's business model can perhaps afford. With Blekko making Google look foolish, though, and Bing slowly improving, Google may have to actually do something that works, even if it cuts into revenue from the spam.
I've switched to other search engines; from my experience, Google provides too many tangential and corporate references when I do research.
Also, how does Google "know" that their search results were valid? I'll often do a Google search, click a couple of links, and after being disappointed, I'll go to another search engine where I get more useful results.
What bugs me the most are searches on technical or medical topics, where Google gives me a dozen "harvester" results -- e.g., I get sites that have stolen conversations from other message boards and reposted them along with tons of ads. Yuck! There must be dozens or hundreds of sites, all with broken answers to questions about JavaScript and/or medicines.
Just because evidence is anecdotal doesn't mean it should be blithely discounted. If I say "Ouch" at being cut, that means the injury hurt me; the pain is quite real even if no one else has felt it.
All about me
Empiricism is all about saying "Here's what I did, and these are the results." It's not empirical to say "Trust me, I did something I can't tell you about, and the results are really good."
My favorite part is how searching for something that happens to appear in a Stackoverflow question returns dozens of sites that copy and paste the Stackoverflow content surrounded by ads.
In the last few years, I've found search results have been dominated more and more by content mills like Associated Content, eHow, HubPages, About, and others, or by some low-quality Q&A page like Yahoo Answers. The pages are hastily written and edited, and low on content. The articles are also typically written by someone without any relevant knowledge or experience, so the information is either common knowledge or wrong.
If Google's metrics say quality is up, but their users think quality is down, then Google's metrics need to be revised to match user experience more closely. I've started using Duck Duck Go because they block content mills, and thus I think their results are as good as or better than Google's, even without the complicated algorithms and all the data Google has accumulated.
Here's a great example of returning pages that don't contain what you're searching for.
Search for +open +cat +mug +frame
The first link only contains 2 of the 4 terms.
Returning a page that does not contain a required search term is a failure state.
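The check being asked for is simple enough to sketch. This assumes a leading `+` marks a required term and that "contains" means the term appears as a whole word in whatever text you hand in; both helper names are made up for the example.

```python
import re


def required_terms(query):
    """Extract the terms marked with a leading '+' from a query string."""
    return [t[1:].lower() for t in query.split() if t.startswith("+") and len(t) > 1]


def missing_terms(query, page_text):
    """Return the required terms that do not appear as words in the page text."""
    words = set(re.findall(r"\w+", page_text.lower()))
    return [t for t in required_terms(query) if t not in words]


# A made-up page that only mentions two of the four required terms.
print(missing_terms("+open +cat +mug +frame", "An open frame for photos"))
```

Under this rule a page is only a valid result when `missing_terms` comes back empty. What counts as the page text (rendered text or raw source) is left to the caller.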
I find being offended by me offensive.
I use a Google Customized Search Engine (CSE) configured to promote StackOverflow and block ExpertSexchange. Here, you can try it out: www.google.com/cse/home?cx=007350804174195462206:7etfz1pyl-s . I've set it as my default search engine in Chrome and never have to think about it again.
Actually it does, the description text is hidden until some user actions are taken. A ctrl-f on the page may not return results for the terms, but viewing source and ctrl-f does.
insight through the mind