How Google Trends & News Pollute the Web
Danny Sullivan's hard-hitting piece at Search Engine Land calls on Google to quit being evil in one particular way: collaborating with sleazy websites that jump on Google Trends to grab advertising revenue, as Google itself rakes it in. "Google's CEO Eric Schmidt has quite famously been on record many times talking about how the Web is full of garbage. It's a cesspool out there, he's said. Today, a short fast look at how his own company pollutes the Web. ... That [example of an off-topic, trend-following] page isn't adding any value to the web. If it didn't exist, we wouldn't be the less savvy... But thanks to Google Trends, we've got a big red flag up in front of publishers that wish to pollute Google's results with this type of garbage. ... On the one hand, I love Google Trends. It's fun seeing what the top terms are that are sparking interest... On the other hand, it's clear how much [garbage] Google has caused to be generated, simply by publishing the trends. But that garbage wouldn't happen, if it didn't know it was going to be rewarded. It is, both with traffic from Google and from revenue from Google for those carrying its ads."
What the hell is this guy's point? Bing could release a "trends" feature just like Google's, yet everyone is acting like Google is God.
If anything, a blog post hating on Google, on a site called Search Engine Land that is all about SEO, sounds like a competitor disliking their own competitor.
Certainly not Google. Or me, for that matter. The Big G's business model is built on the premise that storage is cheap, and that value is provided by being able to never delete anything, but make it available through a powerful search engine. When did you last delete something out of Gmail, for example?
There are whole industries around SEO and it seems naive to think that people aren't going to create/alter content in order to get a higher ranking. Does it matter?
I started using Google Blog Search to create an RSS feed of topics I'm interested in. Gradually I started using regex to filter out sites that were clearly just spam sites. Now my regex statement is about 20K in size, and out of 150 results that Google returns, I may have 4 or 5 stories that make it through the filter.
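For anyone curious what that kind of filter looks like, here is a minimal sketch in Python. It is not the poster's actual regex: the patterns, the `is_spam` and `filter_entries` helpers, and the shape of the feed entries (dicts with a `link` key) are all assumptions made up for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist patterns, matched against the hostname only.
# A real list built up over time would be far longer than this.
SPAM_PATTERNS = [
    r"(^|\.)spamdomain\.com$",
    r"(^|\.)[a-z0-9-]*trend[a-z0-9-]*\.info$",
]

# Combine everything into one alternation so each link is scanned once.
SPAM_HOST_RE = re.compile(
    "|".join(f"(?:{p})" for p in SPAM_PATTERNS), re.IGNORECASE
)

def is_spam(link):
    """Return True if the link's hostname matches any blocklist pattern."""
    host = urlparse(link).hostname or ""
    return SPAM_HOST_RE.search(host) is not None

def filter_entries(entries):
    """Keep only the feed entries whose link is not on the blocklist."""
    return [e for e in entries if not is_spam(e["link"])]

if __name__ == "__main__":
    feed = [
        {"title": "Real story", "link": "http://www.cnn.com/story"},
        {"title": "Scraped story", "link": "http://feeds.spamdomain.com/x"},
        {"title": "Trend bait", "link": "http://hot-trends-now.info/a"},
    ]
    for entry in filter_entries(feed):
        print(entry["title"], entry["link"])
```

Matching on the hostname rather than the whole URL keeps the patterns short, but it also means the list only ever grows as new spam domains appear, which is presumably how a filter like this ends up at 20K.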
Introducing Microsoft Vacuum 1.0 The first Microsoft product that doesn't suck.
His point is to write an article about how people will write articles about Chocomize to draw traffic to their site because Chocomize shows up in Google Trends. It allows him to use many words from Google Trends inside said article (I didn't count the occurrences of the word "Chocomize", but I had never seen so many occurrences of this word in a single page), thus drawing attention to his article.
Chocomize.
So should Google shut down Google Trends? Block it from their ad customers? Somehow force them to ignore it? What the hell does he expect or want? How does he think this would work in a perfect world?
There's no point to this article. It's claiming an evil conspiracy just because Google Trends exists.
This sentence no verb.
So, Google is Evil because they release a useful tool that slimy people are abusing?
Then just quit doing searches for Britney Spears, Lindsay Lohan and Paris Hilton.
So Google is bad for being transparent and releasing data which is aggregated and highly anonymous? It is a good thing I don't run Google because after enough articles like this I'd be tempted to say "you know, we get so much crap even when we're being helpful. Let's see what happens if we just try to act really, really evil for a few months." Seriously, this criticism comes down to Google releasing interesting data which in the long run could actually be useful to sociologists and other academics. It has already been used to help get an accurate idea of where the common flu is and how bad it is at any given time: http://www.google.org/flutrends/ . And the complaint in TFA is that unethical people can abuse this data at the margins. The obvious question is whether that minor abuse outweighs the positive good created by having this data. At least for me, the answer seems to be no, but that's partially because I have a strong ideological commitment to transparency and openness. When in doubt, give people access to data when it can be done easily.
The problem with naive crowd wisdom, like the one generated by Google Trends, is that it's generally untrue that most people like what people like most.
The average "like" of people is not the "like" of the average person.
What people like most is the lowest common denominator.
Ironically, when publishers adopt that fallacy, they create the garbage that gives Google relative value, by reducing even more value from other ways of data consumption.
So the negative effect of Google Trends works great for Google.
These guys got lucky and hope to keep going with their chocolate idea. The catch is that they need to keep the momentum. By being near the top of Google's search results, they will make money until their ranking wavers. The CNN news story was the groundbreaking one; now they would need to advertise on television and maybe appear on a show for a few minutes to make enough profit for the company to survive on.
Why would the spammers only copy trending topics? Why not just screen scrape everything from cnn.com and add ads? They do.
It just looks like they are only targeting trends because Google picks up on that stuff and aggregates it when it is a hot topic, so you see more of it.
Spammers don't need the trends; they are screen scraping everything, or just the headlines. This has been going on forever, long before "trends" existed. There are just more of them, and they are getting better at building their spam farms and increasing their PageRank, such that their screen-scraped content is actually beating the site they copied from in the results.
Sadly it's only going to get worse, as it's too easy for even a single person to create many terabytes of auto-generated spam. Multiply that by the thousands of spammers doing it every minute.
I.O.U One Sig.
What else is new? Try to find drivers and service manuals... Virtually all the results are spam sites. I got better returns 20 years ago when CompuServe was king.
*Kinda reminds me of a nerdy news site that treats binspam as actual news on its front page. Eh... all part of the dumbing down process.
For justice, we must go to Don Corleone
When I google for "Chocomize", my top three results are the source chocolate-making company - not spam. The fourth, the only thing remotely resembling pollution, is this searchengineland article itself.
Also, if this is an issue, I really don't think the right solution is to hide the information.
Google trends hasn't helped my sight at ALL!!
You might try searching for "eyeglasses" or "contact lenses." That would help your sight.
Free Martian Whores!
Advertisers aren't stupid. Google ads are only worthwhile if they're actually generating revenue for the advertiser. Eventually, if they keep allowing this sort of practice, it's only going to drive down their own ad revenue (as advertisers realize they're not getting as much revenue from their ads as they once were).
If someone clicks on an advertisement then buys, does it really matter which spam site they arrived through? There's nothing that suggests they're getting less revenue; in fact, they may be getting more since the ads themselves will be relevant to what is searched for.
I ran into bizarre web parroting-- a site took an article about my DIY satellite from "Wired", and (best guess) ran it through an English->Chinese translator and then back through Chinese->English. So we end up with sentence-by-sentence content stealing, but with its own wording, e.g.:
"Once deployed, they can put out enough power to be picked up on the ground by a hand-held amateur radio receiver." [from Wired]
"Once deployed, they can put out enough energy to be picked up on the belligerent by the hand-held pledge airwave receiver." [from Tubesat Gerber]
Or this bit
"Once the bastion of NASA and commercial satellite services, space has now become the final frontier for the do-it-yourselfer next door." [Wired]
"Once a bastion of NASA as well as blurb heavenly body services, space has right away turn the final limit for a do-it-yourselfer subsequent doorway." [Tubesat Gerber]
That's me, the blurb heavenly body service belligerent receiver!
A.
http://projectcalliope.com/ "Music from Space, Launching 2011"
A.
A US public radio show just ran a whole feature on Web 2.0 content farming. Wired also ran this piece on one of the main polluters, Demand Media, a while back, explaining how it uses algorithmically driven keyword generators that grab "hot" (i.e., adclick revenue-generating) trends from, among others, sources such as Google Trends, then farms out a skeleton of an article with the required keywords to an extremely poorly paid human whose job it is to string together acceptably human-readable inter-keyword verbiage to flesh out an "article".
Da Blog
Let's say you're right. Now Google has an index for cnn.com, and an index for spamdomain.com. Presumably the timestamps on the cnn.com pages are a bit earlier since it takes time for spamdomain.com to scrape and republish the content, and then for Google to index the new content on spamdomain.com.
I'm no computer scientist but it seems that this is the sort of data mirroring that should be pretty easy to spot algorithmically. If two domains share >80% of the exact same content, de-emphasize the one with later timestamps.
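A rough sketch of that heuristic, in Python. To be clear, this is not how Google does it (nobody outside Google knows): it swaps in a standard near-duplicate technique, word shingles compared by Jaccard similarity, for "share >80% of the exact same content". The page dicts, the `first_seen` field, and the `pick_copy` helper are all hypothetical.

```python
from datetime import datetime

def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short to shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pick_copy(page_a, page_b, threshold=0.8):
    """Return the page to de-emphasize, or None if they differ enough.

    Assumes each page is a dict with 'text' and 'first_seen' keys.
    On a timestamp tie, page_b is returned, which is an arbitrary choice.
    """
    sim = jaccard(shingles(page_a["text"]), shingles(page_b["text"]))
    if sim <= threshold:
        return None
    return page_a if page_a["first_seen"] > page_b["first_seen"] else page_b

if __name__ == "__main__":
    story = " ".join(f"word{i}" for i in range(40))
    original = {"domain": "cnn.com", "text": story,
                "first_seen": datetime(2010, 7, 1, 12, 0)}
    scraped = {"domain": "spamdomain.com", "text": story + " sponsored",
               "first_seen": datetime(2010, 7, 1, 12, 30)}
    loser = pick_copy(original, scraped)
    print(loser["domain"] if loser else "no near-duplicate found")
```

The weak spot is `first_seen`: a crawler only knows when it first fetched a page, not when the page was actually published, so a scraper that gets crawled faster than the original would win under this rule.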
The provocative theory is that Google doesn't care which site ranks first, as long as its ads are being served on both. Or worse, that Google allows the crap to float to the top if it is carrying Google ads, and cnn.com is not.
Is the theory right? Who knows besides Google? Perhaps it is not so easy for the algorithm to distinguish what to our minds is obvious spamming. And one of the things that Google is up-front about is that if they can't do it algorithmically, they're not interested in it.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.