Slashdot Mirror


How Google Trends & News Pollute the Web

Danny Sullivan's hard-hitting piece at Search Engine Land calls on Google to quit being evil in one particular way: collaborating with sleazy websites that jump on Google Trends to grab advertising revenue, as Google itself rakes it in. "Google's CEO Eric Schmidt has quite famously been on record many times talking about how the Web is full of garbage. It's a cesspool out there, he's said. Today, a short fast look at how his own company pollutes the Web. ... That [example of an off-topic, trend-following] page isn't adding any value to the web. If it didn't exist, we wouldn't be the less savvy... But thanks to Google Trends, we've got a big red flag up in front of publishers that wish to pollute Google's results with this type of garbage. ... On the one hand, I love Google Trends. It's fun seeing what the top terms are that are sparking interest... On the other hand, it's clear how much [garbage] Google has caused to be generated, simply by publishing the trends. But that garbage wouldn't happen, if it didn't know it was going to be rewarded. It is, both with traffic from Google and from revenue from Google for those carrying its ads."

29 of 101 comments

  1. hard hitting? by poetmatt · · Score: 4, Insightful

    What the hell is this guy's point? Bing could release a "trends" the same as google, yet everyone is acting like google is god.

    If anything, a blog post on a site called search engine land, which is all about SEO, hating on google, sounds like a competitor disliking their own competitor.

    1. Re:hard hitting? by eldavojohn · · Score: 4, Insightful

      I don't think he even understands how the ads work.

      All you have to understand is that

      1. Google Trends tells people when a key word gets hot (like 'Chocomize').
      2. Websites of ill repute watch this and then gank content from CNN about some fluff piece on Chocomize.
      3. The websites that gank the content vie for the top seats in the "organic" part of search results (not the advertisement part of Google's search results).
      4. When the user selects any of these websites (and in the case of chocomize there are many), they are hosting Google ads so Google actually profits from this misbehavior.

      The author of the article is complaining that Google encourages poor behavior and then turns a dime on it through whatever ads end up being hosted at the websites that don't produce any actual content. You can claim they don't know this is happening or they don't care or they are laughing all the way to the bank. Either way the author appears to be correct in his analysis although you cannot be certain that Trends is where the crap websites find which terms are hot. Other sites could possibly measure this but would require a lot of indexing and resources to do so. So it's most likely Google Trends.

      --
      My work here is dung.
    2. Re:hard hitting? by KarrdeSW · · Score: 2, Interesting

      So it's most likely Google Trends.

      It should also be noted that the guy's only "case study" has to do with an article poached from CNN. While Google Trends makes a likely culprit, this misbehavior could just as easily have started by people watching the "top articles" on CNN.

      It's even possible that the article poachers gain their "content" from multiple sources. It doesn't take much effort to copy-paste every time you see a high traffic article.

  2. Who cares? by RMH101 · · Score: 3, Interesting

    Certainly not Google. Or me, for that matter. The Big G's business model is built on the premise that storage is cheap, and that value is provided by being able to never delete anything, but make it available through a powerful search engine. When did you last delete something out of Gmail, for example?
    There are whole industries around SEO and it seems naive to think that people aren't going to create/alter content in order to get a higher ranking. Does it matter?

    1. Re:Who cares? by eldavojohn · · Score: 4, Interesting

      ... it seems naive to think that people aren't going to create/alter content in order to get a higher ranking.

      Well, it certainly is naive to think that considering that Google encourages it and they offer a PDF Starter Guide that instructs you how to alter your title, description and meta tags in your website to better your chances of coming up in the "organic" (not adwords) section of search results.

      Does it matter?

      Well, that's the article's argument. That it does matter because Google complains of the internet being a cesspool and yet here they are encouraging it with Trends. To you and me this is no problem. We don't care. To someone like Google that 0.1% of the end user experience might be worth millions of dollars to take care of because those end users are the eyeballs that sell their ad service to marketers of other companies. If Google perceives this to threaten the people that search their site then, yes, it does matter.

      There might be some day when you sit down to use Google and you search for some popular music or terms and all you get is complete unadulterated feces on the first page of search results. And you might consider checking the other search engine pages for the same results. If this phenomenon could cause that to happen then, yes, Google will care very much.

      --
      My work here is dung.
  3. Tell me about it. by Skraut · · Score: 4, Interesting

    I started using google blog search to create an RSS feed of topics I'm interested in. Gradually I started using regex to filter out sites that were clearly just spam sites. Now my regex statement is about 20K in size, and out of 150 results that Google returns, I may have 4 or 5 stories that make it through the filter.
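    A minimal sketch of that kind of filter, assuming the feed has already been parsed into entries with a "link" field. The blocklist patterns and the sample entries are made up for illustration; this is not the poster's actual 20K regex.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist patterns; a real list would grow over time.
SPAM_HOST_PATTERNS = [
    r"(^|\.)spamdomain\.com$",
    r"(^|\.)cheap-[a-z0-9-]+\.info$",
]

# Combine every pattern into one alternation so each host is scanned once.
SPAM_HOST_RE = re.compile(
    "|".join(f"(?:{p})" for p in SPAM_HOST_PATTERNS), re.IGNORECASE
)


def is_spam(url):
    """Return True when the URL's host matches any blocklist pattern."""
    host = urlparse(url).hostname or ""
    return SPAM_HOST_RE.search(host) is not None


def filter_entries(entries):
    """Keep only the feed entries whose link is not on the blocklist."""
    return [entry for entry in entries if not is_spam(entry["link"])]


# Made-up feed entries standing in for parsed RSS items.
entries = [
    {"title": "Chocomize on CNN", "link": "http://www.cnn.com/chocomize"},
    {"title": "Chocomize!!!", "link": "http://news.spamdomain.com/chocomize"},
    {"title": "Chocomize deals", "link": "http://cheap-trends.info/chocomize"},
]
kept = filter_entries(entries)
```

    Matching on the host rather than the whole URL keeps each pattern short, which may help before the list reaches 20K.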

    --
    Introducing Microsoft Vacuum 1.0 The first Microsoft product that doesn't suck.
    1. Re:Tell me about it. by dargaud · · Score: 2, Funny

      Now my regex statement is about 20K in size

      I almost fainted when I read that...

      --
      Non-Linux Penguins ?
  4. Chocomize! by Rhaban · · Score: 4, Funny

    His point is to write an article about how people will write articles about Chocomize to draw traffic to their site because Chocomize shows up in google trends. It allows him to use many words from google trends inside said article (I didn't count the occurrences of the word "Chocomize", but I had never seen so many occurrences of this word in a single page), thus drawing attention to his article.

    Chocomize.

    1. Re:Chocomize! by johnhennessy · · Score: 4, Insightful

      I'm sorry, but let's take a step back here ...

      This sounds more like a glitch in the search algorithm than anything else. Publishing trends is interesting, and can allow us to learn more about what we (as a species) do with the internet. This information is clearly abused by a few (who then go out and write fake pages which use the popular keywords to attract attention to their page), but this is an abuse of the Trends information that google provides, not something inherently evil.

      Google (or any search engine) could just tweak their results to reduce the importance of sites which are written *after* a topic became trendy. At least to give the existing articles a head start. Or I can imagine a million other ways in which they could tweak the algorithm.
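      One way to picture that tweak, as a rough sketch only: scale down the score of any page first seen after the term started trending. The function name, the penalty factor, the scores and the dates below are all assumptions for illustration and say nothing about how Google actually ranks pages.

```python
from datetime import datetime, timedelta


def adjusted_score(base_score, first_seen, trend_start,
                   penalty=0.5, grace=timedelta(0)):
    """Scale down pages first seen after the topic began trending.

    penalty and grace are made-up tuning knobs, not real ranking parameters.
    """
    if first_seen > trend_start + grace:
        return base_score * penalty
    return base_score


# Hypothetical moment the term started trending.
trend_start = datetime(2010, 7, 12, 9, 0)

# (name, base relevance score, time the page was first seen) - invented data.
pages = [
    ("spam-copy", 0.9, datetime(2010, 7, 12, 11, 0)),
    ("original", 0.8, datetime(2010, 7, 11, 8, 0)),
]

ranked = sorted(
    pages,
    key=lambda page: adjusted_score(page[1], page[2], trend_start),
    reverse=True,
)
```

      With these numbers the older page ends up ahead even though its base score is lower; the grace parameter leaves room for letting fresh pages through untouched for a while.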

      But I don't think what the article is implying (that google should stop publishing Trends) should be taken seriously.

      --
      [ Monday is a terrible way to spend one seventh of your life. ]
    2. Re:Chocomize! by whencanistop · · Score: 4, Insightful

      To be fair, I think he is more ranting about the fact that he noticed that Chocomize was trending (for whatever reason) and he had to plough through hundreds of spam sites before finding the real reason that it was trending (the CNN article). Why are the spam sites there? Because the CNN article caused people to search for the term, pushed it up on Google trends, automated tools caused some sites to create new pages that Google then index higher. Google could fix this by improving their news algorithm.

      Is it Chocomise in the UK, just out of interest?

    3. Re:Chocomize! by Anonymous Coward · · Score: 2, Insightful

      Google (or any search engine) could just tweak their results to reduce the importance of sites which are written *after* a topic became trendy.

      Yea, that might not work so well for developing news stories. Yea, a CNN puff piece on Chocomize really only needs the one article that started it, but for a trend like a political election, the latest news is significantly more relevant than the first.

    4. Re:Chocomize! by Rhaban · · Score: 2, Interesting

      They could detect articles that are duplicates of previous articles and penalize that.

    5. Re:Chocomize! by TheLink · · Score: 2, Interesting

      Or Google could just make it easier for me to blacklist entire sites from appearing on google search.

      Currently you have to tinker around with Google's custom search[1], and it's kinda klunky when there are hundreds of linkspam sites.

      The "whack-a-mole" needs to be easier.

      Yes I even tried a few firewall plugins but they didn't work so well. Maybe things have improved since.

      [1] http://www.ehow.com/how_6752589_create-blacklist-google-search-results_.html

    6. Re:Chocomize! by TheLink · · Score: 3, Insightful

      Or modify their ranking algorithm to smack down these spammers. For example, just pick a few very unrelated trend keywords/phrases. Then find sites which are turning up for these set of unrelated keywords. After some sanity checks, rank the sites down.
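      A small sketch of that idea, assuming the search results for each trend keyword are already available as lists of site names. The keywords, the sites and the threshold are invented, and the "sanity checks" (for example, whitelisting big news sites that legitimately cover everything) are left out.

```python
from collections import defaultdict


def flag_trend_chasers(results_by_keyword, min_keywords=3):
    """Return sites that show up for at least min_keywords unrelated terms."""
    keywords_per_site = defaultdict(set)
    for keyword, sites in results_by_keyword.items():
        for site in sites:
            keywords_per_site[site].add(keyword)
    return sorted(
        site
        for site, keywords in keywords_per_site.items()
        if len(keywords) >= min_keywords
    )


# Invented results for three deliberately unrelated trend terms.
results_by_keyword = {
    "chocomize": ["cnn.com", "spamdomain.com", "chocomize.com"],
    "lunar eclipse": ["nasa.gov", "spamdomain.com"],
    "tax deadline": ["irs.gov", "spamdomain.com", "cnn.com"],
}

flagged = flag_trend_chasers(results_by_keyword)
```

      Here only the site that turns up for all three terms is flagged; a site that appears for two of them stays below the threshold.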

      And remember that xkcd coined word ( http://news.slashdot.org/article.pl?sid=10/05/13/183221 )? You can use stuff like that to find a whole bunch of sites to exclude.

  5. And the solution is? by Posting=!Working · · Score: 4, Insightful

    So should Google shut down Google Trends? Block it from their ad customers? Somehow force them to ignore it? What the hell does he expect/want/think how in a perfect world this would work?

    There's no point to this article. It's claiming an evil conspiracy just because Google Trends exists.

    --
    This sentence no verb.
  6. Tools by Nerdfest · · Score: 4, Insightful

    So, Google is Evil because they release a useful tool that slimy people are abusing?

  7. Stop that by Drakkenmensch · · Score: 3, Funny

    Then just quit doing searches for Britney Spears, Lindsay Lohan and Paris Hilton.

  8. So Google is bad for being transparent? by JoshuaZ · · Score: 4, Insightful

    So Google is bad for being transparent and releasing data which is aggregated and highly anonymous? It is a good thing I don't run Google because after enough articles like this I'd be tempted to say "you know, we get so much crap even when we're being helpful. Let's see what happens if we just try to act really, really evil for a few months." Seriously, this criticism comes down to Google releasing interesting data which in the long run could be actually useful to sociologists and other academics. It already has been used to help accurately get an idea of where the common flu is and how bad it is at any given time http://www.google.org/flutrends/. And the complaint in TFA is that unethical people can abuse this data at the margins. The obvious question is whether that minor abuse outweighs the positive good created by having this data. At least for me, the answer seems to be no, but that's partially because I have a strong ideological commitment to transparency and openness. When in doubt, give people access to data when it can be done easily.

    1. Re:So Google is bad for being transparent? by maxume · · Score: 2, Insightful

      The part I find most irritating is that Google also profits from the actions of the abusers (because the abusers are using Google advertising).

      --
      Nerd rage is the funniest rage.
  9. The problem with crowd wisdom by Anonymous Coward · · Score: 2, Interesting

    The problem with naive crowd wisdom, like the one generated by Google Trends,
    is that it's generally untrue that most people like what people like most.
    The average "like" of people is not the "like" of the average person.
    What people like most is the lowest common denominator.

    Ironically, when publishers adopt that fallacy, they create the garbage that gives Google relative value, by reducing even more value from other ways of data consumption.
    So the negative effect of Google Trends works great for Google.

  10. Idea and media makes profit by Maarek · · Score: 2, Insightful

    These guys got lucky and hope to keep going with their chocolate idea. The only thing is that they need to keep their idea going. By being near the top of Google's search list, they will make money until it wavers. The CNN news story is the ground breaking story, now they would need to advertise on Television and maybe make an appearance on a show for a few minutes to make a huge profit for the company to survive on.

  11. Not just trends by shird · · Score: 4, Insightful

    Why would the spammers only copy trending topics? Why not just screen scrape everything from cnn.com and add ads? They do.

    It just looks like they are only targeting trends because Google picks up on that stuff and aggregates it when it is a hot topic, so you see more of it.

    Spammers don't need the trends, they are screen scraping everything, or just the headlines. This has been going on forever, long before "trends" existed. There are just more of them, and they are getting better at making their spam farms and increasing their page-rank, such that their screen scraped content is actually beating the site they copied from in the results.

    Sadly it's only going to get worse, as it's too easy for even a single person to create many terabytes of auto-generated spam. Multiply that by the thousands of spammers doing it every minute.

    --
    I.O.U One Sig.
  12. Advertisers have turned the net into slime* by countertrolling · · Score: 2, Insightful

    What else is new? Try to find drivers and service manuals... Virtually all the results are spam sites.. I got better returns 20 years ago when Compuserve was king.

    *Kinda reminds me of a nerdy news site that treats binspam as actual news on its front page. Eh... all part of the dumbing down process.

    --
    For justice, we must go to Don Corleone
  13. Can you actually replicate this article's issue? by mhwombat · · Score: 2, Insightful

    When I google for "Chocomize", my top three results are the source chocolate-making company - not spam. The fourth, the only thing remotely resembling pollution, is this searchengineland article itself.

    Also, if this is an issue, I really don't think the right solution is to hide the information.

  14. Re:SEO and Google by mcgrew · · Score: 2, Informative

    Google trends hasn't helped my sight at ALL!!

    You might try searching for "eyeglasses" or "contact lenses." That would help your sight.

  15. Re:Will only hurt google in the end by thePowerOfGrayskull · · Score: 2, Insightful

    Advertisers aren't stupid. Google ads are only worthwhile if they're actually generating revenue for the advertiser. Eventually, if they keep allowing this sort of practice, it's only going to drive down their own ad revenue (as advertisers realize they're not getting as much revenue from their ads as they once were).

    If someone clicks on an advertisement then buys, does it really matter which spam site they arrived through? There's nothing that suggests they're getting less revenue; in fact, they may be getting more since the ads themselves will be relevant to what is searched for.

  16. Web pollution via parroting by ghostlibrary · · Score: 4, Funny

    I ran into bizarre web parroting-- a site took an article about my DIY satellite from "Wired", and (best guess) ran it through an English->Chinese translator then back to Chinese->English. So we end up with sentence-by-sentence content stealing, but with its own wording, e.g.:

    "Once deployed, they can put out enough power to be picked up on the ground by a hand-held amateur radio receiver." [from Wired]

    "Once deployed, they can put out enough energy to be picked up on the belligerent by the hand-held pledge airwave receiver." [from Tubesat Gerber]

    Or this bit

    "Once the bastion of NASA and commercial satellite services, space has now become the final frontier for the do-it-yourselfer next door." [Wired]

    "Once a bastion of NASA as well as blurb heavenly body services, space has right away turn the final limit for a do-it-yourselfer subsequent doorway." [Tubesat Gerber]

    That's me, the blurb heavenly body service belligerent receiver!

    A.
    http://projectcalliope.com/ "Music from Space, Launching 2011"

    --
    A.
  17. Content Farming and Demand Media by meehawl · · Score: 2, Informative

    A US public radio show just ran a whole feature on Web 2.0 content farming. Wired also ran this piece on one of the main polluters, Demand Media, a while back, explaining how it uses algorithmically driven keyword generators that grab "hot" (ie, adclick revenue-generating) trends from, among others, sources such as Google Trends, then farms out a skeleton of an article with the required keywords to an extremely poorly paid human whose job it is to string together acceptably human-readable inter-keyword verbiage to flesh out an "article".

    --

    Da Blog
  18. The question is why can't Google fight it by snowwrestler · · Score: 2, Insightful

    Let's say you're right. Now Google has an index for cnn.com, and an index for spamdomain.com. Presumably the timestamps on the cnn.com pages are a bit earlier since it takes time for spamdomain.com to scrape and republish the content, and then for Google to index the new content on spamdomain.com.

    I'm no computer scientist but it seems that this is the sort of data mirroring that should be pretty easy to spot algorithmically. If two domains share >80% of the exact same content, de-emphasize the one with later timestamps.
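    A rough sketch of that rule, assuming each page is available as plain text plus the time it was indexed. Word shingles with Jaccard overlap are one common way to measure "shares the same content"; the field names, the sample pages and the shingle size are assumptions, and only the 80% threshold comes from the comment above.

```python
def shingles(text, k=5):
    """Return the set of k-word windows (shingles) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(text_a, text_b, k=5):
    """Jaccard overlap between the shingle sets of two texts (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, k), shingles(text_b, k)
    return len(set_a & set_b) / len(set_a | set_b)


def later_copy(page_a, page_b, threshold=0.8, k=5):
    """Return the URL to de-emphasize, or None when the pages differ enough."""
    if overlap(page_a["text"], page_b["text"], k) <= threshold:
        return None
    # Of two near-identical pages, the one indexed later is treated as the copy.
    return max(page_a, page_b, key=lambda page: page["indexed_at"])["url"]


# Invented pages; indexed_at is just a comparable number (e.g. a Unix time).
original = {
    "url": "cnn.com/chocomize",
    "indexed_at": 1000,
    "text": "chocomize lets you design your own chocolate bar online",
}
scraped = {
    "url": "spamdomain.com/chocomize",
    "indexed_at": 2000,
    "text": "chocomize lets you design your own chocolate bar online",
}
unrelated = {
    "url": "example.org/weather",
    "indexed_at": 3000,
    "text": "weather forecast for the weekend looks sunny and warm",
}

demoted = later_copy(original, scraped)
untouched = later_copy(original, unrelated)
```

    Exact shingles only catch verbatim or near-verbatim copies; text that has been reworded by machine would need a fuzzier comparison.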

    The provocative theory is that Google doesn't care which site ranks first, as long as its ads are being served on both. Or worse, that Google allows the crap to float to the top if it is carrying Google ads, and cnn.com is not.

    Is the theory right? Who knows besides Google? Perhaps it is not so easy for the algorithm to distinguish what to our minds is obvious spamming. And one of the things that Google is up-front about is that if they can't do it algorithmically, they're not interested in it.

    --
    Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.