Slashdot Mirror


Microsoft Tracks Down Mass Fake Web Pages

An anonymous reader writes "According to an article in The New York Times, Microsoft researchers have discovered tens of thousands of junk Web pages, created only to lure search-engine users to advertisements. While most of us have run across them from time to time, the company researchers have found the pages are deliberately generated in vast numbers by a small group of shadowy operators. By following the money trail, Microsoft researchers were able to track the flow from big-name advertisers to search engine spammers. Many use Google's blogspot.com to set up spam doorway pages. 'The practice has proved to be a vexing problem for the major search companies, which struggle to prevent both spammers and companies specializing in improving legitimate clients' Web traffic -- a field known as search-engine optimization -- from undermining their page-ranking systems. Surprisingly, the researchers noted that the vast bulk of the junk listings was created from just two Web hosting companies and that as many as 68 percent of the advertisements sampled were placed by just three advertising syndicators.' The report is available at Microsoft Strider Search Ranger project page."

135 comments

  1. The easy way by truthsearch · · Score: 4, Interesting

    They could have saved a lot of time and money by just visiting forums like DigitalPoint. These doorways and other spammy sites are for sale every day. It's no secret.

    1. Re:The easy way by insanemime · · Score: 5, Funny

      Well it's about time someone tracked down these spammers. I can't count how many times I was searching for porn on the internet and got an ad page. The nerve of some companies.

  2. Bring out the torches and pitchforks! by Anonymous Coward · · Score: 1, Interesting

    I was actually surprised to find their "what to do" points so simple and to the point.

  3. another ripoff by gEvil+(beta) · · Score: 3, Funny

    Man. This Microsoft project is just a ripoff of Google's Gandalf Search Wizard project...

    --
    This guy's the limit!
    1. Re:another ripoff by eldavojohn · · Score: 4, Funny

      The report is available at Microsoft Strider Search Ranger project page.

      Man. This Microsoft project is just a ripoff of Google's Gandalf Search Wizard project...
      Yeah, but let's not forget that even before that was AOL's Smeagol Browser Gollum project ...
      --
      My work here is dung.
    2. Re:another ripoff by voice_of_all_reason · · Score: 1

      Register.com among the Businesses, Melbourne IT to the Australians; Tucows I was in my youth that is forgotten, in the South ENom, in the North GoDaddy, to the East I go not...

    3. Re:another ripoff by lostboy2 · · Score: 2, Funny

      The report is available at Microsoft Strider Search Ranger project page.

      Man. This Microsoft project is just a ripoff of Google's Gandalf Search Wizard project...

      Yeah, but let's not forget that even before that was AOL's Smeagol Browser Gollum project
      When I was a kid, all we had was the U of Minnesota's Sauron Gopher Overlord project...
    4. Re:another ripoff by CurtisAutery · · Score: 2, Funny

      The report is available at Microsoft Strider Search Ranger project page.

      Man. This Microsoft project is just a ripoff of Google's Gandalf Search Wizard project...

      Yeah, but let's not forget that even before that was AOL's Smeagol Browser Gollum project

      When I was a kid, all we had was the U of Minnesota's Sauron Gopher Overlord project...
      You had Gopher as a kid!? Man, we were stuck with local BBS Sam & Frodo's ASCII Express Second Breakfast project.
    5. Re:another ripoff by jagdish · · Score: 1

      yeah, but before that there was Chuck Norris' Walker Texas Ranger project.

    6. Re:another ripoff by The_mad_linguist · · Score: 2, Funny

      You had the Second Breakfast? We only had Bilbo's Punchcard Breakfast.

  4. Great by BadERA · · Score: 1

    I fully expect to see an improvement in my search results ... for about five minutes, until the SEO spammers crank out their next method of making the Internet less efficient.

    --
    I am, therefore you think.
    1. Re:Great by jackv · · Score: 1

      Very ambivalent, i.e. the difference between a pure doorway page and an "apparent" information page with advertisements all over it.

  5. Why? by Herkum01 · · Score: 5, Interesting

    Is it really cheaper to use Page Ranking companies instead of just well, PAYING for an advertisement on Google or MSN or something?

    1. Re:Why? by Frosty+Piss · · Score: 5, Insightful

      Is it really cheaper to use Page Ranking companies instead of just well, PAYING for an advertisement on Google or MSN or something?
      Yes, or they wouldn't do it.
      --
      If you want news from today, you have to come back tomorrow.
    2. Re:Why? by Anonymous Coward · · Score: 1, Insightful

      That's incredibly naive. You don't honestly think that all companies work at 100% efficiency do you?

    3. Re:Why? by fruey · · Score: 5, Informative

      The average return on investment on Search Engine Optimisation (generally: increasing your search position on specific keywords relevant to your business) can be about 10x more than the return on keyword purchasing, which can cost 0.30c - several dollars. Every click costs money.

      Once you've optimised to your keywords in "natural search" e.g. *free* results, then your investment keeps paying (you need to maintain positions of course, but this is lower cost, especially if you're in a niche) whereas in paid advertising you have to keep giving money to Google and, in competitive industries, your cost per click will be subject to significant inflation...

      --
      Conversion Rate Optimisation French / English consultant
    4. Re:Why? by brunascle · · Score: 1

      it may not be cheaper, but it may be more effective. search engines generally identify results that were purchased, and i'm sure a user is less likely to click on it if they see that. the clients of these companies are buying their way into the results without having to be in that section.

    5. Re:Why? by hey · · Score: 1

      Sometimes businesses do stuff that doesn't work out -- they go bankrupt everyday.

    6. Re:Why? by terraformer · · Score: 3, Insightful

      It is also more effective. How many times do you click on ads? Now how many times do you click on search results? 'nough said...

      --
      Who are you? The new #2 Who is #1? You are #617565. I am not a number, I am a free man! Muhahaha.
    7. Re:Why? by monk.e.boy · · Score: 1

      People trust organic search results more, so even if they were more expensive to buy than paid for adverts, you'd get more bang for your buck.

      People who click on adverts are less likely to 'convert' (buy an item, sign up for a newsletter etc) than people who click on a natural search result.

      Spam sucks bad, but if you can get into the top 20 of Google's natural search, you have hit gold.

      monk.e.boy

    8. Re:Why? by Anonymous Coward · · Score: 0

      No, it doesn't follow. It's the belief that makes money for the rankgamers.

  6. "time to time"? by Frosty+Piss · · Score: 4, Insightful

    While most of us have run across them from time to time...

    Time to time? For me it seems like more than 50% when I scan the search results. Maybe less, maybe more, but certainly more than "time to time". For many of my searches, I may not find anything truly relevant until the second and third page. People have learned how to play Google to the point where more and more Windows Live is starting to give better results (scary!).

    --
    If you want news from today, you have to come back tomorrow.
    1. Re:"time to time"? by Hoi+Polloi · · Score: 1

      Maybe the best thing to do is to automatically skip to the 2nd page of results and write off the first page as search engine spam.

      --
      It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
    2. Re:"time to time"? by Anonymous Coward · · Score: 0

      Really? Please provide, say, 1 query where Windows Live gives less spam than Google.

    3. Re:"time to time"? by onepoint · · Score: 1

      No I don't agree with this, people like myself have businesses that have optimised web sites ( I am in the Miami rental market ), we target exactly a few words and nothing more. most of my business is organic and I don't rank any higher than 5 ( would love to have 2 or 3 rank) but I get enough traffic that I am happy and keep my little building full.

      I hate those spam-my web sites ( the top 4 other sites ) because they keep people away from my site and a few others that have vacation rentals here in Miami.

      Onepoint

      --
      if you see me, smile and say hello.
    4. Re:"time to time"? by hobo+sapiens · · Score: 5, Funny

      I agree. I run a small business out of Nigeria that helps people in unfortunate situations recover lost money, and we rely on upfront investments from Americans. We always promise a good cut of the money to our American investors. This search engine spam really puts the hurt on my business, too.

      --
      blah blah blah
    5. Re:"time to time"? by loafing_oaf · · Score: 1

      True enough. I recently switched back to Yahoo! search after about five years of nothing but Google. I don't know if the results are any better, but it sure is a good change of pace.

      --
      Always someone has power over you. The thing to consider is this: Is the power good, or bad?
    6. Re:"time to time"? by bendodge · · Score: 1

      I have never seen results that bad. You must be searching for porn, where spam is to be expected.

      --
      The government can't save you.
    7. Re:"time to time"? by onepoint · · Score: 1

      you win, I will no longer look at the first page. LOL

      --
      if you see me, smile and say hello.
    8. Re:"time to time"? by Frosty+Piss · · Score: 2, Insightful

      I have never seen results that bad. You must be searching for porn, where spam is to be expected.

      I beg your pardon... "Erotica" is a perfectly legitimate subject.

      --
      If you want news from today, you have to come back tomorrow.
    9. Re:"time to time"? by beckerist · · Score: 1

      http://www.google.com/search?num=100&hl=en&safe=off&q=foxpro+%22close+button+disable%22
      -- 2 of the 3 are link farms
      http://search.msn.com/results.aspx?q=foxpro+%22close+button+disable%22&FORM=MSNH
      -- both of the links are valid.

      Luckily, I just happened to have searched for this yesterday!

  7. Ironically by Rik+Sweeney · · Score: 1

    they harvested most of their results from Google.

    1. Re:Ironically by Anonymous Coward · · Score: 0

      Whenever I do a search any more around 60% of the first few pages are nothing but spam.
      Strange, since Google works just fine for me; I don't remember the last time I got a spammy hit, but it's certainly at least many months ago. Maybe it is because I search for different things to you, but even so.
    2. Re:Ironically by Anonymous Coward · · Score: 0

      Same for me, I can't even remember a time in a LONG time where I got enough spam results on google that I actually noticed them. Maybe it's just because we search better? Or don't search for things like "viagra casinos" and "free sex!!!"?

    3. Re:Ironically by Goaway · · Score: 1

      It's because he's a paid shill.

  8. Nice work by MysteriousPreacher · · Score: 4, Informative

    There's actually some pretty decent research here. The site cloning report is a good read.

    http://research.microsoft.com/SearchRanger/Spam_Attack_by_Website_Clones.htm

    The cloning of popular blogs has been a scourge for a while now, both for manipulating search engines and good old fashioned advertising - using someone else's content to draw visitors in.

    --
    -- Using the preview button since 2005
    1. Re:Nice work by onepoint · · Score: 2, Interesting

      You are 100% correct that Google does help clean up its searches. I do about 100 web searches a day to learn stuff, and every time I come across spammy results I send Google a note. I think it's working, because the next week when I want to learn more on a topic it's much improved.

      --
      if you see me, smile and say hello.
    2. Re:Nice work by MysteriousPreacher · · Score: 1

      Yeah, Google are pretty good at cleaning up. My blog got hammered by a Russian spammer and after some complaints, his sites began to vanish from Google. Good thing really since his host (Everyone's Internet) had no interest in shutting him down - despite the fact that he was using some pretty nasty hidden code on his sites to spam forums and blogs whenever his pages were loaded using IE.

      --
      -- Using the preview button since 2005
    3. Re:Nice work by HomelessInLaJolla · · Score: 1
      While I'll forever be a free software advocate I do need to give proper recognition for a good and true endeavor by the other team:

      By following the money trail, Microsoft researchers were able to track the flow from big-name advertisers to search engine spammers.
      Someone needs to take a similar approach to find out where taxpayer money has been going while we've been in Afghanistan and Iraq.
      --
      the NPG electrode was replaced with carbon blac
    4. Re:Nice work by hankwang · · Score: 1

      You are 100% correct that Google does help clean up its searches.

      Hmm, I always had the impression that they use the feedback to seed a database of pages to test their spam-removal algorithms on. They claim that they "prefer automated solutions rather than manual removal".

      One of my big annoyances is sites that are spidered by Google but require mere mortal visitors to purchase a subscription. For example, searches on certain technical subjects often return pages with IEEE publications - purchase this article for US$ 20. And for a long time, webmasterworld.com had been blocking most of the world except the US. Extremely annoying, to search for any webmaster-related subject and WMW is the #1 hit and being unable to see the page. I reported this type of sites several times, but they were never removed.

    5. Re:Nice work by onepoint · · Score: 1

      In reference to the WebmasterWorld issues with search engines: this has been discussed more than once. basic registration gets you most of what you want to learn; the paid subscriptions get you into the special areas.

      you have to understand that his servers were consistently being spider-ed and his bandwidth cost were way high. kill all spiders was his first thing then he made special changes.

      Onepoint.

      --
      if you see me, smile and say hello.
    6. Re:Nice work by hankwang · · Score: 1

      you have to understand that his servers were consistently being spider-ed and his bandwidth cost were way high. kill all spiders was his first thing then he made special changes.

      That's all fine with me, but then block Googlebot as well. Allowing Googlebot and not allowing 80% of the world population is called cloaking in my dictionary and Google should have removed the whole site from the index for that reason.

    7. Re:Nice work by onepoint · · Score: 1

      he did block Google. that's well documented.

      >>Allowing Google bot and not allowing 80% of the world population is called cloaking in my dictionary

      no, if you read all the issues, most people could see his site, very few could not because scrapers were coming from those IP's. and 80% ... maybe 30% at tops and north america - europe - had full access.

      anyway here is the view point from brett : http://blog.searchenginewatch.com/blog/051128-161606

      --
      if you see me, smile and say hello.
  9. Then Microsoft realized... by physicsboy500 · · Score: 5, Funny

    It's coming from inside the building!!!

    --
    The original generic sig.
    1. Re:Then Microsoft realized... by Joe+Snipe · · Score: 1

      If anyone would know it would be you two: Is it possible to get a +5 Offtopic?

      --
      Sometimes, life itself is sarcasm...
    2. Re:Then Microsoft realized... by Anonymous Coward · · Score: 0

      Has BSG been a yawner this season or what? One more episode to go...


      I have never been able to watch it at all. Horrifically contrived "empowered" female characters make me sick. If I were a woman I would be insulted, as a man I'm just disgusted.
    3. Re:Then Microsoft realized... by Anonymous Coward · · Score: 0

      No need to make up the comedy when the paper includes lines like "Web Patrol with Search Monkeys."

  10. How does this help them? by Paul+Crowley · · Score: 1

    PageRank is designed to be resistant to exactly this sort of attack. The amount of Google karma you get is proportional to the karma of the pages that link to you. Creating lots of pages with no karma that link to you therefore shouldn't do you any good at all. Why do they bother?

    Theories:

    (1) There's a subtle way that it helps I haven't spotted yet, perhaps to do with non-PageRank elements of Google's search ordering

    (2) This is all done by a very few companies because they are the few that don't understand PageRank and therefore don't realise it won't help...

    1. Re:How does this help them? by jandrese · · Score: 3, Insightful

      It works because you don't realize the size of this thing. They're talking about millions of fake pages here, lots of them pointing at other fake pages to raise their pagerank so they can in turn point at yet more pages. You would think Google would have someone seeking these kind of sites out and applying a discount on their domain though (although when that happens the spammers just move on anyway).

      --

      I read the internet for the articles.
    2. Re:How does this help them? by Paul+Crowley · · Score: 2, Interesting

      Er, that sounds like the old saw "we lose a penny on each one sold, but we make it up in volume".

      If there's only so much karma going into your pages, there's only so much karma they have to give, no matter how huge it is. A trillion pages pointing at my page won't increase its karma, if those trillion have no karma to give.

    3. Re:How does this help them? by volsung · · Score: 1

      Every page has to start with some small, intrinsic amount of karma, otherwise there would be none to pass around. By creating enough bogus pages, you can aggregate some amount of link karma to bestow on the site of your choosing. In principle, I guess this would devalue everyone's PageRank too (kind of like printing money), but for a while it could be profitable.

      The second hole is the popularity of websites with user-generated content. Lots of highly ranked websites (like /. in fact) allow anyone, or almost anyone, to add arbitrary links to pages, thereby redirecting some small fraction of the sites possibly large link karma to any place they want. This can also be used to gather karma from insignificant websites (like the thousands and thousands of semi-dead blogs with comments enabled) in mass quantities. It's like the urban myth of the bank scam where someone gets rich stealing all the fractional cents left over in interest calculations.

      Of course, these are only problems for the original PageRank algorithm. It's pretty clear that Google has modified in several ways to fight these problems, such as through the introduction of the "nofollow" link attribute.
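To make the "nofollow" point concrete, here is a minimal sketch of how one might audit a page for links that still pass link credit versus links marked rel="nofollow". It only uses Python's standard html.parser; the class name, function name, and sample URLs are hypothetical, and it says nothing about how any search engine actually treats the attribute internally.

```python
from html.parser import HTMLParser


class LinkAuditor(HTMLParser):
    """Collect anchor hrefs, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []   # links without rel="nofollow"
        self.nofollow = []   # links carrying rel="nofollow"

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel is a space-separated token list, e.g. rel="external nofollow"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    """Return (followed, nofollow) lists of hrefs found in the HTML string."""
    auditor = LinkAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.followed, auditor.nofollow


# Hypothetical comment markup: one ordinary link, one marked nofollow.
sample = (
    '<p><a href="http://example.com/a">ok</a> '
    '<a rel="nofollow" href="http://example.com/b">ad</a></p>'
)
followed, marked = audit_links(sample)
print(followed, marked)
```

A site that accepts user-submitted links could run something like this over rendered comments to spot anchors that were left without the attribute.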

    4. Re:How does this help them? by Paul+Crowley · · Score: 1

      Every page has to start with some small, intrinsic amount of karma, otherwise there would be none to pass around.

      There has to be a "root set", but that root set doesn't have to consist of all pages. There's some evidence that it includes all top-level pages, because the Scientologists experimented with creating zillions of top-level domains to increase their Google ranking. But ordinary pages, as I understand it, have no intrinsic karma at all.

      Yeah, blog SEO spam is a great evil irritant. I do understand how *that* helps them.

    5. Re:How does this help them? by FooAtWFU · · Score: 2, Insightful
      Presumably some of these trillion pages have a karma greater than or equal to epsilon.

      The scummiest part of it all is that some of the pages in question will be on domains that someone let expire and someone else immediately snatched up. They get their PageRank from the sites that linked to the formerly legitimate domain. And if that was your domain name, and you only let it expire accidentally, well, sucks to be you. :(

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
    6. Re:How does this help them? by jdoeii · · Score: 1

      Creating lots of pages with no karma that link to you therefore shouldn't do you any good at all

      That's not how it works. You assume it's a zero-sum game, but it's not. Every page gets some weight even if no one links to it. It's small, but it's positive. When one page links to another, the weight of the source page is reduced less than the target page gains. So, here is the business plan:
      1. Make a lot of unique pages (G in the PR calculation joins identical or nearly identical pages)
      2. Crosslink them in a non-obvious way (i.e. not A <-> B but A -> B -> C -> A).
      3. Sell ads on high PR pages with a lot of traffic from G or Y
      4. Profit!
      It really works.
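The claim above can be tried out on paper with the textbook PageRank iteration. The sketch below assumes the commonly published form with a uniform baseline term, PR(p) = (1 - d)/N + d * sum(PR(q)/outdegree(q)), and a damping factor d = 0.85; the page names and the ten-page farm are made up for illustration, and whether Google's production ranking still behaves like this is not something this thread can settle.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by repeated iteration over a dict of page -> outlinks."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1.0 - damping) / n + damping * dangling / n for page in pages
        }
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: "about" and "target" are linked identically.
baseline_web = {
    "home": ["about", "target"],
    "about": ["home"],
    "target": ["home"],
}

# Same site plus a ring of farm pages (A -> B -> C -> ... -> A),
# each of which also links to "target".
farm_size = 10
farmed_web = dict(baseline_web)
for i in range(farm_size):
    farmed_web[f"farm{i}"] = [f"farm{(i + 1) % farm_size}", "target"]

baseline = pagerank(baseline_web)
farmed = pagerank(farmed_web)
print("without farm:", baseline["about"], baseline["target"])
print("with farm:   ", farmed["about"], farmed["target"])
```

Under these assumptions "about" and "target" tie in the baseline web, while the farm's baseline weight pushes "target" ahead once the ring is added. If ordinary pages really received no baseline weight, as argued elsewhere in this thread, the farm would have nothing to pass on.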

    7. Re:How does this help them? by Paul+Crowley · · Score: 1

      Indeed, and this has happened to me.

    8. Re:How does this help them? by Paul+Crowley · · Score: 1

      Every page gets some weight even if no one links to it. It's small, but it's positive.

      That's not the impression I'm under - I thought that most pages were not part of Google's "root set". See my reply here:

      http://slashdot.org/comments.pl?sid=227331&threshold=1&commentsort=3&mode=thread&pid=18413697#18413787

    9. Re:How does this help them? by brunascle · · Score: 1

      our site is actually working with one of these companies (on the receiving end of the paycheck, though). they want to put "ads" on our site that link to other sites. they dont care at all what the ads look like or where they are on the page, but just that there's a link to another site. and the link has to be search-indexable (no javascript). all they care about is boosting the rank of their clients, not the number of clicks.

    10. Re:How does this help them? by jdoeii · · Score: 1

      That's not the impression I'm under - I thought that most pages were not part of Google's "root set"

      I understand that you have such an impression, but that's a wrong impression. Every page gets a non-zero weight by default. If you think about it you will see that your scheme just would not work: emerging subjects/sites would stay with zero PR for a long long time until links to them propagate all the way to the "roots".

    11. Re:How does this help them? by Paul+Crowley · · Score: 1

      Since the answer is a closely guarded secret within Google, it's always fun to be contradicted by someone speaking in authoritative tone of voice who knows as little about this as I do :-)

      You're mistaken about your argument against, in any case; PageRank itself is public information, so I can tell you that it does not have the property you assign to it. There's a delay between a link being made and Google spidering and discovering it, but the eigenvector calculation at the heart of PageRank will propagate karma along the links as fast as it needs to go.

    12. Re:How does this help them? by sconeu · · Score: 1

      Happened to Andrew Koenig (or maybe his publisher Addison-Wesley) -- The website for Accelerated C++ (http://www.acceleratecpp.com ) was either hijacked or expired, and snapped up by some lowlife for about 2 months before it got fixed.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    13. Re:How does this help them? by sconeu · · Score: 1
      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    14. Re:How does this help them? by jdoeii · · Score: 1

      who knows as little about this as I do

      How do you know that?

      I can tell you that it does not have the property you assign to it

      The delay I mentioned is due to links being made, not links being discovered. Think about some small community of scientists making an almost closed cluster of sites about their niche research subject.

    15. Re:How does this help them? by Paul+Crowley · · Score: 1

      Oooh, hints of dark and secret knowledge! Those are always very impressive.

      The delay I mentioned is due to links being made, not links being discovered. Think about some small community of scientists making an almost closed cluster of sites about their niche research subject.

      There is simply no way for Google to know that those pages are any good until people start linking to them. Fortunately it doesn't take long - for example, the scientists will get karma from the links from their institution front pages, which in turn get karma from the other respected scientists at those institutions.

    16. Re:How does this help them? by Smuffe · · Score: 1

      Here's hoping you add "nofollow" to those links...

    17. Re:How does this help them? by Anonymous Coward · · Score: 0

      The blogspot blogs mostly do JavaScript redirects and have nothing to do with PageRank. The reason it's done: spam 20.000 sites with links to hundreds of your blogs. There is always a chance someone clicks. The click is routed to a ppc search engine. Some I have seen load JavaScript from other sites. Couldn't be bothered to check out what it does.

      Bottom line: Google has been aware of the size - est. (mine) 30.000+ redirecting blogs in December - of this issue for several months now.

    18. Re:How does this help them? by jdoeii · · Score: 1

      Oooh, hints of dark and secret knowledge!

      It's only dark and secret for a newbie

      There is simply no way for Google to know that those pages are any good until people start linking to them.

      Exactly, except turned upside down. It's "there is no way for Google to know that those pages are spam", so they get positive weight until proven otherwise.

      from the links from their institution front pages

      A few links will make the cluster discoverable by crawlers but won't make a difference for PR. It's the cross links within the cluster that make the difference.

      I am sharing a first hand knowledge. I've seen it done this way. You seem to be continuing this conversation just for the sake of argument. But others reading this thread may actually learn something useful.

    19. Re:How does this help them? by Vintermann · · Score: 1

      Still, they need non-fake input to stay afloat. A billion links from worthless sites won't do my pagerank much good.

      --
      xkcd is not in the sudoers file. This incident will be reported.
    20. Re:How does this help them? by Paul+Crowley · · Score: 1

      Hints of dark and secret knowledge backed by insults! I'm more impressed by the minute.

  11. And? by jafiwam · · Score: 2, Interesting

    Ok. Forgive me if MS just discovering this makes me think they just entered 2002. That crap is _not_ new folks.

    On the other hand, what idiot spouts off about two hosting companies being responsible without naming them? Seriously. This isn't Fark, you can't get kicked off for calling some asshole out.

    1. Re:And? by Sirch · · Score: 3, Insightful

      ... but you can get sued for libel if you're wrong.

    2. Re:And? by gbjbaanb · · Score: 1

      but the best bit: Phillip Rosenthal, chief technology officer of one of the companies, ISPrime, an Internet services company based in New York, said the activity had been traced to a single customer and violated the company's acceptable-use policy. He said the company's relationship with the customer, whom he would not identify, had been severed

      so, one down, one to go. It's still a shame the offending company was not named, but I imagine it doesn't exist anymore, wound up and is now reborn as a differently named one.

  12. Re:The easiest way by TheMeuge · · Score: 3, Funny

    Quick, somebody make a few thousand clones of this report.

  13. And in other news... by sconeu · · Score: 3, Funny

    Microsoft researchers have discovered tens of thousands of junk Web pages, created only to lure search-engine users to advertisements.

    In other news, Microsoft researchers have discovered that the sky is blue and that water is wet.

    --
    General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    1. Re:And in other news... by Smuffe · · Score: 3, Funny

      Microsoft researchers have discovered that the sky is blue

      I live in London, you insensitive clod!

    2. Re:And in other news... by et764 · · Score: 1

      In other news, Microsoft researchers have discovered that the sky is blue and that water is wet.

      Discovering that the sky is blue is quite a discovery for a company based near Seattle. They should have known about water though, given all the rain they get.

  14. Are they the last ones to discover this? by softwareengineer99 · · Score: 0, Flamebait

    It amazes me how dumb Microsoft search researchers are as they are probably the last ones to discover that the majority of spam web pages are created by a handful of shadow operators. If I were in charge of the researchers who made this finding so late, I would have them dismissed promptly as this finding is too little, too late.

    1. Re:Are they the last ones to discover this? by Anonymous Coward · · Score: 0

      OK, rocket scientist, you tell us who those "shadow operators" are. If dumb MS researchers could do it, certainly you can too.

      I'm all ears and eyes.

  15. Obligatory Bill Hicks by Thaelon · · Score: 4, Funny
    Obligatory Bill Hicks...

    If you work in advertising, kill yourself.
    --Bill Hicks - Another Dead Hero
    --

    Question everything

    1. Re:Obligatory Bill Hicks by hazah · · Score: 1

      Good ol' Hicks... sigh.

  16. Bad neighborhoods by condour75 · · Score: 2, Interesting

    Google is already developing methods to deal with clusters of these fakes. Usually they're scraping web directories and databases. I've seen a lot of this lately, searching for dental hygiene schools for my girlfriend. Usually they're linking to each other, even if they're huge clusters. Legit SEO guys (yes, there are consultants who actually try to get your site linked legitimately and by hand) call these areas "bad neighborhoods". Whatever Google's doing, though, clearly isn't enough, and a lot of these guys are using adsense to make money. Martinibuster's got a few good links on the subject.

    1. Re:Bad neighborhoods by Anonymous Coward · · Score: 0

      SEO means faking relevance, and means tricking people. There is no such thing as legit SEO.

  17. How hard is it to find.... by Anonymous Coward · · Score: 0

    www.about.com?

    On another note, I've been wondering, based on results I see fairly regularly, whether it is possible for a site to dynamically produce a page based on the google search that it is linked from.

    When I am looking at search results I often hit pages that look like they were designed to match exactly my query, but are full of meaningless high level fluff, ads and links.

    1. Re:How hard is it to find.... by Anonymous Coward · · Score: 0

      On another note, I've been wondering, based on results I see fairly regularly, whether it is possible for a site to dynamically produce a page based on the google search that it is linked from.

      When I am looking at search results I often hit pages that look like they were designed to match exactly my query, but are full of meaningless high level fluff, ads and links.


      Easy as pie:

      http://www.google.com/search?q=page+referrer+php

      Hit #1. Just as easy in Perl or ASP.
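
      For anyone who doesn't want to click through, here is a minimal Python sketch of the same idea: pull the search words out of the referrer so a page can be built around them. The function name is made up for illustration, and it assumes the referrer carries the query in a "q" parameter, as the Google URL above does.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referrer(referrer):
    """Return the search words found in a referrer URL, or [] if none."""
    parsed = urlparse(referrer)
    # Assumption: only referrers whose host contains "google." are of interest.
    if "google." not in parsed.netloc:
        return []
    # parse_qs decodes '+' and %XX escapes; 'q' holds the query text.
    values = parse_qs(parsed.query).get("q", [])
    return values[0].split() if values else []


print(search_terms_from_referrer("http://www.google.com/search?q=page+referrer+php"))
```

      A page generator would then only have to drop those words into its title and headings, which may be why such pages seem to match a query exactly.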

      P.S. I don't get your comment on about.com; it's one of the spammiest sites out there.

      P.P.S. The 20-minute delay in this reply is brought to you by Slashdot's asinine policy:

      Slow Down Cowboy!

      Slashdot requires you to wait between each successful posting of a comment to allow everyone a fair chance at posting a comment.

      It's been 20 minutes since you last successfully posted a comment


      Pretty lame way to reduce server load, if you ask me.
    2. Re:How hard is it to find.... by Anonymous Coward · · Score: 0


      Easy as pie:

      http://www.google.com/search?q=page+referrer+php

      Hit #1. Just as easy in Perl or ASP.


      Yeah, I knew about that. What I don't get is how sites can get pages that don't exist until I click on them listed as results in the search.


      P.S. I don't get your comment on about.com; it's one of the spammiest sites out there.


      My original comment about about.com was that whole site is EXACTLY what the article is talking about.

      about.com is a search engine "honeypot" that adds another step in between your search and the useful data, while spamming you as you pass through.

      Fortunately it is easy to see that a search result links to about.com, so I have learnt to not click on those results.

      If I could just get google to filter them out...

  18. Here are some more by Anonymous Coward · · Score: 0

    You can see how they make them: fed by Digg, obviously.
    (found via Digg's "who blogged about this" feature, remove f- from the start of the url)

    f-cartoons-plugin.com/blog/
    f-www.primenewsblog.com/
    f-fatmobil.com/blog/
    f-www.cartoonsfans.com/blog/
    f-searchroads.com/blog/

  19. A few years ago... by AliasTheRoot · · Score: 3, Interesting

    ...a friend of mine figured he could get great Google listings by autogenerating trashy link farm pages. He had the top 1000 porn search terms, all cunningly misspelled (e.g. "Brittney Spares"), and hundreds of thousands of static pages all linking to each other across a bunch of subdomains. For about a year we reckoned he had some stupid percentage of all porn listings in Google, and in that time he made around $1,000,000 from banner clicks. Eventually Google caught on and blocked his sites en masse, but he'd made enough to buy some property by then.

  20. Microsoftie wearing a white hat? by CodeShark · · Score: 5, Insightful

    I just finished reading how much the Strider group at M$ has accomplished and how, and it is rather amazing. They lifted the covers off typo-domain squatters exploiting Google's programs, built a progressive honeypot setup that detects which levels of XP are attackable by different malware attacks (up to and including reporting zero-day exploits if the latest "patch hardened" machine is exploited), and now this project. Even better, they are publishing the "how", and any OS (e.g. Mac OS or any of the Linux distros) could benefit by using similar approaches on even more machines.

    So -- from an admitted open source advocate -- here's a rare kudos to the giant in Redmond for keeping a "white hat" and his group -- and letting them work.

    --
    ...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
    1. Re:Microsoftie wearing a white hat? by TheViewFromTheGround · · Score: 2, Interesting

      I agree. Whatever else you say about MS, and there's lots to say, they seem to have given their security researchers a lot of freedom, and because of their size and power they have the resources and brainpower to tackle these problems in pretty cool ways. The sad thing, as with much of what comes out of MS, is that you see these really smart, awesome people doing great work, but when it comes to taking their own advice, you can see quite directly the way that the vast bureaucracy and Microsoft's avaricious corporate culture corrupt the good work.

      Case in point is IE7. If you look at the IE7 development blogs, you see some good ideas from people who by and large wanted to do good by the web development community. Yet the IE7 that was delivered to consumers can be charitably described as "disappointing", and less charitably described as a "watered-down piece of shit."

      --
      Online citizen journalism from the inner city: The View From The Ground
    2. Re:Microsoftie wearing a white hat? by Spy+Hunter · · Score: 1

      Microsoft Research has always done great things. Check out their graphics research or their Singularity OS. Microsoft Research is almost like a completely independent entity.

      --
      main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
  21. What adverts? by b.thompson · · Score: 0, Offtopic

    I just use Firefox with the Adblock & Filterset.G add-ins. I don't see any ads to click on.

  22. Nice work by kad77 · · Score: 3, Interesting

    Thanks for an informative post. Beats the typical whiny M$ iz S4T4|\| crap.

    Google does keep up, but quietly. Anecdotally, last week I was searching for a certain spec ARM9 dev board (the VULCAN-Lite) with USD also as a search term, and all kinds of fake keyword sites and Eastern Bloc bride services were in the top 20 results.

    I sent Google feedback with my search terms (VULCAN-Lite +USD), explained what spam was popping up, and as I write this comment a few days later-- the Google search comes back clean (empty for +USD, no spam in first 30 results for VULCAN-Lite). They apparently listen and respond to random user feedback pretty quickly.

  23. OK, where's the anti-Linux angle THIS time? by Anonymous Coward · · Score: 0

    I'm going to find a copy of this list and check it, fully expecting to find Linux sites in it.

    That's all it's been from Microsoft lately. Microsoft, the anti-Linux company (who also sells some software on the side).

  24. Uncool MS Research by Anonymous Coward · · Score: 0

    As far as research divisions of big companies go, MS's is the most uncool by miles. I have yet to see any announcements coming from MS Research that evoke anything other than a yawn - this announcement being a good example. This can't be said of HP, IBM, the old AT&T, the old SUN, etc.

    One wonders if MS hires talented people only in order to prevent them from doing interesting research for other companies, not in order to do interesting research for them.

  25. is this research reliable by msblack · · Score: 1

    I read the research paper a couple days ago after reading about it in the NY Times. Seeing how this research is Microsoft-funded and implicates Google, claiming their syndicators are in cooperation with the spammers, one has to question researcher bias. I'd like to see a peer-reviewed and independently verified article before accepting these outrageous claims. Note that the researchers focused on a few keywords and strictly limited the scope of their efforts. This doesn't mean the findings are untrue; it just calls their methodology into question.

    --
    signature pending slashdot approval
    1. Re:is this research reliable by Anonymous Coward · · Score: 0

      The paper, having been accepted to the WWW conference, IS peer reviewed.

  26. Firefox is good. by wetelectric · · Score: 2, Informative

    Firefox has an extension called CustomizeGoogle. It adds a 'filter' option to a Google results page, which lets you filter out the sneaky pages that hijack your search query.

    --
    Most people have no idea what they are doing, and are silently panicking on the inside.
  27. I often wonder... by dbmasters · · Score: 1

    I look at these situations much like I look at people who cheat welfare systems and such. So many people spend so much time figuring out how to cheat a system; I wonder how much of a difference it would make to the net outcome if that same time were spent trying to work the system the right way...

    --
    dB Masters
  28. What's the point? by brouski · · Score: 1

    What was the point of this effort? To improve its own search results? To show up Google?

    --
    Proud member of the American Non Sequitur Society. We might not make much sense, but boy do we love pizza!
  29. I know what they used to find those sites :) by Anonymous Coward · · Score: 0

    (just posted today on the reg.)
    Microsoft's search excels in spreading malware
    http://www.theregister.co.uk/2007/03/20/windows_live_malware/

  30. This is research? by QuietLagoon · · Score: 0, Flamebait

    No wonder Microsoft never has any real innovation.

  31. To play devil's advocate... by Gorkamecha · · Score: 1

    What if I want MY page to just be a sea of ads? I set up the code, I did the work, why can't I show what I want? It's not my fault that Google misreads my page or gives someone else a higher ranking because of it. I'm sure there is a whole boatload of sites that could be deemed "junk", but out here in the digital wild west, I'm free to do what I want on my 10MB of free space....Aren't I?

    1. Re:To play devil's advocate... by Stanistani · · Score: 1

      You're free to do what you want.

      So is Google.

      Selah.

    2. Re:To play devil's advocate... by Anonymous Coward · · Score: 0

      Yes, but equally, the search engines aren't obliged to pay any attention to you.

  32. Go get them... by Anonymous Coward · · Score: 0

    Someone go get those bloody bastards, and shoot them dead. The Internet won't be that much of a safer place (other bastards will rise to replace them), but every step taken to sanitize it will be a welcome one.

  33. Why not discard hidden links? by glindsey · · Score: 1

    Here's a thought: why can't search spiders be a bit smarter, and discard any links on a page that are set to "display: none"? Or, better yet, why not flag them as potentially abusive? I realize there are legitimate reasons for hiding a link with the CSS display attribute if you're using dynamic HTML, but I'd venture to guess the majority of hidden links are used for search engine manipulation.

    Of course, the scammers would just try some other tactic -- perhaps hiding links in Z-layers behind opaque graphics -- but it is always an arms race, isn't it?
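
    To make the idea concrete, here is a rough Python sketch of what such a spider check might look like. It is only an illustration under narrow assumptions: it looks at the inline style attribute of each link and nothing else, so links hidden through a stylesheet, a hidden parent element, or script would slip past it. The class and function names are made up.

```python
import re
from html.parser import HTMLParser

# Matches "display:none" with optional whitespace, in any letter case.
HIDDEN = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class LinkSorter(HTMLParser):
    """Collect link targets, separating those hidden by an inline style."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # Valueless attributes come through as None, hence the "or".
        style = attributes.get("style") or ""
        if HIDDEN.search(style):
            self.flagged.append(href)
        else:
            self.visible.append(href)


def sort_links(html):
    parser = LinkSorter()
    parser.feed(html)
    parser.close()
    return parser.visible, parser.flagged


page = '<a href="/real">real</a> <a href="/farm" style="display: none">farm</a>'
print(sort_links(page))
```

    Flagging rather than silently discarding keeps the legitimate dynamic-HTML cases from being thrown away outright.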

    1. Re:Why not discard hidden links? by oni · · Score: 1

      discard any links on a page that are set to "display: none"

      I bet the spammers would just start using really obfuscated javascript to set the style = display:none. So, you'd be starting an arms race where search spiders would have to start processing javascript and then the spammers would just come up with something else (maybe set the z-index low so that the links can't be seen). It just doesn't seem like it's worth the effort.

      I use display:none all the time by the way. The left column of slashdot has those boxes with Help, Stories, About (etc). That would be a great example of a place where you could hide the links under those sections and then roll them out when the user clicks or hovers.

  34. Comment removed by account_deleted · · Score: 0, Redundant

    Comment removed based on user account deletion

  35. Comment removed by account_deleted · · Score: 1

    Comment removed based on user account deletion

  36. And here's another ... by Anonymous Coward · · Score: 0

    "According to an article [in the] New York Times, Microsoft researchers have discovered tens of thousands of junk Web pages ..."

    There are plenty of pages pushing junk out there. Here's one I came across just today:

    http://onecare.live.com/standard/en-us/default.htm

  37. GOOGLE SUPPORTS CYBER SQUATTERS by Anonymous Coward · · Score: 0

    The site in question seems to have changed from Google AdSense, but when I complained about it (WHEN THEY WERE USING GOOGLE ADSENSE) they basically said it was not against their policy. I think Microsoft knows that Google is getting lots of revenue from cybersquatters, so that is why they are going after them.

    Original Message Follows:

    Subject: Other
    Date: Fri, 22 Sep 2006 18:20:26 -0000

    Hi there, I came across a site by accident and I notice that it seems to
    direct link to adsense advertisers via a direct link as opposed to the
    hidden javascript that is normally the case. Also the page seems to be
    made up entirely of adsense ads which I thought was against adsense
    policy.

    Can I do this too? http://paypall.com/

    BTW I see lots and lots of sites like this and they all seem to take
    advantage of mistakes when typing in website urls - this seems EVIL to
    me...

    Regards,

    Hi,

    Thank you for your email. It appears that paypall.com is a member of our AdSense for Domains (AFD) program. Because we respect the confidentiality of all publishers, we cannot disclose any additional details of our relationships with other sites.

    If you own sites that generate more than 750,000 page views per month you may be eligible for our AFD program. If you meet this requirement and you'd like to learn more about the program, please visit http://www.google.com/domainpark .

    For additional questions, I'd encourage you to visit the AdSense Help Center (http://www.google.com/adsense_help), our complete resource center for all AdSense topics. Alternatively, feel free to post your question on the forum just for AdSense publishers: the AdSense Help Group (http://groups.google.com/group/adsense-help).

    Sincerely,

    Kevin
    The Google AdSense Team

    To access the Google AdSense home page or to log in to your account,
    please visit: https://www.google.com/adsense

  38. wait... by hogsWild · · Score: 1

    This didn't make microsoft sound nearly evil enough for /.

  39. MIA: Marketing dept. by TheRistoman · · Score: 1

    "Microsoft Strider Search Ranger"? Come on now. Are they turning to japanime/manga naming conventions? How long until: Microsoft Laser Super Action Happy Extreme START!!!!! Microsoft Real Swift Rainbow Sunshine Police Now LOVE!!!! This is taking the concept of branding into its exact opposite. And then you have things like, Apple TV. And you wonder why MSFT is tanking.

  40. Well, here's 100,000 spam domains by XHIIHIIHX · · Score: 1

    10,000 pages??? Geez, I want to work for Microsoft, those guys make Wally look industrious: http://www.google.com/search?hl=en&q=allinurl%3Admxargs&btnG=Google+Search

  41. Re:Interesting. by ericlondaits · · Score: 1

    Well... it's a bit like blaming the PC security problems mostly on Windows. The shoe fits.

    --
    As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
  42. Old news by Anonymous Coward · · Score: 0

    Old news, see: http://johnbokma.com/mexit/2006/07/13/

    Have been reporting this to Google for over a year. Only recently did long lists (thousands) of blogs /finally/ get accepted by the abuse desk. If I can find thousands of blogs with some Perl, why can't Google fix this before those blogs get spammed on thousands and thousands of open guestbooks, blogs, etc.?

    Furthermore, the problem is not limited to Google. LayeredTech, ThePlanet, and several other hosting providers have no problem at all with making it a pain in the ass to report abuse and just host too much garbage for too long.

    And all the while non-solutions like Akismet are applied by the masses. It's time some people create a draft on how comments should be stored in blogging software (hint: including remote IP, proxy-related environment variables, etc.) and we get an online reporting tool like SpamCop. Filtering? Look at your inbox. It's not going to happen. And CAPTCHA? By the time bots have problems with it, most people can't solve them.

  43. How did they search this out? by whitehatlurker · · Score: 1
    Given the report in The Register today, the researchers could have been better off using Live.com as their search engine for researching this topic.

    Seriously, I have had phishing email for some of these 80.77.x.y websites recently as well. A "Good on ya!" to Microsoft & UC Davis! Root the bastards out and stomp 'em!

    --
    .. paranoid crackpot leftover from the days of Amiga.
  44. Wow by wumpus188 · · Score: 1

    welcome to the social, MS

  45. Timing by jeichels · · Score: 2, Insightful

    I think the timing is funny: we turned down $73k/month in advertising last night from one of the top three spam-supporting syndicators. They were seeking $1.16 per average click-through.

    I am very glad I read the detailed report from end to end. We seek value in advertising, not spam, but it is very difficult for well meaning companies to figure out which is which. You shouldn't have to be a rocket scientist to differentiate the deceptive tactics/companies from the valid ones. I guess most forms of fraud end up being abstractly similar to this scheme in the end though.

    If something smells fishy don't eat it.

    --

    JohnE
    jobbank.com - Search jobs, post resume,

  46. What is web spam? Ads from phony businesses. by Animats · · Score: 1

    This is good work by Microsoft. They've tracked down a few big-time web spammers, all the way up the food chain. But there are more.

    We've been working on the web spam problem, from a different angle. Our starting point is the legal requirement that a business cannot be anonymous. Every legitimate business must have an identifiable person or corporation behind it. (See CA B&P code sec. 17358 ("disclosure of ... legal name and address information shall appear on ... the first screen displayed ... (or) on the screen on which a buyer may place the order for goods or services ...") and the European Directive on Electronic Commerce ("the service provider shall render easily, directly and permanently accessible to the recipients of the service and competent authorities, at least the following information: (a) the name of the service provider; (b) the geographic address at which the service provider is established...").)

    Given that basis, our solution to web spam is straightforward: if we can't find a valid business name and address on a web site that's selling or advertising, it's not a legitimate business. Of course, if there is a name and address, it should match business license data, corporate registration data, fictitious name filings, and similar records of business existence.

    So we have a system that parses web pages in some detail, looking for addresses. If a web site has a name and address on it that obeys postal addressing rules, we can usually find it. We have access to some business databases, and we're adding more. We look at some other info, like SSL certs and BBB seals, which has some credibility. Thus, we can check for legitimacy.
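
    As a purely illustrative sketch (this is not SiteTruth's parser, and every name in it is hypothetical), the address-finding step for a simple US-style address could be approximated in Python with a regular expression. Real postal addressing rules are far messier than this pattern assumes, and the lookup against business records is not shown.

```python
import re

# Assumed shape: "<number> <street name> <suffix>, <city>, <ST> <ZIP>".
US_ADDRESS = re.compile(
    r"\b(?P<number>\d{1,6})\s+"
    r"(?P<street>[A-Za-z0-9 .]+?\s+"
    r"(?:Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd|Drive|Dr|Lane|Ln|Way))\b\.?,?\s+"
    r"(?P<city>[A-Za-z .]+?),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\b"
)


def find_addresses(text):
    """Return one dict per address-like match found in the page text."""
    return [match.groupdict() for match in US_ADDRESS.finditer(text)]


# Made-up sample text; the business and address are placeholders.
sample = "Contact: Example Widgets Inc., 123 Main Street, Springfield, IL 62704. Call us."
print(find_addresses(sample))
```

    A page that yields no match at all would be the interesting case: under the approach described above, a site that sells or advertises without a findable address is the one to be suspicious of.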

    Our goal is to feed this into search engine rankings, so that non-legitimate businesses fall out of visibility.

    "Doorway pages" and "affiliates" with no business behind them aren't legitimate businesses, so they're toast. Completely phony addresses won't work, either; they won't match business records. Stealing the name and address of a legitimate business is felony identity theft, which is a place you don't want to go. (Also, sometimes, we can detect and report that.)

    An early version of this is already running at SiteTruth.com. If you're responsible for a commercial web site, run it through the "Detailed SiteTruth analysis, for Webmasters" and see what SiteTruth finds. If SiteTruth can't find your business name and address, you might want to fix that. The day will come when it affects your search placement.

    This is the alpha test phase for SiteTruth; there's more coming.

    Web spam used to be a safe tactic. That was then. This is now.

  47. One Man's "Junk" Is Another Man's "Diamond" by chipperjack · · Score: 1

    Anyone who makes a website, no matter who considers it "junk", still is not forcing or spamming any search engine. In order to list sites, they (the search engines) must send out their crawlers in a search for websites...If an SE ends up listing a "junk" website because its crawler found it in the endless boundaries of cyberspace...that's not the problem of the owner of the so-called "junk" website...nor should it be.

    There is NO SUCH thing as "spamming a Search Engine"
    There is only what THEY allow to be indexed in their SE or what they don't

    This term was invented by the SE folks for their own purposes: to get you on their side by being your "protector".

    ------
    "the most terrifying thing happened to me today...I visited a blog and it...it...it had ADS on it and information about...about...watches...and...and it did not make sense to me...and the ads were from google and yahoo...and...and...oh my god...the madness! I think it was a "spam website"...that's been talked about SO MUCH...on CNN, MSNBC....oh the pain...I feel so dirty...I am going to take a shower and a bottle of NyQuil....I just can't deal with this...." LOL
    -----

    BTW: I thought search engine companies invented algorithms and search engine filters and about a thousand other things to rid themselves of so-called "junk sites" starting about 5 years ago.

    What's scary is that somehow, along with Google, they now think they can take complete control over the internet, and are trying to...Earth to Google and MS...you don't and never will.

    But at the rate things are going, they will control enough of it...to ruin it for a lot of people.

    If they don't want what they consider "junk" websites in their SE...and they cannot control this through technology...then they should go back to the "old-fashioned" directory days, where every website and blog (blogs did not exist...LOL) that gets indexed into their SE is first viewed by a "person" and approved or disapproved for inclusion in their SE.

    Just as with the real history of humans on this planet...if you do not remember all the freedoms the internet provided in the past...you too will lose more and more of your freedom on the internet...with Google and MS as your masters....and you will never know or remember exactly how it happened...it just will be.

    This is not ALL about websites they don't like...it is ALL about controlling everything you do online...whether you're aware of it or not.


    That's it...nothing too "heavy"...LOL

    Peace! Chipper Jack

    --
    http://www.iraqsinconvenienttruth.com/
    1. Re:One Man's "Junk" Is Another Man's "Diamond" by Anonymous Coward · · Score: 0

      You are correct. This research, while possibly an admirable technical achievement, is about as useful as a phd thesis that analyzes the news value of your typical neighborhood classified ad rag, what is typically known as a "fetcher," as in ad fetcher. The point is, there isn't any news value in an ad fetcher. Neither is there any high editorial value to a search engine. At the risk of stating the patently obvious to anyone who doesn't know, a search engine is not a newspaper, not a magazine and not a network news show. Its sole purpose from a business standpoint is to deliver an audience to advertisers and to deliver ads to the audience. When you use a search engine, YOU are the product that is being sold. And as to the discovery that Looksmart is kind of spammy, well, welcome to 1998.

  48. Re:Interesting. by chipperjack · · Score: 1

    Hi, I love analogies...but that's a bad one. Peace

    --
    http://www.iraqsinconvenienttruth.com/
  49. Re:The easiest way by Anonymous Coward · · Score: 0

    Looks like the dudes are having a little shot at Google/Yahoo. Meanwhile, one of the authors cites four of his own articles, so I wonder if he's trying to pump up his academic ratings on the side.

  50. Brett Tabke is a liar by hankwang · · Score: 1

    no, if you read all the issues, most people could see his site, very few could not because scrapers were coming from those IP's.

    I have read the stories about "we have a long list of blocked IP addresses and all the horrible bots are using my bandwidth". Brett Tabke is a liar. I have tried accessing his site from many different (static!) IPs in different /16 blocks and they were all blocked. Tabke's business model is to have an ad-free website and charge $180 per year for access to the site. He wants to attract new paying customers without paying for bandwidth to nonpaying visitors. So he blocked access by nonpaying visitors from Asia and Europe completely because they were not generating enough revenue.
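
    For reference, blocking whole /16 ranges rather than individual scraper IPs takes only a few lines. This is a minimal Python sketch using the standard ipaddress module; the two networks are placeholders, not anyone's actual block list, and is_blocked is a made-up name.

```python
import ipaddress

# Placeholder /16 ranges for illustration only.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("10.20.0.0/16", "192.168.0.0/16")
]


def is_blocked(client_ip):
    """Return True if the client address falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


print(is_blocked("10.20.5.7"), is_blocked("10.21.5.7"))
```

    A list like this shuts out every visitor in a range, scraper or not, which is consistent with static addresses in many different /16 blocks all being refused.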

    1. Re:Brett Tabke is a liar by onepoint · · Score: 1

      >>So he blocked access by nonpaying visitors from Asia and Europe completely because they were not generating enough revenue.

      And can you point out the problem with this? A web site is like a parking lot: that lot can generate rental income, so why not design it to get the most out of it?

      Heck, if the Euro-Asia traffic is non-performing, then why give them the bandwidth? As much as we would all like to share, at the end of the day it's $$$ that speaks; otherwise it's a hobby and you don't care about the money.

      I block traffic from certain IP's because I know they don't convert, which is the right of all web site owners.

      What you are talking about (or feeling) is the spirit of the Internet, where information is free. What is happening is that information is being priced for those who are willing to pay for it.

      Onepoint

      --
      if you see me, smile and say hello.
    2. Re:Brett Tabke is a liar by hankwang · · Score: 1

      I block traffic from certain IP's because I know they don't convert, which is the right of all web site owners.

      As I said: fine with me if you do that, but the search engines should not be indexing you. And you should not be lying in public about the real reasons for your policy.

    3. Re:Brett Tabke is a liar by onepoint · · Score: 1

      >>but the search engines should not be indexing you.

      Why? If I place my content to view, the engines that have clean IPs will clear my systems, but those from other locations won't. So Google in Asia won't see me, but Google USA will (and that's been tested already with Google, but not with Yahoo).

      if you choose to use Google USA to search but not your local brand it's not my problem, Google makes it easy.

      Currently most of Asia does not see certain sites that I manage (my personal sites are worldwide). But with cost being an issue, content costing a lot of man-hours, and copyright filings on content, closing off Asia, Africa, and a few other locations has been cost-effective, and scraping has dropped to nil. (That, and issuing warnings about upcoming DMCA notices; you would be surprised how many scrapers have said "we remove quickly, don't tell yahoo or Google ...")

      onepoint

      --
      if you see me, smile and say hello.
    4. Re:Brett Tabke is a liar by hankwang · · Score: 1

      if you choose to use Google USA to search but not your local brand it's not my problem, Google makes it easy.

      Provide me a reference or example that country-blocking websites will not show up in nationalized versions. Google.nl, google.fr, google.sv, etc. only give a slightly different ordering of the search results (slightly preferring certain TLDs and pages written in the local language).