Slashdot Mirror


Webmasters Pounce On Wiki Sandboxes

Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."

49 of 324 comments

  1. Why just wikis? by GillBates0 · · Score: 4, Insightful

    Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.

    --
    An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
    1. Re:Why just wikis? by caino59 · · Score: 5, Funny

      We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.

      You forgot the link: Litigious Bastards

    2. Re:Why just wikis? by abscondment · · Score: 3, Interesting

      Posting on wikis doesn't screw up your own blog.

      Posts on message boards will be deleted quickly, unless the board is expressly Google bombing (as with the current Nigritude Ultramarine 1st placer) or people are stupid.

      I think the idea is that wikis make it easier in general for your post to stay up without affecting your blog.

    3. Re:Why just wikis? by nautical9 · · Score: 4, Interesting
      I host my own little phpBB boards for friends and family, but it is open to the world. Recently I've noticed spammers registering users for the sole purpose of being included in the "member list", with a corresponding link back to whatever site they wish to promote. They'll never actually post anything, but they've obviously automated the sign-up procedure as I get a new member every day or so, and google will eventually find the member list link.

      And of course there are still sites that list EVERY referer in their logs somewhere on their site, so spammers have been adding their site URLs to their bot's user agent string. It's amazing the lengths these people will go to spam google.

      Sure hope they can find a nice, elegant solution to this.

    4. Re:Why just wikis? by Anonymous Coward · · Score: 5, Funny

      Why not normal discussion boards and blogs?

      As an employee of JBOSS, I'm shocked and appalled at your suggestion. Fortunately, JBOSS is working on a new JBOSS solution to overcome this problem using JBOSS. We at JBOSS are passionate that our JBOSS technology will prevent even non- JBOSS users from taking advantage of boards this way.

      Frank Lee Awnist
      JBOSS Employee
      JBOSS Inc.

      JBOSS JBOSS JBOSS

    5. Re:Why just wikis? by ichimunki · · Score: 5, Informative

      The real problem with Wikis is that the link will remain there, even after it has been removed from the current page, because most Wikis have a revision history feature. So what's needed is careful setup in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).
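      One way to supply the "HTML clues" half of this, sketched under assumptions: a hypothetical wiki page handler that knows the requested action and revision number emits an indexable robots meta tag only for a plain view of the newest revision. `robots_meta` and `meta_tag` are illustrative names, not any particular wiki engine's API.

```python
def robots_meta(action: str, revision: int, current_revision: int) -> str:
    """Content for a <meta name="robots"> tag on a wiki response.

    Hypothetical rule: only a plain view of the newest revision may be
    indexed; old revisions, diffs, edit forms, etc. are excluded.
    """
    if action == "view" and revision == current_revision:
        return "index,follow"
    # Old revisions still carry spam links that were later reverted,
    # so ask crawlers neither to index them nor to follow their links.
    return "noindex,nofollow"


def meta_tag(action: str, revision: int, current_revision: int) -> str:
    content = robots_meta(action, revision, current_revision)
    return f'<meta name="robots" content="{content}">'


print(meta_tag("view", 12, 12))
print(meta_tag("view", 11, 12))
print(meta_tag("edit", 12, 12))
```

      This only helps with crawlers that honour the meta tag, which is why it is usually paired with robots.txt rules for the same URLs.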

      My wiki got hit by this stupid link, but not in the sandbox. Of course, recovering the previous version of the page is easy... it's wiping out any trace of the lameness that gets trickier. I suppose the easiest way to defeat this would be to require simple registration in order to edit Wiki pages.

      What else can we do? Alter the names of the submit buttons and some of the other key strings involved in Editing?

      --
      I do not have a signature
    6. Re:Why just wikis? by Andy+Mitchell · · Score: 3, Insightful

      I'm not sure this will make you feel better, but this strategy has a limited lifetime.

      The contribution of your page to another page's PageRank depends on two factors: first, the PageRank of your page, and second, the number of links coming from your page.

      As more people take up this tactic, the return everyone gets from it gets smaller. For example, when there are hundreds of links on that page, they cease to have any real value. Eventually people should give up on this one.
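      The dilution argument can be made concrete with the simplified formula from Brin and Page's original PageRank paper, PR(A) = (1 - d) + d * sum(PR(T)/C(T)), where C(T) is the number of outbound links on page T and d is a damping factor usually given as 0.85. Google's production ranking was never published in full, so this is only an illustration; the rank of 5.0 below is a made-up number.

```python
def contribution_per_link(page_rank: float, outbound_links: int,
                          damping: float = 0.85) -> float:
    """Rank a page passes along each of its outbound links: d * PR(T) / C(T)."""
    if outbound_links < 1:
        raise ValueError("a page needs at least one outbound link to pass rank")
    return damping * page_rank / outbound_links


# A sandbox page with a fixed (made-up) rank of 5.0 filling up with spam links.
for links in (1, 10, 100, 1000):
    print(links, contribution_per_link(5.0, links))
```

      Each spammer's share falls as 1/C(T): a hundredfold increase in links on the sandbox page cuts every link's value a hundredfold.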

    7. Re:Why just wikis? by Pieroxy · · Score: 4, Funny

      You forgot the link: JBOSS.

    8. Re:Why just wikis? by clarkcox3 · · Score: 5, Funny

      That's just irresponsible. By putting that link there (the one that says Litigious Bastards), you're contributing to the problem.

      Again, responsible people do not put "Litigious Bastards" links in their slashdot posts.

      Think about it? How would you like a google search for Litigious Bastards to point to your company, leading everyone to think that you and your co-workers are nothing but a bunch of Litigious Bastards?

      --
      There are no tiger attacks in my area and it's all because this rock I'm holding keeps the tigers away.
    9. Re:Why just wikis? by boa13 · · Score: 3, Informative

      So what's needed is careful set up in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).

      It has probably already been done in any wiki software worth its salt. Here's what MoinMoin does for example:

      * It has a regexp of HTTP_USER_AGENTS which should receive a FORBIDDEN for anything except viewing a page. The default setting includes many known bots (including Google) and utilities such as wget.
      * Most pages contain the appropriate robots meta tag, with the relevant noindex and/or nofollow settings.
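      The first idea could look roughly like this in a hypothetical request handler: a regular expression of crawler and download-tool user agents gets 403 Forbidden for every action other than viewing a page. The pattern is illustrative only and is not MoinMoin's actual default list.

```python
import re
from http import HTTPStatus

# Illustrative pattern only; a real deployment would maintain a longer list.
BOT_USER_AGENTS = re.compile(r"googlebot|slurp|msnbot|wget|curl", re.IGNORECASE)


def status_for(user_agent: str, action: str) -> HTTPStatus:
    """Let bots view pages, but refuse them edit forms, history, diffs, etc."""
    if action != "view" and BOT_USER_AGENTS.search(user_agent or ""):
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


print(status_for("Wget/1.9.1", "diff"))
print(status_for("Mozilla/5.0 (X11; Linux x86_64)", "edit"))
```

      Since the user agent is supplied by the client, this is a courtesy filter rather than access control.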

      In addition to that, the webmaster can of course set up a robots.txt file, and actually should do so, because there are tools out there which don't understand the robots meta tags (or don't want to take the performance hit), and whose user agent can easily be changed by the user... wget comes to mind.

      Of course, it shouldn't be too hard to add regexps to prevent certain links from being done, or certain hostnames or IPs from altering the site (editing pages, reverting them, deleting them).
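      A minimal sketch of both checks, assuming the wiki passes the editor's IP address and the new page text to whatever handles saving, reverting, and deleting. `may_modify` is a hypothetical hook, not part of MoinMoin; the blocked network (a documentation-only range) and the domain pattern are placeholders.

```python
import ipaddress
import re

# Placeholder values: a documentation-only network and an example domain.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
BLOCKED_LINKS = re.compile(r"https?://(?:[\w-]+\.)*spam-example\.com\b", re.IGNORECASE)


def may_modify(editor_ip: str, new_text: str) -> bool:
    """False if the editor's address is blocked or the text links to a banned host."""
    address = ipaddress.ip_address(editor_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return False
    return BLOCKED_LINKS.search(new_text) is None


print(may_modify("192.0.2.17", "harmless edit"))
print(may_modify("198.51.100.5", "see http://www.spam-example.com/cheap"))
print(may_modify("198.51.100.5", "see http://example.org/"))
```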

    10. Re:Why just wikis? by Eivind · · Score: 4, Informative
      It's working almost *too* well. Not only are SCO the number one hit for "litigious bastards", but they're also the number one hit for "litigious" or "bastards" alone.

      Then again maybe that mostly says something about their popularity.

    11. Re:Why just wikis? by mrtroy · · Score: 3, Funny

      Top 5 reasons that unix > linux, according to SCO

      SCO UNIX® is a Proven, Stable and Reliable Platform
      SCO UNIX® is backed by a single, experienced vendor
      SCO UNIX® has a Committed, Well-Defined Roadmap
      SCO UNIX® is Secure
      SCO UNIX® is Legally Unencumbered

      HAHAHAHAHAAHHAHAHAHAHAHAHA

      That should be a Top 10 list on Letterman's show.

      --
      [I can picture a world without war, without hate. I can picture us attacking that world, because they'd never expect it]
  2. Cyberneighborhood Not-Watch? by raehl · · Score: 5, Interesting

    In the real world, there are neighborhood watch signs to "deter" criminals.

    Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.

    1. Re:Cyberneighborhood Not-Watch? by lunax · · Score: 3, Insightful

      Why not put the sandbox in its own folder and add an entry to the robots.txt telling crawlers not to browse that folder?
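      A minimal sketch of that setup, assuming the sandbox is moved under a hypothetical /sandbox/ path on a placeholder host. Python's standard urllib.robotparser shows how a crawler that honours robots.txt would read the file; it does nothing against bots that ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the sandbox lives under /sandbox/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sandbox/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://wiki.example.org/FrontPage",
            "http://wiki.example.org/sandbox/PlayHere"):
    print(url, parser.can_fetch("Googlebot", url))
```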

    2. Re:Cyberneighborhood Not-Watch? by Random+Web+Developer · · Score: 5, Informative

      There is a robots meta tag for this that you can put in your headers for a single page (robots.txt needs subdirs), but unfortunately most webmasters are too ignorant to realize the power of these:

      http://www.robotstxt.org/wc/meta-user.html

      --
      Artists against online scams http://www.aa419.org/
    3. Re:Cyberneighborhood Not-Watch? by naoiseo · · Score: 3, Insightful

      This fails to address the real issue.

      That is, even if you make your links useless (easy with a nofollow meta tag), it won't help: the majority of this spam is AUTOMATED, and will spam your wiki/blog/guestbook based on simple page cues.

      Your best personal defense is to manually remove any page or HTML cues that a spammer would pick up on as being common to a certain type of postable web page or element.

      Bloggers have been creating blacklists (banning both poster IPs and destination URLs) with some degree of success. This works as a deterrent: once a spammer shows up on a blacklist, every webmaster who uses the distributed file has their blog 'cleaned' of that spammer automatically.
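      Applying such a shared list to incoming comments might look like this, sketched under an assumed file format of one regular expression per line with # comments (the blacklists bloggers actually circulate have their own formats). Only the destination-URL half is shown; `load_blacklist` and `is_spam` are hypothetical names.

```python
from __future__ import annotations

import re

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def load_blacklist(text: str) -> list[re.Pattern]:
    """Compile one pattern per non-empty line; '#' starts a comment in this sketch."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(re.compile(line, re.IGNORECASE))
    return patterns


def is_spam(comment: str, blacklist: list[re.Pattern]) -> bool:
    """True if any URL in the comment matches a blacklisted pattern."""
    return any(p.search(url) for url in URL_RE.findall(comment) for p in blacklist)
```

      Usage: load the shared file once with `load_blacklist(open(path).read())`, then run `is_spam` on each new comment and on the existing archive to clean it retroactively.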

    4. Re:Cyberneighborhood Not-Watch? by phutureboy · · Score: 4, Interesting

      You can also list robots.txt commands as meta tags in the [head] portion of the document. So, the wiki authors could just put them in the sandbox template, and individual site owners would not even have to know about / monkey with robots.txt to be protected.
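      For example, assuming the sandbox is rendered from a hypothetical HTML template string, a helper like the one below could put the tag in place once, so every site using the template gets it without touching robots.txt.

```python
import re

ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'
HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_robots_meta(template: str) -> str:
    """Insert the robots meta tag right after <head>, unless one is already present."""
    if 'name="robots"' in template.lower():
        return template
    match = HEAD_OPEN.search(template)
    if match is None:
        raise ValueError("template has no <head> element")
    return template[:match.end()] + ROBOTS_META + template[match.end():]


print(add_robots_meta("<html><head><title>SandBox</title></head><body></body></html>"))
```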

  3. Oh well by SpaceCadetTrav · · Score: 5, Informative

    Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.

  4. Yes... PLEASE... by Paulrothrock · · Score: 4, Insightful
    Google needs to do something about this. I had to turn off comments on my blog because all I was getting was spam: two or three a day that I had to go in and delete. Now I have to find a system that will keep the bots out.

    What happened to the nice internet we had in 1996?

    --
    I'm in the hole of the broadband donut.
    1. Re:Yes... PLEASE... by n-baxley · · Score: 4, Interesting

      The system was even easier to rig back then. Back in 96ish, I created a web page with the title "Not Sexy Naked Women". Then repeated that phrase several times and then gave a message telling people to click the link below for more Hot Sexy Naked Women which took them to a page that admonished them for looking for such trash. I added a banner ad to the top of both of these pages, submitted them to a search engine and made $500 in a month! Things are better today, but they're still not perfect.

  5. like porn by millahtime · · Score: 4, Interesting

    This seems similar to the system all those porn sites used to get such a high rank in Google.

    Kind of playing the system, with the content not being quite as desirable.

  6. You know... by fizban · · Score: 3, Insightful

    ...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.

    --

    +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

    1. Re:You know... by Anonymous Coward · · Score: 4, Insightful

      That button will also get spammed, as bots will click 'yes' for their own sites and 'no' for their competitors' sites.

    2. Re:You know... by goon+america · · Score: 3, Insightful

      Wouldn't that be equally abused?

  7. google works by mwheeler01 · · Score: 3, Informative

    Google does tweak their ranking system on a regular basis. When a problem becomes evident (and it looks like this one just has), they do something about it... that's why they're Google.

    --
    Pretty widgets? What pretty widgets?
  8. Whose fault is that? by lukewarmfusion · · Score: 4, Insightful

    Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."

    Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two - by using outside references (counting how often the site is linked) it assumes that the site is more relevant. Because other people have put links on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.

    Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.

  9. ROBOTS.TXT by gtrubetskoy · · Score: 4, Insightful
    The burden is not on Google, but on Wiki sandbox admins, who should provide proper ROBOTS.TXT files to inform Google that this content should not be indexed.

    As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.

  10. Complacency by faust2097 · · Score: 5, Interesting
    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.

    But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing; I'm an AltaVista alumnus, and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.

  11. Well, it's about time this gets some attention by digitalgimpus · · Score: 4, Insightful

    I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....

    From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps wiki spamming, as a way to increase their clients' ratings.

    It's not about meta tags, or submitting anymore... it's spamming.

    Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?

    IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?

    If they are going to use my blog... why can't I get a cut in that business?

    1. Re:Well, it's about time this gets some attention by Lurker+McLurker · · Score: 4, Insightful
      IMHO those services are pure evil.
      No, 9/11 was pure evil, some unwanted comments on a blog is an annoyance. If you have a website that allows anyone to post comments, you will get some you don't like. That's life.
      --
      Mod parent up!
  12. This happened to me by JohnGrahamCumming · · Score: 4, Interesting

    This happened on the POPFile Wiki. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.
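    A rough sketch of that approach, not the actual POPFile Wiki code: the renderer checks every external link against an allowed list of regexps and drops links that match none. The bracketed `[url text]` link syntax and the allowed patterns are placeholders.

```python
import html
import re

# Placeholder allow-list; each regexp must match the start of the URL.
ALLOWED_URLS = [
    re.compile(r"https?://wiki\.example\.org/"),
    re.compile(r"https?://docs\.example\.net/"),
]

# Hypothetical wiki syntax for external links: [http://url optional link text]
EXTERNAL_LINK = re.compile(r"\[(https?://\S+)(?:\s+([^\]]*))?\]")


def render_external_links(wikitext: str) -> str:
    """Turn allowed links into <a> tags; omit links whose URL is not on the list."""
    def replace(match):
        url = match.group(1)
        text = match.group(2) or url
        if any(pattern.match(url) for pattern in ALLOWED_URLS):
            return f'<a href="{html.escape(url)}">{html.escape(text)}</a>'
        return ""  # not approved yet: hidden until an admin adds a regexp for it
    return EXTERNAL_LINK.sub(replace, wikitext)
```

    The administrative cost mentioned above is the trade-off: legitimate new links stay invisible until someone extends `ALLOWED_URLS`.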

    It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.

    John.

  13. I've seen this by goon+america · · Score: 3, Informative
    I just reverted some pages on my watch list on Wikipedia that had been edited with a google spam bot to link all sorts of words back to its mother site.... lots of mistakes, looked like the script they were using hadn't been tested that well yet. (Would post an example, but wikipedia is completely fuxx0red at the moment).

    This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.

  14. Google. by Rick+and+Roll · · Score: 3, Interesting
    When I search on Google, half the time I am looking for one of the best sites in a category, like perhaps "OpenGL programming". Other times, however, I am looking for something very specific that may only be referenced about twenty times, if at all.

    When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.

    The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.

    Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.

    I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.

    Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.

  15. Tomorrow today yesterday by boa13 · · Score: 4, Insightful

    But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.

    The Arch Wiki has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will soon require posters to be authenticated, which is a blow to the wiki ideal, but not such a major one. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. an external link over a single space).
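    The heuristic suggested at the end could start out like this, assuming bracketed `[url text]` external-link syntax (it differs between wiki engines): flag links whose visible text is nothing but whitespace or punctuation. Bare `[url]` links are skipped, since most engines render those visibly.

```python
import re

# Hypothetical syntax: "[" URL, optional visible text, "]".
EXTERNAL_LINK = re.compile(r"\[(https?://[^\s\]]+)([^\]]*)\]")
# Visible text made only of whitespace and punctuation, e.g. " " or " .".
NEAR_INVISIBLE = re.compile(r"[\s.,;:_-]+")


def hidden_links(wikitext: str) -> list:
    """URLs whose link text a casual reader would probably not notice."""
    return [url for url, text in EXTERNAL_LINK.findall(wikitext)
            if text and NEAR_INVISIBLE.fullmatch(text)]
```

    An edit hook could hold any change for review when `hidden_links` on the new text returns something the old text did not contain.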

    So, this is not new, but this is now news.

  16. Not a big deal by arvindn · · Score: 4, Informative

    Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the Wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs, so I wouldn't worry.
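    A hypothetical variant of such a filter, not MediaWiki's implementation: instead of matching the whole submitted text, it rejects an edit only when the edit adds a new URL on a blocked domain, so a page that already contains an old spam link can still be edited and cleaned up. The domain is a placeholder.

```python
import re

URL_RE = re.compile(r"https?://[^\s\]\[<>\"']+", re.IGNORECASE)
# Placeholder for the domain(s) used in the spam run.
SPAM_DOMAINS = re.compile(
    r"^https?://(?:[\w-]+\.)*marketing-example\.cn(?:[/:?#]|$)", re.IGNORECASE)


def edit_allowed(old_text: str, new_text: str) -> bool:
    """Reject an edit only if it introduces a new link to a blocked domain."""
    added = set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))
    return not any(SPAM_DOMAINS.match(url) for url in added)
```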

  17. Hmm by Julian+Morrison · · Score: 3, Interesting

    Leave the links, edit the text to read something like "worthless scumbag, scamming git, googlebomb, please die, low quality, boring" - and lock the page.

  18. True by Pan+T.+Hose · · Score: 4, Funny

    "Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"

    I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  19. It just might work! by mcmonkey · · Score: 4, Funny

    'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'

    Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.

    Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.

    It sounds crazy, I know, but I think we could do this.

  20. visual security code for sign-up by Saeed+al-Sahaf · · Score: 4, Informative

    Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
    1. Re:visual security code for sign-up by stevey · · Score: 5, Insightful

      There was a story about defeating this system on /. a while back.

      Rather than using OCR or anything, people would merely harvest a load of images from a signup site, which is possible when there is only a finite set of images, or when the images follow a consistent naming policy.

      Then once the images were collected, they would merely set up an online porn site, asking people to join for free and prove they were human by decoding the very images they had downloaded.

      Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...

      Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..

  21. "Finally"?? by jdavidb · · Score: 4, Interesting

    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.

    The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.

  22. naked women are trash? i'll take all you got by waspleg · · Score: 3, Funny

    you know what they say about another man's garbage

  23. image based spam control by MaximusTheGreat · · Score: 3, Interesting

    What about using random image-based spam control like the one Yahoo uses on its new mail signup?
    So, every time you edit/post comment, you would be presented with an image with a random distorted text, which you will have to type in to be able to edit/post. That should take care of automated systems.
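    The server side of such a scheme might be sketched as follows, under stated assumptions: every form gets a fresh random code, and the form carries an HMAC-signed, expiring token instead of the code itself, so there is no finite, consistently named image set to harvest (the weakness described earlier in the thread). Drawing the distorted image is left out; `SECRET_KEY` and the `expires:signature` token format are hypothetical. Being stateless, it does not stop one solved token from being replayed before it expires; a real deployment would also record used tokens.

```python
import hashlib
import hmac
import secrets
import string
import time

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-server secret
ALPHABET = string.ascii_uppercase + string.digits
TOKEN_LIFETIME = 300  # seconds


def _sign(code: str, expires: int) -> str:
    message = f"{code}:{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def new_challenge(length: int = 6) -> tuple:
    """Return (code, token): draw `code` as a distorted image, embed `token` in the form."""
    code = "".join(secrets.choice(ALPHABET) for _ in range(length))
    expires = int(time.time()) + TOKEN_LIFETIME
    return code, f"{expires}:{_sign(code, expires)}"


def check_answer(answer: str, token: str) -> bool:
    """Accept the typed answer only if it matches the signed, unexpired token."""
    try:
        expires_text, signature = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False
    if time.time() > expires:
        return False
    expected = _sign(answer.strip().upper(), expires)
    return hmac.compare_digest(expected, signature)
```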

  24. Grow up by scrytch · · Score: 4, Funny

    You know, googlebombing might have a better effect if you did it in reverse, e.g. on "SCO". Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO, come up with a different link for SCO, so that their Google results will be peppered with such commentary afterwards... People search for "SCO", not "litigious bastards".

    "Dumb fucker", "miserable failure", etc ... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
    1. Re:Grow up by maxwell+demon · · Score: 4, Insightful

      Well, why not link SCO to something the reader gets real value from? Some page where they can learn something about SCO? After all, since those pages indeed tell something about SCO and therefore contain the word SCO, it should even be more effective.

      --
      The Tao of math: The numbers you can count are not the real numbers.
  25. Another solution besides robots.txt by wamatt · · Score: 3, Interesting

    Spammers are going there because you have a high PR. So cut off the PR supply and you're in business: http://www.site.com/~url=http://www.link.com and voila, URL rewriting. No more PR for Mr. Spammer.
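    The rewriting step might look like this, assuming the site runs a hypothetical redirect script at /out that forwards visitors to its url parameter (the `~url=` form above is another spelling of the same idea). That script would presumably also be excluded in robots.txt so crawlers don't treat it as a link to the target. `SITE_HOST` is a placeholder.

```python
import re
from urllib.parse import quote, urlsplit

SITE_HOST = "www.site.com"  # placeholder: the wiki's own host name
HREF = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def rewrite_outbound(html_text: str) -> str:
    """Point external links at a local redirect script instead of the target."""
    def replace(match):
        url = match.group(1)
        if urlsplit(url).hostname == SITE_HOST:
            return match.group(0)  # internal links stay as they are
        return f'href="/out?url={quote(url, safe="")}"'
    return HREF.sub(replace, html_text)


print(rewrite_outbound('<a href="http://www.link.com/">x</a>'))
```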

  26. Which is why I thought it was real time by swb · · Score: 3, Interesting

    I thought it was a real-time thing, where the account creation bots passed the image that loaded during the signup process to a porn site and the images were decoded by a real person, and the result passed back to the bot who then signed up for the account.

    To avoid the timing problems with porn signons needing to happen concurrent with account signups, the account generation process was actually initiated by a porn signon. It limits your account generation ability, but only to the extent that you have porn traffic.

    Did I just imagine this, or does it work that way?

    1. Re:Which is why I thought it was real time by allism · · Score: 3, Informative

      You didn't imagine it, but perhaps a clearer understanding of the technique can be achieved by reviewing the previous discussions. Here's a link to the Slashdot article that discussed this last January.