Slashdot Mirror


Webmasters Pounce On Wiki Sandboxes

Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."

31 of 324 comments

  1. Why just wikis? by GillBates0 · · Score: 4, Insightful

    Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO into first place without a problem.

    --
    An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
    1. Re:Why just wikis? by caino59 · · Score: 5, Funny

      We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO into first place without a problem.

      You forgot the link: Litigious Bastards

    2. Re:Why just wikis? by nautical9 · · Score: 4, Interesting
      I host my own little phpBB board for friends and family, but it is open to the world. Recently I've noticed spammers registering users for the sole purpose of being included in the "member list", with a corresponding link back to whatever site they wish to promote. They never actually post anything, but they've obviously automated the sign-up procedure, as I get a new member every day or so, and Google will eventually find the member list link.

      And of course there are still sites that list EVERY referer from their logs somewhere on the site, so spammers have been adding their site URLs to their bots' user agent strings. It's amazing the lengths these people will go to in order to spam Google.

      Sure hope they can find a nice, elegant solution to this.
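      One way a site could keep publishing its referer list without rewarding the user-agent trick is to drop any hit whose User-Agent advertises a URL. The Python below is a minimal sketch under that assumption; the function name, the URL-in-UA heuristic, and the (referer, user_agent) input format are hypothetical, not taken from any particular log analyzer.

```python
import re
from collections import Counter

# Assumed heuristic: referer-spam bots put their own URL in the
# User-Agent string, so any hit whose UA contains a URL is dropped.
# This also drops some legitimate crawler hits, which is usually fine
# for a public "who links here" list.
URL_IN_UA = re.compile(r"https?://|www\.", re.IGNORECASE)


def public_referers(hits, min_count=1):
    """Return (referer, count) pairs that are safe to publish.

    hits: iterable of (referer, user_agent) tuples taken from an access log.
    min_count: only publish referers seen at least this many times.
    """
    counts = Counter()
    for referer, user_agent in hits:
        if not referer or referer == "-":
            continue  # no Referer header was sent
        if URL_IN_UA.search(user_agent or ""):
            continue  # UA advertises a site: treat the hit as spam
        counts[referer] += 1
    return [(ref, n) for ref, n in counts.most_common() if n >= min_count]
```

      Raising min_count means a single drive-by hit is never published at all.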

    3. Re:Why just wikis? by Anonymous Coward · · Score: 5, Funny

      Why not normal discussion boards and blogs?

      As an employee of JBOSS, I'm shocked and appalled at your suggestion. Fortunately, JBOSS is working on a new JBOSS solution to overcome this problem using JBOSS. We at JBOSS are passionate that our JBOSS technology will prevent even non-JBOSS users from taking advantage of boards this way.

      Frank Lee Awnist
      JBOSS Employee
      JBOSS Inc.

      JBOSS JBOSS JBOSS

    4. Re:Why just wikis? by ichimunki · · Score: 5, Informative

      The real problem with Wikis is that the link will remain there, even after it has been removed from the current page, because most Wikis have a revision history feature. So what's needed is a careful setup in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).
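      As a rough illustration of that robots.txt setup, the sketch below uses Python's standard urllib.robotparser to check which URLs a crawler would be allowed to fetch. It assumes a MediaWiki-style layout (current pages under /wiki/Title; history, diffs, old revisions and edit forms under /index.php?...) and a hypothetical host name. A wiki that also serves current pages through index.php would need different rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an assumed MediaWiki-style URL layout:
# current pages live under /wiki/Title, while history, diffs, old
# revisions and edit forms are all served via /index.php?title=...&...
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "https://wiki.example.org"  # hypothetical wiki host
    for path in ["/wiki/Sandbox",
                 "/index.php?title=Sandbox&oldid=42",
                 "/index.php?title=Sandbox&action=edit"]:
        verdict = "crawl" if rp.can_fetch("Googlebot", base + path) else "skip"
        print(path, "->", verdict)
```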

      My wiki got hit by this stupid link, but not in the sandbox. Of course, recovering the previous version of the page is easy... it's wiping out any trace of the lameness that gets trickier. I suppose the easiest way to defeat this would be to require simple registration in order to edit Wiki pages.

      What else can we do? Alter the names of the submit buttons and some of the other key strings involved in Editing?
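      Renaming the edit fields can be automated per session, so a bot replaying a hard-coded POST no longer finds the fields it expects. This is a minimal Python sketch, assuming a server-side secret and a session ID are available; all names here are hypothetical. A bot that loads the edit form first and scrapes the current field names would still get through.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-installation secret; a real wiki would load it from config.
SERVER_SECRET = secrets.token_bytes(32)


def field_name(logical_name, session_id):
    """Derive an unguessable, per-session HTML name for a form field."""
    message = f"{session_id}:{logical_name}".encode()
    digest = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return "f_" + digest[:16]


def read_edit_form(form, session_id, logical_names=("text", "summary", "save")):
    """Map posted fields back to logical names; None if any are missing.

    A bot replaying a POST with fixed, well-known field names gets None.
    """
    values = {}
    for name in logical_names:
        key = field_name(name, session_id)
        if key not in form:
            return None
        values[name] = form[key]
    return values
```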

      --
      I do not have a signature
    5. Re:Why just wikis? by Pieroxy · · Score: 4, Funny

      You forgot the link: JBOSS.

    6. Re:Why just wikis? by clarkcox3 · · Score: 5, Funny

      That's just irresponsible. By putting that link there (the one that says Litigious Bastards), you're contributing to the problem.

      Again, responsible people do not put "Litigious Bastards" links in their slashdot posts.

      Think about it? How would you like a google search for Litigious Bastards to point to your company, leading everyone to think that you and your co-workers are nothing but a bunch of Litigious Bastards?

      --
      There are no tiger attacks in my area and it's all because this rock I'm holding keeps the tigers away.
    7. Re:Why just wikis? by Eivind · · Score: 4, Informative
      It's working almost *too* well. Not only are SCO the number one hit for "litigious bastards", but they're also the number one hit for "litigious" or "bastards" alone.

      Then again maybe that mostly says something about their popularity.

  2. Cyberneighborhood Not-Watch? by raehl · · Score: 5, Interesting

    In the real world, there are neighborhood watch signs to "deter" criminals.

    Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.

    1. Re:Cyberneighborhood Not-Watch? by Random+Web+Developer · · Score: 5, Informative

      There is a robots meta tag for this that you can put in the <head> of a single page (robots.txt works by directory), but unfortunately most webmasters are too ignorant to realize the power of these:

      http://www.robotstxt.org/wc/meta-user.html

      --
      Artists against online scams http://www.aa419.org/
    2. Re:Cyberneighborhood Not-Watch? by phutureboy · · Score: 4, Interesting

      You can also list robots.txt commands as meta tags in the <head> portion of the document. So, the wiki authors could just put them in the sandbox template, and individual site owners would not even have to know about, or monkey with, robots.txt to be protected.
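      A minimal sketch of adding the tag to a sandbox template, in Python. The noindex,nofollow values follow the robots meta tag convention documented at robotstxt.org; the helper name and regexes are hypothetical, and whether a given search engine also stops counting the page's links for ranking is up to that engine.

```python
import re

# Assumed directive for a sandbox page: don't index it, don't follow its links.
ROBOTS_META = '<meta name="robots" content="noindex,nofollow">'

HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)
EXISTING_ROBOTS = re.compile(r"<meta\b[^>]*name=[\"']?robots", re.IGNORECASE)


def add_robots_meta(template_html):
    """Return the template with a robots meta tag inserted right after <head>."""
    if EXISTING_ROBOTS.search(template_html):
        return template_html  # already has a robots directive; leave it alone
    match = HEAD_OPEN.search(template_html)
    if match is None:
        raise ValueError("template has no <head> element")
    pos = match.end()
    return template_html[:pos] + "\n" + ROBOTS_META + template_html[pos:]
```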

  3. Oh well by SpaceCadetTrav · · Score: 5, Informative

    Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.

  4. Yes... PLEASE... by Paulrothrock · · Score: 4, Insightful
    Google needs to do something about this. I had to turn off comments on my blog because all I was getting was spam. Two or three a day that I had to go in and delete. Now I have to find a system that will keep the bots out.

    What happened to the nice internet we had in 1996?

    --
    I'm in the hole of the broadband donut.
    1. Re:Yes... PLEASE... by n-baxley · · Score: 4, Interesting

      The system was even easier to rig back then. Back in '96 or so, I created a web page with the title "Not Sexy Naked Women", repeated that phrase several times, and then added a message telling people to click the link below for more Hot Sexy Naked Women, which took them to a page that admonished them for looking for such trash. I added a banner ad to the top of both of these pages, submitted them to a search engine and made $500 in a month! Things are better today, but they're still not perfect.

  5. like porn by millahtime · · Score: 4, Interesting

    This seems similar to the system all those porn sites used to get such a high rank in Google.

    Kind of playing the system, with the content not being quite as desirable.

  6. Whose fault is that? by lukewarmfusion · · Score: 4, Insightful

    Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."

    Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two: by using outside references (counting how often the site is linked), it assumes that a frequently linked site is more relevant, because other people have chosen to put links to it on their own sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.

    Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
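    To make the link-counting idea concrete, here is a simplified PageRank power iteration in Python, based on the published PageRank formulation rather than whatever Google actually runs. The damping factor of 0.85, the even spreading of rank from pages with no outgoing links, and the toy five-page graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Pages with no outgoing links spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the same small baseline share...
        new_rank = {page: (1.0 - damping) / n for page in pages}
        # ...plus a cut of the rank of every page that links to it.
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy site: A, B and C link to each other and to a public sandbox.
before = {
    "A": ["B", "sandbox"],
    "B": ["C", "sandbox"],
    "C": ["A"],
    "sandbox": ["A"],
    "spam": ["A"],
}
# A spammer edits the sandbox to add a link to their own site.
after = dict(before, sandbox=["A", "spam"])

if __name__ == "__main__":
    print("spam before:", round(pagerank(before)["spam"], 4))
    print("spam after: ", round(pagerank(after)["spam"], 4))
```

    In this toy graph the spammer's page starts at the bare minimum of (1 - 0.85)/5 = 0.03, since nothing links to it, and rises once the sandbox links to it, because it now receives a share of the sandbox's own rank.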

  7. ROBOTS.TXT by gtrubetskoy · · Score: 4, Insightful
    The burden is not on Google, but on Wiki sandbox admins, who should provide proper ROBOTS.TXT files to inform Google that this content should not be indexed.

    As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.

  8. Complacency by faust2097 · · Score: 5, Interesting
    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.

    But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing; I'm an AltaVista alumnus, and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.

  9. Well, it's about time this gets some attention by digitalgimpus · · Score: 4, Insightful

    I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....

    From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps Wiki Spamming, as a way to increase their clients' ratings.

    It's not about meta tags, or submitting anymore... it's spamming.

    Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?

    IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?

    If they are going to use my blog... why can't I get a cut in that business?

    1. Re:Well, it's about time this gets some attention by Lurker+McLurker · · Score: 4, Insightful
      IMHO those services are pure evil.
      No, 9/11 was pure evil, some unwanted comments on a blog is an annoyance. If you have a website that allows anyone to post comments, you will get some you don't like. That's life.
      --
      Mod parent up!
  10. This happened to me by JohnGrahamCumming · · Score: 4, Interesting

    This happened on the POPFile Wiki. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.

    It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.

    John.
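    A display-time allow-list like the one described could look roughly like this in Python. It is not POPFile's actual code; the URL regex, the example.org pattern, and the function name are assumptions, and real wiki markup would need its own link parsing.

```python
import re

# Rough pattern for bare external URLs in page text (an assumption; real
# wiki markup has several link syntaxes).
URL_RE = re.compile(r'https?://[^\s<>"\]]+')

# Hypothetical allow-list maintained by hand by the wiki admin.
ALLOWED = [
    re.compile(r"https?://([a-z0-9-]+\.)*example\.org(/|$)", re.IGNORECASE),
]


def strip_unapproved_urls(text, allowed=ALLOWED):
    """Remove, at display time, any URL that no allow-list regex matches."""
    def keep_if_allowed(match):
        url = match.group(0)
        return url if any(p.match(url) for p in allowed) else ""
    return URL_RE.sub(keep_if_allowed, text)
```

    The trailing (/|$) keeps a look-alike host such as example.org.spam.invalid from slipping through on a prefix match.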

  11. Re:You know... by Anonymous Coward · · Score: 4, Insightful

    that button will also get spammed, as bots will click 'yes' for their sites and 'no' for the competitors sites

  12. Tomorrow today yesterday by boa13 · · Score: 4, Insightful

    But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.

    The Arch Wiki has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will soon require posters to be authenticated, which is a blow to the wiki ideal, but not such a major one. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. an external link over a single space).

    So, this is not new, but this is now news.
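    The "external link over a single space" heuristic is easy to sketch. This Python version assumes MediaWiki-style [url label] link syntax (the Arch Wiki's markup may differ) and flags any link whose visible label contains no letters or digits; the function name is hypothetical.

```python
import re

# Assumes MediaWiki-style external links: [http://url visible label]
EXTERNAL_LINK = re.compile(r"\[(https?://[^\s\]]+)\s+([^\]]*)\]")


def hidden_links(wikitext):
    """Return URLs whose visible label has no letters or digits at all."""
    return [
        url
        for url, label in EXTERNAL_LINK.findall(wikitext)
        if not any(ch.isalnum() for ch in label)
    ]
```

    Running it over each new revision would give a much shorter list to check by hand than the whole RecentChanges page.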

  13. Not a big deal by arvindn · · Score: 4, Informative

    Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the Wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs, so I wouldn't worry.
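    A save-time block-list check of the kind described can be sketched in a few lines of Python. This is not MediaWiki's PHP implementation; the pattern for the reserved spam.invalid domain and the function name are hypothetical.

```python
import re

# Hypothetical block-list: one pattern per spammed domain, including subdomains.
BLOCKED_PATTERNS = [
    re.compile(r"https?://([a-z0-9-]+\.)*spam\.invalid\b", re.IGNORECASE),
]


def edit_is_blocked(new_text, patterns=BLOCKED_PATTERNS):
    """Reject a save if the submitted page text matches any blocked pattern."""
    return any(p.search(new_text) for p in patterns)
```

    The ([a-z0-9-]+\.)* part also catches subdomains such as www.spam.invalid without matching a look-alike such as notspam.invalid.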

  14. True by Pan+T.+Hose · · Score: 4, Funny

    "Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"

    I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  15. It just might work! by mcmonkey · · Score: 4, Funny

    'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'

    Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.

    Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.

    It sounds crazy, I know, but I think we could do this.

  16. visual security code for sign-up by Saeed+al-Sahaf · · Score: 4, Informative

    Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
    1. Re:visual security code for sign-up by stevey · · Score: 5, Insightful

      There was a story about defeating this system on /. a while back.

      Rather than using OCR or anything similar, people would merely harvest a load of images from a signup site, which is possible when there is only a finite number of images, or when there is a consistent naming policy.

      Then, once the images were collected, they would merely set up an online porn site, asking people to join for free and prove they were human by decoding the very images they had downloaded.

      Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...

      Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..
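      The harvesting trick depends on images that repeat or have predictable names. A minimal Python sketch of the countermeasure, under stated assumptions: a fresh random code per request, an opaque token as the only thing the page (and image URL) exposes, and single-use checking. The in-memory store and function names are hypothetical, image rendering is omitted, and this does nothing against humans solving the images in real time.

```python
import secrets
import string

# Hypothetical in-memory store: challenge token -> expected answer.
# A real site would keep this in its session or database layer.
_pending = {}

ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Create a fresh random code plus an opaque token for the form/image URL.

    The answer goes only to the (not shown) image renderer; the page sees
    just the token, so image names reveal nothing and never repeat.
    """
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = secrets.token_urlsafe(16)
    _pending[token] = answer
    return token, answer


def check_challenge(token, response):
    """Single-use check: the token is discarded whether or not it matches."""
    expected = _pending.pop(token, None)
    if expected is None:
        return False
    return secrets.compare_digest(expected.encode(), response.strip().upper().encode())
```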

  17. "Finally"?? by jdavidb · · Score: 4, Interesting

    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.

    The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.

  18. Grow up by scrytch · · Score: 4, Funny

    You know, googlebombing might have a better effect if you did it in reverse, e.g. on the word "SCO" itself. Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO, come up with a different link for it, so their Google results will be peppered with such commentary... People search for "SCO", not "litigious bastards".

    "Dumb fucker", "miserable failure", etc ... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
    1. Re:Grow up by maxwell+demon · · Score: 4, Insightful

      Well, why not link SCO to something the reader gets real value from? Some page where they can learn something about SCO? After all, since those pages indeed tell something about SCO and therefore contain the word SCO, it should even be more effective.

      --
      The Tao of math: The numbers you can count are not the real numbers.