Slashdot Mirror


Webmasters Pounce On Wiki Sandboxes

Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."

80 of 324 comments

  1. Why just wikis? by GillBates0 · · Score: 4, Insightful

    Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.

    --
    An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
    1. Re:Why just wikis? by caino59 · · Score: 5, Funny

      We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.

      You forgot the link: Litigious Bastards

    2. Re:Why just wikis? by abscondment · · Score: 3, Interesting

      posting on Wikis doesn't screw up your own blog.

      posts on message boards will be deleted quickly, unless the board is expressly google bombing (as in the current Nigritude Ultramarine 1st placer) / people are stupid

      i think the idea is that wikis make it easier in general for your post to stay up and not affect your blog.

    3. Re:Why just wikis? by nautical9 · · Score: 4, Interesting
      I host my own little phpBB boards for friends and family, but it is open to the world. Recently I've noticed spammers registering users for the sole purpose of being included in the "member list", with a corresponding link back to whatever site they wish to promote. They'll never actually post anything, but they've obviously automated the sign-up procedure as I get a new member every day or so, and google will eventually find the member list link.

      And of course there are still sites that list EVERY referer in their logs somewhere on their site, so spammers have been adding their site URLs to their bot's user agent string. It's amazing the lengths these people will go to spam google.

      Sure hope they can find a nice, elegant solution to this.

    4. Re:Why just wikis? by Anonymous Coward · · Score: 5, Funny

      Why not normal discussion boards and blogs?

      As an employee of JBOSS, I'm shocked and appalled at your suggestion. Fortunately, JBOSS is working on a new JBOSS solution to overcome this problem using JBOSS. We at JBOSS are passionate that our JBOSS technology will prevent even non- JBOSS users from taking advantage of boards this way.

      Frank Lee Awnist
      JBOSS Employee
      JBOSS Inc.

      JBOSS JBOSS JBOSS

    5. Re:Why just wikis? by ichimunki · · Score: 5, Informative

      The real problem with Wikis is that the link will remain there, even after it has been removed from the current page, because most Wikis have a revision history feature. So what's needed is careful set up in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).
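
      A rough sketch of how such robots.txt rules could be checked before deploying them, using Python's standard urllib.robotparser. The /wiki/history/ and /wiki/edit/ paths and the "ExampleBot" name are assumptions for illustration, not any particular wiki's layout:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: history and edit views are assumed to live under
# their own path prefixes, so crawlers only index the current page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/history/
Disallow: /wiki/edit/
"""


def crawler_may_fetch(url_path: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a well-behaved crawler may fetch url_path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url_path)


if __name__ == "__main__":
    for path in ("/wiki/FrontPage", "/wiki/history/FrontPage", "/wiki/edit/FrontPage"):
        print(path, crawler_may_fetch(path))
```

      This only helps with crawlers that honor robots.txt; wikis that serve every view through one script and a query string would need different rules.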

      My wiki got hit by this stupid link, but not in the sandbox. Of course, recovering the previous version of the page is easy... it's wiping out any trace of the lameness that gets trickier. I suppose the easiest way to defeat this would be to require simple registration in order to edit Wiki pages.

      What else can we do? Alter the names of the submit buttons and some of the other key strings involved in Editing?

      --
      I do not have a signature
    6. Re:Why just wikis? by Andy+Mitchell · · Score: 3, Insightful

      I'm not sure this will make you feel better, but this strategy has a limited lifetime.

      The contribution of your page to another page's page rank depends on two factors: first, the page rank of your page, and second, the number of links coming from your page.

      As more people take up this tactic, the return everyone gets from it gets smaller. E.g., when there are hundreds of links on that page, they cease to have any real value. Eventually people should give up on this one.
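
      To put rough numbers on that, here is a small sketch based on the simplified, published PageRank formula, where a page passes on its rank divided evenly among its outgoing links. Google's production ranking is not public, so the function name and the 0.85 damping factor (the value from the original paper) are illustrative assumptions only:

```python
def link_contribution(page_rank: float, outgoing_links: int, damping: float = 0.85) -> float:
    """Rank passed along by ONE outgoing link in the simplified PageRank formula."""
    if outgoing_links <= 0:
        raise ValueError("a page must have at least one outgoing link")
    return damping * page_rank / outgoing_links


if __name__ == "__main__":
    # The same sandbox page, as more and more spammers add their links to it.
    for links in (1, 10, 100, 1000):
        print(links, link_contribution(1.0, links))
```

      Each added link shrinks the share every other link on the page receives, which is the diminishing return described above.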

    7. Re:Why just wikis? by Pieroxy · · Score: 4, Funny

      You forgot the link: JBOSS.

    8. Re:Why just wikis? by Anonymous Coward · · Score: 2, Informative

      Just set your robots.txt to exclude the user list. Or if you don't have many friends and family, send yourself an 'approve member' email. Then start training your spam filter on fake accounts.

    9. Re:Why just wikis? by clarkcox3 · · Score: 5, Funny

      That's just irresponsible. By putting that link there (the one that says Litigious Bastards), you're contributing to the problem.

      Again, responsible people do not put "Litigious Bastards" links in their slashdot posts.

      Think about it? How would you like a google search for Litigious Bastards to point to your company, leading everyone to think that you and your co-workers are nothing but a bunch of Litigious Bastards?

      --
      There are no tiger attacks in my area and it's all because this rock I'm holding keeps the tigers away.
    10. Re:Why just wikis? by boa13 · · Score: 3, Informative

      So what's needed is careful set up in the robots.txt file and other HTML clues for the web crawlers to exclude anything but the most current version of a page (and to skip over the other 'action' pages, like edits, etc).

      It has probably already been done in any wiki software worth its salt. Here's what MoinMoin does for example:

      * It has a regexp of HTTP_USER_AGENTS which should receive a FORBIDDEN for anything except viewing a page. The default setting includes many known bots (including Google) and utilities such as wget.
      * Most pages contain the appropriate robot meta tag, with the relevant noindex and/or nofollow settings.
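
      The first point can be sketched roughly as follows. This is not MoinMoin's actual code or its default list; the pattern, the "show" action name, and the function name are assumptions made for illustration:

```python
import re

# Illustrative pattern only; the real default list is not reproduced here.
BOT_USER_AGENTS = re.compile(r"googlebot|wget|slurp|crawler|spider", re.IGNORECASE)


def status_for(user_agent: str, action: str) -> int:
    """Return 403 for known bots doing anything except viewing, else 200."""
    if action != "show" and BOT_USER_AGENTS.search(user_agent):
        return 403
    return 200


if __name__ == "__main__":
    print(status_for("Googlebot/2.1", "show"))  # viewing is allowed
    print(status_for("Googlebot/2.1", "edit"))  # editing is refused
```

      Since a user agent string is trivially changed by the client, a check like this only filters honest bots.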

      In addition to that, the webmaster can of course set up a robots.txt file, and actually should do so because there are tools out there which don't understand the robot meta tags (or they don't want to take a performance hit) and the user agent of which can easily be changed by the user... wget comes to mind.

      Of course, it shouldn't be too hard to add regexps to prevent certain links from being done, or certain hostnames or IPs from altering the site (editing pages, reverting them, deleting them).

    11. Re:Why just wikis? by Eivind · · Score: 4, Informative
      It's working almost *too* well. Not only are SCO the number one hit for "litigious bastards", but they're also the number one hit for "litigious" or "bastards" alone.

      Then again maybe that mostly says something about their popularity.

    12. Re:Why just wikis? by mrtroy · · Score: 3, Funny

      Top 5 reasons that unix > linux, according to SCO

      SCO UNIX® is a Proven, Stable and Reliable Platform
      SCO UNIX® is backed by a single, experienced vendor
      SCO UNIX® has a Committed, Well-Defined Roadmap
      SCO UNIX® is Secure
      SCO UNIX® is Legally Unencumbered

      HAHAHAHAHAAHHAHAHAHAHAHAHA

      That should be a top 10 list, and on letterman's show

      --
      [I can picture a world without war, without hate. I can picture us attacking that world, because they'd never expect it]
  2. Cyberneighborhood Not-Watch? by raehl · · Score: 5, Interesting

    In the real world, there are neighborhood watch signs to "deter" criminals.

    Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.

    1. Re:Cyberneighborhood Not-Watch? by lunax · · Score: 3, Insightful

      Why not put the sandbox in its own folder and add an entry to the robots.txt telling it not to browse that folder?

    2. Re:Cyberneighborhood Not-Watch? by Random+Web+Developer · · Score: 5, Informative

      There is a robots meta tag for this that you can put in your headers for a single page (robots.txt needs subdirs) but unfortunately most webmasters are too ignorant to realize the power of these:

      http://www.robotstxt.org/wc/meta-user.html

      --
      Artists against online scams http://www.aa419.org/
    3. Re:Cyberneighborhood Not-Watch? by Random+Web+Developer · · Score: 2, Informative

      The problem with wikis is that they use 1 template for all pages, including the sandbox; everything is wiki.pl?PageName or something like that. You would have to dive into the code instead of just "using" the wiki.

      --
      Artists against online scams http://www.aa419.org/
    4. Re:Cyberneighborhood Not-Watch? by naoiseo · · Score: 3, Insightful

      This fails to address the real issue.

      That is, even if you make your links useless (easy with a no-follow meta tag), it won't help: the majority of this spam is AUTOMATED, and will spam your wiki/blog/guestbook based on simple page cues.

      Your best personal defense is to manually remove any page or HTML cues that a spammer would pick up on as being common to a certain type of postable web page or element.

      Bloggers have been creating blacklists (banning both poster ips and destination urls) with some degree of success. This is a deterrent, having a spammer show up on a blacklist whereby webmasters use a distributed file to 'clean' their blogs automatically.

    5. Re:Cyberneighborhood Not-Watch? by Random+Web+Developer · · Score: 2, Informative

      As most spam posts have several links in them, WordPress allows setting a threshold: X number of links in the comment gets it queued for moderation.
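
      A minimal sketch of that kind of check, not WordPress's actual implementation. The function name, the default of 2, and the choice to hold comments with more than max_links links (rather than at least that many) are assumptions:

```python
import re

# Counts anything that looks like the start of an http(s) URL.
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def needs_moderation(comment: str, max_links: int = 2) -> bool:
    """Hold a comment for review when it contains more than max_links links."""
    return len(LINK_RE.findall(comment)) > max_links


if __name__ == "__main__":
    print(needs_moderation("Nice post, thanks!"))
    print(needs_moderation("http://a.example http://b.example http://c.example"))
```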

      --
      Artists against online scams http://www.aa419.org/
    6. Re:Cyberneighborhood Not-Watch? by phutureboy · · Score: 4, Interesting

      You can also list robots.txt commands as meta tags in the [head] portion of the document. So, the wiki authors could just put them in the sandbox template, and individual site owners would not even have to know about / monkey with robots.txt to be protected.
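
      As a sketch of what that could look like inside a wiki's page template code, assuming a hypothetical helper that is called with the page name while rendering the head section (the sandbox page names are made up):

```python
# Hypothetical names of sandbox pages; a real wiki would configure these.
SANDBOX_PAGES = {"SandBox", "WikiSandBox"}


def robots_meta(page_name: str) -> str:
    """Return the robots meta tag to emit in the page's head section."""
    if page_name in SANDBOX_PAGES:
        content = "noindex,nofollow"
    else:
        content = "index,follow"
    return f'<meta name="robots" content="{content}">'


if __name__ == "__main__":
    print(robots_meta("SandBox"))
    print(robots_meta("FrontPage"))
```

      Shipping this in the default template would protect sites whose owners never touch robots.txt, though only against crawlers that honor the tag.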

    7. Re:Cyberneighborhood Not-Watch? by jacoplane · · Score: 2, Insightful

      I think the real problem is that spammers aren't likely to look at how you've configured spiders to handle your site. So even if you do this i'm sure it won't get rid of the spammers.

    8. Re:Cyberneighborhood Not-Watch? by bkhl · · Score: 2, Insightful

      Also, we really need to replace the klugy robots.txt files and robots meta-tags with headers built in to the HTTP protocol.

      Like it is, it's hell to try to get decent robotic behaviour out of anything other than HTML pages.

  3. Oh well by SpaceCadetTrav · · Score: 5, Informative

    Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.

  4. Yes... PLEASE... by Paulrothrock · · Score: 4, Insightful
    Google needs to do something about this. I had to turn off comments on my blog because all I was getting was spam. Two or three a day that I had to go in and delete. I have to now find a system that will keep the bots out.

    What happened to the nice internet we had in 1996?

    --
    I'm in the hole of the broadband donut.
    1. Re:Yes... PLEASE... by lukewarmfusion · · Score: 2, Interesting

      As my site grows, I'm thinking about adding a mechanism to address those issues: when the user requests a page for the first time, he'll get a session value that says he's a valid visitor to the site. When he submits a comment, he has to have that value, or comments aren't allowed. I don't know how you'd write a script to circumvent that. (If someone can tell me, I'd love to know so I try to prevent it!)
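
      One way the scheme described above might be sketched, assuming the server signs the visitor's session id with a secret key and checks that signature when a comment is submitted. All names here are hypothetical:

```python
import hashlib
import hmac
import secrets

# Per-server secret, generated at startup in this sketch.
SECRET_KEY = secrets.token_bytes(32)


def issue_token(session_id: str) -> str:
    """Token handed out with the first page view for this session."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def comment_allowed(session_id: str, token: str) -> bool:
    """Accept a comment only if the token matches the session it was issued for."""
    return hmac.compare_digest(issue_token(session_id), token)


if __name__ == "__main__":
    token = issue_token("visitor-123")
    print(comment_allowed("visitor-123", token))
    print(comment_allowed("visitor-123", "forged"))
```

      As for circumventing it: a script that requests the page first and keeps its cookies receives a valid token like any browser, so a check of this kind mainly stops bots that post blindly to the form.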

    2. Re:Yes... PLEASE... by n-baxley · · Score: 4, Interesting

      The system was even easier to rig back then. Back in 96ish, I created a web page with the title "Not Sexy Naked Women". Then repeated that phrase several times and then gave a message telling people to click the link below for more Hot Sexy Naked Women which took them to a page that admonished them for looking for such trash. I added a banner ad to the top of both of these pages, submitted them to a search engine and made $500 in a month! Things are better today, but they're still not perfect.

    3. Re:Yes... PLEASE... by happyfrogcow · · Score: 2, Funny

      What happened to the nice internet we had in 1996?

      i blame blogs

    4. Re:Yes... PLEASE... by Paulrothrock · · Score: 2, Insightful

      No, I blame opportunistic bastards who can't see that it's okay to not profit from something. *Thinks about his sledding hill that was destroyed by an upscale minimall.*

      --
      I'm in the hole of the broadband donut.
    5. Re:Yes... PLEASE... by joggle · · Score: 2, Interesting

      Why not generate an image containing modified text like yahoo and others? Using a little PHP magic, it shouldn't be too hard (see here to get a start).

  5. like porn by millahtime · · Score: 4, Interesting

    This seems similar to the system all those porn sites used to get such a high rank in Google.

    Kind of playing the system, with the content not being quite as desirable.

  6. You know... by fizban · · Score: 3, Insightful

    ...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.

    --

    +1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.

    1. Re:You know... by Anonymous Coward · · Score: 4, Insightful

      that button will also get spammed, as bots will click 'yes' for their sites and 'no' for the competitors sites

    2. Re:You know... by goon+america · · Score: 3, Insightful

      Wouldn't that be equally abused?

    3. Re:You know... by Nasarius · · Score: 2, Insightful

      Ah, but how long will it take for someone to write a worm with a Google-abusing payload? We've already got spammers using hacked PCs to send mail.

      --
      LOAD "SIG",8,1
  7. < jab jab > by jx100 · · Score: 2, Interesting

    Well, couldn't have been that successful, for he didn't win.

  8. Some people ... by TheGavster · · Score: 2, Insightful
    It still gets me how the people who are participating in the nigritude ultramarine thing don't see anything wrong with what they're doing. This line particularly got me:
    "Without, as opposed to guestbook spamming, being evil it's a sandbox after all."

    Yes, it's a sandbox; no, it's not your personal playground.
    --
    "Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
  9. google works by mwheeler01 · · Score: 3, Informative

    Google does tweak their ranking system on a regular basis. When the problem becomes evident, (and it looks like it just has) they do something about it...that's why they're google.

    --
    Pretty widgets? What pretty widgets?
  10. Who's fault is that? by lukewarmfusion · · Score: 4, Insightful

    Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."

    Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two - by using outside references (counting how often the site is linked) it assumes that the site is more relevant. Because other people have put links on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.

    Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.

    1. Re:Who's fault is that? by bcrowell · · Score: 2, Interesting
      Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."
      I'm not even convinced Google's algorithm has a problem. One thing a lot of people don't realize about the page rank algorithm is that your page rank goes down if you have lots of outgoing links that aren't reciprocated with links coming back from the site you linked to. It may be that this technique simply leads to a reduction in the page rank of the sandbox, which, after all, is appropriate, since the sandbox isn't something the sandbox's owner even wants people to find by Google searching.

      Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
      Yeah, the simplest thing to do would be for the sandbox's owner simply to use the robots.txt file to forbid indexing of the sandbox page. That keeps the rest of the web site's page rank from being adversely affected, deters spammers from abusing the sandbox, and does Google's users a service by not directing them to the sandbox, which they don't want to find.

      Spammers aren't stupid -- if I was an Evil Spammer(tm), I'd certainly make sure my script checked the robots.txt and didn't waste time spamming sandboxes that weren't going to be indexed.

  11. ROBOTS.TXT by gtrubetskoy · · Score: 4, Insightful
    The burden is not on Google, but on Wiki sandbox admins, who should provide proper ROBOTS.TXT files to inform Google that this content should not be indexed.

    As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.

  12. Same site, a few days later: Don't do it. by micha2305 · · Score: 2, Insightful
    Ok, but the same webmaster says:

    I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.

    This sure makes sense, and who knows, maybe future wiki distributions do it by default. (If

    <meta name="robots" content="noindex">
    would work universally...)
  13. Complacency by faust2097 · · Score: 5, Interesting
    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.

    But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing. I'm an AltaVista alumnus, and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.

  14. Well, it's about time this gets some attention by digitalgimpus · · Score: 4, Insightful

    I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....

    From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps Wiki spamming, as a way to increase their clients' ratings.

    It's not about meta tags, or submitting anymore... it's spamming.

    Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?

    IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?

    If they are going to use my blog... why can't I get a cut in that business?

    1. Re:Well, it's about time this gets some attention by Lurker+McLurker · · Score: 4, Insightful
      IMHO those services are pure evil.
      No, 9/11 was pure evil, some unwanted comments on a blog is an annoyance. If you have a website that allows anyone to post comments, you will get some you don't like. That's life.
      --
      Mod parent up!
    2. Re:Well, it's about time this gets some attention by Heywood+Yabuzof · · Score: 2, Insightful


      OK, so it's not really fair to get into relative levels of "evil", but let's also not minimize the "evil" that search optimizers do. It's not just a bunch of extra comments on blogs or wikis.

      Their fundamental business model is CONTRARY to my interests as a consumer trying to get product information. They don't wish to let me find the product or the review or the site that MOST PEOPLE FOUND USEFUL, they only want me to find the one that PAID THEM THE MOST MONEY.

      I realize that's just the way things are, but that's obviously counter to my whole purpose for using a search engine like Google. They are intentionally polluting the search results. It's not the methods I find "evil" (although blog comment and wiki spamming are pretty shady) as much as the end result - the loss of helpful web searches.

  15. This happened to me by JohnGrahamCumming · · Score: 4, Interesting

    This happened on the POPFile Wiki. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.
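
    A rough sketch of that kind of allowed list, not the actual POPFile Wiki code. The pattern, the function name, and the choice to show a "[link removed]" placeholder in place of unlisted URLs are all assumptions for illustration:

```python
import re

# Hypothetical allowed list; each entry is a regexp a URL must match.
ALLOWED_URLS = [
    re.compile(r"^https?://([\w-]+\.)*sourceforge\.net(/|$)"),
]

# Finds anything in the page text that looks like an http(s) URL.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def strip_unlisted_links(text: str) -> str:
    """Replace every URL not covered by the allowed list before display."""

    def keep_or_drop(match: "re.Match[str]") -> str:
        url = match.group(0)
        if any(pattern.match(url) for pattern in ALLOWED_URLS):
            return url
        return "[link removed]"

    return URL_RE.sub(keep_or_drop, text)


if __name__ == "__main__":
    print(strip_unlisted_links("Docs: http://popfile.sourceforge.net/manual"))
    print(strip_unlisted_links("buy http://spam.example.com/x now"))
```

    Filtering at display time means the spam edit still lands in the page source, but it earns the spammer no visible or crawlable link.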

    It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.

    John.

  16. I've seen this by goon+america · · Score: 3, Informative
    I just reverted some pages on my watch list on Wikipedia that had been edited with a google spam bot to link all sorts of words back to its mother site.... lots of mistakes, looked like the script they were using hadn't been tested that well yet. (Would post an example, but wikipedia is completely fuxx0red at the moment).

    This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.

  17. apache + search + p2p = distributed search engine by datrus · · Score: 2, Insightful

    Something that would make a nice opensource project would be to include p2p search functionality in apache itself.
    This way all the modified web servers would make a giant distributed search engine.
    Some nice algorithms like Koorde or Kademlia could be used.
    Anyone thought about starting something like this?

    David

  18. Google. by Rick+and+Roll · · Score: 3, Interesting
    When I search on Google, half the time I am looking for one of the best sites in a category, like perhaps "OpenGL programming". Other times, however, I am looking for something very specific that may only be referenced about twenty times, if at all.

    When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.

    The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.

    Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.

    I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.

    Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.

    1. Re:Google. by nsingapu · · Score: 2, Interesting

      Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.

      Google owns oingo.com - perhaps the largest collection of squatter sites out there.

  19. Tomorrow today yesterday by boa13 · · Score: 4, Insightful

    But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.

    The Arch Wiki has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. external link over a single space).
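
    Such a heuristic might look roughly like the sketch below, which scans rendered HTML for external links whose visible text contains no letters or digits. The regexp and function name are assumptions; a real wiki would more likely check its own markup before rendering:

```python
import re
from typing import List

# Matches simple anchors with a double-quoted external href.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*href="(https?://[^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def hidden_links(html: str) -> List[str]:
    """Return external URLs whose link text is only whitespace or punctuation."""
    suspicious = []
    for url, text in ANCHOR_RE.findall(html):
        visible = text.replace("&nbsp;", " ").strip()
        if not any(ch.isalnum() for ch in visible):
            suspicious.append(url)
    return suspicious


if __name__ == "__main__":
    page = 'Welcome<a href="http://spam.example/"> </a> to the wiki.'
    print(hidden_links(page))
```

    Flagged edits could then be held for review instead of relying on someone watching RecentChanges.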

    So, this is not new, but this is now news.

  20. Not a big deal by arvindn · · Score: 4, Informative

    Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the Wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs. So I wouldn't worry.

  21. Hmm by Julian+Morrison · · Score: 3, Interesting

    Leave the links, edit the text to read something like "worthless scumbag, scamming git, googlebomb, please die, low quality, boring" - and lock the page.

  22. This is a concern for the Google Gorilla? by Mr.Fork · · Score: 2, Interesting

    Wait a minute - a way to spoof Google to get your page ranked better through WiKi? OMFG! Call the internet police, call Dr. Eric E. Schmidt, call out the Google Gorilla goons! I'm sure the good Dr. has a fix like the ones he used at Novell...

    The problem with the whole Google model is that it's biased to begin with. If I'm looking for granny-smith apples, chances are an internet chimp has bought the space by paying bananas to Google's goons. It becomes obvious when you see a chimp site near the top that has no business being at the top. To the experienced googler, it's just an annoying fly on the screen and you just move further down.

    I'm hoping that Google doesn't get too bogged down in becoming that big Ape like Micro$oft and be a little more proactive in protecting their business property. It's bad enough that they're selling top space to companies willing to pay, but here's hoping they don't slip on their own banana peels.

    --
    Management is doing things right; leadership is doing the right things. - Peter F. Drucker
  23. True by Pan+T.+Hose · · Score: 4, Funny

    "Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"

    I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...

    --
    Sincerely,
    Pan Tarhei Hosé, PhD.
    "Homo sum et cogito ergo odi profanum vulgus et libido."
  24. Re:Naughty behaviour by Doesn't_Comment_Code · · Score: 2, Informative

    I'm looking for a clean, fast, non-buggy alternative to the google giant. Preferably open source.

    Any suggestions?


    The only big one I know of right now is Nutch. It is an open source search engine that is in the later stages of development, but hasn't produced a large, usable site yet.

    nutch.org

    Since it will be open source, you will be able to read the ranking algorithms and change/abuse them as you see fit.

    This one http://search.mnogo.ru/ is also available.

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
  25. It just might work! by mcmonkey · · Score: 4, Funny

    'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'

    Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.

    Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.

    It sounds crazy, I know, but I think we could do this.

  26. visual security code for sign-up by Saeed+al-Sahaf · · Score: 4, Informative

    Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...

    --
    "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
    1. Re:visual security code for sign-up by stevey · · Score: 5, Insightful

      There was a story about defeating this system on /. a while back.

      Rather than using OCR or anything, people would merely harvest a load of images from a signup site, which is possible when there is only a finite number of images, or when there is a consistent naming policy.

      Then once the images were collected they would merely set up an online porn site, asking people to join for free by proving they were human, decoding the very images they had downloaded.

      Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...

      Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..

    2. Re:visual security code for sign-up by Bitsy+Boffin · · Score: 2, Informative

      Except that the images ("turing numbers" as they are often called) are dynamically generated from random character sequences, and probably with equally random distortions.

      You'd be pretty lucky to hit the exact same image twice.

      --
      NZ Electronics Enthusiasts: Check out my Trade Me Listings
    3. Re:visual security code for sign-up by smagruder · · Score: 2, Informative

      Check out the Visual Confirmation mod in the /contrib folder in your phpBB installation. Read the README.html file for installation instructions.

      --
      Steve Magruder, Metro Foodist
  27. Sandbox persistence by gmuslera · · Score: 2, Insightful
    If it's a test area, does it need to be stored at all? Wikis could just have it live for the user's current session or test, and when the user logs out or finishes editing, simply delete it or restore it to a default introductory text. It doesn't need to be some kind of collaborative blackboard or graffiti wall, or at least, if it must be, let that be the webmaster's choice (at least TikiWiki lets me disable the sandbox if I want).

    But if the problem is having areas on websites where visitors (even unregistered ones) can post random text and links, even Slashdot is potentially a target of the same (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content in one way or another, wiki or not.

  28. "Finally"?? by jdavidb · · Score: 4, Interesting

    Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?

    I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.

    The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.

    1. Re:"Finally"?? by jdavidb · · Score: 2, Informative

      I checked, and I've got documented evidence of this. On April 25 last year, I reported that earthlink.net was showing up as the top search result for queries involving various religious words, including "Bear Valley Bible Institute." The Church of Scientology (which owns Earthlink) was clearly engaging in something to distort the page rank of earthlink. I had noticed this for a long time before I recorded it.

      On that same day, I reported the problem to Google via their feedback mechanism. I note today that the problem is gone.

      Now if I can just do something about the "Church Of Christ at eBay Low Priced Church Of Christ. Huge Selection! (aff)" ads I keep getting on Google, I'll be happy... ;)

  29. Easy solution by lightspawn · · Score: 2, Insightful

    Edit robots.txt to let search engines know they should ignore sandbox pages.
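
    A rough illustration in Python of what such a rule looks like and how a well-behaved crawler would read it. The /wiki/Sandbox path and the wiki.example.com host are assumptions; the actual sandbox URL depends on the wiki software.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: the sandbox lives under /wiki/Sandbox on this hypothetical wiki.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Sandbox
"""


def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if a crawler honouring robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

    This only affects crawlers that choose to honour robots.txt; it does nothing to stop a bot from posting to the sandbox.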

  30. naked women are trash? i'll take all you got by waspleg · · Score: 3, Funny

    you know what they say about another man's garbage

  31. image based spam control by MaximusTheGreat · · Score: 3, Interesting

    What about using random image-based spam control like the one Yahoo uses on its new mail signup?
    So, every time you edit/post comment, you would be presented with an image with a random distorted text, which you will have to type in to be able to edit/post. That should take care of automated systems.

    1. Re:image based spam control by JamieF · · Score: 2, Insightful

      Hear, hear. Systems (software or otherwise) that offer something of monetary value for free, and provide no mechanism whatsoever to prevent people from exploiting them, are going to get exploited. Shocking!

      Maybe it wasn't obvious to blog and wiki programmers that the ability to post a comment or edit a wiki page was worth money. It isn't worth a lot per post, but because these are online systems, they are very susceptible to bots that can post in huge volume. All of those posts together can alter a site's placement in Google search results, and that's definitely worth money.

      Instead of whining about Google being influenced by attacks that use your Wiki or blog, how about making it hard for bots to post in the first place? Is that really an important feature that you can't live without?

    2. Re:image based spam control by Blakey+Rat · · Score: 2, Insightful

      I've always wondered why the images are always distorted text that is hard to read on speckled backgrounds.

      Why not just show the picture of an object, like an apple or something, and ask the user to type in what it is? I mean, you could have a few hundred of these and it would be nearly impossible for an automated system to guess. (You have a few hundred different items, and like 5-10 images of each item.) I dunno, seems easier to me, but I don't write web software.
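
      The object-picture idea could be sketched like this in Python. The catalog, the file names and both function names are invented for the example, and a real deployment would need far more items than this.

```python
import random

# Hypothetical catalog: each object name maps to several pictures of it.
CATALOG = {
    "apple": ["apple1.png", "apple2.png", "apple3.png"],
    "chair": ["chair1.png", "chair2.png", "chair3.png"],
    "clock": ["clock1.png", "clock2.png", "clock3.png"],
}


def new_challenge(catalog, rng=random):
    """Pick a random object, then a random picture of that object."""
    label = rng.choice(sorted(catalog))
    image = rng.choice(catalog[label])
    # The image goes to the browser; the label stays on the server.
    return image, label


def check_answer(expected_label, typed):
    """Compare the visitor's answer with the stored label, ignoring case."""
    return typed.strip().lower() == expected_label
```

      One caveat with this sketch: the set of images is fixed, so once somebody has labeled each file by hand, a bot can look the answer up by file content. That is the weakness the dynamically generated images avoid.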

  32. YHBT. YHL. HAND. [Was: Re:Well, ...] by waveclaw · · Score: 2, Insightful

    No, 9/11 was pure evil

    Overuse of absolutes can lead to their deterioration. As an American I couldn't feel more turgid: now when the Europeans get ready to yell HITLER!!!! in IRC, I can just pre-emptively yell 9/11!!!!!!! and lose/end the conversation.

    To be fair, the difference between these 'blog abusing 'minor annoyances' and the large scale deaths/destruction of 9/11 can be seen as just a matter of scale. To some people I know, the economic impact of terrorism keeps them awake at night: the value of human life be damned, watch that bottom line! (Not the most civically minded people, IMHO.)

    Being respected members of polite business society, these people and their defective outlook are just as dangerous to you and me as the wiki 'blog abusers and 9/11 baby killers. To them, you are either a customer, employee or garbage to be taken out by security.

    This, by the way, is how we treat anybody who we have successfully alienated. Look at these 'blog spammers. Would anyone have cried if Al Qaeda had blown up a spammer's house?

    Both sides of this argument stand at the top of a moral mountain with a very slippery slope and are trying to make the other fall off as far and as fast as possible. I'm waiting to see who tumbles first.

    Like they say on bash.org: I will become rich and famous when I invent a device to punch people in the face through the Internet.

    --

    "You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
  33. Re:Sure, that will work by Short+Circuit · · Score: 2, Informative

    I know you're being sarcastic, but one way to prevent forged IP addresses is to require the user to "preview" their comment before posting.

  34. Disallow weblinks by Will2k_is_here · · Score: 2, Interesting

    With regards to just editing the sandbox which nobody monitors anyway, why not just include a rule to deny adding URLs. There is no conceivable reason to allow a user to add a URL in the sandbox.

    And if you're thinking "I want to practise adding links with the required syntax", it's not hard. The only thing you need to use the sandbox for beyond learning how other basic syntax works (and you can apply that to links without practising) is structuring.
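
    A rough sketch in Python of such a rule. The pattern and the function names are assumptions, and the pattern is deliberately crude: it only catches the obvious http://, https://, ftp:// and www. forms.

```python
import re

# Crude, assumed pattern: anything starting with a common scheme or "www.".
URL_PATTERN = re.compile(r"(?:https?://|ftp://|www\.)\S+", re.IGNORECASE)


def contains_url(text):
    """Return True if the text appears to contain a web address."""
    return URL_PATTERN.search(text) is not None


def accept_sandbox_edit(text):
    """Reject any sandbox edit that tries to add a URL."""
    return not contains_url(text)
```

    A determined spammer can still obfuscate an address, but an address that isn't a working link is presumably of little use for raising page rank.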

  35. Clean sandbox daily. by chiph · · Score: 2, Informative

    As any cat owner will tell you, you need to clean the sandbox out periodically. In the case of a Wiki, overnight would probably be a good idea.

    Chip H.

  36. Grow up by scrytch · · Score: 4, Funny

    You know, googlebombing might have some better effect if you did it in reverse, e.g. SCO. Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO, come up with a different link for SCO so their Google results will be peppered with such commentary... People search for "SCO", not "litigious bastards".

    "Dumb fucker", "miserable failure", etc ... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.

    --
    I've finally had it: until slashdot gets article moderation, I am not coming back.
    1. Re:Grow up by maxwell+demon · · Score: 4, Insightful

      Well, why not link SCO to something the reader gets real value from? Some page where they can learn something about SCO? After all, since those pages indeed tell something about SCO and therefore contain the word SCO, it should even be more effective.

      --
      The Tao of math: The numbers you can count are not the real numbers.
  37. just like spam by SethJohnson · · Score: 2, Insightful


    Your suggestion is well-thought-out, but is plagued by two problems.

    1. The bombing bots won't give a rat's ass if you add this to robots.txt. Just like spammers, there's no cost for them to hit your site anyway, even if Google is instructed to ignore the links.

    2. Your site's google ranking is affected by the quality of the links you feature pointing at other sites. Your solution unbalances this whole matrix.
  38. Another solution besides robots.txt by wamatt · · Score: 3, Interesting

    Spammers are going there because you have a high PR. So cut the PR supply and you're in business: http://www.site.com/~url=http://www.link.com and voila - URL rewriting. No more PR for Mr. Spammer.
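
    A minimal sketch in Python of rewriting outbound links so they pass through a local redirect. The /redirect?url= prefix and the function name are made up for the example; the comment's own ~url= form would work the same way.

```python
import re
from urllib.parse import quote, urlsplit

# Matches href="http://..." or href="https://..." in already-rendered HTML.
HREF_PATTERN = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def rewrite_external_links(html, own_host, redirect_prefix="/redirect?url="):
    """Point every off-site link at a local redirect instead of the target."""

    def replace(match):
        target = match.group(1)
        if urlsplit(target).hostname == own_host:
            return match.group(0)  # links within the site stay direct
        # Percent-encode the whole target so it survives as a query value.
        return 'href="' + redirect_prefix + quote(target, safe="") + '"'

    return HREF_PATTERN.sub(replace, html)
```

    Whether this really withholds PR depends on how the search engine treats the redirect, which isn't settled here; keeping crawlers off the redirect path would be the cautious extra step.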

  39. Which is why I thought it was real time by swb · · Score: 3, Interesting

    I thought it was a real-time thing, where the account creation bots passed the image that loaded during the signup process to a porn site and the images were decoded by a real person, and the result passed back to the bot who then signed up for the account.

    To avoid the timing problems with porn signons needing to happen concurrent with account signups, the account generation process was actually initiated by a porn signon. It limits your account generation ability, but only to the extent that you have porn traffic.

    Did I just imagine this, or does it work that way?

    1. Re:Which is why I thought it was real time by allism · · Score: 3, Informative

      You didn't imagine it, but perhaps a clearer understanding of the technique can be achieved by reviewing the previous discussions. Here's a link to the Slashdot article that discussed this last January.