Webmasters Pounce On Wiki Sandboxes
Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."
Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
What happened to the nice internet we had in 1996?
I'm in the hole of the broadband donut.
...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
Yes, it's a sandbox; no, it's not your personal playground.
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."
Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two: by using outside references (counting how often the site is linked) it assumes that the site is more relevant, because other people have put links to it on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.
Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.
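The link-counting idea in the comment above can be sketched roughly as follows. This is only an illustration of counting inbound references, not Google's actual algorithm; the function name `inbound_link_counts` and all `.example` URLs are made up for the example. It also shows why sandbox links pay off for a spammer: each wiki that hosts the link counts as another "vote".

```python
from collections import Counter
from urllib.parse import urlparse


def inbound_link_counts(links):
    """Count, for each target host, how many distinct source hosts link to it.

    `links` is an iterable of (source_url, target_url) pairs. Links within
    one host are ignored, since a site vouching for itself says nothing.
    """
    seen = set()
    for source, target in links:
        src_host = urlparse(source).netloc
        dst_host = urlparse(target).netloc
        if src_host and dst_host and src_host != dst_host:
            seen.add((src_host, dst_host))
    return Counter(dst for _, dst in seen)


# Hypothetical crawl data: two editorial links, then three sandbox postings.
links = [
    ("http://blog.example/post1", "http://useful.example/"),
    ("http://news.example/story", "http://useful.example/"),
    ("http://wiki-a.example/SandBox", "http://spam.example/"),
    ("http://wiki-b.example/SandBox", "http://spam.example/"),
    ("http://wiki-c.example/SandBox", "http://spam.example/"),
]
counts = inbound_link_counts(links)
```

With this toy data the spammed site ends up with more inbound references than the genuinely linked one, which is the whole point of the trick.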
I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.
This sure makes sense, and who knows, maybe future wiki distributions do it by default. (If
would work universally...)
I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....
From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your Google rank in 30 days" are using blog spamming, and perhaps wiki spamming, as a way to increase their clients' ratings.
It's not about meta tags, or submitting anymore... it's spamming.
Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?
IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?
If they are going to use my blog... why can't I get a cut in that business?
Why not put the sandbox in its own folder and add an entry to robots.txt telling crawlers not to index that folder?
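A quick way to check that such a rule does what you expect is Python's standard `urllib.robotparser`. This is a minimal sketch: the `/sandbox/` path, the `wiki.example` host and the `ExampleBot` user agent are assumptions made up for illustration, so adjust them to wherever your wiki actually keeps its sandbox.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki whose sandbox lives under /sandbox/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sandbox/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler should skip the sandbox but still fetch real pages.
sandbox_allowed = parser.can_fetch("ExampleBot", "http://wiki.example/sandbox/SandBox")
article_allowed = parser.can_fetch("ExampleBot", "http://wiki.example/wiki/FrontPage")
```

Note that this only affects well-behaved crawlers. It removes the ranking payoff for links in the sandbox, but it does nothing to stop a bot from posting there.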
Something that would make a nice open-source project would be to include p2p search functionality in Apache itself.
This way all the modified web servers would make a giant distributed search engine.
Some nice algorithms like koorde or kademlia could be used.
Anyone thought about starting something like this?
David
But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.
The Arch Wiki has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. external link over a single space).
So, this is not new, but this is now news.
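The "external link over a single space or dot" heuristic mentioned above might look something like this. It is a rough sketch that assumes you can run the check on the rendered HTML of an edit; the class name `HiddenLinkFinder` and the `.example` URLs are invented for illustration, and a real wiki would more likely hook into its own markup parser.

```python
from html.parser import HTMLParser


class HiddenLinkFinder(HTMLParser):
    """Collect external links whose visible text is empty, whitespace,
    or a single non-alphanumeric character such as a dot."""

    def __init__(self):
        super().__init__()
        self.suspicious = []   # hrefs of links a reader would likely miss
        self._href = None      # href of the external <a> currently open
        self._text = []        # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            visible = "".join(self._text).strip()
            if len(visible) <= 1 and not visible.isalnum():
                self.suspicious.append(self._href)
            self._href = None


finder = HiddenLinkFinder()
finder.feed(
    'See <a href="http://arch.example/docs">the docs</a>.'
    '<a href="http://spam.example/"> </a>'
    '<a href="http://spam2.example/">.</a>'
)
```

Here `finder.suspicious` ends up holding only the two spam links. An edit that trips the check could be held for review instead of rejected outright, since the heuristic will occasionally catch honest edits too.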
But if the problem is websites having areas where visitors (even unregistered ones) can post arbitrary text and links, then even Slashdot is a potential target of the same (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content one way or another, wiki or not.
There was a story about defeating this system on /. a while back.
Rather than using OCR or anything, people would merely harvest a load of images from a signup site, which is possible when there is only a finite number of images, or when there is a consistent naming policy.
Then once the images were collected they would merely set up an online porn site, asking people to join for free, proving they were human by decoding the very images they had downloaded.
Human lust for porn meant that they could decode a large number of these images in a very short space of time, then return and mount a dictionary attack...
Quite clever really, sidestepping all the tricky obfuscation/OCR problems by tricking humans into doing their work for them ..
Edit robots.txt to let search engines know they should ignore sandbox pages.
This fails to address the real issue.
That is, even if you make your links useless (easy with a no-follow meta tag), it won't help: the majority of this spam is AUTOMATED, and will spam your wiki/blog/guestbook based on simple page cues.
Your best personal defense is to manually remove any page or HTML cues that a spammer would pick up on as being common to a certain type of postable web page or element.
Bloggers have been creating blacklists (banning both poster IPs and destination URLs) with some degree of success. This is a deterrent: once a spammer shows up on a blacklist, webmasters can use the distributed file to 'clean' their blogs automatically.
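For what it's worth, the blacklist check described above can be sketched in a few lines. This is an assumption-laden illustration, not how any particular blog package does it: the function name `is_blacklisted`, the sample IPs and the `.example` domains are all hypothetical, and the shared blacklist file is assumed to have already been loaded into two sets.

```python
import re

# Pull the host part out of any http(s) URL found in a comment body.
URL_HOST_RE = re.compile(r"https?://([^/\s\"'<>]+)", re.IGNORECASE)


def is_blacklisted(poster_ip, body, banned_ips, banned_domains):
    """Return True if the poster IP or any linked domain is on the blacklist."""
    if poster_ip in banned_ips:
        return True
    for host in URL_HOST_RE.findall(body):
        host = host.lower().split(":")[0]  # drop any :port suffix
        # Match the banned domain itself or any of its subdomains.
        if any(host == d or host.endswith("." + d) for d in banned_domains):
            return True
    return False


# Hypothetical entries, as they might come from a distributed blacklist file.
banned_ips = {"192.0.2.15"}
banned_domains = {"pills.example"}

spam_hit = is_blacklisted(
    "198.51.100.7", "Great post! http://cheap.pills.example/buy", banned_ips, banned_domains
)
clean_hit = is_blacklisted(
    "198.51.100.7", "See http://wiki.example/FrontPage", banned_ips, banned_domains
)
```

The same function can be run over existing comments to 'clean' a blog after the blacklist is updated, not just on new posts.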
No, 9/11 was pure evil
Overuse of absolutes can lead to their deterioration. As an American I couldn't feel more turgid: now when the Europeans get ready to yell HITLER!!!! in IRC, I can just pre-emptively yell 9/11!!!!!!! and lose/end the conversation.
To be fair, the difference between these 'blog-abusing 'minor annoyances' and the large-scale deaths/destruction of 9/11 can be seen as just a matter of scale. To some people I know, the economic impact of terrorism keeps them awake at night: the value of human life be damned, watch that bottom line! (Not the most civic-minded people, IMHO.)
Being respected members of polite business society, these people and their defective outlook are just as dangerous to you and me as the wiki and 'blog abusers and 9/11 baby killers. To them, you are either a customer, employee or garbage to be taken out by security.
This, by the way, is how we treat anybody who we have successfully alienated. Look at these 'blog spammers. Would anyone have cried if Al Qaeda had blown up a spammer's house?
Both sides of this argument stand at the top of a moral mountain with a very slippery slope and are trying to make the other fall off as far and as fast as possible. I'm waiting to see who tumbles first.
Like they say on bash.org: I will become rich and famous when I invent a device to punch people in the face through the Internet.
"You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
> Isn't it time for Google finally to put some work into refining their results...
Isn't it time to also reconsider the Wiki paradigm? More sites (like this) are requiring logins. "Golden Prose" indeed! IMHO, Wikis are evolving into crude Content Management Systems.
Hear, hear. Systems (software or otherwise) that offer something of monetary value for free, and provide no mechanism whatsoever to prevent people from exploiting them, are going to get exploited. Shocking!
Maybe it wasn't obvious to blog and wiki programmers that the ability to post a comment or edit a wiki page was worth money. It isn't worth a lot per post, but because these are online systems, they are very susceptible to bots that can post in huge volume. All of those posts together can alter a site's placement in Google search results, and that's definitely worth money.
Instead of whining about Google being influenced by attacks that use your Wiki or blog, how about making it hard for bots to post in the first place? Is that really an important feature that you can't live without?
I've always wondered why the image is always distorted text that's hard to read on a speckled background?
Why not just show the picture of an object, like an apple or something, and ask the user to type in what it is? I mean, you could have a few hundred of these and it would be nearly impossible for an automated system to guess. (You have a few hundred different items, and like 5-10 images of each item.) I dunno, seems easier to me, but I don't write web software.
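The "name the object" idea above could be sketched like this. Everything here is hypothetical: the catalogue, the file names and the function names are invented for illustration, and serving the image and remembering the expected answer between requests is left out.

```python
import random

# Hypothetical catalogue: object name -> image files showing that object.
CATALOGUE = {
    "apple": ["apple1.jpg", "apple2.jpg", "apple3.jpg"],
    "bicycle": ["bike1.jpg", "bike2.jpg"],
    "teapot": ["teapot1.jpg", "teapot2.jpg"],
}


def new_challenge(catalogue, rng=random):
    """Pick a random object and one of its images; return (image, answer)."""
    answer = rng.choice(sorted(catalogue))
    image = rng.choice(catalogue[answer])
    return image, answer


def check_answer(expected, typed):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return typed.strip().lower() == expected.lower()


image, answer = new_challenge(CATALOGUE)
```

One caveat already raised elsewhere in this thread: with only a finite set of images, an attacker can harvest them all and have humans label them once, so a scheme like this is only as strong as the catalogue is large and fresh.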
Comment of the year
$5 / month hosted VPS on linux = awesome!
I think the real problem is that spammers aren't likely to look at how you've configured spiders to handle your site. So even if you do this, I'm sure it won't get rid of the spammers.
Well, why not link SCO to something the reader gets real value from? Some page where they can learn something about SCO? After all, since those pages indeed tell something about SCO and therefore contain the word SCO, it should even be more effective.
The Tao of math: The numbers you can count are not the real numbers.
Also, we really need to replace the kludgy robots.txt files and robots meta tags with headers built into the HTTP protocol.
Like it is, it's hell to try to get decent robotic behaviour out of anything other than HTML pages.
You forgot the most important SCO link.