Webmasters Pounce On Wiki Sandboxes
Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."
Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
In the real world, there are neighborhood watch signs to "deter" criminals.
Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.
paintball
Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.
Life in Orange County
What happened to the nice internet we had in 1996?
I'm in the hole of the broadband donut.
This seems similar to the system all those porn sites used to get such a high rank in Google.
Kind of playing the system, with the content not being quite as desirable.
Evolution or ID?
...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
Well, couldn't have been that successful, for he didn't win.
Yes, it's a sandbox; no, it's not your personal playground.
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
Google does tweak their ranking system on a regular basis. When the problem becomes evident (and it looks like it just has), they do something about it... that's why they're Google.
Pretty widgets? What pretty widgets?
Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."
Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two - by using outside references (counting how often the site is linked), it assumes that a frequently linked site is more relevant, because other people have put links to it on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.
Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.
I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.
This sure makes sense, and who knows, maybe future wiki distributions do it by default. (If
would work universally...)
It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product and this is a problem that's been around for a long time.
But they're too busy making an email application with 9 frames and 200k of JavaScript to pay attention to the reason people use them in the first place. It's a little disappointing. I'm an AltaVista alumnus and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.
I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....
From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your Google rank in 30 days" are using blog spamming, and perhaps wiki spamming, as a way to increase their clients' ratings.
It's not about meta tags, or submitting anymore... it's spamming.
Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?
IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?
If they are going to use my blog... why can't I get a cut in that business?
This happened on the POPFile Wiki. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.
It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.
John.
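For readers wondering what an allowed list like the one described above could look like, here is a minimal sketch in Python. It is not the POPFile Wiki's actual code: the patterns, the `example.org` and `example.net` domains, and the function names are all made up for illustration, and the URL matcher is deliberately rough.

```python
import re

# Hypothetical allowed list: one regex per URL family the admin has approved.
ALLOWED_URL_PATTERNS = [
    re.compile(r"^https?://([a-z0-9-]+\.)*example\.org(/|$)", re.IGNORECASE),
    re.compile(r"^https?://wiki\.example\.net/", re.IGNORECASE),
]

# Rough matcher for bare URLs in page text; real wiki markup needs more care.
URL_RE = re.compile(r"https?://[^\s\]\)>\"']+", re.IGNORECASE)


def is_allowed(url: str) -> bool:
    """True if the URL matches at least one approved pattern."""
    return any(pattern.search(url) for pattern in ALLOWED_URL_PATTERNS)


def render_links(page_text: str) -> str:
    """Drop every URL that is not on the allowed list when displaying a page."""
    return URL_RE.sub(
        lambda m: m.group(0) if is_allowed(m.group(0)) else "",
        page_text,
    )
```

The stored page is left alone and only the displayed text is filtered, so a URL starts showing up as soon as its pattern is added to the list.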
This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.
Something that would make a nice open source project would be to include p2p search functionality in Apache itself.
This way all the modified web servers would make a giant distributed search engine.
Some nice algorithms like Koorde or Kademlia could be used.
Anyone thought about starting something like this?
David
When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.
The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.
Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.
I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.
Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.
But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.
The Arch Wiki has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. external link over a single space).
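A heuristic of the kind suggested above might be sketched like this in Python. It assumes the wiki renders external links as plain `<a href="...">label</a>` HTML, which may not match any particular wiki engine, and the function name is hypothetical. It flags external links whose visible label contains no letter or digit (a single space, a dot, and so on).

```python
import re

# Assumption: external links are rendered as <a href="http...">label</a>.
ANCHOR_RE = re.compile(
    r'<a\s[^>]*href="(https?://[^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def suspicious_links(html: str) -> list[str]:
    """Return external URLs whose visible label has no letter or digit."""
    flagged = []
    for url, label in ANCHOR_RE.findall(html):
        visible = re.sub(r"<[^>]+>", "", label)   # strip nested tags
        visible = visible.replace("&nbsp;", " ")  # treat &nbsp; as a space
        if not re.search(r"\w", visible):         # nothing readable to click on
            flagged.append(url)
    return flagged
```

Something like this could run over each new revision and push the flagged edits to the top of RecentChanges, rather than rejecting them outright.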
So, this is not new, but this is now news.
Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the Wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs. So I wouldn't worry.
Leave the links, edit the text to read something like "worthless scumbag, scamming git, googlebomb, please die, low quality, boring" - and lock the page.
Wait a minute - a way to spoof Google to get your page ranked better through WiKi? OMFG! Call the internet police, call Dr. Eric E. Schmidt, call out the Google Gorilla goons! I'm sure the good Dr. has a fix like the ones he used at Novell...
The problem with the whole Google model is that it's biased to begin with. If I'm looking for granny-smith apples, chances are an internet chimp has bought the space with bananas paid to Google's goons. It becomes obvious when you see a chimp site near the top that has no business at the top. To the experienced googler, it's just an annoying fly on the screen and you just move further down.
I'm hoping that Google doesn't get too bogged down in becoming that big Ape like Micro$oft and be a little more proactive in protecting their business property. It's bad enough that they're selling top space to companies willing to pay, but here's hoping they don't slip on their own banana peels.
Management is doing things right; leadership is doing the right things. - Peter F. Drucker
"Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"
I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...
Sincerely,
Pan Tarhei Hosé, PhD.
"Homo sum et cogito ergo odi profanum vulgus et libido."
I'm looking for a clean, fast, non-buggy alternative to the google giant. Preferably open source.
Any suggestions?
The only big one I know of right now is Nutch. It is an open source search engine that is in the later stages of development, but hasn't produced a large, usable site yet.
nutch.org
Since it will be open source, you will be able to read the ranking algorithms and change/abuse them as you see fit.
This one http://search.mnogo.ru/ is also available.
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'
Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.
Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.
It sounds crazy, I know, but I think we could do this.
Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
But if the problem is websites having areas where visitors (even unregistered ones) can post arbitrary text and links, then even Slashdot is a potential target (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content in one way or another, wiki or not.
Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?
I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.
The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.
Secession is the right of all sentient beings.
Edit robots.txt to let search engines know they should ignore sandbox pages.
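As a rough illustration of the robots.txt suggestion above, the sketch below uses Python's standard `urllib.robotparser` to check what a well-behaved crawler would do with a rule that excludes the sandbox. The `/wiki/Sandbox` path, the `wiki.example.org` host, and the `ExampleBot` user agent are assumptions; your wiki's sandbox URL will differ, and crawlers that ignore robots.txt are not affected at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki whose sandbox lives under /wiki/Sandbox.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Sandbox
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(url: str, user_agent: str = "ExampleBot") -> bool:
    """True if a crawler honoring robots.txt may fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

With this rule, `crawlable("http://wiki.example.org/wiki/Sandbox")` is false while ordinary pages stay fetchable. Note that the match is a path prefix, so `/wiki/Sandbox2` would be excluded as well.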
you know what they say about another man's garbage
What about using random image-based spam control like the one Yahoo uses on its new mail signup?
So, every time you edit/post comment, you would be presented with an image with a random distorted text, which you will have to type in to be able to edit/post. That should take care of automated systems.
No, 9/11 was pure evil
Overuse of absolutes can lead to their deterioration. As an American I couldn't feel more turgid: now when the Europeans get ready to yell HITLER!!!! in IRC, I can just pre-emptively yell 9/11!!!!!!! and lose/end the conversation.
To be fair, the difference between these 'blog abusing 'minor annoyances' and the large scale deaths/destruction of 9/11 can be seen as just a matter of scale. To some people I know, the economic impact of terrorism keeps them awake at night: the value of human life be damned, watch that bottom line! (Not the most civically minded people, IMHO.)
Being respected members of polite business society, these people and their defective outlook are just as dangerous to you and me as the wiki 'blog abusers and 9/11 baby killers. To them, you are either a customer, employee or garbage to be taken out by security.
This, by the way, is how we treat anybody who we have successfully alienated. Look at these 'blog spammers. Would anyone have cried if Al Qaeda had blown up a spammer's house?
Both sides of this argument stand at the top of a moral mountain with a very slippery slope and are trying to make the other fall off as far and as fast as possible. I'm waiting to see who tumbles first.
Like they say on bash.org: I will become rich and famous when I invent a device to punch people in the face through the Internet.
"You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
I know you're being sarcastic, but one way to prevent forged IP addresses is to require the user to "preview" their comment before posting.
The Robots Exclusion Protocol (i.e. robots.txt).
Here's Google's stance on the subject (boils down to: if you don't want it indexed, put in a damn robots.txt file)
Hell, even Google News uses robots.txt
With regards to just editing the sandbox which nobody monitors anyway, why not just include a rule to deny adding URLs? There is no conceivable reason to allow a user to add a URL in the sandbox.
And if you're thinking "I want to practise adding links with the required syntax", it's not hard. The only thing you need to use the sandbox for beyond learning how other basic syntax works (and you can apply that to links without practising) is structuring.
As any cat owner will tell you, you need to clean the sandbox out periodically. In the case of a Wiki, overnight would probably be a good idea.
Chip H.
You know, googlebombing might have some better effect if you did it in reverse, e.g. SCO. Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO come up with a different link for SCO so their Google results will be peppered with such commentary after... People search for "SCO", not "litigious bastards".
... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.
"Dumb fucker", "miserable failure", etc
I've finally had it: until slashdot gets article moderation, I am not coming back.
$5 / month hosted VPS on linux = awesome!
Spammers are going there because you have a high PR. So cut the PR supply and you're in business: http://www.site.com/~url=http://www.link.com and voila - URL rewriting. No more PR for mr spammer.
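The URL rewriting idea above could be sketched roughly as follows. This is only an illustration: the `/redirect?url=` endpoint is made up (the comment's own example uses a different URL shape), the function name is hypothetical, and whether a search engine actually withholds rank from links routed through such a redirect is the commenter's assumption, not something the code can guarantee.

```python
import re
from urllib.parse import quote

# Hypothetical redirect endpoint on the wiki's own host.
SITE_PREFIX = "http://www.site.com/"
REDIRECT_PREFIX = SITE_PREFIX + "redirect?url="

# Rough matcher for bare URLs in user-submitted text.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def rewrite_outbound(text: str) -> str:
    """Route every off-site URL through the site's own redirect endpoint."""
    def repl(match: re.Match) -> str:
        url = match.group(0)
        if url.startswith(SITE_PREFIX):
            return url  # internal links (and already rewritten ones) stay as-is
        return REDIRECT_PREFIX + quote(url, safe="")
    return URL_RE.sub(repl, text)
```

For example, `rewrite_outbound("visit http://www.link.com now")` gives `visit http://www.site.com/redirect?url=http%3A%2F%2Fwww.link.com now`. The redirect handler itself, and keeping crawlers away from it, are left out of the sketch.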
I thought it was a real-time thing, where the account creation bots passed the image that loaded during the signup process to a porn site and the images were decoded by a real person, and the result passed back to the bot who then signed up for the account.
To avoid the timing problems with porn signons needing to happen concurrent with account signups, the account generation process was actually initiated by a porn signon. It limits your account generation ability, but only to the extent that you have porn traffic.
Did I just imagine this, or does it work that way?