Webmasters Pounce On Wiki Sandboxes
Yacoubean writes "Wiki sandboxes are normally used to learn the syntax of wiki posts. But
webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank. One such webmaster recently demonstrated this successfully. Isn't it time for Google finally to put some work into refining their results to exclude tricks like this? I know all the bloggers and wiki maintainers would sure appreciate it."
Why not normal discussion boards and blogs? We, for one, saw how the SCO joke (litigious b'turds) managed to GoogleBomb SCO in first place without a problem.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
In the real world, there are neighborhood watch signs to "deter" criminals.
Perhaps there could be a command in the robots.txt file which says "Browse my site, but don't count any links here for page ranking"? That would make your site less of a target for spammers, but not prevent you from being ranked at all.
paintball
Google and others will just lower/diminish the value of links from Wiki pages, just like they did to those open "Guest Book" pages on personal sites.
Life in Orange County
What happened to the nice internet we had in 1996?
I'm in the hole of the broadband donut.
This seems similar to the system all those porn sites used to get such a high rank in Google.
Kind of playing the system, with the content not being quite as desirable.
Evolution or ID?
Just a shame that Google is one of the few search engines that are any good.
I always use All the Web when looking for any company or organisation I know the name of, but for more general queries I'm looking for a clean, fast, non-buggy alternative to the google giant. Preferably open source.
Any suggestions?
...what Google needs? A "Was this result helpful in your search?" button for each link returned, so that the search itself also influences page ranks. Maybe that will help get rid of this Google bombing mess.
+1 Insightful, -1 Troll. What can I say, I'm an Insightful Troll.
Well, couldn't have been that successful, for he didn't win.
Yes, it's a sandbox; no, it's not your personal playground.
"Because Science" is one step from "Because old book". Try "Because of my experiment testing my falsifiable assertion".
Google does tweak their ranking system on a regular basis. When the problem becomes evident, (and it looks like it just has) they do something about it...that's why they're google.
Pretty widgets? What pretty widgets?
Google's algorithm isn't the problem. The problem is the availability of easily abused areas such as these "sandboxes."
Some search engines accept any old site. Others accept sites based on human approval and categorization. Google is a nice combination of the two - by using outside references (counting how often the site is linked) it assumes that the site is more relevant. Because other people have put links on their sites. That's a human factor, without directly using human beings to review and categorize the sites and rankings.
Sure it can be abused, but it's not Google's fault; perhaps these areas of abuse (blogs, wikis, etc.) should address the problems from their end.
As a sidenote, I think that with recent Wiki abuse, the issue of open wikis will become a similar one to open proxies and mail relays.
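To make the link-counting idea above concrete, here is a minimal sketch of a simplified PageRank-style power iteration in Python. It is not Google's actual algorithm, which uses many more signals; the toy web, the page names and the damping value of 0.85 are all assumptions chosen for illustration. It shows how one backlink dropped into an open sandbox raises the linked site's score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Every page
    must appear as a key and have at least one outgoing link (this sketch
    does not handle dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            # A page splits its damped score evenly among its outgoing links.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three ordinary pages, a spammer's site, and a sandbox.
web = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a"],
    "spammer": ["a"],
    "sandbox": ["a"],
}
before = pagerank(web)

# The spammer drops a backlink into the open sandbox page.
web_spammed = dict(web, sandbox=["a", "spammer"])
after = pagerank(web_spammed)

print(f"spammer rank before: {before['spammer']:.4f}, after: {after['spammer']:.4f}")
```

The gain from a single low-ranked sandbox is small, which is presumably why spammers post to as many open pages as they can find.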
I decided to stop posting backlinks in Wiki sandboxes, the SEO strategy previously explained. [...] In the meantime I'm asking developers and those hosting Wikis of their own to please exclude sandboxes from search engine results (via the robots.txt file). Doing so would shield the sandbox from backlink-postings, and there is no need for it to turn up in search results in the first place.
This sure makes sense, and who knows, maybe future wiki distributions do it by default. (If [...] would work universally...)
It was time to do that at least a year ago. It's pretty much impossible to find good information on any popular consumer product, and this is a problem that's been around for a long time.
But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing, I'm an AltaVista alumni and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.
I've noticed that my blog's getting lots of spam from sites that don't seem like typical spam sites....
From what I can see, it looks like those "search ranking professionals" who "guarantee to raise your google rank in 30 days" are using blog spamming, and perhaps wiki spamming, as a way to increase their clients' ratings.
It's not about meta tags, or submitting anymore... it's spamming.
Perhaps it's time for people to finally be wary of these services. After all, can a third party really guarantee a position in another company's search index?
IMHO those services are pure evil. They either do nothing, or they do something to increase page rank... what is that "something"? How many options do they have?
If they are going to use my blog... why can't I get a cut in that business?
This happened on the POPFile Wiki. Eventually I solved it by changing the code of the Wiki itself to have an allowed list of URLs (actually a set of regexps). If someone adds a page which uses a new URL that isn't covered, it won't show up when the page is displayed, and the user has to email me to get that specific URL added.
It's a bit of an administrative burden, but stopped people messing up our Wiki with irrelevant links to some site in China.
John.
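For anyone wanting to try the allowed-list approach described above, here is a minimal Python sketch. It is not the POPFile Wiki's actual code, which the post does not show; the patterns, the `example.org` entry and the function names are hypothetical.

```python
import re

# Hypothetical allowlist; the real POPFile patterns are not given in the post.
ALLOWED_URL_PATTERNS = [
    re.compile(r"^https?://([a-z0-9-]+\.)*sourceforge\.net(/|$)", re.IGNORECASE),
    re.compile(r"^https?://([a-z0-9-]+\.)*example\.org(/|$)", re.IGNORECASE),
]

# Rough URL matcher: anything from the scheme up to whitespace, quotes or brackets.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def is_allowed(url):
    """True if the URL matches at least one allowlist pattern."""
    return any(pattern.match(url) for pattern in ALLOWED_URL_PATTERNS)


def strip_unapproved_urls(text):
    """Return page text with every URL not on the allowlist removed."""
    return URL_RE.sub(
        lambda m: m.group(0) if is_allowed(m.group(0)) else "", text
    )


print(strip_unapproved_urls(
    "see http://popfile.sourceforge.net/manual and http://spam.example.cn/buy"
))
```

Filtering at display time, as in the post, means the spam edit is still stored but earns the spammer nothing; anchoring each pattern at the host name keeps look-alike hosts such as `sourceforge.net.evil.com` from slipping through.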
I have got the impression that this could work with E2 as well as probably most bbcode powered fora.
Trolling using another account since 2005.
This may become a big problem for sites like this. The only solution might be one of those annoying "write down the letters in this generated gif" humanity tests.
Something that would make a nice opensource project would be to include p2p search functionality in apache itself.
This way all the modified web servers would make a giant distributed search engine.
Some nice algorithms like koorde or kademlia could be used.
Anyone thought about starting something like this?
David
When I do search in the first category, especially for things such as wallpaper, or simpsons audio clips, the sites that usually turn up are the least coherent ones with dozens of ads. I usually have to dig four or five pages to find a relevant one.
The people with these sites are playing hardball. Google wants them on their side, though, because they often display Google text ads.
Right now, my domain of choice is owned by a squatter that says "here are the results for your search" with a bunch of Google text ads. I was going to/may still put a site there that is very interesting, and the name was a key part of it.
I firmly believe that advertisements are the plague of the Internet. I would like to see sites selling their own products to fund themselves. Google doesn't really help in this regard. The text ads are less annoying than banner ads, but only slightly less annoying.
Don't get me wrong, I like Google. It's an invaluable tool when I'm doing research. I would just like to see them come out in full force against squatters.
But webmasters may soon deluge these handy tools with links back to their site, not to get clicks, but to increase Google page rank.
The Arch Wiki has suffered several times from such vandals in the past few months. I'm sure other wikis have, too. They create links over single spaces or dots, so that casual readers don't notice them. Attentively watching the RecentChanges page is the most effective way to find and fight them, but this is tiresome. I guess many wikis will require posters to be authenticated soon, which is a blow to the wiki ideal, but not such a major blow. Alternatively, maybe someone will develop heuristics to fight the most common abuses (e.g. external link over a single space).
So, this is not new, but this is now news.
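The heuristic suggested above (an external link over a single space or dot) could be sketched like this in Python. It assumes the check runs on the rendered HTML of a page revision; the class and function names are made up for illustration, and this is only one possible rule.

```python
from html.parser import HTMLParser


class HiddenLinkFinder(HTMLParser):
    """Collect external links whose visible text has no letters or digits."""

    def __init__(self):
        super().__init__()
        self.suspicious = []
        self._href = None   # href of the external <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            visible = "".join(self._text)
            # No letter or digit in the anchor text: a reader has nothing to notice.
            if not any(ch.isalnum() for ch in visible):
                self.suspicious.append(self._href)
            self._href = None


def find_hidden_links(html):
    """Return the hrefs of external links hidden behind blank or dot-only text."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious
```

A wiki could run `find_hidden_links` on each edit and hold flagged revisions for review. Image-only links would also be flagged by this sketch, so it is a filter for human attention rather than an automatic ban.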
Recently the Chinese Wikipedia suffered a spam attack, with a distributed network of bots editing articles to add links to some Chinese internet marketing site. In response, the latest version of MediaWiki (the software that runs the Wikipedias and sister projects) has a feature to block edits matching a regex (so you can prevent links to a specific domain). Wikis generally have more protection against spamming than weblogs. So I wouldn't worry.
Leave the links, edit the text to read something like "worthless scumbag, scamming git, googlebomb, please die, low quality, boring" - and lock the page.
Wait a minute - a way to spoof Google to get your page ranked better through WiKi? OMFG! Call the internet police, call Dr. Eric E. Schmidt, call out the Google Gorilla goons! I'm sure the good Dr. has a fix like the ones he used at Novell...
The problem with the whole Google model is that it's biased to begin with. If I'm looking for granny-smith apples, chances are an internet chimp has bought the space with bananas for Google's goons. It becomes obvious when you see a chimp site near the top that has no business at the top. To the experienced googler, it's just an annoying fly on the screen and you just move further down.
I'm hoping that Google doesn't get too bogged down in becoming that big Ape like Micro$oft and be a little more proactive in protecting their business property. It's bad enough that they're selling top space to companies willing to pay, but here's hoping they don't slip on their own banana peels.
Management is doing things right; leadership is doing the right things. - Peter F. Drucker
"Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?"
I agree. I hope Google will finally put some work into refining their search results. I mean, they are probably the worst search engine ever! Now, Yahoo, MSN, Overture, Altavista... Those are much better. But Google?! Please...
Sincerely,
Pan Tarhei Hosé, PhD.
"Homo sum et cogito ergo odi profanum vulgus et libido."
yeah but the winner's wiki spamming is all over also:
http://www.google.com/search?hl=en&lr=&ie=UTF-8&c2coff=1&edition=us&q=merkey.net+wiki&btnG=Search
Because IP addresses can't be forged. Evar!
Yeah, right.
Compare this to the ROI for music search, non-web search etc. and it's pretty clear Google's R&D is better directed to new products. Get used to it folks, when they go public there will be a huge expectation of new products on a regular basis. Web search will get tuned when ad keyword revenue dictates it.
A possible solution I've been toying with...
1. Servers provide a meta-tag for certain pages which search engines interpret as reducing/eliminating that specific page's search weight.
2. Scripts which allow user-created content (wikis?, guestbooks, weblog comment forms, forums, and so on) can be updated by the content-provider to include this meta-tag.
3. To encourage spammers to check this tag and move on elsewhere if it's implemented, these same scripts should enforce a longish (5 second?) delay for all user-initiated content changes.
[and seeing this is slashdot]
4. ???
5. Profit!
Coding cookie preserving http connection takes about 20 lines of java code. You better think about something better.
'You know what Google needs? A "Was this result helpful in your search?" button for each link returned'
Yes! Genius! That's it! Google needs some kind of system of rating results to modify future results returned--a system of 'mods' if you will.
Of course some people will 'mod' stuff down just because they don't like the viewpoint expressed, or they're in a perennial bad mood because their favorite operating system is dead, so we'll need to have a system of allowing people to rate the moderations--'meta-mod' if I may be so bold.
It sounds crazy, I know, but I think we could do this.
We looked into something a lot like what you suggest (and actually have it up and running inside our intranet with 2k or so users). The problem with doing this on the internet is that p2p techniques are MUCH more susceptible to spamming than centralized techniques in general (because, for one, p2p reputation systems are very difficult to get right). Another problem is that most existing p2p search methods work great for finding popular content but not very well for finding that very specific piece of information that maybe only you are looking for at the current moment. Kademlia/Chord are DHTs and do not solve the text search problem on their own. While some p2p networks have adapted DHTs for keyword searching, the results still leave a lot to be desired (IMO).
This is a really sad day for news. Get over it and quit crying.
I expect that Google will in time give drastically lower weight to easily-modified pages like "blogs" and "wikis". They're not that hard to recognize.
Most BB boards (including phpBB, upgrade!) and blogs (including Slashdot) now feature the visual security code for sign-up. But, of course, this does not prevent hand entry of spam...
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
Landmines that USA sells to poor nations is evil
and 3000 people die a month on american roads but i dont see people burning down GMC or Ford
The web was designed around the concept of trust and this simply does not work anymore. The only way to fix the internet is to eliminate all form of anonymity and temper this with strong legal protection of private information. Until this happens you will always have to deal with spam, viruses, hackers and the like. Once people can be held accountable for their actions online then, and only then, will the internet work as it was intended.
But if the problem is that websites have areas where visitors (even unregistered ones) can post arbitrary text and links, then even Slashdot is a potential target of the same thing (maybe there should be a "Spam" mod score?), as is any site where unregistered visitors can store content in one way or another, wiki or not.
Isn't it time for Google finally to put some work into refining their results to exclude tricks like this?
I take extreme issue with that statement, and I'm surprised no one else has challenged it. Google does in fact put quite a bit of work into making themselves less vulnerable to these kinds of stunts. They even have a link on every results page where you can tell them if you got results you didn't expect, so they can hunt down the cause and refine their algorithm.
The system will never be perfect, and this is the latest issue that has not (yet) been dealt with. Quit your griping.
Secession is the right of all sentient beings.
simply make a distinction between "I am looking to buy something" searches vs "I am looking for information about something".
They are clearly different kinds of searches, and I do both of them, yet I get the same results for both. The exception is Froogle, which is definitely a step in the right direction, but not quite there.
Although the interface has gotten a little better on altavista (remember them??), but searches like: for used condoms do not make sense for retail stores at all. I'm sorry guys, there isn't a market for used condoms, but if there were I'm sure someone would be more than willing to supply the demand.
The Google search for used condoms is a little better, but the advertising links on the right hand side do have:
Used Anything -Dirt Cheap
at Gov't & Police Auctions Near You
Seized, Surplus Property. Hot Deals
www.GovernmentAuctions.org
And please do not take a tangent on "used condoms"; it's just a sick, memorable example.
Edit robots.txt to let search engines know they should ignore sandbox pages.
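A minimal sketch of what that looks like, checked with Python's standard `urllib.robotparser`. The `/wiki/SandBox` path and the `wiki.example.org` host are assumptions; real wikis name and locate their sandbox differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a wiki whose sandbox lives at /wiki/SandBox.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/SandBox
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="Googlebot"):
    """True if a crawler honouring this robots.txt may fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_crawl("http://wiki.example.org/wiki/SandBox"))
print(may_crawl("http://wiki.example.org/wiki/FrontPage"))
```

This only affects crawlers that honour robots.txt. It does not stop anyone from posting to the sandbox; it just removes the payoff for doing so.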
you know what they say about another man's garbage
That's a very interesting article.
Sig
--
KEY PHRASE <A HREF=www.my_website.com> KEYWORD KEYWORD KEYWORD <\A>
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
Right, preserving the integrity of the search results in the case of malicious users injecting false information into the system sounds like a challenging problem.
But a p2p search engine seems to be the only way to go for an open-source search engine implementation.
David
What about using random image based spam control like the one Yahoo uses on its new mail signup?
So, every time you edit/post comment, you would be presented with an image with a random distorted text, which you will have to type in to be able to edit/post. That should take care of automated systems.
Mark the sandbox pages as non-indexable, non-followable with either meta tags or robots.txt.
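For the meta tag route, a wiki's page template could emit the standard robots meta tag on openly editable pages only. The sketch below is a hypothetical renderer in Python, not any particular wiki engine's code; `OPEN_PAGES` and `render_page` are made-up names.

```python
from html import escape

# Hypothetical set of page names that anyone may edit without review.
OPEN_PAGES = {"SandBox", "GuestBook"}


def render_page(title, body_html):
    """Wrap already-rendered body HTML in a page, marking open pages for robots."""
    robots_meta = ""
    if title in OPEN_PAGES:
        # Ask compliant crawlers not to index the page or follow its links.
        robots_meta = '<meta name="robots" content="noindex,nofollow">\n'
    return (
        "<html><head>\n"
        f"<title>{escape(title)}</title>\n"
        f"{robots_meta}"
        "</head><body>\n"
        f"{body_html}\n"
        "</body></html>"
    )


print(render_page("SandBox", "<p>Practise here.</p>"))
```

Unlike a robots.txt entry, this travels with the page, so it keeps working if the sandbox is renamed or moved, as long as the page set is kept up to date.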
I guess the spamming problem might be solved if each web-server is crawled by its own built-in search agent and a set of independent search agents that also belong to the p2p network.
This way, even if the webserver advertises false information, this information isn't taken into account if it isn't verified by information coming from independent agents of the network.
David
No, 9/11 was pure evil
Overuse of absolutes can lead to their deterioration. As an American I couldn't feel more turgid: now when the Europeans get ready to yell HITLER!!!! in IRC, I can just pre-emptively yell 9/11!!!!!!! and lose/end the conversation.
To be fair, the difference between these 'blog abusing 'minor annoyances' and the large scale deaths/destruction of 9/11 can be seen as just a matter of scale. To some people I know, the economic impact of terrorism keeps them awake at night: the value of human life be damned, watch that bottom line! (Not the most civically minded people, IMHO.)
Being respected members of polite business society, these people and their defective outlook are just as dangerous to you and me as the wiki 'blog abusers and 9/11 baby killers. To them, you are either a customer, employee or garbage to be taken out by security.
This, by the way, is how we treat anybody who we have successfully alienated. Look at these 'blog spammers. Would anyone have cried if Al Qaeda had blown up a spammer's house?
Both sides of this argument stand at the top of a moral mountain with a very slippery slope and are trying to make the other fall off as far and as fast as possible. I'm waiting to see who tumbles first.
Like they say on bash.org: I will become rich and famous when I invent a device to punch people in the face through the Internet.
"You cannot have a General Will unless you have shared experiences. You cannot be fair to people you don't know."
What happened to the nice Internet we had in 1996?
They are playing hardball. We can strike back by building web browsers that have ad blocking enabled by default. Maybe we can drive a few of them out of business.
I deal with this day in and day out on infoanarchy.org/wiki. The administrator there really has no conception of how to block people who are constantly posting spam and I am trying to find a method to automatically revert pages back when they are changed.
The problem is - allowing good changes and not ending up in a wrestling match with the spammer (and, yes, a Google contest - to me - is the same as spam).
I realize this is all part of the constant struggle between spam to be more effective and sneaky and services like Google to only reference relevant results.
That said, don't introduce yourself to me as someone who does that or you might get pushed down a long flight of stairs. Whoops!
You meant webserfs for webmasters, didn't you?
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin
This whole problem would be moot if the commenting paradigm weren't so ass-backwards. Slashdot and other comment sites shouldn't host the comments, they should link to them. This would have the added benefit of posters getting to keep all their comments in one place.
The sandbox is the only part of wikis that anybody can post to, right? No, so how about we toss all the "solutions" regarding protecting/limiting the sandbox out. If it needs a solution, it should be a more generalized one.
I believe some of the blog software was using a google redirect mechanism to prevent links from polluting page ranking. Not sure how well it worked, but perhaps something like that would be useful here.
Yet another simple countermeasure would be to empty the sandbox if it hasn't been touched in one hour.
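A minimal sketch of that countermeasure in Python, assuming a wiki that stores each page as a plain file (many do not, in which case the same check would go against the page's last-modified timestamp in the database). The path, the template text and the function name are hypothetical.

```python
import os
import time

SANDBOX_TEMPLATE = "Feel free to practise wiki syntax here.\n"


def reset_if_idle(path, max_idle_seconds=3600, now=None):
    """Overwrite the sandbox file with the template if it has been idle too long.

    Returns True when the file was reset. `path` is a hypothetical location
    for a wiki that stores each page as a plain file.
    """
    now = time.time() if now is None else now
    if now - os.path.getmtime(path) < max_idle_seconds:
        return False  # Someone touched it recently; leave their practice edits alone.
    with open(path, "w", encoding="utf-8") as page:
        page.write(SANDBOX_TEMPLATE)
    return True
```

Run from cron every few minutes, something like `reset_if_idle("/var/wiki/pages/SandBox.txt")` would clear stale spam without interrupting someone who is actively practising.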
Find free books.
Wikis have always been useful for self-promotion in less obscene ways. If you're knowledgeable in a field, and there's a wiki for it, then some tasteful posting and linkage is good for getting your name around. Ditto for Usenet and web-based discussion forums.
And wikis have always been abusable--by design, really--by people with agendas. Hate Java or Python or Emacs or Perl or Windows? Then go to a popular wiki, delete positive comments about them, add positive comments about your own pet topic, and there you go. There's even a term for this.
The Robots Exclusion Protocol (i.e. robots.txt).
Here's Google's stance on the subject (boils down to you don't want it indexed, put in a damn robots.txt file)
Hell, even Google News uses robots.txt
Has anyone ever experimented with the idea of using a page's "originality" to help determine its place in search results? Maybe comparing text on a page with text on other pages in the results and moving any very similar pages down on the rankings. You'd have to add some sort of garbage filter to prevent people from stacking their pages with randomly-generated nonsense, but that's certainly doable. It wouldn't eliminate all of this SEO crap, but it would at least get rid of the fifty zillion nearly identical Amazon or BizRate or other pages that come up on a lot of searches, and would severely handicap some types of SEO techniques as well.
Admittedly this is a lot more computationally intensive than most current search algorithms (as you'd pretty much have to do this in real time) but I can't imagine it's beyond the abilities of a Google or a Microsoft.
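One common way to approximate the "originality" comparison described above is word shingling with Jaccard similarity. The Python sketch below swaps in that technique as an illustration; it is not something the post specifies, and the shingle size, threshold and function names are assumptions. It ignores the garbage-filter problem the post mentions.

```python
def shingles(text, size=3):
    """Set of overlapping word n-grams ("shingles") from the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def demote_near_duplicates(results, threshold=0.5):
    """Stable reorder: pages too similar to an earlier-kept page move to the end.

    `results` is a ranked list of (url, text) pairs.
    """
    kept, demoted = [], []
    for url, text in results:
        if any(similarity(text, kept_text) >= threshold for _, kept_text in kept):
            demoted.append((url, text))
        else:
            kept.append((url, text))
    return kept + demoted
```

Comparing every result against every kept result is quadratic in the number of results, which fits the post's point about computational cost; it is tolerable only because it runs on one results page at a time.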
> Isn't it time for Google finally to put some work into refining their results...
Isn't it time to also reconsider the Wiki paradigm? More sites (like this) are requiring logins. "Golden Prose" indeed! IMHO, Wikis are evolving into crude Content Management Systems.
With regards to just editing the sandbox which nobody monitors anyway, why not just include a rule to deny adding URLs. There is no conceivable reason to allow a user to add a URL in the sandbox.
And if you're thinking "I want to practise adding links with the required syntax", it's not hard. The only thing you need to use the sandbox for beyond learning how other basic syntax works (and you can apply that to links without practising) is structuring.
As any cat owner will tell you, you need to clean the sandbox out periodically. In the case of a Wiki, overnight would probably be a good idea.
Chip H.
You know, googlebombing might have some better effect if you did it in reverse, e.g. SCO. Right now the second link for "litigious bastards" after sco.com is ... a page urging people to googlebomb. Gee, how subversive, no one will figure out how that worked... Hell, every time you mention SCO, come up with a different link for SCO so their google results will be peppered with such commentary after... People search for "SCO", not "litigious bastards".
... that was funny. Once. Get over it and take some real action against these, uh, litigious bastards, or at least improve the trick a little.
"Dumb fucker", "miserable failure", etc
I've finally had it: until slashdot gets article moderation, I am not coming back.
$5 / month hosted VPS on linux = awesome!
Spammers are going there because you have a high PR. So cut the PR supply and you're in business: rewrite links as http://www.site.com/~url=http://www.link.com and voila, URL rewriting. No more PR for Mr. Spammer.
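A rough Python sketch of that kind of URL rewriting, applied to rendered HTML. The `/redirect?url=` endpoint is a hypothetical stand-in for the `~url=` scheme in the post, and the regex only handles double-quoted `href` values.

```python
import re
from urllib.parse import quote

# Hypothetical local endpoint that issues the actual redirect.
REDIRECT_PREFIX = "/redirect?url="

HREF_RE = re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE)


def rewrite_external_links(html, own_host="www.site.com"):
    """Point external links at a local redirect so they are not direct links."""

    def replace(match):
        url = match.group(1)
        host = url.split("/")[2].lower()
        if host == own_host:
            return match.group(0)  # Links within the site stay untouched.
        return f'href="{REDIRECT_PREFIX}{quote(url, safe="")}"'

    return HREF_RE.sub(replace, html)


print(rewrite_external_links('<a href="http://www.link.com/page">buy now</a>'))
```

Whether a crawler passes rank through the redirect is up to the crawler, so this sketch assumes the redirect endpoint is also excluded from crawling, for example in robots.txt.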
This foils PageRank in some ways (after all, valid comments with links should increase the rank, at least some of the time). It would be a shame to do the same with all Wikis. Otherwise more onerous authentication may be necessary, which is also against Wiki principles (though common in Wiki implementations). Or some vetting, perhaps using this PageRank-fooling measure until the page changes are approved.
Perhaps Google could index only Wikipedia pages that have not been changed for a few days. So as long as people keep removing the spam links, they will have no effect on the PageRanks. It won't work for forgotten old wikis, but it will for the big ones. I believe that completely ignoring links on Wikipedia is a bad idea, since the average quality of sites linked from there is very good.
A "webmaster" maintains a website. This, however, covers the work of "spammers", anyone over-zealously promoting their website to the detriment of the web.
Big difference. Let's not soil the good name of productive, non-scum career fields.
Last night I googled for "wireless linux intersil prism" (or something like that) and found myself on a site with a page full of nothing but keywords about wireless stuff surrounded by porn banners. I was very happy^Hupset!
From The New Yorker, 5/31/2004 SEARCH AND DESTROY By James Surowiecki If you go to the Internet search engine Google, type in "miserable failure," and click on the "I'm feeling lucky" icon, you will be directed not to an article about "Ishtar" or the 1962 Mets but, rather, to the White House Web site and the official biography of President George W. Bush. Congratulations. You've been Google-bombed. A Google bomb goes off when people conspire to have a particular phrase (in this case, "miserable failure") link to a given Web page, effectively tying the phrase to the page. Other famous Google bombs include one linking "more evil than Satan himself" to Microsoft's home page and, currently, one that links "weapons of mass destruction" to a page that reads, "The weapons you are looking for are currently unavailable. . . . Click the Regime Change button, or try again later." Google bombing may be a party trick, something to amuse office workers as they trudge through the day, but it exemplifies one of the biggest challenges that Google faces as it heads toward its multibillion-dollar I.P.O. Google is as much a ranking system as a search engine. It is more efficient than any other site at analyzing information and making decisions about its importance. Google is successful not because if you search for "Enron" it will return 1.75 million pages that contain the word but because, of those 1.75 million, the most relevant are right at the top. In large part, Google does this by relying on the collective intelligence of the Web itself. At the core of Google's technology is a voting system. Every link from one Web site to another is treated as a vote; sites that get more votes are considered more valuable and, in Google's system, are weighted to have more influence. Google also takes hundreds of other factors into consideration, such as font size and the location of words on the page. 
But, fundamentally, the Web pages that Google says are best are the pages that the Web as a whole thinks are best. Google's success has created a problem, though: if you have a voting system, people are going to try to manipulate it. Google bombing is the innocent face of this. Less innocent is the industry dedicated to helping Web sites maximize their Google rankings-the racket known as "search engine optimization." Some American companies have armies of programmers toiling away in Bangalore solely to boost their Google rankings. Much of what the "optimizers" do is reasonable, helping companies do a better job of presenting content, using keywords, and building pages to which others will want to link. (These are termed "white hat" tactics.) But there are also plenty of black hats-known as "index spammers"-who have simply adapted the methods and tricks of the old political machines. In the days of Boss Tweed, people were encouraged to vote early and often, dead men were placed on the voting rolls, and citizens were paid for their votes. On the Web, companies "cloak," which means, among other things, that they disguise the real content of their sites, in an attempt to fool Google into thinking that a page is relevant to a search. Deep-pocketed players pay other sites to link to their sites, to foster an illusion of popularity. Some companies set up "link farms"-a host of interconnected Web sites that exist primarily to link to each other. A big company with a major Internet presence, for instance, can buy thousands of domain names, set up Web sites, and effectively create thousands of links out of nothing. Google, of course, knows about all this. In its recent I.P.O. filing, it said that the threat from index spammers was "ongoing and increasing," and so it has embarked on a campaign to outsmart them. A couple of weeks ago, for instance, it essentially banned a company called WhenU because of its cloaking tactics. 
(WhenU's Web site will no longer appear if you search for the company on Google.) To stymie the cheaters, Google issues periodic revisions to its algorithm, and companies breathlessly await the subsequent changes in their rankings. (Th
Just delete the sandbox every 24 hours.
Sindri Traustason.
The question is who has the problem and what's the best/easiest solution. The wiki admins don't care - their software is doing what it's supposed to be doing. Google is the one that has the problem with it because it degrades their search service. Do they solve it by convincing thousands (millions?) of sandbox administrators to all change their system? Or do they solve it by changing their algorithm?
And generally you are right, though I'd like to put out an instance where that isn't the case. With WordPress it has 2 very nice plugins available: one that uses google as a redirect, and the other creates a md5 hash of the website url. So, for example, this link will take you to google, which takes you to my website, which then redirects you to slashdot.
I thought it was a real-time thing, where the account creation bots passed the image that loaded during the signup process to a porn site and the images were decoded by a real person, and the result passed back to the bot who then signed up for the account.
To avoid the timing problems with porn signons needing to happen concurrent with account signups, the account generation process was actually initiated by a porn signon. It limits your account generation ability, but only to the extent that you have porn traffic.
Did I just imagine this, or does it work that way?
We've also had problems on the SpamAssassin Wiki.
Our solution has been to ensure that all changes are emailed to a mailing list, where we can monitor them and remove the spam links within minutes of their arrival.
An ideal solution: Google should define an attribute for the A tag, which indicates that a URL should not be used in computing Page Rank. We could then modify our Wikis so that page links from Wikis are not included.
Same thing would work for weblog comment spamming, too.
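The attribute proposed above did not exist when this was posted, but major search engines later adopted the `rel="nofollow"` value for this purpose. A minimal Python sketch of how a wiki or weblog could stamp it onto user-submitted links follows; the function name is made up, and a regex over HTML is a shortcut that a real engine would replace with its own link renderer.

```python
import re

A_TAG_RE = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(html):
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet."""

    def replace(match):
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            return match.group(0)  # Leave tags with an explicit rel alone.
        return f'<a{attrs} rel="nofollow">'

    return A_TAG_RE.sub(replace, html)


print(add_nofollow('<a href="http://x.example/">x</a>'))
```

Applying it only to unreviewed or anonymous edits would keep the vote for links that editors have vetted.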
Agreed... I think it's solvable too -- just difficult, which makes it all the more fun.
Also I accidentally linked to the wrong paper in my original reply. I meant to link to this one, but that other paper was at least marginally related.
That's what froogle's for. Are you familiar with it? Type in a term and you'll find nothing BUT consumer items. If you're looking for a review site, that's eopinions. And if your complaint is that google should have a dedicated product review site, that fails your other point of them being all things to all people.
Also, I have NEVER typed in the model number of a product, with some reasonable attendant keywords, and NOT found good review info as well as the manufacturer's site. A little google prowess is all that's required. As someone else mentioned, do you have an example of a search that failed? With keywords, please.
But they're too busy making an email application with 9 frames and 200k of Javascript to pay attention to the reason people use them in the first place. It's a little disappointing. I'm an AltaVista alumnus and I got to watch them forget about search and do a bunch of useless crap instead, then die. I was hoping Google would be different.
Please, gmail is a wonderful and necessary idea. Most webmail email clients suck - it's impossible to find your messages, either because they're not indexed or because you had to delete them to keep under your tiny limit. A ton of people, myself included, can't wait for gmail.
Also, it's not like they aren't actively fixing the problems with abuse, but it's hard to keep up with the entire spamming world. Recall SearchKing - they killed that effectively. And just because they don't publicly detail their changes to PageRank doesn't mean they aren't working on it.
I would expect the PageRanks of the Wikifarms to decrease within a month.
Blind people surf the net using one of the following "solutions":
Audio embeds conflict with the screen reader output. [Ever browsed a website listening to a nice cd, only to also get some webdesigner's idea of what constitutes "good music" to come through your speakers at the same time? Audio embeds are the same problem, for users of screen readers.]
Audio embeds don't conflict with braille display screens. The "problem" there is the assumption that an audio output is set up.
Audio embeds work, if the user does not use a screen reading program. The combined audio streams usually result in neither being heard correctly.
Deaf blind people can only surf the web with braille display screens. They don't install audio output on their systems. Audio embeds won't work for them.
Amber
Wind Beneath Thy Wings
Somewhere, a Google employee is reading this Slashdot article thinking, "Oh shit. So much for next week's vacation."
For the people with fast internet connections, it should be easy to do a repeated page mirroring with some tool like HTTrack. Maybe controlled by a little script that keeps repeating it...
That will drive their traffic through the roof and cost them. Usually, the amount of free traffic included with a webhosting account is limited to something like 100 GBytes/month. Exceed it and get a big bill.
C - the footgun of programming languages
I don't understand why the wiki owners don't just use robots.txt to tell search engines NOT to index the page containing the sandbox.
That's why it's there.....
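A quick way to check such a rule before deploying it is Python's standard `urllib.robotparser`. This is only a sketch: the host `example.org` and the `/wiki/Sandbox` path are made-up placeholders. Note that crawlers read robots.txt from the site root, so the rule names the sandbox path rather than living in the sandbox directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: the sandbox lives under /wiki/Sandbox on example.org.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/Sandbox
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(path, agent="Googlebot"):
    """Return True if a well-behaved crawler may fetch this path."""
    return parser.can_fetch(agent, "http://example.org" + path)


print(may_crawl("/wiki/Sandbox"))    # sandbox is blocked
print(may_crawl("/wiki/FrontPage"))  # the rest of the wiki is not
```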
-- If you try to fail and succeed, which have you done? - Uli's moose
It would be cool if sites could set a page or page-group with a "google weight" via a meta-tag. The weights would be from 0 to 100, with 100 being normal and 0 being no-value at all.
Then sites could take things like their sandboxes and tell people that they are zero-weighted. In fact Wiki, blog, and-such software could automatically zero-weight the free non-user and sandbox pages to prevent this kind of abuse.
Then you put a disclaimer at the top: "These pages are excluded from search engine page rankings."
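To make the idea concrete, here is how a ranking system might read such a tag. No such tag exists as far as this thread is concerned; the name `link-weight` and both function names are invented for this sketch, and how a real search engine would fold the weight into its ranking is unknown.

```python
from html.parser import HTMLParser


class WeightParser(HTMLParser):
    """Reads a hypothetical <meta name="link-weight" content="0..100"> tag."""

    def __init__(self):
        super().__init__()
        self.weight = 100  # "normal" weight when the page declares nothing

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "link-weight":
            return
        try:
            value = int(attr.get("content") or "")
        except ValueError:
            return  # ignore malformed values, keep the default
        self.weight = max(0, min(100, value))  # clamp to the 0..100 range


def page_weight(html):
    parser = WeightParser()
    parser.feed(html)
    parser.close()
    return parser.weight


def weighted_vote(raw_vote, html):
    """Scale the vote a page's links carry by the page's declared weight."""
    return raw_vote * page_weight(html) / 100


sandbox = '<html><head><meta name="link-weight" content="0"></head></html>'
print(weighted_vote(0.5, sandbox))
```

A zero-weighted sandbox contributes nothing, which is exactly what removes the spammer's incentive.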
Innocent people shouldn't be forced to pay for inferior software development.
--"Code Complete" Microsoft Press
Google is filled to the brim with highly intelligent PhDs. This won't take long to fix, right?
Sorry. This tag only stops well behaved bots.
Deliberately written mis-behaving bots will just ignore it.
seriously though. google's algorithm works.
GENERATION 26: The first time you see this, copy it into your sig on any forum and add 1 to the generation.
Now to get the word out to all the smut connoisseurs.
gewg_
It will stop the bots that google and other search engines use for page ranking. It won't stop the spambots but it will prevent the spammers from gaining anything by it. If spammers can't raise the page ranking of a site by using those spambots, then the use of the spambots will stop, or at least not spam your site as it would be useless to do so. Making your site use HTTPS for everything would increase the resources needed to spam your site, making it even less useful to spam any site that uses those tags.
It's not only wikis that are appropriated by these spammers. I had to shut down a discussion board I ran because the spam got to be too much. I was logging on several times every day to delete the junk.
The point of the article, I think, is that wikis are the new frontier for slimy spamming SEOs. The weasels have used "comment spam" on regular blogs. They have spammed referer logs. Now they are giggling over how they can defecate on wikis.
User-agent: Googlebot
Disallow:
Wikileaks, no DNS
I didn't mean for it to happen but after a few posts on /. my page surged on google
(when searching for my name)
here's an example
-- Avishalom is usually vish
While Google's PageRank algorithm does not consider the 'subject' of a page that is linking to another page, their search algorithm is heading that way (sorta like TEOMA). Meanwhile Google-bombing will always work if there isn't a good REAL page about the text that is used for the Google-bomb. As for the META robot tag, don't expect people to use it, if they won't even validate their code (hell, I don't). Google will find ways to sort through the vast amounts of disorganized, unstructured data that is the World Wide Web we know and love.
just go through and clear them.
easy.. they are almost all spam, even down on page 19 of the google results.
bah
anime+manga together at last.. in real time.
I enjoyed it. Please retain the formatting of the source page, or add paragraphs if none exist. Huge blocks are harder to read.
I don't use one and I'm proud of this fact d: fancy spellin i save fer english papers.
The concept of the wiki was flawed from the beginning. These kinds of naive, utopian communities never work, because there's always going to be someone who is willing to run through the site and ruin it for the rest.
The insensitive clods! Wikis help keep the commie side of the web alive.
If the spammers are linking text like " " or "." to hide their activities, google will easily be able to identify those and block those sites but then spammers will start linking words.
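Spotting that first kind of link is easy enough to sketch. This is only an illustration of the idea, not anything Google is known to do; the class and function names are invented here. It flags links whose visible text contains no letters or digits at all.

```python
from html.parser import HTMLParser


class HiddenLinkFinder(HTMLParser):
    """Collects hrefs of links whose visible text is only spaces/punctuation."""

    def __init__(self):
        super().__init__()
        self.suspicious = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # No letter or digit anywhere in the link text: nothing to click on.
            if not any(ch.isalnum() for ch in "".join(self._text)):
                self.suspicious.append(self._href)
            self._href = None


def find_hidden_links(html):
    finder = HiddenLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious


page = ('<a href="http://spam.example/">.</a> '
        '<a href="http://ok.example/">docs</a>')
print(find_hidden_links(page))
```

As the comment says, this stops working the moment spammers switch to linking real words.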
How about we relink any spam we find, from http://www.spamsite.com/ to http://www.searchenginespammers.net/bb-spammer.cgi/http://www.spamsite.com/
After relinking, 1) click the link (or better, have a program visit it with the correct referrer string, or report the link via a web form on the cgi) and 2) move the link to your search engine accessible spam page. Actually, reporting via a web form is better than clicking the link if you are doing it manually, because you don't increment the site's hit counters and you don't expose your computer to malware.
Of course, someone would need to register searchenginespammers.net and install a cgi there that would basically display a page describing the criminal practice of bulletin board/wiki spamming, and then lists all the referrer strings that have brought it to this particular page.
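A bare-bones version of that cgi might look like the sketch below, written as a WSGI callable rather than a classic CGI script. Everything in it is hypothetical: the in-memory `reports` store, the page wording, and the assumption that the spam URL arrives intact as the path after the script name (some servers rewrite the double slash).

```python
from collections import defaultdict
from html import escape

# spam URL -> set of pages whose relinked URL led here (in-memory only)
reports = defaultdict(set)


def app(environ, start_response):
    """Minimal WSGI stand-in for the proposed bb-spammer.cgi (hypothetical)."""
    spam_url = environ.get("PATH_INFO", "").lstrip("/")
    referrer = environ.get("HTTP_REFERER", "")
    if spam_url and referrer:
        reports[spam_url].add(referrer)  # remember where the spam was posted
    rows = "".join(f"<li>{escape(ref)}</li>"
                   for ref in sorted(reports.get(spam_url, ())))
    body = (
        f"<h1>Reported: {escape(spam_url)}</h1>"
        "<p>This URL was posted as bulletin board or wiki spam on:</p>"
        f"<ul>{rows}</ul>"
    ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


# Simulate one visit arriving from a wiki page that carried the relinked URL.
page = app({"PATH_INFO": "/http://www.spamsite.com/",
            "HTTP_REFERER": "http://wiki.example.org/Sandbox"},
           lambda status, headers: None)
print(page[0].decode("utf-8"))
```

Because the Referer header is trivially forged, a real deployment would need some way to verify reports before listing them.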
This will help search engines like google identify the wiki spammers and purge their sites from their search results. In the short term, searches for the keywords they tried to drive to their site would now take them to searchenginespammers.net and once the folks at google take action they can use it to activate a filter mechanism. Other sites besides google can use the information. Someone could start a PICS or DNS based blacklist based on listings at searchenginespammers.net that people could use to prevent patronizing such sites. Email filters could use the list to help identify spam.
Like any site that lists spam URLs, there is the possibility that people will spam other people's URLs to discredit them, so that needs to be taken into account.
Also, this thread is a reminder that when mentioning a company we dislike (SCO, MPAA, RIAA, Macrovision, Microsoft, George W Bush, etc.) we should either not link their name or link their name to a site that describes their misconduct; we don't want to help them get better search engine rankings.
Why not an automated reply system? If you become annoyed by a link, simply add it to a list to access on an ongoing, random basis. No need to accept replies. How could they possibly complain, after all they did post a link to bring you to their address. With a few million people responding I should think they would be delighted.