Splogs Clog Blog Services
SuperWebTech writes "A new generation of spam has emerged lately in the form of automatically created spam blogs, or "splogs." One wily programmer manipulated Blogger's API to create a "spamalanche" of thousands of blogs whose sole purpose was to increase the PageRank of the spammer's real sites. This clogged search engine results while filling RSS feed services with useless listings. Though Google, Blogger's owner, is doing its best to fix the problem, in the meantime several services have stopped listing any site Blogger hosts. So far nobody has found a solution."
That should read "Bayesian filtering" of course.
In hopes of not looking so spammy, they will take real blogs and either copy the contents or just key words (such as the author's name and perhaps the post title).
So when you search for something... spammers with your name come up, rather than yourself.
The problem surfaces when the "splogs" are used to comment spam and trackback spam legitimate blogs. It's through these links that PageRank is increased. If everyone starts proactively dealing with spam on their own sites, this problem will solve itself. MovableType users can upgrade to 3.2, which has spam blocking features, or use the great plugin MT-Blacklist. Either will eliminate this problem. An AC mentioned that WordPress has a similar set of options. I know that TypePad does. The only major blog service provider left to come up with a solution is Blogger, and in the interim you can require registration to post comments on your Blogger site or turn comments off entirely. LiveJournal and all the clones are blocked from trackback by 90% of normal blog sites already, so they don't even count.
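For anyone wondering what blacklist-style blocking amounts to, here is a minimal sketch in Python. It is not how MT-Blacklist or MovableType 3.2 is actually implemented; the blacklist entries, the regex, and the function names are all made up for illustration. The idea is simply to pull the hosts out of any links in a comment and reject the comment if one of them is on a list of known spam domains.

```python
import re

# Hypothetical blacklist entries (assumption); real lists are much longer
# and are shared and updated by the community.
BLACKLIST = ["cheap-pills.invalid", "casino-bonus.invalid"]

# Capture the host part of any http/https link in the comment text.
URL_HOST = re.compile(r"https?://([^/\s\"'<>]+)", re.IGNORECASE)


def linked_hosts(text):
    """Return the lowercased hosts of all links found in text, without ports."""
    hosts = []
    for raw in URL_HOST.findall(text):
        host = raw.lower().split(":")[0].rstrip(".")
        hosts.append(host)
    return hosts


def is_spam(comment_text, blacklist=BLACKLIST):
    """True if the comment links to a blacklisted domain or any subdomain of one."""
    for host in linked_hosts(comment_text):
        for bad in blacklist:
            if host == bad or host.endswith("." + bad):
                return True
    return False


print(is_spam("Nice post! http://www.cheap-pills.invalid/buy"))
print(is_spam("See http://example.org/article for details"))
```

A matching comment or trackback would be held for moderation or dropped before it is ever published, so the link never gets a chance to pass along any PageRank.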
Another poster suggested that we ignore this problem, and it will go away. Untrue. Ignoring the 600 spam comments a day is exactly what the spammers would prefer you do, so that they can stink up every site on the internet with their crap. We are fortunate that in the case of this "new" form of spam, the tools necessary to get rid of it are already there and effective; we just need to get them all turned on.
All these approaches are in active use.
If someone's willing to pay for a higher search ranking, the spammer can pay humans to beat the CAPTCHAs. I can see it now, a sweatshop in a low-wage country with hundreds of workers monotonously typing in the text from the skewed and scrambled images.
There's also PWNtcha, a CAPTCHA decoder. (Previously slashdotted.)
CAPTCHAs don't solve anything. 90% of them are easily decoded by software. (Software made them, software can decode them.) And as others love to point out, there are ways to get actual people to decode them for you. [However, I've never seen actual evidence of one of the "pr0n traps".]
The only thing that appears to work is charging for new accounts. Yes, it's annoying. Yes, it will drive some otherwise legit people away (because they don't use online payment systems, etc., etc.). And yes, it's a hassle for the site. But, aside from stolen credit cards, there's no getting around it. (And very few spammers are willing to commit credit card fraud to increase their PageRank.)
While Google is the *best* commercial search engine, it completely ignores the most useful information, the kind found through "Invisible Web" research.
Sure, if you wanna find this or that web site or some quick info, Google is great. But when you want to find something truly meaningful that you can use as a reference, try http://lii.org/ or http://dmoz.org/. Of course, this requires a subject search (much like going to the library) and knowing the set of terms you want to find. I just discovered that http://www.factbites.com/ is a decent search site that digs through other "invisible web" sites to deliver results.
People really have to get out of this "Google or bust" mentality if they want to get any real research done.
If you're *really* desperate for a commercial search engine, just go with www.dogpile.com; it compiles searches from Yahoo, Google, Jeeves and MSN Search.
PS: PageRank flaws have been called "GoogleHoles," a term coined by Steven Johnson:
http://slate.msn.com/id/2085668/
You can already. Just add -site:(URL here without the ()'s) at the end of the search, once for each site you don't want listed in the results... :)
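If you keep your own list of sites to hide, something like the Python sketch below would tack the exclusions onto a query for you. The function name and the example hosts are invented for illustration; the only thing taken from the comment above is the -site: syntax itself.

```python
def exclude_sites(query, blocked_sites):
    """Append one -site: term per blocked host to a search query string."""
    terms = [query.strip()]
    for site in blocked_sites:
        host = site.strip()
        if host:  # skip blank entries
            terms.append(f"-site:{host}")
    return " ".join(terms)


# Hypothetical hosts, used only as placeholders.
blocked = ["splog-example.invalid", "another-splog.invalid"]
print(exclude_sites("blog spam filtering", blocked))
# prints: blog spam filtering -site:splog-example.invalid -site:another-splog.invalid
```

Paste the resulting string into the search box as usual. Search engines cap how long a query can be, so this only works for a fairly short list of sites.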