Splogs Clog Blog Services

Posted by Hemos on Monday October 24, 2005 @03:02AM from the like-LDLs dept.

SuperWebTech writes "A new generation of spam has emerged lately in the form of automatically-created spam blogs, or "splogs." One wily programmer manipulated Blogger's API to create a "spamalanche" of thousands of blogs whose sole purpose was to increase their real sites' pagerank. This clogged search engine results while filling RSS feed services with useless listings. Though Google, Blogger's owner, is doing its best to fix the problem, in the meantime several services have stopped listing any site they host. So far nobody has found a solution."

Re:Username trend? by De+Lemming · 2005-10-24 03:22 · Score: 2, Informative

That should read "Bayesian filtering" of course.
They even quote you sometimes by digitalgimpus · 2005-10-24 03:24 · Score: 3, Informative

In hopes of not looking so spammy, they will take real blogs and copy either the contents or just keywords (such as the author's name and perhaps the post title).

So when you search for something... spammers with your name come up, rather than yourself.
splogs aren't the problem... by ianmassey · 2005-10-24 03:27 · Score: 4, Informative

The problem surfaces when the "splogs" are used to send comment spam and trackback spam to legitimate blogs. It's through these links that PageRank is increased. If everyone starts proactively dealing with spam on their own sites, this problem will solve itself. MovableType users can upgrade to 3.2, which has spam blocking features, or use the great plugin MT-Blacklist. Either will eliminate this problem. An AC mentioned that WordPress has a similar set of options. I know that TypePad does. The only major blog service provider left to come up with a solution is Blogger, and in the interim you can require registration to post comments on your Blogger site or turn comments off entirely. LiveJournal and all the clones are blocked from trackback by 90% of normal blog sites already, so they don't even count.

Another poster suggested that we ignore this problem, and it will go away. Untrue. Ignoring the 600 spam comments a day is exactly what the spammers would prefer you do, so that they can stink up every site on the internet with their crap. We are fortunate that in the case of this "new" form of spam, the tools necessary to get rid of it are already there and effective; we just need to get them all turned on.
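
For anyone wondering what URL blacklisting amounts to, here is a minimal sketch. It is not how MT-Blacklist or any real plugin is implemented; the domain list and the function names `linked_domains` and `is_spam` are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist for illustration; real lists are far larger.
BLACKLISTED_DOMAINS = {"cheap-pills.example", "poker-bonus.example"}

# Rough pattern for http(s) links inside a comment body.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def linked_domains(comment_text):
    """Return the set of host names linked from a comment."""
    domains = set()
    for url in URL_PATTERN.findall(comment_text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            # Malformed URL (for example a broken IPv6 literal); skip it.
            continue
        if not host:
            continue
        host = host.lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        domains.add(host)
    return domains


def is_spam(comment_text, blacklist=BLACKLISTED_DOMAINS):
    """Flag a comment that links to a blacklisted domain or a subdomain of one."""
    return any(
        domain == bad or domain.endswith("." + bad)
        for domain in linked_domains(comment_text)
        for bad in blacklist
    )
```

A comment linking to `http://www.cheap-pills.example/buy` would be flagged, while one linking to an unlisted site passes through. The obvious weakness is that spammers can register new domains faster than a list gets updated.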
Word verification is obsolete by Animats · 2005-10-24 03:43 · Score: 5, Informative
Word verification is obsolete.
- Programs have been written that can successfully decode CAPTCHAs most of the time. It turns out not to be too hard to modify OCR programs to do this.
- Word verification can be outsourced to third world countries at low cost.
- Most cleverly, word verification can be outsourced to users of your porno sites, who have to type in someone else's CAPTCHA to get free pictures.
All these approaches are in active use.
1. Re:Word verification is obsolete by PeeAitchPee · 2005-10-24 04:48 · Score: 2, Informative
  
  Maybe beatable, yes, but still 99%+ effective and definitely not obsolete in practice. Most of the successful existing CAPTCHA attacks use a dictionary matched to the default wordlist that ships with the CAPTCHA and can usually be defeated by running the CAPTCHA in random mode with a few more characters than usual. I get maybe four or five hand-entered spam comments / week, which are usually quickly blocked after the first attempt by blacklisting the target "online drugstore" / poker / whatever site's URL. If I shut my CAPTCHA off I get *thousands* of spam comments / week. So while the technology has its limitations (such as, for instance, excluding blind users), it's a tradeoff that most individual blog owners find beats sifting through hundreds or thousands of spammed comments / week.
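
  To make "random mode" concrete, here is a rough sketch of generating the challenge text from random characters instead of a shipped wordlist. It only covers the text and the answer check, not the image rendering, and the names `ALPHABET`, `random_challenge` and `check_answer` are my own, not from any particular CAPTCHA package.

```python
import secrets

# Letters and digits with look-alikes (I, O, 0, 1) left out: 32 symbols.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def random_challenge(length=8):
    """Build a challenge string from random characters, not dictionary words."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_answer(expected, submitted):
    """Compare the user's answer to the challenge, ignoring case and outer spaces."""
    a = expected.upper().encode("utf-8")
    b = submitted.strip().upper().encode("utf-8")
    return secrets.compare_digest(a, b)
```

  With 32 symbols and 8 characters there are 32^8 (about 10^12) possible strings, so a dictionary keyed to a default wordlist is no help. It does nothing against OCR or paid human solvers, which is the other half of this argument.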
Re:Word verification? by Myself · 2005-10-24 03:46 · Score: 3, Informative

If someone's willing to pay for a higher search ranking, the spammer can pay humans to beat the CAPTCHAs. I can see it now, a sweatshop in a low-wage country with hundreds of workers monotonously typing in the text from the skewed and scrambled images.

There's also PWNTcha, a CAPTCHA decoder. (Previously slashdotted.)
Re:Capcha? by Cramer · 2005-10-24 03:51 · Score: 2, Informative

CAPTCHAs don't solve anything. 90% of them are easily decoded by software. (Software made them, software can decode them.) And as others love to point out, there are ways to get actual people to decode them for you. [However, I've never seen actual evidence of one of the "pr0n traps".]

The only thing that appears to work is charging for new accounts. Yes, it's annoying. Yes, it will drive some, otherwise legit, people away (because they don't use online payment systems, etc., etc.) And yes, it's a hassle for the site. But, aside from stolen credit cards, there's no getting around it. (And very few spammers are willing to commit credit card fraud to increase their pagerank.)
Re:Well let's get old fashioned by Anonymous Coward · 2005-10-24 05:23 · Score: 1, Informative

While Google is the *best* commercial search engine it completely ignores the most useful information that can be found through the "Invisible Web" research.

Sure if you wanna find this or that web site or quick info, Google is great. But when you want to find something truly meaningful that you can use as reference, try http://lii.org/ or http://dmoz.org./ Of course this requires subject search (much like going to the library) and recognizing the set of terms you want to find. I just discovered http://www.factbites.com/ is a decent search engine Web site that digs through other "invisible web" sites to deliver results.

People really have to get out of this "Google or bust" mentality if they want to get any real research done.

If you're *really* desperate for a commercial search engine, just go with www.dogpile.com; it compiles searches from Yahoo, Google, Jeeves and MSN Search.

ps: PageRank flaws have been called "GoogleHoles," a term coined by Steven Johnson
http://slate.msn.com/id/2085668/
Re:Well let's get old fashioned by LocoMan · 2005-10-24 07:00 · Score: 2, Informative

You can already. Just add -site:(URL here, without the parentheses) at the end of the search, once for each site you don't want listed in the results... :)
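
If you exclude the same sites over and over, a tiny helper can build the query string for you. This is just string assembly around the -site: operator described above; the function name `exclude_sites` and the example domains are made up.

```python
def exclude_sites(query, sites):
    """Append one -site: term per unwanted site to a search query."""
    exclusions = " ".join(f"-site:{site}" for site in sites)
    return f"{query} {exclusions}".strip()


# Example with placeholder domains.
print(exclude_sites("splog detection", ["spam-blog.example", "other.example"]))
```

Paste the resulting string into the search box as usual.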