Slashdot Mirror


The Ham and Spam of Weblogs

An anonymous reader submits "Will the blogosphere become just as spammy as Usenet? There may be over 10M weblogs out there, but most of them seem to be fake spam blogs created to manipulate the search engines. Scott Johnson, CTO at Feedster, complained that 'at times we see upwards of 90% of the traffic from Blogspot being spam,' and the problem is only likely to get worse. Can blog search engines like Technorati, Feedster, and PubSub filter the signal from the torrent of noise? Or will we have to seek new approaches such as the social filtering used by Del.icio.us or collaborative filtering used by Findory to separate the ham from the spam?"

12 of 192 comments

  1. To me (most) blogs ARE spam by Anonymous Coward · · Score: 4, Informative

    I wish Google had an option to exclude blogs from my search. Considering many blogs use b2evolution, phpBB, or whatever, Google could easily determine what IS a blog and what IS NOT and filter it accordingly. Google IMHO would be a much better place if I could exclude blogs and those stupid parked domain search sites from my queries.

    I'm not trying to be flamebait; it would be a nice option though.

    1. Re:To me (most) blogs ARE spam by aftk2 · · Score: 4, Insightful

      I'm just curious - exactly _what_ would Google include, if not blogs? Certainly, I can understand not including those parked domain search sites: they're gaming the system, completely unhelpful, and filled with bogus content.

      But blogs? Sure, much of the content is poorly written, or not applicable to what most people - or, well, rather, 90% of a given population - are interested in. But in searches especially, doesn't it make sense to list results that include those normal people so interested in a particular topic that they blog about it?

      For example, blogs can be very helpful when facing computer troubles, provided you're dealing with bloggers who know how to write for Google. This is a good example. I mean, this surely has to be more worthy of inclusion in Google than the lion's share of those web-based bulletin boards that get indexed - you know the ones, with the "Next in thread" and the replies that are typically out of date, or altogether unrelated to your original query.

      Everyone's quick to dismiss things lately. Don't dismiss blogs, just because sometimes their content seems insular and not applicable to what you've searched for. That's a problem with the search engines, not the sites they index.

      --
      concrete5: a cms made for marketing, but strong enough for geeks.
  2. Human validation by SamMichaels · · Score: 4, Funny

    The guy makes a good point...human validation via captcha. If you're going to spend 10 minutes complaining, whining, bragging and/or loathing about something then you can spend 3 seconds typing in the word "uNFsaQ" to prove you're human.

    If it takes you less than 10 minutes to write in your dear diary--I mean blog--then it's probably a 1 liner to the effect of "i think she likez me omglolbbq!!!" and you need to get off my internet.

    Problem solved. Next?

  3. Shouldn't be too hard to filter by XNormal · · Score: 4, Interesting

    With email spam filtering you have to consider each email separately. A blog has a persistent identity and reputation. In theory, this should make it easier to filter blog spam than email spam. One result of this type of filtering is that it will penalize new blogs in search results, both spammy and real.

    Blog comment spam will remain a problem, of course.

    --
    Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
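
The reputation idea in comment 3 can be sketched in a few lines. This is only an illustration under stated assumptions, not how any real search engine works: `BlogHistory`, `spam_estimate`, and the prior values are all hypothetical names and numbers. The blog's observed spam rate is blended with a neutral prior, so a blog with no history scores at the prior regardless of whether it is real or spammy.

```python
from dataclasses import dataclass


@dataclass
class BlogHistory:
    """Hypothetical record of what a filter has seen from one blog."""
    posts_seen: int     # posts the filter has examined so far
    posts_flagged: int  # of those, how many were judged to be spam


def spam_estimate(history: BlogHistory,
                  prior_spam: float = 0.5,
                  prior_weight: float = 10.0) -> float:
    """Estimate the chance that the blog's next post is spam.

    The observed flag rate is blended with a prior. prior_weight is the
    number of "virtual" posts the prior counts for (an assumed value).
    """
    virtual_flags = prior_spam * prior_weight
    return (history.posts_flagged + virtual_flags) / (history.posts_seen + prior_weight)


if __name__ == "__main__":
    print(spam_estimate(BlogHistory(posts_seen=0, posts_flagged=0)))      # brand-new blog
    print(spam_estimate(BlogHistory(posts_seen=200, posts_flagged=0)))    # long clean record
    print(spam_estimate(BlogHistory(posts_seen=200, posts_flagged=190)))  # long spam record
```

A blog with two clean posts still scores close to the prior, which is the penalty on new blogs that the comment predicts: history separates established blogs well, but says little about newcomers.
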
  4. Software is Not Social by lux55 · · Score: 5, Insightful

    I just wanted to point out that so-called "social software" is not social. Person-to-person communication through computers is mediated and indirect. Technology is a barrier to communication as much as it is an enabler. I agree that it is an enabler in situations where it is used to help overcome disabilities and things of that nature; however, technology is used more so by people who are actually avoiding being social. Email is often preferable to a telephone because it creates an additional barrier between ourselves and the "recipient" (aka person).

    A prime example of software in a "social" context is the chatter that accompanies networked video games. This does not form real relationships between people. I heard a teenager recently say that his gaming buddies, who he doesn't even know by name, are like family to him. Technology has helped a whole generation and then some to fail to learn what real relationships are. When a teenager can't distinguish between somebody he's only ever witnessed virtually shoot ze germans and the people who nurtured him before he was able to take care of himself, we have a problem, Houston.

    And it's only getting worse. Now we've begun adding "social" in front of all kinds of new web applications. Anything that lets other users see your profile and the items you post and comment on them is seen as a valid replacement for real human contact.

    There was a line from a movie I saw recently called Crash, where Don Cheadle's character says to his girlfriend, "It's the sense of touch. Any real city you walk, you know. You brush past people, people bump into you. In L.A., nobody touches you. We're always behind this metal and glass. I think we miss that sense of touch so much, that we crash into each other just so we can feel something." The next time we use the word "social" to describe a new type of web application, I think we should give that some thought first.

  5. Check out by Greyfox · · Score: 4, Informative

    The Customize Google plugin. I don't use it to block out adverts and would encourage you not to either, but it is handy for blocking out those obnoxious spammy sites that far too often show up in my google searches.

    It was a bit unintuitive how you add sites to the filter list though -- just cut and paste "http://*.whatever.com/*" into your extensions list and any search results from whatever.com will then be greyed out.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
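
To see what a filter entry like the one in comment 5 would and would not catch, here is a minimal sketch. It assumes the pattern behaves like a shell-style glob and uses Python's standard `fnmatch` module to imitate that; the plugin's real matching rules are not described in the thread and may differ. `BLOCK_PATTERNS` and `is_filtered` are hypothetical names.

```python
from fnmatch import fnmatchcase

# The pattern quoted in the comment, treated here as a shell-style glob
# (an assumption about how the plugin interprets it).
BLOCK_PATTERNS = ["http://*.whatever.com/*"]


def is_filtered(url: str, patterns=BLOCK_PATTERNS) -> bool:
    """Return True if the URL matches any glob pattern in the list."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


if __name__ == "__main__":
    for url in ("http://www.whatever.com/page.html",
                "http://whatever.com/",
                "http://example.org/"):
        print(url, is_filtered(url))
```

Under this glob reading, the pattern requires a dot before `whatever.com`, so the bare domain `http://whatever.com/` is not matched; a second entry without the `*.` prefix would be needed to cover it.
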

  6. Welcome to Slashdot. by ciroknight · · Score: 4, Interesting

    Slashdot is a blog, created in the context of a news site, which we all come to and bitch about things we want out of technology, think is/are cool, and/or hate and want everyone to know why.

    That being said, Google (along with other large search engines) has already taken a stance on blogging, and each engine is actively pursuing its own. For most, this means creating their own blog service, and doing some shifting in their code to make sure blogs don't come out on top. But this isn't an absolute truth.

    If you want these things, and Google doesn't offer them, make your own search engine, and do it better. No, seriously, don't look at me like I'm crazy; there have been over a dozen "major" search engines created after Google, some are only in serious use by geeky populations (AlltheWeb, as far as I can tell, fits this), some by the trendy, some by the "I hate Google"ites, etc. etc. It's as simple as that.

    One reason I think Google has shied away from taking such a hard line on blogs is simply ease of use. Google doesn't want to complicate life with a million more search options, especially ones you can deal with yourself by subtracting out the majorly offensive sites (-livejournal -blogger -blogspot, etc).

    --
    "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
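
The do-it-yourself exclusion mentioned in comment 6 amounts to appending minus-prefixed words to a query. A small sketch follows, assuming the familiar `https://www.google.com/search?q=` URL form; `build_query` and `search_url` are hypothetical helpers and the search terms are made up for illustration.

```python
from urllib.parse import urlencode


def build_query(terms, excluded):
    """Join search terms with minus-prefixed words to exclude."""
    parts = list(terms) + [f"-{word}" for word in excluded]
    return " ".join(parts)


def search_url(terms, excluded, base="https://www.google.com/search"):
    """Build a search URL; the base address is an assumption."""
    return f"{base}?{urlencode({'q': build_query(terms, excluded)})}"


if __name__ == "__main__":
    print(build_query(["wifi", "driver"], ["livejournal", "blogger", "blogspot"]))
    print(search_url(["wifi", "driver"], ["livejournal", "blogger", "blogspot"]))
```

Note that excluding a word this way drops any page that mentions it, not only pages hosted on that service, so it is a blunt filter.
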
    1. Re:Welcome to Slashdot. by The+One+and+Only · · Score: 5, Insightful

      God forbid that one voice be allowed to speak without needing to ask the consent of thousands of others.

      --
      In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199
    2. Re:Welcome to Slashdot. by ciroknight · · Score: 5, Interesting

      Wait, how many Slashdot editors are there? Oh right, not thousands. Not even hundreds.

      Secondly, haven't you ever heard of the Freedom of Speech, as guaranteed to us by the First Amendment in the Bill of Rights of the Constitution of the United States of America? By your comment, I'll assume not.

      Why should we quash out individuality so that one person can get to the content they want better? Why shouldn't we just solve the damned problem, instead of creating more?

      --
      "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
  7. Blogspot by dedazo · · Score: 4, Informative

    Blogspot is fucking overflowing with these fake blogs. Here's one example.

    If you have a few minutes, click on the randomizer button at the top of the screen that reads "Next Blog" a couple of times. I'd be willing to say that at least 2 out of every 10 blogs are spam farms.

    It's just fucking sad.

    --
    Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
  8. Usenet has improved substantially by Animats · · Score: 5, Informative

    Actually, Usenet is doing quite well. The spam battle has been won; there's very little spam in the technical groups. Serious workers in difficult fields are on there. Check out, say, "comp.games.development.programming.algorithms", where the people who write physics engines discuss how to do it. Or "comp.std.c++.moderated", where proposed changes to C++ are discussed. Usenet has far lower advertising content than the Web, where, today, "content" seems to be a little box in the middle of the page, surrounded by blinking ads.

  9. Re:2 years and no one will care by Absentminded-Artist · · Score: 5, Insightful

    I have been coding webpages since March of 1995. I have learned HTML 1.0, 2.0, 3.0, 4.0 and now CSS1.0 and CSS2.0 and... As exciting as all that can be, sometimes I just want to post my thoughts and be done with it. There's nothing wrong with efficiency. Blog sites can be great time savers. I used to have a web journal, wrote entries in my Palm Pilot, hotsynced the data to my Mac and ftp'd it onto my server using Applescript - all the while snorting at all the newbies using blog sites. Then I decided I valued my time better. I opened up a blog in January of this year (http://thesplinteredmind.blogspot.com/) and have had a blast. I post once a week.

    Now, my blog isn't going to be popular. I cover mostly neurological problems and how to deal with them. But I've had some fascinating discussions with complete strangers because of my blog and I'll continue blogging into the foreseeable future. Because of Google many people find my blog despite it being a small fish in a big and noisy blog sea. Google is a great tool and I'm glad they index blogs. Now, I'm as upset as the next guy about spam blogs, but "crap" blogs are relative. You may read my blog and find it lame. Others, including myself, would disagree with you. But if you don't find the subjects I write about interesting or valuable, so what?

    Slashdot cracks me up sometimes. What is it to some of you guys if somebody wants to blather on and on about their breakfast or their boyfriend? If the site is a bore move on, but you could tell that from the Google search, right? Seriously, I haven't found many blogs that come up in my searches that aren't related to my searches. Not as much as parked domain sites and adsense whores at any rate.

    Not all bloggers can't be bothered to code a web page. In fact, because I do code I'm able to personalize my site. Every month I tinker and tinker with the code when I find some time. Blogging may be an exercise in vanity, but then so is hosting your own website. In fact, the whole web publishing scene is about personal expression, and what's wrong with that?

    --
    The Splintered Mind - Overcoming