Slashdot Mirror


The Ham and Spam of Weblogs

An anonymous reader submits "Will the blogosphere become just as spammy as Usenet? There may be over 10M weblogs out there, but most of them seem to be fake spam blogs created to manipulate the search engines. Scott Johnson, CTO at Feedster, complained that 'at times we see upwards of 90% of the traffic from Blogspot being spam,' and the problem is likely to only get worse. Can blog search engines like Technorati, Feedster, and PubSub filter the signal from the torrent of noise? Or will we have to seek new approaches such as the social filtering used by Del.icio.us or collaborative filtering used by Findory to separate the ham from the spam?"

18 of 192 comments

  1. To me (most) blogs ARE spam by Anonymous Coward · · Score: 4, Informative

    I wish Google had an option to exclude blogs from my search. Considering many blogs use b2evolution, phpBB, or whatever, Google could easily determine what IS a blog and what IS NOT and filter it accordingly. Google IMHO would be a much better place if I could exclude blogs and those stupid parked domain search sites from my queries.

    I'm not trying to be flamebait; it would be a nice option though.

    1. Re:To me (most) blogs ARE spam by aftk2 · · Score: 4, Insightful

      I'm just curious - exactly _what_ would you include, if not blogs? Certainly, I can understand not including those parked domain search sites: they're gaming the system, completely unhelpful, and filled with bogus content.

      But blogs? Sure, much of the content is poorly written, or not applicable to what most people - or, well, rather, 90% of a given population - are interested in. But in searches especially, doesn't it make sense to list results that include those normal people so interested in a particular topic that they blog about it?

      For example, blogs can be very helpful when facing computer troubles, provided you're dealing with bloggers who know how to write for Google. This is a good example. I mean, this surely has to be more worthy of inclusion in Google than the lion's share of those web-based bulletin boards that get indexed - you know the ones, with the "Next in thread" and the replies that are typically out of date, or altogether unrelated to your original query.

      Everyone's quick to dismiss things lately. Don't dismiss blogs, just because sometimes their content seems insular and not applicable to what you've searched for. That's a problem with the search engines, not the sites they index.

      --
      concrete5: a cms made for marketing, but strong enough for geeks.
    2. Re:To me (most) blogs ARE spam by bigman2003 · · Score: 3, Insightful

      The blogs don't bother me nearly as much as "those stupid parked domain search sites."

      I don't know how many times I have done a Google search, and the 3rd or 4th result comes back with my exact phrase..yay!

      Then I go to some stupid, totally lame site advertising domain names, or listing other sites, or something like that.

      I never have figured out how they get listed in Google the way they do though- because my search phrase is not listed on the page...so evidently they know something I don't.

      --
      No reason to lie.
  2. Human validation by SamMichaels · · Score: 4, Funny

    The guy makes a good point...human validation via captcha. If you're going to spend 10 minutes complaining, whining, bragging and/or loathing about something then you can spend 3 seconds typing in the word "uNFsaQ" to prove you're human.

    If it takes you less than 10 minutes to write in your dear diary--I mean blog--then it's probably a 1 liner to the effect of "i think she likez me omglolbbq!!!" and you need to get off my internet.

    Problem solved. Next?

  3. Down with neologisms by Urusai · · Score: 3, Funny

    "blogosphere"? Considering that blogs are probably the dumbest form of communication possible (a linear log of rambling bullshit) I can only hope that the Blogosphere is destroyed by the Vogon Constructor Fleet to make way for a colonic bypass.

  4. Shouldn't be too hard to filter by XNormal · · Score: 4, Interesting

    With email spam filtering you have to consider each email separately. A blog has a persistent identity and reputation. In theory, this should make it easier to filter blog spam than email spam. One result of this type of filtering is that it will penalize new blogs in search results, both spammy and real.

    Blog comment spam will remain a problem, of course.

    --
    Stop worrying about the risks of nuclear power and start worrying about the risks of not using nuclear power.
  5. Software is Not Social by lux55 · · Score: 5, Insightful

    I just wanted to point out that so-called "social software" is not social. Person-to-person communication through computers is mediated and indirect. Technology is a barrier to communication as much as it is an enabler. I agree that it is an enabler in situations where it is used to help overcome disabilities and things of that nature, however technology is used moreso by people who are actually avoiding being social. Email is often preferable to a telephone because it creates an additional barrier between ourselves and the "recipient" (aka person).

    A prime example of software in a "social" context is the chatter that accompanies networked video games. This does not form real relationships between people. I heard a teenager recently say that his gaming buddies, who he doesn't even know by name, are like family to him. Technology has helped a whole generation and then some to fail to learn what real relationships are. When a teenager can't distinguish between somebody he's only ever witnessed virtually shoot ze germans and the people who nurtured him before he was able to take care of himself, we have a problem Houston.

    And it's only getting worse. Now we've begun adding "social" in front of all kinds of new web applications. Anything that lets other users see your profile and the items you post and comment on them is seen as a valid replacement for real human contact.

    There was a line from a movie I saw recently called Crash, where Don Cheadle's character says to his girlfriend "It's the sense of touch. Any real city you walk, you know. You brush past people, people bump into you. In L.A., nobody touches you. We're always behind this metal and glass. I think we miss that sense of touch so much, that we crash into each other just so we can feel something." The next time we use the word "social" to describe a new type of web application, I think we should give that some thought first.

    1. Re:Software is Not Social by aftk2 · · Score: 3, Insightful
      You raise valid points, but what would you have to say to these people?
      The passing of a forum god.
      For the people who are mourning the loss described in the link, is their grief less meaningful than that of those who knew the person directly, face-to-face? Perhaps, but perhaps not: I know a bunch of people, some of whom I see regularly, but with whom I haven't had as meaningful a relationship as some people I've spoken to online, but have never met in person. Is there a qualitative difference between the two types of social interaction? Probably - but I think it's too easy to say "the way we always used to do things is right" and "This is new, and less personal, and hence, wrong."
      --
      concrete5: a cms made for marketing, but strong enough for geeks.
    2. Re:Software is Not Social by lux55 · · Score: 3, Interesting

      "Is there a qualitative difference between the two types of social interaction?"

      While you did answer your own question ("Probably..."), I do like your response. You raise good questions. I definitely don't believe that only face-to-face communication is real social interaction, but I could have been clearer on this point. I'm not an absolutist, and I'm not pining for the dark ages or anything like that ;) If I didn't believe in communication through mediation, I wouldn't be here on /. right now.

      Anyway, my real point is that these online substitutes are serving more and more people as substitutes for the real thing, to the point where young'uns are being brought up not knowing that there is a difference. Instead of getting together (in cases that are actually able to) they go online and "chat". Mediated communication inherently encourages more mediation because we as human beings form habits. And while mediation can still produce relationships (I can't deny that), they are less rich than direct unmediated ones. And technology is inherently a mediator, no getting around it (pun slightly intended ;).

      To be perfectly honest though, most face-to-face relationships are just as mediated as those maintained through technology. Real-world mediators include our political and religious views, our egos, etc. which inhibit our ability to relate directly and honestly with one another just as much as the inability to see facial expressions on a forum.

      I definitely use technology where appropriate to augment relationships at distances. I only see my family twice a year, but I keep in touch via telephone all the time, and I post photos to flickr for them to see. My sisters email me once in a while, which is great too. These things definitely have value, but they are no substitute for being able to see and hug my family. They simply help make the time between visits bearable.

      Cheers,

      Lux

  6. Check out by Greyfox · · Score: 4, Informative
    The Customize Google plugin. I don't use it to block out adverts and would encourage you not to either, but it is handy for blocking out those obnoxious spammy sites that far too often show up in my google searches.

    It was a bit unintuitive how you add sites to the filter list though -- just cut and paste "http://*.whatever.com/*" into your extensions list and any search results from whatever.com will then be greyed out.
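
    As a rough illustration of how a wildcard entry like that could match search results: this is only a sketch, assuming ordinary glob-style matching. The plugin's real matching rules aren't described here, and the names FILTER_PATTERNS and is_filtered are made up for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical filter list; the pattern is the one quoted in the comment above.
FILTER_PATTERNS = ["http://*.whatever.com/*"]


def is_filtered(url, patterns=FILTER_PATTERNS):
    """Return True if the URL matches any glob-style pattern (case-sensitive)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


# Made-up search results, purely for illustration.
results = [
    "http://www.whatever.com/landing",
    "http://example.org/article",
]

# Keep only the results that no filter pattern matches.
kept = [url for url in results if not is_filtered(url)]
```

    Under plain glob rules this pattern requires a subdomain, so a bare http://whatever.com/ would need its own entry.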

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  7. Welcome to Slashdot. by ciroknight · · Score: 4, Interesting

    Slashdot is a blog, created in the context of a news site, which we all come to and bitch about things we want out of technology, think is/are cool, and/or hate and want everyone to know why.

    That being said, Google (along with other large search engines) have already taken stances on blogging, and are actively pursuing their individual stances. For most, this is creating their own blog service, and doing some shifting in their code to make sure blogs don't come out on top. But this isn't an absolute truth.

    If you want these things, and Google doesn't offer them, make your own search engine, and do it better. No, seriously, don't look at me like I'm crazy; there have been over a dozen "major" search engines created after Google, some are only in serious use by geeky populations (AlltheWeb, as far as I can tell, fits this), some by the trendy, some by the "I hate Google"ites, etc. etc. It's as simple as that.

    One reason I think Google's strayed from taking such a hard line on blogs is simply out of ease of use. Google doesn't want to complicate life with a million more search options, especially ones you can deal with yourself by subtracting out the majorly offensive sites (-livejournal -blogger -blogspot, etc).
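
    For what it's worth, the subtraction trick is just string assembly. A minimal sketch, assuming the engine treats a leading minus as "exclude this term" (build_query is a hypothetical helper, not part of any search API):

```python
def build_query(terms, excluded):
    """Join search terms and append each excluded word with a leading minus."""
    parts = list(terms) + ["-" + word for word in excluded]
    return " ".join(parts)


# Example search terms are made up; the excluded words are the ones listed above.
query = build_query(["usenet", "spam"], ["livejournal", "blogger", "blogspot"])
```

    This drops pages that mention those words at all; Google also accepts a -site:example.com form to exclude a whole domain.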

    --
    "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
    1. Re:Welcome to Slashdot. by The One and Only · · Score: 5, Insightful

      God forbid that one voice be allowed to speak without needing to ask the consent of thousands of others.

      --
      In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199
    2. Re:Welcome to Slashdot. by ciroknight · · Score: 5, Interesting

      Wait, how many Slashdot editors are there? Oh right, not thousands. Not even hundreds.

      Secondly, haven't you ever heard of the Freedom of Speech, as guaranteed to us by the Second Amendment in the Bill of Rights of the Constitution of the United States of America? By your comment, I'll assume not.

      Why should we quash out individuality so that one person can get to the content they want better? Why shouldn't we just solve the damned problem, instead of creating more?

      --
      "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
    3. Re:Welcome to Slashdot. by ciroknight · · Score: 3, Insightful

      Take it up with them personally, then. Oh, or you could just use a search engine that actively removes blogs from their indices. Or you could make your own and remove them personally. Or you could subtract out the sites you don't want in existing search engines.

      As I can see it, choice is on your side. They have the choice of posting or not posting. You have the choice of how you want to deal with it.

      --
      "Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush
    4. Re:Welcome to Slashdot. by gbulmash · · Score: 3, Informative
      Secondly, haven't you ever heard of the Freedom of Speech, as guaranteed to us by the Second Amendment in the Bill of Rights of the Constitution of the United States of America? By your comment, I'll assume not.

      If what you say above were true, I'd be careful where you point that mouth. The safety is off.

      The Second Amendment is our right to guns, not our right to free speech. Free speech is in the First Amendment. So

      And be very careful. All the First Amendment guarantees is "Congress shall make no law..." abridging freedom of speech.

      If you want to go to a public park and preach religion or recite your political manifesto, the First Amendment guarantees your right to. But it's not absolute.

      If you want to preach/recite on my front lawn, my property rights prevail and I can physically throw you off my property if you refuse to leave voluntarily. If you want to preach/recite at midnight and you're preaching/reciting too loud, city noise ordinances prevail, and the cops can arrest or ticket you if you refuse to quiet down or move along.

      Slashdot is required to allow you a certain amount of leeway in exchange for safe harbor protections covering public forums, but that is a matter of them trying to avoid getting sued over any libelous/defamatory content in your posts, not any First Amendment guarantee they are obligated to provide you. And if you go beyond that leeway, they can ban you from posting and erase your posts.

      So if you want to argue in favor of blog spam, find another argument. The First Amendment has nothing to do with whether Google and other blog services should voluntarily clean up their act and put roadblocks/barriers in place to stem the flow of blog spam.

      - Greg

  8. Blogspot by dedazo · · Score: 4, Informative
    Blogspot is fucking overflowing with these fake blogs. Here's one example.

    If you have a few minutes, click on the randomizer button at the top of the screen that reads "Next Blog" a couple of times. I'd be willing to say that at least 2 out of every 10 blogs is a spam farm.

    It's just fucking sad.

    --
    Web2.0: I love when people Flickr my cuil and digg my boingboing until my google is reddit and I start to yahoo
  9. Usenet has improved substantially by Animats · · Score: 5, Informative

    Actually, Usenet is doing quite well. The spam battle has been won; there's very little spam in the technical groups. Serious workers in difficult fields are on there. Check out, say, "comp.games.development.programming.algorithms", where the people who write physics engines discuss how to do it. Or "comp.std.c++.moderated", where proposed changes to C++ are discussed. Usenet has far lower advertising content than the Web, where, today, "content" seems to be a little box in the middle of the page, surrounded by blinking ads.

  10. Re:2 years and no one will care by Absentminded-Artist · · Score: 5, Insightful

    I have been coding webpages since March of 1995. I have learned HTML 1.0, 2.0, 3.0, 4.0 and now CSS1.0 and CSS2.0 and... As exciting as all that can be, sometimes I just want to post my thoughts and be done with it. There's nothing wrong with efficiency. Blog sites can be great time savers. I used to have a web journal, wrote entries in my Palm Pilot, hotsynced the data to my Mac and ftp'd it onto my server using Applescript - all the while snorting at all the newbies using blog sites. Then I decided I valued my time better. I opened up a blog in January of this year (http://thesplinteredmind.blogspot.com/) and have had a blast. I post once a week.

    Now, my blog isn't going to be popular. I cover mostly neurological problems and how to deal with them. But I've had some fascinating discussions with complete strangers because of my blog and I'll continue blogging into the foreseeable future. Because of Google many people find my blog despite it being a small fish in a big and noisy blog sea. Google is a great tool and I'm glad they index blogs. Now, I'm as upset as the next guy about spam blogs, but "crap" blogs are relative. You may read my blog and find it lame. Others, including myself, would disagree with you. But if you don't find the subjects I write about interesting or valuable, so what?

    Slashdot cracks me up sometimes. What is it to some of you guys if somebody wants to blather on and on about their breakfast or their boyfriend? If the site is a bore move on, but you could tell that from the Google search, right? Seriously, I haven't found many blogs that come up in my searches that aren't related to my searches. Not as much as parked domain sites and adsense whores at any rate.

    Not all bloggers can't be bothered to code a web page. In fact, because I do code I'm able to personalize my site. Every month I tinker and tinker with the code when I find some time. Blogging may be an exercise in vanity, but then so is hosting your own website. In fact, the whole web publishing scene is about personal expression, and what's wrong with that?

    --
    The Splintered Mind - Overcoming