Slashdot Mirror


95% of User-Generated Content Is Bogus

coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

2 of 192 comments

  1. It might be true, but it's also irrelevant. by onion2k · Score: 5, Insightful

    95% of user-generated posts on Web sites are spam or malicious.

    The fact is that there are millions of old blogs, unused forums, ancient guestbooks, etc. that are easy to spam automatically. While it might very well be true that 95% of comments on the internet are spam of some sort, they're probably read by a tiny fraction of internet users. People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

    Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.

    1. Re:It might be true, but it's also irrelevant. by CAIMLAS · Score: 5, Insightful

      A lot of forum software works well, until it gets "behind the curve", and then the site maintainer pulls the site*.

      By "behind the curve" I mean any of the following can/does happen:
      1) Forum software gets out of date and the maintainer fails to upgrade because of local modifications or similar, resulting in spam.
      2) Forum software gets popular without having a good security model and/or update cycle, resulting in exploits.
      3) The site gets inundated with comments awaiting approval, and the forum (or blog) gets ignored or set to auto-allow out of frustration.

      * By "pulls the site" I mean "abandons it but doesn't take it down". That's typically the end result.

      It's a lot of work to maintain your own forum and/or blog: managing spam can and will take hours out of your day if you don't have a good automated and/or text-based way to deal with it; web interfaces are clumsy.

      Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.

      Yet, unlike most of those cars, the actual blog content is not necessarily useless. I have seen quite a few abandoned blogs and/or forums with 3-to-10-year-old information on them that is by no means useless; it's just getting buried.

      Digital archeologists of the future will probably have to figure out an automated way to prune back the spam to find the actual Internet, the way things are going.

      Consider: if spam accounts for 95% of all user-generated content, and user-generated content is itself a non-trivial percentage of all content online (believable), think how much bandwidth these spammers waste. (Thankfully, I suspect most of the 'user generated content spam' doesn't show up in the first couple of pages of search results, so it's not likely to be read regularly, unless it's more heavily seeded on topics that ordinary folks search for.)
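
A rough back-of-the-envelope sketch of that estimate, for anyone who wants to plug in their own numbers. The 95% figure is the one quoted in the summary; the 30% share of user-generated content is a made-up placeholder, not a number from the report, and the function name is purely illustrative.

```python
def spam_share_of_all_content(spam_share_of_ugc, ugc_share_of_all):
    """Fraction of all online content that is user-generated spam.

    Both arguments are fractions between 0 and 1.
    """
    return spam_share_of_ugc * ugc_share_of_all


# 0.95 is the claim from the summary; 0.30 is a hypothetical guess.
share = spam_share_of_all_content(0.95, 0.30)
print(f"Spam as a share of all content: {share:.1%}")
```

This counts stored content, not bandwidth actually served: if hardly anyone visits the abandoned sites, the traffic share would be much smaller than the content share.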

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers