Slashdot Mirror


95% of User-Generated Content Is Bogus

coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

25 of 192 comments

  1. This just in by Shadow of Eternity · · Score: 4, Funny

    Animals shit in ~95% of their habitat...

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
    1. Re:This just in by Smegly · · Score: 5, Funny

      a staggering 95% of User Generated Content is... ...spam. Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

      Give up your Name, rank, email... so we can enlighten you with valuable information from our partners.

    2. Re:This just in by timeOday · · Score: 4, Informative
      This has almost nothing to do with websites like Wikipedia, which people actually look at. Spammers create huge sets of keyword-laden wikis and other web pages, which all link to each other, for the purpose of fooling search engines that use PageRank and similar algorithms. To search engines, it's hard to differentiate this from a popular site with lots of users. But when you see these pages you know it immediately, like spam in your inbox.

      It is no different than domain names. Type a random sequence of 4 characters followed by .com, and the vast majority of the time you will get some fairly innocuous spam site, e.g. dneo.com (picked at random), with no real content.

      But it doesn't interfere much with most people's use of the web.
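
      The link-farm effect described above can be sketched with a toy PageRank. This is only an illustration under stated assumptions: the page names, the 20-page farm, and the damping factor of 0.85 are all made up, and real search engines use far more signals than this plain power iteration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page with no outlinks spreads its rank over every page.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "honest" web: three real sites, plus a page nobody links to.
honest = {
    "news": ["blog", "wiki"],
    "blog": ["news", "wiki"],
    "wiki": ["news"],
    "spam_target": [],
}

# Same web plus a hypothetical link farm: 20 generated pages that all
# link to the target, which links back to every one of them.
farm_pages = [f"farm{i}" for i in range(20)]
spammed = dict(honest)
spammed["spam_target"] = farm_pages
for farm_page in farm_pages:
    spammed[farm_page] = ["spam_target"]

honest_rank = pagerank(honest)
spammed_rank = pagerank(spammed)

print("target rank without farm:", round(honest_rank["spam_target"], 3))
print("target rank with farm:   ", round(spammed_rank["spam_target"], 3))
print("news rank with farm:     ", round(spammed_rank["news"], 3))
```

      In this toy graph the farm lifts the target above the real sites even though no real site links to it, which is why a purely link-based ranker has trouble telling it apart from a popular page.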

    3. Re:This just in by VoltageX · · Score: 5, Informative

      Sorry to hijack this, but http://securitylabs.websense.com/content/Assets/WSL_ReportQ3Q4FNL.PDF seems to be the direct link to the paper.

      --
      "Anonymous could not immediately be reached for further comment." - International Business Times
  2. Want to get ripped? by Anonymous Coward · · Score: 5, Funny

    I got ripped in 2 weeks. learn how with secret juice formula.

    1. Re:Want to get ripped? by Pharmboy · · Score: 4, Funny

      If I want real juice, I just drink Florida Orange Juice®. It's not just for breakfast anymore!

      --
      Tequila: It's not just for breakfast anymore!
  3. Let me be the first to post that this is BS. by nicknamenotavailable · · Score: 5, Funny

    That is so untrue. There is value in what I write.

  4. This is slashdot by Junior J. Junior III · · Score: 4, Funny

    We know.

    --
    You see? You see? Your stupid minds! Stupid! Stupid!
  5. It might be true, but it's also irrelevent. by onion2k · · Score: 5, Insightful

    95% of user-generated posts on Web sites are spam or malicious.

    The fact is that there are millions of old blogs, unused forums, ancient guestbooks, etc that are easy to spam automatically. While it might very well be true that 95% of comments on the internet are spam of some sort, they're probably read by a tiny fraction of internet users. People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

    Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.
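
    The same point as arithmetic: the spam share counted per comment and the spam share weighted by what people actually read can be very different numbers. A minimal sketch, assuming entirely invented figures for two hypothetical groups of sites and assuming views are spread evenly over the comments within each group:

```python
from dataclasses import dataclass


@dataclass
class SiteGroup:
    name: str
    comments: int      # comments stored on these sites
    spam_share: float  # fraction of those comments that are spam
    views: int         # comment views by human readers


def spam_fraction_by_count(groups):
    """Share of all stored comments that are spam."""
    total = sum(g.comments for g in groups)
    return sum(g.comments * g.spam_share for g in groups) / total


def spam_fraction_by_views(groups):
    """Share of comment views that land on spam."""
    total = sum(g.views for g in groups)
    return sum(g.views * g.spam_share for g in groups) / total


# Invented numbers, for illustration only.
groups = [
    SiteGroup("big moderated sites", 5_000_000, 0.02, 990_000_000),
    SiteGroup("abandoned blogs and forums", 95_000_000, 0.999, 10_000_000),
]

by_count = spam_fraction_by_count(groups)
by_views = spam_fraction_by_views(groups)
print(f"spam by comment count: {by_count:.1%}")
print(f"spam by views:         {by_views:.1%}")
```

    With these made-up inputs roughly 95% of stored comments are spam, yet only about 3% of what readers see is.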

    1. Re:It might be true, but it's also irrelevent. by mwvdlee · · Score: 5, Funny

      95% of humans are over 100 years old. Most of them are dead.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    2. Re:It might be true, but it's also irrelevent. by Anonymous Coward · · Score: 5, Funny

      That should be on Fox News.

      "Number of dead people reaches all time high!"

    3. Re:It might be true, but it's also irrelevent. by Trepidity · · Score: 5, Informative

      It seems that at least as well as anyone can estimate, the current population really is about 5% of the total humans who've ever lived.

    4. Re:It might be true, but it's also irrelevent. by CAIMLAS · · Score: 5, Insightful

      A lot of forum software works well, until it gets "behind the curve", and then the site maintainer pulls the site*.

      By "behind the curve" I mean any of the following can/does happen:
      1) Forum software gets out of date and user fails to upgrade due to modifications or similar, resulting in spam.
      2) Forum software gets popular without having a good security model and/or update cycle, resulting in exploits.
      3) Gets inundated with comment approvals and the forum (or blog) gets ignored or set to auto-allow out of frustration.

      * By "pulls the site" I mean "abandons it but doesn't take it down". That's typically the end result.

      It's a lot of work to maintain your own forum and/or blog: managing spam can and will take hours+ from your day if you've not got a good automated and/or textual way to deal with it: web interfaces are clumsy.

      Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.

      Yet, unlike most of those cars, the actual blog content is not necessarily useless. I have seen quite a few abandoned blogs and/or forums which have 3-10 year old information on them which is by no means useless; it's just getting buried.

      Digital archeologists of the future will probably have to figure out an automated way to prune back the spam to find the actual Internet, the way things are going.

      Consider: if spam accounts for 95% of all user-generated content, and said user-generated content is actually a non-trivial percentage of all actual content online (believable), consider how much bandwidth gets wasted by these spammers. (Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results so it's not going to likely be perused with regularity - unless it's more heavily seeded on topics common folks search.)

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    5. Re:It might be true, but it's also irrelevent. by Anonymous Coward · · Score: 5, Funny

      Well, then MSNBC would just rip into Fox for implying these unfortunate individuals should no longer vote. CNN would chime in and blame the lack of universal health care for the deaths.

    6. Re:It might be true, but it's also irrelevent. by Kugrian · · Score: 4, Interesting

      How much of it is user generated content that's copied from one site onto a zillion others?

    7. Re:It might be true, but it's also irrelevent. by Hognoxious · · Score: 4, Funny

      People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

      And when they want a change from that, they come here.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  6. Re:Nothing to see here. Move along. (Bad summary) by PCM2 · · Score: 4, Insightful

    And in addition, the report itself doesn't even explain the result. It's a bullet point at the beginning of the report, but there's no explanation or analysis.

    --
    Breakfast served all day!
  7. Was going to RTFA but it's probably bogus by syousef · · Score: 5, Funny

    ...95% probability actually. So I didn't bother.

    --
    These posts express my own personal views, not those of my employer
  8. just a cheap shot by Nyder · · Score: 4, Funny

    I guess that goes hand in hand with 95% of kdawson's submissions being crap and not worth the time.

    --
    Be seeing you...
  9. The message... by Anonymous Coward · · Score: 4, Insightful

    The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.

    Well, I don't care if 99% of user-generated content is crap; people need to be free to create it, because some individual in the other 1% may just come up with the cure for cancer, and despite whatever it does to Big Pharma's profits, everyone needs to be able to hear about it.

  10. So Sturgeon was right by Aussie · · Score: 5, Interesting

    "Ninety percent of everything is crud."

    http://en.wikipedia.org/wiki/Sturgeon's_Law

    1. Re:So Sturgeon was right by Faylone · · Score: 4, Funny

      Sturgeon just had low standards.

  11. Calling spam email UGC is... disingenuous. by argent · · Score: 4, Insightful

    I would say that 95% of email is commercial in nature, and not "user generated content". To me "UGC" is something that people who are actually active users (consumers as well as creators) of a service generate... not something injected into the service from outside by predators.

  12. Replace "UGC" with "Usenet" by Antique Geekmeister · · Score: 4, Insightful

    We've seen this before, with Usenet, BBSes, MUDs, and email. The advertisers, and the trolls, find it easy to spew their material across many thousands of targets, and get enough money or gratification from doing so that it funds their efforts. It doesn't even have to make money: they just have to believe that it _can_ make money, and the professionals will simply continue.

    Whatever would make anyone think that "User Generated Content" forums would be any different?

  13. 95% of the story is bogus by gmuslera · · Score: 4, Informative
    The original article says that they scan 40 million sites and 10 million emails each hour, and they are referring to this report (which also links to the full info and a video of the presentation of that info).

    How they get their "sample" matters a lot: honeypots, honeyclients, reputation systems and "advanced grid computing systems" (whatever those are). What is feeding information into that sample? Not old sites with legitimate content that have been sitting around for years, but in good part spammers, botnets, and people who want your PC to become part of one. And email is already known to be 95% spam. The sample is just too rigged to say anything about what is really on the internet or what you have some chance of seeing.
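
    How much a rigged sample can move the headline number is easy to show. A rough sketch, assuming a made-up true spam share and made-up capture rates (the chance that a given spam or legitimate item ends up in the sample); none of these figures come from the report:

```python
def observed_spam_share(true_share, spam_capture, legit_capture):
    """Spam share seen in a sample that captures spam and legitimate
    items at different rates."""
    spam_seen = true_share * spam_capture
    legit_seen = (1.0 - true_share) * legit_capture
    return spam_seen / (spam_seen + legit_seen)


# Hypothetical inputs, for illustration only.
TRUE_SHARE = 0.30     # assumed real spam share of all content
SPAM_CAPTURE = 0.90   # honeypots attract most of the spam
LEGIT_CAPTURE = 0.02  # but see very little legitimate content

observed = observed_spam_share(TRUE_SHARE, SPAM_CAPTURE, LEGIT_CAPTURE)
print(f"true share: {TRUE_SHARE:.0%}, observed share: {observed:.1%}")
```

    Under these assumptions a true share of 30% shows up as roughly 95% in the sample. When both capture rates are equal the observed share matches the true one, so the distortion comes entirely from how the sample is collected.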