Slashdot Mirror


95% of User-Generated Content Is Bogus

coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

61 of 192 comments

  1. This just in by Shadow+of+Eternity · · Score: 4, Funny

    Animals shit in ~95% of their habitat...

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
    1. Re:This just in by Smegly · · Score: 5, Funny

      a staggering 95% of User Generated Content is... ...spam. Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

      Give up your Name, rank, email... so we can enlighten you with valuable information from our partners.

    2. Re:This just in by KGIII · · Score: 2, Interesting

      Anonymity comes into play I suspect. I'm not a psychologist though. It makes me wonder if there will be any attempt (or anyone with the compute power and gumption is more accurate I suppose) to fact check Wikipedia. I'm rather curious as to how that will turn out if it is done in a non-biased and total in situ way. I imagine it would take a great deal of work and then there are people who will lay claim as to it being constantly changed but the point that I'm considering is what is the accuracy level at a particular moment in time. I'm not interested in how accurate it may be in the future, just the now.

      I don't actually hold any opinion on its accuracy and I refer to it for my own needs quite frequently. I'm mostly curious as it is one of the largest sites with user generated content and it holds an authoritative position in some circles.

      --
      "So long and thanks for all the fish."
    3. Re:This just in by gumbi+west · · Score: 2, Informative

      Nature did a study and found Wikipedia was slightly less reliable than Britannica. The editors of Britannica objected to the methods, and I'm not sure I like them either, but I think it was an honest attempt. I think all of the articles were science articles and this is from 2005, so it is not exactly what you were asking for (it's not 2010).

    4. Re:This just in by timeOday · · Score: 4, Informative
      This has almost nothing to do with websites like Wikipedia, which people actually look at. Spammers create huge sets of keyword-laden wikis and other web pages, which all link to each other, for the purpose of fooling search engines that use PageRank and similar algorithms. To search engines, it's hard to differentiate this from a popular site with lots of users. But when you see these pages you know it immediately, like spam in your inbox.

      It is no different than domain names. Type a random sequence of 4 characters .com, and the vast majority of times you will get some fairly innocuous spam site, e.g. dneo.com (picked at random), with no real content.

      But it doesn't interfere much with most people's use of the web.
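To make the link-farm point concrete, here is a minimal sketch of a PageRank-style power iteration on a toy graph. The page names, the farm size of 20, and the damping factor of 0.85 are all assumptions picked for illustration; no real search engine ranks pages this simply.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: two ordinary sites that link to each other,
# plus one spam page that nobody links to.
honest = {"a": ["b"], "b": ["a"], "spam": ["a"]}

# Same web, plus a farm of 20 generated pages that all link to the spam page.
farmed = dict(honest)
farmed.update({f"farm{i}": ["spam"] for i in range(20)})

before = pagerank(honest)["spam"]
after = pagerank(farmed)["spam"]
print(f"spam page rank without farm: {before:.3f}, with farm: {after:.3f}")
```

In this toy setup the farm pages are worthless individually, but their combined links still lift the spam page's score, which is roughly the effect described above.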

    5. Re:This just in by ChipMonk · · Score: 2, Informative

      Randy Pausch, after writing for the World Book Encyclopedia, declared that he had no problem with Wikipedia's quality controls.

      But don't watch his Last Lecture for just that...

    6. Re:This just in by spazdor · · Score: 3, Funny

      you mean, fill in their User Generated statistics maliciously?

      --
      DRM: Terminator crops for your mind!
    7. Re:This just in by justin12345 · · Score: 2, Informative

      I seem to remember that a while back someone (as they say on Fark.com, I'm too drunk to look it up) did a comparison of Encyclopedia Britannica to Wikipedia. Their conclusions were based on a random sampling of 500 topics, with the wiki compared to the Brit article of the same subject. The conclusion was that Britannica contained slightly fewer errors per entry, but significantly less data per entry as well. The study didn't address the issue of Wikipedia's comparatively massive number of entries, and it didn't address the fact that a large number of the wiki articles are about topics Britannica would be foolish to waste the paper to print.

      --
      Cool art gallery, if you're into that sort of thing.
    8. Re:This just in by PaganRitual · · Score: 2, Informative

      I think you've slightly missed the point. When they say bogus they don't mean the content on a site like Wikipedia, although that site provides a useful example to explain my point. Try to go to Wikipedia, except do a typo.

      http://www.wikapedia.org/
      http://www.wikipeedia.org/
      http://www.wickipedia.org/
      http://www.wikepedia.org/

      I imagine this is likely to be what they're talking about when they say bogus or a scam. Take any of your favourite websites and slightly misspell the URL. Then extrapolate out over everyone's favourite, popular websites. Then realise that there are probably dozens of variations for each one.
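A rough sketch of how many near-miss spellings a single name has, assuming a "typo" means exactly one dropped, replaced, swapped, or inserted lowercase letter. The function name `typo_variants` is made up for this example, and real typosquatters presumably use smarter lists (keyboard neighbours, other TLDs, and so on).

```python
import string


def typo_variants(name, alphabet=string.ascii_lowercase):
    """Return every string exactly one edit away from name."""
    variants = set()
    for i in range(len(name) + 1):
        head, tail = name[:i], name[i:]
        if tail:
            variants.add(head + tail[1:])  # drop one character
            for c in alphabet:
                variants.add(head + c + tail[1:])  # replace one character
        if len(tail) > 1:
            variants.add(head + tail[1] + tail[0] + tail[2:])  # swap neighbours
        for c in alphabet:
            variants.add(head + c + tail)  # insert one character
    variants.discard(name)  # the correct spelling is not a typo
    return variants


candidates = typo_variants("wikipedia")
print(len(candidates), "one-edit variants of 'wikipedia'")
```

All four misspellings listed above fall out of this one-edit set, and that is before counting two-edit mistakes or other top-level domains.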

    9. Re:This just in by VoltageX · · Score: 5, Informative

      Sorry to hijack this, but http://securitylabs.websense.com/content/Assets/WSL_ReportQ3Q4FNL.PDF seems to be the direct link to the paper.

      --
      "Anonymous could not immediately be reached for further comment." - International Business Times
    10. Re:This just in by cgenman · · Score: 2, Interesting

      I'm not surprised. Wikipedia is great for niche articles like finding out what happened to Star Trek, The Experience. Such niche information wouldn't be viable for Britannica to cover, but anyone with an interest can put up an article about it. If you want real articles on things like science, DON'T GO TO AN ENCYCLOPEDIA. They're about as good at teaching you usable science as they are teaching you how to play the flute.

    11. Re:This just in by A+Big+Gnu+Thrush · · Score: 2, Funny

      You have to average those two numbers to get the 95% figure. Don't be so lazy next time.

  2. Want to get ripped? by Anonymous Coward · · Score: 5, Funny

    I got ripped in 2 weeks. learn how with secret juice formula.

    1. Re:Want to get ripped? by sopssa · · Score: 3, Funny

      Speaking of juice, there's nothing better than a cold glass of Fanta !

    2. Re:Want to get ripped? by maxume · · Score: 3, Funny

      If I wanted juice in my soda, I'd steal it from Mark McGwire.

      --
      Nerd rage is the funniest rage.
    3. Re:Want to get ripped? by Pharmboy · · Score: 4, Funny

      If I want real juice, I just drink Florida Orange Juice®. It's not just for breakfast anymore!

      --
      Tequila: It's not just for breakfast anymore!
  3. Let me be the first to post that this is BS. by nicknamenotavailable · · Score: 5, Funny

    That is so untrue. There is value in what I write.

    1. Re:Let me be the first to post that this is BS. by newdsfornerds · · Score: 2, Interesting

      Pepsi supports Israeli fascism while depleting your precious bodily fluids. And Snapple kills Afro-Americans seven different ways. http://www.snopes.com/business/alliance/snapple.asp

      --
      Damping absorbs vibrations. Dampening is caused by moisture.
  4. This is slashdot by Junior+J.+Junior+III · · Score: 4, Funny

    We know.

    --
    You see? You see? Your stupid minds! Stupid! Stupid!
    1. Re:This is slashdot by Dilligent · · Score: 3, Insightful

      +5 Insightful, not Funny, nope... Insightful. Only on Slashdot could such a thing happen. That's part of the reason I love it as much as I do. Oh, and while you're here: I'm a prince from the far lands of Absurdistan and would like to ask if you would like to [insert random passage of text here]

    2. Re:This is slashdot by Anonymous Coward · · Score: 2, Funny

      No..... this is SPARTA!!!!

  5. It might be true, but it's also irrelevent. by onion2k · · Score: 5, Insightful

    95% of user-generated posts on Web sites are spam or malicious.

    The fact is that there are millions of old blogs, unused forums, ancient guestbooks, etc that are easy to spam automatically. While it might very well be true that 95% of comments on the internet are spam of some sort, they're probably read by a tiny fraction of internet users. People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

    Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.
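A back-of-the-envelope sketch of the same argument, with completely hypothetical numbers (the post counts and page views below are invented, not taken from the report). It shows how 95% of posts can be spam while only a small share of what readers actually see is spam.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
sites = [
    # (description, posts, spam_posts, page_views)
    ("a dozen big, moderated sites", 50_000, 500, 9_900_000),
    ("abandoned blogs and guestbooks", 950_000, 949_500, 100_000),
]

# Share of all posts that are spam (what a crawler would count).
total_posts = sum(posts for _, posts, _, _ in sites)
total_spam = sum(spam for _, _, spam, _ in sites)
spam_share_of_posts = total_spam / total_posts

# Share of page views that land on spam, assuming each view on a site
# hits a spam post in proportion to that site's own spam rate.
total_views = sum(views for _, _, _, views in sites)
spam_views = sum(views * spam / posts for _, posts, spam, views in sites)
spam_share_of_views = spam_views / total_views

print(f"spam share of posts: {spam_share_of_posts:.0%}")
print(f"spam share of views: {spam_share_of_views:.1%}")
```

Counting by post and counting by reader attention give very different answers, which is the whole scrapyard point.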

    1. Re:It might be true, but it's also irrelevent. by mwvdlee · · Score: 5, Funny

      95% of humans are over 100 years old. Most of them are dead.

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    2. Re:It might be true, but it's also irrelevent. by Asadullah+Ahmad · · Score: 2, Interesting

      I don't assume they included Wikipedia in the "user generated" category, otherwise that much non-bogus content would have definitely tipped the scale a bit.

      In my personal experience however, even without wikipedia, I have not come across that much bogus stuff on forums and random comments.

    3. Re:It might be true, but it's also irrelevent. by Anonymous Coward · · Score: 5, Funny

      That should be on Fox News.

      "Number of dead people reaches all time high!"

    4. Re:It might be true, but it's also irrelevent. by Yaur · · Score: 2, Interesting

      More likely they are generalizing the activity they are seeing on their fake/honey pot sites on the internet as a whole.

    5. Re:It might be true, but it's also irrelevent. by Trepidity · · Score: 5, Informative

      It seems that at least as well as anyone can estimate, the current population really is about 5% of the total humans who've ever lived.

    6. Re:It might be true, but it's also irrelevent. by CAIMLAS · · Score: 5, Insightful

      A lot of forum software works well, until it gets "behind the curve", and then the site maintainer pulls the site*.

      By "behind the curve" I mean any of the following can/does happen:
      1) Forum software gets out of date and user fails to upgrade due to modifications or similar, resulting in spam.
      2) Forum software gets popular without having a good security model and/or update cycle, resulting in exploits.
      3) Gets inundated with comment approvals and the forum (or blog) gets ignored or set to auto-allow out of frustration.

      * By "pulls the site" I mean "abandons it but doesn't take it down". That's typically the end result.

      It's a lot of work to maintain your own forum and/or blog: managing spam can and will take hours+ from your day if you've not got a good automated and/or textual way to deal with it: web interfaces are clumsy.

      Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.

      Yet, unlike most of those cars, the actual blog content is not necessarily useless. I have seen quite a few abandoned blogs and/or forums which have 3-10 year old information on them which is by no means useless; it's just getting buried.

      Digital archeologists of the future will probably have to figure out an automated way to prune back the spam to find the actual Internet, the way things are going.

      Consider: if spam accounts for 95% of all user-generated content, and said user-generated content is actually a non-trivial percentage of all actual content online (believable), consider how much bandwidth gets wasted by these spammers. (Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results so it's not going to likely be perused with regularity - unless it's more heavily seeded on topics common folks search.)

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    7. Re:It might be true, but it's also irrelevent. by Anonymous Coward · · Score: 5, Funny

      Well, then MSNBC would just rip into Fox for implying these unfortunate individuals should no longer vote. CNN would chime in and blame the lack of universal health care for the deaths.

    8. Re:It might be true, but it's also irrelevent. by CAIMLAS · · Score: 2, Interesting

      There were, but not many. Nowhere near the scads of people roaming the planet today. I've read that there have been several times in known history where there were fewer than a couple hundred thousand people; it's plausible that the past 100 years has had more people alive than all of human history, considering the multiple near-extinction events which have supposedly occurred.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    9. Re:It might be true, but it's also irrelevent. by Kugrian · · Score: 4, Interesting

      How much of it is user generated content that's copied from one site onto a zillion others?

    10. Re:It might be true, but it's also irrelevent. by Anonymous Coward · · Score: 2, Funny

      Oh god, are they okay?

    11. Re:It might be true, but it's also irrelevent. by Anonymous Coward · · Score: 3, Funny

      Are you implying that Wikipedia is not bogus content?

    12. Re:It might be true, but it's also irrelevent. by dosius · · Score: 2, Informative

      Sturgeon's Law comes into play, as always. 90% of everything is crud

      -uso.

      --
      What you hear in the ear, preach from the rooftop Matthew 10.27b
    13. Re:It might be true, but it's also irrelevent. by Hognoxious · · Score: 4, Funny

      People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

      And when they want a change from that, they come here.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    14. Re:It might be true, but it's also irrelevent. by mrsquid0 · · Score: 2, Informative

      Depending on what you assume about paleolithic populations, about 15%-25% of all the humans who ever lived are alive today. That means that roughly one out of every five people who ever walked the Earth has the potential to post to Slashdot.

      --
      Just because you are paranoid does not mean that no-one is out to get you.
    15. Re:It might be true, but it's also irrelevent. by gumbi+west · · Score: 2, Interesting

      I've registered in Chicago, and it was very easy. Voting after that registration required a driver's license though.

    16. Re:It might be true, but it's also irrelevent. by steelfood · · Score: 3, Informative

      It's plausible that the past 100 years has had more people alive than all of human history

      And that would still make the current population only a little more than 50% of all the people that have been alive.

      Except considering that Homo sapiens has been around for several hundred thousand years, I think your estimates for the number of humans that have ever walked the planet may be a bit low.

      --
      "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
    17. Re:It might be true, but it's also irrelevent. by greyhueofdoubt · · Score: 2, Informative

      Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results

      That's what I was going to say. Unless people are searching for cialis or real replica watches or VIaGrA, they shouldn't see the spam itself. I spend a lot of time browsing all sorts of different sites and it's very rare for me to ever see spam*. How have I avoided the 95% of the web that is spam? I must have some hidden talent, who knows.

      *The exception being the occasional google search where instead of information about a thing, I get three pages of people trying to sell the thing (try "lp gas generator" )

      -b

      --
      No offense, but I've stopped responding to AC's.
    18. Re:It might be true, but it's also irrelevent. by justin12345 · · Score: 2, Interesting

      This is a really interesting question, and as your link points out, a very difficult one to solve given that we know so little of our own history. There is a lot of evidence human civilization was thriving until a comet strike about 13,000 years ago on the North American continent wiped out most of the world's population and potentially raised sea levels dramatically, submerging their cities.

      There is evidence that there were some advanced civilizations prior to the theorized comet incident. They might have had large populations. The problem is that this topic tends to attract the type of people that like to throw around terms like "Atlantians" and "Nephilim", so it's really hard to casually research. Typing in "13,000" and "comet" to Google gets you mostly websites with black backgrounds with star fields on them, purple new-age-y fonts, and a lot of talk about Noah and aliens (contributing greatly to the 95% of the internet is bullshit figure above, I'm sure).

      --
      Cool art gallery, if you're into that sort of thing.
  6. Re:Nothing to see here. Move along. (Bad summary) by PCM2 · · Score: 4, Insightful

    And in addition, the report itself doesn't even explain the result. It's a bullet point at the beginning of the report, but there's no explanation or analysis.

    --
    Breakfast served all day!
  7. Was going to RTFA but it's probably bogus by syousef · · Score: 5, Funny

    ...95% probability actually. So I didn't bother.

    --
    These posts express my own personal views, not those of my employer
  8. just a cheap shot by Nyder · · Score: 4, Funny

    I guess that goes in hand with 95% of kdawson's submissions being crap and not worth the time.

    --
    Be seeing you...
  9. 40 000 000 sites per hour? by nicknamenotavailable · · Score: 2, Interesting

    Every single hour the Internet HoneyGrid scans some 40 million websites for malicious code as well as 10 million emails for unwanted content and malicious code.

    So 40 million sites per hour is 960 million sites per day. Wikipedia says that there are over 25 billion pages, but can that number be accurate?
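For what it's worth, the arithmetic can be checked in a few lines. The only inputs are the two figures quoted above; treating "sites" and "pages" as the same unit is an assumption made purely so the numbers can be compared, and the two are probably not equivalent.

```python
sites_per_hour = 40_000_000        # scan rate quoted in the article
pages_estimate = 25_000_000_000    # page count attributed to Wikipedia above

sites_per_day = sites_per_hour * 24

# Assumption: one scanned "site" counts as one "page", with no rescans.
days_for_one_pass = pages_estimate / sites_per_day

print(f"{sites_per_day:,} scans per day")
print(f"about {days_for_one_pass:.0f} days to touch every page once")
```

So under that assumption the quoted rate would cover the estimated page count in under a month, which makes it more likely that the same sites are simply being rescanned over and over.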

  10. The message... by Anonymous Coward · · Score: 4, Insightful

    The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.

    Well, I don't care if 99% of user-generated content is crap; people need to be free to create it, because some individual in the other 1% may just come up with the cure for cancer, and despite whatever it does to Big Pharma's profits, everyone needs to be able to hear about it.

    1. Re:The message... by Yaur · · Score: 3, Interesting

      The subtext is: the internet is dangerous, so you need to buy their product.

    2. Re:The message... by jgrahn · · Score: 2, Informative

      The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.

      You're reading too much into it, and you are also misled by the misquote in the /. title. The article said "95% of user-generated posts on Web sites are spam or malicious", probably meaning postings in forums, "comments" and stuff like that. They're not saying plain web pages by *authors* who aren't faceless corporate drones are crap.

  11. can be adequately explained by stupidity by findoutmoretoday · · Score: 2, Insightful

    "95% of User Generated Content is either malicious in nature or spam"

    "Never attribute to malice that which can be adequately explained by stupidity"

    So I read "95% of User Generated Content is stupid." I agree, count me in.

  12. Obligatory by jlintern · · Score: 2, Funny

    In human terms, the majority of computers have AIDS. And we all know where they caught it.

    Your mom?

  13. So Sturgeon was right by Aussie · · Score: 5, Interesting

    "Ninety percent of everything is crud."

    http://en.wikipedia.org/wiki/Sturgeon's_Law

    1. Re:So Sturgeon was right by Faylone · · Score: 4, Funny

      Sturgeon just had low standards.

  14. Calling spam email UGC is... disingenuous. by argent · · Score: 4, Insightful

    I would say that 95% of email is commercial in nature, and not "user generated content". To me "UGC" is something that people who are actually active users (consumers as well as creators) of a service generate... not something injected into the service from outside by predators.

  15. And of the rest... by Arancaytar · · Score: 2, Funny

    Out of the 5% that are not generated by spambots, 99% is still generated by idiots.

  16. Not so staggering... by osu-neko · · Score: 3, Insightful

    ... a staggering 95% of User Generated Content is either malicious in nature or spam.

    Considering 95% of internet users are malicious (see GIFT), it's hardly staggering that 95% of user generated content is malicious too. :p

    --
    "Convictions are more dangerous enemies of truth than lies."
  17. Replace "UGC" with "Usenet" by Antique+Geekmeister · · Score: 4, Insightful

    We've seen this before, with Usenet, BBS's, MUD's, and Email. The advertisers, and the trolls, find it easy to spew their material across many thousands of targets, and get enough money or gratification from doing so that it funds their efforts. It doesn't even have to make money: they just have to believe that it _can_ make money, and the professionals will simply continue.

    Whatever would make anyone think that "User Generated Content" forums would be any different?

    1. Re:Replace "UGC" with "Usenet" by Anonymous Coward · · Score: 2, Informative

      BBS's? Really? I don't remember a single instance of "spam" on any BBS during the golden years. Perhaps that's because individual systems were far easier to control and moderate.

      USENET fell because it was never designed with any real moderation or control in mind. Which was great as long as the users played nicely together. But after the Eternal September and the coming of gold diggers like Canter & Siegel, the whole system fell apart.

      If you want the flood of garbage to stop, you need someone standing at the door with a baseball bat. The days of the internet "playing nicely together" ended back in 1995.

  18. 95% of the story is bogus by gmuslera · · Score: 4, Informative
    The original article says that they scan 40 million sites and 10 million emails each hour, and they are referring to this report (which also links to the full info and a video of the presentation of that info).

    How they get their "sample" matters a lot: honeypots, honeyclients, reputation systems and "advanced grid computing systems" (whatever those are). What is feeding information into that sample? Not old sites with legitimate content that have been sitting around for years, but in good part spammers, botnets, and people who want your PC to become part of one. And email is already known to be 95% spam. The sample is just too rigged to say anything about what is really on the internet or what you have some chance of seeing.

  19. Google's fault for their dependence on linking by cenc · · Score: 3, Insightful

    Email spam aside, I would say that most of that is Google's fault. The other 95% of content created on the internet is an attempt to SEO web sites in the 5% of the internet that people do potentially read or visit. Google encourages webmasters to get inbound links, thus the whole industry of spamming sites, directories, blog feed sites, and so on that have one purpose and one purpose only: getting as many anchor text links pointed to sites as possible so they will rank higher in Google for key terms.

  20. 95% chance by kylben · · Score: 3, Funny

    I take it that means there is a 95% chance that this report is bogus, or malicious?

    --
    Insightful and funny are really the same thing, except one has a punch line.
  21. Looks like I'll have to change my sig by RudeIota · · Score: 2, Funny

    I'll have to change it from "Everything" to "95% of everything". :-(

    --
    Fact: Everything I say is fiction.
  22. The actual new vulnerabilities by Animats · · Score: 2, Informative

    First, here's the actual report, without any form to fill out. (Backup copy at WebCitation.) Amusingly, the report is clearly written for a target audience who prints out PDF files on paper. It contains charts in tiny type.

    The report covers the usual email issues, which will be familiar to Slashdot readers. New issues for 2009 are the following:

    • Anti-virus companies are slowing down. Average time to "patch" (really, release a new identifying signature) has increased from 22 hours to 46 hours. By the time the anti-virus companies catch up, the attack has changed. This indicates the uselessness of signature-based attack detection.
    • More attacks are successfully targeting search engines. Google is more vulnerable to hacked SEO than previously thought. Google Trends, which drives Google Suggest (the command completion in Google search boxes) is extremely vulnerable. (I've commented on that before.) "The average number of malicious sites in any Google search using hot/trending topics (as ranked by Google) by the end of the year stood at 13.7% for the top 100 results."
    • The "long tail" of the Web is becoming less important as more user generated content moves to the top 100 sites. More attacks now involve injection of hostile code into user generated content on major sites.
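As a deliberately simplified illustration of the first bullet, here is a sketch in which a "signature" is just a SHA-256 hash of the whole payload. That is an assumption for the sake of the example; real anti-virus signatures are more elaborate than whole-file hashes, and the payload bytes and helper names below are made up.

```python
import hashlib


def signature(payload: bytes) -> str:
    """Toy 'signature': the SHA-256 digest of the entire payload."""
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample the vendor has already analysed and published.
original = b"malicious payload v1"
known_bad = {signature(original)}


def is_flagged(payload: bytes) -> bool:
    """Flag a payload only if its exact signature is already known."""
    return signature(payload) in known_bad


# The attacker changes a single byte before the next wave goes out.
variant = b"malicious payload v2"

print("original flagged:", is_flagged(original))
print("variant flagged: ", is_flagged(variant))
```

Exact-match detection only catches what has already been seen, so a longer signature turnaround leaves more time for a trivially altered variant to circulate unflagged.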

    The report identifies Google's weak security in their search engine as a problem. Microsoft's Internet Explorer remains a problem, of course, but Google is now the attack target of choice to drive traffic to a site that can attack the browser. Google still, apparently, hasn't figured out a good way to prevent link farms from driving up search position.