95% of User-Generated Content Is Bogus
coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
Animals shit in ~95% of their habitat...
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
I got ripped in 2 weeks. learn how with secret juice formula.
That is so untrue. There is value in what I write.
We know.
You see? You see? Your stupid minds! Stupid! Stupid!
The fact is that there are millions of old blogs, unused forums, ancient guestbooks, etc. that are easy to spam automatically. While it might very well be true that 95% of comments on the internet are spam of some sort, they're probably read by a tiny fraction of internet users. People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.
Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.
http://twitter.com/onion2k
BS in the summary. TFA says:
"95% of user-generated posts on Web sites are spam or malicious."
The user generated content is valid, it's just the "comments" sections which are getting hit by spambots. If this is front page news, then the fact that 95% of email is spam is news as well. Nothing to see here. Move along.
I don't think I've seen so many floating ads on a theoretically legitimate site before. When I opened it, it grayed out the window and popped up something trying to get me to fill out a form...scrolling around, the mouse runs into these little green underlined words that pop up an ad thing you have to click to close...and after about twenty seconds, another floating window scrolled down the screen and parked in the middle.
That's a little too much cruft for me. They can keep their content, I don't want it.
buy this deluxe duct tape developed by nasa scientists to put yourself back together again. Just three easy installments of $99.99.
I reached the same conclusion reading slashdot.
And in addition, the report itself doesn't even explain the result. It's a bullet point at the beginning of the report, but there's no explanation or analysis.
Breakfast served all day!
71% of statistics are useless ...
...95% probability actually. So I didn't bother.
These posts express my own personal views, not those of my employer
They are right! There is so much rubbish on /. nowadays, I cannot even find penis enlargement comments anymore :-(
I guess that goes in hand with 95% of kdawson's submissions being crap and not worth the time.
Be seeing you...
Every single hour the Internet HoneyGrid scans some 40 million websites for malicious code as well as 10 million emails for unwanted content and malicious code.
So 40 million sites per hour is 960 million sites per day. Wikipedia says that there are over 25 billion pages, but can that number be accurate?
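For what it's worth, the arithmetic above can be checked in a few lines of Python. This is only a rough sketch: it takes the 40 million per hour and 25 billion figures quoted above at face value, and it assumes (probably wrongly) that every scan hits a distinct page, so the variable names and the "days to cover" estimate are illustrative only.

```python
# Figures quoted in the thread; neither is independently verified here.
SCANS_PER_HOUR = 40_000_000        # HoneyGrid claim: sites scanned per hour
TOTAL_PAGES = 25_000_000_000       # page count attributed to Wikipedia above

scans_per_day = SCANS_PER_HOUR * 24

# Assumption: each scan covers a different page (no rescans, site == page).
days_to_cover = TOTAL_PAGES / scans_per_day

print(f"{scans_per_day:,} scans per day")
print(f"about {days_to_cover:.1f} days to touch every page once")
```

Under those assumptions the grid would need roughly a month to see every page once, so the two numbers are at least not obviously inconsistent with each other.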
The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.
Well, I don't care if 99% of user-generated content is crap; people need to be free to create it, because some individual in the other 1% may just come up with the cure for cancer, and despite whatever it does to Big Pharma's profits, everyone needs to be able to hear about it.
In related news, approximately 90% of the cells in the human body are bacteria. Fortunately for us, the human body has an effective immune system. When are computers going to get one?
In human terms, the majority of computers have AIDS. And we all know where they caught it.
You can't write that in the United Kingdom.
"95% of User Generated Content is either malicious in nature or spam"
"Never attribute to malice that which can be adequately explained by stupidity"
So I read it as "95% of User Generated Content is stupid". I agree, count me in.
I can think of several areas, whose web sites seem - almost always - to be "spot on" technically, informationally & operationally.
How can this "95%" statistic have any meaning or usefulness?
We must ask: "Can you break that down?" (eg, by topic, field, application area, etc.)
There's way too much data out there on the question, :-./
for a single number to be at all useful, except - possibly
- by commercial sites, who might try to convince us
that [only] their sites have non-bogus content...
(Now, I'll see if there are any break-downs of this statistic, :-)
eg, by reading the cited report...
In human terms, the majority of computers have AIDS. And we all know where they caught it.
Your mom?
...no wait, make that 95%.
"Ninety percent of everything is crud."
http://en.wikipedia.org/wiki/Sturgeon's_Law
I would say that 95% of email is commercial in nature, and not "user generated content". To me "UGC" is something that people who are actually active users (consumers as well as creators) of a service generate... not something injected into the service from outside by predators.
Out of the 5% that are not generated by spambots, 99% is still generated by idiots.
... a staggering 95% of User Generated Content is either malicious in nature or spam.
Considering 95% of internet users are malicious (see GIFT), it's hardly staggering that 95% of user generated content is malicious too. :p
"Convictions are more dangerous enemies of truth than lies."
Just like the posts on /. ?
In order to form an immaculate member of a flock of sheep one must, above all, be a sheep.
yet another reason to hate mankind.
You all suck.
Ugh.
o hai
95% is intentionally bad, the other 5% is just shit
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
If you use an ISP that hijacks unregistered domains, such as Virgin, to land you on their search page then that statistic goes up to 99.99%
Phillip.
Property for sale in Nice, France
Apropos of nothing.
http://soundclick.com/share?songid=8720416
Spam, perhaps, but not necessarily bogus.
95% of KDawson generated content is Bullshit
As I discovered with one of my sites a few years ago, someone had installed a site within mine, and in investigating it I discovered there are plenty of other sites with the same issue, many even on SourceForge.
My advice is to do an inventory of the files on your site, to see if you too have such a problem.
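One way to do such an inventory, as a minimal Python sketch rather than a finished tool: hash every file under the web root and compare the result with an inventory taken when the site was known to be clean. The function names `inventory` and `unexpected_files` are made up for this example, and it assumes you have filesystem access to the site and a trusted earlier snapshot to compare against.

```python
import hashlib
import os


def inventory(root):
    """Return {relative_path: sha256_hex} for every regular file under root."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue  # skip broken symlinks and other non-regular entries
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in chunks so large files do not have to fit in memory.
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            result[os.path.relpath(full, root)] = digest.hexdigest()
    return result


def unexpected_files(current, known):
    """Paths in `current` that are new or whose contents differ from `known`."""
    return sorted(path for path, h in current.items() if known.get(path) != h)
```

Typical use would be to save `inventory("/path/to/webroot")` somewhere off the server while the site is clean, then rerun it later and look at what `unexpected_files(new, saved)` reports. Anything you did not put there yourself deserves a closer look.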
Almost all sites are spam (of the kind that steals old urls) or redirectors to malware places.
Remember that the next time some fool proudly announces that the internet just reached a bazillion pages.
We've seen this before, with Usenet, BBS's, MUD's, and Email. The advertisers, and the trolls, find it easy to spew their material across many thousands of targets, and get enough money or gratification from doing so that it funds their efforts. It doesn't even have to make money: they just have to believe that it _can_ make money, and the professionals will simply continue.
Whatever would make anyone think that "User Generated Content" forums would be any different?
You missed your assessment by ~5%.
-dZ.
Carol vs. Ghost
Sorry. That's like going to a municipal dump, pointing at the fields of waste and declaring that 95% of what Americans eat is plastic. The problem with this statement is that it includes this garbage in "user generated content".
that 95% of spam and bogus content is generated by a small fraction of the people who use the internet. Not everybody is a spammer, and not everybody forwards every chain email they receive. For instance, 95% of the spam in my inbox comes from Russian/Chinese addresses. I do not think a large percentage of the Russian or Chinese population is engaged in spamming. The other 5% comes from family and friends forwarding things. It is mostly content that recirculates, as usually none of it is generated by the sender. So while 95% of what is there may be bogus, my guess is that a small percentage of the people who use the internet generate it.
"Statistics can be made to prove anything Kent, 16% of all people know that" - Homer Simpson.
What percentage of non-user generated content is fake?
More like 99% if you include the non malicious stupidity into the mix.
---- Booth was a patriot ----
It matters a lot how they get their "sample": honeypots, honeyclients, reputation systems and "advanced grid computing systems" (whatever those are). What is feeding information to that sample? Not old sites with legitimate content that have been sitting around for years, but in good part spammers, botnets, and people who want your PC to become part of one. And mail is already known to be 95% spam. The sample is just too rigged to be at all related to what is really on the internet or what you have some chance of seeing.
Email spam aside, I would say that most of that is Google's fault. The other 95% of content created on the internet is an attempt to SEO web sites in the other 5% of the internet that people do potentially read or visit. Google encourages webmasters to get inbound links, thus the whole industry of spamming sites, directories, blog feed sites, and so on that have one purpose and one purpose only: getting as many anchor text links pointed to sites as possible so they will rank higher in Google for key terms.
Living in Chile
I take it that means there is a 95% chance that this report is bogus, or malicious?
Insightful and funny are really the same thing, except one has a punch line.
Just fill in bogus data (the form does not check it) to get the report.
Given that 95% of the emails are spam this means that 100% of the non-spam content is valid.
Once again shown to be overly optimistic.
http://en.wikipedia.org/wiki/Sturgeon's_Law#.E2.80.9CNinety_percent_of_everything_is_crud.E2.80.9D
I'll have to change it from "Everything" to "95% of everything". :-(
Fact: Everything I say is fiction.
95% of statistics are fudged to give the desired results.
it turns out that 95% of the Slashdot users think the report was about all internet content instead of just user generated content and they responded to that instead.
No big surprise there, huh?
No one ever had to evacuate a city because the solar panels broke!
....malicious and as useless as spam.
"you'll have to give up name, rank, and serial number."
Dear god, none of you have /b/ experience?
proud caffeine whore
of the remaining 5%, 95% of that is also SPAM, or malicious or something? We already know about SPAM percentages, so I assume this is measuring something new, like non-automated emails contain huge amounts of things that people consider SPAM.
First, here's the actual report, without any form to fill out. (Backup copy at WebCitation.) Amusingly, the report is clearly written for a target audience who prints out PDF files on paper. It contains charts in tiny type.
The report covers the usual email issues, which will be familiar to Slashdot readers. New issues for 2009 are the following:
The report identifies Google's weak security in their search engine as a problem. Microsoft's Internet Explorer remains a problem, of course, but Google is now the attack target of choice to drive traffic to a site that can attack the browser. Google still, apparently, hasn't figured out a good way to prevent link farms from driving up search position.
If you want to read the whole report, just lie about all the personal data they want you to enter. Everyone else does, apparently.
By the way, what about "numbers posts"? There are cases of spam posts being made that are very similar in style to the transmissions of numbers stations - just strings of short blocks of numbers. Has anyone ever found out what those are about? My guess is that it's some botnet's C&C channel but that's just a guess.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
It might be humorous/infuriating to note that their website has an irritating amount of intrusive popup ads. My ABP somehow got turned off.
Is this irony? Hypocrisy? I don't know.
"Ripped off" is more accurate!
I think that figure is way too low if they include spam in the equation. I don't think that Spam is 'user generated content' - it is more likely 'user targeted content'. Maybe I need to frag M$ into this as an example: 'Microsoft dominates 100% of the Windows Desktop Market'...
Excuse me, but please get off my Pennisetum Clandestinum, eh!
I've found about the same ratio to be true regarding TV content.
...is porn, trolling, flame wars, and 4chan.
In other news, 87.6% of all statistics are made up on the spot.
I'm looking at you Scribd. Why Google can't figure out how to push your spam results off the front result page puzzles me since they have a method to keep the Wikipedia clones off the front page. I can't wait for you to experience the same fate.
I am becoming gerund, destroyer of verbs.
This article has a 95% chance of being bogus.
Often wrong but never in doubt.
I am Jack9.
Everyone knows me.
Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
This being Slashdot - how was that sentence even relevant?
#DeleteChrome
I've seen cases where spammers, unable to reliably defeat the administrators of a popular forum, will simply copy the information on that forum onto another forum and then spam the hell out of it. Forums on the use of Microsoft tools seem to be particularly popular targets.
90% of everything is crud
Especially google.
If you search for certain topics, all you get are spam and $$pay sites.
Does that mean that 95% of the reputation I got was bogus? Now I don't feel nearly so proud...
We're outside of the six sigma specification.