51% of Internet Traffic Is "Non-Human"
hypnosec writes "Cloud-based service Incapsula has revealed research indicating 51 per cent of website traffic is through automated software programs, with many programmed for malicious activity. The breakdown of an average site's traffic is as follows: 5% is due to hacking tools looking for an unpatched or new vulnerability within a site, 5% is scrapers, 2% is from automated comment spammers, 19% is the result of 'spies' collating competitive intelligence, 20% is derived from search engines (non-human traffic but benign), and only 49% is from people browsing the Internet."
http://funpics.classicfun.ws/var/albums/Funpics/On%20the%20Internet%20nobody%20knows%20you're%20a%20dog.jpg?m=1300661194
"only 49% from people browsing the Internet." I wonder how much of that 49% is porn.
"If any question why we died, Tell them because our fathers lied."
... PORN?!?
... wait, what?
I knew it!
My blog
Any webmaster should already know this; it's probably way more than 51% for websites that have been in existence for several years.
Which of those categories do data analysis and aggregation tools fall into?
I'm thinking of user-focused tools like RSS Readers, Stock Quote graphers, etc... They're automated non-human tools which access websites, but it's not clear how they are being categorised...
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco
The article seems to be about websites, not the Internet.
Is such a huge load over advertising.
It doesn't link to any research; it's simply in-house "research", after which they also suggest that you really should use their service. So it's to be taken with a grain of salt.
There is no research method described, or anything else.
Hey, now, I know the United States isn't exactly the only game in town anymore, but you guys could be a little more sensitive.
how much of that 49% is Reddit and 4chan?
Title says 51% of Internet Traffic.
Summary says 51% of Website Traffic.
Internet != Website.
If you run WordPress for your site... It's more like 50% bots (search engines), 40% comment spam, 8% content scanners and 2% visitors....
and 51% is female. Since no one has ever seen a female on the Internet, I'd say we can now explain the non-human 51%.
I swear there is so much monitoring (is my site up?)... that should at least register on the scale.
Seriously. Do they have liberal arts majors writing the headlines at /. now?
I am becoming gerund, destroyer of verbs.
Do bot views count toward page views for advertising revenue?
If figuring out malicious traffic was this easy, we could get rid of it!
Did not realize there were that many furries out there. Though, it makes sense, we make the internets go.
Furries make the internet go.
Someone should invent porn that appeals to screen scrapers, then we'd REALLY see web traffic go wild!
Try using a calendar which has next month and year links (along with every day therein) and doesn't know Googlebot is coming..... gigs. Seriously.
Do they have people with doctorates and PhDs doing their "research"? Or are they just pulling numbers out of their butt?
Could be both. This allegedly happened in some areas where researchers felt that a "controlled publication" of scientific evidence could bring more exposure to what they considered important issues... (climategate, peppered moths, Libby half-life, etc.)
lucm, indeed.
The internet is dangerous, buy our security product.
Here's the original ZDNet blog post. It's a longer article with more detail; it's also linked at the bottom of TFA, which seems to have plagiarized it. Compare the first paragraphs:
[TFA] Cloud-based service, Incapsula, has revealed research indicating that 51 per cent of website traffic is through automated software programs; with many programmed for the intent of malicious activity.
[ZDNet] Incapsula, a provider of cloud-based security for web sites, released a study today showing that 51% of web site traffic is automated software programs, and the majority is potentially damaging, — automated exploits from hackers, spies, scrapers, and spammers.
The sentence structure and order of ideas are identical, and many phrases are the same or nearly the same. A high schooler should do better. Minor rephrasing is not sufficient.
That said, both articles are pretty much advertisements. The study doesn't appear to have attempted to actually be comprehensive (so it only used data from this one company). The point was apparently to give this cloud service provider some selling points for businesses to use their service to "secure" their sites. This story is yet another that shouldn't even have appeared on /.; shame on the editors who let it through.
Incapsula, a provider of cloud-based security for web sites, released a study today showing that 51% of web site traffic is automated software programs, and the majority is potentially damaging, — automated exploits from hackers, spies, scrapers, and spammers.
and it just so happens that Incapsula has the perfect solution to save you from all this... for a price.
Anons need not reply. Questions end with a question mark.
I worked for an anti-spam provider. More than 90% of emails were spam; for some customers, more than 99%. That said, spam emails tend to be short, almost to the point of ridiculous. I don't remember the exact numbers, but say the average legitimate email is about 50k (attachments skew it, but even legitimate email tends to be 5+ sentences). Along comes a dubious spammer. Not only are they shooting off 10k emails per bot per hour, but they are all one-sentence emails with a tinyurl link in them. The lack of size is one of the key indicators left once you remove the obvious keywords and the sending history. Kind of makes me wonder why the bots spamming other content are so chatty.
I guess if you are spamming forums you have to have a "comment" length message to send sometimes to look legitimate. You can't just say "go to http://tinyurl.com/growyourpenis" without being obvious. But that said, why do spammers get away with it in email but not in forums? I mean, someone is clicking the links in the emails, because it is a large business (likely multi-billion dollar). Hmm ... I'll start running the unsuspicious botnet on the forums posting email-like spam and post forum-spam-length content to email accounts. I'll make millions, er, well, a few dollars anyways.
Bots send short emails usually for throughput reasons. Why waste bandwidth when you are trying to use little enough that you don't get caught, and your peak email send rate is inversely proportional to content size?
Another tidbit that I'm sure a bunch of people know but is worth throwing out there: there is a reason for spam with images. The images round-trip to the spammer's servers. Usually they set it up so that your email account is tagged somehow in the URL that your viewer sends to their server. So opening the email "calls home" and tells the spammer "hey, I got a real email address" (and likely someone gullible enough to look at spam). The spammer can then add your email address to a list of "live email accounts", which sells easily for 10X what a list of unconfirmed email addresses does. So ... if you don't recognize the sender, don't even open it, even if it is just in your webmail client. If you do, expect more spam.
Guilty as charged. I admit, I've been known to check out the competition from other sites to ensure I'm not falling behind the curve. My guess is that they perform a reverse DNS lookup of their IP logs and determine that the company's network I'm behind belongs in the same industry as theirs.
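For anyone curious what that kind of log analysis might look like, here is a minimal sketch. It assumes the site owner has a list of visitor IPs and simply groups them by the last two labels of the reverse DNS (PTR) name. The function names and the two-label heuristic are illustrative assumptions, not a description of what Incapsula or any particular site actually does, and the IPs in the usage note are documentation-range placeholders.

```python
import socket
from collections import Counter


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for an IP, or None if it does not resolve."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname.lower()
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None


def org_suffix(hostname, labels=2):
    """Crude guess at the owning organisation: the last `labels` DNS labels."""
    return ".".join(hostname.split(".")[-labels:])


def visitors_by_org(ips, resolver=socket.gethostbyaddr):
    """Count visits per guessed organisation; unresolvable IPs are pooled."""
    counts = Counter()
    for ip in ips:
        host = reverse_lookup(ip, resolver)
        counts[org_suffix(host) if host else "unresolved"] += 1
    return counts
```

Calling `visitors_by_org(["192.0.2.10", "192.0.2.11"])` does live DNS lookups; passing a custom `resolver` makes it easy to try offline. Many IPs have no PTR record or resolve only to an ISP, so this is a rough signal at best.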
Life is not for the lazy.
However, based solely on the title, my reaction was "No shit, Sherlock". Or, to introduce the younger crowd to an "old saw"... See http://en.wikipedia.org/wiki/Occam's_razor, and ponder it well.
Then, GET OFF MY LAWN!
This issue is a bit more complicated than you think.
Last time I checked whenever I sent any data across the net, it was not human, but rather data.
With IPv6, no more wholesale scanning of the entire global address space in minutes looking for exploitable hosts. No more 5 minutes to ownage of unpatched PCs and the associated waste of bandwidth.
No more self-propagating worms using simple algorithms to divide and conquer the global network.
In the grand scheme of things it won't help much, but it's better than nothing.
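A back-of-the-envelope check of the scanning argument, as a sketch only. The probe rate of ten million addresses per second is an assumption picked for illustration, and the 64-bit figure is the host portion of a single standard IPv6 subnet, not the whole IPv6 space.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scan_time_seconds(address_bits, probes_per_second):
    """Time to probe every address in a space of 2**address_bits addresses."""
    return 2 ** address_bits / probes_per_second


RATE = 10_000_000  # assumed probes per second, illustrative only

ipv4_seconds = scan_time_seconds(32, RATE)         # whole IPv4 space
ipv6_subnet_seconds = scan_time_seconds(64, RATE)  # one /64 IPv6 subnet
ipv6_subnet_years = ipv6_subnet_seconds / SECONDS_PER_YEAR

print(f"IPv4: about {ipv4_seconds / 60:.0f} minutes")
print(f"One IPv6 /64: about {ipv6_subnet_years:,.0f} years")
```

At that assumed rate, all of IPv4 takes roughly seven minutes, while a single /64 takes tens of thousands of years. Brute-force sweeps stop being practical, though hosts can still be found through DNS, logs, or predictable addresses, which fits the "won't help much" caveat.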
Last time I had a personal website up, 60% of it was buffer overflow bots, 20% were old IIS exploit bots and 10% were Slashdot scans whenever I made a post.
Really though, firewalls in the US should come with the entire Chinese, Russian, and Indian IP range blocked for incoming connections by default.
When it gets down to the mythical 10% that humans supposedly use of their own information processing machine (their brains), will the net mind achieve sentience?
One man's pink plane is another man's blue plane.
..most of the slashdot posts! Now I know why!
Consider, for instance, lol cats and pedo bears. Two distinct mammals that have perplexed the likes of Sir Attenborough for many office hours.
Testing the waters, we also have dramatic animals (it all began with a hamster), and the turtle kid. The latter a new breed of furry, that may prove more nuisance than entertainment.
Defining Statistics and Social Research
Skynet has become self-aware.
Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.
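A minimal sketch of that kind of log filtering, assuming access logs in the common "combined" format where the user agent is the last quoted field. The keyword list and the function names are illustrative assumptions. It only catches bots that identify themselves, so anything imitating a browser would be counted as human.

```python
import re

# Substrings that self-identifying crawlers and tools tend to put in their UA.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests", re.IGNORECASE
)
# In the combined log format the user agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def is_probably_bot(log_line):
    """Heuristic: a missing or empty user agent, or a bot keyword, means automated."""
    match = UA_FIELD.search(log_line)
    if not match:
        return True
    user_agent = match.group(1)
    return user_agent in ("", "-") or bool(BOT_PATTERN.search(user_agent))


def human_share(log_lines):
    """Fraction of log lines that do not look automated (0.0 for an empty log)."""
    lines = list(log_lines)
    if not lines:
        return 0.0
    humans = sum(1 for line in lines if not is_probably_bot(line))
    return humans / len(lines)
```

Running `human_share(open("access.log"))` on a real log gives a rough upper bound on human traffic, since well-disguised bots slip through.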
The interesting thing is that 51% is identifiable as bots. What about bots that are designed to emulate real users?
I mention that because I have written some bots that are designed to emulate users as closely as possible, so as to not be noticed by paranoid webmasters. Mine follow valid workflow scenarios, and even pause appropriate amounts of time between post backs, so I am fairly certain that they have gone unnoticed.
I don't think that I am more clever than the average hacker, so I am sure that others are doing this sort of thing, too.
HA! I just wasted some of your bandwidth with a frivolous sig!
HAT, ALUMINUM HAT NOW PLEASE. Who are these non humans filling up the pipes that lead into my house? ripping wire out in 3.2.1...
OMG Ponies!!! with Glitter!!!! I miss Pink
More than you might think. Not that those two are huge, but the same people are also everywhere else, including slashdot. Probably not so much on Facebook. Some people spend an inordinate amount of time online and have multiple personas. They spend so much more time online than most people they skew the statistics. You can't really say X% are doing A and Y% are doing B because mostly the same people are doing both.
There are fewer people online than is apparent.
"According to recent report, there were so many worms and counterworms loose in the data-net now, the machines had been instructed to give them low priority unless they related to a medical emergency." - The Shockwave Rider by John Brunner
Be careful. You do NOT want an Anti-Asimovian robot/AI being evil. Because we can sometimes be evil, the bot will ALWAYS be efficiently evil.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
50% human traffic is too much, even for the HTTP protocol.
It means the semantic web concept is far from penetrating enough of the web. The human ability to perceive information is constant; it does not change with time. Some might have the illusion that we are improving it, but we are not.
All those automated HTTP robots, even the evil ones, are doing us a service by reducing this overload and making things easier.
"5% is due to hacking tools looking for an unpatched or new vulnerability within a site, " those are wolves weeding out the weaklings
"5% is scrapers, " - that's good, need to be more
"2% is from automated comment spammers, " - those are truly bad guys, but 2% is not much
"19% is the result of 'spies' collating competitive intelligence, " - this is unexpectedly high, but I guess legit use.
"20% is derived from search engines (non-human traffic but benign), " - that's very vital for Web, without this current Web is unimaginable
"and only 49% is from people browsing the Internet." - only? scratch this and say: "whopping", because it's too much. We are still doing a lot of menial job weeding out crap out of internet manually.
It should be 1%
Think of all other complex systems surrounding us. A lot of time they spend in automatic mode, doing things without our control: our washing machines, our DVRs, our body.
The Web should be like that too. Web search engines provide us with page ranking, sorting out the pages that are most suitable for human consumption. A similar sort of ranking should exist for robotic searches, one that takes into account how easily the information is passed and how many robots are reading it.
Compare human browsing to a walk in the Central Park. You have your paths intertwined, your walk around the reservoir.
And robotic browsing is like automobile roads crossing Central Park - completely different traffic routes, mostly passing through with intersections with human routes where it makes sense (bus stops, etc).
I do not believe in karma. "Funny"=-6. Do good and forbid evil. Yours, Oft-Offtopic Flamebaiting Troll.
Not surprising.
I bet Jane makes up a large percentage of internet traffic.
"That's the way to do it" - Punch
What is "real-time entertainment"? Also none of that is reading the news or communicating?
It could be that ZDNet was copied by TFA, but...