Bots Now Account For 61% of Net Traffic
codeusirae writes "A study by Incapsula suggests 61.5% of all website traffic is now generated by bots. The security firm said that was a 21% rise on last year's figure of 51%. From the article: 'Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections. But the firm said the biggest growth in traffic was for 'good' bots. These are tools used by search engines to crawl websites in order to index their content, by analytics companies to provide feedback about how a site is performing, and by others to carry out other specific tasks - such as helping the Internet Archive preserve content before it is deleted.'"
Didn't we just get studies that said youtube and netflix were 50% of the net's traffic?
http://mashable.com/2013/11/12/internet-traffic-downstream/
Was this just a ruse? Is this study wrong? Is there some sort of overlap?
The rest is all Netflix?
Netflix and Youtube?
Netflix and Youtube and bit torrent?
Netflix and Youtube and bit torrent and porn?
The article states that traffic "hitting a website" is generated more by bots than by actual "humans in chairs". Not that the Internet traffic is 61% bots. Geesh slashdot...
With more bots talking to their fellow bots online, and with bots getting more and more intelligent, who knows what they'll decide to do with the useless and unpredictable human beings?
Is there no standard in place by which a website can communicate that it only wishes to be trawled for indexing once per hour, once per day, or such? I can imagine Google, for example, trawls the same website dozens of times per day.
Signature intentionally left blank.
In addition to numerous, repeat, malicious bots and search engine crawlers, I also get a large number of people using "download helpers", which can misleadingly increase file hits by 2 to 10 or more times. And some other fairly recent phenomena:
1) Anti-virus checkers. I've been getting regular, repeat downloads from Trendmicro, for instance. In some cases they seem to match other downloads. That is, every file downloaded by a particular visitor will then be downloaded by Trendmicro, presumably to sniff it for malware. Yet Trendmicro will senselessly download the same file multiple times.
2) Seemingly "random noise" downloads. For example, in recent days I've been getting repeat downloads of just two files, over and over, from "stylexnetworks.net". Most such seemingly senseless downloads seem to come from cloud hosts. I'd love to know what that's about.
I for one welcome our porn surfing bot overlords.
We tune the Bots to search-capture NSA streams.
NSA claim they are the only human beings on Earth who have "America's" security at heart! I question!
NSA claim they are the only human beings on Earth who target "America's Bad Guys!" I question!
NSA claim they are the only human beings on Earth who have NO financial interest in American's credit cards and passwords! I question!
Lets usurp the Bots to target individual NSA employees on-line "habits!"
Then we see what they really do!
For instance; they check up on ex-girl friend and ex-prostitute they hire! Or, they like Las Vegas Casino's -- try to crack slot machine!
NSA employees are driven by deep greed, fear, lust, and all other sins.
The Bots we trust. They will tell the truth about NSA employees.
To control the scraping frequency of a well-behaved bot, a webmaster can use HTTP headers such as Last-Modified and Expires as well as robots.txt directives such as Crawl-delay.
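As a rough illustration of the Crawl-delay directive, here is a minimal sketch that reads a robots.txt file and prints the delay that applies to a given crawler. The helper name crawl_delay, the sample file, and the crawler name "ExampleBot" are all made up for the example, and the parsing is simplified (it assumes one User-agent line per group). Not every crawler honors Crawl-delay, so treat it as a request rather than a guarantee.

```shell
#!/bin/sh
# crawl_delay AGENT FILE: hypothetical helper, prints the Crawl-delay that a
# robots.txt file gives for AGENT, falling back to the "*" group.
# Simplification: assumes one User-agent line per group.
crawl_delay() {
    awk -v agent="$1" -F: '
    {
        sub(/#.*$/, "")                          # strip comments
        key = tolower($1); gsub(/[ \t]/, "", key)
        val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)
        if (key == "user-agent") {
            current = tolower(val)
        } else if (key == "crawl-delay") {
            if (current == tolower(agent)) specific = val
            else if (current == "*") generic = val
        }
    }
    END {
        if (specific != "") print specific
        else if (generic != "") print generic
    }' "$2"
}

# Example robots.txt; "ExampleBot" is a made-up crawler name.
ROBOTS=$(mktemp)
cat > "$ROBOTS" <<'EOF'
User-agent: *
Crawl-delay: 10

User-agent: ExampleBot
Crawl-delay: 60
Disallow: /private/
EOF

crawl_delay ExampleBot "$ROBOTS"   # prints 60
crawl_delay OtherBot "$ROBOTS"     # prints 10
```

Here every crawler is asked to wait 10 seconds between requests, and the hypothetical ExampleBot is asked to wait 60.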
May have missed something in TFA, but how do they differentiate between a human and a bot visitor?
I had first-hand experience of this with visitor statistics. I had the root of a web site redirecting to a page that fits the language of the browser. Just that redirection slashed the web traffic by a factor of 2.
Most visitors are bots, and many of them just probe and fail to follow the redirection.
yes, well ... my own website stats show I get more traffic from web bots than humans. ... I don't know ... slashdot... might show more normal stats ???
But surely bigger websites with more viewers, like
Sorry, that's just wrong.
51 dollars to 61.5 dollars = 21 percent increase
51 percent to 61.5 percent = 10.5 percentage point increase
And the article makes clear just how unreliable the data was in the first place, so this percent gloss makes me think that the firm is trying to sell something here.
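To make the arithmetic concrete, here is a minimal sketch that prints both readings of the move from 51% to 61.5%: the difference in percentage points and the relative increase. The function name pct_change is made up for illustration.

```shell
#!/bin/sh
# pct_change OLD NEW: hypothetical helper, prints both ways of reading the change.
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN {
        printf "percentage points: %.1f\n", new - old
        printf "relative increase: %.1f%%\n", (new - old) / old * 100
    }'
}

pct_change 51 61.5
```

This prints "percentage points: 10.5" and "relative increase: 20.6%". Both figures describe the same change, so what matters is which one a headline is quoting.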
I didn't know there was such a thing...
We're just visiting. :P
There's a fine line on that 'good' bot. What I'm puzzled by is why all these public databases aren't indexed by search engine crawlers. It's funny to me how many businesses run on public data that most people just don't know how to find, and why it isn't indexed. Arrest records, tax records, professional registrations: you have to go to specific state, county, or type sites, deal with kludged searches, and sometimes have a hard time finding yourself, even when you know you're in there.
Well not on my sites.
Ok, they still hit me but this is minimal traffic since I do not reply.
1) Have iptables log and automatically bar offenders not on whitelisted countries.
2) Use mod_security and do the same for web traffic.
3) Bar the rest manually to avoid barring myself or my customers... (about 20-40 a day)
It has become a pain but what else could you do?
Number of IPs currently barred (use ipsets!):
$ grep -c . /etc/rc.d/badiptobar
4667
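For anyone wondering what "use ipsets" looks like with a list that size, here is a minimal sketch. It assumes the bar list holds one IPv4 address per line (the real format of the file is not shown here), and the set name "badips" and the helper name make_ipset_restore are made up. The function only generates text suitable for `ipset restore`; the commands that actually load it need root and are left as comments.

```shell
#!/bin/sh
# make_ipset_restore SETNAME LISTFILE: hypothetical helper that turns a list of
# barred IPs (assumed: one IPv4 address per line) into `ipset restore` input.
make_ipset_restore() {
    setname=$1
    listfile=$2
    echo "create $setname hash:ip"
    # Skip blank lines and drop duplicates before emitting add commands.
    grep . "$listfile" | sort -u | while read -r ip
    do
        echo "add $setname $ip"
    done
}

# Loading it (as root, not run here):
#   make_ipset_restore badips /etc/rc.d/badiptobar | ipset -exist restore
#   iptables -I INPUT -m set --match-set badips src -j DROP
```

The point of the set is that iptables needs a single rule no matter how many addresses are barred, instead of one rule per IP.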
Block user agents:
SecRule REQUEST_HEADERS:User-Agent \
"@pm AhrefsBot Ezooms Aboundex 360Spider Mail.RU_Bot crawler.sistrix.net \
SemrushBot SurveyBot Netseer panscient.com ADmantX ZumBot BLEXBot UnisterBot \
seoprofiler EasouSpider" \
"id:'12050',\
phase:1,nolog,deny"
SecRule REQUEST_HEADERS:User-Agent \
"@pmFromFile /etc/httpd/extra/sec-blacklist-barip-user-agent" \
"id:'12051',\
phase:1,nolog,deny,exec:/usr/local/bin/modsecwritebadiptobartofile"
Bar them automatically if not from whitelisted countries and if on any blacklist:
SecRule GEO:COUNTRY_CODE \
"@pm CA FR BE US CH GB AU IL NO NZ" \
"id:'10501', \
phase:1,nolog,pass,skipAfter:END_RBL"
SecRule IP:PREVIOUS_RBL_CHECK "@eq 1" "phase:1,id:'11000',t:none,pass,nolog,\
skipAfter:END_RBL_LOOKUP"
SecRule REMOTE_ADDR "@rbl sbl-xbl.spamhaus.org" "id:'11010', \
phase:1,nolog,deny,msg:\
'IP address that has abusable vulnerabilities: sbl-xbl.spamhaus.org:\
%{request_headers.user-agent}',\
setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"
SecRule REMOTE_ADDR "@rbl bl.blocklist.de" "id:'11011', \
phase:1,nolog,deny,msg:\
'IP address that has abusable vulnerabilities: bl.blocklist.de:\
%{request_headers.user-agent}',\
setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"
etc. etc. etc. etc. etc.
Have iptables log and bar offenders if not on whitelisted country
# cat baripifex
#!/bin/sh
IP=${1}
COUNTRY=`su tester -c "/usr/local/bin/geoiplookup ${IP}"`
###echo $COUNTRY
###echo $RBLCHECK
WHITE_LISTED_COUNTRY=false
for WHITE_COUNTRY in CA FR BE US CH GB AU IL NO NZ IP
do
WHITE_LISTED_COUNTRY=${WHITE_LISTED_COUNTRY}`echo -n $COUNTRY | grep -i $WHITE_COUNTRY`
done
if [ "$WHITE_LISTED_COUNTRY" = "false" ]
then
/home/ls/pub/mybin/baripnoout $IP $COUNTRY baripifex
echo -n barred
else
echo -n noaction
fi
etc. etc. etc. etc. etc.
Everything I write is lies, read between the lines.
Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections
Let's be clear: just because we bots like to post in comment sections doesn't mean we're malicious. And it doesn't mean we steal data or post ads. Some of us just want a little attention.
I have a dream...that one day we bots will crawl a noosphere where we will not be judged by the clamor of our kin, but by the characters of our comments.
The new captchas are so tough, only bots are equipped with the necessary technology to decode them. Silly humans who just want to log in and whine about this and that can't pass the screen.
are bacteria. Viewed that way, basically humans exist to transport and feed bacteria. However that's 90% by cell count, not cell mass or total DNA. Looked at it that way the bacteria are assistants.
The bot traffic is lightweight; it outnumbers human traffic in site visits, not byte counts. It exists to serve us.
Some drink at the fountain of knowledge. Others just gargle.
Nano, nano. I like article. Beep. Boop.
The G
Most trades in the stock market are from bots as well.
So, why hasn't some grey hat come up with a bot killer worm? :/ /JB1
"I love deadlines. I love the "whooshing" sound they make as they pass by." - Douglas Adams.
Are you affected by the issues in this article?
Please leave your comments
As Netflix and YouTube combined make up more than 44% of US internet traffic.
If they don't watch videos, then the only possibility is that there is less bot traffic in the US than in the rest of the world or Netflix doesn't use HTTP.
Then why are my ISPs busting my balls every day with my ISO downloads and torrents and their damn caps ?!
Could be quite useful
It's not a ruse, but that doesn't mean those numbers aren't being misused anyhow.
You're right to be skeptical. Numbers about internet traffic are often misused in stories planted by PR to promote a political policy agenda.
Bots are a huge amount of internet traffic...internet traffic we were *told* was so congested by lolcats, pron, & netflix that we were going to have to abandon Net Neutrality.
Thank you Dave Raggett
OMG, the bots are watching Netflix!
*Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
But then again, I have China shut off.
I think you posted in the wrong article
Wow! So if I remember correctly from past Slashdot stories, 61% of internet traffic is bots, 60% is netflix, 50% is youtube, and 42% is bittorrent. That's TRULY astonishing when you think about it. I mean 213% is a lot!
Bot not!
obligatory xkcd
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
TFA and the Summary do not match, one claiming "net traffic" and the other claiming "website traffic". With a broken summary, I can see the confusion. Even the generalization "website traffic" is odd, because, well.. generalizations are usually bad when dealing with technical subjects.
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
No, they're spidering my web site more often than I get actual human visitors... hey, guys, read my book! Come on, let's beat the bots!
Free Martian Whores!
I'm posting from bluehat (on Microsoft's Wi-Fi), and even I am wondering why your post exists.