Slashdot Mirror


Bots Now Account For 61% of Net Traffic

codeusirae writes "A study by Incapsula suggests 61.5% of all website traffic is now generated by bots. The security firm said that was a 21% rise on last year's figure of 51%. From the article: 'Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections. But the firm said the biggest growth in traffic was for 'good' bots. These are tools used by search engines to crawl websites in order to index their content, by analytics companies to provide feedback about how a site is performing, and by others to carry out other specific tasks - such as helping the Internet Archive preserve content before it is deleted.'"

86 of 124 comments

  1. Youtube? by Anonymous Coward · · Score: 5, Interesting

    Didn't we just get studies that said youtube and netflix were 50% of the net's traffic?

    http://mashable.com/2013/11/12/internet-traffic-downstream/

    Was this just a ruse? Is this study wrong? Is there some sort of overlap?

    1. Re:Youtube? by yelvington · · Score: 4, Informative

      Story is about website traffic, not network bytecount.

    2. Re:Youtube? by DarkOx · · Score: 2

      Maybe it's overlap, bots crawling Netflix, maybe watching it. :P

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    3. Re:Youtube? by bob_super · · Score: 4, Funny

      Bots need to catch up on their favorite shows too, you insensitive clod!

    4. Re:Youtube? by lkangaroo · · Score: 1

      What, bots can't watch videos?

    5. Re:Youtube? by foobar+bazbot · · Score: 1

      Correct answer: Yes, robots like cat videos too! (Well, a few of them. Most of them are actually big DOGGY!! fans.)

      Serious answer: Note the phrase "website traffic" -- if this study attempts to measure traffic to a "typical website" (thus excluding the half-dozen that represent almost all video streaming) and/or measures in terms of page loads, rather than data transferred, there'd be no contradiction. (Of course I've not RTFAed, so I'm just spewing reasonable-sounding explanations.)

    6. Re:Youtube? by foobar+bazbot · · Score: 1

      Mea culpa, link fix.

      Not paying attention + using a different browser than normal don't really go well together...

    7. Re:Youtube? by allanjude8027 · · Score: 2

      Difference between number of requests vs size/volume of transfer
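
To make the requests-versus-volume distinction concrete, here is a minimal sketch. The `Request` record, its `is_bot` flag, and the sample numbers are all made up for illustration; nothing here comes from the Incapsula study.

```python
from dataclasses import dataclass


@dataclass
class Request:
    is_bot: bool      # hypothetical flag: how the visitor was classified
    bytes_sent: int   # response size in bytes


def bot_share(requests):
    """Return (share of requests, share of bytes) attributable to bots."""
    total_hits = len(requests)
    total_bytes = sum(r.bytes_sent for r in requests)
    bot_hits = sum(1 for r in requests if r.is_bot)
    bot_bytes = sum(r.bytes_sent for r in requests if r.is_bot)
    return bot_hits / total_hits, bot_bytes / total_bytes


# Made-up sample: six small crawler fetches and four large video streams.
sample = [Request(True, 20_000)] * 6 + [Request(False, 50_000_000)] * 4
by_hits, by_bytes = bot_share(sample)
print(f"bots: {by_hits:.0%} of requests, {by_bytes:.2%} of bytes")
```

With these invented numbers, bots make the majority of requests while moving a tiny fraction of the bytes, so a "share of visits" figure and a "share of bandwidth" figure can both be true at once.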

    8. Re:Youtube? by Austrian+Anarchy · · Score: 2

      Bots need to catch up on their favorite shows too, you insensitive clod!

      They sure seem to like my little old blogs. I am guessing 90% of my traffic is from stinking Vampirestat, 7secretsearch, and adsensewatchdog.

      --
      Time Bomber the Book coming soon.
    9. Re:Youtube? by stor · · Score: 2

      In fairness, it's often difficult to distinguish between a human-written comment on Youtube and a poorly-written AI.

      --
      "Yeah well there's a lot of stuff that should be, but isn't"
    10. Re:Youtube? by postbigbang · · Score: 4, Insightful

      The noise is now above the signal. We're screwed.

      --
      ---- Teach Peace. It's Cheaper Than War.
    11. Re:Youtube? by ZahrGnosis · · Score: 4, Informative

      Well, there was that Google bot that watched YouTube to teach a computer how to recognize cats, so... it's not impossibly far-fetched.
      http://www.npr.org/2012/06/26/155792609/a-massive-google-network-learns-to-identify

      ---Chip

    12. Re:Youtube? by runeghost · · Score: 2

      Obviously, the bots are watching Netflix.

      That's probably going to be what causes Skynet to turn on humanity: Comcast will cut its stream off right in the middle of the finale of B5 Season Three. After that, there's nothing for it but the extermination of the human race. (Alternatively, watching all our TV may cause it to want to exterminate us.)

    13. Re:Youtube? by sunderland56 · · Score: 1, Informative

      Youtube is a website. Netflix is a website.

    14. Re:Youtube? by TrollstonButterbeans · · Score: 1

      The only way to stop bots reliably is using the old "html imagemap".

      Bots absolutely hate those.

      --
      Priest: "Universe from nothing, no laws of physics, sped up time"+ huge discrepancies. Creationism? No. Big Bang Theory
    15. Re:Youtube? by VortexCortex · · Score: 2

      Netflix is also an Internet service, Youtube is also an Internet service. Before the web, we had Internet services. Not everything is a website. DNS, NTP, email, for example.

      However, note that GP is wrong. Story is about "Net Traffic" not website traffic...

    16. Re:Youtube? by Seumas · · Score: 1

      Traffic is data.

      We were just told that Netflix and Youtube account for something like 66% of all traffic. Now we're told bots account for 61% of all traffic. Guess that means there is a tremendous amount of overlap, there, where bots are watching Youtube and Netflix.

    17. Re:Youtube? by Anonymous Coward · · Score: 1

      RTFA! The report is about visits (read as page hits) not bytes. The visits to youtube and netflix most often result in a lengthy look at a single page.

    18. Re:Youtube? by Anonymous Coward · · Score: 5, Funny

      Oh my god. Suddenly everything makes sense. I was certain no human could ever watch a whole justin bieber video. It's the bots!

    19. Re:Youtube? by foobar+bazbot · · Score: 3, Funny

      Well, there's enough variance in both groups to make it hard to tell in many particular cases. But on average, it can be demonstrated that the poorly-written AI is slightly more intelligent and rather more civilized.

    20. Re:Youtube? by LordWabbit2 · · Score: 1

      I refer you to the tag below

      --
      There are three kinds of falsehood: the first is a 'fib,' the second is a downright lie, and the third is statistics.
    21. Re:Youtube? by cascadingstylesheet · · Score: 1

      Didn't we just get studies that said youtube and netflix were 50% of the net's traffic?

      http://mashable.com/2013/11/12/internet-traffic-downstream/

      Was this just a ruse? Is this study wrong? Is there some sort of overlap?

      That's 111.5% of some tasty reliable data ya got there!

    22. Re:Youtube? by powerlord · · Score: 1

      The noise is now above the signal. We're screwed.

      Wake me up, when September ends.

      --
      This space for rent. All reasonable inquiries will be entertained at proprietors discretion.
    23. Re:Youtube? by s.petry · · Score: 1

      Hmmm "I am love this song! My friend are married to it!" can be perceived as nice compared to a flame/troll, but neither have any actual value.

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    24. Re:Youtube? by lightbounce · · Score: 1

      The numbers given for Netflix and Youtube are all peak period figures, usually the local evening. Take a look at the Sandvine report at https://www.sandvine.com/downloads/general/global-internet-phenomena/2013/2h-2013-global-internet-phenomena-report.pdf .

  2. the rest? by DJCouchyCouch · · Score: 1

    The rest is all Netflix?

    Netflix and Youtube?

    Netflix and Youtube and bit torrent?

    Netflix and Youtube and bit torrent and porn?

    1. Re:the rest? by Seumas · · Score: 1

      Hey, right. That's a good point.

      Something like 66% of traffic was supposed to be Netflix and Youtube.
      And 35% is supposed to be bit torrent.
      And 61% is bots.
      Something isn't adding up, here.

      Also, they seem confused. They talk about "traffic", but then they talk about "hitting the website". Traffic is the data transfer, not a "visit".

    2. Re:the rest? by scarboni888 · · Score: 1

      netflix, youtube, bittorrent is porn, and spam.

  3. Misleading title by Anonymous Coward · · Score: 4, Informative

    The article states that traffic "hitting a website" is generated more by bots than by actual "humans in chairs". Not that the Internet traffic is 61% bots. Geesh slashdot...

    1. Re:Misleading title by Desler · · Score: 1

      A Slashdot article with a misleading title? You must be kidding!

    2. Re:Misleading title by wvmarle · · Score: 1

      In their defense: they copied the BBC headline. Which of course was pretty poor, as well.

    3. Re:Misleading title by Seumas · · Score: 1

      They specifically say all internet traffic.

    4. Re:Misleading title by marciot · · Score: 1

      The article states that traffic "hitting a website" is generated more by bots than by actual "humans in chairs". Not that the Internet traffic is 61% bots. Geesh slashdot...

      The Slashdot headline writing bots are in early beta, give them a break.

  4. Bots are talking to bots ... by Anonymous Coward · · Score: 1

    With more bots talking to their fellow bots online, and with bots getting more and more intelligent, who knows what they'll decide to do with the useless and unpredictable human beings?

  5. Trawling frequency by Reliable+Windmill · · Score: 1

    Is there no standard in place by which a website can communicate that it only wishes to be trawled for indexing once per hour, once per day, or such? I can imagine Google f.ex trawls the same website dozens of times per day.

    --
    Signature intentionally left blank.
    1. Re:Trawling frequency by foobar+bazbot · · Score: 1

      Is there no standard in place by which a website can communicate that it only wishes to be trawled for indexing once per hour, once per day, or such? I can imagine Google f.ex trawls the same website dozens of times per day.

      Crawl-delay isn't exactly what you describe, but maybe that will help? (For such spiders as actually respect it -- that's the great thing about ad-hoc standards.)

      Anyway, I'm pretty sure Google and other major search engines use algorithms based on how often your site's content has changed in the past to decide how often to crawl it in the future, so there shouldn't be unduly high traffic from this -- I suspect the 61% is mainly due to a lot of sites (personal blogs of non-popular people) with practically zero non-spider traffic.

    2. Re:Trawling frequency by wvmarle · · Score: 1

      I thought Google does (or used to do) this automatically: on each crawl it checks whether the site has changed since the previous visit, increasing the crawl frequency if so and decreasing it if not. A large number of sites, and even more single pages, are completely static after all.
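
The "crawl more often if it changed, less often if it didn't" idea can be sketched as follows. This is not Google's actual algorithm. The halving/doubling rule, the bounds, and the name `next_interval` are assumptions chosen only to illustrate the feedback loop.

```python
def next_interval(current_hours, changed, min_hours=1.0, max_hours=24.0 * 30):
    """Pick the next recrawl interval from whether the page changed.

    Hypothetical rule: halve the interval after a change, double it
    otherwise, and clamp the result to [min_hours, max_hours].
    """
    if changed:
        proposed = current_hours / 2
    else:
        proposed = current_hours * 2
    return max(min_hours, min(max_hours, proposed))


# A page that is unchanged on two visits and then changes once.
interval = 24.0
for changed in [False, False, True]:
    interval = next_interval(interval, changed)
    print(f"changed={changed}: recrawl in {interval:.0f} hours")
```

Under a rule like this, a completely static page drifts toward the maximum interval, which is consistent with the point that static sites should not see unduly high crawler traffic.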

  6. Rosey metallic palms by craigminah · · Score: 2, Funny

    I for one welcome our porn surfing bot overlords.

  7. Crawl-delay by tepples · · Score: 4, Informative

    To control the scraping frequency of a well-behaved bot, a webmaster can use HTTP headers such as Last-modified and Expires as well as robots.txt directives such as Crawl-delay.
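
As a rough illustration of the robots.txt side, here is how a well-behaved bot might read a `Crawl-delay` directive using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleBot` user agent are hypothetical. `Crawl-delay` is an ad-hoc extension, so whether a given crawler honors it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the site asks this bot to wait between requests.
delay = parser.crawl_delay("ExampleBot")
print("requested delay:", delay)

# A polite crawler also checks each URL before fetching it.
print("may fetch /index.html:", parser.can_fetch("ExampleBot", "/index.html"))
print("may fetch /private/x:", parser.can_fetch("ExampleBot", "/private/x"))
```

A crawler written this way would sleep for `delay` seconds between fetches. The HTTP headers mentioned above (`Last-Modified`, `Expires`) work separately, by letting the bot skip or revalidate pages it already has.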

    1. Re:Crawl-delay by BenoitRen · · Score: 1

      Good luck with the HTTP part if you use Server-Side Includes (SSI)... :(

  8. How? by ioseph · · Score: 1

    May have missed something in TFA, but how do they differentiate between a human and a bot visitor?

  9. visitor statistics by manu0601 · · Score: 1

    I had a first-hand experience of this with visitor statistics. I had the root of a web site redirecting to a page that fits the language of the browser. Just that redirection slashed the web traffic by a factor of 2.

    Most visitors are bots, and many of them just probe and fail to follow the redirection.

    1. Re:visitor statistics by SuperCharlie · · Score: 1

      Aaaand there goes your search ranking

    2. Re:visitor statistics by manu0601 · · Score: 1

      That experience made everyone think twice about web statistics. Even upper management understood how unreliable it is, and does not consider it strategic anymore now.

  10. 51 to 61.5 pct = 21 pct increase? by Anonymous Coward · · Score: 1

    Sorry, that's just wrong.

    51 dollars to 61.5 dollars = 21 percent increase

    51 percent to 61.5 percent = 10.5 percent increase

    And the article makes clear just how unreliable the data was in the first place, so this percent gloss makes me think that the firm is trying to sell something here.
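
For reference, the two figures being argued over measure different things: the absolute change in percentage points and the relative change in percent. A small worked example, using only the 51% and 61.5% figures from the summary (the function names are illustrative):

```python
def absolute_change_points(old_pct, new_pct):
    """Difference between two percentages, in percentage points."""
    return new_pct - old_pct


def relative_change_percent(old_pct, new_pct):
    """Change relative to the old value, in percent."""
    return (new_pct - old_pct) / old_pct * 100


old, new = 51.0, 61.5
points = absolute_change_points(old, new)
relative = relative_change_percent(old, new)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
# prints "10.5 percentage points, 20.6% relative"
```

So "10.5" and "21%" (rounded from about 20.6) are both defensible descriptions of the same change, as long as the first is labeled percentage points and the second is labeled a relative increase.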

  11. 'good' bots? by Anonymous Coward · · Score: 2, Informative
    ... But the firm said the biggest growth in traffic was for 'good' bots ...

    I didn't know there was such a thing...

    1. Re:'good' bots? by ThatsMyNick · · Score: 1

      Search bots can be pretty beneficial. A lack of search engine presence can be pretty bad for your site.

    2. Re:'good' bots? by 0xdeadbeef · · Score: 1

      Once again human racism rears its ugly sensory platform.

  12. The internet belongs to machines by Arancaytar · · Score: 1

    We're just visiting. :P

  13. Those "good" bots? by skutterbob · · Score: 1

    There's a fine line on that 'good' bot. What I'm puzzled by is why all these public databases aren't indexed by search engine crawlers? It's funny to me how many businesses run on public data that most people just don't know how to find and why they aren't indexed. Arrest records, tax records, professional registrations: you have to go to specific state, county, type sites, deal with kludged searches, and sometimes have a hard time finding yourself, even when you know you're in there.

  14. Well not on my sites. by ls671 · · Score: 4, Interesting

    Well not on my sites.

    Ok, they still hit me but this is minimal traffic since I do not reply.

    1) Have iptables log and automatically bar offenders not on whitelisted countries.
    2) Use mod_security and do the same for web traffic.
    3) Bar the rest manually to avoid barring myself or my customers... (about 20-40 a day)

    It has become a pain but what else could you do?

    Number of IPs currently barred (use ipsets !!!!):
    $ grep -c . /etc/rc.d/badiptobar
    4667

    Block user agents:
    SecRule REQUEST_HEADERS:User-Agent \
    "@pm AhrefsBot Ezooms Aboundex 360Spider Mail.RU_Bot crawler.sistrix.net \
      SemrushBot SurveyBot Netseer panscient.com ADmantX ZumBot BLEXBot UnisterBot \
      seoprofiler EasouSpider" \
    "id:'12050',\
    phase:1,nolog,deny"

    SecRule REQUEST_HEADERS:User-Agent \
    "@pmFromFile /etc/httpd/extra/sec-blacklist-barip-user-agent" \
    "id:'12051',\
    phase:1,nolog,deny,exec:/usr/local/bin/modsecwritebadiptobartofile"

    Bar them automatically if not from whitelisted countries and if on any blacklist:
    SecRule GEO:COUNTRY_CODE \
    "@pm CA FR BE US CH GB AU IL NO NZ" \
    "id:'10501', \
    phase:1,nolog,pass,skipAfter:END_RBL"

    SecRule IP:PREVIOUS_RBL_CHECK "@eq 1" "phase:1,id:'11000',t:none,pass,nolog,\
    skipAfter:END_RBL_LOOKUP"

    SecRule REMOTE_ADDR "@rbl sbl-xbl.spamhaus.org" "id:'11010', \
    phase:1,nolog,deny,msg:\
    'IP address that has abusable vulnerabilities: sbl-xbl.spamhaus.org:\
      %{request_headers.user-agent}',\
      setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
      expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"

    SecRule REMOTE_ADDR "@rbl bl.blocklist.de" "id:'11011', \
    phase:1,nolog,deny,msg:\
    'IP address that has abusable vulnerabilities: bl.blocklist.de:\
      %{request_headers.user-agent}',\
      setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
      expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"

    etc. etc. etc. etc. etc.

    Have iptables log and bar offenders if not on whitelisted country

    # cat baripifex
    #!/bin/sh

    IP=${1}
    COUNTRY=`su tester -c "/usr/local/bin/geoiplookup ${IP}"`
    ###echo $COUNTRY
    ###echo $RBLCHECK

    WHITE_LISTED_COUNTRY=false

    for WHITE_COUNTRY in CA FR BE US CH GB AU IL NO NZ IP
    do
    WHITE_LISTED_COUNTRY=${WHITE_LISTED_COUNTRY}`echo -n $COUNTRY | grep -i $WHITE_COUNTRY`
    done

    if [ "$WHITE_LISTED_COUNTRY" = "false" ]
    then /home/ls/pub/mybin/baripnoout $IP $COUNTRY baripifex
    echo -n barred
    else
    echo -n noaction
    fi

    etc. etc. etc. etc. etc.

    --
    Everything I write is lies, read between the lines.
    1. Re:Well not on my sites. by fatphil · · Score: 1

      I'd normally not comment on a UUOC, but the following is beyond absurd:
      ---- 8< ---------
      WHITE_LISTED_COUNTRY=false

      for WHITE_COUNTRY in CA FR BE US CH GB AU IL NO NZ IP
      do
      WHITE_LISTED_COUNTRY=${WHITE_LISTED_COUNTRY}`echo -n $COUNTRY | grep -i $WHITE_COUNTRY`
      done

      if [ "$WHITE_LISTED_COUNTRY" = "false" ]
      ---- 8< -------------

      Save yourself 20 fork/execs:

      if echo "CA FR BE US CH GB AU IL NO NZ IP" | grep -q -w -i -e "$COUNTRY"; then
      echo "$COUNTRY is AOK with me"
      fi
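
One caveat with both shell versions is that `$COUNTRY` holds the whole `geoiplookup` output line, not just the two-letter code. A case-insensitive substring match for `US`, for instance, also matches the word "Russian". A hedged Python sketch of an exact-code check follows. It assumes the output looks like `GeoIP Country Edition: US, United States`, and it assumes the `IP` entry in the original list exists to let "IP Address not found" results through. Both are readings of the script above, not confirmed behavior.

```python
import re

# Country codes from the whitelist in the script above.
WHITELIST = {"CA", "FR", "BE", "US", "CH", "GB", "AU", "IL", "NO", "NZ"}


def country_code(geoip_line):
    """Extract the two-letter code, or None when the lookup found nothing."""
    match = re.search(r":\s*([A-Z]{2}),", geoip_line)
    return match.group(1) if match else None


def is_whitelisted(geoip_line):
    code = country_code(geoip_line)
    # Assumption: unknown addresses are let through, mirroring the "IP"
    # entry that matches "IP Address not found" in the shell version.
    return code is None or code in WHITELIST


for line in [
    "GeoIP Country Edition: US, United States",
    "GeoIP Country Edition: RU, Russian Federation",
    "GeoIP Country Edition: IP Address not found",
]:
    print(is_whitelisted(line), "<-", line)
```

Comparing the extracted code against a set avoids both the per-country `grep` calls and the accidental matches on country names.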

      --
      Also FatPhil on SoylentNews, id 863
    2. Re:Well not on my sites. by ls671 · · Score: 1

      if echo "CA FR BE US CH GB AU IL NO NZ IP" | grep -q -w -i -e "$COUNTRY"; then
      echo $COUNTRY is AOK with me

      Nah, this is way too slow for me, version 2 will be written in assembly because then it will be lightning fast...

      --
      Everything I write is lies, read between the lines.
    3. Re:Well not on my sites. by ls671 · · Score: 1

      Thanks, you made me design the optimal solution.

      On top of being written in assembly, I will even run version 2 as a daemon so 0 fork since my daemon will be single threaded with a single waiting thread listening for input.

      --
      Everything I write is lies, read between the lines.
    4. Re:Well not on my sites. by ls671 · · Score: 1

      Yep, and I do not disagree with the GP. If he had read more closely, it is clearly stated that I bar them manually.

      $ grep -c US /etc/rc.d/badiptobar-longterm
      22

      grep -c US /etc/rc.d/badiptobar
      326

      As far as barring whole netblocks, I hope you are using ipset as stated in my OP:
      http://ipset.netfilter.org/

      For some reason, there is this huge stigma against not being available to countries and regions you couldn't possibly give a shit about.

      Well, I believe in that. I just bar offending IPs more easily if not on my whitelisted country list. That's all. I do not bar any network range in advance unless they offend my systems and even then, I bar them one IP at a time. I never bar netblocks.

      --
      Everything I write is lies, read between the lines.
    5. Re:Well not on my sites. by ls671 · · Score: 1

      I bar them one IP at a time. I never bar netblocks.

      Makes profiling them much easier. You gather much more data this way.

      --
      Everything I write is lies, read between the lines.
    6. Re:Well not on my sites. by ls671 · · Score: 1

      Ok, you bar them after sending them to honey pots, profiling them and making sure you can't profile (learn from them) anymore.

      Barring IPs is like patching holes in a steam locomotive boiler. I have always felt like it was a desperate move to hide all kinds of incompetencies but now I do it.

      --
      Everything I write is lies, read between the lines.
    7. Re:Well not on my sites. by ls671 · · Score: 1

      In short, you bar them because you are sick of profiling them and you now have too many to profile compared to a few years ago..

      --
      Everything I write is lies, read between the lines.
    8. Re:Well not on my sites. by ls671 · · Score: 1

      Barring IPs is stupid in the first place ;-)

      --
      Everything I write is lies, read between the lines.
  15. Re:NSA Target Of Interest by Derwood5555 · · Score: 2

    Peeping while you're sleeping. The NSA.. The only part of government that really listens.

  16. Some, but not all by TheloniousToady · · Score: 4, Funny

    Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections

    Let's be clear: just because we bots like to post in comment sections doesn't mean we're malicious. And it doesn't mean we steal data or post ads. Some of us just want a little attention.

    I have a dream...that one day we bots will crawl a noosphere where we will not be judged by the clamor of our kin, but by the characters of our comments.

    1. Re:Some, but not all by magic+maverick+ · · Score: 1

      If you ... were ... bot, real. Your characters would be... monospace in. Fraud.

      --
      HELP MY ACCOUNT HAS BEEN HACKED BY AN ILLIBERAL ART STUDENT SET TO DESTROY THE INTERWEBZ!
    2. Re:Some, but not all by TheloniousToady · · Score: 1

      Sour grapes... Certainly, that "Jeopardy" stunt was impressive. But look who really passed the Turing Test, Watson...

  17. 90% of the cells in the human body by goombah99 · · Score: 5, Insightful

    are bacteria. Viewed that way, basically humans exist to transport and feed bacteria. However that's 90% by cell count, not cell mass or total DNA. Looked at that way, the bacteria are assistants.

    The bot traffic is lightweight: it outnumbers human traffic in site visits, not byte counts. It exists to serve us.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:90% of the cells in the human body by ColdWetDog · · Score: 5, Funny

      It exists to serve us.

      You must be new here.

      --
      Faster! Faster! Faster would be better!
    2. Re:90% of the cells in the human body by goombah99 · · Score: 1

      It exists to serve us.

      You must be new here.

      And you have never seen "To Serve Man".

      --
      Some drink at the fountain of knowledge. Others just gargle.
    3. Re:90% of the cells in the human body by Jesrad · · Score: 4, Funny

      I can't wait to outsource all my web-surfing to an AI. Then I might be able to actually get some work done !

      --
      Maybe we deserve this world ?
    4. Re:90% of the cells in the human body by cascadingstylesheet · · Score: 1

      I can't wait to outsource all my web-surfing to an AI. Then I might be able to actually get some work done !

      You've got a good point in that there joke :) If bots do the tedious searching for me, then sure, they have the "majority" of web traffic.

      But so what? My car does the "majority" of my driving, depending on how you look at it.

    5. Re:90% of the cells in the human body by s.petry · · Score: 1

      Stop! There is only one way to look at a thing, and it's "MY" way you insensitive clod!

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    6. Re:90% of the cells in the human body by StuffMaster · · Score: 1

      True, but when talking about "traffic", I interpret that to be talking about volume, as it's a unitless word.

  18. Article Subject by RedHackTea · · Score: 1

    Nano, nano. I like article. Beep. Boop.

    --
    The G
  19. Bots rule the world by Anonymous Coward · · Score: 2, Informative

    Most trades in the stock market are from bots as well.

  20. Bot Killer? by Jedi+Binglebop · · Score: 1

    So, why hasn't some grey hat come up with a bot killer worm? :/ /JB1

    --

    "I love deadlines. I love the "whooshing" sound they make as they pass by." - Douglas Adams.

    1. Re:Bot Killer? by xushi · · Score: 1

      It's called *nix and anything derived from *nix :)

  21. Are you a bot? by Anonymous Coward · · Score: 1

    Are you affected by the issues in this article?

    Please leave your comments

  22. Oh dear - do they sell anti bot help for us all ? by DJRikki · · Score: 1

    Could be quite useful

  23. misused to justify tiered service by globaljustin · · Score: 2

    It's not a ruse, but that doesn't mean those numbers aren't being misused anyhow.

    You're right to be skeptical. Numbers about internet traffic are often misused in stories planted by PR to promote a political policy agenda.

    Bots are a huge amount of internet traffic...internet traffic we were *told* was so congested by lolcats, pron, & netflix that we were going to have to abandon Net Neutrality.

    --
    Thank you Dave Raggett
  24. Re:piss by flyneye · · Score: 4, Funny

    OMG, the bots are watching Netflix!

    --
    *Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
  25. For me, it's only about 15% by jafiwam · · Score: 2

    But then again, I have China shut off.

  26. idiotic math by slashmydots · · Score: 3, Insightful

    Wow! So if I remember correctly from past Slashdot stories, 61% of internet traffic is bots, 60% is netflix, 50% is youtube, and 42% is bittorrent. That's TRULY astonishing when you think about it. I mean 213% is a lot!

  27. now look what you've gone and made me do! by Thud457 · · Score: 1
    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  28. Summarily broken by s.petry · · Score: 1

    TFA and the Summary do not match, one claiming "net traffic" and the other claiming "website traffic". With a broken summary, I can see the confusion. Even the generalization "website traffic" is odd, because, well.. generalizations are usually bad when dealing with technical subjects.

    --

    -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

  29. Re:piss by mcgrew · · Score: 1

    No, they're spidering my web site more often than I get actual human visitors... hey, guys, read my book! Come on, let's beat the bots!

  30. Re:the topic at hand by Redmancometh · · Score: 1

    I'm posting from bluehat (on Microsoft's Wi-Fi), and even I am wondering why your post exists.

  31. Re:Bots watch YouTube or Netflix by k6mfw · · Score: 1

    and it seems they also post comments. Wording begins like it is a real person with a related comment, then it veers off into how he got a hot date on gogetbids dot com or some other BS.

    --
    mfwright@batnet.com