Slashdot Mirror


Bots Now Account For 61% of Net Traffic

codeusirae writes "A study by Incapsula suggests 61.5% of all website traffic is now generated by bots. The security firm said that was a 21% rise on last year's figure of 51%. From the article: 'Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections. But the firm said the biggest growth in traffic was for 'good' bots. These are tools used by search engines to crawl websites in order to index their content, by analytics companies to provide feedback about how a site is performing, and by others to carry out other specific tasks - such as helping the Internet Archive preserve content before it is deleted.'"

16 of 124 comments

  1. Youtube? by Anonymous Coward · · Score: 5, Interesting

    Didn't we just get studies that said YouTube and Netflix were 50% of the net's traffic?

    http://mashable.com/2013/11/12/internet-traffic-downstream/

    Was this just a ruse? Is this study wrong? Is there some sort of overlap?

    1. Re:Youtube? by yelvington · · Score: 4, Informative

      Story is about website traffic, not network byte count.

    2. Re:Youtube? by bob_super · · Score: 4, Funny

      Bots need to catch up on their favorite shows too, you insensitive clod!

    3. Re:Youtube? by postbigbang · · Score: 4, Insightful

      The noise is now above the signal. We're screwed.

      --
      ---- Teach Peace. It's Cheaper Than War.
    4. Re:Youtube? by ZahrGnosis · · Score: 4, Informative

      Well, there was that Google bot that watched YouTube to teach a computer how to recognize cats, so... it's not impossibly far-fetched.
      http://www.npr.org/2012/06/26/155792609/a-massive-google-network-learns-to-identify

      ---Chip

    5. Re:Youtube? by Anonymous Coward · · Score: 5, Funny

      Oh my god. Suddenly everything makes sense. I was certain no human could ever watch a whole Justin Bieber video. It's the bots!

    6. Re:Youtube? by foobar+bazbot · · Score: 3, Funny

      Well, there's enough variance in both groups to make it hard to tell in many particular cases. But on average, it can be demonstrated that the poorly-written AI is slightly more intelligent and rather more civilized.

  2. Misleading title by Anonymous Coward · · Score: 4, Informative

    The article states that traffic "hitting a website" is generated more by bots than by actual "humans in chairs". Not that the Internet traffic is 61% bots. Geesh slashdot...

  3. Crawl-delay by tepples · · Score: 4, Informative

    To control the scraping frequency of a well-behaved bot, a webmaster can use HTTP headers such as Last-Modified and Expires as well as robots.txt directives such as Crawl-delay.
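    Crawl-delay is a non-standard robots.txt extension (Googlebot, for one, ignores it), so it only helps with crawlers that choose to honor it. As a minimal sketch of the crawler's side, assuming a simplified robots.txt grammar, the hypothetical shell function below picks the delay that applies to a given user agent, preferring a group that names the agent over the "*" group:

```shell
#!/bin/sh
# crawl_delay FILE AGENT  (hypothetical helper)
# Print the Crawl-delay value that robots.txt FILE sets for AGENT, falling
# back to the "User-agent: *" group. Simplified parser: it only looks at
# User-agent and Crawl-delay lines and ignores every other directive.
crawl_delay() {
    awk -v ua="$2" '
        BEGIN { ua = tolower(ua) }
        {
            sub(/#.*/, ""); sub(/\r$/, "")        # drop comments and CRs
            colon = index($0, ":")
            if (colon == 0) next                  # blank or malformed line
            key = tolower(substr($0, 1, colon - 1))
            val = substr($0, colon + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "user-agent") {
                # Consecutive User-agent lines share one group; a User-agent
                # line that follows a rule line starts a new group.
                if (!in_agents) { mine = 0; star = 0 }
                in_agents = 1
                if (tolower(val) == ua) mine = 1
                if (val == "*") star = 1
            } else {
                in_agents = 0
                if (key == "crawl-delay") {
                    if (mine) specific = val
                    else if (star && wild == "") wild = val
                }
            }
        }
        END {
            # A group that names the agent wins over the "*" group.
            if (specific != "") print specific
            else if (wild != "") print wild
        }
    ' "$1"
}
```

    A crawler would then sleep for that many seconds between requests to the site; a real parser also has to handle Allow/Disallow rules and agent-token matching, which this sketch skips.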

  4. Well not on my sites. by ls671 · · Score: 4, Interesting

    Well not on my sites.

    OK, they still hit me, but it's minimal traffic since I don't reply.

    1) Have iptables log and automatically bar offenders not from whitelisted countries.
    2) Use mod_security and do the same for web traffic.
    3) Bar the rest manually to avoid barring myself or my customers... (about 20-40 a day)

    It has become a pain, but what else can you do?

    Number of IPs currently barred (use ipsets!!!!):
    $ grep -c . /etc/rc.d/badiptobar
    4667
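    The ipsets tip matters because iptables walks its rules one by one, so a rule per address scales poorly at a few thousand entries, while a single rule can match against a hash set. A sketch, assuming /etc/rc.d/badiptobar holds one IPv4 address per line (possibly followed by other fields); the function name and set name are hypothetical:

```shell
#!/bin/sh
# ipset_restore_script SETNAME FILE  (hypothetical helper)
# Turn a plain list of barred IPv4 addresses into input for "ipset restore".
# Assumes the address is the first whitespace-separated field of each line;
# anything that does not look like a dotted quad is skipped, and duplicates
# are dropped.
ipset_restore_script() {
    printf 'create %s hash:ip family inet\n' "$1"
    awk '{ print $1 }' "$2" |
        grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' |
        sort -u |
        while IFS= read -r ip; do
            printf 'add %s %s\n' "$1" "$ip"
        done
}
```

    Loading and matching it would then look like `ipset_restore_script badiptobar /etc/rc.d/badiptobar | ipset -exist restore` followed by a single `iptables -I INPUT -m set --match-set badiptobar src -j DROP`. The `-exist` option tells ipset to ignore an already existing set or entry, so the load can be rerun after the list grows.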

    Block user agents:
    SecRule REQUEST_HEADERS:User-Agent \
    "@pm AhrefsBot Ezooms Aboundex 360Spider Mail.RU_Bot crawler.sistrix.net \
      SemrushBot SurveyBot Netseer panscient.com ADmantX ZumBot BLEXBot UnisterBot \
      seoprofiler EasouSpider" \
    "id:'12050',\
    phase:1,nolog,deny"

    SecRule REQUEST_HEADERS:User-Agent \
    "@pmFromFile /etc/httpd/extra/sec-blacklist-barip-user-agent" \
    "id:'12051',\
    phase:1,nolog,deny,exec:/usr/local/bin/modsecwritebadiptobartofile"
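    The modsecwritebadiptobartofile script itself isn't shown. One plausible shape for it is sketched below, assuming ModSecurity's exec action runs the script with the usual CGI environment variables (so the client address is in REMOTE_ADDR) and that the web server user may append to the list file; BADIP_FILE and record_bad_ip are hypothetical names:

```shell
#!/bin/sh
# Hypothetical stand-in for modsecwritebadiptobartofile. Assumes the client
# address arrives in the REMOTE_ADDR environment variable and that the list
# file (BADIP_FILE, a hypothetical override) holds one address per line.

# record_bad_ip IP FILE: append IP to FILE unless it is malformed or already listed.
record_bad_ip() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
    grep -qxF "$1" "$2" 2>/dev/null && return 0   # already on the list
    printf '%s\n' "$1" >> "$2"
}

if [ -n "${REMOTE_ADDR:-}" ]; then
    record_bad_ip "$REMOTE_ADDR" "${BADIP_FILE:-/etc/rc.d/badiptobar}"
    # ModSecurity expects an exec'd script to write something to stdout.
    echo done
fi
```

    Concurrent requests can still race between the grep and the append; at worst that leaves a duplicate line, which the ipset load ignores anyway.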

    Bar them automatically if not from whitelisted countries and if on any blacklist:
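    # Assumes an earlier rule already ran "@geoLookup" on REMOTE_ADDR;
    # otherwise GEO:COUNTRY_CODE is never set and this whitelist never matches.
    SecRule GEO:COUNTRY_CODE \
    "@pm CA FR BE US CH GB AU IL NO NZ" \
    "id:'10501', \
    phase:1,nolog,pass,skipAfter:END_RBL"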

    SecRule IP:PREVIOUS_RBL_CHECK "@eq 1" "phase:1,id:'11000',t:none,pass,nolog,\
    skipAfter:END_RBL_LOOKUP"

    SecRule REMOTE_ADDR "@rbl sbl-xbl.spamhaus.org" "id:'11010', \
    phase:1,nolog,deny,msg:\
    'IP address that has abusable vulnerabilities: sbl-xbl.spamhaus.org:\
      %{request_headers.user-agent}',\
      setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
      expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"

    SecRule REMOTE_ADDR "@rbl bl.blocklist.de" "id:'11011', \
    phase:1,nolog,deny,msg:\
    'IP address that has abusable vulnerabilities: bl.blocklist.de:\
      %{request_headers.user-agent}',\
      setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
      expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"

    etc. etc. etc. etc. etc.
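    The @rbl checks above rely on the usual DNSBL convention for IPv4: reverse the address's octets, append the blocklist zone, and resolve the result. A minimal sketch of building that query name (dnsbl_query_name is a hypothetical helper):

```shell
#!/bin/sh
# dnsbl_query_name IPV4 ZONE  (hypothetical helper)
# Print the DNS name a DNSBL lookup queries: the octets in reverse order,
# followed by the blocklist zone. Prints nothing unless the input has
# exactly four dot-separated parts.
dnsbl_query_name() {
    printf '%s\n' "$1" |
        awk -F. -v zone="$2" 'NF == 4 { printf "%s.%s.%s.%s.%s\n", $4, $3, $2, $1, zone }'
}
```

    Checking an address by hand is then e.g. `host "$(dnsbl_query_name 192.0.2.1 bl.blocklist.de)"`: an answer in 127.0.0.0/8 means the address is listed, while NXDOMAIN means it is not.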

    Have iptables log and bar offenders if not from a whitelisted country:

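    # cat baripifex
    #!/bin/sh
    # Bar an IP unless it geolocates to a whitelisted country.

    IP=${1}
    # geoiplookup prints e.g. "GeoIP Country Edition: US, United States",
    # or "GeoIP Country Edition: IP Address not found" for unknown addresses.
    COUNTRY=$(su tester -c "/usr/local/bin/geoiplookup ${IP}")

    # Compare the exact two-letter code: a case-insensitive substring grep
    # also hits text like "GeoIP" (for IP), "China" (for CH) or "Russian" (for US).
    CODE=$(printf '%s\n' "$COUNTRY" |
        sed -n 's/^GeoIP Country Edition: \([A-Z0-9]\{2\}\),.*/\1/p' | head -n 1)

    # An empty code means the lookup failed ("IP Address not found"); leave those alone.
    case "$CODE" in
    ""|CA|FR|BE|US|CH|GB|AU|IL|NO|NZ)
        printf '%s' noaction ;;
    *)
        /home/ls/pub/mybin/baripnoout "$IP" $COUNTRY baripifex
        printf '%s' barred ;;
    esac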

    etc. etc. etc. etc. etc.

    --
    Everything I write is lies, read between the lines.
  5. Some, but not all by TheloniousToady · · Score: 4, Funny

    Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections

    Let's be clear: just because we bots like to post in comment sections doesn't mean we're malicious. And it doesn't mean we steal data or post ads. Some of us just want a little attention.

    I have a dream...that one day we bots will crawl a noosphere where we will not be judged by the clamor of our kin, but by the characters of our comments.

  6. 90% of the cells in the human body by goombah99 · · Score: 5, Insightful

    are bacteria. Viewed that way, humans basically exist to transport and feed bacteria. However, that's 90% by cell count, not by cell mass or total DNA. Looked at that way, the bacteria are assistants.

    Bot traffic is lightweight: it outnumbers human traffic in site visits, not in byte counts. It exists to serve us.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:90% of the cells in the human body by ColdWetDog · · Score: 5, Funny

      It exists to serve us.

      You must be new here.

      --
      Faster! Faster! Faster would be better!
    2. Re:90% of the cells in the human body by Jesrad · · Score: 4, Funny

      I can't wait to outsource all my web surfing to an AI. Then I might actually be able to get some work done!

      --
      Maybe we deserve this world ?
  7. Re:piss by flyneye · · Score: 4, Funny

    OMG, the bots are watching Netflix!

    --
    *Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
  8. idiotic math by slashmydots · · Score: 3, Insightful

    Wow! So if I remember correctly from past Slashdot stories, 61% of internet traffic is bots, 60% is Netflix, 50% is YouTube, and 42% is BitTorrent. That's TRULY astonishing when you think about it. I mean, 213% is a lot!