Slashdot Mirror


Throttle Apache Bandwidth Based on IP Address?

BigBlockMopar asks: "A friend of mine runs a web site which offers a very large archive of files. He wishes to continue to offer free and unrestricted access to his archive, but his bandwidth consumption has been through the roof because of people using wget (and similar) to download his entire site. Current traffic is around 200 gigabytes per month with over 50% of that being clients who are downloading every document on the site. The server space is donated by a hosting provider who is understandably starting to become impatient with the traffic. I've checked out mod_throttle and mod_bandwidth, neither appears to do exactly what is desired. Does anyone have any suggestions?"

"Eventually, he plans to set up mirrors, but he'd like to get the greedy users under control first. Alternatives are adding a (free) log-in authentication system, or a text-in-image system like Network Solutions uses to weed out automated whois queries. But I think the best solution is to allow a given client IP address full-speed downloading for the first $WHATEVER megabytes, and then automatically reduce the speed of the transfer to that IP address. This would probably deter most leeches but continue to allow legitimate users to transfer more than an arbitrary limit."

8 of 75 comments

  1. Wrong solution by insensitive+claude · · Score: 3, Informative

    I don't think a bandwidth limitation is going to be effective for this situation. They're still going to consume the same amount of bandwidth, just over a longer term. It's not like people usually sit and wait for the site sucker to do its thing. Bandwidth limiters like the one you suggested are usually used to reduce the effects of slashdotting and the like.

    What you need is an anti-leech mod to limit the amount of data that can be downloaded by a specific IP. I know they're out there. Just do a bit of googling.

  2. Re:Even if it works, it might not. by Mancide · · Score: 2, Informative

    Also, wget obeys robots.txt during recursive downloads, so just specify what wget is and isn't allowed to grab. Granted, this can be circumvented, but it should help with most of the users who are not smart enough to get around it.

    This link should help you out.
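
    As a rough sketch of what such a file could look like, the shell function below writes a robots.txt that asks wget to stay out of one directory. The `/archive/` path and the docroot argument are made up for illustration, and it assumes wget matches the `Wget` user-agent token, which is worth checking against the documentation for the version in use.

```shell
# Write a robots.txt into the given document root.
# Hypothetical layout: the large files live under /archive/.
write_robots() {
    docroot=$1
    cat > "$docroot/robots.txt" <<'EOF'
User-agent: Wget
Disallow: /archive/
EOF
}
```

    Called as `write_robots /path/to/docroot`, this only deters clients that choose to honor the file; it is a request, not an access control.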

    --
    "This amp is special, see all the knobs go up to 11, that means it is one louder than other amps"
  3. mod_perl is your friend by Etyenne · · Score: 2, Informative

    I strongly second the idea of offering your files via BitTorrent only. If, however, you must continue to offer them via plain HTTP, you should be able to cook up something with a custom Apache module. I suggest having a look at http://www.oreilly.com/catalog/wrapmod/

    --
    :wq
  4. Not quite what you want .. by stevey · · Score: 3, Informative

    I wrote an apache module which I call mod_curb (for Apache 1.3)

    This doesn't do exactly what you want, but I'm sure if you were to ask me or somebody else we could code something for you.

    The basic idea I have for your problem is to keep a database of currently active clients, be it MySQL or flat files, so that you can keep track of all data transferred by each address.

    Once a threshold has been reached you can either stop everything, or start throttling.

    However, throttling alone won't help you out: they'll still mirror you, just slowly.
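
    The per-address accounting described above can be approximated without a module by summing bytes per client from the access log. This is only a sketch, not mod_curb: it assumes Apache's common or combined log format (client address in field 1, response size in field 10) and request lines without embedded spaces. The function name and the threshold are made up for illustration.

```shell
# Print "address total_bytes" for every client whose total exceeds the limit.
# $1: limit in bytes, $2: path to an access log in common/combined format.
heavy_hitters() {
    limit_bytes=$1
    logfile=$2
    awk -v limit="$limit_bytes" '
        # Skip entries whose size field is "-" (no body sent)
        $10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
        END {
            for (ip in bytes)
                if (bytes[ip] > limit)
                    printf "%s %.0f\n", ip, bytes[ip]
        }
    ' "$logfile"
}
```

    Something like `heavy_hitters 500000000 access_log` would list the addresses that pulled more than about 500 MB in that log; what to do with the list (block, throttle, or just watch) is a separate decision.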

  5. Re:WANTED by stevey · · Score: 2, Informative

    See my other comment about mod_curb which comes close to doing the right thing.

    You could hack it, or find somebody else to do so for you.

  6. Re:Another solution might be referer checking by ncr53c8xx · · Score: 2, Informative
    Wget has an --referer=URL option, but I find that it doesn't work.

    Which version of wget are you using? The referrer option works fine for me--for one website when I don't use it, I get redirected to the main page. With the referrer option I can download the file. Although something that sets the referrer automatically would be best.

  7. KISS by borgdows · · Score: 1, Informative

    Just put Squid (http://squidcache.org) in front of your Apache.

    Squid's config is very easy for bandwidth throttling by IP.

  8. Totally Possible by yancey · · Score: 4, Informative


    Don't you hate it when everyone tells you something is impossible? It would be much more useful if they wouldn't, so that people who post solutions are easier to find.

    This is absolutely possible and not that hard. It is just that most people don't take the time to learn how. The poster who mentioned Quality of Service (QOS) was correct. You will certainly want to read about traffic control and queueing disciplines.

    Under Linux, use the traffic control (tc) command to configure bandwidth limits by adding or chaining queueing disciplines to your network interface. tc may not come pre-installed with your distribution, so you might have to find it.

    At the end of this post is a script I wrote to limit bandwidth from my website. It limits anything going out from port 8000 to 2 Mbps, but lets that traffic "borrow" up to 2 Mbps more when bandwidth is available (almost always on a 100 Mbps connection).
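
    To get a feel for what a rate cap means per month, here is a small back-of-the-envelope helper (the function name is made up, and it uses decimal units, 1 GB = 1000 MB). A class held at 2 Mbit/s around the clock can still move roughly 648 GB in 30 days, so a rate limit bounds the peak, not the monthly total.

```shell
# Upper bound on monthly transfer for a sustained rate.
# $1: rate in Mbit/s, $2: number of days.
# Mbit/s * seconds = Mbit; / 8 = MB; / 1000 = GB (decimal).
monthly_ceiling_gb() {
    rate_mbit=$1
    days=$2
    echo $(( rate_mbit * 86400 * days / 8 / 1000 ))
}

monthly_ceiling_gb 2 30   # prints 648
```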

    Since you can accidentally limit yourself to near nothing, you'll want a quick way to disable traffic control. The line below removes the "root" queueing discipline from the network interface, which also removes all the queueing disciplines that are chained from it.

    tc qdisc del dev eth0 root

    By modifying the u32 filter parameters (u32 is the classifier that assigns packets to classes, not a queueing discipline itself), you can quite easily limit based upon IP addresses/networks.
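
    As a hedged sketch of that idea: since the traffic being shaped is leaving the server, the client shows up as the destination address, so a u32 match on `ip dst` can steer one address or network into a limited class. The function below assumes an HTB root with handle 1: and a limited class 1:10 on the interface already exist; the `TC` and `DEV` variables and the function name are made up so the command can be dry-run with `TC=echo`.

```shell
# TC and DEV can be overridden; TC=echo prints the command instead of running it.
TC=${TC:-tc}
DEV=${DEV:-eth0}

# Send all outbound traffic for one client address/network to class 1:10.
# $1: address in CIDR form, e.g. 192.0.2.7/32 or 198.51.100.0/24
throttle_ip() {
    $TC filter add dev "$DEV" protocol ip parent 1: prio 1 \
        u32 match ip dst "$1" flowid 1:10
}

# Example: feed it a list of addresses, one per line.
#   while read -r net; do throttle_ip "$net"; done < greedy_clients.txt
```

    Every address added this way shares the same class, so heavy downloaders compete with each other for that class's bandwidth rather than each getting their own limit.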

    This should get you started, but you really should read the traffic control documentation and understand how to configure this stuff. Don't just think you can tweak a few parameters in the script and get what you want. I'm not ashamed to admit that it took me a few hours to get a beginning grasp on it.

    OK, here is the script...

    # Add HTB queuing discipline to root of eth0 with handle 1:0
    # unclassified traffic goes to class 1:99
    tc qdisc add \
    dev eth0 \
    root \
    handle 1: \
    htb \
    default 99

    # Add a single class that will limit all bandwidth on this interface
    # This is done so that we can borrow between the classes below
    tc class add \
    dev eth0 \
    parent 1: \
    classid 1:1 \
    htb \
    rate 100mbit

    # Class 1:10 is guaranteed 2mbit/s and can borrow up to 2mbit/s more
    # (ceil 4mbit) from its parent class 1:1 when that bandwidth is unused;
    # in practice the other 2mbit/s should almost always be available
    tc class add \
    dev eth0 \
    parent 1:1 \
    classid 1:10 \
    htb \
    rate 2mbit \
    ceil 4mbit

    # Class 1:99 is limited to 90mbit/s and cannot borrow any more (rate == ceil)
    tc class add \
    dev eth0 \
    parent 1:1 \
    classid 1:99 \
    htb \
    rate 90mbit \
    ceil 90mbit

    # Use SFQ to share bandwidth fairly among the connections within class 1:10
    tc qdisc add \
    dev eth0 \
    parent 1:10 \
    handle 10: \
    sfq

    # Use SFQ to share bandwidth fairly among the connections within class 1:99
    tc qdisc add \
    dev eth0 \
    parent 1:99 \
    handle 99: \
    sfq

    # This filter selects all traffic with source port 8000 as belonging to class 1:10
    tc filter add \
    dev eth0 \
    protocol ip \
    parent 1: \
    prio 1 \
    u32 match ip sport 8000 0xffff \
    flowid 1:10

    --
    Ouch! The truth hurts!