Slashdot Mirror


Throttle Apache Bandwidth Based on IP Address?

BigBlockMopar asks: "A friend of mine runs a web site which offers a very large archive of files. He wishes to continue to offer free and unrestricted access to his archive, but his bandwidth consumption has been through the roof because of people using wget (and similar tools) to download his entire site. Current traffic is around 200 gigabytes per month, with over 50% of that going to clients who are downloading every document on the site. The server space is donated by a hosting provider who is understandably starting to become impatient with the traffic. I've checked out mod_throttle and mod_bandwidth, but neither appears to do exactly what is desired. Does anyone have any suggestions?"

"Eventually, he plans to set up mirrors, but he'd like to get the greedy users under control first. Alternatives are adding a (free) log-in authentication system, or a text-in-image system like Network Solutions uses to weed out automated whois queries. But I think the best solution is to allow a given client IP address full-speed downloading for the first $WHATEVER megabytes, and then automatically reduce the speed of the transfer to that IP address. This would probably deter most leeches but continue to allow legitimate users to transfer more than an arbitrary limit."
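The policy described above (full speed up to a per-IP byte limit, then a reduced rate) can be sketched in a few lines. This is only an illustration of the bookkeeping, not an Apache module: the class name `IpQuotaTracker`, the 100 MiB quota, and the 10 KiB/s throttled rate are all assumptions, since the post leaves the limit as $WHATEVER. It also never resets the counters, which a real deployment would need to do on some schedule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical values: the post only says "$WHATEVER megabytes".
FULL_SPEED_QUOTA_BYTES = 100 * 1024 * 1024  # assumed 100 MiB per client IP
THROTTLED_RATE_BPS = 10 * 1024              # assumed 10 KiB/s once over quota


@dataclass
class IpQuotaTracker:
    """Counts bytes served per client IP and reports the allowed rate."""
    quota_bytes: int = FULL_SPEED_QUOTA_BYTES
    throttled_rate: int = THROTTLED_RATE_BPS
    sent: Dict[str, int] = field(default_factory=dict)

    def record(self, ip: str, nbytes: int) -> None:
        """Add nbytes to the running total for this IP."""
        self.sent[ip] = self.sent.get(ip, 0) + nbytes

    def allowed_rate(self, ip: str) -> Optional[int]:
        """Return None for full speed, or a bytes/sec cap once over quota."""
        if self.sent.get(ip, 0) < self.quota_bytes:
            return None
        return self.throttled_rate
```

A server or proxy would call `record()` after each chunk it sends and consult `allowed_rate()` before sending the next one. Because the key is the client IP, everyone behind a shared proxy or NAT shares a single quota.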

2 of 75 comments

  1. Bad idea, because Apache uses 1 connection per process by nudelding · · Score: 4, Interesting

    If you just slow down the connection, you will end up with a lot of nearly idle Apache processes, and after a while no more clients will be able to connect.
    Either just drop connections, or use a single-process proxy with the required ability that forwards requests to Apache.
    But restricting by IP can be dangerous if users are sitting behind an ISP's proxy (very common, at least in Germany).

  2. Even if it works, it might not. by DDumitru · · Score: 4, Interesting

    If you want to limit bandwidth by IP address, this might be doable depending on what the server is. If the server is a Linux or "virtual Linux" box, you can probably use 'tc' (Traffic Control) in the kernel to meter bandwidth by subnet or address. Look at the Advanced Routing HOWTO for info. It is a bear to set up, but it actually works quite well.

    The problem is that if these are bots grabbing your whole site, slowing them down to 10K/sec won't actually reduce the amount of traffic they pull from you. They may take all day to get the pages, but the bytes will still move.
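    A quick calculation illustrates the point above. It reads "10K/sec" as 10,000 bytes per second, and the 5 GB archive size is purely hypothetical, since the post does not say how large the archive is.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(rate_bytes_per_sec):
    """Bytes a single client can pull in one day at a steady rate."""
    return rate_bytes_per_sec * SECONDS_PER_DAY


def days_to_fetch(total_bytes, rate_bytes_per_sec):
    """Days needed to download total_bytes at a steady rate."""
    return total_bytes / rate_bytes_per_sec / SECONDS_PER_DAY


throttled_rate = 10 * 1000   # assumption: "10K/sec" means 10,000 bytes/s
archive_size = 5 * 10**9     # hypothetical 5 GB archive, not from the post

print(bytes_per_day(throttled_rate))                # bytes/day for one throttled client
print(days_to_fetch(archive_size, throttled_rate))  # days to mirror the whole archive
```

    Under these assumptions a throttled bot still moves 864 MB a day and finishes the mirror in under six days. The rate changes how long the transfer takes, not how many bytes are transferred.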

    Some options you have:

    * If the user really doesn't need the data, block their address entirely.
    * Consider blocking the bot's "client" signature (its User-Agent string). You can do this in Apache. "Respected" bots don't lie about who they are. If a bot does lie, then it is a DoS attack in disguise.
    * Contact the users, if you can.
    * If you want the user to get a mirror, set up something that actually does the mirroring effectively. I would recommend running rsync.
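    The "client" signature check from the second option could look roughly like this. It is a plain Python sketch of the matching logic, not Apache configuration. The function name and the blocklist are assumptions: wget comes from the original question, and HTTrack is added only as another example of a site-mirroring tool.

```python
# Hypothetical blocklist of substrings to look for in the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ("wget", "httrack")


def is_blocked_agent(user_agent):
    """Return True if the User-Agent contains a blocked substring.

    Matching is case-insensitive; a missing header is treated as allowed.
    """
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_SUBSTRINGS)
```

    As the comment notes, this only stops bots that identify themselves honestly. A client that sends a browser's User-Agent string passes straight through.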