Slashdot Mirror


Throttle Apache Bandwidth Based on IP Address?

BigBlockMopar asks: "A friend of mine runs a web site which offers a very large archive of files. He wishes to continue to offer free and unrestricted access to his archive, but his bandwidth consumption has been through the roof because of people using wget (and similar) to download his entire site. Current traffic is around 200 gigabytes per month, with over 50% of that going to clients who are downloading every document on the site. The server space is donated by a hosting provider who is understandably starting to become impatient with the traffic. I've checked out mod_throttle and mod_bandwidth, but neither appears to do exactly what is desired. Does anyone have any suggestions?"

"Eventually, he plans to set up mirrors, but he'd like to get the greedy users under control first. Alternatives are adding a (free) log-in authentication system, or a text-in-image system like Network Solutions uses to weed out automated whois queries. But I think the best solution is to allow a given client IP address full-speed downloading for the first $WHATEVER megabytes, and then automatically reduce the speed of the transfer to that IP address. This would probably deter most leeches but continue to allow legitimate users to transfer more than an arbitrary limit."

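The quota-then-throttle idea in the question can be sketched in a few lines of Python. This is only an illustration of the accounting, not an Apache module: the class name `IpThrottle`, its parameters, and the per-window reset are all assumptions introduced here, and the numbers are placeholders for $WHATEVER.

```python
import time


class IpThrottle:
    """Track bytes sent per client IP and pace transfers once a quota is exceeded.

    Hypothetical helper: the names and the reset-after-window policy are
    assumptions, not anything the original post specifies.
    """

    def __init__(self, quota_bytes, throttled_rate, window_seconds, clock=time.monotonic):
        self.quota_bytes = quota_bytes        # full-speed allowance per IP per window
        self.throttled_rate = throttled_rate  # bytes/second allowed beyond the quota
        self.window_seconds = window_seconds  # how long usage is remembered
        self.clock = clock
        self._usage = {}                      # ip -> (window_start, bytes_sent)

    def record(self, ip, nbytes):
        """Account for nbytes sent to ip; return seconds to wait before the next chunk."""
        now = self.clock()
        start, used = self._usage.get(ip, (now, 0))
        if now - start >= self.window_seconds:
            # The window has elapsed, so this client starts fresh.
            start, used = now, 0
        over_before = max(0, used - self.quota_bytes)
        used += nbytes
        self._usage[ip] = (start, used)
        over_after = max(0, used - self.quota_bytes)
        # Only the bytes that landed beyond the quota are paced.
        return (over_after - over_before) / self.throttled_rate
```

A server loop would call `record()` after each chunk it writes and `time.sleep()` for the returned delay, so a client under the quota never waits and a client over it is slowed rather than cut off.
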
2 of 75 comments

  1. Another solution might be referer checking by Loualbano2 · Score: 5, Insightful

    If you enable referer checking, this will stop most wget-type programs. Wget has a --referer=URL option, but I find that it doesn't work. Also, there are a lot of Windows clients that will spider a website and pull files based on extension, but again these don't usually have an option to set the referer, or if they do, most people aren't smart enough to turn it on.

    One exception to this is Pavuk, which does referer spoofing pretty well. This program is about 4 times harder to use than wget and isn't very popular (you don't see it included in distros too often).

    Of course this won't completely fix your problem, but it will probably stop about 90% of the people doing it now. It's an easy fix that you can implement quickly until you get something to throttle bandwidth properly.

    -ft
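
The referer check described in this comment amounts to a small decision function. The Python sketch below shows the logic only; in practice it would live in the web server's configuration. The host name `archive.example.org`, the function name, and the list of protected file extensions are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical values: the site's own host name and the file types worth protecting.
ALLOWED_HOSTS = {"archive.example.org"}
PROTECTED_SUFFIXES = (".zip", ".pdf", ".tar.gz")


def referer_allowed(referer, path, allowed_hosts=ALLOWED_HOSTS,
                    protected_suffixes=PROTECTED_SUFFIXES):
    """Return True if the request should be served."""
    # Ordinary pages stay reachable, so bookmarks and search engines keep working.
    if not path.lower().endswith(protected_suffixes):
        return True
    # Protected files require a Referer header pointing back at this site.
    if not referer:
        return False
    host = urlsplit(referer).hostname  # None when the header is not a full URL
    return host is not None and host in allowed_hosts
```

Rejecting an empty referer also turns away browsers and proxies that strip the header, which fits the comment's point that this is a quick stopgap rather than a complete fix.
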

  2. auto-block bulk downloads by dj.delorie · Score: 5, Insightful

    What I do is have a hidden link at the top of every page that links to a specially-named missing HTML file in that directory. The missing file handler checks for this special name and, if found, adds the client's IP to the .htaccess deny list. The access denied handler checks the .htaccess list and, if their IP is found, explains the acceptable use policy to them. A cron job expires the .htaccess entries quickly once they stop trying to bulk download.
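
The trap-and-expire scheme above can be sketched as follows. This is not the commenter's actual setup: an in-memory dictionary stands in for the .htaccess deny list, the `expire()` method stands in for the cron job, and the trap file name, class name, and ban length are made up for illustration.

```python
import time


class TrapList:
    """Deny clients that request the hidden trap link until they go quiet.

    Hypothetical sketch: a dict replaces the .htaccess deny list, and
    expire() plays the role of the cron job.
    """

    def __init__(self, trap_name="do-not-follow.html", ban_seconds=600, clock=time.time):
        self.trap_name = trap_name      # specially named file that does not exist
        self.ban_seconds = ban_seconds  # quiet period before a ban is lifted
        self.clock = clock
        self._last_seen = {}            # banned ip -> time of its most recent request

    def handle(self, ip, path):
        """Return 'deny' or 'allow' for a request from ip for path."""
        now = self.clock()
        if path.rsplit("/", 1)[-1] == self.trap_name:
            # Only a spider follows the hidden link, so add the client to the list.
            self._last_seen[ip] = now
            return "deny"
        if ip in self._last_seen:
            # Still requesting while banned: keep the ban fresh.
            self._last_seen[ip] = now
            return "deny"
        return "allow"

    def expire(self):
        """Drop bans for clients that have been quiet for ban_seconds."""
        cutoff = self.clock() - self.ban_seconds
        for ip in [ip for ip, seen in self._last_seen.items() if seen < cutoff]:
            del self._last_seen[ip]
```

A "deny" result is where the access-denied handler would show the acceptable use policy. Because every request from a banned client refreshes its timestamp, the ban only lapses once the bulk download actually stops.
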