Throttle Apache Bandwidth Based on IP Address?
BigBlockMopar asks: "A friend of mine runs a web site which offers a very large archive of files. He wishes to continue to offer free and unrestricted access to his archive, but his bandwidth consumption has been through the roof because of people using wget (and similar) to download his entire site. Current traffic is around 200 gigabytes per month with over 50% of that being clients who are downloading every document on the site. The server space is donated by a hosting provider who is understandably starting to become impatient with the traffic. I've checked out mod_throttle and mod_bandwidth, neither appears to do exactly what is desired. Does anyone have any suggestions?"
"Eventually, he plans to set up mirrors, but he'd like to get the greedy users under control first. Alternatives are adding a (free) log-in authentication system, or a text-in-image system like Network Solutions uses to weed out automated whois queries. But I think the best solution is to allow a given client IP address full-speed downloading for the first $WHATEVER megabytes, and then automatically reduce the speed of the transfer to that IP address. This would probably deter most leeches but continue to allow legitimate users to transfer more than an arbitrary limit."
Well, you could always zip the whole site up, and put that up as a bittorrent link.
Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
If you just slow down the connection, you will end up with a lot of nearly idle Apache processes, and after a while no more clients will be able to connect.
Either just drop connections, or use a single-process proxy with the required ability, which then forwards the requests to Apache.
But restricting by IP can be dangerous if users are sitting behind an ISP's proxy (very common, at least in Germany).
Set up javascript links, which wget can't follow.
Or set up a 'captcha' for each download, so that a human has to confirm each file one at a time.
If you want to limit bandwidth by IP address, this might be doable depending on what the server is. If the server is a Linux or "virtual Linux" box, you can probably use 'tc' (Traffic Control) in the kernel to meter bandwidth by subnet or address. Look at the Advanced Routing HOWTO for info. It is a bear to set up, but it actually works quite well.
The problem is that if these are bots grabbing your whole site, slowing them down to 10K/sec won't actually reduce the amount of traffic they pull from you. They may take all day to get the pages, but the bytes will still move.
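To put rough numbers on that, here is a back-of-the-envelope calculation. The 10 KiB/s rate comes from the paragraph above; the 5 GiB archive size is purely an assumption, since the real size isn't given.

```python
SLOW_RATE = 10 * 1024          # 10 KiB/s, the throttled rate mentioned above
ARCHIVE_SIZE = 5 * 1024 ** 3   # ASSUMED archive size (5 GiB), not from the post
SECONDS_PER_DAY = 86400


def download_seconds(size_bytes, rate_bytes_per_sec):
    """Time needed to pull size_bytes at a steady rate."""
    return size_bytes / rate_bytes_per_sec


def bytes_per_day(rate_bytes_per_sec):
    """Traffic one client generates in a day if it never stops."""
    return rate_bytes_per_sec * SECONDS_PER_DAY


days = download_seconds(ARCHIVE_SIZE, SLOW_RATE) / SECONDS_PER_DAY
print(f"Full archive at the throttled rate: about {days:.1f} days")
print(f"Traffic per throttled client per day: {bytes_per_day(SLOW_RATE) / 1024 ** 2:.2f} MiB")
```

Either way the bot still ends up with the whole archive; throttling only stretches the same bytes over more days.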
Some options you have:
* If the user really doesn't need the data, block their address entirely.
* Consider blocking the bots' "client" signature. You can do this in Apache. "Respected" bots don't lie about who they are. If a bot does lie, then it is a DoS attack in disguise.
* Contact the users, if you can.
* If you want the user to get a mirror, set up something that actually does the mirroring effectively. I would recommend running rsync.
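The "client signature" check in the second bullet boils down to a substring match on the User-Agent header. In Apache you would do this in the server configuration; the sketch below just shows the logic in Python. The function name and the blocklist are hypothetical ("examplebot" is a made-up entry), and it only catches clients that identify themselves honestly.

```python
# Illustrative blocklist: "wget" is the tool named in the question,
# "examplebot" is a made-up placeholder.
BLOCKED_AGENT_SUBSTRINGS = ("wget", "examplebot")


def is_blocked_agent(user_agent):
    """Return True if the User-Agent contains any blocked substring."""
    ua = (user_agent or "").lower()  # a missing header is treated as empty
    return any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS)


print(is_blocked_agent("Wget/1.9.1"))
print(is_blocked_agent("Mozilla/5.0 (X11; Linux)"))
```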
Don't bother trying to rate limit downloads; you'll get exactly the same number of people downloading everything, except that instead of doing it quickly they'll leave wget running all week and tie up your server's resources.
Have a page "download.php?filename=foo.txt" that all your links point to, and have that page return <meta http-equiv="Refresh" content="1;URL=files/$filename">
(pseudocode; my php scripting is not great, but you get the idea..)
This totally breaks wget, although it's not too hard to script around. You'll cut spider traffic back by probably 95%, all the casual 'grab everything we can' downloaders, but people who really want to get all your files will still figure out how to.
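For what it's worth, here is roughly what that redirect page could look like, written in Python rather than PHP. The function name `refresh_page` and the `files/` base path are assumptions taken from the pseudocode above. The one thing the pseudocode skips is cleaning up the filename, so this sketch keeps only the last path component to stop `../` tricks.

```python
import html
import posixpath
from urllib.parse import quote


def refresh_page(filename, delay=1, base="files/"):
    """Build an HTML page that meta-refreshes to base + filename."""
    # Keep only the final path component so "../../etc/passwd" can't escape base.
    name = posixpath.basename(filename)
    if not name or name in (".", ".."):
        raise ValueError("bad filename")
    # Percent-encode for the URL, then escape for use inside an HTML attribute.
    url = html.escape(base + quote(name), quote=True)
    return (
        "<html><head>"
        f'<meta http-equiv="Refresh" content="{delay};URL={url}">'
        "</head><body>Your download will start shortly.</body></html>"
    )


print(refresh_page("foo.txt"))
```

Note that the page deliberately has no ordinary link to the file, since a plain href is exactly what a spider would follow.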
Or if you totally want to stop automated downloads, put each file behind a 'captcha'.
Move it over to FTP, and allow only X number of simultaneous logins.
Vintage computer games and RPG books available. Email me if you're interested.