Slashdot Mirror


Throttle Apache Bandwidth Based on IP Address?

BigBlockMopar asks: "A friend of mine runs a web site which offers a very large archive of files. He wishes to continue to offer free and unrestricted access to his archive, but his bandwidth consumption has been through the roof because of people using wget (and similar) to download his entire site. Current traffic is around 200 gigabytes per month with over 50% of that being clients who are downloading every document on the site. The server space is donated by a hosting provider who is understandably starting to become impatient with the traffic. I've checked out mod_throttle and mod_bandwidth, neither appears to do exactly what is desired. Does anyone have any suggestions?"

"Eventually, he plans to set up mirrors, but he'd like to get the greedy users under control first. Alternatives are adding a (free) log-in authentication system, or a text-in-image system like Network Solutions uses to weed out automated whois queries. But I think the best solution is to allow a given client IP address full-speed downloading for the first $WHATEVER megabytes, and then automatically reduce the speed of the transfer to that IP address. This would probably deter most leeches but continue to allow legitimate users to transfer more than an arbitrary limit."

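The policy proposed above (full speed for the first $WHATEVER megabytes per client IP, then a slower rate) can be sketched roughly as follows. This is only an illustration of the bookkeeping, not an Apache module: the quota, the throttled rate, and the function names are all assumptions, and a real deployment would also need to reset the counters periodically.

```python
import time
from collections import defaultdict

# Hypothetical settings; the original post leaves the quota as $WHATEVER.
FULL_SPEED_QUOTA = 50 * 1024 * 1024   # bytes served at full speed per IP
THROTTLED_RATE = 8 * 1024             # bytes per second once over the quota

bytes_served = defaultdict(int)       # client IP -> bytes sent so far


def delay_for_chunk(ip, chunk_size):
    """Record a chunk sent to `ip` and return how long to pause afterwards.

    Returns 0.0 while the client is within the quota. Once over it, returns
    the time the chunk should take at THROTTLED_RATE.
    """
    bytes_served[ip] += chunk_size
    if bytes_served[ip] <= FULL_SPEED_QUOTA:
        return 0.0
    return chunk_size / THROTTLED_RATE


def send_file(ip, chunks, write, sleep=time.sleep):
    """Write each chunk, pausing as needed to hold throttled clients back."""
    for chunk in chunks:
        write(chunk)
        pause = delay_for_chunk(ip, len(chunk))
        if pause:
            sleep(pause)
```

Note that this slows a heavy client down but does not cap the total bytes it can eventually fetch.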
6 of 75 comments

  1. Another solution might be referer checking by Loualbano2 · · Score: 5, Insightful

    If you enable referer checking, this will stop most wget-type programs. Wget has a --referer=URL option, but I find that it doesn't work. Also, there are a lot of Windows clients that will spider a website and pull files based on extension, but again these don't usually have an option to set the referer, or if they do, most people aren't smart enough to turn it on.

    One exception to this is Pavuk, which does referer spoofing pretty well. This program is about four times harder to use than wget, and isn't very popular (you don't see it included in distros too often).

    Of course this won't completely fix your problem, but it will probably stop about 90% of the people doing it now. It's an easy fix that you can implement quickly until you get something to throttle bandwidth properly.
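    The referer check described above boils down to something like the sketch below. It is a minimal illustration in Python, not Apache configuration: the function name and the ALLOWED_HOSTS set are assumptions, and the hostname is just the placeholder used elsewhere in this thread.

```python
from urllib.parse import urlsplit

# Assumed set of hostnames that count as "our own site".
ALLOWED_HOSTS = {"your.site.org"}


def referer_allowed(referer):
    """Return True only if the Referer header names one of our own hosts."""
    if not referer:
        # No Referer at all: typical for wget-style clients, but also for
        # bookmarks and typed URLs.
        return False
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        # Malformed header value; treat it as not allowed.
        return False
    return host in ALLOWED_HOSTS
```

    Since requests with no referer are rejected too, a check like this is probably best applied only to the archive files themselves and not to the entry pages, so that bookmarks and typed URLs still work.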

    -ft

  2. BT by Gadzinka · · Score: 3, Insightful

    Does anyone have any suggestions?

    Yes, use the frelling BitTorrent, that's exactly what it was written for!

    Add to this some way of limiting bandwidth per connection (so people are mainly downloading from other BT clients, not from you) and you have a perfect means of distribution.

    Leave the possibility to download via HTTP, but limit it with QoS or some other way to a tiny little stream, plus advertise all over the site that people can achieve unlimited dl speeds using BT.

    Publishing documents only to limit in every possible way access to them (like all the game files servers do) is unwise, to say the least. Especially if you don't have to.

    Robert

    --
    Bastard Operator From 193.219.28.162

  3. How would this be a solution? by lorcha · · Score: 2, Insightful

    If you are trying to limit the actual amount of downloaded bytes, how would throttling by IP help? If Larry the leech types
    wget -l99 http://your.site.org/
    he's just gonna walk away from the machine and check back when it's done. If you serve up all those files in 1 minute, 1 day, or 1 week, it doesn't matter. He's still downloaded exactly the same amount of data from you. Your solution only works if you're trying to limit transfer rates, which you should be able to do with your mod_throttles of the world.

    If you're just trying to discourage people from downloading so much from you, you need to set up mirrors, bittorrents, or some other protection of your site. Maybe you could reduce the size of your site? Is it an archive of pictures? If so, maybe your friend could reduce the size of them? I mean, if he's offering 100,000 pictures that are 100k each and he reduces the size/quality so they're only 30k each, he'll really reduce his bandwidth.
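    To put rough numbers on the picture example, here is the arithmetic as a short sketch. It assumes "100k" means 100 kilobytes and uses decimal units; the function name is made up for illustration.

```python
def crawl_size_gb(file_count, file_size_kb):
    """Total size of one full crawl in gigabytes (decimal units)."""
    return file_count * file_size_kb / 1_000_000


before = crawl_size_gb(100_000, 100)   # 100,000 pictures at 100 KB each
after = crawl_size_gb(100_000, 30)     # the same pictures at 30 KB each
saving = 1 - after / before

print(f"{before:.0f} GB -> {after:.0f} GB per full crawl ({saving:.0%} less)")
```

    So each complete download of the site would drop from about 10 GB to about 3 GB, regardless of how fast or slow the transfer runs.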

    If it's all text, maybe you could use some kind of compression. If it's video, maybe use a lower bitrate. You get the idea.

    But just limiting transfer rates by IP is probably not gonna help.

    --
    "Avoid employing unlucky people - throw half of the pile of CVs in the bin without reading them." -- David Brent

  4. Instead of slowing down, try stopping it entirely by spitzak · · Score: 2, Insightful

    As several people here have said, if you just slow it down the wget will just take all week, and perhaps use more resources (you will have to keep track of it to slow it down).

    Instead, when they pass the bandwidth limit (or more likely a number-of-requests limit) you should deliver a dead-end page from which there are no links to go anywhere else. Then when they wget the site they will get a lot of these dead-end pages instead of the data they want. If a normal user hits it, it can tell them to wait a few minutes and then reload the page.
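    A number-of-requests limit with a dead-end page could look something like this sketch. The limit, the window length, the page text, and the names are all assumptions chosen for illustration. It keeps state in memory only.

```python
import time
from collections import defaultdict, deque

MAX_REQUESTS = 100      # hypothetical limit per window
WINDOW_SECONDS = 300    # hypothetical window length

# Deliberately contains no links, so a crawler has nowhere to go from here.
DEAD_END_PAGE = (
    "<html><body><p>Request limit reached. "
    "Please wait a few minutes and reload this page.</p></body></html>"
)

recent_requests = defaultdict(deque)   # client IP -> timestamps of recent hits


def respond(ip, real_page, now=None):
    """Return the real page, or the dead-end page once `ip` is over the limit."""
    now = time.time() if now is None else now
    hits = recent_requests[ip]
    # Forget requests that have aged out of the window.
    while hits and now - hits[0] >= WINDOW_SECONDS:
        hits.popleft()
    hits.append(now)
    if len(hits) > MAX_REQUESTS:
        return DEAD_END_PAGE
    return real_page
```

    Because over-limit requests are counted as well, a crawler that keeps hammering stays on the dead-end page, while a person who waits out the window gets the real content again.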

    If the owner of the material does not mind, it does sound like a bittorrent download would help a lot too. Have the dead-end page give instructions on how to retrieve the bittorrent.

    Anyway these are just my ideas, I really have zero experience in web sites so feel free to dismiss them as stupid.

  5. auto-block bulk downloads by dj.delorie · · Score: 5, Insightful

    What I do is have a hidden link at the top of every page that links to a specially-named missing HTML file in that directory. The missing file handler checks for this special name and, if found, adds the client's IP to the .htaccess deny list. The access denied handler checks the .htaccess list and, if their IP is found, explains the acceptable use policy to them. A cron job expires the .htaccess entries quickly once they stop trying to bulk download.
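    The flow described above (trap link, deny list, quick expiry) can be modelled roughly as below. This is an in-memory sketch, not the .htaccess and cron setup itself. The trap file name, the block duration, and the function names are hypothetical.

```python
import time

TRAP_NAME = "do-not-follow.html"   # hypothetical name of the missing trap file
BLOCK_SECONDS = 3600               # hypothetical time before a block expires

blocked_until = {}                 # client IP -> time at which the block lapses


def handle_request(ip, path, now=None):
    """Classify a request as 'trapped', 'blocked', or 'ok'."""
    now = time.time() if now is None else now
    if path.rsplit("/", 1)[-1] == TRAP_NAME:
        # Only a crawler following every link should ever ask for this file.
        blocked_until[ip] = now + BLOCK_SECONDS
        return "trapped"
    if blocked_until.get(ip, 0) > now:
        # Still trying: push the expiry back and show the usage policy.
        blocked_until[ip] = now + BLOCK_SECONDS
        return "blocked"
    return "ok"


def expire_blocks(now=None):
    """Drop lapsed entries; stands in for the cron job in the description."""
    now = time.time() if now is None else now
    for ip in [ip for ip, until in blocked_until.items() if until <= now]:
        del blocked_until[ip]
```

    In this sketch a client gets unblocked only after it has stayed quiet for the whole block period, which is one way to read "once they stop trying to bulk download".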

  6. Plan for them by DynaSoar · · Score: 2, Insightful

    If they're going to suck down the whole thing, plan for it.

    Offer it pre-zipped. This would reduce the bandwidth and download time. A plus for everyone.

    Make it easy for people who do this to obtain updates/additions by date.

    As part of accessing the zipped version, ask people to mirror it. If they're going to carry it all, offer it all. Arrange dynamic mirror updating with those willing.

    Find one or more secondary storage sites for the archive. Ask people to use these (put them highest on the list).

    If people persist in sucking down the whole thing and don't go for the archive, arrange a throttle with the sysadmin, and advertise it. Let people know that if they try to wget everything, things will start going real slow for them.

    Set up a small version without the files, in parallel to the real one, with a note saying "files temporarily unavailable". Allow the system owner to switch to the small version during times of high traffic so as not to bog down his other users, or alternatively, switch it yourself according to the owner's estimates of his traffic and times.
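    For the updates-by-date suggestion above, one possible starting point is a script that lists what changed since a given date, which could then feed an "updates" page or a smaller archive. This sketch assumes file modification times are a good enough stand-in for "date added"; the function name is made up.

```python
import os
from datetime import datetime


def files_changed_since(root, since):
    """Return sorted paths (relative to `root`) modified at or after `since`."""
    cutoff = since.timestamp()
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) >= cutoff:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)
```

    Something like files_changed_since("/path/to/archive", datetime(2004, 1, 1)) would then give the list to zip up as an update pack. Both the path and the date here are placeholders.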

    --
    "I may be synthetic, but I'm not stupid." -- Bishop 341-B