Domain: squid-cache.org
Stories and comments across the archive that link to squid-cache.org.
Comments · 216
-
Can't save it?
How are they preventing you from saving it? While you might not be able to save the file using the RealPlayer client, it wouldn't be that hard to record the stream on a network level. As I understand it, you can rig squid to cache realplayer (we've been thinking of doing at the school where I work so classes can watch stuff without killing our bandwidth). Couldn't you just do that locally and then copy the file out of the cache when you're done watching?
-
Damned if they do, damned if they don't
The social side of it is that no one wants to be watched by Big Brother or marketers or whatever. And it's against the law for Comcast to do it. It seems that most Slashdotters are well aware of and justifiably sensitive to these issues.
Then there's the technical side, and I'm kind of surprised that so few voices here speak up about it. When you have a network with a lot of users, it is very natural and intelligent to want to optimize it and use it at maximum efficiency. Caching proxies are a great tool at your disposal. It would almost be stupid if you didn't use them. (Yes, I'm a Squid lover. IMHO, almost every ISP, net-connected business, school, etc should have one.)
But running a web proxy comes with a responsibility. Someone might abuse it, and if the admin receives a complaint about something that came from his box, he needs to be able to look in a log and see who really deserves to receive that complaint, lest he be left holding the hot potato. You can't be a common carrier and not be held responsible for what goes through your network, unless your finger is always ready to point at who is responsible.
It looks like the social issues are the squeaky wheel, so they're going to be addressed. Just remember that this comes with a performance cost.
-
Re:Woe is me ... I hate pop unders ... geesh ...
bandwidth is generally a fixed cost. it doesn't matter if your T-1 is at 80% m-f 8-6, and averages 2% the rest of the time, or if you're averaging a constant 70% all the time, the cost of the line is the same.
if those in the Real World (tm) want to limit their bandwidth by avoiding pop-x windows, they'll use (as you alluded to) a browser that doesn't allow pop-x windows (moz/opera).
in the Real World (tm), those that are too cheap, too frugal, or too cost conscious to purchase the needed bandwidth will setup a Squid (tm) proxy server
Note: Real World is trademarked by MTV corporation.
Squid is trademarked by the Squid Cache web proxy software organization. -
Re:Let's hope managers/supervisors don't find this
Managers already do this. Many companies put all their employees on web proxies for exactly this reason. I have friends that work in large companies where it is a known fact that managers review
1) Page views
2) Attempts to view blocked pages
3) Email with questionable content
4) Usage statistics on mail servers
As a result, I've helped those friends use web proxies and SSL to add privacy to their workstations. putty port forwarding and a remotely running squid are their best worktime friends.
-
Does it respect proxies yet?
Mozilla is good, mozilla is great. The only thing keeping me from using it over Konqueror right now is the fact it seems to ignore my proxy setting. I use The Internet Junkbuster to remove unwanted (read: all) ads and other things. Mozilla up to RC1 seems to overlook this and I see ads all over the place. It may be due to JavaScript url fetching not going through the proxy, but I'm not sure
And don't tell me to use moz's built-in ad blocking, because I've already got a huge blockfile, I want to block for all browsers across the network, and it usually screws up rendering to use the builtin stuff anyway.
This is a great web browser; it's really faster than other GUI browsers I've used, renders nicely, and has all the features. But until it respects proxies (I use Squid to cache stuff too, helps a lot when all you've got is a modem), I can't use it.
:-( -
Re:/. effect solution?
How about building the cache on top of Squid? Write a program so links would reference, say, http://slashdot.org/cache/www.amyhughes.org/lego/. This program then requests the URL through Squid, which takes care of making sure that the site receives the appropriate number of hits. (Squid checks to make sure the data hasn't changed, and if it hasn't, the data isn't downloaded - this results in the site registering a hit, but not having to transfer any data other than the header response.) If a site seems like it might get slashdotted (and I can usually guess when a URL will be slashdotted), the editor clicks a button and POOF - the URL in the story is automatically changed and the cache program lists it as a valid site to cache (so that people can't use it to bypass pr0n filters at work). This can't be THAT difficult, can it? Squid does all the work for you, and who needs permission from the site to use Squid? Are there any implications (described in the FAQ or otherwise) that I haven't addressed (besides non-relative links in the HTML needing to be rewritten)? -
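The revalidation step described in the comment above (ask the origin whether the page changed, and only re-download it if it did) can be sketched as a toy model. This is not Squid's code: `Origin`, `CacheEntry` and `fetch_via_cache` are hypothetical names, timestamps are plain integers, and real HTTP revalidation also involves things like ETag and Cache-Control that are ignored here.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    body: str
    last_modified: int  # timestamp the origin reported for this copy


class Origin:
    """Toy origin server that counts hits and body bytes sent."""

    def __init__(self, body: str, last_modified: int):
        self.body = body
        self.last_modified = last_modified
        self.hits = 0
        self.bytes_sent = 0

    def get(self, if_modified_since=None):
        self.hits += 1  # every request registers as a hit
        if if_modified_since is not None and self.last_modified <= if_modified_since:
            return 304, None, self.last_modified  # headers only, no body
        self.bytes_sent += len(self.body)
        return 200, self.body, self.last_modified


def fetch_via_cache(cache: dict, url: str, origin: Origin) -> str:
    """Revalidate the cached copy, downloading the body only if it changed."""
    entry = cache.get(url)
    since = entry.last_modified if entry else None
    status, body, modified = origin.get(if_modified_since=since)
    if status == 304:
        return entry.body  # cached copy is still good
    cache[url] = CacheEntry(body, modified)
    return body
```

In this model the origin's hit counter goes up on every request, but `bytes_sent` only grows when the page actually changes, which is the effect the comment is after.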
Re:This is great.
You can do the obvious thing, or you can use any of the filtering software available: squid plus ad-zap (my choice); proxomitron.
There may be others. Some tweaking required (not for subscribing, obviously :) -
Hmmm.... This looks weird...
The web anno 2002, when using something not blessed by Microsoft (in my case, mozilla + junkbuster/squid proxy chain):
Microsoft VBScript runtime error '800a000d'
Type mismatch: 'CInt'
E:\INETPUB\WEBSITES\YIL\COLUMNS\../ssi/ssiASP.asp, line 83
This is not what Tim Berners-Lee intended...
-
corrections, suggestions, etc
First of all, the phrase "routing" is a misnomer. Web caching is something that happens on the application layer of the OSI model, layer 7, whereas "routing" refers to layer 3, which supplies IP routing for the TCP/IP protocol suite. What's broken is their caching, their cache server, or their proxying; pick a term.
Second, there's a lot of ways around it which involve tunnelling.
Tunnel to another box running a non-broken web cache. I used to tunnel my http traffic through ssh to my colocated boxes, which ran adzapper, and proxied through that.
Tunnel at the IP layer by running any IP-in-IP encapsulation. If you have some version of windows, for example, you might convince someone with a server to run a PPTP server for you somewhere and you could tunnel through that. There are even Free PPTP Servers for Linux available to help.
Find someone who runs a little proxier for their own net with socks, and bounce off their socks proxy. Someone you know on another ISP probably has Wingate or the like running, and if they allowed it (and on some older versions, it will permit this by default), you could set your browser's SOCKS settings to bounce off their proxy server, and since SOCKS isn't on port 80, your ISP will probably ignore it.
There are also a number of things you might discuss with your ISP to resolve the issue.
Suggest that they switch to a less broken cache server. (Squid, anyone?)
Suggest that they exempt you specifically from the cache server by telling it to ignore your ip address.
Note that they have an obligation to make sure their caching software doesn't interfere with your browsing; so it will be necessary (and not cost-effective for them) for you to call for every problem you notice.
Obviously, you'll need to probably speak to a whole number of supervisors, and probably eventually get transferred to a "real engineer", and they will probably hack in a fix (like exempting you only) rather than truly deal with the problem.
If all else fails, then you may want to try issuing ultimatums, like, "If you can't fix this problem, then you can cancel my service." Tech support people are lazy, however, in some cases, and may just opt to cancel you. This is a harsh reality in the world of consumer bandwidth -- and it will be worse, soon, with bells closing their DSL lines to competition, meaning unless someone else builds a telephony infrastructure to you, you'll probably pick Cable vs 1 DSL provider, and if you don't like something at either of them, you're just out of luck. -
Re:Hey, I just work there
-
Re:Redundant Solutions?
For connecting a large (300+ seats) internal network at our LAN parties to the Internet via a combination of ADSL and cablemodem lines, I use the Squid Proxy Cache to bundle the lines. This provides us with fault-tolerance, nice load-balancing of the outgoing connections, and a solid cache pool. There's one primary cache (high-end box with fast disks) that is visible to the users, and for each outgoing line a small PC (Pentium 233 will do fine) that acts as a parent (see round-robin option).
We've experimented with load-balancing on a layer below, and I've found it much more difficult to maintain and debug... you know, squid offers beautiful logs and has many cool tuning parameters (I can even put weights on the lines!).
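A rough sketch of what weighted round-robin selection across parent caches looks like, with a dead line skipped for fault tolerance. The parent names and weights are made up, and this only illustrates the idea; it is not how Squid itself implements its round-robin parent selection.

```python
from itertools import cycle


def build_schedule(weights: dict) -> list:
    """Expand {'parent': weight} into one pass of the pick order."""
    schedule = []
    for name, weight in weights.items():
        schedule.extend([name] * weight)
    return schedule


class ParentSelector:
    def __init__(self, weights: dict):
        self._schedule = build_schedule(weights)
        self._ring = cycle(self._schedule)
        self.down = set()  # parents currently considered unreachable

    def next_parent(self) -> str:
        # Try at most one full pass over the schedule before giving up.
        for _ in range(len(self._schedule)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no parent cache is reachable")
```

With `ParentSelector({"adsl1": 2, "cable1": 1})`, two of every three requests go out over the (hypothetical) `adsl1` line; adding `"adsl1"` to `down` sends everything to `cable1` until it is removed again.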
-
Re:Here's an idea
If you really want to do this (and hopefully it's not just for slashdot, 99% of sites have worse advertising than they have here), grab squid from Squid, and a redirector script from here. There are instructions in the package for plugging it into squid - they basically involve setting 'redirector' in squid.conf to point to the redirector.pl script in the package.
In the redirector.pl script, set $WWW to the address of a local file or httpd, and $BANNERGIF to 'null.gif', which is a 1x1 pixel image that comes with the bannerfilter package. Point your browser's HTTP/HTTPS proxy to the local address squid is listening on, and you're now ad-free.
The bannerfilter script is customizable through simple text files, and has an sh script that uses wget to automatically update the ad definitions.
No, I didn't write it. I just use it.
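For anyone curious what such a redirector does, here is a minimal sketch in Python rather than Perl. It assumes the classic redirector interface, in which the proxy writes one request per line to the helper (URL as the first field) and reads back either a replacement URL or an empty line meaning "leave it alone". `REPLACEMENT` and `AD_PATTERNS` are invented for illustration and are not bannerfilter's actual settings or rules.

```python
import re

# Hypothetical values; bannerfilter keeps its real settings in redirector.pl.
REPLACEMENT = "http://127.0.0.1/null.gif"
AD_PATTERNS = [re.compile(p) for p in (r"^http://ads?\.", r"/banners?/")]


def rewrite(line: str) -> str:
    """Return the replacement URL for an ad request, or '' for no change."""
    fields = line.split()
    if not fields:
        return ""
    url = fields[0]  # assumed: the URL is the first whitespace-separated field
    if any(p.search(url) for p in AD_PATTERNS):
        return REPLACEMENT
    return ""


def serve(inp, out) -> None:
    """Answer one line per request until the proxy closes the pipe."""
    for line in inp:
        out.write(rewrite(line) + "\n")
        out.flush()  # the proxy waits for each answer, so never buffer
```

A real helper would end with `serve(sys.stdin, sys.stdout)`; the flush after every line matters because the proxy blocks until it gets its answer.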
-
Re:Junkbuster
The Ad-Zapper for squid works also fine, and if you're what the slashdot users usually pretend to be, you should run squid, not junkbuster.
;)
Also, for spam in general, or rather against it, SpamMotel and especially SneakEmail work like a charm; SneakEmail even lets you reply to (suspected) spammers without revealing your real address.
Of course, if you have your own domain/MX and mail server, you can generate these "one-time" email addresses yourself - but using sneakemail is just too easy and convenient. -
Gateway vs Personal
At my employer, I've been using/evaluating RAV Antivirus for Postfix for a month now, supplemented by a fine collection of regexps for body_checks and header_checks (which keep almost anything that MSWIN can execute from passing the mail server), and I am VERY satisfied. This way the most common infection "procedure" is prevented.
Of course, all of you can say that this is NOT an infallible procedure... but what the hell, none is! Having dozens of desktops with anti-virus is not infallible either. Sure, there are some very fine packages, but if you coordinate your traffic with a good combination of redirectors for SQUID, disable file transfers through messengers, and keep your gateway pretty much tied up, I believe that you can have some relaxation time!
- STATS :
- 5Gb net traffic (mail+web) per day
- 3 virus caught in 27 days
- 0 infections
-
Re:Refusal of responsibility on the part of studen
I've heard complaints about the download limits from people yelling things like "What if I want to download several Linux ISOs?" They don't realize we have a mirror server that has all the latest files on the internal network.
Well, it's a university. The last thing you should expect, is that the people would be able to learn.
;-)
OTOH, if you did something transparent, like, say, having a caching proxy, then they would still end up using your local copy instead of the connection outside, and it would even work for uninformed and irresponsible people.
-
Save your eyes!
Don't gripe! No one has to view ad banners! With Squid Cache and Squid Guard running on your Linux/BSD/Mac OS X (where I use it) box, you never have to view ad banners again. In this case, all you have to add to your "Domain" list in Squid Guard (if you have it set up to block) is "us.a1.yimg.com" (without the quotes) and you will never know that Yahoo has banner ads. I replace all the banner ads with a 1x1 transparent gif.
Using this system also greatly speeds up my web access as I am no longer pulling tons of ads everyday.
-
use the squid
This site's already slashdotted, and there are fewer than 10 comments:
Warning: Too many connections in /usr/local/apache/sites/infosync.no/htdocs/show.php on line 7
Warning: MySQL Connection Failed: Too many connections in /usr/local/apache/sites/infosync.no/htdocs/show.php on line 7
Unable to connect!
<psa>Clearly, too many admins of dynamic sites don't know about squid, which can act as an 'httpd accelerator', meaning you don't have to go to the database for every single request.</psa>
-
Re:Geographic IP Location
The darn thing got my location right within a 15 kilometers range
No distances were given (latitude and longitude, but they're in a weird format...longitude was given as "-115.17" when something like 115°6'48" W would be the usual method), but it nailed both IPs I fed it as being in Las Vegas. (When you consider that reverse-mapping one address gets you lasvegas.net and the other gets you lvcm.com, that probably shouldn't be too surprising.)
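For reference, the "weird format" is just signed decimal degrees, and converting it to degrees, minutes and seconds is simple arithmetic. A small sketch (the function name `to_dms` is made up here; a negative longitude is taken to mean west and a negative latitude south):

```python
def to_dms(decimal_degrees: float, is_longitude: bool) -> str:
    """Format a signed decimal coordinate as degrees, minutes, seconds."""
    if is_longitude:
        hemisphere = "W" if decimal_degrees < 0 else "E"
    else:
        hemisphere = "S" if decimal_degrees < 0 else "N"
    # Work in whole seconds to avoid floating-point leftovers.
    total_seconds = round(abs(decimal_degrees) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{degrees}°{minutes}'{seconds}\" {hemisphere}"
```

Worked by hand for -115.17: 0.17 of a degree is 0.17 × 60 = 10.2 minutes, and 0.2 of a minute is 12 seconds, so `to_dms(-115.17, True)` gives 115°10'12" W.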
Given how easy it would be to fool a geolocation system (especially given nearly everybody else in this thread), I don't see how this is really supposed to be effective...or is it really supposed to be more like CSS, which only thwarts fair use and small-scale copying while doing nothing to stop mass production of "counterfeit" DVDs? There's no reason (other than the bandwidth on my cable-modem connection) why I couldn't open my Squid proxy up to the world. In addition to getting almost no ads, you would appear to be browsing from Vegas instead of wherever you are really located. (How's that for MLP?
:-) ) What's to stop someone from doing this elsewhere, either as a free service or for profit, and enable people to bypass whatever geographic restrictions are placed on a website? -
opensource squid ...
Guess it had to happen, opensource squid getting free in the big ocean!
Guess it has to be their new release ... -
Re:bandwidth/capacity
It's a shame there's not a technology-based solution that automatically kicks in for obscenely popular sites. Some sort of popular site caching mechanism or a P2P system might do the trick (and provide a more legitimate use for P2P technologies). Such a system would also help out in non-emergency situations, such as when a given novelty site gets its 15 minutes of Internet fame.
There is -- it's called a caching proxy. You can set one up at your site, speed up local access, and help reduce load on the internet as a whole. -
Re:Get DSL speeds using your dial in modem!
Nah, just put a Squid proxy between you and your modem. 3 users on 56k, and not a one has complained about slow web surfing since.
:) -
Appliances
If You need a http proxy in enterprise environment, You don't set up a computer to do the task. You get an appliance with service contract.
If you actually have experience with HTTP proxy software, this will only piss you off. Same for firewall appliances, CD tower appliances, file server appliances, web server appliances, you name it.
The right way to do it is to go with what you're familiar with and what you know works. Not what comes in a black box that you suspect might be half-assed. Not what might work in six months.
Spec'ing out the equivalent appliances for two fully-redundant 100GB+ squid proxy servers that you can ssh into, with hot-swap RAID, redundant power supplies, network connectivity of your choice (gig ethernet, for example), etc. with a hardware vendor service agreement that specifies 4-hour parts replacement turnaround time may be an exercise in futility.
(And yes, proxy failover can work much like DNS failover - the client is smart enough to figure it out.)
-
Re:Let the proxy cache be distributed
Great idea, now go write some code.
This would only require minor changes to squid. But although my idea sounds great in theory, I don't think it would have the advantages it should have, because the majority of users just don't bother to install a proxy server and the majority of ISPs (at least here in .nl) choose not to set up a proxy in the users' browsers by default. And since the idea only works with a huge userbase...well...too bad. -
In defense of "the freeloaders"
I'd like to address the "just view the damned ads, you freeloading hippies" crowd.
Personally, the reason I started blocking banner ads (a little over a year ago) was because of one very specific ad--that stupid "punch the monkey ad".
It managed to crap more web no-nos into an ad than I ever thought possible:
- It froze my browser, as my browser had to load the Java runtime to display it. This is nontrivial time under Netscape, and used to be a lengthy wait under IE, as well.
- It moved. Quickly. Very distracting when you're trying to use Altavista to look up a particular bit of LaTeX wizardry.
- If my mouse cursor hovered over the ad, the ad captured mouse focus, and caused my mouse cursor to not always move as it normally would (largely due to the overhead by the Java runtime, I'm sure--I was using a SPARC LX at the time).
- It would frequently cause Netscape to dump core, and would occasionally cause IE to just freeze-up completely.
At the time, it was a very popular ad. I don't know what I was typing into Altavista to make it trigger (LaTeX->latex? Monkeys? WTF?), but I seemed to get it every five pages, and Netscape dumping core every five pages was not conducive to my finding out this LaTeX technique, which I needed right then to finish a CS paper (I'd have used Fondren Library, but this was before the Rice campus library stayed open 24 hours daily).
So, as a temporary fix, I disabled Java (I didn't need it at the time), used a different search engine (Google), got what I needed, and then installed Squid+Cameron Simpson's Ad Zapper (once I'd turned in my paper), and the problem went away. I could have Java as I needed it (Rice's CS department loves Java. Turning it off in a web browser meant not being able to do certain coursework), and my browser didn't crash because of stupid monkeys.
The clear message I'd like to deliver is I don't mind non-intrusive advertising. In fact, most banner ads are very interesting, so long as they don't flash or titter about annoyingly, and don't stupidly try (and fail) to look like dialog boxes (looks really stupid under OpenWin). Occasionally, I click one. However, if it pops up in a separate window, if it spawns things in other windows, if it creates offscreen windows, if it crashes my browser, if it litters my hard drives with cookies, if it prevents me from clicking on your page, or if it dances around like a stupid monkey, I will disable it, and I will go elsewhere.
There are probably a lot of technically-minded users that feel the same way. I don't want to steal content--I don't have this need to remove all adverts from the pages I'm viewing (although, I will strip them out, if I need to print the page). But, my computer is my computer, and if your website can't sit in its window and behave itself, you've just lost a viewer.
-
Squid with SquidGuard is the bomb
As previously stated, SquidGuard on top of Squid Cache is probably a good solution. SquidGuard is HIGHLY configurable for rule-sets, and Squid is a fantastic web-caching proxy server.
I have recently configured such a web-filtering beast at a private middle school that requires web filtering for students. I am VERY happy with the speed of Squid and the configurability of SquidGuard.
FYI, I simply created two lists "adult" and "student", and configured SquidGuard to pass ALL adult user requests on through unchecked, but check for and block 'bad stuff' when a student is making an attempt.
Client is happy, I am happy (and paid). Chalk another one up for censorship!
Kidding aside, this is a middle school and the children's Internet/computer access is monitored by staff/faculty members as well. Squid & SquidGuard are an added assistance. YMMV
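The two-list policy described above (adults pass unchecked, students are checked against a blocklist) boils down to a very small decision function. The sketch below is plain Python, not SquidGuard configuration; the user names, the domain list and the function names are all invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical data; a real deployment would load these from its own lists.
ADULT_USERS = {"teacher1", "admin"}
BLOCKED_DOMAINS = {"badstuff.example", "ads.example"}


def host_is_blocked(host: str) -> bool:
    """True if the host or any of its parent domains is on the blocklist."""
    labels = host.lower().split(".")
    return any(".".join(labels[i:]) in BLOCKED_DOMAINS for i in range(len(labels)))


def allow(user: str, url: str) -> bool:
    """Adults pass through unchecked; everyone else is checked per request."""
    if user in ADULT_USERS:
        return True
    host = urlsplit(url).hostname or ""
    return not host_is_blocked(host)
```

Matching on parent domains means that listing `badstuff.example` also covers `www.badstuff.example`, which is usually what a domain list is meant to do.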
-
Ahh, I'm just bored
Besides, why the hell didn't you take the bait I so carefully prepared for you?
:)
And it wasn't from WHOIS, it was from Squid. -
A few things I do...
Down here in Australia, nearly everyone is stuck on 56k. ADSL costs A$95 a month for 256kb/64kb with a 3GB a month cap. Ouch. Even worse, cable is only available in two cities in the entire country.
Firstly, I set up a junkbuster proxy on my box. Getting rid of all those stupid banners really does help, especially when I'm reading sites like *shudder* CNet or *shudder**shudder* ZDNet, with their huge middle of the page Flash modem-killers. This feeds into a Squid caching proxy; it really does seem to help a fair bit. Thirdly, I run a BIND caching DNS server. Of course, there are plenty of other DNS servers around, but BIND is the one I saw first, so that's what I'm using.
Overall, with a bit of fiddling, it makes being stuck on a 56k suck slightly less. -
block it all
I don't want any of the ads, so I use Bugnosis to detect the web bugs and the free WebWasher proxy with IE to scrub out the cruft, which is somehow available for free on Linux, though I'm told that Squid and Junkbusters can do the same. AdSubtract is another alternative that comes packaged with the ZoneAlarm firewall these days, but I found it to not be as flexible as WebWasher. Unfortunately there are a few sites that do not work with WebWasher, most notably EBay, and no matter how I tell it not to touch EBay's cookies and content, it still blocks something that keeps that site from working.
What is needed is some sort of plugin that works directly with the browser, sets all pages and cookies to be filtered out by default, and which lets you just right click on a page to tell it this site is OK to not filter and remember to let these cookies through. All browsers have the cookie feature, but management is usually a pain with what they provide and often left up to third party tools like all of the above. Sounds like Mozilla has some of this built in, so I'll give it a try...it may be time to make a switch. IE6 is supposed to have some of this cookie control, though I'm not sure if it's to that level of convenience.
I haven't seen an ad or a web bug on pages since I've made that change. I look forward to being popup/under and ad free in the future.
-
squid can protect your http server
Just a single line to say that squid ( http://www.squid-cache.org) can be configured as an acceleration server only (without the proxying), and will automatically deny the default.ida, as well as protect your server from unacceptable requests.
-
webwash the popups out
Someone complained about my mentioning the WebWasher product before, but I have to bring it up here since it's right on topic. WebWasher is a program that's free for personal use and available for Linux as well as Windows. It gets rid of cookies and all the other cruft, but most important of all, I don't get sprayed in the face with popups when I use my browser.
The point is, if you avoid loading these ads, they become an ineffective method of advertising and as marketing sees decreased results from pouring money into them, they'll be less willing to use them. That, or they'll just increase the frequency of the pops so that other people will suffer. It doesn't matter to me because I don't see them and I'm not contributing to their revenue. In a sense, I wish that the banner market was still charging lots of $$$ for these so that they'd be wasting more money per ad campaign before realizing it was being filtered out.
To be balanced with the non-commercial products, I understand that this filtering can also be done with Squid to be effective on a site-wide basis.
-
Re:A step in the right direction
My point is, what business does the ISP have in providing news service anyways?
Performance increase due to nearby caching.
I wonder why ISPs don't run stuff like Squid, since last I heard, the WWW was a fairly popular part of the Internet.
--- -
Re:I can't wait to see this guy's face....
Or free: www.squid-cache.org
.. i've set it up in front of publicfile & apache using publicfile for static content and apache for dynamic content, works wonderfully. --Sean -
Re:From the interview
There is one major example of government funding GPL'ed code: NSA Secure Linux. While I agree that NSA Secure Linux is one of the more egregious examples of the Government developing code which isn't released as public domain information, there are plenty of other cases as well. How would you consider Squid, which is GPL'd code whose development is funded entirely by the National Science Foundation? Granted, the farther away from the OS you get the less distasteful it is, but the issue still stands that public funds have paid for substantial development which is inaccessible to much of the public.
-
Re:Bad Bad Bad, webmaster.
If you run Squid, you can install BannerFilter, which attempts to take care of that for you (while still letting you use whatever browser you want).
--
-
A: Squid uses MD5
Squid uses MD5 keys to keep track of the pages that it's indexed (how else?). It also uses these keys in ICP queries of other sibling/parent servers to find the content. Of course, it doesn't use them in the protocols to talk to webservers... but if the browser/server is willing to use date stamps, what's wrong with that?
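To make the idea concrete, here is a minimal sketch of using an MD5 digest as a fixed-size lookup key for a cached object. The exact bytes Squid feeds into the hash are not spelled out in this thread, so hashing the method plus the URL is an assumption, and `cache_key` and `store` are names invented for the example.

```python
import hashlib


def cache_key(method: str, url: str) -> str:
    """Hex MD5 digest used as a fixed-size lookup key for a request."""
    # Assumed key material: request method and URL, nothing else.
    material = f"{method.upper()} {url}".encode("utf-8")
    return hashlib.md5(material).hexdigest()


# A dict stands in for the cache index: key -> stored object.
store = {}
store[cache_key("GET", "http://www.squid-cache.org/")] = "<cached page>"
```

Every key is 32 hex characters no matter how long the URL is, which is what makes it convenient both as an index and as something compact to pass between sibling caches.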
If the server is going to be explicit about what time something was changed, and how long it should be valid for, this is valuable information, a little more than a checksum can provide. This is all conveyed in a header request (which is less work than downloading a document and calculating the checksum, or the same as asking the "enabled" server for one). -
Re:I like the end of the article...
-
Use your proxies, dammit! (Re:Methods of Caching..
The problem is, everyone wants to look at the same content at the same time; under the current system, the server has to send out one copy of the data to each client that requests it, so if 1000 clients request it, the server has to send 1000 copies.
Have you set your connection proxy?
If not, you probably should. And everyone out there too: The above is exactly what hierarchical proxy-cache servers were designed to prevent! As the name indicates, these servers will proxy your HTTP request(some other protocols can be used too), and cache the result. When another identical request comes in, it is served directly from the cache instead of contacting the server.
The proxy-cache servers are organized in a hierarchical fashion. So when you send a request, it does not matter if it is not currently in your proxy-cache: it may be stored in another cache higher in the hierarchy. The request will be sent upward, and only if it is really found nowhere between you and the target, the target server will be contacted.
The result is: everyone wins!
- On the server side, the server load is greatly reduced.
- On the client side, browsing of popular sites is faster since the contents will likely already be in a proxy-cache closer than the target
In the situation you describe, if your 1000 clients are under 50 different ISPs, there would be only 50 requests to the server. And everyone (except for the first connected guy of each ISP) would browse much faster, since they get all the data directly from their ISP.
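The arithmetic in that example is easy to check with a tiny model. It assumes every client asks for the same cacheable URL, each ISP runs one shared cache, and nothing expires during the burst; the function name and the even split of 20 clients per ISP are made up for the illustration.

```python
def origin_requests(clients_by_isp: dict, use_isp_cache: bool) -> int:
    """Count requests reaching the origin when every client fetches one URL."""
    if not use_isp_cache:
        # No caching: the origin answers every single client itself.
        return sum(clients_by_isp.values())
    # With a shared cache, only the first client at each ISP causes a miss.
    return sum(1 for count in clients_by_isp.values() if count > 0)


# 50 ISPs with 20 clients each = 1000 clients in total.
isps = {f"isp{i}": 20 for i in range(50)}
```

Here `origin_requests(isps, False)` comes to 1000 and `origin_requests(isps, True)` to 50, matching the numbers in the comment.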
Note that some ISPs enforce the use of their proxy. That's a little bit radical, but if every ISP did that, the Slashdot effect would be a memory, and the net would be a better place...
Conclusion:
Save the Internet : set up your proxy !
(Check with your ISP what proxy you should use).
If you want to know more on proxy-caches, check out the docs of Squid, a popular proxy-cache server.
--
SOMEBODY SET UP US THE PROXY ! -
Two reasons why not
There are two important considerations here.
First, the whole point of censorware is that you can't get around it. If you have a choice of whether to run it or not, it might be searching, filtering, categorizing, whatever, but it's not censorware.
The idea of an "open" solution which is forced upon people is a little silly. Apart from the philosophical absurdity, censorware can never work on an open-source operating system without stringent physical controls as well.
(Recall the first rule of security: anyone who has physical access to your machine has the potential to compromise it. This may be as simple as booting from floppy!)
Second, making up a blacklist of porn sites is trivial if you just want to list the ones who want to be listed. Use RSACi. It's already built into your browser. Almost all porn sites rate with RSACi, and they want to be blacklisted, because it helps immunize them from prosecution for providing porn to kids (or at least that's the perception).
If you want to make up a blacklist of sites which don't want to be blacklisted, you have a fight on your hands. It's a phenomenal amount of work to scan the web. Consider the massive server farms and pipes of unholy size that Google or Alta Vista have to use to spider the web. Who's going to volunteer to set up a similar installation to spider porn sites?
If you think you're just going to provide a way for volunteers to send in "hey, I found another porn site" URLs, don't be silly. Most of those submissions are going to be RSACi-rated; almost all the rest will be overlap. The web is huge. Porn is about 1% of it. One percent of huge is still huge.
And then, the big question: who's going to make decisions about these allegedly porn (but not self-rated) sites? Some human being has to categorize them, or you'll be no more accurate than the existing closed-source blacklists (which is to say, laughably inaccurate).
That takes time, and with millions of new or changed pages on the web every hour, do the math and figure out how much time you can expect to get out of your volunteers. How many dollars of free labor does this hypothetical project depend on? Do porn-hating geeks really hate porn that much, that they'll sit in front of a monitor all day for free and surf porn sites?
Short version: if it were easy to do, someone already would have done it. In fact there already exist several places that keep an "open" list of porn sites which can be dropped into any Squid proxy. Most of them are years old and will never be maintained again:
- squidblock.tgz from July 1999
- sxcontrol, last change February 2000
- INfilter, last revised March 2000
- Linux Center's squidblock.tgz
Click the "Latest" link, which is there "just to show that someone is using it!" Note that the "latest" additions to the blacklist include such obscure sites as playboy.com, and such recent new sites as dailydirt.com (domain registered on Jan 12, 1998).
Jamie McCarthy
-
Re:Fake Your Browser
There's also the GPL'ed Internet JunkBuster's user-agent option.
Or you can also do this with Squid via its fake_user_agent option.
Mine returns "Mozilla/4.0 [en] (Linux; Vic-20)"
:-)
--
-
There is one!
If you want open-source censorware, there is squidGuard, a redirector for the Squid proxy server. It provides a great deal of flexibility as to who's allowed access to what, and when.
-
Re:Rise of Proxies
But on the Macintosh machine at home, I haven't been able to find a decent http proxy to filter banner ads, so I've just suffered. Perhaps someone here can recommend a proxy for the Mac... that works.
With MacOS X, Squid (and the filtering add-ons available for it) would be a pretty good option. If it's been ported to any older versions, it might work for you as well. (It's been ported to NT, at least, so it's not completely restricted to UN*Xish platforms.)
The other option, of course, would be to throw Linux onto your Mac and then run Squid under that.
:-) (I picked up a Quadra 610 dirt-cheap recently with that intention, more for sh*ts and grins than anything else...just need something bigger than an 80-meg hard drive now.)
-
Things to try
I worked for a company that took about 750,000 to a million hits/day with almost every page pulled from the DB.
Try the following:
- Add more RAM (at LEAST 512 MB, preferably 1 GB)
- Upgrade CPUs
- Upgrade Motherboards
- Upgrade NIC
- Read the "Optimizing MySQL" chapter (11??) from the MySQL website.
- Try contacting AbriaSoft, which works with MySQL to provide support.
- Try caching as much content as possible. Have you considered Squid? Or you can easily write your own (this is really easy in PHP... email me for info).
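As a rough illustration of the "write your own" caching idea, here is a minimal sketch in Python rather than PHP. It keeps the result of an expensive lookup for a fixed time so repeated page views skip the database. The class name, the `fetch` callback, and the 60-second default are assumptions for the example, not anything from the poster's setup.

```python
import time


class TTLCache:
    """Cache results of an expensive lookup for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # function that does the real (slow) work
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: skip the database
        value = self._fetch(key)     # missing or expired: do the real work
        self._entries[key] = (now + self._ttl, value)
        return value
```

Usage would look like `pages = TTLCache(render_page_from_db, ttl_seconds=30)` followed by `pages.get("/index")`, where `render_page_from_db` is whatever function currently hits the database. The trade-off is staleness: a page can be up to `ttl_seconds` out of date.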
-
Re:Responsible Logging... how about /privacy.txt
There are many good observations in this (and other) posts. As already noted by just about everybody, logging IP addresses is necessary for maintenance.
But as Anomalous Ovum says,
During a transaction IP address will always be known. A log file is merely a form of persistent memory that extends beyond that moment. Therefore the real issue is not whether to log, but how long it is retained.
It is not just how long the information is retained, but how it is used. To make the case clearer, let's look at an example where logging can be more Big Brotherish.
I recall setting up the Squid web proxy and cache at a medium-sized university in 1995. Actually, at that time Squid was still Harvest. Anyway, once my co-admin and I got everything up beyond our own tests, we set the clients around the campus to use it. Naturally, we watched the cache-proxy logs go.
Well, as soon as we saw the URLs that were getting fetched, we immediately decided that "we shouldn't be watching this". We had the IP address of the client and we had other ways of finding out who was logged into that particular workstation. All of a sudden we had a way of tracking who at the university was reading what.
Of course we knew beforehand that we would have that information, but it was only after we ran tail -f on the log that we realized how much of an issue that was.
The first thing that we decided was that if users were going to fetch lots of images, we wanted the material cached, instead of getting dozens of separate requests for the same image. So the cache was doing its job. But we puzzled over what to do about this very private information we suddenly had.
At that point in time, use of the cache was voluntary. One could opt-out by resetting default browser settings. But we wanted as many people to use the cache as possible.
So we were left with a few options:
- Anonymize the logs by masking the IP address that gets logged.
That way, we would know what was being read, cached or not cached which is very useful for maintenance, but have no way to trace the individual user.
Current versions of Squid have that as a configurable feature. We would have just patched Harvest or post-processed the logs.
- Not log at all.
We really needed the information to tune the proxy. This was not an option we seriously considered.
- Keep things private and lie to everybody
The two of us admins agreed to respect privacy, not trace individual users, and only read logs when needed (and mostly using summary stats). More importantly, we agreed that if some PHB in management ever asked us whether we could trace who read what, we would lie and say that it was impossible.
On the whole, I still worry about whether we made the right choice. It worked out well, but we effectively lied to users (by not letting them know that such information was logged), and would have lied to management the same way had it come up.
So back to the main point. Logging may be necessary for security and maintenance, but the real issue is what safeguards are in place against misuse of those logs. Typically, it is only the goodwill of the sysadmins.
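The "post-process the logs" variant of the first option can be sketched in a few lines of Python. This is only an illustration, under the assumption that each log line is whitespace-separated with the client address in the third field, as in Squid's native access.log layout. The function names and the choice of keeping 24 bits are made up for the example.

```python
import ipaddress


def mask_ip(address, keep_bits=24):
    """Zero the host bits of an address, keeping only the first keep_bits."""
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


def anonymize_line(line, client_field=2, keep_bits=24):
    """Mask the client address in one whitespace-separated log line.

    Fields are re-joined with single spaces, so column padding and the
    trailing newline of the original line are not preserved.
    """
    fields = line.split()
    if len(fields) <= client_field:
        return line  # not a line we understand; leave it untouched
    try:
        fields[client_field] = mask_ip(fields[client_field], keep_bits)
    except ValueError:
        return line  # client field is not an IP address
    return " ".join(fields)
```

With 24 bits kept, every workstation on the same /24 collapses to one address, so the URLs and hit/miss codes survive for tuning while the individual machine does not. On a small subnet that may still narrow things down too far, in which case fewer bits should be kept.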
-
Re:HTTP?
I was about to say something similar.
MBone wasn't designed for this (as so many people have pointed out), but Squid was. Ideally, the primary source should be front-ended by a Squid cache that only peers with the secondary mirrors. The secondary mirrors wouldn't even have to synchronize; client requests would automatically force a sync with the primary. And becoming a (tertiary) mirror would be as simple as adding the secondaries to your peering list in /etc/squid.conf.
However, in regards to transparent proxying in the US, I can speak from experience. It doesn't pay off. I used to sysadmin for a smallish ISP (2,000 customers, 400 lines) and we experimented with transparent proxying. With 16 gigs of cache, the proxy was serving about 30% of the requests out of cache. However, after some VIP customers noticed that their real-time stock quotes weren't real-time anymore, I had to turn it off.
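To make the mirror idea from the start of this comment concrete, here is a toy model in Python. It is not Squid and does not speak any real peering protocol. It only shows the pull-through behaviour: each tier answers from its own store and asks the tier above on a miss, so nobody has to push updates downstream. The class, the node names, and the file path are all invented for the sketch.

```python
class CacheNode:
    """A cache that answers from local storage or asks its parent on a miss."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent        # next cache up the hierarchy, if any
        self.origin = origin or {}  # only the primary holds the real content
        self.store = {}
        self.upstream_fetches = 0

    def get(self, path):
        if path in self.store:
            return self.store[path]          # hit: nothing leaves this node
        self.upstream_fetches += 1
        if self.parent is not None:
            body = self.parent.get(path)     # miss: pull from the next tier
        else:
            body = self.origin[path]         # top of the tree: read the source
        self.store[path] = body
        return body


primary = CacheNode("primary", origin={"/release.tar.gz": b"payload"})
secondary = CacheNode("secondary", parent=primary)
tertiary = CacheNode("tertiary", parent=secondary)
```

The first `tertiary.get("/release.tar.gz")` walks up to the primary once; every later request at any tier below is served locally. The sketch never expires anything, which is the same weakness that bit the stock-quote customers: a cache with no freshness check happily serves old data.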
-
Re:Seems to be a slightly different situation now.
IMHO, it is possible to write good censorware that uses a combination of black lists, acceptable sites, and PICS ratings, with an OSS-like list of blocking rules, that allows children to read information on AIDS and breast cancer, but blocks them from the redlight districts, to a reasonably good extent. AFAIK, none of them are currently 'good' enough by our (/.) standards, and I dunno of any open source one.
Then again, ask and ye shall receive. Squid can be used for this purpose, and itself has a couple of pointers to publicly maintained lists of porn sites. Open source porn blocking.
Admittedly this would require a minute amount of overhead to administer. But hey.
-- A mind is a horrible thing to taste.
-
How about Squid?
Squid is a great web proxy cache that many ISPs use to get the same result. It also has hooks built into it to easily join cache hierarchies and NLANR. It's 100% open source too.
Coming back to the topic, Squid, Akamai, Freenet, or any other hierarchical cache structure could make these incredibly high bandwidth home connections work without destroying the servers those customers want to get to. Even when I was using a 56k connection, I ran Squid in my house to save bandwidth. If you applied the same idea to 100 people with 100 Mbit connections in their homes, all hooking into a cache-mother at their ISP's office, I think everyone would be happy.
--------
-
Squid
You can set up Squid to filter them out. I'm not sure of the details, but I know that it isn't too difficult. As an added bonus, Squid also speeds up many web pages, because its primary function is as a web cache and proxy.
You can also set up ipchains to filter out certain IP addresses.
-
Re:host header is fine.. unless....
No. HTTP supports caching. FTP doesn't (at least not easily). If several LAN users are using a local cache, multiple requests to the same HTTP URL will be retrieved once over the Internet. Plug that local cache into a distributed cache mesh and popular content is mirrored automatically and transparently to the user.
If you have a problem with current HTTP clients, man wget, or post your problems in detail so the community can see what it has to offer.
In the larger picture, FTP is inferior to HTTP for anonymous file transfers of any size.
-
Re:Does this work with old clients?
Squid in accelerator mode should do this. You will have to tell it to use the host header though.
Ah. I was looking at it purely from a webserver (Apache) perspective. Thanks, I'll go and read up on Squid... I've been using that as a forward proxy for a while, but never thought of using that as a reverse proxy.
--
Greetings,
Ed.
-
Why shouldn't they show banners?
If you find banners annoying (as I do), simply filter them out with something like Junkbuster, or my favoured solution, Squid and sleezeball. All those annoying flashing ads get replaced with a nice transparent gif. And so the advertising companies still pay my favourite sites, I occasionally click on those transparent gifs too.
If Google wants to add banners, I say good luck to them. I won't be viewing the adverts, but they'll be getting revenue that will keep their service going. As long as the banners don't get in the way of the service, as they have on search engines such as Altavista, then that's fine. It's only when the websites become oriented around products rather than the service that there's a problem. IMO, this is far more likely to happen if they don't display adverts: revenues will no doubt be sapped, which may force them into a position where a buyout is necessary. I somehow doubt any company which would buy them out would run the service half as well as the current Google owners.