How to Work Around Broken Port-80 Routing?
Dr. Zowie continues: "I use a regional ISP with otherwise-very-good policies. However, they seem to be intercepting anything that comes from my home net on port 80, so that they can 'transparently' cache web requests based on the payload of those packets. The proxy seems to work rather well in most cases: I never noticed it until I started using OpenNIC. Then I found that some web pages that should have resolved OK through the OpenNIC system failed, even though routing on different ports worked OK.
"I did some experimentation using 'telnet' on port 80 directly, and found that packets are being routed based only on the payload, regardless of the original destination address: I can (for example) retrieve the Slashdot front page by using 'telnet www.google.com 80' and asking for 'http://www.slashdot.org HTTP/1.1'. The tech support folks seem to be stonewalling me: the main contact tells me that the behavior is 'not broken' even though it clearly violates RFC 1812, the standard set of requirements for IP routers.
"The practice of 'transparent' proxy routing seems to be growing more widespread. It appears to break the internet standard in a way that works for most folks for now, but that breaks port 80 usage in general. Looking ahead, this breakage seems like a growing nightmare waiting to happen. At the very least, I expect more instances of my particular problem to appear as folks give up on the corporate hegemony of ICANN. More insidiously, transparent proxy routers break the layered nature of the internet protocol and restrict the flexibility that made it work in the first place. One would hope that such proxies would at least act like routers when the fancier proxying fails, but my ISP's, at least, doesn't. What about your ISP's?"
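The telnet experiment in the letter is easy to model. The sketch below builds an absolute-form request like the one typed into telnet, and mimics a proxy that picks its upstream host from the payload instead of from the address the client actually connected to. It only illustrates the observed behavior: `build_request` and `payload_routed_host` are hypothetical names, and nothing here reflects the ISP's actual software.

```python
from urllib.parse import urlsplit


def build_request(target: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 request with the given request-target and Host header."""
    return (f"GET {target} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n").encode("ascii")


def payload_routed_host(raw: bytes, original_dst: str) -> str:
    """Hypothetical model of a proxy that routes on the payload: an absolute-form
    URI wins, then the Host header, and only as a last resort the address the
    client actually connected to."""
    head = raw.decode("ascii").split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    _method, target, _version = request_line.split(" ")
    if "://" in target:
        return urlsplit(target).hostname
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().split(":")[0]
    return original_dst


# Sent to www.google.com's address, but the payload names Slashdot.
req = build_request("http://www.slashdot.org/", "www.slashdot.org")
print(payload_routed_host(req, original_dst="www.google.com"))  # www.slashdot.org
```

A router that followed RFC 1812 would only ever look at `original_dst`; the model above never does unless the payload is silent.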
You can use netcat to relay your own port 80 traffic. Simply get a good UNIX shell account and configure your router to direct that traffic to it; the shell box then makes the real connection you were trying to make. However, I would bitch like crazy if my ISP did anything like that to me. If I want to connect to port 80 on something, I want to be connecting to that host's port 80. Any fiddling with it would make me drop that ISP in an instant.
samrolken
Or you could use your own proxy server, like Squid for UNIX or AnalogX Proxy for Win32. You might also try something like the port + 65536 rule: port 80 becomes port 65616 or so (that may not be precise), which would confuse your router but still be port 80. I used a similar trick to get around similar proxying at school.
samrolken
I recently had this problem with my university account... They route all resnet web traffic through an old 386 proxy server that can't handle the load. Find a free proxy out there and SSH-tunnel to it. I'm sure there are more elegant ways of getting through a poorly configured proxy, but this'll work as a quick fix.
Find a friend who has a colocated server or DSL connection.
Then use that machine as a web proxy, or set up an IPsec tunnel to that machine and route your port 80 traffic through that tunnel.
bgphints - internet routing news, hints and ti
I should have posted all this in one comment... oh well...
You could also use a third-party proxy server. You can find gobs of them here:
http://tools.rosinstrument.com/proxy/
and here:
http://directory.google.com/Top/Computers/Internet/Proxies/Free/?tc=1
samrolken
Onenet is the internet "service" provider to most state agencies within Oklahoma, including Oklahoma State University, where I am currently working on a BSEE. Neglecting Onenet's other issues (AOL's netadmins could do a better job than Onenet's), they have a "transparent" web cache proxy. More often than not, errors fetching a web page come not from the browser or the site itself as they should, but from the proxy. DNS errors from the proxy are not uncommon. As for switching ISPs, I can't, which really sucks. But for what I can reach on the net, I'm still getting ultra-cheap broadband :P.
I pledge allegiance to the flag...
of the Corporate States of America...
We had pretty much the exact same problem with our ISP: if we sent HTTP requests out without any proxy configuration, they would often take a couple of tries to get through, because our ISP's transparent proxying didn't work. However, pointing the browser's proxy settings at the proxy itself seemed to solve the problem, since the browser would then ask the proxy directly.
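Scripts can do the same thing the browser setting does. A minimal sketch with Python's standard `urllib.request`; the proxy address `proxy.example.net:3128` is a placeholder, not any real ISP's cache.

```python
import urllib.request


def explicit_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests straight to proxy_url,
    instead of relying on port-80 interception to find the cache."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address: substitute the proxy your ISP actually documents.
opener = explicit_proxy_opener("http://proxy.example.net:3128")
# opener.open("http://www.example.org/") would now talk to the proxy explicitly,
# sending an absolute-form request line such as "GET http://www.example.org/ HTTP/1.1".
```

Because the client addresses the proxy on purpose, the proxy no longer has to guess where an intercepted packet was going.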
:)
Don't ask me why
At my high school, the current system for blocking web pages was introduced as a means to cache commonly used pages and make the District 225 intranet faster. The superintendent and members of the district board know very little about computers, so naturally it was approved. After the Columbine incident, a new feature was tacked on that blocked certain objectionable web sites. The recent WTC attack caused even more areas of the net to be restricted. Today, when I search for "terrorism" for a paper on the war in Afghanistan, my results are blocked. Teachers have informed us that we must use the one non-blocked computer in the tech room, or do research at home.
My friend set up an anonymous web-surfing proxy on his home computer, and using it I can get whatever I want.
There are publicly available anonymous port-80 proxies still around.
SIGERR: laziness exceeds quota
That's why I suggested adding 80 to 2^16 and setting your proxy to connect at that port. It's the same port; the auto-proxy-router thing just wouldn't see it as such.
samrolken
The thing is, in most cases they probably won't listen to problems like this one or your proxy issue. But I found a way to make them listen to you:
Phone them up saying that you want to cancel the service. Mention something about their web hosting being broken. They will probably say that they will have a management person phone you back to confirm the process.
When they do phone back (for me, the call went like "Hello, there was a call earlier about a slow connection?"), you have someone on the line who is interested in helping you, has the power in the organisation to really fix things (because they're management or a senior tech), and wants to get your issue fixed so they don't lose your business. THIS is when you really try to explain what's going on.
This was my experience. Perhaps it will work for you.
Proxy servers: they might not be caching 8080 or other proxy ports. Check http://tools.rosinstrument.com/proxy/
Bouncers: you run this program on an external server on a port that's not filtered. You just point your browser at this IP/port and you're outside your filtered ISP. Check www.freshmeat.net
SSH: tunnel or route from an external box.
Really, if you can't go through it, go around it, either with software or networking.
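A bouncer is just a TCP relay. A minimal sketch with Python's `asyncio`, assuming you control the external box; the listening port 8081 and the upstream (a proxy on that box's loopback, port 3128) are hypothetical placeholders.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes one way until EOF, then half-close the destination."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        writer.close()


async def start_bouncer(listen_host: str, listen_port: int,
                        upstream_host: str, upstream_port: int) -> asyncio.AbstractServer:
    """Accept connections on listen_port and relay each one, byte for byte,
    to upstream_host:upstream_port."""

    async def handle(client_reader, client_writer):
        try:
            up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
        except OSError:
            client_writer.close()
            return
        await asyncio.gather(pipe(client_reader, up_writer),
                             pipe(up_reader, client_writer))
        up_writer.close()
        client_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)


async def main() -> None:
    # Hypothetical layout: unfiltered port 8081 on the external box,
    # relaying to a proxy (e.g. Squid) listening on that box's port 3128.
    server = await start_bouncer("0.0.0.0", 8081, "127.0.0.1", 3128)
    async with server:
        await server.serve_forever()

# On the external box: asyncio.run(main())
```

Point the browser's proxy setting at the external box, port 8081. The relay never parses HTTP, so the ISP sees only traffic to a non-80 port; note that it is an open relay to the upstream unless you firewall the listening port.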
-
Well, if crime fighters fight crime and fire fighters fight fire, what do freedom fighters fight? They never mention that part to us, do they? - George Carlin
That requires an external box, like the shell account the original comment mentioned. If you have that, you could use some more advanced schemes, like routing only the SYN packets for port 80 through your external account. This way you wouldn't cause three times the traffic, as you do with a proxy (your connection plus twice the external connection).
The poster mentioned that he uses OpenNIC, which is an alternative DNS root. His request is proper HTTP, but a transparent proxy that does not "see" domains in this namespace effectively blocks you from viewing web pages under those domains.
His own box is properly configured to do OpenNIC lookups, but the HTTP request to the (proper) webserver gets intercepted. Now the proxy has to do the real HTTP request, but the proxy does not know about the alternative domains and probably returns a "Host not found" error.
I haven't heard of free proxy servers supporting one of the alternative NICs, and I doubt the ISP will be interested in subscribing to such a service. I guess the only solution will be to convince a friend to set up a proxy on a box someplace else.
Some alternative roots have their own "real" Internet domain which acts as a gateway domain; for instance, name.space has http://name.space.xs2.net/ (regular hostname), which enables non-subscribers to view http://name.space/ (namespace only), making the domains available globally. If OpenNIC provides such a service, an alternative solution could be to run some proxy at home and let it rewrite OpenNIC URLs into "regular" URLs.
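The rewriting step such a home proxy would perform is small. A sketch, assuming a hypothetical mapping table: only the name.space to name.space.xs2.net pair comes from the comment, and treating subdomains the same way is an unverified assumption.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table. Only the name.space -> name.space.xs2.net pair comes from
# the comment above; whether OpenNIC runs an equivalent gateway is unknown.
GATEWAYS = {"name.space": "name.space.xs2.net"}


def rewrite_to_gateway(url: str, gateways: dict = GATEWAYS) -> str:
    """Rewrite a URL on an alternative root so it names the gateway domain instead.
    Assumes (unverified) that subdomains map the same way as the bare name.
    Any user:password@ part of the URL is dropped."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    for alt_name, gateway in gateways.items():
        if host == alt_name or host.endswith("." + alt_name):
            new_host = host[: len(host) - len(alt_name)] + gateway
            netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
            return urlunsplit(parts._replace(netloc=netloc))
    return url  # not an alternative-root name: leave it alone


print(rewrite_to_gateway("http://name.space/"))  # http://name.space.xs2.net/
```

Since the rewritten name lives in the regular DNS, the ISP's proxy can resolve it like any other.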
( ^_^)/
Actually, that's not true. Often you want to send an arbitrary HTTP request to an arbitrary host. See my example in the article.
Cheers,
Craig
(1) Line up a serious alternative ISP. Talk to their sales department; see if they do the same thing.
(2) Talk to your ISP's sales department. Tell them your problem. Tell them you're ready to move. (Perhaps ask what the hit rate of the cache is, that is, whether the overhead is worth it for them.) See if they offer any accommodation.
(3) Go with the ISP that does what you want.
If you're using them for DSL, you may not have a lot of choice.
(As others suggested, if host resolution is your issue, you could run a local proxy on your 127.0.0.1 interface that converts host names into addresses.)
Stupid job ads, weird spam, occasional insight at
I reply to this because I bet a lot of people are going to think this.
The real problem is that you're probably using port 80 for something other than its explicit purpose.
No, that's not it at all. Follow the openNIC link.
What he's trying to do is resolve an address, via the perfectly standard and normal DNS protocol, with an alternative root server. This is also perfectly standard and normal. This is not a violation of DNS, nor any other protocol, nor is it a particularly weird thing to want to do. (Unusual, but perfectly normal.)
The problem is that his ISP is catching all traffic to port 80 and redirecting it to their proxy. Thus, when he asks for "http://www.something.nonstandardroot", the web proxy is interfering with the request (presumably after his home computer correctly resolved the DNS address of www.something.nonstandardroot), catching the GET part of the HTTP request, extracting the server name, and attempting on its own to resolve the name.
(Note this is a complete waste: The home computer has probably already resolved the address, now the proxy will resolve it again.)
Unfortunately, the proxy is too ignorant to know how to resolve the alternate DNS address. It's not incapable in the technical sense, it just doesn't understand root servers it's not configured for. The problem is that this means that the perfectly normal and acceptable HTTP request, for an HTML document, on an IP address the client computer has already perfectly normally resolved, gets lost, because the proxy doesn't know how to resolve the address. Bad proxy!
A workaround, albeit a sucky one, is to resolve the address on one's home computer, then go to that IP address manually. This still causes problems on subdomain-aware web servers, where several domains or subdomains may all come from the same IP address and the server uses the host part of the HTTP request (the Host header) to decide what to serve. (You could code up a quick Python/Tk script to do this, but it'll still suck.)
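Why the IP-address workaround breaks virtual hosting can be seen in a few lines: once the URL names an address, that address is all the browser can put in the Host header. A sketch; `resolve` stands in for a resolver pointed at your alternative root, and 192.0.2.10 is a documentation address, both hypothetical.

```python
from typing import Callable
from urllib.parse import urlsplit, urlunsplit


def ip_literal_url(url: str, resolve: Callable[[str], str]) -> tuple:
    """Swap the host name in url for the address resolve() returns.
    Returns (new_url, host_header), where host_header is what a browser
    derives from the new URL: the name the server needed is gone."""
    parts = urlsplit(url)
    ip = resolve(parts.hostname)
    netloc = ip if parts.port is None else f"{ip}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc)), netloc


# Hypothetical local resolver result for a name under an alternative root.
local_dns = {"www.something.nonstandardroot": "192.0.2.10"}.__getitem__
print(ip_literal_url("http://www.something.nonstandardroot/index.html", local_dns))
# ('http://192.0.2.10/index.html', '192.0.2.10')
```

A server hosting several names on 192.0.2.10 receives "Host: 192.0.2.10" and can only fall back to its default site.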
So, when you say a proxy is not required to route anything anywhere, you've accidentally hit on the exact problem: a proxy shouldn't be routing, because it may not know how. This proxy tries to. That's why it sucks.
And to cover the last part of your post, there's absolutely nothing non-standard about any of this, except the behavior of the proxy, which is the only thing in this whole mess that hasn't "embrace[d] the DNS standard, HTTP standard and the routing standard". ICANN's root servers are not written into RFCs. They are merely common practice, one that many people, probably correctly, believe is an increasingly dangerous common practice. (You may not completely agree, but the opinions deserve consideration.)
How could a number outside 16 bits make it to a router since TCP only holds 16 bits for ports? If you wrap around to 80, you have 80, not 65616.
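The 16-bit limit is easy to check. A small sketch (function names are ours) showing that 80 + 2^16 truncates back to 80, and that a 16-bit field cannot hold 65616 at all:

```python
import struct


def port_after_truncation(port: int) -> int:
    """What survives if a port number is forced into TCP's 16-bit field."""
    return port & 0xFFFF


def fits_tcp_port_field(port: int) -> bool:
    """struct refuses to pack a value that doesn't fit an unsigned 16-bit field."""
    try:
        struct.pack("!H", port)
    except struct.error:
        return False
    return True


print(port_after_truncation(80 + 2**16))   # 80
print(fits_tcp_port_field(65616))          # False
```

Either the software rejects the number outright or the packet carries plain 80, which the interception rule matches like any other.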
-Kevin
If you look at it from your ISP's standpoint, transparent proxies aren't as evil as you make them sound.
99.9% of the ISP's clients aren't trying to do anything tricky like this. Of those 99.9%, say only 40% have a proxy server specified. These 40% get to enjoy faster web browsing, which is probably all they're doing anyway. The other 60% enjoy slightly less quick web browsing, but that's their own fault, right? They're the only ones losing out, right?
Wrong. The ISP has to pay for bandwidth. The ISP doesn't like the proxy only because it makes browsing snappier, it likes the proxy because it also saves them on bandwidth costs! If the other 60% of the clients were using the proxy they might save 10%, or more, on total bandwidth costs.
You could think of it like this, too: that's 10% more bandwidth available for the clients at no additional cost to the company (apart from the capital for the proxy server). Yes, they're not perfect, but they make a difference. When you weigh the pros and cons, well, it's obviously going to be worth it for the ISPs to have it installed.
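The 10% figure implies a particular cache hit rate. A back-of-the-envelope sketch, assuming (our assumptions, not the poster's) that every client generates the same amount of traffic and that all of it is cacheable HTTP:

```python
def share_saved(newly_proxied_share: float, byte_hit_ratio: float) -> float:
    """Fraction of total upstream bytes saved when newly_proxied_share of the
    traffic starts passing through a cache that serves byte_hit_ratio of it."""
    return newly_proxied_share * byte_hit_ratio


def hit_ratio_needed(newly_proxied_share: float, target_saving: float) -> float:
    """Byte hit ratio required to reach target_saving under the same model."""
    return target_saving / newly_proxied_share


# The poster's figures: the 60% who don't configure a proxy, a 10% saving.
print(round(hit_ratio_needed(0.60, 0.10), 3))  # 0.167
```

So the claim holds if the cache serves roughly a sixth of those clients' bytes; if a large part of the ISP's traffic isn't web traffic at all, the required hit ratio rises accordingly.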
You could look around for an ISP that doesn't use a transparent proxy but, as you said, they're becoming more popular. Realise that they're not doing it to squash your freedom, but to provide better service and to save money.
Here in Singapore, ISPs are required by law to block port 80, forcing all outgoing http requests to go through a proxy server (which filters out webpages which are deemed unsuitable for Singaporeans to view, including www.playboy.com), or to have a transparent proxy server blocking out such requests.
This has caused me many problems before, when my IP gets determined wrongly by the remote site (which naturally takes the proxy server's IP for my IP address). Some applications don't like the transparent proxy either: for example, FrontPage Extensions (not my choice to use!), and an autopatching program which refused to download the latest version of a file, insisting on downloading only the file cached in the proxy server until the cache got flushed.
The only real method of bypassing the proxy is to use another proxy server (since 8080 isn't blocked) outside the ISP's network. This tends to be really slow though.
I guess I have to live with this until the government one day realises that proxy servers cannot stop the people from viewing pr0n, and it's probably not worth maintaining the proxy servers to meet the demands of all the net users in Singapore, not to mention maintaining the list of sites to block.
Second, there are a lot of ways around it which involve tunnelling.
Tunnel to another box running a non-broken web cache. I used to tunnel my http traffic through ssh to my colocated boxes, which ran adzapper, and proxied through that.
Tunnel at the IP layer by running any IP-in-IP encapsulation. If you have some version of Windows, for example, you might convince someone with a server to run a PPTP server for you somewhere, and you could tunnel through that. There are even free PPTP servers for Linux available to help.
Find someone who runs a little proxy for their own net with SOCKS, and bounce off their SOCKS proxy. Someone you know on another ISP probably has WinGate or the like running, and if they allowed it (and some older versions permit this by default), you could set your browser's SOCKS settings to bounce off their proxy server; since SOCKS isn't on port 80, your ISP will probably ignore it.
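At the protocol level, bouncing off a SOCKS proxy starts with a couple of short binary messages sent to the proxy's own port (conventionally TCP 1080), which is why a port-80 interception rule never sees them. A sketch assuming the proxy speaks SOCKS5 (RFC 1928); the comment doesn't say which SOCKS version is in use, and a real client would also read and check the server's replies, omitted here.

```python
import ipaddress
import struct

SOCKS_VERSION = 0x05
CMD_CONNECT = 0x01
ATYP_IPV4 = 0x01
ATYP_DOMAIN = 0x03


def socks5_greeting() -> bytes:
    """Client hello offering only method 0x00, 'no authentication required'."""
    return bytes([SOCKS_VERSION, 0x01, 0x00])


def socks5_connect_domain(host: str, port: int) -> bytes:
    """CONNECT by name: the SOCKS server resolves host with *its* DNS."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("host name too long for SOCKS5")
    return (struct.pack("!BBBBB", SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name))
            + name + struct.pack("!H", port))


def socks5_connect_ipv4(address: str, port: int) -> bytes:
    """CONNECT by address: useful when only your own resolver knows the name."""
    packed = ipaddress.IPv4Address(address).packed
    return (struct.pack("!BBBB", SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_IPV4)
            + packed + struct.pack("!H", port))
```

For the OpenNIC case, the address form matters: connecting by name hands resolution to the SOCKS server's resolver, which may not know the alternative root either.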
There are also a number of things you might discuss with your ISP to resolve the issue.
Suggest that they switch to a less broken cache server. (Squid, anyone?)
Suggest that they exempt you specifically from the cache server by telling it to ignore your ip address.
Note that they have an obligation to make sure their caching software doesn't interfere with your browsing; so it will be necessary (and not cost-effective for them) for you to call for every problem you notice.
Obviously, you'll probably need to speak to a whole string of supervisors and eventually get transferred to a "real engineer", and they will probably hack in a fix (like exempting only you) rather than truly deal with the problem.
If all else fails, then you may want to try issuing ultimatums, like, "If you can't fix this problem, then you can cancel my service." Tech support people are lazy in some cases, however, and may just opt to cancel you. This is a harsh reality in the world of consumer bandwidth, and it will soon get worse, with the Bells closing their DSL lines to competition: unless someone else builds a telephony infrastructure to you, you'll probably be choosing between cable and a single DSL provider, and if you don't like something at either of them, you're just out of luck.
The BEST solution, which unfortunately will never be implemented, is to allow specifying a port number in a DNS lookup. Then, when the browser or e-mail client looks up the address, it could also get the port you want.
Unfortunately, this ain't gonna happen without a rewrite of everything.
Sometimes it's best to just let stupid people be stupid.
Of course, the problem with transparent servers is when they're not, and your ISP seems to have one that isn't. Is it possible to find out what kind it is, either by telnetting to the thing and looking at headers or by asking the ISP, and can you do bug reports to the vendor to get them to fix their product?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
My college has a similar setup because it saves an incredible amount of bandwidth. It's not to be mean, or malicious, or spy on your browsing habits, it's just to save bandwidth. And it does (I wish I had numbers to back this up, but I don't run the proxy).
There have been problems with the proxy in the past (it not returning any data) and there are still some minor issues, but on the whole it works well (in that you don't ever notice it).
It sounds like the ISP in question has a bug in their web cache code. If the web cache doesn't have the particular URL cached, it forwards the request to the intended destination. I'd bet it's trying, but it can't look up whatever OpenNIC URL is being specified (because it doesn't use OpenNIC). The ISP really should report this bug to the manufacturer.
My advice is this -- get the ISP on your side to fix the problem. They won't remove the proxy, and they shouldn't have to if the bug is fixed.
"Save the whales, feed the hungry, free the mallocs" -- author unknown
AOL's transparent proxy is a little worse. It ignores the port and proxies anything that looks like HTTP. Of course, they deny having a transparent proxy, but I was able to watch packets leaving our network headed for AOL and then watch altered packets come back from AOL.
I stumbled across this when their proxy had some trouble with the cookies we were using and suddenly no one on AOL could use our service. A few minutes later they could again. Then they could not. During this time, I was running a packet logger on the outgoing traffic from our server and on the incoming traffic to a workstation I had connected to AOL. Everything worked fine until the server sent the cookie. Then AOL suddenly stopped sending more packets. This occurred on every port I tried, even ports reserved for other services.
The web cache is exhibiting correct behavior. When a forward proxy cache (transparent or not) gets a request in the form of GET http://www.site.com/ HTTP/1.1, it will use the www.site.com address regardless of what original DNS name you went to (www.google.com in your example). In the transparent case, where the GET statement looks more like GET /content.html HTTP/1.1, it will use the original destination address.
In other words, it's your client that's broken. See RFC 2616 for details.
The unfortunate truth is that more often than not, sites simply don't set their cache controls correctly. They forget that caches don't exist just on the server side but that they exist on the client side as well. Section 13 of RFC 2616 explains how they work in great detail and it really should be mandatory reading for any site administrator.
If you're still looking for more information on web caching, check out Content Delivery Networks by Scot Hull. It was just released and is available on Amazon. There is an enlightening section on web caching that should clearly explain why what you're seeing is in fact correct behavior.
I don't see a problem with what he's trying to do.
The problem he's having is that he's asking for an OpenNIC web site, and not receiving the page. The problem is as follows:
The "address" of the site he's looking for is present in two separate places in the request he's making. The IP Header includes the IP address of the site, and the HTTP header includes the URL, which includes the server name.
When he requests a web page from an OpenNIC TLD, his machine correctly resolves the hostname and constructs a request, which is sent through his ISP. The web proxy intercepts the request and tries to proxy it, so that the response can be cached for later lookups.
Apparently, the Web cache is not configured to lookup machines under OpenNIC TLDs. That's reasonable, but that shouldn't stop a web browser from being able to see the web page.
If the web proxy can't identify the hostname present in the URL, it should simply pass the request through, allowing the client (who already knows the IP) and the web server (who also, clearly, already knows its own IP) to communicate. This would deny the client the benefit of the cache, but would allow the client and server to communicate.
By accusing the poster of "[choosing] to disregard the other relevant standards," I can only assume you're talking about his testing the web requests through a telnet client. I think that was an excellent troubleshooting procedure. It clearly identified the source of the problem.
HTTP does have its own rules, but none of those rules should override TCP/IP. If this user makes a request to a web server (he has obviously already identified the IP address of the server, or he wouldn't be attempting an HTTP request), the caching proxy shouldn't be hijacking his request for any reason. It may be misconfiguration, or it may be broken proxy software, but it certainly isn't the user's fault.
If you want to find the IP address of a transparent proxy, simply point your web browser at a web page that will print out "your" IP address when you request a web page. Instead of printing the IP of your firewall or your host, it will print the transparent proxy's IP address.
For example:
After that, you may be able to do some more investigation into what kind of host it is and/or what kind of software it is running. (This is left as an exercise for the crac...err, reader.)
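Such a page only has to report the address it saw. A minimal WSGI sketch (`whats_my_ip` is our own name) that also echoes the Via and X-Forwarded-For request headers. RFC 2616 requires proxies to add Via, and X-Forwarded-For is a common convention, but a misbehaving transparent proxy may send neither.

```python
def whats_my_ip(environ, start_response):
    """Report the peer address this server saw, plus headers proxies often add."""
    lines = [f"REMOTE_ADDR: {environ.get('REMOTE_ADDR', 'unknown')}"]
    for key in ("HTTP_VIA", "HTTP_X_FORWARDED_FOR"):
        if key in environ:
            lines.append(f"{key}: {environ[key]}")
    body = ("\n".join(lines) + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it, serve it from a box outside your ISP on port 80, so the interception actually applies, for example with `wsgiref.simple_server.make_server("", 80, whats_my_ip).serve_forever()` (binding port 80 usually needs administrator rights). If the REMOTE_ADDR it reports isn't your own address, something in between is answering for you.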
Normally what you do is layer 4 switching, but note that you can do switching on layer 7 as well, which means the switch can do URL-based switching, where part of the URL determines how the request gets switched. This requires much more power and is mostly done for server switching, like load balancing.
What happens in your case might be that they have placed a switch that can do at least layer 4 switching, between you and the internet.
What is then done is that all port 80 requests coming from the client side (you) are redirected to the proxy, which means that HTTP requests on other ports will not be cached. Note that anonymous FTP can also be proxied.
A "clever" proxy/switch solution can do IP spoofing so the web server gets your IP address and sends the response back to you directly, but as there is a switch in between, it redirects the result to the proxy, which then sends it back to you.
A way to avoid it is to get a gateway somewhere that can channel your HTTP traffic; you could set your browser to use this gateway as a proxy on any port. The switch will most likely not act on traffic to that port and will pass it through.
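The redirect decision described above fits in a few lines. A sketch of the layer-4 rule only: the cache address is hypothetical, and a real switch rewrites packets rather than returning tuples.

```python
from dataclasses import dataclass

# Hypothetical address of the ISP's cache; illustrative only.
CACHE = ("10.0.0.80", 3128)


@dataclass(frozen=True)
class Flow:
    protocol: str   # "tcp" or "udp"
    dst_ip: str
    dst_port: int


def next_hop(flow: Flow) -> tuple:
    """Layer-4 redirect rule: only TCP to port 80 is diverted to the cache;
    everything else goes where the client addressed it."""
    if flow.protocol == "tcp" and flow.dst_port == 80:
        return CACHE
    return (flow.dst_ip, flow.dst_port)


print(next_hop(Flow("tcp", "192.0.2.1", 80)))        # ('10.0.0.80', 3128)
print(next_hop(Flow("tcp", "198.51.100.5", 8080)))   # ('198.51.100.5', 8080)
```

The second call is the gateway trick: a browser configured to use an outside proxy on port 8080 never matches the rule.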
The easy way would be installing a proxy server on a box that you have access to on the outside and configure it so that it won't cache anything.
OK, this is a bit OT, but since you're from Singapore, I'm curious about something. I know that when filtering was proposed there, many people weren't happy about it. Has there ever been a move to form something akin to the EFF to protest this, or is the political situation still such that doing this would get you hauled into court by the government?
The whole political situation there baffles me. More repressive governments have been forced to reform by popular protests. Why hasn't it happened in Singapore? You'd think that, with the extent to which the country is connected to the rest of the world, people would see what's happened in places like Indonesia, Thailand, Yugoslavia, etc. and want to do the same.
That light you see at the end of the tunnel might be from an oncoming train.
Have you ever configured IPsec connections, particularly across platforms? The most cross-platform methods are X.509 certificates and preshared keys. Neither method is viable to distribute among everyone. Sure, with X.509 you can in theory have common CAs sign your keys, but that costs money. You could preshare your own CA certificate and sign it yourself, but then you need the same amount of connection setup for every site you connect to as you had before.
A more likely solution is to configure your own proxy beyond the ISP's control. That's also not easy, since most people don't have machines in that position, but your suggestion is strange enough by itself.
IPsec wasn't ever meant to be used for opportunistic encryption applications (like HTTPS, SSH, etc.), but to establish longer-term connections that would be used for arbitrary protocols rather than for such common ones.
XML is like violence. If it doesn't solve the problem, use more.
Hello,
How can you detect transparent proxying? Or opaque proxying?
Douglas Calvert
If you connect to a specific IP address, a transparent proxy should connect to that very same IP address. If it connects to any other for any reason, it is applying a sort of "routing" logic. Apparently what happens is that, because the client includes an HTTP version 1.1 "Host" header, the proxy prefers to do a DNS lookup on the hostname given and (if it finds it) connect there instead of the client's original destination IP address.
This is broken. If the proxy has a different idea of what domain names mean, it gets the wrong web site, or perhaps fails to get one at all. A correct transparent proxy implementation should always connect to the very same IP address the client tried to connect to without regard to the "Host" header (which must also be passed along). A DNS lookup can still be done to optimize the cache. If the destination IP address is in the list of A records from the DNS query, then it can simply be matched to the cache by name alone. However, if the IP address does not match any that DNS gets, then those pages can still be cached, but they must be cached under the tuple of both the destination IP address and the "Host" header name together (as this content can be different than any other for the same host name or the same IP address).
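The rule in the previous paragraph translates fairly directly into code. A sketch of the connect-and-key decision only; `upstream_and_cache_key` is a hypothetical name, the default lookup uses the proxy's own IPv4 resolver, and the cache itself is omitted.

```python
import socket
from typing import Callable, Iterable, Tuple


def ipv4_a_records(name: str) -> list:
    """Default lookup: the proxy's own resolver (IPv4 only)."""
    return socket.gethostbyname_ex(name)[2]


def upstream_and_cache_key(dst_ip: str, host_header: str, path: str,
                           lookup: Callable[[str], Iterable[str]] = ipv4_a_records
                           ) -> Tuple[str, tuple]:
    """Always connect to the client's original destination; share the
    name-only cache entry only if DNS agrees the name maps to that IP."""
    name = host_header.split(":")[0].lower()
    try:
        addresses = set(lookup(name))
    except OSError:            # e.g. a name under a root the proxy doesn't use
        addresses = set()
    if dst_ip in addresses:
        key = (name, path)             # ordinary case: cache by name alone
    else:
        key = (dst_ip, name, path)     # keep this content apart from other copies
    return dst_ip, key
```

Note that the DNS lookup only ever affects the cache key; a failed or disagreeing lookup can no longer send the request somewhere else.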
Maybe someone can provide a list of which transparent proxy cache programs do it wrong, and which do it right (as I have not examined these programs). I don't know if peakpeak.com will change out the software once they find something that does it right (or even make a configuration change if it turns out that's all that is needed). Ironically, if you find an outside proxy server which can do it right for you, you could connect directly to that service via a different TCP port and end up defeating the efforts of your ISP to save upstream bandwidth by caching.
now we need to go OSS in diesel cars
There are a couple of tricks to make a proxy skip its own name resolution and use the IP you want: use IPs, short IPs, hex, octal; convert the IP. I haven't had to use this trick in a while, YMMV.
http://1075594134
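The number in that URL is just an IPv4 address written as one 32-bit integer. A sketch using Python's standard `ipaddress` module to decode it and to produce the other spellings mentioned above; whether a given browser or proxy accepts each form varies, as the "YMMV" suggests.

```python
import ipaddress


def numeric_host_forms(dotted: str) -> dict:
    """Other spellings of an IPv4 address that some URL parsers accept."""
    addr = ipaddress.IPv4Address(dotted)
    value = int(addr)
    return {
        "dotted": str(addr),
        "decimal": str(value),
        "hex": hex(value),
        "octal_octets": ".".join("0" + format(octet, "o") for octet in addr.packed),
    }


print(ipaddress.IPv4Address(1075594134))   # 64.28.67.150
print(numeric_host_forms("64.28.67.150"))
# {'dotted': '64.28.67.150', 'decimal': '1075594134', 'hex': '0x401c4396', 'octal_octets': '0100.034.0103.0226'}
```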
The biggest problem with proxies is sites like Slashdot with frequently updated content. Graphics can be cached, but the HTML pages change every time. Sometimes I find it funny at work when a site shows the wrong time because of the cache; forcing a reload fixes the problem.
At work we have a caching/compression solution; we can speed up text to almost 768K+ on wireless connections. Graphics are compressed, but not as much as text. Just imagine a GPRS connection with a good caching/compression solution: DSL-type speeds. (Shameless plug) You can get it at ATTWS with Pocket PC.
Run up their support costs until they start using a non-broken proxy cache. Technical solutions are nice, but they only fix the problem for *you*. If you care about your peers, and the community of users, solving the problem for *everyone* is much to be preferred. Most users won't even understand that they are being screwed by the ISP. They depend on you to resolve the issue. Keep calling support until they fix it.
-I like my women like I like my tea: green-
... are a pain in the rear. From time to time, the web proxy will just... die. No data from my box can go out on port 80 to any sites for a good 10-30 minutes. This is in addition to the usual crap with their gateways, which cause stalls in ALL data transfers at random intervals, for a solid 30-70 seconds. Ironically, that gateway problem stalls my large file downloads and makes it near impossible to view streaming media at any level of enjoyability... The two biggest features flaunted by broadband services like Comcast. Anyway, sorry for the OT rant. :P
Mozilla's a nice operating system, but it needs a better browser.
If you can use a tunnel server, like IPSEC or PPTP or SSH, which lets you pick the IP address to send your IP packets to but doesn't interpret the packets itself, you'll mostly be ok (you'll still have to make sure to do your own DNS if you want to resolve on alternate roots.)
Bill Stewart
Thanks for saving me the trouble of explaining it, since most of the other posters seem to have misunderstood the problem.
The first two solutions are obvious, and I'm surprised caching proxies still don't use either of them. I'm sure they've been suggested before, but I haven't been keeping up with the caching IETF working groups.
Solution 1:
Send to the proxy the address of the DNS server you want the proxy to use for resolution. This is a kludge, and would result in duplicate DNS queries, which can take a long time, but at least the proxy would see the world as your client would see it. Unfortunately, you're either breaking the proxy's transparent feature by doing this, or you're mixing up the layers and violating good architecture by embedding resolution information in your HTTP client request.
Solution 2:
HTTP/1.1 specifies the host field for HTTP client requests. The solution is to also provide an IP address field, but make it optional. If it is present, then the proxy would not resolve the host field to an address, but otherwise it would. It's simple and it does not violate the nice layers set up in the architecture.
This is the solution I would like to see, and it is so simple.
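Solution 2 is easy to express on the proxy side. A sketch: `x-destination-address` is a made-up field name (HTTP/1.1 defines no such header) and `choose_upstream` is hypothetical; the fallback path mirrors what proxies do today.

```python
import ipaddress
import socket
from typing import Callable, Mapping

# Hypothetical field name: HTTP/1.1 defines no such header.
ADDRESS_FIELD = "x-destination-address"


def choose_upstream(headers: Mapping[str, str],
                    resolve: Callable[[str], str] = socket.gethostbyname) -> str:
    """If the client supplied the address it resolved, use it; otherwise
    fall back to resolving the Host field as proxies do today."""
    fields = {name.lower(): value for name, value in headers.items()}
    explicit = fields.get(ADDRESS_FIELD)
    if explicit:
        return str(ipaddress.ip_address(explicit.strip()))  # rejects non-addresses
    return resolve(fields["host"].split(":")[0])
```

One design consequence: a cache honoring such a field would have to key those responses on both the name and the address, or one client could poison the cache entry for a name everyone else resolves differently.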
If the user configured his browser to use a specific proxy, then I would agree with you regarding RFC2068. The client in essence is delegating DNS responsibility to the proxy server. However, what is happening here is called transparent proxy. There is no DNS delegation taking place. And RFC2068 requires that semantic transparency be preserved (although it does not seem to differentiate types of proxies). It says:
semantically transparent
A cache behaves in a "semantically transparent" manner, with
respect to a particular response, when its use affects neither the
requesting client nor the origin server, except to improve
performance. When a cache is semantically transparent, the client
receives exactly the same response (except for hop-by-hop headers)
that it would have received had its request been handled directly
by the origin server.
In this case the origin server would have delivered a web page (I actually tried it and it works fine for me), and so the proxy has the responsibility to deliver the same thing. In that, it seems, it failed.
now we need to go OSS in diesel cars
It seems to me that ISPs use interception proxies to lower bandwidth costs. Here in Canada (Ontario at least), most of the big ISPs are talking about implementing bandwidth caps (5GB/month with excess charged at C$10/GB). I hope your ISP isn't doing both, as that would seem unnecessary and rather heavy-handed.
My previous ISP, Sympatico, used to have a transparent interception caching proxy. It was quite troublesome and more of a translucent crashing poxy server. I remember being unable to access starwars.com for two weeks once, even though everything else seemed fine. It was particularly annoying for people whose MTU was set too high (they needed 1454 or less) as they would constantly get timeouts on HTTP POST, such as when trying to send email from a web interface like Hotmail or Yahoo. It was also a constant source of problems for people trying to author their own personal web pages as it would cache them and not show their updates.
My current ISP, IStop.com, has an optional proxy. This is great! I normally use it, but if I have problems, I can switch to a direct connection. They run Squid and they also seem to have some sort of advert filter running. I get their logo (cached by my browser) or "This ad zapped" messages in place of at least 80% of web ads, which saves me lots of irritation, and both of us save lots of bandwidth. Incidentally, they also have reasonable bandwidth caps: 10GB non-local + 10GB local (mail, news, proxy, etc) per month, with excess charged at C$3/GB.
After a while, Sympatico reduced HTTP interception to large population centres like Toronto, Montreal and Ottawa. Finally, they stopped doing that too. I guess it was causing too many problems and costing them too much to deal with it. If my ISP were to introduce an interception proxy today, I would leave them immediately. It's just not worth the irritation and problems for the length of time it will take them to fix it or get rid of it. I do live in an area where there is plenty of DSL competition though.
So that would be my advice: switch ISPs immediately. Don't waste any more time or effort on these guys.
Considering one can get caned or even executed for fairly trivial things over there - I bet there hasn't been much protest. There also appear to be cultural factors at work - most people there would rather just "follow the rules", sort of like where the USA is heading. :(
Just because it CAN be done, doesn't mean it should!
Nice rant, but from the content in your posting, I'm not sure what that has to do, necessarily, with a proxy server. Seriously, do any certifications even exist for running a squid proxy? Come on...
Your rant, taken more as a statement about the general lack of competence, is somewhat valid, but I just don't see the connection to this specific issue, other than obligatory karma whoring.
Whatever...
-buffy
I am the network admin for a wireless ISP that does transparent caching. If a user asks us to turn it off, we can disable it for their IP.
More than 99% of our users don't know what routing or caching is, much less that it's happening. For those who actually have issues with the proxy, it's a quick modification to our ipchains rules. So far we've had only 2 such requests. We also disable caching for business-class users by default.
I would hope that you would ask them to disable their transparent caching for you before doing something as rash as dropping them. My bet is that most of their other users do not have this issue, and they may not even be aware that it is causing problems for you.
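The per-customer opt-out described above lives in that ISP's ipchains rules; the Python sketch below only models the decision those rules make. The addresses are hypothetical documentation ranges, and the policy (divert port 80 unless the source has opted out or sits in a business-class block) is an assumption drawn from the post.

```python
import ipaddress

# Hypothetical policy data: customers who asked to opt out, and the
# address block(s) assigned to business-class accounts.
OPTED_OUT = {ipaddress.ip_address("203.0.113.25")}
BUSINESS_NETS = [ipaddress.ip_network("198.51.100.0/24")]


def should_intercept(src_ip: str, dst_port: int) -> bool:
    """Decide whether a packet should be redirected to the transparent cache."""
    if dst_port != 80:
        return False  # only web traffic is diverted
    addr = ipaddress.ip_address(src_ip)
    if addr in OPTED_OUT:
        return False  # per-customer exemption granted on request
    if any(addr in net for net in BUSINESS_NETS):
        return False  # business class is exempt by default
    return True
```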
The original post describes the predicament that she/he is in, but doesn't even say exactly what is broken!
From the submission, it actually appears that the proxy is working exactly as configured. The end user, however, is breaking things himself by using nameservers other than his ISP's. That can't be described as a failure of the ISP by any means.
Proxy servers add a lot of value to any network larger than, say, your 3l33t home rig. The two main purposes I use them for are to reduce overall bandwidth usage and to insert some level of malware protection. I've saved myself and my company a lot of headaches by blocking silly virus code requests.
It's nice that the post managed to include links to RFCs, etc. It's too bad the poster doesn't seem to really understand how networks, specifically the Internet, work.
As others have commented there are plenty of alternative ways to get around this like SSH tunnels, VPNs, third-party proxies, etc...
Just my own little $0.02 worth of a rant. Please drive through.
-buffy
Dr. Zowie's description of the problem sounded like something that can be worked around, at least in some cases, which is why he may need to work with the transparent proxy's vendor and not just the ISP. The two big problems are using the correct DNS lookup and having old data in caches. Cache aging strategy is a standard problem: some systems manage it better than others, and some give you workarounds. (For instance, the proxy I use at work seems to respond to "Reload" requests from the browser and refresh its contents.) DNS lookup problems are really a bug. If the browser sends an HTTP request to 192.9.200.1 for foo.bar.altroot/zap.html, it's certainly easier to implement the proxy by having it forget the original packet's IP address, check whether the page is cached, and re-resolve the URL if not, but that is a bug: the proxy could keep track of the IP address as well as the URL. The bugginess of the dumb approach is most apparent with alternate roots, but it can also be a problem for domains using round-robin DNS, where requests can go to any of the IPs in the group but multiple requests need to go to the same server for consistency, either because the requests are stateful or because the servers don't run with identical content names (e.g., for dynamic pages). (One can argue that the servers are buggy in that case, but that doesn't mean the caching proxy's behaviour isn't also buggy.)
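A toy illustration of keeping track of the IP address as well as the URL: the cache key includes the address the client actually connected to, so two clients whose DNS answers differ (alternate roots, round-robin pools) never share an entry. The class and its methods are hypothetical, not taken from any real proxy.

```python
from typing import Dict, Optional, Tuple

# Key: (destination IP the client connected to, Host header, request path).
CacheKey = Tuple[str, str, str]


class AddressAwareCache:
    """Toy response cache that never serves one origin's page for another.

    A URL-only key would let two clients that resolved the same name to
    different addresses share an entry; adding the address prevents that.
    """

    def __init__(self) -> None:
        self._store: Dict[CacheKey, bytes] = {}

    @staticmethod
    def key(dst_ip: str, host: str, path: str) -> CacheKey:
        # Hostnames are case-insensitive; paths are not.
        return (dst_ip, host.lower(), path)

    def get(self, dst_ip: str, host: str, path: str) -> Optional[bytes]:
        return self._store.get(self.key(dst_ip, host, path))

    def put(self, dst_ip: str, host: str, path: str, body: bytes) -> None:
        self._store[self.key(dst_ip, host, path)] = body
```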
Bill Stewart
There are a few workarounds to the problem of devices that you do not wish to handle your traffic doing so.
I have seen tunneling via IP-in-IP, SSH, and other IPv4 protocols mentioned, but there is another option available: tunnel your traffic as IPv6 over IPv4.
It does take a bit of time to set up, but if you can find an agreeable IPv6 network provider to let you tunnel to their server, your traffic will not be handled by any transparent proxy server at your local ISP, regardless of the type of traffic you are working with.
I am not sure how complete the IPv6 implementation for Windows is yet, or, depending on which version of Windows you are running, whether it is even an option, but for users working with Linux and BSD this should not be a significant issue.
Then again, I could be wrong.
-Rusty
You never know...
Now the question is: when someone connects to an IP address, does the proxy that intercepted the request connect to that same IP address? Or does it do a DNS lookup and try to connect to what it thinks is the IP address? And if it does the latter, is it smart enough to fall back to the client's original destination IP address if it doesn't get one via DNS?
The classic proxy server, which the browser connects to directly, does not know what IP address the client wanted, because the client never tried to connect to one. But with a transparent proxy, there is now the IP address that the client was connecting to. Have you verified what your proxy is doing?
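The choices asked about above can be sketched as one hypothetical Python function. With `strict=True` the proxy simply connects where the client was going; with `strict=False` it trusts its own DNS but falls back to the client's original destination when the lookup fails. How an intercepting proxy learns the original destination (for example, from the redirecting router or NAT table) is platform-specific and is assumed here to be passed in as `original_dst_ip`.

```python
import socket
from typing import Callable


def choose_origin(original_dst_ip: str, host: str,
                  resolver: Callable[[str], str] = socket.gethostbyname,
                  strict: bool = True) -> str:
    """Pick the address an intercepting proxy should connect to."""
    if strict:
        # Connect exactly where the client was already going.
        return original_dst_ip
    try:
        # Trust the proxy's own DNS view of the Host name...
        return resolver(host)
    except OSError:
        # ...but if that lookup fails (e.g. a name that exists only in an
        # alternate root), fall back to the client's original destination.
        # socket.gaierror is a subclass of OSError.
        return original_dst_ip
```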
A minimal outside web service reached via the domain name you want to use can redirect all requests to a slightly different domain name, along with a port number which won't be 80. That won't prevent them from deciding to block every incoming TCP connection not recognized by various protocol proxies (e.g. FTP), so this isn't a perfect solution. As the cat and mouse game continues, you may end up having to keep up a tunnel to an outside server, and run that tunnel through faked HTTP as well.
My ISP (CTC) started doing this on my static dialup without warning. I noticed because 1) eBay pages suddenly required reloading in order to update (i.e., if I quit the browser and then went back to a dynamic eBay page, it was the same as before unless I reloaded the page), and then 2) when connecting to another machine, the address that showed up in the logs was not mine!
Anyway, after poking the machine I discovered it was a Cisco something or other. I also discovered that if you sent a malformed or invalid request, it would STOP transparent proxying for a few minutes!
So the solution I came up with was to telnet port 80 someplace (didn't matter where, because the proxy would pick it up) and type "PLEASE DON'T PROXY ME" and close the connection and then it would leave me alone for a few minutes.
Most of the time I left it on as the proxying seemed to speed up the usual day-to-day surfing. But you might want to try a script to do this automatically. Probably this is just an option the engineers forgot to turn off (I believe by law they must turn off all customer-friendly services :-).
After a few weeks of doing this, and making a few phone calls, the proxy mysteriously went away. Maybe they took my static dialup off the list, or they decided to do it for everybody. Whatever. I've been using Squid so it's pointless for me anyway.
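For anyone wanting to automate that trick, here is a minimal Python sketch. It only does what the poster did by hand: connect to port 80, send one malformed line, and close. Whether any particular cache reacts the same way is an assumption; this behaviour was observed on one ISP's Cisco box and may not exist elsewhere. The host you connect to is arbitrary, since the intercepting proxy picks up the connection anyway.

```python
import socket

# The exact text is arbitrary: the poster reports that any malformed or
# invalid request made that particular cache stop intercepting for a while.
TRIGGER = b"PLEASE DON'T PROXY ME\r\n\r\n"


def send_trigger(host: str, port: int = 80, payload: bytes = TRIGGER,
                 timeout: float = 5.0) -> None:
    """Open a TCP connection, send one malformed request, then close it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

It could be run every few minutes (for example from cron) against any reachable host, e.g. `send_trigger("www.example.com")`.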
A server running on a single IP address can serve multiple virtual hosts, using the HTTP "Host" header to select among those configured. If you typed "http://64.28.67.150", the client will send "Host: 64.28.67.150" as one of the HTTP headers. The origin server (the one that ultimately has the requested document) uses that to select a virtual host if so configured, or ignores it if not configured with virtual hosts. The proxy server, however, should respect the connecting IP address and use the resolved hostname only for cache-matching purposes (so it can share a common cache among the multiple IP addresses listed by DNS, if the destination IP is among them). So in the case of your question, using the IP address like that wouldn't make any difference. Where the difference lies is when you connect to a hostname for which your browser gets a different IP address than the proxy gets.
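To make the Host-header mechanism concrete, the sketch below builds the kind of raw request one would type into telnet, and shows a toy server-side lookup that picks a virtual host from the Host header. The virtual-host table and site names are hypothetical.

```python
def build_request(host_header: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET, as one would type it into telnet.

    The TCP destination is chosen separately (by whoever opens the socket);
    the Host header only tells the server which virtual host is wanted.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",
        "",  # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def select_vhost(request: bytes, vhosts: dict, default: str) -> str:
    """Server side: map the Host header to a configured site (toy version)."""
    for line in request.decode("ascii").split("\r\n")[1:]:
        if not line:
            break  # end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" before looking the name up.
            return vhosts.get(value.strip().split(":")[0].lower(), default)
    return default
```

The TCP destination never appears in the request itself; whoever opens the socket decides where it goes, and that is exactly the piece of information an intercepting proxy has to preserve.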
We have a squid cache and during peak browsing, er, working times we see 40-50% cache hit rate.
I think the byte savings isn't quite as good as that, but I don't have any solid data to back that up.
The best I can say is that we had to shut the cache off for a day or so to do some maintenance, and the help desk got a lot of calls about how "slow" the web was, in spite of the fact that not more than a few days prior we had *doubled* our Internet bandwidth (single 1.5Mbit frame to MPP bonded dual point-to-point).
I think that overall it provides much better bandwidth utilization (i.e., fewer packets on the ISP link, even if the byte savings are only 10-20%) and the client browsing experience is a lot snappier.
Our ISP used to have a whole statewide squid cache hierarchy which you could tune your local squid cache into if you wanted to -- I wish they still did, the aggregated caching would have been very nice.
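One plausible reason a 40-50% request hit rate can coexist with much lower byte savings is that small objects (icons, buttons) tend to be the ones that get hit most. The sketch computes both ratios from a log of (hit, size) pairs; the sample log in the note below is invented purely to illustrate the arithmetic, not measured data.

```python
from typing import Iterable, Tuple


def hit_ratios(log: Iterable[Tuple[bool, int]]) -> Tuple[float, float]:
    """Return (request hit ratio, byte hit ratio) from (was_hit, size) pairs."""
    requests = hits = 0
    total_bytes = hit_bytes = 0
    for was_hit, size in log:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    request_ratio = hits / requests if requests else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio
```

For a made-up log of four 10 KB hits, four 10 KB misses and two 200 KB misses, the request hit ratio is 4/10 = 0.4, but the byte hit ratio is only 40,000 / 480,000 ≈ 0.083.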
Not the removal, the separate availability. <mind mode=screensaver>You should be able to buy Linux and install it without Konqueror, and Konqueror without Linux. Oh, wait a minute... you can!</mind>
Just to rub the point home, you can buy and install Linux with or without graphics, with a different Graphics layer (such as Berlin), with a different window manager (such as FluxBox) and so on. All (modulo a few libraries) with or without Konqueror or one of a host of other browsers (Mozilla/Galeon/SkipStone, Netscape, Opera, Amaya, Mnemonic, OmniWeb etc).
Got time? Spend some of it coding or testing
Wrong. That's not what's happening. Ordinary proxying does use the modified GET request form, in which the full URL is used in place of the request path. Transparent proxying is different: the client sends only the path, not the full URL, and it connects directly to the origin server's IP address, not to the proxy. The only way to identify the correct host is to use the IP address the client attempted to connect to. That's the "transparent" in "transparent proxy".
If a client does attempt to connect to some IP address, and a transparent proxy won't use that IP address because it thinks the origin server is at another address, that's wrong. But if it has no idea what the origin server IP address is at all, even though the client was indeed connecting to it, then that's doubly wrong. A message from the transparent proxy saying it cannot find the IP address is simply stupid because it has the IP address the client connected to, since this is a transparent proxy.
Solution 3:
The proxy server is enhanced to try multiple DNS servers (even in the event of NXDOMAIN), including some in the 'standard' tree and some from OpenNIC. This solution has the advantage of needing modifications only to the proxy, not the clients.
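Solution 3 amounts to walking a list of resolvers until one answers. In this sketch each resolver is a hypothetical callable returning a list of addresses (empty on NXDOMAIN); a real proxy would query actual DNS servers for each root.

```python
from typing import Callable, List, Optional, Sequence

# A resolver is any callable returning a list of addresses, or an empty list
# if it has no answer (NXDOMAIN). Real ones would query different roots.
Resolver = Callable[[str], List[str]]


def resolve_across_roots(name: str,
                         resolvers: Sequence[Resolver]) -> Optional[List[str]]:
    """Ask each resolver in order; return the first non-empty answer."""
    for resolve in resolvers:
        try:
            answer = resolve(name)
        except OSError:
            continue  # timeout or network failure: try the next root
        if answer:
            return answer
    return None
```

The order of the list decides which root wins when two roots disagree about a name.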
"that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
Thanks. Unfortunately, this doesn't work well if the site (like most sites these days) uses name-based virtual hosting. For a given host, you might get several different web pages depending on what you put in the host part of the GET URI.
*sigh*.
Port 80 is not the 'realm' of http. It's just commonly USED for http.
A transparent proxy *does* break standards. You are no longer buying an internet connection, you are buying a filtered, proxied, mutilated internet connection.
That aside, this is not the issue the guy is having.
He's trying to use an alternative DNS system, but the proxy is using its own, so he is hostage to whatever his ISP wants to resolve things to.
As for standards, the STANDARD is to route IP traffic, not to analyze it, mangle it, hand it off to a transparent proxy, and then send it onwards.
I doubt it's 99.99999%. There are apparently quite a lot of people trading around on the .MP3 and the .DVD hidden domain networks. RIAA and MPAA people most likely have no idea how to get there, if they even know it exists. Do you?
You could also set up a proxy on localhost that rewrites the Host header from 'Host: www.weird_ass.domain' to 'Host: www.weird_ass.domain.existing_domain.com', and then have the DNS server for 'existing_domain.com' reply with the IP for 'www.weird_ass.domain' when it gets a request for 'www.weird_ass.domain.existing_domain.com'. Maybe the maintainers of the 'weird_ass.domain' zone already have that.
You'll probably need a lot of custom code for something that can be fixed by changing ISPs, though.
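The header-rewriting half of that scheme might look like the sketch below; the suffix `existing_domain.com` is the placeholder from the post, and the matching DNS setup on the other side is not shown. The origin server then sees the suffixed name in its Host header, so it would also have to be configured to answer to that name.

```python
def rewrite_host(request: bytes, suffix: str) -> bytes:
    """Append a DNS suffix to the Host header of a raw HTTP request."""
    head, sep, body = request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    # Skip the request line; scan the header lines for Host.
    for i, line in enumerate(lines[1:], start=1):
        name, colon, value = line.partition(b":")
        if colon and name.strip().lower() == b"host":
            host = value.strip().decode("ascii")
            hostname, _, port = host.partition(":")  # keep any ":port"
            new_host = f"{hostname}.{suffix}" + (f":{port}" if port else "")
            lines[i] = b"Host: " + new_host.encode("ascii")
    # Reassemble; the body (if any) passes through unchanged.
    return b"\r\n".join(lines) + sep + body
```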
--
Stay tuned for some shock and awe coming right up after this messages!
TCP/IP does not mandate the use of any specific root servers. This kind of thing doesn't even need to involve alternate DNS namespaces to have an impact. It's perfectly valid to connect to an IP address which is NOT listed in the A records for a hostname that is also given in an HTTP "Host" header line. Yes, these are weird things, but they are valid Internet protocol. If a business wants to assert that it is offering true Internet service, then it needs to make it work exactly as if it were. If they can pull off a cache that behaves, to the end user, exactly the same, great. But peakpeak.com didn't accomplish that. An intercepting transparent proxy server should always connect to the true origin server and pass any "Host" headers through unchanged. It must use them, along with the IP address, to check cache validity. Even peakpeak.com's broken proxy, which wanted to connect to whatever IP it got from DNS, should still fall back to the IP address the client was using if it gets nothing from DNS. It didn't even do that. That's very broken.
This is done with the Web Cache Communication Protocol (WCCP) from his router. To find out whether a Cisco router is WCCP-enabled, do a "sh ip int (your interface)". You can look up the specs of the protocol to figure out how to try to bypass it. But you probably won't get there by using another proxy (I tried it), because you will still go through the original proxy configured at the router before going anywhere.
Saying Java is nice because it works on all OS's is like saying that anal sex is nice because it works on all genders.
If the web proxy can't identify the hostname present in the URL, it should simply pass the request through, allowing the client (who already knows the IP) and the web server (who also, clearly, already knows its own IP) to communicate.
Not a horrible idea, but I'm fairly doubtful it will work.
The problem is that the caching architectures I've seen have the exit router communicating with the cache through a GRE tunnel, so the cache has no way to actually let it "pass through", as the cache isn't the box actually doing the routing. This is vastly superior to "put the proxy between the user and the Internet and let it route too" for load-balancing and failover reasons, but it doesn't allow what you're asking for.
I don't really consider this a 'broken proxy', nor do I consider it a misconfiguration; the simple fact is that his ISP (as well as mine) denies the existence of the opentld system. If he really has a problem with it, he should find another provider. Expecting the ISP to bend over backwards to satisfy the .0001% of their client base that this concerns is ridiculous.
Solution 3 isn't ideal since it still rests on the tacit assumption that there are only "proper" DNS roots and everything else is invalid. How do you know which roots the client is using? The answer is to let the client decide and to specify an IP address in the HTTP client request as well as the DNS name via the host field.
This also avoids performing unneeded DNS lookups.
I have, it is entirely practicable, with the right infrastructure support. However the probability that the destination supports IPSEC today is so small as to be negligible. And IPSEC would certainly not help someone who insisted on using an alternative root.
IPSEC was never meant to be used for opportunistic encryption applications (like HTTPS, SSH, etc.), but to establish connections on a more long-term basis that would be used for arbitrary protocols, not such common ones.
No, IPSEC was designed for exactly that case, it just happened to be deployed for VPN. IPSEC was started ten years ago, long before SSL was developed.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
The Singapore government is probably more concerned about stopping people from accessing the numerous overseas sites run by the opposition movement. For those who don't follow Singapore politics, it is one of those countries where the government brings specious lawsuits against opposition politicians and elections are run in the manner of the old Soviet Union.
Of course, since it is a capitalist pseudo-democracy, this rarely gets comment in the Western media. When it does, the government has sued for libel under its mickey mouse libel laws in its kangaroo court system.
All phone calls made in Singapore are tapped, and the government analyses the telephone call logs to see who is talking to whom. It's kinda the state that Ashcroft would like.
Ok my first response: the fuck?
Why does this perceived "problem" have anything to do with management, or the problems w/ certification? This "problem" the poster perceives is entirely not a problem of technical incompetence, but rather the simple fact that to this isp, supporting these tlds is simply not worth the time/trouble/money.
Web caching is highly cost-effective. It allows providers to charge these $35-50/mo rates without going under, or without losing too much money.
As someone with 7 years of programming experience and 3 years of networking experience, who did months of research on web caching before implementing it, I can tell you for a fact that I would not have thought of this as a problem during implementation. Hell, in fact I still don't think it's a problem; quite frankly, taking down a router for 30 minutes just to fix one person's obscure problem just isn't worth it. You're paying what? $20/mo? Great: if you stay for 6 months, then you'll pay for my 1 hour of overtime. And that's not counting how many other people we piss off with 30 minutes of slowness/downtime.
So, in response to your post: this was not a bad decision by any means. Done correctly, it can speed up response times (and my data shows that it does) and save money by using much less bandwidth (between 10 and 40%, even while fully honoring If-Modified-Since headers). It improves service for 99.999% of their customers and pisses off one, because he can't use some esotericthingthattheISPdoesntwanttosupport anyway. Please.
Your points about worthless certification and the like could certainly be valid if you had applied them to a relevant argument; however, this one is not, as you have no position or ability to judge that this was indeed a bad decision, or that it had anything to do with (unknowledgeable) management decisions.
Your problem is not one that HTTP or the proxy spec was designed to cover. When we developed HTTP the issue of ICANN did not exist. I certainly don't think it unreasonable for a proxy code writer to assume that users are using the Internet DNS system. If you want to do things different you should expect problems, that is the way of the world.
The host name header was introduced as a hack to alleviate the problem of IPv4 address exhaustion. There is actually a good reason for the proxy to dereference the DNS name itself since then it can do load balancing amongst http servers if the client does not.
The proxy might also be using a new enhanced http protocol and so it is pretty important that it be able to access the DNS NAPTR records for the service and do the appropriate mapping.
One way to address the problem would be to change the Host header so that the AlterNIC domain is appended to the DNS name: if porn.xxx is an AlterNIC name, one would assume that there is a name something like porn.xxx.alternic.org that resolves in ICANN space. If you want to use non-standard DNS configurations, expect to have to patch applications.
Proxy caches were really important in the early days of the web and still are for certain congested links. In the main however the content providers use techniques that mean that caching is very much less useful than it once was. Most content is active these days so it is only the images that cache well.
Transparent proxying is a violation of IP routing, plain and simple. This has been discussed ad nauseam on the IETF WREC WG mailing list and the IETF main list.
Each (hostname, address) tuple can be a different site. The server needs to cache that way; I wouldn't consider doing it any other way. Maybe the proxy server you would write could be poisoned, but mine would not. The cache can then be optimized by doing a DNS lookup (without delaying the first request for it) to get the IP addresses. If there is more than one, the cache identities for these would be merged and shared.
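A small sketch of that scheme: each (hostname, address) pair starts as its own cache identity, and a later DNS lookup merges only the addresses it actually lists, so addresses reached through an alternate root stay separate. The class name and the choice of canonical address are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple

Site = Tuple[str, str]  # (hostname, IP address the request actually went to)


class SiteIdentity:
    """Map each (hostname, address) pair to a shared cache identity.

    Pairs start out distinct. Once a DNS lookup confirms that a name lists
    several addresses, those pairs are merged so they share one cache.
    """

    def __init__(self) -> None:
        self._canonical: Dict[Site, Site] = {}

    def identity(self, host: str, addr: str) -> Site:
        site = (host.lower(), addr)
        return self._canonical.get(site, site)

    def merge_from_dns(self, host: str, dns_addrs: Iterable[str]) -> None:
        addrs = sorted(dns_addrs)
        if not addrs:
            return
        # Any deterministic choice of representative works; use the first.
        canon = (host.lower(), addrs[0])
        for addr in addrs:
            self._canonical[(host.lower(), addr)] = canon
```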
True, but now you must modify both the clients and the proxy. Also, how many roots are there, anyhow? One hopes this number doesn't get too big. After rereading other posts, I think using the IP address the client uses in the SYN packet, possibly only in the event that the proxy's own DNS lookup fails, is actually the better solution. That effectively gives the result of using the DNS the client uses.
It isn't tunnelling or port forwarding or anything technical. This just wastes more bandwidth or cpu cycles.
Call your ISP, tell them that you want Internet access, and explain what their proxy is preventing you from doing. Ask them how much it would cost for the service. If they are unwilling to supply the requested service, or charge too much for it, cancel your service and use another provider. Be sure to tell them why you changed.
A lot of small local ISPs were started because users wanted more features than AOL or the other providers offered. This is America: if you build a better product, customers will beat a path to your door to pay for it... until it is outlawed or you are put out of business by illegal (or recently legalized) tactics by the competition.
What prompted me to write to Slashdot was the thought that perhaps others are having similar difficulties. If I were dealing with a simple proxy router (such as you apparently helped design the standard for) then my solution would be simple: direct ICANN URLs through the proxy and other ones elsewhere. But I really have no choice but to use the proxy, because my ISP is intercepting all of my port 80 packets.
In a more general sense, OpenNIC isn't the only case where the ability to access http directly is important. Other cases include testing virtual hosting or accessing hollow-tree virtual hosts within a machine with a known IP number.
You talk about trying to coerce the rest of the world into supporting an alternate DNS. That's not really necessary: strict IP routing already supports alternate DNS roots -- the layered structure of (most of) the protocols keeps your choice of DNS root from interfering with where your packets go. That separability is a Good Thing, and part of a whole suite of related Good Things that make the IP concept work.
Yup, that's pretty cool. I like that (for example) I can get https:// requests through to any host I want, simply because the encryption enforces the layering of the IP stack. Unfortunately, almost nobody actually publishes the same pages through http: and https:. The encryption layer just uses too many resources in high-bandwidth applications. I wonder if that'll ever change?
Yes, I've been told that Singaporeans are very "rules-oriented", but my understanding of the situation was that not only porn was subject to censorship, but also any political sites that the PAP sees as threatening its hold on power. Not to mention the other rules, such as required registration of locally-hosted political Web sites, which I'd imagine is a way to keep them offline. I can't imagine that everyone is happy with the situation, especially those who would like to stand in elections and change the government. Standing up and protesting for the right to see porn is one thing (although I'd say that what one does in one's home is their business, as long as no one else gets hurt), but stopping political speech is another thing entirely. And FWIW, at an ISP I worked at, we did get protests about filtering of porn sites. We didn't actually filter anything, but some users had filtering software on their computers without their knowledge (don't ask; they weren't the sharpest tools in the shed), and they were very clear that they didn't want us interfering with what they looked at. I also worked at an ISP that offered both unfiltered and filtered access, which I thought was the best solution. Whoever set up the account could tell us what kind of access they wanted, and only we could change it, so a user's kids couldn't figure out how to disable the filters. Well, I guess they could've used an outside proxy, but that wasn't our problem.
But I'm not so much making a statement here as asking the question: What would happen if one person or a group of people stated publicly that they thought that these filters are unfair and that they should be taken down, that they thought the rationale for their implementation was wrong, and that they do not serve the interests of ordinary citizens?
That light you see at the end of the tunnel might be from an oncoming train.
Yeah, I know, I used to use (in the PAST TENSE) netcom/mindspring/earthlink, but now they have some ghey transparent web proxy cache that listens to EVERY port 80 request REGARDLESS of any actual webserver at any address. Doesn't this violate the DMCA? 1) They are "caching" (storing) other web sites' content, thereby possibly violating copyright law. 2) You might get an old copy of the website. 3) It's slower if the page is not in their cache. 4) They can easily monitor what you are browsing and sell the info to advertisers. 5) They can change what you look at (man-in-the-middle). Hmm... maybe changing ISPs would be a better solution.
The biggest trick the devil pulled was letting lawyers become politicians so they can write the laws.
So modify the host header appropriately as suggested.
Unless of course you don't really want it to work and all you really want to do is complain that your mickey mouse ISP does not support your mickey mouse DNS root.
Incidentally, it looks like someone really didn't like hearing criticism of OpenNIC: they moderated my previous comment 'overrated', which means 'I want to punish the person for posting this, but I know that the metamoderators are likely to punish me for using any of the other down mods'.
I see what you mean. You are sending traffic to a particular address based on your own DNS resolution, and if the traffic is proxied, you want it to be sent to your chosen destination, not that of the proxy.
In my opinion, the ISP is exhibiting correct behaviour.
Picture this: the object of the exercise with the transparent proxy is to cache pages and increase speed for the customer, right? I think it's already been agreed earlier in the thread that this is not entirely evil.
Let's say the proxy honours the destination IP address that you chose (I'm not sure how this would work in practice, but I'll go with it for now). It returns the web page from the server that your DNS picked, and caches it for the next guy.
Another customer requests a page with the same name. What if they're using a DNS root where the answer conflicts with yours? The customer gets the "wrong" web page. Because cached objects eventually expire, this means that the customer might get a completely different site depending only on the time and date they happened to request it.
The ISP doesn't use the same DNS root you do, so they can't begin to troubleshoot the problem.
I concede that the popular "alternate" DNS roots have few enough conflicts with the IANA-assigned roots at the minute, but even that is an irrelevancy - any solution that allows a customer to choose destination IP address on behalf of other customers opens up the ISP to a denial of service attack by a user less trustworthy than you or I. One could set up an arbitrary "root" server that resolves www.yahoo.com to my own site. Or google. Or some site that accepts credit card orders.
I can't see any scalable way out of this without the ISP picking one root, and sticking with it. If that is so, then I think this is a fundamental problem with split roots and, if you really want to use them, be fully aware of what you're getting yourself into. Turning off the transparent proxy will help this time, but you won't be able to rely on being able to talk to any server on the internet that doesn't use the same root as yours, even the servers you don't (usually) need to know exist.
Regards,
Dave
There certainly does seem to be confusion. I must say, the RFC authors made a poor choice of terms, although clearly it was a historical accident and this is probably the best that can be done to work around it. I've always used the term "transparent" to mean transparent in the sense that you can't tell it's there. I picked up on "interception" during this thread, but I had missed RFC3040 because my term greps must have missed it (concept indexing would be nice, but almost no one ever does that ... it's quite hard to do ... but maybe they have a different term for that, too). Ask 1000 people the meanings of some terms, and you'll get 1001 different answers.
now we need to go OSS in diesel cars
Most caching servers index cached content by an MD5 hash of the requested URL (I know; I USED to work for a CDN that used several different tricks, and shortly before they shut it down we checked out loads of caching engines looking for an alternative to Squid). If you use the resolved IP address to place the request and use the HTTP header info only for the cache index, you won't get a poisoned cache as you described, because the IP address you got the content from doesn't matter; only the request URL that got you there does.
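A minimal Python sketch of that indexing idea, assuming a plain in-memory dict as the store: the key is the MD5 hex digest of the request URL, so the IP address the content was fetched from never enters into it. The function names are hypothetical, and this is not a description of any particular engine's on-disk format.

```python
import hashlib
from typing import Optional

# Illustrative in-memory store: MD5 hex digest of the URL -> response body.
store: dict[str, bytes] = {}


def cache_key(url: str) -> str:
    """Index an object by the MD5 hex digest of its request URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def store_response(url: str, fetched_from_ip: str, body: bytes) -> None:
    # fetched_from_ip is deliberately unused: only the URL determines the key.
    store[cache_key(url)] = body


def lookup(url: str) -> Optional[bytes]:
    """Return the cached body for this URL, or None on a miss."""
    return store.get(cache_key(url))


store_response("http://www.example.com/", "192.0.2.10", b"page body")
print(lookup("http://www.example.com/"))  # b'page body'
```

A later client asking for the same URL gets the stored object back no matter which IP address its own DNS handed it.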
Now, as to why the "transparent" caches don't work like they should... Anyone who knows how they're set up could tell you that there's no easy way to get the "correct" behavior with the typical setup. The typical setup uses router tricks to NAT the request so that a separate caching server looks like the webserver for your request, and the caching server then places the request on your behalf.
Unfortunately, with dynamic content out there, a LOT of pages can't be cached directly (and there's nothing to make them cacheable unless you do what epicRealm attempted with a CDN, or what they and others are trying to sell right now as an "app accelerator"; there's no good protocol yet to tip the cache off that content is stale, so content providers flag it as uncacheable...). So a "transparent" proxy is of limited usefulness: unless you've got more than a couple of people requesting the same cacheable content, it inserts big, fat latency and breaks a lot of things (like the subject of this /. discussion) in the process.
Unless it's truly transparent, being part of the router itself, it's probably more of a nuisance than a help, no matter what the ISP says to the contrary. I'd be finding a new ISP, because they're being a little more than clueless.
I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
Yeah, I thought about that, but how does the IP forwarding box communicate the IP address to the proxy? If they're the same box, then it's easy, but otherwise it would require some funky modifications.
...most of the "transparent" proxies for HTTP tend to be router NAT hacks in front of a separate caching server that is set up as a typical caching proxy. Since it's really a typical proxy with router tricks, it operates in the usual proxy mode, in which the proxy, not the client, is expected to do all the DNS, etc. for the request.
It would be really contorted to achieve the "right" way, so nobody's bothered to come up with a caching engine that works the way it would need to in order to be truly transparent (sitting on the router, etc.).
Unless they're a large ISP, the only places where the cache is going to buy them any benefit are sites that everyone hits and that have static content. Typically, most caches don't work well with HTTP 1.1 cache hinting, and HTTP 1.1 cache hinting is difficult to set up, so sites usually send dynamic content with pragma: no-cache in the headers and set the expires value so that it expires from the cache immediately as stale. A cache is a web decelerator and buys NOTHING in the way of bandwidth savings, contrary to what most people think.
Amazon, Yahoo, et al. all set pragma: no-cache in the headers of their responses.
And you didn't pay attention, no less: his problem is that he's using a different root DNS server than the so-called transparent proxy. Because of this, his browser will resolve and place requests correctly, but because the router is set up to NAT those requests and flip them to the caching proxy server, the proxy then resolves the name again for itself. If they're not using the same DNS root, the whole thing breaks down.
The host server can flag content as non-cacheable, and the cache, if it's properly HTTP 1.0 or 1.1 compliant, will simply relay the page to the requestor without caching it.
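A simplified Python sketch of that store-or-relay decision. The rule set is a deliberately reduced assumption, not the full HTTP/1.1 caching algorithm: Pragma: no-cache, the Cache-Control directives no-cache/no-store/private, or an Expires date that is already past (or unparseable) all mean "relay, don't store". The function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def is_storable(headers: Mapping[str, str], now: Optional[datetime] = None) -> bool:
    """Simplified check: False if the headers opt out of caching or are already stale."""
    now = now or datetime.now(timezone.utc)
    h = {name.lower(): value for name, value in headers.items()}

    # HTTP/1.0-style opt-out.
    if "no-cache" in h.get("pragma", "").lower():
        return False

    # HTTP/1.1 directives; "no-cache" really means "revalidate first", but it is
    # treated as do-not-store here to keep the sketch short.
    directives = {d.strip().lower() for d in h.get("cache-control", "").split(",")}
    if directives & {"no-cache", "no-store", "private"}:
        return False

    # An Expires date at or before "now", or one that won't parse, means stale on arrival.
    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
        except (TypeError, ValueError):
            return False
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        if expires <= now:
            return False
    return True
```

When this returns False, the cache in this sketch would just pass the response through to the requestor.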
Most proxies blindly resolve requests on behalf of the requestor because they're not really designed to be "transparent" proxies; a router hack makes them purportedly transparent. They are designed to the HTTP 1.0 or 1.1 proxy server specification in the respective RFC documents. Because of this, they assume a specified relationship (i.e. the client browser sends all HTTP requests to the cache, which then places each request as if it were the client browser; the client browser doesn't do DNS, etc. in this mode of operation) that is not present, and is not assumed to be there, in the un-proxied mode of operation.
This IS non-compliant with the RFC- it just "works" when you're using the same DNS server.
IPSEC relies on unencrypted headers to work. This "transparent" proxy is a router hack that re-routes all port 80 traffic, except the proxy server's own, to the proxy server. IPSEC traffic would get flipped to the proxy and break down, since the proxy isn't part of the IPSEC session.
If your client browsers are hitting static content sites, then YES, it's VERY effective. If they're hitting dynamic content sites, it's nowhere near as effective, because the playground there is evil and broken. There aren't a lot of fully HTTP 1.1 compliant caches out there (full compliance is a requirement for a server to hint at expiry, which dynamic content needs...), and it's purely evil to set up the hinting so caches work as intended, so nearly every dynamic content site out there (and that's the majority of the sites the average populace hits) sets pragma: no-cache in the headers as well as an immediate expiry time on the content. With average caching engines, dynamic content sites actually degrade the browsing experience for users, since the caching engine never caches the dynamic content.
It's a Squid or similar server that is distinctly separate from the router itself (a router COULD proxy transparently by being an interception proxy, but that's a lot more complicated, and I don't think there are many of them about because they tend to be more expensive for some reason...).
If the content is largely dynamic in nature, it won't get cached, as the content providers tend to set pragma: no-cache in the headers and set the expires time in the past to force expiry and ensure fresh content. In the case of a LOT of stuff from Yahoo, Amazon, etc., you're going to find that a couple of pictures may cache, but for that reason the rest of the site won't.
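From the content provider's side, those two headers can be produced with Python's standard library. A minimal sketch: the function name is hypothetical, and using the Unix epoch as the "past" date is an arbitrary choice (any date before the response is sent has the same effect).

```python
from email.utils import formatdate


def uncacheable_headers() -> dict[str, str]:
    """Response headers a dynamic page might send so caches treat it as stale at once."""
    return {
        "Pragma": "no-cache",                   # the HTTP/1.0-era opt-out
        "Expires": formatdate(0, usegmt=True),  # the Unix epoch: already in the past
    }


for name, value in uncacheable_headers().items():
    print(f"{name}: {value}")
# Pragma: no-cache
# Expires: Thu, 01 Jan 1970 00:00:00 GMT
```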
Broadly speaking, dynamic content (i.e. app-server-driven websites like /., Amazon, etc.) doesn't get cached by caching servers, because there's no clean, easy way to hint at expiry when content goes stale at unpredictable times. Because of this, these content providers tend to set pragma: no-cache in the header and set the expiry time to something in the past, forcing expiry from the cache as soon as the content is served to the requesting client browser.
If all your browsers are hitting static sites and content, it works very well. It goes from not so hot to miserable as they hit more dynamic content sites, because of what I pointed out above.
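A toy simulation of that point in Python, with entirely hypothetical request streams: the cache stores only objects marked cacheable (dynamic pages are assumed to arrive flagged as uncacheable), and its hit ratio drops as the share of dynamic requests grows. The URLs and the two 100-request workloads are made up for illustration, not measurements.

```python
def hit_ratio(requests: list[tuple[str, bool]]) -> float:
    """Fraction of requests served from cache; each item is (url, cacheable)."""
    stored: set[str] = set()
    hits = 0
    for url, cacheable in requests:
        if url in stored:
            hits += 1
        elif cacheable:
            stored.add(url)  # the first request for a static object fills the cache
    return hits / len(requests)


# Hypothetical workloads of 100 requests each.
# Mostly static: 10 images, each requested 10 times.
static_heavy = [(f"/img/{i % 10}.gif", True) for i in range(100)]
# Mostly dynamic: 20 image requests plus 80 distinct uncacheable pages.
dynamic_heavy = [(f"/img/{i % 10}.gif", True) for i in range(20)] + [
    (f"/page?id={i}", False) for i in range(80)
]

print(hit_ratio(static_heavy))   # 0.9
print(hit_ratio(dynamic_heavy))  # 0.1
```

Every miss in this model is also a request that waited on the proxy before going upstream, which is where the added latency comes from.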
Probably the biggest reason is that many people (it seems most of them have shown up on /.) have difficulty understanding the caching logic needed to make it work right, or are dealing with some perverted intercept mechanism that loses the origin server's IP address (the address the client is connecting to) during translation. If the cache is not "in" the router, doing NAT to hand off the request to the cache on a separate box is wrong. But apparently some do this anyway.
Who told you that? ICANN?
So, if it just happens to allow mean, malicious, spying or filtering, "transparently" so that you can't get around it, THAT'S OK? Nope.
I don't care how easy it is to get around; it sucks. You could save even more bandwidth by blocking Debian mirrors, Red Hat's FTP sites and all manner of stuff that only affects a few of your users. No problem, eh? Think of how much faster all the commercial crap will load for all your "consumers". No thanks.
Friends don't help friends install M$ junk.
The client has already resolved the IP address, using whatever DNS roots it wanted. Because this is a proxy for folks who don't have proxies configured (hence 'transparent' to the user), the client actually tries to connect to the IP address it got from DNS. It's just that the proxy sees the connection attempt on port 80 and intercepts it, but the IP destination address was in the initial SYN packet.
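That destination stays recoverable on the intercepting box when the redirect and the proxy share a kernel. As one concrete example (not necessarily what any ISP in this thread runs): on Linux, a proxy receiving connections through a netfilter REDIRECT rule on the same machine can ask for the original address with the SO_ORIGINAL_DST socket option. A minimal Python sketch, assuming Linux, IPv4, and the value 80 for SO_ORIGINAL_DST from linux/netfilter_ipv4.h, hard-coded below; the function names are hypothetical, and only the address decoding can be exercised without such a setup.

```python
import socket
import struct

# Assumption: Linux netfilter's SO_ORIGINAL_DST, value 80 in linux/netfilter_ipv4.h.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> tuple[str, int]:
    """Decode a 16-byte struct sockaddr_in into (dotted-quad address, port)."""
    # Layout: 2-byte family (skipped), then port and address in network byte
    # order, then 8 bytes of zero padding.
    port, addr = struct.unpack("!2xH4s8x", raw)
    return socket.inet_ntoa(addr), port


def original_destination(conn: socket.socket) -> tuple[str, int]:
    """Ask the kernel where a REDIRECTed IPv4 connection was originally headed."""
    raw = conn.getsockopt(socket.IPPROTO_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)
```

When the NAT happens on a separate router, that per-connection state lives in the router's kernel, not the cache's, which is why the separate-box setup loses the address.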
"that's not encryption - it's a new perl script that I'm working on..." - from some Matrix parody
Nonono. I've encountered this myself: I was trying to use some other site's cookie on my site (don't ask ;) ), and it works without the broken cache in between.
The broken cache does this:
Intercepts packets with destination IP address 1.2.3.4 and port 80.
Looks inside the packet.
Sees:
    GET
    Host: www.google.com
    (other headers snipped)
It then IGNORES the 1.2.3.4 address and looks up www.google.com for itself (if the DNS answer isn't cached yet).
Say it is successful and finds 216.239.51.100.
So it connects to 216.239.51.100 port 80 and says:
    GET
    Host: www.google.com
---
Now if the DNS lookup fails, you're screwed: it either gives you an error page or disconnects you without any message.
I believe the correct thing for the cache to do is to use the 1.2.3.4 address both in the cache index and in the outbound connection.
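The two behaviors can be contrasted in a short Python sketch. Everything here is hypothetical: an intercepted connection is represented by its original destination address plus the raw request bytes, and DNS is a plain dict standing in for the cache's own resolver, so nothing touches the network. The example request line is made up.

```python
from typing import Mapping


def host_header(request: bytes) -> str:
    """Extract the Host header value (name matched case-insensitively) from a raw request."""
    for line in request.split(b"\r\n")[1:]:
        if not line:
            break  # blank line: end of headers
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            return value.strip().decode("ascii")
    raise ValueError("no Host header")


def broken_upstream(original_dst: str, request: bytes, resolver: Mapping[str, str]) -> str:
    """What the broken cache does: ignore original_dst and re-resolve the Host header."""
    host = host_header(request)
    if host not in resolver:
        # The name doesn't exist in the cache's DNS root: error page or dropped connection.
        raise LookupError(f"cache cannot resolve {host}")
    return resolver[host]


def correct_upstream(original_dst: str, request: bytes, resolver: Mapping[str, str]) -> str:
    """Honor the address the client put in its SYN; the cache's resolver is never consulted."""
    return original_dst


request = b"GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n"
cache_dns = {"www.google.com": "216.239.51.100"}  # the cache's own view of DNS

print(broken_upstream("1.2.3.4", request, cache_dns))   # 216.239.51.100
print(correct_upstream("1.2.3.4", request, cache_dns))  # 1.2.3.4
```

In the correct version, a name the cache's DNS root has never heard of can no longer break the request, because the cache never looks it up.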
Now, the issue with using the IP address in caching is that many sites have multiple IP addresses for the same hostname, and the cache would have to treat them as different sites. This means the cache needs more resources and performance is lower. So I figure the cache manufacturers decided that trading correct behaviour for performance was acceptable.
After all, they can argue to their customers that the correct behaviour for a transparent proxy cache meant to lower bandwidth usage is to lower bandwidth usage, even if it breaks rare situations like this.
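A rough Python illustration of that cost, assuming a hypothetical site published under three round-robin addresses (made-up documentation-range IPs) and six requests for the same object, with the cache keyed either by URL alone or by (destination IP, URL):

```python
# Hypothetical round-robin addresses for one hostname.
ADDRESSES = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
URL = "http://www.example.com/logo.gif"

# Six clients, each handed one of the addresses in rotation by DNS.
requests = [(ADDRESSES[i % len(ADDRESSES)], URL) for i in range(6)]

url_keyed = {url for _ip, url in requests}
ip_and_url_keyed = {(ip, url) for ip, url in requests}

print(len(url_keyed))         # 1 stored copy: 1 miss, 5 hits
print(len(ip_and_url_keyed))  # 3 stored copies: 3 misses, 3 hits
```

Each extra address for the same site adds another stored copy and another upstream fetch of the same object.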
Cheerio,
Link.