How to Work Around Broken Port-80 Routing?
Dr. Zowie continues: "I use a regional ISP with otherwise-very-good policies. However, they seem to be intercepting
anything that comes from my home net on port 80, so that they can
``transparently'' cache web requests based on the payload of those
packets. The proxy seems to work rather well in most cases: I
never noticed it until I started using OpenNIC. Then I found that some web pages that should have
resolved OK through the OpenNIC system failed even though routing on
different ports worked OK.
"I did some experimentation using ``telnet'' on port 80
directly, and found that packets are being routed based only on
the payload regardless of the original destination address: I can (for
example) retrieve the Slashdot front page by using ``telnet
www.google.com 80'' and asking for "http://www.slashdot.org
http/1.1". The tech support folks seem to be stonewalling me: the
main contact tells me that the behavior is "not broken" even though it
clearly violates RFC
1812, the standard set of rules for IP routing.
"The practice of ``transparent'' proxy routing seems to be growing
more widespread. It appears to break the internet standard in a way
that works for most folks for now, but that breaks port 80 usage in general. Looking ahead, this breakage seems
like a growing nightmare waiting to happen. At the very least, I
expect more instances of my particular problem to appear as folks give up on the corporate hegemony of ICANN. More insidiously, transparent
proxy routers break the layered nature of the internet protocol and
restrict the flexibility that made it work in the first place. One would
hope that such proxies would at least act like routers when the fancier
proxying fails, but at least my ISP's doesn't. What about your ISP's?"
You can use netcat to route your own port 80 traffic. Simply get a good UNIX shell account, and configure your router to direct to that. It becomes a real version of what you would be trying to do. However, I would bitch like crazy if my ISP did anything like that to me. If I want to connect to port 80 on something, I would want to be connecting to such port 80. Any fiddling with it would sure make me drop that ISP in an instant.
samrolken
I should have posted all this in one comment... oh well...
n et /Proxies/Free/?tc=1
You could also use a third party proxy server. You can find gobs of them here:
http://tools.rosinstrument.com/proxy/
and here:
http://directory.google.com/Top/Computers/Inter
samrolken
Onenet is the internet "service" provider to most state agencies within Oklahoma, including Oklahoma State University, where I am currently working on a BSEE. Neglecting Onenet's other issues (AOL's netadmins could do a better job than Onenet's), they have a "transparent" web cache proxy. More often than not, errors fetching a web page come not from the browser or the site itself as they should, but from the proxy. DNS errors from the proxy are not uncommon. As for switching ISPs, I can't, which really sucks. But for what I can reach on the net, I'm still getting ultra-cheap broadband :P.
I pledge allegiance to the flag...
of the Corporate States of America...
Proxy servers, They might not be cacheing 8080 or other Proxy ports. Check http://tools.rosinstrument.com/proxy/
Bouncers - You set this program on an external server on a port thats not filtered. You just point your browser at this IP/port and your outside your filtered isp. Check www.freshmeat.net
SSH, tunnel or route from an external box.
Really, If you cant go through it, go around it, either with software or networking.
-
Well, if crime fighters fight crime and fire fighters fight fire, what do freedom fighters fight? They never mention that part to us, do they? - George Carlin
The poster mentioned that he used OpenNIC which is an alternative DNS root. It is proper HTTP, but a transparent proxy that does not "see" domains in this namespace effectively block you from viewing webpages under this domain.
His own box is properly configured to do OpenNIC lookups, but the HTTP request to the (proper) webserver gets intercepted. Now the proxy has to do the real HTTP request, but the proxy does not know about the alternative domains and probably returns a "Host not found" error.
I haven't heard of free proxy servers supporting one of the alternative NICs and I doubt the ISP will be interesting in subscribing to such a service. I guess the only solution will be to convince a friend to set up a proxy on a box someplace else.
Some alternative roots have their own "real" Internet domain which acts as a gateway domain, for instance name.space has http://name.space.xs2.net/ (regular hostname) which enables non-subscribers to view http://name.space/ (namespace only), making the domains available globally. If OpenNIC provides such a service, an alternative solution could be to run some proxy at home and let it rewrite OpenNIC urls into "regular" URLs.
( ^_^)/
(1) Line up a serious alternative ISP. Talk to their sales department; see if they do the same thing.
(2) Talk to your ISP's sales department. Tell them your problem. Tell them you're ready to move. (Perhaps ask what the hit rate of the cache is, that is, if the overhead is worth it for them.) See if they offer any accomodation.
(3) Go with the ISP that does what you want.
If you're using them for DSL, you may not have a lot of choice.
(As others suggested, if host resolution is your issue, you could run a local proxy on your 127.0.0.1 interface that converts host names into addresses.)
Stupid job ads, weird spam, occasional insight at
Second, there's a lot of ways around it which involve tunnelling.
Tunnel to another box running a non-broken web cache. I used to tunnel my http traffic through ssh to my colocated boxes, which ran adzapper, and proxied through that.
Tunnel at the IP layer by running any IP-in-IP encapsulation. If you have some version of windows, for example, you might convince someone with a server to run a PPTP server for you somewhere and you could tunnel through that. There are even Free PPTP Servers for Linux available to help.
Find someone who runs a little proxier for their own net with socks, and bounce off their socks proxy. Someone you know no another ISP probably has Wingate or the like running, and if they allowed it (and on some older version, it will permit this by default), you could set your browsers SOCKS settings to bounce off their proxy server, and since SOCKS isn't on port 80, your ISP will probably ignore it.
There are also a number of things you might discuss with your ISP to resolve the issue.
Suggest that they switch to a less broken cache server. (Squid, anyone?)
Suggest that they exempt you specifically from the cache server by telling it to ignore your ip address.
Note that they have an obligation to make sure their caching software doesn't interfere with your browsing; so it will be necessary (and not cost-effective for them) for you to call for every problem you notice.
Obviously, you'll need to probably speak to a whole number of supervisors, and probably eventually get transferred to a "real engineer", and they will probably hack in a fix (like exempting you only) rather than truly deal with the problem.
If all else fails, then you may want to try issuing ultimatums, like, "If you can't fix this problem, then you can cancel my service." Tech support people are lazy, however, in some cases, and may just opt to cancel you. This is a harsh reality in the world of consumer bandwidth -- and it will be worse, soon, with bells closing their DSL lines to competition, meaning unless someone else builds a telephony infrastructure to you, you'll probably pick Cable vs 1 DSL provider, and if you don't like something at either of them, you're just out of luck.
The web cache is exhibiting correct behavior. When a forward proxy cache (transparent or not) gets a request in the form of GET http://www.site.com/ http/1.1, it will use the www.site.com address instead regardless of what original dns name you went to (www.google.com in your example). In the transparent case where the GET statement looks more like GET /content.html http/1.1, it will use the original destination address.
In other words, it's your client that's broken. See RFC 2616 for details.
The unfortunate truth is that more often than not, sites simply don't set their cache controls correctly. They forget that caches don't exist just on the server side but that they exist on the client side as well. Section 13 of RFC 2616 explains how they work in great detail and it really should be mandatory reading for any site administrator.
If you're still looking for more information on web caching, check out Content Delivery Networks by Scot Hull. It was just released and is available on Amazon. There is an enlightening section on web caching that should clearly explain why what you're seeing is in fact correct behavior.
Normally what you do is to do layer 4 switching but note that you can do do switching on layer 7 as well, which means you can have the switch do url based switching so that a part of the url determines that it should get switched. This requires much more power and is mostly done for server switching like load balancing.
What happens in your case might be that they have placed a switch that can do at least layer 4 switching, between you and the internet.
What then is done is that all port 80 requests coming from the clients side(you) are re-directed to the proxy which means that http requests on other ports will not be cached. Note that anonymous ftp can also be proxied.
A "clever" proxy/switch solution can do ip-spoofing so the webserver gets your IP adr. and sends it back to you directly, but as there is a switch inbetween, it redirects the result to the proxy which then sends the result back to you.
A way to avoid it is to get a gateway somewhere that can channel your http traffic, you could set your browser to use this gateway as a proxy on any port. The switch will most likely not act on the traffic coming on this port an pass it though.
The easy way would be installing a proxy server on a box that you have access to on the outside and configure it so that it won't cache anything.
I see what you mean. You are sending traffic to a particular address based on your own DNS resolution, and if the traffic is proxied, you want it to be sent to your chosen destination, not that of the proxy.
In my opinion, the ISP is exhibiting correct behaviour.
Picture this: the object of the exercise with the transparent proxy is to cache pages and increase speed for the customer, right? I think it's already been agreed earlier in the thread that this is not entirely evil.
Let's say the proxy honours the destination IP address that you chose (I'm not sure how this would work in practice, but I'll go with it for now). It returns the web page from the server that your DNS picked, and caches it for the next guy.
Another customer requests a page with the same name. What if they're using a DNS root where the answer conflicts with yours? The customer gets the "wrong" web page. Because cached objects eventually expire, this means that the customer might get a completely different site dependent only on the time and date they happened request it.
The ISP doesn't use the same DNS root you do, so they can't begin to troubleshoot the problem.
I concede that the popular "alternate" DNS roots have few enough conflicts with the IANA-assigned roots at the minute, but even that is an irrelevancy - any solution that allows a customer to choose destination IP address on behalf of other customers opens up the ISP to a denial of service attack by a user less trustworthy than you or I. One could set up an arbitrary "root" server that resolves www.yahoo.com to my own site. Or google. Or some site that accepts credit card orders.
I can't see any scalable way out of this without the ISP picking one root, and sticking with it. If that is so, then I think this is a fundamental problem with split roots and, if you really want to use them, be fully aware of what you're getting yourself into. Turning off the transparent proxy will help this time, but you won't be able to rely on being able to talk to any server on the internet that doesn't use the same root as yours, even the servers you don't (usually) need to know exist.
Regards,
Dave