Slashdot Mirror


How to Work Around Broken Port-80 Routing?

Dr. Zowie writes "My ISP places an opaque (intended to be transparent) web proxy between me and the rest of the world. It is causing me problems due to misconfiguration or misdesign. My question is twofold. On the micro level, what can I do in the short term to work around the broken routing (in the long term, I switch ISPs if it's not fixed)? On the macro level, what can we as a community do to prevent breakage of the net on a global scale by poorly designed routing hacks?"

Dr. Zowie continues: "I use a regional ISP with otherwise-very-good policies. However, they seem to be intercepting anything that comes from my home net on port 80, so that they can ``transparently'' cache web requests based on the payload of those packets. The proxy seems to work rather well in most cases: I never noticed it until I started using OpenNIC. Then I found that some web pages that should have resolved OK through the OpenNIC system failed even though routing on different ports worked OK.

"I did some experimentation using ``telnet'' on port 80 directly, and found that packets are being routed based only on the payload regardless of the original destination address: I can (for example) retrieve the Slashdot front page by using ``telnet www.google.com 80'' and asking for "http://www.slashdot.org http/1.1". The tech support folks seem to be stonewalling me: the main contact tells me that the behavior is "not broken" even though it clearly violates RFC 1812, the standard set of rules for IP routing.

"The practice of ``transparent'' proxy routing seems to be growing more widespread. It appears to break the internet standard in a way that works for most folks for now, but that breaks port 80 usage in general. Looking ahead, this breakage seems like a growing nightmare waiting to happen. At the very least, I expect more instances of my particular problem to appear as folks give up on the corporate hegemony of ICANN. More insidiously, transparent proxy routers break the layered nature of the internet protocol and restrict the flexibility that made it work in the first place. One would hope that such proxies would at least act like routers when the fancier proxying fails, but at least my ISP's doesn't. What about your ISP's?"

23 of 323 comments

  1. Use netcat... by samrolken · · Score: 4, Informative

    You can use netcat to route your own port 80 traffic. Simply get a good UNIX shell account, and configure your router to direct to that. It becomes a real version of what you would be trying to do. However, I would bitch like crazy if my ISP did anything like that to me. If I want to connect to port 80 on something, I would want to be connecting to such port 80. Any fiddling with it would sure make me drop that ISP in an instant.

    --
    samrolken
  2. Re:Use netcat... or your own proxy server... by samrolken · · Score: 2, Informative

    Or, you could use your own proxy server, like Squid for UNIX or AnalogX Proxy for Win32. You might try something like the port + 65536 rule. Port 80 becomes port 65616 or something (That may not be precise), and that would confuse your router, but still be port 80. I used a similar trick to get around similar proxying at school.

    --
    samrolken
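
A quick check of the arithmetic behind the "port + 65536" trick, as a sketch only: TCP ports are 16-bit, so software that truncates an oversized port number to 16 bits lands back on the original port. Whether a given browser, router, or proxy accepts such a port at all is an assumption and varies by implementation; the helper names are made up for illustration.

```python
# TCP ports are 16-bit values (0..65535). Software that keeps only the low
# 16 bits of an oversized port number wraps it back to the original port.
PORT_SPACE = 1 << 16  # 65536


def disguised_port(port: int) -> int:
    """Return port + 65536, the out-of-range alias of a real port."""
    return port + PORT_SPACE


def truncated_port(value: int) -> int:
    """Return what a 16-bit truncation of `value` yields."""
    return value & 0xFFFF


alias = disguised_port(80)
print(alias, truncated_port(alias))  # prints "65616 80"
```

So the figure in the comment is precise: 80 + 65536 = 65616, and 65616 truncated to 16 bits is 80 again.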
  3. Re:Use netcat... or someone else's proxy server by samrolken · · Score: 4, Informative

    I should have posted all this in one comment... oh well...

    You could also use a third party proxy server. You can find gobs of them here:

    http://tools.rosinstrument.com/proxy/

    and here:

    http://directory.google.com/Top/Computers/Internet/Proxies/Free/?tc=1

    --
    samrolken
  4. Sounds like what my college does by AntiNorm · · Score: 3, Informative

    Onenet is the internet "service" provider to most state agencies within Oklahoma, including Oklahoma State University, where I am currently working on a BSEE. Neglecting Onenet's other issues (AOL's netadmins could do a better job than Onenet's), they have a "transparent" web cache proxy. More often than not, errors fetching a web page come not from the browser or the site itself as they should, but from the proxy. DNS errors from the proxy are not uncommon. As for switching ISPs, I can't, which really sucks. But for what I can reach on the net, I'm still getting ultra-cheap broadband :P.

    --

    I pledge allegiance to the flag...
    of the Corporate States of America...
  5. Lots of ways to work around your ISPs. by BrookHarty · · Score: 5, Informative

    Proxy servers: they might not be caching 8080 or other proxy ports. Check http://tools.rosinstrument.com/proxy/

    Bouncers: you set this program up on an external server on a port that's not filtered. You just point your browser at this IP/port and you're outside your filtered ISP. Check www.freshmeat.net

    SSH: tunnel or route from an external box.

    Really, if you can't go through it, go around it, either with software or networking.
    -
    Well, if crime fighters fight crime and fire fighters fight fire, what do freedom fighters fight? They never mention that part to us, do they? - George Carlin

  6. Hold on here! by Anonymous Coward · · Score: 1, Informative

    The submitter seems a little confused about how HTTP proxies are required to work. The ISP's proxy seems to be working exactly according to the standard. Taking an HTTP request with an absolute URI and redirecting it to the server specified by the URI is a MUST according to RFC 2068 (the HTTP/1.1 standard). Moreover, using a different name resolution system for your client than for the server and expecting it to work is a MUST NOT, as it can lead to proxy looping.

  7. Re:Use netcat... or your own proxy server... by Anonymous Coward · · Score: 2, Informative

    That requires an external box, like the shell account the original comment mentioned. If you have that, you could use some more advanced schemes like routing only the SYN-packets for port 80 through your external account. This way you wouldn't cause three times the traffic like you do with a proxy (your connection plus twice the external connection).

  8. OpenNIC by glasn0st · · Score: 3, Informative

    The poster mentioned that he used OpenNIC, which is an alternative DNS root. It is proper HTTP, but a transparent proxy that does not "see" domains in this namespace effectively blocks you from viewing web pages under these domains.

    His own box is properly configured to do OpenNIC lookups, but the HTTP request to the (proper) webserver gets intercepted. Now the proxy has to do the real HTTP request, but the proxy does not know about the alternative domains and probably returns a "Host not found" error.

    I haven't heard of free proxy servers supporting one of the alternative NICs and I doubt the ISP will be interested in subscribing to such a service. I guess the only solution will be to convince a friend to set up a proxy on a box someplace else.

    Some alternative roots have their own "real" Internet domain which acts as a gateway domain, for instance name.space has http://name.space.xs2.net/ (regular hostname) which enables non-subscribers to view http://name.space/ (namespace only), making the domains available globally. If OpenNIC provides such a service, an alternative solution could be to run some proxy at home and let it rewrite OpenNIC urls into "regular" URLs.

    --
    ( ^_^)/
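
The URL rewriting suggested above could look roughly like the following sketch. The `GATEWAYS` table and the function name are hypothetical; the only mapping taken from the comment is name.space to name.space.xs2.net, and whether OpenNIC offers an equivalent gateway is not confirmed here.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: alternative-root suffix -> regular gateway suffix.
# Only the name.space entry comes from the comment above.
GATEWAYS = {"name.space": "name.space.xs2.net"}


def rewrite_to_gateway(url: str, gateways: dict = GATEWAYS) -> str:
    """Rewrite a URL in an alternative namespace to its gateway hostname."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    for suffix, gateway in gateways.items():
        if host == suffix or host.endswith("." + suffix):
            # Swap the trailing suffix for the gateway, keep any subdomain.
            new_host = host[: len(host) - len(suffix)] + gateway
            netloc = new_host if parts.port is None else f"{new_host}:{parts.port}"
            return urlunsplit(parts._replace(netloc=netloc))
    return url  # not in a known alternative namespace: leave untouched


print(rewrite_to_gateway("http://name.space/"))
```

A local proxy would apply this to each request before forwarding it, so the ISP's proxy only ever sees hostnames it can resolve.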
  9. Their way or the highway by Lumpish+Scholar · · Score: 3, Informative

    (1) Line up a serious alternative ISP. Talk to their sales department; see if they do the same thing.

    (2) Talk to your ISP's sales department. Tell them your problem. Tell them you're ready to move. (Perhaps ask what the hit rate of the cache is, that is, if the overhead is worth it for them.) See if they offer any accommodation.

    (3) Go with the ISP that does what you want.

    If you're using them for DSL, you may not have a lot of choice.

    (As others suggested, if host resolution is your issue, you could run a local proxy on your 127.0.0.1 interface that converts host names into addresses.)

    --
    Stupid job ads, weird spam, occasional insight at
    1. Re:Their way or the highway by Jerf · · Score: 4, Informative

      (As others suggested, if host resolution is your issue, you could run a local proxy on your 127.0.0.1 interface that converts host names into addresses.)

      Unfortunately, that's not a complete solution. Example: Compare my home page versus the IP address that hostname resolves to.

      Lots of servers do this.

    2. Re:Their way or the highway by Anonymous Coward · · Score: 1, Informative

      It's called "name based virtual hosting" and is actually a recommended practice because of the ip-address-shortage.
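
To illustrate why replacing the host name with an IP address is not enough, here is a toy sketch of name-based virtual hosting. The host names, the `VHOSTS` table, and both functions are invented for the example; a real server does the same lookup on the `Host` header.

```python
# One IP address, several sites: the server picks the site from the
# Host header, not from the address the client connected to.
VHOSTS = {"www.example.org": "site A", "blog.example.org": "site B"}


def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 request that names the wanted site."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {hostname}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def select_site(request: bytes, vhosts: dict, default: str) -> str:
    """Return the site a name-based virtual host setup would serve."""
    head = request.decode("ascii").split("\r\n\r\n", 1)[0]
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return vhosts.get(value.strip().lower(), default)
    return default


print(select_site(build_request("blog.example.org"), VHOSTS, "default site"))
print(select_site(build_request("192.0.2.10"), VHOSTS, "default site"))
```

A local proxy that rewrites names to addresses therefore has to keep the original name in the `Host` header, otherwise the server falls back to its default site.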

  10. corrections, suggestions, etc by MattW · · Score: 5, Informative
    First of all, the phrase "routing" is a misnomer. Web caching is something that happens on the application layer of the OSI model, layer 7, whereas "routing" refers to layer 3, which supplies IP routing for the TCP/IP protocol suite. What's broken is their caching, their cache server, or their proxying; pick a term.

    Second, there's a lot of ways around it which involve tunnelling.

    Tunnel to another box running a non-broken web cache. I used to tunnel my http traffic through ssh to my colocated boxes, which ran adzapper, and proxied through that.

    Tunnel at the IP layer by running any IP-in-IP encapsulation. If you have some version of windows, for example, you might convince someone with a server to run a PPTP server for you somewhere and you could tunnel through that. There are even Free PPTP Servers for Linux available to help.

    Find someone who runs a little proxy for their own net with SOCKS, and bounce off their SOCKS proxy. Someone you know on another ISP probably has Wingate or the like running, and if they allowed it (and on some older versions, it will permit this by default), you could set your browser's SOCKS settings to bounce off their proxy server, and since SOCKS isn't on port 80, your ISP will probably ignore it.
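
As a rough sketch of the first option (HTTP through an ssh tunnel), the snippet below only assembles the command line. The account `me@shell.example.net` is a placeholder, and the assumption that a web cache listens on port 3128 of the remote box (Squid's usual default) must be checked against the actual setup. `-N` and `-L` are standard OpenSSH options.

```python
import shlex


def ssh_forward_command(user: str, remote_host: str,
                        local_port: int = 8080, proxy_port: int = 3128) -> list:
    """Build an ssh command that forwards a local port to a remote proxy.

    -N: run no remote command, just forward.
    -L: listen on local_port and relay to localhost:proxy_port as seen
        from the remote host.
    """
    forward = f"{local_port}:localhost:{proxy_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{remote_host}"]


# Placeholder account; substitute a real shell account.
print(shlex.join(ssh_forward_command("me", "shell.example.net")))
```

With the tunnel up, the browser's HTTP proxy setting would point at localhost port 8080, so the ISP only sees encrypted traffic on the ssh port.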

    There are also a number of things you might discuss with your ISP to resolve the issue.

    Suggest that they switch to a less broken cache server. (Squid, anyone?)

    Suggest that they exempt you specifically from the cache server by telling it to ignore your ip address.

    Note that they have an obligation to make sure their caching software doesn't interfere with your browsing; so it will be necessary (and not cost-effective for them) for you to call for every problem you notice.

    Obviously, you'll need to probably speak to a whole number of supervisors, and probably eventually get transferred to a "real engineer", and they will probably hack in a fix (like exempting you only) rather than truly deal with the problem.

    If all else fails, then you may want to try issuing ultimatums, like, "If you can't fix this problem, then you can cancel my service." Tech support people are lazy, however, in some cases, and may just opt to cancel you. This is a harsh reality in the world of consumer bandwidth -- and it will be worse, soon, with bells closing their DSL lines to competition, meaning unless someone else builds a telephony infrastructure to you, you'll probably pick Cable vs 1 DSL provider, and if you don't like something at either of them, you're just out of luck.

  11. The behavior is correct. by xanthan · · Score: 5, Informative

    The web cache is exhibiting correct behavior. When a forward proxy cache (transparent or not) gets a request in the form of GET http://www.site.com/ http/1.1, it will use the www.site.com address instead regardless of what original dns name you went to (www.google.com in your example). In the transparent case where the GET statement looks more like GET /content.html http/1.1, it will use the original destination address.

    In other words, it's your client that's broken. See RFC 2616 for details.

    The unfortunate truth is that more often than not, sites simply don't set their cache controls correctly. They forget that caches don't exist just on the server side but that they exist on the client side as well. Section 13 of RFC 2616 explains how they work in great detail and it really should be mandatory reading for any site administrator.

    If you're still looking for more information on web caching, check out Content Delivery Networks by Scot Hull. It was just released and is available on Amazon. There is an enlightening section on web caching that should clearly explain why what you're seeing is in fact correct behavior.
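
The rule described at the top of this comment can be sketched as a small decision function. This is an illustration of the described behavior, not the code of any particular cache; `choose_upstream` and its arguments are hypothetical names.

```python
from urllib.parse import urlsplit


def choose_upstream(request_line: str, original_dst: tuple) -> tuple:
    """Pick the (host, port) a forward proxy would contact.

    Absolute-URI form ("GET http://host/ HTTP/1.1"): use the host in the URI.
    Relative form ("GET /path HTTP/1.1"): use the original destination.
    """
    _method, target, _version = request_line.split()
    if target.lower().startswith("http://"):
        parts = urlsplit(target)
        return (parts.hostname, parts.port or 80)
    return original_dst


# The submitter's telnet experiment: connected to Google, asked for Slashdot.
print(choose_upstream("GET http://www.slashdot.org/ HTTP/1.1", ("www.google.com", 80)))
# An ordinary browser request, intercepted transparently.
print(choose_upstream("GET /content.html HTTP/1.1", ("192.0.2.10", 80)))
```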

    1. Re:The behavior is correct. by Dr.+Zowie · · Score: 4, Informative
      Yep, the cache is behaving correctly for a cache. The problem is that it's behaving incorrectly for a router, because I can't send the http: requests I want to the hosts I want to send them to.

      I'm not familiar enough with the ins and outs of cache design to know whether RFC2616 is designed primarily for ``transparent'' or ``selected'' proxies, but using a DNS resolution on the destination host seems to break the layered structure of the IP stack. In this case, packets that I've (layer 3) addressed to a specific host never get there, because (layer 4) they're being directed to another machine based on port, and the other machine (the cache) is routing them based on a name (layer 7) contained in the packet payload.

      That is acceptable behavior for a proxy to which I'm explicitly routing my http requests, but not for a router down which I'm sending port-80 IP packets.

  12. How to find a transparent proxy's IP address by ddkilzer · · Score: 2, Informative

    If you want to find the IP address of a transparent proxy, simply point your web browser at a web page that will print out "your" IP address when you request a web page. Instead of printing the IP of your firewall or your host, it will print the transparent proxy's IP address.

    For example:

    After that, you may be able to do some more investigation into what kind of host it is and/or what kind of software it is running. (This is left as an exercise for the crac...err, reader.)

  13. It's in the layers. by Bender+Unit+22 · · Score: 4, Informative

    Normally what you do is layer 4 switching, but note that you can do switching on layer 7 as well, which means you can have the switch do URL-based switching so that a part of the URL determines that it should get switched. This requires much more power and is mostly done for server switching like load balancing.

    What happens in your case might be that they have placed a switch that can do at least layer 4 switching, between you and the internet.
    What then is done is that all port 80 requests coming from the client's side (you) are redirected to the proxy, which means that HTTP requests on other ports will not be cached. Note that anonymous FTP can also be proxied.
    A "clever" proxy/switch solution can do IP spoofing so the web server gets your IP address and sends its reply back to you directly, but as there is a switch in between, it redirects the result to the proxy, which then sends the result back to you.

    A way to avoid it is to get a gateway somewhere that can channel your HTTP traffic; you could set your browser to use this gateway as a proxy on any port. The switch will most likely not act on the traffic coming on this port and will pass it through.

    The easy way would be installing a proxy server on a box that you have access to on the outside and configure it so that it won't cache anything.

  14. Re:Look At It From the ISP's Standpoint by ocip · · Score: 2, Informative

    It is much easier for an established ISP to simply implement a transparent proxy, rather than to have all of its clients configure their browser to use a proxy. Remember, only 40% have it configured already. 60% don't. And, of that 60%, maybe 5% have even heard of it before. It really, REALLY sucks to have thousands of customers calling a support desk to configure their browser to use a proxy.

  15. tunneling over other than ssh... by rusty0101 · · Score: 2, Informative

    There are a few workarounds to the problem of devices that you do not wish to handle your traffic doing so.

    I have seen tunneling via ip-ip, ssh, and other ipv4 protocols mentioned, however there is another option available, and that is to tunnel your traffic as ipv6 traffic over ipv4.

    It does take a bit of time to set up, but if you can find an agreeable ipv6 network provider to allow you to tunnel to their server, your traffic will not be handled by any transparent proxy server at your local ISP, regardless of the type of traffic that you are working with.

    I am not sure how complete the ipv6 implementation for Windows is yet, or, depending upon which version of Windows you may be running, if it is even an option, but for users working with Linux and BSD, this should not be a significant issue.

    Then again, I could be wrong.

    -Rusty

    --
    You never know...
  16. Transparent proxying should be an option by Denium · · Score: 1, Informative
    If the user wants to use proxying, so be it.

    If the user, despite ISP encouragement, chooses not to use a proxy, that should be his choice. He is paying for the bandwidth, and is assumed to be aware of the possible performance hit.

    This was discussed in the vuln-dev mailing list after Comcast implemented transparent proxying.

    This raised quite a stink when Comcast's logging habits were revealed. Oops.

    There is obviously a performance degradation involved with re-resolving the address given to the cache server. Furthermore, requests now appear to be coming from the server, not the actual user -- potentially breaking host-based authentication systems.

    I've also seen these cache systems horribly implemented. An IRC network that I administer recently started checking for HTTP proxies on connection. This was performed by connecting to the remote user's host on certain ports (80, 3128, 8000, and 8080) and then issuing a CONNECT request. In more than one case, a blatantly stupid ISP redirected _incoming_ port 80 traffic to their server -- WITHOUT any sort of access restrictions on their proxy. Sort of ironic that they were probably using untold amounts of bandwidth for 1337 bounce kiddiots.

    Proxying without consent is an Evil Thing.
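
The proxy check described above boils down to sending a CONNECT request and looking at the status line. A minimal sketch of those two pieces follows, with the network I/O left out; the function names and the `irc.example.net` target are made up, and a real checker would need timeouts and more careful response handling.

```python
# Ports the comment lists as commonly hosting HTTP proxies.
PROXY_PORTS = (80, 3128, 8000, 8080)


def connect_probe(target_host: str, target_port: int) -> bytes:
    """Build the CONNECT request sent to a suspected proxy."""
    return f"CONNECT {target_host}:{target_port} HTTP/1.0\r\n\r\n".encode("ascii")


def looks_like_open_proxy(response: bytes) -> bool:
    """True if the reply's status line is an HTTP 2xx, i.e. the tunnel opened."""
    status_line = response.split(b"\r\n", 1)[0].split()
    return (
        len(status_line) >= 2
        and status_line[0].startswith(b"HTTP/")
        and status_line[1].startswith(b"2")
    )


print(connect_probe("irc.example.net", 6667))
print(looks_like_open_proxy(b"HTTP/1.0 200 Connection established\r\n\r\n"))
print(looks_like_open_proxy(b"HTTP/1.1 403 Forbidden\r\n\r\n"))
```

An ISP that redirects incoming port 80 traffic to an unrestricted proxy would answer the probe with a 2xx on behalf of every customer behind it.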

  17. Re:Look At It From the ISP's Standpoint by Anonymous Coward · · Score: 1, Informative
    My ISP (istop.com) has a really good solution for this.

    They have, as is becoming popular these days, a monthly bandwidth cap. It's 10G.

    But, if you use their HTTP proxy (which is NOT transparent; it's a good old-fashioned normal proxy that sits on another port) that counts as an extra 10G on top of that.

    In other words, I am well motivated to use their proxy. Given that it also deletes ads (heh), I rather LIKE their proxy. But if it causes trouble, which inevitably it does on some sites, I can just turn off proxying and see things normally. It's a win-win situation for me and them.

    They also don't block any ports, which is explicitly stated in their policy. They also explicitly allow you to run your own servers.

    In short, they seem to be run by people who actually understand networking.

  18. WRONG by mnot · · Score: 2, Informative

    Transparent proxying is a violation of IP routing, plain and simple. This has been discussed ad nauseam on the IETF WREC WG mailing list and the IETF main list.

  19. It's working correctly by davew · · Score: 3, Informative

    I see what you mean. You are sending traffic to a particular address based on your own DNS resolution, and if the traffic is proxied, you want it to be sent to your chosen destination, not that of the proxy.

    In my opinion, the ISP is exhibiting correct behaviour.

    Picture this: the object of the exercise with the transparent proxy is to cache pages and increase speed for the customer, right? I think it's already been agreed earlier in the thread that this is not entirely evil.

    Let's say the proxy honours the destination IP address that you chose (I'm not sure how this would work in practice, but I'll go with it for now). It returns the web page from the server that your DNS picked, and caches it for the next guy.

    Another customer requests a page with the same name. What if they're using a DNS root where the answer conflicts with yours? The customer gets the "wrong" web page. Because cached objects eventually expire, this means that the customer might get a completely different site dependent only on the time and date they happened to request it.

    The ISP doesn't use the same DNS root you do, so they can't begin to troubleshoot the problem.

    I concede that the popular "alternate" DNS roots have few enough conflicts with the IANA-assigned roots at the minute, but even that is an irrelevancy - any solution that allows a customer to choose destination IP address on behalf of other customers opens up the ISP to a denial of service attack by a user less trustworthy than you or I. One could set up an arbitrary "root" server that resolves www.yahoo.com to my own site. Or google. Or some site that accepts credit card orders.

    I can't see any scalable way out of this without the ISP picking one root, and sticking with it. If that is so, then I think this is a fundamental problem with split roots and, if you really want to use them, be fully aware of what you're getting yourself into. Turning off the transparent proxy will help this time, but you won't be able to rely on being able to talk to any server on the internet that doesn't use the same root as yours, even the servers you don't (usually) need to know exist.

    Regards,
    Dave
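
The conflict described above can be shown with a toy cache keyed by URL only. Everything here (class, addresses, the fake `origin` fetcher) is invented for illustration and stands in for a shared ISP cache that honours each client's chosen destination address.

```python
class UrlKeyedCache:
    """Toy shared cache: keyed by URL, fetched from the client's chosen IP."""

    def __init__(self):
        self._store = {}

    def fetch(self, url: str, dest_ip: str, origin) -> str:
        # On a miss, trust the destination address the client picked.
        if url not in self._store:
            self._store[url] = origin(dest_ip, url)
        return self._store[url]


def origin(dest_ip: str, url: str) -> str:
    """Stand-in for a real HTTP fetch; reports which server answered."""
    return f"page from {dest_ip}"


cache = UrlKeyedCache()
url = "http://www.example.test/"
# Customer A's DNS root resolves the name to 192.0.2.1.
first = cache.fetch(url, "192.0.2.1", origin)
# Customer B's root resolves the same name to 198.51.100.7 ...
second = cache.fetch(url, "198.51.100.7", origin)
# ... but B is served A's cached copy.
print(first, "|", second)
```

Keying the cache by (destination IP, URL) would avoid the mix-up, at the cost of the hit rate the cache exists to provide.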

  20. Re:Err...so what is broken exactly? by ViVeLaMe · · Score: 2, Informative
    ok, i think there are some misunderstandings here.
    let's clarify this a bit.
    The poster's problem is not with a "classical" HTTP proxy, but with a *transparent* (also called interception :-P ) HTTP proxy.
    The client uses OpenNIC's alternate root servers, to go to http://www.dev.null .
    Because of transparent proxying, he can't get to http://www.dev.null, because his outgoing port 80 requests are routed to a transparent proxy, which forgets about the destination IP and only looks at the payload of the request: the GET. Since the transparent proxy doesn't use OpenNIC's alternate root servers, it can't resolve www.dev.null, and can't serve the page requested.
    Now this is perfectly normal, expected, and even needed behaviour for a *normal* HTTP proxy, but if you look carefully at RFC 1919, this is broken behaviour for a transparent proxy: on a "normal" proxy, the client can even afford to not be able to resolve www.yahoo.com and still access it (it passes the HTTP request to the proxy, which will do the resolving itself). On a transparent proxy, the client has to do the DNS lookup, and the transparent proxy can determine what is the final target destination instantly, since the LOCAL IP address field of the connection contains the target server's IP address. There is no need for the proxy application to ask the client what is the final target system.

    What the transparent proxy should do is : remember the dest IP, and connect to that IP, instead of trying (and miserably fail) to resolve the hostname included in the payload of the HTTP GET request.
    As a matter of fact, that's not the only thing a transparent proxy breaks. Check out RFC 3143 for some examples.
    Anyway, if something like this is not specified in the poster's contract, the ISP should have implemented an 'opt out' method for customers who don't want it for any reason (moral, technical, security, whatever). I work for an ISP, and when we implemented antispamming resources (like MAPS, and so on), we were *required* to be able to let the spam flow to any consumer who asked not to be protected from SPAM. Might sound a stupid request, but, hey, what if the consumer is a SPAM survey organisation, or maybe it's a company with important customers which depends on RBL'd mail servers.. Same goes for antivirus email scanning, for example, since some customers may be virus researchers who WANT to receive those.. sounds stupid to drive those customers out by giving them *too much* services, and flexibility is a good thing. If somethin' so intrusive is mandatory, the ISP's no better than AOL.
    cheers.
    --
    i had a sig, once..
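
For the "remember the dest IP" suggestion in the comment above, one way an intercepting proxy on Linux can recover the address the client originally dialled is the netfilter `SO_ORIGINAL_DST` socket option. The sketch below assumes a Linux box where connections are redirected by netfilter; nothing in the thread says the poster's ISP runs such a setup, and the function names are illustrative.

```python
import socket
import struct

# Value from <linux/netfilter_ipv4.h>; Python's socket module does not export it.
SO_ORIGINAL_DST = 80


def parse_sockaddr_in(raw: bytes) -> tuple:
    """Decode a 16-byte struct sockaddr_in into (ip, port).

    Layout: 2 bytes family (skipped), 2 bytes port in network order,
    4 bytes IPv4 address, 8 bytes padding.
    """
    port, packed_ip = struct.unpack("!2xH4s8x", raw[:16])
    return socket.inet_ntoa(packed_ip), port


def original_destination(conn: socket.socket) -> tuple:
    """Ask the kernel where a redirected connection was really headed (Linux only)."""
    raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(raw)


# Offline demonstration with a hand-built sockaddr_in for 192.0.2.10:80.
sample = b"\x02\x00" + b"\x00\x50" + bytes([192, 0, 2, 10]) + b"\x00" * 8
print(parse_sockaddr_in(sample))
```

A proxy that connects to this address, rather than re-resolving the name in the request, would keep working for clients that use a different DNS root.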