Session Management and Mega-Proxies?
chicagothad asks: "I help
to run large internet systems for a few Fortune 500 companies. We are
running several clustered systems, comprised of both Microsoft and Linux
technologies. We have run into several problems with what is known as
a 'mega-proxy'. A mega proxy is a way that large internet providers
distribute their outbound traffic via a pool of IPs. AOL/Compuserve
is the largest example of this. We are having fits with session
management right now. Does anyone have any ideas on the best system
structure or design to manage these beasts or any other tips that may
be helpful?"
What kind of problems are you having? Because you mentioned large organizations, I'm assuming you are talking about problems with large-volume web server farms and traditional session management techniques.
Basically, the problem is this:
Sessions are usually stored in RAM, so a session exists on only one web server no matter how many web servers there are. To make sure the right web server gets the traffic, the client IP, destination IP, and (sometimes) destination port are hashed together to determine which server handles the request. Because the hash is deterministic between requests, this method ensures that a user who hits Box A will continue to hit Box A, provided the source IP and destination IP/port do not change.
The problem with mega-proxies (and many other forms of NAT that have multiple outgoing IPs) is that the source IP does change between requests, so the hashing technique described above fails. Cisco Local Directors, among others, use exactly this kind of hashing.
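To make the failure concrete, here is a minimal sketch of hash-based server selection. It is not how any particular load balancer is implemented; the server names, the choice of SHA-256, and the function `pick_server` are all assumptions made for illustration.

```python
import hashlib

# Hypothetical server farm; the names are illustrative only.
SERVERS = ["box-a", "box-b", "box-c", "box-d"]


def pick_server(client_ip, dest_ip, dest_port, servers=SERVERS):
    """Map a (client IP, destination IP, destination port) tuple to one server.

    The same inputs always produce the same server, which is what keeps a
    user pinned to the box that holds their in-RAM session.
    """
    key = f"{client_ip}|{dest_ip}|{dest_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it
    # modulo the number of servers.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


# A user behind a single IP always lands on the same box.
first = pick_server("198.51.100.7", "203.0.113.10", 80)
second = pick_server("198.51.100.7", "203.0.113.10", 80)

# A user behind a mega-proxy may arrive from any IP in the proxy's pool,
# so consecutive requests can be spread across several boxes.
proxy_pool = [f"192.0.2.{n}" for n in range(1, 51)]
boxes_hit = {pick_server(ip, "203.0.113.10", 80) for ip in proxy_pool}
```

With a single source IP the choice never changes, while the 50 pooled addresses fan out over more than one box, and only one of those boxes holds the session.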
The solution I've implemented basically keeps the session information in RAM, although it does this through a middle layer. If I get a session ID from a browser but cannot find that session ID in my RAM, I put a query out to the server farm network and ask, "Who has this?" Whoever has it transfers the session to me (transfer, NOT copy, as I only want one authoritative copy).
You have to be careful of concurrency issues while doing this, but if you are careful, it will work well and be extremely fast for the majority of users, as they remain at one IP for the duration of their session. It still allows a session to migrate when it has to.
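The "Who has this?" transfer can be sketched roughly as follows. This is an in-process simulation, not the implementation described above: the class `SessionNode`, its methods, and `build_farm` are hypothetical names, and direct method calls stand in for the query that would really go out over the server farm network.

```python
import threading


class SessionNode:
    """One web server holding sessions in RAM (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.peers = []          # other nodes in the farm
        self._sessions = {}      # session ID -> session data
        self._lock = threading.Lock()

    def create(self, sid, data):
        with self._lock:
            self._sessions[sid] = data

    def has(self, sid):
        with self._lock:
            return sid in self._sessions

    def release(self, sid):
        """Give up a session: remove it and hand it to the caller.

        pop() under the lock means at most one requester can receive the
        session, so there is never more than one authoritative copy.
        """
        with self._lock:
            return self._sessions.pop(sid, None)

    def get_session(self, sid):
        # Fast path: the session is already local (the common case).
        with self._lock:
            session = self._sessions.get(sid)
        if session is not None:
            return session
        # Slow path: ask each peer "who has this?". Our own lock is NOT
        # held here, so two nodes querying each other cannot deadlock.
        for peer in self.peers:
            session = peer.release(sid)
            if session is not None:
                with self._lock:
                    self._sessions[sid] = session
                return session
        # Simplification: a session in transit between two nodes is
        # briefly invisible to a third; a real system must handle that.
        return None


def build_farm(names):
    """Create nodes and make every node a peer of every other node."""
    nodes = [SessionNode(n) for n in names]
    for node in nodes:
        node.peers = [p for p in nodes if p is not node]
    return nodes


box_a, box_b, box_c = build_farm(["box-a", "box-b", "box-c"])
box_a.create("sid-123", {"user": "example"})
migrated = box_c.get_session("sid-123")   # request arrives at box-c instead
```

After the call, the session lives only on `box_c`; `box_a` no longer has it. The design choice worth noting is that the requester never holds its own lock while waiting on a peer, which is one of the concurrency traps mentioned above.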
Another option is to use a central "session repository" like a database or special application server. These are almost always going to be bottlenecks, though.
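For comparison, a central repository might look something like the sketch below. It assumes SQLite as a stand-in for whatever database or application server would actually be used; the class `SessionRepository` and its table layout are hypothetical.

```python
import json
import sqlite3


class SessionRepository:
    """Central session store shared by every web server (hypothetical sketch)."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(sid TEXT PRIMARY KEY, data TEXT NOT NULL)"
        )

    def save(self, sid, data):
        # Using the connection as a context manager commits on success.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO sessions (sid, data) VALUES (?, ?)",
                (sid, json.dumps(data)),
            )

    def load(self, sid):
        row = self._db.execute(
            "SELECT data FROM sessions WHERE sid = ?", (sid,)
        ).fetchone()
        return json.loads(row[0]) if row else None


repo = SessionRepository()
repo.save("sid-123", {"cart": [1, 2]})
loaded = repo.load("sid-123")
```

Any server can now serve any request, which sidesteps the source IP problem entirely, but every request pays for a round trip to the one shared store. That is where the bottleneck comes from.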
I will say that this is not uncharted territory. The solutions to these kinds of problems are well known. If you are dealing with Fortune 500 companies, make sure your project is funded well enough to bring in some experts as consultants... This is a fundamental issue to get right, and if you have problems here, I suspect you'll encounter more problems later.