Scalable, Fault-Tolerant TCP Connections?
pauljlucas asks:
"My company is developing custom server software for an instant
messaging type server (under Solaris). Every client maintains a
TCP connection to the server when it is 'logged in'; the server
maintains state of who's logged in where. For large-scale
deployment, there are two problems: scalability and fault-tolerance.
A single server can handle at most around 64000 open sockets. To go
beyond this, you need many servers. Another way would be to 'fake' a
TCP stack in user-space (by reading/writing raw TCP packets) thus not
having one real socket per connection. For fault-tolerance, ideally
one would like N servers to maintain the exact state, at least for
the server process, so that if one goes off line, the other(s) can
pick up seamlessly. I'm thinking that both of these issues must have
already been solved without having to write lots of custom software.
Is anybody aware of off-the-shelf software and/or hardware solutions
(either commercial or freeware)?"
You can get a TCP load balancer like Cisco LocalDirector or one its competing clones. They are expensive tho ($20,000)
You indicate that your scalability problem kicks in at around 64000 simultaneous clients. Having developed high-performance scalable servers I would recommend taking a look at The C10k Problem, which is rather sobering: handling 10000 simultaneous connections is already a bitch.
Basically if you are using select() or poll() and have 10000 connections, you can expect 30% of your timeslice (on a fast machine) to be taken scanning the connection list for available data. Your performance also goes down the drain after about 250 connections (empirical observation; our server handles async requests which are offloaded to a hardware device for processing, so most server overhead is packet handling).
To get more connections than that you need to look at OS-specific methods: IOCompletion ports on NT, /dev/poll on Linux & Solaris, and kqueue on FreeBSD. These scale must more linearly out to 10000 connections (depending of course on if your server can handle the total load), but still don't give you the ability to scale endlessly.
Finally, I know of no intrinsic reason (although OS implementations may have arbitrary limits) why a server should be restricted to 64k connections. A TCP/IP connection is defined by two endpoints, each being an IP address/port cobination. Your server's ip/port is always one of those endpoints, and the client are unique. Client connections on the other hand are limited by the number of available port on the computer, i.e. 64k.
For fault tolerance, your best bet is to look at the architecture of Enterprise Java servers. Effectively you have a load balancer up front which redirects packets to one of several application servers; any persistent information must be saved off to a database server, and if necessary two or more database servers must be configured to synchronise with each other on a continual basis.
I am not aware of any software that can magically do this for an arbitrary application, although you may find network shared memory libraries which can more or less accomplish this. But if you are concerned about performance, you are unlikely to find COTS software to do the job (since the current business model is to throw more computing power at it).
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
First off, the Kegel's c10k page referenced earlier is definitely worth a read. And if you're under the impression that having 2^16 TCP port numbers limits you to 2^16 connections, that's not accurate. You can have hundreds of thousands of connections to one machine, presuming you manage them properly (as the c10k page points out).
More importantly, you should check out http://www.linuxvirtualserver.org/ ; it's aim is exactly what you want:
"The Linux Virtual Server is a highly scalable and highly available server built on a cluster of real servers, with the load balancer running on the Linux operating system. The architecture of the cluster is transparent to end users. End users only see a single virtual server."
Sounds like a perfect match.
Sumner
rage, rage against the dying of the light
If you're thinking of going to the trouble of simulating TCP with raw sockets, UDP seems a simpler alternative to that.
Shut up, be happy. The conveniences you demanded are now mandatory. -- Jello Biafra
A load balancer does not give redundancy. With a load balancer, if a server dies, NEW connections are sent to a different server instead, but the existing connections to the down server all are closed - an external non-OS integrated solution like load balancing does not give transparent failover on TCP connections. It works for HTTP because browsers are used to connections suddenly dieing and will simply retry. But, if the client isn't smart enough to reconnect, it won't work.
The way to do this is to build a custom TCP stack and integrate it tightly into your app. A lot of work and hard to get right.
I would ask, "Do we REALLY" need this when our application already has to handle things like network failures? You might, though - I don't know what your application is.
Also, don't forget to use redundant routers, redundant firewalls, etc. If you use NAT, that imposes one more problem - transparently moving the connection table between the failed firewall and the working one.
Okay, let's see if I understand.
You've a instant messaging application, which is composed of a lot of clients and a central server.
The clients talk to the server only on start-up, but presumbely you want to keep the TCP connection open so you would know when a client goes offline.
That is not a good way to do it, not because you'll have problems with ports, you can have as many connections open as you want, after all, a TCP connection is identified by source_ip:port + dest_ip:port. But because of the overhead you'll encounter maintaining so many connections.
It's also not good to use UDP for peer-to-peer connections, why complicate your life with things that TCP already offers?
Persumbely, on start-up, a client calls the server, log-in, register its state (online,busy,D/A, etc), and asks for a list of names/ID of friends that it has.
You want to send it the IPs of those who are online, so it can send messages to them directly.
The way to do it, in my idea, is to handle it so:
Have a database of your clients, which would include username & password, an IP & state.
Another field should be a list of names that should be notified when this client goes online/change status.
When the clients calls in, authenticate it, and register it state in the DB, send it the list of IPs & states of the people online that he requested.
And register its state in the DB. Notify everyone that requested to be notified and is currently online that the user went online.
Cut the TCP connection. **
When the clients closes, have it send a message saying "I'm going offline".
Then change the state of it in the DB & inform the users linked to it. **
That way, the only connections that you've is of clients starting, changing status & shuting down.
That should lower your load considerably.
As for scalability & fault tolerance, just put it behind a load balancer, that way, if a server is busy or down, the request goes to another server, all of them are linked to the same DB, so the state is being preserved.
If a server goes down in the middle of a request, then the client should be smart enough to recover, and try again (that is what browsers do, and why load balancers works so well for HTTP).
Be sure to make the client stop after a couple of failed tries, though, you don't want to overload the network in case all your servers down for some reason.
What about errors, you ask? If a client is being terminate or disconnected without having a chance to inform you?
Well, the way I would do it is let other clients discover that.
If a client can't form a direct connection to another client, it should tell the server about this.
The server would try to reach the client by himself, and if he fails, would register that client as offline, store the request for when that user goes online again/discard it, and tell all the clients that are linked to the failed client about it [***].
I think it's much better than have the server poll at possibly hundreds of thousands of connections. (Or more, if you are lucky.)
After all, it doesn't matter if a client that no one is trying to call is offline while it's marked online. #
[**]
If you care about bandwidth/load, have the clients maintain a list of the people/or give it to him during log-on, and have the *client* do the notifications. On start-up, shut-down & status chagne. You'll have to handle the errors yourself, though. ***
[***]
You can be really nasty and have the client that discovered the error inform the rest of the clients that are linked to the failed client that it's gone. But that would require that client to have the list of people that want the failed client's link, which can be bad from privacy point of view.
[#]
If it does matter to you, have the clients poll the other clients in their list every hour or so, it would balance out so you wouldn't have too much lost connections marked as alive.
--
Two witches watched two watches.
Which witch watched which watch?
> No, we want to keep the TCP connection open so a client knows when it has an incoming message instantly.
Why do it via the *server*, anyway? If you have a message from one client to another, then transfer it directly from one client to another, not through the server. That way, the client is aware instantly, and you aren't wasting bandwidth by transferring the data.
--
Two witches watched two watches.
Which witch watched which watch?