Writing High-Availability Services?
bigattichouse asks: "I have a project coming up that will require some serious load capabilities accepting socket connections. While I have a design that can be distributed over multiple servers (using queued reads/writes to the db) and is as low-overhead as I can make it - I am concerned about falling into common problems that may have been overcome in many other projects. What strategies (threading, forks, etc) give the best capability? What common pitfalls should I avoid?"
In a former job we totally hammered an app on our internal LAN and got many times the request rate we would need in the real world.
Fat, dumb and happy, we figured that the real world couldn't hammer us as hard as we could internally. Wrong! Slow connections tie up per-connection resources much longer than connections on an internal network, where the response can be created and dispensed with almost instantly.
Maintaining all those simultaneous connections depleted our resources and the app went into full meltdown mere seconds after being released on the public servers.
We beat a hasty retreat to the old code, licked our wounds, and learned a valuable lesson.
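To put rough numbers on that lesson: the average number of connections open at once is roughly the arrival rate multiplied by how long each connection is held (Little's law). The sketch below is only an illustration; the function name and the request rates and hold times are made up for the example, not figures from our app.

```python
def open_connections(requests_per_second: float, hold_time_ms: float) -> float:
    """Little's law: average connections open at once = arrival rate * hold time."""
    return requests_per_second * hold_time_ms / 1000.0


# Hypothetical LAN load test: very high request rate, each response gone in 10 ms.
lan_open = open_connections(500, 10)

# Hypothetical public traffic: a fifth of the request rate, but slow clients
# keep each connection alive for 5 seconds.
wan_open = open_connections(100, 5000)

print(f"LAN test: about {lan_open:.0f} connections open at once")
print(f"Public:   about {wan_open:.0f} connections open at once")
```

With these invented numbers the public servers see a fifth of the request rate but a hundred times as many simultaneous connections, which is why a fast internal network is a poor stand-in for slow clients.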
~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis
You probably know about this paper already, but just in case you don't:
The paper deals with web servers handling ten thousand simultaneous TCP connections. Most of it is not specific to HTTP or the web, though; it covers general socket I/O, particularly the ways of dealing with readiness/error notifications (e.g. select(), poll(), asynchronous signals, etc.). It also discusses other kinds of limits (threads, processes, descriptors).
It is quite enlightening. It may be a bit outdated (I remember reading it about the time Netcraft was making all that noise about Windows being faster than Linux as a web server), but I'm sure most of it is still very relevant.
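For anyone who hasn't seen the readiness-notification style in practice, here is a minimal sketch in Python using the standard `selectors` module, which wraps select(), poll(), epoll or kqueue depending on the platform. It is not taken from the paper; the function names (`run_echo_loop`, `accept_client`, `service_client`) and the echo behaviour are assumptions chosen for illustration. One thread serves every connection, and a slow client costs a descriptor and a small buffer rather than a whole thread or process.

```python
import selectors
import socket
import threading


def accept_client(sel: selectors.BaseSelector, listener: socket.socket) -> None:
    """Accept one pending connection and watch it for readable data."""
    try:
        conn, _addr = listener.accept()
    except BlockingIOError:
        return  # the client went away before we got to it
    conn.setblocking(False)
    # Per-connection state: bytes received but not yet echoed back.
    sel.register(conn, selectors.EVENT_READ, data=bytearray())


def service_client(sel: selectors.BaseSelector, key: selectors.SelectorKey, events: int) -> None:
    """Read what is available, write what the socket will take, never block."""
    sock, pending = key.fileobj, key.data
    try:
        if events & selectors.EVENT_READ:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("peer closed the connection")
            pending += chunk
        if pending:
            sent = sock.send(pending)  # may accept only part of the buffer
            del pending[:sent]
    except BlockingIOError:
        pass  # not actually ready; try again on the next notification
    except OSError:
        sel.unregister(sock)
        sock.close()
        return
    # Ask for write notifications only while there is unsent data.
    wanted = selectors.EVENT_READ | (selectors.EVENT_WRITE if pending else 0)
    sel.modify(sock, wanted, data=pending)


def run_echo_loop(listener: socket.socket, stop: threading.Event) -> None:
    """Serve all clients from a single thread until `stop` is set."""
    sel = selectors.DefaultSelector()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, data=None)
    try:
        while not stop.is_set():
            for key, events in sel.select(timeout=0.05):
                if key.data is None:
                    accept_client(sel, listener)
                else:
                    service_client(sel, key, events)
    finally:
        # Close client sockets; the listener stays open for the caller to close.
        for key in list(sel.get_map().values()):
            sel.unregister(key.fileobj)
            if key.fileobj is not listener:
                key.fileobj.close()
        sel.close()
```

To try it, create a listening socket with `socket.create_server(("127.0.0.1", 0))`, run `run_echo_loop` in a thread with a `threading.Event`, and set the event to shut it down. A real service would also need idle timeouts and a cap on the per-connection buffer, which this sketch leaves out; the descriptor and thread limits the paper discusses still apply.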