For me, Java NIO (SocketChannel/select/read) worked a lot better than the multi-threaded, blocking version (Socket/read).
I had to simulate 5000 TCP clients, each reading at 128 kbps (radio streams), on 64-bit Linux 2.6 with 4 GB RAM and a dual-core CPU (Java 1.6).
The multi-threaded version was a bit easier to code, but it consumed 2*100% CPU before reaching the target. The context switches between threads consumed a lot of CPU. Even before reaching the CPU limit, the Thread.sleep() between reads started to overshoot badly (delays of more than 1 second), probably because the scheduler had to handle 5000 active threads.
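For reference, the thread-per-connection variant described above looks roughly like the sketch below. This is not my original code: the class name BlockingStreamClient, the 16 KB buffer and the pauseMillis parameter are made up for illustration, and it uses try-with-resources, which needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one blocking client, meant to run on its own thread.
public class BlockingStreamClient implements Runnable {
    private final String host;
    private final int port;
    private final long pauseMillis;                 // sleep between reads (assumed value)
    private final AtomicLong totalBytes = new AtomicLong();

    public BlockingStreamClient(String host, int port, long pauseMillis) {
        this.host = host;
        this.port = port;
        this.pauseMillis = pauseMillis;
    }

    @Override
    public void run() {
        byte[] buf = new byte[16 * 1024];
        try (Socket socket = new Socket(host, port)) {
            InputStream in = socket.getInputStream();
            int n;
            // Blocking read: the thread is parked until data arrives or the stream ends.
            while ((n = in.read(buf)) != -1) {
                totalBytes.addAndGet(n);            // the data itself is discarded
                Thread.sleep(pauseMillis);          // pacing between reads
            }
        } catch (IOException e) {
            // Connection failed or was reset: this simulated client just stops.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt flag
        }
    }

    public long totalBytes() {
        return totalBytes.get();
    }
}
```

Simulating 5000 clients this way means starting 5000 such threads, e.g. new Thread(new BlockingStreamClient(host, port, 100)).start() in a loop, which is where the scheduling overhead comes from.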
The single-threaded NIO version reached the target while consuming about 20% of one CPU. I had to tweak the reads a bit for this. The problem was that, reading in a tight loop, there were many short reads, which produced many switches between user and kernel space and consumed too much CPU. Putting a small sleep (50-100 ms) between select() calls resulted in fewer, bigger reads, while still consuming all the received data in a single pass of the loop.
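A minimal sketch of the paced select loop described above, under some assumptions: the class name PacedNioReader, the 64 KB buffer and the blocking connect are illustrative choices, not my original code. It also uses selectNow() after the sleep so that a pass never blocks; a plain select() would work as well when data is always flowing.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical sketch: one thread, one selector, many non-blocking channels.
public class PacedNioReader {
    private final Selector selector;
    private final ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
    private final long pauseMillis;     // e.g. 50-100 ms, as in the text
    private long totalBytes;

    public PacedNioReader(long pauseMillis) throws IOException {
        this.selector = Selector.open();
        this.pauseMillis = pauseMillis;
    }

    public void connect(InetSocketAddress address) throws IOException {
        SocketChannel ch = SocketChannel.open(address);   // blocking connect, for simplicity
        ch.configureBlocking(false);
        ch.register(selector, SelectionKey.OP_READ);
    }

    // One pass: sleep so data accumulates in the socket buffers,
    // then drain every ready channel. Returns the bytes read in this pass.
    public long pollOnce() throws IOException, InterruptedException {
        Thread.sleep(pauseMillis);
        long passBytes = 0;
        if (selector.selectNow() == 0) {
            return 0;
        }
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) {
                continue;
            }
            SocketChannel ch = (SocketChannel) key.channel();
            int n;
            buffer.clear();
            // Read until the socket has nothing more to give (0) or is closed (-1).
            while ((n = ch.read(buffer)) > 0) {
                passBytes += n;
                buffer.clear();         // the data itself is discarded
            }
            if (n < 0) {                // peer closed the stream
                key.cancel();
                ch.close();
            }
        }
        totalBytes += passBytes;
        return passBytes;
    }

    public long totalBytes() {
        return totalBytes;
    }

    public void close() throws IOException {
        for (SelectionKey key : selector.keys()) {
            key.channel().close();
        }
        selector.close();
    }
}
```

The main loop is then just repeated calls to pollOnce() after registering all the connections with connect(). The sleep is what turns many short reads into a few large ones: each channel is visited once per pass and emptied completely.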