Australian Company Promises Switching Hardware With Sub-130ns Latency
snowdon writes "The race for low-latency in finance and HPC has taken a major turn. A bunch of engineers from Australia have 'thrown away the air conditioning' in a traditional switch, to get a 10G fibre-to-fibre latency of less than 130ns! Way faster than more traditional offerings. This lady (video) would tell you that it's equivalent to just 26m of optical fibre. Does that mean we just lose money faster?"
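The "26m of optical fibre" figure comes from multiplying the latency by the speed of light in glass. Here is a quick check of that arithmetic. It assumes a refractive index of about 1.5, a typical value for silica fibre. The helper name `fibre_length_for_latency` is purely illustrative.

```python
# Speed of light in vacuum, in metres per second.
C_VACUUM = 299_792_458.0


def fibre_length_for_latency(latency_s: float, refractive_index: float = 1.5) -> float:
    """Return the metres of fibre a signal crosses in `latency_s` seconds.

    The default refractive index of 1.5 is an assumed typical value for
    silica fibre; light in the fibre travels at roughly c / n.
    """
    return latency_s * C_VACUUM / refractive_index


# 130 ns of switch latency, expressed as an equivalent length of fibre.
print(round(fibre_length_for_latency(130e-9), 1))
```

With n ≈ 1.5, light covers about 0.2 m per nanosecond. So 130 ns works out to roughly 26 m.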
Even though a lot of research has gone into reducing switching latency, the vast majority of real-world deployments are nowhere near where they should be. Corporations spend millions upgrading their core switching, only to end up with performance that is the same as, or worse than, what they had with ordinary gigabit technology.
I've been to more than half a dozen sites recently with new 10 GbE installations but terrible network performance. All too often, I'm seeing latencies as high as 1.3 milliseconds, even between servers on the same switch. Read up on the bandwidth-delay product to get an idea of why this would severely nerf throughput. The odd thing is that at a number of these sites, older servers with 1 GbE connected to the same switching infrastructure get 100% of wire speed without issues.
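To see why 1.3 ms hurts a 10 GbE link, you can do the bandwidth-delay product arithmetic directly. The sketch below makes two assumptions. First, it treats the 1.3 ms figure as the round-trip time. Second, it picks a 64 KiB TCP window for illustration, which is roughly the ceiling without TCP window scaling. The function names are hypothetical.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput when at most one window is sent per RTT."""
    return window_bytes * 8 / rtt_s


LINK_BPS = 10e9        # 10 GbE line rate
RTT_S = 1.3e-3         # assumed: the observed 1.3 ms treated as round-trip time
WINDOW_BYTES = 65536   # assumed illustrative window (about 64 KiB)

print(f"BDP: {bdp_bytes(LINK_BPS, RTT_S) / 1e6:.3f} MB needed in flight")
print(f"Throughput cap: {window_limited_throughput_bps(WINDOW_BYTES, RTT_S) / 1e6:.0f} Mbit/s")
```

Under these assumptions, filling the link would need about 1.6 MB in flight. A 64 KiB window therefore caps throughput at roughly 400 Mbit/s, which is well under half of what even a 1 GbE link can carry.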
I don't know exactly what the root cause is, but I'm starting to suspect that the extra latency comes from somewhere network engineers don't usually test, such as CPU power management taking an excessively long time to wake up a core to respond to a packet. What I think happens is something like this:
- The fast network delivers a burst of data very quickly.
- The receiver CPU slowly starts to wake up from sleep mode, while the sender, having finished sending an entire TCP window, waits for an ACK packet.
- The sender CPU goes to sleep, because it has nothing to do until the ACK arrives.
- The receiver CPU finally gets around to the packet, which it processes quickly and efficiently, sending the ACK back.
- The receiver OS sees that the CPU is "0.1%" busy and has nothing to do now, so it sends it back to sleep.
- The sender CPU starts to slowly wake up, while the receiver is also asleep, waiting patiently for more data.
- The cycle repeats, with more waiting at every step.
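The cycle described above can be sketched as a simplified stop-and-wait model. Each TCP window costs one serialization burst, one wire round trip, a receiver wake-up, some processing time, and a sender wake-up. All of the timing parameters below are hypothetical placeholders, not measurements. The model also ignores pipelining, delayed ACKs, and interrupt coalescing.

```python
from dataclasses import dataclass


@dataclass
class LinkParams:
    link_bps: float      # line rate, bits per second
    window_bytes: int    # TCP window sent in each burst
    wire_rtt_s: float    # hypothetical propagation + switching delay, both directions
    wake_s: float        # hypothetical time for an idle core to leave its sleep state
    process_s: float     # hypothetical receiver time to process a window and ACK it


def cycle_time_s(p: LinkParams, cpus_sleep: bool) -> float:
    """Time from the start of one window burst to the start of the next."""
    burst = p.window_bytes * 8 / p.link_bps
    wake = p.wake_s if cpus_sleep else 0.0
    # Receiver wakes and processes the burst, then the sender wakes when the ACK lands.
    return burst + p.wire_rtt_s + wake + p.process_s + wake


def throughput_bps(p: LinkParams, cpus_sleep: bool) -> float:
    """One window delivered per cycle."""
    return p.window_bytes * 8 / cycle_time_s(p, cpus_sleep)


params = LinkParams(link_bps=10e9, window_bytes=65536,
                    wire_rtt_s=10e-6, wake_s=100e-6, process_s=5e-6)
for sleeping in (False, True):
    gbps = throughput_bps(params, sleeping) / 1e9
    print(f"cpus_sleep={sleeping}: {gbps:.2f} Gbit/s")
```

The point of the model is that wake-up latency is paid twice per window. On a fast link, the burst itself is short, so those two wake-ups can dominate the cycle time.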
With slightly slower networks or CPUs, the CPUs never get a chance to be idle long enough to enter a sleep state, so everything is always ready for more data. I've seen 3x improvements in iPerf TCP throughput by simply running a busy-loop in a background process!
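A background busy-loop of the kind mentioned above can be as simple as the sketch below. The name `busy_loop` is hypothetical. It is single-threaded, so it should keep at most one core out of deep idle states while it runs. Whether that is the core actually handling the NIC's traffic depends on the OS scheduler and interrupt affinity, so results may vary.

```python
import time


def busy_loop(seconds: float) -> int:
    """Spin without sleeping for `seconds`; return how many iterations ran.

    Because the loop never blocks, the core it runs on stays busy rather
    than dropping into an idle/sleep state for the duration.
    """
    deadline = time.monotonic() + seconds
    spins = 0
    while time.monotonic() < deadline:
        spins += 1
    return spins
```

For example, calling `busy_loop(60)` from a separate process keeps one core spinning for a minute while an iPerf run completes. To cover several cores, you would start one such process per core.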