Australian Company Promises Switching Hardware With Sub-130ns Latency

Posted by timothy on Tuesday June 5, 2012 @02:45AM from the time-to-start-straightening-the-undersea-bits dept.

snowdon writes "The race for low latency in finance and HPC has taken a major turn. A bunch of engineers from Australia have 'thrown away the air conditioning' in a traditional switch, to get a 10G fibre-to-fibre latency of less than 130ns! That is way faster than more traditional offerings. This lady (video) would tell you that it's equivalent to just 26m of optical fibre. Does that mean we just lose money faster?"
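
As a quick check on the 26m figure in the summary, the arithmetic can be sketched as below. The refractive index of 1.5 is an assumed typical value for glass fibre, not something stated in the story.

```python
# Convert a switch latency into the length of fibre light would cover
# in the same time. N_FIBRE is an assumed typical refractive index.
C_VACUUM_M_PER_S = 299_792_458
N_FIBRE = 1.5  # assumption, not from the story

def fibre_equivalent_m(latency_s: float) -> float:
    """Metres of fibre traversed by light during latency_s."""
    return latency_s * C_VACUUM_M_PER_S / N_FIBRE

metres = fibre_equivalent_m(130e-9)
print(f"130 ns is about {metres:.1f} m of fibre")
```

Under that assumption the result lands very close to the 26m quoted above.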

Meanwhile... by bertok · 2012-06-05 04:02 · Score: 5, Interesting

Even though a lot of research has been put into reducing the latencies of switching technology, the vast majority of real-world deployments are nowhere near where they should be. Corporations spend millions upgrading their core switching, and end up with performance that is the same as or worse than what they had with ordinary gigabit technology.
I've been to more than half a dozen sites recently with new installations of 10 GbE, but with terrible network performance. All too often, I'm seeing latencies as high as 1.3 milliseconds even for servers on the same switch. Read up on bandwidth-delay product to get an idea of why this would severely nerf throughput. The odd thing is that at a number of these sites, older servers with 1 GbE connected to the same switching infrastructure get 100% of wire speed without issues.
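
To illustrate the bandwidth-delay point, here is a minimal sketch. The 1.3 ms latency and the 10 Gbit/s link speed come from the comment; treating 1.3 ms as the round-trip time and using a 64 KiB TCP window are assumptions chosen only for the example.

```python
# Bandwidth-delay product sketch. RTT_S and LINK_BPS follow the comment;
# WINDOW_BYTES is an assumed example (a TCP window without window scaling).

def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bits_per_s * rtt_s / 8

def window_limited_bits_per_s(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s

RTT_S = 1.3e-3
LINK_BPS = 10e9
WINDOW_BYTES = 64 * 1024  # assumption

needed = bdp_bytes(LINK_BPS, RTT_S)
ceiling = window_limited_bits_per_s(WINDOW_BYTES, RTT_S)
print(f"window needed to fill the link: {needed / 1e6:.3f} MB")
print(f"ceiling with a 64 KiB window:  {ceiling / 1e6:.0f} Mbit/s")
```

With these inputs the link needs roughly 1.6 MB in flight, and a 64 KiB window caps a single connection at around 400 Mbit/s, which is well below even 1 GbE wire speed.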
I don't know exactly what the root cause is, but I'm starting to suspect that the extra latency is coming from somewhere that network engineers don't usually test, like CPU power management taking an excessively long time to wake up a core to respond to a packet. What I think happens is something like this:
- The fast network delivers a burst of data very quickly.
- The receiver CPU starts to slowly wake up from sleep mode, while the sender is waiting for an ACK packet, because it has finished sending an entire TCP window.
- The sender CPU goes to sleep, because it still has nothing to do.
- The receiver CPU finally gets around to the packet, which it processes quickly and efficiently, sending the ACK back.
- The receiver OS sees that the CPU is "0.1%" busy, has nothing to do now, so it sends it back to sleep.
- The sender CPU starts to slowly wake up, while the receiver is also asleep, waiting patiently for more data.
- The cycle repeats, with more waiting at every step.
With slightly slower networks or CPUs, the CPUs never get a chance to be idle long enough to enter a sleep state, so everything is always ready for more data. I've seen 3x improvements in iPerf TCP throughput by simply running a busy-loop in a background process!
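
The cycle above can be put into a toy model. Every number here (window size, wire round trip, wake-up delay) is hypothetical and picked only to show the shape of the effect; this is not a measurement and not a claim about any particular CPU.

```python
# Toy model of the sleep/wake cycle: one TCP window is sent per round, and
# each end pays a wake-up delay once per round. All values are hypothetical.

def throughput_bits_per_s(window_bytes, link_bps, wire_rtt_s, wake_s):
    """Window bits divided by the time one full send/ACK round takes."""
    send_time = window_bytes * 8 / link_bps          # time to put the window on the wire
    round_time = send_time + wire_rtt_s + 2 * wake_s  # receiver wake + sender wake
    return window_bytes * 8 / round_time

WINDOW = 256 * 1024   # bytes, assumed
LINK = 10e9           # bits per second
WIRE_RTT = 50e-6      # seconds, assumed

awake = throughput_bits_per_s(WINDOW, LINK, WIRE_RTT, wake_s=0.0)
sleepy = throughput_bits_per_s(WINDOW, LINK, WIRE_RTT, wake_s=200e-6)
print(f"cores kept awake: {awake / 1e9:.2f} Gbit/s")
print(f"cores sleeping:   {sleepy / 1e9:.2f} Gbit/s")
```

In this model a busy-loop corresponds to `wake_s=0.0`: the cores never sleep, so the per-round wake-up penalty disappears.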
1. Re:Meanwhile... by Cassini2 · 2012-06-05 04:35 · Score: 3, Interesting
  
  The number of interrupts per second that a modern processor can handle has stayed relatively fixed for a large number of years, and network response time is a strong function of interrupt performance.
  It has been a while since I benchmarked interrupt response on a variety of processors with the same block of code. However, when I last did it, a 32 MHz 8-bit PIC18 series microcontroller (MCU) was capable of processing a real-time interrupt 10,000 times per second. A much faster 3 GHz x86 CPU could manage 70,000 interrupts per second under Real-Time Linux. This isn't much of an improvement, considering that the x86 CPU is theoretically 100 times faster and capable of executing multiple instructions per clock, while the little 8-bit PIC MCU has a hard limit of around 8 MIPS.
  Why does this happen?
  For a variety of reasons, including:
  1. The amount of data a modern CPU must flush on an interrupt is huge. Most of the cache needs to be flushed and replaced, and this means a great deal of memory bus bandwidth.
  2. Main memory latency in the modern PC is huge. Modern compiler designers assume it to be on the order of hundreds of clocks.
  3. Modern operating systems have to reload the page tables for every user/kernel mode transition. This takes time, and possibly more interrupts.
  4. The I/O response time of the PCI bus is terrible relative to the cycle time of the processor. This means that most of the network traffic is transferred via DMA. However, this optimization only helps in terms of bandwidth, not latency.
  5. Multiple cores don't really help. Synchronization overhead is a killer. Tricks like software transactional memory can help, but typical I/O devices lack the ability to support software transactional memory. Even simple techniques, like giving each core its own I/O and thus eliminating synchronization overhead, are often not implemented. Linux is still trying to eliminate the "Big Kernel Lock".
  In short, the interrupt response latency in modern processors has really not improved much. This was a big discussion in some of the Ethernet working groups, because some of the simulations pointed out that 10Gb Ethernet performance might be limited by interrupt response time on the processors.
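
One way to read the benchmark figures in this comment is as a per-interrupt cycle budget. The sketch below uses only the numbers quoted above (8 MIPS at 10,000 interrupts per second for the PIC18, 3 GHz at 70,000 per second for the x86) and assumes the whole processor is devoted to interrupt handling.

```python
# Cycles spent per interrupt if the whole CPU does nothing but service
# interrupts at the quoted rates. Inputs are the figures from the comment.

def cycles_per_interrupt(cycles_per_s: float, interrupts_per_s: float) -> float:
    return cycles_per_s / interrupts_per_s

pic_budget = cycles_per_interrupt(8e6, 10_000)   # PIC18: ~8 MIPS
x86_budget = cycles_per_interrupt(3e9, 70_000)   # x86: 3 GHz clock

print(f"PIC18: {pic_budget:.0f} instruction cycles per interrupt")
print(f"x86:   {x86_budget:.0f} clocks per interrupt")
print(f"x86 spends about {x86_budget / pic_budget:.0f}x more cycles per interrupt")
```

By this reading the x86 handles 7 times as many interrupts per second but spends tens of thousands of clocks on each one, which is consistent with the cache, memory and mode-switch costs listed above.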
2. Re:Meanwhile... by bertok · 2012-06-05 10:19 · Score: 3, Interesting
  
  That's similar to what I'm seeing.
  I found a "DPC ping" tool which queues a simple kernel task from user mode, and measures the response time as if it were an actual network ping.
  I found it interesting that it was well correlated with both ping times and tested network throughput, and that the DPC ping time wasn't consistent. Some fast CPUs had terrible DPC pings, and some older, slower models were much faster. The operating system also contributes significantly: I found some combinations where the average was OK, but there were regular spikes of high latency.
  Now that I've done more benchmarks across more sites, I've been recommending jumbo frames much more often. Previously I thought they were a nice-to-have, but these days I'm starting to be of the opinion that without them the money spent on "10 GbE to the server" is just wasted.
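
A rough sketch of why jumbo frames matter here: the frame rate needed to saturate 10 GbE at each MTU, set against the 70,000 interrupts per second quoted earlier in the thread. The 38 bytes of per-frame overhead (preamble, header, FCS, inter-frame gap) is standard Ethernet framing; the one-interrupt-per-frame assumption is a deliberate simplification, since NICs can coalesce interrupts.

```python
# Frames per second needed to fill a 10 Gbit/s link at a given MTU.
# Assumes full-size frames and, for the comparison, one interrupt per frame.
LINK_BPS = 10e9
OVERHEAD_BYTES = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap

def frames_per_s(mtu_bytes: int) -> float:
    return LINK_BPS / ((mtu_bytes + OVERHEAD_BYTES) * 8)

standard = frames_per_s(1500)
jumbo = frames_per_s(9000)
QUOTED_INTERRUPTS_PER_S = 70_000  # figure from the earlier comment

print(f"MTU 1500: {standard:,.0f} frames/s")
print(f"MTU 9000: {jumbo:,.0f} frames/s")
print(f"jumbo frames cut the frame rate by about {standard / jumbo:.1f}x")
```

Even with jumbo frames the required rate is still above the quoted interrupt figure, but it is within a factor of two rather than more than ten.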