Intel Develops Hardware To Enhance TCP/IP Stacks
RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."
Uh, this isn't new. QLogic has been doing it for some time now in their TOE (TCP Offload Engine) cards. The cards are smoking, especially on Solaris, because Sun's TCP stack is crappy.
This implies that the hardware implementation will be faster than the main CPU, which it almost certainly won't be, because if you've just spent 300 USD on your P4 CPU, why would you spend the same amount again, or more, just on your network subsystem?
Also remember that a well-implemented TCP/IP stack runs at about 90% of the speed of a memcpy() (Tanenbaum's book again).
For hardware TCP/IP processing to be useful, it needs to be, say, 2x the speed of the CPU's memcpy() function!
Given that the main performance bottleneck is memory access, since you're basically copying buffers around and so caching isn't going to help you, I don't see how any sort of super-duper hardware is going to give you anything like a 2x speed up, let alone at an economic price.
--
Toby
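The 90%-of-memcpy() argument above can be put as back-of-envelope arithmetic. This is only a sketch under a simplifying assumption that is not in the original post: offload hardware removes all protocol work, but one copy at memcpy() speed still has to happen. The function name is hypothetical.

```python
def max_speedup_if_copy_remains(stack_fraction_of_memcpy: float) -> float:
    """Best-case speedup if all protocol overhead vanishes but one copy remains.

    stack_fraction_of_memcpy is the software stack's throughput expressed as a
    fraction of raw memcpy() throughput (0.9 in the comment above).
    """
    if not 0.0 < stack_fraction_of_memcpy <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # With the overhead gone, throughput rises from fraction * memcpy to
    # 1.0 * memcpy, so the ratio is simply 1 / fraction.
    return 1.0 / stack_fraction_of_memcpy


if __name__ == "__main__":
    print(f"{max_speedup_if_copy_remains(0.9):.2f}x")  # prints 1.11x
```

Under that assumption the ceiling is roughly 1.11x, well short of the 2x figure, unless the hardware also eliminates the copy itself.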
Yes. Checksum was one of the problems. The other problem is the memory-to-memory copying of data required by the semantics of the TCP/UDP send() call. These semantics require that the data present in the buffer at the time send() is called is the data that gets sent. If the application changes the data directly after the send() call, this should not affect what is sent. This means that the OS has to copy the data into kernel memory, and then at some later time copy it onto the NIC. This memory-to-memory copying becomes a severe problem as traffic and bandwidth increase.
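The send() semantics described above can be observed from userland. The following is a minimal sketch using Python's standard socket module; it uses a local socketpair() rather than a real TCP connection over a NIC (an assumption made so the example is self-contained), and the function name is hypothetical.

```python
import socket


def send_then_mutate() -> bytes:
    """Send a buffer, overwrite it right away, and return what the peer receives."""
    tx, rx = socket.socketpair()
    try:
        buf = bytearray(b"hello")
        tx.sendall(buf)      # the kernel takes its own copy before this returns
        buf[:] = b"XXXXX"    # the application scribbles over its buffer
        received = b""
        while len(received) < 5:
            chunk = rx.recv(5 - len(received))
            if not chunk:
                break
            received += chunk
        return received
    finally:
        tx.close()
        rx.close()


if __name__ == "__main__":
    print(send_then_mutate())  # prints b'hello'
```

The receiver still sees the original bytes, which is exactly why the kernel cannot simply hand the application's buffer to the NIC without copying it first.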
AC being Alan Cox, DM being Dave Miller.
Read Alan's opinion here.
Read Dave's opinion here.
There has been discussion of this specific Intel announcement here.
There have been multiple fixes to address the inefficiencies of the original design of the BSD TCP/IP stack.
FreeBSD, for example, has a kernel option called ZERO_COPY_SOCKETS, which dramatically increases the network throughput of syscalls such as sendfile(2). With this option enabled, as the name implies, data is no longer copied from userland to kernel space and then passed on to the network card's ring buffers. It goes out in one swoop!
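For a feel of what a sendfile(2)-style path looks like from an application, here is a small sketch using Python's socket.sendfile(), which uses os.sendfile() where the platform supports it and falls back to plain send() otherwise. It does not exercise the FreeBSD ZERO_COPY_SOCKETS option itself; the local socketpair(), the temporary file, and the function name are assumptions made for illustration.

```python
import socket
import tempfile


def send_file_over_socket(payload: bytes) -> bytes:
    """Write payload to a temp file, push it with sendfile, return what arrives."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        tx, rx = socket.socketpair()
        try:
            # No userland read() buffer here: the file object goes straight
            # to the socket layer, which moves the bytes itself.
            sent = tx.sendfile(f)
            tx.close()  # signal end-of-stream to the receiver
            chunks = []
            while True:
                chunk = rx.recv(65536)
                if not chunk:
                    break
                chunks.append(chunk)
        finally:
            tx.close()
            rx.close()
    if sent != len(payload):
        raise RuntimeError("short sendfile")
    return b"".join(chunks)


if __name__ == "__main__":
    data = b"x" * 8192
    print(send_file_over_socket(data) == data)
```

The payload is kept small so it fits in the socket buffer and the single-threaded send-then-receive order cannot block.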
Intel has been wanting to do this for years! I remember reading old articles on The Register about it, and how they were pulling back because Microsoft didn't like the idea of Intel taking away things that Microsoft were running with their software, including things like managing networking instead of having the OS do it.
Of course it couldn't last, what with nVidia doing firewalls and NICs and all sorts of other things. Intel is a big company, and they know when they need to compete. MS has also lost a bit of their clout when it comes to things like pressuring the bigger companies (Intel, HP, Dell).
I am government man, come from the government. The government has sent me. -- G.I.R.
You can accelerate graphics to a very large degree because the problem is very subject to parallelism.
You cannot accelerate networking very much because the problem is highly serial.
It is improper to compare the two because they are fundamentally different problems.
You can throw tons of hardware at 3D graphics and get good results, because just by having more and more pipelines, you go faster and faster.
Processing a network packet is quite different; the data goes through a series of serial steps and eventually reaches the application layer. The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which, remember, has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin.
--
Toby
Hardware implementation will most definitely be leaps and bounds faster than the general CPU. Can a Linux router route 720Gbps of traffic through hundreds of interfaces at once? No. But a Cisco 6500 can, because of hardware designed especially for the task.
Simply put, software on general-purpose processors sucks for doing heavy computational work. Hardware tuned especially for a task is, and always will be, where it's at. However, the costs involved in creating ICs specific to a task usually mean that ASICs are only created where there is a need. Modern graphics cards are a great example. The on-board graphics processors are designed especially to create graphics, something that, if offloaded onto the GP CPU, would crush even the highest of the high end.
Also, offloading the TCP/IP stack on a normal workstation probably isn't going to be a huge performance boost. Where this will be useful is in situations where there is a need for high-throughput, low-latency network I/O processing.