Intel Develops Hardware To Enhance TCP/IP Stacks
RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."
That all depends on how it's done. Simply offloading the processing won't work, but replacing the TCP/IP drivers with simple hooks into a hardware-based I/O system can.
Uh, this isn't new. QLogic has been doing it for some time now in their TOE (TCP Offload Engine) cards. The cards are smoking, especially on Solaris, because Sun's TCP stack is crappy.
You must be assuming that the hardware implementation will be faster than the main CPU, which it almost certainly won't be. If you've just spent 300 USD on your P4 CPU, why would you spend the same amount again, or more, just on your network subsystem?
Also remember that a well-implemented TCP/IP stack runs at about 90% of the speed of a memcpy() (Tanenbaum's book again).
For hardware TCP/IP processing to be useful, it needs to be, say, 2x the speed of the CPU's memcpy() function!
Given that the main performance bottleneck is memory access, since you're basically copying buffers around and so caching isn't going to help you, I don't see how any sort of super-duper hardware is going to give you anything like a 2x speed up, let alone at an economic price.
--
Toby
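To put the memcpy() comparison above in rough numbers, here is a minimal Python sketch (not from the thread) that times a plain in-memory buffer copy and sets it against gigabit wire speed. The function name `copy_throughput_mb_per_s` and the buffer size are made up for illustration, and interpreter overhead means the figure is at best a lower bound on what memcpy() manages on the same machine.

```python
import time

GIGABIT_MB_PER_S = 1000 / 8  # 1000 Mbit/s expressed in MByte/s (base 10)

def copy_throughput_mb_per_s(size_bytes=16 * 1024 * 1024, repeats=5):
    """Time a plain buffer copy and return the best throughput seen, in MByte/s."""
    src = bytearray(size_bytes)
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
        assert len(dst) == size_bytes
        best = max(best, size_bytes / elapsed / 1e6)
    return best

if __name__ == "__main__":
    rate = copy_throughput_mb_per_s()
    print(f"buffer copy:        {rate:.0f} MByte/s")
    print(f"gigabit wire speed: {GIGABIT_MB_PER_S:.0f} MByte/s")
    print(f"copy is roughly {rate / GIGABIT_MB_PER_S:.1f}x wire speed")
```

The result varies a lot with the machine, cache sizes and buffer size, so treat it as a ballpark and not a benchmark.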
Any given thread which needs network I/O cannot continue until that I/O is complete. The fact the CPU can switch elsewhere makes no difference to the thread which requires the network packet to be processed before it has the information it requires to continue, and if that processing is offloaded to a slower network processor, the performance of that thread is degraded.
--
Toby
Yes. Checksum was one of the problems. The other problem is the memory-to-memory copying of data due to the semantics of the TCP/UDP send() call. These semantics require that the data in the memory location at the time send() is called is the data that gets sent. If the application changes the data directly after the send() call, this should not affect what is sent. This means that the OS has to copy the data into kernel memory, and then at some later time copy it onto the NIC. This memory-to-memory copying becomes a severe problem as traffic and bandwidth increase.
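The send() semantics described above can be seen from user space. This is a small Python sketch, assuming a platform where `socket.socketpair()` is available; the helper name `send_then_overwrite` is made up for illustration. It sends a buffer, overwrites it immediately, and checks what the other end receives.

```python
import socket

def send_then_overwrite(payload=b"original data"):
    """Send a buffer, scribble over it, and return (received bytes, buffer afterwards)."""
    sender, receiver = socket.socketpair()
    try:
        buf = bytearray(payload)
        sender.sendall(buf)          # the kernel takes its own copy of the data here
        buf[:] = b"X" * len(buf)     # the application reuses the buffer straight away
        received = bytearray()
        while len(received) < len(payload):
            chunk = receiver.recv(len(payload) - len(received))
            if not chunk:
                break
            received += chunk
        return bytes(received), bytes(buf)
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    received, buf_after = send_then_overwrite()
    print(received)   # the data as it was when sendall() was called
    print(buf_after)  # the overwritten application buffer
```

The receiver sees the original bytes even though the application buffer was overwritten before recv() ran, which is exactly the guarantee that forces the extra copy into kernel memory.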
AC being Alan Cox, DM being Dave Miller.
Read Alan's opinion here.
Read Dave's opinion here.
There has been discussion of this specific Intel announcement here.
There have been multiple fixes to address the inefficiencies of the original design of the BSD TCP/IP stack.
FreeBSD, for example, has a kernel option called ZERO_COPY_SOCKETS, which dramatically increases network throughput of syscalls such as sendfile(2). With this option enabled, as the name implies, data is no longer copied from userland to kernel space and then passed on to the network card's ring buffers. It is copied in one swoop!
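For a feel of what sendfile(2) looks like from an application, here is a hedged Python sketch. It uses the standard library's `socket.sendfile()`, which calls `os.sendfile()` where the platform supports it and otherwise falls back to ordinary send() calls. It does not exercise the FreeBSD ZERO_COPY_SOCKETS option itself, and the helper name `send_file_over_socket` is made up for illustration.

```python
import socket
import tempfile

def send_file_over_socket(payload):
    """Write payload to a temp file, push it with sendfile, return (bytes sent, data received)."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            f.write(payload)
            f.flush()
            f.seek(0)
            # The file contents go to the socket without the application
            # reading them into its own buffer first.
            sent = sender.sendfile(f)
        received = bytearray()
        while len(received) < sent:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received += chunk
        return sent, bytes(received)
    finally:
        sender.close()
        receiver.close()

if __name__ == "__main__":
    sent, data = send_file_over_socket(b"abc" * 100)
    print(sent, len(data))
```

The payload is kept small on purpose so that it fits in the socket buffer before anything reads from the other end.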
You got the PCI bandwidth correct, but your gigabit bandwidth is a hair off. Depending on how you define "giga" (base 10 or base 2), you get the following numbers:
a) Gigabit/sec = 1000 Mbit/sec = 125MByte/sec
b) Gigabit/sec = 1024 Mbit/sec = 128MByte/sec
True, even these speeds don't completely saturate the PCI bus, though because of how the PCI bus is shared (each device gets a few clock cycles to do its thing before passing control off to the next device) no single device could anyway unless it's the ONLY thing on the PCI bus. It certainly will saturate it (or come dang close) when it has its moment of control, though.
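The arithmetic above, as a quick Python sketch. The PCI figure is an assumption on my part: the thread doesn't restate it, so this uses conventional 32-bit, 33.33 MHz PCI. The function names are made up for illustration.

```python
def mbit_to_mbyte_per_s(mbit_per_s):
    """Convert a line rate in Mbit/s to MByte/s (8 bits per byte)."""
    return mbit_per_s / 8

def pci_peak_mbyte_per_s(bus_width_bits=32, clock_hz=33_333_333):
    """Theoretical peak of a parallel bus: bytes per transfer times transfers per second.

    The defaults assume plain 32-bit / 33.33 MHz PCI (an assumption, not from the thread).
    """
    return bus_width_bits / 8 * clock_hz / 1e6

if __name__ == "__main__":
    print(mbit_to_mbyte_per_s(1000))          # 125.0  (base 10 "giga")
    print(mbit_to_mbyte_per_s(1024))          # 128.0  (base 2 "giga")
    print(round(pci_peak_mbyte_per_s(), 1))   # 133.3  (assumed 32-bit / 33 MHz PCI)
```

Either way you read "giga", one direction of gigabit traffic sits just under that theoretical PCI peak, which is why the bus sharing matters.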
Intel has been wanting to do this for years! I remember reading old articles on The Register about it, and how they were pulling back because Microsoft didn't like the idea of Intel taking away things that Microsoft were running with their software, including things like managing networking instead of having the OS do it.
Of course it couldn't last. What with nVidia doing firewalls and NICs and all sorts of other things, Intel is a big company and they know when they need to compete. MS has also lost a bit of their clout when it comes to things like pressuring the bigger companies (Intel, HP, Dell).
I am government man, come from the government. The government has sent me. -- G.I.R.
You can accelerate graphics to a very large degree because the problem is very subject to parallelism.
You cannot accelerate networking very much because the problem is highly serial.
It is improper to compare the two because they are fundamentally different problems.
You can throw tons of hardware at 3D graphics and get good results, because just by having more and more pipelines, you go faster and faster.
Processing a network packet is quite different; the data goes through a series of serial steps and eventually reaches the application layer. The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which, remember, has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin.
--
Toby
Hardware implementation will most definitely be leaps and bounds faster than the general CPU. Can a Linux router route 720Gbps of traffic through hundreds of interfaces at once? No. But a Cisco 6500 can, because of hardware designed especially for the task.
Simply put, software on general purpose processors sucks for doing heavy computational work. Hardware tuned especially for a task has been, and always will be, where it's at. However, the costs involved in creating ICs specific to a task usually mean that ASICs are only created where there is a need. Modern graphics cards are a great example. The on-board graphics processors are designed especially to create graphics, something that, if offloaded onto the GP CPU, would crush even the highest of the high end.
Also, offloading the TCP/IP stack on a normal workstation probably isn't going to be a huge performance boost. Where this will be useful is in situations where there is a need for high-throughput, low-latency network I/O processing.
Flash memory. It's been done all the time.
4Z5TX
gigabit is full duplex - double your figures.
But new motherboards are already starting to come with gigabit attached to PCI Express. For the last few years any decent board has had them on fast PCI-X, at least 64 bit 66 MHz.
Yes. The nForce4 chipsets offload most TCP/IP processing and firewalling from the main CPU.
If you go with an Athlon64 Socket 939 nForce4 board, you get PCI Express, lower power consumption, a ton of great features, good Linux support, and plug-compatible dual core upgrades down the road. Intel's offerings just seem anemic by comparison.
(Personally, I'd also do an NVIDIA graphics board for the excellent Linux driver support. And no, I don't work for NVIDIA, I'm just a satisfied customer.)
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
Bullshit.
I used to work at a company that did Fibre Channel.
One of the things we had was an ASIC that did network processing in hardware, allowing us to do all sorts of interesting stuff at wire speed (2Gbps). If we had to load into memory we would have been at least an order of magnitude slower.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.