Slashdot Mirror


Intel Develops Hardware To Enhance TCP/IP Stacks

RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."

37 of 271 comments

  1. Good stuff! by kernelistic · · Score: 5, Interesting

    First checksum offloading, now this... It is nice to see that hardware vendors are realizing that 10Gbit/s+ speeds aren't currently realistic without extra forms of computation support from the underlying network interface hardware.

    This is Good News.
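    For anyone wondering what checksum offloading actually takes off the CPU, here is a minimal sketch of the 16-bit ones' complement Internet checksum (the RFC 1071 style sum used by IP, TCP and UDP). The function name is made up for illustration, and a real stack does this in tuned C or on the NIC, not in Python.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))
```

    Every byte of every packet has to pass through a loop like this, which is why it was one of the first jobs handed to the NIC.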

    1. Re:Good stuff! by RatRagout · · Score: 5, Informative

      Yes. Checksum was one of the problems. The other problem is the memory-to-memory copying of data due to the semantics of the TCP/UDP send() call. These semantics require that the data in the memory location at the time send() is called is the data that gets sent. If the application changes the data directly after the send() call, this should not affect what is sent. This means that the OS has to copy the data into kernel memory, and then at some later time copy it onto the NIC. This memory-to-memory copying becomes a severe problem when traffic and bandwidth increase.
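      The send() semantics described above can be seen from userland with a small sketch. It assumes a local socketpair stands in for a real connection, and the function name is invented for the example: the buffer is scribbled over right after the send call, yet the receiver still gets the original bytes, because the kernel took its own copy.

```python
import socket


def send_then_scribble() -> bytes:
    """Send a buffer, overwrite it immediately, and return what arrives."""
    sender, receiver = socket.socketpair()
    try:
        buf = bytearray(b"original payload")
        sender.sendall(buf)           # the kernel copies the data here
        buf[:] = b"X" * len(buf)      # the application reuses its buffer
        received = b""
        while len(received) < len(buf):
            chunk = receiver.recv(len(buf) - len(received))
            if not chunk:
                break
            received += chunk
        return received
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(send_then_scribble())
```

      That copy is exactly the cost being complained about: it is what makes the buffer safe to reuse, and it is paid on every send.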

    2. Re:Good stuff! by kernelistic · · Score: 5, Informative

      There have been multiple fixes to address the inefficiencies of the original design of the BSD TCP/IP stack.

      FreeBSD, for example, has a kernel option called ZERO_COPY_SOCKETS, which dramatically increases network throughput of syscalls such as sendfile(2). With this option enabled, as the name implies, data is no longer copied from userland to kernel space and then passed on to the network card's ring buffers. It is copied in one fell swoop!
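      A rough sketch of the sendfile(2) style of sending, using Python's socket.sendfile() wrapper rather than the FreeBSD kernel option itself. The socketpair and the temporary file are stand-ins chosen for the example, and the function name is hypothetical. On platforms without os.sendfile, Python quietly falls back to ordinary send(), so this shows the API shape and not a guaranteed zero-copy path.

```python
import socket
import tempfile


def send_file_over_socket(payload: bytes) -> bytes:
    """Write payload to a temp file, ship it with sendfile, return what arrives."""
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            f.write(payload)
            f.flush()
            f.seek(0)
            sender.sendfile(f)            # kernel moves file data to the socket
        sender.shutdown(socket.SHUT_WR)   # signal end of stream to the reader
        received = b""
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(len(send_file_over_socket(b"hello" * 100)))
```

      The payload is kept small here so it fits in the socket buffer without a concurrent reader; a real server would be reading on the other end while the file is sent.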

  2. finally... by N5 · · Score: 5, Funny

    intel is working on something worthwhile: a cure for the common slashdot-ing

    and they say the drug companies are miracle workers ;)

    --
    John 3:16 - The easiest way to a BETTER YOU.
  3. White elephant? by Toby+The+Economist · · Score: 5, Interesting

    I think in Tanenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in its task until the network processing has completed.

    --
    Toby

    1. Re:White elephant? by Toby+The+Economist · · Score: 5, Informative

      You must be implying that the hardware implementation will be faster than the main CPU, which it almost certainly won't be, because if you've just spent 300 USD on your P4 CPU, what are you doing spending the same amount again - or more - just on your network subsystem?

      Also remember that a well implemented TCP/IP stack runs at about 90% of the speed of a memcpy() (Tanenbaum's book again).

      For hardware TCP/IP processing to be useful, you need to be, say, 2x the speed of the CPU's memcpy() function!

      Given that the main performance bottleneck is memory access, since you're basically copying buffers around and so caching isn't going to help you, I don't see how any sort of super-duper hardware is going to give you anything like a 2x speed up, let alone at an economic price.

      --
      Toby

    2. Re:White elephant? by mr_zorg · · Score: 4, Interesting
      I think in Tanenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in its task until the network processing has completed.

      I think in xyz's book there's a reference which states that offloading graphics processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in its task until the graphics processing has completed.

      See how silly that sounds when you substitute network with graphics? We all know that offloading graphics processing is a good thing. Why? Because it's optimized for the task. Why couldn't the same be done for networking?

    3. Re:White elephant? by Joseph_Daniel_Zukige · · Score: 3, Interesting
      See how silly that sounds when you substitute network with graphics?

      Well, does waiting 3 milliseconds at 3 GHz outrun waiting 3 milliseconds at 300 MHz?

      The only advantage I can see to this is that it's often nice to have I/O handled in a separate process/thread running on a separate processor. But, as many have already noted, unless the I/O processor is tuned for this you've either got another expensive processor or you're running the I/O thread on a slower processor.

      If the processor _is_ tuned for this purpose, it's already been done. Most Ethernet i/f cards have a fair amount of intelligence on them already, and complete stacks have been available on cards for about as long as I've been aware of ethernet. (twenty years?)

    4. Re:White elephant? by Jeff+DeMaagd · · Score: 5, Interesting

      Graphics and networking are two very different things. Networking isn't compute intensive, it is I/O intensive. I don't think the Intel hardware network offload is for much more than basic computation.

      Besides, GPUs are more powerful than CPUs at the task of rendering polygons.

      Very often ASICs are better at a task than general purpose CPUs, just that considerations must be made as to whether the performance gain is worth the cost difference.

    5. Re:White elephant? by Toby+The+Economist · · Score: 4, Informative

      You can accelerate graphics to a very large degree because the problem is very subject to parallelism.

      You cannot accelerate networking very much because the problem is highly serial.

      It is improper to compare the two because they are fundamentally different problems.

      You can throw tons of hardware at 3D graphics and get good results, because just by having more and more pipelines, you go faster and faster.

      Processing a network packet is quite different; the data goes through a series of serial steps and eventually reaches the application layer. The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which remember has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin.

      --
      Toby

    6. Re:White elephant? by Uhlek · · Score: 5, Informative

      Hardware implementation will most definitely be leaps and bounds faster than the general CPU. Can a Linux router route 720Gbps of traffic through hundreds of interfaces at once? No. But a Cisco 6500 can, because of hardware designed especially for the task.

      Simply put, software on general purpose processors sucks for doing heavy computational work. Hardware tuned especially for a task has been, and always will be, where it's at. However, the costs involved in creating ICs specific to a task usually mean that ASICs are only created where there is a need. Modern graphics cards are a great example. The on-board graphics processors are designed especially to create graphics, something that, if offloaded onto the GP CPU, would crush even the highest of the high end.

      Also, offloading the TCP/IP stack on a normal workstation probably isn't going to be a huge performance boost. Where this will be useful is in situations where there is a need for high-throughput, low-latency network I/O processing.

    7. Re:White elephant? by Uhlek · · Score: 4, Insightful

      Comparing the two is completely valid when you're discussing the benefits of task-customized hardware and general purpose computing. Are there limits to where a hardware-based TCP/IP stack will be useful in the desktop/server market? Yes, of course there are. But for high-bandwidth applications, I can assure you that offloading the TCP/IP overhead onto an ASIC will not only give you better performance, but also free up primary processor time for other applications.

      Also, Catalyst switches are not highly parallel. They can be parallel, depending on the exact model and configuration, as well as the exact path inside the switch that the traffic takes, but it's not even remotely the same in execution as having "hundreds of linux routers side by side."

      Instead, it is the exacting way in which the various components of the switch pass data, the very specific purpose of each chip and circuit in the device, that gives modern routers the speed they have. Special components such as content-addressable memory, ternary content-addressable memory (memory that allows you to store 0s, 1s, and wildcard values instead of just 0s and 1s, allowing for wire-speed match comparisons against ACLs and routing tables), etc. etc. It isn't merely a stack of GP CPUs all running in parallel to achieve a particular task.
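      To make the 0/1/wildcard idea concrete, here is a toy software model of a ternary match. Everything in it is an assumption for illustration: the table layout, the function names, and the three routes are invented, and real hardware compares the key against all entries at once instead of looping. Each entry stores a value plus a "care" mask; bits where the mask is 0 are wildcards, and the first matching entry wins.

```python
import ipaddress
from typing import List, Optional, Tuple

# (value, care_mask, action): mask bit 1 = must match, 0 = wildcard
Entry = Tuple[int, int, str]


def ip(text: str) -> int:
    """Convert dotted-quad text to a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


def tcam_lookup(table: List[Entry], key: int) -> Optional[str]:
    """Return the action of the first entry whose cared-about bits match."""
    for value, care_mask, action in table:
        if (key & care_mask) == (value & care_mask):
            return action
    return None


# Hypothetical routing table, most specific entry first.
ROUTES: List[Entry] = [
    (ip("10.1.2.0"), 0xFFFFFF00, "port 3"),   # 10.1.2.0/24
    (ip("10.0.0.0"), 0xFF000000, "port 1"),   # 10.0.0.0/8
    (ip("0.0.0.0"), 0x00000000, "default"),   # all bits wildcarded
]

if __name__ == "__main__":
    for addr in ("10.1.2.7", "10.9.9.9", "192.168.0.1"):
        print(addr, "->", tcam_lookup(ROUTES, ip(addr)))
```

      In software this is a linear scan whose cost grows with the table; the point of doing it in silicon is that the lookup takes the same time no matter how many entries there are.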

      Systems guys often mistake routers and switches for computers with a bunch of Ethernet jacks. They're far from it. They are highly specialized pieces of hardware designed from the bottom up to do one thing and do it well -- transport data. Computers are the opposite. They're designed from the bottom up to be able to do whatever you wish them to as fast as possible, but that flexibility comes with a price.

      If you ever get the urge, you should read up on Catalyst switching architecture. You'll find it quite interesting.

    8. Re:White elephant? by sconeu · · Score: 3, Informative

      Bullshit.

      I used to work at a company that did Fibre Channel.
      One of the things we had was an ASIC that did network processing in hardware, allowing us to do all sorts of interesting stuff at wire speed (2Gbps). If we had to load into memory we would have been at least an order of magnitude slower.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
  4. Fastest network card EVAR by Anonymous Coward · · Score: 4, Funny

    I was one of the lucky few who beta tested this. The plus side is you can overclock your network card to download faster than the remote server bandwidth. I did not try it, but I would be able to slashdot the slashdot.org website just by browsing it.

  5. Security updates by KiloByte · · Score: 4, Funny

    As we know damn well, shit happens all the time.

    So... how exactly are they going to ship patches in the case of a security issue?

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  6. Ethernet controllers by Anonymous Coward · · Score: 3, Interesting

    What is needed more is a high-speed bus for network interfaces, as gigabit ethernet becomes more common. Even if a gigabit adapter had a whole 32-bit PCI bus to itself, it could still easily saturate it.

    It seems like most common denominator board manufacturers have put off 64-bit PCI support for too long. It's going to bite them in the ass if it doesn't become standard very soon.

    1. Re:Ethernet controllers by afidel · · Score: 5, Insightful

      No, a gigabit adapter can't saturate a PCI bus by itself: 32-bit 33MHz PCI is 133MB/s, gigabit is 125MB/s. Then there is 32-bit 66MHz PCI, and if you want you could run a 32-bit card at 133MHz as the standard supports it (though I've never heard of such a card; if you need 133MHz you generally also need 64-bit, but I assume an ADC could use the faster speed without needing the wider word size). The fastest current implementation of the slot local bus is 16-lane PCI Express, which could handle 4 10-gigabit adapters. The problem would be coming up with enough data to keep those pipes full: no disk subsystem is fast enough, and any meaningful SQL transactions are going to be CPU-limited on even the biggest of servers, so why would you need a bus with more bandwidth than that? Add to this the fact that servers which actually need more throughput have long had the faster PCI slots, and you realize that it's not a problem in the real world.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    2. Re:Ethernet controllers by Matt_Bennett · · Score: 5, Insightful
      The critical aspect you leave out is that Gigabit ethernet is (inherently) Full Duplex. That means that a 32/33 PCI bus would be saturated at a gigabit out, but have no bandwidth for anything incoming.

      In truth, a gigabit ethernet card can saturate a 1X PCI-E link (2Gb/s after the 8B/10B encoding is removed), when sending small packets- basically due to packet overhead.
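      The back-of-envelope numbers in this subthread can be checked with a few lines. These are raw peak figures with no protocol or bus arbitration overhead, and the helper names are invented for the example. PCI is treated as one shared half-duplex bus, and first-generation PCI Express as 2.5 GT/s per lane with 8b/10b encoding.

```python
def pci_peak_mbytes(width_bits: int, clock_mhz: float) -> float:
    """Peak MB/s of a parallel PCI bus shared by both directions."""
    return width_bits / 8 * clock_mhz


def ethernet_mbytes(gbit: float, full_duplex: bool = False) -> float:
    """Raw line rate in MB/s, optionally counting both directions."""
    one_way = gbit * 1000 / 8
    return one_way * 2 if full_duplex else one_way


def pcie_gen1_gbit(lanes: int) -> float:
    """Usable Gb/s per direction after 8b/10b encoding at 2.5 GT/s per lane."""
    return lanes * 2.5 * 8 / 10


if __name__ == "__main__":
    print("32-bit/33MHz PCI :", round(pci_peak_mbytes(32, 33.33)), "MB/s")
    print("GigE one way     :", ethernet_mbytes(1), "MB/s")
    print("GigE full duplex :", ethernet_mbytes(1, full_duplex=True), "MB/s")
    print("PCIe x1          :", pcie_gen1_gbit(1), "Gb/s per direction")
    print("PCIe x16         :", pcie_gen1_gbit(16), "Gb/s per direction")
```

      So both posters have a point: one direction of gigabit fits inside 32/33 PCI on paper, but full-duplex traffic at line rate does not.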

  7. nvidia by Ecio · · Score: 5, Interesting

    Isn't Nvidia doing the same with its new nForce series motherboards? Lowering CPU usage by adding network management code and an SPI firewall inside the chipset?

    1. Re:nvidia by Glock27 · · Score: 3, Informative
      Isn't Nvidia doing the same with its new nForce series motherboards? Lowering CPU usage by adding network management code and an SPI firewall inside the chipset?

      Yes. The nForce4 chipsets offload most TCP/IP processing and firewall from the main CPU.

      If you go with an Athlon64 Socket 939 nForce4 board, you get PCI Express, lower power consumption, a ton of great features, good Linux support, and plug-compatible dual core upgrades down the road. Intel's offerings just seem anemic by comparison.

      (Personally, I'd also do an NVIDIA graphics board for the excellent Linux driver support. And no, I don't work for NVIDIA, I'm just a satisfied customer.)

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
  8. Interesting by miyako · · Score: 4, Insightful

    This seems interesting, though given Intel's track record I wonder if it will really be as useful as they are speculating, as the article has no real technical information.
    Granted, I've never administered a server that was under anywhere remotely near the types of loads we are talking about for this to be useful, but I have a hard time imagining that dealing with the TCP/IP stack would be more intensive than running applications (as the article claims).
    So, for all you people out there much more qualified to discuss this than I am, will having some part of the processor dedicated to handling TCP/IP really speed things up, or is this primarily a marketing technology?

    --
    Famous Last Words: "hmm...wikipedia says it's edible"
  9. Qlogic TOE cards by jsimon12 · · Score: 5, Informative

    Uh, this isn't new, Qlogic has been doing it for some time now, in their TOE cards (TCP Offload Engine). The cards are smoking, especially on Solaris, 'cause Sun's TCP stack is crappy.

  10. yeah great by Anonymous Coward · · Score: 5, Funny

    soon it will be a dedicated processor and RAM to deal with tcp, then a dedicated processor for the keyboard input, then a dedicated processor for the fans and a special dedicated processor on a 12" PCI-X card for the extremely computationally intensive MOUSE, actually this will have its own special dedicated path called 'AMP' or Accelerated Mouse Port. Mice of the future will need much more bandwidth than today. About 16 GB i/o so they need their own data paths.

    And then there will be other enhancements like the tcp/ip one.

    For instance a special accelerator card for Word and Internet Explorer will be developed.

    Furious Linux users will demand their own technology, so one manufacturer will come up with a special card for running GNOME apps. This card will have 4 dual core 6 GHz processors and allow GNOME to run at normal speeds.

  11. Re:A good thing by Quobobo · · Score: 5, Funny

    Newly discovered, a simple and easy karma-gaining method! Amaze your friends, and become more eligible to moderate!

    1. Refresh your browser constantly until there's a new story on Slashdot, to post before everyone else.

    2. Post something similar to "This is good/bad, for INSERT_OBVIOUS_REASON_HERE. And fuck the INSERT_RIAA-LIKE_ORGANIZATION_HERE." (second sentence is optional)

  12. Will it support IPv6? by arc.light · · Score: 4, Interesting

    The article doesn't say, and I'd hate to be "stuck" with a card that only does IPv4. Yeah, I know, hardly anyone uses IPv6 today, but the nations of China and Japan, as well as the US DoD, are starting to roll out IPv6 networks in a big way.

  13. Lots of people agree, including AC and DM by Anonymous Coward · · Score: 4, Informative

    AC being Alan Cox, DM being Dave Miller.

    Read Alan's opinion here.

    Read Dave's opinion here.

    There has been discussion of this specific Intel announcement here.

  14. Re:the good, the bad, the ugly? by pc486 · · Score: 3, Insightful

    I can't believe the parent got modded up. This kind of thing has been done before (RTFA. Yeah yeah, I know. I must be new here...). It's called TOE (TCP Offload Engine) and many networking companies have done TOE. However, most cards are expensive and don't have much support across platforms.

    What's new here is that Intel wants to put this in their chipsets everywhere and not just in $700+ NICs. Already this has been happening with checksum offloading, TCP segmentation, smart interrupts, and so on in most GigE chips.

    So yes, people have done this before and have been since at least 2000.

    As far as DRM is concerned, look at the NIC market and look at the TCP/IP spec. TCP/IP? Standard and anything non-standard won't work with stuff that's out there. Weird NICs? I've been getting Linux source-code drivers for even the cheapest of cheap NICs for years now. There's too much competition to sneak in something restrictive.

  15. So, now hackers will target your BIOS rather than by ABeowulfCluster · · Score: 3, Interesting

    targeting the OS. I can see this technology being useful on servers which have multiple network cards and heavy traffic, but not for joe average pc user.

  16. So finally! by Trogre · · Score: 5, Funny

    buying Intel really will make the internet go faster!

    --
    "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  17. Old news by obeythefist · · Score: 4, Informative

    Intel has been wanting to do this for years! I remember reading old articles on The Register about it, and how they were pulling back because Microsoft didn't like the idea of Intel taking away things that Microsoft were running with their software, including things like managing networking instead of having the OS do it.

    Of course it couldn't last, what with nVidia doing firewalls and NICs and all sorts of other things, Intel is a big company and they know when they need to compete. MS has also lost a bit of their clout when it comes to things like pressuring the bigger companies (intel, HP, Dell)

    --
    I am government man, come from the government. The government has sent me. -- G.I.R.
  18. And the CPU doesn't have other things to do? by Moderation+abuser · · Score: 3, Insightful

    My boxes all run tens to hundreds of processes for tens to hundreds of people. Offloading the processing to a networking subsystem isn't going to hurt, especially with gig and 10gig.

    Not that this is a new idea. It's been done for donkey's years.

    --
    Government of the people, by corporate executives, for corporate profits.
  19. ha! who needs it? by flacco · · Score: 4, Funny

    ...when you can get AOL internet accelerator for FREE!

    --
    pr0n - keeping monitor glass spotless since 1981.
  20. And the integrated DRM? by tjlsmith · · Score: 5, Interesting
    and how much DRM are they going to build onto the motherboard, just in passing?

    Don't think for a minute the big boys aren't trying to take the Internet away from us. They missed the opportunity once, never twice.

    --
    Mumia Abu-Jamal is *laughably guilty*. Check the evidence.
  21. This old bit of snake-oil... by Ancient_Hacker · · Score: 4, Insightful
    The nightmare continues. It goes something like this: Some drooling "computer scientist" is too dumb to do anything useful, so they speculate: "Wouldn't it be nice to free up this $XXXX CPU from this humdrum task (choose: moving bits/bytes/pixels or packets)?" He finds a brain-addled silicon-stuffer to design a chip to do just that. All rejoice at the increased efficiency.

    Except:

    • The silicon-stuffer only has access to the slow processes of maybe two silicon generations back, unlike the CPU which paid for the latest whizzy xx picofurlong process. So the supposedly whizzy chip is still not particularly faster than the CPU.
    • The whizzy chip shows up late, just about when the associated CPU is going to take a 2x speed hike.
    • The chip is on the I/O bus, requiring many slow I/O cycles, with interrupts masked, to get its commands.
    • Said whizzy bit-banger doesn't have any software support from the main operating systems.
    • The silicon-etcher guy can't write English worth a damn, so nobody can understand the spec sheet.
    • And oh, he didn't know the bus was active-low, so all the data packets have to be inverted.
    • And sometimes byte-reversed too.
    • The chip designer doesn't know or care about the whole system, so the chip does several things that spoil the overall performance, like hogging the bus, saturating the bus snoop logic, poisoning the cache, interrupting too often, etc.
    • The droolers forgot to think about the multi-processor option, so the chip doesn't share well with multiple CPUs.
    • The chip is all hard-wired gates, so there's no way to fix the problems.
    Finally some software wizard finds a way of speeding up the code that runs in the CPU so it's now faster than the separate chip, so the chip is now useless and just an extra power waster.

    We've seen successive waves of this concept, none of them have had much success. Graphics processors are one partial exception, and it took almost a decade of mis-designs of those before they became stable enough to be usable.

  22. Re:Nothing to see here by ergo98 · · Score: 4, Funny

    I'll take any speed boosts Intel wants to throw my way but I think their efforts would be better spent elsewhere.

    Craig Barrett here.

    Listen we apologize for this distraction, and apologize for not consulting with you first. I guess some of our engineers just got caught up in something silly and they went off and did this when instead they could be doing things more valuable to you.

    We immediately begin work on the porn accelerator coprocessor.

  23. Similar to what Jolitzes have been up to? by Hobart · · Score: 3, Interesting
    A while ago I looked up what the original authors of BSD-on-the-386 (386BSD) had been up to. I just searched again and found http://www.interprophet.com and http://www.telemuse.net ...
    Their new gig was putting the TCP/IP stack into the silicon for performance, the Internet Archive version says they've been at it since 1989...
    I wonder if Intel licensed their patents, or if this is similar stuff...
    --
    o/~ Join us now and share the software ...
  24. Re:side effects? by Fweeky · · Score: 3, Informative
    From FreeBSD's zero_copy(9) manpage:
    For sending data, there are no special requirements or capabilities that the sending NIC must have. The data written to the socket, though, must be at least a page in size and page aligned in order to be mapped into the kernel. If it does not meet the page size and alignment constraints, it will be copied into the kernel, as is normally the case with socket I/O.

    The user should be careful not to overwrite buffers that have been written to the socket before the data has been freed by the kernel, and the copy-on-write mapping cleared. If a buffer is overwritten before it has been given up by the kernel, the data will be copied, and no savings in CPU utilization and memory bandwidth utilization will be realized.

    It also mentions some issues with regard to zero-copy receive, which requires help from the NIC to ensure received packet payloads are also page-aligned and >= page size. Such support is predictably very rare.
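    The send-side condition from the manpage excerpt boils down to a two-part check, sketched here. The function name is hypothetical and this only restates the quoted constraint (at least a page in size, page aligned); it is not the kernel's actual code path, and the page size is taken from Python's mmap module as an assumption about the host.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # host page size as reported by Python


def meets_zero_copy_send_constraints(
    address: int, length: int, page_size: int = PAGE_SIZE
) -> bool:
    """True if a buffer is at least one page long and starts on a page boundary."""
    return length >= page_size and address % page_size == 0


if __name__ == "__main__":
    # Illustrative addresses only, assuming 4096-byte pages.
    print(meets_zero_copy_send_constraints(8192, 4096, page_size=4096))
    print(meets_zero_copy_send_constraints(8192, 100, page_size=4096))
    print(meets_zero_copy_send_constraints(8200, 8192, page_size=4096))
```

    Anything that fails the check simply takes the ordinary copying path, so an application with small or unaligned writes sees no difference either way.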