Slashdot Mirror


Intel Develops Hardware To Enhance TCP/IP Stacks

RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."

64 of 271 comments

  1. Good stuff! by kernelistic · · Score: 5, Interesting

    First checksum offloading, now this... It is nice to see that hardware vendors are realizing that 10Gbit/s+ speeds aren't currently realistic without extra forms of computation support from the underlying network interface hardware.

    This is Good News.
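    For readers wondering what "checksum offloading" takes off the CPU: TCP, UDP and IPv4 headers use the 16-bit one's-complement Internet checksum described in RFC 1071. Below is a minimal Python sketch of that computation. It is the plain textbook version, not how any particular NIC or kernel implements it (real stacks use wider, unrolled or incremental variants).

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold any carries out of the top 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF  # one's complement of the folded sum


# The example bytes used in RFC 1071's numerical examples.
sample = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
print(hex(internet_checksum(sample)))  # 0x220d
```

    Because this loop touches every byte of every segment, it is a natural first candidate for moving into the network hardware.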

    1. Re:Good stuff! by RatRagout · · Score: 5, Informative

      Yes. Checksum was one of the problems. The other problem is the memory-to-memory copying of data due to the semantics of the TCP/UDP send() call. These semantics require that the data in the memory location at the time send() is called is the data to be sent: if the application changes the data directly after the send() call, this should not affect what is sent. This means that the OS has to copy the data into kernel memory, and then at some later time copy it onto the NIC. This memory-to-memory copying becomes a severe problem as traffic and bandwidth increase.
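      These copy semantics are easy to observe from userland. The Python sketch below uses a local socketpair() as a stand-in for a real network connection (an assumption made to keep the example self-contained): the buffer is overwritten right after sendall() returns, yet the receiver still gets the original bytes, because the kernel has already copied them into its own socket buffer.

```python
import socket


def send_then_overwrite(message: bytes) -> bytes:
    """Send `message`, clobber the sender's buffer, and return what arrived."""
    sender, receiver = socket.socketpair()  # connected local stream sockets
    try:
        buf = bytearray(message)
        sender.sendall(buf)       # kernel copies buf into its socket buffer
        buf[:] = b"X" * len(buf)  # overwrite the application's copy
        received = bytearray()
        while len(received) < len(message):
            chunk = receiver.recv(len(message) - len(received))
            if not chunk:
                break
            received += chunk
        return bytes(received)
    finally:
        sender.close()
        receiver.close()


print(send_then_overwrite(b"hello"))  # b'hello'
```

      Keep the message small: nothing reads concurrently, so it has to fit in the socket buffer for sendall() to return.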

    2. Re:Good stuff! by kernelistic · · Score: 5, Informative

      There have been multiple fixes to address the inefficiencies of the original design of the BSD TCP/IP stack.

      FreeBSD, for example, has a kernel option called ZERO_COPY_SOCKETS, which dramatically increases the network throughput of syscalls such as sendfile(2). With this option enabled, as the name implies, data is no longer copied from userland into kernel space and then passed on to the network card's ring buffers. It is copied in one swoop!
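      From userland, the kernel's sendfile path is reachable in Python through socket.sendfile(), which uses os.sendfile() where the platform supports it and falls back to ordinary send() calls otherwise. This is a sketch of that API only, not of FreeBSD's ZERO_COPY_SOCKETS option itself; the socketpair and temporary file are stand-ins for a real connection and a real file.

```python
import socket
import tempfile


def send_file_over_socket(payload: bytes) -> tuple:
    """Write payload to a temp file, push it through a socket with sendfile()."""
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        sender, receiver = socket.socketpair()
        try:
            # Kernel-side file-to-socket transfer where os.sendfile() exists.
            sent = sender.sendfile(f)
            sender.shutdown(socket.SHUT_WR)  # signal end of stream to receiver
            chunks = []
            while True:
                chunk = receiver.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        finally:
            sender.close()
            receiver.close()
    return sent, b"".join(chunks)


print(send_file_over_socket(b"x" * 1000)[0])  # 1000
```

      As with any single-threaded socketpair demo, the payload must be small enough to fit in the socket buffer, since nothing reads until sendfile() returns.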

    3. Re:Good stuff! by RatRagout · · Score: 2, Interesting

      For sending files I'm sure this has increased performance greatly, since without it you might first have to read the file into userland, copy it into the kernel and then onto the NIC. Reading directly from disc to a TOE would of course be the real overhead killer. Zero-copy techniques are also used in newer APIs like uDAPL for RDMA operations (over InfiniBand or similar).

    4. Re:Good stuff! by acaspis · · Score: 2, Interesting

      > If the application changes the data directly after the send()-call this should not affect what is sent.

      So just don't let the application change the data (hint: single-assignment programming languages).

      > This means that the OS has to copy the data into kernel memory,

      Either that, or you could improve support for copy-on-write in the MMU (which might benefit other tasks than just networking).

      Sometimes changing the assumptions is the proper way to solve the problem.
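      Copy-on-write is already exposed to applications in some places. For instance, Python's mmap with access=mmap.ACCESS_COPY creates a private mapping: the process reads the shared pages directly, and the MMU only copies a page when the process writes to it, so the file underneath never changes. This sketch only demonstrates the MMU mechanism on a file; it is not a COW-based send() path, which would need kernel support of the kind described above.

```python
import mmap
import tempfile


def copy_on_write_demo() -> tuple:
    """Return (what the private mapping sees, what the file still contains)."""
    with tempfile.TemporaryFile() as f:
        f.write(b"original")
        f.flush()
        # ACCESS_COPY: writes go to private copies of the pages, not the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
            m[0:8] = b"modified"
            private_view = bytes(m[0:8])
        f.seek(0)
        on_disk = f.read()
    return private_view, on_disk


print(copy_on_write_demo())  # (b'modified', b'original')
```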

    5. Re:Good stuff! by maxwell+demon · · Score: 2, Funny
      Doing so may ofcurse have other affects though.

      Of curse?

      d
      What do you want to drop? [a?*]
      ?
      a - a cursed -1 tcp/ip connection
      a
      Sorry, you can't drop the tcp/ip connection, it seems to be cursed.

      Hmmm ... where's my scroll of remove curse?
      --
      The Tao of math: The numbers you can count are not the real numbers.
  2. finally... by N5 · · Score: 5, Funny

    intel is working on something worthwhile: a cure for the common slashdot-ing

    and they say the drug companies are miracle workers ;)

    --
    John 3:16 - The easiest way to a BETTER YOU.
  3. White elephant? by Toby+The+Economist · · Score: 5, Interesting

    I think in Tanenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that the work is offloaded to is always less powerful than the main CPU, and the main CPU is normally blocked in its task until the network processing has completed.

    --
    Toby

    1. Re:White elephant? by Uhlek · · Score: 2, Informative

      That all depends on how it's done. Simply offloading the processing won't work, but replacing the TCP/IP drivers with simple hooks into a hardware-based I/O system can.

    2. Re:White elephant? by Toby+The+Economist · · Score: 5, Informative

      You are implying that the hardware implementation will be faster than the main CPU, which it almost certainly won't be, because if you've just spent 300 USD on your P4 CPU, why would you spend the same amount again, or more, just on your network subsystem?

      Also remember that a well-implemented TCP/IP stack runs at about 90% of the speed of a memcpy() (Tanenbaum's book again).

      For hardware TCP/IP processing to be useful, it needs to be, say, 2x the speed of the CPU's memcpy() function!

      Given that the main performance bottleneck is memory access, since you're basically copying buffers around and so caching isn't going to help you, I don't see how any sort of super-duper hardware is going to give you anything like a 2x speed up, let alone at an economic price.

      --
      Toby

    3. Re:White elephant? by Toby+The+Economist · · Score: 2, Informative

      Any given thread which needs network I/O cannot continue until that I/O is complete. The fact the CPU can switch elsewhere makes no difference to the thread which requires the network packet to be processed before it has the information it requires to continue, and if that processing is offloaded to a slower network processor, the performance of that thread is degraded.

      --
      Toby
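      One standard way to put numbers on this argument (a swapped-in model, not something from Tanenbaum's book as quoted here) is Amdahl's law: if a fraction f of a thread's work moves to a unit that runs it s times as fast, the thread's overall speedup is 1 / ((1 - f) + f / s). With s below 1, i.e. a slower offload processor, the "speedup" drops under 1, which is the degradation described above. The host CPU time freed for other threads is a separate quantity, (1 - f) of the original, and is the benefit the other side of this thread points to.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when fraction f of the work runs s times faster."""
    if not 0.0 <= f <= 1.0 or s <= 0:
        raise ValueError("need 0 <= f <= 1 and s > 0")
    return 1.0 / ((1.0 - f) + f / s)


def host_cpu_share(f: float) -> float:
    """Fraction of the original host CPU time still spent on the host."""
    return 1.0 - f


# Offloading 30% of the work (an illustrative number, not a measurement)
# to an engine half as fast, equally fast, and twice as fast as the host.
for s in (0.5, 1.0, 2.0):
    print(s, round(amdahl_speedup(0.3, s), 3), round(host_cpu_share(0.3), 3))
```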

    4. Re:White elephant? by mr_zorg · · Score: 4, Interesting
      I think in Tanenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that the work is offloaded to is always less powerful than the main CPU, and the main CPU is normally blocked in its task until the network processing has completed.

      I think in xyz's book there's a reference which states that offloading graphics processing normally isn't useful, because the CPU that the work is offloaded to is always less powerful than the main CPU, and the main CPU is normally blocked in its task until the graphics processing has completed.

      See how silly that sounds when you substitute network with graphics? We all know that offloading graphics processing is a good thing. Why? Because it's optimized for the task. Why couldn't the same be done for networking?

    5. Re:White elephant? by Joseph_Daniel_Zukige · · Score: 3, Interesting
      See how silly that sounds when you substitute network with graphics?

      Well, does waiting 3 milliseconds at 3 GHz outrun waiting 3 milliseconds at 300 MHz?

      The only advantage I can see to this is that it's often nice to have I/O handled in a separate process/thread running on a separate processor. But, as many have already noted, unless the I/O processor is tuned for this you've either got another expensive processor or you're running the I/O thread on a slower processor.

      If the processor _is_ tuned for this purpose, it's already been done. Most Ethernet i/f cards have a fair amount of intelligence on them already, and complete stacks have been available on cards for about as long as I've been aware of ethernet. (twenty years?)

    6. Re:White elephant? by Jeff+DeMaagd · · Score: 5, Interesting

      Graphics and networking are two very different things. Networking isn't compute intensive, it is I/O intensive. I don't think the Intel hardware network offload is for much more than basic computation.

      Besides, GPUs are more powerful than CPUs at the task of rendering polygons.

      Very often ASICs are better at a task than general purpose CPUs, just that considerations must be made as to whether the performance gain is worth the cost difference.

    7. Re:White elephant? by Trogre · · Score: 2, Insightful

      Try telling that to Amiga fans in 1989-1992.

      Those little boxes were masters at multi-processing, and they did it right - one processor for pretty much every major peripheral task (disk, graphics, sound, something else I can't remember).

      As long as these Intel coprocessors are going to be an open standard (which they almost certainly won't be), I'd welcome this addition to the PC architecture.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    8. Re:White elephant? by Toby+The+Economist · · Score: 4, Informative

      You can accelerate graphics to a very large degree because the problem is very subject to parallelism.

      You cannot accelerate networking very much because the problem is highly serial.

      It is improper to compare the two because they are fundamentally different problems.

      You can throw tons of hardware at 3D graphics and get good results, because just by having more and more pipelines, you go faster and faster.

      Processing a network packet is quite different; the data goes through a series of serial steps and eventually reaches the application layer. The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which remember has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin.

      --
      Toby

    9. Re:White elephant? by Uhlek · · Score: 5, Informative

      Hardware implementation will most definitely be leaps and bounds faster than the general CPU. Can a Linux router route 720Gbps of traffic through hundreds of interfaces at once? No. But a Cisco 6500 can, because of hardware designed especially for the task.

      Simply put, software on general-purpose processors sucks for doing heavy computational work. Hardware tuned especially for a task has been, and always will be, where it's at. However, the costs involved in creating ICs specific to a task usually mean that ASICs are only created where there is a need. Modern graphics cards are a great example. The on-board graphics processors are designed especially to create graphics, something that, if offloaded onto the general-purpose CPU, would crush even the highest of the high end.

      Also, offloading the TCP/IP stack on a normal workstation probably isn't going to be a huge performance boost. Where this will be useful is in situations where there is a need for high-throughput, low-latency network I/O processing.

    10. Re:White elephant? by Uhlek · · Score: 4, Insightful

      Comparing the two is completely valid when you're discussing the benefits of task-customized hardware versus general-purpose computing. Are there limits to where a hardware-based TCP/IP stack will be useful in the desktop/server market? Yes, of course there are. But for high-bandwidth applications, I can assure you that offloading the TCP/IP overhead onto an ASIC will not only give you better performance, but also free up primary processor time for other applications.

      Also, Catalyst switches are not highly parallel. They can be parallel, depending on the exact model and configuration, as well as the exact path inside the switch that the traffic takes, but it's not even remotely the same in execution as having "hundreds of linux routers side by side."

      Instead, it is the exacting way in which the various components of the switch pass data, and the very specific purpose of each chip and circuit in the device, that give modern routers their speed. Add to that special components such as content-addressable memory (CAM) and ternary content-addressable memory (TCAM: memory that can store 0s, 1s, and wildcard values instead of just 0s and 1s, allowing wire-speed match comparisons against ACLs and routing tables). It isn't merely a stack of GP CPUs all running in parallel to achieve a particular task.
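      To make the TCAM idea concrete, here is a software model of a ternary lookup: each entry stores a value and a mask, mask bits set to 0 are "don't care", and the first matching entry wins. Real TCAM hardware compares the key against every entry in the same cycle; this Python sketch (all names hypothetical) just loops, which is exactly the work the hardware avoids. Ordering entries from longest to shortest prefix turns it into longest-prefix-match routing.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class TcamEntry(NamedTuple):
    value: int   # bits to compare against
    mask: int    # 1 = must match, 0 = "don't care" (the third state)
    result: str  # e.g. an egress port or an ACL action


def entry_for_prefix(cidr: str, result: str) -> TcamEntry:
    net = ipaddress.IPv4Network(cidr)
    return TcamEntry(int(net.network_address), int(net.netmask), result)


def tcam_lookup(table: List[TcamEntry], key: int) -> Optional[str]:
    # Hardware checks all entries at once and a priority encoder picks the
    # lowest-index hit; a sequential scan gives the same answer, just slower.
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.result
    return None


# Longest prefixes first, so the first hit is the longest-prefix match.
table = [
    entry_for_prefix("10.1.0.0/16", "port 2"),
    entry_for_prefix("10.0.0.0/8", "port 1"),
    entry_for_prefix("0.0.0.0/0", "default"),
]
print(tcam_lookup(table, int(ipaddress.IPv4Address("10.1.2.3"))))  # port 2
```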

      Systems guys often mistake routers and switches for computers with a bunch of Ethernet jacks. They're far from it. They are highly specialized pieces of hardware designed from the bottom up to do one thing and do it well -- transport data. Computers are the opposite. They're designed from the bottom up to be able to do whatever you wish them to as fast as possible, but that flexibility comes with a price.

      If you ever get the urge, you should read up on Catalyst switching architecture. You'll find it quite interesting.

    11. Re:White elephant? by sconeu · · Score: 3, Informative

      Bullshit.

      I used to work at a company that did Fibre Channel.
      One of the things we had was an ASIC that did network processing in hardware, allowing us to do all sorts of interesting stuff at wire speed (2Gbps). If we had to load into memory we would have been at least an order of magnitude slower.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    12. Re:White elephant? by AaronW · · Score: 2, Interesting

      As someone who writes firmware that runs on a network processor, I can attest that the networking hardware is usually very different. For example, although the network processor runs at a lowly 133MHz it is able to forward 1.5 million packets/second when performing basic routing. I don't think there's any way a traditional processor could keep up. For one, the network processor has a massive amount of memory bandwidth and low latency, using SRAM for the tables. Other network processors use content addressable memory. To give an idea of memory bandwidth, one of the network processors I'm looking closely at has 11 separate memory interfaces to the chip, a mix of SRAM and DRAM.
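      Some back-of-the-envelope arithmetic shows why this is hard for a general-purpose CPU. A minimum-size Ethernet frame is 64 bytes, plus 8 bytes of preamble and a 12-byte inter-frame gap on the wire, so a link tops out at rate / (84 x 8) frames per second; dividing a processor's clock by the packet rate gives its cycle budget per packet. The sketch below only does that division and ignores memory stalls, pipelining and multiple cores.

```python
WIRE_BYTES_MIN_FRAME = 64 + 8 + 12  # min frame + preamble/SFD + inter-frame gap


def max_packets_per_second(link_bits_per_s: float) -> float:
    """Line-rate packet rate for minimum-size Ethernet frames."""
    return link_bits_per_s / (WIRE_BYTES_MIN_FRAME * 8)


def cycles_per_packet(clock_hz: float, packets_per_s: float) -> float:
    """Clock cycles available per packet at a given packet rate."""
    return clock_hz / packets_per_s


print(round(max_packets_per_second(1e9)))          # 1488095 (gigabit)
print(round(max_packets_per_second(10e9)))         # 14880952 (10 Gbit/s)
print(round(cycles_per_packet(133e6, 1.5e6), 1))   # 88.7 (133 MHz at 1.5 Mpps)
print(round(cycles_per_packet(3e9, max_packets_per_second(10e9)), 1))  # 201.6
```

      For scale, 1.5 million packets per second is roughly gigabit line rate for minimum-size frames.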

      Most of the work in the router is pattern matching, with a lot of lookups of various parts of the packet. The route lookup is actually one of the least expensive operations compared to all of the other operations that need to take place. Most of the high-end network processors have dedicated pattern matching hardware to speed up these operations and do not rely on caches which break when a lot of flows are active.

      One operation we found to be very painful in a traditional processor is trying to shape traffic, i.e. using the Linux shaping options. Dedicated hardware, on the other hand, has no problems with this.
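      Shaping is usually built on some form of token bucket (Linux, for example, ships a Token Bucket Filter qdisc, tbf). A minimal single-rate version is sketched below with hypothetical names; timestamps are passed in explicitly so it stays deterministic. The per-packet bookkeeping is small, but it has to run for every packet of every shaped flow, which hints at why doing it in software at high packet rates can hurt.

```python
class TokenBucket:
    """Single-rate token bucket: refills `rate` bytes/s, holds up to `burst`."""

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # start full, allowing an initial burst
        self.last = 0.0      # time of the previous update, in seconds

    def allow(self, now: float, size: int) -> bool:
        """Return True if a packet of `size` bytes may be sent at time `now`."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # caller would queue or drop the packet


bucket = TokenBucket(rate=1000.0, burst=1500.0)
print(bucket.allow(0.0, 1500))  # True: uses the whole initial burst
print(bucket.allow(0.0, 1))     # False: bucket is empty
print(bucket.allow(0.5, 500))   # True: 0.5 s refilled 500 bytes
```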

      Now, I doubt many of these features would help much in a workstation or server. The only thing I can think of that would help significantly is some of the security operations, like being able to offload encryption or do ACL lookups in hardware, much like NVIDIA does in their new nForce chipset. ACLs tend to be very expensive in terms of CPU cycles.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    13. Re:White elephant? by aminorex · · Score: 2, Insightful

      The IO processor can be made to do the task much faster than the CPU, because it is not a general-purpose chip. It implements in hardware what the CPU would implement in software. As a result, it costs much less to produce. These are the same considerations that apply to graphics pipelines. It would be grossly economically infeasible to implement the functions of a high-end GPU on the CPU, in part because it's on the wrong end of a bus.

      --
      -I like my women like I like my tea: green-
  4. Fastest network card EVAR by Anonymous Coward · · Score: 4, Funny

    I was one of the lucky few who beta tested this. The plus side is you can overclock your network card to download faster than the remote server bandwidth. I did not try it, but I would be able to slashdot the slashdot.org website just by browsing it.

  5. Security updates by KiloByte · · Score: 4, Funny

    As we all damn well know, shit happens all the time.

    So... how exactly are they going to ship patches in the case of a security issue?

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    1. Re:Security updates by TheRagingTowel · · Score: 2, Informative

      Flash memory. It's done all the time.

      --
      4Z5TX
  6. Ethernet controllers by Anonymous Coward · · Score: 3, Interesting

    What is needed more is a high-speed bus for network interfaces, as gigabit ethernet becomes more common. Even if a gigabit adapter had a whole 32-bit PCI bus to itself, it could still easily saturate it.

    It seems like lowest-common-denominator board manufacturers have put off 64-bit PCI support for too long. It's going to bite them in the ass if it doesn't become standard very soon.

    1. Re:Ethernet controllers by afidel · · Score: 5, Insightful

      No, a gigabit adapter can't saturate a PCI bus by itself: 32-bit 33MHz PCI is 133MB/s, gigabit is 100MB/s. Then there is 32-bit 66MHz PCI, and if you want you could run a 32-bit card at 133MHz, as the standard supports it (though I've never heard of such a card; if you need 133MHz you generally also need 64-bit, but I assume an ADC could use the faster speed without needing the wider word size). The fastest current implementation of the slot local bus is 16-lane PCI Express, which could handle four 10-gigabit adapters. The problem would be coming up with enough data to keep those pipes full: no disk subsystem is fast enough, and any meaningful SQL transactions are going to be CPU-limited on even the biggest of servers, so why would you need a bus with more bandwidth than that? Add to this the fact that servers which actually need more throughput have long had the faster PCI slots, and you realize that it's not a problem in the real world.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    2. Re:Ethernet controllers by Anonymous Coward · · Score: 2, Informative

      You got the PCI bandwidth correct, but your gigabit bandwidth is a hair off. Depending on how you define "giga" (base 10 or base 2), you get the following numbers:

      a) Gigabit/sec = 1000 Mbit/sec = 125MByte/sec
      b) Gigabit/sec = 1024 Mbit/sec = 128MByte/sec

      True, even these speeds don't completely saturate the PCI bus, though because of how the PCI bus is shared (each device gets a few clock cycles to do its thing before passing control off to the next device), no single device could anyway unless it's the ONLY thing on the PCI bus. It certainly will saturate it (or come dang close) when it has its moment of control, though.

    3. Re:Ethernet controllers by jpc · · Score: 2, Informative


      gigabit is full duplex - double your figures.

      But new motherboards are already starting to come with gigabit attached to PCI Express. For the last few years any decent board has had them on fast PCI-X, at least 64 bit 66 MHz.

    4. Re:Ethernet controllers by Matt_Bennett · · Score: 5, Insightful
      The critical aspect you leave out is that Gigabit Ethernet is (inherently) full duplex. That means that a 32/33 PCI bus would be saturated at a gigabit out, but have no bandwidth left for anything incoming.

      In truth, a gigabit ethernet card can saturate a 1X PCI-E link (2Gb/s after the 8B/10B encoding is removed), when sending small packets- basically due to packet overhead.
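      The numbers in this sub-thread are simple to check. A shared parallel bus moves width/8 bytes per clock, gigabit Ethernet carries 125 MB/s in each direction, and a first-generation PCI Express lane signals at 2.5 GT/s with 8b/10b encoding, leaving 2 Gbit/s (250 MB/s) of payload per direction. The sketch below uses theoretical peak figures only and ignores protocol overhead and bus arbitration, both of which make real throughput lower.

```python
def parallel_bus_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a shared parallel bus such as PCI, in MB/s."""
    return width_bits / 8 * clock_mhz


def serial_lane_mb_s(gigatransfers: float, payload_bits: int, coded_bits: int) -> float:
    """Per-direction payload bandwidth of one serial lane, in MB/s."""
    return gigatransfers * 1000 * payload_bits / coded_bits / 8


pci_32_33 = parallel_bus_mb_s(32, 33.33)  # ~133 MB/s, shared by all devices
pci_32_66 = parallel_bus_mb_s(32, 66.66)  # ~267 MB/s
gige_one_way = 1000 / 8                   # 125 MB/s per direction
gige_duplex = 2 * gige_one_way            # 250 MB/s with both directions busy
pcie1_x1 = serial_lane_mb_s(2.5, 8, 10)   # 250 MB/s per direction

print(f"PCI 32/33: {pci_32_33:.1f} MB/s, GigE full duplex: {gige_duplex:.0f} MB/s")
print(f"PCIe 1.x x1 lane: {pcie1_x1:.0f} MB/s each way")
```

      With both directions busy, a full-duplex gigabit port needs more than a 32-bit/33 MHz PCI bus can deliver even in theory.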

  7. nvidia by Ecio · · Score: 5, Interesting

    Isn't NVIDIA doing the same with its new nForce series motherboards, lowering CPU usage by adding network management code and an SPI firewall inside the chipset?

    1. Re:nvidia by MatthewNewberg · · Score: 2, Interesting

      I've used both the NVIDIA and the 3Com, and switched back and forth so many times (I had both onboard until the board fried). It doesn't seem to affect anything at all (including CPU usage). Then again, I wasn't pushing more than 10Mbit/s across the network or using a lot of connections.

    2. Re:nvidia by Glock27 · · Score: 3, Informative
      Isn't NVIDIA doing the same with its new nForce series motherboards, lowering CPU usage by adding network management code and an SPI firewall inside the chipset?

      Yes. The nForce4 chipsets offload most TCP/IP processing and firewall from the main CPU.

      If you go with an Athlon 64 Socket 939 nForce4 board, you get PCI Express, lower power consumption, a ton of great features, good Linux support, and plug-compatible dual-core upgrades down the road. Intel's offerings just seem anemic by comparison.

      (Personally, I'd also do an NVIDIA graphics board for the excellent Linux driver support. And no, I don't work for NVIDIA, I'm just a satisfied customer.)

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
  8. Interesting by miyako · · Score: 4, Insightful

    This seems interesting, though given Intel's track record I wonder if it will really be as useful as they are speculating, as the article has no real technical information.
    Granted, I've never administered a server that was under anywhere remotely near the types of loads we are talking about for this to be useful, but I have a hard time imagining that dealing with the TCP/IP stack would be more intensive than running applications (as the article claims).
    So, for all you people out there much more qualified to discuss this than I am: will having some part of the processor dedicated to handling TCP/IP really speed things up, or is this primarily a marketing technology?

    --
    Famous Last Words: "hmm...wikipedia says it's edible"
    1. Re:Interesting by AutumnLeaf · · Score: 2, Insightful

      I've seen extremely beefy NFS file servers go into a crash-reboot-crash cycle after the first crash, because all of the hosts trying to remount the filesystem completely crush the machine before it is fully up to speed. We've had to unplug the network cables on the server to prevent the mount storm from killing the server again.

      Note, this is enterprise-grade hardware hooked up to million-dollar disk arrays.

      Now, is that entirely from dealing with the networking stack? No. Not quite. However, consider this. It takes time to checksum headers and data. It takes time to unwrap packets. If you have a ton of clients raining requests for data on your server, it's not hard to see that dealing with the networking bookkeeping could impact the throughput of requests. Database servers and web servers are two things that come to mind here, in addition to file servers.

      Btw, note that this is another part of the "platform" initiative/orientation. While Intel's track record has not been great in many respects, they do have a good track record of success with "platforms"; e.g., Centrino was a "platform."

  9. Qlogic TOE cards by jsimon12 · · Score: 5, Informative

    Uh, this isn't new, QLogic has been doing it for some time now in their TOE cards (TCP Offload Engine). The cards are smoking, especially on Solaris, because Sun's TCP stack is crappy.

    1. Re:Qlogic TOE cards by incubuz1980 · · Score: 2, Informative

      The Solaris TCP/IP stack has been greatly improved in Solaris 10. There really is a BIG difference compared to older versions of Solaris.

  10. yeah great by Anonymous Coward · · Score: 5, Funny

    soon it will be a dedicated processor and RAM to deal with tcp, then a dedicated processor for the keyboard input, then a dedicated processor for the fans and a special dedicated processor on a 12" PCI-X card for the extremely computationally intensive MOUSE. Actually, this will have its own special dedicated path called 'AMP', or Accelerated Mouse Port. Mice of the future will need much more bandwidth than today. About 16 GB of i/o, so they need their own data paths.

    And then there will be other enhancements like the tcp/ip one.

    For instance a special accelerator card for Word and Internet Explorer will be developed.

    Furious Linux users will demand their own technology, so one manufacturer will come up with a special card for running GNOME apps. This card will have four dual-core 6 GHz processors and allow GNOME to run at normal speeds.

    1. Re:yeah great by ceeam · · Score: 2, Funny

      But then - imagine that - a single Z80 would suffice to act as a _C_PU commanding all those!

    2. Re:yeah great by yem · · Score: 2, Insightful

      I didn't know whether to mod you interesting or funny :-)

      Parallelism is great. Look at the way things are going. Dual-CPU motherboards, dual-core CPUs, Cell...

      And gnome.. sheesh.. back when I ran a P100 and Gnome was slow, I thought "well, one day I'll have a 500MHz monster and Gnome will be fast". Here I am with a P4 2.6GHz/1GB and Gnome is STILL a dog. *sigh*

      --
      No, I did not read the f***ing article!
  11. Re:A good thing by Quobobo · · Score: 5, Funny

    Newly discovered, a simple and easy karma-gaining method! Amaze your friends, and become more eligible to moderate!

    1. Refresh your browser constantly until there's a new story on Slashdot, to post before everyone else.

    2. Post something similar to "This is good/bad, for INSERT_OBVIOUS_REASON_HERE. And fuck the INSERT_RIAA-LIKE_ORGANIZATION_HERE." (second sentence is optional)

  12. Will it support IPv6? by arc.light · · Score: 4, Interesting

    The article doesn't say, and I'd hate to be "stuck" with a card that only does IPv4. Yeah, I know, hardly anyone uses IPv6 today, but the nations of China and Japan, as well as the US DoD, are starting to roll out IPv6 networks in a big way.

  13. Re:the good, the bad, the ugly? by DietCoke · · Score: 2, Interesting

    The problem is that you're still dealing with a bottleneck at the system bus, AFAIK. I installed a CAT-6 network at home today and had to do quite a bit of reading to determine whether it was worth doing. I read in numerous places that with a gigabit network you essentially need a 1GHz processor just to keep up with the incoming data. Now, placing that processor on the NIC might make sense, but it seems to me that it would still have to be at least equal to the main processor to handle the data in a steady stream.

    I can't claim to be an expert in this subject, but that's the situation as I've understood it.

  14. Lots of people agree, including AC and DM by Anonymous Coward · · Score: 4, Informative

    AC being Alan Cox, DM being Dave Miller.

    Read Alan's opinion here.

    Read Dave's opinion here.

    There has been discussion of this specific Intel announcement here.

  15. Re:the good, the bad, the ugly? by pc486 · · Score: 3, Insightful

    I can't believe the parent got modded up. This kind of thing has been done before (RTFA. Yeah yeah, I know. I must be new here...). It's called TOE (TCP Offload Engine) and many networking companies have done TOE. However, most cards are expensive and don't have much support across platforms.

    What's new here is that Intel wants to put this in their chipsets everywhere and not just in $700+ NICs. Already this has been happening with checksum offloading, TCP segmentation offload, smart interrupts, and so on in most GigE chips.
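    Segmentation offload is one of the easier pieces to picture: the stack hands the NIC one large buffer and the hardware cuts it into MSS-sized TCP segments, advancing the sequence number by the bytes in each. Below is a toy Python model of just that slicing step. The function name is hypothetical, and it skips the header copying, checksums and flag handling that real hardware also does.

```python
from typing import List, Tuple


def segment(payload: bytes, mss: int, initial_seq: int) -> List[Tuple[int, bytes]]:
    """Split payload into (sequence number, data) pairs of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        seq = (initial_seq + offset) % 2**32  # TCP sequence numbers wrap at 2^32
        segments.append((seq, payload[offset:offset + mss]))
    return segments


segs = segment(b"a" * 3000, mss=1460, initial_seq=1000)
print([(seq, len(data)) for seq, data in segs])
# [(1000, 1460), (2460, 1460), (3920, 80)]
```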

    So yes, people have done this before and have been since at least 2000.

    As far as DRM is concerned, look at the NIC market and look at the TCP/IP spec. TCP/IP? It's a standard, and anything non-standard won't work with the stuff that's out there. Weird NICs? I've been getting Linux source-code drivers for even the cheapest of cheap NICs for years now. There's too much competition to sneak in something restrictive.

  16. So, now hackers will target your BIOS rather than by ABeowulfCluster · · Score: 3, Interesting

    targeting the OS. I can see this technology being useful on servers which have multiple network cards and heavy traffic, but not for the Joe Average PC user.

  17. So finally! by Trogre · · Score: 5, Funny

    buying Intel really will make the internet go faster!

    --
    "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  18. Old news by obeythefist · · Score: 4, Informative

    Intel has been wanting to do this for years! I remember reading old articles on The Register about it, and how they were pulling back because Microsoft didn't like the idea of Intel taking away things that Microsoft were running with their software, including things like managing networking instead of having the OS do it.

    Of course it couldn't last, what with nVidia doing firewalls and NICs and all sorts of other things; Intel is a big company and they know when they need to compete. MS has also lost a bit of their clout when it comes to things like pressuring the bigger companies (Intel, HP, Dell).

    --
    I am government man, come from the government. The government has sent me. -- G.I.R.
  19. And the CPU doesn't have other things to do? by Moderation+abuser · · Score: 3, Insightful

    My boxes all run tens to hundreds of processes for tens to hundreds of people. Offloading the processing to a networking subsystem isn't going to hurt, especially with gig and 10gig.

    Not that this is a new idea. It's been done for donkey's years.

    --
    Government of the people, by corporate executives, for corporate profits.
  20. Re:cpu? e-net controller? by mabinogi · · Score: 2, Funny

    didn't you know?

    The secret to faster downloads is to keep wiggling the mouse, that way it pushes the data through faster.

    --
    Advanced users are users too!
  21. if i were to make wildly unsubstantiated guesses... by evilmousse · · Score: 2, Interesting


    i'd guess the tcp/ip stack implementations available to intel are pretty solid. still, i'd hope it'd be flashable just in case. i can imagine only once in a blue moon would you find someone with libpcap and the patience to find holes in some of the most trusted code in the net.

  22. Re:the good, the bad, the ugly? by igb · · Score: 2, Interesting
    It's been done many times before. A company called CMC made a 3U VME board which provided full TCP offload to System V machines --- I ported it into an SVR3 system and ported Lachman's NFS product to run over it. Sun shipped an Omniserve (or somesuch name) product as the NC400 and NC600 for the 4/4X0 and 4/6X0 range which offloaded quite a lot of NFS and XDR protocol overhead, as well as some of TCP. Neither of these products was unique.

    Less generically, the original Auspex NFS servers had distinct boards for Ethernet, Network and File processing, which managed to do TCP offload _and_ zero copy.

    With the exception of the Auspex example, most of these cards were rapidly obsolete because the overhead of copying the network traffic to and from the offload card is greater than the work involved in doing the processing. You can't do a zero-copy without a huge amount of scaffolding in the OS.

    Anyway, 3Com had a card which did this a couple of years ago. It sank without trace.

    ian

  23. ha! who needs it? by flacco · · Score: 4, Funny

    ...when you can get AOL internet accelerator for FREE!

    --
    pr0n - keeping monitor glass spotless since 1981.
  24. And the integrated DRM? by tjlsmith · · Score: 5, Interesting
    and how much DRM are they going to build onto the motherboard, just in passing?

    Don't think for a minute the big boys aren't trying to take the Internet away from us. They missed the opportunity once; they won't miss it twice.

    --
    Mumia Abu-Jamal is *laughably guilty*. Check the evidence.
  25. DoS Attacks by Gary+Destruction · · Score: 2, Interesting

    Will this technology make it easier for systems to withstand DoS Attacks?

  26. Ha, old news! FPS's have had this for ages. by quarrel · · Score: 2, Funny

    This is ridiculous.

    We've had this for years in FPS's. It used to be that I had to practice for ages just to compete with the young kids at FPS's. Then along came some great 'acceleration' technology, and it's been so much easier. I call mine a bot.

    Ever since it hasn't been about upgrading my CPU or graphics cards to get that head-shot. I've been offloading all that work!

  27. Re:White elephant - flawed logic by morzel · · Score: 2, Insightful
    Using the same logic, machines with two (or more) CPUs wouldn't be useful, since the second CPU is not going to be any faster than the first one.

    With all due respect to Mr. Tannenbaum, if he stated what you put in your post, his logic is severely flawed.

    Let's compare the general CPU/networking CPU combination with a manager and a secretary.
    The manager has a number of tasks which need to be done, including scheduling a number of appointments. Without a secretary, he'll be obliged to call the people involved, wait for their responses and note the scheduled appointments in his calendar. Once that is done, he can go about his other tasks.
    When the manager has a secretary, he can just tell the secretary to make the appointments and notify him when they're done. The secretary isn't going to make those appointments any faster (she still has to call the same people), but in the meantime the manager can start working on something more useful (in theory).

    While the secretary may not be much faster at scheduling appointments (she probably is, since she knows how to handle this and whom to contact, and works more quickly and in a more structured way than the manager), the end result is that the manager gets more work done because he delegated some of it to the secretary.
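    The delegation in the analogy can be sketched in Python as below. This is only an illustration under loose assumptions: a worker thread stands in for the networking processor, time.sleep() stands in for waiting on replies, and the function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def schedule_appointments(people):
    """The 'secretary': a slow task that mostly waits (simulated with sleep)."""
    confirmed = []
    for person in people:
        time.sleep(0.01)  # stand-in for waiting on a phone call or network reply
        confirmed.append(person)
    return confirmed


def other_work(n):
    """The 'manager': useful computation done while the slow task is in flight."""
    return sum(i * i for i in range(n))


with ThreadPoolExecutor(max_workers=1) as secretary:
    pending = secretary.submit(schedule_appointments, ["alice", "bob", "carol"])
    result = other_work(100_000)     # the manager keeps working instead of blocking
    appointments = pending.result()  # and collects the outcome only when it needs it
```

    The secretary is no faster per call; the gain is that the waiting overlaps with the manager's other work.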

    Note for the Politically Correct: feel free to swap he/she where appropriate.

    --
    Okay... I'll do the stupid things first, then you shy people follow.
    [Zappa]
  28. This old bit of snake-oil... by Ancient_Hacker · · Score: 4, Insightful
    The nightmare continues. It goes something like this: some drooling "computer scientist" is too dumb to do anything useful, so he speculates: "Wouldn't it be nice to free up this $XXXX CPU from this humdrum task (choose: moving bits/bytes/pixels/packets)?" He finds a brain-addled silicon-stuffer to design a chip to do just that. All rejoice at the increased efficiency.

    Except:

    • The silicon-stuffer only has access to the slow processes of maybe two silicon generations back, unlike the CPU, which paid for the latest whizzy xx picofurlong process. So the supposedly whizzy chip is still not particularly faster than the CPU.
    • The whizzy chip shows up late, just about when the associated CPU is going to take a 2x speed hike.
    • The chip is on the I/O bus, requiring many slow I/O cycles, with interrupts masked, to get its commands.
    • Said whizzy bit-banger doesn't have any software support from the main operating systems.
    • The silicon-etcher guy can't write English worth a damn, so nobody can understand the spec sheet.
    • And oh, he didn't know the bus was active-low, so all the data packets have to be inverted.
    • And sometimes byte-reversed too.
    • The chip designer doesn't know or care about the whole system, so the chip does several things that spoil the overall performance, like hogging the bus, saturating the bus snoop logic, poisoning the cache, interrupting too often, etc.
    • The droolers forgot to think about the multiprocessor option, so the chip doesn't share well with multiple CPUs.
    • The chip is all hard-wired gates, so there's no way to fix the problems.
    Finally, some software wizard finds a way of speeding up the code that runs on the CPU so that it's faster than the separate chip, and the chip becomes useless, just an extra power waster.

    We've seen successive waves of this concept, and none of them has had much success. Graphics processors are one partial exception, and it took almost a decade of misdesigns before they became stable enough to be usable.

  29. Re:A good thing by orasio · · Score: 2, Funny

    3. Don't be funny. Funny doesn't give you karma.

  30. Re:Nothing to see here by ergo98 · · Score: 4, Funny

    I'll take any speed boosts Intel wants to throw my way but I think their efforts would be better spent elsewhere.

    Craig Barrett here.

    Listen, we apologize for this distraction, and apologize for not consulting with you first. I guess some of our engineers just got caught up in something silly and went off and did this when they could have been doing things more valuable to you.

    We will immediately begin work on the porn accelerator coprocessor.

  31. You speak in jest, but... by leonbrooks · · Score: 2, Insightful

    ...the original IBM PC put a processor in the keyboard and another (dumb) processor on the motherboard to talk to it.

    This USB keyboard I'm typing on involves at least three processors: one to scan the keys, one to handle USB on the peripheral side, and a third to handle USB on the motherboard side.

    --
    Got time? Spend some of it coding or testing
  32. Similar to what Jolitzes have been up to? by Hobart · · Score: 3, Interesting
    A while ago I looked up what the original authors of BSD-on-the-386 (386BSD) had been up to. I just searched again and found http://www.interprophet.com and http://www.telemuse.net ...
    Their new gig was putting the TCP/IP stack into silicon for performance; the Internet Archive version says they've been at it since 1989...
    I wonder if Intel licensed their patents, or if this is similar stuff...
    --
    o/~ Join us now and share the software ...
  33. Re:White elephant - flawed logic by morzel · · Score: 2, Insightful
    This is the problem which faces networking processing. Any given thread which performs network I/O will be executing on a single CPU.
    In the purest form, it would be like that: one single thread that does not gain much from the offloading. However, have you checked just how many threads are actually running on PCs nowadays? You specifically say 'more tasks can be done concurrently'... isn't this exactly the point of offloading?

    Next thing you know, the difference between SCSI and IDE is moot because 'for one thread it won't make much of a difference, since you'll end up waiting for the data to come off the platters anyway'.

    To consider your analogy, if the manager has only one task to do, and needs the other person his secretary calls to respond before he can continue, there's very little point having a secretary make the call for him. He's going to be stuck waiting till the reply comes through anyway.
    There just aren't many managers around nowadays who have only one task to do...

    To take the problem to an illustrative extreme, we could in theory have a multitude of slow CPUs which the main zippy CPU offloads everything to; graphics, network, disk, etc.
    Why would you think that a network processor would be slower? Precisely because it is a specialized processor, you can count on it to do TCP checksumming and all that stuff a lot faster than most (if not all) general-purpose CPUs. On top of that, you won't get interrupts or context switches for bad packets...

    While this may not seem like much, it is definitely a performance improvement for the system as a whole.
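    For reference, the TCP checksumming mentioned above is the RFC 1071 Internet checksum: a one's-complement sum of 16-bit words. A minimal Python sketch follows; it leaves out the TCP pseudo-header, assumes the checksum field sits on a 16-bit boundary (as it does in the TCP header), and the function names are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over big-endian 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def segment_ok(segment: bytes) -> bool:
    """With a correct checksum field included, the words sum to 0xFFFF, so the checksum is 0."""
    return internet_checksum(segment) == 0


payload = bytes.fromhex("0001f203f4f5f6f7")  # sample words from RFC 1071's numerical example
csum = internet_checksum(payload)
segment = payload + csum.to_bytes(2, "big")  # append the checksum as the sender would
```

    An engine that runs something like segment_ok() itself can discard corrupted segments without ever interrupting the host.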

    --
    Okay... I'll do the stupid things first, then you shy people follow.
    [Zappa]
  34. Re:White elephant - flawed logic by Bill_the_Engineer · · Score: 2, Insightful

    OK, I'll bite...

    The problem with Toby's argument is that he is fixated on the speed of the CPU. It doesn't matter how much slower or faster the network CPU is compared to the main CPU. What matters is that the network CPU is fast enough to handle the I/O requirements dictated by the network architecture.
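    To put a number on "fast enough": at line rate a device gets a fixed time budget per frame. A back-of-the-envelope Python sketch for 10 Gbit/s Ethernet, assuming the standard 8 bytes of preamble/SFD plus a 12-byte inter-frame gap on the wire for every frame; the function names are made up for the example.

```python
PREAMBLE_SFD_BYTES = 8     # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # minimum idle time between frames, expressed in bytes


def frames_per_second(link_bps: float, frame_bytes: int) -> float:
    """Frames per second at full line rate; frame_bytes covers header, payload and FCS."""
    bits_on_wire = (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_on_wire


def budget_ns(link_bps: float, frame_bytes: int) -> float:
    """Nanoseconds available per frame if the device is to keep up with the wire."""
    return 1e9 / frames_per_second(link_bps, frame_bytes)


for size in (64, 1518):
    print(f"{size:>5} bytes: {frames_per_second(10e9, size):>12,.0f} frames/s, "
          f"{budget_ns(10e9, size):7.1f} ns per frame")
```

    Minimum-size 64-byte frames work out to about 14.88 million frames per second, or 67.2 ns each; full-size 1518-byte frames to about 813,000 per second, or roughly 1230 ns each. That budget, not the host CPU's clock, is what the network processor has to meet.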

    With L2 cache and DMA being the norm nowadays, I don't see what the problem is. Sure, the main CPU will stall if the cache needs to fetch something from main memory, but hardware can be adjusted to take these possibilities into account.

    Having processors dedicated to tasks frees the CPU to handle any other tasks on its agenda. I see a network ASIC being able to receive the data payload ready for transmission and do its thing until it interrupts the CPU to report that it is done.

    Also, the CPU would not have to wait for the network transmission to complete before sending more data. The network device would keep accepting payloads until its buffer was full.
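    That "keep accepting payloads until the buffer is full" behaviour is essentially a transmit ring. A single-threaded Python sketch of the idea follows; TxRing and its methods are hypothetical names, not any real driver's interface, and a real ring lives in memory shared by driver and NIC rather than in a Python list.

```python
class TxRing:
    """Fixed-size transmit ring: the host posts payloads, the 'device' drains them."""

    def __init__(self, slots: int):
        self.buf = [None] * slots
        self.head = 0   # next slot the host writes
        self.tail = 0   # next slot the device reads
        self.count = 0  # occupied slots

    def post(self, payload: bytes) -> bool:
        """Host side: enqueue without waiting; returns False when the ring is full."""
        if self.count == len(self.buf):
            return False
        self.buf[self.head] = payload
        self.head = (self.head + 1) % len(self.buf)
        self.count += 1
        return True

    def drain(self):
        """Device side: take the oldest payload (None if empty), as on transmit completion."""
        if self.count == 0:
            return None
        payload = self.buf[self.tail]
        self.buf[self.tail] = None
        self.tail = (self.tail + 1) % len(self.buf)
        self.count -= 1
        return payload


ring = TxRing(slots=4)
accepted = [ring.post(bytes([i])) for i in range(6)]  # the last two are refused: ring full
first = ring.drain()                                   # the device completes the oldest payload
```

    The host only blocks (or retries) when post() returns False; otherwise it hands off the payload and goes back to its own work.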

    While the graphics card is a good example, a better example would be the FPU. Floating-point arithmetic is more CPU-intensive than integer arithmetic. To speed things up, the CPU submits the desired computation to the FPU, and the FPU notifies the CPU when the calculation is complete.

    Then there is the other omission made by Toby: the bus does not have a 1:1 speed ratio with the CPU. With this in mind, and using Toby's logic, the ASIC would only have to match the bus speed, not the CPU's.

    Toby keeps asking why pay for a dedicated CPU when the expensive CPU you have can handle the task. I think most engineers would ask why tie up an expensive CPU when a dedicated CPU can do the task.

    In other words, let's free our expensive CPUs to perform general computational tasks by offloading some of the mundane labor to dedicated ASICs.

    I will say Toby is correct about one thing. In a personal computer, I don't see the advantage of the network ASIC (other than the API), since the CPU is idle most of the time anyway.

    However, in Intel's target market, I would like to have the CPU perform the application logic and offload the networking to dedicated processors. The idea is that if the network ASICs give the CPU more headroom, I could see an increase in the maximum number of transactions per second. That increase could be just enough to keep me from investing in another blade or even another server.

    Then again.. I may need more sleep.

    Best Regards,
    Bill

    --
    These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
  35. Re:side effects? by Fweeky · · Score: 3, Informative
    From FreeBSD's zero_copy(9) manpage:
    For sending data, there are no special requirements or capabilities that the sending NIC must have. The data written to the socket, though, must be at least a page in size and page aligned in order to be mapped into the kernel. If it does not meet the page size and alignment constraints, it will be copied into the kernel, as is normally the case with socket I/O.

    The user should be careful not to overwrite buffers that have been written to the socket before the data has been freed by the kernel, and the copy-on-write mapping cleared. If a buffer is overwritten before it has been given up by the kernel, the data will be copied, and no savings in CPU utilization and memory bandwidth utilization will be realized.

    It also mentions some issues with zero-copy receive, which requires help from the NIC to ensure that received packet payloads are also page-aligned and at least a page in size. Such support is, predictably, very rare.
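    The sending-side rule quoted from zero_copy(9) boils down to a simple check. The Python sketch below only models that rule (zero_copy_eligible() is a hypothetical helper; the real decision is made inside the FreeBSD kernel) and uses an anonymous mmap to show where a page-aligned buffer can come from.

```python
import ctypes
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size of the machine running the sketch


def zero_copy_eligible(addr: int, length: int, page_size: int = PAGE_SIZE) -> bool:
    """Model of the quoted rule: the buffer must be page-aligned and at least a page long.

    Anything else would be copied into the kernel, as with ordinary socket I/O.
    """
    return addr % page_size == 0 and length >= page_size


# An anonymous mapping starts on a page boundary, so it is a natural source of
# page-aligned buffers.
buf = mmap.mmap(-1, 2 * PAGE_SIZE)
view = ctypes.c_char.from_buffer(buf)
addr = ctypes.addressof(view)
del view  # drop the exported pointer so the mapping can still be closed later

print(zero_copy_eligible(addr, 2 * PAGE_SIZE))  # aligned, two pages long
print(zero_copy_eligible(addr + 1, PAGE_SIZE))  # misaligned by one byte
print(zero_copy_eligible(addr, PAGE_SIZE - 1))  # aligned but shorter than a page
```

    Per the manpage, even an eligible buffer only stays zero-copy if the application leaves it alone until the kernel has given it up.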