Slashdot Mirror


Intel Develops Hardware To Enhance TCP/IP Stacks

RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."

271 comments

  1. Great by g8way · · Score: 0, Offtopic

    Yet another processor that requires liquid nitrogen.

  2. Good stuff! by kernelistic · · Score: 5, Interesting

    First checksum offloading, now this... It is nice to see that hardware vendors are realizing that 10Gbit/s+ speeds aren't currently realistic without extra forms of computation support from the underlying network interface hardware.

    This is Good News.

    1. Re:Good stuff! by RatRagout · · Score: 5, Informative

      Yes. Checksum was one of the problems. The other problem is the memory-to-memory copying of data required by the semantics of the TCP/UDP send() call. These semantics require that the data in the buffer at the time send() is called is the data that gets sent. If the application changes the data directly after the send()-call this should not affect what is sent. This means that the OS has to copy the data into kernel memory, and then at some later time copy it onto the NIC. This memory-to-memory copying becomes a severe problem as traffic and bandwidth increase.
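
      The copy semantics described here can be observed from user space. The following is a minimal sketch, assuming a local socket pair as a stand-in for a real TCP connection; the function name send_then_overwrite is made up for illustration.

```python
import socket


def send_then_overwrite() -> bytes:
    """Send a buffer, overwrite it immediately, and return what the peer receives."""
    # Assumption: a connected local socket pair stands in for a TCP connection.
    sender, receiver = socket.socketpair()
    try:
        buf = bytearray(b"original")
        sender.sendall(buf)      # the kernel copies the bytes into its own socket buffer
        buf[:] = b"CHANGED!"     # the application reuses its buffer right away
        return receiver.recv(len(buf))
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    # The peer still sees the bytes that were in the buffer when send was called.
    print(send_then_overwrite())
```

      The overwrite after the send does not reach the peer, which is only possible because the kernel took its own copy of the data.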

    2. Re:Good stuff! by kernelistic · · Score: 5, Informative

      There have been multiple fixes to address the inefficiencies of the original design of the BSD TCP/IP stack.

      FreeBSD, for example, has a kernel option called ZERO_COPY_SOCKETS, which dramatically increases the network throughput of syscalls such as sendfile(2). With this option enabled, as the name implies, data is no longer copied from userland to kernel space and then passed on to the network card's ring buffers. It is copied in one swoop!
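
      As a rough illustration of the sendfile idea (not of the FreeBSD kernel option itself), the sketch below pushes a file through a socket without reading it into a user-space buffer first. It assumes a local socket pair and a small temporary file; the helper name send_file_over_socket is hypothetical.

```python
import socket
import tempfile


def send_file_over_socket(payload: bytes) -> bytes:
    """Write payload to a temp file, send it with sendfile, and return what arrives."""
    # Assumption: a local socket pair stands in for a network connection, and
    # the payload is small enough to fit in the socket buffer.
    sender, receiver = socket.socketpair()
    try:
        with tempfile.TemporaryFile() as f:
            f.write(payload)
            f.flush()
            f.seek(0)
            # socket.sendfile() uses os.sendfile() where the platform supports
            # it, so the file data is handed to the socket inside the kernel;
            # elsewhere it falls back to an ordinary send() loop.
            sender.sendfile(f)
        sender.shutdown(socket.SHUT_WR)  # tell the reader the stream has ended
        received = bytearray()
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
        return bytes(received)
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    data = b"x" * 4096
    print(send_file_over_socket(data) == data)
```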

    3. Re:Good stuff! by should_be_linear · · Score: 1

      I also noticed that many Enterprise servers most of it's CPU power spend parsing XML. I wonder why nobody (Intel, AMD) have hardware aid for this. It would also have huge PR benefits in Enterprise/SMB market. I guess it would be utf8 encoding only, but thats not limiting at all, is it?

      --
      839*929
    4. Re:Good stuff! by RatRagout · · Score: 2, Interesting

      For sending files I'm sure this has increased performance greatly, since otherwise you might have to first read the file into userland, copy it into the kernel, and then onto the NIC. Reading directly from disk to a TOE would of course be the real overhead-killer. Zero-copy techniques are also used in newer APIs like uDAPL for RDMA operations (over InfiniBand or similar).

    5. Re:Good stuff! by noselasd · · Score: 1

      Well, the kernel could just block till it is done sending, thus
      sending it straight from the userspace supplied buffer.

      Doing so may ofcurse have other affects though.

    6. Re:Good stuff! by gilesjuk · · Score: 1

      10Gbit/s isn't realistic without some faster bus technology either. Will 64-bit PCI handle it?

    7. Re:Good stuff! by acaspis · · Score: 2, Interesting

      > If the application changes the data directly after the send()-call this should not affect what is sent.

      So just don't let the application change the data (hint: single-assignment programming languages).

      > This means that the OS has to copy the data into kernel memory,

      Either that, or you could improve support for copy-on-write in the MMU (which might benefit other tasks than just networking).

      Sometimes changing the assumptions is the proper way to solve the problem.
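
      Copy-on-write itself is easy to see from user space. The sketch below is only an analogy, assuming a private file mapping rather than a socket buffer: the mapping shares its pages with the file until the first write, and the write then lands in a private copy. The function name copy_on_write_demo is made up for illustration.

```python
import mmap
import tempfile


def copy_on_write_demo() -> tuple:
    """Modify a copy-on-write mapping and return (mapped bytes, file bytes)."""
    with tempfile.TemporaryFile() as f:
        f.write(b"packet payload")
        f.flush()
        # ACCESS_COPY requests a private, copy-on-write mapping: writes change
        # the mapped memory but are never written back to the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as view:
            view[:6] = b"PACKET"
            in_memory = bytes(view[:])
        f.seek(0)
        on_disk = f.read()
    return in_memory, on_disk


if __name__ == "__main__":
    print(copy_on_write_demo())
```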

    8. Re:Good stuff! by RatRagout · · Score: 1

      The send() (and sendto()) calls I'm referring to here are native C calls provided by the OS to send messages over TCP or UDP. Changing the semantics could kill or destabilize applications.

      Newer APIs like uDAPL address these issues by providing asynchronous message sending using calls with different semantics.

    9. Re:Good stuff! by maxwell+demon · · Score: 2, Funny
      Doing so may ofcurse have other affects though.

      Of curse?

      d
      What do you want to drop? [a?*]
      ?
      a - a cursed -1 tcp/ip connection
      a
      Sorry, you can't drop the tcp/ip connection, it seems to be cursed.

      Hmmm ... where's my scroll of remove curse?
      --
      The Tao of math: The numbers you can count are not the real numbers.
    10. Re:Good stuff! by johannesg · · Score: 1
      It is copied in one swoop!

      So shouldn't it be called ONE_COPY_SOCKETS, then?

    11. Re:Good stuff! by acaspis · · Score: 1
      Granted, the single-assignment policy won't help mainstream OSes and applications, but I am pretty sure high-performance routers probably do it all the time (i.e. pass pointers instead of copying, and garbage-collect buffers after the data has been sent and ack'ed).

      Also it looks like the BSD zero-copy sockets already use the MMU copy-on-write trick I mentioned in order to preserve the userspace semantics:
      http://www.cs.duke.edu/ari/trapeze/freenix/node6.html

    12. Re:Good stuff! by RatRagout · · Score: 1

      No. Zero-copy refers to the fact that there are no memory-to-memory copies. The data is copied (or sent, if you will) directly from memory to the NIC.

    13. Re:Good stuff! by selectspec · · Score: 1

      Ethernet RDMA protocols solve this problem. RDMA will be ubiquitos in the next year or two.

      --

      Someone you trust is one of us.

    14. Re:Good stuff! by Tower · · Score: 1

      PCI-X or PCI-E(4) is needed to come close to 10Gb... PCI-X DDR or PCI-E(8) and up are certainly sufficient.

      Bus              Width / clock               Bytes/s     Bits/s
      fast/wide PCI    64 bits / 66 MHz            533 MB/s    4.3 Gbps
      PCI-X            64 bits / 133 MHz           1.06 GB/s   8.5 Gbps
      PCI-Express x4   serial / 4 lanes, 2.5 GHz   1 GB/s      8 Gbps
      PCI-X 266        64 bits / 266 MHz           2.1 GB/s    17 Gbps
      PCI-Express x8   serial / 8 lanes, 2.5 GHz   2 GB/s      16 Gbps
      PCI-X 533        64 bits / 533 MHz           4.3 GB/s    34 Gbps
      PCI-Express x16  serial / 16 lanes, 2.5 GHz  4 GB/s      32 Gbps
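
      The figures above follow from simple arithmetic, sketched here under two assumptions: a parallel bus moves one full-width word per clock, and first-generation PCI Express signals at 2.5 GT/s per lane with 8b/10b encoding (8 payload bits per 10 bits on the wire). These are theoretical peaks, not measured throughput, and the function names are made up for illustration.

```python
def parallel_bus_gbps(width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a parallel bus, assuming one width_bits transfer per clock."""
    return width_bits * clock_mhz / 1000.0


def pcie_gen1_gbps(lanes: int) -> float:
    """Peak per-direction data rate of first-generation PCI Express.

    Assumes 2.5 GT/s per lane and 8b/10b encoding.
    """
    return lanes * 2.5 * 8 / 10


if __name__ == "__main__":
    for name, clock in [("PCI 64/66", 66), ("PCI-X", 133),
                        ("PCI-X 266", 266), ("PCI-X 533", 533)]:
        print(f"{name:12s} {parallel_bus_gbps(64, clock):6.2f} Gbps")
    for lanes in (4, 8, 16):
        print(f"PCIe x{lanes:<6d} {pcie_gen1_gbps(lanes):6.2f} Gbps")
```

      With a nominal 66 MHz clock the first row comes out slightly below the 4.3 Gbps quoted above, which corresponds to a clock closer to 66.67 MHz.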

      --
      "It's tough to be bilingual when you get hit in the head."
    15. Re:Good stuff! by sconeu · · Score: 1

      RDMA has been in use for several years in Infiniband.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    16. Re:Good stuff! by Anonymous Coward · · Score: 0

      Best news since the 16550 UART.

      Back in my day anything over 9600 baud was gravy.

    17. Re:Good stuff! by Anonymous Coward · · Score: 0

      Wait, wait wait. Let me get this straight. You want hardware XML parsing?

      Good freak! What the hell is wrong with you! Idoit.

    18. Re:Good stuff! by Anonymous Coward · · Score: 0

      PCI-Express has started showing up in low-end servers. The Dell 2850 (2U workhorse) has an 8x PCI-Express lane on it.

      Of course considering the port costs for 10 Gbps, you probably wouldn't use it with a low-end machine anyway.

    19. Re:Good stuff! by aminorex · · Score: 1

      Ubiquitos con quesa? Ay, caramba!

      --
      -I like my women like I like my tea: green-
    20. Re:Good stuff! by aminorex · · Score: 1

      No, not even in an ideal world, with a 66MHz bus (a practically unachievable 4.2 Gbps).

      But PCI-X 266 will handle 10Gbps Ethernet, on a single 64-bit lane.

      --
      -I like my women like I like my tea: green-
    21. Re:Good stuff! by Anonymous Coward · · Score: 0

      I also noticed that many Enterprise servers most of it's CPU power spend parsing XML.

      "I also noticed that many Enterprise servers spend most of their CPU power parsing XML." or "I also noticed that most of many Enterprise servers' CPU power is spent parsing XML.".

    22. Re:Good stuff! by Anonymous Coward · · Score: 0

      doesn't work in the presence of threads and/or shared memory.

  3. finally... by N5 · · Score: 5, Funny

    intel is working on something worthwhile: a cure for the common slashdot-ing

    and they say the drug companies are miracle workers ;)

    --
    John 3:16 - The easiest way to a BETTER YOU.
  4. White elephant? by Toby+The+Economist · · Score: 5, Interesting

    I think in Tannenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in it's task until the network processing has completed.

    --
    Toby

    1. Re:White elephant? by Anonymous Coward · · Score: 0

      Of course, all CPUs/ICs/ASICs are equally as powerful as one another, specific instructions/clock speed/whatever the fuck be damned, right? :P

    2. Re:White elephant? by Anonymous Coward · · Score: 1, Insightful

      Doesn't matter. Intel is eyeing AMD's success at courting the ricer community and trying to horn in on that action.

    3. Re:White elephant? by Uhlek · · Score: 2, Informative

      That all depends on how it's done. Simply offloading the processing won't work, but replacing the TCP/IP drivers with simple hooks into a hardware-based I/O system can.

    4. Re:White elephant? by Toby+The+Economist · · Score: 5, Informative

      You must be implying that the hardware implementation will be faster than the main CPU, which it almost certainly won't be, because if you've just spent 300 USD on your P4 CPU, what are you doing spending the same amount again - or more - just on your network subsystem?

      Also remember that a well implemented TCP/IP stack runs at about 90% of the speed of a memcpy() (Tannenbaum's book again).

      For hardware TCP/IP processing to be useful, you need to be say 2x the speed of the CPUs memcpy() function!

      Given that the main performance bottleneck is memory access, since you're basically copying buffers around and so caching isn't going to help you, I don't see how any sort of super-duper hardware is going to give you anything like a 2x speed up, let alone at an economic price.

      --
      Toby

    5. Re:White elephant? by Anonymous Coward · · Score: 0

      The CPU is never blocked on I/O in a modern operating system. The I/O is scheduled and the completion is managed asynchronously.

    6. Re:White elephant? by Toby+The+Economist · · Score: 2, Informative

      Any given thread which needs network I/O cannot continue until that I/O is complete. The fact the CPU can switch elsewhere makes no difference to the thread which requires the network packet to be processed before it has the information it requires to continue, and if that processing is offloaded to a slower network processor, the performance of that thread is degraded.

      --
      Toby

    7. Re:White elephant? by mr_zorg · · Score: 4, Interesting
      I think in Tannenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in it's task until the network processing has completed.

      I think in xyz's book there's a reference which states that offloading graphics processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in it's task until the graphics processing has completed.

      See how silly that sounds when you substitute network with graphics? We all know that offloading graphics processing is a good thing. Why? Because it's optimized for the task. Why couldn't the same be done for networking?

    8. Re:White elephant? by MatthewNewberg · · Score: 1

      Is there any way they could reduce the amount of data in the stack?

    9. Re:White elephant? by UniverseIsADoughnut · · Score: 1

      Not that some things Intel does aren't marketing driven, but I doubt they would go about doing this if they didn't have good reason to.

      It's not like this would be an easy thing to sell in a way that people would really understand. But regardless, they aren't going to develop a whole new piece of hardware that is worthless. Making a design decision that pushes something down a bad path, like clock speed, is a whole different issue. I'm pretty sure the Intel guys would think this one out before spending a ton of cash working on it.

      I think Intel is realizing that the future is much brighter if they deliver better hardware solutions that distribute the computer across many parts, instead of everything being done in software on the CPU. Not only does it mean their CPUs don't have to work as much, and thus produce less heat and draw less power, but it also means they get to sell more chips. And specialized hardware will always be faster than software. The CPU should be used for stuff that can't easily be done in hardware, or for emerging things. Once something gets well defined it should get moved out to its own chip.

      I just picture them moving towards more Centrino-type families.

    10. Re:White elephant? by Joseph_Daniel_Zukige · · Score: 3, Interesting
      See how silly that sounds when you substitute network with graphics?

      Well, does waiting 3 milliseconds at 3 GHz outrun waiting 3 milliseconds at 300 MHz?

      The only advantage I can see to this is that it's often nice to have I/O handled in a separate process/thread running on a separate processor. But, as many have already noted, unless the I/O processor is tuned for this you've either got another expensive processor or you're running the I/O thread on a slower processor.

      If the processor _is_ tuned for this purpose, it's already been done. Most Ethernet i/f cards have a fair amount of intelligence on them already, and complete stacks have been available on cards for about as long as I've been aware of ethernet. (twenty years?)

    11. Re:White elephant? by Jeff+DeMaagd · · Score: 5, Interesting

      Graphics and networking are two very different things. Networking isn't compute intensive, it is I/O intensive. I don't think the Intel hardware network offload is for much more than basic computation.

      Besides, GPUs are more powerful than CPUs at the task of rendering polygons.

      Very often ASICs are better at a task than general purpose CPUs, just that considerations must be made as to whether the performance gain is worth the cost difference.

    12. Re:White elephant? by JollyFinn · · Score: 1

      What the heck. A few factoids:
      The main CPU runs multiple things.
      The costs of network traffic are cache flushes, context switches, and so on.
      A general purpose CPU is much weaker than a special purpose CPU, if you can parallelize at all.
      And MFG costs my ass. These things should be relatively small.

      Consider the following scenario.
      Network interrupt -> context switch -> move a lot of data around and compute somewhat -> context switch.
      Then finish what I was doing, and then compute the thing that I just put in the line. (unless some other processor does it first ;)
      VS
      I finish the previous thing and check if there is anything new for me to crunch on. If not, I yield the processor voluntarily in case some other thread needs it; but there is something, so I can continue running my code from the trace cache that would have been flushed in a context switch...
      I can see an order of magnitude difference between those two approaches. Remember that a TLB miss is REALLY expensive, as are instruction cache misses and getting stuff from main memory.

      --
      Emacs is good operating system, but it has one flaw: Its text editor could be better.
    13. Re:White elephant? by Anonymous Coward · · Score: 0

      Consider the fact that you can get extremely high performance for a highly specialized task using an FPGA. It's conceivable that they could create hardware designed for these specific tasks that could outperform the main CPU (for these specific tasks) without requiring transistor densities as great as those needed for top-of-the-line general purpose processors.

      For some tasks a general purpose processor can be at a disadvantage since it carries "dead weight" unnecessary for the task.

    14. Re:White elephant? by Trogre · · Score: 2, Insightful

      Try telling that to Amiga fans in 1989-1992.

      Those little boxes were masters at multi-processing, and they did it right - one processor for pretty much every major peripheral task (disk, graphics, sound, something else I can't remember).

      As long as these Intel coprocessors are going to be an open standard (which they almost certainly won't), then I'd welcome this addition to PC architecture.

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
    15. Re:White elephant? by Toby+The+Economist · · Score: 4, Informative

      You can accelerate graphics to a very large degree because the problem is very subject to parallelism.

      You cannot accelerate networking very much because the problem is highly serial.

      It is improper to compare the two because they are fundamentally different problems.

      You can throw tons of hardware at 3D graphics and get good results, because just by having more and more pipelines, you go faster and faster.

      Processing a network packet is quite different; the data goes through a series of serial steps and eventually reaches the application layer. The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which remember has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin.

      --
      Toby

    16. Re:White elephant? by Uhlek · · Score: 5, Informative

      Hardware implementation will most definitely be leaps and bounds faster than the general CPU. Can a Linux router route 720Gbps of traffic through hundreds of interfaces at once? No. But a Cisco 6500 can, because of hardware designed especially for the task.

      Simply put, software on general purpose processors sucks for doing heavy computational work. Hardware tuned especially for a task has been, and always will be, where it's at. However, the costs involved in creating ICs specific to a task usually mean that ASICs are only created where there is a need. Modern graphics cards are a great example. The on-board graphics processors are designed especially to create graphics, something that, if offloaded onto the GP CPU, would crush even the highest of the high end.

      Also, offloading the TCP/IP stack on a normal workstation probably isn't going to be a huge performance boost. Where this will be useful is in situations where there is a need for high-throughput, low-latency network I/O processing.

    17. Re:White elephant? by Anonymous Coward · · Score: 0

      My operating system is not DOS and I have multiple threads running on the CPU. So clearly offloading network I/O to extra hardware frees up my CPU to do something more useful, for example providing dynamic web content, or scanning incoming email for spam, or hell, just redrawing damaged window regions on my desktop.

      I'm not really picking on you, I just thought I'd point out that this idea that "You can't offload IO to hardware because the process is blocked on that IO anyway!" is very outdated. It may have made some sense in 1983 when Minix was written but it doesn't make any sense today.

    18. Re:White elephant? by evilviper · · Score: 1
      that offloading network processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU

      This is a pretty ridiculous claim. Take a look at Cisco routers some time... With a slow CPU, they can push gigabits upon gigabits of data through every second. In some cases, they are even just using PCI network cards.

      Packetizing data, and handling the incredible storm of interrupts, is something CPUs are very poor at. Servers stand to get a huge performance benefit from offloading that particular work.

      But, of course, you may have read something very general somewhere, so Intel must be spending millions of dollars for nothing, and just making-up their intentions to unveil this product soon...
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    19. Re:White elephant? by Lehk228 · · Score: 1

      and this hardware will be better at checksumming than a normal CPU, you didn't think they were just going to stick a pentium II on there and call it a NIC.
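
      For reference, the checksum that gets offloaded is cheap per byte but touches every byte of every packet. Below is a minimal software sketch of the 16-bit ones' complement Internet checksum as described in RFC 1071; the function name internet_checksum is made up, and this is not a description of what any particular NIC does internally.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum used by IP, TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")  # example bytes from RFC 1071
    print(hex(internet_checksum(sample)))
```

      A receiver can verify a packet by running the same sum over the data plus the transmitted checksum and checking that the result is zero.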

      --
      Snowden and Manning are heroes.
    20. Re:White elephant? by Anonymous Coward · · Score: 0

      Why couldn't each of these steps have their own specialized hardware?

    21. Re:White elephant? by Anonymous Coward · · Score: 1, Interesting

      It is improper to compare, in the way you have, a linux router to a Cisco 6500.

      You are correct to say the throughput of the Cisco is much higher.

      But, and here's the thing, how much faster is that Cisco box at processing *any one socket*? I think the answer is the linux router and the Cisco box are in the same order of magnitude.

      The Cisco box reaches 720 Gbps because it is highly parallel. It's like having hundreds of linux routers side by side.

      Intel are developing hardware for the desktop CPU, motherboard and NIC, which is to say, for a single machine which is going to be probably connected to the net over ethernet. There'll be one or a few cores, and most network processing will be by processes which will be blocked until they receive the data they're being sent - and this too is entirely unlike the router, which is just throwing data in and then out again.

      All in all, cheese and chalk. A big-box router operates in a different way and fulfils a different task to the desktop machine Intel are working on. Comparing the two is invalid.

      --
      Toby

    22. Re:White elephant? by mboverload · · Score: 1

      Yeah, compression.

    23. Re:White elephant? by Toby+The+Economist · · Score: 1

      It is true that if you offload network I/O, you free the CPU to perform other tasks.

      This has the negative effect of making the thread which has had its network I/O offloaded slower, but the positive effect of freeing the CPU to perform other tasks.

      However, I say to you, on a desktop system, which is where this Intel stuff is, the user is usually going to be the cause of the network traffic and he will want that thread to perform and will not care that he could be freeing up a few more percent CPU time to be spent in the idle task.

      --
      Toby

    24. Re:White elephant? by Uhlek · · Score: 4, Insightful

      Comparing the two is completely valid when you're discussing the benefits of task-customized hardware and general purpose computing. Are there limits to where a hardware-based TCP/IP stack will be useful in the desktop/server market? Yes, of course there are. But for high-bandwidth applications, I can assure you that offloading the TCP/IP overhead onto an ASIC will not only give you better performance, but also free up primary processor time for other applications.

      Also, Catalyst switches are not highly parallel. They can be parallel, depending on the exact model and configuration, as well as the exact path inside the switch that the traffic takes, but it's not even remotely the same in execution as having "hundreds of linux routers side by side."

      Instead, it is the exacting way in which the various components of the switch pass data, and the very specific purpose of each chip and circuit in the device, that gives modern routers the speed they have. They use special components such as content-addressable memory and ternary content-addressable memory (memory that allows you to store 0s, 1s, and wildcard values instead of just 0s and 1s, allowing for wire-speed match comparisons against ACLs and routing tables), etc. It isn't merely a stack of GP CPUs all running in parallel to achieve a particular task.

      Systems guys often mistake routers and switches for computers with a bunch of Ethernet jacks. They're far from it. They are highly specialized pieces of hardware designed from the bottom up to do one thing and do it well -- transport data. Computers are the opposite. They're designed from the bottom up to be able to do whatever you wish them to as fast as possible, but that flexibility comes with a price.

      If you ever get the urge, you should read up on Catalyst switching architecture. You'll find it quite interesting.
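
      The ternary matching mentioned above can be modelled in a few lines. This is a software sketch only, assuming each entry is stored as a value plus a "care" mask whose zero bits act as wildcards; real TCAM hardware compares the key against all entries at once, whereas this loop checks them one by one. The names prefix_rule and ternary_lookup and the example rules are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

Rule = Tuple[int, int, str]  # (value, care_mask, action); mask bit 0 = wildcard


def prefix_rule(cidr: str, action: str) -> Rule:
    """Build a rule from an IPv4 prefix: host bits become wildcards."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.netmask), action


def ternary_lookup(address: str, rules: List[Rule]) -> Optional[str]:
    """Return the action of the first rule matching on every bit it cares about."""
    key = int(ipaddress.ip_address(address))
    for value, care_mask, action in rules:
        if (key & care_mask) == (value & care_mask):
            return action
    return None


if __name__ == "__main__":
    # Hypothetical ACL, most specific entry first.
    acl = [
        prefix_rule("10.1.2.0/24", "deny"),
        prefix_rule("10.0.0.0/8", "permit"),
        prefix_rule("0.0.0.0/0", "default"),
    ]
    for addr in ("10.1.2.3", "10.9.9.9", "192.168.0.1"):
        print(addr, ternary_lookup(addr, acl))
```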

    25. Re:White elephant? by Anonymous Coward · · Score: 0

      This is server tech, not desktop.

      ""This will solve some of the problems we have heard from a lot of enterprise server customers," said an Intel spokesman."

      And for servers, esp. virtual servers, driving multi-gigabit I/O, such as during backups, CPU utilization is definitely a concern. You should read the article before "correcting" other people.

    26. Re:White elephant? by Anonymous Coward · · Score: 0

      > You cannot accelerate networking very much because the problem is highly serial.

      Ever heard of web servers with thousands of open sockets? If that's not highly parallel then I don't know what is.

      One immediate result should be higher throughput of the system.

      Moreover, freeing the cpu from being slowed down by tcp processing means it can do some real application processing (that's what cpu's are best at).

    27. Re:White elephant? by Tim+C · · Score: 1

      Besides, GPUs are more powerful than CPUs at the task of rendering polygons.

      Yes, that's the whole point - they're more powerful at that task because they're specifically designed to perform that task (amongst others). Similarly, a "network processing unit" would be specifically designed to support in hardware the operations required of it. Make that chip fast enough, and it'll be faster at doing it than a general-purpose CPU. The only question is how fast it has to be, and whether or not it's cost-effective.

    28. Re:White elephant? by maxwell+demon · · Score: 1

      Ever heard of pipelining?

      So it's a series of steps. Ok, then make each step a part of a pipeline, with a specialized circuit for exactly that step. Then while the next circuit on the pipeline gets to do the next step on that packet, the first one can already start processing the next packet. This is how modern CPUs speed up the decoding of machine instructions, so why shouldn't the same work with TCP/IP packets as well?
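
      The gain from pipelining can be put in numbers. The sketch below assumes an idealized pipeline in which every stage takes the same time and never stalls; the function names and the example figures are made up for illustration.

```python
def serial_time(packets: int, stages: int, stage_time: float) -> float:
    """Time to process packets one after another through all stages."""
    return packets * stages * stage_time


def pipelined_time(packets: int, stages: int, stage_time: float) -> float:
    """Time when each stage hands off and immediately starts the next packet.

    The first packet needs all stages; every later packet completes one
    stage_time after the previous one.
    """
    if packets == 0:
        return 0
    return (stages + packets - 1) * stage_time


if __name__ == "__main__":
    n, k, t = 1000, 5, 1  # hypothetical: 1000 packets, 5 steps, 1 time unit each
    print(serial_time(n, k, t), pipelined_time(n, k, t))
```

      The latency of any single packet is unchanged at stages * stage_time; only the throughput improves, approaching one packet per stage_time.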

      --
      The Tao of math: The numbers you can count are not the real numbers.
    29. Re:White elephant? by Anonymous Coward · · Score: 0

      is to up the clock rate

      But we already know Intel is good at that game (or is it needing higher clocks for the same performance?). Too bad your network card will create 120 watts of heat now! Like that P4 wasn't bad enough.

    30. Re:White elephant? by PDAllen · · Score: 1

      If you've just spent $300 on your new P4, why would you buy a graphics card at $200 which has an order of magnitude more computing power?

      Hint: general purpose CPU doing specific job => slow. Custom designed hardware for specific job => faster and cheaper.

    31. Re:White elephant? by Florian+Weimer · · Score: 1

      I think in Tannenbaum's book there's a reference which states that offloading network processing normally isn't useful, because the CPU that work is offloaded to is always less powerful than the main CPU and the main CPU is normally blocked in it's task until the network processing has completed.

      This a bit of an oversimplification. There are at least three cases in which offloading makes sense: dropping packets on the NIC (for example, during a DoS attack), reducing bus overhead by combining multiple requests into one (TCP segmentation offload), and computation which takes significant advantage from special hardware (TCAM/network search engines for making forwarding decisions).

      The first two issues are mostly relevant only on systems in which the NIC shares a comparatively low-bandwidth bus with other devices. The third one requires specialized memory chips (TCAM). AFAIK, a reasonably sized TCAM chip still costs too much for integration even into high-end NICs, and its power consumption is also a concern. That's why I think that offloading doesn't make too much sense for NICs (at least from a technical point of view; it's very nice for marketing, though).
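
      To make the segmentation-offload case concrete: the host hands the NIC one large buffer, and the NIC cuts it into segment-sized pieces. The sketch below models only that splitting step, assuming a maximum segment size of 1460 bytes (a typical value for a 1500-byte Ethernet MTU with plain IPv4 and TCP headers); header generation and checksums are left out, and the function name segment is hypothetical.

```python
from typing import List


def segment(payload: bytes, mss: int = 1460) -> List[bytes]:
    """Split one large send buffer into chunks of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


if __name__ == "__main__":
    big = bytes(64 * 1024)  # one 64 KiB request crossing the bus once
    parts = segment(big)
    print(len(parts), len(parts[-1]))
```

      Without the offload, the host would build and submit each of those segments separately, so the bus and the driver see dozens of requests instead of one.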

    32. Re:White elephant? by WoodieR · · Score: 1

      so is this to be another proprietary standard to further confuse the issues when deciding on a new PC or is this to be released as an open standard like RAM etc ?

      --
      Question Authority before IT questions You ...
    33. Re:White elephant? by advocate_one · · Score: 1
      "The only way you can really make it go faster is to up the clock rate, and you find it's uneconomic to try to beat the main CPU, which remember has *already* been paid for. You have all that CPU for free; to then spend the kind of money you'd need to outpace the CPU makes no sense, let alone the money you'd need to spend to outpace the CPU by a decent margin"

      what??? not when you're doing multimedia decoding of compressed data... or other such tasks... offloading the networking stuff to hardware will have the same benefit as dumping a softmodem for a real modem... I want to dump the compute bound stuff to purpose built hardware so the CPU can get on with doing something else while the crunching is being done... why do you think your CPU has a floating point co-processor built in??? the cpu is busy doing something else and when the FPU has finished the calculation, it interrupts the cpu to notify it the result's ready... IT WILL BE EXACTLY THE SAME WITH THIS HARDWARE ASSIST FOR TCP/IP STACK...

      --
      Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
    34. Re:White elephant? by Qzukk · · Score: 1

      the main CPU is normally blocked in it's task until the network processing has completed.


      Good thing my scheduler has about 50 other tasks in the queue waiting for their turn.

      --
      If I have been able to see further than others, it is because I bought a pair of binoculars.
    35. Re:White elephant? by lostguy · · Score: 1
      For hardware TCP/IP processing to be useful, you need to be say 2x the speed of the CPUs memcpy() function!


      Not at all true. Dipping into Ricardian economics, you can conclude that the best, most valuable, purpose of the primary CPU is to process user input and to execute applications. If another CPU can be introduced into the computational economy such that it can perform a task, even if at a lower rate than the primary CPU, thus freeing up the primary CPU to perform its most valuable task more efficiently, then computational trade with that CPU is a win.

      The secondary CPU does not need to be 2x as fast at processing TCP/IP as the primary CPU. It merely needs to process it well enough that it can take the load off the primary CPU.

      Now, if the primary CPU has to sit around and wait for the secondary CPU to accomplish its task, thereby creating an opportunity cost in computation, then yes, the secondary CPU needs to be significantly faster for its presence to be justified.
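
      A toy model of this argument, with made-up numbers: suppose protocol processing would take some fraction of the main CPU's time, and the secondary processor handles that work at some speed relative to the main CPU. The sketch assumes the work can be split freely between the two and ignores the waiting cost raised in the last paragraph; both function names are hypothetical.

```python
def app_share_without_offload(net_fraction: float) -> float:
    """Share of main-CPU time left for applications with no offload."""
    return 1.0 - net_fraction


def app_share_with_offload(net_fraction: float, relative_speed: float) -> float:
    """Share of main-CPU time left when a secondary processor takes the work.

    relative_speed is the secondary processor's protocol-processing speed
    relative to the main CPU (0.5 means half as fast).
    """
    if net_fraction <= relative_speed:
        return 1.0  # the slower processor still keeps up with the whole load
    # The secondary processor is saturated; the main CPU absorbs the overflow.
    return 1.0 - (net_fraction - relative_speed)


if __name__ == "__main__":
    # Hypothetical load: networking would eat a quarter of the main CPU.
    print(app_share_without_offload(0.25))
    print(app_share_with_offload(0.25, relative_speed=0.5))
```

      In this model a secondary processor half as fast as the main CPU still frees the main CPU completely, as long as the network load stays below what it can absorb.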
    36. Re:White elephant? by Anonymous Coward · · Score: 0

      shutup

    37. Re:White elephant? by Taladar · · Score: 1

      That doesn't change the fact that fast networking needs I/O performance, not computational performance, and that can't be achieved by moving the processing work into another component, only by upgrading the slower parts of the PC like the PCI bus.

    38. Re:White elephant? by Anonymous Coward · · Score: 0

      that was pre-out of order execution. Nowadays, the processor can start to execute code that relies on the data that hasn't been processed yet, and once it gets to the point where it needs that data, it is hopefully available for it to execute it. Otherwise it waits a bit. Also keep in mind this is a SERVER technology. With today's servers having 10 or more gigabit ethernet connections, an entire processor can end up consumed by TCP/IP overhead. If Intel was billing this as the next great thing for desktop PCs I'd agree it's a white elephant. However, servers really need this technology in order to light up all that dark fiber effectively ;)

    39. Re:White elephant? by jeanicinq · · Score: 1

      offloading network processing normally isn't useful

      With the implementation of Vanderpool (VMX) instructions, I doubt the goal was to offload network processing just to benefit one kernel. I see this as a move to take the TCP/IP stack out of the kernel. That would make it easier to run multiple kernels on the same machine using VMX, with each one accessing the TCP/IP hardware directly. Each kernel/system would benefit overall.

    40. Re:White elephant? by Anonymous Coward · · Score: 0

      desktop machine Intel are working on
      You mean 'rackmount server', not 'desktop machine' and the word 'are' should be replaced with the word 'is' unless you want to sound like a redneck, in which case you need to add a 'be' after the word 'are'.. HTH. HAND.

    41. Re:White elephant? by Taladar · · Score: 1

      Chances are that if you really need 10 Gbit/s networking you will have mostly network-related tasks running.

    42. Re:White elephant? by ratboy666 · · Score: 1

      Back to the question...

      Yes, I agree that "Catalyst" or whatever makes routing decisions very fast. And ASICs can help.

      But... How does this accelerate "scp" (just an example)? The data has to be read, has to be encrypted, and then has to be sent.

      What fraction of this does your blessed "Catalyst" or ANY special-purpose ASIC accelerate? Checksums - check. Possibly a single buffer copy - check.

      In other words - YOU WILL NOT SEE MORE THAN 2X IMPROVEMENT.

      Ok, maybe... because it's scp. Ok, try with FTP.. What!?! The results are the same!?!

      Pick an example that HAS TO DEAL WITH THE DATA, and illustrate how that is improved.

      You do get the idea. To quote you: "but that flexibility comes with a price".

      Ratboy

      --
      Just another "Cubible(sic) Joe" 2 17 3061
    43. Re:White elephant? by sconeu · · Score: 3, Informative

      Bullshit.

      I used to work at a company that did Fibre Channel.
      One of the things we had was an ASIC that did network processing in hardware, allowing us to do all sorts of interesting stuff at wire speed (2Gbps). If we had to load into memory we would have been at least an order of magnitude slower.

      --
      General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
    44. Re:White elephant? by Anonymous Coward · · Score: 0

      And some of us want that CPU doing useful work like dynamic content creation instead of spending all its time doing gruntwork like TCP.

    45. Re:White elephant? by CoolGuySteve · · Score: 1

      That would be more reasonable if you didn't have to have CPU interrupts for every frame as well as the occasional gettimeofday(). These are the things that kill CPU performance when there's a high speed interface involved.

      Also, in many cases, all you want to do is forward a packet out another interface and it would be cool if the kernel didn't need to be bothered with such a simple task.

      There is a need for accelerated TCP/IP. I suspect this technology is meant for clustering, where the packets can get huge but the frames are relatively tiny and bothersome.
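
      To put a number on "interrupts for every frame": here is a back-of-the-envelope sketch of how many frames per second a gigabit link can deliver. It assumes standard Ethernet framing (8 bytes of preamble plus start delimiter and a 12-byte interframe gap around each frame) and the worst case of one interrupt per frame with no coalescing or polling; the function name is made up for illustration.

```python
PREAMBLE_AND_SFD = 8   # bytes sent before every frame
INTERFRAME_GAP = 12    # idle byte times required after every frame


def frames_per_second(link_bps, frame_bytes):
    """Maximum frame rate for frames of the given size (header and FCS included)."""
    wire_bytes = frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP
    return link_bps / (wire_bytes * 8)


if __name__ == "__main__":
    gige = 1_000_000_000
    print(int(frames_per_second(gige, 64)))    # minimum-size frames: 1488095
    print(int(frames_per_second(gige, 1518)))  # full-size frames: 81274
```

      So a naive one-interrupt-per-frame driver would see somewhere between roughly 81 thousand and 1.5 million interrupts a second on a saturated gigabit link, depending on frame size.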

    46. Re:White elephant? by Rei · · Score: 1

      Even servers have multiple tasks to run, you know. It's not like we're talking about clients bouncing UDP packets back and forth like a game of Pong (hmm... neat program idea there... ;) ). They're doing stuff with the data. Only in the case of "Only connecting to one client, and I have to process what they send me before I can get another packet from them" is the situation as described. Sounds like a pretty rare situation to me. Only in that situation or the "on-server processing is trivial" cases would this card be pointless. I doubt you'll find most servers to be this way.

      --
      "Lock and load, Brides of Christ!"
    47. Re:White elephant? by Nevyn · · Score: 1
      That would be more reasonable if you didn't have to have CPU interrupts for every frame

      Hint: This is false for both FreeBSD and Linux.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    48. Re:White elephant? by bluGill · · Score: 1

      I haven't done benchmarks, but my guess is the Cisco router is an order of magnitude SLOWER processing any one socket. Cisco optimizes their products for passing packets based on the IP header. They rarely look at the TCP/UDP part at all.

      Often IP routing is done in hardware, but there is a slow trap called for every packet destined to the router. The CPU then slowly switches modes, processes the packet, and deals with it. Since the router should never receive many packets they put little effort into optimizing this part of the process. Linux by comparison often runs things like a webserver, meaning it often has to process a packet destined for it, so they put effort into this part.

      Simple economics, and programming 101. Do not optimize code that is rarely used. Cisco managers should fire anyone who wastes time optimizing code that isn't a bottleneck.

      Note that firewalls muck this up a little because the CPU may need to process many packets, but they still are not packets destined for the router.

    49. Re:White elephant? by borud · · Score: 1
      during the past 15 years a lot of things have happened with regard to CPUs and the way we write software and indeed what our software does. you should take this into consideration and periodically re-evaluate your axioms. what you find might surprise you.

      for instance, if you look at the past 5 years, CPUs have become incredibly fast, while memory latencies haven't really improved that much. for instance this means that in many scenarios, caching intermediate results in large tables may not always be faster than re-calculating them. it can also mean that memory locality is important for some applications.

      none of my classical textbooks even consider this. they more or less take for granted that the world isn't changing.

    50. Re:White elephant? by AaronW · · Score: 2, Interesting

      As someone who writes firmware that runs on a network processor, I can attest that the networking hardware is usually very different. For example, although the network processor runs at a lowly 133MHz it is able to forward 1.5 million packets/second when performing basic routing. I don't think there's any way a traditional processor could keep up. For one, the network processor has a massive amount of memory bandwidth and low latency, using SRAM for the tables. Other network processors use content addressable memory. To give an idea of memory bandwidth, one of the network processors I'm looking closely at has 11 separate memory interfaces to the chip, a mix of SRAM and DRAM.

      Most of the work in the router is pattern matching, with a lot of lookups of various parts of the packet. The route lookup is actually one of the least expensive operations compared to all of the other operations that need to take place. Most of the high-end network processors have dedicated pattern matching hardware to speed up these operations and do not rely on caches which break when a lot of flows are active.

      One operation we found to be very painful in a traditional processor is trying to shape traffic, i.e. using the Linux shaping options. Dedicated hardware, on the other hand, has no problems with this.

      Now, I doubt many of these features would help much in a workstation or server. The only thing I can think of that would help significantly is some of the security operations, like being able to offload encryption or do ACL lookups in hardware, much like how NVidia does this in their new NForce chipset. ACLs tend to be very expensive in terms of CPU cycles.

      --
      This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    51. Re:White elephant? by Anonymous Coward · · Score: 0

      Networking isn't slow because of lack of parallelism, it's slow because a general purpose microprocessor isn't designed specifically for the task. TCP/IP has very fixed resources it uses to keep track of connections and offloading this to a custom ASIC would definitely result in measurable speed increases with heavy network activity.

      Keep in mind Intel already makes highly specialized network processors such as the IXP1200 and IXP2100 that have six true 64-bit datapath processors (with 128 registers each), a hardware hash table, hardware context switching control, queuable memory controllers, an ARM control processor and 64-bit I/O FIFOs all on one die.

    52. Re:White elephant? by Fweeky · · Score: 1

      "You cannot accelerate networking very much because the problem is highly serial."

      It's serial at line-level (usually), but that doesn't mean you can't process frames/packets in parallel. At the very least you can offload things like checksums, and process packets for different connections in parallel; you can even do simple things like aligning packet payloads on page boundaries so the OS doesn't have to copy them about needlessly (zero-copy is good when you're talking 100's of MB/s, no?). On top of that, I'm pretty sure you can process multiple packets for a single connection in parallel since you can fill different bits of your socket window from multiple packets at once.

      "You have all that CPU for free;"

      Er, no, you don't; you just said it's been paid for, it's certainly not free. In the case of a heavily loaded server, you don't want half your CPU doing little more than copying memory between the network and your application, doing tedious packet reassembly, calculating checksums and servicing more interrupts than necessary; you want it running your application code to generate that data.
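
      For anyone wondering what the checksum work actually is, here is a minimal sketch of the 16-bit ones' complement Internet checksum used by IP, TCP and UDP, as described in RFC 1071. It is written for clarity rather than speed, and it covers only the sum itself (a real TCP checksum also includes a pseudo-header, which is left out here). The test bytes are the example from the RFC.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:             # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    print(hex(internet_checksum(sample)))  # 0x220d
```

      Every byte of every packet goes through that loop, which is why it is such an obvious candidate for the NIC to do on the way past.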

    53. Re:White elephant? by Anonymous Coward · · Score: 0

      Ah, *that's* why your multi GHz, multi GB computer was unable to tell you the difference between ITS and IT'S!

    54. Re:White elephant? by Anonymous Coward · · Score: 0

      Except twenty years ago when the Amiga came out with co-processors for graphics and sound. THEN it was bad! Bad Amiga!

    55. Re:White elephant? by aminorex · · Score: 2, Insightful

      The IO processor can be made to do the task much faster than the CPU, because it is not a general-purpose chip. It implements in hardware what the CPU would implement in software. As a result, it costs much less to produce. These are the same considerations that apply to graphics pipelines. It would be grossly economically infeasible to implement the functions of a high-end GPU on the CPU, in part because it's on the wrong end of a bus.

      --
      -I like my women like I like my tea: green-
    56. Re:White elephant? by innosent · · Score: 1

      You are forgetting a crucial point in high-end network routing... You have to DO SOMETHING with those packets, like filter them through your firewall rules, NAT them, parse them, whatever. Offloading TCP/IP to specialized hardware means that the CPU(s) is/are free to do actual work. Even routing at GbE speeds limits the processing you can do with those packets at wire speed, and falling behind is just not something you can allow to happen (when/how are you going to catch up if you have a saturated or near-saturated link?). You have to decode packets, verify checksums, process the data (probably through several layers), calculate the new checksums, and encode the packets to send. If you offload the network protocols, all you have to do is process the data.

      --
      --That's the point of being root, you can do anything you want, even if it's stupid.
    57. Re:White elephant? by Anonymous Coward · · Score: 0

      Those 3 pound illuminated heatsinks aren't cutting it?

    58. Re:White elephant? by chainsaw1 · · Score: 1

      Right. The pure essence of running network operations may not benefit, but the implementations we tend to use do. In addition to checksumming, I think having hardware support on the NIC for compression, encryption, (firewall rule processing?) would be very helpful...even though direct ethernet communication doesn't need any of this.

      --
      - Sig
    59. Re:White elephant? by Anonymous Coward · · Score: 0

      the main CPU is normally blocked in it's task

      "its".

  5. Fastest network card EVAR by Anonymous Coward · · Score: 4, Funny

    I was one of the lucky few who beta tested this. The plus side is you can overclock your network card to download faster than the remote server bandwidth. I did not try it, but I would be able to slashdot the slashdot.org website just by browsing it.

  6. Security updates by KiloByte · · Score: 4, Funny

    As we know damn well, shit happens all the time.

    So... how exactly are they going to ship patches in the case of a security issue?

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
    1. Re:Security updates by TheRagingTowel · · Score: 2, Informative

      Flash memory. It's been done all the time.

      --
      4Z5TX
    2. Re:Security updates by Capt'n+Hector · · Score: 1

      Easily, that's how. Very, very easily.

      --
      Quid festinatio swallonis est aetherfuga inonusti?
      Africus aut Europaeus?
    3. Re:Security updates by Anonymous Coward · · Score: 0

      updating flash memory has always been a scary thing to do. hopefully the code is good enough so you would not have to flash it all of the time.

    4. Re:Security updates by Florian+Weimer · · Score: 1

      So... how exactly are they going to ship patches in the case of a security issue?

      Typically, the host system driver uploads the firmware code that deals with all non-essential features (obviously, booting from network already needs most of the firmware).

    5. Re:Security updates by tji · · Score: 1

      > So... how exactly are they going to ship patches in the case of a security issue?

      That's a good point. Depending on how this is implemented, a security fix could be a big problem. If they used an ASIC on the NIC card, you might be looking at a swap-out upgrade.

      If they use something that is firmware upgradable, it generally wouldn't have the performance of a true hardware solution.

      Although, TCP/IP is a pretty well established standard, so I think security issues in individual packet processing are unlikely. There is not a lot of security happening in TCP/IP.. it's known to be insecure. The complex security operations are happening at other layers (an IPSec tunnel, or application-specific security).

      If a problem was found in the network offload hardware, you would just fall back to a standard processing scenario - where all the network processing is done on the host CPU.

  7. the good, the bad, the ugly? by Interfacer · · Score: 1, Interesting

    It seems such an obvious thing: make a tcp/ip processor, put it on a NIC and give it a high level interface, instead of just a low level IP interface.

    makes you wonder why nobody has done it before...

    maybe this is some plan of Intel's to control the internet: add some secret DRM capability to it, wait until everyone is using it, and then take over the world.
    Or -door number 2- sell your services to the NSA.

    1. Re:the good, the bad, the ugly? by Anonymous Coward · · Score: 0

      It has been done before.

    2. Re:the good, the bad, the ugly? by DietCoke · · Score: 2, Interesting

      The problem is that you're still dealing with a bottleneck at the system bus, AFAIK. I installed a CAT-6 network at home today and had to do quite a bit of reading to determine whether it was worth doing. I read in numerous places that with a gigabit network you essentially need a 1GHz processor just to keep up with the data coming in. Now, placing that processor on the NIC might make sense, but it would seem to me that it'd still have to be at least equal to the main processor to be able to handle the data in a steady stream.

      I can't claim to be an expert in this subject, but that's the situation as I've understood it.

    3. Re:the good, the bad, the ugly? by pc486 · · Score: 3, Insightful

      I can't believe the parent got modded up. This kind of thing has been done before (RTFA. Yeah yeah, I know. I must be new here...). It's called TOE (TCP Offload Engine) and many networking companies have done TOE. However, most cards are expensive and don't have much support across platforms.

      What's new here is that Intel wants to put this in their chipsets everywhere and not just in $700+ NICs. Already this has been happening with checksum offloading, TCP fragmentation, smart interrupts, and so on in most GigE chips.

      So yes, people have done this before and have been since at least 2000.

      As far as DRM is concerned, look at the NIC market and look at the TCP/IP spec. TCP/IP? Standard and anything non-standard won't work with stuff that's out there. Weird NICs? I've been getting Linux source-code drivers for even the cheapest of cheap NICs for years now. There's too much competition to sneak in something restrictive.

    4. Re:the good, the bad, the ugly? by rf0 · · Score: 1

      The more things are abstracted from the user the less we know about what is going on. Of course I'm being totally paranoid but it does open the way to easier ethernet tapping I suppose

      Rus

    5. Re:the good, the bad, the ugly? by igb · · Score: 2, Interesting
      It's been done many times before. A company called CMC made a 3U VME board which provided full TCP offload to System V machines --- I ported it into an SVR3 system and ported Lachman's NFS product to run over it. Sun shipped an Omniserve (or somesuch name) product as the NC400 and NC600 for the 4/4X0 and 4/6X0 range which offloaded quite a lot of NFS and XDR protocol overhead, as well as some of TCP. Neither of these products was unique.

      Less generically, the original Auspex NFS servers had distinct boards for Ethernet, Network and File processing, which managed to do TCP offload _and_ zero copy.

      With the exception of the Auspex example, most of these cards were rapidly obsolete because the overhead of copying the network traffic to and from the offload card is greater than the work involved in doing the processing. You can't do a zero-copy without a huge amount of scaffolding in the OS.

      Anyway, 3Com had a card which did this a couple of years ago. It sank without trace.

      ian

    6. Re:the good, the bad, the ugly? by Anonymous Coward · · Score: 0

      ?????

      No. Modern buses such as PCI-X can shift enough data for several NICs at full whack. The really funny bit is that if you process all those TCP packets in hardware you have to transfer less data over the bus, because you've already stripped off the Ethernet frame headers, the IP header, and probably most of the TCP headers. You only need to pass lumps of raw data over the bus, instead of entire Ethernet frames as you do with a normal NIC.
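
      How much bus traffic the stripped headers save depends on the payload size. A rough sketch, assuming a 14-byte Ethernet header and IPv4 and TCP headers with no options (20 bytes each), and assuming the frame check sequence never crosses the bus in either case; `header_share` is just a name made up for this example:

```python
ETH_HEADER = 14   # destination MAC, source MAC, EtherType
IPV4_HEADER = 20  # assumes no IP options
TCP_HEADER = 20   # assumes no TCP options


def header_share(payload_bytes):
    """Fraction of the bytes crossing the bus that are headers, not payload."""
    headers = ETH_HEADER + IPV4_HEADER + TCP_HEADER
    return headers / (headers + payload_bytes)


if __name__ == "__main__":
    print(f"{header_share(1460):.1%}")  # full-size segment: 3.6%
    print(f"{header_share(64):.1%}")    # small segment: 45.8%
```

      Under those assumptions the saving is only a few percent for bulk transfers with full-size segments, but close to half the bus traffic when the payloads are small.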

    7. Re:the good, the bad, the ugly? by Steamboater · · Score: 1


      A 2 GHz Opteron running Solaris 10 will saturate a 1Gb network (no jumbo frames) using about 8% of the CPU. For 10 Gb and PCI-Express such an accelerator might be useful... as long as it does ipv4, ipv6, ipsec, vlan, etc, etc.

      The one great advantage of doing TCP/IP largely in software is that new features, protections against DOS attacks, etc, are easily added.

      - Bart

  8. Ethernet controllers by Anonymous Coward · · Score: 3, Interesting

    What is needed more is a high-speed bus for network interfaces, as gigabit ethernet becomes more common. Even if a gigabit adapter had a whole 32-bit PCI bus to itself, it could still easily saturate it.

    It seems like most common denominator board manufacturers have put off 64-bit PCI support for too long. It's going to bite them in the ass if it doesn't become standard very soon.

    1. Re:Ethernet controllers by kernelistic · · Score: 1

      This has come to be in the server space. Select servers (Usually mid-level and higher) from the likes of Dell have had 64-bit PCI slots in them for at least 4 years now.

      It is becoming more common to see onboard ethernet cards in user systems as it frees up a PCI slot. There isn't any reason (Cost aside) as to why these cards could not be interfaced to existing 133-MHz PCI-X bridges.

      Remember that a 64-bit bus alone does not give you extra throughput. Transferring data at higher clock rates, on edge and level will. There are even 64bit/33MHz slots around and they offer very little advantage over 32bit/33MHz ones...

    2. Re:Ethernet controllers by afidel · · Score: 5, Insightful

      No, a gigabit adapter can't saturate a PCI bus by itself, 32bit 33MHz PCI is 133MB/s, gigabit is 100MB/s. Then there is 32bit 66MHz PCI, and if you want you could run a 32bit card at 133MHz as the standard supports it (though I've never heard of such a card; if you need 133MHz you generally also need 64bit, but I assume an ADC could use the faster speed without needing the wider word size). The fastest current implementation of the slot local bus is 16 channel PCI-express which could handle 4 10gigabit adapters. The problem would be coming up with enough data to keep those pipes full, no disk subsystem is fast enough, and any meaningful SQL transactions are going to be CPU limited on even the biggest of servers, so why would you need a bus with more bandwidth than that? Add to this the fact that servers which actually need more throughput have long had the faster PCI slots and you realize that it's not a problem in the real world.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:Ethernet controllers by zackeller · · Score: 1

      I see it being bypassed by PCI-E. Even PCI-E 1x is fast enough for a gigabit interface, and it's already on almost all new motherboards. We'll see how well it does once cards actually come out for it.

    4. Re:Ethernet controllers by Anonymous Coward · · Score: 2, Informative

      You got the PCI bandwidth correct, but your gigabit bandwidth is a hair off. Depending on how you define "giga" (base 10 or base 2), you get the following numbers:

      a) Gigabit/sec = 1000 Mbit/sec = 125MByte/sec
      b) Gigabit/sec = 1024 Mbit/sec = 128MByte/sec

      True, even these speeds don't completely saturate the PCI bus, though because of how the PCI bus is shared (each device gets a few clock cycles to do its thing before passing control off to the next device) no single device could anyway unless it's the ONLY thing on the PCI bus. It certainly will saturate (or come dang close to it) when it has its moment of control though.
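
      The figures being thrown around here are easy to reproduce. A quick sketch, assuming a parallel bus moves one full-width word per clock at best (theoretical peak, ignoring arbitration and protocol overhead) and taking the nominal PCI clock as 33.33 MHz; the helper names are made up:

```python
def parallel_bus_peak_mb_s(width_bits, clock_mhz):
    """Theoretical peak of a bus moving one word per clock, in MB/s."""
    return width_bits / 8 * clock_mhz


def link_mb_s(bits_per_second):
    """Convert a serial link rate in bit/s to MB/s (base 10)."""
    return bits_per_second / 8 / 1e6


if __name__ == "__main__":
    print(round(parallel_bus_peak_mb_s(32, 33.33)))  # plain PCI: 133
    print(round(parallel_bus_peak_mb_s(64, 66.66)))  # 64-bit/66MHz: 533
    print(link_mb_s(1e9))                            # gigabit, one way: 125.0
```

      So one direction of gigabit traffic fits under the 133 MB/s peak of plain PCI with very little room to spare, and that peak is shared with everything else on the bus.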

    5. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      Most PCI-E Motherboards will have Gigabit onboard.

      I'm not sure about the quality (=speed/reliability) of these inbuilt interfaces though.

    6. Re:Ethernet controllers by Jeff+DeMaagd · · Score: 1

      Most currently sold chipsets provide a network interface right into the chipset as its own port, bypassing the PCI bus. The same is done with on-board IDE/ATA/SATA controllers, audio, USB, Firewire and such.

    7. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      And if the majority of the ethernet adapter's bus traffic is data, you can bet that whatever's left will be taken up by instructions.

    8. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      My understanding of the PCI bus is limited at best, but don't most net adapters have bus mastering capability? It'll just slap the PCI bus controller in the face when it says "hey you, time's up!"

    9. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      They are talking about network interfaces, in the network world prefixes are base 10.

    10. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      No, a gigabit adapter can't saturate a PCI bus by itself


      And what about full duplex mode...?
    11. Re:Ethernet controllers by lachlan76 · · Score: 1

      There is more than one device on the PCI bus...

    12. Re:Ethernet controllers by Anonymous Coward · · Score: 0
      ...and any meaningful SQL transactions are going to be CPU limited on even the biggest of servers, so why would you need a bus with more bandwidth than that?
      Well, not everyone's doing SQL. A big video server, for example, could fill up a gigE pipe without too much difficulty. But I agree with you that such a beast would almost certainly not be using 32-bit PCI.
    13. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      Says who? Your PC has multiple PCI buses. Your AGP graphics card sits on a bus of its own (PCI bus -> PCI to AGP bridge -> AGP device)

    14. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      ...which is why most server class motherboards have multiple PCI busses; one for each card.

    15. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      No. This is a common misconception but utterly wrong. Stop thinking of a PCI bus as "A set of slots on my motherboard". A PCI bus is a communication bus for peripheral devices. The "Southbridge" which contains all those integrated peripherals has a PCI controller and a PCI bus, and those peripheral controllers sit on that internal PCI bus. They appear as PCI devices when the operating system scans the PCI buses for PCI devices, because they are PCI devices on a PCI bus.

      Putting a GigE controller on an integrated chip on the motherboard doesn't magically get around the bandwidth limitations of PCI because the controller is still a PCI device.

    16. Re:Ethernet controllers by Chatsubo · · Score: 1

      So, what we really need is hardware SQL.

      --
      > no, yes, maybe (tagging beta)
    17. Re:Ethernet controllers by jpc · · Score: 2, Informative


      gigabit is full duplex - double your figures.

      But new motherboards are already starting to come with gigabit attached to PCI Express. For the last few years any decent board has had them on fast PCI-X, at least 64 bit 66 MHz.

    18. Re:Ethernet controllers by Matt_Bennett · · Score: 5, Insightful
      The critical aspect you leave out is that Gigabit ethernet is (inherently) Full Duplex. That means that a 32/33 PCI bus would be saturated at a gigabit out, but have no bandwidth for anything incoming.

      In truth, a gigabit ethernet card can saturate a 1X PCI-E link (2Gb/s after the 8B/10B encoding is removed), when sending small packets- basically due to packet overhead.
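
      The full-duplex point works out like this. A sketch using theoretical peaks only: 32-bit PCI at a nominal 33.33 MHz is treated as one pool shared by both directions, while a first-generation PCI-E lane is taken as 2.5 GT/s in each direction with 8b/10b encoding. Packet-level overhead on either bus is not modelled, and the names are invented for the example.

```python
PCI_32_33_MB_S = 4 * 33.33          # shared by both directions and all devices
GIGE_ONE_WAY_MB_S = 1e9 / 8 / 1e6   # 125 MB/s per direction


def pci_headroom_mb_s(directions):
    """Spare PCI bandwidth with gigabit running flat out in 1 or 2 directions."""
    return PCI_32_33_MB_S - directions * GIGE_ONE_WAY_MB_S


def pcie_x1_mb_s_per_direction(signaling_gt_s=2.5):
    """Payload rate of one lane after 8b/10b encoding, in MB/s, each way."""
    return signaling_gt_s * 1e9 * 8 / 10 / 8 / 1e6


if __name__ == "__main__":
    print(pci_headroom_mb_s(1) > 0)      # True: one direction just about fits
    print(pci_headroom_mb_s(2) < 0)      # True: both directions do not
    print(pcie_x1_mb_s_per_direction())  # 250.0
```

      On raw numbers, then, plain PCI cannot carry both directions at once, while a 1X PCI-E link has 250 MB/s each way before any packet overhead is taken off.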

    19. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      In truth, a gigabit ethernet card can saturate a 1X PCI-E link (2Gb/s after the 8B/10B encoding is removed), when sending small packets- basically due to packet overhead.


      That doesn't match the studies I saw. The packet overhead exists also at the ethernet layer, so the PCI-E link doesn't become the bottleneck either. The numbers I saw were that a 10Gb/s ethernet card could be handled with a 6x PCI-E link (which doesn't really exist). This is one of the reasons why some companies wanted Gen-2 signaling to be 6.5 Gb/s instead, so that a 2x link could fully handle 10Gb/s ethernet (it appears that the 5Gb/s camp won, so one will need a 4x link to fully handle 10Gb/s traffic...).

      So going back to Gen1 signaling and a 1x link, 1Gb/s should be handled easily.

      But you might know better than I do. Don't hesitate to argue your case, I'm interested to know if I am wrong.
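
      The lane counts can be sanity-checked with a little arithmetic. This sketch assumes 8b/10b encoding at every signaling rate and ignores PCI-E packet overhead entirely, so it gives a lower bound on the lanes needed (which may be why the studies quoted 6x rather than the 5 lanes computed here). The function names are made up.

```python
import math

STANDARD_WIDTHS = (1, 2, 4, 8, 16, 32)  # link widths that actually exist


def lane_payload_gbps(signaling_gt_s):
    """Usable Gb/s per lane per direction after 8b/10b encoding."""
    return signaling_gt_s * 8 / 10


def lanes_needed(target_gbps, signaling_gt_s):
    """Minimum lane count to carry target_gbps in one direction."""
    return math.ceil(target_gbps / lane_payload_gbps(signaling_gt_s))


def standard_link_width(lanes):
    """Smallest real link width with at least this many lanes."""
    return next(w for w in STANDARD_WIDTHS if w >= lanes)


if __name__ == "__main__":
    for rate in (2.5, 5.0, 6.5):
        n = lanes_needed(10, rate)
        print(rate, n, standard_link_width(n))
    # 2.5 GT/s -> 5 lanes -> 8x link
    # 5.0 GT/s -> 3 lanes -> 4x link
    # 6.5 GT/s -> 2 lanes -> 2x link
```

      For a single 1Gb/s port on a Gen1 1x link, the same arithmetic gives 2 Gb/s of lane payload each way against 1 Gb/s of traffic each way.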

    20. Re:Ethernet controllers by Anonymous Coward · · Score: 0

      So... you are telling me that I've got a 2 Gbit card for 1 Gbit price ???
      I don't believe any PR department wouldn't yell very high about this if it really was true...

    21. Re:Ethernet controllers by sirsnork · · Score: 1

      How very wrong you are. Intel's chipsets have CSA. Those are the only chipsets that have a "dedicated" port for ethernet. All the other things you mention are still connected to the PCI bus

      --

      Normal people worry me!
  9. nvidia by Ecio · · Score: 5, Interesting

    Isn't Nvidia doing the same with its new nForce series motherboards? Lowering CPU usage by adding network management code and an SPI firewall inside the chipset?

    1. Re:nvidia by Intocabile · · Score: 1

      I Was going to say the same thing.

    2. Re:nvidia by bersl2 · · Score: 0, Offtopic

      From what I've heard, nVidia's implementation is sucking major ass.

    3. Re:nvidia by MatthewNewberg · · Score: 2, Interesting

      I've used both Nvidia, and 3com, and switched back and forth so many times (I had both onboard until the board fired).. It doesn't seem to affect anything at all (including cpu usage). Then again I wasn't pushing more than 10mbits/sec across the network or using a lot of connections.

    4. Re:nvidia by Glock27 · · Score: 3, Informative
      Isn't Nvidia doing the same with its new nForce series motherboards? Lowering CPU usage by adding network management code and an SPI firewall inside the chipset?

      Yes. The nForce4 chipsets offload most TCP/IP processing and firewall from the main CPU.

      If you go with an Athlon64 Socket 939 nForce4 board, you get PCI Express, lower power consumption, a ton of great features, good Linux support, and plug-compatible dual core upgrades down the road. Intel's offerings just seem anemic by comparison.

      (Personally, I'd also do an NVIDIA graphics board for the excellent Linux driver support. And no, I don't work for NVIDIA, I'm just a satisfied customer.)

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
  10. Interesting by miyako · · Score: 4, Insightful

    This seems interesting, though given Intel's track record I wonder if it will really be as useful as they are speculating, as the article has no real technical information.
    Granted, I've never administered a server that was under anywhere remotely near the types of loads we are talking about for this to be useful, but I have a hard time imagining that dealing with the TCP/IP stack would be more intensive than running applications (as the article claims).
    So, for all you people out there much more qualified to discuss this than I am, will having some part of the processor dedicated to handling TCP/IP really speed things up, or is this primarily a marketing technology?

    --
    Famous Last Words: "hmm...wikipedia says it's edible"
    1. Re:Interesting by AutumnLeaf · · Score: 2, Insightful

      I've seen extremely beefy NFS file-servers go into a crash-reboot-crash cycle after the first crash because all of the hosts trying to remount the filesystem completely crush the machine before it is fully up to speed. We've had to unplug the network cables on the server to prevent the mount storm from killing the server again.

      Note, this is enterprise-grade hardware hooked up to million-dollar disk arrays.

      Now, is that entirely from dealing with the networking stack? No. Not quite. However, consider this. It takes time to checksum headers and data. It takes time to unwrap packets. If you have a ton of clients raining requests for data on your server, it's not hard to see that dealing with the networking bookkeeping could impact the throughput of requests. Database servers and web servers are two things that come to mind here in addition to file servers.

      Btw, note that this is another part of the "platform" initiative/orientation. While Intel's track-record has not been great in many respects, they do have a good track-record of success with "platforms." E.g., Centrino was a "platform."

    2. Re:Interesting by Anonymous Coward · · Score: 1, Insightful

      Patch your OS, it should not crash due to high load, ever.

    3. Re:Interesting by Bill+Wong · · Score: 1

      Actually, I have a feeling that his hardware is not as "Enterprise-grade" as he thinks... Million-dollar disk arrays tend to be built because storage is needed for data worth millions+. One doesn't build million-dollar disk arrays on a whim; of course, no one has the budget anymore... So, one would hope that he would have heavily clustered and load-balanced servers to prevent these crash-reboot-crash cycles... All of our file servers at work are clustered in groups of 4+ servers... Heck, even our Legato backup cluster, well, is clustered...

  11. Qlogic TOE cards by jsimon12 · · Score: 5, Informative

    Uh, this isn't new, Qlogic has been doing it for some time now, in their TOE cards (TCP Offload Engine). The cards are smoking, especially on Solaris, cause Sun's TCP stack is crappy.

    1. Re:Qlogic TOE cards by Anonymous Coward · · Score: 1, Insightful

      I'm guessing with sweeping comments such as "Sun's TCP stack is crappy" you've extensively tested Solaris 10? Nice to know there's people giving expert opinions on cutting edge software so that people like me don't have to form factually based opinions.

    2. Re:Qlogic TOE cards by Anonymous Coward · · Score: 0

      And Linux's TCP stack is made of gold? I guess that's why it gets torn out and replaced every two years.

    3. Re:Qlogic TOE cards by incubuz1980 · · Score: 2, Informative

      The Solaris TCP/IP stack has been greatly improved in Solaris 10. There really is a BIG difference compared to older versions of Solaris.

    4. Re:Qlogic TOE cards by Anonymous Coward · · Score: 0

      I wasn't arguing against Solaris. Solaris has had a great TCP/IP stack for years, despite its shortcomings in the past *cough* sequence number generation.

    5. Re:Qlogic TOE cards by jsimon12 · · Score: 1

      The Solaris TCP/IP stack has been greatly improved in Solaris 10...

      You mean brought up to par with the rest of the computing industry? When they improve it to the point of better or at least on par performance with the *BSDs then we can talk. :)

    6. Re:Qlogic TOE cards by Anonymous Coward · · Score: 0

      > The cards are smoking, especially on Solaris, cause Sun's TCP stack is crappy.

      Do you have some reference with test data to back this up? You have a lot of posts making wild assertions, but never any backup data.

  12. Re:A good thing by SinaSa · · Score: 1

    With the ever growing popularity of fluff statements like this one, I think a statement like the parent may yield no real benefits to this discussion.

    --
    --
    The last digit of pi is four.
  13. yeah great by Anonymous Coward · · Score: 5, Funny

    soon it will be dedicated processor and RAM to deal with tcp, then a dedicated processor for the keyboard input, then a dedicated processor for the fans and a special dedicated processor on 12" PCI-X card for the extremely computationally intensive MOUSE, actually this will have its own special dedicated path called 'AMP' or Accelerated Mouse Port. Mice of the future will need much more bandwidth than today. About 16 GB i/o so they need their own data paths.

    And then there will be other enhancements like the tcp/ip one.

    For instance a special accelerator card for Word and Internet Explorer will be developed.

    Furious Linux users will demand their own technology, so one manufacturer will come up with a special card for running GNOME apps. This card will have 4 duel core 6 Ghz processors and allow Gnome to run at normal speeds.

    1. Re:yeah great by burns210 · · Score: 1

      I always thought having components offloaded to their own cards made sense (the way OS X offloads video to the video card). Network offload to the NIC, sound to the sound card, etc. Why not? Given that 100MHz+ processors are becoming dirt cheap, using them to take on processor load only makes sense, freeing time for the system CPU to move on to better things.

    2. Re:yeah great by myspys · · Score: 1

      you know the end of this story, don't you?

      the amiga, of course!

      dead it might be, but it was still a beautiful design!

    3. Re:yeah great by ceeam · · Score: 2, Funny

      But then - imagine that - a single Z80 would suffice to act as a _C_PU commanding all those!

    4. Re:yeah great by Anonymous Coward · · Score: 1, Interesting

      And don't forget all this extra power will be used up by the anti-virus product that will be required according to the company policy.

      No problem for the managers that run desktops that have enough compute power to launch a space shuttle every 2 seconds, and are used to show nice screensavers, but a pain in the butt for me who still tries to replace his Pentium 90 CPU with something that is socket compatible with a "nice for home" giveaway.

      --ac for obvious reasons.

    5. Re:yeah great by yem · · Score: 2, Insightful

      I didn't know whether to mod you interesting or funny :-)

      Parallelism is great. Look at the way things are going. Dual CPU motherboards, Dual core CPUs, Cell..

      And gnome.. sheesh.. back when I ran a P100 and Gnome was slow, I thought "well one day I'll have a 500Mhz monster and Gnome will be fast". Here I am with a P4-2.6Ghz/1Gb and Gnome is STILL a dog. *sigh*

      --
      No, I did not read the f***ing article!
    6. Re:yeah great by Terrasque · · Score: 0

      This card will have 4 duel core 6 Ghz processors and allow Gnome to run at normal speeds.

      Ouch! I was drinking, and almost ruined my keyboard reading that :-(

      I guess the moral of the story is: Don't drink and slashdot

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    7. Re:yeah great by master_p · · Score: 1

      All these custom chips remind me of the Amiga. Back then, custom chips were considered unnecessary. Now PCs are full of them (custom chips, that is). It's funny how the world goes around...

  14. Re:A good thing by Quobobo · · Score: 5, Funny

    Newly discovered, a simple and easy karma-gaining method! Amaze your friends, and become more eligible to moderate!

    1. Refresh your browser constantly until there's a new story on Slashdot, to post before everyone else.

    2. Post something similar to "This is good/bad, for INSERT_OBVIOUS_REASON_HERE. And fuck the INSERT_RIAA-LIKE_ORGANIZATION_HERE." (second sentence is optional)

  15. Will it support IPv6? by arc.light · · Score: 4, Interesting

    The article doesn't say, and I'd hate to be "stuck" with a card that only does IPv4. Yeah, I know, hardly anyone uses IPv6 today, but the nations of China and Japan, as well as the US DoD, are starting to roll out IPv6 networks in a big way.

    1. Re:Will it support IPv6? by Anonymous Coward · · Score: 0

      My words exactly!

      Will it support IPv6? How about other Transport Layer protocols like SCTP?
      Also keep in mind that TCP is not a static thing it is constantly evolving, especially the congestion avoidance and control algorithms.

    2. Re:Will it support IPv6? by tji · · Score: 1

      Probably not.. But, all that means is that IPv6 processing will not benefit from the hardware on the card. So, you'll be no worse off than you are today, with a card that does no offload.

      It's sort of like Intel's current Gig-E cards, they offer TCP checksum offload. But, your driver/stack needs to take advantage of this feature. If you use an old driver that doesn't use it, everything still works fine, but your CPU is doing a bit more work.

      If/when IPv6 becomes common, Intel will most likely have TCP Offload support implemented quickly.

  16. many white elephants by Joseph_Daniel_Zukige · · Score: 1

    Think 80186, ergo, "io co processing instructions". ;-)

  17. Lots of people agree, including AC and DM by Anonymous Coward · · Score: 4, Informative

    AC being Alan Cox, DM being Dave Miller.

    Read Alan's opinion here.

    Read Dave's opinion here.

    There has been discussion of this specific Intel announcement here.

  18. overclock :D by Thesi · · Score: 0, Redundant

    can I overclock it?

    --
    This signature is annoying. - STEvil of www.xtremesystems.org
  19. Attn MODS. by DAldredge · · Score: 1, Redundant

    I will do this slowly so you can understand.

    HE
    DIDN'T
    SAY
    A
    DAMN
    THING!

    1. Re:Attn MODS. by hdparm · · Score: 1

      You must be new here :o)

  20. Re:A good thing by Jugalator · · Score: 0, Offtopic

    With the ever growing wishes by some to get first posts, I think the little time to write a post may yield that kind of quality.

    --
    Beware: In C++, your friends can see your privates!
  21. So, now hackers will target your BIOS rather than by ABeowulfCluster · · Score: 3, Interesting

    targeting the OS. I can see this technology being useful on servers which have multiple network cards and heavy traffic, but not for joe average pc user.

  22. So finally! by Trogre · · Score: 5, Funny

    buying Intel really will make the internet go faster!

    --
    "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  23. Re:Deja vu? - EXACTLY! by Anonymous Coward · · Score: 0

    Marketese is all.

  24. Open Source Drivers by BarrettVS · · Score: 1

    But will the technical details of this be available for OSS or will it be like OpenBSD's experience with Intel's cryptographic hardware?

  25. Old news by obeythefist · · Score: 4, Informative

    Intel has been wanting to do this for years! I remember reading old articles on The Register about it, and how they were pulling back because Microsoft didn't like the idea of Intel taking away things that Microsoft were running with their software, including things like managing networking instead of having the OS do it.

    Of course it couldn't last, what with nVidia doing firewalls and NICs and all sorts of other things. Intel is a big company and they know when they need to compete. MS has also lost a bit of their clout when it comes to things like pressuring the bigger companies (Intel, HP, Dell).

    --
    I am government man, come from the government. The government has sent me. -- G.I.R.
  26. cpu? e-net controller? by Anonymous Coward · · Score: 0
    which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead

    As opposed to right now, where all that TCP/IP stuff is handled by the floppy drive and the mouse?

    If the point isn't obvious now, I'm trying to say the CPU, the motherboard chipset, and the ethernet controller were already intimately involved in the whole network stack thing.

  27. And the CPU doesn't have other things to do? by Moderation+abuser · · Score: 3, Insightful

    My boxes all run tens to hundreds of processes for tens to hundreds of people. Offloading the processing to a networking subsystem isn't going to hurt, especially with gig and 10gig.

    Not that this is a new idea. It's been done for donkey's years.

    --
    Government of the people, by corporate executives, for corporate profits.
    1. Re:And the CPU doesn't have other things to do? by Toby+The+Economist · · Score: 1

      Intuitively, people think it won't hurt, but intuition is wrong.

      Consider; you have a hundred users, all doing some sort of network based task - say, reading Usenet via an NNTP server.

      You offload their network processing from the CPU to a slower CPU on the network card.

      Every time a thread in your NNTP server blocks, waiting for a packet to arrive or be sent, the main CPU moves onto another thread...which also then needs a send/recv, and blocks, and so on.

      In the meantime, the slow CPU gets around to dealing with the queue of send/recv requests that it has.

      The main CPU spends its time zipping from blocking thread to blocking thread, or being idle, with each thread spending plenty of time waiting on the slow network CPU.

      You'd be much better off letting the CPU zip through the network processing and then moving onto the next thread; the throughput for each thread would be much higher.

      "But, " I hear you say, "why is my network CPU so slow?"

      "Because, " I reply, "you don't spend 300 USD on a P4 CPU and then the same amount, or more, on your network card, when you ALREADY have a zippy P4 CPU to do all the work!"

      --
      Toby

    2. Re:And the CPU doesn't have other things to do? by aminorex · · Score: 1

      Your reasoning fails if an ASIC is used for TOE, instead of a general-purpose CPU.

      Consider: Why do winmodems suck?

      --
      -I like my women like I like my tea: green-
    3. Re:And the CPU doesn't have other things to do? by Anonymous Coward · · Score: 0

      Winmodems suck because normal OSes are shitty at realtime tasks, and normal CPUs are shitty at DSP tasks.

      The "accelerator" in these intel NICs is just an ARM (err, X-Scale) CPU hooked up to a normal ethernet chip. They have checksum offload, but that is because the ethernet chip has it; so does the ethernet chip in my $50 NIC.

      Ignoring routing, the only part of TCP/IP that is especially ASICable is the checksumming. Which is why it is already offloaded to an ASIC (in both these TOE schemes and in a normal TCP/IP stack).

      The place where these TOE systems really make sense is for iSCSI HBAs.
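
      For readers wondering what the checksumming above actually involves: it is the 16-bit one's-complement sum described in RFC 1071. Below is a minimal sketch in Python, not anybody's production code; the function name is made up for illustration, and a real TCP checksum would also cover a pseudo-header, which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071.

    Illustrative helper only (hypothetical name); real stacks do this
    in optimized C or in NIC hardware.
    """
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # add each big-endian 16-bit word
    while total >> 16:                           # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                       # one's complement of the folded sum


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")   # example words from RFC 1071
    print(hex(internet_checksum(sample)))
```

      The loop is a plain add-and-fold over every byte of the packet, which is the kind of regular, data-independent work that is easy to put in fixed-function hardware.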

  28. Re:cpu? e-net controller? by mabinogi · · Score: 2, Funny

    didn't you know?

    The secret to faster downloads is to keep wiggling the mouse, that way it pushes the data through faster.

    --
    Advanced users are users too!
  29. if i were to make wildly unsubstatiated guesses... by evilmousse · · Score: 2, Interesting


    i'd guess the tcp/ip stack implementations available to intel are pretty solid. still, i'd hope it'd be flashable just in case. i can imagine only once in a blue moon would you find someone with libpcap and the patience to find holes in some of the most trusted code in the net.

  30. Re:Were you born a lazy sack of shit? by Anonymous Coward · · Score: 0, Funny

    I accelereate TCP/IP stacks...

    with my ASS!

  31. Most people define Acryonyms second by PickyH3D · · Score: 0, Redundant
    I/OAT, or I/O Acceleration Technology
    Should be
    I/O Acceleration Technology, or I/OAT
    It only makes sense.

    It's like programming with a variable that has yet to be defined.

    1. Re:Most people define Acryonyms second by RyuuzakiTetsuya · · Score: 0, Offtopic

      when I was actively doing programming, I got into the bad habit of doing:

      int x = 0;

      --
      Non impediti ratione cogitationus.
    2. Re:Most people define Acryonyms second by PickyH3D · · Score: 0, Offtopic

      Wow, that's horrible. It's almost like you're defining it before using it? Crazy, how that works out just like I suggested.

    3. Re:Most people define Acryonyms second by Anonymous Coward · · Score: 1, Funny

      You guys put the "-" in anal-retentive.

    4. Re:Most people define Acryonyms second by PickyH3D · · Score: 1

      How exactly is it off topic to respond to an off topic post?

  32. Re:cpu? e-net controller? by RyuuzakiTetsuya · · Score: 1

    Troll but, i'll bite.

    I said, TCP/IP data. Typically, the ethernet controller, mobo chipset, and cpu don't care what kind of data it's processing, just that it's processing data. Now it'll be sensitive to TCP/IP overhead and have special ways to process it.

    --
    Non impediti ratione cogitationus.
  33. Re:cpu? e-net controller? by Anonymous Coward · · Score: 0

    And the most that your average ethernet controller does in hardware is what? TCP checksumming? Oh, thanks Mr. Controller, that helps a lot.

  34. Re:if i were to make wildly unsubstatiated guesses by Anonymous Coward · · Score: 0

    I know you're probably on to something, but really, I have no idea what you're talking about... TCP/IP stack in flash memory? Huh?

  35. Is that the same Tanenbaum that said.... by droopycom · · Score: 1

    ... that Linux was an obsolete design ?

    If so, I will be wary of any bold predictions he makes.

    He might be right in theory I guess ... but in practice ?

    1. Re:Is that the same Tanenbaum that said.... by Toby+The+Economist · · Score: 1

      Monolithic kernels are obsolete; Tanenbaum is correct.

      Linux still works, though, and filled an important and well-supported part of the OS worldspace, and became successful.

      These two facts are not mutually incompatible.

      --
      Toby

    2. Re:Is that the same Tanenbaum that said.... by jpop32 · · Score: 1

      Yup, that's him.

      And, furthermore, it's the same guy that wrote The Bible Of Networking (Computer Networks, Prentice Hall). If you did networking courses in college, chances are high that you studied from his book. He _is_ one of the greatest authorities, living or dead, in the field.

      So, when he has something to say regarding networking, you better listen up.

      Besides, he was right. Monolithic kernels are obsolete technology. Linux's success has nothing to do with it. Would you argue that Windows is cutting-edge technology, simply because it's successful?

  36. Cosmetic applications? by Anonymous Coward · · Score: 0

    I wish Intel would enhance my girlfriend's stack...

  37. Re:Great - no by SpaceLifeForm · · Score: 1

    This makes it too easy for spyware. I would not use this technology if you want your privacy.

    --
    You are being MICROattacked, from various angles, in a SOFT manner.
  38. ha! who needs it? by flacco · · Score: 4, Funny

    ...when you can get AOL internet accelerator for FREE!

    --
    pr0n - keeping monitor glass spotless since 1981.
  39. Speaking of drugs by Illserve · · Score: 1

    Enhance your Stack!

    Have you ever wanted your TCP stack to be more secure? Has your internet ever dribbled? Sign up for intel soft tabs now!

  40. And the integrated DRM? by tjlsmith · · Score: 5, Interesting
    and how much DRM are they going to build onto the motherboard, just in passing?

    Don't think for a minute the big boys aren't trying to take the Internet away from us. They missed the opportunity once, never twice.

    --
    Mumia Abu-Jamal is *laughably guilty*. Check the evidence.
    1. Re:And the integrated DRM? by demon_2k · · Score: 1

      To my knowledge, you can use the memory footprint or CPU usage to identify a process. Maybe you can use them to identify a connection...

  41. Deja-vu? by KZigurs · · Score: 1

    Whoa! Like when you could actually buy network cards that communicated only protocol layer to OS?

    Actually it's fun. Once the computer was full of small, dedicated processors that dealt with various processing where it was applicable and your old 486 actually didn't feel that bad.

    Didn't feel bad when compared to P200 with winmodem, cheap NIC and AC97 sound card.

  42. Re:cpu? e-net controller? by Anonymous Coward · · Score: 0

    That was the case with the early versions of NT (I don't know if it changed). The idea seems sensible from one side -- the application with the most of the mouse activity (and focus) got the highest schedule priority and hence more CPU time.

  43. DoS Attacks by Gary+Destruction · · Score: 2, Interesting

    Will this technology make it easier for systems to withstand DoS Attacks?

    1. Re:DoS Attacks by Anonymous Coward · · Score: 1, Interesting

      Harder, I would have thought. Instead of just overloading the pipe maybe they'll be able to overload your processor as well. Wonderful.

    2. Re:DoS Attacks by Anonymous Coward · · Score: 0

      Yeah... and creating DoS attacks... ...a compromised server with this type card could just close all network traffic...

    3. Re:DoS Attacks by Anonymous Coward · · Score: 0

      No, though it will make it easier to launch zombie DDoS attacks :)

      The end user won't even notice now :)

  44. Ha, old news! FPS's have had this for ages. by quarrel · · Score: 2, Funny

    This is ridiculous.

    We've had this for years in FPS's - used to be that I had to practice for ages just to compete with the young kids at FPS's. Then along came some great 'acceleration' technology, and it's been so much easier. I call mine a bot.

    Ever since it hasn't been about upgrading my CPU or graphics cards to get that head-shot. I've been offloading all that work!

  45. best possibilities by Kaenneth · · Score: 0

    packet from a common worm? main CPU never sees it.
    ping? main CPU never sees it.

    heck give it enough scratch ram, and maybe host your main page directly on the NIC.

  46. Re:A good thing by FIGJAM · · Score: 0, Offtopic

    I say the last digit of pi is zero

    --
    Do your best, hope for the best, suspect the worst.
  47. How big difference? by Masq666 · · Score: 1

    How much will this speed up an Ethernet connection? Does anyone know? Same article at Bits of News

    --
    Bits of News Giving you the latest bits.
    1. Re:How big difference? by ockegheim · · Score: 1

      At the moment the processor is doing all this stuff to ensure reliable transmission of data over a potentially unreliable medium. With normal internet, there are relatively few packets to be processed, so there would not be much of a speed gain. On the other hand, anyone using their network at gigabit speeds would almost certainly benefit a lot.

      I read that using TCP at gigabit speeds uses 100% of the processing power of a 2.x GHz Pentium. Without this workload the processor would be available for much more non-networking use.

      --
      I’m old enough to remember 16K of memory being described as “whopping”
    2. Re:How big difference? by Masq666 · · Score: 1

      Looks like servers will benefit a lot from this, since they often have a lot of their workload dedicated to TCP.

      --
      Bits of News Giving you the latest bits.
  48. Re:A good thing by Anonymous Coward · · Score: 0

    Here be the fourth fluffy post in this thread.

    How far can we take it to the right before my browser crashes?

  49. Re:A good thing by Anonymous Coward · · Score: 0

    No no no no no no no.

    Take note how *saying* something is fluff gets you more karma than the fluff itself.

    You must be new here.

  50. Re:White elephant - flawed logic by morzel · · Score: 2, Insightful
    Using the same logic, machines with two (or more) CPUs wouldn't be useful, since the second CPU is not going to be any faster than the first one.

    With all due respect to Mr. Tanenbaum, if he stated what you put in your post, his logic is severely flawed.

    Let's compare the general CPU/networking CPU combination with a manager/secretary.
    The manager has a number of tasks which need to be done, including scheduling a number of appointments. Without a secretary, he'll be obliged to call/contact the people involved, wait for their responses and note the scheduled appointments in his calendar. Once that is done, he can go about his other tasks.
    When that manager has a secretary, he can just tell the secretary to make the appointments and notify him when they're done. That secretary isn't going to be any faster at making those appointments (still has to call the same people); but in the meantime the manager can start working on something more useful (in theory).

    While the secretary may not be that much faster at scheduling appointments (she probably is, since she knows how to deal with this and who to contact a lot quicker and in a more structured way than the manager), the end result is that the manager can get more work done because he delegated some of it to the secretary.
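
    The analogy can be put into a toy model. This is only a back-of-the-envelope sketch with made-up numbers and hypothetical function names, assuming application work and network work overlap perfectly once the network work is handed off; real systems will not be this tidy.

```python
def time_without_offload(app_ms: float, net_ms: float) -> float:
    """Main CPU does everything itself, one thing after the other."""
    return app_ms + net_ms


def time_with_offload(app_ms: float, net_ms: float,
                      slowdown: float = 1.0, handoff_ms: float = 0.0) -> float:
    """Main CPU pays a handoff cost, then works in parallel with the helper.

    slowdown   -- how many times slower the helper is at network work
                  (1.0 means "no faster, no slower", as in the analogy)
    handoff_ms -- cost of telling the helper what to do (assumed constant)
    """
    helper_ms = net_ms * slowdown
    return handoff_ms + max(app_ms, helper_ms)


if __name__ == "__main__":
    # Hypothetical workload: 60 ms of application work, 40 ms of network work.
    print(time_without_offload(60, 40))             # serial case
    print(time_with_offload(60, 40))                # equally fast helper
    print(time_with_offload(60, 40, slowdown=3.0))  # much slower helper
```

    With an equally fast helper the total drops to the longer of the two jobs, which is the manager/secretary point. With a helper three times slower the total ends up worse than doing everything on the main CPU, which is the objection raised elsewhere in the thread, so both sides can be right depending on the numbers.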

    Note for the Politically Correct: feel free to swap he/she where appropriate.

    --
    Okay... I'll do the stupid things first, then you shy people follow.
    [Zappa]
  51. You are a moron. by Anonymous Coward · · Score: 0

    Specialized CPUs for $INSERT_PURPOSE_HERE can be much cheaper and much faster than a general-purpose P4 CPU (if you don't believe me, think about graphics cards). Besides which, most networking is I/O-bound... which means that the point of these 'intelligent' network chips is to offload the bus not the CPU. So, do us all a favour and shut the hell up.

    1. Re:You are a moron. by Anonymous Coward · · Score: 0

      you had a valid point until your subject and last sentence. the grandparent may be misguided, but you are just an asshole.

  52. Ugly by Detritus · · Score: 1

    It's been done before, many times before, going back to the early days of Ethernet and TCP/IP. There was a company in the 1980s called Excelan that made smart LAN boards. The problem has always been that it usually doesn't work that well. Smart boards are expensive. Smart boards with fast CPUs and lots of memory are really expensive. A new protocol stack has to be created for the main CPU to communicate with the smart board. When you compare the number of cycles required to support the host to smart board protocol with the number of cycles required to do TCP/IP on the main CPU, you often find that the gain is disappointing. It just isn't cost effective and the performance improvement is marginal.

    --
    Mea navis aericumbens anguillis abundat
  53. 3COM? by argent · · Score: 1

    Hasn't 3COM already implemented this, putting higher level stack elements in their firmware?

  54. Re:if i were to make wildly unsubstatiated guesses by conteXXt · · Score: 1

    "i can imagine only once in a blue moon would you find someone with libpcap and the patience to find holes in some of the most trusted code in the net."

    Apparently some people missed the sarcasm here.

    To those, this happens OFTEN.

    --
    The truth about Led Zep should never be told on /. (Karma suicide ensues)
  55. Alacritech by molekyl · · Score: 1

    If I am to believe the marketing, the first to do this kind of complete offloading were Alacritech, with their TCP/IP Accelerator. Unfortunately, you have to register to see their benchmark reports.

  56. RDMA by Anonymous Coward · · Score: 0

    AFAIK, RDMA doesn't work off of the MMU so it can't do virtual memory address translation or know about things like page faults. So you're really limited with how you can set up memory to work with it, or with being able to share it in a multi-tasking environment. Imagine only one unix process being allowed to access the network at any one time.

  57. This old bit of snake-oil... by Ancient_Hacker · · Score: 4, Insightful
    The nightmare continues. It goes something like this: Some drooling "computer scientist" is too dumb to do anything useful, so they speculate: "Wouldn't it be nice to free up this $XXXX CPU from this humdrum task (choose: moving bits/bytes/pixels or packets)?" He finds a brain-addled silicon-stuffer to design a chip to do just that. All rejoice at the increased efficiency.

    Except:

    • The silicon-stuffer only has access to the slow processes of maybe two silicon generations back, unlike the CPU which paid for the latest whizzy xx picofurlong process. So the supposedly whizzy chip is still not particularly faster than the CPU.
    • The whizzy chip shows up late, just about when the associated CPU is going to take a 2x speed hike.
    • The chip is on the I/O bus, requiring many slow I/O cycles, with interrupts masked, to get its commands.
    • Said whizzy bit-banger doesn't have any software support from the main operating systems.
    • The silicon-etcher guy can't write English worth a damn, so nobody can understand the spec sheet.
    • And oh, he didn't know the bus was active-low, so all the data packets have to be inverted.
    • And sometimes byte-reversed too.
    • The chip designer doesn't know or care about the whole system, so the chip does several things that spoil the overall performance, like hogging the bus, saturating the bus snoop logic, poisoning the cache, interrupting too often, etc.
    • The droolers forgot to think about the multi-processor option, so the chip doesn't share well with multiple CPUs.
    • The chip is all hard-wired gates, so there's no way to fix the problems.
    Finally some software wizard finds a way of speeding up the code that runs in the CPU so it's now faster than the separate chip, so the chip is now useless and just an extra power waster.

    We've seen successive waves of this concept, none of them have had much success. Graphics processors are one partial exception, and it took almost a decade of mis-designs of those before they became stable enough to be usable.

    1. Re:This old bit of snake-oil... by Anonymous Coward · · Score: 0

      Aside from the sarcasm, you've summed up the TOE chimera quite well.

      The only thing missing here is that people never learn from others' mistakes, and keep reinventing the wheel. And it's astounding that VCs keep dumping money into this illusion.

      The interesting thing about this phenomenon is that it keeps happening in other tech areas as well. Why is that? Hubris, arrogance, and stupidity I suppose. The difference with the other fields is that the problems usually aren't as obvious as with TOE. It's useful to be able to recognize those fields which are subject to TOE-like failures; but it's something few people even try to do.

    2. Re:This old bit of snake-oil... by Anonymous Coward · · Score: 0

      Only in this case it's Intel, which can put the technology in the chipset so it's not on the IO bus; they can make it with the same generation of fab tech as the rest of the system; and since they designed practically the whole damn thing anyway, they can make it all work in an integrated manner.

      Of course, Intel has historically been very bad at this whole offloading thing. How many of you have servers using I2O?

      dom

    3. Re:This old bit of snake-oil... by RyuuzakiTetsuya · · Score: 1

      Excuse me?

      My soundcard that can do surround sound in hardware is weeping, along with my MPEG2 DVD Decoder card.

      --
      Non impediti ratione cogitationus.
  58. hardware modems & NICs by Anonymous Coward · · Score: 0

    i have used both winmodems and external serial modems, and i saw a noticeable improvement with the external serial modem, Linux sure found it and ran it good, surfing seemed slightly faster with the external serial modem, could it be because it had its own CPU and did not hitchhike off the motherboard's CPU? i think that is so...

    i bet the same logic applies to 10/100 NIC cards for broadband, maybe if they were built with their own CPU then they would have better throughput...

    just my $ 00.02

  59. Re:A good thing by orasio · · Score: 2, Funny

    3. Don't be funny. Funny doesn't give you karma.

  60. Dupe? Well not a /. dupe... by bernywork · · Score: 1

    Umm, haven't we been here before with the Intel PRO cards?

    They at one point used to do just the PRO/100 cards, then they dropped them and started doing PRO/100 cards that did IPSEC hand off? If I remember correctly the S was security and they had a few other models? I was thinking back then that they would be looking at IP hand off at some point.

    --
    Curiosity was framed; ignorance killed the cat. -- Author unknown
  61. API by Anonymous Coward · · Score: 0

    If they put out an API for Software Engineers, will it be available at http://ioatse.cs ?

  62. ...no cuz... by x2A · · Score: 1

    that will be offloaded to your AVPU (Anti-Virus Processing Unit)

    --
    The revolution will not be televised... but it will have a page on Wikipedia
  63. This isn't new by Carl+Oppedahl · · Score: 1

    Yes and see also this Adaptec product which seems to have been doing TCP/IP offloading for over a year.

    1. Re:This isn't new by Anonymous Coward · · Score: 0

      The article explicitly states that Intel's technology is different from Adaptec and Alacritech TOEs, although it's kind of light on details regarding what the difference is.

  64. side effects? by geo.georgi · · Score: 1

    Does this approach have some side effects?
    For example, programs that reuse the buffer right after the send()?

    1. Re:side effects? by rthille · · Score: 1

      Not sure about the implementation, but the kernel could mark the page as write-protected (with the MMU) and implement copy on write or block the process until the copy to the NIC has completed.

      --
      Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
    2. Re:side effects? by Fweeky · · Score: 3, Informative
      From FreeBSD's zero_copy(9) manpage:
      For sending data, there are no special requirements or capabilities that the sending NIC must have. The data written to the socket, though, must be at least a page in size and page aligned in order to be mapped into the kernel. If it does not meet the page size and alignment constraints, it will be copied into the kernel, as is normally the case with socket I/O.

      The user should be careful not to overwrite buffers that have been written to the socket before the data has been freed by the kernel, and the copy-on-write mapping cleared. If a buffer is overwritten before it has been given up by the kernel, the data will be copied, and no savings in CPU utilization and memory bandwidth utilization will be realized.

      It also mentions some issues with regard to zero-copy receive, which requires help from the NIC to ensure received packet payloads are also page-aligned and >= page size. Such support is predictably very rare.
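
      As a rough illustration of the send-side constraints quoted above (page-aligned start, at least one page long), here is a minimal Python sketch. The function names are hypothetical and not part of any FreeBSD API; the check only mirrors the wording of the manpage excerpt, and the anonymous mmap region is simply a convenient way to get a page-aligned buffer for the demo.

```python
import ctypes
import mmap


def qualifies_for_zero_copy(address: int, length: int,
                            page_size: int = mmap.PAGESIZE) -> bool:
    """Hypothetical check mirroring the two constraints quoted above:
    the buffer must start on a page boundary and be at least one page long."""
    return address % page_size == 0 and length >= page_size


def buffer_address(buf) -> int:
    """Address of the first byte of a writable buffer (CPython, via ctypes)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


if __name__ == "__main__":
    # Anonymous mmap regions start on a page boundary.
    region = mmap.mmap(-1, 4 * mmap.PAGESIZE)
    addr = buffer_address(region)
    # Aligned and large enough.
    print(qualifies_for_zero_copy(addr, len(region)))
    # Start address shifted by one byte, so no longer page-aligned.
    print(qualifies_for_zero_copy(addr + 1, len(region) - 1))
    # Aligned but shorter than a page.
    print(qualifies_for_zero_copy(addr, mmap.PAGESIZE // 2))
```

      A buffer that fails the check would, per the excerpt, simply be copied into the kernel as with ordinary socket I/O.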
  65. Re:Great - no by Anonymous Coward · · Score: 0

    You tell me when your BIOS gets infected with spyware and I'll start worrying.

  66. x86 vs RISC by Ulrich+Hobelmann · · Score: 0, Offtopic

    So it goes on and on...

    x86 has gotten 32bit extensions, protected mode, MMX, 3DNow, MMX2, SSE, SSE2, 64bit extensions (+ some new registers), and now another special-purpose instruction set (?) enhancement.

    PPC, on the other hand, has been a 64bit instruction set from the beginning (of the '90s, that is); has had one SIMD instruction set (Altivec) that many claim to be superior to all that SSE stuff; and it has lots of nice registers and cool instructions that are much more fun to use for any compiler than the Intel crap.

    Oh, and PPC hasn't changed through all those years, so you don't have to learn new instruction sets all the time (and program that damn chip in assembly, because compilers don't know the extensions, yet!).

  67. Re:Nothing to see here by ergo98 · · Score: 4, Funny

    > I'll take any speed boosts Intel wants to throw my way but I think their efforts would be better spent elsewhere.

    Craig Barrett here.

    Listen we apologize for this distraction, and apologize for not consulting with you first. I guess some of our engineers just got caught up in something silly and they went off and did this when instead they could be doing things more valuable to you.

    We will immediately begin work on the porn accelerator coprocessor.

  68. Microsoft by ryu1232 · · Score: 1

    Now if Microsoft would remove the restriction of 10 concurrent TCP/IP sessions on XP Home and Pro, this might be useful.

  69. I don't think you understand P4s by leonbrooks · · Score: 1

    Using a P4 to do I/O work is like using a battleship as a landing craft. Until now, the alternatives have been to do that or let your soldiers (packets) swim to shore. Intel's smarter cards are like providing landing craft.

    This is not a new concept.

    DEPCAs made network I/O easy back in the days of ISA busses twenty odd years ago, and there have been PCI cards with their own CPUs which you can actually load a version of Linux into and use as standalone routers - so the network cards handle stuff like ICMP and defragmentation without even touching the main CPU.

    --
    Got time? Spend some of it coding or testing
  70. I forgot by orasio · · Score: 1

    4. Don't be too insightful/interesting, too often.
    Excellent karma is no good if you want mod points; I haven't had those for a looooooong time.

    Now I always post with karma bonus, even when flaming, so I can go back to "good" or "great" karma.

    1. Re:I forgot by Anonymous Coward · · Score: 0

      Probably due to browsing too much, not because of your high karma. I had not had mod points for 3 years. Then I lost my cookies on my work machine and was too lazy to get my password changed, because then my cookies on my home machine would be invalid. The result - I did most of my browsing of Slashdot while not logged in, only checking a few stories at home, and the moderation points came flowing back.

  71. Not as silly as it sounds by leonbrooks · · Score: 1
    A flock of little processors to:
    • manage the TCP stack
    • manage and parse each TCP connection
    • optimise the parsed SQL
    • plan and execute intelligent disk IO
    ...leaves the main processor to marshal everything and pick up any processing too complicated for the sub-processors' tiny little minds. Such a beastie would certainly keep the RAID arrays rattling and network cards glowing.
    --
    Got time? Spend some of it coding or testing
  72. You speak in jest, but... by leonbrooks · · Score: 2, Insightful

    ...the original IBM PC put a processor in the keyboard and another (dumb) processor on the motherboard to talk to it.

    This USB keyboard I'm typing on involves at least three processors, one to scan the keys, one to do the USB on the peripheral side and the third to do the USB on the motherboard side.

    --
    Got time? Spend some of it coding or testing
  73. Not if it is all a hardwired hardware design by 6800 · · Score: 1
    While I've no clue how intel is gonna implement this, if they do it all in real hardware (no uproc and no microcode), it'd be right hard to hack.

    In addition, for economy and speed, the stack would not necessarily be implemented as serially as it is in a full software implementation. Also most operations would occur in one clock cycle.

    Of course upgrades to tcpip would be - replace the card.

  74. Re:White elephant - flawed logic by Toby+The+Economist · · Score: 1

    > Using the same logic, machines with two (or
    > more) CPUs wouldn't be useful, since the second
    > CPU is not going to be any faster in than the
    > first one.

    This deduction is improper for two reasons.

    First, for it to be relevant to the networking scenario described in the OP, the networking CPU would have to be equal in processing capability to the main CPU. This is not the case.

    For example, if I had a dual processor machine where one CPU is a 3 GHz P4 and the other is a 66 MHz Pentium, is the second CPU really that useful, or is it in fact a hindrance? Particularly when you consider the networking scenario, where any tasks offloaded to the slow CPU *must* be completed before the fast CPU can continue with that task.

    Secondly, it fails to take into account the inherently and unavoidably serial nature of network packet processing. You cannot usefully apply two CPUs to this task. If a machine is given tasks which are not subject to parallelism, then having multiple CPUs does not speed up any given task; more tasks can be done concurrently, but each task takes the same time.

    This is the problem which faces networking processing. Any given thread which performs network I/O will be executing on a single CPU.

    To consider your analogy, if the manager has only one task to do, and needs the other person his secretary calls to respond before he can continue, there's very little point having a secretary make the call for him. He's going to be stuck waiting till the reply comes through anyway.

    By and large, many people in this thread are failing to perceive that parallelism is not a solution, since the issue is the performance of any single thread which is performing network I/O.

    To take the problem to an illustrative extreme, we could in theory have a multitude of slow CPUs which the main zippy CPU offloads everything to; graphics, network, disk, etc.

    Result? anything that requires operations which are offloaded performs weakly, since its critical path of execution spends most of the time on the slow CPUs - and we *paid* for all those slow CPUs, when we've already paid for our expensive main fast CPU!
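
    A back-of-the-envelope model of that critical-path argument, as a small Python sketch. The function names and the numbers are illustrative assumptions, not measurements: work is normalised so the whole task takes 1.0 on the main CPU alone, and the helper's speed is given relative to the main CPU.

```python
def serial_time(offloaded: float, relative_speed: float) -> float:
    """Task time when the main CPU must wait for the offloaded part.

    offloaded: fraction of the work (0..1) handed to the helper processor.
    relative_speed: helper speed relative to the main CPU (1.0 = equal).
    """
    return (1.0 - offloaded) + offloaded / relative_speed


def overlapped_time(offloaded: float, relative_speed: float) -> float:
    """Best case if both parts could run fully in parallel,
    which is exactly what the comment argues a single thread cannot do."""
    return max(1.0 - offloaded, offloaded / relative_speed)


if __name__ == "__main__":
    # Half the work moved to a helper ten times slower than the main CPU.
    print(serial_time(0.5, 0.1))
    # Same split, but with a helper twice as fast as the main CPU.
    print(serial_time(0.5, 2.0))
    # The fast helper again, assuming perfect overlap.
    print(overlapped_time(0.5, 2.0))
```

    In this model the serial case only beats the baseline of 1.0 when the helper is faster than the main CPU at the offloaded work, so everything hinges on how fast the offload engine really is at its one job.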

    --
    Toby

  75. Remote DMA by venkats · · Score: 1

    TCP offload engines, zero-copy I/O, etc. are not exactly new concepts. What could be significant, however, is the realization of applications based on these concepts. I am trying to bring in a point about RDMA (Remote Direct Memory Access), which relies upon a hardware-based RDMA engine residing in the peer. RDMA suits bulk data transfers like those seen in SANs (storage area networks).
    More info on RDMA is available at http://rdmaconsortium.org/.

  76. Why multiples are NOT useful without routing accel by Anonymous Coward · · Score: 0
    Generally, those who are sceptical about the real-world utility of this are going to be right. And those who think that stacking several of these together will let you build a cheaper switch/router are forgetting about the shared nature of the IP routing database (especially when you use the sort of dynamic routing protocol a large router invariably will need).

    Without an accelerated routing database, you are, most likely, stuffed.

    Shaheed (who worked on I/O for the world's fastest routers some years back)

  77. this is good. by john_uy · · Score: 1
    typical gigabit ethernet cards today can transfer an average throughput of 400mbps only without special tweaking and multiple transfers. this is probably the same concept as the one alacritech, adaptec, and qlogic offers. they can saturate the network link at full 1gbps with minimal cpu load. i haven't tested this but we plan on progressively putting this in our servers (though a bit expensive, a little cheaper than the single port fc card.) this is the limiting factor right now. we have an fc for our storage that is doing wonders with speed but when it comes to the network, puft! hopefully this technology will be standard in servers soon as it will greatly improve performance of file transfers (especially network backups!)

    honestly, i would want a switch line card to be in a computer to provide non-blocking high i/o and real time processing to network traffic much like a router and a switch does.

    anyway, we'll be waiting for the offload of the 10gbe cards! this time we need to upgrade our fc to support 10gbe as well. :)

    --
    Live your life each day as if it was your last.
  78. Similar to what Jolitzes have been up to? by Hobart · · Score: 3, Interesting
    A while ago I looked up what the original authors of BSD-on-the-386 ( 386bsd ) had been up to, I just searched again and found http://www.interprophet.com and http://www.telemuse.net ...
    Their new gig was putting the TCP/IP stack into the silicon for performance, the Internet Archive version says they've been at it since 1989...
    I wonder if Intel licensed their patents, or if this is similar stuff...
    --
    o/~ Join us now and share the software ...
  79. Re:if i were to make wildly unsubstatiated guesses by evilmousse · · Score: 1


    uh.. no, i was serious... i can only think of a few times in recent years i've heard of a tcp/ip stack implementation getting compromised.

    i've searched US-CERT for "tcp/ip" and there's only two or three i see.

    as for the other flash memory comment.. am i missing something? the tfa is about hardware tcp/ip implementations.. you'd want to be able to correct the code if a critical flaw was discovered.. wouldn't that be time for firmware?

  80. Embeded Ethernet compression? by Anonymous Coward · · Score: 0

    So my modem does many forms of compression, and I download files over ethernet compressed in JPG, GIF, RAR format. My cable modem does MPEG2 compression.

    The question comes to mind, why doesn't Ethernet adopt some form of compression?

    Most of my packets are small and would not benefit from compression, but most of my bandwidth is used by large packets that would benefit from compression.

    In a LAN environment, I may return an SQL dataset in raw ASCII with no compression. Or I may copy a large text file from one machine to another.

    If the goal is to increase thruput, why not optimize bandwidth? Add something to the Ethernet spec to allow connections to be compressed or uncompressed.

    You could do this at multiple layers, either by compressing each packet (minus header info) which would be easier but less compression would occur, or by compressing the entire transfer (again, minus header/trailer info.)

    I know it is a bit more complicated than this, but my $.01US 56K winmodem does it, as does my $100 external non win modem.

    www.Acmenews.com LLC

    1. Re:Embeded Ethernet compression? by Phil+Karn · · Score: 1
      > The question comes to mind, why doesn't Ethernet adopt some form of compression?

      Ahem. I take it you've already upgraded everything to gigabit Ethernet, and that's still not fast enough for you?

      Even the oldest, slowest form of ethernet is orders of magnitude faster than dialup, and a lot of people don't even bother to use the fastest version. There's just no point to adding compression as that would provide, at best, another 2-4x on text. And none at all on much of the bulk data that people commonly send, such as images, sound files and tarballs, because they're already compressed at the application layer.

      Also, a good compression algorithm would necessarily increase latency, and that's usually totally unacceptable on a LAN. Otherwise, why ever have LAN parties?

      Communication links have an amazing range of speeds and costs. Compression is wholly appropriate for those in the low speed, high cost region relative to the CPUs that can do the compression. That leaves out local links. In the local area, wires have gotten so fast that compression just makes no sense.
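
      The "little gain on text, none on already-compressed data" point is easy to try for yourself. Below is a minimal Python sketch using the standard zlib module; the helper names and the synthetic word list are assumptions made up for the demo, and synthetic text like this compresses better than real prose, so the exact ratio it prints should not be read as typical.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


def sample_text(words: int = 20000, seed: int = 0) -> bytes:
    """Deterministic pseudo-text built from a small, made-up vocabulary."""
    vocab = ["packet", "ethernet", "latency", "checksum", "buffer", "kernel",
             "socket", "driver", "header", "payload", "router", "switch",
             "offload", "stack", "copy", "link"]
    rng = random.Random(seed)
    return " ".join(rng.choice(vocab) for _ in range(words)).encode("ascii")


if __name__ == "__main__":
    text = sample_text()
    packed = zlib.compress(text, 9)
    # Plain text shrinks noticeably.
    print(f"text:               {compression_ratio(text):.2f}x")
    # Compressing the compressed output again gains essentially nothing.
    print(f"already compressed: {compression_ratio(packed):.2f}x")
```

      The second figure stands in for images, sound files and tarballs: once the application layer has squeezed the data, a link-level compressor has nothing left to remove.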

  81. Graphics cards by Anonymous Coward · · Score: 0

    Would a graphics card that has had its re-programmable vector thingies programmed to handle tcp/ip be useful? Possible?

  82. Solaris's TCP stack by jsimon12 · · Score: 1

    It is no secret that the Solaris TCP stack is wildly outdated and could use a complete overhaul. Sure, the new modular and improved stack they are including in Solaris 10 is a start, and it is light-years better than, say, Solaris 2.6, but Sun is still playing catch-up.

  83. This vs TOE by Grimace1975 · · Score: 1

    Difference between this and a TOE (TCP Offload Engine)?

    TOE HBAs (Host Bus Adapters) have been available from many vendors for a while.

  84. Re:nvidia (MOD parent up) by ashayh · · Score: 1

    While Intel is still a long way away from selling this chipset, the Nvidia nforce4 Pro is already available (although right now it's expensive and rare).
    Not sure if Intel's solution also offers a firewall. The firewall doesn't work in Linux (yet?). Not sure if the offloading engine will work in Linux.

  85. Re:White elephant - flawed logic by morzel · · Score: 2, Insightful
    > This is the problem which faces networking processing. Any given thread which performs network I/O will be executing on a single CPU.
    In the purest form, it would be like that: one single thread that does not gain much from the offloading. However: have you checked just how many threads are actually running on PCs nowadays? You specifically say 'more tasks can be done concurrently'... isn't this exactly the point of offloading?

    Next thing you know, the difference between SCSI and IDE is moot because 'for one thread it won't make that much of a difference since you'll end up waiting for the data to come off the platters anyway'

    > To consider your analogy, if the manager has only one task to do, and needs the other person his secretary calls to respond before he can continue, there's very little point having a secretary make the call for him. He's going to be stuck waiting till the reply comes through anyway.
    There are just not many managers around nowadays that just have one task to do...

    > To take the problem to an illustrative extreme, we could in theory have a multitude of slow CPUs which the main zippy CPU offloads everything to; graphics, network, disk, etc.
    Why would you think that a network processor would be slower? Just due to the fact that it is a specialized processor you can count on it that it'll do TCP checksumming and all that stuff a lot faster than most (if not all) general purpose CPUs. On top of that, you won't get interrupts/context switches for bad packets...

    While this all may not seem much, this is definitely a performance improvement for the system as a whole.

    --
    Okay... I'll do the stupid things first, then you shy people follow.
    [Zappa]
  86. Great... now we must upgrade.... by Anonymous Coward · · Score: 0

    ...the HARDWARE too, whenever we choose to upgrade our TCP/IP stack... and those cards are not cheap...

  87. Re:White elephant - flawed logic by Bill_the_Engineer · · Score: 2, Insightful

    OK I'll bite...

    The problem with Toby's argument is that he is fixated on the speed of the CPU. It doesn't matter how much slower or faster the Network CPU is compared to the Main CPU. It is more important to have the Network CPU fast enough to handle the I/O requirements dictated by the network architecture.

    With L2 cache and DMA being the norm nowadays, I don't see what the problem is. Sure the Main CPU will stall if the cache needs to fetch something from main memory, but hardware can be adjusted to take these possibilities into account.

    Having processors dedicated to tasks frees the CPU to handle any other tasks on its agenda. I see a network ASIC being able to receive the data payload ready for transmission, and do its thing until it interrupts the CPU to report it is done.

    Also, the cpu would not have to wait for the network transmission to complete before sending more data. The network device would keep accepting payloads until the buffer was full.

    While the Graphics Card is a good example, a better example would to look at the FPU. Floating Point Arithmetic is more CPU intensive than integer. To speed things up, the CPU submits the desired computation to the FPU and the FPU notifies the CPU when the calculation is complete.

    Then there is the other omission made by Toby, the bus does not have a 1:1 speed ratio with the CPU. With this in mind and using Toby's logic, the ASIC would only have to match the bus speed not the CPU's.

    Toby keeps asking why pay for a dedicated CPU when the expensive CPU you have can handle the task. I think most engineers would ask why tie up an expensive CPU when a dedicated CPU can do the task.

    In other words, let's free our expensive CPUs to perform general computational tasks by offloading some of the mundane labor to dedicated ASICs.

    I will say Toby is correct about one thing. In a personal computer, I don't see the advantage to the Network ASIC (other than API), since the CPU is idle most of the time anyway.

    However, in Intel's target market, I would like to have the CPU perform the application logic and offload the networking to dedicated processors. The idea being that if more headroom to the CPU is possible with the Network ASICs, I could see an increase to the maximum number of transactions per second. This increase could be just enough to keep me from investing in another blade or even another server.

    Then again.. I may need more sleep.

    Best Regards,
    Bill

    --
    These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
  88. memcpy accelerator? by morcheeba · · Score: 1

    > Also remember that a well-implemented TCP/IP stack runs at about 90% of the speed of a memcpy()

    Which begs the question... why not implement a generic memcpy accelerator and speed up all sorts of operations?

    I know there's DMA for that, but unless it is cache-coherent, the cache invalidation could make it too slow to be useful. I've used DMAs on lots of other systems, but the last time I used it on PC hardware was a 486. Is this in modern P4 or AMD processors?

  89. Re:Nothing to see here by 2TecTom · · Score: 1

    from all of us who are now rotflofao ty

    --
    Words to men, as air to birds.
  90. Re:White elephant - flawed logic by sirsnork · · Score: 1

    This is a very good point (several in fact). The final paragraph fails to take into account that even 1GbE doesn't leave the processor idle. At 10GbE the processor will be running at close to 100% just handling the network load. This is one of the reasons 10GbE is so expensive today: a lot of hardware offloading is required.

    --

    Normal people worry me!
  91. Re:Nothing to see here by joker784 · · Score: 0, Offtopic

    Ha! Typical male chauvinistic obsession with faster, faster - when everybody knows that when it comes to sex you actually strive for SLOWNESS and PROLONGING. We don't need no fucking porn accelerator :-)