Intel Develops Hardware To Enhance TCP/IP Stacks
RyuuzakiTetsuya writes "The Register is reporting that Intel is developing I/OAT, or I/O Acceleration Technology, which allows the CPU, the mobo chipset and the ethernet controller to help deal with TCP/IP overhead."
Yet another processor that requires liquid nitrogen.
First checksum offloading, now this... It is nice to see that hardware vendors are realizing that 10Gbit/s+ speeds aren't currently realistic without extra forms of computation support from the underlying network interface hardware.
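For reference, checksum offloading means the card computes the 16-bit ones'-complement Internet checksum (RFC 1071) used by IPv4, TCP and UDP, so the host CPU doesn't have to. A minimal Python sketch of that arithmetic, for illustration only (it says nothing about how any particular NIC implements it):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as described in RFC 1071.

    This is the per-packet arithmetic that checksum offload moves from
    the host CPU onto the network card.
    """
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # add next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF                        # ones' complement of the sum


sample = bytes.fromhex("0001f203f4f5f6f7")  # example words from RFC 1071
print(hex(internet_checksum(sample)))       # 0x220d
```

A receiver can verify a packet by running the same function over the data plus the transmitted checksum; a correct packet sums to zero.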
This is Good News.
Intel is working on something worthwhile: a cure for the common slashdotting
;)
and they say the drug companies are miracle workers
John 3:16 - The easiest way to a BETTER YOU.
I think in Tanenbaum's book there's a reference stating that offloading network processing normally isn't useful, because the CPU the work is offloaded to is always less powerful than the main CPU, and the main CPU is normally blocked in its task until the network processing has completed.
--
Toby
I was one of the lucky few who beta tested this. The plus side is you can overclock your network card to download faster than the remote server bandwidth. I did not try it, but I would be able to slashdot the slashdot.org website just by browsing it.
As we know it damn well, shit happens all the time.
So... how exactly are they going to ship patches in the case of a security issue?
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
It seems such an obvious thing: make a tcp/ip processor, put it on a NIC and give it a high level interface, instead of just a low level IP interface.
makes you wonder why nobody has done it before...
Maybe this is some plan of Intel's to control the Internet: add some secret DRM capability to it, wait until everyone is using it, and then take over the world.
Or, door number 2: sell your services to the NSA.
What is needed more is a high-speed bus for network interfaces, as gigabit ethernet becomes more common. Even if a gigabit adapter had a whole 32-bit PCI bus to itself, it could still easily saturate it.
It seems like most lowest-common-denominator board manufacturers have put off 64-bit PCI support for too long. It's going to bite them in the ass if it doesn't become standard very soon.
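The arithmetic behind the saturation claim is easy to check. This sketch assumes the standard PCI clocks of 33.33 MHz and 66.66 MHz and one transfer per clock, and counts raw Ethernet line rate only (no framing); the function names are illustrative.

```python
def pci_peak_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak PCI burst rate in MB/s (one transfer per clock).

    Sustained throughput is lower in practice (arbitration, address
    phases, wait states), so this is an upper bound.
    """
    return bus_width_bits / 8 * clock_mhz


def ethernet_mb_s(line_rate_mbit: float, full_duplex: bool = True) -> float:
    """Raw Ethernet line rate in MB/s, counting both directions if full duplex."""
    per_direction = line_rate_mbit / 8
    return per_direction * (2 if full_duplex else 1)


for width, clock in [(32, 33.33), (64, 66.66)]:
    print(f"{width}-bit/{clock} MHz PCI: {pci_peak_mb_s(width, clock):.0f} MB/s")
print(f"Gigabit Ethernet, full duplex: {ethernet_mb_s(1000):.0f} MB/s")
```

With these figures a 32-bit/33 MHz slot tops out around 133 MB/s, below the 250 MB/s a full-duplex gigabit link can demand, while 64-bit/66 MHz PCI offers roughly 533 MB/s.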
Isn't Nvidia doing the same with its new nForce series motherboards, lowering CPU usage by adding network management code and an SPI firewall inside the chipset?
This seems interesting, though given Intel's track record I wonder if it will really be as useful as they are speculating, as the article has no real technical information.
Granted, I've never administered a server that was under anywhere remotely near the types of loads we are talking about for this to be useful, but I have a hard time imagining that dealing with the TCP/IP stack would be more intensive than running applications (as the article claims).
So, for all you people out there much more qualified to discuss this than I am: will having some part of the processor dedicated to handling TCP/IP really speed things up, or is this primarily a marketing technology?
Famous Last Words: "hmm...wikipedia says it's edible"
Uh, this isn't new; QLogic has been doing it for some time now in their TOE (TCP Offload Engine) cards. The cards are smoking, especially on Solaris, because Sun's TCP stack is crappy.
With the ever growing popularity of fluff statements like this one, I think a statement like the parent may yield no real benefits to this discussion.
--
The last digit of pi is four.
Soon it will be a dedicated processor and RAM to deal with TCP, then a dedicated processor for the keyboard input, then a dedicated processor for the fans, and a special dedicated processor on a 12" PCI-X card for the extremely computationally intensive MOUSE. Actually, this will have its own special dedicated path called 'AMP', or Accelerated Mouse Port. Mice of the future will need much more bandwidth than today, about 16 GB of I/O, so they need their own data paths.
And then there will be other enhancements like the tcp/ip one.
For instance a special accelerator card for Word and Internet Explorer will be developed.
Furious Linux users will demand their own technology, so one manufacturer will come up with a special card for running GNOME apps. This card will have four dual-core 6 GHz processors and allow GNOME to run at normal speeds.
Newly discovered, a simple and easy karma-gaining method! Amaze your friends, and become more eligible to moderate!
1. Refresh your browser constantly until there's a new story on Slashdot, to post before everyone else.
2. Post something similar to "This is good/bad, for INSERT_OBVIOUS_REASON_HERE. And fuck the INSERT_RIAA-LIKE_ORGANIZATION_HERE." (second sentence is optional)
The article doesn't say, and I'd hate to be "stuck" with a card that only does IPv4. Yeah, I know, hardly anyone uses IPv6 today, but the nations of China and Japan, as well as the US DoD, are starting to roll out IPv6 networks in a big way.
Think 80186, ergo, "io co processing instructions". ;-)
AC being Alan Cox, DM being Dave Miller.
Read Alan's opinion here.
Read Dave's opinion here.
There has been discussion of this specific Intel announcement here.
can I overclock it?
This signature is annoying. - STEvil of www.xtremesystems.org
I will do this slowly so you can understand.
HE
DIDN'T
SAY
A
DAMN
THING!
With the ever growing wishes by some to get first posts, I think the little time to write a post may yield that kind of quality.
Beware: In C++, your friends can see your privates!
targeting the OS. I can see this technology being useful on servers which have multiple network cards and heavy traffic, but not for joe average pc user.
buying Intel really will make the internet go faster!
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
Marketese is all.
But will the technical details of this be available for OSS or will it be like OpenBSD's experience with Intel's cryptographic hardware?
Intel has been wanting to do this for years! I remember reading old articles on The Register about it, and how they were pulling back because Microsoft didn't like the idea of Intel taking away things that Microsoft were running with their software, including things like managing networking instead of having the OS do it.
Of course it couldn't last, what with nVidia doing firewalls and NICs and all sorts of other things. Intel is a big company, and they know when they need to compete. MS has also lost a bit of its clout when it comes to pressuring the bigger companies (Intel, HP, Dell).
I am government man, come from the government. The government has sent me. -- G.I.R.
As opposed to right now, where all that TCP/IP stuff is handled by the floppy drive and the mouse?
If the point isn't obvious now, I'm trying to say the CPU, the motherboard chipset, and the ethernet controller were already intimately involved in the whole network stack thing.
My boxes all run tens to hundreds of processes for tens to hundreds of people. Offloading the processing to a networking subsystem isn't going to hurt, especially with gig and 10gig.
Not that this is a new idea. It's been done for donkey's years.
Government of the people, by corporate executives, for corporate profits.
didn't you know?
The secret to faster downloads is to keep wiggling the mouse, that way it pushes the data through faster.
Advanced users are users too!
i'd guess the tcp/ip stack implementations available to intel are pretty solid. still, i'd hope it'd be flashable just in case. i can imagine only once in a blue moon would you find someone with libpcap and the patience to find holes in some of the most trusted code in the net.
I accelerate TCP/IP stacks...
with my ASS!
It's like programming with a variable that has yet to be defined.
Troll, but I'll bite.
I said, TCP/IP data. Typically, the ethernet controller, mobo chipset, and cpu don't care what kind of data it's processing, just that it's processing data. Now it'll be sensitive to TCP/IP overhead and have special ways to process it.
Non impediti ratione cogitationis. (Unencumbered by the thought process.)
And the most that your average ethernet controller does in hardware is what? TCP checksumming? Oh, thanks Mr. Controller, that helps a lot.
I know you're probably on to something, but really, I have no idea what you're talking about... TCP/IP stack in flash memory? Huh?
... that Linux was an obsolete design?
If so, I will beware of any bold predictions he makes.
He might be right in theory, I guess...
... but in practice?
I wish Intel would enhance my girlfriend's stack...
This makes it too easy for spyware. I would not use this technology if you want your privacy.
You are being MICROattacked, from various angles, in a SOFT manner.
...when you can get AOL internet accelerator for FREE!
pr0n - keeping monitor glass spotless since 1981.
Enhance your Stack!
Have you ever wanted your TCP stack to be more secure? Has your internet ever dribbled? Sign up for intel soft tabs now!
Don't think for a minute the big boys aren't trying to take the Internet away from us. They missed the opportunity once; never twice.
Mumia Abu-Jamal is *laughably guilty*. Check the evidence.
Whoa! Like when you could actually buy network cards that talked to the OS only at the protocol layer?
Actually it's funny. Once upon a time the computer was full of small, dedicated processors that handled various processing where it was applicable, and your old 486 actually didn't feel that bad.
It didn't feel bad compared to a P200 with a winmodem, a cheap NIC and an AC97 sound card.
That was the case with the early versions of NT (I don't know if it has changed). The idea seems sensible from one side: the application with the most mouse activity (and focus) got the highest scheduling priority and hence more CPU time.
Will this technology make it easier for systems to withstand DoS Attacks?
This is ridiculous.
We've had this for years in FPSes. It used to be that I had to practice for ages just to compete with the young kids in FPSes. Then along came some great 'acceleration' technology, and it's been so much easier. I call mine a bot.
Ever since it hasn't been about upgrading my CPU or graphics cards to get that head-shot. I've been offloading all that work!
packet from a common worm? main CPU never sees it.
ping? main CPU never sees it.
heck give it enough scratch ram, and maybe host your main page directly on the NIC.
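The "main CPU never sees it" idea boils down to a classifier running on the card. A rough Python sketch, assuming a NIC that sees raw IPv4 packets and a hypothetical set of blocked destination ports (for example a worm's target port); it illustrates the concept and is not how I/OAT or any real NIC works.

```python
import struct

ICMP, TCP, UDP = 1, 6, 17   # IPv4 protocol numbers
ICMP_ECHO_REQUEST = 8       # ICMP type used by "ping"


def classify_on_nic(packet: bytes, blocked_ports=frozenset()) -> str:
    """Decide what a hypothetical smart NIC does with an incoming IPv4 packet.

    "reply" -> answer the ping on the card, never interrupting the host
    "drop"  -> discard TCP/UDP traffic aimed at a blocked destination port
    "host"  -> hand the packet up to the operating system as usual
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return "host"                    # not a well-formed IPv4 header
    header_len = (packet[0] & 0x0F) * 4  # IHL field counts 32-bit words
    proto = packet[9]
    payload = packet[header_len:]
    if proto == ICMP and payload[:1] == bytes([ICMP_ECHO_REQUEST]):
        return "reply"
    if proto in (TCP, UDP) and len(payload) >= 4:
        # TCP and UDP both start with source port, then destination port
        (dst_port,) = struct.unpack("!H", payload[2:4])
        if dst_port in blocked_ports:
            return "drop"
    return "host"
```

A real card would also need to build the echo reply and update its checksums; the sketch only shows the decision.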
I say the last digit of pi is zero
Do your best, hope for the best, suspect the worst.
How much will this speed up an Ethernet connection? Does anyone know? Same article at Bits of News.
Bits of News Giving you the latest bits.
Here be the fourth fluffy post in this thread.
How far can we take it to the right before my browser crashes?
No no no no no no no.
Take note how *saying* something is fluff gets you more karma than the fluff itself.
You must be new here.
With all due respect to Mr. Tanenbaum, if he stated what you put in your post, his logic is severely flawed.
Let's compare the general CPU/networking CPU combination with a manager/secretary.
The manager has a number of tasks that need to be done, including scheduling a number of appointments. Without a secretary, he has to call or contact the people involved, wait for their responses and note the scheduled appointments in his calendar. Once that is done, he can go about his other tasks.
When that manager has a secretary, he can just tell the secretary to make the appointments and notify him when they're done. The secretary isn't necessarily any faster at making those appointments (she still has to call the same people), but in the meantime the manager can start working on something more useful (in theory).
While the secretary may not be that much faster at scheduling appointments (she probably is, since she knows how to deal with this and who to contact a lot quicker and in a more structured way than the manager), the end result is that the manager can get more work done because he delegated some of it to the secretary.
Note for the Politically Correct: feel free to swap he/she where appropriate.
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]
Specialized CPUs for $INSERT_PURPOSE_HERE can be much cheaper and much faster than a general-purpose P4 CPU (if you don't believe me, think about graphics cards). Besides which, most networking is I/O-bound... which means that the point of these 'intelligent' network chips is to offload the bus not the CPU. So, do us all a favour and shut the hell up.
It's been done before, many times before, going back to the early days of Ethernet and TCP/IP. There was a company in the 1980s called Excelan that made smart LAN boards. The problem has always been that it usually doesn't work that well. Smart boards are expensive. Smart boards with fast CPUs and lots of memory are really expensive. A new protocol stack has to be created for the main CPU to communicate with the smart board. When you compare the number of cycles required to support the host to smart board protocol with the number of cycles required to do TCP/IP on the main CPU, you often find that the gain is disappointing. It just isn't cost effective and the performance improvement is marginal.
Mea navis aericumbens anguillis abundat
Hasn't 3Com already implemented this, putting higher-level stack elements in their firmware?
"i can imagine only once in a blue moon would you find someone with libpcap and the patience to find holes in some of the most trusted code in the net."
Apparently some people missed the sarcasm here.
To those, this happens OFTEN.
The truth about Led Zep should never be told on
If I am to believe the marketing, the first to do this kind of complete offloading were Alacritech, with their TCP/IP Accelerator. Unfortunately, you have to register to see their benchmark reports.
AFAIK, RDMA doesn't work off of the MMU so it can't do virtual memory address translation or know about things like page faults. So you're really limited with how you can set up memory to work with it, or with being able to share it in a multi-tasking environment. Imagine only one unix process being allowed to access the network at any one time.
Except:
- The silicon-stuffer only has access to the slow processes of maybe two silicon generations back, unlike the CPU, which paid for the latest whizzy xx picofurlong process. So the supposedly whizzy chip is still not particularly faster than the CPU.
- The whizzy chip shows up late, just about when the associated CPU is going to take a 2x speed hike.
- The chip is on the I/O bus, requiring many slow I/O cycles, with interrupts masked, to get its commands.
- Said whizzy bit-banger doesn't have any software support from the main operating systems.
- The silicon-etcher guy can't write English worth a damn, so nobody can understand the spec sheet.
- And oh, he didn't know the bus was active-low, so all the data packets have to be inverted.
- And sometimes byte-reversed too.
- The chip designer doesn't know or care about the whole system, so the chip does several things that spoil overall performance, like hogging the bus, saturating the bus snoop logic, poisoning the cache, interrupting too often, etc.
- The droolers forgot to think about the multi-processor option, so the chip doesn't share well with multiple CPUs.
- The chip is all hard-wired gates, so there's no way to fix the problems.
- Finally, some software wizard finds a way of speeding up the code that runs on the CPU so it's now faster than the separate chip, so the chip is now useless and just an extra power waster.
We've seen successive waves of this concept, and none of them have had much success. Graphics processors are one partial exception, and it took almost a decade of mis-designs before they became stable enough to be usable.
I have used both winmodems and external serial modems, and I saw a noticeable improvement with the external serial modem. Linux sure found it and ran it well, and surfing seemed slightly faster with it. Could it be because it had its own CPU and did not hitchhike off the motherboard's CPU? I think so...
I bet the same logic applies to 10/100 NIC cards for broadband; maybe if they were built with their own CPU, they would have better throughput...
Just my $0.02
3. Don't be funny. Funny doesn't give you karma.
Umm, haven't we been here before with the Intel PRO cards?
At one point they made just the PRO/100 cards, then they dropped those and started making PRO/100 cards that did IPsec handoff. If I remember correctly, the S was for security, and they had a few other models? I was thinking back then that they would be looking at IP handoff at some point.
Curiosity was framed; ignorance killed the cat. -- Author unknown
If they put out an API for Software Engineers, will it be available at http://ioatse.cs ?
that will be offloaded to your AVPU (Anti-Virus Processing Unit)
The revolution will not be televised... but it will have a page on Wikipedia
Yes and see also this Adaptec product which seems to have been doing TCP/IP offloading for over a year.
Does this approach have some side effects?
For example, programs that reuse the buffer right after send()?
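With the standard sockets API, send() copies the data out of the caller's buffer into kernel socket buffers before it returns, so the application may reuse the buffer immediately. An offload engine that DMAs directly from user memory has to preserve that contract, or, like zero-copy send interfaces, tell the application when the buffer is free again. A small Python demonstration of the existing contract, using a local socketpair:

```python
import socket


def send_then_reuse(message: bytes) -> bytes:
    """Send from a mutable buffer, overwrite the buffer at once, return what the peer got.

    Keep messages small: nothing reads until sendall() returns, so a message
    larger than the socket buffers would block forever in this sketch.
    """
    sender, receiver = socket.socketpair()
    try:
        buf = bytearray(message)
        sender.sendall(buf)       # returns after the data has been handed to the kernel
        buf[:] = b"X" * len(buf)  # reuse (clobber) the buffer immediately
        received = bytearray()
        while len(received) < len(message):
            chunk = receiver.recv(4096)
            if not chunk:
                break
            received += chunk
        return bytes(received)
    finally:
        sender.close()
        receiver.close()


print(send_then_reuse(b"offload me"))  # the peer still sees b'offload me'
```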
You tell me when your BIOS gets infected with spyware and I'll start worrying.
So it goes on and on...
x86 has gotten 32-bit extensions, protected mode, MMX, 3DNow!, MMX2, SSE, SSE2, 64-bit extensions (plus some new registers), and now another special-purpose instruction set (?) enhancement.
PPC, on the other hand, has been a 64-bit instruction set from the beginning (of the '90s, that is); it has had one SIMD instruction set (AltiVec) that many claim is superior to all that SSE stuff; and it has lots of nice registers and cool instructions that are much more fun for any compiler to use than the Intel crap.
Oh, and PPC hasn't changed through all those years, so you don't have to learn new instruction sets all the time (and program that damn chip in assembly, because compilers don't know the extensions, yet!).
I'll take any speed boosts Intel wants to throw my way but I think their efforts would be better spent elsewhere.
Craig Barrett here.
Listen we apologize for this distraction, and apologize for not consulting with you first. I guess some of our engineers just got caught up in something silly and they went off and did this when instead they could be doing things more valuable to you.
We will immediately begin work on the porn accelerator coprocessor.
Now if Microsoft would remove the restriction of 10 concurrent TCP/IP sessions on XP Home and Pro, this might be useful.
Using a P4 to do I/O work is like using a battleship as a landing craft. Until now, the alternatives have been to do that or let your soldiers (packets) swim to shore. Intel's smarter cards are like providing landing craft.
This is not a new concept.
DEPCAs made network I/O easy back in the days of ISA buses twenty-odd years ago, and there have been PCI cards with their own CPUs that you can actually load a version of Linux into and use as standalone routers, so the network cards handle stuff like ICMP and defragmentation without even touching the main CPU.
Got time? Spend some of it coding or testing
4. Don't be too insightful/interesting, too often.
Excellent karma is no good if you want mod points; I haven't had those for a looooooong time.
Now I always post with karma bonus, even when flaming, so I can go back to "good" or "great" karma.
- manage the TCP stack
- manage and parse each TCP connection
- optimise the parsed SQL
- plan and execute intelligent disk IO
...leaves the main processor to marshal everything and pick up any processing too complicated for the sub-processors' tiny little minds. Such a beastie would certainly keep the RAID arrays rattling and network cards glowing.
...the original IBM PC put a processor in the keyboard and another (dumb) processor on the motherboard to talk to it.
This USB keyboard I'm typing on involves at least three processors, one to scan the keys, one to do the USB on the peripheral side and the third to do the USB on the motherboard side.
In addition, for economy and speed, the stack would not necessarily be implemented as serially as it is in a full software implementation. Also most operations would occur in one clock cycle.
Of course, upgrading TCP/IP would mean replacing the card.
> Using the same logic, machines with two (or
> more) CPUs wouldn't be useful, since the second
> CPU is not going to be any faster than the
> first one.
This deduction is improper for two reasons.
First, for it to be relevant to the networking scenario described in the OP, the networking CPU would have to be equal in processing capability to the main CPU. This is not the case.
For example, if I had a dual-processor machine where one CPU is a 3 GHz P4 and the other is a 66 MHz Pentium, is the second CPU really that useful, or is it in fact a hindrance? Particularly when you consider the networking scenario, where any task offloaded to the slow CPU *must* be completed before the fast CPU can continue with that task.
Secondly, it fails to take into account the inherently and unavoidably serial nature of network packet processing. You cannot usefully apply two CPUs to this task. If a machine is given tasks that are not subject to parallelism, then having multiple CPUs does not speed up any given task; more tasks can be done concurrently, but each task takes the same time.
This is the problem which faces networking processing. Any given thread which performs network I/O will be executing on a single CPU.
To consider your analogy, if the manager has only one task to do, and needs the other person his secretary calls to respond before he can continue, there's very little point having a secretary make the call for him. He's going to be stuck waiting till the reply comes through anyway.
By and large, many people in this thread are failing to perceive that parallelism is not a solution, since the issue is the performance of any single thread that is performing network I/O.
To take the problem to an illustrative extreme, we could in theory have a multitude of slow CPUs which the main zippy CPU offloads everything to; graphics, network, disk, etc.
Result? Anything that requires operations which are offloaded performs weakly, since its critical path of execution spends most of its time on the slow CPUs, and we *paid* for all those slow CPUs when we had already paid for our expensive, fast main CPU!
--
Toby
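Toby's serial critical-path argument can be put in numbers. The sketch uses his 3 GHz versus 66 MHz example; the 3 million cycles of work and the 50% offload share are made-up illustrative values, and the model assumes the offloaded part takes as many cycles on the helper as on the host.

```python
def serial_latency(work_cycles: float, offload_fraction: float,
                   fast_hz: float, slow_hz: float) -> float:
    """Seconds for one strictly serial task when part of it runs on a slower helper.

    Nothing overlaps: the fast CPU simply waits while the helper does its
    share. A specialised processor may need fewer cycles than this assumes.
    """
    on_host = work_cycles * (1 - offload_fraction) / fast_hz
    on_helper = work_cycles * offload_fraction / slow_hz
    return on_host + on_helper


baseline = serial_latency(3e6, 0.0, 3e9, 66e6)   # everything on the 3 GHz CPU
offloaded = serial_latency(3e6, 0.5, 3e9, 66e6)  # half of it on a 66 MHz helper
print(f"{offloaded / baseline:.1f}x slower")
```

With these numbers the offloaded version of that one task takes about 23 times as long. The model deliberately ignores overlap with other work, which is exactly what the replies further down dispute.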
TCP offload engines, zero-copy I/O, etc. are not exactly new concepts. However, what could be significant is the realization of applications based on these concepts. I am trying to bring up a point about RDMA (Remote Direct Memory Access), which relies upon a hardware-based RDMA engine residing in the peer. RDMA suits bulk data transfer like that seen in SANs (storage area networks).
more info on RDMA is available at http://rdmaconsortium.org/.
Without an accelerated routing database, you are, most likely, stuffed.
Shaheed (who worked on I/O for the world's fastest routers some years back)
Honestly, I would want a switch line card in a computer to provide non-blocking high I/O and real-time processing of network traffic, much like a router or a switch does.
Anyway, we'll be waiting for the offload on the 10GbE cards! This time we'll need to upgrade our FC to support 10GbE as well. :)
Live your life each day as if it was your last.
o/~ Join us now and share the software
uh.. no, i was serious... i can only think of a few times in recent years i've heard of a tcp/ip stack implementation getting compromised.
i've searched US-CERT for "tcp/ip" and there's only two or three i see.
as for the other flash memory comment.. am i missing something? the tfa is about hardware tcp/ip implementations.. you'd want to be able to correct the code if a critical flaw was discovered.. wouldn't that be time for firmware?
So my modem does many forms of compression, and I download files over ethernet compressed in JPG, GIF, RAR format. My cable modem does MPEG2 compression.
The question that comes to mind: why doesn't Ethernet adopt some form of compression?
Most of my packets are small and would not benefit from compression, but most of my bandwidth is used by large packets that would benefit from compression.
In a LAN environment, I may return an SQL dataset in raw ASCII with no compression. Or I may copy a large text file from one machine to another.
If the goal is to increase throughput, why not optimize bandwidth? Add something to the Ethernet spec to allow connections to be compressed or uncompressed.
You could do this at multiple layers, either by compressing each packet (minus header info) which would be easier but less compression would occur, or by compressing the entire transfer (again, minus header/trailer info.)
I know it is a bit more complicated than this, but my $.01US 56K winmodem does it, as does my $100 external non win modem.
www.Acmenews.com LLC
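The per-packet versus whole-transfer trade-off is easy to see with zlib. The sketch assumes 1460-byte chunks (a typical TCP payload in a 1500-byte Ethernet frame) and a made-up, highly repetitive SQL-style text; real data will compress differently.

```python
import zlib


def per_packet_vs_stream(data: bytes, payload_size: int = 1460) -> tuple[int, int]:
    """Compare compressing each packet on its own with compressing the whole transfer.

    Returns (total bytes when every payload-sized chunk is compressed
    separately, bytes when the whole transfer is compressed as one stream).
    Header bytes are ignored in both cases.
    """
    chunks = [data[i:i + payload_size] for i in range(0, len(data), payload_size)]
    per_packet = sum(len(zlib.compress(chunk)) for chunk in chunks)
    whole_stream = len(zlib.compress(data))
    return per_packet, whole_stream


sql_dump = b"SELECT id, name FROM users WHERE active = 1;\n" * 2000
print(per_packet_vs_stream(sql_dump))
print(len(b"ACK"), len(zlib.compress(b"ACK")))  # tiny payloads get bigger
```

Compressing the stream as a whole wins because the compressor can reuse matches across packet boundaries, and a 3-byte payload such as b"ACK" comes out larger after compression, which fits the observation that small packets would not benefit.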
Would a graphics card that had its reprogrammable vector thingies programmed to handle TCP/IP be useful? Possible?
It is no secret that the Solaris TCP stack is wildly outdated and could use a complete overhaul. Sure, the new modular and improved stack they are including in Solaris 10 is a start, and it is light-years better than, say, Solaris 2.6. But Sun is still playing catch-up.
Difference between this and a TOE (TCP Offload Engine)?
TOE HBAs (Host Bus Adapters) have been available from many vendors for a while.
While Intel is still a long way from selling this chipset, the Nvidia nForce4 Pro is already available (although right now it's expensive and rare).
Not sure if Intel's solution also offers a firewall. The firewall doesn't work in Linux (yet?). Not sure if the offloading engine will work in Linux.
Next thing you know, the difference between SCSI and IDE will be moot because 'for one thread it won't make that much of a difference, since you'll end up waiting for the data to come off the platters anyway'.
There are just not many managers around nowadays who have only one task to do... Why would you think that a network processor would be slower? Just because it is a specialized processor, you can count on it to do TCP checksumming and all that stuff a lot faster than most (if not all) general-purpose CPUs. On top of that, you won't get interrupts/context switches for bad packets... While this all may not seem like much, it is definitely a performance improvement for the system as a whole.
...the HARDWARE too, whenever we choose to upgrade our TCP/IP stack... and those cards are not cheap...
OK I'll bite...
The problem with Toby's argument is that he is fixated on the speed of the CPU. It doesn't matter how much slower or faster the Network CPU is compared to the Main CPU. It is more important to have the Network CPU fast enough to handle the I/O requirements dictated by the network architecture.
With L2 cache and DMA being the norm nowadays, I don't see what the problem is. Sure, the Main CPU will stall if the cache needs to fetch something from main memory, but hardware can be adjusted to take these possibilities into account.
Having processors dedicated to tasks, frees the CPU to handle any other tasks on its agenda. I see a network ASIC being able to receive the data payload ready for transmission, and do its thing until it interrupts the CPU to report it is done.
Also, the cpu would not have to wait for the network transmission to complete before sending more data. The network device would keep accepting payloads until the buffer was full.
While the Graphics Card is a good example, a better example would be to look at the FPU. Floating-point arithmetic is more CPU-intensive than integer arithmetic. To speed things up, the CPU submits the desired computation to the FPU, and the FPU notifies the CPU when the calculation is complete.
Then there is the other omission made by Toby: the bus does not have a 1:1 speed ratio with the CPU. With this in mind, and using Toby's logic, the ASIC would only have to match the bus speed, not the CPU's.
Toby keeps mentioning why pay for a dedicated CPU when expensive CPU you have can handle the task. I think most engineers would ask why tie up an expensive CPU when a dedicated CPU can do the task.
In other words, let's free our expensive CPUs to perform general computational tasks by offloading some of the mundane labor to dedicated ASICs.
I will say Toby is correct with one thing. In a personal computer, I don't see the advantage to the Network ASIC (other than API), since the CPU is idle most of the time anyway.
However, in Intel's target market, I would like to have the CPU perform the application logic and offload the networking to dedicated processors. The idea is that if the network ASICs free up more CPU headroom, I could see an increase in the maximum number of transactions per second. This increase could be just enough to keep me from investing in another blade or even another server.
Then again.. I may need more sleep.
Best Regards,
Bill
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
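Bill's description of the ASIC accepting payloads until its buffer is full, then interrupting the CPU when it is done, is a bounded producer/consumer queue. A toy Python model of it, with a thread standing in for the network ASIC; the ring size and names are illustrative assumptions, not any real driver interface.

```python
import queue
import threading


def transmit_via_ring(payloads, ring_slots: int = 4):
    """Toy model of a host handing payloads to a network ASIC through a bounded ring.

    The host thread only blocks when every slot is full; the "ASIC" thread
    drains slots in FIFO order and records what it put on the wire.
    """
    ring = queue.Queue(maxsize=ring_slots)
    wire = []
    end_of_stream = object()  # sentinel telling the device thread to stop

    def asic():
        while True:
            item = ring.get()
            if item is end_of_stream:
                break
            wire.append(item)  # stand-in for DMA plus transmission

    device = threading.Thread(target=asic)
    device.start()
    for payload in payloads:
        ring.put(payload)      # waits only while the ring is full
    ring.put(end_of_stream)
    device.join()              # plays the role of the "done" interrupt
    return wire


print(transmit_via_ring([b"pkt%d" % i for i in range(10)], ring_slots=2))
```

The host only stalls when it outruns the device by more than ring_slots payloads, which is the headroom argument in the post above.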
Also remember that a well-implemented TCP/IP stack runs at about 90% of the speed of a memcpy().
Which raises the question... why not implement a generic memcpy accelerator and speed up all sorts of operations?
I know there's DMA for that, but unless it is cache-coherent, the cache invalidation could make it too slow to be useful. I've used DMAs on lots of other systems, but the last time I used it on PC hardware was a 486. Is this in modern P4 or AMD processors?
HIV Crosses Species Barrier... into Muppets
from all of us who are now rotflofao ty
Words to men, as air to birds.
This is a very good point (several in fact). The final paragraph fails to take into account that even 1GbE doesn't leave the processor idle. At 10GbE the processor will be run at close to 100% just handling the network load. This is one of the reasons 10GbE is so expensive today because a lot of hardware offloading is required
Normal people worry me!
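To put numbers on "close to 100% just handling the network load", here is the arithmetic for the maximum frame rate of 10 Gbit/s Ethernet and the resulting per-frame cycle budget of one core. The 3 GHz clock is an assumption, and how many cycles a real stack spends per packet varies widely.

```python
PREAMBLE_AND_SFD = 8  # bytes sent before every Ethernet frame
INTERFRAME_GAP = 12   # minimum idle gap between frames, in byte times


def max_frames_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Upper bound on Ethernet frames/s for frames of a given size.

    frame_bytes runs from destination MAC through FCS: 64 for the
    smallest frame, 1518 for a full 1500-byte payload.
    """
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8
    return line_rate_bps / bits_on_wire


def cycles_per_frame(cpu_hz: float, frames_per_second: float) -> float:
    """Cycle budget per frame if a single core must touch every frame."""
    return cpu_hz / frames_per_second


for size in (1518, 64):
    fps = max_frames_per_second(10e9, size)
    print(f"{size}-byte frames: {fps:,.0f} frames/s, "
          f"{cycles_per_frame(3e9, fps):,.1f} cycles each at 3 GHz")
```

Full-size frames arrive at roughly 812,744 per second, leaving about 3,691 cycles per frame; minimum-size frames arrive at about 14.88 million per second, leaving only about 202 cycles.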
Ha! Typical male chauvinistic obsession with faster, faster - when everybody knows that when it comes to sex you actually strive for SLOWNESS and PROLONGING. We don't need no fucking porn accelerator :-)