Is There a Place for a $500 Ethernet Card?
prostoalex writes "ComputerWorld magazine runs a story on Level 5 Networks, which emerged from stealth startup status with its own brand of network cards and software called EtherFabric. The company claims it reduces the load on server CPUs and improves communication between servers. And it's not vaporware: 'The EtherFabric software shipping Monday runs on the Linux kernel 2.4 and 2.6, with support for Windows and Unix coming in the first half of next year. High volume pricing is $295 for a two-port, 1GB-per-port EtherFabric network interface card and software, while low volume quantities start from $495.'"
Yes, there is a place for a $500 ethernet card, far, far away from this guy.
I wonder what has changed? I have never known the CPU to get dragged down by network traffic, but maybe in the network server markets it is different. However, with Ethernet chipsets being designed into the motherboard and integrated into the tight circle of RAM and CPU, it isn't clear there is a need for this.
How long before the network control is put into the CPU? It is going to be tough to beat that type of performance.
A most overlooked advantage to owning a computer is if they foul up there's no law against wacking them around a bit.
Is There a Place for a $500 Ethernet Card?
Of course there is, assuming the card performs as advertised. Sheer conjecture: the card likely has a lot of the smarts onboard. Maybe it has some of the TCP and IP stuff on board too (checksum, etc). Compare that to a crapbox $10.95 RealTek[a] card which generates interrupts like mad because it has no smarts and you'd probably be very surprised. (Think of comparing a decent hardware modem to a software based WinModem.)
[a] I had a sales-drone at Computer Boulevard here in Winnipeg just RAVE about RealTek cards. I said I really wanted 3 Intel or 3COM cards for a new work proxy server and he said 'Why? RealTeks are way cheaper and run at the same speed!' Retard.
Trolling is a art,
right inside my computer :)
Short answer;
Where there are PHB's, there is overpriced hardware.
Million dollar PCs (sans gold-plating) and (quite seriously) mission-critical blade servers, customer ip routers, etc.... I have clients that pay upwards of $600 canadian now (though that's for quad cards with ample on-board processing to off-load from cpu horsepower).
This isn't exactly an entirely new concept. Intel have been selling their ethernet chips with built in SSL accelerators for quite some time, and the advantage of offloading duties from the software to the hardware (see Intel etherexpress vs RealTek style cards) is obvious. Whether these cards offload enough of the normal duties of a typical cluster node to be worthwhile should be interesting to see, as there are a wide variety of cluster load types and obviously these cards will have a niche to fit into alongside their competitors in the diverse set of demands around cluster networks. As for the price tag, I seem to remember gigabit cards being extremely expensive a few years back, and it's probably pretty competitive with where they're aiming this product, alongside myrinet and infiniband.
Business Voyeur
I give Realtek 6 months tops to make their own knock-off of the card for $24.95.
You say things that offend me and I can deal with it. Can you?
But not necessarily where the vendors think it is.
Back when I was working at a startup developing anti-DDoS technology, one of the biggest problems we faced when implementing GigE was the load on the PCI bus. (This was before we started using PCI-X).
It depends on exactly how customisable the network card software is, but if you could plonk a couple of those into whatever system you wanted - and if the cards themselves could do, say, signature detection of various flood types, or basic analysis of traffic trends then that is a very definite market.
I realise the core issue is not addressed (if your physical pipe is full, then you're fucked), but it takes the load of dropping the malicious packets off the host CPU so it can attempt to service whatever valid traffic actually gets through.
And then there is IP fragmentation. Bad fragments? Perhaps a dodgy fragmentation implementation in the stack? (you know which OS I mean) Let's just drop that before the host sees it and crashes.
I don't know, I can't find any real information describing what they do, but I can certainly see uses for this.
It's nice to see a piece of hardware that ships with linux drivers and promises Windows support later. So frequently applications and hardware are first supported under Windows and occasionally ported to other platforms.
if your internet connection is anything less than fiber, which is about 99.9% of all connections?
The other 0.1%, obviously.
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
if your internet connection is anything less than fiber, which is about 99.9% of all connections? Not to mention the fact that not many computers can actually handle that much data at once anyway
Listen, when I've got 30 web servers banging away on a single database server, I want each web server in and out as quickly as possible. Every bit of the handshake, query, and results is going to wrap up that much faster if things are faster, period. When you're dealing with a huge data-driven e-commerce site, where every page renders around a hundred or more queries, and there are dozens or hundreds of concurrent page views, this stuff really counts in the aggregate.
If you sell one more widget per day, all year long, because your web presentation layer is just a little more snappy, that's sure as hell going to pay for a $500 NIC.
Don't disappoint your bird dog. Go to the range.
The name Level 5 refers to the network protocol stack where level 5 delivers data from the network to the application, according to Karr. The company isn't concerned about any potential confusion with Internet Protocol telecom Level 3 Communications Inc. On the contrary, he quipped, "It's working in our favor. People say, 'Yes, we've heard of you. You're a big company.'"
As lawyers at Level 3 begin salivating at thought of all of the potential lawsuits.
Back in the early-mid 80's (and probably even before then) IBM mainframes using SNA instead of TCP/IP used special networking processors that handled all of that "networking stuff" so that the mainframe CPU (which really was a "unit" and not just a single chip) could just concentrate on running its jobs and not be interrupted by the communications end of things. Everything old is new again. Same situation, just smaller and faster (CPU and helper communications card take up 1U in a rack instead of 1 whole corner of the head end room).
We use Filers for storage at Gigabit speeds. Compared to our SAN/FC environments, we see much higher CPU utilisation on our Sol 8 boxes, especially when attempting to get to Gigabit speeds.
For $500 a network card, you have to have a good reason to need it. I am sure there are applications that will utilize it, but for the price it may not be worth it. With sub-$500 computers coming of age, it is probably cheaper just to split all your services onto smaller boxen and have a load balancing switch/router. Computers are cheap today; $500 for a network card is steep and will only fill a niche market. Perhaps if the price was in the $50 range it would be more widely accepted. But with good-enough systems at $1k, an additional $500 could be used for a faster CPU rather than a faster network CPU.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
I highly doubt they're aiming these cards at the general public. The kind of folks who worry about this kind of performance aren't buying $500 computers, they're buying $5,000 + computers, and trying to tweak every ounce of performance out of them. I'm willing to bet my employer is going to look pretty seriously at these cards for some of our heavy-use systems.
Sometimes you can't "split all your services onto smaller boxen and have a load balancing switch/router". Not everything on the network is a web server.
Each page renders a hundred or more queries? Sounds like you're better off investing in a better design than better hardware.
Open Source Java DAO Generator
You're probably thinking of the i960-based cards, though Intel's PRO series adapters (not i960 based) do something similar (TCP checksumming is now builtin to the chipset and most OS drivers now know how to take advantage of that). That processor, and variants, were used in everything from network cards to RAID controllers.
They failed because the performance gain and CPU offload numbers were never enough to justify the price difference.
Ding ding ding. I forget who said it (maybe Alan Cox, but I'm REALLY not sure about that), but the opinion was along the lines that it would always be more beneficial to throw the money at a faster processor (or a second processor etc), because you'd get a performance boost everywhere. $300 buys quite a bit extra CPU horsepower these days, and there's no need for the hassles of custom drivers and such. Nowadays CPUs are just so damn fast, it's also not really necessary.
Please help metamoderate.
"A $500 LAN Card? Oh my God, Stevie, that's almost as much as my GeForce9900XTLSI+ cost!" said the kid with the Lone Gunmen T-Shirt.*
"That's nothing. This 8-Track-ROM player off of ThinkGeek cost almost a cool grand," Stevie said, as the other nerds bowed around his glowing and chromed Frag Machine.
*Lone Gunmen T-Shirts coming soon. 8-Track-ROM's, too.
Looks good for your age..
Sure there is.
In a related story, the IRS has recently ruled that the cost of Windows upgrades can NOT be deducted as a gambling loss.
Take Sun: some of their new server kit this year is going to ship with 10Gbit/s ethernet on the board, which, according to their docs, is going to take 3 USIV procs (6 cores) to keep the bus saturated. But when you are looking at 8 to 64 way server boxes, who cares about those 3 procs, especially when in 24-30 months it will take less than one proc to handle that load (quad core + Moore's Law), and eventually one thread will have the horsepower.
Surely those smart dudes at Via, AMD, Intel, Samsung, Nat Semi, and/or Motorola aren't going to:
A) FUD this to death if it really works
B) File patent suit until doomsday to keep it locked up
C) Buy them out
D) Let them wither on the vine and then buy the IP.
09f911029d74e35bd84156c5635688c0
And you can't spell, you will fit right in...
If I point out that you are incorrect, making me a foe does not make you any more correct.
That being said, http://www.ammasso.com/ makes an Ethernet card (priced around $495, I believe) that utilizes both TCP offload and RDMA. The latency of the cards is around 10us. This is great for people needing a high-performance cluster, but can't afford the Infiniband interconnect.
If this card can do most of the work of IPSEC for me, it'd be a big win.
My main concern though is that with two ports, how can I be absolutely certain the packet has to go through my firewall rules before it can go anywhere?
Of course, the extra ports could be an advantage. If it could handle all the rules for you, then it might even be capable of functioning as a layer 4 switch and sending out a new IP packet before completely receiving said packet.
But, I'd want all the software on that card to be Open Source.
Need a Python, C++, Unix, Linux develop
Virtualization.
These are the kinds of NICs that would be put into a datacenter that is leaning heavily toward VMware GSX or ESX servers. Any bit of offload of the CPU in sharing the NICs is a good thing.
What?
Pfffttt...please!
All the TCP/IP acceleration isn't going to do you jack shit if the server is running like crap. Also, you will need a good gigabit switch or router with CAT6 in place to remotely take advantage of this new NIC.
Basic rule of thumb still applies. "Your fastest connection is your slowest link"
Life is not for the lazy.
I would wait on jumping to any type of conclusion and see what happens when EPSRC adopts this for their 512 node cluster. If it really improves the performance by 10% it certainly should be something to look into. Now, is there a market for it at the end user's desktop? I don't think so. A $9 Realtek card would be just fine. :-)
"Smaller boxes" is relative. Google's cluster nodes are dual Xeons with terabyte+ HDs. For Google it is small; for anyone else, that is a powerful computer you're going to be paying a lot for. If you're buying one of those computers you're probably going to look at one of these cards, and that is exactly the market they're looking for.
I have a few servers right now that cost close to $30k each. (and this is at a 29 employee non-tech company).
$30k might seem like a lot to a Windows technician, but that is a cheap box for high-end Unix servers (won't even touch mainframe for that).
6 years ago, when I ran a network for a large company (in a very rural community, mind you) the most expensive server there cost over $320,000.
It was an AS/400 and it was/is a phenomenal piece of equipment.
Heck, it cost $900 just for 1 10/100 ethernet card back in 1999. (and PC nics were $15)
$30k is peanuts. Go checkout some 8-way IBM boxes. Or HP or high-end chompaq. And no, you do NOT reboot a $300k+ server during maintenance.
I looked at their benchmark web page http://www.level5networks.com/prod_etherfabricperf.htm where they claim that a typical PC with "conventional" ethernet burns 83.5% of CPU for communication overhead while only 16.5% remains for the application.
But they don't say which CPU was used - probably an 850 MHz Pentium III or something similarly outdated.
Fact is, on a current 3.x GHz Pentium IV or an equivalent Athlon or Opteron the communication overhead is in the single-digit range, percentage-wise.
A famous computer science quote is:
"Lies, damned lies and benchmarks"
and another one is
"Don't trust any statistics that you haven't forged yourself."
Dedicated Linux servers (root access) $45 p.M.
Not trying to knock your design; if it works, it works. Since you're working on another flavor of it, let me give you my opinion on what I would have done differently. I've worked on webapps like this in Java, not sure if you could do the same in PHP or ASP.NET. For something like this I would go with Java, from my experience with it and also doing PHP. Whatever you do it in, this advice might be helpful.
Your merchant info.... This probably doesn't get updated every day so you can cache it on an application level and let the cache refresh itself in a smart way when there are updates. You can do the same on a per session basis with shopper info. Sounds like your tax logic can be streamlined a bit as well. You might want to think of having a separate process that does the tax and can keep a cache of all the information, that way you don't have to hit the database for each item like it sounds you're doing now. Most of your logging probably doesn't have to be done in real time to the database. Or even in the database at all. There are ways to link your application logic with the webserver's logging mechanism. The webserver usually does it in a smarter way, then you aggregate that info on a regular basis. If that doesn't work for you try asynchronous logging. Start up a separate thread that writes to the log. That way the user doesn't have to wait for the logging to finish. Also caching the logs locally and then aggregating it even every minute or so on a heavy site should increase network and db performance since a few larger writes are faster than a lot of smaller ones. Even with everything you mentioned I don't see how you can have a hundred or more queries per page. I'm thinking at most 5-10 queries per page will get you all of what you need to display products, cross reference products/specials, and a bunch of other stuff. Your checkout pages might need a little more because you'll want to make sure you get fresh data from the database even though a good caching method shouldn't require it. Doesn't hurt to play it safe when the money actually trades hands. Your data model might need some going over as well.
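The "separate thread that writes to the log" suggestion above can be sketched roughly as follows. This is only an illustration in Python (the thread itself is about Java and PHP), using the standard library's `QueueHandler` and `QueueListener`; the `ListHandler` sink and the logger name `shop.requests` are hypothetical stand-ins for whatever slow destination (a database table, a remote log host) the real site writes to.

```python
import logging
import logging.handlers
import queue


def make_async_logger(name, target_handler):
    """Return (logger, listener). Log calls only enqueue a record;
    a background thread owned by the listener does the slow write."""
    log_queue = queue.Queue()  # unbounded in-memory buffer (an assumption)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener


class ListHandler(logging.Handler):
    """Hypothetical stand-in for a slow sink such as a database table."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(record.getMessage())


def demo():
    sink = ListHandler()
    logger, listener = make_async_logger("shop.requests", sink)
    for page in ("/home", "/cart", "/checkout"):
        logger.info("served %s", page)  # returns without waiting for the sink
    listener.stop()  # drains the queue and joins the worker thread
    return sink.lines
```

The request path never blocks on the sink; the trade-off is that records still in the queue are lost if the process dies before `listener.stop()` runs.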
You might want to add some more ram to handle the extra caching and there are many open source distributed caching tools that make it easy. OSCache is good for Java apps, memcached is in C but can be used with other languages including PHP and Java and I think ASP?
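As a rough illustration of the application-level caching described above, here is a minimal in-process cache with a time-to-live, written in Python for brevity. It is not OSCache or memcached and makes no attempt to be distributed; `TTLCache`, its `loader` callback and the 300-second default are all assumptions made up for this sketch.

```python
import time


class TTLCache:
    """Cache values for ttl_seconds; on a miss, call loader(key) once."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # e.g. a function that queries the database
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]          # still fresh: no database round trip
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. right after the merchant record is updated."""
        self._entries.pop(key, None)
```

With something like this in front of the merchant lookup, a page that renders the same merchant record many times costs one query per TTL window instead of one per render.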
Since you're thinking of doing another version of it, you might want to consider these things, and probably more. Hard to say concretely without knowing much more. You can probably cut down on your hardware too. A site like hotornot.com, which is, granted, a lot simpler, can serve up about 20 million pages a day across I think 50 servers. If you have a strong DB with 30 front end webservers (assuming you got them all 5 years ago and they're standard issue type webservers) I would expect a well designed, complex e-commerce site to be able to handle around 6-9 million page views a day with good response times easily. That's trying to be conservative too considering what you said.
Getting these NICs for 30something servers is going to run you between 10-15k depending on where the volume discount kicks in. Like I said I don't know your whole system but that money being spent so your pages don't hit the database 100 or more times per page should do more for you than these cards. That's my long answer so you don't think I was just being flippant.
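For what it's worth, the 10-15k estimate can be checked against the two prices quoted in the story ($295 in high volume, $495 in low volume). The function name below is made up, and where the volume discount actually starts is not stated anywhere in the thread.

```python
HIGH_VOLUME_PRICE = 295  # USD per card, from the story summary
LOW_VOLUME_PRICE = 495   # USD per card, from the story summary


def nic_cost_range(servers):
    """Return (best case, worst case) total cost in USD for one card per server."""
    return servers * HIGH_VOLUME_PRICE, servers * LOW_VOLUME_PRICE
```

For 30 servers that gives a range of $8,850 to $14,850, so the upper end of the estimate matches the low-volume price.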
Open Source Java DAO Generator
If you have a machine (say on a machine running linux kernel 2.4.20-30.9smp) with a built in gig port (say with eth0 identified as eth0: Tigon3 [partno(BCM95704A6) rev 2003 PHY(5704)] (PCI:66MHz:64-bit) 10/100/1000BaseT) connected to a decent gigabit switch, and another machine (same card, same os)with a gigabit card, those two machines will achieve 940Mbps talking to each other (results via iperf, 0.0-10.0 sec 1.09 GBytes 940 Mbits/sec).
However, if you plug a windows box (2000 or xp, didn't have a 2003 handy) with either an add on card, OR built in gig (2000 vs xp) you get a rather less impressive figure of 550-630. Coincidentally, you'll get the same basic number if you run two instances of iperf on the same computer... This tells me the bottleneck isn't the PCI bus, it's the OS. If you can prove me wrong please do so...
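The 940 Mbits/sec figure is worth a sanity check, because it is close to the most TCP can carry over gigabit Ethernet at all. The sketch below assumes a standard 1500-byte MTU, 20-byte IP and TCP headers, the 12-byte TCP timestamp option, and the usual 38 bytes of per-frame Ethernet overhead on the wire; it also assumes iperf reports GBytes in units of 2^30 bytes and Mbits in units of 10^6 bits. None of those details are stated in the post.

```python
def iperf_mbits_per_sec(gbytes, seconds):
    """Convert an iperf transfer total (assumed 2**30-byte GBytes) to Mbit/s."""
    return gbytes * 2**30 * 8 / seconds / 1e6


def max_tcp_goodput_mbps(link_mbps=1000, mtu=1500, tcp_options=12):
    """Upper bound on TCP payload throughput for back-to-back full-size frames."""
    # preamble+SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
    wire_overhead = 8 + 14 + 4 + 12
    payload = mtu - 20 - 20 - tcp_options  # minus IP header, TCP header, options
    return link_mbps * payload / (mtu + wire_overhead)
```

Under those assumptions the ceiling is roughly 941 Mbit/s, so the Linux-to-Linux result is essentially wire speed, and the rounded 1.09 GBytes over 10 seconds works out to about 936 Mbit/s, consistent with the reported rate.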
All GBE cards are FC on the MAC layer. Get over it.
Here's where the problems come in:
1) buses suck. PCI-X is fast; a faster bus clock is better still
2) the problems for GBE NICs are, in no special order: dropping crap packets; cleaning up dirty cache (a huge problem for clusters, where this product is poised); session/protocol relationship management; buffering up misrouted UDP; managing evil ports (setting them up and tearing them down); managing proxies and work arounds (a little SIP anyone? Burping up your IPSec???)
3) making the driver aware of what's going on so sessions don't vomit
4) not bothering the freaking CPU chipset every few milliseconds with useless crap
So, if they do any of these things, bless them and send me the bill. Because (save TOE cards), all of them hassle the drivers and chipsets to no end with stuff that could easily be offloaded. And to those that say, just put more cheapo load-balanced hardware on the job-- you're a chump and deserve to have stuff blown to bits when you multiply failure points with junko doorstop hardware boxes with all of the brains of a goose.
---- Teach Peace. It's Cheaper Than War.
Not only that, but you can set up your internal network to operate at gigabit speeds, can't you? There is more to the network than the connection to the public internet even for those who don't have a fiber connection.
As CPUs get faster an interrupt costs you more in terms of lost CPU time. So, reducing the number of interrupts is more important now than ever before.
My 100 Mbps ethernet card generates about 5k interrupts / second when transferring data at about 30 Mbps. Gigabit cards are engineered to hold interrupts until a few packets of data come in so that a DMA can move larger chunks of data. If this NIC reduces the use of interrupts even further (say by offloading computation or even the entire TCP/IP stack and thus allowing for even larger DMA transfers) the impact could be substantial.
Unfortunately, my knowledge of computer innards stops here, so I can't calculate how much CPU time 5000 interrupts actually take or how the new PCI-Express bus changes interrupt processing or how much of a benefit it would be to have say only 1000 interrupts / second instead of the 5000.
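A back-of-the-envelope version of that calculation is easy to write down, even though the key input is a guess. In the sketch below the cost of one interrupt (handler plus context save/restore and cache disruption) is a parameter, and any value plugged in for it, such as 10,000 cycles on a 3 GHz CPU, is purely hypothetical; only the 5000 and 1000 interrupts per second and the 30 Mbps rate come from the post.

```python
def packets_per_second(mbps, frame_bytes=1500):
    """Full-size frames per second needed to carry the given data rate."""
    return mbps * 1e6 / 8 / frame_bytes


def interrupt_cpu_fraction(interrupts_per_sec, cycles_per_interrupt, cpu_hz):
    """Fraction of one CPU spent servicing interrupts, given an assumed cost."""
    return interrupts_per_sec * cycles_per_interrupt / cpu_hz
```

With those made-up inputs, `interrupt_cpu_fraction(5000, 10_000, 3e9)` comes to about 1.7% of one CPU and the 1000-interrupt case to about 0.3%. The point of the sketch is that the answer scales linearly with the per-interrupt cost, which is exactly the number nobody in the thread has measured.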
Interrupts are the one place where it's not remotely true. A faster processor will not allow your system to handle significantly more interrupts. The whole interrupt model needs to be thrown out and replaced with something much better.
And while I'm at it, there are many cases where it's not true. Wherever you have a significant bottleneck, hardware acceleration helps tremendously. Tasks like encryption and (HighDef) video playback can max-out the highest-end systems available, while a $50 card can handle those tasks easily.
I don't think purpose-built hardware everywhere is the answer, but I do think having an FPGA/ASIC as a standard computer component could make for incredible speed improvements in most/all of the tasks that are hard for CPUs to perform.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
and have a read of why the interrupt problem isn't a problem anymore, at least on Linux. Note the date too - October 2001.
LWN.net
NAPI has been implemented in the kernel.org kernels for a number of years now.
The Internet's nature is peer to peer - 20050301_cs_profs.pdf
I can't believe what I'm seeing here. The majority think this is a bad thing, it seems. I disagree.
I have a few Sun E450s in my shop and I am going to be moving them to gigabit ethernet soon. A gigabit ethernet card from Sun costs considerably more than this so it is an option as long as it will run on Sparc hardware with Solaris drivers. Sorry, but the Intel e1000 just isn't going to cut it here.
I'd like to see that article about 20 instructions (I assume these are ML instructions) handling an entire packet. This may be the case on CISC CPUs, but I just don't see it happening on RISC CPUs. I am not saying it's impossible, but I would definitely have to see that to believe it and I am genuinely interested in reading this article. Please let me know where I can find it.
I don't think some of you understand the difference between intelligent network chips and network chips with a CPU core inside them. Take a look at Cisco's solutions. Their line is moving hastily towards ASICs. The idea here is that specialized hardware designed to perform a task will ALWAYS be faster at handling that task than a CPU running on the same clock. Cisco is proving this with Layer 3 switching vs routing. It's not clear what their solution is, but I'm willing to bet this NIC is an ASIC optimized for the purpose of handling TCP/IP traffic from an ethernet network. I have a hard time seeing how any CPU will be able to beat out an ASIC in this field today.
Also of note is memory and bus bandwidth. I have seen some comments about CPU usage and how it's negligible and what not. While I don't believe that either (I pay a lot for those cycles, I want to use as many for data processing as possible), I do believe that the CPU handling the TCP/IP stack takes a little more bus bandwidth as well as memory bandwidth. If this is all handled on the card, both bandwidth usages will be reduced. Bus and memory bandwidth is already lagging way behind CPU speed as it is. It is my number one system performance limiter right now. The more I can eliminate it, the more productive I can be. Someone already mentioned large numbers of packets. This is a good argument as well. When dealing with large numbers of small packets, CPU usage on a CPU-based TCP/IP stack increases as opposed to a smaller number of larger packets. So in some cases, it depends on your network and its configuration.
Also, consider that maybe I do only get 10 more cycles per second from using this card. Is the card worth it? With CPU cycles at a premium and everyone here trying to purchase as many as possible and never a single idle CPU in any of my servers, I have to give a resounding YES! 10 cycles per second per CPU times the number of CPUs and seconds the NIC is in place is a LOT of cycles and most certainly worth $500 over the lifetime of a $30,000 machine. If they can prove it does what they claim on my hardware, count me in.
Congratulations guys... you just admitted to causing actual trademark confusion... have fun in the courtroom.
they were used to reduce the over-generation margin of the power companies
By using up the excess power? :)
Move Sig. For great justice.
It's just like people who see gold ends on something and figure it's the automatic winner in any contest.
:P
Never mind that using gold connectors and non-gold connectors together causes corrosion.
It's been a long time.
Accelerating Ethernet in hardware, while remaining 100% compatible with the standard protocols on the wire, isn't all that new. Just over 2 years ago, I worked on a TOE (TCP offload engine) card at Adaptec.
http://www.adaptec.com/worldwide/product/prodfamilymatrix.html?cat=%2fTechnology%2fNAC+Cards%2fNAC+Cards
It was a complete TCP stack in hardware (with the exception of startup/teardown, which still was intentionally done in software, for purposes of security/accounting).
Once the TCP connection was established, the packets were completely handled in hardware, and the resulting TCP payload data was DMA'ed directly to the application's memory when a read request was made. Same thing in the other direction, for a write request. Very fast!
I'm not sure of the exact numbers but we reduced CPU utilization to around 10%-20% of what it was under a non-accelerated card, and were able to saturate the wire in both directions using only a 1.0 GHz CPU. This is something that was difficult to do, given the common rule of thumb that you need 1 MHz of CPU speed to handle every 1 Mbit of data on the wire.
To make a long story short, it didn't sell, and I (among many others) was laid off.
The reason was mostly about price/performance: who would pay that much for just a gigabit ethernet card? The money that was spent on a TOE-accelerated network card would be better spent on a faster CPU in general, or a more specialized interconnect such as InfiniBand.
When 10Gb Ethernet becomes a reality, we will once again need TOE-accelerated network cards (since there are no 10 GHz CPUs today, as we seem to have hit a wall at around 4 GHz). I'd keep my eye on Chelsio: of the Ethernet TOE vendors still standing, they seem to have a good product.
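The rule of thumb quoted above (1 MHz of CPU per 1 Mbit/s on the wire) can be turned into a quick estimate. Whether the rule is meant per direction or for the aggregate is not stated, so the `full_duplex` switch below is an assumption, as is the function name.

```python
def cpu_mhz_needed(link_mbps, full_duplex=False, mhz_per_mbit=1.0):
    """Estimate CPU clock (MHz) consumed by a software TCP/IP stack
    under the 1 MHz per 1 Mbit/s rule of thumb."""
    directions = 2 if full_duplex else 1  # assumption: the rule applies per direction
    return link_mbps * directions * mhz_per_mbit
```

By this estimate a saturated gigabit link in one direction eats a whole 1 GHz CPU (2 GHz worth if both directions count), which is why doing it on a 1.0 GHz machine with cycles to spare was notable, and 10Gb Ethernet would need 10 GHz of CPU that does not exist.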
BTW, did you know that 10Gb Ethernet is basically "InfiniBand lite"? Take InfiniBand, drop the special upper-layer protocols so that it's just raw packets on the wire, treat that with the same semantics as Ethernet, and you have 10GbE. I can predict that Ethernet and InfiniBand will conceptually merge, sometime in the future. Maybe Ethernet will become a subset of InfiniBand, like SATA is a subset of SAS....
Dr. Demento On The 'Net!
- Erecting yet another edifice brings on the huge and unavoidable overheads of yet another different CPU instruction set, yet another real-time scheduler, another code base, another set of performance and timing bottlenecks. Another group of programmers. Another set of in-circuit emulators, debugging tools, and system kernel. Another cycle of testing, bug fixes, updates.
- It sets up a split in the programming team-- there's now much more reason for finger-pointing and argument and mistrust.
- The extra money would usually buy you another CPU and lots of RAM, resources that would benefit every part of the system, not just the network I/O.
- The separate I/O processor usually requires the geekiest and least communicative of the programmers-- not a good thing. The manuals for the I/O card are likely to be very brief and sketchy, and rarely up to date.
- The I/O processor is almost always at least one generation of silicon technology older than the CPU, so even though the glossy brochures just drip with Speeeed! and Vrooom!-y adjectives, it's not that speedy in comparison to the CPU.
For examples, see the $4000 graphics co-processor that IBM tried to sell for the PC (IIRC the CPU could outdo it). The various disk-compression cards for the PC. Also see the serial ports on the Mac IIvx (very expensive and not noticeably better). Don't forget the P-code chip for the PDP-11/03. All very expensive and blase' performance/$.
The white-paper and the web pages on the company site which describe the implementation talk about how this is done.
This card, and the software which drives it, differ from traditional ethernet accelerator cards and from alternative network protocols (like myrinet and iWarp) in several ways.
Alternative protocols not only require using a different software API but also require custom hardware at both communication endpoints.
Traditional hardware TCP/IP accelerators run the bottom half of the stack in custom silicon. This does tend to help reduce host CPU load but suffers from a number of problems. Since host CPU speeds have tended to increase regularly, they often helped for only a brief period of time. They also tended to help most for large packets but helped little or not at all for small packets.
This technology claims to help large and small packets equally well, and also claims to reduce packet latency across the board. It does so by running the bulk of the TCP/IP stack in user space rather than via system calls. The hardware runs ethernet Rx and Tx processing but does not implement the higher level IP protocol processing. Instead, once connections are established, the ethernet frames coming from the hardware are fed via a system call interface to the application process to which they belong. Then, no further context switching between the kernel and the process is required. The top end of the hardware driver and all of the subsequent IP layers are executed in the context of the user space process. They are linked to the app via shared libraries.
Basically, instead of linking the IP calls against code which requires frequent switching between user and kernel space, the entire upper half of the stack is run by the application sending and receiving the packets. This offers uniform benefits in packet latency across all packet sizes, and offers improvement in throughput as well.
I assume that all that is required is to link against a different set of shared libraries to gain these benefits (and of course to have the custom hardware on at least one side of the comm. link). This looks very good in principle.
The following page provides an overview of the technology and compares it to each of the competing mechanisms.
http://www.level5networks.com/sol_approaches.htm