Zero-Copy TCP and UDP Output in NetBSD

Posted by chrisd on Monday May 6, 2002 @02:34PM from the who-is-this-zero-guy-anyhow dept.

-is writes "Jason R. Thorpe has recently added experimental code to NetBSD-current that enables zero-copy for TCP and UDP on the transmit side. These changes could mean significant performance improvements for FTP, WWW, and Samba servers. See Jason's announcement to the current-users mailing list for details." From the text: "On tests on an embedded system with limited memory bandwidth, TCP transmit performance on 100baseTX-FDX went from ~6500KB/s to ~11100KB/s, a significant improvement." Excellent!
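Here is a toy C model of the transmit-side idea. It is not NetBSD's code, and the names (`pkt_buf`, `tx_copy`, `tx_loan`) are hypothetical. The classic path copies the user's data into a kernel buffer before the card reads it. The zero-copy path simply points the outgoing descriptor at the user's memory. The hard part in a real kernel, keeping the application from changing those pages until the transmit completes, is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outgoing-packet descriptor: it either points at a private
 * kernel copy of the payload or at the caller's own memory. */
struct pkt_buf {
    const unsigned char *data;  /* where the card would DMA from */
    size_t len;
    int borrowed;               /* 1 if data points into user memory */
};

static size_t bytes_copied;     /* total bytes the CPU has had to copy */

/* Classic path: copy the user's data into a kernel-owned buffer. */
static void tx_copy(struct pkt_buf *p, unsigned char *kbuf,
                    const unsigned char *ubuf, size_t len)
{
    assert(kbuf != NULL && ubuf != NULL);
    memcpy(kbuf, ubuf, len);
    bytes_copied += len;
    p->data = kbuf;
    p->len = len;
    p->borrowed = 0;
}

/* Zero-copy path: reference the user's memory directly, no memcpy.
 * Not modeled: a real kernel must also keep the application from
 * modifying those pages until the card has finished sending them. */
static void tx_loan(struct pkt_buf *p, const unsigned char *ubuf, size_t len)
{
    p->data = ubuf;
    p->len = len;
    p->borrowed = 1;
}
```

For scale, the quoted figures work out to 11100 / 6500 ≈ 1.71, i.e. roughly 70% more transmit throughput on that memory-bandwidth-limited system.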

3 of 74 comments

what about zero copy on receive? by AdamBa · 2002-05-06 15:32 · Score: 3, Interesting

I'm surprised nobody has come up with hardware to do this. The problem is that the network card needs to know about the user's buffer ahead of time. In the old days (i.e., 5 years ago) you had a mix of NetBEUI, IPX, and TCP on a network, and it didn't make much sense to make a card intelligent enough to figure out where to put packets.
But now all anyone cares about is TCP. Furthermore, a typical copy of data to a server goes something like:
1) packet sent by the client to a known port on the server
2) a few packets to set things up and assign a dedicated server port
3) lots of data blasting from the client to the dedicated server port
4) some cleanup packets at the end

Step 3 is what you care about. So you would need to tell the network card: when you get packets for this port, put the data in this buffer in the order received, and put the headers here (in some small header-sized buffers that TCP would also provide). You might get bad checksums (although the hardware could check those too), drops, or out-of-order packets, and then you would need to rearrange things...but in the 99%+ normal case you get all the packets in order with valid checksums. So the card stuffs the data in the right place, TCP checks the header buffers to make sure everything is kosher, and boom, your data is in memory with no copies and off to disk (or wherever) it goes.
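A rough C sketch of the card-side steering described above, assuming a hypothetical `struct steer` that TCP hands to the driver. Packets for the registered port have their payload appended to the user's buffer in arrival order and their headers dropped into small header slots. Anything else returns 0 and takes the normal path. The `memcpy` calls only stand in for the card's DMA, and the card deliberately does no TCP checking; that is left to TCP afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HDR_SLOT  64   /* assumed size of each header buffer */
#define HDR_SLOTS 8    /* assumed number of header buffers */

/* Hypothetical steering state TCP would hand to the driver/card. */
struct steer {
    uint16_t port;                     /* dedicated data port */
    unsigned char *ubuf;               /* user's receive buffer */
    size_t cap;                        /* its size in bytes */
    size_t fill;                       /* bytes placed so far */
    unsigned char hdr[HDR_SLOTS][HDR_SLOT];
    size_t nhdr;                       /* header slots used */
};

/* Card side. Returns 1 if the payload was placed directly in the user
 * buffer (in arrival order), 0 if the packet must take the normal path. */
static int nic_rx(struct steer *s, uint16_t dport,
                  const unsigned char *hdr, size_t hlen,
                  const unsigned char *payload, size_t plen)
{
    assert(hlen <= HDR_SLOT);
    if (dport != s->port || plen > s->cap - s->fill || s->nhdr == HDR_SLOTS)
        return 0;
    memcpy(s->ubuf + s->fill, payload, plen);  /* stands in for DMA */
    memcpy(s->hdr[s->nhdr++], hdr, hlen);      /* header for TCP to check */
    s->fill += plen;
    return 1;
}
```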
You need some other pieces too: TCP has to be able to pass this hint to the network card driver, figure out whether more than one app is using a port (so it can turn all this optimization off), and so on. But hey, when it worked it would be cool.
The other way this could work is if the network card were set up with a big chain of receive buffers and actually handed a buffer up to TCP (taking it out of the chain), and then eventually got it back...but this requires a lot of trust in the layers above TCP, which ultimately decide when the receive data isn't needed anymore.
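The buffer-handing alternative can be modeled as a pool the card draws from and the upper layers give back to. This is a sketch with hypothetical names (`rx_pool`, `pool_hand_up`, `pool_return`) and an arbitrary pool size of four.

```c
#include <assert.h>

#define NBUF 4   /* arbitrary pool size for the sketch */

/* Hypothetical pool of receive buffers owned by the card. */
struct rx_pool {
    int free_idx[NBUF];  /* stack of buffer indices the card may fill */
    int nfree;
    int drops;           /* packets lost because nothing was free */
};

static void pool_init(struct rx_pool *p)
{
    for (int i = 0; i < NBUF; i++)
        p->free_idx[i] = i;
    p->nfree = NBUF;
    p->drops = 0;
}

/* Card side: take a buffer for an arriving packet and hand it upward.
 * Returns the buffer index, or -1 (packet dropped) if the pool is empty. */
static int pool_hand_up(struct rx_pool *p)
{
    if (p->nfree == 0) {
        p->drops++;
        return -1;
    }
    return p->free_idx[--p->nfree];
}

/* Upper layers: give a buffer back once its data is no longer needed. */
static void pool_return(struct rx_pool *p, int idx)
{
    assert(idx >= 0 && idx < NBUF && p->nfree < NBUF);
    p->free_idx[p->nfree++] = idx;
}
```

The trust problem shows up directly: if the layers above TCP hang on to buffers and never call `pool_return`, the pool empties and the card can only drop incoming packets.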
As Dilbert said this weekend...if you can understand the preceding, you have my sympathy.
- adam
1. Re:what about zero copy on receive? by AdamBa · 2002-05-07 04:35 · Score: 3, Interesting
  
  Yes, you need separate buffers for the headers (I tried to explain this in my first post but wasn't that clear).
  Let's say you get a 64K buffer from the user. You hand it to the card and say "all data for port 0x1234 goes in here." You also give it a chain of receive header buffers, 64 bytes or whatever. The processing after that should be pretty straightforward...when the card interrupts you with a received packet, it sets a flag saying that the data was put in a user buffer. Then the network card driver tells TCP it has a packet, that header and data were split, and that the data is at location XXX. At that point the processing should be basically the same for TCP (verifying checksums, headers, etc.), but at the last step, where it would copy the data to the user's buffer, it just doesn't have to -- as long as the data was supposed to wind up at XXX.
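On the TCP side, the saving comes at that last step. In this sketch, the hypothetical `rx_desc` and `tcp_read` structures stand in for whatever the driver would really pass up. TCP works out where a segment should land from its sequence number and skips the copy only when the card already put the data exactly there; otherwise it falls back to copying. `memmove` is used because misplaced data may already sit inside the user's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive descriptor the driver passes up to TCP. */
struct rx_desc {
    int split;                  /* 1: header/data split was performed */
    const unsigned char *data;  /* where the card actually put the payload */
    size_t len;
    uint32_t seq;               /* TCP sequence number from the header */
};

/* Hypothetical per-read state: the user's buffer and the sequence
 * number that maps to its first byte. */
struct tcp_read {
    unsigned char *ubuf;
    size_t cap;
    uint32_t base_seq;
    size_t copies;              /* how many times we had to copy */
};

/* Final step of receive processing, after headers/checksums are verified.
 * Returns 0 on success, -1 if the segment does not fit the user buffer. */
static int tcp_deliver(struct tcp_read *r, const struct rx_desc *d)
{
    assert(r != NULL && d != NULL);
    size_t off = (size_t)(uint32_t)(d->seq - r->base_seq);
    if (off > r->cap || d->len > r->cap - off)
        return -1;
    unsigned char *want = r->ubuf + off;
    if (d->split && d->data == want)
        return 0;               /* already in place: skip the copy */
    memmove(want, d->data, d->len);
    r->copies++;
    return 0;
}
```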
  The tricky case is handling drops, dups, and out-of-order packets. For example, if the fifth packet in a transfer is received third, TCP can't just move it to the right spot, because the card may be using that spot to receive another packet. In general, trying to tell the card "back up and start receiving new packets here instead of there" is tricky timing-wise, because a packet may be coming in while you are telling the card that.
  Of course, in situations like this TCP doesn't have to be perfectly optimized, since you will likely need to retransmit anyway, but it shouldn't be terrible. The card will also need to be given a set of general buffers for packets that are not for an expected port, or that arrive after the user buffer runs out of room, etc. Then TCP has to be clever about putting those packets in the right place.
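For packets that end up in the card's general buffers, putting them "in the right place" amounts to ordinary TCP reassembly. This is a minimal sketch assuming fixed-size queue slots and hypothetical names (`reasm`, `reasm_add`, `reasm_drain`). Segments wait in a queue until the byte stream reaches their sequence number, then get copied into the user's buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAXSEG 8      /* assumed queue depth */
#define SEGSZ  1460   /* assumed maximum segment payload */

/* Hypothetical holding area for segments from the card's general buffers.
 * Duplicate and overlapping segments are not handled in this sketch. */
struct reasm {
    unsigned char *ubuf;       /* user's buffer */
    size_t cap;
    uint32_t base_seq;         /* sequence number of ubuf[0] */
    uint32_t next_seq;         /* first byte not yet in ubuf */
    int used[MAXSEG];
    uint32_t seq[MAXSEG];
    size_t len[MAXSEG];
    unsigned char data[MAXSEG][SEGSZ];
};

/* Copy every queued segment that starts exactly at next_seq into the
 * user buffer, repeating until nothing more can be delivered. */
static void reasm_drain(struct reasm *q)
{
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < MAXSEG; i++) {
            if (!q->used[i] || q->seq[i] != q->next_seq)
                continue;
            size_t off = (size_t)(uint32_t)(q->next_seq - q->base_seq);
            if (q->len[i] > q->cap - off)
                continue;          /* would overflow the user buffer */
            memcpy(q->ubuf + off, q->data[i], q->len[i]);
            q->next_seq += (uint32_t)q->len[i];
            q->used[i] = 0;
            progress = 1;
        }
    }
}

/* Queue one segment; returns -1 if every slot is taken. */
static int reasm_add(struct reasm *q, uint32_t seq,
                     const unsigned char *p, size_t n)
{
    assert(n <= SEGSZ);
    for (int i = 0; i < MAXSEG; i++) {
        if (q->used[i])
            continue;
        q->used[i] = 1;
        q->seq[i] = seq;
        q->len[i] = n;
        memcpy(q->data[i], p, n);
        reasm_drain(q);
        return 0;
    }
    return -1;
}
```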
  You could make the card smarter so that it actually knew where to put each packet; it could even do acks and retransmit requests...but you don't want to make it too complicated. Plus, I think you want to avoid having the card interpret any part of the packet that is encrypted during an IPsec session (and I don't know exactly where that begins). Some cards do IPsec in hardware, but that is another issue.
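The "smarter card" variant can be sketched as placement by sequence number instead of arrival order. The card writes each payload at `seq - base_seq`, so a fifth packet received third still lands in the right spot, and TCP only needs to know how long the hole-free prefix is. This assumes fixed 512-byte segments and hypothetical names (`seq_place`, `card_place`, `contiguous`). It ignores acks and retransmit requests, and it assumes the TCP header is readable, which IPsec ESP encryption would prevent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK  512    /* assumed fixed segment size, for simplicity */
#define NCHUNK 16

/* Hypothetical smarter card: it parses the TCP sequence number and
 * writes each payload at seq - base_seq, so arrival order stops mattering. */
struct seq_place {
    unsigned char buf[NCHUNK * CHUNK];  /* stands in for the user buffer */
    uint32_t base_seq;
    unsigned char have[NCHUNK];         /* 1 once a chunk has arrived */
};

/* Returns 0 if placed, -1 if misaligned or outside the buffer
 * (such packets would take the normal path). */
static int card_place(struct seq_place *s, uint32_t seq,
                      const unsigned char *p, size_t n)
{
    assert(p != NULL);
    uint32_t off = seq - s->base_seq;
    if (n != CHUNK || off % CHUNK != 0 || off / CHUNK >= NCHUNK)
        return -1;
    memcpy(s->buf + off, p, n);          /* stands in for DMA */
    s->have[off / CHUNK] = 1;            /* a duplicate rewrites the same bytes */
    return 0;
}

/* Bytes TCP may deliver to the application: the prefix with no holes. */
static size_t contiguous(const struct seq_place *s)
{
    size_t i = 0;
    while (i < NCHUNK && s->have[i])
        i++;
    return i * CHUNK;
}
```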
  And of course this only helps if the server is CPU-bound, as opposed to disk or network etc.
  - adam
Re:FreeBSD has zerocopy sendfile... by Espen+Skoglund · 2002-05-06 22:18 · Score: 2, Interesting

And you also forgot to mention:
HISTORY
sendfile() first appeared in FreeBSD 3.0. This manual page first appeared in FreeBSD 3.1.