Slashdot Mirror


Have Sockets Run Their Course?

ChelleChelle writes "This article examines the limitations of the sockets API. The Internet and the networking world in general have changed in very significant ways since the sockets API was first developed in 1982, but the API has had the effect of narrowing the ways in which developers think about and write networked applications. This article discusses the history as well as the future of the sockets API, focusing on how 'high bandwidth, low latency, and multihoming are driving the development of new alternatives.'"

230 comments

  1. Really... by Wingman+5 · · Score: 3, Funny

    I think sockets work fi.... *connection lost, host not routable*

    1. Re:Really... by rolfwind · · Score: 4, Funny

      I think sockets work fi.... *connection lost, host not routable*

      Really, for networking, all they need to do is ask slashdot's elite technical team. Years before Gmail automatically saved my drafts, /. consistently preempted everyone with the above example (or Homeland_Security/FBI/Police knocking on the door, or person getting a hard attack) and snatches the post from the jaws of defeat when the user wouldn't otherwise be able to hit submit. Moreover, unlike anyone else to this day, even gmail, there is also a nice little hint as to the cause of the interruption.

    2. Re:Really... by shentino · · Score: 4, Insightful

      Dealing with network failures isn't actually a trivial issue from the POV of an application, let alone an OS supporting it.

      http://en.wikipedia.org/wiki/Two_Generals'_Problem

    3. Re:Really... by MSDos-486 · · Score: 1

      Clearly a circuit based connection is more relia*%$^()^$ NO CARRIER

    4. Re:Really... by Anonymous Coward · · Score: 0

      he thnk

    5. Re:Really... by knutkracker · · Score: 5, Funny

      or person getting a hard attack

      Viagra overdose?

    6. Re:Really... by lskovlund · · Score: 5, Funny

      This is not funny. It's called priapism and can result in impotence or worse.

    7. Re:Really... by msuarezalvarez · · Score: 3, Funny

      Methinks someone has a sad story to tell...

    8. Re:Really... by amnezick · · Score: 0

      *rap beat in the background*
      ...
      and snatches the post from the jaws of defeat \
      when the user wouldn't otherwise be able [...] submit \
      ... aha ha ha ha ha ha haaaaa ... chkt chkt chkt chaaa

      --
      mov ax,4c00h
      int 21h
    9. Re:Really... by Anonymous Coward · · Score: 0

      LOOOOL

      I thought you were serious until I read "or worse".

    10. Re:Really... by PopeRatzo · · Score: 1

      I get it.

      --
      You are welcome on my lawn.
    11. Re:Really... by Anonymous Coward · · Score: 1, Informative

      You think the "or worse" is a joke? Apparently you didn't read the link:

      Potential complications include ischemia...the ischemia may result in gangrene, which could necessitate penis removal.

    12. Re:Really... by Anonymous Coward · · Score: 1, Funny

      Or an epic adventure where he thought he was superman, with a tragic ending

    13. Re:Really... by Anonymous Coward · · Score: 0

      ... or a VERY, VERY happy story...

    14. Re:Really... by Dishevel · · Score: 1

      Not sure that is actually worse. I mean if it doesn't work why have it?

      --
      Why is it so hard to only have politicians for a few years, then have them go away?
    15. Re:Really... by Anonymous Coward · · Score: 0

      and can result in impotence or worse.

      There's a pill for that.

    16. Re:Really... by nabsltd · · Score: 1

      So you can aim better.

      I know this is a foreign concept to many men, but it's important in the practice of writing your name in the snow.

    17. Re:Really... by mangu · · Score: 2, Funny

      it's important in the practice of writing your name in the snow

      I live in a tropical country, you insensitive clod!

    18. Re:Really... by conspirator57 · · Score: 1

      snow... shaved ice... whatever. i think you can sacrifice a few litres of ice from your margarita machine in the name of... well, it's not science, but it's damn cool.

      --
      "If still these truths be held to be
      Self evident."
      -Edna St. Vincent Millay
    19. Re:Really... by Anonymous Coward · · Score: 0

      Or maybe an exceedingly climactic plot with a soft let down of an ending ?

    20. Re:Really... by Anonymous Coward · · Score: 0

      1) Piss while standing
      2) Not look like a freak in the locker room
      3) Still somewhat functional with a penile implant
      4) Keep it around in case they find a medical treatment some day.

    21. Re:Really... by mustafap · · Score: 1

      >I live in a tropical country, you insensitive clod!

      I live in a cold, wet, miserable country (UK) you insensitive clod!

      Mind you, we don't have those big spiders or nasty snakes, so scrub that. I'm happy!

      --
      Open Source Drum Kit, LPLC deve board - mjhdesigns.com
    22. Re:Really... by TerranFury · · Score: 1

      So you can aim better.

      This didn't seem to help my ex-roommate... He'd seriously leave a puddle on the floor. It was disgusting.

  2. whats really needed... by Anonymous Coward · · Score: 2, Insightful

    is no sockets. some way to seamlessly connect LOCAL processes to each other without socket overhead by using the familiar socket interface. something simpler than shared memory.
    and a better protocol method of opening sockets with the hard stuff taken care of by the OS. and with transparent buffer protection etc.

    1. Re:whats really needed... by Anonymous Coward · · Score: 5, Informative

      You mean like this? http://en.wikipedia.org/wiki/Unix_domain_sockets
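
      A minimal sketch of what the linked page describes, assuming a Unix-like system: two endpoints in the same machine talk through the ordinary socket calls, but over an AF_UNIX address (a filesystem path) instead of an IP address and port. The function name and the socket path are made up for illustration.

```python
import os
import socket
import tempfile


def unix_echo_once(message: bytes) -> bytes:
    """Send `message` to a local echo endpoint over an AF_UNIX stream socket."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket path
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)             # queued in the listen backlog
        conn, _ = server.accept()

        client.sendall(message)
        data = conn.recv(len(message))   # the "server" side reads ...
        conn.sendall(data)               # ... and echoes it back
        reply = client.recv(len(message))

        for s in (client, conn, server):
            s.close()
        return reply


if __name__ == "__main__":
    print(unix_echo_once(b"hello"))
```

      Apart from the address family and the path, this is the same connect/accept/send/recv sequence a TCP program would use, which is the point the parent is making.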

    2. Re:whats really needed... by i.of.the.storm · · Score: 1

      I'm seriously confused, was the OP AC aiming for someone to post that or were they just ignorant? Parent is right either way.

      --
      All your base are belong to Wii.
    3. Re:whats really needed... by fractoid · · Score: 4, Funny

      Or more like this? :)

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    4. Re:whats really needed... by Anonymous Coward · · Score: 0

      since when do pipes/domain sockets have buffer protection ?

    5. Re:whats really needed... by Anonymous Coward · · Score: 0

      Or this? http://plan9.bell-labs.com/plan9/ Everything's a file, seriously.

  3. Open Transport, Part II by Etcetera · · Score: 5, Informative

    Been there, done that. Apple (once again) had a great implementation of an alternative technology, that it finally abandoned when it didn't feel like fighting any more.

    Open Transport (the PPC stack used in the Classic Mac OS) was fast, efficient, and cool. And based on the STREAMS methodology, the only real competition to Berkeley Sockets.

    Choice is good, mmmkay?

    1. Re:Open Transport, Part II by Anonymous Coward · · Score: 2, Funny

      As next you will probably claim Apple has invented MAC addresses too....

    2. Re:Open Transport, Part II by Anonymous Coward · · Score: 5, Funny

      Well, I did hear it was a Xerox standard so it must have been copied from someone. I guess it could have been Apple.

    3. Re:Open Transport, Part II by Concerned+Onlooker · · Score: 4, Funny

      "Well, I did hear it was a Xerox standard so it must have been copied from someone."

      I hope you meant to make that joke.

      --
      http://www.rootstrikers.org/
    4. Re:Open Transport, Part II by Anonymous Coward · · Score: 1, Insightful

      Sorry to burst your bubble, but I remember STREAMS and they sucked. I was also one of the earliest Mac programmers, and frankly, there was little to like in Classic Mac OS either, other than a lot of fiddly and clever hacks to make it fit into 128k.

      Yeah, Apple abandoned those technologies because they were tired of fighting alright... fighting their limitations and bugs.

    5. Re:Open Transport, Part II by Anonymous Coward · · Score: 3, Insightful

      Open Transport didn't come about until the mid 1990's.

      So, if you were programming for the Classic Mac OS in the 128K days, still doing that 10 years later and hating it *that* much, you probably feel like you've wasted half your life.

      Yes, you could have moved on to other, newer, more advanced operating systems, but you *chose* to stick with it. One really has to respect that I suppose.

      Shows your more masochistic side.

    6. Re:Open Transport, Part II by Anonymous Coward · · Score: 1, Insightful

      What didn't you like about STREAMS? Was it the nice architectural way you could layer processes or the efficient way you could avoid data copies?

    7. Re:Open Transport, Part II by Shin-LaC · · Score: 4, Informative

      I wrote networking code using Open Transport before I ever touched sockets, so I think I have a view as unbiased as you can get (or perhaps biased in favor of OT). I didn't mind OT, but when I moved to sockets I was impressed with how easy and comfortable it is to work with them. The limitations of the classic Mac OS architecture probably made writing Open Transport code thornier than it would have been on a modern system, so I won't discount TLI or STREAMS in general, but I have to say that the particular implementation that was Open Transport on Mac OS is inferior to sockets on a UNIX system.

    8. Re:Open Transport, Part II by bug1 · · Score: 1

      STREAMS are overkill for simplex

    9. Re:Open Transport, Part II by k8to · · Score: 1

      Um, STREAMS lost because of its poor performance compared to sockets. This article is about improving performance. STREAMS is not the answer.

      --
      -josh
    10. Re:Open Transport, Part II by wealthychef · · Score: 3, Interesting

      I found Open Transport to be a nightmare in practice. It did everything under the sun, so in order to just open a connection, send data, and tear it down, you had to do a bunch of stuff that I really could not understand as a beginning programmer. Maybe the documentation and usability have gotten better since then, or maybe I just wasn't smart enough. At any rate, sockets are easy to use, so I was glad when they switched to a Unix with sockets.

      --
      Currently hooked on AMP
    11. Re:Open Transport, Part II by FithisUX · · Score: 1

      STREAMS are also suitable for Bluetooth. The proliferation of networking over either IP or BT calls for a unified solution. STREAMS can provide the lower level of this solution. And I believe everything should run in user space, leaving buses and the physical layer in the kernel. Personally I have no problem with running those in user space too.

    12. Re:Open Transport, Part II by Anonymous Coward · · Score: 0

      Sockets can also be used for bluetooth. Some systems expose bluetooth as AF_BLUETOOTH sockets..

    13. Re:Open Transport, Part II by mzs · · Score: 3, Informative

      Sure it was cool how you could push and pop drivers (say you wanted a different line discipline) but please tell me how it prevented any copies? The AT&T implementation also had two extra context switches.

      This is what was bad about STREAMS:

      In early implementations there was no notion of multithreading so a bad thing happened later. There was a time when the STREAMS drivers and demultiplexers assumed single threaded so the kernel had to pass off to a single worker thread in the kernel everything STREAMS related. So you had some big iron box of the time with say four processors and IO performance was just balls until the STREAMS drivers were rewritten. But then you still had that worker thread around, so one thing was that those were broken out, so there was an extra two thread switches there. Then they did some stuff variously called something like Fast STREAMS where the fast paths would not switch. So all this optimization work went in to making STREAMS fast and they were still slow. It turned out that the reason for that was due to the complexity of the STREAMS subsystem and all the layering that caused so many extra function calls per driver. STREAMS have largely been relegated to legacy and conformance at this point.

    14. Re:Open Transport, Part II by Anonymous Coward · · Score: 0

      I cannot find the paper off hand, but the streams model of TCP networking is inherently performance challenged.

    15. Re:Open Transport, Part II by nikanth · · Score: 1

      Apple failed?! Is it because apple was unable to make curvy shiny cases/UI for their API? If they can, they can sell any crap!

  4. RFC 1925 by Endymion · · Score: 4, Insightful

    This seems to dance a bit too close to Networking Truths 6a, 11, and possibly 12. I will reserve judgment until I see solid real-world evidence.

    --
    Ce n'est pas une signature automatique.
    1. Re:RFC 1925 by dbIII · · Score: 3, Interesting

      There are some situations where it isn't the best choice. In very simple clustering there just may not be enough sockets. For instance one package uses "rsh" up to around 512 hosts beyond which it doesn't work reliably unless you use "ssh" and a single socket. Of course "rsh" access scares people for plenty of other good reasons but that's a point best discussed elsewhere.

    2. Re:RFC 1925 by Endymion · · Score: 4, Insightful

      Yes, there are always pathological cases that demonstrate the weaknesses of any technique. A big point I take away from RFC1925 (and personal experience), is that you have to A) recognize that trade-offs are always going to be made, and B) adapt your implementation to fit the laws of physics, instead of trying to bend the network to fit what you think an implementation should be.

      The simple fact is that Sockets have worked very well for a long time. Yes, this sometimes means you have to shape your design and implementations to fit the "socket style", and history has shown that it is not only possible, but practical. Changing to a new design will not remove the fact that if you design your protocol/app badly, or are inherently in a pathological use-case, then your network performance will suffer.

      For some problems, the ssh idea of multiplexing a single socket works well. For others, multiple rsh (*1) style work better. To say that Sockets need to be replaced because you chose to use rsh for your transport is an amazingly arrogant (*2) position. And yes, some of this is "tradition" and inertia, but designing a whole new library should be for significant real-world benefit, and not for corner-cases or marginal 1% gains.

      Of course, if someone can actually produce some real-world benchmarks that validate the "let's ditch Sockets" claim...

      [*1] As with you, this is totally ignoring the security implications, etc.
      [*2] In no way is this a personal attack at you; I mean it in a purely academic sense. It's a very tall claim to say that decades of networking history, and thousands of talented engineers were wrong.

      --
      Ce n'est pas une signature automatique.
    3. Re:RFC 1925 by dbIII · · Score: 2, Interesting

      Yes, that's why I said "some". Just like the guys that wrote clustering software that is really just "rsh" and couldn't imagine anyone running it on a couple of thousand nodes it looks like the author hit a case where it really should have been done another way. Good answer above, however what I really was doing was trying to show a way that sockets can be used badly or used well.

    4. Re:RFC 1925 by ThePhilips · · Score: 2, Interesting

      Of course, if someone can actually produce some real-world benchmarks that validate the "let's ditch Sockets" claim...

      There are really few real world examples where you can do something better than sockets.

      BSD sockets are quite a versatile API. I have programmed them on both sides - implementing my own protocol/address family and actually using them in a program - and can hardly see how one could do it better while maintaining the level of guarantees provided by the API. And that level of guarantees is what makes it possible to develop applications that behave reliably/predictably under ever-varying conditions - and not lose your sanity in the process.

      Also, what many novices forget is that sockets support a number of assertions an application can make about sync/async error handling. IOW, one can easily improve the performance of BSD sockets by simply removing error handling. But something tells me that no one's gonna do it.

      --
      All hope abandon ye who enter here.
    5. Re:RFC 1925 by Anonymous Coward · · Score: 1, Interesting

      [*1] As with you, this is totally ignoring the security implications, etc.

      If you can break our firewall and then escape with any significant fraction of our petabytes of information because our use of rsh is a security problem, then we will thank you and give you a job.

      [*2] In no way is this a personal attack at you; I mean it in a purely academic sense. It's a very tall claim to say that decades of networking history, and thousands of talented engineers were wrong.

      As implemented, sockets have limitations. On large scales, we run out of them. The number of file descriptors used to be an issue, now it's the number of sockets.

      ssh is NOT an option, because the handshaking and key exchanges are orders of magnitude too slow to be of any use on large scale. And does anybody believe that ssh w/o a passphrase on the private key is more secure than hostbased rsh authentication on a private network?

      Netbook standards do not apply to petascale computing.

    6. Re:RFC 1925 by mzs · · Score: 1

      There is always UDP (you can use the same socket for many 'connections'). It really is not too hard to come up with a little UDP protocol that has a message id and time-out with a retry that gets you really what you need. You can even leverage multicast then. Also if you are on a cluster where you can get everyone to use a jumbo frame there is zero copy UDP on both FreeBSD and Solaris. Also at least many of the Solaris (this is a hit or miss feature on other OSs) nic drivers are able to turn the interrupt rate way down by polling every now and then under heavy load instead of interrupting on every frame.

      There are also raw sockets and SCTP but I have found UDP+multicast works great.
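
      A minimal sketch of the "message id and time-out with a retry" idea, assuming a 4-byte id header and a receiver that acks by echoing the id. The names `serve`, `send_reliable` and `demo`, the header layout, and the empty-datagram shutdown signal are all invented for this illustration; the demo receiver deliberately ignores the first datagram to simulate a lost packet.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # 4-byte message id, network byte order (assumed format)


def serve(sock, drop_first, received):
    """Ack each datagram by echoing its id; optionally ignore the first one."""
    dropped = not drop_first
    while True:
        data, addr = sock.recvfrom(2048)
        if data == b"":           # empty datagram: shutdown signal for the demo
            return
        if not dropped:           # simulate one lost packet
            dropped = True
            continue
        (msg_id,) = HEADER.unpack_from(data)
        received.append((msg_id, data[HEADER.size:]))
        sock.sendto(HEADER.pack(msg_id), addr)


def send_reliable(sock, addr, msg_id, payload, timeout=0.2, retries=5):
    """Send until the matching ack arrives; return the number of attempts."""
    sock.settimeout(timeout)
    packet = HEADER.pack(msg_id) + payload
    for attempt in range(1, retries + 1):
        sock.sendto(packet, addr)
        try:
            while True:           # skip stale acks for other ids
                ack, _ = sock.recvfrom(2048)
                if len(ack) >= HEADER.size and HEADER.unpack_from(ack)[0] == msg_id:
                    return attempt
        except socket.timeout:
            continue              # no ack in time: retransmit
    raise TimeoutError(f"no ack for message {msg_id}")


def demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    received = []
    worker = threading.Thread(target=serve, args=(server, True, received), daemon=True)
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        attempts = send_reliable(client, server.getsockname(), 7, b"hello")
    finally:
        client.sendto(b"", server.getsockname())  # stop the receiver thread
        worker.join(timeout=2)
        client.close()
        server.close()
    return attempts, received
```

      One socket on each side is enough for any number of peers, since every datagram carries its own id and source address. A real receiver would also have to discard duplicates by id, because a lost ack makes the sender retransmit a message that was already delivered.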

  5. Re:haha by Anonymous Coward · · Score: 2, Insightful

    There has been an alternative all the time:
    http://en.wikipedia.org/wiki/Transport_Layer_Interface

  6. Hilarious by karmaflux · · Score: 5, Insightful

    This guy's worried about "narrowing the ways in which developers think about and write networked applications" in a world where people are reinventing wall(1) as twitter, IRC as friendfeed, and other web 2.0 'innovations.' You want to widen developers' thinking about networking? Leave sockets alone and close off port 80.

    --

    REM Old programmers don't die. They just GOSUB without RETURN.

    1. Re:Hilarious by noidentity · · Score: 1

      Anyway, isn't "narrowing the ways in which developers think about and write [type of] applications" another way of saying it abstracted things?

    2. Re:Hilarious by convolvatron · · Score: 1

      yes. but it really helps if it's the right abstraction

    3. Re:Hilarious by Darinbob · · Score: 3, Insightful

      And many of these new abstractions as described in the article could be built on top of an OS supplied socket-like API.

    4. Re:Hilarious by drmofe · · Score: 1

      ...WoW and every other of that ilk as a reinvention of MUD (Bartle, R)

    5. Re:Hilarious by Anonymous Coward · · Score: 1, Insightful

      Twitter seems more .plan than wall... but I agree ;-)

    6. Re:Hilarious by oldhack · · Score: 1

      "REM Old programmers don't die. They just GOSUB without RETURN."

      Or their stack overflowed.

      --
      Fuck systemd. Fuck Redhat. Fuck Soylent, too. Wait, scratch the last one.
    7. Re:Hilarious by Zoxed · · Score: 2, Funny

      > reinventing...

      and USENET as Web Forums :-(

    8. Re:Hilarious by Nocturna81 · · Score: 3, Funny

      alt.news.slashdot?

    9. Re:Hilarious by Steauengeglase · · Score: 3, Insightful

      Personally I don't use the service, but I'm not sure if I buy a lot of the ideas people have about Twitter (all about ego, vidiots, convergence wackos who want to tack myspace on to your toaster). I'll agree that it is a lot like the .plan updates of old, but deep down it seems more like a hack or set of hacks than a full reimplementation of anything.

      Would you rather send out a mass text message, possibly costing your non-text messaging friends hundreds of dollars (those $1/text costs gather pretty quick) or post something on Twitter that he can either look at on his PC or smart phone with unlimited data? Then tinyURL fits in another cheap hack. Sure it makes it easier to fit the URL in your twit (saying that just doesn't feel right), but it also allows Bob to look at that YouTube you sent him at work via redirect. All of this isn't anything new, it is just people coping with changes in the landscape.

    10. Re:Hilarious by jedidiah · · Score: 1

      Actually, some MMOG developers are in fact ex-MUD developers.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    11. Re:Hilarious by dwye · · Score: 1

      > alt.news.slashdot?

      I should think comp.news.slashdot, surely!

    12. Re:Hilarious by An+ominous+Cow+art · · Score: 3, Insightful

      I'd vote for talk.bizarre.slashdot

    13. Re:Hilarious by paitre · · Score: 1

      Raph Koster of Ultima Online fame, for example.

      I forget which MUD he was originally behind, tho... that was a good 15 years ago now. lol

    14. Re:Hilarious by Anonymous Coward · · Score: 0

      alt.binaries.pictures.erotica.slashdot !!

    15. Re:Hilarious by xenocide2 · · Score: 1

      Twitter is basically a terrible implementation of IRC, squashed to fit into SMS's horribly stupid and anticonsumer protocol. 20 cents for 140 bytes is not the sort of technology we should be organizing around.

      The above paragraph in fact wouldn't fit into twitter or SMS, and I think that pretty much says it all.

      --
      I Browse at +4 Flamebait

      Open Source Sysadmin

    16. Re:Hilarious by RocketRabbit · · Score: 1

      I think you have hit upon a potential gold mine of an idea here.

    17. Re:Hilarious by chromas · · Score: 1

      Eww!!

    18. Re:Hilarious by Nocturna81 · · Score: 1

      Aarrgghh! Naked Cowboy Neal! The goggles, they do nothing!!

    19. Re:Hilarious by geekgirlandrea · · Score: 1

      Ewwww. Please kill me now. I can't bear to live with those mental images.

      On the other hand, alt.binaries.pictures.erotica.unix really does exist. :)

    20. Re:Hilarious by Anonymous Coward · · Score: 0

      Would you rather send out a mass text message, [...] or post something on Twitter that he can either look at on his PC or smart phone with unlimited data?

      I'd vote for (c): do neither because me taking a healthy shit (or whatever similar tripe people fill these tweets with) isn't really broadcast-worthy.

      Face it people: the minutiae of your life isn't that interesting to anyone but you.

    21. Re:Hilarious by idontgno · · Score: 1

      My program initialization contains an "ON ERROR RESUME NEXT".

      --
      Welcome to the Panopticon. Used to be a prison, now it's your home.
  7. Which sockets API? by PhrostyMcByte · · Score: 4, Informative

    There are Berkeley sockets which are relatively portable, and then there are extremely platform-specific APIs for high performance and scalability. The old API might have run its course, but most of the new ones are still relevant. Things like asio are helping to merge all the differences into one nice API.

    1. Re:Which sockets API? by Anonymous Coward · · Score: 5, Interesting

      The Berkeley socket API has stood up very well against the tests of time, and it is fairly lean and quite versatile, but yeah, there's definitely room for newcomers.

      For example, when it comes to high packet rates - say, thousands of VoIP RTP streams - the length of the typical path a packet takes through the kernel layers becomes quite prohibitive.

      I've been trying to reach gigabit ethernet saturation with G711 VoIP RTP streams (that is, 172-byte UDP packets @ 50Hz per stream), which works out to a theoretical maximum of 10500 streams - 525000 packets/second. My initial speed tests, with minor tweaking, got me around 1/10th of that, thanks to all the kernel overhead, and the lack of control over how and when packets will be sent.

      So I wrote my own socket-> UDP-> IP-> ARP-> Ethernet abstraction which hooks directly into the PACKET_MMAP API (as used by libpcap), with the TX Ring patch, and with all the corner-cutting I managed to achieve 10000 streams (500k packets/sec) which equates to about 95% of the theoretical peak.

      In short, we probably need more widespread support for different network programming APIs which address more specific needs - BSD sockets are too generalised sometimes.
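
      The 10500-stream figure can be reproduced with a little arithmetic. This sketch assumes the 172 bytes are the RTP packet (12-byte RTP header plus 160 bytes of G.711) carried as UDP payload, and that each packet pays standard IPv4/UDP/Ethernet overhead with no VLAN tag; those framing numbers are assumptions, not something stated in the post.

```python
# Per-packet byte counts on the wire (assumed standard framing, no VLAN tag).
RTP_PACKET = 172          # 12-byte RTP header + 160 bytes of G.711 (20 ms)
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER_FCS = 14 + 4   # Ethernet header plus frame check sequence
PREAMBLE_IFG = 8 + 12     # preamble/SFD plus inter-frame gap

LINK_BPS = 1_000_000_000  # gigabit Ethernet
PACKETS_PER_SEC_PER_STREAM = 50


def wire_bytes_per_packet() -> int:
    return RTP_PACKET + UDP_HEADER + IPV4_HEADER + ETH_HEADER_FCS + PREAMBLE_IFG


def max_streams(link_bps: int = LINK_BPS) -> int:
    """Whole streams that fit on the link at 50 packets/second each."""
    bits_per_stream = wire_bytes_per_packet() * 8 * PACKETS_PER_SEC_PER_STREAM
    return link_bps // bits_per_stream


if __name__ == "__main__":
    streams = max_streams()
    print(wire_bytes_per_packet())               # 238 bytes per packet on the wire
    print(streams)                               # 10504 streams
    print(streams * PACKETS_PER_SEC_PER_STREAM)  # 525200 packets/second
    print(round(10000 / streams, 3))             # 0.952 of the theoretical peak
```

      Under those assumptions each stream costs 95200 bit/s, which lands within rounding of the 10500 streams, 525000 packets/second and "about 95%" quoted above.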

    2. Re:Which sockets API? by LSD-OBS · · Score: 3, Interesting

      Stupid thing posted me anonymously despite being logged in!

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    3. Re:Which sockets API? by ThePhilips · · Score: 1

      Ignore the RTFA. Quote:

      ... the calling program must repeatedly ask for data to be delivered.

      I presume that the date on the article is off by 10 years or something. I make the judgment based on the facts that the author calls SCTP "recently developed" and apparently never heard of /dev/epoll or kqueues (or e.g. libevent, which lets you use them in a portable manner).
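
      For readers who have not used these interfaces, here is a small sketch of readiness notification using Python's standard `selectors` module, which picks epoll on Linux and kqueue on the BSDs and macOS when they are available. The function name and the three-socket setup are invented for the example; the point is that one call reports every socket that has data, so the program does not poll each one in turn.

```python
import selectors
import socket


def collect_ready(messages: dict) -> dict:
    """Register several sockets and read only those the kernel reports ready."""
    sel = selectors.DefaultSelector()   # epoll / kqueue where available
    pairs = {}
    for name in ("a", "b", "c"):
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=name)
        pairs[name] = (reader, writer)

    # Write to a subset of the sockets; the rest stay idle.
    for name, payload in messages.items():
        pairs[name][1].sendall(payload)

    results = {}
    # One call returns every readable socket; idle ones are never touched.
    for key, _events in sel.select(timeout=1.0):
        results[key.data] = key.fileobj.recv(4096)

    for reader, writer in pairs.values():
        sel.unregister(reader)
        reader.close()
        writer.close()
    sel.close()
    return results
```

      Calling `collect_ready({"a": b"x", "c": b"y"})` reads from sockets "a" and "c" only; socket "b" never shows up in the result because the kernel never reports it as readable.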

      --
      All hope abandon ye who enter here.
    4. Re:Which sockets API? by Midnight+Thunder · · Score: 2, Funny

      Stupid thing posted me anonymously despite being logged in!

      It was deemed you already had too much Karma. That was a test of the new Karma limitation system ;)

      --
      Jumpstart the tartan drive.
    5. Re:Which sockets API? by Luyseyal · · Score: 2, Interesting

      Sounds like a new achievement "Too much karma: Enlightenment to Anonymous Cowardom"

      -l

      --
      Help cure AIDS, cancer, and more. Donate your unused computer time to worldcommunitygrid.org. Join Team Slashdot!
    6. Re:Which sockets API? by Nevyn · · Score: 1

      Actually he did mention "kevents", which even with a name no one uses is probably still worth +1/2 a clue. Saying that I didn't see any mention of epoll/sendfile/splice/tee/TCP_CORK/TCP_CONGESTION/TCP_DEFER_ACCEPT/aio ... so I think it's just the usual uninformed crap from ACM, from someone who has glanced at the FreeBSD kernel.

      --
      ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
    7. Re:Which sockets API? by metamatic · · Score: 2, Insightful

      Well, yeah. When I read the article, my immediate thought was "So, implement your fancy special-purpose socket replacement on top of UDP."

      --
      GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
    8. Re:Which sockets API? by LSD-OBS · · Score: 1

      Well it wasn't exactly an overnight hack, heh! I would really be a lot more comfortable if something similar, widespread, and well tested was in existence. It's kinda scary ripping the networking code out of my other more mature projects which are in full production in order to test this.

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    9. Re:Which sockets API? by shutdown+-p+now · · Score: 1

      When I read the article, my immediate thought was "So, implement your fancy special-purpose socket replacement on top of UDP."

      Do you mean "TCP replacement on top of UDP"? Because Berkeley sockets API covers UDP too, you know.

      And, of course, this does nothing for those scenarios where the API really is what is limiting (usually when it comes to performance).

    10. Re:Which sockets API? by mzs · · Score: 1

      I am very impressed with what you described. Are you in a position to share the code?

    11. Re:Which sockets API? by Lord+Ender · · Score: 1

      Yeah, right. Trying to take credit for this fine AC's post, I see. We see through your ruse.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    12. Re:Which sockets API? by LSD-OBS · · Score: 1

      No, I'm only talking about UDP - specifically, moving and routing large numbers of media streams. I really hope you don't think that I have no idea what a SOCK_DGRAM is :-)

      The can of worms that is TCP is not something I have the energy or need (thankfully) to re-implement!

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    13. Re:Which sockets API? by LSD-OBS · · Score: 1

      It's something I've done on company time, but my director has expressed interest in opensourcing some of this stuff (he likes the 'many eyes' idea), so in the near future we'll make it available somehow, when I'm sure it's properly threadsafe (lockless!) and works in a few other environments. Don't think that helps you right now though, sorry!

      That said, the real magic comes from the packet mmap API with Johann Baudy's TX Ring patch. The stuff I've been doing is encapsulating this down-to-the-metal interface with something approximating really simple sockets, with all the other support stuff (ARP, device and OS IP route enumeration, etc) that you need in order to use it as a general IP-over-Ethernet stack.

      Because our situation involves high pps I've spent some time keeping the per-packet stuff simple and therefore quick - eg, on 1x2.5Ghz core: 180m ARP lookups/sec, 200m route lookups/sec.

      I'm sure standard BSD sockets would perform in the same ballpark if they didn't have such broad application!

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
    14. Re:Which sockets API? by mzs · · Score: 1

      Tell him of outside interest. Also there could be a good paper in this to various conferences with notes about how it was related to research supported by said company. That always looks good.

    15. Re:Which sockets API? by nwf · · Score: 1

      Indeed, I've come to like asio. I started using it before it became part of Boost, and it just works. Very minimal code, particularly coupled with Boost's threading. Works on Linux and Mac OS, which is all I care about.

      Way, way better than Open Transport on the Mac. That was just horrible in every way imaginable.

      --
      I don't know, but it works for me.
    16. Re:Which sockets API? by Anonymous Coward · · Score: 0

      Stupid thing posted me anonymously despite being logged in!

      When you reach Enlightenment, you lose your individual identity and become part of the Collective.

    17. Re:Which sockets API? by LSD-OBS · · Score: 1

      Whoops, you weren't talking to me, sorry!

      --
      Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
  8. wrong by jipn4 · · Score: 4, Interesting

    Although the addition of a single system call to a loop would not seem to add much of a burden, this is not the case

    Really? For a lot of networking code that's in use these days, I don't see that the system call overhead is the bottleneck. On clients you usually have network bandwidth as the limiting step (rather than system calls). On servers, it usually seems to be disk access or HLL interpreters.

    Each system call requires arguments to be marshaled and copied into the kernel, as well as causing the system to block the calling process and schedule another.

    That's easy to fix without changing the socket API: just add a system call that can return multiple packets from multiple streams simultaneously, a cross between select and readv. If there's a lot of data buffered in the kernel, it can then return that with a single system call.

    Solving this problem requires inverting the communication model between an application and the operating system.

    Not only does it not require that, inversion of control doesn't even solve it, since you still have the context switches.

    1. Re:wrong by jipn4 · · Score: 2, Interesting

      Oops... left out half of it...

      That's easy to fix without changing the socket API: just add a system call that can return multiple packets from multiple streams simultaneously, a cross between select and readv. If there's a lot of data buffered in the kernel, it can then return that with a single system call. The user mode socket library can use that system call internally and still present every caller with the regular select/poll/socket abstraction; when callers request data, it first returns data that's already buffered in the process without another system call, and when it runs out of that, then it calls back into the kernel.
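
      A rough sketch of that user-mode buffering idea, in Python for brevity. The class name `BufferedSocketReader`, the `syscalls` counter and the 64 KB chunk size are invented for illustration; a real socket library would also have to deal with non-blocking sockets, errors and several descriptors at once.

```python
import socket


class BufferedSocketReader:
    """Serve small reads from a user-space buffer, refilled by one large recv()."""

    def __init__(self, sock, chunk_size=65536):
        self.sock = sock
        self.chunk_size = chunk_size
        self.buffer = bytearray()
        self.syscalls = 0  # number of recv() calls actually made

    def read(self, n):
        """Return up to n bytes; b"" means the peer closed the connection."""
        if not self.buffer:
            # Buffer is empty: one trip into the kernel fetches whatever is queued.
            data = self.sock.recv(self.chunk_size)
            self.syscalls += 1
            self.buffer.extend(data)
        out = bytes(self.buffer[:n])
        del self.buffer[:n]
        return out
```

      Callers keep asking for small pieces, but only the reads that find the buffer empty cost a system call.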

    2. Re:wrong by convolvatron · · Score: 3, Insightful

      if this were all in one domain, the most flexible and efficient thing would be to have memory for receive frames allocated at the bottom of the stack, and use callbacks all the way up.

      because of the user kernel boundary we have a copy which is difficult to get around (put the next 1k bytes exactly here, although i really dont care), and some unfriendly and inefficient hacks to weasel around the 'natural' blocking semantics.

      even if its completely academic, i think its interesting to look at the user kernel boundary and try to refactor things which have negative structural impacts.
       

    3. Re:wrong by jipn4 · · Score: 4, Interesting

      even if its completely academic, i think its interesting to look at the user kernel boundary and try to refactor things which have negative structural impacts.

      And you think that 2009 is the first time people think about this? System call overhead used to be a much bigger issue. UNIX and Linux have the current set of interfaces because they are a good compromise between simplicity and efficiency.

      And these issues are constantly being evaluated implicitly: people who write network servers benchmark their code and find the bottlenecks. If the bottleneck is some system call, they complain to the kernel mailing list and maybe roll up their sleeves and come up with something new. If that turns out to be useful, more and more people ask for it to be put into the kernel, and eventually it becomes standard.

      What motivates kernel developers is real benchmarks and the needs of important, real-world applications, not fluff pieces that express generic displeasure with the way things are done.

    4. Re:wrong by jipn4 · · Score: 4, Insightful

      the most flexible and efficient thing would be to have memory for receive frames allocated at the bottom of the stack, and use callbacks all the way up.

      Sure, in the same way that the "most flexible and efficient thing" would be to write in assembly language and turn off the MMU. But UNIX is not trying to do the most flexible and efficient thing, it's trying to be a reasonable tradeoff between simplicity, safety, and efficiency. And that means that efficiency only gets optimized to the point where it stops being a limiting factor for most programs.

    5. Re:wrong by convolvatron · · Score: 4, Interesting

      no. in fact i can remember having discussions myself about this more than 20 years ago, and those were hardly the first.

      unix has these interfaces as a matter of historical accident, what was an excellent design at the time. its hardly the only good point in the space.

      you might find that it helps to think about these things... even when developing important, real-world applications. why shouldn't the kernel be able to call into userspace safely and transfer ownership of a buffer? is that really so terrible to consider?

    6. Re:wrong by Darinbob · · Score: 4, Interesting

      But socket-like interfaces exist on systems without any user kernel interface. Ie, embedded systems. Many of those have implementations that do a good job of avoiding extra data copying, and yet still have an API that resembles sockets. I wonder if people are confusing the general idea of "sockets" with the specific "Berkeley Sockets" implementation and specification?

    7. Re:wrong by convolvatron · · Score: 1

      but in this case we have structural flaws, which as you point out have some workarounds..some of which have their own problems. it seems reasonable to think about other approaches. i'm not going to buy into the tablets brought down from the berkeley hills.

      gnn isn't really advocating throwing out sockets, you'll have to blame chellechelle for the inflammatory headline. queue is exactly that, a forum for discussing practice, and not a very deep one at that.

      go ahead and live with your select and poll variants, they really aren't that bad. but i dont think they are the best that can be imagined.

    8. Re:wrong by jipn4 · · Score: 2, Insightful

      but in this case we have structural flaws

      Not conforming to someone's pipe dream of kernel design is not a flaw. It's a flaw only if it demonstrably causes problems.

      i'm not going to buy into the tablets brought down from the berkeley hills.

      That's why they make all kinds. You're free to use Windows Vista; those people spend billions correcting supposed "structural flaws". Don't spoil UNIX or Linux for the rest of us. We like its "structural flaws" the way they are.

    9. Re:wrong by jipn4 · · Score: 4, Insightful

      unix has these interfaces as a matter of historical accident, what was an excellent design at the time.

      No, UNIX has these interfaces because they get the job done. People tried all sorts of other interfaces and none of them caught on.

      you might find that it helps to think about these thing..even when developing important, real-world applications.

      How does it "help" me to think about solutions to problems I'm not having? I've never seen the socket interface to be rate limiting in anything I care about.

      why shouldn't the kernel be able to call into userspace safely and transfer ownership of a buffer? is that really so terrible to consider?

      Well, if that's your biggest itch, be my guest: implement a kernel patch, make it public, convince people to use it, and if it develops a large user community, maybe Linus will pick it up and it will become a standard part of the kernel.

      If nobody is willing to put in the effort, evidently the feature isn't needed.

    10. Re:wrong by Endymion · · Score: 1

      No, UNIX has these interfaces because they get the job done.

      I think I'm going to have to add to my list of RFC1925 issues with this proposal...

      "(1) It Has To Work."

      I've never seen the socket interface to be rate limiting in anything I care about.

      This whole topic stinks of a really bad case of Premature Optimization.

      --
      Ce n'est pas une signature automatique.
    11. Re:wrong by TheThiefMaster · · Score: 5, Informative

      Windows' solution is pretty nice. You can pass a pre-created socket handle to AcceptEx, which automatically accepts an incoming connection using that socket handle, so that you don't have to use two system calls (select and accept). You can also pre-accept multiple sockets, instead of having to make the system calls under load.
      Sockets can also be closed with a "re-use" flag, which leaves the handle valid and saves making a system call to create another.

      You then associate the sockets with an "IO completion port", which as best as I can tell is a multithreaded-safe linked list for really fast kernel to user program communication. To receive from the socket you make an async receive call, giving a pointer to a buffer to receive into.
      Whenever data is received on those sockets (and has had a corresponding async request made for it already) the kernel automatically queues the socket handle to that linked list. If you associate a socket with the completion port before you accept a connection with it (i.e. you're using AcceptEx) it also triggers when the socket accepts a connection.
      In the user code, you run multiple threads listening on the completion port (you can also use the completion port in the thread pooling API, which runs two threads to each cpu core by default). When a message arrives from the kernel, the most recently finished thread wakes and processes the received data, which will already be in the user-space buffer you provided in the original receive call.

      If all threads are busy and there are messages in the completion port they will bounce right off of the completion port, picking up the next bit of completed IO they need to process without making a system call.

    12. Re:wrong by RAMMS+EIN · · Score: 4, Interesting

      ``Windows' solution is pretty nice. You can pass a pre-created socket handle to AcceptEx, which automatically accepts an incoming connection using that socket handle, so that you don't have to use two system calls (select and accept). You can also pre-accept multiple sockets, instead of having to make the system calls under load.
      Sockets can also be closed with a "re-use" flag, which leaves the handle valid and saves making a system call to create another.

      You then associate the sockets with an "IO completion port", which as best as I can tell is a multithreaded-safe linked list for really fast kernel to user program communication.''

      I don't know. To me, it all just sounds like kludges to work around the facts that system calls are slow and that the implementation of the Berkeley API causes many system calls. You are adapting the structure of your program to code around the problems, instead of fixing the problems that cause the natural style of your program to lead to slowness.

      There is nothing in the Berkeley socket API that mandates system calls or context switches. At worst, some copying is necessary (because the API lets the caller specify where data are to be stored, instead of letting the callee return a pointer to where data are actually stored).
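
      To make the "caller specifies where the data go" point concrete, here is a small Python sketch. The helper name `receive_into` is made up; `recv_into()` is the standard library call that fills a buffer the caller already owns. The kernel still copies into that memory, which is the copy being discussed.

```python
import socket


def receive_into(sock, buf):
    """Fill the caller's buffer in place; return the number of bytes stored."""
    view = memoryview(buf)
    # recv_into() writes straight into `buf`; no new bytes object is created,
    # but the kernel still copies from its own buffers into this memory.
    return sock.recv_into(view)
```

      An API that handed back a pointer to where the data already sit would avoid that copy, at the cost of the callee deciding the memory layout.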

      The reason we have system calls and context switches, I claim, is that we are using unsafe languages. Because of this, applications could contain code that overwrites other programs' memory. We don't want that, and we have taken to separate address spaces to avoid it. The separate address spaces are enforced by the hardware, but this has a price, especially on x86. Perhaps it is time to rethink the whole "C is fast" credo. As the number of work instructions that can be executed in the time it takes to do a context switch increases, so does the relative performance of systems that do not need context switches, but of course we can only do away with context switches if we can provide safety guarantees in another way. One way would be to have the compiler enforce them. But that is outside the scope of Berkeley sockets, of course.

      --
      Please correct me if I got my facts wrong.
    13. Re:wrong by Anonymous Coward · · Score: 0

      This isn't remotely feasible, and suggests that you don't understand Unix I/O very well (or at all).

      For a start, a socket can be shared by more than one process, and you don't know which process should receive the data until the process actually read()s it.

    14. Re:wrong by Anonymous Coward · · Score: 0

      why shouldn't the kernel be able to call into userspace safely and transfer ownership of a buffer? is that really so terrible to consider?

      Rootkit much?

    15. Re:wrong by Anonymous Coward · · Score: 0

      Like this?

    16. Re:wrong by Anonymous Coward · · Score: 0

      and at some point, it's worth just getting a faster computer. Do you spend your effort on improving the performance of the software, or do you spend it on improving the performance of the hardware.

    17. Re:wrong by Directrix1 · · Score: 1

      This has nothing to do with C. C is nothing without specific libraries. If interfacing with this system is slow, then that system is slow.

      --
      Occam's razor is the blind faith in the natural selection of least resistance and in universal oversimplification. -- EF
    18. Re:wrong by hughk · · Score: 1

      You can write without any memory management. You can even write it in assembler. You will be faster but less resilient. Yep, you can write code in your super-safe language but ultimately you are talking about buffers that change ownership, and that is not so easy to implement without synchronisation problems.

      --
      See my journal, I write things there
    19. Re:wrong by Anonymous Coward · · Score: 0

      You've missed the point. The most CPU intensive part of the process is copying the data from kernel buffers to userspace buffers, something you cannot avoid with read/readv style API.

    20. Re:wrong by Nigel+Stepp · · Score: 1

      You make it sound like we do context switches solely for (and because of) memory protection reasons.

      How do you run many programs "at once" on a single CPU without context switches? (or n programs on m<n CPUs, for that matter)

      To get rid of context switches, you need to get rid of a lot more than separate address space.

      --
      4096R/EF7BAFA6 79E1 DF98 D09D 898F 9A11 F6F0 DDDC 23FA EF7B AFA6
    21. Re:wrong by Anonymous Coward · · Score: 0

      How do you run many programs "at once" on a single CPU without context switches?

      Obviously you need a context switch, but it doesn't have to do anything but swap the stack and CPU registers and it can run entirely in user space. In particular, you can avoid the expensive remapping of virtual memory. See my previous reply to the GP.

    22. Re:wrong by ClosedSource · · Score: 1

      I think some people use socket-like API's in embedded systems because they're used to it from Unix, not because it's really needed.

      Those of us who did our first professional programming in embedded systems (and thus have no general purpose OS legacy bias) are more likely to see embedded APIs as code bloat. Of course embedded resources are so cheap today that code bloat doesn't matter that much. In addition, if the "embedded system" is really a scaled-down general purpose computer, then APIs are useful for the same reason they are in desktop and server systems.

    23. Re:wrong by Darinbob · · Score: 1

      Socket like APIs are used in embedded systems because that's both what many network stacks provide, and because they're extremely simple to use. Alternatives I've seen here tend to be on network devices (ie, routers) where the team wrote their own network stack, or real time applications (VoIP, video streaming). But alternatives can be a clumsy solution to a general network stack provided with an RTOS or third party library because of the extra learning time involved.

      Simplicity is a big factor. That's why Berkeley Sockets became popular and AT&T STREAMS did not. STREAMS had some nice concepts in the abstract, but was more difficult to use.

    24. Re:wrong by mzs · · Score: 1

      Sure you can. Check out zerocopy on FreeBSD and Solaris. Also mmap and sendfile have done the zerocopy tcp send half on Linux since 2.4 and there are some more hackish zerocopy read half things. Also you can already limit syscalls with TCP_CORK (and on the BSDs the similar TCP_NOPUSH).
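
      A hedged sketch of the send half, using Python's standard library rather than raw C. The helper names `send_file_over` and `set_cork` are hypothetical. `socket.sendfile()` uses `os.sendfile()` where the platform allows it and otherwise falls back to plain `send()`, and `TCP_CORK` only exists on Linux, so the second helper checks for it first.

```python
import socket


def send_file_over(sock, path):
    """Send a whole file with socket.sendfile(); return the byte count."""
    with open(path, "rb") as f:
        # Uses os.sendfile() where supported, so the payload need not pass
        # through a user-space buffer; otherwise it falls back to send().
        return sock.sendfile(f)


def set_cork(sock, enabled):
    """Toggle TCP_CORK on a TCP socket if this platform defines it (Linux)."""
    opt = getattr(socket, "TCP_CORK", None)
    if opt is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, opt, 1 if enabled else 0)
    return True
```

      The usual pattern is to cork, write headers, call sendfile for the body, then uncork so the kernel can coalesce everything into full segments.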

    25. Re:wrong by kelnos · · Score: 1

      Er, no, it has nothing to do with the language. It has to do with the kernel not trusting the code that's running in userspace. And why should it? No matter what *language* you write in, it still gets compiled down to the same (or similar) machine code, or it gets run on an interpreter that boils it all down to machine code.

      A new language doesn't solve this... fundamentally changing how user programs interact with the kernel might, but it may mean a completely new way of thinking both by the developer and OS.

      --
      Xfce: Lighter than some, heavier than others. Just right.
    26. Re:wrong by Anonymous Coward · · Score: 0

      Although the addition of a single system call to a loop would not seem to add much of a burden, this is not the case

      Really? For a lot of networking code that's in use these days, I don't see that the system call overhead is the bottleneck.

      IRC servers have repeatedly run into system limitations for years, including the number of syscalls necessary to get things done.

    27. Re:wrong by lawpoop · · Score: 1

      unix has these interfaces as a matter of historical accident, what was an excellent design at the time. No, UNIX has these interfaces because they get the job done. People tried all sorts of other interfaces and none of them caught on.

      Is this really evidence that there isn't a better solution out there, or rather that it's hard to unseat an established standard?

      --
      Computers are useless. They can only give you answers.
      -- Pablo Picasso
    28. Re:wrong by Anonymous Coward · · Score: 0

      Is this really evidence that there isn't a better solution out there, or rather that it's hard to unseat an established standard?

      Define "better". Do you mean "easier"? "More efficient"? "Cleaner"? "More versatile"? System V streams were "better" in some of those ways, and people gave them a serious try. The fact that nothing else has caught on tells you that nothing is sufficiently "better" for most people in order to make it worth switching.

  9. Structured Stream Transport by ace123 · · Score: 5, Informative

    BSD sockets have a limitation of only a single stream at a time (for example, if you are loading a website over HTTP and you get stuck loading a huge image, you have no choice but to open up another socket connection or else wait). They are also stuck around the paradigm of only supporting byte streams, which means that users are always forced to write the same code over and over to create packet headers or delimited messages.

    I would highly recommend checking out Structured Stream Transport. I'm not from MIT and I wasn't entirely satisfied with their sample implementation, but the paper is really insightful and explains how you can develop basically a smarter version of TCP that is both more efficient and also more flexible. And I'm sure there are other systems being developed with similar ideas in mind.

    We definitely need to keep bsd sockets, if not just because I'm a regular user of netcat :-p, and also because they are what allow the creation of more advanced protocols, but I don't think most applications should still be using such low-level protocols today.
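
    For reference, the "same code over and over" usually looks something like the Python sketch below: a length prefix in front of every message on a stream socket. The 4-byte big-endian header and the function names are arbitrary choices for illustration, not part of any standard.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (arbitrary choice)


def send_message(sock, payload):
    """Write one length-prefixed message to a stream socket."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, looping because recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock):
    """Read one length-prefixed message and return its payload."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

    The loop in `recv_exact` is the part people tend to forget: a stream socket is free to hand back half a header.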

    1. Re:Structured Stream Transport by Anonymous Coward · · Score: 5, Informative

      TCP != BSD Sockets

      No matter how much abstraction you pile on top to open multiple streams, automatically add headers, communicate a fixed message size to avoid in-band delimiters, etc., you'll still have to send all those messages over linear octet streams when using TCP.

      Now you could choose not to use TCP -- UDP lets you send non-linear messages of arbitrary size without delimiters. And there may be other newer, better options available as well. But you can do both TCP and UDP (as well as other comm types) using the same sockets API.
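
      A small Python sketch of the same sockets API carrying datagrams instead of a stream. The helper name `udp_roundtrip` is invented, and it assumes loopback UDP is available and does not drop or reorder these few packets. In practice datagram size is also bounded, so "arbitrary size" should be read loosely.

```python
import socket


def udp_roundtrip(messages):
    """Send each message as its own UDP datagram over loopback and read them back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        for msg in messages:
            sender.sendto(msg, receiver.getsockname())
        # Each recvfrom() returns exactly one datagram, never a merged stream.
        return [receiver.recvfrom(65535)[0] for _ in messages]
    finally:
        sender.close()
        receiver.close()
```

      No length prefix is needed here because the message boundary is preserved by the protocol, not by the API.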

    2. Re:Structured Stream Transport by Anonymous Coward · · Score: 5, Informative

      BSD sockets have a limitation of only a single stream at a time (for example, if you are loading a website over HTTP and you get stuck loading a huge image, you have no choice but to open up another socket connection or else wait).

      No, they don't. This is a limitation of TCP. You could just as easily use a different protocol (e.g., SCTP) with sockets.

    3. Re:Structured Stream Transport by serviscope_minor · · Score: 4, Insightful

      It is said that those who do not understand history are doomed to repeat it...

      They are also stuck around the paradigm of only supporting byte streams, which means that users are always forced to write the same code over and over to create packet headers or delimited messages.

      Byte streams are one of the UNIX fundamentals. Before UNIX, there were many systems which provided wide varieties of structured IO. This turned out to be a real pain and one of the UNIX innovations was simply to scrap it.

      And today, most applications don't use low level sockets: they call a library which does it for them. Moving the library into the kernel is not a good idea.

      --
      SJW n. One who posts facts.
    4. Re:Structured Stream Transport by ace123 · · Score: 1

      I honestly had never heard of SCTP before, and I'm surprised that it is not used more widely since it has been around since 2000. It looks to be more complicated than what I was talking about since it covers more issues (talking to multiple hosts). Do you happen to know of any uses of this protocol in real applications?

      BSD Sockets themselves are very flexible, I suppose I was complaining about the read/write semantics in stream sockets. Either way, it is possible to layer protocols even at the application level so it's not a big deal. Sadly I didn't get a chance to read the article before acm.org died.

    5. Re:Structured Stream Transport by ace123 · · Score: 2, Interesting

      I definitely agree with you. In fact byte streams being a fundamental part of POSIX is one thing I love and make use of every day, for example piping output between programs/sockets. My post was not very clear, but I was trying to say that users developing application protocols should not be using BSD sockets directly any more--people usually write or use libraries for that sort of thing.

      As far as new protocols go, you can build basically anything using UDP (and UDP is far less likely to be firewalled than any custom IP-level protocol you make up). I think such a protocol could only ever be practically implemented as a user-space library anyway.

      I would be curious what the article thinks is so fundamentally wrong with the sockets paradigm.

    6. Re:Structured Stream Transport by amorsen · · Score: 3, Insightful

      I honestly had never heard of SCTP before, and I'm surprised that it is not used more widely since it has been around since 2000.

      Firewalls don't support it. Consumer routers can't do NAT on it. New protocols on the Internet are fairly unlikely to have a chance.

      --
      Finally! A year of moderation! Ready for 2019?
    7. Re:Structured Stream Transport by Kjella · · Score: 1

      It is said that those who do not understand history are doomed to repeat it... (...) Byte streams is one of the UNIX fundamentals. Before UNIX, there were many systems which provided wide varieties of structured IO. This turned out to be a real pain and one of the UNIX innovations was simply to scrap it.

      Ah, the "It's no longer our problem, thus the problem is solved" approach. While maybe it shouldn't be in the kernel, there's some things there should be only one of and basic messaging/IPC is one of them - looking at the wikipedia page there's more than two dozen listed and probably doesn't include the ancient pre-UNIX ways. It looks like finally the open source world is starting to settle on D-Bus as the core backend (Gnome, KDE and Win/Mac support) but that it's taken 40 years to get there exactly because UNIX/POSIX just left it at byte streams.

      Maybe I'm thinking more in terms of a development platform than an OS platform, but just giving you the bare metal and "the rest can be done by libraries" is not my idea of a good solution. C++ is the worst example of a DIY library project, personally I prefer using C++/Qt but if I didn't I'd probably go with Java or C#. You really shouldn't have to go outside the standard library (for me adopting in the Qt library) to get what I'd consider programming "primitives", for advanced definitions of primitive. Libraries should be more about taking primitives and creating "modules" that you can drop in to get specific bits of functionality, not just e.g. boost to get a decent threading.

      But then, I never was one of the C++ developers that love to fumble about with clever memory tricks and pointer magic. I'm much more concerned with getting the high-level right and not optimizing some algorithm way down there.

      --
      Live today, because you never know what tomorrow brings
    8. Re:Structured Stream Transport by Phs2501 · · Score: 4, Funny

      Firewalls don't support [SCTP]. Consumer routers can't do NAT on it. New protocols on the Internet are fairly unlikely to have a chance.

      This is a good example of why NAT sucks. When IPv6 comes along and restores true end-to-end connectivity across the Internet, there will be a lot more freedom to experiment with new and interesting protocols. Except, of course...

      New protocols on the Internet are fairly unlikely to have a chance.

      Damn.

    9. Re:Structured Stream Transport by Anonymous Coward · · Score: 0

      I honestly had never heard of SCTP before, and I'm surprised that it is not used more widely since it has been around since 2000.

      Firewalls don't support it yet. Consumer routers can't do NAT on it right now. New protocols on the Internet are fairly unlikely to have a chance.

      Fixed that for you.

      The protocols will be added eventually. Both Linux and BSDs have SCTP in the kernel now, and quite a few routers use them as the base OS (Linux->Linksys, FreeBSD->Juniper's JUNOS).

      It'll trickle out eventually, even if it does take a few years.

    10. Re:Structured Stream Transport by mishehu · · Score: 1

      I'm not sure about firewalls not supporting sctp... I seem to have a file named "nf_conntrack_proto_sctp.ko" and another named "xt_sctp.ko" on my system. Perhaps you meant cheap POS firewalls for grandma. But I'm not sure how SCTP solves the underlying issue of sockets... SCTP still uses sockets.

    11. Re:Structured Stream Transport by Panaflex · · Score: 2, Interesting

      if you are loading a website over HTTP and you get stuck loading a huge image, you have no choice but to open up another socket connection or else wait

      I think you're confusing the HTTP protocol with BSD sockets. Your example is an HTTP 1.0 limitation, check out HTTP pipelining.

      A socket is at its most basic a read/write file handle. You can implement asynchronous handling, write your own protocol and do lots of extreme goodness. If you choose to be protocol stupid about how you transport your data then you live with the consequences.

      As a network protocol engineer, you must look at minimum guaranteed latency, pick an average guaranteed bandwidth and tailor your protocol & packet sizes as necessary.
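
      The "file handle" view can be shown in a few lines of Python on a Unix-like system. The function name `echo_via_fd` is made up; it bypasses the socket methods entirely and uses plain `os.write()` and `os.read()` on the descriptors.

```python
import os
import socket


def echo_via_fd(message):
    """Push bytes through a socket pair using only file-descriptor calls."""
    a, b = socket.socketpair()
    try:
        os.write(a.fileno(), message)              # plain write(2) on the descriptor
        return os.read(b.fileno(), len(message))   # plain read(2) on the peer
    finally:
        a.close()
        b.close()
```

      Anything that works on a descriptor (pipes to other programs, select, poll) works here too, which is most of the appeal.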

      Writing a protocol is difficult when you care about performance and error handling.

      IMHO, HTTP should have allowed a UDP pipelined transport mode. The overhead savings would have been worth the hassle.

      --
      I said no... but I missed and it came out yes.
    12. Re:Structured Stream Transport by Panaflex · · Score: 1

      I think you mean "a limitation of HTTP." TCP is a linear stream - but multiplexing data over TCP streams is certainly an option, as HTTP pipelining has shown.

      --
      I said no... but I missed and it came out yes.
    13. Re:Structured Stream Transport by isj · · Score: 1

      Do you happen to know of any uses of this protocol in real applications?
      Diameter applications (rfc3588). SS7 can be transported over SCTP.
      I have been in contact with people using SCTP for high-performance cluster computing, but I don't know the details.

      There is experimental work on HTTP-over-SCTP. At one point it seemed that streaming applications could benefit from SCTP's lack of head-of-line blocking, but recently it seems that DCCP is a better fit for those applications.

    14. Re:Structured Stream Transport by Cybah · · Score: 1

      I would be curious what the article thinks is so fundamentally wrong with the sockets paradigm.

      TFA doesn't say there's anything wrong with the sockets "paradigm". It basically says that the API has performance issues with two use cases (low latency and high bandwidth) and doesn't support multi-homing very well.

      Wrt multi-homing, it mostly talks about applications which I'd call mobile IP. Also, it curiously doesn't mention the ability to bind to all addresses as the current (crude) solution to multi-homing.
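
      For anyone who hasn't seen it, the "bind to all addresses" approach looks like this in Python. The function name `listen_on_all_addresses` is hypothetical; the empty host string is the standard way to ask for INADDR_ANY.

```python
import socket


def listen_on_all_addresses(port=0):
    """Return a TCP listener bound to every local IPv4 address (INADDR_ANY)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))  # "" means INADDR_ANY; port 0 lets the OS choose
    srv.listen()
    return srv
```

      After `accept()`, calling `getsockname()` on the accepted socket tells you which local address the peer actually reached. That is about all the help the API gives you, hence "crude".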

    15. Re:Structured Stream Transport by ADRA · · Score: 3, Funny

      Sorry to cut it to you, but NAT is here to stay. As a security paradigm, there's no surface attack to a user's PC that isn't even visible.

      --
      Bye!
    16. Re:Structured Stream Transport by alexmin · · Score: 1

      "UDP lets you send non-linear messages of arbitrary size without delimiters." - as long as it does not exceed MTU, which is insufficient for a lot of apps. To combat that problem people built reliability layers on top of UDP (RMCAST etc) and ended up reinventing TCP, poorly.

      Having to serialize data structures into byte stream is a problem, though, that I fully agree with. But what is the other option?

    17. Re:Structured Stream Transport by mzs · · Score: 1

      TCP suffers from "head of the line blocking" see the URG hack for more about that. HTTP being on top of TCP suffers from this as well. What this means is say one fragment gets lost. It will take something like 25 seconds for that to be resent. In the meantime all the other data that has been coming along cannot be passed along to your program.

    18. Re:Structured Stream Transport by lewiscr · · Score: 2, Interesting

      I'll always be NATing my home connection, even with IPv6. I assume my cable provider will charge me for those "extra" IPv6 IPs that I would be using. And if this one doesn't, the cable provider that buys this one will.

    19. Re:Structured Stream Transport by kelnos · · Score: 1

      Yah, just like IPv6 has.

      There are also quite a large number of routers (the majority) that *don't* run Linux. (And I do know something about this, working for a major home router OEM.) It's a commodity business these days, and cost cutting is the rule. If adding extra protocol support (esp for something like IPv6 with larger memory requirements) means adding more RAM or flash, it's not gonna happen unless you can make a business case for it. I'd be less surprised to see IPv6 in a consumer-level router appliance than SCTP... and I don't expect to see IPv6 standard for quite a while.

      --
      Xfce: Lighter than some, heavier than others. Just right.
    20. Re:Structured Stream Transport by Anonymous Coward · · Score: 0

      No, HTTP pipelining doesn't solve the problem the OP is talking about. You can certainly send more than one request to the HTTP server before receiving even the first response, but the server will still send the responses sequentially, in the order received. If you want to receive the responses out of order or simultaneously, you need to open separate socket connections to the server.

      Of course, it's certainly *possible* to design a protocol that interleaves responses (sorta analogous to how audio and video are interleaved in a container file), but HTTP at least doesn't do it. In practice, it's easier just to open another connection, anyway.

      But I'm not convinced this is a bad thing...
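
      A toy Python sketch of what such an interleaving protocol might look like, purely in memory. The frame layout (2-byte stream id plus 4-byte chunk length) and the function names are invented for illustration and have nothing to do with HTTP itself.

```python
import struct

FRAME = struct.Struct("!HI")  # stream id + chunk length; a hypothetical format


def interleave(streams, chunk_size):
    """Round-robin chunks from several payloads into one framed byte string.

    `streams` maps a stream id to its complete payload.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offsets = {sid: 0 for sid in streams}
    out = bytearray()
    pending = True
    while pending:
        pending = False
        for sid, payload in streams.items():
            start = offsets[sid]
            if start >= len(payload):
                continue  # this stream is finished
            chunk = payload[start:start + chunk_size]
            out += FRAME.pack(sid, len(chunk)) + chunk
            offsets[sid] = start + len(chunk)
            pending = True
    return bytes(out)


def demultiplex(data):
    """Rebuild each stream's payload from the framed byte string."""
    result = {}
    pos = 0
    while pos < len(data):
        sid, length = FRAME.unpack_from(data, pos)
        pos += FRAME.size
        result.setdefault(sid, bytearray()).extend(data[pos:pos + length])
        pos += length
    return {sid: bytes(buf) for sid, buf in result.items()}
```

      With this framing a small response finishes after its first chunk instead of waiting behind a huge one, though a lost TCP segment would still stall every stream on the connection.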

    21. Re:Structured Stream Transport by Panaflex · · Score: 1

      If you don't want ordered packets, then use UDP. That's what it's for!

      I've written a few web servers and contributed some to Apache - so I'm intimately familiar with the issues of HTTP.

      --
      I said no... but I missed and it came out yes.
    22. Re:Structured Stream Transport by Just+Some+Guy · · Score: 2, Insightful

      Sorry to cut it to you, but NAT is here to stay. As a security paradigm, there's no surface attack to a user's PC that isn't even visible.

      If only you could devise some kind of wall between your machine and the fiery flames that didn't require NAT, but alas, such is merely dreaming.

      --
      Dewey, what part of this looks like authorities should be involved?
  10. Couldn't this be like a flag, rather than new API? by tjstork · · Score: 4, Interesting

    The recently developed SCTP (Stream Control Transmission Protocol) incorporates support for multihoming at the protocol level, but it is impossible to export this support through the sockets API

    The word that bugs me there, is "impossible". The question is, why? If you have to do something with sockets under the hood, then so be it, but it would seem to me that you could just add a few more fields to socket address to take into account multiple homes.

    We've already had alternative APIs to sockets, and for quite some time. Sockets won. There were named pipes, IPX/SPX, and the seemingly stupid idea of treating a network resource as a file has trumped them every time.

    --
    This is my sig.
  11. SCTP an interesting example by isj · · Score: 5, Interesting

    I am developing SCTP applications and have contributed to the Linux implementation, and I think that one of the advantages of the socket API is that it is usable with select() and poll(), i.e. sockets are file descriptors you can pass around.

    But for SCTP there are things that don't fit nicely into the socket API, especially when using one-to-many socket types. For instance for retrieving options for an association you have to piggyback data in a getsockopt() call by using the output buffer also for input. It works, but it is not nice. Also, for sending/receiving messages you have to use sendmsg/recvmsg with all the features including control data, and the ugly control data parsing.

    1. Re:SCTP an interesting example by QuoteMstr · · Score: 3, Informative

      It works, but it is not nice.

      So use a wrapper, like sctp_send from libsctp. There's no reason the kernel proper has to export these interfaces.

    2. Re:SCTP an interesting example by phantomfive · · Score: 1

      Just FYI: about a year ago I was writing a program with SCTP, and I kept getting kernel panics in the SCTP layer. It was annoying. YMMV.

      Also, select() and poll() are both inefficient. I suggest you use epoll(). Once you get the hang of it, I think you will like the interface better as well.

      --
      Qxe4
    3. Re:SCTP an interesting example by Anonymous Coward · · Score: 0

      I hate to say it, but is it possible that your program using SCTP was poorly written? We were doing testing with SCTP 2 1/2 years ago or more, and we never ended up with kernel panics. We were using the vanilla kernel tree at the same time too.

    4. Re:SCTP an interesting example by QuoteMstr · · Score: 1

      select and poll are perfectly fine for small network daemons. Sometimes the reduction in code complexity is worth a negligible performance hit. Not every program needs to wait on 10,000 sockets all at once.

    5. Re:SCTP an interesting example by QuoteMstr · · Score: 1

      There was a kernel bug, and most likely also a bug in the OP's program. No program, no matter how badly written, should cause a kernel panic.

    6. Re:SCTP an interesting example by isj · · Score: 1

      True. Perhaps my viewpoint is a bit nonstandard because I also developed some of the wrappers in lksctp, and had to deal with some of the issues/limitations of the kernel API versus the nice application-level lksctp API. One thing that would make life a bit easier for the wrappers would be a nice input-output socket call. Currently we have to misuse getsockopt().

      From the application level, something that is missing (this is SCTP-specific) is a way to wait for free output buffers for a specific SCTP association in a one-to-many socket. The problem is that other associations may have free buffers, so poll() for POLLOUT will return immediately, but the association that the application wants to send to has its buffers full. The result is a busy wait.

    7. Re:SCTP an interesting example by phantomfive · · Score: 1

      Yes, this is correct (on both counts). Once my program was debugged, the problem went away. Continually crashing kernels didn't help the debugging process much.

      --
      Qxe4
    8. Re:SCTP an interesting example by phantomfive · · Score: 1

      The thing is, select()/poll() aren't exactly an interface of happiness, bubbles and simplicity. In a lot of cases epoll() is going to be a better interface. I would probably use it all the time, except it's not cross-platform compatible, unfortunately.

      --
      Qxe4
  12. Why use sockets when you can just use vice grips? by yourassOA · · Score: 1

    And hey one size fits all.

  13. Hmm... by fozzy1015 · · Score: 3, Interesting

    In my experience the way the socket API can slow down a processor is having to monitor many thousands of socket descriptors using select() or poll(), like in a web server. For Linux epoll() was created for this scenario.

    1. Re:Hmm... by mzs · · Score: 1

      Yes and /dev/poll on Solaris and kqueue on FreeBSD or just do the best thing on whatever your system is by using libevent:

      http://www.monkey.org/~provos/libevent/
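
      To illustrate the "use the best thing on whatever your system is" idea without pulling in libevent, here is a minimal sketch using Python's standard selectors module, which makes the same kind of choice (epoll, kqueue, /dev/poll, poll or select). The helper name wait_readable is made up for this example.

```python
import selectors
import socket


def wait_readable(socks, timeout):
    """Hypothetical helper: return the sockets in `socks` that become
    readable within `timeout` seconds."""
    # DefaultSelector picks the most efficient mechanism the platform
    # offers: epoll on Linux, kqueue on the BSDs, /dev/poll on Solaris,
    # falling back to poll or select elsewhere.
    with selectors.DefaultSelector() as sel:
        for s in socks:
            sel.register(s, selectors.EVENT_READ)
        return [key.fileobj for key, _events in sel.select(timeout)]


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"ping")
    print(wait_readable([a], 1.0) == [a])
```

      Registering every socket on each call is itself O(N), so a real server would keep one selector alive and only register or unregister sockets as connections come and go.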

      Poll is sadly O(N) but there are some optimizations that can be made to poll to make it faster.

      First poll does not need to be a simple syscall that copies over the entire array into kernel memory. Almost every time that poll is called that array is identical to what it was the last time and at the same address. libc can in userspace first compare that array, base addr, and size and when all is the same it can call a faster poll syscall or pass an arg to tell it to do the fast path of what was done the last time.

      Secondly the kernel does not need to copy a buffer for the results into the userspace. It can simply just twiddle that memory itself while still in the kernel context.

      Finally there can be optimizations for the simple typical case of there only being a few descriptors, like 8 or less.

      Many systems do some or all of those, from performance measurements it seems to me that Solaris does particularly well.

  14. STREAMS? by $lashdot · · Score: 2, Informative

    Macs used STREAMS from system 7.5.2 onwards. Was kind of sad to see that go away with the switch to OS X.

    1. Re:STREAMS? by rgviza · · Score: 1

      Streams are how information moves to network and disk.

      You can't transfer bytes without a stream unless you are opening and closing a handle for every byte. If that were the case, OSX would run like so much molasses.

      Further since OSX is a UNIX operating system written in C, it *has* to support streams. Streams are a part of C and UNIX and OSX is an officially certified UNIX OS.

      --
      Don't kid yourself. It's the size of the regexp AND how you use it that counts.
    2. Re:STREAMS? by chthonicdaemon · · Score: 1

      I think GP meant this STREAMS, not the ones you are thinking of.

      --
      Languages aren't inherently fast -- implementations are efficient
  15. Absolutely! by Anonymous Coward · · Score: 0

    I've been complaining about them for years. Maybe a generation ago, they were useful, but I think most people wear them out of tradition. The first thing I do when I get home is take off my shoes and socks. Also, I hate pants. ... oh! "Sockets" ... they're fine or whatever.

  16. IOCP by kiss7 · · Score: 1

    IOCP is perfect for both high bandwidth and low latency. ...you have to use the "ugly" windows os for it :)

    1. Re:IOCP by dave420 · · Score: 1

      Or Solaris.

  17. It really seems to me... by Secret+Rabbit · · Score: 2, Insightful

    ...that most of the things that this guy is talking about would be better implemented below the sockets API. As in, how the OS handles things. Making things transparent is a good thing.

    I'll also point out that having a fail over interface so that the client doesn't lose the connection has already been done in OpenBSD's pf called CARP. It is a free alternative to VRRP and HSRP. In other words, this doesn't have to be implemented in the API when another avenue already exists that does it.

  18. Yes Mine are good by ben2umbc · · Score: 3, Funny

    My socks are fine for now. When they do run their course I go to Walmart and get new socks; it's $5 for 6 pairs!

  19. User level networking and the last copy by wdebruij · · Score: 4, Interesting

    This is hardly news and partly mistaken.

    The statement that sockets limit throughput by copying between kernel and application processes is a bit simplistic. The copy of Rx data to an application usually primes the cache. If data isn't touched and loaded into the cache at this point, it will have to be loaded shortly, anyway. Granted, for Tx this trick does not hold.

    Second, the interface is not the implementation. Just because sockets are traditionally implemented as system calls does not mean that they have to be. User level networking is a well known alternative to OS services for high-bandwidth and low-latency communication (e.g., U-net, developed around '96). I know, because I myself built a network stack with large shared buffers that implements the socket API through local function calls (blatant plug, but on topic. The implementation is still shoddy, but good enough for UDP benchmarking).

    User level networking can also offer low latency. My implementation doesn't, but U-net does.

    This leaves the third point of the article, on multihoming. As sockets abstract away IP addresses and network interfaces, I don't see why they cannot support multihoming behind the socket interface. Note that IP addresses do not have to be mapped 1:1 onto NICs. Operating systems generally support load-balancing or fail-over behind the interface through virtual interfaces (in IRIX) or some other means (Netfilter in Linux).

    No need to replace sockets just yet.

    1. Re:User level networking and the last copy by mzs · · Score: 1

      Maybe I am not really getting your RX caching argument but it sort of falls apart.

      Say you have a typical NIC that DMAs frames into main memory. That's actually a pretty typical scenario. So there is this frame in DRAM. Depending on the cache and DMA coherence and snooping abilities of your hardware, that frame is not in the cache of the processor at that point yet. The kernel now copies it to the user space buffer. That is where it gets into the CPU cache, but with fancy hardware you might have just cached the original DMAed copy as well.

      If there was no copy it would have gotten cached the moment the userspace process accessed it. There really is no win there and if you have fancy hardware you are flushing extra cachelines. So you have gained nothing, possibly flushed something else you will need soon out of the cache, and also added a copy.

      I can tell you from personal experience that I have had great improvements from zero_copy in FreeBSD and Solaris. It really is a lot of time that the CPU burns copying buffers around. I also have the displeasure of dealing with VxWorks, and historically network performance has been abysmal. A lot of it comes down to the netTask, but there is certainly a lot of overhead in copying. Using zBufs was not much of an improvement; I am thinking that was due to the overhead in the zBufs calls themselves. I had some progress in using so_socket itself but never really finished that up. Wind River has moved to some proprietary stack, it seems, so who knows what the situation is currently.

  20. Should stop drinking... by Anonymous Coward · · Score: 1, Funny

    Have Rockets Run Their Course?

  21. Re:Couldn't this be like a flag, rather than new A by phantomfive · · Score: 4, Interesting

    The word that bugs me there, is "impossible". The question is, why? If you have to do something with sockets under the hood, then so be it, but it would seem to me that you could just add a few more fields to socket address to take into account multiple homes.

    Especially since SCTP actually does use the sockets API. You have to use recvmsg() instead of recv() if you want to do multi-homing, but in using SCTP I was actually impressed by how flexible the BSD socket API actually is. I can't say I particularly like it, and everyone who uses it ends up writing a wrapper around most of the send and recv calls, but flexibility is definitely it's strong point. If we ever do get routing by carrier pigeon, the BSD socket API will be able to adapt to it.

    --
    Qxe4
  22. Re:Old school socket set.... by Anonymous Coward · · Score: 0

    Language evolves, get over it. Often the best name for some abstract concept is a metaphor using some concrete object. Your computer is full of strings and threads too! Call the dictionary police!

    Oh, and that language your so possesive of? Yeah, hasn't been around very long (and the King would like to have a work with you about what you've done to His English).

  23. Bad by poor_boi · · Score: 0, Redundant

    This is a bad article and a bad thread and you all should feel bad for posting it and taking it seriously and, finally, for reading this -- my post.

    1. Re:Bad by Anonymous Coward · · Score: 0

      just... wow. Seriously, get over it, it's just a slashdot article

  24. Alternatives by RAMMS+EIN · · Score: 1

    I couldn't get to the article, but if they think Berkeley sockets are obsolete, I'd like to see what alternative they offer, why they think these alternatives are better, and what the pitfalls of the alternatives are.

    --
    Please correct me if I got my facts wrong.
  25. Re:Old school socket set.... by Anonymous Coward · · Score: 0

    Oh, and that language your so possesive of? Yeah, hasn't been around very long (and the King would like to have a work with you about what you've done to His English).

    Case in point.

  26. Re:Old school socket set.... by Anonymous Coward · · Score: 0

    BSD sockets are also an 'old school set'. This might not be the site for you.

  27. Re:Old school socket set.... by TapeCutter · · Score: 2, Funny

    "damn punk kids"

    Ro-ro..

    Let's get outta here Scooby!

    --
    And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  28. It's not sockets, its bind() by argent · · Score: 5, Interesting

    The socket API... or rather the UNIX file descriptor API... has been extended many times. Sockets are already one such extension, and there's no reason you couldn't do something like mmap() a socket to map the buffers into user space directly. Heck, UDP sockets already diverge from the read/write paradigm.

    The problem with sockets is at a higher level. They're not mapped into the file system name space. You should be able to open a socket by calling open() on something like "/dev/tcp/address-or-name/port-or-name" and completely hide the details of gethostbyname(), bind(), and so on from the application layer. If they'd done that we'd already be using IPv6 for everything because applications wouldn't have to know about the details of addresses because they'd just be arbitrary strings like file names already are.
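
    To make the idea concrete, here is a rough user-space sketch in Python of what such an open() could look like. It only imitates the proposal: the function name open_net and the exact path layout are assumptions for illustration, not an existing kernel or library interface. The caller never sees an address family, because socket.create_connection() resolves the name and tries IPv4 and IPv6 addresses in turn.

```python
import socket


def open_net(path, timeout=None):
    """Hypothetical helper: open "/dev/tcp/<host>/<port-or-service>" and
    return an unbuffered binary file object for the connection."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["dev", "tcp"]:
        raise ValueError("expected /dev/tcp/<host>/<service>, got %r" % path)
    host, service = parts[2], parts[3]
    # Accept either a decimal port or a service name such as "http".
    if service.isdigit():
        port = int(service)
    else:
        port = socket.getservbyname(service, "tcp")
    # Name resolution and address-family selection happen in here.
    sock = socket.create_connection((host, port), timeout=timeout)
    f = sock.makefile("rwb", buffering=0)
    # The file object keeps the connection alive; it is really closed
    # only once f is closed as well.
    sock.close()
    return f
```

    Usage would be something like f = open_net("/dev/tcp/localhost/8080") followed by plain f.write() and f.read() calls, with the address details hidden behind the string.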

    1. Re:It's not sockets, its bind() by Quietust · · Score: 1

      If they'd done that we'd already be using IPv6 for everything because applications wouldn't have to know about the details of addresses because they'd just be arbitrary strings like file names already are.

      I was under the impression that getaddrinfo() already served to easily provide support for IPv6.

      --
      * Q
      P.S. If you don't get this note, let me know and I'll write you another.
    2. Re:It's not sockets, its bind() by Score+Whore · · Score: 1

      If they'd done that we'd already be using IPv6 for everything because applications wouldn't have to know about the details of addresses because they'd just be arbitrary strings like file names already are.

      Anyone who writes an application that needs to know the details of addresses is doing it wrong. Sockets don't require any particular knowledge of the underlying network protocols.

    3. Re:It's not sockets, its bind() by cpghost · · Score: 1

      You mean something like portalfs, as implemented e.g. in FreeBSD?

      --
      cpghost at Cordula's Web.
    4. Re:It's not sockets, its bind() by argent · · Score: 1

      Anyone who writes an application that needs to know the details of addresses is doing it wrong. Sockets don't require any particular knowledge of the underlying network protocols.

      $ man getaddrinfo
      ...
          The hostname and servname arguments are either pointers to NUL-terminated
          strings or the null pointer. An acceptable value for hostname is either
          a valid host name or a numeric host address string consisting of a dotted
          decimal IPv4 address or an IPv6 address. The servname is either a deci-
          mal port number or a service name listed in services(5). At least one of
          hostname and servname must be non-null.
      ...
      $ man gethostbyaddr
      ...

      You can do anything you want with an already opened socket without knowing if the underlying network layer uses IPv4, IPv6, X.400, DECNET, OpenNET, FutureNet, SMB, NetBIOS, Netware, or ID4 network addresses or end-point identifiers.

      You can't open a socket without implicit knowledge about AF_INET, AF_INET6, or AF_UNIX addresses. There have been SOME improvements over the past twenty-odd years, so if your application is relatively new and you're on top of things it's not too much work to handle IPv4 and IPv6, but damn...

      The API should never have exposed anything but an anonymous character string as address and end-point identifiers for anything but applications like network scanners that are inherently protocol-sensitive.

    5. Re:It's not sockets, its bind() by Score+Whore · · Score: 1

      The API should never have exposed anything but an anonymous character string as address and end-point identifiers for anything but applications like network scanners that are inherently protocol-sensitive.

      Given that the protocols aren't interchangeable in their feature sets what you state isn't even a desirable situation. But knowing the internals of the addressing scheme isn't necessary for a properly written program.

    6. Re:It's not sockets, its bind() by argent · · Score: 1

      Given that the protocols aren't interchangeable in their feature sets what you state isn't even a desirable situation.

      The point of the UNIX pipe and socket API is that they hide that kind of detail from applications.

      A pipe, an AF_UNIX socket, a local disk file, a file on an NFS share, an OpenNET connection to a named pipe on MSNET or DECNET servers, a serial port, raw disk partition, and all kinds of other objects are all presented to a program using the same abstraction. I have had software running under Eunice on VMS talking over OSI CONS to a Xenix named pipe, and without changing a line of code used the same code to talk to a local serial port, an IPv4 network port, and over a multiplexed file to a load tester. I've written the same code talking to a local raw disk partition, a local floppy disk, a remote raw disk partition over OpenNET, a stream file on a VMS server over DECNET, and so on.

      So there is a huge interchangeable subset in the feature sets of any network that can maintain the file and socket abstraction.

      But knowing the internals of the addressing scheme isn't necessary for a properly written program.

      Most properly written programs were written using gethostbyname(), not even gethostbyname2(). And even getaddrinfo() doesn't support AF_UNIX let alone Lan Manager/NetBIOS, nor does it hide the semantics of the endpoint namespace (integer in IPv4 and IPv6, but not specified in networks using file system namespaces like AF_UNIX or OpenNET, and a character string in networks that use the named pipe abstraction).

    7. Re:It's not sockets, its bind() by argent · · Score: 1

      Yes, except for being twenty years too late.

  29. Re:Old school socket set.... by Jesus_666 · · Score: 1

    I thought this story was about wireless energy. You know, wall sockets.

    --
    USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
  30. Re:Old school socket set.... by Anonymous Coward · · Score: 0

    what the hell is wrong with you ?

  31. Unix always had it by mangu · · Score: 4, Informative

    some way to seamlessly connect LOCAL processes to each other

    You mean, like pipes?

    1. Re:Unix always had it by SQLGuru · · Score: 5, Funny

      You mean, like pipes?

      Pipes for local communication and tubes for global communication. Seems like a winner.

    2. Re:Unix always had it by Junks+Jerzey · · Score: 1

      Pipes are good, but they were designed for a specific paradigm, not the kind of thing you'd use sockets for. Bidirectional pipe communication is clunky, to say the least.

    3. Re:Unix always had it by mR.bRiGhTsId3 · · Score: 1

      Thus, the answer is clearly shared memory or message passing handled by the kernel or a low level daemon. I'm pretty sure every OS supports this model in some fashion.

    4. Re:Unix always had it by Anonymous Coward · · Score: 0

      It would have been nice if instead of inventing sockets, with their "interesting" states (like half open), they had done something more like the pipe(2) system call, which returns 2 file descriptors (in an array). One fd would be open in the read direction and one in the write direction. When you finish sending you just close the write file descriptor.
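
      A small Python sketch of the contrast being drawn here, assuming a POSIX-like system. The two function names are made up for the example. With pipe(2) there is one descriptor per direction, so closing the write end is how you say "done sending". A socket has a single descriptor for both directions, so the equivalent is shutdown(SHUT_WR), which leaves the connection in the half-open state mentioned above.

```python
import os
import socket


def pipe_style():
    """pipe(2): two descriptors, closing the write end signals EOF."""
    r, w = os.pipe()
    os.write(w, b"request")
    os.close(w)                    # finished sending
    data = os.read(r, 100)
    eof = os.read(r, 100)          # b"" once the write end is closed
    os.close(r)
    return data, eof


def socket_style():
    """Socket: one descriptor, so half-close with shutdown() instead."""
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"request")
        a.shutdown(socket.SHUT_WR)  # finished sending, can still receive
        got = b.recv(100)
        eof = b.recv(100)           # b"" after the peer's half-close
        b.sendall(b"reply")         # the other direction still works
        reply = a.recv(100)
    return got, eof, reply


if __name__ == "__main__":
    print(pipe_style())
    print(socket_style())
```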

      As others have said, passing in the connection details as a string like /dev/tcp/hostname/service_name would have made the world much more flexible and likely to move to things like ipv6.

  32. Low latency? by sphealey · · Score: 1

    "...high bandwidth, low latency..."? Low latency? Is the author working on some alternative universe Internet with low latency, rather than the high, increasing, and highly variable latency of the Internet here in this universe/on this planet? Or perhaps he has a telco that isn't continuously raising the price of T1s and T3s to force him onto high-latency IP connectivity "solutions"?

    sPh

    1. Re:Low latency? by Svartalf · · Score: 1

      I think it's less the Internet that the author's talking about and more things like clustering, etc., which are LAN-centric, not WAN-centric.

      For those sorts of configurations and applications, high-bandwidth and low-latency is crucial. To be able to analyze the chaotic traffic on the backbone of the WAN, you need the same sort of ability, actually.

      --
      I am not merely a "consumer" or a "taxpayer". I am a Citizen of the State of Texas
  33. If you like structured IO by Anonymous Coward · · Score: 0

    Go and use z/OS and its MVS filesystem.

    AAAAAARRRRGGGHHHHH!!!!!

  34. Plan 9 by Anonymous Coward · · Score: 2, Informative

    Already does that.

    1. Re:Plan 9 by argent · · Score: 1

      I'm still bitter about Plan 9. Damn, that bitch had one hot set of APIs.

      I'm bitter about Alpha, too. And Amiga. And SGML. And Palm OS. And Power PC.

      *sigh*

  35. Re:haha by ThePhilips · · Score: 3, Insightful

    ... TLI is designed from an OSI model-oriented viewpoint ...

    That explains why - fortunately - it wasn't widely adopted.

    --
    All hope abandon ye who enter here.
  36. Re:Couldn't this be like a flag, rather than new A by Anonymous Coward · · Score: 0

    but flexibility is definitely it's strong point.
    its*
    sorry!

  37. Real problems, or.... by countach · · Score: 2, Interesting

    It seems to me that all the issues the author mentions could be solved with some library written over the top of sockets (and potentially other primitives like threads). Sockets are meant to be a low level interface, not to solve every problem.

    The multi-home problem is real, but could be fixed with a relatively minor extension to the API, like IPV6 has been added in.

    1. Re:Real problems, or.... by MROD · · Score: 1

      Actually, multi-homing shouldn't be an application problem at all, it's a system level one.

      i.e. It should be moved to the network stack. The application could talk to a virtual interface which the network stack then passes through the most appropriate interface.

      Fail-over is also a system-level problem, not an application-level one.

      Just read up on clustering. (Not HPC clustering such as Beowulf, high-availability clustering.)

      --

      Agrajag: "Oh no, not again!"
  38. Plan 9? by Hucko · · Score: 1

    How does Plan 9 do this? From memory it wasn't precisely sockets... but more interesting. gah... I'll go research

    --
    Semi-automatic amateur armchair Australian philosopher; conjecture ready at any moment...
    1. Re:Plan 9? by Anonymous Coward · · Score: 0

      dial(2) is used to establish a connection;
      normal read(2) and write(2) suffice after that.

  39. Sprockets will never die. by eXFeLoN · · Score: 0

    Seriously you must touch my monkey.

    --
    My other sig is a knife wound.
  40. Was it the Enquirer? by alien_life_form · · Score: 2, Funny

    Having RTFA, I have to ask: "What in Cthulhu's name have APIs got to do with all this?".

    The author broadly complains of the current status of networking at the OS level (copying bytes, connecting to/from multihomed hosts, etc.). APIs don't get into it.

    The title of the article appears to be an attention grabbing device, it could well have been titled "Does Britney Spears carry my baby?".

    (The incipit would be "No. Now, in a world of low latency and high bandwidth...")

    Cheers,
    alf

  41. Comment removed by account_deleted · · Score: 3, Interesting

    Comment removed based on user account deletion

  42. Ask the Winsock PM! by WinsockPM · · Score: 1

    So ... let's assume that I'm the Microsoft PM for Winsock. And let's assume that Microsoft has recently noticed that Winsock is just about the most popular topic in the Windows programming section of MSDN. And let's assume that Microsoft wants people to write network programs on Windows.

    (It helps that all of the above are true. I've been the Winsock PM now for just a week).

    What's good about Winsock?

    What's bad?

    Why are you using Winsock and not, for example, Windows Communication Foundation, or an HTTP protocol?

    What kind of program are you writing?

    Is there anything else you'd like to tell the Microsoft Winsock PM?

    Fun Winsock fact: the most popular comment in the MSDN "comment on this topic" is "sdslk". Followed by, "get rid of this window"

  43. Re:haha by bsdaemonaut · · Score: 1

    "Otherwise, TLI looks similar, API-wise, to sockets."

    So if the Sockets API is limiting.. how exactly does it help to take on an alternative, but similar API?

  44. Re:Couldn't this be like a flag, rather than new A by mini+me · · Score: 1

    If we ever do get routing by carrier pigeon

    We did get routing by carrier pigeon. And yes, sockets did handle it just fine.

    http://www.blug.linux.no/rfc1149/

  45. Socket critique by Animats · · Score: 1

    I've never been impressed with the mania for "zero copy" systems. On modern CPUs, copying of data handled very recently is cheap, because it's already in the faster caches. On the other hand, mucking about with the MMU to move pages from one address space to another tends to be expensive, especially if cache flushing is required. Mach made that design mistake.

    I'm old enough to remember when the "sockets" API was developed. We'd been using a very early 3COM TCP/IP package, "UNET", which predated BSD networking. It simply used "open", "read", and "write", rather than special "socket" calls. Adding extra calls was very Berkeley; they were writing alongside the UNIX kernel, not fully integrating their own stuff. There was no reason not to have "read" and "write" work on sockets, and in some operating systems, they do.

    Bear in mind that BSD didn't have threads. Hence the need for the polled "select" model.

    If you want to see interprocess communication done right, look at QNX. Their "MsgSend", "MsgReceive", and "MsgReply" model allows one program to call another. If you want networking to call the application, that's the way to do it. It's a proven model, and it's fast enough that I've pumped uncompressed video through it with message passing using about 3% of a Pentium III class CPU.

    By the way, bear in mind that ACM Queue is just a sort of blog. That's not a refereed paper.

  46. What a surprise! by ClosedSource · · Score: 1

    That developers who embraced an OS that is designed with the idea that "everything is a file" prefer an approach of treating a network resource as a file.

  47. My favorite alternative... by Anonymous Coward · · Score: 0

    My favorite alternative is iWarp. It uses a lot of current infrastructure, it was made for acceleration, it's message based (makes coding much easier), has easier-to-use mechanisms as well as lightning fast mechanisms, and is asynchronous (although, it provides functions for syncing).

  48. It's been done, often by jc42 · · Score: 1

    I've worked on any number of projects in which we created a new network API, usually with UI tools to match. Of course, our package always relied on sockets as the lower-level "internal" basis.

    It's called "layering". Some network programmers have learned that it's a useful approach.

    So far, I've never seen a different networking UI that's easier to program (or debug) than sockets. I keep reading articles like this on the topic, but I'm still looking for one that's better for the job. It's possible that the Berkeley people found the best low-level approach.

    However, one thing that they missed that has been sorely needed on a number of projects is a "timeout" parameter to the connect() call. Time and again I've seen cases where an app hangs inside a call to connect() and never returns. Typically setting an alarm won't interrupt the call, and sometimes even a "kill -9" won't kill the process. Sometimes only a reboot will get rid of the zombie process. If the OS no longer gives your process any cpu time, it doesn't matter what clever code you have in it to diagnose problems. This seems to happen under unknown conditions on all OSs (though I haven't actually tested it on all releases of all OSs, so I could be wrong).

    But this is an implementation detail of what's basically a fairly sound design.

    --
    Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    1. Re:It's been done, often by isj · · Score: 1

      ...a "timeout" parameter to the connect() call

      There is a common way to do this. I am not sure if it counts as a solution or a kludge:
      - set socket into non-blocking mode
      - call connect()
      - check for immediate success (happens on solaris over loopback interface)
      - if not:
      - select/poll for writability, using a timeout
      - When writable, retrieve connection status with getsockopt()

      So yes, a timeout parameter to the connect() call would make life easier for client applications. But there is a workaround.
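
      The steps above can be sketched roughly as follows, in Python for brevity. The function name connect_with_timeout is made up for this example, and the sketch assumes POSIX behaviour (EINPROGRESS from a non-blocking connect, writability signalling completion) and a numeric IPv4 address, since a host name lookup would block before the timeout even starts.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Hypothetical helper: TCP connect that gives up after `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)               # set non-blocking mode
        rc = sock.connect_ex((host, port))    # start the connect
        if rc == 0:                           # immediate success
            sock.setblocking(True)
            return sock
        if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(rc, os.strerror(rc))
        # Wait for writability, but no longer than the timeout.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connect timed out")
        # Writable only means "finished"; SO_ERROR says how it finished.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))
        sock.setblocking(True)
        return sock
    except BaseException:
        sock.close()
        raise
```

      Python's own socket.create_connection() takes a timeout argument and does much the same thing internally, so this is only meant to show the moving parts.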

      sometimes even a "kill -9" won't kill the process
      That sounds odd. Which OS was that?

    2. Re:It's been done, often by jc42 · · Score: 1

      - set socket into non-blocking mode
      - call connect()
      - check for immediate success (happens on solaris over loopback interface)

      Well, I've tried that; it only works if the connect() returns in the calling process. With the bug I wrote about, the problem is that this doesn't happen, even with a nonblocking socket. Also, an ALARM is supposed to interrupt a system call and force a failure return, but this doesn't happen, either.

      sometimes even a "kill -9" won't kill the process
      That sounds odd. Which OS was that?

      I've documented it on FreeBSD, NetBSD, Solaris and several flavors of linux. It's been a couple of years since I've worked on a project where I could test this, so I can't tell you the releases.

      It may not be a single bug, as similar "hangs, unkillable" problems have been reported in all sorts of systems since the dawn of "time sharing" computer systems back in the 1960s. They're usually good classroom studies for subtle kernel bugs, since diagnosing the problem requires a more-or-less complete path analysis of the code to find a "can't get there from here" scenario, where "there" is the code that returns to the process and "here" is the state in some system dumps.

      A big part of the problem is that, in my experience, when the problem happens, it's typically after several million connect() calls by the same process. Imagine a search bot, for example, which might have to read millions or billions of URLs, typically with a connect() per URL (though this can be optimized in the obvious way).

      Most people treat "one in a million" as an idiomatic way of saying "extremely rare". But there are 86400 seconds in a day, so if you call connect() as few as 12 times per second, you hit a million in a bit under a day. (23 hours, 8 minutes, 53 seconds, actually. ;-)
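
      For anyone who wants to check that figure, here is the arithmetic in a few lines of Python. The variable names are just for illustration, and the result is rounded down to whole seconds.

```python
calls_per_second = 12
target_calls = 1_000_000

# Whole seconds needed to reach a million calls at 12 calls per second.
total_seconds = target_calls // calls_per_second

hours, remainder = divmod(total_seconds, 3600)
minutes, seconds = divmod(remainder, 60)

print(hours, minutes, seconds)  # 23 8 53
```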

      But in my experience, the bug isn't usually reproducible. During testing, I have had the process log all its connect() calls, so I could tell who it was trying to connect to. Usually restarting it for that address will result in success, which doesn't tell you much. I've managed to reproduce the problem maybe a total of a couple of dozen times, and the only thing I find in common is that the remote system has reported that it was running some versions of Windows and IIS. But the sample size is far too small to bash Microsoft for this (other than the fun of doing so ;-). And most Windows+IIS systems connect just fine. I've seen the problem when the remote system was various unixoid OSs, but that has never been reproducible. And in any case, the culprit could easily be some intermediary bridge or router doing something "intelligent" with the connection attempt.

      OTOH, the problem long predates any reports of ISPs like Comcast intentionally screwing with connections in order to cause problems for one end or both, which only go back a few years (to my knowledge). Before that, it almost had to be some incompatibility between the two ends of the connection.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    3. Re:It's been done, often by isj · · Score: 1

      Thanks for that description. Quite interesting.

      You mention flavors of BSD and signal handling. Perhaps http://lists.apple.com/archives/darwin-kernel/2005/Aug/msg00133.html may be helpful next time you encounter this. If OSX borrowed the signal handling code from the BSD relatives, it could be this. In that case it is probably a different bug on linux.

      In any case an unkillable process is a kernel bug (as long as NFS and cluster software are not involved).

  49. "the resistance army" - coincidence? by coolsnowmen · · Score: 1

    I fear it might be more than a coincidence that a guy selling T-shirts with "the resistance army" on it got arrested for doing something seemingly trivial.

  50. Sorry by coolsnowmen · · Score: 1

    Wow, wrong article, whoops. Sorry

  51. Another interesting read by shutdown+-p+now · · Score: 1

    Another interesting read on sockets performance, for the most part specifically in Linux context, is this. It seems to be rather detailed, covering a lot of ground. However, it was last updated in 2006, so I wonder what has changed since then...

  52. Stateless yes TCP no by sgt+scrub · · Score: 1

    TFA goes as far as saying networking sockets are a proven technology: "is quite impressive for an API to have remained in use and largely unchanged for 27 years." Then it wanders off with nothing more than an example of how something works better in a small area of networking. It is as if the writer is suggesting we should switch everything over to stateless protocols and blast servers with simultaneous connection requests, or send back 8-16 packets at the same time to a client. That would be insane, and I hope it is not what he was intending. Not that the bandwidth could/should be there, but think about what can be done by a bad guy under the cover of 8-16 packets being sent at one time from 8-16 different ports.

    As far as other protocols go (TCP), some of us prefer all of the checks involved in using sockets. FTFA: "The typical processing loop of a sockets-based program isn't simply read(), process(), read(), but instead select(), read(), process(), select()." I would rather have a lock than a buffer overflow. The faster the machine, the less important that lock gets. The importance of security doesn't shrink.

    Applications that are currently using UDP (with the exception of applications that transfer very little information, such as DNS/NTP) would be much better off using SCTP. Multihomed transfers of data would be useful to simplify bandwidth-sharing designs, i.e. one home for each ISP without separate load-balancing software is possible with applications designed around SCTP. So SCTP, and other stateless transaction protocols, do work better without having to have a single socket open to send packets through when multiple packets can be sent to multiple destinations simultaneously. Stateful applications, however, are best left to reliable old sockets.

    --
    Having to work for a living is the root of all evil.
  53. Re:haha by mzs · · Score: 1

    XTI gave you more control at the application level. So for example you could do TCP/IP, IPX, and, with very similar code, NetBIOS. With sockets you need kernel and/or libsocket support for that kind of thing, like how you have SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW. The other thing is it allowed you to tweak things more easily since the kernel was not so involved. Say you did not want Nagle with your TCP or you wanted to make a protocol something like SCTP to T/TCP.

  54. Horrible for multiple connections by harlows_monkeys · · Score: 3, Interesting

    Sockets are very annoying when you have a lot of clients being served by one server. Consider, for instance, a chat server, with 25000 clients connected. You have 25000 sockets, one per client (plus a listen socket for new clients to connect to).

    Whenever data arrives, the system has to somehow notify you that one of your sockets is ready to read. That generally involves some kind of polling, with select or poll, or some kind of interrupt mechanism, such as a signal. I'm leaving out some options, but regardless of how you get notified, you then read the data from the appropriate socket.

    Then guess what happens? Most likely you take that data, wrap it in a data structure that tells you which client it was for, and stick it on a work queue, where the main thread or threads pull things to process.

    Step back and look at what happened here:

    1. The data from all 25000 clients comes in on a single interface.
    2. The kernel goes to great effort to process this stream of data and split it up into separate streams for the 25000 clients.
    3. You have to deal with that data coming into your server application via 25000 different sockets.
    4. You put it all back into a single stream (your work queue) as a bunch of messages.
    5. You pull the items from the work queue to process.

    That's just insane! The kernel demultiplexed the incoming data, and the server just remultiplexed it when it put it onto the work queue. Demultiplexing belongs in the server application, not the kernel.
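
    To make those five steps concrete, here is roughly what the pattern looks like in Python with the standard selectors module. This is only a sketch: serve_once and the (client address, data) tuple format are made-up names for illustration, not anything from a real server.

```python
import queue
import selectors
import socket


def serve_once(sel, listener, work_queue, timeout=1.0):
    """One pass of the select()/read()/enqueue loop (hypothetical helper)."""
    for key, _events in sel.select(timeout):
        sock = key.fileobj
        if sock is listener:
            # New client: one more socket for the kernel to demultiplex into.
            conn, addr = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ, data=addr)
            continue
        chunk = sock.recv(4096)
        if chunk:
            # Remultiplex: tag the data with its client and put it on the
            # single work queue that the processing threads read from.
            work_queue.put((key.data, chunk))
        else:
            # Empty read means the client closed its end.
            sel.unregister(sock)
            sock.close()
```

    The caller registers the listening socket with the selector and calls serve_once() in a loop, while worker threads pull (client, data) items off the queue.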

    What I want is a single stream between my code and the kernel that delivers all the data for all 25000 clients. Whenever any client has data, I want to be able to read from that, and get back a message, that identifies which client it is from, and gives me that data.

    The kernel should just be parsing the incoming TCP stream enough to recognize what port a given packet is for, and what client it came from, and then should queue it up into a single stream for the server handling that port. (The kernel has enough information from that to keep track, on a per client basis, of how much data is pending in the queue for the server app, so has what it needs to manage flow control).

    1. Re:Horrible for multiple connections by Chirs · · Score: 1

      "Consider, for instance, a chat server, with 25000 clients connected. You have 25000 sockets, one per client (plus a listen socket for new clients to connect to)."

      Why are you using TCP for a chat server? You could just use UDP with a single server socket, and all your objections go out the window. Sure, you lose the reliability that TCP provides, but this is a chat server--is it really a big deal if you lose the odd message?
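
      A minimal Python sketch of what I mean, with made-up helper names (open_chat_socket, next_message). One datagram socket serves every client, and each read hands back the sender's address along with the data. Nothing here deals with loss, ordering or duplicates.

```python
import socket


def open_chat_socket(host="127.0.0.1", port=0):
    """Open the single UDP socket that all clients send to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def next_message(sock, bufsize=2048):
    """Return (client_address, data) for the next datagram from any client."""
    data, client_addr = sock.recvfrom(bufsize)
    return client_addr, data
```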

    2. Re:Horrible for multiple connections by harlows_monkeys · · Score: 1

      Why are you using TCP for a chat server? You could just use UDP with a single server socket, and all your objections go out the window. Sure, you lose the reliability that TCP provides, but this is a chat server--is it really a big deal if you lose the odd message?

      This particular chat server was part of a gaming service that the people I was working for were developing. You had chat rooms where people could meet, and challenge each other to games. Say you challenged me to a game of chess and I accepted. The server would make a private channel for us, our client software would launch the chess client on each of our systems, and the chess clients would communicate via hidden messages in our private channel, while we could chat or trash talk with each other in that channel. If this game was for the ladder system, the score keeper application on one of our servers would join the private channel and monitor and record the moves, and update the ladder based on the result.

      It's kind of bad to just lose a move in chess, so yes, it would be a big deal to lose a message.

      Hence, with the UDP approach, you just end up in your server code (and in the client code you give your users!) having to implement a retry mechanism. You'll probably need flow control, too. Bottom line is you will end up essentially implementing TCP on top of UDP.
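
      To give an idea of where that road starts, here is a bare-bones stop-and-wait retry in Python. Everything in it is hypothetical (the 4-byte sequence header, the function names, the timeout and retry counts), and it has no flow control, ordering or connection state, which is exactly the part you would still have to write.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte sequence number, network byte order


def send_reliably(sock, dest, seq, payload, timeout=0.2, retries=5):
    """Send one datagram and resend it until the matching ack arrives."""
    packet = HEADER.pack(seq) + payload
    sock.settimeout(timeout)
    for _ in range(retries):
        sock.sendto(packet, dest)
        try:
            reply, sender = sock.recvfrom(64)
        except socket.timeout:
            continue  # no ack in time: assume loss and resend
        if sender == dest and reply == HEADER.pack(seq):
            return True
    return False


def receive_and_ack(sock, seen):
    """Receive one datagram, ack it, and return the payload if it is new."""
    packet, sender = sock.recvfrom(2048)
    if len(packet) < HEADER.size:
        return None  # too short to carry a sequence number
    (seq,) = HEADER.unpack(packet[:HEADER.size])
    sock.sendto(packet[:HEADER.size], sender)  # ack duplicates as well
    if (sender, seq) in seen:
        return None  # retransmission of something already delivered
    seen.add((sender, seq))
    return packet[HEADER.size:]
```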

      Seems kind of a waste when the Linux kernel on the server and the Windows kernel on the client already have perfectly fine TCP implementations.

      I did consider it. I looked around to see if I could find an open source, application level, "TCP over UDP" hack, but couldn't find any (this was several years ago). A very cursory search just now found one that isn't production level yet.

  55. Re:haha by bsdaemonaut · · Score: 1

    I know very little about it, as there doesn't seem to be a ton of info available. It does seem like XTI still ships with Solaris and is at least available through third-party opensource libraries for linux.

    It seems to me like most of what you're saying sockets are missing, they have gained through higher-level abstractions. Since XTI seems to be a higher-level API in and of itself, I'm not sure if I fully grasp what real advantages it would provide.

    In any case I have to imagine that the article is in actuality looking for something new. Perhaps something like XTI in that it is abstract and outside the umbrella of the kernel, but more efficient than what XTI was coming to be.

  56. is too by Anonymous Coward · · Score: 0

    *I* think it's funny. Relax, ya' old stick-in-the-mud, don't be so hard nosed!