Have Sockets Run Their Course?
ChelleChelle writes "This article examines the limitations of the sockets API. The Internet and the networking world in general have changed in very significant ways since the sockets API was first developed in 1982, but the API has had the effect of narrowing the ways in which developers think about and write networked applications. This article discusses the history as well as the future of the sockets API, focusing on how 'high bandwidth, low latency, and multihoming are driving the development of new alternatives.'"
I think sockets work fi.... *connection lost, host not routable*
is no sockets. some way to seamlessly connect LOCAL processes to each other without socket overhead by using the familiar socket interface. something simpler than shared memory.
and a better protocol method of opening sockets with the hard stuff taken care of by the OS. and with transparent buffer protection etc.
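For what it's worth, part of this already exists: socketpair() hands back two connected local endpoints with no bind(), listen(), or accept() involved. A minimal sketch in Python follows; the function name local_echo is made up for illustration, and it assumes the message is small enough to fit in the kernel buffer before the reader drains it.

```python
import socket


def local_echo(message: bytes) -> bytes:
    """Push a message through a connected local socket pair and return what arrives."""
    # socketpair() returns two already-connected endpoints; no addresses or ports needed.
    left, right = socket.socketpair()
    try:
        left.sendall(message)
        left.shutdown(socket.SHUT_WR)  # tell the reader no more data is coming
        chunks = []
        while True:
            chunk = right.recv(4096)
            if not chunk:  # empty read means the writer shut down
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        left.close()
        right.close()
```

The two ends behave like any other socket, so the same send/recv code works whether the peer is local or remote.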
Been there, done that. Apple (once again) had a great implementation of an alternative technology, that it finally abandoned when it didn't feel like fighting any more.
Open Transport (the PPC stack used in the Classic Mac OS) was fast, efficient, and cool. And based on the STREAMS methodology, the only real competition to Berkeley Sockets.
Choice is good, mmmkay?
This seems to dance a bit too close to Networking Truths 6a, 11, and possibly 12. I will reserve judgment until I see solid real-world evidence.
This is not an automatic signature.
There has been an alternative all the time:
http://en.wikipedia.org/wiki/Transport_Layer_Interface
This guy's worried about "narrowing the ways in which developers think about and write networked applications" in a world where people are reinventing wall(1) as twitter, IRC as friendfeed, and other web 2.0 'innovations.' You want to widen developers' thinking about networking? Leave sockets alone and close off port 80.
REM Old programmers don't die. They just GOSUB without RETURN.
There are Berkeley sockets which are relatively portable, and then there are extremely platform-specific APIs for high performance and scalability. The old API might have run its course, but most of the new ones are still relevant. Things like asio are helping to merge all the differences into one nice API.
Although the addition of a single system call to a loop would not seem to add much of a burden, this is not the case
Really? For a lot of networking code that's in use these days, I don't see that the system call overhead is the bottleneck. On clients you usually have network bandwidth as the limiting step (rather than system calls). On servers, it usually seems to be disk access or HLL interpreters.
Each system call requires arguments to be marshaled and copied into the kernel, as well as causing the system to block the calling process and schedule another.
That's easy to fix without changing the socket API: just add a system call that can return multiple packets from multiple streams simultaneously, a cross between select and readv. If there's a lot of data buffered in the kernel, it can then return that with a single system call.
Solving this problem requires inverting the communication model between an application and the operating system.
Not only does it not require that, inversion of control doesn't even solve it, since you still have the context switches.
BSD sockets have a limitation of only a single stream at a time (for example, if you are loading a website over HTTP and you get stuck loading a huge image, you have no choice but to open up another socket connection or else wait). They are also stuck around the paradigm of only supporting byte streams, which means that users are always forced to write the same code over and over to create packet headers or delimited messages.
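To make the "same code over and over" point concrete, here is a minimal sketch of the usual length-prefix framing layered on a byte stream. The 4-byte big-endian header and the names send_msg, recv_exact, and recv_msg are arbitrary choices for illustration, not part of any standard API.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (an arbitrary choice)


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message as <length><payload>."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return fewer per recv() call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Receive one message framed by send_msg()."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

Two send_msg() calls on one end come back as two separate recv_msg() results on the other, regardless of how the stream was split in transit.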
I would highly recommend checking out Structured Stream Transport. I'm not from MIT and I wasn't entirely satisfied with their sample implementation, but the paper is really insightful and explains how you can develop basically a smarter version of TCP that is both more efficient and also more flexible. And I'm sure there are other systems being developed with similar ideas in mind.
We definitely need to keep bsd sockets, if not just because I'm a regular user of netcat :-p, and also because they are what allow the creation of more advanced protocols, but I don't think most applications should still be using such low-level protocols today.
The recently developed SCTP (Stream Control Transport Protocol) incorporates support for multihoming at the protocol level, but it is impossible to export this support through the sockets API
The word that bugs me there, is "impossible". The question is, why? If you have to do something with sockets under the hood, then so be it, but it would seem to me that you could just add a few more fields to socket address to take into account multiple homes.
We've already had alternative APIs to sockets, and for quite some time. Sockets won. There were named pipes, IPX/SPX, and the seemingly stupid idea of treating a network resource as a file has trumped them every time.
This is my sig.
I am developing SCTP applications and have contributed to the Linux implementation, and I think that one of the advantages of the socket API is that it is usable with select() and poll(), i.e. sockets are file descriptors you can pass around.
But for SCTP there are things that don't fit nicely into the socket API, especially when using one-to-many socket types. For instance for retrieving options for an association you have to piggyback data in a getsockopt() call by using the output buffer also for input. It works, but it is not nice. Also, for sending/receiving messages you have to use sendmsg/recvmsg with all the features including control data, and the ugly control data parsing.
And hey one size fits all.
In my experience the way the socket API can slow down a processor is having to monitor many thousands of socket descriptors using select() or poll(), like in a web server. For Linux epoll() was created for this scenario.
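A rough sketch of the register-once pattern that epoll enables, using Python's selectors module (which picks epoll on Linux and kqueue on BSD/macOS). The helper names watch and ready_labels are made up for illustration.

```python
import selectors
import socket

# DefaultSelector picks the best mechanism available: epoll on Linux,
# kqueue on BSD/macOS, falling back to poll or select elsewhere.
sel = selectors.DefaultSelector()


def watch(sock: socket.socket, label: str) -> None:
    """Register a socket once; the kernel remembers it between waits."""
    sel.register(sock, selectors.EVENT_READ, data=label)


def ready_labels(timeout: float = 1.0) -> list:
    """Return the labels of all watched sockets that currently have data to read."""
    return sorted(key.data for key, _events in sel.select(timeout))
```

The difference from select() is that the interest set is not rebuilt and rescanned on every call; each wait returns only the descriptors that are actually ready.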
Macs used STREAMS from system 7.5.2 onwards. Was kind of sad to see that go away with the switch to OS X.
I've been complaining about them for years. Maybe a generation ago, they were useful, but I think most people wear them out of tradition. The first thing I do when I get home is take off my shoes and socks. Also, I hate pants. ... oh! "Sockets" ... they're fine or whatever.
IOCP is perfect for both high bandwidth and low latency. ...you have to use the "ugly" windows os for it :)
...that most of the things that this guy is talking about would be better implemented below the sockets API. As in, how the OS handles things. Making things transparent is a good thing.
I'll also point out that having a fail over interface so that the client doesn't lose the connection has already been done in OpenBSD's pf called CARP. It is a free alternative to VRRP and HSRP. In other words, this doesn't have to be implemented in the API when another avenue already exists that does it.
My socks are fine for now. When they do run their course I go to Walmart and get new socks; it's $5 for 6 pairs!
This is hardly news and partly mistaken.
The statement that sockets limit throughput by copying between kernel and application processes is a bit simplistic. The copy of Rx data to an application usually primes the cache. If data isn't touched and loaded into the cache at this point, it will have to be loaded shortly, anyway. Granted, for Tx this trick does not hold.
Second, the interface is not the implementation. Just because sockets are traditionally implemented as system calls does not mean that they have to be. User-level networking is a well-known alternative to OS services for high-bandwidth and low-latency communication (e.g., U-Net, developed around '96). I know, because I myself built a network stack with large shared buffers that implements the socket API through local function calls (blatant plug, but on topic. The implementation is still shoddy, but good enough for UDP benchmarking).
User-level networking can also offer low latency. My implementation doesn't, but U-Net does.
This leaves the third point of the article, on multihoming. As sockets abstract away IP addresses and network interfaces, I don't see why they cannot support multihoming behind the socket interface. Note that IP addresses do not have to be mapped 1:1 onto NICs. Operating systems generally support load-balancing or fail-over behind the interface through virtual interfaces (in IRIX) or some other means (Netfilter in Linux).
No need to replace sockets just yet.
Have Rockets Run Their Course?
The word that bugs me there, is "impossible". The question is, why? If you have to do something with sockets under the hood, then so be it, but it would seem to me that you could just add a few more fields to socket address to take into account multiple homes.
Especially since SCTP actually does use the sockets API. You have to use recvmsg() instead of recv() if you want to do multi-homing, but in using SCTP I was actually impressed by how flexible the BSD socket API actually is. I can't say I particularly like it, and everyone who uses it ends up writing a wrapper around most of the send and recv calls, but flexibility is definitely it's strong point. If we ever do get routing by carrier pigeon, the BSD socket API will be able to adapt to it.
Qxe4
Language evolves, get over it. Often the best name for some abstract concept is a metaphor using some concrete object. Your computer is full of strings and threads too! Call the dictionary police!
Oh, and that language your so possesive of? Yeah, hasn't been around very long (and the King would like to have a work with you about what you've done to His English).
This is a bad article and a bad thread and you all should feel bad for posting it and taking it seriously and, finally, for reading this -- my post.
I couldn't get to the article, but if they think Berkeley sockets are obsolete, I'd like to see what alternative they offer, why they think these alternatives are better, and what the pitfalls of the alternatives are.
Please correct me if I got my facts wrong.
Oh, and that language your so possesive of? Yeah, hasn't been around very long (and the King would like to have a work with you about what you've done to His English).
Case in point.
BSD sockets are also an 'old school set'. This might not be the site for you.
"damn punk kids"
Ro-ro..
Let's get outta here Scooby!
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
The socket API... or rather the UNIX file descriptor API... has been extended many times. Sockets are already one such extension, and there's no reason you couldn't do something like mmap() a socket to map the buffers into user space directly. Heck, udp sockets already diverge from the read/write paradigm.
The problem with sockets is at a higher level. They're not mapped into the file system name space. You should be able to open a socket by calling open() on something like "/dev/tcp/address-or-name/port-or-name" and completely hide the details of gethostbyname(), bind(), and so on from the application layer. If they'd done that we'd already be using IPv6 for everything because applications wouldn't have to know about the details of addresses because they'd just be arbitrary strings like file names already are.
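A sketch of what such a path-style opener might look like as a user-space wrapper. The function open_net and the exact path layout are hypothetical; only socket.create_connection() is a real library call. It resolves the name with getaddrinfo() and tries each returned address, so the caller never sees whether IPv4 or IPv6 was used.

```python
import socket


def open_net(path: str, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection from a path like /dev/tcp/<host>/<port-or-service>.

    Hypothetical helper for illustration; no OS exposes this function.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[:2] != ["dev", "tcp"]:
        raise ValueError("expected /dev/tcp/<host>/<port>, got %r" % path)
    host, port = parts[2], parts[3]
    # create_connection() hides getaddrinfo(), socket(), and connect();
    # the port may be a number or a service name such as "http".
    return socket.create_connection((host, port), timeout=timeout)
```

Usage would be along the lines of conn = open_net("/dev/tcp/example.org/http"), after which conn is an ordinary connected socket.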
I thought this story was about wireless energy. You know, wall sockets.
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
what the hell is wrong with you ?
You mean, like pipes?
"...high bandwidth, low latency..."? Low latency? Is the author working on some alternative universe Internet with low latency, rather than the high, increasing, and highly variable latency of the Internet here in this universe/on this planet? Or perhaps he has a telco that isn't continuously raising the price of T1s and T3s to force him onto high-latency IP connectivity "solutions"?
sPh
Go and use z/OS and its MVS filesystem.
AAAAAARRRRGGGHHHHH!!!!!
Already does that.
That explains why - fortunately - it wasn't widely adopted.
All hope abandon ye who enter here.
but flexibility is definitely it's strong point.
its*
sorry!
It seems to me that all the issues the author mentions could be solved with some library written over the top of sockets (and potentially other primitives like threads). Sockets are meant to be a low level interface, not to solve every problem.
The multi-home problem is real, but could be fixed with a relatively minor extension to the API, like IPV6 has been added in.
How does Plan 9 do this? From memory it wasn't precisely sockets... but more interesting. gah... I'll go research
Semi-automatic amateur armchair Australian philosopher; conjecture ready at any moment...
Seriously you must touch my monkey.
My other sig is a knife wound.
Having RTFA, I have to ask: "What in Cthulhu's name have APIs got to do with all this?".
The author broadly complains of the current status of networking at the OS level (copying bytes, connecting to/from multihomed hosts, etc.). APIs don't get into it.
The title of the article appears to be an attention-grabbing device; it could well have been titled "Does Britney Spears carry my baby?".
(The incipit would be "No. Now, in a world of low latency and high bandwidth...")
Cheers,
alf
(It helps that all of the above are true. I've been the Winsock PM now for just a week).
What's good about Winsock?
What's bad?
Why are you using Winsock and not, for example, Windows Communication Foundation, or an HTTP protocol?
What kind of program are you writing?
Is there anything else you'd like to tell the Microsoft Winsock PM?
Fun Winsock fact: the most popular comment in the MSDN "comment on this topic" is "sdslk". Followed by, "get rid of this window"
"Otherwise, TLI looks similar, API-wise, to sockets."
So if the Sockets API is limiting.. how exactly does it help to take on an alternative, but similar API?
We did get routing by carrier pigeon. And yes, sockets did handle it just fine.
http://www.blug.linux.no/rfc1149/
I've never been impressed with the mania for "zero copy" systems. On modern CPUs, copying of data handled very recently is cheap, because it's already in the faster caches. On the other hand, mucking about with the MMU to move pages from one address space to another tends to be expensive, especially if cache flushing is required. Mach made that design mistake.
I'm old enough to remember when the "sockets" API was developed. We'd been using a very early 3COM TCP/IP package, "UNET", which predated BSD networking. It simply used "open", "read", and "write", rather than special "socket" calls. Adding extra calls was very Berkeley; they were writing alongside the UNIX kernel, not fully integrating their own stuff. There was no reason not to have "read" and "write" work on sockets, and in some operating systems, they do.
Bear in mind that BSD didn't have threads. Hence the need for the polled "select" model.
If you want to see interprocess communication done right, look at QNX. Their "MsgSend", "MsgReceive", and "MsgReply" model allows one program to call another. If you want networking to call the application, that's the way to do it. It's a proven model, and it's fast enough that I've pumped uncompressed video through it with message passing using about 3% of a Pentium III class CPU.
By the way, bear in mind that ACM Queue is just a sort of blog. That's not a refereed paper.
That developers who embraced an OS that is designed with the idea that "everything is a file" prefer an approach of treating a network resource as a file.
My favorite alternative is iWarp. It uses a lot of current infrastructure, it was made for acceleration, it's message based (makes coding much easier), has easier-to-use mechanisms as well as lightning fast mechanisms, and is asynchronous (although, it provides functions for syncing).
I've worked on any number of projects in which we created a new network API, usually with UI tools to match. Of course, our package always relied on sockets as the lower-level "internal" basis.
It's called "layering". Some network programmers have learned that it's a useful approach.
So far, I've never seen a different networking UI that's easier to program (or debug) than sockets. I keep reading articles like this on the topic, but I'm still looking for one that's better for the job. It's possible that the Berkeley people found the best low-level approach.
However, one thing that they missed that has been sorely needed on a number of projects is a "timeout" parameter to the connect() call. Time and again I've seen cases where an app hangs inside a call to connect() and never returns. Typically setting an alarm won't interrupt the call, and sometimes even a "kill -9" won't kill the process. Sometimes only a reboot will get rid of the zombie process. If the OS no longer gives your process any cpu time, it doesn't matter what clever code you have in it to diagnose problems. This seems to happen under unknown conditions on all OSs (though I haven't actually tested it on all releases of all OSs, so I could be wrong).
But this is an implementation detail of what's basically a fairly sound design.
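The usual workaround for the missing timeout is a non-blocking connect() followed by a wait for writability. Below is a minimal sketch, assuming POSIX behaviour (EINPROGRESS from connect, result read back through SO_ERROR) and IPv4; the name connect_with_timeout is made up. Note that if host is a name rather than an address, the lookup itself still blocks. Python's own socket.create_connection(addr, timeout=...) wraps the same idea in one call.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host: str, port: int, timeout: float) -> socket.socket:
    """Connect over TCP/IPv4, giving up after `timeout` seconds (POSIX semantics assumed)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # A non-blocking connect returns immediately, normally with EINPROGRESS.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(err, os.strerror(err))
        # The socket becomes writable once the connect attempt has finished.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connect to %s:%s timed out" % (host, port))
        # SO_ERROR holds the final result of the connect attempt (0 on success).
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, os.strerror(err))
    except BaseException:
        sock.close()
        raise
    sock.setblocking(True)
    return sock
```

Because the process sleeps in select() rather than inside connect(), it stays in control of how long it is willing to wait.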
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
I fear it might be more than a coincidence that a guy selling T-shirts with "the resistance army" on it got arrested for doing something seemingly trivial.
Wow, wrong article, whoops. Sorry
Another interesting read on sockets performance, for the most part specifically in a Linux context, is this. It seems to be rather detailed, covering a lot of ground. However, it was last updated in 2006, so I wonder what has changed since then...
TFA goes as far as saying networking sockets are a proven technology: it "is quite impressive for an API to have remained in use and largely unchanged for 27 years." Then it wanders off with nothing more than an example of how something works better in a small area of networking. It is as if the writer is suggesting we should switch everything over to stateless protocols and blast servers with simultaneous connection requests, or send back 8-16 packets at the same time to a client. That would be insane, and I hope not what he was intending. Not that the bandwidth could/should be there, but think about what can be done by a bad guy under the cover of 8-16 packets being sent at one time from 8-16 different ports. As far as other protocols go (TCP), some of us prefer all of the checks involved in using sockets. FTFA: "The typical processing loop of a sockets-based program isn't simply read(), process(), read(), but instead select(), read(), process(), select()." I would rather have a lock than a buffer overflow. The faster the machine, the less important that lock gets. The importance of security doesn't shrink. Applications that are currently using UDP (with the exception of applications that transfer very little information, such as DNS/NTP) would be much better off using SCTP. Multihomed transfers of data would be useful to simplify bandwidth-sharing designs, i.e. one home for each ISP without separate load-balancing software is possible with applications designed around SCTP. So SCTP, and other stateless transaction protocols, do work better without having to have a single socket open to send packets through when multiple packets can be sent to multiple destinations simultaneously. Stateful applications, however, are best left to reliable old sockets.
Having to work for a living is the root of all evil.
XTI gave you more control at the application level. So, for example, you could do TCP/IP, IPX, and NetBIOS with very similar code. With sockets you need kernel and/or libsocket support for that kind of thing, like how you have SOCK_STREAM, SOCK_DGRAM, and SOCK_RAW. The other thing is it allowed you to tweak things more easily since the kernel was not so involved. Say you did not want Nagle with your TCP or you wanted to make a protocol something like SCTP to T/TCP.
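For comparison, this is roughly how the Nagle case looks through the sockets API, where the tweak is a per-socket option the kernel has to know about. TCP_NODELAY is a real option; the function name make_low_latency_socket is made up for illustration.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY asks the kernel to send small writes immediately
    # instead of coalescing them while earlier data is unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Anything beyond the options the kernel already exports (a new protocol variant, say) is where the sockets model offers no hook.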
Sockets are very annoying when you have a lot of clients being served by one server. Consider, for instance, a chat server, with 25000 clients connected. You have 25000 sockets, one per client (plus a listen socket for new clients to connect to).
Whenever data arrives, the system has to somehow notify you that one of your sockets is ready to read. That generally involves some kind of polling, with select or poll, or some kind of interrupt mechanism, such as a signal. I'm leaving out some options, but regardless of how you get notified, you then read the data from the appropriate socket.
Then guess what happens? Most likely you take that data, wrap it in a data structure that tells you which client it was for, and stick it on a work queue, where the main thread or threads pull things to process.
Step back and look at what happened here:
That's just insane! The kernel demultiplexed the incoming data, and the server just remultiplexed it when it put it onto the work queue. Demultiplexing belongs in the server application, not the kernel.
What I want is a single stream between my code and the kernel that delivers all the data for all 25000 clients. Whenever any client has data, I want to be able to read from that, and get back a message, that identifies which client it is from, and gives me that data.
The kernel should just be parsing the incoming TCP stream enough to recognize what port a given packet is for, and what client it came from, and then should queue it up into a single stream for the server handling that port. (The kernel has enough information from that to keep track, on a per client basis, of how much data is pending in the queue for the server app, so has what it needs to manage flow control).
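To make the desired interface concrete, here is a user-space approximation: one object you read from that returns (client, data) pairs. It is only a sketch; the class name MergedStream and its methods are made up, and underneath it does exactly the demultiplex-then-remultiplex dance complained about above, since that is all the sockets API allows.

```python
import selectors
import socket


class MergedStream:
    """Present many client sockets as one stream of (client_id, data) messages."""

    def __init__(self) -> None:
        self._sel = selectors.DefaultSelector()

    def add(self, client_id, sock: socket.socket) -> None:
        """Start delivering data from this client's socket."""
        self._sel.register(sock, selectors.EVENT_READ, data=client_id)

    def read(self, timeout=None) -> list:
        """Return (client_id, data) for each client with pending data.

        An empty data value means that client closed its connection.
        """
        messages = []
        for key, _events in self._sel.select(timeout):
            data = key.fileobj.recv(4096)
            if not data:
                self._sel.unregister(key.fileobj)  # stop watching closed clients
            messages.append((key.data, data))
        return messages
```

The server's main loop then shrinks to repeated read() calls, with each result going straight onto the work queue.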
I know very little about it, as there doesn't seem to be a ton of info available. It does seem like XTI still ships with Solaris and is at least available through third-party opensource libraries for linux.
It seems to me like most of what you're saying sockets are missing, they have gained through higher-level abstractions. Since XTI seems to be a higher-level API in and of itself, I'm not sure I fully grasp what real advantages it would provide.
In any case I have to imagine that the article is in actuality looking for something new. Perhaps something like XTI in that it is abstract and outside the umbrella of the kernel, but more efficient than what XTI was coming to be.
*I* think it's funny. Relax, ya' old stick-in-the-mud, don't be so hard nosed!