This again. No sir, you don't know the exact number of bytes up front, you have to wait for your short read/write return value or EAGAIN before you know that number.
I am a Unix programmer, you are a Windows one, so stop spewing bullshit about things you have never seen and never used.
Once a file descriptor is marked by poll() as available for write, the write() system call succeeds with whatever number of bytes the kernel can accept for the I/O operation. If I/O is nonblocking, this is the amount of data buffered for the operation -- the driver will make sure that either it will be written, or there will be a connection reset, or another similar low-level error will be detected, later visible through another poll().
Once write() has succeeded, it's up to the userspace program to decide which I/O it is supposed to perform and in which order. However, if by any chance there is another buffer to write to the same file descriptor (an unrelated chunk of data, or remaining data that was not accepted in the previous write()), the recommended procedure is to keep the data waiting until another poll() call. Such a poll() will put the process to sleep until the process can do something meaningful with I/O -- or won't, if by the time poll() is called the kernel buffer is available again. This is how Unix-like systems optimize I/O and scheduling -- by waking up processes when those processes have I/O available, and letting them remain awake while they are processing that I/O. Getting EAGAIN means that you tried to write() blindly without checking, and wasted CPU time and a trip through the syscall interface when you were supposed to be asleep -- it is acceptable in some edge cases, but severely discouraged in general because it second-guesses the logic of the scheduler. The same applies to other write()-like syscalls and to epoll.
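A minimal sketch of the procedure described above, in Python only because its os and select modules are thin wrappers over the same syscalls. The function name write_all_with_poll and the timeout value are made up for illustration. The EAGAIN branch is purely defensive and is not expected to fire when poll() has just reported the descriptor writable.

```python
import os
import select

def write_all_with_poll(fd, data, timeout_ms=1000):
    """Write data to a nonblocking fd, sleeping in poll() between short writes.

    Returns the list of byte counts accepted by each successful write().
    (Hypothetical helper, not part of any library.)
    """
    os.set_blocking(fd, False)
    poller = select.poll()
    poller.register(fd, select.POLLOUT)
    pending = memoryview(data)            # data the kernel has not accepted yet
    accepted = []
    while pending:
        events = poller.poll(timeout_ms)  # sleep until the kernel has room
        if not events:
            raise TimeoutError("descriptor never became writable")
        for _, revents in events:
            if revents & (select.POLLERR | select.POLLHUP | select.POLLNVAL):
                raise OSError("error condition reported by poll()")
        try:
            n = os.write(fd, pending)     # may accept fewer bytes than offered
        except BlockingIOError:
            continue                      # defensive only: go back to poll()
        accepted.append(n)
        pending = pending[n:]             # keep the remainder for the next poll()
    return accepted
```

With a pipe whose other end is drained slowly, the returned list would normally contain several counts that add up to len(data); each entry is one short write, known immediately from the return value.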
With read() and similar, the situation is even simpler -- if a file descriptor is available for reading, a nonblocking read() always returns data already buffered by the kernel. If there is more data waiting, it will show up in the next poll(). The buffering mechanism is entirely up to the kernel and drivers, so they can be as efficient as their developers can make them -- the only requirement is to support I/O operations in a manner consistent with the kernel interface. Again, read()-like operations and epoll act in the same manner.
No forced concurrency in userspace. No callbacks. No busy-waiting. No "events". No restrictions on what can be waited on at the same time -- a GUI program can wait on devices, network sockets, IPC and GUI events at the same time by issuing a single syscall, and be awakened on any combination of those conditions. If the programmer wants to split those things between processes or threads, he can do it as much as he pleases, but this will be his concurrency, imposed by the design of his program, and not something imposed on the program by the stupid design of the kernel.
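To make the "single syscall, any kind of descriptor" point concrete, here is a rough sketch that waits on a pipe and a Unix socket in one poll() call and reads whatever is already buffered. The helper name wait_and_read is an assumption for illustration; a real program would register devices, network sockets and the GUI connection in the same way.

```python
import os
import select
import socket

def wait_and_read(fds, timeout_ms=1000):
    """Block in one poll() call on all fds and read from whichever are readable.

    Returns {fd: bytes} for every descriptor that had data buffered.
    (Hypothetical helper, not part of any library.)
    """
    poller = select.poll()
    for fd in fds:
        os.set_blocking(fd, False)
        poller.register(fd, select.POLLIN)
    ready = {}
    for fd, revents in poller.poll(timeout_ms):   # the only place we sleep
        if revents & select.POLLIN:
            ready[fd] = os.read(fd, 4096)         # returns what is already buffered
    return ready

if __name__ == "__main__":
    pipe_r, pipe_w = os.pipe()
    sock_a, sock_b = socket.socketpair()
    os.write(pipe_w, b"from pipe")
    sock_b.sendall(b"from socket")
    print(wait_and_read([pipe_r, sock_a.fileno()]))
```

No threads, callbacks or locks are involved: one context sleeps in poll() and handles whichever descriptors woke it up.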
You mention "Windows advocates". I'm not one of those. I use both, these days probably *nix more than Windows if anything.
You are a Windows programmer and a Windows advocate. You think doing write() in a tight loop and waiting for EAGAIN is a valid programming technique -- $deity forbid you actually write something for Unix-like systems, because with this knowledge it would be the buggiest software ever written for Unix-like systems. You don't understand Unix. You don't know anything about Unix. You are a Windows programmer, and I recommend that you stay on Windows.
But I do recognize that NT has asynchronicity "built in", everywhere that kernel mode does I/O you have to consider it, its designers had it in mind from day 1 as the primary I/O method, whereas I can't say the same thing about the Unix syscalls. You might consider AIO a needless check box but a proper in-kernel implementation (none of this "spawn a thread and call a blocking API" nonsense that glibc does) performs decently, and to me, it makes more sense than your preferred method, which, let's be honest, looks l
Do you speak it?
If the amount of data exceeds the amount that can be processed, you will not get EAGAIN; the operation will succeed, with the size of the accepted chunk as the return value. All Unix programmers know that.
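This is easy to check on an empty pipe. The sketch below (the name offer is made up) performs a single nonblocking write of far more data than a default pipe can hold. On Linux the default pipe capacity is 64 KiB, so the call is expected to succeed with a short count, not fail with EAGAIN.

```python
import os

def offer(fd, data):
    """Do one nonblocking write() and return how many bytes the kernel accepted.

    (Hypothetical helper, not part of any library.)
    """
    os.set_blocking(fd, False)
    return os.write(fd, data)

if __name__ == "__main__":
    r, w = os.pipe()
    big = b"x" * (1 << 20)                # 1 MiB, assumed larger than the pipe buffer
    n = offer(w, big)
    print(f"offered {len(big)} bytes, kernel accepted {n}")
```

The remaining len(big) - n bytes stay in userspace until the next poll() reports the descriptor writable again.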
No. The difference is fundamental, and I don't even consider either the POSIX specification or the glibc interface to be relevant, as they implement an unneeded and unwanted interface for no other reason than to provide a checkbox, so Windows advocates can shut up about those features.
IOCP does nothing to alert the process about data being available for reading, but expects the process to do something when I/O has finished successfully from the kernel's point of view. poll() and epoll do the opposite: they combine notifications for all I/O but make absolutely sure that the exact number of bytes written or read is known immediately for nonblocking file descriptors. This is a superior model from userspace's point of view, as buffers can be marked as filled or empty after one nonblocking syscall. IOCP is only efficient when combined with concurrency; poll() and epoll can be used with no concurrency and no busy-waiting, and that mode is actually the optimal way of using them, except for the situation when the work is more efficiently split between cores.
Not sure I see how that can't be done with an I/O completion port?
Because you have to either wait for it using a clumsy interface (and abandon combined handling of all I/O, or of any subset of I/O, in a single thread), or delegate it to a completion callback function (and deal with multiple contexts). Either way, threads and locking out of nowhere.
Is that really the case? Here I was under the mistaken impression that in the NT kernel everything is a FILE_OBJECT. I'll admit I don't know much about the kernel implementation of Winsock. From what I can see on MSDN it appears to be pretty similar to file I/O.
Until it is possible for a single or multiple threads/processes to simultaneously wait for anything happening on any number of such objects of any nature in any combination (what poll() does in Unix), it's not a unified file descriptor.
The way I see it Linux forces a lot of extra work in user mode because it has to handle EAGAIN and remember how much the kernel completed last time..
If your program has to rely on EAGAIN, you are doing it wrong (or you are using it for send()/write() in a program designed specifically for low response latency, where that suits the protocol). If poll()/epoll tells you that an I/O operation is available, this I/O operation will succeed (though if it's a write operation to a device or TCP socket, it may be limited to a smaller buffer than the one you are sending). If you want to complain about "spurious" errors, complain about EINTR, a compromise between a relatively ugly signal mechanism and the requirement to never force threads upon the user.
So I don't really understand why "spilling into userspace" is less of a problem on Unix.
That's because you don't know how to use it on Unix.
My understanding was also that Solaris had a better aio_* implementation.
I don't even know how good or bad it is on Linux because no one came up with a problem that requires it on anything Unix-like. Of all software written since AIO became available on Linux, nginx web server is the only example I can find that actually uses it (if the user chooses to enable AIO), and benefits seem to be dubious at best. In everything else when poll() scalability is insufficient, people just use epoll, and that's it.
And on Unix-like systems the whole thing is completely unnecessary because file descriptors that are waiting for write will cause the same thread/process to wake up when a chunk of data can be written, with those file descriptors marked as available for write (and the same for read if the previous read operation was incomplete from the userspace process' point of view, regardless of what kernel thinks about completeness or incompleteness of its own operations). In other words, userspace only wakes up when it has to read, write or react to an error.
Kernel can optimize its buffering and scheduling in whatever way it pleases while hiding it from the userspace, and userspace can have whatever logic of its own buffering the programmer wanted to implement, without either performance hit of being awakened when there is absolutely nothing to do, or interaction between contexts that requires additional locking/synchronization.
AIO is developed in a way that turns the interface into a "product" -- it's faster to write for it if you are prohibited from implementing complex logic in userspace that can be re-used in multiple pieces of software. Unix nonblocking I/O is not designed as a shiny "product" -- a program written for it requires the programmer to write or re-use some well thought out buffering mechanism that implements programmer's intentions about logic behind the I/O. This does not help when the goal is to write a simple demo example and then scream "See? It's simple! Everything is done for you!" to the prospective users, but it is a superior and much more flexible model that allows Unix programmers to write better software. It's interesting that Linux development over the last decade focused on all important functionality shifting to userspace while the kernel provides optimal interface for such implementation (like ALSA and udev), not the complete packaged functionality itself (like OSS and devfs). This is a continuation of the same engineering tradition, though on a different level.
This is like complaining that O_NONBLOCK or aio_write() accomplishes it in a POSIX-specific manner. So?
No, it's like complaining that select()/poll()/epoll allow aggregation of all I/O within the same context or splitting it into multiple contexts (processes or threads), so no one sane uses AIO on Unix, while Windows has a hideous multi-context monstrosity forced upon users as the only model that does not slow everything to a crawl. All because the illustrious Windows designers did not implement unified file descriptors.
The functionality is there. My statement is that asynchronicity is more fundamental to the system design than it is on Linux.
Things should be asynchronous INSIDE the kernel. Linux does not spill it into userspace unless user really, really wants it there (by using AIO).
aio_* functions were hardly implemented, or implemented with many quirks
I wouldn't know how well they are implemented because I have never seen an application where they are necessary or superior to other methods.
O_NONBLOCK is a less ideal solution than "here are my I/O buffers, tell me when you're done"
Only on Windows, where you can't issue a single poll() on all I/O that your process may perform.
Many types of file descriptors (example: files on disk) will block even if you're using non-blocking or async calls.
If you have to do something while disk I/O is in progress, and it is the same process as the one that requested this I/O, you can delegate disk I/O to another process or thread, thanks to the superior IPC provided by Unix-like systems. As opposed to sockets, disk I/O can only succeed, fail, or return less data than requested, so there is no need to handle any state other than errors and the amount of data transferred. This creates just as much additional context as completion, but notifications can go through regular IPC (say, a pipe), and can be included in the same poll() as the rest of the I/O. Have I mentioned that the whole thing becomes lockless from the process's point of view, and the scheduler automatically optimizes context switching based on pending I/O and notifications because they have exactly the same internal representation, as opposed to imitating a hardware interrupt model with its braindead choice of priorities?
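A rough sketch of that delegation, assuming a worker thread and a one-byte pipe notification (the names disk_worker and read_via_worker are invented here, and a separate process could take the thread's place). The notification pipe is polled like any other descriptor, so the main context needs no locks of its own.

```python
import os
import select
import threading

def disk_worker(path, notify_w, result):
    """Worker: do the blocking disk read, then signal completion through a pipe."""
    try:
        with open(path, "rb") as f:
            result["data"] = f.read()
    except OSError as exc:
        result["error"] = exc
    finally:
        os.write(notify_w, b"\0")         # one byte means "disk I/O finished"
        os.close(notify_w)

def read_via_worker(path):
    """Main context: start the worker, then sleep in poll() like for any other I/O."""
    notify_r, notify_w = os.pipe()
    result = {}
    worker = threading.Thread(target=disk_worker, args=(path, notify_w, result))
    worker.start()
    poller = select.poll()
    poller.register(notify_r, select.POLLIN)   # sockets etc. would be registered here too
    try:
        while True:
            for fd, revents in poller.poll():
                if fd == notify_r and revents & (select.POLLIN | select.POLLHUP):
                    os.read(notify_r, 1)       # consume the notification
                    worker.join()
                    if "error" in result:
                        raise result["error"]
                    return result["data"]
    finally:
        os.close(notify_r)
```

The only states the main loop has to handle are the ones listed above: success, failure, and the amount of data that came back.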
You have absolutely no idea what filesystems and nonblocking I/O are. Windows implements superficially similar functionality, however both the underlying implementation and the available interface are done in a Windows-specific manner. Symlinks have volume boundaries "magically" resurrected for relative links (Unix always uses a single tree, with mount points having no effect on interpretation of file paths and names) and can point to remote files (which basically forces an automounter to be constantly present). Asynchronous I/O does not work in the single-context model used in Unix-like systems, creating concurrency out of thin air.
This does not mean that actual usage of the word is anything other than a weasel-word substitute for "invention".
Oh wow, brave defender of corporations roman_mir to the rescue of Apple!
Sugary coffee is most definitely a solution. And a suspension... To think of it, it's usually an emulsion, too, if it is with cream.
Really, it's a euphemism for "invention", but used to describe things that do not qualify for patents.
What should be said in its place is "improvement", "engineering" and anything done by "knowledge workers".
So what is wrong with "firefox http://facebook.com/" ?
And before any wannabe heroes mod me down you might want to consider that YOUR data could be part of it.
I would prefer if my data on insecure servers was taken by someone who widely announces the problem, rather than by someone else who would do it in secrecy and cause me some serious trouble.
Understanding someone's view is one thing; treating objectively wrong claims about reality as if they were valid is something completely different.
SQL injections? You mean those things I learned from YouTube when I was 12?
No, SQL injection IS WHAT YOU ARE, little Bobby Tables!
And the survival rate of such companies will be, exactly what?
"Over the last 200 years" only a very small percentage of people could afford starting an industrial business.
"ps -eo pid,user,args --sort user". Really? Try explaining that one to Aunt Mildred who just wants to check her pictures on facebook.
If you are checking Facebook pictures by listing processes, you are doing it wrong.
Is it another Brian Proffitt's intellectual abortion? I think I recognize the style...
Oh wow, it is not! But wait, this is the illustrious author of Mono and proud applicant to Microsoft Unix IE team, ex-Novell executive Miguel de Icaza!
(To be fair, some of Miguel's work is not nearly as idiotic, and most of the stupidity in GNOME happened after he left).
Yes, they will instead invite the person who fondles baboons in his kitchen.
Anyone got one of them there mnemnomnics?
Republicans.
OS compartmentalization doesn't help running different OSs on the same hardware
The only reason for running multiple OSes on the same hardware is some software being sabotaged so that it does not run on the OS where everything else runs. There is no benefit in encouraging this kind of behavior.
and I don't know of any implementations that allow for live migration between physical hosts
It's such a rarely needed capability that no one bothered implementing it since Condor (which exists, but no one uses it). Any infrastructure for state persistence on the application level serves the same purpose better and provides other benefits unachievable by simple checkpointing/transfer. However, if it were necessary, it would be implemented with or without compartmentalization (it is not always necessary to combine hosts when the goal is migrating applications).
Virtualization is NOT emulation.
Hardware emulation. Hardware emulates different hardware (in this case, a set of processors with peripherals). OS design started from this, then the idea of virtual memory, which uses a model radically different from the hardware memory model, killed it. Similarly, time-sharing turned into scheduling based on the state of processes and I/O operations (modern virtualization uses "cooperative" scheduling where the OS still has something to say, however it's still clumsy), and storage partitioning was replaced with the permissions model in filesystems.
If someone has any illusions about the time I am talking about, let me spell it out: IT WAS THE LATE SIXTIES. Modern virtualization is a regression toward the pre-minicomputer state of computer technology. And no, it's not any better this time around, it has exactly the same benefits and exactly the same flaws as then.
I'd say the biggest drawback to pot smoking in teenage years is a lack of ability to find and keep a job.
Teenagers shouldn't be working anyway, they should be in school.
Not in the US. Without working awful teenage jobs those people will never develop hatred and disdain toward all other people, or all-overriding envy toward psychopathic "leaders". The ideology of the "self-reliant man" can't survive without that.
Judging by the photo on your homepage you're either lying or use vast quantities of anti-aging cream, hair dye and plastic surgery.....
Or maybe I just have no life.
The guy may be well out of his depth, but surely constructive advice would be preferable, especially if you're as experienced as you claim.
Any "constructive" advice would enable him to make more and worse mistakes. There is no replacement for learning things in depth.