Let's see you quit your job, sit in the mountains for 3 months, and still support yourself.
Are you honestly saying you don't know anyone who's been out of work for 3 months in the last year or so and managed to get along okay? Yeah, you adjust your standard of living. It's not that tough, though. Especially if you plan ahead and save up a few months' salary.
In theory, you can give a Unix file a name consisting of any arbitrary string of characters.
Umm, no.
\000 - Putting a zero byte in a filename will break any program written in C.
'/' - A filename with an embedded slash will be unusable except by programs that walk directory contents very carefully.
These won't work at all. The filesystem forbids them.
The comp.unix FAQ does address the case of an old, buggy NFS daemon that could be coerced to create a filename with a '/' in it, but Linux has never had such a thing, as it was definitely a bug and not intended behavior.
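This is easy to check for yourself. The following is a minimal sketch, assuming Linux and Python 3; the helper names `try_create` and `forbidden_name_demo` are made up for illustration. Note that Python rejects the zero byte itself before the system call is ever made, whereas a C program would silently see the name truncated at the zero byte.

```python
import os
import tempfile


def try_create(directory, name):
    """Try to create a file called `name` in `directory`; report what happened."""
    try:
        fd = os.open(os.path.join(directory, name), os.O_CREAT | os.O_WRONLY, 0o600)
    except (ValueError, OSError) as exc:
        # ValueError: Python refuses an embedded zero byte.
        # OSError subclasses: the kernel refused the path.
        return type(exc).__name__
    os.close(fd)
    return "created"


def forbidden_name_demo():
    """Compare an odd-but-legal name with the two forbidden bytes."""
    with tempfile.TemporaryDirectory() as d:
        return {
            "newline": try_create(d, "odd\nname"),  # legal, if unfriendly
            "nul": try_create(d, "bad\0name"),      # never reaches the filesystem
            "slash": try_create(d, "no/such"),      # parsed as directory "no", file "such"
        }
```

The slash case fails because the kernel treats the name as a path into a directory that doesn't exist, so there is no way to ask for a single directory entry containing '/'.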
I hear that a lot from meat eaters who can't understand how vegetarians can be happy without eating meat. They simply don't believe that you can derive the same enjoyment from a vegetarian diet as you can from eating meat. But I don't believe this is the case.
You're welcome to your opinion, but this is (quite literally) a matter of taste.
It's pretty simple set arithmetic to see that an omnivorous diet has more in it than a vegetarian diet. I'm not saying you can't have a yummy and satisfying vegetarian diet, but you _are_ losing options when you do it. If pig snouts happen to be the thing you find tastiest, the vegetarian diet doesn't include them.
And yes, if you change your eating habits you may find after a while that you no longer like what you used to eat. But that's largely a matter of conditioning.
IOW, there are a lot of good arguments for vegetarianism. This isn't one.
now it is a stir-fry with tofu or a spicy bean burger
Tofu is great. It's one of my favorite foods, extremely versatile.
Can't say as I've had a bean burger I cared for. Really, I hate all these meat substitutes (everything from soy cheese to bean burgers to Quorn). There are so many good vegetable meals that I don't see the point of eating bad meat substitutes. And if you spend all your time trying to create things that look/smell/feel/taste like meat, it undermines the "I don't miss meat" argument.
If you don't collect licenses for your patent immediately (i.e. within a reasonable time frame), why do you get to do it years later (after everyone started using it because it was free and efficient)?
Because that's the way the law works. With trademarks, you lose them if you don't constantly enforce them. Patent law is different: it allows submarining for years and enforcing only after adoption is widespread.
The "reasonable time frame" is up to 20 years now. Ugh.
If you check American Colonial history really carefully, you'll find that the Pilgrims didn't come to the New World(C) for religious tolerance; they had that in the Netherlands. What they came for was to set up their OWN religious tyranny (example: the excommunication of some religious nonconformists from the Mass Bay colonies). Religious freedom was only on the Puritan mind insofar as it meant freedom to practice THEIR religious orthodoxy as THEY dictated it.
The Puritans weren't big on individual liberties or religious freedom. Luckily they weren't the ones who wrote the Constitution or ran the government for the first little while there. Jefferson, Adams, Franklin, and Paine were all non-Christian (ranging from agnostic to Deist to Unitarian), and Washington and Madison both campaigned heavily against any government support of particular religions (Washington also put a lot of energy into defending the appointment of non-Christian chaplains in his army).
Freedom of religion was a real concern to them, and certainly wasn't the sham "freedom of any religion you want, as long as it's Christian" that a lot of right-wingers seem to promote today. And it did, indeed, include freedom _from_ religion if that was your personal belief.
Jefferson compiled an interesting work called the Jefferson Bible, which is basically the New Testament with all of the miracles removed; it's just the life of Jesus as a moral man, not as the son of God.
It wasn't just them, either; at the time of the Revolution only 7% of colonists were members of any organized church (though around half the remainder were "somewhat practicing"). The times of the Puritans, when members of a single religion made up an entire colony, were long gone.
It's interesting to note in these times that one of the first things Adams signed as president was the Treaty of Tripoli, which stated in part:
As the government of the United States of America is not in any sense founded on the Christian Religion - as it has in itself no character of enmity against the laws, religion or tranquility of Musselmen, - and as the said States never have entered into any war or act of hostility against any Mehomitan nation, it is declared by the parties that no pretext arrising from religious opinions shall ever produce an interruption of the harmony existing between the two countries.
If it were a commercially-released game, that would be considered something that would be a product of the content designers, who are developers.
In the commercially released games I've worked on, the content designers are most definitely not considered developers. "Developer" is applied to a wide range of people, including the person who picks up the game idea, finances it, and puts the team together (equivalent of a producer in the movie biz), the people who hustle to raise capital, and the programmers. But I've never heard it applied to the artists or designers.
[That's not to say that "developer" is a prestige title: on many projects the designers are clearly the most important, driving force with the most clout]
As a rule, people who build acknowledgements into their UDP-based protocol should have used TCP.
Not necessarily. A better rule is that people who build sequencing and acknowledgements in should have used TCP. If you can do without one or the other, UDP is sometimes worth it, though not often.
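As an illustration of the "one without the other" case, here is a minimal sketch of sequencing with no acknowledgements: the receiver only ever wants the newest datagram and throws away anything stale. The 4-byte header layout and the names `encode` and `LatestOnlyReceiver` are hypothetical, and the sketch ignores sequence-number wraparound.

```python
import struct

# Hypothetical wire format: a 32-bit big-endian sequence number, then the payload.
HEADER = struct.Struct("!I")


def encode(seq, payload):
    """Prefix `payload` with its sequence number."""
    return HEADER.pack(seq) + payload


class LatestOnlyReceiver:
    """Keeps only datagrams newer than anything seen so far; never acknowledges."""

    def __init__(self):
        self.last_seq = -1

    def accept(self, datagram):
        """Return the payload, or None if the datagram is a duplicate or arrived late."""
        (seq,) = HEADER.unpack_from(datagram)
        if seq <= self.last_seq:
            return None
        self.last_seq = seq
        return datagram[HEADER.size:]
```

Nothing is ever retransmitted here, which is the point: for periodic state updates a lost datagram is simply superseded by the next one. Once you also need every message delivered, you are back to reinventing TCP.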
This is tough, all OSes are different. Here are a couple of examples:
Many OSes save/restore all state when you switch contexts. Linux is lazy about saving state; it'll only save FPU state if the FPU is used by the new process. So for the 99% of process switches that aren't between FPU-using procs, you don't incur that overhead. Ditto TLB invalidation (e.g. when entering kernel mode but not switching processes); Linux is lazy about doing that. Both FPU state saving and TLB invalidation (especially) are heavyweight operations.
OS internals also vary wildly; for instance, Linux tracks processes by keeping a pointer to the current task_struct. This pointer is updated when task switching. Threads are just processes which share VM (and a few other things) and are scheduled the same way as processes. NT (4.x; I'm not sure about 2000/XP) keeps context information on the kernel stack and requires a more heavyweight stack operation when context switching. Thread switching takes a completely different path through the scheduler as threads aren't really considered similar to processes at all.
One thread per CPU event driven state machine servers will always beat One process per CPU event driven state machine servers.
The simple reason is that the OS can optimize context switches to avoid switching page tables, and the resulting cache and TLB flushes.
Sure, it's not likely to be more than a 5-10% speed up on Linux, but when you're groping for those last few TPS, it matters.
Can you name a single real-world application where using threads instead of processes on Linux speeds it up even 1%, let alone 5%?
Certainly if it does exist it's not an efficient application.
(Sure, other OSes can't context switch to save their lives and force you to use incorrect abstractions because of it--I don't much care)
I'm not saying to never use threads, but the decision to use threads vs. processes should be based on whether you want/need your memory to be shared (with all the problems that introduces as well as the convenience) or not, not on any perceived performance problems. Or, as Alan puts it, "threads are processes that share more". That's the way to think of them. And good modular programming remembers to share only what is absolutely necessary--keep your data hidden when possible.
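The "share more" distinction is easy to see in a few lines. This is only a sketch, assuming a Unix system where `os.fork()` is available; the names `counter`, `bump`, `bump_in_thread` and `bump_in_child_process` are made up for the example.

```python
import os
import threading

counter = {"value": 0}


def bump():
    counter["value"] += 1


def bump_in_thread():
    """A thread shares the address space, so its write is visible afterwards."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def bump_in_child_process():
    """A forked child gets its own copy, so its write never reaches the parent."""
    pid = os.fork()
    if pid == 0:
        bump()        # modifies the child's private copy only
        os._exit(0)   # leave without running any of the parent's cleanup
    os.waitpid(pid, 0)
    return counter["value"]
```

Starting from zero, the thread version leaves the counter at 1 and the fork version leaves it unchanged. Neither behavior is "better"; the question is which one you actually want for a given piece of data.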
I'd think you'd appreciate the quote on threads attributed to Alan Cox on Larry McVoy's page.
(The quote in question is: "A computer is a state machine. Threads are for people who can't program state machines." Alan Cox)
Except I'd assert that threads are far harder to program correctly than state machines. Easier concept at first, and easy to come up with a design for the 90% solution, but the devil's in the details and threads have a ton of details. Not to say that state machines don't, but they seem to cause fewer problems in practice.
If you don't take a cursory run with a profiler on it, you'll never know the real cost of speeding it up.
Right. It's obviously a cost/benefit tradeoff. If you start the report at midnight and need it at 8:00 in the morning, then if it takes 15 minutes to run you probably don't even want to think about profiling. If it takes 7 hours, it's still fast enough for now but you may want to concern yourself with whether it'll always be fast enough. What's the cutoff? 1 hour? 4 hours? Depends on how crucial the report is and what other projects are on your plate at the moment.
Obviously "performance problem" is tough to quantify in general, but I still contend that you should normally only profile if there is a potential performance problem (or if you have idle resources, etc). Otherwise, go do some QA. Work on a new project. Clean up the nasty hack you wrote late at night to get it going. Write some documentation. Whatever.
Not so. Have you ever compared the time between a thread-switch and a process-switch?
Yep. And for most applications, it's not meaningful. If you spend all your time context switching, you're definitely not efficiently designed whether you use threads or processes--you can definitely measure the overhead in that case, but when you go to a situation where you're synchronizing on anything (mutexes, sockets, whatever) the difference essentially disappears. And even in the measurable situation, the difference isn't huge--about 2 usecs on my home machine on a total overhead of 4 usecs vs. 6 usecs (threads vs. procs). Sure, it's 33% SLOWER!!!! Horror!!! In the real world, it generally doesn't matter and it's small enough that if context switch overhead is hurting your multiprocess app then switching to multithreading won't really help.
Both are so fast that if you thought about your design at all they won't even be a blip on the radar, unlike on some OSes where switching processes can take 100s of usecs vs. 10 usecs for a thread switch.
There are exceptions, which is why I didn't say that threads are always bad. But the performance argument here is almost always specious, brought up by people who learned about threaded programming on other platforms where it is a huge win and used to defend a poor design choice (look, I can measure the difference in contrived situation X even though it has no effect on system performance).
But what happens when the program fails overnight, and the poor user comes in in the morning to find that he doesn't have enough time to run the program again before the deadline?
Then you profile and optimize, because it's not "fast enough" any more.
Umm, fork() is the one that's braindead. Who the hell dreamed up a system where creating a new process would copy the entire state of an existing one only to have it wiped out when the other process did an exec()? fork() requires all sorts of nasty stuff (like copy-on write in the VM) that is ditched if the OS follows a process/thread model.
Uh, COW isn't ditched in a process/thread model. Shared libraries would suck without it. Demand paging of executables wouldn't work without it. It's a fundamentally good thing used by Unix, MacOS X, Windows, and almost all other modern OSes which support protected memory. Definitely not "nasty stuff", and by itself it eliminates 99% of the fork() overhead vs. threads.
You really want to be able to create a new process with the same state as the existing one, and fork/exec allows that. There's system() if you want an entirely new executable (which might call fork()/exec() or might call spawn(), vfork()/exec(), or whatever...). I don't feel like arguing over whether a spawn()/CreateProcess*()-style syscall is good, but not having a fork()-style syscall is simply braindead. There are things you can do with fork()/exec() that you can't do with spawn() or CreateProcess*(); the reverse isn't true.
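One example of what the gap between fork() and exec() buys you: the child can rearrange its own file descriptors (or anything else about itself) with ordinary code before the new program starts. A minimal sketch, assuming a Unix system with an `echo` binary on the PATH; the function name `run_with_stdout_captured` is made up for illustration.

```python
import os


def run_with_stdout_captured(argv):
    """Fork, point the child's stdout at a pipe, exec `argv`; return (exit code, output)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: arbitrary setup goes here, between fork() and exec().
        try:
            os.close(read_end)
            os.dup2(write_end, 1)      # fd 1 (stdout) now refers to the pipe
            os.close(write_end)
            os.execvp(argv[0], argv)   # only returns (by raising) on failure
        finally:
            os._exit(127)
    # Parent: read until the child closes its end of the pipe.
    os.close(write_end)
    chunks = []
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_end)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status), b"".join(chunks)
```

With a spawn-style call, every such tweak has to be anticipated as a parameter of the call itself; with fork()/exec() it is just code that runs in the child.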
Um, but...I think there's a confusion of context occurring. The situation you describe happens when you're writing little chunks of one-off code to perform one task and be done with. Usually it'll be used once, or is part of a stopgap "until there's a real solution."
With testing, that's generally right. If something's going to run often, it can potentially fail a lot of times and so even a small cost of failure will be compounded to the point where QA is worthwhile.
With performance, that's often not true. There are a lot of jobs that don't need anything approaching "good" performance. Batch reports are one extremely common example (as is other batch processing): I need a web usage report on my desk or in my inbox every morning, so the quick-and-dirty multipass solution that takes 3 hours to run can be scheduled at midnight, and the programmer can then do another project with big ROI instead of spending time writing a faster solution that takes only seconds to run. Many applications fall into that domain, many of them absolutely mission critical and responsible for millions in revenue but also not worth spending time optimizing when it could be better spent testing, adding features, or working on another project entirely.
And many (I'd say most) interactive applications are fast enough from the get-go and never need optimization. Sure, there are some apps that either do a lot of computation (mp3 players, games, compilers, etc), or are run many times at once (web servers), or are too slow when first run for unknown reasons. But a lot of programs are fine from the start and profiling them is a waste.
That said, thanks for the information, it has certainly helped to clear some things up.
No problem.
I guess the key point I want people to remember (if I only clear one thing up...) is that a decision about whether to use threads or processes should be based on whether they want all (or mostly) shared memory, in which case threads are in order, or some protected memory (and possibly some shared) in which case processes are the way to go.
Windows has hoodwinked people into thinking threads are fast and processes are slow (and that processes have to start new executables), when that's really not the interesting detail and isn't really very true under well-designed operating systems. And you lose a lot by giving up protected memory (even only giving it up wrt other threads in your memory space).
However, "fast enough" is a really bad metric to use. Yes, utility "X" is fast enough. But oh, I didn't realize it was going to be used in conjunction with utility "Y" and "Z". Now, everything is really slow. Hey, can you say Microsoft?
Hey, I need this report on my desk every morning. It takes 3 hours to run. Let's kick it off every night at midnight.
Fast enough, even though a well-coded, well-designed implementation might take seconds to run. And mission critical. No point wasting programmer time speeding it up when we can do another project with big upside instead.
But processes as provided by current operating systems are too expensive to use.
No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds? Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.
If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale.
Right. And if you create a new thread for each network request, you'll never scale--give it a try some time. Good servers that use a thread/process for every connection do so with pre-fork()'d/pre-pthread_create()'d/whatever pools. Apache, for instance, uses multiple processes (but no multithreading, except in some builds of 2.x) but pre-forks a pool of them. This is really basic stuff, even an introductory threading book will talk about pooling and other server designs.
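For anyone who hasn't seen it, the pre-fork pattern looks roughly like the sketch below. It assumes a Unix system; the function name `prefork_server` and the reply format are made up, and each worker quits after a fixed number of requests purely so the example terminates (a real server's workers loop forever and get replaced when they die).

```python
import os
import socket


def prefork_server(num_workers, requests_per_worker):
    """Fork a fixed pool of workers that all accept() on one listening socket.

    Returns (port, worker_pids). No process is created per request.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Worker: inherited the listening socket; the kernel hands each
            # incoming connection to exactly one of the waiting workers.
            try:
                for _ in range(requests_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(b"handled by pid %d\n" % os.getpid())
            finally:
                os._exit(0)
        pids.append(pid)

    listener.close()   # the parent itself never accepts connections
    return port, pids
```

The cost of fork() is paid once per worker at startup, not once per request, which is why the per-request creation cost stops mattering.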
Really scalable/fast implementations don't even do that. They use just one process (or one per CPU) and multiplex the I/O with something like select, poll, queued realtime signals (Linux), I/O completion ports (NT), /dev/poll (Solaris), /dev/epoll, signal-per-fd, kqueues (FreeBSD), etc. (select and poll don't scale well to 10s of thousands of connections when most are idle, but some of the others are highly scalable). See e.g. Dan Kegel's c10k page for specifics.
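A minimal sketch of the single-process, multiplexed design, using Python's standard `selectors` module (which picks the best mechanism the platform offers, e.g. epoll on Linux or kqueue on FreeBSD). The class name `EchoServer` and its methods are made up for illustration, and the write path assumes small messages that never fill the socket buffer; a real server would buffer pending output and wait for writability.

```python
import selectors
import socket


class EchoServer:
    """One thread, one process, many connections: a tiny event-driven echo server."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind(("127.0.0.1", 0))
        self.listener.listen(128)
        self.listener.setblocking(False)
        self.port = self.listener.getsockname()[1]
        # The callback to run is stored as the registration's data.
        self.sel.register(self.listener, selectors.EVENT_READ, self._accept)

    def _accept(self, sock):
        conn, _addr = sock.accept()
        conn.setblocking(False)
        self.sel.register(conn, selectors.EVENT_READ, self._echo)

    def _echo(self, conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)   # sketch only: assumes the write never blocks
        else:
            self.sel.unregister(conn)
            conn.close()

    def poll(self, timeout=1.0):
        """Handle whatever is ready right now; return the number of events handled."""
        events = self.sel.select(timeout)
        for key, _mask in events:
            key.data(key.fileobj)
        return len(events)

    def close(self):
        for key in list(self.sel.get_map().values()):
            key.fileobj.close()
        self.sel.close()
```

A real server would call `poll()` in a loop forever. Idle connections cost a registration and nothing else, which is where the scalability comes from.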
Obviously, the OS needs to change, and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs
http://www-124.ibm.com/pthreads/ proposes an M:N threading model and offers an implementation, but it still has the shared memory problems of threads. Multiprocessing may not be sexy but it's really a lot cleaner for most problems and can be more efficient in a lot of domains.
WTF? How does Java make it hard to write non-threaded programs?
No fork(). No multiplexed I/O. Try writing a good scalable network server in a single thread without the moral equivalent of select().
Java 1.4 recognized that, and added I/O multiplexing. Still no good multiprocess (but not multithreaded) framework, though, and I/O multiplexing only solves a limited subset of cases.
When I'm running a graphical program, the UI must not lock up, no matter what processing is going on in the background. I don't care how you solve that problem, but a simple use of threads is one of the simplest methods.
Not really. It seems simple until you get into the details. Yes, for some things multi-threaded is the way to go. But a multi-process solution is usually easier to implement and more stable, and a straight asynchronous single-threaded state machine solution is often the best (in terms of ease of implementation and performance). Remember, the difference between threads and processes is that processes have protected memory and threads have all shared memory. The number of cases where you really don't want most of your memory protected is very small, especially when you remember that processes can easily get shared memory segments for select pieces of memory. Most people choose threads because they think threads are better/faster/smaller than processes (which is true on some broken OSes but not meaningfully true on Linux) rather than based on whether or not they want most memory shared.
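Here is a minimal sketch of "protected by default, shared by choice", assuming a Unix system: an anonymous `mmap` region is the one piece of memory the two processes share, while everything else stays private to each. The function name `shared_segment_demo` is made up for illustration.

```python
import mmap
import os
import struct


def shared_segment_demo():
    """Return (value seen in the shared segment, value seen in ordinary memory)."""
    shared = mmap.mmap(-1, 8)   # anonymous mapping, shared across fork() on Unix
    private = [0]               # ordinary memory: the child only gets a copy

    pid = os.fork()
    if pid == 0:
        struct.pack_into("q", shared, 0, 42)   # visible to the parent
        private[0] = 42                        # invisible to the parent
        os._exit(0)

    os.waitpid(pid, 0)
    (seen,) = struct.unpack_from("q", shared, 0)
    shared.close()
    return seen, private[0]
```

The parent sees the child's write to the shared segment but not the write to the ordinary list. A stray pointer bug in one process can only trash the segment you deliberately exposed, not everything else.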
Am I missing something, or is this just XML hype?
Basically, yes. "XML database" doesn't mean much about the database itself, unless you mean that the file format used to store the data is XML (which is pretty much uninteresting, except for being fairly braindead for many sorts of data sets). It tells you nothing about how, logically, the data is organized and what operations it supports (which is what "object database", "relational database", "hierarchical database", etc. attempt to convey), which is generally what a programmer using the database is _most_ interested in.
It may mean that data is presented in XML at query time and XML queries are accepted; if so, that's a moderately more interesting claim but really speaks to a database interface (a la JDBC or pydb) rather than anything interesting about the database itself. Which is not to be dismissed, but formatting results as XML is trivial compared to having to implement e.g. relations in code (for instance), and that sort of interface can be added to any kind of underlying database.
It doesn't really speak at all about the on-disk storage structures (even if data is "stored as XML"), which is often the most interesting thing from a performance standpoint and often interesting from a usability standpoint (e.g. "can efficiently store data in the existing native filesystem" is often mandatory for non-dedicated applications).
Sumner
Why does the Linux kernel set the exec flag for stack pages?
Executing code on the stack isn't unheard of in legitimate programs--it's sometimes used for performance reasons and sometimes to simplify implementation. Usually it's done in cases where the program's control flow is somewhat complicated:
* Linux puts signal handlers on the stack. They need to be executable.
* Kaffe and other vms put code on the stack for efficiency reasons.
* Many functional programming languages write code on the stack for performance reasons.
* Some garbage collectors write code on the stack
* Some user-space threading libraries put code on the stack
I'm sure there are others. I know Solar Designer's noexec stack patch had some workarounds for gcc's trampolines; I'm not sure if they worked with everything or not.
Sumner
The TCP stack that shipped with NT 3.51 and all later versions
The TCP stack shipping with Win2k and XP is clearly not the same stack that shipped with NT 3.51 and 4.0 (it may share substantial amounts of code, but even simple fingerprinting shows that it behaves quite differently).
I know this because I co-wrote the Windows NT Winsock implementation and I worked very closely with the TCP guys.
Okay, reality check here.
1. The Windows Socket: Background paper on MSDN says:
Windows Sockets are based on the UNIX® sockets implementation in the Berkeley Software Distribution (BSD, release 4.3) from the University of California at Berkeley.
2. Although the user-mode API for NT 3.5 was implemented entirely by Microsoft, the kernel TCP/IP stack originally included a stack licensed from Spider Systems. And the Spider Systems code was based on the BSD Net/3 TCP code. While much of the Spider Systems code (for the TCP stack) was rewritten before the release of NT 3.5, some of it remained. Much more of it remained in the userspace utilities (e.g. ftp.exe) and you could see the BSD copyright notice if you ran "strings" on that binary.
Can I prove this? No, but just because you read something in a WSJ article doesn't prove anything, either.
Like I said, either the Win2k/XP stack uses the open/freeBSD stack or the programmers implementing the win2k/XP stack referred heavily to the BSD code (even for non-RFC issues) or Microsoft went to great lengths to make it appear that way or there were some amazing coincidences with a number of the implementation details. The WSJ article is one thing, but the fingerprints the Win2k/XP stack has are extremely similar to the *BSD stack in many ways.
There's nothing wrong with this; it's perfectly legal and there's no advertising clause on open/freebsd any more AFAIK.
Sumner
FYI, MS's TCP stack isn't BSD-derived. Where do they use zlib, btw?
Do you have a reference for this? The Wall Street Journal ran an article a year or so back where they investigated and concluded that the stack in Windows 2000 and XP is BSD-derived. Sadly, it's no longer available online.
Circumstantial evidence: Windows has historically exhibited a lot of security flaws consistent with a port of the BSD Net/3 TCP/IP stack (which other independent TCP/IP implementations haven't shown).
Windows 2000 and later seem to have moved from Net/3 to an OpenBSD/FreeBSD-based stack. It's impossible to know for sure, but you can use fingerprinting techniques (a la queso) to see things like Windows' TCP window size being 0x402E, which just happens to be exactly the same arbitrary number that Open/FreeBSD were using for the 2-3 years leading up to the Win2K release. There's no good reason for Windows to pick this number independently. There are a host of other, similar signs that demonstrate either MS used the open/freebsd stack or they spent a lot of time trying to duplicate subtle implementation details of the open/freebsd stack that aren't part of implementing the RFCs.
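For reference, the window size is just a 16-bit field in the fixed TCP header, so checking it on a captured segment is trivial. The sketch below shows only that one field; real fingerprinting tools compare many such details at once. The names `TCP_FIXED_HEADER` and `tcp_window` are made up, and the sample header is fabricated for illustration rather than captured from a real Windows machine.

```python
import struct

# Fixed 20-byte TCP header, network byte order:
# source port, dest port, sequence number, ack number,
# data offset + flags, window, checksum, urgent pointer.
TCP_FIXED_HEADER = struct.Struct("!HHIIHHHH")


def tcp_window(segment):
    """Return the advertised window from the start of a raw TCP segment."""
    fields = TCP_FIXED_HEADER.unpack_from(segment)
    return fields[5]


# Fabricated SYN/ACK header advertising the window discussed above.
sample = TCP_FIXED_HEADER.pack(80, 1234, 0, 1, (5 << 12) | 0x12, 0x402E, 0, 0)
```

`tcp_window(sample)` gives 0x402E, i.e. 16430 bytes, which is not a round number anyone would arrive at by accident.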
Sumner
24 years old, white with red hair, carrying a valid US military ID. One would think that I'd be the last person to get searched.
Military? I hear those guys carry weapons! Definitely search them.
Sumner
If you don't collect licensces for your patent immediately, (i.e. within a reasonbale time frame) why do you get to do it years later (after everyone started using because it was free and efficient)?
Because that's the way the law works. With trademarks, you lose them if you don't constantly enforce them. Patent law is different; it allows for submarining for years and only enforcing after adoption is widespread.
The "reasonable time frame" is up to 20 years now. Ugh.
Sumner
If you check American Colonial history really carefully, you'll find that the Pilgrims didn't come to the New World(C) for religious tolerance; they had that in the Netherlands. What they came for was to set up their OWN religious tyranny (example: the excommunication of some religious nonconformists from the Mass Bay colonies). Religious freedom was only on the Puritan mind insofar as it meant freedom to practice THEIR religious orthodoxy as THEY dictated it.
The Puritans weren't big on individual liberties or religious freedom. Luckily they weren't the ones who wrote the Constitution or ran the government for the first little while there. Jefferson, Adams, Franklin, and Paine were all non-Christian (ranging from agnostic to Deist to Unitarian), and Washington and Madison both campaigned heavily against any government support of particular religions (Washington also put a lot of energy into defending the appointment of non-Christian chaplains in his army).
Freedom of religion was a real concern to them, and certainly wasn't the sham "freedom of any religion you want, as long as it's Christian" that a lot of right-wingers seem to promote today. And it did, indeed, include freedom _from_ religion if that was your personal belief.
Jefferson published an interesting work called the Jefferson Bible which is basically the New Testament with all of the miracles removed; it's just the life of Jesus as a moral man, not as the son of God.
It wasn't just them, either; at the time of the Revolution only 7% of colonists were members of any organized church (though around half the remainder were "somewhat practicing"). The times of the Puritans, where only members of 1 religion had formed your entire colony, were long gone.
It's interesting to note in these times that one of the first things Adams signed as president was the Treaty of Tripoli, which stated in part:
As the government of the United States of America is not in any sense founded on the Christian Religion - as it has in itself no character of enmity against the laws, religion or tranquility of Musselmen, - and as the said States never have entered into any war or act of hostility against any Mehomitan nation, it is declared by the parties that no pretext arising from religious opinions shall ever produce an interruption of the harmony existing between the two countries.
Sumner
If it were a commercially-released game, that would be considered something that would be a product of the content designers, who are developers.
In the commercially released games I've worked on, the content designers are most definitely not considered developers. "Developer" is applied to a wide range of people, including the person who picks up the game idea, finances it, and puts the team together (equivalent of a producer in the movie biz), the people who hustle to raise capital, and the programmers. But I've never heard it applied to the artists or designers.
[That's not to say that "developer" is a prestige title: on many projects the designers are clearly the most important, driving force with the most clout]
Sumner
As a rule, people who build acknowledgements into their UDP based protocol should have used TCP.
Not necessarily. A better rule is that people who build sequencing and acknowledgements in should have used TCP. But if you can do without one or the other then sometimes UDP is worth it, but not often.
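To make "sequencing and acknowledgements" concrete, here is a rough sketch of the receiver-side bookkeeping you end up writing once you want both. The class name and the cumulative-ack scheme are my own illustration, not any particular protocol; there is no real socket here, just the logic.

```python
class InOrderReceiver:
    """Receiver side of a hypothetical datagram protocol that layers
    sequencing and cumulative acknowledgements on top of UDP."""

    def __init__(self):
        self.next_seq = 0      # next sequence number we can deliver
        self.pending = {}      # out-of-order datagrams, keyed by seq
        self.delivered = []    # payloads handed to the application

    def receive(self, seq, payload):
        """Accept one datagram and return the cumulative ack to send."""
        if seq >= self.next_seq:
            # Keep the first copy; later duplicates are ignored.
            self.pending.setdefault(seq, payload)
        # Deliver every datagram that is now contiguous.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        # The ack means "I have everything below this number".
        return self.next_seq


rx = InOrderReceiver()
arrivals = [(1, b"b"), (0, b"a"), (0, b"a"), (3, b"d"), (2, b"c")]
acks = [rx.receive(seq, data) for seq, data in arrivals]
```

Add sender-side retransmission timers and some form of flow control and you are most of the way to reimplementing TCP, which is the point of the rule. If you only need one of the two pieces, the code gets much smaller.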
Sumner
This is tough, all OSes are different. Here are a couple of examples:
Many OSes save/restore all state when you switch contexts. Linux is lazy about saving state; it'll only save FPU state if the FPU is used by the new process. So for the 99% of process switches that aren't between FPU-using procs, you don't incur that overhead. Ditto TLB invalidation (e.g. when entering kernel mode but not switching processes); Linux is lazy about doing that. Both FPU state saving and TLB invalidation (especially) are heavyweight operations.
OS internals also vary wildly; for instance, Linux tracks processes by keeping a pointer to the current task_struct. This pointer is updated when task switching. Threads are just processes which share VM (and a few other things) and are scheduled the same way as processes. NT (4.x; I'm not sure about 2000/XP) keeps context information on the kernel stack and requires a more heavyweight stack operation when context switching. Thread switching takes a completely different path through the scheduler as threads aren't really considered similar to processes at all.
Sumner
One thread per CPU event driven state machine servers will always beat One process per CPU event driven state machine servers.
The simple reason is that the OS can optimize context switches to avoid switching page tables, and the resulting cache and TLB flushes.
Sure, it's not likely to be more than a 5-10% speed up on Linux, but when you're groping for those last few TPS, it matters.
Can you name a single real-world application where using threads instead of processes on Linux speeds it up even 1%, let alone 5%?
Certainly if it does exist it's not an efficient application.
(Sure, other OSes can't context switch to save their lives and force you to use incorrect abstractions because of it--I don't much care)
I'm not saying to never use threads, but the decision to use threads vs. processes should be based on whether you want/need your memory to be shared (with all the problems that introduces as well as the convenience) or not, not on any perceived performance problems. Or, as Alan puts it, "threads are processes that share more". That's the way to think of them. And good modular programming remembers to share only what is absolutely necessary--keep your data hidden when possible.
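A minimal sketch of the "threads are processes that share more" point, assuming a Unix-like system where os.fork() is available. The variable and function names are made up for the demo.

```python
import os
import threading

counter = {"value": 0}   # ordinary module-level state


def bump():
    counter["value"] += 1


# A thread runs in the same address space, so its write is visible here.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = counter["value"]

# A forked child gets its own copy of the address space, so its write
# never reaches the parent.
pid = os.fork()
if pid == 0:
    bump()
    os._exit(0)          # leave the child without running parent cleanup
os.waitpid(pid, 0)
after_fork = counter["value"]
```

After the thread runs, the parent sees the increment; after the forked child runs, it does not. That difference in sharing, not speed, is the thing to base the choice on.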
Sumner
I'd think you'd appreciate the quote on threads attributed to Alan Cox on Larry McVoy's page.
(The quote in question is:
"A computer is a state machine. Threads are for people who can't program state machines." Alan Cox)
Except I'd assert that threads are far harder to program correctly than state machines. Easier concept at first, and easy to come up with a design for the 90% solution, but the devil's in the details and threads have a ton of details. Not to say that state machines don't, but they seem to cause fewer problems in practice.
Sumner
If you don't take a cursory run with a profiler on it, you'll never know the real cost of speeding it up.
Right. It's obviously a cost/benefit tradeoff. If you start the report at midnight and need it at 8:00 in the morning, then if it takes 15 minutes to run you probably don't even want to think about profiling. If it takes 7 hours, it's still fast enough for now but you may want to concern yourself with whether it'll always be fast enough. What's the cutoff? 1 hour? 4 hours? Depends on how crucial the report is and what other projects are on your plate at the moment.
Obviously "performance problem" is tough to quantify in general, but I still contend that you should normally only profile if there is a potential performance problem (or if you have idle resources, etc). Otherwise, go do some QA. Work on a new project. Clean up the nasty hack you wrote late at night to get it going. Write some documentation. Whatever.
Sumner
Not so. Have you ever compared the time between a thread-switch and a process-switch?
Yep. And for most applications, it's not meaningful. If you spend all your time context switching, you're definitely not efficiently designed whether you use threads or processes--you can definitely measure the overhead in that case, but when you go to a situation where you're synchronizing on anything (mutexes, sockets, whatever) the difference essentially disappears. And even in the measurable situation, the difference isn't huge--about 2 usecs on my home machine on a total overhead of 4 usecs vs. 6 usecs (threads vs. procs). Sure, it's 50% SLOWER!!!! Horror!!! In the real world, it generally doesn't matter and it's small enough that if context switch overhead is hurting your multiprocess app then switching to multithreading won't really help.
Both are so fast that if you thought about your design at all they won't even be a blip on the radar, unlike on some OSes where switching processes can take 100s of usecs vs. 10 usecs for a thread switch.
There are exceptions, which is why I didn't say that threads are always bad. But the performance argument here is almost always specious, brought up by people who learned about threaded programming on other platforms where it is a huge win and used to defend a poor design choice (look, I can measure the difference in contrived situation X even though it has no effect on system performance).
Sumner
But what happens when the program fails overnight, and the poor user comes in in the morning to find that he doesn't have enough time to run the program again before the deadline?
Then you profile and optimize, because it's not "fast enough" any more.
Is that hard to understand?
Sumner
Umm, fork() is the one that's braindead. Who the hell dreamed up a system where creating a new process would copy the entire state of an existing one only to have it wiped out when the other process did an exec()? fork() requires all sorts of nasty stuff (like copy-on write in the VM) that is ditched if the OS follows a process/thread model.
Uh, COW isn't ditched in a process/thread model. Shared libraries would suck without it. Demand paging of executables wouldn't work without it. It's a fundamentally good thing used by Unix, MacOS X, Windows, and almost all other modern OSes which support protected memory. Definitely not "nasty stuff", and by itself it eliminates 99% of the fork() overhead vs. threads.
You really want to be able to create a new process with the same state as the existing one, and fork/exec allows that. There's system() if you want an entirely new executable (which might call fork()/exec() or might call spawn(), vfork()/exec(), or whatever...). I don't feel like arguing over whether a spawn()/CreateProcess*()-style syscall is good, but not having a fork()-style syscall is simply braindead. There are things you can do with fork()/exec() that you can't do with spawn() or CreateProcess*(); the reverse isn't true.
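One commonly cited example of what the gap between fork() and exec() buys you (not necessarily the only thing meant above): the child is still running your code, so it can rearrange its own file descriptors before the new program starts. A minimal sketch, assuming a Unix-like system; the child program is just the current Python interpreter printing one line.

```python
import os
import sys

read_fd, write_fd = os.pipe()

pid = os.fork()
if pid == 0:
    # Child: still our code, so it can set things up before exec().
    try:
        os.close(read_fd)
        os.dup2(write_fd, 1)           # make stdout point at the pipe
        os.close(write_fd)
        os.execv(sys.executable,
                 [sys.executable, "-c", "print('hello from the child')"])
    finally:
        os._exit(127)                   # only reached if exec failed

# Parent: read whatever the exec'd program wrote to its stdout.
os.close(write_fd)
with os.fdopen(read_fd, "rb") as pipe:
    output = pipe.read()
_, status = os.waitpid(pid, 0)
```

The exec'd program has no idea its stdout was redirected; all of the setup happened in the child between the two calls.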
Sumner
Um, but...I think there's a confusion of context occurring. The situation you describe happens when you're writing little chunks of one-off code to perform one task and be done with. Usually it'll be used once, or is part of a stopgap "until there's a real solution."
With testing, that's generally right. If something's going to run often, it can potentially fail a lot of times and so even a small cost of failure will be compounded to the point where QA is worthwhile.
With performance, that's often not true. There are a lot of jobs that don't need anything approaching "good" performance. Batch reports are one extremely common example, as is other batch processing: I need a web usage report every morning on my desk/in my inbox, so the quick-and-dirty multipass solution that takes 3 hours to run can be scheduled at midnight, and the programmer can then do another project with big ROI instead of spending time writing a faster solution that takes only seconds to run. Many applications fall into that domain, many of them absolutely mission critical and responsible for millions in revenue but also not worth spending time optimizing when it could be better spent testing, adding features, or working on another project entirely.
And many (I'd say most) interactive applications are fast enough from the get-go and never need optimization. Sure, there are some apps that either do a lot of computation (mp3 players, games, compilers, etc), or are run many times at once (web servers), or are too slow when first run for unknown reasons. But a lot of programs are fine from the start and profiling them is a waste.
Sumner
That said, thanks for the information, it has certainly helped to clear some things up.
No problem.
I guess the key point I want people to remember (if I only clear one thing up...) is that a decision about whether to use threads or processes should be based on whether they want all (or mostly) shared memory, in which case threads are in order, or some protected memory (and possibly some shared) in which case processes are the way to go.
Windows has hoodwinked people into thinking threads are fast and processes are slow (and that processes have to start new executables), when that's really not the interesting detail and isn't really very true under well-designed operating systems. And you lose a lot by giving up protected memory (even only giving it up wrt other threads in your memory space).
Sumner
However, "fast enough" is a really bad metric to use. Yes, utility "X" is fast enough. But oh, I didn't realize it was going to be used in conjunction with utility "Y" and "Z". Now, everything is really slow. Hey, can you say Microsoft?
Hey, I need this report on my desk every morning. It takes 3 hours to run. Let's kick it off every night at midnight.
Fast enough, even though a well-coded, well-designed implementation might take seconds to run. And mission critical. No point wasting programmer time speeding it up when we can do another project with big upside instead.
This sort of thing is not uncommon at all.
Sumner
Okay, so let's say threads are evil.
Okay.
But processes as provided by current operating systems are too expensive to use.
No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds? Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.
If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale.
Right. And if you create a new thread for each network request, you'll never scale--give it a try some time. Good servers that use a thread/process for every connection do so with pre-fork()'d/pre-pthread_create()'d/whatever pools. Apache, for instance, uses multiple processes (but no multithreading, except in some builds of 2.x) but pre-forks a pool of them. This is really basic stuff; even an introductory threading book will talk about pooling and other server designs.
Really scalable/fast implementations don't even do that. They use just one process (or one per CPU) and multiplex the I/O with something like select, poll, queued realtime signals (Linux), I/O completion ports (NT), /dev/poll (Solaris), /dev/epoll, signal-per-fd, kqueues (FreeBSD), etc. (select and poll don't scale well to 10s of thousands of connections when most are idle, but some of the others are highly scalable). See e.g. Dan Kegel's c10k page for specifics.
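Here is a rough sketch of the single-process multiplexing idea, using Python's selectors module (which picks epoll, kqueue, poll or select depending on the platform). The "connections" are socketpairs standing in for real network clients, and the request handling is a toy; it only shows one thread servicing whichever connections happen to be ready while idle ones cost nothing.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# Stand-ins for network connections: three connected socket pairs.
# The loop below only ever touches the "server" ends.
clients = []
for i in range(3):
    server_end, client_end = socket.socketpair()
    server_end.setblocking(False)
    sel.register(server_end, selectors.EVENT_READ, data=i)
    clients.append(client_end)

# Only two of the three peers send anything; the third stays idle.
clients[0].sendall(b"ping")
clients[2].sendall(b"pong")

handled = {}
while len(handled) < 2:
    events = sel.select(timeout=1.0)
    if not events:
        break                           # nothing arrived; give up in this demo
    for key, _mask in events:
        conn = key.fileobj
        request = conn.recv(1024)
        conn.sendall(request.upper())   # trivial "request handling"
        handled[key.data] = request

answers = [clients[0].recv(1024), clients[2].recv(1024)]

# Tidy up every socket and the selector itself.
for key in list(sel.get_map().values()):
    sel.unregister(key.fileobj)
    key.fileobj.close()
for c in clients:
    c.close()
sel.close()
```

A real server would register a listening socket too and keep per-connection state for partial reads and writes, which is where the state machine part comes in.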
Obviously, the OS needs to change, and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs.
http://www-124.ibm.com/pthreads/ proposes an M:N threading model and offers an implementation, but it still has the shared memory problems of threads. Multiprocessing may not be sexy but it's really a lot cleaner for most problems and can be more efficient in a lot of domains.
Sumner
WTF? How does Java make it hard to write non-threaded programs?
No fork(). No multiplexed I/O. Try writing a good scalable network server in a single thread without the moral equivalent of select().
Java 1.4 recognized that, and added I/O multiplexing. Still no good multiprocess (but not multithreaded) framework, though, and I/O multiplexing only solves a limited subset of cases.
Sumner
When I'm running a graphical program, the UI must not lock up, no matter what processing is going on in the background. I don't care how you solve that problem, but a simple use of threads is one of the simplest methods.
Not really. It seems simple until you get into the details. Yes, for some things multi-threaded is the way to go. But a multi-process solution is usually easier to implement and more stable, and a straight asynchronous single-threaded state machine solution is often the best (in terms of ease of implementation and performance). Remember, the difference between threads and processes is that processes have protected memory and threads have all shared memory. The number of cases where you really don't want most of your memory protected is very small, especially when you remember that processes can easily get shared memory segments for select pieces of memory. Most people choose threads because they think threads are better/faster/smaller than processes (which is true on some broken OSes but not meaningfully true on Linux) rather than based on whether or not they want most memory shared.
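A minimal sketch of "protected by default, shared where you ask for it", assuming a Unix-like system: an anonymous shared mapping is created before fork(), so parent and child both see it, while ordinary variables stay private to each process. The names are made up for the demo.

```python
import mmap
import os
import struct

private = [0]                 # ordinary memory: private after fork()
shared = mmap.mmap(-1, 8)     # anonymous mapping, shared across fork() on Unix

pid = os.fork()
if pid == 0:
    # Child: write to both kinds of memory, then exit immediately.
    private[0] = 42
    struct.pack_into("q", shared, 0, 42)
    os._exit(0)

os.waitpid(pid, 0)
seen_private = private[0]
(seen_shared,) = struct.unpack_from("q", shared, 0)
shared.close()
```

The parent sees the child's write to the shared mapping but not the write to the ordinary list. Everything the child could have scribbled on by accident stayed protected, which is the whole argument for starting with processes.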
Sumner