Difference between "adjusted" and "reported"? on Red Hat In The Black · Score: 5
With the SEC investigating the occurrence of tech companies not reporting employee stock options as part of the company's liabilities, how much faith can we put in this statement when you look at the full quote:
"Adjusted" net income of $600,000 (up from a loss of $3.7m last year).
"Reported" net loss of $27.6m (from a loss of $17.4m last year).
If I'm correct, doesn't this mean that at the end of the day they are actually worse off than they were last year and just putting PR spin on the figure?
Given that we are talking business sales rather than home user sales (in terms of support contracts), you should be able to get your support call escalated to someone who knows what they are talking about fairly easily. Every time I've talked to Microsoft Support, I've gotten past the first operator very quickly and had someone from the specialist team call me back (we are a development house with about 25 people, not Dell). We've even had special hotfixes made just for us at times.
Even if you consider Microsoft's tech support fairly bad (which I disagree with), I wonder what sort of value you place on, say, 3dfx's support at the moment? Personally I prefer support from a company that exists - however bad - to support from a company that doesn't exist.
Yup. It's pretty hard to write kernel code to give you a root shell on either system. The way most script kiddies would do it is wait for someone to publish an exploit and then tinker with it using a hex editor to make it do what they wanted.
If you subvert the system call table on either system you can gain a root shell fairly quickly, even from a web server like Tux or IIS.
Nothing to do with cost of product... on VA Layoff Rumors · Score: 3
Actually, it makes sense to purchase from a company that has a significant chance of existing for the foreseeable future. The day MSFT goes Chapter 11 will probably come a long time after the current batch of Linux vendors are well and truly forgotten.
In a world where Linux vendors are selling support and not software, you have to be careful that your support contract is actually worth something. With the present financial climate, I'd place a lot more value on a MSFT support contract than a LNUX one. Nothing to do with the cost of the product - everything to do with the performance of the company.
Exactly how is Linux kernel mode any safer than NT kernel mode? Once I'm in kernel mode I can do absolutely anything, regardless of the CAP functions - just hack up the kernel memory space and make all sorts of function calls around them. Code running in a processor's privileged mode is (by definition) trusted code.
Personally, I'd be suggesting that the NT kernel space may be marginally more effective given that the kernel can be paged in and out which makes writing kernel buffer overrun stuff slightly more error prone (BSOD).
It comes down to the fact that if someone can run arbitrary code in kernel mode, they can relatively easily (at a minimum) switch out of protected mode, format all your hard drives, erase any tapes that may be in the machine (via BIOS calls) and basically render your machine worthless. This is WITHOUT calling any kernel functions!!
Of course, if you want to be more subtle, just rewrite the syscall interface table to point at some hooks that subvert the security of the system and let it appear to continue running normally while allowing you access to stuff you shouldn't have (like the passwords of users that log on to the site and other fun things).
Re:FOR loops: a question, ANSI C++, C++98, C++99.. on GCC 3.0 Released · Score: 2
Which is why I always code this sort of loop as:
int i;
for (i = 0; i < x; i++) {
    ...
}
for (i = 0; i < x2; i++) {
    ...
}
Just to avoid problems on either compiler.
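For anyone who wants something they can actually compile, here is a minimal sketch of the same idiom. The background, as I understand it: under the C++98 rule a variable declared in the for-init statement dies at the end of that loop, while some older compilers (MSVC 6 among them) keep it alive in the enclosing block, so `for (int i ...)` written twice is a redefinition on one and reusing `i` after the loop is an error on the other. Declaring `i` once up front sidesteps both. The function name `count_iterations` and the loop bodies are made up purely for illustration.

```cpp
#include <cassert>

// Hypothetical helper (not from the original post): counts how many times
// the bodies of two consecutive loops run, sharing one loop variable.
int count_iterations(int x, int x2) {
    assert(x >= 0 && x2 >= 0);
    int total = 0;
    int i;  // declared once, outside both loops, so its scope never depends
            // on which for-scoping rule the compiler implements
    for (i = 0; i < x; i++) {
        ++total;
    }
    for (i = 0; i < x2; i++) {
        ++total;
    }
    return total;
}
```

Calling `count_iterations(3, 4)` walks the first loop three times and the second four times, and the same source should build whichever scoping rule the compiler follows.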
The frame buffer is part of main memory - you set up how much main memory to use as the frame buffer in your BIOS.
Sure it has an "AGP 6x" connection, but this doesn't help a lot when you are pumping all the data to the RAMDAC through that connection as well.
I don't think he understands memory arbitration... on nVidia nForce · Score: 3
Another interesting problem is the issue of memory bandwidth. There is some concern, rightfully, about available system memory bandwidth when using the integrated graphics core. In theory, the CPU should always be able to grab memory bandwidth first. When using the built-in IGP graphics, the memory streaming tests came in about 15-20% lower than when a GeForce2 Ultra was dropped into the AGP slot. This is possibly related to their memory arbitration logic still being enabled and consuming cycles even when nothing else is truly vying with the CPU for memory access.
Exactly how does the reviewer think the picture is getting on the screen if the RAMDAC isn't accessing main memory? Even suggesting that the CPU should have higher priority than the RAMDAC is either ignorance or stupidity. Of course the RAMDAC has to have priority or you are going to end up with big blank patches on the screen!!
Dinkumware, of course (well, actually the one that ships with MSVC). The painful thing is even though we have the source it is so damn hard to read that any changes we make couldn't accurately be ported through upgrades.
We considered going to SGI, but ended up just working around the bugs when we saw the SGI implementation of std::string wasn't ref counted at all (performance loss bigtime). Hopefully some of them will be figured out by the time VC.NET ships.
Is there any other way to program an MVS/390 if you are pathologically allergic to COBOL? ;-)
Interesting paper. It's certainly a novel way of using SMT to combat the memory/internal clock disparity. Possibly even more effective on x86 where you really want a lot of stuff in L1 because you hit memory all the time!
select() is a pig of a system call (IMHO). Replacing it is probably not a good idea though - you'll break compatibility all over the place. I believe this is discussed every so often in the Linux kernel and improvements are slowly being made here and there with single thread wakeups and the like.
There's plenty of evidence that Unix is every bit as fast as NT with its API. My point is that just because NT supports the BSD sockets API doesn't mean you should use it in high performance code. Personally I prefer using ReadFileEx() and WriteFileEx() on NT for socket (and file, named pipe and everything else) I/O. If you do it properly you don't have to wait for any events at all - the system just calls your completion routine when it's finished all by itself (kinda like a signal).
Winsock 2.x is good now that socket handles are full-blooded file handles. Damn shame you still can't pass them as stdin or stdout to a child process though. :-(
In case you didn't notice, NT is not Unix, never has been Unix and never will be Unix. There are so many design differences in the underlying system that it is hard to believe you are even suggesting it's a good idea to use a single code base.
I've seen plenty of code that has a single source for Unix and NT, but NONE of it is high performance and most of it behaves very strangely on NT when you compare it to a properly written NT service or application.
If you are writing high performance code then you are almost certainly writing for a particular system and have to write the code for that system. Writing for NT is different to writing for Unix (I prefer writing for NT personally but that's another issue) and trying to say that it's lock-in is just stating the obvious.
By your argument Linux is deliberately encouraging the use of non-portable code through applications like Tux which only work on Linux boxes and not Windows, or even other Unixes.
It's just daft.
Actually, my experience is mainly coming from Windows.
There are all sorts of hidden gotchas waiting for you with multithreaded code that you'll only find when you put it on a dual-processor machine. For example, the entire STL will screw up in multithreaded code: std::string is reference counted, but not in a thread-safe manner, so you get nasty bugs turning up about one time in a million; std::map has a great big static mutex (no, not a critical section, a mutex) around the entire class, so actions on one map (especially a foreach algorithm) will block out all other maps in your process; deque has thread-safety bugs as well, so you have to be very careful there.
If you write yourself a nice thread class that is independent of your worker classes, you won't have coupling between different operations on the same thread (assuming they aren't long lived), so the tying there is virtually non-existent.
The "thread safety" of the MS runtime library is a real performance killer. Every memory operation has a critical section around it that holds out every other thread while you are doing a new or delete.
Basically I'm saying that contrary to popular belief there is a very good chance that a multithreaded app will be slower, less stable and much harder to maintain than a similar single threaded app.
True. There's another way that's also very fast in NT that would be really difficult to emulate on Unix (probably because it wouldn't be fast on Unix):
To set this up you treat the sockets as file handles and use ReadFileEx() and WriteFileEx() with the lpCompletionRoutine parameter set to point to a function that the OS should call directly when the I/O is done. When you are blocked waiting for activity, put the thread in an alertable wait state using one of the WaitForXXXObjectEx() functions, and the completion routine you specified will be called by magic (actually via an Asynchronous Procedure Call or APC, but close enough to magic) when the I/O has finished.
This works very quickly on NT because it mirrors the way the underlying kernel and device driver stack works. Basically the I/O completion can come straight up from the driver routine into user space with a minimal delay and minimal number of context switches. The second advantage is you don't have to open event handles for every I/O you have outstanding, and so you don't run into the limit of waiting on 64 objects at a time.
The only drawback to this method (if you can call it a drawback) is that I/O that is initiated on one thread is always sent back to that thread, so you have to run one thread per CPU and round-robin them.
The closest thing on Unix to this sort of behaviour is signals, but signals and multithreaded code tend not to mix very well.
Just a FYI really, not saying it's good or bad compared to Unix - just another thing to have in your bag of tricks.
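If the control flow is hard to picture, here is a rough sketch of it as a portable toy model. To be clear, this is NOT Win32 code and calls nothing from the Windows API: `ApcQueueModel`, `start_io`, `complete_oldest` and `alertable_wait` are names I made up to stand in for "issue the ReadFileEx()/WriteFileEx()", "the driver finishes the request" and "the thread enters an alertable wait". The only point it tries to show is that completion routines are queued back to the thread that started the I/O and run only once that thread goes alertable.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for ONE thread's view of completion-routine I/O.
// All names here are hypothetical; nothing is taken from the Win32 headers.
class ApcQueueModel {
public:
    using Completion = std::function<void(std::size_t bytes)>;

    // Models issuing an overlapped read/write: remember the completion
    // routine and return immediately without blocking.
    void start_io(Completion routine) {
        assert(routine != nullptr);
        pending_.push_back(std::move(routine));
    }

    // Models the driver finishing the oldest outstanding request: the
    // routine is queued to the initiating thread, but not run yet.
    bool complete_oldest(std::size_t bytes) {
        if (pending_.empty()) return false;
        Completion routine = std::move(pending_.front());
        pending_.pop_front();
        queued_.push_back([routine, bytes] { routine(bytes); });
        return true;
    }

    // Models an alertable wait: every queued completion routine runs here,
    // on the caller's thread. Returns how many routines ran.
    std::size_t alertable_wait() {
        std::size_t ran = 0;
        while (!queued_.empty()) {
            std::function<void()> apc = std::move(queued_.front());
            queued_.pop_front();
            apc();
            ++ran;
        }
        return ran;
    }

private:
    std::deque<Completion> pending_;            // I/O issued, not yet finished
    std::deque<std::function<void()>> queued_;  // finished, waiting for an alertable wait
};
```

Note there is no per-request event handle anywhere in the model, which is the same reason the real thing never runs into the 64-object wait limit.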
I'm not sure. It looks like they've tried to use the same methods on 4 different operating systems. This is something that is doomed to failure in a benchmark situation as there are different programming paradigms for the different systems.
A much better benchmark would have been simply comparing IIS to Apache or Tux. Oh yeah. That's been done. Tux won. Hehe.
If you are memory bound, increasing the number of threads a CPU can run will help a little, but not significantly as sooner or later all instructions are going to be stalled waiting for memory. If you have a rich register set then you have the ability to run more instructions before you have to stall on memory loads.
Unless you have a well designed memory interface that can support multiple outstanding transactions, and sufficient registers to allow other threads to continue without accessing memory while one is blocked on memory access, you aren't going to be gaining the full performance from SMT.
Of course, I've been known to be wrong.
The method used here for programming Windows 2000 is almost certain to guarantee slow results. Assuming he's written his code to use select() or even WaitForSingleObject() then he's significantly slowing down the system.
If you want to write high performance socket applications on Windows you MUST use I/O completion ports (something this article failed to mention at all). Most high load applications I've written using sockets have shown a 50% to 100% improvement in throughput for the same CPU load when switching to I/O completion ports from a traditional (Unix-style) asynchronous I/O model.
I'm not saying in this case that Win2k would beat Linux, just that the tests were skewed by the author's inadequate knowledge of writing high performance code on Windows 2000.
One thing I've learned is if you can avoid threading then do. There are so many hidden ways you can cause race conditions, deadlocks and all sorts of other unforeseen dependencies in your code that it just isn't worth it.
Most libraries (including the standard C lib) aren't properly reentrant, and if they are it's because they put a big mutex around functions. Remember this when you call the functions - you may block!!
About the only good reasons I can think of for multithreading are:
(i) You are CPU bound and need to take advantage of SMP machines.
(ii) The OS or library you are using doesn't support async I/O so you need to block in a separate thread to handle multiple clients (cough..java..cough..).
(iii) The OS can only select() or WaitForMultipleObjects() on so many things at once and you need another thread to get around that limitation.
In any other case, avoid the temptation to multi-thread your code. It just isn't worth the pain of debugging those cases where one thread stomps all over the memory space of another thread, your application deadlocks in 3rd party libraries or you end up thrashing the context between 2 different threads so fast that more time is spent in the OS context switch code than in your application.
Multithreaded code is the last choice you should make when designing - it's an optimisation and not a core design philosophy in most cases.
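As a rough illustration of the single-threaded alternative, here is a minimal sketch of one thread watching several descriptors with select(). It assumes a POSIX system; `count_readable` and `demo_one_of_two_ready` are hypothetical helper names of mine, and the pipes are only there so the example has something to wait on. The FD_SETSIZE check is the limitation from reason (iii) above showing up in practice.

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: returns how many of the given descriptors become
// readable within timeout_ms, or -1 if select() itself fails.
int count_readable(const std::vector<int>& fds, int timeout_ms) {
    fd_set readable;
    FD_ZERO(&readable);
    int max_fd = -1;
    for (int fd : fds) {
        // select() can only watch descriptors below FD_SETSIZE.
        assert(fd >= 0 && fd < FD_SETSIZE);
        FD_SET(fd, &readable);
        if (fd > max_fd) max_fd = fd;
    }
    timeval timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;
    // With only a read set supplied, the return value is the number of
    // descriptors that are ready for reading.
    return select(max_fd + 1, &readable, nullptr, nullptr, &timeout);
}

// Usage sketch: two pipes, a single byte written to the second one.
// One thread sees which "client" has data without blocking on either.
int demo_one_of_two_ready() {
    int a[2];
    int b[2];
    if (pipe(a) != 0) return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    const char byte = 'x';
    int ready = -1;
    if (write(b[1], &byte, 1) == 1) {
        ready = count_readable({a[0], b[0]}, 100);
    }
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return ready;
}
```

In a real server the descriptors would be sockets and the loop would go on to read from whichever ones select() flagged, but the shape is the same: one thread, no locks, nothing to deadlock.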
If there are more threads than register sets, you have to do normal context switching. I think 4 threads is about the limit at the moment.
To my knowledge there are no CPUs available at the moment that do SMT. I'm not even sure if there are operating systems that support it (you need OS support to load the thread specific context of each CPU register set). The Alpha EV8 will probably be the first mainstream CPU to support it, though there were plenty of rumors that an upcoming revision of the P4 Xeon will support it as well.
It should be noted that SMT does nothing for you if the CPU is tied down in memory stalls, thus the x86 architecture is probably going to gain the least from this as it is very register starved and hence dumps things to memory all the time. Running more threads just increases the required memory bandwidth and so you need a very fast memory system (which the EV8 has) to keep up with everything.
The CPU has a separate set of registers, flags, segment tables (on x86) and so on for each thread that it is running simultaneously. When the instructions enter the pipelines they are tagged with bits to say which thread they are actually operating on and the execution units of the CPU use that set of registers and flags for the data required to run the instruction.
If you think of a four-way SMT CPU as having four sets of everything then you are starting to get the idea. For example on an x86 you would have four different CS:IP pairs, each one providing instructions for a different thread. Once those instructions are loaded and decoded the instructions in the pipelines are tagged with which CS:IP pair provided them. On execution if a register is referenced, then the register is loaded from the register set that corresponds to that tag (mov EAX,0 would affect the EAX #1 if it came from CS:IP #1, affect EAX #2 if it came from CS:IP #2 etc).
As there are implicitly no register dependencies between these instructions, any stalled instruction which is waiting for results from another execution unit does not hold up execution of instructions from other threads.
This model of execution speeds up performance considerably without the requirement of having multiple CPU cores on the one piece of silicon as you are getting far more efficiency from the one set of execution units.
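To make the tagging idea concrete, here is a toy model of it. This is nothing like real hardware and is only a sketch: `SmtCoreModel`, `TaggedMov` and the two-register file are invented for illustration. Each instruction carries the number of the thread that fetched it, and the "execution unit" uses that tag to pick which copy of the register set to touch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes for the sketch: a four-way SMT core, two registers.
constexpr std::size_t kThreads = 4;
constexpr std::size_t kRegs = 2;
enum Reg { EAX = 0, EBX = 1 };

// A decoded "mov reg, imm" tagged with the thread context that fetched it.
struct TaggedMov {
    std::size_t thread;
    Reg dst;
    std::uint32_t imm;
};

struct SmtCoreModel {
    // One complete register set per hardware thread, all zero at start.
    std::array<std::array<std::uint32_t, kRegs>, kThreads> regs{};

    // The shared execution unit: the tag selects the register set, so
    // instructions from different threads never touch each other's state.
    void execute(const TaggedMov& op) {
        assert(op.thread < kThreads);
        regs[op.thread][op.dst] = op.imm;
    }
};
```

Feeding it a mov to EAX tagged for thread 1 and another tagged for thread 2 leaves two different EAX values behind, which is the EAX #1 / EAX #2 picture described above.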
For more information, Paul DeMone had an article over at realworldtech a while back on the EV8 and how it worked. Take a look - it was quite interesting.
The Windows linker works pretty much the same way - an application always loads the DLL that it was originally bound against. If you change the interface you should change the version (ie filename) of the lib.
The real problem with Windows comes from the two simple facts:
(i) Windows developers got lazy and didn't put versions in their filenames.
(ii) Windows developers got lazy and put all their application DLLs on the path (ie system directory) instead of in the application directory.
Despite what other developers may lead you to believe, all versions of Windows NT (and I believe Windows 9x is the same) will load DLLs from the application directory first, regardless of whether there is another DLL of the same name already loaded. For a DLL to be "reused" in memory the following conditions have to be satisfied:
(i) The DLL must have the same full path name.
(ii) The DLL must be able to be loaded at the same logical address.
If the logical address is already used (probably by another DLL) then NT will simply dynamically load a new copy of the DLL at a new address. For those who don't understand why it has to be a new copy, look in any OS book about Dynamic Loading and "fixups".
Remember that: You can load two DLLs of the same name on any version of Win32. Many people still live in Win16 land where you couldn't do this, but it is definitely possible (try it and see).
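The two sharing conditions can be written down as a tiny decision function. This is only a sketch of the rule as I've described it, not how the NT loader is actually implemented: `MappedImage`, `find_shareable` and the example paths and addresses are all made up, and the path comparison is a plain exact string match for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of a DLL image that is already resident in memory.
struct MappedImage {
    std::string full_path;  // full path the image was loaded from
    std::uintptr_t base;    // address it was fixed up for
};

// Returns the index of a resident image the request can reuse, or -1 when
// a fresh copy has to be loaded (different path, or a different address
// because the preferred one is already taken in the requesting process).
int find_shareable(const std::vector<MappedImage>& resident,
                   const std::string& full_path,
                   std::uintptr_t base) {
    assert(!full_path.empty());
    for (std::size_t i = 0; i < resident.size(); ++i) {
        if (resident[i].full_path == full_path && resident[i].base == base) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

So a foo.dll in one application's directory and a foo.dll in another's never match on condition (i), and each application gets its own copy even though the file names are identical.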
Linux is NOT descending into DLL hell for the simple reason that it has a logical way of maintaining the separation between major versions of DLLs. As developers are being good on Linux, DLL hell won't exist the way it does on Windows.
SMT is where you have one processor core executing several threads at the same time without having to context switch. The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes. This improves throughput as you don't have the overhead of task switching and you also have a far better chance of keeping your pipes full.
Naturally, it requires OS support for it to work, but most CPU manufacturers are looking to go this way in the near future.
You've been telling developers the wrong thing. Windows 3.1 behaved this way, but NT has always loaded different DLLs for different applications. I've used that fact many times in my development life.
It's even simple to prove. Create two DLLs of the same name - one that prints "1" and one that prints "2". You can easily get two different apps to load the two different DLLs at the same time.
Basically, if you read NT Internals or even MSDN, you'd realize that the path of the DLL AND the relocation address have to be the same before the DLL can be shared. If these two aren't the same then you have two copies of a DLL with the same name in memory.
I'm not proposing it. Last I heard the French, Canadians and British all thought it was a fantastic idea though. After all, their logic goes something like: Australia sold them the uranium, so it's Australia's responsibility to take it back once they've made it toxic.
Personally I think if they want to use nuclear power then they should think of what to do with their own waste. I've nothing against nuclear power itself, just with countries that can't deal with the undesirable side-effects.
With the SEC investigating the occurance of Tech companies not reporting employee stock options as part of the company's liabilities, how much faith can we put in this statement when you look at the full quote:
"Adjusted" net income of $600,000 (up from a loss of $3.7m last year).
"Reported" net loss of $27.6m (from a loss of $17.4m last year).
If I'm correct, doesn't this mean that at the end of the day they are actually worse off than they were last year and just putting PR spin on the figure?
Given that we are talking business sales rather than home user sales (in terms of support contracts), you should be able to get your support call escalated to someone who knows what they are talking about fairly easily. Every time I've talked to Microsoft Support, I've gotten past the first operater very quickly and had someone from the specialist team call me back (we are a development house with about 25 people, not Dell). We've even had special hotfixes made just for us at times.
Even if you consider Microsoft's tech support fairly bad (which I disagree with), I wonder what sort of value you place on, say, 3dfx's support at the moment? Personally I prefer support from a company that exists - however bad - to support from a company that doesn't exist.
Yup. It's pretty hard to write kernel code to give you a root shell on either system. The way most script kiddies would do it is wait for someone to publish an exploit and then tinker with it using a hex editor to make it do what they wanted.
If you subvert the system call table on either system you can gain a root shell fairly quickly, even from a web server like Tux or IIS.
Actually, it makes sense to purchase from a company that has a significant chance of existing for the foreseeable future. The day MSFT goes Chapter 11 will probably come a long time after the current batch of Linux vendors are well and truly forgotten.
In a world where Linux vendors are selling support and not software, you have to be careful that your support contract is actually worth something. With the present financial climate, I'd place a lot more value on a MSFT support contract than a LNUX one. Nothing to do with the cost of the product - everything to do with the performance of the company.
Exactly how is Linux kernel mode any safer than NT kernel mode? Once I'm in kernel mode I can do absolutely anything, regardless of the CAP functions - just hack up the kernel memory space and make all sorts of function calls around them. Code running in a processor's priviledged mode is (by definition) trusted code.
Personally, I'd be suggesting that the NT kernel space may be marginally more effective given that the kernel can be paged in and out which makes writing kernel buffer overrun stuff slightly more error prone (BSOD).
It comes down to the fact that if someone can run arbitrary code in kernel mode, they can relatively easily (at a minimum) switch out of protected mode, format all your hard drives, erase any tapes that may be in the machine (via BIOS calls) and basically render your machine worthless. This is WITHOUT calling any kernel functions!!
Of course, if you want to be more subtle, just rewrite the syscall interface table to some hooks that subvert the security of the system and let it appear to continue running normally while allowing you access to stuff you shouldn't (like passwords of users that logon to the site and other fun things).
Which is why I always code these sort of loops as:
...
...
Just to avoid problems on either compiler.int i;
for (i = 0; i < x; i++) {
}
for (i = 0; i < x2; i++) {
}
Just to avoid problems on either compiler.
The frame buffer is part of main memory - you set up how much main memory to use as the frame buffer in your BIOS.
Sure it has an "AGP 6x" connection, but this doesn't help a lot when you are pumping all the data to the RAMDAC through that connection as well.
Another interesting problem is the issue of memory bandwidth. There is some concern, rightfully, about available system memory bandwidth when using the integrated graphics core. In theory, the CPU should always be able to grab memory bandwidth first. When using the built-in IGP graphics, the memory streaming tests came in about 15-20% lower than when a GeForce2 Ultra was dropped into the AGP slot. This is possibly related to their memory arbitration logic still being enabled and consuming cycles even when nothing else is truly vying with the CPU for memory access.Another interesting problem is the issue of memory bandwidth. There is some concern, rightfully, about available system memory bandwidth when using the integrated graphics core. In theory, the CPU should always be able to grab memory bandwidth first. When using the built-in IGP graphics, the memory streaming tests came in about 15-20% lower than when a GeForce2 Ultra was dropped into the AGP slot. This is possibly related to their memory arbitration logic still being enabled and consuming cycles even when nothing else is truly vying with the CPU for memory access.
Exactly how does the reviewer think the picture is getting on the screen if the RAMDAC isn't accessing main memory? Even suggesting that the CPU should have higher priority than the RAMDAC is either ignorance or stupidity. Of course the RAMDAC has to have priority or you are going to end up with big blank patches on the screen!!
Dinkumware, of course (well, actually the one that ships with MSVC). The painful thing is even though we have the source it is so damn hard to read that any changes we make couldn't accurately be ported through upgrades.
;-)
We considered going to SGI, but ended up just working around the bugs when we saw the SGI implementation of std::string wasn't ref counted at all (performance loss bigtime). Hopefully some of them will be figured out by the time VC.NET ships.
Is there any other way to program an MVS/390 if you are pathologically allergic to COBOL?
Interesting paper. It's certainly a novel way of using SMT to combat the memory/internal clock disparity. Possibly even more effective on x86 where you really want a lot of stuff in L1 because you hit memory all the time!
select() is a pig of a system call (IMHO). Replacing it is probably not a good idea though - you'll break compatibility all over the place. I believe this is discussed every so often in the Linux kernel and improvements are slowly being made here and there with single thread wakeups and the like.
:-(
There's plenty of evidence that Unix is every bit as fast as NT with it's API. My point is that just because NT supports the BSD sockets API doesn't mean you should use it in high performance code. Personally I prefer using ReadFileEx() and WriteFileEx() on NT for socket (and file, named pipe and everything else) I/O. If you do it properly you don't have to wait for any events at all - the system just calls your completion routine when it's finished all by itself (kinda like a signal).
Winsock 2.x is good now that socket handles are full blooded file handles. Damn shame you still can't pass them as stdin or stdout to a child process though.
In case you didn't notice, NT is not Unix, never has been Unix and never will be Unix. There are so many design differences in the underlying system that it is hard to believe you are even suggesting it's a good idea to use a single code base.
I've seen plenty of code that has a single source for Unix and NT, but NONE of it is high performance and most of it behaves very strangely on NT when you compare it to a properly written NT service or application.
If you are writing high performance code then you are almost certainly writing for a particular system and have to write the code for that system. Writing for NT is different to writing for Unix (I prefer writing for NT personally but that's another issue) and trying to say that it's lock-in is just stating the obvious.
By your argument Linux is deliberately encouraging the use of non-portable code through applications like Tux which only work on Linux boxes and not Windows, or even other Unixes.
It's just daft.
Actually, my experience is mainly coming from Windows.
There's all sorts of hidden gotchas waiting for you with multithreaded code that you'll only find when you put it on a dual processor machine. Take for example the entire STL library will screw up in multithreaded code. std::string is reference counted, but not in a thread-safe manner so you get nasty bugs turning up about one in a million times; std::map has a great big static mutex (no, not a critical section, a mutex) around the entire class so actions on one map (especially a foreach algorithm) will block out all other maps in your process; deque has threadsafe bugs as well so you have to be very careful there.
If you write yourself a nice thread class that is independant of your worker classes, you won't have coupling between different operations on the same thread (assuming they aren't long lived) so the tying there is virtually non-existant.
The "thread safety" of the MS runtime library is a real performance killer. Every memory operation has a critical section around it that holds out every other thread while you are doing a new or delete.
Basically I'm saying that contrary to popular belief there is a very good chance that a multithreaded app will be slower, less stable and much harder to maintain than a similar single threaded app.
True. There's another way that's also very fast in NT that would be really difficult to emulate on Unix (probably because it wouldn't be fast on Unix):
To set this up you treat the sockets as file handles and use ReadFileEx() and WriteFileEx() with the lpCompletionRoutine parameter set to point to a function that the OS should call directly when the I/O is done. When you are blocked waiting for activity, put the thread in an alertable wait state using *WaitForXXXObjectEx() function and the completion routine you specified will be called by magic (actually via an Asynchronous Procedure Call or APC, but close enough to magic) when the I/O has finished.
This works very quickly on NT because it mirrors the way the underlying kernel and device driver stack works. Basically the I/O completion can come straight up from the driver routine into user space with a minimal delay and minimal number of context switches. The second advantage is you don't have to open event handles for every I/O you have outstanding, and so you don't run into the limit of waiting on 64 objects at a time.
The only drawback to this method (if you can call it a drawback) is that I/O that is initiated on one thread is always sent back to that thread so you have to run one thread per CPU and round robin them
The closest thing on Unix to this sort of behaviour is signals, but signals and multithreaded code tend not to mix very well.
Just a FYI really, not saying it's good or bad compared to Unix - just another thing to have in your bag of tricks.
I'm not sure. It looks like they've tried to use the same methods on 4 different operating systems. This is something that is doomed to failure in a benchmark situation as there are different programming paradigms for the different systems.
A much better benchmark would have been simply comparing IIS to Apache or Tux. Oh yeah. That's been done. Tux won. Hehe.
If you are memory bound, increasing the number of threads a CPU can run will help a little, but not significantly as sooner or later all instructions are going to be stalled waiting for memory. If you have a rich register set then you have the ability to run more instructions before you have to stall on memory loads.
Unless you have a well designed memory interface that can support multiple outstanding transactions, and sufficient registers to allow other threads to continue without accessing memory while one is blocked on memory access you aren't going to be gaining the full performance from SMT.
Of course, I've been know to be wrong.
The method used here for programming Windows 2000 is almost certain to guarantee slow results. Assuming he's written his code to use select() or even WaitForSingleObject(), then he's significantly slowing down the system.
If you want to write high performance socket applications on Windows you MUST use I/O completion ports (something this article failed to mention at all). Most high load applications I've written using sockets have shown a 50% to 100% improvement in throughput for the same CPU load when switching to I/O completion ports from a traditional (Unix-style) asynchronous I/O model.
I'm not saying in this case that Win2k would beat Linux, just that the tests were skewed by the author's inadequate knowledge of writing high performance code on Windows 2000.
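For anyone who hasn't met completion ports, the threading pattern can be modelled roughly as below. This is only a toy sketch in Python, not Win32 code: a queue.Queue stands in for the port, fake_async_read is a made-up stand-in for an overlapped read finishing, and the worker count of 2 is arbitrary. Real code would go through CreateIoCompletionPort and GetQueuedCompletionStatus instead.

```python
import queue
import threading

# Toy model: the queue plays the role of the completion port.
port = queue.Queue()
results = []
results_lock = threading.Lock()
STOP = object()  # sentinel telling a worker to exit


def worker():
    """Block on the single port and handle whichever operation completes next."""
    while True:
        packet = port.get()  # one wait point, no per-operation event handle
        if packet is STOP:
            break
        key, nbytes = packet
        with results_lock:
            results.append((key, nbytes))


def fake_async_read(key, nbytes):
    """Hypothetical stand-in: pretend an async read finished and post its completion."""
    port.put((key, nbytes))


# A small fixed pool of workers services every outstanding operation.
workers = [threading.Thread(target=worker) for _ in range(2)]
for t in workers:
    t.start()

for key in range(100):
    fake_async_read(key, 512)

for _ in workers:
    port.put(STOP)
for t in workers:
    t.join()

print(len(results), sum(n for _, n in results))
```

The point of the shape is that 100 outstanding operations need only one wait point and a handful of threads, rather than one handle (or one select() slot) per operation.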
One thing I've learned is if you can avoid threading then do. There are so many hidden ways you can cause race conditions, deadlocks and all sorts of other unforeseen dependencies in your code that it just isn't worth it.
Most libraries (including the standard C lib) aren't properly reentrant, and if they are it's because they put a big mutex around functions. Remember this when you call the functions - you may block!!
About the only good reasons I can think of for multithreading are:
(i) You are CPU bound and need to take advantage of SMP machines.
(ii) The OS or library you are using doesn't support async I/O so you need to block in a separate thread to handle multiple clients (cough..java..cough..).
(iii) The OS can only select() or WaitForMultipleObjects() on so many things at once and you need another thread to get around that limitation.
In any other case, avoid the temptation to multi-thread your code. It just isn't worth the pain of debugging those cases where one thread stomps all over the memory space of another thread, your application deadlocks in 3rd party libraries or you end up thrashing the context between 2 different threads so fast that more time is spent in the OS context switch code than in your application.
Multithreaded code is the last choice you should make when designing - it's an optimisation and not a core design philosophy in most cases.
If there are more threads than register sets, you have to do normal context switching. I think 4 threads is about the limit at the moment.
To my knowledge there are no CPUs available at the moment that do SMT. I'm not even sure if there are operating systems that support it (you need OS support to load the thread specific context of each CPU register set). The Alpha EV8 will probably be the first mainstream CPU to support it, though there were plenty of rumors that an upcoming revision of the P4 Xeon will support it as well.
It should be noted that SMT does nothing for you if the CPU is tied down in memory stalls, thus the x86 architecture is probably going to gain the least from this as it is very register starved and hence dumps things to memory all the time. Running more threads just increases the required memory bandwidth and so you need a very fast memory system (which the EV8 has) to keep up with everything.
The CPU has a separate set of registers, flags, segment tables (on x86) and so on for each thread that it is running simultaneously. When the instructions enter the pipelines they are tagged with bits to say which thread they are actually operating on and the execution units of the CPU use that set of registers and flags for the data required to run the instruction.
If you think of a four-way SMT CPU as having four sets of everything then you are starting to get the idea. For example on an x86 you would have four different CS:IP pairs, each one providing instructions for a different thread. Once those instructions are loaded and decoded the instructions in the pipelines are tagged with which CS:IP pair provided them. On execution if a register is referenced, then the register is loaded from the register set that corresponds to that tag (mov EAX,0 would affect the EAX #1 if it came from CS:IP #1, affect EAX #2 if it came from CS:IP #2 etc).
As there are implicitly no register dependencies between these instructions, any stalled instruction which is waiting for results from another execution unit does not hold up execution of instructions from other threads.
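The tagging scheme described above can be modelled in a few lines. This is a toy sketch only, with made-up names (ThreadContext, TaggedInstr, execute) and just two registers and two opcodes; real SMT hardware is far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """One architectural register set; the core holds one per hardware thread."""
    regs: dict = field(default_factory=lambda: {"EAX": 0, "EBX": 0})


@dataclass
class TaggedInstr:
    tag: int  # which CS:IP pair (thread context) fetched this instruction
    op: str   # "mov" or "add" in this sketch
    dst: str
    imm: int


def execute(contexts, pipeline):
    """Run each instruction against the register set selected by its tag."""
    for ins in pipeline:
        regs = contexts[ins.tag].regs
        if ins.op == "mov":
            regs[ins.dst] = ins.imm
        elif ins.op == "add":
            regs[ins.dst] += ins.imm
        else:
            raise ValueError(f"unknown op {ins.op!r}")


# Four-way SMT: four register sets sharing one set of execution units.
contexts = [ThreadContext() for _ in range(4)]
pipeline = [
    TaggedInstr(0, "mov", "EAX", 10),  # affects EAX #0 only
    TaggedInstr(1, "mov", "EAX", 20),  # affects EAX #1 only
    TaggedInstr(0, "add", "EAX", 1),
    TaggedInstr(3, "mov", "EBX", 7),
]
execute(contexts, pipeline)
print([c.regs for c in contexts])
```

Instructions from different tags never touch the same register set, which is why one thread's stall need not hold up the others.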
This model of execution speeds up performance considerably without the requirement of having multiple CPU cores on the one piece of silicon as you are getting far more efficiency from the one set of execution units.
For more information, Paul DeMone had an article over at realworldtech a while back on the EV8 and how it worked. Take a look - it was quite interesting.
I wasn't sure at the time that it worked on Win9x as well. I just tried it. It does.
So, the REAL problem with DLL Hell is simply that people are installing DLLs into the system directory instead of with their applications.
The Windows linker works pretty much the same way - an application always loads the DLL that it was originally bound against. If you change the interface you should change the version (ie filename) of the lib.
The real problem with Windows comes from the two simple facts:
(i) Windows developers got lazy and didn't put versions in their filenames.
(ii) Windows developers got lazy and put all their application DLLs on the path (ie system directory) instead of in the application directory.
Despite what other developers may lead you to believe, all versions of Windows NT (and I believe Windows 9x is the same) will load DLLs from the application directory first, regardless of whether there is another DLL of the same name already loaded. For a DLL to be "reused" in memory the following conditions have to be satisfied:
(i) The DLL must have the same full path name.
(ii) The DLL must be able to be loaded at the same logical address.
If the logical address is already used (probably by another DLL) then NT will simply dynamically load a new copy of the DLL at a new address. For those who don't understand why it has to be a new copy, look in any OS book about Dynamic Loading and "fixups".
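The two sharing conditions can be written down as a small model. This is a hypothetical sketch, not how the NT loader is actually implemented: ToyLoader, the example paths, the base addresses and the 0x10000 relocation step are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    path: str  # full path of the DLL on disk
    base: int  # address the image ended up loaded at


class ToyLoader:
    """Hypothetical model of the rule: share only if path AND address match."""

    def __init__(self):
        self.images = []  # distinct in-memory copies of DLL images

    def load(self, path, preferred_base, used_bases):
        """Return (mapping, shared); used_bases are addresses taken in the caller."""
        base = preferred_base
        while base in used_bases:
            base += 0x10000  # assumed relocation step: try another address
        wanted = Mapping(path.lower(), base)
        if wanted in self.images:
            return wanted, True   # same full path, same address: reuse the copy
        self.images.append(wanted)
        return wanted, False      # otherwise a fresh copy is loaded


loader = ToyLoader()
_, shared_a = loader.load(r"C:\AppOne\foo.dll", 0x10000000, set())
_, shared_b = loader.load(r"C:\AppTwo\foo.dll", 0x10000000, set())
_, shared_c = loader.load(r"C:\AppOne\foo.dll", 0x10000000, set())
_, shared_d = loader.load(r"C:\AppOne\foo.dll", 0x10000000, {0x10000000})
print(shared_a, shared_b, shared_c, shared_d)  # False False True False
```

Only the third load is shared: the second has the same filename but a different full path, and the fourth has to be relocated, so each of those gets its own copy.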
Remember that: You can load two DLLs of the same name on any version of Win32. Many people still live in Win16 land where you couldn't do this, but it is definitely possible (try it and see).
Linux is NOT descending into DLL hell for the simple reason that it has a logical way of maintaining the separation between major versions of DLLs. As developers are being good on Linux, DLL hell won't exist the way it does on Windows.
SMT is where you have one processor core executing several threads at the same time without having to context switch. The CPU maintains state (registers and flags etc.) for each thread and can execute instructions from each thread simultaneously down different pipes. This improves throughput as you don't have the overhead of task switching and you also have a far better chance of keeping your pipes full.
Naturally, it requires OS support for it to work, but most CPU manufacturers are looking to go this way in the near future.
You've been telling developers the wrong thing. Windows 3.1 behaved this way, but NT has always loaded different DLLs for different applications. I've used that fact many times in my development life.
It's even simple to prove. Create two DLLs of the same name - one that prints "1" and one that prints "2". You can easily get two different apps to load the two different DLLs at the same time.
Basically, if you read NT Internals or even MSDN, you'd realize that the path of the DLL AND the relocation address have to be the same before the DLL can be shared. If these two aren't the same then you have two copies of a DLL with the same name in memory.
I'm not proposing it. Last I heard the French, Canadians and British all thought it was a fantastic idea though. After all, their logic goes something like Australia sold them the Uranium so it's Australia's responsibility to take it back once they've made it toxic.
Personally I think if they want to use nuclear power then they should think of what to do with their own waste. I've nothing against nuclear power itself, just with countries that can't deal with the undesirable side-effects.