Why Faster CPUs? What About SMP?
Codeine asks: "As we press harder and harder against the physical limitations of speed, why do CPU manufacturers continue with the costly faster single processor model, instead of focussing on multi-processor designs? The new IBM Blue Gene seems to be acknowledging that more/simpler processors is the way to go (very like non-AI, millions of neurons). Why aren't we seeing commoditisation of SMP?"
What is ACPI? I'm thinking about adding another processor to my machine, but if this is something important, I may just get a faster cpu.
the good ground has been paved over by suicidal maniacs
True, mostly, but not always. Some high end compilers can do certain optimizations that will allow certain types of algorithms to run faster on a multiprocessor machine. For example, consider this loop:
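(The loop itself didn't survive in this copy of the post. As a stand-in, here is a minimal C sketch of the kind of loop being described, assuming a plain element-wise array sum. The name add_arrays and its parameters are made up for illustration.)

```c
#include <stddef.h>

/* Hypothetical example: each c[i] depends only on a[i] and b[i],
 * never on the result of any other iteration. */
void add_arrays(const double *a, const double *b, double *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}
```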
Successive iterations of this loop aren't dependent on previous iterations, so in theory, if you had n processors, you could do all n iterations in parallel at once. But of course, most algorithms aren't this parallelizable. For example:
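(Again the original example is missing here. A rough C sketch of a loop with a dependency carried from one iteration to the next, assuming a simple nonlinear recurrence; the name fill_recurrence and the parameter r are invented for illustration.)

```c
#include <stddef.h>

/* Hypothetical example: x[i] is computed from x[i-1], so iteration i
 * cannot start until iteration i-1 has finished. x[0] is the seed. */
void fill_recurrence(double *x, size_t n, double r)
{
    for (size_t i = 1; i < n; i++) {
        x[i] = r * x[i - 1] * (1.0 - x[i - 1]);
    }
}
```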
And of course this is the answer to the poster's question. While some problems may be highly parallelizable, others aren't - each step of the algorithm may depend on the results of the previous step. In this case, throwing more processors at it does you zero good. It's like if you're driving across the country and you decide to take 100 of your friends, all in their own cars, to try to get there faster. It's still going to take the same amount of time.
Say hello to zMac.
The only reason I compile in APM anyway is to get the machine to turn off by itself. Since that's not very often, it's not a big deal if it doesn't work in SMP mode.
Thanks.
Here's the basic reasons SMP hasn't taken off: The general public doesn't get it and there's no pressing reason to teach them.
When Joe Sixpack tunes in QVC and sees a "700 MHz computer" that's easy to compare to a "350 MHz computer"--it's twice as fast. But what do you make of a "dual 400 MHz"? Is that 800?
Once Moore's Law starts pooping out on us we'll see many more multi-processor machines and then Joe will start to understand.
--
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
Yeah, I know. If you'd checked other replies to my post before you posted, you'd note that I corrected myself. /. doesn't support post-post editing, unfortunately.
Hey, I'm not saying Joe needs SMP -- I'm also not saying he doesn't. But considering how cheap previous-generation chips get, buying an SMP motherboard with just a single processor, and then upgrading later, allows for almost twice the upgradability (processor speed wise) later.
Although really, IMO, most people would be served just fine by a NeXTStation (33MHz 68040 and 64MB RAM, say).
As for few people needing SMP, consider this: if chip speeds were still in the 100-200MHz range, and CPU vendors had put more effort into improving ISA rather than pumping up speed (and providing incremental improvements in CPU technology), we could be running 4 CPU systems. And since 100-200MHz processors would still be in abundance, it would probably mean more cheap computing for everyone. The stratification of MHz has resulted in there not being any single CPU produced in sufficient quantities to spread computers faster.
--Matthew
Ummm... Apple switched to USB at least as much because Macs were losing the peripheral war -- there wasn't ADB, DB-9 serial, or SCSI on most PCs, and that constituted 95% of Apple peripherals. However, for quite some time PCs had been coming with USB, whether manufacturers were supporting it or not. So, suddenly, they only had one mostly incompatible interface, but unlike ADB, SCSI, or DB-9 serial, it was intended to go on all PCs. Now, any new PC comes with USB.
Now Macs also have Firewire/IEEE 1394 and AirPort/IEEE 802.11, two Apple technologies making their way into PCs. Just a few days ago, I saw a pretty new Compaq system at Radio Shack with Firewire, and now Carnegie Mellon is installing 11Mbps wireless networking (you know, 802.11?) on campus. So, to recap: standard technology on Powermacs is: USB (on PCs too), Firewire (on PCs too), Airport (coming close to standard on laptops), and, now, SMP.
Take a look at your average Windows PC (yes, just pretend Windows9x could benefit from SMP for the sake of my argument). Now, take a look at all the little icons in the tray, and the desktop, and the taskbar itself, and then, finally, the one application our Hero, Joe Sixpack, is running. Suddenly, Joe doesn't need much CPU for any particular reason, but keeping all the little processes happy while he loads some Microsoft bloatware, and having a snappy system requires a good bit of CPU. Voila! Joe Sixpack could benefit from 2 200MHz CPUs, rather than 1 400-500MHz CPU.
I'll say it again: Apple seems to be leading the PC pack. Now that Apple has put out SMP machines, labelled them "fit for general consumption," and then gone off on how cool they are, I am quite confident that PC manufacturers will follow suit.
--Matthew
I wouldn't be so quick to blame everything on "badly written" drivers if I were you. Applications are fairly simple, locking-wise. They have one entry point and full control over when new threads enter or exit. Apps that benefit at all from SMP usually do so trivially; anything that's a pain in the ass to handle in parallel just gets a huge mutex slapped around it, and apps rarely need to hold two locks at once. For drivers, it's very different. Drivers have multiple entry points, any of which can generally be invoked at any time even when something else is already going on. Single-threading requests is generally not an option for performance reasons. Drivers tend to develop deeper locking hierarchies and more complex locking behaviors than almost any app, so it's no surprise that locking errors - race conditions, deadlocks, etc. - are so common. Yes, a driver that has such errors in it is still broken, but it may still be "better written" than the trivial SMP code app writers can get away with.
It may not be the driver's fault, anyway. The OS itself may have SMP problems that get triggered by specific perfectly-legal driver behavior. For example:
I've seen this kind of crap happen on a dozen OSes, in cases where the driver had every right to call that OS function under those conditions but the OS screwed up. I've seen cases where the OS-provided synchronization facilities had subtle bugs (usually SMP-specific bugs) that caused starvation or missed wakeups under some conditions. Drivers are hard to write under the best of conditions, and when they have to be written while avoiding all of the OS bugs it's sometimes amazing that they ever work at all.
Slashdot - News for Herds. Stuff that Splatters.
There are two main reasons SMP isn't more pervasive:
The first of these is pretty self-explanatory. I'll try to expand a little on the second.
Multiprocessor (MP) hardware is a lot more complex than uniprocessor (UP) hardware, with extra latency in the memory subsystem to deal with potential cache issues - even if no sharing is occurring at that particular moment. Code running on multiple processors needs to do locking, and the locking itself can be pretty costly (especially since it uses bus-saturating interlocked memory instructions). This is why running an MP kernel on a single processor is even slower than a UP kernel. Lastly, not all code parallelizes well; much of it contains major sequential dependencies. In the end, all of the extra work that's done to make MP behave correctly may end up costing more than it's worth even for small numbers of processors.
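To make the locking cost a bit more concrete, here is a rough C11 sketch contrasting a UP-style increment with an MP-safe one, assuming a single shared counter. The names are invented for illustration and none of this is taken from a real kernel.

```c
#include <stdatomic.h>

/* UP-style increment: an ordinary read-modify-write. Only safe if
 * nothing else can touch the counter at the same time. */
static long up_counter = 0;

long up_increment(void)
{
    up_counter = up_counter + 1;
    return up_counter;
}

/* MP-safe increment: an atomic read-modify-write. On x86, compilers
 * typically emit a LOCK-prefixed instruction for this, which is the
 * kind of interlocked memory operation described above. */
static atomic_long mp_counter = 0;

long mp_increment(void)
{
    /* atomic_fetch_add returns the value held before the addition */
    return atomic_fetch_add(&mp_counter, 1) + 1;
}
```

Both functions return the same values when called from one thread; the MP version pays for the interlocked operation whether or not a second CPU is ever present.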
As the number of processors increases, all of these effects increase exponentially. The memory system starts to get pretty hideously expensive, cache warming and memory locality issues become more complex as efforts are made to reduce the strain on the memory system, and all the while it becomes harder and harder to keep all of the CPUs busy enough to make the whole thing worthwhile...and this is even for a mere couple of dozen processors.
When you're looking at something like Blue Gene, look not at the amount of CPU power involved but at the incredible memory/communications bandwidth - multiple communicating processors on a single chip, multiple chips on a board, boards arranged into modules, etc. The key to Blue Gene is that they have this phenomenal bandwidth coupled with a specialized application which is almost uniquely able to take advantage of how the memory/communications system is structured.
True. I've been using an SMP machine for three years now, and it's painful to go back to a single-processor machine. However, I think that there are several reasons for hardware vendors (in particular, x86 vendors) not releasing SMP machines:
1. Drivers
SMP causes a lot of badly-written drivers to fail, although they might work reasonably well under a single CPU.
2. Cost
SMP on x86 requires more expensive motherboards, a larger-capacity power supply, and overall better quality of components, all of which costs more (not to mention the cost of the second CPU itself).
3. Competition
x86 vendors have to keep their prices down in order to be competitive, and with the current "MHz = Better speed" idea firmly implanted in the minds of most people, it's going to be harder selling a dual-CPU 700MHz system (for example) if there are 800MHz single-CPU systems available.
4. Lack of OS support
Like it or not, the majority of users are still stuck on Win95/98, neither of which supports SMP. WinNT/2000 does, but how many computers for home use are sold with those installed?
5. Bad architecture
The x86 platform's SMP, quite frankly, sucks. A lousy bus/cache architecture means that you won't get 2x the performance you would from a single CPU for any application which hits main memory a lot.
6. Difficulty of programming for SMP
If you want to get the benefits of SMP from within a single application, you basically have to use threads, which are a real pain to debug properly.
That's all I can think of off the top of my head...
FreeBSD can run ACPI because its SMP is poor. FreeBSD (note that 5.x will probably change this) uses the big giant lock method of getting at the hardware. Thus when you access hardware on one CPU, the other CPU is stopped. Generally this is bad, but it means that ACPI works - the system looks like a single processor to ACPI.
I love FreeBSD, and have run it in SMP since the pre-3.x days.
First of all, to some extent SMP is being commoditized -- Apple, for instance, is now selling SMP as being a simple one-step upgrade from UP in their PowerMac G4s. Apple is also the computer vendor that brought us widespread use of USB, the focus on industrial design as a consideration when buying computers, etc. Expect other vendors to follow that lead, insofar as they can load operating systems that can take advantage of SMP.
Microsoft should probably be credited with holding systems back to single processors with Win9x/ME, and yes, even WinNT. With NT, IIRC, processes, not threads, were spread across processors -- so you saw very little benefit running a single, multi-threaded app on an SMP system. I would hope W2K does something more reasonable, as in something that virtually every other SMP implementation does (notably, except MacOS pre-X), and spread threads across processors.
Finally, in the x86 arena, only Intel can support SMP currently -- and AMD has been providing a much better price/performance ratio for some time, and is even generally ahead in performance right now. That makes it more difficult to justify going with lower-performing, more expensive processors to increase performance, although of course the difference between dual 800MHz P3's and a single 1.1GHz Athlon should be quite noticeable if you're running a well-threaded application (or lots and lots of processes).
All that is for PC systems (including Macs as Personal Computers, if not Wintel PeeCees :). For other architectures (alpha, sparc/ultrasparc, MIPS, PA-RISC for instance), SMP is alive and well. SGI's highest-end workstations-that-could-be-servers, Octanes and Octane2s, support two processors, and their servers support a lot of processors. Sun has SMP workstations and ridiculously SMP servers as well; I've seen a lot of SMP alpha motherboards, but since alphas are almost as commodity as PCs I haven't checked out what sorts of systems [c|o|m|p|a|q] sells. Hewlett-Packard also sells SMP workstations and servers, but my experience with them is with the old HP 9000/7xx series that are largely, if not completely, uniprocessor.
--Matthew
SMP is not always faster. If you are running two completely independent CPU-bound programs, then SMP is faster, but then why not have two computers? As soon as your threads need to interact, SMP slows down. Depending on your algorithm, this might or might not be a big deal.
Or to put it another way, the best SMP code will in the general case be slower on a 2-CPU system than the same program on one processor that is twice as fast (i.e. an SMP program for two P3-500s will run slower than a single-processor-only program for one P3-1000), thanks to cache coherency issues and the like. Of course, two P3-500s might be cheaper by enough to make it worthwhile.
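One common way to put numbers on this is Amdahl's law, which the post doesn't name but which fits the argument: if a fraction p of the work parallelizes perfectly across n CPUs, the ideal speedup is 1 / ((1 - p) + p / n). A small C sketch, assuming no cache coherency or locking overhead at all (real SMP only does worse); the function name is made up for illustration.

```c
/* Amdahl's law: ideal speedup on n CPUs when a fraction p (0.0 to 1.0)
 * of the work can run in parallel. Ignores coherency and locking costs. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

With p = 0.5, two CPUs come out at roughly 1.33x, whereas a single CPU that is twice as fast gets 2x on everything. Only at p = 1.0 do the two CPUs match it.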
Massive multi-CPU machines like the IBM Blue Gene you reference are never SMP. SMP machines generally have multiple CPUs sharing bus, RAM and I/O, as well as everything else a uniprocessor machine has, and therefore encounter all kinds of inefficiencies as you scale up. In practice, 4 CPUs seems to be the "sweet spot" for SMP - after this you start running into the law of diminishing returns hard. You can, to an extent, code around this, by doing things like making the locks in your kernel more and more fine-grained, but this adds a lot of unnecessary overhead for machines with small numbers of CPUs and also makes the code orders-of-magnitude more complex (and hence unmanageable). "Supercomputers" generally are built with large numbers of nodes (generally with 1-4 CPUs each) that could (in theory) operate independently, with a very high speed, low latency interconnect lashing them together (really they are glorified clusters). This seems to be the future for high-end Unix machines: both Compaq's Wildfire (now shipping) and IBM's Regatta (coming soon) systems will feature a "cluster in a single box" type of architecture.
--
http://gammatron.weblogger.com
...There are also some particular problems that simply do not convert to parallel processing very well if at all. Some say that running win9x is the largest problem of this type. ;-) So the time required to solve the problem depends upon the time that the fastest cpu can run through the code. The only way to get an answer quicker is to build a faster cpu.
An alternative that is being explored is finding a different, parallelizable algorithm that can solve the same problem, but that research also requires work and resources. If you spent those resources on building a faster cpu, then not only would you have a faster solution to your particular problem, but also a faster solution to other problems as well.
So...in most cases the best choice as to where to spend those resources is to spend them building a faster cpu.
Good judgement comes from experience, and experience comes from bad judgement.
- W. Wriston, former Citibank CEO