BigTux Shows Linux Scales To 64-Way
An anonymous reader writes "HP has been demonstrating a Superdome server running the Stream and HPL benchmarks, which shows that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."
The answers have to do with fine-grained locking of kernel services. Either you spread contention across many locks, in the hope that fewer processors will be waiting on any one lock at a given time, or you design interfaces that don't require locking kernel structures at all.
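For anyone who wants to see the idea rather than read about it, here is a minimal user-space sketch of lock striping. It is not kernel code, and `StripedCounter`, `num_stripes` and `hammer` are names made up for this illustration. Each key hashes to one of several locks, so threads touching different stripes don't wait on each other.

```python
import threading


class StripedCounter:
    """Per-key counters guarded by several locks instead of one global lock."""

    def __init__(self, num_stripes=16):
        # One lock and one private table per stripe (hypothetical layout).
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._tables = [{} for _ in range(num_stripes)]

    def _stripe(self, key):
        # A key always maps to the same stripe, so one lock covers it.
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is held, not the whole table
            self._tables[i][key] = self._tables[i].get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(counter, keys, rounds):
    """Worker: bump every key `rounds` times."""
    for _ in range(rounds):
        for k in keys:
            counter.increment(k)


if __name__ == "__main__":
    counter = StripedCounter(num_stripes=8)
    workers = [
        threading.Thread(target=hammer, args=(counter, range(8), 1000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.get(0))
```

With a single global lock every increment would serialize; with stripes, only increments that land on the same stripe do. Python's interpreter lock hides most of the speed benefit here, so treat this as a picture of the structure, not a benchmark.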
At any rate, Amazon successfully powers their backend database with Linux/IA64 running on HP servers. YMMV, but if it's good for what most would consider the preeminent online merchant, it's probably good enough for you too.
While FreeBSD is a great OS/kernel, it doesn't scale as well as Linux, end of story.
Huh? What smoke are you craking? Here is the comparison of MS's latest and greatest Windows 2003 server editions. So, umm, where is this double of what Linux supports? Plain vanilla Linux 2.6 can do 64-way no problem. Actually, SGI has had a single-image 128-way Linux system out for a while. They should have a 256-way, single-image Linux system out soon. That is more than MS can even touch. Maybe do some research before you just shoot off FUD.
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison
Looking at the literature, Linux and Unix in general seem to be designed to keep processes as lightweight as possible. OTOH, Windows processes are a little heavier and take longer to start up.
Then, OTOH, Windows threads are very lightweight compared to the equivalent thread model in Linux. Benchmarks have shown that in multi-process setups, Unix is heavily favored, but in multi-threaded setups Windows comes out on top.
When it comes to multi-processors, is there a theoretical advantage to using processes vs threads? Leaving out the Windows vs Linux debate for a second, how would an OS that implemented very efficient threads compare to one that implemented very efficient processes?
Would there be a difference?
Never mind Linux for a moment, I'm just amazed that 64 Itanium 2's have actually been sold...
The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.
No, you have a misconception. On these REAL big iron systems, each CPU (or each few CPUs) does have its own busses, memory, and io busses.
So in that regard it is as good as a cluster, but then add the fact that they have a global, cache-coherent shared memory and interconnects that shame any cluster.
The only advantage of a cluster is cost. Actually redundancy plays a role too, although less so with proper servers, as they have redundancy built in, and you can partition off the system to run multiple operating systems too.
To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.
Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.
Not really. Check the world's second fastest supercomputer. It is a cluster of 20 512-way IA64 systems running Linux.
Someone wasn't awake when their Comp Sci class covered Amdahl's Law. Or the Dining Philosophers Problem. Or vector processing. Or networking. Or the parallelization problem. Or...
Actually, the troll can be made to serve a useful purpose, because there are probably a lot of people who read Slashdot who didn't do Comp Sci.
Part of the problem with parallelization is that not all problems can be divided up that way. If one man takes 60 seconds to dig a posthole, how long would it take 60 men to dig a single posthole? Answer - 60 seconds. Exactly the same amount of time is spent, because only one person can be digging the posthole at a time. Having more people doesn't help.
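The posthole example is the extreme case of Amdahl's Law, where none of the work can be split up. A rough sketch of the general formula follows; the function name `amdahl_speedup` and the 95% figure in the usage lines are just illustrations, not numbers from the HP tests.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Best-case speedup when only part of a job can run in parallel.

    parallel_fraction: share of the work that can be split up (0.0 to 1.0).
    processors: number of workers sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many workers exist.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # One posthole, 60 men: nothing is parallel, so no speedup at all.
    print(amdahl_speedup(0.0, 60))
    # Hypothetical job that is 95% parallel, run on 64 processors.
    print(amdahl_speedup(0.95, 64))
```

Even a small serial share caps the gain: at 95% parallel, 64 processors give well under a 20x speedup, which fits the story's remark about kernel compiles being held back by intermittent serial steps.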
Another part of the problem is sharing resources. Let's say you have some computer memory that can respond to a read operation in one clock cycle. Let's also say that the computer program never reads data from memory. (Very unlikely.) The first processor fetches an instruction (which is a read operation) and then executes it. The second processor can't do anything while the first one is reading, so has to wait until it has finished with that part, before it can do a read of its own.
If the instruction takes 1 clock cycle to execute, then the first processor will be ready after the second one has performed its fetch. In which case, you will be running the memory flat-out with just 2 processors. Any more than that, and the system will actually slow down, because the processors will have to wait.
Likewise, if the average time to run an instruction is N clock cycles, you will (on average) be able to have N+1 processors, before the memory is maxed out.
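The N+1 rule can be checked with a toy model. It assumes exactly what the paragraphs above assume: one shared memory that serves one fetch per clock cycle, one cycle per fetch, and no caches. The function names are made up for the illustration.

```python
def saturation_point(exec_cycles):
    """Processors needed to keep the shared memory busy every cycle.

    Each instruction costs 1 fetch cycle plus exec_cycles of execution,
    so one processor uses the memory 1 cycle out of every exec_cycles + 1.
    """
    return exec_cycles + 1


def memory_utilization(processors, exec_cycles):
    """Fraction of cycles the shared memory is busy (capped at 1.0)."""
    demand = processors / (exec_cycles + 1)
    return min(1.0, demand)


def per_processor_throughput(processors, exec_cycles):
    """Instructions per cycle that each processor actually completes."""
    peak = 1.0 / (exec_cycles + 1)          # one processor, no contention
    total = min(processors * peak, 1.0)     # memory serves 1 fetch per cycle
    return total / processors


if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, memory_utilization(p, 1), per_processor_throughput(p, 1))
```

With one-cycle instructions the memory is fully busy at 2 processors. In this model total throughput stays flat beyond that point, so each extra processor just lowers the share every processor gets.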
In practice, processors run about an order of magnitude faster than RAM, which is why modern systems have lots of L1 and L2 cache (and sometimes L3), pipelining, etc. These are all tricks to try and access the somewhat slower main memory as little as possible.
Also in practice, programmers try to avoid "expensive" (in terms of clock cycles) operations because you can generally get the same results faster by other means. (That's why RISC technology became popular - make the fast operations faster, rather than adding stuff that people will try to avoid.)
In consequence, sharing resources is a very difficult problem. It is not the only problem that many-way systems face, though. If you have N processors, there are N! possible ways for those processors to communicate. In this case, it would be 64! (64x63x62x...x2x1), which is a horribly large number. You couldn't have one link per pathway, for example, which means you've got to share links, which means you've got to have some damn good scheduling and routing mechanisms. Even then, with limited resources, you can only have so many processors talking at a time, before you are overwhelmed. Which means that "chatty" problems will involve a lot of processors spending a lot of time simply waiting for their turn to chat.
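To put rough numbers on this, here is a small sketch using only Python's standard `math` module. The factorial counts the possible orderings of the processors, while the number of distinct processor pairs, which is what a dedicated link per pair would need, is N(N-1)/2. The function names are illustrative.

```python
import math


def orderings(n):
    """Number of ways to order n processors: n x (n-1) x ... x 2 x 1."""
    return math.factorial(n)


def pairwise_links(n):
    """Dedicated point-to-point links needed to connect every pair once."""
    return math.comb(n, 2)  # same as n * (n - 1) // 2


if __name__ == "__main__":
    n = 64
    print(pairwise_links(n))
    print(len(str(orderings(n))), "digits in the factorial")
```

For 64 processors the pair count comes to 2016, which is already a lot of dedicated wiring per board, and the factorial is astronomically larger. Either way the conclusion about sharing links and needing good routing holds.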
(This goes back to why people generally build clusters, rather than many-way SMP systems, and why high-end clusters use the fastest networking technology on the planet. Clustering is easy. Getting the communication speeds up is the problem. Getting communication speeds to the point of being useful for scientific applications is a very complex, expensive problem. Which is the main reason Mr. Cray charged more than Mr. Dell for his computers - and why people would pay it.)
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
This is an unmodified stock 2.6 kernel (well it's patched with stuff that's in distros, and will be in the next kernel). Out of the box, it detected the NUMA set up, memory partitions, the whole bit.
The SGI boxes are nothing like the stock kernel.
I don't need no instructions to know how to rock!!!!
Linux scaling to 512 processors:
http://www.sgi.com/features/2004/oct/columbia/
The story should be HP has finally caught up to where SGI were 2 years ago.
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower