BigTux Shows Linux Scales To 64-Way
An anonymous reader writes "HP has been demonstrating a Superdome server running the Stream and HPL benchmarks, which shows that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."
What parallel-computing activity doesn't involve intermittent activity by a single processor? You have to spawn the parallel job somehow, and typically that starts as a single process. Is the implication here that compiling is pipelined, but linking is a single-CPU job?
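The effect being asked about is usually described with Amdahl's law: if some fraction of a job has to run on one processor, that fraction limits the speedup no matter how many CPUs you add. Here is a minimal sketch of that arithmetic. The serial fractions below are made-up illustrations, not measurements from the HP demonstration or from any real kernel build.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Ideal speedup on n_procs CPUs when serial_fraction of the work
    must run on a single processor (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


# Hypothetical serial fractions, chosen only to show the trend.
for s in (0.0, 0.01, 0.05, 0.10):
    print(f"serial part {s:.0%}: 64-way speedup {amdahl_speedup(s, 64):.1f}x")
```

Even a small serial step (a final link, say) costs a large share of the ideal 64x, which fits the summary's explanation for why the kernel compile scaled less well than Stream or HPL.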
If you mod me down, I shall become more powerful than you can possibly imagine.
SGI
Unisys
Fujitsu
HP
It looks like there might actually be a competitive marketplace for scalable multiprocessor Linux systems real soon now (if not already).
The answers have to do with fine-grained locking of kernel services. Splitting one big lock into many smaller ones reduces resource contention between processors, because any given lock is less likely to be held at a given moment. The other approach is to design interfaces that don't require locking of kernel structures at all.
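To make the fine-grained locking idea concrete, here is a rough user-space sketch of lock striping. This is not how the Linux kernel implements it; `StripedCounter`, the stripe count, and the `inode-N` keys are all hypothetical names invented for illustration. Python's global interpreter lock also means this shows the structure, not a real speedup.

```python
import threading


class StripedCounter:
    """Hypothetical counter table guarded by several locks instead of one."""

    def __init__(self, n_stripes: int = 16):
        # One lock and one private table per stripe.
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._tables = [dict() for _ in range(n_stripes)]

    def _stripe(self, key) -> int:
        # Each key always maps to the same stripe within a process.
        return hash(key) % len(self._locks)

    def increment(self, key) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # only this stripe is blocked
            table = self._tables[i]
            table[key] = table.get(key, 0) + 1

    def get(self, key) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def worker(counter: StripedCounter, keys, rounds: int) -> None:
    for _ in range(rounds):
        for k in keys:
            counter.increment(k)


counter = StripedCounter(n_stripes=16)
keys = [f"inode-{i}" for i in range(32)]  # made-up resource names
threads = [
    threading.Thread(target=worker, args=(counter, keys, 1000))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(counter.get(k) for k in keys)
print(total)
```

Two threads touching keys in different stripes never wait on each other, whereas with a single global lock every update would serialize.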
At any rate, Amazon successfully powers their backend database with Linux/IA64 running on HP servers. YMMV, but if it's good for what most would consider the preeminent online merchant, it's probably good enough for you too.
Correct, AFAIK the biggest Windows 2003 Datacenter installs are on Unisys ES7000s, and those only support 32-way Windows partitions. The box can hold 64 Xeons, so I would say that Unisys isn't comfortable with the scalability of Windows to the full system size; otherwise they'd be shouting it from the rooftops.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
In general, people use clusters of single or dual-processor systems, because many problems demand lots of hauling of data but relatively little communication between processors. For example, ray-tracing involves a lot of processor churning, but the only I/O is getting the information in at the start, and the image out at the end.
Databases are OK for this, so long as the data is relatively static (so you can do a lot of caching on the separate nodes and don't have to access a central disk much).
A 64-way superscalar system, though, is another thing altogether. Here, we're talking about some complex synchronization issues, but also the ability to handle much faster inter-processor I/O. Two processors can "talk" to each other much more efficiently than two Ethernet devices. Far fewer layers to go through, for a start.
Not a lot of problems need that kind of performance. The ability to throw small amounts of data around extremely fast would most likely be used by a company looking at fluid dynamics (say, a car or aircraft manufacturer) because of the sheer number of calculations needed, or by someone who needed the answer NOW (fly-by-wire systems, for example, where any delay could result in a nice crater in the ground).
The problem is, most manufacturers out there already have plenty of computing power, and the only fly-by-wire systems that would need this much computing power would need military-grade or space-grade electronics, and there simply aren't any superscalar electronics at that kind of level. At least, not that the NSA is admitting to.
So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Looks like someone was up to those challenges, eh? 64-processor support *and* 64-bit support. Awesome news.
I have no special gift, I am only passionately curious. --Albert Einstein
Take this with a grain of salt, because I was part of the group that developed the chipset for the first Superdome systems (PA-RISC). I'm probably a little biased.
A 64-way Superdome system is spread across sixteen plug-in system boards. (Imagine two refrigerators next to each other; it really is that big.) A partition is made up of one or more system boards. Within a partition, each processor has all of the installed memory in its address space. The chipset handled the details of getting cache blocks back and forth among the system boards.
That's a huge amount of memory to have by direct access. Access is pretty fast, too.
Still, they were doubtless pretty expensive. HP-UX didn't allow for on-the-fly changes to partitions, but the chipset supports it. (The OS always lagged a bit behind. We built a chip to allow going above 64-way, but the OS just couldn't support it. A moral victory.) Perhaps Linux could get that support in place a little more quickly....
The United States of America: We mean well.
Smaller, say 4 or 8 way NUMA boards, that are within the means of the average geek?
I'm not talking about mere mortal SMP systems, I want all the crazy memory partitioning and whatnot.
I don't need no instructions to know how to rock!!!!
Linux scaling to 512 processors: http://www.sgi.com/features/2004/oct/columbia/
The story should be HP has finally caught up to where SGI were 2 years ago.
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
The problem is that most resources (memory, the bus, disks, etc.) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better off clustering than using SMP, so that each processor has its own bus, memory, etc.
Are you a cluster salesman by chance?
A "big iron" system like one of these has exactly the same CPU-memory ratio as any cluster box - they are COMMODITY CPUs, you put 2-4 of them per bus in these big systems just as you put 2-4 of them on a bus in each box of a cluster. And each of these buses has a chunk of memory located off that bus right next to those CPUs, and an interface to IO as well. So your implication that clusters are somehow "faster" because nothing is shared is ludicrous - one of these big boxes can do exactly the same thing.
The difference between a cluster and a big iron setup like these is "What happens when I need to get to memory/other CPUs/disk that is not local to the CPU?"
And that's where clusters suck. While a big, single-image system can have a processor on its own bus with its own memory and disk just as well as a cluster can, when a cluster needs to get at non-local stuff, it has to spend micro to milliseconds pushing those transactions through a few network layers out onto a slow physical net where they then have to be readdressed once they arrive at the remote system and accepted and interpreted by that operating system. In one of these big systems, remote resources look exactly like local resources, except for access time, which instead of taking micro or milliseconds, takes nanoseconds.
And this isn't new either; supercomputers have been doing this since the '80s. How you figure multiple CPUs running separate OSes over Ethernet is faster than multiple CPUs running under the same OS on a NUMA architecture is beyond me.
If you were to read more about Superdome, you would find that each set of 2 or 4 processors has its own memory and PCI I/O bus, comprising what is called a "cell".
The memory and I/O devices in a cell are accessible to all the other cells via an interconnect. The speed, latency, and bandwidth vary based on how "distant" the destination cell is from the source cell, but it is still much faster than most clusters.
My kernel only goes up to 11.
How'd you get a three processor system? Is it a quad board, discounted heavily because one socket was broken? That'd be neat, where'd you get it?
Infuriate left and right
This is the real question, which is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than on how effectively kernel services operate on n CPUs.
Absolutely. This is why we should be wary of claims that have been made (and posted on Slashdot recently) that Linux 'scales to 512 or 1024 processors' (as in some SGI machines). This size of machine is only effective for very specialised software. A report that the kernel scales well to 64 processors is far more believable, and is a sign of the increasing quality of Linux.