BigTux Shows Linux Scales To 64-Way

Posted by timothy on Tuesday January 18, 2005 @03:28PM from the can't-let-you-do-that-dave dept.

An anonymous reader writes "HP has been demonstrating a Superdome server running the Stream and HPL benchmarks, which shows that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."

18 of 247 comments

Pardon my ignorance, but... by wizard_of_wor · 2005-01-18 15:34 · Score: 3, Interesting

What parallel-computing activity doesn't involve intermittent activity by a single processor? You have to spawn the parallel job somehow, and typically that starts as a single process. Is the implication here that compiling is pipelined, but linking is a single-CPU job?

--
If you mod me down, I shall become more powerful than you can possibly imagine.
1. Re:Pardon my ignorance, but... by drmarcj · 2005-01-18 15:39 · Score: 2, Interesting
  
  I could imagine an SMP job where you immediately spawn N new processes, each of which computes a certain subset of a given dataset. Assuming you never collected the results at the end (say, you just write out the results to files on disk for later analysis), you would technically never need inter-process communication, thus no serial processing by a single "master" process. But yes, you're right. You almost never do this in parallel processing, and in that sense the post is misleading in assuming there is anything but a theoretical possibility of no overhead in an SMP.
2. Re:Pardon my ignorance, but... by bluGill · 2005-01-18 15:46 · Score: 2, Interesting
  
  As a simple question you are correct that every parallel computing job has some single processing parts. Those who study parallel systems spend most of their time looking for ways to make sure that all processors are in use. Often an algorithm that is less than optimal for single processor systems can use more processors, so a choice needs to be made.
  The other major issue is communication time. An algorithm that depends on all the CPUs talking all the time may appear fast on paper, but it will be slower than the single processor version!
  In short, you came really close to the point, while missing it.
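
The point these replies circle around is usually stated as Amdahl's law (not named in the thread): if a fraction s of a job is serial, the speedup on n processors is at most 1 / (s + (1 - s) / n). A minimal sketch in Python, where the function name and the 5% serial fraction are illustrative assumptions and not measurements of a kernel build:

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot be parallelised."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # 0.05 is a made-up serial fraction, chosen only for illustration.
    for n in (1, 8, 64):
        print(f"{n:>2} CPUs: at most {amdahl_speedup(0.05, n):.1f}x")
```

Under that assumed 5% serial share, 64 processors give at most roughly a 15x speedup, which is why a job with intermittent single-processor phases can look poor on a machine that otherwise scales.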
Hrmm by Nailer · 2005-01-18 15:38 · Score: 4, Interesting

SGI
Unisys
Fujitsu
HP

It looks like there might actually be a competitive marketplace for scalable multiprocessor Linux systems real soon now (if not already).
1. Re:Hrmm by chthon · 2005-01-18 19:34 · Score: 3, Interesting
  
  This is about an unmodified 2.6 kernel.
  I have the articles at home (Linux Journal) about the SGI systems. First they do measurements on their systems, and then patch the bottlenecks in the kernel.
  I don't think these patches can easily be put into a standard kernel.
Re:So this time.. by ikewillis · 2005-01-18 15:43 · Score: 5, Interesting

This is the real question which is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than how effectively kernel services operate on n CPUs.
The answers have to do with fine-grained locking of kernel services: splitting one coarse lock into many smaller ones reduces resource contention between processors, because any given lock is less likely to be held at a given time. The alternative is designing interfaces that don't require locking of kernel structures at all.
At any rate, Amazon successfully powers their backend database with Linux/IA64 running on HP servers. YMMV, but if it's good for what most would consider the preeminent online merchant, it's probably good enough for you too.
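
The fine-grained locking idea in this comment can be sketched at toy scale. The class below is a hypothetical illustration in Python, not how the Linux kernel implements its locks: instead of one global lock, each key is guarded by one of several locks, so threads touching unrelated keys usually do not wait on each other. The names `StripedCounters` and `hammer` are assumptions made up for this example.

```python
import threading


class StripedCounters:
    """Counters guarded by several locks instead of one global lock.

    Hypothetical illustration only; the kernel's real locking is far
    more involved than this.
    """

    def __init__(self, stripes: int = 16) -> None:
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._tables = [{} for _ in range(stripes)]

    def _stripe(self, key: str) -> int:
        # Keys that land on different stripes never contend for the same lock.
        return hash(key) % len(self._locks)

    def increment(self, key: str) -> None:
        i = self._stripe(key)
        with self._locks[i]:
            self._tables[i][key] = self._tables[i].get(key, 0) + 1

    def get(self, key: str) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(counters: StripedCounters, keys: list, rounds: int) -> None:
    """Increment every key `rounds` times; meant to be run from several threads."""
    for _ in range(rounds):
        for key in keys:
            counters.increment(key)
```

CPython threads do not run bytecode in parallel, so this shows the structure of the technique and its correctness under concurrent use, not the speedup a kernel would see on 64 CPUs.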
Re:A little factoid for you by afidel · 2005-01-18 16:08 · Score: 3, Interesting

Correct, AFAIK the biggest Windows 2003 Datacenter installs are on Unisys ES7000s and those only support 32-way Windows partitions. The box can hold 64 Xeons so I would say that Unisys isn't comfortable with the scalability of Windows to the full system size, otherwise they'd be shouting it from the rooftops.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:excuse my ignorance by jd · 2005-01-18 16:16 · Score: 4, Interesting

A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking. (On a 64-way system, any given CPU can only have control of a given resource 1/64th of the time. Unless this is handled extremely well, this is Bad News.)

In general, people use clusters of single or dual-processor systems, because many problems demand lots of hauling of data but relatively little communication between processors. For example, ray-tracing involves a lot of processor churning, but the only I/O is getting the information in at the start, and the image out at the end.

Databases are OK for this, so long as the data is relatively static (so you can do a lot of caching on the separate nodes and don't have to access a central disk much).

A 64-way superscalar system, though, is another thing altogether. Here, we're talking about some complex synchronization issues, but also the ability to handle much faster inter-processor I/O. Two processors can "talk" to each other much more efficiently than two ethernet devices. Far fewer layers to go through, for a start.

Not a lot of problems need that kind of performance. The ability to throw small amounts of data around extremely fast would most likely be used by a company looking at fluid dynamics (say, a car or aircraft manufacturer) because of the sheer number of calculations needed, or by someone who needed the answer NOW (fly-by-wire systems, for example, where any delay could result in a nice crater in the ground).

The problem is, most manufacturers out there already have plenty of computing power, and the only fly-by-wire systems that would need this much computing power would need military-grade or space-grade electronics, and there simply aren't any superscalar electronics at that kind of level. At least, not that the NSA is admitting to.

So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Interesting. Almost exactly a year ago... by gnunick · 2005-01-18 16:58 · Score: 3, Interesting

IBM packs 64 Xeons into a single server (Jan 15, 2004)
"[CTO of IBM's xSeries server group Tom Bradicich] acknowledges that there are challenges in producing such a large system -- including building support into Windows and Linux, neither of which are suited for 64-processor systems today"

Looks like someone was up to those challenges, eh? 64-processor support *and* 64-bit support. Awesome news.

--
I have no special gift, I am only passionately curious. --Albert Einstein
Re:excuse my ignorance by Sir+Nimrod · 2005-01-18 16:59 · Score: 4, Interesting

Take this with a grain of salt, because I was part of the group that developed the chipset for the first Superdome systems (PA-RISC). I'm probably a little biased.

A 64-way Superdome system is spread across sixteen plug-in system boards. (Imagine two refrigerators next to each other; it really is that big.) A partition is made up of one or more system boards. Within a partition, each processor has all of the installed memory in its address space. The chipset handled the details of getting cache blocks back and forth among the system boards.

That's a huge amount of memory to have by direct access. Access is pretty fast, too.

Still, they were doubtless pretty expensive. HP-UX didn't allow for on-the-fly changes to partitions, but the chipset supports it. (The OS always lagged a bit behind. We built a chip to allow going above 64-way, but the OS just couldn't support it. A moral victory.) Perhaps Linux could get that support in place a little more quickly....

--
The United States of America: We mean well.
Will we ever see by stratjakt · 2005-01-18 17:04 · Score: 3, Interesting

Smaller, say 4 or 8 way NUMA boards, that are within the means of the average geek?

I'm not talking about mere mortal SMP systems, I want all the crazy memory partitioning and whatnot.

--
I don't need no instructions to know how to rock!!!!
1. Re:Will we ever see by SunFan · 2005-01-18 17:40 · Score: 2, Interesting
  
  8-way multicore chips will be available within a year. Not exactly NUMA, but they'll probably have other nuances to keep you entertained.
  
  --
  -- Microsoft is the most expensive commodity operating system and office suite vendor in the marketplace.
Read my lips by Chatz · 2005-01-18 17:48 · Score: 5, Interesting

Linux scaling to 512 processors:
http://www.sgi.com/features/2004/oct/columbia/

The story should be HP has finally caught up to where SGI were 2 years ago.

--
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
1. Re:Read my lips by hackstraw · 2005-01-19 00:03 · Score: 2, Interesting
  
  I've heard through the grapevine that the mods to the linux kernel have stability issues.
  
  I am someone who might be in the market for an SGI Altix or XD1, but a very parallel broken box does not scale that well in my opinion.
Re:Interesting. by Anonymous Coward · 2005-01-18 18:41 · Score: 2, Interesting

The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.

Are you a cluster salesman by chance?

A "big iron" system like one of these has exactly the same CPU-memory ratio as any cluster box - they are COMMODITY CPUs, you put 2-4 of them per bus in these big systems just as you put 2-4 of them on a bus in each box of a cluster. And each of these buses has a chunk of memory located off that bus right next to those CPUs, and an interface to IO as well. So your implication that clusters are somehow "faster" because nothing is shared is ludicrous - one of these big boxes can do exactly the same thing.

The difference between a cluster and a big iron setup like these is "What happens when I need to get to memory/other CPUs/disk that is not local to the CPU?"

And that's where clusters suck. While a big, single-image system can have a processor on its own bus with its own memory and disk just as well as a cluster can, when a cluster needs to get at non-local stuff, it has to spend micro to milliseconds pushing those transactions through a few network layers out onto a slow physical net where they then have to be readdressed once they arrive at the remote system and accepted and interpreted by that operating system. In one of these big systems, remote resources look exactly like local resources, except for access time, which instead of taking micro or milliseconds, takes nanoseconds.

And this isn't new either: supercomputers have been doing this since the 80's. How you figure multiple CPUs running separate OS's over ethernet is faster than multiple CPUs running under the same OS on a NUMA architecture is beyond me.
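
The argument above comes down to arithmetic on access times. A rough sketch in Python, where every latency figure is an assumed round number picked for illustration (100 ns local, 500 ns remote within one machine, 50 microseconds over a network) and not a measurement of any Superdome or cluster:

```python
def mean_access_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Average cost of one access when `remote_fraction` of accesses are non-local."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    # All three latencies below are assumptions, not benchmark results.
    LOCAL_NS = 100.0
    NUMA_REMOTE_NS = 500.0
    CLUSTER_REMOTE_NS = 50_000.0
    remote_fraction = 0.1  # assume 10% of accesses leave the local node

    numa = mean_access_ns(LOCAL_NS, NUMA_REMOTE_NS, remote_fraction)
    cluster = mean_access_ns(LOCAL_NS, CLUSTER_REMOTE_NS, remote_fraction)
    print(f"single-image NUMA: about {numa:.0f} ns per access")
    print(f"cluster:           about {cluster:.0f} ns per access")
```

With those assumed figures the average comes to about 140 ns on the single-image machine and about 5,090 ns on the cluster. When the remote fraction is zero the two are identical, which matches the comment's point that the difference only appears once a job needs non-local resources.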
Re:Interesting. by ptbarnett · 2005-01-18 19:12 · Score: 2, Interesting

The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.
If you were to read more about Superdome, you would find that each set of 2 or 4 processors has its own memory and PCI I/O bus, comprising what is called a "cell".
The memory and I/O devices in a cell are accessible to all the other cells via an interconnect. The speed, latency, and bandwidth vary based on how "distant" the destination cell is from the source cell, but it is still much faster than most clusters.
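
On current Linux systems, the "distance" between memory nodes that this comment describes is exposed through sysfs, in the `distance` file under each `/sys/devices/system/node/nodeN` directory. The sketch below reads those files; the helper names are made up for this example, and nothing here is specific to Superdome or taken from the article. The values are relative costs, where by convention 10 means local.

```python
from pathlib import Path


def parse_distance_line(text: str) -> list:
    """Turn a line such as '10 21' into [10, 21]."""
    return [int(field) for field in text.split()]


def read_numa_distances(root: str = "/sys/devices/system/node") -> dict:
    """Map node number -> list of relative distances to every node.

    Returns an empty dict if the directory is missing, for example on a
    non-Linux system or a kernel built without NUMA support.
    """
    distances = {}
    base = Path(root)
    if not base.is_dir():
        return distances
    for node_dir in sorted(base.glob("node[0-9]*")):
        distance_file = node_dir / "distance"
        if distance_file.is_file():
            node_id = int(node_dir.name[len("node"):])
            distances[node_id] = parse_distance_line(distance_file.read_text())
    return distances


if __name__ == "__main__":
    for node, row in read_numa_distances().items():
        print(f"node {node}: {row}")
```

On a single-socket desktop this typically prints one node with a single local entry; a multi-cell machine would show larger numbers for the more distant nodes.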
11?!? by A+nonymous+Coward · 2005-01-18 19:52 · Score: 2, Interesting

My kernel only goes up to 11.

How'd you get a three processor system? Is it a quad board, discounted heavily because one socket was broken? That'd be neat, where'd you get it?

--
Infuriate left and right
Re:So this time.. by Decaff · 2005-01-18 21:43 · Score: 2, Interesting

This is the real question which is oft ignored. There is far too great an emphasis on being able to manage n CPUs rather than how effectively kernel services operate on n CPUs.

Absolutely. This is why we should be wary of claims that have been made (and posted on Slashdot recently) that Linux 'scales to 512 or 1024 processors' (as in some SGI machines). This size of machine is only effective for very specialised software. A report that the kernel scales well to 64 processors is far more believable, and is a sign of the increasing quality of Linux.