BigTux Shows Linux Scales To 64-Way
An anonymous reader writes "HP has been demonstrating a Superdome server running the Stream and HPL benchmarks, which shows that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."
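The kernel-compile result in the summary is the pattern Amdahl's law describes: any part of a job that must run on one processor caps the overall speedup. Here is a minimal sketch of that formula. The serial fractions are purely illustrative assumptions, not figures from HP's demonstration.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Ideal speedup on `processors` CPUs when `serial_fraction` of the
    work cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical serial fractions, chosen only to show the trend.
    for serial in (0.0, 0.05, 0.20):
        speedup = amdahl_speedup(serial, 64)
        print(f"serial fraction {serial:.2f}: {speedup:.1f}x on 64 CPUs")
```

Under these assumptions, even a small serial portion keeps a 64-way machine well short of a 64x speedup, which would fit a build that intermittently waits on a single processor.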
I know Linux is pretty good from a security sense (compared to Windows, at least), and I'm not surprised to find it operates on exotic setups, but are there that many programs out there that support such a setup? Or ones that will actually benefit from this many processors? Or is the point of this system to develop custom business software for their own use? Or is it for a data server of some sort that can benefit from multiple cores answering requests?
lol: You see no door there!
While FreeBSD is a great OS/kernel, it doesn't scale as well as Linux, end of story.
Huh? What are you smoking? Here is the comparison of MS's latest and greatest Windows 2003 server editions. So, umm, where is this double of what Linux supports? Plain vanilla Linux 2.6 can do 64-way no problem. Actually, SGI has had a single-image 128-way Linux system out for a while. They should have a 256-way, single-image Linux system out soon. That is more than MS can even touch. Maybe do some research before you just shoot off FUD.
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison
If it can scale to 16 procs well, it will scale to 64 procs well.
Until you start talking about double that amount of procs, which is what Windows Server does these days
Wrong. Windows Server 2003 supports a maximum of only 64 processors, and I believe it was significantly tested only on 32-way and smaller machines.
NASA's Columbia cluster: 512-way SGI machines running Linux (actually 20 of them...). Not to mention "Columbia's record results were achieved running the LINPACK benchmark on 8,192 of the NASA supercomputer's 10,240 processors. Columbia also achieved an 88 percent efficiency rating on the LINPACK benchmark, the highest efficiency rating ever attained in a LINPACK test on large systems." from http://www.sgi.com/company_info/newsroom/press_releases/2004/october/worlds_fastest.html
The nice thing about processes is that they do not share memory. As such, the processes will be localized, as will all their memory accesses. OTOH, if you had just ONE big process loaded with nothing but threads, you would likely find the memory backplane going into high gear as data got moved around a bit.
The problem is that most resources (memory, the bus, disks, etc) can only be used by one CPU at a time. So, for problems which are resource-intensive, you're generally better to cluster than to use SMP, so that each processor has its own bus, memory, etc.
No, you have a misconception. On these REAL big iron systems, each CPU (or each few CPUs) does have its own busses, memory, and io busses.
So in that regard it is as good as a cluster, but then add the fact that they have a global, cache-coherent shared memory and interconnects that shame any cluster.
The only advantage of a cluster is cost. Actually redundancy plays a role too, although less so with proper servers, as they have redundancy built in, and you can partition off the system to run multiple operating systems too.
To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.
Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.
Not really. Check the world's second fastest supercomputer. It is a cluster of 20 512-way IA64 systems running Linux.
Yes. From the link:
Brooks and his team instead pointed to Kalpana, an Intel® Itanium® 2-based, 512-processor SGI® Altix® 3000 system in use at NASA Ames since November 2003 and named to honor Kalpana Chawla, a NASA scientist lost in the Columbia accident. In less than six months, Taft says, the Kalpana system - the first 512-processor Linux® system ever to operate under a single Linux kernel - had revolutionized the rate of scientific discovery at NASA for a number of disciplines. On NASA's previous supercomputers, simulations showing five years' worth of changes in ocean temperatures and sea levels were taking 12 months to model. But on the SGI® Altix® system, scientists could simulate decades of ocean circulation in just days, while producing simulations in greater detail than ever before. And the time required to assess flight characteristics of an aircraft design, which involves thousands of complex calculations, dropped from years to a single day. "That kind of leap is incredible," says Taft. "What took a year on the best computing technology previously available, we could now accomplish in days on the Altix system."
This is an unmodified stock 2.6 kernel (well, it's patched with stuff that's in distros and will be in the next kernel). Out of the box, it detected the NUMA setup, memory partitions, the whole bit.
The SGI boxes are nothing like the stock kernel.
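For anyone who wants to see which NUMA layout their own kernel detected, here is a rough sketch that lists the nodes exposed through sysfs. It assumes a reasonably modern Linux with `/sys/devices/system/node/nodeN/cpulist`; the layout on the 2.6 kernels discussed here may have differed, and the function names are just made up for illustration.

```python
import os
import re

# Assumed sysfs location for NUMA topology on current Linux kernels.
DEFAULT_NODE_ROOT = "/sys/devices/system/node"


def list_numa_nodes(sysfs_root=DEFAULT_NODE_ROOT):
    """Return sorted NUMA node IDs found under sysfs_root, or [] if none."""
    try:
        entries = os.listdir(sysfs_root)
    except OSError:
        # Not Linux, no sysfs, or the kernel exposes no NUMA information.
        return []
    node_ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            node_ids.append(int(match.group(1)))
    return sorted(node_ids)


def read_cpulist(node_id, sysfs_root=DEFAULT_NODE_ROOT):
    """Return the CPU list string for one node (e.g. "0-3"), or None."""
    path = os.path.join(sysfs_root, f"node{node_id}", "cpulist")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    nodes = list_numa_nodes()
    if not nodes:
        print("No NUMA nodes exposed under", DEFAULT_NODE_ROOT)
    for node in nodes:
        print(f"node{node}: CPUs {read_cpulist(node)}")
```

On an ordinary desktop this will usually print a single node; a big NUMA box would show one entry per node.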
I don't need no instructions to know how to rock!!!!
Oh, and if you think the latest implementation of Linux threads is slower, especially slower than MS Windows, you are an idiot. Here are some tests from IBM. Current Linux threads were spawning at more than 10,000 PER SECOND while MS Windows was spawning barely 6,000. Linux Thread performance, scroll down to the "pretty" graphs. Oh, and these numbers are higher than Solaris. Linux threads and Linux processes spawn _MUCH_ faster than the best MS has to offer and faster than Solaris.
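If you want a feel for what a "threads spawned per second" figure means, here is a rough sketch that times creating and joining empty threads. It is not the IBM benchmark mentioned above, and Python's interpreter overhead will dominate the result, so treat the number as a sanity check on your own box rather than something comparable to the figures quoted. The function name and default count are assumptions for illustration.

```python
import threading
import time


def measure_thread_spawn_rate(count=2000):
    """Create, start, and join `count` no-op threads one after another
    and return the observed rate in threads per second."""
    def noop():
        pass

    start = time.perf_counter()
    for _ in range(count):
        worker = threading.Thread(target=noop)
        worker.start()
        worker.join()  # wait so only one extra thread exists at a time
    elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    rate = measure_thread_spawn_rate()
    print(f"about {rate:,.0f} threads created and joined per second")
```

Joining each thread before starting the next keeps the test about creation cost rather than scheduling many threads at once.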
LinuxThreads is the old implementation still used in vanilla 2.4. It wasn't entirely POSIX compliant I believe (but very close).
IBM worked on their own threading implementation for Linux (NGPT) that was 2 times the speed of LinuxThreads. Then NPTL was developed, which was 4 times the speed of IBM's implementation.
I believe the link you provided has the benchmarks for IBM's implementation (but I'm not sure, I merely skimmed through).
Anyway, here are some good links on NPTL and NGPT:
http://kerneltrap.org/node/429?PHPSESSID=d
http://kerneltrap.org/no
Have you used any _recent_ Linux thread? LinuxThreads is an implementation of the Posix 1003.1c thread package.
Dude, get with the times, LinuxThreads are obsolete. Kernel 2.6 / glibc 2.3 use NPTL, which launches new threads four times faster than LinuxThreads, allows you to have more than 8192 threads per process, doesn't require you to have lots of manager threads that don't do anything useful, delivers signals to threads as opposed to processes, and is actually more-or-less POSIX compliant.
I've been using NPTL on my workstation for 12 months, and I haven't looked back (except when early versions of Mono were incompatible with NPTL). You talk about "any _recent_ Linux thread" - but it looks like you are using a Debian Woody...
Global shared-memory can be done on OpenMOSIX, using the Migshm extension, which provides you with Distributed Shared Memory.
There is a world of difference between emulating it with the operating system / programming environment, and having hardware cache coherent global shared memory.
The Altix uses 4-way CPU "bricks", along with networking and memory bricks, which you can then use to assemble a system. Yes, resources are visible globally, and it is a LOT faster than a PoP (pile-of-pcs) cluster using ethernet, but it is still a cluster of 4-way nodes.
No, it is not. The big difference is that it isn't just "networking" them any more than 2 CPUs on an SMP motherboard are networked. It is a specialty interconnect with higher bandwidth and lower latency than you'll find in anything you think of as a network. It also directly carries the cache directory protocol on the wire rather than TCP packets.
It is not a cluster. If you think it is then you either don't know what a cluster is or you don't know what an Altix is.
It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.
I'll repeat it for you for the 100th time. This does not get any better in a cluster. In fact, it gets *much* worse because the latency and bandwidth on the interconnect are so much worse.
Why do you think people pay so much money for one when they could get 1000 cheap P4's and cluster them? Do you seriously think you know more about the subject than the people making and buying these things? (Hint: you don't)
SGI has a Linux machine that scales to 512 processors... Right now, in real-world environments. Being used, being bought.
The point of the article is that this is the STANDARD Linux kernel. The same exact thing that you can download from www.kernel.org
Not a hacked setup designed specifically to work with the hardware, like they did with your special Altix, AIX, or hacked-up Linux 2.4 series versions.
This is proof that the same kernel you can use to run an embedded platform on a 100MHz Pentium is scalable enough to run a 64-bit, 64-processor classic Unix Big Iron machine.
Just a couple of years ago there were organizations that would have had to pay hundreds of thousands of dollars in software development, licensing fees, and support costs to be able to do the same thing.
Hell, with the OpenSSI clustering technology I have 3 PCs in my basement running Debian that are in a single-image cluster. (One unified root filing system, failover capabilities, network load balancing capabilities, one unified /dev/, processes load balance from machine to machine, etc. etc.)
All of it rocks and is free and Free. Try doing this with Windows or Altix and you'd be broke before you finished.
Right now the Linux development model is creating free software that rivals, and in some cases even surpasses, all its closed source rivals. The only thing that is "better" is when you take an AIX or a Solaris setup and specifically design it to be used with a specific machine. However, that is increasingly impractical, and it partially explains why the future of Solaris looks bleak, why IBM is switching focus from AIX to Linux, and why other traditional Unix companies are beginning to abandon their proprietary OSes.
It's going to take a few more years and probably a 3.0 Linux kernel to complete the transformation of Unix back to its original open source roots (think AT&T giving source code away with the OS, and the original BSD project), but it's happening.