BigTux Shows Linux Scales To 64-Way
An anonymous reader writes "HP has been demonstrating a Superdome server running the Stream and HPL benchmarks, which shows that the standard 2.6 Linux kernel scales to 64 processors. Compiling the kernel didn't scale quite so well, but that was because it involves intermittent serial processing by a single processor. The article also notes that HP's customers are increasingly using Linux for enterprise applications, and getting more interested in using it on the desktop..."
I'm sure a lot of you have been to an Apple store at some point. On all of their systems, they have embedded buttons in the desktop, where you can launch safari, etc. Does anyone know how to do this? It's driving me nuts! thanks
That's what, 640.000?
Does it run Linux well?
If it can scale to 16 procs well, it will scale to 64 procs well.
Until you start talking about double that amount of procs, which is what Windows Server does these days, or hundreds of procs, which is what Cray has been doing for years, scalability at this small "scale" (haha) is not very impressive.
FreeBSD, the dying operating system that it is, still supports SMP on many procs much better than Linux. That's no one's fault, but jumping up and down saying how good Linux SMP is without looking at the competition makes you look a little foolish.
Well, normally I would just go along with it and quietly get my paycheck, but this time I had been inspired by recent Slashdot postings about the power of open source. I had done some studying up on my own, too.
So when my boss put the question to me, I responded with "That could work, but I'm thinking Ubuntu Warty Warthog or Debian Woody, with Derby 0.9 database and of course X-Bitch client to keep in touch".
Well, now I'm unemployed just like you all and I'm looking for a job. All I know is, nobody ever got fired for buying Dell and Microsoft. Damn slashbots... a curse on you!
What parallel-computing activity doesn't involve intermittent activity by a single processor? You have to spawn the parallel job somehow, and typically that starts as a single process. Is the implication here that compiling is pipelined, but linking is a single-CPU job?
If you mod me down, I shall become more powerful than you can possibly imagine.
I haven't had a 64-way since college.
And you?
"Look, Smithers! I'm Davy Crockett!"
SGI
Unisys
Fujitsu
HP
It looks like there might actually be a competitive marketplace for scalable multiprocessor Linux systems real soon now (if not already).
"serial processing" is most probably the linking step... "intermittent" probably means that they incrementally link groups of .o files, etc.
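To put rough numbers on the serial-step point, Amdahl's law gives an upper bound on speedup when part of the job (linking, say) can only run on one CPU. A minimal sketch in Python; the 5% serial fraction is an assumed figure for illustration, not something measured on the Superdome, and the function name is made up for this example.

```python
def amdahl_speedup(serial_fraction: float, n_procs: int) -> float:
    """Upper bound on speedup when serial_fraction of the work runs on one CPU."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


# Assumed value for illustration only: 5% of the build is serial.
ASSUMED_SERIAL_FRACTION = 0.05

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        bound = amdahl_speedup(ASSUMED_SERIAL_FRACTION, n)
        print(f"{n:3d} CPUs: at most {bound:.1f}x faster")
```

Under that assumption, 64 CPUs give a bound of roughly 15x, which is the kind of shortfall you'd expect from a build with a serial link step even if the kernel itself scales fine.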
who cares about stuff like this. linux is on its way out.
I know you're going for the +5 Funny mods, but... you're not a real Linux user. Sorry, it's plainly obvious.
;)
What the fuck is "X-Bitch client".
Now, of course I know you're talking about "BitchX", the IRC client. But the fact you got the name wrong is just plain lame. I mean, couldn't you have googled it? There isn't a dash anyway. That's studying? You couldn't get an MCSE!
YOU ARE TEH SUCK.
"Computers will never truly be free until the last windows user is strangled with the entrails of the last mac user."
But seriously, this is pretty cool - though I think the best thing about multi-processor systems past two or four is really the ability to run virtualized servers with two or four dedicated CPUs each inside an uber-CPU'd system.
This flies in the face of science.
I work on a SuperDome and would love to see it running Linux. HP-UX is such a pain!!!
I was raised on the command line, bitch
"Nemo me impune lacesset"
and you jab poo.
Quite informative and typical of what I've seen in a few other cases. Poorly "equipped" IT admin makes "dumb" but well-meaning proposal to switch to Linux/OSS. In some cases, he/she's kicked out on his/her butt (as in this case). Other times, the shop switches with disastrous results.
I know Linux is pretty good from a security sense (compared to Windows, at least), and I'm not surprised to find it operates on exotic setups, but are there that many programs out there that support such a setup? Or ones that will actually benefit from this many processors? Or is the point of this system to develop custom business software for their use? Or is it for a data server of some sort that can benefit from multiple cores answering requests?
lol: You see no door there!
Looking at the literature, Linux and Unix in general seems to be designed to keep processes as lightweight as possible. OTOH, Windows processes are a little heavier and take longer to start up.
Then, OTOH, Windows threads are very lightweight compared to the equivalent thread model in Linux. Benchmarks have shown that in multi-process setups, Unix is heavily favored, but in multi-threaded setups Windows comes out on top.
When it comes to multi-processors, is there a theoretical advantage to using processes vs threads? Leaving out the Windows vs Linux debate for a second, how would an OS that implemented very efficient threads compare to one that implemented very efficient processes?
Would there be a difference?
I like the way HP is handling software distribution, offering Linux as a solution along with AMD processors. Dell attempts it, but only for servers. HP, I believe, does the same, but at least they seem like they care more. And that is what matters if we are going to be pushing Linux into enterprises AND the home... Hooray for HP!
Check out AMDZone and the Inq.
MS now recommends just this kind of a system for a desktop when doing Longhorn :).
The hell with that--I just want a wireless driver for my Dell (Broadcom) PCI card. :-(
That's great to know that the kernel can handle 64-way machines. Especially since I just ordered one from my local PC store in bits to build myself...
Really the key will be when the system scales to 128 processors and beyond.
Ah, shut up and get back to work, Bill!
Never mind Linux for a moment, I'm just amazed that 64 Itanium 2's have actually been sold...
To be efficient, the processors would need gigantic caches, to keep the load on the rest of the system down. Either that, or you COULD run the CPUs out of step over a bus that is 64 times faster than normal. I'd hate to be the person designing such a system, though.
Now, this system could be of extreme interest in the supercomputer world. One of the biggest complaints about clustering is the poor interconnects. This would seem to get round that problem. A Blue Gene-style cluster where each node is a 64-way SMP board, and you're running a few thousand nodes, would likely be an order of magnitude faster than anything currently on the supercomputer charts.
On the other hand, do we need to know what the weather is not going to be, ten times as often?
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Imagine a Beo-- oh wait...
you had me at #!
... use 10 faster processors.
Imagine a Beowulf cluster of those!
Looks like someone was up to those challenges, eh? 64-processor support *and* 64-bit support. Awesome news.
I have no special gift, I am only passionately curious. --Albert Einstein
Smaller, say 4 or 8 way NUMA boards, that are within the means of the average geek?
I'm not talking about mere mortal SMP systems, I want all the crazy memory partitioning and whatnot.
I don't need no instructions to know how to rock!!!!
You seem to forget that the enterprise users who fund development on big machines are usually the ones supporting the entire projects you use.
:)
Between the kernel, your latest DBMS, etc., lots of companies contribute the dollars (or the man-hours) to these projects.
Nonetheless- write one
when you see the word 'Linux', drink!
Be careful when you are at an Appel-store. Always walk with your back next to the wall. If the clerk tries the old "Oooopsie, I dropped the iPod. Can you please pick it up for me?'-trick, don't bend over.
What we do at the Santa Monica (CA) store is put a 16-port USB hub under the table with 802.11 dongles in all of the ports. Then we have a MIDI-over-IP downconverter to multiplex the shared memory requests. Wrap it all in an AppleScript® Dictionary and you're ready to roll!
Hope this helps.
Using Itanium 2 CPUs just like the Superdomes... how is this new news?
||| I still can't believe Parkay's not butter.
I'm so confused. Itanium bad. Linux kernel scalability good. Help!
---
Posted as me for the negative karma whoring.
True, you can build very large clusters from these bricks, but the bricks themselves don't scale beyond a relatively small number of CPUs.
Linux scaling to 512 processors: http://www.sgi.com/features/2004/oct/columbia/
The story should be HP has finally caught up to where SGI were 2 years ago.
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
Ruckus is a failure.
I also have a question: how come iptables sucks so much arse? Why don't the Linux d00ds just copy pf?
PF isn't really any better than iptables as far as I know. Lots of (openbsd) people like the syntax better... but if you can't handle iptables syntax, you shouldn't be administering a complex firewall in the first place.
Support for today's problems and the future DRM problems of tomorrow.
It also doesn't avoid the main point, which is that any given resource can only be used by one CPU at a time. If processor A on brick B is passing data along wire C, then wire C cannot be handling traffic for any other processor at the same time. That resource is claimed, for that time.
While it's true that you can only send one signal down a wire at "a time" (absent weird frequency stuff, although the wires are bidirectional, so you can really send two signals), "a time" in these systems is on the order of nanoseconds. So while only one CPU can use a wire in any given nanosecond, hundreds of CPUs can use the same wire within the same millisecond, which is close enough to "at the same time" to work as "at the same time", so you can have multiple streams of traffic using the same physical connection.
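The nanosecond-versus-millisecond argument is easy to check with back-of-the-envelope arithmetic. A rough sketch in Python; the 10 ns per transfer figure and the function names are assumptions picked for illustration, not specs of any real interconnect.

```python
NS_PER_MS = 1_000_000  # nanoseconds in one millisecond


def slots_per_millisecond(transfer_ns: int) -> int:
    """How many back-to-back transfers fit on one wire in a millisecond."""
    if transfer_ns < 1:
        raise ValueError("transfer_ns must be at least 1")
    return NS_PER_MS // transfer_ns


def slots_per_cpu(transfer_ns: int, n_cpus: int) -> int:
    """Transfers each CPU gets per millisecond if the wire is shared evenly."""
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")
    return slots_per_millisecond(transfer_ns) // n_cpus


# Assumed value for illustration only: each transfer holds the wire for 10 ns.
ASSUMED_TRANSFER_NS = 10

if __name__ == "__main__":
    total = slots_per_millisecond(ASSUMED_TRANSFER_NS)
    each = slots_per_cpu(ASSUMED_TRANSFER_NS, 64)
    print(f"{total} transfers per ms in total, {each} per CPU with 64 CPUs")
```

Even with every one of 64 CPUs contending for the same wire, each still gets over a thousand turns per millisecond under these assumed numbers, which is why the sharing looks simultaneous from the software's point of view.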
The only resource a CPU locks on is an exclusively owned (writable) cache line. CPUs share access to I/O space, and share access to cache lines that are read-only. CPUs can talk to "local" memory (on the same node) or memory on a node on the opposite side of the system, in an identical manner except for access latency (i.e. the address for a particular piece of memory is the same no matter which CPU is addressing it).
How does how many CPUs are in a brick have anything to do with whether it's an N-Way SMP system? A brick is just a physical box. The interconnect that connects the processors together extends over multiple bricks. The bricks just provide modularization - you could put all 64 CPUs in one brick if you wanted to, but the only difference would be cosmetic (additional pieces of metal between boards).
Do you really think anyone is building single boxes with 512 processors in them? These things come in *RACKS*.
To maximize resources to the absolute limit, you'd need a completely asynchronous computer. Such computers exist, sure, but they're usually very specialized and I know of none that are superscalar.
I'm not sure of the state-of-the-art for massively parallel asynchronous CPUs, but my guess is that they're nowhere near the same level as more traditional synchronous designs.
My kernel only goes up to 11.
How'd you get a three processor system? Is it a quad board, discounted heavily because one socket was broken? That'd be neat, where'd you get it?
Infuriate left and right
Way to quote a fucking article from a year ago, douchenozzle...
Where do you think all this NUMA awareness came from? Sequent Engineers, that's where. Where do you think they are now?
I was under the impression that enterprise applications were normally limited by the speed of the hard-drive and RAM, applications like webserving and database management.
You see that brine there? That's my brine.
a kernel compile using a single Itanium 2 processor took about 19 minutes
And a kernel compile on a four way PPro 200 MHz took about two and a half minutes. Ok, that was a 2.2 kernel, where they probably used a 2.6 kernel, so that may account for a bit of the extra time, but still, 19 minutes? No wonder they need a 64-way box to make Itaniums do anything serious.
I take it you've never seen This Is Spinal Tap.
Get Firestarter. It's a GUI for iptables. The best thing to do is figure out which ports need to be blocked and write a bash script so iptables can block those, allow others, etc. Instant firewall, assuming you won't change it much (home use).
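For anyone who'd rather generate that script than type it, here is a rough sketch in Python that writes out the iptables lines for a list of blocked ports. The port list and the function names are hypothetical, and the policy (flush INPUT, drop the listed TCP ports, accept everything else) is only one reading of the suggestion above, so review the output before running it as root.

```python
# Hypothetical example ports; pick the ones that matter on your own box.
ASSUMED_BLOCKED_TCP_PORTS = [23, 135, 445]


def block_rules(ports):
    """Return one iptables command per TCP port to drop on the INPUT chain."""
    return [f"iptables -A INPUT -p tcp --dport {port} -j DROP" for port in ports]


def firewall_script(ports):
    """Build the text of a small bash script: flush INPUT, drop ports, accept the rest."""
    lines = ["#!/bin/bash", "iptables -F INPUT"]
    lines += block_rules(ports)
    lines.append("iptables -P INPUT ACCEPT")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the script; redirect to a file and run it as root to apply.
    print(firewall_script(ASSUMED_BLOCKED_TCP_PORTS), end="")
```

This only prints the script and never calls iptables itself, so it is safe to run as an ordinary user while you experiment with the port list.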
The fucking news and the fucking article itself are misleading.
> A 64-way system may or may not be useful. It depends on the speed of the interconnects, and the way it handles bus locking.
Of course it IS useful. It is great for database consolidation (especially for SQL Server which practically doesn't scale horizontally), for example, as upgrades can be done in minutes and the whole goddamned thing is as stable as an Intel box can be.
And in case you missed what the FA said, they did NOT run an OS on 64 CPUs (that's why it's bullshit and misleading) but they partitioned those 64 CPUs into 16 four-way servers. But hey - this is Slashdot and any Linux related hype is welcome....
> So, sure, there are people who could use such a system, but I cannot imagine many of them are in the market.
Sorry, pal, but HP sold $1b of such boxes in 2004. Manufacturing, telcos, utilities and many other users need "boxen" like these. I think they're slightly more suitable for Windows because of the way it can "add" (allocate, actually) processors to Exchange and SQL Server systems.
Really? I would have thought that the compilation of loads and loads of .c files is exactly the sort of thing that could be shared among processors. It certainly has been on projects that I've worked on.
make -j (num of processors) ?
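Spelled out, the idea is to hand make a job count equal to the number of CPUs. A small sketch in Python that builds the command; the helper name is made up for this example, and it assumes GNU make, where -jN sets the number of parallel jobs.

```python
import os


def make_command(jobs=None):
    """Build a GNU make command line with one parallel job per CPU by default."""
    # os.cpu_count() can return None, so fall back to a single job.
    n = jobs if jobs is not None else (os.cpu_count() or 1)
    if n < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", f"-j{n}"]


if __name__ == "__main__":
    # Only prints the command; pass it to subprocess.run(..., check=True)
    # from inside a source tree to actually start the build.
    print(" ".join(make_command()))
```

Compiling the .c files spreads across the jobs nicely; any step that depends on all the objects (such as a final link) still runs on its own, which fits what the article reported.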
Good gosh, slashdot is really going to pieces. Two people explain Spinal Tap to me, another comes up with a possibly real, possibly tongue-in-cheek answer, and, worst of all, someone mods me up as "insightful". What do I have to do, add footnotes and explanations?
I guess the two Spinal Tap explainers never heard the joke about only 10 people in the world. No, I'm not going to explain that.
This is pathetic. Insightful, jeez. Now watch someone mod this as flamebait or funny.
Did I read that correctly, they've got Linux working way good on a c-64?? ;)
Do you know what an Ethernet switch is? And why it's better than a hub? You're assuming that the resource management on these systems works like a hub. It doesn't - it works like a switch. The *ONLY* place that a CPU shares resources with other CPUs is on the processor bus - 2-4 CPUs share that, *EXACTLY* the same as in a cluster of Xeons or any other dual-CPU box. Once you get past the processor bus, everything is buffered. The CPU sends out whatever data request it has and off it goes. The interconnect takes care of making sure the wires are used appropriately, the CPU doesn't have to worry about it.
Now, yes, it's possible that a CPU needs something from memory or IO and it has to wait for it to come back, but EXACTLY the same thing would happen in a CPU in a cluster as well.
You very simply have absolutely no clue what you're talking about - a node in one of these huge systems functions pretty much identically to one box in a cluster - it is architecturally the same. You don't add processors by sticking them on the same processor bus, you add processors by adding more nodes, each with their own memory and IO, and having a REALLY FAST interconnect between them, and an OS where everything is one system image.
Cluster: Many distinct computers networked together. Supercomputer: Many distinct nodes networked into one computer.
Like I've said, I've used transputers. Let me know when you find a distinct node in an array. You can't? Oh dear.
Seymour Cray, for many years, resisted multi-processor computers. Most of his designs were monolithic, on the grounds that a good design doesn't need to be MP. I guess that means that his designs weren't supercomputers, then. No? They were? Oh.
I guess that the only conclusion is the one in the Princess Bride - "You keep using that word. I do not think it means what you think it means".
I find it interesting how well developed this is. I mean, how many Linux coders actually have access to such hardware for testing/development purposes? Many of the larger projects can have a huge base of devs from within the userbase supplying patches/fixes/upgrades. I'm guessing that the userbase for the system described isn't very large (much less so for those able to muck with running kernels on such).
Or perhaps most of it just scales up very nicely from smaller systems?