23 Second Kernel Compiles
b-side.org writes "As a fine testament to how quickly Linux is absorbing technology formerly available only to the computing elite, an LKML member posted a 23-second kernel compile time to the list this morning as a result of building a 16-way NUMA cluster. The NUMA technology comes gifted from IBM and SGI. Just one year ago, a Sequent NUMA-Q would have cost you about USD $100,000. These days, you can probably build a 16-way Xeon (4x 4-way SMP) system off of eBay for two grand, and the NUMA comes free of charge!"
OK.. I'm NOT about to start the proverbial deluge of people wanting to know about a Beowulf cluster of these things. But what I will ask is this: if it can do that for a kernel, I wonder how long it will take to do Mozilla, or XFree? It'd be interesting to see those stats.
JoeLinux
This may be good news, but what the heck! They should have at least included the .config they used, so that we know which drivers/modules were compiled in, or whether this is just a bare-bones kernel with enough to run the basics. We need to know the complexity of the configuration before we can really say it's fast.
Take-off every
You can't build a NUMA cluster worth a crap without a fast, low-latency interconnect.
Sequent's NUMA Boxen use a flavor of SCI (Scalable Coherent Interface) which is integrated into the memory controller.
While you can use some sort of PCI-based interconnect, the results are just plain not worth it.
InfiniBand should be better, though I've heard the latency is too high to make this a marketable solution.
Keep your eyes on IBM's Summit chipset based systems. These are quads tied together with a "scalability port" and go up to 16-way. They should go to 32 or higher by 2003. That's when NUMA will -finally- be inevitable...
Maybe this is a silly question..
but why would you want to compile a kernel in 23 seconds?
:-)
Yes it is...
-adnans
"In short: just say NO TO DRUGS, and maybe you won't end up like the Hurd people." --Linus Torvalds
I think this benchmark is used time and time again because it's really the only one that nearly any Linux user would be able to compare their own experiences to. If they said 1.2 GFLOPS, I (and I suspect most others) could only say "Wow, that sounds like a lot. I wonder what that looks like." OTOH, I have seen how long it takes to download 33 Slackware diskettes in parallel on a v.34 modem, and I still run 3 P75s today.
I've been told that I will soon be deploying Beowulf HPC clusters to many clients, including universities and biomedical firms. If they were to tell me that the clusters will be able to do protein folds (or whatever they call it -- referring back to the nuclear simulation discussion) in "only 4 weeks", I won't have a clue as to how to scale that relative to customary performance of the day.
Sure, there are many other applications that are run on clusters, but kernel compiles are the ones that all of us do. It can give us an idea of what kind of performance you'd get out of other processor-intensive operations. And many people will tell you there are so many variables with kernel compiles that it's ridiculous to compare the results.
Check out beowulf.org and see what people are doing with cluster computing. I've always wanted to open a site that compiles kernels for you. Just select the patches you want applied and paste the .config file. I'll compile it, and send back to you by email a clickable link to download your custom tarball. Of course no one here would trust a remotely compiled kernel :)
Intelligent Life on Earth
what about the interconnect? the machine in question is /not/ a simple Beowulf cluster, it's NUMA. Non-Uniform Memory Architecture, which implies there is some form of memory architecture, and that the main difference between that architecture and that of a normal computer is that it is non-uniform.
I.e., the CPUs in this computer share a common address space and can reference any memory; it's just that some memory (e.g. memory located at another node) has a higher cost of access than other memory (as opposed to a typical SMP system, where all memory has an equal 'cost of access').
at the moment, under Linux, this implies that there is special hardware in between those CPUs to provide the memory coherency - i.e. lots of bucks - because there is no software means of providing that coherency (at least not in Linux today).
NB: normal Linux SMP could run fine on a NUMA machine (from the memory management POV), but it would be slower because it would not take the non-uniform bit into account.
anyway... despite what the post says, this machine is /not/ a collection of cheap PCs connected via 100/1G ethernet or other high-speed packet interconnect.
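To put rough numbers on the 'cost of access' point above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the latencies (100 ns local, 400 ns remote), the locality fractions, and the helper name `average_access_cost` are made up and are not measurements of the machine in the article.

```python
def average_access_cost(local_fraction, local_ns, remote_ns):
    """Expected cost of one memory access, given the fraction that hits local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only to illustrate the idea.
LOCAL_NS = 100.0
REMOTE_NS = 400.0

# A NUMA-aware kernel tries to keep a process's memory on its own node.
numa_aware = average_access_cost(0.90, LOCAL_NS, REMOTE_NS)

# A plain SMP kernel on 4 nodes places memory without regard to the node,
# so only about 1 access in 4 happens to be local.
numa_blind = average_access_cost(0.25, LOCAL_NS, REMOTE_NS)

print(f"NUMA-aware placement: {numa_aware:.1f} ns per access")
print(f"NUMA-blind placement: {numa_blind:.1f} ns per access")
```

On a true SMP box the two latencies would be equal, and the locality fraction would stop mattering, which is the 'uniform' case.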
I use Friend/Foe + mod-point modifiers as a karma/reputation system.
By computer graphics technology, do you mean a render-farm? That would be much better suited to a standard Beowulf cluster, because the interprocess communication is minimal. That is an example of an "embarrassingly parallel" computing problem. As for live graphics, an Onyx workstation doesn't benefit from CPU power so much as its Reality Engine/Infinite Reality graphics pipeline. When you need better graphics performance, you can utilize multiple graphics pipelines. Some of the Onyx 3000s can use (I think) as many as 16 different IR3s for improved graphics output, like in RealityCenters.
The point of this article isn't that kernel compilation is fast because it is usually CPU bound, and 16 CPUs alleviate that problem. In fact kernel compilation isn't strictly CPU bound... there are other performance limits too, especially disk performance. The significance of this article is that multithreaded kernel compiles benefit from the increased interprocess communication potential in NUMA architectures... performance would be much worse trying to spread that across a Beowulf cluster.
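One rough way to see why a build that isn't strictly CPU bound won't scale perfectly on 16 CPUs is Amdahl's law. The article doesn't use it; it's brought in here purely as an illustration. The parallel fractions below are hypothetical, not measured from a real kernel build, and `amdahl_speedup` is just a name picked for this sketch.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Hypothetical fractions of the build that can run in parallel; the rest
# stands in for serial work such as disk I/O and the final link.
for p in (0.80, 0.95, 0.99):
    print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 16):.2f}x on 16 CPUs")
```

The serial part (disk, linking, anything that can't be spread out) puts a ceiling on the speedup no matter how many CPUs you add.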
While rendering (not displaying) graphics or running basic number crunching does not benefit much from a NUMA setup as compared to a Beowulf-style setup, some complex computations do benefit... computing the first million digits of Pi would use interprocess communication, as would large-scale data mining applications. It's been a few years since I was there, but I saw a huge cluster of Origin 2000s CC-NUMAed together with one Onyx 2, which handled displaying the results of the data mining. (An Onyx2 is basically an Origin 2000 with a graphics pipeline. An Onyx 3000 without any graphics bricks is an Origin 3000.)