Linux Gains Support for NUMA
soosterh writes "CNet has an article about a NUMA patch from IBM. It says that the improvement adds some support in Linux for nonuniform memory access, or NUMA, a design for higher-end servers with many processors. Linus Torvalds, the original creator of the operating system and still its top authority, accepted the update this month into version 2.5, the current test version of the software."
Seriously, this is something that will close one of the last remaining gaps between Linux and Solaris. Not that it will do much good for 99% of users out there, but if you need this, you *really* need it.
And, of course, it also means support for the Hammer architecture, which is (smaller scale) NUMA. Each processor in an x86-64 system has its own memory bus, so the time to access memory depends on whether the memory is directly connected to a given processor or whether another processor needs to mediate, which is the definition of NUMA.
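For anyone who wants the local-versus-remote point spelled out, here is a minimal sketch of it as a toy cost model. The two-processor layout, the function names, and both latency figures are assumptions made up for illustration; they are not measurements of any Hammer system.

```python
# Toy model: two processors, each with its own directly attached memory bank.
# Both latency figures are invented for illustration, not measured values.
LOCAL_NS = 100    # hypothetical cost when the memory hangs off this CPU
REMOTE_NS = 160   # hypothetical cost when another CPU has to mediate

def access_cost(cpu, memory_node):
    """Return the modelled latency for `cpu` touching memory on `memory_node`."""
    return LOCAL_NS if cpu == memory_node else REMOTE_NS

def total_cost(cpu, accesses):
    """Sum the cost of a sequence of accesses, each given as a memory node id."""
    return sum(access_cost(cpu, node) for node in accesses)

# A process whose pages mostly live on node 0 (three pages there, one on node 1).
pages = [0, 0, 0, 1]
cost_on_cpu0 = total_cost(0, pages)   # three local accesses, one remote
cost_on_cpu1 = total_cost(1, pages)   # one local access, three remote

print(cost_on_cpu0, cost_on_cpu1)
```

Same process, same memory, different bill depending on where it runs. That gap is the only thing that makes the access "nonuniform".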
Someone correct me if I'm wrong.
Also, I seriously doubt if any desktop machine will use NUMA; it's primarily about systems which use system boards, where there are CPUs & RAM on a board which slots into the system & a CPU can access memory on a local board faster than that on other boards. Desktops tend to use one "system board" (i.e. the motherboard) so there isn't the difference in speed for accessing the data.
Oh, for cryin' out loud. Dude, there's this thing called Google. Try it out some time.
That said, I'll give you a hint: non-uniform memory access. If you've got a computer that uses different banks of memory as a single physical address space, then that computer has a NUMA architecture.
If you want to maintain cache coherency across a NUMA system, you have to employ some tricks. These tricks are sufficiently complex to warrant their own name: ccNUMA.
NUMA refers to a wide range of features: everything from multipath networking or SCSI, to memory allocation, to placing processes close to "good" memory. This particular patch simply makes processes run on CPUs where they're likely to be close to the memory they will need.
I'd imagine it's mainly for 64-bit, as those are the kind of systems which tend to ship with NUMA (usually with MIPS or Itanium). Without knowing more, I couldn't comment as to whether it will work under 32-bit or not, but I can't see why it would be so limited.
That is an incredibly naive comment. NUMA systems have been around for quite a while (think Sequent), and the current generation of IBM x440 servers is NUMA. These are all 32-bit Intel architectures.
This patch didn't even address memory, it only dealt with scheduling processes anyway.
You are correct. The LWN article on this just became available to non-subscribers and you can read it here: http://lwn.net/Articles/20741/
(BTW, everyone should subscribe to LWN. It's an exceptional value.)
The MIPS/Itanium systems the parent refers to are (I assume) the SGI Origin and Altix multiprocessor servers, both 64-bit, the first MIPS/IRIX, the second Itanium/Linux:
Origin
Altix
You do not have to run Linus's stock kernel.
No two vendors ship the same kernel, so in the end it's up to the vendor you use to tweak your kernel. Red Hat's kernels are heavily patched to suit what they believe are their users' needs.
I think that's a good system.
...can be found here.
they are copying Linux related news from CNET.
"More recently, the NUMA scheduler patch has been reworked (by Martin Bligh, Erich Focht, Michael Hohnbaum, and others) around a simple observation: most of the NUMA problems can be solved by simply restricting the current scheduler's balancing code to processors within a single node. If the rebalancer - which moves processes across CPUs in order to keep them all busy - only balances inside a node, the worst processor imbalances will be addressed without moving processes into a foreign-node slow zone. A simple (three-line) patch which did nothing but add the within-node restriction yielded most of the benefits of the full NUMA scheduler; indeed, it performed better on some benchmarks. Real-world loads, however, will require a scheduler which can distribute processes evenly across nodes. Occasionally it is necessary, even, to move processes to a slower node; a lot of CPU time on a lightly-loaded node will give better performance than waiting in the run queue on a heavily-loaded node. So a bit of complexity had to be added back into the new scheduler to complete the job."
Extracted from:
http://lwn.net/Articles/20741/
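A rough sketch of the two-level idea in the quoted passage might look like the following. This is a toy model, not kernel code: the run queues are plain lists, and `balance_within_node`, `balance_across_nodes`, `node_load` and the `threshold` tunable are all hypothetical names invented here. The first function is the "only balance inside a node" restriction; the second is the bit of complexity that gets added back so a lightly loaded node can take work from a heavily loaded one.

```python
# Toy model of the idea in the quoted passage; this is not kernel code.
# `queues` maps each CPU id to its run queue, `node_of` maps each CPU id to its node.

def balance_within_node(queues, node_of, node):
    """Move one task from the busiest to the idlest CPU of `node`, if that helps."""
    cpus = [c for c in queues if node_of[c] == node]
    busiest = max(cpus, key=lambda c: len(queues[c]))
    idlest = min(cpus, key=lambda c: len(queues[c]))
    if len(queues[busiest]) - len(queues[idlest]) >= 2:
        queues[idlest].append(queues[busiest].pop())
        return True
    return False

def node_load(queues, node_of, node):
    """Total number of queued tasks on all CPUs of `node`."""
    return sum(len(q) for c, q in queues.items() if node_of[c] == node)

def balance_across_nodes(queues, node_of, threshold=2):
    """Move one task to the lightest node, but only when the gap is large.

    `threshold` is a made-up tunable: crossing nodes costs memory locality,
    so the move is refused unless the node loads differ by at least this much.
    """
    nodes = sorted(set(node_of.values()))
    heavy = max(nodes, key=lambda n: node_load(queues, node_of, n))
    light = min(nodes, key=lambda n: node_load(queues, node_of, n))
    gap = node_load(queues, node_of, heavy) - node_load(queues, node_of, light)
    if gap < threshold:
        return False
    src = max((c for c in queues if node_of[c] == heavy), key=lambda c: len(queues[c]))
    dst = min((c for c in queues if node_of[c] == light), key=lambda c: len(queues[c]))
    queues[dst].append(queues[src].pop())
    return True

# Four CPUs on two nodes; all the work starts piled up on CPU 0.
node_of = {0: 0, 1: 0, 2: 1, 3: 1}
queues = {0: ["a", "b", "c", "d"], 1: [], 2: ["e"], 3: []}

for node in (0, 1):
    while balance_within_node(queues, node_of, node):
        pass
after_local = {cpu: len(q) for cpu, q in queues.items()}

moved = balance_across_nodes(queues, node_of)
after_global = {cpu: len(q) for cpu, q in queues.items()}

print(after_local, moved, after_global)
```

With only the within-node pass, node 0 ends up evenly split but still carries four tasks while CPU 3 on node 1 sits idle. The cross-node pass moves a single task over, which is the trade-off the article describes: a slower memory path in exchange for not waiting in a long run queue.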
However, the main way you might be able to add RAM over and above the MB limit is via some kind of PCI card with DIMMS on it. I'm not sure how that would work over PCI (even 66MHz/64bit) or how it would work at a lower level, but it might get by some limits. The limits OP was asking about may be of the order of trying to get over 1GB of RAM for some simulation code. Of course if you need over 1GB of RAM, buy a system which supports it.
In any event, from what people are saying, the NUMA patch is a change to the scheduler, to ensure that processes run on the CPU nearest the RAM bank storing the data. I don't think it addresses trying to add RAM from other sources (either disk or hypothetical PCI card)
Contrary to what is said in the post, NUMA support has been in Linux for quite a while already. The recent patches accepted by Linus merely add NUMA awareness to the scheduler, which, while certainly being a prerequisite for Linux being used on production NUMA boxen, is not at all required for NUMA support in general.
Actually the HT implementation in the P4/Xeon chips does not act as you suggest in 1. When doing HT the cache is cut in half and each virtual CPU gets a half cache ... which is probably the main reason HT can yield inferior performance for some applications.
... but I really couldn't help it too much. It just seems to be an overused word in CS/EE.
There is a very good reason for doing it this way. The P4 cache uses VIRTUAL addresses, so if each virtual CPU is executing in a different virtual address space (which is allowed), then you need a way to tell which cache lines belong to each virtual CPU, since they might very well both reference, let's say, virtual address 0xDEADBEEF, which translates into a different physical address (and hence different data) for each. Intel engineers went with the simple solution of splitting the cache in two, instead of adding an extra tag to each cache line, which would have created extra overhead/latency on every cache access.
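To make the aliasing argument concrete, here is a toy sketch. It assumes, as the comment does, a cache looked up purely by virtual address; it is not a description of actual P4 hardware, and all three class names are hypothetical. It shows the collision, the "split in two" fix, and the "extra tag" alternative.

```python
# Toy model of the argument above; it does not describe real P4 hardware.

class VirtualTagCache:
    """Cache keyed only by virtual address, shared by both virtual CPUs."""
    def __init__(self):
        self.lines = {}
    def fill(self, thread, vaddr, data):
        self.lines[vaddr] = data              # the thread id is ignored
    def lookup(self, thread, vaddr):
        return self.lines.get(vaddr)

class PartitionedCache:
    """The same cache split into one private half per virtual CPU."""
    def __init__(self, threads=2):
        self.halves = [{} for _ in range(threads)]
    def fill(self, thread, vaddr, data):
        self.halves[thread][vaddr] = data
    def lookup(self, thread, vaddr):
        return self.halves[thread].get(vaddr)

class ThreadTaggedCache:
    """Shared cache whose tag is widened to (thread id, virtual address)."""
    def __init__(self):
        self.lines = {}
    def fill(self, thread, vaddr, data):
        self.lines[(thread, vaddr)] = data
    def lookup(self, thread, vaddr):
        return self.lines.get((thread, vaddr))

VADDR = 0xDEADBEEF
results = {}
for name, cache in [("shared", VirtualTagCache()),
                    ("split", PartitionedCache()),
                    ("tagged", ThreadTaggedCache())]:
    # Each virtual CPU maps the same virtual address to different data.
    cache.fill(0, VADDR, "thread 0 data")
    cache.fill(1, VADDR, "thread 1 data")
    results[name] = (cache.lookup(0, VADDR), cache.lookup(1, VADDR))

print(results)
```

In the shared version, virtual CPU 0 reads back the other thread's data, which is the bug being described. Both fixes return the right data; the model says nothing about the latency cost of the wider tag, which is the part of the argument it cannot check.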
I apologize for overusing the word virtual