Linux May Need a Rewrite Beyond 48 Cores
An anonymous reader writes "There is interesting new research coming out of MIT which suggests current operating systems are struggling with the addition of more cores to the CPU. It appears that the problem, which affects the available memory in a chip when multiple cores are working on the same chunks of data, is getting worse and may be hitting a peak somewhere in the neighborhood of 48 cores, when entirely new operating systems will be needed, the report says. Luckily, we aren't anywhere near 48 cores and there is some time left to come up with a new Linux (Windows?)."
It appears that the problem, which affects the available memory in a chip when multiple cores are working on the same chunks of data, is getting worse and may be hitting a peak somewhere in the neighborhood of 48 cores, when entirely new operating systems will be needed, the report says.
Seriously? You picked that over my submission?
I submitted this earlier this morning; I guess my submission was lacking. But if you're interested in the original MIT article and the actual paper (PDF):
eldavojohn writes "Multicore (think tens or hundreds of cores) will come at a price for current operating systems. A team at MIT found that as they approached 48 cores their operating system slowed down. After activating more and more cores in their simulation, a sort of memory leak occurred whereby data had to remain in memory as long as a core might need it in its calculations. But the good news is that in their paper (PDF), they showed that for at least several years Linux should be able to keep up with chip enhancements in the multicore realm. To handle multiple cores, Linux keeps a counter of which cores are working on the data. As a core starts to work on a piece of data, Linux increments the number. When the core is done, Linux decrements the number. As the core count approached 48, the amount of actual work decreased and Linux spent more time managing counters. But the team found that 'Slightly rewriting the Linux code so that each core kept a local count, which was only occasionally synchronized with those of the other cores, greatly improved the system's overall performance.' The researchers caution that as the number of cores skyrockets, operating systems will have to be completely redesigned to handle managing these cores and SMP. After reviewing the paper, one researcher is confident Linux will remain viable for five to eight years without need for a major redesign."
I don't know, guess I picked a bad title or something?
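For anyone who wants to see the per-core counter idea from the submission above in code, here is a minimal sketch. It is not the actual Linux patch from the paper: the class name `SloppyCounter`, the `threshold` parameter, and the use of Python threads to stand in for cores are all my own assumptions for illustration. Python's interpreter lock means this shows the structure only, not a real speedup.

```python
import threading


class SloppyCounter:
    """Counter with one local slot per worker, folded into a shared total
    only occasionally. Hypothetical illustration, not kernel code."""

    def __init__(self, num_workers, threshold=64):
        self.threshold = threshold
        self.local = [0] * num_workers   # one slot per worker ("core")
        self.shared = 0                  # the contended global count
        self.lock = threading.Lock()     # protects self.shared only

    def add(self, worker_id, delta=1):
        # Fast path: each worker touches only its own slot, no lock needed.
        self.local[worker_id] += delta
        if abs(self.local[worker_id]) >= self.threshold:
            self._flush(worker_id)

    def _flush(self, worker_id):
        # Slow path: occasional synchronization with the shared total.
        with self.lock:
            self.shared += self.local[worker_id]
        self.local[worker_id] = 0

    def read_exact(self):
        # Exact only while no worker is updating (e.g. after join()).
        with self.lock:
            return self.shared + sum(self.local)


def run(num_workers=4, increments=10_000):
    counter = SloppyCounter(num_workers)

    def work(worker_id):
        for _ in range(increments):
            counter.add(worker_id)

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.read_exact()
```

The trade-off is that a read of the shared total alone can be stale by up to `threshold` per worker, which is presumably acceptable for the kind of bookkeeping the paper describes but is an assumption on my part.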
Luckily we aren't anywhere near 48 cores and there is some time left to come up with a new Linux (Windows?).
Again, seriously? What does "(Windows?)" even mean? As you pass a certain number of cores, modern operating systems will need to be redesigned to handle extreme SMP. It's going to differ from OS to OS but we won't know about Windows until somebody takes the time to test it.
My work here is dung.
They have an off-by-one error in their math; it's actually 9 times a 6-core CPU. So, at 42 cores a rewrite is needed.
It's not a question of whether Linux can do it, but of where performance regresses. Of course it's possible to run Linux on multiple hundreds of cores, but it seems that beyond 48 cores there is a performance regression, so the extra cores don't help as much as they could. That is the issue here.
Can somebody please explain what the fuck they are actually talking about? They've dumbed down the terminology to the point I have no idea what they are saying. Is this some kind of cache-related issue? Inefficient bouncing of processes between cores? What?
It looks like TFS was written by a Windows fanboy; why mention Linux specifically when it is a general problem? Why try to half-assedly imply that Windows is more advanced than Linux?
Yet Another Tech Blog
(but so much more, including game and movie reviews)
http://yanteb.peasantoid.org
I'm still waiting for Windows to work well on ONE.
No kidding. SGI's Altix is a huge box full of multi-core IA-64 processors. 512 to 2048 cores is more normal, but they were reaching 10240 last I checked. This is SMP (NUMA of course), not a cluster. I won't say things work just lovely at that level, but it does run.
48 cores is nothing.
I thought this as well, but after more carefully reading the article, I *think* I see what the problem is. It's not really a problem with large numbers of cores in a system, so much as a problem with large numbers of cores on a chip. Since the multicore chips share caches (level 2 cache is shared, level 1 cache isn't IIRC, but I could be wrong), it's actually cache memory where the issue lies. I've worked on single system image SGI systems with 512 cores, but those systems were actually 256 dual-core chips. That works fine, and assuming well-written SMP code, performance scales as you'd expect with the number of cores.
I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
Hahaha. Oh, arrogance from ignorance, how I loathe you.
The Kruger Dunning explains most post on
SGI has some awfully big single-system-image linux boxes.
Not really. SGI has big NUMA machines, with a single Linux kernel per node (typically under 8 processors), some support for process / thread migration between nodes, and a very clever memory controller for automatically handling access to, and caching of, remote RAM. Each kernel instance is only responsible for a few processes. They also have a lot of middleware on top of the kernel that handles process distribution among nodes.
It's an interesting design, and the SGI guys have given a lot of public talks about their systems so it's easy to find out more, but it is definitely not an example of Linux scaling to large multicore systems.
I am TheRaven on Soylent News
We've known about this problem for ... well, as long as we've had more than one core - actually as long as we've had SMP... You increase the number of cores/CPUs, you decrease available memory throughput per core, which was already the bottleneck anyway. Am I missing something here?
So, they found scalability problems in some microbenchmarks. Well, some of the scalability paths cited in the paper will be fixed when Nick Piggin's VFS scalability patchset gets merged. But it's not like you need to rewrite every operating system to scale beyond 48 cores, it's just the typical scalability stuff, and the kind of scalability issues found these days are mostly corner cases (Piggin's VFS being an exception).
What they're saying is basically two things:
First, there's a bottleneck in the on-chip caches. When a core's working on data it needs to have it in its cache. And if two cores are working on the same block of memory (block size being determined by cache line size), they need to keep their copies of the cache synchronized. When you get a lot of cores working on the same block of memory, the overhead of keeping the caches in sync starts to exceed the performance gains from the additional cores. That's not new, we've known that in multi-threaded programming for decades: when you've got a lot of threads dependent on the same data items, the locking overhead's going to be the killer. And we've known the solution for just as long: code to avoid lock contention. The easiest is to make it so you don't have multiple threads (cores) working on the same (non-read-only) memory at the same time; that just requires some thinking on the part of the developers.
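To make the "don't have multiple threads working on the same writable memory" point concrete, here's a rough sketch. The word-count task and the helper names (`partition`, `count_words`, `parallel_word_count`) are made up for illustration. Each worker gets its own slice of the input and writes only to its own private dict; the merge happens afterwards in a single thread, so no lock is needed anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_words(chunk):
    """Count words in one chunk using a private dict (no shared writes)."""
    counts = {}
    for word in chunk:
        counts[word] = counts.get(word, 0) + 1
    return counts


def parallel_word_count(words, workers=4):
    totals = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers never touch `totals`; only this thread merges the results.
        for partial in pool.map(count_words, partition(words, workers)):
            for word, n in partial.items():
                totals[word] = totals.get(word, 0) + n
    return totals
```

The same shape applies at the cache level: give each core its own data to write and combine at the end, instead of having everyone hammer one shared structure.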
Second, you only gain from additional cores if there's workload to spread to them usefully. If you've got 8 threads of execution actually running at any given time, you won't gain from having more than 8 cores. And on modern computers often we don't have more than a few threads actually using CPU time at any given moment. The rest are waiting on something and don't need the CPU and, as long as we aren't thrashing execution contexts too badly, they can be ignored from a performance standpoint. To take advantage of truly large numbers of cores, we need to change the applications themselves to parallelize things more. But often applications aren't inherently multi-threaded. Games, yes. Computation, yes. But your average word processor or spreadsheet? It's 99% waiting on the human at the keyboard. You can do a few things in the background, file auto-save and such, but not enough to take advantage of a large number of cores. The things that really take advantage of lots of cores are things like Web servers where you can assign each request to its own core. And no, browsers don't benefit the same way. On the client side there are so (relatively) few requests and network I/O's so slow relative to CPU speed that you can handle dozens of requests on a single core and still have cycles free assuming you use an efficient I/O model. But it all boils down to the developers actually thinking about parallel programming, and I've noticed a lot of courses of study these days don't go into the brain-bending skull-sweat details of juggling large numbers of threads in parallel.
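As a rough illustration of the "dozens of requests on a single core" claim, here's a sketch using Python's asyncio as one possible efficient I/O model. The `fake_request` function is hypothetical, and `asyncio.sleep` merely stands in for waiting on the network; nothing here talks to a real server.

```python
import asyncio
import time


async def fake_request(request_id, latency=0.05):
    # asyncio.sleep simulates network wait; the CPU is free during it.
    await asyncio.sleep(latency)
    return request_id


async def fetch_all(n, latency=0.05):
    # All n requests wait concurrently on one thread's event loop.
    return await asyncio.gather(*(fake_request(i, latency) for i in range(n)))


def timed_fetch(n=50, latency=0.05):
    start = time.perf_counter()
    results = asyncio.run(fetch_all(n, latency))
    return results, time.perf_counter() - start
```

Since the waits overlap, the elapsed time should come out close to one request's latency rather than fifty times that, all on a single thread. Extra cores wouldn't help here because the work is almost entirely waiting.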
Linux supposedly scales to 1024 cores or something like that. This isn't about what it supposedly scales to, but about the performance impact of actually trying to use that many cores.
The K42 project at IBM Research investigated the benefit of a complete OS rewrite with scalability to very large SMP systems in mind. This is an open source operating system supporting a Linux-compatible API and ABI.
Their target systems, "next generation SMP systems", back in 2003 seem to have become the current generation of SMP/multi-core systems in the meantime.
Tilera Corp. already has a CPU architecture with 16-100 cores per chip.
TILE-Gx family
Support for these is already being included in the mainline kernel.
...there is some time left to come up with a new Linux (Windows?).
Windows, the new Linux.
You read it here first...
XKCD:Xeric Knowledge Comically Dispen
(But a non-proprietary NVIDIA driver will still not play your Flash movies smoothly. :P)
Let's drive the greenhorn OUT! No filthy high UIDs with their spelling and grammar and solid, well-researched, non-sensationalist writing. I want my editors to rape the language (bonus points if it is several languages at once) and send my heart racing by raising my bile and fear of the unknown and known.
Headlines sell adverts. Truth, accuracy, honesty do not. Accept it, you are reading slashdot, it works.
MMO Quests are like orgasms:
You may solo them, I prefer them in a group.
Um, no. The early Itanium-based Altixes (Altices?) could go up to 512 cores running a single copy of Linux. The new Nehalem-based Altixes can have up to 2048 cores in a single system image IIRC. We just finished acceptance testing on an SGI Altix UV 1000 with 1024 cores. It runs one copy of Linux on it.
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
The point isn't that NT scales to 256 cores; the point is how efficient it is when scaling to this many processors. The NT kernel in Win7 was adjusted so that systems with 64 or 256 CPUs have very low overhead handling the extra processors.
Linux in theory (just like NT in theory) can support several thousand processors, but there is a level at which this becomes inefficient, as the overhead of managing the additional processors saturates a single system. (Hence other multi-SMP models are often used instead of a single 'system'.)
Just simply Google/Bing: windows7 256 Mark Russinovich
You can find nice articles and even videos of Mark talking about this in everyday terms to make it easy to understand.