Slashdot Mirror


Linux May Need a Rewrite Beyond 48 Cores

An anonymous reader writes "There is interesting new research coming out of MIT which suggests current operating systems are struggling with the addition of more cores to the CPU. It appears that the problem, which affects the available memory in a chip when multiple cores are working on the same chunks of data, is getting worse and may be hitting a peak somewhere in the neighborhood of 48 cores, when entirely new operating systems will be needed, the report says. Luckily, we aren't anywhere near 48 cores and there is some time left to come up with a new Linux (Windows?)."

109 of 462 comments

  1. Original Source and Actual Paper by eldavojohn · · Score: 5, Informative

    It appears that the problem, which affects the available memory in a chip when multiple cores are working on the same chunks of data, is getting worse and may be hitting a peak somewhere in the neighborhood of 48 cores, when entirely new operating systems will be needed, the report says.

    Seriously? You picked that over my submission?

    I submitted this earlier this morning; I guess my submission was lacking. But if you're interested in the original MIT article and the actual paper (PDF):

    eldavojohn writes "Multicore (think tens or hundreds of cores) will come at a price for current operating systems. A team at MIT found that as they approached 48 cores their operating system slowed down. After activating more and more cores in their simulation, a sort of memory leak occurred whereby data had to remain in memory as long as a core might need it in its calculations. But the good news is that in their paper (PDF), they showed that for at least several years Linux should be able to keep up with chip enhancements in the multicore realm. To handle multiple cores, Linux keeps a counter of which cores are working on the data. As a core starts to work on a piece of data, Linux increments the number. When the core is done, Linux decrements the number. As the core count approached 48, the amount of actual work decreased and Linux spent more time managing counters. But the team found that 'Slightly rewriting the Linux code so that each core kept a local count, which was only occasionally synchronized with those of the other cores, greatly improved the system's overall performance.' The researchers caution that as the number of cores skyrockets, operating systems will have to be completely redesigned to handle managing these cores and SMP. After reviewing the paper, one researcher is confident Linux will remain viable for five to eight years without need for a major redesign."

    I don't know, guess I picked a bad title or something?

    Luckily we aren't anywhere near 48 cores and there is some time left to come up with a new Linux (Windows?).

    Again, seriously? What does "(Windows?)" even mean? As you pass a certain number of cores, modern operating systems will need to be redesigned to handle extreme SMP. It's going to differ from OS to OS but we won't know about Windows until somebody takes the time to test it.

    --
    My work here is dung.
    1. Re:Original Source and Actual Paper by VorpalRodent · · Score: 4, Funny

      What does "(Windows?)" even mean?

      I read that as saying "Windows is the new Linux!". Clearly the submitter is trying to incite violence in the Slashdot community.

      --
      Take it to the limit, everybody to the limit, come on, everybody fhqwhgads.
    2. Re:Original Source and Actual Paper by Dragoniz3r · · Score: 4, Interesting

      Oh look, CmdrTaco published yet another story with a poorly-written, hypersensationalist summary! Par for the course.

    3. Re:Original Source and Actual Paper by klingens · · Score: 5, Interesting

      Yes it is lacking: it's too long for a /. "story". Editors want small, easily digested soundbites, not articles with actual information.

    4. Re:Original Source and Actual Paper by eudaemon · · Score: 5, Informative

      I just laughed at the "we aren't anywhere near 48 cores" comment - there are already commercial products with more than 48 cores now. I mean even a crappy old T5220 pretends to have 64 CPUs due to the 8 CPU, 8 thread design.

    5. Re:Original Source and Actual Paper by Anonymous Coward · · Score: 2, Informative

      I don't know, guess I picked a bad title or something?

      No. Your summary was too long.

      Seriously, the purpose of a summary is not to include every last fact and detail mentioned in the article; it's to give the reader enough information to decide whether reading the full article is worth it. Don't try to put everything in there.

    6. Re:Original Source and Actual Paper by Skal+Tura · · Score: 3, Interesting

      Scare piece.

      Your submission wasn't scary enough. From your submission, it seems that it's not that big of a deal and has a rather easy solution. This submission makes it sound like the Linux kernel needs a complete ground-up rewrite, as in starting from scratch.
      Plus yours was a bit long and had lots of details.

    7. Re:Original Source and Actual Paper by WinterSolstice · · Score: 3, Informative

      Got a pile of AIX servers here like that:
      http://www-03.ibm.com/systems/power/hardware/780/index.html

      I was kind of wondering about the "modern operating systems" comment... I think he meant "desktop operating systems".
      Many of the big OS vendors (IBM, DEC (now HP), CRAY, etc) are well beyond this point. Even OS/2 could scale to 1024 processors if I recall correctly.

      --
      An operating system should be like a light switch... simple, effective, easy to use, and designed for everyone.
    8. Re:Original Source and Actual Paper by Skal+Tura · · Score: 2, Informative

      Never mind that; take quite a standard server, a dual Xeon 6-core with HT... the total reported CPU count is 24, and it's widely used and nothing special.

    9. Re:Original Source and Actual Paper by Perl-Pusher · · Score: 4, Insightful

      Core != CPU

    10. Re:Original Source and Actual Paper by TheRaven64 · · Score: 3, Interesting

      And it's worth noting that the most common application for that kind of machine is to partition it and run several different operating systems on it. Solaris has already had some major redesign work for scaling that well. For example, the networking stack is partitioned both horizontally and vertically. Separate connections are independent except at the very bottom of the stack (and sometimes even then, if they go via different NICs), and each layer in the stack communicates with the ones above it via message passing and runs in a separate thread.

      However, it sounds like this paper is focussing on a very specific issue: process accounting. To fairly schedule processes, you need to work out how much time they have spent running already, relative to others. I'm a bit surprised that Linux actually works as they seem to be describing, since their 'change' was to make it work in the same way as pretty much every other SMP-aware scheduler that I've come across; schedule processes on cores independently and periodically migrate processes off overloaded cores and onto spare ones.
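
      The per-core scheduling described in the paragraph above (independent run queues, with periodic migration off overloaded cores) can be sketched roughly as follows. This is a toy model and not Linux's scheduler: the `rebalance` function, the task names, and the policy of moving one task at a time from the busiest queue to the idlest are all assumptions made for illustration.

```python
from collections import deque


def rebalance(run_queues):
    """Move tasks from the busiest queue to the idlest until lengths differ by at most 1.

    Returns the number of migrations performed.
    """
    migrations = 0
    while True:
        busiest = max(run_queues, key=len)
        idlest = min(run_queues, key=len)
        if len(busiest) - len(idlest) <= 1:
            return migrations
        idlest.append(busiest.pop())  # migrate the most recently queued task
        migrations += 1


# Four hypothetical cores; all ten tasks start on core 0.
queues = [deque(f"task{i}" for i in range(10)), deque(), deque(), deque()]
moved = rebalance(queues)
print([len(q) for q in queues], "after", moved, "migrations")
```

      Between calls to `rebalance`, each core would only look at its own queue, which is the point: the shared bookkeeping happens occasionally rather than on every scheduling decision.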

      There are lots of potential bottlenecks. The one I was expecting to hear about was cache contention. In a monolithic kernel, there are some data structures that must be shared among all cores, and every time you do an update on one core you must flush the caches on all of them, which can start to hurt performance when you have lots of concurrent updates. A few important data structures in the Linux kernel were rewritten in the last year to ensure that unrelated portions of them ended up in different cache lines, to help reduce this.
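
      To make the "different cache lines" point concrete, here is a small layout sketch using Python's `ctypes`. It only inspects field offsets; it does not measure contention. The 64-byte line size, the struct names, and the `hits`/`misses` fields are assumptions for illustration, and the result only holds if the struct itself starts on a cache-line boundary.

```python
import ctypes

CACHE_LINE = 64  # assumed cache-line size in bytes; common on x86, not universal


class Adjacent(ctypes.Structure):
    # Two unrelated counters sit next to each other, so they share a line.
    _fields_ = [("hits", ctypes.c_uint64), ("misses", ctypes.c_uint64)]


class Padded(ctypes.Structure):
    # Padding pushes each counter onto its own cache line.
    _fields_ = [
        ("hits", ctypes.c_uint64),
        ("pad0", ctypes.c_char * (CACHE_LINE - 8)),
        ("misses", ctypes.c_uint64),
        ("pad1", ctypes.c_char * (CACHE_LINE - 8)),
    ]


def cache_line_index(struct_type, field_name):
    """Cache line a field falls in, assuming the struct starts on a line boundary."""
    return getattr(struct_type, field_name).offset // CACHE_LINE


for layout in (Adjacent, Padded):
    lines = [cache_line_index(layout, name) for name in ("hits", "misses")]
    print(layout.__name__, "cache lines:", lines, "size:", ctypes.sizeof(layout))
```

      The padded layout trades memory (128 bytes instead of 16 here) for the property that a core updating `hits` does not invalidate the line holding `misses` on other cores.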

      Even then, it's not a problem that's easy to solve at the software level. Hardware transactional memory would go a long way towards helping us scale to 128+ processors, but the only chip I know of to implement it (Sun's Rock) was cancelled before it made it into production.

      --
      I am TheRaven on Soylent News
    11. Re:Original Source and Actual Paper by NevarMore · · Score: 4, Interesting

      The thing is eldavojohn practically *is* an editor for /. , just check out his submission page. Despite having such a high UID he's got a solid reputation, a good writing style, and offers good commentary on a wide variety of topics.

    12. Re:Original Source and Actual Paper by interkin3tic · · Score: 2, Insightful

      I don't know, guess I picked a bad title or something?

      Slashdot: dramatically overstated news for nerds... since that seems to be the evolution of news services for some reason?

      I'm working on a submission: Fox news just had a bit about the internet, I'm assuming that their headline is something like "WILL USING OBAMANET 'IPv6' KILL YOU AND MAKE YOUR CHILDREN TERRORISTS?"

    13. Re:Original Source and Actual Paper by Dahamma · · Score: 2, Informative

      the purpose of a summary is not to include every last fact and detail mentioned in the article; it's to give the reader enough information to decide whether reading the full article is worth it.

      If you think a summary can actually help get a /. reader to RTFA, you must be new here...

    14. Re:Original Source and Actual Paper by RCGodward · · Score: 3, Funny

      Don't bother checking the box, RMS, we know who it is.

    15. Re:Original Source and Actual Paper by BeardedChimp · · Score: 4, Informative

      The purpose of an editor is to edit any submissions to make them ready for print.

      If the summary was too long, the editor should have got off his arse rather than wait for the summary that fits the word count to come along.

    16. Re:Original Source and Actual Paper by Anonymous Coward · · Score: 3, Informative

      OS/2's SMP support is a joke. I'm sure that somewhere in that tangle is a comment like "up to 1024 processors". But it's as relevant as a sticker on a Ford Cortina warning not to exceed the speed of sound.

      Officially the SMP version of OS/2 "Warp Server" supported 64 processors. In practice anything other than an embarrassingly parallel task would see rapidly diminishing returns after just a couple of CPUs. The stuff that this article is moaning about, that Linux doesn't do well enough on 48 CPUs? OS/2 doesn't even attempt it, the official docs just say to "avoid" such things. This test case on 48 CPUs on OS/2 would just leave the OS constantly thrashing trying to move pages from one CPU to another, and no work being done.

      Now maybe if OS/2 had been a huge success, and IBM were now the dominant OS vendor on the desktop, there'd be a 1024 CPU version of OS/2 today. But in our reality, where OS/2 support was gradually abandoned and handed over to an underfunded little independent outfit, it sucks on SMP.

    17. Re:Original Source and Actual Paper by Anonymous Coward · · Score: 2, Insightful

      Wow, really just wow you sir are the cream of the crop! /sarcasm

      the OP has a very valid point, i come to read about technology news on slashdot not scare pieces with little or no information or value, his post was far superior in every respect and yet got passed over for this garbage post. And you devalue his point further by not even giving him the time of day, way to go asshole.

    18. Re:Original Source and Actual Paper by Wonko+the+Sane · · Score: 3, Insightful

      Your summary was too long.

      Yes, but the submission that got accepted has a bullshit headline.

      Of course "Linux May Need to Continue Making Incremental Changes Like It Has Been Doing For The Last Several Years To Scale Beyond 48 Cores" doesn't draw in as many clicks.

    19. Re:Original Source and Actual Paper by X0563511 · · Score: 2, Informative

      I've seen longer stories about lamer things get published...

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    20. Re:Original Source and Actual Paper by spazdor · · Score: 2, Funny

      Fuck, dude, we hurd you the first time. and "GNU Plus Linux" is terrible marketing.

      --
      DRM: Terminator crops for your mind!
    21. Re:Original Source and Actual Paper by Captain+Splendid · · Score: 4, Insightful

      Which is why he's treated like shit: Can't have any kind of excellence here, Taco wants to keep that old-school newsgroup feel. That's the only explanation that still fits.

      --
      Linux, you magnificent bastard, I read the fucking manual!
    22. Re:Original Source and Actual Paper by aywwts4 · · Score: 2, Informative

      If it is any consolation this straw is the one that broke the RSS feed's back.

      I have unsubscribed from Slashdot today due to the trend typified in your article VS the one published. (No this is not a new trend, but I'm fed up and finished with it.) See you on Reddit's Science/Linux/Everything else

      --
      Web Developers: Celebrate to our roots! Animated Gifs and Tiled Backgrounds, dont let our history die!
    23. Re:Original Source and Actual Paper by Unequivocal · · Score: 2, Insightful

      Elaborate please. I'm ignorant and curious.

    24. Re:Original Source and Actual Paper by Anonymous Coward · · Score: 2, Insightful

      Oh look, CmdrTaco published yet another story with a poorly-written, hypersensationalist summary! Par for the course.

      Remember back when the slashdot "editors" were part of the community and would actually respond to site concerns raised by users? I haven't seen ANY "editor" post a reply to any slashdot user post in friggin YEARS. Good luck with getting their attention these days if you aren't an advertiser.

    25. Re:Original Source and Actual Paper by Lumpy · · Score: 2, Interesting

      you are not in the club of liked submitters. Honestly the number of crap submissions that get picked over well thought out and very well cited ones is nuts to the point that I simply stopped submitting stories here. It's a waste of time.

      --
      Do not look at laser with remaining good eye.
    26. Re:Original Source and Actual Paper by UnknowingFool · · Score: 2, Funny

      it's to give the reader enough information to decide whether reading the full article is worth it.

      We are supposed to read the articles? Why didn't anyone tell me about this before?!!

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
    27. Re:Original Source and Actual Paper by spazdor · · Score: 5, Insightful

      The very act of summarization constitutes an act of commentary. You're saying "I think the pertinent parts of this story are these, and the most important questions raised are those."

      A good summary invites commentary and frames the questions in a way which makes for better discussion, but don't for a second imagine the OP ought to be value-neutral (if such a thing could even exist.)

      --
      DRM: Terminator crops for your mind!
    28. Re:Original Source and Actual Paper by monkeySauce · · Score: 4, Informative

      The article is about cores per chip, not cores per system.

      You're trying to compare a 48-cylinder engine with a bunch of 4-cylinder engines working together.

    29. Re:Original Source and Actual Paper by Gilmoure · · Score: 3, Funny

      Wait, Macs don't suck?

      --
      I drank what? -- Socrates
    30. Re:Original Source and Actual Paper by jgagnon · · Score: 3, Interesting

      You can also think of it as the difference between rooms and buildings. Multiple cores may exist in a single CPU just like multiple rooms may exist in a building. Getting around between rooms in the same building isn't such a big deal. But getting from Room A in Building 1 to Room B in Building 2 requires you to leave Building 1 and then enter Building 2, which takes more time. Some motherboards support multiple CPUs (buildings) but most do not. Those that do are usually more expensive than the ones that support only a single CPU.

      --
      Remember to maintain your supply of /facepalm oil to prevent chafing.
    31. Re:Original Source and Actual Paper by bberens · · Score: 2, Informative

      A CPU can contain multiple cores which share Level 2 cache. Conversely a multi-CPU system has multiple complete CPUs which do not share their L2 cache.

      --
      Check out my lame java blog at www.javachopshop.com
    32. Re:Original Source and Actual Paper by Surt · · Score: 3, Informative

      This is not Amdahl's law, this is the dispatcher being inefficient.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    33. Re:Original Source and Actual Paper by Anpheus · · Score: 3, Interesting

      And multiple threads per core can be thought of as say, movable dividers in rooms. Yeah, it's really one room, but you can divide it into 2 "sort of", and it doesn't really mean you have twice as many rooms, but there are certain benefits you can get from doing so.

    34. Re:Original Source and Actual Paper by drsmithy · · Score: 4, Insightful

      I was kind of wondering about the "modern operating systems" comment... I think he meant "desktop operating systems".

      What's a "desktop operating system" these days? The only mainstream OS that hasn't seen extensive use and development in SMP server environments for a decade plus is OS X. For all the others, "desktop" vs "server" is just a matter of the bundled software and kernel tuning.

      Even OS/2 could scale to 1024 processors if I recall correctly.

      Yeah. Just like those old PPC Macs were "up to twice as fast" as a PC.

    35. Re:Original Source and Actual Paper by jgagnon · · Score: 3, Informative

      To elaborate slightly further... If you had two CPUs on your motherboard with 8 cores each and four threads of execution per core, you'd have a total of: 2 CPUs, 16 cores, and 64 threads of execution.
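
      The same arithmetic as a tiny helper, in case it helps keep the three levels apart. The function name and parameter names are made up for illustration, and it assumes a symmetric machine where every socket and every core is identical.

```python
def hardware_threads(sockets, cores_per_socket, threads_per_core):
    """Return (total cores, total hardware threads) for a symmetric machine."""
    cores = sockets * cores_per_socket
    return cores, cores * threads_per_core


cores, threads = hardware_threads(sockets=2, cores_per_socket=8, threads_per_core=4)
print(f"2 CPUs, {cores} cores, {threads} threads of execution")
```

      The same helper covers the other machines mentioned in this thread: one 8-core chip with 8 threads per core reports 64 "CPUs", and a dual 6-core Xeon with two threads per core reports 24.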

      --
      Remember to maintain your supply of /facepalm oil to prevent chafing.
    36. Re:Original Source and Actual Paper by hardburn · · Score: 4, Insightful

      Trolling, I'm sure, but to people who take "GNU/Linux" seriously: how much of any given distro is really GNU code anymore? While GNOME may still be preferred by Ubuntu, there are also a lot of Kubuntu users, and many other distros seem to prefer KDE. Neither XFree86 nor X.Org were ever GNU. Smaller installations, like smartphones and home gateways (which often do run Linux, even if you can't install a custom version like DD-WRT), use busybox for their basic command line tools, and almost certainly do not use glibc. Debian even went for the eglibc fork, partially because Ulrich Drepper makes Theo de Raadt look like a nice guy. HURD has gone nowhere for 20 years now, even if it does have some neat ideas.

      Non-GNU GUI applications and libraries now make up a huge percentage of a desktop distro, Apache and custom web apps make up a big chunk of server code, and smartphones may or may not have any GNU code at all.

      So what's left of GNU code now? Well, gcc is likely to keep being the world's de facto C compiler (though even this was mainly because of the egcs fork way back when). I'm sure there will be legions of emacs users for years to come, and I guess a lot of people still prefer GNOME. GNU's basic command line tools and bash will no doubt still be used on servers and desktops. But is this really sufficient to warrant a "GNU/Linux" nomenclature, not to mention all the pedantry that surrounds it?

      To the AnonCow troll above: GNU code has nothing to do with how the kernel handles multicore processors, so your whole point is moot within this context.

      --
      Not a typewriter
    37. Re:Original Source and Actual Paper by Old97 · · Score: 2, Funny

      Wow, you've convinced me. I'm canceling all my plans to migrate to OS/2. Thanks.

      --
      Very often, people confuse simple with simplistic. The nuance is lost on most. - Clement Mok
    38. Re:Original Source and Actual Paper by bn557 · · Score: 2, Informative

      Cores often share cache. Separate CPUs rarely do. The problem in this case is, when you approach 48 Cores in 1 CPU, the accounting task for the cache users starts growing out of proportion to the performance gain from adding cores.

      --
      Humans are slow, innaccurate, and brilliant; computers are fast, acurrate, and dumb; together they are unbeatable
    39. Re:Original Source and Actual Paper by mitgib · · Score: 2, Informative

      You can have 48 cores today with a Quad G34 motherboard.

      --
      Being a spelling & grammar Nazi is a sign you do not poses the intelligence to contribute to the conversation
    40. Re:Original Source and Actual Paper by CRCulver · · Score: 3, Insightful

      Debian (and I suppose Ubuntu too) makes use of a lot of Bash scripts behind the scenes. Grub is still the boot loader of choice. A lot of installation CDs use parted to set up the hard drive. Just some examples off the top of my head.

    41. Re:Original Source and Actual Paper by dAzED1 · · Score: 5, Informative

      and YET...that's irrelevant, because as many people have pointed out the problem is the cores that share L2 cache. There have been large systems with many, many processors for a long time, some of which run Linux. The problem that was described was 48 cores on a single die, sharing the same cache. Sun's die-to-die tech isn't relevant to this problem, nor is putting more than 6 8-core CPUs in a single system.

    42. Re:Original Source and Actual Paper by mlts · · Score: 3, Informative

      I saw earlier today on another news site a post about something similar saying that no OS commercially made can support more than 32 cores.

      One of the followup postings was someone with an IBM 780 doing a prtconf|grep proc and showing 64 virtual processors on an LPAR. AIX supports up to 256 CPUs (physical or virtual.) I'm sure Solaris can do similar without breaking a sweat.

    43. Re:Original Source and Actual Paper by dgatwood · · Score: 4, Informative

      Well, gcc is likely to keep being the world's de facto C compiler (though even this was mainly because of the egcs fork way back when).

      Actually, I doubt that is true. At this point, the commercial UNIX vendors and the BSDs seem to be putting their weight behind Clang/LLVM/LLDB, in large part due to GCC going GPLv3. In addition to being a cleaner architecture that's easier to enhance than GCC, it is also faster, and it often produces much better code as well. The GNU toolchain's days as the de facto standard are numbered, IMHO.

      Back on topic, it occurs to me that large clusters with hundreds of cores start to inherently behave a lot more like NUMA and really need to be treated that way. Note that lots of modern OSes, including Linux, have supported NUMA in the past, so suggesting that it requires a completely rewritten OS is a preposterous assertion. That's not at all what this article is saying. What this article is saying is that tasks often are not easily divisible into tasks small enough to take advantage of multiple cores, and that managing processor affinity to ensure that threads working on the same data are run on the cores within the same physical die starts to become an unmanageable problem past a certain point.

      In effect, what it is saying is that barring interconnect improvements, for many classes of problems, the performance penalty caused by multiple cores needing to access the same data exceeds the performance gain from adding additional cores at or around 48 cores. No OS change will help this, and in many cases, no software changes can help this, either. Most computing tasks are simply not massively parallelizable. This conclusion should be entirely expected by anybody who has ever tried to parallelize software to any real degree, but it's always good to see studies that bear out.

      Put another way, once you exceed about 48 cores, the cores start to act more like clusters than cores. You start to see more and more accesses in which one CPU has to force data out of another CPU's cache. The nonuniformity of memory accesses starts to dominate the access times. Thus, past about that point (and probably much lower for most problems), adding more cores no longer improves performance. Even for massively parallelizable problems like video compression, once you exceed a certain number of nodes doing the work, the time spent assembling the final data actually exceeds the performance win achieved by adding additional processing nodes. This is completely straightforward, completely understood by real-world computer programmers, and shouldn't really be a surprise to anyone.

      I'm not convinced an OS change can fix this, nor even an architectural change, though both can help to some degree by making parallelization easier (e.g. by providing APIs for supporting work units arranged in a dependency graph like GCD as an alternative to raw thread-based APIs). At some point, though, you're bounded by the number of distinct pieces that a problem can be divided into that don't depend on the output of any other piece, and once you hit that limit, adding additional computational units can only hinder performance, not help it. Your only real choices, then, are to find new and interesting ways to refactor the problem so that this is no longer the case, to change the structure of the input data to remove dependencies, to increase the speed of the individual CPU cores, or to turn the machines loose processing more than one problem at any given time to keep the remaining cores occupied.
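
      The shape of that argument (a fixed serial part, a part that divides across cores, and a sharing cost that grows with core count) can be sketched with a toy model. Everything here is an assumption for illustration: the linear overhead term, the parameter values, and the function names are invented, and nothing is taken from measurements in the paper. Where the curve turns over depends entirely on the numbers plugged in.

```python
def run_time(cores, serial=5.0, parallel=95.0, overhead_per_core=0.1):
    """Toy run time: fixed serial part + divided parallel part + sharing overhead.

    All three parameters are made-up units chosen for illustration.
    """
    return serial + parallel / cores + overhead_per_core * (cores - 1)


def best_core_count(max_cores, **params):
    """Core count in 1..max_cores with the lowest modelled run time."""
    return min(range(1, max_cores + 1), key=lambda n: run_time(n, **params))


for n in (1, 8, 16, 32, 64, 128):
    print(f"{n:4d} cores: speedup {run_time(1) / run_time(n):.2f}x")
print("fastest at", best_core_count(128), "cores under these assumptions")
```

      With the overhead term set to zero the model reduces to plain Amdahl-style scaling and more cores never hurt; it is the per-core sharing cost that makes the curve peak and then fall.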

      Oh, yeah, and there's one other change that helps a lot: keep your read-only data in read-only pages, and write your code so that results go somewhere else. Read-only pages can be cached in every CPU without any real cache coherency overhead, at least in theory (I'm assuming that most modern CPUs do this), which means that input data sharing between CPUs doesn't matter. This design, combined with lockless work unit APIs, can make a huge difference in how many CPU

      --
      Check out my sci-fi/humor trilogy at PatriotsBooks.

    44. Re:Original Source and Actual Paper by DrgnDancer · · Score: 3, Informative

      SGI runs Single System Image Linux systems with over 1000 cores, that's not the problem. If you read the article it seems that they aren't talking about the number of cores in the system, they're talking about the number of cores on a chip. Multicore chips use shared caches. The problem is that the algorithms used to handle CPU caching don't scale to really huge numbers of cores sharing the cache in a single chip. Having 4X16 core chips will work fine, having a single 64 core chip will present difficulties. At least that's how I understand the article.

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    45. Re:Original Source and Actual Paper by The+Wild+Norseman · · Score: 2, Funny

      SGI runs Single System Image Linux systems with over 1000 cores, that's not the problem.

      640 cores should be enough for anyone.

      --
      "A government is a body of people usually -- notably -- ungoverned." -Shepherd Book
    46. Re:Original Source and Actual Paper by joib · · Score: 4, Informative

      Unfortunately, the summary as well as the short articles on the web were more or less completely missing the point. The actual paper ( http://pdos.csail.mit.edu/papers/linux:osdi10.pdf ) explains what was done.

      Essentially they benchmarked a number of applications, figured out where the bottlenecks were, and fixed them. Some of the things they fixed were done by introducing "sloppy counters" in order to avoid updating a global counter. Others were to switch to more fine-grained locking, switching to per-cpu data structures, and so forth. In other words, pretty standard kernel scalability work. As an aside, a lot of the VFS scalability work seems to clash with the VFS scalability patches by Nick Piggin that are in the process of being integrated into the mainline kernel.

      And yes, as the PDF article explains, the Linux cpu scheduler mostly works per-core, with only occasional communication with schedulers on other cores.
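
      For anyone wondering what a "sloppy counter" looks like, here is a simplified sketch of the idea: each core keeps a local count and only occasionally folds it into the shared one. This is in the spirit of the technique rather than the paper's implementation; the class name, the `threshold` parameter, and the use of Python threads to stand in for cores are all assumptions. Because of Python's global interpreter lock it shows the bookkeeping only, not any speedup.

```python
import threading


class SloppyCounter:
    """Per-core local counts that are folded into a shared total only occasionally."""

    def __init__(self, num_cores, threshold=64):
        self.threshold = threshold      # local drift allowed before syncing
        self.local = [0] * num_cores    # one slot per (hypothetical) core
        self.shared = 0                 # the contended central counter
        self.lock = threading.Lock()    # guards the shared total

    def add(self, core, delta=1):
        # Fast path: touch only this core's slot, no lock taken.
        self.local[core] += delta
        if abs(self.local[core]) >= self.threshold:
            self._flush(core)

    def _flush(self, core):
        # Slow path: push the local count into the shared total.
        with self.lock:
            self.shared += self.local[core]
            self.local[core] = 0

    def read_exact(self):
        # Exact only while no core is updating; it has to visit every slot.
        with self.lock:
            return self.shared + sum(self.local)


def worker(counter, core, n):
    for _ in range(n):
        counter.add(core)


counter = SloppyCounter(num_cores=4)
threads = [threading.Thread(target=worker, args=(counter, c, 10_000)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.read_exact(), "total;", counter.shared, "already in the shared counter")
```

      The trade-off is visible in `read_exact`: increments become cheap and local, while getting a precise value means walking every core's slot, so this suits counters that are updated far more often than they are read.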

    47. Re:Original Source and Actual Paper by amorsen · · Score: 2, Insightful

      I'm willing to bet that when mainstream 64-core general-purpose CPUs arrive, they will be NUMA and be partitioned in groups with shared cache. I will be surprised if all the cores have a shared cache other than possibly a large slow write-through level-4 cache. It would be very tricky to make an efficient modern cache with deferred writeback and access by 64 cores, and the gains over e.g. 4 smaller caches would be modest. The memory bandwidth requirements of a 64-core chip also make it very tempting to implement separate memory controllers for groups of cores instead of needing an extremely fast shared memory controller.

      So all in all, I think a very fast desktop tomorrow will look like a shrunk version of a modern NUMA server, at least when it comes to what the operating system can see.

      --
      Finally! A year of moderation! Ready for 2019?
    48. Re:Original Source and Actual Paper by GooberToo · · Score: 2, Insightful

      Completely agree.

      Of course, this all ignores the fact that Linux already scales well beyond 48 cores. Even more so, it appears the group is confusing bus contention for OS scalability. The problem is, using modern CPUs (cores), they are sharing caching, which is all too frequently the real problem. The shared cache leads to cache contention.

      Linux, right now, is capable of scaling well beyond 128 cores (err...cpus)...and more... It's just not standard code because the overhead is less optimal for 99.999% of the current user base. Basically this boils down to, Windows scales poorly. I've not met anyone who doesn't already know this.

      Long story short, News at 11, a story everyone already knows. No new news is now news. Basically they documented what everyone already knows for almost a decade now.

    49. Re:Original Source and Actual Paper by im_thatoneguy · · Score: 3, Informative

      The original summary was lacking but the alternative proposed summary was WAY too long.

      It's just supposed to pique my interest enough to read the article, not run several pages.

    50. Re:Original Source and Actual Paper by gtall · · Score: 2, Funny

      Nah, jokes like "64xx should be enough for anybody" actually suck the humor out of the reader due to old age. The GP is probably an inmate in the Home for the Terminally Bland.

    51. Re:Original Source and Actual Paper by Kumiorava · · Score: 2, Insightful

      I just read the original article that said they used 8 6-core processors to _simulate_ a 48-core processor. It would be hard to experiment on a real 48-core processor as those are not readily available.

    52. Re:Original Source and Actual Paper by Bob-taro · · Score: 2, Insightful

      Think of it in terms of cars. The processes are roads, the CPUs are cars and the cores are the seats in the cars, only the seats can each travel on different roads independently and share resources with the other seats in the same car. If you have a 2-seater and the seats are on different roads, they can obviously only go half as fast as if they are on the same road. Now if you have 48 seats in a car, then it isn't a car anymore, it's a bus, so obviously you'd have to make fundamental changes to the OS.

      When it comes to computers, you can never go wrong with a car analogy.

      --
      Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.
    53. Re:Original Source and Actual Paper by mjwx · · Score: 2, Funny

      The thing is eldavojohn practically *is* an editor for /. , just check out his submission page. Despite having such a high UID he's got a solid reputation, a good writing style, and offers good commentary on a wide variety of topics.

      Which is exactly why he _can't_ be a /. editor.

      --
      Calling someone a "hater" only means you can not rationally rebut their argument.
    54. Re:Original Source and Actual Paper by wastedlife · · Score: 2, Informative

      While NT was originally supposed to be called OS/2 3.0, it was a new OS developed by Cutler and some other devs from DEC, not continued development of the OS/2 code.

      --
      Said, "It's just like dice but it's got more sides And it tells me who lives and who dies"
  2. Linux already runs on thousands of cores by Chirs · · Score: 2, Insightful

    SGI has some awfully big single-system-image linux boxes.

    I saw a comment on the kernel mailing list about someone running into problems with 16 terabytes of RAM.

    1. Re:Linux already runs on thousands of cores by Gaygirlie · · Score: 4, Interesting

      It's not a question of whether Linux can run on that many cores, but of where the performance regressions appear. Of course it's possible to run Linux on multiple hundreds of cores, but it seems that after 48 cores there is a performance regression and thus all those cores don't benefit as much as they could. That is the issue here.

    2. Re:Linux already runs on thousands of cores by DrgnDancer · · Score: 4, Informative

      I thought this as well, but after more carefully reading the article, I *think* I see what the problem is. It's not really a problem with large numbers of cores in a system, so much as a problem with large numbers of cores on a chip. Since the multicore chips share caches (level 2 cache is shared, level 1 cache isn't IIRC, but I could be wrong) it's actually cache memory where the issue lies. I've worked on single system image SGI systems with 512 cores, but those systems were actually 256 dual core chips. That works fine, and assuming well written SMP code performance scales as you'd expect with number of cores.

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    3. Re:Linux already runs on thousands of cores by TheRaven64 · · Score: 3, Interesting

      SGI has some awfully big single-system-image linux boxes.

      Not really. SGI has big NUMA machines, with a single Linux kernel per node (typically under 8 processors), some support for process / thread migration between nodes, and a very clever memory controller that automatically handles accessing and caching remote RAM. Each kernel instance is only responsible for a few processes. They also have a lot of middleware on top of the kernel that handles process distribution among nodes.

      It's an interesting design, and the SGI guys have given a lot of public talks about their systems so it's easy to find out more, but it is definitely not an example of Linux scaling to large multicore systems.

      --
      I am TheRaven on Soylent News
    4. Re:Linux already runs on thousands of cores by Gaygirlie · · Score: 2, Interesting

      Since the multicore chips share caches (level 2 cache is shared, level 1 cache isn't IIRC, but I could be wrong) it's actually cache memory where the issue lies.

      That's what I thought too, but after thinking about it a bit more I'd dare to claim it's both a hardware and a software issue. Too small a cache of course does cause issues like the researchers noticed, but it's mostly the way memory accesses and the cache are handled in software that makes it such a big issue. Rethinking how the kernel handles this could very well minimize the impact even in cases where there is not all that much cache available.

      Of course, I'm not an expert in SMP or multi-core systems so I could have verily misunderstood it.

    5. Re:Linux already runs on thousands of cores by Troy+Baer · · Score: 3, Interesting

      Um, no. The early Itanium-based Altixes (Altices?) could go up to 512 cores running a single copy of Linux. The new Nehalem-based Altixes can have up to 2048 cores in a single system image IIRC. We just finished acceptance testing on an SGI Altix UV 1000 with 1024 cores. It runs one copy of Linux on it.

      --
      "My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
  3. Error in their math by El_Muerte_TDS · · Score: 5, Funny

    They have an off-by-one error in their math; it's actually 9 times a 6-core CPU. So, at 42 cores a rewrite is needed.

  4. Enough by wooferhound · · Score: 2, Funny

    640 cores ought to be enough for anybody . . .

    --
    We are Dead Stars looking back Up at the Sky
    1. Re:Enough by Abstrackt · · Score: 2, Funny

      lol i dont see why people cant be happy with a quad it gets the job done if you need more than 4 cores you should just shoot yourself

      I'll hand out guns to the scientists then. Maybe they'll be willing to donate their punctuation to you as well.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
  5. What are they talking about by pclminion · · Score: 4, Insightful

    Can somebody please explain what the fuck they are actually talking about? They've dumbed down the terminology to the point I have no idea what they are saying. Is this some kind of cache-related issue? Inefficient bouncing of processes between cores? What?

    1. Re:What are they talking about by jd · · Score: 5, Informative

      What they are talking about really reduces to a variant of Amdahl's Law, but simply put, scaling is always non-linear. There will be overheads per core for communication (which is why SMP over 16 CPUs is such a headache) and overheads per core within the OS for housekeeping (knowing what core a specific thread is running on, whether it is bound to that core, etc., and trying to schedule all threads to make best use of the cores available).

      The more cores you have, the more state information is needed for a thread and the more possible permutations the scheduler must consider in order to be efficient. Which, in turn, means the scheduler is going to be bulkier.

      (Scheduling is a variant of the bin-packing problem, which is an NP-complete problem, but it has the added catch that you only get a very short time to pack the threads in, and scheduling policies - such as realtime and core-binding - must also be satisfied in addition to packing all the threads in.)
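
      As a rough illustration of the packing view (a toy only, not how the Linux scheduler actually works), here is a first-fit-decreasing heuristic in Python. The function name, the integer "percent of a core" loads, and the capacity of 100 are all assumptions made up for the example.

```python
def first_fit_decreasing(loads, n_cores, capacity=100):
    """Greedily place thread loads (percent of one core) onto cores.

    Returns one list of loads per core, or raises ValueError when the
    heuristic cannot place a thread without exceeding a core's capacity.
    """
    cores = [[] for _ in range(n_cores)]
    used = [0] * n_cores
    # Placing the heaviest threads first tends to leave fewer awkward gaps.
    for load in sorted(loads, reverse=True):
        for i in range(n_cores):
            if used[i] + load <= capacity:
                cores[i].append(load)
                used[i] += load
                break
        else:
            raise ValueError(f"no core has room for a thread needing {load}%")
    return cores


if __name__ == "__main__":
    # Five hypothetical threads packed onto two cores.
    print(first_fit_decreasing([50, 70, 20, 30, 30], n_cores=2))
```

      Greedy heuristics like this run fast but can miss a packing that exists, which is the usual trade-off when the exact problem is NP-complete.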

      The more of this extra data you need, the slower task-switching becomes and the more of the cache you are hogging with stuff not actually tied to whatever the threads are actually doing. At some point, the degradation in performance will exactly equal the increase in performance for the extra cores. The claim is that this happens at 48 cores for modern OSes. This is plausible but it is unclear if it is an actual problem. Those same OSes are used on supercomputers of 64+ cores, by segregating the activities in each node. MOSIX, Kerrighed and other such mechanisms have allowed Linux kernels to migrate tasks from one node to another transparently. (i.e., you don't know or care where the code runs, the I/O doesn't change at all.) The only reason Linux doesn't have clustering as standard is that Linus is waiting for cluster developers to produce a standard mechanism for process migration that also fits within the architectural standards already in use.
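
      The break-even point described above can be sketched with Amdahl's Law plus a per-core overhead term. The linear overhead term and both parameter values below are illustrative assumptions, not measurements from the paper.

```python
def speedup(n_cores, parallel_fraction, overhead_per_core):
    """Amdahl's Law with an assumed linear coordination cost per extra core."""
    serial = 1.0 - parallel_fraction
    run_time = (serial
                + parallel_fraction / n_cores
                + overhead_per_core * (n_cores - 1))
    return 1.0 / run_time


def peak_core_count(parallel_fraction, overhead_per_core, max_cores=256):
    """Return the core count in 1..max_cores with the highest modelled speedup."""
    return max(range(1, max_cores + 1),
               key=lambda n: speedup(n, parallel_fraction, overhead_per_core))


# Hypothetical workload: 99% parallel, and 0.04% of the single-core
# run time lost to coordination for every additional core.
P, C = 0.99, 0.0004
best = peak_core_count(P, C)

if __name__ == "__main__":
    for n in (1, 8, 24, best, 96):
        print(f"{n:3d} cores -> {speedup(n, P, C):5.2f}x")
```

      With these made-up numbers the modelled speedup peaks at a few dozen cores and then falls: past roughly sqrt(P / C) cores, each extra core adds more overhead than it removes work.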

      If you clustered a couple of hundred nodes, each with 48 cores, you're looking at having close to 10,000 cores on the system. It wouldn't take a "rewrite" per se, merely a few hooks and a standard protocol. To support a single physical node with more than 48 cores, you might need to split it into virtual nodes with 48 or fewer cores in each, but Linux already has support for virtualization so that's no big deal either.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  6. Only Linux? by Ltap · · Score: 3, Interesting

    It looks like TFS was written by a Windows fanboy; why mention Linux specifically when it is a general problem? Why try to half-assedly imply that Windows is more advanced than Linux?

    --
    Yet Another Tech Blog
    (but so much more, including game and movie reviews)
    http://yanteb.peasantoid.org
    1. Re:Only Linux? by Attila+Dimedici · · Score: 3, Insightful

      Having read eldavojohn's post that summarizes the article, it appears that the reason to pick out Linux specifically is because that is the OS that the writers of the paper actually tested. Since Windows uses a different system for keeping track of what various cores are doing it is likely that Windows will run into this problem at a different number of cores. However, until someone conducts a similar test using Windows we will not know if that number is more or less than 48.

      --
      The truth is that all men having power ought to be mistrusted. James Madison
    2. Re:Only Linux? by wastedlife · · Score: 2, Informative

      They did not "rewrite the kernel" for 7. They updated the code, just like every other piece of software normally does when it moves from version to version. Rewriting the kernel implies that they tore it down and started over, which is most certainly not true. Vista/2008 is NT version 6.0, 7/2008 R2 is NT version 6.1, not a rewrite.

      --
      Said, "It's just like dice but it's got more sides And it tells me who lives and who dies"
    3. Re:Only Linux? by aardwolf64 · · Score: 2, Insightful

      No, their rewrite is also subject to this issue. Go publicize Windows somewhere else.

      No, it isn't subject to this issue. They removed the dispatcher lock. Go bash Windows somewhere else.

    4. Re:Only Linux? by tibman · · Score: 2, Informative

      The problem isn't scaling to that number of cores but the overhead in doing so. That's what I took from it.

      --
      http://soylentnews.org/~tibman
  7. 64 cores by hansamurai · · Score: 2, Interesting

    At my last job we had a bunch of Sun T5120s which housed 64 cores. So yeah, we are "anywhere near 48".

  8. Jaguar? by MrFurious5150 · · Score: 2, Insightful

    Cray seems to have addressed this problem, yes?

  9. 48 cores? by drunkennewfiemidget · · Score: 4, Funny

    I'm still waiting for Windows to work well on ONE.

    1. Re:48 cores? by Beer+Drunk · · Score: 2, Funny

      Actually I'm rather pleased with Windows 7. It's a great improvement over their last few attempts and other than a few spurious reboots right in the middle of several hours work and often requiring me to force the *&#%$ drives to re-mirror it's not toooo bad. OK, I just use it on this box because there are a couple of programs I like not available for native Linux yet but at least it's not Vista or ME bad.

    2. Re:48 cores? by Abstrackt · · Score: 2, Interesting

      OK, I just use it on this box because there are a couple of programs I like not available for native Linux yet but at least it's not Vista or ME bad.

      One trick in business and politics is to offer a bad choice next to a worse one so it doesn't seem as bad by comparison. Every time I see or hear that comment the conspiracy theorist in me wonders whether ME and Vista were deliberately bad to soften the shock of adjustment to XP and 7.

      --
      They say a little knowledge is a dangerous thing, but it's not one half so bad as a lot of ignorance. - Terry Pratchett
  10. seeing as Linux does 10240 cores already, WTF? by r00t · · Score: 4, Interesting

    No kidding. SGI's Altix is a huge box full of multi-core IA-64 processors. 512 to 2048 cores is more normal, but they were reaching 10240 last I checked. This is SMP (NUMA of course), not a cluster. I won't say things work just lovely at that level, but it does run.

    48 cores is nothing.

    1. Re:seeing as Linux does 10240 cores already, WTF? by Unequivocal · · Score: 3, Informative

      I think specifically they are talking about having 48 cores behind an L2 cache. Or 48 cores on a single die. Multi-CPU boxes generally communicate between CPU dies via the bus and from what little I can gather, that helps reduce or eliminate the issue they're describing.

  11. Re:based on a 1970s OS and language by Anonymous Coward · · Score: 2, Insightful

    UNIX and C were great in their days. But perhaps not in the meg-core era.

    So, what is better in your opinion? Java? Or maybe even ruby? Oh yes, that would be great. Run-time OS reflection through kernel drivers implemented as ruby modules.

    Too bad CPUs don't come with built-in ruby interpreters.

  12. Who uses that by MSDos-486 · · Score: 2, Funny

    http://xkcd.com/619/

  13. Obligatory xkcd reference by zill · · Score: 2, Interesting

    Do they have support for smooth full-screen flash video yet?

    My Ubuntu 10.04 system still can't play embedded youtube videos. At least Adobe provided a work-around by adding a "play on youtube" option in the right click context menu.

    1. Re:Obligatory xkcd reference by diegocg · · Score: 2, Informative

      Yes, you can play smooth full-screen video in Linux with the "Square" preview release (which includes 64 bit support). Full-screen 720p video only uses 30-40% of the CPU on my crappy Intel graphics chip, and it's completely smooth.

    2. Re:Obligatory xkcd reference by Anonymous Coward · · Score: 2, Insightful

      If your Ubuntu 10.04 system can't play embedded youtube videos then you should get off your ass and fix it instead of wasting your time pasting xkcd links. Ubuntu has played flash videos out of the box without a single hitch for years.

    3. Re:Obligatory xkcd reference by RyuuzakiTetsuya · · Score: 2, Funny

      If you have 4,096 CPUs I don't think that "smooth flash playback" is a problem.

      4,095 CPUs however...

      --
      Non impediti ratione cogitationus.
  14. Re:based on a 1970s OS and language by geekoid · · Score: 4, Insightful

    Hahaha. Oh arrogance from ignorance, how I loathe you.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  15. 48 Cores in 1U by kybur · · Score: 2, Informative

    I'm not affiliated with Supermicro in any way, but they have four 1U serverboards designed for the 12 core opterons, so that's 48 cores in a 1U server. I'm guessing that Supermicro is not the only vendor of quad opteron boards supporting the latest chips. There are most likely quite a few of these in use by real people. Anyone want to speak up?

    I know from personal experience that the socket F opterons performed very poorly in an 8 way configuration compared to the previous generation (socket 940 gen). I ran multiple tests on dual core chips (885s, I think), back in 2006 or 7 where I'd get nearly double the performance in going from a quad configuration to an 8 way configuration, but with the socket F breed of chips, there was no performance boost at all, it was like the clock speed was being cut in half and all the threads took twice as long to complete. I saw this behavior again and again, and the motherboard manufacturer that I was testing the chips with told me that it was an issue with the chips themselves. I think this is the reason why 8-way opteron systems are very rare now.

  16. that's crazy by Punto · · Score: 2, Funny

    Nobody's ever going to need more than 640 cores

    --
    Stay tuned for some shock and awe coming right up after this messages!

  17. how is this news? by dirtyhippie · · Score: 3, Insightful

    We've known about this problem for ... well, as long as we've had more than one core - actually as long as we've had SMP... You increase the number of cores/CPUs, you decrease available memory throughput per core, which was already the bottleneck anyway. Am I missing something here?

  18. I don't understand... by Anonymous Coward · · Score: 2, Informative

    I'm trying to understand the point of this article. Do we really need a new paper to say that centralized memory bandwidth is at some point a limiting problem in an SMP environment? Isn't this why we have NUMA?

    If you want to go after linux internals like the BKL more power to you but that horse left the stable a long long time ago as well.

    You could talk about the software problem in dealing with decentralized memory access, synchronization, scalable algorithms...etc but this is all likely something needing to be addressed in application space rather than at the kernel where this paper seems to focus.

    There is no shortage of huge single system image linux systems with thousands of processor cores and not a single one of them uses SMP architecture. They are all NUMA based (decentralized memory access).

  19. Patches available by diegocg · · Score: 3, Informative

    So, they found scalability problems in some microbenchmarks. Well, some of the scalability paths cited in the paper will be fixed when Nick Piggin's VFS scalability patchset gets merged. But it's not like you need to rewrite every operating system to scale beyond 48 cores, it's just the typical scalability stuff, and the kind of scalability issues found these days are mostly corner cases (Piggin's VFS being an exception).

  20. Just the cache problem by Todd+Knarr · · Score: 4, Informative

    What they're saying is basically two things:

    First, there's a bottleneck in the on-chip caches. When a core's working on data it needs to have it in its cache. And if two cores are working on the same block of memory (block size being determined by cache line size), they need to keep their copies of the cache synchronized. When you get a lot of cores working on the same block of memory, the overhead of keeping the caches in sync starts to exceed the performance gains from the additional cores. That's not new, we've known that in multi-threaded programming for decades: when you've got a lot of threads dependent on the same data items, the locking overhead's going to be the killer. And we've known the solution for just as long: code to avoid lock contention. The easiest is to make it so you don't have multiple threads (cores) working on the same (non-read-only) memory at the same time, that just requires some thinking on the part of the developers.

    Second, you only gain from additional cores if there's workload to spread to them usefully. If you've got 8 threads of execution actually running at any given time, you won't gain from having more than 8 cores. And on modern computers often we don't have more than a few threads actually using CPU time at any given moment. The rest are waiting on something and don't need the CPU and, as long as we aren't thrashing execution contexts too badly, they can be ignored from a performance standpoint. To take advantage of truly large numbers of cores, we need to change the applications themselves to parallelize things more. But often applications aren't inherently multi-threaded. Games, yes. Computation, yes. But your average word processor or spreadsheet? It's 99% waiting on the human at the keyboard. You can do a few things in the background, file auto-save and such, but not enough to take advantage of a large number of cores. The things that really take advantage of lots of cores are things like Web servers where you can assign each request to its own core. And no, browsers don't benefit the same way. On the client side there are so (relatively) few requests and network I/O's so slow relative to CPU speed that you can handle dozens of requests on a single core and still have cycles free assuming you use an efficient I/O model. But it all boils down to the developers actually thinking about parallel programming, and I've noticed a lot of courses of study these days don't go into the brain-bending skull-sweat details of juggling large numbers of threads in parallel.
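
    One way to picture the "don't have multiple cores writing the same memory" advice from the first point is a counter split into per-core slots that only occasionally spill into a shared total, similar in spirit to the per-core local counts quoted from the paper in the first comment. The sketch below is a single-threaded Python simulation that only counts how often the shared value would be written; the class name, the threshold of 8, and the workload are assumptions for illustration, and it is not thread-safe or taken from kernel code.

```python
class ShardedCounter:
    """Counter with one local slot per core and a rarely written shared total."""

    def __init__(self, n_cores, threshold=8):
        # In C each slot would typically be padded onto its own cache line.
        self.local = [0] * n_cores
        self.shared = 0            # the contended value every core can see
        self.threshold = threshold
        self.shared_writes = 0     # how often the contended value is touched

    def add(self, core, delta=1):
        """Update only this core's slot; spill once it grows large enough."""
        self.local[core] += delta
        if abs(self.local[core]) >= self.threshold:
            self.shared += self.local[core]
            self.local[core] = 0
            self.shared_writes += 1

    def approximate(self):
        """Cheap read: ignores whatever still sits in the local slots."""
        return self.shared

    def exact(self):
        """Expensive read: has to visit every core's slot."""
        return self.shared + sum(self.local)


def simulate(n_cores=48, increments_per_core=100, threshold=8):
    """Have every core bump the counter the same number of times."""
    counter = ShardedCounter(n_cores, threshold)
    for core in range(n_cores):
        for _ in range(increments_per_core):
            counter.add(core)
    return counter


if __name__ == "__main__":
    c = simulate()
    print("exact:", c.exact(), "approximate:", c.approximate())
    print("shared writes:", c.shared_writes, "instead of", 48 * 100)
```

    The trade-off is that cheap reads are only approximate until the local slots are folded in, so this fits counts that are updated often and read exactly only rarely.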

  21. Re:Sun E10Ks were at 72 cores over a decade ago by jedidiah · · Score: 2, Informative

    An E10K is a glorified network computing cluster.

    It's not what's being discussed at all.

    --
    A Pirate and a Puritan look the same on a balance sheet.
  22. Re:Windows 7 scales to 256 cores by h4rr4r · · Score: 3, Insightful

    Linux supposedly scales to 1024 or something like that. This is not about what they supposedly scale to, but about the performance impact of actually trying to use that many cores.

  23. K42: these problems were already tackled by compudj · · Score: 5, Informative

    The K42 project at IBM Research investigated the benefit of a complete OS rewrite with scalability to very large SMP systems in mind. This is an open source operating system supporting Linux-compatible API and ABI.

    Their target systems, "next generation SMP systems", back in 2003 seem to have become the current generation of SMP/multi-core systems in the meantime.

  24. Re:OpenIndiana?? by h4rr4r · · Score: 2, Informative

    OpenSolaris is dead. Solaris sucks to use without GNU userland anyway and being sued by oracle is no fun. Besides you troll, this would not need a new linux, just some small changes to the current one.

  25. Tilera? by Anonymous Coward · · Score: 3, Informative

    Tilera Corp. already has CPU architecture with 16-100 cores per chip.
    TILE-Gx family

    Support for these is already being included in the mainline kernel.

  26. Slashdot by carrier+lost · · Score: 3, Funny

    ...there is some time left to come up with a new Linux (Windows?).

    Windows, the new Linux.

    You read it here first...

  27. Re:other kernels by Beelzebud · · Score: 2, Interesting

    Possibly, but they still have tons of work to do. I recently installed Arch Hurd http://www.archhurd.org/ just to get some hands on time with the state of the OS, and was kind of surprised at the status. Many things are in place and work correctly, but it's nowhere near something I could say I'd actually want to use on a daily basis.

  28. Already Problematic with 4 cores by chocapix · · Score: 2, Interesting

    Using "cat /proc/cpuinfo" as a benchmark, I can see that my quad core is several times slower with an SMP kernel compared to a non-SMP kernel.

  29. So what the fuck is he doing here then? by SmallFurryCreature · · Score: 5, Funny

    Lets drive the greenhorn OUT! No filthy high UID's with their spelling and gramar and solid well researched non-sensationlist writing. I want my editors to rape the language (bonus points if it is several languages at once) and sent my heart racing by raising my bile and fear of the unknown and known.

    Headlines sell adverts. Truth, accuracy, honesty do not. Accept it, you are reading slashdot, it works.

    --

    MMO Quests are like orgasms:

    You may solo them, I prefer them in a group.

    1. Re:So what the fuck is he doing here then? by icebraining · · Score: 3, Insightful

      Headlines sell adverts. Truth, accuracy, honesty do not. Accept it, you are reading slashdot, it works.

      No, I read /. because of comments like eldavojohn's. If they were to disable the comments I'd unsubscribe it from my feeds immediately.

  30. Re:Well, they better start coding now... by Surt · · Score: 2, Insightful

    PER CPU. As was pointed out in many other comments. Linux has already scaled to thousands of cores across many cpus.

    --
    "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
  31. BeOS! by JonnyO · · Score: 2

    If BeOS had survived this wouldn't be an issue. Cores and threads everywhere! But noooooooo...

  32. BS on not being near 48 cores... I have 34 already by Fallen+Kell · · Score: 2, Informative

    I have 34 systems which have 48 cores already in the server room. These are quad socket systems with 4 AMD 12-core CPUs. So I call BS to the guys who think we have plenty of time, because there are plenty of people deploying these things already.

    --
    We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
  33. You're Welcome! by eldavojohn · · Score: 2, Funny

    However, posting your own post in your own post is a bit excessive, and there could have been better ways to do this than just repost your entire freakin story as the first comment.

    Yo dawg, I heard you liked my post so I put a post inside my post so you could enjoy it while you're enjoying my post!

    --
    My work here is dung.
  34. Re:based on a 1970s OS and language by EvanED · · Score: 2, Interesting

    A language can change nearly overnight to add mechanism for threading.

    Is that why the C and C++ people have spent so long at trying to come up with a memory model that will actually work correctly under concurrent execution? Is that why Java got it wrong the first time?

  35. Re:Windows 7 scales to 256 cores by TheNetAvenger · · Score: 3, Insightful

    The point isn't that NT Scales to 256 cores, the point is how efficient it is when scaling to this many processors. The NT Kernel in Win7 was adjusted so that systems with 64 or 256 CPUs have a very low overhead handling the extra processors.

    Linux in theory (just like NT in theory) can support several thousand processors, but there is a level that this becomes inefficient as the overhead of managing the additional processors saturates a single system. (Hence other multi-SMP models are often used instead of a single 'system')

    Just simply Google/Bing: windows7 256 Mark Russinovich

    You can find nice articles and even videos of Mark talking about this in everyday terms to make it easy to understand.

  36. Re:Windows 7 scales to 256 cores by walshy007 · · Score: 2, Informative

    The point is that the article deals with a simulated theoretical CPU with 48+ cores on a single die with a shared L2 cache.

    The changes made are incremental and I imagine will be dealt with long before this actually becomes an issue when (or if) we get cpus with that many cores on a single die.

    Multi-socket systems are already immune to this the way they are set up; you could have an 8 socket system with each cpu having 8 cores and it would not show the problems shown in the article.

    In other words, business as usual, the kernel gets optimized for hardware that actually exists or will exist in the near future. 48 core single cpus are a few years away, and the changes to accommodate them don't require anything significant so I'm sure it will be dealt with at the time.