Not All Cores Are Created Equal

Posted by kdawson on Monday December 22, 2008 @01:56PM from the working-out-the-kinks dept.

joabj writes "Virginia Tech researchers have found that the performance of programs running on multicore processors can vary from server to server, and even from core to core. Factors such as which core handles interrupts, or which cache holds the needed data, can change from run to run. Such resources tend to be allocated arbitrarily now. As a result, program execution times can vary by up to 10 percent. The good news is that the VT researchers are working on a library that will recognize inefficient behavior and rearrange things in a more timely fashion." Here is the paper, Asymmetric Interactions in Symmetric Multicore Systems: Analysis, Enhancements and Evaluation (PDF).

27 of 183 comments

unsurprising. by Anonymous Coward · 2008-12-22 14:03 · Score: 5, Interesting

Anyone who thinks computers are predictably deterministic hasn't used a computer. There are so many bugs in hardware and software that cause it to behave differently than expected, documented, designed. Add to that inevitable manufacturing defects, no matter how microscopic, and it's unimaginable to find otherwise.
It's like discovering "no two toasters toast the same. Researchers found some toasters browned toast up to 10% faster than others."
1. Re:unsurprising. by Rod+Beauvex · 2008-12-22 14:03 · Score: 5, Funny
  
  It's those turny knobs. They lie.
2. Re:unsurprising. by symbolset · 2008-12-22 14:11 · Score: 5, Funny
  
  You have to buy the one that goes to 11. You know how 10 makes the toast almost totally black? Well, what if you want your toast just a little bit more crispy? What if you want just that little bit more? That's what 11 is for. Those other toasters only go to 10, but this one goes to 11.
  
  --
  Help stamp out iliturcy.
3. Re:unsurprising. by MightyYar · 2008-12-22 14:13 · Score: 4, Funny
  
  I had a Pentium that DEFINITELY went to 11.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
4. Re:unsurprising. by RuBLed · 2008-12-22 14:31 · Score: 5, Funny
  
  mine only went up to 10.998799799
5. Re:unsurprising. by aaron+alderman · 2008-12-22 15:53 · Score: 5, Interesting
  
  Impossible like "xor eax, eax" returning a non-zero value and crashing windows?
6. Re:unsurprising. by $RANDOMLUSER · 2008-12-22 16:04 · Score: 5, Funny
  
  Moral of the story: There's a lot of overclocking out there, and it makes Windows look bad.
  
  Oh. So that's what's been doing it.
  
  --
  No folly is more costly than the folly of intolerant idealism. - Winston Churchill
7. Re:unsurprising. by zappepcs · 2008-12-22 16:06 · Score: 5, Interesting
  
  Actually (sorry, no link), there was a researcher who was using FPGAs and AI code to create simple circuits, but the goal was to have the AI design them. What he found is that due to minor manufacturing defects, the code that was built by AI was dependent on the FPGA it was tested on and would not work on just any FPGA of that specification. After 600 iterations, you'd think it would be good. One experiment went for a long time, and in the end when he analyzed the AI-generated code, there were 5 paths/circuits inside that did nothing. If he disabled any or all of the 5, the overall design failed. Somehow, the AI found that creating these do-nothing loops/circuits caused a favorable behavior in other parts of the FPGA that made for overall success. Naturally that code would not work on any other FPGA of the specified type. It was an interesting read, sorry that I don't have a link.
  
  --
  Support NYCountryLawyer RIAA vs People
8. Re:unsurprising. by johnw · 2008-12-22 20:15 · Score: 3, Informative
  
  A simple Google search for "fpga genetic algorithm" shows up references quite quickly - e.g.
  http://biology.kenyon.edu/slonc/bio3/AI/GEN_ALGO/gen_algo.html
  The only part of the GP story I haven't seen before (and can't find a reference for) is the bit about the design not working on other FPGAs of the same specification. The closest story is that of Adrian Thompson at the University of Sussex who got a circuit with unconnected elements which nonetheless seem to be needed in order for the whole thing to achieve its goal. Nothing about the design only working on specific instances of the FPGA.
9. Re:unsurprising. by sowth · 2008-12-22 20:38 · Score: 3, Insightful
  
  They probably put in the if(1) lines because they were testing various aspects of the program, or maybe some like to turn off various aspects of the program, but don't want to be arsed to write the proper code to select options. I commonly do that in POVray (3d raytracing) scripts when testing, so I don't have to wait for long renders--fog, radiosity, lots of light and such take orders of magnitude more time.
  As for the AI adding crap, it is probably more trying random code than truly thinking about how the code should work. This leads to the useful code intertwined with lots of crap code. Unfortunately, there are programmers who write like this too... (cue funny mod)
  As for the code not working on other FPGAs, maybe the researcher should not use real chips to check the iterations. Better to use a simulated one that conforms to the spec exactly and, wherever the design relies on quirks, dies or sends a signal back to the AI program. Testing after the fact on real chips to verify the AI didn't exploit bugs in the simulator would be more proper procedure.
  Maybe I have too much of a background in theory, but I am not completely sure why the FPGAs would be so different. Is it race conditions? Or is the FPGA being used in some analog way? Or does the circuit depend on the exact timing of some input, so the speed / capacitance of each component makes a huge difference? Or was the poster talking about FPGAs with different specs?
  Crazy things happen when you enter the real world. I remember back when I was in electronics assembly. One would first assume all the solder would wick onto the metal, but the boards would always have tonnes of solder bridges, and we had to carefully examine every component and correct them. Friggin' microprocessors had countless tiny legs too!
10. Re:unsurprising. by raynet · 2008-12-22 21:21 · Score: 4, Funny
  
  I am sure you meant to say: Wow, a joke from 1994.995994999.
  
  --
  - Raynet --> .
who would've guessed... by Eto_Demerzel79 · 2008-12-22 14:03 · Score: 4, Insightful

...programs not designed for multi-core systems don't use them efficiently.
1. Re:who would've guessed... by timeOday · 2008-12-22 16:07 · Score: 4, Insightful
  
  No, the programs are not the problem. The programmer should not have to worry about manually assigning processes to cores or switching a process from one core to another - in fact, there's no way the programmer could do that, since it would require knowing what the system load is, what other programs are running, and physical details (such as cache behavior) of processors not even invented yet. This is all the job of the OS.
multicore dev is fun... much like prison rape! by Shadowruni · 2008-12-22 14:12 · Score: 4, Interesting

The current state of dev sort of reminds me of the issues that Nintendo had with the N64.... a beautiful piece of hardware with (at the time) a God-like amount of raw power, but *REALLY* hard to code for. Hence the really interesting titles for it either came from Rare, who developed on SGI machines (an R10000 drove that beast), or Nintendo, who built the thing.
/yeah yeah, I know the PS1 and Sega Saturn had optical media, and that the media's storage capacity, which led to better and more complex games, was truly what killed the N64.
//bonus capt was arrestor

--
"Chinese Amazons, power armor, laser swords.... things just meant to be." - Shampoo, A Very Scary Bet
1. Re:multicore dev is fun... much like prison rape! by carlzum · 2008-12-22 14:51 · Score: 4, Interesting
  
  I believe the biggest problem with multi-core development is a lack of maturity in the tools and libraries available. Taking advantage of multiple cores requires a lot of thread management code, which is fine for highly optimized applications but deters run-of-the-mill business and user app developers. There was a recent opinion piece in Dr. Dobb's discussing the benefits of concurrency platforms that I found interesting. The article is clearly promoting the author's company (Cilk Arts), but I agree with his argument that the complexities of multi-core development need to be handled in a framework and not in applications.
Linux and Windows by WarJolt · 2008-12-22 14:29 · Score: 3, Insightful

I don't know if Linux or Windows has an automatic mechanism to schedule task priority based on processor caches, but the study didn't even mention Windows. Seeing that scheduling and managing the caches are OS problems, this seems kind of important.
The other thing that seems odd is they were using a 2.6.18 Kernel and in 2.6.23 they added the Completely Fair Scheduler which could potentially change their results. It doesn't seem logical to base a cutting edge study on stuff that was released years ago.
Linux schedules better than this by bluefoxlucid · 2008-12-22 14:30 · Score: 3, Informative

Last I checked, Linux was smart enough to try to keep programs running on cores where cache contained the needed data.

--
Support my political activism on Patreon.
1. Re:Linux schedules better than this by nullchar · 2008-12-22 14:41 · Score: 4, Interesting
  
  Possibly... but it appears an SMP kernel treats each core as a separate physical processor.
  Take an Intel Core2 Quad machine and start a process that takes 100% of one CPU. Then watch top/htop/gnome-system-monitor/etc where you can watch the process hop around all four cores. It makes sense that the process might hop between two cores -- the two that share L2 cache -- but all four cores doesn't make sense to me. Seems like the L2 cache is wasted when migrating between each core2 package.
2. Re:Linux schedules better than this by Krishnoid · 2008-12-22 15:08 · Score: 3, Interesting
  
  Wasn't there an article recently about this describing that if only one core was working at peak capacity that the die would heat unevenly, causing problems?
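The core-hopping nullchar describes can be switched off for a single process by pinning it. A minimal sketch in Python, assuming a platform that provides os.sched_setaffinity (Linux does; many others do not). The helper name pin_to_one_cpu and the choice of the lowest-numbered CPU are illustrative only, not anything from the paper:

```python
import os

def pin_to_one_cpu(pid=0):
    """Restrict a process (0 = the caller) to one CPU from its allowed set.

    Returns the chosen CPU number, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None                        # not supported on this platform
    allowed = os.sched_getaffinity(pid)    # set of CPU numbers we may run on
    cpu = min(allowed)                     # arbitrary pick: lowest-numbered CPU
    os.sched_setaffinity(pid, {cpu})       # the scheduler now keeps us there
    return cpu
```

Call pin_to_one_cpu() before starting the 100% CPU loop and top/htop should show the load staying on one core. Whether that is actually faster depends on what else the machine is doing, and it concentrates the heat on one core, as Krishnoid points out.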
NUMA NUMA by Gothmolly · 2008-12-22 14:41 · Score: 3, Informative

Linux can already deal with scheduling tasks to processors where the necessary resources are "close". It may not be obvious to the likes of PC Magazine, but it's trivially obvious that even multithreaded programs running on a non-location-aware kernel are going to take a hit. This is a kernel problem, not an application library problem.

--
I want to delete my account but Slashdot doesn't allow it.
This isn't news by nettablepc · 2008-12-22 14:43 · Score: 5, Informative

Anyone who has been doing performance work should have known this. The tools to adjust things like core affinity and where interrupts are handled have been available in Linux and Windows for a long time. These effects were present in 1980s mainframes. DUH.
1. Re:This isn't news by Clover_Kicker · 2008-12-22 14:58 · Score: 5, Insightful
  
  80s mainframe tech is NEW and EXCITING to a depressing number of tech people, look at how excited everyone got when someone remembered and re-implemented virtualization.
not a surprise by Eil · 2008-12-22 15:00 · Score: 5, Insightful

Here's an exercise: Take 2 brand-new systems with identical configurations and start them at the same time doing some job that takes a few hours and utilizes most of the hardware to some significant degree. Say, compiling some huge piece of code like KDE or OpenOffice. System administrators who do exactly this will tell you that you'll almost never see the two machines complete the job at precisely the same time. Even though the CPU, memory, hard drive, motherboard, and everything else is the same, the system as a whole is so complex that minute differences in timing somewhere compound into larger ones. Sometimes you can even reboot them and repeat the experiment and the results will have reversed. It shouldn't come as a surprise that adding more complexity (in the form of processor cores) would enhance the effect.
1. Re:not a surprise by im_thatoneguy · 2008-12-22 16:12 · Score: 4, Interesting
  
  We have this problem at work.
  We have a render farm of 16 machines. 12 of them are effectively identical but despite all of our coaxing one of them always runs about 30% slower. It's maddening. But "What can you do?". Hardware is the same. We Ghost the systems so the boot data is exactly the same... and yet... slowness. It's just a handicapped system.
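For anyone who wants to put a number on this kind of run-to-run variation before blaming the hardware, here is a rough sketch. The busy_work job is a made-up stand-in, not a real benchmark, and the "spread" figure is simply (slowest - fastest) / mean, which is one of several reasonable ways to express it:

```python
import statistics
import time

def busy_work(n=200_000):
    """Stand-in CPU-bound job (hypothetical workload for illustration)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def run_to_run_spread(job, repeats=10):
    """Time job() several times; return (mean seconds, spread as a fraction of the mean)."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        job()
        samples.append(time.perf_counter() - start)
    mean = statistics.mean(samples)
    spread = (max(samples) - min(samples)) / mean
    return mean, spread
```

Running the same measurement on two "identical" boxes, or on one box before and after a reboot, gives a baseline for how much variation is just noise.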
Re:Linux and Windows by swb · 2008-12-22 15:57 · Score: 3, Informative

They mentioned this in an ESX class I took. I seem to remember it in the context of setting processor affinity or creating multi-CPU VMs: either the hypervisor was smarter than you (e.g., don't set affinity), or multi-CPU VMs could actually slow other VMs because the hypervisor would try to keep a multi-CPU VM on the same socket, thus denying execution priority to other VMs (e.g., don't assign SMP VMs just because you can, unless you have the CPU workload).
Well known problem by sjames · 2008-12-22 16:01 · Score: 3, Insightful

The problem is a complex one. Every possible scheduling decision has pluses and minuses. For example, keeping a process on the same core for each timeslice maximizes cache hits, but can lose if it means the process has to wait TOO long for its next slice. Likewise, if a process must wait for something, should it yield to another process or busy-wait? Should interrupts be balanced over CPUs or should one CPU handle them?
A lot of work has gone into those questions in the Linux scheduler. For all of that, the scheduler only knows so much about a given app and if it takes TOO long to 'think' about it, it negates the benefits of a better decision.
For special cases where you're quite sure you know more than the scheduler about your app, you can use the isolcpus kernel parameter to reserve CPUs to run only the apps you explicitly assign to them.
You can also decide which CPU any given IRQ can be handled by (but not which core within a CPU as far as I know) with /proc/irq/*/smp_affinity.
Unless your system is dedicated to a single application and you understand it quite well, the most likely result of screwing with all of that is overall loss of performance.
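The smp_affinity files hold a hexadecimal bitmask with one bit per CPU as the kernel numbers them; on a multicore box each core normally shows up as its own numbered CPU. A small sketch of decoding and encoding such masks, with helper names that are mine rather than any kernel or library API:

```python
def cpus_from_mask(mask_text):
    """Decode a hex CPU bitmask such as 'f' or '00000000,0000000f' into CPU numbers."""
    # Large systems print the mask in comma-separated 32-bit groups.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]

def mask_from_cpus(cpus):
    """Encode CPU numbers as a plain hex string (no comma grouping;
    assumed adequate for machines with up to 32 CPUs)."""
    value = 0
    for cpu in cpus:
        value |= 1 << cpu
    return format(value, "x")
```

So a mask of "f" means CPUs 0 through 3, and steering an IRQ to CPUs 1 and 3 would mean writing "a" to that IRQ's smp_affinity file as root. The caveat above still applies: on a general-purpose box this is more likely to hurt than help.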
Re:Yup by cetialphav · 2008-12-22 16:48 · Score: 3, Informative

How about a "parallel foreach(Thing in Things)" ?
That is easy. If your application can be parallelized that easily, then it is considered embarrassingly parallel. OpenMP exists today and does just this. All you have to do (in C) is add a "#pragma" above the for loop and you have a parallel program. OpenMP is commonly available on all major platforms.
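The same embarrassingly parallel shape, sketched in Python rather than C/OpenMP. This is an analogy, not OpenMP itself, and process() is a hypothetical stand-in for the per-item work. Threads are used to keep the example short; for CPU-bound work in CPython one would likely swap in ProcessPoolExecutor, which has the same map interface:

```python
from concurrent.futures import ThreadPoolExecutor

def process(thing):
    """Hypothetical per-item work; independent of every other item."""
    return thing * thing

def parallel_foreach(func, things, workers=4):
    """Apply func to each item concurrently, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, things))
```

It only stays this easy because no item touches another item's data. Once the items share state, you are back to locks and the problems described next.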
The real problem is that most desktop applications just don't lend themselves to this type of parallelism and so the threads have lots of data sharing. This data sharing causes the problem because the programmer must carefully use synchronization primitives to prevent race conditions. Since the programmer is using parallelism to boost performance, they only want to introduce synchronization when they absolutely have to. When in doubt, they leave it out. Since it is damn near impossible to test the code for race conditions, they have no indication when they have subtle errors. This is what makes concurrent programming so difficult. One researcher says that using threads makes programs "wildly nondeterministic".
It is hard to blame the programmers for being aggressive in seeking performance gains because Amdahl's Law is a real killer. If you have 90% of the program parallelized, the theoretical maximum performance gain is 10X no matter how many cores you can throw at the problem.
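The 10X figure falls straight out of the formula, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of cores. A tiny sketch (the function name is mine):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup predicted by Amdahl's Law: 1 / ((1 - p) + p / n)."""
    serial = 1.0 - parallel_fraction          # the part that cannot be parallelized
    return 1.0 / (serial + parallel_fraction / cores)
```

With p = 0.9, eight cores give only about 4.7X, and as the core count grows the result creeps toward 10X without ever reaching it, because the serial 10% always has to run on one core.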