Researchers Unveil Experimental 36-Core Chip

Posted by samzenpus on Monday June 23, 2014 @12:52AM from the we-need-another-core dept.

rtoz writes The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router, and data travels between cores in packets of fixed size. This week, at the International Symposium on Computer Architecture, Peh's group unveiled a 36-core chip that features just such a "network-on-chip." In addition to implementing many of the group's earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores' locally stored copies of globally accessible data remain up to date.
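
To make the "little Internets" idea concrete, here is a rough sketch of how a fixed-size packet might hop from router to router. The summary only says each core has a router; the 6x6 grid layout and the XY (dimension-order) routing rule below are assumptions for illustration, and the function names are hypothetical.

```python
# Hypothetical 6x6 mesh: 36 cores, one router per core (the layout is an assumption).
MESH_WIDTH = 6

def core_to_xy(core_id, width=MESH_WIDTH):
    """Map a core number 0..35 to (column, row) in the grid."""
    return core_id % width, core_id // width

def xy_route(src, dst, width=MESH_WIDTH):
    """List the cores a packet visits: travel along X first, then along Y."""
    x, y = core_to_xy(src, width)
    dst_x, dst_y = core_to_xy(dst, width)
    path = [src]
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append(y * width + x)
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append(y * width + x)
    return path

route = xy_route(0, 35)
print(len(route) - 1)  # number of router-to-router hops, corner to corner
```

In a grid like this the worst-case trip grows with the side length of the mesh rather than with the core count, which is part of the appeal of a network-on-chip over a single shared bus.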

143 comments

im still a bit skeptical. by nimbius · 2014-06-23 00:55 · Score: 3, Funny

All this performance in just one chip. I mean, sure, it has 36 cores but lets be rational here...does it seriously expect to run crysis?

--
Good people go to bed earlier.
1. Re:im still a bit skeptical. by Anonymous Coward · 2014-06-23 01:09 · Score: 1, Funny
  
  You'd need to imagine a beowulf cluster of 'em to accomplish that.
2. Re:im still a bit skeptical. by Anonymous Coward · 2014-06-23 01:55 · Score: 0
  
  Imagine a beowulf cluster of them on a chip and we're getting somewhere.
3. Re:im still a bit skeptical. by Anonymous Coward · 2014-06-23 02:45 · Score: 0
  
  ...does it seriously expect to run crysis?
  I didn't seriously expect that joke was still funny. But there you go.
4. Re:im still a bit skeptical. by Jeremy+Erwin · 2014-06-23 14:15 · Score: 1
  
  To really run crysis well, you'd probably need something like the GeForce GTX Titan-- which has 896 double precision cores. However, if you raytrace the graphics, you might be able to run it on a 72 core Knights Landing chip.
5. Re:im still a bit skeptical. by Anonymous Coward · 2014-06-24 10:31 · Score: 0
  
  Fuck a beowulf cluster. I want Crysis on a chip.
Re:Different Power Supply Voltage by drinkypoo · 2014-06-23 01:00 · Score: 1, Insightful

According to the comparison table, (Refer timeline 4:21 of this video) this chip uses 1.1V while other standard chips are using 1.0V. This difference may make it hard for the chip makers to use this technology.
Really? They won't be able to specify a 1.1V VRM instead of a 1.0V VRM? Those poor, poor chip makers. They sound like a bunch of incompetent fucks.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Different Power Supply Voltage by edmudama · 2014-06-23 01:02 · Score: 2

That doesn't matter. The power supply surrounding the socket/pads will account for whatever Vcc needs to be.

--
More data, damnit!
Moore's Law by Grindalf · 2014-06-23 01:05 · Score: 0

That's a fun post! 36-core is immense! As an aside: It's been a while since we've seen any decent rise in processor Ghz. I remember IBM talking about functioning reasonably cool 10 Ghz processors (ref needed) in the early 2000s, but no one has them in the shops yet! I'm sure this was discussed in Moore's Law lectures prior to Y2K, but mention it these days and everyone scowls! So some people can (and they run cool) and some people can't, what normally happens in computing when the faster items are released?

--
The purpose of existence is to make money.
1. Re:Moore's Law by Opportunist · 2014-06-23 01:09 · Score: 4, Interesting
  
  As an aside: It's been a while since we've seen any decent rise in processor Ghz.
  Just to abuse a car analogy: Maybe it's time we stop revving up and instead shift gears.
  
  --
  We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
2. Re:Moore's Law by K.+S.+Kyosuke · 2014-06-23 01:09 · Score: 1
  
  36-core is immense!
  Yawn...
  
  --
  Ezekiel 23:20
3. Re:Moore's Law by Anonymous Coward · 2014-06-23 01:16 · Score: 1
  
  Think of the possibilities:
  make -j36
4. Re:Moore's Law by Grindalf · 2014-06-23 01:16 · Score: 2
  
  Whilst I have my foot to the floor ... I still think it's a failure of science - there's nothing wrong with doing both simultaneously - to believe otherwise would be to buy into a rhetorical device based on "false opposites."
  
  --
  The purpose of existence is to make money.
5. Re:Moore's Law by Grindalf · 2014-06-23 01:18 · Score: 1
  
  That's a beauty, and the Ezekiel ref in your sig (23:20) made me laugh out loud too ...
  
  --
  The purpose of existence is to make money.
6. Re:Moore's Law by wjcofkc · 2014-06-23 01:20 · Score: 1
  
  The reason Apple stuck with the Power architecture for so long, was because IBM promised them quad and greater core chips running at 8 Ghz, air cooled, by 2005. Needless to say, they didn't even come close to delivering. It was that failure that led Apple to switch to x86.
  
  --
  Brought to you by Carl's Junior.
7. Re:Moore's Law by itzly · 2014-06-23 01:28 · Score: 1
  
  Try doing a 100x100 double precision matrix inversion on one of those chips, and you'll stop yawning pretty quickly.
8. Re:Moore's Law by caseih · 2014-06-23 01:31 · Score: 1
  
  And hopefully in any lectures on Moore's Law, the students learn that Moore's Law refers to transistors on a die, not the speed of the chips. This 36-core chip probably jumps ahead of Moore's Law a bit, as it's got to be a fairly large die. In any event Moore's Law continues to hold, more or less. Other things like CPU speed have followed a similar trend in times past, but no longer do now.
9. Re:Moore's Law by Virtucon · 2014-06-23 01:33 · Score: 2
  
  Immense? Immense you say? Try IBM's mega-footprint z196: at over 512mm^2, it is one big ass chip.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
10. Re:Moore's Law by dreamchaser · 2014-06-23 01:37 · Score: 1
  
  There are still technical challenges to increasing clock speed. Just because "IBM said it would" doesn't make it so. Instead you are seeing higher IPC due to architectural refinements as well as more and more cores. Clock speeds are still inching up but do not expect any huge radical jumps anytime soon.
11. Re:Moore's Law by Grindalf · 2014-06-23 01:39 · Score: 1
  
  Regarding your Power PC comment specifically: This was from an IBM research department that makes a processor other than the Power PC – I believe they were used for Z OS units or the like. I use it as an operational example ONLY as, as Mr Spock says:- “that which HAS happened CAN happen” and it is therefore a possibility.
  
  --
  The purpose of existence is to make money.
12. Re:Moore's Law by Virtucon · 2014-06-23 01:40 · Score: 1
  
  Nope, Liquid Nitrogen cooling gets you past the speed limits. How about over 8Ghz on a chip that costs less than $200? Going to Helium and you can get over 8.5Ghz, although both become a bit unwieldy when it comes to game play because I don't want my hard drives to freeze. I love that last video; there's some real country boy engineering there, including using a propane torch and a hair dryer to keep certain components from freezing.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
13. Re:Moore's Law by K.+S.+Kyosuke · 2014-06-23 01:41 · Score: 1
  
  Now why would I want to do that? Obviously, for FP tasks, a modified design would be necessary - but given that GA144 is already unbeatable in integer performance energy efficiency, even at 180 nm where it's being manufactured, if you extend the ISA to be more FP-friendly and switch to a recent process, I don't see a problem. Well, it would need different memory interfaces to make it a shared memory multiprocessor. That's a bummer. But I guess it can't be helped; programmers are lazy.
  
  --
  Ezekiel 23:20
14. Re:Moore's Law by Anonymous Coward · 2014-06-23 01:46 · Score: 3, Interesting
  
  A better analogy is that they keep adding seats and making the whole vehicle slower.
  Kawasaki Ninja == 10GHZ single core (fastest way to get anywhere alone)
  Ford Mustang == 4GHz quad-core (most people only use the front two seats, but if desperate you can squeeze more people in)
  Chevy Suburban == 3.3 GHz 8-core (it seems like everyone wants one, but most people who have a full load just have a bunch of little kiddies)
  Mercedes Sprinter == 2.7 GHz 12-core (just meant to be a grinding people hauler)
  School Bus == 1.2GHz Xeon Phi (slow as hell and very specialized, no normal person would ever want one)
  Double Decker Bus == Peh's stuff (probably a use for mass transit(i.e virtualization) and as a cool novelty)
15. Re:Moore's Law by itzly · 2014-06-23 01:49 · Score: 1
  
  Maybe you don't want to do that, but good floating point performance is a requirement for a lot of useful tasks. Also, many real world tasks need access to large amounts of memory, and often that memory needs to be available to multiple nodes. The GA144 fails there too, since it has a pitiful amount of memory. Except for a small handful of niche applications that happen to match the GA144's capabilities, it's a useless device.
16. Re:Moore's Law by willy_me · 2014-06-23 01:56 · Score: 2
  
  And hopefully in any lectures on Moore's Law, the students learn that Moore's Law refers to transistors on a die, not the speed of the chips. This 36-core chip probably jumps ahead of Moore's Law a bit, as it's got to be a fairly large die.
  Moore's Law refers to the number of components per integrated circuit for minimum cost. Note that this is basically transistor density and is not impacted by core size. Silicon defects and transistor size determine the optimal number of components per IC.
  A quote from Wikipedia,
  
  Moore himself wrote only about the density of components, "a component being a transistor, resistor, diode or capacitor,"[26] at minimum cost.
17. Re:Moore's Law by K.+S.+Kyosuke · 2014-06-23 01:58 · Score: 1
  
  It's the notion (asynchronous, self-clocked, energy efficient chip, maximizing performance per watt and performance per mm^2) that matters to me, not this specific design (which is intended for specific purposes). Witness how the HPC people embraced GPUs, which are sort of heading in a similar direction already.
  
  --
  Ezekiel 23:20
18. Re:Moore's Law by Grindalf · 2014-06-23 02:17 · Score: 1
  
  Do you know what I'm going to do? I'm going to go out and get a shirt printed up with the expression “I Heart Processor Ghz” and wear it at parties! Why? Every time this crops up at meetings that I've attended there is always someone who loses their temper at the mere mention of anyone developing a faster processor, irrespective of how many cores or cache size and I don't like it! The physics has been done! I'd swear, if anyone ever does a THz processor, and one of these kids finds out, they'll egotistically self explode on the spot and try and burn down the factory in question because it's such an affront. The cultural phenomenon of “Gigaphobia” needs investigating by qualified professionals :0)
  
  --
  The purpose of existence is to make money.
19. Re:Moore's Law by Z00L00K · 2014-06-23 02:33 · Score: 1
  
  It's of course good if the distance between cores is kept to a minimum, but if the software designers and compilers consider the limitations when generating the binaries it may not be a huge performance bottleneck in real world applications.
  It's better to switch to a new core than to switch task on a core for example. Looking at what happens in a modern PC most processing is mostly unrelated to the other. Even inside a web browser you may have several plugins running in different parts of the screen, but they don't really interact with each other, so they can run on standalone processor cores.
  When doing SIMD calculations then you run the same instruction in parallel on many cores with different data as input, and that is not a big deal either.
  The bottleneck you may experience is on the buses to RAM, disk and I/O devices. Just realize that not every core has the same distance to the resource - so by having affinity on the executables indicating preferences to type of I/O it might be possible to assign it to the correct area of cores in the processor.
  So far much of computer design has gone into trying to make the computer as general as possible. Think of it as a Swiss Army knife (or maybe a Leatherman multi-tool) - it can do everything, but not excel at anything. A real mechanic has a good toolbox instead, with different groups of tools, screwdrivers, hammers etc., and each of those tools is highly specialized even within its group. Screwdrivers come in many forms: Flat, Phillips, Pozidriv, Allen, Torx, XZN, Orange Juice based etc. By using the right tool for the job you get the work done faster and often more accurately than with the generic tool.
  
  --
  If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
20. Re:Moore's Law by Z00L00K · 2014-06-23 02:36 · Score: 1
  
  Boeing 747 == take on a crapload of people for a long haul excursion.
  
  --
  If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
21. Re:Moore's Law by Shoten · 2014-06-23 02:37 · Score: 3, Insightful
  
  Nope, Liquid Nitrogen cooling gets you past the speed limits. How about over 8Ghz on a chip that costs less than $200? Going to Helium and you can get over 8.5Ghz, although both become a bit unwieldy when it comes to game play because I don't want my hard drives to freeze. I love that last video; there's some real country boy engineering there, including using a propane torch and a hair dryer to keep certain components from freezing.
  I'm a little confused as to why you're citing the chip's low low price of "less than $200" if you need liquid nitrogen to get it to perform the way you want it to. You do realize that cooling systems cost money, too...right? There's no point in being able to use a cheap processor to get to X performance benchmark if the required additional support systems cost thousands of dollars more than a more powerful and more expensive processor that can do it out of the box. Not to mention the fact that liquid nitrogen cooling isn't exactly hassle-free, especially in a household environment. And it's worth noting that even if you boost Ghz, you eventually run into choke points related to pushing data to and from the chip anyways. You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.
  
  --
  
  For your security, this post has been encrypted with ROT-13, twice.
22. Re:Moore's Law by skovnymfe · 2014-06-23 02:38 · Score: 1
  
  Liquid Nitrogen/Helium cooling is great... while it lasts. When it's used up however, you've got to pay for another bottle of cooling. I have no idea how long a $200 CPU can run at 8GHz on a bottle of Nitrogen, or how much a bottle of Nitrogen costs, but I can't imagine it's a good long term solution.
23. Re:Moore's Law by Opportunist · 2014-06-23 02:47 · Score: 1
  
  Of course, but just like when you shift, first shift, then rev the engine up again. Else your clutch will probably wear out quickly.
  And no, I have absolutely no idea how that analogy still applies.
  
  --
  We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
24. Re:Moore's Law by Opportunist · 2014-06-23 02:50 · Score: 1
  
  Odd. I went to a catholic church, but strangely we never got to that part.
  Talk about selective teaching and leaving out the interesting parts!
  
  --
  We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
25. Re:Moore's Law by Rockoon · 2014-06-23 03:05 · Score: 1
  
  When doing SIMD calculations then you run the same instruction in parallel on many cores with different data as input
  Your definition of core seems to be completely different from anyone else's. You seem to have relabeled execution units as 'cores,' and for seemingly completely ignorant reasons.
  
  --
  "His name was James Damore."
26. Re:Moore's Law by Anonymous Coward · 2014-06-23 03:05 · Score: 0
  
  Yes of course, everything is just a continuous cycle of ever-improving performance, which is why we still fly the Concorde and work only 20 hours a week in a leisure society. Oops.
27. Re:Moore's Law by gnasher719 · 2014-06-23 03:29 · Score: 1
  
  Try doing a 100x100 double precision matrix inversion on one of those chips, and you'll stop yawning pretty quickly.
  That should be easily done in a millisecond or so on a single core of any modern Intel processor. You could probably get it down to 100 microseconds on the latest ones.
28. Re:Moore's Law by Anonymous Coward · 2014-06-23 03:31 · Score: 0
  
  A little more yawn:
  http://en.wikipedia.org/wiki/PicoChip
  and of course Adapteva's Epiphany IV mentioned in another thread.
  Apart from that there are others that have tried to achieve better performance with lots of simple CPU cores/threads:
  http://en.wikipedia.org/wiki/Parallax_Propeller
  http://en.wikipedia.org/wiki/XMOS
  http://en.wikipedia.org/wiki/Ubicom (skip to Ubicom32)
29. Re:Moore's Law by Anonymous Coward · 2014-06-23 03:50 · Score: 0
  
  Yawn...
  The CM-1, depending on the configuration, had as many as 65,536 processors ...
  Yawn...
30. Re:Moore's Law by SuricouRaven · 2014-06-23 04:10 · Score: 2
  
  Nitrogen overclocking is done for contests. You can get phase change cooling, which is the next best thing and will still get your processor far below zero. The big downside to that is just power consumption. It's also bulky and noisy.
31. Re:Moore's Law by Salgat · 2014-06-23 04:11 · Score: 1
  
  What an empty statement. It's easy to say we should try something else when things get difficult, without having any practical solution in place.
32. Re:Moore's Law by jones_supa · 2014-06-23 04:25 · Score: 1
  
  Let's still not forget that a single core of a modern Core i7 chip is about 6x as fast as a single-core Pentium 4. At the same clockspeed.
33. Re:Moore's Law by itzly · 2014-06-23 04:32 · Score: 2
  
  My point exactly. What is a simple task on a modern Intel becomes nearly impossible on the GA144. We've already tried the idea of combining large numbers of simple processors, and it has failed every single time. If NxM simple cores together can't beat a modern Intel processor for a range of useful tasks, there's not much point in developing it.
34. Re:Moore's Law by default+luser · 2014-06-23 04:56 · Score: 1
  
  Absolutely not true.
  The Core 2 Duo is approximately 2x faster clock-for-clock versus the Pentium 4, and the current Haswell core is barely 40% faster than that (assume a 7% speedup per-clock for every core rev since). That gets you somewhere in the 2x-3x performance improvement range for Haswell, barring corner-cases that are embarrassingly easy to leverage AVX/FMA (most real-world use cases show small improvements).
  Intel proved that they could do a whole lot better than the Pentium 4, but your performance improvement factor is off by half!
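
  For anyone who wants to check the arithmetic above, a quick sketch follows. The 2x and 7% figures are the ones quoted in this comment; the count of five core revisions between Core 2 and Haswell is an assumption picked because it reproduces the "barely 40%" number.

```python
core2_vs_p4 = 2.0       # clock-for-clock speedup over Pentium 4, from the comment
per_rev_gain = 0.07     # per-clock speedup for each core revision, from the comment
revs_since_core2 = 5    # assumption: revisions between Core 2 and Haswell

# Compounding 7% per revision gives the Haswell-over-Core-2 factor.
haswell_vs_core2 = (1 + per_rev_gain) ** revs_since_core2
# Chaining the two factors gives Haswell over Pentium 4, clock for clock.
haswell_vs_p4 = core2_vs_p4 * haswell_vs_core2

print(round(haswell_vs_core2, 2), round(haswell_vs_p4, 2))
```

  With those inputs the compounded gain lands at roughly 1.4x over Core 2 and a bit under 3x over the Pentium 4, which is where the 2x-3x range comes from.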
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
35. Re:Moore's Law by ColdWetDog · 2014-06-23 05:10 · Score: 1
  
  What's a 'gear'?
  
  --
  Faster! Faster! Faster would be better!
36. Re:Moore's Law by ColdWetDog · 2014-06-23 05:14 · Score: 4, Funny
  
  You can give the most important worker on an assembly line all the crystal meth they can eat, but they can't work any faster than the conveyor belt in front of them.
  Ah! The 21st Century version of the 'mythical man month' - so much more apropos for this audience than the pregnancy analogy.
  
  --
  Faster! Faster! Faster would be better!
37. Re:Moore's Law by Kaenneth · 2014-06-23 05:37 · Score: 1
  
  The Titanic == Itanium
38. Re:Moore's Law by UnknownSoldier · 2014-06-23 06:08 · Score: 1
  
  > it's been a while since we've seen any decent rise in processor Ghz.
  That's because silicon doesn't scale past 5 GHz at room temperature.
  > I remember IBM talking about functioning reasonably cool 10 Ghz processors (ref needed) in the early 2000s, but no one has them in the shops yet!
  There have been 100 GHz CPUs for ages but the supply/demand isn't financially viable yet.
  * http://www.itproportal.com/201...
39. Re:Moore's Law by laird · 2014-06-23 06:20 · Score: 1
  
  Not just 'tried to', but actually delivered. Thinking Machines CM-1 and CM-2 had routers on chip with 32 CPUs per chip. In the hypercube architecture, the 5 lowest bits of CPU address routed on-chip, and the rest of the CPU address routed between chips. It worked quite well, and was the fastest computer on the planet for several years running.
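
  The address split described above can be sketched roughly as follows. The 5-bit on-chip field is taken from this comment; counting one hop per differing address bit is the textbook hypercube rule and is an assumption here, not a description of the actual CM router. The function names are made up for illustration.

```python
ON_CHIP_BITS = 5                      # from the comment: 32 CPUs per chip
LOCAL_MASK = (1 << ON_CHIP_BITS) - 1  # selects the 5 lowest address bits

def split_address(cpu_addr):
    """Return (chip_id, local_cpu) for a global CPU address."""
    return cpu_addr >> ON_CHIP_BITS, cpu_addr & LOCAL_MASK

def hops(src, dst):
    """Return (on_chip_hops, between_chip_hops), assuming one hop per differing bit."""
    diff = src ^ dst
    on_chip = bin(diff & LOCAL_MASK).count("1")
    between_chips = bin(diff >> ON_CHIP_BITS).count("1")
    return on_chip, between_chips

print(split_address(0b1100101))  # chip 3, local CPU 5
print(hops(0, 0b1100101))        # differing bits, low field then high field
```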
  
  --
  Enable 3D printed prosthetics!
40. Re:Moore's Law by jones_supa · 2014-06-23 06:36 · Score: 1
  
  I used www.cpubenchmark.net for my numbers.
41. Re:Moore's Law by nmr_andrew · 2014-06-23 07:47 · Score: 1
  
  Using liquid helium would be way cost prohibitive, especially for a very small gain (8.5 GHz vs. 8 GHz in GP's post). Under a fairly good contract and purchased in relatively large quantities, our current cost of LHe is slightly less than $11/liter.
  Liquid nitrogen is a different story. It's still expensive compared to a cooling fan, but we pay ~$0.40/liter for LN2. If you were doing a lot of this, the "standard" tank sizes are 160 or 180 L, one such tank carefully managed should last a bare minimum of 1-2 weeks cooling a small number of CPUs, or a few dollars/day.
  So, if you really need that 8 GHz speed, and I can't think of any remotely general purpose chip that you can buy that runs anywhere near that clock speed, it could be cost effective. However, as the post just above as I'm writing this starts to point out, it's a moot point if you can't feed in enough data to keep a chip running at that speed - if you're idle 75% of the time, might as well just not bother overclocking like that unless you can also get your RAM, busses, etc. to run 4x faster than their normal rating.
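
  A back-of-the-envelope version of the LN2 numbers above, using only the figures quoted in this comment ($0.40/liter, 160 or 180 L tanks, 1-2 weeks per tank). The helper name is hypothetical, and since 1-2 weeks is described as a bare minimum, these are upper-end daily costs.

```python
price_per_liter = 0.40        # USD per liter of LN2, from the comment
tank_liters = (160, 180)      # "standard" tank sizes, from the comment
tank_lifetime_days = (7, 14)  # the comment's "1-2 weeks" per tank

def cost_per_day(liters, days, price=price_per_liter):
    """Daily cost of running through one tank over the given number of days."""
    return liters * price / days

# Cheapest case: small tank lasting two weeks. Priciest: big tank gone in one week.
low = cost_per_day(min(tank_liters), max(tank_lifetime_days))
high = cost_per_day(max(tank_liters), min(tank_lifetime_days))
print(f"${low:.2f} to ${high:.2f} per day")
```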
42. Re:Moore's Law by Virtucon · 2014-06-23 08:33 · Score: 1
  
  Some of us run better than off the shelf liquid cooling, no hassles and for less than 300 bucks. I have a nice system and it's quiet because I can run the big fans. Sure, Liquid Nitrogen systems are available but the OP was about stopping the rev up process, since 8Ghz is now possible, the barrier needs to be set higher. I don't think we'll see it anytime within the next five years but maybe.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
43. Re:Moore's Law by Virtucon · 2014-06-23 08:35 · Score: 1
  
  Oh yeah plus Liquid Helium is becoming rare. http://phys.org/news201853523....
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
44. Re:Moore's Law by nmr_andrew · 2014-06-23 09:40 · Score: 1
  
  I didn't really feel the need to go there, but I hear you. Considering that my job relies on keeping the superconducting coils in our magnets at ~4K, I'm all too aware of this. Our prices have nearly doubled in the last few years, and there have been a handful of supply scares.
45. Re:Moore's Law by Virtucon · 2014-06-23 10:15 · Score: 1
  
  Well then since our Helium reserves come from Oil and Natural Gas drilling, all I can say is Drill baby Drill!
  When I started TIG welding in the 70s, Helium tanks were about $30/bottle which was still expensive considering a mortgage for a decent home was $300. Now all I use is Argon which is a bitch when trying to weld overhead.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
46. Re:Moore's Law by Shoten · 2014-06-23 10:24 · Score: 1
  
  Some of us run better than off the shelf liquid cooling, no hassles and for less than 300 bucks. I have a nice system and it's quiet because I can run the big fans. Sure, Liquid Nitrogen systems are available but the OP was about stopping the rev up process, since 8Ghz is now possible, the barrier needs to be set higher. I don't think we'll see it anytime within the next five years but maybe.
  Yeah, but Intel and AMD will go bankrupt if they make chips just for "some of us." And if you look at where Intel has gotten their speed increases, very little of it in the past decade has been from clock speed. Ghz is no longer where the performance boost is to be found.
  
  --
  
  For your security, this post has been encrypted with ROT-13, twice.
47. Re:Moore's Law by kelemvor4 · 2014-06-23 10:34 · Score: 1
  
  That's a fun post! 36-core is immense! As an aside: It's been a while since we've seen any decent rise in processor Ghz. I remember IBM talking about functioning reasonably cool 10 Ghz processors (ref needed) in the early 2000s, but no one has them in the shops yet! I'm sure this was discussed in Moore's Law lectures prior to Y2K, but mention it these days and everyone scowls! So some people can (and they run cool) and some people can't, what normally happens in computing when the faster items are released?
  It's a step down from the 48 core CPU Intel created in 2009. http://www.intel.com/pressroom...
48. Re:Moore's Law by Virtucon · 2014-06-23 11:30 · Score: 1
  
  Ghz is king because not all workloads are multithreaded enough to take advantage of multiple cores/threads. Eventually software engineers will catch up and start leveraging what the architecture provides. I'd bet that 8 out of 10 COTS packages out there, at least in the desktop arena, don't take advantage of multithreading.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
49. Re:Moore's Law by Anonymous Coward · 2014-06-23 14:23 · Score: 0
  
  And still, only capable of 50000 Dhrystone MIPS.
50. Re:Moore's Law by Grindalf · 2014-06-23 19:13 · Score: 1
  
  That's nice!
  
  --
  The purpose of existence is to make money.
51. Re:Moore's Law by default+luser · 2014-06-24 04:08 · Score: 1
  
  Which uses Passmark, which is a simple corner-case number-crunching bonanza. Pure AVX2, or FMA without any real-world qualifiers, restriction or branching? Sure, we got that!
  And even with that, you're still off. The performance improvement with Haswell per-core is less than 5x. See here:
  http://www.cpubenchmark.net/compare.php?cmp[]=2020&cmp[]=1127
  So, in the unoptimized case the performance improvement is 2-3x, and in the embarrassingly-parallel case the speedup is 4-5x. But then, if you had such an embarrassingly-parallel case, you'd just port it to OpenCL and be done with it. Haswell is for all those hard-to-optimize compute cases.
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
52. Re:Moore's Law by jones_supa · 2014-06-24 08:47 · Score: 1
  
  I was still less off than you.
53. Re:Moore's Law by default+luser · 2014-06-24 11:32 · Score: 1
  
  In your glass tower, yes.
  In the real world, not so much.
  Here is an example of one of the world's most optimized pieces of software: x264. It's also one of the few real-world loads that can take advantage of multiple processors and SSE. So how much speedup did this incredible piece of software see with AVX2, which DOUBLED the width of the integer pipelines?
  FIVE PERCENT! Yup, that's it!
  All that work for so very little improvement, because in the REAL WORLD data does not align on perfect AVX2 boundaries, and data fetch is as much of a hindrance as the actual processing of that data. Read more about WHY this is the best that could be done here, if you don't mind paying for SCRIBD.
  Parading around test results from something like Passmark is just self-delusion. It only tests that the features do in fact work, and these tests tend to work directly from cache in small data sets that are usually not branch-heavy. It gives a score for number of MIPS, but does not take into account the fact that most real software can't actually make use of these features at speed.
  And when they increase the vector size yet again to 512 bits wide in a year, it will once again be a limited real-world improvement, because optimization of real loads is hard, and auto-vectorization of arbitrary loads is an even harder problem to solve. So Intel keeps adding new features, and they keep adding about 5-7% each (real world). So I don't see how you get above 3x from those puny performance increases, while not deluding yourself.
  
  --
  Man is the animal that laughs.
  And occasionally whores for Karma.
36 cores? Network on a chip? Meh! by Kyle · 2014-06-23 01:07 · Score: 1

http://www.adapteva.com/epipha...
64 cores, mesh network that extends off the chip, in production.
Try harder MIT :-p

--
The previous comments are only true, if no-one says they're wrong.
1. Re:36 cores? Network on a chip? Meh! by itzly · 2014-06-23 01:13 · Score: 1
  
  Adding cores is easy. Keeping all the cores busy with useful work in a typical range of high performance applications is the difficult part.
2. Re:36 cores? Network on a chip? Meh! by Melkhior · 2014-06-23 01:20 · Score: 1
  
  http://www.adapteva.com/epipha...
  64 cores, mesh network that extends off the chip, in production.
  Try harder MIT :-p
  
  They already tried harder : http://www.tilera.com/. And as another post mentioned, Intel Knights Corner is cache coherent on 61 cores (62 architectured).
  
  The summary doesn't get the point of the article: what's novel is not the presence of cache coherency, it's just the new way of implementing snoop-based cache coherency over their network. Cache coherency for a large number of cores can be very expensive time-wise, so any idea to improve it is more than welcome.
3. Re:36 cores? Network on a chip? Meh! by Anonymous Coward · 2014-06-23 01:22 · Score: 0
  
  Is that one actually made? The only one in the store is the 16 core version.
4. Re:36 cores? Network on a chip? Meh! by Anonymous Coward · 2014-06-23 01:25 · Score: 0
  
  And no cache is discussed anywhere (even in the reference manual).
5. Re: 36 cores? Network on a chip? Meh! by Anonymous Coward · 2014-06-23 02:02 · Score: 0
  
  Made, and about to be unleashed
  http://www.hpcwire.com/off-the-wire/adapteva-unveils-worlds-smallest-supercomputing-platform-isc14/
6. Re:36 cores? Network on a chip? Meh! by TheRaven64 · 2014-06-23 05:00 · Score: 5, Informative
  
  The core count isn't the interesting thing about this chip. The cores themselves are pretty boring off-the-shelf parts too. I was at the ISCA presentation about this last week and it's actually pretty interesting. I'd recommend reading the paper (linked to from the press release) rather than the press release, because the press release is up to MIT's press department's usual standards (i.e. completely content-free and focussing on totally the wrong thing). The cool stuff is in the interconnect, which uses the bounded latency of the longest path multiplied by single-cycle one-hop delivery times to define an ordering, allowing you to implement a sequentially consistent view of memory relatively cheaply.
  Since I'm here, I'll also throw out a plug for the work we presented at ISCA, The CHERI capability model: Revisiting RISC in an age of risk . We've now open sourced (as a code dump, public VCS coming soon) our (64-bit) MIPS softcore, which is the basis for the experimentation in CHERI. It boots FreeBSD and there are a few sitting around the place that we can ssh into and run. This is pretty nice for experimentation, because it takes about 2 hours to produce and boot a new revision of the CPU.
  
  --
  I am TheRaven on Soylent News
7. Re:36 cores? Network on a chip? Meh! by Anonymous Coward · 2014-06-23 19:44 · Score: 0
  
  So you've open sourced a project written in a proprietary HDL called Bluespec?
8. Re:36 cores? Network on a chip? Meh! by TheRaven64 · 2014-06-23 19:53 · Score: 2
  
  Yes, we've also released the generated Verilog for anyone who wants to use just that. If you're a university, you can easily get a free license for Bluespec. If you're not, then you either most likely don't have the resources to get a decent FPGA (the ones that can run a processor at a useable speed start at about $3K), or you can probably afford the license. We're also talking to Bluespec about open sourcing their compiler, as most of their real value is from other services on top of it, but that's likely to take some time.
  We're evaluating CHISEL, which is promising, but currently there's nothing else in the open source world that comes even vaguely close to Bluespec in terms of productivity for hardware designers, and CHISEL was not available when we started.
  
  --
  I am TheRaven on Soylent News
Intel Knights Landing by SirDrinksAlot · 2014-06-23 01:09 · Score: 2, Informative

So what's special about this chip that Intel's Xeon Phi (first demonstrated in 2007 as Knights Landing with 80 or so cores) isn't already doing? Or is this just a rehash of 7 year old technology that's already in production? It sounds like a copy/paste of Intel's research.
"Intel's research chip has 80 cores, or "tiles," Rattner said. Each tile has a computing element and a router, allowing it to crunch data individually and transport that data to neighboring tiles." - Feb 11, 2007
1. Re:Intel Knights Landing by dreamchaser · 2014-06-23 01:17 · Score: 2
  
  Presumably the novel way they address (pun intended) cache coherency is what is new. More efficiency = greater performance. Time will tell.
2. Re:Intel Knights Landing by gman003 · 2014-06-23 01:19 · Score: 1
  
  It does seem rather similar - a large cluster of cores, laid out in a grid topology. Perhaps they're doing something different with the cache coherency? I couldn't find too much on how Intel's handling that, and it seems to be a focus of the articles on this chip.
3. Re:Intel Knights Landing by Trepidity · 2014-06-23 01:24 · Score: 5, Informative
  
  Yes, as usual, the MIT press release oversells the research, while the original paper [pdf] is a bit more careful in its claims. The paper makes clear that the novel contribution isn't the idea of putting "little internets" (as the press release calls them) on a chip, but acknowledges that there is already a lot of research in the area of on-chip routing between cores. The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.
  
  --
  10 PRINT CHR$(205.5+RND(1)); : GOTO 10
4. Re:Intel Knights Landing by epine · 2014-06-23 03:02 · Score: 1
  
  The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.
  Somehow this was obvious to me even from the press release. I've never yet seen details of an ordering model laid bare where it wasn't the core novelty. Ordering models are inherently substantive. Ordering models beget theorems. Cute little Internets drool and coo.
Architecture by wjcofkc · 2014-06-23 01:16 · Score: 1

I would be curious to know more about the architecture and all around chip specs they are using in their prototype: clock speed, memory interface, etc. The article states they are developing a version of Linux to test it on, so it's safe to say it's an established architecture. Anyway, I am excited to see the results once they have tested it on Linux. While this does not help with the density per core problem, perhaps it will help extend Moore's Law from the perspective of speed increase in respect to micro circuitry.

--
Brought to you by Carl's Junior.
Is there anything new here? by Junta · 2014-06-23 01:20 · Score: 1

So, in one die, it's a little interesting, though GPU stream processors and Intel's Phi would seem to suggest this is not that novel. The latter even lets you ssh in and see the core count for yourself in a very familiar way (though it's not exactly the easiest of devices to manage, it's still very much a real-world example of how this isn't new to the world).
The 'not all cores are connected' is even older. In the commodity space, hypertransport and QPI can be used to construct topologies that are not full mesh. So not only is it not all cores on a bus, it is also not all cores mesh connected, the two attributes claimed as novel here.
Basically, as of AMD64 people had relatively affordable access to an implementation of the concept, and as of Nehalem both major x86 vendors had this concept in place. Each die included all the logic needed to implement a fabric, with the board providing essentially passive traces.

--
XML is like violence. If it doesn't solve the problem, use more.
1. Re:Is there anything new here? by Anonymous Coward · 2014-06-23 01:31 · Score: 0
  
  Also check out "transputers" for waferscale networks of processor cores.
  http://en.m.wikipedia.org/wiki/Transputer
2. Re:Is there anything new here? by Trepidity · 2014-06-23 01:39 · Score: 4, Informative
  
  The basic idea isn't new. What the paper is really claiming is new is their particular cache coherence scheme, which (to quote from the Conclusion) "supports global ordering of requests on a mesh network by decoupling the message delivery from the ordering", making it "able to address key coherence scalability concerns".
  How novel and useful that is I don't know, because it's really a more specialist contribution than the headline claims, to be evaluated by people who are experts in multicore cache coherence schemes.
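  To make the quoted phrase a bit more concrete, here is a toy sketch of ordering decoupled from delivery. It is not the paper's mechanism: it simply assumes every node learns, by the end of a fixed time window, which cores sent a request in that window, and that all nodes agree to replay a window in ascending source-ID order. The names `order_window`, `arrivals` and `senders` are made up for the illustration.

```python
import random

def order_window(arrivals, senders):
    """Return one window's requests in the agreed order (ascending source ID).

    arrivals: list of (source_id, payload) in whatever order the
              network happened to deliver them to this node.
    senders:  set of source IDs announced for this time window (assumed
              to be known identically by every node).
    """
    by_source = dict(arrivals)
    # A node may only commit the window once every announced request is here.
    assert set(by_source) == set(senders), "window incomplete"
    return [(src, by_source[src]) for src in sorted(senders)]

# Three cores issue one request each in the same window.
requests = [(7, "read A"), (2, "write A"), (5, "read B")]
senders = {src for src, _ in requests}

# Every node may see a different delivery order...
views = []
for node in range(4):
    delivered = requests[:]
    random.Random(node).shuffle(delivered)
    views.append(order_window(delivered, senders))

# ...but all nodes end up processing the window in the same global order.
all_agree = all(view == views[0] for view in views)
print(all_agree, views[0])
```

  The point is only that each node computes the agreed order locally from the set of senders, so no request has to pass through a central ordering point.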
  
  --
  10 PRINT CHR$(205.5+RND(1)); : GOTO 10
3. Re:Is there anything new here? by enriquevagu · 2014-06-23 06:14 · Score: 3, Informative
  
  Some knowledge about multicore cache coherence here. You are completely right, Slashdot's summary does not introduce any novel idea. In fact, a cache-coherent mesh-based multicore system with one router associated to each core was presented on the market years ago by a startup from MIT, Tilera. Also, the article claims that today's cores are connected by a single shared bus -- that's far outdated, since most processors today employ some form of switched communication (an arbitrated ring, a single crossbar, a mesh of routers, etc).
  What the actual ISCA paper presents is a novel mechanism to guarantee total ordering on a distributed network. Essentially, when your network is distributed (i.e., not a single shared bus, which covers most current on-chip networks), there are several problems with guaranteeing ordering: i) it is really hard to provide a global ordering of messages (as a bus does) without making all messages cross a single centralized point, which becomes a bottleneck, and ii) if you employ adaptive routing, it is impossible to provide point-to-point ordering of messages.
  Coherence messages are divided in different classes in order to prevent deadlock. Depending on the coherence protocol implementation, messages of certain classes need to be delivered in order between the same pair of endpoints, and for this, some of the virtual networks can require static routing (e.g. Dimension-Ordered Routing in a mesh). Note a "virtual network" is a subset of the network resources which is used by the different classes of coherence messages to prevent deadlock. This is a remedy for the second problem. However, a network that provided global ordering would allow for potentially huge simplifications of the coherence mechanisms, since many races would disappear (the devil is in the details), and a snoopy mechanism would be possible -- as they implement. Additionally, this might also impact the consistency model. In fact, their model implements sequential consistency, which is the most restrictive -- yet simple to reason about -- consistency model.
  Disclaimer: I am not affiliated with their research group, and in fact, I have not read the paper in detail.
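  For readers who have not met Dimension-Ordered Routing, a minimal sketch of the XY variant on a 2D mesh follows. The function name and coordinate convention are illustrative, and it assumes the links and buffers along a path behave as FIFOs; this is the textbook scheme, not code from the paper.

```python
def xy_route(src, dst):
    """Dimension-ordered (XY) route on a 2D mesh: move along X first, then Y.

    src, dst: (x, y) router coordinates. Returns the list of routers visited.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # finish the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

# Every message from (0, 0) to (3, 2) follows exactly these hops, so two
# such messages cannot overtake each other by taking different routes.
route = xy_route((0, 0), (3, 2))
print(route)
```

  Because the route is a pure function of source and destination, point-to-point ordering comes for free; adaptive routing gives that up in exchange for dodging congestion.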
Passing messages rather than sharing! by Anonymous Coward · 2014-06-23 01:21 · Score: 0

Erlang on a chip :-)
Re:Different Power Supply Voltage by fuzzyfuzzyfungus · 2014-06-23 01:22 · Score: 4, Interesting

A higher high/low voltage swing (with a reasonable amount of other stuff being equal) will be more of a thermal nuisance; but if the perks make up for it, that's hardly a dealbreaker. The toasty end of boring desktop CPUs is somewhere north of 200watts already, with a little shoving that they typically survive, so if somebody really wants 36 cache-coherent cores on-die, they'll suck it up and make it work.
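
As a rough back-of-the-envelope figure for that thermal cost, the usual first-order approximation is that dynamic CMOS power scales as C * V^2 * f. The sketch below holds capacitance and clock equal and ignores leakage; the function name is made up.

```python
def dynamic_power_ratio(v_new, v_old):
    """Ratio of dynamic power at two supply voltages, using P ~ C * V^2 * f
    with capacitance C and clock frequency f held equal."""
    return (v_new / v_old) ** 2

# 1.1 V versus 1.0 V, the two supply voltages being compared in this thread.
ratio = dynamic_power_ratio(1.1, 1.0)
print(f"{(ratio - 1) * 100:.0f}% more dynamic power")
```

So a 10% higher supply costs roughly a fifth more dynamic power under these assumptions, which is noticeable but hardly exotic.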

For applications that don't specifically demand that, I'd be interested to know how the costs and benefits of 'dealing with the cooling demands of a smaller number of denser parts' compare with 'dealing with the cooling demands of more, cooler, parts, closer to whatever the performance per watt sweet spot is; but with more cabling, PSUs, switches, and similar interconnect and support stuff to buy and power'...
Where's my massively parrallel programming languag by KingOfBLASH · 2014-06-23 01:25 · Score: 1

While adding an extra core or two made big jumps in performance (because you are almost always running at least two applications) there comes a point where most users won't see a performance boost. While I may now be able to throw 36 processors at a problem, you have to program all those cores to work together. Right now that's a lot of effort, and until programming languages catch up and can optimize code by making it massively parallel, this is going to be a non-starter.
The two hardest problems in CS: by magsol · 2014-06-23 01:27 · Score: 4, Funny

pointer arithmetic, cache invalidation, and off-by-one errors

--
"I'd just like to emphasise that taking a million years isn't a metaphor here..." -Rich Bradshaw
1. Re:The two hardest problems in CS: by Anonymous Coward · 2014-06-23 04:05 · Score: 0
  
  and naming
2. Re:The two hardest problems in CS: by frank_adrian314159 · 2014-06-23 04:12 · Score: 0
  
  No one expects the Spanish Inquisition.
  
  --
  That is all.
Interesting by Virtucon · 2014-06-23 01:31 · Score: 3, Informative

Cache coherency has been one of the banes of multicore architecture for years. It's nice to see a different approach but chip manufacturers are already getting high performance results without introducing additional complexity. The Oracle (Sun) Sparc T5 architecture has 16 cores with 128 threads running at 3.6GHz. It gives a few more years to Solaris at least but it's still a hell of a processor. For you Intel fans the E7-2790 v2 sports 15 cores with 30 threads with a 37.5MB cache so they're doing something right because it screams and is capable of 85GB/s memory throughput.
I'm sure the chip architects are looking at this research but somehow I think they're already ahead of the curve because these kinds of cores/threads are jumps ahead of where we were just a few years ago. Anybody remember the first Pentium Dual Core and The UltraSparc T1?

--
Harrison's Postulate - "For every action there is an equal and opposite criticism"
1. Re:Interesting by Anonymous Coward · 2014-06-23 02:23 · Score: 0
  
  The link for the Intel CPU heads to the ARK page for the Intel® Xeon® Processor E7-8893 v2, a 6 core chip with 12 threads. Where do you see 15 cores with 30 threads?
2. Re:Interesting by Anonymous Coward · 2014-06-23 02:40 · Score: 0
  
  GP Linked to the wrong part -
  http://ark.intel.com/products/75258/Intel-Xeon-Processor-E7-8890-v2-37_5M-Cache-2_80-GHz
3. Re:Interesting by Anonymous Coward · 2014-06-23 02:50 · Score: 1
  
  Oh? No, these parts cause quite a lot of trouble, and you won't know it until you're into kernel programming or HPC programming, and fighting sub-microsecond latency and lock contention issues.
  And processors with weak cache coherency and weak memory ordering are **MURDER** on the normal programmers. The fewer of those exist, the better. Most people cannot even GRASP the weak memory/cache ordering model, let alone deal with issues caused by it.
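  The classic store-buffering litmus test shows what is at stake. The sketch below brute-forces every interleaving allowed by sequential consistency and collects the values the two loads can observe; the thread programs and the helper name are illustrative.

```python
from itertools import permutations

# Store-buffering litmus test. Shared x = y = 0 initially.
#   Thread 0: x = 1; r0 = y
#   Thread 1: y = 1; r1 = x
T0 = [("store", "x"), ("load", "y", "r0")]
T1 = [("store", "y"), ("load", "x", "r1")]

def sc_outcomes():
    """Enumerate every sequentially consistent interleaving and
    collect the resulting (r0, r1) pairs."""
    outcomes = set()
    # Each schedule says which thread takes the next step; program
    # order within a thread is preserved by the per-thread counters.
    for schedule in set(permutations([0, 0, 1, 1])):
        mem = {"x": 0, "y": 0}
        regs = {}
        pc = [0, 0]
        for tid in schedule:
            op = (T0, T1)[tid][pc[tid]]
            pc[tid] += 1
            if op[0] == "store":
                mem[op[1]] = 1
            else:
                regs[op[2]] = mem[op[1]]
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes

outcomes = sc_outcomes()
print(sorted(outcomes))
```

  Under sequential consistency the pair (0, 0) never appears. Hardware with store buffers, x86 included, can produce it unless the code adds fences, which is exactly the kind of surprise being complained about.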
4. Re:Interesting by Anonymous Coward · 2014-06-23 04:18 · Score: 0
  
  Before Pentium dual cores, there were multiprocessor Pentium Pro boards, where cache coherency would kill you, because the cache you had to invalidate was not on the chip but on the system bus (which at that time was a slow 66MHz bus).
  There are several topologies on which to arrange the chips so that they can form assembly lines or triple sets of hypercubes.
5. Re:Interesting by Bengie · 2014-06-23 04:31 · Score: 1
  
  High "thread" count cores are good for workloads with little inter-thread communication and lots of memory stalls. By having a lot of threads running at once, whenever there is a memory stall, you can just switch to another thread, and the chance of that thread being stalled is very low. This also means lots more cache thrashing, so you need larger caches, but they can be tuned for high-throughput high-latency. The entire design for these cpus is geared for high-throughput high-latency, which also tends to be great for energy efficiency.
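  A very crude model of that effect, assuming each hardware thread is stalled independently with the same probability and that switching threads is free (both are simplifications, and the 70% figure is invented for the example):

```python
def core_utilization(stall_fraction, threads):
    """Fraction of time at least one hardware thread is ready to run,
    assuming each thread is stalled independently with probability
    `stall_fraction` and switching between threads costs nothing."""
    return 1.0 - stall_fraction ** threads

# Suppose (hypothetically) a thread waits on memory 70% of the time.
for n in (1, 2, 4, 8):
    print(n, round(core_utilization(0.7, n), 3))
```

  With one thread the core is busy less than a third of the time in this model; with eight threads per core it is busy well over 90% of the time, even though no single thread got any faster.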
6. Re:Interesting by Virtucon · 2014-06-23 08:54 · Score: 1
  
  Oh no question, high thread counts would make sense for say a web service application server vs. something more compute intensive. None of these architectures will ever be in the teraflop or petaflop range for that so there will still be need for specialization of highly compute intensive workloads to those kinds of systems. One thing that will kill this architecture is software compatibility, so it'd be interesting to see if it does take off. In the meantime Moore's law will keep pushing the Sparc and Intel teams to most likely surpass this in say 5 years or maybe license/adapt some of the features into existing designs.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
Re:Where's my massively parrallel programming lang by itzly · 2014-06-23 01:32 · Score: 1

A "new programming language" isn't a magical solution to make a non-parallel algorithm work well on a multi processor architecture.
Parallel processing still remains elusive. by 140Mandak262Jamuna · 2014-06-23 01:38 · Score: 2

Parallel processing has made big strides, but only in some limited areas: graphics rendering, where each pixel can be updated independently of other pixels; fluid mechanics (CFD) using time-marching techniques, where updating the solution at one point needs data from a limited set of neighbors; or iterative matrix solvers. Even for something very structured and free of if statements, like inverting a matrix, parallel methods have suffered.
The basic problem is this: even if just 5% of the work has to be serial, the maximum speedup is 20x, and that is the theoretical maximum. YMMV, and it does. Internet and search have opened up another vast area where a thread can do lots of work and send just a very small set of results back to the caller. Hits are so few compared to misses that you can make some headway. Even then we have found very few applications suitable for massively parallel solutions.
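The 20x ceiling is Amdahl's law. A quick sketch (the function name is just for illustration) shows how far below that ceiling a realistic core count lands:

```python
def amdahl_speedup(serial_fraction, cores):
    """Amdahl's law: speedup on `cores` processors when
    `serial_fraction` of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# 5% serial work on 2, 36 and 1000 cores.
for cores in (2, 36, 1000):
    print(cores, round(amdahl_speedup(0.05, cores), 2))
# As the core count grows without bound the speedup approaches 1 / 0.05 = 20.
```

With 5% serial work, 36 cores give only about a 13x speedup, and even 1000 cores stay below 20x.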
We need a big breakthrough. If you divide a 3D domain into a number of subdomains, the interfaces between the subdomains are 2D. The volume of the 3D domain represents the computational load, and the area of the interfaces represents the communication load. If we could come up with domain-division algorithms that guarantee the interfaces would be an order of magnitude smaller, even as we go from 3D to a higher number of dimensions, and if we could organize these subdomains into hierarchies, we would be able to deploy more and more computational work and be confident the communication load would not overwhelm the algorithm. This breakthrough is yet to come. Delaunay tessellations (and their duals, Voronoi polygons) have been defined in higher dimensions. But the ratio of the number of "cells" to the number of "vertices" explodes in higher dimensions; last time we tried, we could not even fit a 10-dimensional mesh of 10 points into all the available memory of the machine. It did not look promising.
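To put rough numbers on the volume-versus-interface argument, here is a sketch for the simplest possible case: a regular Cartesian grid cut into cubic subdomains. That is an assumption made for the example, not the Delaunay meshes described above, and the function name is made up.

```python
def comm_to_compute(side, dims):
    """Interface-to-interior ratio of one cubic subdomain.

    side: cells along each edge of the subdomain
    dims: number of spatial dimensions
    A d-cube has side**d cells (work) and 2*d faces of
    side**(d-1) cells each (data exchanged with neighbours).
    """
    volume = side ** dims
    surface = 2 * dims * side ** (dims - 1)
    return surface / volume          # simplifies to 2 * dims / side

# The same 10-cells-per-edge subdomain in 2, 3 and 10 dimensions.
ratios = {d: comm_to_compute(10, d) for d in (2, 3, 10)}
print(ratios)
```

For a fixed edge length the ratio grows linearly with the dimension, so in 10 dimensions this subdomain has twice as many interface cells as interior cells to compute on.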

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Not entirely new by Anonymous Coward · 2014-06-23 01:45 · Score: 0

CellBE was designed like this.
It had a shared loop where data was kicked around in both directions and each core picked out only that which was addressed to it.
And I don't even think this is an idea that was new to Cell either.
Intel Knights Landing by Anonymous Coward · 2014-06-23 01:47 · Score: 0

So the point here, which is not insignificant if they really have solved it, is cache coherency. Snooping other caches can become a massively expensive task in terms of round trip latency. Potentially 1000s of cycles. In low power architectures (which is certainly a consideration) moving data around is really expensive from a power perspective and is where a large portion of the power is actually spent in real world uses. IF they have solved it in a more efficient manner rather than the current brute-force approaches, then that's good research.
Re:Where's my massively parrallel programming lang by Anonymous Coward · 2014-06-23 01:54 · Score: 0

No kidding... If that was all there was to it the guys at the CPU level would just do it for us.
The problem is previous results dependency. If you do not care about previous results then multi threaded programming is dead easy and a scheduling problem which is fairly well understood. It is when you need the previous results or external I/O that parallel fails.
I predict that at some point some bright spark of a CPU guy will come up with the idea of discarded results, as 'if' conditions tend to create pipeline stalls. You could go ahead and run both paths of code. Then decide which one is correct and discard the unused results. Maybe they already have... I have not followed CPU arch for a while now...
Great, so they reinvented by LeadSongDog · 2014-06-23 02:31 · Score: 1

..the Transputer. Great idea, but a giant market fail.

--
Oh, I'm sorry sir, I thought you were referring to me, Mr. Wensleydale.
1. Re:Great, so they reinvented by itzly · 2014-06-23 04:34 · Score: 1
  
  Giant market fail, because it was not a great idea after all.
2. Re:Great, so they reinvented by angel'o'sphere · 2014-06-23 05:44 · Score: 1
  
  Lol,
  Such says the guy with no clue.
  The Transputers were way ahead of what we do in our days.
  And the first thing I thought when I saw the MIT concept is: "oh, they have put 32 transputers on a single die".
  Transputers were built by a company called INMOS.
  About 90% of the military hardware in Europe (around 1990/1995) was running on transputers.
  That means radar systems, flight control, avionic hardware etc.
  INMOS went down because the Japanese wanted to buy it. But the French government intervened and prevented it. After some years of debating, a French (government-owned) company (was it Thales?) bought INMOS.
  But as the company had no interest in processor manufacturing, they simply shut that branch down and got rid of it.
  Down went a multi-billion research program of the EU. I would not be surprised if there was a US-sponsored conspiracy behind it ... I cannot cease to wonder why our processors in real life are still 30 years behind what we had in our universities as research prototypes.
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
3. Re:Great, so they reinvented by slew · 2014-06-23 05:57 · Score: 1
  
  Giant market fail, because it was not a great idea after all.
  Actually, the transputer had a few good kernels of an idea: a sea of loosely interconnected processors, each with local memory. However, the actual execution wasn't that good, and the only real market was embedded military signal processing systems. For a while, inmos attempted to chase workstation graphics, but eventually they got killed by the i860 (which is sad, as it wasn't a very good implementation of any idea either, but happened to offer better price/performance than the transputer for floating point), which of course eventually died as its limitations caught up with it as well (although many of its ideas lived on in the original Pentium, like the U/V super-scalar execution pipes and the pipelined FP unit)...
  One thing both the Transputer and the i860 had in common is that they were engineering solutions in search of a problem to solve. Perhaps products that embed great engineering ideas often don't translate to good implementations, which inevitably fail in the market, but good ideas tend to live on and sometimes find their way into market-worthy products...
4. Re:Great, so they reinvented by itzly · 2014-06-23 07:20 · Score: 1
  
  If it was a great idea, people would still be doing it.
5. Re:Great, so they reinvented by angel'o'sphere · 2014-06-23 11:09 · Score: 1
  
  People are still doing it, or did you not get what this article is about?
  Or did you never catch any hint (given in this thread and many others) about http://www.greenarraychips.com... ?
  
  --
  Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
DOCTOR Singapore Research Professor, for you. by Thanshin · 2014-06-23 02:37 · Score: 0

Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT
If he is the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, who is the Research Professor of Electrical Engineering and Computer Science?
1. Re:DOCTOR Singapore Research Professor, for you. by vovin · 2014-06-23 04:21 · Score: 2
  
  Uh, she.
2. Re:DOCTOR Singapore Research Professor, for you. by Anonymous Coward · 2014-06-23 12:57 · Score: 0
  
  In English, when in doubt or it isn't clear, we default to the male personal pronoun, as there is no gender-neutral one. "It" doesn't do.
  This doesn't stop accusations of sexism, etc actual or implied.
  Even some common English first names are gender ambiguous.
Re:Where's my massively parrallel programming lang by Z00L00K · 2014-06-23 02:41 · Score: 1

The question is - do you always need a parallel tasking software? Most tasks are bread&butter tasks, no need to chew them up. Put your energy into the few things that do need to be broken up.
But mostly it's a "hen and egg" problem - can't do multi-core software since there aren't enough serious multi-core machines, or the owners in software companies don't see a benefit in it.

--
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 02:44 · Score: 0

1V over 1um is a million volts per meter, assnozzle. Your incompetent and imbecilic commentary really drag down this place.
Re:Where's my massively parrallel programming lang by Rockoon · 2014-06-23 03:24 · Score: 2

You could go ahead and run both paths of code. Then decide which one is correct and discard the unused results.
Intel is already doing partial speculative execution in the case of conditional branches. The pipeline is filled with the predicted path which is then frequently executed out of order (before the condition is known) ..

Intel is not, however, doing the full concept you have described (eager speculative execution) and I don't think it's likely that they ever will. The best case for eager speculative execution would be when the branches are completely unpredictable, which is only very rarely true. Further, it requires significant over-provisioning of execution units to have enough to execute both paths of a conditional branch each at "best possible speed" .. resources that would be completely wasted whenever there isn't a conditional branch in the pipeline...

--
"His name was James Damore."
Re:D-WAV-E !! by Anonymous Coward · 2014-06-23 03:31 · Score: 0

If it was up to me, you could have ours. The refrigerators in the parking garage are really annoying.
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 03:57 · Score: 0

I figure you must be some type of alcoholic or have another substance problem. Can you confirm?
Re:Different Power Supply Voltage by LynnwoodRooster · 2014-06-23 04:11 · Score: 1

According to the comparison table, (Refer timeline 4:21 of this video) this chip uses 1.1V while other standard chips are using 1.0V. This difference may make it hard for the chip makers to use this technology.
No, it's the only way to make it faster because it goes to eleven...

--
Browsing at +1 - no ACs, I ignore their posts. So refreshing!
Such Short memory. . . by Anonymous Coward · 2014-06-23 04:18 · Score: 0

Been there, done that:
http://en.wikipedia.org/wiki/Transputer
http://en.wikipedia.org/wiki/Network_on_a_chip
And still, the modern interconnects from the likes of ARM (CCN-508) are, in effect, the same thing.
And then there's this:
http://www.xmos.com/
IBM even does this with their MCM's for their high end servers & Mainframes.
Serializing things to send over to another core also costs time/transistors.
What's really needed is a novel approach in how to exploit all of this processing power and (oh by the way, as the man in the corner says) get a better SW architecture in place that can take advantage of all of this. Things today are just soooo inefficient.
Best of luck!!!
Re:Different Power Supply Voltage by Moof123 · 2014-06-23 04:37 · Score: 4, Interesting

Banging my head on the table right now.
Why do people with zero actual semiconductor knowledge try to speak as an authority*?!
It's a research chip, meaning they don't need to be on the latest process node to show their proof of concept. Larger nodes (much cheaper to design a chip on) have thicker gate passivation layers and run at higher voltage. From an architecture standpoint the process node/voltage are irrelevant. So if their architecture proves out, some bigger outfit can run with it while targeting the latest-greatest itty-bitty process node to increase the clock-rate, drop the power, and reduce the area/price.
*I am not a processor designer, just a mixed signal (mostly analog) guy, but I've been working in the semiconductor industry, including doing process bake-offs for over a dozen years.
Re:Different Power Supply Voltage by Ralph+Wiggam · 2014-06-23 05:07 · Score: 1

Why do people with zero actual semiconductor knowledge try to speak as an authority*?
Is this your first day on Slashdot?
lots of other many-core processors by loufoque · 2014-06-23 05:17 · Score: 1

There are hundreds of processors with 64 cores or more, each of them claiming to have solved the scalability problem.
Re:Different Power Supply Voltage by wagnerrp · 2014-06-23 05:27 · Score: 1

The toasty end of boring desktop CPUs is somewhere north of 200watts already
Well... somewhere south of 100W, anyway, and even high end workstation/server chips are under 150W.
Re:Different Power Supply Voltage by wagnerrp · 2014-06-23 05:35 · Score: 1

You people seem to forget we're dealing with chips that have features counted in individual atoms. 1V across three atoms may work, 1.1V across three atoms arcs over.
Luckily we're still dealing with features hundreds of atoms across, and not just three...
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 05:45 · Score: 0

Oh look, it's Mr Heat Controls Transistors, who still hasn't provided a single source for his heat theory.
But anyways
https://www.youtube.com/watch?...
Three atoms, dickweed.
Re:Different Power Supply Voltage by drinkypoo · 2014-06-23 05:50 · Score: 1

I figure you must be some type of alcoholic or have another substance problem. Can you confirm?
Yeah, I'm allergic to stupidity. I have to take a pill before I can come anywhere near slashdot, and keep an inhaler and epi pen on hand.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Why I like multi-core: by packrat0x · 2014-06-23 05:52 · Score: 1

Because Windows programs have a habit of taking over a processor; acting like I am still using DOS.

--
227-3517
Nice design. by Animats · 2014-06-23 06:07 · Score: 1

This is a nice little trick. This has the potential to extend shared consistent memory multiprocessor designs to far larger numbers of processors. Whether this is a performance win remains to be seen. Good idea, though. Note that the prototype chip is just a feasibility test; they used an off the shelf Power CPU design, added their interconnect network, and sent the job off to a fab. A production chip would have optimizations this does not.
We know of only two general purpose multiprocessor architectures that are broadly useful - shared consistent memory multiprocessors, and clusters of machines with no shared memory. Dozens of other schemes have been tried - SIMD machines (the Connection Machine), non-shared memory with DMA to a bigger memory (the Cell), message passing to adjacent machines in N dimensions (Hypercube), message passing over an on-chip network (several examples), cross-CPU DMA access (Infiniband) and shared memory without cache consistency (Intel experimental). In all cases, the hardware worked, the programming was a problem from hell, and the concept was dropped. The Cell in the PS3 is the only high-volume product with an exotic multiprocessor architecture, and that was such a pain that the PS4 dropped it for a more conventional architecture.
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 06:55 · Score: 0

https://en.wikipedia.org/wiki/...
Not a big deal by AaronW · 2014-06-23 07:11 · Score: 1

I don't see what the big deal is. I'm currently working with early silicon on a cache coherent 48-core 64-bit MIPS chip with NUMA support and built-in 40Gbps Ethernet support. The chip also has a lot of extended instructions for encryption and hashing plus a lot of hardware engines for things like zip compression, RAID calculations, regular expression engines and networking support among other things. It also has built-in support for content addressable memory.
It also has a network on-chip where each core or group of cores can have its own network interface to other cores. This is useful for things like virtualization or when you want to run multiple Linux kernels and other applications side by side since we also support running binaries on bare metal without an OS underneath.
http://cavium.com/OCTEON-III_C...

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
Scala? by bigsexyjoe · 2014-06-23 08:37 · Score: 1

Maybe Scala can be your language. It supports creating your code out of mostly immutable objects, which makes it good for parallelism.

--
Democracy Now! - your daily, uncensored, corporate-free
no sign of a sensible debate then by Anonymous Coward · 2014-06-23 09:05 · Score: 0

So seriously, I know it gets bitchy in the comments sometimes, but today I scrolled down to what appears to be a bunch of kids bickering.
Get a grip and try and have an intelligent discussion!
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 09:15 · Score: 0

P4 Northwood processor load sink: 89W (source: my own 2.66GHz single core and board-bundle monitoring), the design spec TDP for a Northwood is 67W@2.2GHz-103W@3GHz (source: Intel). The Extreme Edition processors are pretty much unlocked and will suck in whatever power's available, the Gallatin cores are all >100W TDP. Late Prescotts are all 115W. Pentium Ds are all shy of 150W (source: Wikipedia). The server lines (eg Itanium, Xeon, Pentium Pro) are all comparable in terms of TDP to the midrange desktop processors.
Re:Different Power Supply Voltage by viperidaenz · 2014-06-23 09:59 · Score: 1

Current 22nm Intel CPU cores run perfectly fine with a core voltage of 1.26V.
Re:Different Power Supply Voltage by viperidaenz · 2014-06-23 10:00 · Score: 1

Current CPUs can run perfectly fine on 1.1V.
A common core voltage is currently 0.6V - 1.35V, depending on clock.
Voltage is a function of process technology, not system architecture.
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 10:21 · Score: 0

Yeah, like the massive stupidity of not grasping that when you have IC features three atoms thick, that 0.1V more spells the difference between "working" and "arced over".
You must have quite a drug setup, it can't be easy being allergic to your own stupidity.
At 34:40 minutes it starts.
https://www.youtube.com/watch?...
Three. Atoms.
Shitguzzler.
Re:Different Power Supply Voltage by cheater512 · 2014-06-23 10:25 · Score: 1

Add an extra atom. Pretty simple. No reason why it has to be 3 atoms thick.
In fact I hear that a few years ago the smallest features were hundreds of atoms! Who knows how they managed to deal with this tricky issue of higher voltages.
Re:Different Power Supply Voltage by wagnerrp · 2014-06-23 10:26 · Score: 1

Intel abandoned the Netburst architecture in the mid-2000s. All that hardware is a decade old, except for the Pentium Pro, which is almost two decades old.
Re:Different Power Supply Voltage by fuzzyfuzzyfungus · 2014-06-23 11:32 · Score: 1

Post Netburst, AMD is the one having TDP issues, and their current enthusiast-gamer-nutjob CPU is specced at 220 watts. Intel has their numbers down from the Prescott Pentium D days, though the use of 'TDP' rather than peak, and thermal throttling that actually works, makes it a little tricky to pin a precise ceiling value on some of them without actually getting out the test equipment.

Most are, of course, much lower, given the popularity of laptops and desktops that don't need water cooling, and so on.

My intended point, which I should have made clearer, is that 150-200 watt CPUs, while the market generally doesn't like them, can be, are, and have been sold for use by relatively unskilled users running cheaply mass-constructed computers under minimally controlled 'room temperature' conditions. So it is only reasonable to assume that, were a part with a moderately alarming power draw to have some virtue for server use that compensated for that, it could be made to work with relatively little fuss. It'd probably be really noisy once they got it down to 1-2U, and the hot aisle would be even less pleasant than usual; but if people wanted them, no major engineering problems would have to be overcome to deliver.
Re:Different Power Supply Voltage by Anonymous Coward · 2014-06-23 12:19 · Score: 0

From TFA:

After testing the prototype chips to ensure that they’re operational, Daya intends to load them with a version of the Linux operating system, modified to run on 36 cores, and evaluate the performance of real applications, to determine the accuracy of the group’s theoretical projections. At that point, she plans to release the blueprints for the chip, written in the hardware description language Verilog, as open-source code.
It's a Verilog RTL core, nothing about an RTL core dictates supply voltages. You can tailor the synthesis to target any supply voltage or operating frequency you want.
I can't believe this wasn't mentioned in the summary, as it's probably the most significant aspect that she intends to release it as open-source. There are other NoC processors like Tilera, who recently released a 72 core chip. What is innovative here is that it uses scoreboarding and knowledge of the chip topology to know what messages could arrive before they arrive to deal with message reordering and that the design will be open source, neither of which was mentioned in the headline or summary.
Re:Different Power Supply Voltage by wagnerrp · 2014-06-23 13:28 · Score: 1

and their current enthusiast-gamer-nutjob CPU is specced at 220 watts.
I'll admit, the AMD FX was the only line I didn't check before posting. Their next closest chips are only 140W, and they've only got a couple at that. Most are 115W or lower. I didn't even know the AM3 socket was capable of 220W.
Re:Different Power Supply Voltage by fuzzyfuzzyfungus · 2014-06-24 01:03 · Score: 1

Based on the mixed reviews, it sounds like 220W is really pushing your luck unless the motherboard has some heroically overqualified VRM onboard, and your PSU is descended from an arc welder on its mother's side; but I've yet to see a single report of somebody actually fusing a pin rather than just crashing a lot, so apparently the socket is tougher than it looks. I was very surprised to see such a part being sold at that power level, though, rather than just 'unlocked, and we'll just look the other way'.
Spin Ph.Ds? by Meeni · 2014-06-24 08:34 · Score: 1

MIT is expert at making these sorts of PR stunts, where they claim they invented something novel when they have replicated some boring old result from 10 years ago. Well, here it is 30 years ago.