Intel Announces Xeon E5 and Knights Corner HPC Chip

Posted by Unknown on Wednesday November 16, 2011 @06:06AM from the one-treeleeon-flops dept.

MojoKid writes "At the supercomputing conference SC2011 yesterday, Intel announced its new Xeon E5 processors and demoed their new Knights Corner many integrated core (MIC) solution. The new Xeons won't be broadly available until the first half of 2012, but Intel has been shipping the new chips to a small number of cloud and HPC customers since September. The new E5 family is based on the same core as the Core i7-3960X Intel launched Monday. The E5, while important to Intel's overall server lineup, isn't as interesting as the public debut of Knights Corner. Recall that Intel's canceled GPU (codenamed Larrabee) found new life as the prototype device for future HPC accelerators and complementary products. According to Intel, Knights Corner packs 50 x86 processor cores into a single die built on 22nm technology. The chip is capable of delivering up to 1TFlop of sustained performance in double-precision floating point code and operates at 1 — 1.2GHz. NVIDIA's current high-end M2090 Tesla GPU, in contrast, is capable of just 665 DP GFlops."

122 comments

Re:Pfffft! by ColdWetDog · 2011-11-16 06:12 · Score: 2

4chan is still down? Maybe we should lend them a hand.

--
Faster! Faster! Faster would be better!
may I say by Anonymous Coward · 2011-11-16 06:13 · Score: 0

time to do some bitcoin mining?
Huh? by PowerCyclist · 2011-11-16 06:15 · Score: 2

I mostly understand the figures this post states, but it sounds like engineering dialog from 'Star Trek: Voyager'. But all this means to me is that the chips from last year are cheaper now that they've been outclassed.
1. Re:Huh? by Anonymous Coward · 2011-11-16 06:19 · Score: 1
  
  Summary: Faster chips out. You can't get them. Also a 50 core chip was released.
2. Re:Huh? by Surt · 2011-11-16 06:37 · Score: 1
  
  If you could make your question clearer, you'll probably get a more effective answer.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
3. Re:Huh? by gstoddart · 2011-11-16 07:26 · Score: 1
  
  If you could make your question clearer, you'll probably get a more effective answer.
  An interjection followed by two statements does not a question make. ;-)
  
  --
  Lost at C:>. Found at C.
4. Re:Huh? by Anonymous Coward · 2011-11-16 07:36 · Score: 1
  
  huh?
5. Re:Huh? by SuricouRaven · 2011-11-16 08:11 · Score: 2
  
  More importantly, an x86 chip. Not a GPU. Which means anyone who knows even the fundamentals of programming can use one with minimal additional training. No screwing around with the inability of GPUs to do recursion or deep nesting, no trying to deal with your data as if it were a texture. Just code and go.
6. Re:Huh? by Surt · 2011-11-16 08:42 · Score: 1
  
  Huh?
  A question asked directly does, though.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
7. Re:Huh? by gstrickler · 2011-11-16 09:07 · Score: 1
  
  You mean the fundamentals of parallel processor programming. It's not exactly a widely held skill yet.
  
  --
  make imaginary.friends COUNT=100 VISIBLE=false
8. Re:Huh? by Anonymous Coward · 2011-11-16 09:15 · Score: 0
  
  But you still need a good parallel algorithm, which is a much bigger and more fundamental issue for many problem areas than learning some oddball programming skills. Needing that good parallel algorithm to also be non-recursive is only somewhat worse.
9. Re:Huh? by gstoddart · 2011-11-16 09:33 · Score: 1
  
  Huh?
  A question asked directly does, though.
  Well, to continue the pedantry ... in and of itself, "Huh?" is merely an interjection.
  
  Interjection is a big name for a little word. Interjections are short exclamations like Oh!, Um or Ah! They have no real grammatical value but we use them quite often, usually more in speaking than in writing. When interjections are inserted into a sentence, they have no grammatical connection to the sentence. An interjection is sometimes followed by an exclamation mark (!) when written.
  It's a grammatical equivalent to a grunt.
  Ergo, no question was ever posed. :-P
  
  --
  Lost at C:>. Found at C.
10. Re:Huh? by Surt · 2011-11-16 09:44 · Score: 1
  
  http://www.google.com/search?q=define+huh%3F&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a
  I think the use of a ? pretty clearly moves it from mere interjection to inquiry / expression of confusion. If he had titled his post Huh. instead of Huh? I'd agree with you.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
11. Re:Huh? by SuricouRaven · 2011-11-16 10:50 · Score: 1
  
  KC will have no problems with recursion, it's an x86. It's GPUs that don't do recursion. At least not current ones - remember they were made for processing graphics, not something where recursion is of much use.
12. Re:Huh? by GigaplexNZ · 2011-11-16 11:05 · Score: 1
  
  Whoosh. Parent was suggesting that GPUs' lack of recursion is not the big hurdle; knowing how to do parallel algorithms in the first place is the biggest issue. x86 doesn't solve that.
13. Re:Huh? by Anonymous Coward · 2011-11-16 13:15 · Score: 0
  
  IBM's Cell processor reached 1Tflop per-processor 7 years ago, and that's using a 90nm process.
14. Re:Huh? by HappyPsycho · 2011-11-16 15:12 · Score: 1
  
  Just to clarify, the 50-core beast hasn't been released as yet.
  They gave a demo of what is most likely a prototype chip.
15. Re:Huh? by Anonymous Coward · 2011-11-16 16:46 · Score: 0
  
  You mean the fundamentals of parallel processor programming. It's not exactly a widely held skill yet.
  Wrong.
  It is the fundamental of parallel-vector-machine programming. All GPUs on the market today are PVMs.
  The MIC is not a PVM, it has many independent processing cores which each have limited vector capabilities, it is still a parallel computer in that it is doing multiple operations concurrently, but there is no requirement (aside from bus contention) for the operations to execute on aligned vectors.
  This makes the MIC much more versatile and applicable to problems that can't be efficiently computed on a PVM, such as raytracing, or any other problem which requires independent branching (for example anything that requires recursive-descent, such as tree searches). And if it can deliver the performance claimed, on time, it will be within one generation of the performance of PVMs, making them temporarily obsolete, until someone can again meet Intel's performance figure with a PVM.
16. Re:Huh? by Anonymous Coward · 2011-11-16 19:34 · Score: 0
  
  Not even close, apples to apples.
  If you assume 8 spus per chip (PS3 only has seven turned on), running at 3.2GHz (PS3 clock rate) counting FMAC as two FLOPs * 4 way SIMD you get about 200G single precision FLOPS - assuming you can retire one FMAC per cycle which you can't. The SPUs on the cell also took 10x longer for double precision AND could only do two at a time giving you a theoretical 10G FLOPS double precision. Intel is claiming 1TFLOP double precision. Apples to apples, cell is 1% best case.
  Not only that but the SPUs have about 15GB/s bandwidth to main memory. At four bytes per float you only get about 2 GFLOPS worth of data, 8GB/s read, 8GB/s write so I hope you're doing a lot of math per float. While they don't specify I'd bet there is quite a bit more bandwidth available on the Intel chip but still nowhere near enough to support that kind of compute power.
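For anyone who wants to check the arithmetic in the comment above, here is a quick back-of-envelope sketch in Python. It uses only the figures quoted there (8 SPUs, 3.2GHz, 4-way SIMD, FMAC counted as two FLOPs, roughly 10 GFLOPS double precision, 8GB/s of read bandwidth); the function names are made up for illustration, and the numbers are theoretical peaks, not measurements.

```python
def peak_gflops(units, clock_ghz, simd_lanes, flops_per_op):
    """Theoretical peak in GFLOPS, assuming every unit retires one SIMD op per cycle."""
    return units * clock_ghz * simd_lanes * flops_per_op

def gfloats_per_second(bandwidth_gb_s, bytes_per_float):
    """How many billions of floats per second a given memory bandwidth can feed."""
    return bandwidth_gb_s / bytes_per_float

# 8 SPUs at 3.2 GHz, 4-wide single precision, FMAC counted as 2 FLOPs.
cell_sp = peak_gflops(units=8, clock_ghz=3.2, simd_lanes=4, flops_per_op=2)

# The comment's double-precision estimate for Cell and Intel's claimed figure.
cell_dp_estimate = 10.0      # GFLOPS, as estimated in the comment
knights_corner_dp = 1000.0   # GFLOPS, Intel's 1 TFLOP claim

print(f"Cell single-precision peak: {cell_sp:.1f} GFLOPS")
print(f"Cell DP estimate as share of Intel's claim: {cell_dp_estimate / knights_corner_dp:.0%}")
print(f"Floats fed by 8 GB/s of reads: {gfloats_per_second(8, 4):.1f} G/s")
```

The single-precision result lands at about 200 GFLOPS, which matches the comment, and the last line shows why the bandwidth point matters: the memory system can deliver far fewer operands per second than the cores could consume.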
17. Re:Huh? by makomk · 2011-11-17 02:35 · Score: 1
  
  Not exactly. You haven't needed to treat your data as though it was a texture on GPUs for a couple of generations now, and getting decent performance out of Knights Corner means writing very similar incredibly-wide SIMD code to what GPUs use - except that GPUs have some decent tools to make the porting process easier, and I'm not sure if Knights Corner does. (For example, if you have a loop over a bunch of elements that does the same operations on each but does a different number of passes for different elements, GPUs can generate masking code to emulate this and drop out of the loop once it's done. The same's true if you've got an IF statement with code that only applies to a small number of elements - the entire 16-wide GPU thread has to pay the performance cost if any element triggers it, but the GPU can automatically mask out the rest and can skip the entire section if nothing requires it.)
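The masking trick described in the comment above can be sketched in plain Python. This is only a model of the idea, not how any real GPU compiler or the Knights Corner toolchain does it; `masked_loop` and its parameters are hypothetical names chosen for the illustration.

```python
def masked_loop(values, trip_counts, step):
    """Apply `step` to lane i exactly trip_counts[i] times, in lockstep.

    All lanes advance together, as on a SIMD unit. A per-lane mask switches
    off lanes that have finished, and the loop exits once no lane is active.
    Returns the final lane values and the number of lockstep passes paid for.
    """
    lanes = list(values)
    remaining = list(trip_counts)
    passes = 0
    while True:
        mask = [r > 0 for r in remaining]   # which lanes still have work
        if not any(mask):                   # nothing left anywhere: drop out
            break
        # One lockstep pass: every lane pays for it, only active lanes update.
        lanes = [step(v) if m else v for v, m in zip(lanes, mask)]
        remaining = [r - 1 if m else r for r, m in zip(remaining, mask)]
        passes += 1
    return lanes, passes

# Four lanes that need 0, 1, 3 and 2 doublings respectively.
result, passes = masked_loop([1, 1, 1, 1], [0, 1, 3, 2], lambda v: v * 2)
print(result, passes)  # [1, 2, 8, 4] 3
```

Note that the whole vector pays for three passes because one lane needs three, which is the performance cost the comment mentions.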
Dead by Anonymous Coward · 2011-11-16 06:17 · Score: 0

Xeon is dead remember
-L. Ellison
Little Intel has growed up by ackthpt · 2011-11-16 06:21 · Score: 0

When they said nobody needed multicore processors I heard the echoes of "640K should be enough for anyone" and "There is no reason for any individual to have a computer in his home." Now they're trying to see how many they can jam on one die. 50 is a pretty odd number, though. Usually you see things in powers of 2 (2, 4, 8, 16). Perhaps they needed space on the die for Mickey or an etched portrait of Jobs.

--

A feeling of having made the same mistake before: Deja Foobar
1. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 06:24 · Score: 1
  
  When they said nobody needed multicore processors
  [citation needed]
2. Re:Little Intel has growed up by Maximum+Prophet · 2011-11-16 06:27 · Score: 1
  
  More than likely, there are 64 cores but only 50 are activated because they can't get a decent yield of perfect chips. That also means that you might be able to get samples of 25 core chips that didn't even make the 50 core cutoff. (One core might also be dedicated for bookkeeping purposes)
  
  --
  All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
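To see why disabling spare cores could help yield, here is a rough sketch in Python. Everything numeric in it is an assumption: the 64-core die is the commenter's speculation, the 90% per-core success rate is picked purely for illustration, and real defects are not independent per core. The helper name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(good_needed, total, p_good):
    """P(at least good_needed of total cores work), treating cores as independent."""
    return sum(
        comb(total, k) * p_good**k * (1 - p_good) ** (total - k)
        for k in range(good_needed, total + 1)
    )

P_CORE_GOOD = 0.90  # assumed per-core success rate, for illustration only

print(f"all 64 cores good: {prob_at_least(64, 64, P_CORE_GOOD):.4f}")
print(f"at least 50 of 64: {prob_at_least(50, 64, P_CORE_GOOD):.4f}")
```

Under these made-up numbers a perfect 64-core die is very rare while a die with at least 50 working cores is common, which is the intuition behind selling parts with some cores fused off.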
3. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 06:27 · Score: 3, Insightful
  
  Odds are... they have it lined up such that... they are in a 5x10 grid. Or a 5x5 Grid front/back.
  Just because it's a computer doesn't mean it's bound by the power of two. Boards are rectangular. Chips laid out aren't necessarily in binary distribution.
4. Re:Little Intel has growed up by Azmodan · 2011-11-16 06:33 · Score: 1
  
  This would be a +1 if you weren't posting as AC.
5. Re:Little Intel has growed up by UnknowingFool · 2011-11-16 06:34 · Score: 1
  
  Your average consumer doesn't need 50 cores. For HPC which this was designed for, multiple cores are essential. As for the number of cores, I would guess die size was a factor. There also might be redundancy. It's a RAID 50 CPU. ;)
  
  --
  Well, there's spam egg sausage and spam, that's not got much spam in it.
6. Re:Little Intel has growed up by mikael · 2011-11-16 06:36 · Score: 1
  
  I'm guessing there would have to be glue logic to get all these processors to share the memory space as well as read/write access. From the promotional pictures of other multi-core chip dies, each core is usually surrounded by a band of interface logic as well as a hefty large block of cache memory. That seems to be the biggest change in the evolution of CPU's. It seems easier to just create larger caches or more cores than anything low level.
  Maybe they accept one or more non-functional cores in exchange for increased yields. Those cores that don't function correctly could simply be disabled.
  
  --
  Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
7. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 06:37 · Score: 0
  
  always scaling by powers of 2 only really makes sense when the only resource impacted by the increase in size is the registers that store values. computers store numbers in binary, so yes, as you add bits to a register, the number of cases it supports doubles for each bit added. hence, color depth over the history of gaming went from 8-bit, to 16-bit, to 32-bit, etc. But nothing about the process by which you physically etch a copy of a cpu layout onto a wafer of silicon scales naturally in powers of two. The only thing that scales naturally as a power of two for that is the variable in which you store information like the unique-identifier for each core on the die. But the number of bits required to assign each core an id number is not the limiting factor. The limiter is the physical size and shape of each core, and the number of times the individual core area divides into the manufacturing constraints of the equipment that does the etching.
8. Re:Little Intel has growed up by fuzzyfuzzyfungus · 2011-11-16 06:38 · Score: 3, Informative
  
  Intel's period of dismissive attitude toward advanced features(multiple cores, 64-bit support on x86, something that sucked less than FSB) was never really serious. Back when they still thought that they had a chance of making IA64 the 'serious' platform and gradually letting x86(and AMD) sink into the bargain bin, they did some tactical rubbishing of what "normal users" needed in order to justify restricting those features to the high-end SKUs; but they worked on them.
  
  Once it became clear that that particular plan wasn't a happening thing, and that AMD was delivering serious server parts at knockdown prices, and Nvidia was doing interesting things with GPUs, and ARM licensees were pumping out increasingly zippy low-end chips, they stopped fucking around. These days they'll still charge as hard as they can for the features provided; but their hopes of sandbagging x86s in order to sell IA64s are dead.
9. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 06:50 · Score: 0
  
  I believe I read somewhere (maybe HPCWire) that there are inactive cores on the chip, and the reason is a combination of yield and heat management.
10. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 06:50 · Score: 0
  
  When they said nobody needed multicore processors
  [citation needed]
  [{Fallacy:Appeal to authority}]
  (Protip: It does not make something more true or false, if person X links to person Y stating it with quotes and "Intel said" around it, than if person X states it directly. Maybe you hung in Wikipedia mailing lists for too long.)
11. Re:Little Intel has growed up by Surt · 2011-11-16 06:51 · Score: 1
  
  Who said nobody needed multicore processors? That seems like a pretty unlikely claim, particularly from intel who were very much into selling multi-cpu systems to the high-end long before multicore became the norm. I had a dual-socket pentium II consumer grade system ages ago. That we were headed to multicore was obvious even then.
  
  --
  "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
12. Re:Little Intel has growed up by David+Greene · 2011-11-16 06:53 · Score: 4, Informative
  
  Your average consumer doesn't need 50 cores.
  Sure they do. What do you think a GPU is? History has shown over and over that we can never have enough computing power. Now that we're at the physical limits of clock speeds, parallelism is going mainstream.
  
13. Re:Little Intel has growed up by Bengie · 2011-11-16 07:00 · Score: 1
  
  "Your average consumer doesn't need 50 cores [yet]"
  Games are getting pretty good at using my 1536 core GPU, which is just a co-processor
14. Re:Little Intel has growed up by RightSaidFred99 · 2011-11-16 07:02 · Score: 1
  
  What natural phenomenon would require that the number of course on a chip be a power of 2? I can't think of any.
15. Re:Little Intel has growed up by mlts · 2011-11-16 07:08 · Score: 3, Interesting
  
  I wonder if Intel is taking a page from IBM's playbook.
  Upper end POWER7 CPUs have the ability to have half their cores turned off. The cores that are on can then use the disabled neighbor's caches, and run at a higher clock speed. For some things, this switch actually speeds up some tasks that can't be evenly broken up into balanced threads.
  I can see Intel doing this where some cores are disabled due to manufacturing defects (which happen to all dies), and having the operable cores use nearby caching which would otherwise go to waste.
16. Re:Little Intel has growed up by Desler · 2011-11-16 07:17 · Score: 1
  
  Now that we're at the physical limits of clock speeds,
  Since when? You can easily overclock most modern chips to 4GHz and with enough cooling to 5 or 6+ GHz. The i7 Sandy Bridge chips for example have been overclocked past 6GHz. So exactly what supposed "physical limit" do you mean?
17. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 07:19 · Score: 0
  
  Intel's been doing that since the Core i5 line, I believe; they call it "turbo mode" or some other such insipid marketing name. The idea is sound, though; disable one core and run the higher one at a greater speed; as long as it's within the same thermal element it should be fine.
18. Re:Little Intel has growed up by Kristian+T. · 2011-11-16 07:20 · Score: 1
  
  Reminds me that I have a dual Deschutes 350 in the attic somewhere. Served me faithfully from 1998 to 2004. If it weren't for the 128MB of memory and the price of electricity - I might still have it do..... uhm something. Trouble is it's still hard to do multithreading, and our programming languages are still inherently single thread, maybe with some thread primitives glued on.
  
  --
  Run with the lemmings, and you'll get your feet wet.
19. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 07:22 · Score: 0
  
  It's Larrabee reborn! Whatever.
20. Re:Little Intel has growed up by Sloppy · 2011-11-16 07:25 · Score: 1
  
  Your average consumer doesn't need the 80386. There's hardly any software compiled to take advantage of its features anyway. I can see maybe someone using them for servers, but that's a pretty small niche.
  
  --
  As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
21. Re:Little Intel has growed up by zpiro · 2011-11-16 07:36 · Score: 3, Interesting
  
  At 6GHz, you are very close to the speed of light in copper, so unless you can break the speed of light... it's a "physics limit".
  Below this point you have the problem of energy efficiency, i.e. what's the point of spending more energy on cooling than on actually powering the thing?
  Intel's 3d-transistors are HUGE because of this, they can push higher clock speed more easily.
22. Re:Little Intel has growed up by gstoddart · 2011-11-16 07:38 · Score: 4, Informative
  
  What natural phenomenon would require that the number of course on a chip be a power of 2? I can't think of any.
  Because computers count in binary, which is powers of two. And, I'll assume you meant cores.
  Historically such things have been powers of two to make the addressing simpler without having extra magic or control lines left over. So, 1, 2, 4, 8, 16, 32 and 64 all make sense in terms of being expressible in a fixed number of bits ... 50 to some of us seems like a fairly arbitrary choice. Since you use an unusual combination of wiring, it might as well be 37 or 51 since it's not a number that 'naturally' lends itself to computers. The device is likely wired in such a way that it could count to 64 ... or they're doing things in a slightly odd way.
  Anyway, that's why some of us find it to be a little odd. And it's also why the hard-drive makers deciding "1 GIG" is "1,000,000,000 bytes" is irksome ... with all of those extra powers of two, it should be "1 073 741 824 bytes". Which means you lose about 72MB/GIG ... so my 2TB drive isn't.
  
  --
  Lost at C:>. Found at C.
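The gigabyte gripe in the comment above is easy to put numbers on. A small Python sketch (the constant names are just labels for this example): the gap per "gig" comes out at roughly 70 to 74 megabytes depending on which megabyte you count in.

```python
GB = 10**9    # decimal gigabyte, as used on drive labels
GIB = 2**30   # binary gigabyte (gibibyte): 1 073 741 824 bytes

shortfall = GIB - GB
print(shortfall)                                    # 73741824
print(f"{shortfall / 10**6:.1f} MB (decimal)")      # 73.7 MB (decimal)
print(f"{shortfall / 2**20:.1f} MiB (binary)")      # 70.3 MiB (binary)

# A drive sold as 2 TB (decimal), expressed in binary terabytes.
two_tb = 2 * 10**12
print(f"{two_tb / 2**40:.2f} TiB")                  # 1.82 TiB
```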
23. Re:Little Intel has growed up by Maximum+Prophet · 2011-11-16 07:45 · Score: 1
  
  Addressing.
  
  Let's say you've set aside 6 bits in every data structure that deals with core administration. You can grow to 2^6, or 64 cores without re-architecting your data structures.
  
  As long as we are using binary in computers, making everything 2^N will make the most efficient use of space.
  
  Of course, space isn't always the limiting factor, so sometimes for cost or speed reasons, we see objects that number 2^N-M.
  
  --
  All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)
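The addressing argument above boils down to counting ID bits. A minimal Python sketch, with `id_bits` as a hypothetical helper name, shows that 50 cores already need the same 6 bits as 64, so some IDs simply go unused:

```python
def id_bits(n_cores):
    """Bits needed to give each of n_cores a distinct ID in 0 .. n_cores - 1."""
    return max(1, (n_cores - 1).bit_length())

for n in (32, 50, 64, 65):
    bits = id_bits(n)
    print(f"{n} cores -> {bits} bits, {2**bits - n} IDs unused")
```

Whether those spare IDs matter depends on the design; this only covers the counting, not the wiring cost debated further down the thread.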
24. Re:Little Intel has growed up by gstoddart · 2011-11-16 07:46 · Score: 1
  
  Your average consumer doesn't need the 80386. There's hardly any software compiled to take advantage of its features anyway. I can see maybe someone using them for servers, but that's a pretty small niche.
  I remember almost exactly that quote in PC Magazine back in the day. I think at the time it was the 80486, but same thing. they probably said the same thing about the '386 too.
  Of course, I have a quad-core machine sitting on my desk at home with 8GB of RAM, and running at a clock speed two orders of magnitude higher than my '486 did. :-P
  So, obviously consumer needs for CPU speed are growing far faster than anyone would have predicted in the late 80's/early 90's. I still remember the first time I saw a PC with a 1GB hard-drive ... a bunch of us stood around it thinking "WTF will we ever do with that much disk space?".
  
  --
  Lost at C:>. Found at C.
25. Re:Little Intel has growed up by jessehager · 2011-11-16 07:49 · Score: 1
  
  Tilera's 100-core processor is built like this. It's a 10x10 grid of cores.
26. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 07:51 · Score: 0
  
  If it were a square with a root of a power of two.
  Admit that you would expect it to at least be a square.
  50^.5 = 7.07. The next whole number is 8.
  8^2 = 64, happens to be a power of two.
27. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 07:54 · Score: 0
  
  And if you count cores in such a way as to get 1536 off a GPU, there are many more than 50 cores on MIC. Quite right, though, with the right application the cores are easily used.
28. Re:Little Intel has growed up by gstoddart · 2011-11-16 08:01 · Score: 3, Insightful
  
  No, it means you're an idiot who cannot deal with the fact that the prefixes have a specific meaning unless one is talking about computer
  
  So, are you always an asshole, or just on Slashdot?
  
  --
  Lost at C:>. Found at C.
29. Re:Little Intel has growed up by RightSaidFred99 · 2011-11-16 08:01 · Score: 1
  
  Indeed, cores. And I still don't see any reason, and AMD has 3 core processors. I can have 3G of memory. I can have 9G of memory. Binary numbers are not pervasive by mandate in all areas of computing.
  Though I do agree base-10 usage for hard drives is ridiculous.
30. Re:Little Intel has growed up by RightSaidFred99 · 2011-11-16 08:02 · Score: 1
  
  Nah. That extra bit you lose isn't going to cause anyone any heartburn. And nobody's using 6 bits for anything. It's a specious argument.
31. Re:Little Intel has growed up by Anomalyst · 2011-11-16 08:10 · Score: 0
  
  I'm guessing its an incurable condition brought on by exposure to Apple iDevices
  
  --
  There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
32. Re:Little Intel has growed up by gstoddart · 2011-11-16 08:19 · Score: 2
  
  I'm guessing its an incurable condition brought on by exposure to Apple iDevices
  Well, since I own 3 iPods and an iPad ... you'd think I'd be the one being accused of being an asshole by that logic.
  I'm going to go with self-righteous prick who feels entitled to be an ass on the internet because he's got a 5-digit Slashdot ID and therefore considers himself to be l337.
  
  --
  Lost at C:>. Found at C.
33. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 08:36 · Score: 0
  
  Then who the fuck is "they". "They" seem to always say things, but nobody can ever identify who "they" are. It's almost as though "they" don't exist at all and are merely a convenient fabrication for the sake of being able to argue one's own opinion or set up a strawman.
34. Re:Little Intel has growed up by gstoddart · 2011-11-16 08:47 · Score: 2
  
  Indeed, cores. And I still don't see any reason, and AMD has 3 core processors. I can have 3G of memory. I can have 9G of memory.
  Well, in fairness, on the memory side, you do that with some combination of memory modules which are addressable by powers of two. (eg. 2GB + 1GB, or 4GB + 4GB + 1GB), each of which is discrete from the others. I don't believe you can buy a 3GB or 9GB memory module.
  
  Binary numbers are not pervasive by mandate in all areas of computing.
  Nope, absolutely not. Not saying that ... just saying that traditionally such things have been architected to use powers of two because it was most efficient.
  Obviously, for other reasons, Intel decided to go with 50 cores.
  
  --
  Lost at C:>. Found at C.
35. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 08:49 · Score: 0
  
  At 6GHz, you are very close to the speed of light in copper, so unless you can break the speed of light... it's a "physics limit".
  During one 6 GHz clock cycle, light travels 50 cm, or something like 50 times the width of a CPU die. I'd say there's easily room to go up to 60 GHz before the speed of light becomes a serious limitation. Ought even to be able to push off-die busses towards 10 GHz before running into that particular physics limit.
36. Re:Little Intel has growed up by hechacker1 · 2011-11-16 09:00 · Score: 1
  
  I agree generally, like AMD's Bulldozer hitting 8GHz on a single core before failing to the limits of physics (even with extreme cooling). I'm assuming nobody will ever be able to get more than 1 or 2 cores active (out of 8) while getting to 8GHz on that architecture.
  But these days, the chips run in multiple clock domains. I believe the Intel chips are separated by a base clock, L3 Clock, Core clocks, RAM clocks, and bus clocks. The architectures are moving ever toward asynchronous operation in order to pack billion upon billion of transistors on a package without having to synchronize them all the time.
37. Re:Little Intel has growed up by Score+Whore · 2011-11-16 09:32 · Score: 1
  
  Well, in fairness, on the memory side, you do that with some combination of memory modules which are addressable by powers of two. (eg. 2GB + 1GB, or 4GB + 4GB + 1GB), each of which is discrete from the others. I don't believe you can buy a 3GB or 9GB memory module.
  Certain models of Xeon processor have three memory controllers. Which, when configuring for maximum memory bandwidth, leads to memory being measured in terms of three times powers of two (3 x 2^30.)
38. Re:Little Intel has growed up by Kjella · 2011-11-16 09:34 · Score: 1
  
  I'm guessing 5x10, if you look at their Intel Core i7 3960X the cores are about twice as wide as they are high.
  
  --
  Live today, because you never know what tomorrow brings
39. Re:Little Intel has growed up by yuhong · 2011-11-16 09:37 · Score: 1
  
  Ah, the disaster that is the move from real to protected mode.
  Summary: The first fiasco was that in 1982 MS ignored the announcement of the 286 and proceeded to develop a real-mode multitasking version of DOS; only around 1985, when IBM refused to license it, was it realized that this was a mistake. And while the resulting OS/2 1.x sucked and lost its chance to Windows 3.x (which was incompatible, though both were designed for 16-bit protected mode), the second fiasco was when MS broke the JDA with IBM in 1991, before the 32-bit OS/2 2.0 (which had been in development since 1989) was even given a chance. Later on MS attacked OS/2, particularly in the Warp era, when MS resorted to tactics like astroturfing (look up "OS/2 Microsoft Munchkins" for example). Imagine if MS had embraced OS/2 instead. Both fiascos delayed the move to protected mode by years, not to mention MS's attacks on DR-DOS, as OS/2 did not depend on DOS.
40. Re:Little Intel has growed up by Score+Whore · 2011-11-16 09:38 · Score: 1
  
  It's not storing 6 bits in a data structure. It's running traces (if that's even what they're called in IC design) throughout the die connecting these things together. At that level adding two extra traces to carry those two bits is an expense you might want to forgo. However once you've got six wires/bits out there, the only reasons I can think of to not use 64 whatevers is the previously mentioned heat management and die yield issues.
41. Re:Little Intel has growed up by Sloppy · 2011-11-16 10:09 · Score: 1
  
  Still bitter?
  
  --
  As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
42. Re:Little Intel has growed up by SuricouRaven · 2011-11-16 10:12 · Score: 1
  
  In 32-bit color, only 24 bits are actually used. It's just more efficient to process one pixel in a 32-bit register than have to screw around with ANDs and shifts to get the data you want. The leftover eight bits are usually zeroed, sometimes used to store alpha or depth information.
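To make the padding point above concrete, here is a small Python sketch. The channel order (spare byte on top, then red, green, blue) is an assumption for the example, since real formats vary, and the function names are made up. It packs a pixel into one 32-bit word and then compares byte offsets for tightly packed 24-bit pixels against padded 32-bit ones.

```python
def pack_pixel(r, g, b, spare=0):
    """Pack three 8-bit channels plus a spare byte into one 32-bit word."""
    return (spare << 24) | (r << 16) | (g << 8) | b

def unpack_pixel(word):
    """Recover (spare, r, g, b) from a 32-bit word with shifts and masks."""
    return (word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF

def byte_offset(index, bytes_per_pixel):
    """Byte offset of pixel `index` in a tightly packed row."""
    return index * bytes_per_pixel

word = pack_pixel(0x12, 0x34, 0x56)
print(hex(word))           # 0x123456
print(unpack_pixel(word))  # (0, 18, 52, 86)

# 24-bit pixels mostly start off a 4-byte boundary; 32-bit pixels never do.
for i in range(4):
    o24, o32 = byte_offset(i, 3), byte_offset(i, 4)
    print(i, o24, o24 % 4 == 0, o32, o32 % 4 == 0)
```

With 3-byte pixels only every fourth pixel starts on a word boundary, so fetching one means extra shifting and masking across words; with 4-byte pixels each one is a single aligned load.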
43. Re:Little Intel has growed up by RocketRabbit · 2011-11-16 10:22 · Score: 0
  
  Overclockers are up to 8.4 GHz now, with AMD chips.
44. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 10:35 · Score: 0
  
  AMD has 3 core processors
  Technically, AMD has 4 core processors that have had a core disabled for various reasons.
45. Re:Little Intel has growed up by David+Greene · 2011-11-16 10:38 · Score: 1
  
  Since when?
  Since the point we reached the ability to handle the power and heat dissipation requirements economically. Engineering is about tradeoffs. Until we get better materials, multicore is more cost-effective than pushing the clock beyond the reasonable cost envelope.
  
46. Re:Little Intel has growed up by Sebastopol · 2011-11-16 10:42 · Score: 1
  
  except for that pesky 8-bit alpha channel, which clearly isn't used.
  
  --
  https://www.accountkiller.com/removal-requested
47. Re:Little Intel has growed up by yuhong · 2011-11-16 10:49 · Score: 2
  
  Yea, I know it is too late. The good news is that the x64 transition went much better.
48. Re:Little Intel has growed up by Anonymous Coward · 2011-11-16 11:17 · Score: 0
  
  The architectures are moving ever toward asynchronous operation in order to pack billion upon billion of transistors on a package without having to synchronize them all the time.
  If by this you mean they're going for true asynchronous (no-clock) design, then no, can't agree. There's been research into async circuit design over the years but it's never achieved any real commercial success. The latest try I know of is Achronix, a FPGA startup which uses async techniques in its FPGA fabric, with tools that are supposed to automate the job of translating your clocked logic design to async. It remains to be seen if the technology will be commercially viable.
  If you mean they're using more clock domains and that some domains may be asynchronous WRT other clock domains, then yeah, that's true. It's a bear to synchronize things across the whole die at multiple GHz, and you don't tend to need the whole chip under one clock tree in multi-core processors anyways.
49. Re:Little Intel has growed up by magnusk · 2011-11-16 11:30 · Score: 2
  
  No, light travels 5cm in one 6 GHz clock cycle, in a vacuum. Speed of light limitations have been a consideration for years. The Cray1 was designed in the early 70s and its physical design allowed for the propagation speed of electricity in copper. It only ran at 80MHz. It's not just about cycle time - what's the duration of your edges? What other latencies are there in the electronics? In 2004, IBM's POWER5 MCM was 9.5cm wide and the CPUs ran at ~2GHz. Not sure what speed the interconnect ran at.
  
  --
  Music at http://www.ignorantbliss.co.uk/
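The distance-per-cycle figures traded in the last few comments can be checked with a few lines of Python. This uses the vacuum speed of light; the `velocity_factor` parameter is an assumption added for illustration, since signals in real wiring travel slower than light in a vacuum, and the function name is made up.

```python
C = 299_792_458  # speed of light in a vacuum, metres per second

def cm_per_cycle(freq_hz, velocity_factor=1.0):
    """Distance in cm a signal covers in one clock period.

    velocity_factor scales the vacuum speed of light (1.0 = vacuum).
    """
    return C * velocity_factor / freq_hz * 100

# 80 MHz (Cray-1), 2 GHz (POWER5 era), 6 GHz and 60 GHz from the thread.
for ghz in (0.08, 2, 6, 60):
    print(f"{ghz:>5} GHz: {cm_per_cycle(ghz * 1e9):.2f} cm per cycle")
```

At 6 GHz this comes to about 5 cm per cycle in a vacuum, in line with the correction above, and any realistic velocity factor below 1.0 shrinks it further.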
50. Re:Little Intel has growed up by c · 2011-11-16 11:36 · Score: 1
  
  The device is likely wired in such a way that it could count to 64 ... or they're doing things in a slightly odd way.
  Or it's 64 cores with an average usable yield of 50 "good" ones.
  
  --
  Log in or piss off.
51. Re:Little Intel has growed up by petermgreen · 2011-11-16 14:22 · Score: 1
  
  Well, in fairness, on the memory side, you do that with some combination of memory modules which are addressable by powers of two. (eg. 2GB + 1GB, or 4GB + 4GB + 1GB), each of which is discrete from the others. I don't believe you can buy a 3GB or 9GB memory module.
  However, certain Intel processors do use interleaved triple-channel memory, so there must be a division by 3 going on in the memory addressing system somewhere.
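  To illustrate what a "division by 3" in the addressing path could look like, here is a toy sketch. It is purely an assumption for illustration: the 64-byte granularity and the modulo mapping are made up here and are not Intel's documented scheme.

```python
# Toy model of interleaving a flat physical address space over 3 channels.
# ASSUMPTION: 64-byte interleave granularity and a plain modulo mapping;
# real memory controllers may use a different (often hashed) mapping.
INTERLEAVE_BYTES = 64
CHANNELS = 3


def map_address(addr):
    """Return (channel, address within that channel) for a physical address."""
    block = addr // INTERLEAVE_BYTES
    channel = block % CHANNELS            # the "division by 3" (remainder)
    block_in_channel = block // CHANNELS  # the "division by 3" (quotient)
    offset = addr % INTERLEAVE_BYTES
    return channel, block_in_channel * INTERLEAVE_BYTES + offset


if __name__ == "__main__":
    for addr in (0, 64, 128, 192, 200):
        print(addr, "->", map_address(addr))
```

  Consecutive 64-byte blocks land on channels 0, 1, 2, 0, ... so a streaming read keeps all three channels busy.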
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
52. Re:Little Intel has growed up by CODiNE · 2011-11-16 23:30 · Score: 1
  
  With cores it's a little bit different than RAM in that you're physically limited by how many you can squeeze in a certain size.
  So the addressing may be limited to 16, 32, 64, whatever cores, but physically you may not quite be able to get 16 in that space. So they might, say, max out at 14, then get a few dead ones here and there, and end up selling 8's, 10's and 12's, with the rare perfect 14's being used for some special customers.
  Now with the 50 cores, you might actually have 53 or so actually WORKING cores but the extra 3 are turned off just so they sell a nice pretty round number like 50 and not 51.
  
  --
  Cwm, fjord-bank glyphs vext quiz
53. Re:Little Intel has growed up by Anonymous Coward · 2011-11-17 00:35 · Score: 0
  
  and die area (a die can accommodate a fixed number of transistors). So if you have die area for 40 cores, you will probably not want to make a chip with 32 cores and waste the remainder. You will either make a 40-core chip or, if your yields are bad (as in 1 or 2 cores are dead on a big percentage of chips), say "OK, these chips are sold as 38-core." 32 is a more "round" number, but if they can fit a bit more I am sure users would not mind an extra 20% performance for free.
54. Re:Little Intel has growed up by Shinobi · 2011-11-17 00:51 · Score: 1
  
  The SI prefixes are specifically base-10 units, and have been so since the 1800's, with the metric system, and later adapted into the SI system. The fact that computer scientists and programmers misused the units and disregarded an established standard of communications and data encapsulation, and the fact that people STILL do it, is what's vexing, not the fact that the storage manufacturers have taken to use the proper approach.
55. Re:Little Intel has growed up by Bengie · 2011-11-17 07:36 · Score: 1
  
  Amazing what a liquid nitrogen jacket with a liquid helium center can do when overclocking.
56. Re:Little Intel has growed up by Bengie · 2011-11-17 07:38 · Score: 1
  
  "I still remember the first time I saw a PC with a 1GB hard-drive ... a bunch of us stood around it thinking "WTF will we ever do with that much disk space?"."
  Now we're like "Damn 2GB texture pack."
Re:Pfffft! by fa2k · 2011-11-16 07:02 · Score: 1

Pfffft! My prosthetic horse cock penis can deliver 500 OPH (orgasms per hour)
"DP" is double precision in this case, not the other one;)
How can that be? by gr8_phk · 2011-11-16 07:16 · Score: 3, Insightful

A 50 core chip at 1GHz is going to need to perform 20 double precision floating point ops per cycle per core to achieve 1Tflop performance. OK, so 1.2GHz cuts that down to 16flops/clock. Since when can anything Intel Architecture achieve that many flops per cycle? Two 4-element dot products is only 14 flops. I suppose if they did two vector-scalar multiply-adds that would get 16 flops per cycle. So I just answered my own question. But can they really keep the FP unit running continuously at that rate? On all 50 cores?
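The per-core requirement above is just a division, sketched here with the figures from the summary (50 cores, 1 to 1.2GHz, 1 TFlop). The function name is made up for illustration.

```python
# Flops each core must retire per cycle to hit a target aggregate rate.
def required_flops_per_cycle_per_core(target_flops, cores, clock_hz):
    return target_flops / (cores * clock_hz)


if __name__ == "__main__":
    for ghz in (1.0, 1.2):
        need = required_flops_per_cycle_per_core(1e12, 50, ghz * 1e9)
        print(f"{ghz} GHz: {need:.2f} DP flops per cycle per core")
```

Strictly, 1.2GHz needs about 16.7 flops per cycle per core, so "16" is a slight round-down.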
1. Re:How can that be? by parlancex · 2011-11-16 07:25 · Score: 2
  
  Maybe, but probably not. The key to high performance computing when dealing with parallel workloads like this is not just raw processing power, but memory bandwidth. The Nvidia Tesla M2090 mentioned in TFS has a peak memory bandwidth of 177GB/s with specially designed memory and controllers designed for raw throughput. Conventional CPUs with the fastest DDR3 memory available can barely crack a small fraction of that. A teraflop of sustained DP performance is going to be completely useless without the memory bandwidth to back it up.
2. Re:How can that be? by Nom+du+Keyboard · 2011-11-16 07:34 · Score: 1
  
  A 50 core chip at 1GHz is going to need to perform 20 double precision floating point ops per cycle per core to achieve 1Tflop performance. OK, so 1.2GHz cuts that down to 16flops/clock.
  By your math it means that each core has a 1024-bit wide vector unit. And that means 64-bit FP, not 80-bit. Not impossible, but perhaps unlikely to ever run at theoretical max across all cores in anything but the most carefully crafted case.
  
  --
  "It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
3. Re:How can that be? by MacGyver2210 · 2011-11-16 07:36 · Score: 1
  
  You seem to be forgetting about SIMD and vectorization. If you pack more instructions into the bits available for one, it can do much more than your typical 32- or 64-bit core. That is often how early benchmarks are tested to give the highest results possible for the data throughput.
  
  --
  If the only way you can accept an assertion is by faith, then you are conceding that it can't be taken on its own merits
4. Re:How can that be? by the+linux+geek · 2011-11-16 07:46 · Score: 1
  
  Intel claims each core can perform 16 FLOPS per cycle, at least at SP. Each core has a 512-bit wide vector unit. I'm not sure where their DP claims are coming from, though.
5. Re:How can that be? by six · 2011-11-16 08:25 · Score: 2
  
  The vector unit must be FMA capable just like Larrabee, hence the doubling of FLOPS/cycle.
6. Re:How can that be? by markhahn · 2011-11-16 10:09 · Score: 1
  
  there are lots of useful computations that are more flops-intensive (relative to memory footprint) than dot-products. matmul, fft, almost anything montecarlo, etc.
7. Re:How can that be? by David+Greene · 2011-11-16 10:44 · Score: 2
  
  OK, so 1.2GHz cuts that down to 16flops/clock. Since when can anything Intel Architecture achieve that many flops per cycle?
  Since LRBni and its 512-bit vectors. A double-precision FMA gets you 16 ops in a clock.
  
  But can they really keep the FP unit running continuously at that rate? On all 50 cores?
  Easily. HPC codes regularly keep thousands of cores busy.
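  The 16-ops-per-clock figure falls out of the vector width. Below is a minimal sketch, assuming one vector instruction issued per cycle per core and counting an FMA as two flops per lane; the function names are made up for illustration.

```python
# Peak double-precision throughput from vector width, assuming one vector
# instruction per cycle per core and 64-bit lanes.
def dp_flops_per_cycle(vector_bits, fma=True):
    lanes = vector_bits // 64          # 64-bit doubles per vector register
    return lanes * (2 if fma else 1)   # an FMA counts as multiply + add


def peak_dp_gflops(cores, clock_ghz, vector_bits, fma=True):
    return cores * clock_ghz * dp_flops_per_cycle(vector_bits, fma)


if __name__ == "__main__":
    print(dp_flops_per_cycle(512))             # 512-bit with FMA
    print(dp_flops_per_cycle(1024, fma=False)) # 1024-bit without FMA
    print(peak_dp_gflops(50, 1.2, 512))        # 50 cores at 1.2 GHz
```

  Under these assumptions 50 cores at 1.2GHz peak at about 960 DP GFlops, which is why the earlier "1024-bit" estimate and the 512-bit-plus-FMA explanation give the same 16 flops per cycle.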
  
8. Re:How can that be? by drewm1980 · 2011-11-16 13:02 · Score: 1
  
  Depends on how much cache is on the chip, and how big the problem being solved is. GPU's have a lot of FP units, but they have such a tiny amount of cache that they basically have to transfer ~everything they operate on over the memory bus. On a CPU, your dataset can be several MB and still fit on-chip, but of course you have fewer FP units. The algorithm I designed for my Ph.D. operates on the same few megabytes of data many times, and it ended up being about equally fast on both architectures, so I'm hoping KC will bridge the cache size chasm that exists between CPU's and GPU's.
9. Re:How can that be? by parlancex · 2011-11-16 16:16 · Score: 1
  
  On a CPU, your dataset can be several MB and still fit on-chip...
  Clearly you've never dealt with any HPC programming before. In the vast majority of massively parallel computation problems, the kind which are solved by these kinds of chips, the data sets are also necessarily large: hundreds of megabytes or gigabytes of data. The algorithms that allow massively parallel computation will compute a single step of an algorithm on a large number of elements.
  Consider the scenario the GP was referring to, massively parallel dot product, for matrix operations or other algorithms used in things like computer graphics, weather and physics simulations, neural networks and AI processing. In each DP4 you are reading 8 elements and writing 1, so you need (8 + 1) * 8 = 72 bytes of memory bandwidth per DP4 operation, which is 8 flops. So to maintain peak performance without bottlenecking you would need at least 9 bytes of memory bandwidth per flop, which at 1 teraflop is around 9000 GB/s of memory bandwidth. Obviously the chip won't have that much, but that just goes to show you that the vast majority of modern HPC is limited by memory bandwidth, not flops, with notable exceptions for a small number of applications like hash cracking and such.
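  Spelled out as code, the bandwidth estimate looks like this. It is a sketch that takes the comment's own counts at face value (8 reads, 1 write, 8 flops per DP4, no cache reuse at all); the 177GB/s figure is the M2090 number quoted earlier in the thread.

```python
# Streaming-bandwidth estimate for a 4-element double-precision dot product,
# assuming every operand comes from memory and nothing is reused from cache.
BYTES_PER_DOUBLE = 8


def bytes_per_flop(reads, writes, flops):
    return (reads + writes) * BYTES_PER_DOUBLE / flops


def required_bandwidth_gb_s(flops_per_second, reads=8, writes=1, flops=8):
    """Bandwidth in decimal GB/s needed to sustain the given flop rate."""
    return flops_per_second * bytes_per_flop(reads, writes, flops) / 1e9


def sustainable_gflops(bandwidth_gb_s, reads=8, writes=1, flops=8):
    """Flop rate a given bandwidth can feed under the same assumptions."""
    return bandwidth_gb_s / bytes_per_flop(reads, writes, flops)


if __name__ == "__main__":
    print(bytes_per_flop(8, 1, 8))            # bytes moved per flop
    print(required_bandwidth_gb_s(1e12))      # GB/s needed for 1 TFlop
    print(round(sustainable_gflops(177), 1))  # GFlops fed by 177 GB/s
```

  With these assumptions 1 TFlop needs about 9000 GB/s (roughly 8400 GiB/s in binary units), and 177GB/s feeds under 20 GFlops of pure streaming dot products. Any reuse from cache changes the picture, which is the point of the replies that follow.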
10. Re:How can that be? by drewm1980 · 2011-11-16 18:17 · Score: 1
  
  Well, I suppose either my application (face recognition for hundreds of users) is under the threshold for your definition of HPC, or it's a notable exception. Our algorithm consists primarily of repeated BLAS level 1 and 2 operations on chunks of data that fit in CPU cache, but not GPU cache. Essentially, it's low arithmetic intensity operations performed repeatedly (hundreds of times) on gallery image sets that take up a couple megs at a time (and there are a couple hundred of those that can be computed in parallel). Under these conditions, we find that a dual socket quad-core Xeon machine is roughly comparable to a high-end Fermi. There is locality in our memory access pattern that the CPU has enough cache to exploit, but the GPU does not. I'm not a chip designer, so I can't ~really compare the opportunity cost (in $/flop) of adding more cache vs. widening a memory bus of a GPU, but I suspect the cache is cheaper, especially given that it need not be shared (again, for our application).
11. Re:How can that be? by Shinobi · 2011-11-17 00:42 · Score: 1
  
  Your application is one in which GPU's excel normally, so I have to say that yours must be very badly written.
  By treating it as streams of texture sets, rather than just working chunk by chunk, you improve performance. That way, you can just set up different Streaming Processors in a chain to perform the various steps. When programmed in that way, my dual quad core Xeon is outperformed by an old GTS 250.
  When you program a GPU, even with CUDA or OpenCL, a DSP programming mindset is more appropriate than a general purpose-CPU programming mindset.
12. Re:How can that be? by drewm1980 · 2011-11-17 06:35 · Score: 1
  
  I would not be so quick to accuse people of writing poor code when you know very little about the problem they're working on. And remember, ~most code runs faster on CPUs. If you read some of Vasily Volkov's papers (he's the guy who wrote the early versions of CUBLAS), it is very clear that you might as well not bother with the GPU if you're mostly doing BLAS level 1-2 stuff, since the arithmetic intensity isn't high enough. For our application we had some specific operations we could combine and tricks we could use to reduce bandwidth, but they turned out to not be enough. This was a Ph.D. project that took several years; many parallelization strategies were tried on both CPU and GPU, and I got lots of feedback from colleagues working on GPU compiler tools.
13. Re:How can that be? by Bengie · 2011-11-17 07:41 · Score: 1
  
  It has 512-bit AVX-like registers. You can do a lot of FP/clock with SIMD like that. But like you said (vector-scalar multiply-adds), they probably have multi-operand instructions to allow fused math.
14. Re:How can that be? by gr8_phk · 2011-11-21 11:00 · Score: 1
  
  there are lots of useful computations that are more flops-intensive (relative to memory footprint) than dot-products. matmul, fft, almost anything montecarlo, etc.
  matmul IS dot products. FFT is dot products too. Most anything DSP is dot products. I chose dot product because it is the instruction that does the most floating point operations.
15. Re:How can that be? by gr8_phk · 2011-11-21 11:03 · Score: 1
  
  You seem to be forgetting about SIMD and vectorization.
  
  Dot product or vector multiply-add IS an SIMD instruction. I chose it because it does the most FLOPs of any instruction I'm aware of. If it can retire 2 of those per cycle then the FPU will have the claimed performance. Then I questioned the memory performance. And after recalling my own efforts to optimize the cache behavior of matrix operations I'm convinced they can do it with not too much cache per core.
16. Re:How can that be? by Shinobi · 2011-11-22 06:16 · Score: 1
  
  Oh, it was an academic project. No wonder then.
  Basically, everything you've said so far was that you treated the GPU like a slot-in general purpose processor, which it's not. Take a look at what's been done in the INDUSTRY, not academia, to see how effective GPU's are at image recognition and processing.
  Games push boundaries that academia has yet to reach.
  In special effects, multi-object motion path detection, tracking and compensation is done on GPU's nowadays, because a cheap GPU can do it more efficiently than an expensive multi-core Xeon or Opteron.
  In military use, low-power GPU's are used over heavy duty multi-core CPU's, because they offer better performance for their image recognition and processing tasks and can be packed into a smaller, more heat- and power-efficient package. (For example, most modern UAV's just encode the data streams and send them back to the operator's console, where a GPU decodes and processes them.)
17. Re:How can that be? by drewm1980 · 2011-11-22 09:18 · Score: 1
  
  You win!
Not all that exciting by mrjimorg · 2011-11-16 07:29 · Score: 1

Today I can go to the store and buy the Nvidia board that they mention. When can I buy a system with a Knight's Corner chip? What about a PCI-E board? The answer is never. It will only be sold to Intel's partners in labs and research environments for special projects. It means very little to most of us.
1. Re:Not all that exciting by the+linux+geek · 2011-11-16 07:48 · Score: 2
  
  Intel claims it will be released as a commercial product in the near future.
2. Re:Not all that exciting by Sebastopol · 2011-11-16 10:44 · Score: 1
  
  "It means very little to most of us."
  Just like your comment.
  
  --
  https://www.accountkiller.com/removal-requested
Re:Still Nvidia wins by Anonymous Coward · 2011-11-16 07:56 · Score: 0

ATI graphics suck
But they don't.
How about a consumer version? by Jeng · 2011-11-16 08:23 · Score: 1

Wonder if they'll produce a consumer version.
I use an ATI card as my main video card, wouldn't mind sticking a physics card in the other PCI-E slot. The thing is that if I put in an Nvidia card it won't work as a physics card, since Nvidia has written the drivers in such a way that if you have a non-Nvidia video card as your primary video card, Nvidia will not allow you to use their cards just for physics.
So my hope is that if Intel puts out a consumer version then either I'll be able to buy an Intel board just for physics or Nvidia will drop their stupid restriction.
Either way if Intel puts out a consumer physics card I win.

--
Don't know something? Look it up. Still don't know? Then ask.
Intel's side entry into the GPU market by JDG1980 · 2011-11-16 08:34 · Score: 1

We may yet see high-end Intel discrete graphics cards in the future.
Knights Corner sounds like it is basically a high-end GPU without the actual graphics output. This lets Intel position it as a professional product for HPC and supercomputing, and squeeze out as much profit as possible from the early models. Then, once the R&D cost has been amortized and the fab technology is advanced further, they can add an HDMI output, dedicated RAM, and glue logic, and write appropriate drivers to make it a full-fledged graphics card. Of course this may lack some features of the professional Knights Corner (ECC support?) so it won't cannibalize the high-end market. But it has the potential to be much more power-efficient than AMD and nVidia enthusiast products.
1. Re:Intel's side entry into the GPU market by the+linux+geek · 2011-11-16 09:13 · Score: 1
  
  It originally was a video card (Larrabee project), but things didn't look good for consumer performance and they repositioned it.
2. Re:Intel's side entry into the GPU market by timeOday · 2011-11-16 09:16 · Score: 1
  
  Knights Corner sounds like it is basically a high-end GPU without the actual graphics output.
  
  To me it sounds like much more. The "cores" on a GPU are not equivalent to CPU cores, whereas on Knight's Corner you get 50 actual x86 cores. It is sure to be much more general purpose. From the article: "Unlike other co-processors, the MIC is fully accessible and programmable as though it were a fully functional HPC node." It sounds like a cluster on a chip. I am curious about the memory model.
3. Re:Intel's side entry into the GPU market by drewm1980 · 2011-11-16 13:32 · Score: 1
  
  I'm curious about the memory model too. I'm pretty certain that bit about "cluster on a chip" is just marketing hyperbole, and it's actually still a shared memory system running one instance of the linux kernel. They're not going to make you run 50 linux kernel instances and communicate between them using network sockets.
4. Re:Intel's side entry into the GPU market by makomk · 2011-11-17 02:41 · Score: 1
  
  They're essentially using x86 cores for a vaguely GPU-style wide SIMD unit, from what I can tell. AMD's next generation of GPUs appear to be heading towards a similar destination from the opposite direction - they're adding a non-vector core to each 16-wide block of vector cores for control code that can't easily be vectorized.
MIC presentations at SC11 by Nite_Hawk · 2011-11-16 09:22 · Score: 3, Informative

I'm at SC11 right now and just attended NIC's MIC presentation. The scaling looks fantastic according to various codes that they compiled to run on it, but what was notably absent was performance relative to traditional x86 chips. The final presenter even said that now that the technology has been demonstrated to work (with minimal porting effort required) the next step will be to optimize and improve performance. The take away is that relative to Intel's other chips, MIC performance wasn't impressive enough to include in the presentation. That's fine in my book because it's an ambitious project, but it sounds like there is still some work to do.
Remember ASCI Red?? by GrandTeddyBearOfDoom · 2011-11-16 10:00 · Score: 1

Just shows you the progress in CPU power: ASCI Red was the first supercomputer to go over 1TFlop, and was massive. Now we have this with just one chip!

--
-- The Grand Teddy Bear has Spoken: "Windows 8 Source Code Available NOW! more disgusting than your pr..."
1. Re:Remember ASCI Red?? by blair1q · 2011-11-16 11:05 · Score: 1
  
  And the massive computers are going 100 Petaflops (that's 100,000 Teraflops).
Imagine... by Iamthecheese · 2011-11-16 12:15 · Score: 1

A Beowulf cluster of these! But seriously, even one wouldn't be efficient enough to be worth it yet, even in top-of-the-line OS's. We need a whole new paradigm of algorithms and maybe even a new language to do this right.

--
If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
1. Re:Imagine... by TechyImmigrant · 2011-11-16 19:24 · Score: 1
  
  Occam?
  
  --
  Evil people are out to get you.
Intel sort comparison paper between MIC and GPU by fozzydabear · 2011-11-16 18:40 · Score: 1

http://techresearch.intel.com/spaw2/uploads/files/FASTsort_CPUsGPUs_IntelMICarchitectures.pdf
but does it run x264 by Anonymous Coward · 2011-11-16 22:02 · Score: 0

Of course it should compile and run a current x264 with AVX SIMD fine, but the question is: with its wider SIMD and other improvements, can it produce one or several High profile 3K/4K super HD video streams far faster than realtime on insane settings :) and using CRF=16?
HPC is Much More Than Multi-Core Processors by stevenddeacon · 2011-11-17 07:35 · Score: 1

Everyone seems to be defining High Performance Computing as CISC/RISC chips with multi-core processors utilizing Instruction Level Parallelism and Thread-Level Parallelism with extremely fast multi-level Caching. HPC High Availability computing is a synergy of CISC/RISC chips combined with Application, Integrated Instruction, Facility, Graphic and Cryptography assisted processor technologies supplemented with Integrated Coupling facilities. These processors must share access to large amounts of fast Dynamic Random-Access Memory and be integrated into fast I/O bus architectures for a variety of High Performance connectivity with networking equipment, data storage, and other peripheral devices. All this hardware must work in concert with a variety of firmware, hypervisors, and operating systems supporting telecommunications, storage management, databases, application run-time environments, application servers, web servers, online transaction servers, redundancy, security, applications, fail-over, backup, recovery, archival, and administration management. IBM has been doing HPC/HA for quite a while now. Hewlett-Packard, Oracle Sun Microsystems, Microsoft, Intel, EMC, and Cisco seem to be still chasing the dream.