A Co-processor No More, Intel's Xeon Phi Will Be Its Own CPU As Well

Posted by timothy on Tuesday November 26, 2013 @01:49AM from the career-advancement dept.

An anonymous reader writes "The Xeon Phi co-processor requires a Xeon CPU to operate... for now. The next generation of Xeon Phi, codenamed Knights Landing and due in 2015, will be its own CPU and accelerator. This will free up a lot of space in the server but more important, it eliminates the buses between CPU memory and co-processor memory, which will translate to much faster performance even before we get to chip improvements. ITworld has a look."

Yee-haw no more by vikingpower · 2013-11-26 01:51 · Score: 1

Moore's law is not coming back from the grave, or is it?

--
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
1. Re:Yee-haw no more by Joce640k · 2013-11-26 03:07 · Score: 1
  
  a) It's not a "law"
  b) It doesn't say that transistor counts will double every 18 months.
  
  --
  No sig today...
2. Re:Yee-haw no more by Chas · 2013-11-26 03:54 · Score: 1, Informative
  
  A) Stop being a pedantic dick.
  B) Correct. The original 1965 paper observed that transistor counts were tending to double every year (12 months), which Moore later revised to every 2 years (24 months) in 1975.
  What people are misquoting is the House corollary: that the PERFORMANCE of microprocessors, due to increased transistor counts and faster speeds, seems to double roughly every 18 months.
  
  --
  
  Chas - The one, the only.
  THANK GOD!!!
3. Re:Yee-haw no more by Sockatume · 2013-11-26 03:57 · Score: 2
  
  If you're going to nitpick, at least act like you want to educate people and aren't just being a smartass:
  1) It's a law the same way Newton's Laws are laws: it's a simple quantitative relation which has held up very well over time.
  2) a) In its original formulation it said that the number of components you could put on a chip at minimum cost (because you can always cram in more at higher cost) doubled every year.
  ii) In its later correction (he used only five data points in the first paper!) he revised this to every two years.
  gamma) It was a head honcho in Intel who then concluded that such a relationship would double overall computer performance every eighteen months.
  Given that the components of interest on a processor are transistors, I do not have any great objection to the generalisation that it's transistor counts that double and not all components.
  
  --
  No kidding!!! What do you say at this point?
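
The doubling periods argued over in the comments above (12, 18 and 24 months) compound very differently. A minimal sketch of that arithmetic follows; the function name `growth_factor` and the ten-year horizon are illustrative assumptions, not anything taken from Moore's papers.

```python
def growth_factor(months_elapsed, doubling_period_months):
    """Multiplicative growth after months_elapsed, given one doubling per period."""
    return 2 ** (months_elapsed / doubling_period_months)


# Ten years (120 months) under each doubling period mentioned in the thread.
for period in (12, 18, 24):
    factor = growth_factor(120, period)
    print(f"doubling every {period} months -> about {factor:.0f}x in 10 years")
```

Doubling yearly gives 2^10 = 1024x over a decade, while doubling every two years gives only 2^5 = 32x, which is why the choice of period matters so much to the argument.
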
4. Re:Yee-haw no more by Joce640k · 2013-11-26 07:18 · Score: 1
  
  1) It's a law the same way Newton's Laws are laws.
  Newton's laws are based on real physical "laws". Not the same thing at all and definitely not "nitpicking".
  There's a well-known observation that's known as "Moore's Law", yes, but calling it a "law" doesn't make it one. There's a parable about grains of wheat on a chessboard which explains why. Anybody with a working brain can see that it can't hold for much longer. The laws of physics and mathematics will come into play soon (and they're real laws, not data trends).
  Newton's laws? I expect them to hold until the end of time.
  
  --
  No sig today...
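
The chessboard parable mentioned above is easy to work through: one grain on the first square, doubling on each of the remaining 63. A small sketch, where the function name `grains_on_board` is an assumption made for illustration:

```python
def grains_on_board(squares=64):
    """Total grains when square k (counting from 0) holds 2**k grains."""
    return sum(2 ** k for k in range(squares))


total = grains_on_board()
print(total)  # equals 2**64 - 1
```

The total is 2^64 - 1, roughly 1.8 x 10^19 grains, which is the point of the parable: a fixed doubling period cannot continue indefinitely.
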
5. Re:Yee-haw no more by Sockatume · 2013-11-26 21:00 · Score: 1
  
  Newton's Laws are, at best, approximations to general relativity that are well-behaved on human timescales. They're quantitatively extremely good but that doesn't put them in another class from Hooke's Law, say, which only holds for a perfect harmonic potential, or Ohm's law, which falls down on similar length scales to Moore's law. That some of them are used as the basis for physics and some aren't doesn't change the fact that they're all quantitative models for observed phenomena and have an equal claim to the 19th century's pretentious "law" label.
  
  --
  No kidding!!! What do you say at this point?
6. Re:Yee-haw no more by Joce640k · 2013-11-26 21:33 · Score: 1
  
  They're quantitatively extremely good
  And will still be extremely good a million years from now.
  Or a billion...
  
  --
  No sig today...
CPU embedded in GPU versus GPU embedded in CPU by abies · 2013-11-26 01:57 · Score: 1

I thought that we already had GPUs embedded in CPUs. How does embedding a CPU inside a GPU make it so different, and such a breakthrough?
1. Re:CPU embedded in GPU versus GPU embedded in CPU by Overzeetop · 2013-11-26 02:00 · Score: 4, Funny
  
  Patents already cover most implementations of GPUs within CPUs. But the field is wide open if you start embedding CPUs in GPUs. It's like "on the internet," but with uprocessors.
  
  --
  Is it just my observation, or are there way too many stupid people in the world?
2. Re:CPU embedded in GPU versus GPU embedded in CPU by Sockatume · 2013-11-26 02:05 · Score: 1
  
  There's no other component. (Pedantry: calling it a GPU is a misnomer as nobody really uses them for real-time graphics. You won't be playing Crysis 3 with one of these. It just happens that this kind of hardware came out of graphics silicon design.)
  
  --
  No kidding!!! What do you say at this point?
3. Re:CPU embedded in GPU versus GPU embedded in CPU by TWX · 2013-11-26 02:05 · Score: 1
  
  Turtles, all the way down...
  
  --
  Do not look into laser with remaining eye.
4. Re:CPU embedded in GPU versus GPU embedded in CPU by fluffythedestroyer · 2013-11-26 02:06 · Score: 1
  
  A GPU is important enough, at least at a basic level, but when you speak about servers... do we need them? I mean, I know Microsoft's next servers will have an optional GUI, and they will "try" to get away from the GUI and recommend working without it. I question the need for the GPU.
5. Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 02:18 · Score: 2, Interesting
  
  It's not just for drawing graphics. It can be used as a general computation platform.
  As an example, ImageMagick supports OpenCL nowadays. So if you have a webpage where images can be uploaded and you do some processing on them (cropping, scaling for thumbnails, etc.), you can get absolutely amazing performance on a server with a GPU.
6. Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 02:33 · Score: 1
  
  It's hard to believe how shortsighted some of you people are... The GPU in the server has nothing to do with graphics (despite it being in the name); it is used for general highly parallel computations. While traditional server software doesn't currently support it, it is theoretically possible to accelerate traditional server applications such as databases using the GPU, and this has actually been demonstrated with PoC software.
  Captcha: leverage
7. Re:CPU embedded in GPU versus GPU embedded in CPU by gl4ss · 2013-11-26 05:26 · Score: 1
  
  Hint: The "G" in "GPU" does not stand for "game".
  Graphics? because that's what it stands for.
  I guess some people are going to be calling them General processing units or some shit like that, but frankly, if they can't be bothered to call it a co-processor when that's what they mean, they could do well with a punch to the groin.
  
  --
  world was created 5 seconds before this post as it is.
8. Re:CPU embedded in GPU versus GPU embedded in CPU by fast+turtle · 2013-11-26 07:41 · Score: 1
  
  kick those Ninja Turtles out of this discussion. They're not related to me.
  
  --
  Mod me up/Mod me down: I won't frown as I've no crown
Optionally by Sockatume · 2013-11-26 02:02 · Score: 2

Knights Landing will be available as both an accelerator card and a standalone CPU with some sort of large high-speed memory pool on the die.

--
No kidding!!! What do you say at this point?
Fully Baked? by DragonDru · 2013-11-26 02:18 · Score: 4, Informative

Good. The current generation Phi cards are a pain to administer. With luck the new generation will be more fully baked.
- very hot card, no fans
- depends on software to throttle the cards down (mine have hit 104C)
- stripped-down OS running on the cards, poor user-facing directions for usage

Anyway, enough from me.

--
20 characters max for the password? How will I use my favorite poems as passwords?
1. Re:Fully Baked? by Junta · 2013-11-26 02:25 · Score: 5, Informative
  
  I won't disagree about the awkwardness of MPSS, but the 'very hot card, no fans' is because it's meant only to be installed into systems that cooperate with them and have cooling designs where the hosting system takes care of it. For a lot of systems that Phi go into, a fan is actually a liability because those systems already have cooling solutions and a fan actually fights with the designed airflow.
  Of course, that's why nVidia offers up two Tesla variants of every model, one with and one without fan, to cater to both worlds.
  
  --
  XML is like violence. If it doesn't solve the problem, use more.
2. Re:Fully Baked? by Anonymous Coward · 2013-11-26 02:50 · Score: 1
  
  Xeon Phi also has variants with and without fans:
  http://newsroom.intel.com/servlet/JiveServlet/showImage/38-5572-2661/Xeon_Phi_Family.jpg
3. Re:Fully Baked? by kry73n · 2013-11-26 05:46 · Score: 1
  
  maybe they will finally also remove those texturing units from the Phi
4. Re:Fully Baked? by DragonDru · 2013-11-26 06:17 · Score: 1
  
  Yes, I have run into the "overheat and fall off the bus" issue. I ended up having to depower the host system entirely (pull the plugs) in order to get the cards to work correctly after that. On both of the systems where I have set up Phi cards, I had to update the BIOS and set the system fans to run at full speed all of the time.
  It would have been nice if the cards included their own throttling...
  
  --
  20 characters max for the password? How will I use my favorite poems as passwords?
Frankly, the accelerator card is pretty dumb by Anonymous Coward · 2013-11-26 02:20 · Score: 3, Interesting

For a Phi, the selling point is about ease of programming. The memory model of the accelerator card is a pain in the ass, making development more difficult. This is on top of the fact that the administration of those cards is pretty limited and annoying. MPSS is crap for everyone, and one of the critical differences here is that the standalone accelerator might not require Intel to be the Linux distribution curator anymore (they frankly suck pretty hard at it).
Intel having a standalone variant pretty much obviates the utility of an accelerator card model for all but perhaps the tiniest use cases and makes things far simpler. Trying to get the same workload to work across Phi and main CPUs is, in practice, more about trying to make the best of an awkward heterogeneous compute situation. While you still can and will run jobs heterogeneously if you do have both Phi and normal Xeon nodes (e.g. a top500 run and... well not much else), it is done using more typical methods of MPI.
In short, this move pretty much lets Intel focus on the pieces they *are* good at (making a decent processor) and get away from the stuff they aren't so good at (PCIe hosted device, Linux distribution design, etc).
1. Re:Frankly, the accelerator card is pretty dumb by Sockatume · 2013-11-26 03:43 · Score: 1
  
  So you think the accelerator card version is just a stopgap for customers looking to upgrade (rather than replace) their systems, and it'll go away in time?
  
  --
  No kidding!!! What do you say at this point?
80487 coprocessor all over? by Anonymous Coward · 2013-11-26 02:24 · Score: 5, Interesting

The 80486 was the first Intel processor with an integrated coprocessor, coming at about €1000 (only know the DM price). There was a considerably cheaper version, the 80486SX "without" coprocessor (actually, the coprocessor was usually just disabled, possibly because of yield problems, and still drew current).
One could buy an 80487 coprocessor that provided the missing floating point performance. Customers puzzled over how the processor/coprocessor combination could be competitive without the on-chip communication of the 80486. The answer was that it did not even try. The "coprocessor" contained a CPU as well and simply switched off the "main" processor completely. It was basically a full 80486 with different pinout, pricing, and marketing.
It was probably phased out once the yields became good enough.
1. Re:80487 coprocessor all over? by kheldan · 2013-11-26 06:31 · Score: 1
  
  It was basically a full 80486 with different pinout, pricing, and marketing.
  Intel also made an 80386/80387 "RapidCAD" chipset, that I managed to get a hold of at one point, and discovered that the 80387 was just a dud (which, according to Wikipedia, was there just to supply the FERR signal, to keep everything compatible with a real '387); the coprocessor was on-die with the '386 core, just like a '486.
  
  --
  Are YOU using the TOOL, or is the TOOL using YOU? Think about it!
Good multi-thread, bad single-thread by Theovon · 2013-11-26 02:50 · Score: 1

These processors are like an Intel version of Sun Niagara, but with wider vector. Actually, from an architectural perspective Xeon Phi (Larrabee) is pretty basic. They’re an array of 4-way SMT in-order dual-issue x86 processors, with 512-bit vector units. I think one of the major reasons Xeon Phi doesn’t compete well with GPUs on performance is the legacy x86 ISA translation engine taking up so much die area. Anyhow, if you have a highly parallel algorithm, then Xeon Phi will be a boon for performance.
But as we know, there are numerous very important algorithms that are not parallelizable.
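
The limit hinted at here is usually expressed with Amdahl's law, which the comment does not name. A minimal sketch, assuming a hypothetical 240 hardware threads and illustrative parallel fractions:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only parallel_fraction of the runtime scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical many-core part with 240 hardware threads.
for p in (0.5, 0.9, 0.99):
    bound = amdahl_speedup(p, 240)
    print(f"parallel fraction {p}: at most {bound:.1f}x on 240 threads")
```

Even with unlimited threads, a workload that is half serial can never run more than twice as fast, so a many-core part only pays off when the parallel fraction is very high.
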
1. Re:Good multi-thread, bad single-thread by serviscope_minor · 2013-11-26 04:47 · Score: 1
  
  These processors are like an Intel version of Sun Niagara, but with wider vector.
  I thought the Niagara was a crazy-wide barrel processor of sorts: it switches to a new thread every cycle with a grand total of 8 threads (per core). The idea being that if you've filled up all 8 threads, then each instruction can wait 8 cycles for a bit of memory entirely for free because it takes 8 cycles to execute.
  The idea (not entirely realised sadly) was that for highly parallel workloads you get much higher aggregate throughput since you have far fewer memory stalls than a much faster single threaded core.
  
  --
  SJW n. One who posts facts.
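
The latency-hiding idea described in the comment above can be sketched with a toy model of a strict round-robin (barrel) core. This is a simplified illustration under stated assumptions, not a model of the real Niagara pipeline: every instruction is assumed to have the same fixed latency, and the name `barrel_utilization` is hypothetical.

```python
def barrel_utilization(num_threads, latency, cycles):
    """Fraction of cycles in which a strict round-robin core issues an instruction."""
    ready_at = [0] * num_threads  # earliest cycle at which each thread may issue again
    issued = 0
    for cycle in range(cycles):
        thread = cycle % num_threads  # strict rotation: one thread slot per cycle
        if ready_at[thread] <= cycle:
            issued += 1
            ready_at[thread] = cycle + latency  # thread waits for its result
        # otherwise the slot is wasted: the thread is still waiting
    return issued / cycles


for threads in (1, 4, 8):
    print(threads, "threads:", barrel_utilization(threads, latency=8, cycles=800))
```

In this model a single thread keeps the core busy one cycle in eight, while eight threads fill every slot, which is the "wait 8 cycles for free" effect the comment describes.
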
Neat by RobHostetter · 2013-11-26 03:42 · Score: 1

I could see using this, whereas I couldn't see myself using the card version. If the cost premium is reasonable this could be awesome for image processing. I have an image algorithm I use CUDA for and moving the data around consumes almost as much time as processing the data. If I had this in my servers I would have flexibility and much greater performance with this solution. --Robert
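
If copying data to and from the accelerator takes almost as long as the processing itself, the best case from removing the copies is close to a 2x gain. A rough sketch of that estimate, with made-up timings and a hypothetical function name:

```python
def speedup_without_transfer(transfer_seconds, compute_seconds):
    """Best-case speedup if host/device copies vanished and compute time stayed the same."""
    return (transfer_seconds + compute_seconds) / compute_seconds


# Made-up timings: copies take 0.9 s, the kernel itself takes 1.0 s.
print(speedup_without_transfer(0.9, 1.0))
```

This is an upper bound only; it assumes the computation runs exactly as fast on the standalone part as on the card.
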
i see a problem with this by etash · 2013-11-26 03:59 · Score: 2

Wouldn't a Xeon Phi version embedded in the CPU lack the GDDR4/5 memory that exists on the PCI Express card version, with its 200-300GB/s of throughput, and be forced to access the main system RAM at about 40-50GB/s?
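
To see what that bandwidth gap would mean for a memory-bound job, here is a back-of-the-envelope sketch. The 8 GB buffer is a hypothetical working set, and 250 GB/s and 45 GB/s are simply the midpoints of the ranges quoted in the question.

```python
def stream_seconds(gigabytes, bandwidth_gb_per_s):
    """Time to read a buffer once when memory bandwidth is the only limit."""
    return gigabytes / bandwidth_gb_per_s


BUFFER_GB = 8  # hypothetical working set
for label, bandwidth in (("card memory, ~250 GB/s", 250), ("host RAM, ~45 GB/s", 45)):
    millis = stream_seconds(BUFFER_GB, bandwidth) * 1000
    print(f"{label}: {millis:.0f} ms per pass")
```

Under these assumptions one pass over the buffer takes about 32 ms from card memory versus about 178 ms from host RAM, a gap of more than 5x for purely bandwidth-bound code.
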
GPUs by DrYak · 2013-11-26 07:32 · Score: 1

I don't know about Niagaras, but according to docs about warps and half-warps, that's how Nvidia GPUs run CUDA.
They keep cycling through 2 or 4 threads, to hide memory latency.
(Except that each thread itself runs on a wide SIMD instead of a normal CPU. So the final size of parallel execution [=threads] is the number of warps in parallel x the size of the SIMD.)

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]