A Co-processor No More, Intel's Xeon Phi Will Be Its Own CPU As Well

Yee-haw no more by vikingpower · 2013-11-26 01:51 · Score: 1

Moore's law is not coming back from the grave, or is it ?

--
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace

Re:Yee-haw no more by Anonymous Coward · 2013-11-26 02:21 · Score: 0

What, the law stating that transistor counts will double every 18 months? It's still doing quite well. We're just finding that increasing parallelism in serial code paths is hard and physics has limitations on how fast a single operation can occur. Next best thing is lots of operations at the same time, which requires multithreading.
Re:Yee-haw no more by Joce640k · 2013-11-26 03:07 · Score: 1

a) It's not a "law"
b) It doesn't say that transistor counts will double every 18 months.

--
No sig today...
Re:Yee-haw no more by Chas · 2013-11-26 03:54 · Score: 1, Informative

A) Stop being a pedantic dick.
B) Correct. The original 1965 paper observed that transistor counts were tending to double every year (12 months), which he later revised to every 2 years (24 months) in 1975.
What people are misquoting is the House corollary. That PERFORMANCE of microprocessors, due to increased transistor counts, and faster speeds, seems to double roughly every 18 months.

--

Chas - The one, the only.
THANK GOD!!!
Re:Yee-haw no more by Sockatume · 2013-11-26 03:57 · Score: 2

If you're going to nitpick, at least act like you want to educate people and aren't just being a smartass:
1) It's a law the same way Newton's Laws are laws: it's a simple quantitative relation which has held up very well over time.
2) a) In its original formulation it said that the number of components you could put on a chip at minimum cost (because you can always cram in more at higher cost) doubled every year.
ii) In its later correction (he used only five data points in the first paper!) he revised this to every two years.
gamma) It was a head honcho in Intel who then concluded that such a relationship would double overall computer performance every eighteen months.
Given that the components of interest on a processor are transistors, I do not have any great objection to the generalisation that it's transistor counts that double and not all components.
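
For anyone who wants to see how far apart the three doubling periods quoted above end up, here is a minimal Python sketch. The periods (12, 18, 24 months) are the ones from this thread; the 10-year horizon and the function name are my own picks for illustration.

```python
# Growth factor implied by a fixed doubling period.
# The periods below are the ones quoted in the thread; the 10-year
# horizon is an arbitrary choice for illustration.

def growth_factor(years, doubling_months):
    """Multiplier after `years` if a quantity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    for months in (12, 18, 24):
        factor = growth_factor(10, months)
        print(f"doubling every {months} months: x{factor:.0f} after 10 years")
```

Over ten years that works out to 1024x for the yearly version, roughly 100x for the 18-month version and 32x for the two-year version, which is why it matters which one people are quoting.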

--
No kidding!!! What do you say at this point?
Re:Yee-haw no more by Joce640k · 2013-11-26 07:18 · Score: 1

1) It's a law the same way Newton's Laws are laws.
Newton's laws are based on real physical "laws". Not the same thing at all and definitely not "nitpicking".
There's a well-known observation that's known as "Moore's Law", yes, but calling it a "law" doesn't make it one. There's a parable about grains of wheat on a chessboard which explains why. Anybody with a working brain can see that it can't hold for much longer. The laws of physics and mathematics will come into play soon (and they're real laws, not data trends).
Newton's laws? I expect them to hold until the end of time.

--
No sig today...
Re:Yee-haw no more by Anonymous Coward · 2013-11-26 12:23 · Score: 0

Newton's laws? I expect them to hold until the end of time.
Or until you go really fast or near a large mass, or try to do anything with enough precision. We could make a parable about going faster and faster, then maybe people with working brains will see Newton's laws wouldn't hold for long.
Re:Yee-haw no more by Sockatume · 2013-11-26 21:00 · Score: 1

Newton's Laws are, at best, approximations to general relativity that are well-behaved on human timescales. They're quantitatively extremely good but that doesn't put them in another class from Hooke's Law, say, which only holds for a perfect harmonic potential, or Ohm's law, which falls down on similar length scales to Moore's law. That some of them are used as the basis for physics and some aren't doesn't change the fact that they're all quantitative models for observed phenomena and have an equal stake to the 19th century's pretentious "law" label.

--
No kidding!!! What do you say at this point?
Re:Yee-haw no more by Joce640k · 2013-11-26 21:33 · Score: 1

They're quantitatively extremely good
And will still be extremely good a million years from now.
Or a billion...

--
No sig today...

CPU embedded in GPU versus GPU embedded in CPU by abies · 2013-11-26 01:57 · Score: 1

I thought that we already had GPUs embedded in CPUs. How does embedding a CPU inside a GPU make it so different, let alone a breakthrough?

Re:CPU embedded in GPU versus GPU embedded in CPU by Overzeetop · 2013-11-26 02:00 · Score: 4, Funny

Patents already cover most implementations of GPUs within CPUs. But the field is wide open if you start embedding CPUs in GPUs. It's like "on the internet," but with uprocessors.

--
Is it just my observation, or are there way too many stupid people in the world?
Re:CPU embedded in GPU versus GPU embedded in CPU by Sockatume · 2013-11-26 02:05 · Score: 1

There's no other component. (Pedantry: calling it a GPU is a misnomer as nobody really uses them for real-time graphics. You won't be playing Crysis 3 with one of these. It just happens that this kind of hardware came out of graphics silicon design.)

--
No kidding!!! What do you say at this point?
Re:CPU embedded in GPU versus GPU embedded in CPU by TWX · 2013-11-26 02:05 · Score: 1

Turtles, all the way down...

--
Do not look into laser with remaining eye.
Re:CPU embedded in GPU versus GPU embedded in CPU by fluffythedestroyer · 2013-11-26 02:06 · Score: 1

A GPU is important enough at least at a basic level, but when you speak about servers... do we need them? I mean, I know Microsoft's next servers will have the GUI as an option, and they will "try" to get away from the GUI and recommend working without it. I question the need for the GPU.
Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 02:18 · Score: 2, Interesting

It's not just for drawing graphics. It can be used as a general computation platform.
As an example, ImageMagick supports OpenCL nowadays. So if you have a webpage where images can be uploaded and you do some processing on them (cropping, scaling for thumbnails, etc.), you can get absolutely amazing performance on a server with a GPU.
Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 02:30 · Score: 0

GPUs are not CPUs.
Differences:

GPUs: have something like a 96KB L2 cache shared among thousands of cores.
Knights Landing: something like 256KB per core

GPUs: When one core branches in a group, all other cores not on the same branch have to stall
Knights Landing: Each CPU's execution unit is entirely independent of other cores

GPUs: Random memory access is incredibly slow
Knights Landing: Random access is about the same as sequential
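
The branching point is easier to see with a toy model. The sketch below is plain Python, not a GPU simulator: the group size of 32 and the cycle costs are made-up numbers, and the function names are hypothetical. It only contrasts a lockstep group, which has to run every path that any lane takes, with fully independent cores.

```python
# Toy model of branch divergence. Not a real GPU simulator: the group
# size and the per-path costs are made-up numbers for illustration.

def lockstep_cycles(branch_taken, cost_if, cost_else):
    """Cycles for one lockstep group: every path taken by at least one
    lane is executed in turn while lanes on the other path sit idle."""
    cycles = 0
    if any(branch_taken):
        cycles += cost_if
    if not all(branch_taken):
        cycles += cost_else
    return cycles


def independent_cycles(branch_taken, cost_if, cost_else):
    """Cycles when every lane is its own core: each runs only its own
    path, so the group is done when the slowest lane is done."""
    return max(cost_if if taken else cost_else for taken in branch_taken)


if __name__ == "__main__":
    uniform = [True] * 32                  # all lanes agree on the branch
    divergent = [True] * 16 + [False] * 16  # half go each way
    for name, lanes in (("uniform", uniform), ("divergent", divergent)):
        print(name,
              lockstep_cycles(lanes, 100, 80),
              independent_cycles(lanes, 100, 80))
```

With these assumed costs the lockstep group pays for both paths once the lanes disagree, while the independent cores only ever pay for the longer of the two.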
Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 02:33 · Score: 1

It's hard to believe how shortsighted some of you people are... The GPU in the server has nothing to do with graphics (despite it being in the name), it is used for general highly parallel computations, and while traditional server software doesn't currently support it it is theoretically possible to accelerate traditional server applications such as databases using the GPU, and it has been actually demonstrated with PoC software.
Captcha: leverage
Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 03:48 · Score: 0

Hint: The "G" in "GPU" does not stand for "game".
Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 03:49 · Score: 0

..when you speak about servers..
..the first question is: what service do you want to provide?
If NFS is the service you want to provide, then you're right. GPUs don't make NFS, Samba, etc faster.
Someone once heard an anecdote, though, that someone else programmed a computer to reply to requests over a network, where the contents of the response were a function of many independent calculations. Can you imagine that?
Re: CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 03:53 · Score: 0

I believe the gaming and animation sectors can benefit more than servers.
Re:CPU embedded in GPU versus GPU embedded in CPU by gl4ss · 2013-11-26 05:26 · Score: 1

Hint: The "G" in "GPU" does not stand for "game".
Graphics? because that's what it stands for.
I guess some people are going to be calling them General processing units or some shit like that but frankly if they can't be bothered to call it a co-processor if that's what they want to call it they could do well with a punch to the groin.

--
world was created 5 seconds before this post as it is.
Re:CPU embedded in GPU versus GPU embedded in CPU by Anonymous Coward · 2013-11-26 06:18 · Score: 0

A GPU, like an FPU, *is* a co-processor. The Xeon Phi is based on Intel's GPU design and is well suited for exactly that, it is a GPU. It doesn't have to be used for realtime graphics to qualify.
Re:CPU embedded in GPU versus GPU embedded in CPU by fast+turtle · 2013-11-26 07:41 · Score: 1

kick those Ninja Turtles out of this discussion. They're not related to me.

--
Mod me up/Mod me down: I wont frown as I've no crown

Optionally by Sockatume · 2013-11-26 02:02 · Score: 2

Knights Landing will be available as both an accelerator card and a standalone CPU with some sort of large high-speed memory pool on the die.

--
No kidding!!! What do you say at this point?

Fully Baked? by DragonDru · 2013-11-26 02:18 · Score: 4, Informative

Good. The current generation Phi cards are a pain to administer. With luck the new generation will be more fully baked.
- very hot card, no fans
- depends on software to down throttle the cards (mine have hit 104C)
- stripped down OS running on the cards, poor user facing directions for the usage

Anyway, enough from me.

--
20 characters max for the password? How will I use my favorite poems as passwords?

Re:Fully Baked? by Junta · 2013-11-26 02:25 · Score: 5, Informative

I won't disagree about the awkwardness of MPSS, but the 'very hot card, no fans' is because it's meant only to be installed into systems that cooperate with them and have cooling designs where the hosting system takes care of it. For a lot of systems that Phi go into, a fan is actually a liability because those systems already have cooling solutions and a fan actually fights with the designed airflow.
Of course, that's why nVidia offers up two Tesla variants of every model, one with and one without fan, to cater to both worlds.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Fully Baked? by Anonymous Coward · 2013-11-26 02:28 · Score: 0

Have yours fallen off the PCIe bus from hitting 120+ C? Because I had trouble figuring out why they were falling off until I got one to boot long enough to get the temperature reading off it...and found the systems I had weren't cooling it sufficiently.
Re:Fully Baked? by Anonymous Coward · 2013-11-26 02:50 · Score: 1

Xeon Phi also have variants with and without fans:
http://newsroom.intel.com/servlet/JiveServlet/showImage/38-5572-2661/Xeon_Phi_Family.jpg
Re:Fully Baked? by kry73n · 2013-11-26 05:46 · Score: 1

maybe they will finally also remove those texturing units from the Phi
Re:Fully Baked? by DragonDru · 2013-11-26 06:17 · Score: 1

Yes, I have run into the "over heat and fall off the bus" issue. I ended up having to depower (pull the plugs on the host system) the system entirely in order to get them to work correctly after that. On both of the systems where I have set up Phi cards, I had to update the BIOS and set the system fans to run full speed all of the time.
It would have been nice if the cards included their own throttling...

--
20 characters max for the password? How will I use my favorite poems as passwords?

Frankly, the accelerator card is pretty dumb by Anonymous Coward · 2013-11-26 02:20 · Score: 3, Interesting

For a Phi, the selling point is about ease of programming. The memory model of the accelerator card is a pain in the ass, making development more difficult. This on top of the fact that the administration of those is pretty limited and annoying. MPSS is crap for everyone, and one of the critical differences here is that the standalone accelerator might not require Intel to be the Linux distribution curator anymore (they frankly suck pretty hard at it).

Intel having a standalone variant pretty much obviates the utility of an accelerator card model for all but perhaps the tiniest usage and makes things far simpler. Trying to get the same workload to work across Phi and main CPUs is, in practice, more about trying to make the best of an awkward heterogeneous compute situation. While you still can and will run jobs heterogeneously if you do have both Phi and normal Xeon nodes (e.g. a top500 run and... well not much else), it is done using more typical methods of MPI.

In short, this move pretty much lets Intel focus on the pieces they *are* good at (making a decent processor) and get away from the stuff they aren't so good at (PCIe hosted device, Linux distribution design, etc).

Re:Frankly, the accelerator card is pretty dumb by Sockatume · 2013-11-26 03:43 · Score: 1

So you think the accelerator card version is just a stopgap for customers looking to upgrade (rather than replace) their systems, and it'll go away in time?

--
No kidding!!! What do you say at this point?
Re:Frankly, the accelerator card is pretty dumb by Anonymous Coward · 2013-11-26 07:06 · Score: 0

The best thing about a separate card is scalability. You can plug several of them in one motherboard. Performance depends on the task's memory usage. How much does it need and how much does it transfer. Anyway PCI is better than network clustering.
Making it a full processor isn't something revolutionary. Now instead of a single server with multiple accelerators one will need several computing nodes. Unless Intel keeps the PCI version in production. On the positive side a motherboard will probably allow a lot of memory installed.
Anyway, I'll keep this in mind as a possible number-crunching solution.
PS: Calling Xeon Phi a GPU is probably incorrect as it is rarely used for graphics rendering.

80487 coprocessor all over? by Anonymous Coward · 2013-11-26 02:24 · Score: 5, Interesting

The 80486 was the first Intel processor with integrated coprocessor, coming at about €1000 (only know the DM price). There was a considerably cheaper version, the 80486SX "without" coprocessor (actually, the coprocessor was usually just disabled, possibly because of yield problems, and still took current).

One could buy an 80487 coprocessor that provided the missing floating point performance. Customers puzzled how the processor/coprocessor combination could be competitive without the on-chip communication of the 80486. The answer was that it did not even try. The "coprocessor" contained a CPU as well and simply switched off the "main" processor completely. It was basically a full 80486 with different pinout, pricing, and marketing.

It was probably phased out once the yields became good enough.

Re:80487 coprocessor all over? by kheldan · 2013-11-26 06:31 · Score: 1

It was basically a full 80486 with different pinout, pricing, and marketing.
Intel also made an 80386/80387 "RapidCAD" chipset, that I managed to get a hold of at one point, and discovered that the 80387 was just a dud (which, according to Wikipedia, was there just to supply the FERR signal, to keep everything compatible with a real '387); the coprocessor was on-die with the '386 core, just like a '486.

--
Are YOU using the TOOL, or is the TOOL using YOU? Think about it!

Take it up with your system vendor... by Anonymous Coward · 2013-11-26 02:38 · Score: 0

If your system vendor sold you a Phi solution without doing the cooling right, they need to take care of you. Some server vendors actually invested some actual attention in making sure the thermals were correct before selling, and a vendor that sold without taking that care should suffer for it.

Good multi-thread, bad single-thread by Theovon · 2013-11-26 02:50 · Score: 1

These processors are like an Intel version of Sun Niagara, but with wider vectors. Actually, from an architectural perspective Xeon Phi (Larrabee) is pretty basic. They’re an array of 4-way SMT in-order dual-issue x86 processors, with 512-bit vector units. I think one of the major reasons Xeon Phi doesn’t compete well with GPUs on performance is the legacy x86 ISA translation engine taking up so much die area. Anyhow, so if you have a highly parallel algorithm, then Xeon Phi will be a boon for performance.

But as we know, there are numerous very important algorithms that are not parallelizable.
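
The usual way to put a number on this is Amdahl's law (not named above, but it is the standard model for the point being made). A minimal Python sketch follows; the 64-core count and the parallel fractions are assumptions for illustration, not Xeon Phi specifications.

```python
# Amdahl's law: the speedup from `cores` when only `parallel_fraction`
# of the run time can be spread across them. The core count and the
# fractions in the demo are illustrative assumptions.

def amdahl_speedup(parallel_fraction, cores):
    """Overall speedup when the serial part stays on a single core."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for fraction in (0.50, 0.95, 0.99):
        speedup = amdahl_speedup(fraction, 64)
        print(f"{fraction:.0%} parallel on 64 cores: {speedup:.1f}x")
```

Even at 95% parallel the assumed 64 cores give well under a quarter of the ideal 64x, because the serial remainder dominates. That is the case where a fast single-threaded core still matters.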

Re:Good multi-thread, bad single-thread by Anonymous Coward · 2013-11-26 03:32 · Score: 0

I assume that's why it's going to be coupled with a dedicated Xeon. Phi as vector coprocessor for highly parallel workloads, and when that's underutilized, use turbo to triple the clock rate of the general-purpose cores.
Re:Good multi-thread, bad single-thread by serviscope_minor · 2013-11-26 04:47 · Score: 1

These processors are like an Intel version of Sun Niagara, but with wider vector.
I thought the Niagara was a crazy-wide barrel processor of sorts: it switches to a new thread every cycle with a grand total of 8 threads (per core). The idea being that if you've filled up all 8 threads, then each instruction can wait 8 cycles for a bit of memory entirely for free because it takes 8 cycles to execute.
The idea (not entirely realised sadly) was that for highly parallel workloads you get much higher aggregate throughput since you have far fewer memory stalls than a much faster single threaded core.
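
A rough way to check the "8 cycles for free" reasoning is to simulate it. This Python sketch is a toy model under my own assumptions: one issue slot per cycle, strict round-robin thread selection, and every instruction keeping its thread busy for a fixed number of cycles. It is not a description of how Niagara is actually implemented.

```python
# Toy barrel-processor model: one issue slot per cycle, handed out to
# threads strictly round-robin. After issuing, a thread cannot issue
# again for `stall_cycles` (standing in for memory latency).
# All parameters are assumptions for illustration.

def utilization(threads, stall_cycles, total_cycles=10_000):
    """Fraction of cycles in which the core actually issues an instruction."""
    ready_at = [0] * threads  # cycle at which each thread may issue again
    issued = 0
    for cycle in range(total_cycles):
        t = cycle % threads  # barrel: the slot owner depends only on the cycle
        if ready_at[t] <= cycle:
            issued += 1
            ready_at[t] = cycle + stall_cycles
    return issued / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s), 8-cycle stall: {utilization(n, 8):.3f}")
```

In this model the issue slot is only kept full once there are as many threads as stall cycles; with fewer threads the core idles in proportion, which matches the argument above.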

--
SJW n. One who posts facts.
Re:Good multi-thread, bad single-thread by Anonymous Coward · 2013-11-26 05:37 · Score: 0

yes indeed. However it is an interesting product in that a) intel took a few labs from lab to market b) it is an alternative for some to GPUs and perhaps may be in the future...
I was at SC13 and one of the best descriptions I heard of this was "The only way GPUs can win is on power". Note, the Phi can already run a form of Linux since it is i686. There is a LOT of legacy software that could be patched to exploit this chip.
parallelism can be exploited in many ways. One of the biggest impediments is memory bandwidth, and this solution is at least addressing that issue.
Note, AMD should be way ahead, but I have heard precious little from them, even though apparently their cards perform quite well (maths performance).
We'll see if ARM'd GPUs appear....!
Re:Good multi-thread, bad single-thread by Anonymous Coward · 2013-11-26 05:44 · Score: 0

At 14nm you really need to stop caring about die area in regard to "legacy x86" ISA. That is completely negligible!
Stop spreading this nonsense, otherwise some people might actually believe that ARM has some advantage over x86.
Re:Good multi-thread, bad single-thread by Anonymous Coward · 2013-11-26 06:09 · Score: 0

The cache and interconnects eat up considerably more die than the core logic. The ISA is a small 10% or less of that already small percentage. Not only is the decoder not the bottleneck for throughput, but it doesn't even cut into the execution logic. They don't go "well, we can't do this because there's not enough room", they just add it and increase the cost of the chip to accommodate the extra increase in silicon size.
Re:Good multi-thread, bad single-thread by Anonymous Coward · 2013-11-26 07:16 · Score: 0

true, i would already buy xeon Phi based on performance and compatibility only, if it was not for its insane price, until they become competitive i would rather buy 50% faster Radeon R7 260X for $139 ( 2 TFLOPS) than pay $4129.00 for Xeon Phi 7120X (1.2 TFLOPS)
Radeon R7 260X has 50 TIMES higher GFLOPS per dollar than Xeon Phi plus uses several times less power
when Xeon Phi gets at least 45 times cheaper or 45 times faster i would buy one
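
The arithmetic above is easy to check. This Python sketch only reuses the prices and TFLOPS figures exactly as quoted in the comment; I have not verified them, and I have not checked whether the two TFLOPS numbers refer to the same floating-point precision, so treat the result as a check of the sums and nothing more.

```python
# Price/performance from the figures quoted in the comment above.
# The numbers are taken as given, not verified, and may not be measured
# at the same floating-point precision.

def gflops_per_dollar(tflops, price_usd):
    """Peak GFLOPS bought per dollar of list price."""
    return tflops * 1000 / price_usd


if __name__ == "__main__":
    radeon = gflops_per_dollar(2.0, 139.00)   # Radeon R7 260X as quoted
    phi = gflops_per_dollar(1.2, 4129.00)     # Xeon Phi 7120X as quoted
    print(f"Radeon: {radeon:.2f} GFLOPS/$")
    print(f"Phi:    {phi:.2f} GFLOPS/$")
    print(f"ratio:  {radeon / phi:.1f}x")
    print(f"raw throughput ratio: {2.0 / 1.2:.2f}x")
```

On those figures the per-dollar ratio comes out just under 50, so "50 times" is a fair rounding. The raw throughput ratio is about 1.67, which is closer to 67% faster than 50%.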
Re:Good multi-thread, bad single-thread by Anonymous Coward · 2013-11-26 16:14 · Score: 0

that's sort of my point. I had heard radeon was good but there is bugger all benchmark info...
The phi at least has a fairly bog standard cpu (i686) and quite fast memory, and there are already a pile of benchmarks for it. And the prices are coming down...
AMD really need to get their act together or they will be buried....
Still, it is nice to know there is some improvement in CPU density. The whole PCI x16 bus is really too slow for all of this, not to mention it cannot share transactions (can it?). I wrote a bit about hyper-transport vs pci xN a few years back. I'm not at all sure if that still stands...

Neat by RobHostetter · 2013-11-26 03:42 · Score: 1

I could see using this, whereas I couldn't see myself using the card version. If the cost premium is reasonable this could be awesome for image processing. I have an image algorithm I use CUDA for and moving the data around consumes almost as much time as processing the data. If I had this in my servers I would have flexibility and much greater performance with this solution. --Robert

i see a problem with this by etash · 2013-11-26 03:59 · Score: 2

wouldn't an embedded in the cpu xeon phi version, lack the necessary GDDR4/5 which exists in the PCI-express card version with its 200-300GB/s of throughput, and be forced to just access the main computer RAM at about 40-50GB/s?
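
To get a feel for what that gap would mean for a bandwidth-bound job, here is a back-of-the-envelope Python sketch. The two bandwidths are midpoints of the ranges in the question above; the 16 GB working set is a made-up number, and the model ignores caches, latency and everything else except raw streaming rate.

```python
# Time to stream a working set once at a given memory bandwidth.
# Bandwidths are midpoints of the ranges quoted above; the 16 GB
# working set is a hypothetical size chosen for illustration.

def stream_seconds(gigabytes, bandwidth_gb_per_s):
    """Seconds needed to read `gigabytes` once at `bandwidth_gb_per_s`."""
    return gigabytes / bandwidth_gb_per_s


if __name__ == "__main__":
    working_set_gb = 16
    on_card = stream_seconds(working_set_gb, 250)   # GDDR on the card
    main_ram = stream_seconds(working_set_gb, 45)   # host main memory
    print(f"card memory: {on_card * 1000:.0f} ms per pass")
    print(f"main memory: {main_ram * 1000:.0f} ms per pass")
    print(f"slowdown:    {main_ram / on_card:.1f}x")
```

Under those assumptions each pass over the data would take roughly five to six times longer from main memory, which is the concern being raised. It says nothing about the cost of copying data over PCIe to the card in the first place.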

Re:i see a problem with this by Anonymous Coward · 2013-11-26 04:43 · Score: 0

No. The chip will be made using 14nm process technology, will support new AVX 3.1 instructions, and will have a built-in DDR4 memory controller.
Well, yes, if you compare to DDR5, but the chip will use a DDR4 controller. And you will also skip the wait for loading the data from main memory to the card.
Re:i see a problem with this by Anonymous Coward · 2013-11-26 05:56 · Score: 0

There would be the embedded local memory this time, like in your friend's XBone.. ;) (sorry)

Total garbage- Larrabee has always had x86 cores by Anonymous Coward · 2013-11-26 05:59 · Score: 0

Where does Slashdot get this utter drivel? Larrabee (the biggest chip design failure in Human History and the most expensive), that has since been renamed by Intel numerous times, was originally designed to be a GPU competitor to the graphics parts from Nvidia and ATI. Intel spent more on the Larrabee than the entire R+D budgets of both Nvidia and ATI combined across their entire history as graphics hardware companies.

Larrabee (now Knights Landing) is FILLED with x86 processors- it was always a stand-alone CPU system. The ONLY reason Intel provided test boards for this crap with a separate x86 CPU was for ease of programming- the 'ordinary' x86 chip allowed an 'ordinary' copy of Linux or Windows to run on the board, allowing the Larrabee chip to be accessed via drivers in the same way you'd talk to any other GPU.

Larrabee only continues (under disguised names and dishonest promotion about GPGPU), because having wasted BILLIONS on the original chip, it is pennies for Intel to keep a small Larrabee team continuing by comparison.

Larrabee (or whatever they want to call the latest version) is simply the world's worst multi-processor architecture. Loads of ancient x86 cores attached to loads of obsolete SIMD floating point units, with a connected memory architecture to make grown programmers cry. It is a wooden bicycle with square wheels compared to the flying cars of AMD's hUMA and HSA parts as currently seen in the PS4, and next year in AMD's GPU and APU parts.

The 'best' comparison to Larrabee is the appalling 'CELL' design from IBM, that almost sank Sony when they chose it for their last console, the PS3. Like Larrabee, Cell was designed to replace the excellent GPUs from Nvidia and ATI with a dumb set of maths units controlled by fast weak CPU cores. The PS3, like Larrabee, was supposed to output all its graphics from this maths heavy CPU cluster.

When Sony saw how terrible Cell was at graphics, they RAN screaming to Nvidia, got down on their knees, and offered to pay Nvidia anything they wanted for access to one of their GPUs (and thus the retail PS3 was born). When Intel saw how terrible the Larrabee was at graphics, they simply told the tame technical press to PRETEND that Larrabee had been designed for scientific computing all along- hence this Slashdot story.

Re:Total garbage- Larrabee has always had x86 core by Anonymous Coward · 2013-11-26 06:44 · Score: 0

A fine troll. Complete and utter bullshit, but well done.

GPUs by DrYak · 2013-11-26 07:32 · Score: 1

I don't know about Niagara's, but according to docs about Warps and half-warps, that's how Nvidia GPUs run CUDA.

They keep cycling through 2 or 4 threads, to hide memory latency.
(Except that each thread itself runs on a wide SIMD instead of a normal CPU. So the final size of parallel execution [=threads] is the amount of warps in parallel x size of the SIMD).

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
