A Co-processor No More, Intel's Xeon Phi Will Be Its Own CPU As Well
An anonymous reader writes "The Xeon Phi co-processor requires a Xeon CPU to operate... for now. The next generation of Xeon Phi, codenamed Knights Landing and due in 2015, will be its own CPU and accelerator. This will free up a lot of space in the server but more important, it eliminates the buses between CPU memory and co-processor memory, which will translate to much faster performance even before we get to chip improvements. ITworld has a look."
Moore's law is not coming back from the grave, or is it?
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
I thought that we already had GPUs embedded in CPUs. How does embedding a CPU inside a GPU make it so different, or such a breakthrough?
Knights Landing will be available as both an accelerator card and a standalone CPU with some sort of large high-speed memory pool on the die.
No kidding!!! What do you say at this point?
Good. The current generation Phi cards are a pain to administer. With luck the new generation will be more fully baked.
- very hot card, no fans
- depends on software to throttle the cards down (mine have hit 104C)
- stripped-down OS running on the cards, poor user-facing directions for usage
Anyway, enough from me.
20 characters max for the password? How will I use my favorite poems as passwords?
For a Phi, the selling point is about ease of programming. The memory model of the accelerator card is a pain in the ass, making development more difficult. This is on top of the fact that administration of the cards is pretty limited and annoying. MPSS is crap for everyone, and one of the critical differences here is that the standalone accelerator might not require Intel to be the Linux distribution curator anymore (they frankly suck pretty hard at it).
Intel having a standalone variant pretty much obviates the utility of an accelerator card model for all but perhaps the tiniest usage and makes things far simpler. Trying to get the same workload to work across Phi and main CPUs is, in practice, more about trying to make the best of an awkward heterogeneous compute situation. While you still can and will run heterogeneous jobs if you do have both Phi and normal Xeon nodes (e.g. a top500 run and... well not much else), it is done using the more typical methods of MPI.
In short, this move pretty much lets Intel focus on the pieces they *are* good at (making a decent processor) and get away from the stuff they aren't so good at (PCIe hosted devices, Linux distribution design, etc).
The 80486 was the first Intel processor with an integrated coprocessor, coming at about €1000 (only know the DM price). There was a considerably cheaper version, the 80486SX "without" coprocessor (actually, the coprocessor was usually just disabled, possibly because of yield problems, and still drew current).
One could buy an 80487 coprocessor that provided the missing floating point performance. Customers puzzled over how the processor/coprocessor combination could be competitive without the on-chip communication of the 80486. The answer was that it did not even try. The "coprocessor" contained a CPU as well and simply switched off the "main" processor completely. It was basically a full 80486 with different pinout, pricing, and marketing.
It was probably phased out once the yields became good enough.
If your system vendor sold you a Phi solution without doing the cooling right, they need to take care of you. Some server vendors actually put real effort into making sure the thermals were correct before selling, and a vendor that sold without taking that care should suffer for it.
These processors are like an Intel version of Sun Niagara, but with wider vectors. Actually, from an architectural perspective Xeon Phi (Larrabee) is pretty basic. They're an array of 4-way SMT in-order dual-issue x86 processors with 512-bit vector units. I think one of the major reasons Xeon Phi doesn't compete well with GPUs on performance is that the legacy x86 ISA translation engine takes up so much die area. Anyhow, if you have a highly parallel algorithm, then Xeon Phi will be a boon for performance.
But as we know, there are numerous very important algorithms that are not parallelizable.
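To put a rough number on that, here is a minimal sketch of Amdahl's law in Python. The 60-core count and the parallel fractions are illustrative assumptions, not measurements of any Phi part, and `amdahl_speedup` is just a name made up for this example.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup from Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    cores = 60  # assumed core count, for illustration only
    for p in (0.0, 0.5, 0.95, 1.0):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, cores):.2f}x on {cores} cores")
```

Even with 95% of the work parallelizable, the serial 5% caps the ideal speedup well below the core count, which is why many-core parts only pay off for workloads that are almost entirely parallel.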
I could see using this, whereas I couldn't see myself using the card version. If the cost premium is reasonable this could be awesome for image processing. I have an image algorithm I use CUDA for and moving the data around consumes almost as much time as processing the data. If I had this in my servers I would have flexibility and much greater performance with this solution. --Robert
Wouldn't a Xeon Phi embedded in the CPU lack the GDDR4/5 that the PCI Express card version has, with its 200-300GB/s of throughput, and be forced to access main system RAM at about 40-50GB/s?
Where does Slashdot get this utter drivel? Larrabee (the biggest chip design failure in human history and the most expensive), which has since been renamed by Intel numerous times, was originally designed to be a GPU competitor to the graphics parts from Nvidia and ATI. Intel spent more on Larrabee than the entire R&D budgets of both Nvidia and ATI combined across their entire history as graphics hardware companies.
Larrabee (now Knights Landing) is FILLED with x86 processors; it was always a stand-alone CPU system. The ONLY reason Intel provided test boards for this crap with a separate x86 CPU was for ease of programming: the 'ordinary' x86 chip allowed an 'ordinary' copy of Linux or Windows to run on the board, allowing the Larrabee chip to be accessed via drivers in the same way you'd talk to any other GPU.
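For a sense of scale, a back-of-the-envelope sketch of what those bandwidth figures mean for one pass over a data set. The 8 GB working set is a made-up example size, and `stream_seconds` is a hypothetical helper; only the GB/s figures come from the comment above.

```python
def stream_seconds(gigabytes: float, gb_per_second: float) -> float:
    """Time to read a buffer once at a given sustained bandwidth."""
    if gb_per_second <= 0:
        raise ValueError("bandwidth must be positive")
    return gigabytes / gb_per_second


if __name__ == "__main__":
    working_set_gb = 8.0  # assumed data size, for illustration only
    for label, bandwidth in (("main RAM, low", 40.0), ("main RAM, high", 50.0),
                             ("card memory, low", 200.0), ("card memory, high", 300.0)):
        ms = stream_seconds(working_set_gb, bandwidth) * 1000.0
        print(f"{label:>17}: {ms:.1f} ms per pass at {bandwidth:.0f} GB/s")
```

For a purely bandwidth-bound kernel the ratio of the two figures is the whole story, so losing the on-card memory would cost roughly 4x to 7x per pass under these numbers.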
Larrabee only continues (under disguised names and dishonest promotion about GPGPU), because having wasted BILLIONS on the original chip, it is pennies for Intel to keep a small Larrabee team continuing by comparison.
Larrabee (or whatever they want to call the latest version) is simply the world's worst multi-processor architecture. Loads of ancient x86 cores attached to loads of obsolete SIMD floating point units, with a connected memory architecture to make grown programmers cry. It is a wooden bicycle with square wheels compared to the flying cars of AMD's hUMA and HSA parts as currently seen in the PS4, and next year in AMD's GPU and APU parts.
The 'best' comparison to Larrabee is the appalling 'CELL' design from IBM, which almost sank Sony when they chose it for their last console, the PS3. Like Larrabee, Cell was designed to replace the excellent GPUs from Nvidia and ATI with a dumb set of maths units controlled by fast weak CPU cores. The PS3, like Larrabee, was supposed to output all its graphics from this maths heavy CPU cluster.
When Sony saw how terrible Cell was at graphics, they RAN screaming to Nvidia, got down on their knees, and offered to pay Nvidia anything they wanted for access to one of their GPUs (and thus the retail PS3 was born). When Intel saw how terrible the Larrabee was at graphics, they simply told the tame technical press to PRETEND that Larrabee had been designed for scientific computing all along, hence this Slashdot story.
A fine troll. Complete and utter bullshit, but well done.
I don't know about Niagara, but according to the docs about warps and half-warps, that's how Nvidia GPUs run CUDA.
They keep cycling through 2 or 4 threads, to hide memory latency.
(Except that each thread itself runs on a wide SIMD unit instead of a normal CPU. So the final amount of parallel execution [=threads] is the number of warps in parallel x the size of the SIMD.)
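Spelled out as arithmetic, a tiny sketch. The 32-wide warp is Nvidia's documented warp size; the warp counts below are made-up illustrations, and `parallel_lanes` is a hypothetical helper name.

```python
WARP_SIZE = 32  # threads per warp on Nvidia GPUs


def parallel_lanes(warps_in_flight: int, simd_width: int = WARP_SIZE) -> int:
    """Threads executing side by side = warps in parallel x SIMD width."""
    if warps_in_flight < 0 or simd_width < 1:
        raise ValueError("need a non-negative warp count and a positive SIMD width")
    return warps_in_flight * simd_width


if __name__ == "__main__":
    for warps in (2, 4):  # assumed counts, matching the "2 or 4" cycling above
        print(f"{warps} warps x {WARP_SIZE} lanes = {parallel_lanes(warps)} threads")
```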
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]