Intel To Offer Custom Xeons With Embedded FPGAs For the Data Center
MojoKid (1002251) writes For years, we've heard rumors that Intel was building custom chips for Google or Facebook, but these deals have always been assumed to work with standard hardware. Intel might offer a different product SKU with non-standard core counts, or a specific TDP target, or a particular amount of cache — but at the end of the day, these were standard Xeon processors. Today, it looks like that's changing for the first time — Intel is going to start embedding custom FPGAs into its own CPU silicon. The new FPGA-equipped Xeons will occupy precisely the same socket and platform as the standard, non-FPGA Xeons. Nothing will change on the customer front (BIOS updates may be required), but the chips should be drop-in compatible. The company has not stated who provided its integrated FPGA design, but Altera is a safe bet. The two companies have worked together on multiple designs and Altera (which builds FPGAs) is using Intel for its manufacturing. This move should allow Intel to market highly specialized performance hardware to customers willing to pay for it. By using FPGAs to accelerate certain specific types of workloads, Intel Xeon customers can reap higher performance for critical functions without translating the majority of their code to OpenCL or bothering to update it for GPGPU.
Intel Xeon customers can reap higher performance for critical functions without translating the majority of their code to OpenCL or bothering to update it for GPGPU
In other words, to help prevent people from buying AMD and nVidia products.
"By using FPGAs to accelerate certain specific types of workloads, Intel Xeon customers can reap higher performance for critical functions without translating the majority of their code to OpenCL or bothering to update it for GPGPU"
LOL. But they will have to translate it to Verilog or VHDL, which is far harder.
*IF* it's not some lame, slow, tiny array... and if you get full access to it (HDL or something).
---- Booth was a patriot ----
No, ignore that entire last sentence, it's dumb. FPGAs don't do floating point very well for one and even their integer performance will never rival a GPU either in performance, or power. For another, I can, and do, use both FPGAs and OpenCL/GLSL in my daily life and would infinitely prefer to port my functions to OpenCL over an FPGA. It's quite a bit more work to synthesize and validate an FPGA design than it is to write OpenCL code and debug the usual way.
I think it's far more likely customers are implementing custom hardware solutions using the FPGA related to power management, server management and datastructure infrastructure that can only be done with an FPGA in certain power domains. I say this having designed servers and dealt with the feature requests.
By using FPGAs to accelerate certain specific types of workloads, Intel Xeon customers can reap higher performance for critical functions without translating the majority of their code to OpenCL or bothering to update it for GPGPU.
What? This doesn't make sense. Unless Intel invented a way to automatically generate parallel code (in which case it could also be used in GPUs), somebody would have to rewrite the relevant parts of the program in VHDL, Verilog, OpenCL, or whatever.
`echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
As a hardware hacker, god I want one of these. On-chip reprogrammable DSP!? While it's a niche market, I'd love to get my hands on some, and not have to give up my favorite OS or build custom boards to do signal processing.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Dedicated FPGA HFT cards are ridiculously expensive. I wonder how tightly the FPGA will be integrated with PCIe and the Xeon caches.
s/datastructure/datacenter/
caffeine, it's what should have been for breakfast.
So this is a drop-in, pin-compatible replacement for existing Xeons? That means no dedicated I/O and a shared TDP. Seems like this is a potential solution for certain CPU-bound algorithms (crypto comes to mind) but nothing world-shattering. If the cost barrier to entry is low enough this could be a boon for privacy. I wonder what sort of export regulations will be slapped on these devices?
... like hijacking decrypted data in realtime to send to the NSA.
A beer says you can sit this FPGA directly on the memory bus and snoop anything you want, including decrypted data.
High-Frequency Trading
FPGAs don't do floating point very well for one and even their integer performance will never rival a GPU either in performance, or power.
Sure, and a hammer makes a terrible screwdriver. GPUs are specifically designed for register-to-register SIMD operations, so of course they are going to excel at that. But an FPGA is going to be better at bitstream operations, including many hashing and encryption algorithms.
See Stretch's ISEF
This is also to keep Apple happy so they don't start adding coprocessors to their laptops and start moving away from Intel. They've already been adding accelerators to their A7 line.
Which, of course, is why every hashing algorithm used in cryptocurrencies is being attacked with GPUs foremost and CPUs secondmost, rather than a relatively cheap FPGA board.
( With double-SHA256 and limited variants of Scrypt (and announcements for others) being attacked by ASICs - with parallels drawn for some cryptographic functions (e.g. AES) that are available on chips as it is. )
even their integer performance will never rival a GPU either in performance, or power
You're making a sweeping statement while failing to take any notice of the parameters (how many ALUs on the GPU vs how many LUTs on the FPGA) or the constraints (FPGAs can do deep and narrow logic at one result per cycle throughput; ALUs on narrow integer logic are very inefficient). The BTC guys certainly seem to think an FPGA is superior to a GPU at SHA-256 computation, so your "will never rival" is looking a bit "actually it already does, for some applications".
power management, server management and datastructure infrastructure
Anything related to those things can be done with a regular outboard CPU, you don't have the timing constraints to justify an FPGA design.
The likely application for this is fast custom crypto, of course.
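To put rough numbers on the "one result per cycle" point above, here is a minimal sketch of the cycle counts, assuming an idealized fully pipelined datapath versus a unit that must finish every step of one item before starting the next. The depth of 64 is a hypothetical figure (one stage per round of a 64-round hash), and the model deliberately ignores clock-rate differences and the fact that a GPU runs many ALUs in parallel, so treat it as an illustration of the argument rather than a benchmark.

```python
def pipelined_cycles(n_items: int, depth: int) -> int:
    """Cycles for a fully pipelined datapath.

    The pipe takes `depth` cycles to fill; after that one result
    comes out every cycle.
    """
    if n_items == 0:
        return 0
    return depth + n_items - 1


def sequential_cycles(n_items: int, depth: int) -> int:
    """Cycles when each item must finish all `depth` steps before the next starts."""
    return n_items * depth


if __name__ == "__main__":
    depth = 64  # hypothetical: one pipeline stage per hash round
    for n in (1, 100, 1_000_000):
        p = pipelined_cycles(n, depth)
        s = sequential_cycles(n, depth)
        print(f"{n:>9} items: pipelined={p}, sequential={s}, ratio={s / p:.1f}")
```

For a single item the two are identical; the pipelined advantage only approaches a factor of `depth` once the stream is long enough to amortize the fill time.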
FPGAs don't do floating point very well for one and even their integer performance will never rival a GPU either in performance, or power.
Sure, and a hammer makes a terrible screwdriver. GPUs are specifically designed for register-to-register SIMD operations, so of course they are going to excel at that. But an FPGA is going to be better at bitstream operations, including many hashing and encryption algorithms.
This.
FPGAs excel at crypto. They excel at other things too but Xeons in servers have a crypto load beyond that which your lowly desktop PC sees.
Maybe it's good for big data indexing or stuff like that, but I don't know anything about that.
GPUs are primarily being used because everyone has one. FPGAs are about 10x faster and consume about the same amount of power as a GPU for stuff like bitcoin. The problem is the up-front cost and trust issues with the companies that make these pre-built FPGA boards.
Dude, Altera FPGAs already do OpenCL (says me, a guy who helped implement it).
I have a friend who, in graduate school, used a motherboard that could take an Altera FPGA in one of the Xeon sockets. This seems like the next logical step; hopefully it's not too expensive, so that the hardware is accessible to hobbyists/engineers. I am happy that both Xilinx and Altera offer cheap development boards so that we can play with the new offerings. It's easier to convince a boss to use it if we're familiar with it. (hint hint, wry grin)
I use the Zynq processor at my job, and am very happy with the amount of flexibility you can get out of an embedded system having access to the FPGA and processor fabric; you can directly access gigasample ADCs, etc. When I first got into embedded systems on an FPGA, the processor was a soft-IP and not terribly fast. Both Xilinx and Altera now offer ARM processors that run up to 1 GHz. The amount of system flexibility is great. You can make major architectural choices without changing the hardware. You might have a data path or computation that is simply too intensive for a processor to handle; you have the flexibility to port this portion to the logic side. If you're in a rapid prototyping mode and are constrained by board size and mechanical packaging, FPGAs are great.
Debugging SoCs still has its challenges though. It's easy to program FPGAs, and easy to program the microprocessor. The tools from Xilinx or Altera are still a little clunky at handling their hybrid SoC parts. There is still work to be done to make them work more seamlessly.
Think of the fun that could be had with this. Fits in the same socket and does who knows what.
Of course: bitcoin mining uses the SHA256 hash, which is not a typical application for a GPU. It involves a lot of shuffling at the bit level, which a GPU was not designed for. For some other workloads, involving standard math operations or large amounts of memory bandwidth, the GPU will likely be faster.
It may also be Achronix. They also build FPGAs on Intel's process.
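For anyone wondering what "shuffling at the bit level" looks like, here is a small sketch of the 32-bit mixing functions from the SHA-256 round (rotation amounts as given in FIPS 180-4), plus the double-SHA256 that mining uses, built on Python's standard `hashlib`. The helper names are my own. In an FPGA a fixed rotation costs nothing but wiring, while software has to spend shift, OR, and mask instructions on each one; that is the gist of the argument, though the sketch itself measures nothing.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def rotr(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(x: int) -> int:
    """SHA-256 Sigma0: XOR of three fixed rotations."""
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:
    """SHA-256 Sigma1: XOR of three fixed rotations."""
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(x: int, y: int, z: int) -> int:
    """Choose: each bit of x selects the matching bit of y (1) or z (0)."""
    return (x & y) ^ (~x & MASK32 & z)


def maj(x: int, y: int, z: int) -> int:
    """Majority vote per bit position."""
    return (x & y) ^ (x & z) ^ (y & z)


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, as used for Bitcoin block hashing."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


if __name__ == "__main__":
    print(hex(rotr(0x00000001, 1)))
    print(double_sha256(b"hello").hex())
```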
http://www.achronix.com/
What about the RAM / PCIe lanes that are part of the socket?
Maybe in 4-way boards or some 2-socket boards.
2-socket boards can work with only 1 CPU. But what will they do when they see a filled socket but no RAM and no PCIe I/O?
We're talking Google and Facebook here; they are perfectly willing to hire EEs capable of doing very low-level and very high-level VHDL designs, and they use custom motherboard designs already. So they wouldn't really be restricted to software-only uses for the on-die FPGA like packet processing or custom crypto offloading.
These FPGAs could be used for better node interconnects. Getting it done using QPI is insane: the whole box can go unstable, and it becomes a large and unwieldy NUMA box, something neither Google nor Facebook would need or want.
OTOH, give me a large enough FPGA directly wired to the CPU, and I could deploy an extremely low latency RDMA-based interconnect for hardware-assisted MPI that would be MUCH cheaper than deploying Myrinet-like interconnects over the entire DC (development cost is not that high: a two-man team can do it in one year, so it will be around US$ 1M tops, and you pay it only once). This looks like something more interesting for content providers with large DCs like Google and Facebook.
Of course, maybe they just want to run python/java bytecode directly on hardware or something. That'd be lame and terribly easy to implement (and likely not much of a gain over running it JITted in the main CPU).
This explains why Intel was selling fab time to an FPGA vendor.
I don't know what you're talking about here
They're not selling FPGAs in an Intel socket. They're selling Xeon CPUs with integrated FPGAs.
Imagine a Beowulf cluster of these!
Competition Good, Monopoly Bad.
That's great, so they're going to port their code to OpenCL, then run it on your FPGA? Why not just buy a GPU and plug it in?
If they're really set on your FPGA, why not buy a PCIe-attached version of your FPGA? Xilinx has them and they go up to PCIe v3 x8. What about power? Datacenters care, and FPGAs are going to use more power. Why is this a good idea?
Actually FPGAs consume FAR FAR LESS power than a GPU, and do much much more; that is why they are preferred (after ASICs) for Bitcoin mining.
Intel has already come up with an Atom CPU with integrated FPGA, but only for the embedded market.
I'd already been thinking about the possibility of end-user-accessible, on-the-fly-reprogrammable FPGA functionality as part of a "regular" computer before I heard Intel had produced an integrated CPU/FPGA (though it's not clear how easily configurable the FPGA was there). I raised the issue in that previous thread and got a *very* interesting and informative response (thank you Tacvek) that pointed out some major problems with the concept of general access to such functionality.
The issues raised there explain why Intel are unlikely to be making an easily-reconfigurable hybrid product like this available to the general public any time soon, however smart and exciting the idea sounds.
'Cryptocoin mining'.
Are YOU using the TOOL, or is the TOOL using YOU? Think about it!
So, does the FPGA "code" get swapped on a context switch?
Preview, it's what should have come before Submit.
Some workloads perform much better on an FPGA, notably, realtime encoding/compression of HD H.264 video. I know because I've worked on such a broadcast quality encoder [currently being used by some major distribution outlets]. While you're right that it's harder to program an FPGA [in particular, validate the design], the performance gains can be huge. In particular, calculating motion vectors gets a win.
Note that H.264 DCTs are integer ones. And, with Intel's hybrid/on-chip implementation, the FPGA logic could have access to the CPU's SIMD FP hardware. With Intel's hafnium and tri-gate technologies, adding the FPGA won't consume that much additional power.
Also note the benefits for search in an article just published today: http://arstechnica.com/informa...
Like a good neighbor, fsck is there
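For readers who haven't met motion-vector search, here is a minimal sketch of the usual textbook approach: exhaustive block matching with a sum of absolute differences (SAD) cost. This is not the encoder described above, just an assumed illustration; the function names and the tiny 8x8 test frames are hypothetical. Note that it is all integer adds and compares over many independent candidate offsets, which is the kind of work that maps naturally onto parallel hardware.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in `cur` and a shifted block in `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size, search):
    """Try every offset within +/- `search` pixels; return (cost, dx, dy) of the best match."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def make_demo_frames(width=8, height=8, shift_x=2, shift_y=1):
    """Hypothetical test data: `cur` is `ref` sampled at an offset, zero outside the frame."""
    ref = [[y * width + x for x in range(width)] for y in range(height)]
    cur = [
        [
            ref[y + shift_y][x + shift_x]
            if 0 <= y + shift_y < height and 0 <= x + shift_x < width
            else 0
            for x in range(width)
        ]
        for y in range(height)
    ]
    return cur, ref


if __name__ == "__main__":
    cur, ref = make_demo_frames()
    print(best_motion_vector(cur, ref, bx=2, by=2, size=4, search=2))
```

Real encoders avoid the full exhaustive search with hierarchical or early-exit strategies; the brute-force form is shown only because it makes the data parallelism obvious.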
I bet HF trading ends up being a prime market for this technology.
In the course of every project, it will become necessary to shoot the scientists and begin production.
Altera's latest FPGAs have hard floating point. The 14nm ones from Intel's fabs have over 10 Tflops per chip.
I hope they allow custom instructions like the NIOS soft CPU in Altera FPGAs. This lets you create custom HW to implement some function at the ISA level... Altera lets you do single- or multi-cycle return, and auto-generates either an instruction for ASM use or some stubs for C. It really is super useful.
This is likely what Google is buying them for. YouTube.
"Bob, we're glad you Intel boys have finally come around".
"Hi Jeff. Yeah, well we just about got the marketing boys convinced enough to run with it. They managed to find an angle that flies pretty well. Gets us off the hook and gets your boys into a heck of a lot of servers!"
"Hey, it's a win-win so far as I'm concerned. Wish it could have been sooner though, but what with all this pressure from the purse-holders, we couldn't bankroll it for you".
"Times are getting tough, huh? It wasn't that long ago you NSA boys had infinitely deep pockets".
"The times we live in, Bob. What can you do?".
"You need any help with the rootkit code? Some of our wizards are pretty good"
"Thanks, Bob, but we already have it covered".