Adapteva Parallella Supercomputing Boards Start Shipping
hypnosec writes "Adapteva has started shipping its $99 Parallella parallel processing single-board supercomputer to initial Kickstarter backers. Parallella is powered by Adapteva's 16-core and 64-core Epiphany multicore processors, which are meant for parallel computing, unlike other commercial off-the-shelf (COTS) devices like Raspberry Pi that don't support parallel computing natively. The first model to be shipped has the following specifications: a Zynq-7020 dual-core ARM A9 CPU complemented with Epiphany Multicore Accelerator (16 or 64 cores), 1GB RAM, MicroSD Card, two USB 2.0 ports, four optional expansion connectors, Ethernet, and an HDMI port."
They are also releasing documentation, examples, and an SDK (brief overview, it's Free Software too). And the device runs GNU/Linux for the non-parallel parts (Ubuntu is the suggested distribution).
I could buy enough of these to cover the underside of the floor of my house and mine Bitcoins during the winter. Then I get radiant heat and useless fake money (which is probably just NSA's password cracker anyways).
sudo make me a sandwich
The ARM cores serve as a host for the Epiphany cores, roughly similar to the way an x86 CPU serves as a host to your video card. Epiphany is not ARM; it is a chip with a number of 1 GHz RISC cores that all communicate via a network-on-chip. So, it is optimized for doing a lot of floating-point arithmetic at very low power consumption.
I'm skeptical as to how useful this chip will be. High core counts are making supercomputing more and more difficult. Supercomputing isn't about getting massively parallel, but rather high compute performance, memory performance, and interconnect performance. If you can get the same performance out of fewer cores, then there will usually be less stress on interconnects. Parallel computing is a way to get around the limitations on building insanely fast non-parallel computers, not something that's particularly ideal. For things like graphics that are easily parallel, it's not much of a problem, but collective operations on supercomputers with hundreds of thousands to millions of cores are one of the largest bottlenecks in HPC code.
Supercomputers are usually just measured by their floating-point performance, but that's not really what makes a supercomputer a supercomputer. You can get a cluster of computers with high-end graphics cards, but that doesn't make it a supercomputer. Such clusters have a more limited scope than supercomputers due to limited interconnect bandwidth. There was even debate as to how useful GPUs would really be in supercomputers due to memory bandwidth being the most common bottleneck. Supercomputers tend to have things like InfiniBand networking in multidimensional torus configurations. These fast interconnects give the ability to efficiently work on problems that depend on neighboring regions, and are even then a leading bottleneck. When you get to millions of processors, even things like FFT that have, in the past, been sufficiently parallel, start becoming problems.
Things like Parallella could be decent learning tools, but having tons of really weak cores isn't really desirable for most applications.
It has about half the gigaflops of a Core i7, and costs 80% less to buy.
It uses 5-10 watts, whereas the Core i7 uses 100-200 watts with the chipset.
So total cost of ownership is about 90% less than the Core i7. Ten of them would spank the heck out of a Core i7 and cost the same.
> and what can you run on it ?
16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff - things where you run the same function on many pixels / samples / rows. So for face recognition, for example, the image would be broken up into 64 blocks and all of the blocks analyzed simultaneously on the 64 cores.
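A rough sketch of that block decomposition in Python, for anyone who wants to see the shape of it. This is not Epiphany SDK code: the function names are made up for illustration, `mean_brightness` is only a stand-in for whatever per-block analysis you would really run, and a thread pool stands in for the 64 cores. It also assumes the image dimensions divide evenly by the grid size.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, grid=8):
    """Split a 2D list of pixels into grid x grid tiles (8 x 8 = 64 blocks).

    Assumes the image height and width are both divisible by `grid`.
    """
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid, width // grid
    tiles = []
    for r in range(grid):
        for c in range(grid):
            tiles.append([row[c * tile_w:(c + 1) * tile_w]
                          for row in image[r * tile_h:(r + 1) * tile_h]])
    return tiles


def mean_brightness(tile):
    """Hypothetical per-block analysis: average pixel value of one tile."""
    total = sum(sum(row) for row in tile)
    return total / (len(tile) * len(tile[0]))


def analyze(image, grid=8, workers=64):
    """Run the same function on every tile, one task per tile."""
    tiles = split_into_tiles(image, grid)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(mean_brightness, tiles))


# Synthetic 64 x 64 grayscale image, so each of the 64 tiles is 8 x 8 pixels.
image = [[(x + y) % 256 for x in range(64)] for y in range(64)]
results = analyze(image)
```

The point is only that every tile gets the same function and no tile needs data from another, which is what makes this kind of workload a good fit for many small cores.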
A database designed for many cores could work well. For example, say you need to sort a table with 100,000 rows. On a system like this with 64 cores, each core could simultaneously sort a group of about 1,560 rows, then you'd merge those 64 sorted groups together à la merge sort. As a firewall, it could handle a blacklist with a million entries, as each core would simultaneously apply 1/64 of that list.
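Here is a minimal Python sketch of the sort-then-merge idea, under some assumptions: the function names are hypothetical, a thread pool plays the role of the 64 cores (in CPython this shows the structure, not a real speedup), and the final step is a k-way merge via `heapq.merge` rather than anything Epiphany-specific.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, n_chunks):
    """Split rows into at most n_chunks contiguous groups of near-equal size."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def chunked_sort(rows, n_chunks=64):
    """Sort each group independently, then merge the sorted groups."""
    chunks = split_rows(rows, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # k-way merge of the already-sorted groups into one sorted list
    return list(heapq.merge(*sorted_chunks))


random.seed(0)
rows = [random.randrange(1_000_000) for _ in range(100_000)]
chunks = split_rows(rows, 64)   # 64 groups, 1,563 rows each except the last
result = chunked_sort(rows)
```

Note that the merge step here is serial, so it is the part that does not scale with core count; a real many-core design would probably merge pairwise in a tree.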
Yeah, but compare it to a GPGPU and you start to realize how slow it is: a $200 GTX 660 does 1880 GFLOPS in 140W.
1 GFLOPS/$ (Parallella) versus 9.4 GFLOPS/$ (GTX 660)
10 GFLOPS/Watt (Parallella) versus 13.4 GFLOPS/Watt (GTX 660)
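For anyone checking the arithmetic, the GTX 660 figures follow directly from the price, throughput, and power quoted above. A quick Python check, taking the Parallella's 1 GFLOPS/$ and 10 GFLOPS/Watt as given in this comment (they are not derived here, and the variable names are just for illustration):

```python
# GTX 660 numbers as quoted in the comment
gtx_price_usd = 200
gtx_gflops = 1880
gtx_watts = 140

gtx_per_dollar = gtx_gflops / gtx_price_usd   # GFLOPS per dollar
gtx_per_watt = gtx_gflops / gtx_watts         # GFLOPS per watt

# Parallella ratios taken as stated in the comment, not recomputed
parallella_per_dollar = 1.0
parallella_per_watt = 10.0

# How far ahead the GPU is on each metric
dollar_advantage = gtx_per_dollar / parallella_per_dollar
watt_advantage = gtx_per_watt / parallella_per_watt

print(f"GTX 660: {gtx_per_dollar:.1f} GFLOPS/$, {gtx_per_watt:.1f} GFLOPS/W")
print(f"GPU advantage: {dollar_advantage:.1f}x per dollar, "
      f"{watt_advantage:.2f}x per watt")
```

So on these numbers the GPU is far ahead per dollar but only about a third better per watt, and that leaves out the host PC the GPU needs.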
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.