Parallella: an Open Multi-Core CPU Architecture
First time accepted submitter thrae writes "Adapteva has just released the architecture and software reference manuals for their many-core Epiphany processors. Adapteva's goal is to bring massively parallel programming to the masses with a sub-$100 16-core system and a sub-$200 64-core system. The architecture has advantages over GPUs in terms of future scaling and ease of use. Adapteva is planning to make the products open source. Ars Technica has a nice overview of the project."
Comparing it to Pi is a little disingenuous. Reading the copy suggests there is an ARM core, plus some number of co-processors (perhaps like the Cell and its SPEs). That would make it a non-general-purpose processor. To compare apples to apples, we'd have to know how it compares to modern GPUs.
I've got a compute-bound embarrassingly parallel problem at work (real-time image processing in a very compact unit). This bears looking at. What is its I/O potential?
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
It is like saying a bong is going to be used for tobacco. It may be true for some but we all know how it will _really_ be used.
I checked their front page and they have a kickstarter going to fund further development.
Might want to check it out and chip in if you're interested.
http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone
the real question is:
how many double sha256 hashes can they do?
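For anyone unsure what is being counted here: a "double SHA-256" is just SHA-256 applied to the output of SHA-256. A minimal sketch using Python's standard hashlib follows. The helper names and the 80-byte zero payload are made up for illustration, and the timing loop measures whatever machine runs it, not a Parallella.

```python
import hashlib
import time

def double_sha256(data: bytes) -> bytes:
    """SHA-256 of the SHA-256 digest of `data` (32 bytes out)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def hashes_per_second(duration: float = 0.2) -> float:
    """Rough single-threaded rate on the host CPU (hypothetical helper)."""
    payload = bytes(80)  # arbitrary 80-byte placeholder input
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        # Append a changing counter so every hash input is different.
        double_sha256(payload + count.to_bytes(8, "little"))
        count += 1
    return count / (time.perf_counter() - start)

if __name__ == "__main__":
    print(f"{hashes_per_second():.0f} double SHA-256 hashes/s on this machine")
```

Any real answer for the Epiphany cores would need a native implementation running on the hardware itself.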
To make parallel computing ubiquitous, developers need access to a platform that is affordable, open, and easy to use.
They promise the latter three, but "access" seems a bit lacking. Also, they specifically left out performance but talk it up in separate marketing materials (5 watts for 45 GFLOPS, etc.).
Some other alternatives optimizing for local maxima in the solution set:
Just simulate in software, if you don't care about speed but want to learn to program parallel. Erlang? They seem to have a fixation on C, why not use the right tool?
Go to opencores.org and stick a zillion cores on an off-the-shelf FPGA dev board. Or a fat stack of picoblaze or microblaze if you're willing to deal with the annoying licensing hassles (my advice, stick with opencores to avoid legal hassles, the weird licensing for the *blaze family is like the creepy dude in a van offering kids "free" candy)
They seem spread a bit thin based on clicking around the website. They're doing everything but invent hard AI and the warp drive on their website, which is a lot for just 4 people. Their kickstarter seems pretty firmly grounded in comparison.
One of those "infinite spare time" play toys would be to stick a bunch of 6809 cores (or pdp-8s or -11s or Z80s or whatever) on one of my FPGA boards and figure out the glue logic. Anyone with a big enough board could download my VHDL/Verilog and go for it on their own hardware.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
and the architecture is also very limiting.
16TFLOPS for $3000 or 0.09TFLOPS for $200. I'll stick to current hardware, thanks. 178x more processing power for 15x more money. I would also prefer that a "super computer" be able to address more than 4GB of RAM, with more than 64 bits of memory bandwidth. The architecture also limits the core cache to 64k.
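The ratios in that comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic: it uses the figures exactly as quoted in the comment (they are not independently verified), and the variable names are made up for illustration.

```python
# Figures as quoted in the comment above; not independently verified.
gpu_tflops, gpu_price = 16.0, 3000.0   # "16TFLOPS for $3000"
par_tflops, par_price = 0.09, 200.0    # "0.09TFLOPS for $200"

perf_ratio = gpu_tflops / par_tflops   # times more processing power
price_ratio = gpu_price / par_price    # times more money

# Price/performance expressed as GFLOPS per dollar for each option.
gpu_gflops_per_dollar = gpu_tflops * 1000 / gpu_price
par_gflops_per_dollar = par_tflops * 1000 / par_price

print(f"performance ratio: {perf_ratio:.0f}x")
print(f"price ratio:       {price_ratio:.0f}x")
print(f"GFLOPS per dollar: {gpu_gflops_per_dollar:.2f} vs {par_gflops_per_dollar:.2f}")
```

On these numbers the quoted 178x and 15x both check out, which works out to roughly 12x more FLOPS per dollar for the more expensive option. Power draw is not part of this calculation.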
The Parallax Propeller is a great multi-core chip to get started with. The chip is $7.95 and has 8 cores running at 80 MHz. You can pick up the Quickstart board at Radio Shack for $40, including an overpriced RS USB cable (they normally retail for $25).
The Parallax Propeller is a much more economical way of getting started with multi-core programming. Parallax offers the PropTool, which provides SPIN and PASM language support. For C development you can get SimpleIDE which is a great IDE to get started with C programming on the Propeller, which uses a port of GCC.
http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone/posts/323691
They have released their SDK and architecture documentation, worth a read. ...
Looks like an interesting platform, but the current performance does indeed seem lackluster to me.
The multi- and many-core market is about to get crowded. After Tilera (www.tilera.com) there are now Kalray (www.kalray.eu) and the p2012 platform from STMicroelectronics, both of which have produced silicon. And a lot of people are working on research stuff, including open-source projects like soclib (www.soclib.fr). And it's not yet clear who's going to use all these architectures, even though logically this should be the way to go.
If you were to send messages from one Parallella to another Parallella, would they be called Parallellagrams?
How many of these would it take to, say, ray-trace Call of Duty: MW3 in real-time, 60 FPS? Would it cost less than using a modern graphics card to do the usual non-ray-traced rendering? That would be pretty cool.
I always equivocate. Well, almost always.
"The GA144-1.20 chip, with 144 self-contained computers and software-defined I/O, is available in a 1cm x 1cm, 88-pin QFN package." $20 / each, minimum order 10 (as far as I know): http://www.greenarraychips.com/home/products/index.html 200 USD buys you 1440 cores...
Perl Programmer for hire
The masses are just dying for massively parallel systems.
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
Adapteva is creating false expectations here. Their chip won't deliver performance on par with GPUs (or CPUs, for that matter) and still be cheap. Why? Because it's not a thing that a startup can do in today's world of computing. For such a chip you need to use the latest CMOS processes and a huge team to design/optimize the ASIC (especially if it's meant to be a low power chip) -- both of which are extremely costly. If it was that easy, then we'd see more competition and not Intel, AMD, Nvidia and IBM as the only global players in the HPC arena.
If you're a small startup, then you'll be bound to 100nm processes (at best), and have to use automated layouts (not the hand-optimized ones e.g. Intel uses). Both reduce performance, increase power intake.
I work at the Chair for Computer Architecture at FAU. We have some of the very brightest minds working on custom chips for industry solutions. This 2D CPU matrix that Adapteva proposes is something that my colleagues played with years ago. It's a good approach and I personally believe that this will be the shape of CPUs to come. It started with the ring bus on the IBM Cell, now Intel's Nehalem has got a partitioned L3 cache connected with a... ring bus and Intel's Xeon Phi (MIC) even got a 2D on-chip grid network. But even my colleagues concede that a) on FPGAs you'll always be trailing GPUs concerning floating point performance (it's something FPGAs are particularly bad at) and b) even when designing an ASIC you'll always be beaten by GPUs in terms of performance, assuming similar prices and power consumption. Those are simply beasts, optimized down to the bone. It's the result of a multi-billion mass market. That's also the reason why there is no next IBM Cell chip for a PlayStation 4: Cell was too expensive to develop to keep up with the competition. Its market is too small compared to the ubiquitous GPUs.
For teaching parallel computing I'd always suggest a GPU. The tools are there, the performance is great and you'll be able to use the knowledge gained in real-world projects.
Computer simulation made easy -- LibGeoDecomp
1. mine bitcoins on your parallella ....
2. convert bitcoins to USD
3. travel back in time with USD
-1. use USD to fund the kickstarter for parallella
. profit
"allella" sounds so much kewler
Unequivocally the realest of the realz...
The Epiphany core has a mere 35 instructions – yup, that is RISC alright – and the current Epiphany-IV has a dual-issue core with 64 registers and delivers 50 gigaflops per watt. It has one arithmetic logic unit (ALU) and one floating point unit and a 32KB static RAM on the other side of those registers.
Each core also has a router that has four ports that can be extended out to a 64x64 array of cores for a total of 4,096 cores. The currently shipping Epiphany-III chip is implemented in 65 nanometer processes and sports 16 cores, and the Epiphany-IV is implemented in 28 nanometer processes and offers 64 cores.
The secret sauce in the Epiphany design is the memory architecture, which allows any core to access the SRAM of any other core on the die. This SRAM is mapped as a single address space across the cores, greatly simplifying memory management. Each core has a direct memory access (DMA) unit that can prefetch data from external flash memory.
The initial design didn't even have main memory or external peripherals, if you can believe it, and used an LVDS I/O port with 8GB/sec of bandwidth to move data on and off the chip from processors. The 32-bit address space is broken into 4,096 1MB chunks, one potentially for each core that could in theory be crammed onto a single die if process shrinking continues.
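The numbers in the excerpt above line up neatly: 4,096 chunks of 1MB each is exactly 2^32 bytes, and 4,096 is also the core count of a 64x64 mesh. A small sketch of how such an address could be built and taken apart follows. The 6-bit row plus 6-bit column packing of the chunk index is an assumption made for illustration (the quoted text only gives the chunk count and size), and the function names are hypothetical.

```python
# Sketch of the layout described above: a 32-bit address space split into
# 4,096 chunks of 1 MB, one per core of a 64x64 mesh.
# ASSUMPTION: the chunk index is (row, column) packed as 6 + 6 bits, row
# first. The quoted text does not spell out this packing.

CHUNK_BITS = 20               # 1 MB = 2**20 bytes of address space per core
COORD_BITS = 6                # 64 rows and 64 columns = 2**6 each
MESH_DIM = 1 << COORD_BITS

def global_address(row: int, col: int, offset: int) -> int:
    """Build a 32-bit address for `offset` inside core (row, col)'s chunk."""
    if not (0 <= row < MESH_DIM and 0 <= col < MESH_DIM):
        raise ValueError("core coordinates out of range")
    if not 0 <= offset < (1 << CHUNK_BITS):
        raise ValueError("offset does not fit in a 1 MB chunk")
    core_id = (row << COORD_BITS) | col
    return (core_id << CHUNK_BITS) | offset

def split_address(addr: int) -> tuple:
    """Inverse of global_address: return (row, col, offset)."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("not a 32-bit address")
    offset = addr & ((1 << CHUNK_BITS) - 1)
    core_id = addr >> CHUNK_BITS
    return core_id >> COORD_BITS, core_id & (MESH_DIM - 1), offset

# 4,096 chunks of 1 MB cover the whole 32-bit space.
total_bytes = MESH_DIM * MESH_DIM * (1 << CHUNK_BITS)

if __name__ == "__main__":
    addr = global_address(3, 5, 0x1234)
    print(hex(addr), split_address(addr), total_bytes == 2**32)
```

Under a scheme like this, a core reaches another core's SRAM simply by using an address whose upper bits name that core, which is what makes the single shared address space possible.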
seriously, though, what does it run? the article doesn't say except to use the nebulous term "open source". or are they planning on schlepping off the initial software development to the open source community too? (good luck with that)
where is my RapsberyParallela?
OK, I'm giving up my power to moderate on this story to ask a few questions. Let's hope the answers are worth it...
My understanding, so far, for what it's worth, is that the key features of the Epiphany architecture are:
I'm old enough to remember when discussions on Slashdot were well informed.
$99 for a dual core arm dev board, with gigabit network, is not bad, even if you completely ignore the epiphany co-processor. it's about as powerful as a pandaboard, and much cheaper. if you can take advantage of the epiphany you are well ahead.
Besides the open-source, how is this project any different from what Tilera already has? http://www.tilera.com/
When they taped out first silicon last year there was talk of its potential as a game emulator for the PS2 on cell phones.
The thing about CPUs that makes Adapteva's statement not particularly impressive is that in almost all cases, the ISA of a CPU _must_ be published, otherwise you can't get developers to write code for it. But an ISA is just a language, not an implementation. A CPU that is not "open" by their definition is completely worthless.
GPUs are much worse because they've always been peripherals, hidden behind a driver, which is responsible for generating rendering commands from OpenGL and JIT-compiling virtual instruction sets like PTX.
The ISA is in appendix A: http://www.adapteva.com/support/docs/e3-reference-manual/
After looking into this I think it is aimed at robotics/embedded DSP. This won't compete in the GPU space, but then GPUs don't really do much in the embedded space, other than drive a VDP.
To me this looks more like the approach that XMOS took, but with less capability in the I/O ring, and more generic, ANSI C friendly, support. If this had a better path to running as a stand-alone chip with a QFP pinout it would be far more compelling.
It is a shame that they seem to have screwed up their marketing on Kickstarter. If they had generated more buzz early on, they might have made their extended goal.
So far this is the best example of a 2D general purpose computing fabric I have seen that is something other than total vaporware. (they do appear to have x16 chips and demo boards, even if the prices are a little high)
I'm not sure if you're trying to correct me or not. I assumed that the ISA would be published, along with lots of architectural details. I'm just saying that they HAVE to be published or else the CPU is worthless, so by saying that they're published, Adapteva isn't doing anything special.