Adapteva Parallella Supercomputing Boards Start Shipping

MAME? BitCoin? by Anonymous Coward · 2013-07-23 07:57 · Score: 1

The first comment to mention MAME or BitCoin wins.

Re:MAME? BitCoin? by Anonymous Coward · 2013-07-23 08:01 · Score: 0

We have a winner!

Not fully open source by Anonymous Coward · 2013-07-23 07:59 · Score: 0, Interesting

A lot of things are "open" and "free" but... creating FPGA bitstreams requires proprietary software from Xilinx, and it's arguably one of the worst pieces of crappy bloatware ever...
Good luck guys.

Re:Not fully open source by t4ng* · 2013-07-23 08:23 · Score: 1

Help me out here. The Adapteva sales pitch is claiming you get faster time to market by not having to do any FPGA programming (ANSI-C and OpenCL for the multicore coprocessors). The Zynq processor seems to be just for the host OS, which they say can run Ubuntu out of the box and they provide open source development tools for everything else. No mention of Xilinx anywhere that I can see. Am I missing something?
Re:Not fully open source by gl4ss · 2013-07-23 08:31 · Score: 2

> Help me out here. The Adapteva sales pitch is claiming you get faster time to market by not having to do any FPGA programming (ANSI-C and OpenCL for the multicore coprocessors). The Zynq processor seems to be just for the host OS, which they say can run Ubuntu out of the box and they provide open source development tools for everything else. No mention of Xilinx anywhere that I can see. Am I missing something?
he was probably confusing this with http://www.kickstarter.com/projects/1106670630/mojo-digital-design-for-the-hobbyist
which pretty much means he didn't read even half of TFS.

--
world was created 5 seconds before this post as it is.
Re:Not fully open source by Sponge+Bath · 2013-07-23 08:34 · Score: 2

Looking at the FPGA code, it targets Xilinx devices. The OP points out using proprietary (but free) tools from Xilinx makes it "not open", I guess in the same way that the chips used on the card are "not open". I think it misses the point, but whatever.
Re:Not fully open source by hamster_nz · 2013-07-23 09:45 · Score: 2

It is a shame that you posted as an anonymous coward here. I'd love to understand your thinking on this. As far as I see it, this is a win as the source code for the FPGA logic will be open, making this much like using Visual Studio to build another Open Source project - hardly an Open Source fail.
I would also like to know if you run on Sparc CPUs as they are "open" (with published HDL source), rather than on Intel or ARM? If not, how can you defend that your favourite Open Source project (say Apache) running on Linux on an Intel system board is more "Open Source" than this? Do you have the source code for your MoBo's chipset?
You will with the Parallella...
Re:Not fully open source by Gravis+Zero · 2013-07-23 09:55 · Score: 2

the FPGA is the host for the CPU and handles communications with the Epiphany processor, so you never need to change the FPGA at all. it's the Epiphany processor that you are developing for, not an FPGA. the functionality of the FPGA is open, so you could use it just like any other IC if you really wanted.

--
Anons need not reply. Questions end with a question mark.
Re:Not fully open source by wiredlogic · 2013-07-23 09:59 · Score: 2

Proprietary software which can be used for free with very reasonable size and device limitations. Plus if you don't like the GUI you can always run the traditional command line tools to build a bitstream if you want.

--
I am becoming gerund, destroyer of verbs.
Re:Not fully open source by Sponge+Bath · 2013-07-23 11:47 · Score: 2

> manufacturing of physical goods can still be paid
How magnanimous of you.

> In other words: You deal with organized crime.
By your standards, 100% of the electronics, computer and software industry is organized crime. That may stroke your ideological fervor, but it's of little practical value. Even Linus Torvalds uses a machine where less than 100% of the IP for all parts, software and manufacturing equipment is open. I'll happily continue using devices, participating in that industry and earning a living. I'm wondering how you made that post while avoiding any contact with the product of, as you label it, organized crime.
Re:Not fully open source by Khyber · 2013-07-23 16:58 · Score: 1

"By your standards, 100% of the electronics, computer and software industry is organized crime."
Haven't been paying attention to the Panasonic case, I see.
Every one of these companies is colluding and conspiring. This time, one got caught.
Welcome to reality, child.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Re:Not fully open source by Khyber · 2013-07-23 17:00 · Score: 1

"you never need to change the FPGA at all."
Until you want to be able to handle the bandwidth of a huge parallel processing unit.
And a typical FPGA will struggle with more than 8 TRUE cores currently. We've tested it. It would not work for our requirements, it was insufficient in bandwidth department.

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Re:Not fully open source by Anonymous Coward · 2013-07-23 22:24 · Score: 0

Actually 95% doing it, 4.99% supporting it even though they say they are against it, but YES!
THAT IS MY EXACT POINT.
And it has nothing to do with ideology, but with pure and simple physics. The laws of nature. Resources. Natural selection.
Somebody who takes money, but gives you nothing (but smoke and mirrors) in return, is a THIEF!
A thief is a criminal.
If it is done in a company, it is organized.
So it is organized crime.
QED
Of course, being an Ameritard, you neither have the brain power, nor the freedom from socially conditioned delusions, to comprehend this. it's like crossing a North Korean with a very primitive monkey, and spicing it with lots of asshole, and then expecting the result to accept that Kim Yong Whoever is not God.
Re:Not fully open source by Gravis+Zero · 2013-07-24 07:02 · Score: 1

you seem to be confusing the Epiphany chip (silicon) with the Zynq (FPGA+ARM) host chip or something. the Epiphany chips contain 16 or 64 "true" cores and the chips connect directly together.
what were you talking about?

--
Anons need not reply. Questions end with a question mark.

fail even at advertising! by Anonymous Coward · 2013-07-23 08:02 · Score: 1

If all you are gonna do is advertise, at least do it right!

There is no micro SD included by default and the connectors are micro USB and micro HDMI. Big fail!

Re:fail even at advertising! by maliqua · 2013-07-23 09:17 · Score: 1

Damn them for making it use the same connectors and including the same standard equipment in the base package as every other similar product...

cool - but I7 can out run it ? by Anonymous Coward · 2013-07-23 08:05 · Score: 0

So how fast is this cluster compared to an Intel i7?
And what can you run on it?

Do I have to be the first one? by Anonymous Coward · 2013-07-23 08:09 · Score: 2, Funny

Very well:

Imagine a Beowulf Cluster of these!

Re:Do I have to be the first one? by Anonymous Coward · 2013-07-23 08:47 · Score: 0

> Very well:
> Imagine a Beowulf Cluster of these!
I did...
But does it run Linux?

Here's a thought by Sparticus789 · 2013-07-23 08:10 · Score: 4, Funny

I could buy enough of these to cover the underside of the floor of my house and mine Bitcoins during the winter. Then I get radiant heat and useless fake money (which is probably just NSA's password cracker anyways).

--
sudo make me a sandwich

Re:Here's a thought by Anonymous Coward · 2013-07-23 09:57 · Score: 1

I literally don't even use heating. My heating is computer based.
I like to think of it as Efficient Heating.
Re:Here's a thought by TeknoHog · 2013-07-23 20:13 · Score: 2
- If you're going to buy hardware for Bitcoin mining, there are much, much more efficient alternatives, and they still produce plenty of heat.
- I think USD is useless fake money, because I cannot use it locally, but there are other places where you can use it to buy plenty of stuff, and then there are exchanges.
- The software is open source so you can see if it's cracking passwords for yourself, instead of randomly guessing.
--
Escher was the first MC and Giger invented the HR department.
Re:Here's a thought by Sparticus789 · 2013-07-24 01:50 · Score: 1

I may be able to line the bottom of my floor with GPUs, connected via custom PCI extension cables into a large (really large) chassis. But if I take numbers into account, I have about a 1,000 square foot house. Let's say an average sized GPU is 4" x 12" (just for round numbers). And let's assume that I place 2 GPUs per square foot. That comes out to 2,000 GPUs, and a lot of money.
Think I will stick to wearing slippers in the winter.

--
sudo make me a sandwich

Tiny but useful? by colin_faber · 2013-07-23 08:12 · Score: 1

So it's interesting, a light weight ARM processor, without anything better than micro USB and micro HDMI. Neat yes, but really? Useful? Maybe as a wireless router, or some other PoE like device but as a useful processing system? Um...

Even linking many of these together - neat, but again, the world of MPI is based on completely different processor designs and interconnects. You're talking about a huge amount of time and effort to replicate something on a unique platform which may or may not ever see widespread acceptance by the developer base.

Re:Tiny but useful? by jvangeld · 2013-07-23 08:23 · Score: 4, Informative

The ARM cores serve as a host for the Epiphany cores, roughly similar to the way an X86 CPU serves as a host to your video card. Epiphany is not ARM, it is a chip with a number of 1 GHz RISC cores that all communicate via a network-on-chip. So, it is optimized for doing a lot of floating-point arithmetic at very low power consumption.
Re:Tiny but useful? by Anonymous Coward · 2013-07-23 08:26 · Score: 0

What are the alternatives to the Parallella? Assuming I'd like to play around with parallel computing with my own hardware, what competitors exist?
Re:Tiny but useful? by rthille · 2013-07-23 09:20 · Score: 1

Your video card, assuming you've got a fairly modern one which supports the various GPGPU programming models.

--
Awesome furniture, accessories and cabinetry in Santa Rosa, CA: http://humanity-home.com/
Re:Tiny but useful? by clong83 · 2013-07-23 12:22 · Score: 1

There is a high barrier to entry for piddling around with graphic cards. Fortunately, most home computers are already parallel (2-8 cores). I do extensive parallel programming, and I do most of the testing (for small problems, anyhow) on my desktop or laptop, which each have 8 cores.

There is simply a set of "parallel" function calls which can be built directly into your code. You then just need to compile your code with the proper libraries, usually either mpich or OpenMP. I believe both are available in the Ubuntu repository. Pick the one that is most promising for your problem. They are fundamentally two distinct approaches to parallelism, each useful at times. Lawrence Livermore has a great tutorial site, including loads of example Fortran and C codes for both OpenMP and mpich. They are ready to compile and run on your home computer. Happy computing!

Tutorial: https://computing.llnl.gov/tutorials/mpi/
Example codes: https://computing.llnl.gov/tutorials/mpi/exercise.html#Exercise1
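
As a rough illustration of the "two distinct approaches" mentioned above, here is a toy sketch in plain Python. It is neither MPI nor OpenMP, and the function names are made up for this example: one version has workers write into a shared array (the shared-memory style that OpenMP follows), the other has workers that share nothing and send their results back as messages (the style MPI follows). Threads merely stand in for cores here.

```python
import queue
import threading

DATA = list(range(1, 101))
WORKERS = 4
# Split the data into one strided chunk per worker.
CHUNKS = [DATA[i::WORKERS] for i in range(WORKERS)]


def shared_memory_sum():
    """Shared-memory style: each worker writes into its own slot of one shared list."""
    partial = [0] * WORKERS

    def work(rank):
        partial[rank] = sum(CHUNKS[rank])

    threads = [threading.Thread(target=work, args=(r,)) for r in range(WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def message_passing_sum():
    """Message-passing style: workers only communicate by sending results to an inbox."""
    inbox = queue.Queue()

    def work(chunk):
        inbox.put(sum(chunk))

    threads = [threading.Thread(target=work, args=(c,)) for c in CHUNKS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect exactly one message per worker.
    return sum(inbox.get() for _ in range(WORKERS))


print(shared_memory_sum(), message_passing_sum())
```

Both versions compute the same total; what differs is whether the workers touch common memory or only exchange messages, which is the choice to make when picking between the two libraries.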
Re:Tiny but useful? by clong83 · 2013-07-23 12:25 · Score: 1

Sorry, forgot to also post link to the OpenMP tutorial:
https://computing.llnl.gov/tutorials/openMP/

Real world use? by kwerle · 2013-07-23 08:16 · Score: 2

Anyone out there in /.-land plan on getting these for a real project?

Tell us about it! What language/OS/purpose?

Just curious...

Parallel is not necessarily better by IAmR007 · 2013-07-23 08:33 · Score: 5, Insightful

I'm skeptical as to how useful this chip will be. High core counts are making supercomputing more and more difficult. Supercomputing isn't about getting massively parallel, but rather high compute performance, memory performance, and interconnect performance. If you can get the same performance out of fewer cores, then there will usually be less stress on interconnects. Parallel computing is a way to get around the limitations on building insanely fast non-parallel computers, not something that's particularly ideal. For things like graphics that are easily parallel, it's not much of a problem, but collective operations on supercomputers with hundreds of thousands to millions of cores are one of the largest bottlenecks in HPC code.

Supercomputers are usually just measured by their floating point performance, but that's not really what makes a supercomputer a supercomputer. You can get a cluster of computers with high end graphics cards, but that doesn't make it a supercomputer. Such clusters have a more limited scope than supercomputers due to limited interconnect bandwidth. There was even debate as to how useful GPUs would really be in supercomputers due to memory bandwidth being the most common bottleneck. Supercomputers tend to have things like Infiniband networking in multidimensional torus configurations. These fast interconnects give the ability to efficiently work on problems that depend on neighboring regions, and are even then a leading bottleneck. When you get to millions of processors, even things like FFT that have, in the past, been sufficiently parallel, start becoming problems.

Things like Parallella could be decent learning tools, but having tons of really weak cores isn't really desirable for most applications.

Re:Parallel is not necessarily better by neonsignal · 2013-07-23 08:53 · Score: 2

But indeed, it is the learning experience that is required, because cores are not getting particularly faster, and we are going to have to come to grips with how to parallelize much of our computing. The individual cores in this project may not be particularly powerful, but they aren't really weak either; the total compute power of this board is more than you are going to get out of your latest Intel processor, and uses a whole lot less power. Yes, it isn't ideal given our current algorithms and ways of writing programs, but massive parallelism is at the centre of performance computing, and will be for the foreseeable future.
Re:Parallel is not necessarily better by Anonymous Coward · 2013-07-23 08:53 · Score: 0

> Things like Parallella could be decent learning tools,
It's a good thing that's exactly what it is intended to be then.
Re:Parallel is not necessarily better by UnknownSoldier · 2013-07-23 09:00 · Score: 1

Very *nice* comment -- spot on.
Only other thing to mention is that supercomputing trades latency for bandwidth, i.e. high latency but vastly higher bandwidth.
Intel does a great job of masking latency on x86 so we get "relatively" low latency for memory but its bandwidth is crap compared to a "real" supercomputer or GPGPU.
Re:Parallel is not necessarily better by dargaud · 2013-07-23 09:13 · Score: 1

So how does this compare to a, say, Xeon Phi ?

--
Non-Linux Penguins ?
Re:Parallel is not necessarily better by IAmR007 · 2013-07-23 09:19 · Score: 1

Well, x86 CPUs are designed to do a hell of a lot more than compute. Their advanced caches and other complex features take a lot of die area but make them well suited for general computing and complex algorithms.

You are right that our current algorithms will have to change. That's one of the major problems in exascale research. Even debugging is changing, too, with many more visual hints to sort through millions of logs. Algorithms may start becoming non-deterministic to reduce the need to communicate, for example. Of course, I'm referring to millions of cores, here. Desktop applications using a few cores is a much simpler task, but still an area that a lot of developers lack good training in. At least the methods have been largely figured out for things at the consumer and server level.
Re:Parallel is not necessarily better by ShieldW0lf · 2013-07-23 09:25 · Score: 4, Insightful

This device in particular only has 16 or 64 cores, but the Epiphany processor apparently scales up to 4,096 processors on a single chip. And, the board itself is open source.
So, if you developed software that needed more grunt than these boards provide, you could pay to get it made for you quite easily.
That's a big advantage right there.

--
-1 Uncomfortable Truth
Re:Parallel is not necessarily better by Gravis+Zero · 2013-07-23 09:47 · Score: 1

> Parallel computing is a way to get around the limitations on building insanely fast non-parallel computers
by limitations, i'm assuming you mean the laws of physics.

> Parallel computing is ... not something that's particularly ideal
it's merely a new paradigm in order to continue processing data faster and it won't be the last.

> High core counts are making supercomputing more and more difficult. Supercomputing isn't about getting massively parallel ...
> collective operations on supercomputers with hundreds of thousands to millions of cores are one of the largest bottlenecks in HPC code.
the Epiphany architecture is currently limited to 4096 interconnected cores because all the registers and memory (RAM) are memory mapped and the address space is limited. so if you are using 64 core chips it's an 8x8 grid of chips.

> Supercomputing isn't about getting massively parallel, but rather high compute performance, memory performance, and interconnect performance. If you can get the same performance out of fewer cores, then there will usually be less stress on interconnects.
communication between cores is actually quite fast, 102 GB/s Network-On-Chip and 6.4 GB/s Off-Chip Bandwidth. so for 4096 cores, memory bandwidth is not a problem. the RAM and DMA communication system is actually separate from the cores, so you can pool your memory automagically and not see any slowdown. you don't miss any cycles waiting for communications.
the roadmap on their site shows that they expect to have 64000 cores per chip in 2018, so that's going to be interesting.

> Supercomputers are usually just measured by their floating point performance
umm... got a source for that claim?

> having tons of really weak cores isn't really desirable for most applications.
each core is 1GHz (on the 16 core chip) and 800MHz (on the 64 core chip) with 32KB (RAM) each which as i said before can be pooled without penalty. those aren't what i would call weak cores.
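
The 4096-core figure and the 8x8 arrangement can be checked with a little arithmetic. The sketch below is in Python and rests on an assumption that is not stated in the thread: a flat 32-bit address space in which the upper 12 bits select a core. The 64 cores per chip and 32KB per core come from the comment itself.

```python
import math

# Assumptions (not stated in the thread): a flat 32-bit address space in which
# the upper 12 bits pick a core and the remaining bits address within that core.
ADDRESS_BITS = 32
CORE_ID_BITS = 12

max_cores = 2 ** CORE_ID_BITS                       # cores that can be addressed
window_bytes = 2 ** (ADDRESS_BITS - CORE_ID_BITS)   # address window per core

CORES_PER_CHIP = 64                                 # figure from the comment
chips = max_cores // CORES_PER_CHIP                 # chips needed to reach the limit
grid_side = math.isqrt(chips)                       # chips along one edge of a square grid

LOCAL_RAM_KB = 32                                   # per-core RAM, from the comment
pooled_mb = max_cores * LOCAL_RAM_KB // 1024        # total RAM if every core's memory is pooled

print(max_cores, chips, f"{grid_side}x{grid_side}", pooled_mb)
# prints: 4096 64 8x8 128
```

Under those assumptions, a fully populated system would pool 128 MB of on-chip RAM across 64 chips.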

--
Anons need not reply. Questions end with a question mark.
Re:Parallel is not necessarily better by Anonymous Coward · 2013-07-23 10:35 · Score: 0

I think it's a little disingenuous of them to promote this thing as a supercomputer when the near-term applications of this tech are more in the DSP space. There already are multicore DSPs, just none with this many cores; as you say, parallel is not inherently superior.
Perhaps they realized the difficulty of going directly up against Texas Instruments and Analog Devices in this space.
Re:Parallel is not necessarily better by Anonymous Coward · 2013-07-23 10:51 · Score: 0

I think you're barking up the wrong tree here. I don't think that this was ever meant for high end computing. Most of these types of things (cheap small boards like this) aren't meant or designed for high end applications/systems. They're meant for hobbyists, students, etc etc.
Of course, machines being parallel isn't "good," in the sense that they're harder to program and you introduce more race conditions, more likely to have transmission errors with comm between computers etc etc, but we've hit the frequency wall. I mean, I'd love a 20GHz cpu, but I also don't want a nuclear reactor size cooling system for my computer. Plus the memory subsystem would be crying the whole time too. So the only thing we currently can do is add parallelism to our computers and programs if we want to run them faster. There was a paper (I forget the title/author, I can almost feel my advisor sighing) that was about a study on a fast serial core vs many parallel cores, it basically said that either was optimal for about 5% of the cases. So serial doesn't really cut it most of the time. Parallel cores and accelerators are how we get meaningful speed ups most of the time now. Sure, it introduces problems, but what doesn't?
Generally there are four main issues in HPC: I/O and memory, resilience, programmer productivity and power. Actual computation is cheap. I/O (memory is I/O so..) is just one of the problems. When you have these massive programs, all sort of problems come up and there isn't an easy fix for it, whether it's serial or parallel.
I want one though, if anything it's a cheap FPGA platform~
Re:Parallel is not necessarily better by phriot · 2013-07-23 12:01 · Score: 1

The stated goal of the project is to offer affordable access to tools for learning how to do parallel programming. At $99 for the board and very low power consumption, I would think that this makes learning easier than building your own cluster, no?
Re:Parallel is not necessarily better by dbIII · 2013-07-23 17:00 · Score: 1

> Supercomputing isn't about getting massively parallel
A lot of it is even if it isn't all like that. For instance a lot of seismic data processing is about applying the same filter to tens of millions of traces, which are effectively digitised audio tracks. Such a task can be divided up to a resolution of a single trace or whatever arbitrary number makes sense given available hardware.

So even if it "isn't really desirable for most applications" there are still plenty of others where it is desirable.
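
The seismic case above is the classic "same function over many independent items" pattern. A minimal sketch in Python follows, with a simple moving average standing in for a real seismic filter (the filter, the names and the toy data are all assumptions for illustration). Threads stand in for cores here; on real hardware a process pool, a cluster scheduler or the coprocessor cores would take their place.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(trace, width=3):
    """Moving-average filter over one trace (a stand-in for a real seismic filter)."""
    half = width // 2
    out = []
    for i in range(len(trace)):
        # The window shrinks at the edges of the trace.
        window = trace[max(0, i - half):i + half + 1]
        out.append(sum(window) / len(window))
    return out


def filter_traces(traces, workers=8):
    """Apply the same filter to every trace; traces never depend on each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with the inputs.
        return list(pool.map(smooth, traces))


# Toy data: 100 short traces instead of tens of millions of long ones.
traces = [[float((i * j) % 7) for j in range(16)] for i in range(100)]
filtered = filter_traces(traces)
print(len(filtered), smooth([0.0, 3.0, 6.0]))
```

Because no trace needs data from any other, the work can be cut at whatever granularity suits the hardware, down to a single trace per worker.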
Re:Parallel is not necessarily better by dbIII · 2013-07-23 17:06 · Score: 1

> by limitations, i'm assuming you mean the laws of physics
It's still within the realms of manufacturing constraints at this point. A co-worker was making diodes a couple of atomic layers thick before 2000 but making a circuit at that scale in 2D is going to take a lot more work.
Re:Parallel is not necessarily better by AmiMoJo · 2013-07-24 00:17 · Score: 1

So it's an evaluation board that may lead you to contract them for a larger, as yet undeveloped device? That's fine, but isn't really selling it to us.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Parallel is not necessarily better by Anonymous Coward · 2013-07-24 03:06 · Score: 0

http://www.top500.org/list/2013/06/
Top 100 fastest supercomputers in the world include 6 that actually do use Ethernet. Two are using gigabit Ethernet (the other 4 use 10G). Modern super computers do have interconnect issues, but part of that is because a typical system has >10000 cores for general compute plus specialized compute cores (possibly GPU) on top of that. That is beyond what a Linksys can do.

Can You Imagine a Beowulf Cluster of These? by Anonymous Coward · 2013-07-23 08:47 · Score: 0

One of the popular comments on Slashdot is now relevant.

half the Gflops, 64 cores, 80% lower cost, 5 watts by raymorris · 2013-07-23 08:55 · Score: 3, Informative

It has about half the gigaflops of a Core i7, and costs 80% less to buy.
It uses 5-10 watts, whereas the Core i7 uses 100 - 200 watts, with the chipset.
So total cost of ownership is about 90% less than the Core i7. Ten of them would spank the heck out of a Core i7 and cost the same.

> and what can you run on it ?

16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff - things where you run the same function on many pixels / samples / rows. So for face recognition, for example, the image would be broken up into 64 blocks and all of the blocks analyzed simultaneously on the 64 cores.
A database designed for the many cores could work well. For example, say you need to sort a table with 100,000 rows. On a system like this with 64 cores, each core could simultaneously sort a group of about 1,500 rows, then you'd merge those 64 sorted groups together a la merge sort. As a firewall, it could handle a blacklist with a million entries, as each core would simultaneously apply 1/64 of that list.
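
The sort-then-merge idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `chunked_sort` is a made-up name, and threads stand in for the 64 cores, so it shows the structure of the algorithm rather than any real speedup.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def chunked_sort(rows, workers=64):
    """Sort rows by sorting up to `workers` slices independently, then k-way merging."""
    if not rows:
        return []
    size = -(-len(rows) // workers)  # ceiling division: rows per worker
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    # Each chunk is sorted on its own; no chunk needs data from another.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_chunks))


random.seed(0)
table = [random.randrange(1_000_000) for _ in range(100_000)]
result = chunked_sort(table)
print(result == sorted(table))
```

With 100,000 rows and 64 workers each slice holds at most 1,563 rows, close to the rough figure in the comment. The final merge is sequential, so it is the part that limits how far this scales.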

tis already a cluster - 64 cores by raymorris · 2013-07-23 08:57 · Score: 2

With 64 cores, I'd say it's already a cluster. A dozen of these ($1200) would have 768 cores and fit in a microatx case. :)

not a FRICKING supercomputer! by markhahn · 2013-07-23 09:06 · Score: 1

where do people get their definition of supercomputer? a supercomputer is what you have when your compute needs are so large that they shape the hardware, network, building, power bill. this thing is just a smallish multicore chip, like many others (now and in the past!)

Re:not a FRICKING supercomputer! by Anonymous Coward · 2013-07-23 10:13 · Score: 0

'Supercomputer' that's about 1/2 the performance of a single mid-end i7 chip. trollolol.
Re:not a FRICKING supercomputer! by Maximum+Prophet · 2013-07-23 11:28 · Score: 1

> where do people get their definition of supercomputer?

From the 1960's. The CDC 6000's designed by Seymour Cray were the first "Super Computers". Each "Core" had about 30 mips.

--
All ideas^H^H^H^H^Hprocesses in this post are Patent Pending. (as well as the process of patenting all postings)

Doesn't appear to be cost-effective by Hentes · 2013-07-23 09:11 · Score: 1

This thing is promised to do 90Gflops and costs 100$. A HD7870 can do 2500Gflops for 300$. Sure, you need to build a rig around it, but you'll still be way better off than soldering together a tower of 25 of these boards.

Re:Doesn't appear to be cost-effective by Anonymous Coward · 2013-07-23 09:23 · Score: 0

so we're assuming power is free in this scenario and cooling isn't an issue.
Re:Doesn't appear to be cost-effective by jon3k · 2013-07-23 09:47 · Score: 1

Good point really, what's the watt:gflop ratio for those two scenarios? I'm assuming we can get at least, what, 3 or 4 AMD cards in a server?
Re:Doesn't appear to be cost-effective by citizenr · 2013-07-23 11:09 · Score: 2

now mount that HD7870 inside RC plane, or a quad drone
the closest you can get is mali t604 doing 68 GFLOPS or mali t658 at 272 GFLOPS (theoretical numbers, but everyone including amd uses those)

--
Who logs in to gdm? Not I, said the duck.
Re:Doesn't appear to be cost-effective by Mabhatter · 2013-07-23 11:48 · Score: 2

bingo. if you've seen some of the crazy acrobatic stuff being done with quad copters over on TED that is using several remote PCs and remote control. The programming could probably all be packed into one of these boards and built right into each copter.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by afidel · 2013-07-23 09:31 · Score: 4, Informative

Yeah but compare it to a GPGPU and you start to realize how slow it is, a $200 660 GTX does 1880 GFLOPS in 140W.

1 GFLOPS/$ versus 9.4 GFLOPS/$
10 GFLOPS/Watt versus 13.4 GFLOPS/Watt
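
For anyone who wants to redo the ratios with their own numbers, a small Python sketch follows. The GTX 660 inputs are the ones given above. The Parallella inputs are not stated in this comment, so roughly 100 GFLOPS, $100 and 10 W are assumed here because they reproduce the rounded ratios.

```python
def efficiency(gflops, dollars, watts):
    """Return (GFLOPS per dollar, GFLOPS per watt)."""
    return gflops / dollars, gflops / watts


# Assumed Parallella inputs (not stated in the comment): ~100 GFLOPS, $100, 10 W.
parallella = efficiency(gflops=100, dollars=100, watts=10)
# GTX 660 inputs as given in the comment: 1880 GFLOPS, $200, 140 W.
gtx660 = efficiency(gflops=1880, dollars=200, watts=140)

for name, (per_dollar, per_watt) in [("Parallella", parallella), ("GTX 660", gtx660)]:
    print(f"{name}: {per_dollar:.1f} GFLOPS/$, {per_watt:.1f} GFLOPS/W")
```

On these inputs the gap is large per dollar and much smaller per watt, which is the point of the comparison.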

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by AmiMoJo · 2013-07-23 09:46 · Score: 2

> 16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff

Low end ARM cores do that already in a low cost, low power package. I really can't see how this device would be economic for any of those things - even if you need to do facial recognition on multiple image streams at once low cost ARM cores will be cheaper. You also have the difficulty of interfacing so many video streams to a single parallel processing device; it would be easier to have lots of smaller devices.

> As a firewall, it could handle a blacklist with a million entries

Again, current ARM based routers can handle such lists. IP address lists or simple URL lists with a few wildcards are no problem. I suppose if you wanted a million complex regex rules then having 64 cores would help, but if you do have such a list you need to write better regular expressions.

Low end servers are about the only application where this makes sense, and even then the added cost of having to write software specifically for these cores probably outweighs any power/performance gains over ARM and you still have the I/O issues I mentioned earlier.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC

In the purist terms by maliqua · 2013-07-23 09:48 · Score: 1

A super computer is a system that has multiple processors functioning in parallel, be it many individual machines networked together, a single processor with several cores, etc.

The term supercomputer is a very old one, from back before you could even fathom purchasing a machine capable of housing multiple CPUs, well, unless you were a university or a very well funded trust fund geek.

by the original definition most of our phones are super computers

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Anonymous Coward · 2013-07-23 09:52 · Score: 0

> It has about half the gigaflops of a Core i7, and costs 80% less to buy.
> It uses 5-10 watts, whereas the Core i7 uses 100 - 200 watts, with the chipset.

And about 1/100th as useful.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Shinobi · 2013-07-23 09:58 · Score: 0

"It uses 5-10 watts, whereas the Core i7 uses 100 - 200 watts, with the chipset."

Wrong. Just so wrong. An i7-3770k, with a Radeon or Nvidia GPU drawing the desktop, running disks etc, while running a CPU heavy load, will draw 124 Watts, measured at the wall socket... Let's just say that if you subtract the GPU etc, you're down a significant chunk.

does for video what ARM does for photo by raymorris · 2013-07-23 10:41 · Score: 1

To take this one example, suppose the ARM processor can do face recognition of a certain quality on a photo. Suppose it takes 1/4 of a second to process the image with some level of reliability. Since this device can process 64 frames simultaneously, it can do the same recognition on video that the ARM could do on a photo.
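
Put as arithmetic, the argument above is about throughput rather than latency. A minimal Python sketch, assuming (as the comment supposes, though nothing in the thread establishes it) that each of the 64 cores needs the same 1/4 second per frame as the ARM:

```python
def video_throughput(cores, seconds_per_frame):
    """Frames per second when each core works on its own frame in parallel."""
    return cores / seconds_per_frame


# Assumption from the comment: every core takes 0.25 s per frame.
fps = video_throughput(cores=64, seconds_per_frame=0.25)
single = video_throughput(cores=1, seconds_per_frame=0.25)

print(fps, single)  # prints: 256.0 4.0
```

Any one frame still takes a quarter of a second to come back; the 64 cores only raise how many frames finish per second.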

Re:tis already a cluster - 64 cores by Jane+Q.+Public · 2013-07-23 10:50 · Score: 1

"With 64 cores, I'd say it's already a cluster. A dozen of these ($1200) would have 768 cores and fit in a microatx case. :)"

But what about performance? For example, how does it perform at parallel integer math (arguably the most common use for these things), as compared to a top-line, price-comparable GPU card?

That's what I want to know. I didn't search for a long time, but I didn't find info on that.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by citizenr · 2013-07-23 11:00 · Score: 2

The only problem is you can't run a GPU standalone.
There was one project by someone who reverse engineered old Radeon HD2400
http://www.edaboard.com/thread236934.html
http://www.flickr.com/photos/73923873@N05/sets/72157631771354007/
but that guy deleted his git repo before publishing the news blurb and some photos and they quickly shut up about it.

I would love to be able to use GPU cards standalone for Vision projects, or just as OpenCL accelerators for embedded systems.

--
Who logs in to gdm? Not I, said the duck.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by clong83 · 2013-07-23 11:53 · Score: 1

That, and GPU computing really only gives that kind of performance for a few types of problems. Namely, if you are able to structure your data arrays in memory in such a way that a GPU can operate on it efficiently. If you are solving nasty PDEs on an unstructured mesh, it's very difficult to do this. In that case, a GPU is pretty worthless. I don't know how these parallella boards work, but hopefully they would be a bit more versatile.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Rockoon · 2013-07-23 12:30 · Score: 2

> I don't know how these parallella boards work, but hopefully they would be a bit more versatile.

There is almost no chance that a $100 board can be designed to have a memory interface that can keep 64 cores well fed at this point in time. They have almost certainly chosen a low latency cache model over a high bandwidth cache model due to this, so this product will probably only perform well on highly computational problems that don't require much memory - in other words none of the problems that GPUs struggle with will likely be any better on it.

--
"His name was James Damore."

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by clong83 · 2013-07-23 12:45 · Score: 1

"There is almost no chance that a $100 board can be designed to have a memory interface that can keep 64 cores well fed at this point in time. "

I agree with you 100% on that. If the cache isn't terrible, it might be okay if you have a problem amenable to openMP. But mainly I view these low-end things as kind of fun toys.

That said, there is a market for something reasonably compact and affordable in between a 4-8 core desktop and a large scale cluster. I occasionally test and debug problems on my desktop that seem to work fine, but when I scale it to 200 processors and put it on the cluster, all hell breaks loose and it can be hard to debug. A cheapo 64 core board, even if slow, could help bridge that gap, assuming I can use mpich/openMP on this thing.

Otherwise it is for hobbyists or as a learning tool.

Parallelism is software-intensive by pongo000 · 2013-07-23 12:46 · Score: 1

These boards are only half the solution to a parallel problem. I used to write satellite imaging software that was parallelized on a 12-CPU server. A lot of work went into the code necessary to parallelize the mapping and DTM algorithms. It wasn't trivial either. I'm failing to see the usefulness of these boards for anything other than intensive scientific computation. Because if the code being run isn't written for parallel processors, you're getting no advantage to running it on a multicore/multiprocessor computer.

Or am I missing something here?

Re:Parallelism is software-intensive by marcosdumay · 2013-07-23 13:38 · Score: 1

You are missing specialized applications written specifically for that kind of system. Which is indeed an easy thing to miss, because they don't exist, as that kind of system didn't exist until today.
I would have bought a few 3 years ago, but I don't have a need for them now.

--
Rethinking email

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by marcosdumay · 2013-07-23 13:29 · Score: 1

GPUs are SIMD, while this board is MIMD.

--
Rethinking email

More info on integer/float performance by Anonymous Coward · 2013-07-23 14:24 · Score: 0

I was curious too and found this (older but I think the basic idea is the same):
http://www.adapteva.com/wp-content/uploads/2011/06/adapteva_mpr.pdf

"To optimize its CPU for power, Adapteva started with a clean slate instead of a standard instruction set. Epiphany uses a simple RISC instruction set which focuses on floating point operations and load/store operations, so it omits complex integer operations such as multiply and divide. Each 32 bit entry in the 64 entry register file can hold an integer or a single precision floating point value.

As Figure 1 shows, the CPU itself is a simple two issue design capable of executing one integer operation and one FP operation per cycle. The CPU relies on Adapteva’s compiler to optimally arrange the instructions rather than reordering instructions in hardware. To minimize power and area, the design has no dynamic branch prediction, although its short (six stage) integer pipeline keeps the misprediction penalty small. As a scalar integer design, the CPU achieves an EEMBC CoreMark score of about 1.3/MHz, a little less than that of an ARM9 CPU. By comparison, a modern high performance CPU such as Cortex A9 can achieve 2.9/MHz."

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Anonymous Coward · 2013-07-23 15:10 · Score: 0

PDEs on an unstructured mesh? That's what I'm doing! And I'm doing it on the GPU in a very parallel way. It's a little trickier than the serial version, but not so tricky that it took more than an afternoon to parallelise - and it is vastly faster than the serial version. It did require rethinking some methods, and it requires more memory now, but the results are the same, and it's still deterministic, and it scales well. I'm not in the business of publishing methods for this sort of thing, but it only takes one person to publish such a method...

So anyway, I don't really see the value in these new devices for desktop number-crunching, as they're just too slow to compete with even a cheap GPU. For use in a robot, though - or a vehicle, or any other environment where every Watt matters? Pretty cool.

Hold the bus by dbIII · 2013-07-23 16:47 · Score: 1

GPUs hit a wall with applications that need significantly more memory than you can fit on the device with the GPU cores. You spend so much time feeding them via the bus from main memory that after a point you'd be much better running the stuff on a far lower number of CPU cores.
So for some stuff they are very good, but for other stuff they are just not suitable at this time.

Re:Hold the bus by White+Flame · 2013-07-23 23:43 · Score: 1

You are aware that this chip has the exact same problems, right? But unlike a GPU it has very limited on-chip memory, and no directly attached external memory at all. All communication happens through an FPGA-driven channel to the ARM, with the ARM being the only thing with DRAMs attached.
This is fundamentally, and properly labeled as, an external "accelerator chip" to add onto a computer.
Re:Hold the bus by dbIII · 2013-07-24 00:58 · Score: 1

You are aware that this chip has the exact same problems, right?
You mean it's at the other end of a PCIe bus to where the memory is sitting? Thank you for playing but the communications channel looks a bit wider to me.
Re:Hold the bus by White+Flame · 2013-07-24 06:09 · Score: 1

I don't know what you read, but the 6.4GB/sec chip interface bandwidth is less than a standard PCIe graphics card at 8GB/sec. Now, you may argue that there is more latency across PCIe or something, but also note that the Parallella system assumes a shared memory architecture with the host OS on the ARM.
Again, there are no dedicated RAM chips for the accelerator on this board, and the chip itself has no DRAM controllers. You can only load up to 32KB of RAM per core into the chip caches itself; it doesn't have gigabytes of fast memory available to it like GPGPU processing does. This entire architecture is based around it being very chatty across that shared memory & comm bus, FAR more than any GPU would ever be.
This chip is for inexpensive acceleration of large-ish streaming data problems, like signal processing and video codecs or maybe map/reduce work, with very little state held on the actual chip. It's hard to see it having any sort of decent performance outside that, either in embarrassingly parallel or generalized multi-core problems.
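
The numbers in this comment can be put side by side with a rough back-of-the-envelope sketch. It uses only the figures quoted above (64 cores, 32KB per core, 6.4GB/sec chip interface, 8GB/sec PCIe), taken as given rather than measured; the function names are made up for illustration, and the transfer times are lower bounds that ignore latency and protocol overhead.

```python
# Figures quoted in the thread; treated here as given, not measured.
CORES = 64
LOCAL_MEM_PER_CORE = 32 * 1024   # 32 KB of on-chip memory per core
LINK_BANDWIDTH = 6.4e9           # bytes/sec, quoted chip interface bandwidth
PCIE_BANDWIDTH = 8.0e9           # bytes/sec, quoted PCIe graphics card figure


def total_on_chip_bytes(cores=CORES, per_core=LOCAL_MEM_PER_CORE):
    """Total memory available on the chip across all cores."""
    return cores * per_core


def stream_seconds(data_bytes, bandwidth):
    """Lower bound on the time to move data_bytes once across a link."""
    return data_bytes / bandwidth


ON_CHIP = total_on_chip_bytes()
ONE_GIB = 1024 ** 3

print(ON_CHIP // 1024, "KB of on-chip memory in total")
print(ONE_GIB // ON_CHIP, "full refills of the chip to touch 1 GiB once")
print(round(stream_seconds(ONE_GIB, LINK_BANDWIDTH), 3), "s per GiB over the chip link")
print(round(stream_seconds(ONE_GIB, PCIE_BANDWIDTH), 3), "s per GiB over PCIe")
```

With only 2 MB on the chip in total, any working set larger than that has to be streamed through the link repeatedly, which is the "chatty" behaviour described above.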
Re:Hold the bus by dbIII · 2013-07-24 13:46 · Score: 1

I don't know what you read, but the 6.4GB/sec chip interface bandwidth is less than a standard PCIe graphics card at 8GB/sec
Well that sucks then for anything involving large data sets.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by clong83 · 2013-07-23 16:50 · Score: 1

Faster than serial?! Of course! I only meant to compare it to a traditional parallel processing environment. And you can definitely write a simple parallel algorithm for any O/PDE that will work on GPUs. What I meant was that there are an awful lot of claims about how wicked fast GPU processing can be. Some people tout it as much faster than traditional computing. This can be true, but to get a GPU to actually perform at that level, it requires particular structure to your data. Unstructured meshes are known to be particularly nasty. Doesn't mean you can't compute anyway. It just may or may not be any better than traditional methods.

I don't mean to poo-poo GPU computing in general. I admittedly haven't followed this field closely in a year or two, so it's possible there have been some newer algorithms for unstructured meshes that have improved the situation. And without knowing more about your particular problem, I won't speculate and tell you how it should or shouldn't work. Maybe you figured out a decent implementation on your own. In which case, publish it already!

Re:tis already a cluster - 64 cores by dbIII · 2013-07-23 16:51 · Score: 1

It depends. If you can get them on a board that can address 32GB or more of memory directly then they'll be able to handle a lot of tasks that GPU cards just cannot touch without a lot of waiting around to be fed data or careful design of those tasks to get them to fit into the memory of the GPU card.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Khyber · 2013-07-23 17:05 · Score: 1

"Ten of them would spank the heck out of a Core i7 and cost the same."

Yea, if it were even a general-purpose usable piece of silicon. It's not.

"16 or 64 cores is good for facial recognition, audio processing, video processing, some network stuff "

We've had all of that in software since fucking Windows 98 on an Evergreen overdrive (180 MHz) chip. Please catch up with current technology or stop shilling, what you speak of is absolutely not new, and not even novel.

"A database designed for the many cores could work well."

As we've had for the past 30+ years I've been alive?

"For example, say you need to sort a table with 100,000 rows. On a system like this with 64 cores, each core could simultaneously sort a group of 1,500 rows,"

*cackle* Most cores today can't even sort FIVE HUNDRED rows, let alone triple that amount.

Quit shilling and get with reality, please.
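
For reference, the quoted sorting scheme (split 100,000 rows into 64 groups of roughly 1,500, sort each group independently, then merge) can be sketched as below. This is only an illustration of the idea on a host machine, not Epiphany code: the helper names are hypothetical, and Python threads stand in for the 64 cores without giving real parallel speedup for this kind of work.

```python
import heapq
import random
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, workers):
    """Split rows into at most `workers` contiguous chunks of near-equal size."""
    if not rows:
        return []
    size = -(-len(rows) // workers)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def parallel_sort(rows, workers=64):
    """Sort each chunk independently, then k-way merge the sorted chunks."""
    chunks = split_rows(rows, workers)
    if not chunks:
        return []
    # Threads stand in for the cores here; each one sorts its own chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))
    # The merge step is serial in this sketch and touches every row once.
    return list(heapq.merge(*sorted_chunks))


random.seed(0)
table = [random.randrange(1_000_000) for _ in range(100_000)]
chunk_sizes = [len(c) for c in split_rows(table, 64)]
print(len(chunk_sizes), "chunks, largest has", max(chunk_sizes), "rows")
print(parallel_sort(table) == sorted(table))
```

Note that the final merge still has to pass over all 100,000 rows, so the per-core sort is only part of the total cost.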

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

IBM Cell Processor Again? by nukem996 · 2013-07-23 17:11 · Score: 1

How is this any different from IBM's Cell processor? You know, the one in the PS3. Sure, it didn't have as many cores, but it's the exact same thing and it didn't do well. A big part of the problem was the overhead caused by memory transfers from the host system to the individual cores. The other part was that each core only had 256 KB of RAM; these only have 32 KB!

Re:IBM Cell Processor Again? by cruff · 2013-07-24 06:07 · Score: 1

How is this any different from IBM's Cell processor?
Can you actually buy a single Cell processor or even a dev board for one?
Re:IBM Cell Processor Again? by nukem996 · 2013-07-24 16:15 · Score: 1

Well, up until Sony disabled the OtherOS option you could buy a PS3.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Anonymous Coward · 2013-07-23 17:37 · Score: 0

Wrong, it becomes at least 10 to 20 times (based on the 100 watt i7 number) more useful than a Core i7 for any kind of remote robotics that relies on battery life and has limited room for solar cells. Having that many cores also makes it better for using sensors for object avoidance in autonomous robotics.

But you are an AC, so you probably won't read this to realize you are wrong either. Just like I probably won't see your reply to this if you do read it.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Anonymous Coward · 2013-07-23 19:16 · Score: 0

I think that was the desired goal: to get people to learn how to code for multiple cores at an affordable price. Being a programmer myself, I know how debugging multithreaded code can be a pain in the a$$. This is not meant as a kick-ass workhorse but as a means to allow people to learn how to code better. Let's face it, cores are not getting much faster, but we are getting more of them. If you are a programmer and are not threading your code you will shortly become extinct.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by serviscope_minor · 2013-07-23 21:45 · Score: 1

So total cost of ownership is about 90% less than the Core i7.

TCO is a meaningless measure and it's sad that it persists. I have a used halfbrick here. It costs 99% less to buy (excluding shipping) and uses 0% of the power. The TCO is vastly better than either of the two options you present.

Now, return on investment is a much better measure...

But yeah, your other points stand. As always, by using more specialised hardware you can get vastly better flops, etc. in a given hardware/power/financial budget. There are plenty of tasks that can be parallelized and it doesn't require the overhead of a powerful GPU (i.e. a whole PC attached).

--
SJW n. One who posts facts.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by White+Flame · 2013-07-23 23:45 · Score: 1

The Parallella doesn't run standalone, either. It's an accelerator chip attached to an ARM system.

Re:tis already a cluster - 64 cores by AmiMoJo · 2013-07-24 00:19 · Score: 1

I can fit over 9000 bottle caps in a medium sized rainwater barrel. Not sure what I'd do with it though.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC

education by Anonymous Coward · 2013-07-24 03:01 · Score: 0

I work in genetics and most of the programs aren't parallel. This is an ideal tool that can be used to make people look at their programs before we commit them to the HPC.

So everything since 1998 is useless? by raymorris · 2013-07-24 03:26 · Score: 1

> We've had all of that in software since fucking Windows 98 on an Evergreen overdrive (180 MHz) chip.

So every processor since then is useless?

> "A database designed for the many cores could work well."

> As we've had for the past 30+ years I've been alive?

So no one will ever use another database, and there is no longer any use for hardware to run databases on?

You must be spending other people's money by raymorris · 2013-07-24 03:35 · Score: 1

I very much care what it costs me, so TCO is one of the most important measurements of all.

> I have a used halfbrick here. It costs 99% less to buy (excluding shipping) and uses 0% of the power. The TCO is vastly better than either of the two options you present.

So the scorecard reads:

Item    Effective   Fast   TCO

hw1     yes         yes    6
hw2     yes         yes    2
brick   no          n/a    0

It looks to me like "brick" loses because it can't do the job. The other two options are the same, except hw1 costs three times as much.
They can both do the job, and both can do it fast. The only difference is that the TCO is a lot lower on hw2, so it's the best choice.
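
The selection rule described here (first drop anything that can't do the job, then pick the lowest TCO among what is left) can be written down as a tiny sketch. The `Option` type and `best_choice` function are hypothetical names for illustration; the data is just the scorecard above.

```python
from typing import NamedTuple, Optional


class Option(NamedTuple):
    name: str
    effective: bool          # can it do the job at all?
    fast: Optional[bool]     # None means "not applicable"
    tco: int                 # relative total cost of ownership


# The scorecard from the comment above.
SCORECARD = [
    Option("hw1", True, True, 6),
    Option("hw2", True, True, 2),
    Option("brick", False, None, 0),
]


def best_choice(options):
    """Filter on effectiveness and speed first, then minimise TCO."""
    viable = [o for o in options if o.effective and o.fast]
    return min(viable, key=lambda o: o.tco) if viable else None


cheapest_overall = min(SCORECARD, key=lambda o: o.tco)
print("lowest TCO alone picks:", cheapest_overall.name)
print("effective, fast, then TCO picks:", best_choice(SCORECARD).name)
```

Minimising TCO on its own picks the brick; applying the "can it do the job?" filter first picks hw2.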

> TCO is a meaningless measure and it's sad that it persists.

What your brick example shows is that TCO is not the ONLY consideration. "Can it do the job?" is also a critical consideration.
Amazingly, when making decisions you can actually consider more than one factor. You can look at both effectiveness AND cost.

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by Anonymous Coward · 2013-07-24 05:33 · Score: 0

I wouldn't be so sure about your assertions. The Parallella chips will incur a systematic 3-cycle penalty for branches in many, many cases (branches are always predicted not taken by default). Intel chips have ~90% accuracy for branch prediction, so this is not insignificant. Also, one Core i7 (or multiple ones on the same board) is easy to program using a shared-memory framework (like OpenMP). I would like to know what a cluster of Parallellas would use for communicating between the chips. That being said, I backed the project last year and can't wait to see my boards shipped to my place. :-)

Re:half the Gflops, 64 cores, 80% lower cost, 5 wa by jenesuispasgoth · 2013-07-24 05:37 · Score: 1

Erm. I beg to differ. Nvidia GPUs are "SIMT" (Single Instruction, Multiple Threads). There are "tricks" to keep threads in a warp from waiting for other threads (basically, don't use if (condition) ... else ..., but if (condition) ... and if (!condition) ...). AMD GPUs are based on VLIW processors, and are closer to your assertion of SIMD, but it's not quite the same thing either.

32 bit address bus! by Anonymous Coward · 2013-07-29 18:20 · Score: 0

So no they're not going to address more than a modern high-end graphics card. (Some 7970's and a number of Tesla cards have 6 gigs or more now.)

That was actually the biggest hindrance in the design to me. 4096 cores (max) means each core will have 1 meg of off-board memory available. Additionally, with current memory densities at ~256-1024 meg and rising, you'll be limited to between 1 and 8 memory controllers off-chip, since ideal placement for them would be edge to edge with a multilayer board and some ancillary parts on the backside (for non-mesh IO interfacing, etc.).

This means that although you should in principle be able to get hundreds of gigabytes/sec of external bandwidth by having multiple memory busses feeding into the edge nodes, in practice you'll only have 1-4 due to the limitations of the address bus and probably (albeit less so in the 4096 core case) power envelope concerns of your board.

That STILL doesn't curtail interest in the current design however. The shared 32 bit address space makes it sound like an old VAX and with some work it could be fun to make a system with 256-512 cores that would retain adequate per-core memory to do useful tasks with. Combined with a few southbridges supporting gigabit or better ethernet and perhaps some SATA controllers, this could be the next big thing. Combined with an updated revision supporting 64 bit word addressing, it could be a game changer for a considerable number of applications. Given perhaps an integer, double, or string optimized model, I'm sure they could find even more demand among other market segments where maximum parallelism is more important than individual core performance (and given the individual core memory throughput, combined with multiple edge memory controllers, it could be a fearsome beast indeed.)
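
The per-core figures in this comment follow from dividing a flat 32 bit address space evenly between cores. A minimal sketch of that arithmetic, assuming an even split and nothing reserved for anything else (the function name is made up for illustration):

```python
ADDRESS_BITS = 32  # shared address space size, as stated in the comment


def bytes_per_core(cores, address_bits=ADDRESS_BITS):
    """Address space per core if the whole space is divided evenly."""
    return (2 ** address_bits) // cores


MIB = 2 ** 20
for cores in (64, 256, 512, 4096):
    print(cores, "cores:", bytes_per_core(cores) // MIB, "MiB each")
```

At 4096 cores this gives the 1 meg per core mentioned above, while 256-512 cores leaves 8-16 meg each.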

99 dollars with GigE!!!! by Anonymous Coward · 2013-07-29 18:30 · Score: 0

Just wanted to add that, since pretty much every *ARM* board for under 100 bucks is still limited to 10/100 ethernet.
Lack of something other than USB/micro-sd for local storage kinda sucks ass, but there's a lot you could do with a beowulf cluster of these, and their network ports certainly make them performant enough to do that with.

98 comments