Slashdot Mirror


California Researchers Build The World's First 1,000-Processor Chip (ucdavis.edu)

An anonymous reader quotes a report from the University of California, Davis about the world's first microchip with 1,000 independent programmable processors: The 1,000 processors can execute 115 billion instructions per second while dissipating only 0.7 Watts, low enough to be powered by a single AA battery...more than 100 times more efficiently than a modern laptop processor... The energy-efficient "KiloCore" chip has a maximum computation rate of 1.78 trillion instructions per second and contains 621 million transistors.
Programs get split across many processors (each running independently as needed with an average maximum clock frequency of 1.78 gigahertz), "and they transfer data directly to each other rather than using a pooled memory area that can become a bottleneck for data." Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.
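
The figures in the summary can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation from the numbers quoted above; the assumption that each processor retires one instruction per clock cycle at peak is mine, not the press release's.

```python
# Figures quoted in the summary above.
LOW_POWER_RATE = 115e9     # instructions per second in the low-power mode
LOW_POWER_WATTS = 0.7      # watts dissipated at that rate
NUM_PROCESSORS = 1000
MAX_CLOCK_HZ = 1.78e9      # average maximum clock frequency per processor

# Energy per instruction = power / throughput (joules per instruction).
joules_per_instruction = LOW_POWER_WATTS / LOW_POWER_RATE
picojoules_per_instruction = joules_per_instruction * 1e12

# Per-processor rate in the low-power mode.
low_power_rate_per_processor = LOW_POWER_RATE / NUM_PROCESSORS

# Assumption: one instruction per clock cycle per processor at peak.
peak_rate = NUM_PROCESSORS * MAX_CLOCK_HZ

print(f"{picojoules_per_instruction:.1f} pJ per instruction")
print(f"{low_power_rate_per_processor / 1e6:.0f} million instructions/s per processor")
print(f"{peak_rate / 1e12:.2f} trillion instructions/s at peak")
```

Under that assumption the 1.78 GHz clock and the 1.78 trillion instructions per second figure agree with each other, and the low-power mode works out to roughly 6 pJ per instruction.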

205 comments

  1. Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

    Can this chip run GNU/systemd/Linux?

    1. Re:Can this chip run GNU/systemd/Linux? by dejitaru · · Score: 1

      probably... efficiently? Doubt it.

    2. Re:Can this chip run GNU/systemd/Linux? by ancientt · · Score: 5, Interesting

      That's probably all it can run. Typically specially designed systems need the ability to configure the OS radically differently than has been done previously which requires source code. Microsoft provides source code, as does IBM, in some special situations, but mostly it tends to be Linux that is used first. Consider the reasoning behind the OS chosen for the fastest computers in the world.

      Systemd? Probably because serious computer engineers don't have any trouble dealing with the irritation that systemd causes. (The rest of us may, but if you have enough smarts to handle building a specialized chip, then systemd isn't really a challenge.)

      --
      B) Eliminate all the stupid users. This is frowned upon by society.
    3. Re:Can this chip run GNU/systemd/Linux? by NotInHere · · Score: 4, Informative

      No.

      systemd requires glibc, and glibc is 2 MB in size. According to the paper, the processor has a whopping 768 KB of RAM (and no capability to add external RAM).

      That means systemd isn't going to run. Dunno about the kernel; it's probably easier to write a minimal one from scratch than to port it over to that special architecture.

    4. Re:Can this chip run GNU/systemd/Linux? by Pseudonym · · Score: 5, Informative

      This is basically a modern transputer. As with connection machines, GPUs, and all such machines, it will very likely need a traditional host CPU to manage it, and that may well run Linux.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    5. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      Speaking of GPUs, and apart from the big difference in energy efficiency, what's the difference between this and the 1000+ cores I already have in my computer?

    6. Re:Can this chip run GNU/systemd/Linux? by AchilleTalon · · Score: 4, Informative

      Your GPU processors need to execute the SAME instruction at each clock cycle, this one has each processor capable to execute any instruction at each clock cycle. So, this is truly like a 1000 cores CPU. While the GPU is limited to dispatch the same instruction to all processors.

      --
      Achille Talon
      Hop!
    7. Re: Can this chip run GNU/systemd/Linux? by Samantha+Wright · · Score: 1

      So... What do you do with it? Brute forcing encryption keys?

      --
      Bio questions? Ask me to start a Q&A journal. Computer analogies available for most topics!
    8. Re:Can this chip run GNU/systemd/Linux? by WindBourne · · Score: 2

      why doubt it? After reading this, it sounds like a great set-up. With a 1000 CPUs of MIMD, it sounds like the right core for controlling access to massively parallel systems. And a single AA to run it? Sounds like a pretty decent chip to me.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    9. Re:Can this chip run GNU/systemd/Linux? by WindBourne · · Score: 4, Informative

      Totally easy to add external RAM. In fact, it supports 12 independent memory modules. The 768 KB is in place of cache memory. Basically, it is a working table, and any of the CPUs can access any part of it.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    10. Re:Can this chip run GNU/systemd/Linux? by WindBourne · · Score: 1

      Look up SIMD vs MIMD. In a nutshell, your GPU has a large number of 'CPUs' that all do the same thing. These are 1000 CPUs, each capable of doing the same thing OR doing its own thing.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    11. Re:Can this chip run GNU/systemd/Linux? by jellomizer · · Score: 2

      Most programmers don't know how to code for parallel processors. At best you may get multi-threaded apps, but those are often made to handle a large load of requests, not to solve a single problem much quicker.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    12. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 1, Interesting

      A single AA is marketing lies. Sure, the battery will handle it, but it's not what it is made for and the energy you actually get out will be less than the marked one.
      The number listed on the battery is typically how much you will get out of it from a 20h discharge time.
      You should not be surprised if an AA battery only lasts for half the rated time if you try to suck 0.7W out of it.
      The runtime won't be much above 1 hour.

      OTOH a computer with 1000 processors is hardly made for portable applications so the single AA example is just silly anyway. For the application one would use this for there are better power sources available.

    13. Re: Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      I don't know. Object/pattern recognition in portable devices like a drone perhaps? But the memory is a bit limiting for processing image data.
      Some sort of audio processing? Perhaps it could be useful for analyzing sonar/radar data? Again that could be useful for obstacle avoidance in a delivery drone.

    14. Re: Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 1

      I don't think this is right. I've written OpenCL kernels that have variable length loops and branches, either of which could be run, and executed them in parallel. So either my understanding is wrong, or GPU cores can indeed run different instructions at the same time.

    15. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      What's the use-case for them each doing their own thing? Servers? Neural Networks? Stuff like that?

    16. Re: Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      NBA Finals prediction simulator

    17. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 1

      Most programmers don't know how to code for parallel processors.

      Most of them don't need to, because at a high level, most things you want to do with a computer are inherently serial.

      At a low level, tons of math can be parallelized, although there's a trade-off of parallel processing and overhead. Low level parallelism happens inside libraries, written by people who do know how to write for parallel processors, and transparent to higher-abstraction programmers.

      This chip is not going to be in your phone or in the next iPad. "Most" programmers are unlikely to have any contact with this architecture. It will be really good at parallel tasks requiring relatively little data (max data rate seems to be less than 400 Mbps/processor). Those are big problems, but very specialized. The people who work on them know how to parallel.

    18. Re:Can this chip run GNU/systemd/Linux? by John+Allsup · · Score: 3, Interesting

      I still wonder how long it will be until the 'traditional host CPU' is scaled down to a small SOC, so that the traditional heavyweight CPU is freed up for tasks that actually require it: most of what runs on the i5 in the machine I am writing this on doesn't need anything remotely as powerful as said i5. Likewise, putting a small SOC-like chip in the graphics card and running most of the GUI there is another thing. As such, once processors hit the single core brick wall (and they're kind of doing that now), performance improvements will come from offloading what can run on a small power-efficient core to such a small power-efficient core. Given what the chip in e.g. a pi zero costs, it ought to make sense: connect your machine to power, and a tiny microcontroller handles the ILO and basic system management functions, and on power-on, a larger microcontroller/SOC does what the BIOS/UEFI does on current machines. Similarly in the screen we have the same arrangement, with a microcontroller starting up the GPU and display (independently of the rest of the machine). A modern PC is already like a small network (the GPU being networked to the main CPU via the pcie bus, multiple intel sockets networked via QPI etc.). Making this more explicit is the sensible thing to do.

      --
      John_Chalisque
    19. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 1

      768 K should be more than enough for everybody.
      That's what most PC/XT clones with a Hercules card had 30 years ago.

    20. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      from the paper: "Each of the 12 independent memory modules contains a 64KB SRAM, services two neighboring processors, and supports 28.4Gbps of I/O bandwidth."

      Not clear if that means the independent memory modules are 64K, or if that's their cache.

      BTW: note that they specify the speed of the chip at 1.1V, but the power consumption (1.3W) is at 0.56V :-) At 0.84V it dissipates 13.1W.
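
One way to read the quoted sentence is to multiply it out. The sketch below only uses the figures from the quote plus the 1,000-processor count; spreading the I/O bandwidth evenly over all processors is my own simplifying assumption, not something the paper is quoted as saying.

```python
MODULES = 12                 # independent memory modules (from the quote)
SRAM_KB_PER_MODULE = 64      # SRAM per module (from the quote)
IO_GBPS_PER_MODULE = 28.4    # I/O bandwidth per module (from the quote)
NUM_PROCESSORS = 1000

total_sram_kb = MODULES * SRAM_KB_PER_MODULE
total_io_gbps = MODULES * IO_GBPS_PER_MODULE

# Assumption: bandwidth divided evenly across every processor on the chip.
io_mbps_per_processor = total_io_gbps * 1000 / NUM_PROCESSORS

print(f"{total_sram_kb} KB of SRAM in total")
print(f"{total_io_gbps:.1f} Gbps of module I/O in total")
print(f"{io_mbps_per_processor:.1f} Mbps per processor if shared evenly")
```

The total comes to 768 KB, the same figure quoted earlier in the thread, which suggests the 64 KB is the capacity of each module rather than a cache in front of something larger.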

    21. Re:Can this chip run GNU/systemd/Linux? by JoeMerchant · · Score: 1

      New applications?

    22. Re: Can this chip run GNU/systemd/Linux? by Bengie · · Score: 4, Informative

      AchilleTalon is correct: each processing group in the GPU can only execute the same instruction on all cores in that group. Every time you have a branch in your code, the GPU takes one branch, executing the instructions for that branch and stalling all cores that took a different branch, then takes the other branch and stalls the other cores. GPUs hate branches. Yes, they can do them, but at a huge performance penalty. You may want to write better code.

      To get into a bit more detail, I'll use AMD as an example, but Nvidia pretty much does the same thing with slightly different terms for the same concepts. The AMD RX 480 has 2304 streaming processors (cores) that are grouped into 36 CUs (execution groups). Each streaming processor can handle up to something like 4 wavefronts (threads, like hyper-threading, to hide memory access latency) at a time. All streaming processors in a CU for a given wavefront must be executing the same instruction at the same time, except in the case of a branch. When a branch happens, one fork of the branch will process, stalling the streaming processors taking the other fork. Once that fork is finished, the first group of streaming processors will stall while the other processors finish their fork.
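
The cost of the stalling described above can be sketched with a toy cycle count. This is a simplified model, not a description of any real GPU scheduler: the group size of 64 lanes and the instruction counts for each fork are illustrative values I picked.

```python
def lockstep_cycles(lane_takes_if, if_cost, else_cost):
    """Cycles a lockstep group spends on one if/else.

    lane_takes_if: one boolean per lane (core) in the group.
    if_cost / else_cost: instruction counts of the two forks (illustrative).
    Each fork that at least one lane needs is executed in turn while the
    lanes on the other fork sit idle.
    """
    cycles = 0
    if any(lane_takes_if):
        cycles += if_cost
    if not all(lane_takes_if):
        cycles += else_cost
    return cycles


def independent_cycles(lane_takes_if, if_cost, else_cost):
    """Cycles when every core runs its own instruction stream."""
    return max(if_cost if takes_if else else_cost for takes_if in lane_takes_if)


uniform = [True] * 64                          # every lane takes the same fork
divergent = [i % 2 == 0 for i in range(64)]    # lanes alternate between forks

print(lockstep_cycles(uniform, 10, 30))
print(lockstep_cycles(divergent, 10, 30))
print(independent_cycles(divergent, 10, 30))
```

With these made-up costs, the uniform group finishes in 10 cycles, the divergent lockstep group needs 40 (both forks back to back), and independent cores would need only 30 (the slower fork).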

    23. Re:Can this chip run GNU/systemd/Linux? by GLMDesigns · · Score: 2

      You're parsing this a little more than necessary. The point was not that people would use a AA battery. The point was that this chip was an energy sipper as opposed to an energy guzzler.

      --
      If you're scared of your govt then you need to further restrict its powers
      Vote 3rd Party in 2016 and beyond
    24. Re:Can this chip run GNU/systemd/Linux? by dbIII · · Score: 1

      Most programmers don't seem to be able to deal with buffer overflows, race conditions or 64 bit. This is for the other ones who can deal with more than one thread, the ones that have caught up with the 1990s and are not stuck in the MSDOS mindset.

    25. Re:Can this chip run GNU/systemd/Linux? by dbIII · · Score: 2

      most things you want to do with a computer are inherently serial

      Even very simple stuff with sound and images is inherently parallel. More complex modelling of physical objects is inherently parallel.
      You don't get it? Imagine resizing every frame of a movie at 25fps over two hours. That's the same operation done many times and very trivial to do in parallel. It's just a matter of splitting the task to whatever resources you have. With sound (and thus things like seismic data as well) if you want to apply the same filter to thousands or millions of samples it's very trivial to do in parallel.

      Those are big problems, but very specialized

      Home movies and digital photography fit into the mix so not very specialized at all.
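
A two-hour movie at 25fps is 180,000 frames, each of which can be resized without looking at any other. A minimal sketch of that kind of splitting follows; the `halve` function and the tiny list-of-lists "frames" are stand-ins I made up, and threads are used only to keep the example portable (a process pool would be the usual choice for CPU-bound work in Python).

```python
from concurrent.futures import ThreadPoolExecutor


def halve(frame):
    """Downscale one frame by keeping every second pixel in each direction."""
    return [row[::2] for row in frame[::2]]


def resize_all(frames, workers=4):
    """Resize every frame independently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(halve, frames))


# Toy 'movie': 100 frames of 4x4 pixels, each filled with its frame number.
frames = [[[f] * 4 for _ in range(4)] for f in range(100)]
small = resize_all(frames)

print(len(small), len(small[0]), len(small[0][0]))
```

No frame depends on another, so the work can be handed to however many workers are available without any locking.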

    26. Re:Can this chip run GNU/systemd/Linux? by cadeon · · Score: 1

      Only in a beowulf cluster.

    27. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      Phones already do this: big.LITTLE ARM designs include a pair of low-power ARM cores, which get scheduled all low-cpu tasks. When the phone needs to do heavy processing, it boots the larger cores, and sets task affinity to them. Tasks get completed quicker and the user is happier.

      On desktop CPUs, this behaviour is already highly integrated in the CPU; it's just so transparent to the user that you never actually notice it. This is exposed as power levels, and the CPU spends most of its time at a very low power level, where large portions of the CPU are deactivated. When something requires high CPU usage, the CPU wakes up, completes the task and powers down again. In fact, the laptop thermal design on modern Intel chips actually can't handle the CPU running 100% all the time - it is essential to keep the chip powered down most of the time!

    28. Re:Can this chip run GNU/systemd/Linux? by Tough+Love · · Score: 1

      No doubt Linux runs on a conventional processor that manages the embedded processors. Probably just running on the metal on the embedded processors, like a GPU.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    29. Re:Can this chip run GNU/systemd/Linux? by Tough+Love · · Score: 1

      Most programmers seem to be coding Javascript these days.

      --
      When all you have is a hammer, every problem starts to look like a thumb.
    30. Re:Can this chip run GNU/systemd/Linux? by WindBourne · · Score: 1

      bingo. I like the sounds of this chip. If done inexpensively (though they worked with IBM so it might not be), this could be a major chip. I could imagine a number of these for a server. Wow.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    31. Re:Can this chip run GNU/systemd/Linux? by WindBourne · · Score: 1

      SIMD is used for applying the same processing to different data in parallel. Imagine processing an image that you want to lighten by 10%. SIMD simply divides the work across the RAM among individual CPUs, all of them adding 10% to the value.
      Likewise for weather processing, geographical processing, or simulations of light rays. They all involve the same calculations applied to different data.
      Hence SAME INSTRUCTION; Multiple (or different) DATA.
      Roughly, those CPUs all operate in lock-step.

      MIMD is like having 1000 different CPUs in a box, or for that matter, 1000 different boxes. The 768K data region is what the chips can see. So, you can have 5 chips working on 100K, while the rest is split amongst 995 other chips.

      --
      I prefer the "u" in honour as it seems to be missing these days.
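
The distinction above can be sketched in a few lines. This only illustrates the programming model, not how either kind of hardware works; the function names and the three example tasks are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def lighten(pixel):
    """The single operation applied to every element: add 10%, capped at 255."""
    return min(255, round(pixel * 1.10))


def simd_style(pixels):
    """Same instruction, multiple data: one operation over every element."""
    return [lighten(p) for p in pixels]


def mimd_style(tasks):
    """Multiple instructions, multiple data: each worker runs its own function."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, data) for fn, data in tasks]
        return [f.result() for f in futures]


print(simd_style([100, 200, 250]))
print(mimd_style([(sum, [1, 2, 3]), (max, [4, 9, 2]), (sorted, [3, 1, 2])]))
```

In the first call every element goes through the same `lighten` step; in the second, each worker runs an unrelated function on its own data, which is the "doing their own thing" case.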
    32. Re:Can this chip run GNU/systemd/Linux? by Anonymous Coward · · Score: 0

      Your GPU processors need to execute the SAME instruction at each clock cycle. This one has each processor capable to execute any instruction at each clock cycle. So, this is truly like a 1000 cores CPU. While the GPU is limited to dispatch the same instruction to all processors.

      FTFY. You used a comma where you should have used a period. Please don't do that. It is really fucking annoying.

    33. Re: Can this chip run GNU/systemd/Linux? by Pseudonym · · Score: 1

      I've written OpenCL kernels that have variable length loops and branches either of which could be run, and executed then in parallel.

      The way this typically works is to use conditional execution, just like in ARM or Itanium, with the predicate bit being a set of bits. This is all explained in early research papers on GPUs, such as this one from the now-amusingly-named "Lucasfilm Pixar Project" circa 1984.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    34. Re: Can this chip run GNU/systemd/Linux? by hackwrench · · Score: 1

      I could imagine one of these for a desktop.

    35. Re: Can this chip run GNU/systemd/Linux? by hackwrench · · Score: 1

      I would parse it as the memory modules have that much memory and that it has IO to an external bus at the speed stated.

    36. Re: Can this chip run GNU/systemd/Linux? by hackwrench · · Score: 1

      Not the original poster, but I used to do a lot of comma ands that have been largely replaced with period ands and semicolons.

    37. Re: Can this chip run GNU/systemd/Linux? by opus981 · · Score: 1

      Since when did systemd become part of the GNU/Linux moniker?

  2. Link to paper by NotInHere · · Score: 5, Informative

    The press release does not include it, nor does the slashdot summary. The link to the paper: http://vcl.ece.ucdavis.edu/pub...

    1. Re:Link to paper by gweihir · · Score: 2

      These are pretty primitive, yet very flexible cores. Worthless for most current loads, but that may change. However the comparison to modern CPUs is unfair. A proper comparison would be to modern GPUs.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  3. Mind bogglingly complecated co-processing by chromaexcursion · · Score: 1

    Maybe things are getting better. Too many programs are single threaded. Too many drivers are single threaded. Yes, you can sandbox them.
    That leaves out the nasty deadly embrace. Or, less nasty, waiting on a key resource to complete.
    More cores just get you bound up in your shorts faster.
    More cores is not a magic bullet.

    1. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 5, Interesting

      I take it you've never done high performance computing, have you? More cores is often a good thing. If I'm doing a simulation across 1,024 cores and each node has 16 cores, that means I need a minimum of 64 nodes. There's a lot of communication that takes place over protocols like Infiniband in order to make MPI work. It also rules out the possibility of shared memory systems like OpenMP when jobs reach that scale and have to be spread across multiple nodes. If more cores are located within a single node, it reduces the amount of communication with other nodes and the resulting latency. It also makes shared memory a viable option for larger parallel jobs. If I can fit 64 or 256 cores on a node, there's a lot less need for relatively slow protocols like Infiniband to pass messages. I don't think the ordinary user has a need for 1,000 cores or would have such a need for a long time. But it really could help with high performance computing.
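
The node counts in the comment above follow from a simple ceiling division. A small sketch, using the 1,024-core job and the per-node core counts mentioned there (the function name is just for illustration):

```python
import math


def nodes_needed(total_cores, cores_per_node):
    """Minimum number of nodes that can hold total_cores."""
    return math.ceil(total_cores / cores_per_node)


for cores_per_node in (16, 64, 256):
    print(cores_per_node, "cores/node ->", nodes_needed(1024, cores_per_node), "nodes")
```

Going from 16 to 64 or 256 cores per node cuts the same job from 64 nodes to 16 or 4, which is where the reduction in inter-node traffic would come from.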

    2. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      Can I ask what this computing is for? Because I look at the real world around me and I see little that would benefit from that.

    3. Re: Mind bogglingly complecated co-processing by Crashmarik · · Score: 3, Informative

      Oi

      There are always problems that parallelize well, and this setup will likely work just fine for them. The same way nvidia cuda does already, the same way vectorizing/coprocessing add-ons have done going back to the ISA bus.

      The fly in the ointment is that most of the world's problems don't, and even when you can parallelize, debugging is nightmarish.

      All said expect to see this doing neural network work. From the article and the description of the processor communication/lack of shared memory it sounds custom tailored to that.
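
The "transfer data directly to each other" style from the summary can be imitated in software with private queues between workers. The sketch below is only an analogy in ordinary Python threads, assuming a simple linear chain of stages; it says nothing about how KiloCore's interconnect is actually programmed.

```python
import queue
import threading


def stage(fn, inbox, outbox):
    """One 'processor': receive a value, apply fn, pass it to the next stage."""
    while True:
        item = inbox.get()
        if item is None:          # sentinel: shut down and tell the neighbour
            outbox.put(None)
            return
        outbox.put(fn(item))


def run_pipeline(fns, inputs):
    """Chain stages with private queues; no stage touches shared state.

    Inputs must not contain None, which is reserved as the stop signal.
    """
    links = [queue.Queue() for _ in range(len(fns) + 1)]
    workers = [
        threading.Thread(target=stage, args=(fn, links[i], links[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for w in workers:
        w.start()
    for x in inputs:
        links[0].put(x)
    links[0].put(None)

    results = []
    while True:
        out = links[-1].get()
        if out is None:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results


print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2, 3]))
```

Each stage only ever talks to its two neighbours, so there is no pooled memory area for the stages to contend over.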

    4. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      Processing graphics, you know, like a graphics card would with its 1000+ cores...

    5. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      It sounds like a huge advance for weather and nuclear modeling, which are currently limited largely by infiniband.

    6. Re: Mind bogglingly complecated co-processing by mSparks43 · · Score: 1

      I thought we'd reached the point where all the main computation problems had parallel solutions now?

      What's left?

    7. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      But you can't store your new and shiny image anywhere.

    8. Re: Mind bogglingly complecated co-processing by AchilleTalon · · Score: 1

      There are already GPUs for this with more than 1000 processing units. For example, the nVidia GeForce GTX 980 has 2048 cores. http://www.geforce.com/hardwar...

      --
      Achille Talon
      Hop!
    9. Re: Mind bogglingly complecated co-processing by docmordin · · Score: 4, Interesting

      Doing any sort of large-scale computational fluid dynamics or finite element simulations may require a great many cores. For example, you might want to conduct a very detailed simulation of the air flow around a vehicle, airplane, structure, etc. to have a basic understanding of its aerodynamics before spending time and money testing an actual prototype in a wind tunnel. You might also want to look at how very complicated, soft-body structures deform due to a variety of external stimuli. Such information would be crucial for certain materials science applications. Chemical reaction and acoustic simulations may also require a great deal of computing power, especially if you want to have a high spatio-temporal resolution.

      Essentially, there are plenty of physical and theoretical science applications that can benefit from massive processing capabilities. There is a lot of fundamental science that is also performed in simulation before any actual tests occur.

    10. Re: Mind bogglingly complecated co-processing by Billly+Gates · · Score: 1

      Too bad they won't fit in 768k of RAM

    11. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      That's exactly what I was thinking. This seems to be a lot more efficient and a bit different, but 1000+ cores in a single package is not a first. What would be nice is a summary of how it compares to a current high-end GPU.

    12. Re: Mind bogglingly complecated co-processing by WindBourne · · Score: 2

      no, but with this low energy usage (a single AA powering it), I think that this COULD have an impact on tablets and phones. That ability to shut down cores, while scaling up, is darn useful.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    13. Re: Mind bogglingly complecated co-processing by goose-incarnated · · Score: 4, Informative

      It also makes shared memory a viable option for larger parallel jobs.

      Good luck with that. I mean it. IME as you go *more* parallel, shared memory becomes a *less* viable option, regardless of how many cores are running on the same machine. The cycles lost to memory locking to make shared memory work increases exponentially with the number of autonomous processes/threads.

      The math isn't disputed - see the birthday problem for a start on calculating the clashes in playing musical chairs. In short, when you have X individuals with Y pigeonholes, then you are effectively bounded by Y, not by X. When you have X threads trying to access one variable, the chance that any thread will get this variable without waiting is effectively 1 for one thread, 1/2 for two threads, 1/3 for three threads, etc.

      By the time you get to a mere 64 threads each trying to access a variable, each thread basically has a 1.6% chance of getting it, and a 98.4% chance of being placed into a queue for that variable. Queue times get longer logarithmically. For one thread, time spent in the queue is ((0 * ATIME) + ATIME) where ATIME is the access time of the variable. For two threads, it's ((1-1/2) * ATIME) + ATIME, for three threads it's ((1-1/3) * ATIME) + ATIME, for four threads it's ((1-1/4) * ATIME) + ATIME. For ATIME=100us, the times above are, respectively, 100us, 150us, 166.67us, 175us. That last number is only for four threads with one variable, and assuming that queuing takes no clock cycles. The times increase exponentially with an increase in the number of variables that must be locked.

      For 64 threads your expected time in the queue is ((1-1/64) * ATIME) = about 98.4us. You can forget about using shared memory if you want to use 1000 cores.

      But wait, "Use a sane design pattern and that won't happen, like with consumer/producer, etc" I hear you say? Sorry, no design pattern will save you, because if even a single thread writes to a variable, then all threads have to implement read-locks to make sure they don't get an access during a write (race condition).

      If you have 1000 cores, implement local message-passing. Don't try shared memory unless each thread will use a local copy (in which case, it isn't "shared", now is it?). Or, go ahead and do it and maybe you'll find a shared memory design that doesn't fail to first year statistics, and if you do beat the numbers then I'll be the first to nominate you for a Fields medal/Turing award :-)

      --
      I'm a minority race. Save your vitriol for white people.
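
The per-access times quoted in the comment above can be reproduced directly from its formula. The sketch below just evaluates that expression for a single shared variable; it takes the comment's model as given (no queuing overhead, one variable) and does not attempt to model the multi-variable case it also mentions.

```python
def expected_access_time(threads, atime_us=100.0):
    """Per-access time under the comment's model: ((1 - 1/threads) * ATIME) + ATIME."""
    wait_us = (1 - 1 / threads) * atime_us   # expected time spent queued
    return wait_us + atime_us                # plus the access itself


for n in (1, 2, 3, 4, 64, 1000):
    print(f"{n:>4} threads: {expected_access_time(n):.2f} us")
```

For 1 to 4 threads this gives 100, 150, 166.67 and 175 us, matching the comment. Under this particular formula the figure levels off just below 2 x ATIME as the thread count grows (about 198.44 us at 64 threads), so the steeper growth the comment describes would have to come from locking several variables, which this sketch leaves out.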
    14. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 3, Insightful

      Because I look at the real world around me and I see little that would benefit from that.

      This is a failure of imagination. The worst kind of failure.

    15. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      I take it you've never studied HPC architecture, if the depth of your analysis is to look at 1,024 cores and divide it by cores per node.

      What do you think the bandwidth to memory is of 64 independent nodes each with 16GB of RAM?

      Now, what do you think the bandwidth to memory is of 1 chip with 1TB of RAM?

      Oh dear.

      If you want 1,024 "cores" which only have slow access to global shared memory, and a paucity of local memory, you can already get that on most high-end GPUs. Do those GPUs run your 64-node 1,024-core workload? Almost certainly not.

      As OP says, more cores is NOT a magic bullet. There are other significant factors.

      Another factor is that despite these not being SIMD because "SIMD bad", since they communicate with each other over narrow pipes they're going to have to run in lockstep anyway, so they may as well be SIMD.

      As for most users not needing this amount of compute, it's less than the PS4 has, and they've sold like 40m of those.

    16. Re: Mind bogglingly complecated co-processing by ultranova · · Score: 1

      Sorry, no design pattern will save you, because if even a single thread writes to a variable, then all threads have to implement read-locks to make sure they don't get an access during a write (race condition).

      That sounds like a problem the immutable object pattern was designed to solve.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    17. Re: Mind bogglingly complecated co-processing by goose-incarnated · · Score: 1

      Sorry, no design pattern will save you, because if even a single thread writes to a variable, then all threads have to implement read-locks to make sure they don't get an access during a write (race condition).

      That sounds like a problem the immutable object pattern was designed to solve.

      Then you don't need shared memory. If the object never changes then each thread can keep their own local copy, and there's no need for shared memory (which is what I said somewhere above in that jungle of text).

      --
      I'm a minority race. Save your vitriol for white people.
    18. Re: Mind bogglingly complecated co-processing by thegarbz · · Score: 1

      You don't even need to get into theoretical CAD for this to be of benefit. There are a lot of computer use cases that are massively parallel. I mean my 5 year old graphics card has 448 CUDA cores for the insanely parallel task of rendering something on my display, and that doesn't even take into account professional rendering which covers everything from marketing departments to the motion picture industry.

    19. Re: Mind bogglingly complecated co-processing by thegarbz · · Score: 2

      Tell me about it. My i5 supports a whopping 1MB.

      Oh wait you thought that was RAM in the traditional sense? Maybe you should read the original paper which among other things said that this is extensible with onchip memory (think level 3 cache), or off chip memory (actual RAM).

    20. Re: Mind bogglingly complecated co-processing by Bob+the+Super+Hamste · · Score: 1

      I was thinking atomic operations as they would also avoid the wait.

      --
      Time to offend someone
    21. Re: Mind bogglingly complecated co-processing by goose-incarnated · · Score: 2

      I was thinking atomic operations as they would also avoid the wait.

      Atomic operations aren't useful enough to share data; we use them to implement the locks on the actual data we want to share. GP spoke about wanting 1000 cores with shared memory, chances are he's not planning on having all 1000 simply increment/decrement an integer.

      --
      I'm a minority race. Save your vitriol for white people.
    22. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 1

      Did you read the pdf with the chip information? http://vcl.ece.ucdavis.edu/pubs/2016.06.vlsi.symp.kiloCore/2016.vlsi.symp.kiloCore.pdf

      Each core has local memory (640k), the 768k is a shared block and cpus can borrow memory from other nearby cpus when they need it. It also has channels to access external RAM (not on the chip).

    23. Re: Mind bogglingly complecated co-processing by dbIII · · Score: 1

      Because I look at the real world around me

      That's a start, think of image processing. A lot of it is applying the same operation to a very large number of images with no need to do it in a special order or in a serial manner at all.
      Even something as trivial as editing a home movie is going to be an utter pain if the software is single threaded instead of doing the task more quickly in parallel.

    24. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      oops, guess I should have read it, turns out it's 640 bytes of instruction and 512 bytes of data and not 640k, sorry

    25. Re: Mind bogglingly complecated co-processing by Bob+the+Super+Hamste · · Score: 1

      It all depends on the data one is working with and how it is being used. Integers work wonders for a whole lot of things, provided that you aren't working on a collection of them (in which case you may be able to do things differently, like reading from one and writing to another). This may not work for the problem you are working on. That said, the goal should be to limit the number of locks you need, and there may very well be a better way of doing it in a shared memory environment that doesn't require a pile of semaphores with every thread waiting.

      --
      Time to offend someone
    26. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      Do you understand the difference between parallel and concurrent?

    27. Re: Mind bogglingly complecated co-processing by Anonymous Coward · · Score: 0

      This is not about parallelization. It's not like a GPU. Each of the cores can run its own task.

    28. Re: Mind bogglingly complecated co-processing by TheLoneGundam · · Score: 1

      Mass numbers of computations are used in numerical weather prediction for one thing - and NWP is what allows forecasters to become more accurate about severe storms, hurricane track and intensification, etcetera. The models already operate on massively parallel machines - but faster machines or less latency will allow them to forecast for smaller grids, improving accuracy, and if they run faster, they can be run more times per day which also helps forecasts adjust to changing conditions.

  4. What games does this come with by Anonymous Coward · · Score: 0

    What games does this come with

    1. Re:What games does this come with by invictusvoyd · · Score: 4, Funny

      pong

    2. Re:What games does this come with by Anonymous Coward · · Score: 0

      life (Conway's)

    3. Re:What games does this come with by xupere · · Score: 1

      but you can have 999 players (on a 999 sided polygon field) and 1 ball

  5. In other news by ebonum · · Score: 5, Funny

    A young intern who likes to "work late" in Davis California has recently come into the possession of a rather large stash of bitcoins.

  6. yay so awesome by Anonymous Coward · · Score: 1

    Yay this is so awesome that researchers have pretty much put 1,000 6502 processors on a single chip. Way to go, maybe in a year we can move on to the equivalent of 1,000 z80 processors on that chip. Yay research!

    1. Re:yay so awesome by Yvan256 · · Score: 2

      And is it really 1000 CPUs, or is it 1024 rounded down to 1000 for the press release?

    2. Re:yay so awesome by Anonymous Coward · · Score: 0

      It's really 1000. The paper says the layout is 32 columns * 31 processors + 1 row with 8 processors.

    3. Re:yay so awesome by Agripa · · Score: 1

      1 kibiCPU.

  7. Re: The Republicans will ban this... by Anonymous Coward · · Score: 0

    That is the way of their kind.

  8. I guess this is great by dejitaru · · Score: 3, Interesting

    But I am not sure what system or software can take advantage of it. Personally I want to see progress being made on quantum computing for consumer level stuff.

    1. Re:I guess this is great by Ironlenny · · Score: 4, Insightful

      Quantum computing is not magic. It has problems it's insanely good at (in theory) solving, and it has problems where it's as fast or slower (because of the necessary error correction) as your traditional deterministic computer. Not only are we a long way off from personal quantum computing (we still don't even have a general purpose quantum processor), we still need to research deterministic architectures.

      --
      There is a system for subverting the system and you should use that system!
    2. Re:I guess this is great by dejitaru · · Score: 1

      Yes, you are correct and I am well aware of that, but still, just the thought of it becoming personal computing and storing data on a qubit just sounds soo... futuristic! Doubt I will see anything in my lifetime, but still we can dream :)

    3. Re:I guess this is great by thinkwaitfast · · Score: 3, Interesting

      Live video streaming. The thing about more cores is that for a similar application, energy usage decreases with the square of the frequency.
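
      A rough sketch of the reasoning behind that claim, assuming the textbook CMOS model (dynamic power ~ C * V^2 * f) and assuming supply voltage can be scaled roughly in proportion to clock frequency. Neither assumption comes from the KiloCore paper, and the function names are made up for illustration.

```python
# Textbook CMOS dynamic-power model (an assumption, not from the KiloCore paper):
#   dynamic power  P ~ C * V^2 * f
#   energy per op  E = P / f ~ C * V^2
# If V can be scaled roughly in proportion to f, energy per op scales with f^2.

def energy_per_op(freq_ratio, capacitance=1.0):
    """Relative energy per operation when V is scaled linearly with f."""
    voltage = freq_ratio  # assumed: V proportional to f
    return capacitance * voltage ** 2

def total_energy(n_cores, total_ops=1.0):
    """Relative energy to do the same work on n_cores, each clocked at 1/n_cores speed."""
    freq_ratio = 1.0 / n_cores  # same overall throughput, spread over more cores
    return total_ops * energy_per_op(freq_ratio)

for n in (1, 2, 4):
    print(n, total_energy(n))
```

      In practice voltage cannot be lowered indefinitely and leakage power does not scale this way, so this is only the idealized trend.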

    4. Re:I guess this is great by dejitaru · · Score: 1

      But how well does a system know to allocate it to different cores?

    5. Re:I guess this is great by dj245 · · Score: 1

      But I am not sure what system or software can take advantage of it. Personally I want to see progress being made on quantum computing for consumer level stuff.

      If you have an application where you can calculate many possible solutions independent of each other, and then choose the best one, this kind of processor might be useful. Quantum computers are very strong for that kind of application, so I see it being a stepping stone to quantum computing.

      --
      Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    6. Re:I guess this is great by dejitaru · · Score: 1

      Agreed, but I see it all (consumer) being handled by just ones and zeros. Considering the fact that qubits can expand on that to have data go past binary, I can see a whole lot happening with it.

    7. Re:I guess this is great by phantomfive · · Score: 1

      That kind of computation ability with that low amount of power is worth something.

      --
      "First they came for the slanderers and i said nothing."
    8. Re:I guess this is great by Anonymous Coward · · Score: 0

      You could always give that IBM quantum computing service a shot if you're interested...

    9. Re:I guess this is great by thegarbz · · Score: 1

      How does it currently? How does your GPU know which pixel to render with which of the similarly high number of CUDA cores a typical video card has these days?

    10. Re:I guess this is great by Mr+Z · · Score: 1

      I'm familiar with Dr. Baas' older work (AsAP and AsAP2). He presented his work to a team of processor architects I was a part of several years ago.

      At least at that time (which, as I said, was several years ago), one class of algorithms they were looking at was signal processing chains, where the processing steps could be described as a directed graph of processing steps. The AsAP compiler would then decompose the computational kernels so that the compute / storage / bandwidth requirements were roughly equal in each subdivision, and then allocate nodes in the resulting, reduced graphs to processors in the array.

      (By roughly equal, I mean that each core would hit its bottleneck at roughly the same time as the others whenever possible, whether it be compute or bandwidth. For storage, you were limited to the tiny memory on each processor, unless you grabbed a neighbor and used it solely for its memory.)

      The actual array had a straightforward Manhattan routing scheme, where each node could talk to its neighbors, or bypass a neighbor and reach two nodes away (IIRC), with a small latency penalty. Communication was scoreboarded, so each processor ran when it had data and room in its output buffer, and would locally stall if it couldn't input or output. The graph mapping scheme was pretty flexible, and it could account for heterogeneous core mixes. For example, you could have a few cores with "more expensive" operations only needed by a few stages of the algorithm. Or, interestingly, avoid bad cores, routing around them.
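
      A toy model of that "run when you have data and room in your output buffer" rule, for illustration only. It assumes a simple linear chain of cores joined by bounded FIFOs; the function name, the FIFO depth, and the one-item-per-tick timing are all made up here and are not taken from the AsAP or KiloCore designs.

```python
from collections import deque

def simulate_pipeline(items, stage_fns, fifo_depth=2, max_ticks=1000):
    """Tick-by-tick model of a chain of cores joined by bounded FIFOs.

    A core fires only if its input FIFO has data and its output FIFO has room;
    otherwise it stalls locally for that tick.
    """
    n = len(stage_fns)
    # fifos[i] feeds stage i; fifos[n] collects final results (unbounded here).
    fifos = [deque(items)] + [deque() for _ in range(n)]
    stalls = [0] * n
    for _ in range(max_ticks):
        # Decide readiness from the state at the start of the tick.
        ready = [
            bool(fifos[i]) and (i == n - 1 or len(fifos[i + 1]) < fifo_depth)
            for i in range(n)
        ]
        if not any(ready):
            break  # nothing left to do
        for i in range(n):
            if ready[i]:
                fifos[i + 1].append(stage_fns[i](fifos[i].popleft()))
            else:
                stalls[i] += 1
    return list(fifos[n]), stalls

out, stalls = simulate_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])
print(out, stalls)
```

      No global scheduler appears anywhere in the loop: each stage only looks at its own input and output buffers, which is the property the paragraph above describes.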

      It was a GALS design (Globally Asynchronous, Locally Synchronous), meaning that each of the cores was running at a slightly different frequency. That alone makes the cores slightly heterogeneous. IIRC, the mapping algorithm could take that into account as well. In fact, as I recall, you pretty much needed to remap your algorithm to the specific chip you had in-hand to ensure best operation.

      The examples we saw included stuff familiar to the business I was in—DSP—and included stuff like WiFi router stacks, various kinds of modem processing pipelines, and I believe some video processing pipelines. The processors themselves had very little memory, and in fact some algorithms would borrow a neighboring core just for its RAM, if it needed it for intermediate results or lookup tables. I think FFT was one example, where the sine tables ended up stored in the neighbor.

      That mapping technology reminds me quite a lot of synthesis technologies for FPGAs, or maybe the mapping technologies they use to compile a large design for simulation on a box like Cadence's Palladium. The big difference is granularity. Instead of lookup-table (LUT) cells, and gate-level mapping, you're operating at the level of a simple loop kernel.

      Lots of interesting workloads could run on such a device, particularly if they have heterogeneous compute stages. Large matrix computations aren't as interesting. They need to touch a lot of data, and they're doing the same basic operations across all the elements. So, it doesn't serve the lower levels of the machine learning/machine vision stacks well. But the middle layer, which focuses on decision-guided computation, may benefit from large numbers of nimble cores that can dynamically load balance a little better across the whole net.

      I haven't read the KiloCore paper yet, but I suspect it draws on the AsAP/AsAP2 legacy. The blurb certainly reminds me of that work.

      And what's funny, is about 2 days before they announced KiloCore, I was just describing Dr. Baas' work to someone else. I shouldn't have been surprised he was working on something interesting.

  9. remaining core count by Anonymous Coward · · Score: 5, Funny

    the world's first microchip with 1,000 independent programmable processors ... Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.

    Yeah, but you have to keep in mind how many cores will be left for the user!

    1000 cores minus:
    * 200 cores for anti-virus software
    * 25 cores for the ransomware battling it out with the anti-virus
    * 55 cores for Microsoft's Win10 update nagware
    * 350 cores for the NSA monitoring
    * 122 cores for the FBI monitoring
    * 75 cores to handle syncing all your data to the cloud
    * 94 cores to run the 3D GUI based desktop
    * 62 cores for constant advertising
    * 14 cores for Google to keep tabs on what you're doing
    * 1 core dedicated to emacs

    So, only 2 cores left for the user. No better than an Athlon from 2005, I'm afraid.

    1. Re:remaining core count by Anonymous Coward · · Score: 1

      Oh look at the FUD monster. You forgot to dedicate 600 cores to trying to get iTunes to run.

    2. Re:remaining core count by Anonymous Coward · · Score: 0

      "1 core dedicated to emacs

      So, only 2 cores left for the user."

      But... emacs is all the user needs.

    3. Re:remaining core count by narcc · · Score: 2

      What if they also want to run a decent text editor?

    4. Re:remaining core count by Anonymous Coward · · Score: 0

      That read similarly to a postscript of an Asterix comic book. Just exchange the cores with pens, erasers and pints of beer.

    5. Re: remaining core count by Anonymous Coward · · Score: 0

      then they can shell out from emacs to vi.

    6. Re:remaining core count by Anonymous Coward · · Score: 1

      vi doesn't need a whole core to itself :-)

    7. Re:remaining core count by tigersha · · Score: 1

      But he still needs to compile the stuff he edited with emacs/vi so there is the last bit

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
    8. Re:remaining core count by Anonymous Coward · · Score: 0

      It's a little over the top, but he does have a point. There is a lot of garbage that runs on a modern PC. I remember playing the original Half Life on a 223 MHz computer with a paltry amount of memory and it ran fine. Today we have multi-gigahertz/multi core processors with gigabytes of memory and graphics haven't improved at a rate nearly equal to that of the improvement in processing. Mostly due to all of the superfluous programs running on computers and sloppy programming.

    9. Re:remaining core count by Blaskowicz · · Score: 1

      Graphics have diminishing returns. An example is real time shadows ; most games used to have no shadows at all, or to put a simple round blob under the characters, or to have rather exquisite lighting but static and only on the walls : lighting/shadowing was generated in the map editor when you hit "compile" and was painted on the walls (Quake 1, Half-Life 1).

      A decade later real shadows were common, you can even have a character's nose project a shadow on the character's face and it's done real time. That cost a lot of gigaflops and megabytes.
      All's great, except the shadows will often lack precision so you'll often be looking at poor edges or square blocky patches of "shadow". So let's increase the shadow's resolution by 2x, 4x or 8x in each axis (if you're using shadow maps) or do some whatever filtering at 2x2, 4x4 or 8x8.. then it uses 4x, 16x or 64x more resources just so that it breaks down less often.
      They'll look slightly better or fairly better, but in the grand scheme of things you're still looking at about the same graphics.

    10. Re:remaining core count by Anonymous Coward · · Score: 0

      I didn't realize the 'F' in FUD stood for 'funny'.

      You forgot to dedicate 600 cores to trying to get iTunes to run.

      Or maybe he didn't have iTunes installed. Not everyone does.

  10. Its uses are endless... by Anonymous Coward · · Score: 0

    ...but we are gonna use it mostly for porn.

  11. Re: The Republicans will ban this... by Anonymous Coward · · Score: 0

    That is how they be.

  12. Well by Anonymous Coward · · Score: 0

    Core temp might struggle with this one.

  13. Awesome! by Anonymous Coward · · Score: 0

    I'll take the lot...plenty of new apps that could do with the power...

  14. code breaker? by Anonymous Coward · · Score: 0

    If this is reality, how long will your keys need to be to have any protection by encryption?

    1. Re:code breaker? by Anonymous Coward · · Score: 0

      NP. I don't think it means what you think it means.

    2. Re: code breaker? by Anonymous Coward · · Score: 0

      Clearly only 1000th of the expected lifetime of the universe.

  15. Obligatory by Motherfucking+Shit · · Score: 4, Funny

    Imagine a Beowulf cluster of these!

    --
    "BSD: Free as in speech. Linux: Free as in beer. Windows 10: Free as in herpes." --Man On Pink Corner in #52607549.
    1. Re:Obligatory by Anonymous Coward · · Score: 0

      Just as powerful as a RaspberryPI Beowulf cluster!
      Excellent for some tasks, utterly useless for most.

    2. Re:Obligatory by tigersha · · Score: 1

      What task is a RaspberryPI Beowulf cluster good for?

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
    3. Re:Obligatory by Anonymous Coward · · Score: 0

      Bragging about making a RPi Beowulf cluster?

    4. Re:Obligatory by Anonymous Coward · · Score: 0

      First-Posts Per Second Per Forum. I call it (FPPSPF, "F-pt-piss-pif")

    5. Re:Obligatory by Anonymous Coward · · Score: 0

      Teaching

    6. Re:Obligatory by Jeremi · · Score: 1

      What task is a RaspberryPI Beowulf cluster good for?

      Generating discussion on Slashdot.

      --
      I don't care if it's 90,000 hectares. That lake was not my doing.
  16. Haha! by Anonymous Coward · · Score: 0

    Not a Logical Effort, for sure.

  17. What is the difference between this and 1000 cores by Anonymous Coward · · Score: 1

    6 years ago at least there was a 1000 core processor made. I don't see how this is different.
    The older article:
    http://www.pcworld.com/article/215113/1000_Core_Processor_Eats_Quad_Core_CPUs_For_Lunch.html

  18. Re:The Republicans will ban this... by Anonymous Coward · · Score: 0

    They want to kill us all.

  19. Imagine it as a coprocessor by Camembert · · Score: 3, Interesting

    It could be an interesting extra chip in a general use computer, where programs could siphon off routines to it, for example certain kinds of video/image rendering, parallelizable mathematical operations, image recognition, a 1000-node neural network, etc.

    1. Re:Imagine it as a coprocessor by Arkh89 · · Score: 1

      The main problem would be the memory bandwidth then. A GPU can siphon through a lot of data because the architecture assumes that nearby threads are very likely to read contiguous data. This architecture, however, allows each core to have its own instruction queue, so it should be hard to predict which thread is going to access which portion of the memory so that we can fetch it in a single request. I fail to see how you can scale the bus/controller/etc. to match the bandwidth requirement (outside of a few dozen MB of cache at best).

      Some of the tasks you mentioned (image processing, deep learning, etc.) are already well adapted to GPUs. I doubt that this new processor will be able to beat them on this.

    2. Re:Imagine it as a coprocessor by Mashiki · · Score: 1

      HBM is the answer to your memory bandwidth issue. Especially since it allows for die stacking.

      --
      Om, nomnomnom...
    3. Re:Imagine it as a coprocessor by AmiMoJo · · Score: 2

      We have those already, in the form of modern GPUs that can do a lot of general purpose processing such as physics simulation and image recognition.

      This chip is more like the Cell processor in the PlayStation 3, with a bunch of under-powered cores that are a bugger to program and have very low performance each. I can't see it taking off because, for example, each core only has access to a tiny amount of RAM so the processing they can do will be limited mostly by memory bandwidth. A GPU gives its thousands of cores access to gigabytes of RAM.

      Maybe there is some application where extremely low power and low memory requirements exist, but I'm struggling to imagine what it is. These cores are very simple, they don't have any floating point support etc.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
  20. TIS-1000 ? by Anonymous Coward · · Score: 1

    Deducing from its primary applications, which are decoding/encoding and encryption, it seems more similar to a digital signal processor than to a regular CPU.

    1. Re:TIS-1000 ? by ZeroEpoch · · Score: 0

      Exactly! It's not a general purpose processor in the sense that it can run an OS. I wasn't involved in the latest design, but the first chip we taped out (AsAP 1) only had a register file. It didn't even have local memory like a cache or DRAM. It's very efficient for software pipelined tasks, which is more aligned with DSP workloads.

    2. Re:TIS-1000 ? by mentil · · Score: 1

      Was hoping for a TIS-100 reference. Left satisfied.

      --
      Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
  21. Raspberry Pomegranate by goombah99 · · Score: 1

    Perfect for the internet of things. Now rather than just an egg timer I can have a battery-powered supercomputer in my salt shaker that does a finite element simulation of the egg in boiling water, going beep at the perfect moment. The toaster will be able to insult me in the King's English or the emperor's Mandarin.

    And Orange Pi is planning to make a board with one of these that only runs on one of the 1000 cores, and no stable OS.

    This thing is, I suspect, suited for programs that parallelize and have little interprocess communication and run in small memory. Why do I know this? Because if each processor had a large memory and an InfiniBand backplane it would melt. Thus you could update your facebook status on all 1000 dummy accounts for example. Or compute pi or chug bitcoins.

    --
    Some drink at the fountain of knowledge. Others just gargle.
  22. Connection Machine by Anonymous Coward · · Score: 0

    Sounds like https://en.wikipedia.org/wiki/Connection_Machine

    1. Re:Connection Machine by Matthias+Wiesmann · · Score: 1

      The connection machine's processors were distributed among multiple cards, a single card contained 16 processors, the 1 bit processors were implemented using a ROM chip, I think.

    2. Re:Connection Machine by tigersha · · Score: 1

      Same thing I thought. And connection machine died because the architecture was not actually that great.

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
    3. Re:Connection Machine by Anonymous Coward · · Score: 0

      The fact that the individual CPUs were 1-bit wide didn't help its popularity any...

  23. Imagine a beowulf cluster of these! by williamyf · · Score: 1

    The 1990's called, they want their joke back!

    --
    *** Suerte a todos y Feliz dia!
    1. Re:Imagine a beowulf cluster of these! by xupere · · Score: 1

      I doubt that. Why would they want that joke back? They were probably really hoping you'd slip up and mention the past 26 Superbowl outcomes.

  24. Sadly right now limited to one instruction... by Anonymous Coward · · Score: 0

    NOP

  25. Shader units by Z80a · · Score: 1

    Aren't the shader units of the modern GPUs like the Geforces basically specialized CPUs?
    In this case we're already at 2560 CPUs on a single chip.

    1. Re:Shader units by Arkh89 · · Score: 3, Insightful

      No they are not. The threads in a modern GPU are not all free to execute different instructions. A GPU is a SIMT architecture : Single Instruction, Multiple Threads; each warp of threads (group of approx. 16 to 32 threads) will execute the same instruction at the same time on whatever data each one is holding (some threads can also be deactivated in the group, for this instruction). So the physical architecture for each of the threads in a GPU is much simpler than for the threads of this processor (because of factorization of all the instruction queue and related mechanism, much simpler synchronization, etc.).

    2. Re:Shader units by Z80a · · Score: 1

      That makes em quite bad at dealing with conditional execution, right?

    3. Re:Shader units by Arkh89 · · Score: 2

      Well, yes. But I don't think that we can say "terrible" performance for conditional execution. Very simply, if you have a condition "if(test){ ... } else { ... }", the warp (group of threads) will go in the true-block if at least one of them ticks (test==true). During this portion of the execution, the threads which did not tick are disabled and are indeed waiting. And vice-versa for the false-block. If none of the threads tick, or if they all do, then the unnecessary block will be avoided (this is what we hope to have when we write code for GPUs). But at worst, you will go through both blocks and have half of your threads doing nothing (of course, this also depends on the balance between the amount of work between each block, here I am assuming 50/50 just to keep things simple).

      Where we are severely losing performance is when we have a condition to end a loop which is different across threads in a warp. Then some of them might spend a lot of time waiting for the last one in the warp to complete.
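
      The two cases above can be put into a small cost model. This is only a sketch under simplifying assumptions: every lane in a warp advances in lockstep, each block has a fixed cycle cost, and the function names and cycle counts are hypothetical rather than taken from any real GPU.

```python
def run_warp_branch(conditions, cost_true, cost_false):
    """Return (cycles, wasted_lane_cycles) for one warp executing an if/else.

    conditions holds one boolean per lane (thread) in the warp.
    """
    n_true = sum(1 for c in conditions if c)
    n_false = len(conditions) - n_true
    cycles = 0
    wasted = 0
    if n_true:  # at least one lane takes the true-block
        cycles += cost_true
        wasted += n_false * cost_true  # lanes that failed the test sit idle
    if n_false:  # at least one lane takes the false-block
        cycles += cost_false
        wasted += n_true * cost_false
    return cycles, wasted

def run_warp_loop(trip_counts):
    """Loop with a per-lane exit condition: the warp runs until the slowest lane is done."""
    cycles = max(trip_counts)
    wasted = sum(cycles - t for t in trip_counts)
    return cycles, wasted

print(run_warp_branch([True] * 32, 10, 10))                  # uniform branch
print(run_warp_branch([True] * 16 + [False] * 16, 10, 10))   # 50/50 divergence
print(run_warp_loop([1] * 31 + [100]))                       # one straggler lane
```

      With a uniform branch the untaken block is skipped entirely; with a 50/50 split the warp pays for both blocks; and in the loop case a single long-running lane holds the other 31 idle.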

    4. Re:Shader units by Z80a · · Score: 1

      For loops that use a gradient as a reference must be completely GPU crushing.

    5. Re:Shader units by Rockoon · · Score: 1

      You mean like coherence-enhancing filters that use a structure tensor to control the shape of a blur and sharpening kernel?

      Photoshop's implementation (oil paint filter) is particularly poor in performance. I don't know why it's so terrible in performance. Maybe it's a marketing thing (if it's slow, it must be really good?)

      For image processing in particular, the fact that branching can in the worst case have a significant penalty on GPUs is moot because the worst case doesn't normally happen in practice. The data is arbitrary but it isn't random.

      --
      "His name was James Damore."
  26. in a world without ever increasing frequency by ThorGod · · Score: 2

    The way to improve computational technology is parallelism. What are the usage domains?

    -anything video related
    --games
    --image recognition

    -anything AI (I think?)
    --autonomous cars
    --facial recognition

    -a lot of physics applications

    Thoughts?

    --
    PS: I don't reply to ACs.
    1. Re:in a world without ever increasing frequency by Anonymous Coward · · Score: 0

      I think we should use high-level languages too. Compilers are good enough now that there's almost never a need to do hand-written assembly code any more.

      Also, I consider "goto" evil.

    2. Re:in a world without ever increasing frequency by angel'o'sphere · · Score: 1

      Most stuff in autonomous cars doesn't need that power.

      The stuff I was involved in runs mainly on 4 ARMs, 1 DSP, 512MB, 500MHz (not sure, might be less). But that was image processing, only for emergency braking, pedestrian recognition, sign recognition, lane detection etc.

      Additional systems like LIDAR, RADAR, ultrasonic surface tracking etc. usually run independently on a different system, but with similar low spec requirements.

      --
      Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
    3. Re:in a world without ever increasing frequency by Anonymous Coward · · Score: 0

      The way to improve computational technology is parallelism. What are the usage domains?

      -anything AI (I think?)

      -a lot of physics applications

      Thoughts?

      Sorry, you had a typo there. Should have been:

      anything AI (I think?)
      ---Thoughts!

  27. Re: What is the difference between this and 1000 co by Anonymous Coward · · Score: 0

    The major difference is that this architecture has register to register communications between cores instead of shared L2 cache. Much, much faster.

  28. Nvidia did it already by Anonymous Coward · · Score: 0

    and for a song and dance.

  29. They built a crappy GPU by Anonymous Coward · · Score: 0

    A GPU, or rather multi parallel processor by any other name since GPUs have become programmable enough to do most anything you'd want, is still a GPU. And that's all they've built, a crappy GPU. Can't wait to see what kind of programs can't run on a chip with no memory coherence, the amount of thrashing 1,000 unsynced cores could come up with as they all wait for latency bound memory access they need again and again and again. I wonder what the % of cores doing actual work over time is, 10%? 5? Less?

    When whoever builds this passes computer engineering 102 or whatever and learns what they've done then maybe they can get a job at Qualcomm or something, after they've graduated in 6 more years and actually know something.

  30. Sslloow by Billly+Gates · · Score: 1

    It only runs at 1.7ghz. My Pentium IV running XP runs at 4 GHz! Just ask any Joe Six pack who bought them over an AMD?

  31. Boring by nateman1352 · · Score: 3, Informative

    ...contains 621 million transistors... Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.

    Let's see... 1,000 very small compute cores... sounds an awful lot like your typical GP-GPU these days. Only reason the power consumption is so small is because it has < 1 billion transistors. Compare that to the 17 billion transistor nVidia pascal monster. Even the non-Iris graphics Skylake desktop CPU has ~1.7 billion, and over half of those are spent on the GPU.

    Chances are even paltry Intel HD Graphics running an OpenCL program will have more FLOPS than this thing. Don't be fooled by the flashy headline, the laws of physics still apply.

    1. Re:Boring by willy_me · · Score: 1

      While I agree this is more flash than substance, it hardly deviates from the laws of physics. Unlike the nVidia example you provided, this CPU does not have much in the way of IO bandwidth. So we are talking about minimal movement of data which in turn results in impressively low power consumption. For certain applications this could be great (a previous post mentions neural networks). For the other 99% it is worthless.

      One should not compare this CPU to a GPU because the underlying design goals are very different. It is possible that certain tasks would be much better serviced by this CPU. Designing appropriate algorithms will take some time so I suppose we will have to wait to see if it is actually useful.

    2. Re:Boring by Anonymous Coward · · Score: 0

      The mid-range Intel HD Graphics 520 has 24 EUs, and can do 8 FMAs per EU per cycle, and runs at 1050MHz, which comes to 403 GFLOPS, so it's in a broadly similar range to this chip in terms of raw computing power.

      Raw computing power isn't the only important thing for highly-parallel computing though - the more interesting problems are how you transfer data in and out, and how you coordinate all the different processing elements. GPUs are only much good for running one algorithm at a time over thousands of pieces of data in parallel, with very limited communication between each piece. KiloCore appears to generally let each processor run independent code, and is based on a scalable memory architecture that lets processors efficiently read data from nearby processors in the 2D grid, which is a more flexible system than in GPUs. It's probably not much good for graphics, but there may be other problems that fit on that architecture much better than on a GPU.
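
      For anyone checking the 403 GFLOPS figure, the arithmetic works out if each FMA is counted as two floating point operations (multiply plus add). That counting convention is an assumption here; the other numbers are the ones quoted above.

```python
# Peak-throughput arithmetic for the GPU figures quoted in the comment above.
eus = 24                    # execution units
fmas_per_eu_per_cycle = 8   # fused multiply-adds per EU per cycle
flops_per_fma = 2           # assumption: one FMA = 1 multiply + 1 add
clock_hz = 1050e6           # 1050 MHz

peak_gflops = eus * fmas_per_eu_per_cycle * flops_per_fma * clock_hz / 1e9
print(f"{peak_gflops:.1f} GFLOPS")
```

      This is a theoretical peak; it says nothing about how much of it a real workload can sustain.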

    3. Re:Boring by Anonymous Coward · · Score: 0

      Don't be so dismissive, for certain types of tasks this little thing will compare with supercomputers of yesterdecade just by the virtue of having 1000 independent cores. Different computing tasks will fit different hardware optimizations, that's why we have GPU-s and CPU-s. There will be computing tasks that fit this processing cluster like a glove and with these tasks it will outperform any CPU or GPU, it will obviously be crap if you try to run unsuitable tasks on it.
      But true enough, it's unlikely to be of much use to your average consumer.

  32. 1000 exactly by evanh · · Score: 5, Informative

    It's a 32 x 31 grid = 992, plus 8 extra stuck on one edge to make up the numbers.

    1. Re:1000 exactly by dohzer · · Score: 1

      Why not just go to 32x32 and be done with it?!

    2. Re:1000 exactly by Anonymous Coward · · Score: 0

      How often does a layout turn out to be exactly square? Perhaps you can place clusters of five in a group that is approximately rectangular and place them in a 10 by 20 grid to make it close to square.

    3. Re:1000 exactly by theskipper · · Score: 1

      So not only isn't it a KibiCPU as would be expected, but it won't be a true KiloCPU either? Calling my lawyer right now to discuss remediation options.

    4. Re:1000 exactly by jandrese · · Score: 1

      Did they previously make Hard Drives or something?

      --
      I read the internet for the articles.
    5. Re:1000 exactly by Anonymous Coward · · Score: 0

      They probably did, but then failure rate made some chips of the lattice not work...

    6. Re:1000 exactly by Ambient+Sheep · · Score: 1

      Reading the paper, it seems that they used the space that the last 24 processors would have taken to provide the 768KB of RAM:

      KiloCore’s 1000 MIMD processors are arrayed in 32 columns and 31 rows with 8 processors and 768 KB inside 12 independent memories in a 32nd row...
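
      A quick check of the layout arithmetic in that quote. It assumes the 32nd row has the same 32 processor-sized slots as the other rows and that the 12 memories are equal in size, neither of which the quoted sentence states outright.

```python
# Layout arithmetic from the quoted sentence of the paper.
columns = 32
full_rows = 31
extra_processors = 8      # processors in the partial 32nd row
shared_memory_kb = 768
memory_blocks = 12

total_processors = columns * full_rows + extra_processors
unused_slots = columns - extra_processors            # assumes 32 slots in the last row
kb_per_block = shared_memory_kb // memory_blocks     # assumes equal-sized memories

print(total_processors, unused_slots, kb_per_block)
```

      So the count is exactly 1000, the memories occupy the area of 24 would-be processors, and each memory would be 64 KB if they are all the same size.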

  33. Highly parallelized software by aepervius · · Score: 1

    Think of things like quantum mechanics, finite element analysis or weather prediction etc... Everything which is based on matrices or subsets of elements which are calculated in parallel. Although I am guessing here memory would be a bottleneck.

    --
    C. Sagan : A demon haunted world:
    http://www.amazon.com/gp/product/0345409469/
    visit randi.org
    1. Re:Highly parallelized software by Anonymous Coward · · Score: 0

      No, because this isn't about parallel computing (= where all cores are running the same code). It can run 1000 different tasks concurrently, without timeslicing.

  34. ORLY by Anonymous Coward · · Score: 0

    All I want for xmas is a new cpu without a built in NSA ME backdoor

  35. Re: Whatis the difference between this and 1000 co by Anonymous Coward · · Score: 0

    Ah. then the "The World's First 1,000-Processor Chip" hype is fully justified!
    Processors without register-to-register communications just don't count!

  36. A GPU? by gentryx · · Score: 2

    Sounds exactly like a GPU to me. :-P

    --
    Computer simulation made easy -- LibGeoDecomp
  37. All things new... by Anonymous Coward · · Score: 0

    This sounds like a warming-up of the old idea of the transputer (https://en.wikipedia.org/wiki/Transputer#System_on_a_chip).

    1. Re:All things new... by Anonymous Coward · · Score: 0

      Or the picoChip PC102 with its 308 processing elements.
      Or the Green Arrays GA144 with its 144 "computers".

  38. Windows 23 by hughbar · · Score: 2

    Will slow it down to a crawl before blue screening. Then we'll be ready for Windows 24 Home Premium Edition. No worries.

    --
    On y va, qui mal y pense!
    1. Re:Windows 23 by Anonymous Coward · · Score: 0

      I want a Home Enterprise edition FFS! To run a multi-billion dollar operation in my mom's basement.

  39. Re: Whatis the difference between this and 1000 c by Anonymous Coward · · Score: 0

    I want ass to mouth communication!!

  40. It does almost nothing very very fast by Required+Snark · · Score: 4, Informative
    If you read the two-page technical paper you will see that there is much less here than the hype suggests.

    Each CPU supplies an amount of computation less than a single instruction on a regular CPU. Think of it as a grid of instructions, not a grid of computers. A processor has a Harvard architecture with 128 instructions of 40-bit size and a separate data memory with two banks of 128 16-bit data values (256 16-bit data words total). It says nothing about register files or stacks or subroutine calls. It's likely that the two data banks are in effect the register set. The paper implies that a CPU can compute a single floating point operation in software.

    Compiling means mapping code fragments to a set of connected CPUs and routing resources, and then feeding the data into the compute array. After some circuitous path through the grid the answer emerges somewhere. There are also 12 independent memory banks, each with 64KB of SRAM, that are available to all CPUs.
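
To put the per-processor figures above in bytes, here is a rough tally. It uses only the numbers stated in this comment (128 instructions of 40 bits, two banks of 128 16-bit words, 12 shared banks of 64KB); the variable names are invented, and it assumes no other per-core storage exists, which the comment does not confirm.

```python
# Figures as stated in the comment; everything else is an assumption.
INSTRUCTION_SLOTS = 128
INSTRUCTION_BITS = 40
DATA_BANKS = 2
WORDS_PER_BANK = 128
WORD_BITS = 16
SHARED_BANKS = 12
SHARED_BANK_KB = 64

instruction_bytes = INSTRUCTION_SLOTS * INSTRUCTION_BITS // 8
data_bytes = DATA_BANKS * WORDS_PER_BANK * WORD_BITS // 8
per_core_bytes = instruction_bytes + data_bytes
shared_kb = SHARED_BANKS * SHARED_BANK_KB

print("instruction memory per core (bytes):", instruction_bytes)
print("data memory per core (bytes):", data_bytes)
print("local memory per core (bytes):", per_core_bytes)
print("shared SRAM (KB):", shared_kb)
```

The shared total agrees with the 768 KB quoted from the paper elsewhere in the thread.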

    History has not been kind to this kind of grid architecture with lots of CPUs and very little memory. Almost none of them ever made it out of the lab. It's symptomatic of hardware engineers who are clueless about software and design unprogrammable computers. They confuse aggregate theoretical throughput with useful compute resources.

    Debugging code on this would be a nightmare. It's completely asynchronous, there is no hardware to segregate different sets of CPUs doing different computing tasks, and there are so few resources per CPU that software debugging aids would crowd out the working code. The people listed on the paper should be punished by being forced to make it do useful work for at least a year. They would be scarred for life.

    --
    Why is Snark Required?
    1. Re:It does almost nothing very very fast by Anonymous Coward · · Score: 0

      It's possible they have a specific usage case in mind, though. Deep Learning springs to mind. Systolic arrays are not general purpose computers but neither are they bereft of use cases.

    2. Re:It does almost nothing very very fast by Anonymous Coward · · Score: 1

      Why? This CPU sounds like it's perfect for Erlang, which although a somewhat odd language, is nonetheless one in which a fair amount of useful software (Chef, CouchDB, Riak, RabbitMQ) has been written.

    3. Re:It does almost nothing very very fast by Anonymous Coward · · Score: 1

      This sounds like it'd be perfect for Erlang, which has been used to do some reasonably large software. RabbitMQ, CouchDB and Riak just off the top of my head. There are several companies in my town (Houston) doing large-scale Erlang work.

    4. Re:It does almost nothing very very fast by arth1 · · Score: 2

      Why? This CPU sounds like it's perfect for Erlang, which although a somewhat odd language, is nonetheless one in which a fair amount of useful software (Chef, CouchDB, Riak, RabbitMQ) has been written.

      Don't forget ejabberd, one of the most useful XMPP instant messaging servers out there.
      But really, I don't think massive parallel processing is going to cause big improvements, because the software design stops at other bottlenecks anyhow, like IO. Having a thousand cores waiting for a commit isn't going to be a heck of a lot faster than having eight cores waiting for commits.

    5. Re:It does almost nothing very very fast by Mr+Z · · Score: 1

      Ah, OK, so it is more or less the latest version of ASaP/ASaP2. I just made a post up-thread about my memory of ASaP. It looked interesting, but as you point out, it has some real practical issues.

      At the time we spoke with them, it sounded like whenever you loaded an algorithm chain, you had to map it to the specific chip you were going to run it on, even, to account for bad cores, different core speeds, etc. Each core has a local oscillator. Whee...

  41. Re:Whatis the difference between this and 1000 cor by Nyh · · Score: 1
  42. I can imagine. by Megol · · Score: 3, Interesting

    Even ignoring all other limitations of this particular processor there's still Amdahl's law, limiting the speedup by the serial parts of a task.
    As one example of how that works, look at compiling to hardware. In theory this should bring enormous benefits, as one can parallelize not only on an instruction level but on a sub-instruction one, speculating and pipelining e.g. additions. Many types of communication can be eliminated entirely by replicating hardware.
    But even with those benefits there is a _lot_ of software that is better run on a standard processor. Why? Because using custom optimized hardware to run it ends up replicating a number of normal processors, including caches, branch prediction etc., and then a processor optimized by a dedicated team of experienced people ends up being attractive.

    Not saying custom hardware can't bring huge benefits, not even saying that this research processor can't do it, _however_ in general there are a lot of tasks that can't really be accelerated much.

    1. Re:I can imagine. by Anonymous Coward · · Score: 0

      AC's Law: Tasks that you care about tend to be accelerable on any given hardware platform, given sufficient developer resources.

      This is all that really matters. It's kept me in a job for the last 21 years and it looks unlikely to be solved in the next 21.

    2. Re:I can imagine. by Megol · · Score: 1

      I'd argue that's because the assumed baseline is pretty low. Inexperienced programmers using the wrong algorithm with the wrong implementation in the wrong kind of language running on the wrong kind of platform is relatively common.

  43. Tested yet? by lapm · · Score: 1

    So has it been tested on bitcoin mining yet? Or the SETI project? I'm curious what sort of real world throughput it has...

  44. Serious computer engineers by Anonymous Coward · · Score: 0

    ... serious computer engineers don't have any trouble dealing with the irritation that systemd causes ...

    That is because serious computer engineers do not use Ubuntu

  45. What happened to that 1,000 core CPU? by Anonymous Coward · · Score: 0

    ... Like this one from 2004?

    https://tech.slashdot.org/stor... ...

    Does anyone know if there was any further development on that project?

    If that project had continued to develop over the 12 long years between 2004 and now, the result could have been awesome, particularly coupled with the upcoming 10nm node that Samsung / TSMC / Globalfoundries are busy developing.

  46. Creative licence on power usage by scdeimos · · Score: 1

    It's only 0.7W when clocked at 115MHz, but still impressive.
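
A back-of-envelope check of that operating point, using the summary's figures (115 billion instructions per second at 0.7 W, 1000 processors). The variable names are made up, and the step from instruction rate to clock rate assumes roughly one instruction per cycle per core, which neither the summary nor this comment states outright.

```python
# Figures from the story summary; names are illustrative.
INSTRUCTIONS_PER_SECOND = 115e9
POWER_WATTS = 0.7
PROCESSORS = 1000

# Average instruction rate per processor. Assuming about one instruction
# per cycle, this is also the per-core clock rate in Hz.
per_core_rate_hz = INSTRUCTIONS_PER_SECOND / PROCESSORS

joules_per_instruction = POWER_WATTS / INSTRUCTIONS_PER_SECOND
picojoules_per_instruction = joules_per_instruction * 1e12

print(f"per-core rate: {per_core_rate_hz / 1e6:.0f} MHz (if 1 instruction/cycle)")
print(f"energy per instruction: {picojoules_per_instruction:.2f} pJ")
```

The summary gives no power figure for the 1.78 trillion instructions per second peak, so the same calculation cannot be repeated for that operating point from the numbers here.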

  47. Brain by Anonymous Coward · · Score: 0

    1000 interconnected processors? Wow.
    Scale that up by a factor 100 million, and you have a human brain.

  48. FINALLY! by Lumpy · · Score: 3, Funny

    Something that will run Flash without bogging down.

    --
    Do not look at laser with remaining good eye.
    1. Re:FINALLY! by Anonymous Coward · · Score: 0

      Something that will run Flash without bogging down.

      That's a myth. It doesn't exist.

  49. obligatory Beowulf cluster by Anonymous Coward · · Score: 0

    Imagine a Beowulf cluster of these!

  50. Disappointed by Chrisq · · Score: 1

    What kind of computer scientists are they?

    They should have made it 1024. And labelled them 0-1023.

  51. Systemd on CentOS7 by DrYak · · Score: 4, Informative

    Systemd? Probably because serious computer engineers don't have any trouble dealing with the irritation that systemd causes.

    Confirming: our latest nodes on our cluster are running CentOS7 which is systemd powered.

    (And hopefully the final practical product coming out of this buzzword-compliant press release will still be somewhat useful.
    We could have some special workloads to apply it to).

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Systemd on CentOS7 by Anonymous Coward · · Score: 0

      Confirming: our latest nodes on our cluster are running CentOS7 which is systemd powered.

      Wait, what? 'systemd powered'? so not only has systemd usurped the distro, it's taken out the kernel and the CPU as well?
      well, fsck me!.

    2. Re:Systemd on CentOS7 by erapert · · Score: 1

      No, he's saying that they plug their distro into systemd instead of into a power strip plugged into the wall.

  52. Amdahl's Law by Anonymous Coward · · Score: 0

    Imagine how many mind-boggling things will become possible if this much processing power ultimately finds its way into new consumer technologies.

    Actually, not as many things as you might think. Amdahl's law says that the speedup is limited by the fraction of the algorithm that cannot be parallelized. For some operations, the task parallelizes well (hence the interest in GPU computing for some tasks). For other tasks every step requires the result of a previous step, and the effect of adding more CPUs is minimal. Web browsers fall in this last category: you can't do layout until you have some data off the network, and even then the layout mostly proceeds top to bottom of the page. You can't really finish the layout until you know how big all your images, etc. are. So yes, sure, you can benefit by adding a few processors, but very quickly you hit a point of diminishing returns.

    -JS
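
For anyone who wants numbers behind the diminishing returns, here is a small sketch of the textbook form of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the processor count. The function name and the sample fractions are arbitrary choices for illustration, not measurements of any real workload or of this chip.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup when only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Sample fractions chosen arbitrarily for illustration.
for p in (0.50, 0.90, 0.99):
    s8 = amdahl_speedup(p, 8)
    s1000 = amdahl_speedup(p, 1000)
    ceiling = 1.0 / (1.0 - p)  # limit as the processor count grows without bound
    print(f"p={p:.2f}: 8 cores {s8:.2f}x, 1000 cores {s1000:.2f}x, ceiling {ceiling:.0f}x")
```

With a 90% parallel workload, going from 8 to 1000 processors only roughly doubles the ideal speedup, because the serial 10% caps it at 10x.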

  53. Depends... by DrYak · · Score: 1

    It depends.
    In the case of Xeon-Phi (i.e.: ex-Larrabee GPUs repurposed as parallel processing units), in addition to the very wide SIMD AVX512 units, there are also scalar cores able to run pentium-compatible binaries.
    So the Linux core managing all the hardware actually runs *on* the GPU itself (and you can SSH into your Xeon-Phi if you want).

    On the other hand, the Tilera works exactly as you describe.
    A weird many-core structure running the processing kernels,
    and a nearby classical risc core managing the whole.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Depends... by Pseudonym · · Score: 1

      Yes, I should have been more clear on that. It may need a host CPU (or more than one), but there's no reason why that couldn't be on the same die.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  54. Imagine... by judoguy · · Score: 1
    a Beowulf cluster...

    Had to say it. Haven't seen that response in a while.

    --
    Peace is easy to achieve, just surrender. Liberty is much harder get/keep.
    1. Re:Imagine... by Anonymous Coward · · Score: 0

      >a Beowulf cluster...

      It's the perfect solution for a validation chip in debit and credit cards. Combined with a small built-in solar panel, some piezo power capture from sounds, some ambient r.f. capture and rectification, a flatulence-fed fuel cell, and some Peltier thermoelectric generation, the processors will continuously generate enough Bitcoin to maintain a balance due of zero.

  55. Nitpicking by DrYak · · Score: 1

    I'm nitpicking to hell with this but...

    Yes, all the *SIMD units attached to 1 execution core* will necessarily process the exact same instruction at the same time on the same cycle...
    (which from a design point of view makes complete sense: graphical processing is about repeating some processing on thousands or millions of pixels. Better to group them in batches instead of processing every last damn pixel individually) ...but there is more than 1 execution core on most higher range GPUs, and nearly all modern GPUs are able to keep several hyperthreads running concurrently to hide latencies.

    So a modern GPU can execute several different instructions at the same time.
    Even if usually it's the same exact OpenCL code uploaded to all units, the various SIMD units could be executing different points of code.

    But yeah, you're right, within a SIMD, all the threads run the same instruction.

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  56. GreenArrays by Dr.Altaica · · Score: 0

    So this is basically like GreenArrays, only with less powerful CPUs, clocked, and unavailable for purchase.
    http://www.greenarraychips.com...

  57. Nit-picking by DrYak · · Score: 2

    Nit-picking to hell...

    You've forgotten a special use case:

    Yes, if AC's code does something stupid like "every even thread branches left, every odd thread branches right", the execution group will need to run the code twice, with alternating masks to run each branch, exactly as you describe.

    But if it's entirely different parts of the thread block that diverge (e.g.: first half vs. second half), the "execution groups" will each diverge independently. The first 18 taking one branch and the second taking the other branch. With no time lost due to alternating execution masks.
    (Which is the preferable way to handle branching code in a parallel environment. If you can't do away with the branches altogether, at least try to organise it so nearby threads on the same SIMD branch/loop together.
    e.g.: bin-sort your loops by similar lengths together)
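
The difference between the two branching patterns can be shown with a toy model. This is a deliberately simplified sketch, not how any particular GPU schedules work: it assumes a lockstep group of 32 threads (a hypothetical size picked for the example) and that a group needs one masked pass per distinct branch direction taken inside it.

```python
def passes_per_group(branch_taken, group_size):
    """Masked passes each lockstep group needs: one per distinct branch direction."""
    counts = []
    for start in range(0, len(branch_taken), group_size):
        group = branch_taken[start:start + group_size]
        counts.append(len(set(group)))
    return counts


GROUP_SIZE = 32   # assumed lockstep width, for illustration only
THREADS = 64

# Even threads go one way, odd threads the other: every group diverges.
interleaved = [tid % 2 == 0 for tid in range(THREADS)]
# First half goes one way, second half the other: each group stays uniform.
blocked = [tid < THREADS // 2 for tid in range(THREADS)]

print("interleaved:", passes_per_group(interleaved, GROUP_SIZE))
print("blocked:    ", passes_per_group(blocked, GROUP_SIZE))
```

In this model the interleaved pattern costs two passes in every group, while the blocked pattern costs one pass per group, which is the point about keeping nearby threads on the same path.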

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
    1. Re:Nit-picking by xupere · · Score: 1

      Nit-picking to hell...

      The best kind of nit-picking. :)

  58. Re: EditorDavid will be fired by Anonymous Coward · · Score: 0

    Why is this modded down? Editordavid is a fucking idiot and the fact that he is still working for slashdot says something about the management.

    He has to do one thing, and that's post articles other people have already written. That means all he has to do is come up with introductory text and copy and paste the rest. I mean for fuck sakes.

    TLDR: whipslash keeps him around because he sucks a mean dick.

  59. Systemd cancer by DrYak · · Score: 1

    well, fsck me!.

    Well, fsck is also going to be handled by systemd! Systemd is cancer!!!

    No, wait, you're running the whole thing on top of BTRFS, which doesn't have a real fsck because it doesn't make any sense on copy-on-write systems! BTRFS is the cheap knock-off of ZFS!!!!

    Argh! All these memes are starting to get confusing, I don't know which one I currently need to blame!

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  60. Mod points... by DrYak · · Score: 1

    Sorry, can't "+1 Funny" you, cause I've already posted in this thread...

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]