Slashdot Mirror


Tilera To Release 100-Core Processor

angry tapir writes "Tilera has announced new general-purpose CPUs, including a 100-core chip. The two-year-old startup's Tile-GX series of chips are targeted at servers and appliances that execute Web-related functions such as indexing, Web search and video search. The Gx100 100-core chip will draw close to 55 watts of power at maximum performance."

45 of 191 comments

  1. This is great ! by ls671 · · Score: 5, Interesting

    I can't wait to see the output of :

    cat /proc/cpuinfo

    I guess we will need to use:

    cat /proc/cpuinfo | less

    When we reach 1 million cores, we will need to rearrange the output of cat /proc/cpuinfo to eliminate redundant information ;-))

    By the way, I just typed "make menuconfig" and it will let you enter a number up to 512 in the "Maximum number of CPUs" field, so the Linux kernel seems ready for up to 512 CPUs (or cores, they are handled the same way by Linux it seems) as far as I can tell by this simple test. Entering a number greater than 512 gives the "You have made an invalid entry" message ;-(

    Note: You need to turn on "Support for big SMP systems with more than 8 CPUs" flag as well.
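To illustrate the idea of trimming redundant /proc/cpuinfo output, here is a minimal Python sketch that groups identical per-CPU blocks and reports how many CPUs share each one. It assumes the x86-style layout of blank-line-separated blocks with "key : value" lines; the set of fields treated as per-CPU and the sample text are assumptions made for this example, not a complete or authoritative list.

```python
from collections import OrderedDict

# Assumption: fields that normally differ between otherwise identical CPUs.
PER_CPU_FIELDS = {"processor", "apicid", "initial apicid", "core id",
                  "cpu MHz", "bogomips"}

# Hypothetical two-CPU sample in the x86 /proc/cpuinfo layout.
SAMPLE = (
    "processor\t: 0\nmodel name\t: Example CPU\ncpu MHz\t\t: 1000.0\n"
    "\n"
    "processor\t: 1\nmodel name\t: Example CPU\ncpu MHz\t\t: 800.0\n"
)

def summarize_cpuinfo(text):
    """Return [(count, shared_fields)] for each distinct CPU description."""
    groups = OrderedDict()
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue
        fields = []
        for line in block.splitlines():
            if ":" not in line:
                continue
            key, _, value = line.partition(":")
            key = key.strip()
            if key not in PER_CPU_FIELDS:
                fields.append((key, value.strip()))
        signature = tuple(fields)
        groups[signature] = groups.get(signature, 0) + 1
    return [(count, dict(signature)) for signature, count in groups.items()]

for count, shared in summarize_cpuinfo(SAMPLE):
    print(f"{count} x {shared}")
```

On a Linux machine the same function could be fed the contents of /proc/cpuinfo instead of the sample string.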

     

    --
    Everything I write is lies, read between the lines.
    1. Re:This is great ! by MrMr · · Score: 4, Informative

      The 'stock' kernel is ready for 512 cpu's. SGI had a 2048-core single image Linux kernel six years ago.

    2. Re:This is great ! by BadAnalogyGuy · · Score: 5, Insightful

      By the way, I just typed "make menuconfig" and it will let you enter a number up to 512 in the "Maximum number of CPUs" field, so the Linux kernel seems ready for up to 512 CPUs (or cores, they are handled the same way by Linux it seems) as far as I can tell by this simple test. Entering a number greater than 512 gives the "You have made an invalid entry" message

      Whoa. If you change the source a little, you can enter 1000000 into the Maximum number of CPUs field! Linux is ready for up to a million cores.

      If you change the code a little more, when I enter a number that's too high for menuconfig, it says "We're not talking about your penis size, Holmes"

    3. Re:This is great ! by am+2k · · Score: 2, Insightful

      Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

    4. Re:This is great ! by dkf · · Score: 2, Insightful

      Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

      Building the memory backplane and communication system (assuming you're going for a cluster) to support a million CPUs is non-trivial. Without those, you'll go faster with fewer CPUs. That's why supercomputers are expensive (it's not in the processors, but in the rest of the infrastructure to support them).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    5. Re:This is great ! by Bert64 · · Score: 4, Informative

      The information in cpuinfo is only redundant like that on x86/amd64...
      On Sparc or Alpha, you get a single block of text where one of the fields means "number of cpus", example:

      cpu : TI UltraSparc IIi (Sabre)
      fpu : UltraSparc IIi integrated FPU
      prom : OBP 3.10.25 2000/01/17 21:26
      type : sun4u
      ncpus probed : 1
      ncpus active : 1
      D$ parity tl1 : 0
      I$ parity tl1 : 0
      Cpu0Bogo : 880.38
      Cpu0ClkTck : 000000001a3a4eab
      MMU Type : Spitfire

      number of cpus active and number of cpus probed (includes any which are inactive)... a million cpus wouldn't present a problem here.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    6. Re:This is great ! by fluch · · Score: 4, Informative

      Sources are always appreciated when you tell us something.

      Here is the source: http://www.kernel.org/

    7. Re:This is great ! by mrops · · Score: 2, Funny

      But the more important question is...

      Will it run Windows 7.

      I know, I know, it's the wrong question, but the answer to the other one is always "yes".

    8. Re:This is great ! by TheRaven64 · · Score: 4, Insightful

      And this is one of the reasons why Linux is such a pain to program for. If you actually want any of this information from a program, you need to parse /proc/cpuinfo. Unfortunately, every architecture decides to format this file differently, so porting from Linux/x86 to Linux/PowerPC or Linux/ARM requires you to rewrite this parser. Contrast this with *BSD, where the same information is available in sysctls, so you just fire off the one that you want (three lines of code), don't need a parser, and can use the same code on all supported architectures. For fun, try writing code that will get the current power status or number and speed of the CPUs. I've done that, and the total code for supporting NetBSD, OpenBSD, FreeBSD and Solaris on all of their supported architectures was less than the code for supporting Linux/x86 (and doesn't work on Linux/PowerPC).

      --
      I am TheRaven on Soylent News
    9. Re:This is great ! by tomhath · · Score: 2, Funny

      Whoa. If you change the source a little, you can enter 1000000 into the Maximum number of CPUs field! Linux is ready for up to a million cores.

      640K cores is more than anyone will ever need.

    10. Re:This is great ! by jellomizer · · Score: 2, Funny

      No you really need 16,711,680 cores. So you have one core for every cell in a standard Excel 2003 sheet (Yea I know 2007 finally gave us more space)

      So 65,536 rows by 255 columns. A CPU for each cell processing its own value. Excel may almost run fast.

       

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    11. Re:This is great ! by tixxit · · Score: 3, Interesting

      There is actually an entire model of computation dedicated to parallel computation: the PRAM model. Lots of nifty algorithms have already been designed for the PRAM model of computation (O(log n) sorting, for instance). What's even cooler is that some of these algorithms have given insights that have then been used to provide speed ups in the RAM model (eg. read Megiddo's "Applying Parallel Computation Algorithms in the Design of Serial Algorithms").
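As a small taste of the PRAM style, here is a sketch of a parallel prefix sum (the Hillis-Steele scan), which finishes in about log2(n) synchronous rounds given n processors. This is not the O(log n) sorting algorithm mentioned above, just a simpler example from the same model; the list comprehension in each round stands in for n processors working in lockstep, and the function name is made up for this illustration.

```python
def pram_prefix_sum(values):
    """Inclusive prefix sum, simulated as synchronous PRAM rounds.

    Returns (prefix_sums, rounds). Each round, "processor" i adds the value
    held by processor i - step in the previous round.
    """
    data = list(values)
    n = len(data)
    rounds = 0
    step = 1
    while step < n:
        # Build the new array from the old one, so every read sees the
        # previous round's values, as on a synchronous PRAM.
        data = [data[i] + data[i - step] if i >= step else data[i]
                for i in range(n)]
        step *= 2
        rounds += 1
    return data, rounds

sums, rounds = pram_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8])
print(sums, rounds)
```

For eight inputs the loop runs three rounds, where a serial prefix sum would take seven dependent additions.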

    12. Re:This is great ! by tixxit · · Score: 2, Informative

      Oops, wrong link, Megiddo's paper is here.

    13. Re:This is great ! by MobileTatsu-NJG · · Score: 2, Funny

      Give it a break, shillboy

      We've all seen more than enough paid endorsements of Microsoft's latest exercise in blandness.

      Settle down, Linus.

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    14. Re:This is great ! by san · · Score: 2, Informative

      Take a look at /sys/devices/system/cpu: it has information about cpu topology, cpu hot-swap, cache sizes and layout across cores, current power state, etc.

      It's all there, in an architecture-independent way in /sys/devices.
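For example, files such as /sys/devices/system/cpu/online hold a compact CPU list like "0-3,8-11" rather than one block of text per CPU. A minimal sketch of a parser for that list format follows; the function name is made up for this illustration, and the input strings are examples rather than output from a real machine.

```python
def parse_cpu_list(spec):
    """Expand a sysfs-style CPU list such as "0-3,8,10-11" into integers."""
    cpus = []
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus

print(len(parse_cpu_list("0-99")))      # a 100-core chip fits on one short line
print(parse_cpu_list("0-3,8,10-11"))
```

On Linux, passing the contents of /sys/devices/system/cpu/online to this function would give the online CPU numbers without any per-architecture parsing.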

  2. obligatory by wisty · · Score: 2, Funny

    ... and just imagine a Beowulf cluster of them.

    1. Re:obligatory by fractoid · · Score: 2, Insightful

      It IS a Beowulf cluster.

      Obligatory Princess Bride quote:
      Miracle Max: Go away or I'll call the brute squad!
      Fezzik: I'm ON the brute squad.
      Miracle Max: [opens door] You ARE the brute squad!

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
  3. Awfully generous with the term "core" by BadAnalogyGuy · · Score: 2, Insightful

    Yes, I suppose technically any FPGA could be considered a "core" in its own right, but it's a far cry from the CPU cores that you typically associate with the term.

    Putting a stock on a semi-automatic rifle makes it an "assault weapon", but c'mon. It's still a pea shooter.

  4. When does a CPU become the CPU? by LaurensVH · · Score: 5, Interesting

    It appears from the article that it's a new, separate architecture to which the kernel hasn't been ported yet, so these are add-on processors that can help reduce the load on the actual CPU, at least for now. So, em, two things: 1. How exactly does that work without kernel level support? They claimed having ported separate apps (MySQL, memcached, Apache), so this might suggest a generic kernel interface and userspace scheduling. 2. How does this fix the apps they ported being mostly IO bound in a lot of cases and 99% of the cores will still just be eating out of their noses?

    1. Re:When does a CPU become the CPU? by broken_chaos · · Score: 4, Interesting

      How does this fix the apps they ported being mostly IO bound in a lot of cases and 99% of the cores will still just be eating out of their noses?

      Loads and loads of RAM/cache, possibly?

    2. Re:When does a CPU become the CPU? by drspliff · · Score: 4, Informative

      The Register goes into more detail than this article, as usual.

      The Tile-Gx chips will run the Linux 2.6.26 kernel and add-on components that make it an operating system. Apache, PHP, and MySQL are being ported to the chips, and the programming tools will include the latest GCC compiler set. (Three years ago, Tilera had licensed SGI's MIPS-based C/C++ compilers for the Tile chips, which is why I think Tilera has also licensed some MIPS intellectual property to create its chip design, but the company has not discussed this.)

      So it seems pretty standard and they're using existing open & closed source MIPS toolchains, however there's still "will" and "are being" in that sentence which brings a little unease...

  5. Custom ISA? by Henriok · · Score: 4, Insightful

    Massive amounts of cores are cool and all that, but if the instruction set isn't any standard type (i.e. x86, Sparc, ARM, PowerPC or MIPS), chances are that it won't see light outside highly customized applications. Sure, Linux will probably run on it. Linux runs on anything, but it won't be put in a regular computer other than as an accelerator of some sort, like GPUs, which are massively multicore too. Intel's Larrabee though..

    --

    - Henrik

    - when the Shadows descend -
    1. Re:Custom ISA? by EsbenMoseHansen · · Score: 2, Informative

      In general, new instruction sets are mostly interesting in the custom software and the open source software areas. But the latter is quite a large chunk of the server market, so I suppose they could live with that.

      They would need to get support into gcc first, though.

      --
      Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
    2. Re:Custom ISA? by stiggle · · Score: 4, Informative

      From a quick Google - it's based on the ARM core (easily licensable cpu core)

    3. Re:Custom ISA? by complete+loony · · Score: 3, Insightful

      1. LLVM backend
      2. Grand central
      3. ???
      4. Done.

      Seriously though, this is exactly what Apple have been working towards recently in the compiler space. You write your application and explicitly break up the algorithm into little tasks that can be executed in parallel, using a syntax that is lightweight and expressive. Then your compiler tool chain and runtime JIT manages the runtime threads and determines which processor is best equipped to run each task. It might run on the normal CPU, or it might run on the graphics card.
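The "break the algorithm into little tasks and let the runtime schedule them" pattern can be sketched in Python with the standard concurrent.futures module. This is a stand-in for illustration only, not Grand Central Dispatch or LLVM; the prime-counting workload, the function names, and the task and worker counts are all assumptions chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def count_primes(low, high):
    """Count primes in [low, high) by trial division (one small task)."""
    count = 0
    for n in range(max(low, 2), high):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def parallel_count_primes(limit, tasks=8, workers=4):
    """Split [0, limit) into independent tasks and let a pool schedule them."""
    if limit <= 0:
        return 0
    step = -(-limit // tasks)  # ceiling division, so step is at least 1
    ranges = [(low, min(low + step, limit)) for low in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: count_primes(*r), ranges))

print(parallel_count_primes(100))
```

In standard CPython, threads will not speed up CPU-bound work like this because of the global interpreter lock; the point here is the structure, where the caller describes independent tasks and the pool decides where and when each one runs.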

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    4. Re:Custom ISA? by ForeverFaithless · · Score: 5, Informative

      Wikipedia claims it's a MIPS-derived VLIW instruction set.

      --
      Mark Kretschmann - Amarok Developer, KDE Member
    5. Re:Custom ISA? by Narishma · · Score: 3, Informative

      Why was this modded Informative? Can we have any links? Because both the article here as well as Wikipedia and an old Ars Technica story claim that it's based on MIPS.

      --
      Mada mada dane.
  6. 100? by nmg196 · · Score: 2, Insightful

    Wouldn't it have been better to make it a power of 2? Some work is more easily divided when you can just keep halving it. 64 or 128 would have been more logical I would have thought. I'm not an SMP programmer though, so perhaps it doesn't make any difference.

    1. Re:100? by Fotograf · · Score: 5, Funny

      it does if you are carefully starting applications in power of two and designing your applications to use power of two threads.

      --
      God's gift to chicks
    2. Re:100? by harry666t · · Score: 4, Informative

      SMP FAQ.

      Q: Does the number of processors in a SMP system need to be a power of two/divisible by two?

      A: No.

      Q: Does the number of processors in a SMP system...

      A: Any number of CPUs/cores that is larger than one will make the system an SMP system*.

      (* except when it's an asymmetrical architecture)

      Q: How do these patterns (power of 2, divisible by 2, etc) of numbers of cores affect performance?

      A: Performance depends on the architecture of the system. You cannot judge by simply looking at the number of cores, just as you can't simply look at MHz.
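To make the "No" above concrete, here is a minimal sketch of dividing work across a core count that is not a power of two. It spreads the remainder one item at a time, so no worker ever gets more than one item above any other; the function name and the sizes used are chosen only for this example.

```python
def split_evenly(n_items, n_workers):
    """Return (start, end) index ranges, one per worker, sizes differing by at most 1."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        size = base + (1 if worker < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

chunks = split_evenly(1024, 100)
sizes = [end - start for start, end in chunks]
print(min(sizes), max(sizes))
```

Even a power-of-two workload of 1024 items lands on 100 cores with chunks of 10 or 11 items, so repeated halving is not required for an even split.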

    3. Re:100? by glwtta · · Score: 5, Funny

      Their plan is to eventually confuse consumers by advertising "X KiloCores! (* KC = 1000 cores)" when everyone expects a KiloCore to be 1024 cores.

      --
      sic transit gloria mundi
  7. FreeBSD and GCD by MacTechnic · · Score: 3, Interesting

    Although I don't expect Apple to release an Apple Server edition with a Tilera multicore processor, I would be interested to see a version of FreeBSD running with Grand Central Dispatch on a Tilera multicore chip. It would give a good idea of how effective GCD would be in allocating cores for execution. Any machine with 100 cores must have a considerable amount of RAM, perhaps 8GB+, even with large caches.

    Apple has been very active in developing LLVM compilers, and has recently added the Clang front end, instead of GCC. I don't think Apple has open sourced all their work yet, but check llvm.org for the current details. The real trick is breaking any algorithm into blocks and using OpenCL to organize your code for execution. I mean, how different is a 100 core multi-CPU chip from a multicore GPU accelerator?

  8. Allow ia64 to CONFIG_NR_CPUS up to 4096 by foobsr · · Score: 4, Informative
    --
    TaijiQuan (Huang, 5 loosenings)
  9. Been there, done that, got the T-Shirt... by Anonymous Coward · · Score: 5, Interesting

    OK, so big disclaimer: I work for Sun (not Oracle, yet!)

    The Sun Niagara T1 chip came out over 3 years ago, and it did 32 threads on 8 cores.
    And drew something around 50W (200W for a fully-loaded server). And under $4k.

    The T2 systems came out last year, do 64 threads/CPU for a similar power budget. And even less $/thread.

    The T3 systems likely will be out next year (I don't know specifically when, I'm not In The Know), and the threads/chip should double again, with little power increase.

    Of course, per-thread performance isn't equal to anything like a modern "standard" CPU. Though, it's now "good enough" for most stuff - the T2 systems have a per-thread performance equal to about the old Pentium3 chips. I would be flabbergasted if this GX chip had a per-core performance anywhere near that.

    I'm not sure how Intel's Larrabee is going to show (it's still nowhere near release), but the T-series chips from Sun are cheap, open, and available now. And they run Solaris AND Linux. So unless this new GX chip is radically more efficient/higher-performance/less costly, I don't see this company making any impact.

    -Erik

  10. It would be clever by rbanffy · · Score: 2, Insightful

    Since a) developing a processor is insanely expensive and b) they need it to run lots of software ASAP, it would be very clever if they spent a marginal part of the overall development costs in making sure every key Linux and *BSD kernel developer gets some hardware they can use to port the stuff over. Make it a nice desktop workstation with cool graphics and it will happen even faster.

    They are going up against Intel... The traditional approach (delivering a faster processor with a better power consumption at a lower price) simply will not work here.

    I think Movidis taught us a lesson a couple years back. Users will not move away from x86 for anything less than a spectacular improvement. Even the Niagara SPARC servers are a hard sell these days...

  11. Re:What ISA? by Narishma · · Score: 2, Informative

    No, they are derived from the MIPS architecture.

    --
    Mada mada dane.
  12. hmm... by Skizmo · · Score: 2, Funny

    100 cores... that means that my cpu will never go beyond '1% busy'

  13. Yep by Sycraft-fu · · Score: 5, Informative

    Unfortunately these days the meaning of supercomputer gets a bit diluted by many people calling clusters "supercomputers". They aren't really. As you noted what makes a supercomputer "super" isn't the number of processors, it is the rest, in particular the interconnects. Were this not the case, you could simply use cheaper clusters.

    So why does it matter? Well, certain kinds of problems can't be solved by a cluster, just as certain ones can. To help understand how that might work, take something more people are familiar with like the difference between a cluster and just a bunch of computers on the Internet.

    Some problems are extremely bandwidth non-intensive. They don't need any inter-node communication, and very little communication with the head node. A good example would be the Mersenne Prime Search, or Distributed.net. The problem is extremely small; the structure of the program is larger than the data itself. All the head node has to do is hand out ranges for clients to work on, and the clients only need to report the results, affirmative or negative. As such, it is something suited to work over the Internet. The nodes can be low bandwidth, they can drop out of communication for periods of time and it all works fine. Running on a cluster would gain you no speed over the same group of computers on modems.

    However the same is not true for video rendering. You have a series of movie files you wish to composite into a final production, with effects and so on. This sort of work is suited to a cluster. While the nodes can work independently (the work of one node doesn't depend on the others), they do require a lot of communication with the head node. The problem is very large; the video data can be terabytes. The result is also not small. So you can do it on many computers, but the bandwidth needs to be pretty high, with low latency. Gigabit Ethernet is likely what you are looking at. Trying to do it over the Internet, even broadband, would waste more time in data transfer than you'd gain in processing. You need a cluster.

    Ok well supercomputers are the next level of that. What happens when you have a problem where you DO have a lot of inter-node communication? The results of the calculations on one node are influenced by the results on all others. This happens in things like physics simulations. In this case, a cluster can't handle it. You can slam your bandwidth but worse, you have too much latency. You spend all your time waiting on data, and thus computation speed isn't any faster.

    For that, you need a supercomputer. You need something where nodes can directly access the memory of other nodes. It isn't quite as fast as local memory access, but nearly. Basically you want them to play like they are all the same physical system.

    That's what separates a true supercomputer from a big cluster. You can have lots of CPUs and that's wonderful, there are a lot of problems you can solve on that. However that isn't a supercomputer unless the communication between nodes is there.
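The three cases above (Internet-distributed, cluster, supercomputer) can be compared with a deliberately crude cost model: per-node compute time, plus time to move data, plus time spent waiting on message latency. Everything in this sketch is an assumption made for illustration: the function name, the parameters, the figures in the usage lines, and the simplification that communication never overlaps with computation.

```python
def estimated_runtime(compute_seconds, nodes, bytes_moved_per_node,
                      bandwidth_bytes_per_s, messages_per_node, latency_s):
    """Toy estimate of wall-clock time for a job spread over `nodes` machines."""
    compute = compute_seconds / nodes                         # perfectly divisible work
    transfer = bytes_moved_per_node / bandwidth_bytes_per_s   # bandwidth-bound part
    waiting = messages_per_node * latency_s                   # latency-bound part
    return compute + transfer + waiting

# Hypothetical figures: 1000 s of work on 100 nodes, a million small
# messages per node, and two different interconnect latencies.
print(estimated_runtime(1000, 100, 0, 1e9, 1_000_000, 1e-4))
print(estimated_runtime(1000, 100, 0, 1e9, 1_000_000, 1e-6))
```

With many small messages per node, the latency term swamps the compute term on the slower link, which is the situation described above for tightly coupled simulations.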

    1. Re:Yep by afidel · · Score: 2, Interesting

      10Gb ethernet is fairly low latency and obviously has plenty of bandwidth, using remoteDMA you can get pretty damn good results. Obviously if latency is your #1 performance blocker then it's not going to produce the fastest results, but you can still get good results out of a fairly inexpensive cluster using 10Gb fat trees for most workloads. Basically commodity computing technology has shrunk the gap between what can be done on a moderate sized commodity cluster and what can be done on a purpose built supercomputer, the result being what has happened to Cray and SGI.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  14. looks like by nimbius · · Score: 3, Funny

    /proc/cpuinfo will become a small book. on the bright side, i guarantee 100 cores meets the draft requirements for 'windows 8 capable' status.

    --
    Good people go to bed earlier.
  15. 15-bladed shaving razor by kannibul · · Score: 2, Interesting

    For some reason, I read this article and immediately thought about a 15-bladed shaving razor... My point being that 100 cores, while it sounds impressive, you get a diminished return after a few cores. Even if software was written for multi-core use (and not enough of it is, IMO), you still can't possibly, effectively, use 100 cores...not before this processor is already extinct due to technological progress. Even my quad core Intel CPU hardly uses all 4 cores...and most commonly hits CPU1 for processes.

    1. Re:15-bladed shaving razor by cpghost · · Score: 2, Insightful

      My point being that 100 cores, while it sounds impressive, you get a diminished return after a few cores.

      Yes, indeed. The memory bus is usually the bottleneck here... unless you switch from SMP to NUMA architecture, which seems necessary for anything with more than, say, 8 to 16 cores.
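A different and standard way to put numbers on diminishing returns is Amdahl's law, which bounds speedup by the fraction of work that stays serial. Note that this is a different model from the memory-bus argument above: it says nothing about buses or NUMA, and the 95% parallel fraction used below is an arbitrary assumption for illustration.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Assumed workload: 95% of the runtime parallelizes perfectly.
for cores in (4, 16, 100):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

Under that assumption, 100 cores give a speedup of roughly 16.8, and no number of cores can exceed 20, since the serial 5% never gets faster.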

      --
      cpghost at Cordula's Web.
  16. asymmetric by TheSHAD0W · · Score: 2, Interesting

    It's been reported that these cores will be relatively underpowered, though both the total processing power and cost per watt will be quite impressive. This makes the chip appropriate for putting in a server but not so much a desktop machine, where CPU-intensive single-threads may bog things down.

    So what about one of these in combination with a 2-, 3- or 4-core AMD/Intel chip? The serious threads can be run on the faster chip, while all the background stuff can be spread among the slower cores? Does Windows have the ability to prioritize like that? Does Linux?
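On the Linux side, one building block for this is per-process CPU affinity, which Python exposes as os.sched_getaffinity and os.sched_setaffinity on platforms that support it. The sketch below only restricts the current process to a chosen set of CPUs; deciding which CPUs are the "fast" ones on a mixed system is left out, and the helper name is made up for this example.

```python
import os

def restrict_to_cpus(wanted):
    """Pin the current process to the allowed CPUs that appear in `wanted`.

    Returns the resulting affinity set, or None where the platform does not
    provide sched_setaffinity (for example, outside Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # 0 means "the calling process"
    target = allowed & set(wanted)
    if not target:
        return allowed                  # nothing usable requested; leave as is
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)

# Hypothetical use: keep a heavy single-threaded job on CPUs 0 and 1.
print(restrict_to_cpus({0, 1}))
```

A scheduler policy that automatically steers heavy threads to faster cores is a separate matter; affinity only lets a program or administrator make that choice by hand.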

  17. Dancing Hamsters... by jameskojiro · · Score: 2, Funny

    It is like 100 Dancing Hamsters in your CPU.

    --
    Tsukasa: All I really want, is to be left alone...
  18. why not go to the source? by slew · · Score: 2, Informative

    The company website claims...

      64-bit VLIW processors with 64-bit instruction bundle
      3-deep pipeline with up to 3 instructions per cycle

    I don't know how this could be considered ARM or MIPS-derived...

    A better description might have been in this article...

    The Tile64 is based on a proprietary VLIW (very long instruction word) architecture, on which a MIPS-like RISC architecture is implemented in microcode. A hypervisor enables each core to run its own instance of Linux, or alternatively the whole chip can run Tilera's 64-way SMP (symmetrical multiprocessing) Linux implementation.