Slashdot Mirror


Tilera To Release 100-Core Processor

angry tapir writes "Tilera has announced new general-purpose CPUs, including a 100-core chip. The two-year-old startup's Tile-GX series of chips are targeted at servers and appliances that execute Web-related functions such as indexing, Web search and video search. The Gx100 100-core chip will draw close to 55 watts of power at maximum performance."

191 comments

  1. This is great ! by ls671 · · Score: 5, Interesting

    I can't wait to see the output of :

    cat /proc/cpuinfo

    I guess we will need to use:

    cat /proc/cpuinfo | less

    When we reach 1 million cores, we will need to rearrange the output of cat /proc/cpuinfo to eliminate redundant information ;-))

    By the way I just typed "make menuconfig" and it will let you enter a number up to 512 in the "Maximum number of CPUs" field, so the Linux kernel seems ready for up to 512 CPUs (or cores, they are handled the same way by Linux it seems) as far as I can tell by this simple test. Entering a number greater than 512 gives the "You have made an invalid entry" message ;-(

    Note: You need to turn on "Support for big SMP systems with more than 8 CPUs" flag as well.

     

    --
    Everything I write is lies, read between the lines.
    1. Re:This is great ! by MrMr · · Score: 4, Informative

      The 'stock' kernel is ready for 512 cpu's. SGI had a 2048-core single image Linux kernel six years ago.

    2. Re:This is great ! by BadAnalogyGuy · · Score: 5, Insightful

      By the way I just typed "make menuconfig" and it will let you enter a number up to 512 in the "Maximum number of CPUs" field, so the Linux kernel seems ready for up to 512 CPUs (or cores, they are handled the same way by Linux it seems) as far as I can tell by this simple test. Entering a number greater than 512 gives the "You have made an invalid entry" message

      Whoa. If you change the source a little, you can enter 1000000 into the Maximum number of CPUs field! Linux is ready for up to a million cores.

      If you change the code a little more, when I enter a number that's too high for menuconfig, it says "We're not talking about your penis size, Holmes"

    3. Re:This is great ! by ls671 · · Score: 0

      Whoa ! When I change apache httpd server code, it says "Microsoft IIS server" or anything I want when I type "httpd -v". I guess it's the same for anything for which you have the source code ;-))

      More seriously, do you have any reference for "Linux is ready for up to a million cores" ?

      Sources are always appreciated when you tell us something. I googled a little without finding anything on what you are talking about.

      Thanks !

      --
      Everything I write is lies, read between the lines.
    4. Re:This is great ! by Trepidity · · Score: 1

      And if you change the code a little more, it takes single-threaded tasks and automatically finds an efficient parallelization of them, distributing the work out to those million cores!

    5. Re:This is great ! by am+2k · · Score: 2, Insightful

      Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

    6. Re:This is great ! by Trepidity · · Score: 1

      Yes, but taking an arbitrary single-threaded algorithm and automatically figuring out what the parallelization is is the hard part. =]

    7. Re:This is great ! by dkf · · Score: 2, Insightful

      Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

      Building the memory backplane and communication system (assuming you're going for a cluster) to support a million CPUs is non-trivial. Without those, you'll go faster with fewer CPUs. That's why supercomputers are expensive (it's not in the processors, but in the rest of the infrastructure to support them).

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    8. Re:This is great ! by am+2k · · Score: 1

      Well, you could analyze the data dependencies and put them into a dependency graph, and then figure out what can be parallelized without having too much synchronization overhead. However, that's probably something for a theoretical scientific paper, and I'd be surprised if you could parallelize most algorithms to split into more threads than you could count on one hand...

      As soon as you're doing linear I/O (like network access), you've hit a barrier anyways.
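As a rough illustration of the dependency-graph idea in the comment above, here is a minimal Python sketch that groups tasks into stages whose members do not depend on each other. The task names and the `parallel_stages` helper are made up for illustration; this is not an automatic parallelizer, only the "what could run at the same time" step, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def parallel_stages(deps):
    """Group tasks into stages; tasks within one stage have no
    dependencies on each other and could run concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished to unlock successors
    return stages


# Hypothetical task graph: each key depends on the tasks in its set.
deps = {
    "load": set(),
    "parse_a": {"load"},
    "parse_b": {"load"},
    "merge": {"parse_a", "parse_b"},
    "report": {"merge"},
}

stages = parallel_stages(deps)
print(stages)
```

In this toy graph the widest stage holds only two tasks, which is in line with the commenter's guess that many ordinary programs would not split into more than a handful of threads.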

    9. Re:This is great ! by BadAnalogyGuy · · Score: 1

      More seriously, do you have any reference for "Linux is ready for up to a million cores" ?

      There was an article on Wikipedia that said so. And my local copy of the Linux kernel source has a comment that says so.

    10. Re:This is great ! by Bert64 · · Score: 4, Informative

      The information in cpuinfo is only redundant like that on x86/amd64...
      On Sparc or Alpha, you get a single block of text where one of the fields means "number of cpus", example:

      cpu : TI UltraSparc IIi (Sabre)
      fpu : UltraSparc IIi integrated FPU
      prom : OBP 3.10.25 2000/01/17 21:26
      type : sun4u
      ncpus probed : 1
      ncpus active : 1
      D$ parity tl1 : 0
      I$ parity tl1 : 0
      Cpu0Bogo : 880.38
      Cpu0ClkTck : 000000001a3a4eab
      MMU Type : Spitfire

      number of cpus active and number of cpus probed (includes any which are inactive)... a million cpus wouldn't present a problem here.

      --
      http://spamdecoy.net - free throwaway anonymous email - avoid spam!
    11. Re:This is great ! by rbanffy · · Score: 1

      "More seriously, do you have any reference for "Linux is ready for up to a million cores" ?"

      SGI has 4096-core monsters, as MrMr pointed out.

      Do you have a million-core machine we can use to invalidate this hypothesis?

    12. Re:This is great ! by fluch · · Score: 4, Informative

      Sources are always appreciated when you tell us something.

      Here is the source: http://www.kernel.org/

    13. Re:This is great ! by Anonymous Coward · · Score: 0

      cat /proc/cpuinfo | less

      I guess you can also use

      less /proc/cpuinfo

    14. Re:This is great ! by 1s44c · · Score: 1

      cat /proc/cpuinfo | less

      That gets modded interesting these days? The use of a pipe?

      If that's not too basic to be considered interesting then moderators have got an odd idea about what interesting actually means.

    15. Re:This is great ! by Anonymous Coward · · Score: 0

      Could you send the patch?

    16. Re:This is great ! by RichardJenkins · · Score: 1

      Pipes: Not just for hitting any more.

    17. Re:This is great ! by ls671 · · Score: 1

      Nah! I am lazy... when I realize the file is too big, it is faster for me to add the pipe at the end of the line than to edit the beginning of the line ... ;-)

      --
      Everything I write is lies, read between the lines.
    18. Re:This is great ! by glgraca · · Score: 1

      When we reach 1 million cores, we'll probably be able to ask the computer what's on his mind...

    19. Re:This is great ! by somersault · · Score: 1

      that's what the 'home' key is for :p

      --
      which is totally what she said
    20. Re:This is great ! by ls671 · · Score: 1

      Come on! Quit being such a tough guy and let us know where it says so...

      grep -r "1,000,000" /usr/src/linux
      /usr/src/linux/drivers/net/qlge/qlge_ethtool.c: * We do this by using a basic thoughput of 1,000,000 frames per
      /usr/src/linux/kernel/cpuset.c: * per msec it maxes out at values just under 1,000,000. At constant

      grep -ri "one million" /usr/src/linux
      /usr/src/linux/arch/x86/math-emu/README:found at a rate of 133 times per one million measurements for fsin.
      /usr/src/linux/arch/x86/math-emu/README:was obtained per one million arguments. For three of the instructions,

      --
      Everything I write is lies, read between the lines.
    21. Re:This is great ! by mrops · · Score: 2, Funny

      But the more important question is...

      Will it run Windows 7.

      I know, I know, it's the wrong question, but the answer to the other one is always "yes".

    22. Re:This is great ! by xtracto · · Score: 1

      Actually, some algorithms (like fluid simulation and a very large neural net) are not that hard to parallelize to run on a million cores.

      That may be because "Fluid simulation" can be done with a particle system, where each particle can be controlled by one core.

      Similarly when developing artificial neural networks you could potentially put one "artificial neuron" in each core.

      Another interesting distributed system paradigm is multi-agent systems. You could potentially put one "agent" (small program) in each core, with well defined rules of interaction and processing.

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    23. Re:This is great ! by TheRaven64 · · Score: 1

      Those 4096 core SGI machines are clusters of 4-core machines with a very fast interconnect. Each cluster node runs its own local software with some quite evil stuff (custom memory controller and some extra logic in the VM subsystem for cache coherency across nodes) to handle distributed shared memory and process migration. These are not SMP machines and, although most of the relevant code is in the mainstream kernel sources, it is so tied to SGI's architecture that it is almost completely useless from the point of view of supporting other architectures. Compare this to something like a 64-processor Sun machine, which really is an SMP machine and you get very different performance characteristics.

      When people describe them as single system image, they mean that they appear to userspace as being single machines, not that they are running a single instance of the kernel.

      --
      I am TheRaven on Soylent News
    24. Re:This is great ! by TheRaven64 · · Score: 4, Insightful

      And this is one of the reasons why Linux is such a pain to program for. If you actually want any of this information from a program, you need to parse /proc/cpuinfo. Unfortunately, every architecture decides to format this file differently, so porting from Linux/x86 to Linux/PowerPC or Linux/ARM requires you to rewrite this parser. Contrast this with *BSD, where the same information is available in sysctls, so you just fire off the one that you want (three lines of code), don't need a parser, and can use the same code on all supported architectures. For fun, try writing code that will get the current power status or number and speed of the CPUs. I've done that, and the total code for supporting NetBSD, OpenBSD, FreeBSD and Solaris on all of their supported architectures was less than the code for supporting Linux/x86 (and doesn't work on Linux/PowerPC).

      --
      I am TheRaven on Soylent News
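To make the portability complaint above concrete, here is a hedged Python sketch of what a "count the CPUs" parser has to cope with. It handles only two layouts: the single-summary Sparc style quoted earlier in this thread, and the one-stanza-per-CPU x86 style. The `count_cpus` helper and the x86 sample text are illustrative assumptions; other architectures use yet other field names and would need their own branches.

```python
import re


def count_cpus(cpuinfo_text):
    """Best-effort CPU count from /proc/cpuinfo-style text."""
    # Sparc-style layout: one summary field for the whole machine.
    match = re.search(r"^ncpus active\s*:\s*(\d+)", cpuinfo_text, re.MULTILINE)
    if match:
        return int(match.group(1))
    # x86-style layout: one "processor : N" stanza per logical CPU.
    return len(re.findall(r"^processor\s*:\s*\d+", cpuinfo_text, re.MULTILINE))


# Abbreviated from the Sparc output quoted earlier in the thread.
SPARC_SAMPLE = """cpu : TI UltraSparc IIi (Sabre)
type : sun4u
ncpus probed : 1
ncpus active : 1
"""

# Hypothetical two-CPU x86-style text, trimmed to the relevant fields.
X86_SAMPLE = """processor : 0
model name : hypothetical CPU
processor : 1
model name : hypothetical CPU
"""

print(count_cpus(SPARC_SAMPLE), count_cpus(X86_SAMPLE))
```

Every additional architecture means another branch like these, which is the maintenance cost the comment is describing.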
    25. Re:This is great ! by TheRaven64 · · Score: 1

      It's interesting that even in 2009 on a site for geeks, many people seem not to know about cat abuse and would still rather spawn two processes to do the job of one.

      --
      I am TheRaven on Soylent News
    26. Re:This is great ! by Anonymous Coward · · Score: 0

      Get with the times - was doing that a year ago with psrinfo on a Sun T5240 (128 threads). Have not got my hands on a T5440 yet though... 256 threads.

    27. Re:This is great ! by asaul · · Score: 1

      How many cores does it take to run a parallel algorithm?

      100 - 1 to do the processing, 1 to fetch the data and 98 to calculate an efficient way to make the whole thing run in parallel.

      --
      "If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
    28. Re:This is great ! by tomhath · · Score: 2, Funny

      Whoa. If you change the source a little, you can enter 1000000 into the Maximum number of CPUs field! Linux is ready for up to a million cores.

      640K cores is more than anyone will ever need.

    29. Re:This is great ! by jellomizer · · Score: 2, Funny

      No you really need 16,711,680 cores. So you have one core for every cell in a standard Excel 2003 sheet (Yea I know 2007 finally gave us more space)

      So 65,536 rows by 255 columns. A CPU for each cell processing its own value. Excel may almost run fast.

       

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    30. Re:This is great ! by Anonymous Coward · · Score: 0
      Give it a break, shillboy

      We've all seen more than enough paid endorsements of Microsoft's latest exercise in blandness.

    31. Re:This is great ! by C0vardeAn0nim0 · · Score: 1

      because in 2009 CPU power and memory are cheaper than dirt. or didn't you notice we're discussing a 100-core CPU ?

      with capacities like that, even firing MS word to edit a plain text file, instead of notepad, is not too costly anymore... and no, i won't apologize, say i was kidding or any other shenanigans. i really mean it.

      --
      What ? Me, worry ?
    32. Re:This is great ! by Anonymous Coward · · Score: 0

      OK, if one penguin for every core is displayed when booting... then your screen would be filled with hundreds of them, just like some antarctic islands!

    33. Re:This is great ! by Sancho · · Score: 1

      I think that "Useless Use of cat" is funny. I really do. I go back and read it every once in a while just for grins.

      But we're in the future, now. Spawning that extra process isn't going to hurt anything. Yeah, it's fun to poke at people who do silly things like that, but in reality, there's rarely harm in doing things this way. Even if you're using a shell script which will run "cat file | grep" over and over, you're probably not going to start thrashing on a modern CPU.

    34. Re:This is great ! by TheRaven64 · · Score: 1

      And then when you use this same pattern in a concurrent find operation, and you end up with 2,000 processes running instead of 1,000, and each read operation being turned into a read, copy, write, read, sequence (which is what happens if you use cat like this), then it's still a good idea?

      No matter how fast computers become, a complex and slow solution to a problem is never better than a simple and slow one. At the very least, typing 'cat /proc/cpuinfo | less' takes more time than typing 'less /proc/cpuinfo' and this time difference is based on the user, not the CPU, and so the speed of the computer is irrelevant.

      A fast CPU doesn't excuse a bad algorithm. You don't get to 1GHz, and then say 'well, CPUs are fast now, let's use bubblesort instead of quicksort.'

      --
      I am TheRaven on Soylent News
    35. Re:This is great ! by dirtyhippie · · Score: 1

      I know 2007 finally gave us more space

      My god... You're one of THEM!!!!

    36. Re:This is great ! by TheRaven64 · · Score: 1

      First, and importantly, it is more to type. Getting into the habit of doing more work than you need to is never a good idea.

      Secondly, it is a much bigger overhead than you might think. With 'less {file}' the less process just reads the data directly. The kernel copies it out of the VM cache and into the process's buffer. Sometimes it doesn't even do that. Both less and grep will sometimes use mmap(). In that case, the kernel just updates the page tables and the data is never copied; it's just DMA'd from the disk to the process's address space. This doesn't matter for small files, but try running grep with a 200MB file and you'll see the difference even on a fast computer.

      If you use cat, grep / less / whatever can't use mmap(); they have to use read(). What actually happens is that cat reads from the kernel. The kernel copies from the page cache into cat's buffer. Cat then writes to the pipe, and the kernel copies the data from the buffer into the pipe's buffer. Next, the tool reads from the pipe, and the kernel copies the data from the pipe to the tool's buffer. You have now copied the data three times, had one context switch and three system calls for every system call that you would have had if the tool had just read the file (which gives worse performance than using mmap() on big files, but better on small ones).

      How long does it take a modern CPU to copy 200MB of data three times between three address spaces (from the kernel to cat, from cat to the kernel, from the kernel to grep)? Not very long, but still a significant fraction of the time that it takes to do a simple string match within that data. Do that in a loop, and you're going to be causing a lot of churn in the VM subsystem.

      Basically, your post is equivalent to advocating writing your own bubblesort implementation, because it's fast enough on small data sets with modern processors, rather than using the system-provided quicksort function. It's a bad habit, and the fact that it isn't too bad in certain situations doesn't mean it's something that should be encouraged.

      --
      I am TheRaven on Soylent News
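The two direct access paths contrasted above (mmap() versus read()) can be sketched in Python as follows. This is only an illustration of the code paths, not a benchmark: the helper names and the temporary test file are made up, and the piped case described in the comment would add further copies on top of the read() variant.

```python
import mmap
import os
import tempfile


def count_mmap(path, needle):
    """Count matches by mapping the file; the data is not read() into a buffer."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        count = 0
        pos = m.find(needle)
        while pos != -1:
            count += 1
            pos = m.find(needle, pos + len(needle))
        return count


def count_read(path, needle):
    """Count matches after copying the whole file into a bytes object via read()."""
    with open(path, "rb") as f:
        return f.read().count(needle)


# Build a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hay hay needle hay\n" * 1000)
    path = tmp.name

try:
    mmap_hits = count_mmap(path, b"needle")
    read_hits = count_read(path, b"needle")
finally:
    os.remove(path)

print(mmap_hits, read_hits)
```

Both functions return the same count; the difference the comment cares about is how many times the bytes are copied along the way.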
    37. Re:This is great ! by Sancho · · Score: 1

      It's not less to type once you've already typed "cat /proc/cpuinfo" and then realized "dangit, I have to paginate that."

      Basically, your post is equivalent to advocating writing your own bubblesort implementation, because it's fast enough on small data sets with modern processors, rather than using the system-provided quicksort function. It's a bad habit, and the fact that it isn't too bad in certain situations doesn't mean it's something that should be encouraged.

      It's like using system-implemented bubblesort over system-implemented quicksort because you're used to typing bubblesort. When you realize that you actually need something faster, you can switch. You're advocating Premature Optimization, which Knuth warns against.

    38. Re:This is great ! by tixxit · · Score: 3, Interesting

      There is actually an entire model of computation dedicated to parallel computation: the PRAM model. Lots of nifty algorithms have already been designed for the PRAM model of computation (O(log n) sorting, for instance). What's even cooler is that some of these algorithms have given insights that have then been used to provide speed ups in the RAM model (eg. read Megiddo's "Applying Parallel Computation Algorithms in the Design of Serial Algorithms").
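For readers unfamiliar with the PRAM style of reasoning, here is a small Python sketch that simulates a pairwise (tree) reduction and counts the rounds. It is a deliberately simpler stand-in for the O(log n) sorting algorithms mentioned above, not one of them, and the `tree_sum` name is made up. The point is only that each round's additions are independent, so with enough processors a round costs one step and the number of rounds grows like log2(n).

```python
def tree_sum(values):
    """Sum values by pairwise combination; returns (total, rounds).

    All additions inside one round are independent of each other, so a
    PRAM with enough processors could perform a whole round in one step.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; these pairs do not interact.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
        rounds += 1
    return (level[0] if level else 0), rounds


total, rounds = tree_sum(range(1, 17))
print(total, rounds)
```

Sixteen inputs need four rounds here, versus fifteen sequential additions on a single processor.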

    39. Re:This is great ! by jefu · · Score: 1

      Good point. But since it wouldn't be hard to add this to /sys, (and I see some of that info already there) I suspect that nobody has really needed it in that format yet. Also, if you're going to get more than a couple pieces of that, /proc/cpuinfo has it nicely in one place and is far from hard to parse.

    40. Re:This is great ! by TheRaven64 · · Score: 1

      Choosing good algorithms is not premature optimization. It is only premature if the optimisation comes at the expense of readability or maintainability. Picking a complex and slow solution when there is a simple and fast solution is never the right thing to do. Choosing not to intentionally do the wrong thing is not optimisation, it is good practice.

      --
      I am TheRaven on Soylent News
    41. Re:This is great ! by TheRaven64 · · Score: 1

      /proc/cpuinfo has it nicely in one place and is far from hard to parse

      ... on x86. Now port your code to PowerPC. Oh, sorry, different format, fields have different names. Write a new parser. Now port it to ARM. Oh, sorry, different format, fields have different names, some of the information isn't there. Now try porting it to SPARC, oh, sorry, can't be bothered supporting Linux, waste of developer time.

      --
      I am TheRaven on Soylent News
    42. Re:This is great ! by tixxit · · Score: 2, Informative

      Oops, wrong link, Megiddo's paper is here.

    43. Re:This is great ! by chaim79 · · Score: 1

      From looking at the press release, it doesn't look like this is an x86-compatible CPU, so I don't think that MS will port Windows 7 to it any time soon. So for the moment it's Linux/Unix only.

      --
      DEMETRIUS: Villain, what hast thou done?
      AARON: Villain, I have done thy mother.
      Shakespeare invents 'your mom'
    44. Re:This is great ! by amorsen · · Score: 1

      You can't depend on less working with anything in /proc. What you really want is less < /proc/cpuinfo

      --
      Finally! A year of moderation! Ready for 2019?
    45. Re:This is great ! by emj · · Score: 1

      Hmm, perhaps one should have a shell that lets you do things like this: /proc/cpuinfo|less.. Because it's a lot better to have the static info at the start of the line and easily be able to change the command later, e.g. sed/grep, whatever.

    46. Re:This is great ! by Anonymous Coward · · Score: 0

      Stop abusing your cat: less /proc/cpuinfo

    47. Re:This is great ! by MobileTatsu-NJG · · Score: 2, Funny

      Give it a break, shillboy

      We've all seen more than enough paid endorsements of Microsoft's latest exercise in blandness.

      Settle down, Linus.

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    48. Re:This is great ! by san · · Score: 2, Informative

      Take a look at /sys/devices/system/cpu: it has information about cpu topology, cpu hot-swap, cache sizes and layout across cores, current power state, etc.

      It's all there, in an architecture-independent way in /sys/devices.
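A minimal Python sketch of reading that sysfs directory might look like the following. It assumes a kernel that exposes an `online` file under `/sys/devices/system/cpu` containing a range list such as `0-3,8-11`; the helper names are made up, and the function returns None where the file is absent (older kernels, non-Linux systems).

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8-11" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        cpus.extend(range(int(low), int(high or low) + 1))
    return cpus


def online_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Return the online CPU numbers, or None if the sysfs file is missing."""
    try:
        return parse_cpu_list((Path(sysfs_root) / "online").read_text())
    except OSError:
        return None


print(parse_cpu_list("0-3,8-11\n"))
print(online_cpus())
```

The same parsing code applies whatever the architecture, which is the advantage being claimed over /proc/cpuinfo.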

    49. Re:This is great ! by TheRaven64 · · Score: 1

      And, of course, isn't supported by the 2.4 series kernels that you find in a lot of ARM Linux devices...

      --
      I am TheRaven on Soylent News
    50. Re:This is great ! by HiThere · · Score: 1

      It probably is, if you don't consider ANYTHING about efficiency. That's always the killer in massive parallel processing.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    51. Re:This is great ! by huhmz · · Score: 1

      You know you can parse cpus from dmidecode in Linux also.
      That's the way I've done it. Seems to work on a bunch of different Sun and HP servers with SuSE, Ubuntu and Debian, and with sparc, x86 and x86-64 archs.

    52. Re:This is great ! by Anonymous Coward · · Score: 0

      Then develop in Windows 7 for ARM and SPARC instead. Oh wait...

    53. Re:This is great ! by arkane1234 · · Score: 1

      We'll do as always... find another way to bog the hell out of the CPU(s).
      Remember, Linux ran on 386 with 4mb ram, nicely I might add :)

      --
      -- This space for lease, low setup fee, inquire within!
    54. Re:This is great ! by 1s44c · · Score: 1

      Sancho, You have totally lost this argument. Give it up and move on with your life.

    55. Re:This is great ! by gd2shoe · · Score: 1

      And, of course, isn't supported by the 2.4 series kernels that you find in a lot of ARM Linux devices...

      Is there a good technical reason for that, or are the systems merely old and the developers lazy? Unless there is a technical reason why 2.4 is being used, it's not the fault of Linux. The fault lies elsewhere.

      (disclaimer: I really don't know the answer to the question I asked. If there is a support issue of some kind, you can inform me, but please don't flame.)

      --
      I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
    56. Re:This is great ! by Anonymous Coward · · Score: 0

      Nonsense!

      Ever heard of sysfs? (i.e. /sys/devices/system/cpu)

    57. Re:This is great ! by Wintervenom · · Score: 1

      More importantly, can it handle Adobe Flash?

    58. Re:This is great ! by Anonymous Coward · · Score: 0

      I've always just used
              sysconf(_SC_NPROCESSORS_CONF)
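The same call is reachable from Python through `os.sysconf`, which gives a short, parser-free way to ask for the CPU count. The sketch below is one possible wrapper (the `configured_cpus` name is made up); it falls back to `os.cpu_count()` on platforms where `sysconf` or that particular name is unavailable.

```python
import os


def configured_cpus():
    """Number of configured CPUs, via sysconf where the platform offers it."""
    try:
        n = os.sysconf("SC_NPROCESSORS_CONF")
        if n > 0:
            return n
    except (AttributeError, ValueError, OSError):
        # No os.sysconf on this platform, or the name is not recognised.
        pass
    return os.cpu_count() or 1


print(configured_cpus())
```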

    59. Re:This is great ! by rusl · · Score: 1

      Flash is the real enemy! ...x100

      --
      Stupidity is its own reward.
    60. Re:This is great ! by amRadioHed · · Score: 1

      Still more work. PIDs are cheap, we can afford to be lazy.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
    61. Re:This is great ! by amRadioHed · · Score: 1

      A fast CPU doesn't excuse a bad algorithm. You don't get to 1GHz, and then say 'well, CPUs are fast now, let's use bubblesort instead of quicksort.'

      Bad example. Bubble sort is O(n^2) whereas quicksort is only O(n log n). That's a big difference. OTOH 'cat /proc/cpuinfo | less' and 'less /proc/cpuinfo' are both O(n).

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
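The asymptotic gap in the comment above can be made visible by counting comparisons. In the Python sketch below, a plain bubble sort is compared with Python's built-in sort, which is Timsort rather than quicksort but is likewise O(n log n); that substitution and the helper names are choices made here for illustration only.

```python
import random
from functools import cmp_to_key


def bubble_sort(items):
    """Plain bubble sort; returns (sorted_list, comparison_count)."""
    a = list(items)
    comparisons = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a, comparisons


def builtin_sort(items):
    """Built-in sort with a counting comparator; returns (sorted_list, comparison_count)."""
    comparisons = 0

    def compare(x, y):
        nonlocal comparisons
        comparisons += 1
        return (x > y) - (x < y)

    return sorted(items, key=cmp_to_key(compare)), comparisons


random.seed(0)
data = [random.random() for _ in range(1000)]
bubble_result, bubble_cmps = bubble_sort(data)
builtin_result, builtin_cmps = builtin_sort(data)
print(bubble_cmps, builtin_cmps)
```

For 1000 items the bubble sort always performs 1000 * 999 / 2 = 499500 comparisons, and the gap widens as n grows. The extra pipe in 'cat file | less', by contrast, multiplies the work by a roughly constant factor, which is the distinction the comment is drawing.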
    62. Re:This is great ! by amRadioHed · · Score: 1

      He didn't lose it, he's right. No one in their right mind would, having already typed 'cat /proc/cpuinfo', go back to the beginning of the line and change cat to less when they can just tack '| less' on the end instead. If we're talking about writing a script, then sure. But for one-off commands? No way.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
    63. Re:This is great ! by DaVince21 · · Score: 1

      I thought they upgraded that to 4096 with 2.6.30 (but it still can't display Flash video smoothly)?

      --
      I am not devoid of humor.
    64. Re:This is great ! by Anonymous Coward · · Score: 0

      This thing's gonna run Duke Nukem Forever like a champ.

  2. imagine a beowulf... by gandhi_2 · · Score: 0, Redundant

    ...cluster of natalie pormemes.

  3. obligatory by wisty · · Score: 2, Funny

    ... and just imagine a Beowulf cluster of them.

    1. Re:obligatory by fractoid · · Score: 2, Insightful

      It IS a Beowulf cluster.

      Obligatory Princess Bride quote:
      Miracle Max: Go away or I'll call the brute squad!
      Fezzik: I'm ON the brute squad.
      Miracle Max: [opens door] You ARE the brute squad!

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    2. Re:obligatory by AHuxley · · Score: 1

      ... and just imagine AT&T upgrading to them.

      --
      Domestic spying is now "Benign Information Gathering"
    3. Re:obligatory by Meski · · Score: 1

      ... and just imagine a Beowulf cluster of them.

      (JARRING CHORD)

      NOBODY imagines a Beowulf cluster.

      Our chief weapons are ... parallel processing and ruthless efficiency.

  4. OOOoooo! BABY LIGHT MY FIRE !! by Anonymous Coward · · Score: 0

    Yeah, baby !! That's a LOT OF POWER to turn my knobs !!

  5. Awfully generous with the term "core" by BadAnalogyGuy · · Score: 2, Insightful

    Yes, I suppose technically any FPGA could be considered a "core" in its own right, but it's a far cry from the CPU cores that you typically associate with the term.

    Putting a stock on a semi-automatic rifle makes it an "assault weapon", but c'mon. It's still a pea shooter.

    1. Re:Awfully generous with the term "core" by Anonymous Coward · · Score: 0

      There's also another potential problem. All of these 100 "cores" share an extremely small amount of cache.

      32K L1i cache, 32K L1d cache, 256K L2 cache per tile

    2. Re:Awfully generous with the term "core" by Khyber · · Score: 1

      You don't need a lot of cache when you have a system designed to work with smaller data chunks at a faster pace.

      --
      Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    3. Re:Awfully generous with the term "core" by mako1138 · · Score: 1

      Wow, two bad analogies in one post?

      This Tilera product doesn't look like an FPGA. Standard cell ASIC, maybe, but definitely not an FPGA.

  6. When does a CPU become the CPU? by LaurensVH · · Score: 5, Interesting

    It appears from the article that it's a new, separate architecture to which the kernel hasn't been ported yet, so these are add-on processors that can help reduce the load on the actual CPU, at least for now. So, em, two things: 1. How exactly does that work without kernel level support? They claimed having ported separate apps (MySQL, memcached, Apache), so this might suggest a generic kernel interface and userspace scheduling. 2. How does this fix the apps they ported being mostly IO bound in a lot of cases and 99% of the cores will still just be eating out of their noses?

    1. Re:When does a CPU become the CPU? by broken_chaos · · Score: 4, Interesting

      How does this fix the apps they ported being mostly IO bound in a lot of cases and 99% of the cores will still just be eating out of their noses?

      Loads and loads of RAM/cache, possibly?

    2. Re:When does a CPU become the CPU? by drspliff · · Score: 4, Informative

      The Register goes into more detail than this article, as usual.

      The Tile-Gx chips will run the Linux 2.6.26 kernel and add-on components that make it an operating system. Apache, PHP, and MySQL are being ported to the chips, and the programming tools will include the latest GCC compiler set. (Three years ago, Tilera had licensed SGI's MIPS-based C/C++ compilers for the Tile chips, which is why I think Tilera has also licensed some MIPS intellectual property to create its chip design, but the company has not discussed this.)

      So it seems pretty standard and they're using existing open & closed source MIPS toolchains, however there's still "will" and "are being" in that sentence which brings a little unease...

    3. Re:When does a CPU become the CPU? by Anonymous Coward · · Score: 0

      This company is probably on its deathbed. Engineering cannot save it, but business sense and market hype may. The chip is a viable bit of technology. The revolution in its design is based on the fact that it can have massive parallelism efficiently. Originally, however, the company was aiming at scientific computing. That is, they were looking to replace clusters and similar things. The problem is that their chips were less capable of grinding through the math. With several processors, however, taking a few more cycles to do each multiply is fine if it can do dozens of them at the same time. It did not sell me, as I went with a traditional cluster solution.

      I am not sure if the new generation of chip solves the math problem, but this tidbit sounds like they are just dodging it. The chip may be useful as some kind of VPN or HTTPS accelerator, but those already exist on the market.

    4. Re:When does a CPU become the CPU? by kantos · · Score: 1

      I seem to be having déjà vu with a little company called SiCortex that no longer exists (the Wikipedia article is out of date). Why? Because nobody wanted to rewrite their software for a machine less than 0.1% of the market used. The other reason is that most software in the HPC world is written to be GCC/Intel compatible, so porting to PGI was interesting. This chip might go somewhere if they can market it as a coprocessor with BLAS libraries; however, even if they try to do that, they're going up against IBM with its CELL blades and its mixed-platform BLAS as used by Roadrunner, which is currently number one on the top500.

      --
      Any and all content posted above may be ignored, considered irrelevant, or otherwise dismissed.
    5. Re:When does a CPU become the CPU? by drspliff · · Score: 1

      All the other Tilera products run Linux with standard GCC toolchain for MIPS, as far as I can see this new one is the same but comes with 36 more cores than the previous largest processor they sell... so from that perspective there shouldn't be significantly more problems compared to working with any ARM or MIPS development board.

      And BLAS? This is targeted for an entirely different industry, the same one that Sun's TI series competes in with 'up to 256' hardware threads per box... highly concurrent but fairly trivial stuff; BLAS on the other hand offloads an extremely specific workload, so while it may have the best floating-point performance it's also hard to utilize (talk to any PS3 game developers and they'll explain just how much work it is to take advantage of the CELL cores).

    6. Re:When does a CPU become the CPU? by Trieuvan · · Score: 1

      How does this fix the apps they ported being mostly IO bound in a lot of cases and 99% of the cores will still just be eating out of their noses?

      SSD

    7. Re:When does a CPU become the CPU? by sjames · · Score: 1

      I can't speak to the I/O issue, since that seems like it would be a huge problem. As for the kernel issue, a driver in the kernel can be all you need. Open the device as a file (as usual) and then point it to the binary to be run and tell it to go. The native kernel on the CPU sees it all as just data moving through a device file as usual.

      If I/O is required, the userspace program on the CPU will either have to perform the operations on the card's behalf or the card can have its own I/O subsystem (unlikely).

      I know there used to be a PDP-11 on a PCI card that used a strategy something like that so that the PDP's "disks" were files on the native PC's drive.

  7. Custom ISA? by Henriok · · Score: 4, Insightful

    Massive amounts of cores are cool and all that, but if the instruction set isn't any standard type (i.e. x86, Sparc, ARM, PowerPC or MIPS) chances are that it won't see light outside highly customized applications. Sure, Linux will probably run on it. Linux runs on anything, but it won't be put in a regular computer other than as an accelerator of some sort, like GPUs which are massively multicore too. Intel's Larrabee though..

    --

    - Henrik

    - when the Shadows descend -
    1. Re:Custom ISA? by EsbenMoseHansen · · Score: 2, Informative

      In general, new instruction sets are mostly interesting in the custom software and the open source software areas. But the latter is quite a large chunk of the server market, so I suppose they could live with that.

      They would need to get support into gcc first, though.

      --
      Religion is regarded by the common people as true, by the wise as false, and by rulers as useful.
    2. Re:Custom ISA? by stiggle · · Score: 4, Informative

      From a quick Google - it's based on the ARM core (easily licensable cpu core)

    3. Re:Custom ISA? by complete+loony · · Score: 3, Insightful

      1. LLVM backend
      2. Grand central
      3. ???
      4. Done.

      Seriously though, this is exactly what Apple have been working towards recently in the compiler space. You write your application and explicitly break up the algorithm into little tasks that can be executed in parallel, using a syntax that is lightweight and expressive. Then your compiler tool chain and runtime JIT manages the runtime threads and determines which processor is best equipped to run each task. It might run on the normal CPU, or it might run on the graphics card.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    4. Re:Custom ISA? by Linker3000 · · Score: 1

      "...if the instruction set isn't any standard type..."

      No problem; use the processor for a 'speak and spell'-type toy, a drug store reusable digital camera or a scientific calculator and someone will hack a decent Linux kernel onto it over a weekend.

      --
      AT&ROFLMAO
    5. Re:Custom ISA? by V!NCENT · · Score: 1

      GPU's are not massively multicored! That's marketing speak...

      --
      Here be signatures
    6. Re:Custom ISA? by bertok · · Score: 1

      From a quick Google - its based on the ARM core (easily licensable cpu core)

      Must be a coincidence, but I was just thinking a week ago why nobody's tried to make a many-core CPU by doing a cookie-cutter job and just replicating a simple ARM core a bunch of times... looks like someone has!

    7. Re:Custom ISA? by ForeverFaithless · · Score: 5, Informative

      Wikipedia claims it's a MIPS-derived VLIW instruction set.

      --
      Mark Kretschmann - Amarok Developer, KDE Member
    8. Re:Custom ISA? by taniwha · · Score: 1

      64-bit VLIW instructions, 2 ALUs, 1 load store unit (3 ops/clock) I'm going to guess 32 registers (ala MIPS) - that means 3+3+2=8x(log2 32 = 5) = 40 bits to encode registers 8+8+8 to encode opcodes which seems maybe too many - perhaps 64 registers 48 bits of regs and 16 of opcodes?

      no FPU though sadly
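
      The bit-budget guess above is easy to sanity-check. A quick sketch, assuming (as guessed above, not confirmed from any Tilera documentation) a 64-bit bundle holding two 3-operand ALU ops and one 2-operand load/store op; the function name is made up for illustration:

```python
def opcode_bits_left(bundle_bits, register_count, register_fields):
    """Bits left over for opcodes once every register field is encoded."""
    # Bits needed to name one register, e.g. 32 registers -> 5 bits.
    bits_per_register = (register_count - 1).bit_length()
    return bundle_bits - register_fields * bits_per_register

# Assumed layout: 2 ALU ops with 3 register fields each,
# plus 1 load/store op with 2 register fields.
REGISTER_FIELDS = 3 + 3 + 2

for regs in (32, 64):
    left = opcode_bits_left(64, regs, REGISTER_FIELDS)
    print(f"{regs} registers: {left} bits left for opcodes")
```

      With 32 registers that leaves 24 bits (8 per op); with 64 registers it leaves 16, which matches the two options floated above.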

    9. Re:Custom ISA? by rbanffy · · Score: 1

      You can always offload your number crunching to a GPU with OpenCL...

    10. Re:Custom ISA? by rbanffy · · Score: 1

      They have a C compiler. That's all we need.

    11. Re:Custom ISA? by Locutus · · Score: 1

      good one. I browsed the article for what arch it was and was expecting ARM but didn't see it stated. ARM makes sense and the 40nm process has me wondering if it's Cortex a5 or a9 based.

      how about those in some netbooks and a beowulf cluster of those? ;-)

      LoB

      --
      "Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
    12. Re:Custom ISA? by Anonymous Coward · · Score: 0

      no FPU though sadly

      I imagine with 100 cores, allocating a handful of them as a SoftFPU would not be a major problem.

    13. Re:Custom ISA? by Narishma · · Score: 3, Informative

      Why was this modded Informative? Can we have any links? Because both the article here as well as Wikipedia and an old Ars Technica story claim that it's based on MIPS.

      --
      Mada mada dane.
    14. Re:Custom ISA? by Nursie · · Score: 1

      "Seriously though, this is exactly what Apple have been working towards recently in the compiler space. You write your application and explicitly break up the algorythm into little tasks that can be executed in parallel. Using a syntax that is light weight and expressive. Then your compiler tool chain and runtime JIT manages the runtime threads and determines which processor is best equipped to run each task."

      AAAAAAAAHHHHH!!!! It's the iPod all over again! Apple did not invent the thread pool! I'm sure Grand central is great but FFS!

      "Seriously though, this is exactly what Software Engineers have been working with for years in the thread pool pattern. You write your application and explicitly break up the algorithm into little tasks that can be executed in parallel. Using the language of your choice. Then your Operating System manages the runtime threads and determines which processor is best equipped to run each task.

      FTFY. Thread pools are not new. Hell, I wrote a thread pool implementation 10 years ago and it wasn't new then.
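
      For anyone who hasn't met the pattern, here is a minimal sketch using Python's standard `concurrent.futures` module. The task (`sum_of_squares`), the chunk size and the worker count are all made up for illustration; the point is only the shape: split the work into small independent tasks, hand them to a pool, collect the results.

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    """One small, independent task."""
    return sum(n * n for n in chunk)

def parallel_sum_of_squares(numbers, chunk_size=1000, workers=4):
    # Explicitly break the job into little tasks...
    chunks = [numbers[i:i + chunk_size]
              for i in range(0, len(numbers), chunk_size)]
    # ...and let the pool decide which worker thread runs each one.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10_000))))
```

      In standard CPython the interpreter lock means this shows the structure rather than a real speedup for CPU-bound work, but the queue-of-tasks idea is the same in any language.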

    15. Re:Custom ISA? by complete+loony · · Score: 1

      No it wasn't that new. But what is new is a common low level language representation that can be optimised in that form before being targeted to the different architectures that are present in the same machine. It also helps that there is a single machine level daemon managing the tasks that run on those threads.

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    16. Re:Custom ISA? by Nursie · · Score: 1

      Oh I'm not saying it's not innovative, I'm not saying they don't (or didn't) do good, interesting and cutting edge research, it just annoys me that some folks think that they invented the thread pool/job queue model.

    17. Re:Custom ISA? by V!NCENT · · Score: 1

      Yes it's the iPod all over again: nothing new but done right for the first time. _'

      --
      Here be signatures
    18. Re:Custom ISA? by complete+loony · · Score: 1

      Because they are standing on the shoulders of giants??

      --
      09F91102 no, 455FE104 nope, F190A1E8 uh-uh, 7A5F8A09 that's not it, C87294CE no. Ah! 452F6E403CDF10714E41DFAA257D313F.
    19. Re:Custom ISA? by Angostura · · Score: 1

      That's a coincidence, I was thinking that when you get to that many cores, you're effectively producing something akin to a VLIW processor, with each instruction handed to its own execution system.

    20. Re:Custom ISA? by KillerBob · · Score: 1

      GPUs are massively parallel though, and usually optimized for working with enormous matrices. When you think about it, an average GPU would probably make a pretty good processor for searching databases/indexes. Maybe not so much for rendering something server-side, like a php or asp script, but it'd certainly be passable at it.

      Wonder if that's where the folks at Tilera got their idea?

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    21. Re:Custom ISA? by V!NCENT · · Score: 1

      Are you stupid? HDD to RAM to CPU to graphics RAM to GPU and back again. Searching databases? Are you kidding me?

      GPU's are fucking slow. It's like Google shipping their data around by shipment. The throughput is bigger but any idea how long it takes for the ship to arrive?

      --
      Here be signatures
    22. Re:Custom ISA? by KillerBob · · Score: 1

      Are you stupid?

      Nope, but clearly you are. There's nothing that says a GPU-style chip can only be used on a discrete graphics card. The GPU-style chip usually performs enormous numbers of calculations in a massively parallel way. Every pixel on your screen has to be rendered... the graphics engine needs to figure out what colour it's going to be in order to render the object being displayed. It accounts for lighting effect, reflectivity, and opacity. And it does it for every pixel you can see. Millions of pixels in a screen, each rendered sometimes upwards of 60 times a second. And it does this by performing row and multiplication operations on *massive* matrices.

      If you can't see how a CPU that's optimized for that kind of performance could be helpful for something like searching an index or database, then you need to turn in your geek card right now, and start surfing Disney.com.

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    23. Re:Custom ISA? by V!NCENT · · Score: 1

      I know exactly how a GPU works. You clearly do NOT. So shut up and watch this first: http://www.macresearch.org/opencl_episode4

      And maybe then you can keep your geek card. You have clearly been brainwashed by marketing crap.

      --
      Here be signatures
    24. Re:Custom ISA? by KillerBob · · Score: 1

      Considering the number of posts you've made that've been modded troll, you might consider correcting your language. Either that, or pulling your head out of your ass.

      Since you're throwing videos at me, I'd suggest you watch this one: http://www.youtube.com/watch?v=nlGnKPpOpbE ... A modern GPU is a *hugely* parallel manycore ecosystem (some of nVidia's mainstream offerings are operating with 240 logical cores, and ATi/AMD isn't far off that mark), and is optimized for floating point and matrix operations. They're very specialized for certain kinds of operations that are best for graphics performance (and happen to be damned good for things like databases/indexing), but for certain applications a modern GPU is easily as fast as a modern CPU, and in some ways, it's significantly faster. Are you *trying* to sound like a self-important ass with no clue what he's actually on about? People like that are why I left academia to work in the government... at least there, people are honest about not having a clue what they're doing. You really do need to stop thinking in terms of offloading processes through a bus, which is the limiting factor, and start thinking of a processor that's optimized/designed in the same way as a GPU operating as the CPU of the system. It's a purpose-built specialized processor that's intended for one specific type of operations, and it just so happens that said operations would be useful for database applications.

      Irregardless, this'll be the last time I waste bandwidth trying to speak with you. I do wish that Slashdot would give users the ability to outright ignore everything another user were to post, every single time.

      --
      If you believe everything you read, you'd better not read. - Japanese proverb
    25. Re:Custom ISA? by V!NCENT · · Score: 1

      nVidia slides, yes, marketing crap. Dude just watch that OpenCL video which not only translates the marketing to tech language and the problems and such that it brings along.

      Oh and the troll posts... Lol... At least also note that my karma is so awesome that after all this time my karma is still good. It has been excellent for a very, very long time. Very sad though that you lack constructive criticism and points that you had to check my profile to search for something that you could criticise instead.

      I got something interesting for you to bitch at next: I didn't use spelling check so my post is probably full of grammar and spelling error. I am not a native English speaker, but go ahead... :)

      Fscktard...

      --
      Here be signatures
  8. 100? by nmg196 · · Score: 2, Insightful

    Wouldn't it have been better to make it a power of 2? Some work is more easily divided when you can just keep halving it. 64 or 128 would have been more logical I would have thought. I'm not an SMP programmer though, so perhaps it doesn't make any difference.

    1. Re:100? by Anonymous Coward · · Score: 0

      If it's ported to Apache it could be interesting.

    2. Re:100? by Anonymous Coward · · Score: 0

      It boils down to how much space you have on the die, which is usually square or a rectangle where width is twice the length. Perhaps it's 100 cores, and the cache and interconnects take up about 28 times a core. Just a wild ass guess.

    3. Re:100? by Fotograf · · Score: 5, Funny

      it does if you are carefully starting applications in power of two and designing your applications to use power of two threads.

      --
      God's gift to chicks
    4. Re:100? by harry666t · · Score: 4, Informative

      SMP FAQ.

      Q: Does the number of processors in a SMP system need to be a power of two/divisible by two?

      A: No.

      Q: Does the number of processors in a SMP system...

      A: Any number of CPUs/cores that is larger than one will make the system an SMP system*.

      (* except when it's an asymmetrical architecture)

      Q: How do these patterns (power of 2, divisible by 2, etc) of numbers of cores affect performance?

      A: Performance depends on the architecture of the system. You cannot judge by simply looking at the number of cores, just as you can't simply look at MHz.

    5. Re:100? by glwtta · · Score: 5, Funny

      Their plan is to eventually confuse consumers by advertising "X KiloCores! (* KC = 1000 cores)" when everyone expects a KiloCore to be 1024 cores.

      --
      sic transit gloria mundi
    6. Re:100? by TheRaven64 · · Score: 1

      It doesn't need to be a power of two, but being a square number helps for this kind of design because you want a regular arrangement that can fit into a regular grid on the die.

      --
      I am TheRaven on Soylent News
    7. Re:100? by godrik · · Score: 1

      Well, you are right when you say you don't need powers of 2 to be SMP. But GP is right as well. There are a lot of parallel algorithms that rely on the number of processors being a power of 2. However, we usually know a workaround when it is not a power of 2.

    8. Re:100? by wrongrook · · Score: 1

      The previous versions have had 36 and 64 cores arranged in squares. The next power of 2 that is also a square would be 256 cores but this is probably getting a bit big.
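
      A quick way to list the core counts that are both a perfect square and a power of two (just a sketch of the arithmetic; the function name and the limit are arbitrary):

```python
import math

def square_powers_of_two(limit):
    """Powers of two up to `limit` that are also perfect squares."""
    result = []
    n = 1
    while n <= limit:
        root = math.isqrt(n)
        if root * root == n:
            result.append(n)
        n *= 2
    return result

print(square_powers_of_two(1024))  # [1, 4, 16, 64, 256, 1024]
```

      Only the even powers of two qualify, so after 64 (8x8) the next candidate really is 256 (16x16).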

    9. Re:100? by Anonymous Coward · · Score: 0

      Sorry, but 1024 cores would be 1 KibiCore. You have obviously not updated your SI and IEC standards....
      http://en.wikipedia.org/wiki/Kibibyte

    10. Re:100? by VeNoM0619 · · Score: 1

      Yea well, they tried renaming Sci-Fi to syfy and failed. Until they find a better name for it, it will still be kilobyte.

      --
      Disclaimer: I am not god.
      We may not be created equal
      But we can be treated equal.
  9. crossbars by Anonymous Coward · · Score: 0

    in the article it is mentioned that Tilera is able to avoid the use of crossbars:

    For faster data exchange, Tilera has organized parallelized cores in a square with multiple points to receive and transfer data. Each core has a switch for faster data exchange. Chips from Intel and AMD rely on crossbars, but as the number of cores expands, the design could potentially cause a gridlock that could lead to bandwidth issues, he said.

    Does anybody here know how this actually works?

    1. Re:crossbars by TheRaven64 · · Score: 1

      I vaguely remember reading about their design a while ago, and I seem to recall that they basically use a store-and-forward design. Each core only talks to the cores close to it directly, and these relay requests to further away ones (a lot like how AMD chips work, having copied the design from the Alpha). This adds a little bit of complexity to scheduling, because you want to keep processes that share memory on cores that are close together for best performance.
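
      To get a feel for why placement matters, here is a rough sketch. It assumes a 10x10 grid where each core talks only to its four neighbours and a message takes the shortest path; the grid size and that routing rule are my assumptions for illustration, not something taken from Tilera's documentation.

```python
def hops(a, b, width=10):
    """Neighbour-to-neighbour hops between cores a and b on a width x width grid.

    Cores are numbered row by row, starting at 0 in one corner.
    """
    ax, ay = a % width, a // width
    bx, by = b % width, b // width
    return abs(ax - bx) + abs(ay - by)

print(hops(0, 1))    # adjacent cores: 1 hop
print(hops(0, 99))   # opposite corners: 18 hops
```

      Two processes sharing memory on adjacent cores are one hop apart; put them in opposite corners and every exchange crosses 18 links.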

      --
      I am TheRaven on Soylent News
  10. Sounds Like by Nerdfest · · Score: 1

    Sounds like something that might be useful in a video game console ...

  11. What happened to powers of 2? by Godefricus · · Score: 1

    .. Am I the only one who gets mildly suspicious when reading 100-core instead of 128-core?

    1. Re:What happened to powers of 2? by atilla+filiz · · Score: 1

      I think it's all about how many they can squeeze into a single chip, considering cost/power/performance. You don't need to have 2^n, just to fill in your address space.

    2. Re:What happened to powers of 2? by Anonymous Coward · · Score: 0

      I assume not, but it's a silly response. Personally, I find 128-cores strange, since you can't lay them out on a square die like you can 100 (10x10) or 64 (8x8).

    3. Re:What happened to powers of 2? by Shikaku · · Score: 1

      11x11 + 2x2 + 1x1 + 1 layers of cpus.

      The third dimension called, they are suing flatland for prior art and copyright infringement.

    4. Re:What happened to powers of 2? by JasterBobaMereel · · Score: 1

      100 cores plus some room on the chip for management, connections, global cache etc ....

      Plus if you say 100 cores and put 128 cores on the chip then 28 can fail before you have to bin the chip as a dud ....

      --
      Puteulanus fenestra mortis
    5. Re:What happened to powers of 2? by Culture20 · · Score: 1

      The third dimension called, they are suing flatland for prior art and copyright infringement.

      The fourth dimension called, they already have (wioll haven) the judgment from the lawsuit, and flatland stands (willan on-stand) on parody.

    6. Re:What happened to powers of 2? by Skapare · · Score: 1

      Where's the law that says the core layout, or even the die itself, has to be square? Square, or nearly square, might be the most convenient for minimum paths and such. Still, you need to have space somewhere for "between core" control circuits. Even if you lay out the die in a nice square grid, you don't have to make each cell be a core. Getting data lines into the cores in the middle can be an interesting challenge. But then, 100 cores trying to load a word from different locations in RAM all at the same time might be a bit congested. I'd suggest some internal RAM in place of some cores.

      --
      now we need to go OSS in diesel cars
    7. Re:What happened to powers of 2? by Rockoon · · Score: 1

      If it was an attempt at 128 cores, some of them would come off the fab with no defects and would be sold as 128's...

      They aren't going to intentionally roast up to 28 cores on every unit just to hit their advertised number.

      --
      "His name was James Damore."
    8. Re:What happened to powers of 2? by marquis111 · · Score: 1

      Doctor Dan Streetmentioner called, and he wants royalties for your use of his tenses!

    9. Re:What happened to powers of 2? by Anonymous Coward · · Score: 0

      11x11 + 2x2 + 1x1 + 1 layers of cpus.

      apparently unclear on how 1x1 works. 11x11+2x2+1x1+1 cores is 127 cores.

    10. Re:What happened to powers of 2? by Shikaku · · Score: 1

      Oops, I meant 1x2 or 2x1 then.

  12. FreeBSD and GCD by MacTechnic · · Score: 3, Interesting

    Although I don't expect Apple to release an Apple Server edition with a Tilera multicore processor, I would be interested to see a version of FreeBSD running with Grand Central Dispatch on a Tilera multicore chip. It would give a good idea of how effective GCD would be in allocating cores for execution. Any machine with 100 cores must have a considerable amount of RAM, perhaps 8GB+, even with large caches.

    Apple has been very active in developing LLVM compilers, and has recently added the Clang front end, instead of GCC. I don't think Apple has open sourced all their work yet, but check llvm.org for the current details. The real trick is breaking any algorithm into blocks, using OpenCL to organize your code for execution. I mean how different is a 100 core multi-CPU chip from a multicore GPU accelerator!

    1. Re:FreeBSD and GCD by TheRaven64 · · Score: 1
      Grand Central is nice and buzzwordy, but it's still based on threads and shared memory, so it works best when you have shared cache, or you will end up wasting a lot of time with cache coherency protocols. Erlang or OpenCL are much better fits for this kind of architecture.

      Oh, and the version of clang that Apple ships as 1.0 is a branch from the main tree from a few weeks before the official 1.0 release was branched. Apple puts a lot of developer effort into clang, but so do other people (including myself). This work is all open source and developed in a public repository, it is not some super secret Apple project.

      --
      I am TheRaven on Soylent News
    2. Re:FreeBSD and GCD by V!NCENT · · Score: 1

      The trick is not cache coherency management, because you can't address a performance problem with something that demands performance. Duh! The trick is letting threads schedule themselves without too much overhead. How? This is done in the process of execution. Locking threads is stupid (multi-threaded execution by disabling multiple threads... which idiot ever invented THAT?!); instead you should lock cache, have a queue, skip to another piece of data if a piece of data is currently being processed.

      I am still trying to figure out how exactly, but the idea lies in data splitting, and rewritable data parts in arrays... *hint hint hint*

      --
      Here be signatures
  13. Allow ia64 to CONFIG_NR_CPUS up to 4096 by foobsr · · Score: 4, Informative
    --
    TaijiQuan (Huang, 5 loosenings)
  14. Am I *actually*... by Anonymous Coward · · Score: 0

    ...the first person to ask if this can run "Crysis?"

    1. Re:Am I *actually*... by Anonymous Coward · · Score: 0
    2. Re:Am I *actually*... by Anonymous Coward · · Score: 0

      It might run Crysis. Just
      But to actually run Windows and Crysis and not need to kill IE first you might need 4 of these.

  15. Re:100? LOL by CFD339 · · Score: 1

    Wish I had mod points today. I wonder how many people will get just how funny this fantastically sarcastic and totally on target comment was. Bravo.

    --
    The problem with quotes on the internet, is that nobody bothers to check their veracity. -- Abraham Lincoln
  16. What ISA? by abdulla · · Score: 1

    Are these x86/x86-64 CPUs? It wasn't particularly clear to me.

    1. Re:What ISA? by Narishma · · Score: 2, Informative

      No, they are derived from the MIPS architecture.

      --
      Mada mada dane.
  17. But does it run linux? by conureman · · Score: 1

    In TFA sez it's ported to apache. Might be useful.

    --
    The cost of that cleanup, of course, will be borne by taxpayers, not industry.
  18. Been there, done that, got the T-Shirt... by Anonymous Coward · · Score: 5, Interesting

    OK, so big disclaimer: I work for Sun (not Oracle, yet!)

    The Sun Niagara T1 chip came out over 3 years ago, and it did 32 threads on 8 cores.
    And drew something around 50W (200W for a fully-loaded server). And under $4k.

    The T2 systems came out last year, do 64 threads/CPU for a similar power budget. And even less $/thread.

    The T3 systems likely will be out next year (I don't know specifically when, I'm not In The Know), and the threads/chip should double again, with little power increase.

    Of course, per-thread performance isn't equal to anything like a modern "standard" CPU. Though, it's now "good enough" for most stuff - the T2 systems have a per-thread performance equal to about the old Pentium3 chips. I would be flabbergasted if this GX chip had a per-core performance anywhere near that.

    I'm not sure how Intel's Larrabee is going to show (it's still nowhere near release), but the T-series chips from Sun are cheap, open, and available now. And they run Solaris AND Linux. So unless this new GX chip is radically more efficient/higher-performance/less costly, I don't see this company making any impact.

    -Erik

    1. Re:Been there, done that, got the T-Shirt... by alop · · Score: 0

      Though, it's now "good enough" for most stuff - the T2 systems have a per-thread performance equal to about the old Pentium3 chips.

      You must have pretty low expectations of what a system should do for that price... If I'm spending ~$15k for a T5120, it should at least hold its own against a $4k x86_64...

      I'm sure they make great web servers, but all the hype about how it'll be the next big thing in HPC was waay off.
      IMO, Sun shot itself in the foot when they eliminated the entry-level server. I got decent performance out of V210/V240 at a good price point. Now, if I need sparc, I have to sacrifice for a T-series box that won't do day-to-day operations very well, or spend an arm+both legs for an M-series. That's why we push linux so much....

      --
      --alop
    2. Re:Been there, done that, got the T-Shirt... by BikeHelmet · · Score: 1

      Tip: Don't sign your name when posting anonymously. :P

  19. but we already have... by AliasMarlowe · · Score: 1

    ...a Beowulf cluster of stale memes.

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    1. Re:but we already have... by Anonymous Coward · · Score: 0

      Like you know shit, n00b.

  20. It would be clever by rbanffy · · Score: 2, Insightful

    Since a) developing a processor is insanely expensive and b) they need it to run lots of software ASAP, it would be very clever if they spent a marginal part of the overall development costs in making sure every key Linux and *BSD kernel developer gets some hardware they can use to port the stuff over. Make it a nice desktop workstation with cool graphics and it will happen even faster.

    They are going up against Intel... The traditional approach (delivering a faster processor with a better power consumption at a lower price) simply will not work here.

    I think Movidis taught us a lesson a couple years back. Users will not move away from x86 for anything less than a spectacular improvement. Even the Niagara SPARC servers are a hard sell these days...

    1. Re:It would be clever by Anonymous Coward · · Score: 0

      They already have a Linux version for this along with a GCC to compile for it...

    2. Re:It would be clever by rbanffy · · Score: 1

      So now they need Ubuntu and Fedora running on Tilera-based desktop monsters so that the right people get interested in developing for them.

  21. Chips target tasks by Decameron81 · · Score: 1

    The two-year-old startup's Tile-GX series of chips are targeted at servers and appliances that execute Web-related functions such as indexing, Web search and video search.

    Can someone explain to me how a chip can be targeted at much higher-level tasks like these?

    I realize there are surely technical means to achieve this goal, I just can't imagine myself what these means could be.

    --
    diegoT
    1. Re:Chips target tasks by TheRaven64 · · Score: 1

      There is not really such a thing as a general purpose CPU. Any CPU with a few features (add, conditional branch) can run any algorithm, but can't necessarily run it fast. Different applications have different instruction mixes. The kind of code that GPUs are designed to run, for example, places very high demands on memory throughput and floating point performance, but is relatively sparse in terms of branches and integer operations. On average, most programs have a branch every 7 instructions, but GPU code typically runs for a few hundred instructions between conditional branches. Web serving generally uses no floating point instructions and is throughput - as opposed to latency - sensitive, and requires high degrees of concurrency. A processor that can achieve these trades (e.g. Sun's T series) will serve web pages very well, but will do a lot worse at, for example, CAD.

      --
      I am TheRaven on Soylent News
    2. Re:Chips target tasks by Skapare · · Score: 1

      An associative memory requirement could be better served by a custom high-core-count CPU ... if it has sufficient memory on board (e.g. sufficient total memory bus bandwidth).

      --
      now we need to go OSS in diesel cars
    3. Re:Chips target tasks by Anonymous Coward · · Score: 0

      Can someone explain to me how a chip can be targeted at much higher-level tasks like these?

      Technically, it is more about what it is not targeting.

      By removing some general purpose SMP goals, they can squeeze a lot more power for these loosely coupled, data parallel tasks into a chip. Everything in that list is something that can be parallelized into nice small chunks with mostly private intermediate data. For these apps, you can define pipelines and data-flow solutions which map easily onto the Tilera architecture. From the beginning, Tilera focused on the development tools needed to design such data-flow apps.

      They've been targeting these markets because they are approachable for parallelism and approachable from an engineering perspective: embedded server and appliance markets are more accepting of alternative, low cost designs since they do not have as much concern with long-term platform stability for general purpose workload. You can do things like size a wire-speed encoding/decoding or pattern-matching workload and define a static data-flow solution that places certain processing steps on each core in the tiled CPU array, based on adjacency and message-passing. It is not just a general purpose OS managing all the cores as symmetric shared-memory processors.

  22. Power of two is not at all necessary by Sycraft-fu · · Score: 1

    It is done only out of convenience, really. So you have your regular 1 core processor of course (2^0), next step up is a second core (2^1). Now from there, an easy step is to simply duplicate your dual core setup. You just make a second copy and put it on the same chip giving you 4 cores (2^2). This is as far as most chips go, more than 4 cores is not real common. However you might notice we have a real small sample set, we've only covered 3 powers of two, two of them by necessity. This trend thus isn't one because computers require it, just because it works out that way.

    So, if you sniff around, you discover that indeed AMD makes 3 core processors. They are called the Phenom X3. Basically what happens is they designed a quad core chip. However, they are having yield problems. Often enough, one of the cores fails testing, but the others work. So what they do is disable that core, and sell a 3 core product. End result works great, the OS sees 3 CPUs and uses them.

    OSes don't care about specifics in terms of core numbers. Power of two core numbers are just the way it has worked out in many chips so far because we aren't dealing with large numbers. It is going to quickly go away though. Intel is going to introduce a 6-core chip next year. We are heading towards a market that will have processors with a number of cores that is convenient. What "convenient" is will depend on a lot of factors, but the divisibility of the numbers won't be one of them.

    We may well start to see more odd numbered CPUs. If you design something with 100 individual units, it is much easier to disable parts if they don't work. Might see 96, 97, 98, 99, and 100 core varieties or something like that. All the same chip, just with units disabled if they fail.

    GPUs have been doing this for years. They are highly parallel and often when a new high end part comes out there'll be a slightly lower end part that is a bit lower clock and with one or two of the pipelines disabled. This allows for parts that won't pass all the tests, but still mostly work, to be sold rather than thrown out.

  23. hmm... by Skizmo · · Score: 2, Funny

    100 cores... that means that my cpu will never go beyond '1% busy'

    1. Re:hmm... by igny · · Score: 1

      Just install the Core BotNet and configure it to execute DDoS to the Windows threads.

      --
      In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
  24. Yep by Sycraft-fu · · Score: 5, Informative

    Unfortunately these days the meaning of supercomputer gets a bit diluted by many people calling clusters "supercomputers". They aren't really. As you noted what makes a supercomputer "super" isn't the number of processors, it is the rest, in particular the interconnects. Were this not the case, you could simply use cheaper clusters.

    So why does it matter? Well, certain kinds of problems can't be solved by a cluster, just as certain ones can. To help understand how that might work, take something more people are familiar with like the difference between a cluster and just a bunch of computers on the Internet.

    Some problems are extremely bandwidth non-intensive. They don't need any inter-node communication, and very little communication with the head node. A good example would be the Mersenne Prime Search, or Distributed.net. The problem is extremely small; the structure of the program is larger than the data itself. All the head node has to do is hand out ranges for clients to work on, and the clients only need to report the results, affirmative or negative. As such, it is something suited to work over the Internet. The nodes can be low bandwidth, they can drop out of communication for periods of time and it all works fine. Running on a cluster would gain you no speed over the same group of computers on modems.
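
    A rough sketch of that "hand out ranges, collect answers" pattern follows. Local threads stand in for the remote clients, and the trial-division prime test is just a placeholder workload; none of this is the actual Mersenne Prime Search or Distributed.net code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(start, stop, chunk):
    """Head node: cut [start, stop) into independent work units."""
    return [(lo, min(lo + chunk, stop)) for lo in range(start, stop, chunk)]


def is_prime(n):
    """Placeholder workload: naive trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def search_range(bounds):
    """Client: needs only its own bounds, talks to nobody else."""
    lo, hi = bounds
    return [n for n in range(lo, hi) if is_prime(n)]


def run_search(start, stop, chunk, workers=4):
    """Hand out ranges, then merge the small results that come back."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(search_range, split_ranges(start, stop, chunk)))
    return sorted(n for found in results for n in found)


primes = run_search(0, 100, chunk=25)
```

    The only data crossing the "network" is a pair of integers going out and a short list coming back, which is why this kind of job tolerates slow, flaky links.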

    However the same is not true for video rendering. You have a series of movie files you wish to composite into a final production, with effects and so on. This sort of work is suited to a cluster. While the nodes can work independently (the work of one node doesn't depend on the others), they do require a lot of communication with the head node. The problem is very large, the video data can be terabytes. The result is also not small. So you can do it on many computers, but the bandwidth needs to be pretty high, with low latency. Gigabit Ethernet is likely what you are looking at. Trying to do it over the Internet, even broadband, would waste more time in data transfer than you'd gain in processing. You need a cluster.

    Ok well supercomputers are the next level of that. What happens when you have a problem where you DO have a lot of inter-node communication? The results of the calculations on one node are influenced by the results on all others. This happens in things like physics simulations. In this case, a cluster can't handle it. You can slam your bandwidth but worse, you have too much latency. You spend all your time waiting on data, and thus computation speed isn't any faster.
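
    A back-of-the-envelope model shows why latency dominates here. It assumes each simulation step splits its compute evenly across nodes and then waits for one message exchange; the function names and all the numbers are made up for illustration, not measurements of any real interconnect.

```python
def step_time(work_s, nodes, latency_s, msg_bytes, bandwidth_bytes_per_s):
    """Time for one step: evenly divided compute plus one exchange."""
    compute = work_s / nodes
    exchange = latency_s + msg_bytes / bandwidth_bytes_per_s
    return compute + exchange


def speedup(work_s, nodes, latency_s, msg_bytes, bandwidth_bytes_per_s):
    """Speedup over a single node that needs no exchange at all."""
    return work_s / step_time(work_s, nodes, latency_s, msg_bytes, bandwidth_bytes_per_s)


# Hypothetical figures: 1 ms of work per step, 100 nodes, 1 KB exchanged.
cluster = speedup(1e-3, 100, latency_s=50e-6, msg_bytes=1000, bandwidth_bytes_per_s=125e6)
tight = speedup(1e-3, 100, latency_s=1e-6, msg_bytes=1000, bandwidth_bytes_per_s=10e9)
```

    Under these assumed numbers the high-latency case gets only a small fraction of the ideal 100x, because the fixed per-step wait is larger than the compute it is waiting on.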

    For that, you need a supercomputer. You need something where nodes can directly access the memory of other nodes. It isn't quite as fast as local memory access, but nearly. Basically you want them to play like they are all the same physical system.

    That's what separates a true supercomputer from a big cluster. You can have lots of CPUs and that's wonderful, there are a lot of problems you can solve on that. However that isn't a supercomputer unless the communication between nodes is there.

    1. Re:Yep by ja · · Score: 1

      Identifying a "supercomputer" is easy: If you can plug it in - it ain't! And if you do anyway, the +100kW power drain will instantly make you wish you hadn't.

      --

      send + more == money? ...
    2. Re:Yep by Anonymous Coward · · Score: 0

      For that, you need a supercomputer. You need something where nodes can directly access the memory of other nodes. It isn't quite as fast as local memory access, but nearly. Basically you want them to play like they are all the same physical system.

      Also known as Non-Uniform Memory Access (NUMA).
      NUMA for the masses.

    3. Re:Yep by afidel · · Score: 2, Interesting

      10Gb ethernet is fairly low latency and obviously has plenty of bandwidth, using remoteDMA you can get pretty damn good results. Obviously if latency is your #1 performance blocker then it's not going to produce the fastest results, but you can still get good results out of a fairly inexpensive cluster using 10Gb fat trees for most workloads. Basically commodity computing technology has shrunk the gap between what can be done on a moderate sized commodity cluster and what can be done on a purpose built supercomputer, the result being what has happened to Cray and SGI.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    4. Re:Yep by mesterha · · Score: 1

      Even in bandwidth, 10Gb ethernet is still orders of magnitude smaller than RAM access speeds in a supercomputer. At least use InfiniBand which can be configured to 96Gb bandwidth with much better latency.

      However, the main problem with a 100 core chip for supercomputing is not the network bandwidth but the memory bandwidth. The cores will starve for data with the limited memory bandwidth. http://spectrum.ieee.org/computing/hardware/multicore-is-bad-news-for-supercomputers
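
      The starvation argument can be put in numbers with a simple roofline-style bound (my choice of model, not something the linked article prescribes). The bandwidth and peak figures below are hypothetical, not Tilera specifications.

```python
def per_core_bandwidth(mem_bw_gbs, cores):
    """Share of memory bandwidth each core gets if all cores stream at once."""
    return mem_bw_gbs / cores


def attainable_gflops(peak_gflops, mem_bw_gbs, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


shared_bw = 50.0  # GB/s, assumed total memory bandwidth of the chip
few_cores = per_core_bandwidth(shared_bw, 4)
many_cores = per_core_bandwidth(shared_bw, 100)

# A streaming kernel doing 0.25 flops per byte is capped by memory,
# no matter how much peak compute (here an assumed 200 GFLOPS) the cores add up to.
streaming = attainable_gflops(200.0, shared_bw, 0.25)
dense = attainable_gflops(200.0, shared_bw, 8.0)
```

      Adding cores raises the compute peak but not the memory ceiling, so only kernels that do a lot of arithmetic per byte fetched benefit.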

      --

      Chris Mesterharm
  25. Makes me glad I've been learning Clojure by Paul+Fernhout · · Score: 1

    Clojure is a lisp on the JVM designed for multi-threading. From:
        http://clojure.org/
    """
    Clojure is a dynamic programming language that targets the Java Virtual Machine (and the CLR ). It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language - it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection. Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multithreaded designs.
    """

    --
    A 21st century issue: the irony of technologies of abundance in the hands of those still thinking in terms of scarcity.
  26. looks like by nimbius · · Score: 3, Funny

    /proc/cpuinfo will become a small book. on the bright side, i guarantee 100 cores meets the draft requirements for 'windows 8 capable' status.

    --
    Good people go to bed earlier.
  27. Don't buy the hype by alop · · Score: 1

    I've been personally let down time after time by systems that make these claims. I know it's a bit different, but Sun's T2/T2+ chips have been disappointing. Sure psrinfo shows 128 CPUs, but overall performance sucks for anything more than web serving. Sure, the kernel may be thread-aware, but the underlying parts of the OS aren't... Plus, the binutils and misc utilities that comprise day-to-day tasks don't take advantage of that many execution threads... You have to get special gzip that is parallelized.
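
    For what a "parallelized gzip" can look like, here is one minimal approach: compress fixed-size chunks independently and concatenate the resulting gzip members, which standard decompressors read back as one stream. The chunk size and worker count are arbitrary assumptions, and this is a sketch of the general idea rather than how any particular tool does it.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress chunks independently and join them as a multi-member gzip stream."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so members are joined in sequence.
        members = list(pool.map(gzip.compress, chunks))
    return b"".join(members)


payload = b"tile " * 500_000
packed = parallel_gzip(payload, chunk_size=256 * 1024)
restored = gzip.decompress(packed)
```

    The trade-off is a slightly worse compression ratio, since each chunk starts with an empty dictionary.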

    I'll withhold judgement until I see some benchmarks in real world scenarios.

    --
    --alop
    1. Re:Don't buy the hype by Anonymous Coward · · Score: 0

      I develop software for and use the Tilera 64 chip every day and it is freakin' awesome. I developed a multiprocessor application to ingest multiple high-resolution GbE video streams at 30fps on the Tilera 64 processor, and it doesn't skip a beat. It rocks, and the 100-core is just more tile processors at a higher clock rate, not to mention high speed interfaces like 10GbE and 40GbE. The development tools are great and easy to use as well. Tilera as a company is awesome, very helpful, they have never let me down. BTW, you are very ignorant to suggest that you need a special gzip or even the binutils to take advantage of the cores. The tools are for your convenience to develop your own application/algorithms to allow you to take advantage of the massively parallel processor. It is a specialized processor not for general use, like your everyday PC. Think of it as an adjunct processor.

    2. Re:Don't buy the hype by rhsanborn · · Score: 1

      What about database access? Sun claims (in crappy marketing speak) to get some stellar performance out of MySQL, albeit with a special build to make it utilize the extra threads. A pain yes, very targeted, yes, but if you're running lots of simple requests, this might just be perfect for your application.

  28. 15-bladed shaving razor by kannibul · · Score: 2, Interesting

    For some reason, I read this article and immediately thought about a 15-bladed shaving razor... My point being that 100 cores, while it sounds impressive, you get a diminished return after a few cores. Even if software was written for multi-core use (and not enough of it is, IMO), you still can't possibly, effectively, use 100 cores...not before this processor is already extinct due to technological progress. Even my quad core Intel CPU, hardly uses all 4 cores...and most commonly hits CPU1 for processes.
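
    The diminishing-returns point is usually formalized as Amdahl's law: if only a fraction of the work can run in parallel, the serial remainder caps the speedup. A small sketch, with the 90% parallel fraction picked purely as an example:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only part of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Assumed workload: 90% of the runtime parallelizes perfectly.
quad = amdahl_speedup(0.9, 4)
hundred = amdahl_speedup(0.9, 100)
```

    With a 10% serial portion the speedup can never reach 10x however many cores are added, so the gain from 4 to 100 cores is far smaller than the core count suggests. Workloads that are almost entirely parallel, like the packet and video processing these chips target, sit much closer to the ideal.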

    1. Re:15-bladed shaving razor by cpghost · · Score: 2, Insightful

      My point being that 100 cores, while it sounds impressive, you get a diminished return after a few cores.

      Yes, indeed. The memory bus is usually the bottleneck here... unless you switch from SMP to NUMA architecture, which seems necessary for anything with more than, say, 8 to 16 cores.

      --
      cpghost at Cordula's Web.
    2. Re:15-bladed shaving razor by Thagg · · Score: 1

      I do think that we are in the midst of a revolution in computing, where every application, every algorithm, every problem will be examined from the beginning on how it can best take advantage of hundreds if not thousands of 'cores'. In my visual effects industry, it clearly dominates conversation and thought already, and I don't think we're more than a year or two in front of everybody else.

      At a recent conference, NVidia showed a very useful almost-real-time global-illumination renderer, that worked best when it was running about 100,000 threads simultaneously. Interestingly, the program didn't do any of the standard tricks to get exceptional performance -- those tricks are hard, are fragile, have weird corner cases, and are just to be avoided if at all possible. Doing relatively brute-force computation on scaldingly fast computers is a great alternative!

      I predict that you will be using massively parallel programs soon. Either you'll write them yourself, or you'll be using your competitors' programs :)

      --
      I love Mondays. On a Monday, anything is possible.
  29. binary by Anonymous Coward · · Score: 1, Funny

    100-core is binary for quad core.

  30. Bottlenecks-R-Bugs by dr2chase · · Score: 1

    If your network access is linear, then it's buggy. If the protocol specifies a linear stream, then it's buggy. I'm only half-joking -- by the time we get around to fixing these problems (how much do we have invested in TCP/IP?) they will bite hard and people will commit vile ugly hacks to get around them.

  31. News fodder by InsaneProcessor · · Score: 1

    This looks like another one of those companies that announces they "will" have a part that does "something" nobody else does and that it "will" be available someday. When a two year old start up company makes an announcement like this, it usually means they are just looking for some fast capitalization to rip someone off. There recently was another start up that was going after Intel's business.

    Then there was Transmeta Corporation.

    --

    Athiesm is a religion like not collecting stamps is a hobby.
  32. Resource sharing? by arugulatarsus · · Score: 1

    Any news on how the busses will be shared? This is an issue that most CPU manufacturers will look away from. Remember FB-DDRram? I can actually imagine an arbitrator bigger than the CPU in this multi-core architecture. You need something to help it scale.
    To explain my point a bit better: Imagine you have 100 computers all hooked up to a 10/100 hub (not a switch) and every computer has a BitTorrent client open. Same thing with the CPU and most modern buses. Your potential lag time to the bus is 99 other CPUs doing their shtick.
    In TFA they mention blocks sharing switch points. Does that mean people will be encouraged to set affinities for data locality? Consider me to be an old fart, but I really would like some real world junk thrown at this or disclosure on the design.

  33. asymmetric by TheSHAD0W · · Score: 2, Interesting

    It's been reported that these cores will be relatively underpowered, though both the total processing power and cost per watt will be quite impressive. This makes the chip appropriate for putting in a server but not so much a desktop machine, where CPU-intensive single-threads may bog things down.

    So what about one of these in combination with a 2-, 3- or 4-core AMD/Intel chip? The serious threads can be run on the faster chip, while all the background stuff can be spread among the slower cores? Does Windows have the ability to prioritize like that? Does Linux?

    1. Re:asymmetric by Anonymous Coward · · Score: 0

      Before you ask that, ask this: Does windows or linux or any operating system have the ability to run on an ARM and x64 chip simultaneously? Is there a motherboard in existence that can support both?

    2. Re:asymmetric by BikeHelmet · · Score: 1

      Neither does, but it could be added to Linux.

      It is, however, a monumental undertaking, since processes would have to be shifted between architectures while running. Unless, of course, you just design some programs to run on the massively parallel slower CPU, with no option of running on the faster one. Then there's no shifting, but you negate a lot of your benefit. And you could just as easily bundle two x86 CPUs on a board to get approximately the same effect, but with much less effort.

    3. Re:asymmetric by TheBAFH · · Score: 1

      Is there any way to flag slashdot comments as "possible future prior art"? It could be useful. :-)

      --
      http://www.grcrun11.gr - MUDA tribute
  34. Dancing Hamsters... by jameskojiro · · Score: 2, Funny

    It is like 100 Dancing Hamsters in your CPU.

    --
    Tsukasa: All I really want, is to be left alone...
  35. Sigh by PingPongBoy · · Score: 1

    A CPU for each cell processing its own value. Excel may almost run fast.

    Old software typically runs in just a few threads. More cores won't help until new software is available.

    I was doing some complex work on Excel 2007 and it was taking about a minute on a fast cpu. I checked the processor usage - it wasn't a disk intensive job but the usage graph was hovering only at the 40% level for the whole minute. Excel knows it has work to do, but something was still holding back the cpu. On a slower processor, the usage was into the 80 and 90% range though, and the time to finish was a lot longer.

    Software inefficiencies just let my high speed processor idle. For older software, MHz still beats having a lot of cores, so Intel's turbo to let some cores run fast while others slow is just what we need.

    --
    Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
  36. Re:You obviously know nothing, Shultzie !! by stim · · Score: 1

    *woosh* read his name.

    --
    Browse at -1 to keep an eye out for abuses.
  37. Really that easy? Don't think so. by Henriok · · Score: 1

    Really? I did a quick Googling too, and found nothing. There's certainly nothing of this sort to be found on their homepage, nor ARM's. I did a lengthy googling and found an Intel executive stating that it's ARM, but I also found an ArsTechnica article http://arstechnica.com/hardware/news/2007/08/MIT-startup-raises-multicore-bar-with-new-64-core-CPU.ars stating that it's a MIPS derived VLIW architecture. After MIPS revealed itself as a candidate it was easy to find more information, and MIPS it is.

    --

    - Henrik

    - when the Shadows descend -
  38. why not go to the source? by slew · · Score: 2, Informative

    The company website claims...

      64-bit VLIW processors with 64-bit instruction bundle
      3-deep pipeline with up to 3 instructions per cycle

    I don't know how this could be considered ARM or MIPS-derived...

    A better description might have been in this article...

    The Tile64 is based on a proprietary VLIW (very long instruction word) architecture, on which a MIPS-like RISC architecture is implemented in microcode. A hypervisor enables each core to run its own instance of Linux, or alternatively the whole chip can run Tilera's 64-way SMP (symmetrical multiprocessing) Linux implementation.

  39. I've written code for Tilera... by g01d4 · · Score: 1

    Back in the day when they only had 64 processors. Note that Tilera (then) had both shared, and local memory on each core. Using shared memory slowed things down quite a bit. Using local memory makes the algorithm even more complicated. YMMV.