Slashdot Mirror


MIT Startup Unveils New 64-Core CPU

single-threaded writes "Tilera, a startup out of MIT, has announced that it is shipping a 64-core CPU. Called the TILE64, the CPU is fabbed on a 90nm process and is clocked at anywhere from 600MHz to 900MHz. 'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now, and it's what I'll keep an eye out for as I watch this product make its way in the market. Though there are any number of questions about this product that remain to be answered, one thing is for certain: TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone.'"

213 comments

  1. embedded devices bright future! by Bartas · · Score: 0

    please somebody put one of these in my oqo2!

  2. Oblig... by Bentov · · Score: 5, Funny

    No one will ever need more than 64 cores.

    1. Re:Oblig... by Anne_Nonymous · · Score: 2, Funny

      Unless you need more than four pieces of toast.

    2. Re:Oblig... by uberushaximus · · Score: 1

      128 cores should be enough for anyone!

    3. Re:Oblig... by Ninjaesque+One · · Score: 4, Funny

      When the number of cores equals the hertz, then the number should stop doubling.

      --
      Ninjas and pirates. How piquant.
    4. Re:Oblig... by The+Clockwork+Troll · · Score: 1

      MIT course VI alumni will recognize this as a thinly veiled ploy to get more people to sign up for F. Tom Leighton's graduate parallel algorithms classes.

      --

      There are no karma whores, only moderation johns
    5. Re:Oblig... by Richard+W.M.+Jones · · Score: 1

      No one will ever need more than 64 cores.

      Funny :-) The truth though is that probably no one will ever be able to use more than 64 cores ...

      Not with the current state of crappy software anyway.

      Rich.

    6. Re:Oblig... by Anonymous Coward · · Score: 0

      May the Hertz be with you!

    7. Re:Oblig... by 8026mn · · Score: 1

      Hah, classic mistake.. I remember once saying we will never need more than 32mb of ram, or 1gb of hard drive space, or more than 10gb of hd space etc.. etc.. etc.. etc..

    8. Re:Oblig... by WhatAmIDoingHere · · Score: 1

      That's the whole reason why the quote is funny. And the reason that everyone laughs at the original quote (even though it's not real, since nobody who knows anything about computers would ever state that anything is enough).

      But thank you for pointing out why it's funny.

      And let me get a "They'll use more than 64 to get Vista running properly!" comment in before I go.

      --
      Not a Twitter sockpuppet... but I wish I was.
    9. Re:Oblig... by BosstonesOwn · · Score: 1

      My Pentium Hertz , I think I have a virus.

      Real obligatory question ! Does it run Linux ? And Imagine a beowulf cluster of these !

      Now for the good stuff. There's no real mention of performance other than the clock speed. It would be nice to get some real info like MIPS or something to let me relate this to my quad core Pentium.
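The summary does give one figure beyond clock speed, and it can be turned into a per-core number with plain arithmetic. The sketch below uses only the quoted values (64 cores, 192 billion 32-bit ops/sec, 600MHz to 900MHz); the function name is made up for illustration, and what Tilera counts as an "op" is not stated in the source.

```python
# Figures quoted in the summary; everything derived from them is plain arithmetic.
CORES = 64
CLAIMED_OPS_PER_SEC = 192e9          # "192 billion 32-bit ops/sec"
CLOCKS_HZ = (600e6, 900e6)           # quoted clock range


def ops_per_core_per_cycle(total_ops_per_sec, cores, clock_hz):
    """Operations each core must complete per clock cycle to reach the claimed peak."""
    return total_ops_per_sec / cores / clock_hz


per_core = CLAIMED_OPS_PER_SEC / CORES   # ops/sec that each core must deliver
print(f"per core: {per_core / 1e9:.1f} billion ops/sec")
for clock in CLOCKS_HZ:
    rate = ops_per_core_per_cycle(CLAIMED_OPS_PER_SEC, CORES, clock)
    print(f"{clock / 1e6:.0f} MHz: {rate:.2f} ops per core per cycle")
```

At either end of the quoted clock range the claim works out to more than three operations per core per cycle, so the peak figure presumably counts multi-issue or packed operations. That reading is a guess, not something the article confirms.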

      --
      This package Does Not Contain a Winner
    10. Re:Oblig... by noidentity · · Score: 1

      Slight improvement:

      64.0 cores should be enough for anyone.

      (just to get that zero in there)

      Pentium version:

      63.997 cores should be enough for anyone.

    11. Re:Oblig... by Warbothong · · Score: 1

      I remember watching an episode of the Computer Chronicles on the Internet Archive from about 1991, and in the news segment they mentioned a computer chip which had over 1000 processors arranged in 3D. Of course, we're not all using these chips because, like the MHz wars of the past, there is more to a computer than the number of processing cores it has (and sadly these days the deciding factor on computer technology seems to be x86 compatibility, because if it doesn't run Windows it must therefore be crap).

    12. Re:Oblig... by Tribbin · · Score: 1

      Indeed; for personal computers there is hardly ever need for more than, say, two cores.

      For the average slashdotter though; if you have 512 cores, your fingers begin to tickle and you get excited and you will find a way to use them all.

      --
      If you mod this up, your slashdot background will turn into a beautiful sunset!
    13. Re:Oblig... by Anonymous Coward · · Score: 1, Informative

      And the reason that everyone laughs at the original quote (even though it's not real, since nobody who knows anything about computers would ever state that anything is enough).

      Are you trying to say Billy G didn't comment on the retarded XT memory architecture with the quote that '640k should be enough for anybody'? I suppose you also don't believe the head of IBM saw a world market for around 5 computers.

      Don't they teach history in schools anymore?

    14. Re:Oblig... by Chandon+Seldon · · Score: 1

      Indeed; for personal computers there is hardly ever need for more than, say, two cores.

      Right now. With the software available to you. Mostly because programmers wouldn't get very far trying to make you need things that aren't available to you.

      On the other hand, programmers tend to find a way to make use of whatever hardware is available to them. It's sort of funny watching some of them bitch and moan about how hard multi-core programming is, but once 8+ core systems are common you can be sure that your compute intensive apps won't even run on less than 4.

      Programming for multi-core systems isn't so much hard as it is different, and people *hate* having to learn new things.

      --
      -- The act of censorship is always worse than whatever is being censored. Always.
    15. Re:Oblig... by Anonymous Coward · · Score: 0

      Not so in the embedded market, which this is aimed at.

      MIPS, ARM, and PPC are all alive and kicking.

    16. Re:Oblig... by somersault · · Score: 1

      But 4 pieces of toast should be enough for anybody! Oh wait, some of you guys are from america huh :P

      --
      which is totally what she said
    17. Re:Oblig... by Branko · · Score: 1

      Programming for multi-core systems isn't so much hard as it is different, and people *hate* having to learn new things.

      While multithread programming is different, this is not the whole story. The main reason multithread programming is so hard is that it is non-deterministic. In other words, for the same input data, you are not guaranteed to have the same result over multiple runs (due to different thread scheduling, other processes currently running on the system, varying number of cores from system to system etc.).

      This makes multithreaded programs significantly harder to design and especially to test.
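The non-determinism described above can be shown without real threads. The sketch below models two threads that each do `counter += 1` as a load step followed by a store step, then enumerates every possible interleaving. The two-step model and all names are assumptions made for illustration; a real scheduler picks one of these orderings per run, which is why the bug only shows up sometimes.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving of two threads that each do `counter += 1`.

    Each thread is modelled as two steps: load the shared value into a
    private register, then store register + 1 back.
    """
    counter = 0
    register = {"A": 0, "B": 0}
    loaded = {"A": False, "B": False}
    for thread in schedule:
        if not loaded[thread]:
            register[thread] = counter        # step 1: load
            loaded[thread] = True
        else:
            counter = register[thread] + 1    # step 2: store
    return counter


def all_schedules():
    """Every distinct ordering of A's two steps and B's two steps."""
    return sorted(set(permutations("AABB")))


outcomes = {"".join(s): run(s) for s in all_schedules()}
for schedule, result in outcomes.items():
    print(schedule, "->", result)
```

Same input, same code, and the final value depends only on the ordering. A test that happens to run a "good" ordering passes without proving anything about the others.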

    18. Re:Oblig... by Chandon+Seldon · · Score: 1

      The main reason multithread programming is so hard is that it is non-deterministic.

      Sure, if you're futzing with locks in a shared memory model that's true. If you implement any of the reasonably simple abstraction techniques that have been well understood for something like 20 years now (say... CSP for example) then your concurrent program ends up being deterministic and reasonably easy to follow.

      Basically the "multithreaded programming is hard" argument is no more interesting than the "programming is hard because you have to manually free allocated memory all the time and if you don't your program breaks horribly" argument. You fix that by implementing some sort of garbage collector. The solutions to concurrent programming woes are a lot like the solutions to memory allocation woes - you end up having to write programs differently and there are measurable performance losses compared to the theoretically optimal manual solution, but in the end you have to choose between that and memory leaks / deadlocks.
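A rough sketch of the message-passing style mentioned above: workers share no mutable state and talk only through queues, and each message carries its index so the assembled result does not depend on scheduling. This is in the spirit of CSP rather than CSP proper (Python's `queue.Queue` is a buffered queue, not a synchronous channel), and the function names are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive (index, value) messages and send back (index, value squared)."""
    while True:
        message = inbox.get()
        if message is None:           # sentinel: no more work
            break
        index, value = message
        outbox.put((index, value * value))


def parallel_squares(values, n_workers=4):
    """Square every value using workers that communicate only by messages."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(inbox, outbox))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for item in enumerate(values):
        inbox.put(item)
    for _ in threads:
        inbox.put(None)               # one sentinel per worker
    results = [None] * len(values)
    for _ in range(len(values)):
        index, squared = outbox.get()
        results[index] = squared      # position comes from the message, not arrival order
    for t in threads:
        t.join()
    return results


print(parallel_squares([1, 2, 3, 4, 5]))
```

Which worker handles which item still varies from run to run; the output does not, because nothing in the program depends on arrival order.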

      --
      -- The act of censorship is always worse than whatever is being censored. Always.
    19. Re:Oblig... by I.M.O.G. · · Score: 1

      Yes he is saying that. So am I. Find me a reference that shows otherwise and I'll hear you out. This is the correct quote that the Bill Gates 640K meme likely originated from:

      "So that's a 1 MB address space. And in that original design I took the upper 384k and decided that a certain amount should be for video memory, a certain amount for the ROM and I/O, and that left 640k for general purpose memory. And that leads to today's situation where people talk about the 640k memory barrier; the limit of how much memory you can put to these machines. I have to say that in 1981, making those decisions, I felt like I was providing enough freedom for 10 years. That is, a move from 64k to 640k felt like something that would last a great deal of time. Well, it didn't - it took about only 6 years before people started to see that as a real problem."

      You can enjoy in audio format here: http://www.neowin.net/index.php?act=view&id=39006

    20. Re:Oblig... by Branko · · Score: 1

      Sure, if you're futzing with locks in a shared memory model that's true. If you implement any of the reasonably simple abstraction techniques that have been well understood for something like 20 years now (say... CSP for example) then your concurrent program ends up being deterministic and reasonably easy to follow.
      I must admit I didn't know about CSP. Did you actually use it and what are your practical experiences with it?

      Basically the "multithreaded programming is hard" argument is no more interesting than the "programming is hard because you have to manually free allocated memory all the time and if you don't your program breaks horribly" argument.

      This is totally uncalled for. If you are suggesting that multithreading is as easy as correctly releasing the dynamic memory then I must wonder if you have any practical experience with either.

      You fix that by implementing some sort of garbage collector.

      Memory is just one possible resource that has to be managed in any non-trivial program. Garbage collectors will not save you from releasing other kinds of resources such as file handles, sockets, database cursors etc, and they have their own set of problems (usually performance related such as non-deterministic garbage collection).

      To manage resources, you will typically use RAII (resource acquisition is initialization) paradigm in languages with real stack-based variables such as C++ or some simulation of it in languages that don't (e.g. IDisposable/using in C#).

      But even with these complications, I fail to see how memory/resource management even approaches multithreading in complexity.
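The RAII idea described above, releasing a resource on every exit path, has a rough analogue in Python's `with` statement, much like the `IDisposable`/`using` simulation mentioned for C#. The class below is a hypothetical stand-in for a file handle, socket, or database cursor, not any real library's API.

```python
class TrackedResource:
    """Hypothetical stand-in for a file handle, socket, or database cursor."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True           # acquire on entry to the block
        return self

    def __exit__(self, exc_type, exc, tb):
        self.is_open = False          # release on every exit path
        return False                  # do not swallow exceptions


def use_and_fail(resource):
    """Hold the resource, then fail while it is still held."""
    with resource:
        raise RuntimeError("simulated failure while the resource is held")


cursor = TrackedResource("cursor")
try:
    use_and_fail(cursor)
except RuntimeError:
    pass
print("still open after the failure?", cursor.is_open)
```

The release is tied to leaving the block rather than to the garbage collector, so it happens at a known point even when an exception unwinds the stack.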

    21. Re:Oblig... by Chandon+Seldon · · Score: 1

      I must admit I didn't know about CSP. Did you actually use it and what are your practical experiences with it?

      CSP isn't so much a programming tool as a mathematical model. The most well known concurrent programming tool based on it is in Erlang, but there are others. I personally don't get a chance to do that much complex concurrent programming, but from what I have done it looks like a CSP-based concurrency model has the practical effect of abstracting away most of the real problems in concurrent programming - many applications actually become *simpler* to write in an asynchronous concurrent fashion than they would be with an asynchronous event loop or whatever.

      This is totally uncalled for. If you are suggesting that multithreading is as easy as correctly releasing the dynamic memory then I must wonder if you have any practical experience with either.

      Have you ever tried dealing with dynamic memory in a programming language with neither lexical scopes nor a garbage collector? That's pretty similar in difficulty to trying to deal with concurrency with only shared memory and locks.

      Memory is just one possible resource that has to be managed in any non-trivial program. Garbage collectors will not save you from releasing other kinds of resources such as file handles, sockets, database cursors etc, and they have their own set of problems (usually performance related such as non-deterministic garbage collection).

      Right. Garbage collection only solves the most common resource management issue, not every such issue. Java programmers still need to deal with manually freeing non-memory resources because they can't use lexically-triggered destructors like in C++ - but that doesn't mean that Java's garbage collection isn't a step up on C-style manual memory management in most common cases.

      But even with these complications, I fail to see how memory/resource management even approaches multithreading in complexity.

      That's because memory management is a mostly-solved problem and you're used to the solutions, while you hadn't even heard of the potential solutions for concurrent programming woes. If you had talked to a Fortran programmer in the late '70's about dynamic memory allocation you probably would have gotten very similar responses to what I'm hearing from you about software concurrency: "Dynamic memory allocation is neat, but it's way too complicated. You'll get memory leaks and non-deterministic bugs depending on how the memory allocator reuses what. Better to just stick to static memory allocation."

      I'd suggest watching this video: http://video.google.com/videoplay?docid=810232012617965344 - it shows some example programs written in a CSP-based imperative concurrent programming language.

      --
      -- The act of censorship is always worse than whatever is being censored. Always.
  3. That's easy... by Anonymous Coward · · Score: 0

    just wait for it to be supported in MSVC

  4. Instruction Set by Lally+Singh · · Score: 4, Insightful

    FTA: It's a "MIPS-like ISA with a few important and peculiar features"

    I'll be interested to see what they're going to do about making it easier to program. Wire delay's going to be exposed as hops on the on-chip network. IMHO, the toolchain side's far more interesting to me than shoving a bunch of cores together on an on-die network....

    Assuming they did anything interesting on the toolchain side.
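To put a number on "wire delay exposed as hops", here is a back-of-the-envelope sketch. It assumes the 64 cores form an 8x8 grid and that a message travels along X and then along Y (dimension-ordered routing), so the hop count is the Manhattan distance between tiles. Both are assumptions for illustration; the article only says the cores are mesh-networked.

```python
from itertools import product


def hops(src, dst):
    """Hop count between two tiles under dimension-ordered (X then Y) routing."""
    (sx, sy), (dx, dy) = src, dst
    return abs(sx - dx) + abs(sy - dy)


def mesh_stats(side=8):
    """Worst-case and mean hop count over all ordered tile pairs of a side x side mesh."""
    tiles = list(product(range(side), repeat=2))
    distances = [hops(a, b) for a in tiles for b in tiles]
    return max(distances), sum(distances) / len(distances)


worst, average = mesh_stats(8)
print(f"worst case: {worst} hops, average over all tile pairs: {average:.2f} hops")
```

Under these assumptions, where the toolchain places communicating threads matters a great deal: neighbours are one hop apart, opposite corners are fourteen.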

    --
    Care about electronic freedom? Consider donating to the EFF!
    1. Re:Instruction Set by evanbd · · Score: 3, Informative

      Also FTA: "I'm due to talk to the head of Tilera's software team, which is actually larger than the company's hardware team."

      I'll be very curious what their development toolchain ends up looking like, but it seems clear they understand the issue.

    2. Re:Instruction Set by SatanicPuppy · · Score: 1, Interesting

      You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

      This is the same problem we've been working with on clusters forever...How do you tune and load balance the jobs to the point where you're getting the most out of your hardware, and nothing is sitting idle while other parts of the system are running at 100%? What do you do when the task is already reduced to the simplest level and there is no benefit from throwing extra processors at it?

      Someone smarter than me is going to have to figure it out...The only way I can think of doing it is grafting more scheduling crap on top of all the processors to break down tasks and assign them to cores, and adding another layer of complexity is almost always a bad idea.
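The "no benefit from throwing extra processors at it" case has a standard formula, Amdahl's law: if a fraction p of the work can run in parallel, the speedup on n cores is 1 / ((1 - p) + p / n). The sketch below just evaluates it; the fractions in the loop are arbitrary examples, not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only part of the work can use extra cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    row = [round(amdahl_speedup(p, n), 1) for n in (2, 8, 64)]
    print(f"parallel fraction {p}: speedup on 2/8/64 cores = {row}")
```

Even with 90% of the work parallel, 64 cores give less than a 10x speedup, which is one way to see why scheduling and the serial portion become the choke point rather than the core count.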

      --
      ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
    3. Re:Instruction Set by larry+bagina · · Score: 1

      They use a version of gcc/binutils in their sdk.

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    4. Re:Instruction Set by LWATCDR · · Score: 1

      What I find interesting is that they are using MIPS instead of ARM. It does seem to have on-die memory controllers, so that leads me to wonder if they are using HyperTransport as well?

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    5. Re:Instruction Set by dfedfe · · Score: 5, Informative

      FWIW:

      ""If you have an application written for any multi-core or single processor architecture that's written to work with Linux, you can take it, compile it and have it running on our chip in minutes," he said. "Now, if you want to ratchet up the performance, we provide libraries and interface mechanisms that customers can use to tune code."" from here

    6. Re:Instruction Set by Mex · · Score: 5, Funny

      ""I'm due to talk to the head of Tilera's software team, which is actually larger than the company's hardware team.""

      He must be a really fat guy!

    7. Re:Instruction Set by FuzzyDaddy · · Score: 2, Funny
      He must be a really fat guy!

      No, he just has a REALLY big head.

      --
      It's not wasting time, I'm educating myself.
    8. Re:Instruction Set by deadline · · Score: 1

      Well I have not solved the problem just yet either, but I thought about it quite a bit. I looked at it mostly from a HPC cluster standpoint, but the problem is still the same -- software!

      --
      HPC for Primates. Read Cluster Monkey
    9. Re:Instruction Set by ultranova · · Score: 3, Interesting

      You're hoping they're doing something to make it easier to program, and I doubt they are. The choke point is rapidly becoming scheduling rather than number of cores.

      The solution, of course, is to move away from the imperative programming model to dataflow or functional one. That way the compiler can automatically parallelize the task, instead of the programmer having to do so manually.
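A small sketch of why side-effect-free code is easier to parallelize: when the function applied to each element is pure, a parallel map has to return the same list as a sequential one, so the runtime may split the work however it likes. Here the parallelism is requested explicitly through Python's standard `concurrent.futures`, not discovered by a compiler, and in CPython threads will not speed up CPU-bound work. The point is safety, not speed, and the example function is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def collatz_steps(n):
    """Pure function: the result depends only on n and nothing shared is mutated."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


def sequential(inputs):
    return [collatz_steps(n) for n in inputs]


def parallel(inputs, workers=8):
    # Executor.map returns results in input order, whatever order tasks finish in.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(collatz_steps, inputs))


numbers = list(range(1, 200))
print("same answer either way:", sequential(numbers) == parallel(numbers))
```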

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    10. Re:Instruction Set by jd · · Score: 1
      If you have a mini-pipeline running to each processing element, and an instruction ID, then scheduling issues may be reduced somewhat as you can then not only execute whole instructions out of sequence, but fragments of instruction out of sequence between instructions. The smaller the atomic unit you are working with, the less effort is required to achieve equal or better packing.

      I also prefer the idea of virtual cores over physical cores for the same reason. You can "steal" unused processing elements from a virtual core, therefore you can improve the total amount of processing elements in active use and therefore the efficiency and parallelism. The pipelines to each processing element would only need to be deep enough to produce an effective illusion of having that many physical cores.

      What do you do when the task is already reduced to the simplest level? Well, you want to shorten the critical path by selecting different processing elements where a choice is available. You want to have the cache profile what is being used, so that retention is based on experience. (This allows you to exceed the theoretical best performance of a dumb caching system.) You want to break things down to a finer level of granularity (if you can) because that may provide opportunities for superior caching and superior parallelizing.

      How do you tune and load-balance when the hardware is at 100%? Well, you want to know if it's the ideal 100%. Better profiling of cache activity may reduce off-chip I/O by avoiding dropping items you will shortly need. Since hotter chips are more likely to be subject to errors, and errors generate latency, tuning may involve better cooling. Since the internal bandwidth is greater than the off-chip bandwidth, and since you are limited by the critical path, your true performance requires you to tune cache I/O and CPU usage in parallel. This can mean that you get superior performance by detuning the processor elements so that the I/O can be more evenly distributed and therefore superior.

      I don't think much extra scheduling hardware is needed, beyond some very simple profiling work and some basic multi-way analysis.

      However, I don't think any of this is the best way to improve CPUs. The best way, IMHO, is to build a two-layer chip - one layer being RAM, the other being the CPU cores - where the RAM is running at 100% of the speed of the cores. Eliminate the outer level of cache and eliminate most of the I/O saturation issues. Sure, it'd be more expensive than a conventional design. At least, initially. All new designs are more expensive. And we're also talking about more expensive than a slower system, don't forget. By accelerating RAM to CPU speeds, eliminating a layer of pipelining and caching, and wiring each core to RAM pretty much directly, the ability for each core to do work is independent of the number of cores, provided writes are to different parts of RAM. When everything is funneled into a narrow pipeline, it's no different from funneling a hundred junctions onto a narrow Interstate. You will suffer horribly. A single CPU with multiple cores generally hasn't the I/O capacity to handle all cores performing I/O at once. If you want to avoid more complicated scheduling - needed to maximize total throughput - then you need to increase the I/O capacity and there is no easier way to do that than to have a dedicated I/O connection per core in addition to any core-to-core lines.

      Long, fast, parallel - pick two. If fast and parallel are paramount, then you need very very short connections, which means you want two back-to-back wafers with lines running between them. The distances can't be any shorter than that, without having the cores or processing elements too far apart, with all the problems that causes.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    11. Re:Instruction Set by Wesley+Felter · · Score: 1

      It's "MIPS-derived", probably meaning they didn't pay any licensing fees. MIPS is the simplest ISA, and lots of networking equipment is based on MIPS. The question is not "why MIPS?" but "why not MIPS?"

      Tilera doesn't use HyperTransport; except for AMD most SoC vendors are using PCI Express for I/O.

    12. Re:Instruction Set by poopdeville · · Score: 1

      This is the same problem we've been working with on clusters forever...How do you tune and load balance the jobs to the point where you're getting the most out of your hardware, and nothing is sitting idle while other parts of the system are running at 100%?

      What's your application domain? In my corner of the clustering field (data mining), having processes use 100% time would be ideal. The job of the scheduler would be to send jobs to idle machines. The tricky part is when processes refuse to use the resources allocated to them, so you end up having to solve a bin packing problem to utilize as much CPU per unit time as possible. It's pretty straightforward if you have good data on the amount of CPU/time an algorithm uses on a data set. (Say, by empirical experimentation automated with a genetic algorithm)

      What do you do when the task is already reduced to the simplest level and there is no benefit from throwing extra processors at it?

      Much trickier, and it depends on the application domain. For example, some algorithms are faster if given an approximate solution to the problem. Spawning processes/threads to come up with good approximations for the main chain of algorithms to use could prove useful.
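The bin packing problem mentioned earlier in this comment can be sketched with a classic heuristic, first-fit decreasing: sort jobs by size, then drop each one into the first machine that still has room. This is a generic textbook heuristic, not necessarily what the poster's scheduler does, and the job sizes (percent of one machine's CPU) are made up.

```python
def first_fit_decreasing(jobs, capacity):
    """Pack job sizes into as few bins of `capacity` as this heuristic finds."""
    bins = []
    for job in sorted(jobs, reverse=True):
        if job > capacity:
            raise ValueError(f"job of size {job} exceeds capacity {capacity}")
        for current in bins:
            if sum(current) + job <= capacity:
                current.append(job)     # first bin with enough room wins
                break
        else:
            bins.append([job])          # no bin had room: open a new one
    return bins


cpu_percent = [50, 30, 20, 70, 10, 20]  # hypothetical per-job CPU needs
for machine, assigned in enumerate(first_fit_decreasing(cpu_percent, 100)):
    print(f"machine {machine}: jobs {assigned} -> {sum(assigned)}% busy")
```

The heuristic is not guaranteed to be optimal, and it needs the per-job CPU figures up front, which matches the point above about having good data on what an algorithm uses on a data set.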

      --
      After all, I am strangely colored.
    13. Re:Instruction Set by networkBoy · · Score: 3, Interesting

      The best way, IMHO, is to build a two-layer chip - one layer being RAM, the other being the CPU cores

      Both those require transistors. You can not stack transistors with any current process technology, physics gets in the way.
      A chip is basically built as follows

      metal
      poly
      metal
      poly
      Si

      Where the poly is the insulator and metal is the same as traces on a PCB. Just like you can not place components in the middle of a PCB you can not place transistors on top of the metal, it would require a second silicon layer that you could dope transistors into.
      While there are some technologies (SOI for example) that may allow this in theory, you start to run into other issues like trying to punch through the insulator in specific areas and with high precision (neither of which is easy), heat dissipation (transistors are transistors, and switching produces heat, doesn't matter if it's an ALU or a SRAM). And finally before someone suggests using the other side of the wafer, how do you connect the two sides? A wafer is *very* thick in the scale we are discussing. It would be like mining a hole through the earth.
      More useful would perhaps be distributing L0 cache (register memory) a little more liberally in key areas of the processor, but then addressing gets in the way. In theory having a MCM (multi chip module) with Cache - Processor - Cache so there is ample L3 cache running at core/4 clock may help, but costs get prohibitive.

      There is no really good solution to moving data around once you start getting to these kinds of density. Eventually wire delay may be the limiting factor to CPU throughput.
      -nB
      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    14. Re:Instruction Set by LWATCDR · · Score: 1

      I would say ARM because it is small and actively being developed. Why MIPS? Well, you named the reasons. It is simple, inexpensive, well known, and it is a good ISA.
      I thought that HyperTransport was faster than PCI Express and also freely available. I would think that with 64 cores memory speed would be a bottleneck. It would also allow the Tilera to fit into an AMD socket. It is an interesting critter no matter how you look at it.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    15. Re:Instruction Set by BerntB · · Score: 1

      Doesn't this architecture look suitable for the Connection Machine languages? They had something similar in CM-5, I think? (Anyone even older than me around?)

      --
      Karma: Excellent (My Karma? I wish...:-( )
    16. Re:Instruction Set by bcmm · · Score: 1

      you can take it, compile it and have it running on our chip in minutes
      Minutes? I guess it must be really really fast to compile stuff on those things. Unless, for example, Mozilla Seamonkey is not "an application".
      --
      # cat /dev/mem | strings | grep -i llama
      Damn, my RAM is full of llamas.
    17. Re:Instruction Set by DragonWriter · · Score: 1

      Unless, for example, Mozilla Seamonkey is not "an application".


      Seamonkey is, rather expressly, an integrated suite of applications, not "an application".
    18. Re:Instruction Set by imgod2u · · Score: 3, Informative

      This has been done. There was an article a while back about IBM being able to drill holes through their wafer to produce an interconnect to a second wafer on the bottom.

      Intel did this as well and redesigned the Pentium 4 on it.

      The old method of bonding two wafers also works. Smart sensors, for instance, bond a photodetector material (a semiconductor like InGaAs or InSb) onto the top of a CMOS chip. The bonding was very expensive, of course, but it is definitely possible to grow a semiconductor on top of existing metal/polysilicon.

    19. Re:Instruction Set by imgod2u · · Score: 2, Informative

      Considering these things are MIPS cores, having C code compile to it wouldn't be hard at all I would say. It's utilizing the mesh network that's the problem.

      Until I see some results of dynamically-compiled C code that runs really fast on this thing, I don't see it offering better solutions than, say, an FPGA. The exception would be if this was much lower-powered.

      It's not theoretically impossible to do. Instead of treating it like a CPU, treat it like a network with micro-ops treated like packets. Run each sequence of micro-ops through something similar to a global routing algorithm and optimization should be fairly easy. This all, of course, assumes that you have something very parallelizable to begin with, like H.264 encoding.

    20. Re:Instruction Set by timeOday · · Score: 2, Insightful

      I'll be interested to see what they're going to do about making it easier to program... Assuming they did anything interesting on the toolchain side.
      Contrary to the summary and your remark, I'm not sure it's Tile64's problem to bring parallel programming to the masses. First, because many-core chips are already useful (and present no special difficulties) for servers that handle many simultaneous connections - in other words, reducing the space and electricity requirements of server farms. That's a significant market.

      Second, parallelism is a far broader problem than this tiny company's single product; it's now a problem for Intel and AMD, too - in other words, for everybody. Any effective solution is highly unlikely to be specific to this particular chip.

      Sure, it would be great if these guys (or anybody else) made some breakthrough in parallel programming, but that doesn't appear to be the problem they've tackled. You say shoving together a bunch of cores is boring, but to me replacing a cluster with a single $500 chip would be fantastic.

      I am interested how this will stack up to Sun's Niagara chip. 600 MHz is pretty slow nowadays.

    21. Re:Instruction Set by wirelessbuzzers · · Score: 1

      I guess it must be really really fast to compile stuff on those things.

      make -j 100
      --
      I hereby place the above post in the public domain.
    22. Re:Instruction Set by ozmanjusri · · Score: 1
      I'll be very curious what their development toolchain ends up looking like, but it seems clear they understand the issue.

      You can see that on their website. There's a PDF showing the specs there, but it looks like it'll be useful straight out of the box.

      Tilera's Multicore Development Environment (MDE) is a complete, standards-based multicore programming solution that enables developers to take full advantage of the parallel processing potential of the Tile Processor architecture. Old multicore models required all operations to be done in a core-by-core fashion, making it impossible to efficiently program, debug or profile any more than a handful of cores. The great innovation of Tilera's MDE suite is that it enables developers to move to ever-larger and more complex multicore applications in an easy, predictable way.

      http://www.tilera.com/products/software.php
      --
      "I've got more toys than Teruhisa Kitahara."
    23. Re:Instruction Set by aldo.gs · · Score: 1

      He is not fat, it's glandular!

    24. Re:Instruction Set by networkBoy · · Score: 1

      I should have clarified "in no cost effective way"
      Also, the heat physics still applies. The P4 never used stacked transistors. All of the MCdTe to Si and InGaAs to Si are extremely low power applications, usually used at LN2 temperatures (especially the Mercad).
      -nB

      --
      whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
    25. Re:Instruction Set by jd · · Score: 1
      Yes, the heat physics applies, which is why I believe that a two-layer system that is arranged back-to-back (so both hot surfaces are also outward surfaces) and aligned vertically on the motherboard is the only way you'll ever fix the heat problem with any reliability. Now, there have also been articles about using thin films of liquid for cooling, but I'm not sure I'd trust such a system. The vertical arrangement should function just as well as a single hot surface facing outwards, as the inwards surface in neither case gets meaningful cooling so is presumably not much of a factor.

      Cost-effective is the key to this, though. This is where multi-core systems matter. With a multi-core system, you have essentially one shared I/O bus through which all data is delivered. On the MIPS chips, this I/O bus (the Z bus) was also used for all CPU-to-CPU traffic; I don't know if this is the case on 80x86 systems. Essentially, though, you have a severe bottleneck.

      Any N-way CPU must check every core for requests/posts - probably in a round-robin fashion - and then manage which core has the lock on the bus at any given time. The more cores you have, the more likely two cores will want the bus at the same time, the greater the net latency of the system. Eventually, adding more cores will actually slow the system down. In most SMP and multi-core systems, there are bottlenecks somewhere in the arrangement that will pretty much die above 16 cores.
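
      To put rough numbers on the contention argument above, here is a minimal sketch. It assumes (purely for illustration, this is not from any datasheet) that each core independently wants the shared bus in a given cycle with probability `p_request`, and it computes the chance that two or more cores want it at once.

```python
def p_conflict(cores, p_request):
    """Probability that at least two cores request the bus in the same cycle.

    Assumes independent requests with the same per-cycle probability,
    which is a simplification made up for this illustration.
    """
    none = (1 - p_request) ** cores                               # nobody asks
    one = cores * p_request * (1 - p_request) ** (cores - 1)      # exactly one asks
    return 1 - none - one

# Hypothetical 5% request rate per core per cycle.
for n in (2, 4, 16, 64):
    print(n, round(p_conflict(n, 0.05), 3))
```

      Under that assumption the conflict probability climbs steeply with the core count, which is the effect described above; real arbitration schemes and traffic patterns will of course differ.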

      So, to build a 64-way system that is even equal in performance to a 16-way system, you must either have an MxN-way system where each N-way CPU is wholly independent from the M-1 other CPUs, even though they are on the same silicon (essentially operating at the level of a cluster rather than a multi-core chip) OR you must ramp up the bus speed by a factor of 16 to handle the extra traffic and overheads.

      From an architectural standpoint, a cluster is dirt-cheap compared to increasing bus speeds by that amount. Running lines from wafer to wafer is not cheap. It is very expensive, as you pointed out. BUT it only has to be cheaper than building a 48 gigahertz I/O bus in order to be cost-effective for the same level of performance per unit volume occupied by the system.

      (Volume is important, as the cheapest solution of all per unit of performance is to build a giant heterogeneous cluster such that traffic never needs to go much beyond nearest neighbor, and have a butterfly network linking the nodes together. It will take up a lot more room than a single-stack or double-stack CPU - indeed, it will likely take up a room in many cases - but you will never build a CPU that can outperform such a cluster for the same price, simply because costs increase superlinearly or exponentially with the speed of an individual component, but are linearly divisible across multiple components.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    26. Re:Instruction Set by aproposofwhat · · Score: 1
      Very interesting articles - extremely thought-provoking and insightful.

      One thought that struck me was that with a nice fixed 64-core architecture like this, one could optimise a compiler to deal with the 64 cores as a mini-cluster, and then build on this (using a modified Linda, with a federated tuple space, perhaps?) to provide the required dynamic optimisation across multiple machines.

      Since these chips are likely to appear in routers and switches initially, perhaps one could subvert the OS to create a computing cluster out of a large switch - now that would be a hack!

      --
      One swallow does not a fellatrix make
    27. Re:Instruction Set by Branko · · Score: 1

      That way the compiler can automatically parallelize the task, instead of the programmer having to do so manually.

      This might be practical for specialist number-crunching applications, but is not possible in general. Basically, whenever you need to manipulate any non-trivial data structure from multiple threads, you'll need to explicitly state how to do it.

      A classic example is writing to a stream (e.g. a file or the console). While the entity that represents a stream in a given programming language might be thread-safe in the sense that it will not crash when two threads attempt to write to it at the same time, this still does not make it thread-safe from a logical point of view. Consider this:

      Thread 1: (1) Write "A" to stream. (2) Write "B" to stream.

      Thread 2: (1) Write "C" to stream. (2) Write "D" to stream.

      Depending on thread scheduling, you may end up with "ABCD", "ACBD", "CABD" etc... If you only want "ABCD" or "CDAB", you would need to state this explicitly in your source code like this:

      Thread 1: (0) Lock stream. (1) Write "A" to stream. (2) Write "B" to stream. (3) Unlock stream.

      Thread 2: (0) Lock stream. (1) Write "C" to stream. (2) Write "D" to stream. (3) Unlock stream.

      My point is that a compiler (or interpreter, VM...) simply cannot infer this kind of information on its own (it cannot "make up" information that does not exist).
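
      The locked version of the two-thread example can be sketched in Python. The list `stream` stands in for a file or console, and the function names are made up for this illustration; the point is only that the programmer, not the compiler, states that each pair of writes belongs together.

```python
import threading

def write_pair(stream, lock, first, second):
    # Hold the lock across both writes so the other thread cannot interleave.
    with lock:
        stream.append(first)
        stream.append(second)

def run_locked():
    stream = []                 # stands in for a file or console
    lock = threading.Lock()
    threads = [
        threading.Thread(target=write_pair, args=(stream, lock, "A", "B")),
        threading.Thread(target=write_pair, args=(stream, lock, "C", "D")),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return "".join(stream)

# Either "ABCD" or "CDAB", never an interleaving such as "ACBD".
print(run_locked())
```

      Note that the lock limits the possible orders without picking one: which pair comes first still depends on scheduling.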

      Functional programming is not a silver bullet either. While pure functional programming, through its lack of side effects, might be considered a candidate for "automatic" parallelization, bear in mind that functional programs live in a world full of side effects. So most functional languages are really "purely" functional only while you don't attempt to interact with the world outside them (e.g. simply writing to a file).

    28. Re:Instruction Set by ultranova · · Score: 1

      This might be practical for specialist number-crunching applications, but is not possible in general. Basically, whenever you need to manipulate any non-trivial data structure from multiple threads, you'll need to explicitly state how to do it.

      The main difference between a dataflow language and imperative language is that in the dataflow language the programmer explicitly states which operation depends on which (not unlike "make"), instead of explicitly putting them in a specific order as in the imperative language. While this doesn't make every program automatically parallelizable, it does mean that a reasonably simple compiler can, from the same source code, produce a compiled program for 1, 10, or 100 cores.

      Besides, from what I've understood, a large amount of the performance of a modern processor comes from having multiple execution pipelines and a capability to find from the binary program stream instructions which can be run in parallel, by analyzing their dependencies. A dataflow language simply makes it easy for the compiler to perform a similar function.
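
      A rough sketch of the "dependencies, not order" idea, in Python rather than a real dataflow language. The graph, the node names and the `run` helper are all invented for this illustration: each node lists what it depends on (much like a makefile), and the same description is executed with 1 or 4 workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical program: each name maps to (function, names it depends on).
GRAPH = {
    "a": (lambda: 2, []),
    "b": (lambda: 3, []),
    "sum": (lambda a, b: a + b, ["a", "b"]),
    "prod": (lambda a, b: a * b, ["a", "b"]),
    "out": (lambda s, p: s + p, ["sum", "prod"]),
}

def run(graph, workers):
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            # Every node whose inputs are all computed may run right now.
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("dependency cycle")
            futures = {
                name: pool.submit(pending[name][0],
                                  *[results[d] for d in pending[name][1]])
                for name in ready
            }
            for name, fut in futures.items():
                results[name] = fut.result()
                del pending[name]
    return results

print(run(GRAPH, 1)["out"], run(GRAPH, 4)["out"])  # 11 11
```

      Here "sum" and "prod" are free to run side by side because nothing says one must precede the other. Python threads will not give a real speedup for CPU-bound work like this; the sketch only shows the scheduling idea.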

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

    29. Re:Instruction Set by Branko · · Score: 1

      My point was that you are the one that needs to supply this information - whether this was done through traditional mechanisms (such as locking) or through data-flow languages.

      The main difference between a data-flow language and imperative language is that in the data-flow language the programmer explicitly states which operation depends on which (not unlike "make"), instead of explicitly putting them in a specific order as in the imperative language.

      What I did was: I limited possible orders of execution, I did not choose single specific order.

      I suspect your "which operation depends on which" and my "limiting operation sequences" are different names for the essentially same thing.

      Admittedly I don't have any experience with data-flow languages - did you implement any actual project with them and can you tell me if they actually make multithreading easier in practice?

      Besides, from what I've understood, a large amount of the performance of a modern processor comes from having multiple execution pipelines and a capability to find from the binary program stream instructions which can be run in parallel, by analyzing their dependencies. A data-flow language simply makes it easy for the compiler to perform a similar function.

      While true, this is done on a micro-level and totally transparently for the programmer; the end of "free lunch" of MHz race forces programmers to redesign their software in a non-transparent way. I hope data-flow languages are a tool that will ease multithreading development, but if history of software development teaches us anything, there are no silver bullets.

  5. bulk pricing by dfedfe · · Score: 2, Funny

    Only $435 for 10,000 units. Are there 9,999 people on here who want to go in on that?

    1. Re:bulk pricing by Anonymous Coward · · Score: 0

      Only $435 for 10,000 units. Are there 9,999 people on here who want to go in on that?

      Sure, except I will buy all 10,000 myself, and sell you your cpu for about $600 instead :)
    2. Re:bulk pricing by Anonymous Coward · · Score: 0

      Let's see, carry the one..... Sure! I'll send my 4.35 cents out posthaste.

    3. Re:bulk pricing by glitch23 · · Score: 1

      Only $435 for 10,000 units. Are there 9,999 people on here who want to go in on that?

      Resellers will be the ones who get those prices established when they do the wholesale purchase in lots of 10,000. As long as they buy 10,000 at wholesale then we end up with the $435 at retail.

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
    4. Re:bulk pricing by Anonymous Coward · · Score: 0

      Whoosh...

    5. Re:bulk pricing by glitch23 · · Score: 0, Offtopic

      Too afraid to show your real name when making an attempt to insult me when I seem to not get a barely funny joke?

      --
      this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
    6. Re:bulk pricing by Max+Littlemore · · Score: 1

      Resellers will be the ones who get those prices established when they do the wholesale purchase in lots of 10,000. As long as they buy 10,000 at wholesale then we end up with the $1305 at retail.

      There, fixed that for you.

      --
      I don't therefore I'm not.
  6. Correct! 6000 cores by Dachannien · · Score: 4, Funny

    Fry: If only they'd built it with 6001 cores! When will they ever learn!

  7. Now that I have a 64 core CPU... by HerculesMO · · Score: 2, Funny

    How do I overclock it?

    --
    The price is always right if someone else is paying.
    1. Re:Now that I have a 64 core CPU... by DumbSwede · · Score: 1

      Actually you have to over-core it by adding cores.

    2. Re:Now that I have a 64 core CPU... by Nahor · · Score: 1

      How do I overclock it?

      Easy:
      1. At the boot, type "F2" or "DEL" to go to the BIOS
      2. Go to "Advanced settings"
      3. Select CPU Core #1
      4. Change the clock speed for this CPU
      5. Press "F10" to save
      6. Reboot
      7. Run Prime95 to test that you didn't overclock too much.
      8. Reboot
      [... Repeat 63 times for each core ...]
      513. Voila!

      Of course, overclocking one core will affect the cores around it. So you may have to reduce the overclocking of some cores to increase the clock speed of the current core. Make sure to re-run Prime95 on each core to ensure that your cpu is still stable.

      Also, depending on your computer usage, you have different strategies:
      - you can overclock one core a lot and reduce the speed of the cores surrounding it to maximize the overclocking of that "power core"
      - or you can aim for a smoother overclocking by increasing the clock for each core by the same amount.
      - or you can mix the two strategies above and have a few "power cores", some "average cores", "low core" etc.
    3. Re:Now that I have a 64 core CPU... by kaizokuace · · Score: 1

      you will need to get a lot of cpu coolers!

      --
      Balderdash!
    4. Re:Now that I have a 64 core CPU... by wed128 · · Score: 1

      In all seriousness, i would assume they all run off of a common clock, no?

    5. Re:Now that I have a 64 core CPU... by Nahor · · Score: 1

      Of course, communication between cores would be a major headache if they didn't, especially given this CPU's design.

  8. Ok by McNihil · · Score: 1

    now if this lifts... how long will it be until someone gives their daughter the name Tilera.

    ok ok... is the next one Tilere? Tileri?

    titillating isn't it?

    ok ok this was an attempt at a joke... a very very dry one at that... I'll just go back to my corner here and muse at the fact that there is someone out there that has called their daughter "Stalina"

  9. obligatory by Viking+Coder · · Score: 1, Redundant

    Boy, I could really go for a Beowulf Cluster of those...

    --
    Education is the silver bullet.
    1. Re:obligatory by bmo · · Score: 1

      I could really go for a Connection Machine of these.

      It will take only 1024 of these to have the same number of processors as the Connection Machine.

      http://en.wikipedia.org/wiki/Connection_Machine

      --
      BMO

  10. I Did RTFM, and there's key info missing by Nova+Express · · Score: 4, Insightful
    Key information missing from the article:

    1. Die size: How big is it?
    2. How many watts of power does it consume?
    3. What is the heat dissipation?
    4. What is the floating point performance?



    Without those bits of information, it's impossible to gauge exactly who might need this chip, and how successful it might be.

    --
    Lawrence Person (lawrencepersonh@gmailh.com (remove all "h"s to mail)

    http://www.lawrenceperson.com/

    1. Re:I Did RTFM, and there's key info missing by niceone · · Score: 1

      What is the floating point performance?

      Judging from the applications they mention (networking / video stuff) I'm guessing it doesn't have much floating point performance.

    2. Re:I Did RTFM, and there's key info missing by trolltalk.com · · Score: 4, Informative

      The watts isn't missing:

      TFA says it's between 175 and 300 milliwatts per core - do the math. 11 to 19 watts. They're targeting the embedded market (and with those low power consumption figures, I think a super laptop would be a no-brainer).
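
      Spelling out the "do the math" step as a small sketch: it takes the per-core figures quoted above and multiplies by 64, and it assumes nothing else on the chip draws power, which is almost certainly too optimistic.

```python
CORES = 64
LOW_MW, HIGH_MW = 175, 300   # per-core figures quoted from TFA

def chip_watts(mw_per_core, cores=CORES):
    # Cores only; mesh, caches and I/O are not included (an assumption).
    return mw_per_core * cores / 1000.0

print(chip_watts(LOW_MW), chip_watts(HIGH_MW))  # 11.2 19.2
```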

    3. Re:I Did RTFM, and there's key info missing by WindBourne · · Score: 2, Informative
      --
      I prefer the "u" in honour as it seems to be missing these days.
    4. Re:I Did RTFM, and there's key info missing by HerculesMO · · Score: 3, Funny

      If we are judging based off of current generation processors, I believe the size of the chip will be about 3 feet squared.

      Warning: Sarcasm above may cause irritation of skin and explosion of monitor.

      --
      The price is always right if someone else is paying.
    5. Re:I Did RTFM, and there's key info missing by georgewilliamherbert · · Score: 1

      1. Die size: How big is it?
      2. How many watts of power does it consume?
      3. What is the heat dissipation?
      4. What is the floating point performance?


      1. Does it matter? It's useful for computer architects to know for comparison but doesn't matter for the end user. I'm curious, too, but that can wait. Doesn't matter for system designers even.

      2. They list 170-300 mW/core, but that's not clear as to what the base power is for the peripherals and routers. Is that (900 MHz) 300 mW * 64 (about 20 W) for the whole thing? 20W if the cores are all running flat out plus a baseline for the peripherals? This does matter for system designers.

      3. See 2 above. They're usually sort of closely identical.

      4. Good question. No FP mention at all in the specs, so probably not doing it in hardware.

      Looking forwards to a real data sheet, not this PR stuff. And the first laptops...
    6. Re:I Did RTFM, and there's key info missing by TheRaven64 · · Score: 1

      Can it shut off cores when they're not in use? When you're not doing much, you could throttle back to a single core, and at only 0.3W that's much lower power than most ARM chips. When the load goes up, turn on more as they're needed.

      --
      I am TheRaven on Soylent News
    7. Re:I Did RTFM, and there's key info missing by afidel · · Score: 1

      I doubt you'd gain much as leakage current on most 90nm designs is ~30-40% of total power usage.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    8. Re:I Did RTFM, and there's key info missing by pslam · · Score: 1

      and at only 0.3W that's much lower power than most ARM chips

      Most ARM chips I'm working with are more like 10-20 milliwatts when throttled and still doing useful work, and also in the ballpark of 0.3mW/MHz (which is what this new chip is if you work their numbers out). Pretty much any ARM chip you find aimed at the low power embedded market has a ton of features and overall design that make the entire SYSTEM much more efficient than something designed for extreme performance. Although, to be honest, I'm quite impressed it still maintains 0.3mW/MHz when it has such a peak potential. If it holds to be true.

      No, this doesn't beat ARM for power consumption at the low end, but it'll beat the crap out of them at peak performance when required. Even so, I doubt you'll find it beating an ARM in mW/MIP for anything in the "sweet spot" of "normal" consumer computing, for example video and audio playback, or office document and web browsing stuff.

      I'm not totally sure what this thing is targeted at... although it does open up a new point on the big dotted graph of power vs MIPs.

    9. Re:I Did RTFM, and there's key info missing by sunwukong · · Score: 1

      I'd also like to know if it can handle cores that go bad/die.

    10. Re:I Did RTFM, and there's key info missing by imgod2u · · Score: 2, Informative

      Those are *not* very impressive figures for the embedded market. I imagined the whole 64-core chip would run below 100mW. If we're talking 12 to 19 watts for the chip, it is a beast in embedded terms. For reference, an SoC with 4 ARM cores, all of the peripherals that that thing has plus dedicated DSP/FPU units would still be under 4W.

      FPGAs (particularly ones from Xilinx) offer similar logic horsepower (assuming you had a digital designer to write your VHDL for you) for less than 500mW.

      The latest Virtex 5 for DSP applications can provide the same processing capability these guys claim (2x H.264 streams) along with all the bells and whistles and on top of that, you have 2 PPC hardcore processors to act as arbitrators for slower functions.

      Those things suck up up to 1W though and that's a lot of power for an embedded system.

    11. Re:I Did RTFM, and there's key info missing by sgunhouse · · Score: 1

      Judging by the complete lack of any in-core or external FPU ...

      Well, consider the individual cores as something like a 386. Any floating point will have to be done in software, unless there is an off-chip FPU (comparable to the 387). Not impossible, but probably not something you'd want for video rendering or comparable scientific tasks.

    12. Re:I Did RTFM, and there's key info missing by andy_t_roo · · Score: 1

      yes, but you can use a few >90nm components to completely shut off power to specific cores to result in no leakage current, at the cost of a few clock cycles initialization time when you power it up again.

    13. Re:I Did RTFM, and there's key info missing by Anonymous Coward · · Score: 0

      Idle Tiles can be put into low-power sleep mode
      http://www.tilera.com/products/processors.php

    14. Re:I Did RTFM, and there's key info missing by Televiper2000 · · Score: 1

      But, for many embedded systems you have an overarching concern for cost and time to market. Sure, you can get an ultra-powerful Virtex-5 but you're probably pricing yourself out of the market and that's before you consider the cost and complexity of designing your FPGA's insides. That's also assuming you're not trading off all your FPGA's bells and whistles to implement your cores. With the TILE64 your project starts and ends with C. I am also extremely skeptical of any but the smallest Virtex-5 applications running at 1W. The smallest quiescent core current is 300mA for the smallest FPGA in the family at the lowest speed grade. 5 to 10W would be a more accurate (although a ballpark) estimation. In fact most reference designs I've seen for Xilinx and Altera FPGAs involve some exceptionally beefy power supplies.

      --
      New! Device Legs: These legs will help your poor OEM installed product escape any hamfistedness it may encounter. Ava
  11. Why not 1024 cores? by Anonymous Coward · · Score: 0

    http://www.rapportincorporated.com/ [rapportincorporated.com]

    1. Re:Why not 1024 cores? by trolltalk.com · · Score: 1

      Riiiight - each core only handles 8 bits of data at a time. And it runs at only 100 MHz. An old 1GHz Duron will beat the pants off it. Add in the overhead of breaking data down into 8-bit chunks and reassembling it, and an old Pentium 200 would probably be just as fast.

    2. Re:Why not 1024 cores? by inKubus · · Score: 1

      and an old Pentium 200 would probably be just as fast.

      I resent that remark. I'm on a P200 right now and it's not fast at all.

      --
      Cool! Amazing Toys.
    3. Re:Why not 1024 cores? by trolltalk.com · · Score: 1

      You can always run DOS on it - Doom runs nice :-)

  12. The real question is by WindBourne · · Score: 2, Insightful

    Is not if it will run Linux (it will), but if it will run windows? CE does not count.

    --
    I prefer the "u" in honour as it seems to be missing these days.
    1. Re:The real question is by Arabani · · Score: 1

      Is not if it will run Linux (it will), but if it will run windows? CE does not count.

      That would of course depend on Microsoft. It's certainly a possibility if it ever penetrates the market to a decent degree.
    2. Re:The real question is by WindBourne · · Score: 1

      yeah, but by then, Tilera may not care. At the least, they will make Linux be their premier system. MS has a long history of being late to the party on just about everything.

      Could be interesting to see low-cost parallel systems come from this, running Linux.

      --
      I prefer the "u" in honour as it seems to be missing these days.
  13. I'm ready for it by Anonymous Coward · · Score: 2, Interesting

    On my laptop right now:

    > ps aux | wc -l
    281

    Of course not all those processes are in runnable state. On the other hand, many of those processes have multiple threads. A typical Java Swing GUI app may have a dozen threads, for example. A web server process can easily have dozens of runnable threads. Software is going to take a little bit of catching up, but nothing huge.

    1. Re:I'm ready for it by Bryan+Ischo · · Score: 3, Insightful

      Just as your system has only a few processes that want to be scheduled simultaneously (and so your observation that "not all of those processes are in [a] runnable state" is correct), those Java Swing applications you are talking about very rarely have more than a thread or two wanting to do work at the same time. The web server is a better example of concurrent execution but those are most often I/O limited as much as CPU limited, and in the vast majority of cases the bottleneck is not the number of threads that can execute concurrently.

      It's very hard to take advantage of multiple cores because very often, there isn't more than one thing for a program to be doing at the same time, and for most desktop users, there are rarely more than 1 or 2 programs running actively at a time. Many code paths are not explicitly parallelizable, and many more are parallelizable but not easily so. Just as clock speed is not the holy grail of processor performance, core count isn't either.

    2. Re:I'm ready for it by bcrowell · · Score: 1

      for most desktop users, there are rarely more than 1 or 2 programs running actively at a time.
      Yeah. I finally thought I had an application where the new dual-core amd system I built could really exploit both processors: encoding CDs in MP3 format. I was ripping a whole CD to disk, then having one cpu encode the even-numbered tracks and the other all the odds. It worked great, until I realized that there was no reason to leave the cpus idle until the whole CD had been read. I rewrote my script to start encoding each track as soon as it was ripped, and lo and behold, I found that I was only ever using one cpu, because the encoding was faster than the I/O.
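
      The effect described above can be reproduced with a toy model. The timings below are invented, not measurements: tracks come off the drive one at a time, and each is handed to whichever encoder is free first.

```python
def pipelined_finish(rip_s, encode_s, tracks, encoders):
    """Seconds until the last track is encoded when encoding starts
    as soon as each track has been ripped (toy model, made-up inputs)."""
    free_at = [0.0] * encoders            # when each encoder becomes idle
    finish = 0.0
    for i in range(tracks):
        ripped = (i + 1) * rip_s          # the drive rips tracks one by one
        k = min(range(encoders), key=lambda j: free_at[j])
        start = max(ripped, free_at[k])   # wait for the track or the encoder
        free_at[k] = start + encode_s
        finish = max(finish, free_at[k])
    return finish

# Hypothetical CD: 10 tracks, 60 s to rip each, 40 s to encode each.
print(pipelined_finish(60, 40, 10, 1))  # 640.0
print(pipelined_finish(60, 40, 10, 2))  # 640.0
```

      As long as encoding a track is faster than ripping it, the second encoder never has anything to do and the total time is set by the drive.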

      If you're waiting for your computer, it's almost always because it's waiting on network or hard disk I/O. But not to fear -- just as programmers have been using clock speeds to get sloppier and sloppier for the last 30 years, now they'll continue writing less and less efficient code, with the excuse that people are going to have 65,536 cpus inside their beige boxes.

    3. Re:I'm ready for it by bill_kress · · Score: 1

      Actually, a swing app should have no more threads than a non-swing app (except where the size of the app might be bigger).

      Your heavily threaded apps might even tend to be headless (think web servers).

      Swing itself is restricted to using exactly 1 thread, if you call swing from more than one thread, you're doing it wrong.

      -1 offtopic -1 pedantic

    4. Re:I'm ready for it by Bryan+Ischo · · Score: 1

      Often "less and less efficient code" is cheaper code, in time and money, to write. If your processor can make up the difference, why not speed up the development process by writing less efficient code? Do you really want to pay more for programs that don't run noticeably faster? Or have fewer choices of software to use because more developers were devoted to optimizing unnecessarily than to creating new products?

      Note that there is a difference between "sloppy" and "inefficient". You lump them together, but I'm talking about code that is inefficient by design - making a trade off between development time and runtime costs - not sloppy code, which is inefficient not by design, but by unfortunate accident of the laziness/incompetence of the developer writing it. Sloppy code is bad, but inefficient is not necessarily bad.

    5. Re:I'm ready for it by Anonymous Coward · · Score: 0

      Nope! A Swing app will have a handful of threads at all times. The EDT, the GC thread, and the main execution thread, at least. A well-written Swing app will generally create a thread for any type of UI interaction that might need to access disk, net, etc to complete. I just gave Swing as an example because it's a system I know. I'm sure that well-built GUI programs using any other toolkit would also use threads for exactly the same reasons. If you need to access disk, net, a DB, etc, it must be done in a thread, which then notifies the GUI thread when there is an update.

    6. Re:I'm ready for it by bcrowell · · Score: 1

      Often "less and less efficient code" is cheaper code, in time and money, to write.
      Hmm...I don't see the real-world evidence to support that statement. The first word processor I ever used, ca. 1980, was Electric Pencil, which was written by one person, and was very snappy on a 1.8 MHz Z-80 with an 8-bit cpu. Today the best option on my linux box seems to be OpenOffice, which was written by a large team of programmers, and runs dog-slow on a 2200 MHz dual-core amd x64. It seems to me that OpenOffice was orders of magnitude more expensive to write, and is orders of magnitude less efficient (performs worse on a cpu with a clock speed that's 1200 times greater, has a much bigger instruction set, and has a much wider data bus). Seems like the worst of both worlds.

      If your processor can make up the difference, why not speed up the development process by writing less efficient code?
      My experience is that the processor typically can't make up the difference. The trend I've seen is that over the last 30 years, computers have become less and less responsive. A TRS-80 used to boot in a matter of a few seconds; these days my wife complains if I shut off her Mac to save electricity, because it takes several minutes to boot.

      Do you really want to pay more for programs that don't run noticeably faster?
      I run FOSS, so I don't pay for programs.

    7. Re:I'm ready for it by Bryan+Ischo · · Score: 1

      But the fact is that you're using that 2200 MHz dual-core processor with a modern operating system and applications instead of the 1.8 MHz Z-80 with Electric Pencil. Why? If the circa 1980 computing experience was better, why not just fire up your old Z-80 and throw your AMD in the trash? Heck you could probably find a Z-80 emulator that would let you run all of your old Z-80 programs. Just run it full-screen and it would be just like you were back in 1980.

      The fact is that you are also getting orders of magnitude more features out of the modern software, that you do not want to give up, so you won't go back to the Z-80 and Electric Pencil. Maybe you don't use every single feature of the new software, but each feature is useful to somebody, or else it wouldn't be in the product. And the fact that Open Office has so many features is what makes it useful to orders of magnitude more people than your old Electric Pencil was. There is a reason that computers are used by billions more people than they were in 1980. It's because modern software is vastly more feature-rich, and vastly more useable. If it weren't, home computer use would be about where it was back then.

      Your TRS-80 used to boot up in a matter of a few seconds to a command shell that had very minimal features. I am sure a modern computer could boot to the same state in a tiny fraction of the time. The thing is, nobody wants a computer that boots to a TRS-80 command prompt really quickly. People want featureful operating systems and programs. And these things naturally are bigger and slower (they don't have to be - but because modern programs are vastly more complex beasts, the only way to keep them cheap is to offload development cost to runtime cost).

      As to not paying for programs, expand your concept of "pay". You would be "paying" by having less software to run because FOSS developers would all be spending their time optimizing their code instead of creating new and interesting software.

      Anyway, I'm not trying to say that there aren't lazy programmers out there or that modern software couldn't be just as featureful and great and still be faster and more efficient. There is *a lot* of poorly written code out there. I know, I am a software developer, I deal with it every day (some of it my own!). I just wanted to make the point that sometimes, inefficiency is a *good* thing, because it represents a trade-off that got you more than you would have had otherwise (i.e. got you extra features and more software at the expense of burning more CPU cycles, most of which you never notice).

    8. Re:I'm ready for it by bcrowell · · Score: 1

      But the fact is that you're using that 2200 Mhz dual-core processor with a modern operating system and applications instead of the 1.8 Mhz Z-80 with Electric Pencil. Why? If the circa 1980 computing experience was better, why not just fire up your old Z-80 and throw your AMD in the trash?
      Well, for one thing that machine died a long time ago. Also, the peripherals on those computers were the quality you'd expect from a crackerjack box :-)

      But anyway, I honestly think that software quality has gone down precipitously over the years. I worked for Digital Research for a few summers back in the 80's, and when we found a bug, we would fix it and send out a hexadecimal patch to our users by paper mail, which they could key in so they could have the bug fixed, right away. We maintained lists of outstanding bugs, and any user who wanted the list could have it. These days, most software houses won't even let the public know how many bugs are in their software. You buy the next version of the software hoping for a bug fix, and find out that the bug hasn't been fixed. One of the reasons I switched to FOSS was because of the horrible quality of most proprietary software.

      You would be "paying" by having less software to run because FOSS developers would all be spending their time optimizing their code instead of creating new and interesting software.
      You're making some assumptions here that I don't agree with. One is that new and interesting software is typically better. I don't really see such an upward trend in most cases; the average person is no better off with the latest Word than they were with Word Perfect. What does seem to be true as a matter of business and economics is that in the world of proprietary software, users tend to make buying decisions based on features rather than performance, because it's easy for them to find out what the new features are, but it's hard for them to find out what the performance is going to be on their machines. There's also this very manipulative upgrade treadmill. You buy new software, find out it's too slow on your hardware, so you're expected to go out and buy new hardware.

      Also, you seem to be assuming that software is inherently expensive and scarce. It isn't. In typical widely used categories (operating system, word processor), there are lots of choices, and the cost of development is minuscule when you divide it by the number of users. That low cost is the reason FOSS can exist.

    9. Re:I'm ready for it by nasch · · Score: 1

      I think by new and interesting software, he doesn't just mean adding features (some of which I'm sure you care about) but whole applications. It's possible if processors hadn't gotten any faster that (for example) Firefox would have come out months later because the developers had to spend that much time on their code to make it usable on slow hardware. Personally I would rather have the software available sooner, and go ahead and throw faster hardware at it. When a pretty decent desktop costs $500, and a really good one $1000, it's very much worth it to buy a new one every few years.

      As for expense and scarcity, your comments may indicate that good software is scarce. Lots of people write software, but what percentage are really good at it? If we can have more software from the good developers by using faster processors, thus letting them spend more of their time writing new software rather than making their software run faster, I'd say that's a good thing.

      Finally, response time. I would have hoped boot times would have gotten shorter by now. Apparently not, and that's a shame. But for pretty much everything else, it seems to me a modern computer does everything very quickly. Almost always much faster than I can begin to get impatient with it, and I don't even have a new machine. On my work computer, the only programs I have to really wait for to start up are Lotus Notes after I've rebooted (if I close and then later reopen it it's fast) and Eclipse. Everything else basically starts instantly. Other things I have to wait for include compiling (developers have been waiting on this since the invention of the stick, and it's much faster now than it used to be) and ripping DVDs, which you couldn't have even considered doing in any reasonable time when DVD-ROM drives first came out. Now I can rip a feature length movie in about 25 minutes, on the aforementioned not-new PC. So I just don't see what the fuss is about. In my experience - maybe yours is different - almost all software, open source and closed, is efficient enough to run just great on modern hardware, and that's good enough for me. If I can give up a little bit of response time here and there to gain more features, fewer bugs, more frequent updates, or more applications, sign me up.

  14. Rumored... by SeanMon · · Score: 5, Funny

    It's rumored to be able to run 16 whole instances of Vista simultaneously!*

    *Required 32 GB of RAM not included.

    --
    "Scud Storm!" -- Jeremy of PurePwnage.com
    1. Re:Rumored... by Anonymous Coward · · Score: 0

      It's rumored to be able to run 16 whole instances of Vista simultaneously!*

      *Required 32 GB of RAM not included.

      run?

      lockup?

    2. Re:Rumored... by Pollardito · · Score: 1

      i'm not sure i can click Allow on 16 Vista security popups as fast as they repop, that sounds like whack-a-mole with a mouse

  15. Instruction set? by Eponymous+Bastard · · Score: 3, Insightful

    I can't believe startups haven't figured out that incompatible chips aren't what the market wants. They're either going to sell directly to "supercomputer" makers or just crash and burn.

    They'll probably market running Java as a strong point.

    (Then again, does it run Linux?)

    1. Re:Instruction set? by Anonymous Coward · · Score: 1, Insightful

      http://www.embedded.com/story/OEG20030610S0041

      maybe it is not intended to run windows

    2. Re:Instruction set? by Salgat · · Score: 1

      I'm sure companies such as Intel have already considered these types of chips and know that there just isn't a market for it.

    3. Re:Instruction set? by dfedfe · · Score: 1

      as others have said, yeah: it runs linux

      they also have a compiler for it (of course), an IDE, a "full-system simulation model", etc.. lots of development tools..

      not that I've seen any of them, but they claim to have a nice development environment..

    4. Re:Instruction set? by Anonymous Coward · · Score: 0

      Intel is not the leader in non Windows computers. Windows runs on a small percentage of computers.
      There is a market; it just does not have the margins that Intel requires to operate.

    5. Re:Instruction set? by vertinox · · Score: 1
      No. Apparently this thing was solely designed for embedded processors in video equipment.

      FTFA (page 2):

      Each TILE64 processor is capable of encoding two simultaneous streams of H.264 video, and over ten streams of broadcast-quality high definition video. That would be a boon for anyone that wants to stream live directly from the DV camera or video rackmount gear.
      --
      "I am the king of the Romans, and am superior to rules of grammar!"
      -Sigismund, Holy Roman Emperor (1368-1437)
    6. Re:Instruction set? by Wesley+Felter · · Score: 2, Insightful

      What's the instruction set of your router? Your TV? Why does it matter?

    7. Re:Instruction set? by ConradBurner · · Score: 1

      It matters because an instruction set determines what you can do with a programmable IC. If a chip is programmable, it has an instruction set. Your TV may have several ICs in it, but older TVs probably have no "instruction set" at all; they might not even have ICs in them. I think CPUs had 3-4% of the IC market, while the rest of the chips are mostly non-programmable, without an instruction set. FPGAs showed up recently, and these can actually be programmed with a completely new instruction set each time... CPUs aren't the "fastest" way of doing things, they are simply more versatile. My best guess is that the TILE64 bridges a little between FPGAs and CPUs, and ALL of what it is capable of doing will depend on its instruction set. Being able to pipeline processes inside the TILE64, for example, would probably be something crucial to its design.

    8. Re:Instruction set? by Televiper2000 · · Score: 1

      You seem to be using "instruction set" in place of "functionality", "programmability", or "architecture." FPGAs for the most part replace the non-programmable chips that you would say "have no instruction set." Today you can embed your own micro inside an FPGA, and some of the high end ones have actual cores embedded in the silicon.

      --
      New! Device Legs: These legs will help your poor OEM installed product escape any hamfistedness it may encounter. Ava
    9. Re:Instruction set? by wed128 · · Score: 1

      Mod Parent UP, this was a pretty good article

  16. wow. by paulbd · · Score: 2, Funny

    it might even be as successful as the similarly revolutionary Kendall Square Research machine, just down the road from MIT.
    i wouldn't hold my breath.

  17. Tequila128 by crea5e · · Score: 4, Funny

    In related news, Boston College has also released a processor of their own.

    The Tequila128. Free copy of virtual beer pong included.

  18. Questions about company by Anonymous Coward · · Score: 0

    I am having some real problems with this. It sounds a little too good to be true. For now though, I will give them the benefit of the doubt (and I am sure most other people will too).

    1. Re:Questions about company by larry+bagina · · Score: 3, Insightful

      it reminds me of another T company -- Transmeta. I wonder if they'll hire RMS to work on HURD....

      --
      Do you even lift?

      These aren't the 'roids you're looking for.

    2. Re:Questions about company by secPM_MS · · Score: 1
      Actually, this brings up the even older Inmos Transputer, which we looked at when I was at Siemens Research in the mid 80's. They killed themselves by insisting that everybody go out and program in Occam.

      It will be interesting to see how this works out. In practice, the development tools seem to be primary. If I can't develop for it easily, people don't and the product fails. Hence, you need development / cross development tools.

    3. Re:Questions about company by TheRaven64 · · Score: 1

      I'd be interested to see Erlang running on this chip. I've used Erlang on a similar-spec, but (physically) much larger system; a 64-CPU 600MHz MIPS box from SGI, and it ran very nicely (the 32GB of RAM possibly helped a bit there...). The interconnect between the cores in this system could make it much faster if they can get the process distribution problem solved nicely. This could make the part very attractive to the telecoms industry.

      --
      I am TheRaven on Soylent News
  19. But does it... by niceone · · Score: 5, Informative

    well, yes it does run Linux - full SMP 2.6 according to the blurb on their site.

    1. Re:But does it... by Eponymous+Bastard · · Score: 5, Informative

      One thing the blurb doesn't make clear is that this is not a workstation CPU. It's designed for embedded systems and system on a chip applications. They mention video compression as an example.

      If you look at their block diagram this looks more like an FPGA-on-drugs than a CPU.

      The individual blocks are probably programmed with GCC, since it should be trivial to port it to a MIPS-like architecture. I wonder if the interconnect uses a VHDL type language or if they rely on their weird cache to build efficient shared memory.

      Either way, it looks like you have to keep in mind the architecture while designing your software. I doubt they can build a compiler that can manage the division of labor.

      Unlike a typical multicore design you wouldn't use this to parallelize a multithreaded application or a multiprocess workload. The center processors will have a very different latency characteristic than the edge ones, and you want the parts that interact with the network to be on the points adjacent to the controllers, for example.
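
      To put rough numbers on the center-versus-edge point, here is a minimal Python sketch. It assumes an 8x8 grid of tiles and counts one hop per link (Manhattan distance). The grid size, the function names, and the use of hop count as a stand-in for latency are illustrative assumptions, not details taken from Tilera.

```python
# Sketch only: an assumed 8x8 mesh where a message moves one tile per hop.
GRID = 8  # assumption: 64 tiles laid out as 8 rows by 8 columns


def hops(src, dst):
    """Manhattan distance between two (row, col) tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def total_hops_to_all(tile, grid=GRID):
    """Sum of hop counts from one tile to every tile in the grid."""
    return sum(hops(tile, (r, c)) for r in range(grid) for c in range(grid))


def hops_to_nearest_edge(tile, grid=GRID):
    """Hops from a tile to the closest edge row or column."""
    r, c = tile
    return min(r, c, grid - 1 - r, grid - 1 - c)


if __name__ == "__main__":
    for name, tile in (("corner", (0, 0)), ("center", (3, 3))):
        avg = total_hops_to_all(tile) / (GRID * GRID - 1)
        print(name, "avg hops to other tiles:", round(avg, 2),
              "| hops to nearest edge:", hops_to_nearest_edge(tile))
```

      Under these assumptions a center tile is closer on average to the other tiles but farther from the edge, which is why placement of the I/O-facing code would matter.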

      So it should work great for a specially designed system, but not so great as a general purpose CPU.

    2. Re:But does it... by Sigy · · Score: 1

      I worked on the RAW project as an undergrad and started work on a shared memory network for it. The interconnect is actually controlled as part of the instruction set just like the processors. Back then it was only 16 cores on a chip. The way I set up the shared memory network was to use the edge processors to act as the controllers for the off-chip memory they were connected to. The other processors would send requests for reads and writes to the appropriate edge processor. I'm sure that their current method is significantly different from what I designed.

      Each processor had some cache and there was also a special messaging network used to get to off chip non-shared RAM.

      If you really want to get into the details:
      http://cag-www.lcs.mit.edu/raw/documents/index.html

  20. consumption == dissipation? by Anonymous Coward · · Score: 0

    Doesn't the number of watts consumed have to equal the number of watts of heat dissipated?

  21. Pricing? by jcr · · Score: 0, Flamebait

    TFA doesn't say a thing about pricing of these parts. If anyone's been in touch with them, could you please let us know what they're selling for?

    -jcr

    --
    The only title of honor that a tyrant can grant is "Enemy of the State."
    1. Re:Pricing? by PhrostyMcByte · · Score: 1
      sure it does.

      The processor is also available in lots of 10,000 for $435
    2. Re:Pricing? by jcr · · Score: 1

      Ah, missed that. Thanks.

      -jcr

      --
      The only title of honor that a tyrant can grant is "Enemy of the State."
  22. Obligatory by Arceliar · · Score: 1

    I for one welcome our 64 core overlords.

  23. The Gentoo CPU by Rolman · · Score: 0

    It'd make the need to run 'emerge world' every week or so a lot less cumbersome. They should market it as the Gentoo CPU!

    --
    - Otaku no naka no otaku, otaking da!!!
    1. Re:The Gentoo CPU by wilymage · · Score: 1
      Damn, I just got excited...

      # echo 'MAKEOPTS="-j65" >>' /etc/make.conf<br>
      # emerge -atvDNu world

      Then I realised libexpat.so.* will, undoubtedly, fuck something up.
      --
      The secret to creativity is knowing how to hide your sources. -- Albert Einstein
    2. Re:The Gentoo CPU by Anonymous Coward · · Score: 0

      # echo 'MAKEOPTS="-j65" >>' /etc/make.conf<br>

      make.conf < br > ???
      Typical Gentoo user, "I LIEK EROOR MESSGS LOL"

      watch tail dmesg
      LOLZ
    3. Re:The Gentoo CPU by Anonymous Coward · · Score: 0

      libexpat.so.0

      What a fuckin nightmare that was.

  24. Tilera MDE by MrMunkey · · Score: 3, Informative

    For those of you wondering about what their software will be like, here's some info on their Multicore Development Environment (MDE). http://www.tilera.com/products/software.php It's not the most info in the world, but it's a start.

  25. ummm... Isn't Sun's T2 running 256 threads? by Fallen+Kell · · Score: 1

    The T1 was already doing 32, and the new T2 is supporting 256 in a single chip. Just wondering why "TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone", when the mile marker is already at 256?

    --
    We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
    1. Re:ummm... Isn't Sun's T2 running 256 threads? by Slashcrap · · Score: 3, Informative

      The T1 was already doing 32, and the new T2 is supporting 256 in a single chip. Just wondering why "TILE64 has indeed brought us into the era of 64 general-purpose, mesh-networked processor cores on a single chip, and that's a major milestone", when the mile marker is already at 256?

      Because this has 64 cores as opposed to 8 cores on either the T1 or T2?

      Because the total number of threads supported by an 8 core T2 is 64 and not 256 as you wrote above?

    2. Re:ummm... Isn't Sun's T2 running 256 threads? by rbanffy · · Score: 1

      I think neither T1 nor T2 are mesh networked. IIRC, all cores share a single internal bus. The shared bus design is easier to program as it resembles more traditional SMP architectures. This Tilera beast breaks away from this idea and implements a more supercomputer-like design.

      I wish them luck. As I said earlier, this x86-dominated desktop world is boring.

    3. Re:ummm... Isn't Sun's T2 running 256 threads? by Paradigm_Complex · · Score: 1

      There's a notable difference between threads and cores. The T2 has 8 cores, each working 8 threads, totaling 64 (not 256). http://en.wikipedia.org/wiki/UltraSPARC_T2 While this isn't a perfect comparison, consider a P4 with HT against a true Dual Core CPU.

      --
      "A witty saying proves nothing." - Voltaire
    4. Re:ummm... Isn't Sun's T2 running 256 threads? by TheRaven64 · · Score: 1

      The T2 is running 64 threads, not 256. It is running eight threads per core, on eight cores. Each set of eight threads on a T2 shares a set of execution units, each thread on the TILE64 has its own set.

      --
      I am TheRaven on Soylent News
    5. Re:ummm... Isn't Sun's T2 running 256 threads? by Anonymous Coward · · Score: 0

      The T2's per core thread architecture isn't even remotely comparable to Intel's Hyperthreading. It's far far better.

  26. MOD PARENT UP by dreamchaser · · Score: 1

    I was going to post the same sentiment but I wanted to read through some other replies first to avoid redundancy. This is *not* a general purpose CPU. It looks like it's targeted more towards high end switches/routers and things like advanced digital video applications (perhaps HD set tops or game consoles?).

  27. This was my companys idea in 2001 by John+Sokol · · Score: 4, Interesting

    It was called Enumera: www.enumera.com

    I started to work with Chuck Moore, the author of the FORTH Language on a 7X7 array of very fast small processors.

    From a talk I gave, February 16, 2001
    From http://www.dnull.com/~sokol/amorp/emtalk.ppt

    On this size chip a 7x7 array (49 CPUs) with RAM could be
    built. Co-processors could also be added.
    Each CPU would be operating at 2400 MIPS, x 49 for a total of 117 billion operations per second.
    The power consumption would be about 1 watt: 1.8 volts at 500 mA.
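
    The figures above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes one instruction counts as one operation, and the variable names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted figures (assumes 1 instruction = 1 op).
CORES = 7 * 7            # 7x7 array of CPUs
MIPS_PER_CORE = 2400     # million instructions per second, per CPU
VOLTS = 1.8
AMPS = 0.5               # 500 mA

total_ops_per_sec = CORES * MIPS_PER_CORE * 1_000_000
power_watts = VOLTS * AMPS
ops_per_watt = total_ops_per_sec / power_watts

if __name__ == "__main__":
    print("total ops/sec:", total_ops_per_sec)
    print("power (W):", power_watts)
    print("ops per watt:", ops_per_watt)
```

    So the quoted "117 billion" is 49 x 2400 MIPS rounded down, and the "1 watt" is 1.8 V x 0.5 A rounded up.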
    With this level of computing power, new applications that were unthinkable before now become possible. Also mentioned earlier on Slashdot:
    http://developers.slashdot.org/comments.pl?sid=138584&threshold=0&commentsort=0&mode=thread&cid=11600799

    And earlier here:
    http://www.colorforth.com/ 25x Multicomputer Chip

    This eventually became IntellaSys after Enumera failed.

    IntellaSys CTO Chuck Moore to Present at In-Stat Spring Processor Forum; Scalable Embedded Array Platform for Implementing Asynchronous, Scalable Multicore Solutions Using Elegant VentureForth Programming to Be Discussed in Detail http://www.intellasys.net/products/24c18/SEAforth-24A-3.pdf
    http://www.findarticles.com/p/articles/mi_m0EIN/is_2005_Oct_24/ai_n15730157
    http://www.findarticles.com/p/articles/mi_m0EIN/is_2006_May_1/ai_n16135032

    Also for older info see:
    Specifically look at the P21 / I21/ F21 chips...

    http://www.enumera.com/chip/
    http://www.ultratechnology.com/ml0.htm
    http://www.ultratechnology.com/f21.html#f21
    http://www.ultratechnology.com/store.htm#stamp
    http://www.ultratechnology.com/cowboys.html#cm

    --
    I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    1. Re:This was my companys idea in 2001 by perkr · · Score: 1

      Seriously, what was the point with your post?

      That there are earlier patents on related technology? That you want credit for whatever they are doing? It would help if you motivate a post like that.

    2. Re:This was my companys idea in 2001 by John+Sokol · · Score: 2, Interesting

      I am not sure really what the point is, I guess I am just venting out of frustration. Also adding some information for anyone interested in similar work I had done, showing this isn't a new idea.

      I put $100,000 Cash and almost 2 years worth of work into this and got nothing, no one was even interested.
      But then I see a Bunch of MIT weenies do it and they get all kinds of attention as something new and revolutionary 6 1/2 years later.

      There is also a real chance they took the idea right off my web site or Slashdot post, or were maybe even present at my talk, and never even gave me some credit for the concepts. Their design really looks like it was lifted straight off my paper.

      So I guess at least I am exposing some plagiarism.

      I mean what the heck is the point of having an incredibly good idea and investing so much time and money into it just to watch someone else profit from it without so much as a thank you.

      I was at least trying not to whine and complain in my post and keep it purely informative and provide links to my very similar earlier works.

      --
      I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    3. Re:This was my companys idea in 2001 by suv4x4 · · Score: 3, Insightful

      I am not sure really what the point is, I guess I am just venting out of frustration. Also adding some information for anyone interested in similar work I had done, showing this isn't a new idea.

      I put $100,000 Cash and almost 2 years worth of work into this and got nothing, no one was even interested.


      I'm not sure why the frustration. I'm sure multi-core was not just your original idea. If you're in the industry you know that:

      1. IT is rich on ideas, poor on implementation.
      2. Marketing a product is just as (if not more) important than making a product.
      3. Most businesses fail in the first 5 years. And this one may be no exception. They didn't exactly enjoy massive success just yet. They got a few crappy articles and landed on Slashdot. Kind of hard for a hardware company to cash in on that alone.

      Their design really looks like it was lifted straight off my paper. So I guess at least I am exposing some plagiarism.

      You don't expose plagiarism by venting frustration on Slashdot: where are your patents? What guarantee is there that you're the originator, and what guarantee is there that they *stole* your work rather than reinventing it independently, which happens often with technology that's in a boom (i.e. multi-core designs)? There's a reason the patent system exists, so forget what you read here about patents on Slashdot.

    4. Re:This was my companys idea in 2001 by rastoboy29 · · Score: 1

      It sounds cool...so why did Enumera fail?

    5. Re:This was my companys idea in 2001 by Anonymous Coward · · Score: 0

      Don't fret. Just like you, these guys can't compete with the big boys. Transmeta had over 40 million in funding and look what happened to them. When Intel or AMD release a 64-core CPU then the industry will care. Tilera will be history in a couple of years. They will have about 3 months of fake good news about sales to academic and research institutions but nobody doing serious work is going to use an alpha quality processor.

    6. Re:This was my companys idea in 2001 by pmadden · · Score: 2, Insightful
      Of course, this was also Thinking Machines' idea a bit earlier. http://en.wikipedia.org/wiki/Thinking_Machines
      It's good to see that MIT has perfected the technology.
      • Build a machine with lots of processors.
      • Get investors to buy into the hare-brained scheme.
      • ??? (Mention that programming is a problem to be solved shortly.)
      • Skip town with the cash (Profit!).
      Hmmm. I think I'm missing something about a beowulf cluster, or maybe underpants.
      It's scary how little history people know. Programming for multi-processor machines was part of the ACM recommended university curriculum back in 1968. Dozens of companies were going to revolutionize the world with parallel (anyone remember the Atari ATW? http://www.atarimuseum.com/computers/16bits/transputer.html). If parallel worked, it would be really great; I'd like a big rock candy mountain and free energy, while we're at it. Amdahl's law http://en.wikipedia.org/wiki/Gene_Amdahl is from 1967 (this is the 40th anniversary, people!). Madness, sheer madness.
    7. Re:This was my companys idea in 2001 by Anonymous Coward · · Score: 0

      Hrm. According to this, the core ideas in this startup's work were discussed publicly at least as early as 1996.

      Heck, they were in Scientific American in '99. Any chance you might have caught the article?

    8. Re:This was my companys idea in 2001 by John+Sokol · · Score: 5, Informative

      Parallel processors on a single die (chip) are very different from Thinking Machines & beowulf clusters.

      Up till now there were only 2 types of Parallel processing.

      1.) loosely coupled. Thinking Machines & beowulf clusters for example are using this, these are interconnected with Ethernet or some other Network medium and send messages back and forth.

      2.) Tightly coupled, this is SMP, NUMA, SNOOPY, basically shared memory system where each processor shares the same global memory space.

      Each requires very different programming strategies and are limited to certain types of problems.

      There is also a third form that is lesser known: systolic arrays. An example of this is TimeLogic, and many DOD type projects.
      This is usually done with a bunch of FPGA's and the math computations are done as a series of hardware pipelines without any CPU.

      With a parallel-core processor it's possible to make it like an SMP (shared memory) type system, but you really get hammered by the memory bottleneck, so after about 4 CPUs you don't really gain much.

      What I had proposed was doing systolic array type processing, but with simple but fast CPUs on one chip.
      They would be connected with CPU registers that would pass data directly from one CPU to the next.
      Its design would allow super tight coupling between each processor, so a programming problem wouldn't need to process a buffer at a time but could tackle problems that can't normally be broken up into parallel operations. For example, a bignum math operation like multiplying 2 numbers that are 1024 bits long. Or large FFT, fast DVT, or matrix operations where each CPU could process part of a single operation that must be done serially, and cannot be done using traditional parallel processing.
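
      As a rough illustration of the bignum example, here is a minimal Python sketch of how a 1024-bit multiply could be split into per-CPU stages. It is a sequential simulation, not a real systolic array: each pass of the outer loop stands in for one hypothetical cell that multiplies by a single 32-bit limb and hands the running result on to its neighbor. The limb size, function names, and cell-per-limb mapping are assumptions made for illustration only.

```python
# Sketch only: a sequential simulation of a chain of cells, one per limb of b.
LIMB_BITS = 32                     # assumption: each cell works on 32-bit limbs
MASK = (1 << LIMB_BITS) - 1


def to_limbs(n, count):
    """Split a non-negative integer into `count` little-endian limbs."""
    return [(n >> (LIMB_BITS * i)) & MASK for i in range(count)]


def from_limbs(limbs):
    """Rebuild an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def pipeline_multiply(a, b, bits=1024):
    """Schoolbook multiply where each 'cell' handles one limb of b."""
    n = bits // LIMB_BITS
    a_limbs = to_limbs(a, n)
    b_limbs = to_limbs(b, n)
    acc = [0] * (2 * n)            # running result handed from cell to cell
    for cell, b_limb in enumerate(b_limbs):
        carry = 0
        for i, a_limb in enumerate(a_limbs):
            t = acc[cell + i] + a_limb * b_limb + carry
            acc[cell + i] = t & MASK
            carry = t >> LIMB_BITS
        acc[cell + n] = carry      # top limb for this cell's partial product
    return from_limbs(acc)


if __name__ == "__main__":
    x = (1 << 1024) - 12345
    y = (1 << 1023) + 6789
    print(pipeline_multiply(x, y) == x * y)
```

      In hardware the cells would overlap in time, each passing limbs and carries to the next as soon as they are ready; the simulation only shows that the work divides cleanly along those lines.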

      Specifically my interest was in video compression and image processing in real time. This is where DCT, motion vector searches, Huffman coding and other operations that don't parallelize well would really get a boost using this type of processor.

      --
      I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    9. Re:This was my companys idea in 2001 by John+Sokol · · Score: 1

      Hrm. According to this, the core ideas in this startup's work were discussed publicly at least as early as 1996. Which paper on that site? I don't see it.

      I was also thinking about this as early as 1996 myself, and started the company to do this in 2001.
      --
      I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    10. Re:This was my companys idea in 2001 by Anonymous Coward · · Score: 0

      Um, Forth?

      Yes, *of course* it's 1000x times easier to write code, and the code is only 1000x smaller than other code, but if it gets nothing done for the user...

      It's sad, Chuck Moore was a really innovative guy back in the day, but since he's gone 1000% simplicity, there's not much more. iTV failed (nice idea, probably lack of functionality?).

      ColorForth isn't exactly what you expect in terms of productivity from your computer. I don't care if it's 1000x times faster, I care about some functionality at a speed I can work with, like my 5-year-old PC here.

    11. Re:This was my companys idea in 2001 by Anonymous Coward · · Score: 0

      just one word: Transputer

      whether they're on a single die or not is a matter of cost savings and size.

    12. Re:This was my companys idea in 2001 by John+Sokol · · Score: 1

      Yes, I have used Transputers; they would be considered loosely coupled with high speed links.
      What I was working on was much more tightly coupled.

      The Transputer is now replaced with Beowulf Clusters.

      --
      I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    13. Re:This was my companys idea in 2001 by pmadden · · Score: 1
      > Parallel processors on a single die (chip) is very different from Thinking Machines & beowulf clusters.

      IMO, single or multiple die configuration is not a big distinction; the impact is a constant factor on communications latency (and throughput). One of the other posters mentioned the Transputer; while not on a single chip, they went to a lot of effort to minimize inter-chip communications times. It's easy to get caught up in implementation details; sort of like arguments over what programming language is best. It's the underlying algorithms and complexity that matter.

      Getting meaningful parallelism is difficult, even if you could get zero communications delay. It's amusing to me that Anant Agarwal is pushing this idea, when he had a CACM paper (Vol 34, No 3, March 1991, pp. 57-61) that mentions "for any algorithm in which the overhead due to parallelism increases with p, the scaled speedup is necessarily less than linear, and that when the parallelism overhead is linear or worse with p, there is actually a hard upper bound on the achievable scaled speedup." http://portal.acm.org/citation.cfm?id=102868.102871&coll=portal&dl=ACM&idx=102868&part=periodical&WantType=periodical&title=Communications%20of%20the%20ACM&CFID=32421333&CFTOKEN=50501291. These observations were based on some work by Richard Karp http://portal.acm.org/citation.cfm?id=894803&dl=ACM&coll=portal&CFID=32421333&CFTOKEN=50501291, who I have a lot of confidence in. What this means is that for most algorithms, Amdahl is right. If your communications time is non-zero, you're going to hit the wall. Better interconnect pushes the wall back, but not forever.
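
      For anyone who wants to see the wall, here is a minimal Python sketch of Amdahl's law plus a toy overhead term. The plain formula is the standard fixed-size one; the second function adds a cost that grows linearly with the core count p, which is a simplified assumption and not the scaled-speedup model from the paper quoted above. The 95% parallel fraction and the 0.001 overhead per core are made-up numbers.

```python
# Sketch only: fixed-size Amdahl speedup, plus an assumed linear overhead term.

def amdahl_speedup(parallel_fraction, cores):
    """Classic Amdahl's law: speedup of a fixed-size job on `cores` cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def speedup_with_overhead(parallel_fraction, cores, overhead_per_core):
    """Same model with an assumed communication cost that grows with cores."""
    serial = 1.0 - parallel_fraction
    time = serial + parallel_fraction / cores + overhead_per_core * cores
    return 1.0 / time


if __name__ == "__main__":
    for p in (1, 4, 16, 32, 64):
        print(p,
              round(amdahl_speedup(0.95, p), 2),
              round(speedup_with_overhead(0.95, p, 0.001), 2))
```

      With these made-up numbers, plain Amdahl can never exceed 20x however many cores you add, and once overhead is included the 64-core figure comes out lower than the 32-core one.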

      Anyway, this is all nit-picking. Whether or not Tilera survives depends a lot on what they're going to try to do with it. If the application is graphics (or, as you mention, video processing), the problem allows a lot of parallelism; multi-core makes a lot of sense here (nVidia and ATI have been doing this for years). Cisco-style router switches are good applications too. I heard Agarwal give a presentation at a conference earlier this year, where he was making his pitch; I thought there was way too much hand-waving about the application. The investors will find out the details soon enough, and in the mean time, I've got another stock to short!

    14. Re:This was my companys idea in 2001 by John+Sokol · · Score: 1

      I am talking about a latency of less than about 0.2 nanoseconds. Basically 1 CPU instruction of latency at 6 GHz.

      The Transputer sends messages! Not registers at the CPU level!

      This is a very large distinction because it allows you to parallelize all problems, even those that can't be run in parallel on normal parallel architectures.

      --
      I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    15. Re:This was my companys idea in 2001 by Anonymous Coward · · Score: 0

      2.) Tightly coupled, this is SMP, NUMA, SNOOPY, basically shared memory system where each processor shares the same global memory space.

      You do know that NUMA stands for non-uniform memory architecture, right? As in, all processors don't share the same memory space.

    16. Re:This was my companys idea in 2001 by John+Sokol · · Score: 1


      Good lord, yes, I know what NUMA stands for.
      Its original design came out of members of the "PARALLEL Processing Connection" (PPC) that was run by Mitchell Loebel. That group used to meet at the Sun Microsystems PAL1 headquarters building in Sunnyvale, CA once a month; I was part of that organization for over 10 years. One of the PPC members was also a member of the IEEE committee for NUMA, David B. Gustavson, who also developed the Scalable Coherent Interface, SCI.

      It's still a shared memory system. Just not symmetrical, so each processor can share different sections of the memory.
      All of these shared memory schemes have to implement complex algorithms to keep the CPU caches current and prevent race conditions.

      The Sun Fire V210, V240, V440, 4800, 6800, 6900, 12K, 15K, 20K, 25K were based on Cache coherent NUMA (ccNUMA).
      Some of these supported up to 64 UltraSPARC processors.

      I was a developer on Solaris 2.5

      So it's not like I just fell off the apple cart.

      --
      I am always doing that which I can not do, in order that I may learn how to do it. - Pablo Picasso
    17. Re:This was my companys idea in 2001 by nuzak · · Score: 1

      Ah yes, the X21 chips, the chips that will save the world. Weird word sizes, and of course programmable only in forth. By forth nuts, for forth nuts, forth forth uber alles, all other languages are conspiracies by hardware companies to sell faster hardware because they're all awful kludges compared to the shining beacon of enlightenment that is forth, etc.

      Sorry, maybe I read too much comp.lang.forth, but the sour attitude seemed to stem from chipchuck himself. Just possibly it turned off a few potentially interested parties? Now the shBoom looked promising, but under PatSci's stewardship it appears doomed to be nothing but a source of IP licensing.

      --
      Done with slashdot, done with nerds, getting a life.
    18. Re:This was my companys idea in 2001 by mr_mischief · · Score: 1

      It's super easy to get from most modern languages into Forth via translation. Forth is very similar to postfix assembly for most architectures. Compiler texts sometimes show how to get from high-level languages to machine code via building a syntax tree followed by traversing the tree and producing postfix pseudo-assembly. Outputting Forth would be no more difficult.
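
      A minimal sketch of that tree-to-postfix step, in Python. The tuple-based tree layout and the names to_forth and run_forth are made up for illustration, and only three arithmetic words are handled; this is not any real compiler's output format.

```python
# Hypothetical expression tree: a node is either an integer literal
# or a tuple (operator, left_subtree, right_subtree).
def to_forth(node):
    """Post-order traversal: emit both operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return to_forth(left) + to_forth(right) + [op]
    return [str(node)]


def run_forth(words):
    """Tiny stack evaluator for +, - and *, used only to check the output."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    stack = []
    for word in words:
        if word in ops:
            b = stack.pop()  # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(ops[word](a, b))
        else:
            stack.append(int(word))
    return stack.pop()


# (2 + 3) * (7 - 4)
tree = ("*", ("+", 2, 3), ("-", 7, 4))
words = to_forth(tree)
print(" ".join(words))   # 2 3 + 7 4 - *
print(run_forth(words))  # 15
```

      The emitted word sequence is already valid input for a stack machine, which is why the translation needs no register allocation step.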

      Some Forth implementations are actually implemented so that words in the Forth dictionary are directly executable machine code even on standard multi-purpose CPUs.

    19. Re:This was my companys idea in 2001 by nuzak · · Score: 1

      Forth can be generated, sure, but at that point why not skip the middleman and generate assembly? For a forth chip, forth is the ISA, so what you really have is just another target architecture. Now that we're talking about equal terms, what remains is the question: is the Zero-Operand Instruction Set architecture actually superior?

      There's something to be said for starting again from fundamentals, but I doubt the SeaForth folks solved any of the basic design issues that Intel or AMD face. The ISA isn't everything, but you'd never know it from the advocacy that comes from c.l.f.

      --
      Done with slashdot, done with nerds, getting a life.
    20. Re:This was my companys idea in 2001 by zix619 · · Score: 1

      I believe where you market your idea is also of great importance. These people seem to be targeting the right market: advanced networking applications like UTM and deep inspection. Though, I wonder how they can shovel the IP packets into the different cores fast enough?

    21. Re:This was my companys idea in 2001 by mr_mischief · · Score: 1

      Those are very good points. One doesn't necessarily have to beat AMD or Intel at desktop and server lines to be successful, though. Intel, Zilog, Freescale, TI, AMD, Via, Sun, IBM, Fujitsu, and more have programmable processors that sit nowhere near the desktop CPU market. Even those companies at or near the top of that market make chips for motherboards, disk drives, Ethernet adapters, cars, coffee makers, industrial control, etc.

      The great thing about a new concept from outside established players is that it can be integrated with the best of the technology through buyouts, mergers, or licensing deals if the concept is worthwhile. It's much harder for the established leaders to come up with a wholly new idea than for the people with the new idea to get some manufacturing tech. A collaboration could be a big win for everyone involved in it.

      I'm sure c.l.f is not a lot different in many ways from c.l.lisp, c.l.haskell, c.l.perl.misc, or any other language newsgroup where there is a lot of personality and promotion within the language community. There are Lisp machines, there are Java processors, the BASIC stamp, and Perl replacements for most of the Unix / GNU command line utilities (partly to have a Unixish system with little more than a C library and a Perl interpreter and partly to bring these tools to non-Unix environments easily). The fact that Forth is closer to the level of normal hardware than most dialects of Lisp or Basic and much closer than Java is means it may actually be a lot more flexible than systems designed with those other languages in mind. However, there are always those zealots who take anything good too far.

  28. Everybody wants it "easy" by Anonymous Coward · · Score: 0

    What's with all these "Analysts" whining about wanting it easy to program..?
    Umm, let's see.. we could use BASIC: 10 RUN PARALLELIZED...

    With all the sophisticated advances in processor designs, why do people think that they should wait for a "magical" easy solution for the software side? Face it, the HW teams didn't wait for an easy "magical" solution, they actually stretched their minds to consider some highly effective alternative design optimizations that allow for some incredible leaps in performance and integration.

    So why can't the SW side do the same? Start thinking laterally, in naturally aligned parallelism, with real concurrency, and simultaneous MIMD. Our brain is the best tool for evaluating a range of algorithms that best fit the problem. But many seem to expect some spiffy add-on class library or Optimization-Factory will supply the magic.

    1. Re:Everybody wants it "easy" by Anonymous Coward · · Score: 0
      The reason most Analysts want it easy is the average programmer/developer can not think beyond a single thread.

      So to get the best out of a multi-processor/threaded system appropriate frameworks have to be set up so the average grunt programmer that they are working with can develop without thinking.

  29. Speed per core? by jshriverWVU · · Score: 1
    It mentions 600-900mhz, is that per core or per total CPU? While 64 900mhz cores sounds nice, 900mhz made up of 64 14mhz cores is kinda pointless. That would be like cascading a bunch of PIC chips together, so I'm guessing it's the first. Also what kind of architecture is it? Are there spec manuals available so people can start porting gcc, libc, and eventually the linux kernel to it?

    Seems interesting, would be nice if it comes out at an affordable price.

    1. Re:Speed per core? by Slashcrap · · Score: 1

      It mentions 600-900mhz, is that per core or per total CPU? While 64 900mhz cores sounds nice, 900mhz made up of 64 14mhz cores is kinda pointless.

      Each core runs at 600-900MHz. Nobody would release a CPU made of 64 cores at 14MHz and then claim that it was a 900MHz CPU. That is so retarded that even the most suicidal marketing department would balk at the idea.

      Seems interesting, would be nice if it comes out at an affordable price.

      If it does, it will be in quantities of 1000 or more. They're not going to sell you one.

    2. Re:Speed per core? by the_greywolf · · Score: 1

      It mentions 600-900mhz, is that per core or per total CPU? While 64 900mhz cores sounds nice, 900mhz made up of 64 14mhz cores is kinda pointless.

      I think you're confused. Never once in the history of computing has the frequency been a factor of the number of CPUs. A CPU's frequency is the measurement of the number of times the CPU's clock or timer asserts in a second. In an SMP environment, all CPUs operate on a synchronized clock, and therefore operate at the same clock speed. All 64 cores operate at 900MHz.

      --
      grey wolf
      LET FORTRAN DIE!
    3. Re:Speed per core? by f8l_0e · · Score: 1

      Nobody would release a CPU made of 64 cores at 14MHz and then claim that it was a 900MHz CPU. That is so retarded that even the most suicidal marketing department would balk at the idea.

      May I present a '64-bit' gaming system to you? http://en.wikipedia.org/wiki/Atari_Jaguar/
    4. Re:Speed per core? by Anonymous Coward · · Score: 0

      Nobody would release a CPU made of 64 cores at 14MHz and then claim that it was a 900MHz CPU. That is so retarded that even the most suicidal marketing department would balk at the idea.

      Let me introduce you to Sun:
      http://blogs.sun.com/jonathan/entry/sun_enters_the_commodity_silicon

      "We're announcing the fastest microprocessor we've ever shipped this week - delivering 89.6 Ghz of parallel computing power on a single chip" -- From 8 cores running @ 1.4 GHz each. Where does the 89.6 come from? He multiplies the FULL speed by the number of hardware threads (8 per core).

      Yes, 11.2 actual GHz becomes 89.6 for marketing purposes. The other 78.4 is waiting around in case it ever gets to run across 8 dimensions of time at once, with Buckaroo Banzai and the Hong Kong Cavaliers.
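
      The arithmetic behind those three numbers, as a quick Python check. The variable names are just labels for the figures quoted above.

```python
cores = 8              # cores on the chip
threads_per_core = 8   # hardware threads per core
clock_ghz = 1.4        # clock speed of each core

# Marketing figure: clock multiplied by every hardware thread.
marketing_ghz = cores * threads_per_core * clock_ghz

# Clock multiplied by the number of cores only.
summed_core_ghz = cores * clock_ghz

print(round(marketing_ghz, 1))                    # 89.6
print(round(summed_core_ghz, 1))                  # 11.2
print(round(marketing_ghz - summed_core_ghz, 1))  # 78.4
```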

    5. Re:Speed per core? by dbIII · · Score: 1
      It is the clock speed. It means that 600 million times per second another instruction can be processed wherever something is waiting for a clock pulse.

      Video is a trivial thing to run in parallel - doing the same transform on the next 64 frames is an ideal application. Geophysics would be another big application where it's not hard to split one job into a lot of parallel jobs, or finite element analysis (engineering design) or a lot of other numerical applications.
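
      A rough sketch of the "same transform on the next 64 frames" pattern, in Python. The frames here are tiny made-up lists of 8-bit pixel values and transform is a stand-in; nothing below is Tilera-specific. Note that CPython threads will not speed up pure-Python arithmetic like this, so the sketch shows only the shape of the job split; a real implementation would use native code or one process per core.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(frame):
    """Stand-in per-frame transform: invert every 8-bit pixel value."""
    return [255 - pixel for pixel in frame]


# 64 hypothetical frames, each a flat list of 16 pixel values.
frames = [[(i + j) % 256 for j in range(16)] for i in range(64)]

# One worker per frame; each frame is independent, so no locking is needed.
with ThreadPoolExecutor(max_workers=64) as pool:
    out = list(pool.map(transform, frames))  # map() keeps the input order

print(len(out))   # 64
print(out[0][0])  # 255
```

      Because no frame depends on any other, the only coordination cost is handing out the frames and collecting the results in order.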

    6. Re:Speed per core? by vidarh · · Score: 1

      Your point being? The claim isn't analogous, and the Jaguar actually had 64 bit co-processors so the claim wasn't completely unfounded. Even then the claim caused widespread scorn from people who thought it had no merit, and so is yet another lesson to any marketing department trying to go further.

  30. Let the geeks solve the problem by rbanffy · · Score: 2, Interesting

    "'What will make or break Tilera is not how many peak theoretical operations per second it's capable of (Tilera claims 192 billion 32-bit ops/sec), nor how energy-efficient its mesh network is, but how easy it is for programmers to extract performance from the device. That's the critical piece of TILE64's launch story that's missing right now"

    Build a USD1000 desktop workstation, port Debian Linux to run on it and let the geeks out there adopt it.

    There is no better way to explore a device's capabilities than to let the market do it.

    I want one for myself. I am tired of the x86 architecture.

    1. Re:Let the geeks solve the problem by Nontagonist · · Score: 1

      "Build a USD1000 desktop workstation, port Debian Linux to run on it and let the geeks out there adopt it.

      There is no better way to explore a device's capabilities than to let the market do it.

      I want one for myself. I am tired of the x86 architecture."

      Well, in fact they've done almost this. The thing does run Linux, and
      they have developed a "TILExpress-64(TM) CARD" which is a PCIe expansion
      card with one of these things on it along with a bunch of stuff.

      http://www.tilera.com/products/boards.php

      No word so far on pricing for that card or the development tools.

      However, I have noticed something that is conspicuously absent from the
      coverage of the chip so far - the details of its memory management, or
      lack thereof. No word so far on how much memory it addresses or whether
      it has the sort of memory protection that we've come to expect of modern
      general-purpose CPUs. My guess is that what they have is an absence of
      any traditional MMU, and badly written code is free to run rampant in
      system memory. This is understandable in an embedded system.

      Regards, Non.

      --
      There is another theory that states that this has already happened.
  31. This All Fine and Good, But by LifesABeach · · Score: 1

    When I sit down to play World of Warcraft, what can I expect?

    1. Re:This All Fine and Good, But by Anonymous Coward · · Score: 0
      When I sit down to play World of Warcraft, what can I expect?

      An awkward life without normal social interaction?

    2. Re:This All Fine and Good, But by dm0527 · · Score: 2, Funny

      The same old tired, boring grind and stupid, inane and childish behavior by your fellow gamers?

      --
      - dm - The two most common elements in the universe are Hydrogen and stupidity.
    3. Re:This All Fine and Good, But by ConradBurner · · Score: 1

      Nothing, go back to your game and leave slashdot forever, noob.

    4. Re:This All Fine and Good, But by Anonymous Coward · · Score: 0

      When I sit down to play World of Warcraft, what can I expect?

      Such a general question... how about.. me killing you over and over... "OWNED"
  32. No by Wesley+Felter · · Score: 1

    The UltraSPARC T2 (Niagara 2) has 8 cores and 64 threads, so Tilera has more cores, more functional units, and an equal number of threads.

  33. Nothing to see here... by Anonymous Coward · · Score: 0

    NVIDIA and ATI are touting their latest generation of processors to be roughly equivalent to these things. Let's face it, all a GPU is in the latest revision is a bunch of parallel CPUs with instruction sets (aka "Shader Models") optimized for graphics.

    NVIDIA is even going so far as to offer prepackaged units of multiple GPUs as tomorrow's supercomputers for science. The setups are named TESLA for anyone who's interested.

    What will be interesting is if they bring something better to the table software wise. I'm no programmer, but I hear NVIDIA's CUDA language is a female dog-- not easy to develop for. Even Folding@Home hasn't expressed any intention of porting its program to it (and NVIDIA hasn't offered to help either... don't get me started on the conspiracy theories!)

    in conclusion,

    Move along...

    1. Re:Nothing to see here... by Wesley+Felter · · Score: 1

      Because of all their weird restrictions, GPUs are going to be much harder to program than this Tilera chip. Let me know when somebody is running Snort on a GPU.

  34. Silly Startup, Trix are for Kids !! by systemBuilder · · Score: 0, Flamebait

    Not again !!

    "hey, this must be an embedded chip .." (translation: We FORGOT the MMU!)

    "It supports 64 cores and 192 (insert greek word here) flops!" (translation : guaranteed NEVER to COME CLOSE to THAT throughput, we just wanted you to know ...)

    "We have more software engineers than hardware engineers!" (translation : we outsourced all the software to India for the cost of ONE hardware engineer!)

    "We are using 90 nanometer process" (translation : we cannot afford 65 or 45 nanometer process, we're a startup, HELLO !!!)

    *sigh*. Not again.

    MIT has never succeeded with a supercomputer before, and it looks like they're aiming to keep it that way ...

    1. Re:Silly Startup, Trix are for Kids !! by dreamchaser · · Score: 1

      I know you're trolling, but this chip isn't designed for supercomputers. Of course you'd know this if you'd read the article and/or visited the website. It *is* targeted at embedded apps such as routers, firewalls, and video decoders. That doesn't mean it will be successful, but your post is worth less than the sum of the electrons used to make it.

  35. Oblig II - The /.er Strikes back by ToxicBanjo · · Score: 1

    ... does it run Linux?

    --
    There are only 10 kinds of people in the world. Those that understand binary and those that don't.
  36. Not 64.. by Namlak · · Score: 2, Funny

    Actually, 42 cores is the answer.

  37. Deep packet inspection by Slavidian · · Score: 2, Interesting

    Tilera will succeed because the packet pushers want to be able to do deep packet inspection. Pay close attention to the first three in the apps list from their website:

    Unified Threat Management
    Network Security Appliances
    In-line L4-7 deep packet inspection
    Network Monitoring
    Digital Video:
    Video Conferencing
    Video-on-Demand (VoD) Servers
    Video surveillance
    Media 'Head-End' services

    The engineers in charge of this company should be ashamed of themselves. They are creating exactly the type of product that will help the telcos destroy the internet. DPI and UTM are completely at odds with the intentions of networking protocols. Tilera is handing over control of everything that you and I do online to the telcos. Where is Google? They should be diametrically opposed to the success of this company. Buy them up and quash them.

    1. Re:Deep packet inspection by RightSaidFred99 · · Score: 1

      Haha. Someone modded this tripe interesting.

  38. I for one by tttonyyy · · Score: 3, Funny

    I, for one, parallel welcome our new beowulf joke superseding overlords.
    I, for one, parallel welcome our new beowulf joke superseding overlords.
    I, for one, parallel welcome our new beowulf joke superseding overlords.
    I, for one, parallel welcome our new beowulf joke superseding overlords. ... ... ...
    I, for one, parallel welcome our new beowulf joke superseding overlords.

    --
    biopowered.co.uk - catalytically cracking triglycerides for home automotive use since 2008. Just say no to big oil!
    1. Re:I for one by fava · · Score: 3, Funny

      Actually that is a serial welcome, a parallel welcome (8 core) would be:
      IIIIIIII,,,,,,,, ffffffffoooooooorrrrrrrr oooooooonnnnnnnneeeeeeee,,,,,,,, etc.

      fava

    2. Re:I for one by ded_guy · · Score: 1

      Why did the multithreaded chicken cross the road?
      other to the To side. get

      --
      In the future, all spacecraft will be made of cheese.
    3. Re:I for one by tttonyyy · · Score: 1

      Actually it was a Very Long Instruction Welcome which would be parallelised in the core. ;)

      --
      biopowered.co.uk - catalytically cracking triglycerides for home automotive use since 2008. Just say no to big oil!
  39. A few things by bcmm · · Score: 1

    A few things jump out just skimming this:

    Is the compiler open-source? Is anyone looking at making GCC do this? What exactly have they done to Linux to make it run on these, and is it likely that the changes will make it into the mainline kernel? Also, they don't seem to mention if they have a C++ compiler.

    --
    # cat /dev/mem | strings | grep -i llama
    Damn, my RAM is full of llamas.
  40. The Cray XMT processor does 128 cores on one chip. by Anonymous Coward · · Score: 0

    This looks like the little and late sibling of the Cray XMT processor.

  41. Who needs embedded 64 bit ? by dynomitejj · · Score: 0

    Who the heck needs EMBEDDED 64 bit ? When I think embedded, I think p133 on a small board computer. What the heck am I missing ? So, let's get this straight: for embedded, I can now build a POS with 128 gigs of ram and 64 processors. Nice processor.. that's what you would expect from MIT. You or I won't see anything like this for 5 years.

  42. yawn......... by dynomitejj · · Score: 0

    The article is mildly interesting, but I'm so wasting my time reading the mindless comments here.

    1. Re:yawn......... by MXPS · · Score: 0

      i wonder what kind of overclock i could get on my ubercustom phase change system

  43. Deep GPL inspection by Anonymous Coward · · Score: 0

    "Buy them up and quash them."

    How very Microsoft of you. Anyway you can't quash ideas. I know this because slashdot tells me every time the GPL or intangible bits are discussed.

  44. OT: What is the CMS used by Tilera Corporation? by Anonymous Coward · · Score: 0

    Tilera Corporation's website seems quite fast. Does anyone know what is the CMS used by Tilera Corporation?

  45. required memory bandwidth? by Goonie · · Score: 1
    It's not clear who came up with it, but there's an old joke about supercomputers being devices to convert computation-bound problems into I/O-bound problems.

    This chip would almost certainly have the same issue in many applications - how do you get data on and off it fast enough to keep the cores full of data? Do they do anything unique to improve memory bandwidth?

    --

    Any sufficiently advanced technology is indistinguishable from a rigged demo
    --Andy Finkel (J. Klass?)
  46. boy... by Anonymous Coward · · Score: 0

    those crackers over @ MIT are really something eh?

  47. Name by Anonymous Coward · · Score: 1, Funny

    Should have called it the 'Dommocore 64'.

  48. Newegg by lateralus_1024 · · Score: 1

    It's already on Newegg boys. Do get the preferred finance card.

    --
    If you think /. comments are bad, check out Digg.
  49. Clock speed is a theoretical grail... by Anonymous Coward · · Score: 0

    If you can just throw clock speed at a problem, you can brute-force anything.

    It's much easier, in the real world, to simply throw cores at a problem, but that only works for problems which are parallelizable, assuming the programmers have bothered to. Take Python -- lots of things to make threading easy, it uses real OS threads, and then it ruins the whole thing for multicore with the GIL.

    Of course, it's easier if you start in a language that forces you to think that way. I bet any decent Erlang app (even a Swing-like GUI, assuming Erlang has a GUI) will find a way to use at least half those cores, and maybe all.

  50. pthreads anyone? by canuck57 · · Score: 1

    Either way, it looks like you have to keep in mind the architecture while designing your software. I doubt they can build a compiler that can manage the division of labor.

    That has always been true to a point but the limitation is with management, software designer and programmer not the compiler.

    An example: how many programmers know what pthreads and mutexes are? How many have used them? We have had multi-processor systems for years and still the applications we most often buy are not designed for them. POSIX threads have been around for years and work quite well when properly used. Quite a stable and portable API too.
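
    For anyone who hasn't met a mutex, here is a minimal sketch of the idea. It uses Python's threading module rather than the C pthreads API itself, with threading.Lock standing in for a pthread mutex; the worker function and the counts are invented for illustration.

```python
import threading

counter = 0
lock = threading.Lock()  # plays the role of a pthread mutex


def worker(iterations):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(iterations):
        with lock:  # acquire on entry, release on exit
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every worker to finish

print(counter)  # 80000
```

    The lock makes the read-modify-write on counter atomic with respect to the other threads, which is what keeps the final total exact.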

    Many Java programs are multi-threaded, yet deadlocks and stalls occur all the time because of design issues. I hate it when the GC goes nuts for 30 seconds stalling everything, or waiting for a deadlock to time out (if it does). The designer and programmers need to understand what thread safe is, reentrancy issues and the like. And design accordingly.

    Now I know many /. readers know what pthreads are, try asking your average UNIX admin, or software designer in I/T that.

  51. Gotta go hard-core by CopaceticOpus · · Score: 1

    How will AMD compete with this? They need to do something sexy.

    I've got it: A 69 core processor! Call it the Amsterdam Sexxxxxxxtreme 69.

  52. I know how to use the processor by Ougarou · · Score: 1

    Simply by using ray tracing you can fill those cores up to their max and probably have a lot of fun! How about having a processor per pixel in your mobile phone? This whole multi-core thing is going to be a delightful future.

  53. Nothing new by El+ruperto · · Score: 1

    A lot of their claims are laughable -- which suggests they really have not done any competitive research. Have a look at picoChip: they sampled a chip in 2002 with 400 processors programmed in C and a very similar any:any mesh. So how can they claim "But Tilera is the first company to offer a product that uses the new [mesh] architecture" or "existing multicore technologies simply cannot scale beyond a handful of cores"? There are plenty of others who have tried and failed, of course, but picoChip do seem to be for real & shipping stuff. There is nothing new under the sun, and marketing spin is called "vaporware" for good reason.

    1. Re:Nothing new by Anonymous Coward · · Score: 0

      Late reply but...for the Google search record.

      I've programmed Picochip. This chip looks quite different in the details, although similar at a high level. Picochip is a very good architecture, and quite good tool chain, but the applications for it are limited by the architecture and the development effort for any application is high.