Intel Talks 1000-Core Processors
angry tapir writes "An experimental Intel chip shows the feasibility of building processors with 1,000 cores, an Intel researcher has asserted. The architecture for the Intel 48-core Single Chip Cloud Computer processor is 'arbitrarily scalable,' according to Timothy Mattson. 'This is an architecture that could, in principle, scale to 1,000 cores,' he said. 'I can just keep adding, adding, adding cores.'"
I hope he never works for Gillette.
Sometimes, life itself is sarcasm...
From the article: "By installing the TCP/IP protocol on the data link layer, the team was able to run a separate Linux-based operating system on each core. Mattson noted that while it would be possible to run a 48-node Linux cluster on the chip, it 'would be boring.'"
Huh?! Boring?! It would have been a nice first post on Slashdot on the eternal topic - does it run Linux? - to begin with.
Then we have all the programming goodies to follow up with.
Are they trying to reinvent Transputer? :)
But yes, I am happy to see Intel pushing it forward!
Paul B.
This is for server/enterprise usage, not consumer usage. That said, it could scale to the number of cores necessary to make realtime raytracing work at 60fps for computer games. Raytracing could be the killer app for cloud gaming services like OnLive, where the power to do it is unavailable for consumer computers, or prohibitively expensive. The only way Microsoft etc. would be able to have comparable graphics in a console in the next few years is if it were rental-only like the Neo-Geo originally was.
Corruption is convincing someone that the selfless ideal is the same as their selfish ideal.
http://xkcd.com/619/
Would the temperature rise 1000 times more than now?
(Would we need cryogenic coolers?)
Imagine a Beowulf cluster of th^H^H^H
Ah, forget it, the darn thing practically is one already! :/
"Imagine exactly ONE of those" just doesn't sound the same.
Why have 1000 cores when you can have 1 MILLION CORES, (all running applications that can barely take advantage of 1 or 2)
Having been in attendance of this presentation at Supercomputing 2010, for once I can say without a doubt that the article captured the essence of reality. The only part it left out is that the interconnect between all the processing elements uses significantly less energy than that of the previous 80-core chip; I think the figure was around 10% of chip power for the 48-core, and 30% for the 80-core. Oh, and MPI over TCP/IP was faster than the native message passing scheme for large messages.
"It's a lot harder than you'd think to look at your program and think 'how many volts do I really need?'" he [Mattson] said.
First was RAM (640kb should be... doh), then M/GHz, then Watts, now is volts... so, what's next?
(my bet... returning to RAM and the advent of x128)
Questions raise, answers kill. Raise questions to stay alive.
Am I the only one feeling this is just a foray into multicore chips because they hit a brick wall when it comes to faster single core CPUs? While I like the thought of say 8 cores or something, I'd much rather have those 8 cores being faster than having a frigging supercomputer under my desk.
HTTP/1.1 400
just throw more cache at it :D
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Er, yeah, pretty much everyone knows they have no practical way to make the clock speed much faster. The only thing they can do is proliferate cores beyond all reason. Nobody has the slightest idea how to take advantage of that many cores in normal household use and even most workstation use.
Intel would not have (presumably!) to re-invent *Intel* Paragon! :)
We can throw a Connection Machine in there, and really date ourselves -- but it's still nice to know that finally CMOS tech has caught up with late 80s comp. arch. advances!
And then, do not get me started on the original Tera, with its multithreading it seemed to be much better bang for the buck of chip real estate than currently accepted multicore solutions. But what would I know...
Paul B.
Again...
Alternatively, NUMA on a single CPU (different memory channels connected to different cores).
It would be a bitch to program (but fun nevertheless).
Given that for years GPUs have had hundreds of processors (the power of CUDA is awesome!), this is long overdue from lazy CPU designers like Intel....
I took an intro to ECE class last fall that was basically just a parade of people coming in and talking about the kinds of things that they do as an engineer. One of the speakers talked about how one could have all of these cores, but that coding to take advantage of all of them was such a difficult task that it's hard to find any software that takes advantage of the few cores we're shipping today, let alone a hundred cores or a thousand cores. Apparently he was working on a project - a sort of wrapper? I think he mentioned AI but I don't know if he was just blowing smoke up our ass at that point - to help streamline writing for thousands of cores. I don't know how much truth is in that but I found it interesting, and would love to hear from someone who actually codes these kinds of things.
With all that heat, it would be nice to have a skillet that could cook a sandwich or eggs or brew coffee. I lived on a Mr. Coffee machine for over 3 years, boiling vegetables or tea, and my only regret is that, while it kept the room warm and supplied the occasional hot towel bath, it would have been nice if its heat source had been an embedded computer rather than a wasteful heating element. I know some people used a self-throttling Pentium 4 to boil food from their waterblock and such. Why not?
Just how small does your penis need to be to need a 1,000 cores?
That's what it takes to run Flash these days.
Make a processor with four asses.
This just goes to show that if you care about having a future career (or even just continuing with your existing one) in programming, Learn a functional language NOW!
I dream of a nation where a man is not judged by his skin color but by a number assigned by a credit rating agency.
Why? :) I know. meme. It's just, I've built a couple Beowulf clusters for fun, and didn't have an application written to use MPI (or any of the alphabet soup of protocols), so it was just an exercise, not for any practical use. It's not like most of us are crunching numbers hard enough to need one, and it won't help out playing games or even building kernels.
I'd like to see a 1k core machine on my desktop, but that's beyond the practical limits of any software currently available. Linux can only go to 256 cores. Windows 2008 tops out at 64. But hey, if they did come to market, I know who would be first to support all those cores, and it doesn't come from Redmond (or their offshore outsourced developers).
Serious? Seriousness is well above my pay grade.
Would be interesting to know if this helps with performance/power ratio against (potential many core/cpu) ARM servers.
The only thing I'd be compensating for is the fact I can't do calculations at Exaflop rates in my head.
Just like my car only compensates for the fact I can't run at 165mph. :)
Serious? Seriousness is well above my pay grade.
1000 cores on a chip isn't too bad. I already have one with 110 cores.
That's only 10 more cores!
I wonder how the inter-core communication will scale without packing 1000+ layers in the die.
Maybe Computers will never be as intelligent as Humans.
For sure they won't ever become so stupid. [VR-1988]
"Performance on this chip is not interesting," Mattson said. It uses a standard x86 instruction set.
How about developing a small efficient core, where the performance is interesting? Actually, don't even bother; just reuse the DEC Alpha instruction set that is collecting dust at Intel.
There is no point in tying these massively parallel architectures to some ancient ISA.
Imagine a Beowulf with all of the overhead, and none of the speed.
For simplicity's sake, the team used an off-the-shelf 1994-era Pentium processor design for the cores themselves. "Performance on this chip is not interesting," Mattson said. It uses a standard x86 instruction set.
Probably in the future 1 million cores will be the minimum requirement for applications. We will then laugh at these stupid comments...
Image and audio recognition, true artificial intelligence, handling data from a huge number of different kinds of sensors, movement of motors (robots), data connections to everything around the computer, virtual worlds with thousands of AI characters with true 3D presentation... etc...etc... will consume all processing power available.
1000 cores is nothing... We need much more.
Okay, I'm sure some high-end consumers would benefit from this, I think the majority of consumers will not. The number of multithreaded programs on my Windows computer can be counted on one hand I think. Java being the major one, if and only if the programmers want to program multithreaded.
At this point in time I'd rather have a dual core 3 GHz processor than a quad or octa core 2 GHz processor.
It's an interesting machine. It's a shared-memory multiprocessor without cache coherency. So one way to use it is to allocate disjoint memory to each CPU and run it as a cluster. As the article points out, that is "uninteresting", but at least it's something that's known to work.
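To make the "disjoint memory per CPU" idea concrete, here is a minimal sketch of carving one physical address range into non-overlapping, page-aligned regions, one per core. The 48-core count comes from the article; the 8 GiB pool size, the 4 KiB page size and the function name partition_memory are assumptions for illustration only, not how the SCC or any real OS lays out its memory.

```python
def partition_memory(total_bytes, n_cores, page_size=4096):
    """Split [0, total_bytes) into n_cores disjoint, page-aligned regions.

    Returns a list of (start, end) byte offsets, end exclusive. Any pages
    left over after the even split are simply not handed out.
    """
    pages_per_core = (total_bytes // page_size) // n_cores
    region = pages_per_core * page_size
    return [(core * region, (core + 1) * region) for core in range(n_cores)]


# Assumed pool size: 8 GiB shared by 48 cores.
regions = partition_memory(8 * 2**30, 48)
print(len(regions), regions[0], regions[-1])
```

Because no two regions overlap, no core ever reads a line another core has written, so the missing cache coherency never comes into play. That is exactly why this mode works and also why it does not exploit the shared memory.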
Doing something fancier requires a new OS, one that manages clusters, not individual machines. One of the major hypervisors, like Xen, might be a good base for that. Xen already knows how to manage a large number of virtual machines. Managing a large number of real machines with semi-shared memory isn't that big a leap. But that just manages the thing as a cluster. It doesn't exploit the intercommunication.
Intel calls this "A Platform for Software Innovation". What that means is "we have no clue how to program this thing effectively. Maybe academia can figure it out". The last time they tried that, the result was the Itanium.
Historically, there have been far too many supercomputer architectures roughly like this, and they've all been duds. The NCube Hypercube, the Transputer, and the BBN Butterfly come to mind. The Cell machines almost fall into this category. There's no problem building the hardware. It's just not very useful, really tough to program, and the software is too closely tied to a very specific hardware architecture.
Shared-memory multiprocessors with cache coherency have already reached 256 CPUs. You can even run Windows Server or Linux on them. The headaches of dealing with non-cache-coherent memory may not be worth it.
Why would you care to see one on your desktop? Do you have any use for one? There's a point where except for supercomputers enough is enough. We've probably already passed it.
I still have more fans than freaks. WTF is wrong with you people?
Linux can only go to 256 cores.
Uhmm no.
./arch/ia64/Kconfig: int "Maximum number of CPUs (2-4096)"
./arch/powerpc/platforms/Kconfig.cputype: int "Maximum number of CPUs (2-8192)"
In x86 we have:
config MAXSMP
bool "Enable Maximum number of SMP Processors and NUMA Nodes"
depends on X86_64 && SMP && DEBUG_KERNEL && EXPERIMENTAL
And I believe you can crank that dial all the way up
Also consider this: the number of cores in my desktop is doubling every year or two (and this is with a single core chip), 6 and 8 cores are cheap now, so we'll be at 1024 in roughly 7-14 years which makes sense because the GHz war is done and simply making more cores is relatively cheap (once you have the interconnect making a bigger CPU isn't all that hard).
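The 7-14 year figure can be checked with a quick back-of-the-envelope sketch. It assumes a starting point of 8 cores (the "cheap now" number above) and a perfectly steady doubling period of one or two years; both are simplifications rather than a forecast, and the function name is made up for illustration.

```python
import math


def years_to_reach(target_cores, start_cores, doubling_period_years):
    """Years for start_cores to grow to target_cores at a fixed doubling period."""
    doublings = math.log2(target_cores / start_cores)
    return doublings * doubling_period_years


# Assumed starting point: 8 cores today.
fast = years_to_reach(1024, 8, 1)  # doubling every year
slow = years_to_reach(1024, 8, 2)  # doubling every two years
print(fast, slow)
```

Going from 8 to 1024 cores is a factor of 128, i.e. seven doublings, which is where the 7 and 14 come from.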
Ok, you can cram 1000 cores into one CPU chip - but feeding all 1000 CPU cores with enough data for them to process and transferring all the data they spit out is gonna be a big problem. Things like OpenCL work now because the high end GPUs these days have 100GB/s+ bandwidth to the local video memory chips, and you're only pulling out the result back into system memory after the GPU did all the hard work. But doing the same thing on a system level - you're gonna have problems with your usual DDR3 modules, your SSD hard disk (even PCI-E based) and your 10GE network interface.
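A rough sketch of the arithmetic behind that worry: divide the available memory bandwidth evenly across the cores. The 100 GB/s figure is the GPU number quoted above; the 20 GB/s figure for ordinary system memory is only an assumed round number for illustration, not a measurement of any particular DDR3 setup.

```python
def per_core_mb_s(total_gb_s, n_cores):
    """Even share of memory bandwidth per core, in MB/s (1 GB = 1000 MB)."""
    return total_gb_s * 1000.0 / n_cores


gpu_like = per_core_mb_s(100, 1000)   # 100 GB/s shared by 1000 cores
desktop_like = per_core_mb_s(20, 1000)  # assumed 20 GB/s system memory
print(gpu_like, desktop_like)
```

Even with GPU-class bandwidth, each of 1000 cores gets only about 100 MB/s if they all stream from memory at once, and a fifth of that under the assumed desktop figure.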
Right. The really interesting chips will arrive when you run between four and sixteen cores with the entirety of main RAM for those cores (in a NUMA configuration with other sockets, starting with maybe a gigabyte or so per die). You could then use SDRAM for both a paging file and for cache between the storage system and the processor/memory die.
You could map registers straight to portions of the on-chip memory if necessary for backwards compatibility. You'd probably be better off, though, compiling nearly everything to just use memory addressing. You'd only hit the SDRAM to load a new entire page into the on-chip RAM. On-chip cache and the circuitry to minimize misses in the cache could mostly go away, and the cores themselves could be simplified. You might even get away with moving the SDRAM controller back off-chip at first to free up some space on the die since the working memory would be so fast once the data was in it.
Unfortunately, this assumes billions of switches just for the main memory and probably quality control nightmares in the first several models.
However, it's the logical conclusion for the way forward. Caches keep taking more die space to deal with the fact that memory is so much slower than processors. Once you get over a certain size cache, you're just wasting circuitry on managing a large block of memory in little chunks that's better treated as a large single block of memory. The virtual to physical mapping already figures out what's in main RAM and what's out in the swap. Just let it do that with the on-die memory and eliminate the extra cache logic to make more on-die memory.
Intel has mentioned putting main memory on the die already. They even mentioned that they could do it with a form of DRAM rather than with SRAM.
It seems like I've been here before.
Seastead this.
http://www.sgi.com/products/servers/altix/uv/
2,048 cores (256 sockets) and 16TB of memory, one OS image.
Samsung took back my unlocked bootloader because Google wants me to rent movies. They're both evil.
The key difference between this research chip and the other Multicore chips Intel have worked on, like Larrabee, is that it is explicitly NOT cache coherent, i.e. it is a cluster on chip instead of a single-image multi-processor.
This means, among many other things, that you cannot load a single Linux OS across all the cores, you need a separate executive on every core.
Compare this with the 7-8 Cell cores in a PS3.
Terje
"almost all programming can be viewed as an exercise in caching"
Am I the only one feeling this is just a foray into multicore chips because they hit a brick wall when it comes to faster single core CPUs?
For many years (at least 5, possibly more) Intel has been telling developers that future performance gains will come from multithreading not faster clock speeds. So no, you are not the only one feeling this way. :-)
The first time was the i432 http://en.wikipedia.org/wiki/Intel_iAPX_432 Anyone remember that hype? Got to love the first line of the Wikipedia article "The Intel iAPX 432 was a commercially unsuccessful 32-bit microprocessor architecture, introduced in 1981."
The second time was the Itanium (aka Itanic) that was going to bring VLIW to the masses. Check out some of the juicy parts of the timeline also over on Wikipedia http://en.wikipedia.org/wiki/Itanium#Timeline
1997 June: IDC predicts IA-64 systems sales will reach $38bn/yr by 2001
1998 June: IDC predicts IA-64 systems sales will reach $30bn/yr by 2001
1999 October: the term Itanic is first used in The Register
2000 June: IDC predicts Itanium systems sales will reach $25bn/yr by 2003
2001 June: IDC predicts Itanium systems sales will reach $15bn/yr by 2004
2001 October: IDC predicts Itanium systems sales will reach $12bn/yr by the end of 2004
2002 IDC predicts Itanium systems sales will reach $5bn/yr by end 2004
2003 IDC predicts Itanium systems sales will reach $9bn/yr by end 2007
2003 April: AMD releases Opteron, the first processor with x86-64 extensions
2004 June: Intel releases its first processor with x86-64 extensions, a Xeon processor codenamed "Nocona"
2004 December: Itanium system sales for 2004 reach $1.4bn
2005 February: IBM server design drops Itanium support
2005 September: Dell exits the Itanium business
2005 October: Itanium server sales reach $619M/quarter in the third quarter.
2006 February: IDC predicts Itanium systems sales will reach $6.6bn/yr by 2009
2007 November: Intel renames the family from Itanium 2 back to Itanium.
2009 December: Red Hat announces that it is dropping support for Itanium in the next release of its enterprise OS
2010 April: Microsoft announces phase-out of support for Itanium.
So how do you think it will go this time?
Why is Snark Required?
How about developing a small efficient core, where the performance is interesting? Actually, don't even bother; just reuse the DEC Alpha instruction set that is collecting dust at Intel. There is no point in tying these massively parallel architectures to some ancient ISA.
Technically the cores are not executing x86 instructions. For several architectural generations of Intel chips the x86 instructions have been translated into a small efficient instruction set executed by the cores. Intel refers to these core instructions as micro-operations. An x86 instruction is translated on the fly into some number of micro-ops and these micro-ops are reordered and scheduled for execution. So they have kind of done what you ask; the problem is that they don't give us direct access to the micro-op instruction set.
Intel tried to move beyond x86 with the Itanium and the market said no. The market also said no to Alpha and PowerPC, both of which had consumer oriented Windows NT 4 support. Even Apple had to give up on PowerPC and they were part of the PowerPC consortium. There is no Intel x86 conspiracy, they are trapped too.
Actually I'd say the thing that is scaring the crap out of Intel is that "good enough" was passed for most folks quite a few miles back. I have several customers as well as my GF on late model P4s and you know what? Most of the time those 2.8GHz+ machines are sitting there twiddling their silicon thumbs. The simple fact is Youtube, FB, email, and surfing just don't take that much juice. And I'm sure the fact that those I've been able to upsell to new multicores only did so because AMD is really cheap now certainly don't help Intel none either.
Which brings me to TFA which I'd say just shows how Intel don't seem to see the real problem: The problem is that parallel programming ain't easy and most apps just don't scale well past a couple of cores. There just hasn't been a "killer app" for pushing the masses to true multicore computing. While I know that TFA is directed towards servers pushing major code that really is a small niche compared to the consumer space. What Intel and AMD need to do is find that "killer app" that will get all those running those late model P4s to drop them like a bad habit for the new hotness. Hell I usually have my family on the fast track to new hotness because I like to game, but my boys have been playing MMOs just fine with my P4 hand me downs so I really don't even see a point to upgrading. There really hasn't been any "killer app" to push adoption like we saw in the MHz race. Hell even hardcore gaming (a pretty tiny but tech heavy niche) hasn't really seen any benefits above going dual, with few games gaining in triple much less quad. If Intel and AMD want to push multicores somebody really needs that "killer app" to come out and stat.
ACs don't waste your time replying, your posts are never seen by me.
Do 1024 cores constitute a kilocore? Or 1000? I'd love to see that debate move from hard disks to processors.
Bingo Dictionary - Pragmatist, n. A myopic idealist.
You need a different programming model. Our current imperative programming languages inherently assume a single thread, with multi-threading as a huge lump on the side. In a multi-programming model, something like (say) a compiler would code-generate for every function in parallel without ever being asked explicitly to do so. Each function is a stand alone unit, so they can be done in parallel; in an appropriately designed system they would be done in parallel. In GUI programs, updating separate elements of the display would be done independently without needing to ask for it.
But we need a completely different programming paradigm to achieve this. Functional programming might be that paradigm - or it might not. The point of the chip in the original article is to allow researchers to work on this problem. As the article says, the performance of each core in the chip is very pedestrian. But if researchers can develop software tools that allow the chip to perform at, say, five times the performance of a single core (on a 48 core machine) without programmers having to partition threads explicitly, they will have achieved what the project is about.
Consciousness is an illusion caused by an excess of self consciousness.
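To put the "five times on a 48 core machine" target in perspective, here is a small sketch of what it implies. Parallel efficiency is just speedup divided by core count. The second function uses Amdahl's law, which is one common model the comment above does not itself invoke, to ask what serial fraction would cap a 48-core run at 5x. All function names are made up for illustration.

```python
def parallel_efficiency(speedup, n_cores):
    """Fraction of the ideal linear speedup actually achieved."""
    return speedup / n_cores


def amdahl_speedup(serial_fraction, n_cores):
    """Amdahl's law: speedup when serial_fraction of the work cannot be split."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


def serial_fraction_for(speedup, n_cores):
    """Serial fraction that yields exactly `speedup` on n_cores under Amdahl's law."""
    return (n_cores / speedup - 1.0) / (n_cores - 1.0)


eff = parallel_efficiency(5, 48)
s = serial_fraction_for(5, 48)
print(round(eff, 3), round(s, 3))
```

So the stated goal corresponds to roughly 10% efficiency, or, under the Amdahl assumption, a program where a bit under a fifth of the work stays serial. That is a modest bar, which fits the point that the win would be getting there without explicit thread partitioning.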
You have a supercomputer on your desk right now. It's called a "GPU", and most likely it sports many hundreds of cores. Oh, and the killer app you mean, that's whatever latest DX11/OpenGL 4 game you prefer.
Experiments and other stuff
Get me as many cores as needed so Windows will stop pausing to open a folder even on a freshly formatted computer. Instant, instant instant....
Okay, I'm sure some high-end consumers would benefit from this, I think the majority of consumers will not.
As a game developer I have to say consumers could benefit. And no I am not necessarily thinking about more graphical eye candy. For example I would like to have hundreds of cores working on AI for computer controlled characters/units.
"Enough with this silliness already, we don't need a supercomputer! Now let me get back to play my latest DX11 compute enabled game with the awesome physics and graphics."
Experiments and other stuff
Why not? He/she built a cluster for no use at all other than learning and fun. I can easily see the "use" for 1k cores with Intel's apparent interest to get into the 3d market or at least destroy Nvidia and ATI (something AMD has already done in name but that's beside the point). For clusters it's a no-brainer to keep adding cores if you can increase performance per watt ratio with each additional core. For desktops there likely will be a point where enough is enough, but I disagree that we've passed it. Software designers are still keeping up quite quickly with any headroom new hardware creates.
Why? :) I know. meme. It's just, I've built a couple Beowulf clusters for fun, and didn't have an application written to use MPI (or any of the alphabet soup of protocols), so it was just an exercise, not for any practical use. It's not like most of us are crunching numbers hard enough to need one, and it won't help out playing games or even building kernels.
I'd like to see a 1k core machine on my desktop, but that's beyond the practical limits of any software currently available. Linux can only go to 256 cores. Windows 2008 tops out at 64. But hey, if they did come to market, I know who would be first to support all those cores, and it doesn't come from Redmond (or their offshore outsourced developers).
ummm no. Windows 2008 can handle 64 SOCKETS, it currently scales to 256 cores
Video editing can use it, photo editing can come close and games that model 3D environments do some trivially parallel stuff where more processing helps. :)
I want to see this on the desktop so that it drives down prices for cluster nodes for geophysics, FEA etc
This model does allow for 1000 times the BSOD dosage!
Why would you care to see one on your desktop? Do you have any use for one? There's a point where except for supercomputers enough is enough. We've probably already passed it.
It depends on what you want to do with those cores. Just running an Office application will not tax more than one or two cores, since these types of applications are effectively real time and are not CPU intensive. In many respects games also fall into that category, with many modern games making more use of the graphics processor than the CPU.
Having multiple cores is very useful when your application is CPU intensive and can fork processes onto as many cores as are available. A simple example of this is a video format converter, which is very CPU intensive rather than I/O intensive. I run the video converter called HandBrake under Fedora 14, which can easily hammer my Intel i7 processor. This raises my load average to over 9 with each core running at approx 90%, and you can really feel the heat (approx 90 deg C on the cores) being extracted by the fan.
Actually the biggest problem with multiple cores is heat and how to get rid of it, as well as latency between processors and memory, although according to the article Intel researcher Timothy Mattson has suggested how to get around that problem in a white paper.
There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
Isn't that Intel's pet project for the last decade?
No sig today...
Linux can only go to 256 cores. Windows 2008 tops out at 64.
Linux supports more than 256 cores.
MAINLINE:
Maximum number of CPUs / CONFIG_NR_CPUS:
This allows you to specify the maximum number of CPUs which this kernel will support. The maximum supported value is 512 and the minimum value which makes sense is 2. This is purely to save memory - each supported CPU adds approximately eight kilobytes to the kernel image.
I know SGI has systems running 4096 CPUs with SUSE Linux.
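The memory cost mentioned in that help text is easy to put numbers on. The sketch below simply multiplies the quoted "approximately eight kilobytes" per supported CPU by the CPU limit. Applying the same per-CPU figure to a 4096-CPU build is an extrapolation assumed here for illustration, not something the help text states.

```python
def nr_cpus_overhead_kb(nr_cpus, per_cpu_kb=8):
    """Approximate extra kernel image size for a given CONFIG_NR_CPUS value."""
    return nr_cpus * per_cpu_kb


at_512 = nr_cpus_overhead_kb(512)    # the quoted mainline maximum
at_4096 = nr_cpus_overhead_kb(4096)  # assumed: same per-CPU cost at SGI scale
print(at_512, at_4096)
```

That is about 4 MB at 512 CPUs and about 32 MB at 4096, which is why the limit is a build-time knob rather than always set to the maximum.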
"I mean having a 12' tall Toyota Hilux or a 1,000 core computer has to be BYOV, Bring Your Own Vibrator time."
I, for one, find that combination vaguely arousing.
"This post is an artistic work of fiction and falsehood. Only a fool would take anything posted here as fact."
depends on X86_64 && SMP && DEBUG_KERNEL && EXPERIMENTAL
And I believe you can crank that dial all the way up
Also consider this: the number of cores in my desktop is doubling every year or two (and this is with a single core chip), 6 and 8 cores are cheap now, so we'll be at 1024 in roughly 7-14 years which makes sense because the GHz war is done and simply making more cores is relatively cheap (once you have the interconnect making a bigger CPU isn't all that hard).
Don't you worry, the GHz war is not done!
There's talk of exotic materials (SiC, diamond, etc...) going to 10 GHz. If someone figures out how to make the Rapid Single Flux Quantum digital chips with high temperature superconductors, then we may seriously start to see 1 THz clock speeds in practical computers, using extreme Peltier cooling to get the CPU core down to cryogenic temps.
Pretty much anything that I've written in Erlang uses (at least) a few thousand concurrent processes. I've never tried running it on more than a 64-core machine, but when I moved stuff from my single-core laptop to a 64-core SGI machine the load was pretty evenly distributed.
It's pretty easy to write concurrent code that scales as long as you respect one rule: No data may be both mutable and aliased. You can do this in object-oriented languages with the actor model, but languages like Erlang enforce it for you (at the cost of a few redundant copies).
I am TheRaven on Soylent News
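The "no data may be both mutable and aliased" rule can be sketched in a language that does not enforce it. Below is a minimal actor-style example using only the Python standard library. Each actor owns a private counter that nothing else holds a reference to, and everything that crosses between threads is an immutable tuple placed on a queue. The class and function names are invented for illustration. Python threads share one interpreter lock, so this shows the structure of the approach, not the scaling you would get from Erlang.

```python
import queue
import threading


class CounterActor(threading.Thread):
    """A toy actor: private mutable state, reachable only through its inbox."""

    def __init__(self):
        super().__init__()
        self.inbox = queue.Queue()  # mailbox: the only way to talk to the actor
        self._total = 0             # mutable, but never aliased outside run()

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg[0] == "add":
                self._total += msg[1]
            elif msg[0] == "get":
                msg[1].put(self._total)  # reply with an immutable int
            elif msg[0] == "stop":
                break


def sum_with_actors(values, n_actors=4):
    """Spread values over n_actors and combine their private totals."""
    actors = [CounterActor() for _ in range(n_actors)]
    for actor in actors:
        actor.start()
    for i, value in enumerate(values):
        actors[i % n_actors].inbox.put(("add", value))
    reply = queue.Queue()
    for actor in actors:
        actor.inbox.put(("get", reply))  # queued behind that actor's adds
    total = sum(reply.get() for _ in actors)
    for actor in actors:
        actor.inbox.put(("stop",))
    for actor in actors:
        actor.join()
    return total


print(sum_with_actors(range(1, 101)))
```

No locks appear anywhere in the actor code: because each counter has exactly one owner, there is nothing to protect. The price is the copying through mailboxes, which is the "few redundant copies" cost mentioned above.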
I will need to buy a pair of sunglasses, and crush them when I find that the new Intel processor has over 9000 cores.
Isn't it just a #define in the source code?
No sig today...
Obligatory XKCD ref: http://xkcd.com/619/
Because of the limited number of instructions, you need more instructions for a logical operation, e.g. multiply (although many RISC CPUs have that operation), so you have to load more bytes from RAM to do the same thing that a single CISC instruction does in fewer bytes. As CPU speed vs. RAM/bus speed is skewed, it's more efficient to have instructions that maybe take a few more bits each, but on average not that many more, with microcode on-die to handle them, instead of having to load a lot of RISC instruction bytes from RAM for basic operations a CISC can do through microcode. As long as the memory/bus speed is slower than the CPU speed (rather than equal to it, like on the PS3, where memory/bus runs at 3GHz, the same as the CPU), RISC isn't always more optimal.
Never underestimate the relief of true separation of Religion and State.
Well, the killer app really is video transcoding. One thing holding that back is the DMCA. I should have the option to transcode a DVD or Blu-ray and put it on my mobile device, netbook, or tablet as simply as I do CDs. Yes, I can get Handbrake, but I am talking about doing it with iTunes, Zune, or any other mainstream software package.
What we want is to have that ripped and transcoded in under five minutes.
But other than that you are correct: most users reached good enough a while ago. What everyone but the manufacturers wants is cheaper and more power efficient.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
The paper referenced in the article can be found here.
Fascinating that MPI works that well unmodified.
sounds like a coarse-grained FPGA haha!
The GHz war is over. The speed of light won. A long time ago, it stopped being "all about the transistor" and started being "all about the wires". IBM won the race to copper in 180nm (back when it was 0.18um), and that helped make those technologies even better, but about the time we hit 90nm, semiconductors were "fast enough", or even by some measurements stopped being able to speed up. Since then, almost all speed increases have been largely (but not exclusively) due to the transistors getting smaller, reducing the distance wires need to go.
The RC delay of wires is the major problem. R isn't going to be getting much better than copper. Silver has a lower resistance by a little bit, but it's too reactive to be used anywhere real. In these geometries, any alloy would be insufficiently mixable to be reliable, to say nothing about more exotic materials (like ceramics). There's some room for improvement in the dielectric (the "C"), but by the time you make a box with corners covering water permeability, thermal coefficient of expansion close to the wires, mechanical properties friendly to sub micron manufacturing, you have to concede you're not going to be able to get more than 20% faster there (and that we could dispute separately).
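To see why wire length dominates, here is a minimal sketch using the standard Elmore approximation for a distributed RC wire, where delay is about half of total resistance times total capacitance. Both totals grow with length, so delay grows with length squared. The per-micron resistance and capacitance values below are made-up round numbers for illustration, not figures for any real process.

```python
def wire_delay_ps(length_um, r_ohm_per_um, c_ff_per_um):
    """Elmore delay of a distributed RC wire, in picoseconds."""
    r_total = r_ohm_per_um * length_um         # ohms
    c_total = c_ff_per_um * length_um * 1e-15  # farads
    return 0.5 * r_total * c_total * 1e12      # seconds -> picoseconds


# Assumed illustrative values: 1 ohm/um and 0.2 fF/um.
for length in (100, 200, 1000):
    print(length, wire_delay_ps(length, 1.0, 0.2))
```

Doubling the wire length quadruples the delay, and a tenfold longer wire is a hundred times slower. Shrinking the transistors shortens the wires, which is consistent with the point above that recent speed gains came mostly from reduced distance.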
Take a cache. The slowest path is having a memory cell read. That tiny little device needs to have a measurable change in voltage on the bitlines, and be sensed by a sensing structure. That sensing structure has nothing to do with storage, so it's pure overhead and thusly you want as few of them as possible. Can you have it 16 bits away? 32? The days are gone that it was 64 bits away for any meaningful performance. There's nothing you can do to the characteristics of that little device (which needs to be minimum feature size to maximize the density of the cache) to dominate over the characteristics of the bitline he's trying to affect.
Take a data path. Even if 95% of your data is highly predictable, easily pipelined stuff with local signals, your critical path is going to involve signals from other areas of the chip, and they're going to have to be rebuffered and trucked from hundreds of microns away. No giant buffer in the history of man can dominate over a long distance wire. The signal will show up "eventually".
3GHz is a good place to stop. We make it to 4GHz with compromises in power, but beyond that you're dedicating so much of your chip to rebuffering that you're blowing a lot of power on it. At that point, your pipeline is so many stages that branch mispredicts are very painful. You're devoting so much of your cycle time to setup and holds for your latches that you're going to be embarrassed at how little work you can do in each cycle.
1 THz clock speeds are on their way, and maybe even higher. But they're not useful to CPUs or GPUs. They're useful for more exotic applications, primarily technology demonstrations.
I hear a truckload of Kleenex just got delivered to Ellison's office when he heard this news.
Calling someone a "hater" only means you can not rationally rebut their argument.
There's a world of difference between a massive number of regular cores--which, if harder to program for, is well understood--and the Itanium, which introduced a whole new concept with its EPIC architecture. The EPIC architecture seemed like a good idea--let the compiler take care of most of the instruction reordering, and get rid of branch prediction wherever possible by introducing speculative instructions in its stead. But as it turned out, writing a good compiler for this architecture is hard if not impossible...
I don't know, Intel is making money hand over fist selling Xeons used in data center blades. A 1k-core chip would fit in quite nicely there. As for the desktop good-enough stuff, that's what the Atoms are for ;)
Tsunami -- You can't bring a good wave down!
Erlang with its CSP-style message passing would seem to fit this chip perfectly, as would Go, for example. At least if the thread-like constructs of those languages would run across the separate operating system instances on each SCC core.
A distributed Plan 9 system sounds like a good match too. Communication through pipes from one core to another should be quite fast, and programs can still be built as serial filters. The work on Barrelfish is somewhat interesting too.
But can you keep adding memory links? and IO links?
1000 cores may be cool, but to make full use of them you may need 6-12+ RAM channels and maybe 2+ QPI links. RAM is sometimes more needed than IO. But if you are working with a lot of data, then you may need 1 QPI link just for the SSD bank / RAID system.
According to Intel it's Single-chip Cloud Computer. :P
tlax says: "Lol".
The article is talking about targeting 1000 cores per chip (in x86 made efficient by fancy translating filters that consume chip real estate faster than Hummers consume gas).
Man, you're insane.
And I guess you don't believe in dust. Or maybe you don't believe testing processors costs money.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
How about we strike a balance. I say a roughly logarithmic curve for the ratio of processor count to per-processor power. Take 1 very, very powerful core, then 2 cores with half the power of that, then 4 cores with half the power of those, then 8, then 16, then 32, then 64, then 128, then 256, then 512. At that point, you have over 1000 cores, and have the ability to do anything you want with ridiculous speed and power, be it rendering thousands of simple tasks, or burning through a single mammoth thread, and everything in between.
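The arithmetic of that layout is easy to check. A minimal sketch, assuming each tier doubles the core count and halves the per-core power (which makes it a geometric series), and assuming "power" simply adds up across cores, which real workloads will not honor:

```python
def tiered_layout(tiers=10):
    """Return (core_count, relative_power_per_core) for each tier: 1 at 1.0, 2 at 0.5, ..."""
    return [(2 ** k, 1.0 / 2 ** k) for k in range(tiers)]


layout = tiered_layout(10)
total_cores = sum(count for count, _ in layout)
# Naive aggregate throughput, in units of the single big core (an assumption).
aggregate = sum(count * power for count, power in layout)

print(total_cores)  # 1023
print(aggregate)    # 10.0
```

So ten tiers give 1023 cores, and under the additive assumption every tier contributes the same total throughput as the one big core.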
Where is the mod rating for "scary"? Also,
Still have compiler writers that don't understand that unrolling code is not usually a real win, overall.
And the Itanium was designed for exactly that kind of optimization, as if a compiler is always supposed to be able to predict execution path in real-time execution.
Kind of like the time I tried to write a user interface in COBOL.
err, barking up the right tree.
But they are still barking.
x86!
Marketing magic will always prevail over reality!
(That's what Moore's law really said.)
Then your clustering software sucks ass. I do distributed builds using xcode/xgrid rather often. Of course, with 1000 cores ... you'd just make -j 2000 and accomplish the same thing without some silly cluster.
I hope you realize that just because you can't buy a boxed version of Windows from Staples/BestBuy/Whatever that supports more processors, versions supporting FAR more processors already exist from Redmond for custom hardware ... which is what you're talking about once you start talking 256 processors. It's pretty much all 'custom', even when it's a pretty generic version of 'custom'.
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
I'd call it more of a testament to how much Intel's fanaticism can induce them to waste all the benefits of Moore's law supporting baggage that was unnecessary when the x86 was "invented".
Just for the marketing department's black magic.
Instruction efficiency? Compact code? There are numerous processors that wax the floor with x86 in those departments, but marketing department's black magic killed the market.
Magic? It's all parlor tricks, you know, pay a researcher here to slip a little excess code in a tight loop on that 68k "benchmark", that sort of thing. The problem with the old saw about magic being indistinguishable from advanced tech is that magic is not about real results. Magic is about illusion. The confusing point is that illusion can be turned into reality with some effort.
In the x86 case, it was a huge lot of effort justified by a huge load of hubris and the needs of the black magic department, a vicious cycle.
x86 is a significant contributor to global warming (which is part of the reason some people want to deny the reality of human impact on the climate changes).
has interesting ideas and it is difficult to follow them :)
IMHO the biggest problem with these multi-core chips is the lock latency. Locking in heap all works great, but a shared hw register of locks would save a lot of cache coherency and MMU copies.
A 1024 slot register with instruction support for mutex and read-write locks would be fantastic.
I'm developing 20+Gbps applications - we need fast locks and low latency. Snap snap!!!
I said no... but I missed and it came out yes.
Talk is cheap, show me the cores.
Stupidity is an equal opportunity striker.
Fellow slashdotter Bill Dog
No data may be both mutable and aliased
Perhaps a little off topic but do you know an online article that explains this in detail? (I'm writing my first concurrent server at the moment (in Go) and could use any information on the topic)
Is 1 THz the best we can hope to get then?
Why OpalCalc is the best Windows calc
As point of comparison, the "Radeon HD 5970" graphics card has two 1600-core processors.
I don't know what extra detail you need - the rule should be pretty self explanatory. If something is shared between two or more threads, it should be immutable. If something is mutable, only one thread / process should hold references to it.
The only exception to this rule is explicitly synchronised communication objects (message queues, process handles, and suchlike). If you follow this rule, then the only concurrency problems that you will have are caused by high-level design problems, rather than by low-level implementation problems.
Erlang enforces this by only having one mutable object: the process dictionary, which is only accessible by the process that owns it. Everything else is immutable.
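A minimal sketch of the rule in practice, written in Python rather than Go or Erlang; the same shape carries over to goroutines and channels. Everything here (`STOP_WORDS`, `worker`, `count_words`) is a hypothetical example, not code from any real server. Shared data is an immutable tuple, each worker's mutable dict is referenced by that worker alone, and the only shared mutable objects are the explicitly synchronised queues.

```python
import queue
import threading

# Shared between all threads, so it is immutable (a tuple of strings).
STOP_WORDS = ("a", "the", "of")


def worker(inbox, outbox):
    # 'counts' is mutable, so only this thread ever holds a reference to it.
    counts = {}
    while True:
        line = inbox.get()
        if line is None:          # sentinel: no more work for this worker
            break
        for word in line.split():
            if word not in STOP_WORDS:
                counts[word] = counts.get(word, 0) + 1
    # Hand the result over as an immutable snapshot, never the dict itself.
    outbox.put(tuple(sorted(counts.items())))


def count_words(lines, n_workers=2):
    # The queues are the explicitly synchronised communication objects.
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for line in lines:
        inbox.put(line)
    for _ in threads:
        inbox.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    totals = {}                   # mutable, but owned by the calling thread only
    for _ in threads:
        for word, n in outbox.get():
            totals[word] = totals.get(word, 0) + n
    return totals
```

Calling `count_words(["the cat", "a cat of dogs", "dogs"])` returns `{"cat": 2, "dogs": 2}` regardless of how the lines are split between workers, and there is no lock anywhere in the user code because nothing mutable is ever aliased across threads.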
I am TheRaven on Soylent News
diamond's not very exotic. just sayin'
Photoshop has been stuck at 2 processors for way too long. Software companies have been lagging behind hardware for far too long. Until I see more software taking advantage of more than 1 or 2 cores... I'm not wasting money on them.
According to benchmarks, a functional language like Erlang is slower than C++ by an order of magnitude. Sure, it can distribute processing over more cores, which is the only thing that enabled it to win one of the benchmarks. I suspect that was only because it used a core library function that was written in C. So no, if you want to write code with acceptable performance, DON'T use a functional language. All CPU intensive programs, like games, are written in C or C++; think about that.
Problem is, those extremely complex NURBS surfaces have all sorts of trims around the perimeter and in the surface itself to match bolt holes, and other components. It's not just a simple regular square or triangular patch.
Calculating the local tangent space of each point of a regular or N-sided patch, isn't too difficult (tangent, normal, binormal), it's all the trimming that takes up the time. Just a single bolt with a spiral thread is going to generate a whole bucketload of triangles per revolution of that thread.
Another complication is that the CATIA file format isn't simply a geometry file; it's more of a relational database entry, where everything is cross-referenced to the manufacturer, specification standard, measurement units, and required sub-parts. That way, you just have one file, and it pulls in everything else that you need to view that one part.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
I always remember the Intel i860. Another attempt to create a graphics processor (or coprocessor, as they were called back then). It had special instructions for performing combined Z-buffer and color buffer tests and writes, as well as vector processor instructions. They made it into early SGI workstations.
The full-page glossy advert pages of BYTE magazine used to have these pictures of really impressive (at the time) systems with transputer/i860/TMS34020 boards. Some had their own network and hard disk drive ports (the PC was too slow at the time to handle the data transfer). But every time a board came out, six months later CPUs would have caught up, and these boards/chips would become known as "graphics deaccelerators".
Why would you care to see one on your desktop? Do you have any use for one?
You got that right. I've never used more than 639 K of RAM either.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
I'd like to see a 1k core machine on my desktop, but that's beyond the practical limits of any software currently available.
After 1k core machine becomes commonplace it will be required to boot windows.
There's a good analysis over at the daily circuit that dissects what is being suggested by Intel, who is not just talking about the death of within-node shared memory, but also of intercore asynchronous message passing. I liked the comparison with sending emails with large video attachments instead of YouTube links, and then requiring the recipient to clear the inbox before a new email can be received. http://www.dailycircuitry.com/2010/11/intel-talks-kilocore-processors.html
Never say never; there's a whole third dimension to explore. I have no doubt in my mind there will be solutions for these problems, but to get there we must think outside the box.
Pretty amazing achievement for a Microsoft OS. Does the 64 socket limit include named pipes or is it just TCP/IP sockets?
US-UK-Israel: The real Axis of Evil
Thanks a lot for the clarification.
I don't know what you're responding to, but it's certainly an interesting view, I take it you speak of tessellation shading. As it happens I did a bit of tinkering around with that (http://codeflow.org/entries/2010/nov/07/opengl-4-tessellation/). It's true that generic displacement mapping is more expensive, since you need to calculate the smoothed mesh before you can displace it. However, even though it makes things near the point of view more expensive to display (but also better looking), it's a great boon at scaling detail down smoothly, so it actually ends up being faster and better looking (because most further away things have just the detail required, and aren't hugely overdrawn).
Experiments and other stuff
Dammit dude. Blow the dust out of your case.
I remember installing Wing Commander on a Pentium processor. Normally it ran on a 486. It sped up the game. By about 20 times. You launched and you were half a map away from the combat before you could turn around. When you were pointed at it you held down the trigger and flew through microsecond-long explosions. Then you were half the map away again. You got used to it though.
You'd sort of expect that, with all the processor enhancements since, that Microsoft Office would open faster than in 1995. But you know what, that speed of opening scaled fairly well -- just a few seconds then, a few seconds now. Not sure what happened with Office XP. I'm thinking 1000 cores won't save my Firefox from taking up 500 MB of memory so I'm still out of luck there.
Patience, young whatever-you-are.
No, you're right: Intel should go ahead and start building a one million-core chip now. We need it now to...uh....
"Those who consume the bulk of goods are those who make them. We must never forget this secret of our prosperity."
Nope, sorry, not even close. How many multicore CPUs were sold by Crysis? My guess very damned few, maybe even none, as those that bought Crysis were just like my "Must win teh benchmarkz lol!" ePeen customers.
No, what I'm talking about is something like what Visicalc did in the 80s, or video playback in the 90s. Both of these were jobs that A) large masses of people wanted to do, and B) large masses could see instant benefit from.
The simple fact is the big games right now are NOT ePeen games, but MMOs. And those run just fine on a late model P4 with a $50 AGP card. Thanks to PC game development being tied to the consoles, which look like they may go another 5 years without a refresh, gaming simply is no longer the app that drives technology; as proof, see Eyefinity and CUDA. Both techs were cooked up as desperate attempts to push GPUs that simply wouldn't sell otherwise. This is also why the "sweet spot" in terms of sales is no longer the $250 GPU, but the $100 one. There simply isn't enough content requiring the $250 one to make it worth the extra expense for the majority.
So if Intel and AMD want true multicore processing for the masses, then they need to be pushing for that next killer app, one which will spur adoption. Because right now there simply isn't anything I've seen that would upsell most folks on it. Just look at how many shitty Pentiums and Atoms Intel sells each year vs. their top o' the line Core series. I bet their cheapo shitty chips outsell the good chips by an order of magnitude, simply because on day to day apps most folks can't tell the difference. And GPUs are highly specialized vector processors, and TFA is talking about x86, so the comparison isn't even apt.
ACs don't waste your time replying, your posts are never seen by me.