Jonathan Schwartz Shows 32-Way UltraSPARC Chip
Megaslow writes "The latest entry in Jonathan Schwartz's blog has pictures of Sun's Project Niagara chip, with 8 cores * 4 threads per core for a 32-way computer on a single chip. He also shows what looks to be a test rig reportedly already up and running Solaris 10."
2. People not happy with Big Blue can migrate to another vendor without having to take an OS change into account. That means less lock-in.
Well, not exactly. If they are running IBM's POWER processor, then they can't really move their applications to another vendor, as no one but IBM "does" POWER. They could move to another platform and still run Linux (say, x86 for example), and manage to apply _most_ of their sysadmin experience - but any proprietary, binary-only applications running on that box would have to either be bought again or re-licensed. So there would be an OS change, even if it's only from one architecture to another.
-Mark
The entire chip shares a couple of floating point units, it's not a number cruncher.
This isn't made for rendering, or for stuff that requires floating point power.
Since the chip has 8 cores, each core is quite simple. This kind of chip is more suited to database and web servers, where there are a lot of simultaneous requests but fulfilling a single request is quite a simple task.
You can find more information about Niagara here.
This statement is true, but...
HT or a 2nd CPU will get you somewhere between a 10 and 20% boost on your software. It does this by letting OS operations like disk IO, video, etc, run on the second "CPU".
If you learn to write good threaded code, you can see nearly 100% speed increase per CPU. That's the difference.
And just to turn this into a more interesting thread, I do this with Java. :) And yes, it scales nicely if you use it properly.
Anyway, when people talk about how no one knows how to take advantage of HyperThreading or multiple CPU machines, this is what they are referring to. 10% boost versus 100% boost.
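One way to see where a gap like 10% versus 100% can come from is Amdahl's law, which the post does not name but which fits the argument: the gain from a second CPU depends on how much of the run time can actually execute in parallel. The sketch below is illustrative only; the function name and the two parallel fractions are assumptions picked to land near the figures quoted above, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Overall speedup when only parallel_fraction of the run time scales with cpus."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Assumed, illustrative fractions (not measured on any real program).
examples = [
    ("mostly serial app, OS work offloaded", 0.20),
    ("well-threaded app", 0.98),
]

for label, fraction in examples:
    gain_percent = (amdahl_speedup(fraction, 2) - 1.0) * 100
    print(f"{label}: about {gain_percent:.0f}% faster on 2 CPUs")
```

With these assumed fractions the first case gains about 11% and the second about 96%.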
Agile Artisans
I think they're particularly looking at things like the C10K problem (http://www.kegel.com/c10k.html).
The new Solaris 10 networking code reputedly pays a lot of attention to exploiting and serving threads well, particularly hardware multithreading if it's available.
If they could squeeze one of these and maybe 8GB+ of RAM into a 1U box or into their blade centre, then I think it'd do quite nicely for serving web.
...an Englishman in London.
The current problem in computing is that memory speeds are going up far slower than processor speeds, causing huge cache-fill delays. Sun came up with a simple architecture to keep the processors running anyway, and it is compatible with multiprocessing and multithreading:
1. Run decoder A until cache blocks on a read
2. Clear ALU and switch to decoder & register file B
3. Run B until cache blocks on a read...
Given this much raw compute power from the same size (and price-range) silicon, the marketplace will rapidly multi-thread or at least multi-instance their programs. They've already done the latter to run on Beowulf clusters, after all!
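The numbered steps above can be sketched as a toy cycle-count simulation. This is a rough model under stated assumptions, not a description of the real Niagara pipeline: each op costs one cycle, a missed read blocks its thread for a made-up `MISS_LATENCY` cycles, and switching threads is free. The op strings and the name `run_core` are hypothetical.

```python
MISS_LATENCY = 4  # assumed stall, in cycles, after a read misses the cache


def run_core(threads, miss_latency=MISS_LATENCY):
    """Simulate one core that switches hardware threads when a read blocks.

    Each thread is a string of ops: 'c' is one cycle of compute, 'm' is a
    read that misses and blocks that thread for miss_latency cycles.
    Returns (total_cycles, busy_cycles).
    """
    n = len(threads)
    pc = [0] * n          # next op index for each thread
    ready_at = [0] * n    # cycle at which each thread may run again
    cycle = busy = current = 0
    while any(pc[i] < len(threads[i]) for i in range(n)):
        # Prefer the current thread, then the others in round-robin order.
        order = [(current + k) % n for k in range(n)]
        runnable = [i for i in order
                    if pc[i] < len(threads[i]) and ready_at[i] <= cycle]
        if not runnable:
            cycle += 1    # every thread is waiting on memory: the core idles
            continue
        current = runnable[0]
        op = threads[current][pc[current]]
        pc[current] += 1
        busy += 1
        cycle += 1
        if op == "m":
            # The read blocks: park this thread and move on to the next one.
            ready_at[current] = cycle + miss_latency
            current = (current + 1) % n
    # Count the final outstanding miss before calling the work finished.
    return max([cycle] + ready_at), busy


workload = "ccm" * 4                 # two compute ops, then a missing read
print(run_core([workload]))          # one thread per core
print(run_core([workload] * 4))      # four threads sharing the core
```

With these invented numbers a single thread keeps the core busy for 12 of 28 cycles, while four threads keep it busy for 48 of 52.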
--dave
davecb@spamcop.net
There's little point in Sun fighting with Intel and AMD to produce the highest-MHz chip. They can buy Opterons, stick them in boxes and provide a good low to mid-end system.
OTOH Solaris is already REALLY good at multitasking. The system I'm typing this on has almost 5,000 threads, it's at 80% utilization and it's still very responsive.
As you put more tasks onto a single CPU it'll have to burn more and more cycles doing context switches and suffer from register starvation.
Plus large boxes benefit from economies of scale and can have features that aren't practical in smaller ones:
When a CPU fails the system can take that motherboard out of circulation, then the admin can replace it at their convenience. Same for memory and PSUs. Usually no downtime.
Plus we already know that it takes fewer resources to admin a Unix machine than a Windows box. Now consider a 144 CPU x 32 Core machine. Even IF it could only handle the workload of 500 Windows servers the admin costs are slashed further.
Also consider that the cache might be shared, but then consider that all those cores will most likely be running the same application. I'm sure there's lots of code within Oracle or Java that gets reused frequently by all the processors. An eight-core chip with 16MB of shared cache will naturally be able to cache much more of the shared resources than 8 CPUs with 2MB of cache each, since every private cache ends up holding its own copy of the same hot code.
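A back-of-the-envelope version of that cache argument, as a sketch only: it assumes a fixed amount of hot shared code (the 1.5MB figure is invented) that every private cache must duplicate, and it ignores associativity, coherence traffic and eviction policy. The function names are hypothetical.

```python
def room_for_private_data_split(cpus, cache_mb_each, hot_shared_mb):
    """MB left for per-thread data when every CPU's own cache holds a copy of the hot shared code."""
    duplicated = min(hot_shared_mb, cache_mb_each)
    return cpus * (cache_mb_each - duplicated)


def room_for_private_data_shared(total_cache_mb, hot_shared_mb):
    """MB left for per-thread data when one shared cache holds the hot shared code once."""
    return total_cache_mb - min(hot_shared_mb, total_cache_mb)


HOT_SHARED_MB = 1.5  # assumed size of frequently reused code, for illustration

print(room_for_private_data_split(8, 2, HOT_SHARED_MB))    # 8 CPUs x 2MB each
print(room_for_private_data_shared(16, HOT_SHARED_MB))     # one 16MB shared cache
```

Under these assumptions the eight private caches leave 4.0MB in total for everything else, against 14.5MB in the single shared cache.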
An alternative is to design your threads so that they will be using different parts of the chip (for example, run a floating point intensive thread and an integer intensive thread at the same time). This, however, will only work in very specific environments.
I am TheRaven on Soylent News