Jonathan Schwartz Shows 32-Way UltraSPARC Chip
Megaslow writes "The latest entry in Jonathan Schwartz's blog has pictures of Sun's Project Niagara chip, with 8 cores * 4 threads per core for a 32-way computer on a single chip. He also shows what looks to be a test rig reportedly already up and running Solaris 10."
2. People not happy with Big Blue can migrate to another vendor without having to take an OS change into account. That means less lock-in.
Well, not exactly. If they are running IBM's POWER processor, then they can't really move their applications to another vendor, as no one but IBM "does" POWER. They could move to another platform and still run Linux (say, x86 for example), and manage to apply _most_ of their sysadmin experience - but any proprietary, binary-only applications running on that box would have to either be bought again or re-licensed. So there would be an OS change, even if it's only from one architecture to another.
-Mark
The chip is supposed to run very cool and is aimed at webservers and similar applications; it's not a 32-way UltraSPARC IV that you can do lots of floating point operations on.
The entire chip shares a couple of floating point units, so it's not a number cruncher.
This isn't made for rendering, or for stuff that requires floating point power.
Since the chip has 8 cores, each core is quite simple. This kind of chip is more suited for database and web servers, where there are a lot of simultaneous requests, but fulfilling a single request is quite a simple task.
You can find more information about Niagara here.
Software is already written to take advantage of it. When Apache receives an HTTP request, it will start a new thread to handle it and continue to listen for more requests. 30 simultaneous requests => 31 threads for Apache alone.
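To make the thread count above concrete, here is a minimal sketch of the thread-per-request pattern in Python. It is not Apache's code: the `serve` and `handle` names and the simulated requests are made up for illustration, and a real server would read requests from sockets rather than from a list.

```python
import threading

def serve(requests):
    """Thread-per-request: the listener spawns one worker per incoming request."""
    results = {}
    lock = threading.Lock()

    def handle(req_id):
        # Each request is a small, independent task; here it just builds a string.
        body = f"response to request {req_id}"
        with lock:
            results[req_id] = body

    workers = []
    for req_id in requests:  # the "listener" loop keeps accepting requests
        worker = threading.Thread(target=handle, args=(req_id,))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()
    # One thread per request, plus the listener thread itself.
    return results, len(workers) + 1

results, thread_count = serve(range(30))
print(thread_count)  # 30 workers + 1 listener
```

Each worker is independent of the others, which is the kind of load that many simple hardware threads can absorb.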
The accomplishment is not just the fact that each CPU has an 8-way core (which is kind of cool since just about everyone else only has either a dual or quad setup). It's also the fact that the rest of the system (I/O buses, memory, disk, network) is designed to do a decent job of keeping the CPUs fed so they do useful work.
That's one of the reasons why Sun boxes are so expensive: they're designed from the ground up to keep running. I know of people who still use Sun LXes (9+ year old system) running Linux/BSD for light-weight FTP servers. They run just as well as when they were first purchased (original hard drives). Then there are the Sparc 5s, 10s and 20s.
(Though I find the Ultra 5s and 10s "cheap".)
The point isn't really to build boxes with a very large number of CPUs but to provide modest SMP-type capabilities on a single chip. Being on a single chip has potential advantages with regard to cache sharing and also lower power and motherboard real estate requirements than a number of single core chips on a motherboard.
These multi-core SPARCs seem to be aimed at application serving and the like, where you may need to have a high density of CPU cores but a series of single CPU or SMP boxes will do the job. Modestly sized SMP boxes are attractive because the total number of OS installs, power supplies, and other error-prone components is lower than with N separate boxes. If the cores are all on one chip and save you space over a 4 CPU SMP system, then you can fit more in the same rack space. If they put out less heat than a 4 CPU SMP system, again you gain.
...all running slower than a 386 per thread. I can't wait!
Seriously though, this is cool. But I think this is taking things to the extreme; the bottom line is that the number of ICs per mm^2 and efficiency win at the end of the day.
Sun's track record at producing complicated chips, or even at shipping chips on time, is lackluster. The cost of their boutique chip, and its being late, will probably kill them in the long run.
It remains to be seen, but my guess is that each core will perform at a level that is far from industry leading, and the aggregate performance doing "real work" like serving web pages or running apps will be less inspiring than dual/quad based servers from Intel/AMD/IBM with their respective chips.
This statement is true, but...
HT or a 2nd CPU will get you somewhere between 10 and 20% boost on your software. It does this by letting OS operations like disk IO, video, etc., run on the second "CPU".
If you learn to write good threaded code, you can see nearly 100% speed increase per CPU. That's the difference.
And just to turn this into a more interesting thread, I do this with Java. :) And yes, it scales nicely if you use it properly.
Anyway, when people talk about how no one knows how to take advantage of HyperThreading or multiple CPU machines, this is what they are referring to. 10% boost versus 100% boost.
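One way to put numbers on the 10% versus 100% gap is Amdahl's law, which the post does not name but which describes the same effect: the speedup is limited by the fraction of the work that can actually run in parallel. A small sketch follows; the two parallel fractions are hypothetical values picked only to land near the two ends of that range.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Amdahl's law: overall speedup when only part of the work scales with CPUs."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)

# Hypothetical workloads, chosen only to illustrate the two cases discussed.
for label, fraction in [("mostly serial app", 0.2), ("well-threaded app", 1.0)]:
    gain = (amdahl_speedup(fraction, 2) - 1.0) * 100
    print(f"{label}: about {gain:.0f}% faster on 2 CPUs")
```

With only a fifth of the work parallel, the second CPU buys roughly 11%; with all of it parallel, it doubles throughput.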
Agile Artisans
I think they're particularly looking at things like the C10K problem (http://www.kegel.com/c10k.html).
The new Solaris 10 networking code reputedly pays a lot of attention to exploiting and serving threads well, particularly hardware multithreading if it's available.
If they could squeeze one of these and maybe 8GB+ of RAM into a 1U box or into their blade centre, then I think it'd do quite nicely for serving web.
...an Englishman in London.
http://www.stuwo.net.nyud.net:8090/temp/sdtemp/jon blog.html
should load fast.
The current problem in computing is that memory speeds are going up far slower than processor speeds, causing huge cache-fill delays. Sun came up with a simple architecture to keep the processors running anyway, and it is compatible with multiprocessing and multithreading:
1. Run decoder A until cache blocks on a read
2. Clear ALU and switch to decoder & register file B
3. Run B until cache blocks on a read...
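The steps above can be sketched as a toy cycle-count simulation. This is only an illustration under stated assumptions, not a model of the real Niagara pipeline: the `run_core` function, the "op"/"miss" instruction encoding, the 12-cycle miss latency, and the free (zero-cycle) thread switch are all made up for the example.

```python
def run_core(threads, miss_latency):
    """Simulate one in-order core that switches hardware thread on a cache miss.

    threads: list of instruction lists; each item is "op" (one cycle of work)
             or "miss" (a load that stalls its own thread for miss_latency cycles).
    Returns (total_cycles, busy_cycles). Switching threads is assumed to be free.
    """
    n = len(threads)
    pc = [0] * n          # next instruction index for each thread
    ready_at = [0] * n    # cycle at which each thread may issue again
    cycle = busy = current = 0
    while any(pc[i] < len(threads[i]) for i in range(n)):
        # Round-robin search for a runnable thread, starting with the current one.
        runnable = [
            (current + k) % n
            for k in range(n)
            if pc[(current + k) % n] < len(threads[(current + k) % n])
            and ready_at[(current + k) % n] <= cycle
        ]
        if not runnable:
            cycle += 1    # every thread is waiting on memory: the core idles
            continue
        i = runnable[0]
        instr = threads[i][pc[i]]
        pc[i] += 1
        cycle += 1
        busy += 1
        if instr == "miss":
            ready_at[i] = cycle + miss_latency  # this thread blocks on the read
            current = (i + 1) % n               # switch to the next register file
        else:
            current = i                         # keep running the same thread
    return cycle, busy

workload = ["op", "op", "op", "miss"] * 10   # hypothetical instruction stream
for count in (1, 4):
    total, busy = run_core([workload] * count, miss_latency=12)
    print(f"{count} thread(s): {busy}/{total} cycles busy")
```

In this toy setup a single thread keeps the core busy for 40 of 148 cycles, while four threads cover each other's misses completely (160 of 160). Real hardware would also pay for the switch and for contention in the shared cache.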
Given this much raw compute power from the same size (and price-range) silicon, the marketplace will rapidly multi-thread or at least multi-instance their programs. They've already done the latter to run on Beowulf clusters, after all!
--dave
davecb@spamcop.net
There's little point in Sun fighting with Intel and AMD to produce the highest MHz chip. They can buy Opterons, stick them in boxes and provide a good low to mid-end system.
OTOH Solaris is already REALLY good at multitasking. The system I'm typing this on has almost 5,000 threads, it's at 80% utilization and it's still very responsive.
As you put more tasks onto a single CPU it'll have to burn more and more cycles doing context switches and suffer from register starvation.
Plus large boxes benefit from economies of scale and can have features that aren't practical in smaller ones:
When a CPU fails the system can take that motherboard out of circulation, then the admin can replace it at their convenience. Same for memory and PSUs. Usually no downtime.
Plus we already know that it takes fewer resources to admin a Unix machine than a Windows box. Now consider a 144 CPU x 32 Core machine. Even IF it could only handle the workload of 500 Windows servers, the admin costs are slashed further.
Also consider that the cache might be shared, but then consider that all those cores will most likely be running the same application. I'm sure there's lots of code within Oracle or Java that gets reused frequently by all the processors. An eight-core chip with 16MB of cache will naturally be able to cache much more of the shared resources than 8 CPUs with 2MB of cache each.
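A rough back-of-the-envelope version of that cache argument, as a sketch: it assumes every private cache ends up holding its own copy of the hot shared code, while a shared cache holds one copy. The `distinct_cached_mb` helper and the 1.5MB hot-code size are hypothetical, and real caches (associativity, coherence, eviction) are messier than this.

```python
def distinct_cached_mb(cores, cache_mb_each, hot_shared_mb, shared):
    """Rough upper bound on how much distinct data the chip(s) can keep cached.

    With private caches, each one keeps its own copy of the hot shared code;
    with a single shared cache, that code is stored once.
    """
    total = cores * cache_mb_each
    if shared:
        return total
    duplicated = min(hot_shared_mb, cache_mb_each)  # copy held in every private cache
    return duplicated + cores * (cache_mb_each - duplicated)

hot_code_mb = 1.5  # hypothetical size of the code reused by every core
print(distinct_cached_mb(8, 2, hot_code_mb, shared=False))  # 8 CPUs, 2MB each
print(distinct_cached_mb(8, 2, hot_code_mb, shared=True))   # one chip, 16MB shared
```

Under these assumptions the eight private caches hold about 5.5MB of distinct content between them, against the full 16MB for the shared cache.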
An alternative is to design your threads so that they will be using different parts of the chip (for example, run a floating point intensive thread and an integer intensive thread at the same time). This, however, will only work in very specific environments.
I am TheRaven on Soylent News