Slashdot Mirror


Sun To Release 8-Core Niagara 2 Processor

An anonymous reader writes "Sun Microsystems is set to announce its eight-core Niagara 2 processor next week. Each core supports eight threads, so the chip handles 64 simultaneous threads, making it the centerpiece of Sun's 'Throughput Computing' effort. Along with having more cores than the quads from Intel and AMD, the Niagara 2 has dual on-chip 10G Ethernet ports with cryptographic capability. Sun doesn't get much processor press because the chips are used only in its own CoolThreads servers, but Niagara 2 will probably be the fastest processor out there when it's released, other than perhaps the also little-known 4-GHz IBM Power 6."

24 of 214 comments

  1. Trust me... by jd · · Score: 4, Insightful

    ...If they put THESE under the GPL, along with the T1, they'd be getting more press than they could imagine. If they used these a bit more aggressively - such as using them as a graphics processor on a PC - they'd be getting some amazing press. If they keep them locked in a server closet, it's only then that nobody will care.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Trust me... by LarsWestergren · · Score: 4, Informative

      ...If they put THESE under the GPL, along with the T1, they'd be getting more press than they could imagine.

      http://www.opensparc.net/

      They are openly discussing making the Niagara 2 available as open source as well, but note that there are some roadblocks such as the US government's restrictions on crypto technology.

      --

      Being bitter is drinking poison and hoping someone else will die

    2. Re:Trust me... by TheRaven64 · · Score: 5, Informative

      If they used these a bit more aggressively - such as using them as a graphics processor on a PC - they'd be getting some amazing press.

      A modern GPU is fairly similar in design to the T2, but there are a few key differences:
      • The T2 is mainly focussed on integer ops, with only one floating-point pipeline per core. A GPU is typically close to 100% floating-point pipelines and doesn't bother with integer arithmetic.
      • The T2 uses multiple thread contexts to hide latency, both from memory accesses and from incorrectly predicted branches. A GPU typically doesn't bother much with branch prediction, since it runs code that is very light on conditional branches (on average, a branch occurs every 7 ops in general-purpose code; in GPU code, every few hundred).
      • GPUs usually focus on 4-way vector instructions, since most of their data is of this form (RGBA colours, XYZW vertices). The T2 only has scalar instructions.
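The second point, hiding stalls by keeping several thread contexts ready to run, can be put into numbers with a first-order model. This is a sketch under stated assumptions, not a model of the actual T2 pipeline: each thread is assumed to alternate a fixed run of useful cycles with a fixed stall (a cache miss, say), and switching to another ready thread is assumed to cost nothing. The function names and the 25-cycle/75-cycle workload are hypothetical.

```python
import math


def pipeline_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles one pipeline spends doing useful work.

    Assumed model: each thread alternates `compute_cycles` of work with a
    `stall_cycles` wait, and a stalled thread is swapped for a ready one at no cost.
    """
    period = compute_cycles + stall_cycles
    return min(1.0, threads * compute_cycles / period)


def threads_to_hide_stalls(compute_cycles: int, stall_cycles: int) -> int:
    """Smallest thread count that keeps the pipeline busy on every cycle (same model)."""
    return math.ceil((compute_cycles + stall_cycles) / compute_cycles)


if __name__ == "__main__":
    # Hypothetical workload: 25 cycles of work between misses, 75-cycle miss penalty.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {pipeline_utilization(n, 25, 75):.0%} busy")
    print("threads needed:", threads_to_hide_stalls(25, 75))
```

Under these assumptions one thread keeps the pipeline only a quarter busy and four threads are the minimum to fill it; beyond that, extra threads buy headroom for longer stalls rather than more throughput.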
      I posted in my journal recently suggesting that it would be easier to produce a modern GPU than an older card, since modern GPUs have much less application-specific logic and do more in software, relying on just having lots of cores / pipelines to give speed.
      --
      I am TheRaven on Soylent News
    3. Re:Trust me... by CryoPenguin · · Score: 3, Informative

      You're looking for the Open Graphics Project. But hardware is hard to design and expensive to fab, so you're not going to get an Xtreme3D Graphics Accelerator competitive with the latest from NVIDIA or ATI.

  2. Good floating point too by imroy · · Score: 4, Interesting

    This processor will also have a floating-point unit for each core, unlike the UltraSPARC T1 (Niagara), which had only one shared amongst all 8 cores. This should make it much more suitable than the T1 for a wide variety of applications. The T1 did great on multithreaded server-type tasks (e.g. web, email, database) but would have been pretty hopeless for anything doing more than a bare minimum of FP work.
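The shared-FPU problem can be sketched as a simple resource bound: chip-wide throughput can't exceed what the cores can issue, nor what the FPUs can absorb when some fraction of all instructions is floating point. This is a back-of-the-envelope sketch, not a T1/T2 model; single-issue cores, one FP operation per FPU per cycle, the function name and the FP mixes used are all assumptions.

```python
def aggregate_ipc_bound(cores: int, per_core_ipc: float, fpus: int,
                        fp_fraction: float, fpu_ops_per_cycle: float = 1.0) -> float:
    """Upper bound on instructions per cycle for the whole chip.

    Assumed model: the chip is limited either by core issue rate or by the
    FPUs, which must handle `fp_fraction` of all instructions.
    """
    core_limit = cores * per_core_ipc
    if fp_fraction <= 0:
        return core_limit  # integer-only work never touches the FPU
    fpu_limit = fpus * fpu_ops_per_cycle / fp_fraction
    return min(core_limit, fpu_limit)


if __name__ == "__main__":
    # Hypothetical 8-core chip issuing one instruction per core per cycle.
    for fp_mix in (0.0, 0.01, 0.25):
        shared = aggregate_ipc_bound(8, 1.0, fpus=1, fp_fraction=fp_mix)
        per_core = aggregate_ipc_bound(8, 1.0, fpus=8, fp_fraction=fp_mix)
        print(f"FP mix {fp_mix:.0%}: shared FPU <= {shared:g} IPC, "
              f"per-core FPUs <= {per_core:g} IPC")
```

With a trivial FP mix both layouts hit the core limit; at a 25% mix the single shared unit halves the bound, which is the pattern described above.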

    1. Re:Good floating point too by dread · · Score: 3, Informative

      Correct. At my last employer we found this out the hard way. Most servers were getting great performance but the one that actually did some (and it wasn't much really) FP work was horrible. This should really remedy that problem.

      On the other hand, Sun still suffers from the fact that ATCA is getting more and more mindshare in the telco arena, which has been one of their major cash cows. It will be really interesting to see how that pans out in the end.

      --
      I've had a wonderful time, but this wasn't it -- Groucho Marx
  3. Interesting by ShakaUVM · · Score: 4, Interesting

    I like it. In my work with high-performance computers, a significant limiting factor in many of our tasks was interprocessor bandwidth. The Niagara2 has a crossbar with a huge amount of bandwidth available between the different cores and their L2 caches.

    I'd like to see some benchmarks, and more technical specs, on these babies.

  4. Regurgitating "Quad" market speak by Eukariote · · Score: 5, Informative

    Along with having more cores than the quads from Intel and AMD...
    What quad from Intel/AMD? Intel is selling two dual cores on a cracker. The "quad" bit is just marketing; the actual silicon chips are pure dual-core designs that have to talk across the front side bus just as in a two-socket server. And AMD has so far only been previewing their quads; you can't buy them yet.
  5. quad is a quad and I want a cheap 8-way desktop by OrangeTide · · Score: 3, Interesting

    Customers just want to fit 4 cores in one socket. That's all that matters. That you can get a 1U with two sockets and put 8 Intel cores in it for under $2k is a big deal right now.

    That said I've always wanted to get my hands on some of these new multicore UltraSparcs. I think they have a lot of potential, and the new ones seem extremely powerful.

    Now if only Sun would put the low-end one in a Mac mini form factor and sell it as a Java developer's kit, then maybe I could play with one. The low-end Sun Fires are something I could almost afford, but I don't really want to keep a 1U on my desk just to try out the technology.

    I think the big 64-bit address space and the ability to run lots of threads fit well with Sun's Java. Not that I am a Java developer; I just think it's a good match, and that seems to be why people were using the older CoolThreads systems: enterprise Java.

    --
    “Common sense is not so common.” — Voltaire
    1. Re:quad is a quad and I want a cheap 8-way desktop by brucmack · · Score: 4, Insightful

      In terms of chip design, the circuitry on the silicon is what matters, not how you package, integrate, or market it.

      I agree with you on this point.

      Moreover, it does matter to a customer if marketing speak fobs him with two dualcore chips on a cracker instead of an integrated four core design.

      I don't agree with you here. What matters to the customer are costs and performance. They shouldn't have to care about how the package works, as long as it works correctly.

      From Intel's perspective, they had two options:

      1. Start with a new design that integrates all four cores on a single chip.
      2. Put two existing chips onto one package. Chips that they've been manufacturing for quite some time, so yields are good and there's headroom for higher clock speeds or lower power consumption.

      From the customer's perspective, those two options correspond to:

      1. A chip that performs a bit better, but probably costs more and definitely comes on the market later.
      2. A package that's got some performance drawbacks in certain situations, but is available now at a reasonable price.

      What do you think Intel and their customers prefer?

    2. Re:quad is a quad and I want a cheap 8-way desktop by Sycraft-fu · · Score: 3, Insightful

      Also, Intel seems to have shown that having two units that need to communicate across the FSB doesn't really cause any problems. It worked fine for their Pentium Ds (two single cores) and works fine for the quads. While bus contention will surely become a problem at some point, with just two units it doesn't seem to be one for normal tasks.

      Thus it's a worthwhile design to go with. I could see it continuing, too. Maybe their next-gen chips are 4 cores on a single unit for the mainstream, and then an 8-core, 2-unit job for higher-end stuff. At some point there may be too many cores to handle without bus contention, but then maybe not, since the speed of the bus keeps increasing. I could also see OSes being made aware of this, if it continues: knowing that every X processors form a unit, and that you can shuffle threads around within a unit all you like, but shuffling across units incurs more penalties and so isn't done unless it has to be. So if a process had 4 threads, and a unit was 4 cores, the OS would make sure all the threads were running on the same unit.
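The unit-aware scheduling idea above (keep a process's threads on one unit when they fit) might look something like the toy placement function below. It is a sketch, not how any real OS scheduler works; the name `place_threads`, the per-unit load list and the tie-breaking rules are all hypothetical. A "unit" here means one die or package sharing a bus with its siblings.

```python
from typing import List


def place_threads(num_threads: int, unit_load: List[int], cores_per_unit: int) -> List[int]:
    """Pick a unit for each thread of one process (hypothetical policy).

    If some unit has enough idle cores for every thread, put them all on the
    emptiest such unit so they never talk across the shared bus. Otherwise
    fill units in order of free cores and accept the cross-unit penalty.
    """
    free = [cores_per_unit - load for load in unit_load]
    if num_threads > sum(free):
        raise ValueError("this sketch does not oversubscribe cores")
    fits = [u for u, f in enumerate(free) if f >= num_threads]
    if fits:
        best = max(fits, key=lambda u: free[u])  # ties go to the lowest index
        return [best] * num_threads
    placement: List[int] = []
    for u in sorted(range(len(free)), key=lambda u: -free[u]):
        take = min(free[u], num_threads - len(placement))
        placement.extend([u] * take)
    return placement


if __name__ == "__main__":
    # Two 4-core units, as in the hypothetical 8-core, 2-unit chip.
    print(place_threads(4, [0, 0], 4))  # both idle: everything on unit 0
    print(place_threads(4, [1, 0], 4))  # only unit 1 has 4 free cores
    print(place_threads(4, [2, 1], 4))  # neither fits: spill across units
```

The last case is the one the comment wants to avoid: the process ends up split across units and pays the FSB penalty, which a real scheduler might instead trade against waiting for a unit to free up.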

      Regardless, you are correct that at this point it is an excellent idea. Doesn't matter if it is the most technically correct solution or not, what matters is that it works well and is cheap.

      We make concessions like that all the time in the computer world. Memory would be a good example. For a good while on desktops, memory, the FSB, and the processor ran at the same speed. You had a 30MHz 386, you were running 30MHz memory. Multipliers weren't something you worried about. Then we started to run into the limits of what memory could do. We could scale processors faster than RAM, or at least faster than RAM could be done cheaply. Thus the start of clock-multiplied chips. This works, but at some point the memory is just too slow. So then we started getting into tricks like DDR RAM, which transfers twice per clock cycle, and interleaved RAM, so that the processor has two channels to get faster access, and so on. Currently you can have a CPU at one speed, an FSB at another, and memory at a third. Right now I've got a 2.66GHz CPU, a "1333MHz" FSB (it's not really 1333MHz; FSBs are quad-pumped, so it really runs at 333MHz) and "667MHz" RAM (again not really; it's DDR, so the actual memory clock is 166MHz and the bus clock is 333MHz; it just does 667 million data transfers per second, hence the rating), and this is not an uncommon setup.
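The clock arithmetic in that paragraph can be checked directly. This sketch assumes, as the comment does, a quad-pumped "1333" FSB, DDR2-667 memory and a 2.66GHz CPU; since the marketing labels are rounded, the exact thirds they stand for are used.

```python
# Marketing labels round the real figures, which are multiples of 100/3 MHz:
# "1333" FSB = 1333.3 MT/s, "667" DDR2 = 666.7 MT/s, "2.66GHz" = 2666.7 MHz.
FSB_RATED_MT_S = 4000 / 3
DDR2_RATED_MT_S = 2000 / 3
CPU_MHZ = 8000 / 3

FSB_TRANSFERS_PER_CLOCK = 4  # quad-pumped front side bus
DDR_TRANSFERS_PER_CLOCK = 2  # DDR: one transfer on each clock edge

fsb_clock_mhz = FSB_RATED_MT_S / FSB_TRANSFERS_PER_CLOCK       # ~333.3 MHz
mem_bus_clock_mhz = DDR2_RATED_MT_S / DDR_TRANSFERS_PER_CLOCK  # ~333.3 MHz
mem_core_clock_mhz = mem_bus_clock_mhz / 2  # DDR2 array runs at half the I/O clock, ~166.7 MHz
cpu_multiplier = round(CPU_MHZ / fsb_clock_mhz)  # CPU clock as a multiple of the FSB clock

if __name__ == "__main__":
    print(f"FSB clock:         {fsb_clock_mhz:.1f} MHz")
    print(f"DDR2 bus clock:    {mem_bus_clock_mhz:.1f} MHz")
    print(f"DDR2 memory clock: {mem_core_clock_mhz:.1f} MHz")
    print(f"CPU multiplier:    {cpu_multiplier}x")
```

Everything traces back to one ~333MHz base clock: the FSB moves four transfers per tick, the DDR2 bus two, and the CPU runs at eight times it.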

      None of this is an ideal setup. Ideally, the FSB would run at the same speed as the processor, and so would the RAM. The processor would then have almost no wait time for memory data and very little need for trickery like prefetching. Alas, if that were possible at all, it would be too expensive. Thus we have this somewhat hacked solution. In practice, though, it matters little: hack though it may be, it works really well. It has given us memory that can get data to the CPU in a timely fashion and doesn't break the bank.

  6. Re:yes but ... by utnapistim · · Score: 3, Funny

    ... will a beowulf cluster of these run linux, or blend?

    --
    Tie two birds together: although they have four wings, they cannot fly. (The blind man)
  7. Freudian Processor? by Anonymous Coward · · Score: 3, Funny

    Am I the only person who read the headline as "Sun to Release 8-Core Viagra 2 Processor"?

    1. Re:Freudian Processor? by ettlz · · Score: 4, Funny

      Sounds like you've got serious uptime on the mind.

  8. Re:Sun doesn't get much processor press by Anonymous Coward · · Score: 5, Interesting

    With all due respect mate, you don't have a clue. We, like most other financial companies in the world, buy Sun/IBM P5/HPUX/etc stuff because it is *cheap*... seriously, compared to the mainframes that handle the real back end, these babies are practically free.

    Also, if the last thing you have touched is a V440, then you are not exactly up to speed with the cutting edge of Sun products. I promise you that if you had actually ever seen a system running a T1 chip you would not say "their processor division has been kinda lagging". The CoolThreads stuff is amazing and they are the only people doing anything quite like it. I am not sure if you picked this up from the article, but with one chip you get _64_ hardware-based threads.

    In our internal benchmarks a £20k T2000 with 1 x 8 core T1 outperformed a £100k+ V880 with 8 x 2 core Sparc. Freakin' cool and excellent value for money. Plus all this fits in two rack units.

    Working in small companies is nice but I promise you that out there in the big wide world "most" companies don't think that $US20k is very much at all to spend on a system that will be part of a critical service.

  9. Re:on-chip 10G Ethernet ports by Cheesey · · Score: 4, Informative

    High-speed CPUs are all limited by a bottleneck - getting data on and off chip. Putting the Ethernet controllers on chip helps to offset this.

    In the future, it is likely that all the wired buses on your motherboard will be replaced by an internal Ethernet-like network. We are already seeing a trend towards simpler and faster interconnects such as SATA. The next step is to use Ethernet-style connections for every chip-to-chip link, and within the chips themselves too. If this seems unlikely, consider that your PC's memory bus is already basically a network connection. The device at one end (CPU) is in a different clock domain from the device at the other (memory). Data is sent in packets (called bursts) to offset the latency of setting up a transfer.
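The point about bursts is an amortization argument: a fixed setup cost is paid once per transfer, so longer bursts waste proportionally less time on it. A minimal sketch, using a hypothetical 15 ns setup latency and 6.4 GB/s peak rate chosen only for illustration (neither figure comes from the comment):

```python
def effective_bandwidth_gb_s(burst_bytes: int, setup_ns: float, peak_gb_s: float) -> float:
    """Average throughput when every burst pays a fixed setup latency first.

    1 GB/s (decimal) is 1 byte per ns, so the units cancel without extra factors.
    """
    transfer_ns = burst_bytes / peak_gb_s
    return burst_bytes / (setup_ns + transfer_ns)


if __name__ == "__main__":
    # Hypothetical link: 15 ns to set up a transfer, 6.4 GB/s once data is flowing.
    for size in (8, 64, 512):
        bw = effective_bandwidth_gb_s(size, setup_ns=15.0, peak_gb_s=6.4)
        print(f"{size:4d}-byte bursts: {bw:.2f} GB/s")
```

Small transfers spend most of their time on setup; large bursts approach the peak rate, which is the same trade a packet network makes.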

    --
    >north
    You're an immobile computer, remember?
  10. Re:Sun doesn't get much processor press by Capt+James+McCarthy · · Score: 5, Interesting

    "Ok well for that price, we can literally buy a new fairly high performance server from someone like Dell or Gateway (with a 3 year warranty)."

    It's all relative. Your 'high performance' Dell or Gateway wouldn't do much other than run BIND at one of our locations. You are comparing apples to oranges. These systems are not for you to surf the net with, and as for price, well, there is a lot to be gained from stability. I still have SPARC systems with their original OEM parts (minus the disks) that are close to 20 years old running at some locations. Bet your Dell can't say that.

    --
    There are no loopholes. It's either legal or it's not.
  11. Not going to be the fastest, but... by zeromemory · · Score: 4, Informative
    Sun donated one of the original T2000 systems (based on the original 8-core, 4-thread/core Niagara processor) to a campus organization where I'm a volunteer system administrator, so I think I have quite a bit of experience with this processor. Here's my take on the Niagara2, based on my experiences with the Niagara1:
    • No, this processor is not going to be the 'fastest' processor out there; it is designed primarily for workloads that don't require floating-point calculations (web servers, mail, etc.), so it's not going to be the go-to processor for places like render farms. In fact, floating-point performance on the Niagara1 was so terrible that Sun included a special cryptographic accelerator to help with SSL performance (the primary consumer of floating-point calculations on most web servers).
    • This processor architecture absolutely rocks for the purpose it was intended for, though. It consumes very little power but handles service loads amazingly well. We also have a Sun v40z (8-core Opteron server) that would barely be able to keep up with our T2000 (that's saying a lot), and our T2000 draws only a little more than half as much power as our v40z (2.6A @ 120VAC compared to 4.6A @ 120VAC).
    • The inclusion of 10GbE support is going to be absolutely essential and will help make servers based on the Niagara2 stand out compared to servers from competing vendors. Why is 10GbE so important? I mean, we already have GbE, and most places barely have an infrastructure for that in place, right? The answer is SAN. 10GbE is going to be necessary if you're going to use iSCSI to consolidate storage and still deliver reasonable performance, and most places are heading in that direction, especially the target market for these systems.
    • Solaris Logical Domains (not to be confused with Sun Containers or Zones) is a hardware-based virtualization technology that was packaged with the Niagara1 and will probably be included with the Niagara2. Using Logical Domains, you can create independent virtual servers running different operating systems and divide hardware resources between them, down to the level of an individual CPU thread or PCI Express bus leaf. Unlike with software virtualization solutions, your virtual servers never all depend on a single privileged virtual server (global zone, dom0, etc.). This technology is making hardware virtualization a possibility for many places.
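The power figures in the second point above are easy to check. Multiplying current by line voltage gives apparent power in volt-amperes; reading that as watts assumes a power factor near 1, which is an assumption here, not something the measurements show. The helper name is hypothetical.

```python
LINE_VOLTAGE = 120.0  # volts, as measured in the comment


def apparent_power_va(amps: float, volts: float = LINE_VOLTAGE) -> float:
    """Volt-amperes drawn; equals watts only if the power factor is 1."""
    return amps * volts


t2000_va = apparent_power_va(2.6)  # ~312 VA
v40z_va = apparent_power_va(4.6)   # ~552 VA

if __name__ == "__main__":
    print(f"T2000: {t2000_va:.0f} VA, v40z: {v40z_va:.0f} VA")
    print(f"T2000 draws {t2000_va / v40z_va:.0%} of the v40z's load")
```

That works out to about 57%, consistent with "a little more than half".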

    I think the Niagara is a pretty solid design, but it's not the processor to end all processors. For service workloads, I don't think you can get a better processor, but you probably don't want one of these processors in your workstation. Sun Microsystems is also headed in the right direction, establishing an open community around these processors and Solaris.
    1. Re:Not going to be the fastest, but... by Alioth · · Score: 4, Informative

      The floating-point performance of the new processor should be like night and day compared to the old one you had: the old one apparently had only 1 FPU for the entire device - the new one has an FPU per core.

  12. Re:Yes, but.. by dbIII · · Score: 4, Funny

    Yes, but will it run Vista?

    It has a Vista emulation mode - move the power switch to OFF and you get something just as useful but more stable.

  13. Re:Yes, but.. by Anonymous Coward · · Score: 4, Funny
    Yes, but will it run Vista?


    No, Vista requires 640 cores, which ought to be enough for anybody.

  14. Re:Sun doesn't get much processor press by MysteriousPreacher · · Score: 4, Insightful

    A few points.

    1) Sun is not trying to win the hearts and minds of home users - that is not their market. Sun would see few benefits from pushing their products in the mainstream media. Trade press is where they reach the decision makers. How many Oracle adverts do you see in game magazines and tabloid newspapers? Not very many, they tend to advertise in business oriented outlets such as The Economist.
    2) Some small businesses don't care about computers at all. The companies that need Sun will buy Sun. The companies who can run their business out of a box of post-it notes will do just that.
    3) When you buy mission critical hardware, you don't look for a '3 year warranty'. You look for a service and support contract based on how critical the hardware is to your business. If you can run your business on a home-made 486dx system running Minix then that is probably the best option.
    4) Sun being worth 10% of Intel is irrelevant. The Economist sells far fewer copies than The Sun (a pretty terrible UK tabloid), but I know which one I'd choose for a serious overview of world news.
    5) This is a techie web site so news like this seems pretty relevant here, even if most of us can't afford to buy the kit.

    --
    -- Using the preview button since 2005
  15. Oh no, not again.... by ricky-road-flats · · Score: 5, Insightful
    It's already been said, but that's a big glossy load of poop.

    The quads from Intel provide four physical cores per socket. That is the definition of a quad in this context. The exact workings of how many bits of silicon there are, how they talk to each other and to the rest of the system is, to 99.999% of users and computer buyers, background fluff.

    This was the same as when Intel put two single-core chips into a package to release a 'dual core'. Lots of people like you jumped up and down and pointed out it wasn't *real* dual core, and how the FSB issue would cripple performance. Amazingly, it wasn't the case - they sold in droves, and real-world performance was good enough to carry Intel through to the 'true' dual core, the Core 2 Duo.

    If the competition had anything out that was the same cost and performed significantly better than the 'fake' quad cores, you would have an argument. But they haven't and you don't. Bear in mind I'm talking about the huge x86/x64 market, not the relatively low volume non-x86 server market.

    What Intel did back then and again now is perfectly sensible. They have millions of high yield, robust dual core chips being churned out, and they have built into the infrastructure the ability to put two into a package, lower the speed a bit to drop the per-core heat output, and sell reasonably priced (now) quad core chips. When the drop to 45nm happens, they will release their 'real' quad cores, and pretty quickly put two of those into a package to start selling oct-core (whatever we're going to call them). And so it goes.

    What's the alternative? Not sell quads until 45nm comes out? Not working out too well for AMD, is it? I've asked the question before here and on realworldtech.com - at what point will the FSB problem actually become a painful problem for the Intel chips? Well, not yet (4 core) is the answer, despite dire predictions from the AMD camp for years. My guess is that, shock of shocks, Intel have actually thought it through - and that's why CSI is coming. When the number of cores gets to the point where the FSB will actually hurt performance relative to the AMD architecture, that's when CSI will kick in. Maybe at 8 cores, maybe at 16.

    What, you don't need quad core yet? Fine, stop your bitching and choose what's right for you. Vive la difference, and 3 cheers for a market that gives us the choice.

  16. The new Sun Motto: by teknopurge · · Score: 3, Insightful

    "Do No Evil"

    It's like it's 1999 all over again, except this time Sun actually has revenue in line with expectations. I continue to maintain Sun is this century's Bell Labs and Xerox PARC all rolled into one.