IBM Unveils Fastest Microprocessor Ever

Posted by samzenpus on Wednesday September 1, 2010 @11:57PM from the greased-lightning dept.

adeelarshad82 writes "IBM revealed details of its 5.2-GHz chip, the fastest microprocessor ever announced. Costing hundreds of thousands of dollars, IBM described the z196, which will power its Z-series of mainframes. The z196 contains 1.4 billion transistors on a chip measuring 512 square millimeters fabricated on 45-nm PD SOI technology. It contains a 64KB L1 instruction cache, a 128KB L1 data cache, a 1.5MB private L2 cache per core, plus a pair of co-processors used for cryptographic operations. IBM is set to ship the chip in September."

48 of 292 comments

Required by Anonymous Coward · 2010-09-01 23:59 · Score: 4, Funny

But will it run ... a Beowolf cluster of ...
[Comment terminated : memelock detected]
1. Re:Required by MobileTatsu-NJG · 2010-09-02 03:51 · Score: 2, Insightful
  
  [Comment terminated : memelock detected]
  If Slashdot ever gets this working I'll instantly subscribe.
  
  --
  
  "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
Speed times Quantity? by TaoPhoenix · 2010-09-02 00:01 · Score: 2, Interesting

So what is this beast supposed to be, a 64 core machine?
Didn't we retire the GHz wars 5 years ago? I know, AMD-style "more done per cycle", but isn't a quad core 3.1 GHz per chip with 20% logistic overhead faster?

--
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
1. Re:Speed times Quantity? by Haedrian · 2010-09-02 00:05 · Score: 5, Informative
  
  The thing is that if you have 2 (say) 1.6 GHz processors, they aren't as 'powerful' as one 3.2 GHz processor.
  For one - there are overheads, certain stuff common between them, pipelines - stuff which I forgot (computer engineering related problems).
  But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.
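  A rough way to put numbers on this is Amdahl's law. The sketch below is only an illustration: it assumes that doubling the clock doubles throughput (which ignores memory latency and everything else that does not scale with GHz), and the function name is made up for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core when only part of the work can run in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Baseline is one 1.6 GHz core. Assumption: throughput scales linearly with clock.
one_fast_core = 3.2 / 1.6

for p in (0.0, 0.5, 0.9, 1.0):
    two_slow_cores = amdahl_speedup(p, cores=2)
    print(f"parallel fraction {p:.1f}: "
          f"2 x 1.6 GHz = {two_slow_cores:.2f}x, 1 x 3.2 GHz = {one_fast_core:.2f}x")
```

  Under these assumptions the two slower cores only match the single fast one when the workload is perfectly parallel; a purely single-threaded job sees no gain from the second core at all.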
2. Re:Speed times Quantity? by Carewolf · 2010-09-02 00:24 · Score: 2, Interesting
  
  BTW, TFA mentions L1 cache per core but doesn't mention how many cores this chip scales up to. Could it be just one?
  It later mentions using 128Mbyte just for level 1 cache, so that would be around 1024 cores.
3. Re:Speed times Quantity? by MichaelSmith · 2010-09-02 00:24 · Score: 2, Insightful
  
  But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.
  I'm betting the code used on these z196 systems is multi-threaded. Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers.
  Actually I think this mainframe is for getting the last little bit of performance out of thirty year old cobol code. And the original top notch programmers are long dead.
  
  --
  http://michaelsmith.id.au
4. Re:Speed times Quantity? by asliarun · 2010-09-02 00:39 · Score: 2, Insightful
  
  The thing is that if you have 2 (say) 1.6 GHz processors, they aren't as 'powerful' as one 3.2 GHz processor.
  For one - there are overheads, certain stuff common between them, pipelines - stuff which I forgot (computer engineering related problems).
  But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.
  OK, firstly the OP should have said that this is the microprocessor with the highest clock speed. Calling it the fastest CPU is extremely misleading. In most modern CPUs, clockspeed is NOT related to throughput. The Intel Sandy Bridge or Nehalem CPU for example may be running its 4 cores at a clockspeed of 3.2GHz but overall, each core in the CPU is easily 4-5 times faster than a 3.2GHz Pentium4 core.
  Secondly, many of the bottlenecks that you allude to are no longer major bottlenecks. CPU interconnect bandwidth and memory bandwidth are now large enough that they are no longer an issue - the days of FSB saturation are over. Of course, there are exceptions to every rule, but I mean this for most workloads.
  Yes, you are correct as far as single threaded workloads are concerned. Nonetheless, you cannot even compare two different CPUs on a clockspeed basis, especially those with completely different architectures, even for single threaded workloads. IBM may have created a very highly clocked CPU and given it tons of transistors, but I seriously doubt if it will compete with a modern day server CPU from Intel or even AMD (pure performance maybe, but definitely not price-performance or performance-per-watt). I strongly suspect that it will probably succeed because of its RAS features, overall system bandwidth, and platform, not because of its raw clockspeed or performance.
5. Re:Speed times Quantity? by Anonymous Coward · 2010-09-02 00:54 · Score: 5, Insightful
  
  More or less. They hit two walls - fabricating chips that could run faster while retaining an acceptable yield, and dealing with the heat such chips produced.
  The fastest general-sale chips were the P4s - the end of their line marked the end of the gigahertz wars, as Intel switched from ramping up the clock to ramping up the per-cycle efficiency with the Core 2 and their complete architecture overhaul. As a result a 2GHz Core 2 duo will outperform a 4GHz P4 dual-core under most conditions. Better pipeline organisation, larger caches better managed.
  Clock rate is no longer the key variable in comparing processors, unless they are of the same microarchitecture.
6. Re:Speed times Quantity? by bws111 · 2010-09-02 01:02 · Score: 4, Informative
  
  When configured to run Linux, each core costs approx $125K. When configured for z/OS, each core costs approx $250K. A complete system (not including any storage or software) can cost up to around $30M.
7. Re:Speed times Quantity? by mickwd · 2010-09-02 01:33 · Score: 4, Insightful
  
  "clockspeed is NOT related to throughput"
  Of course it is. It is not, however, the only factor, and other factors may indeed (and commonly do) outweigh it.
  "IBM may have created a very highly clocked CPU and given it tons of transistors, but I seriously doubt if it will compete with a modern day server CPU from Intel or even AMD."
  I think you underestimate IBM's technical ability. They do have some idea of what they're doing.
  "pure performance maybe, but definitely not price-performance or performance-per-watt"
  That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.
8. Re:Speed times Quantity? by Jeremy+Erwin · 2010-09-02 01:57 · Score: 2, Insightful
  
  It's quad core. 24 MB of L3 Cache, and 96 MB of L4 Cache.
  source
9. Re:Speed times Quantity? by Jeremy+Erwin · 2010-09-02 02:05 · Score: 2, Informative
  
  Actually, IBM can upgrade mainframes over the internet. It can also downgrade it, if the lessee so chooses. The extra chips are used for failover.
10. Re:Speed times Quantity? by asliarun · 2010-09-02 02:10 · Score: 2, Interesting
  
  "clockspeed is NOT related to throughput"
  Of course it is. It is not, however, the only factor, and other factors may indeed (and commonly do) outweigh it.
  You took my comment out of context. I was responding to the original post that focused purely on clockspeed as a magic mantra. What you say is only true if you are talking about clock speed increase in the same microarchitecture, ceteris paribus. Making a blanket claim that we have the fastest CPU because we have clocked it at 5 GHz means nothing. I could overclock a P4 to 5 GHz using exotic cooling and my laptop would still probably beat it in terms of performance.
  
  I think you underestimate IBM's technical ability. They do have some idea of what they're doing.
  Of course they do. I wasn't talking trash about the chip. The point I was trying to make is that the days of exotic chips and boutique chip manufacturers are getting over, at least in the mainstream server space. IBM is just trying to be performance competitive and retain the mainframe server niche. If you notice the trend in servers, commodity servers are becoming more powerful and stable at a much faster rate than niche servers.
  Having said this, performance may not even be the most important consideration in large servers. Other factors like stability, ability to handle failures, platform, etc. are probably much more important. I suspect that sensationalized headlines like this are only a marketing ruse and meant for boasting rights.
  This is not to take anything away from IBM, I'm just making a comment on the overall trend and where this will eventually lead.
  
  That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.
  Sorry, wrong analogy. I was actually being cautious when I said this since I hadn't really seen any benchmarks. Even on pure performance, I am not too sure if the IBM chip will really trounce the upcoming CPUs from Intel and AMD.
11. Re:Speed times Quantity? by LWATCDR · 2010-09-02 02:26 · Score: 4, Informative
  
  Banks, Credit card companies, hospitals, Insurance companies...
  Cheap clusters are great but they are not always the best tool for the job.
  Very large traditional datasets involving lots of high value transactions, with 5 9s uptime requirements do not tend to scale well to COTS clusters.
  IBM mainframes have uptimes measured in years if not decades.
  They have hot-swappable everything including CPUs, so you can do upgrades with zero downtime.
  Also you need to take a look at the costs involved. The costs to throw out a working software system that has been used for decades and then the cost to redesign it to work on a cluster of X86 boxes will be huge.
  Not to mention the investment in making it fault tolerant and, if it is used in certain markets, the cost of auditing the software.
  Not to mention that ZSystems tend to be really secure. There are just not a lot of exploits on ZSystems.
  When downtime can cost millions of dollars, hardware costs are just not that big of a deal.
  Now if you are starting from scratch then you may save money by going with a cluster but then you may not depending on just how good your programmers are.
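  To give a feel for what "5 9s" allows, here is a small sketch that converts a number of nines into permitted downtime per year. The function name is invented for illustration, and it assumes an average year of 365.25 days.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: average year length

def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines (5 -> 99.999%)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

  Five nines leaves a little over five minutes per year, which is less than a single reboot on a lot of hardware.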
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
12. Re:Speed times Quantity? by LWATCDR · 2010-09-02 02:50 · Score: 4, Informative
  
  "They say it's an old CISC architecture. This is probably the sort of system that runs horribly outdated and un-updatable code, like the tax system."
  You mean like Windows?
  The X86 is also an old CISC architecture.
  Actually the Power line is RISC anyway. When it is used in a ZMachine the old style 360/370/390 CISC ISA is translated to RISC and then executed.
  Before you go "ew", that is what modern X86 chips do, as well as ARM when using the Thumb instruction set. The ZSystem ISA is so high end it is almost a high level language, so the translation doesn't really affect performance much at all. Also that old CISC architecture is much better than the mess that we have on the X86.
  I am not sure about how IBM does the translation. On the System 38 AS/400 System-I the translation was done during the IPL aka Initial Program Load. On the Zs it may be done as a JIT but I am not sure.
  Honestly I love the idea and wish that Linux would adopt it. You could then have one binary that would work on any Linux system on any CPU.
  The AS400 way kept a native binary copy along with the TIMI copy. When the program was run the first time it would translate the TIMI copy into the native segment. Yes the first time you ran the program it might take a bit to start but after that it would run at full speed and start fast. Of course you could add a binary segment when you first released the code for the ISA of your choice.
  All in all those old Mainframes and Minis had a lot of brilliant tech we still don't have today on our PCs.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
13. Re:Speed times Quantity? by root_42 · 2010-09-02 03:00 · Score: 2, Informative
  
  It later mentions using 128Mbyte just for level 1 cache, so that would be around 1024 cores.
  
  WP has the answer: http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)
  
  Four cores, 128 KByte L1 data cache, 64 KByte instruction cache.
  
  --
  [--- PGP key and more on http://www.root42.de ---]
14. Re:Speed times Quantity? by mikechant · 2010-09-02 03:14 · Score: 2, Informative
  
  IBM mainframes have uptimes measured in years if not decades.
  Not in my experience. I can think of at least two factors that require more frequent IPLs.
  1/ Switch back to 'normal' time from DST (e.g. BST to GMT in the UK). Although it's possible to put the mainframe clock forward dynamically (well, change the local time offset actually) successfully on many (if not all) systems, in practice most systems will not cope with the clock going backwards (i.e. the 'same hour' happening again) even though the OS supports it. Generally you have to shut the system down for an hour, then IPL. You could probably get away with shutting down all batch initiators and CICS/DB etc. address spaces and then bringing them up again after waiting an hour, but it's typically less risky to follow the established IPL procedure, and this IPL generally obviates the need to have a separate IPL for 2/; regardless, the machine is effectively down for more than an hour.
  It may be possible to achieve continuous operation while moving the time offset backwards with some limited subsets of software but I haven't seen it, and although running on a fixed time and effectively ignoring DST will work, this creates problems of its own and doesn't solve 2/
  2/ 'CSA creep' - tiny bits of orphaned storage (often left by non-IBM supplied products) eventually fill up restricted size critical storage areas such as the CSA, this could lead to an unscheduled IPL, so typically an IPL every (e.g.) 6 months is advisable.
  Not to say that specific systems can't run longer than this (e.g., run on GMT or equivalent at all times, do not tolerate any product which leaks memory in critical areas at all), but I think that's pretty unusual.
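  The 'same hour happening again' problem in 1/ can be sketched in a few lines. This is not mainframe code, just an illustration using fixed UTC offsets for the UK switch on 31 October 2010 (clocks went back at 01:00 UTC); the helper name is made up.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")
GMT = timezone(timedelta(0), "GMT")
# UK clocks went back from BST to GMT at 01:00 UTC on 31 October 2010.
SWITCH_UTC = datetime(2010, 10, 31, 1, 0, tzinfo=timezone.utc)

def local_wall_clock(utc_time: datetime) -> datetime:
    """Convert a UTC instant to the UK local time in force at that instant."""
    zone = BST if utc_time < SWITCH_UTC else GMT
    return utc_time.astimezone(zone)

# Four events, 30 minutes apart, in true chronological order.
start = datetime(2010, 10, 31, 0, 0, tzinfo=timezone.utc)
events = [start + timedelta(minutes=30 * i) for i in range(4)]
stamps = [local_wall_clock(t).strftime("%H:%M") for t in events]

print(stamps)                    # local timestamps repeat after the switch
print(stamps == sorted(stamps))  # False: local time no longer orders the events
```

  Anything that keys on local timestamps (logs, batch schedules, record ordering) sees 01:00 and 01:30 twice, which is why software that is fine with the clock jumping forward can still fall over when it goes back.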
15. Re:Speed times Quantity? by Anonymous Coward · 2010-09-02 03:25 · Score: 3, Insightful
  
  Mainframes are engineered fundamentally around two things: Reliability and IOPS.
  When it comes to basic tasks, it isn't often that a large server ends up CPU bound (especially database servers). Instead what usually becomes the bottleneck is I/O and RAM.
  Reliability is where mainframes take the cake. Some use multiple CPUs to execute the same instructions to make sure the output is correct. Mainframes have virtually redundant everything. Because they have been doing VM since the dawn of computing, it may be that an LPAR might need to be kicked, but a full IPL of a mainframe is exceedingly rare.
  IBM System z machines are on one end of the spectrum. They cost an arm and a leg, but if someone has a lot of 1U servers or even blades, it might be better to just dump the rackfuls of those machines and go with some big iron and LPARs. The TCO of a machine isn't just the price tag of the box, nor the licenses or service fees. One factor people forget is how many admins are needed to keep things going. Some companies are far better off with a mainframe and some Linux admins as opposed to rackfuls of Windows machines that require an army of MS-ITPs to keep running.
  Believe it or not, mainframes have advanced along with the times. They have always been reliable and boring. COBOL is long gone except for way legacy stuff. Instead, you still have Oracle, WebSphere, JBoss, and many other behind the scene applications which are not flashy, but are business critical.
  Mainframes also come with their own viewpoint. On one hand, a company can buy enough x86 servers with clustering, redundancy, failover capability, and other items to raise the MTBF of those servers to an acceptable level. On the other hand, a company can pay the ticket to the System z series and have one machine that has an extremely high MTBF with less of the need of a HA cluster. Even with all the clustering and redundancy of x86 machines, there is only so much lipstick you can put on a pig before it turns into an oinking ball of wax, so if someone wants to go the x86 route, it will require a lot more employees to keep things running.
16. Re:Speed times Quantity? by gorzek · 2010-09-02 03:25 · Score: 3, Interesting
  
  Yeah, it's actually kind of funny how today's Intel desktop processors actually trace their lineage to the Pentium M, which was a mobile chip. When the Pentium 4 came around, the Pentium Pro (Pentium II, Pentium III) architecture was pretty much relegated to the mobile market while Pentium 4 represented their desktop line. As you said, they ran into heat (and power) issues with the Pentium 4s and basically had no more room for expansion there. They went back to the Pentium M, which was doing pretty nicely in the notebook space, and since it was low-power and efficient it became the basis for their future desktop CPUs--the Core line, in particular. They just stopped playing up the clock speed because that architecture's clock speeds were substantially lower than the Pentium 4, despite being able to do more work. I read once that a Pentium M could do about 40% more work than a Pentium 4 of the same clock, so in essence a 2 GHz Pentium M was about as powerful as a 2.8 GHz P4.
  Switching everything over to the low-power and parallel-friendly Pentium M line is probably one of the smartest things Intel ever did. They would've dug their own grave had they stuck with building on Pentium 4 to the bitter end.
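  The per-clock comparison is just a multiplication, sketched here for anyone who wants to plug in other numbers. The 40% figure is the half-remembered one from above, not a measurement, and the function name is only for illustration.

```python
def equivalent_clock_ghz(clock_ghz: float, work_per_cycle_ratio: float) -> float:
    """Clock the slower-per-cycle chip would need to match the given chip."""
    return clock_ghz * work_per_cycle_ratio

# A 2 GHz chip doing 40% more work per cycle than the chip it is compared against.
print(f"{equivalent_clock_ghz(2.0, 1.4):.1f} GHz")
```

  This treats "work per cycle" as a single constant, which real workloads never quite honour, so take it as a rule of thumb only.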
  
  --
  Check out my world simulator thingy.
17. Re:Speed times Quantity? by knarf · 2010-09-02 03:37 · Score: 2, Informative
  
  Clock rate is no longer the key variable in comparing processors, unless they are of the same microarchitecture.
  Clock rate has *never* been the key variable in comparing processors. Even back in the heady days of 1 MHz 6502/6510 vs 4 MHz Z80 the comparison was useless - the 6510 does way more per cycle than the Z80 and ends up being comparable speed-wise.
  
  --
  --frank[at]unternet.org
18. Re:Speed times Quantity? by LWATCDR · 2010-09-02 03:53 · Score: 2, Informative
  
  It has been a while but really?
  I have never seen a mainframe that didn't use Zulu time. Also in the shop I worked all software was quality verified. One machine was at the five year uptime mark when I left but it was a non-commercial system.
  
  --
  See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
19. Re:Speed times Quantity? by InfiniteWisdom · 2010-09-02 03:56 · Score: 2
  
  According to the Passmark benchmark, a 3.20 GHz Pentium 4 scores 524, compared to 10221 for a 3.20 GHz Core i7 970 six-core CPU. That works out to about 3.25 times faster per core than the Pentium 4. While short of 4-5, the GP is not as far off the mark as your ridicule would suggest.
  I actually think YOU (and the cretin who modded you insightful) fail.
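  For anyone who wants to check the per-core arithmetic, here it is spelled out using the two Passmark scores quoted above. It assumes the benchmark score divides evenly across the six cores and that the Pentium 4 counts as one core, both of which are simplifications (Hyper-Threading is ignored).

```python
P4_SCORE = 524         # Pentium 4 at 3.20 GHz, score as quoted
I7_970_SCORE = 10221   # Core i7 970 at 3.20 GHz, score as quoted
I7_970_CORES = 6

# Assumption: the multi-core score splits evenly across cores.
per_core_score = I7_970_SCORE / I7_970_CORES
per_core_ratio = per_core_score / P4_SCORE

print(f"per-core score: {per_core_score:.1f}")
print(f"per-core ratio vs Pentium 4: {per_core_ratio:.2f}")
```

  With those scores the ratio comes out at about 3.25 per core at the same clock.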
20. Re:Speed times Quantity? by sexconker · 2010-09-02 04:27 · Score: 2, Informative
  
  The word you are looking for is "sleight".
  Sleight of hand.
21. Re:Speed times Quantity? by TheRaven64 · 2010-09-02 04:29 · Score: 2, Informative
  
  The X86 is also an old CISC architecture.
  Actually x86 is a new CISC architecture. The System/360 architecture predates it by well over a decade. x86 was about the last CISC ISA to be developed outside of a few tiny niches.
  
  Actually the Power line is RISC anyway. When it is used in a ZMachine the old style 360/370/390 CISC ISA is translated to RISC and then executed
  Umm, no. POWER is RISC (well, RISC purists would say that's stretching the point), but POWER and System/z are completely unrelated. The POWER6 and z10, and POWER7 and this chip, were designed by cooperating teams, so they share some execution units, but they are very different architectures. This is not a POWER CPU running a System/360 emulator, it's a machine with a CPU that happens to have a few pipelines in common with a POWER CPU.
  
  --
  I am TheRaven on Soylent News
22. Re:Speed times Quantity? by CanadianRealist · 2010-09-02 07:09 · Score: 2, Funny
  
  Not saying I'd recommend it, but if that's the measure that you want to use then I'd say a cheap clunker could probably beat the both of them*.
  *Take the rest of the cash you would have spent buying either one and spend that on blowjobs.
Price: RTFA by miketheanimal · 2010-09-02 00:03 · Score: 5, Informative

The Z-series mainframes cost hundreds of thousands (or even over a million) dollars, not the chips. As it says in the article.
1. Re:Price: RTFA by jtollefson · 2010-09-02 02:01 · Score: 2, Informative
  
  They're very expensive, but for Enterprise scale workloads they're cheaper than the comparable distributed system. The cost entirely depends on how many cores you're running, and more importantly your monthly usage. IBM bills you for your Iron depending on an average of how much you used it that month. There's a reason why Mainframes run so quick and fast, they're the only system where all processing from user ISPF interaction all the way to data processing is tracked. All that processing turns into your final bill with IBM, so upper management has a tendency to pay close attention to usage unlike other systems... But thankfully IBM lets them out on a monthly installment plan. They're kinda like QVC like that...
2. Re:Price: RTFA by QuantumBeep · 2010-09-02 04:11 · Score: 2, Informative
  
  IBM mainframes are leased.
3. Re:Price: RTFA by bws111 · 2010-09-02 05:30 · Score: 3, Informative
  
  You can buy or lease the hardware. The software is licensed under contract.
  It seems like the GP is talking about software charges, not hardware. Software can be either monthly fee based or usage based. If it is usage based you must send a usage report to IBM so they can bill you. That is specified in the contract. In either case, the number of and performance of the CPs is calculated into the cost.
  Hardware is a different story. With hardware, the number of cores you purchase is not the same as the number you get. For instance, you can buy a 1 core machine, but what you get is 16 cores. Only 1 core is enabled in the firmware though. IBM has offerings (again under contract) where you can buy the right to temporarily enable additional processors instantaneously (like if you lost one of your datacenters and need to transfer the workload to another one). With these offerings, you also need to send usage info to IBM so they can bill you for the time that the additional cores have been enabled.
Great news for Mac OS X users! by squiggleslash · 2010-09-02 00:03 · Score: 4, Funny

I can't wait to get a PowerMac G6 with this CPU, in your face Dell users with your commodity Intel-based desi... oh, wait.

--
You are not alone. This is not normal. None of this is normal.
1. Re:Great news for Mac OS X users! by fuzzyfuzzyfungus · 2010-09-02 00:33 · Score: 4, Funny
  
  The PowerMac G6 would be pretty impressive. The PowerBook G6 manual would include the following phrase:
  
  "Please note: The revolutionary new MagsafePro 3-Phase/480 power connector is not backwards compatible with the Magsafe connectors of prior, non-containerized Mac Portables."
2. Re:Great news for Mac OS X users! by UnknowingFool · 2010-09-02 00:42 · Score: 2, Informative
  
  Unfortunately this chip will most likely go into workstations and servers. In order for IBM to make a desktop version, it will have to make a custom chip to handle things like video, sound, etc. This will lead to the same logistical problems for Apple that it had before. Manufacturing companies do not want to keep excess inventories, whether it is Apple or IBM. If Apple needs more, it will have to wait while IBM rearranges their manufacturing schedules to compensate. Also even if Apple orders millions of these, it will still be a small customer to IBM; IBM's internal divisions would order more of the stock chip. And the last reason Apple will not go back to IBM: IBM's mobile chip offerings lag way behind Intel's. IBM never made a mobile G5 chip. My guess is that they could never make one that had acceptable power consumption. IBM could do it with enough R&D but again it would be for a very small customer. Not worth enough to the bottom line.
  
  --
  Well, there's spam egg sausage and spam, that's not got much spam in it.
3. Re:Great news for Mac OS X users! by TheRaven64 · 2010-09-02 01:21 · Score: 4, Informative
  
  Wrong chip family. This is the Z-series mainframe chip, using an instruction set that is backwards compatible with the System/360 stuff from back in the 1960s (the architecture of the future, as the marketing material trying to persuade my university to upgrade their IBM 1620 put it). The PowerMacs were using PowerPC chips, which use the same instruction set as the POWER CPUs from IBM (they used to be similar, with a common subset, now they are identical).
  The chip that this is replacing, the z10, was designed concurrently with the POWER6. They share a number of common features, including a lot of the same execution engines (both have the same hardware BCD units, for example, as well as more common arithmetic units), but they are very different in a number of other aspects, including the instruction set, cache design, and inter-processor interconnect, because they are designed for different workloads.
  I've not read much about this chip yet, but I think it shares some design elements with the POWER7, in the same way that the z10 did with the POWER6.
  In short, while some of the R&D money spent on this CPU made it into chips that could, potentially, run OS X, this chip itself could not without a major rewrite.
  
  --
  I am TheRaven on Soylent News
This chip snickers at my 6502... by bobdotorg · 2010-09-02 00:08 · Score: 3, Insightful

The chip uses 1,079 different instructions
Can't even imagine writing in assembly code for this monster. I miss dinking around with a nice 6502 system.

--
__ Someday, but not this morning, I'll finally learn to use the preview button.
Re:Yeah, I read about this by Spad · 2010-09-02 00:09 · Score: 4, Insightful

Yes, but their article comments are much closer to Youtube than Slashdot.
Re:Microchip? by the_fat_kid · 2010-09-02 00:19 · Score: 2, Interesting

iChip?

--
-- Sig under construction...
Wait....what? by antifoidulus · 2010-09-02 00:41 · Score: 2, Insightful

It contains a 64KB L1 instruction cache, a 128KB L1 data cache, a 1.5MB private L2 cache per core, plus a pair of co-processors used for cryptographic operations. In a four-node system, 19.5 MB of SRAM are used for L1 private cache, 144MB for L2 private cache, 576MB of eDRAM for L3 cache, and a whopping 768MB of eDRAM for a level-four cache. All this is used to ensure that the processor finds and executes its instructions before searching for them in main memory, a task which can force the system to essentially wait for the data to be found--dramatically slowing a system that is designed to be as fast as possible.

I'm assuming the cache referred to in the second paragraph is off-chip cache, otherwise it would sort of negate the first sentence.... Would be nice if the article would have actually said that though.

--
Monstar L
1. Re:Wait....what? by Anonymous Coward · 2010-09-02 01:08 · Score: 3, Insightful
  
  Considering the ratio between the two sets of figures is ~96, it seems that the "four-node system" contains 96 cores with their own L1 and L2 caches, but shared L3 and L4 caches.
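  The ratio can be checked directly from the figures in the quoted paragraph. This sketch assumes 1 MB = 1024 KB and that the per-core L1 is simply the instruction cache plus the data cache.

```python
KB = 1
MB = 1024 * KB  # assumption: binary megabytes

# Per-core figures from the quoted article text.
l1_per_core = 64 * KB + 128 * KB   # instruction cache + data cache
l2_per_core = 1.5 * MB

# "Four-node system" totals from the same text.
l1_total = 19.5 * MB
l2_total = 144 * MB

print("cores implied by L1:", l1_total / l1_per_core)
print("cores implied by L2:", l2_total / l2_per_core)
```

  The L2 figures give exactly 96, which matches the quad-core, up-to-24-chips configuration mentioned elsewhere in the thread. The L1 figures give 104, so the article's 19.5 MB presumably counts something beyond the per-core L1 or was computed differently; I don't know which.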
Re:I doubt it's the fastest ever... by mr_mischief · 2010-09-02 00:44 · Score: 3, Informative

ummmm.......
It's a quad-core chip. Each core has two integer, two load and store, one binary floating point, and one decimal floating point unit. Up to 24 CPUs can be placed in the frame. It can connect to another whole rack of POWER7 blades running AIX as an application accelerator platform.
The z196 is for the stuff a mainframe is good at: big batches and fast I/O. The application accelerator is for stuff the clusters of supermicro servers are good at. As a hybrid system connected across the GX bus, it should pump data in and out of applications pretty well.
You really don't anymore by Sycraft-fu · 2010-09-02 00:51 · Score: 2, Interesting

These days, compilers take care of almost everything. It has gotten complex to the extent that a programmer trying to do things all in assembly will probably do a worse job than a good compiler. Chips have many, many tools to solve their problems.
That isn't to say it is never done, in some programs there may be some hand optimized assembly for various super speed critical functions. However even then it is most likely written in a high level language, compiled to assembly (you can order most compilers to do that), tuned and then put back in the program.
Memory is cheap and compilers are powerful so assembly is just not as needed as it once was, at least on desktops/servers where you see these massive chips.
1. Re:You really don't anymore by TheRaven64 · 2010-09-02 06:13 · Score: 2, Informative
  
  You've almost certainly used some code compiled with a compiler that I've worked on, but I've hardly ever written assembly code, and none of it was in a compiler.
  
  --
  I am TheRaven on Soylent News
2. Re:You really don't anymore by David+Greene · 2010-09-02 09:45 · Score: 2, Interesting
  
  A couple of things:
  In the first example, 'm' is not being moved to the constant data section. The constant vector being assigned to m is placed there. MSVC is missing the vectorization, not placement of constants into constant memory. You can see that it fetches the constant values from memory using scalar moves while gcc and icc use vector moves.
  I'm not familiar with MSVC switches but you might need to tell it explicitly to vectorize. I'm curious why you didn't try -ftree-vectorize with gcc, for example.
  Floating-point optimization is a tricky thing. Many compilers will be very conservative to retain bitwise equivalent results regardless of optimization level. Some will even go as far as maintaining bitwise equivalence between scalar and vector code. That can severely degrade optimization. Again, most compilers have a switch to enable "unsafe" floating-point optimization. This may be what's tripping up these compilers in some cases.
  NaNs are also an issue with floating-point. The compiler is not allowed to eliminate anything which might raise an exception.
  When encountering intrinsics, many compilers will do exactly as you say, as noted in the article. That's not a bug, it's a feature. When people use intrinsics, they usually are trying to hand-code something and often don't want the compiler to mess with it.
  Some of these tests (the shuffle one for example) are a little out-of-the-ordinary. Compiler developer time is at a premium and it's not worth doing these kinds of micro-optimizations if such code is never seen in the wild. That said, it's clear that some compilers (gcc, for example, and LLVM) do these sorts of things.
  On x86, it's often just fine to spill things to the stack and reload them. My studies show that the number of spills does not matter so much but rather what is spilled. So the number of loads/stores, while a gross indicator of performance, doesn't tell the whole story.
  The comparison test is, I think, one of those cases not worth optimizing. I can't recall ever seeing a vector compare where the operands are known statically. Doing that optimization would require loading static vectors of various combinations of 1s and 0s from memory. It is almost certainly faster to just do the compare. This isn't a missed optimization. In gcc's case it's the compiler doing what it should, regardless of what the programmer expects.
  Even so, these are interesting code examples. It would be neat to see what happens when we turn on -ftree-vectorize, use a newer gcc or try LLVM.
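  To make the floating-point point concrete: reordering additions, which is exactly what vectorizing a reduction does, can change the result in the last bit, and NaN breaks identities a compiler might otherwise fold away. A quick sketch in Python, whose floats are IEEE 754 doubles:

```python
# Reassociation: the same three values summed in a different order.
a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c
reassociated = a + (b + c)
print(left_to_right, reassociated, left_to_right == reassociated)

# NaN: "x == x" cannot be simplified to True without changing behaviour.
nan = float("nan")
print(nan == nan)
```

  The two sums differ in the last digit, so a compiler that promises bitwise-identical results has to keep the original evaluation order unless it is told that "unsafe" math is acceptable.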
  
A Little more detail here by valadaar · 2010-09-02 01:05 · Score: 2, Informative

The IBM announcement describes the system in more detail than the linked article: http://www-03.ibm.com/press/us/en/pressrelease/32414.wss
The New zEnterprise 196
"From a performance standpoint, the zEnterprise System is the most powerful commercial IBM system ever. The core server in the zEnterprise System -- called zEnterprise 196 -- contains 96 of the world's fastest, most powerful microprocessors, capable of executing more than 50 billion instructions per second. That's roughly 17,000 times more instructions than the Model 91, the high-end of IBM's popular System/360 family, could execute in 1970."
A 17,000x improvement in performance in 40 years? I suppose that is about right...
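Whether that is "about right" can be checked with a quick back-of-the-envelope calculation. This takes IBM's two figures at face value (17,000x and 50 billion instructions per second) and assumes steady exponential growth over the 40 years.

```python
import math

speedup = 17_000          # from the IBM press release
years = 40                # 1970 to 2010
system_ips = 50e9         # "more than 50 billion instructions per second"

doublings = math.log2(speedup)
years_per_doubling = years / doublings
model_91_ips = system_ips / speedup

print(f"doublings: {doublings:.2f}")
print(f"years per doubling: {years_per_doubling:.2f}")
print(f"implied Model 91 rate: {model_91_ips / 1e6:.1f} million instructions per second")
```

That is about 14 doublings, or one every 2.8 years or so, and it is a whole 96-processor system being compared against a single 1970 machine.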
Re:So much for the 3.3GHz speed of light limit. by Ecuador · 2010-09-02 01:09 · Score: 4, Informative

The comments were about the fact that at 3GHz light travels 10cm per clock cycle, which limits how far apart you can have 2 items on a bus if you want them to communicate within 1 clock cycle. There is no "light speed barrier" or anything of the sort, however at these frequencies you design knowing that it will take measurable time for an electric signal to propagate. For example, for this particular system whose core is at 5.2GHz, if you try to send a signal to an external memory that is say 11-12cm away, then it will take about two clock cycles just for the signal to travel the distance.
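The numbers are easy to reproduce. This sketch uses the speed of light in vacuum, so it is a best case: real electrical signals in board traces propagate noticeably slower, which only makes the distances shorter. The function names are just for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # in vacuum; an upper bound for real signals

def cm_per_cycle(clock_hz: float) -> float:
    """Distance light covers during one clock cycle, in centimetres."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100

def cycles_to_travel(distance_cm: float, clock_hz: float) -> float:
    """Clock cycles that elapse while light covers the given distance."""
    return distance_cm / cm_per_cycle(clock_hz)

print(f"3.0 GHz: {cm_per_cycle(3.0e9):.2f} cm per cycle")
print(f"5.2 GHz: {cm_per_cycle(5.2e9):.2f} cm per cycle")
print(f"11.5 cm at 5.2 GHz: {cycles_to_travel(11.5, 5.2e9):.2f} cycles")
```

At 5.2 GHz that is under 6 cm per cycle, so a one-way trip of 11-12 cm costs about two cycles before the memory has even started to respond.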

--
Violence is the last refuge of the incompetent. Polar Scope Align for iOS
Re:fastest first post ever? by mark72005 · 2010-09-02 02:21 · Score: 2, Funny

I could never... Aero is... the best thing ever... it's so amazing and useful. It improves the look and feel so much.

Wait, do I have it turned on or off right now?
No programmers over 50? by Terje+Mathisen · 2010-09-02 02:29 · Score: 2, Informative

I guess I'm a counterexample then:
I'm 53.
I believe (hope?) most people who know me would say that I'm still a pretty good programmer.
Terje

--
"almost all programming can be viewed as an exercise in caching"
Bad Golf? by thogard · 2010-09-02 03:20 · Score: 2, Funny

Had a golf game ended differently, would we be seeing these in power macs?
Put that in your iPad by FreeBSD+evangelist · 2010-09-02 04:37 · Score: 2, Funny

From TFA:

IBM also previously claimed the title of fastest microprocessor with the POWER6 chip, which ran at speeds of up to 4.6 to 4.7 GHz, and its own z10, a 2008 chip which ran at speeds of up to 4.4 GHz.
I seem to recall that one of the official reasons Apple gave for the switch from Power to Intel was that IBM couldn't/wouldn't deliver a fast enough processor.