Slashdot Mirror


IBM Unveils Fastest Microprocessor Ever

adeelarshad82 writes "IBM revealed details of its 5.2-GHz chip, the fastest microprocessor ever announced. IBM described the z196, which will power its Z-series mainframes, systems costing hundreds of thousands of dollars. The z196 contains 1.4 billion transistors on a chip measuring 512 square millimeters, fabricated on 45-nm PD SOI technology. It contains a 64KB L1 instruction cache, a 128KB L1 data cache and a 1.5MB private L2 cache per core, plus a pair of co-processors used for cryptographic operations. IBM is set to ship the chip in September."

292 comments

  1. Required by Anonymous Coward · · Score: 4, Funny

    But will it run ... a Beowulf cluster of ...

    [Comment terminated : memelock detected]

    1. Re:Required by Anonymous Coward · · Score: 0

      But will it blend? That is the question!

    2. Re:Required by MobileTatsu-NJG · · Score: 2, Insightful

      [Comment terminated : memelock detected]

      If Slashdot ever gets this working I'll instantly subscribe.

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    3. Re:Required by jd · · Score: 1

      I'll be able to tell you if IBM is willing to ship me 256 free sample motherboards with the new processor. And a very very fast switch. Oh, and a power station.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    4. Re:Required by sharkey · · Score: 1

      Sounds like it will generate some serious heat, get those grits smoking!

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
    5. Re:Required by Anonymous Coward · · Score: 0

      kewwwwwwwwwlllllllllll..

      I need this processor for my next machine that I'm going to use for strictly checking emails. Let me place an order now!

  2. Speed times Quantity? by TaoPhoenix · · Score: 2, Interesting

    So what is this beast supposed to be, a 64 core machine?

    Didn't we retire the GHz wars 5 years ago? I know, AMD style "more done per cycle", but isn't a quad core 3.1 GHz per chip with 20% logistic overhead faster?

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
    1. Re:Speed times Quantity? by Haedrian · · Score: 5, Informative

      The thing is that if you have 2 (say) 1.6 GHz processors, they aren't as 'powerful' as one 3.2 GHz processor.

      For one - there are overheads, certain stuff common between them, pipelines - stuff which I forgot (computer engineering related problems).

      But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.
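      To put rough numbers on this (and on the quad-core question above), here is a minimal Amdahl's-law sketch in Python. It assumes both chips do the same work per clock, so clock rate stands in for per-core speed, and it models the "20% logistic overhead" as a flat 0.8 efficiency on the parallel part. The parallel fractions are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int, efficiency: float = 1.0) -> float:
    """Speedup over one core when only parallel_fraction of the work can be spread out.

    efficiency < 1.0 shrinks the useful core count to model coordination overhead.
    """
    effective_cores = cores * efficiency
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / effective_cores)


def throughput(clock_ghz: float, cores: int, parallel_fraction: float,
               efficiency: float = 1.0) -> float:
    """Relative work rate, assuming every core does the same work per clock."""
    return clock_ghz * amdahl_speedup(parallel_fraction, cores, efficiency)


for p in (0.0, 0.5, 0.6, 0.9, 1.0):
    single = throughput(5.2, cores=1, parallel_fraction=p)
    quad = throughput(3.1, cores=4, parallel_fraction=p, efficiency=0.8)
    winner = "quad" if quad > single else "single"
    print(f"parallel={p:.1f}  single 5.2 GHz: {single:.2f}  quad 3.1 GHz: {quad:.2f}  -> {winner}")
```

      Under these assumptions the quad-core part only pulls ahead once roughly 59% of the work runs in parallel; below that, the single fast core wins.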

    2. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      The CPU is designed primarily for running VMs. Since you sometimes want a VM to perform a real-time task, it's very useful if you can get as many clock cycles per second as possible.

    3. Re:Speed times Quantity? by WrongSizeGlass · · Score: 1

      But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.

      I'm betting the code used on these z196 systems is multi-threaded. Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers. With two co-processors used for cryptographic operations per chip I'd say they were after a bigger prize than, say, hardcore gamers ;-)

      BTW, TFA mentions L1 cache per core but doesn't mention how many cores this chip scales up to. Could it be just one?

    4. Re:Speed times Quantity? by Carewolf · · Score: 2, Interesting

      BTW, TFA mentions L1 cache per core but doesn't mention how many cores this chip scales up to. Could it be just one?

      It later mentions using 128Mbyte just for level 1 cache, so that would be around 1024 cores.

    5. Re:Speed times Quantity? by MichaelSmith · · Score: 2, Insightful

      But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.

      I'm betting the code used on these z196 systems is multi-threaded. Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers.

      Actually I think this mainframe is for getting the last little bit of performance out of thirty year old COBOL code. And the original top notch programmers are long dead.

    6. Re:Speed times Quantity? by dsavi · · Score: 1

      I was wondering about this: why did the GHz wars end, anyway? Did the chip makers hit a wall or something? At the rate it was going, I thought we'd have 5 GHz+ processors by now.
      Yeah, I'm uninformed.

    7. Re:Speed times Quantity? by bsdaemonaut · · Score: 1

      You can have up to 96 cores in a z196 system... not sure how a quad-core 3.1 GHz chip would hope to even compare with that.

    8. Re:Speed times Quantity? by asliarun · · Score: 2, Insightful

      The thing is that if you have 2 (say) 1.6 GHz processors, they aren't as 'powerful' as one 3.2 GHz processor.

      For one - there are overheads, certain stuff common between them, pipelines - stuff which I forgot (computer engineering related problems).

      But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.

      OK, firstly the OP should have said that this is the microprocessor with the highest clock speed. Calling it the fastest CPU is extremely misleading. In most modern CPUs, clockspeed is NOT related to throughput. The Intel Sandy Bridge or Nehalem CPU for example may be running its 4 cores at a clockspeed of 3.2GHz but overall, each core in the CPU is easily 4-5 times faster than a 3.2GHz Pentium4 core.

      Secondly, many of the bottlenecks that you allude to are no longer major bottlenecks. CPU interconnect bandwidth and memory bandwidth are now large enough that this is no longer an issue - the days of FSB saturation are over. Of course, there are exceptions to every rule, but I mean this for most workloads.

      Yes, you are correct as far as single threaded workloads are concerned. Nonetheless, you cannot even compare two different CPUs on a clockspeed basis, especially those with completely different architectures, even for single threaded workloads. IBM may have created a very highly clocked CPU and given it tons of transistors, but I seriously doubt if it will compete with a modern day server CPU from Intel or even AMD (pure performance maybe, but definitely not price-performance or performance-per-watt). I strongly suspect that it will probably succeed because of its RAS features, overall system bandwidth, and platform, not because of its raw clockspeed or performance.
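      The usual way to express this is that per-core throughput is instructions per cycle (IPC) times clock rate. A minimal sketch, with hypothetical IPC values that are not measurements of any real P4, Nehalem or z196 core:

```python
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    """Per-core throughput: instructions retired per cycle times cycles per second."""
    return ipc * clock_ghz * 1e9


def run_time_s(instructions: float, ipc: float, clock_ghz: float) -> float:
    """Time to retire a fixed instruction count on one core."""
    return instructions / instructions_per_second(ipc, clock_ghz)


WORKLOAD = 1e10  # hypothetical single-threaded job: 10 billion instructions

# Hypothetical cores: a high-clock, low-IPC design versus a lower-clock, wider design.
high_clock = run_time_s(WORKLOAD, ipc=0.8, clock_ghz=5.0)
wide_core = run_time_s(WORKLOAD, ipc=2.0, clock_ghz=3.2)

print(f"high clock: {high_clock:.2f} s, wide core: {wide_core:.2f} s")
```

      Clock rate only settles the comparison when IPC is equal, which in practice means the same microarchitecture.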

    9. Re:Speed times Quantity? by Pharmboy · · Score: 1

      Shit, if you're paying hundreds of thousands of dollars per CPU

      You aren't. FTA, the complete systems will cost hundreds of thousands of dollars, to a few million. Not the individual CPUs.

      --
      Tequila: It's not just for breakfast anymore!
    10. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      The problem is that a single-core processor at 3.2 GHz will use MORE energy than the two 1.6 GHz processors combined.

      If you have a workload that is easily split up into threads, going multicore will lower costs, because slower cores require less electrical energy...
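      The usual back-of-envelope for this is the CMOS switching-power approximation P = a * C * V^2 * f. The sketch below additionally assumes supply voltage has to rise in proportion to clock rate, which makes power grow with roughly the cube of frequency; real voltage/frequency curves are flatter than that, so treat the ratio as an upper-end illustration. The capacitance and volts-per-GHz figures are hypothetical.

```python
def dynamic_power_w(freq_ghz: float, capacitance_nf: float = 1.0,
                    volts_per_ghz: float = 0.35, activity: float = 1.0) -> float:
    """Switching power P = a * C * V^2 * f, with V assumed proportional to f (hypothetical)."""
    voltage = volts_per_ghz * freq_ghz
    return activity * capacitance_nf * 1e-9 * voltage ** 2 * freq_ghz * 1e9


one_fast_core = dynamic_power_w(3.2)
two_slow_cores = 2 * dynamic_power_w(1.6)
print(f"one 3.2 GHz core uses {one_fast_core / two_slow_cores:.1f}x the power of two 1.6 GHz cores")
```

      Because the voltage term is squared and multiplied by frequency again, halving the clock and doubling the core count cuts power to a quarter under this model, while the aggregate clock stays the same.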

    11. Re:Speed times Quantity? by Anonymous Coward · · Score: 5, Insightful

      More or less. They hit two walls - fabricating chips that could run faster while retaining an acceptable yield, and dealing with the heat such chips produced.

      The fastest general-sale chips were the P4s - the end of their line marked the end of the gigahertz wars, as Intel switched from ramping up the clock to ramping up the per-cycle efficiency with the Core 2 and their complete architecture overhaul. As a result a 2GHz Core 2 duo will outperform a 4GHz P4 dual-core under most conditions. Better pipeline organisation, larger caches better managed.

      Clock rate is no longer the key variable in comparing processors, unless they are of the same microarchitecture.

    12. Re:Speed times Quantity? by bws111 · · Score: 4, Informative

      When configured to run Linux, each core costs approx $125K. When configured for z/OS, each core costs approx $250K. A complete system (not including any storage or software) can cost up to around $30M.

    13. Re:Speed times Quantity? by JamesP · · Score: 1

      You actually can go faster without x86 bogging you down

      --
      how long until /. fixes commenting on Chrome?
    14. Re:Speed times Quantity? by (Score.5, Interestin · · Score: 1

      Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers.

      If you're paying hundreds of thousands of dollars for a multi-GHz CPU then it's probably because you're trying to make up for the product of crap programmers, not the other way round.

    15. Re:Speed times Quantity? by hedwards · · Score: 1

      There's also the problem of feeding such a monster processor and keeping it synced up with the rest of the machine. On top of that, servers, for instance, tend to cope better with many cores than with faster ones after a certain point, which is presumably well before 5 GHz. Since servers are typically more concerned with handling large numbers of connections, chances are that a quad core running at 2 GHz would perform better than a single core at 5 GHz; scale that up as needed to the number of cores. Of course frequency is a terrible comparison, but for this purpose it's probably fine.

    16. Re:Speed times Quantity? by mickwd · · Score: 1

      Actually I think this mainframe is for getting the last little bit of performance out of thirty year old COBOL code. And the original top notch programmers are long dead.

      Considering that life expectancy in the developed world is in the region of 80 years, there is a reasonable chance that programmers who were under 50 when they wrote code thirty years ago are still alive.

      They may have little recollection of what they did 30 years ago, but to say they are all "long dead" is somewhat of an exaggeration.

    17. Re:Speed times Quantity? by Anonymous Coward · · Score: 1, Funny

      > I'd say they were after a bigger prize than, say, hardcore gamers ;-)

      Yeah. They're after the *really fucking hardcore* gamers.

    18. Re:Speed times Quantity? by Sulphur · · Score: 1

      More processors = Share the Legacy.

    19. Re:Speed times Quantity? by hitmark · · Score: 1

      And all the hardware will be there no matter what package you choose, and an "upgrade" will involve an IBM representative coming over to move a jumper.

      --
      comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
    20. Re:Speed times Quantity? by digitalhermit · · Score: 1

      Yup... there are so many dependencies on application and OS code that hardware capability matters very little.

      I recently tried to tune a workload on a pSeries system. We gave it half a processor and 2 virtuals (with the Power version of hyperthreading so it saw 4 processors). Performance was a dog. Load was only 60% of capacity though. We doubled the number of virtual processors but kept the overall entitlement. Load dropped to 40%. We added another couple of virtuals and load dropped to 25%. No increase in throughput. It's a classic example of a thread-limited workload... No matter how many processors we could add, the jobs would only run on two. Bumping up those processors might give 2% here and there, but the bottleneck wasn't CPU. After the development team redid some code (and reduced the number of database calls from 1500 to under 100), the performance improved from 2-3 seconds to 0.9 seconds.
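      The fix that mattered was cutting round trips, not adding CPU. A minimal latency model in Python, where the 1.5 ms per database round trip and 0.75 s of local work are hypothetical numbers picked so the model lands near the 2-3 s and 0.9 s reported above (they are not measurements from that system):

```python
def response_time_s(db_calls: int, round_trip_ms: float, cpu_work_s: float) -> float:
    """Request latency = local CPU work + one database round trip per call."""
    return cpu_work_s + db_calls * round_trip_ms / 1000.0


ROUND_TRIP_MS = 1.5   # hypothetical per-call latency
CPU_WORK_S = 0.75     # hypothetical local processing time

before = response_time_s(1500, ROUND_TRIP_MS, CPU_WORK_S)
after = response_time_s(100, ROUND_TRIP_MS, CPU_WORK_S)

# A 2% faster CPU only shrinks the CPU term, which is why processor tuning barely helped.
before_faster_cpu = response_time_s(1500, ROUND_TRIP_MS, CPU_WORK_S / 1.02)

print(f"before: {before:.2f} s, after: {after:.2f} s, before with faster CPU: {before_faster_cpu:.2f} s")
```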

    21. Re:Speed times Quantity? by AHuxley · · Score: 1

      product of crap programmers
      Sorry to ask but who does IBM see using this?
      At the price point and data sets that need sorting? - cheaper clusters or more expensive faster unique chips depending on math?

      --
      Domestic spying is now "Benign Information Gathering"
    22. Re:Speed times Quantity? by cgenman · · Score: 1

      They say it's an old CISC architecture. This is probably the sort of system that runs horribly outdated and un-updatable code, like the tax system.

    23. Re:Speed times Quantity? by mickwd · · Score: 4, Insightful

      "clockspeed is NOT related to throughput"

      Of course it is. It is not, however, the only factor, and other factors may indeed (and commonly do) outweigh it.

      "IBM may have created a very highly clocked CPU and given it tons of transistors, but I seriously doubt if it will compete with a modern day server CPU from Intel or even AMD."

      I think you underestimate IBM's technical ability. They do have some idea of what they're doing.

      "pure performance maybe, but definitely not price-performance or performance-per-watt"

      That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.

    24. Re:Speed times Quantity? by MaskedSlacker · · Score: 1

      I've never met a programmer over 50. I must therefore conclude that they all perish mysteriously upon their 50th birthday. Something like the planet of grim reapers from Futurama is how I prefer to envision it.

    25. Re:Speed times Quantity? by Chris Mattern · · Score: 1

      Sorry to ask but who does IBM see using this?

      People with legacy mainframe programs that they don't want to port (translation: that they don't dare touch).

    26. Re:Speed times Quantity? by TheTrueScotsman · · Score: 1

      Banks. They need it not for speed but for volume and reliability.

    27. Re:Speed times Quantity? by JasterBobaMereel · · Score: 1

      IBM BlueGene/L - runs at 700 MHz .... 596 TFLOPS

      Cray XT5 - runs at 2.6 GHz ... 2331.00 TFLOPS

      Both of these are slower in Hz than the PC I am using to type this ....
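      Those peak figures come from core count and per-cycle work, not clock rate: peak FLOPS = cores x clock x floating-point operations per cycle. In the sketch below the core counts (212,992 for BlueGene/L, 224,162 for the XT5) and the 4 FLOPs per cycle per core are assumptions recalled from contemporary descriptions of these machines, not figures from this thread; plugged in, they reproduce the quoted numbers.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core retiring its maximum FLOPs on every clock."""
    return cores * clock_ghz * flops_per_cycle / 1000.0  # GFLOPS -> TFLOPS


# Assumed configurations (recalled, not taken from this thread).
machines = {
    "BlueGene/L (700 MHz)": (212_992, 0.7, 4),
    "Cray XT5 (2.6 GHz)": (224_162, 2.6, 4),
}
for name, (cores, ghz, fpc) in machines.items():
    print(f"{name}: {peak_tflops(cores, ghz, fpc):.1f} TFLOPS peak")
```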

      --
      Puteulanus fenestra mortis
    28. Re:Speed times Quantity? by cgenman · · Score: 0

      There were much better ways of making faster chips. The GHz thing went on long after it provided tangible speed benefits.

      Think of it this way. A Hz is a "beat" of your chip. On this beat, old chips did one thing. Slightly less old chips had pipelines of things to do. Now, you have multiple separate huge pipelines of instructions... if you had a pipeline 30 instructions long, you effectively did 30 things each beat (this depended on having code that pipelined well). Then you have multiple pipelines that you can put instructions in, and multiple cores on a die that create multiple sets of multiple pipelines.

      The Hz rating just determines how fast each batch happens. And for years, the marketing thing was to push up that rating, while ignoring the number of instructions you could execute per cycle. But squeezing out more Hz makes hotter chips that take more power and more cooling, which coincidentally are things you have to optimize for in laptops. Now things have swung the other way, and we have more focus on the number and quality of instructions, as well as making sure the pipelines can be continually fed with data.
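      For what it's worth, the textbook timing model for an ideal pipeline (one pipeline, no stalls) is cycles = stages + instructions - 1. A deep pipeline keeps many instructions in flight, but it still completes about one per cycle; finishing several per cycle comes from having multiple pipelines and multiple cores, as described above. A minimal sketch:

```python
def pipelined_cycles(instructions: int, stages: int) -> int:
    """Ideal pipeline: fill once (stages cycles), then one instruction completes per cycle."""
    if instructions <= 0:
        return 0
    return stages + instructions - 1


def unpipelined_cycles(instructions: int, stages: int) -> int:
    """No overlap: every instruction walks all stages before the next one starts."""
    return max(instructions, 0) * stages


n, k = 1_000_000, 30
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
completed_per_cycle = n / pipelined_cycles(n, k)
print(f"speedup {speedup:.2f}x, {completed_per_cycle:.3f} instructions completed per cycle")
```

      So a 30-stage pipeline approaches 30x the throughput of running the same stages one instruction at a time, while still completing roughly one instruction per clock.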

      Of course, I suspect part of the reason why people let up on the GHz moniker is because AMD stopped being a real threat.

    29. Re:Speed times Quantity? by David Greene · · Score: 1

      CPU interconnect bandwidth and memory bandwidth are now large enough that this is no longer an issue

      Well, it depends on what kinds of codes you're running. Memory bandwidth is becoming a bigger problem, not a smaller one. Communication overhead can easily dominate in large parallel codes. These are not niche things, either. We're going more parallel, not less.


    30. Re:Speed times Quantity? by Jeremy Erwin · · Score: 2, Insightful

      It's quad core. 24 MB of L3 Cache, and 96 MB of L4 Cache.
      source

    31. Re:Speed times Quantity? by Anonymous Coward · · Score: 1, Interesting

      But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.

      I'm betting the code used on these z196 systems is multi-threaded. Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers.

      Actually I think this mainframe is for getting the last little bit of performance out of thirty year old COBOL code. And the original top notch programmers are long dead.

      A lot of people think mainframes are "faster" than PCs. They aren't. Modern mainframes use microprocessors just like PCs, but generally somewhat slower ones, since mainframes aren't intended to be bleeding-edge. What really makes a mainframe a mainframe is attitude (none of this "reboot and start over") and throughput. Mainframes are optimized for doing lots of concurrent I/O.

      However, recent trends in mainframes have also had some additional considerations. One is virtualization. You can have many - in some cases thousands - of Linux VMs in the system. The other is Java. IBM would dearly love to sell you Java on a mainframe. And considering what a dog WebSphere can be, it almost demands a mainframe.

      Forget the top-notch programmer nonsense. The watchword on software these days is "Git 'er Dun!". And IBM is hardly setting a standard, considering how much cheap offshore labor they use internally.

    32. Re:Speed times Quantity? by Jeremy Erwin · · Score: 2, Informative

      Actually, IBM can upgrade mainframes over the internet. It can also downgrade them, if the lessee so chooses. The extra chips are used for failover.

    33. Re:Speed times Quantity? by Jeremy Erwin · · Score: 1

      That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.

      I doubt that IBM mainframes suffer from the equivalent of engine fires.

    34. Re:Speed times Quantity? by asliarun · · Score: 2, Interesting

      "clockspeed is NOT related to throughput"

      Of course it is. It is not, however, the only factor, and other factors may indeed (and commonly do) outweigh it.

      You took my comment out of context. I was responding to the original post that focused purely on clockspeed as a magic mantra. What you say is only true if you are talking about a clock speed increase within the same microarchitecture, ceteris paribus. Making a blanket claim that we have the fastest CPU because we have clocked it at 5 GHz means nothing. I could overclock a P4 to 5 GHz using exotic cooling and my laptop would still probably beat it in terms of performance.

      I think you underestimate IBM's technical ability. They do have some idea of what they're doing.

      Of course they do. I wasn't talking trash about the chip. The point I was trying to make is that the days of exotic chips and boutique chip manufacturers are ending, at least in the mainstream server space. IBM is just trying to be performance competitive and retain the mainframe server niche. If you look at the trend in servers, commodity servers are becoming more powerful and stable at a much faster rate than niche servers.

      Having said this, performance may not even be the most important consideration in large servers. Other factors like stability, ability to handle failures, platform, etc. are probably much more important. I suspect that sensationalized headlines like this are only a marketing ruse and meant for boasting rights.

      This is not to take anything away from IBM, I'm just making a comment on the overall trend and where this will eventually lead.

      That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.

      Sorry, wrong analogy. I was actually being cautious when I said this since I hadn't really seen any benchmarks. Even on pure performance, I am not too sure if the IBM chip will really trounce the upcoming CPUs from Intel and AMD.

    35. Re:Speed times Quantity? by stonewallred · · Score: 1

      But the ferrari has the focus beat all to hell in the important blowjob per dollar cost part.

    36. Re:Speed times Quantity? by dfghjk · · Score: 1

      "...as Intel switched from ramping up the clock to ramping up the per-cycle efficiency with the Core 2 and their complete architecture overhaul."

      During the P4 era, Intel quietly developed the Pentium M architecture, which was derived from the P3. They did this because they felt the P4 might not be suitable for low power applications. Once they realized the P4 wasn't suitable for ANY application, they switched architectures back to the Pentium M and renamed it "Core" to avoid the taint of the P4. The Core architecture wasn't an overhaul; it was a sleight-of-hand disguising of a Pentium 3 evolution. So you see, Intel didn't switch to per-cycle efficiency, they switched BACK to it. Every generation of architecture prior to the P4 improved per-cycle efficiency. The P4 wasn't the culmination of a certain kind of clock-ramping design, it was an aberration that was eventually corrected.

    37. Re:Speed times Quantity? by asliarun · · Score: 1

      When I think about this some more, I think you are right. The trend towards virtualization is also rapidly increasing the interconnect and memory bandwidth requirements. I'm just guessing - I think that we may end up seeing some drastic architectural shifts in the years to come to solve these issues - perhaps, optical laser interconnects (in silicon).

    38. Re:Speed times Quantity? by LWATCDR · · Score: 4, Informative

      Banks, Credit card companies, hospitals, Insurance companies...
      Cheap clusters are great but they are not always the best tool for the job.
      Very large traditional datasets involving lots of high value transactions, with 5 9s uptime requirements do not tend to scale well to COTS clusters.
      IBM mainframes have uptimes measured in years if not decades.
      They have hot-swappable everything, including CPUs, so you can do upgrades with zero downtime.
      Also you need to take a look at the costs involved. The cost to throw out a working software system that has been used for decades, and then the cost to redesign it to work on a cluster of x86 boxes, will be huge.
      Not to mention the investment in making it fault tolerant and, if it is used in certain markets, the cost of auditing the software.
      Not to mention that ZSystems tend to be really secure. There are just not a lot of exploits on ZSystems.

      When downtime can cost millions of dollars, hardware costs are just not that big of a deal.
      Now if you are starting from scratch then you may save money by going with a cluster but then you may not depending on just how good your programmers are.
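      For scale, here is what "N nines" of availability allows in downtime per year. A quick Python sketch, using a 365.25-day year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime for an availability of 0.999, 0.9999, ... (the given number of nines)."""
    unavailability = 10.0 ** (-nines)
    return unavailability * MINUTES_PER_YEAR


for nines in range(3, 6):
    print(f"{nines} nines: {downtime_minutes_per_year(nines):.1f} minutes/year")
```

      Five nines leaves a little over five minutes a year, which is less than a single routine reboot of many systems.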

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    39. Re:Speed times Quantity? by onionman · · Score: 1

      Sorry to ask but who does IBM see using this?

      Options traders, arbitrage firms, etc.

      There are companies that pay millions of dollars to have their machines in the same room as the stock exchange computers just so that they can have that millisecond trading advantage over their competitors who have to endure network lag. Seems like this is exactly the type of hardware they might want to buy.
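      The physics behind co-location: light in optical fiber travels at roughly c divided by the fiber's refractive index. Assuming a typical index of about 1.47 (an assumption; it varies by fiber), that works out to roughly 5 microseconds per kilometer each way, so about 200 km of fiber already costs a millisecond one way. A minimal sketch that counts propagation delay only:

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_INDEX = 1.47  # typical refractive index for silica fiber (assumption)


def one_way_delay_ms(fiber_km: float) -> float:
    """Propagation delay only; ignores switching, serialization and queuing delays."""
    return fiber_km / (SPEED_OF_LIGHT_KM_S / FIBER_INDEX) * 1000.0


for km in (0.1, 10, 200, 1200):
    print(f"{km:>7} km of fiber: {one_way_delay_ms(km):.4f} ms one way")
```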

    40. Re:Speed times Quantity? by level_headed_midwest · · Score: 1

      Slower cores are also much lower in the binning stack than faster cores, and this also makes them less expensive. Spending hundreds or thousands of dollars less per CPU in a system is a big advantage of slower multicore systems, and you get that advantage right up front at system purchase. It takes years to decades to make up that kind of money in energy savings unless you have order of magnitude differences in power consumption between the two CPUs.
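      A rough payback calculation makes the point. The price difference, power saving and electricity price below are hypothetical, and the optional pue factor (an assumption standing in for cooling and facility overhead) scales up the energy saved:

```python
HOURS_PER_YEAR = 365.25 * 24


def payback_years(price_difference_usd: float, watts_saved: float,
                  usd_per_kwh: float, pue: float = 1.0) -> float:
    """Years of 24x7 operation before energy savings repay a higher purchase price."""
    yearly_kwh_saved = watts_saved / 1000.0 * HOURS_PER_YEAR * pue
    return price_difference_usd / (yearly_kwh_saved * usd_per_kwh)


# Hypothetical: $1000 price gap, 30 W saved, $0.10 per kWh.
print(f"{payback_years(1000, 30, 0.10):.1f} years without facility overhead")
print(f"{payback_years(1000, 30, 0.10, pue=2.0):.1f} years if cooling doubles the saving")
```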

      --
      Just "gittin-r-done," day after day.
    41. Re:Speed times Quantity? by timeOday · · Score: 1

      Is that true? Can you really pay to have your trading computer in the same room as the actual stock exchange computers?

    42. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      Yes, but once you're comparing the same uarch, clock speed is the most democratic optimization you can make. Every program gets something.

    43. Re:Speed times Quantity? by CAIMLAS · · Score: 1

      I know, AMD style "more done per cycle"

      If that's a jab, you should consider that AMD was right. They did get a lot more done per cycle, and it was just advertising - it was, and still is, trivial to identify the CPU clock (despite fear mongering at the time saying we'd have such things hidden from us).

      You may recall Intel doing the exact same thing, once they caught up technologically. Now, it's easier to find that info for AMD CPU sales than it is for Intel.

      but isn't a quad core 3.1 Ghz per chip with 20% logistic overhead faster?

      It doesn't matter how fast it is if the software and operating systems you're running aren't able to make use of more than one core. I don't know for certain, but is it possible the Z-series stuff does not handle multicore CPUs? They're not exactly a 'wide distribution' market.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    44. Re:Speed times Quantity? by LWATCDR · · Score: 4, Informative

      "They say it's an old CISC architecture. This is probably the sort of system that runs horribly outdated and un-updatable code, like the tax system."
      You mean like Windows?
      The X86 is also an old CISC architecture.

      Actually the Power line is RISC anyway. When it is used in a ZMachine the old style 360/370/390 CISC ISA is translated to RISC and then executed.
      Before you go "ew", that is what modern x86 chips do as well, and ARM when using the Thumb instruction set. The ZSystem ISA is so high-end it is almost a high-level language, so the translation doesn't really affect performance much at all. Also, that old CISC architecture is much better than the mess that we have on x86.
      I am not sure about how IBM does the translation. On the System/38, AS/400 and System i, the translation was done during the IPL, aka Initial Program Load. On the Zs it may be done as a JIT, but I am not sure.
      Honestly, I love the idea and wish that Linux would adopt it. You could then have one binary that would work on any Linux system on any CPU.
      The AS400 way kept a native binary copy along with the TIMI copy. When the program was run the first time it would translate the TIMI copy into the native segment. Yes the first time you ran the program it might take a bit to start but after that it would run at full speed and start fast. Of course you could add a binary segment when you first released the code for the ISA of your choice.
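      The translate-once, keep-the-native-copy idea can be sketched in a few lines. This is a hypothetical illustration only: Python's built-in compile() stands in for translating the portable (TIMI-style) form into native code, and the TranslationCache class and program names are made up. It is not how OS/400 actually implements it.

```python
class TranslationCache:
    """Hypothetical sketch: keep a portable form, translate it on first use, reuse the result."""

    def __init__(self) -> None:
        self._native = {}       # program name -> translated form (a Python code object here)
        self.translations = 0   # how many times the translation cost was paid

    def run(self, name: str, portable_source: str, **inputs):
        code = self._native.get(name)
        if code is None:
            # First run: pay the translation cost once and keep the result.
            code = compile(portable_source, f"<{name}>", "eval")
            self._native[name] = code
            self.translations += 1
        # Later runs skip straight to executing the stored translation.
        return eval(code, {"__builtins__": {}}, inputs)


cache = TranslationCache()
print(cache.run("interest", "principal * rate", principal=1000, rate=0.05))
print(cache.run("interest", "principal * rate", principal=2000, rate=0.05))
print("translations:", cache.translations)
```

      The first call is slow and every later call is fast, which is the startup behavior described above.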

      All in all those old Mainframes and Minis had a lot of brilliant tech we still don't have today on our PCs.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    45. Re:Speed times Quantity? by root_42 · · Score: 2, Informative

      It later mentions using 128Mbyte just for level 1 cache, so that would be around 1024 cores.

      WP has the answer: http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)

      Four cores, 128 KByte L1 data cache, 64 KByte instruction cache.

      --
      [--- PGP key and more on http://www.root42.de ---]
    46. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      meh, kathleen fent will suck your dick whether you're driving a ferrari or a focus. I think I'd prefer the focus, in case I cum all over her face and end up shooting jizz all over the interior.

    47. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      The money you save by not buying a ferrari could buy you lots of blowjobs.

    48. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      Banks, Credit card companies, hospitals, Insurance companies...
      Cheap clusters are great but they are not always the best tool for the job.
      Very large traditional datasets involving lots of high value transactions, with 5 9s uptime requirements do not tend to scale well to COTS clusters.
      IBM mainframes have uptimes measured in years if not decades.
      They have hot-swappable everything, including CPUs, so you can do upgrades with zero downtime.
      Also you need to take a look at the costs involved. The cost to throw out a working software system that has been used for decades, and then the cost to redesign it to work on a cluster of x86 boxes, will be huge.
      Not to mention the investment in making it fault tolerant and, if it is used in certain markets, the cost of auditing the software.
      Not to mention that ZSystems tend to be really secure. There are just not a lot of exploits on ZSystems.

      When downtime can cost millions of dollars, hardware costs are just not that big of a deal.
      Now if you are starting from scratch then you may save money by going with a cluster but then you may not depending on just how good your programmers are.

      Why is this modded funny? This is a serious comment.

    49. Re:Speed times Quantity? by mikechant · · Score: 2, Informative

      IBM mainframes have uptimes measured in years if not decades.

      Not in my experience. I can think of at least two factors that require more frequent IPLs.

      1/ Switch back to 'normal' time from DST (e.g. BST to GMT in the UK). Although it's possible to put the mainframe clock forward dynamically (well, change the local time offset actually) successfully on many (if not all) systems, in practice most systems will not cope with the clock going backwards (i.e. the 'same hour' happening again) even though the OS supports it. Generally you have to shut the system down for an hour, then IPL. You could probably get away with shutting down all batch initiators and CICS/DB etc. address spaces and then bringing them up again after waiting an hour, but it's typically less risky to follow the established IPL procedure, and this IPL generally obviates the need for a separate IPL for 2/; regardless, the machine is effectively down for more than an hour.
      It may be possible to achieve continuous operation while moving the time offset backwards with some limited subsets of software, but I haven't seen it, and although running on a fixed time and effectively ignoring DST will work, this creates problems of its own and doesn't solve 2/.
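      The reason the repeated hour is dangerous: anything that stores or compares local wall-clock times sees time run backwards. A minimal Python sketch using fixed UTC offsets for BST and GMT (the 2010 UK change happened at 01:00 UTC on 31 October) shows a later event getting an earlier local timestamp:

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")
GMT = timezone(timedelta(hours=0), "GMT")

# Two events 20 minutes apart in real time, either side of the 01:00 UTC change on 31 Oct 2010.
first_utc = datetime(2010, 10, 31, 0, 50, tzinfo=timezone.utc)
second_utc = datetime(2010, 10, 31, 1, 10, tzinfo=timezone.utc)

# Wall-clock stamps as a system on UK local time would record them (offset then discarded).
first_local = first_utc.astimezone(BST).replace(tzinfo=None)    # 01:50 local, still BST
second_local = second_utc.astimezone(GMT).replace(tzinfo=None)  # 01:10 local, now GMT

print(first_local, second_local)
print("later event sorts first:", second_local < first_local)
```

      Any log, journal or key that assumes local timestamps only increase will misorder these two events, which is why shops either wait out the hour or run on GMT all year.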

      2/ 'CSA creep' - tiny bits of orphaned storage (often left by non-IBM-supplied products) eventually fill up restricted-size critical storage areas such as the CSA. This could lead to an unscheduled IPL, so typically an IPL every (e.g.) 6 months is advisable.
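      The arithmetic behind a scheduled IPL interval is just headroom divided by leak rate. The 2 MB of free space and 12 KB/day leak below are hypothetical numbers, not real CSA sizes:

```python
def days_until_exhausted(free_bytes: int, leak_bytes_per_day: float) -> float:
    """How long a fixed-size area lasts if orphaned storage accumulates at a steady rate."""
    if leak_bytes_per_day <= 0:
        return float("inf")
    return free_bytes / leak_bytes_per_day


days = days_until_exhausted(2 * 1024 * 1024, 12 * 1024)
print(f"area full after about {days:.0f} days ({days / 30.44:.1f} months)")
```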

      Not to say that specific systems can't run longer than this (e.g., run on GMT or equivalent at all times, do not tolerate any product which leaks memory in critical areas at all), but I think that's pretty unusual.

    50. Re:Speed times Quantity? by Anonymous Coward · · Score: 3, Insightful

      Mainframes are engineered fundamentally around two things: Reliability and IOPS.

      When it comes to basic tasks, it isn't often that a large server ends up CPU bound (especially database servers). Instead what usually becomes the bottleneck is I/O and RAM.

      Reliability is where mainframes take the cake. Some use multiple CPUs to execute the same instructions to make sure the output is correct. Mainframes have redundancy in virtually everything. Because they have been doing VM since the dawn of computing, an LPAR might occasionally need to be kicked, but a full IPL of a mainframe is exceedingly rare.

      IBM System z machines are on one end of the spectrum. They cost an arm and a leg, but if someone has a lot of 1U servers or even blades, it might be better to just dump the rackfuls of those machines and go with some big iron and LPARs. The TCO of a machine isn't just the price tag of the box, nor the licenses or service fees. One factor people forget is how many admins are needed to keep things going. Some companies are far better off with a mainframe and some Linux admins as opposed to rackfuls of Windows machines that require an army of MS-ITPs to keep running.

      Believe it or not, mainframes have advanced along with the times. They have always been reliable and boring. COBOL is long gone except for really old legacy stuff. Instead, you have Oracle, WebSphere, JBoss, and many other behind-the-scenes applications which are not flashy, but are business critical.

      Mainframes also come with their own viewpoint. On one hand, a company can buy enough x86 servers with clustering, redundancy, failover capability, and other items to raise the effective MTBF of the service to an acceptable level. On the other hand, a company can pay the ticket to the System z series and have one machine with an extremely high MTBF and less need for an HA cluster. Even with all the clustering and redundancy of x86 machines, there is only so much lipstick you can put on a pig before it turns into an oinking ball of wax, so if someone wants to go the x86 route, it will require a lot more employees to keep things running.
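      The MTBF trade-off can be put into numbers with the textbook steady-state availability formula, A = MTBF / (MTBF + MTTR), plus the usual idealisation that redundant nodes fail independently. This is a hedged sketch: the MTBF and MTTR figures are made-up illustrations, not measurements of any real x86 or System z machine, and real clusters rarely achieve fully independent failures (shared power, shared software bugs, operator error).

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one unit: fraction of time it is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def parallel_availability(unit_availability: float, n: int) -> float:
    """Availability of n redundant units, assuming independent failures
    and that the service survives as long as at least one unit is up."""
    return 1.0 - (1.0 - unit_availability) ** n


def downtime_minutes_per_year(a: float) -> float:
    return (1.0 - a) * 365 * 24 * 60


# Hypothetical figures for illustration only.
commodity = availability(mtbf_hours=8_760, mttr_hours=8)    # fails ~once a year, 8 h to fix
big_iron = availability(mtbf_hours=876_000, mttr_hours=8)   # fails ~once a century
cluster = parallel_availability(commodity, n=2)

for name, a in [("single commodity box", commodity),
                ("two-node cluster", cluster),
                ("single high-MTBF box", big_iron)]:
    print(f"{name}: {a:.6f} -> {downtime_minutes_per_year(a):.1f} min/yr")
```

      The cluster's numbers only hold if the failover itself works and failures really are independent; the admin effort to make that true is the staffing cost the comment describes.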

    51. Re:Speed times Quantity? by gorzek · · Score: 3, Interesting

      Yeah, it's actually kind of funny how today's Intel desktop processors trace their lineage to the Pentium M, which was a mobile chip. When the Pentium 4 came around, the Pentium Pro (Pentium II, Pentium III) architecture was pretty much relegated to the mobile market, while the Pentium 4 represented their desktop line. As you said, they ran into heat (and power) issues with the Pentium 4s and basically had no more room for expansion there. They went back to the Pentium M, which was doing pretty nicely in the notebook space, and since it was low-power and efficient, it became the basis for their future desktop CPUs, the Core line in particular. They just stopped playing up the clock speed, because that architecture's clock speeds were substantially lower than the Pentium 4's, despite being able to do more work. I read once that a Pentium M could do about 40% more work than a Pentium 4 at the same clock, so in essence a 2GHz Pentium M was about as powerful as a 2.8 GHz P4.
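      The "more work at a lower clock" point boils down to throughput being roughly clock times work per clock (IPC). A minimal sketch, taking the quoted 40% per-clock advantage as an assumption rather than a measured figure; the function name is hypothetical.

```python
def equivalent_clock_ghz(clock_ghz: float, relative_ipc: float) -> float:
    """Clock a reference core (IPC = 1.0) would need to match a core that
    runs at clock_ghz and does relative_ipc times as much work per cycle."""
    return clock_ghz * relative_ipc


# Assumption from the comment: a Pentium M does ~40% more work per clock than a Pentium 4.
pm_as_p4 = equivalent_clock_ghz(2.0, 1.4)
print(f"2.0 GHz Pentium M ~ {pm_as_p4:.1f} GHz Pentium 4")
```

      By the same formula, matching a 3.2 GHz Pentium 4 at 2.0 GHz would take roughly a 60% per-clock advantage.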

      Switching everything over to the low-power and parallel-friendly Pentium M line is probably one of the smartest things Intel ever did. They would've dug their own grave had they stuck with building on Pentium 4 to the bitter end.

    52. Re:Speed times Quantity? by hitmark · · Score: 1

      Ah, so they have embraced the net now. How nice.

      --
      comment first, facts later. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm
    53. Re:Speed times Quantity? by Dadoo · · Score: 1

      Calling it the fastest CPU is extremely misleading.

      I doubt it, in this case. Last time I checked (which, admittedly, was a few years ago), Power CPUs were capable of doing slightly less than twice the work of an Intel CPU, at the same clock speed. If that still holds true, that 5.2GHz Power CPU is roughly the equivalent of a 10GHz Intel CPU. Of course, I measured this with my own (probably subjective) benchmarks, so your results may vary.

      --
      Sit, Ubuntu, sit. Good dog.
    54. Re:Speed times Quantity? by Vryl · · Score: 1

      Pretty sure the golden screwdriver got replaced with a modem decades ago.

    55. Re:Speed times Quantity? by mlts · · Score: 1

      I'm sure an IBM rep would be more than happy to show you numbers comparing a 4.25 GHz POWER7 against equivalent x86/amd64 performance stats, even when the POWER7 is running in TurboCore mode, where half of its cores are shut off and the active cores take over the caches of the powered-down cores and can run at a somewhat higher clock speed.

      IBM CPUs are not slow by any means.

    56. Re:Speed times Quantity? by InfiniteWisdom · · Score: 1

      I don't know about the same room, but in the same datacenter, certainly.

    57. Re:Speed times Quantity? by knarf · · Score: 2, Informative

      Clock rate is no longer the key variable in comparing processors, unless they are of the same microarchitecture.

      Clock rate has *never* been the key variable in comparing processors. Even back in the heady days of 1 MHz 6502/6510 vs 4 MHz Z80 the comparison was useless - the 6510 does way more per cycle than the Z80 and ends up being comparable speed-wise.
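      A rough illustration of the per-cycle argument, using one instruction on each CPU: loading an immediate value into the accumulator takes 2 clock cycles on the 6502 (`LDA #n`; the 6510 in the C64 uses the same core timings) and 7 T-states on the Z80 (`LD A,n`). Single-instruction timings are only a sketch; real programs mix instructions with very different costs on both chips.

```python
def instruction_time_us(cycles: int, clock_mhz: float) -> float:
    """Time in microseconds for an instruction taking `cycles` clock cycles."""
    return cycles / clock_mhz


t_6502 = instruction_time_us(cycles=2, clock_mhz=1.0)  # LDA #n on a 1 MHz 6502
t_z80 = instruction_time_us(cycles=7, clock_mhz=4.0)   # LD A,n on a 4 MHz Z80
print(f"6502: {t_6502:.2f} us, Z80: {t_z80:.2f} us")
```

      Despite a 4x clock difference, the two land within about 15% of each other on this instruction.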

      --
      --frank[at]unternet.org
    58. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      There's also the distance between interconnects. At speeds where we are now, you are severely limited. Past that range you find that a whole clock cycle (crest and trough) has passed before the impulse can be returned.
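      To put numbers on the distance argument: the distance a signal can cover in one clock period is v / f. This sketch uses the speed of light in vacuum as an upper bound and, as a labelled assumption, a velocity factor of 0.5 for real wiring. Actual wires vary widely, and on-chip RC delay is usually the tighter limit.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def distance_per_cycle_cm(clock_hz: float, velocity_factor: float = 1.0) -> float:
    """Distance (cm) a signal travelling at velocity_factor * c covers in one clock period."""
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz * 100


clock = 5.2e9  # the z196's 5.2 GHz
print(f"vacuum bound: {distance_per_cycle_cm(clock):.2f} cm per cycle")
print(f"assumed 0.5c wiring: {distance_per_cycle_cm(clock, 0.5):.2f} cm per cycle")
# A request/response round trip must fit both directions into that budget.
print(f"round-trip reach at 0.5c: {distance_per_cycle_cm(clock, 0.5) / 2:.2f} cm")
```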

    59. Re:Speed times Quantity? by Locke2005 · · Score: 1

      isn't a quad core 3.1 GHz per chip with 20% logistic overhead faster?

      Yes, for problem sets that are well suited to parallel processing. No, for problem sets where every computation is dependent on the result of a previous computation.
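      Whether the quad-core part wins depends on how much of the job can run in parallel. A minimal Amdahl's-law-style sketch with the assumptions spelled out: runtime scales inversely with clock, the parallel fraction `p` splits perfectly across cores, and the quoted 20% "logistic overhead" is applied only to the parallel part. The function and its parameters are hypothetical.

```python
def runtime(work: float, ghz: float, cores: int, p: float, overhead: float = 0.0) -> float:
    """Relative runtime of `work` units when a fraction p is parallel.

    The serial part runs on one core; the parallel part is split across
    `cores` and inflated by `overhead` (e.g. 0.2 for 20% coordination cost).
    """
    serial = work * (1 - p) / ghz
    parallel = work * p / (ghz * cores) * (1 + overhead)
    return serial + parallel


for p in (0.0, 0.5, 0.9, 1.0):
    single = runtime(1.0, ghz=5.2, cores=1, p=p)
    quad = runtime(1.0, ghz=3.1, cores=4, p=p, overhead=0.2)
    winner = "5.2 GHz single core" if single < quad else "3.1 GHz quad core"
    print(f"p={p:.1f}: single={single:.3f}, quad={quad:.3f} -> {winner}")
```

      Under these assumptions the crossover sits near p of about 0.58: below it the single fast core wins, above it the quad core does.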

      --
      I've abandoned my search for truth; now I'm just looking for some useful delusions.
    60. Re:Speed times Quantity? by David+Greene · · Score: 1

      In some fields, the solutions will have to be in software. With some massively parallel codes, even a 2x speedup in interconnect speed won't help that much. It's analogous to trying to speed up an exponential algorithm by increasing clock speed: in the end, it just doesn't matter. Faster interconnects will help a lot of things, but we're going to have to bite the bullet and learn how to program again.

      --

    61. Re:Speed times Quantity? by asliarun · · Score: 1

      That's exactly what I was trying to say as well. In the last 4-5 years, Intel and AMD have made dramatic improvements in CPU throughput and overall system performance, especially in the server space. The current Nehalem and especially the upcoming Sandy Bridge architectures give you a performance jump of at least 200%-300% in most workloads over the older x86 server chips. Intel has basically been doubling its performance every 2 years, and the performance jump in Nehalem is all the more dramatic because Nehalem was specifically built to improve server performance (new point-to-point interconnect architecture, big improvements in floating-point and integer processing).

      Just look at the virtualization performance of a Nehalem 2-Way or 4-Way server - you can basically retire 4 older servers and just install one of these.

      Look, I'm not trying to be a fanboy here, just pointing out the fact that IBM may not have sufficiently caught up with the rate of improvement that Intel and AMD have been making in server CPUs. Heck, they're even copying over many of the RAS features to improve failure detection and recovery.

    62. Re:Speed times Quantity? by QuantumBeep · · Score: 1

      In technical support, I've talked to, oh, dozens of geezers who talk about how they programmed COBOL in the 80s. They're addled, harebrained Packard-Bell daredevils, to a one.

    63. Re:Speed times Quantity? by LWATCDR · · Score: 2, Informative

      It has been a while but really?
      I have never seen a mainframe that didn't use Zulu time. Also, in the shop I worked in, all software was quality-verified. One machine was at the five-year uptime mark when I left, but it was a non-commercial system.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    64. Re:Speed times Quantity? by InfiniteWisdom · · Score: 2

      According to the Passmark benchmark, a 3.20 GHz Pentium 4 scores 524, compared to 10221 for a 3.20 GHz Core i7 970 six-core CPU. That works out to about 3.25 times faster per core than the Pentium 4. While short of 4-5, the GP is not as far off the mark as your ridicule would suggest.
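      The per-core figure is plain division: the i7 score spread over its six cores, relative to the Pentium 4 score. A sketch using the two Passmark scores quoted in the comment, treating the score as scaling linearly with core count (an approximation):

```python
p4_score = 524        # 3.20 GHz Pentium 4, as quoted
i7_970_score = 10221  # 3.20 GHz Core i7 970, as quoted
i7_cores = 6

whole_chip_ratio = i7_970_score / p4_score
per_core_ratio = whole_chip_ratio / i7_cores
print(f"whole chip: {whole_chip_ratio:.1f}x, per core: {per_core_ratio:.2f}x")
```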

      I actually think YOU (and the cretin who modded you insightful) fail.

    65. Re:Speed times Quantity? by avandesande · · Score: 1

      IBM is not stupid; obviously there are customers that want to buy these systems, and IBM is the only company in the world with the ability to make them.

      --
      love is just extroverted narcissism
    66. Re:Speed times Quantity? by Anonymous Coward · · Score: 1, Interesting

      There is a bank in Canada that had zero (as in none) downtime, scheduled or not, in twelve years. That includes hardware upgrades, software upgrades, and application upgrades. This is what a mainframe is all about.

    67. Re:Speed times Quantity? by peteinok · · Score: 1

      Talked to an IBM guy today about this. As far as purchase price, think 7 figures (before company-specific incentives).

    68. Re:Speed times Quantity? by QuantumBeep · · Score: 1

      But it is that much faster.

    69. Re:Speed times Quantity? by m50d · · Score: 1

      I always wonder: for (say) pure arithmetic workloads, the kind that the fancy do-more-per-tick stuff isn't going to help with, are Pentium 4s going to be the fastest CPUs you can get forever?

      --
      I am trolling
    70. Re:Speed times Quantity? by sexconker · · Score: 1

      Calling it the fastest CPU is extremely misleading.

      I doubt it, in this case. Last time I checked (which, admittedly, was a few years ago), Power CPUs were capable of doing slightly less than twice the work of an Intel CPU, at the same clock speed. If that still holds true, that 5.2GHz Power CPU is roughly the equivalent of a 10GHz Intel CPU. Of course, I measured this with my own (probably subjective) benchmarks, so your results may vary.

      Sir, you seem to have drunk deep of the Kool-Aid.
      But it's okay. You can stop now. Even Steve Jobs himself admitted that PPC was shit for the desktop and switched to Intel.

      The PPC chips were always inferior, and every benchmark showing otherwise was cooked.

    71. Re:Speed times Quantity? by Sheik+Yerbouti · · Score: 1

      You, sir, are dead wrong; you could not be more so. See the Athlon 64 vs. the Pentium 4 for a good example of why you are so horribly wrong. Or, for an even better example, a 2.4 GHz Core 2 Duo vs. a 3.8 GHz Pentium 4. It's been a while since the industry figured out that a better CPU architecture, with things like multiple execution units, beats a high-clocked, long-pipeline architecture. This is a geek site; why and how did this ever get modded up?

    72. Re:Speed times Quantity? by sexconker · · Score: 2, Informative

      The word you are looking for is "sleight".
      Sleight of hand.

    73. Re:Speed times Quantity? by TheRaven64 · · Score: 2, Informative

      The X86 is also an old CISC architecture.

      Actually x86 is a new CISC architecture. The System/360 architecture predates it by over two decades. x86 was about the last CISC ISA to be developed outside of a few tiny niches.

      Actually the Power line is RISC anyway. When it is used in a ZMachine the old style 360/370/390 CISC ISA is translated to RISC and then executed

      Umm, no. POWER is RISC (well, RISC purists would say that's stretching the point), but POWER and System/z are completely unrelated. The POWER6 and z10, and POWER7 and this chip, were designed by cooperating teams, so they share some execution units, but they are very different architectures. This is not a POWER CPU running a System/360 emulator, it's a machine with a CPU that happens to have a few pipelines in common with a POWER CPU.

      --
      I am TheRaven on Soylent News
    74. Re:Speed times Quantity? by LWATCDR · · Score: 1

      Not really. You must step out of the PC mindset. Imagine a bank with thousands of ATMs located all over the planet.
      Or a telco with a million users, with things like billing by the minute.

      We are talking about a huge number of transactions. These are high-value transactions, not things like tweets or even Google searches.
      You will want things like auditing, rollback, and commit on those transactions, which you just don't get with NoSQL databases.
      For that type of a system a Mainframe is a very good solution.
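      A small illustration of the commit/rollback guarantee, using Python's built-in `sqlite3` module purely as a stand-in for a mainframe database. The table, account names, and `transfer` helper are hypothetical. Used as a context manager, a `sqlite3.Connection` commits the transaction if the block succeeds and rolls it back if an exception escapes, so a transfer that would overdraw an account leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst atomically; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))   # succeeds
print(transfer(conn, "bob", "alice", 500), balances(conn))  # CHECK fails; the credit to alice is undone
```

      Note that the credit is applied before the debit fails; it is the rollback, not statement ordering, that keeps the books consistent.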

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    75. Re:Speed times Quantity? by mdf356 · · Score: 1

      Didn't we retire the Ghz wars 5 years ago?

      AFAIK, mainframe users are usually charged by the cycle. This is one reason co-processors are hot in mainframes -- the licensing and charges are for the main compute CPUs, so farming off work to a dedicated Java CPU or dedicated Crypto CPU doesn't get charged at the same rate.

      So more cycles per second on the main compute CPUs means the owner charges more per unit time, assuming they maintain the utilization. That's profit for the mainframe owner, and the users benefit too because their jobs take less time to run.
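      The arithmetic behind that billing argument, as a hedged sketch: if a job is billed per cycle, its bill does not change with a faster clock, but the owner can sell more cycles per hour at the same utilization and the job finishes sooner. The clocks (an assumed 4.4 GHz predecessor, the z10's clock, vs. 5.2 GHz), the 90% utilization, and the job size are illustrative assumptions; no real pricing is modelled.

```python
def cycles_sold_per_hour(clock_ghz: float, cores: int, utilization: float) -> float:
    return clock_ghz * 1e9 * 3600 * cores * utilization


def job_hours(job_cycles: float, clock_ghz: float) -> float:
    """Wall-clock hours for a single-threaded job of job_cycles on one core."""
    return job_cycles / (clock_ghz * 1e9 * 3600)


old, new = 4.4, 5.2  # assumed clocks in GHz
gain = cycles_sold_per_hour(new, 1, 0.9) / cycles_sold_per_hour(old, 1, 0.9)
job = 1e13  # hypothetical job size in cycles: same bill on either machine
print(f"owner sells {gain:.2f}x the cycles per hour")
print(f"job time: {job_hours(job, old):.2f} h -> {job_hours(job, new):.2f} h")
```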

      --
      Terrorist, bomb, al Qaeda, nuclear, yellowcake, kill, assassinate. Carnivore is dead... long live Echelon.
    76. Re:Speed times Quantity? by mattholimeau · · Score: 1

      That's like saying a Ferrari is a poor performance car because it can't compete against a Ford Focus on cost-per-max-speed or miles-per-gallon.

      Well, it's like saying a Ferrari won't sell well (i.e. compete) because it can't compete on cost-per-max-speed or miles-per-gallon. Certainly, that's exactly what he's saying; you hit the analogy on the money. Poor performance, though? You're putting words in his mouth, in fact the opposite of the ones that came out. Clearly the Ferrari can go faster and accelerate faster, but you don't need that to get to work.

      Mod parent down, please.

    77. Re:Speed times Quantity? by Dadoo · · Score: 1

      Sir, you seem to have drunk deep of the Kool-Aid.

      Umm... no. One of our main data servers is an AIX (POWER5) machine. It's old enough that we'll be replacing it soon. Its replacement is an Intel-based HP server. (It's so new, we haven't even finished testing it yet.) Why? Because in my experience, IBM doesn't want to bother with customers that don't have a lot of money to spend. They just don't cut it anymore in the lower-end (less than $100,000) server market.

      Even Steve Jobs himself admitted that PPC was shit for the desktop and switched to Intel.

      First, we're talking about server CPUs here, not desktops. Second, did he make that decision based on raw performance, or price/performance ratio? The desktop CPU market is much more sensitive to that kind of stuff.

      --
      Sit, Ubuntu, sit. Good dog.
    78. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      Electron saturation at a specific transistor size determines thermal capacities, which limit die volume; if you make a dense chip that draws a lot of watts and is big, it will warp and stop working. There's a limit to the number of transistors you can fit on a chip.

      For every logical core you add to a system, you have to dedicate transistors towards proper path execution and bus topology.

      When you increase MHz, you're stressing the capacitance of the processor more; things don't want to get rid of their electricity fast enough, and until you build a smaller die with better materials, they won't.
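      The heat problem with raising the clock is usually summarised by the standard CMOS dynamic-power approximation P = alpha * C * V^2 * f (activity factor, switched capacitance, supply voltage, frequency). A sketch with made-up, unitless inputs: higher frequency alone scales power linearly, but because higher clocks typically also need a higher supply voltage (the 20% bump here is an assumption), power grows much faster than clock speed.

```python
def dynamic_power(activity: float, capacitance: float, voltage: float, freq: float) -> float:
    """CMOS dynamic (switching) power: P = alpha * C * V^2 * f. Leakage is ignored."""
    return activity * capacitance * voltage ** 2 * freq


base = dynamic_power(0.2, 1.0, 1.0, 1.0)
double_f = dynamic_power(0.2, 1.0, 1.0, 2.0)         # same voltage, 2x clock
double_f_more_v = dynamic_power(0.2, 1.0, 1.2, 2.0)  # assumed 20% more voltage to reach 2x clock
print(f"{double_f / base:.2f}x power, {double_f_more_v / base:.2f}x power")
```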

      Two 1.6 GHz processors will be slower than one 3.2 GHz processor WITH THE SAME TRANSISTOR COUNT BETWEEN CHIPS. This is the reason Intel pushed clock speeds for so many years. It is also why, when comparing two seemingly similar chips, I check both the transistor count and the speed. I don't know why more people don't do this, but it's a great way to tell how fast something will be.

      Now, the reason AMD and Intel push multicore systems is that they can sell defective quad-core parts as triple-, dual-, and single-core units without anyone knowing any better, and get a higher dollar-to-wafer yield than back in the day, when they'd cut a bunch of chips and the good ones were clocked higher than the cruddy ones. That's important when your sales are getting leaner. The other advantage is cross-compatibility: if you build a bus into the processor, less has to go into the motherboard to make it work, so you can keep the processor bus more or less the same and have it work with 3 or 4 generations of kit, as with AMD's AM1/AM2/AM2+/AM3 sockets.

       

    79. Re:Speed times Quantity? by thoriumbr · · Score: 1

      In the past mainframes were used to do single task long running batch jobs. Today, they run Linux.

    80. Re:Speed times Quantity? by QuantumRiff · · Score: 1

      The Intel Sandy Bridge or Nehalem CPU for example may be running its 4 cores at a clockspeed of 3.2GHz but overall, each core in the CPU is easily 4-5 times faster than a 3.2GHz Pentium4 core.

      Sorry, you lost me at this point. Now, if you had said 40% faster, then maybe (although that would be a stretch), but not 400-500% faster, core for core. The CPU architecture is not that advanced. Sure, memory is faster, drives are a bit faster, etc., but the processor itself is still not much faster... Now, 4 to 5 times more power-efficient per operation, maybe...

      --

      What are we going to do tonight Brain?
    81. Re:Speed times Quantity? by Guy+Harris · · Score: 1

      Actually the Power line is RISC anyway. When it is used in a ZMachine

      So where are Power processors used in System z machines (except as I/O processors or blade add-ons)?

      The ZSystem ISA is so high end it is almost a high level language so the translation doesn't really effect performance much at all.

      OK, you're confusing System z, which is ultimately a descendant of the old System/360, and whose instruction set is a 16-GPR CISC instruction set, but not "so high end it is almost a high level language", and is directly executed by hardware plus microcode or millicode (or whatever IBM calls the millicode/PALcode/whatever in the newer processors), with System i, which is ultimately a descendant of the old System/38, and whose "instruction set" (TIMI) is an uber-CISCy instruction set that's not executed directly, it's translated to the "real" native instruction set, which used to be a somewhat 360-ish CISC instruction set (IMPI) and is now an extended form of Power (Power/AS).

    82. Re:Speed times Quantity? by samwichse · · Score: 1

      Wow, you could run Windows 95 just out of the cache on this! Imagine how awesomely fast that would be!

      *ahem*

      Sam

    83. Re:Speed times Quantity? by ErroneousBee · · Score: 1

      There are just not a lot of exploits on Zsystems.

      There are plenty of exploits. It's just that the risks are enormous: the systems are monitored, the security logs are audited, and the guys doing the auditing and monitoring don't take prisoners.

      --
      **TODO** Steal someone elses sig.
    84. Re:Speed times Quantity? by CanadianRealist · · Score: 2, Funny

      Not saying I'd recommend it, but if that's the measure that you want to use then I'd say a cheap clunker could probably beat the both of them*.

      *Take the rest of the cash you would have spent buying either one and spend that on blowjobs.

    85. Re:Speed times Quantity? by LWATCDR · · Score: 1

      I thought that they had moved the System/Z to RISC emulation a while ago. I know they did with the old AS/400 and thought they had with the 360/370/390. And I sure wouldn't call the x86 new. That is why I said it was also an old CISC system. Also, the gap was only a bit over one decade: about 14 years (System/360 in 1964, the 8086 in 1978). Also, the 8086 was built off the 8080 and 8085 ISA, so it is pretty old and crusty. And let's face it, the 360 ISA is considered one of those milestone ISAs, like the VAX and ARM, unlike the x86, which is considered more of a millstone than a milestone.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    86. Re:Speed times Quantity? by sexconker · · Score: 1

      Again, you're an idiot.

      The top of the line x86 chips have always spanked PPC chips. The only way to get different results was to intentionally gimp benchmarks and compilers.

    87. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      "IBM defines the z196 as one of the few remaining CISC chips, which allows for bulky, large programs that can require much more memory to execute in than RISC chips, including the PowerPC and ARM embeddded processors, among others."

      The above, from TFA, makes the whole article suspect. How do such technological nitwits get to write tech articles? RISC v. CISC has nothing to do with the complexity of the code that can be executed, or the amount of memory accessible. Large programs that can require much more memory to execute [in] than RISC ....

      Ahhhhhhhhhhhhhhhhhhh..

      Morons.

    88. Re:Speed times Quantity? by Guy+Harris · · Score: 1

      I thought that they had moved the System/Z to RISC emulation a while ago.

      Nope. The processors directly execute the simpler instructions; the more complicated ones trap to millicode, which are special subroutines that might get their own register set to use and have some special millicode-only instructions they can run as well. (People familiar with Alpha might recognize this....)

      I know they did with the old AS/400

      What they did with the AS/400 was move from translating TIMI->IMPI to translating TIMI->PowerAS, so they didn't move from directly executing TIMI to RISC emulation, they moved from translating TIMI to a CISC "native" instruction set to a PowerAS RISC "native" instruction set.

    89. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      product of crap programmers Sorry to ask but who does IBM see using this? At the price point and data sets that need sorting? - cheaper clusters or more expensive faster unique chips depending on math?

      So which do you plan to do first with your cheap PC, replace it or reboot it?

    90. Re:Speed times Quantity? by commodore64_love · · Score: 1

      >>>There are millions (and I mean millions) of cable subscribers in apartment building that cannot have 'free' TV of any quality. Rabbit ears != decent reception

      Boy, you are soooo wrong. Rabbit ears plus a UHF loop work just fine within a 20-mile radius of the station, as my parents can attest. That combo will provide a near-perfect digital image. For apartments located further away, you can get a large antenna like the CM4228 and set it next to the TV (or on the balcony). That's what I did in my old apartment, and I got stations up to 60 miles away.

      --
      "I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
    91. Re:Speed times Quantity? by Daniel+Phillips · · Score: 1

      Yeah, it's actually kind of funny how today's Intel desktop processors actually trace their lineage to the Pentium M, which was a mobile chip

      And the Pentium M traces its lineage to the Pentium III, which Intel tried to abandon in favor of the Pentium 4 with NetBurst[tm].

      --
      Have you got your LWN subscription yet?
    92. Re:Speed times Quantity? by Anonymous Coward · · Score: 0

      z196 != POWER7
      Half their features are built exactly the same, but the other half is _SO_ different that I can't count them all.
      POWER is number-crunching; z is throughput.

    93. Re:Speed times Quantity? by cheesybagel · · Score: 1

      The original Pentium 4 (Willamette) was quite bad; the second iteration of the architecture (Northwood) was better. The Pentium 4 core had many issues. The pipeline was very long, so if you mispredicted a branch or had a cache miss, the processor would stall for a long time. This meant the IPC (instructions per cycle) of a Pentium 4 core was lower than that of the Pentium III or the Pentium M.
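      The cost of a long pipeline can be sketched with a simple CPI model: effective CPI = base CPI + (branch fraction x misprediction rate x flush penalty). All numbers here (base CPI of 1.0, 20% branches, 5% mispredicted, 10- vs 20-cycle penalties, 2.0 vs 3.0 GHz) are illustrative assumptions, not measured Pentium III or Pentium 4 figures, and the model ignores cache misses entirely.

```python
def effective_cpi(base_cpi: float, branch_frac: float, mispredict_rate: float, penalty: int) -> float:
    """Average cycles per instruction once branch-misprediction flushes are included."""
    return base_cpi + branch_frac * mispredict_rate * penalty


def mips(clock_ghz: float, cpi: float) -> float:
    """Millions of instructions per second = clock / CPI."""
    return clock_ghz * 1000 / cpi


short_pipe = effective_cpi(1.0, 0.20, 0.05, penalty=10)  # shorter pipeline, lower clock
long_pipe = effective_cpi(1.0, 0.20, 0.05, penalty=20)   # deeper pipeline, higher clock
print(f"short: CPI {short_pipe:.2f}, {mips(2.0, short_pipe):.0f} MIPS at 2.0 GHz")
print(f"long:  CPI {long_pipe:.2f}, {mips(3.0, long_pipe):.0f} MIPS at 3.0 GHz")
```

      Here a 50% higher clock buys only about 37% more throughput; adding cache misses, which also stall the long pipeline, would shrink the gap further.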

      The Pentium 4 also used something called a trace cache to hold intermediate translations of x86 CISC code into RISC-like micro-ops. The RISC-like code was then executed by the CPU core. The trace cache was stored in the space traditionally used by the L1 cache in a processor, and it was used to avoid translating the same CISC code again and again. Yet these RISC-like instructions were less dense than x86 CISC code, so the L1 cache in a Pentium 4 could hold far fewer instructions per kB, which meant instructions had to be fetched from the slower L2 cache more often than in previous architectures. The AMD Athlon, in comparison, only stored some additional pre-decode bits in the L1 cache.

      The Pentium 4 family processors would perform nicely for well behaved, highly tuned, code but little else. Plus the high clockspeed made the chips extremely hot. The worst processor of the family in that regard was probably Prescott.

    94. Re:Speed times Quantity? by sjames · · Score: 1

      On the bright side, it's at least scheduled. IBM mainframes can go decades without unscheduled downtime.

    95. Re:Speed times Quantity? by Randle_Revar · · Score: 1

      The cpu called "Core" or "Core 1" was indeed a die shrink and tweak of the Pentium M (the first "tick" (retroactively)).

      But the "Core microarchitecture" was the cpu line called "Core 2" (the first "tock"). Theoretically, the Core 2 is as different from the Core 1 as Nehalem (i7/i5/i3) is from Core 2.

    96. Re:Speed times Quantity? by Randle_Revar · · Score: 1

      AMD was still a threat when Intel dropped netburst (and thus had to give up on MHz, since P-M/Core was much, much slower in pure MHz).

    97. Re:Speed times Quantity? by MichaelSmith · · Score: 1

      I work with several very good programmers older than 60, but that is in a large engineering multinational where people tend to stay for a long time.

    98. Re:Speed times Quantity? by Demonslayer1337 · · Score: 1

      http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)

      A short summary, after you read about the processor:
      It is a CISC architecture and employs cores that handle RISC-style operations.
      A maxed-out system totals 24 quad-core processors at 5.2 GHz.
      Each core of a processor has a 64KB L1 instruction cache,
      a 128KB L1 data cache, and a 1.5MB L2 cache.
      Each quad-core processor has a 24MB L3 cache and a 100MB L4 cache,
      and is connected to a storage controller chip with an additional 96MB of L4 cache.
      The total L1-L4 cache of the entire system is 8.81GB for
      24 quad-core 5.2 GHz processors using 8 storage controller chips.
      The processors alone draw 7.2kW of power.

      Of course, these are not all features, possibly not even the best features,
      but to post all of that I would simply be copying and pasting the wiki

      In short, for a mainframe system, this is far superior to a desktop Intel quad-core i7 at 3.2 GHz.

    99. Re:Speed times Quantity? by sapgau · · Score: 1

      Wouldn't using GMT free you from time zones and seasonal time saving changes?

    100. Re:Speed times Quantity? by bhtooefr · · Score: 1

      To expand on the point, System/38 was intended to be the real replacement for System/370, and this was when IBM was just getting their feet wet with RISC. They knew they wanted to change CPU architectures somewhere in the middle of the System/38 or a successor's run, so they went the TIMI route. They were working from a clean sheet, and this way they could make massive changes with minimal effort.

      Between the antitrust stuff, the System/370 being so entrenched, and the System/38's performance being poor, though, IBM ended up staying on the S/370 path.

      System/370 is essentially an extended System/360 with (on all but two models, IIRC) virtual memory, ESA/390 is a 31-bit System/370 (and 370/XA was also 31-bit, IIRC) with a new I/O system, and z/Architecture is a 64-bit ESA/390. It's still essentially the System/360 architecture that came out in 1964.

    101. Re:Speed times Quantity? by travellersside · · Score: 1

      Yes, it means that I can finally play Dwarf Fortress at a decent speed!

    102. Re:Speed times Quantity? by mikechant · · Score: 1

      It has been a while but really?
      I have never seen a mainframe that didn't use Zulu time.

      Maybe it's a US/UK thing. Maybe because of the many time zones and DST differences in the US, mainframes (which often serve the whole country in the case of banks etc.) are nearly always run on UTC?
      I've had experience of about 20 different companies' mainframes in the UK over the last 25 years, and about 18 of them either set the system time to GMT(UTC) or BST(UTC+1) as appropriate for the time of year, or (much more common in recent years) have the system clock set to GMT all the time and set the local time offset to +1 for BST.
      As per my comment above, although z/OS itself tolerates putting the local time offset forward or backwards dynamically, most databases, applications etc. do not tolerate it going backwards.

    103. Re:Speed times Quantity? by ps2os2 · · Score: 0

      But the main thing is that not all programs are multi-threaded, and a program with a single thread can only run on one processor. So yeah, GHz are still useful. Maybe for large single-thread batch processing - which is the kind of thing a mainframe would do.

      I guess you have not worked on any modern IBM mainframe. They are a different beast than, say, 15 years ago. All modern online systems can and do take advantage of what you call threads, though people today call them something else entirely.
      While it may be true that some batch processing mostly runs under one "thread" (to use your term), it can and does take advantage of multiple CPUs. If it is halfway modernized, I suspect it can take full advantage of multiple CPUs.
      There is a *LOT* of running one job on multiple CPUs; mind you, it's under the covers and the user doesn't really see it, but it does happen. As for the interactive side (development), it was designed from day 1 to run on multiple CPUs concurrently, and IBM's workload management (all z/OS systems run it to some extent) was designed to get work out as fast as the system can supply the CPU and storage. That is one of the major reasons why z/OS can run at 100 percent CPU utilization 24 hours a day, 7 days a week and pump out more work than most (all?) systems. I understand that the new System z machines can be configured with terabytes of storage. I/O is now as fast as (or faster than) on any other available system. At one time the biggest bottleneck was I/O, but with advances in the OS, faster controllers, and large-capacity hard drives, it gets the work out faster and cheaper than any PC system out there.
      I do not work for IBM; I just work on z/OS and find it steps above any PC system. The cloud, the current buzzword in the PC world, has been doable for (at least) 10 years: it's called GDPS, and the systems can be quite a distance from any data center. I wish people would learn the whole story.

    104. Re:Speed times Quantity? by ps2os2 · · Score: 0

      I'm betting the code used on these z196 systems is multi-threaded. Shit, if you're paying hundreds of thousands of dollars per CPU you can afford some top notch programmers.

      Actually I think this mainframe is for getting the last little bit of performance out of thirty-year-old COBOL code. And the original top-notch programmers are long dead.

      I wish you would take a refresher course on what the new z/OS systems can do. I think you would be amazed: for example, they can run a UNIX-type system while still running MVS. No emulators; all the code is handled by the OS (and on a single system there's no real hypervisor either). The hypervisor is only used to partition off different OSes. No, I am not talking about VM; the IBM hypervisor is buried in the firmware. The Workload Manager (new with z/OS) had an ancestor that worked very well, but Workload Manager is able to fine-tune the work distribution a lot better.

    105. Re:Speed times Quantity? by ps2os2 · · Score: 0

      FYI, if the application was not designed to use the correct time, then of course the entire system has to be IPL'd. Granted, people rarely if ever worried about the clock; it only became important when databases got involved and a point in time was really needed, or in a small number of cases (mostly financial) that had to care about it. If the system is running batch, there is no need to IPL until a point in time that is convenient for everyone.
      CSA creep is almost a thing of the past. Yes, it still happens, but it can be planned for. Almost all vendors have fixed the issue, although new code does creep into the area. IBM has addressed this in several areas, and you can refuse to let a vendor obtain CSA if you choose. Any time there is a system area that people think would be great to use, it needs careful design to make sure you free it when you are done with it; if you cannot guarantee that, you should not be using it. CSA is a limited system resource, not to be used at will. IBM will tell you tough luck: if you use it, you are responsible for freeing it when you are done.
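      To make the "obtain it, free it" rule concrete, here is a toy Python model of a fixed-size shared area with per-owner accounting and an allow-list of owners. It is not how z/OS actually manages CSA; the class, owner names, and sizes are all hypothetical. It just shows how one owner that never frees its storage makes usage creep upward while a well-behaved owner stays flat.

```python
class SharedArea:
    """Hypothetical fixed-size shared storage area with per-owner accounting."""

    def __init__(self, capacity: int, allowed_owners: set[str]) -> None:
        self.capacity = capacity
        self.allowed = allowed_owners       # owners permitted to obtain storage
        self.held: dict[str, int] = {}      # bytes currently held by each owner

    def used(self) -> int:
        return sum(self.held.values())

    def obtain(self, owner: str, size: int) -> None:
        if owner not in self.allowed:
            raise PermissionError(f"{owner} may not use the shared area")
        if self.used() + size > self.capacity:
            raise MemoryError("shared area exhausted")
        self.held[owner] = self.held.get(owner, 0) + size

    def release(self, owner: str, size: int) -> None:
        if self.held.get(owner, 0) < size:
            raise ValueError(f"{owner} is releasing more than it holds")
        self.held[owner] -= size


csa = SharedArea(capacity=1000, allowed_owners={"IBM", "VENDOR_A"})
for _ in range(5):
    csa.obtain("VENDOR_A", 100)  # leaks: never released, so usage creeps up
    csa.obtain("IBM", 50)
    csa.release("IBM", 50)       # well-behaved: freed when done
print(csa.used(), csa.held)
```

      Run that loop a few more times and the leaking owner eventually exhausts the area for everyone, which is why the allow-list matters.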

    106. Re:Speed times Quantity? by ps2os2 · · Score: 0

      While I was around in that time frame, my memory is a little different (but could be the same). S/38 was an offshoot of IBM's FS (Future System) project. I think I know one person who worked on it, and he may be retired, like me.
      If the stories are to be believed, somewhere along the line reality caught up with IBM's FS: the CPUs of the time weren't even close to being capable of running it. My memory says it was in the '70s, and even IBM's fastest CPU was not fast enough then. I have heard stories about it, but since I know nothing first-hand, I would suggest you find an IBM old-timer and ask him or her what killed it off.

    107. Re:Speed times Quantity? by ps2os2 · · Score: 0

      The problem is that IBM and Intel/AMD do things fundamentally differently inside their systems. I am not sure I can explain entirely how Intel/AMD work. In IBM's high-speed buffer (cache), there are "things" that go on that help the execution of an instruction before it reaches the point of execution. IBM also does predictive branching, operand decoding, and instruction look-ahead, which streamline execution. BTW, their code *DOES* work countless billions of times a day.
      IBM also has a way to serialize storage so that only one person (program) can update a given storage location at a time. I do not know whether Intel can do that, but if they can't, I would not trust them at all. Many a time I had to shoot a bug in user code that didn't allow for that, and it's not fun.
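      For readers who haven't met "predictive branching" before, here is a minimal sketch of the textbook 2-bit saturating-counter branch predictor. It is a generic technique, not a description of what IBM (or Intel/AMD) actually implement; the table size and the branch address are arbitrary.

```python
class TwoBitPredictor:
    """Per-branch 2-bit counters: 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self, table_size: int = 1024) -> None:
        self.table_size = table_size
        self.counters = [1] * table_size  # start every entry as weakly not-taken

    def _index(self, branch_addr: int) -> int:
        return branch_addr % self.table_size

    def predict(self, branch_addr: int) -> bool:
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)  # saturate at 3
        else:
            self.counters[i] = max(0, self.counters[i] - 1)  # saturate at 0


def accuracy(predictor: TwoBitPredictor, trace: list[tuple[int, bool]]) -> float:
    """Fraction of (address, taken) branch outcomes predicted correctly."""
    correct = 0
    for addr, taken in trace:
        if predictor.predict(addr) == taken:
            correct += 1
        predictor.update(addr, taken)
    return correct / len(trace)


# A loop branch taken 9 times then falling through, with the loop run 10 times.
loop_trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 10
print(f"{accuracy(TwoBitPredictor(), loop_trace):.2f}")  # prints 0.89
```

      The two-bit hysteresis is the point: a single loop exit costs one misprediction but doesn't flip the prediction for the next pass through the loop.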
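      The usual way to get this kind of serialization without holding a big lock is a compare-and-swap retry loop. IBM's Compare and Swap (CS) instruction is the mainframe primitive, and for what it's worth x86 has an equivalent in LOCK CMPXCHG. Python has no hardware CAS, so this sketch emulates the atomic step with a lock (an assumption of the sketch); the part worth looking at is the retry loop in `atomic_add`, which loses no updates even with several threads hammering the same word.

```python
import threading


class Word:
    """A shared storage word with an emulated compare-and-swap."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()  # stands in for the hardware's atomicity

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Store `new` only if the word still holds `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def atomic_add(word: Word, delta: int) -> None:
    """Reread and retry until no other thread updated the word in between."""
    while True:
        old = word.load()
        if word.compare_and_swap(old, old + delta):
            return


counter = Word()


def worker() -> None:
    for _ in range(10_000):
        atomic_add(counter, 1)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 40000: no lost updates
```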

  3. Price: RTFA by miketheanimal · · Score: 5, Informative

    The Z-series mainframes cost hundreds of thousands (or even over a million) dollars, not the chips. As it says in the article.

    1. Re:Price: RTFA by Anonymous Coward · · Score: 0

      ONE MILLION DOLLARS!!!!!

    2. Re:Price: RTFA by jtollefson · · Score: 2, Informative

      They're very expensive, but for enterprise-scale workloads they're cheaper than the comparable distributed system. The cost depends entirely on how many cores you're running and, more importantly, on your monthly usage. IBM bills you for your iron based on an average of how much you used it that month. There's a reason why mainframes run so quickly: they're the only systems where all processing, from user ISPF interaction all the way to data processing, is tracked. All that processing turns into your final bill from IBM, so upper management tends to pay close attention to usage, unlike on other systems... But thankfully IBM lets them pay in monthly installments. They're kinda like QVC like that...
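      As a rough illustration of why usage-based billing rewards smoothing out peaks: IBM's sub-capacity software pricing is commonly described as charging on the monthly peak of a rolling four-hour average of usage, rather than a plain monthly average. The sketch below assumes hourly samples and a four-hour window; the numbers and units are made up, and it is not IBM's actual reporting tool.

```python
from collections import deque


def peak_rolling_average(hourly_usage: list[float], window_hours: int = 4) -> float:
    """Highest mean over any `window_hours` consecutive hourly samples."""
    if window_hours < 1 or len(hourly_usage) < window_hours:
        raise ValueError("need at least one full window of samples")
    window = deque()
    running_total = 0.0
    peak = 0.0
    for sample in hourly_usage:
        window.append(sample)
        running_total += sample
        if len(window) > window_hours:
            running_total -= window.popleft()  # slide the window forward one hour
        if len(window) == window_hours:
            peak = max(peak, running_total / window_hours)
    return peak


# Hypothetical day: steady load with a two-hour spike.
day = [100.0] * 10 + [400.0, 400.0] + [100.0] * 12
print("plain average:", sum(day) / len(day))
print("instantaneous peak:", max(day))
print("peak 4-hour average:", peak_rolling_average(day))
```

      The rolling window lands between the two extremes: a short spike costs less than billing on the instantaneous peak would, but more than a flat monthly average.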

    3. Re:Price: RTFA by X0563511 · · Score: 1

      Wait... they monitor your use and charge you... on equipment you own?

      --
      For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
    4. Re:Price: RTFA by QuantumBeep · · Score: 2, Informative

      IBM mainframes are leased.

    5. Re:Price: RTFA by Jay+L · · Score: 1

      Heck, the bubble memory and COBOL license alone cost $100K. I don't even want to think about the tape reader.

    6. Re:Price: RTFA by bws111 · · Score: 3, Informative

      You can buy or lease the hardware. The software is licensed under contract.

      It seems like the GP is talking about software charges, not hardware. Software can be either monthly fee based or usage based. If it is usage based you must send a usage report to IBM so they can bill you. That is specified in the contract. In either case, the number of and performance of the CPs is calculated into the cost.

      Hardware is a different story. With hardware, the number of cores you purchase is not the same as the number you get. For instance, you can buy a 1 core machine, but what you get is 16 cores. Only 1 core is enabled in the firmware though. IBM has offerings (again under contract) where you can buy the right to temporarily enable additional processors instantaneously (like if you lost one of your datacenters and need to transfer the workload to another one). With these offerings, you also need to send usage info to IBM so they can bill you for the time that the additional cores have been enabled.
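      The "buy 1 core, get 16 installed" arrangement amounts to simple bookkeeping: installed cores, permanently enabled cores, and a record of temporary activations that gets reported for billing. This is a hypothetical model; the class, field names, and the core-day billing unit are assumptions, not IBM's contract terms.

```python
from dataclasses import dataclass, field


@dataclass
class CapacityOnDemand:
    installed_cores: int  # physically present in the machine
    enabled_cores: int    # permanently purchased and enabled in firmware
    activations: list[tuple[int, int]] = field(default_factory=list)  # (extra cores, days)

    def activate_temporarily(self, extra_cores: int, days: int) -> None:
        """Record a temporary activation, e.g. after losing another datacenter."""
        if extra_cores < 1 or days < 1:
            raise ValueError("extra_cores and days must be positive")
        # Each activation is checked on its own; overlapping activations
        # are not checked against each other in this sketch.
        if self.enabled_cores + extra_cores > self.installed_cores:
            raise ValueError("cannot enable more cores than are installed")
        self.activations.append((extra_cores, days))

    def billable_core_days(self) -> int:
        """Usage reported back to the vendor for the temporary capacity."""
        return sum(cores * days for cores, days in self.activations)


box = CapacityOnDemand(installed_cores=16, enabled_cores=1)
box.activate_temporarily(extra_cores=7, days=3)   # disaster-recovery failover
box.activate_temporarily(extra_cores=15, days=1)  # brief peak
print(box.billable_core_days())
```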

    7. Re:Price: RTFA by hguorbray · · Score: 1

      HP introduced this on the Superdome 7-8 years ago: iCOD (Instant Capacity on Demand).

      It doesn't get a lot of press compared to virtualization or SSD or some of the other HW innovations.

      -I'm just sayin'

    8. Re:Price: RTFA by bws111 · · Score: 1

      I didn't mean it was something new. IBM's first use of it (then called Emergency Backup) on mainframes was in 1996.

    9. Re:Price: RTFA by Shag · · Score: 1

      Sorry, but as anyone who's ever diagrammed a sentence knows, the clause "Costing hundreds of thousands of dollars" clearly modifies the noun "IBM" in the summary. The chips and the mainframes, of course, would cost far less than purchasing the entire corporation.

      --
      Village idiot in some extremely smart villages.
    10. Re:Price: RTFA by Anonymous Coward · · Score: 0

      The Z-series mainframes cost hundreds of thousands (or even over a million) dollars, not the chips. As it says in the article.

      Naturally... after all they are for business servers.

      They aren't for your desktop :P

    11. Re:Price: RTFA by PingPongBoy · · Score: 1

      The cost entirely depends on how many cores you're running, and more importantly your monthly usage

      While I don't keep up with the mainframe way of doing things, I thought companies bought mainframes rather than time. In the old days, computers were rare and expensive, so time was valuable and users watched how much time their jobs took. Why would anyone these days put up with this headache? Hiring people to tweak jobs and code to squeeze out wasted cycles seems, to my naive eye, more expensive than merely buying extra hardware, given the lower prices of computing these days.

      Is IBM selling mainframes on the basis that if people paid by the hour they would be paying less than an upfront sale? The selling point would be the performance is so great that a huge savings could be achieved. On this basis, if the performance is scalable with the number of mainframes, it would justify the sale of multiple machines and thereby lead to savings based on economies of scale.

      This sales strategy (if that is what it actually is) is clever: a company that commits to hiring staff to optimize usage and reduce the expense of many mainframes then finds it ought to use that mainframe staff to load more work onto the mainframes, because after all these people need something to do, and there should be some gain from using excess (and to some degree unlimited) computing capacity.

      To maintain this sales strategy, the time charges must be pushed toward zero; otherwise they will eventually equal the cost of N depreciated machines, and the customer would be better off capping future charges by buying the machines, since they've already been proven. Upgraded machines would then pose the dilemma of paying full price for the newer models or running old clunkers, which eat valuable real estate.
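      The point at which time charges catch up with an outright purchase is simple arithmetic. A minimal sketch with made-up prices and an assumed resale (depreciated) value; real mainframe contracts are far more complicated than this.

```python
import math


def breakeven_month(purchase_price: float, monthly_charge: float,
                    resale_value: float = 0.0) -> int:
    """First month in which cumulative time charges reach the net cost of buying."""
    if monthly_charge <= 0:
        raise ValueError("monthly_charge must be positive")
    net_cost = purchase_price - resale_value  # what owning the machine actually costs
    return max(0, math.ceil(net_cost / monthly_charge))


# Hypothetical figures, per machine.
print(breakeven_month(1_000_000, 30_000))                        # 34
print(breakeven_month(1_000_000, 30_000, resale_value=200_000))  # 27
```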

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
    12. Re:Price: RTFA by damien_kane · · Score: 1

      Wait... they monitor your use and charge you... on equipment you own?

      On equipment you own, but with an OS you license and a support plan: if anything (name it and they cover it) stops, or even just starts working slower than expected (even when redundancy kicks in to pick up the slack), they will send a tech out with parts, or overnight any parts they don't have in stock to your datacenter for the repair.
      This on-site service is part of the service plan.
      When even seconds of lost "time" can cost millions of dollars, a couple hundred grand a month is a trivial insurance plan.

  4. Great news for Mac OS X users! by squiggleslash · · Score: 4, Funny

    I can't wait to get a PowerMac G6 with this CPU, in your face Dell users with your commodity Intel-based desi... oh, wait.

    --
    You are not alone. This is not normal. None of this is normal.
    1. Re:Great news for Mac OS X users! by fuzzyfuzzyfungus · · Score: 4, Funny

      The PowerMac G6 would be pretty impressive. The PowerBook G6 manual would include the following phrase:

      "Please note: The revolutionary new MagsafePro 3-Phase/480 power connector is not backwards compatible with the Magsafe connectors of prior, non-containerized Mac Portables."

    2. Re:Great news for Mac OS X users! by UnknowingFool · · Score: 2, Informative

      Unfortunately this chip will most likely go into workstations and servers. For IBM to make a desktop version, it would have to make a custom chip to handle things like video, sound, etc. That would lead to the same logistical problems for Apple that it had before. Manufacturing companies do not want to keep excess inventories, whether Apple or IBM. If Apple needs more, it will have to wait while IBM rearranges its manufacturing schedules to compensate. Also, even if Apple orders millions of these, it will still be a small customer to IBM; IBM's internal divisions would order more of the stock chip. And the last reason Apple will not go back to IBM: IBM's mobile chip offerings lag way behind Intel's. IBM never made a mobile G5 chip. My guess is that they could never make one with acceptable power consumption. IBM could do it with enough R&D, but again it would be for a very small customer, not worth enough to the bottom line.

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
    3. Re:Great news for Mac OS X users! by BrentH · · Score: 1

      IBM uses HyperTransport as its interconnect, right? That would imply that you can slap any old AMD chipset onto such a chip, which has all the desktop features you need.

    4. Re:Great news for Mac OS X users! by splutty · · Score: 1

      Uhm...

      We're talking about Z-series mainframes. These are absolute beasts, with all the cooling, memory and processing speed that would leave a desktop in the dust without any problems whatsoever.

      However putting this sort of hardware in a desktop is extremely prohibitive for a ton of reasons, one of the most important being cooling. You'd need a room just for that...

      --
      Coz eternity my friend, is a long *ing time.
    5. Re:Great news for Mac OS X users! by TheRaven64 · · Score: 4, Informative

      Wrong chip family. This is the Z-series mainframe chip, using an instruction set that is backwards compatible with the System/360 stuff from back in 1964 (the architecture of the future, as the marketing material trying to persuade my university to upgrade their IBM 1620 put it). The PowerMacs were using PowerPC chips, which use the same instruction set as the POWER CPUs from IBM (they used to be similar, with a common subset; now they are identical).

      The chip that this is replacing, the z10, was designed concurrently with the POWER6. They share a number of common features, including a lot of the same execution engines (both have the same hardware BCD units, for example, as well as more common arithmetic units), but they are very different in a number of other aspects, including the instruction set, cache design, and inter-processor interconnect, because they are designed for different workloads.

      I've not read much about this chip yet, but I think it shares some design elements with the POWER7, in the same way that the z10 did with the POWER6.

      In short, while some of the R&D money spent on this CPU made it into chips that could, potentially, run OS X, this chip itself could not without a major rewrite.

      --
      I am TheRaven on Soylent News
    6. Re:Great news for Mac OS X users! by Chris+Mattern · · Score: 1

      It's not a PowerPC chip anyway. It's z/Architecture, which is actually the modern-day descendant of what was originally System/360.

    7. Re:Great news for Mac OS X users! by CAIMLAS · · Score: 1

      I know you're trying to be funny, but the people responding to you don't seem to know what they're talking about.

      The PowerMacs of old didn't have these IBM chips. Hint: the word "power" has significance here. PowerMac, PowerPC... IBM POWER? Could be!

      The IBM POWER7 has been out for some time now.

      Personally, I'd be tickled pink to get hold of even a POWER5 system (pSeries), or one of those awesome late-generation UltraSPARCs. WOW. Talk about power! Not much I'd be able to do with a z-series.

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    8. Re:Great news for Mac OS X users! by UnknowingFool · · Score: 1

      My point exactly. The PowerPC chips that Apple used were highly customized versions of the ones IBM made for their servers. They weren't as powerful but more designed for consumer use. Even then cooling was somewhat problematic for Apple.

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
    9. Re:Great news for Mac OS X users! by UnknowingFool · · Score: 1

      It's not just a matter of the interconnect. PowerPC chips are generally made for servers. They aren't designed for things like video and audio. While they can handle them without any modifications, they should be optimized to let the chip know that such things will be offloaded to the GPU and sound chip. Another issue is cooling. Even though Apple helped design the chips it used, the PowerPCs needed lots of cooling. Considering how Apple is always trying to shrink its computers (see the new Mac mini), using them would be counterproductive to Apple's goals.

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
    10. Re:Great news for Mac OS X users! by Guy+Harris · · Score: 1

      this chip itself could not without a major rewrite.

      Well, actually, most of OS X probably wouldn't require a rewrite - it's in C/C++/Objective-C, and can run big-endian (courtesy of having worked, up to Leopard, on PPC machines, and much of it still has to for Rosetta) and 64-bit. The assembler code (in the kernel, libSystem, and some other places) would need new versions, the assembler and linker would need a lot of work (no, OS X doesn't use gas and gld), including support for z/Architecture in Mach-O, and the compiler guys might have to do some work to support Objective-C (for gcc, there's already z/Architecture support; llvm might have to have a z/Architecture backend written).

      But it's still a lot of work, and the processor isn't particularly oriented towards the sorts of machines Apple sells, so it ain't gonna happen.

  5. Re:Yeah, I read about this by TaoPhoenix · · Score: 0, Troll

    Fark is consistently a whole 1-2 days faster than Slash-D lately. And their "Idle" section is better.

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
  6. True, true. For now. by AltGrendel · · Score: 1

    But it will be obsolete by the end of the month.

    --
    The simple truth is that interstellar distances will not fit into the human imagination

    - Douglas Adams

  7. speed meh by bakamorgan · · Score: 1

    They need to work on cramming more cores onto one die, then get programs up to speed so they can use the extra cores.
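    Amdahl's law is the standard way to quantify the cores-versus-clock-speed trade-off: if only a fraction p of a program can run in parallel, n cores give a speedup of 1 / ((1 - p) + p / n), so the serial part caps the gain no matter how many cores you add. The fractions below are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup on `cores` cores when `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for p in (0.5, 0.9, 0.99):
    speedups = [round(amdahl_speedup(p, n), 2) for n in (2, 8, 64)]
    print(f"parallel fraction {p}: speedup on 2/8/64 cores = {speedups}")
```

    Even at 99% parallel, 64 cores deliver well under a 64x speedup, which is why single-thread speed still matters for the serial parts.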

    1. Re:speed meh by the+linux+geek · · Score: 1

      Mainframe programs have largely been parallel for decades. This thing isn't designed for running Crysis.

    2. Re:speed meh by hedwards · · Score: 1

      Perhaps not, but it's what you need if you want to put the graphics settings on high.

    3. Re:speed meh by NotOverHere · · Score: 1

      Would it work for multiple copies of Crysis? Say on a couple dozen VT220s?

  8. CISC to save RAM? by MichaelSmith · · Score: 1

    IBM defines the z196 as one of the few remaining CISC chips, which allows for bulky, large programs that can require much more memory to execute in than RISC chips, including the PowerPC and ARM embedded processors, among others.

    For CISC you need more bytes per instruction, because there are more instructions. With RISC your executable has more instructions but they each use less storage.

    I am not sure I believe their implication that CISC is better for humongous commercial applications. Sounds like marketing speak to management to me.

    1. Re:CISC to save RAM? by John+Meacham · · Score: 1

      Actually, CISC uses less memory in general, but has traditionally been slower. CISC CPUs came out when memory was extremely expensive relative to CPU speed. Cheaper memory is what made RISC (with its larger footprint but faster speed) possible. Nowadays it doesn't really matter much, though CISC is probably better now that memory bandwidth is the big bottleneck. However, our CISC designs are not exactly modern; if you were to do a modern CISC design, you would probably end up with something more akin to ARM's Thumb instruction set.

      All in all, x86 didn't end up too horribly off. Its plethora of addressing modes actually makes for smaller code on 64-bit systems: integer arithmetic can be 32 bits by default, since you rarely need to operate directly on 64-bit values in arbitrary ways, and the addressing modes can perform most of the pointer arithmetic that is needed. That matches the 32-bit 'int' and 64-bit pointer of x86-64. Not that there aren't issues with x86, but being CISC in and of itself isn't one of them.
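      The size difference comes from the REX prefix: in 64-bit mode the default operand size is still 32 bits, so a 64-bit register-register ADD needs an extra REX.W byte (0x48). A minimal encoder for `add dst, src` using the ADD r/m, reg opcode (0x01) shows it. It only handles the eight legacy registers; r8-r15 would also need the REX.R/REX.B bits.

```python
# Legacy register numbers (0-7) as encoded in the ModRM byte.
REGS = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def encode_add(dst: str, src: str, width: int) -> bytes:
    """Encode register-register `add dst, src` as ADD r/m, reg (opcode 0x01 /r)."""
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    # ModRM: mod=11 (register operand), reg=source, r/m=destination.
    modrm = (0b11 << 6) | (REGS.index(src) << 3) | REGS.index(dst)
    rex_w = b"\x48" if width == 64 else b""  # REX.W selects 64-bit operand size
    return rex_w + bytes([0x01, modrm])


print(encode_add("ax", "bx", 32).hex())  # 01d8   (add eax, ebx)
print(encode_add("ax", "bx", 64).hex())  # 4801d8 (add rax, rbx)
```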

      --
      http://notanumber.net/
    2. Re:CISC to save RAM? by TheRaven64 · · Score: 1

      CISC and RISC are marketing terms that incorporate a lot of loosely connected design elements. Most CISC architectures use variable-length instruction encodings. On x86, for example, a number of common instructions are a single byte, while the longest ones are 15 bytes. A RISC architecture typically has fixed-length instructions, typically either 4 or 8 bytes (although ARM chips tend to also support Thumb and Thumb-2 instruction sets which use a 2-byte encoding).

      This is why x86 chips need smaller instruction caches than SPARC or Alpha machines. The instructions do more, and the instruction encoding uses something like an ad-hoc version of Huffman encoding, where the more common ones use shorter sequences.
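      A rough way to see the effect is to weight each instruction's encoded length by how often it occurs. The mix and the x86-style lengths below are illustrative assumptions (a 1-byte `push reg`, 2-byte 32-bit register-register `mov`/`add`, a 5-byte `call rel32`), compared against a fixed 4-byte RISC encoding with the same instruction count, which is a simplification in either direction.

```python
def code_size(mix: dict[str, int], lengths: dict[str, int]) -> int:
    """Total bytes for an instruction mix, given each instruction's encoded length."""
    return sum(count * lengths[name] for name, count in mix.items())


# Hypothetical static mix of 100 instructions.
mix = {"push": 20, "mov": 50, "call": 10, "add": 20}
variable = {"push": 1, "mov": 2, "call": 5, "add": 2}  # x86-style lengths in bytes
fixed = {name: 4 for name in mix}                      # classic 32-bit RISC encoding

var_bytes = code_size(mix, variable)
fixed_bytes = code_size(mix, fixed)
print(var_bytes, fixed_bytes)
```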

      --
      I am TheRaven on Soylent News
    3. Re:CISC to save RAM? by JasterBobaMereel · · Score: 1

      This is a mainframe; it is designed to run many virtual machines natively.

      Memory in this context is cheap, fast and readily available - all the things it is not on most RISC systems

      If the chips are designed properly (which with IBM they will be) then for the tasks they are designed for they can be faster ...for other tasks they may well be slower

      --
      Puteulanus fenestra mortis
    4. Re:CISC to save RAM? by Rockoon · · Score: 1

      For CISC you need more bytes per instruction, because there are more instructions.

      pssst... not true. RISC machines have more registers to offset their lack of read/modify/write instructions. Doubling the number of registers has a greater effect on instruction size (2 more bits needed to encode twice the source (+1) and destination (+1) regs) than doubling the number of instructions (1 more bit needed)
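      The bit counting works out like this: a field that must distinguish n choices needs ceil(log2 n) bits. A small sketch assuming a two-register-operand format (source and destination) and illustrative opcode and register counts:

```python
def bits_needed(choices: int) -> int:
    """Bits required to distinguish `choices` values, i.e. ceil(log2(choices))."""
    if choices < 1:
        raise ValueError("choices must be at least 1")
    return (choices - 1).bit_length()


def encoding_bits(opcodes: int, registers: int, register_operands: int = 2) -> int:
    """Opcode field plus one register field per operand."""
    return bits_needed(opcodes) + register_operands * bits_needed(registers)


baseline = encoding_bits(opcodes=256, registers=16)     # 8 + 2*4 = 16 bits
double_regs = encoding_bits(opcodes=256, registers=32)  # 8 + 2*5 = 18 bits
double_ops = encoding_bits(opcodes=512, registers=16)   # 9 + 2*4 = 17 bits
print(baseline, double_regs, double_ops)
```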

      --
      "His name was James Damore."
    5. Re:CISC to save RAM? by RAMMS+EIN · · Score: 1

      ``For CISC you need more bytes per instruction, because there are more instructions.''

      Not quite. In addition to having more complex instructions than RISC CPUs, CISC CPUs typically also have less regular instruction encoding than RISC CPUs. For example, on traditional MIPS (the canonical example of RISC), every instruction is 32 bits. On x86, at some point, instruction length varied from 8 to 120 bits. Many of the more commonly used instructions fit in 8 to 24 bits. In other words, these instructions are actually _smaller_ than those of the MIPS. This is actually the main contributing factor to smaller code size on x86 vs. MIPS.

      There are also two factors that mitigate MIPS code size, compared to x86 code size: having 3 operands instead of 2, and having more registers. On MIPS, typical instructions have 3 operands: two sources and a destination. On x86, typical instructions have 2 operands: one that is used as both a source and a destination, and one that is only used as a source. But the real kicker is the number of registers: x86 has about 6 general purpose registers, whereas MIPS has about 30. The result is that, on MIPS, many operations are performed directly on registers (function arguments, local variables, temporaries) which on x86 are performed on memory locations (relative to the stack pointer or frame pointer). In other words, MIPS can usually say "set register x to register y plus register z", where x86 would say "load the value at frame pointer + 4 into eax, add the value at frame pointer - 8 to eax, store the result at frame pointer + 24". That's not only more instructions, but also involves 3 memory accesses, which are (on average) slower than register accesses.

      --
      Please correct me if I got my facts wrong.
    6. Re:CISC to save RAM? by LWATCDR · · Score: 1

      "For CISC you need more bytes per instruction, because there are more instructions. With RISC your executable has more instructions but they each use less storage."
      Actually no.
      RISC uses fixed-length instructions while CISC uses variable-length ones.
      Also, in RISC everything is done in registers, so let's say you need to increment a variable in memory.
      On a RISC system you would first load it, which would be two 32-bit accesses: one for the instruction and one for the pointer.
      Then you would have the increment instruction, which would be another 32-bit memory fetch.
      Finally you would store the value back in memory, which would be two more 32-bit values.
      So this would take 20 bytes of memory space.
      Now, this is a pure RISC ISA. Adding one to a memory location is such a common operation that I am sure many RISC ISAs have a load-and-increment instruction.
      In that case you are down to just 16 bytes.
      Of course, both cases assume that we have a free register to work with and that I am not checking for an overflow flag.
      Now let's look at a theoretical CISC ISA.
      On this CPU we have a 16-bit increment-memory-location instruction, so we have two bytes for the instruction and four bytes for the pointer.
      Total memory used is just six bytes.
      Or if the increment instruction is 32 bits, you are at 8 bytes.
      Each instruction in CISC does more than in RISC.
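      Tallying the byte counts above makes the comparison easy to check. The sketch follows the comment's own model, in which every memory operand costs a separate 32-bit word for the address; real RISC ISAs usually fold a base register plus a small offset into the single 32-bit instruction word, which would shrink the RISC totals.

```python
WORD = 4  # bytes in one 32-bit fetch


def total_bytes(sequence: dict[str, int]) -> int:
    """Sum the byte cost of every step in an instruction sequence."""
    return sum(sequence.values())


# Byte cost per step, following the comment's model (address stored as a separate word).
pure_risc = {"load: opcode + address": 2 * WORD, "increment": WORD,
             "store: opcode + address": 2 * WORD}
fused_risc = {"load-and-increment: opcode + address": 2 * WORD,
              "store: opcode + address": 2 * WORD}
cisc_16 = {"increment memory: 16-bit opcode + 32-bit address": 2 + WORD}
cisc_32 = {"increment memory: 32-bit opcode + 32-bit address": WORD + WORD}

for name, seq in [("pure RISC", pure_risc), ("RISC with load-and-increment", fused_risc),
                  ("CISC, 16-bit opcode", cisc_16), ("CISC, 32-bit opcode", cisc_32)]:
    print(f"{name}: {total_bytes(seq)} bytes")
```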

      The reality is that most CPUs today are RISC at heart, even the x86s.
      They have hardware decoders that decode the CISC instructions into RISC instructions that the CPU then executes.
      Even RISC chips often break with RISC every now and then.
      A lot of RISC CPUs have things like a memory copy instruction to copy memory from one location to another. Otherwise, to copy a byte you would need a load and a store for each copy.

      Pure RISC was great when memory ran at CPU speeds or at least very close. Now that CPUs have gotten a lot faster than memory pure RISC is not as good. But RISC has changed as well over time. The Power line of RISC chips is anything but Reduced!

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    7. Re:CISC to save RAM? by mlts · · Score: 1

      RISC machines in general have a lot more registers to mess around with. Take the Itanium for example:

      128 integer registers, 128 floating-point registers, etc.

      With amd64 assembly, you have rax, rbx, rcx, and rdx. If you need to juggle more than that, you go to memory. With the Itanium, you can just keep playing with numbers and do some serious calculations without leaving the CPU's die. More registers also help cryptography operations where you have to keep shifting rows of data up/down/left/right for AES, or doing large multiply operations as per RSA.

    8. Re:CISC to save RAM? by Rockoon · · Score: 1

      With amd64 assembly, you have rax, rbx, rcx, and rdx. If you need to juggle more than that, you go to memory.

      You are short a few.

      For x86 we've got 8 "general purpose" registers, 8 more SSE registers, and the FPU shares with the 8 MMX registers.

      AMD expanded the register space to 16 "general purpose" registers, 16 SSE registers, and continues with 8 MMX registers.

      So 40 registers in all, many of them vectors of multiple words.

      AVX will likely add another set of 256-bit registers.

      --
      "His name was James Damore."
    9. Re:CISC to save RAM? by Anonymous Coward · · Score: 0

      For CISC you need more bytes per instruction, because there are more instructions. With RISC your executable has more instructions but they each use less storage.

      Not really. One of the things that distinguished early RISC (and still does in things like ARM) is that the whole core is simpler, including the decode unit; it's not just about the number of instructions. In practice this means favoring fixed-length over variable-length encodings.

      So even if an x86 has "more instructions" many of them are encodable in one or two bytes, or 3 bytes. With a classic MIPS or ARM everything takes 4 bytes. VLIW ISAs take this to an extreme.

      Classic RISC archs usually do not have good code density and they compress well relative to CISC ISAs.

      Try it and see.
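      One way to "try it and see" is to compress the raw machine code for each architecture and compare the ratios. A minimal sketch using Python's zlib; how you extract the code bytes is up to your toolchain (for example, GNU objcopy can dump just the .text section with `-O binary --only-section=.text`), and any file names you feed it are your own.

```python
import zlib


def compression_ratio(code: bytes, level: int = 9) -> float:
    """Compressed size divided by original size; lower means more redundancy."""
    if not code:
        raise ValueError("no code bytes to compress")
    return len(zlib.compress(code, level)) / len(code)


def ratio_for_file(path: str) -> float:
    """Read a raw code dump (e.g. an extracted .text section) and compress it."""
    with open(path, "rb") as f:
        return compression_ratio(f.read())
```

      Run `ratio_for_file` on the .text dumps of the same program built for two architectures; a noticeably lower ratio for the RISC build would support the claim that fixed-length encodings leave more redundancy on the table.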

    10. Re:CISC to save RAM? by Guy+Harris · · Score: 1

      For CISC you need more bytes per instruction

      Neither "CISC" nor "RISC" are instruction sets; they're classes of instruction sets. In S/360, ancestor of z/Architecture, a register-register arithmetic op has an 8-bit opcode field and 2 4-bit general register fields, for a total of 16 bits; later versions of the instruction set have added 32-bit register-register instructions. In most RISC instruction sets, a register-register arithmetic op is 32 bits - they are generally 3-operand (source 1, source 2, destination) and most RISCs have 31 or 32 GPRs, requiring 5 bits per register specifier. A load or store instruction in S/360 has an 8-bit opcode field, a 4-bit target or source register field, a 4-bit index register field and a 4-bit base register field, and a 12-bit offset field added to the index and base registers to make the memory address; later versions of the instruction set have added 48-bit register-memory instructions. In most RISC instruction sets, a load or store instruction is 32 bits.
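      The field layouts are easy to see in code. A minimal decoder for the two S/360 formats described above, using AR (opcode 0x1A, add register) and L (opcode 0x58, load) as examples. Register 0 in the index or base position contributes zero to the address, and S/360 addresses are 24 bits wide; the register contents are made up.

```python
def decode_rr(insn: bytes) -> dict[str, int]:
    """RR format: 8-bit opcode, 4-bit R1, 4-bit R2 (16 bits total)."""
    if len(insn) != 2:
        raise ValueError("RR instructions are 2 bytes long")
    return {"opcode": insn[0], "r1": insn[1] >> 4, "r2": insn[1] & 0x0F}


def decode_rx(insn: bytes) -> dict[str, int]:
    """RX format: 8-bit opcode, 4-bit R1, X2, B2, 12-bit displacement D2 (32 bits)."""
    if len(insn) != 4:
        raise ValueError("RX instructions are 4 bytes long")
    return {
        "opcode": insn[0],
        "r1": insn[1] >> 4,
        "x2": insn[1] & 0x0F,
        "b2": insn[2] >> 4,
        "d2": ((insn[2] & 0x0F) << 8) | insn[3],
    }


def effective_address(fields: dict[str, int], regs: list[int]) -> int:
    """D2 + (X2) + (B2), where register 0 as index or base means 'none'."""
    index = regs[fields["x2"]] if fields["x2"] else 0
    base = regs[fields["b2"]] if fields["b2"] else 0
    return (fields["d2"] + index + base) & 0xFFFFFF  # S/360 uses 24-bit addresses


regs = [0] * 16
regs[13] = 0x1000                            # hypothetical address held in R13
ar = decode_rr(bytes.fromhex("1A12"))        # AR 1,2
load = decode_rx(bytes.fromhex("5830D008"))  # L 3,8(0,13)
print(ar)
print(load, hex(effective_address(load, regs)))
```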

      So the CISC instructions are, by and large, no bigger than the equivalent RISC instructions, and are, in some cases, smaller. You might need more CISC instructions if you run out of registers, as you have fewer GPRs than most RISC processors, but, as z/Architecture has register-memory arithmetic instructions, you might be able to replace some load+arithmetic instruction pairs in RISC with a register-memory arithmetic instruction in z/Architecture. If you need a large offset in a load or store instruction, you might have to use a 48-bit instruction on z/Architecture, but you might be able to use a 32-bit instruction on a RISC architecture; if you can't, you'll probably need a pair of RISC instructions.

      In other words, "it depends". Anybody have any actual statistics on code density for z/Architecture vs. code density for various RISC processors for the same code (e.g., code from Linux for the architectures in question)?

    11. Re:CISC to save RAM? by Guy+Harris · · Score: 1

      RISC machines in general have a lot more registers to mess around with. Take the Itanium for example

      Take the Itanium as an extreme example. Most RISC machines have 31 or 32 integer GPRs, not 128. Yeah, in a sense SPARC processors have more, but most SPARC code doesn't juggle register windows within a routine - only 31 registers are conveniently available within a routine.

      With amd64 assembly, you have rax, rbx, rcx, and rdx. If you need to juggle more than that, you go to memory.

      As another followup noted, no, you don't - you nominally have 16 GPRs with x86-64. Not 31, but still better than 4. Yes, some have special purposes, such as the stack pointer, but, in practice, RISC code tends to use at least one register as a stack pointer as well. I don't know offhand whether the x86-64 ABIs use rbp as a frame pointer or not; most RISC ABIs don't, I think, have a frame pointer.

  9. This chip snickers at my 6502... by bobdotorg · · Score: 3, Insightful

    The chip uses 1,079 different instructions

    Can't even imagine writing in assembly code for this monster. I miss dinking around with a nice 6502 system.

    --
    __ Someday, but not this morning, I'll finally learn to use the preview button.
    1. Re:This chip snickers at my 6502... by Anonymous Coward · · Score: 0

      Eh, x86_64 has more, I believe.

    2. Re:This chip snickers at my 6502... by jmak · · Score: 1

      I'd guess most of the code run on these CPUs will still use the original IBM 360 instruction set from the sixties.

    3. Re:This chip snickers at my 6502... by MichaelSmith · · Score: 1

      The chip uses 1,079 different instructions

      Can't even imagine writing in assembly code for this monster. I miss dinking around with a nice 6502 system.

      Yeah the 6502 is nice and friendly. I taught myself how to hand assemble on the 6502 when I was 12 or 13.

    4. Re:This chip snickers at my 6502... by Haxamanish · · Score: 1

      I miss dinking around with a nice 6502 system.

      Start playing with ARM then, its design was somewhat inspired by the 65xx series and there are plenty of affordable ARM-based systems available.

    5. Re:This chip snickers at my 6502... by sznupi · · Score: 1

      Or something "lower" among many popular microcontroller families. AVR is quite pleasant, for example.

      --
      One that hath name thou can not otter
    6. Re:This chip snickers at my 6502... by Anonymous Coward · · Score: 0

      That depends on how they define instruction. I have seen some processors that claimed an enormous number of instructions by counting every operation+addressing mode combination as one instruction, and some that claimed to have a small RISC instruction set while having each instruction operate on different types of data. In the latter case every instruction had many different encodings ;)
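      To make the counting point concrete, here is a small sketch using a hand-picked subset of documented 6502 opcodes. It is not the full opcode table, and it says nothing about how IBM arrived at 1,079. The same four mnemonics count as 4 instructions or as 18, depending on whether each addressing mode is counted separately.

```python
# A small, hand-picked subset of documented 6502 opcodes, keyed by
# (mnemonic, addressing mode). Not a complete opcode table.
OPCODES = {
    ("LDA", "immediate"): 0xA9,
    ("LDA", "zeropage"): 0xA5,
    ("LDA", "zeropage,X"): 0xB5,
    ("LDA", "absolute"): 0xAD,
    ("LDA", "absolute,X"): 0xBD,
    ("LDA", "absolute,Y"): 0xB9,
    ("LDA", "(indirect,X)"): 0xA1,
    ("LDA", "(indirect),Y"): 0xB1,
    ("STA", "zeropage"): 0x85,
    ("STA", "zeropage,X"): 0x95,
    ("STA", "absolute"): 0x8D,
    ("STA", "absolute,X"): 0x9D,
    ("STA", "absolute,Y"): 0x99,
    ("STA", "(indirect,X)"): 0x81,
    ("STA", "(indirect),Y"): 0x91,
    ("JMP", "absolute"): 0x4C,
    ("JMP", "indirect"): 0x6C,
    ("NOP", "implied"): 0xEA,
}


def count_by_mnemonic(table):
    """Count 'instructions' as distinct mnemonics."""
    return len({mnemonic for mnemonic, _mode in table})


def count_by_opcode(table):
    """Count 'instructions' as distinct mnemonic + addressing-mode encodings."""
    return len(set(table.values()))


print(count_by_mnemonic(OPCODES))  # 4
print(count_by_opcode(OPCODES))    # 18
```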

    7. Re:This chip snickers at my 6502... by zeroduck · · Score: 1

      I've done assembly in school for the Freescale HC11/12 and Microchip PIC18... both are very easy to work with. That said, I'm glad I will never have to do assembly in school again. C forever.

    8. Re:This chip snickers at my 6502... by mferero · · Score: 1

      Heck, I wrote a lot of 370 (and 370/XA) code in assembly on mainframes. Different call structure (no stack, although the newer mainframes have that now) and register conventions (don't ever use register 1, register 0 is right out, and return back to your caller through the address in register 14, etc.). A lot of fun writing assembly code for system exits, in-house utilities, and invoking services not normally provided to the higher-level languages (like dynamic allocation of datasets). Those were fun days . . .

      --
      Honor est omni
    9. Re:This chip snickers at my 6502... by Anonymous Coward · · Score: 0

      Assuming you're fond of special purpose registers and banked memory.
      And ARM with its bazillion Thumb instruction sets... Bah! I remember back in the day when RISC meant something.

    10. Re:This chip snickers at my 6502... by Anonymous Coward · · Score: 0

      6502 was a bit anaemic, but the 6800 was a joy to program

    11. Re:This chip snickers at my 6502... by Guy+Harris · · Score: 1

      Heck, I wrote a lot of 370 (and 370/XA) code in assembly on mainframes. Different call structure (no stack, although the newer mainframes have that now) and register conventions (don't ever use register 1, register 0 is right out, and return back to your caller through the address in register 14, etc.).

      Those are more ABI conventions than instruction set requirements. S/3x0 might not have an instruction that explicitly treats a given register as a stack pointer, but one could choose to use a particular register as a stack pointer, as C compilers for S/3x0 and z/Architecture do. UN*X ABIs for most if not all general-register instruction sets - including x86, x86-64, S/3x0, z/Architecture, and 32-bit and 64-bit versions of various RISC architectures - do have a stack pointer, do have conventions about register use, and, at least for RISC processors, have the return address in a register (which has to be saved if you're not in a leaf routine).

    12. Re:This chip snickers at my 6502... by Anonymous Coward · · Score: 0

      There are a lot of instructions just for the convenience of the assembly programmer. Try programming all the subsystems of a mainframe with the model of an OISC (one-instruction set computer).
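      For a feel of what OISC programming looks like, below is a toy interpreter for SUBLEQ ("subtract and branch if less than or equal to zero"), one common single-instruction design. The commenter didn't name a particular OISC, and the memory layout and program here are made up for illustration. Even adding two numbers takes three instructions and a scratch cell.

```python
def run_subleq(memory, pc=0, max_steps=10_000):
    """Run a SUBLEQ program. Each instruction is three words: a, b, c.

    Semantics: mem[b] -= mem[a]; if the new mem[b] <= 0, jump to c,
    otherwise fall through to the next instruction. A negative pc halts.
    """
    mem = list(memory)          # work on a copy, leave the input untouched
    steps = 0
    while pc >= 0:
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# B := A + B, using a scratch cell Z that starts at 0.
# Addresses 0-8 hold code; 9 = A, 10 = B, 11 = Z.
A, B, Z = 9, 10, 11
program = [
    A, Z, 3,    # Z -= A  -> Z = -A            (either way, continue at 3)
    Z, B, 6,    # B -= Z  -> B = B + A         (either way, continue at 6)
    Z, Z, -1,   # Z -= Z  -> Z = 0, which is <= 0, so jump to -1: halt
    7, 5, 0,    # data: A = 7, B = 5, Z = 0
]

print(run_subleq(program)[B])  # 12
```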

    13. Re:This chip snickers at my 6502... by LWATCDR · · Score: 1

      Actually you might like it a lot. The ISA was very high level and is really one of the great ISAs. Right up with VAX and ARM.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    14. Re:This chip snickers at my 6502... by Guy+Harris · · Score: 1

      Actually you might like it a lot. The ISA was very high level

      Actually, a lot of the most-used instructions are relatively low-level. Yeah, you have all the packed-decimal, character-string, and so on instructions, but for integer, address, and floating-point work, it's your standard register-memory architecture.

      and is really one of the great ISAs. Right up with VAX and ARM.

      ...but without the wacky addressing modes of the VAX, so it might've been a bit easier to make go faster. (Well, if you don't try to make the autoincrement/autodecrement stuff go fast except in some specialized stack cases, that might be easier.)

    15. Re:This chip snickers at my 6502... by uiuyhn8i8 · · Score: 0

      Can't even imagine writing in assembly code for this monster. I miss dinking around with a nice 6502 system.

      To each his own. I kinda agree though. I also grew up hand-coding machine code on the 6502, and it sure was fun, and educational. But now that I work on designing 32-bit embedded chips, I feel really comfortable with a couple of hundred instructions instead of ~60. And I sure wouldn't want to do anything more complex with the limited addressing modes of the 6502. And you will never see Linux on a 6502... Btw, we actually designed a 6502 once, and it used a couple of thousand gates, compared with a couple of billion in a high-end CPU. Hard not to be impressed by the old-school designs.

    16. Re:This chip snickers at my 6502... by LWATCDR · · Score: 1

      Umm, we are talking relative to a 6502.
      Floating point? I am not sure that the 6502 even had multiply and divide. It has been a long time.
      This discussion has gotten me reading up on mainframes, and for the life of me I find IBM to be a confusing company.
      The 360 was supposed to be a universal system. Great plan. A customer could start with a 360/20 and keep right on going up. Then they came out with all sorts of different computers that fragmented their user base. I guess part of it was that they were getting the company ready to be split up if they lost the antitrust case.
      After reading the history I just had to wonder what would have happened if IBM had made a single-chip CPU based on the 360/20 and used that with, say, CTSS as the basis of the IBM PC.
      I see no reason why the 360/20 couldn't have been made into a microprocessor. If IBM had done things right by selling the CPUs and OS to clone makers, they could have been as big as Microsoft, Intel, and IBM are today combined.
      Of course, eventually we would have had Linux running on them, and we would probably see embedded microcontrollers based on the 360 ISA.
      Oh well.
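
      For what it's worth, the 6502 had no multiply or divide instructions (its arithmetic was ADC/SBC plus shifts and rotates), so multiplication was done in software. Below is a minimal sketch of the usual shift-and-add loop. It is written in Python rather than 6502 assembly and assumes unsigned 8-bit operands with a 16-bit result.

```python
def mul8x8(multiplicand, multiplier):
    """Shift-and-add multiply of two unsigned 8-bit values into 16 bits.

    Mirrors the structure of a software multiply loop on a CPU without a
    MUL instruction: test the low bit of the multiplier, add the shifted
    multiplicand when it is set, then shift both operands.
    """
    assert 0 <= multiplicand <= 0xFF and 0 <= multiplier <= 0xFF
    product = 0
    addend = multiplicand
    for _ in range(8):               # one iteration per multiplier bit
        if multiplier & 1:           # low bit set: add the shifted multiplicand
            product = (product + addend) & 0xFFFF
        addend <<= 1                 # multiplicand x 2 for the next bit position
        multiplier >>= 1             # move on to the next multiplier bit
    return product


print(mul8x8(200, 150))  # 30000
```

      On the real chip, the shifts would typically be ASL/ROL/LSR through the carry flag, with the 16-bit product held in two bytes of memory. The Python version only shows the algorithm.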

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  10. Re:Yeah, I read about this by Spad · · Score: 4, Insightful

    Yes, but their article comments are much closer to Youtube than Slashdot.

  11. Microchip? by MistrX · · Score: 1

    That term is so 90's. Why are we still calling it that? Shouldn't it be 'nanochip' or something like that?

    1. Re:Microchip? by the_fat_kid · · Score: 2, Interesting

      iChip?

      --
      -- Sig under construction...
    2. Re:Microchip? by TeknoHog · · Score: 1

      This is a nanochip.

      --
      Escher was the first MC and Giger invented the HR department.
    3. Re:Microchip? by AVee · · Score: 1
      Nano?

      The z196 contains 1.4 billion transistors on a chip measuring 512 square millimeters...

      That's bigger than a quarter dollar coin...

    4. Re:Microchip? by RAMMS+EIN · · Score: 1

      Let me bounce that back to you. Why do you want to change the world, just because it has been in use for a while?

      --
      Please correct me if I got my facts wrong.
    5. Re:Microchip? by RAMMS+EIN · · Score: 1

      s/world/word/

      I fail it.

      And slashdot fails it for not letting me post this amendment, even after what's definitely more than one minute.

      --
      Please correct me if I got my facts wrong.
    6. Re:Microchip? by treeves · · Score: 1

      Just like "micro" in microchip didn't refer to the size of the die or the package, but to the size of the smallest features, i.e. the gate length, "nano" would refer to the gate size, which for the z196 is on the order of 45nm, so "nano" is appropriate. Chip makers are working on 22nm now. But back when the term microchip was coined, the features were 10 microns, more than 200 times larger than the 45nm features here. Maybe it is time to change the terminology. You're right, though, that is a pretty large die, 512 mm² being roughly four-fifths of a square inch.
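      The arithmetic behind those comparisons, using the figures quoted in the comment: 10 microns then, 45nm for the z196, 22nm in development, and a 512 mm² die. The linear ratio works out to about 220x at 45nm. Because it is a linear ratio, it understates the change in density, which scales roughly with its square.

```python
# Figures quoted in the comment above.
OLD_FEATURE_M = 10e-6      # ~10 micron features when "microchip" was coined
Z196_FEATURE_M = 45e-9     # 45 nm process for the z196
NEXT_FEATURE_M = 22e-9     # 22 nm, then in development
DIE_AREA_MM2 = 512
SQ_INCH_MM2 = 25.4 ** 2    # 645.16 mm^2 (1 inch = 25.4 mm exactly)

print(f"10 um vs 45 nm: {OLD_FEATURE_M / Z196_FEATURE_M:.0f}x")     # 222x
print(f"10 um vs 22 nm: {OLD_FEATURE_M / NEXT_FEATURE_M:.0f}x")     # 455x
print(f"die area: {DIE_AREA_MM2 / SQ_INCH_MM2:.2f} square inches")  # 0.79
```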

      --
      ...the future crusty old bastards are already drinking the Kool-Aid.
  12. Lower than expected caches by Anonymous Coward · · Score: 0

    At least, not private L2 caches per core. The L1 caches seem a little small, especially if this is supposed to be used in mainframe type situations.
    But I'm not too in-the-know when it comes to their mainframes; the cache might not even be needed that much if they have some fast pipes between those circuits.
    Either that or the system depends more on the private caches.

    Quite liking those 2 co-processors for crypto. 'Tis an important function that gets ignored far too often these days. Having it in hardware is even better.

  13. Re:fastest first post ever? by stonewallred · · Score: 1

    Will it run Crysis at full settings though?

  14. An appropriate conference for the announcement by dtmos · · Score: 1

    Announcing a 5.2 GHz, 1.4 billion-transistor processor at "Hot Chips 2010" just makes sense. Strangely, no power numbers were given...

    1. Re:An appropriate conference for the announcement by JamesP · · Score: 1

      According to Wikipedia, each multi-chip module draws as much as 1800 W (six processors).

      No figures for 1 chip though

      http://en.wikipedia.org/wiki/IBM_z196_(microprocessor)

      --
      how long until /. fixes commenting on Chrome?
    2. Re:An appropriate conference for the announcement by Anonymous Coward · · Score: 0

      According to The Register, it dissipates 1.8 kW. Beast!

      F_T (from work, not logged in)

  15. highest clock speed? not really by Vectormatic · · Score: 1, Interesting

    Intel's NetBurst architecture (of Pentium 4 fame) featured the 'Rapid Execution Engine', which consisted of two ALUs running at double the clock speed; on a 3.8 GHz Pentium 4, that would be 7.6 GHz.

    Granted, that is not the entire CPU, but still...

    --
    People, what a bunch of bastards
    1. Re:highest clock speed? not really by Anonymous Coward · · Score: 0

      and in a hurricane, fly upstream, a swallow broke the land speed record for a tillamook brook quietly flowing sideways.

      granted, it's a distinction without a difference.

      granted, no one gives a shit.

      granted.

  16. I doubt it's the fastest ever... by the+linux+geek · · Score: 1, Insightful

    except possibly in clock speed. I'm fairly sure that an 8-core 4.25GHz Power7 is as fast or faster if the workload is properly threaded, which any enterprise server or mainframe workload should be. On the other hand, on single-thread or few-thread workloads, the z196 probably has a bit of an edge, despite a large portion of its instruction set being microcoded.

    1. Re:I doubt it's the fastest ever... by mr_mischief · · Score: 3, Informative

      ummmm.......

      It's a quad-core chip. Each core has two integer, two load and store, one binary floating point, and one decimal floating point unit. Up to 24 CPUs can be placed in the frame. It can connect to another whole rack of POWER7 blades running AIX as an application accelerator platform.

      The z196 is for the stuff a mainframe is good at: big batches and fast I/O. The application accelerator is for stuff the clusters of supermicro servers are good at. As a hybrid system connected across the GX bus, it should pump data in and out of applications pretty well.

    2. Re:I doubt it's the fastest ever... by spyked · · Score: 1

      Agreed, high-frequency != fast. The IBM Cell Broadband Engine SPUs are a good example in this sense.

  17. Re:fastest first post ever? by clang_jangle · · Score: 1

    Yes, but you'll have to disable "Aero".

    --
    Caveat Utilitor
  18. About time! by dafing · · Score: 1

    Those slackers, where's my 3GHz G5? Huh?

    *sigh* FINE, begin the switch back....


    -Steve

    Sent from my iPad 2

    --
    --- ...or a new slashdot signature. Dear aunt, let's set so double the killer delete select all
  19. Re:Yeah, I read about this by Fear+the+Clam · · Score: 1, Redundant

    If I could mod you insightful, I would.

  20. With CoProcessors.... by vchoy · · Score: 1

    My 386DX has an external maths coprocessor => can only do floating point functions :(
    However, mine's now a bit faster: I overclocked it from 33MHz to 52MHz... yours does, what, 5.2 GHz? Sure, my M series supersedes your G series... right?...
    right?

  21. The S/360 architecture lives on! by Arakageeta · · Score: 1

    It's crazy that an architecture developed in the '60s lives on in the System Z today. IBM bet the company on the S/360 product line. I think the investment has paid off-- and still does!

    1. Re:The S/360 architecture lives on! by LWATCDR · · Score: 1

      Hey, it is only 14 years older than the x86! Which was based on the 8085. And frankly the 360 ISA was much better.

      --
      See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  22. Wait....what? by antifoidulus · · Score: 2, Insightful

    It contains a 64KB L1 instruction cache, a 128KB L1 data cache, a 1.5MB private L2 cache per core, plus a pair of co-processors used for cryptographic operations. In a four-node system, 19.5 MB of SRAM are used for L1 private cache, 144MB for L2 private cache, 576MB of eDRAM for L3 cache, and a whopping 768MB of eDRAM for a level-four cache. All this is used to ensure that the processor finds and executes its instructions before searching for them in main memory, a task which can force the system to essentially wait for the data to be found--dramatically slowing a system that is designed to be as fast as possible.

    I'm assuming the cache referred to in the second paragraph is off-chip cache, otherwise it would sort of negate the first sentence.... Would be nice if the article would have actually said that though.

    1. Re:Wait....what? by Anonymous Coward · · Score: 3, Insightful

      Considering the ratio between the two sets of figures is ~96, it seems that the "four-node system" contains 96 cores with their own L1 and L2 caches, but shared L3 and L4 caches.

    2. Re:Wait....what? by bws111 · · Score: 1

      That is correct.

    3. Re:Wait....what? by Anonymous Coward · · Score: 0

      The truth is that it is a bit confusing. For clarification, they refer to an MCM (multi-chip module) as a node, and an individual system can have up to 4 MCMs; each MCM has 8 dies, 6 of which are z196 processors, and each z196 processor has 4 cores.

      So a 4-node system can have up to 4 nodes/MCMs * 6 z196 processors * 4 cores = 96 cores.
      From that, each machine can have:
      96 cores * (128KB+64KB)/1024 = 18 MB of L1 (the article quotes 19.5 MB)
      96 cores * 1.5MB = 144MB of L2
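
      The same bookkeeping as a short script. It assumes the configuration described above (4 MCMs, 6 processor chips per MCM, 4 cores per chip) and the per-core and system-wide cache sizes from the article. The L3 and L4 splits assume the totals are divided evenly across chips and nodes, which the article doesn't spell out. Note that the L1 total comes to 18 MB, not the article's 19.5 MB.

```python
# Configuration as described in the thread (assumed, not from an IBM datasheet).
NODES = 4                  # MCMs ("nodes") per system
CHIPS_PER_NODE = 6         # z196 processor dies per MCM
CORES_PER_CHIP = 4
L1I_KB, L1D_KB = 64, 128   # per core
L2_MB = 1.5                # private, per core
L3_TOTAL_MB, L4_TOTAL_MB = 576, 768   # system-wide totals quoted in the article

cores = NODES * CHIPS_PER_NODE * CORES_PER_CHIP
l1_total_mb = cores * (L1I_KB + L1D_KB) / 1024
l2_total_mb = cores * L2_MB
l3_per_chip_mb = L3_TOTAL_MB / (NODES * CHIPS_PER_NODE)   # assumes an even split
l4_per_node_mb = L4_TOTAL_MB / NODES                      # assumes an even split

print(cores)            # 96
print(l1_total_mb)      # 18.0
print(l2_total_mb)      # 144.0
print(l3_per_chip_mb)   # 24.0
print(l4_per_node_mb)   # 192.0
```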

    4. Re:Wait....what? by dtmos · · Score: 1

      Indeed. "Off-chip L1 cache" seems like a self-contradiction.

    5. Re:Wait....what? by thoriumbr · · Score: 1

      No, all the caches are inside the module. z196 processors are not like x86 ones. They are built into something called an MCM (multi-chip module). MCMs are about the size of a 1.44MB floppy disk (for those who know what a floppy is), and contain several chips: processors, clock generator, caches, and some chips used to perform storage access. So, even if a cache is off-chip, it is only 1-2 inches away from the processor.

  23. CISC seems to work well by Sycraft-fu · · Score: 1

    Essentially all desktop and laptop computers use CISC chips, and they are fast and cheap. RISC is a neat theory, but these days, as processors get decoupled from their ISAs anyhow, for various reasons, it doesn't seem to matter much. You choose the ISA for reasons of binary compatibility or features or the like, and it'll work just fine with the chip.

    Also, it is not true that CISC needs more bytes per instruction, at least not in all implementations. With x86, instructions are variable-length: they can be as short as one byte and as long as 15 bytes. In actual practice, you find that a lot of 1- and 2-byte instructions are used in code. CISC can be extremely pithy in some respects. Then, of course, many CISC instructions do more. The idea with RISC is that each instruction does only one thing (that isn't really true with all the vector math stuff these days). So you end up having to issue instructions to load values into registers, operate on them, then store them back. That's not necessary in CISC: there are instructions that can take a register and a memory location as operands, and sometimes even two memory locations.
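
    A few concrete x86-64 encodings illustrate the point. These are hand-assembled, so treat them as a sketch worth double-checking against a disassembler. The architectural limit on x86 instruction length is 15 bytes. The RISC comparison assumes a generic load/store machine with fixed 4-byte instructions, which needs a load, an add and a store for a read-modify-write of memory.

```python
# Hand-assembled x86-64 encodings, as the bytes appear in memory.
X86_64 = {
    "ret":                  bytes([0xC3]),
    "push rbp":             bytes([0x55]),
    "xor eax, eax":         bytes([0x31, 0xC0]),
    "add dword [rdi], eax": bytes([0x01, 0x07]),             # read-modify-write of memory
    "mov rbp, rsp":         bytes([0x48, 0x89, 0xE5]),       # REX.W prefix + opcode + ModRM
    "mov rax, imm64":       bytes([0x48, 0xB8]) + bytes(8),  # 8-byte immediate (zeros here)
}

RISC_INSN_BYTES = 4  # assumed fixed-width 32-bit RISC encoding

for asm, code in X86_64.items():
    print(f"{len(code):2d} bytes  {asm}")

# Updating a word in memory: one 2-byte x86 instruction vs. load + add + store.
x86_bytes = len(X86_64["add dword [rdi], eax"])
risc_bytes = 3 * RISC_INSN_BYTES
print(x86_bytes, "vs", risc_bytes)  # 2 vs 12
```

    Code density is only one side of the trade-off. The variable-length format is also what makes x86 decoding harder than decoding fixed 4-byte instructions.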

    1. Re:CISC seems to work well by Anonymous Coward · · Score: 0

      I believe it is more accurate to say that all desktop processors are RISC internally, with a CISC-to-RISC translator sitting between the instruction bytes and the execution core.

      Also:
      >The idea with RISC is that each instruction does only one thing (that isn't really true with all the vector math stuff these days).
      SSE 1 and 2 were supposed to be RISCy vector instructions. Intel couldn't get their compiler to take code that used regular float arrays and calculated, for instance, dot products, and automagically SSE it. So they started adding CISCy instructions (DPPS/DPPD in this case).
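
      For reference, the SSE4.1 dot-product instruction is DPPS (single precision; DPPD for doubles). Below is a behavioral sketch of what DPPS computes on four lanes, following Intel's documented use of the immediate byte as two 4-bit masks. It uses Python floats, which are double precision, so it won't reproduce single-precision rounding exactly.

```python
def dpps(a, b, imm8):
    """Behavioral model of SSE4.1 DPPS on four lanes.

    imm8 bits 4-7: which lane products are included in the sum.
    imm8 bits 0-3: which destination lanes receive the sum (others get 0.0).
    """
    products = [a[i] * b[i] if (imm8 >> (4 + i)) & 1 else 0.0 for i in range(4)]
    # Pairwise reduction: (lane0 + lane1) + (lane2 + lane3)
    total = (products[0] + products[1]) + (products[2] + products[3])
    return [total if (imm8 >> i) & 1 else 0.0 for i in range(4)]


x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 6.0, 7.0, 8.0]
print(dpps(x, y, 0xF1))  # [70.0, 0.0, 0.0, 0.0]: all four products, result in lane 0
print(dpps(x, y, 0x3F))  # [17.0, 17.0, 17.0, 17.0]: lanes 0-1 only, broadcast
```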

  24. 64KB L1 by dmomo · · Score: 1, Funny

    That ought to be enough instruction cache for anybody.

  25. Costing hundreds of thousands of dollars... by Laxitive · · Score: 1

    The codename for this processor, was "Ming Mecca".

    -Laxitive

  26. You really don't anymore by Sycraft-fu · · Score: 2, Interesting

    These days, compilers take care of almost everything. Things have gotten so complex that a programmer trying to do everything in assembly will probably do a worse job than a good compiler. Chips have many, many tools to solve their problems.

    That isn't to say it is never done, in some programs there may be some hand optimized assembly for various super speed critical functions. However even then it is most likely written in a high level language, compiled to assembly (you can order most compilers to do that), tuned and then put back in the program.

    Memory is cheap and compilers are powerful so assembly is just not as needed as it once was, at least on desktops/servers where you see these massive chips.

    1. Re:You really don't anymore by Anonymous Coward · · Score: 0

      It's people with your attitude who have given us all of these bloated programs, assuming that the compiler is just going to sprinkle pixie dust on your code and make it super optimized. No, the compiler doesn't almost always do the best job. In fact, it's trivially easy to find a whole host of examples of ICL, GCC and VC++ doing poor optimization or vectorization of code.

    2. Re:You really don't anymore by zeroduck · · Score: 1

      If you're going for code perfection every time. In the real world you have deadlines and have to maintain your code. Writing in assembly is going to make your code harder to port across platforms, should that happen say from PowerPC to x86.

      Not saying that it's never justified to use assembly. Within reason, of course.

    3. Re:You really don't anymore by Anonymous Coward · · Score: 0

      Those are all true, but they have little to do with my statements. The GGP is just repeating an oft-repeated but easily falsified claim about compilers doing great optimization, when it just isn't true. The fact that so many games, video encoders/decoders, etc. all need assembly optimizations shows that compilers aren't the super-optimizers they are claimed to be.

    4. Re:You really don't anymore by byteherder · · Score: 1

      If no one codes in assembly these days, who writes the compilers?

    5. Re:You really don't anymore by David+Greene · · Score: 1

      This would be a good discussion to have. I have seen many programmers attempt to "help" the compiler only to make some previously trivial transformation impossible. Can you post some code examples? I think it would be a great educational opportunity.

    6. Re:You really don't anymore by David+Greene · · Score: 1

      Production-level compilers haven't been written in assembler for at least a couple of decades.

    7. Re:You really don't anymore by TheRaven64 · · Score: 1

      And the fact that the majority of these programs are written in C or a higher-level language demonstrates that it's not feasible to give that much attention to the entire codebase. Modern compilers are starting to do some things that are incredibly hard for humans to do. One example is trace-based optimisation, where it follows a thread of execution across a number of subroutines then produces an optimised version by rethreading all of those branches. With dynamic recompilation, these functions can be spread over a number of different shared libraries, so this kind of optimisation would be basically impossible for an assembly programmer to do.

      Cases where a human can beat the compiler are becoming increasingly rare. The compiler is basically a massive pattern matching engine. Once you've identified an optimisation that a human can do, the compiler will do it everywhere. The cases where the human does better tend to be ones where the source language doesn't properly capture the semantics of the target program, or where the source language is a poor match for the target architecture (e.g. C on a vector unit).
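
      The "pattern matching engine" point in miniature: a toy optimizer over a made-up expression representation (ints, variable names, and (op, lhs, rhs) tuples). It applies constant folding, two algebraic identities, and strength reduction of multiplication by a power of two. Real compilers work on far richer IRs, but the principle is the same: once a rewrite rule is written, it fires everywhere the pattern appears.

```python
import operator

# Toy IR (hypothetical): an int constant, a variable name (str),
# or an (op, lhs, rhs) tuple.
FOLD = {"+": operator.add, "*": operator.mul, "<<": operator.lshift}


def optimize(expr):
    """Rewrite a toy expression tree bottom-up with a few peephole rules."""
    if not isinstance(expr, tuple):
        return expr                      # leaf: constant or variable
    op, lhs, rhs = expr[0], optimize(expr[1]), optimize(expr[2])
    # Constant folding: both operands known at compile time.
    if isinstance(lhs, int) and isinstance(rhs, int):
        return FOLD[op](lhs, rhs)
    # Algebraic identities: x + 0 -> x, x * 1 -> x.
    if op == "+" and rhs == 0:
        return lhs
    if op == "*" and rhs == 1:
        return lhs
    # Strength reduction: x * 2**k -> x << k.
    if op == "*" and isinstance(rhs, int) and rhs > 0 and rhs & (rhs - 1) == 0:
        return ("<<", lhs, rhs.bit_length() - 1)
    return (op, lhs, rhs)


expr = ("*", ("+", "x", ("*", 2, 0)), 8)   # (x + 2*0) * 8
print(optimize(expr))                      # ('<<', 'x', 3)
```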

      --
      I am TheRaven on Soylent News
    8. Re:You really don't anymore by TheRaven64 · · Score: 2, Informative

      You've almost certainly used some code compiled with a compiler that I've worked on, but I've hardly ever written assembly code, and none of it was in a compiler.

      --
      I am TheRaven on Soylent News
    9. Re:You really don't anymore by Anonymous Coward · · Score: 0

      Sure. Here is one showing how compilers handle optimization with SSE intrinsics. Notice how even ICC can get things wrong at times.

    10. Re:You really don't anymore by David+Greene · · Score: 2, Interesting

      A couple of things:

      In the first example, 'm' is not being moved to the constant data section. The constant vector being assigned to m is placed there. MSVC is missing the vectorization, not placement of constants into constant memory. You can see that it fetches the constant values from memory using scalar moves while gcc and icc use vector moves.

      I'm not familiar with MSVC switches but you might need to tell it explicitly to vectorize. I'm curious why you didn't try -ftree-vectorize with gcc, for example.

      Floating-point optimization is a tricky thing. Many compilers will be very conservative to retain bitwise equivalent results regardless of optimization level. Some will even go as far as maintaining bitwise equivalence between scalar and vector code. That can severely degrade optimization. Again, most compilers have a switch to enable "unsafe" floating-point optimization. This may be what's tripping up these compilers in some cases.
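
      Why reordering is "unsafe": floating-point addition is not associative, so a vectorizer that splits a sum across lanes (effectively regrouping the additions) can change the result. A minimal demonstration with IEEE 754 doubles:

```python
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # the order a scalar loop would use
right = a + (b + c)   # a regrouped order, as a vectorized reduction might use
print(left, right, left == right)  # 0.6000000000000001 0.6 False
```

      This is the kind of regrouping that switches such as GCC's -ffast-math permit and that conservative defaults forbid.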

      NaNs are also an issue with floating-point. The compiler is not allowed to eliminate anything which might raise an exception.
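
      Two small examples of simplifications a compiler cannot apply to IEEE floating point without special permission, because of NaNs, infinities and signed zeros. Python doesn't expose the IEEE exception flags (in hardware, inf * 0.0 also raises the invalid-operation flag), so this only shows the resulting values.

```python
import math

nan, inf = float("nan"), float("inf")

# "x == x" cannot be folded to True: it is False when x is NaN.
print(nan == nan)                       # False

# "x * 0.0" cannot be folded to 0.0: an infinity gives NaN,
# and a negative x gives -0.0 (which has a negative sign).
print(inf * 0.0)                        # nan
print(math.copysign(1.0, -3.0 * 0.0))   # -1.0
```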

      When encountering intrinsics, many compilers will do exactly as you say, as noted in the article. That's not a bug, it's a feature. When people use intrinsics, they usually are trying to hand-code something and often don't want the compiler to mess with it.

      Some of these tests (the shuffle one for example) are a little out-of-the-ordinary. Compiler developer time is at a premium and it's not worth doing these kinds of micro-optimizations if such code is never seen in the wild. That said, it's clear that some compilers (gcc, for example, and LLVM) do these sorts of things.

      On x86, it's often just fine to spill things to the stack and reload them. My studies show that the number of spills does not matter so much but rather what is spilled. So the number of loads/stores, while a gross indicator of performance, doesn't tell the whole story.

      The comparison test is, I think, one of those cases not worth optimizing. I can't recall ever seeing a vector compare where the operands are known statically. Doing that optimization would require loading static vectors of various combinations of 1s and 0s from memory. It is almost certainly faster to just do the compare. This isn't a missed optimization. In gcc's case it's the compiler doing what it should, regardless of what the programmer expects.

      Even so, these are interesting code examples. It would be neat to see what happens when we turn on -ftree-vectorize, use a newer gcc or try LLVM.

  27. So much for the 3.3GHz speed of light limit. by thegarbz · · Score: 1

    Really this article kind of makes all of last week's comments on the speed of light limiting the speed of processors to 3GHz a bit pointless doesn't it? Now I know in principle the discussions were correct, but this just goes to show that problems can be engineered around.

    1. Re:So much for the 3.3GHz speed of light limit. by Ecuador · · Score: 4, Informative

      The comments were about the fact that at 3GHz light travels 10cm per clock cycle, which limits how far apart two items on a bus can be if you want them to communicate within one clock cycle. There is no "light speed barrier" or anything of the sort; however, at these frequencies you design knowing that it will take measurable time for an electric signal to propagate. For example, for this particular system, whose core runs at 5.2GHz, if you try to send a signal to an external memory that is, say, 11-12cm away, it will take about two clock cycles just for the signal to travel the distance.
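
      The figures above are easy to check. Here is a short sketch with an optional velocity factor for signals slower than light in a vacuum, for example the 2/3 c rule of thumb for wires.

```python
C_VACUUM_M_PER_S = 299_792_458  # exact, by definition of the metre


def distance_per_cycle_cm(clock_hz, velocity_factor=1.0):
    """How far a signal travels in one clock period, in centimetres."""
    return C_VACUUM_M_PER_S * velocity_factor / clock_hz * 100


def cycles_to_travel(distance_cm, clock_hz, velocity_factor=1.0):
    """How many clock cycles a signal needs to cover distance_cm."""
    return distance_cm / distance_per_cycle_cm(clock_hz, velocity_factor)


print(f"{distance_per_cycle_cm(3.0e9):.1f} cm")       # 10.0 cm at 3 GHz
print(f"{distance_per_cycle_cm(5.2e9):.1f} cm")       # 5.8 cm at 5.2 GHz
print(f"{cycles_to_travel(12, 5.2e9):.1f} cycles")    # 2.1 cycles for 12 cm
print(f"{distance_per_cycle_cm(5.2e9, 2/3):.1f} cm")  # 3.8 cm at 2/3 c
```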

      --
      Violence is the last refuge of the incompetent. Polar Scope Align for iOS
    2. Re:So much for the 3.3GHz speed of light limit. by TheRaven64 · · Score: 1

      A lot of nonsense was spoken in that thread, but the issue is real. The time taken for light to travel is not yet a problem, but the skew is. Most communication between parts of a chip is parallel. If the connections are not precisely parallel, then signals arrive at slightly different times. The clock period therefore can't be shorter than the window needed for all of those signals to arrive in the same time slice. A similar limit also affects fibre optics, where total internal reflection causes the paths taken by sequential photons to have different lengths.

      CPU designers work around this with deep pipelining. The z10, which this replaces, had a 14 stage pipeline. Signals only need to propagate along one stage of the pipeline per cycle, reducing the distance a lot. The problem with this approach is that a branch misprediction is very expensive, because you may only find out about it when an instruction has gone all of the way along the pipeline, meaning that you need to throw away all of the work done since then. For the Pentium 4, this could mean discarding around 250 in-flight instructions, which was why the practical speed of the chip never came close to the theoretical speed.

      Making the chip run at a higher clock speed usually means making the pipeline stages smaller, which can make things slower overall, which is where this limit really comes from, in the design sense. There are also power and switching issues at the material level.
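
      A back-of-the-envelope model of why misprediction penalties eat into clock-speed gains. Every number here is hypothetical, and approximating the penalty by the pipeline depth is a simplification.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction, including branch-mispredict stalls."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% of instructions are branches, 5% of those mispredict.
shallow = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=14)
deep = effective_cpi(1.0, 0.20, 0.05, penalty_cycles=31)
print(round(shallow, 2), round(deep, 2))  # 1.14 1.31

# Time per instruction is CPI / clock, so the deeper design needs this much
# more clock speed just to match the shallower one:
print(f"{deep / shallow:.3f}x")  # 1.149x
```

      Under these made-up numbers, the deeper pipeline needs roughly a 15% higher clock just to break even, before counting any other costs of the extra stages.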

      --
      I am TheRaven on Soylent News
    3. Re:So much for the 3.3GHz speed of light limit. by Anonymous Coward · · Score: 0

      The electrical propagation delay and the uncertainty of board parasitics justify the external SRAM caches that are in the same PGA package. Most high-end CPUs have multiple dies in the PGA. It's been this way at least since the Pentium II.

    4. Re:So much for the 3.3GHz speed of light limit. by Anonymous Coward · · Score: 0

      Excellent comment Ecuador.

    5. Re:So much for the 3.3GHz speed of light limit. by jschlesinger · · Score: 1

      You have to allow for propagation delay and transmission delay. The propagation delay depends on the components. The rule of thumb for a signal travelling in a wire is that it travels at 2/3 of the speed of light in a vacuum. So at 5.2 GHz, travelling at 2/3 of 300 million metres/second = 200 million metres/second, a signal travels (2 x 10^8)/(5.2 x 10^9) metres, or (2 x 10^10)/(5.2 x 10^9) centimetres, which is about 4cm. If we factor in propagation delay on the chip, it is much smaller than this.

      --
      John F Schlesinger Temenos UK
  28. No news here (not apple) by Anonymous Coward · · Score: 1, Funny

    No news here. Everyone knows the only innovation going on in the big companies is in Apple nowadays. Move along...

  29. Fastest as in the clock frequency, or performance? by noidentity · · Score: 1

    The summary makes it sound like it's merely the one with the greatest clock frequency. Me RTFA is out of the question, this being Slashdot and all.

  30. So, does IBM license by the Core? by Wormfoud · · Score: 1

    If IBM licensed by the Core, at least there would be some business justification for developing this chip.

    1. Re:So, does IBM license by the Core? by chill · · Score: 1

      I take it you've never licensed an AS/400 (aka iSeries) box.

      --
      Learning HOW to think is more important than learning WHAT to think.
  31. A Little more detail here by valadaar · · Score: 2, Informative

    If you go directly to the IBM announcement, it describes the system in more detail than this linked article: http://www-03.ibm.com/press/us/en/pressrelease/32414.wss

    The New zEnterprise 196: "From a performance standpoint, the zEnterprise System is the most powerful commercial IBM system ever. The core server in the zEnterprise System -- called zEnterprise 196 -- contains 96 of the world's fastest, most powerful microprocessors, capable of executing more than 50 billion instructions per second. That's roughly 17,000 times more instructions than the Model 91, the high-end of IBM's popular System/360 family, could execute in 1970."

    17k x improvement in performance in 40 years? I suppose that is about right...

    1. Re:A Little more detail here by colinrichardday · · Score: 1

      Well 2^14=16,384, so that's only 14 doublings over 40 years. Under Moore's Law, one would expect about 27 doublings.
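
      The doubling arithmetic spelled out, taking IBM's "roughly 17,000 times" over 1970 to 2010 at face value and an 18-month doubling period as the usual reading of Moore's Law. Strictly, Moore's Law is about transistor counts, not instruction throughput, so the comparison is loose.

```python
import math

speedup = 17_000   # Model 91 (1970) -> zEnterprise 196 (2010), per IBM's release
years = 40

doublings = math.log2(speedup)
print(f"{doublings:.1f} doublings")                          # 14.1
print(f"one doubling every {years / doublings:.1f} years")   # 2.8
print(f"18-month doubling would give {years / 1.5:.1f}")     # 26.7
```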

  32. Is this chip a bargin? by walterbyrd · · Score: 1

    How much would it cost for me to put together a system with the same computing power, using off-the-shelf products, like a Xeon chip, or something? How long would it take for me to save $1 million in electricity, or whatever?

    1. Re:Is this chip a bargin? by Nerdfest · · Score: 1

      Don't forget to take into account that IBM charges per instruction in addition to software licence costs and the price of the machine.

    2. Re:Is this chip a bargin? by mikeee · · Score: 1

      A lot less; it's not a coincidence that IBM never publishes any industry-standard benchmarks on these things.

      Don't get me wrong; they're fast, but they're not magic, and they're not remotely competitive on a price/raw-compute-cycle basis. If that's what you want go get a beowulf cluster.

    3. Re:Is this chip a bargin? by davinep · · Score: 1

      Are you asking for hardware acquisition or Total Cost of Ownership? Acquisition cost: less, but what would it do??? TCO: A lot more, since you'd have to pay a fleet of developers to put together an OS (on commodity hardware) that mimics zVM and can run mainframe apps that are 40 years old. Hardware w/o apps just sucks power and produces nuthin'

    4. Re:Is this chip a bargin? by mlts · · Score: 1

      One can compare it to SANs. If someone wants a lot of disk capacity, they can build a bunch of Backblaze pods. However, one reason people buy SANs is for additional features, such as being able to have multiple computers access a LUN (the main machine and a backup box), deduplication, replication across a WAN, snapshot filesystems, and snapshot backups. Snapshots run in constant time, so popping a snapshot is the fastest way to back up a LUN on a machine with a very small backup window; the backup snapshot can then be dumped to tape for long-term archiving.

      Same with mainframes. One can make a Beowulf cluster of anything and end up with something faster than IBM's iron. However, is the MTBF the same? The more machines, the more things that can go wrong, so the more redundancy is needed.

      This isn't to say PCs are bad, but there are always trade-offs. A SAN and a couple of boxes in a cluster can give 99% uptime, but if someone needs five nines of reliability, they go mainframe, no questions asked, and they will pay dearly for those five nines.
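      To put the uptime figures in this comment in concrete terms, here is a minimal sketch that assumes independent failures and a 365.25-day year. The helper names are made up for illustration. Machines that must all be up ("in series") multiply their availabilities. That is why more boxes means more things that can go wrong. Redundant machines ("in parallel") are down only when every copy is down.

```python
import math

MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for a given availability between 0 and 1."""
    return (1 - availability) * MINUTES_PER_YEAR


def series(availabilities: list[float]) -> float:
    """All components must be up, so availabilities multiply."""
    return math.prod(availabilities)


def parallel(availabilities: list[float]) -> float:
    """Up if at least one redundant component is up (independent failures)."""
    return 1 - math.prod(1 - a for a in availabilities)


if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.5f}: {downtime_minutes_per_year(a):8.1f} minutes/year")
    # Ten 99.9%-available boxes that all have to be up at once:
    print(f"10 in series:  {series([0.999] * 10):.4f}")
    # Two 99%-available boxes, either of which can carry the load:
    print(f"2 in parallel: {parallel([0.99, 0.99]):.4f}")
```

      At 99% that is about 3.65 days of downtime a year. At five nines it is a little over five minutes.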

  33. Re:fastest first post ever? by Chris+Snook · · Score: 1

    Given that the Z architecture doesn't even have PCI, that would be a no.

    --
    There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
  34. let's see 3GHz $150 by Anonymous Coward · · Score: 0

    let's see, X2, oh ya, 6GHz at $300

    go IBM

  35. Re:fastest first post ever? by Yvan256 · · Score: 1

    Can we still enable "Coffee Crisp"?

  36. Re:fastest first post ever? by Larryish · · Score: 1

    in b4 beowulf cluster

  37. miniprocessor? by Anonymous Coward · · Score: 0

    When can we move on from microprocessor? Is there a definition?
    Let's get things moving, how about a kiloprocessor--or at least a milliprocessor.

  38. re by Anonymous Coward · · Score: 0

    Yeah, shitty IBM utilities on uber hardware. I'll take uber utilities on great hardware. Good ol' DEC.

  39. And to think that back in 1980... by jbarr · · Score: 1

    ...my trusty VIC-20 was clocked at a mere 1.02 MHz.

    It's truly amazing how far we've come in so short a time.

    (Well, maybe not so short for you whippersnappers...)

    --
    My mom always said, "Jim, you're 1 in a million." Given the current population, there are 7000 of me. God help us all!
    1. Re:And to think that back in 1980... by TheLink · · Score: 1

      Heh, nowadays a perl/python/ruby program can increment a number faster than a 1MHz 6502 can (fastest instruction takes 2 cycles).

      Good times good times ;).
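      For scale, here is a rough comparison. It is a sketch that assumes a 1 MHz clock and the 2-cycle minimum instruction time mentioned above (INX, which increments the X register, is one such 2-cycle instruction). The Python side is just a timed loop, so the number it reports depends entirely on the machine and interpreter. The function names are illustrative.

```python
import time


def mos6502_max_increments_per_second(clock_hz: int = 1_000_000,
                                      cycles_per_increment: int = 2) -> int:
    """Upper bound on register increments per second for a 6502."""
    return clock_hz // cycles_per_increment


def python_increments_per_second(n: int = 1_000_000) -> float:
    """Time a plain Python loop that increments a counter n times."""
    counter = 0
    start = time.perf_counter()
    for _ in range(n):
        counter += 1
    elapsed = time.perf_counter() - start
    return n / elapsed


if __name__ == "__main__":
    print(f"6502 @ 1 MHz: {mos6502_max_increments_per_second():,} increments/s")
    print(f"Python loop:  {python_increments_per_second():,.0f} increments/s")
```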

      --
  40. art by jDeepbeep · · Score: 1

    Be that as it may, there are also a few of us who simply enjoy the art of assembler. Sue me; I'm a romantic.

    --
    Reply to That ||
    1. Re:art by jellomizer · · Score: 1

      Which is all fine and good for geekiness... However, if you are going to do things in the real world and you find that you take 50% more time to make your program run 10% slower than it did before, just because you can't accept that there are better ways of doing things for real work, then you should probably find new work.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  41. But Will It Run by Anonymous Coward · · Score: 0

    Windows EVER?

    Yours In Osh,
    Kilgore Trout

  42. And IBM is no dinosaur either. by Anonymous Coward · · Score: 0

    Lots of people have the impression that IBM is some kind of old yesteryear has-been dinosaur of a company that hasn't died yet just because it was so huge and market-dominating at one time.

    They're not. They're still doing real *computer science* at IBM. They might not be the 800 lb gorilla of the market that they once were anymore, but they're still quite relevant, and as to the S/360 architecture, there's nothing wrong with building a solid foundation and sticking to it. Especially if you keep applying computer science to it instead of some marketing guy's ideas of "innovation" to ram down customers' throats.

  43. Re:fastest first post ever? by mikeee · · Score: 1

    Actually, I believe it does now. 3D driver support is even worse than Linux, though. :)

  44. Re:fastest first post ever? by ledow · · Score: 1, Informative

    You could run Linux on it. Then QEMU/WINE combo on that. Then Crysis on that. It'd still probably only get you back to the 2.5-ish-GHz of your average desktop, though.

  45. Re:fastest first post ever? by mark72005 · · Score: 2, Funny

    I could never... Aero is... the best thing ever... it's so amazing and useful. It improves the look and feel so much.

    Wait, do I have it turned on or off right now?

  46. Re:Yeah, I read about this by mark72005 · · Score: 1

    If I could mod you flamebait, I would.

  47. Will it need liquid helium cooling? by Drakkenmensch · · Score: 1

    What I'd love to know is how they overcame the 3 GHz overheat barrier that has prevented this sort of processor speed until now.

  48. No programmers over 50? by Terje+Mathisen · · Score: 2, Informative

    I guess I'm a counterexample then:

    I'm 53.

    I believe (hope?) most people who know me would say that I'm still a pretty good programmer.

    Terje

    --
    "almost all programming can be viewed as an exercise in caching"
    1. Re:No programmers over 50? by Anonymous Coward · · Score: 0

      I've fixed your code more than a few times, and no, you're not all that good...

    2. Re:No programmers over 50? by tibman · · Score: 1

      *knock knock*
      "Hello Terje, there was a paperwork mixup a few years back. We're here to fix that."

      --
      http://soylentnews.org/~tibman
  49. Speed specialism = iteration! by MessyBlob · · Score: 1

    I'll hazard a guess that clock speed excels in one particular case: tight-loop iteration. You can't do that with parallelism (ignoring some fancy pipelining to get part-way there). The fastest way to get the 1 millionth result in a no-shortcut iterative sequence is to get the loop processing at the highest frequency possible.
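    Here is a minimal sketch of the kind of loop described here. It uses iterated SHA-256 as an example of a sequence with no known shortcut. Each step needs the previous step's output, so extra cores do not help, and the wall-clock time is set by how fast one core gets through each step. A linear congruential generator, by contrast, does have a shortcut: you can jump ahead n steps with modular arithmetic. The function name is illustrative.

```python
import hashlib


def iterate_sha256(seed: bytes, n: int) -> bytes:
    """Apply SHA-256 n times; step k cannot start until step k-1 is done."""
    state = seed
    for _ in range(n):
        state = hashlib.sha256(state).digest()
    return state


if __name__ == "__main__":
    # The 1,000,000th result: a million strictly sequential hash calls.
    print(iterate_sha256(b"slashdot", 1_000_000).hex())
```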

  50. hmm, this could be huge by mbclimber · · Score: 1

    The brain is estimated to have about 60 Trillion synapses (bio-electrical connections between neurons). Neuroscientists have suggested that to simulate a human brain you would need a computer with at least that many neurons. It seems that ~43,000 of these chips would do the trick!

    1. Re:hmm, this could be huge by sexconker · · Score: 1

      The brain is estimated to have about 60 Trillion synapses (bio-electrical connections between neurons). Neuroscientists have suggested that to simulate a human brain you would need a computer with at least that many neurons. It seems that ~43,000 of these chips would do the trick!

      It seems to me that 43,000 of these chips still contain 0 neurons.

    2. Re:hmm, this could be huge by mbclimber · · Score: 1

      A synapse is an intercellular junction that allows for the propagation of binary electrical information between neurons. Like transistors, synapses have threshold voltages that need to be reached before the information can be sent. In a sense, synapses = transistors. A single neuron in the brain may have on the order of several hundred to tens of thousands of synapses. Considering there are about 100 billion neurons in the human brain, the number of synapses adds up to a daunting number. Here's where I got the info: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1188961/pdf/jpn00078-0049a.pdf and http://faculty.washington.edu/chudler/facts.html
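      This takes the subthread's figures at face value: 60 trillion synapses, 1.4 billion transistors per z196, and the comparison's own assumption of one transistor per synapse. On that basis the chip count works out to roughly 43,000. This is only a back-of-the-envelope sketch, and the function name is made up.

```python
def chips_needed(synapses: int, transistors_per_chip: int) -> int:
    """Chips required if each transistor stands in for one synapse (ceiling division)."""
    return -(-synapses // transistors_per_chip)


if __name__ == "__main__":
    synapses = 60 * 10**12           # "about 60 trillion synapses"
    transistors = 1_400_000_000      # z196: 1.4 billion transistors
    print(f"{chips_needed(synapses, transistors):,} chips")
```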

  51. Bad Golf? by thogard · · Score: 2, Funny

    Had a golf game ended differently, would we be seeing these in power macs?

    1. Re:Bad Golf? by Guy+Harris · · Score: 1

      Had a golf game ended differently, would we be seeing these in power macs?

      Only for the PowerMacs that need to run big enterprise applications and sit in a machine room connected to disk farms with FICON. :-) These aren't Power-architecture (POWER, PowerPC, blah blah blah) chips, they're z/Architecture chips.

    2. Re:Bad Golf? by Anonymous Coward · · Score: 0

      If you look at how much it costs: probably yes!

  52. Only two Crypto co-processors? by Paracelcus · · Score: 1

    Seems like there should be more.

    --
    I killed da wabbit -Elmer Fudd
  53. Re:Yeah, I read about this by Mitchell314 · · Score: 1

    Way to break the combo, jerk.

    --
    I read TFA and all I got was this lousy cookie
  54. Minis and Mainframes are still very popular and by Shivetya · · Score: 1

    they are very much needed.

    We have deployed two IBM 570-series minis and two 595s. These range from 24- to 64-processor systems with 30 to 50 TB of disk storage. We will deploy a 770? with nearly 100 TB of storage backed by, I think, 96 processors. There are no Intel/AMD servers that meet our needs with the software and reliability we need. Coworkers in other departments might make jokes about our "old school" hardware, but even they are shocked by how little downtime we have: two outages in twelve years, excluding new-machine upgrades, during which we were of course switched over to our offsite systems. The first failure happened because the installers wired the power connections wrong, so a site UPS failure dropped the box. The second was a site cooling failure, and our machines were the last to shut down; the Intel/AMD heat furnaces (there's no other polite way to describe them) quickly couldn't cope with the hot computer room.

    While I have great respect for the server group, their hardware just cannot compare. They can spend similar money, but once you reach certain sizes, minis (AIX or iSeries) become much easier to deal with. Hell, with 72 iSeries machines we had only 3 admins for the whole lot; we have more than that for just one AIX machine, let alone the "department" that supports the servers (think company-wide data sharing) or the business intelligence setups.

    At the same time, here is the telling point: we are held to a standard the server groups are not. We have to schedule outages (even with an offsite mirror) just to do upgrades. The other groups get one weekend per quarter; we can't even get that. We serve too many users, and 24x7 really means 24x365. Lose a file server and people get mad, but if we have a slowdown we get calls from up high.

    So don't just focus on CPUs; that is the game of PC geeks. We benchmark on work performed and downtime. Every component is intelligent, so we don't lose processing power to driving people's sessions, disk access, and such. FWIW, we host Linux too, so there is some other value for us.

    --
    * Winners compare their achievements to their goals, losers compare theirs to that of others.
  55. Re:fastest first post ever? by sparrowhead · · Score: 1

    As a matter of fact, Crysis is one of the killer applications the zSeries has to offer... Financial Crisis, that is...

  56. microprocessor? by FreeBSD+evangelist · · Score: 1

    The z196 contains 1.4 billion transistors on a chip measuring 512 square millimeters fabricated on 45-nm PD SOI technology.

    At what point would we stop calling it a microprocessor?

    1. Re:microprocessor? by Bitmanhome · · Score: 1

      At no point would we stop calling these microprocessors. The term is to distinguish them from early processors made from thousands of discrete components. Early processors were made from large numbers of tubes. Later, they were made from discrete transistors and gate arrays. Once all significant components could be integrated into a single chip, they became "microprocessors". Since we're still at that stage, we still call them that.

      --
      Not that this wasn't entirely predictable.
  57. Missing the Main Point by fast+turtle · · Score: 0

    This is a Mainframe CPU, not a Server/Desktop/Mobile Chip. This means its purpose is to be as expensive as possible while costing an Arm/Leg in maintenance fees.

    --
    Mod me up/Mod me down: I won't frown as I've no crown
  58. Put that in your iPad by FreeBSD+evangelist · · Score: 2, Funny

    From TFA:

    IBM also previously claimed the title of fastest microprocessor with the POWER6 chip, which ran at speeds of up to 4.6 to 4.7 GHz, and its own z10, a 2008 chip which ran at speeds of up to 4.4 GHz.

    I seem to recall that one of the official reasons Apple gave for the switch from Power to Intel was that IBM couldn't/wouldn't deliver a fast enough processor.

    1. Re:Put that in your iPad by Anonymous Coward · · Score: 0

      Couldn't deliver a fast enough processor for the price that Apple was willing to pay. Apple was apparently shocked to find out that a custom designed/fabbed chip costs considerably more than a mass-produced one, so they went with mass-produced and tried to spin it as IBM's problem.

    2. Re:Put that in your iPad by Anonymous Coward · · Score: 0

      Energy efficiency and heat were the issue there, not speed.

    3. Re:Put that in your iPad by Guy+Harris · · Score: 1

      Couldn't deliver a fast enough processor for the price that Apple was willing to pay.

      ...and for the applications Apple wanted, such as, say, stuffing inside a notebook computer. They did manage to stuff a PPC 970 inside an iMac, but it never made it into a PowerBook.

  59. You don't *want* it on your desk by billstewart · · Score: 1

    Mainframes are designed to do huge amounts of I/O supporting lots of processing in parallel and running big databases with whatever the current generation's definition of huge quantities of disk storage is. That's not the kind of computer you put on top of your desk, either in a workstation or a laptop. It's the kind of computer you put in a big air-conditioned sound-proofed room down the hall. If you want to run graphics-oriented applications using it to do the data processing, that's fine - do your graphics on a graphics-oriented workstation on your desk, and crunch the data on the mainframe, using some appropriate protocol to connect the two.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  60. RISC vs CISC by KingFrog · · Score: 1

    Wow...the author of the article thinks that the difference between RISC and CISC is how much memory the programs can take up? He needs to get a serious education in computers before writing about them.

  61. Virtualization RECENT? Ha! by KingFrog · · Score: 1

    Virtualization is a RECENT mainframe trend? Only if you count 1970 as recent. Virtualization is a RECENT trend in toy computers, not mainframes. They did it first. And as to whether it's faster or slower than a PC... it's really workload dependent. You have 5000 employees who need to access the same data, say your airline reservation system? Try that on a farm of Intel servers and you're out of business! You wanna play Crysis? I wouldn't try that on your z10 anytime soon.

  62. Re:Yeah, I read about this by arkane1234 · · Score: 1

    Is Slash-D some cool hip way of saying slashdot today?

    --
    -- This space for lease, low setup fee, inquire within!
  63. from z10 to z196 by bugs2squash · · Score: 1

    In two years, that's a new revision roughly every four days. I'd wait a couple of weeks until the z200 comes out.

    --
    Nullius in verba
  64. snoop dogg's pimped out machine by ruthless+reader · · Score: 1

    Finally... a processor that can power Snoop's pimped-out machine with Norton installed

  65. The previous compiler by Sycraft-fu · · Score: 1

    You build the new one using the old one. Visual Studio 2010 was originally written in Visual Studio 2008. Now it is written in itself: the production compiler is used to update the code for that compiler.

  66. My favorite 370 instruction by Anynomous+Coward · · Score: 1

    I sure hope one of them instructions is 'Insert Characters Under Mask'.
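    For anyone who never met it: Insert Characters Under Mask (ICM) loads selected bytes of a 32-bit register from consecutive bytes in storage. A 4-bit mask chooses which register bytes get replaced, and the condition code is set from the inserted bits. A simplified Python sketch of that behaviour follows. It models only the 32-bit register, with the storage operand as a byte string, and ignores addressing and any 64-bit z/Architecture variants. The function name is hypothetical.

```python
def insert_characters_under_mask(reg: int, mask: int, storage: bytes) -> tuple[int, int]:
    """Simplified ICM: returns (new_register_value, condition_code).

    The leftmost mask bit selects the leftmost (most significant) register
    byte, and so on. Selected bytes are filled, left to right, from
    consecutive storage bytes.
    """
    if not 0 <= mask <= 0xF:
        raise ValueError("mask must be a 4-bit value")
    needed = bin(mask).count("1")
    if len(storage) < needed:
        raise ValueError(f"mask selects {needed} bytes, storage has {len(storage)}")

    result = reg & 0xFFFFFFFF
    inserted = []
    src = 0
    for i in range(4):
        if mask & (0b1000 >> i):
            shift = 8 * (3 - i)
            byte = storage[src]
            src += 1
            # Clear the selected register byte, then drop the storage byte in.
            result = (result & ~(0xFF << shift) & 0xFFFFFFFF) | (byte << shift)
            inserted.append(byte)

    # Condition code: 0 = all inserted bits zero (or mask zero),
    # 1 = leftmost inserted bit is one, 2 = leftmost bit zero but not all zero.
    if not any(inserted):
        cc = 0
    elif inserted[0] & 0x80:
        cc = 1
    else:
        cc = 2
    return result, cc


if __name__ == "__main__":
    value, cc = insert_characters_under_mask(0x11223344, 0b0101, b"\xAA\xBB")
    print(hex(value), cc)  # prints 0x11aa33bb 1
```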

    --
    I'm not a coward by any name.
  67. "ever"? or rather, "So far"? by Anonymous Coward · · Score: 0

    "The fastest X ever" sounds so final, as in "for ever", as if, hey, this might be IT. And of course this kind of record will inevitably be broken, and probably sooner rather than later.

    Maybe these kinds of stories should have headlines like, "IBM Unveils the Fastest Processor . . . for the Time Being." or " . . . So Far."

  68. Re:Yeah, I read about this by aquila.solo · · Score: 1

    I would have thought it would be "Slash-dizzle."

    And remember, kids: "Hack is Wack!" :-P

  69. The OC by DarthVain · · Score: 1

    Yeah, but will it Overclock?

  70. Re:fastest first post ever? by Chris+Snook · · Score: 1

    It would get you nowhere near that. A substantial fraction of any mainframe architecture's instruction set is emulated in software. The actual MIPS ratings are way below the MHz ratings, whereas on most superscalar architectures, MIPS exceeds MHz.

    Once you've paid that penalty as well as the qemu penalty, you're getting down to somewhere in the Doom/Quake I range, with no hardware acceleration.
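    Here is a rough way to put numbers on the MIPS-versus-MHz point, using the press-release figures quoted near the top of the thread (more than 50 billion instructions per second from 96 processors). This sketch assumes all 96 cores run at 5.2 GHz and count toward that figure, which is a simplification. Mainframe MIPS figures are also generally workload-based capacity ratings rather than raw instruction counts. Treat the result as an order-of-magnitude illustration only. The function name is illustrative.

```python
def instructions_per_cycle(instructions_per_second: float, cores: int, clock_hz: float) -> float:
    """Average instructions per core per clock cycle."""
    return instructions_per_second / (cores * clock_hz)


if __name__ == "__main__":
    ipc = instructions_per_cycle(50e9, cores=96, clock_hz=5.2e9)
    print(f"about {ipc:.2f} instructions per cycle per core")
```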

    --
    There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.