IBM Mainframe Running World's Fastest Commercial Processor
dcblogs writes "IBM's new mainframe includes a 5.5-GHz processor, which may be the world's fastest commercial processor, say analysts. This new system, the zEnterprise EC12, can also support more than 6-TB of flash memory to help speed data processing. The latest chip has six cores, up from four in the prior generation two years ago. But Jeff Frey, the CTO of the System Z platform, says they aren't trading off single-thread performance in the mainframe with the additional cores. There are still many customers who have applications that execute processes serially, such as batch applications, he said. This latest chip was produced at 32 nanometers, versus 45 nanometers in the earlier system. This smaller size allows more cache on the chip, in this case 33% more Level-2 cache. The system has doubled the L3 and L4 cache over the prior generation."
https://en.wikipedia.org/wiki/IBM_z196_(microprocessor)
Palm trees and 8
How does the L4 cache in these processors work? Generally going to anything off die is going to induce a major latency penalty due to the need to go through a driver stage which can handle outside interference. How can they make the L4 cache fast enough that its small size doesn't make it basically pointless versus just going to main memory?
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
That's a Ming Mecca chip. Those aren't even declassified yet!
So, you're comparing a ridiculous configuration of a nitrogen-cooled, over-clocked processor that will maybe run long enough to get a screen shot of it running, to a commercial processor that is designed to run at that speed non-stop for years and years? Yeah, that makes sense.
CPUs have not accessed main memory synchronously in decades. There are many hundreds of cycles lost if the processor stalls on a RAM access, not just from the length of the wiring but the addressing logic too. In fact, modern CPUs don't do word-level access to RAM, but rather pull in whole cache lines in a more packetized memory access protocol. Even in a multi-CPU SMP system, they don't actually communicate through system RAM anymore, but rather communicate CPU-to-CPU with a cache coherency protocol that provides the illusion of a shared system RAM. Each CPU really has its own set of local RAM behind its own cache and on-chip memory controller.
Even the L2 or L3 caches are unable to keep up with the CPU, but they are still significantly faster than system RAM, so they still help when the working set can fit there.
Mainframes run a surprising amount of critical workloads in the real world. They're vastly different than open systems, but they can be kept running through almost anything, if you're willing to spend enough money.
They claimed "faster", not "more powerful"; clock frequency is the only thing they need to reference for that claim.
Oh no... it's the future.
No, they aren't claiming that. Clock speed is still extremely important, though, and nobody else except IBM has figured out how to hit these high gigahertz numbers, much less within power and cooling constraints. What's all the more impressive is that IBM does it at mainframe service qualities, i.e. this machine runs continuously at 5.5 GHz without shutting off cores, without "burst" mode, and without weird/exotic stuff like cryogenics that might keep a chip running long enough for a screenshot. It's just balls out performance on every thread -- and there's definitely a market for that. Nobody else is left doing this computer engineering, bless them. Also check their cache sizes (obscenely huge), out-of-order execution, pipelining, crypto and decimal floating point in every core, extremely complex instructions like transactional execution.... This z CPU is a gorgeous piece of engineering in every way. And no, you can't run an entire large bank (for example) on your laptop.
There are some engineering tricks I've seen IBM use which are pretty cool. Take the POWER7 CPU line for example. You can disable every other core, allowing the cores that are operational to use the cache of the cores that are not on. This not only gives more cache, but also allows a higher clock speed. Of course, this feature is mainly used to deal with applications which are licensed by the hardware cores present.
Mainframes are probably one of the most underutilized tools out there. However, for performance per square foot in the data center, they are hard to beat these days.
Of course, the biggest advantage: It isn't x86. With virtually everything running on the x86 or amd64 platform, all it would take is an undocumented instruction similar to the F0 0F bug that happens to give ring 0 access, and virtually the whole world is vulnerable with absolutely zero way of protecting against it except reaching for the network cable or power switch.
Laptop chips? Please. We're moving away from that. The tradition these days is to compare everything to your cell phone -- your cell phone beats the pants off a Cray, and so on.
To add to this, the Sandy Bridge has an L1 latency of 4 or 5 cycles (depending on access mode), the L2's latency is 12 cycles, and the L3's latency is 46 cycles plus the response time of the memory chips (typically between 60 ns and 70 ns).
These chips make up for the high latencies by having many instructions being executed simultaneously, so if one dependency chain completely stalls out on a cache miss any other dependency chains can still fill up the execution units keeping the processor just as busy as if there were no stall at all until everything left in the pipeline is dependent on the result of the stalled out operation.
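To put rough numbers on those latencies, here is a minimal Python sketch of the expected cost of a load if every access were served serially. The cycle counts are the ones quoted above; the 3.0 GHz clock, the 65 ns DRAM figure, the hit rates, and the function name are all assumptions picked for illustration, and reading "46 cycles plus the memory response time" as the cost of going all the way to RAM is my interpretation. It ignores the overlap from out-of-order execution described above, so treat it as a pessimistic estimate rather than a measurement.

```python
# Latencies in cycles when a load is served by L1, L2 or L3 (figures quoted above).
L1_CYCLES, L2_CYCLES, L3_CYCLES = 4, 12, 46

# Assumptions, not from the thread: a 3.0 GHz core clock and a 65 ns DRAM response.
CLOCK_GHZ = 3.0
DRAM_NS = 65.0
# A load that misses every cache pays the L3 latency plus the DRAM response time.
MEMORY_CYCLES = L3_CYCLES + DRAM_NS * CLOCK_GHZ  # ns * (cycles per ns)


def average_load_latency(latencies, hit_rates):
    """Expected cycles per load.

    latencies: cycles when served by each cache level, with main memory last.
    hit_rates: local hit rate of each cache level (one fewer than latencies).
    """
    expected = 0.0
    reach = 1.0  # probability that a load gets as far as this level
    for latency, hit_rate in zip(latencies, hit_rates):
        expected += reach * hit_rate * latency
        reach *= 1.0 - hit_rate
    # Whatever missed every cache level is served by main memory.
    expected += reach * latencies[-1]
    return expected


latencies = [L1_CYCLES, L2_CYCLES, L3_CYCLES, MEMORY_CYCLES]
assumed_hit_rates = [0.95, 0.80, 0.75]  # hypothetical workload
print(f"{average_load_latency(latencies, assumed_hit_rates):.2f} cycles per load")
```

Even with hit rates this generous, the rare trips to RAM contribute a noticeable share of the average, which is the argument for stacking on more cache levels.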
"His name was James Damore."
No, that's not a correct supposition -- quite the opposite, actually. All processors, including Intel X86, use microcode (or what IBM calls millicode) to a degree. IBM knows it well. After all, they invented microcode/millicode in the System/360 in 1965. But IBM uses microcode comparatively less nowadays than other processor architectures. The vast majority of zEC12 instructions are implemented entirely in hardware, including IEEE-754-2008 decimal floating point as an example. There's some really, really interesting new stuff in the instruction set, like the first transactional memory ("transaction execution facility") instructions in a commercial server, and some "feedback" instructions that can tell Java applications/the JVM how to dynamically tune itself in a live running environment. Very cutting edge -- so cutting edge I've got to crack open some engineering manuals to try to figure out what they've done, although they probably need to write those manuals.
They make very few thousands of the really high-end stuff like this. You can bet every dollar you have that these will execute faster. Multinational corporations don't shell out $20M for a mainframe upgrade without knowing exactly what they're getting. L3 cache is 48GB. <== not a typo. There's an outboard L4 cache that's much larger. They've got bandwidth that can feed that beast: they were built to handle TB/s of just I/O bandwidth, not including CPU access to the data, something like a decade ago.
As always, all IMO. Insert "I think" everywhere grammatically possible.
OK, here's a benchmark. You're welcome to try running an entire large bank (for example) on one server -- your choice. OK, two servers: I'll allow you one additional for off-site disaster recovery of all development, test, and production workloads, including concurrent batch and online, for all the bank's security zones. Choose wisely, Grasshopper.
I'll believe their claims when I see some test results they can back it up with.
You have a point, but you missed it. At least talk in terms of modern workloads. These machines are running over 1,500 MIPS. Your talk of systems running 25-30 MIPS is silly. If your 114 is running at 25 MIPS it is broken. Really, really broken.
No single-processor desktop CPU can handle that. Not even dual processors. Hercules is nowhere near the performance of a modern Z series mainframe.
Can you build a server complex with more MIPS for less money? Absolutely. The question becomes what is the cost and risk of migrating that legacy application.
Yes, you could do that. Multiple images, actually. And that's basically what these servers do automatically. There are 4 levels of cache, main memory (which is RAID-protected actually, called RAIM -- only IBM does that), and there's another optional level of directly processor-addressable memory called Flash Express which is nonvolatile -- that's new, too. It works particularly well for fast paging, in-memory databases, memory dumps, etc. Then you go into fiber-attached and heavily cached solid state disk, fast disk, nearline disk, tape libraries. There are a lot of storage layers, and they're all very big.
What exactly are you basing your claims on? Just pulling things out of thin air?
Here are some things that IBM's customers care about. Where are the Core i7 Extreme numbers for these?
How many CICS transactions can I process per second? How many IMS updates? How about DB2 transactions? How many SSL transactions? What differences are there in performance for on-line vs batch processing? Can I tune the system to maximize performance for my particular workload?
So you don't like my benchmark then and want another benchmark? OK. I chose a perfectly reasonable benchmark: number of servers (X) to deliver a particular real-world business outcome, where smaller X is better. A benchmark is simply a measurement to assess particular criteria (such as X) against a particular outcome (such as running a bank). I can agree that an IBM zEnterprise EC12 server is not the answer to every IT problem. It is, however, the answer to many. And if you can't agree to that, then you simply have more to learn. (How exciting!)
No, that's not a correct supposition -- quite the opposite, actually. All processors, including Intel X86, use microcode (or what IBM calls millicode) to a degree.
At least from what I've read about the past few generations of S/3x0 chips, millicode is more like PALcode on the Alpha processor than like traditional microcode, i.e. it's a combination of regular machine code and processor-specific instructions that access specialized registers etc., running in a special processor mode with (presumably) fast entry and exit, support for said processor-specific instructions (which presumably trap in both "problem state", i.e. user mode, and "supervisor state", i.e. kernel mode), and its own bank of general-purpose registers (part of the "fast entry and exit"). Instructions implemented in millicode trap to millicode routines that implement them.
What IBM called "microcode" rather than "millicode" was implemented using processor-specific instructions completely different from the machine's instruction set (instructions often having fields that directly controlled gates).
(And then there's System/38 and the pre-PowerPC AS/400, where the processor instruction set was a CISC instruction set implemented using microcode, and where the compilers available to customers generated code in an extremely CISCy instruction set that the low levels of the OS translated into machine code and ran. For legal reasons - they didn't want to have to be required to make the low-level OS code available to "plug-compatible manufacturers", i.e. cloners - they not only called the microcode that implemented the processor instruction set "microcode" ("horizontal microcode", as it probably was "fields directly control gates"-style horizontal microcode), they also called the aforementioned low level OS code "microcode" as well, even though it ran from main memory and its instruction set was the instruction set that was actually executed in application code ("vertical microcode"), and had the group working on that code report to a manager in the hardware group. See Frank Soltis's Inside the AS/400.)
IBM knows it well. After all, they invented microcode/millicode in the System/360 in 1965.
"Invented", no; the paper generally considered to have introduced the concept was "Microprogramming and the Design of the Control Circuits in an Electronic Digital Computer", by Maurice Wilkes and J. B. Stringer, from 1953. S/360 may have been the first line of computers to use microcode in most of the processors (S/360 Model 75 was, I think, implemented completely in hardwired logic).
Very cutting edge -- so cutting edge I've got to crack open some engineering manuals to try to figure out what they've done, although they probably need to write those manuals.
Well, for the previous generation, Volume 56, Issue 1.2 of the IBM Journal of Research and Development has some papers on the z196, but, alas, not for free online. They may publish an issue on the zEC12 at some point.
It may not be just you. But I think a lot of people really have no idea of just how many mainframes are still chugging away doing what they've always done.
My wife does outsourced SAN storage, and they still have a couple of clients with big iron running.
Every couple of years when everybody has forgotten about the machines, an IBM tech will call up and say that the machine has phoned home and has a part that needs to be swapped out and that he needs to go onsite. Which usually leads to several hours of people trying to remember what it is and where it is (except the guys who work in the data center, who can't miss it).
I've worked in several places that have had mainframes for literally decades. And I've even worked on a project or two which tried to replace ancient, purpose-built software with some shiny new stuff. In the cases I've seen, after spending a few years and a few million dollars ... they still can't replace the mainframe and scrap the project.
I knew someone in the early 2000's who had retired from his job with a full pension, and was back as a consultant making at least 3x his old salary because they no longer could find someone who knew the machines and the software like he did.
Mainframes haven't gone away. Not by any stretch. And I bet this one still runs the stuff from the IBM 360 days quite nicely.
Lost at C:>. Found at C.
Well, no. Right tool for the right job and all. You can buy the world's most expensive Olympic racing bicycle, but it won't haul an Airbus fuselage to its factory. There are many problems that cannot be solved with infinite amounts of money wrongly applied.
L3 is 48MB, (see p. 43), not GB as The Register had it, thanks for noticing that.
As always, all IMO. Insert "I think" everywhere grammatically possible.
Hmmm, six cores with each running at 1 ghz equals 6 ghz with a 5% overhead makes it 5.7 ghz maximum... IBM Marketing!!!!
And the published information supporting your assumption that the cores are only running at 1GHz, and the 5.7 GHz comes from multiplying the clock rate by the number of cores and subtracting 5% as overhead, rather than each core truly running at 5.7 GHz, is?
I'm quite sure that for the applications people actually use mainframes for, you're utterly wrong.
Not only do they scale massively higher in terms of throughput, they also manage to do it with obscene uptimes (measured in years) and reliability nothing can compare to.
For certain kinds of applications, what you say is largely true. But at the huge end for things like banking, financial transactions, and airline reservations ... there's really no comparison.
I've worked on projects trying to do exactly this. And I've seen a couple of them fail.
Trying to map out all of the use cases for software which is mission critical and has been around since the 60's can actually prove to be exceedingly challenging if not impossible.
I'm just not convinced that for the kinds of applications and environments where people will run mainframes that what you suggest would give the same performance or scalability as a big giant mainframe. There just seems to be something missing from that picture, and to me it's the sheer volume of stuff these things handle. Certainly not even in the same category as what you call a midrange desktop.
Lost at C:>. Found at C.
OK, let's put some of this stupidity to rest.
First, nobody who knows anything uses MIPS to compare performance between two different architectures. MIPS is only marginally useful in the best of conditions, and even then is only useful as a relative measure between two machines of the same architecture running the same workload.
Second, Core i7 servers execute 178 BILLION instructions every second, on average? Seriously? 80 instructions per clock cycle, sustained? Bullshit.
Third, your nice shiny rack of Core i7 servers doesn't mean anything if it can't run your software.
Fourth, the actual performance of a Z114 processor is around 780 MIPS, not 26. So why do they have that 26 MIPS 'dialed down' model? Because some customer asked for it. Why would a customer pay $800K for a 780 MIPS machine when he only has 26 MIPS of workload? Why would the customer pay software licensing fees for a 780 MIPS machine when he only has 26 MIPS of workload?
Fifth, 'your experience' with IBM mainframes is nonexistent, or you wouldn't be making these stupid mistakes and claims.
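For anyone who wants to check the arithmetic behind the second point, here is a minimal Python sketch. The function names are made up for illustration, and the clock rate it produces is only what the two quoted figures (178 billion instructions per second, 80 instructions per cycle) imply when taken together, not a published spec.

```python
def implied_clock_ghz(instructions_per_second, instructions_per_cycle):
    """Clock rate (GHz) needed to reach a given rate at a given sustained IPC."""
    return instructions_per_second / instructions_per_cycle / 1e9


def implied_ipc(instructions_per_second, clock_ghz):
    """Sustained instructions per cycle needed to reach a given rate at a given clock."""
    return instructions_per_second / (clock_ghz * 1e9)


claimed_rate = 178e9  # instructions per second, the figure being disputed
print(f"{implied_clock_ghz(claimed_rate, 80):.3f} GHz implied by 80 instructions/cycle")
print(f"{implied_ipc(claimed_rate, 3.5):.1f} instructions/cycle at an assumed 3.5 GHz")
```

Whatever clock rate you plug in, the claimed rate needs dozens of instructions retired per cycle, sustained, which is why the claim doesn't hold up.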
CPU isn't the single item with mainframes. Mainframes tend to have large I/O buses, and that is something that tends to be forgotten about when people talk about CPU power.
Mainframes are designed to do business tasks, be it CICS operations, DB2 transactions, or other integer based operations that require tons of data going in and tons of data going out at a time. This is why IBM has such a good caching design. Having the ability to get the numbers into and out of the CPUs is what mainframes are designed to do.
If someone expects top notch floating point operations, expect to be disappointed. MIPS and sheer bus bandwidth rule the roost when it comes to this section of computing.
They fail against a mid-range x86 Xeon rackmount server in every non-I/O benchmark.
But what if you need I/O?
Mainframes are probably one of the most underutilized tools out there. However, for performance per square foot in the data center, they are hard to beat these days.
I really don't believe you are right about that. The core density of mainframes is rather sad compared to Google-style densely packed rack mounts. You can only fit about 100 user accessible processors in one mainframe, which gives you around 600 cores. You can easily fit 800 x86 cores in a standard 42U rack, even with bog-standard 1U servers while leaving room for switches and cooling, and you can more than double that if footprint is your main concern. In contrast, that mainframe won't fit in a standard 19" rack footprint AND it requires separate space for the management console.
If you are mainly running batch jobs, the lack of CPU performance of the mainframe likely won't matter and the enormous I/O capacity is very difficult to achieve in the x86 space. In that case you may well be right that the mainframe wins on performance per square foot.
Finally! A year of moderation! Ready for 2019?
You have no idea what you're talking about.
How many (hundreds) of simultaneous users will your cheap configuration support? What (fraction of a percent) is your unplanned downtime under load?
Comparing mainframe MIPs to PC MIPs makes no sense - PCs have nowhere near the throughput or reliability. I used to work on a mainframe when PCs first came out. Even then, it was clear that the two or three or four MIPs on the PC were in no way superior to the 1 MIP on a 4341 mainframe - the mainframe supported about a dozen users doing moderately compute-intensive tasks. At that time, there was no configuration to allow the PC to handle more than one user, but even if that had been possible, it would have choked on the throughput for two or three.
IBM gets the speed because cost is no object. Here is how they do it.
1. Low yield. These chips have a very large die size, so the yield is going to be lower, but the price is high, so the trade-off works.
2. Binning. The slower chips will go into the lower end machines that use the Z114.
3. Multi chip modules again to allow careful selection and improved yields.
4. Crazy levels of cooling. These have the very best cooling they can fit.
5. Professional operators, maintenance, and construction. The entire machine will be built like an expensive watch, from the cooling to the memory system. The operators will follow all the procedures, and if something is not perfect they will call IBM to send out a tech, if the computer hasn't already done so.
Other companies know how IBM does this; they just do not have the resources in place to compete with IBM in this market. Instead they go for the easier, lower-hanging fruit.
Too bad IBM blew it with the PC. If they had not been under extreme anti-trust pressure, and had had faith that PCs were going to take off, they could have used a 16-bit version of the System/360 ISA for the CPU, maybe based on the 360/20 or maybe the 22.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
I've seen a couple of 'get off the Mainframe' projects "succeed".
They got systems off the mainframe, but it took years, was a very expensive process & the resulting heap of servers didn't save any money. Reliability goes downhill, manageability goes downhill, bang-for-buck goes downhill. Oh, and they still have a mainframe! (some things get moved 'out of scope')
I still get paid though :-)
reliability too.
mainframes generally run in high availability and high uptime environments.
you want five nines, you want a mainframe.
a cluster of x86s might reach the same performance specs for a fraction of the price, but it won't give you the same reliability.
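for a sense of what "five nines" actually buys you, here's a minimal Python sketch that converts a number of nines into allowed downtime per year. the function name is made up for illustration and the year length (365.25 days) is an assumption; it says nothing about what any particular machine achieves.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(nines):
    """Allowed downtime per year for an availability of `nines` nines.

    Three nines means 99.9% available, five nines means 99.999%, and so on.
    """
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 4, 5):
    print(f"{nines} nines: {downtime_minutes_per_year(nines):.2f} minutes/year")
```

five nines works out to a little over five minutes of downtime a year, planned or not.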
"Intel could very well have their 8-10GHz Pentium 4(5?) now if they had continued on that path. I for one like their current processor line better."
with minimal performance gain, and increase in power.