Intel Launches Core i7-4960X Flagship CPU
MojoKid writes "Low-power parts for hand-held devices may be all the rage right now, but today Intel is taking the wraps off a new high-end desktop processor with the official unveiling of its Ivy Bridge-E microarchitecture. The Core i7-4960X Extreme Edition processor is the flagship product in Intel's initial line-up of Ivy Bridge-E based CPUs. The chip is manufactured using Intel's 22nm process node and features roughly 1.86 billion transistors, with a die size of approximately 257mm square. That's about 410 million fewer transistors and a 41 percent smaller die than Intel's previous gen Sandy Bridge-E CPU. The Ivy Bridge-E microarchitecture features up to 6 active execution cores that can each process two threads simultaneously, for support of a total of 12 threads, and they're designed for Intel's LGA 2011 socket. Intel's Core i7-4960X Extreme Edition processor has a base clock frequency of 3.6GHz with a maximum Turbo frequency of 4GHz. It is easily the fastest desktop processor Intel has released to date when tasked with highly-threaded workloads or when its massive amount of cache comes into play in applications like 3D rendering, ray tracing, and gaming. However, assuming similar clock speeds, Intel's newer Haswell microarchitecture employed in the recently released Core i7-4770K (and other 4th Gen Core processors) offers somewhat better single-core performance."
"a die size of approximately 257mm square."
I suspect that should be 257 square mm. A 257 mm square die couldn't even be covered by a standard sheet of paper (US:letter, EU:A4)
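For what it's worth, the arithmetic behind both readings is easy to check. Here is a quick sketch in Python. The paper dimensions are the standard Letter and A4 sizes, the variable names are only illustrative, and the last few lines simply back out the previous-generation figures implied by the summary's "410 million fewer transistors" and "41 percent smaller die" claims.

```python
# Two readings of "257mm square" from the summary.
area_mm2 = 257                   # reading 1: 257 square millimetres
side_mm = 257                    # reading 2: a square 257 mm on a side
area_if_literal = side_mm ** 2   # area in mm^2 under reading 2

# Standard sheet sizes in mm (short side, long side).
sheets = {"US Letter": (215.9, 279.4), "A4": (210.0, 297.0)}
# A sheet can only cover the square if its short side is at least 257 mm.
covered = {name: min(dims) >= side_mm for name, dims in sheets.items()}

# Figures implied by the summary for the previous generation.
transistors = 1.86e9
prev_transistors = transistors + 410e6   # "410 million fewer transistors"
prev_area_mm2 = area_mm2 / (1 - 0.41)    # "41 percent smaller die"
density = transistors / area_mm2         # transistors per mm^2

print(area_if_literal, covered)
print(f"{prev_transistors:.3g} transistors on {prev_area_mm2:.0f} mm^2 previously")
print(f"{density / 1e6:.1f} million transistors per mm^2")
```

Under the 257 square mm reading, the summary's numbers imply roughly 2.27 billion transistors on about 436 square mm for the previous part. Under the literal reading, neither sheet of paper is wide enough to cover the die.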
"National Security is the chief cause of national insecurity." - Celine's First Law
Low-power parts for hand-held devices may be all the rage right now, but today Intel is taking the wraps off a new high-end desktop processor
Actually, I think that useful computation per joule is all the rage all over the device size scale. See? This one works everywhere.
Ezekiel 23:20
257mm That's A Monster!
These chips are slightly faster (given equal core counts) than their predecessors but not in any interesting way.
However, you have to remember that these are really server chips that are repurposed for high-end desktop use. The one vital metric where these chips shine is in their power consumption (or lack thereof): Techreport did a test where the 6-core 4960X running full-bore is using about the same amount of power as a desktop A10-6800K part ( http://techreport.com/review/25293/intel-core-i7-4960x-processor-reviewed/9 )
That level of power efficiency will do wonders in the server world and these chips (and their 12-core bigger brothers) should do quite well in servers.
AntiFA: An abbreviation for Anti First Amendment.
It's laughable how small the performance gains are between recent generations of Core processors. I realize there are other improvements like power consumption and integrated GPU performance but the desktop gamer isn't going to drop another grand to save watts or get better performance on an IGPU he never will use anyway.
3.6GHz base clock is the fastest we've had since the last generation P4's, and with the obviously superior IPC of the IB this thing's going to be a monster for certain workloads where the code doesn't scale well to multiple cores. The only downside is it's not 8 cores/16 threads at those speeds which is a bummer for virtualization hosts. Oh well, the E5-2670's at 2.6GHz do a pretty good job =)
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
And this is what Intel calls Moore's law?
Amd rulez!
Hardly a realistic comparison, given the A10 has a GPU integrated and Intel 6 core doesn't.
Because the only multi-chip processors are still 4 years behind this. Why don't they just enable the ability for me to drop 4 of these on a single motherboard so I can have my 24 core monster for editing and rendering 4K video?
Do not look at laser with remaining good eye.
That's an important question for me as I write the base level concurrency libraries for our company.
I wanted to get a 4770K but Intel disabled TSX (Transactional Synchronization Extensions) on that CPU.
Does this scale up, as in could you arrange four cores in a manner to approximately give 4x performance?
I don't think you understand correctly how a superscalar processor works. Maybe you're confusing parallel instruction execution with pipelining? Even single-core, non-hyperthreading processors have been able to execute multiple instructions *simultaneously, in a single cycle* since the first Pentiums or earlier. See, they can fetch two instructions at once from the cache because it has a wide internal bus, decode them simultaneously, and execute them simultaneously (if they are independent) because each core has multiple execution units. Modern processors can easily execute 3 or 4 instructions at once on a single core, in a single cycle. As I understand it, hyperthreading comes in when part of those execution units are sitting idle because there are not enough instructions in the main thread that can be executed in parallel - they're not independent, some depend on the results of others - and so those idle units are used to process another thread. Of course it's slower than having two full cores, but the point is that a single core CAN execute a lot of stuff in parallel.
I still have an old Abit BP6 system sitting next to my desk gathering dust if you want it. I even have 4 extra celeron processors for it!
Back when men were men, and dual core meant two processors!
Sadly, other than specialized software, most programs are still designed for a single core anyway, making the performance gains negligible for most people. So apart from an expensive marketing ploy aimed at a small enthusiast market, there's not much of a market advantage for any company to do so...
Since their devotion to TPM, my answer to intel was, is and will be: GO F**** yourself.
General rule of thumb is that 2x hyperthreads is approximately equal to 1.5 real cores. Nobody is lying, Intel makes the thread/core distinction very clearly. The reason is primarily due to pipeline and memory stalls creating space which can be filled by the other thread.
Keep in mind that a modern superscalar CPU can have something like 160? (number not exact) instructions in-flight at any given moment, depending on how good the branch prediction is. Instruction execution is not really a matter of clock cycles so much as it is a matter of waiting for memory and execution unit resources. Even instruction-instruction dependencies can often be absorbed by the out-of-order execution engine.
-Matt
That's absolutely enormous. How could it possibly take over 66,000 mm^2 to house just 1.8B transistors?
So for $1000 I can get 1.5x the peak multithreaded performance over the $300 processor released three months ago. And if you run lightly threaded apps, the processor from earlier in the summer may still be faster. Wow...what a bargain. I'd say sign me up for two but, alas, Intel won't let you run multiple processors without paying the xeon tax.
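To put a number on that, here is a rough performance-per-dollar comparison using only the figures in the comment above. It assumes the $300 part is the baseline at 1.0 and that the 1.5x figure holds for fully threaded work. The labels and dictionary layout are illustrative, not taken from any benchmark.

```python
# Prices and relative multithreaded performance as quoted in the comment.
# Assumption: the "$300 processor" is the baseline with performance 1.0.
chips = {
    "$1000 chip": {"price": 1000, "perf": 1.5},
    "$300 chip": {"price": 300, "perf": 1.0},
}

# Performance per dollar for each chip.
perf_per_dollar = {name: c["perf"] / c["price"] for name, c in chips.items()}

# How the expensive chip compares with the cheap one per dollar spent.
ratio = perf_per_dollar["$1000 chip"] / perf_per_dollar["$300 chip"]
print(f"The $1000 chip delivers {ratio:.0%} of the cheaper chip's performance per dollar")
```

On those numbers the flagship gives a bit under half the multithreaded performance per dollar, before platform costs are counted.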
Is it just my observation, or are there way too many stupid people in the world?
We should have like 20GHz now.
What's the point of upgrading from 3.4GHz to 3.6GHz plus a number of tiny improvements that nobody cares about?
They should stop selling new CPUs until they get double speed at least.
Finally they can release the new Mac Pro
This CPU has very low, if not the lowest, performance per price of current models, so in one category it is the worst possible buy you can make; it is incredibly overpriced.
Signature intentionally left blank.
Amazing. Everything you said about HT is completely wrong. Where ever did you get this information?
Intel's hyperthreading consists of two logical processors sharing the same compute resources. Each logical processor has its own register set but shares decoders, adders, shifters, cache, etc. as it goes about executing its assigned thread. The sharing process is vastly more complex and efficient than you seem to think -- there's no alternating of cycles. Once instructions are decoded into uops, they flow through the pipeline in a dynamic fashion that sometimes leads to one thread using most of the resources while the other one waits. In fact, this is a big advantage of the design -- when one thread stalls from a cache miss, the other one uses all the resources until the first thread's memory access completes. A much better plan than your scheme of using only even/odd cycles.
Managing this process is not simple, and steps must be taken to avoid both deadlocks and livelocks as the two threads compete for resources. But the process is dynamic -- the design allows one thread to run unimpeded when it makes sense to do so, while still preventing one thread from being starved at the other's expense. But this "every other cycle" notion of yours is pure nonsense. The core can retire up to four uops per cycle, and at times these all come from the same thread.
Doesn't seem the chip is actually available anywhere yet. I've also been hearing that September 10th may be the actual launch date.
This chip looks like it would be fantastic in engineering workstations - particularly ones running the Linuxes or BSDs. Whereas HDL CAD applications of old would run on Sun or HP workstations, the current ones would do well on one of these running either Windows 7 or Scientific Linux, and then the cad apps in question
Since we are benchmarking $1000 CPU's why not include the $850 one from AMD?
Instead we have the FX-8350, a CPU that costs $200. The extra GHz of the FX-9590 would have moved AMD into the middle of most of those benchmarks. It would have still lost, but the benchmarks look biased without it.
http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/63024-intel-i7-4960x-ivy-bridge-e-review.html
"that can each process two threads simultaneously"
That is absolutely not how it works. It's been what, 10 years and they're still lying about hyperthreading to make it sound better? Super short summary of how it really works: (SNIP OF MISLEADING BULLSHIT MADE UP BY YOU)
Why are you lying about understanding how any of this works?
Short summary of how it actually works: Every core does N things per cycle, where N is the dispatch / retire width bottleneck, i.e. how many operations can be dispatched and/or retired in parallel. N is not fixed at 4, it can change from one core design to the next. In Intel's i3/i5/i7 and related Xeon series, the trend has been for N to go up.
Furthermore, each of the functional units that operations are dispatched to (that is, one of the N) is pipelined, so the latency of a single op is actually many cycles (probably somewhere between 15 and 20 these days) even if the nominal throughput is N/cycle. If the reader doesn't know what a pipeline is, think of it as being like an assembly line: a lot of items are lined up on a conveyor belt and moved past stations which do things to the work item.
Next, at the head of each pipeline / assembly line there is a buffer of sorts. The front end of the CPU reads instructions from a serial stream and dispatches them to pipeline buffers. The purpose of the buffers is to avoid stalls. Say you need to dispatch an op which calculates A+B, but while data item B is ready, A is equal to X+Y, and the op calculating A=X+Y hasn't completed yet. If you permitted the A+B op to start going down a pipeline as soon as it's dispatched, that pipe would have to stall (the conveyor belt would have to stop) until the A=X+Y op completed. The buffers permit the pipeline's head to avoid ops which don't have all data ready yet and pick those which do -- which results in fewer stalls, and out-of-order execution.
In Intel i3/i5/i7 CPUs which support hyperthreading, HT is simply an extension of this system. The front end which reads, decodes, and dispatches instructions to these buffers is permitted to fetch from two different contexts or threads, and the mechanisms which pick ready instructions from the buffers to progress down the pipelines are permitted to choose between threads on a completely arbitrary basis -- whichever one's got data available, with a few safety measures to make sure one thread can't starve the other. This means that the two threads running on the CPU core can and absolutely do execute simultaneously. Both due to pipelining (one pipe works on many ops in parallel assembly line style) and superscalar execution (there are many pipes, each with its own assembly line).
There is no enforced alternation, no odd-even. And contrary to the claims of "slashmydots", this mechanism does eliminate gaps. Pretty much the whole point of it is that if you run a single thread through a wide (many pipes) and deeply pipelined core, there is no known technique for making every station of every pipe 100% busy. In fact, they'll typically average less than 50% busy. By permitting two threads to dispatch into the same core, you get to fill up a lot of the bubbles in these assembly lines with instructions from another stream. You don't necessarily get up to a 100% utilization rate -- but you get a much higher utilization rate than a single thread can manage.
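A toy model can make the bubble-filling argument concrete. The sketch below is not how any real Intel core is implemented. The `Op`, `chain`, and `run` names are made up for illustration, the buffer is unbounded, and each "thread" is just a chain of dependent ops with a fixed latency. It only shows that a second thread can issue into cycles the first one leaves empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    latency: int        # cycles from issue until the result is available (>= 1)
    deps: tuple = ()    # indices of ops in the same thread whose results it needs


def chain(n, latency):
    """A thread of n ops in which each op needs the previous op's result."""
    return [Op(latency, (i - 1,) if i else ()) for i in range(n)]


def run(threads, width=4):
    """Cycle at which a toy core has finished every op in `threads`.

    Each cycle, up to `width` ops whose inputs are ready are issued,
    taken from whichever thread has them. Simplifications: the buffer
    is unbounded and every execution unit can run every op.
    """
    ready_at = [[None] * len(t) for t in threads]  # None = not issued yet
    remaining = sum(len(t) for t in threads)
    cycle = 0
    finish = 0
    while remaining:
        slots = width
        # Rotate which thread gets first pick so neither one is starved.
        for k in range(len(threads)):
            t = (cycle + k) % len(threads)
            for i, op in enumerate(threads[t]):
                if slots == 0:
                    break
                if ready_at[t][i] is not None:
                    continue  # already issued
                inputs_ready = all(
                    ready_at[t][d] is not None and ready_at[t][d] <= cycle
                    for d in op.deps
                )
                if inputs_ready:
                    ready_at[t][i] = cycle + op.latency
                    finish = max(finish, ready_at[t][i])
                    slots -= 1
                    remaining -= 1
        cycle += 1
    return finish


one = run([chain(8, 3)])                      # one thread alone
two = run([chain(8, 3), chain(8, 3)])         # two threads sharing the core
narrow = run([chain(8, 3), chain(8, 3)], 1)   # same, but only 1 issue slot
print(one, two, narrow)
```

In this setup a single 8-op chain leaves most issue slots empty, and a second identical chain finishes in the same number of cycles as the first one alone, instead of twice as many when run back to back. That is a best case: real threads compete for execution units and cache, which fits rules of thumb that put the gain well short of 2x.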
I also wonder how deliberate the confusion is. There are MANY areas inside Intel where there is confusion. The confusion is visible even when visiting the Intel campus in Oregon.
Funny story: I visited the Intel web site and was asked to complete a survey. I gave a few of the reasons why Intel CEO Paul Otellini should be fired, like paying $6 Billion for McAfee when Microsoft is giving away its Microsoft Security Essentials anti-virus software. A few months later Otellini left Intel; they didn't say why. I'm not saying my survey answers had an influence, I'm only making the point that the perception of Intel is widespread.
Intel has a long record of failure with consumer products. Now a completely separate division plans a TV product (???): Intel Media aims to remake TV with its own technology. This paragraph indicates some confusion and lack of competent direction: "Intel Media is run by Erik Huggers, an Intel vice president who worked previously at Microsoft and the BBC. He's assembled a team from such high-tech and media heavyweights as Apple, Netflix, Microsoft, Sky TV and Sony. Intel engineers in Oregon are participating, too, providing technical support for the project."
Oh... The Intel people are providing "technical support". Everyone else came from outside Intel??? And they don't know enough about technology to do their own support? There are many, many issues like that inside Intel.
We are having problems with Intel RAID. Intel technical support is poorly organized.
Apparently only the CPU and chipset division of the company is well-run. All other parts of Intel seem to have little competent supervision.
I'm no expert, but I'm pretty sure that's not a good description of how it works either.
http://en.m.wikipedia.org/wiki/Hyper-threading