Intel Launches Core I7-4960X Flagship CPU

Posted by timothy on Tuesday September 3, 2013 @01:24AM from the living-in-the-past-is-much-cheaper dept.

MojoKid writes "Low-power parts for hand-held devices may be all the rage right now, but today Intel is taking the wraps off a new high-end desktop processor with the official unveiling of its Ivy Bridge-E microarchitecture. The Core i7-4960X Extreme Edition processor is the flagship product in Intel's initial line-up of Ivy Bridge-E based CPUs. The chip is manufactured using Intel's 22nm process node and features roughly 1.86 billion transistors, with a die size of approximately 257mm square. That's about 410 million fewer transistors and a 41 percent smaller die than Intel's previous gen Sandy Bridge-E CPU. The Ivy Bridge-E microarchitecture features up to 6 active execution cores that can each process two threads simultaneously, for support of a total of 12 threads, and they're designed for Intel's LGA 2011 socket. Intel's Core i7-4960X Extreme Edition processor has a base clock frequency of 3.6GHz with a maximum Turbo frequency of 4GHz. It is easily the fastest desktop processor Intel has released to date when tasked with highly-threaded workloads or when its massive amount of cache comes into play in applications like 3D rendering, ray tracing, and gaming. However, assuming similar clock speeds, Intel's newer Haswell microarchitecture employed in the recently released Core i7-4770K (and other 4th Gen Core processors) offers somewhat better single-core performance."

180 comments

Die size? by msauve · 2013-09-03 01:32 · Score: 5, Informative

"a die size of approximately 257mm square."

I suspect that should be 257 square mm. A 257 mm square die couldn't even be covered by a standard sheet of paper (US:letter, EU:A4)

--
"National Security is the chief cause of national insecurity." - Celine's First Law
1. Re:Die size? by Anonymous Coward · 2013-09-03 01:47 · Score: 0
  
  I wonder what the kill size is.
2. Re:Die size? by Anonymous Coward · 2013-09-03 01:50 · Score: 1
  
  While you say 257 square mm you write it as 257mm2 (imagine a website with proper support for shit like this).
  So this is a typical Slashdot fuck up. I could write a script that would do a better job than those HasBeen.Dot moderators...
3. Re:Die size? by AliasMarlowe · 2013-09-03 01:52 · Score: 2
  
  TFA says "15 mm x 17.1 mm"
  
  --
  Those who can make you believe absurdities can make you commit atrocities. - Voltaire
4. Re:Die size? by fuzzyfuzzyfungus · 2013-09-03 01:54 · Score: 1
  
  It also wouldn't fit on a 300mm (diameter) wafer... 400mm would work, and even have some room around the edges; but I probably don't even want to know what a CPU so large that you get only 1 die/400mm wafer would cost.
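  To put numbers on the mix-up discussed above, here is a minimal Python sketch. It uses only figures quoted in this thread (15 mm x 17.1 mm from TFA, the literal "257mm square", 300 mm and 400 mm wafers) plus the standard US letter and A4 sheet dimensions. The function names are made up for illustration.

```python
import math

# Figures quoted in the thread
DIE_W_MM, DIE_H_MM = 15.0, 17.1      # die dimensions per TFA
MISREAD_SIDE_MM = 257.0              # "257mm square" read literally

# Standard paper sizes in mm (width, height)
PAPER_MM = {"US letter": (215.9, 279.4), "A4": (210.0, 297.0)}


def die_area_mm2(width_mm, height_mm):
    """Area of a rectangular die in square millimetres."""
    return width_mm * height_mm


def fits_on_wafer(width_mm, height_mm, wafer_diameter_mm):
    """A rectangle fits on a round wafer only if its diagonal fits the diameter."""
    return math.hypot(width_mm, height_mm) <= wafer_diameter_mm


def covered_by_sheet(side_mm, sheet):
    """True if a sheet of paper can fully cover a square of the given side."""
    return min(sheet) >= side_mm


actual_area = die_area_mm2(DIE_W_MM, DIE_H_MM)
literal_area = die_area_mm2(MISREAD_SIDE_MM, MISREAD_SIDE_MM)
print(f"actual die: {actual_area:.1f} mm^2")
print(f"literal 257 mm square: {literal_area:.0f} mm^2")
for name, sheet in PAPER_MM.items():
    print(name, "covers it:", covered_by_sheet(MISREAD_SIDE_MM, sheet))
for diameter in (300, 400):
    print(f"{diameter} mm wafer fits it:", fits_on_wafer(257, 257, diameter))
```

  The real die comes out at about 256.5 square mm, while a literal 257 mm square has a diagonal of roughly 363 mm, which is why it misses a 300 mm wafer but would squeeze onto a 400 mm one.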
5. Re:Die size? by msauve · 2013-09-03 02:04 · Score: 1
  
  Writing "square mm" is perfectly correct.
  
  --
  "National Security is the chief cause of national insecurity." - Celine's First Law
6. Re:Die size? by wonkey_monkey · 2013-09-03 02:10 · Score: 1
  
  But is it technically correct?
  
  --
  systemd is Roko's Basilisk.
7. Re:Die size? by Anonymous Coward · 2013-09-03 02:15 · Score: 1
  
  Xilinx makes an FPGA (XC7V2000T) which is speculated to only have about 1 good chip per wafer due to its size. I did a quick Google search but couldn't find any reliable numbers for the die size. Single chip costs range from 30k to 70k dependent upon speed grade.
8. Re:Die size? by camperdave · 2013-09-03 02:25 · Score: 1
  
  It also wouldn't fit on a 300mm (diameter) wafer...
  Well... perhaps if you cut the ingot lengthwise instead of normal to the axis?
  
  --
  When our name is on the back of your car, we're behind you all the way!
9. Re:Die size? by fuzzyfuzzyfungus · 2013-09-03 02:38 · Score: 1
  
  Wouldn't be compatible with any of the other processing equipment; but you could do it.
  
  My impression (as a layman) is that getting fairly substantial amounts of silicon isn't a big deal, with difficulty increasing as your demands concerning purity, mono-crystallinity, and dimensional accuracy go up; but that the cost of the entire chip fabrication process gets very big, very fast, if you want to work with larger wafers.
10. Re:Die size? by razvan784 · 2013-09-03 03:23 · Score: 1
  
  That chip actually consists of 4 dice (Xilinx calls them Super Logic Regions) bound over a special silicon interconnect layer. Source: http://www.xilinx.com/support/documentation/data_sheets/ds180_7Series_Overview.pdf The reason they do this rather than use a larger die is exactly to get a higher yield (defect density is constant, defect probability increases with surface). Therefore I highly doubt they're only getting one good chip per wafer. Cost is based on supply and demand, and these chips are very, very specialized. They're used in applications where costs are huge anyway, such as high-performance IC prototyping - things like CPUs, ASICs for multi-hundred Gb/s switches/routers et cetera.
11. Re:Die size? by WarJolt · 2013-09-03 06:33 · Score: 2
  
  But I wanted one the size of a paper. It makes it easier to reverse engineer.
12. Re:Die size? by glassware · 2013-09-03 06:56 · Score: 2
  
  Skip the die size. What's the SPECint and SPECfp? Do processor makers submit these numbers anymore?
  Any other metrics are secondary.
13. Re:Die size? by unixisc · 2013-09-03 07:23 · Score: 2
  
  I believe that those numbers went away when RISC went away
14. Re:Die size? by amorsen · 2013-09-03 08:04 · Score: 1
  
  SPECint and SPECfp are a bit useless, they only test a single core and with modern CPUs you cannot just multiply that number by the number of cores and get a meaningful result.
  SPEC has attempted to fix that simply by running multiple copies of the benchmark and aggregating the result as "SPECrate". Whether that measures anything which is useful for actual workloads is debatable. It certainly does not reflect a modern multithreaded workload.
  
  --
  Finally! A year of moderation! Ready for 2019?
15. Re:Die size? by tyrione · 2013-09-03 08:34 · Score: 1
  
  Writing "square mm" is perfectly correct.
  It's not perfectly correct. It's acceptable.
  
  Example: meter per second squared (m/s²). The modifiers “square” or “cubic” may, however, be placed before the unit name in the case of area or volume.
16. Re:Die size? by Anonymous Coward · 2013-09-03 10:21 · Score: 0
  
  Also, the price range is more like $17.5K to $40K qty 1, with enormous discounts available if you buy in bulk (particularly if you're one of a few big companies such as Cisco). Pricing was initially higher but has fallen considerably as Xilinx has ramped up volume.
  http://www.digikey.com/product-search/en?pv457=412&FV=fff40027%2Cfff80166&k=virtex-7&mnonly=0&newproducts=0&ColumnSort=0&page=1&quantity=0&ptm=0&fid=0&pageSize=25
  The knobs which affect price are package type (there's a 1761 ball and a 1925 ball package, the larger one has more usable I/O pins), temperature range (commercial / extended / industrial), speed grade (-1 or -2, -2 is faster), and SERDES performance. The last is actually what distinguishes the really expensive parts from the rest; Xilinx charges a large premium for very fast (over 10Gbps) SERDES. And it's pretty much an artificial pricing distinction since it doesn't actually cost Xilinx more to make these variants. (Another one: it does cost more to make a larger package with more I/O, but not thousands of dollars more.)
  (note: I don't think these things are outrageous; it's normal for the chip industry, and there's actually a strong economic argument that this kind of price tiering supports offering the stripped down version at significantly lower prices than would otherwise be possible.)
17. Re:Die size? by Anonymous Coward · 2013-09-03 10:29 · Score: 0
  
  SPECint and SPECfp are a bit useless, they only test a single core and with modern CPUs you cannot just multiply that number by the number of cores and get a meaningful result.
  They are the opposite of useless because evaluating single-thread performance is still important.
  (Unfortunately some of the sub-benchmarks in SPECint and SPECfp have proven to be vulnerable to compiler autoparallelization, so you now have to be pretty careful about interpreting the numbers. libquantum is especially notorious for this: with a good autopar compiler its score scales almost linearly with the number of cores. There are some advocates -- and I tend to agree -- of throwing out everything but the GCC sub-bench in SPECint, since the GCC codebase is extraordinarily hard to game by tuning compilers for it. Mostly because the GCC codebase is a giant squirrelly mess.)
18. Re:Die size? by Anonymous Coward · 2013-09-03 10:40 · Score: 0
  
  You could, you know, try looking. SPEC has this searchable results database thingy.
  http://www.spec.org/cgi-bin/osgresults?conf=cpu2006
  I've done the work for you -- there's nothing listed yet for "Processor" Matches "4960". However, the listings for the previous generation 3960 combined with Intel's history of submitting SPEC scores for its high end processors suggests that it's a matter of time.
  (Note that SPEC has rules about commercial availability -- you have a limited amount of time between submitting an official result run till real hardware goes on sale to the general public, and the benchmark has to be run on the actual thing people can buy. If you violate these rules another vendor can contest the submission and get it pulled from SPEC.org's database. This has actually happened before. So it wouldn't surprise me if Intel was waiting for (a) the final die stepping and (b) the final BIOS and motherboard revisions that are actually going on sale in order to begin performing the test runs they planned to submit to spec.org.)
19. Re:Die size? by tlhIngan · 2013-09-03 18:04 · Score: 1
  
  The reason they do this rather than use a larger die is exactly to get a higher yield (defect density is constant, defect probability increases with surface). Therefore I highly doubt they're only getting one good chip per wafer. Cost is based on supply and demand, and these chips are very, very specialized. They're used in applications where costs are huge anyway, such as high-performance IC prototyping - things like CPUs, ASICs for multi-hundred Gb/s switches/routers et cetera.
  It's not just supply and demand, it's processing costs.
  And yes, I've used the ZCV7000T chips before - the platform I used I think had the $30K versions (in 1000 lot quantities, heheh - that's $30M if you want the quantity discount), and had 4 of them, or 8 of them. And even the 8-FPGA ones still weren't big enough.
  The thing is - a silicon wafer has a number of defects. The bigger the die, the greater the likelihood of a defect (and at 22nm, we're not talking about huge defects, either - just a simple misplaced atom is good enough, as now we're talking of features hundreds of atoms across).
  Also, the bigger the die, the fewer you can make per wafer, so it's a double whammy - larger dies mean a greater chance of defects AND fewer dies per wafer. So smaller dies generally get much better yields (it's not a linear relationship). Splitting your chip into multiple dies complicates packaging, but can increase yields more than enough to compensate.
  A wafer typically costs around $1200 after processing, so even at 1 die per wafer the cost is $1200.
  And for some stuff, defects are fine - big devices like CCDs and CMOS camera sensors have dead pixels (defects) that are mapped out by processing software (because these sensors are so big and of fixed size, Moore's law can't make them cheaper). Also, NAND flash has bad blocks, and the newer ones are often multi-dice as well to get larger densities.
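  The yield argument in this subthread can be sketched with the simple Poisson yield model, Y = exp(-A * D), where A is die area and D is defect density. This is a textbook approximation, not anything Xilinx or Intel has published. The defect density, the hypothetical large-die area and the crude gross-dies estimate (wafer area divided by die area, ignoring edge loss) are all assumptions chosen for illustration; only the $1200 wafer cost and the 300 mm wafer come from the comments above.

```python
import math

WAFER_DIAMETER_MM = 300.0     # wafer size mentioned in the thread
WAFER_COST_USD = 1200.0       # processed-wafer cost quoted above
DEFECTS_PER_CM2 = 0.25        # ASSUMPTION: illustrative defect density


def poisson_yield(die_area_mm2, defects_per_cm2=DEFECTS_PER_CM2):
    """Fraction of dies with zero defects under the Poisson model."""
    return math.exp(-(die_area_mm2 / 100.0) * defects_per_cm2)


def gross_dies(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Crude upper bound on dies per wafer: wafer area / die area."""
    wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2
    return int(wafer_area // die_area_mm2)


def cost_per_good_die(die_area_mm2):
    """Wafer cost spread over the dies expected to be defect-free."""
    good = gross_dies(die_area_mm2) * poisson_yield(die_area_mm2)
    return WAFER_COST_USD / good


big = 600.0               # ASSUMPTION: hypothetical monolithic die, mm^2
small = big / 4           # same logic split across four smaller dice

for area in (big, small):
    print(f"{area:5.0f} mm^2: yield {poisson_yield(area):.1%}, "
          f"{gross_dies(area)} gross dies, "
          f"${cost_per_good_die(area):.2f} per good die")

# Four good small dice versus one good big die
print(f"4 small dice: ${4 * cost_per_good_die(small):.2f}")
```

  With these assumed numbers, four good small dice cost less in silicon than one good big die, which is the "not a linear relationship" point. The sketch ignores the extra packaging cost mentioned above.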
Power consumption by K. S. Kyosuke · 2013-09-03 01:32 · Score: 2

Low-power parts for hand-held devices may be all the rage right now, but today Intel is taking the wraps off a new high-end desktop processor
Actually, I think that useful computation per joule is all the rage all over the device size scale. See? This one works everywhere.

--
Ezekiel 23:20
1. Re:Power consumption by slashmydots · 2013-09-03 03:12 · Score: 2
  
  That's not necessarily true. Someone running a photo editing app on their Galaxy and saying it's slower than their PC is one thing but that's wrong on so many levels. My not-so-smart phone runs Brew and the 1000mAh battery has a realistic idle time rating of 27 days and screen-off talk time of something like 16 hours. If someone basically wants a 4 ounce laptop with a 4" screen that runs for 48 hours, they're dreaming. More reasonable people just want an absurd battery life and realize that a phone can't process like a computer so everything will be a bit slow.
  
  So instead of performance per joule, it's really more like how quickly can it underclock and to how low of a TDP.
2. Re:Power consumption by K. S. Kyosuke · 2013-09-03 05:37 · Score: 3, Informative
  
  You're basically reiterating what I've said. The more you can compute with a fixed amount of energy, the less energy you consume for a fixed amount of computation.
  
  --
  Ezekiel 23:20
257mm That's A Monster! by Anonymous Coward · 2013-09-03 01:34 · Score: 0

257mm That's A Monster!
Boring on the Desktop Great in Servers by CajunArson · 2013-09-03 01:37 · Score: 3, Informative

These chips are slightly faster (given equal core counts) than their predecessors but not in any interesting way.
However, you have to remember that these are really server chips that are repurposed for high-end desktop use. The one vital metric where these chips shine is in their power consumption (or lack thereof): Techreport did a test where the 6-core 4960X running full-bore is using about the same amount of power as a desktop A10-6800K part ( http://techreport.com/review/25293/intel-core-i7-4960x-processor-reviewed/9 )
That level of power efficiency will do wonders in the server world and these chips (and their 12-core bigger brothers) should do quite well in servers.

--
AntiFA: An abbreviation for Anti First Amendment.
1. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 02:20 · Score: 0
  
  Intel segments the desktop and server market with ECC functionality. Xeons support ECC, everything else does *not*. So unless this new chip supports ECC, you're off your rocker thinking this is a repurposed server chip.
2. Re:Boring on the Desktop Great in Servers by LordLimecat · 2013-09-03 02:27 · Score: 1
  
  Considering that AMD is a gen or two behind, and their chips aren't currently known for their efficiency, I don't know how impressive that is.
3. Re:Boring on the Desktop Great in Servers by JDG1980 · 2013-09-03 02:45 · Score: 3, Informative
  
  Intel segments the desktop and server market with ECC functionality. Xeons support ECC, everything else does *not*. So unless this new chip supports ECC, you're off your rocker thinking this is a repurposed server chip.
  The same die is used for both chips; it's just that the ECC functionality is fused off in the non-Xeon parts binned for desktop use.
  By the way, it's not strictly true that Xeons are the only Intel parts that support ECC. Ivy Bridge Celerons and Pentiums have this feature as well (if you use a compatible server motherboard). It was fused off on the mainstream desktop quad core parts because they wanted people to buy Xeon E3's instead.
4. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 02:47 · Score: 0
  
  Xeons support ECC, everything else does *not*.
  ... if you ignore all Ivy Bridge and Haswell i3s.
5. Re:Boring on the Desktop Great in Servers by petermgreen · 2013-09-03 02:55 · Score: 2
  
  AIUI Intel takes a handful of basic designs and cripples them in different ways to produce a wide variety of products which they then sell at different price points depending on what they think customers will be willing to pay.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
6. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 02:57 · Score: 0
  
  Weird part is, E3/E3v2/E3v3 Xeons are priced quite competitively against their SB/IB/HW i7 cousins.
  Makes the whole thing look like segmentation for segmentations sake.
7. Re:Boring on the Desktop Great in Servers by interval1066 · 2013-09-03 03:06 · Score: 2
  
  ...which is what you do to run any successful business. The hothardware article is a pretty sweet read if you're any kind of a hardware geek, btw.
  
  --
  Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
8. Re:Boring on the Desktop Great in Servers by slashmydots · 2013-09-03 03:14 · Score: 1
  
  I doubt anyone being serious about cutting their server power costs would go with this new chip in the first place. The socket Xeon E5 T-series are purposely underclocked but with a high single-core turbo so they benchmark (at single operations) at a somewhat close speed but take up immensely less power. It's like 50% less on most chips if I remember correctly.
9. Re:Boring on the Desktop Great in Servers by petermgreen · 2013-09-03 03:20 · Score: 1
  
  Interestingly according to the die photo this time round it appears to have been designed as a 6 core rather than designed as an 8 core and then crippled to make a 6-core like it was with SB-E.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
10. Re:Boring on the Desktop Great in Servers by phantomfive · 2013-09-03 04:19 · Score: 2
  
  That level of power efficiency will do wonders in the server world and these chips (and their 12-core bigger brothers) should do quite well in servers.
  And later this year, when Atom goes to 22nm, it may also do quite well in mobile phones, given they've already developed a quality ARM emulator.
  
  --
  "First they came for the slanderers and i said nothing."
11. Re:Boring on the Desktop Great in Servers by Blaskowicz · 2013-09-03 04:22 · Score: 1
  
  Intel segments shit like AES NI, Vt-d and "TSX" as well (Haswell?). Not available on your i3.
12. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 05:31 · Score: 0
  
  >Techreport did a test where the 6-core 4960X running full-bore is using about the same amount of power as a desktop A10-6800K part
  Actually, that really just tells how badly AMD is behind.
13. Re:Boring on the Desktop Great in Servers by WilliamGeorge · 2013-09-03 05:52 · Score: 1
  
  It pretty much is :)
  Anywhere there isn't a Xeon equal, you stand a chance of finding ECC support. As the parent post noted, some Celerons / Pentiums have that - and so do a lot of Core i3s (dual core w/ hyperthreading), since there are no comparable Xeon E3s (all quad-cores).
  
  --
  William George
14. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 06:52 · Score: 0
  
  Yeah, lately Xeons are pretty much i7 chips minus integrated graphics, plus some extra virtualization and encryption stuff and whatnot. Some even use the same sockets, making them very viable choices for anything except office and media server use.
15. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 10:50 · Score: 0
  
  You can see why if you look at the changes on the Xeon side in this generation.
  Sandy Bridge EP: 8-core die: Xeons 4/6/8-core, desktop socket 2011 4/6-core
  Ivy Bridge EP: 6-core die: Xeons 4/6-core, desktop socket 2011 4/6-core
  Ivy Bridge EP: 12-core die: Xeons 8/10/12-core, no desktop version
  Basically, if you're going to offer a 12-core Xeon, it's pretty crazy to fuse off 66% of the cores in order to sell a 4-core part. So they chose to do two physical designs, which lets them keep costs for the 4/6-core versions in line with the number of enabled cores.
16. Re:Boring on the Desktop Great in Servers by Anonymous Coward · 2013-09-03 11:07 · Score: 0
  
  It's actually a lot more complicated than that, there are no easy rules any more. Here's a giant table of all Haswell i3 and Pentium models. You can click the blue triangles to the left of headings like "AES-NI" to sort the columns by that parameter, which helps a lot in picking out which ones have a feature.
  http://ark.intel.com/compare/77403,75608,76620,75104,75105,75107,75988,76608,76609,76293,76346,75110,76294,75989,77778,77777,77776,77775,78007,77774,77773,76622,76621,77404,77771,77770,77769,77481,77480
  AES-NI: supported on many i3s and Pentiums
  VT-d: supported on three embedded (FCBGA aka soldered) i3 models
  TSX: supported on two soldered Pentium models
  ECC: available on nearly all LGA 1150 (socketed) Haswell i3/Pentium desktop processors except the i3-4330
Another marginal perf iteration of Core by JoeyRox · 2013-09-03 01:37 · Score: 4, Insightful

It's laughable how small the performance gains are between recent generations of Core processors. I realize there are other improvements like power consumption and integrated GPU performance but the desktop gamer isn't going to drop another grand to save watts or get better performance on an IGPU he never will use anyway.
1. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 01:43 · Score: 0
  
  Furthermore, this particular CPU doesn't even have an IGPU.
2. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 01:45 · Score: 0
  
  You don't know desktop gamers very well.
3. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 01:49 · Score: 3, Insightful
  
  There are two reasons:
  1) AMD is really behind after they reworked their architecture, hence no pressure on Intel.
  2) Moore's Law ended some time ago on a per-core basis and nobody noticed.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
4. Re:Another marginal perf iteration of Core by L4t3r4lu5 · 2013-09-03 01:50 · Score: 1
  
  Lower watts in = lower watts out = more thermal room for overclocking.
  
  Tell me again how gamers aren't interested in how much power a stock CPU uses.
  
  --
  Finally had enough. Come see us over at https://soylentnews.org/
5. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 01:51 · Score: 0
  
  With each faster processor release we should get work done faster to have time to relax but we just keep getting more to do. Boss will be happy when a processor finally comes out that allows getting tomorrow's work done yesterday.
6. Re:Another marginal perf iteration of Core by JoeyRox · 2013-09-03 01:56 · Score: 3, Insightful
  
  Sure, I'll tell you again. Even though the power consumption drops for each new process shrink the heat drop isn't commensurate because the transistors are packed more tightly together. Do a search online about how poorly Ivy Bridge OC's vs Sandy Bridge on a relative CPU frequency basis.
7. Re:Another marginal perf iteration of Core by fuzzyfuzzyfungus · 2013-09-03 02:02 · Score: 2
  
  You don't know desktop gamers very well.
  The better per-thread performance of the competing Haswell part may keep them away, though(unless the increased cache makes up for it). Games make better use of additional cores than they used to; but they still don't tend to go as far in that direction as server or some workstation loads.
  
  Some people are going to buy it just because it's the flagship, of course; but better performance on highly threaded tasks won't necessarily save it among gamers. (Especially if Intel prices it so as to discourage people who might otherwise buy Xeon-based workstations from buying this part instead).
8. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 02:06 · Score: 0
  
  Lower watts in = lower watts out = more thermal room for overclocking.
  Tell me again how gamers aren't interested in how much power a stock CPU uses.
  Unfortunately, gamers are more interested in the graphics card, since it's the workhorse in the vast majority of recent titles. Admittedly, there are games that tax the CPU heartily, but not enough to make this beast of a processor a relevant purchase.
9. Re: Another marginal perf iteration of Core by UnknownSoldier · 2013-09-03 02:16 · Score: 1
  
  Is this the first multiplier unlocked Intel chip (K series) that I can buy without a crappy Intel IGPU? So it should be cheaper, right? I already have a high end discrete GPU.
10. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 02:18 · Score: 0
  
  13% lower power than SB-E with a 41% smaller die == 47% higher power per area.
  Worse, IB-E has thermal compound between die and heatspreader, same as IB vs. SB.
  -> good luck pushing the thing even under water.
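  The 47% figure follows from dividing the two ratios. A quick check in Python, taking the 13% and 41% numbers as given (they are the commenter's and the summary's figures, not independently verified here):

```python
def relative_power_density(power_ratio, area_ratio):
    """Power per unit area of the new chip relative to the old one."""
    return power_ratio / area_ratio


power_ratio = 1.0 - 0.13   # IB-E draws 13% less power than SB-E (per the comment)
area_ratio = 1.0 - 0.41    # IB-E die is 41% smaller (per the summary)

density = relative_power_density(power_ratio, area_ratio)
print(f"power per area: {density:.2f}x SB-E, i.e. {density - 1.0:.0%} higher")
```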
11. Re:Another marginal perf iteration of Core by Shinobi · 2013-09-03 02:24 · Score: 1
  
  Actually, streaming is becoming more and more common among gamers, so multiple cores/hyperthreading is becoming quite popular with gamers too.
  1080p streaming in 60FPS is quite CPU intensive.
12. Re:Another marginal perf iteration of Core by DigiShaman · 2013-09-03 02:30 · Score: 3, Informative
  
  Actually, that has to do with the fact that Ivy's CPU packaging uses thermal interface material between the die and heat spreader, whereas Sandy's has the spreader soldered to the die. Stupid mistake on Intel's behalf if you ask me!
  
  --
  Life is not for the lazy.
13. Re:Another marginal perf iteration of Core by petermgreen · 2013-09-03 02:35 · Score: 1
  
  The better per-thread performance of the competing Haswell part may keep them away, though(unless the increased cache makes up for it). Games make better use of additional cores than they used to; but they still don't tend to go as far in that direction as server or some workstation loads.
  Indeed.
  Since Sandy Bridge, Intel has been releasing the high-end desktop parts very late compared to the mainstream parts. By the time SB-E came out, the mainstream desktop parts were very nearly on Ivy Bridge. This time it was even worse: not only did IB-E come out AFTER the mainstream desktop parts were on Haswell, but Haswell brings a more substantial improvement in IPC than Ivy did.
  They try to hide it with misleading model numbers but I strongly suspect that most of the people who spend this much money on a CPU are not so easily fooled.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
14. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 02:37 · Score: 0
  
  The problem is that there really just aren't any significant architectural improvements left to make. Processor speedup comes from either running it faster (i.e. more GHz) or harnessing more parallel execution. Processes hit a wall on GHz scaling with Moore's law a decade ago and around the same time, processors exploited just about all of the practical instruction level parallelism. There really isn't much performance benefit left unless you are willing to say "screw power" and either run the chip faster and invest a thousand dollars for a cooling system or use up more area for very marginal gains/unit size. Obviously the vendors assume better power and more cores is the better use of those transistors.
15. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 02:47 · Score: 0
  
  To follow up, most programs are memory bound, so processor performance is rarely your bottleneck anyway. Your cache can only get so big.
16. Re:Another marginal perf iteration of Core by MacGyver2210 · 2013-09-03 02:48 · Score: 1
  
  I have an i7 3660, and with 8 threads, I still have only found a single program that would thread onto more than four of the cores: VLC Media Player. It seems only the super techy, data-intensive, community-built software can keep up with the core wars? Am I just playing the wrong games?
  
  --
  If the only way you can accept an assertion is by faith, then you are conceding that it can't be taken on its own merits
17. Re:Another marginal perf iteration of Core by JDG1980 · 2013-09-03 02:49 · Score: 1
  
  Worse, IB-E has thermal compound between die and heatspreader, same as IB vs. SB.
  Source? I thought a while back someone pulled the lid off an Ivy Bridge-E CPU engineering sample and found that it was soldered (the CPU was destroyed in the process). There were photos of this posted on a couple of sites.
18. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 02:52 · Score: 0
  
  moore's law isn't about performance.
19. Re:Another marginal perf iteration of Core by Lumpy · 2013-09-03 03:01 · Score: 2
  
  "'screw power' and either run the chip faster and invest a thousand dollars for a cooling system"
  Water cooling systems are a LOT cheaper than that. Look at what overclockers are using today: you can get a good watercooling system to suck out a LOT of that heat for less than a couple hundred bucks.
  Problem is that most guys undersize the heat dump radiator.
  
  --
  Do not look at laser with remaining good eye.
20. Re: Another marginal perf iteration of Core by petermgreen · 2013-09-03 03:04 · Score: 1
  
  Is this the first multiplier unlocked Intel chip (K series) that I can buy without a crappy Intel IGPU?
  
  No.
  Even counting just K suffix chips. There was the i7-875K back in 2010, though no one showed much interest because it was before Intel clamped down on multiplier overclocking. There was also the i7-3930K 6-core SB-E chip more recently. If you also count extreme edition (X suffix) chips then there were a lot more unlocked Intel chips without integrated GPUs.
  
  So it should be cheaper, right?
  No
  Yes, you are doing away with the GPU, but you are getting more memory channels, more PCIe lanes and possibly more cores (depending on which of the three models you buy). To support the extra memory channels and PCIe lanes (as well as the QPI links for dual socket variants) the socket also has a lot more pins.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
21. Re:Another marginal perf iteration of Core by slashmydots · 2013-09-03 03:07 · Score: 2
  
  Everyone says AMD is behind but that's based on a ridiculous comparison. Just do the #1 most important benchmark, speed vs price, and AMD is winning. Yeah, power vs performance comes into play but at least in bang for your buck, they're crushing Intel. It's just like Roundy's with food. If you're almost as good, just price it lower to compensate and everyone will buy your product instead. If Intel wanted to put AMD in some real trouble, they wouldn't have kept the i3-2100 at the same price for 2 years straight.
22. Re:Another marginal perf iteration of Core by DrXym · 2013-09-03 03:08 · Score: 1
  
  The better per-thread performance of the competing Haswell part may keep them away, though(unless the increased cache makes up for it). Games make better use of additional cores than they used to; but they still don't tend to go as far in that direction as server or some workstation loads.
  Maybe some gamers shut down everything on their desktop when they are playing. Personally leave all my apps open, sometimes even playing music or videos on another screen. So regardless of a game making full use of a CPU, any spare capacity can be used to service the rest of the desktop.
23. Re:Another marginal perf iteration of Core by UnknowingFool · 2013-09-03 03:13 · Score: 3, Funny
  
  Hush! To Adobe's Flash unit, that might sound like a challenge.
  
  --
  Well, there's spam egg sausage and spam, that's not got much spam in it.
24. Re:Another marginal perf iteration of Core by slashmydots · 2013-09-03 03:15 · Score: 1
  
  Who said anything about a gain? I looked at 3 respectable benchmarks and overall, this chip is slightly slower than a 4770K.
25. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 03:15 · Score: 0
  
  As a developer, my workstation is an 8-thread machine, because the product we are developing uses more than 40 threads, no problem at all.
  Why? Because people cannot program, because Java is too easy. There are different reasons. Do I need an 8+ thread machine at home?
  No, I don't, but will I buy one? Yes I will. Why? Cause having threads over makes other programs run nicely. Run a program eating 4 threads, and you still have response in GUI and other programs. Unless you hit some other bottleneck (and you will).
  My next machine will be a i7 with SSD, no bit storage anymore, internet will keep my movies from now on :-)
26. Re:Another marginal perf iteration of Core by slashmydots · 2013-09-03 03:17 · Score: 1
  
  While you are correct, that wasn't the reason in that case. Sandy vs Ivy overclocking was due to the way they added thermal compound inside the CPU itself. They did it differently and it jacked up the average core temp on every Ivy chip by as much as 10C in some cases.
  
  Anyway, in response to the original post, lower power means cheaper power components that can't handle as many watts so it actually limits the amount of power the CPU can use. They don't make a chip twice as efficient and then leave the same old power handling components inside it.
27. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 03:22 · Score: 0
  
  PS4 and Xbox One have 8 threads so more games may use them on the PC when they are ported over between the platforms.
28. Re: Another marginal perf iteration of Core by petermgreen · 2013-09-03 03:30 · Score: 1
  
  before Intel clamped down on multiplier overclocking.
  Brainfart, I mean before they cracked down on FSB/BCLK overclocking.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
29. Re:Another marginal perf iteration of Core by robthebloke · 2013-09-03 03:32 · Score: 1
  
  AVX v.s. SSE4.2 : 8 x 32 bit floats per instruction v.s. 4 x 32 bit floats per instruction. (Sandy Bridge v.s. Nehalem)
  AVX2 v.s. AVX : 8 x 32 bit integers per instruction v.s. 4 x 32 bit integers per instruction. (Haswell v.s. Ivy Bridge)
  
  The performance gains certainly are there. As per usual, meaningless benchmarks are meaningless.
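  The lane counts quoted above follow from dividing the vector register width by the element width. A minimal sketch of that arithmetic, assuming 128-bit SSE registers, 256-bit AVX floating-point registers, and 256-bit integer operations arriving only with AVX2; the function and table names are made up for illustration:

```python
# Lanes per instruction = vector width in bits / element width in bits.
# Widths below are the architectural register widths, not measured speeds.
FLOAT_VECTOR_BITS = {"SSE4.2": 128, "AVX": 256, "AVX2": 256}
# AVX widened floating-point operations only; 256-bit integer
# operations arrived with AVX2.
INT_VECTOR_BITS = {"SSE4.2": 128, "AVX": 128, "AVX2": 256}


def lanes(vector_bits, element_bits=32):
    """Number of elements of the given width that fit in one vector."""
    return vector_bits // element_bits


def float_lanes(isa):
    return lanes(FLOAT_VECTOR_BITS[isa])


def int_lanes(isa):
    return lanes(INT_VECTOR_BITS[isa])


if __name__ == "__main__":
    for isa in ("SSE4.2", "AVX", "AVX2"):
        print(isa, "float32 lanes:", float_lanes(isa),
              "int32 lanes:", int_lanes(isa))
```

  This only counts elements per instruction; it says nothing about how many such instructions a core can actually complete per cycle.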
30. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 04:10 · Score: 0
  
  Actually, I haven't seen this mentioned in regards to Ivy Bridge-E, but Sandy Bridge-E supported 4 memory channels, while all the non-E lines, including Haswell, only support 2 memory channels. That's the big feature and generally makes up for IPC differences.
31. Re:Another marginal perf iteration of Core by Cajun+Hell · 2013-09-03 04:13 · Score: 3, Informative
  
  It's laughable how small the performance gains are between recent generations of Core processors. I realize there are other improvements like power consumption and integrated GPU performance but the desktop gamer isn't going to drop another grand to save watts or get better performance on an IGPU he never will use anyway.
  The only thing that's laughable, is that the desktop gamer thinks everything is about him and that his concerns add up to even 1% of the market.
  
  --
  "Believe me!" -- Donald Trump
32. Re:Another marginal perf iteration of Core by T-Bone-T · 2013-09-03 04:19 · Score: 1
  
  Speed vs Price is important when comparing similar speeds. Price doesn't matter if the speed isn't good enough, which is where Intel is winning.
33. Re:Another marginal perf iteration of Core by DarkXale · 2013-09-03 04:21 · Score: 1
  
  The problem lies more in the fact that there are several games whose performance is dictated by the per-thread performance of the CPU, and virtually never by the total performance of the CPU.
  
  The video game norm is to have 2 main threads and one GPU driver thread ("3 core utilization"). There'll be a whole bunch of secondary threads as well - but these consume negligible amounts of time (tops 5% or so totalling all of them), and many are only triggered in specific conditions - such as when the game needs to load new resources.
  
  Consequently, even a 4 core CPU can have one of its cores idling pretty much at 100%, and there will nearly always be a fair bit of spare resources on the GPU driver thread, and often on the secondary 'main' thread. Far more than enough to run anything and everything in the background, save recording software in some configurations - background tasks simply aren't CPU demanding enough to matter.
34. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 04:21 · Score: 0
  
  uhh no. if the game is streaming, then all the machine is doing is decoding video. that can be done with 4 year old hardware..or anything with a recent budget gpu.
35. Re:Another marginal perf iteration of Core by T-Bone-T · 2013-09-03 04:22 · Score: 1
  
  Anyway, in response to the original post, lower power means cheaper power components that can't handle as many watts so it actually limits the amount of power the CPU can use.
  Do you have any evidence of this? That sounds like pure conjecture to me.
36. Re: Another marginal perf iteration of Core by Blaskowicz · 2013-09-03 04:27 · Score: 1
  
  Actually, the i7 3820 is cheaper than 2600K and 3770K, the i7 4820K will be cheaper than 3770K and 4770K I think (even just 20 euros). This is more than offset by the cost of the motherboard though.
37. Re:Another marginal perf iteration of Core by green+is+the+enemy · 2013-09-03 04:54 · Score: 1
  
  I'm wondering if all this SIMD is even worth it. It is far too difficult to program. For the average programmer, going the GPU (CUDA) route offers far more bang for their effort. These SIMD instructions will find very limited use, maybe in some video codecs. The same codecs probably run an order of magnitude faster on a GPU anyway, even on the Intel built-in one.
  
  I would tend to agree that single-threaded performance has pretty much reached a practical limit. Calling SIMD single-threaded is a stretch, as it does not easily conform to the single-threaded programming model. Now we have a daunting task of figuring out which explicitly parallel architecture to concentrate on: Traditional multithreading vs. SIMD vs. GPU (SIMT) or a combination, or something else entirely. So far the GPU way seems to be winning for most tasks.
38. Re:Another marginal perf iteration of Core by DigitAl56K · 2013-09-03 05:22 · Score: 1
  
  There are two reasons:
  1) AMD is really behind after they reworked their architecture, hence no pressure on Intel.
  That's a really stupid thing to say, as if thousands of highly skilled engineers at Intel turn up every morning and just don't give a shit. If you've been paying attention, if there's any lacking on the desktop/server chips it's probably due to Intel going all out to take ARM's business in the mobile and tablet space.
39. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 05:29 · Score: 0
  
  Too many Intel fanboys replied to you already, so I thought I'd chime in to balance things out.
  Speed vs. price, particularly when it comes to gaming, is very important. You're also correct: AMD is wrecking Intel there. The problem is, you have sites like Tom's Hardware with an obvious pro-Intel bias telling consumers ridiculous things like the i5-2500K being two tiers above the AMD Phenom II X4 955 BE when in actuality there aren't any benchmarks that support this viewpoint (I don't consider a 5 FPS difference worth an extra $100). An anecdote: I'm running two gaming systems that happen to use both these CPUs and there is zero perceivable difference in performance amongst dozens of titles. What's funny about this is the Intel system actually has a superior GPU.
  On the other hand, I always laugh when I see someone waste hundreds on an i7 for a gaming rig, while I spend less than 50% on an AMD processor and get the same performance. Another trick I like to use is spend what I saved on the AMD CPU on a better graphics card - this more than balances out the equation on titles that are GPU bound.
40. Re:Another marginal perf iteration of Core by Hadlock · 2013-09-03 05:37 · Score: 1
  
  And yet nobody is buying AMD products in the desktop or server space. AMD has consistently been below 10% for over a decade I believe.
  
  Price/performance doesn't matter a whole lot when the difference in price on the chips is less than $100. If you're buying i3/i5/i7-class chips you're already looking at real world performance rather than budget.
  
  --
  moox. for a new generation.
41. Re: Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 05:37 · Score: 0
  
  I already have a high end discrete GPU.
  So do I, but I figured the $5 extra I paid at the time was worth it for a version of the cpu with a gpu as opposed to the same model without a gpu. If the discrete version ever breaks or has issues, I have a reasonable backup without having to dig out an older board from somewhere or someone.
42. Re:Another marginal perf iteration of Core by lexman098 · 2013-09-03 05:39 · Score: 1
  
  "Not good enough for what?" is the question though. We're talking about desktop CPUs here.
43. Re:Another marginal perf iteration of Core by T-Bone-T · 2013-09-03 06:01 · Score: 1
  
  That's a fair question. I can think of many things that I do and new features in programs that I love that would probably easily run on a very old computer. I used a 2003-era laptop until 2011 that met the vast majority of my needs. That's why I chose an i3 for my new desktop. It had excellent bang-for-the-buck and was so much faster.
44. Re: Another marginal perf iteration of Core by dywolf · 2013-09-03 06:19 · Score: 3, Funny
  
  i still dont understand their numbering system. it seems designed to generate the most confusion possible as to what is where in the hierarchy.
  its got a higher number...
  oh wait but its an older name...
  no wait, it's got an -E...
  but the box is Blue...
  with an Eagle...
  but its wings are folded...
  Blah.
  
  --
  The guy who said the election was rigged won the presidency with the second-most votes.
45. Re:Another marginal perf iteration of Core by Shinobi · 2013-09-03 06:23 · Score: 1
  
  There are several programs used to encode your stream for broadcast that support more than 4 threads, and can use them if the load is high enough. But there's also the fact that the game itself uses some CPU, then you have the streaming software running, which grabs the game output and encodes it.
  Meaning that you really want multiple cores/hyperthreading.
  OBS is one of those streaming programs. (You can also use QuickSync if you have that enabled.)
46. Re: Another marginal perf iteration of Core by UnknownSoldier · 2013-09-03 06:35 · Score: 1
  
  While I agree that is a good backup plan, if/when my GTX Titan dies I already have 2x GTX 560 Ti w/ 448 cores in 2 different machines. (The GTX Titan replaced one of them.)
  I _don't_ want 2 sets of GPU drivers installed on my system, just one.
47. Re:Another marginal perf iteration of Core by Billly+Gates · 2013-09-03 06:44 · Score: 1
  
  Sigh
  I can't tell if you were being sarcastic or not?
  Your post is 100% true ... back in the phenom II/core 2 days. Maybe the 1st generation i3 and i5s vs a phenom II and I would agree with you. If you run heavily threaded VMWare Workstation, Sony Vegas, or +60 tabs of Chrome (Not firefox due to the lack of 1 thread per tab) + 6 apps at the same time, then you get great value with AMD back in 2010.
  But today, Intel is creaming them and I see these posts as a bunch of desperate panic from AMD fans trying to be defensive against the Intel posters. It is not a personal attack. Right now sadly AMD is losing and for most general use an i3 is a better buy.
  FYI I am typing this on a 2010 era AMD 6 core 2.6 GHz system with an ATI card. It runs Virtualbox VMs great and I do like to leave stuff up where it multitasks. However, it is starting to show its age. It has its moments with javascript even in Chrome and SWTOR does have FPS drops where the fans blow like crazy even with an upgraded ATI 7850!
  A lesser 4 core i5 would sadly be smooth in such a situation, even under multitasking and especially single tasking.
  
  --
  http://saveie6.com/
48. Re: Another marginal perf iteration of Core by petermgreen · 2013-09-03 06:57 · Score: 1
  
  its got a higher number...
  The number system is a mess and really does feel like it was designed by marketing.
  Afaict the fundamental issue is that Intel has recently been focussing more on its laptop and mainstream desktop markets than the high end desktop and server markets. The result is that the high end desktop stuff is currently about a generation behind the mainstream desktop stuff.
  Of course Intel doesn't want to make that too obvious, so they have made the first digit of the part numbers on their last two generations of high end desktop stuff one higher than it should be for consistency with their mainstream desktop stuff.
  
  oh wait but its an older name
  Indeed, unlike the part numbers the microarchitecture names don't seem to be influenced by dubious marketing.
  
  no wait, it's got an -E
  Which means it's for the high end ("enthusiast") desktop platform rather than the mainstream desktop platforms.
  
  but the box is Blue.
  AIUI only extreme editions get the black boxes.
  But yes it is confusing if you don't follow this stuff. As I said I strongly suspect the confusion is deliberate.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
49. Re: Another marginal perf iteration of Core by dywolf · 2013-09-03 07:31 · Score: 1
  
  there IS a difference in box color? hfs. i was having fun with it at the end. i wasnt even aware they had a box color thing going on.
  i havent actually bought a cpu in >5 years, and that was an AMD...last Intel I bought was a Pentium 3 at 500MHz, so that's what...14 years?
  
  --
  The guy who said the election was rigged won the presidency with the second-most votes.
50. Re:Another marginal perf iteration of Core by fuzzyfuzzyfungus · 2013-09-03 07:53 · Score: 1
  
  Wouldn't multithreading require an actual overhaul of their codebase? Just pegging the hell out of a single core(ideally dragging down the browser and any other program foolish enough to embed Flash) seems like more their style...
51. Re:Another marginal perf iteration of Core by fuzzyfuzzyfungus · 2013-09-03 08:01 · Score: 1
  
  I certainly suspect that having all the consoles that aren't Nintendo be 8-thread x86s will increase uptake of multithreaded games (which has been gradually improving, albeit generally with one stubbornly-huge performance-limited thread that is hard to banish because multithreading things just isn't an easy problem); but in this case, the Haswell part, the i7-4770K, is a 4-core/8-thread unit with better performance/thread, while the i7-4960X is a 6-core/12-thread unit with allegedly weaker performance /thread (though better cache and memory bandwidth may compensate for some workloads), so it is still fairly likely that the 4770k will crowd the 4960x pretty aggressively among gamers.
  
  Depending on how they price it, ('flagship' generally adds a major pointless markup; but 'Xeon' has been known to add a nasty dose of sticker-shock when applied to single-socket parts as well), it may be a sweet workstation/single-socket server part; but if it's priced for gamer e-peen contests that's less likely.
52. Re:Another marginal perf iteration of Core by wagnerrp · 2013-09-03 08:01 · Score: 1
  
  Why? Cause having threads over makes other programs run nicely. Run a program eating 4 threads, and you still have response in GUI and other programs. Unless you hit some other bottleneck (and you will).
  Just having threads does not necessarily mean parallel processing. It becomes increasingly difficult to add functional threads to an application without getting them locked down by mutexes.
  
  My next machine will be a i7 with SSD, no bit storage anymore, internet will keep my movies from now on :-)
  How does the release of a new CPU have anything to do with you wasting internet bandwidth?
53. Re:Another marginal perf iteration of Core by wagnerrp · 2013-09-03 08:06 · Score: 1
  
  Don't forget, an IvyBridge-E part has twice the memory bandwidth over the next closest Haswell part. There are a number of applications where that measure trumps a higher IPC.
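  The "twice the memory bandwidth" figure falls out of the channel count: peak theoretical bandwidth is channels times bus width times transfer rate. A rough sketch, assuming 64-bit DDR3 channels and the same DDR3-1600 transfer rate on both platforms (the rate is an assumption for illustration, and real sustained bandwidth is lower than the peak):

```python
BYTES_PER_TRANSFER = 8  # one DDR3 channel is 64 bits wide


def peak_bandwidth_gb_s(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    bytes_per_s = channels * BYTES_PER_TRANSFER * mega_transfers_per_s * 1e6
    return bytes_per_s / 1e9


if __name__ == "__main__":
    ASSUMED_RATE = 1600  # DDR3-1600, assumed equal on both platforms
    dual = peak_bandwidth_gb_s(2, ASSUMED_RATE)   # mainstream socket
    quad = peak_bandwidth_gb_s(4, ASSUMED_RATE)   # LGA 2011
    print(f"dual channel: {dual:.1f} GB/s")
    print(f"quad channel: {quad:.1f} GB/s")
    print(f"ratio: {quad / dual:.1f}x")
```

  With equal memory speed the ratio is exactly the ratio of channel counts; faster DIMMs on either side shift it accordingly.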
54. Re:Another marginal perf iteration of Core by fuzzyfuzzyfungus · 2013-09-03 08:08 · Score: 1
  
  "(You can also use QuickSync if you have that enabled)" That might also be unhelpful to the 4960x... since it's basically a Xeon by another name, it has no integrated graphics capabilities. This isn't a major loss as a GPU(since who buys a crazy-expensive CPU and no discrete GPU if they plan to game?); but, given the substantial delta between dedicated encode hardware and doing it in software, it isn't unlikely that a rather humbler CPU with QuickSync's dedicated encoder will put up a good fight against a 4960x doing its own crunching.
55. Re:Another marginal perf iteration of Core by fuzzyfuzzyfungus · 2013-09-03 08:12 · Score: 1
  
  It's less of an issue with SSDs; but where background tasks can really bite you is storage I/O. That is annoying.
56. Re:Another marginal perf iteration of Core by wagnerrp · 2013-09-03 08:14 · Score: 2
  
  The problem is that people forget (nearly) all water cooling systems are really just fancy air cooling systems. They're just trading phase change heatpipes for pumped water, expecting something magical is going to happen and amazing low temperatures! Unless you add a larger radiator than you could otherwise fit on top of your CPU, all you get is a modest improvement in airflow efficiency, as you prevent hot air from recirculating back through the heatsink. Well designed cases minimize that issue with air cooling anyway.
57. Re:Another marginal perf iteration of Core by amorsen · 2013-09-03 08:20 · Score: 1
  
  If you aren't gaming, why buy a desktop? I suppose there is still AutoCAD and compiling, but that market seems even smaller than the gaming market.
  
  --
  Finally! A year of moderation! Ready for 2019?
58. Re:Another marginal perf iteration of Core by interkin3tic · 2013-09-03 08:34 · Score: 1
  
  I'm as ignorant about hardware as they come on slashdot, so this is an honest question. By two are you saying the demand for per-core performance decreased or the capabilities bottomed out? I remember hearing we were at some point going to break Moore's law because there are finite limits to how much performance you can get out of silicon, and I've also not been blown away by increases in graphics or power requirements on games or applications. Which is it?
59. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 09:25 · Score: 0
  
  Strange that the benchmarks don't even include AMD's fastest CPUs. It's ok to compare a CPU that costs $1000 to an AMD one that costs $250, rather than the one that costs $850. Yeah, I know everyone loves to hate the FX-9590, but that doesn't mean it isn't a product for sale. I didn't see a lot of power consumption numbers in that benchmark, because the people buying these things probably don't care that much, so why should they care about an extra 100 watts on the CPU?
60. Re:Another marginal perf iteration of Core by bored · 2013-09-03 09:54 · Score: 1
  
  It's not that clear cut. The throughput of the instructions counts as well. In many cases AVX isn't faster than SSE because the core can retire 2x the SSE instructions per cycle. Furthermore, it can be harder to get a x8 vector than a x4 one.
  Think how useful 4x4 matrix operations are for 3d graphics. Then consider how to write optimal code using a x8 vector.
  Now all that said, AVX(/2) can really win in some cases.
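  The throughput argument can be put in numbers: elements processed per cycle is lanes times instructions completed per cycle, scaled by how many lanes carry useful data. A minimal sketch; the rates of 2 and 1 instructions per cycle are hypothetical values chosen to mirror the "2x" claim above, not measurements of any particular core:

```python
def elements_per_cycle(lanes, instructions_per_cycle, useful_lanes=None):
    """Useful elements processed per cycle for one vector width.

    useful_lanes is how many lanes hold real data, e.g. 4 when a
    4-element matrix row is placed in an 8-lane register.
    """
    if useful_lanes is None:
        useful_lanes = lanes
    if not 0 <= useful_lanes <= lanes:
        raise ValueError("useful_lanes must be between 0 and lanes")
    return useful_lanes * instructions_per_cycle


if __name__ == "__main__":
    # Hypothetical rates: twice as many SSE instructions per cycle.
    sse = elements_per_cycle(lanes=4, instructions_per_cycle=2)
    avx = elements_per_cycle(lanes=8, instructions_per_cycle=1)
    # A 4-wide row in an 8-wide register leaves half the lanes idle.
    avx_row = elements_per_cycle(lanes=8, instructions_per_cycle=1,
                                 useful_lanes=4)
    print("SSE:", sse, "AVX:", avx, "AVX with 4-wide rows:", avx_row)
```

  Under these assumed rates the wider vector only breaks even when every lane is filled, and falls behind when the data is naturally 4 wide.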
61. Re:Another marginal perf iteration of Core by Blaskowicz · 2013-09-03 10:06 · Score: 1
  
  Except GPU encoding sucks: it is fast but the quality is compromised. Some steps are very GPU-unfriendly (stuff with a lot of branching and random access is) so they're done in a simpler, quicker way or at a lower "profile". GPGPU encoding even becomes pointless if you have encoding hardware (Quicksync, NVENC) which gives you the same result but using a lot less energy, while CPU is where it's at if you want max quality for a given size.
  GPUs still have to take over a lot of tasks (why not some kinds of audio processing) and be used more often, each generation getting a bit more useful and easier to use for computing tasks. But they aren't able to do everything or do it efficiently.
62. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 11:13 · Score: 0
  
  It's a programming challenge to keep many cores occupied.
  However, never forget that the sequence to enable a new capability is hardware first, then OS support, then application support. Core counts beyond 4 are still a bit unusual and that includes hyperthreading. The chip manufacturers have been reluctant to release those high core count processors into the non-server world and that's largely due to the lack of strong app support. The app makers see the flip side of this and decline to invest in highly threaded applications due to a lack of HW availability in their target markets.
  Eventually the chicken-and-egg problem will be resolved and pressure will increase on app programmers to make use of the additional CPU resources. In the gaming world there's a lot of product turnover and that means lots of opportunities to revamp the engine code that will tap additional cores.
63. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-03 11:37 · Score: 0
  
  There's no benchmarks to support the idea that a 2500K crushes a Phenom II X4 955? Really? That's what you're going with? Going with AnandTech because they're a much more respectable site than Tom's (honestly, who the fuck cares what Tom's says, they've been irrelevant since even before Tom sold out, and afterwards? ugh) and have a nifty benchmark database thing:
  http://anandtech.com/bench/product/288?vs=88
  p.s. The best Pricewatch prices for the 2500K and 955BE are $180 and $107, respectively, both from the same shop in Fremont CA. So it's more like a $73 difference. Also pricing on both is kinda unreliable anyways since they're both out of production CPUs and what you're buying at this point is new old stock. You can't even find these CPUs on Newegg any more. You really should update your ill-informed fanboy rants to reflect current events.
  p.p.s. Another problem with your little rant is that Intel's actual competitors for the 955BE were i3 line models, not i5, and they were priced competitively. And as you can see here they tended to trade blows on desktop/productivity with the i3 winning in games. (Generally speaking the i3 is much better at single-threaded performance, the 955 better at multithreading. Desktop apps are a mix between single and multithreaded so it tends to average out, but games are mostly lightly threaded so lots of them ran substantially better on the i3.)
  http://anandtech.com/bench/product/289?vs=88
  So, y'know, that thing where you like to pat yourself on the back for being clever and insightful about saving money on the CPU by buying AMD to build a gaming system? You might want to reconsider whether you're actually the one making suboptimal decisions. Because you were.
64. Re:Another marginal perf iteration of Core by DigiShaman · 2013-09-03 14:45 · Score: 1
  
  1 Windows terminal server with 30+ Thin-clients.
  Terminal server has 24GB of RAM with 16 threads of execution available.
  Half the memory is already used up and over 93% of all threads combined processing flash content.
  Due to excessive CPU load, the UPS trips due to current overload. Priceless!!!
  
  --
  Life is not for the lazy.
65. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:06 · Score: 1
  
  Indeed. And for things you only need to read once, caches do not help at all.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
66. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:07 · Score: 1
  
  Don't be obtuse. Everybody knows that. What this discussion obviously is about is the derived law on CPU performance.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
67. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:12 · Score: 1
  
  I agree, and except for one notebook, I have all AMD for that reason. Also AMD has had the better memory interface for quite some time.
  But buyers are not rational. And pressure on Intel only results when people realize that for the performance most people can use, AMD is significantly superior. At the moment Intel still has the faster high-end chip, and even if almost nobody buys it, that makes people think that Intel CPUs are faster. Stupid, I know, but also reality.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
68. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:13 · Score: 1
  
  That would be a tragedy. Remember how badly Intel's CPUs sucked before AMD pushed them? Pentium 4? That is what they would go back to if AMD dies. Not good at all. That said, any predictions of AMD's demise are vastly premature.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
69. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:16 · Score: 1
  
  Have you looked at the atrocities Intel pushed out before AMD overtook them for a while? All those "highly skilled engineers at Intel" are responsible for trash like the Pentium 4.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
70. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:19 · Score: 1
  
  CPUs cannot get significantly faster. Most desktop workloads can only be parallelized so much. Moore's original Law still holds, i.e. transistors per area increasing. But it just does not result in noticeable speed gains on the CPU side anymore. That is over, CPUs are pretty much mature at this time and not much advancement can be expected, possibly ever. Cheaper, less power, yes. Faster, no.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
71. Re:Another marginal perf iteration of Core by gweihir · 2013-09-03 17:20 · Score: 0
  
  Indeed. One thing Intel has is a large war-chest. My guess is that now and then something from it finds its way into the hands of journalists.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
72. Re:Another marginal perf iteration of Core by slashmydots · 2013-09-04 03:17 · Score: 1
  
  There's not much that their 8-core Vishera won't do so you're good across the whole scale. Anyone who wants THE FASTEST processor just to have it is crazy, and anyone who wants the fastest one because they encode HD videos daily for CNN, that's a different story.
73. Re:Another marginal perf iteration of Core by green+is+the+enemy · 2013-09-04 03:28 · Score: 1
  
  You're right about the current state of things. Traditional CPUs are not going away. Some code is branch-heavy, but does SIMD really help there? Some compromises are being made in quality in GPU implementations, probably because NVIDIA severely limits the double-precision performance of their consumer GPUs. I'm just thinking that Intel's efforts might be better spent in improving GPU technology, rather than adding even more SIMD instructions to x86. The SIMD path seems to be a dead end, when compared to GPUs.
  
  On another note, Intel has got to get away from DDR3.. ugghh. Memory technology has made such strides (GDDR5), but Intel seems to ignore them. The best approach will probably be to combine a beefy GPU with several x86 cores on the same die, with massive memory bandwidth (push the limit) and no PCIe bottleneck between them.
74. Re:Another marginal perf iteration of Core by Anonymous Coward · 2013-09-04 14:40 · Score: 0
  
  I'm just thinking that Intel's efforts might be better spent in improving GPU technology, rather than adding even more SIMD instructions to x86. The SIMD path seems to be a dead end, when compared to GPUs.
  Intel is pursuing both paths. This year they're releasing a bunch of laptop and desktop CPUs:
  http://ark.intel.com/products/codename/51802/Crystal-Well?q=crystal%20well
  with very powerful integrated GPUs plus a 128MB eDRAM L4 cache chip integrated into the package (the purpose of the L4 being to provide memory bandwidth similar to discrete GPUs). Based on preliminary testing by the usual suspect websites, performance is reasonably close to good laptop discrete GPUs like the NVidia 650M. (which is pretty much where this was aimed at.) It's not all the way there yet, but give it another generation or two and you're going to be seeing the market for discrete GPUs shrink a lot. (well, a lot more than it already has -- integrated video has already eaten almost the entire low end discrete GPU market)
  
  On another note, Intel has got to get away from DDR3.. ugghh. Memory technology has made such strides (GDDR5), but Intel seems to ignore them.
  You don't understand what is going on there. GDDR5 has super aggressive signaling rates (over 5 GHz in some cases!!). As a consequence it is more or less impossible to have more than one load per signal pin, which means it's impossible to support large amounts of GDDR5 on one controller. In fact, it's impossible to even introduce one or more sockets into the signal path -- note that GPUs all have both the GPU and the GDDR5 soldered down, with very carefully laid out (and short) traces between the GPU and the memory.
  If CPUs switched over to GDDR5, you'd have to buy motherboards with a soldered CPU and a fixed amount of soldered GDDR5. You could never upgrade the CPU or expand the memory. In the long term this is kinda where things are going, but right now a lot of the market really isn't ready for this step yet. Which is why AMD also has stuck with DDR3 memory for the processor even though AMD knows plenty about GDDR5 (having pioneered very high clock rate GDDR5 interfaces in their GPUs).
  
  The best approach will probably be to combine a beefy GPU with several x86 cores on the same die, with massive memory bandwidth (push the limit) and no PCIe bottleneck between them.
  Well, uh, see above. The first steps down that road are exactly what Intel is delivering as we speak. The GPU's not super beefy yet, and the on-package L4 can't yet match the bandwidth of the big multi-hundred-watt GPUs, but it's pretty clear where they're going.
75. Re:Another marginal perf iteration of Core by metaforest · 2013-09-04 15:20 · Score: 1
  
  Moore's Law is still active, but no longer influences single thread performance. Multi-cores are limited by Amdahl's Law.
  http://en.wikipedia.org/wiki/Amdahl's_law
  Software needs to find better ways to gain in parallel.... this is a VERY hard problem.
  However if you have many single threaded tasks that are unrelated, your computer remains fairly snappy due to the scheduler balancing your apps across all available hardware threads.
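  Amdahl's Law, linked above, gives the best-case speedup as 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of hardware threads. A small sketch of that formula; the sample fractions are arbitrary illustrations, not measurements of any real program:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Best-case speedup from Amdahl's Law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Arbitrary sample fractions, evaluated for 4, 6 and 12 threads.
    for p in (0.5, 0.9, 0.99):
        row = ", ".join(
            f"{n} threads: {amdahl_speedup(p, n):.2f}x" for n in (4, 6, 12)
        )
        print(f"p = {p:.2f} -> {row}")
```

  As n grows the speedup approaches 1 / (1 - p), so a program that is half serial can never run more than twice as fast no matter how many cores are added.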
76. Re: Another marginal perf iteration of Core by petermgreen · 2013-09-04 21:37 · Score: 1
  
  there IS a difference in box color? hfs. i was having fun with it at the end. i wasnt even aware they had a box color thing going on.
  Yep, Intel use black boxes for their extreme edition chips and blue boxes for pretty much everything else.
  AMD do much the same: "black edition" processors get black boxes, other processors get green boxes.
  
  --
  note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Can't wait for the server version by afidel · 2013-09-03 01:56 · Score: 1

3.6GHz base clock is the fastest we've had since the last generation P4's, and with the obviously superior IPC of the IB this thing's going to be a monster for certain workloads where the code doesn't scale well to multiple cores. The only downside is it's not 8 cores/16 threads at those speeds which is a bummer for virtualization hosts. Oh well, the E5-2670's at 2.6GHz do a pretty good job =)

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
1. Re:Can't wait for the server version by Anonymous Coward · 2013-09-03 02:33 · Score: 0
  
  3.6GHz base clock is the fastest we've had since the last generation P4's
  E3-1290v2. 3.7GHz base clock.
2. Re:Can't wait for the server version by afidel · 2013-09-03 02:50 · Score: 1
  
  2 memory channels makes it not very useful for my purposes but it is in fact slightly faster =)
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
3. Re:Can't wait for the server version by Anonymous Coward · 2013-09-03 03:16 · Score: 0
  
  E5-1660v2 (Xeon cousin of the 4960X) will have 3.7/4.0 base/turbo. Expect about $1k street.
4. Re:Can't wait for the server version by afidel · 2013-09-03 03:33 · Score: 1
  
  bah, why not a dual socket chip? Why can't I get the best single core performance in something that actually makes sense to license (ie VMWare and MS both license based on two sockets per server minimum).
  
  --
  There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Hhaha I had a P4 that ran 4 G TEN years back by Anonymous Coward · 2013-09-03 02:23 · Score: 0

And this is what Intel calls Moore's law?
Amd rulez!
1. Re:Hhaha I had a P4 that ran 4 G TEN years back by Anonymous Coward · 2013-09-03 04:23 · Score: 0
  
  Moore's law is about the number of transistors increasing, not speed.
2. Re:Hhaha I had a P4 that ran 4 G TEN years back by T-Bone-T · 2013-09-03 04:27 · Score: 1
  
  Your P4 at 4GHz can't do nearly as much as a single core on a newer processor. My 2.4GHz P4 converts DVD movies to low-res in 8 hours or so; my 2.8GHz i3 does the exact same thing in 20 minutes, 24x faster overall and 6x faster per thread.
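  The arithmetic behind those two ratios can be checked quickly. This sketch assumes the i3 exposes 4 hardware threads (2 cores with Hyper-Threading), which the comment implies but does not state.

```python
p4_minutes = 8 * 60          # "8 hours or so" on the 2.4GHz P4
i3_minutes = 20              # same job on the 2.8GHz i3
i3_hardware_threads = 4      # assumption: 2 cores x 2 threads

overall_speedup = p4_minutes / i3_minutes
per_thread_speedup = overall_speedup / i3_hardware_threads

print(f"overall: {overall_speedup:.0f}x, per thread: {per_thread_speedup:.0f}x")
```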
3. Re:Hhaha I had a P4 that ran 4 G TEN years back by wiredlogic · 2013-09-03 08:26 · Score: 1
  
  That late-generation P4 used a 31-stage pipeline to achieve those high clock speeds. The Ivy Bridge architecture uses a 14-stage pipeline, giving it higher IPC than the power-hungry NetBurst line could ever hope to achieve.
  
  --
  I am becoming gerund, destroyer of verbs.
4. Re:Hhaha I had a P4 that ran 4 G TEN years back by Billly+Gates · 2013-09-03 09:11 · Score: 1
  
  Funny how the geeks here who knew their stuff bought lower-specced Athlon XPs, because I knew my Athlon XP 1800 at 1200 MHz would cream a Pentium III of that same speed quite easily for half the price.
  But Joe Six-Packs saw that MHz tag and purchased Pentium 4s instead.
  
  --
  http://saveie6.com/
A10 has a GPU too by Anonymous Coward · 2013-09-03 02:30 · Score: 0

Hardly a realistic comparison, given that the A10 has an integrated GPU and the Intel 6-core doesn't.
1. Re:A10 has a GPU too by CajunArson · 2013-09-03 03:07 · Score: 1
  
  Yes, you are right... it is unrealistically favorable to AMD, that is, since if you had bothered to look at the charts you'd note that the benchmark was a CPU-only test that gave AMD the advantage of being able to run the GPU at very low power, since it isn't being stressed, and redirect the power consumption to the CPU...
  Oh and they also tested with discrete GPUs that completely relieve the APU of having to expend any energy on the IGP at all.
  
  --
  AntiFA: An abbreviation for Anti First Amendment.
2. Re:A10 has a GPU too by Anonymous Coward · 2013-09-03 08:12 · Score: 0
  
  Yes, you are right... it is unrealistically favorable to AMD, that is, since if you had bothered to look at the charts you'd note that the benchmark was a CPU-only test that gave AMD the advantage of being able to run the GPU at very low power, since it isn't being stressed, and redirect the power consumption to the CPU...
  Oh and they also tested with discrete GPUs that completely relieve the APU of having to expend any energy on the IGP at all.
  You fail at math and physics.
And still ineffective... by Lumpy · 2013-09-03 02:58 · Score: 2

Because the only Multi Chip processors are still 4 years behind this. Why don't they just enable the ability for me to drop 4 of these on a single motherboard so I can have my 24-core monster for editing and rendering 4K video?

--
Do not look at laser with remaining good eye.
1. Re:And still ineffective... by Billly+Gates · 2013-09-03 07:29 · Score: 1
  
  Because the only Multi Chip processors are still 4 years behind this. Why don't they just enable the ability for me to drop 4 of these on a single motherboard so I can have my 24-core monster for editing and rendering 4K video?
  You can. Asus makes a workstation version of its server and gamer Sabertooth lines. They have 2 sockets. With 2 of these things in there you can have a 12-core, 24-thread system (assuming Asus updates its WS board to support this soon). They are not as popular as 10 years ago, when geeks put 2 Athlon MPs together, because extra cores do the same thing for a fraction of the cost without a specialty, premium motherboard.
  They are pricey at $400+ per board, though. Add a gamer-grade 1200 watt PSU, and you can get a medium-grade FirePro, which is as fast as an ATI 6670, for $350 now; that would be fine for this if you have the cash for a $2000+ system.
  
  --
  http://saveie6.com/
2. Re:And still ineffective... by Anonymous Coward · 2013-09-03 09:33 · Score: 0
  
  They'd be stupid to do that because it would cannibalize their Xeon sales. Why would they let you get away with buying two $250 i5 or i7 CPUs when they can sell you a pair of $600 Xeon E5-2630 instead, along with a $500 S2600CP2 motherboard... Sure, they lose a handful of sales to hobbyists, but they make up for it a thousand fold in enterprise sales.
Does it support TSX? by adisakp · 2013-09-03 03:14 · Score: 1

That's an important question for me, as I write the base-level concurrency libraries for our company.

I wanted to get a 4770K but Intel disabled TSX (Transactional Synchronization Extensions) on that CPU.
1. Re:Does it support TSX? by adisakp · 2013-09-03 03:22 · Score: 1
  
  For what it's worth, none of the Haswell 'K' line supports TSX. You actually have to buy a cheaper CPU to get this feature which is odd... maybe it didn't validate well with overclocking though? The new 'HQ' line seems to support it but the new 'R' line does not.
  
  Anyhow, I'm wondering if the 'X' line supports TSX or not. I can't find docs or specs that answer one way or another right now.
2. Re:Does it support TSX? by Anonymous Coward · 2013-09-03 03:24 · Score: 0
  
  You'll probably have to wait for em to get listed on ARK to find out.
3. Re:Does it support TSX? by m.dillon · 2013-09-03 06:15 · Score: 1
  
  It's almost an oxymoron if you are talking about a single-socket Intel cpu. You don't actually need the transactional extensions to make things go fast. It's only when you get to multi-socket where the cache management bandwidth (which is what the transactional extensions are able to avoid) becomes a big deal.
  If the purpose is to test code performance then it is better to test without transaction support anyway since transaction support is not a replacement for proper algorithmic design. Or to put it another way... if you code SPECIFICALLY for one of the two intel transactional models that means you will probably wind up with very sloppy code (such as using global spinlocks more than you need to and assuming that the underlying transaction just won't conflict as much). The code might run fine on an Intel cpu but its performance value will not be portable.
  And besides... 'your company' ? Use a Xeon then, right? It's not as though it costs all that much more.
  -Matt
4. Re:Does it support TSX? by adisakp · 2013-09-03 06:39 · Score: 1
  
  It's almost an oxymoron if you are talking about a single-socket Intel cpu. You don't actually need the transactional extensions to make things go fast
  Not true... I've written an entire concurrency system including a lock-free library and a multicore memory manager. There are a number of places where TSX offers a large speed improvement even on a single core.
  
  If the purpose is to test code performance then it is better to test without transaction support anyway since transaction support is not a replacement for proper algorithmic design. Or to put it another way... if you code SPECIFICALLY for one of the two intel transactional models that means you will probably wind up with very sloppy code (such as using global spinlocks more than you need to and assuming that the underlying transaction just won't conflict as much). The code might run fine on an Intel cpu but its performance value will not be portable.
  Are you even familiar with how TSX works? Hardware Lock Elision is a very simple replacement for atomic locking. You can write a very simple user level mutex using atomic operations that has a fallback to an OS yielding construct. In fact that's what we do in my concurrency library. Uncontested access is a single atomic op while contested access is an actual OS mutex. With HLE, all accesses can appear to be uncontested unless there is an actual data conflict in memory read / written to during the transaction. This cuts down significantly on OS lock calls. It's not just for spinlocks.
  
  And besides... 'your company' ? Use a Xeon then, right? It's not as though it costs all that much more.
  -Matt
  By "my company", I mean the company I work for. Disclaimer: NetherRealm Studios, which is owned by Warner Brothers, but this is my personal opinion and any posts I make here do not reflect the official opinion of Warner Brothers or NetherRealm Studios. We make video games and I write low-level optimized code for multicore / multithreading libraries, among many other things.
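  The point above, that with HLE a lock only behaves as contended when the protected memory accesses really overlap, can be pictured with a toy model. This is not TSX code and uses no real hardware feature; the read/write sets, address names, and function names are all hypothetical, and the model ignores aborts caused by capacity limits or interrupts.

```python
from itertools import combinations


def conflicts(a, b):
    """Two critical sections conflict if one writes an address the other reads or writes."""
    return bool(
        a["writes"] & (b["reads"] | b["writes"])
        or b["writes"] & (a["reads"] | a["writes"])
    )


def sections_that_must_serialize(sections):
    """Indexes of sections whose accesses overlap with at least one other section."""
    clashing = set()
    for (i, a), (j, b) in combinations(enumerate(sections), 2):
        if conflicts(a, b):
            clashing.update((i, j))
    return clashing


# Three threads enter the same lock-protected region at once (made-up addresses).
sections = [
    {"reads": {"head"}, "writes": {"slot0"}},
    {"reads": {"head"}, "writes": {"slot1"}},
    {"reads": {"slot1"}, "writes": {"slot2"}},
]
print(sorted(sections_that_must_serialize(sections)))
```

  With a plain lock all three sections would take turns. In this model only the second and third overlap (one writes `slot1`, the other reads it), so the first could proceed as if the lock were uncontested.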
5. Re:Does it support TSX? by m.dillon · 2013-09-03 07:04 · Score: 1
  
  Well, for video games (or anything you sell to the consumer), you clearly do not want to rely on Intel's transactional extensions because doing so could significantly reduce or destroy performance on any customer systems that don't have them.
  Basically the way the basic (the prefixed) transactional extension works is to avoid dirtying the cache line(s) associated with the spin lock or unlock operations, with the assumption that the operations which are run within the locked section are less likely to conflict than the spin lock itself.
  Since spin locks are often used incorrectly and are often coarse-grained (or even global), this can yield significant improvements in performance.
  However, it also HIDES the fact that the application was badly written. If you write an application which depends on the new extensions the result will be extremely poor performance on any cpu which does not support the extensions.
  So as far as game design goes, the transaction stuff is worse than worthless. The transactional stuff is best used for turn-key (server-side) code where the hardware environment is under your control.
  -Matt
6. Re:Does it support TSX? by adisakp · 2013-09-03 08:49 · Score: 1
  
  Nope... it's very valuable. Basically what you do is you write code that makes use of fine-grained user space locking that has a fallback to OS locks on contention. This runs very fast on multicore systems.
  
  Then you add HLE extensions and it runs even faster on CPU's that support TSX and you get a rather large performance bonus for free as at that point a majority of your atomic operations become free.
  
  It also allows you to substitute simple lock-free and non-blocking algorithms that rely on multi-address DCAS or NCAS on platforms with TSX, because they are very simple to emulate using either HLE or, even better, RTM.
  
  If you think TSX is for spinlocks, you are generations behind in your programming of lockfree algorithms.
7. Re:Does it support TSX? by adisakp · 2013-09-03 09:00 · Score: 1
  
  Also, I don't see why you keep referencing global locks and spin locking as the only things that would benefit. Did you get a chance to read the presentation I linked to? Mind you, it's based on work I did 4-5 years ago and presented almost 4 years ago, but even back then we were well ahead of the starting point you seem to feel developers are using as a base.
  
  We are already using fine-grained locking, striped locking, reader/writer locks, lock-free atomic SList, lock free allocators, etc. I am interested in having TSX speed up these advanced concurrency primitives on platforms where it is available. If AMD releases ASF, I'll look into accelerating with that as well.
8. Re:Does it support TSX? by adisakp · 2013-09-03 10:24 · Score: 1
  
  So as far as game design goes, the transaction stuff is worse than worthless.
  I want to feel you're not just trolling me because apparently you've been developing software since at least the Amiga days (we have that in common). However, I feel you are quite misguided on some of your assumptions here.
  
  Not to say I may have a more informed opinion than you because I don't know your personal experience in game development, but I certainly feel that TSX isn't worthless for games and I've been writing performance code full time for games for over 20 years.
9. Re:Does it support TSX? by Blaskowicz · 2013-09-03 10:34 · Score: 2
  
  It's safe to say it does not, as TSX is a Haswell feature and the 4960X is an Ivy Bridge CPU.
  What you would need is the 4960X's successor, which is Haswell-E on a new socket called LGA 2011-3 with DDR4, and its server counterparts. Or get a vanilla 4770 or 4771.
10. Re:Does it support TSX? by m.dillon · 2013-09-03 11:02 · Score: 1
  
  Not to put too fine a point on it, but I've written hundreds of thousands of lines of SMP code on modern systems (and, frankly, I was doing SMP code with paired 8-bit CPUs over 28 years ago), so if you think you are somehow stating something in regards to my knowledge base, I would humbly suggest that you take your opinions and shove them down a toilet somewhere because you clearly have no clue whatsoever as to what I've been doing the last 20 years.
  -Matt
11. Re:Does it support TSX? by m.dillon · 2013-09-03 11:25 · Score: 1
  
  I don't think you actually bothered to read and understand what I wrote. Try again. This time read my responses (or at least the first two) a bit more carefully.
  I'm not in the least saying that transactional hardware support is bad. I am saying that programming to Intel's transactional interface FIRST, as your primary programming model, particularly for consumer applications, can lead to very undesirable results on hardware that doesn't support it.
  Intel tends to implement first-run features with very weird restrictions and other goo that only leads to trouble down the line. Their HVM stuff being a really good case in point (AMD's is much, much cleaner, too bad AMD dropped off the performance curve a few years ago). There are plenty of other examples as well. There are very severe limitations to what Intel's (now TWO) transactional models can handle. Programming specifically to those interfaces is akin to the people who would hand-code assembly years ago for a 15% improvement over C and think it was good (hint: all that code was thrown out the window soon after).
  Right now only very short locked code sections can operate reasonably under Intel's current transactional model. It just so happens that the most prevalent use of spin locks in modern architectures is with short locked code sections. Longer locked code sections do not typically use spin locks. So a reasonable first approximation of any implementation of Intel's model is going to be as a way to boost spin locks.
  In a well written SMP application or kernel there are not actually very many places where conflicts create performance issues. In either Linux or DragonFly I can only think of two places: (1) Concurrent file descriptor access from a threaded program, and (2) Concurrent VM faults from a threaded program to the same memory location(s) (same underlying VM objects and same underlying page table page). And that's pretty much it. For applications it will depend on the application of course, I'm not completely dismissing it. I LIKE the concept. I DON'T LIKE the severe limitations imposed by Intel's model. It's that simple.
  -Matt
12. Re:Does it support TSX? by Anonymous Coward · 2013-09-03 12:38 · Score: 0
  
  It does not support TSX. Not because they've disabled it, but because in this case it actually isn't there. TSX is presently only implemented in Intel's newest CPU core design, Haswell. These three chips (4820K, 4930K, and 4960X) use previous generation Ivy Bridge cores.
  Why would they release a new high end CPU with an older core? Simply because Intel's high end "desktop" chips are actually rebadged midrange Xeon E5 v2 series server/workstation chips with a few features disabled. E5 v2 initial tapeout (release to engineering sample production) probably happened at roughly the same time as 3770K and the other mainstream desktop/mobile Ivy Bridge CPUs. But thanks to its support for dual socket and some other enterprise features, as well as the higher quality expected in those markets, Xeon E5 takes longer to validate and bugfix. So we're seeing the resulting products about a year after mainstream Ivy Bridge CPUs.
  By the same token, Intel is almost certainly already working on Haswell E5, but it won't be released to production for another year or so. (Maybe more.)
13. Re:Does it support TSX? by adisakp · 2013-09-03 13:09 · Score: 1
  
  I felt the choice of wording was a bit prematurely dismissive (i.e. saying it shouldn't apply to single socket CPU's or to Game Programming -- especially since that is the primary target of my concurrency research).
  
  Also, we are not trying to write specifically to HLE. We are trying to write stuff that runs well on multicore systems and then layer HLE on top of it for an added performance benefit for when we do have lock conflicts.
  
  I agree that well written applications don't have nearly as many locking conflicts to begin with and that's certainly our goal. We try to run most of our game using a multicore graph driven data dependency scheduler (also presented at GDC by my coworker).
  
  But there are a number of systems (both internal and legacy) that do use locking that will benefit from HLE. It's about making the code run as fast as possible.
  
  When we have to lock, we try to make it as fine-grained as possible (until you get diminishing returns in either performance or memory). HLE works well with existing algorithms (almost nothing to rewrite except to specify whether the CAS is acquire or release for a lightweight user space lock) and it is backwards compatible with an extremely low penalty for processors without HLE -- from the benchmarks I have seen, HLE code will run as fast as the original locking code (to within some white noise of most performance metrics on the unsupported CPU's) which should be fairly fast assuming you use RWL's, striped locks, and organize data algorithms to minimize contention.
  
  HLE is a "no brainer" for ease of implementation. However, a TSX RTM code path does require algorithm rewrites, so perhaps that is akin to "writing in asm for a 15% improvement over C" (although some game programmers would find that tradeoff acceptable in small code functions in low-level libraries anyhow!).
  
  As far as gamers are concerned, if we can give them a noticeable performance boost by taking advantage of a specific CPU feature without slowing down the code on CPU's without that feature, they will consider that to be a big win and love us for it.
Re: wrong wrong wrong by jsh1972 · 2013-09-03 03:25 · Score: 1

Does this scale up, as in could you arrange four cores in a manner to approximately give 4x performance?
Re:wrong wrong wrong by razvan784 · 2013-09-03 03:37 · Score: 2

I don't think you understand correctly how a superscalar processor works. Maybe you're confusing parallel instruction execution with pipelining? Even single-core, non-hyperthreading processors have been able to execute multiple instructions *simultaneously, in a single cycle* since the first Pentiums or earlier. See, they can fetch two instructions at once from the cache because it has a wide internal bus, decode them simultaneously, and execute them simultaneously (if they are independent) because each core has multiple execution units. Modern processors can easily execute 3 or 4 instructions at once on a single core, in a single cycle. As I understand it, hyperthreading comes in when part of those execution units are sitting idle because there are not enough instructions in the main thread that can be executed in parallel - they're not independent, some depend on the results of others - and so those idle units are used to process another thread. Of course it's slower than having two full cores, but the point is that a single core CAN execute a lot of stuff in parallel.
2CPU.COM by DarthVain · 2013-09-03 03:45 · Score: 1

I still have an old Abit BP6 system sitting next to my desk gathering dust if you want it. I even have 4 extra celeron processors for it!
Back when men were men, and dual core meant two processors!
Sadly other than specialized software, most are still only designed for single core anyway, making the performance gains negligible for most people, which means other than an expensive marketing ploy to a small enthusiast market, not much of a market advantage for any company to do so...
1. Re:2CPU.COM by Billly+Gates · 2013-09-03 07:36 · Score: 1
  
  I still have an old Abit BP6 system sitting next to my desk gathering dust if you want it. I even have 4 extra celeron processors for it!
  Back when men were men, and dual core meant two processors!
  Sadly other than specialized software, most are still only designed for single core anyway, making the performance gains negligible for most people, which means other than an expensive marketing ploy to a small enthusiast market, not much of a market advantage for any company to do so...
  That is radically changing. Notice Samsung has an 8-core phone coming out soon. Most people outside of moms use Chrome and IE, which both have a thread per tab, and also have a half dozen apps and even more files open. Geeks like us probably run VirtualBox or VMware to test Windows Server and Linux with XP and Windows 7 clients. Any IT geek worth his or her salt does this, and let me tell you, my old crappy 6-core Phenom II came in handy when I started doing this to learn push installation testing. Games are now using this too, as even ports from the Xbox One have multithreading built in due to more than 1 core being used.
  With people leaving XP slowly but surely, they use more than 8 gigs of RAM, which enables them to open more apps at once, which will utilize this. Chrome is the biggest user of this: having 40+ tabs open can slow a mere dual-core system very quickly even if you have the RAM! It may not scale linearly, but Windows 8 also leaves apps open when you close them, just like iOS and Android. Having the extra cores helps with this.
  
  --
  http://saveie6.com/
2. Re:2CPU.COM by DarthVain · 2013-09-03 08:04 · Score: 2
  
  I was about to mention that all of the things you talk about are more memory intensive than anything else, which of course is OS dependent, requiring 64-bit, which in addition to hardly anyone bothering to run multi-threaded software, no one bothers to write software optimized for 64bit systems either.
  The main problem is that, relatively speaking, the single-threaded 32-bit applications people are used to making are simple compared to writing a multi-threaded, 64-bit-optimized application. Unless there is a real advantage, why do it if it will take longer, cost more money, etc.? I agree it will eventually happen, just not as soon as you may be alluding to.
  The next step really needs some method/tool/language to make the process easier for the developers to write the software, allowing them to do it more efficiently, which will in turn start to get management on board to create some of these things.
  *Note: I have no experience whatsoever writing multi-threaded 64-bit-optimized software. I have only heard on the interwebs that its inherent complexity makes it more difficult to do.
  Also there is the 32bit crutch. Lots of apps out there that are 32bit so it is not going away anytime soon, not like a clean break. While it is still an available option developers will use it. That said, I am not even sure how much difference there is, some benchmarks show very little improvement from one to the other, but that could be a mature technology VS a new one and not a fair comparison.
  That said, using more available cores, particularly for specific tasks, would likely see immediate dividends. From my understanding it is timing/scheduling and organizational issues that make it more complex.
  OK I may have rambled a bit.
3. Re:2CPU.COM by Billly+Gates · 2013-09-03 08:49 · Score: 1
  
  It is annoying that MS still makes 32-bit software, in addition to not supporting 16-bit software through WOW32 in 64-bit versions of Windows. I guess running WOW daisy-chained to WOW32 is banned. WOW stands for Windows on Windows; it runs Win16 on Win32, so your corporate VB 5 app circa 1996 can still run on Windows 7 (the 32-bit version).
  I am not a developer, but they tell me it is a simple recompile unless you are working in assembly or device driver development. 64-bit is much more efficient, as it has compiler flags for the extra registers on a modern post-Pentium 4 chip, in addition to SSE3 and other extra instructions added to chips in the last 10 years. Windows 7 also has a crippled MBR that does not check for signed bootloaders, so a rootkit like Alureon can execute. So you get more performance if you compile for 64-bit, besides the extra RAM.
  I think modern versions of Linux can use these extra instructions by default too, depending on how they are compiled, and just not use them on an older Pentium 4, so the difference is not quite so stark. But XP reeks in comparison: most users feel it is faster, but in reality it does not take advantage of any processor made in the last 10 years!
  MS really should just port the 2 WOWs and kill 32-bit OS development, as it gives a developer no incentive to upgrade his or her code while XP is still a huge market where many users have old DOS apps that require it. ... Regardless, 32-bit operating systems support threading and processing as well as, if not better than, 64-bit ones, and have for many years. A multi-core is nice for heavy multitaskers: even if you have the memory, your single core and even dual core will glitch with 5 apps and 40+ tabs in Chrome.
  
  --
  http://saveie6.com/
4. Re:2CPU.COM by DarthVain · 2013-09-04 01:58 · Score: 1
  
  Likely in the end it will be the memory that drives the issue. As applications require more memory, OR users become accustomed to multitasking which may require more memory, specifically more than 3.5GB worth, it is only then that a 64-bit OS will be *required*. There are still a lot of basic users, and basic machines with 4GB of RAM, out there. At a certain point OSes will move to ONLY 64-bit, which will in turn start to offer incentives to developers. So bring on the glut of background services and system bloat! :) You sort of have to go backwards to go forwards, it seems.
TPM no more by stanlyb · 2013-09-03 03:52 · Score: 2

Since their devotion to TPM, my answer to intel was, is and will be: GO F**** yourself.
1. Re:TPM no more by Anonymous Coward · 2013-09-03 07:56 · Score: 1
  
  Why do people get all worked up over something they so obviously know nothing about?
  Here's a few clues:
  1. TPM is a separate module on the motherboard. Without it, there is no TPM.
  2. There are virtually no enthusiast motherboards available that have a TPM module installed. Most have a header, but good luck finding a shop that actually sells you a module to plug on it.
  3. If you, uninformed as you are, do end up buying the one consumer model MoBo that has a TPM module installed, turn it off in the BIOS and yank it out. It's 100% optional.
  So you are allowing your cluelessness and apparent emotional instability to get you to a point where you boycott a brand because it offers you... choice?
  I thought Apple fanboys were bad but it's nice to be reminded the AMD ones were worse. Nostalgic reasons and all that mind you, to actually still encounter an AMD fanboy in this day and age evokes nothing but pity.
2. Re:TPM no more by Anonymous Coward · 2013-09-03 07:58 · Score: 1
  
  AMD also supports TPM. If you want to avoid it, disable it in the BIOS or buy a motherboard that doesn't support the feature.
  It is just a crypto key burned into the motherboard. It doesn't track anything, it's a string of data.
  But if you are part of the tin foil hat crowd that e-mails webpages to themselves out of fear of being tracked it may seem like something more.
  http://stallman.org/stallman-computing.html
3. Re:TPM no more by Anonymous Coward · 2013-09-04 14:47 · Score: 0
  
  It is just a crypto key burned into the motherboard. It doesn't track anything, it's a string of data.
  Actually, TPMs do a bit more than just secure crypto key storage. But you are right in that they don't, in and of themselves, do the things the tinfoil hat nutters are worried about. Merely having a TPM installed doesn't mean anyone can spy on you, lock your computer down, etc. You need an actual operating system which is designed to do those things using a TPM to implement cryptographic security in service of those goals.
  In other words, they're mechanism, not policy. As such it's possible to design such an Orwellian OS without requiring TPM hardware -- it might not be quite as secure against modification, but if it's designed to spy on you it is going to spy on you. Another implication of being a mechanism rather than a policy is that TPMs can be (and often are) used for benign purposes. One of the most common use cases is securing sensitive data on work-supplied laptops against the possibility of theft. This is not just about spook agencies and the like -- think HIPAA medical privacy law, and so forth.
Re:wrong wrong wrong by m.dillon · 2013-09-03 04:21 · Score: 1

General rule of thumb is that 2x hyperthreads is approximately equal to 1.5 real cores. Nobody is lying, Intel makes the thread/core distinction very clearly. The reason is primarily due to pipeline and memory stalls creating space which can be filled by the other thread.
Keep in mind that a modern superscalar CPU can have something like 160? (number not exact) instructions in flight at any given moment, depending on how good the branch prediction is. Instruction execution is not really a matter of clock cycles so much as it is a matter of waiting for memory and execution unit resources. Even instruction-instruction dependencies can often be absorbed by the out-of-order execution engine.
-Matt
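To put the rule of thumb above into numbers for the 6-core, 12-thread part from the summary, here is a minimal sketch. It reads "2x hyperthreads is approximately 1.5 real cores" as "one core running two threads delivers about 1.5 cores of throughput", which is an interpretation; the 1.5 factor is the comment's estimate, not a measured value, and the function name is made up.

```python
def core_equivalents(physical_cores: int, ht_factor: float = 1.5) -> float:
    """Estimated throughput, in 'real cores', of a chip running two threads per core."""
    return physical_cores * ht_factor


# Core i7-4960X: 6 physical cores, 12 hardware threads.
print(core_equivalents(6))
```

Under that assumption the chip behaves more like a 9-core part than a 12-core one on well-threaded work.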
257mm square by Anonymous Coward · 2013-09-03 04:38 · Score: 0

That's absolutely enormous. How could it possibly take over 66,000 mm^2 to house just 1.8B transistors?
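For anyone checking the numbers, a quick sketch of the two readings of the summary's figure. The assumption that the die is roughly square is only for illustration.

```python
import math

# Literal reading: a square 257 mm on each side.
literal_area_mm2 = 257 ** 2

# Likely intended reading: an area of 257 square mm.
# If the die were roughly square (an assumption), each side would be:
side_mm = math.sqrt(257)

print(f"257 mm square  -> {literal_area_mm2} mm^2")
print(f"257 square mm  -> about {side_mm:.1f} mm per side")
```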
What's a grand really worth? by Overzeetop · 2013-09-03 04:42 · Score: 1

So for $1000 I can get 1.5x the peak multithreaded performance of the $300 processor released three months ago. And if you run lightly threaded apps, the processor from earlier in the summer may still be faster. Wow...what a bargain. I'd say sign me up for two but, alas, Intel won't let you run multiple processors without paying the Xeon tax.
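A rough sketch of that comparison as performance per dollar. The prices and the 1.5x figure are the round numbers from this comment, not benchmark results, and `perf_per_dollar` is a hypothetical helper.

```python
# Round numbers from the comment above, not measured benchmark data.
def perf_per_dollar(relative_perf, price_usd):
    return relative_perf / price_usd

cheap = perf_per_dollar(1.0, 300)     # the $300 part as the baseline
extreme = perf_per_dollar(1.5, 1000)  # 1.5x peak multithreaded for $1000

ratio = extreme / cheap
print(f"{ratio:.2f}")  # value per dollar relative to the $300 part
```

Under those assumptions the $1000 chip delivers a bit under half the multithreaded performance per dollar.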

--
Is it just my observation, or are there way too many stupid people in the world?
1. Re:What's a grand really worth? by Anonymous Coward · 2013-09-03 05:46 · Score: 1
  
  How is this any different from how it has been for many years? The $1000+ processors were never proportionately better than the more reasonably priced ones.
WTF - 3.6GHz? by Anonymous Coward · 2013-09-03 05:06 · Score: 0

We should have like 20GHz now.
What's the point of upgrading from 3.4GHz to 3.6GHz plus a number of tiny improvements that nobody cares about?
They should stop selling new CPUs until they get double speed at least.
Cue the Mac Pro by Anonymous Coward · 2013-09-03 05:14 · Score: 0

Finally they can release the new Mac Pro
Lowest performance per price by Reliable+Windmill · 2013-09-03 05:30 · Score: 2

This CPU has very low, if not the lowest, performance per price of current models, so in one category it is the worst possible buy you can make; it is incredibly over-priced.

--
Signature intentionally left blank.
Re:wrong wrong wrong by MrFlibbs · 2013-09-03 05:33 · Score: 4, Informative

Amazing. Everything you said about HT is completely wrong. Where ever did you get this information?
Intel's hyperthreading consists of two logical processors sharing the same compute resources. Each logical processor has its own register set but shares decoders, adders, shifters, cache, etc. as it goes about executing its assigned thread. The sharing process is vastly more complex and efficient than you seem to think -- there's no alternating of cycles. Once instructions are decoded into uops, they flow through the pipeline in a dynamic fashion that sometimes leads to one thread using most of the resources while the other one waits. In fact, this is a big advantage of the design -- when one thread stalls from a cache miss, the other one uses all the resources until the first thread's memory access completes. A much better plan than your scheme of using only even/odd cycles.
Managing this process is not simple, and steps must be taken to avoid both deadlocks and livelocks as the two threads compete for resources. But the process is dynamic -- the design allows one thread to run unimpeded when it makes sense to do so, while still preventing one thread from being starved at the other's expense. But this "every other cycle" notion of yours is pure nonsense. The core can retire up to four uops per cycle, and at times these all come from the same thread.
Actually launched? by neuro88 · 2013-09-03 06:00 · Score: 1

Doesn't seem the chip is actually available anywhere yet. I've also been hearing that September 10th may be the actual launch date.
Sounds great for CAD by unixisc · 2013-09-03 08:48 · Score: 1

This chip looks like it would be fantastic in engineering workstations - particularly ones running the Linuxes or BSDs. Whereas HDL CAD applications of old would run on Sun or HP workstations, the current ones would do well on one of these running either Windows 7 or Scientific Linux, and then the cad apps in question
1. Re:Sounds great for CAD by DigiShaman · 2013-09-03 14:27 · Score: 1
  
  Meh. I support AutoCAD products at the workstation level. An i5 is more than good enough. What AutoCAD really likes is RAM (12GB or higher, not the 8GB they recommend) and an nVidia Quadro (Fermi based) video card. Actually, I put the video card at the top of the list as the amount of RAM really depends on the complexity of the file you're working on. If you're a power user of AutoCAD however, just drop in 16GB and call it a day. Now if you have a Xeon CPU, ECC RAM can get real expensive depending on the generation you're working with!!!
  
  --
  Life is not for the lazy.
WTF? by Anonymous Coward · 2013-09-03 09:31 · Score: 0

Since we are benchmarking $1000 CPUs, why not include the $850 one from AMD?
Instead we have the FX-8350, a CPU that costs $200. The extra GHz of the FX-9590 would have moved AMD into the middle of most of those benchmarks. It would have still lost, but the benchmarks look biased without it.
More thorough review at hardwarecanucks by Anonymous Coward · 2013-09-03 09:43 · Score: 0

http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/63024-intel-i7-4960x-ivy-bridge-e-review.html
Re:wrong wrong wrong by Anonymous Coward · 2013-09-03 12:18 · Score: 0

"that can each process two threads simultaneously"
That is absolutely not how it works. It's been what, 10 years and they're still lying about hyperthreading to make it sound better? Super short summary of how it really works: (SNIP OF MISLEADING BULLSHIT MADE UP BY YOU)
Why are you lying about understanding how any of this works?
Short summary of how it actually works: Every core does N things per cycle, where N is the dispatch / retire width bottleneck, i.e. how many operations can be dispatched and/or retired in parallel. N is not fixed at 4, it can change from one core design to the next. In Intel's i3/i5/i7 and related Xeon series, the trend has been for N to go up.
Furthermore, each of the functional units that operations are dispatched to (that is, one of the N) is pipelined, so the latency of a single op is actually many cycles (probably somewhere between 15 and 20 these days) even if the nominal throughput is N/cycle. If the reader doesn't know what a pipeline is, think of it as being like an assembly line: a lot of items are lined up on a conveyor belt and moved past stations which do things to the work item.
Next, at the head of each pipeline / assembly line there is a buffer of sorts. The front end of the CPU reads instructions from a serial stream and dispatches them to pipeline buffers. The purpose of the buffers is to avoid stalls. Say you need to dispatch an op which calculates A+B, but while data item B is ready, A is equal to X+Y, and the op calculating A=X+Y hasn't completed yet. If you permitted the A+B op to start going down a pipeline as soon as it's dispatched, that pipe would have to stall (the conveyor belt would have to stop) until the A=X+Y op completed. The buffers permit the pipeline's head to avoid ops which don't have all data ready yet and pick those which do -- which results in fewer stalls, and out of order execution.
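The A+B example can be sketched as a toy issue buffer. This is only an illustration of "pick whichever op has its data ready", not how any real Intel scheduler is built; the function name, the single-issue-per-cycle limit, and the 3-cycle latency are all assumptions. It also assumes every source eventually becomes ready, otherwise the loop would never finish.

```python
# Toy issue buffer: each op is (dest, src1, src2). An op may issue only when
# both sources are ready; the buffer is scanned oldest-first each cycle.
def schedule(ops, ready, latency=3):
    ready = set(ready)    # values available right now
    pending = list(ops)   # ops waiting in the buffer, in program order
    in_flight = []        # (finish_cycle, dest) for ops in the pipeline
    issue_order = []      # (issue_cycle, dest), in the order ops issued
    cycle = 0
    while pending or in_flight:
        # Results of finished ops become available to waiting ops.
        for finish, dest in list(in_flight):
            if finish <= cycle:
                ready.add(dest)
                in_flight.remove((finish, dest))
        # Issue the oldest op whose operands are all ready (one per cycle).
        for op in pending:
            dest, a, b = op
            if a in ready and b in ready:
                pending.remove(op)
                in_flight.append((cycle + latency, dest))
                issue_order.append((cycle, dest))
                break
        cycle += 1
    return issue_order

# A = X+Y, C = A+B (must wait for A), D = P+Q (independent of both).
program = [("A", "X", "Y"), ("C", "A", "B"), ("D", "P", "Q")]
print(schedule(program, ready={"X", "Y", "B", "P", "Q"}))
```

The independent op D issues ahead of C even though C comes first in program order, which is the out-of-order behaviour described above.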
In Intel i3/i5/i7 CPUs which support hyperthreading, HT is simply an extension of this system. The front end which reads, decodes, and dispatches instructions to these buffers is permitted to fetch from two different contexts or threads, and the mechanisms which pick ready instructions from the buffers to progress down the pipelines are permitted to choose between threads on a completely arbitrary basis -- whichever one's got data available, with a few safety measures to make sure one thread can't starve the other. This means that the two threads running on the CPU core can and absolutely do execute simultaneously, both due to pipelining (one pipe works on many ops in parallel assembly line style) and superscalar execution (there are many pipes, each with its own assembly line).
There is no enforced alternation, no odd-even. And contrary to the claims of "slashmydots", this mechanism does eliminate gaps. Pretty much the whole point of it is that if you run a single thread through a wide (many pipes) and deeply pipelined core, there is no known technique for making every station of every pipe 100% busy. In fact, they'll typically average less than 50% busy. By permitting two threads to dispatch into the same core, you get to fill up a lot of the bubbles in these assembly lines with instructions from another stream. You don't necessarily get up to a 100% utilization rate -- but you get a much higher utilization rate than a single thread can manage.
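Here is a toy model of that bubble-filling effect. Everything in it is invented for illustration: the 4-wide issue limit, the per-cycle traces, and the round-robin slot sharing are assumptions, not a description of Intel's actual hardware.

```python
def run(traces, width=4):
    """Cycles needed to issue every op when up to `width` ops issue per cycle.

    Each trace lists how many ops of that thread become ready on each cycle
    (0 models a stall, e.g. waiting on a cache miss).
    """
    backlog = [0] * len(traces)
    length = max(len(t) for t in traces)
    cycle = 0
    while cycle < length or any(backlog):
        for i, t in enumerate(traces):
            if cycle < len(t):
                backlog[i] += t[cycle]
        slots = width
        # Hand out slots one at a time, rotating between threads,
        # so neither thread can starve the other.
        while slots and any(backlog):
            for i in range(len(backlog)):
                if slots and backlog[i]:
                    backlog[i] -= 1
                    slots -= 1
        cycle += 1
    return cycle

def utilization(traces, width=4):
    """Fraction of issue slots actually used over the whole run."""
    ops = sum(sum(t) for t in traces)
    return ops / (run(traces, width) * width)

# Hypothetical traces: bursts of ready ops separated by stalls.
thread_a = [4, 0, 0, 2, 0, 0, 4, 0]
thread_b = [0, 3, 0, 0, 3, 0, 0, 3]

print(utilization([thread_a]))            # one thread alone
print(utilization([thread_a, thread_b]))  # two threads sharing the core
```

With these made-up traces each thread alone leaves most issue slots empty, while the pair finishes in the same number of cycles with much higher utilization, though still short of 100%.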
The confusion is everywhere inside Intel. by Futurepower(R) · 2013-09-03 13:05 · Score: 1

I also wonder how deliberate the confusion is. There are MANY areas inside Intel where there is confusion. The confusion is visible even when visiting the Intel campus in Oregon.

Funny story: I visited the Intel web site and was asked to complete a survey. I gave a few of the reasons why Intel CEO Paul Otellini should be fired, like paying $6 Billion for McAfee when Microsoft is giving away its Microsoft Security Essentials anti-virus software. A few months later Otellini left Intel; they didn't say why. I'm not saying my survey answers had an influence, I'm only making the point that the perception of Intel is widespread.

Intel has a long record of failure with consumer products. Now a completely separate division plans a TV product (???): Intel Media aims to remake TV with its own technology. This paragraph indicates some confusion and lack of competent direction: "Intel Media is run by Erik Huggers, an Intel vice president who worked previously at Microsoft and the BBC. He's assembled a team from such high-tech and media heavyweights as Apple, Netflix, Microsoft, Sky TV and Sony. Intel engineers in Oregon are participating, too, providing technical support for the project."

Oh... The Intel people are providing "technical support". Everyone else came from outside Intel??? And they don't know enough about technology to do their own support? There are many, many issues like that inside Intel.

We are having problems with Intel RAID. Intel technical support is poorly organized.

Apparently only the CPU and chipset division of the company is well-run. All other parts of Intel seem to have little competent supervision.
Re: wrong wrong wrong by Anonymous Coward · 2013-09-03 17:36 · Score: 0

I'm no expert, but I'm pretty sure that's not a good description of how it works either.
http://en.m.wikipedia.org/wiki/Hyper-threading