Core Duo - Intel's Best CPU?
Bender writes "How good is Intel's Core Duo mobile processor? Good enough that Apple chose to put it in the iMac, and good enough that Intel chose to base its next generation microprocessor architecture on it. But is it already Intel's best CPU? The Tech Report has managed to snag a micro-ATX motherboard for this processor and compared the Core Duo directly to a range of mobile and desktop CPUs from AMD and Intel, including the Athlon 64 X2 and the Pentium Extreme Edition. The results are surprising. Not only is the Core Duo's performance per watt better than the rest, but they conclude that its 'outright performance is easily superior to Intel's supposed flagship desktop processor, the Pentium Extreme Edition 965.'"
I have to say the Intel Dual Core Processor is quite impressive. It's fast enough to run just about anything I throw at it and still keep chugging, but I believe the article neglects the fact that the dual-core processor runs extremely hot compared with other Intel processors. My old Sony VAIO never got as hot as my MacBook Pro does, and that is something that should be considered.
This signature was left intentionally blank.
I would argue that the 8080 was. If you normalize for date/speed that is...
More reviews here and here.
I already posted some benchmarks of a Core Duo Mac Mini running Windows (http://slashdot.org/comments.pl?sid=182379&cid=15077120) and to be honest I was fairly impressed. The gaming benchmark was obviously miserable, but the "general purpose" benchmark (zipping files, encoding audio/video, etc.) did very well. The Apple zealots may say "it's because it's a Mac", but really the hardware is almost identical to your average Intel laptop. The only major difference is the Core Duo, which not many laptops have (although that's increasing all the time), and that's what I'm putting my money on. Can't wait to see a benchmark with this thing in a gaming rig.
The reason for going to 64-bits is to increase the amount of physical address space, not for speed. The majority of applications, especially integer, do not benefit from bigger registers and wider ALUs.
Our QA department is testing my universal application right now (AppKit based). They've recorded a 20 to 30 percent increase in performance on a 1GB MacBook Pro over a 3GB 2GHz Dual G5 doing a particular operation (mostly mathematics-based, done in cross-platform C++). It's single-threaded, I might add, since OpenMP isn't here yet. The *ONLY* difference in the Xcode settings between the two architectures that I made was to enable SSE3 for the Intel build. I can't believe that it's that alone, of course, and suspect it's just better code gen for the Intel architecture coming out of GCC.
Actually, x86-64 does have some speed benefits over standard IA-32 for smaller programs and data sets, in that it doubles the number of exposed general-purpose registers (from 8 to 16). Most other architectures were not register-starved in their 32-bit versions, so going 64-bit generally slowed the system down a bit because the pointer size doubled, taking more memory bandwidth to store pointers.
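To put a rough number on the pointer-size cost, here is a back-of-the-envelope sketch. The node layout (two pointers plus a 4-byte integer) and the rule of padding to pointer alignment are assumptions picked for illustration, not measurements of any real program.

```python
def node_bytes(pointer_bytes, payload_bytes=4, pointers=2):
    """Size of a hypothetical list node, padded to pointer alignment."""
    raw = pointers * pointer_bytes + payload_bytes
    # Round up to a multiple of the pointer size (an assumed alignment rule).
    return -(-raw // pointer_bytes) * pointer_bytes

size32 = node_bytes(4)   # two 4-byte pointers plus a 4-byte int
size64 = node_bytes(8)   # two 8-byte pointers plus a 4-byte int, padded
print(size32, size64, size64 / size32)
```

For pointer-heavy structures like this one, the same data takes more cache and memory bandwidth in 64-bit mode. Code that mostly handles flat arrays of numbers would see far less of a difference.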
Umm... working with large numbers sure does benefit from 64 bits. For one, multiplying large numbers is at least 3 (yes, three) times faster! Java (which uses lots of "long" types) is also generally 2-3 times faster, as are Lisp, Haskell, etc.
Of course, all these speed improvements only happen on AMD's 64-bit architecture, as Intel's versions only provide the instructions but still run just as slow as the 32-bit version would.
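For anyone wondering where a multiple like that could come from in large-number multiplication, here is a rough sketch. It does plain schoolbook multiplication on "limbs" of a chosen width and counts the limb-by-limb multiplies. The function name and the 1024-bit operands are made up for illustration, and it counts operations only, so it says nothing about real timings on any particular CPU.

```python
def schoolbook_multiply(a, b, limb_bits):
    """Multiply non-negative ints limb by limb; return (product, multiplies)."""
    mask = (1 << limb_bits) - 1

    def split(x):
        # Break x into limb_bits-wide chunks, least significant first.
        limbs = []
        while x:
            limbs.append(x & mask)
            x >>= limb_bits
        return limbs or [0]

    a_limbs, b_limbs = split(a), split(b)
    product, multiplies = 0, 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            # One hardware-sized multiply per pair of limbs.
            product += (x * y) << (limb_bits * (i + j))
            multiplies += 1
    return product, multiplies

a = (1 << 1024) - 1
b = (1 << 1024) - 3
p32, count32 = schoolbook_multiply(a, b, 32)
p64, count64 = schoolbook_multiply(a, b, 64)
print(count32, count64, count32 / count64)
```

Doubling the limb width halves the number of limbs per operand, so the schoolbook method needs a quarter as many multiplies. Carry handling and memory traffic, which this sketch ignores, eat into that in practice.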
Hate to say this, but there are not that many uses for 64 bit processors yet. Manufacturers do not provide 64-bit drivers for their products. The drivers that exist are buggy. To the average Joe, 64-bit is useless. He doesn't need the extra horsepower for his Internet browser or word processor. Well, unless Vista comes out.
It sure as hell is. I have a dual 2.0 GHz G5 desktop machine and one of the new 1.66 GHz Core Duo Mac Minis. Running Handbrake, the mini is easily twice as fast.
The "more registers" with x86-64 has been massively overhyped. There's very little real world benefit.
For example: AMD's claims about UT2004 being 20% faster in 64-bit mode turned out to be bogus (more like 2%).
Whenever I hear the word 'Innovation', I reach for my pistol.
Yeah, they use both HDL coding and EDA (CAD-like) tools to design most microprocessors. The designs are too massive to lay out by placing each wire manually; they haven't done that for _several_ generations (since the 1980s? Not sure, really).
That's not to say there isn't a small army of design engineers at Intel and AMD who work with nothing but schematics. There are. It's just that most of the logic design work is done at the HDL coding level (with either VHDL, IHDL, Verilog, or some other tool). You only start dealing with schematics at a much later stage of development. Until then your designs are constantly changing, and it's infinitely easier and faster to change a few lines of HDL code than to redo hundreds or thousands of wires and transistors.
I've worked at both Intel and AMD in the past, and in both cases you could take the entire codebase for a processor (HDL, microcode, ROM, etc.), compile it with the right HDL compiler, and run the entire thing with small test programs as a simulator. That's how much of the validation/verification work is done before they make the masks.
As for using the old code bases... That's done a lot. There's just too much complexity and too little time for them to rewrite every processor from scratch. You also have countless hours invested in making sure previous designs work. If you're only making small changes, it would be hard to justify building something from scratch, since you'd have to do all of that validation work again.
But laptop heat is a major threat to male fertility. See this article for more details.
But let's not pretend that Intel is winning the benchmarks with this quite yet.
'Yet' is now.
Merom/Conroe defeats AMD-AM2 hands down, and AMD has nothin' on the roadmap for the next two years, because AM2 slipped a full 12 months.
Go surf around Anandtech.com
AMD is in deep doo doo.
Register renaming has nothing to do with context switches. The "invisible" registers are used to remove false dependencies in the instruction stream to increase Instruction-Level Parallelism (ILP) within a single thread. In fact, on a context switch, the architectural state exactly matches the physical state (no "invisible" registers are in use), and so the processor doesn't have to save any extra registers other than the architecturally-visible ones. The details (skip if you're not interested):
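loop:
movl (%ebx), %ecx    # load the next value from memory into ECX
# Do something complicated with ECX
addl $4, %ebx        # advance the pointer held in EBX
cmpl $64, %ebx       # compare EBX against the loop bound
jl loop              # repeat while EBX < 64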
In the above assembly, the instructions are dependent upon one another: you can't execute the addl until after the movl because the addl overwrites EBX. You can't start executing the next iteration of the loop until the current iteration is finished, because the movl at the top of the loop overwrites ECX. These restrictions only arise because you are reusing the registers EBX and ECX. If you could somehow use different "copies" of these registers, you could execute multiple iterations of the loop in parallel, and execute instructions inside the loop out of order.
Inside the processor, the instruction stream may be seen like this:
%r0 <- (%r1)
...
# Do something complicated with r0
%r2 <- %r1 + 4
cmpl $64, %r2
jl loop
%r3 <- (%r2)
# Do something complicated with r3
%r4 <- %r2 + 4
cmpl $64, %r4
jl loop
The processor has removed all false dependencies by using its internal, non-visible registers to remap each loop iteration's "instances" of EBX and ECX to different physical registers. This enables out-of-order execution: since the next "copy" of EBX has been renamed to be a different physical register (r2) than the original value of EBX (r1), the processor can execute the addl instruction LONG before it executes the "Do something complicated" portion of the loop.
This then allows the processor to execute multiple iterations of the loop in parallel (with branch speculation and recovery) by performing the addl instruction very soon after the loop begins, which will allow further iterations of the loop to run by calculating the "next" value of EBX. The processor has effectively performed loop unrolling in hardware.
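If it helps, the renaming step itself can be sketched in a few lines of Python. This is a toy model, not how any real CPU implements it: the (dest, sources) tuple encoding and the rename function are made up for illustration, the flags register is ignored, and the physical register numbers it hands out will not match the ones in the listing above.

```python
from itertools import count

def rename(instructions, arch_regs=("ebx", "ecx")):
    """Map architectural registers to fresh physical registers.

    Each instruction is (dest, sources): the architectural register it
    writes (or None) and the architectural registers it reads.
    """
    fresh = count()  # endless supply of unused physical register numbers
    table = {reg: f"r{next(fresh)}" for reg in arch_regs}  # current mapping
    renamed = []
    for dest, sources in instructions:
        # Reads use whichever physical register holds the latest value.
        phys_sources = tuple(table[s] for s in sources)
        phys_dest = None
        if dest is not None:
            # Every write gets a brand-new physical register, so it can
            # never clobber a value an older instruction still has to read.
            phys_dest = f"r{next(fresh)}"
            table[dest] = phys_dest
        renamed.append((phys_dest, phys_sources))
    return renamed

# One loop iteration: load into ECX, work on ECX, bump EBX, compare EBX.
body = [("ecx", ("ebx",)), (None, ("ecx",)), ("ebx", ("ebx",)), (None, ("ebx",))]
renamed = rename(body * 2)  # two iterations back to back
for dest, sources in renamed:
    print(dest, "<-", sources)
```

The thing to notice is that the second iteration's load only reads the register written by the first iteration's add. It does not wait on anything the "complicated" work touches, which is exactly why the iterations can overlap.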
The problem is, Intel is way ahead on their 45nm manufacturing process, which could virtually negate AMD's 65nm step. (Intel says they're going to be ready in 2007, which is when everyone expects the new AMD 65nm fab to come online).
If Intel could get to 45nm before AMD even gets to 65nm, you could kiss any performance gain that 65nm would lend AMD totally goodbye. (There's no telling how likely it is that this could happen, but seeing as both Intel and AMD are putting a great deal of their resources into it, it's anyone's guess.)
"Victory means exit strategy, and it's important for the President to explain to us what the exit strategy is." G.W.Bush