Quick and Dirty Penryn Benchmarks
An anonymous reader writes "So Intel has their quad-core Penryn processors all set and ready to launch in November. There are benchmarks for the dual-core Wolfdale all over the place, but this seems to be the first article to put the quad-core Yorkfield to the test. It looks like the Yorkfield is only about 7-8% faster than the Kentsfield at similar clock and front-side bus speeds."
My recent experience with quad-CPU Xeon machines is that multithreaded performance for a single process is VERY poor, even with great care in coding, presumably because of cache-sloshing between these physically separate CPUs dropped onto one die.
(I'm comparing with Niagara and even Core Duo, which seem much better for threaded apps.)
Has anyone else tested the threadability of these CPUs, and their power efficiency, sleep states, etc.?
Rgds
Damon
http://m.earth.org.uk/
I would think that AMD would be providing Barcelona benchmarks hand over fist, at this point, if they had something...
There are two possible situations here:
a) Barcelona is faster than Intel's current line-up, and AMD does not want to see Intel pick up the pace even more by releasing such numbers.
b) Barcelona is slower than Intel's current line-up, and AMD does not want its shares to hit a new low, or perhaps wants to buy some time to speed it up.
Full Tilt
"Intel expects SSE4 optimizations to deliver performance improvements in video authoring, imaging, graphics, video search, off-chip accelerators, gaming and physics applications. Early benchmarks with an SSE4 optimized version of DivX 6.6 Alpha yielded a 116 percent performance improvement due to SSE4 optimizations." Not bad...
Depends on what you do "at home". Grandma who only sends email and orders flowers will see zero benefits.
But the rest of the "normal" home users who own things like camcorders, make DVDs, rip movies, etc., all see a huge benefit. I just put together a Q6600 system and couldn't be happier, but I've been a dual-CPU workstation user since the PII days.
Penryn? Wolfdale? Yorkfield? Kentsfield? What are they doing here, making processors, or naming streets in a new upscale subdivision?
AMD rose to this position primarily because they didn't make Intel's mistakes: trying to force a new CPU architecture on the market (Itanium) instead of incrementally developing the x86 line, and focusing on clock speed (P4) at the expense of performance per watt. Now that Intel is focused on performance per watt, AMD needs to find a new differentiator for their chips.
Perhaps they should start thinking about how to integrate a high-quality, Vista-capable GPU into their processors (after all, they acquired ATI)? How about sound cards, USB ports, et cetera? If they can fit 90% of a typical motherboard into the processor and usher in a new era of affordable and efficient computers while Intel is busy playing with 64-core chips, why not?
Barcelona is faster than Intel's current line-up, and AMD does not want to see Intel pick up the pace even more by releasing such numbers.
That may have been true 6 months ago, but the K10 is supposed to be officially announced in about 16 days, on September 10 (since AMD claims not to do paper launches, it is supposed to be widely available then too... YMMV). AMD is not going to be able to stop benchmarks after it is released, and while Intel can adapt quickly, it can't turn on a dime in two weeks' time. AMD has not been doing well in the PR and benchmarking battles since Core 2 came out; if K10 really were that amazing, you would be seeing all the usual suspects putting out full reviews right now in order to generate hype. I'm leaning towards your second theory, and so are most analysts.
AntiFA: An abbreviation for Anti First Amendment.
Intel tends to release a new architecture, then some refinements of it. While it would be cool to do a whole new architecture each time around, there's just not really money for that. This is one of the refinements. The chips are not likely to be all that much faster than their previous chips at the same clock speed, because they are largely the same architecture. Mostly they are just a die shrink (which means lower power and probably better scaling and cost) plus some new instructions that aren't really used yet. They are still Core 2s.
However that doesn't mean that the next generation will be the same. Indeed, if Intel keeps with their plans it will be a new architecture and thus hopefully bring new speed increases.
As to using multiple cores, well, if you don't know how, perhaps you'd best learn? Your not knowing how doesn't mean it can't be done; indeed, it can be done and IS being done. Multi-core is just the way things are going, at least for now. Not only are desktops and servers headed that way, but so are things like the Xbox 360 and PS3. It's simply time to start thinking about software in a different way. No longer is a big while loop the way to go.
Already that's happening. The number of games (and games are interesting to watch, since they often ride the leading edge in terms of requirements) that make use of two cores has risen dramatically. We are also seeing a couple of games, with more on the horizon, that will support 4 cores. Things like AI and physics get executed in parallel, which makes it possible for them to be much more complex.
Finally, there HAVE been some cool developments in processors, just not ones that most hardware sites like to cover. Some time back Intel introduced a technology they call VT, which is basically a set of instructions that lets you virtualize the protection rings on a processor. It's supposed to make for faster VMs. Currently the implementation is somewhat lacking: VMware claims it is slower than a well-optimised software solution, though others dispute that claim (Xen likes VT). The new 45nm Core 2s add to the existing VT technology with what Intel calls VT-d. Basically, the idea is to let VM software pass DMA access to its guests, but in a safe manner that can't hurt the host. This may not be exciting to everyone, but these advances are worthwhile, given that virtual computing is seeing more and more use.
Processors may not be getting huge gains in single thread performance any more, but that doesn't mean they aren't advancing.
Does Linux even boot on Blue Gene, or does it hang while trying to draw over one hundred thousand penguins?
NEVER underestimate the huge number of viruses, trojans, spyware and pop-up-generating crapware running in parallel on the average Joe's computer.
Just think about the number of users who come into stores to buy "faster computers because the old one is getting too slow" when the old computer is crawling under an impressive amount of crapware.
They are the perfect target for those new multi-core processors:
- 1 core for running the OS, Internet Explorer and Microsoft Word.
- All other cores for running SPAM-spitting zombies.
Now, if you add Vista in the equation...
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Well, fortunately, some of this software has already been written just for you and your colleagues. Check out the make(1) manual page; look for the -j option...
And no, it is not only for software engineering either. Every time I come back from vacation, I use make to convert my digital pictures from the camera's lossless "raw" format to lower-resolution JPEGs for my web pages. Having four CPUs makes that process four times faster. Great idea, huh?
Your colleagues may be doofusen, but the people who will finally bring us reliable speech generation and parsing (as an example) will certainly be smart enough to take full advantage of multiple processors.
Meanwhile, you can schedule a meeting to discuss using OpenMP in your company's software... Compilers (including Visual Studio's and gcc's) have been supporting this standard for some years now.
In Soviet Washington the swamp drains you.
When decoding "full HD" H.264, i.e. 40 Mbit/s Blu-ray or 30 Mbit/s HD DVD at 1080p resolution, current CPUs start to thrash the L2 cache:
Each 1080p frame consists of approximately 2 million pixels (1920 × 1080), so at one byte per sample the luminance data alone needs about 2 MB, right?
Since most frames are normally encoded with two source frames and one target, motion compensation (which can access any 4x4, 8x8 or 16x16 sub-block from either or both of the source frames) will need up to 2+2+2 = 6 MB as its working set.
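Spelled out, assuming 8-bit luma samples and leaving out the chroma planes as the estimate does:

```latex
\begin{aligned}
\text{luma per frame} &= 1920 \times 1080 \times 1\ \text{byte} = 2\,073\,600\ \text{bytes} \approx 2\ \text{MB} \\
\text{working set} &\approx \underbrace{2 + 2}_{\text{two source frames}} + \underbrace{2}_{\text{target frame}} = 6\ \text{MB}
\end{aligned}
```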
Terje
"almost all programming can be viewed as an exercise in caching"