Intel Core 2 'Penryn' and Linux

Posted by ryuzaki0 on Thursday November 15, 2007 @02:43PM from the two-great-tastes dept.

An anonymous reader writes "Linux Hardware has posted a look at the new Intel "Penryn" processor and how the new processor will work with Linux. Intel recently released the new "Penryn" Core 2 processor with many new features. So what are these features and how will they translate into benefits for Linux users? The article covers all the high points of the new "Penryn" core and talks to a couple of Linux projects about end-user performance of the chip."

Perspective by explosivejared · 2007-11-15 14:51 · Score: 5, Insightful

"There are some new instructions that could be more convenient to use in some special cases (like the new pmin/pmax instructions). But these will have no real performance benefit."

"So we do not plan on adding SSE4 optimizations. We may use SSE4 instructions in the future for convenience once SSE4 has become really widely supported. But I personally don't see that anytime soon..."

I think that puts the hype over Penryn into perspective. There are some nice improvements, such as reduced power leakage, but it's nothing revolutionary.
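
To see what "convenience" means for pmin/pmax, here is a rough behavioral model in Python of four 32-bit SIMD lanes (not real SIMD code; the helper names are illustrative). SSE4.1 adds a single packed signed-minimum instruction (PMINSD). On SSE2 you get the same result with a compare (PCMPGTD) followed by an and/andnot/or blend. The answer is identical either way; the new instruction just takes fewer steps.

```python
# Behavioral model of four 32-bit SIMD lanes (values stored as unsigned bit patterns).
LANE_BITS = 32
MASK = (1 << LANE_BITS) - 1


def to_signed(x):
    """Interpret a 32-bit lane bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << LANE_BITS) if x >> (LANE_BITS - 1) else x


def pmin_sse41(a, b):
    """One instruction in SSE4.1 (PMINSD): lane-wise signed minimum."""
    return [min(to_signed(x), to_signed(y)) & MASK for x, y in zip(a, b)]


def pmin_sse2(a, b):
    """Same result from SSE2-era building blocks: compare, then blend."""
    out = []
    for x, y in zip(a, b):
        gt = MASK if to_signed(x) > to_signed(y) else 0  # PCMPGTD: all ones where a > b
        out.append((y & gt) | (x & ~gt & MASK))           # PAND / PANDN / POR blend
    return out


a = [5, -3 & MASK, 7, 0]
b = [2, 4, -8 & MASK, 0]
print([to_signed(v) for v in pmin_sse41(a, b)])
```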

--
I got a catholic block.
1. Re:Perspective by SpeedyDX · 2007-11-15 15:32 · Score: 5, Interesting
  
  Isn't that their strategy when they use a finer fab process anyway? I remember reading an article (possibly linked from a previous /. submission) about how they had a 2-step development process. When they switch to a finer fab process, they only have incremental, conservative upgrades. Then with the 2nd step, they use the same fab process, but introduce more aggressive instruction sets/upgrades/etc.
  
  I couldn't find the article with a quick Google, but I'm sure someone will dig it up.
2. Re:Perspective by DaveWick79 · 2007-11-15 15:44 · Score: 2, Informative
  
  I don't have a reference for this either, but this is the message that Intel regularly conveys to the channel. You can see the usage of this release strategy starting with the later Pentium 4 CPUs and it has continued through the various renditions of the Core series processors.
3. Re:Perspective by wik · 2007-11-15 16:02 · Score: 4, Informative
  
  The name you are thinking of is the "tick-tock model."
  
  --
  / \
  \ / ASCII ribbon campaign for peace
  x
  / \
4. Re:Perspective by asm2750 · 2007-11-15 16:25 · Score: 2, Informative
  
  Like an earlier post said, it's called the "Tick-Tock" strategy. In one upgrade you improve the architecture, and in the next you make the fab process smaller. It's not a bad idea, but there are two questions to ask. First, could Intel hit a dead end around 2013, since 16nm is the last point on the ITRS roadmap, nullifying this strategy? Once you go even smaller, you essentially start having gates the size of atoms. Second, once quad core becomes more common, will there really be any reason for consumer-level products to go beyond that, given that very few programs take advantage of 64-bit and processor parallelism?
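Penryn itself is a "tick": it is a 45 nm shrink of the Core 2 design. The cadence in the post can be modeled with the Python sketch below, but it is idealized. It assumes exactly one step per year and a 1/sqrt(2) linear shrink per tick, which roughly halves die area. Under those assumptions the model reaches about 16 nm in 2013, matching the post's figure. Actual product dates did not follow the model exactly.

```python
import math


def tick_tock_schedule(start_year=2007, start_node_nm=45.0, last_year=2013):
    """Idealized tick-tock cadence: a "tick" shrinks the process, the
    following "tock" ships a new microarchitecture on that same process.
    Assumes one step per year and a 1/sqrt(2) linear shrink per tick."""
    shrink = 1 / math.sqrt(2)
    node = start_node_nm
    schedule = []
    for i, year in enumerate(range(start_year, last_year + 1)):
        if i % 2 == 0:              # even steps are ticks
            if i > 0:
                node *= shrink      # the first tick is the starting node itself
            schedule.append((year, "tick", round(node, 1)))
        else:                       # odd steps are tocks: same node, new design
            schedule.append((year, "tock", round(node, 1)))
    return schedule


for year, step, node in tick_tock_schedule():
    print(year, step, f"{node} nm")
```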
5. Re:Perspective by DaveWick79 · 2007-11-15 16:46 · Score: 3, Insightful
  
  I believe that by the time quad core becomes mainstream, i.e. every piece of junk computer at Buy More has them, 64-bit apps will also be mainstream. By 2010 every computer sold will come with a 64-bit OS that can still run 32-bit programs, but all the new software being developed will be transitioning to 64-bit.
  Can CPU performance hit a threshold? Sure it can. But maybe by then they will be integrating specialty processors for video encoding/decoding, data encryption, or file system/flash write optimization onto the CPU die. At some point nothing more will be required for corporate America to run word processors and spreadsheets, and tech spending and development will shift to smaller, virtual-reality-type applications rather than the traditional desktop. I think we have already reached the point where the desktop computer fulfills the needs of the typical office worker. The focus shifts to management & security over raw performance.
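A running program can check whether it is itself 32-bit or 64-bit, separately from what the OS is. Here is a minimal Python sketch, assuming CPython. The pointer size reports the bitness of the current process. platform.machine() reports the machine architecture, which is typically 'x86_64' on 64-bit Linux. So a 32-bit program running on a 64-bit OS will usually show a mismatch between the two.

```python
import platform
import struct
import sys


def process_bits():
    """Width of a pointer in the running interpreter: 32 or 64."""
    return struct.calcsize("P") * 8


print("machine:", platform.machine())        # typically 'x86_64' on 64-bit Linux
print("process:", process_bits(), "bit")
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```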
6. Re:Perspective by Ours · 2007-11-15 22:29 · Score: 2, Interesting
  
  To support what you say, Microsoft said that Vista and Windows Server 2008 were supposed to be the last OSes available in 32-bit versions.
  
  --
  "You superiour intellect is no match for our puny weapons" - The Simpsons
If video encoding/decoding is the bottleneck... by compumike · 2007-11-15 14:51 · Score: 4, Interesting

In the article, the authors of XviD and FFMPEG aren't too optimistic about speedups. If video encoding/decoding is the bottleneck, then why not start building motherboards with a dedicated chip specialized for this kind of work, instead of trying to cram extra instructions into an already bloated CISC CPU? Doesn't make sense to me.

Also, an earlier comment that may be useful in this discussion: Why smaller feature sizes (45nm) mean faster clock times.

--
Educational microcontroller kits for the digital generation.
1. Re:If video encoding/decoding is the bottleneck... by 644bd346996 · 2007-11-15 14:59 · Score: 5, Insightful
  
  The place for hardware decoders is on the graphics card. Since Linux graphics drivers generally don't give access to those decoders, Linux has to use the CPU.
2. Re:If video encoding/decoding is the bottleneck... by Vellmont · 2007-11-15 15:14 · Score: 5, Informative
  
  instead of trying to cram extra instructions
  
  Cram? Chip designers get more and more transistors to use every year. I don't believe there's any "cramming" involved.
  into an already bloated CISC CPU?
  You're about 15 years out of date. The x86 isn't exactly a CISC CPU; it has a complex instruction set that is decoded internally into simpler operations. Only the Intel engineers know how they added the SSE4 instructions, but based on the comments of the encode/decode developers, the new instructions sound a lot like the old ones. It's not too hard to imagine that they didn't have to change much silicon, and maybe got to reuse some existing internal logic and just interpret the new instructions differently.
  
  Anyway, so why not just have a dedicated piece of silicon for this exact purpose? Partly because it'd be more expensive (you'd have to basically implement a lot of the stuff already on CPU like cache, etc), but also because it's just too specific. How many people really care about encoding video? 5% of the market? Less?
  
  Hardware decoding on hardware is already a reality, and has been for some time. GPUs have implemented this feature for at least 10 years. But of course it generally isn't a feature with dedicated silicon; it's integrated into the GPU. If this is the first you've heard of it, that's not surprising. The other problem with non-CPU-specific accelerations is that they never really become standard, since there's no standard instruction set for GPUs, and any GPU maker may just drop the feature in its next line of cards.
  
  In short, specialized means specialized. Specialized things don't tend to survive very well.
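The toy model below illustrates "complex instructions decoded internally into simpler operations". It is not Intel's actual decoder. Real cores use techniques such as micro-op fusion, and their internal formats are not public. The decode function and MicroOp type are hypothetical. The model splits a read-modify-write x86-style instruction into separate load, ALU, and store micro-ops. A register-to-register instruction stays as a single micro-op.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str
    dst: str
    srcs: tuple


def decode(instr: str) -> list:
    """Toy decoder: split a two-operand instruction into simple micro-ops."""
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = [s.strip() for s in operands.split(",")]
    if dst.startswith("["):                      # read-modify-write to memory
        addr = dst.strip("[]")
        return [
            MicroOp("load", "tmp0", (addr,)),
            MicroOp(mnemonic, "tmp0", ("tmp0", src)),
            MicroOp("store", addr, ("tmp0",)),
        ]
    if src.startswith("["):                      # memory source operand
        addr = src.strip("[]")
        return [
            MicroOp("load", "tmp0", (addr,)),
            MicroOp(mnemonic, dst, (dst, "tmp0")),
        ]
    return [MicroOp(mnemonic, dst, (dst, src))]  # register-register: one micro-op


for uop in decode("add [rbx], eax"):
    print(uop)
```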
  
  --
  AccountKiller
3. Re:If video encoding/decoding is the bottleneck... by jibjibjib · 2007-11-15 17:00 · Score: 2, Insightful
  
  How many people really care about encoding video? 5% of the market? Less?
  I don't know why you seem to think video encoding is some sort of niche technical application that no one uses. A huge number of people record video on digital cameras and want to email it or upload it without taking too long. Many people now use Skype and other VOIP software supporting real-time video communication. Many people rip DVDs. Many people (although not a huge number) have "media center" PCs which can record video from TV broadcasts.
4. Re:If video encoding/decoding is the bottleneck... by xorbe · 2007-11-15 17:23 · Score: 2, Interesting
  
  > Cram? Chip designers get more and more transistors to use every year. I don't believe there's any "cramming" involved.
  
  Someone is definitely not a mainstream CPU designer! It never all fits... ask any floor-planner.
5. Re:If video encoding/decoding is the bottleneck... by pleappleappleap · 2007-11-15 21:56 · Score: 2, Funny
  
  Hardware decoding on hardware is already a reality, and has been for some time.
  As opposed to hardware decoding on software?
  Or redundant redundancies of redundancy?
6. Re:If video encoding/decoding is the bottleneck... by 644bd346996 · 2007-11-16 04:45 · Score: 2, Interesting
  
  Some workloads benefit from vector processors, and some don't. For now, it is best economically to keep vector co-processors separate from CPUs, and use the advances in chip tech to lower power consumption and add more cores to the CPU.
  
  For example, many server workloads are handled best by a chip like Sun's UltraSparc T1, which doesn't have any floating point capabilities worth mentioning. People running that kind of server wouldn't buy a Xeon or Opteron that had a 600M-transistor vector processor. It's a huge waste of money. Similarly, people with low-end PCs would probably never use such an integrated vector processor fully, so competition would keep that kind of CPU out of that market.
  
  That leaves pretty much just the gaming and scientific computation markets. Of the two workloads, the former is occasionally CPU-bound rather than GPU-bound, but most of the time the vector processor is by far the biggest bottleneck for both. In that case, it is much more economical if you can upgrade the vector processor without throwing away a perfectly good CPU.
Re:I can tell you how by idiotwithastick · 2007-11-15 14:55 · Score: 2, Informative

Driver problems on a processor compatible with current motherboards?
More Useless Options by hattable · 2007-11-15 14:59 · Score: 5, Insightful

"So we do not plan on adding SSE4 optimizations. We may use SSE4 instructions in the future for convenience once SSE4 has become really widely supported. But I personally don't see that anytime soon..."

This just reminds me of CONFIG_ACPI_SLEEP. About twice a month I find myself staring at this option, wondering if I will ever get to use it. Some things just aren't worth developer time to implement.
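To check whether a kernel was built with an option like CONFIG_ACPI_SLEEP, look in its .config. Distributions commonly install it as /boot/config-<version>, and some kernels expose it as /proc/config.gz. The Python sketch below reads the standard Kconfig line formats (OPTION=value and "# OPTION is not set"). The sample text and the kconfig_value helper are hypothetical.

```python
def kconfig_value(config_text: str, option: str):
    """Return the option's value ('y', 'm', or a string), or None if unset/absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


sample = """\
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
# CONFIG_HIBERNATION is not set
"""
print(kconfig_value(sample, "CONFIG_ACPI_SLEEP"))
```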

--
OMG facts!
Remember MMX ? by 1888bards · 2007-11-15 15:07 · Score: 4, Informative

I seem to remember Intel doing this when they released the 'first' MMX instructions in the Pentiums. That time they had actually doubled the L1 cache from 16 KB to 32 KB in the new Pentiums, but they somehow managed to convince/fool everyone that the performance gain was a result of MMX. Very sneaky/clever.
Being an Early Adopter Sucks by ocirs · 2007-11-15 15:19 · Score: 3, Insightful

These guys are pretty much saying that they don't really intend to optimize the code for Penryn because very few processors will have SSE4, and even then they don't expect much performance improvement. I'm still waiting for decent 64-bit drivers for half of my hardware... Most early adopters pay a premium for features that aren't really used at first, and by the time the software catches up the hardware is dirt cheap. However, Penryn (except for the Extreme Edition) is an exception, since it is priced at a point where it's worth paying the extra buck or two for features that won't have much impact until years later, when the software catches up. I'm really looking forward to Nehalem, though; the architecture update should bring a significant performance improvement without depending much on software optimization.
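On Linux you can check whether your own CPU supports SSE4 by reading /proc/cpuinfo. The kernel lists feature flags there, such as sse4_1 and sse4_2. The Python sketch below parses the first "flags" line. If the file can't be read, for example on a non-Linux system, it returns an empty set.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def read_host_flags(path="/proc/cpuinfo") -> set:
    try:
        with open(path) as f:
            return cpu_flags(f.read())
    except OSError:          # not Linux, or /proc unavailable
        return set()


sample = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 ssse3 sse4_1 lm\n"
print("sse4_1" in cpu_flags(sample))
print("sse4_1" in read_host_flags())   # depends on the machine
```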
Re:If Intel had a better bus then they would not n by Predius · 2007-11-15 15:29 · Score: 5, Insightful

Unless the bus and RAM start running faster than the CPU, cache will have a place in the design. And when die space is as cheap as it is for Intel now, why NOT use it for more cache?
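The case for spending die area on cache can be quantified with the standard average-memory-access-time formula: hit time + miss rate × miss penalty. The Python sketch below uses hypothetical cycle counts, not Penryn measurements. It shows that when a trip to RAM is slow, halving the miss rate with a bigger cache reduces the average access time by a lot.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers in CPU cycles: 3-cycle cache hit, 200-cycle trip to RAM.
smaller_cache = amat(3, 0.10, 200)   # 10% of accesses miss
larger_cache = amat(3, 0.05, 200)    # extra cache halves the miss rate
print(smaller_cache, larger_cache)
```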
Re:Penguin by smittyoneeach · 2007-11-15 15:36 · Score: 4, Funny

You, and a certain "ronery" fellow in North Korea.

--
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
Via has it for years: AES, SHA, and much more... by rpp3po · 2007-11-15 16:28 · Score: 3, Insightful

My 1.2 GHz (C7) EPIA board runs a 28 MB/s file server over Gigabit LAN, with transparent AES decryption (dm-crypt).... :)
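To put the 28 MB/s figure in context, the Python arithmetic below compares it to the raw Gigabit line rate. It assumes decimal megabytes and ignores protocol overhead. The result suggests the link still has headroom, so the limit is set by the rest of the system (disk, CPU, crypto) rather than by the network.

```python
GIGABIT_BITS_PER_S = 1_000_000_000
line_rate_mb_s = GIGABIT_BITS_PER_S / 8 / 1_000_000   # 125 MB/s raw, before overhead

observed_mb_s = 28   # figure from the post; assumed decimal megabytes
utilization = observed_mb_s / line_rate_mb_s
print(f"{utilization:.0%} of raw Gigabit line rate")
```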
Apple's change to LLVM by tyrione · 2007-11-15 16:56 · Score: 4, Interesting

makes a lot more sense with these latest processors. Sure, the SSE4 instructions won't be that immediately useful to Linux. They sure as hell will be for OS X Leopard.
x86 not CISC?! by porpnorber · 2007-11-15 20:21 · Score: 5, Interesting

x86 has a hella complex instruction set, and it's decoded in hardware, not software. On a computer. So: it's a CISC. A matter of English, sorry, not religion. Sure the execution method is not the ancient textbook in-order single-level fully microcoded strategy - but it wasn't on a VAX, either, so you can't weasel out of it that way. ;)
Of course, the problem isn't with being a CISC, anyway. Complex instruction sets can save on external fetch bandwidth, and they can be fun, too! It was true 25 years ago, and it's still true now. CISC was never criticised as inherently bad, just as a poor engineering tradeoff, or perhaps a philosophy resulting in such poor tradeoffs.
The real point is twofold: first, that the resources, however small, expended on emulating (no longer very thoroughly) the ancient 8086 are clearly ill-spent. While this may have come about incrementally, it could all by now be done in software for less. And second, while we don't write assembly code any more, we do still need machines as compiler targets; and a compiler wants an ISA that is simple enough to model in detail (the classic RISC theory) and/or orthogonal enough to exploit thoroughly (the CISC theory). Intel (and AMD too, of course; the 64-bit mode is baffling in its baroque design) gives us neither; x86 is simply not a plausible compiler target. It never was, and it's getting worse and worse. And that is precisely why new instructions are not taken up rapidly: we can't just add three lines to the table in the compiler and have it work, as we should be able to; we can't just automatically generate and ship fat binaries that exploit new capabilities where they provide faster code, as must be possible for these instruction set increments to be worthwhile.
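A "fat binary" ships several versions of a routine and picks one at run time based on the host's CPU features. The Python sketch below shows only the selection logic and makes no claim about any particular toolchain. The function names are hypothetical. dot_sse41 stands in for a kernel that would use SSE4.1's DPPS dot-product instruction; here it computes the same result in plain Python so the dispatch can be tested.

```python
def dot_generic(a, b):
    """Portable fallback."""
    return sum(x * y for x, y in zip(a, b))


def dot_sse41(a, b):
    # Stand-in for a hypothetical SSE4.1 (DPPS-based) kernel; same result here.
    return sum(x * y for x, y in zip(a, b))


# Ordered best-first: (required CPU flags, implementation).
DOT_VARIANTS = [
    ({"sse4_1"}, dot_sse41),
    (set(), dot_generic),
]


def select_variant(variants, host_flags):
    """Return the first implementation whose required flags the host has."""
    for required, fn in variants:
        if required <= host_flags:
            return fn
    raise RuntimeError("no usable variant")


dot = select_variant(DOT_VARIANTS, {"sse2", "sse4_1"})
print(dot.__name__, dot([1, 2, 3], [4, 5, 6]))
```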
Consider, for example, a hypothetical machine in which there are a number of identical, wide registers, each of which can be split into lanes of any power of two width; and an orthogonal set of cleanly encoded instructions that apply to those registers. CISCy, yes, but also a nice target that we can write a clean, flexible, extensible compiler back end for. Why can't we have that, instead? (Even as a frikkin' mode if back compatibility is all and silicon is free, as you appear to argue!)
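The Python sketch below models the hypothetical machine described above. It assumes a 128-bit register width, which is an illustrative choice. A single lane-wise add, parameterized by any power-of-two lane width, covers what x86 encodes as separate opcodes (PADDB, PADDW, PADDD, PADDQ). Each lane wraps on overflow, and no carry crosses into the next lane.

```python
REG_BITS = 128  # assumed register width for the hypothetical machine


def lanewise_add(a: int, b: int, lane_bits: int) -> int:
    """Add two REG_BITS-wide registers treated as independent lanes of
    lane_bits each. Each lane wraps on overflow; no carry crosses a lane."""
    if lane_bits <= 0 or lane_bits & (lane_bits - 1) or lane_bits > REG_BITS:
        raise ValueError("lane width must be a power of two no wider than the register")
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, REG_BITS, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift
    return result


for width in (8, 16, 32, 64):
    print(width, hex(lanewise_add(0xFF, 0x01, width)))
```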
It shouldn't be a question of arguing how hard it is or isn't for the Intel engineers to add new clever cruft to the old dumb cruft, but one of what it takes to deploy a feature end-to-end, from high level language source to operations executed, and how to streamline that process.
So, sure, give us successive extensions to the general-purpose hardware, but give them to us in a form that can actually be used, not merely as techno-marketroids' checklist features!
Who gives a shit? by Sycraft-fu · 2007-11-16 01:29 · Score: 4, Insightful

Seriously, I get tired of the AMD fanboy "Well if Intel did this they wouldn't have to do that," or "Intel is cheating by doing processors this way instead of that way." So understand this: none of that shit matters. The only thing that matters to the end user is performance per dollar. That's it. You can bitch and scream all you like about how doing things a different way is theoretically better; what matters is actual, real performance. In that category, the Core 2 is very good. It's a damn fast chip for a good price. That's all it needs to be. I don't care about pissing matches over how it is done, only that in the end it works well for the things I do. It doesn't matter if there's a theoretical situation it's bad at; if that's not one I encounter, I don't care.

Also, as for bus speed, note that the real limiting factor is RAM speed. Faster RAM is pricey, and that's ultimately where you have to go for non-cached data. You can build as fast a bus as you like; if you are waiting on the RAM, it gains you little.
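A simplified latency model shows why a faster bus helps little when RAM is the wait. Each cache miss pays the DRAM latency plus the time to move one line over the bus. The Python sketch below uses hypothetical figures and ignores overlapping and prefetching. Under these assumptions, doubling bus bandwidth shortens only the transfer part of the miss.

```python
def miss_time_ns(dram_latency_ns: float, line_bytes: int, bus_gb_per_s: float) -> float:
    """Time to service one cache miss: wait for DRAM, then move the line over the bus.
    1 GB/s (10**9 bytes/s) moves exactly 1 byte per nanosecond."""
    transfer_ns = line_bytes / bus_gb_per_s
    return dram_latency_ns + transfer_ns


# Hypothetical figures: 60 ns DRAM latency, 64-byte cache line.
base = miss_time_ns(60, 64, 8)      # 8 GB/s bus
faster = miss_time_ns(60, 64, 16)   # bus bandwidth doubled
print(f"doubling the bus saves {100 * (1 - faster / base):.1f}% per miss")
```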
Re:LLVM == Hot Air by pavon · 2007-11-16 06:23 · Score: 4, Informative

Apple used LLVM to improve the performance of software fallbacks for OpenGL extensions a hundredfold in Leopard, and a big part of that was because it was good at optimizing high-level routines for the low-level features of the chip, such as Altivec/SSE2, 32-bit/64-bit, PPC/x86, etc. So it stands to reason that, to the extent that SSE4 is useful, LLVM will make good use of it, just as it did for other extensions.

That sounds pretty practical to me.