Intel Core 2 'Penryn' and Linux
An anonymous reader writes "Linux Hardware has posted a look at the new Intel "Penryn" processor and how the new processor will work with Linux. Intel recently released the new "Penryn" Core 2 processor with many new features. So what are these features and how will they equate into benefits to Linux users? The article covers all the high points of the new "Penryn" core and talks to a couple Linux projects about end-user performance of the chip."
"There are some new instructions that could be more convenient to use in some special cases (like the new pmin/pmax instructions). But these will have no real performance benefit."
"So we do not plan on adding SSE4 optimizations. We may use SSE4 instructions in the future for convenience once SSE4 has become really widely supported. But I personally don't see that anytime soon..."
I think that puts the hype over Penryn into perspective. There are some nice improvements in energy leakage and such, but it's nothing revolutionary.
I got a catholic block.
In the article, the authors of XviD and FFMPEG aren't too optimistic about speedups. If video encoding/decoding is the bottleneck, then why not start building motherboards with a dedicated chip specialized for this kind of work, instead of trying to cram extra instructions into an already bloated CISC CPU? Doesn't make sense to me.
Also, an earlier comment that may be useful in this discussion: Why smaller feature sizes (45nm) mean faster clock times.
Driver problems on a processor compatible with current motherboards?
This just reminds me of CONFIG_ACPI_SLEEP. About 2 times a month I am staring at this option wondering if I will ever get to use it. Some things just are not worth developer time to implement.
OMG facts!
If Intel had a better bus then they would not need so much L2
Am I the only one who read Penryn as penguin?
I seem to remember Intel doing this when they released the 'first' MMX instructions in the Pentiums. That time they had actually doubled the L1 cache from 16k to 32k in the new Pentiums, but they somehow managed to convince/fool everyone that the performance gain was a result of MMX. Very sneaky/clever.
I'm contemplating which Force power I should use on you... let me see... we have
1. That's no moon = Force Choke
2. But does it run Linux = hmmm....
These guys are pretty much saying that they don't really intend to optimize the code for Penryn because very few processors will have SSE4, and even then they don't expect much performance improvement. I'm still waiting for decent 64-bit drivers for half of my hardware... Most early adopters pay a premium for features that aren't really utilized at first, and by the time the software catches up the hardware is dirt cheap. However, Penryn (except for the Extreme Edition) is an exception, since it is priced at a point where it is worth paying the extra buck or two for features that are not going to have much impact until years later, when the software catches up. I'm really looking forward to Nehalem though; the architecture update is going to bring a significant improvement in performance without depending much on software optimization.
Unless the bus and RAM start running faster than the CPU, cache will have a place in the design. And when die space is as cheap as it is for Intel now, why NOT use it for more cache?
dude, are you like 13 years old or what?
Intel bothered to add crc-32 but they didn't add md5 or sha-1: wtf?
seriously, sha-1 in microcode would be hella fast
proud caffeine whore
My 1.2 GHz (C7) EPIA board runs a 28 MB/s file server over Gigabit LAN, with transparent AES decryption (dm-crypt)... :)
makes a lot more sense with these latest processors. Sure, the SSE4 instructions won't be immediately useful to Linux. They sure as hell will be for OS X Leopard.
But does the new Intel chip blend?
That is the question.
Actually, unless the bus and RAM run faster than the CPU or the cache memory, cache will have a place in the design. If RAM and bus are faster than cache, there is no point in having cache. If RAM and bus are faster than the CPU, your CPU is too slow and you'll get no benefit from having such fast RAM and bus speeds.
2007 called, you have been invited to join in.
Make SELinux enforcing again!
x86 has a hella complex instruction set, and it's decoded in hardware, not software. On a computer. So: it's a CISC. A matter of English, sorry, not religion. Sure the execution method is not the ancient textbook in-order single-level fully microcoded strategy - but it wasn't on a VAX, either, so you can't weasel out of it that way. ;)
Of course, the problem isn't with being a CISC, anyway. Complex instruction sets can save on external fetch bandwidth, and they can be fun, too! It was true 25 years ago, and it's still true now. CISC was never criticised as inherently bad, just as a poor engineering tradeoff, or perhaps a philosophy resulting in such poor tradeoffs.
The real point is twofold. First, the resources, however small, expended on emulating (no longer very thoroughly) the ancient 8086 are clearly ill-spent. While this may have come about incrementally, it could all by now be done in software for less. Second, while we don't write assembly code any more, we do still need machines as compiler targets; and a compiler wants an ISA that is simple enough to model in detail (the classic RISC theory) and/or orthogonal enough to exploit thoroughly (the CISC theory). Intel (and AMD, too, of course; the 64 bit mode is baffling in its baroque design) gives us neither; x86 is simply not a plausible compiler target. It never was, and it's getting worse and worse. And that is precisely why new instructions are not taken up rapidly: we can't just add three lines to the table in the compiler and have it work, as we should be able to do; we can't just automatically generate and ship fat binaries that exploit new capabilities where they provide for faster code, as must be possible for these instruction set increments to be worthwhile.
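To make the "fat binary" idea concrete, here is a rough sketch of runtime dispatch: look at the feature flags the kernel reports and pick the fastest variant the CPU supports, falling back to a generic one. The flag names (`sse4_1`, `sse2`) are the ones Linux prints in /proc/cpuinfo; everything else (`cpu_flags`, `pick_kernel`, and the three `min_*` stand-ins) is hypothetical and only illustrates the selection logic, not any real project's code.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of
    /proc/cpuinfo-style text, or an empty set if there is none."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical stand-ins for three builds of the same routine. In a real
# fat binary these would be separately compiled code paths; here they all
# compute the same result so the dispatch can be checked.
def min_generic(block):
    return min(block)


def min_sse2(block):
    return min(block)


def min_sse4(block):
    return min(block)


# Ordered from most to least capable; the first supported entry wins.
KERNELS = [("sse4_1", min_sse4), ("sse2", min_sse2)]


def pick_kernel(flags):
    """Pick the best available variant for the given set of CPU flags."""
    for flag, kernel in KERNELS:
        if flag in flags:
            return kernel
    return min_generic


# On a Linux box one might call (assumption: /proc/cpuinfo is readable):
#   kernel = pick_kernel(cpu_flags(open("/proc/cpuinfo").read()))
```

The selection itself is trivial; the commenter's complaint is about everything before it, namely getting a compiler to produce the variants automatically.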
Consider, for example, a hypothetical machine in which there are a number of identical, wide registers, each of which can be split into lanes of any power of two width; and an orthogonal set of cleanly encoded instructions that apply to those registers. CISCy, yes, but also a nice target that we can write a clean, flexible, extensible compiler back end for. Why can't we have that, instead? (Even as a frikkin' mode if back compatibility is all and silicon is free, as you appear to argue!)
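A toy model of that hypothetical machine, just to show how little there is to it: one register width, any power-of-two lane width, and the same operation applied per lane. The 128-bit width and the names `split_lanes`, `join_lanes`, and `lanewise` are assumptions made for the sketch; nothing here corresponds to a real ISA.

```python
REG_BITS = 128  # assumed register width for the hypothetical machine


def _check(lane_bits):
    # Lane width must be a power of two that divides the register width.
    if lane_bits <= 0 or lane_bits & (lane_bits - 1) or REG_BITS % lane_bits:
        raise ValueError("lane width must be a power of two dividing REG_BITS")


def split_lanes(reg, lane_bits):
    """Split a register (held as a Python int) into unsigned lanes,
    lowest lane first."""
    _check(lane_bits)
    mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & mask
            for i in range(REG_BITS // lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack unsigned lanes (lowest first) back into one register value."""
    _check(lane_bits)
    mask = (1 << lane_bits) - 1
    reg = 0
    for i, lane in enumerate(lanes):
        reg |= (lane & mask) << (i * lane_bits)
    return reg


def lanewise(op, a, b, lane_bits):
    """Apply a two-operand op to each pair of lanes of registers a and b."""
    pairs = zip(split_lanes(a, lane_bits), split_lanes(b, lane_bits))
    return join_lanes([op(x, y) for x, y in pairs], lane_bits)
```

With this, a packed unsigned minimum at any width is `lanewise(min, a, b, 8)`, `lanewise(min, a, b, 32)`, and so on: one table entry per operation rather than one per operation-and-width combination.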
It shouldn't be a question of arguing how hard it is or isn't for the Intel engineers to add new clever cruft to the old dumb cruft, but one of what it takes to deploy a feature end-to-end, from high level language source to operations executed, and how to streamline that process.
So, sure, give us successive extensions to the general-purpose hardware, but give them to us in a form that can actually be used, not merely as techno-marketroids' checklist features!
Since 2002 we've been hearing about LLVM. It sounds like something generally "vaguely very cool TM", but when you download it, it's a bunch of "optimization strategies framework, etc, etc". Still nothing practical though. When you can compile the kernel with LLVM, and it works and is not 50 times slower than gcc, wake me up. It seems that in the best-case scenario LLVM would require at least 7 more years of heavy development before it gets there, if ever at all. If you want to invest in hot air and base your future decisions on the existence of this hypeware, fine. Yawn.
Seriously, I get tired of the AMD fanboy "Well if Intel did this they wouldn't have to do that," or "Intel is cheating by doing processors this way instead of that way." So understand this: None of that shit matters. The only thing that matters to the end user is performance per dollar. That's it. You can bitch and scream all you like about how doing things a different way is theoretically better; what matters is actual, real performance. In that category, the Core 2 is very good. It's a damn fast chip for a good price. That's all it needs to be. I don't care about pissing matches over how it is done, only that in the end it works well for the things I do. It doesn't matter if there's a theoretical situation it's bad at; if that's not one I encounter, I don't care.
Also as for bus speed, you might note that the real limiting factor is RAM speed. It is pricey to get faster RAM, and that's ultimately where you've got to go for non-cached data. You can build as fast a bus as you like, if you are waiting on the RAM it gains you little.
The returns from increasing cache size diminish very quickly. Cache hits will be very close together. Cache misses will be very far away. With this model, adding another couple of MB does not gain you anything.
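A back-of-the-envelope way to see the diminishing returns is the usual average memory access time formula, hit time plus miss rate times miss penalty. The sketch below plugs in made-up numbers and assumes the old rule of thumb that miss rate falls roughly with the square root of cache size; both the numbers and that rule are assumptions for illustration, not measurements of any Core 2 part.

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and a fraction miss_rate of them also pays the miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


def miss_rate_for(size_mb, base_rate=0.05, base_size_mb=1.0):
    """Assumed rule of thumb: miss rate scales as 1/sqrt(cache size).
    base_rate is the (made-up) miss rate at base_size_mb."""
    return base_rate * (base_size_mb / size_mb) ** 0.5


def savings_per_doubling(sizes_mb, hit_time_ns=1.0, miss_penalty_ns=100.0):
    """Time saved per access by each step up in cache size."""
    times = [amat(hit_time_ns, miss_rate_for(s), miss_penalty_ns)
             for s in sizes_mb]
    return [before - after for before, after in zip(times, times[1:])]


if __name__ == "__main__":
    for size, saved in zip([2, 4, 8], savings_per_doubling([1, 2, 4, 8])):
        print(f"going to {size} MB saves about {saved:.2f} ns per access")
```

Under these assumptions each doubling buys less than the one before, and real workloads whose working set already fits in cache would see even less.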
You forgot to include "at a fraction of the price that any Intel equivalent (Oh, I am sorry, Intel can't do SHA or AES that fast) would cost.".
Atheism is a religion like not collecting stamps is a hobby.
It does, end of discussion. Everything else is simply about applications.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
If you want orthogonal, why not use an existing non-intel CPU?
Part of the problem is that we (still) don't really know how to design a CPU that is easy to compile fast code for (e.g., in all situations).