Intel Core 2 'Penryn' and Linux
An anonymous reader writes "Linux Hardware has posted a look at the new Intel "Penryn" processor and how the new processor will work with Linux. Intel recently released the new "Penryn" Core 2 processor with many new features. So what are these features and how will they translate into benefits for Linux users? The article covers all the high points of the new "Penryn" core and talks to a couple of Linux projects about end-user performance of the chip."
Driver problems on a processor compatible with current motherboards?
I seem to remember Intel doing this when they released the 'first' MMX instructions in the Pentiums. That time they had actually doubled the L1 cache from 16k to 32k in the new Pentiums, but they somehow managed to convince/fool everyone that the performance gain was a result of MMX. Very sneaky/clever.
instead of trying to cram extra instructions
Cram? Chip designers get more and more transistors to use every year. I don't believe there's any "cramming" involved.
into an already bloated CISC CPU?
You're about 15 years out of date. The x86 isn't exactly a CISC CPU; it's a complex instruction set that decodes into a simpler one internally. Only the Intel engineers know how they added the SSE4 instructions, but based on the comments of the encode/decode guys, these new instructions sound a lot like the old instructions. It's not too hard to imagine that they didn't have to change much silicon around, and maybe got to re-use some old internal stuff and just interpret the new instructions differently.
Anyway, so why not just have a dedicated piece of silicon for this exact purpose? Partly because it'd be more expensive (you'd have to basically implement a lot of the stuff already on CPU like cache, etc), but also because it's just too specific. How many people really care about encoding video? 5% of the market? Less?
Decoding in hardware is already a reality, and has been for some time. GPUs have implemented this feature for at least 10 years. But of course it's generally not a feature that has dedicated silicon; it's integrated into the GPU. If this is the first you've heard of it, it's not surprising. The other problem with non-CPU-specific acceleration is that it never really becomes standard, as there's no standard instruction set for GPUs, and even then a GPU maker may just drop that feature in the next line of cards.
In short, specialized means specialized. Specialized things don't tend to survive very well.
AccountKiller
RAM is another area that needs work. I mean, RAM speeds are getting very slow relative to the CPU, and caches aren't big enough to avoid being saturated by modern software. The result is that there is a lot of inefficiency in extracting things from RAM. It's better than it would be with no cache at all, but it's nowhere near what it could be.
Hard drives could also be improved. If you had intelligent drives, you could place the filesystem layer in an uploadable module and have that entirely offloaded to the drive. Just have the data DMAed directly to and from the drive, rather than shifted around all over the place, reformatted a dozen times and then DMAed down.
I'm sure there are a million other adjustments that could be made that would be as good or better than these, but I don't see many of those being in the CPU itself. SSE4, perhaps, but it's not clear to me that a maths core in the CPU is any longer of any real value. With multiple roots to the bus, you can have a maths core in a different root with equal access to all resources and you would not need the main CPU to govern it or operations within it. The same could be true of graphics, of course. If you did that, the MPU and GPU could be truly independent compute nodes and therefore could be doing their own stuff with less CPU intervention. You'd then use barrier operations when synchronization was required.
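To make the "barrier operations" idea concrete, here is a minimal sketch using ordinary threads as stand-ins for the independent compute nodes. The function name `run_nodes` and the phase structure are assumptions made for illustration; nothing here models real bus or MPU/GPU hardware.

```python
import threading


def run_nodes(num_nodes=2, phases=3):
    """Run independent workers that only meet at a barrier between phases."""
    barrier = threading.Barrier(num_nodes)
    log = []                 # (phase, node_id) tuples in completion order
    lock = threading.Lock()  # protects the shared log only

    def node(node_id):
        for phase in range(phases):
            # Independent work: no central coordinator is involved here.
            with lock:
                log.append((phase, node_id))
            # Synchronization point: block until every node has finished
            # this phase, then all nodes proceed together.
            barrier.wait()

    threads = [threading.Thread(target=node, args=(i,)) for i in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because each node records its work before waiting, no entry for a later phase can appear in the log ahead of an entry for an earlier one, even though nothing orders the nodes within a phase.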
All in all, Intel's latest offering doesn't impress me, given everything else that is transpiring in the computing world.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I don't have a reference for this either, but this is the message that Intel regularly conveys to the channel. You can see the usage of this release strategy starting with the later Pentium 4 CPUs and it has continued through the various renditions of the Core series processors.
The name you are thinking of is the "tick-tock model."
/ \
\ / ASCII ribbon campaign for peace
x
/ \
MD5 and SHA-1 are already compromised algorithms. CRC32, on the other hand, is not the kind of algorithm that gets obsoleted and discarded because of a research discovery.
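For anyone wondering what CRC32 actually does: it is a checksum for catching accidental corruption, not a cryptographic hash. Below is a rough bitwise sketch of CRC-32C (the Castagnoli polynomial), on the assumption that the comment above refers to the CRC32 instruction defined in SSE4.2, which uses that polynomial. The function name `crc32c` is made up for this example, and this is plain software, not the hardware instruction. Python's `zlib.crc32` uses a different polynomial, so the two values differ.

```python
import zlib


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C, using the reflected Castagnoli polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the polynomial if a 1 fell off.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


message = b"123456789"
corrupted = b"123456780"  # last byte changed

print(hex(crc32c(message)))       # CRC-32C of the standard check string
print(hex(zlib.crc32(message)))   # the zlib/PNG CRC-32 of the same bytes
print(crc32c(message) != crc32c(corrupted))
```

A one-byte change always produces a different checksum, which is all a CRC is meant to guarantee. It makes no promise against someone deliberately constructing a collision.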
Like an earlier post said, it's called the "Tick-Tock" strategy. With one upgrade you improve the architecture, and with the next you make the fab process smaller. It's not a bad idea, but there are two questions to ask. First, could Intel hit a dead end, given that 16nm is the last point on the ITRS roadmap, nullifying this strategy around 2013? Once you go even smaller, you essentially start having gates the size of atoms. And second, once quad core becomes more common, will there really be any reason for consumer-level products to go beyond that, given that very few programs take advantage of 64-bit and processor parallelism?
From your ramblings, I can't tell whether you don't know what you're talking about or know just enough to make very confusing remarks.
It's been a long time since raw computational power has driven CPU development. Almost all improvements are about further hiding latency. Your comments about RAM, disk, etc. are all about I/O latency. This has been getting worse for decades and will continue to get worse, and the majority of what CPU designers think about is how to deal with that fact.
From your comments, I'm pretty sure the various adjustments you propose simply don't make sense. A lot of the time you're just throwing out useless statements like performance being "nowhere near what it could be"... in what, a science fiction novel? You seem to have a vague notion of distributed processing control, rather than centralized processing, but no direction on how these could effectively be coordinated more than they currently are (offload task to specialized unit, but manage overall user flow).
The more I reread your post, the more I don't mind that Intel's offerings don't impress you.
Apple used LLVM to improve the performance of software fallbacks for OpenGL extensions a hundredfold in Leopard, and a big part of that was because it was good at optimizing high-level routines depending on the low-level features of the chip, such as AltiVec/SSE2, 32-bit/64-bit, PPC/x86, etc. So it stands to reason that, to the extent that SSE4 is useful, LLVM will make good use of it, just like it did for other extensions.
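The general idea of picking a routine based on what the chip supports can be sketched on Linux by reading the feature flags in `/proc/cpuinfo`, where SSE4.1 shows up as `sse4_1`. This is only an illustration of feature-based dispatch, not how LLVM does it internally; the routine names (`blend_sse4_1` and friends) and the sample text are hypothetical.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_routine(flags):
    """Choose the most specific implementation the CPU advertises."""
    # The routine names are placeholders, not real library functions.
    if "sse4_1" in flags:
        return "blend_sse4_1"
    if "sse2" in flags:
        return "blend_sse2"
    return "blend_generic"


def read_host_flags(path="/proc/cpuinfo"):
    """Read the flags of the machine this runs on (Linux only)."""
    with open(path) as f:
        return cpu_flags(f.read())


# Made-up sample in the shape of a /proc/cpuinfo entry.
SAMPLE = "processor\t: 0\nflags\t\t: fpu mmx sse sse2 ssse3 sse4_1\n"

print(pick_routine(cpu_flags(SAMPLE)))
```

On a real Linux box, `pick_routine(read_host_flags())` would make the same choice for the local CPU.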
That sounds pretty practical to me.